Evaluate region numbers
ps_regions_eval.Rd
This function compares multiple potential values for k
, the number of clusters in to use in ps_regions()
,
to help you decide how well different numbers of regions fit your data set. For each value of k
, it performs
a cluster analysis and calculates the proportion of total variance explained (SSE, the sum of squared pairwise
distances explained). It also calculates second-order metrics of the relationship between k and SSE. While
many data sets have no optimal value of k
and the choice is often highly subjective, these evaluation metrics
can help you identify potential points where the variance explained stops increasing quickly as k
increases.
Arguments
- ps
A
phylospatial
object. Must contain adissim
component generated by ps_add_dissim.- k
Vector of positive integers giving possible values for
k
. Values greater than the number of sites in the data set will be ignored.- plot
Logical indicating whether to print a plot of the results (
TRUE
, the default) or return a data frame of the results (FALSE
).- ...
Further arguments passed to ps_regions.
Value
The function generates a data frame with the following columns. If plot = FALSE
the data frame is
returned, otherwise the function prints a plot of the latter variables as a function of k
:
"k": The number of clusters.
"sse": The proportion of total variance explained, with variance defined as squared pairwise community phylogenetic dissimilarity between sites.
"curvature": The local second derivative. Lower (more negative) values indicate more attractive break-point values of k.
"dist11": The distance from the point to the 1:1 line on a plot of
k
vssse
in which k values over the interval from 1 to the number of sites are rescaled to the unit interval. Higher values indicate more attractive values for k.
Examples
ps <- ps_add_dissim(ps_simulate())
ps_regions_eval(ps, k = 1:15, plot = TRUE)