Analog density: kernel-weighted analog count within climate/geographic limits
Source:R/analog_density.R
analog_density.RdComputes, for each focal location, the kernel-weighted sum of all reference locations that satisfy the supplied climate and geographic constraints.
Usage
analog_density(
x,
pool,
x_cov = NULL,
weight = NULL,
coord_type = "auto",
stat = c("sum_weights"),
max_clim = NULL,
max_geog = NULL,
kernel = c("uniform", "inverse_clim", "inverse_geog", "gaussian_clim", "gaussian_geog",
"gaussian_joint", "inverse_joint"),
theta = NULL,
normalize = "auto",
index_res = "auto",
cell_area_weight = "auto",
n_threads = NULL,
downsample = 1,
seed = NULL,
progress = FALSE,
...
)Arguments
- x
Focal locations for which analogs will be found. Should be a matrix/data.frame with columns x, y, and climate variables, or a SpatRaster with climate variable layers.
- pool
The reference dataset to search for analogs. Either:
Matrix/data.frame with columns x, y, and climate variables, or SpatRaster with climate variable layers, OR
An
analog_indexobject created bybuild_analog_index()(for repeated queries).
- x_cov
Optional focal-specific covariance matrices for Mahalanobis distance calculations. Should be a matrix or data.frame with one row per focal location and one column per unique covariance component, or a SpatRaster with a layer for each component. For n climate variables, there are n*(n+1)/2 unique components, ordered as: variances first (diagonals), then covariances (upper triangle by row).
- weight
Optional pool site weights for use in aggregation. Numeric vector, single-column matrix/data.frame, or single-layer SpatRaster, with one value per row/cell of
pool. For aggregation stats like"weighted_mean","regression", etc., weights multiply through the weighted aggregation alongside any kernel weighting and cell-area weighting; they do not influence which analogs are selected byknn_*modes (selection remains distance-only). They are reported in pair mode as auser_weightcolumn. Values must be non-negative;NAis allowed and treated as 0 (the point is excluded from aggregation). DefaultNULLmeans no user-supplied weights.If you want to exclude a static subset of pool sites entirely, masking
pool(and any associatedy/covariates) upfront is more efficient than passingweight = 0for those sites, since the lattice index will not have to scan or distance-compute against them. Useweight = 0for cases where the mask varies per query against a shared index, or where some sites have a continuous weight and others should be excluded.- coord_type
Coordinate system type:
"auto"(default): Automatically detect from coordinate ranges."lonlat": Unprojected lon/lat coordinates (uses great-circle distance; assumesmax_geogis in km)."projected": Projected XY coordinates (uses planar distance; assumesmax_geogis in projection units).
- stat
Character vector of one or more density-style summary statistics. Allowed options:
"sum_weights"(default),"mean_weights","count","ess". Seeanalog_search()for detailed stat descriptions.- max_clim
Maximum climate distance constraint (default: NULL = no climate constraint). Can be either:
A scalar: Euclidean radius in climate space (e.g., 0.5)
A vector: Per-variable absolute differences (length must equal number of climate variables)
Only reference locations within this climate distance are considered. When
x_covis provided, scalar thresholds are interpreted in Mahalanobis distance units.- max_geog
Maximum geographic distance constraint (default: NULL = no geographic constraint). When specified, only reference locations within this distance are considered. Radius units should be specified in kilometers if
coord_type = "lonlat", or in projected coordinate units ifcoord_type = "projected".- kernel
Kernel decay function for weighting matches, used only when
statincludes a weighted aggregation ("sum_weights","mean_weights","weighted_sum","weighted_mean","ess","regression", or"tabulate"). One of:"uniform": All matches weighted equally (kernel weight = 1.0)."inverse_clim": Inverse climate distance, kernel weight = 1 / (climate_distance + eps), with epsilon given bytheta."inverse_geog": Inverse geographic distance, kernel weight = 1 / (geographic_distance + eps), with epsilon given bytheta."gaussian_clim": Gaussian kernel on climate distance, kernel weight = exp(-climate_distance^2 / (2 sigma^2)), with sigma given bytheta."gaussian_geog": Gaussian kernel on geographic distance, kernel weight = exp(-geographic_distance^2 / (2 sigma^2)), with sigma given bytheta."gaussian_joint": Gaussian kernel on combined distance, kernel weight = exp(-(clim_dist^2 / (2 sigma_clim^2) + geog_dist^2 / (2 sigma_geog^2))), with sigmas given bytheta."inverse_joint": Inverse joint distance, kernel weight = 1 / (sqrt(clim_dist^2 + geog_dist^2) + eps), with epsilon given bytheta.
- theta
Optional numeric parameter controlling the shape of the weighting
kernel, used wheneverkernelis active (i.e. wheneverstatincludes a weighted aggregation) andkernelis not"uniform". Interpretation depends onkernel:For
"inverse_clim"or"inverse_geog": epsilon value added to distances (scalar; default: 1e-12 for climate, 1e-6 for geography).For
"gaussian_clim"or"gaussian_geog": sigma bandwidth parameter (scalar; larger values = slower decay with distance).For
"gaussian_joint"or"inverse_joint": 2-element vectorc(theta_clim, theta_geog)(defaults: 1 for climate, 1 for geography).
See
kernel_params()for help choosingthetaandmax_clim/max_geogvalues that work well together.- normalize
One of
TRUE,FALSE, or"auto"(default). Only used ifstatincludes"sum_weights"or"tabulate"andpoolis a raster. When active, results for these stats are divided by a global scalar so that they represent a fraction of a theoretically "perfect" scenario where the full search area withinmax_geogis occupied wall-to-wall by cells whose climate exactly matchesx. See details underanalog_search()for more info.- index_res
Tuning parameter giving the number of bins per dimension of the internally-used lattice search index. Either:
A positive integer.
"auto"(the default): Automatically tune the index resolution by optimizing compute time on a subsample of focal points. If focal has relatively few rows, auto-tuning is skipped and a default resolution of 16 is used. Auto-tuning is not supported whendownsample < 1, because the speed-optimal resolution can sometimes result in higher uncertainty of stat results under downsampling. In that case setindex_resexplicitly; finer values (e.g. 32) generally give better accuracy at the possible cost of query speed.
Ignored if
poolis ananalog_index(uses index's resolution).- cell_area_weight
Controls cell-area weighting when
poolis a raster. One of"auto"(default; on for raster pools, off otherwise),TRUE(force on; errors ifpoolis not a SpatRaster), orFALSE(force off). Cell-area weights correct aggregation statistics for non-uniform cell areas (e.g. lonlat grids near the poles, or projected grids on non-equal-area projections); they are computed viaterra::cellSize()and normalized to mean 1. Whenpoolis a pre-builtanalog_index, this argument must agree with the index's stored configuration:cell_area_weight = FALSEerrors if the index was built with cell-area weighting on (rebuild the index instead).- n_threads
Optional integer number of threads to use for the computation. If
NULL(default), the global RcppParallel setting is used (seeRcppParallel::setThreadOptions).- downsample
Optional downsampling rate (0-1) for the reference pool, indicating the proportion of points to retain. Values < 1 reduce memory and improve speed at some cost to precision. Default is 1.0 (no downsampling). Ignored if
poolis a pre-built index. Whendownsample < 1,index_resmust be set explicitly (auto-tuning is not supported in this case; see theindex_resparameter for details).- seed
Optional random seed for reproducible downsampling. If
NULL(default), uses current R random state. Ignored ifpoolis a pre-built index ordownsample = 1.- progress
Logical; if
TRUE, display a progress bar during computation. Progress tracking works by splitting the focal dataset into chunks and processing them sequentially. Useful for large datasets. Default isFALSE.- ...
Additional arguments forwarded to
analog_search()(e.g.mean_cell_areafor tiled workflows).
Value
A data.frame, or a SpatRaster when x is one. Contains
index, x, y, and sum_weights. When normalize = TRUE, the
value of D_max is attached as a result attribute. See metadata()
for attached metadata attributes.
Details
This function is a wrapper that calls analog_search() using
select = "all" and stat = "sum_weights". It also supports "count",
"mean_weights", and "ess" as additional stats.
By default (normalize = "auto"), the returned sum_weights column is
divided by a global scalar D_max whenever pool meets the prerequisites
(raster pool with cell-area weighting active). This gives values on roughly
[0, 1], interpretable as the fraction of theoretical maximum analog density
achieved at each focal. See analog_search() for more details.
References
Mahony CR, Cannon AJ, Wang T, Aitken SN (2017). "A closer look at novel climates: New methods and insights at continental to landscape scales." Global Change Biology, 23(9), 3934-3955. doi:10.1111/gcb.13645
Abatzoglou JT, Dobrowski SZ, Parks SA (2020). "Multivariate climate departures have outpaced univariate changes across global lands." Scientific Reports, 10(1), 3891. doi:10.1038/s41598-020-60270-5
Williams JW, Jackson ST, Kutzbach JE (2007). "Projected distributions of novel and disappearing climates by 2100 AD." Proceedings of the National Academy of Sciences, 104(14), 5738-5742. doi:10.1073/pnas.0606292104
See also
analog_search() for the underlying flexible analog search function;
tiled_analog_search() for memory-safe searches on large raster datasets.
Examples
if (FALSE) { # \dontrun{
# Normalized density (default): returns sum_weights on roughly [0, 1]
dens <- analog_density(
x = sites,
pool = climate_data,
max_clim = 0.5,
max_geog = 100,
kernel = "gaussian_clim",
theta = 0.2
)
attr(dens, "D_max") # the global denominator used
# Raw (unnormalized) kernel-weighted sums
dens_raw <- analog_density(
x = sites,
pool = climate_data,
max_clim = 0.5,
max_geog = 100,
kernel = "gaussian_clim",
theta = 0.2,
normalize = FALSE
)
# Joint Gaussian weighting (both climate and geography)
intens_joint <- analog_density(
x = sites,
pool = climate_data,
max_clim = 0.5,
max_geog = 100,
kernel = "gaussian_joint",
theta = c(0.2, 50) # c(clim_bandwidth, geog_bandwidth)
)
# With pre-built index (for repeated queries)
index <- build_analog_index(climate_data)
i1 <- analog_density(x = sites1, pool = index, max_clim = 0.5,
max_geog = 100, kernel = "inverse_clim")
i2 <- analog_density(x = sites2, pool = index, max_geog = 100,
kernel = "gaussian_clim", theta = 0.3)
} # }