Automatically finds the optimal lattice index resolution for your data and query pattern using adaptive bracketing search. Runs test queries with different resolutions and recommends the one with the fastest compute speed.
Usage
tune_index_res(
  x,
  pool,
  downsample = 1,
  seed = NULL,
  select = "all",
  stat = NULL,
  max_clim = NULL,
  max_geog = NULL,
  k = NULL,
  weight = NULL,
  theta = NULL,
  x_cov = NULL,
  values = NULL,
  coord_type = c("auto", "lonlat", "projected"),
  n_threads = NULL,
  default_res = 16L,
  verbose = FALSE
)
Arguments
- x
Focal locations for which analogs will be found. Should be a matrix/data.frame with columns x, y, and climate variables, or a SpatRaster with climate variable layers.
- pool
The reference dataset to search for analogs. Either:
A matrix/data.frame with columns x, y, and climate variables, or a SpatRaster with climate variable layers, OR
An analog_index object created by build_analog_index() (for repeated queries).
- downsample
Optional downsampling rate (0-1) for the reference pool, indicating the proportion of points to retain. Values < 1 reduce memory use and improve speed at some cost to precision. Default is 1.0 (no downsampling). Ignored if pool is a pre-built index.
- seed
Optional random seed for reproducible downsampling. If NULL (default), the current R random state is used. Ignored if pool is a pre-built index or downsample = 1.
- select
Character string specifying the analog selection strategy. One of:
"all" (default): Select all analogs that satisfy the max_clim and max_geog constraints.
"knn_clim": For each focal, select up to k analogs with smallest climate distance, subject to filters.
"knn_geog": For each focal, select up to k analogs with smallest geographic distance, subject to filters.
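To illustrate how the three strategies differ, the following base R sketch applies them to precomputed distances for a single focal location. The helper select_analogs() is hypothetical and not part of the package. It handles only a scalar max_clim, and treating the thresholds as inclusive (<=) is an assumption.

```r
# Hypothetical helper (not part of the package): choose analogs for ONE focal
# location from precomputed climate and geographic distances to each pool point.
select_analogs <- function(clim_dist, geog_dist, select = "all",
                           max_clim = NULL, max_geog = NULL, k = NULL) {
  # Apply the optional distance filters (inclusive thresholds are an assumption).
  keep <- rep(TRUE, length(clim_dist))
  if (!is.null(max_clim)) keep <- keep & clim_dist <= max_clim
  if (!is.null(max_geog)) keep <- keep & geog_dist <= max_geog
  idx <- which(keep)

  # "all": every pool point that passes the filters.
  if (select == "all") return(idx)

  # kNN modes: rank the surviving points by the relevant distance, keep up to k.
  stopifnot(!is.null(k))
  d <- if (select == "knn_clim") clim_dist[idx] else geog_dist[idx]
  idx[order(d)][seq_len(min(k, length(idx)))]
}

clim_dist <- c(0.1, 0.4, 0.9, 0.3)
geog_dist <- c(50, 10, 5, 200)

select_analogs(clim_dist, geog_dist, "all", max_clim = 0.5)
select_analogs(clim_dist, geog_dist, "knn_clim", max_clim = 0.5, k = 2)
select_analogs(clim_dist, geog_dist, "knn_geog", max_clim = 0.5, k = 2)
```

In this toy case the third pool point is geographically closest but fails the climate filter, so it is never selected, even in "knn_geog" mode.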
- stat
Statistic(s) used to aggregate selected analogs. Either:
NULL or "none": Return all selected analog pairs as a data.frame.
"count": For each focal, count the number of selected analogs.
"sum_weights": For each focal, sum the weights of selected analogs (see weight and theta).
"mean_weights": For each focal, mean of weights of selected analogs.
"sum": Sum of values across analogs (requires values).
"mean": Mean of values across analogs (requires values).
"weighted_sum": Sum of (value × weight) across analogs (requires values and weight).
"weighted_mean": Weighted mean of values across analogs (requires values and weight).
"ess": Kish's effective sample size (ESS), computed as the squared sum of weights divided by the sum of squared weights (requires weight).
A character vector combining multiple stats (e.g., c("count", "sum", "mean")). Note: "none" cannot be combined with other stats.
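The statistics above can be written out in a few lines of base R. This is a minimal sketch for one focal location with a single value variable. The helper summarise_analogs() is hypothetical, and how the package handles edge cases such as an empty analog set is not covered here.

```r
# Hypothetical helper (not part of the package): aggregate the selected
# analogs of ONE focal location.
#   w: weights of the selected analogs
#   v: values at the corresponding reference locations
summarise_analogs <- function(w, v) {
  c(
    count         = length(w),
    sum_weights   = sum(w),
    mean_weights  = mean(w),
    sum           = sum(v),
    mean          = mean(v),
    weighted_sum  = sum(v * w),
    weighted_mean = sum(v * w) / sum(w),
    # Kish's ESS: squared sum of weights over sum of squared weights.
    ess           = sum(w)^2 / sum(w^2)
  )
}

w <- c(1, 1, 2)
v <- c(10, 20, 40)
summarise_analogs(w, v)
```

With equal weights the ESS equals the analog count; unequal weights pull it below the count.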
- max_clim
Maximum climate distance constraint (default: NULL = no climate constraint). Can be either:
A scalar: Euclidean radius in climate space (e.g., 0.5)
A vector: Per-variable absolute differences (length must equal number of climate variables)
Only reference locations within this climate distance are considered. When x_cov is provided, scalar thresholds are interpreted in Mahalanobis distance units.
- max_geog
Maximum geographic distance constraint (default: NULL = no geographic constraint). When specified, only reference locations within this distance are considered. The radius should be given in kilometers if coord_type = "lonlat", or in projected coordinate units if coord_type = "projected".
- k
Number of nearest analogs to return per focal location for kNN selection modes. Required when select is "knn_geog" or "knn_clim"; must be NULL for select = "all".
- weight
Weighting function for matches, used only when stat includes a weight-based statistic ("sum_weights", "mean_weights", "weighted_sum", "weighted_mean", or "ess"). One of:
"uniform": All matches weighted equally (weight = 1.0).
"inverse_clim": Inverse climate distance, weight = 1 / (climate_distance + eps), with epsilon given by theta.
"inverse_geog": Inverse geographic distance, weight = 1 / (geographic_distance + eps), with epsilon given by theta.
"gaussian_clim": Gaussian kernel on climate distance, weight = exp(-climate_distance^2 / (2 * sigma^2)), with sigma given by theta.
"gaussian_geog": Gaussian kernel on geographic distance, weight = exp(-geographic_distance^2 / (2 * sigma^2)), with sigma given by theta.
"gaussian_joint": Gaussian kernel on combined distance, weight = exp(-(clim_dist^2 / (2 * sigma_clim^2) + geog_dist^2 / (2 * sigma_geog^2))), with sigmas given by theta.
"inverse_joint": Inverse joint distance, weight = 1 / (sqrt(clim_dist^2 + geog_dist^2) + eps), with epsilon given by theta.
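The weighting formulas above translate directly into base R. The helper analog_weight() below is a hypothetical sketch, not the package's implementation. It requires theta to be passed explicitly for every non-uniform weight, and it omits "inverse_joint" because the text does not say how a two-element theta maps onto the single epsilon in that formula.

```r
# Hypothetical helper (not part of the package): weights for a set of analogs,
# following the formulas listed in the weight argument description.
analog_weight <- function(weight, clim_dist, geog_dist, theta = NULL) {
  switch(weight,
    uniform        = rep(1, length(clim_dist)),
    inverse_clim   = 1 / (clim_dist + theta),            # theta = epsilon
    inverse_geog   = 1 / (geog_dist + theta),            # theta = epsilon
    gaussian_clim  = exp(-clim_dist^2 / (2 * theta^2)),  # theta = sigma
    gaussian_geog  = exp(-geog_dist^2 / (2 * theta^2)),  # theta = sigma
    # theta = c(sigma_clim, sigma_geog)
    gaussian_joint = exp(-(clim_dist^2 / (2 * theta[1]^2) +
                           geog_dist^2 / (2 * theta[2]^2))),
    stop("weight not covered by this sketch: ", weight)
  )
}

clim_dist <- c(0, 0.5, 1)
geog_dist <- c(10, 100, 500)

analog_weight("gaussian_clim", clim_dist, geog_dist, theta = 0.5)
analog_weight("inverse_geog", clim_dist, geog_dist, theta = 1e-6)
analog_weight("gaussian_joint", clim_dist, geog_dist, theta = c(0.5, 200))
```

Gaussian weights are bounded by 1 and reach 1 at zero distance, whereas inverse-distance weights grow without bound as distance approaches zero, which is why those forms need an epsilon.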
- theta
Optional numeric parameter used by weighting functions when stat includes a weight-based statistic and weight is not "uniform". Interpretation depends on weight:
For "inverse_clim" or "inverse_geog": epsilon value added to distances (scalar; default: 1e-12 for climate, 1e-6 for geography).
For "gaussian_clim" or "gaussian_geog": sigma bandwidth parameter (scalar; larger values = slower decay with distance).
For "gaussian_joint" or "inverse_joint": 2-element vector c(theta_clim, theta_geog) (defaults: 1 for climate, 1 for geography).
- x_cov
Optional focal-specific covariance matrices for Mahalanobis distance calculations. Should be a matrix or data.frame with one row per focal location and one column per unique covariance component, or a SpatRaster with a layer for each component. For n climate variables, there are n*(n+1)/2 unique components, ordered as: variances first (diagonals), then covariances (upper triangle by row).
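The component ordering is easy to get wrong for four or more variables, where "upper triangle by row" and "upper triangle by column" differ. The sketch below uses two hypothetical helpers, pack_cov() and unpack_cov(), which are not part of the package, to convert between a covariance matrix and the flattened order described above. It also uses stats::mahalanobis(), which returns squared distances, to show what a Mahalanobis distance looks like for one focal location.

```r
# Hypothetical helpers (not part of the package).

# Flatten a symmetric covariance matrix: variances first, then the upper
# triangle read row by row. Reading the lower triangle of t(S) in R's
# column-major order visits the upper triangle of S row by row.
pack_cov <- function(S) {
  c(diag(S), t(S)[lower.tri(S)])
}

# Rebuild the n x n matrix from the flattened components.
unpack_cov <- function(v, n) {
  S <- diag(v[seq_len(n)], nrow = n)
  S[lower.tri(S)] <- v[-seq_len(n)]       # column-major lower = row-major upper
  S[upper.tri(S)] <- t(S)[upper.tri(S)]   # mirror to make S symmetric
  S
}

S <- matrix(c(4,   1,   0.5,
              1,   9,   2,
              0.5, 2,   16), nrow = 3, byrow = TRUE)

x_cov_row <- pack_cov(S)   # one row of x_cov for this focal location
length(x_cov_row)          # n * (n + 1) / 2 components

# Mahalanobis distance from a focal climate to two reference climates.
focal <- c(10, 500, 2)
refs  <- rbind(c(12, 503, 2), c(10, 500, 2))
sqrt(mahalanobis(refs, center = focal, cov = unpack_cov(x_cov_row, 3)))
```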
- values
Optional user-defined variables for each reference location in pool to aggregate across selected analogs. Can be a numeric vector (single variable), a matrix or data.frame with numeric columns (multiple variables), or a SpatRaster with one or more numeric layers. Must have exactly the same number of reference locations as pool.
When provided, enables the value-based aggregation stats "sum", "mean", "weighted_sum", and "weighted_mean". For stat = NULL/"none" (pairs mode), value columns are included in the output for each analog pair.
- coord_type
Coordinate system type:
"auto" (default): Automatically detect from coordinate ranges.
"lonlat": Unprojected lon/lat coordinates (uses great-circle distance; assumes max_geog is in km).
"projected": Projected XY coordinates (uses planar distance; assumes max_geog is in projection units).
- n_threads
Optional integer number of threads to use for the computation. If NULL (default), the global RcppParallel setting is used (see RcppParallel::setThreadOptions).
- default_res
Default resolution to use as starting point for search. Default is 16.
- verbose
Logical; if TRUE, print the selected resolution. Default is FALSE.
Details
The function uses an adaptive bracketing algorithm:
Start with three resolutions: default_res / 2, default_res, and default_res * 2.
Measure the elapsed query time for each.
If the minimum is at an edge of the tested range, expand the search in that direction.
Return the resolution with the lowest elapsed time.
This typically requires only 3-5 query evaluations total, making it much faster than exhaustive grid search.
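The search loop can be sketched in base R as follows. This is an illustrative reconstruction and not the package's code. The function bracket_search(), its time_fn argument (a stand-in for "build an index at this resolution and time a test query"), the min_res and max_evals safeguards, and expanding by factors of two are all assumptions. A synthetic cost function replaces real timings so the example is deterministic.

```r
# Hypothetical sketch of an adaptive bracketing search (not the package's code).
# time_fn(res) should return the elapsed time of a test query at resolution res.
bracket_search <- function(time_fn, default_res = 16L,
                           min_res = 1L, max_evals = 10L) {
  # Initial bracket: default / 2, default, default * 2 (clamped to min_res).
  res <- unique(pmax(min_res,
                     as.integer(c(default_res %/% 2L, default_res, default_res * 2L))))
  times <- vapply(res, time_fn, numeric(1))

  while (length(res) < max_evals) {
    best <- which.min(times)
    if (best == 1L && res[1] > min_res) {
      # Minimum at the low edge: try a coarser resolution.
      new_res <- max(min_res, res[1] %/% 2L)
      res   <- c(new_res, res)
      times <- c(time_fn(new_res), times)
    } else if (best == length(res)) {
      # Minimum at the high edge: try a finer resolution.
      new_res <- res[length(res)] * 2L
      res   <- c(res, new_res)
      times <- c(times, time_fn(new_res))
    } else {
      break  # minimum is bracketed on both sides
    }
  }
  res[which.min(times)]
}

# Synthetic "timing" curve with its minimum at resolution 64.
fake_time <- function(res) (log2(res) - 6)^2
bracket_search(fake_time, default_res = 16L)
```

With this synthetic curve the search evaluates resolutions 8, 16, 32, 64, and 128 before stopping, which matches the 3-5 evaluations described above. In real use, time_fn would wrap an actual query in system.time().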
The function only performs tuning for non-trivial problem sizes (>2000 focal points). For smaller datasets, it returns the default resolution.
A subsample of focal points is used for benchmarking to keep tuning fast while still being representative of actual query performance.
Examples
if (FALSE) { # \dontrun{
# Find optimal resolution for velocity queries
optimal_res <- tune_index_res(
  x = sample_sites,
  pool = climate_data,
  select = "knn_geog",
  stat = NULL,
  max_clim = 0.5,
  k = 1
)
# Use the optimized resolution
index <- build_analog_index(climate_data, index_res = optimal_res)
} # }