Pre-builds a reusable lattice index from reference climate data. The index can be queried multiple times with different focal points and parameters, avoiding the need to rebuild the lattice for each query.
Usage
build_analog_index(
pool,
coord_type = c("auto", "lonlat", "projected"),
index_res = 16,
downsample = 1,
seed = NULL,
cell_area_weight = "auto",
mean_cell_area = NULL
)
Arguments
- pool
The reference dataset to search for analogs. Should be a matrix/data.frame with columns x, y, and climate variables, or a SpatRaster with climate variable layers.
- coord_type
Coordinate system type:
"auto" (default): Automatically detect from coordinate ranges.
"lonlat": Unprojected lon/lat coordinates (uses great-circle distance; max_geog in subsequent queries is assumed to be in km).
"projected": Projected XY coordinates (uses planar distance; max_geog in subsequent queries is assumed to be in projection units).
- index_res
Tuning parameter giving the number of bins per dimension of the internally used lattice search index. Either:
A positive integer (16 by default, as shown in Usage).
"auto": Automatically tune the index resolution by optimizing compute time on a subsample of focal points. If focal has relatively few rows, auto-tuning is skipped and a default resolution of 16 is used.
Ignored if pool is an analog_index (the index's own resolution is used).
- downsample
Optional downsampling rate (between 0 and 1) giving the proportion of points in pool to retain. Downsampling reduces memory use and improves query speed at the cost of some precision; adaptive stratified sampling is used to minimize the loss of precision. The default is 1.0 (no downsampling). See Details for more info.
- seed
Optional random seed for reproducible downsampling. If NULL (default), the current R random state is used.
- cell_area_weight
Controls cell-area weighting for raster pools. One of:
"auto" (default): Compute cell-area weights when pool is a SpatRaster, and skip them otherwise. This corrects aggregation statistics for non-uniform cell areas (e.g. lonlat grids, where cell area shrinks toward the poles, or projected grids on non-equal-area projections).
TRUE: Force cell-area weighting on. Errors if pool is not a SpatRaster.
FALSE: Force cell-area weighting off; treat all pool points as having equal weight.
A numeric vector of length nrow(pool): Use these caller-supplied weights as-is, without any further normalization. This is intended for advanced workflows like tiled_analog_search() that need to maintain a globally consistent normalization across multiple per-tile index builds; most users should use one of the three options above.
When "auto" or TRUE triggers computation, weights are computed via terra::cellSize() and normalized to mean 1 over finite values, so the absolute magnitudes of statistics like sum_weights remain comparable to the unweighted case. The weights are stored on the returned index and used during all subsequent queries.
- mean_cell_area
Optional scalar mean cell area (in km^2) to attach to the index, overriding any value auto-computed from a raster pool. Intended for internal use by tiled_analog_search() to propagate a globally consistent mean area across per-tile index builds, so that analog_density(normalize = TRUE) produces consistent values across tiles. Most users should leave this NULL.
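As a rough illustration of the mean-1 normalization described for cell_area_weight, the base R sketch below rescales approximate lonlat cell areas. The helper normalize_cell_weights() is hypothetical and not part of the package; it assumes cell area is roughly proportional to cos(latitude) instead of calling terra::cellSize().

```r
# Hypothetical helper (not part of the package): rescale raw cell areas so
# the finite values average to 1, as described for cell_area_weight.
normalize_cell_weights <- function(area) {
  area / mean(area[is.finite(area)])
}

# Approximate relative areas of equal-angle lonlat cells at several
# latitudes (area is roughly proportional to cos(latitude)), plus one NA cell.
lat <- c(0, 30, 60, 85)
raw_area <- c(cos(lat * pi / 180), NA)
w <- normalize_cell_weights(raw_area)
w                      # equatorial cells weigh more than polar ones; NA stays NA
mean(w, na.rm = TRUE)  # 1, up to floating-point error
sum(w, na.rm = TRUE)   # 4, the number of finite cells
```

Because the finite weights sum to the number of finite cells, a weighted total stays on the same scale as an unweighted count, which is the comparability the normalization is meant to preserve.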
Value
An S3 object of class "analog_index" containing:
The compiled lattice index (internal C++ structure)
Reference data
Metadata: coordinate type, dimensions, ranges, resolution
Diagnostics: bin counts, occupancy statistics, and downsampling info
Details
The lattice index is a multidimensional grid of bins, built over both geographic and climate dimensions. This structure enables efficient analog searches by first filtering and sorting bins of similar points before computing exact results. For lon/lat coordinates, the index works internally in ECEF (Earth-Centered, Earth-Fixed) Cartesian space for better performance.
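The sketch below shows why a Cartesian ECEF representation is convenient for lon/lat data. It assumes a spherical Earth with radius 6371 km; the package's actual transform and radius are not documented here, and lonlat_to_ecef() and chord_to_great_circle() are hypothetical helpers. Straight-line (chord) distance in 3D grows monotonically with great-circle distance, so bins in ECEF space keep points that are close on the globe together, even across the antimeridian.

```r
# Hypothetical helper: map lon/lat (degrees) to ECEF x, y, z (km),
# assuming a spherical Earth of the given radius.
lonlat_to_ecef <- function(lon, lat, radius = 6371) {
  lon_r <- lon * pi / 180
  lat_r <- lat * pi / 180
  cbind(x = radius * cos(lat_r) * cos(lon_r),
        y = radius * cos(lat_r) * sin(lon_r),
        z = radius * sin(lat_r))
}

# Convert a 3D chord length back to great-circle distance on the sphere.
chord_to_great_circle <- function(chord, radius = 6371) {
  2 * radius * asin(chord / (2 * radius))
}

# Two equatorial points on either side of the antimeridian, 1 degree apart
p <- lonlat_to_ecef(c(179.5, -179.5), c(0, 0))
chord <- sqrt(sum((p[1, ] - p[2, ])^2))
chord_to_great_circle(chord)  # about 111.2 km: one degree of arc
```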
Index resolution (index_res) controls the granularity of the lattice
binning across all indexed dimensions. The optimal value depends on your data size and query patterns.
Use tune_index_res() to find the best resolution for your use case,
or accept the default of 16 which works well for many applications.
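To make "bins per dimension" concrete, here is a hypothetical base R sketch of equal-width lattice binning; assign_bins() is illustrative only and is not the package's C++ index. A lattice with index_res bins per dimension has index_res^d cells for d dimensions (e.g. 16^3 = 4096 for x, y and one climate variable), so raising index_res quickly produces many small, sparsely occupied bins.

```r
# Hypothetical equal-width lattice binning (illustrative only; the package's
# C++ index may bin differently). Each column is split into `index_res`
# bins over its observed range; a point's lattice cell is the tuple of its
# per-column bin numbers.
assign_bins <- function(data, index_res = 16) {
  apply(data, 2, function(v) {
    rng <- range(v)
    width <- (rng[2] - rng[1]) / index_res
    if (width == 0) return(rep(1, length(v)))  # constant column: one bin
    b <- floor((v - rng[1]) / width) + 1
    pmin(b, index_res)  # the maximum value falls into the last bin
  })
}

set.seed(1)
pts <- cbind(x = runif(1000), y = runif(1000), clim1 = rnorm(1000))
bins <- assign_bins(pts, index_res = 4)
cell <- apply(bins, 1, paste, collapse = "-")  # cell labels such as "2-4-1"
occupancy <- table(cell)
length(occupancy)   # occupied cells, out of 4^3 = 64 possible
max(occupancy)      # size of the most crowded cell
```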
Downsampling
For very large datasets, downsampling can significantly improve memory usage
and query speed, at the cost of some precision. The downsample parameter controls
the target fraction of the data points in pool that are retained in the index.
Downsampling uses an adaptive stratified approach: densely-packed bins are thinned more
aggressively while sparse bins are preserved, which helps reduce imprecision
in sparse regions compared to fully random sampling. Note: The actual rate may be
higher than requested if maintaining at least one point per occupied bin requires
it (common with sparse data or fine-grained binning); check index$downsample_actual.
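One way to realize the thinning rule described above is a per-bin cap: every occupied bin keeps at most cap points, with cap chosen as the smallest integer (at least 1) that retains the requested fraction. This is an assumed scheme, and stratified_downsample() is a hypothetical helper rather than the package's algorithm. The example also shows how the one-point-per-bin floor can push the realized rate above the request.

```r
# Hypothetical adaptive stratified downsampling (an assumed scheme, not the
# package's algorithm). Each occupied bin keeps min(n_b, cap) points, where
# cap is the smallest integer >= 1 that retains at least `rate` of all points.
stratified_downsample <- function(bin_id, rate) {
  stopifnot(rate > 0, rate <= 1)
  counts <- as.vector(table(bin_id))
  target <- rate * length(bin_id)
  cap <- 1
  # Raise the cap until enough points survive; cap = 1 is the floor, so
  # every occupied bin always keeps at least one point.
  while (sum(pmin(counts, cap)) < target) cap <- cap + 1
  groups <- split(seq_along(bin_id), bin_id)
  keep <- unlist(lapply(groups, function(idx) {
    # Sparse bins (size <= cap) are kept whole; dense bins are thinned.
    if (length(idx) <= cap) idx else sample(idx, cap)
  }), use.names = FALSE)
  list(keep = sort(keep), cap = cap,
       rate_actual = length(keep) / length(bin_id))
}

set.seed(42)
# One crowded bin ("A", 900 points) and ten sparse bins of 10 points each
bin_id <- c(rep("A", 900), rep(paste0("S", 1:10), each = 10))
res <- stratified_downsample(bin_id, rate = 0.1)
res$cap                  # 10
table(bin_id[res$keep])  # every bin keeps 10 points
res$rate_actual          # 0.11, slightly above 0.1 because cap is an integer
stratified_downsample(bin_id, rate = 0.005)$rate_actual
# 0.011: one point in each of the 11 bins already exceeds the 0.5% target
```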
Each remaining analog in the downsampled pool gets a sample_weight indicating
the number of points it represents in the original pool; this weight is the inverse
of the sampling rate in the analog's index bin. For pair queries (stat = "none"),
results include each analog's sample_weight. For aggregation stats (count, sum,
mean, etc.), sampling weights are used internally to automatically correct for
the downsampling bias.
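The weighting correction can be sketched with hand-picked per-bin retention rates: a dense bin thinned to 10% and a sparse bin kept whole. Each weight is the inverse of its bin's retention rate, as described above; the variable names are illustrative and are not package internals.

```r
# Illustrative sketch (names are not package internals): sample weights as
# the inverse of each bin's retention rate, used to undo downsampling bias.
bin_id <- c(rep("A", 900), rep("B", 100))
value  <- c(rep(2, 900), rep(10, 100))  # some per-analog quantity

set.seed(7)
keep <- c(sample(which(bin_id == "A"), 90),  # dense bin A thinned to 10%
          which(bin_id == "B"))              # sparse bin B kept whole

kept_bins <- bin_id[keep]
# Retention rate per bin: points kept / points originally in the bin
rate <- vapply(unique(kept_bins),
               function(b) sum(kept_bins == b) / sum(bin_id == b),
               numeric(1))
sample_weight <- unname(1 / rate[kept_bins])  # 10 for A, 1 for B

sum(sample_weight)                         # 1000, the original pool size
weighted.mean(value[keep], sample_weight)  # 2.8, equal to mean(value)
mean(value[keep])                          # about 6.21: unweighted mean is biased
```

Without the weights, the fully retained sparse bin is over-represented and pulls the naive mean upward; the weighted count and mean recover the full-pool values.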
Examples
if (FALSE) { # \dontrun{
# Build index with default settings
index <- build_analog_index(climate_data)
# Build with explicit resolution
index <- build_analog_index(climate_data, index_res = 20)
# Build with downsampling for large datasets
index <- build_analog_index(
large_climate_data,
index_res = 16,
downsample = 0.1, # Retain roughly 10% of pool points
seed = 123 # Reproducible sampling
)
index$downsample_actual # Realized rate; may exceed 0.1 (see Details)
# Query the index multiple times
v1 <- analog_velocity(sites1, pool = index, max_clim = 0.5)
v2 <- analog_velocity(sites2, pool = index, max_clim = 0.3)
a1 <- analog_availability(sites3, pool = index, max_clim = 0.5, max_geog = 100)
} # }