windscape
windscape.Rmd
Introduction
The windscape
package was designed to help users analyze
the role of wind geography in spatial ecology. Wind strength and
direction vary over space and time, influencing the transport dynamics
of airborne particles like spores, pollen, seeds, and insects, and in
turn shaping emergent biodiversity patterns such as landscape genetics,
community composition, and species distributions.
The functions in windscape
help you to download hourly
wind rasters, convert them into a landscape connectivity graph, use the
graph to estimate directional wind flows among sites of interest, and
test the statistical relationships between these flows and your own
ecological data. Each of these four steps is detailed in its own section
below.
This package differs from other wind connectivity modeling frameworks
like rWind
in that it focuses not on a snapshot of wind
conditions but on connectivity over longer time periods, integrating
over many shorter, step-wise dispersal events that occur over time.
To get started, let’s load the windscape
package, as
well as tidyverse
for its general utility:
Importing wind data
The input data for a windscape analysis is a time series of wind
fields. A single wind field is a pair of raster layers, with values in
each grid cell representing the u
and v
components of the local wind vector at one point in time. Because the
modeling approach is designed to capture temporal variability in wind
conditions, high-frequency time series data should be used; summaries
like monthly, annual, or multi-year mean wind fields will probably give
poor results. “Reanalysis” data sets, such as the CFSR, ERA5, and NARR,
are great data sources as they comprise decades of
hourly gridded wind data that give excellent
representations of wind variability over space and time.
Depending on the study system, one might wish to model wind
connectivity using a relatively short time frame of a few days or weeks,
or to integrate over decades of wind conditions to more fully capture
the variability of wind dynamics that shape ecological patterns over
longer time frames. Users might also want to filter their input wind
data by season, time of day, or temperature, in order to isolate wind
conditions when their study species is likely to be airborne.
windscape
models are built using data for a single
atmospheric layer, and vertical transport is not considered. In this
vignette we’ll use data on near-surface (10 m) winds, but
higher-altitude wind data could also be used.
Wind data can be downloaded from within R, or can be obtained separately and loaded into R. Let’s see an example of both approaches.
The function cfsr_dl()
downloads hourly wind data from
the Climate Forecast System Reanalysis (CFSR). In the code below, we’ll
import 10-m-level data for a chunk of North America for a sparse series
of days distributed across a single summer. A real analysis would likely
want a denser and/or longer time series. This example runs in a few
minutes, but it could take hours or longer to download a more complete
dataset. (Note that because this region is in the western hemisphere and
CFSR uses longitudes in the 0-360 range, some shifty business is
needed.)
wind <- cfsr_dl(variable = "wnd10m", years = 2000, months = 6:9, days = seq(1, 28, 3),
xlim = c(-120, -90) + 360, # shift longitudes to be in [0, 360] range for CFSR
ylim = c(30, 50)) %>%
shift(dx = -360) # shift longitude back to the standard [-180, 180] range
Alternatively we can load wind data locally from disk (a more likely
scenario for an actual analysis), using terra::rast()
. The
data set loaded below, which ships with the windscape
package, is the same one downloaded by the example code above. We can
see it’s a SpatRaster
object with many layers, each a ‘u’
(east-west) or ‘v’ (north-south) component of the local wind vector for
a particular date and time.
wind <- rast(system.file("extdata/cfsr_usa.tif", package = "windscape"))
wind
#> class : SpatRaster
#> dimensions : 64, 96, 960 (nrow, ncol, nlyr)
#> resolution : 0.3157895, 0.3174603 (x, y)
#> extent : -120.1579, -89.84211, 29.84127, 50.15873 (xmin, xmax, ymin, ymax)
#> coord. ref. : lon/lat WGS 84
#> source : cfsr_usa.tif
#> names : u 200~00:00, u 200~00:00, u 200~00:00, u 200~00:00, u 200~00:00, u 200~00:00, ...
#> min values : -9.30, -10.16, -10.20, -10.79, -11.53, -10.94, ...
#> max values : 10.42, 10.30, 10.32, 10.16, 10.01, 9.60, ...
Next we’ll convert these data into a formal wind field time series
object of class wind_series
, which is a
SpatRaster
of wind data meeting certain criteria. To do
this, we need to tell it that our wind
dataset has
order = "uuvv"
, meaning all the u components come first,
followed by all the v components; other data sets might have
order = "uvuv"
. (It’s assumed that the order of the u’s is
the same as the v’s, so we need to be confident of this based on
knowledge of our data.) The wind_series()
function creates
this object:
series <- wind_series(wind, order = "uuvv")
Constructing a connectivity graph
In order to estimate wind flows among sites of interest, we need to
first convert our big wind field time series into a directed
connectivity graph representing the average wind flow between each grid
cell and each of its eight “queen” neighbors. The steps in this process
include summarizing the time series into a wind_rose
raster
object, optionally modifying the wind rose to incorporate non-wind
factors influencing dispersal, and/or to increase its spatial
resolution, and then converting this into a wind_graph
transition object.
1: Creating a wind rose
A wind_rose
object, constructed with the
wind_rose()
function, is a raster data set with eight
layers, each of which represents long-term wind conductance in the
direction of a neighboring cell. These conductance values are a function
of the input data on wind speed and direction, adjusted for the
curvature of the earth, distances to neighboring cells, and potential
differences between x and y raster resolutions.
The main decision when creating a wind rose is the trans
argument, which defines the transformation that turns wind speed into
conductance strength. The default is trans = 1
, which makes
conductance proportional to wind speed – but alternatively, conductance
can be made proportional to aerodynamic drag (speed^2) or to wind force
(speed^3), can account for threshold speeds above which seed abscission
is likely to occur, or can ignore speed entirely and consider only
direction; see ?wind_rose
for details.
After a local wind vector for a particular time step is transformed
from speed to conductance, this conductance is allocated between a
cardinal and semi-cardinal neighbor, in proportion to the angle and
distance between the downwind direction and the directions of the two
adjacent downwind neighbors. Conductance values are calculated for each
individual time step in the wind field data set, and then averaged to
produce the final wind rose object. If input speeds are in m/s and
trans = 1
, then the wind rose conductance values are in
units of 1/hours.
Let’s covert our wind_field_ts
into a
wind_rose
here. Plotting it, we see that is has 8 layers,
one for wind connectivity in each semi-cardinal direction.
2: Incorporationg non-wind dispersal barriers (optional)
In some cases, you may want to account for factors other than wind
that influence connectivity across a landscape, by down-weighting
conductance over cells that contain water, inhospitable terrain, or
other dispersal barriers. To incorporate factors like this, we can use
the weight_conductance()
function to scale the values in
our wind_rose
. Weights are supplied as a raster layer with
values between 0 and 1.
As an example, let’s downweight conductance by 90% in grid cells with
open water. Large water bodies could reduce landscape genetic
connectivity for wind-dispersed terrestrial organisms, since water can’t
support the intermediate populations that help facilitate long-term gene
flow across a landscape. In the example below, we’ll use the
cfsr_dl_land()
function to download a land-water raster
layer corresponding to our CFSR wind data set, classify()
its values to convert 0’s (water) to 0.1 while leaving 1’s (land) as is,
and then use this weights layer to adjust our wind rose. Plotting the
first layer of the modified wind rose, we can see ocean and lake areas
in the southwest and northeast corners where conductance has been
reduced:
3: Adjusting spatial resolution
Next, we have to consider how the spatial resolution of our wind data
relates to the distances among our study sites. The CFSR data we’re
using in this example, like many comparable data sets, has a grid cell
size of around ~30 km. This resolution will work well for estimating
wind conductance among distant sites, but since windscape models
estimate connectivity among the centers of the grid cells where sites
occur (regardless of where in the cell the a site is located), estimates
will be very noisy for sites separated by only a few cells, and will be
impossible for sites in the same cell. We can check whether this is a
problem using the check_cell_distance()
function.
If it is a problem, we can address it by increasing the resolution of
our data set using the downscale()
function, which
interpolates wind conductance data and adjusts the values to maintain
the correct connectivity units. (Note that the
wind_cost_distance()
function, discussed below, can also
make its own internal adjustments for cell-distance discrepancies, so
downscaling is not the only solution.) Note that this downscaling
process does not add more information; it just “smooths” the wind
values. It fixes artifacts of the coarse resolution, but it does not
account for any real fine-scale variation in wind dynamics that actually
occurs within a grid cell due to factors like local terrain.
Importantly, while downscaling increases accuracy for nearby sites,
it comes at a computational cost. Downscaling a wind rose by a factor of
10 will increase its size 100-fold, increasing memory usage and
dramatically increasing processing time for the connectivity
calculations discussed below. For many analyses, a compromise resolution
can be used that balances accuracy and computational tractability. When
this isn’t possible, the vrcd()
function can be used to
circumvent the trade-off; see the documentation for that function for
more details.
As an example, let’s imagine we have a set of ten study sites. Here we’ll randomly generate spatial coordinates for their locations. Then we’ll check how our wind rose resolution will work for these sites:
sites <- cbind(x = runif(10, -115, -95),
y = runif(10, 33, 47))
check_cell_distance(rose, sites)
#> Loading required package: geosphere
#> Total point pairs: 45
#> Point pairs in the same grid cell: 0 (0%)
#> Distribution of cell-point distance discrepancies:
#> 0--1%: 21 (46.7%)
#> 1--2.5%: 15 (33.3%)
#> 2.5--5%: 8 (17.8%)
#> 5--10%: 1 (2.22%)
#> 10--25%: 0 (0%)
#> 25--Inf%: 0 (0%)
It looks like we have no site pairs in the same grid cell, which is great. But we do have one site pair with a distance error of 5-10%, and another 23 pairs with errors of 1-5%. These aren’t huge errors but they’ll add some noise to our wind estimates. Let’s downscale our wind rose by a factor of 5, and then re-check our distance error rates:
rose <- downscale(rose, 5)
check_cell_distance(rose, sites)
#> Total point pairs: 45
#> Point pairs in the same grid cell: 0 (0%)
#> Distribution of cell-point distance discrepancies:
#> 0--1%: 43 (95.6%)
#> 1--2.5%: 2 (4.44%)
#> 2.5--5%: 0 (0%)
#> 5--10%: 0 (0%)
#> 10--25%: 0 (0%)
#> 25--Inf%: 0 (0%)
After downscaling, 96% of our site pairs now have distance
discrepancies of less than 1%, and the other 4% all have discrepancies
of less than 2.5% – probably an acceptable margin of error given all the
other sources of uncertainty in a modeling analysis, and given the
additional corrections used in least_cost_distance()
.
4: Building a wind graph
Next we need to create a wind_graph
object using the
wind_graph()
function. Technically, a wind graph is a
transitionLayer
as defined in the gdistance
package.
The only decision here is whether we want a downwind
or
upwind
connectivity model. These are inversions of the same
idea: the outbound (downwind) conductance from site A to site B is the
same thing as the inbound (upwind) conductance arriving to site B from
site A. The decision is not critical if the goal is to analyze flows
among a set of sites, but if the goal is to create maps of the entire
wind accessibility landscape for a single site, then the difference is
more material. We’ll create a graph of each type here:
downwind <- wind_graph(rose, direction = "downwind")
upwind <- wind_graph(rose, direction = "upwind")
upwind
#> class : wind_graph
#> dimensions : 320, 480, 153600 (nrow, ncol, ncell)
#> resolution : 0.06315789, 0.06349206 (x, y)
#> extent : -120.1579, -89.84211, 29.84127, 50.15873 (xmin, xmax, ymin, ymax)
#> crs : +proj=longlat +datum=WGS84 +no_defs
#> values : resistance
#> matrix class: dgCMatrix
Estimating wind connectivity
The wind_graph
we created above is our final wind
connectivity model, and we can use it in various ways to estimate rates
of wind transport among sites. This package supports two algorithms for
calculating wind flows between sites: “least cost path” (LCP) and
“random walk” (RW). Note that because wind inherently involves
directional flow, wind connectivity has to be represented by a “directed
cyclic” graph, which makes it impossible to use circuit theory-based
algorithms that are commonly used in other landscape connectivity
applications.
The LCP algorithm finds the fastest wind travel route across a landscape, based on the local connectivity between each cell and its neighbors. It is computationally efficient, and it represents the speed at which the first air particles would arrive at site after diffusing across the landscape from a given origin location, given certain assumptions.
The RW method instead runs a stepwise simulation. It is computationally much slower, but it captures the full distribution of particles diffusing in different directions rather than simply the speed of the first, fastest particle to reach a site.
Least cost path
Wind LCP calculations are handled through the
least_cost_surface()
and least_cost_distance()
functions. Both functions calculate the same metric but for different
data structures. For wind models constructed with
trans = 1
, they produce results in units of travel time
between grid cells; otherwise they give relative measures with less
interpretable units.
least_cost_surface()
calculates the least cost distance
between a set of user-defined sites (one or more point locations) and
every grid cell across the region, producing a wall-to-wall raster of
wind cost-distance; if an upwind
model is used then the
output represents travel times from a given grid cell to the focal site,
whereas a downwind
model will calculate travel times from
the focal site to cells across the landscape. Here we’ll calculate both
variants, and make maps of them:
site <- matrix(c(-105, 40), ncol = 2)
downwind_hrs <- least_cost_surface(downwind, site)
upwind_hrs <- least_cost_surface(upwind, site)
# restructure data and plot
d <- c(downwind_hrs, upwind_hrs) %>%
setNames(c("downwind", "upwind")) %>%
as.data.frame(xy = T) %>%
gather(direction, wind_hours, -x, -y)
ggplot(d) +
facet_wrap(~direction) +
geom_raster(aes(x, y, fill = wind_hours)) +
geom_contour(aes(x, y, z = wind_hours), bins = 20, color = "white", linewidth = .25) +
geom_point(data = as.data.frame(site), aes(V1, V2)) +
coord_fixed(ratio = 1.2) +
scale_fill_gradientn(colors = c("yellow", "red", "blue", "black")) +
theme_void() +
theme(legend.position = "top",
strip.text = element_text(size=15))
In contrast, least_cost_distance()
calculates wind
travel times between every pair of sites in a user-specified set of
locations. It returns an asymmetric matrix in which element [i,j]
represents the cost-distance from the i’th to j’th site. Here we’ll
calculate cost-distances among the 10 sites we defined above. These
values represent hours of travel time, but if we had specified the
argument rate = TRUE
, they would represent would represent
the inverse, wind flow rates per hour:
wind_time <- least_cost_distance(downwind, sites)
wind_time[1:5, 1:5]
#> [,1] [,2] [,3] [,4] [,5]
#> [1,] NaN 119.5523 379.1244 720.1041 519.7705
#> [2,] 403.0177 NaN 426.1965 599.9876 689.3432
#> [3,] 1172.0352 818.1965 NaN 539.9783 705.5348
#> [4,] 1404.3221 1057.7030 407.2637 NaN 1073.4969
#> [5,] 1076.2774 1038.5731 232.4783 719.9133 NaN
As discussed later in this document, these results can be used in downstream analyses, such as comparing them to ecological data from the same set of sites to assess the role of wind in structuring spatial biodiversity patterns.
Random walk
RW is implemented via the windscape function
random_walk()
. It involves an iterative computation, with
“particle mass” diffusing from cells to their neighbors at each
iteration, in proportion to local directional wind conductance.
To run a random walk simulation, we need to specify the initial
conditions (the starting distribution of the particle mass), the number
of iterations, and the simulation mode. As an example, let’s start with
one unit of particle mass in a single central site and run a random walk
for 900 iterations, recording the distribution of particle mass every
100 iterations. We’ll use the default mode = "pulse"
, which
models the fleeting diffusion of the particle mass that is initially
present. Plotting the result (with a red marker at the starting
location), we see the cloud of particle mass drifting and spreading as
the walk proceeds:
# create a cropped, non-downscaled wind rose so that example runs quickly
rose <- series %>% wind_rose() %>% crop(ext(.)/2)
# run the RW computation
walk <- random_walk(rose, init = site, iter = 600, record = seq(100, 600, 100))
# plot the results
walk %>%
as.data.frame(xy = T) %>%
gather(layer, value, -x, -y) %>%
mutate(layer = paste0(layer, " (", round(as.integer(str_remove(layer, "iter")) * iter_length(walk)), " hours", ")"),
layer = factor(layer, levels = unique(layer))) %>%
ggplot(aes(x, y, fill = value)) +
geom_raster() +
annotate(geom = "point", x = site[1], y = site[2], color = "red", size = .5) +
facet_wrap(~layer, nrow = 2) +
scale_fill_viridis_c(trans = "sqrt") +
theme_minimal() +
labs(fill = "mass", x = NULL, y = NULL)
The amount of simulated time represented by each iteration will vary
depending on the data; it is computed internally to maximize the ratio
of simulation time to computational processing time, and can be accessed
by calling iter_length()
on the random walk output. In our
case, one iteration is 1.82 hours, so our full simulation represents
1094 hours. (It only represents actual clock time if you use a
conductance value of trans = 1
when creating the wind rose;
otherwise, it represents a relative measure of diffusion speed with less
interpretable units.)
Our simulation tracked the fleeting diffusion of the initial “pulse”
of particle mass, most of which will eventually leave the modeling
domain after enough iterations. Alternatively, we could have specified
mode = "ratchet"
to run a propagating simulation in which
local particle mass never declines, continuing to transmit mass at the
cumulative maximum rate. This is equivalent to simulating continued
input of the initial particle mass at every iteration. Compared to the
“pulse” mode, this option may be more useful in modeling a biological
process where particles are continually released or can propagate
locally after colonization.
Testing statistical relationships
The windscape
library also provides a set of utilities
for testing statistical relationships between wind connectivity results
and data representing other ecological features of the same landscape.
These functions are intended for working with pairwise wind connectivity
estimates among a set of sites, such as the wind_time
data
we generated in the “least cost path” example above.
We’ll work with those results here. Let’s imagine the ten sites represent the locations of populations where landscape genetic data were sampled. There are various measures of landscape genetic relationships among sites we could calculate using other software, including estimates of directional gene flow, genetic differentiation, and genetic diversity. Let’s look at how to test relationships between wind travel times and each of these genetic patterns.
Statistical significance is calculated based on the widely-used
partial Mantel test. The windscape library provides a
mantel_test()
function that allows distance matrices with
different upper and lower triangles, and allows multiple control
variables, a combination of features that is key for our purposes and
isn’t possible in Mantel test implementations in other R packages. The
pairwise_means()
and pairwise_ratios()
functions refactor the data to help test different hypotheses.
In the example below, we’ll test how wind flow, wind speed, and wind asymmetry relate to a hypothetical genetic data set (see here for details on these hypotheses). Because the genetic data are generated randomly in this example, the results are not significant. Here we’ll use geographic distances as a control variable; in other cases, you might want to use variables like environmental differences or non-wind dispersal influences as controls.
# pairwise geographic distance between sites,
# for use as a control variable in the partial Mantel tests
distance <- point_distance(sites)
# test correlation between wind flow and gene flow, controlling for distance
n <- nrow(sites)
gene_flow <- matrix(runif(n^2), n) # simulate random gene flow data
r <- mantel_test(wind_time, gene_flow, z = list(distance))
# test correlation between bidirectional wind connectivity and genetic isolation,
# controlling for distance
gene_dist <- matrix(runif(n^2), n) # simulate random genetic differentiation data
wind_conn <- pairwise_means(wind_time) # convert to symmetric matrix
r <- mantel_test(wind_conn, gene_dist, z = list(distance))
# test correlation between wind asymmetry and gene flow asymmetry
# (no distance control is needed for a test of reciprocally symmetric matrices)
wind_asym <- pairwise_ratios(wind_time) # convert to asymmetry matrix
gene_asym <- pairwise_ratios(gene_flow) # convert to asymmetry matrix
r <- mantel_test(wind_asym, gene_asym)
# test correlation between wind asymmetry and genetic diversity asymmetry
diversity <- runif(n) # simulated diversity for each population
div_asym <- pairwise_ratios(diversity) # pairwise diversity ratios
r <- mantel_test(wind_asym, div_asym)
str(r)
#> List of 4
#> $ stat : num 0.219
#> $ quantile: num 0.776
#> $ p.value : num 0.448
#> $ perm : num [1:999] -0.292 -0.273 0.505 -0.239 -0.188 ...