Bin quantitative data into strata
Usage
stratify(
  x,
  breaks = NULL,
  n_strata = 5,
  transform = identity,
  offset = 0,
  zero_stratum = FALSE
)
Arguments
- x
A matrix or vector containing non-negative values.
- breaks
Numeric vector of stratum breakpoints.
- n_strata
Integer giving the number of strata to split the data into. Must be 2 or greater. Larger values yield randomizations with less mixing but higher fidelity to the original marginal distributions. Default is 5. Ignored unless breaks = NULL.
- transform
A function used to transform the values in x before assigning them to n_strata equal-width intervals. Examples include sqrt, log, rank (for equal-occupancy strata), etc.; the default is identity. If zero_stratum = TRUE, the transformation is applied only to nonzero values. The function should pass NA values through. This argument is ignored unless breaks = NULL.
- offset
Numeric value between -1 and 1 (default 0) indicating how much to shift stratum breakpoints relative to the bin width (applied during quantization as: breaks <- breaks + offset * bw). To assess sensitivity to stratum boundaries, run quantize() multiple times with different offset values. Ignored unless breaks = NULL.
- zero_stratum
Logical indicating whether to segregate zeros into their own stratum. If FALSE (the default), zeros will likely be combined into a stratum that also includes small positive numbers. If breaks is specified, zero simply gets added as an additional break; if not, one of the n_strata will represent zeros and the others will be nonzero ranges.
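The interaction of transform, n_strata, and offset can be illustrated with a minimal base-R sketch. This is not the package's implementation: sketch_stratify is a hypothetical helper, and it assumes that breakpoints are spaced evenly across the range of the transformed values and that intervals are closed on the right. Only the offset step (breaks + offset * bw) is taken directly from the argument description.

```r
# Hypothetical helper (an assumption, not the package's code): bin values
# into n_strata equal-width intervals on a transformed scale.
sketch_stratify <- function(x, n_strata = 5, transform = identity, offset = 0) {
  tx <- transform(x)
  rng <- range(tx, na.rm = TRUE)
  bw <- diff(rng) / n_strata                     # bin width on the transformed scale
  breaks <- rng[1] + bw * seq_len(n_strata - 1)  # interior breakpoints
  breaks <- breaks + offset * bw                 # shift by a fraction of the bin width
  # left.open = TRUE gives right-closed intervals (lower, upper] (assumed)
  findInterval(tx, breaks, left.open = TRUE) + 1L
}

y <- c(0.1, 0.5, 1.2, 3.4, 5.6, 10.2, 20, 50)

# Equal-width bins on the raw scale crowd most values into stratum 1
s_identity <- sketch_stratify(y, n_strata = 4)                  # 1 1 1 1 1 1 2 4

# rank() makes the transformed values evenly spaced, so occupancy is equal
s_rank <- sketch_stratify(y, n_strata = 4, transform = rank)    # 1 1 2 2 3 3 4 4

# Shifting breakpoints down by half a bin width moves boundary cases
s_shift <- sketch_stratify(y, n_strata = 4, offset = -0.5)      # 1 1 1 1 1 2 3 4
```

Comparing s_identity with s_shift shows the kind of boundary sensitivity that varying offset is meant to expose: the values 10.2 and 20 change stratum while the rest stay put.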
Examples
# Stratify a numeric vector
x <- c(0, 0, 0.1, 0.5, 1.2, 3.4, 5.6, 10.2)
stratify(x, n_strata = 3)
#> [1] 1 1 1 1 1 1 2 3
#> attr(,"breaks")
#> [1] -Inf 3.4 6.8 Inf
# With transformation
stratify(x, n_strata = 3, transform = log1p)
#> [1] 1 1 1 1 1 2 3 3
#> attr(,"breaks")
#> [1] -Inf 1.2 3.4 Inf
# Separate zero stratum
stratify(x, n_strata = 3, zero_stratum = TRUE)
#> [1] 1 1 2 2 2 2 3 3
#> attr(,"breaks")
#> [1] -Inf 0.00 5.15 Inf
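The zero-stratum behaviour when breaks = NULL can be sketched in base R as follows. The helper sketch_zero_stratum is hypothetical, and the rule it uses (zeros form the first stratum, nonzero values are split into the remaining n_strata - 1 equal-width intervals over their own range, with right-closed intervals) is an assumption inferred from the argument description, not the package's code. For this input it yields the same assignment and the same 5.15 breakpoint as the zero_stratum example above.

```r
# Hypothetical helper (an assumption, not the package's code): zeros get
# stratum 1; nonzero values share the remaining n_strata - 1 strata.
sketch_zero_stratum <- function(x, n_strata = 3) {
  nz <- x[!is.na(x) & x != 0]
  rng <- range(nz)
  bw <- diff(rng) / (n_strata - 1)             # bin width over nonzero values only
  inner <- rng[1] + bw * seq_len(n_strata - 2) # interior breaks among nonzero values
  breaks <- c(0, inner)                        # zero acts as its own breakpoint
  # left.open = TRUE gives right-closed intervals, so x = 0 falls in stratum 1
  findInterval(x, breaks, left.open = TRUE) + 1L
}

x <- c(0, 0, 0.1, 0.5, 1.2, 3.4, 5.6, 10.2)
z <- sketch_zero_stratum(x, n_strata = 3)
z
```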
