Declare sampling procedure

declare_sampling(..., handler = sampling_handler, label = NULL)

sampling_handler(data, ..., sampling_variable = "S", drop_nonsampled = TRUE)

Arguments

...

arguments to be captured, and later passed to the handler

handler

a tidy-in, tidy-out function

label

a string describing the step

data

A data.frame.

sampling_variable

The prefix for the sampling inclusion probability variable.

drop_nonsampled

Logical indicating whether to drop units that are not sampled. Default is TRUE.

Value

A sampling declaration, which is a function that takes a data.frame as an argument and returns a data.frame subsetted to sampled observations and (optionally) augmented with inclusion probabilities and other quantities.

Details

declare_sampling can work with any sampling function that takes data and returns data. The default handler, sampling_handler, draws the sample with draw_rs from the randomizr package, which allows quick declaration of many sampling schemes that involve strata and clusters.
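As a rough illustration of the "data in, data out" contract, the base R sketch below defines a hypothetical handler, srs_handler, that draws a simple random sample without replacement and appends an inclusion probability. The function name, the toy population, and the choice to call the new column S_inclusion_prob are assumptions made for this example, not part of the package.

```r
# Hypothetical handler (not part of DeclareDesign): takes a data.frame,
# returns the sampled rows plus an inclusion probability column.
srs_handler <- function(data, n = 20) {
  stopifnot(is.data.frame(data), n <= nrow(data))
  keep <- sort(sample(nrow(data), n))        # n distinct row indices
  out <- data[keep, , drop = FALSE]          # subset to sampled units
  out$S_inclusion_prob <- n / nrow(data)     # same probability for every unit
  out
}

# Toy population, made up for illustration
population <- data.frame(ID = sprintf("%03d", 1:100), U = rnorm(100))
sampled <- srs_handler(population, n = 20)
```

A function of this shape could then be supplied through the handler argument, in the same way as the custom function in the Examples.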

The arguments to draw_rs can include N, strata, clusters, n, prob, strata_n, and strata_prob. The arguments you need to specify are different for different designs.
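To make the argument combinations more concrete, the sketch below calls draw_rs directly on toy inputs. It assumes the randomizr package is installed; the strata and clusters vectors and the sample sizes are made up for illustration.

```r
library(randomizr)

# Simple random sampling: 50 of 100 units
S_simple <- draw_rs(N = 100, n = 50)

# Stratified sampling: 10 units from stratum "a", 20 from stratum "b"
strata <- rep(c("a", "b"), times = c(40, 60))
S_strata <- draw_rs(strata = strata, strata_n = c(10, 20))

# Cluster sampling: 5 of 10 clusters, each containing 10 units
clusters <- rep(1:10, each = 10)
S_cluster <- draw_rs(clusters = clusters, n = 5)
```

Each call returns a 0/1 vector marking the sampled units. When the same arguments are passed to declare_sampling, the resulting step subsets the data to the sampled rows.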

Note that declare_sampling works similarly to declare_assignment. A key difference is that declare_sampling functions subset data to sampled units, rather than simply appending an indicator for membership of a sample (as assignment does). If you need to sample but keep the full dataset, use declare_assignment and define further steps (such as estimation) with respect to subsets defined by the assignment.
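The distinction can be seen in a small base R sketch that does not use the package itself. The toy population, the seed, and the indicator name Z are assumptions chosen for illustration.

```r
set.seed(42)  # arbitrary seed, only for reproducibility
population <- data.frame(ID = 1:10, U = rnorm(10))

# Draw a 0/1 indicator marking 5 of the 10 units
indicator <- rep(0, nrow(population))
indicator[sample(nrow(population), 5)] <- 1

# Sampling-style result: keep only the selected rows
sampled <- population[indicator == 1, , drop = FALSE]

# Assignment-style result: keep every row and append the indicator
assigned <- cbind(population, Z = indicator)

nrow(sampled)   # 5
nrow(assigned)  # 10
```

Later steps that need the non-selected units are only possible with the second form, which is why keeping the full dataset calls for an assignment-style step.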

For details, see the help files for complete_rs, strata_rs, cluster_rs, or strata_and_cluster_rs.

Examples

# Simple random sampling
design <- declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N)) +
  declare_potential_outcomes(Y ~ 0.5 * Z + 0.2 * female + 0.1 * Z * female + U)

design_with_sampling <- design + declare_sampling(n = 50)

nrow(draw_data(design))
#> [1] 100
nrow(draw_data(design_with_sampling))
#> [1] 50
# Stratified random sampling
design + declare_sampling(strata = female)
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#>  N_missing N_unique     class
#>          0      100 character
#> 
#> Added variable: female
#>     0    1
#>    55   45
#>  0.55 0.45
#> 
#> Added variable: U
#>    min median  mean  max sd N_missing N_unique
#>  -3.03  -0.06 -0.08 2.62  1         0      100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ 0.5 * Z + 0.2 * female + 0.1 * Z * female + U)
#> 
#> Formula: Y ~ 0.5 * Z + 0.2 * female + 0.1 * Z * female + U
#> 
#> Added variable: Y_Z_0
#>    min median mean  max   sd N_missing N_unique
#>  -3.03   0.04 0.01 2.62 1.01         0      100
#> 
#> Added variable: Y_Z_1
#>    min median mean  max   sd N_missing N_unique
#>  -2.53    0.6 0.55 3.12 1.01         0      100
#> 
#> Step 3 (sampling): declare_sampling(strata = female) ---------------------------
#> 
#> N = 50 (50 subtracted)
#> 
#> Added variable: S_inclusion_prob
#>   0.5
#>    50
#>  1.00
#> 
#> Altered variable: ID
#>   Before:
#>  N_missing N_unique     class
#>          0      100 character
#> 
#>   After:
#>  N_missing N_unique     class
#>          0       50 character
#> 
#> Altered variable: female
#>   Before:
#>     0    1
#>    55   45
#>  0.55 0.45
#> 
#>   After:
#>     0    1
#>    27   23
#>  0.54 0.46
#> 
#> Altered variable: U
#>   Before:
#>    min median  mean  max sd N_missing N_unique
#>  -3.03  -0.06 -0.08 2.62  1         0      100
#> 
#>   After:
#>    min median mean  max   sd N_missing N_unique
#>  -2.32  -0.03 0.01 2.62 1.01         0       50
#> 
#> Altered variable: Y_Z_0
#>   Before:
#>    min median mean  max   sd N_missing N_unique
#>  -3.03   0.04 0.01 2.62 1.01         0      100
#> 
#>   After:
#>    min median mean  max sd N_missing N_unique
#>  -2.12   0.08 0.11 2.62  1         0       50
#> 
#> Altered variable: Y_Z_1
#>   Before:
#>    min median mean  max   sd N_missing N_unique
#>  -2.53    0.6 0.55 3.12 1.01         0      100
#> 
#>   After:
#>    min median mean  max sd N_missing N_unique
#>  -1.52   0.63 0.65 3.12  1         0       50
#> 
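# Custom random sampling functions
# Note: sample(n, n, replace = TRUE) draws row indices from 1:n with
# replacement, so the sample is taken from the first n rows of data
# and may contain duplicate rows.
my_sampling_function <- function(data, n = 20) {
  data[sample(n, n, replace = TRUE), , drop = FALSE]
}

design + declare_sampling(handler = my_sampling_function)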
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#>  N_missing N_unique     class
#>          0      100 character
#> 
#> Added variable: female
#>     0    1
#>    56   44
#>  0.56 0.44
#> 
#> Added variable: U
#>    min median mean  max   sd N_missing N_unique
#>  -2.42   0.11 0.16 2.09 0.86         0      100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ 0.5 * Z + 0.2 * female + 0.1 * Z * female + U)
#> 
#> Formula: Y ~ 0.5 * Z + 0.2 * female + 0.1 * Z * female + U
#> 
#> Added variable: Y_Z_0
#>    min median mean  max   sd N_missing N_unique
#>  -2.22   0.23 0.25 2.13 0.89         0      100
#> 
#> Added variable: Y_Z_1
#>    min median mean  max   sd N_missing N_unique
#>  -1.62   0.75 0.79 2.73 0.91         0      100
#> 
#> Step 3 (sampling): declare_sampling(handler = my_sampling_function) ------------
#> 
#> N = 20 (80 subtracted)
#> 
#> Altered variable: ID
#>   Before:
#>  N_missing N_unique     class
#>          0      100 character
#> 
#>   After:
#>  N_missing N_unique     class
#>          0       13 character
#> 
#> Altered variable: female
#>   Before:
#>     0    1
#>    56   44
#>  0.56 0.44
#> 
#>   After:
#>     0    1
#>    11    9
#>  0.55 0.45
#> 
#> Altered variable: U
#>   Before:
#>    min median mean  max   sd N_missing N_unique
#>  -2.42   0.11 0.16 2.09 0.86         0      100
#> 
#>   After:
#>    min median mean  max   sd N_missing N_unique
#>  -1.19   0.62 0.62 1.84 0.99         0       13
#> 
#> Altered variable: Y_Z_0
#>   Before:
#>    min median mean  max   sd N_missing N_unique
#>  -2.22   0.23 0.25 2.13 0.89         0      100
#> 
#>   After:
#>    min median mean  max   sd N_missing N_unique
#>  -1.19   0.72 0.71 2.04 1.06         0       13
#> 
#> Altered variable: Y_Z_1
#>   Before:
#>    min median mean  max   sd N_missing N_unique
#>  -1.62   0.75 0.79 2.73 0.91         0      100
#> 
#>   After:
#>    min median mean  max  sd N_missing N_unique
#>  -0.69   1.27 1.26 2.64 1.1         0       13
#> 