Declare Design

declare_design(...)

Arguments

...

A set of steps in a research design, beginning with a data.frame representing the population or a function that draws the population. Steps are evaluated sequentially. With the exception of the first step, every step must be a function that takes a data.frame as an argument and returns a data.frame. Typically, most steps are declared using the declare_ functions, i.e., declare_population, declare_potential_outcomes, declare_sampling, declare_estimand, declare_assignment, and declare_estimator. Functions from the dplyr package, such as mutate, can also be usefully included.
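The step contract described above (a data.frame goes in, a data.frame comes out, and steps run in order) can be sketched in base R without the package. This is only an illustration of the idea, not how declare_design is implemented; the names add_noise_sq, take_sample, and run_steps are hypothetical.

```r
# Hypothetical illustration: every step after the first takes a data.frame
# and returns a data.frame.
set.seed(1)
population <- data.frame(ID = 1:500, noise = rnorm(500))

# A transformation step, similar in spirit to dplyr::mutate(noise_sq = noise^2).
add_noise_sq <- function(data) {
  data$noise_sq <- data$noise^2
  data
}

# A sampling step that keeps 250 randomly chosen rows.
take_sample <- function(data) {
  data[sample(nrow(data), 250), , drop = FALSE]
}

# Evaluate the steps sequentially, feeding each result into the next step.
run_steps <- function(first, steps) {
  Reduce(function(data, step) step(data), steps, first)
}

result <- run_steps(population, list(add_noise_sq, take_sample))
dim(result)
```

Because each step only sees the data.frame handed to it, reordering the list changes what each step operates on.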

Value

A list of two functions, the design_function and the data_function. The design_function runs the design once, i.e., it draws the data and calculates any estimates and estimands defined in ..., which are returned separately as two data.frames. The data_function also runs the design once, but returns only the final data.
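The shape of this return value can be mimicked with a small closure in base R. The sketch below is an assumption-laden illustration only: make_design, the element names estimands_df and estimates_df, and the stand-in estimator are all hypothetical and are not taken from the package.

```r
# Hypothetical sketch; make_design and the element names of the returned
# lists are illustrative assumptions, not the package's internals.
make_design <- function(draw_population, estimand, estimator) {
  # Runs the design once and returns only the data.
  data_function <- function() draw_population()
  # Runs the design once and returns estimands and estimates separately.
  design_function <- function() {
    data <- draw_population()
    list(estimands_df = estimand(data), estimates_df = estimator(data))
  }
  list(design_function = design_function, data_function = data_function)
}

set.seed(7)
sketch <- make_design(
  draw_population = function() {
    data.frame(Y_Z_0 = rnorm(100), Y_Z_1 = rnorm(100, mean = 2))
  },
  estimand = function(data) {
    data.frame(estimand_label = "ATE",
               estimand = mean(data$Y_Z_1 - data$Y_Z_0))
  },
  estimator = function(data) {
    # Stand-in "estimator": the difference of the two column means.
    data.frame(estimator_label = "diff_means",
               est = mean(data$Y_Z_1) - mean(data$Y_Z_0))
  }
)

one_run <- sketch$design_function()   # two data.frames, returned separately
final_data <- sketch$data_function()  # only the data
```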

Details

Users can supply three kinds of functions to declare_design:

1. Data generating functions. These include population, assignment, and sampling functions.

2. Estimand functions.

3. Estimator functions.

The location of the estimand and estimator functions in the chain of functions determines *when* the values of the estimand and estimator are calculated. This allows users to, for example, differentiate between a population average treatment effect and a sample average treatment effect by placing the estimand function before or after sampling.
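To see why placement matters, the following base-R sketch computes the same quantity, mean(Y_Z_1 - Y_Z_0), once on the full population and once after sampling. It does not use the package; the helper ate and the objects pate, sate, and sampled are hypothetical names chosen for illustration.

```r
# Hypothetical illustration of estimand placement, written in base R.
set.seed(42)
N <- 500
population <- data.frame(noise = rnorm(N))
population$Y_Z_0 <- population$noise
population$Y_Z_1 <- population$noise + rnorm(N, mean = 2, sd = 2)

# The same estimand function, applied to whatever data it is handed.
ate <- function(data) mean(data$Y_Z_1 - data$Y_Z_0)

# Evaluated before sampling: a population average treatment effect.
pate <- ate(population)

# Evaluated after sampling: a sample average treatment effect.
sampled <- population[sample(N, 250), , drop = FALSE]
sate <- ate(sampled)

c(PATE = pate, SATE = sate)
```

The two values generally differ because the second is computed only on the 250 sampled rows.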

Designs declared with declare_design can be investigated with a series of post-declaration commands, such as draw_data, get_estimands, get_estimates, and diagnose_design.

The print and summary methods for a design object return some helpful descriptions of the steps in your research design. If randomizr functions are used for any assignment or sampling steps, additional details about those steps are provided.

Examples

my_population <- declare_population(N = 500, noise = rnorm(N))

my_potential_outcomes <- declare_potential_outcomes(
  Y_Z_0 = noise,
  Y_Z_1 = noise + rnorm(N, mean = 2, sd = 2))

my_sampling <- declare_sampling(n = 250)

my_assignment <- declare_assignment(m = 25)

my_estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

my_estimator <- declare_estimator(Y ~ Z, estimand = my_estimand)

design <- declare_design(my_population,
                         my_potential_outcomes,
                         my_sampling,
                         my_estimand,
                         dplyr::mutate(noise_sq = noise^2),
                         my_assignment,
                         reveal_outcomes,
                         my_estimator)

design
#>
#> Design Summary
#>
#> Step 1 (population): declare_population(N = 500, noise = rnorm(N)) -------------
#>
#> N = 500
#>
#> Added variable: ID
#>  N_missing N_unique     class
#>          0      500 character
#>
#> Added variable: noise
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#>
#> Step 2 (potential outcomes): declare_potential_outcomes(Y_Z_0 = noise, Y_Z_1 = noise + rnorm(N, mean = 2, sd = 2))
#>
#> Added variable: Y_Z_0
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#>
#> Added variable: Y_Z_1
#>    min median mean max   sd N_missing N_unique
#>  -4.44   1.78 1.81 8.1 2.29         0      500
#>
#> Step 3 (sampling): declare_sampling(n = 250) -----------------------------------
#>
#> N = 250 (250 subtracted)
#>
#> Added variable: S_inclusion_prob
#>   0.5
#>   250
#>  1.00
#>
#> Altered variable: ID
#>   Before:
#>  N_missing N_unique     class
#>          0      500 character
#>
#>   After:
#>  N_missing N_unique     class
#>          0      250 character
#>
#> Altered variable: noise
#>   Before:
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#>
#>   After:
#>    min median  mean  max sd N_missing N_unique
#>  -2.81  -0.05 -0.04 2.65  1         0      250
#>
#> Altered variable: Y_Z_0
#>   Before:
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#>
#>   After:
#>    min median  mean  max sd N_missing N_unique
#>  -2.81  -0.05 -0.04 2.65  1         0      250
#>
#> Altered variable: Y_Z_1
#>   Before:
#>    min median mean max   sd N_missing N_unique
#>  -4.44   1.78 1.81 8.1 2.29         0      500
#>
#>   After:
#>    min median mean  max   sd N_missing N_unique
#>  -4.11   1.71 1.63 7.11 2.29         0      250
#>
#> Step 4 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#>
#> A single draw of the estimand:
#>  estimand_label estimand
#>             ATE 1.669875
#>
#> Step 5 (wrapped): ~dplyr::mutate(noise_sq = noise^2) ---------------------------
#>
#> Added variable: noise_sq
#>  min median mean  max   sd N_missing N_unique
#>    0   0.47 0.99 7.88 1.38         0      250
#>
#> Step 6 (assignment): declare_assignment(m = 25) --------------------------------
#>
#> Added variable: Z
#>     0    1
#>   225   25
#>  0.90 0.10
#>
#> Added variable: Z_cond_prob
#>   0.1  0.9
#>    25  225
#>  0.10 0.90
#>
#> Step 7 (reveal outcomes): reveal_outcomes() ------------------------------------
#>
#> Added variable: Y
#>    min median mean  max   sd N_missing N_unique
#>  -3.77   0.01 0.06 5.64 1.23         0      250
#>
#> Step 8 (estimator): declare_estimator(Y ~ Z, estimand = my_estimand) -----------
#>
#> Formula: Y ~ Z
#>
#> A single draw of the estimator:
#>  estimator_label coefficient_name       est        se          p    ci_lower
#>     my_estimator                Z 0.9070776 0.4666258 0.06322904 -0.05389551
#>  ci_upper estimand_label
#>  1.868051            ATE
#>
df <- draw_data(design)

estimates <- get_estimates(design)
estimands <- get_estimands(design)

# NOT RUN {
diagnosis <- diagnose_design(design)
summary(diagnosis)
# }