Declare Design

declare_design(...)

Arguments

...

A set of steps in a research design, beginning with a data.frame representing the population or a function that draws the population. Steps are evaluated sequentially. With the exception of the first step, all steps must be functions that take a data.frame as an argument and return a data.frame. Typically, many steps are declared using the declare_ functions, i.e., declare_population, declare_sampling, declare_potential_outcomes, declare_estimand, declare_assignment, and declare_estimator. Functions from the dplyr package, such as mutate, can also be usefully included.
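The sequential contract described above can be illustrated in base R. The sketch below is not how declare_design is implemented; the names step_population, step_sampling, step_transform, and run_steps are hypothetical. It only shows the idea that the first step produces a data.frame and every later step maps a data.frame to a data.frame.

```r
# Hypothetical first step: a function that draws the population.
step_population <- function() {
  data.frame(ID = 1:500, noise = rnorm(500))
}

# Hypothetical later steps: each takes a data.frame and returns a data.frame.
step_sampling <- function(data) {
  data[sample(nrow(data), 250), , drop = FALSE]
}
step_transform <- function(data) {
  data$noise_sq <- data$noise^2
  data
}

# Evaluate the steps in order, feeding each result into the next step.
run_steps <- function(first_step, ...) {
  Reduce(function(data, step) step(data), list(...), first_step())
}

set.seed(1)
final_data <- run_steps(step_population, step_sampling, step_transform)
dim(final_data)
```

Because every later step has the same data.frame-in, data.frame-out shape, steps can be reordered or swapped without changing the runner.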

Value

A list of two functions, the design_function and the data_function. The design_function runs the design once, i.e., draws the data and calculates any estimates and estimands defined in ..., returned separately as two data.frames. The data_function also runs the design once but returns only the final data.

Details

Users can supply three kinds of functions to declare_design:

1. Data generating functions. These include population, assignment, and sampling functions.

2. Estimand functions.

3. Estimator functions.

The location of the estimand and estimator functions in the chain of functions determines *when* the values of the estimand and estimator are calculated. This allows users to, for example, differentiate between a population average treatment effect and a sample average treatment effect by placing the estimand function before or after sampling.
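To see why position matters, the following base R sketch evaluates the same estimand expression before and after a sampling step. It does not use declare_design; the object names (population, sampled, pate, sate) are hypothetical, and the data-generating process simply mirrors the one in the Examples.

```r
set.seed(42)

# Hypothetical population with both potential outcomes, mirroring the Examples.
N <- 500
noise <- rnorm(N)
population <- data.frame(
  Y_Z_0 = noise,
  Y_Z_1 = noise + rnorm(N, mean = 2, sd = 2)
)

# Estimand evaluated before sampling: averages over all 500 units (population ATE).
pate <- mean(population$Y_Z_1 - population$Y_Z_0)

# Sampling step: keep 250 units at random.
sampled <- population[sample(N, 250), ]

# Same estimand evaluated after sampling: averages over the 250 sampled units (sample ATE).
sate <- mean(sampled$Y_Z_1 - sampled$Y_Z_0)

c(PATE = pate, SATE = sate)
```

The expression mean(Y_Z_1 - Y_Z_0) is identical in both cases; only the data it sees differs, which is what placing the estimand step before or after sampling controls.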

Designs declared with declare_design can be investigated with a series of post-declaration commands, such as draw_data, get_estimands, get_estimates, and diagnose_design.

The print and summary methods for a design object return some helpful descriptions of the steps in your research design. If randomizr functions are used for any assignment or sampling steps, additional details about those steps are provided.

Examples

my_population <- declare_population(N = 500, noise = rnorm(N))

my_potential_outcomes <- declare_potential_outcomes(
  Y_Z_0 = noise,
  Y_Z_1 = noise + rnorm(N, mean = 2, sd = 2))

my_sampling <- declare_sampling(n = 250)

my_assignment <- declare_assignment(m = 25)

my_estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

my_estimator <- declare_estimator(Y ~ Z, estimand = my_estimand)

design <- declare_design(my_population,
                         my_potential_outcomes,
                         my_sampling,
                         my_estimand,
                         dplyr::mutate(noise_sq = noise^2),
                         my_assignment,
                         reveal_outcomes,
                         my_estimator)

design
#> 
#> Design Summary
#> 
#> Step 1 (population): my_population ---------------------------------------------
#> 
#> Added variable: ID
#>  N_missing N_unique
#>          0      500
#> 
#> Added variable: noise
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#> 
#> Step 2 (potential outcomes): my_potential_outcomes -----------------------------
#> 
#> Added variable: Y_Z_0
#>    min median  mean  max   sd N_missing N_unique
#>  -2.94  -0.07 -0.03 2.65 0.99         0      500
#> 
#> Added variable: Y_Z_1
#>    min median mean  max   sd N_missing N_unique
#>  -4.44   1.78 1.81 8.10 2.29         0      500
#> 
#> Step 3 (sampling): my_sampling -------------------------------------------------
#> 
#> 
#> Random sampling procedure: Complete random sampling
#> Number of units: 500
#> The inclusion probabilities are constant across units.
#> 
#> Added variable: S_inclusion_prob
#>              0.5
#> Frequency    250
#> Proportion  1.00
#> 
#> Step 4 (estimand): my_estimand -------------------------------------------------
#> 
#> A single draw of the estimand:
#>  estimand_label estimand
#>             ATE 1.669875
#> 
#> Step 5 (custom data modification): dplyr::mutate(noise_sq = noise^2) -----------
#> 
#> Added variable: noise_sq
#>   min median mean  max   sd N_missing N_unique
#>  0.00   0.47 0.99 7.88 1.38         0      250
#> 
#> Step 6 (assignment): my_assignment ---------------------------------------------
#> 
#> 
#> Random assignment procedure: Complete random assignment
#> Number of units: 250
#> Number of treatment arms: 2
#> The possible treatment categories are 0 and 1.
#> The probabilities of assignment are constant across units.
#> 
#> Added variable: Z
#>               0    1
#> Frequency   225   25
#> Proportion 0.90 0.10
#> 
#> Added variable: Z_cond_prob
#>             0.1  0.9
#> Frequency    25  225
#> Proportion 0.10 0.90
#> 
#> Step 7 (reveal outcomes): reveal_outcomes --------------------------------------
#> 
#> Added variable: Y
#>    min median mean  max   sd N_missing N_unique
#>  -2.81   0.02 0.16 5.79 1.37         0      250
#> 
#> Step 8 (estimator): my_estimator -----------------------------------------------
#> 
#> A single draw of the estimator:
#>  estimator_label      est        se            p ci_lower ci_upper  df
#>     my_estimator 2.061241 0.5111964 7.358847e-05 1.054401 3.068081 248
#>  estimand_label
#>             ATE
#> 
df <- draw_data(design)

estimates <- get_estimates(design)
estimands <- get_estimands(design)

diagnosis <- diagnose_design(design)

summary(diagnosis)
#> 
#> Research design diagnosis
#> 
#>  Estimand Label Estimator Label       Bias      Rmse Power Coverage
#>             ATE    my_estimator 0.01376767 0.4250835 0.994    0.952
#>  Mean Estimate Sd Estimate Type S Rate Mean Estimand
#>       2.007359   0.4464943           0      1.993592
#> 