Declare Design

declare_design(..., title = NULL, authors = NULL, description = NULL,
  citation = NULL)

Arguments

...

A set of steps in a research design, beginning with a data.frame representing the population or a function that draws the population. Steps are evaluated sequentially. With the exception of the first step, all steps must be functions that take a data.frame as an argument and return a data.frame. Typically, many steps are declared using the declare_ functions, i.e., declare_population, declare_sampling, declare_potential_outcomes, declare_estimand, declare_assignment, and declare_estimator. Functions from the dplyr package, such as mutate, can also be usefully included.

title

(optional) The title of the study, as a character string.

authors

(optional) The authors of the study, as a character string.

description

(optional) A description of the design in words, as a character string, stored alongside the declaration in code.

citation

(optional) The preferred citation for the design, as a character string. Either include the full citation in text, or paste a BibTeX entry. If title and authors are specified and you leave citation empty, a BibTeX entry will be created automatically.
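
The contract described for the ... argument (a starting data.frame, followed by functions that each take a data.frame and return a data.frame, evaluated in order) can be illustrated in base R. This is only a sketch of the idea, not the package's implementation; run_steps, add_noise_sq, and keep_first_half are hypothetical names introduced for illustration.

```r
# Hypothetical helper (not part of any package): pass a starting data.frame
# through a list of step functions, one after another.
run_steps <- function(population, steps) {
  Reduce(function(data, step) step(data), steps, population)
}

# First step: a data.frame representing the population.
population <- data.frame(ID = 1:6, noise = c(-1, 0, 1, 2, -2, 0.5))

# Every later step takes a data.frame and returns a data.frame.
add_noise_sq <- function(data) {
  data$noise_sq <- data$noise^2
  data
}

keep_first_half <- function(data) {
  data[seq_len(nrow(data) / 2), , drop = FALSE]
}

# Steps are applied sequentially: add the column first, then subset the rows.
final_data <- run_steps(population, list(add_noise_sq, keep_first_half))
final_data
```

Because every step shares the same input and output type, steps can be reordered or swapped without changing how the chain is run.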

Value

A list of two functions, the design_function and the data_function. The design_function runs the design once, i.e., draws the data and calculates any estimates and estimands defined in ..., which are returned separately as two data.frames. The data_function also runs the design once, but returns only the final data.

Details

Users can supply three kinds of functions to declare_design:

1. Data generating functions. These include population, assignment, and sampling functions.

2. Estimand functions.

3. Estimator functions.

The location of the estimand and estimator functions in the chain of functions determines *when* the values of the estimand and estimator are calculated. This allows users to, for example, differentiate between a population average treatment effect and a sample average treatment effect by placing the estimand function before or after sampling.
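
To see why placement matters, the following base-R sketch computes the same quantity before and after a sampling step. It does not use the package itself: the ate function is a hypothetical stand-in for an estimand step, the seed is arbitrary, and the population mirrors the one used in the Examples section.

```r
set.seed(42)  # arbitrary seed, chosen only so the draw is reproducible

# Population with both potential outcomes for every unit.
N <- 500
population <- data.frame(ID = seq_len(N), noise = rnorm(N))
population$Y_Z_0 <- population$noise
population$Y_Z_1 <- population$noise + rnorm(N, mean = 2, sd = 2)

# Hypothetical stand-in for an estimand step:
# the mean difference between the two potential outcomes.
ate <- function(data) mean(data$Y_Z_1 - data$Y_Z_0)

# Estimand evaluated BEFORE sampling: population average treatment effect.
pate <- ate(population)

# Sampling step: keep 250 of the 500 units at random.
sampled <- population[sample(N, 250), , drop = FALSE]

# Estimand evaluated AFTER sampling: sample average treatment effect.
sate <- ate(sampled)

c(PATE = pate, SATE = sate)
```

The two values generally differ because the second is computed only on the units that survived sampling, which is the distinction the position of the estimand step encodes.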

Designs declared with declare_design can be investigated with a series of post-declaration commands, such as draw_data, get_estimands, get_estimates, and diagnose_design.

The print and summary methods for a design object return some helpful descriptions of the steps in your research design. If randomizr functions are used for any assignment or sampling steps, additional details about those steps are provided.

Examples

my_population <- declare_population(N = 500, noise = rnorm(N))

my_potential_outcomes <- declare_potential_outcomes(
  Y_Z_0 = noise, Y_Z_1 = noise + rnorm(N, mean = 2, sd = 2))

my_sampling <- declare_sampling(n = 250)

my_assignment <- declare_assignment(m = 25)

my_estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

my_estimator <- declare_estimator(Y ~ Z, estimand = my_estimand)

design <- declare_design(my_population,
                         my_potential_outcomes,
                         my_sampling,
                         my_estimand,
                         dplyr::mutate(noise_sq = noise^2),
                         my_assignment,
                         reveal_outcomes,
                         my_estimator)

design
#>
#> Design Summary
#>
#> Step 1 (population): my_population ---------------------------------------------
#>
#> N = 500
#>
#> Added variable: ID
#>  N_missing N_unique
#>          0      500
#>
#> Added variable: noise
#>    min median mean  max   sd N_missing N_unique
#>  -2.42  -0.02 0.03 2.84 0.95         0      500
#>
#> Step 2 (potential outcomes): my_potential_outcomes -----------------------------
#>
#> Added variable: Y_Z_0
#>    min median mean  max   sd N_missing N_unique
#>  -2.42  -0.02 0.03 2.84 0.95         0      500
#>
#> Added variable: Y_Z_1
#>    min median mean  max   sd N_missing N_unique
#>  -5.51   2.08 2.05 9.77 2.19         0      500
#>
#> Step 3 (sampling): my_sampling -------------------------------------------------
#>
#> N = 250 (250 subtracted)
#>
#> Added variable: S_inclusion_prob
#>             0.5   NA
#> Frequency   250    0
#> Proportion 1.00 0.00
#>
#> Altered variable: ID
#> Before:
#>  N_missing N_unique
#>          0      500
#>
#> After:
#>  N_missing N_unique
#>          0      250
#>
#> Altered variable: noise
#> Before:
#>    min median mean  max   sd N_missing N_unique
#>  -2.42  -0.02 0.03 2.84 0.95         0      500
#>
#> After:
#>    min median mean  max   sd N_missing N_unique
#>  -2.27   0.09 0.14 2.84 0.94         0      250
#>
#> Altered variable: Y_Z_0
#> Before:
#>    min median mean  max   sd N_missing N_unique
#>  -2.42  -0.02 0.03 2.84 0.95         0      500
#>
#> After:
#>    min median mean  max   sd N_missing N_unique
#>  -2.27   0.09 0.14 2.84 0.94         0      250
#>
#> Altered variable: Y_Z_1
#> Before:
#>    min median mean  max   sd N_missing N_unique
#>  -5.51   2.08 2.05 9.77 2.19         0      500
#>
#> After:
#>    min median mean  max   sd N_missing N_unique
#>  -4.10   2.03 2.13 8.85 2.21         0      250
#>
#> Step 4 (estimand): my_estimand -------------------------------------------------
#>
#> A single draw of the estimand:
#>  estimand_label estimand
#>             ATE 1.988531
#>
#> Step 5 (declare step): dplyr::mutate(noise_sq = noise^2) -----------------------
#>
#> Added variable: noise_sq
#>   min median mean  max   sd N_missing N_unique
#>  0.00   0.42 0.90 8.06 1.26         0      250
#>
#> Step 6 (assignment): my_assignment ---------------------------------------------
#>
#> Added variable: Z
#>               0    1   NA
#> Frequency   225   25    0
#> Proportion 0.90 0.10 0.00
#>
#> Added variable: Z_cond_prob
#>             0.1  0.9   NA
#> Frequency    25  225    0
#> Proportion 0.10 0.90 0.00
#>
#> Step 7 (reveal outcomes): reveal_outcomes --------------------------------------
#>
#> Added variable: Y
#>    min median mean  max   sd N_missing N_unique
#>  -4.10   0.10 0.32 6.68 1.36         0      250
#>
#> Step 8 (estimator): my_estimator -----------------------------------------------
#>
#> Formula: Y ~ Z
#>
#> A single draw of the estimator:
#>  estimator_label coefficient_name      est       se            p  ci_lower
#>     my_estimator                Z 2.050421 0.532321 0.0007364653 0.9533854
#>  ci_upper estimand_label
#>  3.147456            ATE
#>
df <- draw_data(design)

estimates <- get_estimates(design)
estimands <- get_estimands(design)

# NOT RUN {
diagnosis <- diagnose_design(design)
summary(diagnosis)
# }