Declares an estimator which generates estimates and associated statistics.

Use of declare_test is identical to use of declare_estimator. Use declare_test for hypothesis testing with no specific estimand in mind; use declare_estimator for hypothesis testing when you can link each estimate to an estimand. For example, declare_test could be used for a K-S test of distributional equality and declare_estimator for a difference-in-means estimate of an average treatment effect.
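As an illustration, here is a minimal sketch of the K-S comparison mentioned above. It assumes declare_test() and its companion labeling function label_test() are available (as in recent versions of DeclareDesign); ks_handler and the variable names Y and Z are illustrative, not part of this page.

# Hypothetical handler: a K-S test comparing the distribution of Y across
# the two assignment conditions; returns a one-row data.frame
ks_handler <- function(data) {
  test <- with(data, ks.test(Y[Z == 1], Y[Z == 0]))
  data.frame(statistic = unname(test$statistic), p.value = test$p.value)
}

# No estimand is referenced, so declare_test() is the appropriate declaration
ks_step <- declare_test(handler = label_test(ks_handler), label = "ks_test")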

declare_estimator(
  ...,
  handler = label_estimator(model_handler),
  label = "estimator"
)

declare_estimators(
  ...,
  handler = label_estimator(model_handler),
  label = "estimator"
)

label_estimator(fn)

model_handler(
  data,
  ...,
  model = estimatr::difference_in_means,
  model_summary = tidy_try,
  term = FALSE
)

Arguments

...

arguments to be captured, and later passed to the handler

handler

a tidy-in, tidy-out function

label

a string describing the step

fn

A function that takes a data.frame as an argument and returns a data.frame with the estimates, summary statistics (e.g., standard error, p-value, and confidence interval), and a term column for labeling coefficient estimates.

data

a data.frame

model

A model function, e.g. lm or glm. By default, the model is the difference_in_means function from the estimatr package.

model_summary

A model-in, data-out function used to extract coefficient estimates or model summary statistics, such as tidy or glance. By default, the DeclareDesign model summary function tidy_try is used, which first attempts to apply the tidy method available for the model object sent to model and, if none is found, attempts to summarize coefficients using the coef(summary()) and confint methods. If these do not exist for the model object, it fails.

term

Symbols or a literal character vector of terms that represent quantities of interest, e.g. Z. If FALSE, return the first non-intercept term; if TRUE, return all terms. To escape non-standard evaluation, use !!.
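As a brief sketch (not taken from the package documentation; lm_robust is from the estimatr package and interaction_term is a hypothetical variable):

# Select a specific coefficient by name
declare_estimator(Y ~ Z * female, model = lm_robust, term = "Z:female")

# Pass a term stored in a variable by escaping non-standard evaluation with !!
interaction_term <- "Z:female"
declare_estimator(Y ~ Z * female, model = lm_robust, term = !!interaction_term)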

Value

A function that accepts a data.frame as an argument and returns a data.frame containing the value of the estimator and associated statistics.

Details

declare_estimator is designed to handle two main ways of generating parameter estimates from data.

In declare_estimator, you can optionally provide the name of an estimand or an object created by declare_estimand to connect your estimate(s) to estimand(s).

The first way is through label_estimator(model_handler), which is the default value of the handler argument. Users can use standard modeling functions like lm, glm, or iv_robust. The models are summarized using the function passed to the model_summary argument. This will usually be a "tidier" like broom::tidy. The default model_summary function is tidy_try, which applies a tidy method if one is available and, if not, tries to construct one on the fly.

An example of this approach is:

declare_estimator(Y ~ Z + X, model = lm_robust, model_summary = tidy, term = "Z", estimand = "ATE")

The second way is to use a custom data-in, data-out function, usually wrapped in label_estimator. Passing the custom function through label_estimator enables clean labeling and linking to estimands.

An example of this approach is:

my_fun <- function(data) {
  data.frame(estimate = with(data, median(Y[Z == 1]) - median(Y[Z == 0])))
}

declare_estimator(handler = label_estimator(my_fun), estimand = "ATE")

label_estimator takes a data-in, data-out function as fn and returns a data-in, data-out function that first runs the provided estimation function fn and then appends a label for the estimator and, if an estimand is provided, a label for the estimand.
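For example, the following sketch (assuming the base design from the Examples below and the my_fun defined above; "median_diff" is an illustrative label) shows a labeled custom estimator being declared and its labeled output inspected with draw_estimates():

median_estimator <- declare_estimator(
  handler = label_estimator(my_fun),
  estimand = "ATE",
  label = "median_diff"
)

# Returns my_fun's output with estimator_label and estimand_label columns appended
draw_estimates(design + median_estimator)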

Examples

# base design
design <- declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N)) +
  declare_potential_outcomes(
    Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))) +
  declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(m = 50)

# Most estimators are modeling functions like lm or glm.
# Default statistical model is estimatr::difference_in_means
design + declare_estimator(Y ~ Z, estimand = "ATE")
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 47 53
#> 0.47 0.53
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -3.63 0.03 0 2.43 1.09 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 40 60
#> 0.40 0.60
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 40 60
#> 0.40 0.60
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 45 55
#> 0.45 0.55
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z, estimand = "ATE") -----------------
#> 
#> Formula: Y ~ Z
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z 0.06 0.100326 0.5980504 0.5511862 -0.139094
#> conf.high df outcome estimand_label
#> 0.259094 97.9855 Y ATE
#> 
# lm from base R (classical standard errors assuming homoskedasticity)
design + declare_estimator(Y ~ Z, model = lm, estimand = "ATE")
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 49 51
#> 0.49 0.51
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -2.65 0.15 0.18 2.37 1.02 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 42 58
#> 0.42 0.58
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 34 66
#> 0.34 0.66
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.08
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 44 56
#> 0.44 0.56
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z, model = lm, estimand = "ATE") -----
#> 
#> Formula: Y ~ Z
#> 
#> Model: lm
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z 0.12 0.09955001 1.205424 0.2309421 -0.07755375
#> conf.high estimand_label
#> 0.3175538 ATE
#> 
# Use lm_robust (linear regression with heteroskedasticity-robust standard errors)
# from `estimatr` package
design + declare_estimator(Y ~ Z, model = lm_robust, estimand = "ATE")
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 63 37
#> 0.63 0.37
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -3.27 0.05 0 2.52 1.15 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 46 54
#> 0.46 0.54
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 39 61
#> 0.39 0.61
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.07
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 40 60
#> 0.40 0.60
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z, model = lm_robust, estimand = "ATE")
#> 
#> Formula: Y ~ Z
#> 
#> Model: lm_robust
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z -8.881784e-17 0.09897433 -8.973826e-16 1 -0.1964113
#> conf.high df outcome estimand_label
#> 0.1964113 98 Y ATE
#> 
# use `term` to select particular coefficients
design + declare_estimator(Y ~ Z * female, term = "Z:female", model = lm_robust)
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 49 51
#> 0.49 0.51
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -2.47 -0.23 -0.13 2.31 0.94 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 56 44
#> 0.56 0.44
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 35 65
#> 0.35 0.65
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.21
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 44 56
#> 0.44 0.56
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z * female, term = "Z:female", model = lm_robust)
#> 
#> Formula: Y ~ Z * female
#> 
#> Model: lm_robust
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z:female -0.1234615 0.1969713 -0.6267995 0.5322785 -0.5144466
#> conf.high df outcome
#> 0.2675235 96 Y
#> 
# Use glm from base R
design + declare_estimator(
  Y ~ Z + female,
  family = "gaussian",
  estimand = "ATE",
  model = glm
)
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 54 46
#> 0.54 0.46
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -2.2 -0.06 -0.09 2.63 0.96 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 60 40
#> 0.60 0.40
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 43 57
#> 0.43 0.57
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.17
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 52 48
#> 0.52 0.48
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z + female, family = "gaussian", estimand = "ATE", model = glm)
#> 
#> Formula: Y ~ Z + female
#> 
#> Model: glm
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z 0.1480392 0.1003722 1.474902 0.1434773 -0.04868674
#> conf.high estimand_label
#> 0.3447652 ATE
#> 
# If we use logit, we'll need to estimate the average marginal effect with
# margins::margins. We wrap this up in a function we'll pass to model_summary
library(margins) # for margins
library(broom) # for tidy

tidy_margins <- function(x) {
  tidy(margins(x, data = x$data), conf.int = TRUE)
}

design + declare_estimator(
  Y ~ Z + female,
  model = glm,
  family = binomial("logit"),
  model_summary = tidy_margins,
  term = "Z"
)
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 49 51
#> 0.49 0.51
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -3.55 -0.04 0.02 2.51 1.08 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 49 51
#> 0.49 0.51
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 37 63
#> 0.37 0.63
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.12
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 45 55
#> 0.45 0.55
#> 
#> Step 6 (estimator): declare_estimator(Y ~ Z + female, model = glm, family = binomial("logit"), model_summary = tidy_margins, term = "Z")
#> 
#> Formula: Y ~ Z + female
#> 
#> Model: glm
#> 
#> A single draw of the estimator:
#> estimator_label term estimate std.error statistic p.value conf.low
#> estimator Z 0.2235912 0.08243304 2.712398 0.006679832 0.06202543
#> conf.high
#> 0.385157
#> 
# Multiple estimators for one estimand
two_estimators <- design +
  declare_estimator(Y ~ Z, model = lm_robust, estimand = "ATE", label = "OLS") +
  declare_estimator(
    Y ~ Z + female,
    model = glm,
    family = binomial("logit"),
    model_summary = tidy_margins,
    estimand = "ATE",
    term = "Z",
    label = "logit"
  )

run_design(two_estimators)
#> estimand_label estimand estimator_label term estimate std.error statistic
#> 1 ATE 0.1 OLS Z 0.04000000 0.09889182 0.4044824
#> 2 ATE 0.1 logit Z 0.06977544 0.09619629 0.7253444
#> p.value conf.low conf.high df outcome
#> 1 0.6867395 -0.1562476 0.2362476 98 Y
#> 2 0.4682408 -0.1187658 0.2583167 NA <NA>
# Declare estimator using a custom handler
# Define your own estimator and use the `label_estimator` function for labeling
# Must have a `data` argument that is a data.frame
my_dim_function <- function(data) {
  data.frame(estimate = with(data, mean(Y[Z == 1]) - mean(Y[Z == 0])))
}

design + declare_estimator(
  handler = label_estimator(my_dim_function),
  estimand = "ATE"
)
#> 
#> Design Summary
#> 
#> Step 1 (population): declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N))
#> 
#> N = 100
#> 
#> Added variable: ID
#> N_missing N_unique class
#> 0 100 character
#> 
#> Added variable: female
#> 0 1
#> 45 55
#> 0.45 0.55
#> 
#> Added variable: U
#> min median mean max sd N_missing N_unique
#> -3.65 -0.07 -0.08 2.06 1 0 100
#> 
#> Step 2 (potential outcomes): declare_potential_outcomes(Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U)))
#> 
#> Formula: Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))
#> 
#> Added variable: Y_Z_0
#> 0 1
#> 44 56
#> 0.44 0.56
#> 
#> Added variable: Y_Z_1
#> 0 1
#> 34 66
#> 0.34 0.66
#> 
#> Step 3 (estimand): declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) -----------------
#> 
#> A single draw of the estimand:
#> estimand_label estimand
#> ATE 0.1
#> 
#> Step 4 (assignment): declare_assignment(m = 50) --------------------------------
#> 
#> Added variable: Z
#> 0 1
#> 50 50
#> 0.50 0.50
#> 
#> Added variable: Z_cond_prob
#> 0.5
#> 100
#> 1.00
#> 
#> Step 5 (reveal): reveal_outcomes(outcome_variables = "Y", assignment_variables = "Z", label = "Autogenerated by ")
#> 
#> Added variable: Y
#> 0 1
#> 40 60
#> 0.40 0.60
#> 
#> Step 6 (estimator): declare_estimator(estimand = "ATE", handler = label_estimator(my_dim_function))
#> 
#> A single draw of the estimator:
#> estimator_label estimate estimand_label
#> estimator 0.04 ATE
#> 