Declares an estimator which generates estimates and associated statistics.
Use of declare_test is identical to use of declare_estimator. Use declare_test for hypothesis testing with no specific estimand in mind; use declare_estimator for hypothesis testing when you can link each estimate to an estimand. For example, declare_test could be used for a K-S test of distributional equality and declare_estimator for a difference-in-means estimate of an average treatment effect.
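To make the distinction concrete, here is a base-R sketch of the two kinds of statistics involved (the variables Y and Z are hypothetical stand-ins for simulated design data; no DeclareDesign functions are used):

```r
# Illustrative only: the raw statistics behind the two kinds of declarations.
set.seed(1)
Z <- rbinom(100, 1, 0.5)        # hypothetical treatment assignment
Y <- rnorm(100, mean = 0.3 * Z) # hypothetical outcome

# A K-S test compares whole distributions: there is no single estimand,
# just a test statistic and p-value -- declare_test territory.
ks <- ks.test(Y[Z == 1], Y[Z == 0])

# A difference in means targets a specific estimand (the ATE) --
# declare_estimator territory.
dim_est <- mean(Y[Z == 1]) - mean(Y[Z == 0])
```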
```r
declare_estimator(
  ...,
  handler = label_estimator(model_handler),
  label = "estimator"
)

declare_estimators(
  ...,
  handler = label_estimator(model_handler),
  label = "estimator"
)

label_estimator(fn)

model_handler(
  data,
  ...,
  model = estimatr::difference_in_means,
  model_summary = tidy_try,
  term = FALSE
)
```
| Argument | Description |
|---|---|
| ... | arguments to be captured, and later passed to the handler |
| handler | a tidy-in, tidy-out function |
| label | a string describing the step |
| fn | a function that takes a data.frame as an argument and returns a data.frame with the estimates, summary statistics (i.e., standard error, p-value, and confidence interval), and a term column for labeling coefficient estimates |
| data | a data.frame |
| model | a model function, e.g. lm or glm. By default, the model is estimatr::difference_in_means. |
| model_summary | a model-in, data-out function to extract coefficient estimates or model summary statistics, such as broom::tidy. By default, tidy_try is used. |
| term | symbols or a literal character vector of terms that represent quantities of interest, e.g. Z. If FALSE, return the first non-intercept term; if TRUE, return all terms. To escape non-standard evaluation, use !!. |
A function that accepts a data.frame as an argument and returns a data.frame containing the value of the estimator and associated statistics.
declare_estimator is designed to handle two main ways of generating parameter estimates from data. In declare_estimator, you can optionally provide the name of an estimand or an object created by declare_estimand to connect your estimate(s) to estimand(s).

The first approach is through label_estimator(model_handler), which is the default value of the handler argument. Users can use standard modeling functions like lm, glm, or iv_robust. The models are summarized using the function passed to the model_summary argument. This will usually be a "tidier" like broom::tidy. The default model_summary function is tidy_try, which applies a tidy method if available, and if not, tries to make one on the fly.
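For intuition, here is a rough, hand-rolled stand-in for what such a tidier produces (model in, data.frame of coefficient-level statistics out); it is purely illustrative and not the broom::tidy or tidy_try implementation:

```r
# Sketch of a model_summary "tidier": extract coefficient-level statistics
# from a fitted lm into a tidy data.frame.
tidy_sketch <- function(fit) {
  coefs <- summary(fit)$coefficients
  ci <- confint(fit)
  data.frame(
    term      = rownames(coefs),
    estimate  = coefs[, "Estimate"],
    std.error = coefs[, "Std. Error"],
    statistic = coefs[, "t value"],
    p.value   = coefs[, "Pr(>|t|)"],
    conf.low  = ci[, 1],
    conf.high = ci[, 2],
    row.names = NULL
  )
}

fit <- lm(mpg ~ wt, data = mtcars)
tidied <- tidy_sketch(fit)
```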
An example of this approach is:
```r
declare_estimator(
  Y ~ Z + X,
  model = lm_robust,
  model_summary = tidy,
  term = "Z",
  estimand = "ATE"
)
```
The second approach is using a custom data-in, data-out function, usually first passed to label_estimator. The reason to pass the custom function to label_estimator first is to enable clean labeling and linking to estimands.
An example of this approach is:
```r
# The custom function must return a data.frame of estimates
my_fun <- function(data) {
  data.frame(estimate = with(data, median(Y[Z == 1]) - median(Y[Z == 0])))
}

declare_estimator(handler = label_estimator(my_fun), estimand = "ATE")
```
label_estimator takes a data-in, data-out function as fn, and returns a data-in, data-out function that first runs the provided estimation function fn and then appends a label for the estimator and, if an estimand is provided, a label for the estimand.
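Conceptually, the wrapping works like the sketch below. This is not the package source: the real label_estimator picks up the label from the step's label argument, whereas this illustrative version takes label and estimand directly.

```r
# Conceptual sketch of label_estimator: wrap fn so its output gains an
# estimator_label column, plus an estimand_label column when one is attached.
label_estimator_sketch <- function(fn, label = "estimator", estimand = NULL) {
  function(data) {
    out <- fn(data)
    out$estimator_label <- label
    if (!is.null(estimand)) out$estimand_label <- estimand
    out
  }
}

# A data-in, data-out difference-in-means function
dim_fn <- function(data) {
  data.frame(estimate = with(data, mean(Y[Z == 1]) - mean(Y[Z == 0])))
}

toy <- data.frame(Z = rep(0:1, each = 5), Y = c(1:5, 3:7))
est <- label_estimator_sketch(dim_fn, label = "DIM", estimand = "ATE")(toy)
```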
Examples:

```r
# base design
design <-
  declare_population(N = 100, female = rbinom(N, 1, 0.5), U = rnorm(N)) +
  declare_potential_outcomes(
    Y ~ rbinom(N, 1, prob = pnorm(0.2 * Z + 0.2 * female + 0.1 * Z * female + U))) +
  declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(m = 50)

# Most estimators are modeling functions like lm or glm.
# The default statistical model is estimatr::difference_in_means
design + declare_estimator(Y ~ Z, estimand = "ATE")

# lm from base R (classical standard errors assuming homoskedasticity)
design + declare_estimator(Y ~ Z, model = lm, estimand = "ATE")

# Use lm_robust (linear regression with heteroskedasticity-robust standard
# errors) from the `estimatr` package
design + declare_estimator(Y ~ Z, model = lm_robust, estimand = "ATE")

# Use `term` to select particular coefficients
design + declare_estimator(Y ~ Z * female, term = "Z:female", model = lm_robust)

# Use glm from base R
design + declare_estimator(
  Y ~ Z + female, family = "gaussian", estimand = "ATE", model = glm
)

# If we use logit, we'll need to estimate the average marginal effect with
# margins::margins. We wrap this up in a function we'll pass to model_summary.
library(margins) # for margins
library(broom)   # for tidy
tidy_margins <- function(x) {
  tidy(margins(x, data = x$data), conf.int = TRUE)
}
design + declare_estimator(
  Y ~ Z + female,
  model = glm,
  family = binomial("logit"),
  model_summary = tidy_margins,
  term = "Z"
)

# Multiple estimators for one estimand
two_estimators <-
  design +
  declare_estimator(Y ~ Z, model = lm_robust, estimand = "ATE", label = "OLS") +
  declare_estimator(
    Y ~ Z + female,
    model = glm,
    family = binomial("logit"),
    model_summary = tidy_margins,
    estimand = "ATE",
    term = "Z",
    label = "logit"
  )
run_design(two_estimators)

# Declare an estimator using a custom handler.
# Define your own estimator and use label_estimator for labeling.
# The function must have a `data` argument and return a data.frame.
my_dim_function <- function(data) {
  data.frame(estimate = with(data, mean(Y[Z == 1]) - mean(Y[Z == 0])))
}
design + declare_estimator(
  handler = label_estimator(my_dim_function),
  estimand = "ATE"
)
```