The idea motivating DeclareDesign is that the core analytic features of research designs can be declared in a complete manner and saved as an object. Once properly declared, a design can easily be shared, modified, improved, and used. A design contains the information needed to implement key parts of data generation and subsequent analysis. It also contains enough information to allow researchers or third parties to query it and determine whether it can support the claims it makes. We describe this framework in greater detail in our paper.
Components of a research design
A research design characterized in words or in code should include four components:
- A model, M, of how the world works. The model specifies the moving parts (the variables) and how these are causally related to each other. In this sense the model provides the context of a study, but also a speculation about the world.
- An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. In many applications I might be thought of as the “estimand.” Some inquiries are statements about the values of variables, others about the causal relations between variables. In all cases, however, the inquiry should be answerable given the model.
- A data strategy, D, that generates data on variables. The data strategy implicitly includes case selection, or sampling decisions, but it also represents interventions such as assignment of treatments or measurement strategies. A model M tells you what sort of data you might observe if you employ data strategy D.
- An answer strategy, A, that uses data to generate an answer.
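To make the division of labor concrete, the four components can be sketched as plain R functions. This is only a base-R illustration, not DeclareDesign code: the function names `simulate_world`, `true_ate`, `collect_data`, and `estimate_ate` are hypothetical, and the constant treatment effect of 0.5 and the assignment of 50 out of 100 units are assumptions chosen to resemble a simple two-arm trial.

```r
# Base-R sketch of M, I, D, A. All function names here are hypothetical
# and are not part of DeclareDesign.
set.seed(42)

# M -- Model: N units, unobserved noise U, and both potential outcomes
# (assumed constant treatment effect of 0.5)
simulate_world <- function(N = 100) {
  U <- rnorm(N)
  data.frame(U = U, Y_Z_0 = U, Y_Z_1 = 0.5 + U)
}

# I -- Inquiry: the average treatment effect, defined on potential outcomes
true_ate <- function(population) {
  mean(population$Y_Z_1 - population$Y_Z_0)
}

# D -- Data strategy: assign exactly m units to treatment, then reveal
# only the potential outcome that corresponds to each unit's assignment
collect_data <- function(population, m = 50) {
  N <- nrow(population)
  Z <- sample(rep(c(0, 1), times = c(N - m, m)))
  Y <- ifelse(Z == 1, population$Y_Z_1, population$Y_Z_0)
  data.frame(Z = Z, Y = Y)
}

# A -- Answer strategy: difference in means, computed via OLS
estimate_ate <- function(observed) {
  unname(coef(lm(Y ~ Z, data = observed))["Z"])
}

population <- simulate_world()
estimand   <- true_ate(population)
observed   <- collect_data(population)
estimate   <- estimate_ate(observed)
```

Note that `collect_data` returns only `Z` and `Y`: the answer strategy never sees the potential outcomes, whereas the inquiry is defined directly on them.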
A simple design declaration
Here is an illustration using a very simple two-arm trial.
# M -- Model: Speculation on variables and relations between them
model <-
  declare_population(N = 100, U = rnorm(N)) +
  declare_potential_outcomes(Y ~ 0.5*Z + U)

# I -- Inquiry: A query defined in terms of potential outcomes
inquiry <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

# D -- Data Strategy: Researcher interventions on the world
data_strategy <-
  declare_assignment(m = 50) +
  reveal_outcomes(Y, Z)

# A -- Answer Strategy: Conclusions to be drawn from data
answer_strategy <- declare_estimator(Y ~ Z, estimand = "ATE")

# Design: Putting it all together
design <- model + inquiry + data_strategy + answer_strategy
Making use of a design
Use the design object to simulate data, including treatment assignments. Here are the first five rows:
data <- draw_data(design)
head(data, 5)
Use the design object to implement the analysis:
estimates <- draw_estimates(design)
Diagnosing a design
The fully declared design contains the information needed to diagnose it. We report the bootstrapped standard errors of the diagnosands in parentheses.
diagnosis <- diagnose_design(design, sims = 1000, bootstrap_sims = 500)
| N Sims | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean Se | Type S Rate | Mean Estimand |
- The diagnosis confirms that random assignment to treatment yields unbiased estimates of treatment effects.
- We can also observe that under the current design, coverage is close to the nominal 95% rate.
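To give a sense of what a diagnosis computes, here is a minimal base-R sketch that repeats the two-arm trial many times and summarizes the results. It is an illustration under stated assumptions, not the package's implementation: the helper names `run_once` and `diagnosands` are hypothetical, the true effect is fixed at 0.5, significance is judged at the 0.05 level, and the standard errors come from a simple bootstrap over the simulation runs.

```r
# Base-R sketch of a diagnosis by simulation. Helper names are hypothetical
# and this is not how DeclareDesign is implemented internally.
set.seed(123)
sims <- 1000

# One run of the design: draw a population, assign treatment, estimate
run_once <- function(N = 100, m = 50, effect = 0.5) {
  U <- rnorm(N)
  Z <- sample(rep(c(0, 1), times = c(N - m, m)))
  Y <- effect * Z + U
  fit  <- summary(lm(Y ~ Z))$coefficients["Z", ]
  est  <- fit[["Estimate"]]
  se   <- fit[["Std. Error"]]
  crit <- qt(0.975, df = N - 2)
  c(estimate    = est,
    estimand    = effect,
    significant = fit[["Pr(>|t|)"]] < 0.05,       # two-sided test at 0.05
    covered     = abs(est - effect) <= crit * se) # 95% CI contains estimand
}

# One row per simulation run
runs <- as.data.frame(t(replicate(sims, run_once())))

# Diagnosands: summaries of the design's performance across runs
diagnosands <- function(r) {
  c(bias     = mean(r$estimate - r$estimand),
    rmse     = sqrt(mean((r$estimate - r$estimand)^2)),
    power    = mean(r$significant),
    coverage = mean(r$covered))
}

diagnosis_by_hand <- diagnosands(runs)

# Bootstrap: resample the simulation runs with replacement to obtain a
# standard error for each diagnosand
boot <- replicate(500, diagnosands(runs[sample(nrow(runs), replace = TRUE), ]))
diagnosand_se <- apply(boot, 1, sd)

round(rbind(estimate = diagnosis_by_hand, se = diagnosand_se), 3)
```

Under these assumptions, the bias should be close to zero and the coverage close to 95%, in line with the two observations above.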
You can find a few different entry points into these ideas and the software tools:
- Short introduction to DeclareDesign on the World Bank Development Impact blog
- Our paper in the American Political Science Review (appendices)
- Library of research designs, which includes a range of canonical designs that can be declared with a single line of code and then tailored for your own uses
- Blog series that demonstrates how DeclareDesign can be used to shed light on a range of tricky design choices, which includes declarations of many designs