The idea motivating DeclareDesign is that the core analytic features of research designs can be declared in a complete manner and saved as an object. Once properly declared, a design can easily be shared, modified, improved, and used. A design contains the information needed to implement key parts of data generation and subsequent analysis. It also contains enough information to allow researchers or third parties to query it and determine whether it can support the claims it makes. We describe this framework in greater detail in our paper (conditionally accepted, American Political Science Review).
Components of a research design
A research design characterized in words or in code should include four components:
A model, M, of how the world works. The model specifies the moving parts — the variables — and how these are causally related to each other. In this sense the model provides the context of a study, but also a speculation about the world.
An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. In many applications, I might be thought of as the “estimand.” Some inquiries are statements about the values of variables, others about the causal relations between variables. In all cases, however, the inquiry should be answerable given the model.
A data strategy, D, that generates data on variables. The data strategy implicitly includes case selection, or sampling decisions, but it also covers interventions such as treatment assignment and measurement strategies. A model M tells you what sort of data you might observe if you employ data strategy D.
An answer strategy, A, that uses data to generate an answer.
A simple design declaration
Here is an illustration using a very simple two-arm trial.
library(DeclareDesign)

# M -- Model: Speculation on variables and relations between them
population <- declare_population(N = 100, u = rnorm(N))
potential_outcomes <- declare_potential_outcomes(Y_Z_0 = 0, Y_Z_1 = 1 + u)

# I -- Inquiry: A query defined in terms of potential outcomes
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

# D -- Data Strategy: Researcher interventions on the world
assignment <- declare_assignment(m = 50)
reveal_Y <- declare_reveal(Y, Z)

# A -- Answer Strategy: Conclusions to be drawn from data
estimator <- declare_estimator(Y ~ Z, estimand = estimand)

# Design: Putting it all together
design <- population + potential_outcomes + estimand + assignment + reveal_Y + estimator
Making use of a design
Use the design object to simulate data, including treatment assignments:
data <- draw_data(design)
Use the design object to implement analysis:
estimates <- draw_estimates(design)
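For intuition, the sketch below reproduces by hand, in base R, roughly what one draw of data and one set of estimates amount to for the two-arm design declared above. It does not call DeclareDesign. The helpers `draw_data_by_hand` and `draw_estimates_by_hand` are hypothetical names introduced only for illustration, and the use of `lm()` with classical standard errors is an assumption about the answer strategy, not a statement of what the package does internally.

```r
# Hypothetical base-R stand-ins; these are NOT DeclareDesign functions.

# One draw of data: model (u, potential outcomes) plus data strategy
# (complete random assignment of m units, then revealing Y).
draw_data_by_hand <- function(N = 100, m = 50) {
  u     <- rnorm(N)        # M: unobserved heterogeneity
  Y_Z_0 <- rep(0, N)       # M: potential outcome under control
  Y_Z_1 <- 1 + u           # M: potential outcome under treatment
  # D: exactly m units treated, in random order
  Z <- sample(rep(c(0, 1), times = c(N - m, m)))
  # D: only the potential outcome matching Z is observed
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)
  data.frame(u, Y_Z_0, Y_Z_1, Z, Y)
}

# Answer strategy: regress Y on Z and keep the coefficient on Z.
# Assumption: ordinary lm() with classical standard errors.
draw_estimates_by_hand <- function(data) {
  coefs <- summary(lm(Y ~ Z, data = data))$coefficients
  data.frame(
    term      = "Z",
    estimate  = coefs["Z", "Estimate"],
    std.error = coefs["Z", "Std. Error"],
    p.value   = coefs["Z", "Pr(>|t|)"]
  )
}

set.seed(1)
sim_data      <- draw_data_by_hand()
sim_estimates <- draw_estimates_by_hand(sim_data)
```

With a single binary regressor, the coefficient on `Z` equals the difference in mean outcomes between treated and control units, so `sim_estimates$estimate` is the difference-in-means estimate for this draw.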
Diagnosing a design
The fully declared design contains the information needed to diagnose it. We report the bootstrapped standard errors of the diagnosands in parentheses.
diagnosis <- diagnose_design(design, sims = 10000, bootstrap_sims = 500)
| Bias | RMSE | Power | Coverage | Mean Estimand | Mean Estimate | SD Estimate | Type S-Rate |
- The diagnosis confirms that random assignment to treatment allows for unbiased estimates of treatment effects.
- We can also observe that, under the current design, coverage is higher than the nominal 95% rate.
- The high coverage rate arises because conventional standard errors are generally too large, with some exceptions such as constant treatment effects (see page 852 of Aronow, Green and Lee).
- This point is often underappreciated: without diagnosis, we might not be aware that our design is overly prone to null findings.
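The points above can be checked with a small Monte Carlo simulation written by hand. The sketch below uses base R only and is not the implementation behind `diagnose_design`. It assumes that the population is redrawn in every simulation, that the estimand is the average treatment effect among the units in that draw, and that the estimator is `lm()` with classical standard errors. The names `one_run`, `runs`, and `n_sims` are illustrative.

```r
# Hand-rolled diagnosis of the two-arm design (base R only).
# Assumptions: fresh population per run, estimand = ATE among the drawn
# units, classical lm() standard errors.
set.seed(42)
n_sims <- 2000
N <- 100
m <- 50

one_run <- function() {
  u     <- rnorm(N)
  Y_Z_0 <- rep(0, N)
  Y_Z_1 <- 1 + u
  estimand <- mean(Y_Z_1 - Y_Z_0)            # ATE for this draw
  Z <- sample(rep(c(0, 1), times = c(N - m, m)))
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)
  coefs <- summary(lm(Y ~ Z))$coefficients
  est  <- coefs["Z", "Estimate"]
  se   <- coefs["Z", "Std. Error"]
  crit <- qt(0.975, df = N - 2)              # 95% interval multiplier
  c(estimand = estimand,
    estimate = est,
    se       = se,
    covered  = as.numeric(abs(est - estimand) <= crit * se))
}

# One row per simulation run
runs <- as.data.frame(t(replicate(n_sims, one_run())))

bias     <- mean(runs$estimate - runs$estimand)  # should be near zero
coverage <- mean(runs$covered)                   # share of intervals containing the estimand
sd_error <- sd(runs$estimate - runs$estimand)    # spread of the estimation error
mean_se  <- mean(runs$se)                        # average reported standard error
```

Under these assumptions, the standard randomization-inference variance formula gives an estimation-error standard deviation of about 0.10, while the reported standard error averages about 0.14. The treatment effect `1 + u` varies across units, and the conventional variance estimate cannot account for that variation, so `mean_se` should exceed `sd_error` and `coverage` should come out above 0.95.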
To go further, you can get started using
R, check out our design library for more diagnoses of common designs, and read our working paper for an overview of the conceptual framework behind