The idea motivating DeclareDesign is that the core analytic features of research designs can be declared in a complete manner and saved as an object. Once properly declared, a design can easily be shared, modified, improved, and used. A design contains the information needed to implement key parts of data generation and subsequent analysis. It also contains enough information to allow researchers or third parties to query it and determine whether it can support the claims it makes. We describe this framework in greater detail in our paper.
A research design characterized in words or in code should include four components:
A model, M, of how the world works. The model specifies the moving parts — the variables — and how these are causally related to each other. In this sense the model provides the context of a study, but also a speculation about the world.
An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. In many applications I might be thought of as the “estimand.” Some inquiries are statements about the values of variables, others about the causal relations between variables. In all cases, however, the inquiry should be answerable given the model.
A data strategy, D, that generates data on variables. The data strategy implicitly includes case selection, or sampling decisions, but it also represents interventions such as assignment of treatments or measurement strategies. A model M tells you what sort of data you might observe if you employ data strategy D.
An answer strategy, A, that uses data to generate an answer.
Here is an illustration using a very simple two-arm trial.
library(DeclareDesign)

# M -- Model: Speculation on variables and relations between them
population <- declare_population(N = 100, u = rnorm(N))
potential_outcomes <- declare_potential_outcomes(Y_Z_0 = 0, Y_Z_1 = 1 + u)

# I -- Inquiry: A query defined in terms of potential outcomes
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

# D -- Data Strategy: Researcher interventions on the world
assignment <- declare_assignment(m = 50)

# A -- Answer Strategy: Conclusions to be drawn from data
estimator <- declare_estimator(Y ~ Z, estimand = estimand)

# Design: Putting it all together
design <- declare_design(population, potential_outcomes, estimand, assignment,
                         reveal_outcomes, estimator,
                         description = "A very simple design")
Use the design object to simulate data, including treatment assignments:
data <- draw_data(design)
Use the design object to implement analysis:
estimates <- get_estimates(design)
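To make concrete what a single run of this design involves, here is a minimal base-R sketch that walks through the same four components by hand. It does not call DeclareDesign and is not a description of the package's internals. The names `sim_data`, `ate`, and `estimate` are illustrative, the seed is arbitrary, and the difference in means is an assumed stand-in for whatever estimator `declare_estimator(Y ~ Z, ...)` uses by default.

```r
# Base-R sketch of one run of the two-arm design; no DeclareDesign calls are used.
set.seed(1)  # arbitrary seed, only so the sketch is reproducible

N <- 100  # population size, matching declare_population(N = 100, ...)
m <- 50   # number of treated units, matching declare_assignment(m = 50)

# M: one background variable and two potential outcomes per unit
u     <- rnorm(N)
Y_Z_0 <- rep(0, N)
Y_Z_1 <- 1 + u

# I: the average treatment effect among these N units
ate <- mean(Y_Z_1 - Y_Z_0)

# D: assign exactly m units to treatment at random, then reveal the
#    potential outcome that corresponds to each unit's assignment
Z <- sample(rep(c(0, 1), times = c(N - m, m)))
Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)
sim_data <- data.frame(u, Y_Z_0, Y_Z_1, Z, Y)

# A: difference in means (assumed stand-in for the declared estimator)
estimate <- mean(sim_data$Y[sim_data$Z == 1]) - mean(sim_data$Y[sim_data$Z == 0])

head(sim_data)
c(estimand = ate, estimate = estimate)
```

Note that the estimand is computed from both potential outcomes, which are never jointly observed in real data, while the estimate uses only `Z` and the revealed `Y`.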
The fully declared design contains the information needed to diagnose it.
diagnosis <- diagnose_design(design, sims = 10000, bootstrap_sims = 500)
| Bias | RMSE | Power | Coverage | Mean Estimand | Mean Estimate | SD Estimate | Type S Rate |
The diagnosis confirms that random assignment to treatment allows for unbiased estimates of treatment effects. It also shows that, under the current design, coverage is higher than the nominal 95% rate, which reflects the fact that the typical standard error estimators are conservative.
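The two claims above, unbiasedness and conservative coverage, can be checked with a small hand-rolled simulation. The sketch below uses base R only and is not the `diagnose_design` implementation. The helper `simulate_once`, the 2,000 repetitions, the seed, and the normal-approximation interval built from the usual unequal-variance standard error are all assumptions made for illustration.

```r
# Hand-rolled diagnosis of the two-arm design (base R only, illustrative).
set.seed(2)  # arbitrary seed, only so the sketch is reproducible

# One simulated run: returns the estimand, the estimate, and whether the
# 95% interval around the estimate contains the estimand.
simulate_once <- function(N = 100, m = 50) {
  u     <- rnorm(N)
  Y_Z_0 <- rep(0, N)
  Y_Z_1 <- 1 + u
  estimand <- mean(Y_Z_1 - Y_Z_0)

  Z <- sample(rep(c(0, 1), times = c(N - m, m)))
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)

  y1 <- Y[Z == 1]
  y0 <- Y[Z == 0]
  estimate <- mean(y1) - mean(y0)

  # Usual standard error: sum of the within-arm sampling variances
  se <- sqrt(var(y1) / length(y1) + var(y0) / length(y0))
  covered <- abs(estimate - estimand) <= qnorm(0.975) * se

  c(estimand = estimand, estimate = estimate, covered = covered)
}

# Repeat the design many times; the result is a 3 x 2000 matrix
runs <- replicate(2000, simulate_once())

error <- runs["estimate", ] - runs["estimand", ]
diagnosands <- c(
  bias     = mean(error),
  rmse     = sqrt(mean(error^2)),
  coverage = mean(runs["covered", ])
)
round(diagnosands, 3)
```

Under these assumptions one would expect the bias to be close to zero and the coverage to exceed 0.95. The standard error above ignores the variation in unit-level treatment effects, which cannot be estimated from the data, so the intervals are wider than they need to be whenever effects differ across units, as they do here through `u`.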