The idea motivating DeclareDesign is that the core analytic features of research designs can be declared in a complete manner and saved as an object. Once properly declared, a design can easily be shared, modified, improved, and used. A design contains the information needed to implement key parts of data generation and subsequent analysis. It also contains enough information to allow researchers or third parties to query it and determine whether it can support the claims it makes. We describe this framework in greater detail in our paper.

Components of a research design

A research design characterized in words or in code should include four components:

  • A model, M, of how the world works. The model specifies the moving parts — the variables — and how these are causally related to each other. In this sense the model provides the context of a study, but also a speculation about the world.

  • An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. In many applications I might be thought of as the “estimand.” Some inquiries are statements about the values of variables, others about the causal relations between variables. In all cases however the inquiry should be answerable given the model.

  • A data strategy, D, generates data on variables. Note that implicitly the data strategy includes case selection, or sampling decisions, but it also represents interventions such as assignment of treatments or measurement strategies. A model M tells you what sort of data you might observe if you employ data strategy D.

  • An answer strategy, A, that uses data to generate an answer.

A simple design declaration

Here is an illustration using a very simple two arm trial.

# M -- Model: Speculation on variables and relations between them
population <- declare_population(N = 100, u = rnorm(N))
potential_outcomes <- declare_potential_outcomes(Y_Z_0 = 0, 
                                                 Y_Z_1 = 1 + u)

# I -- Inquiry: A query defined in terms of potential outcomes
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

# D -- Data Strategy: Researcher interventions on the world
assignment <- declare_assignment(m = 50) 

# A -- Answer Strategy: Conclusions to be drawn from data
estimator <- declare_estimator(Y ~ Z, estimand = estimand)

# Design: Putting it all together
design <- declare_design(population, 
                         potential_outcomes, 
                         estimand, 
                         assignment, 
                         reveal_outcomes, 
                         estimator,
                         description = "A very simple design")

Making Use of A Design

Use the design object to simulate data, including treatment assignments:

data <- draw_data(design)
ID u Y_Z_0 Y_Z_1 Z Z_cond_prob Y
001 1.37 0 2.37 1 0.5 2.37
002 -0.56 0 0.44 1 0.5 0.44
003 0.36 0 1.36 1 0.5 1.36
004 0.63 0 1.63 0 0.5 0.00
005 0.40 0 1.40 0 0.5 0.00
006 -0.11 0 0.89 1 0.5 0.89

Use the design object to implement analysis:

estimates <- get_estimates(design)
estimator_label coefficient_name est se p ci_lower ci_upper estimand_label
my_estimator Z 0.96 0.13 0 0.7 1.2 ATE

Diagnosing a design

The fully declared design contains the information needed to diagnose it.

diagnosis <- diagnose_design(design, sims = 10000, bootstrap_sims = 500)
Bias RMSE Power Coverage Mean Estimand Mean Estimate SD Estimate Type S-Rate
Diagnosand 0 0.1 1 0.99 1 1 0.14 0
Boostrapped SE 0 0.0 0 0.00 0 0 0.00 0

The diagnosis here confirms the fact that random assignment to treatment allows for unbiased estimates of treatment effects. We can also observe that under the current design, coverage is higher than the nominal 95% rate. This reflects the fact that the typical standard error estimators are conservative.