Model-Inquiry-Data Strategy-Answer Strategy (MIDA)

The idea motivating DeclareDesign is that the core analytic features of research designs can be declared completely and saved as an object. Once properly declared, a design can easily be shared, modified, improved, and used. A design contains the information needed to implement key parts of data generation and subsequent analysis. It also contains enough information to allow researchers or third parties to query it and determine whether it can support the claims it makes. We describe this framework in greater detail in our paper.

Components of a research design

A research design characterized in words or in code should include four components:

  • A model, M, of how the world works. The model specifies the moving parts (the variables) and how they are causally related to each other. In this sense the model provides the context of a study, but it is also a speculation about the world.

  • An inquiry, I, about the distribution of variables, perhaps given interventions on some variables. In many applications I might be thought of as the "estimand." Some inquiries are statements about the values of variables, others about the causal relations between variables. In all cases, however, the inquiry should be answerable given the model.

  • A data strategy, D, that generates data on variables. The data strategy implicitly includes case selection (sampling decisions), but it also covers interventions such as the assignment of treatments, as well as measurement strategies. A model M tells you what sort of data you might observe if you employ data strategy D.

  • An answer strategy, A, that uses data to generate an answer.
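To make the division of labor concrete, here is a minimal base-R sketch that writes each component as an ordinary function and runs the four in sequence once. It does not use DeclareDesign; the function names (`model_fn`, `inquiry_fn`, `data_strategy_fn`, `answer_strategy_fn`) and the specific setup (N = 100 units, a constant effect of 0.5, 50 treated units) are illustrative assumptions chosen to match the example that follows. The point is that the inquiry is computed from the full set of potential outcomes, while the answer strategy only sees what the data strategy reveals.

```r
# Hypothetical base-R sketch: each MIDA component as an ordinary function.
# This is not the DeclareDesign API; names and parameter values are illustrative.
set.seed(42)

# M: draw units and their potential outcomes (constant effect of 0.5)
model_fn <- function(N = 100, effect = 0.5) {
  U <- rnorm(N)
  data.frame(ID = seq_len(N), U = U, Y_Z_0 = U, Y_Z_1 = U + effect)
}

# I: the inquiry summarizes potential outcomes; here, the average treatment effect
inquiry_fn <- function(dat) mean(dat$Y_Z_1 - dat$Y_Z_0)

# D: assign exactly m units to treatment, then reveal the matching potential outcome
data_strategy_fn <- function(dat, m = 50) {
  dat$Z <- as.integer(seq_len(nrow(dat)) %in% sample(nrow(dat), m))
  dat$Y <- ifelse(dat$Z == 1, dat$Y_Z_1, dat$Y_Z_0)
  dat
}

# A: the answer strategy sees only observed Y and Z (difference in means)
answer_strategy_fn <- function(dat) {
  mean(dat$Y[dat$Z == 1]) - mean(dat$Y[dat$Z == 0])
}

world    <- model_fn()
estimand <- inquiry_fn(world)
observed <- data_strategy_fn(world)
estimate <- answer_strategy_fn(observed)
c(estimand = estimand, estimate = estimate)
```

Repeating these four steps many times and comparing `estimate` with `estimand` across runs is the idea behind diagnosing a design.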

A simple design declaration

Here is an illustration using a very simple two-arm trial.

library(DeclareDesign)

# M -- Model: Speculation on variables and relations between them
model <- 
  declare_population(N = 100, U = rnorm(N)) +
  declare_potential_outcomes(Y ~ 0.5*Z + U)

# I -- Inquiry: A query defined in terms of potential outcomes
inquiry <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

# D -- Data Strategy: Researcher interventions on the world
data_strategy <- 
  declare_assignment(m = 50) +
  reveal_outcomes(Y, Z)

# A -- Answer Strategy: Conclusions to be drawn from data
answer_strategy <- 
  declare_estimator(Y ~ Z, estimand = "ATE")

# Design: Putting it all together
design <- model + inquiry + data_strategy + answer_strategy

Making use of a design

Use the design object to simulate data, including treatment assignments. Here are the first five rows:

data <- draw_data(design)
head(data, 5)

ID       U  Y_Z_0  Y_Z_1  Z  Z_cond_prob      Y
001   1.37   1.37   1.87  1          0.5   1.87
002  -0.56  -0.56  -0.06  1          0.5  -0.06
003   0.36   0.36   0.86  0          0.5   0.36
004   0.63   0.63   1.13  0          0.5   0.63
005   0.40   0.40   0.90  1          0.5   0.90
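The columns follow directly from the declaration: under the model, `Y_Z_0` equals `U` and `Y_Z_1` equals `U + 0.5`; the data strategy sets `Z` and reveals `Y` as whichever potential outcome corresponds to the assigned condition. `Z_cond_prob` is, as we read it, the probability that each unit is in the condition it actually received, which under complete random assignment of 50 out of 100 units is 50/100 = 0.5. The short base-R check below (no DeclareDesign required) retypes the five rows shown above and confirms these relationships up to rounding.

```r
# The five rows displayed above, retyped by hand (values rounded to 2 decimals)
rows <- data.frame(
  U     = c(1.37, -0.56, 0.36, 0.63, 0.40),
  Y_Z_0 = c(1.37, -0.56, 0.36, 0.63, 0.40),
  Y_Z_1 = c(1.87, -0.06, 0.86, 1.13, 0.90),
  Z     = c(1, 1, 0, 0, 1),
  Y     = c(1.87, -0.06, 0.36, 0.63, 0.90)
)

# Model: untreated outcome equals U, and the unit-level effect is 0.5
all.equal(rows$Y_Z_0, rows$U)                      # TRUE
all.equal(rows$Y_Z_1 - rows$Y_Z_0, rep(0.5, 5))    # TRUE

# Data strategy: Y is the potential outcome for the assigned condition
revealed <- rows$Z * rows$Y_Z_1 + (1 - rows$Z) * rows$Y_Z_0
all.equal(revealed, rows$Y)                        # TRUE

# Z_cond_prob under complete random assignment of m = 50 out of N = 100
50 / 100
```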

Use the design object to implement the analysis:

estimates <- draw_estimates(design)
estimates

estimator_label  term  estimate  std.error  statistic  p.value  conf.low  conf.high  df  outcome  estimand_label
estimator        Z         0.69       0.21        3.4        0      0.29        1.1  96  Y        ATE
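As we understand this version of DeclareDesign, `declare_estimator(Y ~ Z)` uses `estimatr::difference_in_means` by default, which, for complete random assignment without blocks or clusters, reports the difference in means with an unequal-variance (Neyman) standard error and Welch-Satterthwaite degrees of freedom; the `df` of 96 above is consistent with that. The base-R sketch below computes the same columns by hand on freshly simulated data, so its numbers will differ from the table. The function name `diff_in_means` is hypothetical and is not part of either package.

```r
# Hypothetical sketch: difference in means with an unequal-variance standard
# error and Welch-Satterthwaite degrees of freedom (base R only).
diff_in_means <- function(Y, Z, alpha = 0.05) {
  y1 <- Y[Z == 1]
  y0 <- Y[Z == 0]
  v1 <- var(y1) / length(y1)          # squared SE of the treated mean
  v0 <- var(y0) / length(y0)          # squared SE of the control mean
  estimate  <- mean(y1) - mean(y0)
  std_error <- sqrt(v1 + v0)
  welch_df  <- (v1 + v0)^2 / (v1^2 / (length(y1) - 1) + v0^2 / (length(y0) - 1))
  statistic <- estimate / std_error
  p_value   <- 2 * pt(abs(statistic), welch_df, lower.tail = FALSE)
  margin    <- qt(1 - alpha / 2, welch_df) * std_error
  data.frame(term = "Z", estimate = estimate, std.error = std_error,
             statistic = statistic, p.value = p_value,
             conf.low = estimate - margin, conf.high = estimate + margin,
             df = welch_df)
}

# Simulated data shaped like the design above: 100 units, 50 treated, effect 0.5
set.seed(1)
N <- 100
U <- rnorm(N)
Z <- as.integer(seq_len(N) %in% sample(N, 50))
Y <- U + 0.5 * Z
diff_in_means(Y, Z)
```

Base R's `t.test(Y[Z == 1], Y[Z == 0])`, which runs a Welch test by default, should reproduce the estimate, t statistic, degrees of freedom, and confidence interval, which makes it a convenient cross-check.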

Diagnosing a design

The fully declared design contains the information needed to diagnose it. We report the bootstrapped standard errors of the diagnosands in parentheses.

diagnosis <- diagnose_design(design, sims = 1000, bootstrap_sims = 500)

Diagnosand     Value  Bootstrap SE
N Sims          1000
Bias           -0.00  (0.01)
RMSE            0.20  (0.00)
Power           0.71  (0.01)
Coverage        0.95  (0.01)
Mean Estimate   0.50  (0.01)
SD Estimate     0.20  (0.00)
Mean SE         0.20  (0.00)
Type S Rate     0.00  (0.00)
Mean Estimand   0.50  (0.00)

  • The diagnosis confirms that random assignment to treatment yields unbiased estimates of the treatment effect: the estimated bias is -0.00, and the mean estimate (0.50) matches the mean estimand (0.50).
  • Under the current design, coverage (0.95) is at the nominal 95% rate.
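`diagnose_design()` simulates the whole design many times and summarizes how the estimates behave relative to the estimand. The following base-R sketch mimics that logic without DeclareDesign, under stated assumptions: the estimator is a Welch difference in means via `t.test()`, significance is judged at alpha = 0.05, the diagnosands follow the definitions we believe DeclareDesign uses (for example, bias is the mean of estimate minus estimand, and the Type S rate is the share of significant estimates whose sign differs from the estimand's), and the standard errors come from a simple nonparametric bootstrap over the 1,000 simulations. Allowing for simulation noise, the results should land near the table above: bias near 0, power near 0.7, and coverage near 0.95.

```r
# A base-R sketch of design diagnosis by simulation (not DeclareDesign's code).
set.seed(2024)

# One run of the design: model, inquiry, data strategy, answer strategy
simulate_once <- function(N = 100, m = 50, effect = 0.5) {
  U <- rnorm(N)
  Y_Z_0 <- U
  Y_Z_1 <- U + effect
  Z <- as.integer(seq_len(N) %in% sample(N, m))  # exactly m treated
  Y <- ifelse(Z == 1, Y_Z_1, Y_Z_0)             # reveal observed outcome
  tt <- t.test(Y[Z == 1], Y[Z == 0])            # Welch two-sample t-test
  ci <- as.numeric(tt$conf.int)
  c(estimand  = mean(Y_Z_1 - Y_Z_0),
    estimate  = mean(Y[Z == 1]) - mean(Y[Z == 0]),
    std_error = sqrt(var(Y[Z == 1]) / m + var(Y[Z == 0]) / (N - m)),
    p_value   = as.numeric(tt$p.value),
    conf_low  = ci[1],
    conf_high = ci[2])
}

# Summaries of many runs (assumed definitions, alpha = 0.05)
diagnosands <- function(sims, alpha = 0.05) {
  err <- sims[, "estimate"] - sims[, "estimand"]
  sig <- sims[, "p_value"] < alpha
  type_s <- if (any(sig)) {
    mean(sign(sims[sig, "estimate"]) != sign(sims[sig, "estimand"]))
  } else {
    NA_real_
  }
  c(bias          = mean(err),
    rmse          = sqrt(mean(err^2)),
    power         = mean(sig),
    coverage      = mean(sims[, "conf_low"] <= sims[, "estimand"] &
                         sims[, "estimand"] <= sims[, "conf_high"]),
    mean_estimate = mean(sims[, "estimate"]),
    sd_estimate   = sd(sims[, "estimate"]),
    mean_se       = mean(sims[, "std_error"]),
    type_s_rate   = type_s,
    mean_estimand = mean(sims[, "estimand"]))
}

# 1,000 simulations, one row per run
sims <- t(replicate(1000, simulate_once()))
diagnosis <- diagnosands(sims)

# Bootstrap standard errors: resample simulation rows, recompute diagnosands
boot <- replicate(500, diagnosands(sims[sample(nrow(sims), replace = TRUE), , drop = FALSE]))
boot_se <- apply(boot, 1, sd)

round(rbind(value = diagnosis, bootstrap_se = boot_se), 3)
```

Resampling whole simulation rows treats the 1,000 runs as the sample of interest, so the bootstrap standard errors describe Monte Carlo uncertainty in the diagnosands, not uncertainty within any single study.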

You can find a few different entry points into these ideas and the software tools: