Example Design

You should make sure to install any required packages. The example design uses systemfit as well as DeclareDesign.

install.packages("systemfit")
install.packages("DeclareDesign")
install.packages("knitr")
install.packages("RCurl")

Load the packages:

library(systemfit)
library(DeclareDesign)
library(knitr)
library(RCurl)

Here is one way to load external data for use in the design:

online_data <- read.csv(file = "https://declaredesign.org/example_data.csv",
                        header = TRUE)

rho <- with(online_data, rho)
tau_A <- with(online_data, tau_A)
tau_B <- with(online_data, tau_B)

Design Declaration

You may want to give an overview of the design, and provide any background info.

  • Model:

    We have two untreated potential outcomes, \(Y_A(0)\) and \(Y_B(0)\) that are distributed standard normally and correlated with each other at \(\rho = .2\). We have \(Y_A(1) = Y_A(0) + \tau_A\) and \(Y_B(1) = Y_B(0) + \tau_B\).

  • Inquiry:

    We want to know the value of \(E[Y_A(1) - Y_A(0)]\).

  • Data strategy:

    Half the units are assigned to a treatment, \(Z_A\), that only affects \(Y_A\). Units are also independently assigned to \(Z_B\), a treatment that only affects \(Y_B\).

  • Answer strategy:

    We compare two strategies: in one we simply regress \(Y_A\) on an indicator for \(Z_A\), and in the other we use include in our system of seemingly unrelated regressions (SUR) a second equation for \(Y_B\) on \(Z_B\).

Design code

Very often, designs require custom functions and data cleaning. You may want to do that separately from your design declaration. Here we make a custom sur() function to use the seemingly unrelated regression estimator in systemfit::systemfit.

sur <- function(data){
  fit <- data.frame(summary(systemfit(formula = list(YA ~ ZA, 
                                                     YB ~ ZB),
                                      data = data,
                                      method = "SUR"))$coefficients)
  names(fit) <- c("estimate","std.error","statistic","p.value")
  fit$conf.low <- with(fit, estimate - 1.96 * std.error)
  fit$conf.high <- with(fit, estimate + 1.96 * std.error)
  fit <- cbind(estimator_label = "SUR",
               term = row.names(fit),
               fit, 
               df = NA, 
               outcome = "YA",
               estimand_label = "ate_YA")
  row.names(fit) <- NULL
  fit <- subset(fit, term == "eq1_ZA")
  return(fit)
}

Now we declare the design in code.

my_design <- 
  # Model
  declare_population(N = 100,
                     YA_0 = rnorm(N),
                     YB_0 = rnorm(N, YA_0 * rho, sqrt(1 - rho^2))) +
  declare_potential_outcomes(YA ~ YA_0 + ZA * tau_A,
                             assignment_variable = "ZA") +
  declare_potential_outcomes(YB ~ YB_0 + ZB * tau_B,
                             assignment_variable = "ZB") +
  # Inquiry
  declare_estimand(ate_YA = mean(YA_ZA_1 - YA_ZA_0)) +
  # Data Strategy
  declare_assignment(m = 50, assignment_variable = "ZA") +
  declare_assignment(m = 50, assignment_variable = "ZB") +
  declare_reveal(YA,ZA) +
  declare_reveal(YB,ZB) +
  # Answer Strategy
  declare_estimator(YA ~ ZA, model = lm_robust, 
                    estimand = "ate_YA", label = "lm_robust") +
  declare_estimator(handler = sur) 

Design diagnosis

You may want to display features of the design here.

my_design_cor <- redesign(design = my_design, rho = .8)
diagnosis <- diagnose_design(my_design, my_design_cor)
kable(reshape_diagnosis(diagnosis), digits = 2)
Design Label rho Estimand Label Estimator Label Term N Sims Bias RMSE Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
my_design NA ate_YA lm_robust ZA 500 0.00 0.20 0.16 0.95 0.20 0.20 0.20 0.00 0.20
(0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00) (0.00)
my_design NA ate_YA SUR eq1_ZA 500 0.00 0.19 0.16 0.95 0.20 0.19 0.20 0.00 0.20
(0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00) (0.00)
my_design_cor 0.8 ate_YA lm_robust ZA 500 0.01 0.19 0.19 0.96 0.21 0.19 0.20 0.00 0.20
(0.01) (0.01) (0.02) (0.01) (0.01) (0.01) (0.00) (0.00) (0.00)
my_design_cor 0.8 ate_YA SUR eq1_ZA 500 0.00 0.12 0.36 0.95 0.20 0.12 0.12 0.00 0.20
(0.01) (0.00) (0.02) (0.01) (0.01) (0.00) (0.00) (0.00) (0.00)

We see that the SUR approach exhibits much lower variance in estimates and much higher power when \(\rho\) is high.