One of the simplest ways to learn about the effect of a cause is to randomly divide a group in two: one gets a treatment and the other doesn’t. By comparing the averages of the outcomes in the treatment group to those of the control group, you get to learn about the causal effect of the treatment – at least on average. Magic.

So how does it work? In fact it works only if we are willing to make a few assumptions. First, for each individual in our study, we are willing to imagine two counterfactuals: one in which they ended up in the treatment group, and one where they ended up in the control. We’ll define the unit level treatment effect as the difference between these two. Second we’ll assume that an individual’s outcome is not affected by whoever else did or did not end up in their group.^{1}

Ideally, we’d like to observe both counterfactuals for each individual, and we’d know the true causal effect by looking at the differences. But we can’t see the world in two counterfactual states at once: that’s called the fundamental problem of causal inference.

Amazingly though, while we can never learn the true causal effect of an individual, we *can* learn about the true causal effect *averaged across groups of individuals*. The basis for that inference shares an interesting parallel with random sampling.

When we take a random sample from a population, the sample becomes representative of the population in the sense that the average outcome in the sample will, on average, be the same as the average outcome in the population. Technically: the procedure delivers unbiased estimates of averages. In an experiment, it’s as though we’re taking two separate representative samples from *two counterfactual states of the world*, one from a world in which the cause is present and one from one in which it is absent. By taking the mean of the group that represents the treated potential outcomes and comparing it to the mean of the group that represents the untreated potential outcomes, we can construct an unbiased guess of the true average difference in the two states of the world.^{2}

This logic does not guarantee that an answer is *correct.* A two arm trial does not let us *see* the true treatment effect. but it does produce an estimate that is not biased up or down. Just like in random sampling, as the size of our experiment grows, our guess of the average treatment effect (ATE) will converge to the true ATE. As we shall see, however, characterizing our uncertainty about our guesses can be complicated, even in such a simple design.

## Design Declaration

here are the components of a two arm trial design:

**M**odel:Our model of the world specifies a population of \(N\) units that have a control potential outcome, \(Y_i(Z = 0)\), that is distributed standard normally. A unit’s individual treatment effect is a random draw from a distribution with mean \(\tau\) and standard deviation \(\sigma\), that is added to its control potential outcome: \(Y_i(Z = 1) = Y_i(Z = 0) + t_i\).

^{3}**I**nquiry:We want to know the average of all units’ differences in treated and untreated potential outcomes – the average treatment effect: \(E[Y_i(Z = 1) - Y_i(Z = 0)] = E[t_i] = \tau\).

**D**ata strategy:We randomly sample \(n\) units from the population of \(N\). We randomly assign a fixed number, \(m\), to treatment, and the rest of the \(n-m\) units to control.

**A**nswer strategy:We subtract the mean of the control group from the mean of the treatment group in order to estimate the average treatment effect. \(E[Y_i(Z = 1)] - E[Y_i(Z = 0)] = \hat{\tau}\).

```
N <- 100
assignment_prob <- 0.5
control_mean <- 0
control_sd <- 1
treatment_mean <- 1
treatment_sd <- 1
rho <- 1
population <- declare_population(N = N, u_0 = rnorm(N), u_1 = rnorm(n = N,
mean = rho * u_0, sd = sqrt(1 - rho^2)))
potential_outcomes <- declare_potential_outcomes(Y ~ (1 -
Z) * (u_0 * control_sd + control_mean) + Z * (u_1 * treatment_sd +
treatment_mean))
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))
assignment <- declare_assignment(prob = assignment_prob)
reveal_Y <- declare_reveal()
estimator <- declare_estimator(Y ~ Z, estimand = estimand)
two_arm_design <- population + potential_outcomes + estimand +
assignment + reveal_Y + estimator
```

## Takeaways

`diagnosis <- diagnose_design(two_arm_design)`

Term | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean Se | Type S Rate | Mean Estimand |
---|---|---|---|---|---|---|---|---|---|

Z | 0.01 | 0.20 | 1.00 | 0.94 | 1.01 | 0.20 | 0.20 | 0.00 | 1.00 |

(0.01) | (0.01) | (0.00) | (0.01) | (0.01) | (0.01) | (0.00) | (0.00) | (0.00) |

As the diagnosis reveals, the estimate of the ATE is unbiased. We could have shown this without simulations using simple algebra. Our estimator, \(\hat{\tau}\), can be written \(E[Y_i(Z = 1)] - E[Y_i(Z = 0)] = E[Y_i(Z = 1) - Y_i(Z = 0)]\) (via “the linearity of expectation”). This implies our estimand is equivalent to our estimator in expectation.

But our coverage is above what it should be (95%) – suggesting that our unbiased point estimates have biased confidence intervals! This reflects the fact that conventional estimators of the standard error for the difference in means make conservative assumptions about the true covariance in potential outcomes. Using this conservative rule of thumb, those standard error estimators tend to produce confidence intervals that are too wide.

## Exercises

Using the

`two_arm_designer()`

, declare a set of designs in which the correlation in potential outcomes goes from negative, to zero, to positive. What do you notice about the coverage? Explain with reference to the mean standard error and the standard deviation of the estimates.Modify the assignment step in the design so that individuals’ probability of assignment is a function of

`u_0`

, but don’t change anything else. What do you notice about the diagnosands of the design now? How might you change the estimation strategy to account for this updated assignment strategy?

This is often referred to as the “no spillovers” assumption. An individual’s outcome depends only on their treatment assignment status and is not affected by that of any other individual.↩

This last step used the useful fact that the difference of averages is the same as the average of differences.↩

This implies that the variance of the sample’s treated potential outcomes is higher than the variance of their control potential outcomes, although they are correlated because the treated potential outcome is created by simply adding the treatment effect to the control potential outcome.↩