DeclareDesign Blog

You can’t speak meaningfully about spillovers without specifying an estimand

A dangerous fact: it is quite possible to talk in a seemingly coherent way about strategies to answer a research question without ever properly specifying what the research question is. The risk is that you end up with the right solution to the wrong problem. The problem is particularly acute for studies where there are risks of “spillovers.”

By spillovers we mean situations where one unit’s outcome depends upon how another unit is assigned to treatment. You often hear worries that estimates of average treatment effects are biased in the presence of such spillovers. And in particular that when there are positive spillovers, estimates will be biased downwards. Sensible as these worries sound, try to state them formally and you can run into some difficulties. The key issue is that the assumption of no spillovers runs so deep that it is often invoked even prior to the definition of estimands. If you write the “average treatment effect” estimand using potential outcomes notation, as \(E(Y(1) - Y(0))\), you are already assuming that a unit’s outcomes depend only on its own assignment to treatment and not on how other units are assigned to treatment. The definition of the estimand leaves no space to even describe spillovers.

If there are in fact spillovers, then the estimand needs to be spelled out more carefully. In this case, the range of estimands to choose may be very wide and the appropriateness of different strategies is going to depend on which estimand you are shooting for.

This post shows how:

  • different types of spillover estimands might be formally declared
  • positive spillovers may produce under- or over-estimates of treatment effects
  • the merits of addressing spillovers through a sampling strategy depend on the estimand of interest

Along the way, we show how to modify designs by switching up steps at different points.

Estimands given spillovers

There are many different estimands that can take account of spillover effects, if there are any, but that correspond to the usual average treatment effects estimand when there are no spillovers. For instance, we can define the difference for a unit when it—and only it—is treated, compared to a situation in which no unit is treated. In potential outcomes notation that could be written, for unit 3, say, as:

\[\tau_3 = Y(0,0,1,0,0,\dots) - Y(0,0,0,0,0,\dots)\]

One could then define a population estimand that is the average of all these differences over a population. Note that these differences specify different counterfactual assignment vectors for each individual. In the absence of spillovers, this estimand is equivalent to the usual average treatment effect. In the presence of spillovers, this estimand is well defined, whereas the usual average treatment effect estimand is not.

There are many other possibilities. For example, the difference in outcomes when no units are treated and all units are treated. Or the difference between not being treated when all others are, and being treated when all others are. Or, the difference a change in your own condition would make given that others are assigned to the value that they actually have (see e.g. Sävje, Aronow, and Hudgens (2017)). In fact, with \(n\) units and binary assignments, we can define \(2^n\times2^n\) simple comparisons for each unit.

Below, we declare a design that allows for the possibility of spillovers. Diagnosing the design shows how severe the problem of spillovers can be for estimation. Modifying the design lets us explore different types of solutions.

How spillovers make defining and estimating average effects hard

Consider a situation in which units are grouped into triplets, indexed \(G\). Suppose that there are 80 triplets. If any member of group \(G\) is treated, then all of its members receive some equal benefit (with marginal gains possibly increasing or decreasing in the numbers treated).

We can declare this design as follows:1

N_groups <- 80
N_i_group <- 3
sd_i <- 0.2
gamma <- 2

model <- declare_model(G = add_level(N = N_groups,
                                               n = N_i_group),
                                 i = add_level(N = n, I = 1:N, zeros = 0, ones = 1))

dgp <- function(i, Z, G, n) (sum(Z[G == G[i]])/n[i])^gamma + rnorm(1,0,sd_i)

inquiry <- declare_inquiry(Treat_one = mean(
                 sapply(I, function(i) dgp(i, I==i, G, n) - dgp(i, zeros, G, n))

assign    <- declare_assignment(Z = complete_ra(N, prob = .5))

measurement  <- declare_measurement(Y = sapply(1:N, function(i) dgp(i, Z, G, n)))

estimator <- declare_estimator(Y ~ Z,  model = lm_robust,
                               label = "naive", inquiry = "Treat_one")

spillover_design <- model + inquiry + assign + measurement + estimator

The most complex part of this design declaration involves the specification of the estimand. We define a helper function, dgp, which reports an individual’s outcome given the full treatment assignment vector. We then apply that function to each unit separately and take the average.

This design produces data that looks like this:

G n i I zeros ones Z Y
01 3 001 1 0 1 1 1.1426481
01 3 002 2 0 1 1 0.8914236
01 3 003 3 0 1 1 1.1771557
02 3 004 4 0 1 0 0.0413922
02 3 005 5 0 1 0 -0.0904998
02 3 006 6 0 1 1 0.4877476

And we can diagnose the design like this:

diagnosis <- diagnose_design(spillover_design)
Bias RMSE Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
0.22 0.23 1.00 0.00 0.33 0.05 0.04 0.00 0.11
(0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)

We see considerable bias here because the difference-in-means estimator does not account for within-group spillovers. Many units in the control condition are affected by the treatments received by other members of their group.

Interestingly, in this case, we have an overestimate of the effect, even though there are positive spillovers. In the assumed model, there are increasing returns to spillovers. Units that are treated are also likely to be in groups with a higher proportion of treated units (since, by definition, at least one member of their group is already treated). On average, treated units thus receive more positive spillover than untreated units, leading to exaggerated estimates.

Whether a sparser sample helps depends on the estimand of interest

There are multiple solutions to this kind of problem. How well they work depends, however, on how well we understand the structure of spillover effects in the first place.

One approach is to alter assignment strategies (see, e.g., Bowers et al. (2018)). An even simpler one is to employ a sparser sample. This approach seems to go against one of the few near-universal principles of research design: study as many units as possible.

But here there may be some wisdom to it. Although more units generally means greater precision, there may be a cost to this when there are risks of spillovers. If a larger study means treating more units, and this means more units interfere with each other, you might end up with a spuriously precise, biased estimate.

Here is an alternative design that implements the original data and analysis strategies on a sample of one subject per group:

sparse_design <- insert_step(spillover_design, after = "Treat_one",
                          declare_sampling(S = strata_rs(strata = G, prob = 1/3)))

Note that it is important that the sampling takes place here after the definition of the estimand. Spillovers operate as a function of population characteristics, not just sample characteristics.

Our narrower sampling strategy reduces the N by two-thirds (from 240 to 80). But the results are now unbiased. Note that we did not change our original assignment strategy, in which units are assigned to treatment or to control with .5 probability. Rather, we changed the sampling strategy. Before, we kept whole groups and assigned no members or several members to treatment. We now select one member per group and assign them to treatment or to control. Because we only select one member per group, and spillovers only go within and not between groups, there is no way for our treatment or control units to receive spillovers.

Here is a diagnosis of this sparser design:

diagnosis <- diagnose_design(sparse_design, sims = sims)
Bias RMSE Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
0.00 0.05 0.70 0.93 0.11 0.04 0.04 0.00 0.11
(0.00) (0.00) (0.01) (0.01) (0.00) (0.00) (0.00) (0.00) (0.00)

We have an unbiased design. Here though getting an unbiased design depended on having a good understanding of the nature of spillovers. In this case, we made use of the fact there would be no spillovers between groups.

However, how good this solution is also depends on what exactly the estimand is. What if the estimand were the average difference in outcomes between a world in which no unit is treated and one in which all units are treated?

We answer this question by declaring a new design that adds in the new estimand and links it to the estimator.

We first define the steps:

new_estimand  <- declare_inquiry(Treat_all = mean(
                  sapply(I, function(i) dgp(i,ones,G,n) - dgp(i, zeros, G,n))

new_estimator <- declare_estimator(Y ~ Z, model = lm_robust,
                                   estimand = list("Treat_one", "Treat_all"))

We then splice the steps in where we want them (estimand before the sampling, estimator after the sampling):

sparse_design <- insert_step(sparse_design, after = "Treat_one", new_estimand)
sparse_design <- replace_step(sparse_design, "naive", new_estimator)

The diagnosis highlights how poorly the sampling solution works for this new estimand:

Inquiry Bias RMSE Power Coverage Mean Estimate SD Estimate Mean Se Type S Rate Mean Estimand
Treat_all -0.89 0.89 0.68 0.00 0.11 0.05 0.04 0.00 1.00
(0.00) (0.00) (0.01) (0.00) (0.00) (0.00) (0.00) (0.00) (0.00)
Treat_one 0.00 0.05 0.68 0.93 0.11 0.05 0.04 0.00 0.11
(0.00) (0.00) (0.01) (0.01) (0.00) (0.00) (0.00) (0.00) (0.00)

The sparser design fails because the treatment density is insufficient to reveal the potential outcomes required by the estimand: you cannot figure out the effects of saturation if you do not saturate.

A future post will describe approaches to addressing spillovers that work through estimation strategies, rather than through sampling and assignment strategies.

Post edited to reflect name change to spillover_designer() on 11/12/2018.


Bowers, Jake, Bruce A. Desmarais, Mark Frederickson, Nahomi Ichino, Hsuan-Wei Lee, and Simi Wang. 2018. “Models, Methods and Network Topology: Experimental Design for the Study of Interference.” Social Networks 54: 196–208.

Sävje, Fredrik, Peter M. Aronow, and Michael G. Hudgens. 2017. “Average Treatment Effects in the Presence of Unknown Interference.”

  1. In fact, here we just made some minor edits to the code pulled from the DesignLibrary package using get_design_code(spillover_designer()).↩︎