DeclareDesign Blog
Get me a random assignment YESTERDAY 2018/12/04

You’re partnering with an education nonprofit and planning to run a randomized controlled trial in 80 classrooms spread across 20 community schools. The request is in: please send us a spreadsheet with random assignments. The assignment’s gotta be blocked by school, it’s gotta be reproducible, and it’s gotta be tonight. The good news is that you can do all this in a couple of lines of code. We show how using some DeclareDesign tools and then walk through how to handle more complex cases.
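To make the request concrete, here is a minimal language-agnostic sketch in plain Python. It is not the DeclareDesign API. It assumes a hypothetical roster with exactly 4 classrooms in each of the 20 schools and treats half of each school's classrooms; the function name `blocked_assignment`, the seed value, and the column names are all illustrative.

```python
import csv
import io
import random


def blocked_assignment(classrooms, seed):
    """Assign half of the classrooms in each school (block) to treatment.

    classrooms: list of (school_id, classroom_id) pairs.
    Returns a dict mapping (school_id, classroom_id) to 0 or 1.
    """
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible

    # Group classrooms by school so that randomization happens within blocks.
    by_school = {}
    for school, room in classrooms:
        by_school.setdefault(school, []).append(room)

    assignment = {}
    for school in sorted(by_school):
        rooms = sorted(by_school[school])  # sort first so input order is irrelevant
        rng.shuffle(rooms)
        n_treated = len(rooms) // 2
        for i, room in enumerate(rooms):
            assignment[(school, room)] = 1 if i < n_treated else 0
    return assignment


# Hypothetical roster: 20 schools with 4 classrooms each (80 classrooms).
roster = [(s, c) for s in range(1, 21) for c in range(1, 5)]
assignment = blocked_assignment(roster, seed=20181204)

# Write the assignments as CSV text that can be saved and sent as a spreadsheet.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["school", "classroom", "Z"])
for (school, room), z in sorted(assignment.items()):
    writer.writerow([school, room, z])
spreadsheet = buffer.getvalue()
```

Rerunning with the same seed and roster returns the same assignment, which is what makes the spreadsheet reproducible.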

Randomization does not justify t-tests. How worried should I be? 2018/11/27

Deaton and Cartwright (2017) provide multiple arguments against claims that randomized trials should be thought of as a kind of gold standard of scientific evidence. One striking argument they make is that randomization does not justify the statistical tests that researchers typically use. They are right about that. Even if researchers can claim that their estimates of uncertainty are justified by randomization, their habitual use of those estimates to conduct t-tests is not. To get a handle on how severe the problem is, we replicate the results in Deaton and Cartwright (2017) and then use a wider set of diagnosands to probe more deeply. Our investigation suggests that what at first seems like a big problem might not in fact be so great if your hypotheses are what they often are for experimentalists: sharp and sample-focused.
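For a sense of what a test of a sharp, sample-focused hypothesis can look like, here is a small sketch of an exact randomization test. It is an illustration only and not the analysis from the post: the function name `randomization_p_value` and the toy data are hypothetical, and it assumes complete random assignment of a fixed number of units with a sample small enough to enumerate every possible assignment.

```python
from itertools import combinations
from statistics import mean


def randomization_p_value(outcomes, treated):
    """Two-sided exact p-value under the sharp null of no effect for any unit.

    outcomes: observed outcomes, one per unit.
    treated: indices of the units that were actually treated.
    """
    n, m = len(outcomes), len(treated)

    def diff_in_means(treated_idx):
        t = set(treated_idx)
        treated_mean = mean(outcomes[i] for i in t)
        control_mean = mean(outcomes[i] for i in range(n) if i not in t)
        return treated_mean - control_mean

    observed = abs(diff_in_means(treated))
    # Under the sharp null, outcomes are fixed, so we can recompute the
    # statistic for every assignment that could have been drawn.
    draws = [abs(diff_in_means(c)) for c in combinations(range(n), m)]
    return sum(d >= observed - 1e-12 for d in draws) / len(draws)


# Toy data (hypothetical): 8 units, the last 4 treated.
toy_outcomes = [0, 0, 0, 0, 10, 10, 10, 10]
p = randomization_p_value(toy_outcomes, treated=(4, 5, 6, 7))
```

The reference distribution here comes from the assignment procedure itself, not from a t distribution, so the test is justified by the randomization under the sharp null.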

Instead of avoiding spillovers, you can model them 2018/11/20

Spillovers are often seen as a nuisance that leads researchers into error when estimating effects of interest. In a previous post, we discussed sampling strategies to reduce these risks. A more substantively satisfying approach is to try to study spillovers directly. If we do it right, we can remove errors in our estimation of primary quantities of interest and learn about how spillovers work at the same time.

What does a p-value tell you about the probability a hypothesis is true? 2018/11/13

The humble $$p$$-value is much maligned and terribly misunderstood. The problem is that everyone wants to know the answer to the question: “what is the probability that [hypothesis] is true?” But $$p$$ answers a different (and not terribly useful) question: “how (un)surprising is this evidence given [hypothesis]?” Can $$p$$ shed light on the question we really care about? Maybe, though there are dangers.
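One simple way to see the gap between the two questions is Bayes' rule applied to a significance test. The sketch below is an illustration under stated assumptions and not the analysis from the post: it assumes a prior probability that the null is true, a significance level, and a known power against the alternative, and the function name and the input values are hypothetical.

```python
def prob_null_given_significant(prior_null, alpha, power):
    """P(null is true | result is significant), by Bayes' rule.

    prior_null: prior probability that the null hypothesis is true.
    alpha: probability of a significant result when the null is true.
    power: probability of a significant result when the null is false.
    """
    false_positives = alpha * prior_null
    true_positives = power * (1 - prior_null)
    return false_positives / (false_positives + true_positives)


# Same test, different priors: the answer to "is the hypothesis true?" moves a lot.
even_odds = prob_null_given_significant(prior_null=0.5, alpha=0.05, power=0.8)
long_shot = prob_null_given_significant(prior_null=0.9, alpha=0.05, power=0.8)
```

The same significant result is much weaker evidence against the null when the null was very likely to begin with, which is one of the dangers in reading $$p$$ as the probability that a hypothesis is true.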

Common estimators of uncertainty overestimate uncertainty 2018/11/07

Random assignment provides a justification not just for estimates of effects but also for estimates of uncertainty about effects. The basic approach, due to Neyman, is to estimate the variance of the difference between treatment and control outcomes using the variability that can be observed among units in control and among units in treatment. It’s an ingenious approach and dispenses with the need to make any assumptions about the shape of statistical distributions or about asymptotics. The problem, though, is that it can sometimes be upwardly biased, meaning that it might lead you to maintain null hypotheses when you should be rejecting them. We use design diagnosis to get a handle on how great this problem is and how it matters for different estimands.
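The upward bias can be checked exactly in a tiny example. The sketch below uses made-up potential outcomes for 8 units with heterogeneous effects, enumerates every way of treating 4 of them, and compares the true variance of the difference-in-means estimator with the average Neyman variance estimate. The data and variable names are hypothetical; this is not the diagnosis reported in the post.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical potential outcomes: the unit-level effect y1 - y0 varies by unit.
y0 = [0, 1, 2, 3, 4, 5, 6, 7]
y1 = [0, 2, 4, 6, 8, 10, 12, 14]
n, m = len(y0), 4  # 8 units, 4 treated under complete random assignment

estimates, neyman_estimates = [], []
for treated in combinations(range(n), m):
    treated = set(treated)
    yt = [y1[i] for i in treated]
    yc = [y0[i] for i in range(n) if i not in treated]
    estimates.append(mean(yt) - mean(yc))
    # Neyman estimator: sample variance in each arm divided by the arm size.
    neyman_estimates.append(variance(yt) / m + variance(yc) / (n - m))

# True sampling variance of the estimator over all equally likely assignments.
true_variance = pvariance(estimates)
average_neyman = mean(neyman_estimates)
overstatement = average_neyman - true_variance

# The gap equals the variance of unit-level effects divided by n.
effects = [a - b for a, b in zip(y1, y0)]
expected_gap = variance(effects) / n
```

With constant effects the gap disappears, which is why the size of the problem depends on how much effects vary across units.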

Cluster randomized trials can be biased when cluster sizes are heterogeneous 2018/10/31

In many experiments, random assignment is performed at the level of clusters. Researchers are conscious that in such cases they cannot rely on the usual standard errors and they should take account of this feature by clustering their standard errors. Another, more subtle, risk in such designs is that if clusters are of different sizes, clustering can actually introduce bias, even if all clusters are assigned to treatment with the same probability. Luckily, there is a relatively simple fix that you can implement at the design stage.
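A toy example makes the bias visible. The sketch below uses four hypothetical clusters whose treatment effects are larger in the bigger clusters and enumerates every equally likely assignment of two clusters to treatment. It then repeats the calculation when clusters are first paired by size. Blocking on cluster size is our assumption here about the kind of design-stage fix that can help; the numbers and names are illustrative only.

```python
from itertools import combinations

# Hypothetical clusters: two small, two large; effects are larger in big clusters.
sizes = [10, 10, 100, 100]
y0 = [0.0, 0.0, 0.0, 0.0]   # untreated outcome, identical within each cluster
tau = [0.0, 0.0, 1.0, 1.0]  # cluster-specific treatment effect


def diff_in_means(treated):
    """Individual-level difference in means for a given set of treated clusters."""
    control = [k for k in range(len(sizes)) if k not in treated]
    treated_total = sum(sizes[k] * (y0[k] + tau[k]) for k in treated)
    control_total = sum(sizes[k] * y0[k] for k in control)
    treated_n = sum(sizes[k] for k in treated)
    control_n = sum(sizes[k] for k in control)
    return treated_total / treated_n - control_total / control_n


def expected_estimate(assignments):
    """Average estimate over a list of equally likely assignments."""
    return sum(diff_in_means(a) for a in assignments) / len(assignments)


# Estimand: the average treatment effect across individuals.
true_ate = sum(s * t for s, t in zip(sizes, tau)) / sum(sizes)

# Unblocked: any 2 of the 4 clusters are treated.
unblocked = list(combinations(range(4), 2))
# Blocked on size: one small cluster and one large cluster are treated.
blocked = [(a, b) for a in (0, 1) for b in (2, 3)]

bias_unblocked = expected_estimate(unblocked) - true_ate
bias_blocked = expected_estimate(blocked) - true_ate
```

In the unblocked design the number of treated individuals varies from draw to draw, so the difference in means is a ratio of random quantities and need not be centered on the estimand. Pairing by size fixes the treated group's size, and the bias vanishes in this example.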

With great power comes great responsibility 2018/10/23

We usually think that the bigger the study the better. And so huge studies often rightly garner great publicity. But the ability to generate more precise results also comes with a risk. If study designs are at risk of bias and readers (or publicists!) employ a statistical significance filter, then big data might not remove threats of bias and might actually make things worse.

How misleading are clustered SEs in designs with few clusters? 2018/10/16

Cluster-robust standard errors are known to behave badly with too few clusters. There is a great discussion of this issue by Berk Özler, “Beware of studies with a small number of clusters,” drawing on work by Cameron, Gelbach, and Miller (2008). See also this nice post by Cyrus Samii and a recent treatment by Esarey and Menger (2018). A rule of thumb is to start worrying about sandwich estimators when the number of clusters goes below 40. But here we show that diagnosis of a canonical design suggests that some sandwich approaches fare quite well even with fewer than 10 clusters.