By Clara Bicalho, Sisi Huang, and Markus Konrad
We often think of an instrumental variable (\(Z\)) as a random shock that generates exogenous variation in a treatment of interest \(X\). The randomness of \(Z\) lets us identify the effect of \(X\) on \(Y\), at least for units whose \(X\) is perturbed by \(Z\), which is not possible by just looking at the relationship between \(X\) and \(Y\). Surprisingly, though, if effects are constant, the instrumental variables estimator can be consistent for the effect of \(X\) on \(Y\) even when the relationship between the instrument (\(Z\)) and the endogenous variable (\(X\)) is confounded (see, for example, Hernán and Robins 2006). That’s the good news. The less good news is that when there is effect heterogeneity you can get good estimates for some units, but it can be hard to know which units those are (Swanson and Hernán 2017). We use a declaration and diagnosis to illustrate these insights.
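To make the constant-effects claim concrete, here is a minimal simulation sketch. It is not the declaration from the post. Everything in it is an assumption chosen for illustration: a linear model, a continuous instrument, an unobserved \(W\) that drives both \(Z\) and \(X\), an unobserved \(U\) that confounds \(X\) and \(Y\), and a constant effect of 2. In this setup \(Z\) has no causal effect on \(X\) at all, so the \(Z\)–\(X\) relationship is entirely confounded.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma = sum(a) / len(a)
    mb = sum(b) / len(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)

def simulate_iv(n=20000, effect=2.0, seed=1):
    """Return (IV estimate, naive regression slope) for one simulated data set."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        w = rng.gauss(0, 1)  # common cause of Z and X (confounds Z and X)
        u = rng.gauss(0, 1)  # common cause of X and Y (confounds X and Y)
        zi = w + rng.gauss(0, 1)
        xi = w + u + rng.gauss(0, 1)            # Z does not enter X directly
        yi = effect * xi + u + rng.gauss(0, 1)  # constant effect, Z excluded
        z.append(zi)
        x.append(xi)
        y.append(yi)
    iv = cov(z, y) / cov(z, x)   # ratio (Wald-type) IV estimator
    ols = cov(x, y) / cov(x, x)  # slope from regressing Y on X
    return iv, ols

iv_estimate, ols_estimate = simulate_iv()
print(f"IV: {iv_estimate:.2f}  naive slope: {ols_estimate:.2f}")
```

Under these assumptions the IV ratio should land close to the constant effect, while the naive slope is pulled upward by \(U\). The sketch says nothing about the heterogeneous-effects case, where the ratio no longer has this simple interpretation.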
An obvious requirement of a good research design is that the question it seeks to answer does in fact have an answer, at least under plausible models of the world. But we can sometimes get quite far along a research path without being conscious that the questions we ask do not have answers and the answers we get are answering different questions.
We sometimes worry about whether we need to model data generating processes correctly. For example, suppose you have an ordinal outcome variable on a five-point Likert scale. How should you model the data generation process? Do you need to model it at all? Go-to approaches include ordered probit and ordered logit models, which are designed for this kind of outcome variable. But maybe you don’t need them. After all, the argument that the difference-in-means procedure estimates the treatment effect doesn’t depend on any assumptions about the type of data (ordered, count, censored, etc.), as long as expectations are defined. We diagnose a design that hedges by using both differences in means and an ordered probit model. We do so assuming that the ordered probit model correctly describes data generation. Which does better?
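The difference-in-means half of that comparison can be sketched without any model fitting. The snippet below assumes a latent-normal (ordered probit) data generating process with hypothetical cutpoints and a hypothetical latent effect of 0.5. It computes the average treatment effect on the 1 to 5 scale implied by that process and compares it with a simulated difference in means. It does not fit the ordered probit model, so it cannot answer the "which does better" question on its own.

```python
import random
from statistics import NormalDist, mean

CUTPOINTS = [-1.0, -0.3, 0.3, 1.0]  # hypothetical thresholds, 5 categories
TAU = 0.5                           # hypothetical effect on the latent scale

def likert(latent):
    """Map a latent score to a 1-5 response by counting thresholds crossed."""
    return 1 + sum(latent > c for c in CUTPOINTS)

def expected_response(shift):
    """E[Y] on the 1-5 scale when the latent mean is `shift`."""
    phi = NormalDist().cdf
    return 1 + sum(1 - phi(c - shift) for c in CUTPOINTS)

def true_ate():
    """E[Y(1)] - E[Y(0)] on the response scale implied by the latent model."""
    return expected_response(TAU) - expected_response(0.0)

def difference_in_means(n=20000, seed=2):
    """Simulate a two-arm experiment and return the difference in means."""
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        d = rng.random() < 0.5  # simple random assignment
        y = likert(TAU * d + rng.gauss(0, 1))
        (treated if d else control).append(y)
    return mean(treated) - mean(control)

ate = true_ate()
dim = difference_in_means()
print(f"true ATE: {ate:.3f}  difference in means: {dim:.3f}")
```

The point of the sketch is only that the difference in means targets the response-scale effect without knowing the cutpoints or the latent distribution.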
Qualitative process-tracing sometimes seeks to answer “cause of effects” questions using within-case data: how probable is the hypothesis that \(X\) did in fact cause \(Y\)? Fairfield and Charman (2017), for example, ask whether the right changed position on tax reform during the 2005 Chilean presidential election (\(Y\)) because of anti-inequality campaigns (\(X\)) by examining whether the case study narrative bears evidence that you would only expect to see if this were true. When inferential logics are so clearly articulated, it becomes possible to do design declaration and diagnosis. Here we declare a Bayesian process-tracing design and use it to think through choices about what kinds of within-case information have the greatest probative value.
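The updating step behind "probative value" is just Bayes' rule, and a small sketch may help fix ideas. The clue names below borrow the common "smoking gun" and "hoop" labels from the process-tracing literature. The probabilities and the 0.5 prior are hypothetical numbers chosen for illustration, not values from Fairfield and Charman or from the post.

```python
def posterior(prior, p_clue_if_true, p_clue_if_false, clue_observed=True):
    """Posterior probability of the hypothesis after looking for one clue."""
    if not clue_observed:
        # Not finding the clue is itself evidence: use complementary likelihoods.
        p_clue_if_true = 1 - p_clue_if_true
        p_clue_if_false = 1 - p_clue_if_false
    numerator = prior * p_clue_if_true
    return numerator / (numerator + (1 - prior) * p_clue_if_false)

# Hypothetical clue types: (P(clue | H true), P(clue | H false)).
clues = {
    "smoking gun": (0.30, 0.05),  # rarely found, but very telling when found
    "hoop": (0.95, 0.60),         # almost always found if H is true
}

for name, (p_true, p_false) in clues.items():
    seen = posterior(0.5, p_true, p_false, clue_observed=True)
    not_seen = posterior(0.5, p_true, p_false, clue_observed=False)
    print(f"{name}: found -> {seen:.2f}, not found -> {not_seen:.2f}")
```

With numbers like these, finding the first kind of clue moves beliefs a lot while failing to find it moves them little, and the second kind behaves the other way around. Which kind is worth collecting therefore depends on what you expect to find.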
Data collection is expensive, and we often get only one bite at the apple. In response, we often conduct an inexpensive (and small) pilot test to help design the study better. Pilot studies have many virtues, including practicing the logistics of data collection and improving measurement tools. But using noisy pilot estimates to determine sample sizes for the scale-up comes with risks.
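One way to see the risk is to plug pilot estimates into a textbook sample-size formula and look at how the answers spread. The sketch below assumes a two-arm comparison of means with unit-variance outcomes, a hypothetical true effect of 0.2 standard deviations, pilots with 25 units per arm, and the usual normal-approximation formula for 80% power at the 5% level. None of these numbers come from the post.

```python
import random
from statistics import NormalDist, mean, median

Z_ALPHA = NormalDist().inv_cdf(0.975)  # two-sided test at the 5% level
Z_POWER = NormalDist().inv_cdf(0.80)   # 80% power target

def n_per_arm(effect_sd):
    """Approximate units per arm needed to detect `effect_sd` (unit variance)."""
    return 2 * (Z_ALPHA + Z_POWER) ** 2 / effect_sd ** 2

def pilot_estimate(rng, true_effect, pilot_n_per_arm):
    """Difference in means from one small simulated pilot."""
    treated = [true_effect + rng.gauss(0, 1) for _ in range(pilot_n_per_arm)]
    control = [rng.gauss(0, 1) for _ in range(pilot_n_per_arm)]
    return mean(treated) - mean(control)

def simulate(true_effect=0.2, pilot_n_per_arm=25, sims=2000, seed=3):
    """Compare the sample size truly needed with what pilots would suggest."""
    rng = random.Random(seed)
    needed = n_per_arm(true_effect)
    planned = []
    for _ in range(sims):
        d = pilot_estimate(rng, true_effect, pilot_n_per_arm)
        if d != 0:  # guard against division by zero
            planned.append(n_per_arm(d))
    share_too_small = sum(p < needed for p in planned) / len(planned)
    return needed, median(planned), share_too_small

needed, median_planned, share_too_small = simulate()
print(f"needed per arm: {needed:.0f}")
print(f"median pilot-based plan: {median_planned:.0f}")
print(f"share of pilots implying too small a study: {share_too_small:.2f}")
```

Under these assumptions, more than half of the simulated pilots point to a study smaller than the one actually required, because any pilot that overstates the effect in absolute value understates the sample size.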
We’re in an observational study setting in which treatment assignment was not controlled by the researcher. We have pre-treatment data on baseline outcomes and we’d like to incorporate them, mainly to decrease bias due to confounding but also, ideally, to increase precision. One approach is to use the difference between pre and post outcomes as the outcome variable; another is to use the baseline data as a control. Which is better?
Mostly we use design diagnostics to assess issues that arise because of design decisions. But you can also use these tools to examine issues that arise after implementation. Here we look at risks from publication bias and illustrate two distinct types of upwards bias that arise from a “significance filter.” A journal for publishing null results might help, but the results in there are also likely to be biased, downwards.
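The basic mechanism of a significance filter can be sketched in a few lines. The snippet assumes many hypothetical studies of the same true effect (0.2) with the same standard error (0.1), and sorts estimates by whether they clear a two-sided 5% threshold. It illustrates the filter only. It does not try to separate the two types of upward bias discussed in the post.

```python
import random
from statistics import mean

def significance_filter(true_effect=0.2, se=0.1, studies=20000, seed=4):
    """Return mean estimates among significant and non-significant studies."""
    rng = random.Random(seed)
    published, file_drawer = [], []
    for _ in range(studies):
        estimate = rng.gauss(true_effect, se)       # one study's estimate
        significant = abs(estimate / se) > 1.96     # two-sided 5% test
        (published if significant else file_drawer).append(estimate)
    return mean(published), mean(file_drawer)

TRUE_EFFECT = 0.2
published_mean, file_drawer_mean = significance_filter(true_effect=TRUE_EFFECT)
print(f"true effect: {TRUE_EFFECT:.2f}")
print(f"mean of significant estimates: {published_mean:.2f}")
print(f"mean of non-significant estimates: {file_drawer_mean:.2f}")
```

Conditioning on significance pushes the average estimate above the truth, and conditioning on non-significance pushes it below, which is why a journal of null results would not be unbiased either.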
We’ll be back on January 7 – Happy New Year!
In designs in which a treatment is assigned in clusters (e.g. classrooms), it’s usual practice to account for cluster-level correlations when you generate estimates of uncertainty about estimated effects. But units often share commonalities at higher levels, such as at a block level (e.g. schools). Sometimes you need to take account of this and sometimes you don’t. We show an instance of the usual procedure of clustering by assignment cluster (classrooms) working well and show how badly you can do with a more conservative approach (clustering by schools). We then show an example of a design in which clustering at the level of treatment assignment (classroom) is not good enough; in the troublesome example, schools are thought of as being sampled from a larger population of schools and treatment effects are different in different schools. In this case, if you want estimates of uncertainty for population level effects you have to cluster at the school level even though treatment is assigned within schools.