Blog on A Hugo website
/blog.html
Recent content in Blog on A Hugo website
Now there is a web interface for declaring and diagnosing research designs
/blog/now-there-is-a-web-interface-for-declaring-and-diagnosing-research-designs.html
Wed, 08 Jan 2020 00:00:00 +0000
By Clara Bicalho, Sisi Huang, and Markus Konrad
An instrument does not have to be exogenous to be consistent
/blog/2019-02-05-instrumental-variables.html
Tue, 19 Feb 2019 00:00:00 +0000
We often think of an instrumental variable (Z) as a random shock that generates exogenous variation in a treatment of interest X. The randomness of Z lets us identify the effect of X on Y, at least for units for which Z perturbs X, in a way that’s not possible by just looking at the relationship between X and Y. But surprisingly, we think, if effects are constant the instrumental variables estimator can be consistent for the effect of X on Y even when the relationship between the instrument (Z) and the endogenous variable (X) is confounded (for example, Hernán and Robins (2006)). That’s the good news. Less good news is that when there is effect heterogeneity you can get good estimates for some units but it can be hard to know which units those are (Swanson and Hernán 2017). We use a declaration and diagnosis to illustrate these insights.
Some designs have badly posed questions and design diagnosis can alert you to the problem
/blog/some-designs-have-badly-posed-questions-and-design-diagnosis-can-alert-you-to-the-problem.html
Tue, 12 Feb 2019 00:00:00 +0000
An obvious requirement of a good research design is that the question it seeks to answer does in fact have an answer, at least under plausible models of the world. But we can sometimes get quite far along a research path without being conscious that the questions we ask do not have answers and the answers we get are answering different questions.
Estimating Average Treatment Effects with Ordered Probit: Is it worth it?
/blog/estimating-average-treatment-effects-with-ordered-probit-is-it-worth-it.html
Wed, 06 Feb 2019 00:00:00 +0000
We sometimes worry about whether we need to model data generating processes correctly. For example, say you have ordinal outcome variables on a five-point Likert scale. How should you model the data generation process? Do you need to model it at all? Go-to approaches include ordered probit and ordered logit models, which are designed for this kind of outcome variable. But maybe you don’t need them. After all, the argument that the difference-in-means procedure estimates the treatment effect doesn’t depend on any assumptions about the type of data (ordered, count, censored, etc.), as long as expectations are defined. We diagnose a design that hedges by using both differences in means and an ordered probit model. We do so assuming that the ordered probit model correctly describes data generation. Which does better?
What can you learn from simulating qualitative inference strategies?
/blog/2019-01-30-process-tracing.html
Wed, 30 Jan 2019 00:00:00 +0000
Qualitative process-tracing sometimes seeks to answer “cause of effects” claims using within-case data: how probable is the hypothesis that X did in fact cause Y? Fairfield and Charman (2017), for example, ask whether the right changed position on tax reform during the 2005 Chilean presidential election (Y) because of anti-inequality campaigns (X) by examining whether the case study narrative bears evidence that you would only expect to see if this were true. When inferential logics are so clearly articulated, it becomes possible to do design declaration and diagnosis. Here we declare a Bayesian process-tracing design and use it to think through choices about what kinds of within-case information have the greatest probative value.
Should a pilot study change your study design decisions?
/blog/2019-01-23-pilot-studies.html
Wed, 23 Jan 2019 00:00:00 +0000
Data collection is expensive, and we often only get one bite at the apple. In response, we often conduct an inexpensive (and small) pilot test to help better design the study. Pilot studies have many virtues, including practicing the logistics of data collection and improving measurement tools. But using pilots to get noisy estimates in order to determine sample sizes for scale up comes with risks.
Use change scores or control for pre-treatment outcomes? Depends on the true data generating process
/blog/use-change-scores-or-control-for-pre-treatment-outcomes-depends-on-the-true-data-generating-process.html
Tue, 15 Jan 2019 00:00:00 +0000
We’re in an observational study setting in which treatment assignment was not controlled by the researcher. We have pre-treatment data on baseline outcomes and we’d like to incorporate them, mainly to decrease bias due to confounding but also, ideally, to increase precision. One approach is to use the difference between pre and post outcomes as the outcome variable; another is to use the baseline data as a control. Which is better?
A journal of null results is a flawed fix for a significance filter
/blog/a-journal-of-null-results-is-a-flawed-fix-for-a-significance-filter.html
Tue, 08 Jan 2019 00:00:00 +0000
Mostly we use design diagnostics to assess issues that arise because of design decisions. But you can also use these tools to examine issues that arise after implementation. Here we look at risks from publication bias and illustrate two distinct types of upwards bias that arise from a “significance filter.” A journal for publishing null results might help, but the results in there are also likely to be biased, downwards.
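The logic of a significance filter can be sketched with a small simulation. This is a minimal illustration rather than the design declared in the post: it assumes a hypothetical true effect of 0.2, a known standard error of 0.2, normally distributed estimates, and a two-sided 5% test. The function name `simulate_filter` and all parameter values are made up for this example.

```python
import random
import statistics


def simulate_filter(true_effect=0.2, se=0.2, n_studies=20000, seed=1):
    """Simulate many studies and split their estimates by significance.

    Assumption: each study's estimate is drawn from
    Normal(true_effect, se) and is tested against zero at the 5% level.
    Returns the mean significant estimate and the mean null estimate.
    """
    rng = random.Random(seed)
    significant, null_results = [], []
    for _ in range(n_studies):
        estimate = rng.gauss(true_effect, se)
        if abs(estimate / se) > 1.96:
            significant.append(estimate)   # passes the filter
        else:
            null_results.append(estimate)  # lands in the "null results" pile
    return statistics.fmean(significant), statistics.fmean(null_results)


mean_significant, mean_null = simulate_filter()
print(f"true effect 0.20 | significant: {mean_significant:.2f} | null: {mean_null:.2f}")
```

Under these assumptions the average of the significant estimates sits above the true effect and the average of the non-significant estimates sits below it, so neither pile on its own recovers the truth.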
DeclareDesign Holiday Hiatus
/blog/declaredesign-holiday-hiatus.html
Thu, 20 Dec 2018 00:00:00 +0000
We’ll be back on January 7 – Happy New Year!
Sometimes you need to cluster standard errors above the level of treatment
/blog/sometimes-you-need-to-cluster-standard-errors-above-the-level-of-treatment.html
Tue, 18 Dec 2018 00:00:00 +0000
In designs in which a treatment is assigned in clusters (e.g. classrooms), it’s usual practice to account for cluster-level correlations when you generate estimates of uncertainty about estimated effects. But units often share commonalities at higher levels, such as at a block level (e.g. schools). Sometimes you need to take account of this and sometimes you don’t. We show an instance of the usual procedure of clustering by assignment cluster (classrooms) working well and show how badly you can do with a more conservative approach (clustering by schools). We then show an example of a design in which clustering at the level of treatment assignment (classroom) is not good enough; in the troublesome example, schools are thought of as being sampled from a larger population of schools and treatment effects are different in different schools. In this case, if you want estimates of uncertainty for population level effects you have to cluster at the school level even though treatment is assigned within schools.
Meta-analysis can be used not just to guess about effects out-of-sample but also to re-evaluate effects in sample
/blog/meta-analysis-can-be-used-not-just-to-guess-about-effects-out-of-sample-but-also-to-re-evaluate-effects-in-sample.html
Tue, 11 Dec 2018 00:00:00 +0000
Imagine you are in the fortunate position of planning a collection of studies which you will later get to analyze together (looking at you, metaketas). Each study estimates a site-specific effect. You want to learn something about general effects. We work through design issues using a multi-study design with J studies that employs both frequentist and Bayesian approaches to meta-analysis. In the designs that we diagnose these perform very similarly in terms of estimating sample and population average effects. But there are tradeoffs. The Bayesian model does better at estimating individual effects by separating out true heterogeneity from sampling error but can sometimes fare poorly at estimating prediction intervals.
Get me a random assignment YESTERDAY
/blog/get-me-a-random-assignment-yesterday.html
Tue, 04 Dec 2018 00:00:00 +0000
You’re partnering with an education nonprofit and you are planning on running a randomized control trial in 80 classrooms spread across 20 community schools. The request is in: please send us a spreadsheet with random assignments. The assignment’s gotta be blocked by school, it’s gotta be reproducible, and it’s gotta be tonight. The good news is that you can do all this in a couple of lines of code. We show how using some DeclareDesign tools and then walk through handling of more complex cases.
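A reproducible, school-blocked assignment of this kind can be sketched in plain Python. This is not the DeclareDesign code from the post; it is a standard-library illustration that assumes four classrooms per school, half of each school treated, and a fixed seed. The function name `block_random_assignment` and the classroom identifiers are hypothetical.

```python
import random
from collections import defaultdict


def block_random_assignment(block_of, seed=2018):
    """Treat half of the units in each block (rounding down for odd sizes).

    block_of maps a unit id to its block id. A fixed seed plus sorted
    iteration order makes the result reproducible across runs.
    """
    rng = random.Random(seed)
    units_by_block = defaultdict(list)
    for unit, block in block_of.items():
        units_by_block[block].append(unit)

    assignment = {}
    for block in sorted(units_by_block):
        units = sorted(units_by_block[block])
        rng.shuffle(units)                    # random order within the block
        n_treated = len(units) // 2
        for position, unit in enumerate(units):
            assignment[unit] = 1 if position < n_treated else 0
    return assignment


# Hypothetical roster: 20 schools with 4 classrooms each (80 classrooms).
classrooms = {
    f"school{s:02d}_class{c}": f"school{s:02d}"
    for s in range(1, 21)
    for c in range(1, 5)
}
assignment = block_random_assignment(classrooms, seed=2018)
print(sum(assignment.values()), "of", len(assignment), "classrooms treated")
```

Because the shuffle happens separately inside each school, every school contributes exactly two treated and two control classrooms, and rerunning with the same seed returns the same spreadsheet.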
Randomization does not justify t-tests. How worried should I be?
/blog/randomization-does-not-justify-t-tests.-how-worried-should-i-be.html
Tue, 27 Nov 2018 00:00:00 +0000
Deaton and Cartwright (2017) provide multiple arguments against claims that randomized trials should be thought of as a kind of gold standard of scientific evidence. One striking argument they make is that randomization does not justify the statistical tests that researchers typically use. They are right in that. Even if researchers can claim that their estimates of uncertainty are justified by randomization, their habitual use of those estimates to conduct t-tests is not. To get a handle on how severe the problem is we replicate the results in Deaton and Cartwright (2017) and then use a wider set of diagnosands to probe more deeply. Our investigation suggests that what at first seems like a big problem might not in fact be so great if your hypotheses are what they often are for experimentalists: sharp and sample-focused.
Instead of avoiding spillovers, you can model them
/blog/modelling_spillovers.html
Tue, 20 Nov 2018 00:00:00 +0000
Spillovers are often seen as a nuisance that leads researchers into error when estimating effects of interest. In a previous post, we discussed sampling strategies to reduce these risks. A more substantively satisfying approach is to try to study spillovers directly. If we do it right we can remove errors in our estimation of primary quantities of interest and learn about how spillovers work at the same time.
What does a p-value tell you about the probability a hypothesis is true?
/blog/2018-11-13-learning-from-p.html
Tue, 13 Nov 2018 00:00:00 +0000
The humble p-value is much maligned and terribly misunderstood. The problem is that everyone wants to know the answer to the question: “what is the probability that [hypothesis] is true?” But p answers a different (and not terribly useful) question: “how (un)surprising is this evidence given [hypothesis]?” Can p shed insight on the question we really care about? Maybe, though there are dangers.
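One way to see the gap between the two questions is a back-of-the-envelope Bayes' rule calculation. The sketch below is a simplification and not the analysis in the post: it assumes a binary world (the null is either true or false), conditions on "significant at level alpha" rather than on the exact p-value, and takes the prior and the power as given. The function name and the example numbers are hypothetical.

```python
def prob_null_given_significant(prior_null, power, alpha=0.05):
    """Posterior probability that the null is true after a significant result.

    Assumptions: a true null yields p < alpha with probability alpha;
    a false null yields p < alpha with probability equal to the power.
    """
    false_positive = alpha * prior_null
    true_positive = power * (1.0 - prior_null)
    return false_positive / (false_positive + true_positive)


# Same test, same significant result, different priors on the null.
even_odds = prob_null_given_significant(prior_null=0.5, power=0.8)
skeptical = prob_null_given_significant(prior_null=0.9, power=0.8)
print(f"prior 0.5: {even_odds:.2f} | prior 0.9: {skeptical:.2f}")
```

With even prior odds the posterior probability of the null after a significant result is about 0.06, but with a 0.9 prior on the null it is 0.36, so the same p < 0.05 supports quite different conclusions depending on what you believed going in.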
Common estimators of uncertainty overestimate uncertainty
/blog/neyman-sate-pate.html
Wed, 07 Nov 2018 00:00:00 +0000
Random assignment provides a justification not just for estimates of effects but also for estimates of uncertainty about effects. The basic approach, due to Neyman, is to estimate the variance in estimates of the difference between outcomes in treatment and outcomes in control using the variability that can be observed among units in control and units in treatment. It’s an ingenious approach and dispenses with the need to make any assumptions about the shape of statistical distributions or about asymptotics. The problem though is that it can sometimes be upwardly biased, meaning that it might lead you to maintain null hypotheses when you should be rejecting them. We use design diagnosis to get a handle on how great this problem is and how it matters for different estimands.
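For reference, the Neyman estimator adds the sample variance in each arm divided by that arm's size. The sketch below is a standard-library illustration with made-up outcome data; the function name `neyman_se` is hypothetical. The estimator leaves out a term involving the variance of unit-level treatment effects, which cannot be estimated from the data, and that omission is the source of the upward bias for the sample average effect.

```python
import math
import statistics


def neyman_se(treated, control):
    """Conservative Neyman standard error for a difference in means.

    Uses the sample variance (n - 1 denominator) within each arm.
    """
    var_hat = (
        statistics.variance(treated) / len(treated)
        + statistics.variance(control) / len(control)
    )
    return math.sqrt(var_hat)


# Made-up outcomes for three treated and three control units.
treated = [3.0, 5.0, 7.0]
control = [1.0, 2.0, 3.0]

estimate = statistics.fmean(treated) - statistics.fmean(control)
se = neyman_se(treated, control)
print(f"estimate = {estimate:.2f}, Neyman SE = {se:.2f}")
```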
Cluster randomized trials can be biased when cluster sizes are heterogeneous
/blog/bias-cluster-randomized-trials.html
Wed, 31 Oct 2018 00:00:00 +0000
In many experiments, random assignment is performed at the level of clusters. Researchers are conscious that in such cases they cannot rely on the usual standard errors and they should take account of this feature by clustering their standard errors. Another, more subtle, risk in such designs is that if clusters are of different sizes, clustering can actually introduce bias, even if all clusters are assigned to treatment with the same probability. Luckily, there is a relatively simple fix that you can implement at the design stage.
With great power comes great responsibility
/blog/with-great-power-comes-great-responsibility.html
Tue, 23 Oct 2018 00:00:00 +0000
We usually think that the bigger the study the better. And so huge studies often rightly garner great publicity. But the ability to generate more precise results also comes with a risk. If study designs are at risk of bias and readers (or publicists!) employ a statistical significance filter, then big data might not remove threats of bias and might actually make things worse.
How misleading are clustered SEs in designs with few clusters?
/blog/how-misleading-are-clustered-ses-in-designs-with-few-clusters.html
Tue, 16 Oct 2018 00:00:00 +0000
Cluster-robust standard errors are known to behave badly with too few clusters. There is a great discussion of this issue by Berk Özler, “Beware of studies with a small number of clusters,” drawing on studies by Cameron, Gelbach, and Miller (2008). See also this nice post by Cyrus Samii and a recent treatment by Esarey and Menger (2018). A rule of thumb is to start worrying about sandwich estimators when the number of clusters goes below 40. But here we show that diagnosis of a canonical design suggests that some sandwich approaches fare quite well even with fewer than 10 clusters.
The trouble with ‘controlling for blocks’
/blog/biased-fixed-effects.html
Tue, 09 Oct 2018 00:00:00 +0000
In many experiments, different groups of units get assigned to treatment with different probabilities. This can give rise to misleading results unless you properly take account of possible differences between the groups. How best to do this? The go-to approach is to “control” for groups by introducing “fixed-effects” in a regression set-up. The bad news is that this procedure is prone to bias. The good news is that there’s an even simpler and more intuitive approach that gets it right: estimate the difference-in-means within each group, then average over these group-level estimates weighting according to the size of the group. We’ll use design declaration to show the problem and to compare the performance of this and an array of other proposed solutions.
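The block-weighted estimator described here is short enough to write out. The sketch below uses a made-up, noiseless dataset with two equal-sized blocks that differ in baseline outcomes, treatment effects (1 and 3), and treatment probabilities (2 of 8 versus 6 of 8). It compares the block-weighted estimate with a naive pooled difference in means; it does not implement the fixed-effects regression discussed in the post. All names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import fmean


def difference_in_means(rows):
    """Mean outcome among treated minus mean outcome among control."""
    treated = [y for _, z, y in rows if z == 1]
    control = [y for _, z, y in rows if z == 0]
    return fmean(treated) - fmean(control)


def block_weighted_difference_in_means(rows):
    """Average the within-block estimates, weighting each by block size."""
    by_block = defaultdict(list)
    for row in rows:
        by_block[row[0]].append(row)
    n_total = len(rows)
    return sum(
        len(block_rows) / n_total * difference_in_means(block_rows)
        for block_rows in by_block.values()
    )


# Rows are (block, treatment indicator, outcome); the true average effect is 2.
rows = (
    [("a", 1, 1.0)] * 2 + [("a", 0, 0.0)] * 6      # block a: effect 1
    + [("b", 1, 13.0)] * 6 + [("b", 0, 10.0)] * 2  # block b: effect 3
)

naive = difference_in_means(rows)
blocked = block_weighted_difference_in_means(rows)
print(f"naive pooled: {naive:.1f} | block-weighted: {blocked:.1f}")
```

In this toy dataset the pooled comparison mixes up the treatment effect with the fact that the high-baseline block is treated more often, while the block-weighted average recovers the true average effect of 2.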
Improve power using your answer strategy, not just your data strategy
/blog/improve-power-using-your-answer-strategy-not-just-your-data-strategy.html
Tue, 02 Oct 2018 00:00:00 +0000
Most power calculators take a small number of inputs: sample size, effect size, and variance. Some also allow for number of blocks or cluster size as well as the overall sample size. All of these inputs relate to your data strategy. Unless you can control the effect size and the noise, you are left with sample size and data structure (blocks and clusters) as the only levers to play with to try to improve your power.
Sometimes blocking can reduce your precision
/blog/sometimes-blocking-can-reduce-your-precision.html
Mon, 24 Sep 2018 00:00:00 +0000
You can often improve the precision of your randomized controlled trial with blocking: first gather similar units together into groups, then run experiments inside each little group, then average results across experiments. Block random assignment (sometimes called stratified random assignment) can be great: increasing precision with blocking is like getting extra sample size for free. Blocking works because it’s like controlling for a pre-treatment covariate in the “Data Strategy” rather than in the “Answer Strategy.” But sometimes it does more harm than good.
You can’t speak meaningfully about spillovers without specifying an estimand
/blog/you-cant-speak-meaningfully-about-spillovers-without-specifying-an-estimand.html
Tue, 18 Sep 2018 00:00:00 +0000
A dangerous fact: it is quite possible to talk in a seemingly coherent way about strategies to answer a research question without ever properly specifying what the research question is. The risk is that you end up with the right solution to the wrong problem. The problem is particularly acute for studies where there are risks of “spillovers.”
How controlling for pretreatment covariates can introduce bias
/blog/how-controlling-for-pretreatment-covariates-can-introduce-bias.html
Wed, 12 Sep 2018 00:00:00 +0000
Consider an observational study looking at the effect of a non-randomly assigned treatment, Z, on an outcome Y. Say you have a pretreatment covariate, X, that is correlated with both Z and Y. Should you control for X when you try to assess the effect of Z on Y?
DeclareDesign: The Blog
/blog/declaredesign-blog.html
Tue, 11 Sep 2018 00:00:00 +0000
Welcome to the DeclareDesign blog! We have been working on developing the DeclareDesign family of software packages to let researchers easily generate research designs and assess their properties. Our plan over the next six months is to put up weekly blog posts showing off features of the packages or highlighting the kinds of things you can learn about research design using this approach.