This getting started guide is an excerpt from Chapter 4 of Blair, Coppock, and Humphreys, 2023. Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign. Princeton University Press. For a more advanced guide, see Chapter 13.
Installing R
You can download R for free from CRAN. We also recommend the free program RStudio, which provides a friendly interface to R. Both R and RStudio are available on Windows, Mac, and Linux.
Once you have R and RStudio installed, open up RStudio and install DeclareDesign and its related packages. These include three packages that enable specific steps in the research process: fabricatr for simulating social science data, randomizr for random sampling and random assignment, and estimatr for design-based estimators. You can also install rdddr, which includes datasets and helper functions used in the book. To install them all, copy the following code into your R console:
# fabricatr, randomizr, and estimatr are installed automatically as dependencies of DeclareDesign
install.packages(c("DeclareDesign", "rdddr"))
We also recommend that you install and get to know the tidyverse set of packages for data analysis:
install.packages("tidyverse")
For introductions to R and the tidyverse, we especially recommend the free resource R for Data Science.
Declaration
Designs are constructed from design elements: models, inquiries, data strategies, and answer strategies.
In DeclareDesign, each design element is made with a function that starts with the word declare. For example, we can declare an assignment procedure using declare_assignment as follows:
library(DeclareDesign)
# Each unit is assigned to treatment (Z = 1) independently with probability 0.6
simple_random_assignment <-
  declare_assignment(Z = simple_ra(N = N, prob = 0.6))
Each element created by a declare_* function, perhaps surprisingly, is itself a function. The object simple_random_assignment is not a particular assignment; instead, it is a function that conducts assignment when called. Each time we call simple_random_assignment we get a different random assignment:
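library(dplyr)  # for bind_cols()
participants <- data.frame(ID = 1:100)
# Three calls to the same step yield three different random assignments
assignment_1 <- simple_random_assignment(participants)
assignment_2 <- simple_random_assignment(participants)
assignment_3 <- simple_random_assignment(participants)
# Place the three assignments side by side
bind_cols(assignment_1, assignment_2, assignment_3)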
ID | Z | ID | Z | ID | Z |
---|---|---|---|---|---|
1 | 0 | 1 | 1 | 1 | 0 |
2 | 0 | 2 | 0 | 2 | 0 |
3 | 0 | 3 | 1 | 3 | 1 |
4 | 1 | 4 | 0 | 4 | 1 |
5 | 0 | 5 | 1 | 5 | 0 |
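Because simple_ra() flips an independent weighted coin for each unit, the number of treated units is itself random. The sketch below calls the same step many times to make this visible. As a contrast, it also uses randomizr's complete_ra(), which fixes the number treated at 60. The seed and the number of repetitions are arbitrary choices for illustration.

```r
library(DeclareDesign)

participants <- data.frame(ID = 1:100)

# Simple random assignment: each unit treated independently with prob 0.6
simple_random_assignment <- declare_assignment(Z = simple_ra(N = N, prob = 0.6))
# Complete random assignment: exactly 0.6 * 100 = 60 units treated
complete_random_assignment <- declare_assignment(Z = complete_ra(N = N, prob = 0.6))

set.seed(343)
# Call each step 1,000 times and record how many units end up treated
n_treated_simple <- replicate(1000, sum(simple_random_assignment(participants)$Z))
n_treated_complete <- replicate(1000, sum(complete_random_assignment(participants)$Z))

mean(n_treated_simple)      # near 60 on average
range(n_treated_simple)     # but varies from one assignment to the next
unique(n_treated_complete)  # the count never varies
```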
Every step in a research design can be declared using one of the declare_* functions. Table 2 collects these according to the four elements of a research design. In Chapter 13 of the book, we detail how to build each kind of step.
Design component | Function | Description |
---|---|---|
Model | declare_model() | background variables and potential outcomes |
Inquiry | declare_inquiry() | research questions |
Data strategy | declare_sampling() | sampling procedures |
Data strategy | declare_assignment() | assignment procedures |
Data strategy | declare_measurement() | measurement procedures |
Answer strategy | declare_estimator() | estimation procedures |
Answer strategy | declare_test() | testing procedures |
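Other data strategy steps behave like the assignment step shown earlier. The sketch below declares a sampling step and applies it to a data frame. It assumes a population of 1,000 units and a sample of 100 drawn by complete random sampling; both numbers are illustrative. It also assumes DeclareDesign 1.0 or later, where declare_sampling() keeps only the rows with S == 1 by default.

```r
library(DeclareDesign)

population <- data.frame(ID = 1:1000)

# Draw exactly 100 of the N units at random; S = 1 marks sampled units
complete_random_sampling <- declare_sampling(S = complete_rs(N = N, n = 100))

set.seed(20)
sample_1 <- complete_random_sampling(population)
sample_2 <- complete_random_sampling(population)

nrow(sample_1)                       # only sampled units are kept
identical(sample_1$ID, sample_2$ID)  # each call draws a new sample
```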
We use the + operator to combine design elements into a complete design. The declaration below represents a two-arm randomized experiment with 100 units from which we aim to estimate the average treatment effect.
Two-arm randomized experiment
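A minimal sketch of such a declaration follows. The parameter values are assumptions chosen for illustration: a constant treatment effect of 0.2, standard normal noise U, and 50 of the 100 units assigned to treatment. The sample size is stored in a variable N so that redesign() can vary it later. The object is named declaration to match the diagnosis code that follows.

```r
library(DeclareDesign)

N <- 100  # sample size, stored as a variable so it can be redesigned

declaration <-
  declare_model(
    N = N,
    U = rnorm(N),                        # unobserved noise (assumed normal)
    potential_outcomes(Y ~ 0.2 * Z + U)  # creates Y_Z_0 and Y_Z_1; assumed effect 0.2
  ) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +             # average treatment effect
  declare_assignment(Z = complete_ra(N = N, prob = 0.5)) + # 50 treated, 50 control
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +        # observed outcome
  declare_estimator(Y ~ Z, inquiry = "ATE")                # estimate the ATE from Y and Z

set.seed(1)
head(draw_data(declaration))   # one simulated dataset
draw_estimates(declaration)    # one simulated estimate
```

Each call to draw_data() or draw_estimates() runs the whole design once. Diagnosis repeats this many times.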
Diagnosis
Diagnosis is the process of simulating the design many times and calculating summary statistics about the design that describe its properties, which we call diagnosands. Once a design is declared, diagnosis is as simple as using the diagnose_design function on it.
Example design diagnosis
diagnose_design(declaration, sims = 100)
Bias | RMSE | Power |
---|---|---|
-0.02 | 0.31 | 0.11 |
(0.03) | (0.02) | (0.03) |
The output of the diagnosis includes the diagnosand values (top row), such as bias of \(-0.02\), and our uncertainty about each diagnosand value (bootstrapped standard error in parentheses in the bottom row). The uncertainty estimates tell us whether we have conducted enough simulations to estimate the diagnosands precisely. Because the estimated bias is \(-0.02\) with a standard error of \(0.03\), we cannot distinguish the amount of bias from no bias at all.
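To show what bias, RMSE, and power measure, the sketch below computes them by hand in base R. It does not use DeclareDesign and does not mirror its internals. The design is assumed: 100 units, 50 treated, a true effect of 0.2, and standard normal noise. A Welch t-test is swapped in as the estimator. Bootstrapped standard errors come from resampling the simulations with replacement.

```r
set.seed(42)
true_ate <- 0.2  # assumed estimand (constant effect, so the ATE is fixed)
n_sims <- 500

# One run of the assumed design: assign, reveal outcomes, estimate, test
one_sim <- function() {
  N <- 100
  U <- rnorm(N)
  Z <- sample(rep(c(0, 1), each = N / 2))  # complete random assignment
  Y <- true_ate * Z + U                    # observed outcomes
  test <- t.test(Y[Z == 1], Y[Z == 0])     # Welch difference in means
  c(estimate = unname(test$estimate[1] - test$estimate[2]),
    p_value = test$p.value)
}

sims <- as.data.frame(t(replicate(n_sims, one_sim())))

# Diagnosands: summaries of the simulated estimates and tests
diagnosands <- function(d) {
  c(bias  = mean(d$estimate - true_ate),
    rmse  = sqrt(mean((d$estimate - true_ate)^2)),
    power = mean(d$p_value <= 0.05))
}
diagnosands(sims)

# Bootstrap: recompute the diagnosands on resampled simulations
boot <- replicate(200, diagnosands(sims[sample(n_sims, replace = TRUE), ]))
apply(boot, 1, sd)  # standard error of each diagnosand
```

Raising n_sims shrinks the bootstrapped standard errors. This is why more simulations let us tell small biases apart from zero.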
Redesign
We redesign to learn how the diagnosands change as design features change. For example, we can apply the redesign function over a range of sample sizes, which produces a list of designs:
designs <- redesign(declaration, N = c(100, 200, 300, 400, 500))
Our simulation and diagnosis tools can operate directly on this list of designs:
diagnose_design(designs)
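For a rough sense of what this diagnosis should show, base R's power.t.test() gives a closed-form power approximation for each sample size. This is an analytic shortcut swapped in for illustration, not what redesign() or diagnose_design() compute; diagnosis estimates power by simulation. It assumes an effect of 0.2, unit-variance outcomes, and N / 2 units per arm.

```r
N_values <- c(100, 200, 300, 400, 500)

# Two-sided, two-sample t-test power at the 0.05 level; n is units per arm
power <- sapply(N_values, function(N) {
  power.t.test(n = N / 2, delta = 0.2, sd = 1, sig.level = 0.05)$power
})

data.frame(N = N_values, power = round(power, 2))
```

Under these assumptions, power rises steadily with N. The simulated diagnosis of the redesigned list should show the same pattern if the declared design matches them.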
Library of designs
In our DesignLibrary package, we have created a set of common designs as designers (functions that create designs from just a few parameters), so you can get started quickly.
library(DesignLibrary)
block_cluster_design <-
  block_cluster_two_arm_designer(N = 1000, N_blocks = 10)
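A designer is an ordinary R function that returns a design. The sketch below writes a small one by hand to show the idea. The function name my_two_arm_designer, its arguments, and their defaults are hypothetical and not part of DesignLibrary. Potential outcomes are written directly as Y_Z_0 and Y_Z_1 so that the effect argument is picked up from the function's environment.

```r
library(DeclareDesign)

# Hypothetical designer: returns a two-arm design given a few parameters
my_two_arm_designer <- function(N = 100, effect = 0.2, prob = 0.5) {
  declare_model(
    N = N,
    U = rnorm(N),
    Y_Z_0 = U,           # outcome if untreated
    Y_Z_1 = U + effect   # outcome if treated (constant effect)
  ) +
    declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
    declare_assignment(Z = complete_ra(N = N, prob = prob)) +
    declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
    declare_estimator(Y ~ Z, inquiry = "ATE")
}

# Two designs from one function, differing only in their parameters
small_design <- my_two_arm_designer(N = 50)
large_design <- my_two_arm_designer(N = 500, effect = 0.1)

set.seed(7)
head(draw_data(large_design))
```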