This getting started guide is an excerpt from Chapter 4 of Blair, Coppock, and Humphreys, 2023. Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign. Princeton University Press. For a more advanced guide, see Chapter 13.
Installing R
You can download R for free from CRAN. We also recommend the free program RStudio, which provides a friendly interface to R. Both R and RStudio are available on Windows, Mac, and Linux.
Once you have R and RStudio installed, open up RStudio and install DeclareDesign and its related packages. These include three packages that enable specific steps in the research process: fabricatr for simulating social science data, randomizr for random sampling and random assignment, and estimatr for design-based estimators. You can also install rdss, which includes datasets and helper functions used in the book. To install them all, copy the following code into your R console:
install.packages(c("DeclareDesign", "rdss"))
We also recommend that you install and get to know the tidyverse set of packages for data analysis:
install.packages("tidyverse")
For introductions to R and the tidyverse, we especially recommend the free resource R for Data Science.
Declaration
Designs are constructed from design elements: models, inquiries, data strategies, and answer strategies.
In DeclareDesign, each design element is made with a function that starts with the word declare. For example, we can declare an assignment procedure using declare_assignment as follows:
library(DeclareDesign)
simple_random_assignment <-
  declare_assignment(Z = simple_ra(N = N, prob = 0.6))
Each element created by a declare_* function, perhaps surprisingly, is itself a function. The object simple_random_assignment is not a particular assignment; instead, it is a function that conducts assignment when called. Each time we call simple_random_assignment, we get a different random assignment:
library(dplyr) # provides bind_cols()
participants <- data.frame(ID = 1:100)
assignment_1 <- simple_random_assignment(participants)
assignment_2 <- simple_random_assignment(participants)
assignment_3 <- simple_random_assignment(participants)
bind_cols(assignment_1, assignment_2, assignment_3)
| ID | Z | ID | Z | ID | Z |
|---|---|---|---|---|---|
| 1 | 0 | 1 | 1 | 1 | 0 |
| 2 | 0 | 2 | 0 | 2 | 0 |
| 3 | 0 | 3 | 1 | 3 | 1 |
| 4 | 1 | 4 | 0 | 4 | 1 |
| 5 | 0 | 5 | 1 | 5 | 0 |
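A quick way to confirm that the declared step is a function, and that repeated calls give different draws, is sketched below. It assumes DeclareDesign is installed; the object names draw_a and draw_b are ours and purely illustrative.

```r
library(DeclareDesign)

# Declare the assignment step; the result is a function, not data
simple_random_assignment <-
  declare_assignment(Z = simple_ra(N = N, prob = 0.6))

participants <- data.frame(ID = 1:100)

# Each call returns the input data with a freshly drawn Z column
draw_a <- simple_random_assignment(participants)
draw_b <- simple_random_assignment(participants)

is.function(simple_random_assignment) # the step itself is callable
mean(draw_a$Z)                        # share treated; near 0.6, varies by draw
mean(draw_a$Z == draw_b$Z)            # how often two draws agree
```

Because simple random assignment treats each unit independently, the share treated will generally differ a little from 0.6 in any single draw.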
Every step in a research design can be declared using one of the declare_* functions. Table 2 collects these according to the four elements of a research design. In Chapter 13 of the book, we detail how to build each kind of step.
| Design component | Function | Description |
|---|---|---|
| Model | declare_model() | background variables and potential outcomes |
| Inquiry | declare_inquiry() | research questions |
| Data strategy | declare_sampling() | sampling procedures |
| | declare_assignment() | assignment procedures |
| | declare_measurement() | measurement procedures |
| Answer strategy | declare_estimator() | estimation procedures |
| | declare_test() | testing procedures |
We use the + operator to build from elements of a design to a design. The declaration below represents a two-arm randomized experiment with 100 units from which we aim to estimate the average treatment effect.
Two-arm randomized experiment
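A declaration along these lines could look like the following sketch. The specific values (an effect of 0.2, standard normal noise U, assignment probability 0.5) and the choice to store the sample size in a variable N are our assumptions for illustration, not necessarily the book's exact declaration.

```r
library(DeclareDesign)

# Sample size kept in a variable (our assumption) so it can be varied later
N <- 100

declaration <-
  # Model: N units, unobserved noise U, potential outcomes with a 0.2 effect
  declare_model(
    N = N,
    U = rnorm(N),
    potential_outcomes(Y ~ 0.2 * Z + U)
  ) +
  # Inquiry: the average treatment effect
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Data strategy: assign half of the units to treatment, then reveal Y
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # Answer strategy: regress the outcome on treatment
  declare_estimator(Y ~ Z, inquiry = "ATE")

# Draw one simulated dataset from the design to inspect it
head(draw_data(declaration))
```

Each term joined by + is one of the declare_* steps from the table above, so the object reads top to bottom as model, inquiry, data strategy, and answer strategy.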
Diagnosis
Diagnosis is the process of simulating the design many times and calculating summary statistics about the design that describe its properties, which we call diagnosands. Once a design is declared, diagnosis is as simple as using the diagnose_design function on it.
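To make the idea concrete, the loop below mimics by hand what a diagnosis computes: it simulates a simple two-arm experiment many times and summarizes the estimates. This is a base R sketch under our own assumptions (a true effect of 0.2, 100 units, an OLS estimator, 500 simulations); it is not how diagnose_design is implemented internally.

```r
set.seed(42)

true_ate <- 0.2 # assumed true treatment effect
n_units <- 100
n_sims <- 500

# One run of the design: draw data, assign treatment, estimate the effect
simulate_once <- function() {
  U <- rnorm(n_units)
  Z <- sample(rep(c(0, 1), each = n_units / 2)) # complete random assignment
  Y <- true_ate * Z + U
  fit <- summary(lm(Y ~ Z))$coefficients
  c(estimate = fit["Z", "Estimate"], p_value = fit["Z", "Pr(>|t|)"])
}

# Rows are simulations; columns are the estimate and its p-value
sims <- t(replicate(n_sims, simulate_once()))

# Diagnosands: summaries of the design's behavior across simulations
diagnosands <- c(
  bias  = mean(sims[, "estimate"] - true_ate),
  rmse  = sqrt(mean((sims[, "estimate"] - true_ate)^2)),
  power = mean(sims[, "p_value"] < 0.05)
)
diagnosands
```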
Example design diagnosis
diagnose_design(declaration, sims = 100)
| Bias | RMSE | Power |
|---|---|---|
| -0.02 | 0.31 | 0.11 |
| (0.03) | (0.02) | (0.03) |
The output of the diagnosis includes the diagnosand values (top row), such as a bias of \(-0.02\), and our uncertainty about each diagnosand value (bootstrapped standard errors in parentheses in the bottom row). The uncertainty estimates tell us whether we have conducted enough simulations to estimate the diagnosands precisely. Because the estimate of bias is \(-0.02\) with a standard error of \(0.03\), we cannot distinguish the amount of bias from no bias at all.
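The reasoning about bias can be checked with a little arithmetic. The sketch below takes the bias and standard error from the diagnosis table above and builds an approximate 95% interval; the normal approximation is our assumption, not something the diagnosis output reports.

```r
# Values read from the diagnosis table above
bias_estimate <- -0.02
bias_se <- 0.03

# Approximate 95% interval using a normal approximation (our assumption)
interval <- bias_estimate + c(-1, 1) * qnorm(0.975) * bias_se
interval

# TRUE when zero lies inside the interval, i.e. no detectable bias
contains_zero <- interval[1] < 0 && interval[2] > 0
contains_zero
```

The interval runs from roughly -0.079 to 0.039, so it includes zero. Running more simulations would shrink the standard error and narrow this interval.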
Redesign
We redesign to learn how the diagnosands change as design features change. We can do this using the redesign function over a range of sample sizes, which produces a list of designs.
designs <- redesign(declaration, N = c(100, 200, 300, 400, 500))
Our simulation and diagnosis tools can operate directly on this list of designs:
diagnose_design(designs)
Library of designs
In our DesignLibrary package, we have created a set of common designs as designers (functions that create designs from just a few parameters), so you can get started quickly.
library(DesignLibrary)
block_cluster_design <-
  block_cluster_two_arm_designer(N = 1000, N_blocks = 10)