Getting started with DeclareDesign

This getting started guide is an excerpt from Chapter 4 of Blair, Coppock, and Humphreys (2023), Research Design in the Social Sciences: Declaration, Diagnosis, and Redesign, Princeton University Press. For a more advanced guide, see Chapter 13.

Installing R

You can download R for free from CRAN. We also recommend the free program RStudio, which provides a friendly interface to R. Both R and RStudio are available on Windows, Mac, and Linux.

Once you have R and RStudio installed, open up RStudio and install DeclareDesign and its related packages. These include three packages that enable specific steps in the research process: fabricatr for simulating social science data, randomizr for random sampling and random assignment, and estimatr for design-based estimators. You can also install rdddr, which includes datasets and helper functions used in the book. To install them all, copy the following code into your R console:

install.packages(c("DeclareDesign", "rdddr"))

We also recommend that you install and get to know the tidyverse set of packages for data analysis:

install.packages("tidyverse")

For introductions to R and the tidyverse we especially recommend the free resource R for Data Science.

Declaration

Designs are constructed from design elements: models, inquiries, data strategies, and answer strategies.

In DeclareDesign, each design element is made with a function that starts with the word declare. For example, we can declare an assignment procedure using declare_assignment as follows:

library(DeclareDesign)

simple_random_assignment <- 
  declare_assignment(Z = simple_ra(N = N, prob = 0.6))

Each element created by a declare_* function, perhaps surprisingly, is itself a function. The object simple_random_assignment is not a particular assignment — instead, it is a function that conducts assignment when called. Each time we call simple_random_assignment we get a different random assignment:

participants <- data.frame(ID = 1:100)

assignment_1 <- simple_random_assignment(participants)
assignment_2 <- simple_random_assignment(participants)
assignment_3 <- simple_random_assignment(participants)

# bind_cols() comes from dplyr (part of the tidyverse), which is not attached above
dplyr::bind_cols(assignment_1, assignment_2, assignment_3)
Table 1: Three random assignments from the same random assignment step.
Assignment 1    Assignment 2    Assignment 3
ID   Z          ID   Z          ID   Z
1    0          1    1          1    0
2    0          2    0          2    0
3    0          3    1          3    1
4    1          4    0          4    1
5    0          5    1          5    0
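
To make the point that a declared step is a function concrete, here is a minimal sketch. It reuses the assignment step and the participants data frame from above; the seed value 42 and the helper names (is_a_function, draw_a, draw_b, same_draw, share_treated) are illustrative assumptions, not part of the original guide.

```r
library(DeclareDesign)

simple_random_assignment <-
  declare_assignment(Z = simple_ra(N = N, prob = 0.6))

participants <- data.frame(ID = 1:100)

# The declared step is a function, not a stored assignment vector
is_a_function <- is.function(simple_random_assignment)

# Setting the same seed before each call reproduces the same draw
# (42 is an arbitrary, illustrative seed)
set.seed(42)
draw_a <- simple_random_assignment(participants)
set.seed(42)
draw_b <- simple_random_assignment(participants)
same_draw <- identical(draw_a$Z, draw_b$Z)

# Share of units assigned to treatment in this draw; under simple random
# assignment it varies around 0.6 rather than equalling it exactly
share_treated <- mean(draw_a$Z)
```

Without the calls to set.seed, successive calls would generally return different assignments, as in Table 1.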

Every step in a research design can be declared using one of the declare_* functions. Table 2 collects these according to the four elements of a research design. In Chapter 13 of the book, we detail how to build each kind of step.

Table 2: Declaration functions in DeclareDesign
Design component   Function                Description
Model              declare_model()         background variables and potential outcomes
Inquiry            declare_inquiry()       research questions
Data strategy      declare_sampling()      sampling procedures
Data strategy      declare_assignment()    assignment procedures
Data strategy      declare_measurement()   measurement procedures
Answer strategy    declare_estimator()     estimation procedures
Answer strategy    declare_test()          testing procedures
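
As with the assignment step, the other declared steps can be called on their own. The sketch below declares a model and an inquiry and runs them by hand. The variable names (U, Y), the constant effect of 0.2, and the object names are assumptions chosen for illustration.

```r
library(DeclareDesign)

# Model: 100 units, a standard normal background variable U, and potential
# outcomes with an assumed constant treatment effect of 0.2
model <-
  declare_model(N = 100, U = rnorm(N), potential_outcomes(Y ~ 0.2 * Z + U))

# Inquiry: the average treatment effect, defined on the potential outcomes
inquiry <- declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0))

# Calling the model step draws a data frame; passing that data frame to the
# inquiry step computes the value of the inquiry for this draw
population <- model()
estimand <- inquiry(population)
```

Here population has one row per unit with the columns U, Y_Z_0, and Y_Z_1, and estimand is a one-row data frame holding the value of the ATE.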

We use the + operator to build from elements of a design to a design. The declaration below represents a two-arm randomized experiment with 100 units from which we aim to estimate the average treatment effect.


Figure 1: Two-arm randomized experiment declaration
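
The figure itself is not reproduced in this text. A declaration of this kind might look like the sketch below, which chains the declare_* steps with +. The effect size of 0.2, the background variable U, and the choice to store the sample size in a variable N are assumptions made for illustration and may differ from what the figure shows.

```r
library(DeclareDesign)

# Sample size kept in a variable so that it can be varied later (assumption)
N <- 100

declaration <-
  # Model: background noise U and potential outcomes with an assumed effect of 0.2
  declare_model(N = N, U = rnorm(N), potential_outcomes(Y ~ 0.2 * Z + U)) +
  # Inquiry: the average treatment effect
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  # Data strategy: assign exactly half of the units to treatment
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  # Data strategy: reveal the observed outcome given the assignment
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  # Answer strategy: regress Y on Z, targeting the ATE
  declare_estimator(Y ~ Z, inquiry = "ATE")

# Run the design once to inspect the simulated data
simulated_data <- draw_data(declaration)
```

Calling draw_data runs the model and data strategy steps once, which is a quick way to check that the declaration produces the data you expect before diagnosing it.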

Diagnosis

Diagnosis is the process of simulating the design many times and calculating summary statistics about the design that describe its properties, which we call diagnosands. Once a design is declared, diagnosis is as simple as using the diagnose_design function on it.

Example design diagnosis

diagnose_design(declaration, sims = 100)
Table 3: Design diagnosis.
Bias RMSE Power
-0.02 0.31 0.11
(0.03) (0.02) (0.03)

The output of the diagnosis includes the diagnosand values (top row), such as a bias of \(-0.02\), and our uncertainty about each diagnosand value (bootstrapped standard errors, in parentheses in the bottom row). The uncertainty estimates tell us whether we have conducted enough simulations to estimate the diagnosands precisely. Because the estimate of bias is \(-0.02\) and its standard error is \(0.03\), we cannot distinguish the amount of bias from no bias at all.
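
One way to make this comparison explicit is to build an approximate interval around the bias estimate from its bootstrapped standard error. The sketch below uses the bias column of Table 3; the normal approximation and the 1.96 multiplier are assumptions, and the object names are illustrative.

```r
# Values read from the bias column of Table 3
bias_estimate <- -0.02
bias_se <- 0.03

# Approximate 95% interval for the simulation-based bias estimate,
# assuming a normal approximation
interval <- bias_estimate + c(-1.96, 1.96) * bias_se

# TRUE when zero bias lies inside the interval
covers_zero <- interval[1] <= 0 && 0 <= interval[2]
```

The interval runs from roughly -0.079 to 0.039, so it includes zero. Running more simulations shrinks the standard error and narrows this interval.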

Redesign

We redesign to learn how the diagnosands change as design features change. We can do this using the redesign function over a range of sample sizes, which produces a list of designs.

designs <- redesign(declaration, N = c(100, 200, 300, 400, 500))

Our simulation and diagnosis tools can operate directly on this list of designs:

diagnose_design(designs)
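
Putting the pieces together, the following self-contained sketch declares a design, redesigns it over two sample sizes, and diagnoses both. The declaration repeats the assumed two-arm experiment (effect size 0.2, sample size stored in a variable N); the small sims and bootstrap_sims values are chosen only to keep the run short and would be too few for a real diagnosis.

```r
library(DeclareDesign)

# Assumed two-arm design; N is a variable so that redesign() can vary it
N <- 100

declaration <-
  declare_model(N = N, U = rnorm(N), potential_outcomes(Y ~ 0.2 * Z + U)) +
  declare_inquiry(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(Z = complete_ra(N, prob = 0.5)) +
  declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
  declare_estimator(Y ~ Z, inquiry = "ATE")

# One design per sample size
designs <- redesign(declaration, N = c(100, 500))

# Diagnose every design in the list (few simulations, for speed only)
diagnosis <- diagnose_design(designs, sims = 50, bootstrap_sims = 20)

# One row of diagnosands per design
diagnosands <- diagnosis$diagnosands_df
```

Comparing the rows of diagnosands shows how properties such as power and RMSE change as the sample size grows.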

Library of designs

In our DesignLibrary package, we have created a set of common designs as designers (functions that create designs from just a few parameters), so you can get started quickly.

library(DesignLibrary)

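# 10 blocks x 10 clusters per block x 10 units per cluster = 1000 units
block_cluster_design <- 
  block_cluster_two_arm_designer(N_blocks = 10, N_clusters_in_block = 10, N_i_in_cluster = 10)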