DeclareDesign is a system for describing research designs in code and simulating them in order to understand their properties. Because DeclareDesign employs a consistent grammar of designs, you can focus on the intellectually challenging part – designing good research studies – without having to code up simulations from scratch. DeclareDesign is based on the Model-Inquiry-Data Strategy-Answer Strategy (MIDA) framework for describing designs and a declare-diagnose-redesign workflow for improving research designs before implementing them.

## Installing R

You can download the statistical computing environment R for free from CRAN. We also recommend the free program RStudio, which provides a friendly interface to R. Both R and RStudio are available on Windows, Mac, and Linux.

Once you have R and RStudio installed, open it up and install `DeclareDesign`

and its related packages. These include three packages that enable specific steps in the research process (`fabricatr`

for simulating social science data; `randomizr`

for random sampling and random assignment; and `estimatr`

for design-based estimators). You can also install `DesignLibrary`

, which gets standard designs up-and-running in one line. To install them, copy the following code into your R console:

```
install.packages(c(
"DeclareDesign",
"fabricatr",
"randomizr",
"estimatr",
"DesignLibrary"
))
```

We also recommend that you install and get to know the `tidyverse`

suite of packages for data analysis:

`install.packages("tidyverse")`

For introductions to R and the `tidyverse`

we especially recommend the free resource `R for Data Science`

.

## Building a step of a research design

A research design is a concatenation of design steps. The best way to learn how to build a design is to learn how to make a step. We will start out by making—or *declaring*—a step that implements random assignment.

Almost all steps take a dataset as input and return a dataset as output. We will imagine input data that describes a set of voters in Los Angeles. The research project we are planning involves randomly assigning voters to receive (or not receive) a knock on their door from a canvasser. Our data look like this:

ID | age | sex | party | precinct |
---|---|---|---|---|

001 | 66 | M | REP | 9104 |

002 | 54 | F | DEM | 8029 |

003 | 18 | M | GRN | 8383 |

004 | 42 | F | DEM | 2048 |

005 | 27 | M | REP | 5210 |

There are 100 voters in the dataset.

We want a function that takes this dataset, implements a random assignment, adds it to the dataset, and then returns the new dataset containing the random assignment.

You could write your own function to do that but you can also use one of the `declare_*`

functions in `DeclareDesign`

that are designed to write functions. Each one of these functions is a kind of *function factory*: it takes a set of parameters about your research design like the number of units and the random assignment probability as *inputs*, and returns a *function* as an output.
Here is an example of a `declare_assignment`

step.

```
simple_random_assignment_step <-
declare_assignment(Z = simple_ra(N = N, prob = 0.6))
```

The big idea here is that the object we created, `simple_random_assignment_step`

, is not a particular assignment, it is a *function* that conducts assignment when called. You can run the function on data:

`simple_random_assignment_step(voter_file) `

ID | age | sex | party | precinct | Z |
---|---|---|---|---|---|

001 | 66 | M | REP | 9104 | 1 |

002 | 54 | F | DEM | 8029 | 1 |

003 | 18 | M | GRN | 8383 | 0 |

004 | 42 | F | DEM | 2048 | 1 |

005 | 27 | M | REP | 5210 | 0 |

The output of the `simple_random_assignment_step(voter_file)`

call is the original dataset with a new column indicating treatment assignment (`Z`

) appended. As a bonus, the data also includes the probability that each unit is assigned *to the condition in which it is in* (`Z_cond`

), which is an extremely useful number to know in many analysis settings. The most important thing to understand here is that steps are “dataset-in, dataset-out” functions. The `simple_random_assignment_step`

took the `voter_file`

dataset and returned a dataset with assignment information appended.

Every step of a research design declaration can be written using one of the `declare_*`

functions. Table 3 collects these according to the four elements of a research design. Below, we walk through the common uses of each of these declaration functions.

Design component | Function | Description |
---|---|---|

Model | `declare_model()` |
define background variables and potential outcomes |

Inquiry | `declare_inquiry()` |
define research question |

Data strategy | `declare_sampling()` |
specify sampling procedures |

`declare_assignment()` |
specify assignment procedures | |

`declare_measurement()` |
specify measurement procedures | |

Answer strategy | `declare_estimator()` |
specify estimation procedures |

`declare_test()` |
specify testing procedures |

### Your own handlers

The built-in functions we provide in the `DeclareDesign`

package are quite flexible and handle many major designs, but not all. The framework is built so that you are never constrained by what we provide. At any point, you can write a function that implements your own procedures. The only discipline that the framework imposes is that you write your procedure as a function that takes data in and sends data back.

Here is an example of how you turn your own functions into design steps.

```
custom_assignment <- function(data) {
mutate(data, Z = rbinom(n = nrow(data), 1, prob = 0.5))
}
my_assignment_step <- declare_assignment(handler = custom_assignment)
my_assignment_step(voter_file)
```

ID | age | sex | party | precinct | Z |
---|---|---|---|---|---|

001 | 66 | M | REP | 9104 | 0 |

002 | 54 | F | DEM | 8029 | 1 |

003 | 18 | M | GRN | 8383 | 1 |

004 | 42 | F | DEM | 2048 | 0 |

005 | 27 | M | REP | 5210 | 0 |

## Research design steps

In this section, we walk through how to declare each step of a research design using `DeclareDesign`

. In the next section, we build those steps into a research design, and then describe how to interrogate the design.

### Model

The model defines the structure of the world, both its size and background characteristics as well as how interventions in the world determine outcomes.

#### Population

The population defines the number of units in the population, any multilevel structure to the data, and its background characteristics. We can define the population in several ways. In some cases, you may start a design with data on the population. When that happens, we do not need to simulate it. We can simply declare the data as our population:

`declare_model(data = voter_file)`

ID | age | sex | party | precinct |
---|---|---|---|---|

001 | 66 | M | REP | 9104 |

002 | 54 | F | DEM | 8029 |

003 | 18 | M | GRN | 8383 |

004 | 42 | F | DEM | 2048 |

005 | 27 | M | REP | 5210 |

When we do not have complete data on the population, we simulate it. Relying on the data simulation functions from our `fabricatr`

package, `declare_model`

asks about the size and variables of the population. For instance, if we want a function that generates a dataset with 100 units and a random variable `U`

we write:

`declare_model(N = 100, U = rnorm(N))`

When we run this population function, we will get a different 100-unit dataset each time, as shown in Table 6.

ID | U | ID | U | ID | U | ID | U | ID | U |
---|---|---|---|---|---|---|---|---|---|

001 | -0.700 | 001 | -0.35 | 001 | -1.07 | 001 | 0.696 | 001 | 0.87 |

002 | 0.561 | 002 | 0.27 | 002 | 0.11 | 002 | 0.059 | 002 | 0.46 |

003 | 0.048 | 003 | 0.45 | 003 | 0.35 | 003 | -0.030 | 003 | 1.73 |

004 | 0.567 | 004 | -2.69 | 004 | -1.37 | 004 | -1.279 | 004 | -0.46 |

005 | -1.196 | 005 | 0.87 | 005 | -0.39 | 005 | -0.247 | 005 | -0.70 |

The `fabricatr`

package can simulate many different types of data, including various types of categorical variables or different types of data structures, such as panel or multilevel structures. You can read the `fabricatr`

website vignette to get started simulating data.

As an example of a two-level hierarchical data structure, here is a declaration for 100 households with a random number of individuals within each household. This two-level structure could be declared as:

```
declare_model(
households = add_level(
N = 100,
individuals_per_hh = sample(1:6, N, replace = TRUE)
),
individuals = add_level(
N = individuals_per_hh,
age = sample(1:100, N, replace = TRUE)
)
)
```

As always, you can exit our built-in way of doing things and bring in your own code. This is useful for complex designs, or when you have already written code for your design and you want to use it directly. Here is an example of a custom population declaration:

```
complex_population_function <- function(data, N_units) {
data.frame(U = rnorm(N_units))
}
declare_model(
handler = complex_population_function, N_units = 100
)
```

#### Potential outcomes

Defining potential outcomes is as easy as a single expression per potential outcome. Potential outcomes may depend on background characteristics, other potential outcomes, or other R functions.

```
declare_model(
Y_Z_0 = U,
Y_Z_1 = Y_Z_0 + 0.25)
```

```
design <-
declare_model(
N = 100, U = rnorm(N),
potential_outcomes(Y ~ 0.25 * Z + U)
)
```

ID | U | Y_Z_0 | Y_Z_1 |
---|---|---|---|

001 | -2.41 | -2.41 | -2.16 |

002 | 0.81 | 0.81 | 1.06 |

003 | -0.86 | -0.86 | -0.61 |

004 | 1.30 | 1.30 | 1.55 |

005 | 0.59 | 0.59 | 0.84 |

The `declare_model`

function also includes an alternative interface for defining potential outcomes that uses R’s formula syntax with the `potential_outcomes`

function. The formula syntax lets you specify “regression-like” outcome equations. One downside is that it mildly obscures how the names of the eventual potential outcomes columns are named. We build the names of the potential outcomes columns the outcome name (here `Y`

on the left-hand side of the formula) and the name of the assignment variable from the variable name in the `conditions`

argument (here `Z`

).

`declare_model(potential_outcomes(Y ~ 0.25 * Z + U, conditions = list(Z = c(0, 1))))`

Either way of creating potential outcomes works; one may be easier or harder to code up in a given research design setting.

### Inquiry

To define your inquiry, declare your estimand. Estimands are typically summaries of the data produced in `declare_model`

. Here we define the average treatment effect as follows:

`declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0))`

Notice that we defined the PATE (the *population* average treatment effect), but said nothing special related to the population – it looks like we just defined the average treatment effect. This is because *order matters*. If we want to define a SATE (the *sample* average treatment effect), we would have to do so after sampling has occurred. We will see how to do this in a moment.

### Data strategy

The data strategy constitutes one or more steps representing interventions the researcher makes in the world from sampling to assignment to measurement.

#### Sampling

The sampling step relies on the `randomizr`

package to conduct random sampling. See Section **??** for an overview of the many kinds of sampling that are possible. Here we define a procedure for drawing a 50-unit sample from the population:

`declare_sampling(S = complete_rs(N, n = 50), legacy = FALSE)`

When we draw data from our simple design at this point, it will have fewer rows: it will have shrunk from 100 units in the population to a data frame of 50 units representing the sample. The new data frame also includes a variable indicating the probability of being included in the sample. In this case, every unit in the population had an equal inclusion probability of 0.5.

ID | U | Y_Z_0 | Y_Z_1 | S | |
---|---|---|---|---|---|

5 | 005 | 0.523 | 0.523 | 0.77 | 1 |

6 | 006 | 1.933 | 1.933 | 2.18 | 1 |

8 | 008 | 0.748 | 0.748 | 1.00 | 1 |

10 | 010 | -0.078 | -0.078 | 0.17 | 1 |

11 | 011 | 2.119 | 2.119 | 2.37 | 1 |

Sampling could also be non-random, which could be accomplished by using a custom handler.

#### Assignment

Here, we define an assignment procedure that allocates subjects to treatment with probability 0.5.

`declare_assignment(Z = complete_ra(N, prob = 0.5), legacy = FALSE)`

After treatments are assigned, some *potential* outcomes are *revealed*. Treated units reveal their treated potential outcomes and untreated units reveal their untreated potential outcomes. The `reveal_outcomes`

function performs this switching operation.

`declare_measurement(Y = reveal_outcomes(Y ~ Z))`

Adding these two declarations to the design results in a data frame with an additional indicator `Z`

for the assignment as well as its corresponding probability of assignment. Again, here the assignment probabilities are constant, but in other designs described in Section **??** they are not and this is crucial information for the analysis stage. The outcome variable `Y`

is composed of each unit’s potential outcomes depending on its treatment status.

ID | U | Y_Z_0 | Y_Z_1 | S | Z | Y |
---|---|---|---|---|---|---|

004 | 0.94 | 0.94 | 1.19 | 1 | 1 | 1.19 |

005 | 0.77 | 0.77 | 1.02 | 1 | 0 | 0.77 |

008 | -0.17 | -0.17 | 0.08 | 1 | 1 | 0.08 |

009 | 1.60 | 1.60 | 1.85 | 1 | 1 | 1.85 |

010 | -1.39 | -1.39 | -1.14 | 1 | 0 | -1.39 |

#### Measurement

Measurement is a critical part of every research design; sometimes it is beneficial to explicitly declare the measurement procedures of the design, rather than allowing them to be implicit in the ways variables are created in `declare_model`

. For example, we might imagine that the normally distributed outcome variable `Y`

is a latent outcome that will be translated into a binary outcome when measured by the researcher:

`declare_measurement(Y_binary = rbinom(N, 1, prob = pnorm(Y)))`

ID | U | Y_Z_0 | Y_Z_1 | S | Z | Y | Y_binary |
---|---|---|---|---|---|---|---|

001 | -1.19 | -1.19 | -0.94 | 1 | 0 | -1.19 | 0 |

002 | -0.66 | -0.66 | -0.41 | 1 | 0 | -0.66 | 0 |

003 | 1.09 | 1.09 | 1.34 | 1 | 0 | 1.09 | 1 |

004 | 0.51 | 0.51 | 0.76 | 1 | 1 | 0.76 | 1 |

007 | 0.76 | 0.76 | 1.00 | 1 | 0 | 0.76 | 0 |

### Answer strategy

Through our model and data strategy steps, we have simulated a dataset with two key inputs to the answer strategy: an assignment variable and an outcome. In other answer strategies, pretreatment characteristics from the model might also be relevant. The data look like this:

ID | U | Y_Z_0 | Y_Z_1 | S | Z | Y |
---|---|---|---|---|---|---|

001 | -0.06 | -0.06 | 0.190 | 1 | 0 | -0.06 |

002 | -0.16 | -0.16 | 0.089 | 1 | 0 | -0.16 |

004 | 0.99 | 0.99 | 1.239 | 1 | 1 | 1.24 |

006 | -2.61 | -2.61 | -2.364 | 1 | 0 | -2.61 |

007 | 0.78 | 0.78 | 1.030 | 1 | 1 | 1.03 |

Our estimator is the difference-in-means estimator, which compares outcomes between the group that was assigned to treatment and that assigned to control. The `difference_in_means()`

function in the `estimatr`

package calculates the estimate, the standard error, \(p\)-value and confidence interval for you:

`difference_in_means(Y ~ Z, data = simple_design_data)`

term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|

Z | 0.39 | 0.28 | 1.4 | 0.17 | -0.17 | 0.95 | 46 | Y |

Now, in order to *declare* our estimator, we can send the name of a modeling function to `declare_estimator`

. R has many modeling functions that work with `declare_estimator`

, including `lm`

, `glm`

, or the `ictreg`

function from the `list`

package, among hundreds of others. We use many estimators from `estimatr`

because they are fast and calculate robust standard errors easily. Estimators are (almost always) associated with estimands.^{1} Here, we are targeting the population average treatment effect with the difference-in-means estimator.

```
declare_estimator(
Y ~ Z, model = difference_in_means, inquiry = "PATE"
)
```

#### Two finer points: `model_summary`

and `label_estimator`

Many answer strategies use modeling functions like `lm`

, `lm_robust`

, or `glm`

. The output from these modeling functions are typically very complicated `list`

objects that contain large amounts of information about the modeling process. We typically only want a few summary pieces of information out of these model objects, like the coefficient estimates, standard errors, and confidence intervals. We use model summary functions passed to the `model_summary`

argument of `declare_estimator`

to do so. Model summary functions take models as inputs and return data frames as outputs.

The default model summary function is `tidy`

:

```
declare_estimator(
Y ~ Z, model = lm_robust, model_summary = tidy
)
```

You could also use `glance`

to get model fit statistics like \(R^2\).

```
declare_estimator(
Y ~ Z, model = lm_robust, model_summary = glance
)
```

Occasionally, you’ll need to write your own model summary function that takes a model fit object and returns a data.frame with the information you need. For example, in order to calculate average marginal effects estimates from a logistic regression, we run a `glm`

model through the `margins`

function from the `margins`

package; we then need to “tidy” the output from `margins`

using the `tidy`

function. Here we’re also asking for a 95% confidence interval.

```
tidy_margins <- function(x) {
tidy(margins(x, data = x$data), conf.int = TRUE)
}
declare_estimator(
Y ~ Z + X,
model = glm,
family = binomial("logit"),
model_summary = tidy_margins,
term = "Z"
)
```

If your answer strategy does not use a `model`

function, you’ll need to provide a function that takes `data`

as an input and returns a `data.frame`

with the estimate. Set the handler to be `label_estimator(your_function_name)`

to take advantage of DeclareDesign’s mechanism for matching estimands to estimators. When you use `label_estimator`

, you can provide an estimand, and DeclareDesign will keep track of which estimates match each estimand. For example, to calculate the mean of an outcome, you could write your own estimator in this way:

```
my_estimator <- function(data){
data.frame(estimate = mean(data$Y))
}
declare_estimator(handler = label_estimator(my_estimator),
label = "mean", inquiry = "Y_bar")
```

```
## declare_estimator(inquiry = "Y_bar", handler = label_estimator(my_estimator),
## label = "mean")
```

### Other design steps

The main `declare_*`

functions cover many elements of research designs, but not all. You can include any operations we haven’t explicitly included as steps in your design too, using `declare_step`

. Here, you must define a specific handler. Some handlers that may be useful are the `dplyr`

verbs such as `mutate`

and `summarize`

, and the `fabricate`

function from our `fabricatr`

package.

To add a variable using fabricate:

`declare_step(handler = fabricate, added_variable = rnorm(N))`

If you have district-month data you may want to analyze at the district level, collapsing across months:

```
collapse_data <- function(data, collapse_by) {
data %>%
group_by({{ collapse_by }}) %>%
summarize_all(mean, na.rm = TRUE)
}
declare_step(handler = collapse_data, collapse_by = district)
# Note: The `{{ }}` syntax is handy for writing functions in `dplyr`
# where you want to be able to reuse the function with different variable
# names. Here, the `collapse_data` function will `group_by` the
# variable you send to the argument `collapse_by`, which in our
# declaration we set to `district`. The pipeline within the function
# then calculates the mean in each district.
```

## Building a design from design steps

In the last section, we defined a set of individual research steps. We draw one version of them together here:

```
model <-
declare_model(N = 100, U = rnorm(N),
potential_outcomes(Y ~ 0.25 * Z + U))
inquiry <-
declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0))
sampling <- declare_sampling(
S = complete_rs(N, n = 50))
assignment <- declare_assignment(
Z = complete_ra(N, prob = 0.5))
measurement <- declare_measurement(Y = reveal_outcomes(Y ~ Z))
answer_strategy <-
declare_estimator(
Y ~ Z, model = difference_in_means, inquiry = "PATE"
)
```

To construct a research design *object* that we can operate on — diagnose it, redesign it, draw data from it, etc. — we add them together with the `+`

operator, just as `%>%`

makes `dplyr`

pipelines or `+`

creates `ggplot`

objects.

```
design <-
model + inquiry +
sampling + assignment + measurement + answer_strategy
```

We will usually declare designs more compactly, concatenating steps directly with `+`

:

```
design <-
declare_model(N = 100, U = rnorm(N),
potential_outcomes(Y ~ 0.25 * Z + U)) +
declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0)) +
declare_sampling(S = complete_rs(N, n = 50)) +
declare_assignment(Z = complete_ra(N, prob = 0.5)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_estimator(
Y ~ Z, model = difference_in_means, inquiry = "PATE"
)
```

### Order matters

When defining a design, the order in which steps are included in the design via the `+`

operator matters. Think of the order of your design as the temporal order in which steps take place. Here, since the inquiry comes before sampling and assignment, it is a *population* inquiry, the population average treatment effect.

```
model +
declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0)) +
sampling +
assignment +
measurement +
answer_strategy
```

We could define our inquiry as a *sample* average treatment effect by putting `inquiry`

after `sampling`

:

```
model +
sampling +
declare_inquiry(SATE = mean(Y_Z_1 - Y_Z_0)) +
assignment +
measurement +
answer_strategy
```

## Simulating a research design

Diagnosing a research design — learning about its properties — requires first simulating running the design over and over. We need to simulate the data generating process, then calculate the values of its inquiries, then calculate the resulting estimates. For example, to draw simulated data based on the design, we use `draw_data`

:

`draw_data(design)`

ID | U | Y_Z_0 | Y_Z_1 | S | Z | Y |
---|---|---|---|---|---|---|

003 | 0.080 | 0.080 | 0.33 | 1 | 1 | 0.33 |

008 | -1.194 | -1.194 | -0.94 | 1 | 1 | -0.94 |

010 | 1.300 | 1.300 | 1.55 | 1 | 1 | 1.55 |

013 | -1.914 | -1.914 | -1.66 | 1 | 0 | -1.91 |

016 | 0.034 | 0.034 | 0.28 | 1 | 1 | 0.28 |

`draw_data`

runs all of the “data steps” in a design, which are both from the model and from the data strategy (sampling, assignment, and measurement).

To simulate the estimands from a single run of the design, we use `draw_inquiries`

. This runs two operations at once: it draws the data, and calculates the estimands at the point defined by the design. For example, in our design, the estimand comes just after the potential outcomes. In this design, `draw_inquiries`

will run the first two steps and then calculate the value of inquiry from the `inquiry`

function we declared:

`draw_estimands(design)`

inquiry | estimand |
---|---|

PATE | 0.25 |

Similarly, we can draw the estimates from a single run with `draw_estimates`

which simulates data and, at the appropriate moment, calculates estimates.

`draw_estimates(design)`

term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome | inquiry |
---|---|---|---|---|---|---|---|---|---|

Z | 0.24 | 0.29 | 0.84 | 0.41 | -0.34 | 0.83 | 48 | Y | PATE |

To simulate designs, we use the `simulate_design`

function to draw data, calculate estimands and estimates, and then repeat the process over and over.

```
simulation_df <- simulate_design(design)
simulation_df
```

sim_ID | estimand | estimate | std.error | statistic | p.value | conf.low | conf.high | df |
---|---|---|---|---|---|---|---|---|

1 | 0.25 | 0.64 | 0.25 | 2.55 | 0.014 | 0.136 | 1.15 | 47 |

2 | 0.25 | -0.18 | 0.31 | -0.59 | 0.557 | -0.804 | 0.44 | 43 |

3 | 0.25 | 0.41 | 0.29 | 1.40 | 0.170 | -0.181 | 1.00 | 46 |

4 | 0.25 | 0.17 | 0.23 | 0.73 | 0.468 | -0.300 | 0.64 | 48 |

5 | 0.25 | 0.71 | 0.32 | 2.21 | 0.032 | 0.063 | 1.36 | 45 |

## Diagnosing a research design

Using the simulations data frame, we can calculate diagnosands like bias, root mean-squared-error, and power for each estimator-estimand pair. In `DeclareDesign`

, we do this in two steps. First, declare your diagnosands, which are functions that summarize simulations data. The software includes many pre-coded diagnosands (see Section **??**), though you can write your own like this:

```
study_diagnosands <- declare_diagnosands(
bias = mean(estimate - estimand),
rmse = sqrt(mean((estimate - estimand)^2)),
power = mean(p.value <= 0.05)
)
```

Second, apply your diagnosand declaration to the simulations data frame with the `diagnose_design`

function:

`diagnose_design(simulation_df, diagnosands = study_diagnosands)`

Bias | RMSE | Power |
---|---|---|

-0.00 | 0.28 | 0.14 |

(0.01) | (0.01) | (0.01) |

We can also do this in a single step by sending `diagnose_design`

a design object. The function will first run the simulations for you, then calculate the diagnosands from the simulation data frame that results.

`diagnose_design(design, diagnosands = study_diagnosands)`

### Redesign

After the declaration phase, you will often want to learn how the diagnosands change as design features change. We can do this using `redesign`

:

`redesign(design, N = c(100, 200, 300, 400, 500))`

An alternative way to do this is to write a “designer.” A designer is a function that makes designs based on a few design parameters. Designer help researchers flexibly explore design variations. Here’s a simple designer based on our running example:

```
simple_designer <- function(sample_size, effect_size) {
declare_model(
N = sample_size,
U = rnorm(N),
potential_outcomes(Y ~ effect_size * Z + U)
) +
declare_inquiry(PATE = mean(Y_Z_1 - Y_Z_0)) +
declare_sampling(S = complete_rs(N, n = 50)) +
declare_assignment(Z = complete_ra(N, prob = 0.5)) +
declare_measurement(Y = reveal_outcomes(Y ~ Z)) +
declare_estimator(
Y ~ Z, model = difference_in_means, inquiry = "PATE"
)
}
```

To create a single design, based on our original parameters of a 100-unit sample size and a treatment effect of `0.25`

, we can run:

`design <- simple_designer(sample_size = 100, effect_size = 0.25)`

Now to simulate multiple designs, we can use the `DeclareDesign`

function `expand_design`

. Here we examine our simple design under several possible sample sizes, which we might want to do to conduct a minimum power analysis. We hold the effect size constant.

```
designs <- expand_design(
simple_designer,
sample_size = c(100, 500, 1000),
effect_size = 0.25
)
```

Our simulation and diagnosis tools can take a list of designs and simulate all of them at once, creating a column called `design`

to keep track. For example:

`diagnose_design(designs)`

### Comparing designs

Alternatively, we can compare a pair of designs directly with the `compare_designs`

function. This function is most useful for comparing the differences between a planned design and an implemented design.

`compare_designs(planned_design, implemented_design)`

Similarly, we can compare two designs on the basis of their diagnoses:

`compare_diagnoses(planned_design, implemented_design)`

### Library of designs

In our `DesignLibrary`

package, we have created a set of common designs as designers (functions that create designs from just a few parameters), so you can get started quickly.

```
library(DesignLibrary)
block_cluster_design <- block_cluster_two_arm_designer(N = 1000, N_blocks = 10)
```

## Further Reading

For much more detail and help, we recommend the following resources.

- Short introduction to DeclareDesign on the World Bank Development Impact blog
- Our paper in the
*American Political Science Review*(appendices) - Library of research designs, which includes a range of canonical designs that can be declared with a single line of code and then tailored for your own uses
- Blog series that demonstrates how DeclareDesign can be used to shed light on a range of tricky design choices, which includes declarations of many designs
- randomizr cheatsheet
- estimatr cheatsheet
- DeclareDesign cheatsheet
- R for Data Science
- RStudio R primers
- Computational social science bootcamp

Sometimes, you may be interested in properties of an estimator that do not depend on an estimand, such as calculating its power↩︎