This vignette serves as a brief introduction to the `DeclareDesign`

package for R. `DeclareDesign`

is a software implementation of every step of the design-diagnose-redesign process. While you can of course declare, diagnose, and redesign your design using nearly any programming language, `DeclareDesign`

is structured to make it easy to mix-and-match design elements while handling the tedious simulation bookkeeping behind the scenes.

You can download the statistical computing environment R for free from CRAN. We also recommend the free program RStudio, which provides a friendly interface to R. Both R and RStudio are available on Windows, Mac, and Linux.

Once you have R and RStudio installed, open it up and install `DeclareDesign`

and its related packages. These include three packages that enable specific steps in the research process (`fabricatr`

for simulating social science data; `randomizr`

for random sampling and random assignment; and `estimatr`

for design-based estimators). You can also install `DesignLibrary`

, which gets standard designs up-and-running in one line. To install them, copy the following code into your R console:

```
install.packages(c(
"DeclareDesign",
"fabricatr",
"randomizr",
"estimatr",
"DesignLibrary"
))
```

We also recommend that you install and get to know the `tidyverse`

suite of packages for data analysis, which we will use throughout the book:

`install.packages("tidyverse")`

For introductions to R and the `tidyverse`

we especially recommend the free resource `R for Data Science`

.

A research design is a concatenation of design steps. The best way to learn how to build a design is to learn how to make a step. We will start out by making—or *declaring*—a step that implements random assignment.

Almost all steps take a dataset as input and return a dataset as output. We will imagine input data that describes a set of voters in Los Angeles. The research project we are planning involves randomly assigning voters to receive (or not receive) a knock on their door from a canvasser. Our data look like this:

ID | age | sex | party | precinct |
---|---|---|---|---|

001 | 66 | M | REP | 9104 |

002 | 54 | F | DEM | 8029 |

003 | 18 | M | GRN | 8383 |

004 | 42 | F | DEM | 2048 |

005 | 27 | M | REP | 5210 |

There are 100 voters in the dataset.

We want a function that takes this dataset, implements a random assignment, adds it to the dataset, and then returns the new dataset containing the random assignment.

You could write your own function to do that but you can also use one of the `declare_*`

functions in `DeclareDesign`

that are designed to write functions. Each one of these functions is a kind of *function factory*: it takes a set of parameters about your research design like the number of units and the random assignment probability as *inputs*, and returns a *function* as an output. Here is an example of a `declare_assignment`

step.

`simple_random_assignment_step <- declare_assignment(prob = 0.6)`

The big idea here is that the object we created, `simple_random_assignment_step`

, is not a particular assignment, it is a *function* that conducts assignment when called. You can run the function on data:

`simple_random_assignment_step(voter_file) `

ID | age | sex | party | precinct | Z | Z_cond_prob |
---|---|---|---|---|---|---|

001 | 66 | M | REP | 9104 | 0 | 0.4 |

002 | 54 | F | DEM | 8029 | 0 | 0.4 |

003 | 18 | M | GRN | 8383 | 1 | 0.6 |

004 | 42 | F | DEM | 2048 | 0 | 0.4 |

005 | 27 | M | REP | 5210 | 0 | 0.4 |

The output of the `simple_random_assignment_step(voter_file)`

call is the original dataset with a new column indicating treatment assignment (`Z`

) appended. As a bonus, the data also includes the probability that each unit is assigned *to the condition in which it is in* (`Z_cond`

), which is an extremely useful number to know in many analysis settings. The most important thing to understand here is that steps are “dataset-in, dataset-out” functions. The `simple_random_assignment_step`

took the `voter_file`

dataset and returned a dataset with assignment information appended.

Every step of a research design declaration can be written using one of the `declare_*`

functions. This table collects these according to the four elements of a research design. Below, we walk through the common uses of each of these declaration functions.

Design component | Function | Description |
---|---|---|

Model | `declare_population()` |
define background variables |

`declare_potential_outcomes()` |
define functional relationships between treatments and outcomes | |

Inquiry | `declare_estimand()` |
define research question |

Data strategy | `declare_sampling()` |
specify sampling procedures |

`declare_assignment()` |
specify assignment procedures | |

`reveal_outcomes()` |
link potential outcomes to revealed outcomes via assignment | |

`declare_measurement()` |
specify measurement procedures | |

Answer strategy | `declare_estimator()` |
specify data summary procedures |

Each of the `declare_*`

functions has many options. In general, you do not have to specify these as default values are usually provided. For instance, you might have noticed above that when you ran the assignment step above, the new variable that was created was called `Z`

. This is because `declare_assignment`

has an argument `assignment_variable`

that defaults to `Z`

. You can change that of course to whatever you want.

More subtly, the `declare_*`

functions also default to “handlers” which have their own default arguments. These handlers are generally well-developed sets of functions that implement the tasks needed by the `declare_`

function. For instance, `assignment_handler`

defaults to the `conduct_ra`

function from the `randomizr`

package. The declaration passes any additional arguments that you give it on to `conduct_ra`

, and, by the same token, assumes the default values of the handler. In the example above, we had `prob = 0.6`

as an argument. If you look at the documentation, `prob`

is not an argument of `declare_assignment`

but it is an argument of `conduct_ra`

, with a default value of 0.5. If we had left this bit out we would have gotten a function that assigned treatment with probability 0.5. As with any software, learning these defaults will take some time and can be looked up in the help files, e.g. `?declare_assignment`

.

The built-in functions we provide in the `DeclareDesign`

package are quite flexible and handle many major designs, but not all. The framework is built so that you are never constrained by what we provide. At any point, rather than using the default handlers (such as `conduct_ra`

), you can write a function that implements your own procedures. The only discipline that the framework imposes is that you write your procedure as a function that takes data in and sends data back.

Here is an example of how you turn your own functions into design steps.

```
custom_assignment <- function(data) {
mutate(data, Z = rbinom(n = nrow(data), 1, prob = 0.5))
}
my_assignment_step <- declare_assignment(handler = custom_assignment)
my_assignment_step(voter_file)
```

ID | age | sex | party | precinct | Z |
---|---|---|---|---|---|

001 | 66 | M | REP | 9104 | 0 |

002 | 54 | F | DEM | 8029 | 1 |

003 | 18 | M | GRN | 8383 | 1 |

004 | 42 | F | DEM | 2048 | 0 |

005 | 27 | M | REP | 5210 | 1 |

In this section, we walk through how to declare each step of a research design using `DeclareDesign`

. In the next section, we build those steps into a research design, and then describe how to interrogate the design.

The model defines the structure of the world, both its size and background characteristics as well as how interventions in the world determine outcomes. In `DeclareDesign`

, we split the model into two functions: `declare_population`

and `declare_potential_outcomes`

.

The population defines the number of units in the population, any multilevel structure to the data, and its background characteristics. We can define the population in several ways. In some cases, you may start a design with data on the population. When that happens, we do not need to simulate it. We can simply declare the data as our population:

`declare_population(data = voter_file)`

ID | age | sex | party | precinct |
---|---|---|---|---|

001 | 66 | M | REP | 9104 |

002 | 54 | F | DEM | 8029 |

003 | 18 | M | GRN | 8383 |

004 | 42 | F | DEM | 2048 |

005 | 27 | M | REP | 5210 |

When we do not have complete data on the population, we simulate it. Relying on the data simulation functions from our `fabricatr`

package, `declare_population`

asks about the size and variables of the population. For instance, if we want a function that generates a dataset with 100 units and a random variable `U`

we write:

`declare_population(N = 100, U = rnorm(N))`

When we run this population function, we will get a different 100-unit dataset each time, as shown here.

ID | U | ID | U | ID | U | ID | U | ID | U |
---|---|---|---|---|---|---|---|---|---|

001 | -0.32 | 001 | 0.19 | 001 | -1.280 | 001 | -0.38 | 001 | 0.744 |

002 | 1.17 | 002 | 0.69 | 002 | 1.880 | 002 | -0.35 | 002 | 2.445 |

003 | 1.70 | 003 | 0.82 | 003 | 0.597 | 003 | -0.64 | 003 | 0.043 |

004 | 0.93 | 004 | -0.98 | 004 | -1.963 | 004 | 0.40 | 004 | 0.159 |

005 | -1.15 | 005 | -1.29 | 005 | 0.084 | 005 | 0.34 | 005 | 1.686 |

The `fabricatr`

package can simulate many different types of data, including various types of categorical variables or different types of data structures, such as panel or multilevel structures. You can read the `fabricatr`

website vignette to get started simulating data.

As an example of a two-level hierarchical data structure, here is a declaration for 100 households with a random number of individuals within each household. This two-level structure could be declared as:

```
declare_population(
households = add_level(
N = 100,
individuals_per_hh = sample(1:6, N, replace = TRUE)
),
individuals = add_level(
N = individuals_per_hh,
age = sample(1:100, N, replace = TRUE)
)
)
```

As always, you can exit our built-in way of doing things and bring in your own code. This is useful for complex designs, or when you have already written code for your design and you want to use it directly. Here is an example of a custom population declaration:

```
complex_population_function <- function(data, N_units) {
data.frame(U = rnorm(N_units))
}
declare_population(
handler = complex_population_function, N_units = 100
)
```

Defining potential outcomes is as easy as a single expression per potential outcome. Potential outcomes may depend on background characteristics, other potential outcomes, or other R functions.

```
declare_potential_outcomes(
Y_Z_0 = U,
Y_Z_1 = Y_Z_0 + 0.25)
```

```
design <-
declare_population(N = 100, U = rnorm(N)) +
declare_potential_outcomes(Y ~ 0.25 * Z + U)
draw_data(design)
```

ID | U | Y_Z_0 | Y_Z_1 |
---|---|---|---|

001 | 1.53 | 1.53 | 1.782 |

002 | 0.92 | 0.92 | 1.172 |

003 | -1.19 | -1.19 | -0.937 |

004 | -0.20 | -0.20 | 0.046 |

005 | -1.06 | -1.06 | -0.809 |

The `declare_potential_outcomes`

function also includes an alternative interface for defining potential outcomes that uses R’s formula syntax. The formula syntax lets you specify “regression-like” outcome equations. One downside is that it mildly obscures how the names of the eventual potential outcomes columns are named. We build the names of the potential outcomes columns the outcome name (here `Y`

on the left-hand side of the formula) and from the `assignment_variables`

argument (here `Z`

).

`declare_potential_outcomes(Y ~ 0.25 * Z + U, assignment_variables = Z)`

Either way of creating potential outcomes works; one may be easier or harder to code up in a given research design setting.

To define your inquiry, declare your estimand. Estimands are typically summaries of the data produced in `declare_population`

and `declare_potential_outcomes`

. Here we define the average treatment effect as follows:

`declare_estimand(PATE = mean(Y_Z_1 - Y_Z_0))`

Notice that we defined the PATE (the *population* average treatment effect), but said nothing special related to the population – it looks like we just defined the average treatment effect. This is because *order matters*. If we want to define a SATE (the *sample* average treatment effect), we would have to do so after sampling has occurred. We will see how to do this in a moment.

The data strategy constitutes one or more steps representing interventions the researcher makes in the world from sampling to assignment to measurement.

The sampling step relies on the `randomizr`

package to conduct random sampling. Here we define a procedure for drawing a 50-unit sample from the population:

`declare_sampling(n = 50)`

When we draw data from our simple design at this point, it will have fewer rows: it will have shrunk from 100 units in the population to a data frame of 50 units representing the sample. The new data frame also includes a variable indicating the probability of being included in the sample. In this case, every unit in the population had an equal inclusion probability of 0.5.

ID | U | Y_Z_0 | Y_Z_1 | S_inclusion_prob | |
---|---|---|---|---|---|

1 | 001 | 0.86 | 0.86 | 1.11 | 0.5 |

3 | 003 | 0.62 | 0.62 | 0.87 | 0.5 |

5 | 005 | 1.02 | 1.02 | 1.27 | 0.5 |

6 | 006 | 0.86 | 0.86 | 1.11 | 0.5 |

8 | 008 | -0.43 | -0.43 | -0.18 | 0.5 |

Sampling could also be non-random, which could be accomplished by using a custom handler.

The default handler for `declare_assignment`

also relies on the `randomizr`

package for random assignment. Here, we define an assignment procedure that allocates subjects to treatment with probability 0.5. One subtlety is that by default, `declare_assignment`

conducts complete random assignment (exactly \(m\) of \(N\) units assigned to treatment, where \(m\) = `prob`

* \(N\)).

`declare_assignment(prob = 0.5)`

After treatments are assigned, some *potential* outcomes are *revealed*. Treated units reveal their treated potential outcomes and untreated units reveal their untreated potential outcomes. The `reveal_outcomes`

function performs this switching operation.

`reveal_outcomes(Y, Z)`

Adding these two declarations to the design results in a data frame with an additional indicator `Z`

for the assignment as well as its corresponding probability of assignment. Again, here the assignment probabilities are constant, but in other designs they are not and this is crucial information for the analysis stage. The outcome variable `Y`

is composed of each unit’s potential outcomes depending on its treatment status.

ID | U | Y_Z_0 | Y_Z_1 | S_inclusion_prob | Z | Z_cond_prob | Y |
---|---|---|---|---|---|---|---|

001 | -0.86 | -0.86 | -0.61 | 0.5 | 1 | 0.5 | -0.61 |

003 | 0.92 | 0.92 | 1.17 | 0.5 | 1 | 0.5 | 1.17 |

004 | -0.72 | -0.72 | -0.47 | 0.5 | 0 | 0.5 | -0.72 |

006 | 0.45 | 0.45 | 0.70 | 0.5 | 0 | 0.5 | 0.45 |

008 | 0.15 | 0.15 | 0.40 | 0.5 | 0 | 0.5 | 0.15 |

Measurement is a critical part of every research design; sometimes it is beneficial to explicitly declare the measurement procedures of the design, rather than allowing them to be implicit in the ways variables are created in `declare_population`

and `declare_potential_outcomes`

. For example, we might imagine that the normally distributed outcome variable `Y`

is a latent outcome that will be translated into a binary outcome when measured by the researcher:

`declare_measurement(Y_binary = rbinom(N, 1, prob = pnorm(Y)))`

ID | U | Y_Z_0 | Y_Z_1 | S_inclusion_prob | Z | Z_cond_prob | Y | Y_binary |
---|---|---|---|---|---|---|---|---|

001 | -1.1 | -1.1 | -0.86 | 0.5 | 1 | 0.5 | -0.86 | 0 |

006 | 3.1 | 3.1 | 3.38 | 0.5 | 1 | 0.5 | 3.38 | 1 |

008 | 1.9 | 1.9 | 2.11 | 0.5 | 0 | 0.5 | 1.86 | 1 |

012 | -1.6 | -1.6 | -1.32 | 0.5 | 1 | 0.5 | -1.32 | 0 |

013 | -0.3 | -0.3 | -0.05 | 0.5 | 0 | 0.5 | -0.30 | 0 |

Through our model and data strategy steps, we have simulated a dataset with two key inputs to the answer strategy: an assignment variable and an outcome. In other answer strategies, pretreatment characteristics from the model might also be relevant. The data look like this:

ID | U | Y_Z_0 | Y_Z_1 | S_inclusion_prob | Z | Z_cond_prob | Y |
---|---|---|---|---|---|---|---|

001 | 1.680 | 1.680 | 1.93 | 0.5 | 1 | 0.5 | 1.930 |

002 | 0.057 | 0.057 | 0.31 | 0.5 | 0 | 0.5 | 0.057 |

003 | -1.280 | -1.280 | -1.03 | 0.5 | 1 | 0.5 | -1.030 |

005 | 0.613 | 0.613 | 0.86 | 0.5 | 1 | 0.5 | 0.863 |

006 | -1.241 | -1.241 | -0.99 | 0.5 | 1 | 0.5 | -0.991 |

Our estimator is the difference-in-means estimator, which compares outcomes between the group that was assigned to treatment and that assigned to control. The `difference_in_means()`

function in the `estimatr`

package calculates the estimate, the standard error, \(p\)-value and confidence interval for you:

`difference_in_means(Y ~ Z, data = simple_design_data)`

term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome |
---|---|---|---|---|---|---|---|---|

Z | 0.13 | 0.27 | 0.5 | 0.62 | -0.4 | 0.67 | 48 | Y |

Now, in order to *declare* our estimator, we can send the name of a modeling function to `declare_estimator`

. R has many modeling functions that work with `declare_estimator`

, including `lm`

, `glm`

, or the `ictreg`

function from the `list`

package, among hundreds of others. Throughout the book, we will be using many estimators from `estimatr`

because they are fast and calculate robust standard errors easily. Estimators are (almost always) associated with estimands.^{1} Here, we are targeting the population average treatment effect with the difference-in-means estimator.

```
declare_estimator(
Y ~ Z, model = difference_in_means, estimand = "PATE"
)
```

`model_summary`

and `label_estimator`

Many answer strategies use modeling functions like `lm`

, `lm_robust`

, or `glm`

. The output from these modeling functions are typically very complicated `list`

objects that contain large amounts of information about the modeling process. We typically only want a few summary pieces of information out of these model objects, like the coefficient estimates, standard errors, and confidence intervals. We use model summary functions passed to the `model_summary`

argument of `declare_estimator`

to do so. Model summary functions take models as inputs and return data frames as outputs.

The default model summary function is `tidy`

:

```
declare_estimator(
Y ~ Z, model = lm_robust, model_summary = tidy
)
```

You could also use `glance`

to get model fit statistics like \(R^2\).

```
declare_estimator(
Y ~ Z, model = lm_robust, model_summary = glance
)
```

Occasionally, you’ll need to write your own model summary function that takes a model fit object and returns a data.frame with the information you need. For example, in order to calculate average marginal effects estimates from a logistic regression, we run a `glm`

model through the `margins`

function from the `margins`

package; we then need to “tidy” the output from `margins`

using the `tidy`

function. Here we’re also asking for a 95% confidence interval.

```
tidy_margins <- function(x) {
tidy(margins(x, data = x$data), conf.int = TRUE)
}
declare_estimator(
Y ~ Z + X,
model = glm,
family = binomial("logit"),
model_summary = tidy_margins,
term = "Z"
)
```

If your answer strategy does not use a `model`

function, you’ll need to provide a function that takes `data`

as an input and returns a `data.frame`

with the estimate. Set the handler to be `label_estimator(your_function_name)`

to take advantage of DeclareDesign’s mechanism for matching estimands to estimators. When you use `label_estimator`

, you can provide an estimand, and DeclareDesign will keep track of which estimates match each estimand. For example, to calculate the mean of an outcome, you could write your own estimator in this way:

```
my_estimator <- function(data){
data.frame(estimate = mean(data$Y))
}
declare_estimator(handler = label_estimator(my_estimator), label = "mean", estimand = "Y_bar")
```

```
## declare_estimator(estimand = "Y_bar", handler = label_estimator(my_estimator),
## label = "mean")
```

The main `declare_*`

functions cover many elements of research designs, but not all. You can include any operations we haven’t explicitly included as steps in your design too, using `declare_step`

. Here, you must define a specific handler. Some handlers that may be useful are the `dplyr`

verbs such as `mutate`

and `summarize`

, and the `fabricate`

function from our `fabricatr`

package.

To add a variable using fabricate:

`declare_step(handler = fabricate, added_variable = rnorm(N))`

If you have district-month data you may want to analyze at the district level, collapsing across months:

```
collapse_data <- function(data, collapse_by) {
data %>%
group_by({{ collapse_by }}) %>%
summarize_all(mean, na.rm = TRUE)
}
declare_step(handler = collapse_data, collapse_by = district)
# Note: The `{{ }}` syntax is handy for writing functions in `dplyr`
# where you want to be able to reuse the function with different variable
# names. Here, the `collapse_data` function will `group_by` the
# variable you send to the argument `collapse_by`, which in our
# declaration we set to `district`. The pipeline within the function
# then calculates the mean in each district.
```

In the last section, we defined a set of individual research steps. We draw one version of them together here:

```
population <-
declare_population(N = 100, U = rnorm(N))
potential_outcomes <-
declare_potential_outcomes(Y ~ 0.25 * Z + U)
estimand <-
declare_estimand(PATE = mean(Y_Z_1 - Y_Z_0))
sampling <-
declare_sampling(n = 50)
assignment <-
declare_assignment(prob = 0.5)
reveal <-
reveal_outcomes(outcome_variables = Y, assignment_variables = Z)
estimator <-
declare_estimator(
Y ~ Z, model = difference_in_means, estimand = "PATE"
)
```

To construct a research design *object* that we can operate on — diagnose it, redesign it, draw data from it, etc. — we add them together with the `+`

operator, just as `%>%`

makes `dplyr`

pipelines or `+`

creates `ggplot`

objects.

```
design <-
population + potential_outcomes + estimand +
sampling + assignment + reveal + estimator
```

We will usually declare designs more compactly, concatenating steps directly with `+`

:

```
design <-
declare_population(N = 100, U = rnorm(N)) +
declare_potential_outcomes(Y ~ 0.25 * Z + U) +
declare_estimand(PATE = mean(Y_Z_1 - Y_Z_0)) +
declare_sampling(n = 50) +
declare_assignment(prob = 0.5) +
reveal_outcomes(outcome_variables = Y, assignment_variables = Z) +
declare_estimator(
Y ~ Z, model = difference_in_means, estimand = "PATE"
)
```

When defining a design, the order in which steps are included in the design via the `+`

operator matters. Think of the order of your design as the temporal order in which steps take place. Here, since the estimand comes before sampling and assignment, it is a *population* estimand, the population average treatment effect.

```
population + potential_outcomes + estimand +
sampling + assignment + reveal + estimator
```

We could define our estimand as a *sample* average treatment effect by putting `estimand`

after `sampling`

:

```
population + potential_outcomes + sampling +
estimand + assignment + reveal + estimator
```

Diagnosing a research design — learning about its properties — requires first simulating running the design over and over. We need to simulate the data generating process, then calculate the estimands, then calculate the resulting estimates.

With the design defined as an object, we can learn about what kind of data it generates, the values of its estimand and estimates, and other features. For example, to draw simulated data based on the design, we use `draw_data`

:

`draw_data(design)`

ID | U | Y_Z_0 | Y_Z_1 | S_inclusion_prob | Z | Z_cond_prob | Y |
---|---|---|---|---|---|---|---|

002 | 0.076 | 0.076 | 0.33 | 0.5 | 0 | 0.5 | 0.076 |

005 | 0.595 | 0.595 | 0.84 | 0.5 | 0 | 0.5 | 0.595 |

006 | -0.431 | -0.431 | -0.18 | 0.5 | 0 | 0.5 | -0.431 |

007 | -1.133 | -1.133 | -0.88 | 0.5 | 0 | 0.5 | -1.133 |

008 | -2.237 | -2.237 | -1.99 | 0.5 | 0 | 0.5 | -2.237 |

`draw_data`

runs all of the “data steps” in a design, which are both from the model (population and potential outcomes) and from the data strategy (sampling, assignment, and measurement).

To simulate the estimands from a single run of the design, we use `draw_estimands`

. This runs two operations at once: it draws the data, and calculates the estimands at the point defined by the design. For example, in our design, the estimand comes just after the potential outcomes. In this design, `draw_estimands`

will run the first two steps and then calculate the estimands from the `estimand`

function we declared:

`draw_estimands(design)`

estimand_label | estimand |
---|---|

PATE | 0.25 |

Similarly, we can draw the estimates from a single run with `draw_estimates`

which simulates data and, at the appropriate moment, calculates estimates.

`draw_estimates(design)`

term | estimate | std.error | statistic | p.value | conf.low | conf.high | df | outcome | estimand_label |
---|---|---|---|---|---|---|---|---|---|

Z | 0.63 | 0.32 | 2 | 0.054 | -0.011 | 1.3 | 44 | Y | PATE |

To simulate designs, we use the `simulate_design`

function to draw data, calculate estimands and estimates, and then repeat the process over and over.

`simulation_df <- simulate_design(design)`

sim_ID | estimand | estimate | std.error | statistic | p.value | conf.low | conf.high | df |
---|---|---|---|---|---|---|---|---|

1 | 0.25 | 0.10 | 0.31 | 0.33 | 0.74 | -0.52 | 0.73 | 43 |

2 | 0.25 | 0.30 | 0.30 | 0.99 | 0.33 | -0.31 | 0.91 | 47 |

3 | 0.25 | 0.24 | 0.32 | 0.73 | 0.47 | -0.42 | 0.89 | 40 |

4 | 0.25 | 0.44 | 0.27 | 1.61 | 0.12 | -0.11 | 0.98 | 41 |

5 | 0.25 | -0.13 | 0.29 | -0.45 | 0.66 | -0.71 | 0.45 | 45 |

Using the simulations data frame, we can calculate diagnosands like bias, root mean-squared-error, and power for each estimator-estimand pair. In `DeclareDesign`

, we do this in two steps. First, declare your diagnosands, which are functions that summarize simulations data. The software includes many pre-coded diagnosands, though you can write your own like this:

```
study_diagnosands <- declare_diagnosands(
bias = mean(estimate - estimand),
rmse = sqrt(mean((estimate - estimand)^2)),
power = mean(p.value <= 0.05)
)
```

Second, apply your diagnosand declaration to the simulations data frame with the `diagnose_design`

function:

`diagnose_design(simulation_df, diagnosands = study_diagnosands)`

Bias | RMSE | Power |
---|---|---|

0.00 | 0.28 | 0.14 |

(0.01) | (0.01) | (0.01) |

We can also do this in a single step by sending `diagnose_design`

a design object. The function will first run the simulations for you, then calculate the diagnosands from the simulation data frame that results.

`diagnose_design(design, diagnosands = study_diagnosands)`

After the declaration phase, you will often want to learn how the diagnosands change as design features change. We can do this using `redesign`

:

An alternative way to do this is to write a “designer.” A designer is a function that makes designs based on a few design parameters. Designer help researchers flexibly explore design variations. Here’s a simple designer based on our running example:

```
simple_designer <- function(sample_size, effect_size) {
declare_population(N = sample_size, U = rnorm(N)) +
declare_potential_outcomes(Y ~ effect_size * Z + U) +
declare_estimand(PATE = mean(Y_Z_1 - Y_Z_0)) +
declare_sampling(n = 50) +
declare_assignment(prob = 0.5) +
reveal_outcomes(outcome_variables = Y, assignment_variables = Z) +
declare_estimator(
Y ~ Z, model = difference_in_means, estimand = "PATE"
)
}
```

To create a single design, based on our original parameters of a 100-unit sample size and a treatment effect of `0.25`

, we can run:

`design <- simple_designer(sample_size = 100, effect_size = 0.25)`

Now to simulate multiple designs, we can use the `DeclareDesign`

function `expand_design`

. Here we examine our simple design under several possible sample sizes, which we might want to do to conduct a minimum power analysis. We hold the effect size constant.

```
designs <- expand_design(
simple_designer,
sample_size = c(100, 500, 1000),
effect_size = 0.25
)
```

Our simulation and diagnosis tools can take a list of designs and simulate all of them at once, creating a column called `design_label`

to keep track. For example:

`diagnose_design(designs)`

Alternatively, we can compare a pair of designs directly with the `compare_designs`

function. This function is most useful for comparing the differences between a planned design and an implemented design.

`compare_designs(planned_design, implemented_design)`

Similarly, we can compare two designs on the basis of their diagnoses:

`compare_diagnoses(planned_design, implemented_design)`

In our `DesignLibrary`

package, we have created a set of common designs as designers (functions that create designs from just a few parameters), so you can get started quickly.

```
library(DesignLibrary)
b_c_design <- block_cluster_two_arm_designer(N = 1000, N_blocks = 10)
```

Sometimes, you may be interested in properties of an estimator that do not depend on an estimand, such as calculating its power↩︎