A $$2^k$$ factorial design extends the two_by_two_designer to $$k$$ factors (or treatments), each with two levels, resulting in $$2^k$$ combinations of treatment conditions. You might be interested in testing not just how two, but how three, or any number $$k$$, of treatment conditions interact. The interest in the effect of each treatment in the presence or absence of the others is a key distinction from the multi_arm_designer, in which we evaluate the effects of treatments that are never assigned in combination.

Let’s say you set out to study how incentives and managerial practices affect employee productivity in combination. Let’s imagine that intervention $$A$$ is a salary increase conditional on performance indicators, intervention $$B$$ is increased monitoring, and intervention $$C$$ is a motivational activity. A key question might be the marginal effect of conditional salary increases on productivity in the presence of greater monitoring or in the presence of motivational interventions, or both.

## Design Declaration

• Model:

We stipulate $$2^k$$ potential outcomes. In our example, with $$k = 3$$ binary treatments $$A, B, C \in \{0,1\}$$, each unit $$i$$ has potential outcomes $$Y_i^{ABC}$$. We denote the potential outcome of a unit in control for all treatments as $$Y_i^{000}$$, for example, and that of a unit assigned to B and C only as $$Y_i^{011}$$. Each potential outcome is distributed normally with mean $$\mu_j$$ and standard deviation $$\sigma_j$$, where $$j \in \{1,\dots,2^k\}$$ indexes the treatment conditions.
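As a quick illustration of the model's scale (a base-R sketch, not part of the declaration itself), the $$2^k$$ potential-outcome labels can be enumerated directly:

```r
# Enumerate the 2^3 = 8 potential-outcome labels for binary treatments A, B, C.
conds <- expand.grid(A = 0:1, B = 0:1, C = 0:1)
labels <- paste0("Y^", conds$A, conds$B, conds$C)
labels  # "Y^000" "Y^100" "Y^010" ... "Y^111"
```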

• Inquiry:

Estimands are defined for each treatment as weighted averages of differences in potential outcomes across the conditions defined by the other treatments. The effect of A in this design, $$\tau_A$$, averaged over the conditions defined by B and C, is given by: $$\tau_A = \frac{1}{4}\left[(Y^{111} - Y^{011}) + (Y^{101} - Y^{001}) + (Y^{110} - Y^{010}) + (Y^{100} - Y^{000})\right]$$.
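Under equal assignment probabilities, this estimand is just the equally weighted mean of the four stratum-level differences. A minimal base-R sketch (the mu values are illustrative, not taken from the design):

```r
# mu["abc"] is the mean of Y^{abc}; these values are purely illustrative.
mu <- c("000" = 0.0, "001" = 0.1, "010" = 0.2, "011" = 0.3,
        "100" = 0.5, "101" = 0.6, "110" = 0.7, "111" = 0.8)
# Effect of A averaged over the four (B, C) strata, each weighted 1/4.
tau_A <- mean(c(mu["111"] - mu["011"], mu["101"] - mu["001"],
                mu["110"] - mu["010"], mu["100"] - mu["000"]))
tau_A  # 0.5: A raises the mean outcome by 0.5 in every stratum
```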

• Data strategy:

We randomly assign units to one of the $$2^k$$ combinations of treatment using equal probabilities of assignment.
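This is complete random assignment over $$2^k$$ arms. A base-R sketch of the same idea, permuting a balanced vector of condition labels:

```r
# Complete random assignment: each of the 8 conditions gets exactly N/8 units.
N <- 256
Z <- sample(rep(1:8, each = N / 8))
table(Z)  # 32 units in every condition
```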

• Answer strategy:

Estimates are drawn from a regression in which all treatment indicators are demeaned and observations are weighted by the inverse probability of assignment to their condition. In this demeaned regression, the constant captures the average outcome across all conditions, not the outcome when all units are in the control condition.
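The role of demeaning can be seen in a toy simulation with a single binary treatment (illustrative numbers only):

```r
# Demeaning shifts the intercept from the control-group mean
# to the average outcome across both conditions.
set.seed(42)
T1 <- rep(0:1, each = 50)
Y <- 1 + 2 * T1 + rnorm(100)
coef(lm(Y ~ T1))["(Intercept)"]           # close to 1, the control mean
coef(lm(Y ~ I(T1 - 0.5)))["(Intercept)"]  # close to 2, the grand mean
```

With a demeaned regressor, the intercept equals the sample mean of Y exactly, because the regressor has mean zero.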

N <- 256
outcome_means <- rep(0, 8)
outcome_sds <- rep(1, 8)

population <- declare_population(N)
potential_outcomes <- declare_potential_outcomes(
  Y_0_0_0 = outcome_means[1] + rnorm(N, 0, outcome_sds[1]),
  Y_1_0_0 = outcome_means[2] + rnorm(N, 0, outcome_sds[2]),
  Y_0_1_0 = outcome_means[3] + rnorm(N, 0, outcome_sds[3]),
  Y_1_1_0 = outcome_means[4] + rnorm(N, 0, outcome_sds[4]),
  Y_0_0_1 = outcome_means[5] + rnorm(N, 0, outcome_sds[5]),
  Y_1_0_1 = outcome_means[6] + rnorm(N, 0, outcome_sds[6]),
  Y_0_1_1 = outcome_means[7] + rnorm(N, 0, outcome_sds[7]),
  Y_1_1_1 = outcome_means[8] + rnorm(N, 0, outcome_sds[8]))
reveal_Y <- declare_reveal(handler = function(data) {
  # Build the name of the realized potential outcome for each unit,
  # e.g. "Y_0_1_1", from its treatment indicators, then look it up by row.
  potential_cols <- mapply(paste, data[, c("T1", "T2", "T3"), drop = FALSE],
                           sep = "_", SIMPLIFY = FALSE)
  potential_cols <- do.call(paste, c("Y", potential_cols, sep = "_"))
  upoc <- unique(potential_cols)
  df <- data[, upoc, drop = FALSE]
  R <- seq_len(nrow(df))
  C <- match(potential_cols, colnames(df))
  data[, "Y"] <- df[cbind(R, C)]
  data
})
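The cbind(R, C) lookup inside the handler is base R's vectorized matrix indexing. A minimal standalone example of the same trick:

```r
# Pick, for each row, the value from a row-specific column.
df <- data.frame(Y_0 = c(10, 20), Y_1 = c(30, 40))
cols <- c("Y_1", "Y_0")  # which column to read for each row
df[cbind(seq_len(nrow(df)), match(cols, colnames(df)))]  # 30 20
```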
estimand <- declare_estimand(
  Overall_average = mean(0.125 * Y_0_0_0 + 0.125 * Y_0_0_1 + 0.125 * Y_0_1_0 +
    0.125 * Y_0_1_1 + 0.125 * Y_1_0_0 + 0.125 * Y_1_0_1 + 0.125 * Y_1_1_0 +
    0.125 * Y_1_1_1),
  te_T3 = mean(-0.25 * Y_0_0_0 + 0.25 * Y_0_0_1 - 0.25 * Y_0_1_0 +
    0.25 * Y_0_1_1 - 0.25 * Y_1_0_0 + 0.25 * Y_1_0_1 - 0.25 * Y_1_1_0 +
    0.25 * Y_1_1_1),
  te_T2 = mean(-0.25 * Y_0_0_0 - 0.25 * Y_0_0_1 + 0.25 * Y_0_1_0 +
    0.25 * Y_0_1_1 - 0.25 * Y_1_0_0 - 0.25 * Y_1_0_1 + 0.25 * Y_1_1_0 +
    0.25 * Y_1_1_1),
  `te_T2:T3` = mean(0.5 * Y_0_0_0 - 0.5 * Y_0_0_1 - 0.5 * Y_0_1_0 +
    0.5 * Y_0_1_1 + 0.5 * Y_1_0_0 - 0.5 * Y_1_0_1 - 0.5 * Y_1_1_0 +
    0.5 * Y_1_1_1),
  te_T1 = mean(-0.25 * Y_0_0_0 - 0.25 * Y_0_0_1 - 0.25 * Y_0_1_0 -
    0.25 * Y_0_1_1 + 0.25 * Y_1_0_0 + 0.25 * Y_1_0_1 + 0.25 * Y_1_1_0 +
    0.25 * Y_1_1_1),
  `te_T1:T3` = mean(0.5 * Y_0_0_0 - 0.5 * Y_0_0_1 + 0.5 * Y_0_1_0 -
    0.5 * Y_0_1_1 - 0.5 * Y_1_0_0 + 0.5 * Y_1_0_1 - 0.5 * Y_1_1_0 +
    0.5 * Y_1_1_1),
  `te_T1:T2` = mean(0.5 * Y_0_0_0 + 0.5 * Y_0_0_1 - 0.5 * Y_0_1_0 -
    0.5 * Y_0_1_1 - 0.5 * Y_1_0_0 - 0.5 * Y_1_0_1 + 0.5 * Y_1_1_0 +
    0.5 * Y_1_1_1),
  `te_T1:T2:T3` = mean(-1 * Y_0_0_0 + 1 * Y_0_0_1 + 1 * Y_0_1_0 -
    1 * Y_0_1_1 + 1 * Y_1_0_0 - 1 * Y_1_0_1 - 1 * Y_1_1_0 + 1 * Y_1_1_1),
  label = "ATE")
assignment_factors <- declare_assignment(conditions = 1:(2^3),
                                         prob_each = rep(0.125, 8))
assignment <- declare_step(fabricate,
  T1 = as.numeric(Z %in% c(2L, 4L, 6L, 8L)),
  T2 = as.numeric(Z %in% c(3L, 4L, 7L, 8L)),
  T3 = as.numeric(Z %in% 5:8))
estimator <- declare_estimator(handler = label_estimator(function(data) {
  # Demean the treatment indicators so the intercept is the grand mean,
  # and weight by the inverse probability of assignment.
  data[, names(data) %in% c("T1", "T2", "T3")] <-
    data[, names(data) %in% c("T1", "T2", "T3")] - 0.5
  mod <- lm_robust(formula = Y ~ T1 * T2 * T3, data = data,
                   weights = 1 / Z_cond_prob)
  estimate_df <- tidy(mod)
  estimate_df$estimand_label <- paste0("te_", estimate_df$term)
  estimate_df$estimand_label[estimate_df$estimand_label ==
                               "te_(Intercept)"] <- "Overall_average"
  estimate_df
}))
factorial_design <- population + potential_outcomes + assignment_factors +
assignment + reveal_Y + estimand + estimator

# Takeaways

diagnosis <- diagnose_design(factorial_design, sims = 25)
## Warning: We recommend you choose a higher number of simulations than 25 for the
## top level of simulation.
| Design Label | Estimand Label | Estimator Label | Term | N Sims | Bias | RMSE | Power | Coverage | Mean Estimate | SD Estimate | Mean Se | Type S Rate | Mean Estimand |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| factorial_design | Overall_average | estimator | (Intercept) | 25 | 0.00 (0.01) | 0.06 (0.01) | 0.08 (0.06) | 0.92 (0.06) | 0.00 (0.01) | 0.06 (0.01) | 0.06 (0.00) | 0.00 | 0.00 (0.00) |
| factorial_design | te_T1 | estimator | T1 | 25 | 0.03 (0.02) | 0.10 (0.01) | 0.00 (0.00) | 1.00 (0.00) | 0.02 (0.02) | 0.09 (0.01) | 0.13 (0.00) | NaN | -0.01 (0.01) |
| factorial_design | te_T1:T2 | estimator | T1:T2 | 25 | 0.01 (0.05) | 0.23 (0.02) | 0.00 (0.00) | 1.00 (0.00) | 0.02 (0.05) | 0.25 (0.02) | 0.25 (0.00) | NaN | 0.01 (0.01) |
| factorial_design | te_T1:T2:T3 | estimator | T1:T2:T3 | 25 | 0.18 (0.07) | 0.43 (0.06) | 0.04 (0.04) | 1.00 (0.00) | 0.15 (0.08) | 0.41 (0.08) | 0.50 (0.00) | 0.00 | -0.03 (0.03) |
| factorial_design | te_T1:T3 | estimator | T1:T3 | 25 | -0.05 (0.05) | 0.26 (0.03) | 0.08 (0.05) | 0.96 (0.04) | -0.03 (0.05) | 0.26 (0.04) | 0.25 (0.00) | 0.50 | 0.01 (0.02) |
| factorial_design | te_T2 | estimator | T2 | 25 | -0.01 (0.02) | 0.10 (0.01) | 0.00 (0.00) | 1.00 (0.00) | -0.02 (0.02) | 0.10 (0.01) | 0.13 (0.00) | NaN | -0.01 (0.01) |
| factorial_design | te_T2:T3 | estimator | T2:T3 | 25 | -0.03 (0.06) | 0.27 (0.03) | 0.08 (0.06) | 0.92 (0.06) | -0.03 (0.07) | 0.32 (0.03) | 0.25 (0.00) | 0.50 | -0.01 (0.02) |
| factorial_design | te_T3 | estimator | T3 | 25 | 0.01 (0.02) | 0.11 (0.02) | 0.04 (0.04) | 0.96 (0.04) | 0.02 (0.02) | 0.12 (0.02) | 0.13 (0.00) | 0.00 | 0.01 (0.01) |

Standard errors of the diagnosands are shown in parentheses.
• We see that the space of estimands we can target is remarkably rich, but with eight conditions and only 32 units per cell, our power to estimate them, especially the higher-order interactions, is very low.