Getting started with fabricatr

fabricatr is designed to help you solve two key problems:

1. Creating common variable types

fabricatr allows you to quickly create variables that mimic those you plan to collect during the course of observational or experimental work. The current version supports common variable types including assignment to treatment, count data, ordinal data (including “Likert scale” data, popular in surveys and survey experiments), categorical data (popular for modeling demographic characteristics). In addition, we support the creation of data with fixed intra-cluster correlations, so individual observations can be modelled as being part of groups or regions.

Imagine a survey experiment of voters from across social groups. With fabricatr, we can model voters as part of social groups, each of whom has characteristics like ideology and income, opinions about political issues. We can assign these voters to a treatment encouraging them to vote for a proposition, and model the results of the experiment:

library(fabricatr)

voters <- fabricate(
  N = 1000,
  group_id = rep(1:10, 100),
  ideology = draw_normal_icc(mean = 0, N = N, clusters = group_id, ICC = 0.7),
  ideological_label = draw_ordered(
    x = ideology,
    break_labels = c(
      "Very Conservative", "Conservative",
      "Liberal", "Very Liberal"
    )
  ),
  income = exp(rlnorm(n = N, meanlog = 2.4 - (ideology * 0.1), sdlog = 0.12)),
  Q1_immigration = draw_likert(x = ideology, min = -5, max = 5, bins = 7),
  Q2_defence = draw_likert(x = ideology + 0.5, min = -5, max = 5, bins = 7),
  treatment = draw_binary(0.5, N = N),
  proposition_vote = draw_binary(latent = ideology + 1.2 * treatment, link = "probit")
)

Let’s look at a small fraction of the data generated this way:

group_id	ideology	ideological_label	Q1_immigration	Q2_defence	treatment	proposition_vote
3	-0.22	Conservative	4	4	0	0
3	-0.39	Conservative	4	4	0	0
4	-1.19	Very Conservative	3	4	0	1
6	-0.77	Conservative	3	4	1	1
1	0.68	Liberal	4	5	1	1

Modeling data like this allows common analyses in advance of conducting your experiment. This data can also be included in a pre-analysis plan to add clarity to your experimental design and contribute to improving transparency and replicability. fabricatr also allows you to import existing data and modify it easily, or to resample existing data into new, simulated populations.

If you’d like to read more about using fabricatr to model the variables you plan to collect in your experiment, see our guide on common social science variables, our technical manual on generating variables with fabricatr, or our tutorials on resampling data or integrating other data-generating packages into a fabricatr workflow..

2. Structuring your data

fabricatr also allows you to structure your data in the shape your real experimental data will be. Although many experimental data are individual observations, like the example above, other popular data structures include panel data, multi-level (hierarchical or “nested”) data and cross-classified data. fabricatr supports both of these cases.

One common example in the social sciences is panel data, which is easy to create with fabricatr:

library(fabricatr)

panel <- fabricate(
  countries = add_level(N = 150, country_fe = runif(N, 1, 10)),
  years = add_level(N = 25, year_shock = runif(N, 1, 10), nest = FALSE),
  observations = cross_levels(
    by = join_using(countries, years),
    outcome_it = country_fe + year_shock + rnorm(N, 0, 2)
  )
)

If you’d like to read more about using fabricatr to structure your data, see our guides on building and importing datasets in fabricatr, generating cross-classified and panel data, or resampling data with fabricatr.

DeclareDesign

fabricatr is one of four packages that make up the DeclareDesign software suite. Along with fabricatr, which helps you imagine your data before you collect it, we offer estimatr, fast estimators for social scientists, randomizr, an easy to use tools for common forms of random assignment and sampling, and DeclareDesign, a package for declaring and diagnosing research designs to understand and improve them.

In addition to our documentation in R and online, we are happy to respond to any questions you have about using our packages, or incorporate your requests for new features. You can contact us via the DeclareDesign help board.

Aaron Rudkin

1. Creating common variable types

2. Structuring your data

DeclareDesign