fabricate helps you simulate a dataset before you collect it. You can either start with your own data and add simulated variables to it (by passing data to fabricate()) or start from scratch by defining N. Create hierarchical data with multiple levels, such as citizens within cities within states, using add_level(), or modify existing hierarchical data using modify_level(). You can use any R function to create each variable. Use cross_levels() and link_levels() to make more complex designs such as panel or cross-classified data.
Usage
fabricate(..., data = NULL, N = NULL, ID_label = NULL)
add_level(N = NULL, ..., nest = TRUE)
modify_level(..., by = NULL)
nest_level(N = NULL, ...)
Arguments
- ...
Variable or level-generating arguments, such as my_var = rnorm(N). For fabricate, you may also pass add_level() or modify_level() arguments, which define a level of a multi-level dataset. See examples.
- data
(optional) user-provided data that forms the basis of the fabrication, e.g. you can add variables to existing data. Provide either N or data (N is the number of rows of the data if data is provided). If data and N are not provided, fabricatr will try to interpret the first un-named argument as either data or N based on type.
- N
(optional) number of units to draw. If provided as fabricate(N = 5), this determines the number of units in the single-level data. If provided in add_level, e.g. fabricate(cities = add_level(N = 5)), N determines the number of units in a specific level of a hierarchical dataset.
- ID_label
(optional) variable name for ID variable, e.g. citizen_ID. Set to NA to suppress the creation of an ID variable.
- nest
(Default TRUE) Boolean determining whether data in an add_level() call will be nested under the current working data frame or create a separate hierarchy of levels. See our vignette for cross-classified, non-nested data for details.
- by
(optional) quoted name of the variable that modify_level uses to split-modify-combine data by.
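As an illustration of the ID_label argument described above, the following is a minimal sketch. The variable name age and the object names with_label and without_id are made up for this example; the behavior shown is the one stated in the argument description.

```r
library(fabricatr)

# Name the ID variable explicitly instead of using the default
with_label <- fabricate(
  N = 3,
  ID_label = "citizen_ID",
  age = round(runif(N, 18, 80))
)
names(with_label)

# Set ID_label to NA to suppress the ID variable entirely
without_id <- fabricate(
  N = 3,
  ID_label = NA,
  age = round(runif(N, 18, 80))
)
names(without_id)
```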
Details
We also provide several built-in options to easily create variables, including draw_binary, draw_count, draw_likert, and intra-cluster correlated variables draw_binary_icc and draw_normal_icc.
# Draw a single-level dataset with a covariate
building_df <- fabricate(
  N = 100,
  height_ft = runif(N, 300, 800)
)
head(building_df)
#>    ID height_ft
#> 1 001  640.3523
#> 2 002  776.1245
#> 3 003  392.9777
#> 4 004  440.5836
#> 5 005  550.4447
#> 6 006  318.7363
# Start with existing data instead
building_modified <- fabricate(
  data = building_df,
  rent = rnorm(N, mean = height_ft * 100, sd = height_ft * 30)
)
# Draw a two-level hierarchical dataset
# containing cities within regions
multi_level_df <- fabricate(
  regions = add_level(N = 5),
  cities = add_level(N = 2, pollution = rnorm(N, mean = 5))
)
head(multi_level_df)
#>   regions cities pollution
#> 1       1     01  4.862222
#> 2       1     02  4.402063
#> 3       2     03  4.277927
#> 4       2     04  5.072762
#> 5       3     05  6.473363
#> 6       3     06  4.683794
# Start with existing data and add a nested level:
company_df <- fabricate(
  data = building_df,
  company_id = add_level(N = 10, is_headquarters = sample(c(0, 1), N, replace = TRUE))
)
# Start with existing data and add variables to hierarchical data
# at levels which are already present in the existing data.
# Note: do not provide N when adding variables to an existing level
fabricate(
  data = multi_level_df,
  regions = modify_level(watershed = sample(c(0, 1), N, replace = TRUE)),
  cities = modify_level(runoff = rnorm(N))
)
#>    regions cities pollution watershed     runoff
#> 1        1     01  4.862222         1 -0.4299057
#> 2        1     02  4.402063         0 -0.6519163
#> 3        2     03  4.277927         1 -1.3227553
#> 4        2     04  5.072762         0 -0.6110024
#> 5        3     05  6.473363         1 -0.4334848
#> 6        3     06  4.683794         0  0.2211316
#> 7        4     07  3.585279         0  0.4146353
#> 8        4     08  3.013821         0  1.0035196
#> 9        5     09  5.705734         1 -2.6260422
#> 10       5     10  4.209572         0 -0.8735349
# fabricatr can add variables that are higher-level summaries of lower-level
# variables via a split-modify-combine logic and the by argument
multi_level_df <- fabricate(
  regions = add_level(N = 5, elevation = rnorm(N)),
  cities = add_level(N = 2, pollution = rnorm(N, mean = 5)),
  cities = modify_level(by = "regions", regional_pollution = mean(pollution))
)
# fabricatr can also make panel or cross-classified data. For more
# information about syntax for this functionality please read our vignette
# or check documentation for link_levels:
cross_classified <- fabricate(
  primary_schools = add_level(N = 50, ps_quality = runif(N, 0, 10)),
  secondary_schools = add_level(N = 100, ss_quality = runif(N, 0, 10), nest = FALSE),
  students = link_levels(
    N = 2000,
    by = join_using(ps_quality, ss_quality, rho = 0.5),
    student_quality = ps_quality + 3 * ss_quality + rnorm(N)
  )
)
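The example above covers cross-classified data with link_levels. For the panel case, a sketch along the following lines may help; it assumes cross_levels() accepts a by = join_using(...) specification naming the two levels to cross, and the level and variable names (countries, years, country_fe, year_shock, gdp) are invented for illustration.

```r
library(fabricatr)

# Sketch of a small country-year panel: every country is paired with every year
panel_df <- fabricate(
  countries = add_level(N = 5, country_fe = rnorm(N)),
  years = add_level(N = 4, year_shock = rnorm(N), nest = FALSE),
  obs = cross_levels(
    by = join_using(countries, years),
    gdp = country_fe + year_shock + rnorm(N)
  )
)
head(panel_df)
```

With 5 countries and 4 years, the crossed level should contain one row per country-year pair.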