fabricate helps you simulate a dataset before you collect it. You can either start with your own data and add simulated variables to it (by passing data to fabricate()) or start from scratch by defining N. Create hierarchical data with multiple levels, such as citizens within cities within states, using add_level(), or modify existing hierarchical data using modify_level(). You can use any R function to create each variable. Use cross_levels() and link_levels() to make more complex designs such as panel or cross-classified data.
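As a minimal sketch of the point that any R function can define a variable, the block below builds a single-level dataset in which later variables depend on earlier ones. The data frame name and all column names (voters, age, age_group, turnout_prob, voted) are hypothetical and chosen only for illustration; it assumes the fabricatr package is installed.

```r
library(fabricatr)

set.seed(42)

# Hypothetical single-level data: each variable is defined with an ordinary
# R function, and later variables may refer to earlier ones by name.
voters <- fabricate(
  N = 50,
  age = sample(18:90, N, replace = TRUE),
  # Base R cut() bins ages into three groups
  age_group = cut(age, breaks = c(17, 34, 64, 90),
                  labels = c("young", "middle", "older")),
  # Base R plogis() maps a linear predictor to a probability
  turnout_prob = plogis(-1 + 0.03 * age),
  voted = rbinom(N, 1, turnout_prob)
)

head(voters)
```

Variables are evaluated in the order they are written, which is why turnout_prob can use age and voted can use turnout_prob.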
    fabricate(..., data = NULL, N = NULL, ID_label = NULL)

    add_level(N = NULL, ..., nest = TRUE)

    modify_level(..., by = NULL)

    nest_level(N = NULL, ...)
Argument | Description |
---|---|
... | Variable or level-generating arguments. |
data | (optional) user-provided data that forms the basis of the fabrication, e.g. you can add variables to existing data. Provide either N or data. |
N | (optional) number of units to draw. |
ID_label | (optional) variable name for ID variable, e.g. citizen_ID. Set to NA to suppress the creation of an ID variable. |
nest | (Default TRUE) Boolean determining whether data in an add_level() call is nested under the current working data frame. |
by | (optional) quoted name of the variable that modify_level() uses to split-modify-combine the data. |
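The ID_label behavior described in the table can be sketched as follows. This is an illustrative example rather than part of the package documentation: the column name income and the object names with_id and no_id are hypothetical, and it assumes fabricatr is installed.

```r
library(fabricatr)

set.seed(1)

# Name the ID column explicitly (hypothetical three-unit dataset)
with_id <- fabricate(N = 3, income = rnorm(N), ID_label = "citizen_ID")
names(with_id)

# Suppress the ID column entirely by setting ID_label to NA
no_id <- fabricate(N = 3, income = rnorm(N), ID_label = NA)
names(no_id)
```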
fabricate() returns a data.frame.
We also provide several built-in options to easily create variables, including draw_binary, draw_count, draw_likert, and the intra-cluster correlated variables draw_binary_icc and draw_normal_icc.
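A short sketch of some of these helpers in use is shown below. It covers draw_binary, draw_count, draw_binary_icc and draw_normal_icc only; draw_likert is left out. The dataset and its column names (classroom, passed, absences, tutored, score) are hypothetical, and the specific probabilities, means and ICC values are arbitrary choices for illustration.

```r
library(fabricatr)

set.seed(123)

# Hypothetical data: 200 students spread evenly over 10 classrooms
students <- fabricate(
  N = 200,
  classroom = rep(1:10, each = 20),
  # Binary outcome with a 60% success probability
  passed = draw_binary(prob = 0.6, N = N),
  # Count outcome with mean 2
  absences = draw_count(mean = 2, N = N),
  # Binary and normal variables that are correlated within classrooms
  tutored = draw_binary_icc(prob = 0.5, clusters = classroom, ICC = 0.3),
  score = draw_normal_icc(clusters = classroom, ICC = 0.3)
)

head(students)
```

The ICC variants take a clusters argument identifying each unit's group, so units in the same classroom receive correlated draws.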
    # Draw a single-level dataset with a covariate
    building_df <- fabricate(
      N = 100,
      height_ft = runif(N, 300, 800)
    )
    head(building_df)
    #>    ID height_ft
    #> 1 001  662.2729
    #> 2 002  734.0104
    #> 3 003  428.4516
    #> 4 004  504.6861
    #> 5 005  519.3739
    #> 6 006  432.6478

    # Start with existing data instead
    building_modified <- fabricate(
      data = building_df,
      rent = rnorm(N, mean = height_ft * 100, sd = height_ft * 30)
    )

    # Draw a two-level hierarchical dataset
    # containing cities within regions
    multi_level_df <- fabricate(
      regions = add_level(N = 5),
      cities = add_level(N = 2, pollution = rnorm(N, mean = 5)))
    head(multi_level_df)
    #>   regions cities pollution
    #> 1       1     01  5.422167
    #> 2       1     02  4.336884
    #> 3       2     03  4.337235
    #> 4       2     04  5.113292
    #> 5       3     05  5.942951
    #> 6       3     06  3.784127

    # Start with existing data and add a nested level:
    company_df <- fabricate(
      data = building_df,
      company_id = add_level(N = 10, is_headquarters = sample(c(0, 1), N, replace = TRUE))
    )

    # Start with existing data and add variables to hierarchical data
    # at levels which are already present in the existing data.
    # Note: do not provide N when adding variables to an existing level
    fabricate(
      data = multi_level_df,
      regions = modify_level(watershed = sample(c(0, 1), N, replace = TRUE)),
      cities = modify_level(runoff = rnorm(N))
    )
    #>    regions cities pollution watershed     runoff
    #> 1        1     01  5.422167         0  0.3404986
    #> 2        1     02  4.336884         1  3.0972307
    #> 3        2     03  4.337235         1  0.6773527
    #> 4        2     04  5.113292         1  0.3859950
    #> 5        3     05  5.942951         0 -0.8388426
    #> 6        3     06  3.784127         1 -0.5300276
    #> 7        4     07  4.952737         1 -2.2931821
    #> 8        4     08  4.807955         1  1.8552636
    #> 9        5     09  5.632149         0 -0.6247150
    #> 10       5     10  4.848095         1  0.1758726

    # fabricatr can add variables that are higher-level summaries of lower-level
    # variables via a split-modify-combine logic and the by argument
    multi_level_df <- fabricate(
      regions = add_level(N = 5, elevation = rnorm(N)),
      cities = add_level(N = 2, pollution = rnorm(N, mean = 5)),
      cities = modify_level(by = "regions", regional_pollution = mean(pollution))
    )

    # fabricatr can also make panel or cross-classified data. For more
    # information about syntax for this functionality please read our vignette
    # or check documentation for link_levels:
    cross_classified <- fabricate(
      primary_schools = add_level(N = 50, ps_quality = runif(N, 0, 10)),
      secondary_schools = add_level(N = 100, ss_quality = runif(N, 0, 10), nest = FALSE),
      students = link_levels(N = 2000,
                             by = join(ps_quality, ss_quality, rho = 0.5),
                             student_quality = ps_quality + 3 * ss_quality + rnorm(N)))
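To make the split-modify-combine behavior of the by argument more concrete, the sketch below compares a regional_pollution column created with modify_level(by = "regions", ...) against the same group means computed by hand with base R's ave(). The object names summary_df and manual_means and the level sizes are hypothetical, and the comparison assumes rows stay ordered by region, as they are in this nested design.

```r
library(fabricatr)

set.seed(7)

# Three regions with four cities each; regional_pollution is the mean of
# pollution within each region, computed via the by argument.
summary_df <- fabricate(
  regions = add_level(N = 3),
  cities = add_level(N = 4, pollution = rnorm(N, mean = 5)),
  cities = modify_level(by = "regions", regional_pollution = mean(pollution))
)

# The same group means computed by hand: ave() returns, for every row,
# the mean of pollution within that row's region.
manual_means <- ave(summary_df$pollution, summary_df$regions)

all.equal(summary_df$regional_pollution, manual_means)
```

Every city in a region carries the same regional_pollution value, which is what makes it a higher-level summary attached to lower-level rows.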