More complicated level creation with variable numbers of observations
add_level() can be used to create more complicated patterns of nesting. For example, when creating lower-level data, it is possible to use a different N for each value of the higher-level data:
variable_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
)

variable_data
Here, each city has a different number of citizens, and the value of N used to create the age variable updates automatically as needed. The result is a dataset with 6 citizens: 2 in the first city and 4 in the second. As long as N is either a single number or a vector with one entry per unit of the current lowest level of the data, add_level() will know what to do.
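As a quick check of the behaviour described above, the following sketch rebuilds the same data and counts the rows per city. The set.seed() call and the name citizens_per_city are assumptions added for illustration; they are not part of the original example.

```r
library(fabricatr)

set.seed(1)  # assumed seed, only so the draws are reproducible

variable_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
)

# Count how many citizen rows belong to each city ID
citizens_per_city <- table(variable_data$cities)
citizens_per_city
```

The counts should match the vector passed to N, in city order, whatever the seed.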
It is also possible to pass a function call to N, enabling a random number of citizens per city:
my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = sample(1:6, size = 2, replace = TRUE), age = runif(N, 18, 70))
)

my_data
Here, each city is given a random number of citizens between 1 and 6. Since the sample() function returns a vector of length 2, this is like specifying 2 separate Ns as in the example above.
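To make the equivalence with an explicit vector concrete, one option is to draw the counts first and pass the stored vector to N. This is only a sketch: the name n_per_city and the seed are assumptions introduced here.

```r
library(fabricatr)

set.seed(42)  # assumed seed, only for reproducibility

# Draw one citizen count per city up front (hypothetical helper variable)
n_per_city <- sample(1:6, size = 2, replace = TRUE)

my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = n_per_city, age = runif(N, 18, 70))
)

# Total rows should equal the sum of the per-city counts
nrow(my_data) == sum(n_per_city)
```

Storing the draw separately also makes it easy to inspect how many citizens each city was assigned.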
It is also possible to define N on the basis of the higher-level variables themselves. Consider the following example:
variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)

# Show only the first 6 rows
head(variable_n)
Here, each city has a defined population, and the number of citizens in our simulated data reflects a 30% sample of that population. Although we display only the first 6 rows for brevity’s sake, the first city has 27 rows in total in this draw.
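The relationship between population and the number of rows can be checked directly. The sketch below assumes a seed and introduces the names city_pop, city_rows, and expected purely for illustration.

```r
library(fabricatr)

set.seed(7)  # assumed seed, only for reproducibility

variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)

# population is constant within a city, so unique() returns one value per city
city_pop <- tapply(variable_n$population, variable_n$cities, unique)

# Observed number of citizen rows per city
city_rows <- table(variable_n$cities)

# Rows each city should have under the 30% rule
expected <- round(city_pop * 0.3)

data.frame(population = city_pop,
           rows = as.integer(city_rows),
           expected = as.integer(expected))
```

Because population is drawn from 10 to 200, every city ends up with at least 3 citizens under this rule.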
Finally, by relying on the ID label of the higher level, it is also possible to define N on the basis of the higher level’s length:
n_inherit <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = sample(1:10, length(cities), replace = TRUE))
)
Here, each city has a random number of citizens from 1 to 10. We need to supply the length of one of the higher level’s variables (in this case, the ID label cities) to the sample() function to ensure that one draw is made per city.
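The sketch below contrasts one draw per city with a single shared draw, to show why the length matters. The seed and the names n_shared, rows_per_city, and shared_rows are assumptions added for this illustration.

```r
library(fabricatr)

set.seed(3)  # assumed seed, only for reproducibility

# One draw per city: length(cities) is 5 here
n_inherit <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = sample(1:10, length(cities), replace = TRUE))
)
rows_per_city <- as.integer(table(n_inherit$cities))

# A single draw: the one number is applied to every city
n_shared <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = sample(1:10, 1))
)
shared_rows <- as.integer(table(n_shared$cities))

rows_per_city  # counts drawn independently for each city
shared_rows    # the same count repeated for all 5 cities
```

With a single draw, N is a scalar, so every city receives the same number of citizens.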
Because the functions in fabricatr take data and return data, they are compatible with a tidyverse workflow. Here is an example that uses magrittr’s pipe operator (%>%) with dplyr’s group_by and mutate verbs to add new data.
library(dplyr)

my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
) %>%
  group_by(cities) %>%
  mutate(pop = n())

my_data
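The same pattern works with other dplyr verbs. As a sketch, the example below collapses the citizen-level data to one row per city with summarise(). The seed and the names city_summary, n_citizens, and mean_age are assumptions introduced here.

```r
library(fabricatr)
library(dplyr)

set.seed(5)  # assumed seed, only for reproducibility

city_summary <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
) %>%
  group_by(cities) %>%
  # One row per city: number of citizens and their mean age
  summarise(n_citizens = n(), mean_age = mean(age))

city_summary
```

Unlike mutate(), which keeps one row per citizen, summarise() returns one row per group.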
It is also possible to use the pipe operator (%>%) to direct the flow of data between fabricate() calls. Remember that every fabricate() call can import existing data frames, and every call returns a single data frame.
# tibble() replaces the deprecated data_frame()
my_data <- tibble(Y = sample(1:10, 2)) %>%
  fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))
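As a sketch of chaining two fabricate() calls, the first call below creates the top-level data and the second imports it and nests a new level beneath it. The seed and the name two_step are assumptions added for illustration.

```r
library(fabricatr)
library(magrittr)

set.seed(11)  # assumed seed, only for reproducibility

two_step <- fabricate(N = 2, Y = rnorm(N)) %>%
  # The piped data frame is imported; 3 lower-level rows are added per imported row
  fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))

nrow(two_step)
```

Each of the 2 imported rows receives 3 lower-level rows, so the result should have 6 rows.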