# More complicated level creation with variable numbers of observations

`add_level()` can be used to create more complicated patterns of nesting. For example, when creating lower level data, it is possible to use a different `N` for each of the values of the higher level data:

```
library(fabricatr)

variable_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    # One value of N per city: 2 citizens in the first city, 4 in the second
    citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
  )
variable_data
```

| cities | elevation | citizens | age |
|---|---|---|---|
| 1 | 1778 | 1 | 46 |
| 1 | 1778 | 2 | 50 |
| 2 | 1499 | 3 | 35 |
| 2 | 1499 | 4 | 65 |
| 2 | 1499 | 5 | 34 |
| 2 | 1499 | 6 | 23 |

Here, each city has a different number of citizens, and the value of `N` used to create the `age` variable updates automatically as needed. The result is a dataset with 6 citizens, 2 in the first city and 4 in the second. As long as `N` is either a single number or a vector with the same length as the current lowest level of the data, `add_level()` will know what to do.
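As a quick check on the vector form of `N`, the following sketch rebuilds the example and tabulates the number of citizens per city. The seed value and the name `citizens_per_city` are illustrative choices, not part of the original example.

```r
library(fabricatr)

set.seed(42)  # arbitrary seed, used only for reproducibility

variable_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  # One entry of N per city: 2 citizens in city 1, 4 in city 2
  citizens = add_level(N = c(2, 4), age = runif(N, 18, 70))
)

# Count the rows (citizens) that belong to each city
citizens_per_city <- table(variable_data$cities)
citizens_per_city
```

The table should report one count per city, matching the vector passed to `N`.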

It is also possible to pass a function call to `N`, enabling a *random* number of citizens per city:

```
my_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    # sample() returns one randomly drawn group size per city
    citizens = add_level(N = sample(1:6, size = 2, replace = TRUE), age = runif(N, 18, 70))
  )
my_data
```

| cities | elevation | citizens | age |
|---|---|---|---|
| 1 | 1850 | 1 | 53 |
| 2 | 1128 | 2 | 55 |
| 2 | 1128 | 3 | 45 |
| 2 | 1128 | 4 | 42 |
| 2 | 1128 | 5 | 47 |
| 2 | 1128 | 6 | 69 |

Here, each city is given a random number of citizens between 1 and 6. Since `sample()` returns a vector of length 2, this is equivalent to specifying two separate values of `N`, as in the example above.
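When the sizes are drawn inside the `fabricate()` call, they are not stored anywhere. One possible approach, sketched here on the assumption that `fabricate()` can see objects defined in the calling environment, is to draw the sizes first and pass them in. The name `n_citizens` and the seed are hypothetical.

```r
library(fabricatr)

set.seed(123)  # arbitrary seed so the draw can be repeated

# Draw one group size per city up front (illustrative variable name)
n_citizens <- sample(1:6, size = 2, replace = TRUE)

my_data <- fabricate(
  cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
  citizens = add_level(N = n_citizens, age = runif(N, 18, 70))
)

# The total number of rows should equal the sum of the drawn sizes
nrow(my_data) == sum(n_citizens)
```

Keeping the drawn sizes in a separate object makes it easy to compare the realized data against the intended group sizes.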

It is also possible to define `N` on the basis of higher level variables themselves. Consider the following example:

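```
variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  # N is computed per city from the higher level variable `population`
  citizens = add_level(N = round(population * 0.3))
)
# Show only the first 6 rows
head(variable_n)
```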

| cities | population | citizens |
|---|---|---|
| 1 | 90 | 001 |
| 1 | 90 | 002 |
| 1 | 90 | 003 |
| 1 | 90 | 004 |
| 1 | 90 | 005 |
| 1 | 90 | 006 |

Here, each city has a defined population, and the number of citizens in our simulated data reflects a sample of 30% of that population. Although only the first 6 rows are displayed for brevity, the first city has 27 rows in total (30% of its population of 90).
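To confirm that the number of citizens follows the 30% rule in every city, the sketch below compares the observed row counts with `round(population * 0.3)`. The names `observed` and `expected` and the seed are illustrative assumptions.

```r
library(fabricatr)

set.seed(2024)  # arbitrary seed, used only for reproducibility

variable_n <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = round(population * 0.3))
)

# Observed number of citizen rows in each city
observed <- table(variable_n$cities)

# Expected number: 30% of each city's population, rounded.
# population is constant within a city, so the first value is enough.
expected <- tapply(
  variable_n$population,
  variable_n$cities,
  function(p) round(p[1] * 0.3)
)

all(observed == expected)
```

Both `table()` and `tapply()` order their results by the city ID, so the element-wise comparison lines up city by city.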

Finally, relying on the ID label from the higher level, it is also possible to define `N` on the basis of the higher level’s length:

```
n_inherit <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  # length(cities) is the number of cities, so one size is drawn per city
  citizens = add_level(N = sample(1:10, length(cities), replace = TRUE))
)
```

Here, each city has a random number of citizens from 1 to 10, but we need to supply the length of the higher level’s variable (in this case, the ID label `cities`) to the `sample()` function to ensure that one draw is made per city.
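A small sketch of what this produces: every one of the 5 cities should appear in the result, each with between 1 and 10 citizens. The seed and the name `citizens_per_city` are illustrative.

```r
library(fabricatr)

set.seed(7)  # arbitrary seed, used only for reproducibility

n_inherit <- fabricate(
  cities = add_level(N = 5, population = runif(N, 10, 200)),
  citizens = add_level(N = sample(1:10, length(cities), replace = TRUE))
)

# Number of citizen rows drawn for each city
citizens_per_city <- table(n_inherit$cities)

length(citizens_per_city)  # number of cities that received citizens
range(citizens_per_city)   # smallest and largest city sizes
```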

# Tidyverse integration

Because the functions in **fabricatr** take data and return data, they fit naturally into a `tidyverse` workflow. Here is an example of using **magrittr**’s pipe operator (`%>%`) and **dplyr**’s `group_by` and `mutate` verbs to add new data.

```
library(dplyr)

my_data <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
  ) %>%
  group_by(cities) %>%
  # n() counts the rows in each group, i.e. the citizens in each city
  mutate(pop = n())
my_data
```

| cities | elevation | citizens | age | pop |
|---|---|---|---|---|
| 1 | 1341 | 1 | 50 | 2 |
| 1 | 1341 | 2 | 69 | 2 |
| 2 | 1011 | 3 | 65 | 3 |
| 2 | 1011 | 4 | 47 | 3 |
| 2 | 1011 | 5 | 32 | 3 |
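The same pipeline can collapse the data to one row per city instead of adding a column to every citizen. The sketch below uses **dplyr**’s `summarise` for this. The object name `city_summary`, the column `mean_age`, and the seed are illustrative assumptions.

```r
library(fabricatr)
library(dplyr)

set.seed(99)  # arbitrary seed, used only for reproducibility

city_summary <-
  fabricate(
    cities = add_level(N = 2, elevation = runif(n = N, min = 1000, max = 2000)),
    citizens = add_level(N = c(2, 3), age = runif(N, 18, 70))
  ) %>%
  group_by(cities) %>%
  # One row per city: number of citizens and their average age
  summarise(pop = n(), mean_age = mean(age))

city_summary
```

Whether `mutate` or `summarise` is appropriate depends on whether later steps need citizen-level or city-level rows.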

It is also possible to use the pipe operator (`%>%`) to direct the flow of data between `fabricate()` calls. Remember that every `fabricate()` call can import existing data frames, and every call returns a single data frame.

```
my_data <-
  # Base R data.frame() is used here; tibble's data_frame() is deprecated
  data.frame(Y = sample(1:10, 2)) %>%
  fabricate(lower_level = add_level(N = 3, Y2 = Y + rnorm(N)))
my_data
```

| Y | lower_level | Y2 |
|---|---|---|
| 9 | 1 | 9.4 |
| 9 | 2 | 10.1 |
| 9 | 3 | 8.5 |
| 10 | 4 | 9.2 |
| 10 | 5 | 9.8 |
| 10 | 6 | 9.6 |
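For readers who prefer not to use the pipe, the imported data frame can also be passed through the `data` argument of `fabricate()`. This is a minimal sketch with fixed values of `Y` so the nesting is easy to inspect. The names `upper` and `nested` and the seed are hypothetical.

```r
library(fabricatr)

set.seed(1)  # arbitrary seed, used only for reproducibility

# An existing two-row data frame that acts as the higher level
upper <- data.frame(Y = c(3, 8))

# Nest 3 lower level rows inside each row of the imported data
nested <- fabricate(
  data = upper,
  lower_level = add_level(N = 3, Y2 = Y + rnorm(N))
)

# Each value of Y should now appear in 3 rows
table(nested$Y)
```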