# fabricatr

Generates normally distributed data in which the inter-cluster correlation is 0 and the intra-cluster correlation equals ICC in expectation. The data generating process used in this function is described at the following URL: https://stats.stackexchange.com/questions/263451/create-synthetic-data-with-a-given-intraclass-correlation-coefficient-icc

draw_normal_icc(mean = 0, N = NULL, clusters, sd = NULL,
sd_between = NULL, ICC = NULL)

## Arguments

mean: A number or vector of numbers, one mean per cluster. If none is provided, will default to 0.

N: (Optional) A number indicating the number of observations to be generated. Must be equal to length(clusters) if provided.

clusters: A vector of factors or items that can be coerced to clusters; the length will determine the length of the generated data.

sd: A number or vector of numbers, indicating the standard deviation of each cluster's error terms, that is, the standard deviation within a cluster (default 1).

sd_between: A number or vector of numbers, indicating the standard deviation between clusters.

ICC: A number indicating the desired ICC.

## Value

A vector of numbers corresponding to the observations from the supplied cluster IDs.

## Details

The typical use for this function is for a user to provide an ICC and, optionally, a set of within-cluster standard deviations, sd. If the user does not provide sd, the default value is 1. These arguments imply a fixed between-cluster standard deviation.

An alternate mode for the function is to provide between-cluster standard deviations, sd_between, and an ICC. These arguments imply a fixed within-cluster standard deviation.

If users provide all three of ICC, sd_between, and sd, the function will warn the user and use the provided standard deviations for generating the data.
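The "implied" standard deviations mentioned above can be sketched in a few lines of base R. This is only an illustration: it assumes the usual variance-ratio definition of the ICC, ICC = sd_between^2 / (sd_between^2 + sd^2), and the helper names `implied_sd_between` and `implied_sd_within` are hypothetical and not part of fabricatr.

```r
# Assumption: ICC = sd_between^2 / (sd_between^2 + sd^2).
# Both helpers are hypothetical illustrations, not fabricatr functions.

# Given an ICC and a within-cluster SD, solve for the between-cluster SD.
implied_sd_between <- function(ICC, sd = 1) {
  sd * sqrt(ICC / (1 - ICC))
}

# Given an ICC and a between-cluster SD, solve for the within-cluster SD.
implied_sd_within <- function(ICC, sd_between) {
  sd_between * sqrt((1 - ICC) / ICC)
}

# Typical mode: ICC plus (default) within-cluster SD
implied_sd_between(ICC = 0.5, sd = 1)

# Alternate mode: ICC plus between-cluster SD
implied_sd_within(ICC = 0.2, sd_between = 4)
```

Under this definition, fixing the ICC and either standard deviation leaves no freedom in the other, which is why supplying all three arguments is redundant.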

## Examples


# Divide observations into clusters
clusters = rep(1:5, 10)

# Default: unit variance within each cluster
draw_normal_icc(clusters = clusters, ICC = 0.5)
#>  [1] -2.774221580  0.744750945 -1.256170958 -0.464284195  0.005476574
#>  [6] -1.369422702 -0.836103254 -0.123449524 -2.102324520  0.621329562
#> [11] -0.898702207 -0.317246896 -1.218305552 -0.871250814  1.557310484
#> [16] -0.696322293 -0.604463143 -1.979579020 -1.810849294  0.296515115
#> [21] -1.004455099  0.625096428 -3.047062288 -1.052697373  0.788810839
#> [26] -1.368391559 -1.480018037 -1.367819304 -1.488752796  0.350384221
#> [31] -0.398461203 -0.150745376  0.684173868 -0.621120925  0.985746945
#> [36] -0.855393624 -0.627176508 -1.390609732 -1.669810464  0.650505185
#> [41] -0.875855986 -1.114138467 -0.352778182 -0.649242153 -1.654094115
#> [46] -0.627612972 -1.378201814 -1.439114985 -0.931369804  2.069935332
# Alternatively, specify the mean and the within-cluster standard deviation:
draw_normal_icc(mean = 10, clusters = clusters, sd = 3, ICC = 0.3)
#>  [1]  6.5076580  4.9897328 10.1449755  7.8583626 14.4104899 10.9502646
#>  [7]  3.9472303 10.3020590  9.3788620 11.7987364  7.4582735  0.8442592
#> [13] 14.1324137  7.8228458 11.7481211 10.9525766  5.1749352 15.7697438
#> [19] 12.9372572 12.0742556 15.0646442 10.2735558 12.9598582 10.8189339
#> [25] 15.8945017  8.2490312 16.2499167 17.0550873  2.6046481 12.4859585
#> [31]  7.7983020 11.4686566 17.4882516  8.8115134 11.2750603 12.4293539
#> [37]  7.9646933 11.3487835  5.5431375  9.7058788  4.7493910  7.9675376
#> [43] 15.3378425 10.0832600 12.3363107  7.4247893 12.1190561  9.9523218
#> [49] 10.5290856  8.6540212
# Can specify between-cluster standard deviation instead:
draw_normal_icc(clusters = clusters, sd_between = 4, ICC = 0.2)
#>  [1] -17.7058653   0.5882161   5.6486827   7.9324515  -8.3428828   2.5919477
#>  [7]  -4.1270824   9.5123579   2.6435674  -2.6263506 -10.2739136  11.8095493
#> [13]   4.2633041  11.1662272  11.4890005  -7.0800055   1.7052403  -2.5329016
#> [19]  12.1615392  -2.4679993  -9.7255167   6.1329169  19.8382806   8.8485680
#> [25]  -5.5736394  -0.6081300 -10.1431205  17.4245857   7.4653253   1.2508721
#> [31] -13.9938790   1.7155350  11.1095098   7.7341433   0.5826849  -2.9654141
#> [37]   9.5051761   5.7550496  -4.3387526   6.1589327  -7.8782514  11.0888177
#> [43]  14.9726313  15.7865843   5.3907583 -11.7361528  11.0765795   0.5280482
#> [49]   1.5394966  -4.0823270
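# Rough check of the generated ICC: R-squared from a regression on cluster
# indicators is only a noisy proxy with 50 observations in 5 clusters
corr_draw = draw_normal_icc(clusters = clusters, ICC = 0.4)
summary(lm(corr_draw ~ as.factor(clusters)))$r.squared
#> [1] 0.2247414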
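
A single draw of 50 observations gives a noisy estimate, so a check on a larger sample may be more informative. The following is a minimal base-R sketch, not the package's own code: it assumes a random-intercept process (a shared normal cluster effect plus independent normal noise) and the variance-ratio definition of the ICC, then estimates the ICC with the one-way ANOVA estimator for balanced clusters. All settings and variable names here are hypothetical.

```r
set.seed(42)  # arbitrary seed, for reproducibility only

# Hypothetical settings for illustration
n_clusters <- 200
n_per_cluster <- 50
icc_target <- 0.4
sd_within <- 1
# Assumption: ICC = sd_between^2 / (sd_between^2 + sd_within^2)
sd_between <- sd_within * sqrt(icc_target / (1 - icc_target))

clusters <- rep(seq_len(n_clusters), each = n_per_cluster)

# Random-intercept model: one shared effect per cluster plus individual noise
cluster_effect <- rnorm(n_clusters, mean = 0, sd = sd_between)
y <- cluster_effect[clusters] +
  rnorm(length(clusters), mean = 0, sd = sd_within)

# One-way ANOVA estimator of the ICC (balanced clusters)
fit <- anova(lm(y ~ as.factor(clusters)))
ms_between <- fit[["Mean Sq"]][1]
ms_within <- fit[["Mean Sq"]][2]
icc_hat <- (ms_between - ms_within) /
  (ms_between + (n_per_cluster - 1) * ms_within)
icc_hat
```

With many clusters the estimate should land close to the target value, whereas with only 5 clusters the between-cluster variance is estimated from very few cluster means and can be far off in any single draw.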