R/variable_creation_functions.R
draw_discrete.Rd
Drawing discrete data based on probabilities or latent traits is a common
task that can be cumbersome. Each function in our discrete drawing set creates
a different type of discrete data: draw_binary creates binary 0/1 data,
draw_binomial creates binomial data (repeated trial binary data),
draw_categorical creates categorical data, draw_ordered transforms latent data
into observed ordered categories, and draw_count creates count data
(Poisson-distributed). draw_likert is an alias to draw_ordered that
pre-specifies break labels and offers default breaks appropriate for a Likert
survey question.
draw_binomial(
  prob = link(latent),
  trials = 1,
  N = length(prob),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_categorical(
  prob = link(latent),
  N = NULL,
  latent = NULL,
  link = "identity",
  category_labels = NULL
)

draw_ordered(
  x = link(latent),
  breaks = c(-1, 0, 1),
  break_labels = NULL,
  N = length(x),
  latent = NULL,
  strict = FALSE,
  link = "identity"
)

draw_count(
  mean = link(latent),
  N = length(mean),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_binary(
  prob = link(latent),
  N = length(prob),
  link = "identity",
  latent = NULL,
  quantile_y = NULL
)

draw_likert(
  x,
  type = 7,
  breaks = NULL,
  N = length(x),
  latent = NULL,
  link = "identity",
  strict = !is.null(breaks)
)

draw_quantile(type, N)
prob  A number or vector of numbers representing the probability for binary or binomial outcomes; or a number, vector, or matrix of numbers representing probabilities for categorical outcomes. If you supply a link function, these underlying probabilities will be transformed. 

trials  for draw_binomial, the number of trials for each observation.
N  number of units to draw. Defaults to the length of the vector of probabilities or latent data you provided. 
latent  If the user provides a link argument other than identity, they should provide the variable latent rather than prob or mean.
link  link function between the latent variable and the probability of a positive outcome, e.g. "logit", "probit", or "identity". For the "identity" link, the latent variable must be a probability. 
quantile_y  A vector of quantiles; if provided, rather than drawing stochastically from the distribution of interest, data will be drawn at exactly those quantiles. 
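category_labels  vector of labels for the categories produced by draw_categorical.
x  for draw_ordered or draw_likert, the latent data for each observation.
breaks  vector of breaks to cut a latent outcome into ordered categories with draw_ordered or draw_likert.
break_labels  vector of labels for the breaks to cut a latent outcome into ordered categories with draw_ordered.
strict  Logical indicating whether values outside the provided breaks should be coded as NA. Defaults to FALSE for draw_ordered and to !is.null(breaks) for draw_likert.
mean  for draw_count, the mean number of count units for each observation.
type  Type of Likert scale data for draw_likert.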
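The relationship between latent, link, and prob described in the arguments above can be illustrated with a small base-R sketch. This is an approximation for intuition only, not fabricatr's implementation; the helper name apply_link is hypothetical, and the mapping of "logit" to plogis and "probit" to pnorm is the standard inverse-link convention, assumed here rather than taken from the package source.

```r
# Base-R sketch (not fabricatr's implementation): map a latent variable to a
# probability with an inverse link, then draw binary outcomes from it.
set.seed(42)
latent <- c(-2, 0, 2)

# Hypothetical helper: converts latent values to the probability scale.
apply_link <- function(latent, link = c("identity", "logit", "probit")) {
  link <- match.arg(link)
  switch(link,
    identity = latent,          # latent must already be a probability
    logit    = plogis(latent),  # inverse logit
    probit   = pnorm(latent)    # standard normal CDF
  )
}

prob_probit <- apply_link(latent, "probit")
prob_logit  <- apply_link(latent, "logit")

# One Bernoulli draw per unit, analogous to a binary 0/1 outcome.
binary <- rbinom(n = length(prob_probit), size = 1, prob = prob_probit)
```

Under either link, a latent value of 0 maps to a probability of 0.5, and larger latent values map to larger probabilities.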
A vector of data in accordance with the specification; generally
numeric but for some functions, including draw_ordered, may be factor if
break labels are provided.
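To make the numeric-versus-factor distinction concrete, the sketch below cuts latent values into ordered categories with base R. It is an approximation of the idea, not the package's own code; the labels "Low", "Medium", and "High" and the sample values in x are made up for illustration.

```r
# Base-R approximation of cutting latent data into ordered categories;
# not fabricatr's own implementation.
x <- c(-2.5, -0.3, 0.4, 1.7)
breaks <- c(-Inf, -1, 1, Inf)

# Without labels: integer category codes (numeric result).
codes <- findInterval(x, breaks)

# With labels: an ordered factor (labels are hypothetical).
labelled <- cut(x, breaks = breaks,
                labels = c("Low", "Medium", "High"),
                ordered_result = TRUE)

# Strict-style handling: with only finite breaks, values outside them are NA.
strict_codes <- cut(x, breaks = c(-1, 0, 1), labels = FALSE)
```

Here codes is an integer vector, labelled is a factor, and strict_codes contains NA for the first and last values because they fall outside the range from -1 to 1.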
For variables with intracluster correlations, see draw_binary_icc and
draw_normal_icc.
# Drawing binary values (success or failure, treatment assignment)
fabricate(N = 3,
          p = c(0, .5, 1),
          binary = draw_binary(prob = p))
#>   ID   p binary
#> 1  1 0.0      0
#> 2  2 0.5      0
#> 3  3 1.0      1

# Drawing binary values with probit link (transforming continuous data
# into a probability range).
fabricate(N = 3,
          x = 10 * rnorm(N),
          binary = draw_binary(latent = x, link = "probit"))
#>   ID          x binary
#> 1  1 12.1515166      1
#> 2  2  0.4221889      0
#> 3  3  3.9740704      1

# Repeated trials: `draw_binomial`
fabricate(N = 3,
          p = c(0, .5, 1),
          binomial = draw_binomial(prob = p, trials = 10))
#>   ID   p binomial
#> 1  1 0.0        0
#> 2  2 0.5        6
#> 3  3 1.0       10

# Ordered data: transforming latent data into observed, ordinal data.
# Useful for survey responses.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x, breaks = c(-Inf, -1, 1, Inf)))
#>   ID          x ordered
#> 1  1  0.7245082       2
#> 2  2 -1.6222348       1
#> 3  3  0.8628245       2

# Providing break labels for latent data.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x,
                                 breaks = c(-Inf, -1, 1, Inf),
                                 break_labels = c("Not at all concerned",
                                                  "Somewhat concerned",
                                                  "Very concerned")))
#>   ID          x              ordered
#> 1  1 -6.1803146 Not at all concerned
#> 2  2 -9.5115210 Not at all concerned
#> 3  3  0.4725201   Somewhat concerned

# Likert data: often used for survey data
fabricate(N = 10,
          support_free_college = draw_likert(x = rnorm(N), type = 5))
#>    ID support_free_college
#> 1  01 Don't Know / Neutral
#> 2  02 Don't Know / Neutral
#> 3  03                Agree
#> 4  04 Don't Know / Neutral
#> 5  05                Agree
#> 6  06 Don't Know / Neutral
#> 7  07             Disagree
#> 8  08             Disagree
#> 9  09             Disagree
#> 10 10 Don't Know / Neutral

# Count data: useful for rates of occurrences over time.
fabricate(N = 5,
          x = c(0, 5, 25, 50, 100),
          theft_rate = draw_count(mean = x))
#>   ID   x theft_rate
#> 1  1   0          0
#> 2  2   5          5
#> 3  3  25         26
#> 4  4  50         51
#> 5  5 100         88

# Categorical data: useful for demographic data.
fabricate(N = 6,
          p1 = runif(N),
          p2 = runif(N),
          p3 = runif(N),
          cat = draw_categorical(cbind(p1, p2, p3)))
#>   ID         p1         p2        p3 cat
#> 1  1 0.09022236 0.22092086 0.1625581   2
#> 2  2 0.23832468 0.06540764 0.5351122   2
#> 3  3 0.25913474 0.64489713 0.5850109   1
#> 4  4 0.02566954 0.31255200 0.3186254   2
#> 5  5 0.17555081 0.63886730 0.4680848   3
#> 6  6 0.61982165 0.97387534 0.6392598   1
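
None of the examples above exercise quantile_y, so the following base-R sketch illustrates the idea of drawing at fixed quantiles rather than stochastically. It assumes the behaviour can be approximated by the standard quantile functions qbinom and qpois; that is an assumption for illustration, not a statement about fabricatr's internals, and the chosen quantiles, trial count, and mean are arbitrary.

```r
# Deterministic "draws" at fixed quantiles using base-R quantile functions.
# Illustrative approximation only; not fabricatr's implementation.
quantiles <- c(0.1, 0.5, 0.9)

# Binomial outcome with 10 trials and success probability 0.5.
binomial_at_q <- qbinom(quantiles, size = 10, prob = 0.5)  # 3 5 7

# Count outcome with mean 5.
count_at_q <- qpois(quantiles, lambda = 5)
```

Because no random numbers are involved, repeated calls return identical values, which is useful when a simulation needs reproducible outcomes at known points of the distribution.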