R/variable_creation_functions.R
draw_discrete.Rd
Drawing discrete data based on probabilities or latent traits is a common
task that can be cumbersome. Each function in our discrete drawing set creates
a different type of discrete data: draw_binary creates binary 0/1 data,
draw_binomial creates binomial data (repeated trial binary data),
draw_categorical creates categorical data, draw_ordered transforms latent data
into observed ordered categories, and draw_count creates count data
(Poisson-distributed). draw_likert is an alias to draw_ordered that
pre-specifies break labels and offers default breaks appropriate for a Likert
survey question.
draw_binomial(
  prob = link(latent),
  trials = 1,
  N = length(prob),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_categorical(
  prob = link(latent),
  N = NULL,
  latent = NULL,
  link = "identity",
  category_labels = NULL
)

draw_ordered(
  x = link(latent),
  breaks = c(-1, 0, 1),
  break_labels = NULL,
  N = length(x),
  latent = NULL,
  strict = FALSE,
  link = "identity"
)

draw_count(
  mean = link(latent),
  N = length(mean),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_binary(
  prob = link(latent),
  N = length(prob),
  link = "identity",
  latent = NULL,
  quantile_y = NULL
)

draw_likert(
  x,
  type = 7,
  breaks = NULL,
  N = length(x),
  latent = NULL,
  link = "identity",
  strict = !is.null(breaks)
)

draw_quantile(type, N)
Argument | Description |
---|---|
prob | A number or vector of numbers representing the probability for binary or binomial outcomes; or a number, vector, or matrix of numbers representing probabilities for categorical outcomes. If you supply a link function, these underlying probabilities will be transformed. |
trials | for draw_binomial, the number of trials for each observation (default 1). |
N | number of units to draw. Defaults to the length of the vector of probabilities or latent data you provided. |
latent | If the user provides a link argument other than identity, they should provide the variable latent. |
link | link function between the latent variable and the probability of a positive outcome, e.g. "logit", "probit", or "identity". For the "identity" link, the latent variable must be a probability. |
quantile_y | A vector of quantiles; if provided, rather than drawing stochastically from the distribution of interest, data will be drawn at exactly those quantiles. |
category_labels | vector of labels for the categories produced by draw_categorical. |
x | for draw_ordered and draw_likert, the latent data to cut into ordered categories. |
breaks | vector of breaks to cut a latent outcome into ordered categories with draw_ordered or draw_likert. |
break_labels | vector of labels for the breaks to cut a latent outcome into ordered categories with draw_ordered. |
strict | Logical indicating whether values outside the provided breaks should be coded as NA. Defaults to FALSE for draw_ordered and to !is.null(breaks) for draw_likert. |
mean | for draw_count, the mean of the count distribution for each observation. |
type | Type of Likert scale data for draw_likert (default 7). |
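
The relationship between `latent`, `link`, and the resulting probability can be illustrated in base R. The sketch below is not the package's implementation: `sketch_draw_binary` is a hypothetical helper, and it assumes that "probit" corresponds to the standard normal CDF (`pnorm`) and "logit" to the inverse logit (`plogis`).

```r
# Hypothetical helper illustrating latent -> probability -> binary draw.
sketch_draw_binary <- function(latent, link = c("identity", "logit", "probit")) {
  link <- match.arg(link)
  prob <- switch(link,
    identity = latent,          # latent must already be a probability
    logit    = plogis(latent),  # inverse logit
    probit   = pnorm(latent)    # standard normal CDF
  )
  stopifnot(all(prob >= 0 & prob <= 1))
  # One Bernoulli draw per element of prob
  rbinom(n = length(prob), size = 1, prob = prob)
}

set.seed(1)
x <- c(-2, 0, 2)
pnorm(x)                                # probabilities implied by a probit link
sketch_draw_binary(x, link = "probit")  # one 0/1 draw per latent value
```

With the "identity" link the input is used as a probability directly, which is why values outside [0, 1] are rejected in this sketch.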
A vector of data in accordance with the specification; generally
numeric, but for some functions, including draw_ordered, it may be a
factor if break labels are provided.
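
The difference between a numeric and a factor result can be sketched in base R. This is an illustration under assumptions, not the package's code: it uses `findInterval` to assign each latent value to an interval, and the labels "Low", "Middle", and "High" are made up for the example.

```r
x <- c(-2.5, -0.3, 0.4, 3)
breaks <- c(-Inf, -1, 1, Inf)

# Without labels: integer codes giving the interval each value falls into
codes <- findInterval(x, breaks)

# With labels: the same codes converted to a factor
labels <- c("Low", "Middle", "High")
labelled <- factor(codes, levels = seq_along(labels), labels = labels)

is.numeric(codes)
is.factor(labelled)
```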
For variables with intra-cluster correlations, see draw_binary_icc and
draw_normal_icc.
# Drawing binary values (success or failure, treatment assignment)
fabricate(N = 3,
   p = c(0, .5, 1),
   binary = draw_binary(prob = p))
#>   ID   p binary
#> 1  1 0.0      0
#> 2  2 0.5      0
#> 3  3 1.0      1

# Drawing binary values with probit link (transforming continuous data
# into a probability range).
fabricate(N = 3,
   x = 10 * rnorm(N),
   binary = draw_binary(latent = x, link = "probit"))
#>   ID          x binary
#> 1  1 12.1515166      1
#> 2  2  0.4221889      0
#> 3  3  3.9740704      1

# Repeated trials: `draw_binomial`
fabricate(N = 3,
   p = c(0, .5, 1),
   binomial = draw_binomial(prob = p, trials = 10))
#>   ID   p binomial
#> 1  1 0.0        0
#> 2  2 0.5        6
#> 3  3 1.0       10

# Ordered data: transforming latent data into observed, ordinal data.
# Useful for survey responses.
fabricate(N = 3,
   x = 5 * rnorm(N),
   ordered = draw_ordered(x = x,
                          breaks = c(-Inf, -1, 1, Inf)))
#>   ID          x ordered
#> 1  1 -0.7245082       2
#> 2  2 -1.6222348       1
#> 3  3 -0.8628245       2

# Providing break labels for latent data.
fabricate(N = 3,
   x = 5 * rnorm(N),
   ordered = draw_ordered(x = x,
                          breaks = c(-Inf, -1, 1, Inf),
                          break_labels = c("Not at all concerned",
                                           "Somewhat concerned",
                                           "Very concerned")))
#>   ID          x              ordered
#> 1  1 -6.1803146 Not at all concerned
#> 2  2 -9.5115210 Not at all concerned
#> 3  3 -0.4725201   Somewhat concerned

# Likert data: often used for survey data
fabricate(N = 10,
   support_free_college = draw_likert(x = rnorm(N),
                                      type = 5))
#>    ID support_free_college
#> 1  01 Don't Know / Neutral
#> 2  02 Don't Know / Neutral
#> 3  03                Agree
#> 4  04 Don't Know / Neutral
#> 5  05                Agree
#> 6  06 Don't Know / Neutral
#> 7  07             Disagree
#> 8  08             Disagree
#> 9  09             Disagree
#> 10 10 Don't Know / Neutral

# Count data: useful for rates of occurrences over time.
fabricate(N = 5,
   x = c(0, 5, 25, 50, 100),
   theft_rate = draw_count(mean = x))
#>   ID   x theft_rate
#> 1  1   0          0
#> 2  2   5          5
#> 3  3  25         26
#> 4  4  50         51
#> 5  5 100         88

# Categorical data: useful for demographic data.
fabricate(N = 6,
   p1 = runif(N),
   p2 = runif(N),
   p3 = runif(N),
   cat = draw_categorical(cbind(p1, p2, p3)))
#>   ID         p1         p2        p3 cat
#> 1  1 0.09022236 0.22092086 0.1625581   2
#> 2  2 0.23832468 0.06540764 0.5351122   2
#> 3  3 0.25913474 0.64489713 0.5850109   1
#> 4  4 0.02566954 0.31255200 0.3186254   2
#> 5  5 0.17555081 0.63886730 0.4680848   3
#> 6  6 0.61982165 0.97387534 0.6392598   1
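
None of the examples above exercise quantile_y. One way to read "drawn at exactly those quantiles" is that the distribution's quantile function is evaluated instead of sampling. The base R sketch below illustrates that reading only; it is an assumption about the behaviour, not a description of the package internals, and the trial count, probability, and mean are arbitrary.

```r
# Deterministic "draws": evaluate quantile functions instead of sampling.
quantile_y <- c(0.1, 0.5, 0.9)

# Binomial outcome with 10 trials and success probability 0.5
binom_q <- qbinom(quantile_y, size = 10, prob = 0.5)

# Count outcome with mean 5 (Poisson)
pois_q <- qpois(quantile_y, lambda = 5)

binom_q
pois_q
```

Because no random numbers are involved, repeated calls return the same values, which can be convenient when a reproducible, evenly spread outcome variable is wanted.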