Drawing discrete data based on probabilities or latent traits is a common but cumbersome task. Each function in this set creates a different type of discrete data: draw_binary creates binary 0/1 data; draw_binomial creates binomial data (repeated-trial binary data); draw_categorical creates categorical data; draw_ordered transforms latent data into observed ordered categories; and draw_count creates Poisson-distributed count data. draw_likert is an alias to draw_ordered that pre-specifies break labels and offers default breaks appropriate for a Likert survey question.

draw_binomial(prob = link(latent), trials = 1, N = length(prob),
  latent = NULL, link = "identity", quantile_y = NULL)

draw_categorical(prob = link(latent), N = NULL, latent = NULL,
  link = "identity", category_labels = NULL)

draw_ordered(x = link(latent), breaks = c(-1, 0, 1),
  break_labels = NULL, N = length(x), latent = NULL,
  strict = FALSE, link = "identity")

draw_count(mean = link(latent), N = length(mean), latent = NULL,
  link = "identity", quantile_y = NULL)

draw_binary(prob = link(latent), N = length(prob), link = "identity",
  latent = NULL, quantile_y = NULL)

draw_likert(x, type = 7, breaks = NULL, N = length(x),
  latent = NULL, link = "identity", strict = !is.null(breaks))

draw_quantile(type, N)

Arguments

prob

A number or vector of numbers representing the probability for binary or binomial outcomes; or a number, vector, or matrix of numbers representing probabilities for categorical outcomes. If you supply a link function, these underlying probabilities will be transformed.

trials

For draw_binomial, the number of trials for each observation. Defaults to 1.

N

number of units to draw. Defaults to the length of the vector of probabilities or latent data you provided.

latent

If you supply a link argument other than "identity", provide the variable as latent rather than as prob or mean.

link

link function between the latent variable and the probability of a positive outcome, e.g. "logit", "probit", or "identity". For the "identity" link, the latent variable must be a probability.
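
As a rough illustration of what a link does, the base R sketch below maps a latent variable onto the probability scale and then draws binary outcomes from it. It relies on plogis, pnorm, and rbinom from base R rather than on fabricatr itself; the names latent, p_logit, p_probit, and binary are illustrative, and how fabricatr applies links internally is an assumption here, not something this page confirms.

```r
# Base R sketch (not fabricatr internals): a link turns latent values
# into probabilities before any discrete draw happens.
set.seed(1)
latent <- c(-2, 0, 2)

p_logit  <- plogis(latent)  # inverse logit: 1 / (1 + exp(-latent))
p_probit <- pnorm(latent)   # inverse probit: standard normal CDF

# Draw one 0/1 outcome per unit from the probit-transformed probabilities.
binary <- rbinom(length(latent), size = 1, prob = p_probit)
```

With the "identity" link no transformation takes place, which is why the input must already lie between 0 and 1.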

quantile_y

A vector of quantiles; if provided, rather than drawing stochastically from the distribution of interest, data will be drawn at exactly those quantiles.
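
One way to picture drawing "at exactly those quantiles" is through inverse CDFs. The base R sketch below evaluates the binomial and Poisson quantile functions at fixed quantiles; it is an assumed analogue of what quantile_y does, not a confirmed description of the package's implementation, and the names q, binom_at_q, and count_at_q are illustrative.

```r
# Base R sketch: deterministic "draws" at fixed quantiles via inverse CDFs
# (assumed analogue of quantile_y, not fabricatr internals).
q <- c(0.1, 0.5, 0.9)

binom_at_q <- qbinom(q, size = 10, prob = 0.5)  # binomial with 10 trials
count_at_q <- qpois(q, lambda = 5)              # Poisson with mean 5
```

Because no random numbers are involved, repeating the call returns the same values every time.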

category_labels

Vector of labels for the categories produced by draw_categorical. If provided, its length must equal the number of categories provided in the prob argument.
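
To make the relationship between a probability matrix and category labels concrete, here is a base R sketch that draws one category per row and then attaches labels. It uses sample and factor rather than draw_categorical; the names prob_matrix, labels, cat_index, and cat_label are hypothetical, and attaching the labels as a factor is a choice made for illustration only.

```r
# Base R sketch (not fabricatr internals): one categorical draw per row
# of a probability matrix, followed by labelling.
set.seed(42)
prob_matrix <- rbind(
  c(0.2, 0.3, 0.5),  # unit 1: any of the three categories is possible
  c(1.0, 0.0, 0.0),  # unit 2: always category 1
  c(0.0, 0.0, 1.0)   # unit 3: always category 3
)
labels <- c("low", "mid", "high")  # one label per column of prob_matrix

cat_index <- apply(prob_matrix, 1, function(p) {
  sample(seq_along(p), size = 1, prob = p)
})
cat_label <- factor(labels[cat_index], levels = labels)
```

The label vector has three entries because the matrix has three columns, mirroring the length requirement described above.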

x

for draw_ordered or draw_likert, the latent data for each observation.

breaks

vector of breaks to cut a latent outcome into ordered categories with draw_ordered or draw_likert

break_labels

vector of labels for the breaks to cut a latent outcome into ordered categories with draw_ordered. (Optional)

strict

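Logical indicating whether values outside the provided breaks should be coded as NA. Defaults to FALSE for draw_ordered, in which case the breaks are effectively extended with -Inf below the lowest break and Inf above the highest break, so that every value falls into a category. For draw_likert, the default is TRUE whenever breaks is supplied.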
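
The difference between the two settings can be sketched with base R's cut. This is an illustration under assumptions, not the package's implementation: category numbering and the handling of values that fall exactly on a break may differ in draw_ordered, and the names x, breaks, padded, and strict_codes are illustrative.

```r
# Base R sketch (not fabricatr internals): cutting latent data at breaks.
x <- c(-2.5, -0.5, 0.5, 2.5)
breaks <- c(-1, 0, 1)

# Non-strict: pad the breaks with -Inf and Inf so every value gets a category.
padded <- cut(x, breaks = c(-Inf, breaks, Inf), labels = FALSE)

# Strict: values below the lowest break or above the highest become NA.
strict_codes <- cut(x, breaks = breaks, labels = FALSE)
```

In this sketch every element of padded is a valid category code, while the first and last elements of strict_codes are NA because -2.5 and 2.5 lie outside the breaks.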

mean

for draw_count, the mean number of count units for each observation

type

Type of Likert scale data for draw_likert. Valid options are 4, 5, and 7. Type corresponds to the number of categories in the Likert scale.

Value

A vector of data in accordance with the specification. The result is generally numeric, but some functions, including draw_ordered, may return a factor if break labels are provided.

Details

For variables with intra-cluster correlations, see draw_binary_icc and draw_normal_icc.

Examples

# Drawing binary values (success or failure, treatment assignment)
fabricate(N = 3,
          p = c(0, .5, 1),
          binary = draw_binary(prob = p))
#>   ID   p binary
#> 1  1 0.0      0
#> 2  2 0.5      0
#> 3  3 1.0      1

# Drawing binary values with probit link (transforming continuous data
# into a probability range).
fabricate(N = 3,
          x = 10 * rnorm(N),
          binary = draw_binary(latent = x, link = "probit"))
#>   ID          x binary
#> 1  1 12.1515166      1
#> 2  2  0.4221889      0
#> 3  3  3.9740704      1

# Repeated trials: `draw_binomial`
fabricate(N = 3,
          p = c(0, .5, 1),
          binomial = draw_binomial(prob = p, trials = 10))
#>   ID   p binomial
#> 1  1 0.0        0
#> 2  2 0.5        6
#> 3  3 1.0       10

# Ordered data: transforming latent data into observed, ordinal data.
# Useful for survey responses.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x,
                                 breaks = c(-Inf, -1, 1, Inf)))
#>   ID          x ordered
#> 1  1 -0.7245082       2
#> 2  2 -1.6222348       1
#> 3  3 -0.8628245       2

# Providing break labels for latent data.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x,
                                 breaks = c(-Inf, -1, 1, Inf),
                                 break_labels = c("Not at all concerned",
                                                  "Somewhat concerned",
                                                  "Very concerned")))
#>   ID          x              ordered
#> 1  1 -6.1803146 Not at all concerned
#> 2  2 -9.5115210 Not at all concerned
#> 3  3 -0.4725201   Somewhat concerned

# Likert data: often used for survey data
fabricate(N = 10,
          support_free_college = draw_likert(x = rnorm(N),
                                             type = 5))
#>    ID support_free_college
#> 1  01 Don't Know / Neutral
#> 2  02 Don't Know / Neutral
#> 3  03                Agree
#> 4  04 Don't Know / Neutral
#> 5  05                Agree
#> 6  06 Don't Know / Neutral
#> 7  07             Disagree
#> 8  08             Disagree
#> 9  09             Disagree
#> 10 10 Don't Know / Neutral

# Count data: useful for rates of occurrences over time.
fabricate(N = 5,
          x = c(0, 5, 25, 50, 100),
          theft_rate = draw_count(mean = x))
#>   ID   x theft_rate
#> 1  1   0          0
#> 2  2   5          5
#> 3  3  25         26
#> 4  4  50         51
#> 5  5 100         88

# Categorical data: useful for demographic data.
fabricate(N = 6,
          p1 = runif(N),
          p2 = runif(N),
          p3 = runif(N),
          cat = draw_categorical(cbind(p1, p2, p3)))
#>   ID         p1         p2        p3 cat
#> 1  1 0.09022236 0.22092086 0.1625581   2
#> 2  2 0.23832468 0.06540764 0.5351122   2
#> 3  3 0.25913474 0.64489713 0.5850109   1
#> 4  4 0.02566954 0.31255200 0.3186254   2
#> 5  5 0.17555081 0.63886730 0.4680848   3
#> 6  6 0.61982165 0.97387534 0.6392598   1