Draw discrete variables including binary, binomial count, Poisson count, ordered, and categorical
Source: R/variable_creation_functions.R
Drawing discrete data based on probabilities or latent traits is a common
task that can be cumbersome. Each function in our discrete drawing set creates
a different type of discrete data:
- draw_binary creates binary 0/1 data
- draw_binomial creates binomial data (repeated-trial binary data)
- draw_categorical creates categorical data
- draw_ordered transforms latent data into observed ordered categories
- draw_count creates count data (Poisson-distributed)
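For orientation, the distributions behind most of these functions can be approximated with plain base R draws. The block below is a minimal sketch that assumes standard Bernoulli, binomial, Poisson, and interval-cutting behaviour; it is not fabricatr's implementation, which also handles link functions, labels, and argument checks. It covers the binary, binomial, count, and ordered cases, and all variable names are illustrative.

```r
# Rough base-R analogues of the discrete types listed above.
# Assumption: these mirror the distributions involved, not fabricatr's code.
set.seed(1)

prob  <- c(0, 0.5, 1)
means <- c(0, 5, 25)

# Binary 0/1 data: one Bernoulli trial per unit
binary <- rbinom(length(prob), size = 1, prob = prob)

# Binomial data: 10 Bernoulli trials per unit, counting the successes
binomial <- rbinom(length(prob), size = 10, prob = prob)

# Count data: Poisson draws with the given means
count <- rpois(length(means), lambda = means)

# Ordered data: cut a latent variable at fixed breaks into ordered integer codes
latent  <- c(-2, 0.3, 4)
ordered <- as.integer(cut(latent, breaks = c(-Inf, -1, 1, Inf)))

binary; binomial; count; ordered
```

Probabilities of exactly 0 or 1 (and a Poisson mean of 0) make those draws deterministic, which matches the first rows of the binary, binomial, and count examples further down.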
Usage
draw_binomial(
  prob = link(latent),
  trials = 1,
  N = length(prob),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_categorical(
  prob = link(latent),
  N = NULL,
  latent = NULL,
  link = "identity",
  category_labels = NULL
)

draw_ordered(
  x = link(latent),
  breaks = c(-1, 0, 1),
  break_labels = NULL,
  N = length(x),
  latent = NULL,
  strict = FALSE,
  link = "identity"
)

draw_count(
  mean = link(latent),
  N = length(mean),
  latent = NULL,
  link = "identity",
  quantile_y = NULL
)

draw_binary(
  prob = link(latent),
  N = length(prob),
  link = "identity",
  latent = NULL,
  quantile_y = NULL
)

draw_quantile(type, N)
Arguments
- prob
A number or vector of numbers representing the probability for binary or binomial outcomes; or a number, vector, or matrix of numbers representing probabilities for categorical outcomes. If you use a link function other than "identity", supply latent instead; the link transforms it into these probabilities.
- trials
for draw_binomial, the number of trials for each observation.
- N
number of units to draw. Defaults to the length of the vector of probabilities or latent data you provided.
- latent
If the user provides a link argument other than "identity", they should provide the variable latent rather than prob or mean (or x for draw_ordered).
- link
link function between the latent variable and the probability of a positive outcome, e.g. "logit", "probit", or "identity". For the "identity" link, the latent variable must be a probability.
- quantile_y
A vector of quantiles; if provided, rather than drawing stochastically from the distribution of interest, data will be drawn at exactly those quantiles.
- category_labels
vector of labels for the categories produced by draw_categorical. If provided, its length must equal the number of categories provided in the prob argument.
- x
for draw_ordered, the latent data for each observation.
- breaks
vector of breaks to cut a latent outcome into ordered categories with draw_ordered.
- break_labels
vector of labels for the ordered categories produced by cutting a latent outcome with draw_ordered, one label per category. (Optional)
- strict
Logical indicating whether values outside the provided breaks should be coded as NA. Defaults to FALSE, in which case -Inf and Inf are effectively added as the outermost breaks, so values below the lowest break or above the highest break fall into their own categories.
- mean
for draw_count, the mean number of count units for each observation.
- type
for draw_quantile, the number of buckets to split data into. For a median split, enter 2; for terciles, enter 3; for quartiles, enter 4; for quintiles, 5; for deciles, 10.
Value
A vector of data in accordance with the specification; generally numeric, but for some functions, including draw_ordered and draw_categorical, it may be a factor if labels are provided.
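The ordered case can be illustrated with base R's cut(): integer codes without labels, a factor with labels. The sketch below also imitates the strict argument by hand, assuming strict = FALSE behaves like padding the breaks with -Inf and Inf and strict = TRUE like leaving them as given. Returning an ordered factor is a choice made here for illustration, not a documented property of draw_ordered.

```r
# Cut a latent variable into ordered categories.
latent <- c(-2.1, -0.5, 0.4, 3.4, 7)
breaks <- c(-1, 0, 1)

# strict = FALSE analogue: pad with -Inf and Inf so every value gets a category
loose <- as.integer(cut(latent, breaks = c(-Inf, breaks, Inf)))

# strict = TRUE analogue: values outside (min(breaks), max(breaks)] become NA
strict <- as.integer(cut(latent, breaks = breaks))

# With labels, cut() returns a factor (ordered here) instead of integer codes
labelled <- cut(latent, breaks = c(-Inf, -1, 1, Inf),
                labels = c("Not at all concerned", "Somewhat concerned",
                           "Very concerned"),
                ordered_result = TRUE)

loose; strict; labelled
```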
Details
For variables with intra-cluster correlations, see draw_binary_icc and draw_normal_icc.
Examples
library(fabricatr)

# Drawing binary values (success or failure, treatment assignment)
fabricate(N = 3,
          p = c(0, .5, 1),
          binary = draw_binary(prob = p))
#> ID p binary
#> 1 1 0.0 0
#> 2 2 0.5 0
#> 3 3 1.0 1
# Drawing binary values with probit link (transforming continuous data
# into a probability range).
fabricate(N = 3,
          x = 10 * rnorm(N),
          binary = draw_binary(latent = x, link = "probit"))
#> ID x binary
#> 1 1 -14.825575 0
#> 2 2 1.361874 1
#> 3 3 12.949887 1
# Repeated trials: `draw_binomial`
fabricate(N = 3,
          p = c(0, .5, 1),
          binomial = draw_binomial(prob = p, trials = 10))
#> ID p binomial
#> 1 1 0.0 0
#> 2 2 0.5 3
#> 3 3 1.0 10
# Ordered data: transforming latent data into observed ordinal data,
# useful for survey responses.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x,
                                 breaks = c(-Inf, -1, 1, Inf)))
#> ID x ordered
#> 1 1 0.1627789 2
#> 2 2 2.3064506 3
#> 3 3 6.9070015 3
# Providing break labels for latent data.
fabricate(N = 3,
          x = 5 * rnorm(N),
          ordered = draw_ordered(x = x,
                                 breaks = c(-Inf, -1, 1, Inf),
                                 break_labels = c("Not at all concerned",
                                                  "Somewhat concerned",
                                                  "Very concerned")))
#> ID x ordered
#> 1 1 -2.082381 Not at all concerned
#> 2 2 3.404713 Very concerned
#> 3 3 -2.071865 Not at all concerned
# Count data: useful for rates of occurrence over time.
fabricate(N = 5,
          x = c(0, 5, 25, 50, 100),
          theft_rate = draw_count(mean = x))
#> ID x theft_rate
#> 1 1 0 0
#> 2 2 5 4
#> 3 3 25 26
#> 4 4 50 62
#> 5 5 100 101
# Categorical data: useful for demographic data.
fabricate(N = 6, p1 = runif(N), p2 = runif(N), p3 = runif(N),
          cat = draw_categorical(cbind(p1, p2, p3)))
#> ID p1 p2 p3 cat
#> 1 1 0.2713756 0.09022236 0.22092086 1
#> 2 2 0.5784195 0.23832468 0.06540764 1
#> 3 3 0.2323582 0.25913474 0.64489713 2
#> 4 4 0.1263368 0.02566954 0.31255200 3
#> 5 5 0.9966877 0.17555081 0.63886730 1
#> 6 6 0.4742831 0.61982165 0.97387534 2
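The categorical example passes a matrix whose rows do not sum to 1. A base-R way to draw one category per row from such a matrix is to normalize each row and sample a single index with those weights. This is a sketch that assumes row-wise normalization; the helper draw_one_category_per_row() is hypothetical and not fabricatr's implementation.

```r
# Hypothetical helper: draw one category index per row of a probability matrix.
draw_one_category_per_row <- function(prob_matrix) {
  # Divide each row by its sum (the vector recycles down the columns)
  prob_matrix <- prob_matrix / rowSums(prob_matrix)
  apply(prob_matrix, 1, function(p) sample.int(length(p), size = 1, prob = p))
}

set.seed(7)
p <- cbind(p1 = runif(6), p2 = runif(6), p3 = runif(6))
cat_draws <- draw_one_category_per_row(p)
table(cat_draws)
```

A row with all of its weight on one column always returns that column's index, which is a quick way to check the row-wise logic.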