Randomized Response

Randomized response is a tool used in survey research aimed at increasing subject privacy and mitigate potential bias, particularly when our inquiries of interest pertain sensitive or controversial behavior or beliefs. For example, researchers may be interested in measuring participation in violence, engagement in illicit activities, or support for the opposition in authoritarian contexts.

Let us consider a specific type of randomized response design in which respondents are asked to use a randomization device, such as a die, whose outcome is unobserved to the enumerator. Respondents are then asked to answer “Yes” if the die shows numbers “1”, “2”, “3”, or “4” and to answer truthfully if the die shows “5”, or “6.” This is aimed at increasing the privacy of respondent’s and elicit them to indeed respond truthfully.

Say, for example, we are interested in measuring the rate of intimate partner violence (IPV) in a given locality. Our design compares estimates from direct questions and questions using the randomized response principle.

Design Declaration

• Model: Respondents are expected to answer the sensitive question with a probability $$p$$ (a known quantity). $$L_i$$ is a latent binary response to the sensitive question that equals 0 for respondent $$i$$ if he or she does not engage in IPV and 1 if he or she does. This is defined in our design as the prevalence rate. $$Y_i$$ is the response observed by researchers from the randomized approach. $$D_i$$ is the response to the direct question of whether respondent engages in IPV. We expect respondents who engage in IPV to be inclined to misreport if asked to answer directly (we define this as the withholding rate).

• Inquiry: Our estimand is the rate of intimate partner violence in the studied locality, or the population mean of L. Formally, we expect $$Pr(Y_i = 1) = p + (1-p)Pr(L_i = 1)$$.

• Data strategy: We collect survey data on a representative sample of 1,000 individuals and respondents are assigned to respond “Yes” with probability $$p = \frac{4}{6}$$ and truthfully with probability $$1 - p = \frac{1}{3}$$.

• Answer strategy: We estimate our population rate of IPV in two ways: firstly by averaging the responses observed via direct question ($$\hat{\bar{L}} = \bar{D}$$), and secondly via randomized response: $$\hat{\bar{L}} = \frac{\bar{Y} - p}{1 - p}$$.

N <- 1000
prob_forced_yes <- 0.6
prevalence_rate <- 0.1
withholding_rate <- 0.5

population <- declare_population(N = N, sensitive_trait = draw_binary(prob = prevalence_rate,
N = N), withholder = draw_binary(prob = sensitive_trait *
withholding_rate, N = N), direct_answer = sensitive_trait -
withholder)
potential_outcomes <- declare_potential_outcomes(Y_Z_Yes = 1,
Y_Z_Truth = sensitive_trait)
estimand <- declare_estimand(true_rate = mean(sensitive_trait))
assignment <- declare_assignment(prob = prob_forced_yes,
conditions = c("Truth", "Yes"))
estimator_randomized_response <- declare_estimator(handler = tidy_estimator(function(data) with(data,
data.frame(estimate = (mean(Y) - prob_forced_yes)/(1 -
prob_forced_yes)))), estimand = estimand, label = "Forced Randomized Response")
estimator_direct_question <- declare_estimator(handler = tidy_estimator(function(data) with(data,
data.frame(estimate = mean(direct_answer)))), estimand = estimand,
label = "Direct Question")
randomized_response_design <- population + assignment + potential_outcomes +
estimand + declare_reveal(Y, Z) + estimator_randomized_response +
estimator_direct_question
randomized_response_design <- set_diagnosands(design = randomized_response_design,
diagnosands = declare_diagnosands(select = bias))

Takeaways

diagnosis <- diagnose_design(randomized_response_design,
diagnosands = declare_diagnosands(bias = mean(estimate - estimand),
rmse = sqrt(mean((estimate - estimand)^2)),
mean_estimate = mean(estimate),
mean_estimand = mean(estimand),
keep_defaults = FALSE))
Design Label Estimand Label Estimator Label N Sims Bias RMSE Mean Estimate Mean Estimand
randomized_response_design true_rate Direct Question 500 -0.05 0.05 0.05 0.10
(0.00) (0.00) (0.00) (0.00)
randomized_response_design true_rate Forced Randomized Response 500 0.00 0.01 0.10 0.10
(0.00) (0.00) (0.00) (0.00)

Our diagnosis of the design indicates that randomized response yields an unbiased estimate of the true rate of IPV in the study sample and offer a better alternative to direct questions involving sensitive or controversial inquiries.

Using the Randomized Response Designer

In R, you can generate a randomized response design using the template function randomized_response_designer() in the DesignLibrary package by running the following lines, which load the package:

library(DesignLibrary)

We can then create specific designs by defining values for each argument. For example, we create a design called my_randomized_response_design with N, prob_forced_yes, prevalence_rate, and withholding_rate set to 300, .6, .2, and .2, respectively, by running the lines below.

my_randomized_response_design <- randomized_response_designer(N = 300,
prob_forced_yes = .6,
prevalence_rate = .2,
withholding_rate = .2)

You can see more details on the randomized_response_designer() function and its arguments by running the following line of code:

??randomized_response_designer