An Unbiased Regression Type Estimator In Randomized Response Sampling

Arias, Roberto; Sedory, Stephen A.; Singh, Sarjinder

doi:10.1007/s13571-021-00256-z

Published: 26 May 2021

Volume 84, pages 243–258, (2022)

Sankhya B

Abstract

In this paper, we suggest a new method of constructing an unbiased regression type estimator in randomized response sampling. We introduce two new randomized response estimators, one created using a sum of special products technique and the other using the method for computing a matrix determinant. This new idea of making an unbiased regression type estimator proves to be more efficient with no loss in respondent protection. Analytical comparisons show that the proposed unbiased regression type estimator is always more efficient than the considered competitors. The theoretical justification that the proposed estimator has a smaller variance than its competitors is clear, so no simulation study is required. However, to study the magnitude of the gain in relative efficiency, a simulation study has been carried out.

1 Introduction

Statisticians have long used randomized response techniques to estimate the proportion of individuals who belong to a group defined by a sensitive characteristic. Surveys on sensitive questions tend to yield inaccurate results because respondents may not trust the interviewer. Randomized response sampling was developed to address this problem. Pioneered by Warner (1965), these techniques let the interviewer ask sensitive questions in a way that allows respondents to reply without revealing their true status. They have enabled many researchers, including social scientists, to conduct surveys on sensitive subjects and obtain more accurate and efficient results. Since Warner (1965) first proposed his method, many other statisticians have improved on it; accounts of these developments can be found in the recent monographs by Fox (2016), Chaudhuri et al. (2016), Chaudhuri and Christofides (2013), and Chaudhuri (2011).

In the next section, we will discuss the Warner (1965), Mangat and Singh (1990), Kuk (1990) and Odumade and Singh (2009) models.

2 Background

In brief, survey sampling statisticians have long dealt with the difficulty of estimating the true population proportion of individuals belonging to a group defined by a sensitive characteristic. One popular solution, first proposed by Warner (1965), is to use a randomization device that protects the privacy of the individuals being surveyed. The respondent is instructed to use, in private, a randomization device such as a deck of cards. The respondent selects a card from the deck. Each card bears either the statement “I belong to group A” or the statement “I do not belong to group A”, with proportions P₀ and (1 − P₀), respectively. After selecting a card, the respondent reads it privately and tells the interviewer only ‘yes’ or ‘no’, according to whether the statement on the drawn card matches his/her status. Letting π represent the true population proportion of individuals belonging to group A, the Warner (1965) model gives the estimator of π in Eq. 2.1, for a given P₀, as:

$$ \hat{\pi}_{w(P_{0})}= \frac{\hat{\theta}_{w}-(1-P_{0})}{2P_{0}-1}, P_{0} \neq 0.5 $$

(2.1)

where $\hat {\theta }_{w}=n_{w}/n$ is the observed proportion of ‘yes’ replies out of n respondents selected from the population by simple random sampling with replacement (SRSWR), and n_w is the observed number of ‘yes’ replies received by the interviewer. The above estimator is unbiased and, for two trials per respondent and a given P₀, its variance is given in Eq. 2.2 as

$$ V(\hat{\pi}_{w(P_{0})})= \frac{\pi(1-\pi)}{n} + \frac{P_{0}(1-P_{0})}{2n(2P_{0}-1)^{2}} $$

(2.2)
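As an illustration of Eqs. 2.1 and 2.2, the following is a minimal Python sketch. The function names and the example counts (100 respondents, 40 ‘yes’ replies, P₀ = 0.7) are assumptions made for illustration and do not come from the paper.

```python
def warner_estimate(n_yes: int, n: int, p0: float) -> float:
    """Warner (1965) estimator of pi, Eq. 2.1."""
    if p0 == 0.5:
        raise ValueError("P0 must differ from 0.5")
    theta_hat = n_yes / n  # observed proportion of 'yes' replies
    return (theta_hat - (1.0 - p0)) / (2.0 * p0 - 1.0)


def warner_variance_two_trials(pi: float, n: int, p0: float) -> float:
    """Variance in Eq. 2.2 (two trials per respondent, given P0)."""
    return pi * (1.0 - pi) / n + p0 * (1.0 - p0) / (2.0 * n * (2.0 * p0 - 1.0) ** 2)


# Hypothetical survey: 100 respondents, 40 'yes' replies, P0 = 0.7.
print(warner_estimate(40, 100, 0.7))
print(warner_variance_two_trials(0.25, 100, 0.7))
```

With these hypothetical inputs the estimate is (0.40 − 0.30)/0.40, that is, about 0.25.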

Since the Warner (1965) randomized response model only requires the respondent to deal with a single randomization device, such as a deck of cards, we also consider the case where the Warner (1965) model is performed twice, independently. We do this because the models proposed after Warner (1965), which are discussed later in this section, make use of two randomization devices, such as two decks of cards. Models that use a second device gain efficiency and also improve protection for the participating respondents. Thus, while still using Warner’s (1965) model, consider the case in which the interviewer receives 0, 1, or 2 ‘yes’ replies based on two independent randomization devices with parameters P and T. Letting π represent the true population proportion of people belonging to group A, the probability mass function (p.m.f) of the i-th reply Z_i is obtained as in Table 1:

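Table 1 Probability mass function (p.m.f) of Z_i

$$ \begin{array}{cl} \hline Z_{i} & \text{Probability} \\ \hline 0 & \pi (1-P)(1-T) +(1-\pi) PT \\ 1 & \pi \{ P(1-T) +T(1-P) \} +(1-\pi) \{ P(1-T) +T(1-P) \} \\ 2 & \pi PT +(1-\pi) (1-P)(1-T) \\ \hline \end{array} $$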

From this, the expected value of Z_i is given as

$$ \begin{array}{@{}rcl@{}} E(Z_{i}) &=&(0) [ \pi (1-P)(1-T) +(1-\pi) PT] \\&&+(1)[\pi \{ P(1-T) +T(1-P) \} +(1-\pi) \{ P(1-T) +T(1-P) \} ] \\&&+(2) [\pi PT +(1-\pi) (1-P)(1-T) ]\\& =&2(P+T-1) \pi + P(1-T) + T(1-P) +2(1-P)(1-T) \end{array} $$

(2.3)

The variance of the response Z_i, V (Z_i), is given by

$$ V(Z_{i})= E({Z_{i}^{2}}) - (E(Z_{i}))^{2} $$

(2.4)

Now we have

$$ \begin{array}{@{}rcl@{}} E({Z_{i}^{2}}) &=&(0)^{2} [ \pi (1-P)(1-T) +(1-\pi) PT] \\ &&+(1)^{2}[\pi \{ P(1-T) +T(1-P) \} +(1-\pi) \{ P(1-T) +T(1-P) \} ]\\ &&+(2)^{2} [\pi PT +(1-\pi) (1-P)(1-T) ] \end{array} $$

(2.5)

By plugging Eqs. 2.3 and 2.5 into Eq. 2.4, the variance of Z_i is given as

$$ V(Z_{i})= 4(P+T-1)^{2} \pi (1-\pi) +P(1-P) + T(1-T) $$

(2.6)
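The closed forms in Eqs. 2.3 and 2.6 can be checked numerically against the p.m.f. of Z_i. The sketch below does this in Python; the function names and the values π = 0.3, P = 0.7, T = 0.8 are illustrative assumptions, not taken from the paper.

```python
def z_pmf(pi, P, T):
    """p.m.f. of Z (0, 1 or 2 'yes' replies), using the terms that appear in Eq. 2.3."""
    p0 = pi * (1 - P) * (1 - T) + (1 - pi) * P * T
    p1 = P * (1 - T) + T * (1 - P)  # the pi and (1 - pi) parts add up to this
    p2 = pi * P * T + (1 - pi) * (1 - P) * (1 - T)
    return [p0, p1, p2]


def z_moments(pi, P, T):
    """Mean and variance of Z computed directly from the p.m.f."""
    pmf = z_pmf(pi, P, T)
    mean = sum(z * p for z, p in enumerate(pmf))
    second = sum(z * z * p for z, p in enumerate(pmf))
    return mean, second - mean ** 2


def z_closed_forms(pi, P, T):
    """Closed-form mean (Eq. 2.3) and variance (Eq. 2.6)."""
    mean = 2 * (P + T - 1) * pi + P * (1 - T) + T * (1 - P) + 2 * (1 - P) * (1 - T)
    var = 4 * (P + T - 1) ** 2 * pi * (1 - pi) + P * (1 - P) + T * (1 - T)
    return mean, var


# Illustrative values only.
print(z_moments(0.3, 0.7, 0.8))
print(z_closed_forms(0.3, 0.7, 0.8))
```

Both calls should print the same pair of numbers up to floating-point rounding.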

From Eq. 2.3, an unbiased estimator of π is given by

$$ \hat{\pi}_{w(PT)} = \frac{\frac{1}{n} {\sum}_{i=1}^{n} Z_{i} -[P(1-T)+T(1-P)+2(1-P)(1-T)]}{2(P+T-1)} $$

(2.7)

The variance of the Warner (1965) estimator $\hat {\pi }_{w(PT)}$ for two trials per respondent, using two independent devices with parameters P and T, is given as

$$ V(\hat{\pi}_{w(PT)}) = \frac{\pi (1-\pi)}{n} + \frac{P(1-P)+T(1-T)}{4n(P+T-1)^{2}} $$

(2.8)

Clearly, if we let P = T = P₀, this model reduces to the original Warner (1965) model with two independent trials per respondent as in Eq. 2.2. To our knowledge the result in Eq. 2.8 is new, although we present it in the background section of the present investigation.
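As a worked step (added for illustration and using only Eqs. 2.2 and 2.8), setting P = T = P₀ in the second term of Eq. 2.8 gives

$$ \frac{P_{0}(1-P_{0})+P_{0}(1-P_{0})}{4n(2P_{0}-1)^{2}} = \frac{P_{0}(1-P_{0})}{2n(2P_{0}-1)^{2}} $$

which is the second term of Eq. 2.2, while the first term π(1 − π)/n is unchanged.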

Mangat and Singh (1990) improved on the Warner (1965) model by proposing a two-stage randomized response procedure that makes use of two decks of cards. In the Mangat and Singh (1990) model, each respondent is asked to use two randomization devices, R₁ and R₂. The device R₁ has two outcomes: “Are you a member of group A?”, with relative frequency T₀, and “Go to the second randomization device R₂”, with relative frequency (1 − T₀). The second randomization device R₂ is the same as the Warner (1965) randomization device. As in the Warner (1965) model, we let π represent the true population proportion of individuals belonging to group A, and n_ms be the number of ‘yes’ replies received by the interviewer from n respondents selected from the population by SRSWR. The following estimator of π is derived for the Mangat and Singh (1990) model:

$$ \hat{\pi}_{ms} = \frac{\hat{\theta}_{ms}-(1-T_{0})(1-P_{0})}{(2P_{0}-1)+2T_{0}(1-P_{0})} $$

(2.9)

where $\hat {\theta }_{ms}=n_{ms}/n$ is the proportion of observed ‘yes’ answers in the sample. The estimator in Eq. 2.9, provided by Mangat and Singh (1990), is unbiased and has the variance:

$$ V(\hat{\pi}_{ms}) = \frac{\pi (1-\pi)}{n} + \frac{(1-T_{0})(1-P_{0})[1-(1-T_{0})(1-P_{0})]}{n[(2P_{0}-1)+2T_{0}(1-P_{0})]^{2}} $$

(2.10)
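A short Python sketch of Eqs. 2.9 and 2.10 follows. The function names and numerical values are assumptions for illustration. The ‘yes’ probability used in the check is inferred from the description of the devices R₁ and R₂ above (a member says ‘yes’ with probability T₀ + (1 − T₀)P₀, a non-member with probability (1 − T₀)(1 − P₀)) and is not quoted from the paper.

```python
def ms_estimate(theta_hat, p0, t0):
    """Mangat and Singh (1990) estimator of pi, Eq. 2.9."""
    denom = (2 * p0 - 1) + 2 * t0 * (1 - p0)
    return (theta_hat - (1 - t0) * (1 - p0)) / denom


def ms_variance(pi, n, p0, t0):
    """Variance of the estimator, Eq. 2.10."""
    c = (1 - t0) * (1 - p0)
    denom = (2 * p0 - 1) + 2 * t0 * (1 - p0)
    return pi * (1 - pi) / n + c * (1 - c) / (n * denom ** 2)


def ms_yes_probability(pi, p0, t0):
    """Assumed 'yes' probability implied by the two-stage device."""
    return pi * (t0 + (1 - t0) * p0) + (1 - pi) * (1 - t0) * (1 - p0)


# Illustrative values only: plugging the true 'yes' probability into
# the estimator should return pi.
pi, p0, t0 = 0.3, 0.7, 0.2
print(ms_estimate(ms_yes_probability(pi, p0, t0), p0, t0))
print(ms_variance(pi, 100, p0, t0))
```

Setting `t0 = 0` removes the first stage, and `ms_estimate` then coincides with the Warner (1965) estimator of Eq. 2.1.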

Following Warner (1986) and Fox and Tracy (1986, p. 30), Kuk (1990) suggests using the theory of recoding of responses to overcome an undesirable feature of randomized response techniques. The Kuk (1990) model avoids statements like “I belong to group A”, because A is a sensitive group and the respondent may become sceptical and uncooperative. He suggests using cards of the same size and type but of different colors (red and blue, say). If a respondent belongs to group A (A^c), then he/she is to draw a card from deck-I (deck-II) and report the color of the drawn card, instead of answering the sensitive question. Let 𝜃₁ be the proportion of red cards in deck-I and 𝜃₂ be the proportion of red cards in deck-II. Letting π represent the true population proportion of individuals belonging to group A, Kuk’s (1990) model gives the probability of a “red” response from a respondent:

$$ P(\text{red}) =\theta_{kuk} = \theta_{1} \pi + \theta_{2} (1-\pi) $$

(2.11)

Additionally, suppose the n respondents are selected from the population utilizing a SRSWR. Then, letting n_kuk be the number of ‘red’ replies received by the interviewer, we have that n_kuk follows a Binomial distribution with parameters n and 𝜃_kuk. If each individual being interviewed is requested to give k ≥ 1 replies, then Kuk’s (1990) model gives the variance:

$$ V(\hat{\pi}_{kuk}) = \frac{\theta_{kuk} (1-\theta_{kuk})}{nk(\theta_{1}-\theta_{2})^{2}} + \frac{\pi (1-\pi)}{n}(1-\frac{1}{k}) $$

(2.12)
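To see how the number of replies k enters Eq. 2.12, here is a small Python sketch. The function names and the values π = 0.3, n = 100, 𝜃₁ = 0.7, 𝜃₂ = 0.2 are assumptions chosen only for illustration.

```python
def kuk_red_probability(pi, theta1, theta2):
    """P(red) from Eq. 2.11."""
    return theta1 * pi + theta2 * (1.0 - pi)


def kuk_variance(pi, n, theta1, theta2, k=1):
    """Variance from Eq. 2.12 when each respondent gives k >= 1 replies."""
    if theta1 == theta2:
        raise ValueError("theta1 and theta2 must differ")
    th = kuk_red_probability(pi, theta1, theta2)
    first = th * (1.0 - th) / (n * k * (theta1 - theta2) ** 2)
    second = pi * (1.0 - pi) / n * (1.0 - 1.0 / k)
    return first + second


# Illustrative values only: the variance for k = 1, 2 and 5 replies.
for k in (1, 2, 5):
    print(k, kuk_variance(0.3, 100, 0.7, 0.2, k))
```

For these inputs the variance decreases as k grows, since the randomization noise is averaged over more replies while the sampling term π(1 − π)/n remains.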

Many suggested randomized response models, such as those proposed by Warner (1965) and Mangat and Singh (1990), can be viewed as special cases of the Kuk (1990) model. While each of these models improved upon its predecessors, a more recent paper has proposed a method that has been shown to be more efficient than all of them. The Odumade and Singh (2009) randomized response model is shown to be more efficient than the Warner (1965), Mangat and Singh (1990), and Kuk (1990) models. Naturally, this means that the Odumade and Singh (2009) estimator is the one we wish to modify. The Odumade and Singh (2009) model consists of two decks of cards, as shown in Fig. 1.

In this model, each respondent selected in the sample is presented with an ordered pair of decks of cards, (Deck-I, Deck-II). The respondent matches his/her status with the outcomes from the two decks and replies either (yes, yes), (yes, no), (no, yes), or (no, no). These four types of responses can be observed from both types of respondents, those belonging to group A and those belonging to A^c. This randomized response model produces the following probabilities:

$$ P(\text {yes, yes} ) = \theta_{11} =(P+T-1) \pi + (1-P)(1-T) $$

(2.13)

$$ P(\text {yes, no} ) = \theta_{10} =(P-T) \pi + T(1-P) $$

(2.14)

$$ P(\text {no, yes} ) = \theta_{01} =(T-P) \pi + P(1-T) $$

(2.15)

and

$$ P(\text {no, no} ) = \theta_{00} =(1-P-T) \pi + PT $$

(2.16)
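The four response probabilities in Eqs. 2.13 to 2.16 can be collected in one helper, as in the Python sketch below. The function name and the dictionary layout are assumptions for illustration; the values P = 0.316 and T = 0.845 are the ones used later in the simulation study, and π = 0.3 is an arbitrary choice.

```python
def os_cell_probs(pi, P, T):
    """Probabilities of the four ordered replies, Eqs. 2.13-2.16.

    Keys are (first reply, second reply) with 1 = yes and 0 = no.
    """
    return {
        (1, 1): (P + T - 1) * pi + (1 - P) * (1 - T),
        (1, 0): (P - T) * pi + T * (1 - P),
        (0, 1): (T - P) * pi + P * (1 - T),
        (0, 0): (1 - P - T) * pi + P * T,
    }


probs = os_cell_probs(0.3, 0.316, 0.845)
for reply, prob in probs.items():
    print(reply, prob)
print("total:", sum(probs.values()))
```

The four probabilities sum to one for any π, P and T, because the coefficients of π cancel and the constant terms add up to 1.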

As before, we let π represent the true population proportion of individuals belonging to group A and consider n respondents, with n₁₁, n₁₀, n₀₁, and n₀₀ being the numbers of (yes, yes), (yes, no), (no, yes) and (no, no) replies, respectively, received by the interviewer. The respondents are selected from the population by SRSWR. Odumade and Singh (2009) considered the minimization of a distance function defined as

$$ D = \frac{1}{2} \sum\limits^{1}_{i=0} \sum\limits^{1}_{j=0} (\theta_{ij} -\hat{\theta}_{ij})^{2} $$

(2.17)

where $\hat {\theta }_{ij} = n_{ij}/n$, i = 0,1; j = 0,1, are the observed proportions of (yes, yes), (yes, no), (no, yes) and (no, no) replies.

Then, the estimator of true proportion derived from the Odumade and Singh (2009) model is given as

$$ \hat{\pi}_{os} = \frac{(P+T-1)(\hat{\theta}_{11} -\hat{\theta}_{00}) +(P-T)(\hat{\theta}_{10}-\hat{\theta}_{01}) }{2[(P+T-1)^{2}+(P-T)^{2}]} + \frac{1}{2} $$

(2.18)

The estimator $\hat {\pi }_{os}$ of Odumade and Singh (2009) is unbiased and has the variance:

$$ V(\hat{\pi}_{os}) =\frac{(P+T-1)^{2}[PT+(1-P)(1-T)]+(P-T)^{2}[T(1-P)+P(1-T)]}{4n[(P+T-1)^{2}+(P-T)^{2}]^{2}}-\frac{(2\pi-1)^{2}}{4n} $$

(2.19)

An unbiased estimator of the $V(\hat {\pi }_{os})$ is given by

$$ \hat{v}(\hat{\pi}_{os}) =\frac{(P+T-1)^{2}[PT+(1-P)(1-T)]+(P-T)^{2}[T(1-P)+P(1-T)]}{4(n-1)[(P+T-1)^{2}+(P-T)^{2}]^{2}}-\frac{(2 \hat{\pi}_{os}-1)^{2}}{4(n-1)} $$

(2.20)
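The estimator in Eq. 2.18 and its variance in Eq. 2.19 can be sketched in Python as follows. The function names and the values π = 0.3, P = 0.7, T = 0.2, n = 50 are illustrative assumptions. The check feeds the true cell probabilities of Eqs. 2.13 to 2.16 into the estimator, which should then return π.

```python
def os_estimate(th11, th10, th01, th00, P, T):
    """Odumade and Singh (2009) estimator, Eq. 2.18, from the four observed proportions."""
    a, b = P + T - 1.0, P - T
    return (a * (th11 - th00) + b * (th10 - th01)) / (2.0 * (a * a + b * b)) + 0.5


def os_variance(pi, n, P, T):
    """Variance of the estimator, Eq. 2.19."""
    a2, b2 = (P + T - 1.0) ** 2, (P - T) ** 2
    num = a2 * (P * T + (1 - P) * (1 - T)) + b2 * (T * (1 - P) + P * (1 - T))
    return num / (4.0 * n * (a2 + b2) ** 2) - (2.0 * pi - 1.0) ** 2 / (4.0 * n)


# Illustrative values only.
pi, P, T = 0.3, 0.7, 0.2
th11 = (P + T - 1) * pi + (1 - P) * (1 - T)
th10 = (P - T) * pi + T * (1 - P)
th01 = (T - P) * pi + P * (1 - T)
th00 = (1 - P - T) * pi + P * T
print(os_estimate(th11, th10, th01, th00, P, T))
print(os_variance(pi, 50, P, T))
```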

In the next section, we consider alternatives to the squared distance function for analyzing the data collected by the two-deck method.

3 New Randomized Response Estimators Based Upon a Sum of Special Products Technique and Upon the Method of Solving for a Matrix Determinant

We became curious as to what types of methods we could use to produce a new and more efficient randomized response estimator, and in particular how best to utilize the data at hand. The method developed here leads to an unbiased and efficient estimator of the population proportion without any additional cost (or loss of protection), and we expect it to challenge survey statisticians to think more along these lines.

Now, instead of minimizing the distance function as Odumade and Singh (2009) did, we attempted two new methods. The first idea is to optimize the Sum of Special Products (SSP) given by

$$ SSP = (\theta_{11}-\hat{\theta}_{11})(\theta_{10}-\hat{\theta}_{10}) + (\theta_{01}-\hat{\theta}_{01})(\theta_{00}-\hat{\theta}_{00}) $$

(3.1)

We named it SSP because the first product is obtained by fixing the first response as “Yes” and the second product is obtained by fixing the first response as “No”. No doubt there are many other possibilities, but this SSP leads to a useful estimator and also opens a window for future research. Now we have the following theorems:

Theorem 3.1.

The estimator given below, which optimizes the SSP, is unique and unbiased.

$$ \hat{\pi}_{SSP} =\frac{(P+T-1)(\hat{\theta}_{10} -\hat{\theta}_{01}) + (P-T) (\hat{\theta}_{11} - \hat{\theta}_{00}) }{4(P+T-1)(P-T)}+\frac{1}{2}, \quad P \neq T $$

(3.2)

Proof.

See Online Supplementary Material in Appendix-A. □

Theorem 3.2.

The variance of the estimator $\hat {\pi }_{SSP}$ is given by:

$$ V(\hat{\pi}_{SSP}) =\frac{(P+T-1)^{2}[P(1-T)+T(1-P)]+(P-T)^{2}[(1-P)(1-T)+PT]}{16n(P+T-1)^{2}(P-T)^{2}}-\frac{(2 \pi-1)^{2}}{4n} $$

(3.3)

Proof.

See Online Supplementary Material in Appendix-A.□

Theorem 3.3.

An unbiased estimator of $V(\hat {\pi }_{SSP})$ is given by:

$$ \hat{v}(\hat{\pi}_{SSP}) =\frac{(P+T-1)^{2}[P(1-T)+T(1-P)]+(P-T)^{2}[(1-P)(1-T)+PT]}{16(n-1)(P+T-1)^{2}(P-T)^{2}}-\frac{(2 \hat{\pi}_{SSP}-1)^{2}}{4(n-1)} $$

(3.4)

Proof.

Following Odumade and Singh (2009), it is easy to verify $E[\hat {v}(\hat {\pi }_{SSP})]=V(\hat {\pi }_{SSP})$, which proves the theorem. □
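Theorems 3.1 and 3.2 can be checked exactly for a single respondent (n = 1) by enumerating the four possible replies. The Python sketch below does this; the function names and the values π = 0.3, P = 0.7, T = 0.2 are assumptions for illustration, and the enumeration is a check added here rather than the proof in the supplementary material.

```python
def ssp_estimate(th11, th10, th01, th00, P, T):
    """SSP estimator, Eq. 3.2; needs P != T and P + T != 1."""
    a, b = P + T - 1.0, P - T
    if a == 0 or b == 0:
        raise ValueError("Eq. 3.2 is undefined when P = T or P + T = 1")
    return (a * (th10 - th01) + b * (th11 - th00)) / (4.0 * a * b) + 0.5


def ssp_variance(pi, n, P, T):
    """Variance of the SSP estimator, Eq. 3.3."""
    a2, b2 = (P + T - 1.0) ** 2, (P - T) ** 2
    num = a2 * (P * (1 - T) + T * (1 - P)) + b2 * ((1 - P) * (1 - T) + P * T)
    return num / (16.0 * n * a2 * b2) - (2.0 * pi - 1.0) ** 2 / (4.0 * n)


def exact_moments_n1(pi, P, T):
    """Exact mean and variance of the estimator when n = 1."""
    probs = [
        (P + T - 1) * pi + (1 - P) * (1 - T),  # (yes, yes), Eq. 2.13
        (P - T) * pi + T * (1 - P),            # (yes, no),  Eq. 2.14
        (T - P) * pi + P * (1 - T),            # (no, yes),  Eq. 2.15
        (1 - P - T) * pi + P * T,              # (no, no),   Eq. 2.16
    ]
    # With one respondent, exactly one observed proportion equals 1.
    replies = [(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)]
    values = [ssp_estimate(*r, P, T) for r in replies]
    mean = sum(p * v for p, v in zip(probs, values))
    var = sum(p * (v - mean) ** 2 for p, v in zip(probs, values))
    return mean, var


print(exact_moments_n1(0.3, 0.7, 0.2))
print(ssp_variance(0.3, 1, 0.7, 0.2))
```

The enumerated mean should equal π and the enumerated variance should agree with Eq. 3.3 at n = 1.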

The second method for creating an estimator from the data generated by the Odumade and Singh (2009) device takes inspiration from the computation of the determinant of a 2 × 2 matrix. Consider the square matrix of differences shown in Fig. 2.

Theorem 3.4.

The estimator below, which optimizes the determinant of the true differences, is unique and unbiased.

$$ \hat{\pi}_{DET} =\frac{(P+T-1)(\hat{\theta}_{00} -\hat{\theta}_{11}) + (P-T) (\hat{\theta}_{10} - \hat{\theta}_{01}) }{2[(P-T)^{2} -(P+T-1)^{2}]}+\frac{1}{2} $$

(3.5)

Proof.

See Online Supplementary Material in Appendix-A.□

Theorem 3.5.

The variance of the estimator $\hat {\pi }_{DET}$ is given by

$$ \begin{array}{@{}rcl@{}} V(\hat{\pi}_{DET}) &=&\frac{(P+T-1)^{2}[(1-P)(1-T)+PT]+(P-T)^{2}[T(1-P)+P(1-T)]}{4n[(P-T)^{2}-(P+T-1)^{2}]^{2}} \\ &&-\frac{(2\pi-1)^{2}}{4n} \end{array} $$

(3.6)

Proof.

See Online Supplementary Material in Appendix-A. □

Theorem 3.6.

An unbiased estimator of $V(\hat {\pi }_{DET})$ is given by

$$ \begin{array}{@{}rcl@{}} \hat{v}(\hat{\pi}_{DET}) \!&=&\!\frac{(P + T - 1)^{2}[(1 - P)(1 - T) + PT]+(P - T)^{2}[T(1 - P) + P(1 - T)]}{4(n-1)[(P-T)^{2}-(P+T-1)^{2}]^{2}} \\ &&-\frac{(2\hat{\pi}_{DET}-1)^{2}}{4(n-1)} \end{array} $$

(3.7)

Proof.

It is easy to verify $E[\hat {v}(\hat {\pi }_{DET})]=V(\hat {\pi }_{DET})$, which proves the theorem. □
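A Python sketch of Eqs. 3.5 and 3.6 is given below. The function names and numerical values are assumptions for illustration. The sketch also uses the algebraic identity (P − T)² − (P + T − 1)² = (2P − 1)(1 − 2T), a difference of squares not stated in the paper, which shows that the denominator of Eq. 3.5 vanishes when P = 0.5 or T = 0.5.

```python
def det_denominator(P, T):
    """(P - T)^2 - (P + T - 1)^2, the factor in the denominator of Eq. 3.5."""
    return (P - T) ** 2 - (P + T - 1.0) ** 2


def det_estimate(th11, th10, th01, th00, P, T):
    """Determinant-type estimator, Eq. 3.5."""
    a, b = P + T - 1.0, P - T
    denom = det_denominator(P, T)
    if denom == 0:
        raise ValueError("Eq. 3.5 is undefined when P = 0.5 or T = 0.5")
    return (a * (th00 - th11) + b * (th10 - th01)) / (2.0 * denom) + 0.5


def det_variance(pi, n, P, T):
    """Variance of the determinant-type estimator, Eq. 3.6."""
    a2, b2 = (P + T - 1.0) ** 2, (P - T) ** 2
    num = a2 * ((1 - P) * (1 - T) + P * T) + b2 * (T * (1 - P) + P * (1 - T))
    return num / (4.0 * n * det_denominator(P, T) ** 2) - (2.0 * pi - 1.0) ** 2 / (4.0 * n)


# Illustrative values only: true cell probabilities from Eqs. 2.13-2.16.
pi, P, T = 0.3, 0.7, 0.2
th11 = (P + T - 1) * pi + (1 - P) * (1 - T)
th10 = (P - T) * pi + T * (1 - P)
th01 = (T - P) * pi + P * (1 - T)
th00 = (1 - P - T) * pi + P * T
print(det_estimate(th11, th10, th01, th00, P, T))
print(det_denominator(P, T), (2 * P - 1) * (1 - 2 * T))
print(det_variance(pi, 50, P, T))
```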

In the next section, we propose a new unbiased regression type estimator that makes use of estimators obtained from optimizing the SSP approach and Determinant (DET) approach along with the Odumade and Singh (2009) estimator.

4 A New Unbiased Regression Type Estimator

Unfortunately, when checking the efficiency of the randomized response estimators derived in Section 3, we found that the new estimators, $\hat {\pi }_{SSP}$ and $\hat {\pi }_{DET}$, did not improve upon the Odumade and Singh (2009) estimator, $\hat {\pi }_{os}$, in terms of relative efficiency. However, we considered the idea of combining the SSP and DET type estimators derived previously to obtain a regression type estimator.

Theorem 4.1.

An unbiased regression type estimator of the true proportion π of individuals belonging to a sensitive group A is given by:

$$ \hat{\pi}_{reg} = \hat{\pi}_{os} + \beta (\hat{\pi}_{SSP} - \hat{\pi}_{DET} ) $$

(4.1)

where β is a constant to be derived.

Proof.

The estimator is unbiased whatever the choice of β since $\hat {\pi }_{os}$, $\hat {\pi }_{SSP}$ and $\hat {\pi }_{DET}$ are all unbiased. The optimum value of β is free from the value of π. See Online Supplementary Material in Appendix-A. □

Theorem 4.2.

The minimum variance of $\hat {\pi }_{reg}$ is given by:

$$ \begin{array}{@{}rcl@{}} min. V(\hat{\pi}_{reg}) = V(\hat{\pi}_{os})- \frac{[Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET})]^{2}}{V(\hat{\pi}_{SSP}-\hat{\pi}_{DET})} \end{array} $$

(4.2)

where

$$ \begin{array}{@{}rcl@{}} Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET}) =-\frac{1}{8n} \end{array} $$

(4.3)

and

$$ \begin{array}{@{}rcl@{}} V(\hat{\pi}_{SSP}-\hat{\pi}_{DET}) &=&\frac{ (P+T-1)^{2} \{ P(1-T)+T(1-P) \} +(P-T)^{2}\{ (1-P)(1-T)+PT \} }{ 16n(P+T-1)^{2}(P-T)^{2} } \\ &+&\frac{(P+T-1)^{2}\{(1-P)(1-T)+PT\} +(P-T)^{2}\{T(1-P)+P(1-T)\}}{4n\{(P-T)^{2}-(P+T-1)^{2} \}^{2}}\\ &&-\frac{1}{4n} \end{array} $$

(4.4)

which are free from the value of π.

Proof.

See Online Supplementary Material in Appendix-A. □
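The form of Eq. 4.2 follows from the usual minimization of a quadratic in β. As a sketch of that step (the shorthand d for the difference of the two new estimators is introduced here for brevity and is not the paper's notation), write

$$ d = \hat{\pi}_{SSP}-\hat{\pi}_{DET}, \qquad V(\hat{\pi}_{reg}) = V(\hat{\pi}_{os}) + \beta^{2} V(d) + 2\beta\, Cov(\hat{\pi}_{os}, d) $$

Setting the derivative with respect to β equal to zero gives

$$ \beta = -\frac{Cov(\hat{\pi}_{os}, d)}{V(d)} = \frac{1}{8n V(d)} $$

where the last equality uses Eq. 4.3. Substituting this value back into the quadratic gives the minimum variance in Eq. 4.2.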

Theorem 4.3.

The optimum value of β which minimizes the variance of $\hat {\pi }_{reg}$ in (4.1) can be written as

$$ \begin{array}{@{}rcl@{}} \beta =-\frac{2(2T-1)^{2}(2P-1)^{2}(P+T-1)^{2}(P-T)^{2}}{[8PT(1-P-T+PT)-P(1-P)-T(1-T)][2P(1-P)+2T(1-T)-1]^{2}} \end{array} $$

(4.5)

which is again free from the value of π.

Proof.

See Online Supplementary Material in Appendix-A. □
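The closed form in Eq. 4.5 can be compared numerically with the ratio −Cov/V built from Eqs. 4.3 and 4.4. The Python sketch below does so; it assumes that the optimum β is the standard minimizer −Cov/V of the variance of Eq. 4.1, and the function names and the test values P = 0.7, T = 0.2 are illustrative.

```python
def beta_closed_form(P, T):
    """Optimum beta from Eq. 4.5."""
    num = 2.0 * (2 * T - 1) ** 2 * (2 * P - 1) ** 2 * (P + T - 1) ** 2 * (P - T) ** 2
    den = (8 * P * T * (1 - P - T + P * T) - P * (1 - P) - T * (1 - T)) * (
        2 * P * (1 - P) + 2 * T * (1 - T) - 1
    ) ** 2
    return -num / den


def n_var_difference(P, T):
    """n * V(pi_SSP - pi_DET) from Eq. 4.4; needs P != T, P + T != 1, P, T != 0.5."""
    a2, b2 = (P + T - 1.0) ** 2, (P - T) ** 2
    s_same = P * T + (1 - P) * (1 - T)   # theta_11 + theta_00
    s_diff = P * (1 - T) + T * (1 - P)   # theta_10 + theta_01
    return (
        (a2 * s_diff + b2 * s_same) / (16.0 * a2 * b2)
        + (a2 * s_same + b2 * s_diff) / (4.0 * (b2 - a2) ** 2)
        - 0.25
    )


def beta_from_moments(P, T):
    """-Cov / V with n * Cov = -1/8 (Eq. 4.3); the sample size n cancels."""
    return (1.0 / 8.0) / n_var_difference(P, T)


print(beta_closed_form(0.7, 0.2))
print(beta_from_moments(0.7, 0.2))
```

For these inputs the two routes agree, and neither depends on π or on n.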

Theorem 4.4.

An unbiased estimator of $V(\hat {\pi }_{reg})$ is given by

$$ \hat{v}(\hat{\pi}_{reg}) =\hat{v}(\hat{\pi}_{os}) -\frac{\{ Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET} ) \}^{2}}{ V(\hat{\pi}_{SSP}-\hat{\pi}_{DET} ) } $$

(4.6)

where $\hat {v}(\hat {\pi }_{os})$ is given in (2.20).

Proof.

Trivial, because $Cov(\hat {\pi }_{os}, \hat {\pi }_{SSP}-\hat {\pi }_{DET} )$ in (4.3) and $V(\hat {\pi }_{SSP}-\hat {\pi }_{DET} )$ in (4.4) are free from the value of π.□

5 Efficiency Comparisons

In order to show that our new estimator is better than the randomized response estimators derived for two trials by Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009), we must compute the relative efficiencies. With respect to Warner (1965), the relative efficiency criteria for the two cases considered are given by

$$ RE(w)_{P_{0}} = \frac{V(\hat{\pi}_{w(P_{0})})}{V(\hat{\pi}_{reg})}\times 100\% $$

(5.1)

$$ RE(w)_{PT} = \frac{V(\hat{\pi}_{w(PT)})}{V(\hat{\pi}_{reg})}\times 100\% $$

(5.2)

Similarly, the relative efficiency criteria with respect to Mangat and Singh (1990), Kuk (1990) and Odumade and Singh (2009) are given by

$$ RE(ms) = \frac{V(\hat{\pi}_{ms})}{V(\hat{\pi}_{reg})}\times 100\% $$

(5.3)

$$ RE(kuk) = \frac{V(\hat{\pi}_{kuk})}{V(\hat{\pi}_{reg})}\times 100\% $$

(5.4)

and

$$ RE(os) = \frac{V(\hat{\pi}_{os})}{V(\hat{\pi}_{reg})}\times 100\% $$

(5.5)

We ran SAS code (given in Arias (2019)) to compare the efficiency of the proposed model with that of the Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009) models. For the Warner (1965) model with two trials per respondent in Eq. 2.2 and for the Mangat and Singh (1990) model, we set P₀ = P and T₀ = T. For Kuk (1990), we set 𝜃₁ = P, 𝜃₂ = T, and k = 2. For Odumade and Singh (2009), Warner (1965) with two trials in Eq. 2.8, and the suggested model, we allowed P to range from 0.55 to 0.80 and T to range from 0.10 to 0.25, both with a step of 0.05.
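The comparison with Odumade and Singh (2009) in Eq. 5.5 can be sketched in Python over the same grid of P and T. This is not the authors' SAS code; the function names are assumptions. The sketch takes the reduction term of Eq. 4.2 to be β/(8n), which follows from Eq. 4.3 if β is the standard minimizer −Cov/V, so the sample size n cancels in the ratio. On this grid P + T = 1 at two points (for example P = 0.80, T = 0.20), where Eq. 3.2 is undefined; Eq. 4.5 gives β = 0 there, so the sketch returns 100% at those points.

```python
def n_os_variance(pi, P, T):
    """n * V(pi_os) from Eq. 2.19."""
    a2, b2 = (P + T - 1.0) ** 2, (P - T) ** 2
    num = a2 * (P * T + (1 - P) * (1 - T)) + b2 * (T * (1 - P) + P * (1 - T))
    return num / (4.0 * (a2 + b2) ** 2) - (2.0 * pi - 1.0) ** 2 / 4.0


def beta_opt(P, T):
    """Optimum beta from Eq. 4.5."""
    num = 2.0 * (2 * T - 1) ** 2 * (2 * P - 1) ** 2 * (P + T - 1) ** 2 * (P - T) ** 2
    den = (8 * P * T * (1 - P - T + P * T) - P * (1 - P) - T * (1 - T)) * (
        2 * P * (1 - P) + 2 * T * (1 - T) - 1
    ) ** 2
    return -num / den


def re_os(pi, P, T):
    """RE(os) of Eq. 5.5 in percent, using n * min V(reg) = n * V(os) - beta / 8."""
    v_os = n_os_variance(pi, P, T)
    v_reg = v_os - beta_opt(P, T) / 8.0
    return 100.0 * v_os / v_reg


# Grid described in the text: 6 values of P times 4 values of T = 24 combinations.
P_GRID = [round(0.55 + 0.05 * i, 2) for i in range(6)]
T_GRID = [round(0.10 + 0.05 * j, 2) for j in range(4)]

for pi in (0.05, 0.25, 0.50):
    res = [re_os(pi, P, T) for P in P_GRID for T in T_GRID]
    print(pi, len(res), min(res), sum(res) / len(res), max(res))
```

Each printed row gives π, the number of (P, T) combinations, and the minimum, mean and maximum of RE(os) over the grid.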

Tables 2, 3 and 4 display summaries of the results, while the full results can be obtained by executing the SAS code. For each value of π, we found the mean, standard deviation, maximum, and minimum of the relative efficiencies over the various choices of P and T.

Table 2 Summarized results of relative efficiencies $RE(w)_{P_{0}}$ and RE(w)_PT

Table 3 Summarized results of relative efficiencies RE(ms) and RE(kuk)

Table 4 Summarized results of relative efficiencies RE(os)

In Tables 2 to 4, freq stands for the number of times, out of the 24 possible combinations of P and T, that the proposed estimator is more efficient than all the competitors considered. As one can see from Tables 2 to 4, the suggested model is more efficient than the competing models. As π ranges over 0.05 ≤ π ≤ 0.50 with a step of 0.05, the proposed estimator performs better than the models proposed by Warner (1965) for both cases, Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009). To visualize this, Figs. 3 and 4 plot the relative efficiencies against the unknown proportion π.

From Figs. 3 and 4, one can see that the suggested model performs better than each of the randomized response models of Warner (1965) for two trials, Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009). In each case, the relative efficiency with respect to the other model remains above 100%. These graphs and the summary of results all show that the suggested estimator can perform much better than the other estimators provided by Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009).

6 Simulation Study

A simulation study, designed to resemble real survey data, was conducted using SAS. We wanted to see how the proposed estimator would perform against the Odumade and Singh (2009) estimator in a real survey. For 0.05 ≤ π ≤ 0.50, with P = 0.316 and T = 0.845, we first determined the true probabilities 𝜃₁₁, 𝜃₁₀, 𝜃₀₁, and 𝜃₀₀. Then, in SAS, we generated samples of n = 50 replies using the function RandMultinomial(ntrials,ns,prob), where ntrials represents the number of trials in each simulation, ns is the sample size and prob represents the probabilities. We used NITR = 10,000, meaning that each case was iterated 10,000 times. We then computed the simulated variances of $\hat {\pi }_{os}$ and $\hat {\pi }_{reg}$ as follows

$$ V_{sim} (\hat{\pi}_{os}) = \frac{1}{NITR} \sum\limits_{i=1}^{NITR} (\hat{\pi}_{os(i)} - \pi )^{2} $$

(6.1)

and

$$ V_{sim} (\hat{\pi}_{reg}) = \frac{1}{NITR} \sum\limits_{i=1}^{NITR} (\hat{\pi}_{reg(i)} - \pi )^{2} $$

(6.2)

The relative efficiency can be determined from Eqs. 6.1 and 6.2 as follows

$$ RE_{sim} = \frac{V_{sim}(\hat{\pi}_{os})}{V_{sim}(\hat{\pi}_{reg})}\times 100\% $$

(6.3)

where RE_sim is the simulated relative efficiency of $\hat {\pi }_{reg}$ with respect to $\hat {\pi }_{os}$. A total of 10 simulations were run, one for each value of π in 0.05 ≤ π ≤ 0.50 with a step of 0.05. Each case consisted of 10,000 trials, and each trial used a sample of 50 individuals. For each case we computed the relative efficiency of the suggested estimator with respect to the Odumade and Singh (2009) estimator. The results are given in Table 5.
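The simulation design described above can be imitated with Python's standard library, as in the sketch below. It is not the authors' SAS program: `random.choices` stands in for RandMultinomial, the function names and the seed are assumptions, and the resulting numbers will differ from those in Table 5 because the random streams differ.

```python
import random


def cell_probs(pi, P, T):
    """Probabilities of (yes,yes), (yes,no), (no,yes), (no,no), Eqs. 2.13-2.16."""
    return [
        (P + T - 1) * pi + (1 - P) * (1 - T),
        (P - T) * pi + T * (1 - P),
        (T - P) * pi + P * (1 - T),
        (1 - P - T) * pi + P * T,
    ]


def beta_opt(P, T):
    """Optimum beta from Eq. 4.5."""
    num = 2.0 * (2 * T - 1) ** 2 * (2 * P - 1) ** 2 * (P + T - 1) ** 2 * (P - T) ** 2
    den = (8 * P * T * (1 - P - T + P * T) - P * (1 - P) - T * (1 - T)) * (
        2 * P * (1 - P) + 2 * T * (1 - T) - 1
    ) ** 2
    return -num / den


def simulate_re(pi, P=0.316, T=0.845, n=50, nitr=10_000, seed=1):
    """Simulated relative efficiency RE_sim of Eq. 6.3, in percent."""
    rng = random.Random(seed)
    probs = cell_probs(pi, P, T)
    a, b = P + T - 1.0, P - T
    beta = beta_opt(P, T)
    sse_os = sse_reg = 0.0
    for _ in range(nitr):
        # One sample of n ordered replies, coded 0..3 in the order of cell_probs.
        draws = rng.choices(range(4), weights=probs, k=n)
        th11, th10, th01, th00 = (draws.count(c) / n for c in range(4))
        d1, d2 = th11 - th00, th10 - th01
        pi_os = (a * d1 + b * d2) / (2.0 * (a * a + b * b)) + 0.5     # Eq. 2.18
        pi_ssp = (a * d2 + b * d1) / (4.0 * a * b) + 0.5              # Eq. 3.2
        pi_det = (-a * d1 + b * d2) / (2.0 * (b * b - a * a)) + 0.5   # Eq. 3.5
        pi_reg = pi_os + beta * (pi_ssp - pi_det)                     # Eq. 4.1
        sse_os += (pi_os - pi) ** 2                                   # Eq. 6.1 (times NITR)
        sse_reg += (pi_reg - pi) ** 2                                 # Eq. 6.2 (times NITR)
    return 100.0 * sse_os / sse_reg


# Single illustrative case; looping pi over 0.05, 0.10, ..., 0.50 mirrors the study.
print(simulate_re(0.30))
```

Because both estimators are computed from the same simulated samples, their simulated variances are strongly correlated, which keeps the Monte Carlo noise in the ratio small relative to the noise in either variance alone.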

Table 5 Summarized results of relative efficiencies RE_sim in Eq. 6.3

From Table 5, the results of the simulation studies run in SAS show that the suggested estimator performs better than the Odumade and Singh (2009) estimator. The added β coefficient makes the suggested estimator more efficient, provided the optimum choice of β that minimizes the variance of the estimator is used. The original version of this work can be found in Arias (2019), and it suggests that there is potential for further research.

7 Conclusion

We knew that the model produced by Odumade and Singh (2009) could perform better than those of Warner (1965), Mangat and Singh (1990), and Kuk (1990). The natural question was then how to beat the Odumade and Singh (2009) model. On their own, the new estimators obtained from the SSP and determinant type techniques did not improve on it. Nevertheless, it became clear to us that a regression type estimator with an optimum value for β would do so. As one can see from the proofs, tables, and figures, the idea of utilizing a regression type estimator was effective: the suggested estimator is more efficient than all the other estimators we compared against. Naturally, additional research is required to see whether estimators in the field of randomized response sampling can be improved further.

References

Arias, R. (2019). New methods for efficient results using randomized response sampling. Unpublished M.Sc. thesis submitted to the Department of Mathematics, Texas A&M University-Kingsville.
Chaudhuri, A. (2011). Randomized response and indirect questioning technique in surveys. CRC Press, Boca Raton.
Chaudhuri, A. and Christofides, T.C. (2013). Indirect questioning in sample surveys. Springer Science & Business Media, Berlin.
Chaudhuri, A., Christofides, T.C. and Rao, C.R. (2016). Data gathering, analysis and protection of privacy through randomized response techniques: Qualitative and quantitative human traits, 34. Elsevier, North-Holland.
Fox, J.A. (2016). Randomized response and related methods, 2nd edn. SAGE, Los Angeles.
Fox, J.A. and Tracy, P.E. (1986). Randomized response: A method for sensitive surveys. SAGE, Los Angeles.
Kuk, A.Y.C. (1990). Asking sensitive questions indirectly. Biometrika 77, 2, 436–438.
Mangat, N.S. and Singh, R. (1990). An alternative randomized response procedure. Biometrika 77, 2, 439–442.
Odumade, O. and Singh, S. (2009). Efficient use of two decks of cards in randomized response sampling. Commun. Stat.-Theory Methods 38, 439–446.
Warner, S.L. (1965). Randomized response: a survey technique for eliminating evasive answer bias. J. Amer. Statist. Assoc. 60, 63–69.
Warner, S.L. (1986). The omitted digit randomized response model for telephone applications. In Proceedings Survey Res. Meth. Sect., Am. Statist. Assoc., pp. 441–443.

Acknowledgments

The authors are thankful to the Editor-in-Chief Dr. Dipak K. Dey, an Associate Editor, a referee and Editorial Assistant: Mr. Sarvagnan Subramanian for their comments and help on the original version of this manuscript.

Author information

Authors and Affiliations

Department of Mathematics, Texas A&M University-Kingsville, Kingsville, TX, USA
Roberto Arias, Stephen A. Sedory & Sarjinder Singh

Corresponding author

Correspondence to Sarjinder Singh.

Electronic supplementary material

The electronic supplementary material is available with the online version of this article (PDF, 296 KB).

About this article

Cite this article

Arias, R., Sedory, S.A. & Singh, S. An Unbiased Regression Type Estimator In Randomized Response Sampling. Sankhya B 84, 243–258 (2022). https://doi.org/10.1007/s13571-021-00256-z

Received: 29 July 2020
Accepted: 09 February 2021
Published: 26 May 2021
Issue Date: May 2022
DOI: https://doi.org/10.1007/s13571-021-00256-z
