1 Introduction

Statisticians have long used randomized response techniques to estimate the proportion of individuals who belong to a group defined by a sensitive characteristic. Surveys on sensitive questions tend to produce inaccurate results because respondents may not trust the interviewer enough to answer truthfully. Randomized response sampling was created to address this problem. Pioneered by Warner (1965), randomized response techniques let the interviewer ask sensitive questions in such a way that the replies do not reveal the respondent's true status. These methods have allowed many researchers, including social scientists, to conduct surveys on sensitive subjects and obtain more accurate and efficient results. Since Warner (1965) first proposed his method, many other statisticians have improved upon it; these developments are covered in the recent monographs by Fox (2016), Chaudhuri et al. (2016), Chaudhuri and Christofides (2013), and Chaudhuri (2011).

In the next section, we will discuss the Warner (1965), Mangat and Singh (1990), Kuk (1990) and Odumade and Singh (2009) models.

2 Background

In brief, survey sampling statisticians have long dealt with the difficulties of estimating the true population proportion of those individuals belonging to a group defined by a sensitive characteristic. One popular solution, first proposed by Warner (1965), is the implementation of a randomization device which protects the privacy of those individuals being surveyed. The respondent is instructed to use, in private, a randomization device such as a deck of cards. The respondent selects a card from the deck. Each card in this deck carries either the statement “I belong to group A” or the statement “I do not belong to group A”, with proportions P0 and (1 − P0), respectively. After selecting a card, the respondent reads it privately and tells the interviewer only ‘yes’ or ‘no’, according to whether the statement on the drawn card matches his/her status. By letting π represent the true population proportion of those individuals belonging to group A, the Warner (1965) model gives the estimator in Eq. 2.1 of the true proportion π, for a given P0, as:

$$ \hat{\pi}_{w(P_{0})}= \frac{\hat{\theta}_{w}-(1-P_{0})}{2P_{0}-1}, P_{0} \neq 0.5 $$
(2.1)

where \(\hat {\theta }_{w}=n_{w}/n\) is the observed proportion of ‘yes’ replies out of n respondents selected from the population by simple random sampling with replacement (SRSWR), and nw is the observed number of ‘yes’ replies received by the interviewer. The above estimator is unbiased and, for two trials per respondent and a given P0, has the variance in Eq. 2.2:

$$ V(\hat{\pi}_{w(P_{0})})= \frac{\pi(1-\pi)}{n} + \frac{P_{0}(1-P_{0})}{2n(2P_{0}-1)^{2}} $$
(2.2)

Since the Warner (1965) randomized response model only requires the respondent to deal with a single randomization device, such as a deck of cards, we also consider the case where the Warner (1965) model is performed twice, independently. We do this because the models proposed after Warner (1965), which are discussed below, make use of two randomization devices, such as two decks of cards. Models that use a second device gain efficiency and also improve protection for the participating respondents. Thus, while still using Warner’s (1965) model, consider the case in which the interviewer receives 0, 1, or 2 ‘yes’ replies based on two independent randomizing devices with parameters P and T. Letting π represent the true population proportion of people belonging to group A, the probability mass function (p.m.f) of the i-th reply Zi is obtained as in Table 1:

Table 1 Probability mass function (p.m.f) of Zi

From this, the expected value of Zi is given as

$$ \begin{array}{@{}rcl@{}} E(Z_{i}) &=&(0) [ \pi (1-P)(1-T) +(1-\pi) PT] \\&&+(1)[\pi \{ P(1-T) +T(1-P) \} +(1-\pi) \{ P(1-T) +T(1-P) \} ] \\&&+(2) [\pi PT +(1-\pi) (1-P)(1-T) ]\\& =&2(P+T-1) \pi + P(1-T) + T(1-P) +2(1-P)(1-T) \end{array} $$
(2.3)

The variance of the response Zi, V (Zi), is given by

$$ V(Z_{i})= E({Z_{i}^{2}}) - (E(Z_{i}))^{2} $$
(2.4)

Now we have

$$ \begin{array}{@{}rcl@{}} E({Z_{i}^{2}}) &{}={}&(0)^{2} [ \pi (1-P)(1-T) +(1-\pi) PT] \\ &&+(1)^{2}[\pi \{ P(1-T) +T(1-P) \}{} +{}(1-\pi) \{ P(1-T) {}+{}T(1-P) \} ]\\ &&+(2)^{2} [\pi PT +(1-\pi) (1-P)(1-T) ] \end{array} $$
(2.5)

By plugging Eqs. 2.3 and 2.5 into Eq. 2.4, the variance of Zi is given as

$$ V(Z_{i})= 4(P+T-1)^{2} \pi (1-\pi) +P(1-P) + T(1-T) $$
(2.6)

From Eq. 2.3, an unbiased estimator of π is given by

$$ \hat{\pi}_{w(PT)} = \frac{\frac{1}{n} {\sum}_{i=1}^{n} Z_{i} -[P(1-T)+T(1-P)+2(1-P)(1-T)]}{2(P+T-1)} $$
(2.7)

The variance of the Warner (1965) estimator \(\hat {\pi }_{w(PT)}\) for two trials per respondent with two independent devices with parameters P and T is given as

$$ V(\hat{\pi}_{w(PT)}) = \frac{\pi (1-\pi)}{n} + \frac{P(1-P)+T(1-T)}{4n(P+T-1)^{2}} $$
(2.8)

Clearly, if we let P = T = P0, this model reduces to the original Warner (1965) model with two independent trials per respondent as in Eq. 2.2. To our knowledge the result in Eq. 2.8 is new, although we present it in the background section of the present investigation.
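The derivation in Eqs. 2.3 to 2.8 can be checked numerically. The following is a minimal Python sketch, not the authors' SAS code; the function names are hypothetical, and the p.m.f. of Zi is read off the bracketed terms of Eq. 2.3.

```python
def z_pmf(pi, P, T):
    """P.m.f. of Z (0, 1 or 2 'yes' replies), taken from the terms of Eq. 2.3."""
    p0 = pi * (1 - P) * (1 - T) + (1 - pi) * P * T
    p1 = P * (1 - T) + T * (1 - P)  # identical for members and non-members
    p2 = pi * P * T + (1 - pi) * (1 - P) * (1 - T)
    return [p0, p1, p2]


def z_moments(pi, P, T):
    """Mean and variance of Z computed directly from the p.m.f. (Eqs. 2.3-2.5)."""
    pmf = z_pmf(pi, P, T)
    mean = sum(z * p for z, p in enumerate(pmf))
    second = sum(z * z * p for z, p in enumerate(pmf))
    return mean, second - mean ** 2


def warner_two_device_estimate(z_values, P, T):
    """Eq. 2.7: estimate of pi from replies z_i in {0, 1, 2}; needs P + T != 1."""
    z_bar = sum(z_values) / len(z_values)
    offset = P * (1 - T) + T * (1 - P) + 2 * (1 - P) * (1 - T)
    return (z_bar - offset) / (2 * (P + T - 1))


def warner_two_device_variance(pi, P, T, n):
    """Eq. 2.8: variance of the two-device Warner estimator."""
    return pi * (1 - pi) / n + (P * (1 - P) + T * (1 - T)) / (4 * n * (P + T - 1) ** 2)
```

Comparing `z_moments` with the closed forms in Eqs. 2.3 and 2.6, and setting P = T = P0 in `warner_two_device_variance`, reproduces the reduction to Eq. 2.2 stated above.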

Mangat and Singh (1990) improved on the Warner (1965) model by proposing a two-stage randomized response procedure that makes use of two decks of cards. In the Mangat and Singh (1990) model, each respondent is asked to use two randomization devices, R1 and R2. The device R1 has two outcomes: “Are you a member of group A?”, with relative frequency T0, and “Go to the second randomization device R2”, with relative frequency (1 − T0). The second randomization device R2 is the same as the Warner (1965) randomization device. As in the Warner (1965) model, we let π represent the true population proportion of those individuals belonging to group A, and nms be the number of ‘yes’ replies received by the interviewer from n respondents selected from the population by SRSWR. The following estimator of π is derived for the Mangat and Singh (1990) model:

$$ \hat{\pi}_{ms} = \frac{\hat{\theta}_{ms}-(1-T_{0})(1-P_{0})}{(2P_{0}-1)+2T_{0}(1-P_{0})} $$
(2.9)

where \(\hat {\theta }_{ms}=n_{ms}/n\) is the proportion of the observed ‘yes’ answers in the sample. The estimator in Eq. 2.9, provided by Mangat and Singh (1990) is unbiased and has the variance:

$$ V(\hat{\pi}_{ms}) = \frac{\pi (1-\pi)}{n} + \frac{(1-T_{0})(1-P_{0})[1-(1-T_{0})(1-P_{0})]}{n[(2P_{0}-1)+2T_{0}(1-P_{0})]^{2}} $$
(2.10)
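As an illustration of Eqs. 2.9 and 2.10, the sketch below codes the two-stage 'yes' probability, the estimator and its variance. The function names are hypothetical, and the 'yes' probability is an assumed reading of the device description above (a direct question with probability T0, the Warner device otherwise).

```python
def ms_yes_probability(pi, P0, T0):
    """Probability of a 'yes' reply under the two-stage device (assumed reading)."""
    return T0 * pi + (1 - T0) * (P0 * pi + (1 - P0) * (1 - pi))


def ms_estimate(n_yes, n, P0, T0):
    """Eq. 2.9: estimate of pi from n_yes 'yes' replies among n respondents."""
    theta_hat = n_yes / n
    denom = (2 * P0 - 1) + 2 * T0 * (1 - P0)
    return (theta_hat - (1 - T0) * (1 - P0)) / denom


def ms_variance(pi, P0, T0, n):
    """Eq. 2.10: variance of the Mangat and Singh (1990) estimator."""
    c = (1 - T0) * (1 - P0)
    denom = (2 * P0 - 1) + 2 * T0 * (1 - P0)
    return pi * (1 - pi) / n + c * (1 - c) / (n * denom ** 2)
```

Feeding the exact 'yes' probability back into `ms_estimate` returns π, which is one way to see the unbiasedness claimed for Eq. 2.9.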

Following Warner (1986) and Fox and Tracy (1986, p. 30), Kuk (1990) suggests recoding the responses to overcome an undesirable feature of randomized response techniques. The Kuk (1990) model avoids statements like “I belong to group A”, because A is a sensitive group and the respondent may become sceptical and uncooperative. He suggests using cards of the same size and type but of different colors (red and blue, say). A respondent who belongs to group A (Ac) draws a card from Deck-I (Deck-II) and reports the color of the drawn card instead of answering the sensitive question. Let 𝜃1 be the proportion of red cards in Deck-I and 𝜃2 be the proportion of red cards in Deck-II. By letting π represent the true population proportion of those individuals belonging to group A, Kuk’s (1990) model gives the probability of a “red” response from a respondent:

$$ P(\text{red}) =\theta_{kuk} = \theta_{1} \pi + \theta_{2} (1-\pi) $$
(2.11)

Additionally, suppose the n respondents are selected from the population utilizing a SRSWR. Then, letting nkuk be the number of ‘red’ replies received by the interviewer, we have that nkuk follows a Binomial distribution with parameters n and 𝜃kuk. If each individual being interviewed is requested to give k ≥ 1 replies, then Kuk’s (1990) model gives the variance:

$$ V(\hat{\pi}_{kuk}) = \frac{\theta_{kuk} (1-\theta_{kuk})}{nk(\theta_{1}-\theta_{2})^{2}} + \frac{\pi (1-\pi)}{n}(1-\frac{1}{k}) $$
(2.12)
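Eqs. 2.11 and 2.12 translate directly into code. The sketch below uses hypothetical function names and only restates the two formulas; the default k = 1 is a choice made for this illustration.

```python
def kuk_red_probability(pi, theta1, theta2):
    """Eq. 2.11: probability of a 'red' reply."""
    return theta1 * pi + theta2 * (1 - pi)


def kuk_variance(pi, theta1, theta2, n, k=1):
    """Eq. 2.12: variance under Kuk's model with k >= 1 replies per respondent."""
    theta = kuk_red_probability(pi, theta1, theta2)
    first = theta * (1 - theta) / (n * k * (theta1 - theta2) ** 2)
    second = pi * (1 - pi) / n * (1 - 1 / k)
    return first + second
```

As a consistency check, choosing 𝜃1 = P0, 𝜃2 = 1 − P0 and k = 2 in `kuk_variance` gives the same value as Eq. 2.2.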

The Kuk (1990) model serves as a special case of many suggested randomized response models such as those proposed by Warner (1965) and Mangat and Singh (1990). While each of these models was proposed as an improvement on its predecessors, a more recent paper proposed a method that has been shown to be more efficient than all of them: the Odumade and Singh (2009) randomized response model is more efficient than the Warner (1965), Mangat and Singh (1990), and Kuk (1990) models. The Odumade and Singh (2009) estimator is therefore the one we wish to modify. The Odumade and Singh (2009) model consists of two decks of cards as shown in Fig. 1.

Figure 1 Two deck randomized response model

In this model, each respondent selected in the sample is given an ordered pair of decks of cards, (Deck-I, Deck-II). The respondent matches his/her status with the outcomes from the two decks and replies either (yes, yes), (yes, no), (no, yes), or (no, no). These four types of responses can be observed from both types of respondents, those belonging to group A and those belonging to Ac. This randomized response model produces the following probabilities:

$$ P(\text {yes, yes} ) = \theta_{11} =(P+T-1) \pi + (1-P)(1-T) $$
(2.13)
$$ P(\text {yes, no} ) = \theta_{10} =(P-T) \pi + T(1-P) $$
(2.14)
$$ P(\text {no, yes} ) = \theta_{01} =(T-P) \pi + P(1-T) $$
(2.15)

and

$$ P(\text {no, no} ) = \theta_{00} =(1-P-T) \pi + PT $$
(2.16)
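The four cell probabilities in Eqs. 2.13 to 2.16 can be cross-checked against a direct mixture over members and non-members of group A. The sketch below assumes that a member answers ‘yes’ to Deck-I with probability P and to Deck-II with probability T, while a non-member answers ‘yes’ with probabilities 1 − P and 1 − T; this reading of Fig. 1 and the function names are assumptions.

```python
def cell_probabilities(pi, P, T):
    """Eqs. 2.13-2.16, keyed by (first reply, second reply) with 1 = yes, 0 = no."""
    return {
        (1, 1): (P + T - 1) * pi + (1 - P) * (1 - T),
        (1, 0): (P - T) * pi + T * (1 - P),
        (0, 1): (T - P) * pi + P * (1 - T),
        (0, 0): (1 - P - T) * pi + P * T,
    }


def cell_probabilities_direct(pi, P, T):
    """Same probabilities written as a mixture over members and non-members."""
    def yes_no(p_yes, reply):
        return p_yes if reply == 1 else 1 - p_yes

    cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
    return {
        (i, j): pi * yes_no(P, i) * yes_no(T, j)
        + (1 - pi) * yes_no(1 - P, i) * yes_no(1 - T, j)
        for i, j in cells
    }
```

Both functions return probabilities that sum to one for any π, P and T in [0, 1].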

As before, we let π represent the true population proportion of those individuals belonging to group A. Consider n respondents selected from the population by SRSWR, and let n11, n10, n01, and n00 be the numbers of (yes, yes), (yes, no), (no, yes) and (no, no) replies, respectively, received by the interviewer. Odumade and Singh (2009) considered the minimization of a distance function defined as

$$ D = \frac{1}{2} \sum\limits^{1}_{i=0} \sum\limits^{1}_{j=0} (\theta_{ij} -\hat{\theta}_{ij})^{2} $$
(2.17)

where \(\hat {\theta }_{ij} = n_{ij}/n\), i = 0,1; j = 0,1 are the observed proportions of (yes, yes), (yes, no), (no, yes) and (no, no) replies.

Then, the estimator of true proportion derived from the Odumade and Singh (2009) model is given as

$$ \hat{\pi}_{os} = \frac{(P+T-1)(\hat{\theta}_{11} -\hat{\theta}_{00}) +(P-T)(\hat{\theta}_{10}-\hat{\theta}_{01}) }{2[(P+T-1)^{2}+(P-T)^{2}]} + \frac{1}{2} $$
(2.18)

The estimator \(\hat {\pi }_{os}\) of Odumade and Singh (2009) is unbiased and has the variance:

$$ V(\hat{\pi}_{os}) =\frac{(P+T-1)^{2}[PT+(1-P)(1-T)]+(P-T)^{2}[T(1-P)+P(1-T)]}{4n[(P+T-1)^{2}+(P-T)^{2}]^{2}}-\frac{(2\pi-1)^{2}}{4n} $$
(2.19)

An unbiased estimator of the \(V(\hat {\pi }_{os})\) is given by

$$ \hat{v}(\hat{\pi}_{os}) =\frac{(P+T-1)^{2}[PT+(1-P)(1-T)]+(P-T)^{2}[T(1-P)+P(1-T)]}{4(n-1)[(P+T-1)^{2}+(P-T)^{2}]^{2}}-\frac{(2 \hat{\pi}_{os}-1)^{2}}{4(n-1)} $$
(2.20)
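A short sketch of Eqs. 2.18 to 2.20 follows. The function names and the dictionary layout for the observed proportions (keys (1, 1), (1, 0), (0, 1), (0, 0) with 1 = yes) are assumptions made for illustration.

```python
def os_estimate(theta_hat, P, T):
    """Eq. 2.18: Odumade and Singh (2009) estimate from observed cell proportions."""
    a, b = P + T - 1, P - T
    num = a * (theta_hat[(1, 1)] - theta_hat[(0, 0)]) + b * (theta_hat[(1, 0)] - theta_hat[(0, 1)])
    return num / (2 * (a * a + b * b)) + 0.5


def _os_leading_term(P, T):
    """n times the first term of Eqs. 2.19 and 2.20 (free of pi)."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    num = a2 * (P * T + (1 - P) * (1 - T)) + b2 * (T * (1 - P) + P * (1 - T))
    return num / (4 * (a2 + b2) ** 2)


def os_variance(pi, P, T, n):
    """Eq. 2.19: variance of the estimator."""
    return _os_leading_term(P, T) / n - (2 * pi - 1) ** 2 / (4 * n)


def os_variance_estimate(pi_hat, P, T, n):
    """Eq. 2.20: estimator of the variance; requires n >= 2."""
    return _os_leading_term(P, T) / (n - 1) - (2 * pi_hat - 1) ** 2 / (4 * (n - 1))
```

For n = 1 the estimator takes only four values, one per cell, so its mean and variance can be enumerated exactly and compared with π and Eq. 2.19.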

In the next section, we consider alternatives to the squared distance function for analyzing the data collected by the two-deck method.

3 New Randomized Response Estimators Based Upon a Sum of Special Products Technique and Upon the Method of Solving for a Matrix Determinant

We became curious about which methods could produce a new and more efficient randomized response estimator and, in particular, how best to utilize the data at hand. The method developed here, which leads to an unbiased and efficient estimator of the population proportion without any additional cost (or loss of protection), is expected to challenge survey statisticians to think more along these lines.

Now, instead of minimizing the distance function as Odumade and Singh (2009) did, we attempted two new methods. The first idea is to optimize the Sum of Special Products (SSP) given by

$$ SSP = (\theta_{11}-\hat{\theta}_{11})(\theta_{10}-\hat{\theta}_{10}) + (\theta_{01}-\hat{\theta}_{01})(\theta_{00}-\hat{\theta}_{00}) $$
(3.1)

We named it SSP because the first product is obtained by fixing the first response as “Yes” and the second product is obtained by fixing the first response as “No”. There are many other possibilities, but this SSP leads to a useful estimator and also opens a window for future research. Now we have the following theorems:

Theorem 3.1.

The estimator given below, which optimizes the SSP, is unique and unbiased.

$$ \hat{\pi}_{SSP} =\frac{(P+T-1)(\hat{\theta}_{10} -\hat{\theta}_{01}) + (P-T) (\hat{\theta}_{11} - \hat{\theta}_{00}) }{4(P+T-1)(P-T)}+\frac{1}{2}, \quad P \neq T $$
(3.2)

Proof.

See Online Supplementary Material in Appendix-A. □

Theorem 3.2.

The variance of the estimator \(\hat {\pi }_{SSP}\) is given by:

$$ V(\hat{\pi}_{SSP}) =\frac{(P+T-1)^{2}[P(1-T)+T(1-P)]+(P-T)^{2}[(1-P)(1-T)+PT]}{16n(P+T-1)^{2}(P-T)^{2}}-\frac{(2 \pi-1)^{2}}{4n} $$
(3.3)

Proof.

See Online Supplementary Material in Appendix-A.□

Theorem 3.3.

An unbiased estimator of \(V(\hat {\pi }_{SSP})\) is given by:

$$ \hat{v}(\hat{\pi}_{SSP}) =\frac{(P+T-1)^{2}[P(1-T)+T(1-P)]+(P-T)^{2}[(1-P)(1-T)+PT]}{16(n-1)(P+T-1)^{2}(P-T)^{2}}-\frac{(2 \hat{\pi}_{SSP}-1)^{2}}{4(n-1)} $$
(3.4)

Proof.

Following Odumade and Singh (2009), it is easy to verify \(E[\hat {v}(\hat {\pi }_{SSP})]=V(\hat {\pi }_{SSP})\), which proves the theorem. □
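The SSP estimator and its variance (Eqs. 3.2 and 3.3) can be sketched as follows. The function names and the dictionary layout of the observed proportions are assumptions; the guard on P + T = 1 is added here because the denominator of Eq. 3.2 also vanishes in that case, a condition the theorem statement does not list.

```python
def ssp_estimate(theta_hat, P, T):
    """Eq. 3.2: SSP estimate from observed cell proportions keyed by (i, j)."""
    a, b = P + T - 1, P - T
    if a == 0 or b == 0:
        raise ValueError("Eq. 3.2 is undefined when P = T or P + T = 1")
    num = a * (theta_hat[(1, 0)] - theta_hat[(0, 1)]) + b * (theta_hat[(1, 1)] - theta_hat[(0, 0)])
    return num / (4 * a * b) + 0.5


def ssp_variance(pi, P, T, n):
    """Eq. 3.3: variance of the SSP estimator."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    num = a2 * (P * (1 - T) + T * (1 - P)) + b2 * ((1 - P) * (1 - T) + P * T)
    return num / (16 * n * a2 * b2) - (2 * pi - 1) ** 2 / (4 * n)
```

Substituting the true cell probabilities of Eqs. 2.13 to 2.16 for the observed proportions returns π exactly, in line with Theorem 3.1.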

The second method for creating an estimator from the data generated by the Odumade and Singh (2009) device takes inspiration from the computation of the determinant of a 2 × 2 matrix. Consider the square matrix of differences shown in Fig. 2:

Figure 2 2 × 2 matrix of true differences

Theorem 3.4.

The estimator below, which optimizes the determinant of the true differences, is unique and unbiased.

$$ \hat{\pi}_{DET} =\frac{(P+T-1)(\hat{\theta}_{00} -\hat{\theta}_{11}) + (P-T) (\hat{\theta}_{10} - \hat{\theta}_{01}) }{2[(P-T)^{2} -(P+T-1)^{2}]}+\frac{1}{2} $$
(3.5)

Proof.

See Online Supplementary Material in Appendix-A.□

Theorem 3.5.

The variance of the estimator \(\hat {\pi }_{DET}\) is given by

$$ \begin{array}{@{}rcl@{}} \begin{aligned} V(\hat{\pi}_{DET}) &=\frac{(P+T-1)^{2}[(1-P)(1-T)+PT]+(P-T)^{2}[T(1-P)+P(1-T)]}{4n[(P-T)^{2}-(P+T-1)^{2}]^{2}} \\& -\frac{(2\pi-1)^{2}}{4n} \end{aligned} \end{array} $$
(3.6)

Proof.

See Online Supplementary Material in Appendix-A. □

Theorem 3.6.

An unbiased estimator of \(V(\hat {\pi }_{DET})\) is given by

$$ \begin{array}{@{}rcl@{}} \hat{v}(\hat{\pi}_{DET}) \!&=&\!\frac{(P + T - 1)^{2}[(1 - P)(1 - T) + PT]+(P - T)^{2}[T(1 - P) + P(1 - T)]}{4(n-1)[(P-T)^{2}-(P+T-1)^{2}]^{2}} \\ &&-\frac{(2\hat{\pi}_{DET}-1)^{2}}{4(n-1)} \end{array} $$
(3.7)

Proof.

It is easy to verify \(E[\hat {v}(\hat {\pi }_{DET})]=V(\hat {\pi }_{DET})\), which proves the theorem. □
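A matching sketch for the determinant-based estimator (Eqs. 3.5 and 3.6) is given below, with hypothetical function names. Note that (P − T)² − (P + T − 1)² = −(2P − 1)(2T − 1), so the denominator vanishes when P = 0.5 or T = 0.5; the guard for that case is an addition of this sketch.

```python
def det_estimate(theta_hat, P, T):
    """Eq. 3.5: DET estimate from observed cell proportions keyed by (i, j)."""
    a, b = P + T - 1, P - T
    denom = 2 * (b * b - a * a)
    if denom == 0:
        raise ValueError("Eq. 3.5 is undefined when P = 0.5 or T = 0.5")
    num = a * (theta_hat[(0, 0)] - theta_hat[(1, 1)]) + b * (theta_hat[(1, 0)] - theta_hat[(0, 1)])
    return num / denom + 0.5


def det_variance(pi, P, T, n):
    """Eq. 3.6: variance of the DET estimator."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    num = a2 * ((1 - P) * (1 - T) + P * T) + b2 * (T * (1 - P) + P * (1 - T))
    return num / (4 * n * (b2 - a2) ** 2) - (2 * pi - 1) ** 2 / (4 * n)
```

As with the other estimators, enumerating the four possible replies for n = 1 gives a quick exact check of both the mean and the variance.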

In the next section, we propose a new unbiased regression type estimator that makes use of estimators obtained from optimizing the SSP approach and Determinant (DET) approach along with the Odumade and Singh (2009) estimator.

4 A New Unbiased Regression Type Estimator

Unfortunately, when checking the efficiency of the randomized response estimators derived in Section 3, we found that the new estimators, \(\hat {\pi }_{SSP}\) and \(\hat {\pi }_{DET}\), did not improve upon the Odumade and Singh (2009) estimator, \(\hat {\pi }_{os}\), in terms of relative efficiency. However, we considered combining the SSP and DET type estimators derived previously to obtain a regression type estimator.

Theorem 4.1.

An unbiased regression type estimator of the true proportion π of individuals belonging to a sensitive group A is given by:

$$ \hat{\pi}_{reg} = \hat{\pi}_{os} + \beta (\hat{\pi}_{SSP} - \hat{\pi}_{DET} ) $$
(4.1)

where β is a constant to be derived.

Proof.

The estimator is unbiased whatever the choice of β since \(\hat {\pi }_{os}\), \(\hat {\pi }_{SSP}\) and \(\hat {\pi }_{DET}\) are all unbiased. The optimum value of β is free from the value of π. See Online Supplementary Material in Appendix-A. □

Theorem 4.2.

The minimum variance of \(\hat {\pi }_{reg}\) is given by:

$$ \begin{array}{@{}rcl@{}} min. V(\hat{\pi}_{reg}) = V(\hat{\pi}_{os})- \frac{[Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET})]^{2}}{V(\hat{\pi}_{SSP}-\hat{\pi}_{DET})} \end{array} $$
(4.2)

where

$$ \begin{array}{@{}rcl@{}} Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET}) =-\frac{1}{8n} \end{array} $$
(4.3)

and

$$ \begin{array}{@{}rcl@{}} V(\hat{\pi}_{SSP}-\hat{\pi}_{DET}) &=&\frac{ (P+T-1)^{2} \{ P(1-T)+T(1-P) \} +(P-T)^{2}\{ (1-P)(1-T)+PT \} }{ 16n(P+T-1)^{2}(P-T)^{2} } \\ &+&\frac{(P+T-1)^{2}\{(1-P)(1-T)+PT\} +(P-T)^{2}\{T(1-P)+P(1-T)\}}{4n\{(P-T)^{2}-(P+T-1)^{2} \}^{2}}\\ &&-\frac{1}{4n} \end{array} $$
(4.4)

which are free from the value of π.

Proof.

See Online Supplementary Material in Appendix-A. □

Theorem 4.3.

The optimum value of β which minimizes the variance of \(\hat {\pi }_{reg}\) in (4.1) can be written as

$$ \begin{array}{@{}rcl@{}} \beta =-\frac{2(2T-1)^{2}(2P-1)^{2}(P+T-1)^{2}(P-T)^{2}}{[8PT(1-P-T+PT)-P(1-P)-T(1-T)][2P(1-P)+2T(1-T)-1]^{2}} \end{array} $$
(4.5)

which is again free from the value of π.

Proof.

See Online Supplementary Material in Appendix-A. □
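The closed form in Eq. 4.5 can be compared with the usual regression-type optimum β = −Cov/V built from Eqs. 4.3 and 4.4, in which the sample size n cancels. The sketch below uses hypothetical function names and assumes that this ratio is how the optimum is defined in the supplementary proof.

```python
def n_var_diff(P, T):
    """n times Eq. 4.4, i.e. n * V(pi_SSP - pi_DET); free of pi."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    s = P * T + (1 - P) * (1 - T)  # theta_11 + theta_00
    first = (a2 * (1 - s) + b2 * s) / (16 * a2 * b2)
    second = (a2 * s + b2 * (1 - s)) / (4 * (b2 - a2) ** 2)
    return first + second - 0.25


def beta_from_moments(P, T):
    """beta = -Cov / V with Cov = -1/(8n) from Eq. 4.3; n cancels."""
    return (1 / 8) / n_var_diff(P, T)


def beta_closed_form(P, T):
    """Eq. 4.5 written out term by term."""
    num = 2 * (2 * T - 1) ** 2 * (2 * P - 1) ** 2 * (P + T - 1) ** 2 * (P - T) ** 2
    bracket = 8 * P * T * (1 - P - T + P * T) - P * (1 - P) - T * (1 - T)
    den = bracket * (2 * P * (1 - P) + 2 * T * (1 - T) - 1) ** 2
    return -num / den


def reg_estimate(pi_os, pi_ssp, pi_det, beta):
    """Eq. 4.1: regression type estimator."""
    return pi_os + beta * (pi_ssp - pi_det)
```

The two routes to β agree at the parameter values tried here, for example P = 0.316 and T = 0.845 as used later in the simulation study.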

Theorem 4.4.

An unbiased estimator of \(V(\hat {\pi }_{reg})\) is given by

$$ \hat{v}(\hat{\pi}_{reg}) =\hat{v}(\hat{\pi}_{os}) -\frac{\{ Cov(\hat{\pi}_{os}, \hat{\pi}_{SSP}-\hat{\pi}_{DET} ) \}^{2}}{ V(\hat{\pi}_{SSP}-\hat{\pi}_{DET} ) } $$
(4.6)

where \(\hat {v}(\hat {\pi }_{os})\) is given in (2.20).

Proof.

Trivial, because \(Cov(\hat {\pi }_{os}, \hat {\pi }_{SSP}-\hat {\pi }_{DET} )\) in Eq. 4.3 and \(V(\hat {\pi }_{SSP}-\hat {\pi }_{DET} )\) in (4.4) are free from the value of π.□

5 Efficiency Comparisons

To show that our new estimator is better than the randomized response estimators derived for two trials by Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009), we compute the relative efficiencies. With respect to Warner (1965), the relative efficiency criteria for the two cases considered are given by

$$ RE(w)_{P_{0}} = \frac{V(\hat{\pi}_{w(P_{0})})}{V(\hat{\pi}_{reg})}\times 100\% $$
(5.1)
$$ RE(w)_{PT} = \frac{V(\hat{\pi}_{w(PT)})}{V(\hat{\pi}_{reg})}\times 100\% $$
(5.2)

Similarly, the relative efficiency criteria with respect to Mangat and Singh (1990), Kuk (1990) and Odumade and Singh (2009) are given by

$$ RE(ms) = \frac{V(\hat{\pi}_{ms})}{V(\hat{\pi}_{reg})}\times 100\% $$
(5.3)
$$ RE(kuk) = \frac{V(\hat{\pi}_{kuk})}{V(\hat{\pi}_{reg})}\times 100\% $$
(5.4)

and

$$ RE(os) = \frac{V(\hat{\pi}_{os})}{V(\hat{\pi}_{reg})}\times 100\% $$
(5.5)

We ran SAS code (given in Arias (2019)) to compare the efficiency of the suggested model with respect to Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009). For Warner (1965) with two trials per respondent in Eq. 2.2 and for Mangat and Singh (1990), we set P0 = P and T0 = T. For Kuk (1990), we set 𝜃1 = P, 𝜃2 = T, and k = 2. For Odumade and Singh (2009), Warner (1965) with two trials in Eq. 2.8, and the suggested model, we allowed the value of P to range from 0.55 to 0.80 and the value of T to range from 0.10 to 0.25, both with a step of 0.05.
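The comparison with Odumade and Singh (2009) in Eq. 5.5 can be sketched in Python over the same grid of P and T. This is not the SAS code of Arias (2019); the function names are hypothetical, and V(π̂reg) is taken to be the minimum variance of Eq. 4.2. On this grid the pairs (P, T) = (0.80, 0.20) and (0.75, 0.25) give P + T = 1, where Eq. 4.4 is undefined; skipping them is an assumption of this sketch, not something the source specifies.

```python
def n_var_os(pi, P, T):
    """n times Eq. 2.19."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    s = P * T + (1 - P) * (1 - T)
    return (a2 * s + b2 * (1 - s)) / (4 * (a2 + b2) ** 2) - (2 * pi - 1) ** 2 / 4


def n_var_diff(P, T):
    """n times Eq. 4.4."""
    a2, b2 = (P + T - 1) ** 2, (P - T) ** 2
    s = P * T + (1 - P) * (1 - T)
    return ((a2 * (1 - s) + b2 * s) / (16 * a2 * b2)
            + (a2 * s + b2 * (1 - s)) / (4 * (b2 - a2) ** 2) - 0.25)


def re_os(pi, P, T):
    """Eq. 5.5 in percent; n cancels between numerator and denominator."""
    v_os = n_var_os(pi, P, T)
    v_reg = v_os - (1 / 64) / n_var_diff(P, T)  # Eq. 4.2 with Cov = -1/(8n)
    return 100 * v_os / v_reg


def re_summary(pi):
    """Mean, min and max of RE(os) over the grid, skipping singular points."""
    values = []
    for p in range(55, 81, 5):
        for t in range(10, 26, 5):
            P, T = p / 100, t / 100
            if abs(P + T - 1) < 1e-12 or abs(P - T) < 1e-12:
                continue  # Eq. 4.4 is undefined here (assumed handling)
            values.append(re_os(pi, P, T))
    return {"used": len(values), "mean": sum(values) / len(values),
            "min": min(values), "max": max(values)}
```

Calling `re_summary(pi)` for π = 0.05, 0.10, ..., 0.50 yields one summary row per value of π, in the spirit of the summary tables described next.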

Tables 2, 3 and 4 display summaries of the results, while the full set of results can be obtained by executing the SAS code. For each value of π, we found the mean, standard deviation, maximum, and minimum of the relative efficiencies over the various choices of P and T.

Table 2 Summarized results of relative efficiencies \(RE(w)_{P_{0}}\) and \(RE(w)_{PT}\)
Table 3 Summarized results of relative efficiencies RE(ms) and RE(kuk)
Table 4 Summarized results of relative efficiencies RE(os)

In Tables 2 to 4, freq stands for the number of times the proposed estimator is more efficient than all the competitors considered, out of 24 possible combinations of P and T. As Tables 2 to 4 show, the suggested model is more efficient than the competing models. As π ranges over 0.05 ≤ π ≤ 0.50 with a step of 0.05, the proposed estimator performs better than the models proposed by Warner (1965) for both cases, Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009). Figs. 3 and 4 display the relative efficiencies as functions of the unknown proportion π.

Figure 3 RE w.r.t. the Warner, Mangat and Singh, and Kuk models. a RE w.r.t. Warner as in (5.1). b RE w.r.t. Warner as in (5.2). c RE w.r.t. Mangat and Singh (1990) model. d RE w.r.t. Kuk (1990) model

Figure 4 RE w.r.t. the Odumade and Singh model

From Figs. 3 and 4, one can see that the suggested model performs better than each of the randomized response models of Warner (1965) for two trials, Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009). In each case, the relative efficiency with respect to the competing model remains above 100%. These graphs and the summary of results show that the suggested estimator can perform better than the estimators of Warner (1965), Mangat and Singh (1990), Kuk (1990), and Odumade and Singh (2009).

6 Simulation Study

A simulation study designed to mimic real survey data was conducted using SAS. We wanted to see how the proposed estimator would perform against the Odumade and Singh (2009) estimator in a real survey. For 0.05 ≤ π ≤ 0.50, with P = 0.316 and T = 0.845, we first determined the true probabilities 𝜃11, 𝜃10, 𝜃01, and 𝜃00. Then, utilizing SAS, we generated samples of n = 50 replies with the call function RandMultinomial(ntrials,ns,prob), where ntrials represents the number of trials in each simulation, ns is the sample size and prob represents the probabilities. We used NITR = 10,000, which means that each simulation consisted of 10,000 iterations. We computed the simulated variances of \(\hat {\pi }_{os}\) and \(\hat {\pi }_{reg}\) as follows

$$ V_{sim} (\hat{\pi}_{os}) = \frac{1}{NITR} \sum\limits_{i=1}^{NITR} (\hat{\pi}_{os(i)} - \pi )^{2} $$
(6.1)

and

$$ V_{sim} (\hat{\pi}_{reg}) = \frac{1}{NITR} \sum\limits_{i=1}^{NITR} (\hat{\pi}_{reg(i)} - \pi )^{2} $$
(6.2)

The relative efficiency can be determined from Eqs. 6.1 and 6.2 as follows

$$ RE_{sim} = \frac{V_{sim}(\hat{\pi}_{os})}{V_{sim}(\hat{\pi}_{reg})}\times 100\% $$
(6.3)

where REsim is the simulated relative efficiency of \(\hat {\pi }_{reg}\) with respect to \(\hat {\pi }_{os}\). A total of 10 simulations were run, one for each value of π from 0.05 to 0.50 with a step of 0.05. Each simulation consisted of 10,000 trials, and each trial used a sample of 50 individuals. For each simulation we determined the relative efficiency of the suggested estimator with respect to the Odumade and Singh (2009) estimator. The results are given below in Table 5.

Table 5 Summarized results of relative efficiencies REsim in Eq. 6.3
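The simulation design of Eqs. 6.1 to 6.3 can be imitated with the Python standard library. This is a sketch under stated assumptions, not the SAS program: `random.Random.choices` stands in for RandMultinomial, the function name, seed and defaults are hypothetical, and β is computed as −Cov/V from Eqs. 4.3 and 4.4.

```python
import random


def simulate_re(pi, P=0.316, T=0.845, n=50, nitr=10_000, seed=0):
    """Return (V_sim(os), V_sim(reg), RE_sim in percent) as in Eqs. 6.1-6.3."""
    rng = random.Random(seed)
    a, b = P + T - 1, P - T
    a2, b2 = a * a, b * b
    s = P * T + (1 - P) * (1 - T)
    n_var_diff = ((a2 * (1 - s) + b2 * s) / (16 * a2 * b2)
                  + (a2 * s + b2 * (1 - s)) / (4 * (b2 - a2) ** 2) - 0.25)
    beta = (1 / 8) / n_var_diff  # optimum beta; n cancels
    # Cell order: (yes, yes), (yes, no), (no, yes), (no, no), Eqs. 2.13-2.16.
    probs = [a * pi + (1 - P) * (1 - T), b * pi + T * (1 - P),
             -b * pi + P * (1 - T), -a * pi + P * T]
    sse_os = sse_reg = 0.0
    for _ in range(nitr):
        draws = rng.choices(range(4), weights=probs, k=n)
        th = [draws.count(j) / n for j in range(4)]
        u = th[0] - th[3]  # theta_hat_11 - theta_hat_00
        w = th[1] - th[2]  # theta_hat_10 - theta_hat_01
        pi_os = (a * u + b * w) / (2 * (a2 + b2)) + 0.5
        pi_ssp = (a * w + b * u) / (4 * a * b) + 0.5
        pi_det = (-a * u + b * w) / (2 * (b2 - a2)) + 0.5
        pi_reg = pi_os + beta * (pi_ssp - pi_det)
        sse_os += (pi_os - pi) ** 2
        sse_reg += (pi_reg - pi) ** 2
    v_os, v_reg = sse_os / nitr, sse_reg / nitr
    return v_os, v_reg, 100 * v_os / v_reg
```

Because the results depend on the random seed, values of REsim obtained this way will not match Table 5 digit for digit; the simulated variance of the Odumade and Singh (2009) estimator should nevertheless be close to Eq. 2.19.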

As Table 5 shows, the results of the simulation studies run in SAS indicate that the suggested estimator performs better than the Odumade and Singh (2009) estimator. Because the term subtracted in Eq. 4.2 is non-negative, using the optimum value of β guarantees that the suggested estimator is at least as efficient as the Odumade and Singh (2009) estimator. The original version of this work is available in Arias (2019), and it suggests that there is potential for further research.

7 Conclusion

We knew that the model of Odumade and Singh (2009) could perform better than those of Warner (1965), Mangat and Singh (1990), and Kuk (1990). The natural question was how to improve on the Odumade and Singh (2009) model. The SSP and determinant type techniques did not, on their own, yield more efficient estimators. However, combining them with the Odumade and Singh (2009) estimator in a regression type estimator with an optimum value of β did. As the proofs, tables, and figures show, the suggested estimator is more efficient than all the other estimators we compared it against. Additional research is required to see whether estimators in the field of randomized response sampling can be improved further.