Abstract
In this paper, we suggest a weighted unbiased estimator based on a mixed randomized response model. Several unbiased estimators are generated from the proposed weighted estimator. The variance of the proposed weighted estimator is obtained, together with the condition under which it is superior to the Singh and Tarray (Sociol Methods Res 44(4):706–722, 2014) estimator. In particular, the estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \), a member of the suggested class \( \hat{\pi }_{\text{HS}} \), is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) and is close to the optimum estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \). Thus, the estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) is an alternative to the optimum estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \). The study is further extended to the case of stratified random sampling.
1 Introduction
Surveys sometimes include questions on sensitive topics, such as gambling, alcoholism, sexual and physical abuse, drug addiction, abortion, tax evasion, illegal income, mobbing, political views, doping, homosexual activities and many others. When respondents are asked such questions directly, they may refuse to answer or give untruthful answers, which can significantly affect the quantity and quality of the survey data. Warner (1965) introduced the randomized response technique (RRT) to address this problem.
Several variations of the original RRT models, both binary response and quantitative response models, have been discussed by researchers, including Mangat and Singh (1990), Fox and Tracy (1986), Chaudhuri and Mukerjee (1987, 1988), Hedayat and Sinha (1991), Tracy and Mangat (1996), Mangat (1994), Mahmood et al. (1998), Singh et al. (2000), Chang and Huang (2001), Christofides (2003), Huang (2004), Chang et al. (2004a, b), and Singh and Tarray (2012).
To address the privacy problem with the Moors (1997) model, Mangat et al. (1997) and Singh et al. (2000) proposed several alternative strategies, but their models may lose a large portion of the data information and require a high cost to secure the confidentiality of the respondents. These drawbacks motivated Kim and Warde (2005) to propose a mixed RR model using simple random sampling with replacement that rectifies the privacy problem. The present work is based on the mixed randomized response model of Singh and Tarray (2014), which is described below.
1.1 Singh and Tarray’s (2014) mixed randomized response model
In the model given by Singh and Tarray (2014), a single sample of size n is selected by simple random sampling with replacement (SRSWR) from the population. Each respondent in the sample is instructed to answer the direct question, “I am a member of the innocuous trait group”. If a respondent answers “Yes” to the direct question, then he or she is instructed to go to randomization device \( R_{1} \) consisting of the statements (i) “I am a member of the sensitive trait group” and (ii) “I am a member of the innocuous trait group” with probabilities of selection \( P_{1} \) and \( \left( {1 - P_{1} } \right) \), respectively. If a respondent answers “No” to the direct question, then the respondent is instructed to use the randomization procedure due to Mangat (1994). In Mangat’s (1994) RR procedure, each respondent is instructed to say “Yes” if he or she is a member of the sensitive trait group. If he or she is not a member of the sensitive trait group, then the respondent is required to use Warner’s (1965) randomization device \( R_{2} \) consisting of the statements (a) “I belong to the sensitive trait group” and (b) “I do not belong to the sensitive trait group”, presented with probabilities \( P \) and \( \left( {1 - P} \right) \), respectively. He or she then reports “Yes” or “No” according to the outcome of the randomization device \( R_{2} \) and his or her actual status with respect to the sensitive trait group. The survey procedures are performed under the assumption that the sensitive and the innocuous questions are unrelated and independent in the randomization device \( R_{1} \). To protect their privacy, respondents should not disclose to the interviewer which question they answered from either \( R_{1} \) or \( R_{2} \).
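The response procedure above can be illustrated with a small Monte Carlo sketch. This is only an illustration under stated assumptions: the function names, the parameter `pi_innoc` (population proportion of the innocuous trait) and the independence of the two traits are hypothetical choices made here, all respondents are assumed to report truthfully, and the estimators used are the ones described in the remainder of this subsection.

```python
import random

def simulate_mixed_rr(n, pi_s, pi_innoc, p1, p, seed=0):
    """Simulate the mixed RR procedure; return (n1, yes1, n2, yes2).

    pi_innoc (hypothetical name) is the innocuous-trait proportion; the two
    traits are drawn independently, which is an assumption of this sketch.
    """
    rng = random.Random(seed)
    n1 = yes1 = n2 = yes2 = 0
    for _ in range(n):
        sensitive = rng.random() < pi_s
        innocuous = rng.random() < pi_innoc
        if innocuous:                      # "Yes" to the direct question: device R1
            n1 += 1
            if rng.random() < p1:          # statement (i): sensitive trait
                yes1 += int(sensitive)
            else:                          # statement (ii): innocuous trait, always "Yes"
                yes1 += 1
        else:                              # "No": Mangat (1994) procedure
            n2 += 1
            if sensitive:                  # members of the sensitive group say "Yes"
                yes2 += 1
            elif rng.random() >= p:        # Warner statement (b) drawn: truthful "Yes"
                yes2 += 1
    return n1, yes1, n2, yes2

def estimate_pi_s(n1, yes1, n2, yes2, p1, p):
    """Pooled estimate lambda * pi1_hat + (1 - lambda) * pi2_hat, lambda = n1 / n."""
    pi1_hat = (yes1 / n1 - (1 - p1)) / p1
    pi2_hat = (yes2 / n2 - (1 - p)) / p
    lam = n1 / (n1 + n2)
    return lam * pi1_hat + (1 - lam) * pi2_hat

if __name__ == "__main__":
    p1 = 0.3
    p = 1.0 / (2.0 - p1)                   # the choice P = (2 - P1)^(-1) used in the text
    counts = simulate_mixed_rr(200_000, pi_s=0.1, pi_innoc=0.3, p1=p1, p=p, seed=1)
    print(estimate_pi_s(*counts, p1=p1, p=p))
```

With a large simulated sample the pooled estimate should settle near the value of `pi_s` supplied to the simulation, which is what unbiasedness of the two component estimators implies.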
Let n be the sample size confronted with a direct question, and \( n_{1} \) and \( n_{2} \) \( \left( { = n - n_{1} } \right) \) denote the number of “Yes” and “No” answers from the sample. Since all the respondents using a randomization device \( R_{1} \) already responded “Yes” from the initial direct innocuous question, the proportion “Y” of getting “Yes” answers from the respondents using randomization device \( R_{1} \) is expressed as
where \( \pi_{\text{s}} \) is the proportion of “Yes” answers to the sensitive question and \( \pi_{1} \) is the proportion of “Yes” answers to the innocuous question.
An unbiased estimator of \( \pi_{\text{s}} \), in terms of the sample proportion of “Yes” responses \( \hat{Y} \), is given by
with variance
The proportion of “Yes” answers from the respondents using Mangat’s (1994) randomization device \( R_{2} \)
An unbiased estimator of \( \pi_{\text{s}} \), in terms of the sample proportion of “Yes” responses \( \hat{X} \) is given by
The variance of \( \hat{\pi }_{2} \) is given by
Giving weight \( \lambda = {{n_{1} } \mathord{\left/ {\vphantom {{n_{1} } n}} \right. \kern-0pt} n} \) to the estimator \( \hat{\pi }_{1} \) and \( \left( {1 - \lambda } \right) = {{\left( {n - n_{1} } \right)} \mathord{\left/ {\vphantom {{\left( {n - n_{1} } \right)} n}} \right. \kern-0pt} n} \) to the estimator \( \hat{\pi }_{2} \), Singh and Tarray (2014) suggested an unbiased estimator for \( \pi_{\text{s}} \) as
with the variance
For \( P = \left( {2 - P_{1} } \right)^{ - 1} \), Singh and Tarray (2014) obtained the variance of \( \hat{\pi }_{\text{h}} \) as
where
In Sect. 2, we have suggested a weighted unbiased estimator for \( \pi_{\text{s}} \) and studied its properties.
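For reference, the quantities described in Sect. 1.1 can be written out explicitly. The display below is a reconstruction from the verbal definitions above, not a quotation of the numbered equations: it assumes \( \pi_{1} = 1 \) for respondents routed to \( R_{1} \) (they have already answered “Yes” to the innocuous question), treats \( \lambda = n_{1} / n \) as fixed, and introduces the shorthand \( V_{1} \), \( V_{2} \) by analogy with \( V_{1k} \), \( V_{2k} \) of Sect. 5.1.

\[
\begin{aligned}
Y &= P_{1} \pi_{\text{s}} + (1 - P_{1}) \pi_{1} = P_{1} \pi_{\text{s}} + (1 - P_{1}), &
\hat{\pi}_{1} &= \frac{\hat{Y} - (1 - P_{1})}{P_{1}}, &
V(\hat{\pi}_{1}) &= \frac{V_{1}}{n_{1}}, \quad V_{1} = \frac{(1 - \pi_{\text{s}})\left[ P_{1} \pi_{\text{s}} + (1 - P_{1}) \right]}{P_{1}}, \\
X &= \pi_{\text{s}} + (1 - \pi_{\text{s}})(1 - P), &
\hat{\pi}_{2} &= \frac{\hat{X} - (1 - P)}{P}, &
V(\hat{\pi}_{2}) &= \frac{1}{n_{2}}\left[ \pi_{\text{s}}(1 - \pi_{\text{s}}) + \frac{(1 - \pi_{\text{s}})(1 - P)}{P} \right].
\end{aligned}
\]

With \( P = (2 - P_{1})^{-1} \) one has \( (1 - P)/P = 1 - P_{1} \), so \( V(\hat{\pi}_{2}) = V_{2}/n_{2} \) with \( V_{2} = \pi_{\text{s}}(1 - \pi_{\text{s}}) + (1 - \pi_{\text{s}})(1 - P_{1}) \), and the pooled estimator \( \hat{\pi}_{\text{h}} = \lambda \hat{\pi}_{1} + (1 - \lambda)\hat{\pi}_{2} \) has

\[
V(\hat{\pi}_{\text{h}}) = \frac{1}{n}\left[ \lambda V_{1} + (1 - \lambda) V_{2} \right]
= \frac{1}{n}\left[ \pi_{\text{s}}(1 - \pi_{\text{s}}) + \frac{(1 - \pi_{\text{s}})(1 - P_{1})\lambda}{P_{1}} + (1 - \pi_{\text{s}})(1 - P_{1})(1 - \lambda) \right],
\]

which has the same form as \( S_{k}^{*} \) in Sect. 6.2.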
2 Proposed class of unbiased estimators
We define a weighted unbiased estimator for \( \pi_{\text{s}} \) as
where \( \eta_{1} \) and \( \eta_{2} \) are suitably chosen weights such that \( \eta_{1} + \eta_{2} = 1 \).
For suitable values of \( \left( {\eta_{1} ,\eta_{2} } \right) \), a set of estimators can be identified; see, for instance, Table 1.
Since the two randomization devices are independent, the variance of \( \hat{\pi }_{\text{HS}} \) is given by
Inserting \( P = \left( {2 - P_{1} } \right)^{ - 1} \) in (11) we get
The variance of \( \hat{\pi }_{HS} \) at (12) is minimised for
Inserting (13) in (10) we get the optimum estimator (OE) for \( \pi_{\text{s}} \) as
Thus, the resulting minimum variance of \( \hat{\pi }_{\text{HS}} \) (or the variance of the OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \)) is given by
Thus, we state the following Theorem.
Theorem 2.1
The variance of the weighted estimator \( \hat{\pi }_{HS} \),
with equality holding if
Putting \( \eta_{1} = \lambda \) and \( \eta_{2} = \left( {1 - \lambda } \right) \) in (12) one can easily get the variance of Singh and Tarray (2014) estimator as given in (9).
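The optimum weights follow from a one-line minimisation. As a sketch, assume \( \lambda = n_{1}/n \) is treated as fixed and write \( V(\hat{\pi}_{1}) = V_{1}/(n\lambda) \) and \( V(\hat{\pi}_{2}) = V_{2}/\{n(1 - \lambda)\} \), where \( V_{1} \), \( V_{2} \) are the single-sample analogues of \( V_{1k} \), \( V_{2k} \) in Sect. 5.1 and \( D = (1 - \lambda)V_{1} + \lambda V_{2} \) as in Sect. 3.1. This notation is a reconstruction and is not quoted from (11) to (15).

\[
\begin{aligned}
V(\hat{\pi}_{\text{HS}}) &= \frac{1}{n}\left[ \frac{\eta_{1}^{2} V_{1}}{\lambda} + \frac{(1 - \eta_{1})^{2} V_{2}}{1 - \lambda} \right], \\
\frac{\partial V(\hat{\pi}_{\text{HS}})}{\partial \eta_{1}} = 0 \;&\Rightarrow\; \eta_{10} = \frac{\lambda V_{2}}{D}, \qquad \eta_{20} = 1 - \eta_{10} = \frac{(1 - \lambda) V_{1}}{D}, \\
V_{\min}(\hat{\pi}_{\text{HS}}) &= \frac{1}{n}\left[ \frac{\lambda V_{2}^{2} V_{1}}{D^{2}} + \frac{(1 - \lambda) V_{1}^{2} V_{2}}{D^{2}} \right] = \frac{V_{1} V_{2}}{n D}.
\end{aligned}
\]

Setting \( \eta_{1} = \lambda \) in the first line gives \( \frac{1}{n}\left[ \lambda V_{1} + (1 - \lambda)V_{2} \right] \), the variance of the Singh and Tarray (2014) estimator.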
2.1 Special case
For \( \eta_{1} = \frac{{\lambda P_{1} }}{1 - \lambda } \), the proposed estimator \( \hat{\pi }_{\text{HS}} \) defined by (10) reduces to an unbiased estimator
Here, we note that \( \left( {\lambda ,P_{1} } \right) \) are known.
Putting \( \eta_{1} = \frac{{\lambda P_{1} }}{1 - \lambda } \) in (12) we get the variance of the unbiased estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) as
Putting \( \eta_{1} = \lambda \Rightarrow \eta_{2} = \left( {1 - \lambda } \right) \) in (12) we get the variance of the Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) as
which is positive if
To assess the performance of the suggested estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) at (16) relative to the Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) given by (7), we have computed the percent relative efficiency (PRE) of \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) with respect to \( \hat{\pi }_{\text{h}} \), using the formula given in Sect. 3, for different values of \( \left( {\lambda ,P_{1} ,\pi_{\text{s}} } \right) \).
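The comparison described above can be sketched numerically. The snippet assumes \( P = (2 - P_{1})^{-1} \), treats \( \lambda \) as fixed, and uses variance components written by analogy with \( V_{1k} \) and \( V_{2k} \) of Sect. 5.1; the function names are illustrative and do not come from the paper, and the printed values are not a reproduction of Table 4.

```python
def v1(pi_s, p1):
    # n1 * Var(pi_hat_1) for the unrelated-question device R1 (assumed form)
    return (1 - pi_s) * (p1 * pi_s + 1 - p1) / p1

def v2(pi_s, p1):
    # n2 * Var(pi_hat_2) for Mangat's procedure with P = 1 / (2 - P1) (assumed form)
    return pi_s * (1 - pi_s) + (1 - pi_s) * (1 - p1)

def var_weighted(eta1, lam, pi_s, p1, n):
    # Variance of eta1 * pi_hat_1 + (1 - eta1) * pi_hat_2 with n1 = n * lam
    return (eta1 ** 2 * v1(pi_s, p1) / lam
            + (1 - eta1) ** 2 * v2(pi_s, p1) / (1 - lam)) / n

def pre_hs1(lam, pi_s, p1, n=1000):
    # PRE of the special case eta1 = lam * P1 / (1 - lam) relative to eta1 = lam
    var_h = var_weighted(lam, lam, pi_s, p1, n)
    var_hs1 = var_weighted(lam * p1 / (1 - lam), lam, pi_s, p1, n)
    return 100.0 * var_h / var_hs1

if __name__ == "__main__":
    for p1 in (0.1, 0.2, 0.3, 0.4):
        print(p1, round(pre_hs1(lam=0.3, pi_s=0.1, p1=p1), 2))
```

Because both variances share the factor \( 1/n \), the PRE does not depend on the sample size.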
3 Efficiency comparison
In this section, we have made the comparison of the proposed weighted mixed randomized response model, under completely truthful reporting case, with Singh and Tarray’s (2014) model.
We have from (9) and (16) that
which is always positive.
It follows that the proposed class of estimators \( \hat{\pi }_{\text{HS}} \) is more efficient than Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) at optimum condition. Thus, we infer that to get estimator better than Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) one has to choose the values of \( \left( {\eta_{1} ,\eta_{2} } \right) \) in the vicinity of the exact optimum values \( \left( {\eta_{10} ,\eta_{20} } \right) \) of \( \left( {\eta_{1} ,\eta_{2} } \right) \).
The percent relative efficiency of the OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) with respect to Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) is given by
Further, the PRE of \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) with respect to \( \hat{\pi }_{\text{h}} \) is given by
With the help of the formula given in (13), we have computed the optimum values of \( \eta_{10} \) and \( \eta_{20} \) for different values of \( \left( {\lambda ,\;\pi_{\text{s}} ,\;P_{1} } \right) \), and the findings are shown in Table 2.
Using the formulae given by (21) and (22) we have computed the values of \( {\text{PRE}}\left( {\hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} ,\hat{\pi }_{\text{h}} } \right) \) and \( {\text{PRE}}\left( {\hat{\pi }_{{{\text{HS}}\left( 1 \right)}} ,\hat{\pi }_{\text{h}} } \right) \) for different values of \( \left( {\lambda ,\;\pi_{\text{s}} ,\;P_{1} } \right) \) and findings are tabulated in Tables 3 and 4, respectively.
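A minimal sketch of how such optimum weights and the PRE of the optimum estimator could be computed is given below. It assumes the optimum weight \( \eta_{10} = \lambda V_{2}/D \) with \( D = (1 - \lambda)V_{1} + \lambda V_{2} \) (consistent with the algebra of Sect. 3.1) and the minimum variance \( V_{1}V_{2}/(nD) \); the helper names are hypothetical and the output is not meant to reproduce Tables 2 and 3.

```python
def v1(pi_s, p1):
    # Assumed variance component of the R1 branch (times n1)
    return (1 - pi_s) * (p1 * pi_s + 1 - p1) / p1

def v2(pi_s, p1):
    # Assumed variance component of the R2 branch (times n2), P = 1 / (2 - P1)
    return pi_s * (1 - pi_s) + (1 - pi_s) * (1 - p1)

def optimum_weights(lam, pi_s, p1):
    # Returns (eta10, eta20), the weights minimising the weighted variance
    a, b = v1(pi_s, p1), v2(pi_s, p1)
    d = (1 - lam) * a + lam * b
    return lam * b / d, (1 - lam) * a / d

def pre_optimum(lam, pi_s, p1):
    # 100 * Var(pi_hat_h) / Var(optimum estimator); the factor 1/n cancels
    a, b = v1(pi_s, p1), v2(pi_s, p1)
    d = (1 - lam) * a + lam * b
    var_h = lam * a + (1 - lam) * b
    var_opt = a * b / d
    return 100.0 * var_h / var_opt

if __name__ == "__main__":
    for p1 in (0.1, 0.3, 0.5):
        eta10, eta20 = optimum_weights(0.3, 0.1, p1)
        print(p1, round(eta10, 4), round(eta20, 4), round(pre_optimum(0.3, 0.1, p1), 2))
```

Since the optimum weights depend on \( \pi_{\text{s}} \) through \( V_{1} \) and \( V_{2} \), a value of \( \pi_{\text{s}} \) has to be supplied; in practice this would be a guess or a prior estimate.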
Table 2 gives the optimum values \( \left( {\eta_{10} ,\eta_{20} } \right) \) of the weights \( \left( {\eta_{1} ,\eta_{2} } \right) \) in the proposed estimator \( \hat{\pi }_{\text{HS}} \) for various values of \( \pi_{\text{s}} \), \( \lambda \), \( P_{1} \), and n = 1000. Table 2 reveals that for fixed values of \( \left( {\pi_{\text{s}} ,\lambda } \right) \), the value of \( \eta_{10} \) increases as \( P_{1} \) increases while \( \eta_{20} \) decreases as \( P_{1} \) increases. It is also observed that for fixed values of \( \left( {\lambda ,P_{1} } \right) \) the value of \( \eta_{10} \) increases as \( \pi_{s} \) increases and \( \eta_{20} \) decreases as \( \pi_{\text{s}} \) increases. It follows from Table 3 that \( {\text{PRE}}\left( {\hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} ,\hat{\pi }_{\text{h}} } \right) \) decreases as \( P_{1} \) increases and also decreases as \( \pi_{\text{s}} \) increases. For fixed values of \( \left( {\pi_{\text{s}} ,P_{1} } \right) \), \( {\text{PRE}}\left( {\hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} ,\hat{\pi }_{\text{h}} } \right) \) increases as λ decreases.
There is considerable gain in efficiency using the proposed OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) over Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) as long as \( P_{1} < \frac{1}{2} \). However, in general, the PRE of the proposed OE is larger than 100%.
Further, from Table 4 it is observed that
(i) there is substantial gain in efficiency using the envisaged estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) over Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) when \( P_{1} \le 0.42 \);
(ii) the proposed estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) is always better than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) as long as \( 0 < P_{1} \le 0.42 \) and \( \lambda \in \left( {0.1,\;0.5} \right) \);
(iii) the \( {\text{PRE}}\left( {\hat{\pi }_{{{\text{HS}}\left( 1 \right)}} ,\hat{\pi }_{\text{h}} } \right) \) decreases as \( P_{1} \) increases.
Thus, the proposed estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) is to be preferred over Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) under the parametric restrictions (i) and (ii).
Further, comparing the results of Tables 3 and 4, we observe that the values in Table 3 are very close to those in Table 4. Thus, we infer that the proposed estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) can be used as an alternative to the optimum estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \). The optimum estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) is difficult to use in practice because it depends on the unknown parameter \( \pi_{\text{s}} \) under investigation, whereas the proposed estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) does not face any such difficulty. So the estimator \( \hat{\pi }_{{{\text{HS}}\left( 1 \right)}} \) is to be preferred over both the optimum estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) and the Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \).
3.1 Analytical comparison between the estimator \( \hat{\pi }_{\text{h}} \) and \( \hat{\pi }_{\text{HS}} \)
which is positive if
i.e. if \( \left\{ { - \left( {1 - \lambda } \right)V_{1} - \lambda V_{2} } \right\}\eta_{1}^{2} + 2\eta_{1} \lambda V_{2} + \left\{ {\lambda^{2} \left( {1 - \lambda } \right)V_{1} + \lambda \left( {1 - \lambda } \right)^{2} V_{2} - \lambda V_{2} } \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} D + 2\eta_{1} \lambda V_{2} + \lambda \left\{ {\left( {1 - \lambda } \right)\lambda V_{1} + \left( {1 - \lambda } \right)^{2} V_{2} - V_{2} } \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} D + 2\eta_{1} \lambda V_{2} + \lambda \left\{ {\lambda \left[ {\left( {1 - \lambda } \right)V_{1} + \lambda V_{2} - \lambda V_{2} } \right] + \left( {1 - \lambda } \right)^{2} V_{2} - V_{2} } \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} D + 2\eta_{1} \lambda V_{2} + \lambda \left\{ {\lambda D - \lambda^{2} V_{2} + \left( {1 - \lambda } \right)^{2} V_{2} - V_{2} } \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} D + 2\eta_{1} \lambda V_{2} + \lambda \left\{ {\lambda D - 2\lambda V_{2} } \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} D + 2\eta_{1} \eta_{10} D + \lambda \left\{ {\lambda D - 2\eta_{10} D} \right\} > 0 \)
i.e. if \( - \eta_{1}^{2} + 2\eta_{1} \eta_{10} + \lambda \left\{ {\lambda - 2\eta_{10} } \right\} > 0 \)
i.e. if \( \eta_{1}^{2} - 2\eta_{1} \eta_{10} - \lambda \left\{ {\lambda - 2\eta_{10} } \right\} < 0 \)
i.e. if \( \left( {\eta_{1} - \eta_{10} } \right)^{2} - \left( {\lambda - \eta_{10} } \right)^{2} < 0 \)
i.e. if \( \left( {\eta_{1} - \eta_{10} } \right)^{2} < \left( {\lambda - \eta_{10} } \right)^{2} \)
where \( D = \left[ {\left( {1 - \lambda } \right)V_{1} + \lambda V_{2} } \right] .\)
It is observed that the OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) is hard to apply in practice, as the optimum weights involve the unknown parameter \( \pi_{\text{s}} \). However, with the help of (23), one can generate estimators from \( \hat{\pi }_{\text{HS}} \) that are better than the Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) even when the exact optimum value of \( \eta_{1} \) is unknown.
We have computed the range of \( \eta_{1} \) using (23) for different values of \( \pi_{\text{s}} ,\lambda ,P_{1} \) and n =1000 and findings are shown in Table 5. It is observed from Table 5 that the value of lower limit of \( \eta_{1} \) increases as \( P_{1} \) increases for fixed values of \( \left( {\pi_{\text{s}} ,\lambda } \right) \) resulting in the shorter range of \( \eta_{1} \). We note from Tables 2 and 5 that one can obtain efficient estimator of \( \pi_{\text{s}} \) from the proposed class of estimators \( \hat{\pi }_{\text{HS}} \) even if the value of \( \eta_{1} \) deviates from its exact optimum value \( \eta_{10} \). Thus, the proposed class of estimators \( \hat{\pi }_{\text{HS}} \) can be used in practice even if the investigator is less experienced or has less association with the population under investigation.
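The condition \( \left( {\eta_{1} - \eta_{10} } \right)^{2} < \left( {\lambda - \eta_{10} } \right)^{2} \) derived above says that \( \eta_{1} \) must lie strictly between \( \lambda \) and \( 2\eta_{10} - \lambda \). A minimal sketch of this computation follows; it assumes \( \eta_{10} = \lambda V_{2}/D \) with \( V_{1} \), \( V_{2} \) written by analogy with \( V_{1k} \), \( V_{2k} \) of Sect. 5.1, uses hypothetical function names, and does not clip the interval to \( [0, 1] \).

```python
def v1(pi_s, p1):
    # Assumed variance component of the R1 branch (times n1)
    return (1 - pi_s) * (p1 * pi_s + 1 - p1) / p1

def v2(pi_s, p1):
    # Assumed variance component of the R2 branch (times n2), P = 1 / (2 - P1)
    return pi_s * (1 - pi_s) + (1 - pi_s) * (1 - p1)

def n_var(eta1, lam, pi_s, p1):
    # n * Var of the weighted estimator with weights (eta1, 1 - eta1)
    return (eta1 ** 2 * v1(pi_s, p1) / lam
            + (1 - eta1) ** 2 * v2(pi_s, p1) / (1 - lam))

def eta1_range(lam, pi_s, p1):
    # Open interval of eta1 values beating eta1 = lam, plus the optimum eta10
    a, b = v1(pi_s, p1), v2(pi_s, p1)
    eta10 = lam * b / ((1 - lam) * a + lam * b)
    other = 2 * eta10 - lam          # mirror image of lam about eta10
    return min(lam, other), max(lam, other), eta10

if __name__ == "__main__":
    for p1 in (0.1, 0.3, 0.5):
        lo, hi, eta10 = eta1_range(0.3, 0.1, p1)
        print(p1, round(lo, 4), round(eta10, 4), round(hi, 4))
```

The variance is a quadratic in \( \eta_{1} \) that is symmetric about \( \eta_{10} \), which is why the two end points of the interval give the same variance as \( \hat{\pi }_{\text{h}} \).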
From (23) one can obtain the range of \( \lambda \) over which the estimators shown in Table 1 are better than the Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \). For example, if we set \( w_{1} = \left( {1 - \lambda } \right) \), we find that the estimator \( \hat{\pi }_{\text{HS1}} \) is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \) if
4 Estimation that utilizes approximate optimum value
In this section, we study the “robustness” of the OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) in (14) against departure from the true optimum values \( \left( {\eta_{10} ,\eta_{20} } \right) \) of \( \left( {\eta_{1} ,\eta_{2} } \right) \).
It should be mentioned that the OE \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) in (14) is of little practical utility, as it depends on the optimum values \( \left( {\eta_{10} ,\eta_{20} } \right) \) in (13), which are functions of the unknown parameter \( \pi_{\text{s}} \) (under study) and the known probability \( P_{1} \). However, in many practical situations, the investigator has prior information regarding the parameter \( \pi_{\text{s}} \), and hence regarding \( \left( {\eta_{10} ,\eta_{20} } \right) \), either through long association with the experimental material or through past data. One can also obtain the values of \( \left( {\eta_{10} ,\eta_{20} } \right) \) from the sample data at hand. Thus, the assumption that the investigator has prior information on, or guessed or approximate values of, \( \left( {\eta_{10} ,\eta_{20} } \right) \) is quite reasonable. The estimator obtained by substituting the approximate values \( \tilde{\eta }_{10} = \alpha \eta_{10} \) and \( \tilde{\eta }_{20} = 1 - \alpha \eta_{10} \), where \( \alpha \left( { > 0} \right) \) measures the departure from the true optimum value, into the estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) at (14) is defined by
The variance of \( \hat{\pi }_{\text{HS}}^{{({\text{o}})*}} \) is given by
which is always positive if
i.e. if \( \left( {\alpha - 1} \right)^{2} - \left( {1 - \frac{\lambda }{{\eta_{10} }}} \right)^{2} < 0 \)
i.e. if \( \left( {\alpha - 1} \right)^{2} < \left( {1 - \frac{\lambda }{{\eta_{10} }}} \right)^{2} \)
i.e. if \( \left| {\alpha - 1} \right| < \left| {1 - \frac{\lambda }{{\eta_{10} }}} \right| \)
From (9) and (26) the percent relative efficiency of the proposed estimator \( \hat{\pi }_{\text{HS}}^{{({\text{o}})*}} \) for the approximate values \( \left( {\tilde{\eta }_{10} ,\tilde{\eta }_{20} } \right) \), with respect to Singh and Tarray (2014) estimator is given as
We have computed the range of \( \alpha \left( \% \right) \) for different values of \( \left( {\pi_{\text{s}} ,P_{1} ,\lambda } \right) \) in Table 6. It is observed from Table 6 that the upper limit of \( \alpha \) decreases while the lower limit of \( \alpha \) increases as \( P_{1} \) increases for fixed values of \( \left( {\pi_{\text{s}} ,\lambda } \right) \). Table 6 also shows that for fixed values of \( \left( {\lambda ,P_{1} } \right) \), the upper limit of \( \alpha \) decreases while the lower limit of \( \alpha \) increases as \( \pi_{\text{s}} \) increases.
Further, to illustrate the idea of robustness, we have computed the relative efficiency (%) of the proposed estimator \( \hat{\pi }_{\text{HS}}^{{({\text{o}})*}} \) with respect to the Singh and Tarray (2014) estimator \( \hat{\pi }_{\text{h}} \) for various values of \( \left( {\pi_{s} ,P_{1} ,\lambda } \right) \) and \( \alpha \); the results are shown in Tables 7–12. It is observed that the values of \( {\text{PRE}}\left( {\hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)*}} ,\hat{\pi }_{\text{h}} } \right) \) are more than 100. Further, from Tables 7–12 we note that the value of \( {\text{PRE}}\left( {\hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)*}} ,\hat{\pi }_{\text{h}} } \right) \) decreases as the value of \( P_{1} \) increases and increases with increasing value of \( \pi_{\text{s}} \). Thus, we conclude that the proposed estimator \( \hat{\pi }_{\text{HS}}^{{\left( {\text{o}} \right)}} \) is useful in practice even if \( \alpha \) departs from unity.
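The robustness check of this section can be sketched as follows. The code assumes that the approximate weight is \( \alpha \eta_{10} \) with complementary weight \( 1 - \alpha \eta_{10} \), that \( \eta_{10} = \lambda V_{2}/D \), and that the admissible range is \( \left| {\alpha - 1} \right| < \left| {1 - \lambda /\eta_{10} } \right| \) as stated above; the function names are illustrative and the values are not taken from Tables 6 to 12.

```python
def v1(pi_s, p1):
    # Assumed variance component of the R1 branch (times n1)
    return (1 - pi_s) * (p1 * pi_s + 1 - p1) / p1

def v2(pi_s, p1):
    # Assumed variance component of the R2 branch (times n2), P = 1 / (2 - P1)
    return pi_s * (1 - pi_s) + (1 - pi_s) * (1 - p1)

def eta10(lam, pi_s, p1):
    # Optimum weight lam * V2 / D with D = (1 - lam) * V1 + lam * V2
    a, b = v1(pi_s, p1), v2(pi_s, p1)
    return lam * b / ((1 - lam) * a + lam * b)

def pre_alpha(alpha, lam, pi_s, p1):
    # PRE of the estimator using weight alpha * eta10 relative to pi_hat_h
    a, b = v1(pi_s, p1), v2(pi_s, p1)
    eta1 = alpha * eta10(lam, pi_s, p1)
    var_h = lam * a + (1 - lam) * b
    var_alpha = eta1 ** 2 * a / lam + (1 - eta1) ** 2 * b / (1 - lam)
    return 100.0 * var_h / var_alpha

def alpha_range(lam, pi_s, p1):
    # Interval |alpha - 1| < |1 - lam / eta10|, before imposing alpha > 0
    half = abs(1 - lam / eta10(lam, pi_s, p1))
    return 1 - half, 1 + half

if __name__ == "__main__":
    print(alpha_range(0.3, 0.1, 0.3))
    for alpha in (0.5, 0.8, 1.0, 1.2, 1.5):
        print(alpha, round(pre_alpha(alpha, 0.3, 0.1, 0.3), 2))
```

At \( \alpha = 1 \) the PRE equals that of the optimum estimator, and at either end point of the interval it drops to exactly 100.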
5 Stratified mixed randomized response model
Stratified random sampling is usually applied by decomposing the population into distinct homogeneous groups called strata. It gives a reasonably representative sample of the population. Many researchers have suggested RR techniques using stratified random sampling, for instance, Hong et al. (1994), Kim and Elam (2005) and Singh and Tarray (2014). We now present the estimator under the stratified estimation method proposed by Singh and Tarray (2014), to be used later for comparison purposes.
5.1 Singh and Tarray (2014) stratified mixed randomized response model
Singh and Tarray (2014) assumed that the population is partitioned into “r” nonoverlapping strata, and a sample is selected by simple random sampling with replacement from each stratum. To get the full benefit from stratification, they assumed that the number of units in each stratum is known. In this model, an individual respondent in a sample from each stratum is instructed to answer a direct question “I am a member of the innocuous trait group.” Respondents answer the direct question by “Yes” or “No.” If a respondent answers “Yes,” then he or she is instructed to go to the randomization device \( R_{k1} \) consisting of the statements (i) “I am a member of the sensitive trait group” and (ii) “I am a member of the innocuous trait group” with preassigned probabilities \( Q_{k} \) and \( \left( {1 - Q_{k} } \right) \), respectively. If a respondent answers “No,” then the respondent is instructed to use the randomization procedure due to Mangat (1994). In Mangat’s (1994) RR procedure, each respondent is instructed to say “Yes” if he or she is a member of the sensitive trait group. If he or she is not a member of the sensitive trait group, then the respondent is required to use Warner’s (1965) randomization device \( R_{k2} \) consisting of the statements (a) “I belong to the sensitive trait group” and (b) “I do not belong to the sensitive trait group” with preassigned probabilities \( P_{k} \) and \( \left( {1 - P_{k} } \right) \), respectively. He or she then reports “Yes” or “No” according to the outcome of the randomization device \( R_{k2} \) and his or her actual status with respect to the sensitive trait group. The survey procedures are performed under the assumption that the sensitive and the innocuous questions are unrelated and independent in the randomization device \( R_{k1} \).
To protect the respondent’s privacy, the respondents should not disclose to the interviewer the question they answered from either \( R_{k1} \) or \( R_{k2} \). Suppose we denote \( m_{k} \) as the number of units in the sample from stratum k and n as the total number of units in samples from all strata. Let \( m_{k1} \) be the number of people responding “Yes” when respondents in a sample \( m_{k} \) were asked the direct question and \( m_{k2} \) be the number of people responding “No” when respondents in a sample \( m_{k} \) were asked the direct question so that \( \sum\limits_{k = 1}^{r} {m_{k} = \sum\limits_{k = 1}^{r} {\left( {m_{k1} + m_{k2} } \right)} } \). Under the assumption that these “Yes” or “No” reports are made truthfully, and \( Q_{k} \) and \( P_{k} \left( { \ne 0.5} \right) \) are set by the researcher, then the proportion of “Yes” answer from the respondents using the randomization device \( R_{k1} \) will be
The estimator of \( \pi_{{{\text{S}}_{k} }} \) in terms of the sample proportion of “Yes” response \( \hat{Y}_{k} \) is given as
with the variance
where \( V_{1k} = \frac{{\left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left[ {Q_{k} \pi_{{{\text{S}}_{k} }} + \left( {1 - Q_{k} } \right)} \right]}}{{Q_{k} }}. \)
The proportion of “Yes” answers from the respondents using Mangat’s (1994) randomization device \( R_{k2} \) is
The estimator of \( \pi_{{{\text{S}}_{k} }} \) in terms of the sample proportion of “Yes” response \( \hat{X}_{k} \) is given by
with the variance
where \( V_{2k} = \left[ {\pi_{{S_{k} }} \left( {1 - \pi_{{{\text{S}}_{k} }} } \right) + \left( {1 - Q_{k} } \right)\left( {1 - \pi_{{{\text{S}}_{k} }} } \right)} \right] \)
The pooled unbiased estimator of \( \pi_{{{\text{S}}_{k} }} \) in terms of the sample proportion of “Yes” response \( \hat{Y}_{k} \) and \( \hat{X}_{k} \) is given as
with the variance
where \( m_{k} = m_{k1} + m_{k2} \) and \( \lambda_{k} = {{m_{k1} } \mathord{\left/ {\vphantom {{m_{k1} } {m_{k} }}} \right. \kern-0pt} {m_{k} }} \).
Thus, the unbiased estimator of \( \pi_{\text{S}} = \sum\limits_{k = 1}^{r} {w_{k} \pi_{{{\text{S}}_{k} }} } \) is given as
The variance of \( \hat{\pi }_{\text{mS}} \) is given as
In the next section, we propose a weighted unbiased estimator of \( \pi_{S} \) as an alternative to the Singh and Tarray (2014) stratified estimator and study its properties.
6 Proposed stratified mixed randomized response model using weights
Moving along the direction for stratified mixed RR model traced by Singh and Tarray (2014), we introduce a weighted unbiased estimator for \( \pi_{\text{s}} \) as
where \( \eta_{1k} \) and \( \eta_{2k} \) are suitably chosen constants such that \( \eta_{1k} + \eta_{2k} = 1 \).
For \( \eta_{1k} = \frac{{\lambda_{k} Q_{k} }}{{\left( {1 - \lambda_{k} } \right)}} \) and \( \eta_{2k} = \left( {1 - \eta_{1k} } \right) = \frac{{\left\{ {1 - \lambda_{k} \left( {1 + Q_{k} } \right)} \right\}}}{{\left( {1 - \lambda_{k} } \right)}} \), in (40) we get an unbiased estimator \( \hat{\pi }_{{{\text{mh}}_{k} }} \) for \( \pi_{\text{s}} \) as
We mention that in (41) \( \lambda_{k} 's \) and \( Q_{k} 's \) are known.
The variance of the estimator \( \hat{\pi }_{{{\text{mh}}_{k} }} \) is given as
where \( D_{k} = \frac{1}{{\left( {1 - \lambda_{k} } \right)^{2} }}\left[ {\lambda_{k} Q_{k}^{2} V_{1k} + \frac{{\left\{ {1 - \lambda_{k} \left( {1 + Q_{k} } \right)} \right\}^{2} V_{2k} }}{{\left( {1 - \lambda_{k} } \right)}}} \right] \) and \( V_{1k} \) and \( V_{2k} \) are same as defined earlier
The unbiased estimator of \( \pi_{S} = \sum\limits_{k = 1}^{r} {w_{k} \pi_{{S_{k} }} } \) is given by
with the variance
6.1 Variance of \( \hat{\pi }_{\text{mh}} \) under Neyman allocation
Information on \( \pi_{{{\text{S}}_{k} }} \) is usually unavailable. But if prior information about them is available from past experience then we may derive the Neyman allocation formula.
The Neyman allocation of n to \( m_{1} ,m_{2} , \ldots ,m_{r - 1} \;{\text{and}}\;m_{r} \), to derive the minimum variance of \( \hat{\pi }_{\text{mh}} \) subject to \( n = \sum\nolimits_{k = 1}^{r} {m_{k} } \) is approximately given by
Using (45) and (46) the minimal variance of \( \hat{\pi }_{\text{mh}} \) is given by
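The allocation can be sketched as follows, assuming the stratified variance has the form \( V\left( {\hat{\pi }_{\text{mh}} } \right) = \sum\nolimits_{k = 1}^{r} {w_{k}^{2} D_{k} /m_{k} } \), which follows from a per-stratum variance \( D_{k} /m_{k} \) with \( D_{k} \) as defined above. This is a reconstruction of the standard Neyman argument and is not quoted from the numbered equations.

\[
m_{k} = \frac{n\, w_{k} \sqrt{D_{k}}}{\sum_{j = 1}^{r} w_{j} \sqrt{D_{j}}}, \qquad
V_{\min}\left( \hat{\pi}_{\text{mh}} \right) = \frac{1}{n}\left( \sum_{k = 1}^{r} w_{k} \sqrt{D_{k}} \right)^{2}.
\]

Both expressions follow from the Cauchy–Schwarz inequality,

\[
\left( \sum_{k = 1}^{r} \frac{w_{k}^{2} D_{k}}{m_{k}} \right)\left( \sum_{k = 1}^{r} m_{k} \right) \ge \left( \sum_{k = 1}^{r} w_{k} \sqrt{D_{k}} \right)^{2},
\]

with equality exactly when \( m_{k} \propto w_{k} \sqrt{D_{k}} \). The word “approximately” reflects the fact that the \( m_{k} \) must be rounded to integers.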
6.2 Efficiency comparison with Singh and Tarray (2014) model
In this section, we compare the proposed mixed randomized response model with Singh and Tarray’s (2014) model by way of variance comparison.
To compare the proposed estimator \( \hat{\pi }_{\text{mh}} \) with that of Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \), we write the variance of Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \) under Neyman allocation as
where \( S_{k}^{*} = \left[ {\pi_{{{\text{S}}_{k} }} \left( {1 - \pi_{{{\text{S}}_{k} }} } \right) + \frac{{\left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left( {1 - Q_{k} } \right)\lambda_{k} }}{{Q_{k} }} + \left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left( {1 - Q_{k} } \right)\left( {1 - \lambda_{k} } \right)} \right] \).
From Eq. (47) and (48), we have
i.e. if \( \left( {\sum\limits_{k = 1}^{r} {w_{k} \sqrt {S_{k}^{*} } } } \right)^{2} > \left( {\sum\limits_{k = 1}^{r} {w_{k} \sqrt {D_{k} } } } \right)^{2} \)
i.e. if \( \sum\limits_{k = 1}^{r} {w_{k} \sqrt {D_{k} } } < \sum\limits_{k = 1}^{r} {w_{k} \sqrt {S_{k}^{*} } } \)
i.e. if \( \sum\limits_{k = 1}^{r} {w_{k} \left( {\sqrt {D_{k} } - \sqrt {S_{k}^{*} } } \right)} < 0 \)
i.e. if \( \left( {\sqrt {D_{k} } - \sqrt {S_{k}^{*} } } \right) < 0\quad \forall \quad k = 1,2, \ldots ,r \)
Thus, we state the following theorem.
Theorem 6.1
The proposed mixed randomized response model based on stratified random sampling is more efficient than the Singh and Tarray’s (2014) stratified mixed randomized response model as long as the condition \( D_{k} < S_{k}^{*} \;\forall \;k = 1,2, \ldots ,r \); is satisfied.
To have an idea about the efficiency gain of the proposed stratified estimator \( \hat{\pi }_{\text{mh}} \), we perform a numerical study. Its performance is evaluated through the percent relative efficiency with respect to the Singh and Tarray (2014) stratified estimator \( \hat{\pi }_{\text{mS}} \) in the case of two strata (i.e., r = 2) using the formula:
where
\( \pi_{\text{S}} = w_{1} \pi_{{{\text{S}}_{1} }} + w_{2} \pi_{{{\text{S}}_{2} }} , \) \( \sqrt {S_{k}^{*} } = \left[ {\pi_{{{\text{S}}_{k} }} \left( {1 - \pi_{{{\text{S}}_{k} }} } \right) + \frac{{\left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left( {1 - Q_{k} } \right)\lambda_{k} }}{{Q_{k} }} + \left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left( {1 - Q_{k} } \right)\left( {1 - \lambda_{k} } \right)} \right]^{1/2} \), and \( \sqrt {D_{k} } = \frac{1}{{\left( {1 - \lambda_{k} } \right)}}\left[ {\lambda_{k} Q_{k}^{2} V_{1k} + \frac{{\left\{ {1 - \lambda_{k} \left( {1 + Q_{k} } \right)} \right\}^{2} V_{2k} }}{{\left( {1 - \lambda_{k} } \right)}}} \right]^{1/2} \)
Findings for the percent relative efficiency are given in Table 13 for the different cases of \( \pi_{\text{S}} ,\lambda_{k} \) and \( Q_{k} \) respectively.
It is observed from Table 13 that the values of \( {\text{PRE}}\left( {\hat{\pi }_{\text{mh}} ,\hat{\pi }_{\text{mS}} } \right) \) are more than 100. Thus, the proposed estimator \( \hat{\pi }_{\text{mh}} \) is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \) for the given parametric values. Further, we note that \( {\text{PRE}}\left( {\hat{\pi }_{\text{mh}} ,\hat{\pi }_{\text{mS}} } \right) \) decreases as \( Q_{1} \) increases. For fixed values of \( \left( {\pi_{\text{S}} ,Q_{1} } \right) \), \( {\text{PRE}}\left( {\hat{\pi }_{\text{mh}} ,\hat{\pi }_{\text{mS}} } \right) \) increases as \( \lambda \) increases. A larger gain in efficiency is observed when \( Q_{1} \) lies between 0.1 and 0.3 (i.e. \( 0.1 \le Q_{1} \le 0.3 \)) and \( Q_{2} \) lies between 0.2 and 0.5 (i.e. \( 0.2 \le Q_{2} \le 0.5 \)).
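A minimal sketch of such a two-stratum comparison is given below. It uses \( V_{1k} \), \( V_{2k} \), \( D_{k} \) and \( S_{k}^{*} \) as defined in the text and assumes the PRE under Neyman allocation is \( 100\left( {\sum {w_{k} \sqrt {S_{k}^{*} } } } \right)^{2} /\left( {\sum {w_{k} \sqrt {D_{k} } } } \right)^{2} \); the function names and the parameter values are illustrative and are not taken from Table 13.

```python
from math import sqrt

def v1k(pi, q):
    # V_1k = (1 - pi)[Q pi + (1 - Q)] / Q
    return (1 - pi) * (q * pi + (1 - q)) / q

def v2k(pi, q):
    # V_2k = pi (1 - pi) + (1 - Q)(1 - pi)
    return pi * (1 - pi) + (1 - q) * (1 - pi)

def s_star(pi, q, lam):
    # S_k^* for the Singh and Tarray (2014) stratified estimator
    return (pi * (1 - pi) + (1 - pi) * (1 - q) * lam / q
            + (1 - pi) * (1 - q) * (1 - lam))

def d_k(pi, q, lam):
    # D_k for the proposed estimator with eta_1k = lam Q / (1 - lam)
    inner = (lam * q ** 2 * v1k(pi, q)
             + (1 - lam * (1 + q)) ** 2 * v2k(pi, q) / (1 - lam))
    return inner / (1 - lam) ** 2

def pre_neyman(w, pi, q, lam):
    # PRE of the proposed estimator under Neyman allocation (n cancels)
    num = sum(wk * sqrt(s_star(p, qk, lk)) for wk, p, qk, lk in zip(w, pi, q, lam)) ** 2
    den = sum(wk * sqrt(d_k(p, qk, lk)) for wk, p, qk, lk in zip(w, pi, q, lam)) ** 2
    return 100.0 * num / den

if __name__ == "__main__":
    print(round(pre_neyman(w=(0.6, 0.4), pi=(0.1, 0.2), q=(0.3, 0.4), lam=(0.3, 0.3)), 2))
```

A PRE above 100 for a given parameter set corresponds to the condition \( D_{k} < S_{k}^{*} \) of Theorem 6.1 holding in every stratum, which is sufficient but not necessary.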
6.3 Estimation of population proportion using mixed randomized response model when weights \( \eta_{1k} \) and \( \eta_{2k} \) are scalars
Using the estimator \( \hat{\pi }_{{{\text{mh}}k}} \) defined at (40) we define a weighted unbiased estimator of population proportion \( \pi_{\text{S}} = \sum\nolimits_{k = 1}^{r} {w_{k} \pi_{{{\text{S}}_{k} }} } \) as
The variance of \( \hat{\pi }_{{{\text{m}}\eta }} \) is given by
The variance \( V\left( {\hat{\pi }_{{{\text{m}}\eta }} } \right) \) at (51) is minimized for
Thus, the resulting minimum variance of \( \hat{\pi }_{{{\text{m}}\eta }} \) is given by
where
Thus, the resulting optimum estimator for \( \pi_{\text{S}} \) is given by
whose variance is
The variance of the optimum estimator (OE) under Neyman allocation \( m_{k} \propto w_{k} \sqrt {S_{k} } \), \( k = 1,2, \ldots ,r \), is given by
which is positive if
i.e. if \( \sum\limits_{k = 1}^{r} {w_{k} \sqrt {S_{k}^{*} } } - \sum\limits_{k = 1}^{r} {w_{k} } \sqrt {V_{{k{\text{o}}}} } > 0 \)
i.e. if \( \sum\limits_{k = 1}^{r} {w_{k} } \left( {\sqrt {S_{k}^{*} } - \sqrt {V_{{k{\text{o}}}} } } \right) > 0 \)
which holds if \( \sqrt {S_{k}^{*} } - \sqrt {V_{{k{\text{o}}}} } > 0\quad \forall \quad k = 1,2, \ldots ,r \)
i.e. if \( \sqrt {S_{k}^{*} } > \sqrt {V_{{k{\text{o}}}} } \quad \forall \quad k = 1,2, \ldots ,r \)
i.e. if \( S_{k}^{*} > V_{{k{\text{o}}}} \quad \forall \quad k = 1,2, \ldots ,r \)
i.e. if \( \left[ \pi_{{\text{S}}_{k}} \left( 1 - \pi_{{\text{S}}_{k}} \right) + \left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right) + \left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right)\lambda_{k} \left( \frac{1}{Q_{k}} - 1 \right) \right] > \frac{V_{1k} V_{2k}}{\left[ \left( 1 - \lambda_{k} \right)V_{1k} + \lambda_{k} V_{2k} \right]}\quad \forall \quad k = 1,2, \ldots ,r \)
i.e. if \( V_{2k} + \left( {1 - \pi_{{{\text{S}}_{k} }} } \right)\left( {1 - Q_{k} } \right)\lambda_{k} \left( {\frac{1}{{Q_{k} }} - 1} \right) - \frac{{V_{1k} V_{2k} }}{{\left\{ {\left( {1 - \lambda_{k} } \right)V_{1k} + \lambda_{k} V_{2k} } \right\}}} > 0\quad \forall \quad k = 1,2, \ldots ,r \)
i.e. if \( \frac{\left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right)^{2} \lambda_{k}}{Q_{k}} - \frac{\lambda_{k} V_{2k} \left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right)^{2}}{Q_{k} \left[ \left( 1 - \lambda_{k} \right)V_{1k} + \lambda_{k} V_{2k} \right]} > 0\quad \forall \quad k = 1,2, \ldots ,r \)
which is always true provided \( \left( 1 - \lambda_{k} \right)V_{1k} + \lambda_{k} V_{2k} > V_{2k} \), that is, provided \( V_{1k} > V_{2k} \).
Thus, under Neyman allocation, the proposed optimum estimator (OE) \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \).
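The chain of inequalities above can be condensed into a single identity. The following is a sketch only: it assumes (the definition is not restated in this section) that \( V_{1k} = V_{2k} + \left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right)\left( \frac{1}{Q_{k}} - 1 \right) \), so that the expression for \( S_{k}^{*} \) used above can be written as \( \lambda_{k} V_{1k} + \left( 1 - \lambda_{k} \right)V_{2k} \).

```latex
\begin{aligned}
S_{k}^{*} - V_{k\mathrm{o}}
  &= \lambda_{k} V_{1k} + (1-\lambda_{k}) V_{2k}
     - \frac{V_{1k} V_{2k}}{(1-\lambda_{k}) V_{1k} + \lambda_{k} V_{2k}} \\
  &= \frac{\lambda_{k}(1-\lambda_{k})\,(V_{1k}-V_{2k})^{2}}
          {(1-\lambda_{k}) V_{1k} + \lambda_{k} V_{2k}} \;\ge\; 0 .
\end{aligned}
```

Under this assumption the difference is strictly positive whenever \( 0 < \lambda_{k} < 1 \) and \( V_{1k} \ne V_{2k} \), which is the situation covered by the conclusion above.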
which is positive if
i.e. if \( \sqrt {D_{k} } > \sqrt {V_{{k{\text{o}}}} } ,\quad \forall \quad k = 1,2, \ldots ,r \);
i.e. if \( D_{k} > V_{{k{\text{o}}}} ,\quad \forall \quad k = 1,2, \ldots ,r \);
Now we have
which is always true.
Thus,
It follows from (64) that the proposed estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) is more efficient than the proposed estimator \( \hat{\pi }_{\text{mh}} \).
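The comparison between \( D_{k} \) and \( V_{{k{\text{o}}}} \) can be illustrated by a general identity for weighted estimators. As a sketch, assume that within stratum k the weighted estimator has the form \( \eta \hat{\pi }_{1k} + \left( 1 - \eta \right)\hat{\pi }_{2k} \) with independent unbiased components whose variances are \( V_{1k} /\left( \lambda_{k} m_{k} \right) \) and \( V_{2k} /\left\{ \left( 1 - \lambda_{k} \right)m_{k} \right\} \). These forms, the symbol \( \eta \) for a single scalar weight, and the function \( f \) below are assumptions made for illustration; the corresponding displayed equations are not reproduced here.

```latex
\begin{aligned}
f(\eta) &= \frac{\eta^{2} V_{1k}}{\lambda_{k}} + \frac{(1-\eta)^{2} V_{2k}}{1-\lambda_{k}},
\qquad
\eta_{\mathrm{o}} = \frac{\lambda_{k} V_{2k}}{(1-\lambda_{k}) V_{1k} + \lambda_{k} V_{2k}}, \\
f(\eta_{\mathrm{o}}) &= \frac{V_{1k} V_{2k}}{(1-\lambda_{k}) V_{1k} + \lambda_{k} V_{2k}} = V_{k\mathrm{o}}, \\
f(\eta) - V_{k\mathrm{o}} &= (\eta - \eta_{\mathrm{o}})^{2}
   \left( \frac{V_{1k}}{\lambda_{k}} + \frac{V_{2k}}{1-\lambda_{k}} \right) \;\ge\; 0 .
\end{aligned}
```

On this reading, \( D_{k} \) is the value of \( f \) at the particular weight used by \( \hat{\pi }_{\text{mh}} \), so \( D_{k} \ge V_{{k{\text{o}}}} \) for every stratum, with equality only when that weight coincides with \( \eta_{\text{o}} \).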
We have thus established the following theorem.
Theorem 6.2
The proposed optimum estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \) and the proposed estimator \( \hat{\pi }_{\text{mh}} \).
To see the performance of the proposed optimum estimator (OE) \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) relative to Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \) under Neyman allocation, we have computed the percent relative efficiency (PRE) of the estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) with respect to the estimator \( \hat{\pi }_{\text{mS}} \) using the formula:
for two strata (i.e. r = 2), \( \lambda_{1} = \lambda_{2} = \lambda \) and different values of \( \left( {\pi_{{{\text{S}}1}} ,\pi_{{{\text{S}}2}} ,\pi_{\text{S}} } \right) \) and \( Q_{1} \;{\text{and}}\;Q_{2} \).
Findings are shown in Table 14.
Table 14 shows that the proposed optimum estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) is more efficient than Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{mS}} \). We further note that the values of \( {\text{PRE}}\left( {\hat{\pi }_{\text{mh}} ,\hat{\pi }_{\text{mS}} } \right) \) are very close to those of \( {\text{PRE}}\left( {\hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} ,\hat{\pi }_{\text{mS}} } \right) \). The optimum estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) is difficult to use in practice because it depends on the unknown parameter \( \pi_{\text{S}} \) under investigation, whereas the proposed estimator \( \hat{\pi }_{\text{mh}} \) faces no such difficulty. The estimator \( \hat{\pi }_{\text{mh}} \) would therefore be preferred over both the optimum estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \) and Singh and Tarray’s (2014) estimator \( \hat{\pi }_{\text{h}} \). Thus, we infer that the proposed estimator \( \hat{\pi }_{\text{mh}} \) can be used as an alternative to the optimum estimator \( \hat{\pi }_{{{\text{m}}\eta_{{k{\text{o}}}} }} \).
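A comparison of this kind can be sketched numerically. The Python snippet below is a minimal illustration, not the computation behind Table 14. It assumes that the PRE under Neyman allocation takes the form \( 100\left( \sum w_{k} \sqrt {S_{k}^{*} } \right)^{2} /\left( \sum w_{k} \sqrt {V_{{k{\text{o}}}} } \right)^{2} \), as suggested by the efficiency condition in Sect. 6.3, and that \( V_{1k} = \pi_{{\text{S}}_{k}} \left( 1 - \pi_{{\text{S}}_{k}} \right) + \left( 1 - \pi_{{\text{S}}_{k}} \right)\left( 1 - Q_{k} \right)/Q_{k} \). The function names and the input values are hypothetical.

```python
from math import sqrt


def unit_variances(pi_s, q):
    """Return (V1, V2) for one stratum.

    V2 follows the form implied by the efficiency condition in the text;
    V1 is an ASSUMED form, since its definition is not restated there.
    """
    v1 = pi_s * (1 - pi_s) + (1 - pi_s) * (1 - q) / q
    v2 = pi_s * (1 - pi_s) + (1 - pi_s) * (1 - q)
    return v1, v2


def s_star(pi_s, q, lam):
    """S*_k: per-unit variance term of the Singh and Tarray (2014) estimator."""
    return (pi_s * (1 - pi_s)
            + (1 - pi_s) * (1 - q) * lam / q
            + (1 - pi_s) * (1 - q) * (1 - lam))


def v_opt(pi_s, q, lam):
    """V_ko = V1*V2 / [(1 - lam)*V1 + lam*V2]: optimum per-unit variance term."""
    v1, v2 = unit_variances(pi_s, q)
    return v1 * v2 / ((1 - lam) * v1 + lam * v2)


def pre_optimum(weights, pis, qs, lam):
    """PRE of the optimum estimator w.r.t. Singh and Tarray's estimator.

    Uses the ASSUMED Neyman-allocation form
    100 * (sum w_k sqrt(S*_k))^2 / (sum w_k sqrt(V_ko))^2.
    """
    strata = list(zip(weights, pis, qs))
    num = sum(w * sqrt(s_star(p, q, lam)) for w, p, q in strata) ** 2
    den = sum(w * sqrt(v_opt(p, q, lam)) for w, p, q in strata) ** 2
    return 100.0 * num / den


if __name__ == "__main__":
    # Hypothetical inputs for two strata (r = 2); not taken from Table 14.
    print(pre_optimum(weights=(0.4, 0.6), pis=(0.1, 0.2), qs=(0.3, 0.5), lam=0.5))
```

Because \( S_{k}^{*} \ge V_{{k{\text{o}}}} \) in every stratum under these assumptions, the returned value is never below 100, and it equals 100 in the limiting case \( \lambda = 1 \).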
7 Conclusion
In this article, we have suggested a weighted unbiased estimator based on the mixed randomized response model, together with its stratified counterpart; both are more efficient than the Singh and Tarray (2014) model. We have also discussed a particular case, obtained by choosing a suitable weight in the proposed weighted estimator, and found that its relative efficiency for the different parametric choices is close to that of the proposed optimum estimator. Thus, our mixed RR model and stratified mixed RR model are good alternatives to Singh and Tarray’s (2014) model.
References
Chang HJ, Huang KC (2001) Estimation of proportion and sensitivity of a qualitative character. Metrika 53:269–280
Chang HJ, Wang CL, Huang KC (2004a) Using randomized response to estimate the proportion and truthful reporting probability in a dichotomous finite population. J Appl Stat 53:269–280
Chang HJ, Wang CL, Huang KC (2004b) On estimating the proportion of a qualitative sensitive character using randomized response sampling. Qual Quant 38:675–680
Chaudhuri A, Mukerjee R (1987) Randomized response technique: a review. Stat Neerl 41:27–44
Chaudhuri A, Mukerjee R (1988) Randomized response. Statistics: textbooks and monographs, vol 85. Marcel Dekker Inc, New York
Christofides TC (2003) A generalized randomized response technique. Metrika 57:195–200
Fox JA, Tracy PE (1986) Randomized response: a method of sensitive surveys. Sage, Newbury Park
Hedayat AS, Sinha BK (1991) Design and inference in finite population sampling. Wiley, New York
Hong K, Yum J, Lee H (1994) A stratified randomized response technique. Korean J Appl Stat 7:141–147
Huang KC (2004) A survey technique for estimating the proportion and sensitivity in a dichotomous finite population. Stat Neerl 58:75–82
Kim JM, Elam ME (2005) A two-stage stratified Warner’s randomized response model using optimal allocation. Metrika 61:1–7
Kim JM, Warde WD (2005) A mixed randomized response model. J Stat Plan Inference 133:211–221
Mahmood M, Singh S, Horn S (1998) On the confidentiality guaranteed under randomized response sampling: a comparison with several new techniques. Biom J 40(2):237–242
Mangat NS (1994) An improved randomized response strategy. J R Stat Soc Ser B 56:93–95
Mangat NS, Singh R (1990) An alternative randomized response procedure. Biometrika 77(2):439–442
Mangat NS, Singh R, Singh S (1997) Violation of respondent’s privacy in moors model—its rectification through a random group strategy response model. Commun Stat Theory Methods 3:243–255
Moors JJA (1997) A critical evaluation of Mangat’s two-step procedure in randomized response. Discussion paper at Center for Economic Research. Tilburg University, Tilburg
Singh HP, Tarray TA (2012) A stratified unknown repeated trials in randomized response sampling. Commun Stat Appl Methods 19(6):751–759
Singh HP, Tarray TA (2014) An efficient alternative mixed randomized response procedure. Sociol Methods Res 44(4):706–722
Singh S, Singh R, Mangat NS (2000) Some alternative strategies to Moors’ model in randomized response model. J Stat Plan Inference 83:243–255
Tracy DS, Mangat NS (1996) Some developments in randomized response sampling during the last decade—a follow up of review by Chaudhuri and Mukherjee. J Appl Stat Sci 4:147–158
Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60:63–69
Communicated by Haruhiko Ogasawara.
Singh, H.P., Gorey, S.M. Use of weights in mixed randomized response model. Behaviormetrika 45, 225–259 (2018). https://doi.org/10.1007/s41237-018-0049-9