Using Randomized Response to Estimate the Population Mean of a Sensitive Variable under the Influence of Measurement Error

Tiwari, Kuldeep Kumar; Bhougal, Sandeep; Kumar, Sunil; Rather, Khalid Ul Islam

doi:10.1007/s42519-022-00251-1


Original Article
Published: 06 April 2022

Volume 16, article number 28, (2022)

Journal of Statistical Theory and Practice


Abstract

In survey sampling, the study characters are sometimes sensitive, and because of this sensitivity practitioners do not obtain the actual response. Randomized response technique (RRT) models were developed to reduce the bias caused by evasive responses on a sensitive variable. Measurement error (ME) is almost always present in surveys, so RRT models need to be studied in the presence of ME. We propose an estimator of the population mean of a sensitive variable under the influence of ME. The properties of the proposed estimator are studied and comparisons are made with existing estimators. Finally, a simulation study is carried out to illustrate the results numerically.


1 Introduction

A variable that carries sensitive information about a person or an enterprise can be classified as a sensitive variable. Direct observation of the study variable is sometimes not possible in surveys because the information is sensitive. That is, a respondent may be uncomfortable providing the information the interviewer requires, for personal or other reasons, e.g., questions about corruption, criminality, abortion, drug addiction, etc. To handle such situations, Warner [1] proposed a randomized response technique (RRT) to reduce the bias due to evasive responses. In RRT, a scrambling variable that is independent of the sensitive study variable and of the auxiliary variable is used to estimate the population mean of the sensitive variable. The distribution of the scrambling variable is assumed to be known. The interviewee is asked to give a scrambled response to the sensitive variable but a true response to the auxiliary variable. Pollock and Bek [2] proposed an additive RRT model for a quantitative sensitive variable, which was further discussed by Himmelfarb and Edgell [3]. Based on multiplicative scrambling, Eichhorn and Hayre [4] introduced an RRT model to obtain information on the sensitive variable. A question in the survey questionnaire may be sensitive for one respondent but not for another. That is, for the same question some respondents provide a scrambled response while others provide a true response. Addressing this issue, Gupta, Gupta and Singh [5] proposed the optional randomized response technique (ORRT) model and explained that an ORRT model is generally more efficient than the corresponding RRT model. Gupta et al. [6] showed that there is no extra loss of privacy in using ORRT models as compared to the corresponding RRT models. The problem of estimating the population mean of a sensitive variable has been considered by many authors, such as Wu, Tian and Tang [7], Gupta, Shabbir and Sehra [8], Sousa et al. [9], Gupta et al. [10], Koyuncu, Gupta and Sousa [11], Tarray and Singh [12], Shahzad et al. [13], Mushtaq and Noor-ul-Amin [14], Saleem and Sanaullah [15], Su et al. [16], etc.

None of these studies examined the impact of measurement errors (ME), which commonly occur in surveys. ME is the difference between the observed and the true value, and it is one of the most common contributors to non-sampling error. The problem of ME is inherent in survey sampling, and it may increase for a sensitive issue because the surveyor has to deal with evasive responses. RRT in the presence of ME therefore deserves extensive study. Very limited efforts have been made to estimate the finite population mean of a sensitive variable in the presence of ME, although some researchers have recently focused on this issue. Blattman et al. [17] developed a survey validation technique for qualitative variables to check for ME when dealing with sensitive attributes. Khalil, Gupta and Hanif [18] studied stratified sampling in the presence of scrambled responses and ME. Khalil, Zhang and Gupta [19] used the ORRT model under ME to study some estimators of the population mean. Zahid and Shabbir [20] used dual auxiliary variables to estimate the population mean of a sensitive variable in the presence of ME. Onyango, Oduor and Odundo [21] proposed an estimator using RRT and ME in stratified double sampling. Zhang, Khalil and Gupta [22] studied mean estimation involving a sensitive variable, ME and non-response. Other recent works on sensitive variables in the presence of ME are Khalil, Noor-ul-Amin and Hanif [23], Zhang, Khalil and Gupta [24], and Zahid, Shabbir and Alamri [25].

We propose an estimator of the population mean of a sensitive variable that uses a non-sensitive auxiliary variable in the presence of ME.

2 Notations and Existing Estimators

Let $\Omega = \{\Omega _{1}, \Omega _{2}, \ldots , \Omega _{N}\}$ be a finite population of size N. A sample of size n is drawn from $\Omega $ by simple random sampling without replacement. Let Y be the sensitive study variable, which cannot be observed directly, and let X be a non-sensitive auxiliary variable correlated with Y. Let S be a scrambling variable independent of Y and X. We assume that S has a known distribution with mean zero and variance $\sigma _{s}^{2}$. Here we use the additive RRT model of Pollock and Bek [2]. In the survey, the scrambled response variable Z is observed as $Z=Y+S$. Since $E(S)=0$ and $E(Z)=E(Y)+E(S)$, we have $E(Z)=E(Y)$, that is, $\bar{Z}=\bar{Y}$. Hence estimating the population mean of the scrambled response variable Z is equivalent to estimating the population mean of the sensitive variable Y. The degree of protection for the additive RRT model is $\Delta =E(Z-Y)^{2}=\sigma _{s}^{2}.$ For details, see Yan, Wang and Lai [26] and Saleem, Sanaullah and Hanif [27].
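The additive model can be illustrated with a small Python sketch. It assumes normally distributed sensitive values and a normal scrambling variable with $\sigma _{s}=10$; the function name `scramble`, the seed and the sample size are hypothetical choices made only for this illustration.

```python
import random
import statistics

def scramble(y_values, sigma_s, rng):
    """Return additive scrambled responses z_i = y_i + s_i with s_i ~ N(0, sigma_s^2)."""
    return [y + rng.gauss(0.0, sigma_s) for y in y_values]

rng = random.Random(2022)  # fixed seed, used only to make the sketch reproducible

# Hypothetical sensitive values; the normal shape is an assumption of this sketch.
y = [rng.gauss(20.0, 7.4) for _ in range(50_000)]
z = scramble(y, sigma_s=10.0, rng=rng)

# Because E(S) = 0, the mean of Z should be close to the mean of Y.
mean_gap = statistics.fmean(z) - statistics.fmean(y)

# Empirical degree of protection E(Z - Y)^2, which should be close to sigma_s^2 = 100.
protection = statistics.fmean([(zi - yi) ** 2 for zi, yi in zip(z, y)])

print(f"mean(z) - mean(y) = {mean_gap:.3f}")
print(f"estimated Delta   = {protection:.2f}")
```

A larger $\sigma _{s}$ raises the degree of protection but also inflates the variance of Z, which is the trade-off the later MSE expressions make explicit.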

Let $(x_{i}, z_{i})$ be the observed values and $(X_{i}, Z_{i})$ be the true values of the variables X and Z. Let u be the ME on Z and v be the ME on X. The measurement errors on the $i^{th}$ observed unit are $u_{i}=z_{i}-Z_{i}$ and $v_{i}=x_{i}-X_{i}$. Since the measurement errors are independent of each other and both under-reporting and over-reporting occur, u and v are assumed to be uncorrelated with mean zero and variances $\sigma _{u}^{2}$ and $\sigma _{v}^{2}$, respectively.

Some other notations used in the article are: $\bar{X}$, $\bar{Y}$, $\bar{Z}$ and $\bar{x}$, $\bar{y}$, $\bar{z}$ population and sample means of X, Y and Z, respectively, coefficient of skewness for auxiliary variable $\beta _{1}(x)$, coefficient of kurtosis for auxiliary variable $\beta _{2}(x)$, $\sigma _{z}^{2}$ population variance for Z, $\sigma _{x}^{2}$ population variance for X, $\rho _{zx}$ correlation coefficient between Z and X, $C_{x}=\frac{\sigma _{x}}{\bar{X}}$, $C_{z}=\frac{\sigma _{z}}{\bar{Z}}$, $\lambda = \frac{1}{n}-\frac{1}{N}$.

The need for efficiency motivates researchers to develop new estimators. The sample mean per unit estimator $\hat{\bar{Y}}_{1}=\bar{y}$ is the usual estimator of the population mean. To gain efficiency over $\hat{\bar{Y}}_{1}$, Cochran [28] used the ratio method of estimation, $\hat{\bar{Y}}_{2}=\bar{y}\frac{\bar{X}}{\bar{x}}$. The estimator $\hat{\bar{Y}}_{2}$ works better than $\hat{\bar{Y}}_{1}$ when there is high positive correlation between Y and X. Using the coefficient of variation of the auxiliary variable, Sisodia and Dwivedi [29] proposed $\hat{\bar{Y}}_{3}=\bar{y} \left[ \frac{\bar{X}+C_{x}}{\bar{x}+C_{x}} \right] $ and showed its advantage over $\hat{\bar{Y}}_{2}$. Following Sisodia and Dwivedi [29], Upadhyaya and Singh [30], Singh [31], Singh and Tailor [32] and Singh et al. [33] proposed various estimators using the coefficient of kurtosis, coefficient of skewness, correlation coefficient, standard deviation, etc. Other works in this direction include Kadilar and Cingi [34], Yan and Tian [35], Abid et al. [36], etc.

The RRT sample mean per unit estimator is $\hat{\mu }_{1}=\bar{z}$. The MSE of $\hat{\mu }_{1}$ is

$$\begin{aligned} MSE(\hat{\mu }_{1})= \lambda \sigma _{z}^{2}. \end{aligned}$$

When there is sensitivity on the study variable, Sousa et al. [9] defined the ratio estimator as $\hat{\mu }_{2}=\bar{z}\left( \frac{\bar{X}}{\bar{x}}\right) $ and obtained the bias and MSE of $\hat{\mu }_{2}$ up to first order of approximation as

$$\begin{aligned} Bias(\hat{\mu }_{2})= & {} \lambda \bar{Z}\left( C_{x}^{2}-\rho _{zx} C_{z}C_{x}\right) \end{aligned}$$

(1)

$$\begin{aligned} MSE(\hat{\mu }_{2})= & {} \lambda \bar{Z}^{2}\left( C_{z}^{2} + C_{x}^{2}-2\rho _{zx}C_{z}C_{x}\right) . \end{aligned}$$

(2)

Further, Sousa et al. [9] propose four more estimators to estimate the population mean of the sensitive variable using auxiliary information as

$$\begin{aligned} \hat{\mu }_{3}= & {} \bar{z} \left[ \frac{\bar{X}+\beta _{1}(x)}{\bar{x}+\beta _{1}(x)} \right] \\ \hat{\mu }_{4}= & {} \bar{z} \left[ \frac{\bar{X}+\beta _{2}(x)}{\bar{x}+\beta _{2}(x)} \right] \\ \hat{\mu }_{5}= & {} \bar{z} \left[ \frac{\beta _{1}(x) \bar{X}+\beta _{2}(x)}{\beta _{1}(x) \bar{x}+\beta _{2}(x)} \right] \\ \hat{\mu }_{6}= & {} \bar{z} \left[ \frac{\beta _{2}(x) \bar{X}+\beta _{1}(x)}{\beta _{2}(x) \bar{x}+\beta _{1}(x)} \right] . \end{aligned}$$

The bias and MSE of $\hat{\mu }_{3}$, $\hat{\mu }_{4}$, $\hat{\mu }_{5}$ and $\hat{\mu }_{6}$ are

$$\begin{aligned} Bias(\hat{\mu }_{i})= & {} \lambda \bar{Y}\left( q_{i}^{2}C_{x}^{2}-q_{i}\rho _{zx} C_{z}C_{x}\right) \end{aligned}$$

(3)

$$\begin{aligned} MSE(\hat{\mu }_{i})= & {} \lambda \bar{Y}^{2}\left( C_{z}^{2} + q_{i}^{2}C_{x}^{2}-2q_{i} \rho _{zx}C_{z}C_{x}\right) \end{aligned}$$

(4)

where $i=3,4,5,6$ and $q_{3}=\frac{\bar{X}}{\bar{X}+\beta _{1}(x)}$, $q_{4}=\frac{\bar{X}}{\bar{X}+\beta _{2}(x)}$, $q_{5}=\frac{\beta _{1}(x) \bar{X}}{\beta _{1}(x) \bar{X}+\beta _{2}(x)}$, $q_{6}=\frac{\beta _{2}(x) \bar{X}}{\beta _{2}(x) \bar{X}+\beta _{1}(x)}$.
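As a numerical illustration of Eqs. (3) and (4), the following Python sketch evaluates the weights $q_{3},\ldots ,q_{6}$ and the corresponding first-order bias and MSE. The inputs are the Population 1 values reported in Sect. 6.1, with $\bar{Y}$ set equal to $\bar{Z}$; the function names `q_weights` and `bias_and_mse` are hypothetical helpers, not part of the original study.

```python
def q_weights(X_bar, beta1, beta2):
    """Weights q_3..q_6 attached to the estimators mu_3..mu_6."""
    return {
        3: X_bar / (X_bar + beta1),
        4: X_bar / (X_bar + beta2),
        5: beta1 * X_bar / (beta1 * X_bar + beta2),
        6: beta2 * X_bar / (beta2 * X_bar + beta1),
    }

def bias_and_mse(q, lam, Y_bar, C_z, C_x, rho_zx):
    """Eqs. (3) and (4): first-order bias and MSE for a given weight q."""
    bias = lam * Y_bar * (q**2 * C_x**2 - q * rho_zx * C_z * C_x)
    mse = lam * Y_bar**2 * (C_z**2 + q**2 * C_x**2 - 2 * q * rho_zx * C_z * C_x)
    return bias, mse

# Population 1 values reported in Sect. 6.1; Y_bar is set equal to Z_bar.
N, n = 250, 80
lam = 1 / n - 1 / N
Y_bar, X_bar = 20.0074, 40.0029
C_z, C_x = 12.4384 / Y_bar, 8.6517 / X_bar
weights = q_weights(X_bar, beta1=0.00085, beta2=2.97348)

results = {i: bias_and_mse(q, lam, Y_bar, C_z, C_x, rho_zx=0.5557)
           for i, q in weights.items()}
for i, (bias, mse) in results.items():
    print(f"mu_{i}: q = {weights[i]:.5f}, bias = {bias:.5f}, MSE = {mse:.5f}")
```

Because $\beta _{1}(x)$ is close to zero for a nearly symmetric auxiliary variable, $q_{3}$ and $q_{6}$ are close to one here, so $\hat{\mu }_{3}$ and $\hat{\mu }_{6}$ behave almost like the plain ratio estimator.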

Using a simulation study, Sousa et al. [9] showed that $\hat{\mu }_{3}$ achieves a modest gain over the other member estimators. The performance of these estimators in the presence of ME is discussed below.

The MSEs of the estimators $\hat{\mu }_{i}$, $i=1,2,\ldots ,6$ in the presence of measurement error can be derived as

$$\begin{aligned} MSE(\hat{\mu }_{i})= \lambda (\sigma _{z}^{2}+\sigma _{u}^{2}) + q_{i}^{2}\lambda (\sigma _{x}^{2}+\sigma _{v}^{2})-(2q_{i}\lambda \rho _{zx}\sigma _{z}\sigma _{x}) \end{aligned}$$

(5)

where $i=1,2,3,4,5,6$ and $q_{1}=0$, $q_{2}=\frac{\bar{Z}}{\bar{X}}$, $q_{3}=\frac{\bar{X}}{\bar{X}+\beta _{1}(x)}$, $q_{4}=\frac{\bar{X}}{\bar{X}+\beta _{2}(x)}$, $q_{5}=\frac{\beta _{1}(x) \bar{X}}{\beta _{1}(x) \bar{X}+\beta _{2}(x)}$, $q_{6}=\frac{\beta _{2}(x) \bar{X}}{\beta _{2}(x) \bar{X}+\beta _{1}(x)}$.
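A minimal Python sketch of Eq. (5) follows, evaluated for the sample mean ($q_{1}=0$) and the ratio estimator ($q_{2}=\bar{Z}/\bar{X}$) with the Population 1 values from Sect. 6.1. The function name `mse_with_me` and the dictionary `pop1` are hypothetical and serve only this illustration.

```python
def mse_with_me(q, lam, sigma_z, sigma_x, sigma_u, sigma_v, rho_zx):
    """Eq. (5): first-order MSE of an existing estimator with weight q under ME."""
    return (lam * (sigma_z**2 + sigma_u**2)
            + q**2 * lam * (sigma_x**2 + sigma_v**2)
            - 2.0 * q * lam * rho_zx * sigma_z * sigma_x)

# Population 1 values reported in Sect. 6.1
N, n = 250, 80
lam = 1.0 / n - 1.0 / N
pop1 = dict(lam=lam, sigma_z=12.4384, sigma_x=8.6517,
            sigma_u=6.9926, sigma_v=8.9946, rho_zx=0.5557)
Z_bar, X_bar = 20.0074, 40.0029

mse_mean = mse_with_me(0.0, **pop1)              # q_1 = 0: sample mean of z
mse_ratio = mse_with_me(Z_bar / X_bar, **pop1)   # q_2 = Z_bar / X_bar: ratio estimator

# Setting sigma_u = sigma_v = 0 shows how much of the MSE is due to measurement error.
no_me = dict(pop1, sigma_u=0.0, sigma_v=0.0)
mse_mean_no_me = mse_with_me(0.0, **no_me)

print(f"MSE(mu_1) = {mse_mean:.4f}, MSE(mu_2) = {mse_ratio:.4f}")
print(f"MSE(mu_1) without ME = {mse_mean_no_me:.4f}")
```

The measurement error variances enter only through the terms $\sigma _{z}^{2}+\sigma _{u}^{2}$ and $\sigma _{x}^{2}+\sigma _{v}^{2}$, so ME always inflates the MSE while leaving the covariance term unchanged.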

3 Proposed Estimator

Ratio estimators usually do not perform better than regression-type estimators. To obtain a better estimate than Sousa et al. [9], we therefore propose a class of difference-type randomized response estimators of the population mean of a sensitive variable Y, using a non-sensitive auxiliary variable X in the presence of measurement error. The proposed estimator is

$$\begin{aligned} t_{(\delta )}= \left[ \eta _{1}\bar{z} + \eta _{2}(\bar{X}-\bar{x})\right] \exp \left[ \frac{\delta (\bar{X}-\bar{x})}{\bar{X}+\bar{x}}\right] \end{aligned}$$

(6)

where $\eta _{1}$ and $\eta _{2}$ are constants to be optimized for minimum MSE of $t_{(\delta )}$, and $\delta $ is a suitable constant that takes real or parametric values. By giving different values to $\delta $, we can generate new members of the class of estimators $t_{(\delta )}$; e.g., for $\delta =c$ (a constant), the member estimator is denoted by $t_{(c)}$.
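Eq. (6) translates directly into code. The sketch below is a plain Python transcription; the function name `t_delta` and the numerical inputs are hypothetical and chosen only to show two easy special cases.

```python
import math

def t_delta(z_bar, x_bar, X_bar, eta1, eta2, delta):
    """Eq. (6): difference-type estimator multiplied by an exponential correction."""
    diff = X_bar - x_bar
    return (eta1 * z_bar + eta2 * diff) * math.exp(delta * diff / (X_bar + x_bar))

# Case 1: the sample mean of x equals the population mean, so both corrections vanish
# and eta1 = 1 simply returns z_bar.
no_correction = t_delta(z_bar=19.6, x_bar=40.0, X_bar=40.0, eta1=1.0, eta2=0.5, delta=1.0)

# Case 2: delta = 0 switches off the exponential factor and leaves the difference estimator.
difference_only = t_delta(z_bar=19.6, x_bar=39.0, X_bar=40.0, eta1=1.0, eta2=0.5, delta=0.0)

print(no_correction, difference_only)
```

In practice $\eta _{1}$ and $\eta _{2}$ would be set to the optimum values derived in Sect. 4, which depend on population quantities that are assumed known or estimated.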

4 Bias and MSE

To derive the bias and mean squared error (MSE) of the proposed class of estimators, let

$\omega _{z}=\sum _{i=1}^{n}\left( Z_{i}-\bar{Z}\right) $, $\omega _{u}=\sum _{i=1}^{n}u_{i}$, $\omega _{x}=\sum _{i=1}^{n}\left( X_{i}-\bar{X}\right) $ and $\omega _{v}=\sum _{i=1}^{n}v_{i}$, where the sums run over the n sampled units.

Adding $\omega _{z}$ and $\omega _{u}$, we get $\omega _{z} + \omega _{u} =\sum _{i=1}^{n}\left( Z_{i}-\bar{Z}\right) + \sum _{i=1}^{n}u_{i}$. Dividing both sides by n and using $z_{i}=Z_{i}+u_{i}$, we get

$$\begin{aligned} \bar{z}= \bar{Z} + \xi _{z} ~~ \text {where} ~~ \xi _{z}= \frac{1}{n}\left( \omega _{z} + \omega _{u}\right) . \end{aligned}$$

Similarly, using $\omega _{x}$ and $\omega _{v}$ we get

$$\begin{aligned} \bar{x}= \bar{X} + \xi _{x} ~~ \text {where} ~~ \xi _{x}= \frac{1}{n}\left( \omega _{x} + \omega _{v}\right) . \end{aligned}$$

The error terms used to obtain the bias and MSE of the estimators are $e_{z}=\frac{\bar{z}-\bar{Z}}{\bar{Z}}=\frac{\xi _{z}}{\bar{Z}}$ and $e_{x}=\frac{\bar{x}-\bar{X}}{\bar{X}}=\frac{\xi _{x}}{\bar{X}}$.

The expected values are $E(e_{z})=E(e_{x})=0$, $E(e_{z}^{2})=\frac{\lambda (\sigma _{z}^{2}+\sigma _{u}^{2})}{\bar{Z}^{2}}$, $E(e_{x}^{2})=\frac{\lambda (\sigma _{x}^{2}+\sigma _{v}^{2})}{\bar{X}^{2}}$ and $E(e_{z}e_{x})=\frac{\lambda \rho _{zx} \sigma _{z}\sigma _{x}}{\bar{Z}\bar{X}}$.

Expressing the proposed estimator $t_{(\delta )}$ in terms of the errors $e_{z}$ and $e_{x}$, we have

$$\begin{aligned} t_{(\delta )}= & {} \left[ \eta _{1}\bar{Z}(1+e_{z}) + \eta _{2}\left\{ \bar{X}-\bar{X}(1+e_{x})\right\} \right] \exp \left[ \frac{\delta \{\bar{X}-\bar{X}(1+e_{x})\}}{\bar{X}+\bar{X}(1+e_{x})}\right] \\ t_{(\delta )}= & {} \left[ \eta _{1}\bar{Z}+\eta _{1}\bar{Z}e_{z} - \eta _{2}\bar{X}e_{x}\right] \exp \left[ \frac{-\delta e_{x}}{2} \left( 1+\frac{e_{x}}{2}\right) ^{-1}\right] . \end{aligned}$$

Assuming $ \mid e_{x} \mid < 1$, expanding the above equation, dropping the terms in which the e's have degree greater than two, and using $\bar{Z}=\bar{Y}$, we get

$$\begin{aligned} t_{(\delta )}-\bar{Y}= & {} (\eta _{1}-1)\bar{Y} + \eta _{1}\bar{Y}e_{z} - \left( \eta _{2}\bar{X}+ \frac{\eta _{1}\bar{Y} \delta }{2} \right) e_{x} \nonumber \\&+ \left[ \frac{\eta _{2}\bar{X} \delta }{2}+ \left( 1+ \frac{\delta }{2}\right) \frac{\eta _{1}\bar{Y} \delta }{4} \right] e_{x}^{2} \nonumber \\&- \frac{\eta _{1}\bar{Y} \delta }{2}e_{z}e_{x}. \end{aligned}$$

(7)
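The intermediate step behind Eq. (7) is the second-order expansion of the exponential factor. Written out, under the same assumption $\mid e_{x}\mid <1$ and keeping terms up to $e_{x}^{2}$, it reads

$$\begin{aligned} \left( 1+\frac{e_{x}}{2}\right) ^{-1}&\approx 1-\frac{e_{x}}{2}, \\ \exp \left[ \frac{-\delta e_{x}}{2}\left( 1+\frac{e_{x}}{2}\right) ^{-1}\right]&\approx \exp \left[ -\frac{\delta e_{x}}{2}+\frac{\delta e_{x}^{2}}{4}\right] \\&\approx 1-\frac{\delta }{2}e_{x}+\frac{\delta }{4}\left( 1+\frac{\delta }{2}\right) e_{x}^{2}. \end{aligned}$$

Multiplying this by $\eta _{1}\bar{Y}+\eta _{1}\bar{Y}e_{z}-\eta _{2}\bar{X}e_{x}$ and again discarding terms of degree higher than two gives the coefficients of $e_{x}$, $e_{x}^{2}$ and $e_{z}e_{x}$ that appear in Eq. (7).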

Taking expectation on both sides of Eq. (7), we get the bias.

$$\begin{aligned} Bias(t_{(\delta )})&=(\eta _{1}-1)\bar{Y}+\frac{1}{2\bar{X}} \left[ \left\{ \eta _{2} \delta + \left( 1+ \frac{\delta }{2}\right) \frac{\eta _{1}\bar{Y} \delta }{2\bar{X}} \right\} \right. \nonumber \\ {}&\quad \times \left. \lambda (\sigma _{x}^{2}+\sigma _{v}^{2}) - \eta _{1} \delta \lambda \rho _{zx} \sigma _{z}\sigma _{x}\right] . \end{aligned}$$

(8)

Squaring Eq. (7), dropping the terms in which the e's have degree greater than two, and simplifying, we get

$$\begin{aligned} (t_{(\delta )}-\bar{Y})^{2}= & {} (\eta _{1}-1)^{2}\bar{Y}^{2} + \eta _{1}^{2}\bar{Y}^{2}e_{z}^{2} \nonumber \\&+ \left[ \eta _{2}^{2} + \eta _{1}^{2}\left( \frac{\delta R^{2}}{2}+\frac{\delta ^{2} R^{2}}{4} \right) + 2\eta _{1}\eta _{2}\delta R \right] \bar{X}^{2}e_{x}^{2} \nonumber \\&-[ 2\eta _{1}\eta _{2}+ 2\delta \eta _{1}^{2}R]\bar{X}\bar{Y}e_{z}e_{x} \end{aligned}$$

(9)

where $R=\frac{\bar{Y}}{\bar{X}}$.

Taking expectation on both sides of Eq. (9), we get

$$\begin{aligned} MSE(t_{(\delta )})= & {} (\eta _{1}-1)^{2}\bar{Y}^{2} + \eta _{1}^{2}\nu _{1} \nonumber \\&+ \left[ \eta _{2}^{2} + \eta _{1}^{2}\left( \frac{\delta R^{2}}{2}+\frac{\delta ^{2} R^{2}}{4} \right) + 2\eta _{1}\eta _{2}\delta R \right] \nu _{2} -[ 2\eta _{1}\eta _{2}+ 2\delta \eta _{1}^{2}R]\nu _{3}\nonumber \\ \end{aligned}$$

(10)

where $\nu _{1}=\lambda (\sigma _{z}^{2}+\sigma _{u}^{2})$, $\nu _{2}=\lambda (\sigma _{x}^{2}+\sigma _{v}^{2})$ and $\nu _{3}=\lambda \rho _{zx}\sigma _{z}\sigma _{x}$. Collecting the terms in $\eta _{1}$ and $\eta _{2}$, Eq. (10) can be written as

$$\begin{aligned} MSE(t_{(\delta )})= & {} \bar{Y}^{2} -2 \bar{Y}^{2} \eta _{1} \nonumber \\&+ \left[ \bar{Y}^{2}+ \nu _{1} + \left( \frac{\delta R^{2}}{2}+\frac{\delta ^{2} R^{2}}{4} \right) \nu _{2} - 2\delta R \nu _{3} \right] \eta _{1}^{2} + \nu _{2} \eta _{2}^{2}\nonumber \\&+ (2\delta R \nu _{2}-2\nu _{3}) \eta _{1} \eta _{2}. \end{aligned}$$

(11)

The optimum values of $\eta _{1}$ and $\eta _{2}$ to minimize $MSE(t_{(\delta )})$ are

$\eta _{1_{O}}= \frac{4\bar{Y}^{2}\nu _{2}}{4\bar{Y}^{2}\nu _{2}+4\nu _{1}\nu _{2}-4\nu _{3}^{2}+2\delta R^{2}\nu _{2}^{2}-3\delta ^{2}R^{2}\nu _{2}^{2}}$ and $\eta _{2_{O}}= \frac{4\bar{Y}^{2}\nu _{3}-4\bar{Y}^{2}\delta R \nu _{2}}{4\bar{Y}^{2}\nu _{2}+4\nu _{1}\nu _{2}-4\nu _{3}^{2}+2\delta R^{2}\nu _{2}^{2}-3\delta ^{2}R^{2}\nu _{2}^{2}}$.
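One way to see where these expressions come from is to treat Eq. (11) as a quadratic in $\eta _{1}$ and $\eta _{2}$ and solve the two normal equations. The shorthand A, B and C below is introduced only for this sketch and is not notation from the original derivation.

$$\begin{aligned} A&=\bar{Y}^{2}+\nu _{1}+\left( \frac{\delta R^{2}}{2}+\frac{\delta ^{2}R^{2}}{4}\right) \nu _{2}-2\delta R\nu _{3},\quad B=\nu _{2},\quad C=\delta R\nu _{2}-\nu _{3}, \\ \frac{\partial MSE(t_{(\delta )})}{\partial \eta _{1}}&=-2\bar{Y}^{2}+2A\eta _{1}+2C\eta _{2}=0, \\ \frac{\partial MSE(t_{(\delta )})}{\partial \eta _{2}}&=2B\eta _{2}+2C\eta _{1}=0, \\ \eta _{1_{O}}&=\frac{\bar{Y}^{2}B}{AB-C^{2}},\qquad \eta _{2_{O}}=-\frac{C}{B}\,\eta _{1_{O}}, \\ 4(AB-C^{2})&=4\bar{Y}^{2}\nu _{2}+4\nu _{1}\nu _{2}-4\nu _{3}^{2}+2\delta R^{2}\nu _{2}^{2}-3\delta ^{2}R^{2}\nu _{2}^{2}. \end{aligned}$$

Substituting the last line into $\eta _{1_{O}}$ and $\eta _{2_{O}}$ reproduces the stated optimum values, and at the optimum the quadratic reduces to $\bar{Y}^{2}-\bar{Y}^{2}\eta _{1_{O}}$.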

For optimum values of $\eta _{1}$ and $\eta _{2}$, the minimum MSE of $t_{(\delta )}$ is

$$\begin{aligned} MSE_{min}(t_{(\delta )})= \bar{Y}^{2} - \frac{4\bar{Y}^{4}\nu _{2}}{4\bar{Y}^{2}\nu _{2}+4\nu _{1}\nu _{2}-4\nu _{3}^{2}+2\delta R^{2}\nu _{2}^{2}-3\delta ^{2}R^{2}\nu _{2}^{2}}. \end{aligned}$$

(12)

Since $MSE_{min}(t_{(\delta )})$ depends on $\delta $, the optimum MSE of any member $t_{(c)}$ of the class of estimators $t_{(\delta )}$ is obtained by putting $\delta =c$ in Eq. (12), giving $MSE_{min}(t_{(c)})$.
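The optimum constants and Eq. (12) can be checked numerically. The Python sketch below evaluates them for the Population 1 values of Sect. 6.1 (taking $\bar{Y}=\bar{Z}$) and confirms that Eq. (11), evaluated at the optimum, agrees with Eq. (12). The function names `nu_terms`, `mse_quadratic` and `optimum` are hypothetical helpers.

```python
def nu_terms(lam, sigma_z, sigma_x, sigma_u, sigma_v, rho_zx):
    """nu_1, nu_2, nu_3 as defined below Eq. (10)."""
    return (lam * (sigma_z**2 + sigma_u**2),
            lam * (sigma_x**2 + sigma_v**2),
            lam * rho_zx * sigma_z * sigma_x)

def mse_quadratic(eta1, eta2, delta, Y_bar, R, nu1, nu2, nu3):
    """Eq. (11): MSE of t_(delta) as a quadratic in eta1 and eta2."""
    a = (Y_bar**2 + nu1
         + (delta * R**2 / 2 + delta**2 * R**2 / 4) * nu2
         - 2 * delta * R * nu3)
    return (Y_bar**2 - 2 * Y_bar**2 * eta1 + a * eta1**2
            + nu2 * eta2**2 + (2 * delta * R * nu2 - 2 * nu3) * eta1 * eta2)

def optimum(delta, Y_bar, R, nu1, nu2, nu3):
    """Optimum eta1, eta2 and the minimum MSE of Eq. (12)."""
    d = (4 * Y_bar**2 * nu2 + 4 * nu1 * nu2 - 4 * nu3**2
         + 2 * delta * R**2 * nu2**2 - 3 * delta**2 * R**2 * nu2**2)
    eta1 = 4 * Y_bar**2 * nu2 / d
    eta2 = (4 * Y_bar**2 * nu3 - 4 * Y_bar**2 * delta * R * nu2) / d
    return eta1, eta2, Y_bar**2 - 4 * Y_bar**4 * nu2 / d

# Population 1 values reported in Sect. 6.1 (Y_bar is taken equal to Z_bar)
lam = 1 / 80 - 1 / 250
nu1, nu2, nu3 = nu_terms(lam, 12.4384, 8.6517, 6.9926, 8.9946, 0.5557)
Y_bar, X_bar = 20.0074, 40.0029
R = Y_bar / X_bar

for delta in (-2, -1, 0, 1, 2):
    eta1, eta2, mse_min = optimum(delta, Y_bar, R, nu1, nu2, nu3)
    direct = mse_quadratic(eta1, eta2, delta, Y_bar, R, nu1, nu2, nu3)
    print(f"delta={delta:+d}: eta1={eta1:.4f}, eta2={eta2:.4f}, "
          f"MSE_min={mse_min:.4f}, Eq.(11) at optimum={direct:.4f}")
```

Moving either constant away from its optimum increases the value of Eq. (11), which is a quick sanity check that the stationary point is a minimum for these inputs.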

5 Efficiency Comparison

The proposed class of estimators $t_{(\delta )}$ is more efficient than the estimators $\hat{\mu }_{i}$, $i=1,2,\ldots ,6$, whenever the condition $MSE_{min}(t_{(\delta )})<MSE(\hat{\mu }_{i})$ is satisfied.

From Eqs. (5) and (12), we find that $MSE_{min}(t_{(\delta )})<MSE(\hat{\mu }_{i})$ if

$$\begin{aligned} \bar{Y}^{2}+2q_{i}\nu _{3} < \nu _{1}+q_{i}^{2}\nu _{2}+\frac{4\bar{Y}^{4}\nu _{2}}{4\bar{Y}^{2}\nu _{2}+4\nu _{1}\nu _{2}-4\nu _{3}^{2}+2\delta R^{2}\nu _{2}^{2}-3\delta ^{2}R^{2}\nu _{2}^{2}} \end{aligned}$$

where $i=1,2,\ldots ,6$ and $\nu _{1}=\lambda (\sigma _{z}^{2}+\sigma _{u}^{2})$, $\nu _{2}=\lambda (\sigma _{x}^{2}+\sigma _{v}^{2})$, $\nu _{3}=\lambda \rho _{zx}\sigma _{z}\sigma _{x}$.
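The condition can be evaluated directly for given population values. The following Python sketch checks it for Population 1 of Sect. 6.1, for all six weights $q_{i}$ as listed below Eq. (5) and for $\delta =-2,-1,0,1,2$. The function name `proposed_beats` is hypothetical, and $\bar{Y}$ is taken equal to $\bar{Z}$.

```python
def proposed_beats(q, delta, Y_bar, R, nu1, nu2, nu3):
    """True when the efficiency condition of Sect. 5 holds for weight q and constant delta."""
    denom = (4 * Y_bar**2 * nu2 + 4 * nu1 * nu2 - 4 * nu3**2
             + 2 * delta * R**2 * nu2**2 - 3 * delta**2 * R**2 * nu2**2)
    lhs = Y_bar**2 + 2 * q * nu3
    rhs = nu1 + q**2 * nu2 + 4 * Y_bar**4 * nu2 / denom
    return lhs < rhs

# Population 1 values reported in Sect. 6.1
lam = 1 / 80 - 1 / 250
sigma_z, sigma_x, sigma_u, sigma_v, rho_zx = 12.4384, 8.6517, 6.9926, 8.9946, 0.5557
nu1 = lam * (sigma_z**2 + sigma_u**2)
nu2 = lam * (sigma_x**2 + sigma_v**2)
nu3 = lam * rho_zx * sigma_z * sigma_x
Y_bar, X_bar, b1, b2 = 20.0074, 40.0029, 0.00085, 2.97348
R = Y_bar / X_bar

# q_1..q_6 exactly as listed below Eq. (5)
q = {1: 0.0, 2: Y_bar / X_bar, 3: X_bar / (X_bar + b1), 4: X_bar / (X_bar + b2),
     5: b1 * X_bar / (b1 * X_bar + b2), 6: b2 * X_bar / (b2 * X_bar + b1)}

checks = {(i, d): proposed_beats(qi, d, Y_bar, R, nu1, nu2, nu3)
          for i, qi in q.items() for d in (-2, -1, 0, 1, 2)}
print(all(checks.values()))
```

The check is only as good as the first-order approximations behind Eqs. (5) and (12), so it indicates where the proposed class is expected to win rather than proving it for finite samples.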

6 Monte Carlo Simulation

In this section, we carry out a simulation to compare the performance of the proposed estimator with that of the existing estimators. For that, we generated three different populations using R software. The pair (X, Y) is drawn from a bivariate normal population, and the scrambling variable S is used to obtain the scrambled observation $Z=Y+S$. In addition, there are measurement errors u and v on z and x, respectively. Here, $\mu = \begin{bmatrix} \bar{X} \\ \bar{Y} \end{bmatrix}$ is the population mean vector and $\Sigma $ is the covariance matrix. We made 15000 replications to get reliable results. The input data used to generate the three populations and their descriptive statistics are given below.

6.1 Population 1

$N=250,~ n=80,~ \mu = \begin{bmatrix} 40 \\ 20 \end{bmatrix},~ \Sigma = \begin{bmatrix} 75 & 60 \\ 60 & 55 \end{bmatrix}$, $S=\text {rnorm}(N,0,10)$, $u=\text {rnorm}(N,0,7)$ and $v=\text {rnorm}(N,0,9)$. Derived parametric values are $\bar{Z}=20.0074$, $\bar{X}=40.0029$, $\sigma _{z}=12.4384$, $\sigma _{x}=8.6517$, $\sigma _{u}=6.9926$, $\sigma _{v}=8.9946$, $\rho _{zx}=0.5557$, $\beta _{1}(x)=0.00085$, $\beta _{2}(x)=2.97348$.

6.2 Population 2

$N=1000,~ n=150,~ \mu = \begin{bmatrix} 2300 \\ 200 \end{bmatrix},~ \Sigma = \begin{bmatrix} 12000 & 2900 \\ 2900 & 1500 \end{bmatrix}$, $S=\text {rnorm}(N,0,25)$, $u=\text {rnorm}(N,0,10)$ and $v=\text {rnorm}(N,0,13)$. Derived parametric values are $\bar{Z}=200.0065$, $\bar{X}=2300.0016$, $\sigma _{z}=41.5169$, $\sigma _{x}=109.5119$, $\sigma _{u}=9.9999$, $\sigma _{v}=12.9915$, $\rho _{zx}=0.6370$, $\beta _{1}(x)=0.00116$, $\beta _{2}(x)=2.99341$.

6.3 Population 3

$N=100,~ n=25,~ \mu = \begin{bmatrix} 15 \\ 3 \end{bmatrix},~ \Sigma = \begin{bmatrix} 20 & 12 \\ 12 & 14 \end{bmatrix}$, $S=\text {rnorm}(N,0,2)$, $u=\text {rnorm}(N,0,3)$ and $v=\text {rnorm}(N,0,5)$. Derived parametric values are $\bar{Z}=3.0047$, $\bar{X}=15.0018$, $\sigma _{z}=4.2342$, $\sigma _{x}=4.4610$, $\sigma _{u}=2.9902$, $\sigma _{v}= 4.9886$, $\rho _{zx}=0.6304$, $\beta _{1}(x)=0.00301$, $\beta _{2}(x)=2.93465$.

The percent relative efficiency (PRE) of an estimator with respect to RRT sample mean per unit estimator $\hat{\mu }_{1}=\bar{z}$ is defined as

$$\begin{aligned} PRE(\hat{\mu }_{1},~.)= \frac{MSE(\hat{\mu }_{1})}{MSE(.)}\times 100. \end{aligned}$$

(13)

Table 1 PRE of the estimators with respect to $\hat{\mu }_{1}$


The steps of the simulation process are:

Step 1 Using R software, generate a random population by giving inputs to N, $\mu $, $\sum $, S, u and v.
Step 2 Use scrambling variable S as $Z=Y+S$ to make study variable sensitive.
Step 3 Derive the required parametric values from the generated population.
Step 4 To get stable parameter values, repeat Step 1 to Step 3 15000 times and record the values from each replication.
Step 5 Use the average of 15000 values to calculate the MSEs of the estimators.
Step 6 Calculate PREs of the estimators by using MSEs from Step 5 and Eq. (13).
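The steps above can be sketched in code. The study itself used R; the version below is a rough Python translation using only the standard library, with Population 1 inputs, and is not the authors' script. Several choices are assumptions of this sketch: the function names, the seed, the use of 200 replications instead of 15000 to keep it fast, and averaging variances and the covariance (so that $\nu _{3}=\lambda \,\mathrm {cov}(Z,X)$) rather than $\rho _{zx}$, $\sigma _{z}$ and $\sigma _{x}$ separately. Only $\hat{\mu }_{1}$, $\hat{\mu }_{2}$ and $t_{(\delta )}$ are compared.

```python
import math
import random
import statistics

def generate_population(N, mu, cov, sd_s, sd_u, sd_v, rng):
    """Steps 1-2: draw (X, Y) from a bivariate normal and scramble Y into Z = Y + S."""
    (mean_x, mean_y), ((var_x, cov_xy), (_, var_y)) = mu, cov
    l11 = math.sqrt(var_x)               # Cholesky factor of the 2x2 covariance matrix
    l21 = cov_xy / l11
    l22 = math.sqrt(var_y - l21 ** 2)
    X, Z, u, v = [], [], [], []
    for _ in range(N):
        a, b = rng.gauss(0, 1), rng.gauss(0, 1)
        X.append(mean_x + l11 * a)
        Z.append(mean_y + l21 * a + l22 * b + rng.gauss(0, sd_s))
        u.append(rng.gauss(0, sd_u))
        v.append(rng.gauss(0, sd_v))
    return X, Z, u, v

def derive_parameters(X, Z, u, v):
    """Step 3: population quantities that enter the MSE expressions."""
    z_bar, x_bar = statistics.fmean(Z), statistics.fmean(X)
    cov_zx = statistics.fmean([(z - z_bar) * (x - x_bar) for z, x in zip(Z, X)])
    return [z_bar, x_bar, statistics.pvariance(Z), statistics.pvariance(X),
            statistics.pvariance(u), statistics.pvariance(v), cov_zx]

def simulate_pre(N, n, mu, cov, sd_s, sd_u, sd_v, deltas, reps, seed):
    """Steps 4-6: average the parameters over replications, then compute PREs."""
    rng = random.Random(seed)
    runs = [derive_parameters(*generate_population(N, mu, cov, sd_s, sd_u, sd_v, rng))
            for _ in range(reps)]
    z_bar, x_bar, var_z, var_x, var_u, var_v, cov_zx = (
        statistics.fmean(col) for col in zip(*runs))
    lam = 1 / n - 1 / N
    nu1, nu2, nu3 = lam * (var_z + var_u), lam * (var_x + var_v), lam * cov_zx
    R = z_bar / x_bar
    mse = {"mu_1": nu1, "mu_2": nu1 + R ** 2 * nu2 - 2 * R * nu3}   # Eq. (5)
    for d in deltas:                                                 # Eq. (12)
        denom = (4 * z_bar ** 2 * nu2 + 4 * nu1 * nu2 - 4 * nu3 ** 2
                 + 2 * d * R ** 2 * nu2 ** 2 - 3 * d ** 2 * R ** 2 * nu2 ** 2)
        mse[f"t({d})"] = z_bar ** 2 - 4 * z_bar ** 4 * nu2 / denom
    return {name: 100 * mse["mu_1"] / value for name, value in mse.items()}  # Eq. (13)

# Population 1 inputs; 200 replications instead of 15000 keeps this sketch fast.
pre = simulate_pre(N=250, n=80, mu=(40, 20), cov=((75, 60), (60, 55)),
                   sd_s=10, sd_u=7, sd_v=9, deltas=(-2, -1, 0, 1, 2),
                   reps=200, seed=1)
for name, value in pre.items():
    print(f"{name:>6}: PRE = {value:8.2f}")
```

Because the sketch differs from the original R workflow in the ways listed above, its PRE values should not be expected to match Table 1 exactly.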

From Table 1, we can see that the PRE values of the proposed estimator $t_{(\delta )}$ for $\delta =-2,~-1,~ 0, ~ 1, ~ 2$ are higher than those of all the existing estimators considered, $\hat{\mu }_{i}$, $i=1,2,\ldots ,6$. We conclude that the proposed class of estimators $t_{(\delta )}$ is more efficient than the existing estimators.

7 Conclusion

The article presents a study on the estimation of the population mean of a sensitive variable under the influence of measurement error. An estimator of the population mean is proposed under the randomized response technique with measurement error. The expressions for the bias and mean squared error are derived up to the first order of approximation. A theoretical comparison is made with the existing estimators, and a simulation study is performed to illustrate the results numerically. The theoretical and simulation results show that the proposed class of estimators is more efficient than the estimators of Sousa et al. [9].

References

1. Warner SL (1965) Randomized response: a survey technique for eliminating evasive answer bias. J Am Stat Assoc 60(309):63–69. https://doi.org/10.1080/01621459.1965
2. Pollock KH, Bek Y (1976) A comparison of three randomized response models for quantitative data. J Am Stat Assoc 71(356):884–886. https://doi.org/10.2307/2286855
3. Himmelfarb S, Edgell SE (1980) Additive constants model: a randomized response technique for eliminating evasiveness to quantitative response questions. Psychol Bull 87(3):525–530. https://doi.org/10.1037/0033-2909.87.3.525
4. Eichhorn BH, Hayre LS (1983) Scrambled randomized response methods for obtaining sensitive quantitative data. J Statist Plan Inference 7(4):307–316. https://doi.org/10.1016/0378-3758(83)90002-2
5. Gupta SN, Gupta BC, Singh S (2002) Estimation of sensitivity level of personal interview survey questions. J Statist Plan Inference 100(2):239–247. https://doi.org/10.1016/S0378-3758(01)00137-9
6. Gupta S, Samridhi M, Shabbir J, Khalil S (2018) A unified measure of respondent privacy and model efficiency in quantitative rrt models. J Statist Theory Practice 12(3):506–511. https://doi.org/10.1080/15598608.2017.1415175
7. Wu JW, Tian GL, Tang ML (2008) Two new models for survey sampling with sensitive characteristic: design and analysis. Metrika 67:251–263. https://doi.org/10.1007/s00184-007-0131-x
8. Gupta S, Shabbir J, Sehra S (2010) Mean and sensitivity estimation in optional randomized response models. J Statist Plan Inference 140(10):2870–2874. https://doi.org/10.1016/j.jspi.2010.03.010
9. Sousa R, Shabbir J, Real PC, Gupta S (2010) Ratio estimation of the mean of a sensitive variable in the presence of auxiliary information. J Statist Theory Practice 4(3):495–507. https://doi.org/10.1080/15598608.2010.10411999
10. Gupta S, Shabbir J, Sousa R, Real PC (2012) Estimation of the mean of a sensitive variable in the presence of auxiliary information. Commun Statist-Theory Methods 41(13–14):2394–2404. https://doi.org/10.1080/03610926.2011.641654
11. Koyuncu N, Gupta S, Sousa R (2014) Exponential type estimators of the mean of a sensitive variable in the presence of non-sensitive auxiliary information. Commun Statist-Simul Comput 43(7):1583–1594. https://doi.org/10.1080/03610918.2012.737492
12. Tarray TA, Singh HP (2017) An improved estimation procedure of the mean of a sensitive variable using auxiliary information. Biostat Biometr Open Acc J 3(2):26–33. https://doi.org/10.19080/BBOAJ.2017.03.555607
13. Shahzad U, Hanif M, Koyuncu N, Luengo AVG (2019) A regression type estimators for mean estimation under ranked set sampling alongside the sensitivity issue. Commun Fac Sci Univ Ank Ser A1 Math Stat 68(2):2037–2049. https://doi.org/10.31801/cfsuasmas.586057
14. Mushtaq N, Noor-ul-Amin M (2020) Joint influence of double sampling and randomized response technique on estimation method of mean. Appl Math 10(1):12–19. https://doi.org/10.5923/j.am.20201001.03
15. Saleem I, Sanaullah A (2021) Estimation of mean of a sensitive variable using efficient exponential-type estimators in stratified sampling. J Statist Comput Simul 92:232–248. https://doi.org/10.1080/00949655.2021.1940182
16. Su S, Salinas VI, Zamora ML, Sedory SA, Singh S (2021) Randomized response sampling with applications to tracking drugs for better life. Stat Sin 31:1–20. https://doi.org/10.5705/ss.202016.0177
17. Blattman C, Jamison J, Koroknay-Palicz T, Rodrigues K, Sheridan M (2016) Measuring the measurement error: a method to qualitatively validate survey data. J Dev Econ 120:99–112. https://doi.org/10.1016/j.jdeveco.2016.01.005
18. Khalil S, Gupta S, Hanif M (2018) Estimation of finite population mean in stratified sampling using scrambled responses in the presence of measurement errors. Commun Statist - Theory Methods 48(6):1553–1561. https://doi.org/10.1080/03610926.2018.1435817
19. Khalil S, Zhang Q, Gupta S (2019) Mean estimation of sensitive variables under measurement errors using optional rrt models. Commun Statist - Simul Comput 50:1417–1426. https://doi.org/10.1080/03610918.2019.1584298
20. Zahid E, Shabbir J (2019) Estimation of finite population mean for a sensitive variable using dual auxiliary information in the presence of measurement errors. PLoS One 14(2):e0212111. https://doi.org/10.1371/journal.pone.0212111
21. Onyango R, Oduor B, Odundo F (2021) Joint influence of measurement errors and randomized response technique on mean estimation under stratified double sampling. Open J Math Sci 5:192–199. https://doi.org/10.30538/oms2021.0156
22. Zhang Q, Khalil S, Gupta S (2021) Mean estimation of sensitive variables under non-response and measurement errors using optional rrt models. J Statist Theory Practice 15:1–15. https://doi.org/10.1007/s42519-020-00135-2
23. Khalil S, Noor-ul-Amin M, Hanif M (2018) Estimation of population mean for a sensitive variable in the presence of measurement error. Statist Manag Syst 21(1):81–91. https://doi.org/10.1080/09720510.2017.1367478
24. Zhang Q, Khalil S, Gupta S (2021) Mean estimation in the simultaneous presence of measurement errors and non-response using optional rrt models under stratified sampling. J Statist Comput Simul 91:3492–3504. https://doi.org/10.1080/00949655.2021.1941018
25. Zahid E, Shabbir J, Alamri OA (2022) A generalized class of estimators for sensitive variable in the presence of measurement error and non-response under stratified random sampling. J King Saud Univ - Sci 34:1–10. https://doi.org/10.1016/j.jksus.2021.101741
26. Yan Z, Wang J, Lai J (2008) An efficiency and protection degree-based comparison among the quantitative randomized response strategies. Commun Statist - Theory Methods 38(3):400–408. https://doi.org/10.1080/03610920802220785
27. Saleem I, Sanaullah A, Hanif M (2019) Double-sampling regression-cum-exponential estimator of the mean of a sensitive variable. Math Popul Stud 26(3):163–182. https://doi.org/10.1080/08898480.2019.1565273
28. Cochran WG (1940) The estimation of the yields of cereal experiments by sampling for the ratio of grain to total produce. J Agric Sci 30(2):262–275. https://doi.org/10.1017/S0021859600048012
29. Sisodia BVS, Dwivedi BK (1981) A modified ratio estimator using coefficient of variation of auxiliary variable. J Indian Soc Agric Statist 33:13–18
30. Upadhyaya LN, Singh HP (1999) Use of transformed auxiliary variable in estimating the finite population mean. Biom J 41(5):627–636
31. Singh GN (2003) On the improvement of product method of estimation in sample surveys. J Indian Soc Agric Statist 56(3):267–275
32. Singh HP, Tailor R (1999) Use of known correlation coefficient in estimating the finite population mean. Statist Trans 6:555–560
33. Singh HP, Tailor R, Tailor R, Kakran MS (2004) An improved estimator of population mean using power transformation. J Indian Soc Agric Statist 58:223–230
34. Kadilar C, Cingi H (2004) Ratio estimators in simple random sampling. Appl Math Comput 151(5):893–902. https://doi.org/10.1016/S0096-3003(03)00803-8
35. Yan Z, Tian B (2010) Ratio method to the mean estimation using coefficient of skewness of auxiliary variable. In: Zhu R, Zhang Y, Liu B, Liu C (eds) Inform Comput Appl. Springer, Berlin, Heidelberg, pp 103–110
36. Abid M, Abbas N, Nazir HZ, Lin Z (2016) Enhancing the mean ratio estimators for estimating population mean using non-conventional location parameters. Revista Colombiana Estadist 39(1):63–79. https://doi.org/10.15446/rce.v39n1.55139


Acknowledgements

The authors are thankful to Professor Sat N. Gupta, Editor-in-Chief and the learned referees for their suggestions, which improved the quality of the paper.

Author information

Kuldeep Kumar Tiwari, Sunil Kumar and Khalid Ul Islam Rather have contributed equally to this study.

Authors and Affiliations

School of Mathematics, Shri Mata Vaishno Devi University, Katra, Jammu and Kashmir, 182320, India
Kuldeep Kumar Tiwari & Sandeep Bhougal
Department of Statistics, University of Jammu, Jammu, Jammu and Kashmir, 180016, India
Sunil Kumar
Division of Statistics and Computer Science, SKUAST, Jammu, Jammu and Kashmir, 180009, India
Khalid Ul Islam Rather


Corresponding author

Correspondence to Sandeep Bhougal.

Ethics declarations

Conflict of interest

There is no conflict of interest to disclose.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.


About this article

Cite this article

Tiwari, K.K., Bhougal, S., Kumar, S. et al. Using Randomized Response to Estimate the Population Mean of a Sensitive Variable under the Influence of Measurement Error. J Stat Theory Pract 16, 28 (2022). https://doi.org/10.1007/s42519-022-00251-1


Accepted: 25 February 2022
Published: 06 April 2022
DOI: https://doi.org/10.1007/s42519-022-00251-1


Mathematics Subject Classification

62D05
