Abstract
In the present study, we propose a class of estimators for a finite population mean in the presence of non-response on the study variable and investigate their properties under the polynomial regression model (PRM) approach. Some special cases of the class are discussed separately to show how non-response versions of existing estimators can be generated from, and studied within, the general class. The model-based mean square errors of the estimators under different settings of the model are compared using empirical data.
1 Introduction
1.1 Significance of non-response
In sampling theory, it is generally assumed that the true value of each unit in the population can be determined without error. In practice, this assumption may be violated for various reasons and because of practical constraints that exist at the time of the survey. Non-response in sample surveys usually arises from a lack of interest on the part of respondents, persons not being at home, lack of knowledge about the survey or the questions asked, ethical issues, and refusal of respondents to answer the questionnaire. In postal surveys, questionnaires are mailed to the units selected in the sample with a request that they be returned within a specified period of time. In general, however, many respondents do not answer the questions or do not return the completed questionnaire within the specified time. In such cases, one can use the information obtained from the available questionnaires, but this may result in a loss of estimator efficiency due to the smaller sample size. The postal questionnaire method is very often used in sample surveys to reduce the cost of the survey. Because of the high non-response rate that occurs for the reasons mentioned above, a bias component creeps into the estimation procedure, making the results less accurate than they would be in the absence of non-response. For mail surveys, Hansen and Hurwitz (1946) proposed a method of subsampling the non-responding units and provided an estimator based on the values obtained from the responding units and from a subsample of non-responding units contacted by personal interview. The literature on the various sources of non-response and on methods of handling it can be found in Kish (1965), Kumar et al. (2019) and Singh and Singh (2022).
Cochran (1977) referred to the failure to measure some of the units in the selected sample, while Zarkovich (1966) and Ford (1976) referred to the same problem as missing data. Sudman (1976) addressed the problem of bias due to non-cooperation. Sukhatme and Sukhatme (1970) described the effects of incomplete samples. All of these approaches refer to the phenomenon of non-response. Sarndal et al. (2003), Groves (2004), Dillman et al. (2002) and Chaudhary and Kumar (2016) have also discussed various aspects of, and reasons for, non-response.
1.2 Model-based approach
In making inferences about a finite population on the basis of a probability sample, by measuring the sample elements, two types of problems may arise (Sarndal et al. 1992):
(a) Inference about the finite population itself.
(b) Inference about a model or a super-population from which the given population is thought to have been generated.
Case (a) can be dealt with by design-based inference, as is usual in most survey sampling problems, but case (b) creates a different situation in which the population is considered to be a random sample from a super-population that can be designated by a model \(D\). Among others, this approach was advocated by Brewer (1963), Royall (1971), Cassel et al. (1976), Singh et al. (2009), Basu (1958), Singh et al. (2017) and Sarndal et al. (2003). Recently, Ahmed and Shabbir (2019) discussed model-based estimation in the presence of non-ignorable non-response.
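Royall and Herson (1973a, b) suggested a particular type of super-population model, termed the PRM, which is described as
\(Y_{t} = \sum\limits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} } + \varepsilon_{t} \left[ {v\left( {x_{t} } \right)} \right]^{1/2} ,\quad t = 1,2,...,N\) (1)
with \(E_{D} \left( {Y_{t} } \right) = h(x_{t} ) = \sum\limits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} }\) and \(Var\left( {Y_{t} } \right) = \sigma^{2} v\left( {x_{t} } \right)\),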
where \(Y_{t}\) is the value of the study variable for the tth unit of the universe of size N, \(x_{t}\) is the non-negative value of the auxiliary variable X for the tth unit, \(\varepsilon_{t} ;t = 1,2,...,N\) are independent random errors with mean zero and variance \(\sigma^{2}\), and \(\delta_{j}\) (j = 0, 1, ..., J) is zero or one according as the term \(x_{t}^{j}\) is absent or present, respectively. In model (1), \(v\left( {x_{t} } \right)\) is a known function of the x-values and \(\beta_{j} ;j = 0,1,...,J\) are unknown model parameters. We denote this model by \(D\left[ {\delta_{0} ,\delta_{1} ,\delta_{2} ,...,\delta_{J} :v\left( x \right)} \right]\). Chambers (1986) described that, in both sample survey theory and practice, the mean of \(Y_{t}\) is proportional to \(x_{t}\) and the variance of \(Y_{t}\) is proportional to \(v\left( {x_{t} } \right)\).
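To make the notation of model (1) concrete, the following minimal Python sketch draws values of \(Y_{t}\) from it. Several things here are assumptions and not part of the paper: the errors are taken to be Gaussian (the model only requires mean zero and variance \(\sigma^{2}\)), the variance function is taken as \(v(x) = x^{g}\), and the function name `simulate_prm` and all numerical inputs are hypothetical.

```python
import random


def simulate_prm(x, delta, beta, g, sigma2, seed=0):
    """Draw Y_t = sum_j delta_j * beta_j * x_t**j + eps_t * sqrt(v(x_t)).

    Assumptions (not from the paper): eps_t is Gaussian with mean 0 and
    variance sigma2, and the variance function is v(x) = x**g.
    """
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    y = []
    for xt in x:
        # Polynomial part h(x_t); delta_j switches the j-th term on or off.
        h = sum(d * b * xt ** j for j, (d, b) in enumerate(zip(delta, beta)))
        eps = rng.gauss(0.0, sigma)
        y.append(h + eps * (xt ** g) ** 0.5)
    return y


# Hypothetical inputs: D[1, 1 : x] with made-up coefficients.
x_values = [10.0, 20.0, 30.0, 40.0]
y_values = simulate_prm(x_values, delta=[1, 1], beta=[0.5, 2.0], g=1, sigma2=0.8)
```

Setting `sigma2=0` returns the polynomial part \(h(x_{t})\) alone, which is a convenient way to check a chosen pair of `delta` and `beta`.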
2 Initiation of the problem
2.1 Let a finite population of size N, denoted by Ω, consist of \(N_{1}\) respondent and \(N_{2}\) non-respondent units. We select a sample of size n from the universe, which consists of \(n_{1}\) respondents and \(n_{2}\) non-respondents. For efficient estimation, we collect some information from the non-respondents; therefore, we select a sub-sample of \(h_{2}\) units from the \(n_{2}\) non-respondents. Let the sample of size n and the sub-sample of size \(h_{2}\) be denoted by s and \(s_{h_{2}}\) respectively. Further, let \(\Omega = s \cup \overline{s}\), where \(s\) and \(\overline{s}\) are two disjoint sets such that \(s\) represents the observed part of the universe and \(\overline{s}\) represents the missing part of the universe. Further, consider \(s = s_{1} \cup s_{2}\) (\(s_{1}\) and \(s_{2}\) being disjoint sets) and \(s_{2} = s_{{h_{2} }} \cup \overline{s}_{{h_{2} }}\), where \(\overline{s}_{{h_{2} }}\) is the portion of \(s_{2}\) not included in the sub-sample. We use the following notation:
Z = variable X or Y, and
\(\overline{Z} = N^{ - 1} \sum\limits_{\Omega } {z_{t} }\): population mean of the variable Z.
\(S_{Z}^{2} = (N - 1)^{ - 1} \sum\limits_{\Omega } {(z_{t} - \overline{Z})^{2} }\): population mean square of the variable Z.
Also sample means:
Higher order moments:
We considered for X
Obviously, we have seen that
For the problem of non-response, Hansen and Hurwitz (1946) suggested an estimator of the population mean \(\overline{Y}\), given by
They obtained the variance of the estimator as
where \(f_{2} = \frac{{n_{2} }}{{h_{2} }}\) and \(S_{2Y}^{2} = (N_{2} - 1)^{ - 1} \sum\limits_{t = 1}^{{N_{2} }} {(y_{t} - \overline{Y}_{{N_{2} }} )^{2} }\); \(\overline{Y}_{{N_{2} }}\) denotes the mean of the non-respondent part of the population.
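For orientation, the Hansen and Hurwitz (1946) estimator in its standard textbook form weights the respondent mean by \(n_{1}\) and the sub-sample mean of the non-respondents by \(n_{2}\), and its variance adds a sub-sampling term to the usual simple random sampling variance. The Python sketch below implements that standard form; the function names and the numbers in the usage lines are hypothetical and are not taken from the paper's data.

```python
def hansen_hurwitz_mean(y_respondents, y_subsample, n2):
    """Standard Hansen-Hurwitz estimate: (n1 * ybar_1 + n2 * ybar_h2) / n.

    y_respondents: values from the n1 responding units
    y_subsample:   values from the h2 sub-sampled non-respondents
    n2:            number of non-respondents in the original sample
    """
    n1 = len(y_respondents)
    n = n1 + n2
    ybar_1 = sum(y_respondents) / n1
    ybar_h2 = sum(y_subsample) / len(y_subsample)
    return (n1 * ybar_1 + n2 * ybar_h2) / n


def hansen_hurwitz_variance(N, n, N2, f2, S2_Y, S2_2Y):
    """Standard variance: (1/n - 1/N) * S_Y^2 + (N2/N) * (f2 - 1)/n * S_2Y^2,
    with f2 = n2 / h2 as defined in the text."""
    return (1.0 / n - 1.0 / N) * S2_Y + (N2 / N) * (f2 - 1.0) / n * S2_2Y


# Hypothetical illustration only.
estimate = hansen_hurwitz_mean([2.0, 4.0, 6.0], [10.0, 14.0], n2=3)
variance = hansen_hurwitz_variance(N=100, n=20, N2=30, f2=1.5, S2_Y=4.0, S2_2Y=2.0)
```

When every non-respondent is followed up (\(f_{2} = 1\)), the second variance term vanishes and only the simple random sampling part remains.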
3 Proposed family of strategy
Family \(T_{1}^{*} \left( \alpha \right)\): We first define a one-parameter family of estimators of the population mean in the presence of non-response
Remark 1
A particular case of the estimator \(T_{1}^{*} \left( \alpha \right)\) arises when \(\alpha\) = 0.
It is an extended form of the estimator developed by Hansen and Hurwitz (1946). Similarly, for \(\alpha\) = 1 and -1, we obtain exponential-type ratio and product estimators, respectively.
4 \(D\)-Bias and \(D\)-MSE of \(T_{1}^{*} \left( \alpha \right)\)
Under the model \(D\left[ {\delta_{0} ,\delta_{1} ,...,\delta_{J} :v\left( x \right)} \right]\), the D-bias and D-MSE of the estimator are obtained as follows.
Theorem 1
The D-bias of the estimator \(T_{1}^{*} \left( \alpha \right)\) is
The proof of Eq. (7) is given in Appendix A, Section I.
Theorem 2
The D-MSE of the estimator \(T_{1}^{*} \left( \alpha \right)\) is given by
The proof of Eq. (8) is given in Appendix A, Section II.
Remark 2
From Eq. (8), the D-MSE of \(T_{1}^{*} \left( \alpha \right)\) consists of two parts: one due to the D-bias and the other the variance of the estimator. The variance of the estimator does not depend on the function \(\sum\nolimits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} }\); it depends only on the error term \(\varepsilon_{t}\) and the variance function \(v\left( {x_{t} } \right)\). Hence a wrong choice of the polynomial regression function leaves the variance of the estimator unchanged.
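The split described in Remark 2 can be written out for any estimator whose error is linear in the \(Y_{t}\). As a sketch, assume \(T - \overline{Y} = \sum\nolimits_{\Omega } {c_{t} Y_{t} }\), where the weights \(c_{t}\) are hypothetical and depend only on the x-values (they are not taken from Eq. (8)). Under model (1), with uncorrelated errors,

\[
E_{D} \left[ {\left( {T - \overline{Y}} \right)^{2} } \right] = \left[ {\sum\limits_{\Omega } {c_{t} h\left( {x_{t} } \right)} } \right]^{2} + \sigma^{2} \sum\limits_{\Omega } {c_{t}^{2} v\left( {x_{t} } \right)} .
\]

The first term is the squared D-bias and involves only \(h\left( {x_{t} } \right)\); the second is the variance and involves only \(\sigma^{2}\) and \(v\left( {x_{t} } \right)\), which is why misspecifying the polynomial part changes the bias term but not the variance term.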
5 \(D\)-Bias and \(D\)-MSE of particular cases of \(T_{1}^{*} \left( \alpha \right)\)
5.1 Particular cases of the PRM are obtained by choosing specific forms for the functions \(h\left( {x_{t} } \right) = \sum\nolimits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} }\) and \(v\left( {x_{t} } \right)\). The particular forms of the model are given by
Models (9), (10) and (11) may be denoted respectively as \(D\left[ {0,\,\,1\,\,:x} \right]\), \(D\left[ {1,\,\,1:x^{2} } \right]\) and \(D\left[ {1,\,\,1,\,\,1\,\,:x^{g} } \right]\). The idea of a model-based variance function was developed by Cochran (Chambers 1986) and Brewer (1963), with \(v\left( {x_{t} } \right)\) assumed to be \(x_{t}^{g}\) for \(0 \le g \le 2\). Motivated by their contribution, we consider six different PRMs:
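As a reading aid, the three bracketed labels above can be expanded by substituting the listed \(\delta_{j}\) and \(v(x)\) into model (1). This is a direct reading of the notation, offered as a sketch under that assumption, and it is not copied from Eqs. (9) to (11):

\[
D\left[ {0,1:x} \right]:\quad Y_{t} = \beta_{1} x_{t} + \varepsilon_{t} x_{t}^{1/2}
\]

\[
D\left[ {1,1:x^{2} } \right]:\quad Y_{t} = \beta_{0} + \beta_{1} x_{t} + \varepsilon_{t} x_{t}
\]

\[
D\left[ {1,1,1:x^{g} } \right]:\quad Y_{t} = \beta_{0} + \beta_{1} x_{t} + \beta_{2} x_{t}^{2} + \varepsilon_{t} x_{t}^{g/2}
\]

In each case the entries before the colon switch the polynomial terms on or off, and the entry after the colon is the variance function whose square root multiplies the error.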
6 D-MSE of \(T_{1}^{*} \left( \alpha \right)\) under Models I-VI
The MSE of the family of estimators \(T_{1}^{*} \left( \alpha \right)\) under Models I–VI can be obtained from expression (8). These are presented as follows:
7 Some existing strategies with their MSEs
Singh et al. (2017) discussed two families of estimators in the presence of non-response and compared them under different PRMs. The estimators and their MSEs under \(D\left[ {\delta_{0} ,\delta_{1} ,...,\delta_{J} :v\left( x \right)} \right]\) are as follows:
where \(\psi \left( {\alpha ,\overline{X},\overline{x}} \right) = \exp \left[ {\alpha \left( {\frac{{\overline{X} - \overline{x}}}{{\overline{X} + \overline{x}}}} \right)} \right]\).
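The exponential factor \(\psi \left( {\alpha ,\overline{X},\overline{x}} \right)\) defined in this section is easy to evaluate numerically. The short Python sketch below computes it for \(\alpha\) = -1, 0 and 1 using the values \(\overline{X}\) = 41.4556 and \(\overline{x}\) = 39.55 reported in the empirical section; the function name `psi` is a hypothetical choice, and how the factor enters each estimator is not shown here.

```python
import math


def psi(alpha, X_bar, x_bar):
    """psi(alpha, X_bar, x_bar) = exp[alpha * (X_bar - x_bar) / (X_bar + x_bar)]."""
    return math.exp(alpha * (X_bar - x_bar) / (X_bar + x_bar))


X_BAR = 41.4556  # population mean of x, from the empirical section
x_bar = 39.55    # sample mean of x, from the empirical section

factors = {alpha: psi(alpha, X_BAR, x_bar) for alpha in (-1, 0, 1)}
```

For \(\alpha\) = 0 the factor equals 1, and the factors for \(\alpha\) = 1 and \(\alpha\) = -1 are reciprocals of each other. With \(\overline{x} < \overline{X}\), as here, the factor exceeds 1 for \(\alpha\) = 1 and is below 1 for \(\alpha\) = -1.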
8 Robustness of the estimator \(T_{1}^{*} \left( \alpha \right)\) and its comparison with \(T_{s}^{*} \left( \alpha \right)\) and \(t_{NR}^{*} (\alpha )\)
8.1 As seen in Remark 2, the D-MSE of a strategy that is D-biased is affected by deviations in both \(h(x_{t} )\) and \(v(x_{t} )\), whereas the bias is affected only by \(h(x_{t} )\) and does not depend on \(v(x_{t} )\) at all. Therefore, it is desirable to consider how the MSE changes with deviations of the model, whether due to misspecification of \(h(x_{t} )\), of \(v(x_{t} )\), or of both.
Royall and Herson (1973a, b) therefore considered an estimator 'robust' if there is only a nominal change in the D-MSE due to deviation of the model; that is, if the optimality of an estimator deteriorates only slightly under deviation of the model, it may be termed robust. Thus, the general aim of the model approach is to find a strategy that performs well in some broad sense, allowing for our uncertainty about the assumed model, that is, a strategy that is almost insensitive to errors in the model. Since a large number of theoretical models may be thought of, it is practically impossible to examine the robustness of an estimator theoretically under all deviations of the model. It is therefore advisable to examine the robustness of the estimator under some working models with known parameters.
8.2 It is also appropriate to compare the MSEs of the estimators \(T_{1}^{*} \left( \alpha \right)\), \(T_{s}^{*} \left( \alpha \right)\) and \(t_{NR}^{*} (\alpha )\), all of which are developed under the same set-up. Such a comparison, based on empirical data, is presented in the next section.
9 Empirical data and results
9.1 We used a real data set from Singh et al. (2017) and considered two non-response rates, 15% and 30%. Here \(x_{t}\) is the number of dwellings and \(y_{t}\) is the number of dwellings occupied. The values are as follows:
\(N\) = 90, \(\beta_{0}\) = 0.8787, \(\beta_{1}\) = -4.9157, \(\overline{X}\) = 41.4556, \(\sigma^{2}\) = 0.7998.
(i) At the 15% non-response rate, the values are:
\(n\) = 20, \(n_{1}\) = 17, \(n_{2}\) = 3, \(h_{2}\) = 2, \(f_{2}\) = 1.5, \(\overline{x}\) = 39.55, \(\overline{x}_{{s_{1} }}\) = 39.824, \(\overline{x}_{{s_{2} }}\) = 38.0, \(\overline{x}_{{s_{{h_{2} }} }}\) = 30.5, \(\sum\limits_{{s_{1} }} {x_{t} }\) = 677, \(\sum\limits_{{s_{{h_{2} }} }} {x_{t} }\) = 61, \(\sum\limits_{{\overline{s}_{{h_{2} }} }} {x_{t} }\) = 53, \(\sum\limits_{{\overline{s}}} {x_{t} }\) = 2940, \(\sum\limits_{{s_{1} }} {x_{t}^{2} }\) = 36729, \(\sum\limits_{{s_{{h_{2} }} }} {x_{t}^{2} }\) = 2081, \(\sum\limits_{{\overline{s}}} {x_{t}^{2} }\) = 179614, \(\sum\limits_{{\overline{s}_{{h_{2} }} }} {x_{t}^{2} }\) = 2809.
(ii) At the 30% non-response rate, the values are:
\(n\) = 20, \(n_{1}\) = 14, \(n_{2}\) = 6, \(h_{2}\) = 4, \(f_{2}\) = 1.5, \(\overline{x}\) = 39.55, \(\overline{x}_{{s_{1} }}\) = 36.5, \(\overline{x}_{{s_{2} }}\) = 46.6667, \(\overline{x}_{{s_{{h_{2} }} }}\) = 50, \(\sum\limits_{{s_{1} }} {x_{t} }\) = 511, \(\sum\limits_{{s_{{h_{2} }} }} {x_{t} }\) = 200, \(\sum\limits_{{\overline{s}_{{h_{2} }} }} {x_{t} }\) = 80, \(\sum\limits_{{\overline{s}}} {x_{t} }\) = 2940, \(\sum\limits_{{s_{1} }} {x_{t}^{2} }\) = 25337, \(\sum\limits_{{\overline{s}}} {x_{t}^{2} }\) = 179614, \(\sum\limits_{{\overline{s}_{{h_{2} }} }} {x_{t}^{2} }\) = 3328.
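The summary values listed for the two non-response rates are linked by simple identities (for example, \(f_{2} = n_{2} /h_{2}\), and the sample mean of x equals the sum over \(s_{1}\), \(s_{{h_{2} }}\) and \(\overline{s}_{{h_{2} }}\) divided by n). The Python sketch below checks those identities against the reported figures. The dictionary keys and the function name `check` are hypothetical labels, and the tolerance of 0.001 is an assumption chosen to match the rounding of the reported means.

```python
N, X_BAR = 90, 41.4556  # population size and population mean of x

configs = {
    "15%": dict(n=20, n1=17, n2=3, h2=2, f2=1.5, x_bar=39.55,
                x_bar_s1=39.824, x_bar_s2=38.0, x_bar_sh2=30.5,
                sum_s1=677, sum_sh2=61, sum_rest_s2=53, sum_unsampled=2940),
    "30%": dict(n=20, n1=14, n2=6, h2=4, f2=1.5, x_bar=39.55,
                x_bar_s1=36.5, x_bar_s2=46.6667, x_bar_sh2=50.0,
                sum_s1=511, sum_sh2=200, sum_rest_s2=80, sum_unsampled=2940),
}


def check(c, tol=1e-3):
    """Return a dict of identity name -> True/False for one configuration."""
    sample_total = c["sum_s1"] + c["sum_sh2"] + c["sum_rest_s2"]
    s2_total = c["sum_sh2"] + c["sum_rest_s2"]
    return {
        "n = n1 + n2": c["n1"] + c["n2"] == c["n"],
        "f2 = n2 / h2": abs(c["n2"] / c["h2"] - c["f2"]) < tol,
        "sample mean": abs(sample_total / c["n"] - c["x_bar"]) < tol,
        "respondent mean": abs(c["sum_s1"] / c["n1"] - c["x_bar_s1"]) < tol,
        "non-respondent mean": abs(s2_total / c["n2"] - c["x_bar_s2"]) < tol,
        "sub-sample mean": abs(c["sum_sh2"] / c["h2"] - c["x_bar_sh2"]) < tol,
        "population mean": abs((sample_total + c["sum_unsampled"]) / N - X_BAR) < tol,
    }


results = {rate: check(c) for rate, c in configs.items()}
```

Both configurations share the same sample total of x, so the two non-response rates correspond to different partitions of the same n = 20 values into respondents and non-respondents.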
9.2 Tables 1–6 give the MSEs of the strategies \(T_{1}^{*} \left( \alpha \right)\), \(T_{s}^{*} \left( \alpha \right)\) and \(t_{NR}^{*} (\alpha )\) for \(\alpha = 0, 1\) and -1 at the 15% and 30% non-response rates under Models I–VI.
10 Conclusions
We draw the following points from the tables:
(i) The proposed estimator \(T_{1}^{*} \left( \alpha \right)\) is nearly robust for models I, II, IV and V regardless of the choice of \(\alpha\) for a given non-response rate. For a fixed non-response rate, the MSE of the estimator under models III and VI differs markedly from that under models I, II, IV and V. The estimator can again be considered robust under the variance function \(x^{2}\), regardless of the choice of \(\alpha\). Thus, the estimator is largely unaffected by misspecification of \(h\left( {x_{t} } \right)\) when the variance function is \(x_{t}^{g}\) with \(g = 0\) or 1, and the same conclusion holds for the variance function \(x^{g}\) with \(g = 2\).
(ii) \(T_{1}^{*} \left( \alpha \right)\) is as efficient as, and sometimes better than, the estimators \(T_{s}^{*} \left( \alpha \right)\) and \(t_{NR}^{*} (\alpha )\) in terms of accuracy.
(iii) It is interesting to note that for \(\alpha\) = 0, \(T_{s}^{*} \left( \alpha \right)\) coincides with the estimator \(T_{1}^{*} \left( \alpha \right)\) under all the models and non-response rates. This is because, for \(\alpha\) = 0, both estimators reduce to \(T_{1}^{*} (0) = \overline{y}_{w}^{*} = T_{s}^{*} (0)\).
However, the conclusions drawn above are based solely on empirical data and on a particular configuration of the sample for each non-response rate. Therefore, the estimators cannot be compared across non-response rates, because the sample configuration changes with the rate. Furthermore, the results presented here are limited to the data at hand, so no general conclusions can be drawn; results may change with other data. The presentation made here is only an attempt to get an idea of how the proposed family behaves under misspecification of PRMs.
References
Bahl S, Tuteja RK (1991) Ratio and product type-exponential estimator. Inform Opt Sci XII I:159–163
Basu D (1958) On sampling with and without replacement. Sankhya 20A:287–294
Brewer KRW (1963) Ratio estimation and finite populations: Some results deducible from the assumption of an underlying stochastic process. Aust J Stat 5:93–105
Cassel CM, Sarndal CE, Wretman JH (1976) Some results on generalized difference estimation and generalized regression estimation for finite populations. Biometrika 63:615–620
Chambers RL (1986) Outlier robust finite population estimation. J Am Stat Assoc 81(396):1063–1069
Chaudhary MK, Kumar A (2016) Combined-type estimators of finite population mean using double sampling scheme under nonresponse. J Adv Res Appl Math Stat 1(3&4):6–18
Cochran WG (1953) Sampling techniques, I Edition. John Wiley and Sons Inc, New York
Cochran WG (1977) Sampling techniques, III Edition. Wiley Eastern Limited, New Delhi
Dillman D, Eltinge J, Groves RM, Little R (2002) Survey non-response in design, data collection and analysis, In Survey Non-response, 3–26. Wiley, New York
Ford BL (1976) Missing data procedures: a comparative study. American Statistical Association Proceedings, Social Statistics Section, pp 324–329
Groves R (2004) Survey errors and survey costs. Wiley, New York
Hansen MH, Hurwitz WN (1946) The problem of non-response in sample surveys. J Am Stat Assoc 41:517–529
Kish L (1965) Survey sampling, I Edition. Wiley and Sons, New York
Kish L (1967) Survey sampling, II Edition. John Wiley and Sons Inc, New York
Kumar A, Singh AK, Singh VK (2019) Investigating the performance of a family of exponential-type estimators in presence of measurement error. Commun Stat-Theory Methods 49(23):1–20
Royall RM (1971) Linear regression models in finite population sampling theory. In: Godambe VP, Sprott DA (eds) Foundations of Statistical Inference. Holt, Rinehart and Winston, Toronto, pp 259–274
Royall RM, Herson J (1973a) Robust estimation in finite populations I. J Am Stat Assoc 68(344):880–889
Royall RM, Herson J (1973b) Robust estimation in finite populations II: Stratification on a size variable. J Am Stat Assoc 68(344):890–893
Sarndal CE, Swensson B, Wretman J (1992) Model assisted survey sampling, I Edition. Springer-Verlag, New York Inc
Sarndal CE, Swensson B, Wretman J (2003) Model assisted survey sampling, II Edition. Springer-Verlag, New York Inc
Ahmed S, Shabbir J (2019) Model based estimation of population total in presence of non-ignorable non-response. PLoS ONE 14(10):e0222701
Singh AK, Singh VK (2022) A family of estimators for population mean under model approach in presence of non-response. J Reliab Stat Stud 15(1):1–20
Singh VK, Singh RVK, Shukla RK (2009) Model-based study of some estimators in the presence of non-response. In: Singh KK, Yadava RC, Pandey A (eds) Population, Poverty and Health : Analytical Approaches. Hindustan Publishing Corporation, New Delhi, India, pp 360–365
Singh AK, Singh P, Singh VK (2017) Model based study of families of exponential type estimators in presence of nonresponse. Commun Stat-Theory Methods 46(13):6478–6490
Sudman S (1976) Applied sampling. Academic Press, New York
Sukhatme PV, Sukhatme BV (1970) Sampling theory of surveys with applications. Asia Publishing House, London
Zarkovich SS (1966) Quality of statistical data. Food and Agricultural Organization of the United Nations, Rome
Funding
There is no funding.
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Appendix A
Section I:
We have
Now, using the PRM
\(Y_{t} = \sum\limits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} } + \varepsilon_{t} \left[ {v\left( {x_{t} } \right)} \right]^{1/2}\) for \(t = 1,2,...,N\)
with \(E_{D} \left( {Y_{t} } \right) = \sum\limits_{j = 0}^{J} {\delta_{j} \beta_{j} x_{t}^{j} }\),
\(Var\left[ {Y_{t} } \right] = \sigma^{2} v\left( {x_{t} } \right)\), \(Cov\left( {Y_{s} ,Y_{t} } \right) = 0\) for \(s \ne t\),
\(E_{D} \left( {\varepsilon_{t} } \right) = 0\) for all \(t\), \(E_{D} \left( {\varepsilon_{t}^{2} } \right) = \sigma^{2}\) for all \(t\),
we can write
Since \(E_{D} (\varepsilon_{t} ) = 0\) for all t, we have
Thus Eq. (7) follows.
Section II
The \(D\)-MSE of the strategy \(T_{1}^{*} \left( \alpha \right)\) under model (1) is derived as follows:
We have
Since \(E_{D} \left( {\varepsilon_{t} \varepsilon_{s} } \right) = 0\) for \(s \ne t\) and \(E_{D} \left( {\varepsilon_{t}^{2} } \right) = \sigma^{2}\), we have
Expression (A5) can further be written as
Hence the expression (8) follows.
About this article
Cite this article
Singh, A.K., Ashutosh, A. & Singh, J.K. Design based study for adjusting non-response while estimating population mean. Life Cycle Reliab Saf Eng (2024). https://doi.org/10.1007/s41872-024-00268-4