
1 Introduction

The analysis of poverty measures is a topic of increasing interest to society. The official poverty rate and the number of people in poverty are important measures of a country’s economic wellbeing. A common characteristic of many poverty measures is their complexity. The literature on survey sampling usually focuses on estimating linear parameters. However, when the variable of interest is a measure of wages or income, the distribution function is a relevant tool because it is required to calculate the poverty line, the low-income proportion, the poverty gap and other poverty measures.

Nonresponse is a growing problem in economic surveys. Although there are many procedures for treating it, few efficient techniques have been developed for the estimation of non-linear parameters. Recently, various estimators for the distribution function in the presence of missing data were proposed in [15]. Using these estimators, we first propose new estimators for several poverty measures that make efficient use of auxiliary information at the estimation stage. Owing to the complexity of percentile ratios and of the sampling designs used in official sample surveys, the variances of these statistics may not be expressible by simple formulae. Additional techniques for variance estimation are therefore required in this setting.

This paper is organized as follows. Section 11.2 introduces the estimation of the distribution function when there are missing data. In Sect. 11.3, the proposed percentile ratio estimators are described. In Sect. 11.4 we derive resampling techniques for estimating the variance of percentile ratio estimators. A simulation study based on data derived from the Spanish Household Panel Survey is presented in Sect. 11.5. This study shows how the proposed estimators of the poverty measures perform in terms of bias reduction and precision when calibration is used to handle nonresponse that is not missing at random.

2 Calibrating the Distribution Function for Treating the Non-response

Consider a finite population \(U=\left \{1,\ldots ,N\right \}\) consisting of N different and identifiable units. Let us assume a sampling design d defined in U with positive first-order inclusion probabilities \(\pi_i\), \(i \in U\). Let \(d_i=\pi _i^{-1}\) denote the known basic design weight for unit i ∈ U. We assume that some data are missing in the sample s of size n obtained with the sampling design d. We denote by \(s_r\) the respondent sample, of size r, and by \(s_m\) the non-respondent sample, of size n − r.

Let \(y_i\) be the value of the variable under study for unit i. The distribution function \(F_y(t)\) can be estimated by the Horvitz-Thompson estimator:

$$\displaystyle \begin{aligned} \widehat{F}_{HT}(t)=\frac{1}{N}\sum_{k\in s_r}d_{k}\Delta(t-y_{k}), \end{aligned} $$
(11.1)

where

$$\displaystyle \begin{aligned} \Delta(t-y_{k})=\left\{ \begin{array}{l l} 0 & \ \text{ if }\ t<y_{k},\\ 1 & \ \text{ if }\ t\geq y_{k}, \end{array}\right . \end{aligned}$$

and \(d_k = 1/\pi_k\) are the basic design weights.
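As an illustration of Eq. (11.1), the following minimal sketch evaluates the estimator on a toy respondent sample. The function name `f_ht`, the data and the design (SRSWOR with n = 5 from N = 10, with r = 4 respondents) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def f_ht(t, y_resp, d_resp, N):
    """Estimate F_y(t) as (1/N) * sum over respondents of d_k * Delta(t - y_k)."""
    y_resp = np.asarray(y_resp, dtype=float)
    d_resp = np.asarray(d_resp, dtype=float)
    indicator = (y_resp <= t).astype(float)   # Delta(t - y_k) = 1 if t >= y_k, else 0
    return float(np.sum(d_resp * indicator) / N)

# Hypothetical toy data: N = 10, SRSWOR with n = 5, so d_k = N / n = 2;
# only r = 4 of the 5 sampled units responded.
y_r = [12.0, 7.5, 30.0, 18.0]
d_r = [2.0, 2.0, 2.0, 2.0]

print(f_ht(18.0, y_r, d_r, N=10))  # three respondents at or below 18: 2 * 3 / 10 = 0.6
print(f_ht(1e9, y_r, d_r, N=10))   # 0.8 rather than 1: respondent weights sum to 8 < N
```

The second call shows one visible symptom of nonresponse: without any adjustment, the respondent weights do not add up to N, so the estimate does not reach 1 even for very large t.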

Under nonresponse, the estimator (11.1) is biased for the distribution function. There are several approaches for dealing with nonresponse, the most important of which is weighting. We assume that auxiliary information is available on several variables related to the main variable y, \(\mathbf{x} = (x_1, x_2, \ldots, x_J)\). Based on this auxiliary information, calibration weighting is used in [15] to propose three methods for reducing the non-response bias in the estimation of the distribution function:

–:

The first method is based on the methodology proposed in [14]. We define a pseudo-variable \(g_{k}=\widehat {\beta }^{\prime }{\mathbf {x}}_{k}\) for k = 1, 2, …, N, where

$$\displaystyle \begin{aligned}\widehat{\beta}=\Bigg(\sum_{j\in {s_r}}d_{j}{\mathbf{x}}_{j}{\mathbf{x}}_{j}^{\prime}\Bigg)^{-1}\cdot\sum_{j\in s_r}d_{j}{\mathbf{x}}_{j}y_{j}.\end{aligned}$$

We then define a calibrated estimator by requiring that the calibrated weights \(w_k\), applied to the respondent sample, reproduce exactly the distribution function of the pseudo-variable at a set of arbitrarily chosen points \(t_j\), j = 1, 2, …, P:

$$\displaystyle \begin{aligned} \frac{1}{N}\sum_{k\in s_r} w_{k} \Delta(\mathbf{t}-g_k)=F_{g}(\mathbf{t}), \end{aligned} $$
(11.2)

where \(F_g(\mathbf{t})\) denotes the finite population distribution function of the pseudo-variable \(g_k\) evaluated at the points \(\mathbf{t} = (t_1, \ldots, t_P)\) and \(\Delta(\mathbf{t}-g_k) = (\Delta(t_1-g_k), \ldots, \Delta(t_P-g_k))\).

Calibration weights are commonly computed with the linear method (based on the chi-square distance), which yields an explicit expression for the estimator:

$$\displaystyle \begin{aligned} \hat{F}_{cal}^{(1)}(t) &= \frac{1}{N} \sum_{k \in s_r} w_k^{(1)} \Delta(t-y_{k}) \notag\\ &= \widehat{F}_{HT}(t)+\Big(F_{g}(\mathbf{t})- \frac{1}{N}\sum_{k \in s_r} d_{k}\Delta( \mathbf{t}-g_{k})\Big)^{\prime}\cdot T^{-1}\cdot H \end{aligned} $$
(11.3)

where \(T=\sum _{k \in s_r}d_{k}\Delta (\mathbf {t}-g_{k})\Delta (\mathbf {t}-g_{k})^{\prime }\) and \(H=\sum _{k \in s_r}d_{k}\Delta (\mathbf {t}-g_{k})\Delta (t-y_{k})\).

Following [14], if we denote \(k_{i}=\displaystyle \sum _{k\in s_{r}}d_{k}\Delta (t_{i}-g_{k})\) for i = 1, …, P, the condition \(k_i > k_{i-1}\) for i = 2, …, P guarantees the existence of \(T^{-1}\).

–:

The second method is based on two-step calibration weighting, as in [10]:

  1.

    The first calibration is designed to remove the non-response bias.

    Consider the M-dimensional vector of explanatory model variables \({\mathbf {x}}^{*}_k\), whose population totals \({\sum _{U}}{\mathbf {x}}^{*}_k\) are known. Calibration under the restrictions \( \sum _{s_r}v_k^{(1)}{\mathbf {x}}^{*}_k= {\sum _{U}}{\mathbf {x}}^{*}_k \) yields the calibration weights \(v_k^{(1)}\), \(k \in s_r\).

  2.

    The second calibration is designed to decrease the sampling error in the estimation of the distribution function.

    The auxiliary information of the calibration variables x is incorporated through the calibrated weights \(v_k^{(2)}\) obtained with the restrictions \(\sum _{s_r}v_k^{(2)} \Delta (\mathbf {t}-g_k)=F_{g}(\mathbf {t}).\) The final estimator is given by

    $$\displaystyle \begin{aligned} \hat{F}_{cal}^{(2)}(t)= \frac{1}{N} \sum_{k \in s_r} w_k^{(2)} \Delta(t-y_{k}) = \frac{1}{N} \sum_{k \in s_r} v_k^{(2)} v_k^{(1)} \Delta(t-y_k). \end{aligned} $$
    (11.4)

This method allows different variables to be used in each phase (model variables \({\mathbf {x}}^{*}_k\) and calibration variables x), since the model for non-response and the predictive model can be very different.

–:

The last method is based on instrumental variables (see [6] and [11]). The calibration is done in a single stage, but different variables are used to model nonresponse and to define the calibration equation. We assume that the probability of response can be modeled by \(\theta _k = f( \gamma ^{\prime } {\mathbf {x}}_k^{*})\) for some parameter vector γ, where h(⋅) = 1∕f(⋅) is a known, monotonic and twice differentiable function. We denote \({\mathbf{z}}_k = \Delta(\mathbf{t}-g_k)\).

The calibration equation is given by

$$\displaystyle \begin{aligned} \frac{1}{N} \sum_{k \in s_r}\frac{d_k}{ f(\hat{\gamma}^{\prime} {\mathbf{x}}^{*}_k)} {\mathbf{z}}_k =\frac{1}{N} \sum_{k \in s_r} d_k h(\hat{\gamma}^{\prime} {\mathbf{x}}^{*}_k) {\mathbf{z}}_k = F_{g}(\mathbf{t}) \end{aligned} $$
(11.5)

and the resulting calibrated estimator is:

$$\displaystyle \begin{aligned} \hat{F}_{cal}^{(3)}(t) =\frac{1}{N} \sum_{k \in s_r} w_k^{(3)} \Delta(t-y_{k}) = \frac{1}{N} \sum_{k \in s_r} d_k h( \hat{\gamma}^{\prime} {\mathbf{x}}_k^{*}) \Delta(t-y_{k}), \end{aligned} $$
(11.6)

where \(\hat {\gamma }\) is a consistent estimator of the vector γ. Several approximation methods have been used to derive the solution of the minimization problem. We denote by \(\widehat {F}_{Dcal}^{(3)}(t)\) the estimator based on Deville’s approach [6], which requires the condition M = P. To allow more calibration restrictions in Eq. (11.5) than M, we consider the estimators \(\widehat {F}_{KL1cal}^{(3)}(t)\) and \(\widehat {F}_{KL2cal}^{(3)}(t)\) based on [11], where P > M.
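The first two methods can be sketched with a single helper for linear (chi-square distance) calibration. The code below is a minimal illustration on a hypothetical toy population, not the authors' implementation: the names `delta`, `linear_calibration` and `f_hat`, the data and the toy response probabilities are all assumptions. For the second method, the sketch assumes that the second calibration starts from the first-step weights, so that the final weights have the product form of Eq. (11.4). The instrumental-variable method is not covered here.

```python
import numpy as np

def delta(t_vec, values):
    """Matrix with entry (k, j) = Delta(t_j - values_k), i.e. 1 if t_j >= values_k."""
    return (values[:, None] <= t_vec[None, :]).astype(float)

def linear_calibration(start_w, aux, totals):
    """Chi-square-distance calibration: w_k = start_w_k * (1 + aux_k' lam),
    with lam chosen so that sum_k w_k * aux_k equals `totals`."""
    A = (aux * start_w[:, None]).T @ aux
    lam = np.linalg.solve(A, totals - aux.T @ start_w)
    return start_w * (1 + aux @ lam)

rng = np.random.default_rng(0)

# Hypothetical toy population (not the survey data used in the paper).
N, n = 300, 90
b1 = rng.integers(0, 2, N).astype(float)
x_star = np.column_stack([np.ones(N), b1])                        # model variables x*_k
x_pop = np.column_stack([np.ones(N), b1, rng.uniform(0, 10, N)])  # calibration variables x_k
y_pop = 20 + 6 * b1 + 2 * x_pop[:, 2] + rng.normal(0, 3, N)

sample = rng.choice(N, size=n, replace=False)                     # SRSWOR
theta = np.where(b1[sample] == 1, 0.5, 0.8)                       # toy response probabilities
resp = sample[rng.random(n) < theta]                              # respondent sample s_r
d = np.full(resp.size, N / n)                                     # design weights d_k = N / n
x_r, y_r = x_pop[resp], y_pop[resp]

# Pseudo-variable g_k = beta_hat' x_k, with beta_hat fitted on the respondents.
beta_hat = np.linalg.solve((x_r * d[:, None]).T @ x_r, (x_r * d[:, None]).T @ y_r)
g_pop = x_pop @ beta_hat

# Calibration points t_1 < ... < t_P; the last one gives F_g(t_P) = 1.
t_pts = np.append(np.quantile(g_pop[resp], [0.25, 0.5, 0.75]), g_pop.max())
F_g = delta(t_pts, g_pop).mean(axis=0)                            # known population values
Z = delta(t_pts, g_pop[resp])                                     # Delta(t - g_k), k in s_r

# Method 1: one linear calibration on Delta(t - g_k), as in Eq. (11.3).
w1 = linear_calibration(d, Z, N * F_g)

# Method 2: calibrate on x* first, then on Delta(t - g_k) starting from v1.
v1 = linear_calibration(d, x_star[resp], x_star.sum(axis=0))
w2 = linear_calibration(v1, Z, N * F_g)

def f_hat(t, w):
    """Weighted estimate (1/N) * sum_k w_k * Delta(t - y_k) over the respondents."""
    return float(np.sum(w * (y_r <= t)) / N)

t0 = np.median(y_pop)
print("population F_y(t0):", np.mean(y_pop <= t0))
print("method 1:", f_hat(t0, w1), " method 2:", f_hat(t0, w2))
print("constraints met:", np.allclose(Z.T @ w1 / N, F_g), np.allclose(Z.T @ w2 / N, F_g))
```

Choosing the calibration points as quantiles of the respondents' pseudo-values keeps the weighted counts strictly increasing, which is the condition that guarantees the existence of \(T^{-1}\).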

3 Poverty Measures Estimation with Missing Values

Poverty measurement, wage inequality, inequality and living conditions are currently overriding issues for governments and society. Some of the indices used to evaluate poverty and to measure income inequality are based on quantiles and quantile ratios. For instance, Eurostat currently sets the poverty line (the population threshold for classification into poor and nonpoor) at sixty percent of the median \(Q_{50}\) of the equivalized net income. In addition, the percentile ratios \(Q_{95}/Q_{20}\); \(Q_{90}/Q_{10}\) and \(Q_{80}/Q_{20}\) [9]; \(Q_{95}/Q_{50}\) and \(Q_{50}/Q_{10}\) (see [12] and [4]); \(Q_{50}/Q_{5}\) and \(Q_{50}/Q_{25}\) [7] have been considered as measures of wage inequality. We focus on estimating poverty measures based on percentile ratios.

The population α-quantile of y is defined as follows

$$\displaystyle \begin{aligned} Q_{y}(\alpha)=\inf\{t: F_{y}(t)\geq\alpha\}=F_{y}^{-1}(\alpha). \end{aligned} $$
(11.7)

A general procedure to incorporate auxiliary information in the estimation of \(Q_y(\alpha)\) is to obtain an indirect estimator \(\widehat {F}_{y}(t)\) of \(F_y(t)\) that fulfills the properties of a distribution function. Under this assumption, the quantile \(Q_y(\alpha)\) can be estimated as follows:

$$\displaystyle \begin{aligned} \widehat{Q}_{y}(\alpha)=\inf\{t: \widehat{F}_{y}(t)\geq\alpha\}=\widehat{F}_{y}^{-1}(\alpha). \end{aligned} $$
(11.8)
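For a weighted step-function estimator of the form \(\frac{1}{N}\sum_{k \in s_r} w_k \Delta(t-y_k)\), the infimum in Eq. (11.8) is attained at one of the observed values. The sketch below illustrates this under the assumption that the weights are nonnegative, so that the estimator is nondecreasing; the function name `quantile_from_weights` and the toy data are hypothetical.

```python
import numpy as np

def quantile_from_weights(alpha, y, w, N):
    """Return inf{t : F_hat(t) >= alpha} for F_hat(t) = (1/N) * sum_k w_k * Delta(t - y_k).

    Assumes nonnegative weights, so that F_hat is nondecreasing."""
    order = np.argsort(y)
    y_sorted = np.asarray(y, dtype=float)[order]
    F_at_y = np.cumsum(np.asarray(w, dtype=float)[order]) / N   # F_hat at each sorted y
    hits = np.nonzero(F_at_y >= alpha)[0]
    if hits.size == 0:
        raise ValueError("F_hat never reaches alpha: the weights sum to less than alpha * N")
    return float(y_sorted[hits[0]])

# Hypothetical toy data: four respondents whose weights sum to N = 10.
y = [12.0, 7.5, 30.0, 18.0]
w = [2.5, 2.5, 2.5, 2.5]

print(quantile_from_weights(0.5, y, w, N=10))   # 12.0
print(quantile_from_weights(0.6, y, w, N=10))   # 18.0
```

Any of the calibrated weight systems can be passed as `w`, provided the resulting estimator is a genuine distribution function.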

The distribution function estimators described in the previous section allow us to incorporate auxiliary information in the estimation of quantiles in the presence of non-response and thus to obtain estimators for percentile ratios. However, some of these calibrated estimators may not satisfy all the properties of a distribution function, so some modifications are necessary before they can be used to estimate quantiles. Specifically, an estimator \(\widehat {F}_{y}(t)\) of the distribution function \(F_y(t)\) must meet the following properties:

  i.

    \(\widehat {F}_{y}(t)\) is continuous on the right,

  ii.

    \(\widehat {F}_{y}(t)\) is monotone nondecreasing,

  iii.

    (a) \(\displaystyle \lim _{t\rightarrow -\infty }{\widehat {F}_{y}(t)}=0\) and (b) \(\displaystyle \lim _{t\rightarrow +\infty }{\widehat {F}_{y}(t)}=1\).

First, it is easy to see that all the estimators satisfy conditions (i) and iii.(a). Second, following [14], the estimator \(\widehat {F}_{cal}^{(1)}(t)\) meets the remaining conditions if \(t_P\) is sufficiently large (i.e., \(F_g(t_P) = 1\)). On the other hand, \(\widehat {F}_{cal}^{(2)}(t)\) satisfies condition iii.(b) if \(t_P\) is sufficiently large, but it is not monotone nondecreasing in general. In that case, we can apply the procedure described in [13] to obtain a monotone version. For a general estimator \(\widehat {F}_{y}\), this procedure is defined over the ordered respondent values \(y_{[1]}\leq \cdots \leq y_{[r]}\) in the following way:

$$\displaystyle \begin{aligned} \tilde{F}_{y}(y_{[1]})=\widehat{F}_{y}(y_{[1]}),\quad \tilde{F}_{y}(y_{[i]})=\max\{\widehat{F}_{y}(y_{[i]}),\, \tilde{F}_{y}(y_{[i-1]})\} \quad i=2,\ldots, r. \end{aligned} $$
(11.9)
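The recursion in Eq. (11.9) is a running maximum over the estimates evaluated at the ordered values. A minimal sketch, with a hypothetical function name and made-up input values:

```python
import numpy as np

def monotone_version(F_hat_at_sorted_y):
    """Apply Eq. (11.9): replace each value by the maximum of itself and all earlier values.

    The input must hold F_hat evaluated at y_[1] <= ... <= y_[r], in that order."""
    return np.maximum.accumulate(np.asarray(F_hat_at_sorted_y, dtype=float))

# Hypothetical estimates with two local decreases (0.25 -> 0.22 and 0.40 -> 0.38).
F_hat = [0.10, 0.25, 0.22, 0.40, 0.38, 0.90, 1.00]
print(monotone_version(F_hat).tolist())   # [0.1, 0.25, 0.25, 0.4, 0.4, 0.9, 1.0]
```

The adjusted values never fall below the original ones, and values that were already in nondecreasing order are left unchanged.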

Finally, all estimators based on \(\widehat {F}_{cal}^{(3)}(t)\) are nondecreasing if \(\theta _k = f( \gamma ^{\prime } {\mathbf {x}}_k^{*})\geq 0\) for all k ∈ U (response model based on logit, raking and logistic methods), because then the calibration weights satisfy \(w_{k}^{(3)}\geq 0\). Moreover, \(\widehat {F}_{Dcal}^{(3)}(t)\) fulfills condition iii.(b) with \(t_P\) sufficiently large whereas, following [15], \(\widehat {F}_{KL1cal}^{(3)}(t)\) and \(\widehat {F}_{KL2cal}^{(3)}(t)\) meet condition iii.(b) if, in addition to \(t_P\) being sufficiently large, one component of the vector \({\mathbf {x}}^{*}_k\) is equal to 1 for all units.

Based on the population distribution function \(F_y(t)\), given two values \(1 > \alpha_1 > \alpha_2 > 0\), the percentile ratio \(R(\alpha_1, \alpha_2)\) is defined as follows:

$$\displaystyle \begin{aligned} R(\alpha_{1},\alpha_{2})=\frac{Q_{y}(\alpha_{1})}{Q_{y}(\alpha_{2})} \end{aligned} $$
(11.10)

and it can be estimated with a generic quantile estimator \(\widehat {Q}_{y}(\alpha )\) as follows:

$$\displaystyle \begin{aligned} \widehat{R}(\alpha_{1},\alpha_{2})=\frac{\widehat{Q}_{y}(\alpha_{1})}{\widehat{Q}_{y}(\alpha_{2})}. \end{aligned} $$
(11.11)

Thus, the quantile estimators derived from \(\widehat {F}_{cal}^{(1)}\), \(\widehat {F}_{cal}^{(2)}\) and \(\widehat {F}_{cal}^{(3)}\) can be employed in the estimation of \(R(\alpha_1, \alpha_2)\).

4 Variance Estimation for Percentile Ratio Estimators with Resampling Method

Given the complexity of the proposed percentile ratio estimators, we have considered the use of bootstrap techniques for estimating variance and developing confidence intervals associated with the proposed calibration estimators. In this study, we consider the frameworks proposed in [1], [2], and [3].

First, the bootstrap procedure described in [3] replicates sample units to create artificial bootstrap populations. The bootstrap samples are then drawn from the artificial populations with the original sampling design. Specifically, if the population size is N = n ⋅ q + m with 0 < m < n, the artificial population \(U_B\) is obtained with q repetitions of s and an additional sample of size m selected by simple random sampling without replacement from s. Given a generic percentile ratio estimator \(\widehat {R}(\alpha _{1},\alpha _{2})\), we consider M independent artificial populations \(U_{B}^{j}\) with j = 1, …, M, and from each pseudo-population \(U_{B}^{j}\) we select K bootstrap samples \(s_{1}^{j},\ldots ,s_{K}^{j}\) of size n. We then compute the bootstrap estimate \(\widehat {R}^{*}(\alpha _{1},\alpha _{2})_{h}^{j}\) from the sample \(s_{h}^{j}\) of the population \(U_{B}^{j}\) and, following [5], we compute

$$\displaystyle \begin{aligned} \widehat{V}_{j}=\frac{1}{K-1}\sum_{h=1}^{K}(\widehat{R}^{*}(\alpha_{1},\alpha_{2})_{h}^{j}-\widehat{R}^{*}_{j}(\alpha_{1},\alpha_{2}))^{2}, \end{aligned} $$
(11.12)

where

$$\displaystyle \begin{aligned} \widehat{R}^{*}_{j}(\alpha_{1},\alpha_{2})=\frac{1}{K}\sum_{h=1}^{K}\widehat{R}^{*}(\alpha_{1},\alpha_{2})_{h}^{j}, \end{aligned} $$
(11.13)

Finally, the variance estimation for the estimator \(\widehat {R}(\alpha _{1},\alpha _{2})\) is given by

$$\displaystyle \begin{aligned} \widehat{V}(\widehat{R}(\alpha_{1},\alpha_{2}))=\frac{1}{M}\sum_{j=1}^{M} \widehat{V}_{j}. \end{aligned} $$
(11.14)
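The pseudo-population scheme in Eqs. (11.12) to (11.14) can be sketched as follows. This is a simplified illustration, assuming SRSWOR as the original design and complete response, with an unweighted plug-in percentile ratio standing in for the calibrated estimators. The function names, the lognormal toy sample and the small values of M and K are hypothetical.

```python
import numpy as np

def percentile_ratio(y, a1, a2):
    """Plug-in ratio Q(a1) / Q(a2) with Q(a) = inf{t : F_n(t) >= a} and equal weights."""
    y_sorted = np.sort(np.asarray(y, dtype=float))
    F = np.arange(1, y_sorted.size + 1) / y_sorted.size
    quantile = lambda a: y_sorted[np.searchsorted(F, a)]   # first index with F >= a
    return float(quantile(a1) / quantile(a2))

def pseudo_population_bootstrap_var(y_s, N, a1, a2, M=20, K=50, rng=None):
    """Average over M artificial populations of the variance of K bootstrap estimates."""
    rng = np.random.default_rng() if rng is None else rng
    y_s = np.asarray(y_s, dtype=float)
    n = y_s.size
    q, m = divmod(N, n)                                     # N = n * q + m
    V = np.empty(M)
    for j in range(M):
        extra = rng.choice(y_s, size=m, replace=False)      # SRSWOR of size m from s
        U_B = np.concatenate([np.tile(y_s, q), extra])      # artificial population
        reps = np.array([
            percentile_ratio(rng.choice(U_B, size=n, replace=False), a1, a2)
            for _ in range(K)
        ])
        V[j] = reps.var(ddof=1)                             # Eq. (11.12), divisor K - 1
    return float(V.mean())                                  # Eq. (11.14)

rng = np.random.default_rng(3)
y_s = rng.lognormal(mean=10.0, sigma=0.6, size=50)          # hypothetical income sample
v = pseudo_population_bootstrap_var(y_s, N=230, a1=0.9, a2=0.1, M=5, K=30, rng=rng)
print(v)
```

In an application with nonresponse, `percentile_ratio` would be replaced by the chosen calibrated estimator, recomputed on the respondents of each bootstrap sample.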

On the other hand, a direct bootstrap method is proposed in [1] and [2], in which no artificial population is needed because the bootstrap samples are drawn from s under a sampling scheme different from the original sampling design. Both frameworks can be applied under several sampling designs. In particular, if the sample s is drawn by simple random sampling without replacement, the resampling design proposed by Antal and Tillé [1] selects two samples from s: the first is drawn by simple random sampling without replacement and the second with the one-one sampling design (a sampling design for resampling). Similarly, under simple random sampling without replacement, the resampling design proposed in [2] draws a first sample with a Bernoulli design and a second sample with the double half sampling design (another sampling design for resampling). For more details see [1] and [2].

For both frameworks, given a percentile ratio estimator \(\widehat {R}(\alpha _{1},\alpha _{2})\), we draw M bootstrap samples \(s_{1}^{*},\ldots ,s_{M}^{*}\) from s, according to the sampling schemes of [1] and [2] respectively. The bootstrap estimate of the variance of the estimator \(\widehat {R}(\alpha _{1},\alpha _{2})\) is given by

$$\displaystyle \begin{aligned} \widehat{V}(\widehat{R}(\alpha_{1},\alpha_{2}))=\frac{1}{M}\sum_{j=1}^{M} (\widehat{R}(\alpha_{1},\alpha_{2})_{j}^{*}-\bar{R}(\alpha_{1},\alpha_{2})^{*})^{2}, \end{aligned} $$
(11.15)

where \(\widehat {R}(\alpha _{1},\alpha _{2})_{j}^{*}\) is the bootstrap estimator computed with the bootstrap sample \(s_{j}^{*}\) and

$$\displaystyle \begin{aligned} \bar{R}(\alpha_{1},\alpha_{2})^{*}=\frac{1}{M}\sum_{j=1}^{M}\widehat{R}(\alpha_{1},\alpha_{2})_{j}^{*}. \end{aligned} $$
(11.16)

Finally, based on the variance estimation \(\widehat {V}(\widehat {R}(\alpha _{1},\alpha _{2}))\) obtained with a bootstrap method, the 1 − α level confidence interval based on the approximation by a standard normal distribution is defined as follows:

$$\displaystyle \begin{aligned} \Big[\widehat{R}(\alpha_{1},\alpha_{2})-z_{1-\alpha/2}\cdot \sqrt{\widehat{V}(\widehat{R}(\alpha_{1},\alpha_{2}))},\widehat{R}(\alpha_{1},\alpha_{2})+z_{1-\alpha/2}\cdot \sqrt{\widehat{V}(\widehat{R}(\alpha_{1},\alpha_{2}))}\Big], \end{aligned} $$
(11.17)

where \(z_{\alpha}\) is the α quantile of the standard normal distribution. With this procedure, the corresponding confidence interval can be computed for every bootstrap method included in this study.
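Given a set of bootstrap replicates, Eqs. (11.15) to (11.17) reduce to a few lines. The sketch below takes the replicates as given and does not implement the resampling designs of [1] and [2]. The function name and the numbers are hypothetical.

```python
from statistics import NormalDist

import numpy as np

def bootstrap_normal_ci(r_hat, boot_reps, alpha=0.05):
    """Return (lower, upper, v_hat) for the normal-approximation interval of Eq. (11.17)."""
    boot_reps = np.asarray(boot_reps, dtype=float)
    v_hat = float(np.mean((boot_reps - boot_reps.mean()) ** 2))  # Eq. (11.15), divisor M
    z = NormalDist().inv_cdf(1 - alpha / 2)                      # z_{1 - alpha/2}
    half_width = z * np.sqrt(v_hat)
    return r_hat - half_width, r_hat + half_width, v_hat

# Hypothetical point estimate and five bootstrap replicates.
lower, upper, v_hat = bootstrap_normal_ci(2.1, [2.0, 2.2, 1.9, 2.1, 2.3])
print(v_hat)          # approximately 0.02
print(lower, upper)   # approximately 1.823 and 2.377
```

Note that Eq. (11.15) divides by M, whereas Eq. (11.12) divides by K − 1; the sketch follows Eq. (11.15).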

5 Simulation Study

To assess the behaviour of the estimators on real data, we consider data for the region of Andalusia from the 2016 Spanish living conditions survey carried out by the Instituto Nacional de Estadística (INE) of Spain. The survey data collected are treated as a population of size N = 1442, from which samples are selected. The study variable y is the equivalised net income and the auxiliary variables are the following dummy variables: b 1 = “Home without mortgage”, b 2 = “Four-bedroom home” and b 3 = “The household can afford to go on vacation away from home for at least one week a year”. We considered the vector of model variables \((x_{k}^{*})^{\prime }=(1,b_{1k})\) and the vector of calibration variables \(x_{k}^{\prime }=(1, b_{1k}, b_{2k}, b_{3k})\).

We consider four response mechanisms in which the probability that the k-th individual responds is given by

$$\displaystyle \begin{aligned} \theta_k= \frac{1} {\exp(A+b_{1k}/B)} \end{aligned} $$
(11.18)

with different values for A and B.
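A response mechanism of the form (11.18) can be simulated as in the sketch below. The values A = 0.2 and B = 2 and the simulated dummy variable are purely illustrative assumptions; the values of A and B used in the study are not reproduced here.

```python
import numpy as np

def response_prob(b1, A, B):
    """Response probability theta_k = 1 / exp(A + b1_k / B), as in Eq. (11.18)."""
    return 1.0 / np.exp(A + np.asarray(b1, dtype=float) / B)

rng = np.random.default_rng(42)
b1 = rng.integers(0, 2, size=150)            # hypothetical dummy variable for a sample
theta = response_prob(b1, A=0.2, B=2.0)      # illustrative constants only
responded = rng.random(b1.size) < theta      # one Bernoulli draw per sampled unit

print(np.unique(theta))                      # two probabilities, one per value of b1
print(responded.mean())                      # realized response rate
```

With these constants, units with \(b_{1k}=1\) respond with probability exp(−0.7) and the rest with probability exp(−0.2). Since the response probability depends on a covariate, the respondents are not a simple random subsample.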

The ratio estimators considered in this simulation study, based on the respondent sample \(s_r\), are obtained from the Horvitz-Thompson estimator \(\widehat {F}_{HT}(t)\). We denote by \(\widehat {R}_{D}^{(3)}\) the calibration estimator based on [6], and by \(\widehat {R}_{KL1}^{(3)}\) and \(\widehat {R}_{KL2}^{(3)}\) the calibration estimators based on [11]. The estimator \(\widehat {R}_{cal}^{(1)}\) has been included only for comparison with the other proposed estimators, because it considers only the respondent sample and does not deal with nonresponse, whereas the remaining proposed estimators try to correct the bias produced by nonresponse. Although the true response mechanism considered is based on the raking method, three versions of \(\widehat {R}_{D}^{(3)}\), \(\widehat {R}_{KL1}^{(3)}\) and \(\widehat {R}_{KL2}^{(3)}\) are computed, based on linear, raking and logit (l; u) response models.

We selected W = 1000 samples with several sample sizes, n = 100, n = 125, n = 150 and n = 200, under simple random sampling without replacement (SRSWOR) and for each estimator included in the simulation study, we computed estimates of R(α 1, α 2) for 50th/25th, 80th/20th, 90th/10th, 90th/20th, 95th/20th and 95th/50th. The performance of each estimator is measured by the relative bias, (RB), and the relative efficiency (RE), given by

(11.19)
(11.20)

where \(\widehat {R}(\alpha _{1},\alpha _{2})\) is a percentile ratio estimator and \(\widehat {R}_{HT}(\alpha _{1},\alpha _{2})\) is the percentile ratio estimator based on the Horvitz-Thompson estimator \(\widehat {F}_{HT}(t)\).

Tables 11.1, 11.2, 11.3, 11.4, 11.5 and 11.6 provide the values of the relative bias and the relative efficiency of the estimators compared for this population, for several sample sizes and response mechanisms.
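The following sketch shows how such performance measures might be computed from W simulated estimates. It assumes the usual definitions, which are an assumption of this sketch: RB is the average of \((\widehat{R}_w - R)/R\) over the W samples, and RE is the ratio of the empirical mean squared error of the estimator to that of \(\widehat{R}_{HT}\). The function names and numbers are hypothetical.

```python
import numpy as np

def relative_bias(estimates, true_value):
    """Assumed definition: mean over simulated samples of (R_hat_w - R) / R."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean(estimates - true_value) / true_value)

def relative_efficiency(estimates, ht_estimates, true_value):
    """Assumed definition: MSE(R_hat) / MSE(R_hat_HT), both computed over the same samples."""
    mse = lambda e: np.mean((np.asarray(e, dtype=float) - true_value) ** 2)
    return float(mse(estimates) / mse(ht_estimates))

# Hypothetical estimates from W = 4 simulated samples, with true ratio R = 2.
r_cal = [2.1, 1.9, 2.0, 2.2]
r_ht = [2.4, 2.2, 2.3, 2.5]

print(relative_bias(r_cal, 2.0))              # approximately 0.025
print(relative_efficiency(r_cal, r_ht, 2.0))  # approximately 0.111
```

Under this convention, RE values below 1 indicate that the estimator has a smaller mean squared error than the Horvitz-Thompson based ratio.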

Table 11.1 RB and RE for several sample sizes of the estimators of R(0.5, 0.25). srswor from the 2016 Spanish living conditions survey
Table 11.2 RB and RE for several sample sizes of the estimators of R(0.8, 0.2). srswor from the 2016 Spanish living conditions survey
Table 11.3 RB and RE for several sample sizes of the estimators of R(0.9, 0.1). srswor from the 2016 Spanish living conditions survey
Table 11.4 RB and RE for several sample sizes of the estimators of R(0.9, 0.2). srswor from the 2016 Spanish living conditions survey
Table 11.5 RB and RE for several sample sizes of the estimators of R(0.95, 0.2). srswor from the 2016 Spanish living conditions survey
Table 11.6 RB and RE for several sample sizes of the estimators of R(0.95, 0.5). srswor from the 2016 Spanish living conditions survey

The results in Tables 11.1, 11.2, 11.3, 11.4, 11.5, and 11.6 show an important bias for the estimator \(\widehat {R}_{HT}\) in almost all percentile ratios. The estimator \(\widehat {R}_{cal}^{(1)}\) is not capable of correcting the bias in several situations, giving worse estimates than the HT estimator for some ratios and some response mechanisms. The proposed estimators have better values of RB, with slight differences between them, although no estimator is uniformly better than the rest.

Regarding efficiency, in general, the proposed estimators show the best performance for all sample sizes. Finally, in terms of bias and efficiency, there are no differences between the three versions of the estimators (linear method, raking and logit (l; u)) for the estimators \(\widehat {R}_{cal}^{(3)}\).

For the variance estimation and confidence intervals, we computed the coverage probability (CP) and the lower (L) and upper (U) tail error rates of the 95% confidence intervals, in percentages, as well as the average length (AL) of the confidence intervals, for each estimator and each bootstrap method.

We used 1000 bootstrap replications from each initial sample with all the bootstrap methods included in the study to compute CP, L, U and AL of the 95% confidence intervals for each percentile ratio estimator considered. Results from this simulation study for some percentile ratios are presented in Tables 11.7, 11.8, and 11.9.
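These interval diagnostics can be computed from the simulated interval limits as in the sketch below. It assumes the convention that L counts the intervals whose lower limit lies above the true value and U those whose upper limit lies below it; the function name and the numbers are hypothetical.

```python
import numpy as np

def interval_diagnostics(lower, upper, true_value):
    """Return CP, L and U (in percent) and AL for a collection of confidence intervals."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    covered = (lower <= true_value) & (true_value <= upper)
    return {
        "CP": 100 * float(np.mean(covered)),              # coverage probability
        "L": 100 * float(np.mean(true_value < lower)),    # true value below the interval
        "U": 100 * float(np.mean(true_value > upper)),    # true value above the interval
        "AL": float(np.mean(upper - lower)),              # average length
    }

# Hypothetical limits from four simulated samples, with true ratio R = 2.
res = interval_diagnostics([1.8, 2.1, 1.5, 1.9], [2.2, 2.6, 1.9, 2.4], 2.0)
print(res)   # CP = 50.0, L = 25.0, U = 25.0, AL approximately 0.45
```

By construction, CP, L and U add up to 100 for each estimator and bootstrap method.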

Table 11.7 AL, CP %, L % and U % for several resampling method of the estimators compared. srswor from the 2016 Spanish living conditions survey R (0.9, 0.1)
Table 11.8 AL, CP %,L % and U % for several resampling method of the estimators compared. srswor from the 2016 Spanish living conditions survey R (0.9, 0.2)
Table 11.9 AL, CP %,L % and U % for several resampling method of the estimators compared. srswor from the 2016 Spanish living conditions survey R (0.8, 0.2)

From bootstrap estimates, it is observed that:

–:

Bootstrap methods produce intervals with high true coverage.

–:

None of the estimators yields intervals with problems of lack of coverage.

–:

The intervals obtained from the proposed calibration estimators are always shorter than the intervals obtained from \(\widehat {R}_{HT}\) and \(\widehat {R}_{cal}^{(1)}\).

–:

The method of [1] provides results very similar to those of the method in [2].

6 Conclusion

In this study we use calibration techniques to estimate poverty measures based on percentile ratios in the presence of missing data through a more efficient estimation of the distribution function. The simulation study shows the improvement in bias and efficiency achieved with the two proposed calibration techniques, \(\widehat {R}_{cal}^{(2)}\) and \(\widehat {R}_{cal}^{(3)}\). The first is based on the two-step calibration method [10]. In the first step, the weighting is designed to remove the non-response bias, while in the second step the weighting is designed to decrease the sampling error in the estimation of the distribution function. The second method is based on calibration weighting with instrumental variables [11].

The results show a large decrease in bias and MSE for all the percentile ratios considered, for both calibration methods, and for the three versions of them based on linear, raking and logit response models, which shows the robustness of the adjustment method. Although the simulation results show that none of the proposed estimators is uniformly better than the others (with respect to either bias or efficiency), the \(\widehat {R}_{cal}^{(2)}\) and \(\widehat {R}_{KL2cal}^{(3)}\) estimators are computationally simpler than the other alternatives, which makes them a suitable option for estimating measures of wage inequality based on percentile ratios.

It is argued in [10] that there are reasons for preferring two calibration-weighting steps even when the sets of calibration variables used in both steps are the same or are a subset of the calibration variables in a single step. These reasons, together with the good performance of the two-step estimator in the simulation study, suggest choosing the estimator \(\widehat {R}_{cal}^{(2)}\).

We used parametric methods to model nonresponse, but machine learning techniques such as regression trees, spline regression or random forests could also be used. Another way to reduce the bias is to combine calibration with other techniques such as Propensity Score Adjustment [8]. Further research should focus on extensions of these methods to general parameter estimation.