Abstract
In this paper, we consider quantile regression estimation for linear models with covariate measurement errors and nonignorable missing responses. First, the influence of measurement errors is eliminated through a bias-corrected quantile loss function. To handle the identifiability issue caused by the nonignorable missingness, a nonresponse instrument is used. Then, based on the inverse probability weighting approach, we propose a weighted bias-corrected quantile loss function that can handle both nonignorable missingness and covariate measurement errors. Under certain regularity conditions, we establish the asymptotic properties of the proposed estimators. The finite-sample performance of the proposed method is illustrated by Monte Carlo simulations and an empirical data analysis.
1 Introduction
Quantile regression, first proposed by Koenker and Bassett (1978), has become an important statistical method. By considering different quantiles, quantile regression provides a more complete description of the conditional distribution of responses given covariates. In addition, compared to mean regression, quantile regression demonstrates robustness in the presence of heavy-tailed errors. A detailed review of quantile regression can be found in Koenker et al. (2017).
Traditional quantile regression assumes that the data are fully observed; that is, there is no missingness or measurement error. However, in many applications, especially in biomedical and social science studies, this assumption may be violated. It is well known that ignoring measurement errors and missing data may produce large biases in the regression coefficients (Carroll et al. 1995; Little and Rubin 2002). Therefore, when measurement errors and missingness coexist, both problems must be handled to obtain reliable results. Depending on the missing mechanism, Little and Rubin (2002) defined three types of missingness: missing completely at random, missing at random (MAR), and missing not at random (MNAR). In this paper, we consider data that are MNAR, also called nonignorable missing data.
In the context of quantile regression, numerous methods have been proposed to handle measurement errors or nonignorable missing responses separately. For quantile regression with covariate measurement errors, He and Liang (2006) studied an orthogonal regression method by assuming that both regression errors and measurement errors obey the same symmetric distribution; this method limits the flexibility of the model. Wei and Carroll (2009) established joint estimating equations and developed an iterative estimation procedure, which produces a consistent estimator. However, their approach is computationally complex. Wang et al. (2012) developed a smooth corrected quantile estimation procedure that avoids the symmetry assumption and is simple to implement. Nonignorable missing responses in quantile regression have also been studied. For example, based on an instrumental variable, Zhao et al. (2017) considered the empirical likelihood method for linear models. Ding et al. (2020) introduced a regularized estimation for ultrahigh-dimensional data. For more literature, see Jiang et al. (2016), Ma et al. (2022) and Yu et al. (2022), among others.
To the best of our knowledge, for quantile regression, there is little literature addressing nonignorable missing responses and covariate measurement errors simultaneously. We focus on this topic in this paper. Specifically, we propose a two-stage procedure for constructing a weighted bias-corrected quantile loss function, which yields a consistent estimator for linear quantile regression models with both covariate measurement errors and nonignorable nonresponse. In the first stage, we employ the bias-corrected quantile loss function to eliminate the bias introduced by measurement errors. Subsequently, a nonresponse instrument and a generalized method of moments (GMM) approach are utilized to estimate the unknown parameters in the propensity. Once the propensity is consistently estimated, in the second stage, we construct a weighted bias-corrected quantile loss function based on the inverse probability weighting (IPW) approach. Furthermore, under some regularity conditions, the asymptotic properties of the proposed estimators are derived.
The remainder of this article is organized as follows. In Sect. 2, the linear quantile regression model with covariate measurement errors and nonignorable missing responses is described. In Sect. 3, we propose a weighted bias-corrected quantile loss function. Asymptotic properties of the proposed estimators are also presented in this section. Simulation studies are given in Sect. 4. Section 5 concludes with a discussion. Proofs of the theorems are deferred to the Appendix A.
2 Statistical modeling
2.1 Linear quantile regression
For a given quantile level \(\tau \in (0,1)\), consider the following linear quantile regression model
$$\begin{aligned} Y_{i}={\textbf{X}}_{i}^\top \varvec{\beta }_{\tau 0}+e_{i},\quad i=1,\ldots ,n, \end{aligned}$$
where \(Y_{i} \in {\mathbb {R}} \) is the response, \({\textbf{X}}_{i}=(X_{i1},\ldots ,X_{ip})^\top \in {\mathbb {R}}^p\) is the corresponding covariate vector, \(\varvec{\beta }_{\tau 0}\) is a p-dimensional vector of unknown parameters and \(e _{i}\) is an error term satisfying \({\text {Pr}}(e_{i}< 0 \mid {\textbf{X}}_{i})=\tau \). Let \(Q_{Y_i}(\tau \mid {\textbf{X}}_{i})\) be the conditional quantile of \(Y_i\) given \({\textbf{X}}_{i}\); then
$$\begin{aligned} Q_{Y_i}(\tau \mid {\textbf{X}}_{i})={\textbf{X}}_{i}^\top \varvec{\beta }_{\tau 0}. \end{aligned}$$
To simplify the notation, in the remainder of the paper, we omit the subscript \(\tau \) from \(\varvec{\beta }_{\tau 0}\).
When \({\textbf{X}}_{i}\) is measured without error and \(Y_i\) is fully observed, \(\varvec{\beta }_0\) can be estimated consistently by
$$\begin{aligned} \tilde{\varvec{\beta }}=\underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\rho (Y_i,{\textbf{X}}_{i},\varvec{\beta }), \end{aligned}$$
where \(\rho (Y,{\textbf{X}},\varvec{\beta })=\rho _\tau (Y-{\textbf{X}}^\top \varvec{\beta })\), \(\rho _{\tau }(t)=\tau t-t \textrm{I}(t<0)\) is the quantile loss function and \(\textrm{I}(\cdot )\) is the indicator function.
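To fix ideas, here is a minimal numerical sketch of the check loss \(\rho _{\tau }(t)=\tau t-t \textrm{I}(t<0)\): minimizing the average check loss over candidate constants recovers the \(\tau \)-th sample quantile. The function names are illustrative, not from the paper.

```python
import numpy as np

def check_loss(t, tau):
    """Quantile (check) loss: rho_tau(t) = t * (tau - I(t < 0))."""
    t = np.asarray(t, dtype=float)
    return t * (tau - (t < 0))

def fit_quantile_intercept(y, tau):
    """Estimate the tau-th quantile of y by minimizing the average check loss
    over the data points (an intercept-only quantile regression)."""
    y = np.asarray(y, dtype=float)
    grid = np.sort(y)  # a minimizer is always attained at a data point
    losses = [check_loss(y - c, tau).mean() for c in grid]
    return grid[int(np.argmin(losses))]
```

In a full regression one minimizes \(\sum _i \rho _\tau (Y_i-{\textbf{X}}_i^\top \varvec{\beta })\) over \(\varvec{\beta }\) instead of over a single constant.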
2.2 Measurement error process
Assume that \({\textbf{X}}_i\) is measured with error and consider the following additive measurement error model
$$\begin{aligned} {\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i}, \end{aligned}$$
where \({\textbf{U}}_{i} \in {\mathbb {R}}^p\) follows a certain distribution with mean \({\textbf{0}}\) and covariance matrix \(\varvec{\Sigma }\), and is independent of \({\textbf{X}}_{i}\) and \(Y_{i}\). In the subsequent sections, our focus is on two types of measurement errors: normal and Laplace, as these error distributions provide reasonable error models in many applications (Wang et al. 2012). Compared to the normal distribution, the Laplace distribution has heavier tails, so random variables that follow the Laplace distribution are more likely to have extreme values.
In practice, not all covariates are typically measured with error. In this paper, we therefore suppose that only the first q (\(q<p\)) components of \({\textbf{X}}\) are measured with error; then
$$\begin{aligned} \varvec{\Sigma }=\begin{pmatrix} \varvec{\Sigma }^{\prime }_{q \times q} &{} {\textbf{0}} \\ {\textbf{0}} &{} {\textbf{0}} \end{pmatrix}, \end{aligned}$$
where \(\varvec{\Sigma }^{\prime }_{q \times q}\) is a \(q \times q\) matrix.
2.3 Nonignorable missing process
Consider the case where \(Y_i\) is subject to nonignorable missingness. Let \(\delta _i\) be a binary response indicator that equals 1 if and only if \(Y_i\) is observed. In this case, the propensity \({\text {Pr}}(\delta _{i}=1 \mid {\textbf{W}}_{i},Y_{i})\) is not identifiable. To solve the identifiability problem, similar to the method of Wang et al. (2014), we assume that \({\textbf{W}}_{i}\) can be decomposed into two parts \({\textbf{W}}_{i}=({\textbf{V}}_i,{\textbf{Z}}_i)\), such that
$$\begin{aligned} {\text {Pr}}(\delta _{i}=1 \mid {\textbf{W}}_{i},Y_{i})={\text {Pr}}(\delta _{i}=1 \mid {\textbf{V}}_{i},Y_{i}). \end{aligned}$$
Furthermore, we impose a parametric model on the propensity
$$\begin{aligned} {\text {Pr}}(\delta _{i}=1 \mid {\textbf{V}}_{i},Y_{i})=\Psi (\alpha _1+\varvec{\alpha }_2^\top {\textbf{V}}_{i}+\alpha _3 Y_{i}), \end{aligned}$$
where \(\varvec{\alpha }=(\alpha _1,\varvec{\alpha }_2^\top , \alpha _3)^\top \) is a \(d_{\alpha }\)-dimensional unknown parameter vector and \(\Psi \) is a known monotone function taking values in [0, 1]. Popular choices of \(\Psi \) include the complementary log-log (cloglog) model with \(\Psi (t)=1-\exp [-\exp (t)]\), the probit model with \(\Psi \) being the standard normal distribution function, and the logistic model with \(\Psi (t)=\exp (t)/[1+\exp (t)]\). Equation (3) shows that, given \(Y_i\) and \({\textbf{V}}_{i}\), \({\textbf{Z}}_i\) can be excluded from the propensity; this exclusion restriction will be used to construct estimating equations for the unknown parameter vector \(\varvec{\alpha }\) and to ensure that the propensity is identifiable. \({\textbf{Z}}_i\) is referred to as a nonresponse instrument.
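The three link functions just listed can be written directly; a small sketch (any strictly monotone CDF-type link would do):

```python
import math

def logistic(t):
    # Psi(t) = exp(t) / (1 + exp(t)), written in a numerically stable form
    return 1.0 / (1.0 + math.exp(-t))

def probit(t):
    # Psi(t) = Phi(t), the standard normal CDF, expressed via the error function
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

def cloglog(t):
    # Psi(t) = 1 - exp(-exp(t)), the complementary log-log link
    return 1.0 - math.exp(-math.exp(t))
```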
3 Inference method
3.1 Weighted corrected-loss estimation
The quantile regression estimator \(\tilde{\varvec{\beta }}\) obtained by (2) satisfies
$$\begin{aligned} \frac{1}{n}\sum _{i=1}^n{\textbf{X}}_{i}\{\textrm{I}(Y_i-{\textbf{X}}_{i}^\top \tilde{\varvec{\beta }}<0)-\tau \}\approx {\textbf{0}}. \end{aligned}$$
Under model (1), we have \({\text {Pr}}(Y<{\textbf{X}}^\top \varvec{\beta }_0\mid {\textbf{X}})=\tau \); then
$$\begin{aligned} {\mathbb {E}}\left[ {\textbf{X}}\{\textrm{I}(Y-{\textbf{X}}^\top \varvec{\beta }_0<0)-\tau \}\right] ={\mathbb {E}}\left[ {\textbf{X}}\{{\text {Pr}}(Y<{\textbf{X}}^\top \varvec{\beta }_0\mid {\textbf{X}})-\tau \}\right] ={\textbf{0}}, \end{aligned}$$
so \(\varphi (Y,{\textbf{X}},\varvec{\beta })={\textbf{X}}\{\textrm{I}(Y-{\textbf{X}}^\top \varvec{\beta }<0)-\tau \}\) is an unbiased estimating function of \(\varvec{\beta }_0\). When \({\textbf{X}}_i\) is measured with error, replacing \({\textbf{X}}_i\) in (2) with the surrogate variable \({\textbf{W}}_i\) usually results in an inconsistent estimator, because \({\mathbb {E}}\left[ {\textbf{W}}\{\textrm{I}(Y-{\textbf{W}}^\top \varvec{\beta }_0<0)-\tau \}\right] =0\) may not be satisfied. To account for the measurement error, we adopt the approach proposed by Wang et al. (2012).
Assume that \({\textbf{U}}_i\sim {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) is a p-dimensional normal random vector, and define
$$\begin{aligned} \rho _{\mathcal {N}}(\epsilon _1,h)=\epsilon _1\{\tau -1/2+G_{\mathcal {N}}(\epsilon _1/h)\}, \end{aligned}$$
where \(\epsilon _1\sim {\mathcal {N}}(\mu ,\sigma ^2)\), \(G_{\mathcal {N}}( x) = \pi ^{- 1}\int _0^x\sin ( t) /t\,\textrm{d}t\) and \(h>0\) is a smoothing parameter. \(\rho _{\mathcal {N}}(\epsilon _1,h)\) offers a smooth approximation to \(\rho _{\tau }(\epsilon _1)\). Let
where \(u\sim {\mathcal {N}}(0,1)\) is independent of \(\epsilon _1\). Note that \((Y-{\textbf{W}}^\top \varvec{\beta })\mid (Y,{\textbf{X}})\sim \) \({\mathcal {N}}(Y-{\textbf{X}}^\top \varvec{\beta },\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta })\); then, motivated by Wang et al. (2012), the bias-corrected quantile loss function of model (1) involving only normal measurement error is defined as
Next we consider Laplace measurement error. Suppose that \({\textbf{U}}_{i}\) is a p-dimensional Laplace random vector, denoted \({\textbf{U}}_{i}\sim {\mathcal {L}}({\textbf{0}},\varvec{\Sigma })\). Let \(\epsilon _2=Y-{\textbf{W}}^\top \varvec{\beta }\); then \(\epsilon _2\mid (Y,{\textbf{X}})\sim {\mathcal {L}}(Y-{\textbf{X}}^\top \varvec{\beta },\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta })\). Subsequently, the corrected quantile loss function of model (1) involving only Laplace measurement error is defined as
where \(\rho _{\mathcal {L}}(\epsilon _2,h)=\epsilon _2\{\tau -1+G_{\mathcal {L}}(\epsilon _2/h)\}\), \(G_{\mathcal {L}}(x)=\int _{t<x} K(t)\, \textrm{d}t\), \(K(\cdot )\) is a kernel density function and \(\sigma ^2=\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta }\). By some calculations, as \(h \rightarrow 0\), we have
where \({\mathbb {E}}^{*}\) is the expectation with respect to \({\textbf{W}}\) given Y and \({\textbf{X}}\). Hence, the minimizers of \(\sum _{i=1}^n\rho _{{\mathcal {N}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h)\) and \(\sum _{i=1}^n\rho _{{\mathcal {L}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h)\) are consistent estimators of \(\varvec{\beta }_0\).
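To see the smoothing idea concretely, here is a small sketch of the smoothed check loss \(\rho _{\mathcal {L}}(\epsilon _2,h)=\epsilon _2\{\tau -1+G_{\mathcal {L}}(\epsilon _2/h)\}\), taking a standard normal density as one admissible choice of the kernel \(K\). The measurement-error correction component is omitted here, so the sketch only illustrates that the smoothed loss recovers the check loss as \(h \rightarrow 0\).

```python
import math

def G_L(x):
    """CDF of a standard normal kernel K (one admissible kernel choice)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def rho_smooth(eps, tau, h):
    """Smoothed check loss: rho_L(eps, h) = eps * (tau - 1 + G_L(eps / h))."""
    return eps * (tau - 1.0 + G_L(eps / h))

def rho_check(eps, tau):
    """Ordinary check loss rho_tau(eps)."""
    return eps * (tau - (1.0 if eps < 0 else 0.0))
```

As the bandwidth h shrinks, `rho_smooth` converges pointwise to `rho_check` for any fixed nonzero residual.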
However, when Y has nonignorable missing values, the consistency mentioned above breaks down. To eliminate the effect of the missingness, the IPW method is employed to adjust the bias-corrected quantile loss functions (7) and (8), resulting in the following weighted bias-corrected quantile loss functions
$$\begin{aligned} \rho ^\star _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\varvec{\alpha }\right) =\frac{\delta }{{\Delta }({\textbf{V}}, Y, \varvec{\alpha })}\rho _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) ,\quad \rho ^\star _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\varvec{\alpha }\right) =\frac{\delta }{{\Delta }({\textbf{V}}, Y, \varvec{\alpha })}\rho _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) , \end{aligned}$$
where \({\Delta }({\textbf{V}}, Y, \varvec{\alpha })=\Psi \left( \alpha _1+\varvec{\alpha }_2^\top {\textbf{V}}+\alpha _3 Y\right) \). Note that there remains one obstacle in (10): \(\varvec{\alpha }\) is unknown.
To estimate the unknown parameter \(\varvec{\alpha }\), we construct the following estimating equation based on
$$\begin{aligned} g\left( Y_i,{\textbf{W}}_i, \delta _i,\varvec{\alpha }\right) =\left\{ \frac{\delta _i}{\Delta \left( {\textbf{V}}_{i},Y_i,\varvec{\alpha }\right) }-1\right\} \eta ({\textbf{W}}_i), \end{aligned}$$
where \(\eta ({\textbf{W}})\) is a known vector-valued function with dimension \(d_{\eta } \ge d_{\alpha }\). When \(d_{\eta }=d_{\alpha }\), the estimator \({\varvec{\hat{\alpha }}}\) is obtained by solving \(\sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i,\varvec{\alpha }\right) =0\). When \(d_{\eta }>d_{\alpha }\), we apply the GMM approach (Hansen 1982) as follows
$$\begin{aligned} {\varvec{\hat{\alpha }}}=\underset{\varvec{\alpha }}{\arg \min }\, {\bar{g}}(\varvec{\alpha })^{\top } {\varvec{\hat{\Omega }}}^{-1} {\bar{g}}(\varvec{\alpha }), \end{aligned}$$
where \({\bar{g}}(\varvec{\alpha })=n^{-1} \sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i, \varvec{\alpha }\right) \), \({\varvec{\hat{\Omega }}}^{-1}\) is the inverse of the matrix \(n^{-1}\sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i, {\varvec{\hat{\alpha }}}^{(1)}\right) g\left( Y_i,{\textbf{W}}_i, \delta _i, {\varvec{\hat{\alpha }}}^{(1)}\right) ^{\top }\) and \({\varvec{\hat{\alpha }}}^{(1)}=\underset{\varvec{\alpha }}{\arg \min }\, {\bar{g}}(\varvec{\alpha })^{\top } {\bar{g}}(\varvec{\alpha })\). Once a consistent estimator \({\varvec{\hat{\alpha }}}\) is obtained, we define the weighted bias-corrected quantile estimators as
$$\begin{aligned} \hat{\varvec{\beta }}_{\mathcal {N}}=\underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\rho ^\star _{\mathcal {N}}\left( Y_i, {\textbf{W}}_i, \varvec{\beta }, h, \delta _i,\hat{\varvec{\alpha }}\right) ,\quad \hat{\varvec{\beta }}_{\mathcal {L}}=\underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\rho ^\star _{\mathcal {L}}\left( Y_i, {\textbf{W}}_i, \varvec{\beta }, h, \delta _i,\hat{\varvec{\alpha }}\right) . \end{aligned}$$
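The two-step GMM estimation of \(\varvec{\alpha }\) can be sketched on simulated data as follows. This is a hedged illustration only: it assumes a logistic \(\Psi \), takes \(\eta ({\textbf{W}})=(1,V,Z,Z^2)^\top \) so that \(d_{\eta }>d_{\alpha }\), uses made-up coefficients, and lets `scipy.optimize.minimize` play the role of R's `optim`.

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data with a logistic propensity Pr(delta=1 | V, Y) = Psi(a1 + a2*V + a3*Y);
# Z is the nonresponse instrument (excluded from the propensity). Coefficients are made up.
rng = np.random.default_rng(2024)
n = 2000
V = rng.normal(size=n)
Z = rng.normal(size=n)
Y = 1.0 + V + Z + rng.normal(size=n)
alpha_true = np.array([1.0, 0.5, -0.3])

def Delta(a):
    """Propensity Delta(V, Y, alpha) under the logistic link."""
    return 1.0 / (1.0 + np.exp(-(a[0] + a[1] * V + a[2] * Y)))

delta = rng.binomial(1, Delta(alpha_true))
eta = np.column_stack([np.ones(n), V, Z, Z ** 2])  # d_eta = 4 > d_alpha = 3

def g_i(a):
    # g(Y_i, W_i, delta_i, alpha) = {delta_i / Delta_i(alpha) - 1} * eta(W_i);
    # for delta_i = 0 the ratio vanishes, so missing responses never enter
    # (Y is fully simulated here only so that delta can be generated)
    return (delta / Delta(a) - 1.0)[:, None] * eta

def gmm_objective(a, W):
    gbar = g_i(a).mean(axis=0)
    return gbar @ W @ gbar

opts = {"maxiter": 5000, "xatol": 1e-10, "fatol": 1e-14}
# Step 1: minimize with the identity weight matrix
a1 = minimize(gmm_objective, np.zeros(3), args=(np.eye(4),),
              method="Nelder-Mead", options=opts).x
# Step 2: re-minimize with the estimated optimal weight Omega^{-1}
G = g_i(a1)
Omega_inv = np.linalg.inv(G.T @ G / n)
alpha_hat = minimize(gmm_objective, a1, args=(Omega_inv,),
                     method="Nelder-Mead", options=opts).x
```

With a consistent `alpha_hat`, the inverse weights \(\delta _i/\Delta ({\textbf{V}}_i,Y_i,\hat{\varvec{\alpha }})\) can then be plugged into the weighted corrected loss.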
It is not difficult to show that the expectation of \(\rho ^\star _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\hat{\varvec{\alpha }}\right) \) and \(\rho ^\star _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\hat{\varvec{\alpha }}\right) \) with respect to \(\delta \) given Y and \({\textbf{W}}\) is equal to \(\rho _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) \) and \(\rho _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) \) respectively. Thus, according to Eq. (9), \(\hat{\varvec{\beta }}_{\mathcal {N}}\) and \(\hat{\varvec{\beta }}_{\mathcal {L}}\) are consistent estimators for \(\varvec{\beta }_0\), which can handle both covariate measurement errors and nonignorable missing responses.
Remark 1
The minimization problems (11) and (12) can be solved with the “optim” function in R. The initial value of \(\varvec{\beta }\) is obtained by regressing the observed \(Y_i\) on \({\textbf{W}}_i\). The smoothing parameter h can be selected through the simulation-extrapolation-type strategy proposed by Wang et al. (2012).
3.2 Large sample properties
Theorem 1
When the measurement error \({\textbf{U}}_{i} \sim {\mathcal {L}}({\textbf {0}}, \varvec{\Sigma })\), suppose that Conditions (C1)–(C4), (C6) and (C8) in Appendix A hold. If \(h \rightarrow 0\) and \((n h)^{-1 / 2} \log (n) \rightarrow 0\), then \(\hat{\varvec{\beta }}_{\mathcal {L}}\) converges to \(\varvec{\beta }_0\) in probability as \(n \rightarrow \infty \).
Theorem 2
When the measurement error \({\textbf{U}}_{i} \sim {\mathcal {N}}({\textbf {0}}, \varvec{\Sigma })\), suppose that Conditions (C1)–(C5) and (C8) in Appendix A hold. If \(h \rightarrow 0\) and \(h=c(\log n)^{-\xi }\), where \(\xi <1 / 2\) and c is a positive constant, then \(\hat{\varvec{\beta }}_{\mathcal {N}}\) converges to \(\varvec{\beta }_0\) in probability as \(n \rightarrow \infty \).
Theorem 3
Under the conditions given in Appendix A, suppose that \(\varvec{\alpha }_0 \in \Theta _{\alpha }\) is the unique solution to \({\mathbb {E}}[g(Y, {\textbf{W}},\delta , \varvec{\alpha })]=0\), \(\varvec{\Lambda }={\mathbb {E}}\left[ \partial g\left( Y, {\textbf{W}}, \delta , \varvec{\alpha }_0\right) / \partial \varvec{\alpha }\right] \) is of full rank and \(\varvec{\Omega }={\mathbb {E}}\left[ g\left( Y, {\textbf{W}},\delta , \varvec{\alpha }_0\right) g\left( Y, {\textbf{W}}, \delta , \varvec{\alpha }_0\right) ^{\top }\right] \) is positive definite. As \(n \rightarrow \infty \), we have
$$\begin{aligned} \sqrt{n}\left( \hat{\varvec{\beta }}-\varvec{\beta }_0\right) \xrightarrow {d} {\mathcal {N}}\left( {\textbf{0}},{\textbf{A}}^{-1}{\textbf{D}}{\textbf{A}}^{-1}\right) , \end{aligned}$$
where \(\hat{\varvec{\beta }}\) denotes either of the consistent estimators \(\hat{\varvec{\beta }}_{\mathcal {N}}\) and \(\hat{\varvec{\beta }}_{\mathcal {L}}\) defined in Sect. 3.1. The definitions of \({\textbf{A}}\) and \({\textbf{D}}\) are given in Appendix A.
Remark 2
The large-sample properties above are developed under the assumption that \(\varvec{\Sigma }\) is known. When \(\varvec{\Sigma }\) is unknown, it needs to be estimated. A common estimation method is the partial replication method of Carroll et al. (1995). We assume that each \({\textbf{W}}_{i}\) is itself the average of m replicate measurements \({\textbf{W}}_{i,k}, k=1, \ldots , m\), each having variance \(m \varvec{\Sigma }\). Then, a consistent and unbiased estimate of \(\varvec{\Sigma }\) is
$$\begin{aligned} \hat{\varvec{\Sigma }}=\frac{1}{nm(m-1)}\sum _{i=1}^n\sum _{k=1}^m\left( {\textbf{W}}_{i,k}-{\textbf{W}}_{i}\right) \left( {\textbf{W}}_{i,k}-{\textbf{W}}_{i}\right) ^{\top }. \end{aligned}$$
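A short simulation sketch of this replication-based estimator, with illustrative dimensions and an illustrative \(\varvec{\Sigma }\); the scaling follows the convention above that each replicate has error variance \(m\varvec{\Sigma }\), so the replicate average \({\textbf{W}}_i\) has error variance \(\varvec{\Sigma }\):

```python
import numpy as np

rng = np.random.default_rng(7)
n, m, p = 4000, 3, 2
Sigma = np.array([[1.0, 0.3], [0.3, 0.5]])  # illustrative error covariance

# Each replicate W_{i,k} = X_i + U_{i,k} with Var(U_{i,k}) = m * Sigma
L = np.linalg.cholesky(m * Sigma)
X = rng.normal(size=(n, p))
W_rep = X[:, None, :] + rng.normal(size=(n, m, p)) @ L.T
W_bar = W_rep.mean(axis=1)  # the observed surrogate W_i

# Pooled within-subject covariance, rescaled: an unbiased estimate of Sigma
R = W_rep - W_bar[:, None, :]
Sigma_hat = np.einsum("ikp,ikq->pq", R, R) / (n * m * (m - 1))
```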
4 Numerical studies
4.1 Instrument and propensity model selection
How to find a suitable nonresponse instrument from a set of covariates is an important question. For example, when \({\textbf{W}}=(W_1,W_2)^\top \) is a two-dimensional random vector, \({\textbf{Z}}\) has the following three choices
$$\begin{aligned} {\textbf{Z}}=\{W_1,W_2\},\quad {\textbf{Z}}=\{W_1\},\quad {\textbf{Z}}=\{W_2\}. \end{aligned}$$
Several studies have attempted to address this issue. Let \(p(Y\mid {\textbf{X}})\) be a generic notation for the conditional distribution. By assuming a parametric model for \(p(Y\mid {\textbf{X}})\) and an unspecified propensity, Chen et al. (2021) developed a two-step instrument search procedure. In contrast, Wang et al. (2021) proposed a penalized validation criterion (PVC) under a parametric model for the propensity but an unspecified \(p(Y\mid {\textbf{X}})\). The assumptions about \(p(Y\mid {\textbf{X}})\) and the propensity in this paper are consistent with Wang et al. (2021), which motivates us to consider the following PVC
$$\begin{aligned} \hat{k}=\underset{1\le k \le K}{\arg \min }\left\{ \textrm{VC}(k)+\lambda d_k\right\} , \end{aligned}$$
where \(\textrm{VC}(k)=\frac{1}{n}\sum _{i=1}^n|{\hat{F}}_k({\textbf{W}}_i)-{\hat{F}}({\textbf{W}}_i)|\), \({\hat{F}}({\textbf{w}})=n^{-1}\sum _{i=1}^n\textrm{I}({\textbf{W}}_i\le {\textbf{w}})\), \({\hat{F}}_k({\textbf{w}})=\frac{1}{n}\sum _{i=1}^n\frac{\delta _i\textrm{I}({\textbf{W}}_i\le {\textbf{w}})}{\Delta _k({\textbf{V}}_i,Y_i,\hat{\varvec{\alpha }}^k)},1\le k \le K\), the \(\Delta _k({\textbf{V}}_i,Y_i,\varvec{\alpha }^k)\) are the candidate models, K is the total number of candidate models, \(d_k\) is the dimension of \(\varvec{\alpha }^k\) and \(\lambda \ge 0\) is a regularization parameter whose value can be determined by cross-validation.
To be specific, when assuming \(\Psi (\vartheta )=\exp (\vartheta )/ [1+\exp (\vartheta )]\), the candidate models \(\Delta _k({\textbf{V}}, Y, \varvec{\alpha }^k)\) corresponding to (13) are as follows
$$\begin{aligned} \Delta _k({\textbf{V}}, Y, \varvec{\alpha }^k)=\frac{\exp \left( \alpha _1^k+(\varvec{\alpha }_2^k)^\top {\textbf{V}}_k+\alpha _3^k Y\right) }{1+\exp \left( \alpha _1^k+(\varvec{\alpha }_2^k)^\top {\textbf{V}}_k+\alpha _3^k Y\right) },\quad k=0,1,2, \end{aligned}$$
where \({\textbf{V}}_0=\emptyset , {\textbf{V}}_1=\{W_2\},{\textbf{V}}_2=\{W_1\}\). Note that criterion (14) enables the simultaneous selection of both the propensity model and the nonresponse instrument. By replacing \(\exp (\vartheta )/ [1+\exp (\vartheta )]\) with an alternative link function, we can derive three additional candidate models. Selection among these six candidates can then be carried out according to criterion (14).
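To make the criterion concrete, the validation statistic VC(k) can be computed directly from its definition. The sketch below is hedged in two ways: it plugs the true propensity into the correct candidate (rather than an estimated \(\hat{\varvec{\alpha }}^k\)), and it uses a constant MCAR-type weight as a deliberately wrong candidate, on simulated data with made-up coefficients.

```python
import numpy as np

rng = np.random.default_rng(11)
n = 3000
W1 = rng.normal(size=n)
W2 = rng.normal(size=n)
Y = W1 + 0.5 * rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(1.5 - 1.0 * Y)))  # illustrative true propensity
delta = rng.binomial(1, p_true)
W = np.column_stack([W1, W2])

# le[i, j] = I(W_j <= W_i) taken componentwise; F_hat is the empirical CDF
# evaluated at the sample points
le = np.all(W[None, :, :] <= W[:, None, :], axis=2)
F_hat = le.mean(axis=1)

def vc(inv_prob):
    """VC(k) = n^{-1} sum_i |F_hat_k(W_i) - F_hat(W_i)| with IPW weights delta_j * inv_prob_j."""
    F_k = (le * (delta * inv_prob)[None, :]).mean(axis=1)
    return np.abs(F_k - F_hat).mean()

vc_correct = vc(1.0 / p_true)                   # candidate with the right propensity
vc_wrong = vc(np.full(n, 1.0 / delta.mean()))   # candidate that ignores Y entirely
# The penalized criterion then compares VC(k) + lambda * d_k across candidates.
```

A correctly specified candidate leaves VC(k) at sampling-noise level, while a misspecified one inherits the bias of its IPW distribution estimate.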
4.2 Monte Carlo studies
In this section, we conduct Monte Carlo simulations to study the finite-sample performance of the proposed estimation. Simulated data are generated from the model
$$\begin{aligned} Y_i=\beta _{1}X_{i1}+\beta _{2}X_{i2}+{e_i}(\tau ), \end{aligned}$$
where \(X_{i1}\sim \textrm{Uniform}(-3,3)\), \(X_{i2}\sim {\mathcal {N}}(0,2^2)\), \(\beta _{1}=1\), \(\beta _{2}=2\), \({e_i}(\tau )={e_i}-F_{{e_i}}^{-1}(\tau )\) and \(F_{{e_i}}(\cdot )\) is the distribution function of \({e_i}\). We consider three different distributions for \({e_i}\):
-
(1)
Normal distribution (E1): \({\mathcal {N}}(0,2^2)\);
-
(2)
Heteroscedastic normal distribution (E2): \({\mathcal {N}}(0,(1+{|X_{i2} |})^2)\);
-
(3)
t-distribution with 3 degrees of freedom (E3): t(3).
Note that E2 is a heteroscedastic error and E3 is a heavy-tailed error. Furthermore, the measurement error model is \({\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i}\), where the \({\textbf{U}}_{i}\) are generated from \( {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) with
We generate \(\delta _{i}\) from the Bernoulli distribution according to the following probability
The coefficients are chosen such that the missing rate is between \(25\%\) and \(40\%\). Then, we choose \(\eta ({\textbf{W}}) = (1, {\textbf{V}}^\top ,{\textbf{Z}}^\top )^\top \), which is consistent with Wang et al. (2014) and Wang et al. (2021). More specifically, according to Eq. (15), it can be deduced that \({\textbf{V}}=W_{i1}\) and \({\textbf{Z}}=W_{i2}\); therefore, in this example, \(\eta ({\textbf{W}}) = (1, W_{i1}, W_{i2})^\top \).
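The missingness-generation step can be sketched as follows. Everything here is illustrative: the covariate distributions, the logistic form, and the coefficients are hypothetical stand-ins (the paper's actual Eq. (15) coefficients are not reproduced); the sketch only shows how \(\delta _i\) is drawn and the missing rate checked against the targeted band.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 100_000
# stand-in surrogate covariate and response (distributions are illustrative only)
W1 = rng.uniform(-3, 3, size=n) + rng.normal(scale=0.4, size=n)
Y = W1 + rng.normal(size=n)
# hypothetical logistic propensity in (W1, Y); the coefficients are chosen so the
# missing rate falls in the 25%-40% band targeted by the simulation design
p = 1.0 / (1.0 + np.exp(-(1.0 - 0.2 * W1 - 0.15 * Y)))
delta = rng.binomial(1, p)
missing_rate = 1.0 - delta.mean()
```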
First, we conduct simulations with sample sizes of \(n = 300\), 500, and 800 to assess the PVC outlined in Sect. 4.1. Table 1 reports the number of times each candidate model is selected by the PVC in 100 Monte Carlo replications. According to Table 1, the PVC selects the correct propensity \(\Delta _{2}(Y,{\textbf{V}}_{2},\varvec{\alpha }^2)\) with higher empirical probability than the other candidates. Remarkably, the probability of selecting \(\Delta _{2}(Y,{\textbf{V}}_{2},\varvec{\alpha }^2)\) almost reaches 1 when the sample size increases to 800.
Furthermore, to evaluate the estimation efficiency, we conduct simulation studies of the following four estimators:
-
(1)
N: The naive estimator that ignores both the measurement errors and missingness is defined as follows
$$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\delta _i\rho _\tau \left( Y_i-{\textbf{W}}_i^\top \varvec{\beta } \right) . \end{aligned}$$
-
(2)
D: The estimator that only considers the missingness is obtained by
$$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\frac{\delta _i}{\Delta \left( {\textbf{V}}_{i},Y_i, \hat{\varvec{\alpha }}\right) } \rho _\tau \left( Y_i-{\textbf{W}}_i^\top \varvec{\beta } \right) . \end{aligned}$$
-
(3)
M: The estimator that only considers the measurement errors is defined as
$$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\rho _{{\mathcal {N}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h). \end{aligned}$$
-
(4)
DM: The proposed estimator, which considers the measurement errors and missingness simultaneously.
All results are based on 200 simulation replications with sample sizes \(n = 300\) and 500. The bias (Bias) and the root mean square error (RMSE) are used to assess the performance of the aforementioned estimators. For each component \(\beta _{0j}\), Bias and RMSE are defined as
$$\begin{aligned} \textrm{Bias}=\frac{1}{200}\sum _{s=1}^{200}\hat{\beta }_j^{(s)}-\beta _{0j},\quad \textrm{RMSE}=\left\{ \frac{1}{200}\sum _{s=1}^{200}\left( \hat{\beta }_j^{(s)}-\beta _{0j}\right) ^2\right\} ^{1/2}, \end{aligned}$$
where \(\hat{\beta }_j^{(s)}\) is the estimate of \(\beta _{0j}\) in the s-th replication.
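These two summary measures amount to the following short computation (a trivial but exact sketch; function name is illustrative):

```python
import numpy as np

def bias_rmse(estimates, beta0):
    """Monte Carlo bias and RMSE of replicate estimates of a scalar coefficient."""
    estimates = np.asarray(estimates, dtype=float)
    bias = estimates.mean() - beta0
    rmse = np.sqrt(np.mean((estimates - beta0) ** 2))
    return bias, rmse
```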
Simulation results are presented in Tables 2 and 3. Figure 1 presents the boxplots of \({\hat{\beta }}_j-\beta _{0j}\ (j=1,2)\) at \((\tau ,n)=(0.5,300)\) for all four methods. A few conclusions can be drawn as follows:
-
(1)
The proposed estimator has negligible biases in all cases, which also demonstrates that it is insensitive to the distribution of the error term \(e_i\). These results are consistent with our theory. As expected, the naive estimator is biased due to the presence of measurement errors and nonignorable missingness.
-
(2)
From Fig. 1, it can be seen that the variance of the proposed estimator is larger than that of the naive estimator. However, as the sample size n increases, the RMSE of the proposed estimator tends to be consistently lower. This indicates that, despite the increased variance associated with the proposed method, the benefit from bias correction effectively offsets this variance, leading to an overall improvement in estimation accuracy.
Finally, the robustness of the proposed estimator to the choice of \(\eta ({\textbf{W}})\) is investigated. Specifically, we consider \(\eta _1({\textbf{W}}) = (W_{i1}, W_{i2}, W_{i2}^2)^\top \) and \(\eta _2({\textbf{W}}) = (1, W_{i1}, W_{i2}, W_{i2}^2)^\top \); the simulation results with \(n = 500\) are reported in Table 4. The empirical results show that the proposed estimator is robust to the choice of \(\eta ({\textbf{W}})\).
Simulation studies under Laplace measurement error are presented in Simulation I in Appendix B.1. The experimental results yield conclusions that align with those in the above example. Simulation II in Appendix B.2 shows that the proposed estimators perform well even when the measurement error distribution is misspecified.
4.3 Real data example: Boston housing data
As an illustration, the proposed methodology is now applied to the Boston housing data, available in the MASS package in R. These data contain 506 observations on fourteen variables. Many studies have used these data and found potential relationships between MEDV and PTRATIO, RM, TAX and LSTAT; see Yu and Lu (2004) and Jiang et al. (2016). In this paper, we also focus on the following five variables:
-
MEDV:
Median value of owner-occupied homes in $1000;
-
PTRATIO:
Pupil-teacher ratio by town;
-
RM:
Average number of rooms per dwelling;
-
TAX:
Full-value property-tax rate per $10,000;
-
LSTAT:
Percentage of low-income population.
We follow previous studies by log-transforming TAX and LSTAT. For simplicity of notation, the variables MEDV, PTRATIO, RM, \(\log (\textrm{TAX})\) and \(\log (\textrm{LSTAT})\) are denoted, respectively, by \(Y_i\), \(X_{i1}\), \(X_{i2}\), \(X_{i3}\) and \(X_{i4}\). The model
is used to fit the data at quantile level \(\tau = 0.5\). To better illustrate our proposed method, we assume that \(X_{i1}\) is subject to measurement error. The measurement error model is constructed as \({\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i}\), where \({\textbf{X}}_{i}=(X_{i1},X_{i2},X_{i3},X_{i4})^\top \) and the \({\textbf{U}}_{i}\) are generated from \( {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) with
Because our proposed method is robust to misspecification of the measurement error distribution, only the case \({\textbf{U}}_{i}\sim {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) is studied in this example. Then, to consider scenarios with missing data, binary response indicators \(\delta _i\) are generated prior to estimation, with \(\delta _i \sim {\text {Bernoulli}} (p_i)\). We consider three choices of \(p_{i}\):
-
M1
\(p_i=1/\{1+\exp (-1.5+0.9W_{i1}+0.9Y_i)\}\);
-
M2
\(p_i=1/\{1+\exp (-1.4+0.9W_{i1}+0.8\sin (Y_i))\}\);
-
M3
\(p_i=|\sin (-1+0.2W_{i1}^{-1}+0.1Y_i)|\).
The above coefficients in M1, M2 and M3 are set so that the missing ratio is approximately 20%. To apply the proposed method, we use the working model
$$\begin{aligned} \Delta ({\textbf{V}}_i, Y_i, \varvec{\alpha })=\frac{\exp \left( \alpha _1+\alpha _2 W_{i1}+\alpha _3 Y_i\right) }{1+\exp \left( \alpha _1+\alpha _2 W_{i1}+\alpha _3 Y_i\right) }. \end{aligned}$$
Therefore, under M1, the working model is correct, while under M2 and M3, the working model is misspecified. Table 5 summarizes the coefficient estimates with four methods under M1. The standard errors in the parentheses are obtained based on 200 bootstrap samples. The findings presented in Table 5 reveal that only RM positively influences housing price, while PTRATIO, TAX, and LSTAT negatively impact housing price, consistent with the conclusions in Yu and Lu (2004) and Jiang et al. (2016).
For comparison, we assess the performance of these estimators based on out-of-sample predictions. Specifically, we estimate the above four regression models on 300 observations and then employ the estimated coefficients to forecast the remaining 206 observations. We compare the mean squared error (MSE) and mean absolute deviation (MAD) of the predictions, defined as
$$\begin{aligned} \textrm{MSE}=\frac{1}{206}\sum _{i=1}^{206}\left( {\hat{Y}}_i-Y_i\right) ^2,\quad \textrm{MAD}=\frac{1}{206}\sum _{i=1}^{206}\left| {\hat{Y}}_i-Y_i\right| , \end{aligned}$$
where \({\hat{Y}}_i\) denotes the predicted value.
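These two prediction criteria reduce to the following short computation (a trivial sketch; function name is illustrative):

```python
import numpy as np

def mse_mad(y_pred, y_obs):
    """Out-of-sample mean squared error and mean absolute deviation."""
    e = np.asarray(y_pred, dtype=float) - np.asarray(y_obs, dtype=float)
    return np.mean(e ** 2), np.mean(np.abs(e))
```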
The MSE and MAD for the four estimators under M1 are given in Table 6. The results show that our proposed method outperforms the remaining three methods. Additionally, Jiang et al. (2018) used a partially linear varying coefficient model to fit the Boston housing data based on a weighted composite quantile regression method. It is noteworthy that they used the same evaluation criteria as we do, and the MSE and MAD of their method were 4.4039 and 0.9560, respectively. This again shows that our method is effective in correcting the biases caused by measurement errors and missing data.
Finally, we computed the MSE and MAD of the proposed estimator under M2 and M3, obtaining values of 0.653, 0.549, 0.639 and 0.543, respectively. Compared with the results obtained under M1, using a wrong propensity in the estimation process leads to poorer results. Consequently, in applications where the true propensity is unknown, it is advisable to use the PVC outlined in Sect. 4.1 to determine a suitable propensity model, thereby conducting better statistical inference.
5 Conclusion and discussion
In this paper, a robust method has been proposed to deal simultaneously with nonignorable nonresponse and covariate measurement errors in the linear quantile regression model. We also established the asymptotic properties of the proposed estimators. Simulation studies and a real data analysis were given to examine the finite-sample performance of the proposed approaches. Several extensions can be investigated in the future. To obtain more efficient estimates, the results of this paper can be generalized to composite quantile regression (Kai et al. 2011). In addition, penalized variable selection can be used to identify significant predictors.
References
Carroll RJ, Ruppert D, Stefanski LA (1995) Measurement error in nonlinear models. Chapman and Hall, London
Chen J, Shao J, Fang F (2021) Instrument search in pseudo-likelihood approach for nonignorable nonresponse. Ann Inst Stat Math 73(3):519–533. https://doi.org/10.1007/s10463-020-00758-z
Ding X, Chen J, Chen X (2020) Regularized quantile regression for ultrahigh-dimensional data with nonignorable missing responses. Metrika 83(5):545–568. https://doi.org/10.1007/s00184-019-00744-3
Hansen L (1982) Large sample properties of generalized method of moments estimators. Econometrica 50(4):1029–1054
He X, Liang H (2006) Quantile regression estimates for a class of linear and partially linear errors-in-variables models. Stat Sin 10(1):129–140
Jiang D, Zhao P, Tang N (2016) A propensity score adjustment method for regression models with nonignorable missing covariates. Comput Stats Data Anal 94:98–119. https://doi.org/10.1016/j.csda.2015.07.017
Jiang R, Qian W, Zhou Z (2016) Weighted composite quantile regression for single-index models. J Multivar Anal 148:34–48. https://doi.org/10.1016/j.jmva.2016.02.015
Jiang R, Qian W, Zhou Z (2018) Weighted composite quantile regression for partially linear varying coefficient models. Commun Stat Theory Methods 47(16):3987–4005. https://doi.org/10.1080/03610926.2017.1366522
Kai B, Li R, Zou H (2011) New efficient estimation and variable selection methods for semiparametric varying-coefficient partially linear models. Ann Stat 39(1):305–332. https://doi.org/10.1214/10-AOS842
Koenker R, Bassett G (1978) Regression quantiles. Econometrica 46(1):33–50
Koenker R, Chernozhukov V, He X, Peng L (2017) Handbook of Quantile Regression. Chapman and Hall, New York
Little R, Rubin D (2002) Statistical Analysis with Missing Data. Wiley, New York
Ma W, Zhang T, Wang L (2022) Improved multiple quantile regression estimation with nonignorable dropouts. J Korean Stat Soc 52:1–32. https://doi.org/10.1007/s42952-022-00185-1
Qin G, Zhang J, Zhu Z (2016) Quantile regression in longitudinal studies with dropouts and measurement errors. J Stat Comput Simul 86(17):3527–3542. https://doi.org/10.1080/00949655.2016.1171867
Wang H, Stefanski L, Zhu Z (2012) Corrected-loss estimation for quantile regression with covariate measurement errors. Biometrika 99(2):405–421. https://doi.org/10.1093/biomet/ass005
Wang L, Shao J, Fang F (2021) Propensity model selection with nonignorable nonresponse and instrument variable. Stat Sin 31(2):647–672. https://doi.org/10.5705/ss.202019.0025
Wang S, Shao J, Kim JK (2014) An instrumental variable approach for identification and estimation with nonignorable nonresponse. Stat Sin 24(3):1097–1116. https://doi.org/10.5705/ss.2012.074
Wei Y, Carroll RJ (2009) Quantile regression with measurement error. J Am Stat Assoc 104(487):1129–1143. https://doi.org/10.1198/jasa.2009.tm08420
White H (1980) Nonlinear regression on cross-sectional data. Econometrica 48:721–746
Yu A, Zhong Y, Feng X, Wei Y (2022) Quantile regression for nonignorable missing data with its application of analyzing electronic medical records. Biometrics 79(3):2036–2049. https://doi.org/10.1111/biom.13723
Yu K, Lu Z (2004) Local linear additive quantile regression. Scand J Stat 31(3):333–346. https://doi.org/10.1111/j.1467-9469.2004.03_035.x
Zhao P, Zhao H, Tang N, Li Z (2017) Weighted composite quantile regression analysis for nonignorable missing data using nonresponse instrument. J Nonparametric Stat 29(2):189–212. https://doi.org/10.1080/10485252.2017.1285030
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors.
Appendices
Proofs of theorems
To establish the asymptotic properties of the proposed estimators, the following regularity conditions are imposed.
- (C1) The samples \(\{({\textbf{X}}_i,Y_i,\delta _i):i=1,\ldots ,n\}\) are independent and identically distributed.
- (C2) The parameter space of \(\varvec{\beta }\), denoted by \(\Theta _{\varvec{\beta }}\), is a compact set, and the true parameter \(\varvec{\beta }_0\) is an interior point of \(\Theta _{\varvec{\beta }}\).
- (C3) The expectation \({\mathbb {E}}\left[ \left\| {\textbf{X}}_i\right\| ^2\right] \) is bounded, and \({\mathbb {E}}\left[ {\textbf{X}}_i {\textbf{X}}_i^\top \right] \) is a positive definite \(p \times p\) matrix.
- (C4) The probability density function of \(e_{i}\) conditional on \({\textbf{X}}_{i}\) is bounded; in a neighbourhood of zero, it is bounded away from zero and has a bounded first derivative.
- (C5) For each i, \({\mathbb {E}}\left[ e_i^2 \mid {\textbf{X}}_i\right] \) is bounded as a function of \(\tau \).
- (C6) The kernel function K(x) is a bounded probability density function with a finite fourth moment. Moreover, K(x) is twice differentiable, and its second derivative \(K^{(2)}(x)\) is bounded and Lipschitz continuous on \((-\infty , \infty )\).
- (C7) Let \(\rho ^\star (Y, {\textbf{W}},\varvec{\beta },h,\delta ,\varvec{\alpha })\) and \(\rho (Y, {\textbf{W}}, \varvec{\beta }, h)\) denote, respectively, the weighted bias-corrected quantile loss function and the bias-corrected quantile loss function for either normal or Laplace measurement errors. Denote \(\psi _1(Y, {\textbf{W}}, \varvec{\beta },h,\delta , \varvec{\alpha }) =\partial \rho ^\star (Y, {\textbf{W}}, \varvec{\beta },h,\delta , \varvec{\alpha }) / \partial \varvec{\beta }\), \(\psi _2(Y, {\textbf{W}}, \varvec{\beta },h,\delta , \varvec{\alpha })=\partial ^2 \rho ^\star (Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\varvec{\alpha })/ \partial \varvec{\beta }\partial \varvec{\beta }^\top \) and \({\tilde{\psi }}_1(Y, {\textbf{W}}, \varvec{\beta }, h )=\partial \rho (Y, {\textbf{W}}, \varvec{\beta }, h) / \partial \varvec{\beta }\). As \(n \rightarrow \infty \) and \(h \rightarrow 0\), there exist positive definite matrices \({\textbf{A}}\), \({\textbf{B}}\) and \({\textbf{H}}\) such that \({\mathbb {E}}[\psi _2(Y, {\textbf{W}}, \varvec{\beta }_0, h, \delta ,\varvec{\alpha }_0)] \rightarrow {\textbf{A}}\), \({\mathbb {E}}[\psi _1(Y, {\textbf{W}}, \varvec{\beta }_0, h,\delta ,\varvec{\alpha }_0)^{\otimes 2}] \rightarrow {\textbf{B}}\) and \({\mathbb {E}}\left[ \frac{\partial \Delta \left( {\textbf{V}}, Y, \varvec{\alpha }_0\right) / \partial \varvec{\alpha }}{\Delta \left( {\textbf{V}}, Y, \varvec{\alpha }_0\right) } {\tilde{\psi }}_1\left( Y, {\textbf{W}}, \varvec{\beta }_0, h\right) \right] \rightarrow {\textbf{H}}\).
- (C8) The propensity \(\Delta \left( {\textbf{V}},Y, \varvec{\alpha }\right) \) satisfies: (a) it is twice differentiable with respect to \(\varvec{\alpha }\); (b) \(0< c<\Delta \left( {\textbf{V}},Y,\varvec{\alpha }\right) < 1\) for a positive constant c; (c) \(\partial \Delta ({\textbf{V}},Y,\varvec{\alpha })/\partial \varvec{\alpha }\) is uniformly bounded.
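Condition (C8)(b), which bounds the propensity away from zero, guarantees that the inverse probability weights \(\delta _i/\Delta ({\textbf{V}}_i,Y_i,\varvec{\alpha })\) appearing in the weighted loss are bounded by \(1/c\). The following is a minimal numerical sketch of such weighting using the standard (unsmoothed) quantile check loss rather than the paper's bias-corrected loss; the logistic propensity form and all numeric values are illustrative assumptions, not the paper's specification.

```python
import numpy as np

def check_loss(u, tau):
    """Standard quantile check loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return u * (tau - (u < 0).astype(float))

def ipw_weighted_loss(y, x, beta, delta, propensity, tau):
    """Inverse-probability-weighted quantile loss:
    mean of (delta_i / Delta_i) * rho_tau(y_i - x_i^T beta).
    Observations with delta_i = 0 (missing response) drop out of the sum."""
    resid = y - x @ beta
    w = delta / propensity
    return np.mean(w * check_loss(resid, tau))

rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=(n, 2))
beta0 = np.array([1.0, -0.5])            # illustrative coefficients
y = x @ beta0 + rng.normal(size=n)
# Illustrative logistic propensity, clipped so that c = 0.1 as in (C8)(b).
prop = 1.0 / (1.0 + np.exp(-(0.5 + 0.3 * x[:, 0])))
prop = np.clip(prop, 0.1, 0.9)
delta = rng.binomial(1, prop)
loss = ipw_weighted_loss(y, x, beta0, delta, prop, tau=0.5)
```

Because the propensity is clipped at \(c=0.1\), every weight \(\delta _i/\Delta _i\) here is at most \(1/c=10\), so the weighted objective cannot be dominated by a handful of rarely observed points.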
Remark 3
(C2) ensures the existence of \(\hat{\varvec{\beta }}_{\mathcal {N}}\) and \(\hat{\varvec{\beta }}_{\mathcal {L}}\) and the uniform convergence of the minimand over \(\Theta _{\varvec{\beta }}\), which is required to prove consistency. (C3)–(C4) ensure that \(\varvec{\beta }_0\) is the unique minimizer of \({\mathbb {E}}\{\rho (Y,{\textbf{X}},\varvec{\beta })\}\). (C7) is assumed in order to establish the asymptotic normality of the estimators \({\varvec{\hat{\beta }}}_{\mathcal {N}}\) and \({\varvec{\hat{\beta }}}_{\mathcal {L}}\).
Proof of Theorem 1
Define
Following Theorem 2 in Wang et al. (2012), we have \({\mathbb {E}} [M_{\mathcal {L}}\left( {\textbf{W}}, \varvec{\beta }, h, \varvec{\alpha }_0\right) ]={\mathbb {E}} [{\dot{M}}_{\mathcal {L}}\left( {\textbf{X}}, \varvec{\beta }, h, \varvec{\alpha }_0\right) ]\), which leads to
For the first term on the right-hand side of the inequality, by Taylor expansion,
where \(\varvec{\alpha }^*\) lies between \(\varvec{\alpha }_0\) and \(\hat{\varvec{\alpha }}\). According to Wang et al. (2014), \(\hat{\varvec{\alpha }}-\varvec{\alpha }_0=O_p(n^{-1 / 2})\); hence
By arguments similar to those in Wang et al. (2012) and Qin et al. (2016), the remaining four terms on the right-hand side of the inequality satisfy, respectively,
and
Combining Eqs. (A2)–(A6), when \(h \rightarrow 0\) and \((n h)^{-1 / 2} \log (n) \rightarrow 0\), we can obtain
By Conditions (C3) and (C4), \(\varvec{\beta }_0\) uniquely minimizes \({\mathbb {E}}[ M\left( {\textbf{X}}, \varvec{\beta }, \varvec{\alpha }_0\right) ]\) over \(\Theta _{\varvec{\beta }}\). According to Lemma 2.2 in White (1980), \(\hat{\varvec{\beta }}_{\mathcal {L}}\) converges to \(\varvec{\beta }_0\) in probability. \(\square \)
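The consistency step just invoked is the standard argmax argument of Lemma 2.2 in White (1980); schematically, with \(M_n\) denoting the sample objective and using the notation above, it combines uniform convergence with uniqueness of the minimizer:

```latex
\sup_{\varvec{\beta}\in\Theta_{\varvec{\beta}}}
  \bigl|\, M_n(\varvec{\beta},\hat{\varvec{\alpha}})
       - \mathbb{E}[M(\mathbf{X},\varvec{\beta},\varvec{\alpha}_0)] \,\bigr|
  \xrightarrow{\;p\;} 0
\quad\text{and}\quad
\varvec{\beta}_0
  = \mathop{\arg\min}_{\varvec{\beta}\in\Theta_{\varvec{\beta}}}
    \mathbb{E}[M(\mathbf{X},\varvec{\beta},\varvec{\alpha}_0)]
\;\Longrightarrow\;
\hat{\varvec{\beta}}_{\mathcal{L}} \xrightarrow{\;p\;} \varvec{\beta}_0 .
```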
Proof of Theorem 2
The proof of Theorem 2 is similar to that of Theorem 1 and of Part (ii) of Theorem 3 in Wang et al. (2012), and is therefore omitted. \(\square \)
Proof of Theorem 3
In the proof of Theorem 3, we drop \({\mathcal {L}}\) in \(\hat{\varvec{\beta }}_{\mathcal {L}}\) for notational simplicity. Define
Furthermore, let
By Taylor expansion, we can obtain
Rearranging terms, we obtain
Under the Conditions (C1)-(C6), we have
By Condition (C7),
Then
For \(I_{1}\), there is
By the results of Theorem 2 in Wang et al. (2012) and arguments similar to those used to obtain the asymptotic means and variances of kernel density estimators, we have
as \(n \rightarrow \infty \) and \(h \rightarrow 0\). Therefore
Together with (A8), (A9), and (A10), Condition (C7) and the central limit theorem, we can derive that
Turning to \(I_{2}\), note that
and
thus we have
by the law of large numbers and Condition (C7). Furthermore, according to Wang et al. (2014), we have \(\hat{\varvec{\alpha }}-\varvec{\alpha }_0=O_p(n^{-1 / 2})\) and \(n^{1 / 2}\left( \hat{\varvec{\alpha }}-\varvec{\alpha }_0\right) {\mathop {\longrightarrow }\limits ^{d}} {\mathcal {N}}(0, \varvec{\Sigma }_{\alpha })\) with \(\varvec{\Sigma }_{\alpha }=\left\{ \varvec{\Lambda }^{\top } \varvec{\Omega }^{-1} \varvec{\Lambda }\right\} ^{-1}\). As a result,
In addition, it is not difficult to verify that \({\mathbb {E}}[I_{i 1}+I_{i 2}]=o_p(1)\). Note that
Direct calculation yields that
Employing similar idea in the proof of \(I_{i 2}\), we get \(D_1=o_p(1)\). On the other hand,
For \(i \ne j\), note that \(\frac{-\delta _{{j}}\partial \Delta ({\textbf{V}}_{{j}},Y_{{j}},\varvec{\alpha }_{0})/\partial \varvec{\alpha }}{\Delta ({\textbf{V}}_{{j}},Y_{{j}},\varvec{\alpha }_0)^2}{\tilde{\psi }}_1\left( Y_{{j}},{\textbf{W}}_{{j}}, \varvec{\beta }_0, h \right) \left( \hat{\varvec{\alpha }}-\varvec{\alpha }_0\right) \) and \(\psi _1\big (Y_i,{\textbf{W}}_{i}, \) \(\varvec{\beta }_0, h,\delta _i,\varvec{\varvec{\alpha }}_0\big )\) are independent. Similar to the proofs of \(I_{i 1}\) and \(I_{i 2}\), it leads to
Thus, we have \(D_2=o_p(1)\). Consequently, \(\left( n^{-1 / 2} \sum _{i=1}^n I_{i 1}\right) \cdot \left( n^{-1 / 2} \sum _{i=1}^n I_{i 2}\right) = o_p(1)\), and hence \({\text {Cov}}\left( I_{i 1}, I_{i 2}\right) =o_p(1)\). It follows that \({\text {Cov}}\left( I_{i 1}+I_{i 2}\right) ={\textbf{B}}+{\textbf{H}} \varvec{\Sigma }_{\alpha } {\textbf{H}}^{\top }\triangleq {\textbf{D}}\). As a result,
Combining (A7) and (A14), we then have
The asymptotic normality of \(\hat{\varvec{\beta }}_{\mathcal {N}}\) can be obtained by replacing \(\rho _{\mathcal {L}}(Y, {\textbf{W}}, \varvec{\beta }, h) \) and \(\dot{\rho }_{\mathcal {L}}\left( Y, {\textbf{X}}, \varvec{\beta }, h\right) \) with \(\rho _{\mathcal {N}}(Y, {\textbf{W}}, \varvec{\beta }, h) \) and \(\dot{\rho }_{\mathcal {N}}\left( Y, {\textbf{X}}, \varvec{\beta }, h\right) \) in the above proof. \(\square \)
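Collecting the pieces, the limiting distribution implied by (A7) and (A14) takes a sandwich form; the display below is a schematic statement consistent with the matrices \({\textbf{A}}\), \({\textbf{B}}\), \({\textbf{H}}\), \(\varvec{\Sigma }_{\alpha }\) and \({\textbf{D}}\) defined in (C7) and the proof above (the paper's Theorem 3 gives the precise form):

```latex
\sqrt{n}\,\bigl(\hat{\varvec{\beta}}_{\mathcal{L}} - \varvec{\beta}_0\bigr)
\xrightarrow{\;d\;}
\mathcal{N}\bigl(\mathbf{0},\;
  \mathbf{A}^{-1}\mathbf{D}\,\mathbf{A}^{-1}\bigr),
\qquad
\mathbf{D} = \mathbf{B} + \mathbf{H}\,\varvec{\Sigma}_{\alpha}\,\mathbf{H}^{\top}.
```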
Additional simulation studies
1.1 Simulation I
Consider model
where \((X_{i1},X_{i2})^\top \sim {\mathcal {N}}\left( {\textbf{1}}, \varvec{\Sigma }_x\right) \) with \(\varvec{\Sigma }_x=\left( 0.5^{|j-k|}\right) _{1\le j,k\le 2}\). The measurement error model is \({\textbf{W}}_{i}={\textbf{X}}_{i }+{\textbf{U}}_{i}\), where \({\textbf{U}}_{i} \sim {\mathcal {L}}({\textbf{0}},\varvec{\Sigma })\) with
In this example, we choose \(K(x)=\frac{1}{\sqrt{2\pi }}e^{-\frac{x^2}{2}}\). The other settings are the same as in the example in Sect. 4.2, and the missing rate in this example is between \(30\%\) and \(48\%\). The simulation results are presented in Tables 7 and 8. Figure 2 presents the boxplots of \({\hat{\beta }}_k-\beta _{0k}\) \((k=1,2)\) at \((\tau ,n)=(0.25,300)\) for all four methods. We can draw conclusions similar to those in the first example.
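The covariate and measurement-error specification above can be sketched numerically as follows. Since the Laplace scale matrix \(\varvec{\Sigma }\) is not reproduced above, an independent componentwise Laplace error with scale \(b=0.25\) is assumed here purely for illustration, and only the fully specified parts (the covariate law, the additive error model and the Gaussian kernel) are implemented.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 100_000

# X_i = (X_i1, X_i2)^T ~ N(1, Sigma_x) with Sigma_x[j, k] = 0.5^{|j-k|}.
sigma_x = np.array([[1.0, 0.5],
                    [0.5, 1.0]])
x = rng.multivariate_normal(mean=[1.0, 1.0], cov=sigma_x, size=n)

# Measurement error model W_i = X_i + U_i with Laplace errors.
# The paper's scale matrix Sigma is not shown above; an independent
# componentwise Laplace with scale b = 0.25 is an illustrative assumption.
b = 0.25
u = rng.laplace(loc=0.0, scale=b, size=(n, 2))
w = x + u

# Gaussian kernel K(x) used in the bias-corrected loss.
def K(t):
    return np.exp(-t**2 / 2) / np.sqrt(2 * np.pi)

# Empirical covariance of X, for comparison with Sigma_x.
emp_cov = np.cov(x, rowvar=False)
```

With \(n=100{,}000\) draws, the off-diagonal entry of `emp_cov` should be close to the specified value \(0.5\), confirming that the generated covariates follow the stated design.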
1.2 Simulation II
To examine whether the proposed method is robust to misspecification of the measurement error distribution, in this example we exchange the distribution of the measurement error \({\textbf{U}}_{i}\) between the example in Sect. 4.2 and the example in Simulation I, without changing the other settings. More specifically, we define two estimators \({\hat{\beta }}_{{\mathcal {L}}{{\mathcal {N}}}}\) and \({\hat{\beta }}_{{\mathcal {N}}{{\mathcal {L}}}}\) as follows
Table 9 reports the Bias and RMSE of the estimators \(\hat{\varvec{\beta }}_{{\mathcal {L}}{{\mathcal {N}}}}\) and \(\hat{\varvec{\beta }}_{{\mathcal {N}}{{\mathcal {L}}}}\) over 200 simulation replicates. Figures 3 and 4 present the boxplots of \({\hat{\beta }}_k-\beta _{0k}\) \((k=1,2)\) at \(\tau =0.50\). The simulation results show that both proposed estimators are robust against misspecification of the measurement error distribution.
Liang, X., Tian, B. Statistical inference for linear quantile regression with measurement error in covariates and nonignorable missing responses. Metrika (2024). https://doi.org/10.1007/s00184-024-00967-z