1 Introduction

Quantile regression, first proposed by Koenker and Bassett (1978), has become an important statistical method. By considering different quantiles, quantile regression provides a more complete description of the conditional distribution of responses given covariates. In addition, compared to mean regression, quantile regression demonstrates robustness in the presence of heavy-tailed errors. A detailed review of quantile regression can be found in Koenker et al. (2017).

Traditional quantile regression assumes that the data are fully observed, that is, that there is no missingness or measurement error. In many applications, however, especially in biomedical and social science studies, this assumption may be violated. It is well known that ignoring measurement errors and missing data may produce large biases in the regression coefficients (Carroll et al. 1995; Little and Rubin 2002). Therefore, when measurement errors and missingness coexist, both problems must be handled to obtain reliable results. Depending on the missingness mechanism, Little and Rubin (2002) defined three types of missingness: missing completely at random, missing at random (MAR), and missing not at random (MNAR). In this paper, we consider data that are MNAR, also called nonignorable missing data.

In the context of quantile regression, numerous methods have been proposed to handle measurement errors or nonignorable missing responses separately. For quantile regression with covariate measurement errors, He and Liang (2006) studied an orthogonal regression method under the assumption that the regression errors and the measurement errors follow the same symmetric distribution; this assumption limits the flexibility of the model. Wei and Carroll (2009) established joint estimating equations and developed an iterative estimation procedure, which produces a consistent estimator but is computationally complex. Wang et al. (2012) developed a smooth corrected quantile estimation procedure that avoids the symmetry assumption and is simple to implement. Nonignorable missing responses in quantile regression have also been studied. For example, based on an instrumental variable, Zhao et al. (2017) considered an empirical likelihood method for linear models, and Ding et al. (2020) introduced a regularized estimation for ultrahigh-dimensional data. For further literature, see Jiang et al. (2016), Ma et al. (2022) and Yu et al. (2022), among others.

To the best of our knowledge, little literature in quantile regression addresses the concurrent biases arising from nonignorable missing responses and covariate measurement errors. We focus on this topic in this paper. Specifically, we propose a two-stage procedure for constructing a weighted bias-corrected quantile loss function, which yields a consistent estimator for linear quantile regression models with both covariate measurement errors and nonignorable nonresponse. In the first stage, we employ the bias-corrected quantile loss function to eliminate the bias introduced by measurement errors. Subsequently, a nonresponse instrument and a generalized method of moments (GMM) approach are utilized to estimate the unknown parameters in the propensity. Once the propensity is consistently estimated, in the second stage, we construct a weighted bias-corrected quantile loss function based on the inverse probability weighting (IPW) approach. Furthermore, under some regularity conditions, the asymptotic properties of the proposed estimators are derived.

The remainder of this article is organized as follows. In Sect. 2, the linear quantile regression model with covariate measurement errors and nonignorable missing responses is described. In Sect. 3, we propose a weighted bias-corrected quantile loss function; asymptotic properties of the proposed estimators are also presented in this section. Simulation studies are given in Sect. 4. Section 5 concludes with a discussion. Proofs of the theorems are deferred to Appendix A.

2 Statistical modeling

2.1 Linear quantile regression

For a given quantile level \(\tau \in (0,1)\), consider the following linear quantile regression model

$$\begin{aligned} Y_{i}={\textbf{X}}_{i}^\top \varvec{\beta }_{\tau 0}+e_{i}, \ i=1,2,\ldots ,n, \end{aligned}$$
(1)

where \(Y_{i} \in {\mathbb {R}} \) is the response, \({\textbf{X}}_{i}=(X_{i1},\ldots ,X_{ip})^\top \in {\mathbb {R}}^p\) is the corresponding covariate vector, \(\varvec{\beta }_{\tau 0}\) is a p-dimensional vector of unknown parameters and \(e _{i}\) is an error term satisfying \({\text {Pr}}(e_{i}< 0 \mid {\textbf{X}}_{i})=\tau \). Let \(Q_{Y_i}(\tau \vert {\textbf{X}}_{i})\) be the conditional quantile of \(Y_i\) given \({\textbf{X}}_{i}\); then

$$\begin{aligned} Q_{Y_i}(\tau \vert {\textbf{X}}_{i})={\textbf{X}}_{i}^\top \varvec{\beta }_{\tau 0}. \end{aligned}$$

To simplify the notation, in the remainder of the paper, we omit the subscript \(\tau \) from \(\varvec{\beta }_{\tau 0}\).

When \({\textbf{X}}_{i}\) is measured without an error and \(Y_i\) is fully observed, \(\varvec{\beta }_0\) can be estimated consistently by

$$\begin{aligned} \tilde{\varvec{\beta }}=\underset{\varvec{\beta }}{\text {argmin}}\sum _{i=1}^{n} \rho \left( Y_i,{\textbf{X}}_{i},\varvec{\beta }\right) , \end{aligned}$$
(2)

where \(\rho (Y,{\textbf{X}},\varvec{\beta })=\rho _\tau (Y-{\textbf{X}}^\top \varvec{\beta })\), \(\rho _{\tau }(t)=\tau t-t \textrm{I}(t<0)\) is the quantile loss function and \(\textrm{I}(\cdot )\) is the indicator function.
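As a concrete illustration, the check loss and the minimization in (2) can be sketched as follows (a minimal Python sketch; the paper's own computations use R, and the function names here are ours):

```python
import numpy as np
from scipy.optimize import minimize

def check_loss(t, tau):
    """Quantile (check) loss: rho_tau(t) = tau*t - t*I(t < 0) = t*(tau - I(t < 0))."""
    t = np.asarray(t, dtype=float)
    return t * (tau - (t < 0))

def fit_quantile_regression(Y, X, tau):
    """Minimize the summed check loss over beta, as in (2).
    Nelder-Mead is used because the check loss is not differentiable at 0."""
    beta0 = np.linalg.lstsq(X, Y, rcond=None)[0]  # least-squares starting value
    obj = lambda b: np.sum(check_loss(Y - X @ b, tau))
    return minimize(obj, beta0, method="Nelder-Mead").x
```

In practice one would use a dedicated linear-programming solver (as R's quantreg does); the direct minimization above is only meant to make the definition of (2) concrete.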

2.2 Measurement error process

Assume that \({\textbf{X}}_i\) is measured with error and consider the following additive measurement error model

$$\begin{aligned} {\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i} , \ i=1,2,\ldots ,n, \end{aligned}$$

where \({\textbf{U}}_{i} \in {\mathbb {R}}^p\) follows a certain distribution with mean \({\textbf{0}}\) and covariance matrix \(\varvec{\Sigma }\), and is independent of \({\textbf{X}}_{i}\) and \(Y_{i}\). In the subsequent sections, our focus is on two types of measurement errors: normal and Laplace, as these error distributions provide reasonable error models in many applications (Wang et al. 2012). Compared to the normal distribution, the Laplace distribution has heavier tails, so random variables that follow the Laplace distribution are more likely to have extreme values.

In practice, often not all covariates are measured with errors. In this paper, we therefore suppose that only the first q (\(q<p\)) components of \({\textbf{X}}\) have measurement errors, then

$$\begin{aligned} \varvec{\Sigma }=\left( \begin{array}{cc} \varvec{\Sigma }^{\prime }_{q \times q} &{} \quad {\textbf {0}} \\ {\textbf {0}}^\top &{} \quad {\textbf {0}} \end{array}\right) , \end{aligned}$$

where \(\varvec{\Sigma }^{\prime }_{q \times q}\) is a \(q \times q\) matrix.

2.3 Nonignorable missing process

Consider the case where \(Y_i\) is subject to nonignorable missingness. Let \(\delta _i\) be a binary response indicator that equals 1 if and only if \(Y_i\) is observed. In this case, the propensity \({\text {Pr}}(\delta _{i}=1 \mid {\textbf{W}}_{i},Y_{i})\) is not identifiable without further assumptions. To address the identifiability problem, similar to the method of Wang et al. (2014), we assume that \({\textbf{W}}_{i}\) can be decomposed into two parts \({\textbf{W}}_{i}=({\textbf{V}}_i,{\textbf{Z}}_i)\), such that

$$\begin{aligned} {\text {Pr}}(\delta _i=1 \mid {\textbf{W}}_{i},Y_i)={\text {Pr}}(\delta _i=1 \mid {\textbf{V}}_{i},Y_i). \end{aligned}$$
(3)

Furthermore, we impose a parametric model on the propensity

$$\begin{aligned} {\text {Pr}}(\delta _i=1 \mid {\textbf{V}}_{i}, Y_i)=\Psi \left( \alpha _1+{\varvec{\alpha }_2^{\top }{\textbf{V}}_{i}} +\alpha _3 Y_i\right) , \end{aligned}$$
(4)

where \(\varvec{\alpha }=(\alpha _1,\varvec{\alpha }_2^\top , \alpha _3)^\top \) is a \(d_{\alpha }\)-dimensional unknown parameter vector and \(\Psi \) is a known monotone function taking values in [0, 1]. Popular choices of \(\Psi \) include the cLog-log model with \(\Psi (t)=1-\exp [-\exp (t)]\), the probit model with \(\Psi \) being the standard normal distribution function, and the logistic model with \(\Psi (t)=\exp (t)/[1+\exp (t)]\). Equation (3) shows that, given \(Y_i\) and \({\textbf{V}}_{i}\), \({\textbf{Z}}_i\) can be excluded from the propensity; this fact will be used to construct estimating equations for the unknown parameter vector \(\varvec{\alpha }\) and to ensure that \(\Psi \) is identifiable. \({\textbf{Z}}_i\) is referred to as a nonresponse instrument.
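The three link functions just listed can be written out explicitly (a trivial sketch; function names are ours):

```python
import numpy as np
from math import erf, sqrt

def logistic(t):
    """Psi(t) = exp(t) / [1 + exp(t)]."""
    return 1.0 / (1.0 + np.exp(-t))

def probit(t):
    """Psi = standard normal distribution function, written via erf."""
    return 0.5 * (1.0 + erf(t / sqrt(2.0)))

def cloglog(t):
    """Complementary log-log: Psi(t) = 1 - exp[-exp(t)]."""
    return 1.0 - np.exp(-np.exp(t))
```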

3 Inference method

3.1 Weighted corrected-loss estimation

The quantile regression estimator \(\tilde{\varvec{\beta }}\) obtained by (2) satisfies

$$\begin{aligned} n^{-1}\sum \limits _{i=1}^n {\textbf{X}}_i \left\{ \textrm{I} \left( Y_i-{\textbf{X}}_i^\top \tilde{\varvec{\beta }}<0 \right) -\tau \right\} =o_p(1). \end{aligned}$$

Under model (1), \({\text {Pr}}(Y<{\textbf{X}}^\top \varvec{\beta }_0\mid {\textbf{X}})=\tau \), so

$$\begin{aligned} {\mathbb {E}}\left[ {\textbf{X}} \left\{ \textrm{I} \left( Y-{\textbf{X}}^\top \varvec{\beta }_0<0 \right) -\tau \right\} \right] =0, \end{aligned}$$
(5)

so \(\varphi (Y,{\textbf{X}},\varvec{\beta })={\textbf{X}}\{\textrm{I}(Y-{\textbf{X}}^\top \varvec{\beta }<0)-\tau \}\) is an unbiased estimating function of \(\varvec{\beta }_0\). When \({\textbf{X}}_i\) is measured with error, replacing \({\textbf{X}}_i\) in (2) with the surrogate variable \({\textbf{W}}_i\) usually results in an inconsistent estimator, because \({\mathbb {E}}\left[ {\textbf{W}}\{\textrm{I}(Y-{\textbf{W}}^\top \varvec{\beta }_0<0)-\tau \}\right] =0\) may not be satisfied. To account for the measurement error, we adopt the approach proposed by Wang et al. (2012).

Assume that \({\textbf{U}}_i\sim {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) is a p-dimensional normal random vector, and define

$$\begin{aligned} \rho _{\mathcal {N}}(\epsilon _1,h)=\epsilon _1 \left\{ \tau -1/2+G_{\mathcal {N}}(\epsilon _1/h)\right\} , \end{aligned}$$

where \(\epsilon _1\sim {\mathcal {N}}(\mu ,\sigma ^2)\), \(G_{\mathcal {N}}(x) = \pi ^{-1}\int _0^x \sin (t)/t \,\textrm{d}t\) and h is a smoothing parameter. \(\rho _{\mathcal {N}}(\epsilon _1,h)\) offers a smooth approximation to \(\rho _{\tau }(\epsilon _1)\). Let

$$\begin{aligned} \begin{aligned} f\left( \epsilon _1,\sigma ^2,h\right)&={\mathbb {E}}\left[ \rho _{\mathcal {N}}(\epsilon _1+\sqrt{-1}\sigma u,h)\mid \epsilon _1\right] \\&=\epsilon _1(\tau -1/2)+\pi ^{-1} \\&\quad \times \int _0^{1/h}\{t^{-1}\epsilon _1\sin (t\epsilon _1)-\sigma ^2\cos (t\epsilon _1)\}\exp (t^2\sigma ^2/2) \textrm{d}t. \end{aligned} \end{aligned}$$
(6)

where \(u\sim {\mathcal {N}}(0,1)\) is independent of \(\epsilon _1\). Note that \((Y-{\textbf{W}}^\top \varvec{\beta })\mid (Y,{\textbf{X}})\sim \) \({\mathcal {N}}(Y-{\textbf{X}}^\top \varvec{\beta },\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta })\); then, motivated by Wang et al. (2012), the bias-corrected quantile loss function of model (1) involving only normal measurement error is defined as

$$\begin{aligned} \rho _{\mathcal {N}}(Y,{\textbf{W}},\varvec{\beta },h) =f\left( Y-{\textbf{W}}^\top \varvec{\beta },\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta },h\right) . \end{aligned}$$
(7)
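Equation (6) can be transcribed numerically; the sketch below uses scipy's quad for the one-dimensional integral. The integrand contains the factor \(\exp (t^2\sigma ^2/2)\), so the evaluation is only stable for moderate \(1/h\) and small \(\sigma ^2\):

```python
import numpy as np
from scipy.integrate import quad

def corrected_normal_loss(eps, sigma2, tau, h):
    """f(eps, sigma^2, h) of Eq. (6), with eps = Y - W'beta and
    sigma2 = beta' Sigma beta; with sigma2 = 0 it reduces to rho_N(eps, h)."""
    def integrand(t):
        first = eps * np.sin(t * eps) / t if t > 0 else eps * eps  # limit at t = 0
        return (first - sigma2 * np.cos(t * eps)) * np.exp(t**2 * sigma2 / 2)
    val, _ = quad(integrand, 0.0, 1.0 / h, limit=200)
    return eps * (tau - 0.5) + val / np.pi
```

As h shrinks with \(\sigma ^2=0\), the value approaches the ordinary check loss, which is the smoothing property used throughout this section.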

Next, we consider Laplace measurement error. Suppose that \({\textbf{U}}_{i}\) is a p-dimensional Laplace random vector, denoted \({\textbf{U}}_{i}\sim {\mathcal {L}}({\textbf{0}},\varvec{\Sigma })\). Let \(\epsilon _2=Y-{\textbf{W}}^\top \varvec{\beta }\); then \(\epsilon _2\mid (Y,{\textbf{X}})\sim {\mathcal {L}}(Y-{\textbf{X}}^\top \varvec{\beta },\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta })\). The corrected quantile loss function of model (1) involving only Laplace measurement error is then defined as

$$\begin{aligned} \begin{aligned} \rho _{\mathcal {L}}(Y,{\textbf{W}},\varvec{\beta },h)&=\rho _{\mathcal {L}}(\epsilon _2,h)-\frac{\sigma ^2}{2}\frac{\partial ^2\rho _{\mathcal {L}}(\epsilon _2,h)}{\partial \epsilon _2^2} \\&=\epsilon _2(\tau -1)+\epsilon _2 G_{\mathcal {L}}\left( \dfrac{\epsilon _2}{h}\right) \\&\quad -\dfrac{\sigma ^2}{2} \left\{ \dfrac{2}{h}K\left( \dfrac{\epsilon _2}{h}\right) +\dfrac{\epsilon _2}{h^2}K^{\prime }\left( \dfrac{\epsilon _2}{h}\right) \right\} , \end{aligned} \end{aligned}$$
(8)

where \(\rho _{\mathcal {L}}(\epsilon _2,h)=\epsilon _2\{\tau -1+G_{\mathcal {L}}(\epsilon _2/h)\}\), \(G_{\mathcal {L}}(x)=\int _{t<x} K(t) \textrm{d}t\), \(K(\cdot )\) is a kernel density function and \(\sigma ^2=\varvec{\beta }^\top \varvec{\Sigma }\varvec{\beta }\). Direct calculations show that, as \(h \rightarrow 0\),

$$\begin{aligned} {\mathbb {E}}^{*}[\rho _{\mathcal {N}}(Y,{\textbf{W}},\varvec{\beta },h)]&=\rho _{\mathcal {N}}(Y-{\textbf{X}}^\top \varvec{\beta },h) \triangleq \dot{\rho }_{\mathcal {N}}(Y,{\textbf{X}},\varvec{\beta },h) \longrightarrow \rho (Y,{\textbf{X}},\varvec{\beta }),\nonumber \\ {\mathbb {E}}^{*}[\rho _{\mathcal {L}}(Y,{\textbf{W}},\varvec{\beta },h)]&=\rho _{\mathcal {L}}(Y-{\textbf{X}}^\top \varvec{\beta },h) \triangleq \dot{\rho }_{\mathcal {L}}(Y,{\textbf{X}},\varvec{\beta },h) \longrightarrow \rho (Y,{\textbf{X}},\varvec{\beta }), \end{aligned}$$
(9)

where \({\mathbb {E}}^{*}\) is the expectation with respect to \({\textbf{W}}\) given Y and \({\textbf{X}}\). Hence, the minimizers of \(\sum _{i=1}^n\rho _{{\mathcal {N}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h)\) and \(\sum _{i=1}^n\rho _{{\mathcal {L}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h)\) are consistent estimators of \(\varvec{\beta }_0\).
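The Laplace-corrected loss (8) is available in closed form once a kernel is fixed. The sketch below takes K to be the standard normal density (one admissible choice, not prescribed by the paper), for which \(G_{\mathcal {L}}=\Phi \) and \(K^{\prime }(t)=-t\,K(t)\):

```python
import numpy as np
from math import erf, sqrt, pi

def corrected_laplace_loss(eps, sigma2, tau, h):
    """Eq. (8) with Gaussian kernel K: the smooth loss rho_L(eps, h) minus the
    second-order correction (sigma2/2) * d^2 rho_L / d eps^2."""
    K = lambda t: np.exp(-t * t / 2) / sqrt(2 * pi)   # kernel density
    G = lambda t: 0.5 * (1 + erf(t / sqrt(2)))        # G_L = Phi
    u = eps / h
    smooth = eps * (tau - 1 + G(u))                   # rho_L(eps, h)
    correction = 0.5 * sigma2 * (2 / h * K(u) + eps / h**2 * (-u * K(u)))
    return smooth - correction
```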

When Y has nonignorable missing values, however, the consistency above no longer holds. To eliminate the effect of the missingness, the IPW method is employed to adjust the bias-corrected quantile loss functions (7) and (8), resulting in the following weighted bias-corrected quantile loss functions

$$\begin{aligned} {\rho ^\star _{\mathcal {N}}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\varvec{\alpha }\right)&=\frac{\delta }{\Delta ({\textbf{V}}, Y, \varvec{\alpha })} \rho _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) ,\nonumber \\ {\rho ^\star _{\mathcal {L}}}\left( Y, {\textbf{W}}, \varvec{\beta }, h,\delta , \varvec{\alpha }\right)&=\frac{\delta }{\Delta ({\textbf{V}}, Y, \varvec{\alpha })} \rho _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) , \end{aligned}$$
(10)

where \({\Delta }({\textbf{V}}, Y, \varvec{\alpha })=\Psi \left( \alpha _1+\varvec{\alpha }_2^\top {\textbf{V}}+\alpha _3 Y\right) \). One obstacle remains in (10): \(\varvec{\alpha }\) is unknown.

To estimate the unknown parameter \(\varvec{\alpha }\), we construct the following estimating equation

$$\begin{aligned} g(Y, {\textbf{W}}, \delta , \varvec{\alpha })={\eta }({\textbf{W}})\left[ \frac{\delta }{\Delta ({\textbf{V}}, Y, \varvec{\alpha })}-1\right] , \end{aligned}$$

where \(\eta ({\textbf{W}})\) is a known vector-valued function with dimension \(d_{\eta } \ge d_{\alpha }\). When \(d_{\eta }=d_{\alpha }\), the estimator \({\varvec{\hat{\alpha }}}\) is obtained by solving \(\sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i,\varvec{\alpha }\right) =0\). When \(d_{\eta }>d_{\alpha }\), we apply the GMM (Hansen 1982) approach as follows

$$\begin{aligned} {\varvec{\hat{\alpha }}}=\underset{\varvec{\alpha }}{\arg \min } {\bar{g}}(\varvec{\alpha })^{\top } {\varvec{\hat{\Omega }}}^{-1} {\bar{g}}(\varvec{\alpha }), \end{aligned}$$

where \({\bar{g}}(\varvec{\alpha })=n^{-1} \sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i, \varvec{\alpha }\right) \), \({\varvec{\hat{\Omega }}}^{-1}\) is the inverse of the matrix \(n^{-1}\) \(\sum _{i=1}^n g\left( Y_i,{\textbf{W}}_i, \delta _i, {\varvec{\hat{\alpha }}}^{(1)}\right) g\left( Y_i,{\textbf{W}}_i, \delta _i, {\varvec{\hat{\alpha }}}^{(1)}\right) ^{\top }\) and \({\varvec{\hat{\alpha }}}^{(1)}=\underset{\varvec{\alpha }}{\arg \min } {\bar{g}}(\varvec{\alpha })^{\top } {\bar{g}}(\varvec{\alpha })\). Once a consistent estimator \({\varvec{\hat{\alpha }}}\) is obtained, we define the weighted bias-corrected quantile estimators as

$$\begin{aligned}{} & {} {\varvec{\hat{\beta }}}_{\mathcal {N}}=\underset{\varvec{\beta } }{{\text {argmin}}} \sum _{i=1}^n \rho ^\star _{\mathcal {N}}\left( Y_i,{\textbf{W}}_{i}, \varvec{\beta }, h,\delta _i,{\varvec{\hat{\alpha }}}\right) , \end{aligned}$$
(11)
$$\begin{aligned}{} & {} {\varvec{\hat{\beta }}}_{\mathcal {L}}=\underset{\varvec{\beta } }{{\text {argmin}}} \sum _{i=1}^n \rho ^\star _{\mathcal {L}}\left( Y_i,{\textbf{W}}_{i}, \varvec{\beta }, h,\delta _i, {\varvec{\hat{\alpha }}}\right) . \end{aligned}$$
(12)
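The GMM step for \(\varvec{\alpha }\) described above can be sketched as follows (a Python sketch for the logistic \(\Psi \), scalar V and Z, and \(\eta ({\textbf{W}})=(1,V,Z)^\top \), so that \(d_\eta =d_\alpha \) here and the second step is mainly illustrative; names are ours):

```python
import numpy as np
from scipy.optimize import minimize

def gmm_propensity(Y, V, Z, delta, alpha_start):
    """Two-step GMM estimate of alpha = (alpha1, alpha2, alpha3) in the logistic
    propensity Delta = Psi(alpha1 + alpha2*V + alpha3*Y). Entries of Y with
    delta = 0 may hold any value: they enter only through delta/Delta = 0."""
    eta = np.column_stack([np.ones_like(V), V, Z])

    def moments(alpha):
        a1, a2, a3 = alpha
        Delta = 1.0 / (1.0 + np.exp(-(a1 + a2 * V + a3 * Y)))
        r = np.where(delta > 0, 1.0 / Delta, 0.0) - 1.0   # delta/Delta - 1
        return eta * r[:, None]

    gbar = lambda alpha: moments(alpha).mean(axis=0)
    # Step 1: identity weighting matrix.
    a_step1 = minimize(lambda a: gbar(a) @ gbar(a), alpha_start,
                       method="Nelder-Mead").x
    # Step 2: weight by the inverse estimated covariance of the moments.
    G = moments(a_step1)
    W = np.linalg.inv(G.T @ G / len(delta))
    return minimize(lambda a: gbar(a) @ W @ gbar(a), a_step1,
                    method="Nelder-Mead").x
```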

It is not difficult to show that the expectations of \(\rho ^\star _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\hat{\varvec{\alpha }}\right) \) and \(\rho ^\star _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h, \delta ,\hat{\varvec{\alpha }}\right) \) with respect to \(\delta \) given Y and \({\textbf{W}}\) equal \(\rho _{\mathcal {N}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) \) and \(\rho _{\mathcal {L}}\left( Y, {\textbf{W}}, \varvec{\beta }, h\right) \), respectively. Thus, according to Eq. (9), \(\hat{\varvec{\beta }}_{\mathcal {N}}\) and \(\hat{\varvec{\beta }}_{\mathcal {L}}\) are consistent estimators of \(\varvec{\beta }_0\) that can handle both covariate measurement errors and nonignorable missing responses.

Remark 1

The minimization problems (11) and (12) can be solved by the “optim” function in the R software. The initial value of \(\varvec{\beta }\) is obtained by regressing the observed \(Y_i\) on \({\textbf{W}}_i\). The smoothing parameter h can be selected through a simulation-extrapolation-type strategy proposed by Wang et al. (2012).
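A Python analogue of the strategy in Remark 1, shown for the error-free smooth loss \(\rho _{\mathcal {N}}(\epsilon ,h)=\epsilon \{\tau -1/2+G_{\mathcal {N}}(\epsilon /h)\}\) with \(G_{\mathcal {N}}(x)=\pi ^{-1}\,\mathrm {Si}(x)\) (the same optimizer-plus-crude-start recipe carries over to the weighted corrected losses in (11) and (12); this is a sketch, not the paper's R code):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import sici  # sici(x) returns (Si(x), Ci(x))

def smooth_loss(eps, tau, h):
    """rho_N(eps, h) = eps * {tau - 1/2 + Si(eps/h)/pi}: a differentiable
    surrogate for the check loss."""
    return eps * (tau - 0.5 + sici(eps / h)[0] / np.pi)

def fit_smoothed(Y, X, tau, h=0.1):
    """Gradient-based minimization started from a crude regression fit,
    mirroring the initial-value strategy of Remark 1."""
    beta0 = np.linalg.lstsq(X, Y, rcond=None)[0]
    obj = lambda b: np.sum(smooth_loss(Y - X @ b, tau, h))
    return minimize(obj, beta0, method="BFGS").x
```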

3.2 Large sample properties

Theorem 1

When the measurement error \({\textbf{U}}_{i} \sim {\mathcal {L}}({\textbf {0}}, \varvec{\Sigma })\), suppose that Conditions (C1)–(C4), (C6) and (C8) in Appendix A hold. If \(h \rightarrow 0\) and \((n h)^{-1 / 2} \log (n) \rightarrow 0\), then \(\hat{\varvec{\beta }}_{\mathcal {L}}\) converges to \(\varvec{\beta }_0\) in probability as \(n \rightarrow \infty \).

Theorem 2

When the measurement error \({\textbf{U}}_{i} \sim {\mathcal {N}}({\textbf {0}}, \varvec{\Sigma })\), suppose that Conditions (C1)–(C5) and (C8) in Appendix A hold. If \(h \rightarrow 0\) and \(h=c(\log n)^{-\xi }\), where \(\xi <1 / 2\) and c is a positive constant, then \(\hat{\varvec{\beta }}_{\mathcal {N}}\) converges to \(\varvec{\beta }_0\) in probability as \(n \rightarrow \infty \).

Theorem 3

Suppose that the conditions given in Appendix A hold, that \(\varvec{\alpha }_0 \in \Theta _{\alpha }\) is the unique solution to \({\mathbb {E}}[g(Y, {\textbf{W}},\delta , \varvec{\alpha })]=0\), that \(\varvec{\Lambda }={\mathbb {E}}\left[ \partial g\left( Y, {\textbf{W}}, \delta , \varvec{\alpha }_0\right) / \partial \varvec{\alpha }\right] \) is of full rank, and that \(\varvec{\Omega }={\mathbb {E}}\left[ g\left( Y, {\textbf{W}},\delta , \varvec{\alpha }_0\right) g\left( Y, {\textbf{W}}, \delta , \varvec{\alpha }_0\right) ^{\top }\right] \) is positive definite. Then, as \(n \rightarrow \infty \), we have

$$\begin{aligned} \sqrt{n}\left( \hat{\varvec{\beta }}-\varvec{\beta }_0\right) {\mathop {\longrightarrow }\limits ^{d}} {\mathcal {N}}\left( 0, {\textbf{A}}^{-1} {\textbf{D}} {\textbf{A}}^{-1}\right) \end{aligned}$$

where \(\hat{\varvec{\beta }}\) is the consistent estimator of \(\varvec{\beta }_0\), either \(\hat{\varvec{\beta }}_{\mathcal {N}}\) or \(\hat{\varvec{\beta }}_{\mathcal {L}}\) defined in Sect. 3.1. The definitions of \({\textbf{A}}\) and \({\textbf{D}}\) are given in Appendix A.

Remark 2

The large sample properties above are developed under the assumption that \(\varvec{\Sigma }\) is known. When \(\varvec{\Sigma }\) is unknown, it needs to be estimated. A common approach is the partial replication method proposed by Carroll et al. (1995). We assume that each \({\textbf{W}}_{i}\) is itself the average of m replicate measurements \({\textbf{W}}_{i,k}, k=1, \ldots , m\), each having variance \(m \varvec{\Sigma }\). Then, a consistent and unbiased estimate of \(\varvec{\Sigma }\) is

$$\begin{aligned} \hat{\varvec{\Sigma }}=\left\{ n\left( m-1\right) \right\} ^{-1} \sum _{i=1}^n \sum _{k=1}^{m}\left( {\textbf{W}}_{i, k}-{\textbf{W}}_{i }\right) \left( {\textbf{W}}_{i,k}-{\textbf{W}}_{i}\right) ^{\top }. \end{aligned}$$

4 Numerical studies

4.1 Instrument and propensity model selection

How to find a suitable nonresponse instrument from a set of covariates is an important question. For example, when \({\textbf{W}}=(W_1,W_2)^\top \) is a two-dimensional random vector, \({\textbf{Z}}\) has the following three choices

$$\begin{aligned} {\textbf{Z}}_0=\{W_1,W_2\},~{\textbf{Z}}_1=\{W_1\},~{\textbf{Z}}_2=\{W_2\}. \end{aligned}$$
(13)

Several studies have attempted to address the issues mentioned above. Let \(p(Y\mid {\textbf{X}})\) be generic notation for a conditional distribution. By assuming a parametric model for \(p(Y\mid {\textbf{X}})\) and an unspecified propensity, Chen et al. (2021) developed a two-step instrument search procedure. In contrast, Wang et al. (2021) proposed a penalized validation criterion (PVC) under a parametric model for the propensity but an unspecified \(p(Y\mid {\textbf{X}})\). The assumptions about \(p(Y\mid {\textbf{X}})\) and the propensity in this paper are consistent with Wang et al. (2021), which motivates us to consider the following PVC

$$\begin{aligned} \begin{aligned} \textrm{PVC}_\lambda (k)&={\text {VC}}(k)+\lambda \log \left( d_k\right) , \\ {\hat{k}}&=\underset{1 \le k \le K}{{\text {argmin}}} \textrm{PVC}_\lambda (k), \end{aligned} \end{aligned}$$
(14)

where \(\textrm{VC}(k)=\frac{1}{n}\sum _{i=1}^n|{\hat{F}}_k({\textbf{W}}_i)-{\hat{F}}({\textbf{W}}_i)|\), \({\hat{F}}({\textbf{w}})=n^{-1}\sum _{i=1}^n\textrm{I}({\textbf{W}}_i\le {\textbf{w}})\), \({\hat{F}}_k({\textbf{w}})=\frac{1}{n}\sum _{i=1}^n\frac{\delta _i\textrm{I}({\textbf{W}}_i\le {\textbf{w}})}{\Delta _k({\textbf{V}}_i,Y_i,\hat{\varvec{\alpha }}^k)},1\le k \le K\), \(\Delta _k({\textbf{V}}_i,Y_i,\varvec{\alpha }^k)\) are the candidate models, K is the total number of candidate models, \(d_k\) is the dimension of \(\varvec{\alpha }^k\), and \(\lambda \ge 0\) is a regularization parameter whose value can be determined by cross-validation.
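The criterion itself is simple to evaluate once each candidate propensity has been fitted (a sketch; `Delta_hat` holds the fitted values \(\Delta _k({\textbf{V}}_i,Y_i,\hat{\varvec{\alpha }}^k)\), and the inequality \({\textbf{W}}_i\le {\textbf{w}}\) is taken componentwise):

```python
import numpy as np

def pvc(W, delta, Delta_hat, lam, d_k):
    """PVC_lambda(k) = VC(k) + lam * log(d_k) of Eq. (14): VC(k) compares the
    IPW-weighted empirical CDF of W with the ordinary empirical CDF."""
    leq = np.all(W[:, None, :] <= W[None, :, :], axis=2)  # leq[i, j] = I(W_i <= W_j)
    F_hat = leq.mean(axis=0)                              # F-hat evaluated at each W_j
    w = delta / Delta_hat                                 # IPW weights delta_i / Delta_k
    F_k = (w[:, None] * leq).mean(axis=0)
    return np.mean(np.abs(F_k - F_hat)) + lam * np.log(d_k)
```

A well-specified propensity drives the two empirical CDFs together, so the correct candidate yields a small VC value.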

To be specific, when \(\Psi (\vartheta )=\exp (\vartheta )/ [1+\exp (\vartheta )]\), the candidate models \(\Delta _k({\textbf{V}}, Y, \varvec{\alpha }^k)\) corresponding to (13) are as follows

$$\begin{aligned} \begin{aligned} \Delta _{0} \left( {\textbf{V}}_{0},Y,\varvec{\alpha }^0 \right)&=\frac{\exp \left( \alpha ^0_{1}+\alpha ^0_{2}Y \right) }{ \left\{ 1+\exp \left( \alpha ^0_{1}+\alpha ^0_{2}Y \right) \right\} }, \\ \Delta _1 \left( {\textbf{V}}_{1},Y,\varvec{\alpha }^1 \right)&=\frac{\exp \left( \alpha ^1_{1}+\varvec{\alpha }^1_{2}{\textbf{V}}_{1}+\alpha ^1_{3}Y \right) }{ \left\{ 1+\exp \left( \alpha ^1_{1}+\varvec{\alpha }^1_{2}{\textbf{V}}_{1}+\alpha ^1_{3}Y \right) \right\} }, \\ \Delta _{2} \left( {\textbf{V}}_{2},Y,\varvec{\alpha }^2\right)&=\frac{\exp \left( \alpha ^2_{1}+\varvec{\alpha }^2_{2}{\textbf{V}}_{2}+\alpha ^2_{3}Y \right) }{ \left\{ 1+\exp \left( \alpha ^2_{1}+\varvec{\alpha }^2_{2}{\textbf{V}}_{2}+\alpha ^2_{3}Y \right) \right\} }, \end{aligned} \end{aligned}$$

where \({\textbf{V}}_0=\emptyset , {\textbf{V}}_1=\{W_2\},{\textbf{V}}_2=\{W_1\}\). Note that criterion (14) enables the simultaneous selection of both the propensity model and the nonresponse instrument. By replacing \(\exp (\vartheta )/ [1+\exp (\vartheta )]\) with an alternative link function, we can derive three additional candidate models; selection among these six candidates can then be carried out according to criterion (14).

4.2 Monte Carlo studies

In this section, we conduct Monte Carlo simulations to study the finite-sample performance of the proposed estimation. Simulated data are generated from the model:

$$\begin{aligned} Y_{i}=\beta _1 X_{i1} +\beta _2 X_{i2}+{e_i}(\tau ), \quad i=1,2,\ldots ,n, \end{aligned}$$

where \(X_{i1}\sim \textrm{Uniform}(-3,3)\), \(X_{i2}\sim {\mathcal {N}}(0,2^2)\), \(\beta _{1}=1\), \(\beta _{2}=2\), \({e_i}(\tau )={e_i}-F_{{e_i}}^{-1}(\tau )\) and \(F_{{e_i}}(\cdot )\) is the distribution function of \({e_i}\). We consider three different distributions for \({e_i}\):

  1. (1)

    Normal distribution (E1): \({\mathcal {N}}(0,2^2)\);

  2. (2)

    Heteroscedastic normal distribution (E2): \({\mathcal {N}}(0,(1+{|X_{i2} |})^2)\);

  3. (3)

    t-distribution with 3 degrees of freedom (E3): t(3).

Note that E2 is a heteroscedastic error and E3 is a heavy-tailed error. Furthermore, the measurement error model is \({\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i}\), where the \({\textbf{U}}_{i}\) are generated from \( {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) with

$$\begin{aligned} \varvec{\Sigma }=\left( \begin{array}{cc} 0.5^2 &{} 0 \\ 0 &{} 0 \end{array}\right) . \end{aligned}$$

We generate \(\delta _{i}\) from the Bernoulli distribution according to the following probability

$$\begin{aligned} \Pr (\delta _i=1\mid {\textbf{V}}_i,Y_i)=\frac{\exp \left( 1.2-0.3 W_{i1}-0.3 Y_i\right) }{1+\exp \left( 1.2-0.3 W_{i1}-0.3 Y_i\right) }. \end{aligned}$$
(15)

The coefficients are chosen such that the missing rate is between \(25\%\) and \(40\%\). Then, we choose \(\eta ({\textbf{W}}) = (1, {\textbf{V}}^\top ,{\textbf{Z}}^\top )^\top \), which is consistent with Wang et al. (2014) and Wang et al. (2021). More specifically, according to Eq. (15), \({\textbf{V}}_i=W_{i1}\) and \({\textbf{Z}}_i=W_{i2}\); therefore, in this example, \(\eta ({\textbf{W}}_i) = (1, W_{i1}, W_{i2})^\top \).
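For reproducibility, the design of this subsection can be simulated as follows (a sketch for the E1 error at \(\tau = 0.5\); the seed and sample size are illustrative):

```python
import numpy as np

def generate_data(n, rng):
    """Simulate (W1, W2, Y, delta): the model of Sect. 4.2 with E1 normal errors,
    normal measurement error on X1 only, and missingness mechanism (15)."""
    X1 = rng.uniform(-3, 3, size=n)
    X2 = rng.normal(0.0, 2.0, size=n)
    Y = 1.0 * X1 + 2.0 * X2 + rng.normal(0.0, 2.0, size=n)  # at tau = 0.5, e(tau) = e
    W1 = X1 + rng.normal(0.0, 0.5, size=n)   # only X1 is measured with error
    W2 = X2
    prob = 1.0 / (1.0 + np.exp(-(1.2 - 0.3 * W1 - 0.3 * Y)))  # propensity (15)
    delta = rng.binomial(1, prob)
    return W1, W2, Y, delta
```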

Table 1 Number of times PVC criterion selects each instrument in 100 simulations
Table 2 Bias and RMSE of four estimates with \(n=300\) and \(\eta ({\textbf{W}}) = (1, W_{i1},W_{i2})^\top \)
Table 3 Bias and RMSE of four estimates with \(n=500\) and \(\eta ({\textbf{W}}) = (1, W_{i1},W_{i2})^\top \)

First, we conduct simulations with sample sizes \(n = 300\), 500 and 800 to assess the PVC outlined in Sect. 4.1. Table 1 reports the number of times each candidate model is selected by the PVC in 100 Monte Carlo replications. According to Table 1, the PVC selects the correct propensity \(\Delta _{2}({\textbf{V}}_{2},Y,\varvec{\alpha }^2)\) with higher empirical probability than the other candidates. Remarkably, the probability of selecting \(\Delta _{2}({\textbf{V}}_{2},Y,\varvec{\alpha }^2)\) almost reaches 1 when the sample size grows to 800.

Furthermore, to evaluate the estimation efficiency, we conduct simulation studies of the following four estimators:

  1. (1)

    N: The naive estimator that ignores both the measurement errors and missingness is defined as follows

    $$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\delta _i\rho _\tau \left( Y_i-{\textbf{W}}_i^\top \varvec{\beta } \right) . \end{aligned}$$
  2. (2)

    D: The estimator that only considers the missingness is obtained by

    $$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\frac{\delta _i}{\Delta \left( {\textbf{V}}_{i},Y_i, \hat{\varvec{\alpha }}\right) } \rho _\tau \left( Y_i-{\textbf{W}}_i^\top \varvec{\beta } \right) . \end{aligned}$$
  3. (3)

    M: The estimator that only considers the measurement errors is defined as

    $$\begin{aligned} \underset{\varvec{\beta }}{{\text {argmin}}}\sum _{i=1}^n\rho _{{\mathcal {N}}}(Y_i,{\textbf{W}}_i,\varvec{\beta },h). \end{aligned}$$
  4. (4)

    DM: The proposed estimator, which considers the measurement errors and missingness simultaneously.

Table 4 Bias and RMSE of the proposed estimator with two different \(\eta ({\textbf{W}})\)

All results are based on 200 simulation replications and the sample sizes \(n = 300\) and 500. The biases (Bias) and the root mean square errors (RMSE) are utilized to assess the performance of the aforementioned estimators. Bias and RMSE are defined as follows

$$\begin{aligned} \text {Bias}({\hat{\beta }}_j)\!=\! \frac{1}{200}\sum _{a\!=\!1}^{200} \left( {\hat{\beta }}_j^{(a)}\!-\!\beta _{0j} \right) ,\text {RMSE}\left( {\hat{\beta }}_j \right) \!=\! \sqrt{\frac{1}{200}\sum _{a\!=\!1}^{200} \left( {\hat{\beta }}_j^{(a)}\!-\!\beta _{0j} \right) ^2}, j=1,2. \end{aligned}$$
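Computing these summaries from the stacked replicate estimates is immediate (a small helper sketch; `estimates` has one row per replication):

```python
import numpy as np

def bias_rmse(estimates, beta0):
    """Monte Carlo bias and RMSE per coefficient; estimates has shape (R, p)."""
    err = np.asarray(estimates, dtype=float) - np.asarray(beta0, dtype=float)
    return err.mean(axis=0), np.sqrt((err**2).mean(axis=0))
```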

Simulation results are presented in Tables 2 and 3. Figure 1 presents the boxplots of \({\hat{\beta }}_j-\beta _{0j}\) \((j=1,2)\) at \((\tau ,n)=(0.5,300)\) for all four methods. A few conclusions can be drawn as follows:

  1. (1)

    The proposed estimator has negligible biases in all cases, which also demonstrates that it is less sensitive to the distribution of the error term \(e_i\). These results are consistent with our theory. As expected, the naive estimator is biased due to the presence of measurement errors and nonignorable missingness.

  2. (2)

    From Fig. 1, it can be seen that the variance of the proposed estimator is larger than that of the naive estimator. However, as the sample size n increases, the RMSE of the proposed estimator tends to be consistently lower. This indicates that, despite the increased variance associated with the proposed method, the benefit from bias correction effectively offsets it, leading to an overall improvement in estimation accuracy.

Finally, we investigate the robustness of the proposed estimator to the choice of \(\eta ({\textbf{W}})\). Specifically, we consider \(\eta _1({\textbf{W}}) = (W_{i1}, W_{i2}, W_{i2}^2)^\top \) and \(\eta _2({\textbf{W}}) = (1, W_{i1}, W_{i2}, W_{i2}^2)^\top \); the simulation results with \(n = 500\) are reported in Table 4. The empirical results show that the proposed estimator is robust to the choice of \(\eta ({\textbf{W}})\).

Fig. 1

Boxplots of \({\hat{\beta }}_{1} -\beta _{01}\) (left) and \({\hat{\beta }}_2-\beta _{02}\) (right) for different error distributions at \((\tau ,n)=(0.5,300)\)

Simulation studies under Laplace measurement error are presented as Simulation I in Appendix B.1; the results yield conclusions that align with those in the example above. Simulation II in Appendix B.2 shows that the proposed estimators perform well even when the measurement error distribution is misspecified.

4.3 Real data example: Boston housing data

As an illustration, the proposed methodology is now applied to the Boston housing data, which are available in the MASS package in R. The data contain 506 observations on fourteen variables. Many studies have used these data and found potential relationships between MEDV and PTRATIO, RM, TAX and LSTAT; see Yu and Lu (2004) and Jiang et al. (2016). In this paper, we also focus on the following five variables:

  1. MEDV:

    Median value of owner-occupied homes in $1000;

  2. PTRATIO:

    Pupil-teacher ratio by town;

  3. RM:

    Average number of rooms per dwelling;

  4. TAX:

    Full-value property-tax rate per $10,000;

  5. LSTAT:

    Percentage of the population of lower socioeconomic status.

We follow previous studies by log-transforming TAX and LSTAT. For simplicity of notation, the variables MEDV, PTRATIO, RM, \(\log (\textrm{TAX})\) and \(\log (\textrm{LSTAT})\) are denoted, respectively, by \(Y_i\), \(X_{i1}\), \(X_{i2}\), \(X_{i3}\) and \(X_{i4}\). The model

$$\begin{aligned} {Y_i=\beta _1X_{i1}+\beta _2X_{i2}+\beta _3X_{i3}+\beta _4X_{i4}+e_i}, \end{aligned}$$

is used to fit the data at quantile level \(\tau = 0.5\). To better illustrate our proposed method, we assume that \(X_{i1}\) is subject to measurement error. The measurement error model is constructed as \({\textbf{W}}_{i}={\textbf{X}}_{i}+{\textbf{U}}_{i}\), where \({\textbf{X}}_{i}=(X_{i1},X_{i2},X_{i3},X_{i4})^\top \) and the \({\textbf{U}}_{i}\) are generated from \( {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) with

$$\begin{aligned} \varvec{\Sigma }=\left( \begin{array}{cccc} 0.5^2 &{} \quad 0 &{} \quad 0&{} \quad 0\\ 0 &{} \quad 0 &{} \quad 0&{} \quad 0\\ 0 &{}\quad 0&{}\quad 0&{}\quad 0\\ 0 &{}\quad 0&{}\quad 0&{}\quad 0 \end{array}\right) . \end{aligned}$$
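The construction of the contaminated covariates can be sketched as follows (a minimal illustration with placeholder covariate values; only the first component of \({\textbf{X}}_{i}\) is measured with error, with standard deviation 0.5):

```python
import numpy as np

rng = np.random.default_rng(0)

# Placeholder stand-in for the four covariates (PTRATIO, RM, log TAX, log LSTAT).
n = 506
X = rng.normal(size=(n, 4))

# Measurement error covariance: only the first covariate is error-prone.
Sigma = np.diag([0.5**2, 0.0, 0.0, 0.0])
U = rng.multivariate_normal(np.zeros(4), Sigma, size=n)
W = X + U  # observed surrogate covariates

# Only the first column differs from the true covariates.
assert np.allclose(W[:, 1:], X[:, 1:])
```

Because \(\varvec{\Sigma }\) is singular, the error affects only the first coordinate; the remaining covariates are observed exactly.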

Because our proposed method is robust to misspecification of the measurement error distribution, only the case \({\textbf{U}}_{i}\sim {\mathcal {N}}({\textbf{0}},\varvec{\Sigma })\) is studied in this example. To introduce missing data, binary response indicators \(\delta _i\sim {\text {Bernoulli}}(p_i)\) are generated prior to estimation. We consider three choices of \(p_{i}\) as follows:

  1. M1

    \(p_i=1/\{1+\exp (-1.5+0.9W_{i1}+0.9Y_i)\}\);

  2. M2

    \(p_i=1/\{1+\exp (-1.4+0.9W_{i1}+0.8\sin (Y_i))\}\);

  3. M3

    \(p_i=|\sin (-1+0.2W_{i1}^{-1}+0.1Y_i)|\).
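The three missingness mechanisms can be sketched as follows (a hedged illustration using placeholder values for \(W_{i1}\) and \(Y_i\); with real data the stated coefficients give a missing ratio near 20%, but the ratio printed here depends on the placeholder inputs):

```python
import numpy as np

rng = np.random.default_rng(1)

# Placeholder surrogate covariate and response, for illustration only.
n = 506
W1 = rng.normal(size=n)
Y = rng.normal(size=n)

def p_m1(w1, y):
    """M1: logistic propensity, linear in W1 and Y."""
    return 1.0 / (1.0 + np.exp(-1.5 + 0.9 * w1 + 0.9 * y))

def p_m2(w1, y):
    """M2: logistic propensity with a sin(Y) term."""
    return 1.0 / (1.0 + np.exp(-1.4 + 0.9 * w1 + 0.8 * np.sin(y)))

def p_m3(w1, y):
    """M3: non-logistic propensity, bounded in [0, 1] via |sin|."""
    return np.abs(np.sin(-1.0 + 0.2 / w1 + 0.1 * y))

for name, p in [("M1", p_m1), ("M2", p_m2), ("M3", p_m3)]:
    pi = p(W1, Y)
    delta = rng.binomial(1, pi)  # 1 = response observed, 0 = missing
    print(f"{name}: missing ratio = {1.0 - delta.mean():.2f}")
```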

The coefficients in M1, M2 and M3 are chosen so that the missing ratio is approximately 20%. To apply the proposed method, we use the working model

$$\begin{aligned} \Pr (\delta _i=1\mid {\textbf{V}}_i,Y_i)=1/\{1+\exp \left( \alpha _0+\alpha _1 W_{i1}+\alpha _2 Y_i\right) \}. \end{aligned}$$

Therefore, under M1 the working model is correctly specified, while under M2 and M3 it is misspecified. Table 5 summarizes the coefficient estimates obtained with the four methods under M1. The standard errors in parentheses are based on 200 bootstrap samples. The findings in Table 5 reveal that only RM positively influences housing prices, while PTRATIO, TAX, and LSTAT negatively impact them, consistent with the conclusions in Yu and Lu (2004) and Jiang et al. (2016).

Table 5 The estimates (with standard errors in parentheses) for Boston housing data

For comparison, we assess the performance of these estimators via out-of-sample prediction. Specifically, we fit the regression model with each of the four methods using 300 observations and then use the estimated coefficients to predict the remaining 206 observations. We compare the mean squared error (MSE) and mean absolute deviation (MAD) of the predictions, defined as

$$\begin{aligned} \textrm{MSE}=\frac{1}{206}\sum _{i=1}^{206} \left( Y_i-{\hat{Y}}_i \right) ^2,~\textrm{MAD}=\frac{1}{206}\sum _{i=1}^{206} \left|Y_i-{\hat{Y}}_i \right|. \end{aligned}$$
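These two criteria are direct to compute; a minimal sketch (function names are ours):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error of out-of-sample predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean((y - y_hat) ** 2)

def mad(y, y_hat):
    """Mean absolute deviation of out-of-sample predictions."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return np.mean(np.abs(y - y_hat))

# Toy check: perfect predictions give zero error.
assert mse([1.0, 2.0], [1.0, 2.0]) == 0.0
print(mad([1.0, 3.0], [2.0, 2.0]))  # 1.0
```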

The MSE and MAD for the four estimators under M1 are given in Table 6. The results show that our proposed method outperforms the remaining three methods. Additionally, Jiang et al. (2018) used a partially linear varying coefficient model to fit the Boston housing data via weighted composite quantile regression. It is noteworthy that they used the same evaluation criteria as we did, while the MSE and MAD of their method were 4.4039 and 0.9560, respectively. This again demonstrates that our method is effective in correcting the biases caused by measurement errors and missing data.

Finally, we computed the MSE and MAD of the proposed estimator under M2 and M3, obtaining (MSE, MAD) values of (0.653, 0.549) and (0.639, 0.543), respectively. Compared to the results under M1, using a misspecified propensity in the estimation process leads to poorer results. Consequently, in applications where the true propensity is unknown, it is advisable to use the PVC outlined in Sect. 4.1 to select a suitable propensity model, thereby conducting better statistical inference.

Table 6 The MSE and MAD for real data example

5 Conclusion and discussion

In this paper, a robust method has been proposed to deal simultaneously with nonignorable nonresponse and covariate measurement errors in the linear quantile regression model. We also established the asymptotic properties of the proposed estimators. Simulation studies and a real data analysis examine the finite-sample performance of the proposed approaches. Several extensions can be investigated in the future. To obtain more efficient estimates, the results of this paper can be generalized to composite quantile regression (Kai et al. 2011). In addition, penalized variable selection can be used to identify the significant predictors.