1 Introduction

Generalized varying coefficient partially linear models (GVCPLM) (Li and Liang 2008) are powerful extensions of generalized partially linear models (GPLM). These models offer additional flexibility compared to GPLM when modeling data with discrete response variables: they relax the model assumptions imposed on GPLM and allow interactions between covariates and unknown functions of other covariates, while keeping some components linear. GVCPLM are also useful generalizations of varying coefficient models (Hastie and Tibshirani 1993), which have been applied to parsimoniously describe data structures and uncover scientific features, and have been studied in the context of the quasi-likelihood principle. As is well known in the literature, several useful semiparametric models can be viewed as special cases of GVCPLM, including GPLM (Hunsberger 1994; Hunsberger et al. 2002; Lin and Carroll 2001; Severini and Staniswalis 1994), partially linear models (PLM; Härdle et al. 2000; Robinson 1988; Speckman 1988), semivarying-coefficient models (Fan and Huang 2005; Xia et al. 2004; Zhang et al. 2002) and varying coefficient models (Cai et al. 2000; Hastie and Tibshirani 1993).

Li and Liang (2008) studied variable selection for GVCPLM, using the SCAD penalty (Fan and Li 2001) to identify parametric components and the generalized likelihood ratio test (Fan et al. 2001) to select nonparametric components. Wang and Xia (2009) proposed a shrinkage method for selecting nonparametric components in varying coefficient models. Wang et al. (2011) developed estimation and variable selection procedures for generalized additive partially linear models, incorporating polynomial spline smoothing to estimate the nonparametric functions and SCAD-penalized quasi-likelihood estimators to select linear covariates. Li et al. (2011) considered variable selection for varying coefficient partially linear models when the numbers of parametric and nonparametric components both diverge at appropriate rates. Wei et al. (2011) further considered variable selection and estimation in the “large p, small n” setting using the group Lasso idea (Yuan and Lin 2006).

Measurement errors are often encountered in biomedical research. Simply ignoring the errors can cause bias in estimation and lead to a loss of power for accurately detecting the relationship among variables. Regression calibration and simulation extrapolation (SIMEX, Cook and Stefanski 1994) are two widely used methods for eliminating or reducing bias caused by measurement errors, but the corresponding estimators are consistent only in special cases such as linear or loglinear regression, and only approximately consistent in general. Alternative methods remedy these consistency concerns by deriving unbiased score functions in the presence of measurement error, for example, the conditional score method (Stefanski and Carroll 1987) and the corrected score method (Stefanski 1989). Both are essentially M-estimation methods, so the usual numerical methods and asymptotic theory for M-estimators apply. Like other methods, however, these two have their own limitations. In particular, conditional scores can be derived only under specific assumptions on the model for the response given the covariates and on the error model for the surrogates, and some conditional score methods may require integration, while corrected scores also impose sufficient assumptions on the error model to ensure unbiased estimation of the true-data estimating function. See Carroll et al. (2006) for more detailed discussions on a variety of estimation and inference methods for nonlinear measurement error models. Ma and Carroll (2006), Ma and Tsiatis (2006) and Tsiatis and Ma (2004) investigated estimation efficiency for semiparametric models with measurement errors. Hall and Ma (2007) studied semiparametric estimators of functional measurement error models. Yi et al. (2012) considered marginal analysis of longitudinal data when errors-in-variables and missing responses occur simultaneously.

Efforts have been made to address various scientific questions using semiparametric models in the presence of measurement errors. For example, Sinha et al. (2010) proposed a semiparametric Bayesian method for handling the measurement errors that commonly appear in nutritional epidemiological studies. Carroll and Wang (2008) studied the effects of measurement errors on microarray data analysis, noticed that a direct application of the simulation extrapolation method leads to inconsistent estimators, and proposed a permutation SIMEX method which yields theoretically consistent estimators. In environmental research, environmental factors are generally measured with error. Lobach et al. (2008, 2010) developed a genotype-based approach for association analysis of case–control studies of gene–environment interactions, using a pseudo-likelihood principle to reduce the bias caused by measurement errors.

Recently, variable selection in semiparametric regressions with measurement errors has been considered. Liang and Li (2009) developed two variable selection procedures, penalized least squares and penalized quantile regression, for PLM with measurement errors. Ma and Li (2010) proposed a penalized estimating equation with the SCAD penalty for a class of parametric and semiparametric measurement error models. As observed in Liang and Li (2009), if measurement errors are ignored, variable selection procedures may falsely select variables and result in a biased final model.

In this article, we study estimation and variable selection for GVCPLM when the covariates are error prone. We consider three problems: first, calibrating the error-prone covariates using ancillary information and nonparametric regression techniques; second, developing quasi-likelihood profile estimation procedures and showing that the corresponding estimators of the parameters of interest are asymptotically normal; third, proposing a penalized quasi-likelihood procedure for selecting significant parameters and a generalized likelihood ratio test for selecting nonzero nonparametric functions. Zhou and Liang (2009) studied the case where the link function is the identity, gave a variety of examples to illustrate the flexibility of the model, and developed a profile-based estimation procedure for the unknown parameters of interest.

It is worth noting that extending the profile estimation procedure of Zhou and Liang (2009) to GVCPLM is by no means trivial. For the identity link, the profile least-squares technique can be used and a closed form of the estimators is available. For GVCPLM with measurement errors, however, only a quasi-likelihood-based objective function is available, and whether the resulting estimators still have nice properties such as asymptotic normality is theoretically difficult to address. For GVCPLM, Li and Liang (2008) proposed a SCAD-type procedure for parametric component selection and theoretically showed its oracle properties under certain assumptions; whether such a procedure can be developed under a measurement error framework has not been investigated in the literature. In the absence of measurement errors, Fan et al. (2001) proposed a generalized quasi-likelihood ratio test (GLRT) to investigate whether the coefficient functions in GVCPLM are constant. In this paper, we investigate the Wilks phenomenon in the error-prone semiparametric setting, and we further propose a bootstrap procedure to estimate the null distribution of the GLRT. To the best of our knowledge, this Wilks phenomenon under error-prone covariates is new, and the findings contribute to the literature on semiparametric modeling.

The remainder of the paper is organized as follows. In Sect. 2, we propose the quasi-likelihood procedure for estimation of the parametric components, and then develop a penalized quasi-likelihood for variable selection. Sampling properties of the proposed procedures are investigated. In Sect. 3, estimation and variable selection procedures are proposed for the nonparametric components, and the null distribution of the GLRT is established. Simulation results and a real data analysis are presented in Sect. 4. Regularity conditions and technical proofs are given in the Appendix.

2 Estimation and variable selection for parametric components

Let \(X=(X_{1}, \ldots , X_{p})^{T}\in \mathbb {R}^{p}\), \(\xi =(\xi _{1}, \ldots , \xi _{d})^{T} \in \mathbb {R}^{d}\), \(W=(W_{1}, \ldots , W_{r})^{T} \in \mathbb {R}^{r}\), \(U\in \mathbb {R}\) be the covariates and Y be the response variable. The GVCPLM is of the form:

$$\begin{aligned} g\{\mu (U,\xi , W, X)\}=\beta ^{T}\xi +\theta ^{T}W+\alpha (U)^{T}X, \end{aligned}$$
(1)

where \(g(\cdot )\) is a known link function, \(\beta \) and \(\theta \) are vectors of unknown regression coefficients and \(\alpha (\cdot )\) is a vector of unknown smooth nonparametric functions of U. The response Y is related to the covariates (U, \(\xi \), W, X) through the unknown mean function \(\mu (U, \xi , W, X)=E(Y|U, \xi , W, X)\), and the conditional variance is determined by a known positive function \(T(\cdot )\), i.e., \(\mathrm{Var}(Y|U, \xi , W, X)=\sigma ^{2}T\{\mu (U, \xi , W, X)\}\). The components \(\xi \) are not observed directly, but ancillary variables \((\eta , V)\) are available to recover \(\xi \); the surrogate \(\eta \) is related to V via

$$\begin{aligned} \eta =\xi (V)+e, \end{aligned}$$
(2)

where e is a measurement error, independent of (X, W, V, U, Y), with a positive finite covariance matrix \(\Sigma _{e}=E(e e^{T})\). We term (1) and (2) generalized varying coefficient partially linear measurement error models (GVCPLMeM).

Let \(\big \{(Y_{i}, U_{i}, \eta _{i}, V_{i}, W_{i}, X_{i})\big \}_{i=1}^{n}\) be an i.i.d. sample from \((Y, U, \eta , V, W, X)\). Since the covariates \(\xi \) are measured with error, we first calibrate \(\xi \) using the observed ancillary sample \(\big \{(\eta _{i}, V_{i})\big \}_{i=1}^{n}\).

2.1 Covariate calibration

We introduce the calibration estimation procedure for \(\xi \) in this section. For notational simplicity, we assume V is univariate throughout this paper. Let \(\eta _{ik}\) be the kth entry of the vector \(\eta _{i}\) for \(i=1, \ldots , n\). To estimate \(\xi _{k}(v)\), the kth component of \(\xi (v)\), we employ the local linear smoothing technique (Fan and Gijbels 1996); that is, we minimize

$$\begin{aligned} \sum _{i=1}^{n}\big \{\eta _{ik}-c_{0k}-c_{1k}(V_{i}-v)\big \}^{2}L_{b_{k}}(V_{i}-v) \end{aligned}$$
(3)

with respect to \(c_{0k}, c_{1k}\), where \(L_{b}(\cdot )=L(\cdot /b)/b\) with \(L(\cdot )\) a kernel function and \(b=b_{k}\) (\(k=1, \ldots , d\)) a bandwidth. Let \(\hat{c}_{0k}, \hat{c}_{1k}\) be the minimizers of (3). Write

$$\begin{aligned} \hat{\xi }_{k}(v)=\hat{c}_{0k}=\frac{D_{20,k}(v)D_{01,k}(v)-D_{10,k}(v)D_{11,k}(v)}{D_{00,k}(v)D_{20,k}(v)-D_{10,k}^{2}(v)}, \end{aligned}$$
(4)

where \(D_{s_{1}s_{2},k}(v)=\sum _{i=1}^{n}L_{b_{k}}(V_{i}-v)(V_{i}-v)^{s_{1}} \eta _{ik}^{s_{2}}\) for \(s_{1}=0,1,2\), \(s_{2}=0,1\), \(k=1, \ldots , d\).
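For a computational view of the calibration step, the closed form (4) can be implemented directly. The following is a minimal Python sketch under our own naming (the function name and the default Epanechnikov kernel are illustrative, not part of the original procedure):

```python
import numpy as np

def local_linear_calibrate(eta_k, V, v, b,
                           L=lambda t: 0.75 * (1.0 - t**2) * (np.abs(t) <= 1)):
    """Local linear estimate hat{xi}_k(v) in (4) from the surrogate sample
    (eta_k, V); L is the kernel and b the bandwidth b_k."""
    w = L((V - v) / b) / b                  # L_{b_k}(V_i - v)
    d = V - v
    D00, D10, D20 = w.sum(), (w * d).sum(), (w * d**2).sum()
    D01, D11 = (w * eta_k).sum(), (w * d * eta_k).sum()
    return (D20 * D01 - D10 * D11) / (D00 * D20 - D10**2)
```

Evaluating this at each observed \(V_{i}\) gives the calibrated covariates \(\hat{\xi }_{i}\) used below.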

We now list the regularity conditions needed for the proposition and theorems that follow.

  1. (C1)

    \(q_{2}(x, y)<0\) for \(x\in \mathbb {R}\) and y in the range of the response variable.

  2. (C2)

    The functions \(T''(\cdot )\) and \(g'''(\cdot )\) are continuous.

  3. (C3)

    The random variable U has bounded support \(\mathcal {U}\). The elements of the function \(\alpha ''(u)\) are continuous in \(u\in \mathcal {U}\).

  4. (C4)

    The density functions \(f_{U}(u)\), \(f_{V}(v)\) of U, V are Lipschitz continuous and bounded away from 0 and infinity on their supports, respectively. Moreover, the joint density function \(f_{U, V}(u, v)\) of (U, V) is continuous on the support \(\mathcal {U}\times \mathcal {V}\).

  5. (C5)

    Let \(Z=\beta ^{T}\xi +\theta ^{T}W+\alpha (U)^{T}X\). Then \(E\big [q_{l}^{s}(Z,Y)N^{\otimes 2}\big |U=u\big ]\), \(E\big [q_{l}^{s}(Z, Y)N^{\otimes 2}\big |V=v\big ]\) and \(E\big [q_{l}^{s}(Z, Y)N^{\otimes 2}\big |U=u, V=v\big ]\) for \(l=1,2\), \(s=1,2\) are Lipschitz continuous and twice differentiable in \(u\in \mathcal {U}\) and \(v\in \mathcal {V}\). Moreover, \(E\{q_{2}^{2}(Z, Y)\}< \infty \), \(E\{q_{1}^{2+\delta }(Z, Y)\}< \infty \) for some \(\delta >2\), and \(E\big [\rho _{2}(Z)N^{\otimes 2}\big |U=u\big ]\) is nonsingular for each \(u\in \mathcal {U}\).

  6. (C6)

    The kernel functions \(K(\cdot )\) and \(L(\cdot )\) are bounded, continuous and symmetric univariate density functions satisfying \(\int t^{2}K(t)\mathrm{d}t \not = 0 \), \(\int t^{2}L(t)\mathrm{d}t\not =0\), and \(\int |t|^{j}K(t)\mathrm{d}t <\infty \), \(\int |t|^{j}L(t)\mathrm{d}t <\infty \) for \(j=1,2,3,4\). Moreover, the second derivatives of \(K(\cdot )\) and \(L(\cdot )\) are bounded on \(\mathbb {R}\).

  7. (C7)

    The bandwidths h and b satisfy:

    1. (i)

      \(b=b_{k}\), \(k=1,\ldots , d\), \(b_{k}\asymp c_{b}h_{o}\) for some constant \(c_{b}>0\); \(h\asymp c_{h}h_{o}\) for some constant \(c_{h}>0\).

    2. (ii)

      \(h_{o}\rightarrow 0\) as \(n\rightarrow \infty \), \(nh_{o}^{2}/(\log h_{o}^{-1})^{4} \rightarrow \infty \), \(nh_{o}^{4}\rightarrow 0\).

  8. (C8)

    For all \(\lambda _{1j}\), \(\lambda _{2s}\), \(j=1, \ldots , d\), \(s=1, \ldots , r\), \(\lambda _{1j}\rightarrow 0\), \(\sqrt{n}\lambda _{1j}\rightarrow \infty \), \(\lambda _{2s}\rightarrow 0\), \(\sqrt{n}\lambda _{2s}\rightarrow \infty \), and \( \displaystyle \lim \inf _{n\rightarrow \infty } \lim \inf _{u\rightarrow 0^{+}}p'_{\lambda _{1j}}(u)/\lambda _{1j}>0\), \( \lim \inf _{n\rightarrow \infty } \lim \inf _{u\rightarrow 0^{+}}p'_{\lambda _{2s}}(u)/\lambda _{2s}>0. \)

Condition (C1) ensures that the local likelihood is concave and the solution unique. Conditions (C2) and (C3) are usual smoothness conditions (Li and Liang 2008). Condition (C4) is a technical condition commonly imposed in conventional nonparametric regression analysis. Condition (C5) is needed for Taylor expansions and ensures a finite asymptotic variance. Condition (C6) is commonly imposed for nonparametric kernel smoothing. Condition (C7) is generally required for the bandwidths h and \(b_{k}\) in the semiparametric setting. Condition (C8) is a technical condition involved in the SCAD variable selection procedure (Fan and Li 2001; Liang and Li 2009).

Proposition 1

Under the conditions (C4), (C6) and (C7), we have

$$\begin{aligned}&\hat{\xi }_{k}(v)-\xi _{k}(v)\nonumber \\&\quad =\frac{\mu _{L2}}{2}b_{k}^{2}\xi _{k}^{(2)}(v)+\frac{1}{nf_{V}(v)}\sum _{i=1}^{n}L_{b_{k}}(V_{i}-v)e_{ki}+o\left( b_{k}^{2} +\log b_{k}^{-1}/\sqrt{nb_{k}}\right) \nonumber \\ \end{aligned}$$
(5)

holds uniformly on \(v \in \mathcal {V}\), where \(\mu _{Lj}=\int u^{j}L(u)\mathrm{d}u\), \(\xi _{k}^{(2)}(v)\) is the second derivative of \(\xi _{k}(v)\), and \(e_{ki}\) is the kth component of \(e_{i}\), \(i=1, \ldots , n\).

The proof of (5) can be completed in a way similar to Zhou and Liang (2009).

2.2 Quasi-likelihood-based estimation

After calibrating \(\xi \), we fit the “synthetic” data \(\{Y_{i}, U_{i}, \hat{\xi }_{i}, W_{i}, X_{i}; 1\le i \le n\}\) using the local likelihood principle (Fan and Gijbels 1996) to estimate \(\beta , \theta , \alpha (\cdot )\) based on the model:

$$\begin{aligned} g\big \{\mu \big (U, \hat{\xi }, W, X\big )\big \}= \beta ^{T}\hat{\xi }+\theta ^{T}W+\alpha (U)^{T}X. \end{aligned}$$
(6)

Specifically, let h be the bandwidth, \(K(\cdot ) \) be a kernel function satisfying Condition (C6), and \(K_{h}(\cdot )=h^{-1}K(\cdot /h)\). For each u in a neighborhood of U, we approximate \(\alpha _{j}(U)\) by \(\alpha _{0j}(u)+\alpha _{0j}'(u)(U-u)\), \(j=1, \ldots , p\). Let \(\alpha (u)=(\alpha _{01}(u),\ldots , \alpha _{0p}(u) )^{T}\), \(b(u)=(\alpha _{01}'(u), \ldots , \alpha _{0p}'(u))^{T}\). The estimators of \(\beta \), \(\theta \), the \(\alpha _{j}(u)\)’s and the \(\alpha _{j}'(u)\)’s are obtained by maximizing the following local quasi-likelihood function with respect to \(\alpha (u)\), b(u), \(\beta \), \(\theta \):

$$\begin{aligned}&\mathcal {L}_{loc}\big (\alpha (u),b(u), \beta , \theta \big )\nonumber \\&\quad =\sum _{i=1}^{n}\mathcal {Q}\left[ g^{-1}\big (\beta ^{T}\hat{\xi }_{i}+\theta ^{T}W_{i} +\alpha (u)^{T}X_{i}+b(u)^{T}X_{i}(U_{i}-u)\big ), Y_{i} \right] K_{h}(U_{i}-u),\nonumber \\ \end{aligned}$$
(7)

where \(\mathcal {Q}(x, y)\) is the quasi-likelihood function defined as \(\mathcal {Q}(x, y)=\int _{y}^{x}\frac{y-u}{T(u)}\hbox {d}u\). Denote the local quasi-likelihood estimators from (7) by \(\hat{\alpha }_{*}(u), \hat{b}_{*}(u), \hat{\beta }_{*}, \hat{\theta }_{*}\). As demonstrated in Lemma A.2 in the Appendix, these estimators are all \(\sqrt{nh}\)-consistent (or \(\sqrt{nh_{o}}\)-consistent, under Condition (C7)).
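As a concrete instance of this quasi-likelihood: for a Bernoulli response with \(T(u)=u(1-u)\), a direct calculation gives, up to an additive term free of x,

$$\begin{aligned} \mathcal {Q}(x, y)=y\log \frac{x}{1-x}+\log (1-x), \end{aligned}$$

which is the Bernoulli log-likelihood, so (7) reduces to a local likelihood for the logistic model used in Sect. 4.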

We now update the estimates of \(\beta \) and \(\theta \) using all the data, via a global quasi-likelihood procedure, to improve efficiency. Define

$$\begin{aligned} \mathcal {L}_{gol}\big (\beta , \theta \big ) =\sum _{i=1}^{n}\mathcal {Q}\left[ g^{-1}\big (\beta ^{T}\hat{\xi }_{i}+\theta ^{T}W_{i} +\hat{\alpha }_{*}(U_{i})^{T}X_{i}\big ), Y_{i} \right] , \end{aligned}$$
(8)

where \(\hat{\alpha }_{*}(u)\) is obtained from (7). We then obtain the global quasi-likelihood estimators \(\hat{\beta }\) and \(\hat{\theta }\) by maximizing \(\mathcal {L}_{gol}(\beta , \theta )\). The corresponding estimators have the same merit as one-step backfitting estimates. One may also consider a fully iterative backfitting algorithm or a profile likelihood approach to obtain estimators of \(\beta \), \(\theta \).
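To make the two-step procedure concrete, the following sketch maximizes (8) for the Bernoulli/logit case used in Sect. 4; it treats the calibrated \(\hat{\xi }_{i}\) and the local-step estimates \(\hat{\alpha }_{*}(U_{i})\) as already computed, and all names are illustrative:

```python
import numpy as np
from scipy.optimize import minimize

def neg_global_qlik(gamma, xi_hat, W, X, alpha_hat_U, Y):
    """Negative global quasi-likelihood (8) for a Bernoulli response with
    logit link; alpha_hat_U[i] holds hat{alpha}_*(U_i)."""
    d = xi_hat.shape[1]
    beta, theta = gamma[:d], gamma[d:]
    lin = xi_hat @ beta + W @ theta + np.sum(alpha_hat_U * X, axis=1)
    # Bernoulli quasi-likelihood: y * lin - log(1 + exp(lin)), stably computed
    return -np.sum(Y * lin - np.logaddexp(0.0, lin))

# gamma_hat = minimize(neg_global_qlik, gamma0,
#                      args=(xi_hat, W, X, alpha_hat_U, Y), method="BFGS").x
```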

In the following, we introduce some notation for presenting the properties of the estimators. Denote \(\mathrm{A}^{\otimes 2}= \mathrm{A}\mathrm{A}^{T}\) for any matrix or vector \(\mathrm{A}\). Let \(q_{\ell }(x,y)=\frac{{{\partial }}^{\ell } }{{\partial } x^{\ell }}\mathcal {Q}\{g^{-1}(x), y\}\) for \(\ell =1, 2\). Then \( q_{1}(x,y)=\{y-g^{-1}(x)\}\rho _{1}(x), q_{2}(x,y)=\{y-g^{-1}(x)\}\rho _{1}'(x)-\rho _{2}(x) \) with \(\rho _{\ell }(x)=\left\{ \frac{d g^{-1}(x)}{dx}\right\} ^{\ell }\big /\big [\sigma ^{2}T\{g^{-1}(x)\}\big ]\). Let \(Z=\beta ^{T}\xi +\theta ^{T}W+\alpha (U)^{T}X\), \(Q=(\xi ^{T}, W^{T})^{T}\), \(N=(\xi ^{T}, W^{T}, X^{T})^{T}\), \(\Sigma =E\left[ \rho _{2} (Z)Q^{\otimes 2}\right] \). Denote by \(\kappa _{k}(u)\) the kth element of \(E\big [\rho _{2}(Z)N^{\otimes 2}\big |U=u\big ]^{-1}{N}\), and by \(\iota _{k}(u,v)\) the kth element of \(E\big [\rho _{2} (Z)N^{\otimes 2}\big |U=u\big ]^{-1}E\big [\rho _{2} (Z)N\big |U=u, V=v\big ]\). Moreover,

$$\begin{aligned} \varGamma (u)= & {} \left\{ Q-\sum _{k=1}^{p}\kappa _{k}(u)E\big [\rho _{2} (Z)Q X_{k}\big |U=u\big ]\right\} q_{1}(Z, Y),\\ \varkappa _{k}(v)= & {} E\left[ Q X_{k}\rho _{2} (Z)\iota _{k}(U,v)\frac{f_{U,V}(U, v)}{f_{U}(U)f_{V}(v)}\right] , \\ \varLambda (v)= & {} \left\{ \sum _{k=1}^{p} \varkappa _{k}(v)-E\left[ \rho _{2} (Z)Q\big |V=v\right] \right\} e^{T}\beta . \end{aligned}$$

We have the following asymptotic results.

Theorem 1

Under Conditions (C1)–(C7) given in the Appendix, we have

$$\begin{aligned}&\sqrt{n}\big (\big (\hat{\beta }-\beta \big )^{T}, \big (\hat{\theta } -\theta \big )^{T} \big )^{T}\\&\quad \mathop {\longrightarrow }\limits ^{\mathcal {L}} N_{q}\left( 0, \Sigma ^{-1}E\left[ \varGamma (U)^{\otimes 2}\right] \Sigma ^{-1}+ \Sigma ^{-1}E\left[ \varLambda (V)^{\otimes 2}\right] \Sigma ^{-1}\right) . \end{aligned}$$

Remark 1

To ensure Theorem 1 holds, undersmoothing is necessary. This strategy concurs with that adopted in modeling GPLM (Severini and Staniswalis 1994). In the asymptotic variance, the first term \(\Sigma ^{-1}E\left[ \varGamma (U)^{\otimes 2}\right] \Sigma ^{-1}\) is similar to that obtained by Li and Liang (2008), while the second term \(\Sigma ^{-1}E\left[ \varLambda (V)^{\otimes 2}\right] \Sigma ^{-1}\) arises from the impact of measurement error and the bias correction based on the ancillary variable V.

Bandwidth selection The proposed procedure involves the bandwidths \(b_{k}\) and h, which need to be selected. As indicated in Zhou and Liang (2009), undersmoothing is necessary when we estimate \(\xi \), so the optimal bandwidth for \(b_{k}\) cannot be used: undersmoothing \(\xi \) keeps the bias small but precludes the optimal bandwidth for \(b_{k}\). As suggested by Carroll et al. (1997), an ad hoc but reasonable choice is \(O(n^{-1/5})\times n^{-2/15}=O(n^{-1/3})\), that is, \(b_{k}=C_{1}n^{-1/3}\) for a positive constant \(C_{1}\). One can use a plug-in rule to estimate the constant \(C_{1}\), e.g., \(b_{k}=\hat{\sigma }_{V}n^{-1/3}\). Alternatively, \(b_{k}\) can be chosen as \(b_{k}=n^{-2/15}\hat{b}_{k*}\), where \(\hat{b}_{k*}=\arg \min _{b_{*}}\mathrm{CV}_{k}(b_{*})\) with \(\mathrm{CV}_{k}(b_{*})=n^{-1}\sum _{i=1}^{n}\left\{ \eta _{ik}-\hat{\xi }^{(-i)}_{k,b_{*}} (V_{i})\right\} ^{2}\), and \(\hat{\xi }^{(-i)}_{k,b_{*}}(V_{i}) \) is computed analogously to (3) with bandwidth \(b_{*}\) from the data with the ith observation \((\eta _{i}, V_{i})\) deleted. To select h, we define the “leave-one-out” criterion \(h_{1}=\arg \min _{h_{*}}\sum _{i=1}^{n}\mathcal {Q}\left[ g^{-1} \big (\hat{\beta }_{-i}^{T}\hat{\xi }_{i}+\hat{\theta }_{-i}^{T}W_{i}+ \hat{\alpha }_{-i,h_{*}}(U_{i})^{T}X_{i}\big ), Y_{i} \right] \), where \(\hat{\beta }_{-i}\), \(\hat{\theta }_{-i}\) are obtained from (8), and \(\hat{\alpha }_{-i,h_{*}}(U_{i})\) is obtained from (7) with fixed bandwidth \(h_{*}\) and the leave-one-out sample \(\{Y_{j}, \hat{\xi }_{j}, W_{j}, X_{j}, U_{j}\}_{1\le j \not = i \le n}\).
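As an illustration of the two-stage choice of \(b_{k}\) (pilot cross-validation followed by the \(n^{-2/15}\) undersmoothing factor), here is a minimal sketch reusing the local_linear_calibrate helper sketched in Sect. 2.1; the names are ours:

```python
import numpy as np

def select_bandwidth_b(eta_k, V, grid):
    """Leave-one-out CV_k(b_*) over a candidate grid, then undersmooth:
    b_k = n^{-2/15} * argmin CV_k."""
    n = len(V)
    cv = []
    for b in grid:
        resid = [eta_k[i] - local_linear_calibrate(np.delete(eta_k, i),
                                                   np.delete(V, i), V[i], b)
                 for i in range(n)]              # hat{xi}_k^{(-i)}(V_i)
        cv.append(np.mean(np.square(resid)))
    return n ** (-2.0 / 15.0) * grid[int(np.argmin(cv))]
```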

2.3 Penalized quasi-likelihood-based variable selection

In this section, we consider the variable selection problem. We define the penalized quasi-likelihood as

$$\begin{aligned} \mathcal {L}_{P}\big (\beta , \theta \big )=\mathcal {L}_{gol}\big (\beta , \theta \big ) -n\sum _{j=1}^{d}p_{\lambda _{1j}}(|\beta _{j}|)-n\sum _{s=1}^{r}p_{\lambda _{2s}}(|\theta _{s}|), \end{aligned}$$
(9)

where \(p_{\lambda _{1j}}(\cdot )\), \(p_{\lambda _{2s}}(\cdot )\) are penalty functions, and \(\lambda _{1j}\) and \(\lambda _{2s}\) are tuning parameters. We choose distinct tuning parameters \(\lambda _{1}\)’s and \(\lambda _{2}\)’s for identifying the nonzero elements of \(\beta \) and \(\theta \). If we are only interested in selecting the W-variables, we set \(p_{\lambda _{1j}}(\cdot )=0\), \(j=1, \ldots , d\). Similarly, we can focus only on the \(\xi \)-variables.

We first briefly discuss the choice of penalty functions, of which there are many in the variable selection literature. An example is the \(L_{0}\)-penalty, \(p_{\lambda _{1j}}(|\beta _{j}|)=0.5 \lambda _{1j}^{2}I\{|\beta _{j}|\not =0\}\), where \(I\{\cdot \}\) is an indicator function. In particular, letting \(\lambda _{1j}=\sigma \sqrt{2/n}\), \(\sigma \sqrt{\log (n)/n}\) or \(\sigma \sqrt{\log (d)/n}\) yields the popular variable selection criteria AIC (Akaike 1973), BIC (Schwarz 1978) and RIC (Foster and George 1994), respectively. We adopt the SCAD penalty (Fan and Li 2001), whose first derivative is

$$\begin{aligned} p'_{\lambda }(\gamma )=\lambda \Big \{I(\gamma \le \lambda )+\frac{(a\lambda -\gamma )_{+}}{(a-1)\lambda }I(\gamma >\lambda )\Big \}, \end{aligned}$$

where \((s)_{+}=sI(s>0)\) denotes the positive part of s and \(a = 3.7\).
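The SCAD derivative is simple to code; a direct, vectorized transcription of the formula above (our own naming) is:

```python
import numpy as np

def scad_deriv(gamma, lam, a=3.7):
    """First derivative p'_lambda(gamma) of the SCAD penalty for gamma >= 0."""
    gamma = np.asarray(gamma, dtype=float)
    return lam * ((gamma <= lam)
                  + np.maximum(a * lam - gamma, 0.0) / ((a - 1.0) * lam)
                  * (gamma > lam))
```

Here lam may be a scalar or an array matched elementwise to gamma, which is convenient for the componentwise tuning parameters below.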

We next study the asymptotic properties of the resulting penalized quasi-likelihood estimates. Without loss of generality, assume that the first \(d_{1}\) components of \(\beta \) and the first \(r_{1}\) components of \(\theta \) are nonzero; that is, \(\beta _{s}\not =0\) for \(s=1,\ldots , d_{1}\), \(\theta _{l}\not =0\) for \(l=1, \ldots , r_{1}\), and \(\beta _{k}\equiv 0\) for \(k=d_{1}+1, \ldots , d\), \(\theta _{t}\equiv 0\) for \(t=r_{1}+1, \ldots , r\).

For notational simplicity, denote \(\mathcal {R}_{n,\lambda _{1}, \lambda _{2}}=(\mathcal {R}_{n,\lambda _{1}}^{T}, \mathcal {R}_{n,\lambda _{2}}^{T})^{T}\) with

$$\begin{aligned} \mathcal {R}_{n,\lambda _{1}}= & {} \{p'_{\lambda _{11}}(|\beta _{1}|)\mathrm{sign} (\beta _{1}), \ldots , p'_{\lambda _{1d_{1}}}(|\beta _{d_{1}}|)\mathrm{sign} (\beta _{d_{1}})\}^{T},\\ \mathcal {R}_{n,\lambda _{2}}= & {} \{ p'_{\lambda _{21}}(|\theta _{1}|)\mathrm{sign} (\theta _{1}), \ldots , p'_{\lambda _{2r_{1}}}(|\theta _{r_{1}}|)\mathrm{sign} (\theta _{r_{1}})\}^{T}, \end{aligned}$$

and we further define

$$\begin{aligned} a^{*}_{n}= & {} \max _{1\le j\le d}\{p'_{\lambda _{1j}}(|\beta _{j}|): \beta _{j}\not = 0\}, \quad b^{*}_{n}=\max _{1\le s\le r}\{ p'_{\lambda _{2s}}(|\theta _{s}|): \theta _{s}\not = 0\}, \\ a^{**}_{n}= & {} \max _{1\le j\le d}\{p''_{\lambda _{1j}}(|\beta _{j}|): \beta _{j}\not = 0\}, \quad b^{**}_{n}=\max _{1\le s\le r}\{p''_{\lambda _{2s}}(|\theta _{s}|): \theta _{s}\not = 0\}, \\ \Sigma _{n,\lambda _{1}, \lambda _{2}}= & {} \mathrm{diag}\{p''_{\lambda _{11}}(|\beta _{1}|), \ldots , p''_{\lambda _{1d_{1}}}(|\beta _{d_{1}}|), p''_{\lambda _{21}}(|\theta _{1}|), \ldots , p''_{\lambda _{2r_{1}}}(|\theta _{r_{1}}|)\}. \end{aligned}$$

Denote the resulting penalized estimators from (9) by \(\hat{\beta }_{\lambda _{1}}\), \(\hat{\theta }_{\lambda _{2}}\). We have the following asymptotic results.

Theorem 2

Under Conditions (C1)–(C8) given in the Appendix, and supposing further that \(a_{n}^{*}=O(n^{-1/2})\), \(b_{n}^{*}=O(n^{-1/2})\), \(a_{n}^{**}\rightarrow 0\) and \(b_{n}^{**}\rightarrow 0\), there exist local maximizers \(\hat{\beta }_{\lambda _{1}}\), \(\hat{\theta }_{\lambda _{2}}\) of (9) such that \(\hat{\beta }_{\lambda _{1}}=\beta +O_{P}(n^{-1/2})\) and \(\hat{\theta }_{\lambda _{2}}=\theta +O_{P}(n^{-1/2})\).

We further introduce notation for presenting the oracle properties of the resulting penalized likelihood estimates. Without loss of generality, write \(\beta =(\beta _{(1)}^{T}, \beta _{(2)}^{T})^{T}\), \(\theta =(\theta _{(1)}^{T}, \theta _{(2)}^{T})^{T}\), where \(\beta _{(1)}\) and \(\theta _{(1)}\) are the \(d_{1}\) and \(r_{1}\) nonzero components of \(\beta \) and \(\theta \), respectively, and \(\beta _{(2)}\) and \(\theta _{(2)}\) are \((d-d_{1})\times 1\) and \((r-r_{1})\times 1\) zero vectors. Accordingly, \(\xi _{(1)}\) and \(W_{(1)}\) are the first \(d_{1}\) covariates of \(\xi \) and the first \(r_{1}\) covariates of W. Let \(Z_{(1)}=\beta _{(1)}^{T}\xi _{(1)}+\theta _{(1)}^{T}W_{(1)}+\alpha (U)^{T}X\), \(Q_{(1)}=(\xi _{(1)}^{T}, W_{(1)}^{T})^{T}\), \(N_{(1)}=(\xi _{(1)}^{T}, W_{(1)}^{T}, X^{T})^{T}\), and let \(e_{(1)}\) be the first \(d_{1}\) components of the error e. Moreover, \(\Sigma _{(1)}\), \(\varGamma _{(1)}(u)\) and \(\varLambda _{(1)}(v)\), and the terms involved in their definitions, are obtained from \(\Sigma , \varGamma (u), \varLambda (v)\) by substituting \(\beta , Z, Q, N, e\) with \(\beta _{(1)}, Z_{(1)}, Q_{(1)}, N_{(1)}, e_{(1)}\), respectively.

Theorem 3

Under Conditions (C1)–(C8), the penalized estimators \(\hat{\beta }_{\lambda _{1}}=\big (\hat{\beta }_{\lambda _{1}(1)}^{T}, \hat{\beta }_{\lambda _{1}(2)}^{T}\big )^{T}\) and \(\hat{\theta }_{\lambda _{2}}= \big (\hat{\theta }_{\lambda _{2}(1)}^{T}, \hat{\theta }_{\lambda _{2}(2)}^{T}\big )^{T}\) satisfy: (a) with probability tending to one, \( \hat{\beta }_{\lambda _{1}(2)}=\mathbf{0}\), \( \hat{\theta }_{\lambda _{2}(2)}=\mathbf{0}\); and (b) \(\hat{\beta }_{\lambda _{1}(1)}\) and \(\hat{\theta }_{\lambda _{2}(1)}\) are asymptotically normal, i.e.,

$$\begin{aligned}&\sqrt{n}\big (\Sigma _{(1)}+\Sigma _{n,\lambda _{1}, \lambda _{2}}\big )\Big \{\big (\big (\hat{\beta }_{\lambda _{1}(1)}-\beta _{(1)}\big )^{T}, \big (\hat{\theta }_{\lambda _{2}(1)}-\theta _{(1)}\big )^{T}\big )^{T}\\&\quad +\big (\Sigma _{(1)}+\Sigma _{n,\lambda _{1}, \lambda _{2}}\big )^{-1}\mathcal {R}_{n, \lambda _{1}, \lambda _{2}}\Big \}\\&\quad \mathop {\longrightarrow }\limits ^{\mathcal {L}} N_{d_{1}+r_{1}}\left( \mathbf{0}, \Sigma _{(1)}^{-1}E\left[ \varGamma _{(1)}(U)^{\otimes 2}\right] \Sigma _{(1)}^{-1}+ \Sigma _{(1)}^{-1}E\left[ \varLambda _{(1)}(V)^{\otimes 2}\right] \Sigma _{(1)}^{-1}\right) . \end{aligned}$$

Remark 2

Theorem 3 indicates that the proposed variable selection procedure possesses the oracle property with proper choices of the tuning parameters \(\lambda _{1j}\)’s, \(\lambda _{2s}\)’s. If we further require that \(\sqrt{n}\mathcal {R}_{n, \lambda _{1}, \lambda _{2}}\rightarrow 0\) and \(\Sigma _{n, \lambda _{1}, \lambda _{2}}\rightarrow 0\), the asymptotic variance simplifies to the sum of \(\Sigma _{(1)}^{-1}E\left[ \varGamma _{(1)}(U)^{\otimes 2}\right] \Sigma _{(1)}^{-1}\) and \(\Sigma _{(1)}^{-1}E\left[ \varLambda _{(1)}(V)^{\otimes 2}\right] \Sigma _{(1)}^{-1}\).

Choice of \(\lambda _{j}\)’s. We adopt the data-driven GCV procedure proposed by Li and Liang (2008) to select the tuning parameters \(\lambda _{1}\)’s, \(\lambda _{2}\)’s in a \((d+r)\)-dimensional space. Let \(\lambda _{1j}=\lambda \times \mathrm{Se}(\hat{\beta }_{j})\), \(\lambda _{2i}=\lambda \times \mathrm{Se}(\hat{\theta }_{i})\), where \(\mathrm{Se}(\hat{\beta }_{j})\) and \(\mathrm{Se}(\hat{\theta }_{i})\) are the estimated standard errors of \(\hat{\beta }_{j}, \hat{\theta }_{i} \). The minimization over the \(\lambda _{1}\)’s, \(\lambda _{2}\)’s is thus simplified to a one-dimensional minimization over \(\lambda \). We first introduce the estimation procedure for the standard errors, which can be obtained from the estimated covariance matrix \(\widehat{\mathrm{Cov}}(\hat{\gamma })\), where \(\hat{\gamma }=(\hat{\beta }^{T}, \hat{\theta }^{T})^{T}\) is obtained from (8). Write \(\ell '(\gamma )= \frac{\partial \mathcal {L}_{gol}({\beta },{ \theta })}{\partial {\gamma }}\), \(\ell ''(\gamma )=\frac{\partial ^{2}\mathcal {L}_{gol}({\beta },{ \theta })}{\partial {\gamma } \partial {\gamma }^{T} }\), \(\gamma =(\beta ^{T}, \theta ^{T})^{T}\) and

$$\begin{aligned} \Sigma _{n,\lambda _{1}, \lambda _{2}}^{*}=\mathrm{diag}\left( \frac{p'_{\lambda _{11}}(|\beta _{1}|)}{|\beta _{1}|}, \ldots , \frac{p'_{\lambda _{1d}}(|\beta _{d}|)}{|\beta _{d}|}, \frac{p'_{\lambda _{21}} (|\theta _{1}|)}{|\theta _{1}|}, \ldots , \frac{p'_{\lambda _{2r}}(|\theta _{r}|)}{|\theta _{r}|}\right) . \end{aligned}$$
(10)

A sandwich formula for the covariance matrix of the estimates \(\hat{\gamma } =\left( \hat{\beta }^{T}, \hat{\theta }^{T}\right) ^{T}\) is given by

$$\begin{aligned} \widehat{\mathrm{Cov}}(\hat{\gamma })=\left\{ \ell ''(\hat{\gamma })-n \Sigma _{n,\lambda _{1}, \lambda _{2}}^{*}\right\} ^{-1}\widehat{\mathrm{Cov}}(\ell '(\hat{\gamma }))\left\{ \ell ''(\hat{\gamma })-n\Sigma _{n, \lambda _{1}, \lambda _{2}}^{*}\right\} ^{-1}. \end{aligned}$$

Write \(e(\lambda )=\mathrm{tr}\left\{ \left\{ \ell ''(\hat{\gamma })- n\Sigma _{n,\lambda , \lambda }^{*}\right\} ^{-1}\ell ''(\hat{\gamma })\right\} , \) where \(\Sigma _{n,\lambda , \lambda }^{*}\) is obtained from (10) by substituting \(\lambda _{1j}, \lambda _{2i}\) with \(\lambda \times \mathrm{Se}(\hat{\beta }_{j}), \lambda \times \mathrm{Se}(\hat{\theta }_{i})\), respectively. The GCV statistic is defined by

$$\begin{aligned} \mathrm{GCV}(\lambda )=\frac{\sum _{i=1}^{n}\mathcal {D}\left\{ Y_{i}, g^{-1}(\hat{\alpha }^{T}(U_{i})X_{i}+\hat{\xi }_{i}^{T}\hat{\beta } (\lambda )+W_{i}^{T}\hat{\theta }(\lambda ))\right\} }{n\{1-e(\lambda )/n\}^{2}}, \end{aligned}$$

where \(\mathcal {D}\{Y, \mu \}\) denotes the deviance of Y corresponding to the model fitted with \(\lambda \). The minimizer of \(\mathrm{GCV} (\lambda )\) with respect to \(\lambda \) can be obtained by a grid search.
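A sketch of the GCV computation under the same illustrative naming (dev is the total deviance of the fit at \(\lambda \), hess the Hessian \(\ell ''(\hat{\gamma })\), se_gamma the estimated standard errors; scad_deriv is the helper sketched above):

```python
import numpy as np

def gcv(lam, gamma_lam, dev, hess, se_gamma, n, eps=1e-8):
    """GCV(lambda) with e(lambda) = tr{[l'' - n Sigma*]^{-1} l''}."""
    lam_vec = lam * se_gamma                          # lambda_{1j}, lambda_{2s}
    Sigma_star = np.diag(scad_deriv(np.abs(gamma_lam), lam_vec)
                         / np.maximum(np.abs(gamma_lam), eps))   # as in (10)
    e_lam = np.trace(np.linalg.solve(hess - n * Sigma_star, hess))
    return dev / (n * (1.0 - e_lam / n) ** 2)
```

The minimizing \(\lambda \) is then found by evaluating gcv over a grid, refitting (9) at each grid point.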

3 Statistical inference for nonparametric components

In this section, we consider a refined estimator of \(\alpha (u)\) and propose a generalized likelihood ratio test to select significant components of X.

3.1 Refined estimator of nonparametric component

After we obtain the final estimators of \(\beta \) and \(\theta \) from Sect. 2.2, the estimator of \(\alpha (u)\) can be refined by maximizing the following local likelihood function:

$$\begin{aligned}&\mathcal {L}^{*}_{loc}\big (\alpha (u),b(u)\big )\nonumber \\&\quad =\sum _{i=1}^{n}\mathcal {Q}\left[ g^{-1}\big (\hat{\beta }^{T}\hat{\xi }_{i}+ \hat{\theta }^{T}W_{i}+\alpha (u)^{T}X_{i}+b(u)^{T}X_{i}(U_{i}-u)\big ), Y_{i} \right] K_{h}(U_{i}-u)\nonumber \\ \end{aligned}$$
(11)

with respect to \(\alpha (u)\) and b(u). Let \(\hat{\alpha }(u)\) be the maximizer of (11). We have the following asymptotic result.

Theorem 4

Under Conditions (C1)–(C7), we have

$$\begin{aligned}&\sqrt{nh}\Bigg (\hat{\alpha }(u)-\alpha (u)-\frac{h^{2}\mu _{K2}}{2}\alpha ''(u) +\frac{b^{2}\mu _{L2} }{2}\Sigma _{X}(u)^{-1}E\left[ \rho _{2}(Z)\xi ^{(2)}(V)^{T}\beta {X} \Big | U=u\right] \Bigg )\\&\quad \mathop {\longrightarrow }\limits ^{\mathcal {L}} N\left( \mathbf{0}, \frac{v_{K_{0}}}{f_{U}(u)}\Sigma _{X}(u)^{-1}\right) , \end{aligned}$$

where \(\Sigma _{X}(u)=E\left[ \rho _{2}(Z){X}^{\otimes 2} \Big |U=u\right] \), and \(\mu _{K_{2}}=\int t^{2}K(t)\mathrm{d}t\), \(\mu _{L_{2}}=\int t^{2}L(t)\mathrm{d}t\), \(v_{K_{0}}=\int K^{2}(t)\mathrm{d}t\).

Remark 3

The second term in the asymptotic bias of \(\hat{\alpha }(u)\) arises from calibrating the error-prone covariates. In fact, both bias terms, of orders \(O(h^{2})\) and \(O(b^{2})\), can be eliminated if we adopt the undersmoothing strategy required for \(\hat{\beta }\), \(\hat{\theta }\) to be root-n consistent. In that case, the bias of \(\hat{\alpha }(u)\) tends to zero and \(\hat{\alpha }(u)\) is \(\sqrt{nh_{o}}\)-consistent.

3.2 Variable selection for nonparametric component

It is of interest to select the nonzero components of \(\alpha (u)\) to improve model prediction. In this section, we adopt the GLRT proposed by Fan et al. (2001) to detect significant components of X, implemented via a backward elimination procedure. In each step, we test \({H_{0}}: \alpha _{j_{1}}(u)=\cdots =\alpha _{j_{k}}(u)=0\) versus \({H_{1}}: \alpha _{j_{l}}(u)\not =0 \mathrm{~for~some~} l.\) For ease of presentation, we consider the following hypotheses:

$$\begin{aligned} {H_{0}}: \alpha _{1}(u)=\cdots =\alpha _{p}(u)=0 \qquad \mathrm{versus} \qquad { H_{1}}: \alpha _{j}(u)\not =0 \mathrm{~for~some~} j. \end{aligned}$$
(12)

Let \(\hat{\alpha }(u)\), \(\hat{\beta }\), \(\hat{\theta }\) be the estimators obtained from (8) and (11) under the alternative hypothesis, and \(\bar{\beta }\) and \(\bar{\theta }\) be the estimators of \(\beta \), \(\theta \) under the null hypothesis. Write

$$\begin{aligned} \mathcal {H}_{1}=\sum _{i=1}^{n}\mathcal {Q}\left\{ g^{-1}\left( \hat{\alpha } (U_{i})^{T}X_{i}+\hat{\beta }^{T}\hat{\xi }_{i}+\hat{\theta }^{T}W_{i}\right) ,Y_{i}\right\} \end{aligned}$$

and

$$\begin{aligned} \mathcal {H}_{0}=\sum _{i=1}^{n}\mathcal {Q}\left\{ g^{-1}\left( \bar{\beta }^{T} \hat{\xi }_{i}+\bar{\theta }^{T}W_{i}\right) ,Y_{i}\right\} . \end{aligned}$$

Following Fan et al. (2001) and Li and Liang (2008), we define the GLRT statistic

$$\begin{aligned} \mathcal {T}_{\mathrm{GLR}}= \mathcal {H}_{1}- \mathcal {H}_{0}. \end{aligned}$$

Define \(v_{L_{0}}=\int L^{2}(t)\mathrm{d}t\), \(v_{K_{0}}=\int K^{2}(t)\mathrm{d}t\) and \(\sigma _{K}^{2}=2 p\left\{ \int [2K(t)-K*K(t)]^{2}\mathrm{d}t\right\} ^{2}|\mathcal {U}|\), with \(|\mathcal {U}|\) the length of the support of U, and let \(\sigma _{L}^{2}=2\left\{ \int [L*L(t)]^{2}\mathrm{d}t\right\} ^{2}E\Big \{\frac{\{E[\rho _{2}({Z})|{V}]\}^{2}}{f_{V}({V})}\Big \}(\beta ^{T}\Sigma _{e}\beta )^{2}\), where \(K*K(t), L*L(t)\) are the convolutions of K(t), L(t), respectively, and \(c_{b}\), \(c_{h}\) are the positive constants in Condition (C7). We have the following theorem.

Theorem 5

Under Conditions (C1)–(C7), \(r_{LK}(\mathcal {T}_{\mathrm{GLR}} -\chi _{df_n}^2)\mathop {\longrightarrow }\limits ^\mathcal{{L}}0\) under the null hypothesis \(H_0\), where

$$\begin{aligned} r_{LK}= & {} \dfrac{8c_{b}^{-1}v_{L_{0}}\beta ^{T}\Sigma _{e}\beta E\left\{ \frac{E[\rho _{2}({Z})|{V}]}{f_{V}({V})}\right\} + 8c_{h}^{-1} p |\mathcal {U}| \left[ K(0)-0.5 v_{K_{0}}\right] }{ c_{b}^{-1} \sigma _{L}^{2} + c_{h}^{-1}\sigma _{K}^{2}}, ~~\text {and} \\ df_{n}= & {} \frac{v_{L_{0}}\beta ^{T}\Sigma _{e}\beta }{b}E\left[ \frac{E[\rho _{2}(Z)|V]}{f_{V}(V)}\right] +\frac{p |\mathcal {U}|}{h}\left[ K(0)-0.5 v_{K_{0}}\right] . \end{aligned}$$

Remark 4

Theorem 5 shows that a Wilks-type phenomenon holds for GVCPLMeM. The first part of \(df_{n}\) reflects the effect of the measurement error and the ancillary variable. As indicated in Li and Liang (2008), this generalized likelihood ratio theory can be justified by an empirical procedure, such as Monte Carlo simulation or a bootstrap procedure, since the degrees of freedom \(df_{n}\) tend to infinity as the sample size n increases. It is worth mentioning that the leading order of the degrees of freedom \(r_{LK}df_{n}\) cannot be obtained as in Fan et al. (2001), because \(\Sigma _{e}\), \(\beta \) and \(E\left[ \frac{E[\rho _{2}(Z)|V]}{f_{V}(V)}\right] \) are usually unknown in practice and need to be estimated from the data, and their estimators may not perform well when the sample size is small or moderate. Moreover, the constants \(c_{b}, c_{h}\) involved in Condition (C7) for the bandwidths h, b are also unknown. If the covariate \(\xi \) can be observed without measurement error, i.e., \(\Sigma _{e}=0\), then \(c_{b}, c_{h}\) vanish from \( r_{LK}\) and \( df_{n}\), and the calibration formulas for degrees of freedom proposed by Zhang (2004) can be applied directly. For these reasons and for practical purposes, one can follow the conditional bootstrap procedure suggested by Zhou and Liang (2009) and Cai et al. (2000) to estimate the null distribution of \(\mathcal {T}_{\mathrm{GLR}}\).
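A minimal sketch of that conditional bootstrap for a Bernoulli response (glr_stat is a placeholder for a routine that refits both hypotheses on a data set and returns \(\mathcal {H}_{1}-\mathcal {H}_{0}\), and mu0_hat holds the fitted null means, with covariates held fixed):

```python
import numpy as np

def bootstrap_pvalue(T_obs, mu0_hat, covariates, glr_stat,
                     n_boot=500, seed=1):
    """Conditional bootstrap null distribution of T_GLR: resample responses
    from the null fit, recompute the statistic, and compare with T_obs."""
    rng = np.random.default_rng(seed)
    T_boot = np.empty(n_boot)
    for b in range(n_boot):
        Y_star = rng.binomial(1, mu0_hat)        # Y* ~ Bernoulli(mu0_hat)
        T_boot[b] = glr_stat(Y_star, covariates)
    return np.mean(T_boot >= T_obs)              # bootstrap p value
```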

Remark 5

Under Conditions (C1)–(C7), the estimators under the null hypothesis satisfy

$$\begin{aligned} \sqrt{n}\big (\big (\bar{\beta }-\beta \big )^{T}, \big (\bar{\theta } -\theta \big )^{T} \big )^{T}\mathop {\longrightarrow }\limits ^{\mathcal {L}} N_{q}\big (0, \bar{\Sigma }^{-1}\bar{\varGamma }\bar{\Sigma }^{-1}+ \bar{\Sigma }^{-1}E\left[ \bar{\varLambda }(V)^{\otimes 2}\right] \bar{\Sigma }^{-1}\big ), \end{aligned}$$
(13)

where \(\bar{\Sigma }=E\left[ \rho _{2}(Z_{*})Q^{\otimes 2}\right] \), \(\bar{\varGamma }=E\left[ q^{2}_{1}(Z_{*},Y)Q^{\otimes 2}\right] \), \(\bar{\varLambda }(v)=E\left[ \rho _{2}(Z_{*})Q|V=v\right] e^{T}\beta \) and \(Z_{*}=\beta ^{T}\xi +\theta ^{T}W\). The asymptotic relative efficiency (ARE) of \(\bar{\beta }, \bar{\theta }\) with respect to \(\hat{\beta }, \hat{\theta }\) obtained in (8) is

$$\begin{aligned} \mathrm{ARE}\left( (\bar{\beta }, \bar{\theta }), (\hat{\beta }, \hat{\theta })\right) = \frac{\Vert {\Sigma }\Vert ^{2/q}_{D}}{\Vert \bar{\Sigma }\Vert ^{2/q}_{D}} \frac{\Vert \bar{\varGamma }+E\left[ \bar{\varLambda }(V)^{\otimes 2}\right] \Vert ^{1/q}_{D}}{\Vert E\left[ {\varGamma }(U)^{\otimes 2}\right] +E\left[ {\varLambda }(V)^{\otimes 2}\right] \Vert ^{1/q}_{D}}, \end{aligned}$$

where \(\Vert \cdot \Vert _{D}\) denotes the determinant of the corresponding covariance matrix.

4 Numerical studies

In this section, we conduct simulation studies to assess the performance of the proposed method, and then apply the method to a real data set from a diabetes study. We used the Epanechnikov kernel function \(L(t)=K(t)=0.75(1-t^{2})_{+}\) in all the numerical studies below. Note that Condition (C7) means the optimal bandwidth cannot be used because undersmoothing is necessary. As such, we used the rule of thumb (Silverman 1986): the smoothing parameter b was chosen as \(\hat{\sigma }_{V}n^{-1/3}\), where \(\hat{\sigma }_{V}\) is the sample standard deviation of V. This choice of b naturally meets Condition (C7). As pointed out in Remark 1, undersmoothing for h is a usual requirement in fitting generalized semiparametric models.

In our simulation studies, we generated 500 data sets, each consisting of \(n=500\) or \(n=1000\) observations, from the semiparametric varying coefficient logistic regression model:

$$\begin{aligned} \text{ logit }\{p(U, \xi , W, X )\} =\xi ^{T}\beta +\theta ^{T}W+\alpha (U)^{T}X \end{aligned}$$
(14)

with covariates, nonparametric functions and parameters being explicitly specified below.

4.1 Simulation studies

Example 1

\(\beta =2\), \(\theta =(3, 1.5, 2)\) or \(\beta =0.2\), \(\theta =(0.3, 0.15, 0.2)\). \(X=(1, X)^{T}\) with \(X\sim N(0,1)\), \(\alpha (u)=(\alpha _{1}(u), \alpha _{2}(u))^{T}\), \(\alpha _{1}(u)=\exp (2u-1)\), \(\alpha _{2}(u)=2\sin ^{2}(2\pi u)\). \(\xi \) is unobserved and recovered through \((\eta , V)\) via \(\eta =\xi (V)+e\) with \(\xi (V)= 3V-\cos (V)\); \(V\sim N(0, 0.5^{2})\) and is independent of (U, W, X); e follows \(N(0, 0.5^{2})\) and is independent of (U, V, W, X). We consider three cases: (i) W is independent of U, \(W\sim N(\mathbf{0}, \Sigma _{W})\), \(\Sigma _{W}=(\sigma _{w,ij})\) with \(\sigma _{w, ij}=0.25^{|i-j|}\), \(U\sim \mathrm{Unif}[0,1]\). (ii) \((W^{T}, U)^{T}\) follows \(\mathrm{Unif}[-1,1]\), and \(\mathrm{Var}((W^{T}, U)^{T})=(\sigma _{ij})\) with \(\sigma _{ij}=0.5^{|i-j|}\). (iii) The first component of W is 0 with probability 0.5 and 1 with probability 0.5, the remaining components of W are normally distributed with mean 0 and \(\mathrm{Var}(W)=(\sigma _{w,i'j'})\) with \(\sigma _{w, i'j'}=0.5^{|i'-j'|}\), and \(U\sim \mathrm{Unif}[0,1]\) is independent of W. In this example, we use the bandwidth \(h=3\times n^{-1/3}\). A data-generating sketch for case (i) is given below.
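For reproducibility, a sketch of the data-generating mechanism of case (i) (our code, matching the specification above):

```python
import numpy as np

def generate_example1(n, rng, beta=2.0, theta=np.array([3.0, 1.5, 2.0])):
    """One data set from model (14) under case (i) of Example 1."""
    U = rng.uniform(0.0, 1.0, n)
    V = rng.normal(0.0, 0.5, n)
    xi = 3.0 * V - np.cos(V)                        # xi(V)
    eta = xi + rng.normal(0.0, 0.5, n)              # surrogate eta = xi(V) + e
    idx = np.arange(3)
    Sigma_W = 0.25 ** np.abs(np.subtract.outer(idx, idx))
    W = rng.multivariate_normal(np.zeros(3), Sigma_W, n)
    X = np.column_stack([np.ones(n), rng.normal(0.0, 1.0, n)])
    alpha = np.column_stack([np.exp(2.0 * U - 1.0),
                             2.0 * np.sin(2.0 * np.pi * U) ** 2])
    lin = beta * xi + W @ theta + np.sum(alpha * X, axis=1)
    Y = rng.binomial(1, 1.0 / (1.0 + np.exp(-lin)))
    return Y, U, eta, V, W, X

# Y, U, eta, V, W, X = generate_example1(500, np.random.default_rng(0))
```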

The simulation results for the benchmark estimator (i.e., all covariates measured exactly), the proposed estimator and the naive estimator (using \(\eta \) directly) are presented in Tables 1 and 2, which report the means and associated standard errors of \((\hat{\beta }, \hat{\theta } )\). The estimates based on the proposed procedure and the benchmark procedure are close to the true values in all three cases, which indicates that our proposed method is promising. The naive estimator, however, is severely biased and performs worse, especially when the sample size is \(n=500\).

Table 1 The simulation results for Example 1
Table 2 The simulation results for Example 1

Example 2

In this example, we examined the performance of the proposed variable selection procedure by comparing it with traditional subset selection criteria such as AIC, BIC and RIC under model (14). Let \(\beta =(-0.5, 0.5)^{T}\). X and \(\alpha (u)\) are the same as in Example 1. \(\xi (V)=(\xi _{1}(V), \xi _{2}(V))^{T}\), \(\xi _{1}(V)=2\cos (V)\), \(\xi _{2}(V)=0.1\exp (V)+3\sin (V)\), and the ancillary variable is \(\eta =(\eta _{1}, \eta _{2})^{T}\) with \(\eta _{1}=\xi _{1}(V)+e_{1}\), \(\eta _{2}=\xi _{2}(V)+e_{2}\). V is independent of \((e_{1}, e_{2})^{T}\) and follows N(0, 1). \((e_{1}, e_{2})^{T}\) follows \(N_{2}((0, 0)^{T}, \Sigma _{e})\) with \(\Sigma _{e}=(\sigma _{e,ij})_{1\le i, j \le 2}\), \(\sigma _{e, ij}=(-0.5)^{|i-j|}\). U follows Unif[0, 1]. Moreover, \(\mathrm{Var}((\xi ^{T}, W^{T})^{T})= (\sigma _{o,ij})_{1\le i,j \le q}\) with \(\sigma _{o,ij}=0.5^{|i-j|}\). We considered two cases: \(\theta =(1, 0, 0, 1, 0)^{T} \in \mathbb {R}^{5}\) and \(\theta =(1, 0, 0, 1, 0, 0, 0 ,0)^{T}\in \mathbb {R}^{8}\).

We examined the following quantities: the median of the squared errors (MedSE) \(\Vert \hat{\gamma }-\gamma \Vert _{2}^{2}\), the average number (labeled “C”) of the three or six true zero coefficients correctly set to zero, and the average number (labeled “I”) of the four true nonzeros incorrectly set to zero. As in Example 1, we considered three estimators: the benchmark estimator, the proposed estimator and the naive estimator. The GCV procedure introduced in Sect. 2.3 was used for selecting the \(\lambda _{j}\)’s, with 30 grid points evenly distributed over the range of \(\lambda \). The simulation results are reported in Table 3.

Table 3 The simulation results for Example 2

We can see that both the benchmark estimator and our proposed estimator perform better as the sample size increases to 1000. The values of “C” and “I” are close to the true values (3 in case 1 or 6 in case 2, and 0, respectively). The performance of the SCAD procedure is similar to that of the oracle procedure and better than best subset variable selection using AIC and RIC. Moreover, the performance of the SCAD is similar to that of BIC, which, however, costs much more computational time. The MedSE of the SCAD and BIC procedures for both the benchmark estimator and the proposed estimator are also close to those obtained from the oracle procedure. As anticipated, the naive procedure has a much higher rate of incorrectly setting nonzero coefficients to zero: the average number of nonzero coefficients incorrectly set to zero by SCAD is close to 1 instead of 0 in the two cases, and even when the sample size is \(n=1000\), the corresponding number for best subset variable selection is at least 0.4 instead of 0. At the same time, the MedSE of the naive estimator is about 0.37 when \(n=500\) and 0.26 when \(n=1000\), even in the oracle setting. This means that ignoring the measurement error e increases the chance of falsely identifying components as significant and can result in an inappropriate model, whereas calibration with \(\hat{\xi }(v)\) performs well for variable selection.

Example 3

In this example, we examined the performance of the estimation procedure for the nonparametric components introduced in Sect. 3.1. \(\beta =(1,1,1)^{T}\), \(\theta =(-1, 0.5)^{T}\), \(X=(1, X)^{T}\), where X follows N(0, 1). \(\alpha (u)=(\alpha _{1}(u), \alpha _{2}(u))^{T}\), \(\alpha _{1}(u)=2\exp (-2 u)\), \(\alpha _{2}(u)=2\sin ^{2}(\pi u)\). U, \(\xi (V)\), V and e are the same as in Example 2. Moreover, \(\mathrm{Var}\{(\xi ^{T}, W^{T})^{T}\}=(\sigma _{o,ij})_{1\le i,j \le q}\) with \(\sigma _{o,ij}=0.5^{|i-j|}\). In this example, we set \(h=0.2\). The performance of the estimator \(\hat{\alpha }(u)=(\hat{\alpha }_{1}(u), \hat{\alpha }_{2}(u))^{T}\) was assessed by the square root of the average squared errors (RASE)

$$\begin{aligned}&\mathrm{RASE}_{1}=\Bigg \{n_{0}^{-1}\sum _{i=1}^{n_{0}}\Vert \hat{\alpha }_{1} (u_{i})-\alpha _{1}(u_{i})\Vert ^{2}\Bigg \}^{1/2},\\&\mathrm{RASE}_{2}=\Bigg \{n_{0}^{-1}\sum _{i=1}^{n_{0}}\Vert \hat{\alpha }_{2}(u_{i})- \alpha _{2}(u_{i})\Vert ^{2}\Bigg \}^{1/2}, \end{aligned}$$

where \(\{u_{1}, \ldots , u_{n_{0}} \}\) are the given grid points, and \(n_{0}=200\) is the number of grid points.

We evaluated the estimation procedure (11) under two scenarios: (i) using the estimated \(\hat{\gamma }=(\hat{\beta }^{T}, \hat{\theta }^{T})^{T}\); (ii) using the true value \({\gamma }=({\beta }^{T}, {\theta }^{T})^{T}\). We report the simulation means and standard deviations of \(\mathrm{RASE}_{1}\) and \(\mathrm{RASE}_{2}\), and the simulation mean and associated standard deviation of \(\Vert \hat{\gamma }-\gamma \Vert ^{2}\), in Table 4. These results indicate that both the benchmark estimator and the proposed estimator work well regardless of whether \(\hat{\gamma }\) or \(\gamma \) is used. This is not surprising because \(\hat{\gamma }\) is root-n consistent, with a higher convergence rate than the nonparametric estimates; as a result, the benchmark estimator and the proposed estimator work satisfactorily under both scenarios in terms of RASE. On the other hand, the naive procedure results in non-ignorable biases in the estimation of \(\gamma \), and the biased estimators \(\hat{\gamma }\) deteriorate the estimation procedure for \(\alpha (\cdot )\), eventually making \(\hat{\alpha }(u)\) more biased. It is worth mentioning that the naive estimator using the true \(\gamma \) works well, since no bias is introduced (see the third row under the “Exact \(\gamma \)” column in Table 4). The estimation of \(\alpha _2(u)\) with the estimated \(\hat{\gamma }\) performs as well as if we knew the true value of \(\gamma \), for both the proposed estimation and the naive estimation. The estimation procedure for \(\alpha _1(u)\) does not have such a nice property: the RASE value for \(\alpha _1(u)\) increases from 0.0527 for the proposed method to 0.7993 for the naive estimation when the sample size is \(n=1000\), whereas the RASE value for \(\alpha _2(u)\) remains on the same scale for the proposed method and the naive estimation. This substantial difference arises because \(\alpha _1(u)\), but not \(\alpha _2(u)\), absorbs the biases caused by ignoring measurement errors. These features can further be observed in Fig. 1 for the benchmark procedure when \(n=1000\), i.e., using \(\xi (v)\), and in Fig. 2 for the proposed procedure, i.e., using \(\hat{\xi }(v)\), where we plot the RASE values based on the true \(\gamma \) against the RASE values based on the estimated \(\hat{\gamma }\) for \(\alpha _1(u)\) (left panels) and \(\alpha _2(u)\) (right panels). It can be seen that the estimation procedure (11) for \(\alpha _2(u)\) with the estimated \(\hat{\gamma }\) performs as well as if we knew the true value of \(\gamma \).

Table 4 The simulation results for Example 3
Fig. 1

Simulation results (\(n=1000\)) for Example 3 under the benchmark procedure, i.e., using \(\xi (v)\): RASE based on the true \(\gamma \) against RASE based on the estimated \(\hat{\gamma }\) for \(\alpha _{1}(u)\) (left panel) and \(\alpha _2(u)\) (right panel)

Fig. 2

Simulation results (\(n=1000\)) for Example 3 under the proposed procedure, i.e., using \(\hat{\xi }(v)\): RASE based on the true \(\gamma \) against RASE based on the estimated \(\hat{\gamma }\) for \(\alpha _{1}(u)\) (left panel) and \(\alpha _2(u)\) (right panel)

Example 4

In this example, we examine the performance of the test procedure proposed in Sect. 3.2. The simulation setting is the same as in Example 3. Consider the hypotheses

$$\begin{aligned} H_{0}: \alpha _{2}(u)=0 \ \text{ vs } \ H_{1}: \alpha _{2}(u)\not =0, \end{aligned}$$
(15)

where \(\alpha _2(u)\) indexes a sequence of alternative models through \(C_{o}\), of the form \(\alpha _{2}(u)=C_{o} \times u(1-u).\) We conducted 400 simulations at four significance levels, 0.01, 0.025, 0.05 and 0.10, for the benchmark procedure and the proposed procedure, with 500 conditional bootstrap samples (Cai et al. 2000) generated in each simulation for the power calculation. The simulation results are reported in Table 5 and Fig. 3. When \(C_{o}=0\), all empirical levels obtained by the two procedures are close to the four nominal levels, which indicates that the bootstrap method gives proper Type I errors. As \(C_{o}\) increases, the power functions increase rapidly. It is worth noting that the simulation results for the benchmark procedure concur with what Li and Liang (2008) observed, and the proposed estimation procedure also performs well. This indicates that the proposed GLRT under the measurement error setting works well numerically and confirms our theoretical findings.

Table 5 The simulation results for Example 4
Fig. 3

Simulation results (\(n=1000\)) for Example 4: power plot for the bootstrap test proposed in Sect. 3.2. The significance levels are 0.01, 0.025, 0.05 and 0.10. The dotted lines represent the power functions for the benchmark procedure directly using \(\xi (v)\); the solid lines represent the proposed procedure using the estimated \(\hat{\xi }(v)\)

4.2 An empirical example

We analyzed a data set with 358 complete observations from a diabetes study conducted among African Americans in central Virginia, which aimed at understanding the relationship between the prevalence of obesity, diabetes, and other cardiovascular risk factors. There are 14 covariates of potential interest: “TC, Total Cholesterol”; “SG, Stabilized Glucose”; “HDL, High-Density Lipoprotein”; “Ratio, Cholesterol/HDL”; “GH, Glycosolated Hemoglobin”; “age”; “gender”; “height”; “weight”; “frame”; “FSBP, First Systolic Blood Pressure”; “FDBP, First Diastolic Blood Pressure”; “waist” and “hip”. A \(\mathtt{GH}\) value over 7.0 usually indicates a positive diagnosis of diabetes, so Y was set to 1 if \(\mathtt{GH} >7.0\) and 0 otherwise. We are interested in the relationship between the probability of having diabetes and the collected covariates. Cambien et al. (1987) found that blood pressure is strongly associated with glucose, and Han et al. (1995) found that Ratio is associated with TC and HDL. On the basis of these preliminary results, we treat \(\eta =(\mathtt{FSBP}, \mathtt{FDBP})^{T}\) and \(V=\mathtt{SG}\) as ancillary variables to recover the unobservable variables \(\xi =(\xi _{1}(V), \xi _{2}(V))^{T}\). We take \(X=(\mathtt{TC}, \mathtt{HDL})^{T}\) and \(U=\mathtt{Ratio}\) to investigate the possibly varying coefficient functions \(\alpha (\cdot )=(\alpha _{1}(U), \alpha _{2}(U))^{T}\). The W-variables are age, gender, height, weight, frame, waist and hip. Gender and frame are discrete variables, coded 1 and 0 for male and female, and 1, 2, 3 for small, medium and large frames, respectively. All continuous covariates were standardized.

We used the proposed quasi-likelihood method and the penalized quasi-likelihood with the SCAD penalty for estimation and variable selection. For comparison, we also considered the AIC, BIC and RIC variable selection procedures. The bandwidth \(h=0.5n^{-1/3}\) was used for the local regression fitting. The results are reported in Table 6. The SCAD procedure agrees with the BIC and RIC procedures: all three methods indicate that only the recovered variables \(\xi =(\xi _{1}(V), \xi _{2}(V))^{T}\) are significant, while none of the W-variables is. The SCAD-based estimates of \(\beta \) are close to those obtained using the unpenalized quasi-likelihood. AIC selects two extra W-variables, \(\mathtt{waist}\) and \(\mathtt{hip}\); recalling the simulation performance in Sect. 4.1, AIC may suggest an over-fitted model. As such, the model selected by SCAD, BIC and RIC may be more appropriate.

Table 6 Estimation and variable selection results of real data analysis

We further considered estimation and variable selection for the X-variables. We conducted 500 bootstrap replications to test \(\alpha _{1}(\cdot )= 0\). The observed GLRT value is 0.0711, larger than the 97.5 % quantile of the 500 bootstrap values, 0.0418, which suggests rejecting the null hypothesis. In the same way, we tested \(\alpha _{2}(\cdot )=0\) and obtained the value 0.3027, much larger than the corresponding 97.5 % quantile of the 500 bootstrap values, 0.0355; this also indicates that we should reject the null hypothesis. The estimated curves with their 95 % pointwise confidence bands are depicted in Fig. 4 and show nonzero and nonlinear patterns. As a result, both \(\alpha _{1}(u)\) and \(\alpha _{2}(u)\) should be included in the final model.

Fig. 4

Results for the real data example: local linear estimators of TC-\(\alpha _{1}(u)\) (left panel) and HDL-\(\alpha _{2}(u)\) (right panel) against \(U=\mathrm{Ratio}\), with the associated 95 % pointwise confidence intervals (dotted lines)