1 Introduction

Generalized partially linear models have been widely used in statistics. Such models enrich the more classic generalized linear models by allowing a covariate to enter the link function through a nonparametric form. This is useful when the dependence of the response on some covariates, even after transformation through a suitable link function, is still nonlinear and difficult to specify. At the same time, the model retains the more classic generalized linear dependence on the other covariates. Many works on estimation and inference for generalized partially linear models exist in the literature; see, for example, Carroll et al. (1995), Liang et al. (2009), Apanasovich et al. (2009), and Yu and Ruppert (2012).

When one of the covariates involved in the generalized partially linear model cannot be measured precisely, the problem becomes much more difficult. In fact, most works handling measurement error in the generalized partially linear model considered only the case where the measurement error occurs in a covariate in the linear component (Ma and Carroll 2006; Liu et al. 2017; Liang and Ren 2005; Liu 2007; Liang and Thurston 2008). When the model degenerates to the generalized linear model, an even larger literature exists on handling measurement error (Carroll et al. 2006; Stefanski and Carroll 1985, 1987; Huang and Wang 2001; Ma and Tsiatis 2006; Buonaccorsi 2010; Xu and Ma 2015). When handled properly, the parameters can be estimated at the root-n convergence rate despite the presence of the measurement error and the possible presence of the nonparametric function in the model. However, it is a different story when the covariate inside the nonparametric function itself is measured with error. We conjecture that this is because, as soon as the covariate inside an unknown function is subject to error, the problem falls into the general framework of nonparametric measurement error models, where the standard practice for estimation and inference is deconvolution. The deconvolution method is widely used in handling latent components and has been used to show that nonparametric regression with errors in covariates can have a very slow convergence rate. Possibly due to these inherent difficulties, generalized partially linear models with errors in the covariate inside the nonparametric function have not been studied systematically.

We tackle this difficult problem, where the error occurs in the covariate inside the nonparametric component of the generalized partially linear model, through a novel approach that avoids the deconvolution treatment completely. Two key ideas lead to our success in this endeavor. The first is the idea of using a B-spline expansion to approximate the nonparametric function of the latent covariate. The nature of B-splines allows us to write out the approximation form without having to perform the estimation simultaneously. This is different from nonparametric estimation via kernel methods, where approximation and estimation are integrated and inseparable. The second idea is the recognition that, after the B-spline approximation, the error-free model is effectively a parametric model, or at least a parametric model in terms of operation; hence, the only nonparametric component in the measurement error model is the distribution of the latent covariate. This implies that the semiparametric approach of Tsiatis and Ma (2004) can be adopted here to help establish the estimation procedure. The encouraging discovery is that not only can we bypass the estimation difficulties caused by a nonparametric function of a covariate measured with error, but we also prove that the procedure retains the root-n convergence rate of the parameter estimation in the original model.

The structure of this paper is as follows. We describe the model and the estimation methodology in Sect. 2, followed by the large sample properties of the parameter estimation in Sect. 3. Simulation studies are conducted in Sect. 4, and we analyze the AIDS Clinical Trials Group (ACTG) study in Sect. 5. We finish the paper with some discussion in Sect. 6. All the technical details and proofs are provided in the Appendix.

2 Main results

2.1 The model

We work in the measurement error model framework, sometimes also referred to as the errors-in-covariates model. It differs from the standard regression model in that at least one of the covariates is not directly observable; instead, only an error-contaminated measurement of this covariate is observed. Generally speaking, in a standard regression problem, we would observe independent and identically distributed (i.i.d.) observations \((X_i, \mathbf{Z}_i, Y_i), i=1, \dots , n\), where \((X_i, \mathbf{Z}_i)\) is the covariate and \(Y_i\) is the response. Then, with a specific model of \(Y_i\) given \(X_i\) and \(\mathbf{Z}_i\), we can proceed to estimate the unknown components in this regression relation. However, in a measurement error model, \(X_i\) is no longer available; instead, only an error-prone version of \(X_i\), say \(W_i\), is observed. Thus, the goal is still to estimate the parameters in the model of \(Y_i\) given \((X_i, \mathbf{Z}_i)\), but using the \((W_i, \mathbf{Z}_i, Y_i)\)’s instead of the \((X_i, \mathbf{Z}_i, Y_i)\)’s.

In this paper, we study the generalized partially linear model

$$\begin{aligned} f_{Y\mid X, \mathbf{Z}}(y,x,\mathbf{z},{\varvec{\alpha }},{\varvec{\beta }},g)= f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+g(x),{\varvec{\alpha }}\}. \end{aligned}$$
(1)

Here, Y is the univariate response variable and X and \(\mathbf{Z}\) are covariates. We assume the univariate variable X to be compactly supported; without loss of generality, let the support be \([0, 1]\). We assume \(\mathbf{Z}\in {{\mathscr {R}}}^{p_z}, p_z\ge 1\). The unknown components in (1) are \({\varvec{\beta }}\in {\mathscr {R}}^{p_z}\) and \({\varvec{\alpha }}\), whose estimation and inference are of main interest to us, and the nuisance function \(g(\cdot )\), which contributes to the name “partially linear.” The link function \(f(\cdot )\) is assumed to be known. Here, the parameter \({\varvec{\beta }}\) describes the linear effects of the covariates in \(\mathbf{Z}\), \(g(\cdot )\) describes the unspecified effect of X, and \({\varvec{\alpha }}\) arises according to the link function f. For example, \(f(\cdot )\) can be the inverse logit link function \(f(\cdot )=1-1/\{\exp (\cdot )+1\}\) or the normal link function \(f(\cdot )=\exp \{ -(\cdot )^2/(2\alpha ^2)\}/\{(2\pi \alpha ^2)^{1/2}\}\). Note that in the logistic example, the parameter \({\varvec{\alpha }}\) does not appear, while in the normal example, \(\alpha \) captures the standard deviation of Y. Now, although Y and \(\mathbf{Z}\) are observable, X is not. Instead, X is a random variable measured with error. Thus, in lieu of observing X, we observe W, where

$$\begin{aligned} W=X+U, \end{aligned}$$
(2)

and U is a normal random error, independent of X and \(\mathbf{Z}\), with mean zero and variance \(\sigma _U^2\). For ease of presentation of the main methodology, we assume \(\sigma _U^2\) is known. When \(\sigma _U^2\) is unknown, a common approach is to first estimate \(\sigma _U^2\) using repeated measurements and then plug in the estimate. The observed data are \((W_i, \mathbf{Z}_i, Y_i), i=1, \dots , n\), which are i.i.d. Our goal is to estimate \({\varvec{\alpha }}\) and \({\varvec{\beta }}\), together with \(g(\cdot )\), and hence to understand the dependence of Y on the covariates \((X,\mathbf{Z})\).

2.2 Efficient score derivation

For preparation, we first approximate g(x) with a B-spline representation, i.e., \(g(x)\approx \mathbf{B}(x)^{\mathrm{T}}{\varvec{\gamma }}\). Under this approximation, model (1) becomes

$$\begin{aligned} f_{Y\mid X, \mathbf{Z}}(y,x,\mathbf{z},{\varvec{\alpha }},{\varvec{\beta }},g)\approx f_{Y\mid X, \mathbf{Z}}(y,x,\mathbf{z},\varvec{\theta })\equiv f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+\mathbf{B}(x)^{\mathrm{T}}{\varvec{\gamma }},{\varvec{\alpha }}\}, \end{aligned}$$
(3)

which is a fully parametric model with unknown parameters \(\varvec{\theta }\equiv ({\varvec{\alpha }}^{\mathrm{T}},{\varvec{\beta }}^{\mathrm{T}},{\varvec{\gamma }}^{\mathrm{T}})^{\mathrm{T}}\). This model falls into the general framework of Tsiatis and Ma (2004); hence, their estimation procedure can be adopted here. Specifically, the joint distribution of the observed variables conditional on \(\mathbf{Z}\) is

$$\begin{aligned} f_{W,Y\mid \mathbf{Z}}(y,w,\mathbf{z},\varvec{\theta })=\int f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+\mathbf{B}(x)^{\mathrm{T}}{\varvec{\gamma }},{\varvec{\alpha }}\}f_{W\mid X}(w,x)f_{X\mid \mathbf{Z}}(x,\mathbf{z})d\mu (x), \end{aligned}$$

with the conditional distribution function \(f_{X\mid \mathbf{Z}}(x,\mathbf{z})\) being a nuisance parameter. The nuisance tangent space \(\varLambda \) and its orthogonal complement \(\varLambda ^\perp \) can be written as

$$\begin{aligned} \varLambda= & {} [E\{\mathbf{a}(X,\mathbf{Z}) | Y, W,\mathbf{Z}\}: E\{\mathbf{a}(X,\mathbf{Z})\mid \mathbf{Z}\}=0],\\ \varLambda ^\perp= & {} [\mathbf{h}(Y, W, \mathbf{Z}): E\{\mathbf{h}(Y, W, \mathbf{Z})\mid X,\mathbf{Z}\}=\mathbf{0}\ \text{ almost } \text{ everywhere }]. \end{aligned}$$

The efficient score for \(\varvec{\theta }\) is the residual of its score vector \(\mathbf{S}_{\varvec{\theta }}(y,w,\mathbf{z})\) after projecting it onto the nuisance tangent space \(\varLambda \), denoted by

$$\begin{aligned} \mathbf{S}_{\mathrm{res}}(y,w,\mathbf{z},\varvec{\theta })\equiv \mathbf{S}_{\varvec{\theta }}(y,w,\mathbf{z},\varvec{\theta }) - \varPi \{\mathbf{S}_{\varvec{\theta }}(Y,W,\mathbf{Z},\varvec{\theta }) | \varLambda \}, \end{aligned}$$

where \( \mathbf{S}_{\varvec{\theta }}(y,w,\mathbf{z},\varvec{\theta })\equiv {\partial \log f_{W,Y\mid \mathbf{Z}}(y,w,\mathbf{z},\varvec{\theta })}/{\partial \varvec{\theta }}. \) Here, “\(_{\mathrm{res}}\)” stands for residual. The detailed form of \(\mathbf{S}_{\mathrm{res}}(y,w,\mathbf{z},\varvec{\theta })\) is given as

$$\begin{aligned} \mathbf{S}_{\mathrm{res}}(Y, W, \mathbf{Z},\varvec{\theta }) = \mathbf{S}_{\varvec{\theta }}(Y, W, \mathbf{Z},\varvec{\theta }) -E\{\mathbf{a}(X,\mathbf{Z},\varvec{\theta })|Y, W,\mathbf{Z}\} , \end{aligned}$$
(4)

where \(\mathbf{a}(X,\mathbf{Z},\varvec{\theta })\) satisfies

$$\begin{aligned} E\{\mathbf{S}_{\varvec{\theta }}(Y, W, \mathbf{Z}, \varvec{\theta }) \mid X, \mathbf{Z}\} =E[E\{\mathbf{a}(X,\mathbf{Z}, \varvec{\theta }) | Y, W, \mathbf{Z}\} \mid X, \mathbf{Z}]. \end{aligned}$$
(5)
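Before moving on, we note that the B-spline design matrix \(\mathbf{B}(x)\) entering (3) is straightforward to evaluate numerically. The following is a minimal sketch using scipy; the quadratic order and seven equally spaced interior knots roughly mirror the choices in Sect. 4, but the knot placement here is illustrative, and the clamped (repeated) boundary knots correspond to the knot sequence of Condition (C4) below.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(x, n_interior, order):
    """Evaluate the d_gamma = N + r B-spline basis functions of order r
    (degree r - 1) on [0, 1], with N equally spaced interior knots and
    boundary knots repeated r times (a clamped knot sequence)."""
    interior = np.linspace(0, 1, n_interior + 2)[1:-1]
    knots = np.concatenate([np.zeros(order), interior, np.ones(order)])
    d_gamma = n_interior + order
    basis = np.empty((len(x), d_gamma))
    for j in range(d_gamma):
        coef = np.zeros(d_gamma)
        coef[j] = 1.0  # pick out the j-th basis function
        basis[:, j] = BSpline(knots, coef, order - 1, extrapolate=False)(x)
    return np.nan_to_num(basis)

# Quadratic splines (order 3) with 7 interior knots: d_gamma = 10 basis functions.
x = np.linspace(0, 1, 101)
B = bspline_basis(x, n_interior=7, order=3)        # 101 x 10 design matrix
gamma = np.random.default_rng(0).normal(size=B.shape[1])
g_approx = B @ gamma                               # g(x) approx B(x)' gamma
```

Given a coefficient vector \({\varvec{\gamma }}\), the approximation \(\mathbf{B}(x)^{\mathrm{T}}{\varvec{\gamma }}\) is then a single matrix–vector product.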

Now, noting that the above derivation is obtained from the approximate model (3), we perform some further analysis. Separating the components corresponding to \({\varvec{\alpha }}, {\varvec{\beta }}\) and \({\varvec{\gamma }}\) in \(\varvec{\theta }\), we can write \(\mathbf{S}_{\varvec{\theta }}(y,w,\mathbf{z},\varvec{\theta })=\{ \mathbf{S}_{{\varvec{\alpha }},{\varvec{\beta }}}(y,w,\mathbf{z},\varvec{\theta })^{\mathrm{T}}, \mathbf{S}_{{\varvec{\gamma }}}(y,w,\mathbf{z},\varvec{\theta })^{\mathrm{T}}\}^{\mathrm{T}}\), which leads to the corresponding decomposition

$$\begin{aligned} \mathbf{S}_{\mathrm{res}}(y, w, \mathbf{z}, \varvec{\theta })=\{{\mathbf{S}_{\mathrm{res}}}_1(y, w, \mathbf{z}, \varvec{\theta })^{\mathrm{T}}, {\mathbf{S}_{\mathrm{res}}}_2(y, w, \mathbf{z}, \varvec{\theta })^{\mathrm{T}}\}^{\mathrm{T}}. \end{aligned}$$

The estimating equation of the approximate model can be written as

$$\begin{aligned} \sum _{i=1}^n\mathbf{S}_{\mathrm{res}}(Y_i,W_i,\mathbf{Z}_i,\varvec{\theta })= \sum _{i=1}^n\{{\mathbf{S}_{\mathrm{res}}}_1(Y_i,W_i,\mathbf{Z}_i,\varvec{\theta })^{\mathrm{T}}, {\mathbf{S}_{\mathrm{res}}}_2(Y_i,W_i,\mathbf{Z}_i,\varvec{\theta })^{\mathrm{T}}\}^{\mathrm{T}}=\mathbf{0}.\nonumber \\ \end{aligned}$$
(6)

Remember that our original model contains an unknown function g(x). Thus, for the estimation of \({\varvec{\alpha }}, {\varvec{\beta }}\), it is beneficial to first treat g as a nuisance parameter as well and estimate \({\varvec{\alpha }}, {\varvec{\beta }}\) via profiling. We then plug in the estimated values of \({\varvec{\alpha }}\) and \({\varvec{\beta }}\) and estimate g via the B-spline approximation. Of course, in addition to g, the distribution of the unobservable covariate conditional on the observable covariate \(\mathbf{Z}\) is also a nuisance component and still has to be taken into account.

Let \({\varvec{\delta }}\equiv ({\varvec{\alpha }}^{\mathrm{T}},{\varvec{\beta }}^{\mathrm{T}})^{\mathrm{T}}\) be a p-dimensional parameter. We propose to solve for \({\varvec{\gamma }}\) from \(\sum _{i=1}^n{\mathbf{S}_{\mathrm{res}}}_2(Y_i,W_i,\mathbf{Z}_i,\varvec{\theta })=\mathbf{0}\) to obtain \(\widehat{{\varvec{\gamma }}}({\varvec{\delta }})\) first. Now from

$$\begin{aligned} f_{W,Y\mid \mathbf{Z}}(w,\mathbf{z},y,{\varvec{\delta }},g,f_X)=\int f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+g(x),{\varvec{\alpha }}\}f_{W\mid X}(w,x)f_{X\mid \mathbf{Z}}(x,\mathbf{z})d\mu (x), \end{aligned}$$

we can construct the nuisance tangent space as \(\varLambda =\varLambda _{f_X}+\varLambda _g\), where

$$\begin{aligned} \varLambda _{f_X}= & {} [E\{\mathbf{a}(X,\mathbf{Z}) | Y, W,\mathbf{Z}\}: E\{\mathbf{a}(X,\mathbf{Z})\mid \mathbf{Z}\}=\mathbf{0}]\\ \varLambda _g= & {} \left( E\left[ s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\}\mathbf{b}(X) | Y, W,\mathbf{Z}\right] : \forall \mathbf{b}(X)\right) , \end{aligned}$$

where \(s(y,t,{\varvec{\alpha }}) \equiv \partial \log f(y,t,{\varvec{\alpha }})/\partial t\). Note that \(\varLambda _{f_X}\) and \(\varLambda _g\) are not orthogonal to each other. We can further verify that

$$\begin{aligned} \varLambda _{f_X}^\perp= & {} [\mathbf{h}(Y, W, \mathbf{Z}): E\{\mathbf{h}(Y, W, \mathbf{Z})\mid X,\mathbf{Z}\}=\mathbf{0}\ \text{ a.e. }],\\ \varLambda _g^\perp= & {} \left( \mathbf{h}(Y, W, \mathbf{Z}): E\left[ \mathbf{h}(Y, W, \mathbf{Z}) s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\} \mid X,\mathbf{Z}\right] =\mathbf{0}\ \text{ almost } \text{ everywhere }\right) . \end{aligned}$$

The efficient score for \({\varvec{\delta }}\) is now the residual of the score vector \(\mathbf{S}_{{\varvec{\delta }}}\) after projecting it onto the nuisance tangent space \(\varLambda \), denoted by

$$\begin{aligned} \mathbf{S}_{\mathrm{eff}}(Y,W,\mathbf{Z},{\varvec{\delta }},g)\equiv \mathbf{S}_{{\varvec{\delta }}}(Y,W,\mathbf{Z},{\varvec{\delta }},g) - \varPi \{\mathbf{S}_{{\varvec{\delta }}}(Y,W,\mathbf{Z},{\varvec{\delta }},g) \mid \varLambda \}. \end{aligned}$$
(7)

Its explicit form is given as

$$\begin{aligned} \mathbf{S}_{\mathrm{eff}}(Y, W, \mathbf{Z}, {\varvec{\delta }},g)= & {} \mathbf{S}_{{\varvec{\delta }}}(Y, W, \mathbf{Z}, {\varvec{\delta }},g) - E\{\mathbf{a}(X,\mathbf{Z})|Y, W,\mathbf{Z}\} \\&-\, E\left[ s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\}\mathbf{b}(X)| Y, W, \mathbf{Z}\right] , \end{aligned}$$

where \(\mathbf{a}(X,\mathbf{Z})\) and \(\mathbf{b}(X)\) satisfy

$$\begin{aligned}&E\{\mathbf{S}_{{\varvec{\delta }}}(Y, W, \mathbf{Z},{\varvec{\delta }},g) \mid X,\mathbf{Z}\} \nonumber \\&\quad =E[E\{\mathbf{a}(X,\mathbf{Z}) | Y, W, \mathbf{Z}\} \mid X, \mathbf{Z}]\nonumber \\&\qquad +\, E(E[s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\}\mathbf{b}(X)| Y, W, \mathbf{Z}] \mid X,\mathbf{Z})\nonumber \\&\text{ and }\nonumber \\&E[\mathbf{S}_{{\varvec{\delta }}}(Y, W, \mathbf{Z}, {\varvec{\delta }},g) s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\} \mid X,\mathbf{Z}]\nonumber \\&\quad =E[E\{\mathbf{a}(X,\mathbf{Z}) | Y, W, \mathbf{Z}\}s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\} \mid X,\mathbf{Z}]\nonumber \\&\qquad +\,E(E[s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\} \mathbf{b}(X)| Y, W,\mathbf{Z}]s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\} \mid X,\mathbf{Z}).\nonumber \\ \end{aligned}$$
(8)

We can then form the estimating equation \(\sum _{i=1}^n\mathbf{S}_{\mathrm{eff}}\{Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }},\widehat{{\varvec{\gamma }}}({\varvec{\delta }})\}=\mathbf{0}\) to solve for \(\widehat{{\varvec{\delta }}}\) as the estimator, where \(\mathbf{a}(X,\mathbf{Z}), \mathbf{b}(X)\) are the solutions to the integral equations in (8).

2.3 Estimation under working model

The above derivations are based on the efficient score calculation and hence will yield the efficient estimator. However, a close look reveals that the procedure is not practical, because its implementation relies on the unknown function \(f_{X\mid \mathbf{Z}}(x,\mathbf{z})\). Thus, our estimator needs to be calculated under a posited working model \(f_{X\mid \mathbf{Z}}^*(x,\mathbf{z})\). The procedure is described below, where we use \(^*\) to denote a quantity whose calculation is carried out using \(f_{X\mid \mathbf{Z}}^*(x,\mathbf{z})\) instead of \(f_{X\mid \mathbf{Z}}(x,\mathbf{z})\).

  1. Posit a working model \(f_{X\mid \mathbf{Z}}^*(x,\mathbf{z})\).

  2. Solve for \({\varvec{\gamma }}\) from \(\sum _{i=1}^n{\mathbf{S}_{\mathrm{res}}}_2^*(Y_i,W_i,\mathbf{Z}_i,\varvec{\theta })=\mathbf{0}\) to obtain \(\widehat{{\varvec{\gamma }}}({\varvec{\delta }})\).

  3. Calculate the score function \(\mathbf{S}_{{\varvec{\delta }}}^*(Y, W, \mathbf{Z},{\varvec{\delta }},g)\) under the working model \(f_{X\mid \mathbf{Z}}^*(x,\mathbf{z})\).

  4. Solve the integral equations (8) to get \(\mathbf{a}(X, \mathbf{Z})\) and \(\mathbf{b}(X)\).

  5. Calculate the approximate efficient score function \(\mathbf{S}_{\mathrm{eff}}^*(Y, W, \mathbf{Z}, {\varvec{\delta }},\widehat{g})\) following (7), where \(\widehat{g}(\cdot )=\mathbf{B}(\cdot )^{\mathrm{T}}\widehat{{\varvec{\gamma }}}({\varvec{\delta }})\).

  6. Solve the estimating equation \(\sum _{i=1}^n\mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }},\widehat{g})=\mathbf{0}\) to obtain \(\widehat{{\varvec{\delta }}}\).
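Operationally, these steps form a nested root-finding problem: an inner solve for \({\varvec{\gamma }}\) at each candidate \({\varvec{\delta }}\), and an outer solve of the profiled estimating equation. The sketch below shows this structure only; the two score functions are deliberately simple toy stand-ins for \(\sum _i{\mathbf{S}_{\mathrm{res}}}_2^*\) and \(\sum _i\mathbf{S}_{\mathrm{eff}}^*\), whose actual evaluation requires the working-model computations described next.

```python
import numpy as np
from scipy.optimize import root

# Toy stand-ins: the real scores are sums over observations of S*_res,2
# and S*_eff evaluated under the working model f*_{X|Z}.
def score_gamma(gamma, delta):      # plays the role of sum_i S*_res,2
    return gamma - 0.5 * delta.sum() - 1.0

def score_delta(delta, gamma_hat):  # plays the role of sum_i S*_eff
    return delta + 0.2 * gamma_hat.mean() - 2.0

def profile_gamma(delta, d_gamma=10):
    """Step 2: solve the inner score equation in gamma for fixed delta."""
    return root(lambda g: score_gamma(g, delta), x0=np.zeros(d_gamma)).x

def estimate(p_delta=3):
    """Steps 5-6: solve the profiled estimating equation for delta."""
    def profiled_score(delta):
        return score_delta(delta, profile_gamma(delta))  # gamma-hat(delta)
    delta_hat = root(profiled_score, x0=np.zeros(p_delta)).x
    return delta_hat, profile_gamma(delta_hat)

delta_hat, gamma_hat = estimate()
```

The same nesting applies with the real scores; only the two score functions change.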

When we calculate \(\mathbf{a}(X,\mathbf{Z})\) at each observed \(\mathbf{z}\) value and calculate \(\mathbf{b}(X)\), we discretize the distribution of X on m equally spaced points on the support of \(f_{X\mid \mathbf{Z}}(x,\mathbf{z})\) and calculate the probability mass function \(\pi _j(\mathbf{Z})\) at each of the m points. We of course normalize the \(\pi _j(\mathbf{Z})\) to ensure \(\sum _{j=1}^m\pi _j(\mathbf{Z})=1\). Note that, using this discretization,

$$\begin{aligned} f_{X,Y,W\mid \mathbf{Z}}^*(x_j,y, w, \mathbf{z})\approx f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+g(x_j),{\varvec{\alpha }}\}f_{W\mid X=x_j}(w,x_j)\pi _j(\mathbf{Z}). \end{aligned}$$

Further, \(\mathbf{S}_{{\varvec{\delta }}}^*(Y, W, \mathbf{Z},{\varvec{\delta }},g)\), \(E^*\{\mathbf{a}(X,\mathbf{Z}) | Y, W, \mathbf{Z}\} \) and \(E^*[s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\}\mathbf{b}(X)| Y, W, \mathbf{Z}]\) can be approximated by

$$\begin{aligned}&\mathbf{S}_{{\varvec{\delta }}}^*(Y, W, \mathbf{Z},{\varvec{\delta }},g)\\&\quad \approx \frac{\partial \log [\sum _{i=1}^m f\{y,\mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+g(x_i),{\varvec{\alpha }}\}f_{W\mid X}(w,x_i)\pi _i(\mathbf{Z})]}{\partial {\varvec{\delta }}},\\&E^*\{\mathbf{a}(X,\mathbf{Z}) | Y, W, \mathbf{Z}\}\\&\quad \approx \frac{\sum _{i=1}^m\mathbf{a}(x_i,\mathbf{Z}) f_{X,Y,W\mid \mathbf{Z}}^*(x_i,Y, W,\mathbf{Z})}{\sum _{i=1}^m f_{X,Y,W\mid \mathbf{Z}}^*(x_i,Y,W,\mathbf{Z})},\\&E^*[s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(X),{\varvec{\alpha }}\}\mathbf{b}(X)| Y, W, \mathbf{Z}]\\&\quad \approx \frac{\sum _{i=1}^m s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_i),{\varvec{\alpha }}\}\mathbf{b}(x_i) f_{X,Y,W\mid \mathbf{Z}}^*(x_i,Y, W,\mathbf{Z})}{\sum _{i=1}^m f_{X,Y,W\mid \mathbf{Z}}^*(x_i,Y,W,\mathbf{Z})}. \end{aligned}$$
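As an illustration of these discretized conditional expectations, the following sketch computes \(E^*\{\mathbf{a}(X,\mathbf{Z}) \mid Y, W, \mathbf{Z}\}\) for a scalar a at a single observation, taking the inverse logit link of Sect. 4 as the model f and a normal \(f_{W\mid X}\). All numerical values are illustrative, and the normalizing constant of \(f_{W\mid X}\) cancels in the ratio.

```python
import numpy as np

def discretized_expectation(a_vals, y, w, zbeta, g_vals, pi, sigma_u):
    """E*{a(X,Z) | Y, W, Z} via the m-point discretization of X on [0, 1].

    a_vals, g_vals, pi: values of a(x_j, z), g(x_j), pi_j(z) on the grid."""
    x = np.linspace(0.0, 1.0, len(pi))
    p = 1.0 / (1.0 + np.exp(-(zbeta + g_vals)))     # inverse logit link
    f_y = p if y == 1 else 1.0 - p                  # f{y, z'beta + g(x_j), alpha}
    f_w = np.exp(-0.5 * ((w - x) / sigma_u) ** 2)   # f_{W|X}(w, x_j), up to a constant
    joint = f_y * f_w * pi                          # ~ f*_{X,Y,W|Z}(x_j, y, w, z)
    return np.sum(a_vals * joint) / np.sum(joint)   # weighted average over the grid

m = 15
pi = np.full(m, 1.0 / m)                            # illustrative working masses
# With a(x, z) = x, this returns the working posterior mean of X given (Y, W, Z).
post_mean_x = discretized_expectation(a_vals=np.linspace(0, 1, m), y=1, w=0.4,
                                      zbeta=0.2, g_vals=np.zeros(m),
                                      pi=pi, sigma_u=0.2)
```

The expectation of the \(s\{\cdot \}\mathbf{b}(X)\) term is computed in exactly the same way, with \(s\{y, \mathbf{z}^{\mathrm{T}}{\varvec{\beta }}+g(x_j),{\varvec{\alpha }}\}\mathbf{b}(x_j)\) in place of \(a(x_j,\mathbf{z})\).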

Let \(\mathbf{A}(X,\mathbf{Z})\equiv \{\mathbf{a}(x_1, \mathbf{Z}),\dots , \mathbf{a}(x_m,\mathbf{Z})\}^{\mathrm{T}}\) and \(\mathbf{B}(X)\equiv \{\mathbf{b}(x_1),\dots , \mathbf{b}(x_m)\}^{\mathrm{T}}\). Let \(\mathbf{M}_1(X,\mathbf{Z})\equiv \{\mathbf{m}_1(x_1, \mathbf{Z}),\dots ,\mathbf{m}_1(x_m,\mathbf{Z})\}^{\mathrm{T}}\) be an \(m\times p_{{\varvec{\delta }}}\) matrix, where \(p_{{\varvec{\delta }}}\) is the length of \({\varvec{\delta }}\) and \(\mathbf{m}_1(x_i,\mathbf{Z})\equiv E\{\mathbf{S}_{{\varvec{\delta }}}^*(Y, W, \mathbf{Z},{\varvec{\delta }},g)\mid x_i, \mathbf{Z}\}\). Further, let \(\mathbf{M}_2(X,\mathbf{Z})\equiv \{\mathbf{m}_2(x_1, \mathbf{Z}),\dots ,\mathbf{m}_2(x_m,\mathbf{Z})\}^{\mathrm{T}}\) be an \(m\times p_{{\varvec{\delta }}}\) matrix, where we define \(\mathbf{m}_2(x_i,\mathbf{Z})\equiv E\left[ \mathbf{S}_{{\varvec{\delta }}}^*(Y, W, \mathbf{Z},{\varvec{\delta }},g)s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_i),{\varvec{\alpha }}\}\mid x_i, \mathbf{Z}\right] \). Finally, let \(\mathbf{C}(X,\mathbf{Z})\) be a \(m \times m\) matrix with the (ij) block equal to

$$\begin{aligned} E\left\{ \frac{ f_{X, Y, W\mid \mathbf{Z}}^*(x_j,Y, W, \mathbf{Z})}{\sum _{i=1}^m f_{X,Y, W\mid \mathbf{Z}}^*(x_i,Y,W, \mathbf{Z})}\mid x_i, \mathbf{Z}\right\} , \end{aligned}$$

let \(\mathbf{D}(X,\mathbf{Z})\) be an \(m \times m\) matrix with the (ij) block equal to

$$\begin{aligned} E\left[ \frac{s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_j),{\varvec{\alpha }}\} f_{X, Y, W\mid \mathbf{Z}}^*(x_j,Y, W, \mathbf{Z})}{\sum _{i=1}^m f_{X,Y, W\mid \mathbf{Z}}^*(x_i,Y,W, \mathbf{Z})}\mid x_i, \mathbf{Z}\right] , \end{aligned}$$

let \(\mathbf{F}(X,\mathbf{Z})\) be an \(m \times m\) matrix with the (ij) block equal to

$$\begin{aligned} E\left[ \frac{ f_{X, Y, W\mid \mathbf{Z}}^*(x_j,Y, W, \mathbf{Z})s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_i),{\varvec{\alpha }}\}}{\sum _{i=1}^m f_{X,Y, W\mid \mathbf{Z}}^*(x_i,Y,W, \mathbf{Z})}\mid x_i, \mathbf{Z}\right] , \end{aligned}$$

and let \(\mathbf{G}(X,\mathbf{Z})\) be an \(m \times m\) matrix with the (ij) block

$$\begin{aligned} E\left[ \frac{s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_j),{\varvec{\alpha }}\} f_{X, Y, W\mid \mathbf{Z}}^*(x_j,Y, W, \mathbf{Z})s\{Y,\mathbf{Z}^{\mathrm{T}}{\varvec{\beta }}+g(x_i)\}}{\sum _{i=1}^m f_{X,Y, W\mid \mathbf{Z}}^*(x_i,Y,W, \mathbf{Z})}\mid x_i, \mathbf{Z}\right] . \end{aligned}$$

We can then get \(\mathbf{a}(x_i,\mathbf{Z})\) and \(\mathbf{b}(x_i)\) by solving

$$\begin{aligned} \left[ \begin{array}{cc} \mathbf{C}(X,\mathbf{Z}) &{} \mathbf{D}(X,\mathbf{Z})\\ \mathbf{F}(X,\mathbf{Z}) &{} \mathbf{G}(X,\mathbf{Z}) \end{array} \right] \left[ \begin{array}{c} \mathbf{A}(X,\mathbf{Z}) \\ \mathbf{B}(X) \end{array} \right] = \left[ \begin{array}{c} \mathbf{M}_1(X,\mathbf{Z})\\ \mathbf{M}_2(X,\mathbf{Z}) \end{array} \right] . \end{aligned}$$
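Numerically, obtaining \(\mathbf{A}(X,\mathbf{Z})\) and \(\mathbf{B}(X)\) is a single linear solve. Below is a sketch with a randomly generated, well-conditioned toy instance of the six matrices; in the actual procedure, \(\mathbf{C}, \mathbf{D}, \mathbf{F}, \mathbf{G}, \mathbf{M}_1, \mathbf{M}_2\) come from the discretized conditional expectations above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, p_delta = 15, 5

# Toy stand-ins for the m x m blocks C, D, F, G and the m x p_delta
# right-hand sides M1, M2; made diagonally dominant so the system is solvable.
C = np.eye(m) + 0.05 * rng.standard_normal((m, m))
D = 0.05 * rng.standard_normal((m, m))
F = 0.05 * rng.standard_normal((m, m))
G = np.eye(m) + 0.05 * rng.standard_normal((m, m))
M1 = rng.standard_normal((m, p_delta))
M2 = rng.standard_normal((m, p_delta))

lhs = np.block([[C, D], [F, G]])        # (2m) x (2m) coefficient matrix
rhs = np.vstack([M1, M2])               # (2m) x p_delta right-hand side
sol = np.linalg.solve(lhs, rhs)
A, B = sol[:m], sol[m:]                 # rows are a(x_j, z)' and b(x_j)'
```

One such solve is needed per observed \(\mathbf{z}\) value, since the matrices depend on \(\mathbf{Z}\).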

3 Asymptotic properties

Let \({\mathbf{S}_{\mathrm{res}}}_2(Y_i, W_i, \mathbf{Z}_i, {\varvec{\alpha }},{\varvec{\beta }}, g)\) denote \({\mathbf{S}_{\mathrm{res}}}_2(Y_i, W_i, \mathbf{Z}_i,{\varvec{\alpha }},{\varvec{\beta }}, {\varvec{\gamma }})\) with every appearance of \(\mathbf{B}(X)^{\mathrm{T}}{\varvec{\gamma }}\) replaced by g(X). We first list the set of regularity conditions required for establishing the large sample properties of our estimator.

  1. (C1)

    The true density \(f_X(x)\) is bounded with compact support. Without loss of generality, we assume the support of \(f_X(x)\) is [0, 1].

  2. (C2)

    The function \(g(x) \in C^q([0, 1])\), \(q>1\), is bounded.

  3. (C3)

    The spline order \(r \ge q\).

  4. (C4)

Define the knots \(t_{-r+1} = \dots = t_0 = 0< t_1< \dots< t_N < 1 = t_{N+1} = \dots = t_{N+r}\), where N is the number of interior knots, and N satisfies \(N \rightarrow \infty \), \(N^{-1}n(\log {n})^{-1} \rightarrow \infty \) and \(Nn^{-1/(2q)} \rightarrow \infty \) as \(n \rightarrow \infty \). Denote by \(d_{{\varvec{\gamma }}}\) the number of spline basis functions, i.e., \(d_{{\varvec{\gamma }}}=N+r\).

  5. (C5)

    Let \(h_j\) be the distance between the jth and \((j-1)\)th interior knots. Let \(h_b = \max _{1\le j \le N} h_j\) and \(h_s = \min _{1\le j \le N} h_j\). There exists a constant \(c_h \in (0, \infty )\) such that \(h_b/h_s < c_h\). Hence, \(h_b = O_p(N^{-1})\) and \(h_s = O_p(N^{-1})\).

  6. (C6)

    \({\varvec{\gamma }}_0\) is a \(d_{{\varvec{\gamma }}}\)-dimensional spline coefficient vector such that \(\sup _{x \in [0, 1]}|\mathbf{B}(x)^{\mathrm{T}}{\varvec{\gamma }}_0 - g(x)| = O_p(h_b^q)\).

  7. (C7)

    The equation set

    $$\begin{aligned} E\{\mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }},{\varvec{\gamma }})\}= & {} \mathbf{0}, \\ E\{{\mathbf{S}_{\mathrm{res}}}_2^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}, {\varvec{\gamma }})\}= & {} \mathbf{0} \end{aligned}$$

has a unique root for \(\varvec{\theta }\) in a neighborhood of \(\varvec{\theta }_0\). Recall that \(\varvec{\theta }=({\varvec{\alpha }}^{\mathrm{T}}, {\varvec{\beta }}^{\mathrm{T}}, {\varvec{\gamma }}^{\mathrm{T}})^{\mathrm{T}}\) and \({\varvec{\delta }}=({\varvec{\alpha }}^{\mathrm{T}},{\varvec{\beta }}^{\mathrm{T}})^{\mathrm{T}}\). The derivatives with respect to \(\varvec{\theta }\) of the left-hand side are smooth functions of \(\varvec{\theta }\), with singular values bounded and bounded away from zero. Let the unique root be \(\varvec{\theta }^*\). Note that \(\varvec{\theta }_0\) and \(\varvec{\theta }^*\) are functions of N; that is, for any sufficiently large N, there is a unique root \(\varvec{\theta }^*\) in the neighborhood of \(\varvec{\theta }_0\).

  8. (C8)

    The maximum absolute row sum of the matrix \(\partial \mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0, {\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}_0^{\mathrm{T}}\), i.e., \(\Vert \partial \mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0, {\varvec{\gamma }}_0)/\partial {\varvec{\gamma }}_0^{\mathrm{T}}\Vert _{\infty }\), is integrable.

The conditions listed above are standard boundedness and smoothness conditions on the functions, together with classical conditions on the spline order and the number of knots. They are commonly used in the spline approximation and semiparametric regression literature. We now establish the consistency of \(\widehat{{\varvec{\delta }}}_n\) and \(\widehat{{\varvec{\gamma }}}_n\), as well as the asymptotic distribution of \(\widehat{{\varvec{\delta }}}_n\).

Theorem 1

Assume that Conditions \((\mathrm {C}1){-}(\mathrm {C}7)\) hold. Let \(\widehat{\varvec{\theta }}_n \) satisfy

$$\begin{aligned} \frac{1}{n}\sum _{i=1}^{n}\mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, \widehat{{\varvec{\delta }}}_n, \widehat{{\varvec{\gamma }}}_n)= & {} \mathbf{0}\\ \frac{1}{n}\sum _{i=1}^{n}{\mathbf{S}_{\mathrm{res}}}_2^*(Y_i, W_i, \mathbf{Z}_i, \widehat{{\varvec{\delta }}}_n, \widehat{{\varvec{\gamma }}}_n)= & {} \mathbf{0}. \end{aligned}$$

Then, \(\widehat{\varvec{\theta }}_n - \varvec{\theta }_0=o_p(1)\) element-wise.

The result in Theorem 1 is used to further establish the asymptotic properties of the estimator of the parameters of interest \(\widehat{{\varvec{\delta }}}_n\) and the estimator of the function of interest \(\mathbf{B}(\cdot )^{\mathrm{T}}\widehat{{\varvec{\gamma }}}_n\).

Theorem 2

Assume that Conditions \((\mathrm {C}1)-(\mathrm {C}8)\) hold and let

$$\begin{aligned} \mathbf{Q}\equiv E\left\{ \frac{\partial \mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0,{\varvec{\gamma }})}{\partial {\varvec{\delta }}_0^{\mathrm{T}}}\bigg \arrowvert _{\mathbf{B}(\cdot )^{\mathrm{T}}{\varvec{\gamma }}= g(\cdot )} \right\} . \end{aligned}$$

Then,

$$\begin{aligned} \sqrt{n}(\widehat{{\varvec{\delta }}}_n - {\varvec{\delta }}_0)= -\mathbf{Q}^{-1}\frac{1}{\sqrt{n}}\sum _{i=1}^{n}\mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0, g)+o_p(1). \end{aligned}$$

Consequently, \(\sqrt{n}(\widehat{{\varvec{\delta }}}_n-{\varvec{\delta }}_0)\rightarrow N(\mathbf{0}, \mathbf{V})\) in distribution when \(n\rightarrow \infty \), where

$$\begin{aligned} \mathbf{V}=\mathbf{Q}^{-1}\hbox {var}\{\mathbf{S}_{\mathrm{eff}}^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0, g)\}(\mathbf{Q}^{-1})^{\mathrm{T}}. \end{aligned}$$

Theorem 2 indicates that \({\varvec{\delta }}\) is estimated at the root-n rate. The proofs of Theorems 1 and 2 are given in the Appendix. Because the B-spline estimator of \(g(\cdot )\) converges at a rate slower than root-n while \(\widehat{{\varvec{\delta }}}_n\) is root-n consistent, the estimation of \({\varvec{\delta }}\) does not have any first-order impact on the asymptotic properties of \(\widehat{g}\). Thus, for the analysis of the asymptotic properties of \(\widehat{g}\), we can treat \({\varvec{\delta }}\) as known. Then, the proof of Theorem 2 in Jiang and Ma (2018) can be used directly. We skip the details of the proof and provide the specific convergence properties of the estimator of g in Theorem 3.
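In practice, the asymptotic variance \(\mathbf{V}\) of Theorem 2 is estimated by a plug-in sandwich formula: \(\mathbf{Q}\) is replaced by the (possibly numerically differentiated) average score derivative at \(\widehat{{\varvec{\delta }}}_n\), and \(\hbox {var}\{\mathbf{S}_{\mathrm{eff}}^*\}\) by the sample second moment of the scores. A sketch, with randomly generated scores standing in for the real \(\mathbf{S}_{\mathrm{eff}}^*\) evaluations:

```python
import numpy as np

def sandwich_variance(scores, Q_hat):
    """V-hat = Q^{-1} var{S_eff} Q^{-T}; scores is the n x p matrix of
    S*_eff(Y_i, W_i, Z_i, delta-hat, g-hat), which has (near) zero mean."""
    n = scores.shape[0]
    meat = scores.T @ scores / n          # sample second moment of the scores
    Q_inv = np.linalg.inv(Q_hat)
    return Q_inv @ meat @ Q_inv.T

rng = np.random.default_rng(2)
n, p = 1000, 3
scores = rng.standard_normal((n, p))      # stand-in efficient scores
Q_hat = -np.eye(p)                        # stand-in derivative matrix
V_hat = sandwich_variance(scores, Q_hat)
std_err = np.sqrt(np.diag(V_hat) / n)     # standard errors for delta-hat
```

Since \(\sqrt{n}(\widehat{{\varvec{\delta }}}_n-{\varvec{\delta }}_0)\rightarrow N(\mathbf{0},\mathbf{V})\), the standard errors of \(\widehat{{\varvec{\delta }}}_n\) are the square roots of the diagonal of \(\widehat{\mathbf{V}}/n\).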

Theorem 3

Assume that Conditions \((\mathrm{C}1)-(\mathrm{C}8)\) hold and let

$$\begin{aligned} \mathbf{P}\equiv E\left\{ \frac{\partial {\mathbf{S}_{\mathrm{res}}}_2^*(Y_i, W_i, \mathbf{Z}_i,{\varvec{\delta }}_0,{\varvec{\gamma }})}{\partial {\varvec{\gamma }}^{\mathrm{T}}}\bigg \arrowvert _{\mathbf{B}(\cdot )^{\mathrm{T}}{\varvec{\gamma }}=g(\cdot )}\right\} . \end{aligned}$$

Then, \(\Vert \widehat{{\varvec{\gamma }}}_n - {\varvec{\gamma }}_0\Vert _2 = O_p\{(nh_b)^{-1/2}\}\). Further,

$$\begin{aligned} \widehat{{\varvec{\gamma }}}_n - {\varvec{\gamma }}_0= -\mathbf{P}^{-1}n^{-1}\sum _{i=1}^n{\mathbf{S}_{\mathrm{res}}}_2^*(Y_i, W_i,\mathbf{Z}_i, {\varvec{\delta }}_0,{\varvec{\gamma }})\{1 + o_p(1)\}. \end{aligned}$$

It follows that \(\widehat{g}(x)=\mathbf{B}(x)^{\mathrm{T}}\widehat{{\varvec{\gamma }}}_n\) satisfies \(\sup _{x \in [0, 1]} |\widehat{g}(x) - g(x)|= O_p\{(nh_b)^{-1/2}\}\). Specifically, \(\text{ bias }\{\widehat{g}(x)\} = E\{\widehat{g}(x) - g(x)\} = O(h_b^{q-1/2})\) and

$$\begin{aligned}&\sqrt{n h_b}[\widehat{g}(x) - g(x) - \text{ bias }\{\widehat{g}(x)\}]\nonumber \\&\quad = \sqrt{n h_b} \mathbf{B}(x)^{\mathrm{T}}\left\{ - \mathbf{P}^{-1} n^{-1}\sum _{i=1}^n{\mathbf{S}_{\mathrm{res}}}_2^*(Y_i, W_i, \mathbf{Z}_i, {\varvec{\delta }}_0, g)\right\} + o_p(1). \end{aligned}$$

4 Numerical study

In our first simulation, we generated the observations \((W_i, \mathbf{Z}_i, Y_i)\) from the model

$$\begin{aligned} \hbox {pr}(Y_i=1|X_i=x_i,\mathbf{Z}_i=\mathbf{z}_i) =H\{g( x_{i})+\beta _1 z_{1i}+\beta _2 z_{2i}+\beta _3 z_{3i}+\beta _4 z_{4i}\}, \end{aligned}$$
(9)

where \(W=X+U\) and \(U\sim \mathrm{Normal}(0,0.03)\). The true function is \(g(x)=-5\exp \{-0.8(x-2.5)^2\}\), and H(t) is the inverse logit link function. We set \(\beta _1=1\), \(\beta _2=0.5\), \(\beta _3=1\) and \(\beta _4=-0.3\). The sample size is 1000, and we ran 1000 simulations. \(X_i\) is generated from a truncated normal distribution on [0, 1] with mean 0.5 and variance 1/36, independently of \(\mathbf{Z}_i\). We implemented our method using a normal working model, corresponding to a correctly specified working model. To investigate the performance of our method under a misspecified working model, we also performed another study, in which \(X_i\) is generated from a truncated Student's t distribution with 5 degrees of freedom. The covariates \(Z_{1i}\), \(Z_{2i}\) and \(Z_{4i}\) are generated from the standard normal distribution, and \(Z_{3i}\) is generated from a uniform distribution on \([-1,1]\). In both studies, we estimated the parameters \(\beta _1\), \(\beta _2\), \(\beta _3\), \(\beta _4\) as well as the function g(x).
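One replicate of this design can be generated as follows. This is a sketch of our reading of the stated setup; the seed and the use of scipy's truncated normal sampler are implementation choices, not part of the design.

```python
import numpy as np
from scipy.stats import truncnorm

def generate_sim1(n=1000, sigma2_u=0.03, seed=0):
    rng = np.random.default_rng(seed)
    mu, sd = 0.5, 1.0 / 6.0                              # mean 0.5, variance 1/36
    a, b = (0.0 - mu) / sd, (1.0 - mu) / sd              # truncate X to [0, 1]
    x = truncnorm.rvs(a, b, loc=mu, scale=sd, size=n, random_state=rng)
    w = x + rng.normal(0.0, np.sqrt(sigma2_u), size=n)   # W = X + U
    z = np.column_stack([rng.standard_normal(n),         # Z1 ~ N(0, 1)
                         rng.standard_normal(n),         # Z2 ~ N(0, 1)
                         rng.uniform(-1.0, 1.0, n),      # Z3 ~ Unif[-1, 1]
                         rng.standard_normal(n)])        # Z4 ~ N(0, 1)
    g = -5.0 * np.exp(-0.8 * (x - 2.5) ** 2)             # true g
    beta = np.array([1.0, 0.5, 1.0, -0.3])
    prob = 1.0 / (1.0 + np.exp(-(g + z @ beta)))         # H = inverse logit
    y = rng.binomial(1, prob)
    return y, w, x, z

y, w, x, z = generate_sim1()
```

The latent x is returned here only so that the estimated \(\widehat{g}\) can be compared against the truth; the estimation procedure itself sees only (y, w, z).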

In the second simulation, we set the true g function to be \(g(x)= -5\exp (-0.2x^2)+5\), while all other settings remain the same. Similar to the first simulation, we compared the performance of a correctly specified and a misspecified working model in terms of estimating \(\beta _1\), \(\beta _2\), \(\beta _3\), \(\beta _4\) and g(x).

In the third simulation, we increased the sample size to 2000 to assess the performance of our method when the g function exhibits stronger nonlinearity. Specifically, we set \(g(x)=\sin (2\pi x)\), while keeping all other settings unchanged.

In simulations 1, 2 and 3, we discretized the distribution of X on [0, 1] into \(m=15\) equal segments and used the truncated normal distribution discussed earlier as our working model. We used quadratic splines with seven equally spaced knots on [0, 1] to estimate g(x). The number of knots is chosen to be larger than \(n^{1/4}\) to reflect condition (C4); further increasing the number of knots does not change the results much. The simulation results are shown in Tables 1 and 2 and Figs. 1, 2 and 3.
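The quadratic B-spline basis \(\mathbf{B}(x)\) used to approximate g(x) can be evaluated directly from the Cox–de Boor recursion. The sketch below is our own illustrative code; placing interior knots at \(i/8\), \(i=1,\dots ,7\), is one way (an assumption on our part) to realize seven equally spaced knots on [0, 1].

```python
import numpy as np

def bspline_basis(x, interior_knots, degree=2):
    """Evaluate the B-spline basis at points x on [0, 1] (Cox-de Boor recursion)."""
    lo, hi = 0.0, 1.0
    # clamped knot sequence: boundary knots repeated degree + 1 times
    t = np.concatenate([np.full(degree + 1, lo),
                        np.asarray(interior_knots, dtype=float),
                        np.full(degree + 1, hi)])
    x = np.atleast_1d(np.asarray(x, dtype=float))
    # degree-0 basis: indicator of each knot interval [t_j, t_{j+1})
    B = np.zeros((x.size, len(t) - 1))
    for j in range(len(t) - 1):
        B[:, j] = (x >= t[j]) & (x < t[j + 1])
    B[x == hi, len(t) - degree - 2] = 1.0        # include the right endpoint
    # raise the degree one step at a time
    for d in range(1, degree + 1):
        Bnew = np.zeros((x.size, len(t) - 1 - d))
        for j in range(len(t) - 1 - d):
            dl = np.where(t[j + d] > t[j], t[j + d] - t[j], 1.0)
            dr = np.where(t[j + d + 1] > t[j + 1], t[j + d + 1] - t[j + 1], 1.0)
            left = np.where(t[j + d] > t[j], (x - t[j]) / dl, 0.0)
            right = np.where(t[j + d + 1] > t[j + 1], (t[j + d + 1] - x) / dr, 0.0)
            Bnew[:, j] = left * B[:, j] + right * B[:, j + 1]
        B = Bnew
    return B   # each column is one basis function; row i is B(x_i)^T

knots = np.linspace(0.0, 1.0, 9)[1:-1]           # seven equally spaced interior knots
Bx = bspline_basis(np.linspace(0.0, 1.0, 101), knots)
```

With quadratic splines and seven interior knots, the basis has ten functions, so \(\widehat{g}(x)=\mathbf{B}(x)^{\mathrm{T}}\widehat{{\varvec{\gamma }}}_n\) involves a ten-dimensional spline coefficient.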

Table 1 Simulation results under a correct working model
Table 2 Simulation results under a misspecified working model
Fig. 1

True function (black line), median estimation (green line), mean estimation (red line) and 90% confidence band (blue line) of g(x) in simulation 1. Correct working model on the left and misspecified working model on the right (color figure online)

Fig. 2

True function (black line), median estimation (green line), mean estimation (red line) and 90% confidence band (blue line) of g(x) in simulation 2. Correct working model on the left and misspecified working model on the right (color figure online)

Fig. 3

True function (black line), median estimation (green line), mean estimation (red line) and 90% confidence band (blue line) of g(x) in simulation 3. Correct working model on the left and misspecified working model on the right (color figure online)

The results in Tables 1 and 2 show little bias in the \({\varvec{\beta }}\) estimation, regardless of whether a correct or a misspecified working model is used. Figures 1, 2 and 3 show that the estimators of g(x) have somewhat large bias near the boundary under both working models, which is within our expectation given the boundary effect. The performance of the g(x) estimation is satisfactory in the interior of the function domain. The simulation results show little difference between the performance of the correct working model for \(f_X(x)\) and a misspecified one, confirming our theory on consistency in both cases.

5 Data analysis

The data set we analyzed is from an AIDS Clinical Trials Group (ACTG) study. The goal of this study was to compare four treatments, “ZDV,” “ZDV+ddI,” “ZDV+ddC” and “ddC,” on HIV-infected adults whose CD4 cell counts were from 200 to 500 per cubic millimeter. We labeled these as treatments 1, 2, 3 and 4, and used treatment 1 as the base treatment because it is a standard treatment. There were 1036 patients enrolled in the study, none of whom had received antiretroviral therapy at enrollment. The criterion we used to compare the four treatments is whether a patient's CD4 count drops below 50%, an important indicator that an HIV-infected patient will develop AIDS or die. We set \(Y=1\) if a patient's CD4 count drops below 50%, and \(Y=0\) otherwise.

Our model has the form:

$$\begin{aligned} \hbox {pr}(Y_i=1|X_i=x_i,Z_i=z_i) = H\{g( x_{i})+\beta _1 z_{1i}+\beta _2 z_{2i}+\beta _3 z_{3i}\}, \end{aligned}$$
(10)

where \(W=X+U\) and \(U\sim \mathrm{normal}(0,\sigma ^2_U)\). The covariates \(Z_1\), \(Z_2\) and \(Z_3\) are dichotomous variables. \(Z_{1i}=Z_{2i}=Z_{3i}=0\) indicates that the ith individual receives treatment 1, the base treatment; \(Z_{1i}=1\) and \(Z_{2i}=Z_{3i}=0\) indicates that the ith individual receives treatment 2; \(Z_{1i}=0\), \(Z_{2i}=1\) and \(Z_{3i}=0\) indicates that the ith individual receives treatment 3; \(Z_{1i}=Z_{2i}=0\) and \(Z_{3i}=1\) indicates that the ith individual receives treatment 4. The covariate X is the baseline log(CD4 count) prior to the start of treatment. Because the CD4 count cannot be measured precisely, X is treated as an unobservable covariate. We used the average of two available measurements of \(\log \)(CD4 count) as W.

Table 3 Real data analysis results
Fig. 4
figure 4

Estimated g(x) for real data (black line), median estimation (green line), mean estimation (red line) and 90% confidence band (blue line) of g(x) from 1000 bootstrapped samples (color figure online)

First, we estimated the variance of U using the two repeated measurements and obtained \(\widehat{\sigma }^2_U=0.3\). Then, we constructed our working model for the unobservable covariate X. We assumed that X follows a truncated normal distribution and estimated its variance by \(\widehat{\sigma }^2_X=\widehat{\sigma }^2_W-\widehat{\sigma }^2_U\).
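This plug-in step can be sketched as follows, assuming two replicate measurements `w1` and `w2` per subject and treating U as the error of the averaged surrogate W; the function name and this particular moment-based estimator are our own illustrative choices.

```python
import numpy as np

def plugin_variances(w1, w2):
    """Estimate var(U) from replicate differences and var(X) by subtraction."""
    w1 = np.asarray(w1, dtype=float)
    w2 = np.asarray(w2, dtype=float)
    # With W_ij = X_i + U_ij, the half-difference (W_i1 - W_i2)/2 has the same
    # variance as the error U of the averaged measurement (U_i1 + U_i2)/2,
    # so E{(W_i1 - W_i2)^2}/4 = var(U).
    sigma2_U = np.mean((w1 - w2) ** 2) / 4.0
    W = (w1 + w2) / 2.0                       # averaged surrogate used as W
    sigma2_W = np.var(W, ddof=1)
    sigma2_X = sigma2_W - sigma2_U            # plug-in subtraction from Sect. 5
    return sigma2_U, sigma2_X, W
```

The resulting \(\widehat{\sigma }^2_X\) then parameterizes the truncated normal working model for X.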

Table 3 shows that treatment 2, treatment 3 and treatment 4 are more effective than the baseline treatment, i.e., treatment 1, at the 90% confidence level according to the P-values of \(\beta _1\), \(\beta _2\) and \(\beta _3\). The estimated index function g(x) is shown in Fig. 4. We generated 1000 bootstrapped samples and calculated the bootstrapped mean, median and 90% confidence band for g(x). The figure shows that g(x) is a decreasing function, indicating that a larger baseline CD4 count leads to a smaller risk of developing AIDS or of the CD4 count dropping below 50%. Thus, our analysis indicates that, in general, the alternative treatments and a higher baseline CD4 count are beneficial to a patient.

6 Discussion

We devised a consistent and locally efficient procedure to estimate both the parameters and the function in a generalized partially linear model where the covariate inside the nonparametric function is subject to measurement error. The method makes no assumption on the distribution of the covariate measured with error other than that it has finite support, which is easily satisfied in practice. The method is efficient in terms of estimating the model parameters if a correct working model is used, and retains its consistency even if this working model is misspecified. The estimation procedure breaks free from the deconvolution approach, which is the standard practice in handling nonparametric problems with measurement errors. Instead, a novel use of the B-spline approach in combination with semiparametric methods is exploited to carry out the analysis.

Many possible extensions can be explored further. Possibilities include handling multivariate covariates measured with error via multivariate B-splines, or incorporating an index modeling approach or additive structures. Although our method is developed conceptually for generalized linear models, we did not really make use of the linear structure; hence any model of the form \(f(Y, g(X), \mathbf{Z}, {\varvec{\beta }})\) can be treated in a similar way. To this end, the continuous Y case typically involves normal error and has been widely studied, while the binary response case is studied in the main text of this work. When Y is count data, many computational issues arise, and these are worth further careful investigation.

We have assumed the measurement error U either to have a known distribution, or to have its model parameters estimable from multiple observations. Of course, any other available information that identifies the measurement error distributional model parameter also works, and the plug-in procedure is largely “blind” to how the parameter is estimated. Naturally, the estimated distributional model parameter will alter the estimation variability of \({\varvec{\delta }}\), which can be taken into account in a standard way (Yi et al. 2015).