1 Introduction

In many disciplines, some covariates may be endogenous in regression modeling. In this situation, estimators based on classical methods, such as ordinary least squares, are no longer consistent (see Newhouse and McClellan 1998; Greenland 2000; Hernan and Robins 2006). The instrumental variable method provides a way to correct for possible endogeneity between covariates and structural errors, and yields consistent parameter estimators. Recently, this method has been widely used in applied statistics, econometrics, and related disciplines. Because the linear instrumental variable model, which assumes that the coefficients of all covariates are constant, is sometimes too restrictive for real economic applications (see Schultz 1997; Card 2001), many papers have considered statistical inference for semiparametric models. For example, Yao (2012) considered efficient estimation for partially linear instrumental variable models and proposed a semiparametric instrumental variable estimation procedure. Zhao and Xue (2013) considered confidence region construction for the regression coefficients in partially linear instrumental variable models based on the empirical likelihood method. Zhao and Li (2013) considered variable selection for varying coefficient instrumental variable models using the smooth-threshold estimating equations method. The varying coefficient instrumental variable model allows the effect of endogenous covariates to vary with a covariate, and is commonly used for the analysis of data measured repeatedly over time, as in time series analysis, longitudinal data analysis and functional data analysis. In practice, however, only some of the coefficients vary with a given covariate; hence a useful extension of the varying coefficient instrumental variable model is the following partially varying coefficient model with endogenous variables:

$$\begin{aligned} \left\{ \begin{array}{lll} Y_{i}=X_{i}^{T}\theta (U_{i})+Z_{i}^{T}\beta +\varepsilon _{i}\\ Z_{i}=\varGamma \xi _{i}+e_{i},~~~i=1,\ldots ,n, \end{array}\right. \end{aligned}$$
(1)

where \(\theta (u)=(\theta _{1}(u),\ldots ,\theta _{p}(u))^{T}\) is a \(p\times 1\) vector of unknown functions, \(\beta =(\beta _{1},\ldots ,\beta _{q})^{T}\) is a \(q\times 1\) vector of unknown parameters, and \(\varGamma \) is a \(q\times k\) matrix of unknown parameters. \(Y_{i}\) is the response variable, and \(\varepsilon _{i}\) and \(e_{i}\) are zero-mean model errors. Furthermore, we assume that \(X_{i}\) and \(U_{i}\) are exogenous covariates, \(Z_{i}\) is a vector of endogenous covariates, and \(\xi _{i}\) is the corresponding vector of instrumental variables. This implies that the covariate \(Z_{i}\) is correlated with the model error \(\varepsilon _{i}\), but \(X_{i}\), \(U_{i}\) and \(\xi _{i}\) are uncorrelated with \(\varepsilon _{i}\). Then we have

$$\begin{aligned} E(\varepsilon _{i}|Z_{i})\ne 0,\quad \hbox {and}\quad E(\varepsilon _{i}|X_{i},U_{i},\xi _{i})=0. \end{aligned}$$

Model (1) is more flexible: the linear instrumental variable model, the partially linear instrumental variable model and the varying coefficient instrumental variable model are all special cases of model (1). For model (1), Cai and Xiong (2012) considered the efficient estimation problem and proposed a three-step procedure to estimate the parametric and nonparametric components. However, when the number of covariates in model (1) is large, an important problem is to select the important variables in such a model.

Variable selection is a very important topic in modern statistical inference. Recently, many variable selection procedures based on penalty methods have been proposed. For example, Frank and Friedman (1993) proposed a variable selection procedure based on the bridge regression technique. Tibshirani (1996) proposed a variable selection procedure based on the least absolute shrinkage and selection operator (LASSO). Fan and Li (2001) proposed a variable selection procedure based on the smoothly clipped absolute deviation (SCAD) penalty, within a unified penalized least squares framework that covers the bridge and LASSO penalties. Wang et al. (2008) extended the SCAD variable selection method to the varying coefficient model and proposed a group SCAD (gSCAD) variable selection procedure. Zhao and Xue (2009) proposed a partial gSCAD variable selection method for the varying coefficient partially linear model. Recently, many papers have considered variable selection for varying coefficient models with high-dimensional data. For example, Lin and Yuan (2012) considered variable selection for generalized varying coefficient partially linear models with a diverging number of parameters. Lian (2012) considered variable selection for high-dimensional generalized varying coefficient models. Wang et al. (2013) considered polynomial spline estimation for generalized varying coefficient partially linear models with a diverging number of components. However, when some covariates are endogenous, these variable selection methods are inconsistent and can no longer be applied directly.

To overcome this problem, in this paper we extend the partial gSCAD variable selection method of Zhao and Xue (2009) to the varying coefficient partially linear regression model with endogenous covariates. We propose an instrumental variable based partial gSCAD variable selection procedure that can select significant variables in the parametric and nonparametric components simultaneously. With a proper choice of regularization parameters, we show that the variable selection procedure is consistent and that the penalized estimators have the oracle property in the sense of Fan and Li (2001). In addition, it is noteworthy that the proposed method can attenuate the effect of the endogeneity of covariates, which is an improvement over the variable selection method used in Zhao and Xue (2009).

The rest of this paper is organized as follows. In Sect. 2, we propose the instrumental variable based partial gSCAD variable selection procedure and establish some asymptotic properties, including consistency and the oracle property. In Sect. 3, based on the local quadratic approximation technique, we propose an iterative algorithm for finding the penalized estimators. In Sect. 4, some simulations are carried out to assess the performance of the proposed methods. Finally, the technical proofs of all asymptotic results are provided in the “Appendix”.

2 Methodology and main results

We let \(B(u)=(B_{1}(u), \ldots , B_{L}(u))^{T}\) denote the B-spline basis functions of order M, where \(L=K+M+1\) and K is the number of interior knots. Then \(\theta _{k}(u)\) can be approximated by

$$\begin{aligned} \theta _{k}(u)\approx B(u)^{T}\gamma _{k},\quad k=1,\ldots ,p. \end{aligned}$$
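
For concreteness, the following minimal sketch builds such a basis with scipy (assuming scipy \(\ge \) 1.8 for `BSpline.design_matrix`, and reading “order M” as splines of degree M with K equally spaced interior knots on (0, 1), so that \(L=K+M+1\) basis functions result); the helper name `bspline_basis` is ours, not from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline

def bspline_basis(u, K=5, M=2):
    """Evaluate the L = K + M + 1 B-spline basis functions B_1,...,B_L at u."""
    interior = np.linspace(0.0, 1.0, K + 2)[1:-1]          # K interior knots
    knots = np.concatenate([np.zeros(M + 1), interior, np.ones(M + 1)])
    return BSpline.design_matrix(u, knots, M).toarray()    # shape (n, L)

u = np.sort(np.random.uniform(0.01, 0.99, size=200))
B = bspline_basis(u)                                       # here L = 8
theta = 2.0 - np.sin(np.pi * u)                            # a smooth coefficient function
gamma_k, *_ = np.linalg.lstsq(B, theta, rcond=None)        # theta_k(u) ~ B(u)^T gamma_k
```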

Substituting this approximation into model (1), we obtain

$$\begin{aligned} Y_{i}=W_{i}^{T}\gamma +Z_{i}^{T}\beta +\varepsilon _{i}, \end{aligned}$$
(2)

where \(W_{i}=I_{p}\otimes B(U_{i})\cdot X_{i} \) and \(\gamma =(\gamma _{1}^{T},\ldots ,\gamma _{p}^{T})^{T}\). Model (2) is a standard linear regression model. Note that each function \(\theta _{k}(u)\) in (1) is characterized by the vector \(\gamma _{k}\) in (2). Motivated by the idea of Zhao and Xue (2009), we propose the following partial gSCAD penalized objective function:

$$\begin{aligned} Q(\gamma ,\beta )=\displaystyle \sum _{i=1}^{n}\left\{ Y_{i}- W_{i}^{T}\gamma -Z_{i}^{T}\beta \right\} ^{2} +n\sum _{k=1}^{p}p_{\lambda }(\Vert \gamma _{k}\Vert _{H})+n\sum _{l=1}^{q} p_{\lambda }(|\beta _{l}|), \end{aligned}$$
(3)

where \(\Vert \gamma _{k}\Vert _{H}=(\gamma _{k}^{T}H\gamma _{k})^{1/2}\), \(H=(h_{ij})_{L\times L}\) is the matrix with \(h_{ij}=\int B_{i}(u)B_{j}(u)du\), and \(p_{\lambda }(\cdot )\) is the SCAD penalty function with tuning parameter \(\lambda \) (see Fan and Li 2001), defined through its derivative

$$\begin{aligned} p'_{\lambda }(w)=\lambda \{I(w\le \lambda )+\frac{(a\lambda -w)_{+}}{(a-1)\lambda }I(w>\lambda )\}, \end{aligned}$$

with \(a>2, w>0\) and \(p_{\lambda }(0)=0\).
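
For reference, a direct transcription of this penalty in Python is given below; `scad_penalty` is obtained by integrating \(p'_{\lambda }\) from zero (a standard closed form, stated here as a sketch), and the default \(a=3.7\) anticipates the suggestion of Fan and Li (2001) adopted in Sect. 3.

```python
import numpy as np

def scad_deriv(w, lam, a=3.7):
    """SCAD derivative p'_lambda(w) for w >= 0, as displayed above."""
    w = np.asarray(w, dtype=float)
    return lam * np.where(w <= lam, 1.0,
                          np.maximum(a * lam - w, 0.0) / ((a - 1.0) * lam))

def scad_penalty(w, lam, a=3.7):
    """SCAD penalty p_lambda(w) for w >= 0, with p_lambda(0) = 0."""
    w = np.asarray(w, dtype=float)
    mid = -(w ** 2 - 2.0 * a * lam * w + lam ** 2) / (2.0 * (a - 1.0))
    flat = (a + 1.0) * lam ** 2 / 2.0              # constant for w > a*lam
    return np.where(w <= lam, lam * w, np.where(w <= a * lam, mid, flat))
```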

If the \(Z_{i}, i=1,\ldots ,n\), in model (1) were exogenous as well, then, following Zhao and Xue (2009), a consistent sparse solution could be obtained by minimizing (3). However, the \(Z_{i}\) in model (1) are endogenous covariates, so \(E(\varepsilon _{i}|Z_{i})\ne 0\). In this case, one can show that the resulting estimator based on (3) is biased. Hence, (3) can no longer be used directly to select the important variables and estimate the regression coefficients.

Next, we propose an adjustment of (3) based on the instrumental variables \(\xi _{i}, i=1,\ldots ,n\). From model (1), we have \(E(Z\xi ^{T})=\varGamma E(\xi \xi ^{T})\). Hence, a moment estimator of \(\varGamma \) is given by

$$\begin{aligned} \hat{\varGamma }=\hat{\varGamma }_{1}\hat{\varGamma }_{2}^{-1} \end{aligned}$$

where

$$\begin{aligned} \hat{\varGamma }_{1}=\frac{1}{n}\sum _{i=1}^{n}Z_{i}\xi _{i}^{T},\quad \hbox {and}\quad \hat{\varGamma }_{2}=\frac{1}{n}\sum _{i=1}^{n}\xi _{i}\xi _{i}^{T}. \end{aligned}$$

As shown in the “Appendix”, \(\hat{\varGamma }=\varGamma +o_{p}(1)\). Noting that \(E(Z_{i}|\xi _{i})=\varGamma \xi _{i}\), an adjusted version of \(Z_{i}\) is given by \(\hat{Z}_{i}=\hat{\varGamma }\xi _{i}\). Hence, the instrumental variable based partial gSCAD penalized objective function is given by

$$\begin{aligned} \hat{Q}(\gamma ,\beta )=\displaystyle \sum _{i=1}^{n}\left\{ Y_{i}- W_{i}^{T}\gamma -\hat{Z}_{i}^{T}\beta \right\} ^{2} +n\sum _{k=1}^{p}p_{\lambda }(\Vert \gamma _{k}\Vert _{H})+n\sum _{l=1}^{q} p_{\lambda }(|\beta _{l}|). \end{aligned}$$
(4)
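
Computationally, the first-stage adjustment is just a multivariate least squares regression of \(Z\) on \(\xi \), since \(\hat{\varGamma }=\hat{\varGamma }_{1}\hat{\varGamma }_{2}^{-1}\) coincides with the least squares coefficient matrix. A minimal sketch (with rows of `Z` and `Xi` holding \(Z_{i}^{T}\) and \(\xi _{i}^{T}\)) is:

```python
import numpy as np

def first_stage(Z, Xi):
    """Return Gamma_hat = Gamma1_hat Gamma2_hat^{-1} and the fitted Z_hat."""
    n = len(Z)
    G1 = Z.T @ Xi / n                 # (1/n) sum_i Z_i xi_i^T
    G2 = Xi.T @ Xi / n                # (1/n) sum_i xi_i xi_i^T
    Gamma_hat = G1 @ np.linalg.inv(G2)
    Z_hat = Xi @ Gamma_hat.T          # rows are Z_hat_i^T = (Gamma_hat xi_i)^T
    return Gamma_hat, Z_hat
```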

Remark 1

Because the endogeneity of the covariate \(Z_{i}\) results in inconsistent estimation and variable selection, we replace \(Z_{i}\) in \(Q(\gamma ,\beta )\) by the adjustment \(\hat{Z}_{i}\). Since \(\hat{\varGamma }=\varGamma +o_{p}(1)\), we have \(\hat{Z}_{i}=\varGamma \xi _{i}+o_{p}(1)\). Hence, because the instrumental variable \(\xi _{i}\) is exogenous, the following asymptotic results show that this adjustment attenuates the effect of the endogenous covariates and yields a consistent regularized estimation procedure.

Let \(\hat{\beta }\) and \(\hat{\gamma }=(\hat{\gamma }_{1}^{T},\ldots ,\hat{\gamma }_{p}^{T})^{T}\) be the solution obtained by minimizing (4). Then \(\hat{\beta }\) is the penalized least squares estimator of \(\beta \), and the estimator of \(\theta _{k}(u)\) is given by \(\hat{\theta }_{k}(u)=B^{T}(u)\hat{\gamma }_{k}\).

Next, we study the asymptotic properties of the resulting penalized least squares estimators. Similar to Zhao and Xue (2009), we let \(\theta _{0}(\cdot )\) and \(\beta _{0}\) be the true values of \(\theta (\cdot )\) and \(\beta \), respectively. Without loss of generality, we assume that \(\beta _{l0}=0,~ l=s+1,\ldots ,q\), and that \(\beta _{l0},~ l=1,\ldots ,s\), are the nonzero components of \(\beta _{0}\). Similarly, we assume that \(\theta _{k0}(\cdot )=0,~ k=d+1,\ldots ,p\), and that \(\theta _{k0}(\cdot ),~ k=1,\ldots ,d\), are the nonzero components of \(\theta _{0}(\cdot )\). Let

$$\begin{aligned} a_{1n}=\max _{l}\left\{ |p_{\lambda }'(|\beta _{l0}|)|:\beta _{l0}\ne 0\right\} , ~~~a_{2n}=\max _{k}\left\{ |p_{\lambda }'(\Vert \gamma _{k0}\Vert _{H})|:\gamma _{k0}\ne 0\right\} , \end{aligned}$$

and

$$\begin{aligned} b_{1n}=\max _{l}\left\{ |p_{\lambda }''(|\beta _{l0}|)|:\beta _{l0}\ne 0\right\} ,\qquad b_{2n}=\max _{k}\left\{ |p_{\lambda }''(\Vert \gamma _{k0}\Vert _{H})|:\gamma _{k0}\ne 0\right\} . \end{aligned}$$

Furthermore, we let \(a_{n}=\max \{a_{1n},a_{2n}\}\) and \(b_{n}=\max \{b_{1n},b_{2n}\}\). Then, the following theorem gives the consistency of the penalized least squares estimators.

Theorem 1

Suppose that the regularity conditions C1–C5 in the “Appendix” hold and that the number of interior knots satisfies \(K=O_{p}(n^{1/(2r+1)})\), where r is defined in condition C1. If \(a_{n}\rightarrow 0\) and \(b_{n}\rightarrow 0\) as \(n\rightarrow \infty \), then

(i) \(\Vert \hat{\beta }-\beta _{0}\Vert =O_{p}(n^{\frac{-r}{2r+1}}+a_{n})\);

(ii) \(\Vert \hat{\theta }_{k}(u)-\theta _{k0}(u)\Vert = O_{p}(n^{\frac{-r}{2r+1}}+a_{n}),~~k=1,\ldots ,p\).

Remark 2

For the SCAD penalty function used in this paper, it is clear that \(a_{n}=0\) for n large enough if \(\lambda \rightarrow 0\). Hence, under the regularity conditions stated in the “Appendix”, a consistent penalized estimator exists with probability tending to one.

Furthermore, under some additional conditions, we show that such consistent estimators must possess the sparsity property, which is stated as follows.

Theorem 2

Suppose that the regularity conditions in Theorem 1 hold, and

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\liminf _{|\beta _{l}|\rightarrow 0}\lambda ^{-1}p'_{\lambda } (|\beta _{l}|)>0,\quad l=s+1,\ldots ,q,\\&\liminf _{n\rightarrow \infty }\liminf _{\Vert \gamma _{k}\Vert _{H}\rightarrow 0}\lambda ^{-1}p'_{\lambda } (\Vert \gamma _{k}\Vert _{H})>0,\quad k=d+1,\ldots ,p. \end{aligned}$$

If \(n^{r/(2r+1)}\lambda \rightarrow \infty \) and \(\lambda \rightarrow 0\) as \(n\rightarrow \infty \), then, with probability tending to one, \(\hat{\beta }\) and \(\hat{\theta }(u)\) satisfy

(i) \(\hat{\beta }_{l}=0,\quad l=s+1,\ldots ,q;\)

(ii) \(\hat{\theta }_{k}(u)=0,\quad k=d+1,\ldots ,p.\)

Remark 3

From Remark 1 in Fan and Li (2001), if \(\lambda \rightarrow 0\) as \(n\rightarrow \infty \), then \(a_{n}=0\). From Theorems 1 and 2 it is then clear that, by choosing a proper \(\lambda \), the proposed variable selection method is consistent and the estimators achieve the same convergence rate as if the subset of true zero coefficients were already known. This implies that the penalized estimators have the oracle property.

3 Algorithm

Note that the penalty function \(p_{\lambda }(\cdot )\) in \(\hat{Q}(\gamma ,\beta )\) is singular at the origin, so the classical gradient method cannot be used to minimize \(\hat{Q}(\gamma ,\beta )\). In this section, we give an iterative algorithm based on the local quadratic approximation technique used in Fan and Li (2001) and Zhao and Xue (2009). More specifically, for any given nonzero \(w_{0}\), linearizing \(p_{\lambda }\) as a function of \(w^{2}\) in a neighborhood of \(w_{0}\) yields the approximation

$$\begin{aligned} p_{\lambda }(|w|)\approx p_{\lambda }(|w_{0}|)+\frac{1}{2}\frac{p'_{\lambda }(|w_{0}|)}{|w_{0}|}(w^{2}-w_{0}^{2}). \end{aligned}$$

Hence, for given initial values \(\beta _{l}^\mathrm{ini}\) with \(|\beta _{l}^\mathrm{ini}|>0, l=1,\ldots ,q\), and \(\gamma _{k}^\mathrm{ini}\) with \(\Vert \gamma _{k}^\mathrm{ini}\Vert _{H}>0, k=1,\ldots ,p\), we obtain

$$\begin{aligned}&p_{\lambda }(|\beta _{l}|)\approx p_{\lambda }(|\beta _{l}^\mathrm{ini}|)+\frac{1}{2}\frac{p'_{\lambda }(|\beta _{l}^\mathrm{ini}|)}{|\beta _{l}^\mathrm{ini}|}\left( |\beta _{l}|^{2} -|\beta _{l}^\mathrm{ini}|^{2}\right) ,\\&p_{\lambda }(\Vert \gamma _{k}\Vert _{H})\approx p_{\lambda }(\Vert \gamma _{k}^\mathrm{ini}\Vert _{H})+\frac{1}{2}\frac{p'_{\lambda }(\Vert \gamma _{k}^\mathrm{ini}\Vert _{H})}{\Vert \gamma _{k}^\mathrm{ini}\Vert _{H}} \left( \Vert \gamma _{k}\Vert _{H}^{2}-\Vert \gamma _{k}^\mathrm{ini}\Vert _{H}^{2}\right) \!. \end{aligned}$$
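
As a quick sanity check, one can verify numerically that the quadratic surrogate matches \(p_{\lambda }\) closely near \(w_{0}\), reusing `scad_penalty` and `scad_deriv` from the sketch in Sect. 2 (the specific values of \(\lambda \) and \(w_{0}\) below are arbitrary):

```python
import numpy as np

lam, w0 = 0.5, 0.8
w = np.linspace(0.6, 1.0, 5)
surrogate = (scad_penalty(w0, lam)
             + 0.5 * scad_deriv(w0, lam) / w0 * (w ** 2 - w0 ** 2))
print(np.max(np.abs(surrogate - scad_penalty(w, lam))))  # ~0.02, smaller near w0
```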

Let \(\tilde{Z}_{i}=(\hat{Z}_{i}^{T},W_{i}^{T})^{T}\) and \(\alpha =(\beta ^{T},\gamma ^{T})^{T}\), both \((pL+q)\)-dimensional vectors. Furthermore, let

$$\begin{aligned} \Sigma (\alpha ^\mathrm{ini})=\hbox {diag}\left\{ \frac{p'_{\lambda }(|\beta _{1}^\mathrm{ini}|)}{|\beta _{1}^\mathrm{ini}|},\ldots , \frac{p'_{\lambda }(|\beta _{q}^\mathrm{ini}|)}{|\beta _{q}^\mathrm{ini}|},\frac{p'_{\lambda }(\Vert \gamma _{1}^\mathrm{ini}\Vert _{H})}{\Vert \gamma _{1}^\mathrm{ini}\Vert _{H}}H, \ldots ,\frac{p'_{\lambda }(\Vert \gamma _{p}^\mathrm{ini}\Vert _{H})}{\Vert \gamma _{p}^\mathrm{ini}\Vert _{H}}H\right\} \!, \end{aligned}$$

where \(\alpha ^\mathrm{ini}=(\beta ^{\mathrm{ini} T},\gamma ^{\mathrm{ini} T})^{T}\). Then, up to a constant term, \(\hat{Q}(\gamma ,\beta )\) defined in (4) can be written as

$$\begin{aligned} \hat{Q}(\alpha )=\sum _{i=1}^{n}\{Y_{i}-\tilde{Z}_{i}^{T}\alpha \}^{2} +\frac{n}{2}\alpha ^{T}\Sigma (\alpha ^\mathrm{ini})\alpha . \end{aligned}$$

It is clear that \(\hat{Q}(\alpha )\) is a quadratic function of \(\alpha \), and its minimizer solves

$$\begin{aligned} \left( \sum _{i=1}^{n}\tilde{Z}_{i}\tilde{Z}_{i}^{T}+\frac{n}{2}\Sigma (\alpha ^\mathrm{ini})\right) \alpha =\sum _{i=1}^{n}\tilde{Z}_{i}Y_{i}. \end{aligned}$$
(5)

Hence, we can give an iterative algorithm as follows.

S1. Initialize \(\alpha ^{(0)}=\alpha ^\mathrm{ini}\).

S2. Set \(\alpha ^\mathrm{ini}=\alpha ^{(k)}\) (i.e., update \(\Sigma (\alpha ^\mathrm{ini})\) with the current iterate), and solve Eq. (5) for \(\alpha ^{(k+1)}\).

S3. Iterate step S2 until convergence, and denote the final estimator of \(\alpha \) by \(\hat{\alpha }\).

Then \(\hat{\beta }= (I_{q\times q},0_{q\times pL})\hat{\alpha }\) and \(\hat{\gamma }= (0_{pL\times q},I_{pL\times pL})\hat{\alpha }\). In the initialization step, we obtain the initial estimator \(\alpha ^\mathrm{ini}=(\beta ^{\mathrm{ini} T},\gamma ^{\mathrm{ini} T})^{T}\) by the ordinary least squares method, i.e., by minimizing the following objective function:

$$\begin{aligned} \hat{Q}^{*}(\gamma ,\beta )=\displaystyle \sum _{i=1}^{n}\left\{ Y_{i}- W_{i}^{T}\gamma -\hat{Z}_{i}^{T}\beta \right\} ^{2}. \end{aligned}$$
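
The following sketch puts S1–S3 together, reusing `scad_deriv` from the sketch in Sect. 2. The matrix H should be precomputed from the B-spline basis (an identity placeholder is used below), and the final hard thresholding of tiny coefficients is our own heuristic, since the LQA iteration only drives estimates numerically close to zero rather than exactly to zero.

```python
import numpy as np
from scipy.linalg import block_diag

def fit_partial_gscad(Y, W, Z_hat, p, L, lam, a=3.7, H=None,
                      tol=1e-6, max_iter=100, eps=1e-8):
    """Minimize (4) by the LQA iteration S1-S3; W is n x pL, Z_hat is n x q."""
    n, q = len(Y), Z_hat.shape[1]
    D = np.hstack([Z_hat, W])                     # rows are tilde-Z_i^T
    DtD, DtY = D.T @ D, D.T @ Y
    if H is None:
        H = np.eye(L)                             # placeholder for int B_i B_j du
    alpha = np.linalg.lstsq(D, Y, rcond=None)[0]  # S1: OLS initial value
    for _ in range(max_iter):
        # Build Sigma(alpha_ini) with the current iterate playing alpha_ini.
        blocks = [float(scad_deriv(abs(b), lam, a)) / max(abs(b), eps)
                  for b in alpha[:q]]
        for k in range(p):
            g = alpha[q + k * L: q + (k + 1) * L]
            nH = max(float(np.sqrt(g @ H @ g)), eps)
            blocks.append(float(scad_deriv(nH, lam, a)) / nH * H)
        Sigma = block_diag(*blocks)
        alpha_new = np.linalg.solve(DtD + 0.5 * n * Sigma, DtY)  # S2: Eq. (5)
        if np.max(np.abs(alpha_new - alpha)) < tol:              # S3: stop
            alpha = alpha_new
            break
        alpha = alpha_new
    beta_hat = alpha[:q].copy()
    beta_hat[np.abs(beta_hat) < 1e-4] = 0.0       # heuristic hard threshold
    return beta_hat, alpha[q:].reshape(p, L)      # row k of gamma is gamma_k
```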

Furthermore, to implement this method, the number of interior knots K and the tuning parameters a and \(\lambda \) in the penalty function must be chosen. Fan and Li (2001) showed that the choice \(a=3.7\) performs well in a variety of situations, and we adopt this suggestion throughout this paper. In addition, we choose \(\lambda \) and K by minimizing the following cross-validation score

$$\begin{aligned} CV(K,\lambda )=\sum _{i=1}^{n}\left\{ Y_{i}-X_{i}^{T}\hat{\theta }_{[i]}(U_{i})-\hat{Z}_{i}^{T}\hat{\beta }_{[i]} \right\} ^{2}, \end{aligned}$$
(6)

where \(\hat{\theta }_{[i]}(\cdot )\) and \(\hat{\beta }_{[i]}\) are estimators of \(\theta (\cdot )\) and \(\beta \) respectively based on (4) after deleting the ith subject.
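
A sketch of this grid search is given below. The helper `loo_residual` is hypothetical: it stands for refitting (4) with the ith subject deleted at the given \((K,\lambda )\) and returning the corresponding prediction error in (6). Exact leave-one-out is expensive, so in practice one might replace it with k-fold cross-validation.

```python
import itertools
import numpy as np

def select_tuning(data, n, K_grid=(2, 3, 4, 5),
                  lam_grid=np.linspace(0.05, 1.0, 20)):
    """Choose (K, lambda) minimizing the CV score (6) over a grid."""
    best, best_score = None, np.inf
    for K, lam in itertools.product(K_grid, lam_grid):
        # loo_residual is a hypothetical helper: it refits (4) without
        # subject i and returns Y_i - X_i' theta_[i](U_i) - Z_hat_i' beta_[i].
        score = sum(loo_residual(data, i, K, lam) ** 2 for i in range(n))
        if score < best_score:
            best, best_score = (K, lam), score
    return best
```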

Although some nonzero parameters may be incorrectly set to zero by this algorithm, the following simulation studies show that the number of true nonzeros incorrectly set to zero is very small and decreases rapidly as the sample size n increases. This indicates that the proposed iterative algorithm works well in practice.

4 Simulation studies

In this section, we conduct some Monte Carlo simulations to evaluate the finite sample performance of the proposed variable selection method. As in Zhao and Xue (2009), the performance of the estimator \(\hat{\beta }\) is assessed by the generalized mean square error (GMSE), defined as

$$\begin{aligned} \hbox {GMSE}=(\hat{\beta }-\beta _{0})^{T}E(ZZ^{T})(\hat{\beta }-\beta _{0}). \end{aligned}$$

The performance of the estimator \(\hat{\theta }(\cdot )\) is assessed by the square root of the average squared errors (RASE):

$$\begin{aligned} \hbox {RASE}=\left\{ \frac{1}{M}\sum _{s=1}^{M}\sum _{k=1}^{p}\left[ \hat{\theta }_{k}(u_{s})-\theta _{k0}(u_{s})\right] ^{2}\right\} ^{1/2}, \end{aligned}$$

where \(u_{s}, s=1,\ldots ,M\), are grid points at which the functions \(\hat{\theta }_{k}(u)\) are evaluated. In our simulations, \(M=200\) is used.
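
Both criteria are straightforward to compute; in the sketch below \(E(ZZ^{T})\) is replaced by its sample analogue (an assumption; in simulations the known population moment could be used instead), and the coefficient functions are passed as callables returning p-vectors.

```python
import numpy as np

def gmse(beta_hat, beta0, Z):
    """(beta_hat - beta0)' E(ZZ') (beta_hat - beta0), with E(ZZ') estimated."""
    d = beta_hat - beta0
    return float(d @ (Z.T @ Z / len(Z)) @ d)

def rase(theta_hat, theta0, grid):
    """Square root of the average squared error over the M grid points."""
    errs = np.array([theta_hat(u) - theta0(u) for u in grid])  # shape (M, p)
    return float(np.sqrt(np.mean(np.sum(errs ** 2, axis=1))))
```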

Table 1 Variable selection results for parametric components based on different variable selection methods

We simulate data from model (1), where \(\beta =(\beta _{1},\ldots ,\beta _{10})^{T}\) with \(\beta _{1}=3\), \(\beta _{2}=2, \beta _{3}=1\) and \(\beta _{4}=0.5\), and \(\theta (u)=(\theta _{1}(u),\ldots ,\theta _{10}(u))^{T}\) with \(\theta _{1}(u)=2.5+0.5\exp (2u-1), \theta _{2}(u)=2-\sin (\pi u)\) and \(\theta _{3}(u)=0.5+0.8u(1-u)\). The remaining coefficients, corresponding to the irrelevant variables, are set to zero. We take the covariate \(U\sim U(0,1)\), \(X_{k}\sim N(1, 1.5)\), and the instrumental variables \(\xi _{k}\sim N(1, 1), k=1,\ldots ,10\). The covariates are generated as \(Z_{k}=\xi _{k}+\alpha \varepsilon \), where \(\varepsilon \sim N(0, 0.5)\) is the structural error and \(\alpha =0.2, 0.4\) and 0.6 represent different levels of endogeneity. This setup ensures \(E(Z_{k}\varepsilon )\ne 0\), so the covariates \(Z_{k}\) are endogenous. In the following simulations, we use quadratic B-splines with equidistant interior knots. Furthermore, the sample size is taken as \(n=100, 200\) and 300, respectively, and for each case we carry out 1000 simulation runs.
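
A sketch of this data-generating process is given below; the \(N(1,1.5)\) and \(N(0,0.5)\) specifications are read as (mean, variance) pairs, which is an assumption, and the shared structural error \(\varepsilon \) is what makes \(Z\) endogenous.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_data(n, alpha=0.4, p=10, q=10):
    U = rng.uniform(0.0, 1.0, n)
    X = rng.normal(1.0, np.sqrt(1.5), (n, p))     # exogenous covariates
    Xi = rng.normal(1.0, 1.0, (n, q))             # instrumental variables
    eps = rng.normal(0.0, np.sqrt(0.5), n)        # structural error
    Z = Xi + alpha * eps[:, None]                 # shared eps => endogenous Z
    beta0 = np.array([3.0, 2.0, 1.0, 0.5] + [0.0] * (q - 4))
    theta0 = np.zeros((n, p))                     # only 3 nonzero functions
    theta0[:, 0] = 2.5 + 0.5 * np.exp(2.0 * U - 1.0)
    theta0[:, 1] = 2.0 - np.sin(np.pi * U)
    theta0[:, 2] = 0.5 + 0.8 * U * (1.0 - U)
    Y = np.sum(X * theta0, axis=1) + Z @ beta0 + eps
    return Y, X, U, Z, Xi, beta0
```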

To evaluate the proposed variable selection method, two procedures are compared: the instrumental variable based partial gSCAD variable selection method (IV-gSCAD) based on (4), and the naive partial gSCAD variable selection method (Naive-gSCAD), which neglects the endogeneity of the covariates \(Z_{i}\) and applies the partial gSCAD penalty based on (3) directly. Based on the 1000 simulation runs, the average number of zero coefficients for the parametric components is reported in Table 1, and that for the nonparametric components is reported in Table 2. In Tables 1 and 2, the column labeled “C” presents the average number of true zero coefficients correctly set to zero, and the column labeled “I” presents the average number of true nonzeros incorrectly set to zero. Tables 1 and 2 also present the average false selection rate (FSR), defined as \(\hbox {FSR}=\hbox {IN/TN}\), where “IN” is the average number of true zeros incorrectly set to nonzero and “TN” is the average total number of coefficients set to nonzero. Thus, FSR is the proportion of falsely selected unimportant variables among all variables selected by the procedure; for example, if on average 0.5 of the six true zero parametric coefficients are selected and 4.5 coefficients are selected in total, then \(\hbox {FSR}=0.5/4.5\approx 0.11\). From Tables 1 and 2, we can make the following observations:

Table 2 Variable selection results for nonparametric components based on different variable selection methods

(i) The IV-gSCAD method outperforms the Naive-gSCAD method for both the parametric and nonparametric components, especially when the level of endogeneity of the covariates is large. This is because the Naive-gSCAD method fails to eliminate some unimportant variables in the parametric and nonparametric components and yields significantly larger model errors, which implies that the Naive-gSCAD variable selection procedure is biased.

(ii) For a given level of endogeneity of the covariates, the GMSE, RASE and FSR obtained by the IV-gSCAD method all decrease as the sample size n increases. This implies that the proposed IV-gSCAD variable selection procedure is consistent.

(iii) For given n, the IV-gSCAD method performs similarly in terms of model error and model complexity across all levels of endogeneity. This indicates that the proposed instrumental variable based variable selection procedure can attenuate the effect of the endogeneity of the covariates. In general, the proposed variable selection method works well in terms of both model error and model complexity.