1 Introduction

In the last two decades, there has been an increasing interest in regression models for functional variables as more and more data have arisen where the primary unit of observation can be viewed as a curve or in general a function, such as in biology, chemometrics, econometrics, geophysics, the medical sciences, meteorology and neurosciences. As a natural extension of the ordinary regression to the case where predictors include random functions and responses are scalars or functions, functional linear regression analysis provides valuable insights into these problems. The effectively infinite-dimensional character of functional data analysis is a source of many of its differences from more conventional multivariate analysis. The functional linear model has been extensively studied and successfully applied; see Cardot et al. (2003), Ramsay and Silverman (2002, 2005), Cai and Hall (2006), Hall and Horowitz (2007), Reiss and Ogden (2010), Brunel and Roche (2015), Hsing and Eubank (2015), among many others.

It is frequently the case that a response is related to both a vector of finite length and a function-valued random variable as predictor variables. With a square integrable random function X on a compact set \({\mathcal {T}}\) in R and a d-dimensional vector of random variables \(Z=(Z_{1},\ldots ,Z_{d})^{T}\), we suppose that the scalar response Y is linearly related to the predictor variables (X, Z) through the relationship

$$\begin{aligned} Y=\int _{{\mathcal {T}}}\gamma (t)X(t)\mathrm{d}t+Z^{T}\pmb {\beta }_{0}+\varepsilon , \end{aligned}$$
(1.1)

where \(\pmb {\beta }_{0}\) is a \(d\times 1\) vector of regression coefficients for Z, \(\gamma (t)\) is a square integrable function on \({\mathcal {T}}\), and \(\varepsilon \) is a random error. Model (1.1) generalizes both the classical linear regression model and the functional linear regression model, which correspond to the cases \(\gamma (t)=0\) and \(\pmb {\beta }_{0}=0\), respectively. Moreover, the model covers the analysis of covariance with a functional covariate: when the \(Z_{k}\) are scalar-valued indicator variables for subgroups, the model simultaneously represents a functional linear relationship between the scalar response Y and the function-valued random variable X within each group. Zhang et al. (2007) proposed a two-stage functional mixed effects model to deal with measurement error and irregularly spaced time points and estimated the regression coefficient function using a two-stage nonparametric regression calibration method. Shin (2009) and Reiss and Ogden (2010) proposed estimators of \(\pmb {\beta }_{0}\) and \(\gamma (t)\) by generalizing functional principal components estimation for the functional linear regression model, and Shin and Lee (2012) considered prediction of a scalar response based on both a function-valued variable and a finite number of real-valued variables.

In this paper, we propose a new method for estimating the unknown parameters and function in model (1.1). Using functional principal component analysis, the unknown slope function is approximated by an average value that involves the unknown parameters, and the estimators of the unknown parameters are obtained by solving a minimization problem. Although our method clearly differs from those of Shin (2009) and Shin and Lee (2012), simulation and further derivation show that the estimators obtained by the two approaches behave in the same way; our estimators, however, are simpler in expression and require less computation. Under conditions weaker than those of Shin (2009), we derive the asymptotic normality of the estimator of \(\pmb {\beta }_{0}\) and establish the global convergence rate of the estimator of the slope function \(\gamma (t)\). Since our assumptions are weaker than those of Shin (2009), the asymptotic distribution of the estimator of \(\pmb {\beta }_{0}\) differs from that of Shin (2009) and Shin and Lee (2012), and the proofs of our theorems are essentially different from those of Shin (2009). We also establish the convergence rate of the mean squared prediction error for a predictor. Based on the proposed estimation procedure, we further propose a family of variable selection procedures via penalized least squares with concave penalty functions. We show that the proposed penalized regression estimators possess the variable selection consistency and oracle property of Fan and Li (2001).

Variable selection is particularly important when the true underlying model has a sparse representation, since identifying significant predictors enhances the prediction performance of the fitted model. A penalty function generally facilitates variable selection in regression models, and various penalty functions have been used in the literature; well-known examples include bridge regression (Frank and Friedman 1993), the LASSO (Tibshirani 1996), the SCAD (Fan and Li 2001), the adaptive LASSO (Zou 2006) and the MCP (Zhang 2010). Liang and Li (2009) considered variable selection for partially linear models with measurement errors, Wang and Wang (2014) proposed adaptive LASSO estimators for ultrahigh-dimensional generalized linear models, and Aneirosa et al. (2015) investigated variable selection in partial linear regression with a functional covariate. Fan et al. (2014) studied the oracle optimality of folded concave penalized estimation.

The paper is organized as follows. Section 2 describes the estimation method and studies its asymptotic properties. Section 3 investigates an adaptive variable selection method and its asymptotic properties. Section 4 presents the finite sample behavior of the estimators. A real data example on real estate prices is given in Sect. 5. All proofs are relegated to the Appendix.

2 Estimation method and asymptotic results

Let Y be a real-valued random variable defined on a probability space \((\Omega , {\mathcal {B}}, P)\). Let Z be a d-dimensional vector of random variables with finite second moments, and let \(\{X(t): t\in {\mathcal {T}}\}\) be a zero-mean and second-order (i.e., \(EX(t)^{2}<\infty \) for all \(t\in {\mathcal {T}})\) stochastic process defined on \((\Omega , {\mathcal {B}}, P)\) with sample paths in \(L_{2}({\mathcal {T}})\), the set of all square integrable functions on \({\mathcal {T}}\), where \({\mathcal {T}}\) is a bounded closed interval. \(\varepsilon \) is a random error with mean zero and is independent of (X, Z). Let \(\langle \cdot ,\cdot \rangle \) and \(\Vert \cdot \Vert \) represent, respectively, the \(L_{2}({\mathcal {T}})\) inner product and norm. Denote the covariance function of the process X(t) by \(K(s,t)=cov(X(s),X(t))\). We suppose that K(s, t) is positive definite, in which case it admits a spectral decomposition in terms of strictly positive eigenvalues \(\lambda _{j}\),

$$\begin{aligned} K(s, t)=\sum _{j=1}^{\infty }\lambda _{j}\phi _{j}(s)\phi _{j}(t), \quad s,t\in {\mathcal {T}}, \end{aligned}$$
(2.1)

where \((\lambda _{j},\phi _{j})\) are (eigenvalue, eigenfunction) pairs for the linear operator with kernel K, the eigenvalues are ordered so that \(\lambda _{1}>\lambda _{2}>\cdots \) and the functions \(\phi _{1},\phi _{2},\ldots \) form an orthonormal basis for \(L_{2}({\mathcal {T}})\). This leads to the Karhunen–Loève representation

$$\begin{aligned} X(t)=\sum _{j=1}^{\infty }\xi _{j}\phi _{j}(t), \end{aligned}$$

where the \(\xi _{j}=\int _{{\mathcal {T}}}X(t)\phi _{j}(t)\mathrm{d}t\) are uncorrelated random variables with mean 0 and variance \(E\xi _{j}^{2}=\lambda _{j}\). Let \(\gamma (t)=\sum _{j=1}^{\infty }\gamma _{j}\phi _{j}(t)\); then model (1.1) can be written as

$$\begin{aligned} Y=\sum _{j=1}^{\infty }\gamma _{j}\xi _{j}+Z^{T}\pmb {\beta }_{0}+\varepsilon . \end{aligned}$$
(2.2)

By (2.2), we have

$$\begin{aligned} \gamma _{j}=E\{[Y-Z^{T}\pmb {\beta }_{0}]\xi _{j}\}/\lambda _{j}. \end{aligned}$$
(2.3)

Let \((X_{i}(t),Z_{i}, Y_{i}), i=1,\ldots ,n\), be independent realizations of (X(t), Z, Y) generated by model (1.1). Empirical versions of K and of its spectral decomposition are

$$\begin{aligned} \hat{K}(s,t)=\frac{1}{n}\sum _{i=1}^{n}X_{i}(s)X_{i}(t)=\sum _{j=1}^{\infty }\hat{\lambda }_{j}\hat{\phi }_{j}(s)\hat{\phi }_{j}(t), \quad s,t\in {\mathcal {T}}. \end{aligned}$$

Analogously to the case of K, \((\hat{\lambda }_{j},\hat{\phi }_{j})\) are (eigenvalue, eigenfunction) pairs for the linear operator with kernel \(\hat{K}\), ordered such that \(\hat{\lambda }_{1}\ge \hat{\lambda }_{2}\ge \cdots \ge 0\). We take \((\hat{\lambda }_{j},\hat{\phi }_{j})\) and \(\hat{\xi } _{ij}=\langle X_{i},\hat{\phi }_{j}\rangle \) to be the estimators of \((\lambda _{j},\phi _{j})\) and \(\xi _{ij}=\langle X_{i},\phi _{j}\rangle ,\) respectively, and set
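As a concrete illustration, when the curves are recorded on a common equally spaced grid, the empirical eigenpairs and scores can be obtained from an eigendecomposition of the discretized covariance operator. The following is a minimal numerical sketch, not part of the original derivation; the function name, the simple Riemann-sum quadrature and the grid layout are our own assumptions.

```python
import numpy as np

def empirical_fpca(X, t):
    """Eigenpairs of the empirical covariance operator and FPC scores.

    X : (n, p) array with X[i, k] = X_i(t_k) on a common, equally spaced grid t.
    Returns eigenvalues lam_hat (p,), eigenfunctions phi_hat (p, p) as columns,
    and scores xi_hat (n, p) with xi_hat[i, j] = <X_i, phi_hat_j>.
    """
    n, p = X.shape
    h = t[1] - t[0]                            # quadrature weight of the grid
    K_hat = X.T @ X / n                        # K_hat(t_k, t_l) = n^{-1} sum_i X_i(t_k) X_i(t_l)
    lam_hat, vecs = np.linalg.eigh(K_hat * h)  # discretized integral operator
    order = np.argsort(lam_hat)[::-1]          # sort eigenvalues in decreasing order
    lam_hat, vecs = lam_hat[order], vecs[:, order]
    phi_hat = vecs / np.sqrt(h)                # rescale so that int phi_hat_j(t)^2 dt = 1
    xi_hat = X @ phi_hat * h                   # numerical integration for the scores
    return lam_hat, phi_hat, xi_hat
```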

$$\begin{aligned} \tilde{\gamma }_{j}=\frac{1}{n\hat{\lambda }_{j}}\sum _{i=1}^{n}\left( Y_{i}-Z_{i}^{T}\pmb {\beta }_{0}\right) \hat{\xi }_{ij}. \end{aligned}$$
(2.4)

We use \(\sum _{j=1}^{m}\tilde{\gamma }_{j}\hat{\xi }_{j}\) to approximate \(\sum _{j=1}^{\infty }\gamma _{j}\xi _{j}\) in (2.2). Combining (2.2) and (2.4), we then solve the following minimization problem

$$\begin{aligned} \min _{\pmb {\beta }}\sum _{i=1}^{n}\left\{ Y_{i}-\sum _{j=1}^{m}\frac{\hat{\xi }_{ij}}{n\hat{\lambda }_{j}}\sum _{l=1}^{n}\left( Y_{l} -Z_{l}^{T}\pmb {\beta }\right) \hat{\xi }_{lj} -Z_{i}^{T}\pmb {\beta }\right\} ^{2} \end{aligned}$$
(2.5)

to obtain the estimator of \(\pmb {\beta }_{0}\). Define \(\tilde{\xi }_{li}=\sum _{j=1}^{m}\frac{\hat{\xi }_{lj}\hat{\xi }_{ij}}{\hat{\lambda }_{j}}\), \(\tilde{Y}_{i}=Y_{i}-\frac{1}{n}\sum _{l=1}^{n}Y_{l}\tilde{\xi }_{li}\) and \(\tilde{Z}_{i}=Z_{i}-\frac{1}{n}\sum _{l=1}^{n}Z_{l}\tilde{\xi }_{li}.\) Then, (2.5) can be written as

$$\begin{aligned} \min _{\pmb {\beta }}\sum _{i=1}^{n}\left( \tilde{Y}_{i}-\tilde{Z}_{i}^{T}\pmb {\beta }\right) ^{2}. \end{aligned}$$
(2.6)

Let \(\tilde{Y}=(\tilde{Y}_{1},\ldots ,\tilde{Y}_{n})^{T}\) and \(\tilde{Z}=(\tilde{Z}_{1},\ldots ,\tilde{Z}_{n})^{T}\). Then the estimator \(\hat{\pmb {\beta }}\) of \(\pmb {\beta }_{0}\) is given by

$$\begin{aligned} \hat{\pmb {\beta }}=(\tilde{Z}^{T}\tilde{Z})^{-1}\tilde{Z}^{T}\tilde{Y}. \end{aligned}$$
(2.7)

The estimator of \(\gamma (t)\) is given by \(\hat{\gamma }(t)=\sum _{j=1}^{m}\hat{\gamma }_{j}\hat{\phi }_{j}(t)\) with

$$\begin{aligned} \hat{\gamma }_{j}=\frac{1}{n\hat{\lambda }_{j}}\sum _{i=1}^{n}\left( Y_{i}-Z_{i}^{T}\hat{\pmb {\beta }}\right) \hat{\xi }_{ij}. \end{aligned}$$
(2.8)
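Given the estimated eigenpairs and scores, computing (2.7) and (2.8) amounts to an ordinary least squares fit on the transformed data \((\tilde{Y}_{i},\tilde{Z}_{i})\). The following is a minimal sketch, assuming the output of the `empirical_fpca` sketch above and a given truncation level m; all names are our own and not part of the paper.

```python
import numpy as np

def fit_pflm(Y, Z, xi_hat, lam_hat, m):
    """Estimators (2.7) and (2.8) of beta_0 and of the first m Fourier coefficients of gamma.

    Y : (n,) responses; Z : (n, d) scalar covariates;
    xi_hat : (n, J) estimated FPC scores; lam_hat : (J,) estimated eigenvalues.
    """
    n = len(Y)
    S = xi_hat[:, :m]                              # scores of the first m components
    P = (S / lam_hat[:m]) @ S.T / n                # P[i, l] = tilde_xi_{li} / n
    Y_tilde = Y - P @ Y                            # tilde Y_i
    Z_tilde = Z - P @ Z                            # tilde Z_i
    beta_hat, *_ = np.linalg.lstsq(Z_tilde, Y_tilde, rcond=None)   # (2.7)
    gamma_hat = S.T @ (Y - Z @ beta_hat) / (n * lam_hat[:m])       # (2.8)
    return beta_hat, gamma_hat
```

On the observation grid, the slope estimate \(\hat{\gamma }(t)\) can then be recovered as `phi_hat[:, :m] @ gamma_hat`.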

To implement our estimation method, we need to choose m. The value of m can be selected by leave-one-curve-out cross-validation of the prediction error, with the CV function defined as

$$\begin{aligned} \mathrm{CV}(m)=\sum _{i=1}^{n} \left( Y_{i} -\sum _{j=1}^{m}\hat{\gamma }_{j}^{-i}\hat{\xi }_{ij}-Z_{i}^{T}\hat{\pmb {\beta }}^{-i}\right) ^{2}, \end{aligned}$$

where \(\hat{\gamma }_{j}^{-i},j=1,\ldots ,m\), and \(\hat{\pmb {\beta }}^{-i}\) are computed after removing \((X_{i}, Z_{i}, Y_{i})\). As an alternative to cross-validation, m can also be chosen by the BIC information criterion. The BIC criterion as a function of m is given by

$$\begin{aligned} \mathrm{BIC}(m)=\log \left\{ \sum _{i=1}^{n}\left( Y_{i} -\sum _{j=1}^{m}\hat{\gamma }_{j}\hat{\xi }_{ij}-Z_{i}^{T}\hat{\pmb {\beta }}\right) ^{2}\right\} +\frac{\log n}{n}(m+1). \end{aligned}$$

Large values of BIC indicate poor fits, so we choose the value of m that minimizes \(\mathrm{BIC}(m)\).
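A minimal sketch of selecting m by minimizing \(\mathrm{BIC}(m)\), assuming the hypothetical helper `fit_pflm` from the sketch above:

```python
import numpy as np

def select_m_bic(Y, Z, xi_hat, lam_hat, m_max):
    """Choose the truncation level m by minimizing BIC(m); m_max should be modest
    relative to n so that lam_hat[:m] stays bounded away from zero."""
    n = len(Y)
    bic = []
    for m in range(1, m_max + 1):
        beta_hat, gamma_hat = fit_pflm(Y, Z, xi_hat, lam_hat, m)
        resid = Y - xi_hat[:, :m] @ gamma_hat - Z @ beta_hat
        bic.append(np.log(np.sum(resid ** 2)) + np.log(n) / n * (m + 1))
    return int(np.argmin(bic)) + 1        # the m with the smallest BIC value
```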

Remark 2.1

Noting that \(\hat{\xi } _{ij}=\langle X_{i},\hat{\phi }_{j}\rangle \), it can be easily shown that our estimators have the same performance as those given in Shin (2009) and Shin and Lee (2012); however, our estimators are simpler in expression and require less computation.

In the following, we derive asymptotic normality of the estimator \(\hat{\pmb {\beta }}\) and the rate of convergence for the estimator \(\hat{\gamma }(t)\). We make the following assumptions.

Assumption 1

X has finite fourth moment, in that \(\int _{{\mathcal {T}}}E\{X^{4}(t)\}\mathrm{d}t<\infty \), and for each j, \(E(\xi _{j}^{4})<C_{1}\lambda _{j}^{2}\) for some constant \(C_{1}\).

Assumption 2

There exists a convex function \(\varphi \) defined on the interval [0, 1] such that \(\varphi (0) = 0\) and \(\lambda _{j}=\varphi (1/j)\) for \(j\ge 1\).

Assumption 3

For Fourier coefficients \(\gamma _{j}\), there exist constants \(C_{2}>0\) and \(\delta >3/2\) such that \(|\gamma _{j}|\le C_{2}j^{-\delta }\) for all \(j\ge 1\).

Assumption 4

\(m\rightarrow \infty \) and \(n^{-1/2}m\lambda _{m}^{-1}\rightarrow 0\).

Assumption 5

\(E(\Vert Z\Vert ^{4})<+\,\infty \).

Assumptions 1 and 3 are standard conditions for functional linear models; see, e.g., Cai and Hall (2006) and Hall and Horowitz (2007). Assumption 2 is slightly less restrictive than (3.2) of Hall and Horowitz (2007). Assumption 4 can be easily verified and will be further discussed below.

Remark 2.2

Assumptions 2 and 4 are weaker than the assumptions for \(\lambda _{j}\) and m, respectively, in Shin (2009) and Shin and Lee (2012).

We first establish the asymptotic distribution of the estimator \(\hat{\pmb {\beta }}\). To derive the asymptotic normality of the estimator \(\hat{\pmb {\beta }}\), we need to adjust for the dependence of \(Z=(Z_{1},\ldots ,Z_{d})^{T}\) and X(t), which is a common complication in semiparametric models. Let \({\mathcal {G}}\) denote the class of the random variables such that \(G\in {\mathcal {G}}\) if \(G=\sum _{j=1}^{\infty }g_{j}\xi _{j}\) and \(|g_{j}|\le C_{3}j^{-\delta }\) for all \(j\ge 1 \), where \(\delta \) is defined in Assumption 3 and \(C_{3}>0\) is a constant. Note that \({\mathcal {G}}\) is related to the first term on the right side of (2.2). Denote \(G_{r} =\sum _{j=1}^{\infty }g_{rj}\xi _{j}\). Let

$$\begin{aligned} G_{r}^{*}=\text{ arginf }_{G_{r}\in {\mathcal {G}}}E \left[ \left( Z_{r}-\sum _{j=1}^{\infty }g_{rj}\xi _{j}\right) ^{2}\right] . \end{aligned}$$

Since

$$\begin{aligned} E\left[ \left( Z_{r}-\sum _{j=1}^{\infty }g_{rj}\xi _{j}\right) ^{2}\right] =E[(Z_{r}-E(Z_{r}|X))^{2}] +E\left[ \left( E(Z_{r}|X)-\sum _{j=1}^{\infty }g_{rj}\xi _{j}\right) ^{2}\right] , \end{aligned}$$

we have

$$\begin{aligned} G_{r}^{*} =\text{ arginf }_{G_{r}\in {\mathcal {G}}} E\left[ \left( E(Z_{r}|X)-\sum _{j=1}^{\infty }g_{rj}\xi _{j}\right) ^{2}\right] . \end{aligned}$$

Thus, \(G_{r}^{*}\) is the projection of \(E(Z_{r}|X)\) onto the space \({\mathcal {G}}\); in other words, \(G_{r}^{*}\) is the element of \({\mathcal {G}}\) that is closest to \(E(Z_{r}|X)\) among all the random variables in \({\mathcal {G}}\). Let \(H_{r}=Z_{r}-G_{r}^{*}\) for \(r=1,\ldots ,d\), and \(H=(H_{1},\ldots ,H_{d})^{T}\). We then have the following results.

Theorem 2.1

Suppose that Assumptions 1–5 hold and that \(\Omega =E(HH^{T})\) is invertible. Then

$$\begin{aligned} \sqrt{n}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})\rightarrow _{d}N(0,\Omega ^{-1}\sigma ^{2}), \end{aligned}$$
(2.9)

where \(\rightarrow _{d}\) means convergence in distribution.

Remark 2.3

When moving from the functional linear model to the partial functional linear model, the key to deriving the asymptotic normality of the estimator \(\hat{\pmb {\beta }}\) is to handle the dependence between the vector Z and X(t). In our analysis, each \(Z_{r}\), \(r=1,\ldots ,d\), is decomposed into two parts, \(G_{r}^{*}=\sum _{j=1}^{\infty }g_{rj}^{*}\xi _{j}\) and \(H_{r}\). Consequently, (2.2) can be written as

$$\begin{aligned} Y=\sum _{j=1}^{\infty } \left( \gamma _{j}+\sum _{r=1}^{d}g_{rj}^{*}\beta _{0r}\right) \xi _{j}+H^{T}\pmb {\beta }_{0}+\varepsilon , \end{aligned}$$

where \(\pmb {\beta }_{0}=(\beta _{01},\ldots ,\beta _{0d})^{T}\). If \(Z_{r}=\sum _{j=1}^{\infty }\tilde{g}_{rj}\xi _{j}+V_{r}\) and \(V_{r}\) is independent of X(t), then \(G_{r}^{*}=\sum _{j=1}^{\infty }\tilde{g}_{rj}\xi _{j}\) and \(H_{r}=V_{r}\). If \(Z_{r}\) is independent of X(t), then \(G_{r}^{*}=0\) and \(H_{r}=Z_{r}\). If \(E(Z_{r}|X(t))=\sum _{j=1}^{\infty }\bar{g}_{rj}\xi _{j}\), then \(G_{r}^{*}=\sum _{j=1}^{\infty }\bar{g}_{rj}\xi _{j}\) and \(H_{r}=Z_{r}-G_{r}^{*}\). In Shin (2009) and Shin and Lee (2012), it is assumed that \(E(Z_{r}|X(t))=\sum _{j=1}^{\infty }\lambda _{j}^{-1}\langle K_{Z_{r}X},\phi _{j}\rangle \xi _{j}\), where \(K_{Z_{r}X}=cov(Z_{r},X)\) for \(r=1,\ldots ,d\). In this case, \(G_{r}^{*}=\sum _{j=1}^{\infty }\lambda _{j}^{-1}\langle K_{Z_{r}X},\phi _{j}\rangle \xi _{j}\) and \(H_{r}=Z_{r}-G_{r}^{*}\), and the result of our Theorem 2.1 is the same as that of Theorem 3.1 in Shin (2009). Hence, Theorem 3.1 of Shin (2009) is a special case of our Theorem 2.1.

Next, we establish the convergence rate of the estimator \(\hat{\gamma }(t)\).

Theorem 2.2

Assume that Assumptions 1–5 hold and that \(n^{-1}m^{2}\lambda _{m} ^{-1}\log m\rightarrow 0\). Then

$$\begin{aligned} \int _{{\mathcal {T}}}\left\{ \hat{\gamma }(t)-\gamma (t)\right\} ^{2}\mathrm{d}t=O_{p} \left( \frac{m}{n\lambda _{m}}+\frac{m}{n^{2}\lambda _{m}^{2}}\sum _{j=1}^{m}\frac{j^{3}\gamma _{j}^{2}}{\lambda _{j}^{2}}+\frac{1}{n\lambda _{m}}\sum _{j=1}^{m}\frac{\gamma _{j}^{2}}{\lambda _{j}}+m^{-2\delta +1}\right) . \end{aligned}$$
(2.10)

If \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\), \(\delta >2\) and \(\delta >1+\tau /2\), then \(\sum _{j=1}^{m}j^{3}\gamma _{j} ^{2}\lambda _{j}^{-2}\le C_{4}(\log m+m^{2\tau +4-2\delta })\) and \(\sum _{j=1}^{m}\gamma _{j}^{2}\lambda _{j}^{-1}<\infty \), where \(C_{4}\) is a positive constant. We then have the following corollary.

Corollary 2.1

Under Assumptions 1–5, if \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\) and \(\delta >\min (2,1+\tau /2)\), then it holds that

$$\begin{aligned} \int _{{\mathcal {T}}}\left\{ \hat{\gamma }(t)-\gamma (t)\right\} ^{2}\mathrm{d}t=O_{p}\left( n^{-(2\delta -1)/(\tau +2\delta )}\right) . \end{aligned}$$
(2.11)

The global convergence result (2.11) indicates that the estimator \(\hat{\gamma }(t)\) attains the same convergence rate as the estimators of Hall and Horowitz (2007), which is optimal in the minimax sense.

Let \({\mathcal {S}}=\{(Z_{i},X_{i},Y_{i}): 1\le i\le n\}\). In the following, for a new pair of predictor variables \((Z_{n+1}, X_{n+1})\) taken from the same population as the data and independent of the data, we derive the convergence rate of the mean squared prediction error (MSPE) given by

$$\begin{aligned} \mathrm{MSPE}=E\left( \left[ \left( \int _{{\mathcal {T}}}\hat{\gamma }(t)X_{n+1}(t)\mathrm{d}t+Z_{n+1}^{T}\hat{\pmb {\beta }}\right) -\left( \int _{{\mathcal {T}}}\gamma (t)X_{n+1}(t)\mathrm{d}t+Z_{n+1}^{T}\pmb {\beta }_{0}\right) \right] ^{2}\,\Big |\,{\mathcal {S}}\right) . \end{aligned}$$

Theorem 2.3

Under Assumptions 1–3 and 5, if \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\) and \(\delta >\min (2,1+\tau /2)\), then

$$\begin{aligned} \mathrm{MSPE}=O_{p}(n^{-(\tau +2\delta -1)/(\tau +2\delta )}). \end{aligned}$$
(2.12)

Remark 2.4

In practical applications, X(t) is only discretely observed. Without loss of generality, suppose \({\mathcal {T}}=[0,1]\) and, for each \(i=1,\ldots ,n\), \(X_{i}(t)\) is observed at \(n_{i}\) discrete points \(0=t_{i1}<\cdots <t_{in_{i}}=1\). Typically, it is also assumed that \(\max _{i}\max _{1\le j\le n_{i}-1}(t_{i(j+1)}-t_{ij})\rightarrow 0\) as \(n\rightarrow \infty \). Based on the discrete observations, for each \(i=1,\ldots ,n\), linear or spline interpolation can be used to estimate \(X_{i}(t)\). For example, we can use the linear interpolation function

$$\begin{aligned} \hat{X}_{i}(t)&=X_{i}(t_{ij})+\frac{(X_{i}(t_{i(j+1)})-X_{i}(t_{ij}))}{t_{i(j+1)}-t_{ij}}(t-t_{ij}), \\&\quad \text{ for } \ t\in [t_{ij}, t_{i(j+1)}], j=0,\ldots ,n_{i}-1 \end{aligned}$$

as the estimator of \(X_{i}(t)\); a short numerical sketch is given below. It is necessary to point out that if the \(X_{i}(t)\), \(i=1,\ldots ,n\), are replaced by \(\hat{X}_{i}(t)\), \(i=1,\ldots ,n\), the conclusions of Theorems 2.1–2.3 no longer hold. We note that it is difficult to establish the related asymptotic properties with our current approach, and further research is expected.
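The interpolation step itself is elementary; the sketch below uses `numpy.interp`, which implements exactly the piecewise-linear formula displayed above. The observation times and values are simulated here purely for illustration.

```python
import numpy as np

def interpolate_curve(t_obs, x_obs, t_grid):
    """Piecewise-linear reconstruction hat{X}_i of a discretely observed curve.

    t_obs  : increasing observation times with t_obs[0] = 0 and t_obs[-1] = 1;
    x_obs  : observed values X_i(t_obs);
    t_grid : common evaluation grid in [0, 1].
    """
    return np.interp(t_grid, t_obs, x_obs)

# Example: one curve observed at 8 irregular time points, evaluated on 101 grid points.
rng = np.random.default_rng(1)
t_obs = np.sort(np.concatenate(([0.0, 1.0], rng.uniform(0.0, 1.0, 6))))
x_obs = np.sin(2 * np.pi * t_obs) + 0.1 * rng.standard_normal(t_obs.size)
x_hat = interpolate_curve(t_obs, x_obs, np.linspace(0.0, 1.0, 101))
```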

3 Variable selection for partial functional linear model

In the variable selection problem, it is assumed that some components of \(\pmb {\beta }_{0}\) in model (1.1) are equal to zero. The goal is to identify and estimate the subset model. It has been argued that folded concave penalties are preferable to convex penalties such as the \(L_{1}\)-penalty in terms of both model-estimation accuracy and variable selection consistency (Lv and Fan 2009; Fan and Lv 2011). Let \(p_{\nu _{n}}(|u|)=p_{a,\nu _{n}}(|u|)\) be general folded concave penalty functions defined on \(u\in (-\infty ,+\infty )\) satisfying

(a) The \(p_{\nu _{n}}(u)\) are increasing and concave in \(u\in [0,+\infty )\);

(b) The \(p_{\nu _{n}}(u)\) are differentiable in \(u\in (0,+\infty )\) with \(p_{\nu _{n}}^{\prime }(0):=p_{\nu _{n}}^{\prime }(0+)\ge a_{1} \nu _{n}\), \(p_{\nu _{n}}^{\prime }(u)\ge a_{1}\nu _{n}\) for \(u\in (0,a_{2}\nu _{n}]\), \(p_{\nu _{n}}^{\prime }(u)\le a_{3} \nu _{n}\) for \(u\in [0,+\infty )\), and \(p_{\nu _{n}}^{\prime }(u)=0\) for \(u\in [a\nu _{n},+\infty )\) with a prespecified constant \(a>a_{2}\), where \(a_{1}\), \(a_{2}\) and \(a_{3}\) are fixed positive constants.

The above family of general folded concave penalties contains several popular penalties including the SCAD penalty (Fan and Li 2001), the derivative of which is given by

$$\begin{aligned} p_{\nu _{n}}^{\prime }(u)=\nu _{n}I_{\{u\le \nu _{n}\}} +\frac{(a\nu _{n}-u)_{+}}{a-1}I_{\{u>\nu _{n}\}} \quad \text{ for } \text{ some }\ a>2, \end{aligned}$$

and the MCP penalty (Zhang 2010), the derivative of which is given by

$$\begin{aligned} p_{\nu _{n}}^{\prime }(u)=\left( \nu _{n}-\frac{u}{a}\right) _{+} \quad \text{ for } \text{ some }\ a>1. \end{aligned}$$

It is easy to see that \(a_{1}=a_{2}=a_{3}=1\) for the SCAD, and \(a_{1} =1-a^{-1}\), \(a_{2}=a_{3}=1\) for the MCP.
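For concreteness, the two penalty derivatives can be coded directly. The following is only an illustrative sketch (vectorized over u, with our own default values of a):

```python
import numpy as np

def scad_deriv(u, nu, a=3.7):
    """Derivative p'_{nu}(u) of the SCAD penalty (Fan and Li 2001); requires a > 2."""
    u = np.abs(u)
    return nu * (u <= nu) + np.maximum(a * nu - u, 0.0) / (a - 1) * (u > nu)

def mcp_deriv(u, nu, a=3.0):
    """Derivative p'_{nu}(u) of the MCP penalty (Zhang 2010); requires a > 1."""
    return np.maximum(nu - np.abs(u) / a, 0.0)
```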

Based on the above analysis, we define a penalized least squares estimator of \(\pmb {\beta }_{0}\) as

$$\begin{aligned} \hat{\pmb {\beta }}_\mathrm{PLS}=\arg \min _{\pmb {\beta }}(\tilde{Y}-\tilde{Z}\pmb {\beta })^{T}(\tilde{Y}-\tilde{Z}\pmb {\beta })+n\sum _{k=1}^{d}p_{\nu _{n}}^{\prime }(|\beta _{k}^{(0)}|)|\beta _{k}|, \end{aligned}$$
(3.1)

where \({\pmb {\beta }}^{(0)}=(\beta _{1}^{(0)},\ldots ,\beta _{d}^{(0)})^{T}\) is an initial estimator of \(\pmb {\beta }_{0}\). For example, \({\pmb {\beta }}^{(0)}\) can be obtained from (2.7) in Sect. 2.
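Because the penalty in (3.1) is linear in \(|\beta _{k}|\) with weights fixed at the initial estimator, the optimization is a weighted lasso and can be solved, for example, by coordinate descent with soft-thresholding. The following is a minimal sketch of one such solver; it is our own illustration rather than the algorithm used in the paper, and `weights` would hold \(p_{\nu _{n}}^{\prime }(|\beta _{k}^{(0)}|)\), computed, e.g., with the `scad_deriv` sketch above.

```python
import numpy as np

def soft_threshold(z, t):
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def penalized_beta(Y_tilde, Z_tilde, weights, n_iter=500, tol=1e-8):
    """Coordinate descent for (3.1):
    min_beta ||Y_tilde - Z_tilde beta||^2 + n * sum_k weights[k] * |beta_k|,
    where Y_tilde, Z_tilde are the transformed data from Sect. 2 and the
    weights are fixed at an initial estimator beta^(0).
    """
    n, d = Z_tilde.shape
    beta = np.zeros(d)
    col_sq = np.sum(Z_tilde ** 2, axis=0)       # squared column norms of Z_tilde
    for _ in range(n_iter):
        beta_old = beta.copy()
        for k in range(d):
            r_k = Y_tilde - Z_tilde @ beta + Z_tilde[:, k] * beta[k]   # partial residual
            beta[k] = soft_threshold(Z_tilde[:, k] @ r_k, n * weights[k] / 2) / col_sq[k]
        if np.max(np.abs(beta - beta_old)) < tol:
            break
    return beta
```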

In the following, we show that the penalized least squares estimator defined by (3.1) has the oracle property (Fan and Li 2001). Without loss of generality, let \(\pmb {\beta }=({\pmb {\beta }}_{1}^{T},\pmb {\beta }_{2}^{T})^{T}\), where \(\pmb {\beta }_{1}\in \mathbf {R}^{d_{1}}\) and \(\pmb {\beta }_{2}\in \mathbf {R}^{d-d_{1}}\). The vector of true parameters is denoted by \(\pmb {\beta }_{0}=(\pmb {\beta }_{01} ^{T},\pmb {\beta }_{02}^{T})^{T}\) with each element of \(\pmb {\beta }_{01}\) being nonzero and \(\pmb {\beta }_{02}=0\).

Theorem 3.1

Suppose that the conditions of Theorem 2.1 hold. Let \(p_{\nu _{n}}(\cdot )\) be general folded concave penalty functions satisfying assumptions (a) and (b) above and \({\pmb {\beta }}^{(0)}\) be the estimator defined by (2.7). If \(\nu _{n}\rightarrow 0\) and \(\sqrt{n}\nu _{n}\rightarrow \infty \) as \(n\rightarrow \infty \), then the penalized least squares estimator \(\hat{\pmb {\beta }}_\mathrm{PLS} =(\hat{\pmb {\beta }}_{\mathrm{PLS}1}^{T},\hat{\pmb {\beta }}_{\mathrm{PLS}2}^{T})^{T}\) defined by (3.1) satisfies

(1) Sparsity: \(P(\hat{\pmb {\beta }}_{\mathrm{PLS}2}=0)\rightarrow 1.\)

(2) Asymptotic normality:

    $$\begin{aligned} \sqrt{n}(\hat{\pmb {\beta }}_{\mathrm{PLS}1}-\pmb {\beta }_{01})\rightarrow _{d}N(0,\Omega _{1}^{-1}\sigma ^{2}), \end{aligned}$$
    (3.2)

    where \(\Omega _{1}=E[(H_{1},\ldots ,H_{d_{1}})^{T}(H_{1},\ldots ,H_{d_{1}})]\).

Let

$$\begin{aligned} \hat{\gamma }_{\mathrm{PLSj}}=\frac{1}{n\hat{\lambda }_{j}}\sum _{i=1}^{n} \left( Y_{i}-Z_{i}^{T}\hat{\pmb {\beta }}_\mathrm{PLS}\right) \hat{\xi }_{ij} \end{aligned}$$
(3.3)

and \(\hat{\gamma }_\mathrm{PLS}(t)=\sum _{j=1}^{m}\hat{\gamma }_{PLSj}\hat{\phi }_{j}(t)\). We then have the following theorem.

Theorem 3.2

(1) Under the assumptions of Theorems 3.1 and 2.2, the estimator \(\hat{\gamma }_\mathrm{PLS}(t)\) satisfies the conclusions of Theorem 2.2.

(2) Under the assumptions of Theorems 3.1 and 2.3, the conclusions of Theorem 2.3 hold.

4 Simulation results

Since our estimators have the same performance as those of Shin (2009) and Shin and Lee (2012), in this section we only investigate the finite sample performance of the penalized least squares estimators proposed in Sect. 3 through a Monte Carlo study. The data sets were generated from the following model

$$\begin{aligned} Y_{i}=\int _{{\mathcal {T}}}\gamma (t)X_{i}(t)\mathrm{d}t+Z_{i}^{T}\pmb {\beta }_{0}+\varepsilon _{i}, \end{aligned}$$
(4.1)

with \({\mathcal {T}}=[0,1]\) and \(\pmb {\beta }_{0}=(2,0,1.5,0,0.3)^{T}\). We took \(\gamma (t)=\sum _{j=1}^{50}\gamma _{j}\phi _{j}(t)\) and \(X_{i}(t)=\sum _{j=1}^{50}\xi _{ij}\phi _{j}(t)\), where \(\gamma _{1}=0.3\) and \(\gamma _{j}=4(-1)^{j+1}j^{-\delta },j\ge 2\); \(\phi _{1}(t)\equiv 1\) and \(\phi _{j}(t)=2^{1/2}\cos ((j-1)\pi t),j\ge 2\); the \(\xi _{ij}\)’s were independent \(N(0, \lambda _{j})\) variables. Conditional on the \(\xi _{ij}\), we let \(Z_{i}=(Z_{i1},\ldots ,Z_{i5})^{T}\) follow a multivariate normal distribution with mean vector \(((1+\lambda _{1})^{-1/2}\xi _{i1},\ldots , (1+\lambda _{5})^{-1/2}\xi _{i5})^{T}\) and variance-covariance matrix \(V=(v_{kl})\), where \(v_{kk}=(1+\lambda _{k})^{-1}\) and \(v_{kl}= 0.7((1+\lambda _{k})(1+\lambda _{l}))^{-1/2}\) for \(k\ne l\), \(k,l =1,\ldots ,5\); marginally, \(Z_{i}\) then has a multivariate normal distribution with zero-mean vector and a variance-covariance matrix whose diagonal elements are 1 and whose off-diagonal elements are \(v_{kl}\). The errors \(\varepsilon _{i}\) were normally distributed with mean 0 and standard deviation 0.5. Similar to Shin and Lee (2012), we used 4 different sets of eigenvalues \(\{\lambda _{j}\}\), listed below; a data-generation sketch for Setting 1 follows the list. In two of the settings, \(\lambda _{j}=j^{-\tau }\) with different values of \(\tau \). In the other two settings, the eigenvalues are “closely spaced” as in Hall and Horowitz (2007): \(\lambda _{1}= 1\), \(\lambda _{j} =0.2^{2}(1-0.0001j)^{2}\) if \(2 \le j\le 4\), and \(\lambda _{5j+k}= 0.2^{2}\{(5j)^{-\tau /2}-0.0001k\}^{2}\) for \(j\ge 1\) and \(0\le k \le 4\).

1. Set \(\tau =1.1\) and \(\delta =2\) with the well-spaced eigenvalues.

2. Set \(\tau =1.1\) and \(\delta =2\) with the closely spaced eigenvalues.

3. Set \(\tau =3\) and \(\delta =2\) with the well-spaced eigenvalues.

4. Set \(\tau =3\) and \(\delta =2\) with the closely spaced eigenvalues.
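For illustration, one replication of Setting 1 could be generated as in the following sketch. It is our own reconstruction of the design described above; the seed, grid and variable names are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n, J, tau, delta = 100, 50, 1.1, 2.0                  # Setting 1: well-spaced eigenvalues
jj = np.arange(1, J + 1)
lam = jj ** (-tau)                                    # lambda_j = j^{-tau}
gamma = 4.0 * (-1.0) ** (jj + 1) * jj ** (-delta)     # gamma_j = 4(-1)^{j+1} j^{-delta}, j >= 2
gamma[0] = 0.3                                        # gamma_1 = 0.3
beta0 = np.array([2.0, 0.0, 1.5, 0.0, 0.3])

xi = rng.standard_normal((n, J)) * np.sqrt(lam)       # xi_ij ~ N(0, lambda_j), independent
d = 1.0 / np.sqrt(1.0 + lam[:5])
V = 0.7 * np.outer(d, d)                              # v_kl = 0.7((1+lam_k)(1+lam_l))^{-1/2}
np.fill_diagonal(V, d ** 2)                           # v_kk = (1 + lambda_k)^{-1}
Z = xi[:, :5] * d + rng.standard_normal((n, 5)) @ np.linalg.cholesky(V).T
eps = rng.normal(0.0, 0.5, n)
Y = xi @ gamma + Z @ beta0 + eps                      # integral term equals sum_j gamma_j xi_ij

# Discretized curves X_i(t) on 101 grid points (for use with the FPCA sketch in Sect. 2).
t = np.linspace(0.0, 1.0, 101)
phi = np.ones((t.size, J))
phi[:, 1:] = np.sqrt(2.0) * np.cos(np.pi * np.outer(t, jj[1:] - 1))
X = xi @ phi.T
```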

Table 1 Results of Monte Carlo experiments for model (4.1)

All the results in this section are based on 500 replications. In all the simulated designs, we used the SCAD penalty function with \(a=3.7\). We set the sample size n to 100 and 200. For each simulated data set, the penalized least squares estimators \(\hat{\pmb {\beta }}_\mathrm{PLS}\) and \(\hat{\gamma }_\mathrm{PLS}(t)\) were computed by the procedure given in Sects. 2 and 3. The tuning parameter m is determined by the BIC criterion described in Sect. 2, and the tuning parameter \(\nu _{n}\) in (3.1) is selected by the method of Fan et al. (2014).

We measured the estimation accuracy of the parametric estimators by the average \(l_{1}\)-losses \(|\hat{\beta }_{1}-\beta _{1}|\), \(|\hat{\beta }_{3}-\beta _{3}|\) and \(|\hat{\beta }_{5}-\beta _{5}|\) over 500 replications. We also evaluated the selection accuracy by the average counts of false positives (FP) and false negatives (FN) over the 500 replications, that is, the number of noise covariates included in the model and the number of signal covariates not included. Table 1 displays the simulation results for model (4.1). We see from Table 1 that the average \(l_{1}\)-loss, FP and FN tend to decrease as n increases, and the average \(l_{1}\)-loss tends to decrease as \(\tau \) increases. Table 1 also shows that the FPs and FNs for Settings 1 and 3 with the well-spaced eigenvalues are smaller than those for Settings 2 and 4 with the closely spaced eigenvalues, while the FP and FN for Setting 4 are smaller than those for Setting 2.

Table 2 reports the integrated squared bias (\(\hbox {Bias}^{2}\)), integrated variance (Var) and mean integrated squared error (MISE) of the estimator \(\hat{\gamma }(t)\), computed on a grid of 100 equally spaced points on \({\mathcal {T}}\). Table 2 shows a general tendency for the MISE to decrease as \(\tau \) increases. We also see from Table 2 that the MISEs for Settings 1 and 3 with the well-spaced eigenvalues are smaller than those for Settings 2 and 4 with the closely spaced eigenvalues.

Table 2 Results of Monte Carlo experiments for model (4.1)
Table 3 Results of Monte Carlo experiments under high-dimensional data

In the following, we investigate variable selection for high-dimensional data. In (4.1), let \(Z_{i}=(Z_{i1},\ldots ,Z_{i30})^{T}\), where \(Z_{i1},\ldots ,Z_{i5}\) are the same as above, \(Z_{i6},\ldots ,Z_{i30}\) are mutually independent, independent of \(Z_{i1},\ldots ,Z_{i5}\), and \(Z_{ij}\sim N(0,1)\) for \(j=6,\ldots ,30\), and \(\pmb {\beta }_{0}=(2,0,1.5,0,0.3,0,\ldots ,0)^{T}\). The simulation results for these high-dimensional data are reported in Tables 3 and 4, which show conclusions similar to those in Tables 1 and 2. Comparing Table 3 with Table 1 and Table 4 with Table 2, we see that our penalized least squares estimators also behave well for high-dimensional data.

Table 4 Results of Monte Carlo experiments under high-dimensional data

5 A real data example

In this section, we analyze a real data set using the proposed methodology. For this purpose, we analyze the real estate data set collected from the statistical yearbooks of various cities, real estate market reports and statistical bulletins on national economic and social development in China. It includes the real estate data for 197 second-, third- and fourth-tier cities in China. The data set contains the average annual income of urban residents from 2000 to 2016; all other variables are for 2016. Our purpose is to study the relationship between urban housing prices and their influencing factors. The response variable Y represents the urban housing price. Since it takes many years of savings for the average resident to buy a house, we choose the average annual income of the residents as the functional covariate. Let \(X_{i}^{*}(t)\) denote the average annual income of the residents of the \(i\hbox {th}\) city for the year t and \(X_{i}(t)=X_{i}^{*}(t)-\bar{X}^{*}(t)\), where \(\bar{X}^{*}(t)=\frac{1}{197}\sum _{i=1}^{197}X_{i}^{*}(t)\). The scalar covariates of primary interest include urban category (\(Z_{2},Z_{3}\)), urban population (\(Z_{4}\)), urban GDP (\(Z_{5}\)), bank interest rate (\(Z_{6}\)), urban livability index (\(Z_{7}\)), urban comprehensive competitiveness (\(Z_{8}\)) and urban development index (\(Z_{9}\)). Because some of these variables, such as \(Z_{4}\) and \(Z_{5}\), take very large values whereas others, such as \(Z_{6}\), take small values, we first rescale them as follows. Let \(\bar{z}_{i4}\), \(i=1,\ldots ,197\), be the observations of \(Z_{4}\), and let \(z_{i4} = \bar{z}_{i4}/\max _{i}\bar{z}_{i4}\), \(i=1,\ldots ,197\), so that the maximum of the rescaled data for \(Z_{4}\) is 1. The data for the variables \(Z_{5},\ldots ,Z_{9}\) are rescaled in the same fashion. We construct the following partial functional linear model:

$$\begin{aligned} \log (Y_{i})=\int _{0}^{17}\gamma (t)X_{i}(t)\mathrm{d}t+Z_{i1}\beta _{01}+\cdots +Z_{i9}\beta _{09}+\varepsilon _{i}, \end{aligned}$$
(5.1)

where \(Z_{i1}\equiv 1\); \(Z_{i2}=1\) and \(Z_{i3}=0\) indicate a second-tier city, \(Z_{i2}=0\) and \(Z_{i3}=1\) indicate a third-tier city, and \(Z_{i2}=0\) and \(Z_{i3}=0\) indicate a fourth-tier city.

The estimators of the unknown parameters and function in model (5.1) are computed by the method given in Sect. 2, with the tuning parameter m determined by the BIC criterion described in Sect. 2. Table 5 exhibits the parametric estimators, and Fig. 1a shows the estimated curve of \(\gamma (t)\) and its 95% confidence interval. We see from Table 5 that urban population, urban GDP, urban livability index, urban comprehensive competitiveness and urban development index have nonnegative effects, while bank interest rate has a negative effect. The fact that \(\beta _{02}>\beta _{03}>0\) in Table 5 indicates that the housing price for a third-tier city is larger than that for a fourth-tier city and the housing price for a second-tier city is larger than that for a third-tier city. We see from Fig. 1a that the estimated curve varies smoothly but rises rapidly in the tail, which shows that the average annual income of residents in recent years has a much stronger effect on housing prices.

Table 5 The parametric estimators for model (5.1)
Fig. 1

The solid lines are the estimated curves of \(\gamma (t)\), and the dotted lines are the corresponding \(95\%\) point-wise confidence intervals. The estimate in (a) is computed by (2.8), and that in (b) by (3.3)

Table 6 exhibits the penalized least squares estimators of the parameters computed by the procedure given in Sect. 3, and Fig. 1b shows the estimated curve of \(\gamma (t)\) computed by (3.3) and its 95% confidence interval. Table 6 shows that urban category, urban GDP, urban livability index and urban development index are important factors affecting housing prices. Comparing Fig. 1b with Fig. 1a, we see that the difference between the two is small.

Table 6 The penalized least squares estimators of the parameters for model (5.1)

To evaluate the prediction performance of our model and methods, we applied leave-one-out cross-validation to the data; that is, when predicting the housing price for the ith city, we omitted the data for this city when fitting the model. Figure 2 displays the boxplots of the absolute prediction errors \(|\widehat{\log (y_{j})}-\log (y_{j})|,\ j=1,\ldots ,197,\) for the method given in Sect. 2 and the penalized method given in Sect. 3. The mean values of these errors for the two methods are 0.2529 and 0.2521, respectively. These observations and Fig. 2 suggest that the penalized method is slightly better than the method given in Sect. 2.

Fig. 2

Boxplots of the absolute prediction errors \(|\widehat{\log (y_{j})}-\log (y_{j})|,\ j=1,\ldots ,197,\) for the two methods. Here, 1 is the boxplot for the method given in Sect. 2 and 2 is the boxplot for the penalized method given in Sect. 3