Abstract
We propose a new procedure for estimating the unknown parameters and the unknown slope function in partial functional linear regression. The asymptotic distribution of the estimator of the vector of slope parameters is derived, and the global convergence rate of the estimator of the slope function is established under a suitable norm. The convergence rate of the mean squared prediction error for the proposed estimators is also established. Based on the proposed estimation procedure, we further construct penalized regression estimators and establish their variable selection consistency and oracle properties. Finite sample properties of our procedures are studied through Monte Carlo simulations. A real estate data example is used to illustrate the proposed methodology.
1 Introduction
In the last two decades, there has been an increasing interest in regression models for functional variables as more and more data have arisen where the primary unit of observation can be viewed as a curve or in general a function, such as in biology, chemometrics, econometrics, geophysics, the medical sciences, meteorology and neurosciences. As a natural extension of the ordinary regression to the case where predictors include random functions and responses are scalars or functions, functional linear regression analysis provides valuable insights into these problems. The effectively infinite-dimensional character of functional data analysis is a source of many of its differences from more conventional multivariate analysis. The functional linear model has been extensively studied and successfully applied; see Cardot et al. (2003), Ramsay and Silverman (2002, 2005), Cai and Hall (2006), Hall and Horowitz (2007), Reiss and Ogden (2010), Brunel and Roche (2015), Hsing and Eubank (2015), among many others.
It is frequently the case that a response is related to both a vector of finite length and a function-valued random variable as predictor variables. With a square integrable random function X on a compact set \({\mathcal {T}}\) in R and a d-dimensional vector of random variables \(Z=(Z_{1},\ldots ,Z_{d})^{T}\), we suppose that the scalar response Y is linearly related to the predictor variables (X, Z) through the relationship
$$\begin{aligned} Y=Z^{T}\pmb {\beta }_{0}+\int _{{\mathcal {T}}}\gamma (t)X(t)\mathrm{d}t+\varepsilon , \end{aligned}$$(1.1)
where \(\pmb {\beta }_{0}\) is a \(d\times 1\) vector of regression coefficients of Z, \(\gamma (t)\) is a square integrable function on \({\mathcal {T}}\), and \(\varepsilon \) is a random error. Model (1.1) generalizes both the classical linear regression model and the functional linear regression model, which correspond to the cases \(\gamma (t)=0\) and \(\pmb {\beta }_{0}=0\), respectively. Moreover, this model includes the analysis of covariance model in which the covariate is a random function, i.e., the model represents functional linear models between a scalar variable Y and a function-valued random variable X for each group simultaneously, with the \(Z_{k}\) being scalar-valued indicator variables associated with subgroups. Zhang et al. (2007) proposed a two-stage functional mixed effects model to deal with measurement error and irregularly spaced time points and estimated the regression coefficient function using a two-stage nonparametric regression calibration method. Shin (2009) and Reiss and Ogden (2010) proposed estimators of \(\pmb {\beta }_{0}\) and \(\gamma (t)\) by generalizing the functional principal components estimation method for functional linear regression, and Shin and Lee (2012) considered prediction of a scalar variable based on both a function-valued variable and a finite number of real-valued variables.
In this paper, we propose a new method for estimating the unknown parameters and function in model (1.1). Using functional principal component analysis, the unknown slope function is approximated by an average value which involves the unknown parameters. The estimators of the unknown parameters are then obtained by solving a minimization problem. Although our method differs from those of Shin (2009) and Shin and Lee (2012), simulations and further derivation show that the estimators obtained by the two approaches behave identically. In fact, our estimators are simpler in expression and require less computation. Under conditions weaker than those of Shin (2009), we derive the asymptotic normality of the estimator of \(\pmb {\beta }_{0}\) and establish the global convergence rate of the estimator of the slope function \(\gamma (t)\). Since our assumptions are weaker than those of Shin (2009), the asymptotic distribution of the estimator of \(\pmb {\beta }_{0}\) differs from that of Shin (2009) and Shin and Lee (2012), and the proofs of our theorems are essentially different from those of Shin (2009). We also establish the convergence rate of the mean squared prediction error for a predictor. Based on the proposed estimation procedure, we further propose a family of variable selection procedures via penalized least squares using concave penalty functions. We show that the proposed penalized regression estimators have the variable selection consistency and oracle property of Fan and Li (2001).
Variable selection is particularly important when the true underlying model has a sparse representation. Identifying significant predictors enhances the prediction performance of the fitted model, and a penalty function generally facilitates variable selection in regression models. Various penalty functions have been used in the literature; bridge regression (Frank and Friedman 1993), the LASSO (Tibshirani 1996), the SCAD (Fan and Li 2001), the adaptive LASSO (Zou 2006) and the MCP (Zhang 2010) are well known. Liang and Li (2009) considered variable selection for partially linear models with measurement errors, Wang and Wang (2014) proposed adaptive Lasso estimators for ultrahigh-dimensional generalized linear models, and Aneiros et al. (2015) investigated variable selection in partial linear regression with a functional covariate. Fan et al. (2014) studied oracle optimality of folded concave penalized estimation.
The paper is organized as follows. Section 2 describes the estimation method and studies its asymptotic properties. Section 3 investigates an adaptive variable selection method and its asymptotic properties. Section 4 presents the finite sample behavior of the estimators. A real estate data example is given in Sect. 5. All proofs are relegated to the “Appendix.”
2 Estimation method and asymptotic results
Let Y be a real-valued random variable defined on a probability space \((\Omega , {\mathcal {B}}, P)\). Let Z be a d-dimensional vector of random variables with finite second moments, and let \(\{X(t): t\in {\mathcal {T}}\}\) be a zero-mean and second-order (i.e., \(EX(t)^{2}<\infty \) for all \(t\in {\mathcal {T}})\) stochastic process defined on \((\Omega , {\mathcal {B}}, P)\) with sample paths in \(L_{2}({\mathcal {T}})\), the set of all square integrable functions on \({\mathcal {T}}\), where \({\mathcal {T}}\) is a bounded closed interval. \(\varepsilon \) is a random error with mean zero and is independent of (X, Z). Let \(<\cdot ,\cdot>\) and \(\Vert \cdot \Vert \) represent, respectively, the \(L_{2}({\mathcal {T}})\) inner product and norm. Denote the covariance function of the process X(t) by \(K(s,t)=cov(X(s),X(t))\). We suppose that K(s, t) is positive definite, in which case it admits a spectral decomposition in terms of strictly positive eigenvalues \(\lambda _{j}\),
$$\begin{aligned} K(s,t)=\sum _{j=1}^{\infty }\lambda _{j}\phi _{j}(s)\phi _{j}(t), \end{aligned}$$
where \((\lambda _{j},\phi _{j})\) are (eigenvalue, eigenfunction) pairs for the linear operator with kernel K, the eigenvalues are ordered so that \(\lambda _{1}>\lambda _{2}>\cdots \) and the functions \(\phi _{1},\phi _{2},\ldots \) form an orthonormal basis for \(L_{2}({\mathcal {T}})\). This leads to the Karhunen–Loève representation
$$\begin{aligned} X(t)=\sum _{j=1}^{\infty }\xi _{j}\phi _{j}(t), \end{aligned}$$
where the \(\xi _{j}=\int _{{\mathcal {T}}}X(t)\phi _{j}(t)\mathrm{d}t\) are uncorrelated random variables with mean 0 and variance \(E\xi _{j}^{2}=\lambda _{j}\). Let \(\gamma (t)=\sum _{j=1}^{\infty }\gamma _{j}\phi _{j}(t)\); then model (1.1) can be written as
$$\begin{aligned} Y=Z^{T}\pmb {\beta }_{0}+\sum _{j=1}^{\infty }\gamma _{j}\xi _{j}+\varepsilon . \end{aligned}$$(2.2)
By (2.2), we have
$$\begin{aligned} \gamma _{j}=\frac{1}{\lambda _{j}}E[(Y-Z^{T}\pmb {\beta }_{0})\xi _{j}],\quad j\ge 1. \end{aligned}$$
Let \((X_{i}(t),Z_{i}, Y_{i}), i=1,\ldots ,n\), be independent realizations of (X(t), Z, Y) generated by model (1.1). Empirical versions of K and of its spectral decomposition are
$$\begin{aligned} \hat{K}(s,t)=\frac{1}{n}\sum _{i=1}^{n}X_{i}(s)X_{i}(t)=\sum _{j=1}^{\infty }\hat{\lambda }_{j}\hat{\phi }_{j}(s)\hat{\phi }_{j}(t). \end{aligned}$$
Analogously to the case of K, \((\hat{\lambda }_{j},\hat{\phi }_{j})\) are (eigenvalue, eigenfunction) pairs for the linear operator with kernel \(\hat{K}\), ordered such that \(\hat{\lambda }_{1}\ge \hat{\lambda }_{2}\ge \cdots \ge 0\). We take \((\hat{\lambda }_{j},\hat{\phi }_{j})\) and \(\hat{\xi } _{ij}=\langle X_{i},\hat{\phi }_{j}\rangle \) to be the estimators of \((\lambda _{j},\phi _{j})\) and \(\xi _{ij}=\langle X_{i},\phi _{j}\rangle ,\) respectively, and set
$$\begin{aligned} \tilde{\gamma }_{j}=\frac{1}{\hat{\lambda }_{j}}\left( \frac{1}{n}\sum _{l=1}^{n}(Y_{l}-Z_{l}^{T}\pmb {\beta })\hat{\xi }_{lj}\right) ,\quad j=1,\ldots ,m. \end{aligned}$$(2.4)
We use \(\sum _{j=1}^{m}\tilde{\gamma }_{j}\hat{\xi }_{j}\) to approximate \(\sum _{j=1}^{\infty }\gamma _{j}\xi _{j}\) in (2.2). Combining (2.2) and (2.4), we then solve the following minimization problem
$$\begin{aligned} \min _{\pmb {\beta }}\sum _{i=1}^{n}\left( Y_{i}-Z_{i}^{T}\pmb {\beta }-\sum _{j=1}^{m}\tilde{\gamma }_{j}\hat{\xi }_{ij}\right) ^{2} \end{aligned}$$(2.5)
to obtain the estimator of \(\pmb {\beta }_{0}\). Define \(\tilde{\xi }_{li}=\sum _{j=1}^{m}\frac{\hat{\xi }_{lj}\hat{\xi }_{ij}}{\hat{\lambda }_{j}}\), \(\tilde{Y}_{i}=Y_{i}-\frac{1}{n}\sum _{l=1}^{n}Y_{l}\tilde{\xi }_{li}\) and \(\tilde{Z}_{i}=Z_{i}-\frac{1}{n}\sum _{l=1}^{n}Z_{l}\tilde{\xi }_{li}.\) Then, (2.5) can be written as
$$\begin{aligned} \min _{\pmb {\beta }}\sum _{i=1}^{n}\left( \tilde{Y}_{i}-\tilde{Z}_{i}^{T}\pmb {\beta }\right) ^{2}. \end{aligned}$$(2.6)
Let \(\tilde{Y}=(\tilde{Y}_{1},\ldots ,\tilde{Y}_{n})^{T}\) and \(\tilde{Z}=(\tilde{Z}_{1},\ldots ,\tilde{Z}_{n})^{T}\). Then the estimator \(\hat{\pmb {\beta }}\) of \(\pmb {\beta }_{0}\) is given by
$$\begin{aligned} \hat{\pmb {\beta }}=(\tilde{Z}^{T}\tilde{Z})^{-1}\tilde{Z}^{T}\tilde{Y}. \end{aligned}$$(2.7)
The estimator of \(\gamma (t)\) is given by \(\hat{\gamma }(t)=\sum _{j=1}^{m}\hat{\gamma }_{j}\hat{\phi }_{j}(t)\) with
$$\begin{aligned} \hat{\gamma }_{j}=\frac{1}{\hat{\lambda }_{j}}\left( \frac{1}{n}\sum _{l=1}^{n}(Y_{l}-Z_{l}^{T}\hat{\pmb {\beta }})\hat{\xi }_{lj}\right) ,\quad j=1,\ldots ,m. \end{aligned}$$
To implement our estimation method, we need to choose the truncation level m. The value of m can be selected by leave-one-curve-out cross-validation of the prediction error, with CV function
$$\begin{aligned} \mathrm{CV}(m)=\sum _{i=1}^{n}\left( Y_{i}-Z_{i}^{T}\hat{\pmb {\beta }}^{-i}-\sum _{j=1}^{m}\hat{\gamma }_{j}^{-i}\hat{\xi }_{ij}\right) ^{2}, \end{aligned}$$
where \(\hat{\gamma }_{j}^{-i},j=1,\ldots ,m\), and \(\hat{\pmb {\beta }}^{-i}\) are computed after removing \((X_{i}, Z_{i}, Y_{i})\). As an alternative to cross-validation, m can also be chosen by the BIC information criterion, regarded as a function of m.
Large values of BIC indicate poor fits.
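To make the procedure concrete, the following is a minimal numerical sketch of the estimator (2.7) and of \(\hat{\gamma }(t)\), assuming the curves are fully observed on a common uniform grid over [0, 1] and that m is given; the function pflr_fit and all variable names are our own illustration, not code from the paper.

```python
import numpy as np

def pflr_fit(X, Z, Y, m):
    """Sketch of the Sect. 2 estimator. X: (n, T) centered curves on a uniform
    grid over [0, 1]; Z: (n, d) scalar covariates; Y: (n,) responses;
    m: truncation level for the functional principal components."""
    n, T = X.shape
    dt = 1.0 / (T - 1)
    # empirical covariance \hat K and its (eigenvalue, eigenfunction) pairs
    evals, evecs = np.linalg.eigh((X.T @ X / n) * dt)
    order = np.argsort(evals)[::-1][:m]
    lam = evals[order]                               # \hat lambda_1 >= ... >= \hat lambda_m
    phi = evecs[:, order] / np.sqrt(dt)              # \hat phi_j with unit L2 norm
    xi = X @ phi * dt                                # scores \hat xi_{ij} = <X_i, \hat phi_j>
    S = (xi / lam) @ xi.T                            # S[l, i] = \tilde xi_{li}
    Yt = Y - S @ Y / n                               # \tilde Y_i
    Zt = Z - S @ Z / n                               # \tilde Z_i
    beta = np.linalg.solve(Zt.T @ Zt, Zt.T @ Yt)     # (2.7)
    gam = (xi.T @ (Y - Z @ beta) / n) / lam          # \hat gamma_j
    return beta, phi @ gam, lam, phi, xi             # phi @ gam is \hat gamma on the grid
```

Cross-validation over m then amounts to refitting pflr_fit with each observation held out in turn and accumulating the squared prediction errors.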
Remark 2.1
Noting that \(\hat{\xi } _{ij}=\langle X_{i},\hat{\phi }_{j}\rangle \), it can easily be shown that our estimators coincide in performance with the estimators given in Shin (2009) and Shin and Lee (2012). However, our estimators are simpler in expression and require less computation.
In the following, we derive asymptotic normality of the estimator \(\hat{\pmb {\beta }}\) and the rate of convergence for the estimator \(\hat{\gamma }(t)\). We make the following assumptions.
Assumption 1
X has finite fourth moment, in that \(\int _{{\mathcal {T}}}E(X^{4})<\infty \), and for each j, \(E(\xi _{j}^{4})<C_{1}\lambda _{j}^{2}\) for some constant \(C_{1}\).
Assumption 2
There exists a convex function \(\varphi \) defined on the interval [0, 1] such that \(\varphi (0) = 0\) and \(\lambda _{j}=\varphi (1/j)\) for \(j\ge 1\).
Assumption 3
For Fourier coefficients \(\gamma _{j}\), there exist constants \(C_{2}>0\) and \(\delta >3/2\) such that \(|\gamma _{j}|\le C_{2}j^{-\delta }\) for all \(j\ge 1\).
Assumption 4
\(m\rightarrow \infty \) and \(n^{-1/2}m\lambda _{m}^{-1}\rightarrow 0\).
Assumption 5
\(E(\Vert Z\Vert ^{4})<+\,\infty \).
Assumptions 1 and 3 are standard conditions for functional linear models; see, e.g., Cai and Hall (2006) and Hall and Horowitz (2007). Assumption 2 is slightly less restrictive than (3.2) of Hall and Horowitz (2007). Assumption 4 can be easily verified and will be further discussed below.
Remark 2.2
Assumptions 2 and 4 are weaker than the assumptions for \(\lambda _{j}\) and m, respectively, in Shin (2009) and Shin and Lee (2012).
We first establish the asymptotic distribution of the estimator \(\hat{\pmb {\beta }}\). To derive its asymptotic normality, we need to adjust for the dependence between \(Z=(Z_{1},\ldots ,Z_{d})^{T}\) and X(t), which is a common complication in semiparametric models. Let \({\mathcal {G}}\) denote the class of random variables such that \(G\in {\mathcal {G}}\) if \(G=\sum _{j=1}^{\infty }g_{j}\xi _{j}\) and \(|g_{j}|\le C_{3}j^{-\delta }\) for all \(j\ge 1 \), where \(\delta \) is defined in Assumption 3 and \(C_{3}>0\) is a constant. Note that \({\mathcal {G}}\) is related to the term \(\sum _{j=1}^{\infty }\gamma _{j}\xi _{j}\) on the right side of (2.2). Denote \(G_{r} =\sum _{j=1}^{\infty }g_{rj}\xi _{j}\). Let
$$\begin{aligned} G_{r}^{*}=\mathop {\arg \min }_{G_{r}\in {\mathcal {G}}}E(Z_{r}-G_{r})^{2},\quad r=1,\ldots ,d. \end{aligned}$$
Since \(G_{r}\) is a function of X,
$$\begin{aligned} E(Z_{r}-G_{r})^{2}=E[Z_{r}-E(Z_{r}|X)]^{2}+E[E(Z_{r}|X)-G_{r}]^{2}, \end{aligned}$$
therefore,
$$\begin{aligned} G_{r}^{*}=\mathop {\arg \min }_{G_{r}\in {\mathcal {G}}}E[E(Z_{r}|X)-G_{r}]^{2}. \end{aligned}$$
Thus, \(G_{r}^{*}\) is the projection of \(E(Z_{r}|X)\) onto the space \({\mathcal {G}}\); in other words, \(G_{r}^{*}\) is the element of \({\mathcal {G}}\) closest to \(E(Z_{r}|X)\) among all random variables in \({\mathcal {G}}\). Let \(H_{r}=Z_{r}-G_{r}^{*}\) for \(r=1,\ldots ,d\), and \(H=(H_{1},\ldots ,H_{d})^{T}\). We then have the following results.
Theorem 2.1
Suppose that Assumptions 1–5 hold and that \(\Omega =E(HH^{T})\) is invertible. Then
$$\begin{aligned} \sqrt{n}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})\rightarrow _{d}N(0,\Omega ^{-1}\sigma ^{2}), \end{aligned}$$(2.9)
where \(\sigma ^{2}=E(\varepsilon ^{2})\) and \(\rightarrow _{d}\) denotes convergence in distribution.
Remark 2.3
When the model is changed from the functional linear model to the partial functional linear model, the key to deriving the asymptotic normality of the estimator \(\hat{\pmb {\beta }}\) is to handle the relation between the vector Z and X(t). In our analysis, \(Z_{r}, r=1,\ldots ,d\), are divided into two unrelated parts \(G_{r}^{*}=\sum _{j=1}^{\infty }g_{rj}^{*}\xi _{j}\) and \(H_{r}\). Consequently, (2.2) can be written as
$$\begin{aligned} Y=\sum _{r=1}^{d}H_{r}\beta _{0r}+\sum _{j=1}^{\infty }\left( \gamma _{j}+\sum _{r=1}^{d}g_{rj}^{*}\beta _{0r}\right) \xi _{j}+\varepsilon , \end{aligned}$$
where \(\pmb {\beta }_{0}=(\beta _{01},\ldots ,\beta _{0d})^{T}\). If \(Z_{r}=\sum _{j=1}^{\infty }\tilde{g}_{rj}\xi _{j}+V_{r}\) and \(V_{r}\) is independent of X(t), then \(G_{r}^{*}=\sum _{j=1}^{\infty }\tilde{g}_{rj}\xi _{j}\) and \(H_{r}=V_{r}\). If \(Z_{r}\) is independent of X(t), then \(G_{r}^{*}=0\) and \(H_{r}=Z_{r}\). If \(E(Z_{r}|X(t))=\sum _{j=1}^{\infty }\bar{g}_{rj}\xi _{j}\), then \(G_{r}^{*}=\sum _{j=1}^{\infty }\bar{g}_{rj}\xi _{j}\) and \(H_{r}=Z_{r}-G_{r}^{*}\). In Shin (2009) and Shin and Lee (2012), it is assumed that \(E(Z_{r}|X(t))=\sum _{j=1}^{\infty }\lambda _{j}^{-1}<K_{Z_{r}X},\phi _{j}>\xi _{j}\), where \(K_{Z_{r}X}=cov(Z_{r},X)\) for \(r=1,\ldots ,d\). In this case, \(G_{r}^{*}=\sum _{j=1}^{\infty }\lambda _{j}^{-1}<K_{Z_{r}X},\phi _{j}>\xi _{j}\) and \(H_{r}=Z_{r}-G_{r}^{*}\), and the result of our Theorem 2.1 is the same as that of Theorem 3.1 in Shin (2009). Hence, Theorem 3.1 of Shin (2009) is a special case of our Theorem 2.1.
Next we establish the global convergence rate of the estimator \(\hat{\gamma }(t)\).
Theorem 2.2
Assume that Assumptions 1–5 hold and that \(n^{-1}m^{2}\lambda _{m} ^{-1}\log m\rightarrow 0\). Then
If \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\), \(\delta >2\) and \(\delta >1+\tau /2\), then \(\sum _{j=1}^{m}j^{3}\gamma _{j} ^{2}\lambda _{j}^{-2}\le C_{4}(\log m+m^{2\tau +4-2\delta })\) and \(\sum _{j=1}^{m}\gamma _{j}^{2}\lambda _{j}^{-1}<+\,\infty \), where \(C_{4}\) is a positive constant. We then have the following corollary.
Corollary 2.1
Under Assumptions 1–5, if \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\) and \(\delta >\max (2,1+\tau /2)\), then it holds that
$$\begin{aligned} \Vert \hat{\gamma }-\gamma \Vert ^{2}=O_{p}\left( n^{-(2\delta -1)/(\tau +2\delta )}\right) . \end{aligned}$$(2.11)
The global convergence result (2.11) indicates that the estimator \(\hat{\gamma }(t)\) attains the same convergence rate as the estimators of Hall and Horowitz (2007), which is optimal in the minimax sense.
Let \({\mathcal {S}}=\{(Z_{i},X_{i},Y_{i}): 1\le i\le n\}\). In the following, for a new pair of predictor variables \((Z_{n+1}, X_{n+1})\) drawn from the same population as the data and independent of them, we derive the convergence rate of the mean squared prediction error (MSPE) given by
$$\begin{aligned} \mathrm{MSPE}=E\left[ \left( Z_{n+1}^{T}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})+\int _{{\mathcal {T}}}(\hat{\gamma }(t)-\gamma (t))X_{n+1}(t)\mathrm{d}t\right) ^{2}\Big |\,{\mathcal {S}}\right] . \end{aligned}$$
Theorem 2.3
Under Assumptions 1, 3 and 5, if \(\lambda _{j}\sim j^{-\tau }\), \(\tau >1\), \(m\sim n^{1/(\tau +2\delta )}\) and \(\delta >\max (2,1+\tau /2)\), then
$$\begin{aligned} \mathrm{MSPE}=O_{p}\left( n^{-(\tau +2\delta -1)/(\tau +2\delta )}\right) . \end{aligned}$$(2.12)
Remark 2.4
In practical applications, X(t) is only observed at discrete points. Without loss of generality, suppose \({\mathcal {T}}=[0,1]\) and, for each \(i=1,\ldots ,n\), \(X_{i}(t)\) is observed at \(n_{i}\) discrete points \(0=t_{i1}<\cdots <t_{in_{i}}=1\). Typically, \(\max _{i}\max _{1\le j\le n_{i}-1}(t_{i(j+1)}-t_{ij})\rightarrow 0\) as \(n\rightarrow \infty \) is also assumed. Based on the discrete observations, linear or spline interpolation can be used to reconstruct \(X_{i}(t)\) for each \(i=1,\ldots ,n\). For example, we can use the linear interpolation function
$$\begin{aligned} \hat{X}_{i}(t)=X_{i}(t_{ij})+\frac{t-t_{ij}}{t_{i(j+1)}-t_{ij}}\left[ X_{i}(t_{i(j+1)})-X_{i}(t_{ij})\right] ,\quad t\in [t_{ij},t_{i(j+1)}], \end{aligned}$$
as the estimator of \(X_{i}(t)\). It should be pointed out that if \(X_{i}(t),i=1,\ldots ,n\), are replaced by \(\hat{X}_{i}(t),i=1,\ldots ,n\), the conclusions of Theorems 2.1–2.3 no longer hold as stated. It is difficult to establish the related asymptotic properties by our current approach, and further research is expected.
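A sketch of this preprocessing step using linear interpolation (the names interpolate_curves, obs_times and obs_values are ours):

```python
import numpy as np

def interpolate_curves(obs_times, obs_values, grid):
    """Linearly interpolate each discretely observed curve onto a common grid.
    obs_times[i], obs_values[i]: the points t_{ij} and values X_i(t_{ij})."""
    return np.vstack([np.interp(grid, t, x) for t, x in zip(obs_times, obs_values)])

# e.g. map all curves onto 100 equally spaced points on [0, 1]:
# Xhat = interpolate_curves(times, values, np.linspace(0.0, 1.0, 100))
```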
3 Variable selection for partial functional linear model
In the variable selection problem, it is assumed that some components of \(\pmb {\beta _{0}}\) in model (1.1) are equal to zero. The goal is to identify and estimate the subset model. It has been argued that folded concave penalties are preferable to convex penalties such as the \(L_{1}\)-penalty in terms of both model-estimation accuracy and variable selection consistency (Lv and Fan 2009; Fan and Lv 2011). Let \(p_{\nu _{n}}(|u|)=p_{a,\nu _{n}}(|u|)\) be general folded concave penalty functions defined on \(u\in (-\,\infty ,+\,\infty )\) satisfying
(a) \(p_{\nu _{n}}(u)\) is increasing and concave in \(u\in [0,+\,\infty )\);
(b) \(p_{\nu _{n}}(u)\) is differentiable in \(u\in (0,+\,\infty )\) with \(p_{\nu _{n}}^{\prime }(0):=p_{\nu _{n}}^{\prime }(0+)\ge a_{1} \nu _{n}\), \(p_{\nu _{n}}^{\prime }(u)\ge a_{1}\nu _{n}\) for \(u\in (0,a_{2}\nu _{n}]\), \(p_{\nu _{n}}^{\prime }(u)\le a_{3} \nu _{n}\) for \(u\in [0,+\,\infty )\), and \(p_{\nu _{n}}^{\prime }(u)=0\) for \(u\in [a\nu _{n},+\,\infty )\) with a prespecified constant \(a>a_{2}\), where \(a_{1}\), \(a_{2}\) and \(a_{3}\) are fixed positive constants.
The above family of general folded concave penalties contains several popular penalties, including the SCAD penalty (Fan and Li 2001), the derivative of which is given by
$$\begin{aligned} p_{\nu _{n}}^{\prime }(u)=\nu _{n}\left\{ I(u\le \nu _{n})+\frac{(a\nu _{n}-u)_{+}}{(a-1)\nu _{n}}I(u>\nu _{n})\right\} ,\quad a>2, \end{aligned}$$
and the MCP penalty (Zhang 2010), the derivative of which is given by
$$\begin{aligned} p_{\nu _{n}}^{\prime }(u)=\left( \nu _{n}-\frac{u}{a}\right) _{+},\quad a>1. \end{aligned}$$
It is easy to see that \(a_{1}=a_{2}=a_{3}=1\) for the SCAD, and \(a_{1} =1-a^{-1}\), \(a_{2}=a_{3}=1\) for the MCP.
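Both derivatives translate directly into code; a short sketch (the function names and default a values are ours, with \(a=3.7\) the usual SCAD choice):

```python
import numpy as np

def scad_deriv(u, nu, a=3.7):
    """Derivative of the SCAD penalty (Fan and Li 2001), evaluated at |u|."""
    u = np.abs(u)
    return nu * ((u <= nu) + np.maximum(a * nu - u, 0.0) / ((a - 1.0) * nu) * (u > nu))

def mcp_deriv(u, nu, a=3.0):
    """Derivative of the MCP penalty (Zhang 2010), evaluated at |u|."""
    return np.maximum(nu - np.abs(u) / a, 0.0)
```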
Based on the above analysis, we define a penalized least squares estimator of \(\pmb {\beta }_{0}\) as
$$\begin{aligned} \hat{\pmb {\beta }}_\mathrm{PLS}=\mathop {\arg \min }_{\pmb {\beta }}\left\{ (\tilde{Y}-\tilde{Z}\pmb {\beta })^{T}(\tilde{Y}-\tilde{Z}\pmb {\beta })+n\sum _{r=1}^{d}p_{\nu _{n}}^{\prime }(|\beta _{r}^{(0)}|)|\beta _{r}|\right\} , \end{aligned}$$(3.1)
where \({\pmb {\beta }}^{(0)}=(\beta _{1}^{(0)},\ldots ,\beta _{d}^{(0)})^{T}\) is an initial estimator of \(\pmb {\beta }_{0}\). For example, \({\pmb {\beta }}^{(0)}\) can be obtained from (2.7) in Sect. 2.
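Once the weights \(p_{\nu _{n}}^{\prime }(|\beta _{r}^{(0)}|)\) are fixed, (3.1) is a weighted \(L_{1}\) problem and can be solved by coordinate descent with soft-thresholding. The sketch below reuses scad_deriv from above and the \(\tilde{Z}\), \(\tilde{Y}\) produced by pflr_fit; it is one possible solver under our reading of (3.1), not the authors' implementation.

```python
def penalized_pls(Zt, Yt, beta0, nu, a=3.7, n_iter=200):
    """Minimize ||Yt - Zt @ b||^2 + n * sum_r w_r |b_r|, w_r = p'_nu(|beta0_r|),
    by cyclic coordinate descent with soft-thresholding."""
    n, d = Zt.shape
    w = scad_deriv(beta0, nu, a)                 # weights from the initial estimator
    beta = np.asarray(beta0, dtype=float).copy()
    col_ss = (Zt ** 2).sum(axis=0)               # sum_i Ztilde_{ir}^2
    for _ in range(n_iter):
        for r in range(d):
            resid = Yt - Zt @ beta + Zt[:, r] * beta[r]      # partial residual
            z = Zt[:, r] @ resid
            beta[r] = np.sign(z) * max(abs(z) - n * w[r] / 2.0, 0.0) / col_ss[r]
    return beta
```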
In the following, we show that the penalized least squares estimator defined by (3.1) has the oracle property (Fan and Li 2001). Without loss of generality, let \(\pmb {\beta }=({\pmb {\beta }}_{1}^{T},\pmb {\beta }_{2}^{T})^{T}\), where \(\pmb {\beta }_{1}\in \mathbf {R}^{d_{1}}\) and \(\pmb {\beta }_{2}\in \mathbf {R}^{d-d_{1}}\). The vector of true parameters is denoted by \(\pmb {\beta }_{0}=(\pmb {\beta }_{01} ^{T},\pmb {\beta }_{02}^{T})^{T}\) with each element of \(\pmb {\beta }_{01}\) being nonzero and \(\pmb {\beta }_{02}=0\).
Theorem 3.1
Suppose that the conditions of Theorem 2.1 hold. Let \(p_{\nu _{n}}(\cdot )\) be general folded concave penalty functions satisfying assumptions (a) and (b) above and \({\pmb {\beta }}^{(0)}\) be the estimator defined by (2.7). If \(\nu _{n}\rightarrow 0\) and \(\sqrt{n}\nu _{n}\rightarrow \infty \) as \(n\rightarrow \infty \), then the penalized least squares estimator \(\hat{\pmb {\beta }}_\mathrm{PLS} =(\hat{\pmb {\beta }}_{\mathrm{PLS}1}^{T},\hat{\pmb {\beta }}_{\mathrm{PLS}2}^{T})^{T}\) defined by (3.1) satisfies
(1) Sparsity: \(P(\hat{\pmb {\beta }}_{\mathrm{PLS}2}=0)\rightarrow 1.\)
(2) Asymptotic normality:
$$\begin{aligned} \sqrt{n}(\hat{\pmb {\beta }}_{\mathrm{PLS}1}-\pmb {\beta }_{01})\rightarrow _{d}N(0,\Omega _{1}^{-1}\sigma ^{2}), \end{aligned}$$(3.2)
where \(\Omega _{1}=E[(H_{1},\ldots ,H_{d_{1}})^{T}(H_{1},\ldots ,H_{d_{1}})]\).
Let
$$\begin{aligned} \hat{\gamma }_{\mathrm{PLS}j}=\frac{1}{\hat{\lambda }_{j}}\left( \frac{1}{n}\sum _{l=1}^{n}(Y_{l}-Z_{l}^{T}\hat{\pmb {\beta }}_\mathrm{PLS})\hat{\xi }_{lj}\right) ,\quad j=1,\ldots ,m, \end{aligned}$$(3.3)
and \(\hat{\gamma }_\mathrm{PLS}(t)=\sum _{j=1}^{m}\hat{\gamma }_{\mathrm{PLS}j}\hat{\phi }_{j}(t)\). We then have the following theorem.
Theorem 3.2

Under the conditions of Theorem 3.1, the conclusions of Theorem 2.2, Corollary 2.1 and Theorem 2.3 continue to hold with \(\hat{\gamma }(t)\) replaced by \(\hat{\gamma }_\mathrm{PLS}(t)\), under the corresponding additional conditions stated there.
4 Simulation results
Since our estimators perform the same as those of Shin (2009) and Shin and Lee (2012), in this section we only investigate the finite sample performance of the penalized least squares estimators proposed in Sect. 3 through a Monte Carlo study. The data sets were generated from the model
$$\begin{aligned} Y_{i}=Z_{i}^{T}\pmb {\beta }_{0}+\int _{{\mathcal {T}}}\gamma (t)X_{i}(t)\mathrm{d}t+\varepsilon _{i},\quad i=1,\ldots ,n, \end{aligned}$$(4.1)
with \({\mathcal {T}}=[0,1]\) and \(\pmb {\beta }_{0}=(2,0,1.5,0,0.3)^{T}\). We took \(\gamma (t)=\sum _{j=1}^{50}\gamma _{j}\phi _{j}(t)\) and \(X_{i}(t)=\sum _{j=1}^{50}\xi _{ij}\phi _{j}(t)\), where \(\gamma _{1}=0.3\) and \(\gamma _{j}=4(-1)^{j+1}j^{-\delta },j\ge 2\); \(\phi _{1}(t)\equiv 1\) and \(\phi _{j}(t)=2^{1/2}\cos ((j-1)\pi t),j\ge 2\); the \(\xi _{ij}\)’s were independent \(N(0, \lambda _{j})\) variables. We let \(Z_{i}=(Z_{i1},\ldots ,Z_{i5})^{T}\), conditionally on the \(\xi _{ij}\), follow a multivariate normal distribution with mean vector \(((1+\lambda _{1})^{-1/2}\xi _{i1},\ldots , (1+\lambda _{5})^{-1/2}\xi _{i5})^{T}\) and variance-covariance matrix \(V=(v_{kl})\) with \(v_{kk}=(1+\lambda _{k})^{-1}\) and \(v_{kl}= 0.7((1+\lambda _{k})(1+\lambda _{l}))^{-1/2}\) for \(k\ne l\), \(k,l =1,\ldots ,5\), so that \(Z_{i}\) has a multivariate normal distribution with zero-mean vector and a variance-covariance matrix whose diagonal elements are 1 and whose off-diagonal elements are \(v_{kl}\). The errors \(\varepsilon _{i}\) were normally distributed with mean 0 and standard deviation 0.5. Similar to Shin and Lee (2012), we used 4 different sets of eigenvalues \(\{\lambda _{j}\}\). In two of the settings, the eigenvalues are well spaced, \(\lambda _{j}=j^{-\tau }\), with different values of \(\tau \). In the other two settings, the eigenvalues are “closely spaced” as in Hall and Horowitz (2007): \(\lambda _{1}= 1\), \(\lambda _{j} =0.2^{2}(1-0.0001j)^{2}\) if \(2 \le j\le 4\), and \(\lambda _{5j+k}= 0.2^{2}\{(5j)^{-\tau /2}-0.0001k\}^{2}\) for \(j\ge 1\) and \(0\le k \le 4\). The four settings are as follows (a data-generating sketch is given after the list):
1. Set \(\tau =1.1\) and \(\delta =2\) with the well-spaced eigenvalues.
2. Set \(\tau =1.1\) and \(\delta =2\) with the closely spaced eigenvalues.
3. Set \(\tau =3\) and \(\delta =2\) with the well-spaced eigenvalues.
4. Set \(\tau =3\) and \(\delta =2\) with the closely spaced eigenvalues.
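For reference, here is a sketch of the data-generating mechanism for the well-spaced settings (the closely spaced eigenvalue sequence is substituted analogously; all names are ours):

```python
import numpy as np

def simulate(n, tau, delta=2.0, J=50, seed=0):
    """Generate (Z, xi, Y) from model (4.1) with lambda_j = j^{-tau}."""
    rng = np.random.default_rng(seed)
    j = np.arange(1, J + 1)
    lam = j ** (-tau)
    gamma = 4.0 * (-1.0) ** (j + 1) * j ** (-delta)
    gamma[0] = 0.3
    xi = rng.normal(scale=np.sqrt(lam), size=(n, J))        # xi_{ij} ~ N(0, lambda_j)
    s = 1.0 / np.sqrt(1.0 + lam[:5])
    V = 0.7 * np.outer(s, s)                                # v_{kl} for k != l
    np.fill_diagonal(V, s ** 2)                             # v_{kk} = (1 + lambda_k)^{-1}
    Z = xi[:, :5] * s + rng.multivariate_normal(np.zeros(5), V, size=n)
    beta0 = np.array([2.0, 0.0, 1.5, 0.0, 0.3])
    eps = rng.normal(scale=0.5, size=n)
    Y = Z @ beta0 + xi @ gamma + eps    # int gamma(t) X_i(t) dt = sum_j gamma_j xi_{ij}
    return Z, xi, Y
```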
All the results in this section are based on 500 replications. In all simulated designs, we used the SCAD penalty function with \(a=3.7\). We set the sample size n to 100 and 200, respectively. For each simulated data set, the penalized least squares estimators \(\hat{\pmb {\beta }}_\mathrm{PLS}\) and \(\hat{\gamma }_\mathrm{PLS}(t)\) were computed by the procedures given in Sects. 2 and 3. The tuning parameter m was determined by the BIC criterion described in Sect. 2, and the tuning parameter \(\nu _{n}\) in (3.1) was selected by the method of Fan et al. (2014).
We measured the estimation accuracy of the parametric estimators by the average \(l_{1}\)-losses \(|\hat{\beta }_{1}-\beta _{1}|\), \(|\hat{\beta }_{3}-\beta _{3}|\) and \(|\hat{\beta }_{5}-\beta _{5}|\) over 500 replications. We also evaluated the selection accuracy by the average counts of false positives (FP) and false negatives (FN) over the 500 replications, that is, the number of noise covariates included in the model and the number of signal covariates not included. Table 1 displays the simulation results for model (4.1). We see from Table 1 that the average \(l_{1}\)-loss, FP and FN tend to decrease as n increases, and that the average \(l_{1}\)-loss tends to decrease as \(\tau \) increases. Table 1 also shows that the FPs and FNs for Settings 1 and 3 with the well-spaced eigenvalues are smaller than those for Settings 2 and 4 with the closely spaced eigenvalues, while the FP and FN for Setting 4 are smaller than those for Setting 2.
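The FP and FN counts reported in the tables can be computed as follows (a sketch; the zero-threshold tol is our assumption):

```python
import numpy as np

def fp_fn(beta_hat, beta_true, tol=1e-8):
    """FP: noise covariates selected; FN: signal covariates missed."""
    sel = np.abs(beta_hat) > tol
    sig = np.abs(beta_true) > tol
    return int(np.sum(sel & ~sig)), int(np.sum(~sel & sig))
```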
Table 2 reports the integrated squared bias (\(\hbox {Bias}^{2}\)), integrated variance (Var) and mean integrated squared error (MISE) of the estimator \(\hat{\gamma }(t)\), computed on a grid of 100 equally spaced points on \({\mathcal {T}}\). Table 2 shows a general tendency for the MISE to decrease as \(\tau \) increases. We also see from Table 2 that the MISEs for Settings 1 and 3 with the well-spaced eigenvalues are smaller than those for Settings 2 and 4 with the closely spaced eigenvalues.
In the following, we investigate variable selection for high-dimensional data. In (4.1), let \(Z_{i}=(Z_{i1},\ldots ,Z_{i30})^{T}\), where \(Z_{i1},\ldots ,Z_{i5}\) are the same as above, \(Z_{i6},\ldots ,Z_{i30}\) are mutually independent, independent of \(Z_{i1},\ldots ,Z_{i5}\), and \(Z_{ij}\sim N(0,1)\) for \(j=6,\ldots ,30\), and \(\pmb {\beta }_{0}=(2,0,1.5,0,0.3,0,\ldots ,0)^{T}\). The simulation results for this high-dimensional setting are reported in Tables 3 and 4, which show conclusions similar to those in Tables 1 and 2. Comparing Table 3 with Table 1 and Table 4 with Table 2, we see that our penalized least squares estimators also behave well with high-dimensional data.
5 A real data example
In this section, we illustrate the proposed methodology on a real data set. We analyze a real estate data set collected from the statistical yearbooks of various cities, real estate market reports and statistical bulletins on national economic and social development in China. It covers 197 second-, third- and fourth-tier cities in China and contains the average annual income of urban residents from 2000 to 2016; all other variables are recorded for 2016. Our purpose is to study the relationship between urban housing prices and their influencing factors. The response variable Y represents the urban housing price. Since it takes many years of savings for an average resident to buy a house, we choose the average annual income of the residents as the functional covariate. Let \(X_{i}^{*}(t)\) denote the average annual income of the residents of the \(i\hbox {th}\) city in year t and \(X_{i}(t)=X_{i}^{*}(t)-\bar{X}^{*}(t)\), where \(\bar{X}^{*}(t)=\frac{1}{197}\sum _{i=1}^{197}X_{i}^{*}(t)\). The scalar covariates of primary interest include urban category (\(Z_{2},Z_{3}\)), urban population (\(Z_{4}\)), urban GDP (\(Z_{5}\)), bank interest rate (\(Z_{6}\)), urban livability index (\(Z_{7}\)), urban comprehensive competitiveness (\(Z_{8}\)) and urban development index (\(Z_{9}\)). The scales of these variables differ greatly: the values of some, such as \(Z_{4}\) and \(Z_{5}\), are very large, whereas those of others, such as \(Z_{6}\), are small. We therefore first rescale them as follows. Let \(\bar{z}_{i4}\), \(i=1,\ldots ,197\), be the observations of \(Z_{4}\), and let \(z_{i4} = \bar{z}_{i4}/\max _{i} \bar{z}_{i4}\), \(i=1,\ldots ,197\), so that the maximum of the rescaled \(Z_{4}\) is 1. The variables \(Z_{5},\ldots ,Z_{9}\) are rescaled in the same fashion. We construct the following partial functional linear model:
$$\begin{aligned} \log (Y_{i})=\sum _{k=1}^{9}Z_{ik}\beta _{0k}+\int _{{\mathcal {T}}}\gamma (t)X_{i}(t)\mathrm{d}t+\varepsilon _{i},\quad i=1,\ldots ,197, \end{aligned}$$(5.1)
where \(Z_{i1}\equiv 1\); \(Z_{i2}=1\) and \(Z_{i3}=0\) indicate a second-tier city, \(Z_{i2}=0\) and \(Z_{i3}=1\) a third-tier city, and \(Z_{i2}=0\) and \(Z_{i3}=0\) a fourth-tier city.
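The rescaling step is a column-wise division by the maximum. Assuming the scalar covariates are stacked in a 197-row array Zraw whose first three columns hold the intercept and the two tier dummies (a hypothetical layout):

```python
Z = Zraw.copy()
Z[:, 3:] = Z[:, 3:] / Z[:, 3:].max(axis=0)   # rescale Z4,...,Z9 so each has maximum 1
```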
The estimators of the unknown parameters and function in model (5.1) are computed by the method given in Sect. 2, with the tuning parameter m determined by the BIC criterion described in Sect. 2. Table 5 exhibits the parametric estimators, and Fig. 1a shows the estimated curve of \(\gamma (t)\) and its 95% confidence interval. We see from Table 5 that urban population, urban GDP, urban livability index, urban comprehensive competitiveness and urban development index have nonnegative effects, while the bank interest rate has a negative effect. The fact that \(\beta _{02}>\beta _{03}>0\) in Table 5 indicates that the housing price in a third-tier city is higher than in a fourth-tier city, and the housing price in a second-tier city is higher than in a third-tier city. We see from Fig. 1a that the estimated curve varies smoothly but shows a rapid upward trend in the tail, which indicates that in recent years the average annual income of residents has had a much stronger effect on housing prices.
Table 6 exhibits the penalized least squares estimators of the parameters computed by the procedure given in Sect. 3, and Fig. 1b shows the estimated curve of \(\gamma (t)\) computed by (3.3), together with its 95% confidence interval. Table 6 shows that urban category, urban GDP, urban livability index and urban development index are important factors affecting housing prices. Comparing Fig. 1b with Fig. 1a, we see that the two differ little.
To evaluate the prediction performance of our model and methods, we applied leave-one-out cross-validation to the data; that is, when predicting the housing price of the ith city, we omitted the data for that city when fitting the model. Figure 2 displays the boxplots of the absolute prediction errors \(|\widehat{\log (y_{j})}-\log (y_{j})|,\ j=1,\ldots ,197,\) for the method given in Sect. 2 and the penalized method given in Sect. 3. The mean values of these errors for the two methods are 0.2529 and 0.2521, respectively. These observations and Fig. 2 suggest that the penalized method is slightly better than the method of Sect. 2.
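The leave-one-out errors can be reproduced by refitting the Sect. 2 estimator once per city; a sketch reusing pflr_fit, where X, Z and logY are the prepared data and dt is the grid spacing of the income curves (m = 5 is an assumed truncation level, not the value used in the paper):

```python
import numpy as np

n, m = 197, 5
errs = np.empty(n)
for i in range(n):
    keep = np.arange(n) != i
    beta, gamma_hat, _, _, _ = pflr_fit(X[keep], Z[keep], logY[keep], m)
    pred = Z[i] @ beta + np.sum(gamma_hat * X[i]) * dt   # Z_i^T beta + int gamma X_i
    errs[i] = abs(pred - logY[i])
print(errs.mean())
```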
References
Aneiros, G., Ferraty, F., Vieu, P.: Variable selection in partial linear regression with functional covariate. Statistics 49, 1322–1347 (2015)
Brunel, E., Roche, A.: Penalized contrast estimation in functional linear models with circular data. Statistics 49, 1298–1321 (2015)
Cai, T.T., Hall, P.: Prediction in functional linear regression. Ann. Stat. 34, 2159–2179 (2006)
Cardot, H., Ferraty, F., Sarda, P.: Spline estimators for the functional linear model. Stat. Sin. 13, 571–591 (2003)
Cardot, H., Mas, A., Sarda, P.: CLT in functional linear models. Probab. Theory Relat. Fields 138, 325–361 (2007)
Fan, J., Li, R.: Variable selection via nonconcave penalized likelihood and its oracle properties. J. Am. Stat. Assoc. 96, 1348–1360 (2001)
Fan, J., Lv, J.: Non-concave penalized likelihood with np-dimensionality. IEEE Trans. Inf. Theory 57, 5467–5484 (2011)
Fan, J., Xue, L., Zou, H.: Strong oracle optimality of folded concave penalized estimation. Ann. Stat. 42, 819–849 (2014)
Frank, I., Friedman, J.: A statistical view of some chemometrics regression tools (with discussion). Technometrics 35, 109–135 (1993)
Hall, P., Horowitz, J.L.: Methodology and convergence rates for functional linear regression. Ann. Stat. 35, 70–91 (2007)
Hsing, T., Eubank, R.: Theoretical Foundations of Functional Data Analysis, with an Introduction to Linear Operators. Wiley, New York (2015)
Liang, H., Li, R.: Variable selection for partially linear models with measurement errors. J. Am. Stat. Assoc. 104, 234–248 (2009)
Lv, J., Fan, J.: A unified approach to model selection and sparse recovery using regularized least squares. Ann. Stat. 37, 3498–3528 (2009)
Ramsay, J.O., Silverman, B.W.: Applied Functional Data Analysis: Methods and Case Studies. Springer, New York (2002)
Ramsay, J.O., Silverman, B.W.: Functional Data Analysis. Springer, New York (2005)
Reiss, P.T., Ogden, R.T.: Functional generalized linear models with images as predictors. Biometrics 66, 61–69 (2010)
Shin, H.: Partial functional linear regression. J. Stat. Plan. Inference 139, 3405–3418 (2009)
Shin, H., Lee, M.H.: On prediction rate in partial functional linear regression. J. Multivar. Anal. 103, 93–106 (2012)
Tang, Q.: Estimation for semi-functional linear regression. Statistics 49, 1262–1278 (2015)
Tibshirani, R.: Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58, 267–288 (1996)
Wang, M., Wang, X.: Adaptive Lasso estimators for ultrahigh dimensional generalized linear models. Stat. Prob. Lett. 89, 41–50 (2014)
Zhang, C.H.: Nearly unbiased variable selection under minimax concave penalty. Ann. Stat. 38, 894–942 (2010)
Zhang, D., Lin, X., Sowers, M.F.: Two-stage functional mixed models for evaluating the effect of longitudinal covariate profiles on a scalar outcome. Biometrics 63, 351–362 (2007)
Zou, H.: The adaptive lasso and its oracle properties. J. Am. Stat. Assoc. 101, 1418–1429 (2006)
Acknowledgements
This work was supported by the National Social Science Foundation of China (16BTJ019), the Humanities and Social Science Foundation of Ministry of Education of China (14YJA910004) and Natural Science Foundation of Jiangsu Province of China (Grant No. BK20151481).
Appendix: Proofs
In this section, let \(C>0\) denote a generic constant whose value may change from line to line. For a matrix \(A=(a_{ij})\), set \(\Vert A\Vert _{\infty }=\max _{i}\sum _{j}|a_{ij}|\) and \(|A|_{\infty }=\max _{i,j}|a_{ij}|\). For a vector \(v=(v_{1},\ldots ,v_{k})^{T}\), set \(\Vert v\Vert _{\infty }=\sum _{j=1}^{k}|v_{j}|\) and \(|v|_{\infty }=\max _{1\le j\le k}|v_{j}|\). Denote \(W_{l}=\sum _{j=1}^{\infty }\gamma _{j}\xi _{lj}\), \(\tilde{W}_{i}=W_{i}-\frac{1}{n}\sum _{l=1}^{n}W_{l}\tilde{\xi }_{li}\), \(\tilde{\varepsilon }_{i}=\varepsilon _{i}-\frac{1}{n}\sum _{l=1}^{n}\varepsilon _{l}\tilde{\xi }_{li}\) and \(\tilde{W}=(\tilde{W}_{1},\ldots ,\tilde{W}_{n})^{T}\), \(\tilde{\varepsilon }=(\tilde{\varepsilon }_{1},\ldots ,\tilde{\varepsilon }_{n})^{T}\). Then \(\tilde{Y}=\tilde{Z}\pmb {\beta }_{0}+\tilde{W}+\tilde{\varepsilon }\) and
$$\begin{aligned} \hat{\pmb {\beta }}-\pmb {\beta }_{0}=(\tilde{Z}^{T}\tilde{Z})^{-1}\tilde{Z}^{T}(\tilde{W}+\tilde{\varepsilon }). \end{aligned}$$(A.1)
Lemma A.1
Suppose that Assumptions 1, 2, 4 and 5 hold. Then it holds that
$$\begin{aligned} \frac{1}{n}\tilde{Z}^{T}\tilde{Z}=\Omega +o_{p}(1). \end{aligned}$$
Proof
Let \(\tilde{Z}_{i}=(\tilde{Z}_{i1},\ldots ,\tilde{Z}_{id})^{T}\). Set \(\vec {\xi }_{li}=\sum _{j=1}^{m}\frac{\xi _{lj} \xi _{ij}}{\lambda _{j}}\), \(\vec {Z}_{ir1}=Z_{ir}-\frac{1}{n}\sum _{l=1}^{n}Z_{lr}\vec {\xi }_{li}\) and \(\vec {Z}_{ir2}=\frac{1}{n}\sum _{l=1}^{n}Z_{lr}(\tilde{\xi }_{li}-\vec {\xi }_{li}).\) Then \(\tilde{Z}_{ir}=\vec {Z}_{ir1}-\vec {Z}_{ir2}\) and
Let \(\vec {Z}_{ir21}=\sum _{j=1}^{m}\frac{1}{\lambda _{j}} \left[ \frac{1}{n}\sum _{l=1}^{n}Z_{lr}(\hat{\xi }_{lj}-\xi _{lj})\right] \xi _{ij}\), \(\vec {Z}_{ir22}=\sum _{j=1}^{m}\left( \frac{1}{\hat{\lambda }_{j}} -\frac{1}{\lambda _{j}}\right) \left( \frac{1}{n}\sum _{l=1}^{n}Z_{lr}\hat{\xi }_{lj}\right) \xi _{ij}\) and \(\vec {Z}_{ir23}=\sum _{j=1}^{m}\frac{1}{\hat{\lambda }_{j}} \left( \frac{1}{n}\sum _{l=1}^{n}Z_{lr}\hat{\xi }_{lj}\right) (\hat{\xi }_{ij}-\xi _{ij}).\) We then have
Lemma 5.1 of Hall and Horowitz (2007) implies that
where \(\Delta =\hat{K}-K\). We then obtain that
where \(\vec {\xi }_{rj}=\frac{1}{n}\sum _{l=1}^{n}Z_{lr}\xi _{lj}\). Lemma 1 of Cardot et al. (2007) implies that
uniformly for \(1\le j\le m\). By (5.2) of Hall and Horowitz (2007), it holds that \(\sup _{j\ge 1}|\hat{\lambda }_{j}-\lambda _{j}|\le |\Vert \Delta \Vert |=O_{p}(n^{-1/2})\) and
where \(|\Vert \Delta \Vert |=(\int _{{\mathcal {T}}}\int _{{\mathcal {T}}}\Delta ^{2}(s,t)\mathrm{d}s\mathrm{d}t)^{1/2}\). Using Parseval’s identity, we get that
Assumption 4 implies that \(|\hat{\lambda }_{j}-\lambda _{j}|=o_{p}(\lambda _{m}/m)\). Consequently, \(\sum _{k\ne j}\frac{\vec {\xi }_{rk}^{2}}{(\hat{\lambda }_{j}-\lambda _{k})^{2}}=\sum _{k\ne j}\frac{\vec {\xi }_{rk}^{2}}{(\lambda _{j}-\lambda _{k})^{2}}[1+o_{p}(1)]\), where \(o_{p}(1)\) holds uniformly for \(1\le j\le m\). By arguments similar to those used in the proof of Lemma 2 of Cardot et al. (2007) and using the fact that \((\lambda _{j}-\lambda _{k})^{2}\ge (\lambda _{k}-\lambda _{k+1})^{2}\), we deduce that
Lemma 1 of Cardot et al. (2007) yields that
and \(\sum _{j=1}^{m}\lambda _{j}^{-1}\le \lambda _{m}^{-1}m\). Therefore,
and
Decomposing \(\frac{1}{n}\sum _{l=1}^{n}Z_{lr}\hat{\xi }_{lj}=\vec {\xi }_{rj}+\frac{1}{n}\sum _{l=1}^{n}Z_{lr}(\hat{\xi }_{lj} -\xi _{lj})\) and using (A.6), we get
By (A.10) of Tang (2015), it holds that
where \(O_{p}(\cdot )\) holds uniformly for \(1\le j\le m\). Using (A.8) and (A.9), we obtain
Hence, by (A.3), (A.7), (A.8), and (A.10) and Assumption 4, we conclude that
Define \(\check{\xi }_{jr}=\frac{1}{n}\sum _{l=1}^{n}\lambda _{j}^{-1/2}\xi _{lj}Z_{lr}\). Since \(E[\max _{1\le j\le m}(\check{\xi }_{jr}-E(\check{\xi }_{jr}))^{2}]\le \frac{1}{n}\sum _{j=1}^{m}\lambda _{j}^{-1}E(\xi _{j}Z_{r})^{2}\le Cn^{-1}\), we then have \(\max _{1\le j\le m}|\check{\xi }_{jr}-E(\check{\xi }_{jr})|=O_{p}(n^{-1/2})\). Hence
where \(\bar{\xi }_{jj'}=\frac{1}{n(\lambda _{j}\lambda _{j'})^{1/2}}\sum _{i=1}^{n}\xi _{ij}\xi _{ij'}\). Now Lemma A.1 follows from (A.2), (A.11), (A.12) and the fact that \(\frac{1}{n}|\sum _{i=1}^{n}\vec {Z}_{ir1}\vec {Z}_{iq2}| \le \left( \frac{1}{n}\sum _{i=1}^{n}\vec {Z}_{ir1}^{2}\right) ^{1/2} \left( \frac{1}{n}\sum _{i=1}^{n}\vec {Z}_{iq2}^{2}\right) ^{1/2}\). \(\square \)
Lemma A.2
Under Assumptions 1–4, it holds that
Proof
Set \(S_{1}=\sum _{j=1}^{m}\lambda _{j}\left[ \gamma _{j}-\frac{1}{\lambda _{j}} \left( \frac{1}{n}\sum _{l=1}^{n}W_{l}\xi _{lj}\right) \right] ^{2}\), \(S_{2}=\sum _{j=1}^{m}\frac{1}{\lambda _{j}} \left[ \frac{1}{n}\sum _{l=1}^{n}W_{l}(\hat{\xi }_{lj}-\xi _{lj})\right] ^{2}\) and \(S_{3}=\sum _{j=1}^{m}\lambda _{j} \left( \frac{1}{\hat{\lambda }_{j}}-\frac{1}{\lambda _{j}}\right) ^{2} \left( \frac{1}{n}\sum _{l=1}^{n}W_{l}\hat{\xi }_{lj}\right) ^{2}\). We have
Since \(E\left[ \gamma _{j}-\frac{1}{\lambda _{j}}\left( \frac{1}{n}\sum _{l=1}^{n}W_{l}\xi _{lj}\right) \right] =0\), then by Assumptions 1–3, we obtain that
Similar to the proof of (A.6) and (A.8) and using Assumption 4, we deduce that
and
Now Lemma A.2 follows from (A.13)–(A.16). \(\square \)
Lemma A.3
Under Assumptions 1, 2, 4 and 5, it holds that
Proof
Let \(Z_{ir}^{*}=Z_{ir}-\sum _{j'=1}^{m}\frac{1}{\lambda _{j'}} \left( \frac{1}{n}\sum _{l=1}^{n}Z_{lr}\xi _{lj'}\right) \xi _{ij'}\). Observe that
By direct computation and using Assumption 1, we get
and
Hence
Since \(\sum _{j'=1}^{m}\frac{1}{\lambda _{j'}}E\left( \sum _{i=1}^{n}\xi _{ij}\xi _{ij'}\right) ^{2}\le Cn^{2}\lambda _{j}\), then by (A.6), we have
Similar to the proof of (A.8) and using Assumption 4, we deduce that
and
Now Lemma A.3 follows from (A.17)–(A.21) and Assumption 4. \(\square \)
Lemma A.4
Under Assumptions 1–5, it holds that
Proof
Let \(\breve{W}_{j}=\frac{1}{n}\sum _{l=1}^{n}W_{l}\hat{\xi }_{lj}\). Applying the Cauchy–Schwarz inequality, we get
Using (A.4), (A.5), Assumption 4 and Parseval’s identity and the arguments similar to those used to prove Lemma A.3, we deduce that
Let \(\vec {W}_{j}=\frac{1}{n}\sum _{l=1}^{n}W_{l}\xi _{lj}\). Decomposing \(\frac{1}{n}\sum _{l=1}^{n}W_{l}\hat{\xi }_{lj}=\vec {W}_{j}+\frac{1}{n}\sum _{l=1}^{n}W_{l}(\hat{\xi }_{lj} -\xi _{lj})\) and using arguments similar to those used in the proof of (A.6) together with Assumption 4, we obtain that
This finishes the proof of Lemma A.4. \(\square \)
Lemma A.5
Under Assumptions 1–5, it holds that
$$\begin{aligned} n^{-1/2}\tilde{W}^{T}\tilde{Z}=o_{p}(1). \end{aligned}$$
Proof
Observe that
Lemmas A.2 and A.3 and Assumption 4 imply that
By arguments similar to those used in the proof of Lemma A.3, we obtain that
Now Lemma A.5 follows from (A.22)–(A.24) and Lemma A.4. \(\square \)
Proof of Theorem 2.1
By arguments similar to those used to prove Lemmas A.4 and A.5, we deduce that \(n^{-1/2}\sum _{i=1}^{n}\left( \frac{1}{n}\sum _{l=1}^{n}\varepsilon _{l}\tilde{\xi }_{li}\right) \tilde{Z}_{ir}=o_{p}(1)\). Hence
We decompose \(\sum _{i=1}^{n}\tilde{Z}_{ir}\varepsilon _{i}\) into three terms as
Similar to the proof of Lemma A.4, we have \(\sum _{i=1}^{n}\varepsilon _{i} \frac{1}{n}\sum _{l=1}^{n}Z_{lr}(\tilde{\xi }_{li}-\vec {\xi }_{li})=o_{p}(n)\). Since
\(\sum _{i=1}^{n}\varepsilon _{i}\sum _{j=1}^{m}\frac{\xi _{ij}}{\lambda _{j} }\left( \frac{1}{n}\sum _{l=1}^{n}Z_{lr}\xi _{lj}-E(Z_{lr}\xi _{j})\right) =o_{p}(n)\) and \(\sum _{i=1}^{n}\varepsilon _{i}\sum _{j=m+1}^{\infty }g_{kj}\xi _{ij} =o_{p}(n)\), it follows that
Now (2.9) follows from (A.1), Lemmas A.1 and A.5, (A.25) and the central limit theorem. The proof of Theorem 2.1 is finished. \(\square \)
Lemma A.6
Define \(\check{\gamma }_{j}=\frac{1}{\hat{\lambda }_{j} }E[(Y-Z^{T}\pmb {\beta }_{0})\xi _{j}]\). Under the assumptions of Theorem 3.2, it holds that
Proof
Define \(I_{1}=\frac{1}{n}\sum _{i=1}^{n} (Y_{i}-Z_{i}^{T}\pmb {\beta }_{0})\xi _{ij} -\gamma _{j}\lambda _{j}\), \(I_{2}=\frac{1}{n}\sum _{i=1}^{n}(Y_{i}-Z_{i}^{T}\pmb {\beta }_{0})(\hat{\xi }_{ij}-\xi _{ij})\) and \(I_{3}=\frac{1}{n}\sum _{i=1}^{n}Z_{i}^{T}(\hat{\pmb {\beta }}-\pmb {\beta }_{0})\hat{\xi }_{ij}\). Noting that \(E[(Y-Z^{T}\pmb {\beta }_{0})\xi _{j}]=\gamma _{j}\lambda _{j}\), we have
where \(o_{p}(1)\) holds uniformly for \(j=1,\ldots ,m\). Since \(E(I_{1})=0\) and \(E(I_{1}^{2})\le \frac{1}{n}[\sum _{k=1}^{\infty }\gamma _{k} ^{2}E(\xi _{k}^{2}\xi _{j}^{2})+\sigma ^{2}\lambda _{j}]\le C\lambda _{j}/n\), we obtain that
Let \(M(t)=E[(Y_{i}-Z_{i}^{T}\pmb {\beta }_{0})X_{i}(t)] =\sum _{k=1}^{\infty }\gamma _{k}\lambda _{k}\phi _{k}(t)\). Then
Applying Assumption 1, it holds that
From (A.9), we obtain \(\sum _{j=1}^{m}\lambda _{j}^{-2}\Vert \hat{\phi }_{j}-\phi _{j}\Vert ^{2}=O_{p}(n^{-1}m^{3}\lambda _{m}^{-2} \log m)\). By arguments similar to those used in the proof of (5.15) of Hall and Horowitz (2007), it follows that
Hence, using the assumption that \(n^{-1}m^{2}\lambda _{m} ^{-1}\log m\rightarrow 0\), we obtain
Using Theorem 3.1, it holds that
Now Lemma A.6 follows from combining (A.26)–(A.29). \(\square \)
Proof of Theorem 2.2
Note that
and
Assumption 3 implies that \(m\sum _{j=1}^{m}\gamma _{j}^{2}\Vert \hat{\phi }_{j}-\phi _{j}\Vert ^{2}=O_{p}(mn^{-1}\sum _{j=1}^{m}\gamma _{j}^{2}j^{2}\log j)=o_{p}(m/n)\) and \(\sum _{j=m+1}^{\infty }\gamma _{j}^{2}=O(m^{-2\delta +1})\). Now (2.10) follows from Lemma A.6, (A.30) and (A.31). The proof of Theorem 2.2 is finished. \(\square \)
Proof of Theorem 2.3
Observe that
where \(\Vert \hat{\gamma }-\gamma \Vert _{K}^{2}=\int _{{\mathcal {T}}}\int _{{\mathcal {T}} }K(s,t)[\hat{\gamma }(s)-\gamma (s)][\hat{\gamma }(t)-\gamma (t)]\mathrm{d}s\mathrm{d}t\). Under the assumptions of Theorem 2.3, using arguments similar to those used in the proof of Theorem 2 of Tang (2015), we deduce that \(\Vert \hat{\gamma }-\gamma \Vert _{K}^{2}=O_{p}(n^{-(\tau +2\delta -1)/(\tau +2\delta )})\). Now (2.12) follows from (A.32) and Theorem 2.1. The proof of Theorem 2.3 is finished. \(\square \)
Lemma A.7
Under the assumptions of Theorem 3.1, there exists a local minimizer \(\hat{\pmb {\beta }}\) of (3.1) such that \(\Vert \hat{ \pmb {\beta }}-\pmb {\beta }_{0}\Vert =O_{p}(n^{-1/2})\).
Proof
Let
$$\begin{aligned} P_{n}(\pmb {\beta })=n\sum _{r=1}^{d}p_{\nu _{n}}^{\prime }(|\beta _{r}^{(0)}|)|\beta _{r}| \end{aligned}$$
and \(D_{n}(\pmb {\beta })=(\tilde{Y}-\tilde{Z}\pmb {\beta })^{T}(\tilde{Y}-\tilde{Z}\pmb {\beta })+P_{n}(\pmb {\beta })\). It suffices to prove that for any given \(\varepsilon >0\), there exists a constant C such that
$$\begin{aligned} P\left\{ \inf _{\Vert u\Vert =C}D_{n}(\pmb {\beta }_{0}+n^{-1/2}u)>D_{n}(\pmb {\beta }_{0})\right\} \ge 1-\varepsilon . \end{aligned}$$(A.33)
Note that
and
By Lemma A.5, we have that \(n^{-1/2}\tilde{W}^{T}\tilde{Z}=o_{p}(1)\). By (A.25), it follows that \(n^{-1/2}\tilde{\varepsilon }^{T}\tilde{Z}=O_{p}(1)\). By Theorem 2.1, it holds that \(\pmb {\beta }^{(0)}\rightarrow _{P}\pmb {\beta }_{0}\), and we then have \(P\{P_{n1}( \pmb {\beta }_{01}+n^{-1/2}u_{1})-P_{n1}(\pmb {\beta }_{01})=0\}\rightarrow 1\) as \(n\rightarrow \infty \). Hence, for sufficiently large C, (A.33) follows from (A.34) and Lemma A.1 and the fact that \(\Omega \) is positive definite. The proof of Lemma A.7 is complete. \(\square \)
Proof of Theorem 3.1
We first prove that for any \(\pmb {\beta }=(\pmb {\beta }_{1}^{T},\pmb {\beta }_{2}^{T})^{T}\) in the neighborhood \( \Vert \pmb {\beta }-\pmb {\beta }_{0}\Vert =O(n^{-1/2})\) for sufficiently large n and \(\pmb {\beta } _{2}\ne \pmb {0}\), with probability tending to 1, we have
Observe that
By Lemma A.5, we have that \(n^{-1/2}\tilde{W}^{T}\tilde{Z}=o_{p}(1)\). By (A.25), it follows that \(n^{-1/2}\tilde{\varepsilon }^{T}\tilde{Z}=O_{p}(1)\). Hence, using Lemma A.1 and the fact that \(\Vert \pmb {\beta }_{2}\Vert =O(n^{-1/2})\) and \( n^{1/2}\nu _{n}\rightarrow +\,\infty \) and the result of Theorem 2.1, we deduce that with probability tending to 1, it holds that
By Lemma A.7 and (A.35), there exists a \(\sqrt{n}\)-consistent local minimizer \(\check{\pmb {\beta }}=(\check{\pmb {\beta }}_{1}^{T},\pmb {0}^{T})^{T}\) of (3.1). Note that
where \(\hat{\pmb {\beta }}_\mathrm{PLS}=(\hat{\beta }_{\mathrm{PLS}1},\ldots ,\hat{\beta }_{\mathrm{PLS}d})^{T}\). Write \(\tilde{Z}=(\tilde{\pmb {Z}}_{1},\tilde{\pmb {Z}}_{2})\). Since \(\hat{\pmb {\beta }}_\mathrm{PLS}\) is a minimizer of (3.1) and \(\check{\pmb {\beta }}\) is a local minimizer of (3.1), we then have that
By Lemma A.5, we have that \(n^{-1/2}\tilde{W}^{T}\tilde{\pmb {Z}}_{2}=o_{p}(1)\). By (A.25), it follows that \(n^{-1/2}\tilde{\varepsilon }^{T}\tilde{\pmb {Z}}_{2}=O_{p}(1)\). The fact that \(\pmb {\beta }_{0}-\check{\pmb {\beta }}=O_{p}(n^{-1/2})\) and Lemma A.1 imply that \(n^{-1/2}(\pmb {\beta }_{0}-\check{\pmb {\beta }})\tilde{Z}^{T}\tilde{\pmb {Z}}_{2}=O_{p}(1)\). If \(\hat{\pmb {\beta }}_\mathrm{PLS}\ne \check{ \pmb {\beta }}\), under the assumptions of Theorem 3.1, then by (A.36) and (A.37), we have \(D_{n}((\hat{\pmb {\beta }}_{\mathrm{PLS}1},\hat{\pmb {\beta }}_{\mathrm{PLS}2}))> D_{n}((\check{\pmb {\beta }}_{1},\pmb {0}))\). This is a contradiction to the fact that \(\hat{\pmb {\beta }}_\mathrm{PLS}\) is a minimizer of (3.1). So \(\hat{\pmb {\beta }}_{\mathrm{PLS}2}=0\) and \(\hat{\pmb {\beta }}_{\mathrm{PLS}1}=\check{\pmb {\beta }}_{1}\).
We now prove the asymptotic normality part. Consider \(D_{n}((\pmb {\beta }_{1}, \pmb {0}))\) as a function of \(\pmb {\beta }_{1}\). With probability tending to 1, \(\hat{\pmb {\beta }}_{\mathrm{PLS}1}\) is the \(\sqrt{n}\)-consistent minimizer of \(D_{n}((\pmb {\beta }_{1},\pmb {0}))\) and satisfies
Hence
By arguments similar to those used in the proof of (2.9), we can prove (3.2). The proof of Theorem 3.1 is finished. \(\square \)
Proof of Theorem 3.2
Similar to the proofs of Theorems 2.2 and 2.3, we can complete the proof of Theorem 3.2. \(\square \)