Abstract
Owing to its orthogonality, interpretability and optimal representation properties, the functional principal component analysis approach has been used extensively to estimate the slope function in the functional linear model. However, the polynomial spline method, a very popular smoothing technique in nonparametric/semiparametric regression, has received little attention in the functional data setting. In this paper, we propose a polynomial spline method to estimate a partial functional linear model. Some asymptotic results are established, including asymptotic normality for the parameter vector and the global rate of convergence for the slope function. Finally, we evaluate the performance of our estimation method through simulation studies.
1 Introduction
With the development of computation and measurement technology, scientists frequently confront data that provide information about curves, surfaces or other objects varying over a continuum. This type of data structure, called functional data, has attracted great interest in various fields. For example, in chemometrics, spectrometric data consist of spectra measured at hundreds of different wavelengths; fMRI data can recover the contours of invisible human organs; and spatial data are used to study the topological, geometric or geographic properties of entities. Because of the infinite dimensionality and the strong mutual dependence of the predictors, traditional multivariate statistical methods fail to analyze functional data. To overcome these problems, Ramsay and Dalzell (1991) and Ramsay and Silverman (1997, 2005) introduced some fundamental models and tools for functional data analysis.
Regression analysis is very popular in statistics. As an extension of ordinary linear models, Ramsay and Silverman (1997, 2005) introduced the functional linear model to describe the relationship between a scalar response and a functional predictor. Further, Cardot et al. (1999), Cai and Hall (2006), Hall and Horowitz (2007) and Li and Hsing (2007) proposed estimation methods based on functional principal component analysis and investigated the asymptotic properties of the estimators. On the other hand, Cardot et al. (2003) and Crambes et al. (2009) employed penalized B-splines and smoothing splines to estimate the functional slope parameter. As an extension of the nonparametric model, functional nonparametric regression has also been studied in the literature: kernel regression (Ferraty and Vieu 2006), local linear regression (Baíllo and Grané 2009) and the K-nearest neighbours method (Burba et al. 2009) have been used to handle functional nonparametric models.
In order to improve the prediction and interpretation power of the functional regression model, additional real-valued predictors can be introduced, and some recent literature focuses on this situation. For example, Aneiros-Pérez and Vieu (2006) introduced a semi-functional partial linear regression model to predict the fat content of chopped pure meat, and Aneiros-Pérez and Vieu (2008) extended this model to dependent data. Zhang et al. (2007) introduced the partial functional linear model to assess the effect of women's hormone levels on total hip bone mineral density, and Shin (2009) proposed a new estimation method for it based on functional principal component analysis. Cardot and Sarda (2008) generalized the functional linear model to a varying coefficient functional linear model in which an additional random variable influences the functional coefficient smoothly. Zhou and Chen (2012) introduced a semi-functional linear model which combines the functional linear regression model and the nonparametric regression model.
In functional linear regression, owing to its orthogonality, interpretability and optimal representation properties, the functional principal component analysis approach has been used extensively to estimate the slope function (see Cardot et al. 1999; Cai and Hall 2006; Hall and Horowitz 2007; Li and Hsing 2007; Shin 2009). As a very popular smoothing technique, the polynomial spline (or regression spline) method produces a smooth function estimate and is easy to implement, so it has received considerable attention in nonparametric/semiparametric regression (see Chen 1991; Stone 1994; Stone et al. 1997; Zhou et al. 1998; Huang 2003a, b; Huang et al. 2004a; Huang and Shen 2004b, among others). However, little work has discussed the polynomial spline method in the functional data setting. We note only that Ramsay and Silverman (1997, 2005) applied polynomial splines to estimate the functional linear model, but they did not investigate the asymptotic behavior of the estimator.
In this paper, we focus on polynomial spline estimators for partial functional linear models. We employ a polynomial spline basis to approximate the functional coefficient. Using the profile least squares technique, we obtain the optimal convergence rate and asymptotic normality for the estimators of the parameters. Based on these estimators, we also derive the limiting distribution of the Wald test statistic for linear hypotheses on the parameters. Numerical studies indicate that our proposed procedure yields smoother estimates of the functional coefficient in finite samples.
The rest of the paper is organized as follows. In Sect. 2, we introduce the polynomial spline estimate for partial functional linear models. Section 3 investigates the asymptotic properties of the estimators and discusses the statistical inference problem. Simulation studies are presented in Sect. 4. Conclusions and directions for further research are given in Sect. 5. All technical details and proofs are given in the "Appendix".
2 Polynomial spline estimation
Let the observed data \((X_i,\mathbf{Z}_i,Y_i)\), \(i=1,\ldots , n\), which are independent and identically distributed (i.i.d.), be generated from the following partial functional linear model
where \(Y_i\) and \(\mathbf{Z}_i=(Z_{i1},\ldots ,Z_{ip})^{T}\) are the scalar response variable and the p-dimensional predictor vector, respectively. The predictor variable \(X_i\) is a random function valued in \(H=L^2([0,1])\), the Hilbert space of square integrable functions defined on the unit interval. Let \(\langle \phi ,\varphi \rangle =\int ^1_0 \phi (t)\varphi (t)dt\) denote the usual inner product of functions \(\phi \) and \(\varphi \), and let \(\Vert \phi \Vert =\langle \phi ,\phi \rangle ^{1/2}\) denote the norm of H. The random errors \(\varepsilon _i\) are independent and identically distributed with mean 0 and finite variance \(\sigma ^2\), and are independent of \((X_i,\mathbf{Z}_i)\). Let \(\beta \) be an unknown p-dimensional parameter vector and \(\alpha (t)\) an unknown smooth slope function belonging to H.
Before introducing the polynomial spline estimation, we briefly recall the definition of a polynomial spline. Let \(k\ge 0\). The sequence \(0=t_0<t_1<\cdots <t_{N_n}<t_{N_{n}+1}=1\) is a partition of the interval [0, 1], called the knot sequence. A function is called a spline of degree k if it is a polynomial of degree k on each of the intervals \([t_i,t_{i+1}]\) \((i=0,1,\ldots ,N_{n})\) and, for \(k\ge 1\), has \(k-1\) continuous derivatives on [0, 1].
We next consider the polynomial spline estimate \(\widehat{\alpha }\) of \(\alpha \). Let \(S_{k,N_n}\) be the space of polynomial splines defined on interval [0, 1] with degree k and \(N_n\) interior knots. The space \(S_{k,N_n}\) is a \(K_n\)-dimensional linear space, \(K_n=N_n+k+1\). From Theorem XII.1 of de Boor (2001), we can conclude that, if the slope function \(\alpha (t)\) is sufficiently smooth, there is a spline function \(a(t)\in S_{k,N_n} \) such that
where \(B_j, j=1,\ldots , K_n\) are the B-spline basis functions. Plugging the approximation (2) into model (1), we have
where the spline coefficient vector \(b=(b_1,\ldots , b_{K_n})^T\) and the parameter vector \(\beta \) are to be estimated. Then, the semiparametric estimation problem in model (1) reduces to an ordinary parametric estimation problem.
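To make the basis construction concrete, the following Python sketch (the paper's own experiments use Matlab) evaluates the \(K_n=N_n+k+1\) B-spline basis functions of degree k with equally spaced interior knots on [0, 1] via the Cox–de Boor recursion; the function name and interface are our own.

```python
import numpy as np

def bspline_basis(t, n_interior, degree):
    """Evaluate the K = n_interior + degree + 1 B-spline basis functions
    of the given degree, with equally spaced interior knots on [0, 1],
    at the points t, using the Cox-de Boor recursion."""
    interior = np.linspace(0, 1, n_interior + 2)[1:-1]
    knots = np.concatenate((np.zeros(degree + 1), interior, np.ones(degree + 1)))
    t = np.asarray(t, dtype=float)
    # degree-0 splines: indicators of the knot intervals [knots[j], knots[j+1])
    N = np.zeros((len(t), len(knots) - 1))
    for j in range(len(knots) - 1):
        N[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    N[t == 1.0, -(degree + 1)] = 1.0   # close the last interval at t = 1
    # Cox-de Boor recursion up to the requested degree
    for d in range(1, degree + 1):
        Nd = np.zeros((len(t), N.shape[1] - 1))
        for j in range(N.shape[1] - 1):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                Nd[:, j] += (t - knots[j]) / left * N[:, j]
            if right > 0:
                Nd[:, j] += (knots[j + d + 1] - t) / right * N[:, j + 1]
        N = Nd
    return N  # shape (len(t), K)
```

The rows of the returned matrix sum to one (partition of unity), a convenient sanity check on the knot vector.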
Consider the squared loss function
The estimators of b and \(\beta \) can be obtained by minimizing (4). To facilitate the study of the asymptotic properties, we apply the profile least squares procedure to estimate the unknown spline coefficients and parameters. The estimators of \(\beta \) and b are given by
where \(Y = (Y_1, \ldots , Y_n)^T\), \(\mathbf{Z}=(\mathbf{Z}_1, \ldots , \mathbf{Z}_n)^T\), \(B=\big \{\langle X_i,B_{j}\rangle \big \}_{\mathop {i=1,\dots ,n}\limits _{j=1,\dots ,K_n}}\) and \(A = B(B^T B)^{-1} B^T\). Then, the polynomial spline estimator of \(\alpha (t)\) and the estimator of \(\sigma ^2\) can be respectively defined by
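To fix ideas, the estimators above can be sketched in Python (the paper's experiments use Matlab). The inner products \(\langle X_i,B_j\rangle \) are approximated by the trapezoidal rule, as described in Sect. 4.2; the function and variable names are ours, and `basis` is any matrix whose columns are basis functions evaluated on the observation grid.

```python
import numpy as np

def trapezoid_weights(t):
    """Quadrature weights of the trapezoidal rule on the grid t."""
    w = np.zeros(len(t))
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def profile_spline_fit(Xgrid, tgrid, Z, Y, basis):
    """Profile least squares for the partial functional linear model.
    Xgrid: (n, m) discretized curves X_i on tgrid; basis: (m, K) basis
    functions on tgrid.  Returns (beta_hat, b_hat, sigma2_hat)."""
    n = len(Y)
    w = trapezoid_weights(tgrid)
    B = (Xgrid * w) @ basis                      # B_{ij} ~ <X_i, B_j>
    A = B @ np.linalg.solve(B.T @ B, B.T)        # projection onto col(B)
    R = np.eye(n) - A
    # beta_hat = (Z^T (I - A) Z)^{-1} Z^T (I - A) Y
    beta_hat = np.linalg.solve(Z.T @ R @ Z, Z.T @ R @ Y)
    resid = Y - Z @ beta_hat
    # b_hat = (B^T B)^{-1} B^T (Y - Z beta_hat)
    b_hat = np.linalg.solve(B.T @ B, B.T @ resid)
    sigma2_hat = np.mean((resid - B @ b_hat) ** 2)
    return beta_hat, b_hat, sigma2_hat
```

Note that \((I-A)B=0\), so the fitted values agree with ordinary least squares on the combined design \([\mathbf{Z},B]\).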
3 Asymptotic properties
In this section, we investigate the asymptotic properties of the polynomial spline estimators. To facilitate the discussion, the following notation is needed. For two sequences of positive numbers \(a_n\) and \(b_n\), \(a_n\lesssim b_n\) means that \(a_n/b_n\) is uniformly bounded, and \(a_n\asymp b_n\) means that \(a_n\lesssim b_n\) and \(b_n\lesssim a_n\). The covariance operator \(\Gamma \) of the random function X is defined as \(\Gamma x(t)=\int _0^1 EX(t)X(s)x(s)ds, x\in H\). The norm \(\Vert \cdot \Vert \) of a function \(f\in C^{k+1}([0,1])\) is defined as \(\Vert f\Vert =\big (\int _0^1 f(t)^2dt\big )^{1/2}\).
In order to establish the theoretical properties of polynomial spline estimation, the following assumptions are required:
-
(C1)
There are some positive constants M and \(\frac{1}{4(k+1)}<r<\frac{1}{2}\) such that
$$\begin{aligned} h=\max _{j=0,\ldots ,N_n}(t_{j+1}-t_j)\asymp n^{-r}, \quad K_n\asymp n^r,\quad h/\min _{j=0,\ldots ,N_n}(t_{j+1}-t_j)\le M. \end{aligned}$$ -
(C2)
\(E||X||^4<\infty \) and the eigenvalues of the covariance operator \(\Gamma \) of X are strictly positive.
-
(C3)
\(E|Z_{11}|^4+\cdots +E|Z_{1p}|^4+E|\varepsilon _1|^4<\infty .\)
-
(C4)
For \(j=1,\ldots ,p\), \(E(Z_{1j}|X_1)\) is a continuous linear functional, that is, there exists a function \(g_j\in H\) such that \(E(Z_{1j}|X_1)=\langle X_1,g_j \rangle \). Further, we assume \(g_j, j=1,\ldots ,p\) and slope function \(\alpha \) are smooth enough, that is, \(g_j\in C^{k+1}([0,1])\), \(\alpha \in C^{k+1}([0,1])\).
-
(C5)
Let \(\eta _{1j}=Z_{1j}-E(Z_{1j}|X_1)=Z_{1j}-\langle X_1,g_j \rangle , j=1,\ldots ,p\), \(\eta _1=(\eta _{11},\ldots ,\eta _{1p})^T\). Furthermore, we assume that \(\Sigma =E\eta _1\eta _1^T\) is a positive definite matrix.
Remark 1
Conditions (C1)–(C5) are quite standard in polynomial spline estimation and in the functional linear model. In fact, condition (C1) is similar to (3) in Zhou et al. (1998), and the requirement on the number of spline basis functions \(K_n\) is similar to (16) in Shin (2009). Condition (C2) is very common in the functional linear model (see H1 and H2 in Cardot et al. 1999 and (12) in Shin 2009); however, we do not need an additional assumption on the eigenvalues of the covariance operator \(\Gamma \) such as (14) in Shin (2009). Condition (C3) is similar to (11) in Aneiros-Pérez and Vieu (2006) and (17) in Shin (2009). Condition (C4) requires that the dependence between the covariate \(Z_{1j}\) \((j=1,\ldots ,p)\) and the random function \(X_1\) be a continuous linear functional, which is a special case of the conditional expectation operators \(E(X_{ij}|T_i=t)\) in Aneiros-Pérez and Vieu (2006). Furthermore, to ensure the validity of the polynomial spline estimation, we need a smoothness condition on each functional coefficient \(g_j\) and on \(\alpha \). Condition (C5) is similar to (12) in Aneiros-Pérez and Vieu (2006) and (20) in Shin (2009).
Under the above conditions, we have the following results.
Theorem 1
If conditions (C1)–(C5) hold, as \(n\rightarrow \infty \), we have
Theorem 2
Suppose that conditions (C1)–(C5) are satisfied, then
Remark 2
For the estimation of the parameter vector, Theorem 1 shows that the asymptotic result is similar to Theorem 1(i) in Aneiros-Pérez and Vieu (2006) and Theorem 3.1 in Shin (2009). For the estimation of the functional coefficient, Theorem 2 indicates that, under the smoothness condition (\(C^{k+1}\) in particular), the global convergence rate is similar to those given in Newey (1997) and Huang and Shen (2004b) in the nonparametric regression setting, which shows that the presence of a random vector as an additional predictor does not change the rate of convergence of the estimated functional coefficient. Moreover, if we take \( r=(a+1)/(a+2b)\) and \(k=(2b-1)/2(a+1)-1\), then we obtain the same rate of convergence of the estimated functional coefficient as Shin (2009).
For the estimator of the variance \(\sigma ^2\), we have the following theorem.
Theorem 3
If conditions (C1)–(C5) hold, then we have
where \(\Lambda ^2=E(\varepsilon _1^2-\sigma ^2)^2\).
Further, let \(\widehat{\Sigma }_n=n^{-1}{\mathbf{Z}^T(I-A)\mathbf{Z}}\). In light of the above theorems, we obtain the following corollary.
Corollary 1
Under conditions (C1)–(C5), as \(n\rightarrow \infty \), we have
Remark 3
According to Corollary 1, we can obtain an approximate \((1-\gamma )\) asymptotic confidence region for the parameter vector \(\beta \), that is,
Also, we can get an approximate \((1-\gamma )\) asymptotic confidence interval for every parameter \(\beta _j, j=1,\ldots ,p\), that is,
where \(\widehat{\sigma }_n(\widehat{\Sigma }_n)^{-1}_{jj}\) is the jth diagonal element of \(\widehat{\sigma }_n(\widehat{\Sigma }_n)^{-1}\).
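These intervals translate into a few lines of code. The following is a hedged Python sketch (names ours), assuming the limiting covariance \(\sigma ^2\Sigma ^{-1}\) as in Theorem 1, with \(\widehat{\Sigma }_n = n^{-1}\mathbf{Z}^T(I-A)\mathbf{Z}\) and a plug-in variance estimate `sigma2_hat`:

```python
import numpy as np
from statistics import NormalDist

def wald_confidence_intervals(beta_hat, Z, A, sigma2_hat, gamma=0.05):
    """Componentwise approximate (1 - gamma) confidence intervals
    beta_hat_j +/- z_{1-gamma/2} sqrt(sigma2_hat * (Sigma_n^{-1})_{jj} / n),
    where Sigma_n = Z^T (I - A) Z / n."""
    n = Z.shape[0]
    Sigma_n = Z.T @ (np.eye(n) - A) @ Z / n
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(Sigma_n)) / n)
    z = NormalDist().inv_cdf(1 - gamma / 2)   # standard normal quantile
    return beta_hat - z * se, beta_hat + z * se
```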
4 Simulation studies
In this section, we present simulation results to illustrate the finite sample behavior of the polynomial spline estimation and compare our method with that of Shin (2009).
4.1 Models for generating simulation data
In this subsection we specify four models for generating the simulation data \(\big \{(X_i,\mathbf{Z}_i,Y_i)\big \}_{i=1}^n\). In the first three models we generate \(X_i\) in the same form as Lian (2011), that is,
where \(\phi _1(t)=1\), \(\phi _j(t)=\sqrt{2}\cos ((j-1)\pi t)\) for \( j\ge 2\), and the \(\xi _{ij}\) are independent and identically distributed as \(U[-\sqrt{3},\sqrt{3}]\).
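For reference, the curve generation can be sketched in Python as follows (the paper uses Matlab). We assume the plain truncated expansion \(X_i(t)=\sum _{j=1}^{50}\xi _{ij}\phi _j(t)\) implied by the display above; if the original recipe weights the scores \(\xi _{ij}\), those weights should be folded into the score draw.

```python
import numpy as np

def generate_curves(n, tgrid, J=50, rng=None):
    """Simulate X_i(t) = sum_{j=1}^{J} xi_ij * phi_j(t), where
    phi_1(t) = 1, phi_j(t) = sqrt(2) cos((j-1) pi t) for j >= 2, and
    xi_ij are i.i.d. Uniform[-sqrt(3), sqrt(3)] (mean 0, variance 1)."""
    rng = np.random.default_rng(rng)
    j = np.arange(1, J + 1)[:, None]
    phi = np.where(j == 1, 1.0,
                   np.sqrt(2) * np.cos((j - 1) * np.pi * tgrid[None, :]))
    xi = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, J))
    return xi @ phi   # (n, len(tgrid)) matrix of discretized curves
```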
Model 1: \(Y_i=1.5Z_{i1}-Z_{i2}+2Z_{i3}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), where \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3})^T\) is from a multivariate normal distribution \(N(0,\Phi )\) with covariance matrix \(\Phi =[0.9, 0.2,0.3;0.2,0.5,0.1;0.3,0.1,1]\). The functional coefficient \(\alpha (t)=\sum _{j=1}^{50} b_j\phi _j(t)\), where \(b_1=0.5, b_j=4j^{-2}\), for \( j\ge 2\) and the error variable \(\varepsilon _i \) is N(0, 1).
Model 2: \(Y_i=2Z_{i1}-Z_{i2}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), \( Z_{i1}=\int _0^1 X_i(t)\alpha _1(t)dt+\varepsilon _{i1}\), \( Z_{i2}=\int _0^1 X_i(t)\alpha _2(t)dt+\varepsilon _{i2} \), where the functional coefficient \(\alpha (t)\) is defined as in Model 1. In addition, the functional coefficients \(\alpha _1(t)=\sum _{j=1}^{50} b_{1j}\phi _j(t)\) and \(\alpha _2(t)=\sum _{j=1}^{50} b_{2j}\phi _j(t)\), where \(b_{11}=1, b_{21}=-0.5, b_{1j}=2j^{-2}, b_{2j}=3j^{-2}\) for \(j\ge 2\). The random error variables \(\varepsilon _i\) and \(\varepsilon _{i1}\) are N(0, 0.25), and \(\varepsilon _{i2}\) is N(0, 0.64).
Model 3: \(Y_i=1.5Z_{i1}+5Z_{i2}-1.7Z_{i3}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), where \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3})^T\) is from a multivariate normal distribution \(N(0,\mathbf{I}_3)\). The functional coefficient is given by
which is similar to example (a) in Cardot et al. (2003). The error variable \(\varepsilon _i\) is N(0, 0.36).
Model 4: We take the same example as in Shin (2009), that is,
where \(X_i(t)\) is a standard Brownian motion and \(\alpha (t)=\sqrt{2}\sin (\pi t/2)+3\sqrt{2}\sin (3\pi t/2)\). Random vector \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3}, Z_{i4}, Z_{i5})^T\) is from a multivariate normal distribution \(N(0,\mathbf{I}_5)\), and error variable \(\varepsilon _i\) is N(0, 1).
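The Brownian predictor in Model 4 is easy to simulate on a grid; a Python sketch (the function name is ours):

```python
import numpy as np

def brownian_paths(n, tgrid, rng=None):
    """Simulate n standard Brownian motion paths on tgrid (tgrid[0] == 0)
    as cumulative sums of independent N(0, dt) increments."""
    rng = np.random.default_rng(rng)
    dt = np.diff(tgrid)
    inc = rng.normal(scale=np.sqrt(dt), size=(n, len(dt)))
    return np.hstack([np.zeros((n, 1)), np.cumsum(inc, axis=1)])
```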
To mimic practice, the random functions \(X_i(t)\) in Models 1–4 are observed only at 100 equally spaced points on [0, 1].
4.2 Implementation
In this subsection, we describe the implementation of our method and that of Shin (2009). To implement Shin (2009)'s method, we need to turn the discrete observations of \(X_i(t)\) into functional data objects. In this paper, we use the method described in Chapter 4 of Ramsay et al. (2009) and choose 25 B-spline basis functions to build the functional data; we then use the pca.fd function described in Chapter 7 of Ramsay et al. (2009) to carry out functional principal component analysis. For our procedure, we have to choose the degree of the spline functions and the positions and number of knots. Similarly to Huang and Shen (2004b), we choose a B-spline basis with equally spaced knots and fixed degree 2. Then we only need to select \(K_n\), the number of B-spline basis functions or eigenfunctions. Many methods can be used to select \(K_n\), for example, AIC (Akaike 1974), BIC (Schwarz 1978), "leave-one-subject-out" cross-validation (Rice and Silverman 1991) and modified multi-fold cross-validation (Cai et al. 2000). In this paper, we use the "leave-one-subject-out" cross-validation technique to choose the number of B-spline basis functions and eigenfunctions. Specifically, we select \(K_n\) by minimizing the following cross-validation score:
where \(\widehat{\alpha }^{-i}\) and \(\widehat{\beta }^{-i}\) are the estimators computed after deleting the ith observation \((X_i, \mathbf{Z}_i, Y_i)\). In our procedure, the number of B-spline basis functions ranges from 3 to 12 and the number of eigenfunctions ranges from 1 to 10. The integrals involved in the matrix B are approximated by the trapezoidal rule.
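For a fixed \(K_n\), minimizing the squared loss over \((\beta ,b)\) is ordinary least squares on the combined design \([\mathbf{Z},B]\), so the leave-one-out score can be computed without n refits through the standard deleted-residual identity \(e_{(i)}=e_i/(1-H_{ii})\), where H is the hat matrix. The following Python sketch uses this shortcut (our observation; the original Matlab code may refit directly):

```python
import numpy as np

def loo_cv_score(Y, Z, B):
    """Leave-one-out cross-validation score for the least squares fit on
    the combined design [Z, B].  Because the fit is linear in Y, the
    deleted residuals have the closed form e_i / (1 - H_ii)."""
    D = np.hstack([Z, B])
    H = D @ np.linalg.solve(D.T @ D, D.T)   # hat matrix of the full fit
    resid = Y - H @ Y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)
```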
Two risk functions are used to assess the performance of our estimators and those of Shin (2009): the mean squared prediction error (MSPE) of the response variable Y, which is similar to (26) in Cardot et al. (2003),
and the square root of the average squared error (RASE) of the functional coefficient \(\alpha (t)\), which is similar to (6) in Huang and Shen (2004b),
where \(\{t_k, k=1,\ldots , n_{grid}\}\) are grid points chosen to be equally spaced on the interval [0, 1]. In this paper, the number of grid points is \(n_{grid}=101\).
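The two risk functions can be coded directly; a Python sketch with our own names, where `alpha_hat` and `alpha_true` are callables evaluated on the equally spaced grid:

```python
import numpy as np

def rase(alpha_hat, alpha_true, n_grid=101):
    """Square root of the average squared error of alpha_hat over an
    equally spaced grid on [0, 1]."""
    t = np.linspace(0, 1, n_grid)
    return float(np.sqrt(np.mean((alpha_hat(t) - alpha_true(t)) ** 2)))

def mspe(Y, Y_hat):
    """Mean squared prediction error of the response."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return float(np.mean((Y - Y_hat) ** 2))
```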
We use Matlab to implement our procedure. For each of the simulation models above, we consider two sample sizes, \(n=100\) and \(n=500\), and each simulation experiment is repeated 500 times.
4.3 Simulation results
In this subsection, we present the simulation results for the four models described in Sect. 4.1. Subscripts \(s\) and \(p\) denote our estimation method and that of Shin (2009), respectively.
Figure 1 displays the empirical distribution function of \(\widehat{\chi }_{n,3}^2\) from 500 simulated samples under Model 1. For Models 2–4, the empirical distribution functions of \(\widehat{\chi }_{n,p}^2\) behave similarly, so we omit them to save space. We can see from this figure that, as the sample size n increases, the empirical distribution approaches the theoretical distribution more and more closely, which confirms the asymptotic normality established in Sect. 3.
Table 1 summarizes the mean squared errors (MSE) of the estimators \(\widehat{\beta }\) and \(\widehat{\sigma }_n^2\) under Models 1–4. Table 2 presents the mean and standard deviation of RASE and MSPE to evaluate the performance of our estimation procedure. Figure 2 shows our estimate (dashed curve) and that of Shin (2009) (dotted curve) from the typical samples corresponding to the minimum of \(\text {RASE}_s\) and \(\text {RASE}_p\) under Models 1–4, respectively. From these results, we see that the two estimation methods are very close for the parametric components \(\beta \) and \(\sigma \). However, from the perspectives of prediction and of estimating the functional coefficient \(\alpha (t)\), if \(\alpha (t)\) can be expressed as a linear combination of eigenfunctions of the covariance operator \(\Gamma \), Shin (2009)'s method is superior to ours; if not, our method seems to perform better. At the same time, the differences between the two estimation methods shrink as the sample size n increases.
Table 3 displays the mean and standard deviation of the running CPU time on a Dell personal computer with an Intel(R) Core(TM)2 Duo CPU. Table 3 suggests that, at least in the examples studied, our method is computationally faster when the sample size is small, while Shin (2009)'s is faster when the sample size is large.
5 Conclusion and further research
In this paper, we propose polynomial spline estimation for the partial functional linear model. Some asymptotic results are established, including asymptotic normality for the parameter vector and the global rate of convergence for the functional coefficient, and the simulation studies confirm the theoretical results. On the one hand, from the perspectives of prediction and of estimating the functional coefficient \(\alpha (t)\), we find that if \(\alpha (t)\) can be expressed as a linear combination of eigenfunctions of the covariance operator \(\Gamma \), Shin (2009)'s method is superior to ours; if not, our method seems to perform better, and the differences between the two methods shrink as the sample size n increases. On the other hand, in terms of computational time, our method is faster, at least in the examples studied, when the sample size is small, while Shin (2009)'s is faster when the sample size is large. In this study we only consider fully observed functional predictors; in practice, however, each functional observation is often available only at sparse discrete points. In that case, one can use smoothing techniques to approximate the functional observations, and the polynomial spline method can then still be used to estimate the partial functional linear model.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76:1102–1110
Aneiros-Pérez G, Vieu P (2008) Nonparametric time series prediction: a semi-functional partial linear modeling. J Multivar Anal 99:834–857
Baíllo A, Grané A (2009) Local linear regression for functional predictor and scalar response. J Multivar Anal 100:102–111
Burba F, Ferraty F, Vieu P (2009) K-nearest neighbour method in functional nonparametric regression. J Nonparametric Stat 21:453–469
Cai TT, Hall P (2006) Prediction in functional linear regression. Annals Stat 34:2159–2179
Cai Z, Fan J, Yao Q (2000) Functional-coefficient regression model for nonlinear time series. J Am Stat Assoc 95:941–956
Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Prob Lett 45:11–22
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Cardot H, Sarda P (2008) Varying-coefficient functional linear regression models. Commun Stat Theory Methods 37:3186–3203
Chen H (1991) Polynomial splines and nonparametric regression. J Nonparametric Stat 1:143–156
Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Annals Stat 37:35–72
de Boor C (2001) A practical guide to splines. Springer, New York
DeVore RA, Lorentz GG (1993) Constructive approximation. Springer, Berlin
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Huang JZ (2003a) Asymptotics for polynomial spline regression under weak conditions. Stat Probab Lett 65:207–216
Huang JZ (2003b) Local asymptotics for polynomial spline regression. Annals Stat 31:1600–1635
Huang JZ, Wu CO, Zhou L (2004a) Polynomial spline estimation and inference for varying coefficient model with longitudinal data. Stat Sin 14:763–788
Huang JZ, Shen H (2004b) Functional coefficient regression models for non-linear time series: a polynomial spline approach. Scand J Stat 31:515–534
Hall P, Horowitz J (2007) Methodology and convergence rates for functional linear regression. Annals Stat 35:70–91
Li Y, Hsing T (2007) On rates of convergence in functional linear regression. J Multivar Anal 98:1782–1804
Lian H (2011) Functional partial linear model. J Nonparametric Stat 23:115–128
Newey WK (1997) Convergence rates and asymptotic normality for series estimators. J Econom 79:147–168
Ramsay J, Dalzell C (1991) Some tools for functional data analysis. J R Stat Soc Ser B 53:539–572
Ramsay J, Silverman B (1997) Functional data analysis. Springer, New York
Ramsay J, Silverman B (2005) Functional data analysis, 2nd edn. Springer, New York
Ramsay J, Hooker G, Graves S (2009) Functional data analysis with R and Matlab. Springer, New York
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B 53:233–243
Shin H (2009) Partial functional linear regression. J Stat Plan Inference 139:3405–3418
Stone CJ (1994) The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). Annals Stat 22:118–171
Stone CJ, Hansen M, Kooperberg C, Truong YK (1997) Polynomial splines and their tensor products in extended linear modeling (with discussion). Annals Stat 25:1371–1470
Schwarz G (1978) Estimating the dimension of a model. Annals Stat 6:461–464
Zhang D, Lin X, Sowers M (2007) Two-stage functional mixed models for evaluating the effect of longitudinal covariate profiles on a scalar outcome. Biometrics 63:351–362
Zhou S, Shen X, Wolfe DA (1998) Local asymptotics for regression splines and confidence regions. Annals Stat 26:1760–1782
Zhou J, Chen M (2012) Spline estimators for semi-functional linear model. Stat Probab Lett 82:505–513
Acknowledgments
The work was supported by the National Natural Science Foundation of China (Grant Nos. 10961026, 11171293, 11225103, 11301464), the Ph.D. Special Scientific Research Foundation of Chinese Universities (20115301110004), the Key Fund of Yunnan Province (Grant No. 2010CC003) and the Scientific Research Foundation of Yunnan Provincial Department of Education (No. 2013Y360). We are grateful to the referees and the editors for their constructive remarks that greatly improved the manuscript.
Appendix
In the appendix, we give the proofs of the theorems and corollary in Sect. 3.
Set \(B_{s}={K_n}^{1/2}N_{s}^{b}, s=1,\ldots , K_n\), where \(N_{s}^{b}\) are the normalized B-splines. From Theorem 4.2 in Chapter 5 of DeVore and Lorentz (1993), we have that for any spline function \(\sum _{s=1}^{K_n} b_s B_{s}\), there are positive constants \(M_1\) and \(M_2\) such that
where \(\Vert \cdot \Vert _2\) is the Euclidean norm. Let \(\Vert r\Vert _{\infty }=\sup \nolimits _{x\in [0,1]} |r(x)|\).
In order to prove the theorems, we need the following two lemmas.
Lemma 1
If conditions (C1) and (C2) hold, then we have
-
(i)
$$\begin{aligned} \sup _{a\in S_{k,N_n}}\Big |\frac{\frac{1}{n}\sum _{i=1}^n \langle X_i,a \rangle ^2}{E \langle X,a \rangle ^2}-1\Big |=o_p(1). \end{aligned}$$
-
(ii)
there exists an interval \([M_3,M_4],0<M_3<M_4<\infty \) such that as \(n\rightarrow \infty \),
$$\begin{aligned} P\Big \{\text {all the eigenvalues of}~\frac{1}{n}B^TB~\text {fall in}~[M_3,M_4]\Big \}\rightarrow 1. \end{aligned}$$
Note that Lemma 1 generalizes Lemmas 1 and 2 of Huang and Shen (2004b) to the functional data case. We give a brief proof below.
Proof
(i) Let \(\Gamma _n\) denote the empirical version of the operator \(\Gamma \), that is,
By the Cauchy–Schwarz inequality, condition (C2) and (28) in Cardot et al. (2003), we have
Then for an arbitrary constant \(\epsilon >0\), by Lemma 5.2 in Cardot et al. (1999), we have
together with (C2), which gives the result.
(ii) Let \(b=(b_1,\ldots ,b_{K_n})^T, a=\sum _{s=1}^{K_n} b_sB_s\). It follows from (i) that except an event whose probability tends to zero as \(n\rightarrow \infty \),
By the Cauchy–Schwarz inequality, (28) in Cardot et al. (2003) and (7),
Thus, except an event whose probability tends to zero, \(\frac{1}{n}b^TB^TBb\asymp \Vert b\Vert _2^2,\) holds uniformly for all b, which yields the result. \(\square \)
Lemma 2
Under conditions (C1)–(C5), as \(n\rightarrow \infty \), we have
Proof
Let \(\mu _j(X_i)=E(Z_{ij}|X_i)=\langle X_i,g_j\rangle , \eta _{ij}=Z_{ij}-\mu _j(X_i)\),
We also define \(V=(\tilde{V_1},\ldots ,\tilde{V_p})\), \(\eta =(\tilde{\eta _1},\ldots ,\tilde{\eta _p})\). Then, \(\mathbf{Z}=\eta +V\) and
For the (j, l)th element of \(I_1\)
By independence and the Cauchy–Schwarz inequality, we have
Further, by \(C_r\) inequality and (C2)–(C4), we have
Thus,
Note that \(A\ge 0\), then we have
By Lemma 1, we can know that except an event whose probability tends to zero,
Also note that \(E\langle X_i,B_s\rangle \eta _{ij}=E\langle X_i,B_s\rangle E(\eta _{ij}|X_i)=0\). Then, by (7) and conditions (C2)–(C4), there exists a positive constant C such that
Thus, for \(j,l=1,\ldots ,p\),
which together with (8) yields
For the (j, l)-th element of \(I_4\), \(j,l=1,\ldots ,p\),
by the Cauchy–Schwarz inequality,
It follows from Theorem XII.1 of de Boor (2001) that there exist positive constants \(C_j\) and spline functions \(g_j^*\in S_{k,N_n}\), \(j=1,\ldots ,p\), such that
Set \(g_j^*=\sum _{s=1}^{K_n} b_{js}^*B_{s}, \quad b_j^*=(b_{j1}^*,\ldots ,b_{jK_n}^*)^T,\quad j=1,\ldots ,p\), then,
As A is an orthogonal projection matrix,
From the above results and (C1), we have
that is,
For the (j, l)-th element of \(I_2\) and \(I_3\), \(j,l=1,\ldots ,p\), we have
Using (9) and (10), we can infer that
The combination of (9)–(11) allows us to finish the proof of Lemma 2. \(\square \)
Proof of Theorem 1
Denote \(\Phi =\Big (\langle X_1,\alpha \rangle ,\ldots ,\langle X_n,\alpha \rangle \Big )^T\), \(\varepsilon =(\varepsilon _1,\ldots ,\varepsilon _n)^T\). Then, \(Y=\mathbf{Z}\beta +\Phi +\varepsilon \). We can write
Observe that
For \(\Delta _{11}\), as \(\mathbf{Z}=\eta +V\),
By (C4) and Theorem XII.1 of de Boor (2001), we know that there is a spline function \(\alpha ^*=\sum _{s=1}^{K_n} b_s^*B_s\in S_{k,N_n}\) and a positive constant C such that
Set \(\Phi ^*=(\langle X_1,\alpha ^*\rangle ,\ldots ,\langle X_n,\alpha ^*\rangle )^T\) and \(b^*=(b_1^*,\ldots ,b_{K_n}^*)^T\), we have \(\Phi ^*=Bb^*\). For \(j=1,\ldots ,p\), by conditions (C1), (C2), (C4) and Theorem XII.1 of de Boor (2001), we can infer
Thus, by (C1) we have
Observe that for \(j=1,\ldots ,p\),
As \(E\Big (\eta _{ij}\langle X_i,\alpha -\alpha ^*\rangle \Big )=E\Big [\langle X_i,\alpha -\alpha ^*\rangle E(\eta _{ij}|X_i)\Big ]=0\) and
we can infer
Further, by Lemma 1, (C1) and (13), we can show
By (12), (14)–(16) and Lemma 2, we have
\(\Delta _{21}\) can be expressed as
Let \(\epsilon _i=\eta _i \varepsilon _i\). Since \(\varepsilon _i\) is independent of \((X_i,\mathbf{Z}_i)\) and \(\{(X_i,\mathbf{Z}_i,Y_i)\}\) is an i.i.d. sequence, the \(\epsilon _i\) are i.i.d. random variables with \(E\epsilon _i=0\) and \(Var(\epsilon _i)=\sigma ^2\Sigma \).
Observe that
Then, by the central limit theorem,
Also note that
Then, it follows from Lemma 1 that
Since \(E\langle X_i,B_s\rangle \varepsilon _i\langle X_j,B_s\rangle \varepsilon _j=0, i\ne j\), we have
that is, \(\varepsilon ^TA\varepsilon =O_p(K_n)\). In addition, we know from the proof of Lemma 2 that
Thus,
which, together with (18), yields
For the jth element of \(R_2\), \(j=1,\ldots ,p\), we have
Since \(\varepsilon _i\) is independent of \((X_i,\mathbf{Z}_i)\), we have
Then,
Also, observe that
Then, by (C1), we have
From the above results, we can infer
Now, by Lemma 2, (17), (19), (20) and Slutsky's theorem, we obtain Theorem 1. \(\square \)
Proof of Theorem 2
Observe that
Let \(\tilde{Y}=\mathbf{Z}(\beta -\widehat{\beta })+\Phi \). Denote \(\tilde{b}=(B^TB)^{-1}B^T\tilde{Y}\) and \(\tilde{\alpha }(t)=\sum _{s=1}^{K_n} \tilde{b_s}B_s(t)\), where \(\tilde{b}=(\tilde{b_1},\ldots ,\tilde{b_{K_n}})^T\). Then, \(\widehat{b}-\tilde{b}=(B^TB)^{-1}B^T\varepsilon \). By Lemma 1, we have
except on an event whose probability tends to zero as \(n\rightarrow \infty \). Thus, by (7), we can infer
Also, it follows from Theorem XII.1 of de Boor (2001) that there exist a spline function \(\alpha ^*(t)=\sum _{s=1}^{K_n} b_s^*B_s(t)\in S_{k,N_n}\), where \(b^*=(b_1^*,\ldots ,b_{K_n}^*)^T\), and a constant \(C>0\) such that
By Theorem XII.1 of de Boor (2001) and (7), we have
Observe that \(B\tilde{b}=B(B^TB)^{-1}B^T\tilde{Y}\) and \(B(B^TB)^{-1}B^T\) is an orthogonal projection matrix. Thus,
Applying (C2), (22) and the Cauchy–Schwarz inequality, we obtain that
that is,
In addition, note that
Then, it follows from Theorem 1 and (C4) that
which together with (23) yields
Further, we can infer that
Then, the combination of (21), (22), (25) and (26) completes the proof of Theorem 2. \(\square \)
Proof of Theorem 3
We can write
Observe that
Then, by (C1), (C2) and Theorem 2, we have
It follows from (24) that
For \(R_{n3}\), since \(E(\varepsilon _1^2-\sigma ^2)=0\) and \(\Lambda ^2=E(\varepsilon _1^2-\sigma ^2)^2<\infty \), it follows from the central limit theorem that
For \(R_{n4}\), we have
Then, applying (C1), (C2) and Theorem 2, we can obtain
Note that
Thus, using (C3) and Theorem 1, we have
Also, observe that
Then, by (C1)–(C3) and Theorems 1 and 2, we can get
Finally, using (27)–(32), we can complete the proof of Theorem 3. \(\square \)
Proof of Corollary 1
It follows from Theorem 1 that
Also, by Lemma 2 and Theorem 3, we have that
Then, by Slutsky's theorem, we obtain Corollary 1. \(\square \)
Cite this article
Zhou, J., Chen, Z. & Peng, Q. Polynomial spline estimation for partial functional linear regression models. Comput Stat 31, 1107–1129 (2016). https://doi.org/10.1007/s00180-015-0636-0