Abstract
Owing to its orthogonality, interpretability and optimal representation properties, the functional principal component analysis approach has been used extensively to estimate the slope function in the functional linear model. However, the polynomial spline method, a very popular smoothing technique in nonparametric/semiparametric regression, has received little attention in the functional data setting. In this paper, we propose a polynomial spline method to estimate a partial functional linear model. Some asymptotic results are established, including asymptotic normality for the parameter vector and the global rate of convergence for the slope function. Finally, we evaluate the performance of our estimation method through simulation studies.
1 Introduction
With the development of computation and measurement technology, scientists frequently confront data that provide information about curves, surfaces or other objects varying over a continuum. This type of data structure, called functional data, has attracted great interest in various fields. For example, in chemometrics, spectrometric data consist of spectra measured at hundreds of different wavelengths; fMRI data can recover the contours of invisible human organs; and spatial data are used to study the topological, geometric or geographic properties of entities. Because of the infinite dimensionality and the strong mutual dependence of the predictors, traditional multivariate statistical methods fail to analyze functional data. To overcome these problems, Ramsay and Dalzell (1991) and Ramsay and Silverman (1997, 2005) introduced some fundamental models and tools for functional data analysis.
Regression analysis is very popular in statistics. As an extension of ordinary linear models, Ramsay and Silverman (1997, 2005) introduced the functional linear model to describe the relationship between a scalar response and a functional predictor. Further, Cardot et al. (1999), Cai and Hall (2006), Hall and Horowitz (2007) and Li and Hsing (2007) proposed estimation methods based on functional principal component analysis and investigated the asymptotic properties of the estimators. On the other hand, Cardot et al. (2003) and Crambes et al. (2009) employed penalized B-splines and smoothing splines to estimate the functional slope parameter. As an extension of the nonparametric model, functional nonparametric regression has also been studied in the literature: kernel regression (Ferraty and Vieu 2006), local linear regression (Baíllo and Grané 2009) and the K-nearest neighbours method (Burba et al. 2009) have been used to handle functional nonparametric models.
In order to improve the prediction and interpretation power of the functional regression model, additional real-valued predictors can be introduced, and some recent literature focuses on this situation. For example, Aneiros-Pérez and Vieu (2006) introduced a semi-functional partial linear regression model to predict the fat content of chopped pure meat, and Aneiros-Pérez and Vieu (2008) extended this model to dependent data. Zhang et al. (2007) introduced the partial functional linear model to assess the effect of women's hormone levels on total hip bone mineral density, and Shin (2009) proposed a new estimation method for it based on functional principal component analysis. Cardot and Sarda (2008) generalized the functional linear model to a varying coefficient functional linear model in which an additional random variable influences the functional coefficient smoothly. Zhou and Chen (2012) introduced a semi-functional linear model which combines the functional linear regression model and the nonparametric regression model.
In functional linear regression, owing to its orthogonality, interpretability and optimal representation properties, the functional principal component analysis approach has been used extensively to estimate the slope function (see Cardot et al. 1999; Cai and Hall 2006; Hall and Horowitz 2007; Li and Hsing 2007; Shin 2009). As a very popular smoothing technique, the polynomial spline (or regression spline) method produces a smooth function estimate and is easy to implement, so it has received considerable attention in nonparametric/semiparametric regression (see Chen 1991; Stone 1994; Stone et al. 1997; Zhou et al. 1998; Huang 2003a, b; Huang et al. 2004a; Huang and Shen 2004b, among others). However, little work has discussed the polynomial spline method in the functional data setting. We note only that Ramsay and Silverman (1997, 2005) applied polynomial splines to estimate the functional linear model, but they did not investigate the asymptotic behavior of the estimator.
In this paper, we focus on polynomial spline estimators for partial functional linear models. We employ a polynomial spline basis to approximate the functional coefficient. Using the profile least squares technique, we obtain the optimal convergence rate and asymptotic normality for the estimators of the parameters. Based on these estimators, we also derive the limiting distribution of the Wald test statistic for linear hypotheses on the parameters. Numerical studies indicate that our proposed procedure yields smoother estimates of the functional coefficient in finite samples.
The rest of the paper is organized as follows. In Sect. 2, we introduce the polynomial spline estimate for partial functional linear models. Section 3 investigates the asymptotic properties of the estimators and discusses the statistical inference problem. Simulation studies are presented in Sect. 4. Conclusions and directions for further research are given in Sect. 5. All technical details and proofs are given in the "Appendix".
2 Polynomial spline estimation
Let the observed data \((X_i,\mathbf{Z}_i,Y_i)\), \(i=1,\ldots , n\), which are independent and identically distributed (i.i.d.), be generated from the following partial functional linear model
where \(Y_i\) and \(\mathbf{Z}_i=(Z_{i1},\ldots ,Z_{ip})^{T}\) are the scalar response variable and the p-dimensional predictor vector, respectively. The predictor variable \(X_i\) is a random function valued in \(H=L^2([0,1])\), the Hilbert space of square integrable functions defined on the unit interval. Let \(\langle \phi ,\varphi \rangle =\int ^1_0 \phi (t)\varphi (t)dt\) denote the usual inner product of functions \(\phi \) and \(\varphi \), and let \(\Vert \phi \Vert =\langle \phi ,\phi \rangle ^{1/2}\) denote the norm of H. The random errors \(\varepsilon _i\) are independent and identically distributed with mean 0 and finite variance \(\sigma ^2\), and are independent of \((X_i,\mathbf{Z}_i)\). Let \(\beta \) be an unknown p-dimensional parameter vector and \(\alpha (t)\) an unknown smooth slope function belonging to H.
Before introducing the polynomial spline estimation, we briefly recall the definition of a polynomial spline. Let \(k\ge 0\). The sequence \(0=t_0<t_1<\cdots <t_{N_n}<t_{N_{n}+1}=1\) is a partition of the interval [0, 1], called the knot sequence. A function is called a spline of degree k if it is a polynomial of degree k on each of the intervals \([t_i,t_{i+1}]\) \((i=0,1,\ldots ,N_{n})\) and, for \(k\ge 1\), has \(k-1\) continuous derivatives on [0, 1].
We next consider the polynomial spline estimate \(\widehat{\alpha }\) of \(\alpha \). Let \(S_{k,N_n}\) be the space of polynomial splines defined on interval [0, 1] with degree k and \(N_n\) interior knots. The space \(S_{k,N_n}\) is a \(K_n\)-dimensional linear space, \(K_n=N_n+k+1\). From Theorem XII.1 of de Boor (2001), we can conclude that, if the slope function \(\alpha (t)\) is sufficiently smooth, there is a spline function \(a(t)\in S_{k,N_n} \) such that
where \(B_j, j=1,\ldots , K_n\) are the B-spline basis functions. Plugging the approximation (2) into model (1), we have
where the spline coefficient vector \(b=(b_1,\ldots , b_{K_n})^T\) and the parameter vector \(\beta \) are to be estimated. Then, the semiparametric estimation problem in model (1) reduces to an ordinary parametric estimation problem.
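To make the basis construction concrete, the following Python sketch (the paper's own experiments use Matlab) evaluates the \(K_n=N_n+k+1\) B-spline basis functions of degree k with equally spaced interior knots on [0, 1] via the Cox–de Boor recursion; the function name and interface are our own.

```python
import numpy as np

def bspline_basis(t, n_interior, degree):
    """Evaluate the K = n_interior + degree + 1 B-spline basis functions
    of the given degree, with equally spaced interior knots on [0, 1],
    at the points t, using the Cox-de Boor recursion."""
    interior = np.linspace(0, 1, n_interior + 2)[1:-1]
    knots = np.concatenate((np.zeros(degree + 1), interior, np.ones(degree + 1)))
    t = np.asarray(t, dtype=float)
    # degree-0 splines: indicators of the knot intervals [knots[j], knots[j+1])
    N = np.zeros((len(t), len(knots) - 1))
    for j in range(len(knots) - 1):
        N[:, j] = (knots[j] <= t) & (t < knots[j + 1])
    N[t == 1.0, -(degree + 1)] = 1.0   # close the last interval at t = 1
    # Cox-de Boor recursion up to the requested degree
    for d in range(1, degree + 1):
        Nd = np.zeros((len(t), N.shape[1] - 1))
        for j in range(N.shape[1] - 1):
            left = knots[j + d] - knots[j]
            right = knots[j + d + 1] - knots[j + 1]
            if left > 0:
                Nd[:, j] += (t - knots[j]) / left * N[:, j]
            if right > 0:
                Nd[:, j] += (knots[j + d + 1] - t) / right * N[:, j + 1]
        N = Nd
    return N  # shape (len(t), K)
```

The rows of the returned matrix sum to one (partition of unity), a convenient sanity check on the knot vector.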
Consider the squared loss function
The estimators of b and \(\beta \) can be obtained by minimizing (4). To facilitate the study of the asymptotic properties, we apply the profile least squares procedure to estimate the unknown spline coefficients and parameters. The estimators of \(\beta \) and b are given by
where \(Y = (Y_1, \ldots , Y_n)^T\), \(\mathbf{Z}=(\mathbf{Z}_1, \ldots , \mathbf{Z}_n)^T\), \(B=\big \{\langle X_i,B_{j}\rangle \big \}_{\mathop {i=1,\dots ,n}\limits _{j=1,\dots ,K_n}}\) and \(A = B(B^T B)^{-1} B^T\). Then, the polynomial spline estimator of \(\alpha (t)\) and the estimator of \(\sigma ^2\) can be respectively defined by
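To fix ideas, the estimators above can be sketched in Python (the paper's experiments use Matlab). The inner products \(\langle X_i,B_j\rangle \) are approximated by the trapezoidal rule, as described in Sect. 4.2; the function and variable names are ours, and `basis` is any matrix whose columns are basis functions evaluated on the observation grid.

```python
import numpy as np

def trapezoid_weights(t):
    """Quadrature weights of the trapezoidal rule on the grid t."""
    w = np.zeros(len(t))
    dt = np.diff(t)
    w[:-1] += dt / 2
    w[1:] += dt / 2
    return w

def profile_spline_fit(Xgrid, tgrid, Z, Y, basis):
    """Profile least squares for the partial functional linear model.
    Xgrid: (n, m) discretized curves X_i on tgrid; basis: (m, K) basis
    functions on tgrid.  Returns (beta_hat, b_hat, sigma2_hat)."""
    n = len(Y)
    w = trapezoid_weights(tgrid)
    B = (Xgrid * w) @ basis                      # B_{ij} ~ <X_i, B_j>
    A = B @ np.linalg.solve(B.T @ B, B.T)        # projection onto col(B)
    R = np.eye(n) - A
    # beta_hat = (Z^T (I - A) Z)^{-1} Z^T (I - A) Y
    beta_hat = np.linalg.solve(Z.T @ R @ Z, Z.T @ R @ Y)
    resid = Y - Z @ beta_hat
    # b_hat = (B^T B)^{-1} B^T (Y - Z beta_hat)
    b_hat = np.linalg.solve(B.T @ B, B.T @ resid)
    sigma2_hat = np.mean((resid - B @ b_hat) ** 2)
    return beta_hat, b_hat, sigma2_hat
```

Note that \((I-A)B=0\), so the fitted values agree with ordinary least squares on the combined design \([\mathbf{Z},B]\).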
3 Asymptotic properties
In this section, we investigate the asymptotic properties of the polynomial spline estimators. To facilitate the discussion, the following notation is needed. For two sequences of positive numbers \(a_n\) and \(b_n\), \(a_n\lesssim b_n\) means that \(a_n/b_n\) is uniformly bounded, and \(a_n\asymp b_n\) means that \(a_n\lesssim b_n\) and \(b_n\lesssim a_n\). The covariance operator \(\Gamma \) of the random function X is defined as \(\Gamma x(t)=\int _0^1 EX(t)X(s)x(s)ds, x\in H\). The norm \(\Vert \cdot \Vert \) of a function \(f\in C^{k+1}([0,1])\) is defined as \(\Vert f\Vert =\big (\int _0^1 f(t)^2dt\big )^{1/2}\).
In order to establish the theoretical properties of polynomial spline estimation, the following assumptions are required:
-
(C1)
There are some positive constants M and \(\frac{1}{4(k+1)}<r<\frac{1}{2}\) such that
$$\begin{aligned} h=\max _{j=0,\ldots ,N_n}(t_{j+1}-t_j)\asymp n^{-r}, \quad K_n\asymp n^r,\quad h/\min _{j=0,\ldots ,N_n}(t_{j+1}-t_j)\le M. \end{aligned}$$ -
(C2)
\(E||X||^4<\infty \) and the eigenvalues of the covariance operator \(\Gamma \) of X are strictly positive.
-
(C3)
\(E|Z_{11}|^4+\cdots +E|Z_{1p}|^4+E|\varepsilon _1|^4<\infty .\)
-
(C4)
For \(j=1,\ldots ,p\), \(E(Z_{1j}|X_1)\) is a continuous linear functional, that is, there exists a function \(g_j\in H\) such that \(E(Z_{1j}|X_1)=\langle X_1,g_j \rangle \). Further, we assume \(g_j, j=1,\ldots ,p\) and slope function \(\alpha \) are smooth enough, that is, \(g_j\in C^{k+1}([0,1])\), \(\alpha \in C^{k+1}([0,1])\).
-
(C5)
Let \(\eta _{1j}=Z_{1j}-E(Z_{1j}|X_1)=Z_{1j}-\langle X_1,g_j \rangle , j=1,\ldots ,p\), \(\eta _1=(\eta _{11},\ldots ,\eta _{1p})^T\). Furthermore, we assume that \(\Sigma =E\eta _1\eta _1^T\) is a positive definite matrix.
Remark 1
Conditions (C1)–(C5) are quite standard in polynomial spline estimation and in the functional linear model. In fact, condition (C1) is similar to (3) in Zhou et al. (1998), and the requirement on the number of spline basis functions \(K_n\) is similar to (16) in Shin (2009). Condition (C2) is very common in the functional linear model (see H1 and H2 in Cardot et al. 1999 and (12) in Shin 2009); however, we do not need an additional assumption on the eigenvalues of the covariance operator \(\Gamma \) such as (14) in Shin (2009). Condition (C3) is similar to (11) in Aneiros-Pérez and Vieu (2006) and (17) in Shin (2009). Condition (C4) requires that the dependence between the covariate \(Z_{1j}\) \((j=1,\ldots ,p)\) and the random function \(X_1\) be a continuous linear functional, which is a special case of the conditional expectation operators \(E(X_{ij}|T_i=t)\) in Aneiros-Pérez and Vieu (2006). Furthermore, to ensure the validity of the polynomial spline estimation, we need a smoothness condition on each functional coefficient \(g_j\) and on \(\alpha \). Condition (C5) is similar to (12) in Aneiros-Pérez and Vieu (2006) and (20) in Shin (2009).
Under the above conditions, we have the following results.
Theorem 1
If conditions (C1)–(C5) hold, as \(n\rightarrow \infty \), we have
Theorem 2
Suppose that conditions (C1)–(C5) are satisfied, then
Remark 2
For the estimation of the parameter vector, Theorem 1 shows that the asymptotic result is similar to Theorem 1(i) in Aneiros-Pérez and Vieu (2006) and Theorem 3.1 in Shin (2009). For the estimation of the functional coefficient, Theorem 2 indicates that, under the smoothness condition (\(C^{k+1}\) in particular), the global convergence rate is similar to those given in Newey (1997) and Huang and Shen (2004b) in the nonparametric regression setting, which shows that the presence of a random vector as an additional predictor does not change the rate of convergence of the estimated functional coefficient. Moreover, if we take \( r=(a+1)/(a+2b)\) and \(k=(2b-1)/2(a+1)-1\), then we obtain the same rate of convergence of the estimated functional coefficient as Shin (2009).
For the estimator of the variance \(\sigma ^2\), we have the following theorem.
Theorem 3
If conditions (C1)–(C5) hold, then we have
where \(\Lambda ^2=E(\varepsilon _1^2-\sigma ^2)^2\).
Further, let \(\widehat{\Sigma }_n=n^{-1}{\mathbf{Z}^T(I-A)\mathbf{Z}}\). In light of the above theorems, we obtain the following corollary.
Corollary 1
Under conditions (C1)–(C5), as \(n\rightarrow \infty \), we have
Remark 3
According to Corollary 1, we can obtain an approximate \((1-\gamma )\) asymptotic confidence region for the parameter vector \(\beta \), that is,
Also, we can get an approximate \((1-\gamma )\) asymptotic confidence interval for every parameter \(\beta _j, j=1,\ldots ,p\), that is,
where \(\widehat{\sigma }_n(\widehat{\Sigma }_n)^{-1}_{jj}\) is the jth diagonal element of \(\widehat{\sigma }_n(\widehat{\Sigma }_n)^{-1}\).
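These intervals translate into a few lines of code. The following is a hedged Python sketch (names ours), assuming the limiting covariance \(\sigma ^2\Sigma ^{-1}\) as in Theorem 1, with \(\widehat{\Sigma }_n = n^{-1}\mathbf{Z}^T(I-A)\mathbf{Z}\) and a plug-in variance estimate `sigma2_hat`:

```python
import numpy as np
from statistics import NormalDist

def wald_confidence_intervals(beta_hat, Z, A, sigma2_hat, gamma=0.05):
    """Componentwise approximate (1 - gamma) confidence intervals
    beta_hat_j +/- z_{1-gamma/2} sqrt(sigma2_hat * (Sigma_n^{-1})_{jj} / n),
    where Sigma_n = Z^T (I - A) Z / n."""
    n = Z.shape[0]
    Sigma_n = Z.T @ (np.eye(n) - A) @ Z / n
    se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(Sigma_n)) / n)
    z = NormalDist().inv_cdf(1 - gamma / 2)   # standard normal quantile
    return beta_hat - z * se, beta_hat + z * se
```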
4 Simulation studies
In this section, we present simulation results to illustrate the finite sample behavior of the polynomial spline estimation and compare our method with that of Shin (2009).
4.1 Models for generating simulation data
In this subsection we specify four models for generating the simulation data \(\big \{(X_i,\mathbf{Z}_i,Y_i)\big \}_{i=1}^n\). In the first three models we generate \(X_i\) in the same form as Lian (2011), that is,
where \(\phi _1(t)=1\), \(\phi _j(t)=\sqrt{2}\cos ((j-1)\pi t)\) for \( j\ge 2\), and the \(\xi _{ij}\) are independent and identically distributed as \(U[-\sqrt{3},\sqrt{3}]\).
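For reference, the curve generation can be sketched in Python as follows (the paper uses Matlab). We assume the plain truncated expansion \(X_i(t)=\sum _{j=1}^{50}\xi _{ij}\phi _j(t)\) implied by the display above; if the original recipe weights the scores \(\xi _{ij}\), those weights should be folded into the score draw.

```python
import numpy as np

def generate_curves(n, tgrid, J=50, rng=None):
    """Simulate X_i(t) = sum_{j=1}^{J} xi_ij * phi_j(t), where
    phi_1(t) = 1, phi_j(t) = sqrt(2) cos((j-1) pi t) for j >= 2, and
    xi_ij are i.i.d. Uniform[-sqrt(3), sqrt(3)] (mean 0, variance 1)."""
    rng = np.random.default_rng(rng)
    j = np.arange(1, J + 1)[:, None]
    phi = np.where(j == 1, 1.0,
                   np.sqrt(2) * np.cos((j - 1) * np.pi * tgrid[None, :]))
    xi = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(n, J))
    return xi @ phi   # (n, len(tgrid)) matrix of discretized curves
```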
Model 1: \(Y_i=1.5Z_{i1}-Z_{i2}+2Z_{i3}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), where \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3})^T\) is from a multivariate normal distribution \(N(0,\Phi )\) with covariance matrix \(\Phi =[0.9, 0.2,0.3;0.2,0.5,0.1;0.3,0.1,1]\). The functional coefficient \(\alpha (t)=\sum _{j=1}^{50} b_j\phi _j(t)\), where \(b_1=0.5, b_j=4j^{-2}\), for \( j\ge 2\) and the error variable \(\varepsilon _i \) is N(0, 1).
Model 2: \(Y_i=2Z_{i1}-Z_{i2}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), \( Z_{i1}=\int _0^1 X_i(t)\alpha _1(t)dt+\varepsilon _{i1}\), \( Z_{i2}=\int _0^1 X_i(t)\alpha _2(t)dt+\varepsilon _{i2} \), where the functional coefficient \(\alpha (t)\) is defined as in Model 1. In addition, the functional coefficients \(\alpha _1(t)=\sum _{j=1}^{50} b_{1j}\phi _j(t)\) and \(\alpha _2(t)=\sum _{j=1}^{50} b_{2j}\phi _j(t)\), where \(b_{11}=1, b_{21}=-0.5, b_{1j}=2j^{-2}, b_{2j}=3j^{-2}\) for \(j\ge 2\). The random error variables \(\varepsilon _i\) and \(\varepsilon _{i1}\) are N(0, 0.25), and \(\varepsilon _{i2}\) is N(0, 0.64).
Model 3: \(Y_i=1.5Z_{i1}+5Z_{i2}-1.7Z_{i3}+\int _0^1 X_i(t)\alpha (t)dt+\varepsilon _i\), where \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3})^T\) is from a multivariate normal distribution \(N(0,\mathbf{I}_3)\). The functional coefficient is given by
which is similar to example (a) in Cardot et al. (2003). The error variable \(\varepsilon _i\) is N(0, 0.36).
Model 4: We take the same example as in Shin (2009), that is,
where \(X_i(t)\) is a standard Brownian motion and \(\alpha (t)=\sqrt{2}\sin (\pi t/2)+3\sqrt{2}\sin (3\pi t/2)\). Random vector \(\mathbf{Z}_i=(Z_{i1}, Z_{i2}, Z_{i3}, Z_{i4}, Z_{i5})^T\) is from a multivariate normal distribution \(N(0,\mathbf{I}_5)\), and error variable \(\varepsilon _i\) is N(0, 1).
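The Brownian predictor in Model 4 is easy to simulate on a grid; a Python sketch (the function name is ours):

```python
import numpy as np

def brownian_paths(n, tgrid, rng=None):
    """Simulate n standard Brownian motion paths on tgrid (tgrid[0] == 0)
    as cumulative sums of independent N(0, dt) increments."""
    rng = np.random.default_rng(rng)
    dt = np.diff(tgrid)
    inc = rng.normal(scale=np.sqrt(dt), size=(n, len(dt)))
    return np.hstack([np.zeros((n, 1)), np.cumsum(inc, axis=1)])
```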
To mimic practice, the random functions \(X_i(t)\) in Models 1–4 are observed only at 100 equally spaced points on [0, 1].
4.2 Implementation
In this subsection, we describe the implementation of our method and that of Shin (2009). To implement Shin (2009)'s method, we need to turn the discrete observations of \(X_i(t)\) into functional data objects. In this paper, we use the method described in Chapter 4 of Ramsay et al. (2009) and choose 25 B-spline basis functions to build the functional data; we then use the pca.fd function described in Chapter 7 of Ramsay et al. (2009) to carry out functional principal component analysis. For our procedure, we have to choose the degree of the spline functions and the positions and number of knots. Similarly to Huang and Shen (2004b), we choose a B-spline basis with equally spaced knots and fixed degree 2. Then we only need to select \(K_n\), the number of B-spline basis functions or eigenfunctions. Many methods can be used to select \(K_n\), for example, AIC (Akaike 1974), BIC (Schwarz 1978), "leave-one-subject-out" cross-validation (Rice and Silverman 1991) and modified multi-fold cross-validation (Cai et al. 2000). In this paper, we use the "leave-one-subject-out" cross-validation technique to choose the number of B-spline basis functions and eigenfunctions. Specifically, we select \(K_n\) by minimizing the following cross-validation score:
where \(\widehat{\alpha }^{-i}\) and \(\widehat{\beta }^{-i}\) are the estimators computed after deleting the ith observation \((X_i, \mathbf{Z}_i, Y_i)\). In our procedure, the number of B-spline basis functions ranges from 3 to 12 and the number of eigenfunctions ranges from 1 to 10. The integrals involved in the matrix B are approximated by the trapezoidal rule.
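For a fixed \(K_n\), minimizing the squared loss over \((\beta ,b)\) is ordinary least squares on the combined design \([\mathbf{Z},B]\), so the leave-one-out score can be computed without n refits through the standard deleted-residual identity \(e_{(i)}=e_i/(1-H_{ii})\), where H is the hat matrix. The following Python sketch uses this shortcut (our observation; the original Matlab code may refit directly):

```python
import numpy as np

def loo_cv_score(Y, Z, B):
    """Leave-one-out cross-validation score for the least squares fit on
    the combined design [Z, B].  Because the fit is linear in Y, the
    deleted residuals have the closed form e_i / (1 - H_ii)."""
    D = np.hstack([Z, B])
    H = D @ np.linalg.solve(D.T @ D, D.T)   # hat matrix of the full fit
    resid = Y - H @ Y
    return np.mean((resid / (1.0 - np.diag(H))) ** 2)
```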
Two risk functions are used to assess the performance of our estimators and those of Shin (2009): the mean squared prediction error (MSPE) of the response variable Y, which is similar to (26) in Cardot et al. (2003),
and the square root of the average squared error (RASE) of the functional coefficient \(\alpha (t)\), which is similar to (6) in Huang and Shen (2004b),
where \(\{t_k, k=1,\ldots , n_{grid}\}\) are grid points chosen to be equally spaced on the interval [0, 1]. In this paper, the number of grid points is \(n_{grid}=101\).
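The two risk functions can be coded directly; a Python sketch with our own names, where `alpha_hat` and `alpha_true` are callables evaluated on the equally spaced grid:

```python
import numpy as np

def rase(alpha_hat, alpha_true, n_grid=101):
    """Square root of the average squared error of alpha_hat over an
    equally spaced grid on [0, 1]."""
    t = np.linspace(0, 1, n_grid)
    return float(np.sqrt(np.mean((alpha_hat(t) - alpha_true(t)) ** 2)))

def mspe(Y, Y_hat):
    """Mean squared prediction error of the response."""
    Y, Y_hat = np.asarray(Y, float), np.asarray(Y_hat, float)
    return float(np.mean((Y - Y_hat) ** 2))
```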
We use Matlab to implement our procedure. For each of the simulation models above, we consider two sample sizes, \(n=100\) and \(n=500\), and each simulation experiment is repeated 500 times.
4.3 Simulation results
In this subsection, we present the simulation results for the four models described in Sect. 4.1. Subscripts \(s\) and \(p\) denote our estimation method and that of Shin (2009), respectively.
Figure 1 displays the empirical distribution function of \(\widehat{\chi }_{n,3}^2\) from 500 simulated samples under Model 1. For Models 2–4, the empirical distribution functions of \(\widehat{\chi }_{n,p}^2\) behave similarly, so we omit them to save space. We can see from this figure that, as the sample size n increases, the empirical distribution approaches the theoretical distribution more and more closely, which confirms the asymptotic normality established in Sect. 3.
Table 1 summarizes the mean squared errors (MSE) of the estimators \(\widehat{\beta }\) and \(\widehat{\sigma }_n^2\) under Models 1–4. Table 2 presents the mean and standard deviation of RASE and MSPE to evaluate the performance of our estimation procedure. Figure 2 shows our estimate (dashed curve) and that of Shin (2009) (dotted curve) from the typical samples corresponding to the minimum of \(\text {RASE}_s\) and \(\text {RASE}_p\) under Models 1–4, respectively. From these results, we see that the two estimation methods are very close for the parametric components \(\beta \) and \(\sigma \). However, from the perspectives of prediction and of estimating the functional coefficient \(\alpha (t)\), if \(\alpha (t)\) can be expressed as a linear combination of eigenfunctions of the covariance operator \(\Gamma \), Shin (2009)'s method is superior to ours; if not, our method seems to perform better. At the same time, the differences between the two estimation methods shrink as the sample size n increases.
Table 3 displays the mean and standard deviation of the running CPU time on a Dell personal computer with an Intel(R) Core(TM)2 Duo CPU. Table 3 suggests that, at least in the examples studied, our method is computationally faster when the sample size is small, while Shin (2009)'s is faster when the sample size is large.
5 Conclusion and further research
In this paper, we propose polynomial spline estimation for the partial functional linear model. Some asymptotic results are established, including asymptotic normality for the parameter vector and the global rate of convergence for the functional coefficient, and the simulation studies confirm the theoretical results. On the one hand, from the perspectives of prediction and of estimating the functional coefficient \(\alpha (t)\), we find that if \(\alpha (t)\) can be expressed as a linear combination of eigenfunctions of the covariance operator \(\Gamma \), Shin (2009)'s method is superior to ours; if not, our method seems to perform better, and the differences between the two methods shrink as the sample size n increases. On the other hand, in terms of computational time, our method is faster, at least in the examples studied, when the sample size is small, while Shin (2009)'s is faster when the sample size is large. In this study we only consider fully observed functional predictors; in practice, however, each functional observation is often available only at sparse discrete points. In that case, one can use smoothing techniques to approximate the functional observations, and the polynomial spline method can then still be used to estimate the partial functional linear model.
References
Akaike H (1974) A new look at the statistical model identification. IEEE Trans Autom Control 19:716–723
Aneiros-Pérez G, Vieu P (2006) Semi-functional partial linear regression. Stat Probab Lett 76:1102–1110
Aneiros-Pérez G, Vieu P (2008) Nonparametric time series prediction: a semi-functional partial linear modeling. J Multivar Anal 99:834–857
Baíllo A, Grané A (2009) Local linear regression for functional predictor and scalar response. J Multivar Anal 100:102–111
Burba F, Ferraty F, Vieu P (2009) K-nearest neighbour method in functional nonparametric regression. J Nonparametric Stat 21:453–469
Cai TT, Hall P (2006) Prediction in functional linear regression. Annals Stat 34:2159–2179
Cai Z, Fan J, Yao Q (2000) Functional-coefficient regression model for nonlinear time series. J Am Stat Assoc 95:941–956
Cardot H, Ferraty F, Sarda P (1999) Functional linear model. Stat Prob Lett 45:11–22
Cardot H, Ferraty F, Sarda P (2003) Spline estimators for the functional linear model. Stat Sin 13:571–591
Cardot H, Sarda P (2008) Varying-coefficient functional linear regression models. Commun Stat Theory Methods 37:3186–3203
Chen H (1991) Polynomial splines and nonparametric regression. J Nonparametric Stat 1:143–156
Crambes C, Kneip A, Sarda P (2009) Smoothing splines estimators for functional linear regression. Annals Stat 37:35–72
de Boor C (2001) A practical guide to splines. Springer, New York
DeVore RA, Lorentz GG (1993) Constructive approximation. Springer, Berlin
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Huang JZ (2003a) Asymptotics for polynomial spline regression under weak conditions. Stat Probab Lett 65:207–216
Huang JZ (2003b) Local asymptotics for polynomial spline regression. Annals Stat 31:1600–1635
Huang JZ, Wu CO, Zhou L (2004a) Polynomial spline estimation and inference for varying coefficient model with longitudinal data. Stat Sin 14:763–788
Huang JZ, Shen H (2004b) Functional coefficient regression models for non-linear time series: a polynomial spline approach. Scand J Stat 31:515–534
Hall P, Horowitz J (2007) Methodology and convergence rates for functional linear regression. Annals Stat 35:70–91
Li Y, Hsing T (2007) On rates of convergence in functional linear regression. J Multivar Anal 98:1782–1804
Lian H (2011) Functional partial linear model. J Nonparametric Stat 23:115–128
Newey WK (1997) Convergence rates and asymptotic normality for series estimators. J Econom 79:147–168
Ramsay J, Dalzell C (1991) Some tools for functional data analysis. J R Stat Soc Ser B 53:539–572
Ramsay J, Silverman B (1997) Functional data analysis. Springer, New York
Ramsay J, Silverman B (2005) Functional data analysis, 2nd edn. Springer, New York
Ramsay J, Hooker G, Graves S (2009) Functional data analysis with R and Matlab. Springer, New York
Rice JA, Silverman BW (1991) Estimating the mean and covariance structure nonparametrically when the data are curves. J R Stat Soc Ser B 53:233–243
Shin H (2009) Partial functional linear regression. J Stat Plan Inference 139:3405–3418
Stone CJ (1994) The use of polynomial splines and their tensor products in multivariate function estimation (with discussion). Annals Stat 22:118–171
Stone CJ, Hansen M, Kooperberg C, Truong YK (1997) Polynomial splines and their tensor products in extended linear modeling (with discussion). Annals Stat 25:1371–1470
Schwarz G (1978) Estimating the dimension of a model. Annals Stat 6:461–464
Zhang D, Lin X, Sowers M (2007) Two-stage functional mixed models for evaluating the effect of longitudinal covariate profiles on a scalar outcome. Biometrics 63:351–362
Zhou S, Shen X, Wolfe DA (1998) Local asymptotics for regression splines and confidence regions. Annals Stat 26:1760–1782
Zhou J, Chen M (2012) Spline estimators for semi-functional linear model. Stat Probab Lett 82:505–513
Acknowledgments
The work was supported by the National Natural Science Foundation of China (Grant Nos. 10961026, 11171293, 11225103, 11301464), the Ph.D. Special Scientific Research Foundation of Chinese Universities (20115301110004), the Key Fund of Yunnan Province (Grant No. 2010CC003) and the Scientific Research Foundation of Yunnan Provincial Department of Education (No. 2013Y360). We are grateful to the referees and the editors for their constructive remarks that greatly improved the manuscript.
Appendix
In the appendix, we give the proofs of the theorems and corollary in Sect. 3.
Set \(B_{s}={K_n}^{1/2}N_{s}^{b}, s=1,\ldots , K_n\), where \(N_{s}^{b}\) are the normalized B-splines. From Theorem 4.2 in Chapter 5 of DeVore and Lorentz (1993), we have that for any spline function \(\sum _{s=1}^{K_n} b_s B_{s}\), there are positive constants \(M_1\) and \(M_2\) such that
where \(\Vert \cdot \Vert _2\) is the Euclidean norm. Let \(\Vert r\Vert _{\infty }=\sup \nolimits _{x\in [0,1]} |r(x)|\).
In order to prove the theorems, we need the following two lemmas.
Lemma 1
If conditions (C1) and (C2) hold, then we have
-
(i)
$$\begin{aligned} \sup _{a\in S_{k,N_n}}\Big |\frac{\frac{1}{n}\sum _{i=1}^n \langle X_i,a \rangle ^2}{E \langle X,a \rangle ^2}-1\Big |=o_p(1). \end{aligned}$$
-
(ii)
there exists an interval \([M_3,M_4],0<M_3<M_4<\infty \) such that as \(n\rightarrow \infty \),
$$\begin{aligned} P\Big \{\text {all the eigenvalues of}~\frac{1}{n}B^TB~\text {fall in}~[M_3,M_4]\Big \}\rightarrow 1. \end{aligned}$$
Note that Lemma 1 generalizes Lemmas 1 and 2 of Huang and Shen (2004b) to the functional data case. We give a brief proof below.
Proof
(i) Let \(\Gamma _n\) denote the empirical version of the operator \(\Gamma \), that is,
By the Cauchy–Schwarz inequality, condition (C2) and (28) in Cardot et al. (2003), we have
Then for an arbitrary constant \(\epsilon >0\), by Lemma 5.2 in Cardot et al. (1999), we have
together with (C2), which gives the result.
(ii) Let \(b=(b_1,\ldots ,b_{K_n})^T, a=\sum _{s=1}^{K_n} b_sB_s\). It follows from (i) that except an event whose probability tends to zero as \(n\rightarrow \infty \),
By the Cauchy–Schwarz inequality, (28) in Cardot et al. (2003) and (7),
Thus, except an event whose probability tends to zero, \(\frac{1}{n}b^TB^TBb\asymp \Vert b\Vert _2^2,\) holds uniformly for all b, which yields the result. \(\square \)
Lemma 2
Under conditions (C1)–(C5), as \(n\rightarrow \infty \), we have
Proof
Let \(\mu _j(X_i)=E(Z_{ij}|X_i)=\langle X_i,g_j\rangle , \eta _{ij}=Z_{ij}-\mu _j(X_i)\),
We also define \(V=(\tilde{V_1},\ldots ,\tilde{V_p})\), \(\eta =(\tilde{\eta _1},\ldots ,\tilde{\eta _p})\). Then, \(\mathbf{Z}=\eta +V\) and
For the (j, l)th element of \(I_1\)
By independence and the Cauchy–Schwarz inequality, we have
Further, by \(C_r\) inequality and (C2)–(C4), we have
Thus,
Note that \(A\ge 0\), then we have
By Lemma 1, we can know that except an event whose probability tends to zero,
Also note that \(E\langle X_i,B_s\rangle \eta _{ij}=E\langle X_i,B_s\rangle E(\eta _{ij}|X_i)=0\). Then, by (7) and conditions (C2)–(C4), there exists a positive constant C such that
Thus, for \(j,l=1,\ldots ,p\),
which together with (8) yields
For the (j, l)-th element of \(I_4\), \(j,l=1,\ldots ,p\),
by the Cauchy–Schwarz inequality,
It follows from Theorem XII.1 of de Boor (2001) that there exist positive constants \(C_j\) and spline functions \(g_j^*\in S_{k,N_n}\), \(j=1,\ldots ,p\), such that
Set \(g_j^*=\sum _{s=1}^{K_n} b_{js}^*B_{s}, \quad b_j^*=(b_{j1}^*,\ldots ,b_{jK_n}^*)^T,\quad j=1,\ldots ,p\), then,
As A is an orthogonal projection matrix,
From the above results and (C1), we have
that is,
For the (j, l)-th element of \(I_2\) and \(I_3\), \(j,l=1,\ldots ,p\), we have
Using (9) and (10), we can infer that
The combination of (9)–(11) allows us to finish the proof of Lemma 2. \(\square \)
Proof of Theorem 1
Denote \(\Phi =\Big (\langle X_1,\alpha \rangle ,\ldots ,\langle X_n,\alpha \rangle \Big )^T\), \(\varepsilon =(\varepsilon _1,\ldots ,\varepsilon _n)^T\). Then, \(Y=\mathbf{Z}\beta +\Phi +\varepsilon \). We can write
Observe that
For \(\Delta _{11}\), as \(\mathbf{Z}=\eta +V\),
By (C4) and Theorem XII.1 of de Boor (2001), we know that there is a spline function \(\alpha ^*=\sum _{s=1}^{K_n} b_s^*B_s\in S_{k,N_n}\) and a positive constant C such that
Set \(\Phi ^*=(\langle X_1,\alpha ^*\rangle ,\ldots ,\langle X_n,\alpha ^*\rangle )^T\) and \(b^*=(b_1^*,\ldots ,b_{K_n}^*)^T\), we have \(\Phi ^*=Bb^*\). For \(j=1,\ldots ,p\), by conditions (C1), (C2), (C4) and Theorem XII.1 of de Boor (2001), we can infer
Thus, by (C1) we have
Observe that for \(j=1,\ldots ,p\),
As \(E\Big (\eta _{ij}\langle X_i,\alpha -\alpha ^*\rangle \Big )=E\Big [\langle X_i,\alpha -\alpha ^*\rangle E(\eta _{ij}|X_i)\Big ]=0\) and
we can infer
Further, by Lemma 1, (C1) and (13), we can show
By (12), (14)–(16) and Lemma 2, we have
\(\Delta _{21}\) can be expressed as
Let \(\epsilon _i=\eta _i \varepsilon _i\). Since \(\varepsilon _i\) is independent of \((X_i,\mathbf{Z}_i)\) and \(\{(X_i,\mathbf{Z}_i,Y_i)\}\) is an i.i.d. sequence, the \(\epsilon _i\) are i.i.d. random variables with \(E\epsilon _i=0\) and \(Var(\epsilon _i)=\sigma ^2\Sigma \).
Observe that
Then, by the central limit theorem,
Also note that
Then, it follows from Lemma 1 that
Since \(E\langle X_i,B_s\rangle \varepsilon _i\langle X_j,B_s\rangle \varepsilon _j=0, i\ne j\), we have
that is, \(\varepsilon ^TA\varepsilon =O_p(K_n)\). In addition, we know from the proof of Lemma 2 that
Thus,
which, together with (18), yields
For the jth element of \(R_2\), \(j=1,\ldots ,p\), we have
Since \(\varepsilon _i\) is independent of \((X_i,\mathbf{Z}_i)\), we have
Then,
Also, observe that
Then, by (C1), we have
From the above results, we can infer
Now, by Lemma 2, (17), (19), (20) and Slutsky's theorem, we obtain Theorem 1. \(\square \)
Proof of Theorem 2
Observe that
Let \(\tilde{Y}=\mathbf{Z}(\beta -\widehat{\beta })+\Phi \). Denote \(\tilde{b}=(B^TB)^{-1}B^T\tilde{Y}\) and \(\tilde{\alpha }(t)=\sum _{s=1}^{K_n} \tilde{b_s}B_s(t)\), where \(\tilde{b}=(\tilde{b_1},\ldots ,\tilde{b_{K_n}})^T\). Then, \(\widehat{b}-\tilde{b}=(B^TB)^{-1}B^T\varepsilon \). By Lemma 1, we have
except on an event whose probability tends to zero as \(n\rightarrow \infty \). Thus, by (7), we can infer
Also, it follows from Theorem XII.1 of de Boor (2001) that there exist a spline function \(\alpha ^*(t)=\sum _{s=1}^{K_n} b_s^*B_s(t)\in S_{k,N_n}\), where \(b^*=(b_1^*,\ldots ,b_{K_n}^*)^T\), and a constant \(C>0\) such that
By Theorem XII.1 of de Boor (2001) and (7), we have
Observe that \(B\tilde{b}=B(B^TB)^{-1}B^T\tilde{Y}\) and \(B(B^TB)^{-1}B^T\) is an orthogonal projection matrix. Thus,
Applying (C2), (22) and the Cauchy–Schwarz inequality, we obtain that
that is,
In addition, note that
Then, it follows from Theorem 1 and (C4) that
which together with (23) yields
Further, we can infer that
Then, the combination of (21), (22), (25) and (26) completes the proof of Theorem 2. \(\square \)
Proof of Theorem 3
We can write
Observe that
Then, by (C1), (C2) and Theorem 2, we have
It follows from (24) that
For \(R_{n3}\), since \(E(\varepsilon _1^2-\sigma ^2)=0\) and \(\Lambda ^2=E(\varepsilon _1^2-\sigma ^2)^2<\infty \), it follows from the central limit theorem that
For \(R_{n4}\), we have
Then, applying (C1), (C2) and Theorem 2, we can obtain
Note that
Thus, using (C3) and Theorem 1, we have
Also, observe that
Then, by (C1)–(C3) and Theorems 1 and 2, we can get
Finally, using (27)–(32), we can complete the proof of Theorem 3. \(\square \)
Proof of Corollary 1
It follows from Theorem 1 that
Also, by Lemma 2 and Theorem 3, we have that
Then, by Slutsky's theorem, we obtain Corollary 1. \(\square \)
Cite this article
Zhou, J., Chen, Z. & Peng, Q. Polynomial spline estimation for partial functional linear regression models. Comput Stat 31, 1107–1129 (2016). https://doi.org/10.1007/s00180-015-0636-0