
1 Introduction

With advances in technology, it is increasingly common to encounter data that are functions or curves in nature (see Ramsay 2005). Functional linear regression models, first introduced by Ramsay and Dalzell (1991), provide a framework for modeling the dynamic relationship between a response and functional predictors. One of the primary goals of the functional linear model (FLM) is to estimate the functional coefficient, and many procedures have been proposed for this purpose, for example, approaches based on functional principal component analysis (FPCA) (Cardot et al. 1999; Hall and Horowitz 2007; Yao et al. 2005b), spline-based approaches (Crambes et al. 2009; Marx and Eilers 1999), wavelet-based approaches (Zhao et al. 2012; Wang et al. 2019), and others. We refer to Morris (2015) and Reiss et al. (2017) for more informative and extensive reviews of such functional linear models.

Among these methods, FPCA-based approaches for capturing the information in the covariates are popular (Hall et al. 2006; Che et al. 2017). For trajectories observed on a dense and regular grid over the entire domain, existing works include Besse and Ramsay (1986), Rice and Silverman (1991), Cardot et al. (1999), Shin (2009), and Horváth and Kokoszka (2012), to name a few. Yao et al. (2005a) emphasize the case where the functional predictors are observed at irregular and sparse measurement points, often referred to as sparse functional data, and propose a nonparametric method to perform FPCA. For a general review of FPCA, see Shang (2014). In this paper, we use FPCA to estimate the functional coefficient.

Sparse functional data refers to the case where each trajectory is observed at a small number of points distributed randomly over the domain. This differs from partially observed functional data (also called incomplete or fragmentary functional data), first introduced in Liebl (2013), where each trajectory is observed at points covering a subset of the domain in such a way that trajectories can reasonably be treated as fragments of curves (Delaigle and Hall 2016); such data have important applications, for example in biomedicine and economics (see Kraus 2015; Kneip and Liebl 2020). Since partially observed functional data can be treated as functional curves with missing data over the domain, two missing mechanisms have been considered in the literature: one is missing completely at random (MCAR), in which the missing-data mechanism is independent of the other stochastic components (Delaigle and Hall 2016; Goldberg et al. 2014); the other is missingness that depends on systematic strategies, for example when the missing parts of the trajectories occur only in the upper interval of the domain (see Liebl and Rameseder 2019). In the MCAR setting, Delaigle and Hall (2016), Goldberg et al. (2014), and Kraus (2015) address the problem of recovering the missing parts of trajectories, and Kraus (2015) and Kneip and Liebl (2020) model the functional principal component (FPC) scores of an incomplete trajectory. In the scenario where the missing-data mechanism depends on systematic strategies, Liebl and Rameseder (2019) establish estimators for the mean and covariance functions of the incomplete functional data via the fundamental theorem of calculus. To the best of our knowledge, no existing work focuses on estimating the functional coefficient of the FLM with partially observed trajectories.

In this paper, we address the problem of estimating the functional coefficient for partially observed functional data, both without and with measurement error. When trajectories are observed without measurement error, instead of deleting the incomplete trajectories, we estimate the FPC scores of each incomplete trajectory by modeling the missing parts as linear functionals of the observed parts of that trajectory. When trajectories are observed with measurement error, we use local linear smoothers to estimate the mean and covariance functions of the functional predictor and then obtain the FPC scores via conditional expectation.

The contributions of this paper are as follows. First, we extend the FLM approach to partially observed functional data without measurement error, which leads to an improved estimator of the functional coefficient compared with the one obtained by deleting the incomplete trajectories from a given dataset. Second, we develop an estimation method for the functional coefficient in the FLM for incomplete trajectories with measurement error. We illustrate its usefulness by comparing it with two alternatives: one uses an integration method to obtain the FPC scores of the functional predictor instead of conditional expectations; the other ignores the measurement error in the trajectories. Third, in both scenarios, we obtain the rate of convergence of the proposed estimators. Overall, the methodological and numerical developments in this paper provide a practically useful way of analyzing the FLM with partially observed functional data.

The rest of this paper is organized as follows. In Sect. 2, we introduce the functional linear model. In Sect. 3.1, we develop an estimator of the functional coefficient when incomplete trajectories are observed without measurement error and establish theoretical properties of the proposed estimator. An estimator and theoretical properties for incomplete trajectories observed with measurement error are presented in Sect. 3.2. Section 4 illustrates the finite-sample performance of the proposed estimators through simulation studies, followed by a real data analysis in Sect. 5. A discussion is presented in Sect. 6. Proofs of the theorems are given in the Appendix.

2 Functional Linear Model

Consider a functional linear model, in which the scalar response \(Y_i\) is linearly related to the functional covariate \(X_i\):

$$\displaystyle \begin{aligned} Y_i= \alpha + \int_{\mathcal{T}}\gamma(t)X_i(t)dt+\epsilon_i, {} \end{aligned}$$
(1)

where \(\alpha\) is the intercept, \(\{X_i(t): t\in \mathcal{T}, i=1,\ldots,n\}\) are the functional predictors, sampled from the stochastic process \(\{X(t): t\in \mathcal{T}\}\) with mean function \(\mu\) and bounded, closed domain \(\mathcal{T}\), \(\gamma\) is the slope function to be estimated, and the \(\epsilon_i\) are random errors satisfying \(\text{E}[\epsilon_i] = 0\), \(\text{E}[\epsilon_i^2]=\sigma^2<\infty\). Since an estimator of the intercept follows easily once an estimator of \(\gamma\) is available, we focus on estimating \(\gamma\) in what follows (Hall and Horowitz 2007). Let \(\langle \cdot, \cdot \rangle\) and \(\|\cdot\|\) be the inner product and norm on \(L^2(\mathcal{T})\), the set of all square-integrable functions on \(\mathcal{T}\), with \(\langle f, g \rangle = \int_{\mathcal{T}} f(t)g(t)\,dt\) and \(\|f\| = \langle f, f \rangle^{1/2}\) for any \(f, g \in L^2(\mathcal{T})\).

We first recall the FPCA method for estimating the slope function in model (1) when the functional predictor \(X_i\) is observed on the entire domain \(\mathcal{T}\). For the stochastic process \(X \in L^2(\mathcal{T})\), denote its mean function by \(\mu = \text{E}(X)\) and its covariance function by \(c_X(s,t) = \text{cov}(X(s), X(t))\). Assume \(c_X\) is continuous on \(\mathcal{T}\times\mathcal{T}\). By the Mercer Lemma (Riesz and Nagy 1955), the expansion \(c_X(s,t)=\sum_{j=1}^\infty \lambda_j\phi_j(s)\phi_j(t)\) exists, where \(\lambda_1 > \lambda_2 > \cdots > 0\) and \(\phi_1, \phi_2, \ldots\) are the eigenvalue sequence and the continuous orthonormal eigenfunction sequence of the linear operator \(C_X\): \((C_X\phi)(\cdot) = \int_{\mathcal{T}} c_X(\cdot,t)\phi(t)\,dt\), \(\phi\in L^2(\mathcal{T})\), with kernel \(c_X\). On the other hand, by the Karhunen–Loève (K–L) expansion, one has \(X_i(t)=\mu(t)+\sum_{j=1}^\infty U_{ij}\phi_j(t)\), where the random variables \(U_{ij} = \langle X_i - \mu, \phi_j\rangle\) are uncorrelated with \(\text{E}[U_{ij}]=0\), \(\text{E}[U_{ij}^2]=\lambda_j\), and \(\gamma(t)=\sum_{j=1}^\infty \gamma_j\phi_j(t)\) with \(\gamma_j = \langle\gamma, \phi_j\rangle\).
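To make the eigen-analysis concrete, the following is a minimal sketch of FPCA on a regular grid; the helper name `fpca` and the quadrature discretization are illustrative assumptions, not part of the paper.

```python
# A minimal FPCA sketch on a regular grid (illustrative, not the paper's code).
import numpy as np

def fpca(X, t):
    """X: (n, p) curves evaluated on the grid t; returns (lam, phi, U, mu).

    lam: eigenvalue estimates; phi: eigenfunctions in columns, L2-normalised;
    U: (n, p) FPC scores U_ij = <X_i - mu, phi_j>.
    """
    dt = t[1] - t[0]                     # quadrature weight of the regular grid
    mu = X.mean(axis=0)                  # sample mean function
    Xc = X - mu                          # centred curves
    c_hat = Xc.T @ Xc / X.shape[0]       # sample covariance c_X(s, t)
    # Discretised eigenproblem for the integral operator C_X
    evals, evecs = np.linalg.eigh(c_hat * dt)
    order = np.argsort(evals)[::-1]      # sort eigenvalues in decreasing order
    lam = evals[order]
    phi = evecs[:, order] / np.sqrt(dt)  # rescale so that int phi_j^2 = 1
    U = Xc @ phi * dt                    # scores via the K-L expansion
    return lam, phi, U, mu
```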

Based on the K–L expansion, the full model (1) is equivalent to \(Y_i - \text{E}[Y_i] = \sum_{j=1}^\infty \gamma_j U_{ij}+\epsilon_i\), which can be approximated by \(\sum_{j=1}^m\gamma_j U_{ij}+\epsilon_i\) using the first \(m\) terms. To simplify notation, we assume that \(\{Y_i, i = 1,\ldots,n\}\) are centered. Let \(\mathbf{Y} = (Y_1,\ldots,Y_n)^T\), \(\mathbf{\gamma} = (\gamma_1,\ldots,\gamma_m)^T\), let \(\hat{\mu}\) be an estimator of \(\mu\), and let \(\{\hat{\lambda}_j\}\) and \(\{\hat{\phi}_j\}\) be estimators of \(\{\lambda_j\}\) and \(\{\phi_j\}\) with \(\hat{\lambda}_1>\hat{\lambda}_2>\cdots>0\). The least squares estimator \(\hat{\mathbf{\gamma}}\) is then given as

$$\displaystyle \begin{aligned} \hat{\mathbf{{\gamma}}}=\left({\hat{\mathbf U}_m}^T\hat{\mathbf U}_m\right)^{-1}{\hat{\mathbf U}_m}^T \mathbf Y,{} \end{aligned} $$
(2)

provided that \(({\hat{\mathbf U}_m}^T\hat{\mathbf U}_m)^{-1}\) exists, where \(\hat{U}_{ij}=\langle X_i - \hat{\mu},\hat{\phi}_j\rangle\) and \(\hat{\mathbf U}_m=(\hat{U}_{ij})_{i=1,\ldots,n;\, j=1,\ldots,m}\). Moreover, the estimator \(\hat{\gamma}_j\), \(j=1,\ldots,m\), has the equivalent form

$$\displaystyle \begin{aligned} \hat{\gamma}_j = \hat{\lambda}^{-1}_{j}\left\langle n^{-1}\sum_{i=1}^{n} (Y_i-\bar{Y}_0)(X_i-\hat{\mu}), \hat{\phi}_{j}\right\rangle. \end{aligned}$$

Consequently, an estimator of γ is given by

$$\displaystyle \begin{aligned} \hat \gamma(t)=\sum_{j=1}^m \hat{\gamma}_j\hat{\phi}_{j}(t). {} \end{aligned}$$
(3)

In practice, the number \(m\) of included eigenfunctions is chosen by the fraction-of-variance-explained criterion (James et al. 2000): \(m=\min\{k:\sum_{l=1}^{k}\hat{\lambda}_l/\sum_{l=1}^{n}\hat{\lambda}_l\geq R\}\), for a given threshold \(R\). For the asymptotic analysis, we assume \(m\) depends on the sample size \(n\) such that \(m \rightarrow \infty\) as \(n \rightarrow \infty\).
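As a sketch, the truncation choice and the least squares step (2)–(3) can be carried out as follows, reusing the `fpca` helper above; the threshold `R = 0.95` is an assumption for illustration.

```python
# Sketch of (2)-(3) with m chosen by fraction of variance explained.
def estimate_gamma(X, Y, t, R=0.95):
    lam, phi, U, mu = fpca(X, t)
    lam = np.clip(lam, 0.0, None)               # guard tiny negative eigenvalues
    fve = np.cumsum(lam) / lam.sum()            # fraction of variance explained
    m = int(np.searchsorted(fve, R)) + 1        # smallest k with FVE >= R
    Um = U[:, :m]
    Yc = Y - Y.mean()                           # centre the responses
    g = np.linalg.solve(Um.T @ Um, Um.T @ Yc)   # least squares, formula (2)
    return phi[:, :m] @ g                       # gamma_hat on the grid, formula (3)
```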

3 Estimation Methods

The above analysis is based on the assumption that the functional predictor is observed on the entire domain. We now consider the scenario where the predictor \(X_i\), \(i = 1,\ldots,n\), may be available only on part of \(\mathcal{T}\). We first introduce some notation. Let \(X_1,\ldots,X_n\) be independent and identically distributed samples of the random function \(X\). We denote the observed and missing parts of \(X_i\) by \(O_i\) and \(M_i\), with \(O_i\cup M_i=\mathcal{T}\). Let \(O_i=[L_i,R_i]\subseteq \mathcal{T}\), and assume that it is a random subinterval independent of \(X_i\) with \(R_i - L_i > 0\) almost surely. The observed data for the \(i\)th functional predictor are then \(X_i(t)\), \(t \in O_i\), \(i = 1,\ldots,n\), denoted by \(X_{iO_i}\). In this section, our objective is to develop estimation methods for model (1) with partially observed functional observations, without and with measurement error, respectively. In both scenarios, we need estimators of the functional principal component scores \(\{U_{ij}\}\) and the eigenfunctions \(\{\phi_j\}\), as indicated in formulas (2) and (3). Depending on whether measurement error is present in the partially observed functional curves, two methods are developed: one applies linear functionals of the observed part of each trajectory, while the other is based on principal component analysis through conditional expectation.

3.1 Partially Observed Functional Data Without Measurement Error

In the scenario where functional curves are partially observed on the domain without measurement error, to estimate \(\gamma\) in model (1) we need estimators of \(U_{ij}\) and \(\phi_j\) adapted to this case. An estimator of \(U_{ij}\) is obtained from a linear functional of the observed part \(X_{iO_i}\), and an estimator of \(\phi_j\) is obtained from estimators of the mean and covariance functions of \(X\). The steps are as follows.

Step 1: Estimate the mean \(\mu\) and the covariance function \(c_X\) by the sample mean and sample covariance.

Step 2: Estimate the eigenvalues \(\{\lambda_j\}\) and eigenfunctions \(\{\phi_j\}\) by solving \(\int_{\mathcal{T}} \hat{c}_X(s,t)\hat{\phi}_j(s)\,ds = \hat{\lambda}_j\hat{\phi}_j(t)\).

Step 3: Estimate the principal component scores \(U_{ij}=U_{ijO_i}+U_{ijM_i}\) with \(\hat U_{ijO_i} = \langle X_{iO_i} - \hat{\mu}_{O_i},\hat{\phi}_{jO_i}\rangle\), and estimate \(U_{ijM_i}\) by modeling it as a linear functional of \(X_{iO_i}\): \(\hat U_{ijM_i} = \langle \hat{\xi}_{ijM_i},X_{iO_i} - \hat{\mu}_{O_i}\rangle\).

Step 4: Estimate \(\gamma\) based on formulas (2) and (3) for \(X_{iO_i}\) observed without measurement error.

We first address the problem of estimating \(\mu\) and \(c_X\), with estimators denoted by \(\hat{\mu}^{\text{NME}}\) and \(\hat{c}^{\text{NME}}_X\), respectively, and then establish estimators of \(U_{ij}\) and of the eigenfunctions \(\phi_j\), denoted by \(\hat{U}^{\text{NME}}_{ij}\) and \(\hat{\phi}^{\text{NME}}_j\). For simplicity of presentation, we suppress the superscript “NME” in this subsection unless otherwise stated.

Let \(O_i(t) = \text{I}_{O_i}(t)\), with the indicator function \(\text{I}_{O_i}(t)\) equal to 1 if \(t \in O_i\) and 0 otherwise, and let \(W_i(s,t) = O_i(s)O_i(t)\). The estimators of the mean function \(\mu\) and the covariance function \(c_X\) of \(X\), obtained from the observed points of the \(X_i\), are given by

$$\displaystyle \begin{aligned} \hat{\mu}(t) = \frac{1}{\sum_{i=1}^n O_i(t)} \sum_{i=1}^n O_i(t) X_i(t), {} \end{aligned} $$
(4)
$$\displaystyle \begin{aligned} \hat{c}_X(s,t) = \frac{1}{\sum_{i=1}^{n} W_i(s,t)}\sum_{i=1}^{n} W_i(s,t) (X_i(s) - \hat{\mu}(s))(X_i(t) - \hat{\mu}(t)). {} \end{aligned} $$
(5)

Therefore, we obtain the estimators \(\{\hat{\lambda}_j\}\) and \(\{\hat{\phi}_j\}\) of \(\{\lambda_j\}\) and \(\{\phi_j\}\) from the covariance operator \(\hat{C}_X\) associated with \(\hat{c}_X\).
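A sketch of the pointwise estimators (4) and (5) follows, assuming incomplete curves are stored as rows of an array with NaN outside \(O_i\); under (A1)–(A2) below, every point and pair of points is observed by at least one curve, which keeps the denominators positive.

```python
# Sketch of formulas (4)-(5) for partially observed curves (NaN off O_i).
import numpy as np

def mean_cov_partial(X):
    O = (~np.isnan(X)).astype(float)   # indicator I_{O_i}(t) per curve
    Xz = np.nan_to_num(X)              # zero outside O_i
    mu = (O * Xz).sum(axis=0) / O.sum(axis=0)   # formula (4)
    Xc = O * (Xz - mu)                 # centred curves, zero off O_i
    W = O.T @ O                        # sum_i W_i(s, t): pairwise counts
    c_hat = (Xc.T @ Xc) / W            # formula (5)
    return mu, c_hat
```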

We cannot obtain estimators \(\hat{U}_{ij}\) of the FPC scores \(\{U_{ij}\}\) of \(X_i\) directly from their definition if \(O_i\ne \mathcal{T}\). To bridge the gap, \(U_{ij}\) is decomposed into two parts:

$$\displaystyle \begin{aligned} U_{ij}=\langle X_{iO_i} - \mu_{O_i},\phi_{jO_i}\rangle+\langle X_{iM_i} - \mu_{M_i},\phi_{jM_i}\rangle=U_{ijO_i}+U_{ijM_i}, {} \end{aligned} $$
(6)

where \(\mu_{O_i}\) and \(\phi_{jO_i}\) denote the restrictions of \(\mu\) and the eigenfunction \(\phi_j\) to \(O_i\), respectively, and \(\mu_{M_i}\), \(\phi_{jM_i}\) are defined similarly. The estimator \(\hat{U}_{ijO_i}\) of \(U_{ijO_i}\) can be computed directly from the observed part \(X_{iO_i}\) and the estimator \(\hat\phi_j\), as \(\hat{U}_{ijO_i} = \langle X_{iO_i}-\hat{\mu}_{O_i}, \hat\phi_{jO_i}\rangle\). For the term \(U_{ijM_i}\), we estimate it through a linear functional \(\langle \xi_{ijM_i},X_{iO_i} - {\mu}_{O_i}\rangle\) of the observed part \(X_{iO_i}\), as also considered in Kraus (2015), that is,

$$\displaystyle \begin{aligned} \hat{\xi}_{ijM_i} = \underset{\xi_{ijM_i}\in L^2}{\text{argmin}}\,\, n^{-1}\sum_{i=1}^{n} (\hat{U}_{ijM_i} - \langle \xi_{ijM_i},X_{iO_i}-\hat{\mu}_{O_i}\rangle)^2 \end{aligned}$$

with \(\hat{U}_{ijM_i} = \langle X_{iM_i}-\hat{\mu}_{M_i}, \hat{\phi}_{jM_i}\rangle\). The estimator \(\hat{\xi}_{ijM_i}\) has the explicit form \(\hat{\xi}_{ijM_i}={\hat C_{O_iO_i}}^{-1}\hat C_{O_iM_i}\hat{\phi}_{jM_i}\), where \(\hat C_{O_iO_i}\), \(\hat C_{O_iM_i}\) are the empirical covariance operators for \(C_{O_iO_i}\), \(C_{O_iM_i}\), with kernels given by the estimated covariance function \(\hat c_X\) of \(X_i\) restricted to \(O_i \times O_i\) and \(O_i \times M_i\), respectively. To obtain a stable solution, we adopt ridge regularization, given by

$$\displaystyle \begin{aligned} &\hat{\xi}_{ijM_i}^{(\rho)}={(\hat C^{(\rho)}_{O_iO_i})}^{-1}\hat C_{O_iM_i}\hat{\phi}_{jM_i},\\ &\hat U^{(\rho)}_{ijM_i}=\langle\hat{\xi}_{ijM_i}^{(\rho)},X_{iO_i} - \hat{\mu}_{O_i}\rangle, \quad i=1,\ldots,n,\; j=1,\ldots,m, {} \end{aligned} $$
(7)

where \(\hat{C}_{O_iO_i}^{(\rho)}=\hat{C}_{O_iO_i}+\rho \mathcal{F}_{O_i}\), \(\mathcal{F}_{O_i}\) is the identity operator on \(L^2(O_i)\), and \(\rho\) is a ridge parameter; see Kraus (2015) for further details. Let \(\hat{U}^{\text{NME}}_{ij} = \hat{U}_{ijO_i} + \hat U^{(\rho)}_{ijM_i}\). The estimator \(\hat{\gamma}^{\text{NME}}\) of \(\gamma\), which uses all of the information in the dataset, is then obtained by replacing \(\hat{U}_{ij}\) in (2) with \(\hat{U}^{\text{NME}}_{ij}\):

$$\displaystyle \begin{aligned} \hat {\gamma}^{\mathrm{NME}}(t)=\sum_{j=1}^m \hat{\gamma}_j\hat{\phi}_{j}(t). {} \end{aligned} $$
(8)
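The completion step (6)–(7) for one incomplete curve can be sketched on the grid as below; the ridge value `rho` and the boolean-mask interface are assumptions, and the operators are discretized with the grid spacing `dt` as in the earlier sketches.

```python
# Sketch of the ridge-regularised score completion (6)-(7) for one curve.
def complete_score(x, obs, mu, c_hat, phi_j, dt, rho=1e-2):
    """x: (p,) curve (NaN off O_i); obs: boolean mask of O_i; phi_j: (p,)."""
    O, M = obs, ~obs
    xc_O = x[O] - mu[O]
    U_O = xc_O @ phi_j[O] * dt                   # observed-part score U_ijO
    # xi = (C_OO + rho I)^{-1} C_OM phi_jM, discretised with weight dt
    C_OO = c_hat[np.ix_(O, O)] * dt
    rhs = c_hat[np.ix_(O, M)] @ phi_j[M] * dt
    xi = np.linalg.solve(C_OO + rho * np.eye(O.sum()), rhs)
    U_M = xi @ xc_O * dt                         # missing-part score, formula (7)
    return U_O + U_M                             # completed score U_ij^NME
```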

To facilitate our theoretical analysis, we first impose some assumptions on the observation points of the partially observed functional curves, which ensure that the observation points asymptotically provide enough information individually and in pairwise overlap.

(A1) There exists \(\delta_1>0\) such that \(\sup_{t\in [0,1]}\text{P}\{n^{-1}\sum_{i=1}^n \text{I}_{O_i}(t)\leq \delta_1\}=O(n^{-2})\).

(A2) There exists \(\delta_2>0\) such that \(\sup_{(s,t)\in [0,1]^2}\text{P}\{n^{-1}\sum_{i=1}^n W_i(s,t)\leq \delta_2\}=O(n^{-2})\).

Moreover, we also introduce some regularity conditions necessary to derive theoretical properties for the estimate \(\hat {\gamma }^{\text{NME}}\).

(A3) \(\text{E}\|X - \mu\|^4 < \infty\).

(A4) \(nm^{-1}\rightarrow\infty\), \(n/(\sum_{j=1}^{m}\delta^{-2}_j)\rightarrow\infty\) with \(\delta_j = \min\{\lambda_j-\lambda_{j+1}, \lambda_{j-1}-\lambda_j\}\), \(j \geq 1\), and \(n\lambda^2_m\rightarrow\infty\) as \(m\rightarrow\infty\).

(A5) The ridge parameter \(\rho\) satisfies \(\rho \rightarrow 0\), \(n\rho^3 \rightarrow 0\), and \(nm^{-1}\rho^2 \rightarrow\infty\).

(A6) \(\sum_{k=1}^{\infty}[\text{E}[YU_k]]^2/\lambda_k^2<\infty\).

(A7) \(\sum_{j=1}^\infty\sum_{k=1}^\infty r_{M_iO_ijk}^2/\lambda_{O_iO_ik}^2<\infty\), with \(r_{M_iO_ijk}=\text{cov}(\langle X_{M_i}-\mu_{M_i},\phi_{M_iM_ij}\rangle, \langle X_{O_i}-\mu_{O_i}, \phi_{O_iO_ik}\rangle)\).

Assumption (A3) is a common condition in the analysis of functional models by FPCA, guaranteeing that the random functions have finite fourth moment (see Cardot et al. 1999). Note that assumption (A4) holds if the eigenvalues \(\{\lambda_j\}\) decay exponentially or geometrically; conditions of the same kind are introduced in Cardot et al. (1999). Assumption (A5) controls the size of the ridge effect. Assumption (A6) is required so that the right-hand side of \(\gamma(s)=\sum_{k=1}^{\infty}(\text{E}[YU_k]/\lambda_k)\phi_k(s)\) converges in the \(L^2\) sense; it is similar to condition (A1) in Yao et al. (2005b). Assumption (A7) makes the solution \(\hat{\xi}_{ijM_i}\) well defined and is commonplace in the theory of inverse problems as the Picard condition (see Hansen 1990).

Let \(\theta_n=\sum_{k=m}^{\infty}[\text{E}[YU_k]]^2/\lambda_k^2\). Assumption (A6) then implies \(\theta_n \rightarrow 0\). Denote \(\upsilon = \sum_{j=1}^{m}V_{ij}\) with \(V_{ij}= \langle \phi_{jM_i}, (C_{M_iM_i}- C_{M_iO_i}C^{-1}_{O_iO_i}C_{O_iM_i})\phi_{jM_i}\rangle\). Under the above assumptions, Theorem 1 gives the convergence rate of the estimator \(\hat{\gamma}^{\text{NME}}\) in the \(L^2\) sense.

Theorem 1

Suppose that (A1)–(A7) are satisfied. Then

$$\displaystyle \begin{aligned} \|\hat{\gamma}^{\text{NME}}-\gamma\|^2 = O_p(n^{-1}m\rho^{-2} + \iota_n + \theta_n + \upsilon), \end{aligned} $$

with \(\iota _n = n^{-1}\sum _{j=1}^{m} \delta ^{-2}_j\).

Theorem 1 indicates that the approximation error of \(\hat{\gamma}^{\text{NME}}\) for \(\gamma\) is controlled by four terms. The first depends on the sample size \(n\), the truncation parameter \(m\), and the ridge parameter \(\rho\); it is of higher order than the corresponding term in Hall and Horowitz (2007), mainly because the functional curves are observed on only part of the domain. The second term reflects the spacings between adjacent eigenvalues, whose effect on the convergence rate of \(\gamma\) is also emphasized in Hall and Horowitz (2007). The third term is related to the \(L^2\) convergence of \(\gamma\), which also appears in Yao et al. (2005b) in deriving the approximation error rate for the functional coefficient. The fourth term is introduced by approximating \(U_{ijM_i}\) with its best linear predictor based on the observed part.

Note that in practice, the ridge parameter ρ included in the regularized estimation of the jth score of the ith functional observation is chosen by generalized cross-validation based on the set of samples observed on the entire domain (see Kraus 2015).

3.2 Partially Observed Functional Data with Measurement Error

In this subsection, we construct an estimator of the slope function \(\gamma\) from partially observed trajectories with measurement error. We suppose the functional observations are

$$\displaystyle \begin{aligned} Z_{il} = X_i(t_{il})+\varepsilon_{il},\quad t_{il}\in O_i,\; i=1,\ldots,n,\; l=1,\ldots,N_i, {} \end{aligned} $$
(9)

where \(\varepsilon_{il}\) is independent of all the other variables \(X_j\), \(j \ne i\), with \(\text{E}(\varepsilon_{il}) = 0\) and \(\text{var}(\varepsilon_{il})=\sigma^2_X\).

To estimate \(\gamma\) in (1) when trajectories may be observed on parts of the domain with measurement error (WME), we need estimators of the FPC scores and the eigenstructure pertaining to this case. The eigenstructure is estimated after using local linear smoothers to estimate the mean and covariance functions of \(X\); the FPC scores are estimated by principal component analysis via conditional expectation. The steps are as follows.

Step 1: Estimate the mean and covariance functions by local linear smoothers.

Step 2: Estimate the eigenvalues \(\{\lambda_j\}\) and eigenfunctions \(\{\phi_j\}\) by solving \(\int_{\mathcal{T}} \hat{c}^{\text{WME}}_X(s,t)\hat{\phi}^{\text{WME}}_j(s)\,ds = \hat{\lambda}^{\text{WME}}_j\hat{\phi}^{\text{WME}}_j(t)\).

Step 3: Estimate the FPC scores \(\{U_{ij}\}\) by principal component analysis through conditional expectation (PACE): \(\tilde{U}_{ij} = \text{E}[U_{ij}|\mathbf{Z}_i]\).

Step 4: Based on the estimators \(\hat{U}^{\text{WME}}_{ij}\) and \(\hat{\phi}^{\text{WME}}_j\), estimate \(\hat{\gamma}^{\text{WME}}\) for \(X_{iO_i}\) observed with measurement error.

We first compute estimators of the mean and covariance functions of \(X\) under scenario (9), denoted by \(\hat{\mu}^{\text{WME}}\) and \(\hat{c}^{\text{WME}}_X\); these are required to derive estimators of the FPC scores \(U_{ij} = \int_{\mathcal{T}} (X_i(t)-\mu(t))\phi_j(t)\,dt\). For simplicity of presentation, we suppress the superscript “WME” in this subsection unless otherwise stated.

Let \(K(\cdot)\) be a nonnegative univariate kernel function, assumed to be a symmetric probability density function (pdf) with compact support \(\text{supp}(K) = [-1, 1]\), and let \(h_{\mu}\), \(h_c\) be the bandwidths for estimating \(\mu\), \(c_X\). Assume that the second derivatives of \(\mu\) and \(c_X\) exist on \(\mathcal{T}\) and \(\mathcal{T}^2\), respectively. We use a local linear smoother for the mean function \(\mu\) (Yao et al. 2005a,b; Kneip and Liebl 2020), defined as \(\hat{\mu}(t) = \hat{\beta}_0\), where

$$\displaystyle \begin{aligned} (\hat{\beta}_0,\hat{\beta}_1) = \underset{\beta_0,\beta_1}{\text{argmin}}\sum_{i=1}^{n}\sum_{l=1}^{N_i}K\left(\frac{t_{il}-t}{h_{\mu}}\right)[Z_{il}-\beta_0-\beta_1(t-t_{il})]^2. {} \end{aligned} $$
(10)

Let \(\hat G_{ilk} = (Z_{il}-\hat{\mu}(t_{il}))(Z_{ik}-\hat{\mu}(t_{ik}))\) be the raw covariance points. The local linear smoother for the covariance function \(c_X\) is defined as \(\hat{c}_X(s,t) = \hat{\tilde{\beta}}_{0}\), where

$$\displaystyle \begin{aligned} (\hat{\tilde{\beta}}_{0},\hat{\tilde{\beta}}_{1},\hat{\tilde{\beta}}_{2}) &= \underset{\tilde{\beta}_{0},\tilde{\beta}_{1},\tilde{\beta}_{2}}{\text{arg min}}\,\sum_{i=1}^{n}\sum_{1\leq l,k\leq N_i} K\left(\frac{t_{il}-t}{h_c}\right)K\left(\frac{t_{ik}-s}{h_c}\right) \\ &\quad \times [\hat{G}_{ilk}-\tilde{\beta}_{0}-\tilde{\beta}_{1}(t_{il}-t)-\tilde{\beta}_{2}(t_{ik}-s)]^2. {} \end{aligned} $$
(11)
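A sketch of the pooled local linear mean smoother (10) is given below, using the closed-form weighted least squares solution; the Epanechnikov kernel and the bandwidth argument `h` are illustrative assumptions. The covariance smoother (11) applies the same idea in two dimensions with a product kernel.

```python
# Sketch of the local linear mean smoother (10) on pooled data (t_il, Z_il).
import numpy as np

def epanechnikov(u):
    return 0.75 * np.clip(1.0 - u**2, 0.0, None)   # symmetric pdf on [-1, 1]

def local_linear_mean(t_obs, z_obs, t_grid, h):
    mu = np.empty(len(t_grid))
    for k, t0 in enumerate(t_grid):
        d = t_obs - t0
        w = epanechnikov(d / h)                    # kernel weights K((t_il - t)/h)
        S0, S1, S2 = w.sum(), (w * d).sum(), (w * d**2).sum()
        T0, T1 = (w * z_obs).sum(), (w * d * z_obs).sum()
        mu[k] = (S2 * T0 - S1 * T1) / (S0 * S2 - S1**2)   # intercept beta_0 at t0
    return mu
```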

Similar to the technique introduced in Yao et al. (2005a), the diagonal points \(\hat G_{ill}\), \(l=1,\ldots,N_i\), are not included in (11). Let \(\mathcal{T}_1=[\inf\{L_i \in \mathcal{T}, i=1,\ldots,n\}+|\mathcal{T}|/4,\; \sup\{R_i\in \mathcal{T}, i=1,\ldots,n\}-|\mathcal{T}|/4]\), where \(|\mathcal{T}|\) is the length of \(\mathcal{T}\). The estimator of \(\sigma^2_X\) is \(\hat{\sigma}^2_X\) as defined below if it is positive, and is set to zero otherwise, with

$$\displaystyle \begin{aligned}\hat{\sigma}^2_X=2\int_{\mathcal{T}_1}(\hat{V}_X(t)-\tilde{G}(t))\text{dt}/|\mathcal{T}|,\end{aligned}$$

where \(\hat{V}_X(t)\) is the local linear estimator based on the diagonal points \(\{\hat G_{ill}\}\), and \(\tilde{G}(t)\) is the estimate \(\hat{c}_X(s,t)\) restricted to the diagonal \(s = t\) (Staniswalis and Lee 1998; Yao et al. 2005a). The estimators of \(\{\lambda_j, \phi_j\}_{j\geq 1}\) are the corresponding solutions of the eigen-equations

$$\displaystyle \begin{aligned} \int_{\mathcal{T}} \hat{c}_X(s,t)\hat{\phi}_j(s) \text{ds} = \hat{\lambda}_j\hat{\phi}_j(t). \end{aligned} $$

Based on the K–L expansion of \(X_i\), model (9) can be rewritten as

$$\displaystyle \begin{aligned} Z_{il} = \mu(t_{il}) + \sum_{j=1}^{\infty}U_{ij}\phi_{j}(t_{il}) + \varepsilon_{il}, \quad t_{il}\in O_i,\; i=1,\ldots,n,\; l=1,\ldots,N_i. \end{aligned} $$

Let \(\mathbf{X}_i=(X_i(t_{i1}),\ldots,X_i(t_{iN_i}))^T\), \(\mathbf{Z}_i=(Z_{i1},\ldots,Z_{iN_i})^T\), \(\mathbf{\mu}_i = (\mu(t_{i1}),\ldots,\mu(t_{iN_i}))^T\), and \(\mathbf{\phi}_{ij}=(\phi_j(t_{i1}),\ldots,\phi_j(t_{iN_i}))^T\). Assume that the \(U_{ij}\) and \(\varepsilon_{il}\) are jointly Gaussian. Following Yao et al. (2005a), the best prediction of \(U_{ij}\) for the \(i\)th subject given the observations \((Z_{il}, t_{il})\), \(l = 1,\ldots,N_i\), is

$$\displaystyle \begin{aligned} \tilde{U}_{ij} = \lambda_j\mathbf\phi^T_{ij}\mathbf\varSigma^{-1}_{{\mathbf{Z}}_i}({\mathbf{Z}}_i-\mathbf{\mu}_i), \end{aligned} $$

where \(\mathbf\varSigma_{\mathbf{Z}_i}=\text{cov}(\mathbf{Z}_i, \mathbf{Z}_i) = \text{cov}(\mathbf{X}_i, \mathbf{X}_i) + \sigma^2_X \mathbf{I}_{N_i}\) with identity matrix \(\mathbf{I}_{N_i}\); that is, the \((u,v)\)th element of \(\mathbf\varSigma_{\mathbf{Z}_i}\) is \((\mathbf\varSigma_{\mathbf{Z}_i})_{u,v} = c_X(t_{iu}, t_{iv}) + \sigma^2_X I_{uv}\), where \(I_{uv} = 1\) if \(u = v\) and 0 otherwise. The estimator of \(U_{ij}\) is then obtained by substituting \(\hat{\mu}, \hat{\lambda}_j, \hat{\phi}_j\) for \(\mu, \lambda_j, \phi_j\):

$$\displaystyle \begin{aligned} \hat{U}^{\mathrm{WME}}_{ij} = \hat{\lambda}_j\hat{\mathbf\phi}^T_{ij}\hat{\mathbf\varSigma}^{-1}_{{\mathbf{Z}}_i}({\mathbf{Z}}_i-\hat{\mathbf{\mu}}_i), {} \end{aligned} $$
(12)

where the \((u,v)\)th entry of \(\hat{\mathbf\varSigma}_{\mathbf{Z}_i}\) is \((\hat{\mathbf\varSigma}_{\mathbf{Z}_i})_{u,v} = \hat{c}_X(t_{iu}, t_{iv}) + \hat{\sigma}^2_X I_{uv}\). Replacing \(\hat{U}_{ij}\) in (2) with \(\hat{U}^{\text{WME}}_{ij}\), we obtain the estimator \(\hat{\gamma}^{\text{WME}}\) of \(\gamma\) from (3):

$$\displaystyle \begin{aligned} \hat{\gamma}^{\mathrm{WME}}(t) = \sum_{j=1}^{m} \hat{\gamma}_j \hat{\phi}_j(t), \end{aligned} $$

where \(\hat{\gamma}_j\) is the \(j\)th entry of \(\hat{\mathbf{\gamma}}\) computed from (2) with \(\hat{U}^{\text{WME}}_{ij}\) in place of \(\hat{U}_{ij}\).
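A sketch of the PACE step (12) for a single subject is shown below; it assumes the smoothed estimates \(\hat{\mu}\), \(\hat{c}_X\) are available on a grid and that `t_idx` holds the grid indices of that subject's observation times.

```python
# Sketch of the conditional-expectation scores (12) for subject i.
def pace_scores(z, t_idx, mu_hat, c_hat, lam, phi, sigma2, m):
    """z: (N_i,) noisy observations Z_i at grid indices t_idx."""
    Sigma = c_hat[np.ix_(t_idx, t_idx)] + sigma2 * np.eye(len(t_idx))
    sol = np.linalg.solve(Sigma, z - mu_hat[t_idx])   # Sigma^{-1}(Z_i - mu_i)
    return np.array([lam[j] * phi[t_idx, j] @ sol for j in range(m)])
```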

Next, we give some theoretical results for \(\hat{\gamma}^{\text{WME}}\). We assume the following regularity conditions, which are similar to assumptions in Kneip and Liebl (2020) and Yao et al. (2005b).

(B1) Given \(O_i\), the observation points \(\{t_{il}, l = 1,\ldots,N_i\}\) for the \(i\)th subject are i.i.d. random variables with pdf \(f_{t|O_i}(u)>0\) for all \(u\in O_i\subseteq \mathcal{T}\) and zero elsewhere. The marginal pdf \(f_t\) of the observation times satisfies \(f_t(u) > 0\) for all \(u\in \mathcal{T}\).

(B2) Let \(N = \min\{N_i, i = 1,\ldots,n\}\) and assume \(N \asymp n^r\) with \(0 < r < \infty\), where \(a_n \asymp b_n\) means that \(a_n/b_n \rightarrow L\) as \(n\rightarrow\infty\) for some constant \(0 < L < \infty\).

(B3) \(h_{\mu} \rightarrow 0\), \(h_c \rightarrow 0\), \(nNh_{\mu}\rightarrow\infty\), and \(nMh^2_c\rightarrow\infty\) as \(n\rightarrow\infty\), where \(M = N^2 - N\).

(B4) \(K\) is a second-order kernel with compact support \([-1, 1]\).

(B5) Let \(G_{ilk} = (Z_{il} - \mu(t_{il}))(Z_{ik} - \mu(t_{ik}))\). Define \(f_{Zt}\), \(f_{tt}\), \(f_{Gtt}\) as the joint pdfs of \((Z_{il}, t_{il})\) on \(\mathbb{R}\times\mathcal{T}\), of \((t_{il_1},t_{il_2})\) on \(\mathcal{T}^2\), and of \((G_{ilk}, t_{il}, t_{ik})\) on \(\mathbb{R}\times\mathcal{T}^2\), respectively. All second derivatives of \(f_{Zt}\), \(f_{tt}\), \(f_{Gtt}\) are uniformly continuous and bounded, and \(f_t\) is uniformly continuous and bounded on \(\mathcal{T}\).

(B6) Let \(\mathbf{\varLambda} = \text{diag}\{\lambda_1,\ldots,\lambda_m\}\), \(\varXi = (\lambda_1\mathbf{\phi}_{i1},\ldots,\lambda_m\mathbf{\phi}_{im})^T\), \(\varUpsilon = \mathbf{\varLambda}-\varXi\mathbf{\varSigma}^{-1}_{\mathbf{Z}_i}\varXi^T\), and \(\varsigma_n \equiv \text{trace}(\varUpsilon)\). Denote \(r_{\mu}=h^2_{\mu} + 1/\sqrt{nNh_{\mu}} + 1/\sqrt{n}\) and \(r_{c} = h^2_{c} + 1/\sqrt{nMh^2_{c}} + 1/\sqrt{n}\). Assume \(\upsilon_n \equiv mr_{\mu} \rightarrow 0\) and \(\tau_n \equiv r_c\sum_{j=1}^{m}\delta^{-1}_j \rightarrow 0\).

Theorem 2

Under the regularity conditions (A3), (A6), (B1)–(B6), we have that

$$\displaystyle \begin{aligned} \|\hat{\gamma}^{\text{WME}}-\gamma\|^2 = O_p(\upsilon_n+\tau_n+\varsigma_n+\theta_n). \end{aligned} $$

Theorem 2 gives the rate of convergence of the estimator \(\hat{\gamma}^{\text{WME}}\) in the \(L^2\) sense. The rate depends on the sample size and the bandwidths, which is typical when curves or surfaces are estimated by local linear smoothers in functional data analysis (see Li and Hsing 2010). Related results can also be found in Yao et al. (2005b). The terms \(\upsilon_n\), \(\tau_n\) reflect the convergence rates of the local linear estimators of the mean and covariance functions. The term \(\varsigma_n\) is introduced by approximating \(U_{ij}\) with \(\tilde{U}_{ij}\).

4 Simulation Studies

In this section, we use simulated datasets to evaluate the finite-sample properties of the methods proposed in Sect. 3. The studies are based on \(n \in \{50, 100, 200\}\) i.i.d. samples \(\{X_i, Y_i\}_{i=1}^n\) and an equally spaced grid \(\{t_1,\ldots,t_{30}\}\) on \([0, 1]\) with \(t_1 = 0\), \(t_{30} = 1\). For the \(i\)th functional observation \(X_i(t)\), the missing interval \(M_i\) takes the form \([R_i - E_i, R_i + E_i]\), with \(R_i=a_1T_{i1}^{1/2}\) and \(E_i = a_2 T_{i2}\), where \(T_{i1}\), \(T_{i2}\) are independent random variables uniformly distributed on \([0, 1]\) and \(a_1, a_2\in\mathbb{R}\). We consider \((a_1, a_2) = (1.5, 0.2)\) and \((a_1, a_2) = (1.5, 0.4)\), with the expected missing lengths over the domain being 0.4 and 0.8, respectively. We set the intercept \(\alpha = 0\). To evaluate the performance of an estimator \(\hat{\gamma}\) of \(\gamma\), the mean integrated squared error (MISE) is used as the evaluation criterion, given by

$$\displaystyle \begin{aligned}\text{MISE} = \frac{1}{N}\sum_{l=1}^{N}\int_{0}^{1}(\hat{\gamma}_l(t)-\gamma(t))^2\text{dt},\end{aligned}$$

where N is the number of Monte Carlo replications.
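The missing-interval mechanism and the MISE criterion can be sketched as follows; the seed and array layout are assumptions for illustration.

```python
# Sketch of the simulated missing mechanism and the MISE criterion.
import numpy as np

rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 30)          # equally spaced grid t_1, ..., t_30

def missing_mask(a1, a2):
    """Boolean mask of M_i = [R_i - E_i, R_i + E_i] on the grid t."""
    R = a1 * np.sqrt(rng.uniform())    # R_i = a_1 * T_i1^{1/2}
    E = a2 * rng.uniform()             # E_i = a_2 * T_i2
    return (t >= R - E) & (t <= R + E)

def mise(gamma_hats, gamma_true):
    """gamma_hats: (N, p) estimates over replications; trapezoidal quadrature."""
    return np.mean([np.trapz((g - gamma_true)**2, t) for g in gamma_hats])
```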

For functional predictors \(\{X_i\}\) without measurement error, the trajectories are generated as follows. The simulated random function \(X_i\) has zero mean, and its covariance function is generated from two eigenfunctions, \(\phi_1(t)=\sqrt{2}\sin(\pi t/2)\) and \(\phi_2(t)=\sqrt{2}\sin(3\pi t/2)\), with eigenvalues \(\lambda_1 = (\pi/2)^{-2}\), \(\lambda_2 = (3\pi/2)^{-2}\), and \(\lambda_k = 0\) for \(k \geq 3\). The error \(\epsilon_i\) in (1) is standard normal. For the slope function in (1), we take \(\gamma(t) = \phi_1(t) + 3\phi_2(t)\). We compare the finite-sample performance of our proposed method with the method that estimates \(\gamma\) through formulas (2) and (3) after deleting the incomplete functional observations from the dataset, denoted by “SUB.” Moreover, the estimator of \(\gamma\) based on the original complete dataset is also considered in this scenario and denoted by “ORI.” We conduct 1000 simulation runs in each setup. Table 1 reports the results.

Table 1 MISEs of the estimators of γ under different methods with 1000 Monte Carlo replications for functional predictors without measurement error

As shown in Table 1, when incomplete functional predictors are observed without measurement error, the estimation method of Sect. 3.1 performs better than the “SUB” method. This is because useful information in the dataset is lost when incomplete curves are deleted, whereas the “NME” method exploits all of the available information. In particular, in each setting for \((a_1, a_2)\), the MISEs of the “NME” method are smaller than those of the “SUB” method. The simulation results also show that the MISEs of all three methods decrease as the sample size \(n\) increases, and that the MISEs increase with longer missing length on \([0, 1]\) at fixed \(n\), indicating that a larger error is introduced when the “NME” method imputes the missing scores of incomplete functional predictors from little available information. Furthermore, the differences in MISE among the three methods shrink as \(n\) increases, while the “NME” method continues to outperform the “SUB” method; this suggests that the “NME” method is promising.

Functional predictors \(X_i\) with measurement error are generated according to \(Z_i(t_{il}) = X_i(t_{il}) + \varepsilon_{il}\), \(l = 1,\ldots,30\), as follows. We take \(X_i(t)=\sum_{j=1}^{50}U_{ij}\phi_j(t)\) with \(U_{ij} = (-1)^{j+1} j^{-1.1/2} W_{ij}\), where \(W_{ij}\) is uniformly distributed on \([-\sqrt{3}, \sqrt{3}]\), \(\phi_1(t) = 1\), and \(\phi_j(t)=\sqrt{2}\cos(j\pi t)\) for \(j \geq 2\). The additional random errors \(\varepsilon_{il}\), \(l = 1,\ldots,30\), and the error \(\epsilon_i\) in (1) are normal with mean zero and variance 0.25. For the slope function, we take \(\gamma=\sum_{j=1}^{50}\gamma_j\phi_j(t)\) with \(\gamma_1 = 0.3\) and \(\gamma_j = 4(-1)^{j+1} j^{-2}\) for \(j \geq 2\) (Hall and Horowitz 2007). We conduct 100 simulation runs in each setup. To demonstrate the superior performance of the proposed method of Sect. 3.2, we compare it with two other methods, both of which first estimate \(\mu(t)\) and \(c_X(s,t)\) by solving the optimization problems (10) and (11): the first, denoted “IN,” estimates \(\gamma\) by applying an integration method to obtain the FPC scores \(\hat{U}_{ij}\) in (2) instead of using formula (12); the second applies the method of Sect. 3.1 to the dataset \(\{\mathbf{Z}_i, Y_i\}\), ignoring the measurement error. The results are summarized in Table 2.

Table 2 MISEs of the estimators of γ under different methods with 100 Monte Carlo replications for functional predictors with measurement error

We find from Table 2 that the “WME” method performs best among the three methods in every setup, and the gains are dramatic when switching from the “NME” method, which ignores the observation errors in the functional predictors, to the “WME” method. Specifically, for \(n = 100\), compared with the “NME” method, the MISEs are reduced by 74% and 68% using the “WME” method for \((a_1, a_2) = (1.5, 0.2)\) and \((a_1, a_2) = (1.5, 0.4)\), respectively. The “IN” method provides a reasonable estimator of \(\gamma\) and performs better than the “NME” method; nevertheless, the “WME” method still outperforms it, with improvements of 25% and 32% for \((a_1, a_2) = (1.5, 0.2)\) and \((a_1, a_2) = (1.5, 0.4)\). In addition, the MISEs decrease as the sample size \(n\) increases, which is consistent with the derived theoretical results.

To sum up, when incomplete functional predictors are observed without measurement error, the “NME” method, which takes advantage of all the information in the dataset, produces a better estimator than the “SUB” method; when incomplete functional predictors are observed with measurement error, the “WME” method is preferred, giving the smallest MISE relative to the “IN” and “NME” methods. In both cases the MISEs of the estimators of \(\gamma\) decrease as the sample size \(n\) increases, consistent with the derived theoretical properties.

5 Real Data Analysis

The real diffusion tensor imaging (DTI) dataset considered here, from the NIH Alzheimer’s Disease Neuroimaging Initiative (ADNI) study, contains 212 subjects and was obtained from http://adni.loni.usc.edu/. The primary goal of the ADNI study is to test whether serial magnetic resonance imaging (MRI), positron emission tomography (PET), biological markers, and neuropsychological assessment can be combined to measure the progression of mild cognitive impairment (MCI) and early Alzheimer’s disease (AD). DTI, which uses mathematical methods to represent the anisotropic diffusion of water molecules in brain tissue, can be used to study MCI and AD. Concrete measures of anisotropy include fractional anisotropy (FA), relative anisotropy (RA), and volume ratio (VR); FA is commonly adopted for its advantage in the contrast ratio of grey and white matter. More details about the preprocessing and methods of this study can be found in Zhu et al. (2012) and Yu et al. (2016).

Our main interest is characterizing the dynamic relationship between FA and the mini-mental state examination (MMSE) score, which is regarded as a reliable and valid clinical measure for quantitatively assessing the severity of cognitive impairment. FA is measured at 83 equally spaced grid points along the corpus callosum (CC) fiber tract, the largest fiber tract in the human brain, which is responsible for much of the communication between the two hemispheres and connects homologous areas in the two cerebral hemispheres.

To demonstrate the usefulness of the method proposed in Sect. 3.1, we artificially delete some observed points of FA and then compare the estimator of \(\gamma\) obtained from these incomplete functional observations with the estimator obtained from the original complete dataset. For the \(i\)th FA curve, the missing domain has the same form as the interval given in Sect. 4, with \((a_1, a_2) = (1.5, 0.2)\) and \((a_1, a_2) = (1.5, 0.4)\). A subset of the complete and incomplete individual trajectories is displayed in Fig. 1.

Fig. 1 A subset of complete (left) and incomplete (right) FA curves, with the mean function (purple line)

Estimators of the functional coefficient obtained from both the complete and the incomplete FA datasets are illustrated in Fig. 2. The estimators obtained from the incomplete datasets with different missing domains (red and green lines) are similar to the estimator obtained from the original complete dataset (blue line). This reveals that the proposed framework is useful for estimating the model with incomplete functional predictors.

Fig. 2 Estimators of \(\gamma\) with different expected missing lengths on \([0, 1]\). Blue line: the estimator using the original complete dataset; red line: the estimator with \((a_1, a_2) = (1.5, 0.2)\); green line: the estimator with \((a_1, a_2) = (1.5, 0.4)\)

Next, we focus on recovering the missing parts \(X_{iM_i}\) of \(X_i\). Assume that the infinite-dimensional process \(X_i\) is well approximated by its projection onto the subspace of \(L^2(\mathcal{T})\) spanned by the first \(m\) eigenfunctions (Yao et al. 2005a). In practice, the prediction of the trajectory \(X_i(t)\) of the \(i\)th subject using the first \(m\) eigenfunctions given in Sect. 3.1 is

$$\displaystyle \begin{aligned} \hat{X}_i(t) = \hat{\mu}^{\text{NME}}(t) + \sum_{j=1}^{m}\hat{U}^{\text{NME}}_{ij}\hat{\phi}^{\text{NME}}_j(t). \end{aligned} $$
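On the grid, this reconstruction amounts to one line, reusing the completed scores from the sketch after (8); the names are illustrative.

```python
# Sketch: reconstruct an incomplete curve from its m completed NME scores.
def reconstruct(mu_hat, phi, U_hat_i, m):
    return mu_hat + phi[:, :m] @ U_hat_i[:m]   # mu(t) + sum_j U_ij phi_j(t)
```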

We randomly select four FA curves with different missing parts. The predicted profiles for these four curves are presented in Fig. 3 and are close to the observed parts. This demonstrates that the “NME” method, by recovering the missing parts of incomplete trajectories, yields a better estimator than the “SUB” method, which deletes them directly.

Fig. 3 Predicted profiles for four randomly chosen FA curves with different missing parts, with \((a_1, a_2) = (1.5, 0.2)\). Missing parts of the trajectories, from left to right and top to bottom: missing on the left side, in the middle, on the right side, and on both the left and right sides. Blue points: real data; red line: predicted profile

6 Discussion

In this paper, we address the problem of estimating \(\gamma\) in (1) with partially observed trajectories, both without and with measurement error. The basic elements of our approach are estimators of the FPC scores of each partially observed trajectory. Specifically, when the incomplete functional predictors are observed without measurement error, we model the FPC scores of the missing part as linear functionals of the observed part of that trajectory; when the incomplete functional data are observed with measurement error, we obtain estimators of the FPC scores via conditional expectation. Rates of convergence of the proposed estimators \(\hat{\gamma}^{\text{NME}}\) and \(\hat{\gamma}^{\text{WME}}\) are established under the two scenarios. We also compare the proposed methods with the “SUB” and “IN” methods. The simulation studies show that both the “NME” and “WME” methods, which borrow strength from the entire sample to estimate \(\gamma\) in model (1), perform well in practice.

The methods proposed here can be extended to other functional regression models with partially observed trajectories, such as partial functional linear regression (see Shin 2009). The framework established in this paper assumes that the missing parts of the trajectories are missing completely at random. In a number of applications, the underlying missing mechanism depends on systematic strategies (Liebl and Rameseder 2019), which clearly violates the MCAR assumption. Extension to this scenario is also of interest and practical significance.