Prediction of functional data with spatial dependence: a penalized approach

Aguilera-Morillo, M. Carmen; Durbán, María; Aguilera, Ana M.

doi:10.1007/s00477-016-1216-8


Original Paper
Published: 05 February 2016

Volume 31, pages 7–22, (2017)
Stochastic Environmental Research and Risk Assessment


Abstract

This paper focuses on spatial functional variables whose observations are a set of spatially correlated sample curves obtained as realizations of a spatio-temporal stochastic process. In this context, as an alternative to other geostatistical techniques (kriging and kernel smoothing, among others), we propose a new method to predict both the curves of temporal evolution of the process at unsampled locations and the surfaces of geographical evolution of the variable at unobserved time points. To assess the performance of the proposed method, two simulation studies and an application to real climatological data were carried out. Finally, the results were compared with ordinary functional kriging.


1 Introduction

Functional data analysis (FDA) is currently a very active statistical research topic from both the theoretical and the applied viewpoints. Functional data are sets of functions obtained as independent realizations of a functional random variable that takes values in a function space defined on a continuous domain. In most cases, functional observations are curves describing the evolution of a scalar variable over time, but surfaces describing the evolution of a scalar variable over space can also be regarded as functional data. The name FDA is due to the pioneering book by Ramsay and Silverman (1997), which contains an excellent collection of the main FDA methodologies and interesting motivating examples. Since then, FDA has developed rapidly, with more than 8000 references in Google Scholar during the twenty-first century. Although the key tool in FDA is still principal component analysis, other multivariate techniques such as canonical correlation, discriminant analysis and cluster analysis have also been investigated. More recently, research on FDA has turned to regression models, nonparametric estimation, robust estimation, Bayesian estimation and inference, among other topics; almost any statistical method is being extended to functional data. At the same time, new books on these FDA topics have appeared (Ferraty and Vieu 2006; Horvath and Kokoszka 2012; Hsing and Eubank 2015; Shi and Choi 2011; Zhang 2013).

This work focuses on the analysis of univariate functional data with spatial dependence. The sample information consists of a set of curves associated with different geographical locations in a spatial domain. As an illustration, consider the Canadian temperature data set introduced by Ramsay and Silverman (1997), which has been analyzed in many papers using different FDA methodologies. These data are daily temperatures (averaged over 30 years) at 35 weather stations in Canada, so we have a set of 35 spatially correlated curves that represent the temporal evolution of temperature at different geographical sites (a spatio-temporal functional variable). The raw data set, together with a map of the geographical locations, is shown in Fig. 1. Spatially correlated functional data are very common in environmental applications, yet they are often analyzed with FDA approaches that do not take the spatial dependence structure into account. Some interesting applications of this type can be found in Escabias et al. (2005), Zhang and Chen (2007) and Kaufman and Sain (2010).

The aim of this work is to develop a method for modeling the mean of a spatio-temporal functional variable from its discrete observations at a finite set of time points and locations in the temporal and spatial domains, respectively. The estimated model provides predictions both of the curves of temporal evolution of the variable at unobserved locations and of the surfaces of geographical evolution of the variable at unobserved time points.

This problem has been approached by different authors using geostatistical techniques. The first notions on this topic can be found in Goulard and Voltz (1993), where multivariate approaches were used to predict curves at unsampled spatial sites. More recent collections of geostatistical tools for spatial functional data can be found in Giraldo (2010) and Delicado et al. (2009). In general, the most widely used technique for predicting functional data with spatial dependence is functional kriging. Giraldo et al. (2010) proposed a continuous time-varying kriging and applied it to environmental data. A formal version of ordinary kriging for functional data (OKFD) was developed by Giraldo et al. (2011) and implemented in the R package geofd (Giraldo et al. 2012). More recently, different versions of the universal kriging predictor for functional data with spatial dependence were proposed by Menafoglio (2013) and Caballero et al. (2013). Kriging with external drift has also been extended to include exogenous variables with spatially correlated functional data (Ignaccolo et al. 2014).

In the context of spatial data, an alternative to geostatistical techniques is nonparametric spatial regression. A popular approach uses penalized splines, or P-splines (Eilers and Marx 1996), which combine a rich regression basis with a penalty on differences of adjacent coefficients to control the smoothness of the fit. This methodology has been successfully applied to both functional and spatial data in different contexts. In FDA, P-splines have been used to smooth the sample curves (Aguilera and Aguilera-Morillo 2013a) and to estimate FDA models such as PCA (Aguilera and Aguilera-Morillo 2013b) or functional regression (Marx and Eilers 1999; Aguilera-Morillo et al. 2013), among others. For spatial data, Lee and Durbán (2009) and Ugarte et al. (2009) used P-splines to smooth spatially correlated count data, Lee and Durbán (2011) extended their use to spatio-temporal data and, more recently, Sangalli et al. (2013) proposed a spatial regression model for data distributed over irregularly shaped spatial domains. A wavelet regression approach for estimating the field of ocean temperature at different depths was introduced by Fernández-Pascual (2015). On the other hand, functional approaches based on autoregressive Hilbertian processes were considered by Ruiz-Medina et al. (2012, 2014).

Univariate kriging and spline smoothers have been compared in several papers without reaching a unanimous conclusion. The main objection to kriging is its stationarity assumption, which may not hold for some types of spatial structure. Based on simulations in which spline regression predicts better than kriging when the data contain trends of various types, some papers conclude that nonparametric regression is more robust than kriging because it accounts for spatial structure that geostatistics does not (Yakowitz and Szidarovsky 1985). Other papers conclude that kriging never performs worse than splines and can outpredict them when the data are not sampled on a grid (Dubrule 1984; Laslett 1994).

Our aim is to use spatial smoothing regression techniques within a functional regression approach to provide a new method for predicting functional data with spatial dependence at unsampled locations. Starting from the formal definition of spatio-temporal functional data given in Sect. 2, Sect. 2.1 extends a penalized functional regression model to predict spatially correlated functional data. The idea is to consider the functional regression model with functional response and scalar covariates (Faraway 1997; Ramsay and Silverman 1997; Chiou et al. 2004), using the spatial information as regressors. Thus, combining the functional regression model for functional response with penalized spline spatial regression yields the proposed functional spatial regression model. In practice, functional data are usually observed with some error or noise. To deal with this, Ramsay and Silverman (1997) considered a penalized version of functional regression for functional response by introducing a continuous penalty (based on the integrated squared second derivatives of the parameter functions) in the least squares fitting, and Reiss (2010) used a penalized generalized least squares criterion based on a basis representation. In this paper, we adapt the idea developed in Eilers et al. (2006) and combine the two-dimensional penalty used for spline spatial regression with the one proposed in Ramsay and Silverman (1997) to obtain a three-dimensional P-spline penalty. Hereinafter, this method will be called the penalized functional spatial regression model (PFSRM).

The selection of the smoothing parameters and basis dimensions is discussed in Sect. 3. The prediction accuracy of PFSRM is then compared with OKFD in two simulation studies in Sect. 4, and an application to the Canadian Maritime weather data is presented in Sect. 5. As mentioned above, the Canadian Maritime weather data are a well-known example of functional data which, in most cases, have been treated as a set of independent curves of daily temperature and precipitation at 35 different locations in Canada, averaged over 1960–1994 (Ramsay and Silverman 1997). However, this is a clear example of functional data presenting spatial dependence, and it was studied in this sense by Delicado et al. (2009) and Menafoglio (2013). The conclusions of these studies close the paper in Sect. 6.

2 Theoretic framework

Suppose that we have a sample of spatially dependent (hence non-independent) curves $\{y_{i} (t){\text {:}}\,t\in T,\, i=1,\ldots ,n\}$ given by

$$\begin{aligned} y_{i} (t) = x\left( s_{i},\,t\right) + \epsilon _{i} (t), \quad i=1,\ldots ,n, \end{aligned}$$

where $\epsilon _{i}(t)$ are zero mean random errors and $x(s_{i},\,t)$ are observations of a spatial functional variable (stochastic process)

$$\begin{aligned} \left\{ X(s,\,t){\text {:}}\,s \in S \subseteq {\mathbb {R}}^{2}, \,t\in T \subseteq {\mathbb {R}} \right\} , \end{aligned}\tag{1}$$

where $s=(u,\,v)$ is a generic location in the spatial domain $S=U\times V$, with $U$, $V$ and $T$ real intervals, and, for each fixed spatio-temporal position $(s,\,t)$, $X (s,\,t)$ is a real random variable defined on a probability space $(\varOmega ,\,{\mathcal {A}},\,P).$

In addition, these sample curves are observed with error at a finite set of time points $\{ t_{j}{\text {:}}\,j=1,\ldots ,m \}$ for each geographical location $s_{i} = (u_{i},\,v_{i})$, so that the sample data are given by $y_{ij} = y_{i} (t_{j}), \, i=1,\ldots ,n;\,j=1,\ldots ,m.$

Let us also assume that the realizations of this functional variable are square integrable functions on the spatio-temporal domain $U\times V \times T,$ so that each sample function $x(s,\,t)$ belongs to the Hilbert space $L^{2}( U\times V \times T )$ with the usual scalar product given by

$$\begin{aligned} \langle f,\,g\rangle =\int _{U} \int _{V} \int _{T} f( u,\, v,\, t) g( u,\, v,\, t)du dv dt,\quad \forall f,\,g\in L^{2}( U\times V \times T). \end{aligned}$$

In order to reconstruct the true functional form of the data from discrete spatio-temporal observations, we extend the usual basis expansion approach for representing curves in FDA to the case of spatio-temporal functions that depend on three continuous arguments.

Consider three univariate bases $\{ \phi _{k}^{U} ( u){\text {:}}\,u\in U;\, k=1,\ldots ,p\},\,\{ \phi _{l}^{V}( v){\text {:}}\, v\in V;\, l=1,\ldots ,q\}$ and $\{ \phi _{h}^{T} ( t){\text {:}}\, t\in T;\, h=1,\ldots ,r \}.$ Then we assume that the realizations of the spatio-temporal functional variable belong to the $pqr$-dimensional tensor-product function space generated by the basis

$$\begin{aligned} \left\{ \phi _{k}^{U} (u) \phi _{l}^{V} (v) \phi _{h}^{T} (t){\text {:}}\, k=1,\ldots ,p;\,l=1,\ldots ,q;\,h=1,\ldots ,r \right\} . \end{aligned}$$

That is,

$$\begin{aligned} x (s,\,t) = \sum _{k=1}^{p} \sum _{l=1}^{q} \sum _{h=1}^{r} a_{klh} \phi ^{U}_{k}(u) \phi ^{V}_{l}(v) \phi ^{T}_{h}(t). \end{aligned}\tag{2}$$

Then, the matrix $X =(x_{ij})_{n\times m}$ whose entries are the values of the spatio-temporal functional variable at the sampling points given by $x_{ij} = x(s_{i},\,t_{j})$ can be written in matrix form as

$$\begin{aligned} X = \left( \varPhi ^{U} \odot \varPhi ^{V} \right) A \varPhi ^{T^{\prime }}, \end{aligned}\tag{3}$$

where $\varPhi ^{U} = (\varPhi ^{U}_{ik})_{n\times p}$ with $\varPhi ^{U}_{ik} = \phi ^{U}_{k} (u_{i}),\,\varPhi ^{V} = (\varPhi ^{V}_{il})_{n\times q}$ with $\varPhi ^{V}_{il} = \phi ^{V}_{l}(v_{i}),\,\varPhi ^{T} = (\varPhi ^{T}_{jh})_{m\times r}$ with $\varPhi ^{T}_{jh} = \phi ^{T}_{h}(t_{j}),\,A=(a_{(kl)h})_{p q \times r}$ is the matrix comprising the basis coefficients and $\odot$ denotes the row-wise Khatri–Rao product so that $\varPhi ^{U} \odot \varPhi ^{V} = (( \varPhi ^{U} \odot \varPhi ^{V})_{i(kl)})_{n\times pq}$ with entries $( \varPhi ^{U} \odot \varPhi ^{V})_{i(kl)} = \phi ^{U}_{k} (u_{i}) \phi ^{V}_{l} (v_{i})$ (Rao and Rao 1998).
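As a quick numerical check of Eq. (3), the Python/NumPy sketch below builds the row-wise Khatri–Rao product and verifies that the matrix form reproduces the triple sum in Eq. (2). The function name `row_khatri_rao` and the random stand-in basis matrices are our own illustrative choices, not part of the paper. Rows of $A$ are indexed by $(k,l)$ with $k$ varying slowest, matching the column order of $\varPhi^{U}\odot\varPhi^{V}$.

```python
import numpy as np


def row_khatri_rao(phi_u, phi_v):
    """Row-wise Khatri-Rao product: row i equals kron(phi_u[i], phi_v[i])."""
    n, p = phi_u.shape
    n_v, q = phi_v.shape
    if n != n_v:
        raise ValueError("both basis matrices need one row per location")
    # Broadcast to (n, p, q), then flatten so column (k, l) has index k*q + l
    return (phi_u[:, :, None] * phi_v[:, None, :]).reshape(n, p * q)


rng = np.random.default_rng(0)
n, m, p, q, r = 7, 11, 4, 5, 6
phi_u = rng.random((n, p))  # stand-in for phi_k^U(u_i)
phi_v = rng.random((n, q))  # stand-in for phi_l^V(v_i)
phi_t = rng.random((m, r))  # stand-in for phi_h^T(t_j)
A = rng.standard_normal((p * q, r))  # basis coefficients a_{(kl)h}

Z = row_khatri_rao(phi_u, phi_v)
X_matrix = Z @ A @ phi_t.T  # matrix form, Eq. (3)

# Eq. (2): explicit triple sum over k, l, h at every pair (s_i, t_j)
a_klh = A.reshape(p, q, r)
X_sum = np.einsum("ik,il,jh,klh->ij", phi_u, phi_v, phi_t, a_klh)
```

Because the flattening order inside `row_khatri_rao` and the reshape of `A` both put $k$ before $l$, the two computations agree entry by entry.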

Once the basis coefficients in $A$ have been estimated from the discrete observations $y_{ij}$, the spatio-temporal functional variable can be estimated at unobserved locations and times $(s_{0},\,t_{0})$ by substituting the estimated coefficients into model (2). In this way, we obtain the complete curve of temporal evolution of the variable at unsampled geographical locations, and the complete surface of spatial evolution of the variable at any time point in the temporal domain.
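The substitution step can be sketched as follows, assuming equally spaced cubic B-spline bases in every direction, evaluated through truncated power functions. The helper `bspline_basis` and the `(lower, upper, segments)` domain tuples are hypothetical names for this illustration. The curve at an unsampled site $s_0=(u_0,v_0)$ is $(\phi^{U}(u_0)\otimes\phi^{V}(v_0))^{\prime}A\,\phi^{T}(t)$. The surface at a time $t_0$ uses the spatial coefficients $A\phi^{T}(t_0)$ reshaped into a $p\times q$ array.

```python
import math

import numpy as np


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)  # truncated powers (x - knot)_+^deg
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


def predict_curve(A, u0, v0, t, dom_u, dom_v, dom_t):
    """Predicted curve x(s0, t) at an unsampled site s0 = (u0, v0)."""
    z0 = np.kron(bspline_basis(u0, *dom_u)[0], bspline_basis(v0, *dom_v)[0])
    return z0 @ A @ bspline_basis(t, *dom_t).T


def predict_surface(A, u_grid, v_grid, t0, dom_u, dom_v, dom_t):
    """Predicted surface x(u, v, t0) on a grid of locations at a fixed time t0."""
    phi_u = bspline_basis(u_grid, *dom_u)  # (len(u_grid), p)
    phi_v = bspline_basis(v_grid, *dom_v)  # (len(v_grid), q)
    a_t0 = A @ bspline_basis(t0, *dom_t)[0]  # spatial coefficients at t0
    coef = a_t0.reshape(phi_u.shape[1], phi_v.shape[1])
    return phi_u @ coef @ phi_v.T


# Usage: domains (lower, upper, segments); a cubic basis has segments + 3 functions
dom_u, dom_v, dom_t = (0.0, 1.0, 4), (0.0, 1.0, 5), (0.0, 1.0, 6)
p, q, r = 4 + 3, 5 + 3, 6 + 3
A_const = np.full((p * q, r), 2.5)  # constant coefficients
t_grid = np.linspace(0.0, 1.0, 50)
curve = predict_curve(A_const, 0.37, 0.81, t_grid, dom_u, dom_v, dom_t)
surface = predict_surface(
    A_const, np.linspace(0, 1, 20), np.linspace(0, 1, 30), 0.5, dom_u, dom_v, dom_t
)
```

Since B-splines sum to one on their domain, constant coefficients give a constant prediction (here 2.5) at every location and time. This is a convenient check that the Kronecker and reshape orders are consistent.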

2.1 Penalized functional spatial regression model

In this work we propose to estimate the basis coefficients in Eq. (2) by introducing the spatial variability through the following functional spatial regression model:

$$\begin{aligned} y(t)= Z \alpha (t) + \epsilon (t), \quad \forall t \in T, \end{aligned}\tag{4}$$

where $y(t)=(y_{1}(t),\ldots , y_{n}(t))^{\prime }$ is the vector of response functions, $Z=(z_{ik})_{n\times p q}=\varPhi ^{U} \odot \varPhi ^{V}$ is the two-dimensional B-spline basis for the geographical position, $\alpha (t)=(\alpha _{1}(t),\ldots ,\alpha _{p q}(t))^{\prime }$ is the vector of parameter functions to be estimated and $\epsilon (t)=(\epsilon _{1}(t),\ldots ,\epsilon _{n}(t))^{\prime }$ is the vector of error terms.

Let us consider the basis representation for the functional response $y(t)=C\phi ^{T}(t)$ and for the functional parameters $\alpha (t)=A\phi ^{T} (t),$ with $C=(c_{ih})_{n\times r}$ and $A=(a_{(kl)h})_{p q \times r}$ being the corresponding matrices of basis coefficients and $\phi ^{T}(t) =(\phi _{1}^{T} (t),\ldots ,\phi _{r}^{T} (t))^{\prime }$ being the vector of basis functions. Then, the model given in Eq. (4) can be rewritten as follows

$$\begin{aligned} C \phi ^{T} (t)= Z A\phi ^{T} (t) + \epsilon (t), \quad \forall t \in T. \end{aligned}$$

To estimate this model accurately, a roughness penalty is introduced in the least squares fitting criterion:

$$\begin{aligned} PSSE(y,\,\alpha )&= \int \left( C\phi ^{T} (t) - ZA\phi ^{T} (t)\right) ^{\prime } \left( C\phi ^{T} (t) -ZA\phi ^{T} (t)\right) dt \\&\quad + vec(A)^{\prime }\left[ PEN_{d}^{U,V,T}\right] vec(A), \end{aligned}\tag{5}$$

where the operator $vec(A)$ creates a column vector from a matrix $A$ by stacking its columns, and $PEN_{d}^{U,V,T}$ denotes the $d$-order P-spline penalty in space and time. This penalty can be expressed in terms of $d$-order difference operators $\Delta _{d}$ (Eilers et al. 2006):

$$\begin{aligned} PEN_{d}^{U,V,T}&= \lambda _{1}\left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U}\otimes I_{q}\otimes I_{r}\right) + \lambda _{2}\left( I_{p}\otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V} \otimes I_{r}\right) \\&\quad + \lambda _{3}\left( I_{p}\otimes I_{q} \otimes \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) . \end{aligned}$$

Here, $\Delta _{d}^{U},\,\Delta _{d}^{V}$ and $\Delta _{d}^{T}$ are matrices of $d$-order differences, $I_{p}$, $I_{q}$ and $I_{r}$ are identity matrices of the indicated sizes, and $\lambda _{1},\,\lambda _{2}$ and $\lambda _{3}$ are the smoothing parameters.
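A minimal Python/NumPy sketch of the penalty follows; the function names are ours. One caveat on ordering: when $vec(A)$ stacks the columns of the $pq\times r$ matrix $A$, the time index $h$ varies slowest and $l$ fastest. This sketch therefore arranges the Kronecker factors in the order $T$, $U$, $V$ rather than the $U$, $V$, $T$ order displayed above. That arrangement is our choice, so that each term penalizes differences along the intended direction under column stacking.

```python
import numpy as np


def diff_penalty(k, d):
    """Delta_d' Delta_d for a coefficient vector of length k."""
    delta = np.diff(np.eye(k), n=d, axis=0)  # (k - d) x k difference matrix
    return delta.T @ delta


def pfsrm_penalty(p, q, r, lambdas, d=2):
    """Three-dimensional P-spline penalty acting on vec(A), A of size pq x r.

    Column-major vec(A) has index h*p*q + k*q + l, so every Kronecker
    product below lists its factors in the order (T, U, V).
    """
    lam1, lam2, lam3 = lambdas
    I_p, I_q, I_r = np.eye(p), np.eye(q), np.eye(r)
    pen_u = np.kron(I_r, np.kron(diff_penalty(p, d), I_q))
    pen_v = np.kron(I_r, np.kron(I_p, diff_penalty(q, d)))
    pen_t = np.kron(diff_penalty(r, d), np.kron(I_p, I_q))
    return lam1 * pen_u + lam2 * pen_v + lam3 * pen_t


p, q, r = 5, 6, 7
P = pfsrm_penalty(p, q, r, lambdas=(1.0, 10.0, 0.1), d=2)

# Coefficients that are linear in k, l and h lie in the null space of a
# second-order penalty, so their penalty is (numerically) zero.
k, l, h = np.meshgrid(np.arange(p), np.arange(q), np.arange(r), indexing="ij")
A_linear = (1.0 + 2.0 * k - 0.5 * l + 3.0 * h).reshape(p * q, r)
vec_a = A_linear.reshape(-1, order="F")  # stack the columns of A
penalty_value = vec_a @ P @ vec_a
```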

Interchanging the integration and the summations implied by the matrix products, and setting to zero the derivative of the resulting expression with respect to $A$ (see the Appendix for further details), the estimated coefficients are given by

$$\begin{aligned} vec(A)=\left[ \varPsi \otimes (Z^{\prime }Z) + PEN_{d}^{U,V,T}\right] ^{-1} vec(Z^{\prime }C\varPsi ^{\prime }), \end{aligned}$$

where $\varPsi =\int _{T} \phi ^{T}(t)\, \phi ^{T}(t)^{\prime }\,dt$ is the $r\times r$ matrix of inner products between the temporal basis functions.
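The closed-form solution can be sketched directly in Python/NumPy. The function name `pfsrm_coefficients` is ours, and `pen` stands for any $pqr\times pqr$ penalty built with the same column-stacking convention for $vec(A)$. With a zero penalty and a full-column-rank $Z$, the estimator reduces to ordinary least squares, $A=(Z^{\prime}Z)^{-1}Z^{\prime}C$, which gives a convenient sanity check.

```python
import numpy as np


def pfsrm_coefficients(Z, C, Psi, pen):
    """Solve [Psi kron Z'Z + PEN] vec(A) = vec(Z' C Psi') for A (pq x r)."""
    pq, r = Z.shape[1], C.shape[1]
    lhs = np.kron(Psi, Z.T @ Z) + pen
    rhs = (Z.T @ C @ Psi.T).reshape(-1, order="F")  # vec() stacks columns
    vec_a = np.linalg.solve(lhs, rhs)
    return vec_a.reshape(pq, r, order="F")


rng = np.random.default_rng(1)
n, pq, r = 40, 9, 6
Z = rng.random((n, pq))  # stand-in for the spatial basis
C = rng.standard_normal((n, r))  # stand-in basis coefficients of the curves
G = rng.random((r, r))
Psi = G @ G.T + r * np.eye(r)  # symmetric positive definite Gram matrix

# Zero penalty: the estimator reduces to ordinary least squares
A_ols = pfsrm_coefficients(Z, C, Psi, np.zeros((pq * r, pq * r)))

# Ridge-type stand-in penalty, only to exercise the penalized solve
pen = 0.5 * np.eye(pq * r)
A_pen = pfsrm_coefficients(Z, C, Psi, pen)
```

For any penalty, the solution satisfies the first-order condition $Z^{\prime}ZA\varPsi + \mathrm{mat}(PEN\,vec(A)) = Z^{\prime}C\varPsi$. The checks below verify this condition numerically.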

3 Selection of parameters

The three smoothing parameters $(\lambda _{1},\,\lambda _{2},\,\lambda _{3})$ involved in this problem are selected simultaneously by minimizing the following generalized cross-validation error

$$\begin{aligned} GCVE\left( \lambda _{1},\,\lambda _{2},\,\lambda _{3}\right) = \displaystyle \frac{\sum _{i=1}^{n} SSE_{i}}{(n-trace(H))^{2}}, \end{aligned}$$

where

$$\begin{aligned} SSE_{i}=\displaystyle \sum _{j=1}^{m}\left( y\left( s_{i},\,t_{j}\right) -{\hat{y}}\left( s_{i},\,t_{j}\right) \right) ^{2}, \end{aligned}$$

and

$$\begin{aligned} H= \left( \varPhi ^{T}\otimes Z\right) \left[ \varPsi \otimes (Z^{\prime }Z) +PEN_{d}^{U,V,T}\right] ^{-1}\left( \varPhi ^{T^{\prime }}\otimes Z^{\prime }\right) , \end{aligned}$$

with $PEN_{d}^{U,V,T}$ being the three-dimensional P-spline penalty of order d described above.

Minimizing the GCVE can be computationally demanding in this case, since three smoothing parameters must be searched for. To reduce the computational burden, instead of using an optimization routine, we defined a three-dimensional grid of candidate values and performed a grid search. We also checked the performance of other criteria such as AIC and BIC, and found that, in this case, BIC tended to oversmooth the spatial component of the model, whereas AIC performed as well as GCVE.
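A compact sketch of the grid search follows. It rests on several assumptions of ours:

- small synthetic data on a regular $7\times 7$ grid;
- equally spaced cubic B-spline bases built from truncated powers;
- the discrete Gram matrix $\varPsi=\varPhi^{T\prime}\varPhi^{T}$ in place of the integral; with this choice and $C$ obtained by least squares, the hat matrix $H$ displayed above maps the data exactly to the fitted values;
- the total number of observations $nm$ in the GCV denominator, the usual GCV convention (the displayed formula writes $n$).

All function names are hypothetical.

```python
import itertools
import math

import numpy as np


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


def diff_penalty(k, d=2):
    delta = np.diff(np.eye(k), n=d, axis=0)
    return delta.T @ delta


def gcve(lambdas, Y, Z, phi_t, p, q, d=2):
    """GCV error of the penalized fit for one triple (lam1, lam2, lam3)."""
    r = phi_t.shape[1]
    lam1, lam2, lam3 = lambdas
    I_p, I_q, I_r = np.eye(p), np.eye(q), np.eye(r)
    # Kronecker factors ordered (T, U, V) to match column-major vec(A)
    pen = (lam1 * np.kron(I_r, np.kron(diff_penalty(p, d), I_q))
           + lam2 * np.kron(I_r, np.kron(I_p, diff_penalty(q, d)))
           + lam3 * np.kron(diff_penalty(r, d), np.kron(I_p, I_q)))
    gram = np.kron(phi_t.T @ phi_t, Z.T @ Z)  # Psi kron Z'Z, Psi = Phi'Phi
    M = gram + pen
    # With C = Y Phi (Phi'Phi)^{-1}, vec(Z' C Psi) equals vec(Z' Y Phi)
    rhs = (Z.T @ Y @ phi_t).reshape(-1, order="F")
    A = np.linalg.solve(M, rhs).reshape(p * q, r, order="F")
    fitted = Z @ A @ phi_t.T
    trace_h = np.trace(np.linalg.solve(M, gram))  # trace(H) by the cyclic rule
    return np.sum((Y - fitted) ** 2) / (Y.size - trace_h) ** 2


rng = np.random.default_rng(2)
coords = np.linspace(0.0, 1.0, 7)
uu, vv = np.meshgrid(coords, coords, indexing="ij")
u, v = uu.ravel(), vv.ravel()  # 49 locations
t = np.linspace(0.0, 1.0, 30)
truth = np.sin(2 * np.pi * t)[None, :] * (1 + u + v**2)[:, None]
Y = truth + rng.normal(0.0, 0.2, truth.shape)

phi_u = bspline_basis(u, 0.0, 1.0, 3)
phi_v = bspline_basis(v, 0.0, 1.0, 3)
p, q = phi_u.shape[1], phi_v.shape[1]
Z = (phi_u[:, :, None] * phi_v[:, None, :]).reshape(len(u), p * q)
phi_t = bspline_basis(t, 0.0, 1.0, 6)

grid = [0.01, 0.1, 1.0, 10.0]
scores = {lams: gcve(lams, Y, Z, phi_t, p, q)
          for lams in itertools.product(grid, repeat=3)}
best = min(scores, key=scores.get)
```

The trace of $H$ is computed as $\mathrm{trace}(M^{-1}(\varPsi\otimes Z^{\prime}Z))$, which avoids forming the $nm\times nm$ hat matrix. For realistic problem sizes, a sparse or array-based (GLAM-type) implementation would be preferable to dense Kronecker products.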

On the other hand, the dimension of the basis in each of the three spatio-temporal directions must also be selected. Since the degree of smoothing is controlled by the smoothing parameters, the number and location of the knots are not crucial when fitting a P-spline. Generally, the knots of a P-spline are equally spaced, and their number must be large enough to fit the data but not so large that computation time becomes unnecessarily high. Two algorithms for the automatic selection of the number of knots by generalized cross-validation were considered by Ruppert (2002). In general, the basis dimensions are selected following the rule considered by Ruppert (2002), which uses approximately one knot for every five observation points.
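To illustrate the rule as stated here (about one knot per five observation points), the sketch below derives the number of equally spaced segments, that is, intervals between consecutive knots, from the number of observations. It then builds the corresponding cubic B-spline basis, which has $K+3$ functions for $K$ segments. The helper names and the rounding are our assumptions.

```python
import math

import numpy as np


def n_segments(n_obs, obs_per_knot=5):
    """Roughly one knot per `obs_per_knot` observations (at least one segment)."""
    return max(1, round(n_obs / obs_per_knot))


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


t = np.linspace(0.0, 1.0, 60)  # 60 observation points
K = n_segments(len(t))  # 12 segments
B = bspline_basis(t, 0.0, 1.0, K)  # 60 x 15 cubic B-spline basis
```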

4 Simulation studies

To test the performance of the proposed PFSRM, two different simulation studies were carried out. The first one considers non-equally spaced spatial locations on a grid and independent random errors. In the second one, the spatial locations are irregular and the random errors are added in both dimensions (space and time) through a spatio-temporal Gaussian process. In addition, the results are compared with a powerful geostatistical predictor, the OKFD developed by Giraldo et al. (2011). Note that with both methods, PFSRM and OKFD, the first step is to approximate the true functional form of the sample curves in terms of basis functions. In this paper, we use regression splines on a cubic B-spline basis.

For each method, a leave-one-out cross-validation procedure is used to predict the curve at each spatial location. The integrated squared error of prediction, with respect to the original data, is computed as

$$\begin{aligned} ISE_{i} = \int _{T} \left( x\left( s_{i},\,t\right) -{\hat{y}}^{(-i)}\left( s_{i},\,t\right) \right) ^{2} dt, \quad i=1,\ldots ,n, \end{aligned}$$

where ${\hat{y}}^{(-i)}(s_{i},\,t)$ is the curve predicted at location $s_{i}$ when the observation $y(s_{i},\,t)$ is left out of the sample.
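The leave-one-out loop and the ISE can be sketched as follows, approximating the integral by the trapezoidal rule on the observation grid. The `predict(train_idx, i)` callable is a hypothetical interface standing in for either method (PFSRM or OKFD). The toy predictor simply averages the remaining curves.

```python
import numpy as np


def trapezoid(f, t):
    """Trapezoidal rule for the integral of sampled values f over grid t."""
    return float(np.sum((f[1:] + f[:-1]) * np.diff(t)) / 2.0)


def loo_ise(X_true, Y, t, predict):
    """ISE_i for each location i, predicting curve i from the other curves.

    `predict(train_idx, i)` is a hypothetical callable that returns the
    predicted curve at location i on grid t, fitted without curve i.
    """
    n = Y.shape[0]
    ise = np.empty(n)
    for i in range(n):
        train_idx = np.flatnonzero(np.arange(n) != i)
        y_hat = predict(train_idx, i)
        ise[i] = trapezoid((X_true[i] - y_hat) ** 2, t)
    return ise


# Toy example: a naive predictor that averages the remaining observed curves
t = np.linspace(0.0, 1.0, 11)
Y = np.array([0.0, 1.0, 2.0])[:, None] * np.ones_like(t)  # three flat curves
X_true = np.zeros_like(Y)
ise = loo_ise(X_true, Y, t, lambda idx, i: Y[idx].mean(axis=0))
```

With flat curves at levels 0, 1 and 2 and a true curve of zero, the left-out predictions are 1.5, 1 and 0.5, so the ISEs over $[0,1]$ are 2.25, 1 and 0.25.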

4.1 Simulation study I

This simulation study was first considered by Giraldo et al. (2012). In our case, 225 spatial locations were fixed on a grid with coordinates $u=v=({-}20,\,{-}16,\,{-}15,\,{-}10,\,{-}8,\,{-}5,\,{-}1,\, 1,\, 2,\, 6,\, 10,\, 12,\,15,\,16,\,20),$ on which a set of spatially correlated functional data was simulated at 365 equally spaced time points according to the model

$$\begin{aligned} Y\left( s_{i},\,t\right) = \sum _{k=1}^{15}a_{k} \left( s_{i}\right) \phi _{k}(t) + \epsilon \left( s_{i},\,t\right) , \quad i=1,\ldots ,225, \end{aligned}$$

where $\phi (t)=(\phi _{1}(t),\ldots ,\phi _{15}(t))$ is a cubic B-spline basis and each coefficient $a_{k}$ is a realization of a Gaussian random field with exponential covariance $C(h)=2\exp \left( -h/8\right)$, where $h=\Vert s_{i}-s_{j}\Vert \,(i,\,j=1,\ldots ,225)$ is the Euclidean distance between sites $s_{i}$ and $s_{j}$. Finally, $\epsilon (s_{i},\,t)$, $t=1,\ldots ,365$, are independent random errors simulated from a N(0, 0.09) distribution. The spatial locations are shown in Fig. 2, and the simulated data sets, with and without noise, in Fig. 3.
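The data-generating mechanism of Simulation I can be sketched as follows. Several details are our assumptions rather than statements of the paper:

- the 15 coefficient fields are independent and zero-mean;
- they are drawn through a Cholesky factorization of the covariance matrix;
- the time basis uses equally spaced knots on $[1, 365]$;
- N(0, 0.09) is read as variance 0.09 (standard deviation 0.3).

```python
import math

import numpy as np


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


rng = np.random.default_rng(3)
coords = np.array([-20, -16, -15, -10, -8, -5, -1, 1, 2, 6, 10, 12, 15, 16, 20],
                  dtype=float)
uu, vv = np.meshgrid(coords, coords, indexing="ij")
sites = np.column_stack([uu.ravel(), vv.ravel()])  # 225 x 2

# Exponential covariance C(h) = 2 exp(-h / 8) between all pairs of sites
h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=-1)
cov = 2.0 * np.exp(-h / 8.0)
L = np.linalg.cholesky(cov)

# Column k holds one realization of the Gaussian random field a_k(s)
a = L @ rng.standard_normal((len(sites), 15))

t = np.arange(1, 366, dtype=float)
phi = bspline_basis(t, 1.0, 365.0, 12)  # 365 x 15 cubic B-spline basis

X = a @ phi.T  # noise-free curves, 225 x 365
Y = X + rng.normal(0.0, math.sqrt(0.09), X.shape)  # noisy observations
```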

The first step in applying both methods, PFSRM and OKFD, was to approximate the sample curves by regression splines on a cubic B-spline basis of dimension 15 with equally spaced knots. As an example, Fig. 4 displays a sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid black line). In this case, regression splines on 15 basis functions give a perfect approximation to the true (noise-free) data.
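The regression-spline step amounts to an ordinary least squares fit of each observed curve on the B-spline basis, $C=\arg\min\Vert Y-C\varPhi^{T\prime}\Vert^{2}$. Below is a minimal sketch, assuming all curves share one time grid. The function names and the synthetic coefficients are ours.

```python
import math

import numpy as np


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


def regression_spline_coefs(Y, phi):
    """Least squares coefficients C (n x r) such that Y ~= C phi'."""
    coefs, *_ = np.linalg.lstsq(phi, Y.T, rcond=None)
    return coefs.T


t = np.linspace(0.0, 1.0, 365)
phi = bspline_basis(t, 0.0, 1.0, 12)  # 15 cubic B-splines
rng = np.random.default_rng(4)
C_true = rng.standard_normal((5, 15))
Y_clean = C_true @ phi.T  # curves lying exactly in the spline space
Y_noisy = Y_clean + rng.normal(0.0, 0.3, Y_clean.shape)

C_clean = regression_spline_coefs(Y_clean, phi)  # recovers C_true
C_noisy = regression_spline_coefs(Y_noisy, phi)
smoothed = C_noisy @ phi.T  # basis representation of the noisy curves
```

When the true curves lie in the spline space, as in Simulation I, projecting the noisy curves onto 15 basis functions removes most of the noise.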

In the model fitting (second step), the two-dimensional spatial basis was built with 6 knots for each marginal cubic B-spline basis. A second-order penalty was used. For OKFD, the variograms were selected as a linear combination of nugget and exponential models.

To check the performance of the proposed method, a leave-one-out cross-validation procedure was carried out to obtain the predicted curve at each unsampled spatial location. From the multiple box plot of the distribution of the ISEs (Fig. 5) and the summary statistics in Table 1, it can be concluded that PFSRM achieves the lowest mean, standard deviation and median of the prediction errors. In addition, Fig. 6 displays the 225 predicted curves together with the mean of the predicted curves (red lines) and pointwise confidence bands given by the mean $\pm$ 2 standard deviations.

To test whether the performance varies with spatial location, the predicted curves at two outlying spatial locations (A and C) and a more central location (B) are displayed in Fig. 7 (locations A–C are highlighted in Fig. 2). PFSRM provides the predicted curves closest to the shape of the true data, regardless of the predicted spatial location. Notably, the worst OKFD prediction was at the most distant location, C, so the performance of OKFD is not uniform across the grid.

Finally, Fig. 8 displays the residual curves provided by the two methods. The mean residual curve is zero in both cases, but at some spatial locations OKFD yields larger residuals than our method.

Table 1 Simulation I: summary of ISEs from the cross-validation predictions provided by OKFD and PFSRM


4.2 Simulation study II

Now consider a set of 80 irregularly spaced spatial locations, displayed in Fig. 9, and a set of 100 equally spaced time points in the interval $[0,\,1].$ The idea is to simulate a set of spatially correlated functional data according to the model

$$\begin{aligned} Y\left( s_{i},\,t\right) = \left[ a_{i} f_{1}\left( s_{i},\,t\right) + b_{i} f_{2}\left( s_{i},\,t\right) -0.5\right] \sin \left( c_{i} \pi t - 0.2\right) + \epsilon \left( s_{i},\,t\right) , \quad i=1,\ldots ,80, \end{aligned}$$

where

$$\begin{aligned} f_{1}(s,\,t)=e^{\left( \frac{{-}(u-0.2)^{2}}{5} - \frac{(v-0.5)^{2}}{3} -\frac{(t-0.5)^{2}}{4} - 1\right) }, \end{aligned}$$

$$\begin{aligned} f_{2}(s,\,t)=e^{\left( -\frac{(u-0.3)^{2}}{4}- \frac{(v-0.7)^{2}}{2} -\frac{(t-0.4)^{2}}{6} \right) }, \end{aligned}$$

with $s=(u,\,v)$ denoting the coordinates of the spatial location; $a_{i},\,b_{i}$ and $c_{i}$ drawn from $a\sim \text{Uniform}[0.5,\,2],\,b\sim \text{Uniform}[0.5,\,1]$ and $c\sim \text{Uniform}[1.5,\,2]$; and $\epsilon (s,\,t)$ the error term, an observation of a spatio-temporal Gaussian process defined through a stochastic partial differential equation (Sigrist et al. 2015) with parameters $(\rho _{0}=0.1,\,\sigma ^{2}=0.25,\,\zeta =0.9,\,\rho _{1}=0.1,\, \gamma =2,\,\alpha =\pi /4,\,\mu _{x}=0.2,\,\mu _{y}={-}0.2,\, \tau ^{2}=0.01).$ The simulated sample paths with and without error are shown in Fig. 10.
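The noise-free part of the Simulation II model can be sketched as follows. Two simplifications are ours. First, the 80 site coordinates are drawn uniformly in the unit square, because the actual locations appear only in Fig. 9. Second, the SPDE-based error term of Sigrist et al. (2015) is not reproduced, so only the signal is generated.

```python
import numpy as np

rng = np.random.default_rng(5)
n_sites, n_times = 80, 100
u, v = rng.random(n_sites), rng.random(n_sites)  # assumed locations in [0, 1]^2
t = np.linspace(0.0, 1.0, n_times)  # 100 equally spaced times

a = rng.uniform(0.5, 2.0, n_sites)
b = rng.uniform(0.5, 1.0, n_sites)
c = rng.uniform(1.5, 2.0, n_sites)

# Broadcast sites along rows and times along columns
U, V, T = u[:, None], v[:, None], t[None, :]
f1 = np.exp(-(U - 0.2) ** 2 / 5 - (V - 0.5) ** 2 / 3 - (T - 0.5) ** 2 / 4 - 1)
f2 = np.exp(-(U - 0.3) ** 2 / 4 - (V - 0.7) ** 2 / 2 - (T - 0.4) ** 2 / 6)
X = (a[:, None] * f1 + b[:, None] * f2 - 0.5) * np.sin(c[:, None] * np.pi * T - 0.2)
```

Since $f_1\le e^{-1}$, $f_2\le 1$, $a\le 2$ and $b\le 1$, the amplitude factor lies in $[-0.5,\,2e^{-1}+0.5]$. The simulated signal is therefore bounded by $2e^{-1}+0.5\approx 1.24$ in absolute value.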

The first step was to approximate the sample curves by B-spline basis representations. To check the relation between forecasting performance and the dimension of the initial approximation of the sample paths, regression splines on cubic B-spline bases of dimensions 13 and 23 with equally spaced knots were considered. Fig. 11 shows a sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid black line). The regression splines of all curves are shown in Fig. 12. As expected, a higher basis dimension yields noisier sample curves that lie further from the original ones. Both scenarios (13 and 23 basis functions) are considered and compared. In the model fitting (second step), the two-dimensional spatial basis was built with 15 knots for each marginal basis. As in Simulation I, a second-order P-spline penalty was used. For OKFD, the variograms were selected as a linear combination of nugget and exponential models.

This study also shows differences in the accuracy of the predictions provided by the two methods. According to the summary statistics in Table 2, the proposed method reduces the mean and the median of the ISEs regardless of the number of basis functions used in the first step (regression spline fitting). This is also supported by the box plots in Fig. 13.

Table 2 Simulation II: summary of ISEs from the cross-validation predictions obtained with 13 and 23 basis functions in the initial regression spline (R-splines) fitting


Regarding the predicted curves obtained by each method, Fig. 14 shows that even in the case most favorable to OKFD (13 basis functions), PFSRM provides the most accurate predictions. To test whether the prediction performance varies with spatial location, two outlying spatial locations (A and C) and a central one (B) were considered; they are highlighted in Fig. 9. At these locations too, PFSRM yields the predicted curves closest to the true data.

In addition, Fig. 15 displays the mean curve of the predictions (red line) together with pointwise confidence bands (dashed lines) given by the mean $\pm$ 2 standard deviations. OKFD provides the noisier mean curve and the more variable predictions. PFSRM achieves the smoothest predicted curves, whereas OKFD provides predicted curves further from the true ones. Since OKFD seems to be very sensitive to the dimension of the basis, another possibility is to pre-smooth the data with P-splines. The accuracy measures of the predictions provided by the two methods in this case are also reported in Table 2: the prediction errors decrease for both methods, but those given by PFSRM remain lower.
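Pre-smoothing with P-splines, mentioned above as an option, replaces the least squares regression-spline fit with a penalized one, $\hat c=(B^{\prime}B+\lambda D_{d}^{\prime}D_{d})^{-1}B^{\prime}y$ (Eilers and Marx 1996). The sketch below smooths a single curve, with our own function names and arbitrary values of $\lambda$ and of the number of segments. A second-order penalty leaves straight lines untouched, which the usage lines exploit as a check.

```python
import math

import numpy as np


def bspline_basis(x, lower, upper, segments, deg=3):
    """Equally spaced B-splines (segments + deg columns) via truncated powers."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    dx = (upper - lower) / segments
    knots = lower + dx * np.arange(-deg, segments + deg + 1)
    diff = x[:, None] - knots[None, :]
    tp = np.where(diff >= 0, diff**deg, 0.0)
    D = np.diff(np.eye(len(knots)), n=deg + 1, axis=0)
    return (-1) ** (deg + 1) * tp @ D.T / (math.factorial(deg) * dx**deg)


def pspline_smooth(y, t, segments=20, lam=1.0, d=2):
    """P-spline smoother: penalized least squares on a rich B-spline basis."""
    B = bspline_basis(t, t.min(), t.max(), segments)
    delta = np.diff(np.eye(B.shape[1]), n=d, axis=0)  # d-order differences
    coefs = np.linalg.solve(B.T @ B + lam * delta.T @ delta, B.T @ y)
    return B @ coefs


t = np.linspace(0.0, 1.0, 100)
line = 2.0 + 3.0 * t
fit_line = pspline_smooth(line, t, lam=1e3)  # a straight line is not penalized

rng = np.random.default_rng(6)
signal = np.sin(2 * np.pi * t)
noisy = signal + rng.normal(0.0, 0.3, t.size)
fit = pspline_smooth(noisy, t, lam=1.0)
```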

Therefore, regardless of the basis dimension used initially to approximate the sample curves, OKFD was not able to predict curves with the same shape as the true data, while PFSRM gave more accurate predictions in both scenarios. PFSRM is thus more robust than OKFD in the sense that it is less sensitive to the choice of basis dimension.

5 Application to Canadian Maritime weather data

In this study we use averages (over 30 years) of daily temperature curves observed at 35 Canadian Maritime weather stations. This is a clear example of functional data presenting spatial dependence, since curves at nearby geographical locations are more similar than curves at locations further apart (see Fig. 1).

The first step for both methods is the basis representation of the raw sample paths in terms of cubic B-spline basis functions. To obtain more general conclusions, two different numbers of basis functions were considered for the initial representation of the sample paths: 33 (Case 1) and 65 (Case 2). The regression splines fitted in the two cases are displayed in Fig. 16. As in both simulation studies, a second-order P-spline penalty was used, and the variograms were selected as a linear combination of nugget and exponential models.

To obtain the predicted curve at each geographical site, a leave-one-out cross-validation procedure was carried out. The curves predicted by OKFD and PFSRM, together with their mean curve and pointwise confidence bands (mean $\pm$ 2 standard deviations), are shown in Fig. 17. In both cases (Cases 1 and 2), the spatial basis is built with 6 knots for each marginal basis. When the dimension of the basis used to fit the regression splines increases (Case 2), the predictions provided by OKFD are noisier than in Case 1. By contrast, PFSRM provides similar predictions regardless of the number of basis functions used to fit the initial regression splines. In that sense, PFSRM is more robust than OKFD with respect to the dimension of the initial B-spline expansions in the time domain. This is an important advantage of our method, since the choice of the number of initial basis functions is less critical than in functional kriging. For two Canadian Maritime provinces, the temperature curves predicted by OKFD (blue) and PFSRM (red) are plotted together with the observed temperature curves in Fig. 18. Regardless of the basis dimension, PFSRM provides smoother and more accurate predicted curves than OKFD while preserving the trend of the raw data.

To compare the predictive ability of the two methods, Fig. 19 shows box plots of the SSEs (with respect to the observed data) obtained by cross-validation for Cases 1 and 2, and Table 3 summarizes the mean, the standard deviation and the median of the 35 SSEs. Again, regardless of the basis dimension used in the initial regression splines, PFSRM always attains the lowest median SSE.

Finally, as a goodness-of-fit measure, Fig. 20 displays the residual curves, with the residuals at three spatial locations highlighted (colored dashed lines). From this figure we conclude that both methods yield a considerable number of residuals close to zero, but at some locations OKFD yields larger residuals than PFSRM.
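Residual curves such as those in Fig. 20 are observed minus predicted values at each site. The sketch below computes them and reports the share of residuals within a chosen tolerance of zero. As one possible way to pick sites worth highlighting, it also ranks sites by their largest absolute residual. That ranking rule is hypothetical, because the text does not state how the three highlighted locations were chosen.

```python
def residual_curves(observed_curves, predicted_curves):
    """Observed minus predicted values, site by site."""
    return [[o - p for o, p in zip(obs, pred)]
            for obs, pred in zip(observed_curves, predicted_curves)]


def largest_residual_sites(residuals, n=3):
    """Hypothetical selection rule: site indices ranked by largest absolute residual."""
    order = sorted(range(len(residuals)),
                   key=lambda i: max(abs(r) for r in residuals[i]),
                   reverse=True)
    return order[:n]


def share_near_zero(residuals, tol):
    """Fraction of all residuals whose absolute value is at most tol."""
    values = [r for curve in residuals for r in curve]
    return sum(abs(r) <= tol for r in values) / len(values)


res = residual_curves([[1.0, 2.0], [0.0, 0.0], [3.0, 3.0]],   # toy observed curves
                      [[1.0, 1.0], [1.0, 1.0], [3.0, 5.0]])   # toy predicted curves
print(res, largest_residual_sites(res, n=1), share_near_zero(res, tol=0.5))
```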

Table 3 Application: the median, the mean and the standard deviation of the SSEs (with respect to the observed data) obtained in the leave-one-out cross-validation for Cases 1 and 2


6 Conclusions

The aim of this paper is to provide a new tool for predicting spatially dependent functional data as an alternative to other geostatistical prediction techniques, such as functional kriging. Starting from the formal definition of spatial functional data given in Sect. 2, a penalized estimation of a functional spatial regression model has been proposed by introducing a three-dimensional P-spline penalty into the least squares fitting criterion (Sect. 2.1).

To compare the proposed method with functional kriging in different scenarios, two simulation studies were carried out. The first considers non-equally spaced spatial locations on a grid and independent random errors. The second considers irregular spatial locations and random errors simulated from a spatio-temporal Gaussian process. Furthermore, an application to real climatological data has been presented.

In both simulation studies and in the application to real data, the first step was to approximate the sample curves from their discrete observations by regression splines on cubic B-spline bases of different dimensions, for comparison purposes (the number of basis functions in each study was chosen according to the corresponding data structure). As for the P-spline penalty used in the model fitting, a second-order penalty was used in all cases.
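The second-order penalty acts on the B-spline coefficients through the difference matrix $\Delta_d$ with $d=2$. The sketch below builds $\Delta_2$ by differencing the identity matrix and evaluates $\|\Delta_2 a\|^2 = a^{\prime}\Delta_2^{\prime}\Delta_2 a$. It shows that coefficient sequences changing linearly are left unpenalized, while a local bump is penalized. It is a pure-Python sketch, and the function names are hypothetical.

```python
def difference_matrix(n, order=2):
    """The (n - order) x n difference matrix Delta_d, built by differencing the identity."""
    d = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(order):
        d = [[d[i + 1][j] - d[i][j] for j in range(n)] for i in range(len(d) - 1)]
    return d


def difference_penalty(coefs, order=2):
    """a' Delta_d' Delta_d a, i.e. the sum of squared order-d differences of the coefficients."""
    d = difference_matrix(len(coefs), order)
    return sum(sum(w * a for w, a in zip(row, coefs)) ** 2 for row in d)


print(difference_matrix(5)[0])                    # [1.0, -2.0, 1.0, 0.0, 0.0]
print(difference_penalty([1.0, 3.0, 5.0, 7.0]))   # 0.0: a linear trend is not penalized
print(difference_penalty([0.0, 1.0, 0.0]))        # 4.0: a local bump is
```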

A leave-one-out cross-validation procedure was carried out to obtain the predicted curve at each spatial location, treating each location in turn as unsampled. In addition, to test whether the predictive performance varies with the spatial location, the predicted curves at three outlying spatial locations were analyzed in both simulation studies.
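The leave-one-out scheme can be sketched generically: each site is removed in turn, and its curve is predicted from the remaining sites. In the sketch, `predict` is a hypothetical callback standing in for PFSRM or OKFD. The `mean_of_others` predictor is only a naive placeholder that makes the example runnable; it is neither of the two methods and ignores the spatial location.

```python
def leave_one_out(curves, predict):
    """Predict each site's curve from all the other sites.

    predict(held_out, training) is a hypothetical callback returning the predicted
    curve at site `held_out` using only the sites listed in `training`."""
    n = len(curves)
    predictions = []
    for i in range(n):
        training = [j for j in range(n) if j != i]
        predictions.append(predict(i, training))
    return predictions


def mean_of_others(curves):
    """Naive placeholder predictor: average of the training curves (not PFSRM or OKFD)."""
    def predict(held_out, training):
        n_points = len(curves[0])
        return [sum(curves[j][t] for j in training) / len(training) for t in range(n_points)]
    return predict


curves = [[0.0, 0.0], [2.0, 2.0], [4.0, 4.0]]  # toy curves at three sites
print(leave_one_out(curves, mean_of_others(curves)))
```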

The first simulation study reveals that, even in the most favorable case (where the initial regression splines are closest to the true noise-free data), PFSRM provides the most accurate predicted curves, very close in shape to the true data, regardless of the spatial location. By contrast, the predictive performance of OKFD varies across the grid.

The second simulation study shows that, regardless of the basis dimension initially used to approximate the sample curves, OKFD was unable to predict curves with the same shape as the true data, whereas PFSRM gives the most accurate predictions in both scenarios considered. In addition, PFSRM is more robust than OKFD in the sense that it is less sensitive to the choice of the basis dimension.

In general, the two simulation studies show that PFSRM provides the most accurate predicted curves (even at outlying spatial locations) and reduces the mean and the median of the ISEs with respect to OKFD, regardless of the number of basis functions used in the initial regression-spline smoothing.
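The ISE between a true curve and its prediction is the integral of their squared difference over the time domain. The sketch below approximates it with the trapezoidal rule on the evaluation grid. The choice of quadrature is an assumption, since the text does not specify how the integral was computed, and the function name is illustrative.

```python
def ise(times, true_curve, predicted_curve):
    """Integrated squared error, approximated by the trapezoidal rule on `times`."""
    if not (len(times) == len(true_curve) == len(predicted_curve)):
        raise ValueError("curves must be evaluated on the same grid")
    sq = [(a - b) ** 2 for a, b in zip(true_curve, predicted_curve)]
    return sum((times[k + 1] - times[k]) * (sq[k] + sq[k + 1]) / 2
               for k in range(len(times) - 1))


grid = [k / 100 for k in range(101)]
truth = [t * t for t in grid]
print(ise(grid, truth, [0.0] * len(grid)))  # close to the exact value 1/5
```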

In addition, one advantage of our method is that it can be used when the spatial locations at which measurements are taken change from one time point to another, and the model can easily cope with missing spatial locations or time points.

In summary, for predicting functional data with spatial dependence, the proposed PFSRM is a more accurate and computationally efficient alternative to existing geostatistical predictors such as ordinary functional kriging.

References

Aguilera AM, Aguilera-Morillo MC (2013) Comparative study of different B-spline approaches for functional data. Math Comput Model 58:1568–1579
Aguilera AM, Aguilera-Morillo MC (2013) Penalized PCA approaches for B-spline expansions of smooth functional data. Appl Math Comput 219:7805–7819
Aguilera-Morillo MC, Aguilera AM, Escabias M, Valderrama MJ (2013) Penalized spline approaches for functional logit regression. Test 22:251–277
Caballero W, Giraldo R, Mateu J (2013) A universal kriging approach for spatial functional data. Stoch Environ Res Risk Assess 27:1553–1563
Chiou JM, Müller HG, Wang JL (2004) Functional response models. Stat Sin 14:659–677
Delicado P, Giraldo R, Comas C, Mateu J (2009) Statistics for spatial functional data: some recent contributions. Environmetrics 21:224–239
Dubrule O (1984) Comparing kriging and splines. Comput Geosci 10(2–3):327–338
Eilers PHC, Marx B (1996) Flexible smoothing with B-splines and penalties. Stat Sci 11:89–121
Eilers PHC, Currie I, Durban M (2006) Fast and compact smoothing on large multidimensional grids. Comput Stat Data Anal 50:61–76
Escabias M, Aguilera AM, Valderrama MJ (2005) Modeling environmental data by functional principal component logistic regression. Environmetrics 16:95–107
Faraway JJ (1997) Regression analysis for a functional response. Technometrics 39:254–261
Fernandez-Pascual RM, Espejo R, Ruiz-Medina MD (2015) Moment and Bayesian wavelet regression from spatially correlated functional data. Stoch Environ Res Risk Assess. doi:10.1007/s00477-015-1130-5
Ferraty F, Vieu P (2006) Nonparametric functional data analysis. Springer, New York
Giraldo R (2010) Geostatistical analysis of functional data. PhD Thesis, Universitat Politècnica de Catalunya, Catalunya
Giraldo R, Delicado P, Mateu J (2010) Continuous time-varying kriging for spatial prediction of functional data: an environmental application. J Agric Biol Environ Stat 15:66–82
Giraldo R, Delicado P, Mateu J (2011) Ordinary kriging for function-valued spatial data. Environ Ecol Stat 18:411–426
Giraldo R, Mateu J, Delicado P (2012) geofd: an R package for function-valued geostatistical prediction. Rev Colomb Estad 35:385–407
Goulard M, Voltz M (1993) Geostatistical interpolation of curves: a case study in soil science. Springer, Dordrecht, pp 805–816
Harville DA (1997) Matrix algebra from a statistician’s perspective. Springer, New York
Horvath L, Kokoszka P (2012) Inference for functional data with applications. Springer, New York
Hsing T, Eubank R (2015) Theoretical foundations of functional data analysis with an introduction to linear operators. Wiley, Chichester
Ignaccolo R, Mateu J, Giraldo R (2014) Kriging with external drift for functional data for air quality monitoring. Stoch Environ Res Risk Assess 28:1171–1186
Kaufman CG, Sain SR (2010) Bayesian functional ANOVA modeling using Gaussian process prior distributions. Bayesian Anal 5(1):123–150
Laslett GM (1994) Kriging and splines: an empirical comparison of their predictive performance in some applications. J Am Stat Assoc 89(426):391–400
Lee DJ, Durban M (2009) Smooth-CAR mixed models for spatial count data. Comput Stat Data Anal 53:2968–2977
Lee DJ, Durban M (2011) P-spline ANOVA-type interaction models for spatio-temporal smoothing. Stat Model 11:49–69
Marx BD, Eilers PHC (1999) Generalized linear regression on sampled signals and curves. A P-spline approach. Technometrics 41:1–13
Menafoglio A, Secchi P, Dalla Rosa M (2013) A universal kriging predictor for spatially dependent functional data of a Hilbert Space. Electron J Stat 7:2209–2240
Ramsay JO, Silverman BW (1997) Functional data analysis, 1st edn. Springer, New York
Rao CR, Rao MB (1998) Matrix algebra and its applications to statistics and econometrics. World Scientific Publishing Co., Pte. Ltd., Singapore
Reiss PT, Huang L, Mennes M (2010) Fast function-on-scalar regression with penalized basis expansions. Int J Biostat 6:1–28
Ruiz-Medina MD, Espejo RM (2012) Spatial autoregressive functional plug-in prediction of ocean surface temperature. Stoch Environ Res Risk Assess 26:335–344
Ruiz-Medina MD, Espejo RM, Ugarte MD, Militino AF (2014) Functional time series analysis of spatiotemporal epidemiological data. Stoch Environ Res Risk Assess 28:943–954
Ruppert D (2002) Selecting the number of knots for penalized splines. J Comput Graph Stat 11:735–757
Sangalli LM, Ramsay JO, Ramsay TO (2013) Spatial spline regression models. J R Stat Soc B 75:1–23
Shi JQ, Choi T (2011) Gaussian process regression analysis for functional data. CRC Press, Chapman and Hall, Boca Raton
Sigrist F, Kuensch HR, Stahel WA (2015) spate: an R package for spatio-temporal modeling with a stochastic advection–diffusion process. J Stat Softw 63:1–23
Ugarte MD, Goicoa T, Militino AF, Durban M (2009) Spline smoothing in small area trend estimation and forecasting. Comput Stat Data Anal 53:3616–3629
Yakowitz SJ, Szidarovszky F (1985) A comparison of kriging with non-parametric regression methods. J Multivar Anal 16:21–53
Zhang J-T (2013) Analysis of variance for functional data. CRC Press, Chapman and Hall
Zhang J-T, Chen J (2007) Statistical inference for functional data. Ann Stat 35(3):1052–1079


Acknowledgments

This research has been funded by Project P11-FQM-8068 from Consejería de Innovación, Ciencia y Empresa, Junta de Andalucía, Spain and the Projects MTM2013-47929-P, MTM 2011-28285-C02-C2 and MTM 2014-52184-P from Secretaría de Estado Investigación, Desarrollo e Innovación, Ministerio de Economía y Competitividad, Spain. We thank Giraldo et al. for providing the R code for functional kriging. Finally, we thank the referees for their valuable comments, which helped to improve the organization and clarity of the paper.

Author information

Authors and Affiliations

University Carlos III de Madrid, Leganés, Madrid, Spain
M. Carmen Aguilera-Morillo & María Durbán
University of Granada, Granada, Spain
Ana M. Aguilera


Corresponding author

Correspondence to M. Carmen Aguilera-Morillo.

Appendix

Taking into account the following properties (Harville 1997)

$$\begin{aligned} vec(A)^{\prime }(D\otimes B)vec(C)&= trace(A^{\prime }BCD^{\prime }), \\ trace(A^{\prime }AB)&= trace(ABA^{\prime }), \end{aligned}$$

(6)

Eq. (5) can be rewritten as

$$\begin{aligned} PSSE(y,\,\alpha )&= trace(C^{\prime }C\Psi )+ trace(Z^{\prime }ZA\Psi A^{\prime }) \\&\quad - 2trace(A\Psi C^{\prime }Z) + \lambda _{1} trace\left( A^{\prime } \left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U} \otimes I_{q}\right) A\right) \\&\quad + \lambda _{2} trace\left( A^{\prime } \left( I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) A\right) + \lambda _{3} trace\left( A \left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) A^{\prime }\right) , \end{aligned}$$

(7)

with $\Psi $ the $r\times r$ matrix of inner products between the temporal basis functions, whose $(k,l)$ entry is $\int \phi _{k}^{T}(t)\,\phi _{l}^{T}(t)\,dt$. The next step is to compute the derivatives of Eq. (7) with respect to A. By considering the following properties (Harville 1997),

$$\begin{aligned} \frac{\partial trace(XAX^{\prime })}{\partial X}&= X(A+A^{\prime }), \end{aligned}$$

(8)

$$\begin{aligned} \frac{\partial trace(X^{\prime }AX)}{\partial X}&= (A+A^{\prime })X, \end{aligned}$$

(9)

$$\begin{aligned} \frac{\partial trace(XA)}{\partial X}&= A^{\prime }, \end{aligned}$$

(10)

we have that

$$\begin{aligned} \frac{\partial trace(C^{\prime }C\Psi )}{\partial A}&= 0,\\ \frac{\partial trace(Z^{\prime }ZA\Psi A^{\prime })}{\partial A}&\mathop {=}\limits ^{(8)}Z^{\prime }ZA(\Psi + \Psi ^{\prime })\\&\mathop {=}\limits ^{(symmetry)}2Z^{\prime }ZA\Psi ,\\ \frac{\partial -2trace(A\Psi C^{\prime }Z)}{\partial A}&\mathop {=}\limits ^{(10)}{-}2Z^{\prime }C\Psi ^{\prime }, \\ \frac{\partial \lambda _{1} trace(A^{\prime } (\Delta _{d}^{U^{\prime }}\Delta _{d}^{U}\otimes I_{q}) A)}{\partial A}&\mathop {=}\limits ^{(9)}2\lambda _{1}\left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U} \otimes I_{q}\right) A, \\ \frac{\partial \lambda _{2} trace(A^{\prime } (I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}) A)}{\partial A}&\mathop {=}\limits ^{(9)}2\lambda _{2}\left( I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) A, \\ \frac{\partial \lambda _{3} trace(A (\Delta _{d}^{T^{\prime }}\Delta _{d}^{T})A^{\prime })}{\partial A}&\mathop {=}\limits ^{(8)}2\lambda _{3}A\left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) . \end{aligned}$$
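The matrix identities (6) and (8) to (10) drive every step of the derivation, and transposition slips are easy to make in such steps. The following pure-Python sketch checks the identities numerically on small random matrices. It verifies the vec-trace identity directly and compares each derivative rule against central finite differences. Here vec stacks columns, and all helper names are hypothetical.

```python
import random


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def vec(a):
    """Column-major vectorization: stack the columns of a."""
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def rand(rows, cols, rng):
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def vec_trace_gap(rng, m=3, n=2, p=4, q=2):
    """|vec(A)'(D kron B)vec(C) - trace(A'BCD')| for random conformable matrices."""
    A, B, C, D = rand(m, n, rng), rand(m, p, rng), rand(p, q, rng), rand(n, q, rng)
    vc = vec(C)
    lhs = sum(u * sum(x * y for x, y in zip(row, vc)) for u, row in zip(vec(A), kron(D, B)))
    rhs = trace(matmul(matmul(matmul(transpose(A), B), C), transpose(D)))
    return abs(lhs - rhs)


def numerical_gradient(f, x, eps=1e-6):
    """Central finite differences of a scalar function of a matrix."""
    grad = [[0.0] * len(x[0]) for _ in x]
    for i in range(len(x)):
        for j in range(len(x[0])):
            xp = [row[:] for row in x]
            xm = [row[:] for row in x]
            xp[i][j] += eps
            xm[i][j] -= eps
            grad[i][j] = (f(xp) - f(xm)) / (2 * eps)
    return grad


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


def gradient_gaps(rng, r=3, s=4):
    """Largest finite-difference discrepancy for each of the rules (8), (9) and (10)."""
    X, Y, M, N = rand(r, s, rng), rand(s, r, rng), rand(s, s, rng), rand(s, r, rng)
    m_sym = [[M[i][j] + M[j][i] for j in range(s)] for i in range(s)]  # M + M'
    # (8): d trace(X M X') / dX = X (M + M')
    g8 = max_abs_diff(numerical_gradient(lambda x: trace(matmul(matmul(x, M), transpose(x))), X),
                      matmul(X, m_sym))
    # (9): d trace(Y' M Y) / dY = (M + M') Y
    g9 = max_abs_diff(numerical_gradient(lambda y: trace(matmul(matmul(transpose(y), M), y)), Y),
                      matmul(m_sym, Y))
    # (10): d trace(X N) / dX = N'
    g10 = max_abs_diff(numerical_gradient(lambda x: trace(matmul(x, N)), X), transpose(N))
    return g8, g9, g10


rng = random.Random(2016)
print(vec_trace_gap(rng), gradient_gaps(rng))
```

All reported gaps should be at the level of floating-point rounding. Because the functions involved are at most quadratic, central differences carry no truncation error here.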

Then, A satisfies the matrix system of linear equations given by

$$\begin{aligned} Z^{\prime }ZA\Psi + \lambda _{1}\left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U} \otimes I_{q}\right) A + \lambda _{2}\left( I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) A +\lambda _{3}A \left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) = Z^{\prime }C\Psi ^{\prime }. \end{aligned}$$

(11)

To solve for A, the matrix equation (11) is vectorized using the identity $vec(XYW)=(W^{\prime }\otimes X)\,vec(Y)$, where vec stacks the columns of a matrix (this identity underlies the first property in (6)). Since $\varPsi $ and the matrices $\Delta _{d}^{\prime }\Delta _{d}$ are symmetric, each term becomes

$$\begin{aligned} vec(Z^{\prime }ZA\varPsi )&= (\varPsi \otimes (Z^{\prime }Z))\,vec(A),\\ vec\left( \lambda _{1}\left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U} \otimes I_{q}\right) A I_{r}\right)&= \lambda _{1}\left( I_{r}\otimes \Delta _{d}^{U^{\prime }}\Delta _{d}^{U} \otimes I_{q}\right) vec(A),\\ vec\left( \lambda _{2}\left( I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) A I_{r}\right)&= \lambda _{2}\left( I_{r}\otimes I_{p} \otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) vec(A),\\ vec\left( \lambda _{3}\left( I_{p} \otimes I_{q}\right) A \left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) \right)&= \lambda _{3}\left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\otimes I_{p} \otimes I_{q}\right) vec(A). \end{aligned}$$

Then, Eq. (11) can be rewritten as follows

$$\begin{aligned} \left[ \varPsi \otimes (Z^{\prime }Z) + PEN_{d}^{U,V,T}\right] vec(A)= vec(Z^{\prime }C\varPsi ^{\prime }), \end{aligned}$$

where $PEN_{d}^{U,V,T}$ is the P-spline penalty of Eilers et al. (2006), written here for the column-stacked vector $vec(A)$:

$$\begin{aligned} PEN_{d}^{U,V,T}&= \lambda _{1}\left( I_{r}\otimes \Delta _{d}^{U^{\prime }}\Delta _{d}^{U}\otimes I_{q}\right) + \lambda _{2}\left( I_{r}\otimes I_{p}\otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\right) \\&\quad + \lambda _{3}\left( \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\otimes I_{p}\otimes I_{q}\right) . \end{aligned}$$

If the rows of A are stacked instead (that is, $vec(A^{\prime })$ is taken as the unknown), the same penalty takes the form $\lambda _{1}(\Delta _{d}^{U^{\prime }}\Delta _{d}^{U}\otimes I_{q}\otimes I_{r}) + \lambda _{2}(I_{p}\otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V}\otimes I_{r}) + \lambda _{3}(I_{p}\otimes I_{q}\otimes \Delta _{d}^{T^{\prime }}\Delta _{d}^{T})$, and the system reads $[(Z^{\prime }Z)\otimes \varPsi + PEN_{d}^{U,V,T}]\,vec(A^{\prime }) = vec(\varPsi C^{\prime }Z)$.

Finally, provided the matrix in brackets is nonsingular, A is given by

$$\begin{aligned} vec(A)=\left[ \varPsi \otimes (Z^{\prime }Z) + PEN_{d}^{U,V,T}\right] ^{-1} vec(Z^{\prime }C\varPsi ^{\prime }). \end{aligned}$$
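To see the closed-form solution at work, the sketch below assembles and solves the penalized system at toy dimensions and then checks that the resulting A satisfies the matrix equation (11). Everything here is a stand-in. `Z` and `C` are random, `Psi` is a random symmetric positive definite matrix, the sizes and smoothing parameters are arbitrary, and the helper names are hypothetical. The sketch vectorizes A column by column, so the Kronecker factors of the penalty are ordered to match that convention ($I_r\otimes\Delta_d^{U\prime}\Delta_d^{U}\otimes I_q$, and so on). This is the same penalty with the coefficients stacked in a different order. The sketch also forms the Kronecker products densely, which is only sensible at toy sizes; realistic bases would call for exploiting the Kronecker structure instead.

```python
import random


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def kron(a, b):
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def axpy(a, b, s=1.0):
    """Return a + s * b for equal-sized matrices."""
    return [[x + s * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def diff_penalty(n, order=2):
    """Delta_d' Delta_d for the order-d difference matrix of size (n - d) x n."""
    d = identity(n)
    for _ in range(order):
        d = [[d[i + 1][j] - d[i][j] for j in range(n)] for i in range(len(d) - 1)]
    return matmul(transpose(d), d)


def vec(a):
    return [a[i][j] for j in range(len(a[0])) for i in range(len(a))]


def unvec(v, rows, cols):
    return [[v[j * rows + i] for j in range(cols)] for i in range(rows)]


def solve(m, rhs):
    """Gaussian elimination with partial pivoting; m must be square and nonsingular."""
    n = len(m)
    aug = [row[:] + [b] for row, b in zip(m, rhs)]
    for c in range(n):
        piv = max(range(c, n), key=lambda i: abs(aug[i][c]))
        aug[c], aug[piv] = aug[piv], aug[c]
        for i in range(c + 1, n):
            f = aug[i][c] / aug[c][c]
            for k in range(c, n + 1):
                aug[i][k] -= f * aug[c][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (aug[i][n] - sum(aug[i][k] * x[k] for k in range(i + 1, n))) / aug[i][i]
    return x


def fit_coefficients(Z, C, Psi, p, q, lambdas, order=2):
    """Solve [Psi kron Z'Z + PEN] vec(A) = vec(Z'C Psi') with column-major vec(A)."""
    r = len(Psi)
    l1, l2, l3 = lambdas
    system = kron(Psi, matmul(transpose(Z), Z))
    system = axpy(system, kron(identity(r), kron(diff_penalty(p, order), identity(q))), l1)
    system = axpy(system, kron(identity(r), kron(identity(p), diff_penalty(q, order))), l2)
    system = axpy(system, kron(diff_penalty(r, order), identity(p * q)), l3)
    rhs = vec(matmul(matmul(transpose(Z), C), transpose(Psi)))
    return unvec(solve(system, rhs), p * q, r)


def matrix_equation_gap(Z, C, Psi, A, p, q, lambdas, order=2):
    """Largest entry of |LHS - RHS| in the matrix form (11) of the normal equations."""
    r = len(Psi)
    l1, l2, l3 = lambdas
    lhs = matmul(matmul(matmul(transpose(Z), Z), A), Psi)
    lhs = axpy(lhs, matmul(kron(diff_penalty(p, order), identity(q)), A), l1)
    lhs = axpy(lhs, matmul(kron(identity(p), diff_penalty(q, order)), A), l2)
    lhs = axpy(lhs, matmul(A, diff_penalty(r, order)), l3)
    rhs = matmul(matmul(transpose(Z), C), transpose(Psi))
    return max(abs(x - y) for ra, rb in zip(lhs, rhs) for x, y in zip(ra, rb))


rng = random.Random(1)
p, q, r, n = 4, 4, 5, 20  # toy sizes, far smaller than in a real application
Z = [[rng.random() for _ in range(p * q)] for _ in range(n)]     # stand-in spatial basis matrix
C = [[rng.gauss(0.0, 1.0) for _ in range(r)] for _ in range(n)]  # stand-in curve coefficients
G = [[rng.random() for _ in range(r)] for _ in range(r)]
Psi = axpy(matmul(transpose(G), G), identity(r))  # symmetric positive definite stand-in
lambdas = (0.5, 0.5, 0.1)
A = fit_coefficients(Z, C, Psi, p, q, lambdas)
print(matrix_equation_gap(Z, C, Psi, A, p, q, lambdas))
```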


About this article

Cite this article

Aguilera-Morillo, M.C., Durbán, M. & Aguilera, A.M. Prediction of functional data with spatial dependence: a penalized approach. Stoch Environ Res Risk Assess 31, 7–22 (2017). https://doi.org/10.1007/s00477-016-1216-8


Published: 05 February 2016
Issue Date: January 2017
DOI: https://doi.org/10.1007/s00477-016-1216-8
