1 Introduction

Functional data analysis (FDA) is currently a very active area of statistical research from both the theoretical and the applied viewpoint. Functional data are sets of functions obtained as independent realizations of a functional random variable that takes values in a function space defined on a continuous domain. In most cases functional data are curves that describe the evolution of a scalar variable over time, but surfaces representing the variation of a scalar variable over space can also be seen as functional data. The name FDA is due to the pioneering book by Ramsay and Silverman (1997), which contains an excellent collection of the main FDA methodologies and interesting motivating examples. Since then FDA has developed rapidly, with more than 8000 references in Google Scholar during the twenty-first century. Although the key tool in FDA is still principal component analysis, other multivariate techniques such as canonical correlation, discriminant and cluster analysis have also been investigated. More recently, research on FDA has turned to regression models, non-parametric estimation, robust estimation, Bayesian estimation, inference and so on. We can say that almost any statistical method is being extended to the analysis of functional data. At the same time, new books on these FDA topics have appeared (Ferraty and Vieu 2006; Horvath and Kokoszka 2012; Hsing and Eubank 2015; Shi and Choi 2011; Zhang 2013).

This work focuses on the analysis of univariate functional data with spatial dependence. The sample information is given by a set of curves associated with different geographical locations in a spatial domain. As an illustration, let us consider the Canadian temperature data set introduced by Ramsay and Silverman (1997), which has been analyzed in many papers using different FDA methodologies. These data are the daily temperatures (averaged over 30 years) for 35 weather stations in Canada, so that we have a set of 35 spatially correlated curves that represent the temporal evolution of temperature at different geographical sites (spatio-temporal functional variable). The raw data set, together with a map of the geographical locations, is shown in Fig. 1. Spatially correlated functional data are very common in environmental applications and are analyzed in many cases by using FDA approaches that do not take the spatial dependence structure into account. Some interesting applications of this type can be seen in Escabias et al. (2005), Zhang and Chen (2007) and Kaufman and Sain (2010).

Fig. 1 Averages (over 30 years) of daily temperature curves observed at 35 Canadian Maritime weather stations

The aim of this work is to develop a method for modeling the mean of a spatio-temporal functional variable from its discrete observations at a finite set of time points and locations in the temporal and spatial domains, respectively. The estimated model will provide the prediction of the curves of temporal evolution of the variable at unobserved locations and also the prediction of the surface of geographical evolution of the variable at unobserved time points.

This problem has been approached by different authors in the context of geostatistical techniques. The first notions about this topic can be found in Goulard and Voltz (1993), where multivariate approaches were used to predict curves at unsampled spatial sites. A more recent collection of geostatistical tools for spatial functional data can be seen in Giraldo (2010) and Delicado et al. (2009). In general, the most widely used technique to predict functional data with spatial dependence is functional kriging. In Giraldo et al. (2010) a continuous time-varying kriging was proposed and applied to environmental data. A formal version of ordinary kriging for functional data (OKFD) was developed by Giraldo et al. (2011), and implemented in the R package geofd (Giraldo et al. 2012). Recently, different versions of the universal kriging predictor for functional data with spatial dependence were proposed in Menafoglio (2013) and Caballero et al. (2013). Kriging with external drift has also been extended to introduce exogenous variables with spatially correlated functional data (Ignaccolo et al. 2014).

In the context of spatial data, an alternative to geostatistical techniques is given by nonparametric spatial regression models. A popular approach consists of using penalized splines (P-splines; Eilers and Marx 1996). They are based on the use of a rich basis for regression and a penalty (based on differences of adjacent coefficients) to control the smoothness of the fit. This methodology has been successfully applied to both functional and spatial data in different contexts. In FDA, P-splines were used for smoothing the sample curves (Aguilera and Aguilera-Morillo 2013a) and for estimating different FDA models such as PCA (Aguilera and Aguilera-Morillo 2013b) or functional regression (Marx and Eilers 1999; Aguilera-Morillo et al. 2013), among others. In the case of spatial data, Lee and Durban (2009) and Ugarte et al. (2009) used P-splines for smoothing spatially correlated count data, Lee and Durban (2011) extended their use to the case of spatio-temporal data, and more recently, Sangalli et al. (2013) proposed a spatial regression model for data distributed over irregularly shaped spatial domains. A wavelet regression approach for estimating the field of ocean temperature at different depths is introduced in Fernández-Pascual (2015). On the other hand, functional approaches based on autoregressive Hilbertian processes were considered in Ruiz-Medina et al. (2012, 2014).

Univariate kriging and spline smoothers have been compared in several papers without reaching a unanimous conclusion. The major objection to kriging is the assumption of stationarity, which may not hold for some types of spatial structure. Based on simulations in which spline regression predicts better than kriging when the data contain trends of various types, some papers conclude that non-parametric regression is more robust than kriging because it takes into account spatial structure that geostatistics does not (Yakowitz and Szidarovsky 1985). Other papers conclude that kriging never performs worse than splines and has the potential to outpredict splines when the data are not sampled on a grid (Dubrule 1984; Laslett 1994).

Our aim is to use spatial smoothing regression techniques within a functional regression approach to provide a new method to predict functional data with spatial dependence at unsampled locations. Starting from the formal definition of spatio-temporal functional data, which is given in Sect. 2, a penalized functional regression model is extended for predicting spatially correlated functional data in Sect. 2.1. The idea is to consider the functional regression model for a functional response and scalar covariates (Faraway 1997; Ramsay and Silverman 1997; Chiou et al. 2004), using the spatial information as regressors. Thus, combining the functional regression model for a functional response with penalized spline spatial regression yields the proposed functional spatial regression model. In practice, functional data are usually observed with some error or noise. To overcome this problem, Ramsay and Silverman (1997) considered a penalized version of functional regression for a functional response by introducing a continuous penalty (based on the squared second-order derivatives of the parameter functions) in the least squares fitting, and Reiss (2010) used a penalized generalized least squares criterion based on a basis representation. In this paper, we adapt the idea developed in Eilers et al. (2006) and combine the two-dimensional penalty used for spline spatial regression with the one proposed in Ramsay and Silverman (1997) to obtain a three-dimensional P-spline penalty. Hereinafter, this method will be called the penalized functional spatial regression model (PFSRM).

Finally, the prediction accuracy of PFSRM is compared with that of OKFD in two simulation studies in Sect. 4. An application to the Canadian Maritime weather data is presented in Sect. 5. As mentioned above, the Canadian Maritime weather data are a well-known example of functional data, which in most cases have been considered as a set of independent curves related to daily temperature and precipitation at 35 different locations in Canada averaged over 1960–1994 (Ramsay and Silverman 1997). However, this is a clear example of functional data with spatial dependence, and it was studied as such in Delicado et al. (2009) and Menafoglio (2013). The conclusions drawn from these studies close the paper in Sect. 6.

2 Theoretic framework

Let us suppose that we have a sample of non-independent curves (spatial dependence) \(\{y_{i} (t){\text {:}}\,t\in T,\, i=1,\ldots ,n\}\) given by

$$\begin{aligned} y_{i} (t) = x\left( s_{i},\,t\right) + \epsilon _{i} (t), \quad i=1,\ldots ,n, \end{aligned}$$

where \(\epsilon _{i}(t)\) are zero mean random errors and \(x(s_{i},\,t)\) are observations of a spatial functional variable (stochastic process)

$$\begin{aligned} \left\{ X(s,\,t){\text {:}}\,s \in S \subseteq {\mathbb {R}}^{2}, \,t\in T \subseteq {\mathbb {R}} \right\} , \end{aligned}$$
(1)

where \(s=(u,\,v)\) is a generic data location in the spatial domain \(S=U\times V,\) with \(U,\,V\) and \(T\) being real intervals, and, for each fixed spatio-temporal position \((s,\,t),\) \(X (s,\,t)\) is a real random variable defined on a probability space \((\varOmega ,\,{\mathcal {A}},\,P).\)

In addition, these sample curves have been observed with error at a finite set of time points \(\{ t_{j}{\text {:}}\,j=1,\ldots ,m \}\) for each geographical location \(s_{i} = (u_{i},\,v_{i}),\) so that the sample data \(y_{ij}\) are given by \(y_{ij} = y_{i} (t_{j}), \, i=1,\ldots ,n;\,j=1,\ldots ,m.\)

Let us also assume that the realizations of this functional variable are square integrable functions on the spatio-temporal domain \(U\times V \times T,\) so that each sample function \(x(s,\,t)\) belongs to the Hilbert space \(L^{2}( U\times V \times T )\) with the usual scalar product given by

$$\begin{aligned} \langle f,\,g\rangle =\int _{U} \int _{V} \int _{T} f( u,\, v,\, t) g( u,\, v,\, t)du dv dt,\quad \forall f,\,g\in L^{2}( U\times V \times T). \end{aligned}$$

In order to reconstruct the true functional form of the data from discrete spatio-temporal observations, we extend the usual basis expansion approach for representing curves in FDA to the case of spatio-temporal functions that depend on three continuous arguments.

Let us consider three univariate bases \(\{ \phi _{k}^{U} ( u){\text {:}}\,u\in U;\, k=1,\ldots ,p\},\,\{ \phi _{l}^{V}( v){\text {:}}\, v\in V;\, l=1,\ldots ,q\}\) and \(\{ \phi _{h}^{T} ( t){\text {:}}\, t\in T;\, h=1,\ldots ,r \}.\) Then, we assume that the realizations of the spatio-temporal functional variable belong to the \(pqr\)-dimensional tensor product function space generated by the basis

$$\begin{aligned} \left\{ \phi _{k}^{U} (u) \phi _{l}^{V} (v) \phi _{h}^{T} (t){\text {:}}\, k=1,\ldots ,p;\,l=1,\ldots ,q;\,h=1,\ldots ,r \right\} . \end{aligned}$$

That is,

$$\begin{aligned} x (s,\,t) = \sum _{k=1}^{p} \sum _{l=1}^{q} \sum _{h=1}^{r} a_{klh} \phi ^{U}_{k}(u) \phi ^{V}_{l}(v) \phi ^{T}_{h}(t). \end{aligned}$$
(2)

Then, the matrix \(X =(x_{ij})_{n\times m}\) whose entries are the values of the spatio-temporal functional variable at the sampling points given by \(x_{ij} = x(s_{i},\,t_{j})\) can be written in matrix form as

$$\begin{aligned} X = \left( \varPhi ^{U} \odot \varPhi ^{V} \right) A \varPhi ^{T^{\prime }}, \end{aligned}$$
(3)

where \(\varPhi ^{U} = (\varPhi ^{U}_{ik})_{n\times p}\) with \(\varPhi ^{U}_{ik} = \phi ^{U}_{k} (u_{i}),\,\varPhi ^{V} = (\varPhi ^{V}_{il})_{n\times q}\) with \(\varPhi ^{V}_{il} = \phi ^{V}_{l}(v_{i}),\,\varPhi ^{T} = (\varPhi ^{T}_{jh})_{m\times r}\) with \(\varPhi ^{T}_{jh} = \phi ^{T}_{h}(t_{j}),\,A=(a_{(kl)h})_{p q \times r}\) is the matrix comprising the basis coefficients and \(\odot\) denotes the row-wise Khatri–Rao product so that \(\varPhi ^{U} \odot \varPhi ^{V} = (( \varPhi ^{U} \odot \varPhi ^{V})_{i(kl)})_{n\times pq}\) with entries \(( \varPhi ^{U} \odot \varPhi ^{V})_{i(kl)} = \phi ^{U}_{k} (u_{i}) \phi ^{V}_{l} (v_{i})\) (Rao and Rao 1998).

Once the basis coefficients in A are estimated from the discrete observations \(y_{ij},\) the spatio-temporal functional variable can be estimated at unobserved locations and times \((s_{0},\,t_{0})\) by replacing in model (2). This way, we can obtain the complete curve of temporal evolution of the variable for unsampled geographical locations, and the complete surface of spatial evolution of the variable for any time point in the temporal domain.
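As an illustration of Eq. (3), the following minimal sketch in Python with NumPy builds the row-wise Khatri–Rao product and checks that the matrix form reproduces the triple sum in Eq. (2). The function name `rowwise_khatri_rao` and the small numeric matrices are hypothetical placeholders (they are not B-spline evaluations); the column ordering, with the index k varying slowest within the pair (kl), is an assumption of the sketch.

```python
import numpy as np

def rowwise_khatri_rao(phi_u, phi_v):
    """Row-wise Khatri-Rao product: row i equals kron(phi_u[i], phi_v[i])."""
    n, p = phi_u.shape
    n_v, q = phi_v.shape
    if n != n_v:
        raise ValueError("both matrices need one row per spatial location")
    # Column (k, l) is stored at position k * q + l, so k varies slowest.
    return (phi_u[:, :, None] * phi_v[:, None, :]).reshape(n, p * q)

# Toy basis evaluations (placeholders): n = 2 sites, p = 2, q = 3, r = 2, m = 4.
phi_u = np.array([[1.0, 2.0], [3.0, 4.0]])
phi_v = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]])
phi_t = np.arange(8.0).reshape(4, 2)
A = np.arange(12.0).reshape(6, 2)          # coefficients a_(kl)h, shape (p*q, r)

Z = rowwise_khatri_rao(phi_u, phi_v)
X = Z @ A @ phi_t.T                        # Eq. (3): n x m matrix of values x(s_i, t_j)

# The same values computed from the triple sum in Eq. (2)
A3 = A.reshape(2, 3, 2)                    # a_klh as a three-way array
X_sum = np.einsum("ik,il,jh,klh->ij", phi_u, phi_v, phi_t, A3)
print(np.allclose(X, X_sum))
```

Evaluating the three bases at a new location and time, instead of at the sampling points, gives the prediction at \((s_{0},\,t_{0})\) with the same matrix product.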

2.1 Penalized functional spatial regression model

In this work we propose to estimate the basis coefficients in Eq. (2) by introducing the spatial variability through the following functional spatial regression model:

$$\begin{aligned} y(t)= Z \alpha (t) + \epsilon (t), \quad \forall t \in T, \end{aligned}$$
(4)

where \(y(t)=(y_{1}(t),\ldots , y_{n}(t))^{\prime }\) is the vector of response functions, \(Z=(z_{ik})_{n\times p q}=\varPhi ^{U} \odot \varPhi ^{V}\) is the two dimensional B-spline basis for the geographical position, \(\alpha (t)=(\alpha _{1}(t),\ldots ,\alpha _{p q}(t))^{\prime }\) is the vector of parameter functions to be estimated and \(\epsilon (t)=(\epsilon _{1}(t),\ldots ,\epsilon _{n}(t))^{\prime }\) the vector of error terms.

Let us consider the basis representation for the functional response \(y(t)=C\phi ^{T}(t)\) and for the functional parameters \(\alpha (t)=A\phi ^{T} (t),\) with \(C=(c_{ih})_{n\times r}\) and \(A=(a_{(kl)h})_{p q \times r}\) being the corresponding matrices of basis coefficients and \(\phi ^{T}(t) =(\phi _{1}^{T} (t),\ldots ,\phi _{r}^{T} (t))^{\prime }\) being the vector of basis functions. Then, the model given in Eq.  (4) can be rewritten as follows

$$\begin{aligned} C \phi ^{T} (t)= Z A\phi ^{T} (t) + \epsilon (t), \quad \forall t \in T. \end{aligned}$$

In order to estimate this model in an accurate way, a roughness penalty is introduced in the least squares fitting criterion, so that

$$\begin{aligned} PSSE(y,\,\alpha )&= \int \left( C\phi ^{T} (t) - ZA\phi ^{T} (t)\right) ^{\prime } \left( C\phi ^{T} (t) -ZA\phi ^{T} (t)\right) dt \\&\quad + vec(A)^{\prime }\left[ PEN_{d}^{U,V,T}\right] vec(A), \end{aligned}$$
(5)

where the operator vec(A) creates a column vector from any matrix A by stacking the column vectors of A, and \(PEN_{d}^{U,V,T}\) denotes the d-order P-spline penalty for the space and time. This penalty can be expressed in terms of d-order difference operators \(\Delta _{d}\) (Eilers et al. 2006), so that

$$\begin{aligned} PEN_{d}^{U,V,T}&= \lambda _{1}\left( \Delta _{d}^{U^{\prime }}\Delta _{d}^{U}\otimes I_{q}\otimes I_{r}\right) + \lambda _{2}\left( I_{p}\otimes \Delta _{d}^{V^{\prime }}\Delta _{d}^{V} \otimes I_{r}\right) \\&\quad + \lambda _{3}\left( I_{p}\otimes I_{q} \otimes \Delta _{d}^{T^{\prime }}\Delta _{d}^{T}\right) . \end{aligned}$$

In this context, \(\Delta _{d}^{U},\,\Delta _{d}^{V},\, \Delta _{d}^{T}\) are matrices of d-order differences, and \(\lambda _{1},\,\lambda _{2},\) and \(\lambda _{3}\) are the smoothing parameters.
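The penalty matrix can be assembled directly from difference matrices and Kronecker products. The sketch below, in Python with NumPy, follows the factor order of the displayed penalty, which corresponds to a coefficient vector indexed by (k, l, h) with the temporal index h varying fastest; this ordering and the helper names `diff_gram` and `pspline_penalty` are assumptions made for illustration.

```python
import numpy as np

def diff_gram(size, order):
    """Return D'D, where D takes order-th differences of `size` adjacent coefficients."""
    D = np.diff(np.eye(size), n=order, axis=0)
    return D.T @ D

def pspline_penalty(p, q, r, lambdas, order=2):
    """Three-term P-spline penalty for coefficients indexed (k, l, h), h fastest."""
    lam1, lam2, lam3 = lambdas
    I_p, I_q, I_r = np.eye(p), np.eye(q), np.eye(r)
    pen_u = np.kron(np.kron(diff_gram(p, order), I_q), I_r)
    pen_v = np.kron(np.kron(I_p, diff_gram(q, order)), I_r)
    pen_t = np.kron(np.kron(I_p, I_q), diff_gram(r, order))
    return lam1 * pen_u + lam2 * pen_v + lam3 * pen_t

p, q, r = 4, 3, 5                         # small illustrative dimensions
P = pspline_penalty(p, q, r, lambdas=(1.0, 2.0, 3.0))

# Coefficients that are linear in each index have zero second differences,
# so a second-order penalty leaves them unpenalized.
k, l, h = np.meshgrid(np.arange(p), np.arange(q), np.arange(r), indexing="ij")
a_linear = (1.0 + 2.0 * k - l + 0.5 * h).ravel()
print(P.shape, float(a_linear @ P @ a_linear))
```

With a second-order penalty, large smoothing parameters therefore shrink the fit towards coefficients that vary linearly along the corresponding direction rather than towards zero.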

Interchanging the integration and summation operations implied by the matrix products, and setting to zero the derivatives with respect to A of the resulting expression (see Appendix for further details), the estimator of A is given by

$$\begin{aligned} vec(A)=\left[ \varPsi \otimes (Z^{\prime }Z) + PEN_{d}^{U,V,T}\right] ^{-1} vec(Z^{\prime }C\varPsi ^{\prime }), \end{aligned}$$

where \(\varPsi =\int _{T} \phi ^{T}(t)\, \phi ^{T}(t)^{\prime }\, dt\) is the \(r\times r\) matrix of inner products between the temporal basis functions.
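A minimal numerical sketch of this estimator, in Python with NumPy, is given next. It is an illustration under stated assumptions rather than the implementation used in the paper: the inputs are random placeholder matrices, the penalty is a ridge-like stand-in, and A is flattened row by row (temporal index fastest), so that the Kronecker factors appear as \((Z^{\prime }Z)\otimes \varPsi\) instead of \(\varPsi \otimes (Z^{\prime }Z)\). The two orderings give the same matrix A; only the arrangement of the coefficient vector differs.

```python
import numpy as np

def fit_coefficients(Z, C, Psi, penalty):
    """Solve the penalized normal equations for A (shape pq x r).

    A is flattened row by row (time index fastest), so the system matrix is
    kron(Z'Z, Psi) + penalty; with column stacking the factors would be swapped.
    """
    pq, r = Z.shape[1], C.shape[1]
    lhs = np.kron(Z.T @ Z, Psi) + penalty
    rhs = (Z.T @ C @ Psi).ravel()
    return np.linalg.solve(lhs, rhs).reshape(pq, r)

# Placeholder inputs (random matrices, not actual spline evaluations).
rng = np.random.default_rng(0)
n, pq, r = 30, 6, 4
Z = rng.normal(size=(n, pq))
B = rng.normal(size=(50, r))
Psi = B.T @ B / 50                        # symmetric positive definite stand-in for Psi
A_true = rng.normal(size=(pq, r))
C = Z @ A_true                            # noise-free response coefficients

# Without penalty the true coefficients are recovered; a penalty shrinks them.
A_unpen = fit_coefficients(Z, C, Psi, np.zeros((pq * r, pq * r)))
A_pen = fit_coefficients(Z, C, Psi, 0.5 * np.eye(pq * r))   # ridge-like stand-in penalty
print(np.allclose(A_unpen, A_true))
```

For the dimensions used in practice, solving the linear system directly, as done here, avoids forming the explicit inverse.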

3 Selection of parameters

The three smoothing parameters involved in this problem \((\lambda _{1},\,\lambda _{2},\,\lambda _{3})\) are simultaneously selected by minimizing the following generalized cross validation error

$$\begin{aligned} GCVE\left( \lambda _{1},\,\lambda _{2},\,\lambda _{3}\right) = \displaystyle \frac{\sum _{i=1}^{n} SSE_{i}}{(n-trace(H))^{2}}, \end{aligned}$$

where

$$\begin{aligned} SSE_{i}=\displaystyle \sum _{j=1}^{m}\left( y\left( s_{i},\,t_{j}\right) -{\hat{y}}\left( s_{i},\,t_{j}\right) \right) ^{2}, \end{aligned}$$

and

$$\begin{aligned} H= \left( \varPhi ^{T}\otimes Z\right) \left[ \varPsi \otimes (Z^{\prime }Z) +PEN_{d}^{U,V,T}\right] ^{-1}\left( \varPhi ^{T^{\prime }}\otimes Z^{\prime }\right) , \end{aligned}$$

with \(PEN_{d}^{U,V,T}\) being the three-dimensional P-spline penalty of order d described above.

Minimization of the GCVE can become computationally demanding in this case, since three smoothing parameters must be searched over. To reduce the computational burden, instead of using an optimization routine, we defined a three-dimensional array of candidate values and performed a grid search. We also checked the performance of other criteria such as AIC and BIC, and found that, in this case, BIC tended to oversmooth the spatial component of the model, while AIC performed as well as GCVE.
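The grid search can be sketched as follows in Python with NumPy. This is a simplified illustration, not the exact criterion above: it fits the discrete observations directly, takes \(\varPsi = \varPhi ^{T^{\prime }}\varPhi ^{T}\) so that H is the hat matrix of that discrete fit, and uses the total number of observations nm in the denominator. These choices, the toy data, the candidate grid and all function names are assumptions of the sketch.

```python
import itertools
import numpy as np

def bspline_basis(x, n_basis, lo, hi, degree=3):
    """Equally spaced B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    n_seg = n_basis - degree
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    x = np.asarray(x, dtype=float)[:, None]
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (d * h)
        right = (knots[d + 1:] - x) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

def diff_gram(size, order=2):
    D = np.diff(np.eye(size), n=order, axis=0)
    return D.T @ D

def gcv_score(Y, Z, Phi_t, dims, lambdas):
    """GCV-type score for one triple of smoothing parameters (time index fastest)."""
    p, q, r = dims
    lam1, lam2, lam3 = lambdas
    I_p, I_q, I_r = np.eye(p), np.eye(q), np.eye(r)
    pen = (lam1 * np.kron(np.kron(diff_gram(p), I_q), I_r)
           + lam2 * np.kron(np.kron(I_p, diff_gram(q)), I_r)
           + lam3 * np.kron(np.kron(I_p, I_q), diff_gram(r)))
    gram = np.kron(Z.T @ Z, Phi_t.T @ Phi_t)
    lhs = gram + pen
    A = np.linalg.solve(lhs, (Z.T @ Y @ Phi_t).ravel()).reshape(p * q, r)
    sse = np.sum((Y - Z @ A @ Phi_t.T) ** 2)
    trace_h = np.trace(np.linalg.solve(lhs, gram))   # trace(H) without forming H
    return sse / (Y.size - trace_h) ** 2

# Toy data: 64 sites on an 8 x 8 grid, 30 time points (not data from the paper).
rng = np.random.default_rng(1)
g = np.linspace(0.0, 1.0, 8)
u, v = (c.ravel() for c in np.meshgrid(g, g, indexing="ij"))
t = np.linspace(0.0, 1.0, 30)
dims = (5, 5, 6)
Z = np.einsum("ik,il->ikl", bspline_basis(u, 5, 0, 1),
              bspline_basis(v, 5, 0, 1)).reshape(64, 25)
Phi_t = bspline_basis(t, 6, 0, 1)
signal = np.outer(u + v, np.sin(2 * np.pi * t))
Y = signal + rng.normal(scale=0.1, size=signal.shape)

grid = [0.01, 1.0, 100.0]                 # candidate values for each lambda
scores = {lam: gcv_score(Y, Z, Phi_t, dims, lam)
          for lam in itertools.product(grid, repeat=3)}
best = min(scores, key=scores.get)
print(best)
```

Computing the trace through a linear solve on the pqr × pqr system avoids building the nm × nm hat matrix, which is the main cost saving in this sketch.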

On the other hand, the dimension of the basis in each of the three spatio-temporal directions must also be selected. Since the degree of smoothing is controlled by the smoothing parameter, the number and location of the knots are not crucial for fitting a P-spline. Generally, the knots of a P-spline are equally spaced, and the number of knots must be large enough to fit the data but not so large that computation time becomes unnecessarily high. Two algorithms for the automatic selection of the number of knots by generalized cross validation were considered in Ruppert (2002). In general, authors select the dimension of the basis following the rule considered by Ruppert (2002), which proposes using approximately one knot for every five observation points.

4 Simulation studies

In order to assess the performance of the proposed PFSRM, two different simulation studies have been carried out. The first one considers spatial locations on a grid that are not equally spaced, together with independent random errors. The second one considers irregularly located spatial sites, with random errors added in both dimensions (space and time) through a spatio-temporal Gaussian process. In addition, the results are compared with a powerful geostatistical predictor, the OKFD developed by Giraldo et al. (2011). Let us observe that for both methods, PFSRM and OKFD, the first step is to approximate the true functional form of the sample curves in terms of basis functions. In this paper we use regression splines in terms of cubic B-spline basis functions.
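This first step, the basis representation of each sample curve by regression splines, can be sketched in Python with NumPy as an unpenalized least squares fit on a cubic B-spline basis with equally spaced knots. The helper `bspline_basis`, the toy sine curve and the noise level are assumptions for illustration and do not reproduce the data of the paper.

```python
import numpy as np

def bspline_basis(x, n_basis, lo, hi, degree=3):
    """Equally spaced B-spline basis on [lo, hi] via the Cox-de Boor recursion."""
    n_seg = n_basis - degree
    h = (hi - lo) / n_seg
    knots = lo + h * np.arange(-degree, n_seg + degree + 1)
    x = np.asarray(x, dtype=float)[:, None]
    B = ((x >= knots[:-1]) & (x < knots[1:])).astype(float)
    for d in range(1, degree + 1):
        left = (x - knots[:-d - 1]) / (d * h)
        right = (knots[d + 1:] - x) / (d * h)
        B = left * B[:, :-1] + right * B[:, 1:]
    return B

t = np.arange(1, 366, dtype=float)              # 365 daily time points
Phi = bspline_basis(t, n_basis=15, lo=1.0, hi=365.0)

rng = np.random.default_rng(3)
smooth = np.sin(2 * np.pi * t / 365.0)          # toy curve, not data from the paper
noisy = smooth + rng.normal(scale=0.3, size=t.size)

# Least squares coefficients, one row of C per curve, so that Y is close to C Phi'
Y = np.vstack([smooth, noisy])
C = np.linalg.lstsq(Phi, Y.T, rcond=None)[0].T  # shape (2, 15)
fitted = C @ Phi.T
print(C.shape, float(np.max(np.abs(fitted[0] - smooth))))
```

The rows of C play the role of the coefficient matrix of the sample curves that enters the second step of either method.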

For each method, a leave-one-out cross validation procedure is considered to predict each curve at each spatial location. The integrated squared error of prediction, with respect to the original data, can be computed as

$$\begin{aligned} ISE_{i} = \int _{T} \left( x\left( s_{i},\,t\right) -{\hat{y}}^{(-i)}\left( s_{i},\,t\right) \right) ^{2} dt, \quad i=1,\ldots ,n, \end{aligned}$$

with \({\hat{y}}^{(-i)}(s_{i},\,t)\) being the predicted curve at location \(s_{i}\) when the observation \(y(s_{i},\,t)\) is not in the sample.
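The leave-one-out computation of the ISE can be sketched as follows in Python with NumPy, approximating the integral by the trapezoidal rule on the observation grid. The predictor passed to the loop is a hypothetical callable; the one used in the example simply returns the mean of the training curves and stands in for PFSRM or OKFD, which are not implemented here.

```python
import numpy as np

def integrated_squared_error(t, x_true, y_pred):
    """Trapezoidal approximation of the integral of (x_true - y_pred)^2 over t."""
    sq = (np.asarray(x_true, dtype=float) - np.asarray(y_pred, dtype=float)) ** 2
    return float(np.sum(0.5 * np.diff(t) * (sq[1:] + sq[:-1])))

def leave_one_out_ise(t, sites, x_true, y_obs, predict):
    """ISE_i for each site i, predicting curve i from all the other curves."""
    n = len(sites)
    ise = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        y_hat = predict(sites[keep], y_obs[keep], sites[i])
        ise[i] = integrated_squared_error(t, x_true[i], y_hat)
    return ise

def mean_curve_predictor(train_sites, train_curves, new_site):
    """Placeholder predictor: ignores location and returns the training mean curve."""
    return train_curves.mean(axis=0)

# Tiny example with three sites and curves known in closed form.
t = np.linspace(0.0, 1.0, 101)
sites = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
x_true = np.vstack([t, t, t + 1.0])
ise = leave_one_out_ise(t, sites, x_true, x_true, mean_curve_predictor)
print(ise)
```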

4.1 Simulation study I

This simulation study was first considered in Giraldo et al. (2012). In our case, 225 spatial locations were fixed in a grid according to the coordinates \(u=v=({-}20,\,{-}16,\,{-}15,\,{-}10,\,{-}8,\,{-}5,\,{-}1,\, 1,\, 2,\, 6,\, 10,\, 12,\,15,\,16,\,20),\) on which a set of spatially correlated functional data were simulated at 365 equally spaced time points according to the model

$$\begin{aligned} Y\left( s_{i},\,t\right) = \sum _{k=1}^{15}a_{k} \left( s_{i}\right) \phi _{k}(t) + \epsilon \left( s_{i},\,t\right) , \quad i{\text {:}} \,1,\ldots ,225, \end{aligned}$$

where \(\phi (t)=(\phi _{1}(t),\ldots ,\phi _{15}(t))\) is a cubic B-spline basis, and each coefficient \(a_{k}\) is a realization of a Gaussian random field whose covariance structure is defined according to the exponential model \(C(h)=2\exp \left( \frac{{-}h}{8}\right) ,\) where \(h=\Vert s_{i}-s_{j}\Vert ,\,(i,\,j=1,\ldots ,225)\) is the Euclidean distance between two sites \(s_{i}\) and \(s_{j}.\) Finally, \(\epsilon (s_{i},\,t)\) are independent random errors for each t, with \(t=1,\ldots ,365,\) simulated according to a distribution N(0, 0.09). The spatial locations are shown in Fig. 2. The simulated data sets, with and without noise, can be seen in Fig. 3.
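The spatially correlated coefficients of this design can be generated as sketched below in Python with NumPy, using a Cholesky factor of the exponential covariance matrix. The sketch assumes that the 15 coefficient fields are mutually independent, that N(0, 0.09) denotes a variance of 0.09 (standard deviation 0.3), and an arbitrary random seed; none of these details is stated explicitly above.

```python
import numpy as np

coords = np.array([-20, -16, -15, -10, -8, -5, -1, 1, 2, 6, 10, 12, 15, 16, 20],
                  dtype=float)
u, v = np.meshgrid(coords, coords, indexing="ij")
sites = np.column_stack([u.ravel(), v.ravel()])          # 225 x 2 grid locations

# Exponential covariance C(h) = 2 exp(-h / 8) between all pairs of sites
h = np.linalg.norm(sites[:, None, :] - sites[None, :, :], axis=2)
cov = 2.0 * np.exp(-h / 8.0)

rng = np.random.default_rng(2024)                        # arbitrary seed (assumption)
chol = np.linalg.cholesky(cov)                           # cov = chol @ chol.T
# Column k holds one realization of the field a_k at the 225 sites
# (fields assumed independent across k).
coef = chol @ rng.standard_normal((sites.shape[0], 15))

# Independent errors at 365 time points; sd 0.3 assumes 0.09 is a variance.
eps = rng.normal(scale=0.3, size=(sites.shape[0], 365))
print(sites.shape, coef.shape, eps.shape)
```

Multiplying `coef` by the transposed matrix of B-spline basis evaluations at the 365 time points and adding `eps` would then give the simulated curves.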

Fig. 2 Simulation I: spatial locations

Fig. 3 Simulation I: simulated data without noise (left) and with noise (right)

The first step for applying both methods, PFSRM and OKFD, was to approximate the sample curves by using regression splines in terms of a cubic B-spline basis of dimension 15 and considering equally spaced knots. As an example, a sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid black line) are displayed in Fig. 4. In this case, regression splines on 15 basis functions get a perfect approximation to the true data (without noise).

In the model fitting (second step), the two-dimensional basis for the space was built by considering 6 knots for each marginal cubic B-spline basis. Regarding the penalty, a second-order penalty was used. For OKFD, the variograms were selected as a linear combination of nugget and exponential models.

Fig. 4 Simulation I: basis representation of the sample curves by using regression splines in terms of a basis of cubic B-splines of dimension 15 (left). A sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid line) (right)

In order to assess the performance of the two methods, a leave-one-out cross-validation procedure was carried out to obtain the predicted curve at each spatial location treated as unsampled. From the multiple box plot of the distribution of the ISE’s (Fig. 5) and the summary statistics given in Table 1, it can be concluded that PFSRM achieves the lowest values for the mean, the standard deviation and the median of the prediction errors. On the other hand, in Fig. 6 the 225 predicted curves are displayed together with the mean of the predicted curves (red lines) and the point wise confidence bands given by the mean \(\pm\) 2 times the standard deviation.

In order to test whether the performance varies with the spatial location, the predicted curves at two outlying spatial locations (A and C) and at a more central spatial location (B) are displayed in Fig. 7 (locations A–C are highlighted in Fig. 2). We can see that PFSRM provides the predicted curves closest to the shape of the true data, regardless of the spatial location being predicted. It is interesting to highlight that the worst prediction from OKFD was at the most distant location C. Hence, the performance of OKFD is not uniform across the grid.

Finally, in Fig. 8 the residual curves provided by the two compared methods have been displayed. It can be seen that the mean residual curve is zero in both cases, and in some spatial locations OKFD achieves larger residuals than our method.

Table 1 Simulation I: summary of ISE’s from the cross-validation predictions provided by OKFD and PFSRM

Fig. 5 Simulation I: box plot related to the ISE’s of the predictions provided by OKFD and PFSRM

Fig. 6 Simulation I: mean curve of the predictions and point wise confidence bands according to the mean \(\pm\) 2 times the standard deviation of the predicted curves by OKFD (left) and PFSRM (right)

Fig. 7 Simulation I: predicted curves by OKFD in two outlying spatial locations (locations A and C, left and right, respectively) and a more central location B in the middle of the grid

Fig. 8 Simulation I: residuals (grey lines) and the mean curve of the residuals (black line) from OKFD and PFSRM

4.2 Simulation study II

Let us now consider a set of 80 irregularly located spatial sites, which are displayed in Fig. 9, and a set of 100 equally spaced time points in the interval \([0,\,1].\) The idea is to simulate a set of spatially correlated functional data according to the model

$$\begin{aligned} Y\left( s_{i},\,t\right) = \left[ a_{i} f_{1}\left( s_{i},\,t\right) + b_{i} f_{2}\left( s_{i},\,t\right) -0.5\right] \sin \left( c_{i} \pi t - 0.2\right) + \epsilon \left( s_{i},\,t\right) , \quad i=1,\ldots ,80, \end{aligned}$$

where

$$\begin{aligned} f_{1}(s,\,t)=e^{\left( \frac{{-}(u-0.2)^{2}}{5} - \frac{(v-0.5)^{2}}{3} -\frac{(t-0.5)^{2}}{4} - 1\right) }, \end{aligned}$$
$$\begin{aligned} f_{2}(s,\,t)=e^{\left( -\frac{(u-0.3)^{2}}{4}- \frac{(v-0.7)^{2}}{2} -\frac{(t-0.4)^{2}}{6} \right) }, \end{aligned}$$

with \(s=(u,\,v)\) denoting the pair of coordinates of the spatial locations, \(a_{i},\,b_{i},\) and \(c_{i}\) randomly simulated from \(a\sim Uniform[0.5,\,2],\,b\sim Uniform[0.5,\,1],\) and \(c\sim Uniform[1.5,\,2],\) and \(\epsilon (s,\,t)\) being the error term corresponding to an observation of a spatio-temporal Gaussian process defined through a stochastic partial differential equation (Sigrist et al. 2015), with parameters \((\rho _{0}=0.1,\,\sigma ^{2}=0.25,\,\zeta =0.9,\,\rho _{1}=0.1,\, \gamma =2,\,\alpha =\pi /4,\,\mu _{x}=0.2,\,\mu _{y}={-}0.2,\, \tau ^{2}=0.01).\) The simulated sample paths with and without error can be seen in Fig. 10.
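The noise-free part of this model can be evaluated as in the following Python sketch with NumPy. The spatial sites are drawn uniformly on the unit square and the seed is arbitrary, both of which are assumptions since the actual locations are only shown graphically; the spatio-temporal Gaussian error term is deliberately left out.

```python
import numpy as np

def f1(u, v, t):
    return np.exp(-(u - 0.2) ** 2 / 5 - (v - 0.5) ** 2 / 3 - (t - 0.5) ** 2 / 4 - 1)

def f2(u, v, t):
    return np.exp(-(u - 0.3) ** 2 / 4 - (v - 0.7) ** 2 / 2 - (t - 0.4) ** 2 / 6)

def simulate_signal(sites, t, rng):
    """Noise-free curves [a f1 + b f2 - 0.5] sin(c pi t - 0.2), one row per site."""
    n = len(sites)
    a = rng.uniform(0.5, 2.0, size=(n, 1))
    b = rng.uniform(0.5, 1.0, size=(n, 1))
    c = rng.uniform(1.5, 2.0, size=(n, 1))
    u, v = sites[:, [0]], sites[:, [1]]
    tt = t[None, :]
    amplitude = a * f1(u, v, tt) + b * f2(u, v, tt) - 0.5
    return amplitude * np.sin(c * np.pi * tt - 0.2)

rng = np.random.default_rng(7)                      # arbitrary seed (assumption)
sites = rng.uniform(0.0, 1.0, size=(80, 2))         # assumed: uniform on the unit square
t = np.linspace(0.0, 1.0, 100)                      # 100 equally spaced time points
signal = simulate_signal(sites, t, rng)
print(signal.shape)
```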

Fig. 9 Simulation II: spatial locations

Fig. 10 Simulation II: simulated data without noise (left) and with noise (right)

The first step was to approximate the sample curves by using basis representations with B-splines. In order to check the relation between the forecasting performance and the dimension of the initial approximation of the sample paths, regression splines in terms of cubic B-spline bases of dimension 13 and 23 with equally spaced knots have been considered. In Fig. 11 a sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid black line) are shown. The regression splines of all curves can be seen in Fig. 12. As expected, a higher basis dimension yields noisier fitted curves that lie further away from the original ones. The two scenarios (13 and 23 basis functions) are considered and compared. In the model fitting (second step), the two-dimensional basis for the space was built by considering 15 knots for each marginal basis. As in simulation I, a second-order P-spline penalty has been considered. For OKFD, the variograms were selected as a linear combination of nugget and exponential models.

Fig. 11 Simulation II: a sample path without noise (dashed line) together with the noisy sample path (grey line) and its basis representation (solid line) on a cubic B-spline basis of dimension 13 (left) and 23 (right)

Fig. 12 Simulation II: basis representation of the sample curves by using regression splines in terms of a cubic B-spline basis of dimension 13 (left) and 23 (right)

In this study the differences in the accuracy of the predictions provided by the two methods are also shown. According to the statistics summary provided in Table 2, the proposed method reduces the value of the mean and the median of the ISE’s independently of the number of basis functions used at the first step (regression splines fitting). This fact is also supported by the box plots given in Fig. 13.

Table 2 Simulation II: summary of ISE’s from the cross validation predictions by considering 13 and 23 basis functions at the initial regression splines (R-splines) fitting

Fig. 13 Simulation II: box plot related to the ISE of the predictions by OKFD and PFSRM, considering 13 and 23 basis functions at the initial regression splines fitting

With respect to the predicted curves obtained by each method, in Fig. 14 we can check that even in the most favorable case for OKFD of 13 basis functions, PFSRM provides the most accurate predictions. In order to test if the performance of the predictions varies depending on the spatial location, two outlying spatial locations (A and C) and a central spatial location (B) have been considered. Locations A–C are highlighted in Fig. 9. Accordingly, PFSRM achieves the predicted curves closest to the true data.

On the other hand, Fig. 15 displays the mean curve of the predictions (red line) together with the point wise confidence bands (dashed lines) given by the mean \(\pm\) 2 times the standard deviation, with OKFD being the method that provides the noisier mean curve and the predictions with more variability. Let us also observe that PFSRM achieves the smoothest predicted curves, while OKFD provides predicted curves further away from the real ones. Taking into account that OKFD seems to be very sensitive to the dimension of the basis, another possibility would be to pre-smooth the data by using P-splines. The measures of accuracy of the predictions provided by the two methods are given in Table 2. In this case the prediction errors decrease for both methods, but those given by PFSRM remain lower.

Fig. 14 Simulation II: predicted curves by OKFD and PFSRM in two outlying spatial locations (locations A and C, left and right, respectively) and a centered spatial location B (in the middle). Locations A–C are highlighted in Fig. 9. The initial sample curves were approximated by using regression splines on 13 (first row) and 23 (second row) B-spline basis functions

Fig. 15 Simulation II: mean curve of the predictions and point wise confidence bands according to the mean curve \(\pm\) 2 times the standard deviation of the predicted curves by OKFD and PFSRM. The initial sample curves were approximated by using regression splines on 13 (first row) and 23 (second row) B-spline basis functions

Therefore, we can say that, regardless of the basis dimension used to approximate the sample curves in the first step, OKFD has not been able to predict curves with the same shape as the true data, while PFSRM gives more accurate predictions in both scenarios. Hence, PFSRM is more robust than OKFD in the sense that it is less sensitive to the choice of the basis dimension.

5 Application to Canadian Maritime weather data

In this study we use averages (over 30 years) of daily temperature curves observed at 35 Canadian Maritime weather stations. This is a clear example of functional data presenting spatial dependence, since curves observed at nearby geographical locations are more similar to each other than curves observed at locations that are further apart (see Fig. 1).

The first step for both methods is to consider the basis representation of the raw sample paths in terms of cubic B-spline basis functions. In order to draw more general conclusions, two different numbers of basis functions have been considered for the initial basis representation of the sample paths, namely 33 (Case 1) and 65 (Case 2). The regression splines fitted in the two cases are displayed in Fig. 16. As in both simulation studies, a second-order P-spline penalty was considered. Furthermore, the variograms were selected as a linear combination of nugget and exponential models.

Fig. 16 Application: regression splines fitted from the temperature raw data by using 33 and 65 cubic B-spline basis functions (Cases 1 and 2, respectively)

Fig. 17 Application: predicted curves (grey) by OKFD (at the top) and PFSRM (at the bottom) from the regression splines of the temperature raw data [using 33 (Case 1) and 65 (Case 2) cubic B-spline basis functions] together with their mean curve (blue and red line) and the point wise confidence bands according to the mean \(\pm\) 2 times the standard deviation (black and dashed line)

In order to obtain the predicted curve at each geographical site, a leave-one-out cross-validation procedure was carried out. The predicted curves obtained by OKFD and PFSRM, together with their mean curve and point wise confidence bands (according to the mean \(\pm\) 2 times the standard deviation), can be seen in Fig. 17. In both cases (Cases 1 and 2), the spatial basis is built by considering 6 knots for each marginal basis. It can be seen that when the dimension of the basis for fitting the regression splines increases (Case 2), the predictions provided by OKFD are noisier than in Case 1. By contrast, PFSRM provides similar predictions regardless of the number of basis functions used to fit the initial regression splines. In that sense, PFSRM is more robust than OKFD with respect to the dimension of the initial B-spline expansions in the time domain. This is an important advantage of our method, since the selection of the number of initial basis functions is not as relevant as in functional kriging. For two Canadian Maritime provinces, the temperature curves predicted by OKFD (blue) and PFSRM (red) are plotted together with the observed temperature curves in Fig. 18. Regardless of the basis dimension, PFSRM provides smoother and more accurate predicted curves than OKFD and also preserves the trend of the raw data.

In order to compare the predictive ability of the two methods, box plots of the SSEs (with respect to the observed data) obtained by cross-validation for Cases 1 and 2 are shown in Fig. 19. In addition, the mean, the standard deviation and the median of the 35 SSEs are summarized in Table 3. Again, regardless of the dimension of the basis used in the initial regression splines, the lowest values of the median of the SSEs are always obtained by PFSRM.
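The leave-one-out scheme and the SSE summary can be sketched as below. The predictor used here is a plain inverse-distance-weighted average of the remaining curves; it is a stand-in chosen only to keep the example self-contained and is neither OKFD nor PFSRM. All function names are hypothetical, and the curves are assumed to be evaluated on a common time grid.

```python
import math
from statistics import mean, median, stdev


def idw_predict(target, sites, curves, power=2.0):
    """Stand-in predictor (NOT OKFD or PFSRM): inverse-distance weighting."""
    weights = []
    for site, curve in zip(sites, curves):
        d = math.dist(target, site)
        if d == 0.0:  # coincident location: reuse its curve
            return list(curve)
        weights.append(d ** -power)
    total = sum(weights)
    n_times = len(curves[0])
    return [sum(w * c[t] for w, c in zip(weights, curves)) / total
            for t in range(n_times)]


def loo_sse(sites, curves, predict=idw_predict):
    """Leave each site out in turn, predict its curve, return one SSE per site."""
    sses = []
    for i in range(len(sites)):
        rest_sites = sites[:i] + sites[i + 1:]
        rest_curves = curves[:i] + curves[i + 1:]
        predicted = predict(sites[i], rest_sites, rest_curves)
        sses.append(sum((o - p) ** 2 for o, p in zip(curves[i], predicted)))
    return sses


def summarize(sses):
    """Mean, standard deviation and median of the per-site SSEs."""
    return {"mean": mean(sses), "sd": stdev(sses), "median": median(sses)}


# Toy data: three sites on a line, curves observed at two time points.
toy_sites = [(0.0, 0.0), (1.0, 0.0), (3.0, 0.0)]
toy_obs = [[0.0, 1.0], [1.0, 2.0], [2.0, 2.0]]
print(summarize(loo_sse(toy_sites, toy_obs)))
```

Passing a different function as `predict` allows the same loop to be reused for any spatial predictor with the same signature, which makes the per-site SSEs of two methods directly comparable.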

Finally, as a goodness-of-fit measure, the residual curves are displayed in Fig. 20, highlighting the residuals related to three spatial locations (colored dashed lines). From this figure we can conclude that both methods provide a considerable number of residuals close to zero, but at some locations OKFD provides larger residuals than PFSRM.

Fig. 18 Application: the curve predicted by OKFD (blue) and PFSRM (red) from the regression splines of the temperature raw data [using 33 (Case 1) and 65 (Case 2) cubic B-spline basis functions] and the observed temperature curve (black) at two of the 35 Canadian weather stations, located in the Maritime provinces

Fig. 19 Application: box plots related to the SSEs (with respect to the raw data) obtained in the cross validation for Cases 1 and 2

Table 3 Application: the median, the mean and the standard deviation of the SSEs (with respect to the observed data) obtained in the cross validation for Cases 1 and 2
Fig. 20 Application: residuals (grey lines) and the mean curve of the residuals (black line) from OKFD and PFSRM, using 33 basis functions in the initial regression spline fit. Two residuals are highlighted with colored dashed lines

6 Conclusions

The aim of this paper is to provide a new tool to predict spatially dependent functional data as an alternative to other geostatistical prediction techniques, such as functional kriging. Starting from the formal definition of spatial functional data presented in Sect. 2, a penalized estimation of a functional spatial regression model has been proposed by introducing a three-dimensional P-spline penalty into the least squares fitting criterion (Sect. 2.1).

In order to compare the proposed method with functional kriging in different scenarios, two simulation schemes have been carried out. The first considers non-equally spaced spatial locations on a grid and independent random errors. The second considers non-regular spatial locations and random errors simulated from a spatio-temporal Gaussian process. Furthermore, an application to real climatological data has been presented.

In both simulation studies and in the application to real data, the first step was to approximate the sample curves from their discrete observations by using regression splines on cubic B-spline bases of different dimensions, for comparison purposes (the number of basis functions in each study was chosen according to the corresponding data structure). Regarding the P-spline penalty used in the model fitting, a second-order penalty was considered in all cases.

A leave-one-out cross-validation procedure was carried out to obtain the predicted curve at each unsampled spatial location. In addition, in order to assess whether the forecasting performance varies with the spatial location, the predicted curves at three outlying spatial locations were analyzed in both simulation studies.

The first simulation study reveals that even in the most favorable case (where the initial regression splines are closer to the true data without noise), PFSRM provides the most accurate predicted curves, very close to the shape of the true data, regardless of the spatial location. By contrast, the forecasting performance of OKFD is not uniform across the grid areas.

The second simulation study shows that, regardless of the basis dimension used to approximate the sample curves at the beginning, OKFD was not able to predict curves with the same shape as the true data, while PFSRM gives the most accurate predictions in the two scenarios considered. In addition, PFSRM is more robust than OKFD in the sense that it is less sensitive to the selection of the basis dimension.

In general, the two simulation studies show that PFSRM provides the most accurate predicted curves (even at outlying spatial locations) and reduces the mean and the median of the ISEs with respect to OKFD, regardless of the number of basis functions used in the initial smoothing with regression splines.

In addition, one of the advantages of our method is that it can be used when the spatial locations where measurements are taken change from one time point to another, and also the model can easily cope with missing space or time points.

In summary, it can be concluded that, for predicting functional data with spatial dependence, the proposed PFSRM is a more accurate and computationally efficient alternative to existing geostatistical predictors such as ordinary functional kriging.