1 Introduction

The well-known functional regression model with scalar response (see Horváth and Kokoszka 2012; Ferraty and Vieu 2006 or Ramsay and Silverman 2005, for general discussions) postulates a relation between a real random variable \(Y\) and a random function \(X\), which belongs to a functional space \( \mathcal {F}\) of real functions defined on a compact interval \(I\), via a real valued operator \(r\) as follows:

$$\begin{aligned} Y=r\left[ X\right] +\mathcal {E} \end{aligned}$$

where \(\mathcal {E}\) is a centered real random error uncorrelated with the regressor.

It is worth noting that some parts of the curves, or even some of their particular points, may be more interesting than others in explaining the relation between \(X\) and the response \(Y\). Various approaches have recently been developed on this topic, including the partial no-effect tests proposed in Cardot et al. (2004) in the context of linear models, the structural nonparametric tests introduced in Delsol et al. (2011) and Delsol (2013), and the methods based on variable selection, as for instance in Ferraty et al. (2010) and Aneiros et al. (2011) for a nonparametric model or in McKeague and Sen (2010) in the functional linear framework.

Indeed one can suppose, in some situations, that specific parts of the whole curve \(X\) act in different ways in explaining the response \(Y\). Hence, partitioning \(I\) into \(s\) contiguous sub-intervals \(I_{j}\), and denoting by \(X^{j}\) the restriction of \(X\) to \(I_{j}\), one can write the following additive decomposition of the regression operator:

$$\begin{aligned} r\left[ X\right] =\sum _{j=1}^{s}r_{j}\left[ X^{j}\right] . \end{aligned}$$
(1)

Consider for instance the problem of estimating the chemical composition of a given foodstuff by using spectrometric curves, namely the absorbances of light irradiated on the food as the wavelength of emission varies. In the chemometric literature it is known that some features of the spectra (see Leardi 2003) or some specific parts of the spectrometric curves (see Delsol 2013) are more interesting than others for predicting the proportion of a specific substance.

Figure 1 shows the near-infrared absorbance spectra of \(215\) pork samples, recorded on a Tecator Infratec Food and Feed Analyzer, together with the second derivatives of these spectral curves. This dataset has become a benchmark in functional regression studies: the aim is to predict the percentage of fat contained in each sample of meat from its near-infrared spectrum. Some empirical evidence on this case study emerges from the literature: as pointed out in Ferraty et al. (2013), the regression function exhibits a nonlinear nature; moreover, the role of some specific points of the spectrometric curve in explaining the fat content has been emphasized in Ferraty et al. (2010). Combining these observations, it is reasonable to expect that the decomposition (1) can lead to a regression model with better prediction ability, one which can provide a key to better understanding the relationship between predictor and response. These data will be presented in more detail and analyzed later in this paper (see Sect. 4).

Fig. 1 Spectrometric curves from Tecator dataset (left panel) and their second derivatives (right panel)

The decomposition (1) includes a broad class of models: for instance, linear models with functional coefficients having \(s\) points of jump discontinuity (see for instance Horváth and Reeder 2012), or generalized linear models (see for instance James 2002) that act on specific parts of the random curve. In these examples, the shape of the \(r_{j}\)s is entirely specified (since they are modeled in a parametric way): although this allows some interpretation of the estimated coefficients involved, it appears quite restrictive, and the specification of the link is difficult to implement in the functional regression context.

On the other hand, a wide class of flexible and useful tools for modeling the regression operator \(r\) is represented by Functional Single Index Models (FSIM in the sequel). The main idea is to search for the direction \(\theta _{0}\in \mathcal {F}\) along which the projection of the covariate \(X\) captures the most information on the response \(Y\). This presents various interests: first, it avoids the dimensionality problems one can meet in the fully nonparametric approach (see Ferraty and Vieu 2002); second, it is much more flexible than standard parametric/linear modeling (see James 2002); finally, estimating the relevant functional direction \(\theta _{0}\) provides an easily interpretable tool.

The Single Index approach is well known in the standard multivariate context: the interest both in its prediction abilities and in its interpretability is attested by various works that have appeared over the last two decades: see Härdle et al. (1993), Härdle and Stoker (1989), or Xia and Härdle (2006) for a selected sample of references, and see Härdle et al. (2004) for a general presentation of semi-parametric approaches. Extensions of these ideas to the functional framework, as a functional semi-parametric methodology, have been intensively studied in the literature: conditions for identifiability of FSIM were introduced in Ferraty et al. (2003), and several estimation techniques have been proposed in Ait-Saïdi et al. (2008), Amato et al. (2006) and Ferraty et al. (2011). Moreover, this approach can be seen as the first step of the Functional Projection Pursuit regression developed in Ferraty et al. (2013).

The aim of this paper is to exploit the flexibility of FSIM within the additive decomposition (1) in order to treat situations where structural changes occur. More precisely, we write:

$$\begin{aligned} r_{j}\left[ X^{j}\right] =g_{j}\left( \int _{I_{j}}\theta _{j}\left( t\right) X\left( t\right) dt\right) \end{aligned}$$
(2)

where \(g_{j}\) is an unknown real link function and \(\theta _{j}\) are unknown directions such that \(\int _{I_{j}}\theta _{j}^{2}\left( t\right) dt=1\).

For the sake of simplicity, we will specifically study the introductory case \(s=2\). In more detail, we introduce an estimation procedure based on a backfitting algorithm where each term (2) is fitted by a procedure combining a spline approximation of the direction with the one-dimensional Nadaraya-Watson kernel regression estimate. Some considerations on the way to obtain asymptotic results are sketched: the crucial emerging aspect is the insensitivity of the method to dimensionality effects. The selection of the breaking-point for cutting \(I\) into two parts is discussed, and a fully data-driven method for it is presented. The study is completed with an extensive empirical analysis based both on real and on simulated data: besides emphasizing the good predictive performance of our method, the study highlights the interpretability of the functional directional outputs.

The paper is organized as follows. In Sect. 2 we examine in more depth some technical aspects of the partitioned model, and the estimation technique is described. Section 3 is devoted to some computational issues: in Sect. 3.1 the finite sample performances of the approach are illustrated through simulations, whereas the behaviour of the data-driven procedure for choosing the breaking-point is shown in Sect. 3.2. Finally, an application to the spectrometric dataset is presented in Sect. 4. A short discussion on asymptotics is provided in the final “Appendix”.

2 Model and methodology

2.1 The partitioned FSIM

Our aim is to study the model (2), but in order to make things clearer we only detail the simplest case when \(s=2\); extensions to higher values of \(s\) are straightforward.

Let us fix some notation. Consider a functional r.v. \(X=\left\{ X(t),\ t\in I\right\} \) and a real r.v. \(Y\) defined on the same probability space. Without loss of generality, we take \(I=\left[ 0,1\right] \) and \(\mathbb {E}\left[ X\left( t\right) \right] =0\) for all \(t\). Define the regression model:

$$\begin{aligned} Y=r\left[ X\right] +\mathcal {E} \end{aligned}$$
(3)

where \(r\) is a real-valued operator and \(\mathcal {E}\) is a real random error with finite variance and such that \(\mathbb {E}\left[ \mathcal {E}|X\right] =0\) a.s. As is usual in the literature, we assume that \(X\) takes values in the separable Hilbert space \(L^{2}\left( I\right) \) of square integrable real functions.

Introduce a breaking-point \(\lambda \in \left( 0,1\right) \) and split \(I\) into two subintervals in the following way:

$$\begin{aligned} I_{1}=\left[ 0,\lambda \right] \ \ \ \ \ \ \ \ \ \ I_{2}=\left( \lambda ,1 \right] . \end{aligned}$$

We define the two-terms Partitioned Functional Single Index Model (PFSIM in the sequel) as

$$\begin{aligned} Y=\alpha +g_{1}\left( \int _{I_{1}}\theta _{1}\left( t\right) X\left( t\right) dt\right) +g_{2}\left( \int _{I_{2}}\theta _{2}\left( t\right) X\left( t\right) dt\right) +\mathcal {E} \end{aligned}$$
(4)

where \(\alpha \) is a real coefficient and \(g_{1}\) and \(g_{2}\) are real smooth functions. For standard identifiability reasons one has to assume that the directions \(\theta _{j}\) satisfy

$$\begin{aligned} \int _{I_{1}}\theta _{1}^{2}\left( t\right) dt=\int _{I_{2}}\theta _{2}^{2}\left( t\right) dt=1 \end{aligned}$$

as well as

$$\begin{aligned} \int _{I_{1}}\theta _{1}\left( t\right) e_{1}\left( t\right) dt>0\ \ \ \ \ \ \ \ \ \ \int _{I_{2}}\theta _{2}\left( t\right) f_{1}\left( t\right) dt>0 \end{aligned}$$

where \(e_{1}\) and \(f_{1}\) are the first elements of some orthonormal bases of \(L^{2}\left( I_{1}\right) \) and \(L^{2}\left( I_{2}\right) \) respectively. The latter sign conditions are needed because, with the \(g_{j}\) left unspecified, the pairs \(\left( g_{j},\theta _{j}\right) \) and \(\left( g_{j}\left( -\cdot \right) ,-\theta _{j}\right) \) would otherwise produce the same model.

At this stage it is worth noting the high degree of flexibility of the model. On the one hand, it can be seen as a natural extension of the standard FSIM model as discussed for instance in Ait-Saïdi et al. (2008):

$$\begin{aligned} Y=\alpha +g\left( \int _{I}\theta \left( t\right) X\left( t\right) dt\right) + \mathcal {E}, \end{aligned}$$

as well, of course, as an extension of the basic linear model as discussed for instance in Cardot et al. (2003):

$$\begin{aligned} Y=\alpha +\int _{I}\theta \left( t\right) X\left( t\right) dt+\mathcal {E}. \end{aligned}$$

More surprisingly, it can also be seen as a kind of extension of the fully nonparametric model (3), in the sense that it allows the use of a non-smooth operator \(r\), while the nonparametric literature (see Ferraty and Vieu 2006) is based on continuity-type assumptions. From this perspective, the PFSIM provides a useful approximation of the regression operator:

$$\begin{aligned} r\left[ X\right] \approx \alpha +g_{1}\left( \int _{I_{1}}\theta _{1}\left( t\right) X\left( t\right) dt\right) +g_{2}\left( \int _{I_{2}}\theta _{2}\left( t\right) X\left( t\right) dt\right) \end{aligned}$$
(5)

with constraints \(\int _{I_{1}}\theta _{1}^{2}\left( t\right) dt=\int _{I_{2}}\theta _{2}^{2}\left( t\right) dt=1\). It should be noted that in this context the decomposition is not unique: indeed, one can find two different collections of pairs \(\left\{ \left( g_{j},\theta _{j}\right) _{j=1,2}\right\} \) and \(\left\{ \left( \widetilde{g}_{j},\widetilde{\theta }_{j}\right) _{j=1,2}\right\} \) such that \(\sum _{j=1,2}g_{j}\left( \int _{I_{j}}\theta _{j}\left( t\right) X\left( t\right) dt\right) = \sum _{j=1,2}\widetilde{g}_{j}\left( \int _{I_{j}}\widetilde{\theta }_{j}\left( t\right) X\left( t\right) dt\right) \). While this lack of uniqueness may cause problems when interpreting the outputs, it should be stressed that it has no effect on the two main features of the model, namely its interest for detecting the existence of a possible breakpoint and its high degree of flexibility, which will guarantee nice prediction performances.

2.2 Fitting the partitioned FSIM

Consider now the problem of estimating the link functions \(g_{j}\) and the directions \(\theta _{j}\) in the model (4), from a sample \(\left\{ \left( X_{i},Y_{i}\right) \!,~i=1,\dots ,n\right\} \) of r.v.s identically distributed as \(\left( X,Y\right) \). At first we consider the breaking-point \(\lambda \) as known; the important question of estimating \(\lambda \) in practice will be addressed in Sect. 2.3. Given the additive nature of the model, we propose a backfitting algorithm (see e.g. Hastie et al. 2009) in which each term is estimated by an alternating optimization strategy similar to the one used in Ferraty et al. (2013), whose principle is illustrated in the following.

For \(j=1,2\), consider the \(\left( q_{j}+k_{j}\right) \)-dimensional space of spline functions defined on \(I_{j}\) with order \(q_{j}\) and \(k_{j}-1\) interior equispaced knots (with \(q_{j}>2\) and \(k_{j}>1\), integers), and let \(\left\{ B_{s}^{j}\right\} \) be the normalized B-spline basis of this space. In this basis, \(\theta _{j}\left( t\right) \) is represented as \(\mathbf {\delta }_{j}^{T}\mathbf {B}_{j}\left( t\right) \), where \(\mathbf {B}_{j}\left( t\right) \) is the vector of all the B-splines. To remove a trivial ambiguity, each vector \(\mathbf {\delta }_{j}\) of coefficients is such that its first element is positive, and satisfies the normalization condition:

$$\begin{aligned} \mathbf {\delta }_{j}^{T}\int _{I_{j}}\mathbf {B}_{j}\left( t\right) \mathbf {B} _{j}\left( t\right) ^{T}dt\ \mathbf {\delta }_{j}=1. \end{aligned}$$
(6)

The estimation procedure is based on the algorithm described below, which has been implemented in R code and exploits the Nelder-Mead optimization algorithm (see Nelder and Mead 1965). In the following we denote by \( \{(x_{i},y_{i}),\) \(i=1,\dots ,n\}\) the observed values of the random pairs \( (X_{i},Y_{i})\).

  • Initialize - Set \(\widehat{\alpha }=n^{-1}\sum _{i=1}^{n}y_{i}\) , initialize the current residuals \(\widehat{\mathcal {\varepsilon }} _{i}=y_{i}-\widehat{\alpha }\), and fix \(j\) (\(j=1\) or \(j=2\)).

  • Cycle - Find \(\widehat{\mathbf {\delta }}_{j}\) which minimizes over \(\mathbf {d}\in \mathbb {R}^{q_{j}+k_{j}}\) the empirical quadratic cross-validation criterion:

    $$\begin{aligned} CV_{j}\left( \mathbf {d}\right) =\frac{1}{n}\sum _{i=1}^{n}\left[ \left( \widehat{\mathcal {\varepsilon }}_{i}-\widehat{g}_{j}^{\left[ -i\right] }\left( \mathbf {d}^{\prime }\mathbf {b}_{j,i}\right) \right) ^{2}\right] \end{aligned}$$
    (7)

    where \(\mathbf {b}_{j,l}=\left\langle \mathbf {B}_{j},x_{l}\right\rangle \) and

    $$\begin{aligned} \widehat{g}_{j}^{\left[ -i\right] }\left( z\right) =\sum _{l\ne i}\frac{ K_{j}\left( \frac{z-\mathbf {d}^{\prime }\mathbf {b}_{j,l}}{h_{j}}\right) }{ \sum _{l^{\prime }\ne i}K_{j}\left( \frac{z-\mathbf {d}^{\prime }\mathbf {b}_{j,l^{\prime }}}{h_{j}}\right) }\ \widehat{\mathcal {\varepsilon }}_{l} \end{aligned}$$

    with \(K_{j}\) a kernel function and \(h_{j}\) a suitable smoothing parameter. As is conventional in the additive models literature, we impose the centering \(n^{-1}\sum _{l=1}^{n}\widehat{g}_{j}^{\left[ -l\right] }\left( \widehat{\mathbf {\delta }}_{j}^{\prime }\mathbf {b}_{j,l}\right) =0\). Then, update the residuals

    $$\begin{aligned} \widehat{\mathcal {\varepsilon }}_{i}=y_{i}-\widehat{\alpha }-\widehat{g}_{j}^{\left[ -i\right] }\left( \widehat{\mathbf {\delta }}_{j}^{\prime }\mathbf {b}_{j,i}\right) \end{aligned}$$

    and swap the value of the index \(j\).

The process is continued until stabilization of the quadratic error measure \( n^{-1}\sum _{i=1}^{n}\left( y_{i}-\widehat{y}_{i}\right) ^{2}\), where \( \widehat{y}_{i}\) are the estimated values.
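To fix ideas, the following R sketch implements the main ingredients of one Cycle step together with a simple backfitting driver, under simplifying assumptions: fixed bandwidths, trapezoidal quadrature for the projections \(\left\langle \mathbf {B}_{j},x_{i}\right\rangle \), and an unconstrained Nelder-Mead search followed by renormalization to satisfy (6). All function names are ours, and the snippet is only a sketch of the scheme, not the paper's actual code.

```r
library(splines)

## Projections <B_j, x_i> of the discretized curves X (n x n_t) on the
## B-spline basis of order q with k-1 interior knots, via trapezoidal
## quadrature on the grid tt; also returns the Gram matrix used in (6).
project_curves <- function(X, tt, q = 4, k = 4) {
  knots <- seq(min(tt), max(tt), length.out = k + 1)[-c(1, k + 1)]
  B <- bs(tt, knots = knots, degree = q - 1, intercept = TRUE)
  w <- diff(tt); w <- c(w / 2, 0) + c(0, w / 2)   # trapezoid weights
  list(b = X %*% (w * B), G = crossprod(B, w * B))
}

## Leave-one-out Nadaraya-Watson fit of the current residuals eps at the
## projected points z_i, with Gaussian kernel and bandwidth h.
nw_loo <- function(z, eps, h) {
  K <- outer(z, z, function(a, b) dnorm((a - b) / h))
  diag(K) <- 0                                    # leave one out
  drop(K %*% eps) / (rowSums(K) + 1e-300)
}

## One Cycle step: minimize criterion (7) over the coefficient vector d;
## here we optimize unconstrained, then renormalize so that (6) holds
## and the first coefficient is positive.
fit_direction <- function(b, G, eps, h, d0 = rep(1, ncol(b))) {
  cv <- function(d) mean((eps - nw_loo(drop(b %*% d), eps, h))^2)
  d <- optim(d0, cv, method = "Nelder-Mead", control = list(maxit = 2000))$par
  d <- d / sqrt(drop(t(d) %*% G %*% d))           # normalization (6)
  if (d[1] < 0) d <- -d                           # sign convention
  d
}

## Backfitting driver: alternate the two components; for readability we run
## a fixed number of sweeps, whereas the paper iterates until the quadratic
## error stabilizes, and keeps the bandwidths h fixed instead of using CV.
pfsim_backfit <- function(X, y, tt, lambda, h = c(0.2, 0.2), n_sweeps = 10) {
  in1 <- tt <= lambda
  P <- list(project_curves(X[, in1], tt[in1]),
            project_curves(X[, !in1], tt[!in1]))
  alpha <- mean(y)
  fits <- list(numeric(length(y)), numeric(length(y)))
  for (s in seq_len(n_sweeps)) for (j in 1:2) {
    eps <- y - alpha - fits[[3 - j]]              # current residuals
    d <- fit_direction(P[[j]]$b, P[[j]]$G, eps, h[j])
    g <- nw_loo(drop(P[[j]]$b %*% d), eps, h[j])
    fits[[j]] <- g - mean(g)                      # centering convention
  }
  list(alpha = alpha, fitted = alpha + fits[[1]] + fits[[2]])
}
```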

The estimator is tuned by three pairs of parameters: the orders of the splines \(q_{1}\) and \(q_{2}\), the numbers of knots \(k_{1}\) and \(k_{2}\), and the bandwidths \(h_{1}\) and \(h_{2}\). While the order of the splines may be fixed at \(3\), the number of knots \(k_{j}\) has to be chosen carefully in order to capture the complexity of the shape of the direction \(\theta _{j}\) to estimate: the classical Akaike Information Criterion (AIC) and the Schwarz Information Criterion (BIC) (for a general presentation, see Burnham and Anderson 2002) can be useful in this regard. Often, however, the choice may be made heuristically. Finally, since the estimator of \(g_{j}\) is a usual nonparametric kernel regression estimator, the choice of the smoothing parameters \(h_{j}\) can be performed by data-driven bandwidth selectors such as cross-validation. Due to the nature of the Nelder-Mead method, the proposed algorithm is inclined to get stuck in local minima: to alleviate this problem, one can use multiple random initializations of the parameters.

2.3 Data-driven breaking-point selection

While the previous procedure is defined for a fixed value of the parameter \(\lambda \), the question of how to choose it in practice naturally arises. The main idea is to use the value leading to the minimal prediction error. This works as follows:

  • Step 1 Choose a grid \(\Lambda \) of possible values for \( \lambda \);

  • Step 2 Compute for each \(\lambda \in \Lambda \) the estimates of the directions \(\theta _{j}\) and the link functions \(g_{j}\) by running the algorithm defined in Sect. 2.2;

  • Step 3 Choose the value \(\widehat{\lambda }\) which minimizes the cross-validation criterion

    $$\begin{aligned} CV\left( \lambda \right) =\frac{1}{n}\sum _{i=1}^{n}\left[ \left( y_{i}- \widehat{r}_{\lambda }^{\left[ -i\right] }\left( x_{i}\right) \right) ^{2} \right] , \end{aligned}$$
    (8)

    where \(\widehat{r}_{\lambda }^{\left[ -i\right] }\) is the leave-one-out version of the estimate computed in Step 2 for the value \(\lambda \).
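In code, Steps 1-3 reduce to a simple grid search. In the R sketch below, `fit_loo` is a hypothetical placeholder for a routine returning the leave-one-out predictions \(\widehat{r}_{\lambda }^{[-i]}\left( x_{i}\right) \) of the PFSIM fitted with breaking-point \(\lambda \) (e.g. built on the algorithm of Sect. 2.2); it is not part of the paper's code.

```r
## Data-driven breaking-point selection by minimizing criterion (8).
## fit_loo(lambda, x, y) is an assumed helper returning the n leave-one-out
## predictions of the PFSIM with breaking-point lambda.
select_lambda <- function(Lambda, x, y, fit_loo) {
  cv <- sapply(Lambda, function(lambda) mean((y - fit_loo(lambda, x, y))^2))
  Lambda[which.min(cv)]
}
```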

The selection method above is easy to implement, but it could fail to detect the “true” breaking-point due to the non-uniqueness of the approximation (5) stressed at the end of Sect. 2.1. Moreover, in case of misspecification of \(\lambda \), the extreme flexibility of the approach nevertheless leads to a good fit which counterbalances the erroneous evaluation: the theoretical remarks in the “Appendix” will make clear how accurate this procedure is if one is only looking for predictive performance. Its behaviour on finite samples will be analyzed in Sect. 3.2.

3 Computational issues

3.1 Assessing the performances by simulations

We illustrate the finite sample performances of our procedure, comparing it to several linear and nonlinear functional approaches in a series of simulation studies. To avoid introducing noise connected with a misspecification of the breaking-point, we suppose \(\lambda \) known.

Data were generated according to the following regression models:

$$\begin{aligned} Y_{i}=r\left[ X_{i}\right] +\sigma \mathcal {E}_{i}\ \ \ \ \ \ \ i=1,\dots ,n \end{aligned}$$
(9)

where \(n=300\), \(\mathcal {E}_{i}\sim \mathcal {N}\left( 0,1\right) \) and \(\sigma ^{2}=\rho ^{2}Var\left( r\left[ X\right] \right) \), where \(\rho \) controls the signal-to-noise ratio (we used \(\rho =0.1\), \(0.3\)). The functional covariates obey

$$\begin{aligned} X_{i}\left( t\right) =a_{i}+b_{i}t^{2}+c_{i}\exp \left( t\right) +\sin \left( d_{i}2\pi t\right) \ \ \ \ \ \ \ t\in \left[ -1,1\right] \end{aligned}$$
(10)

where \(a_{i}\), \(b_{i}\), \(c_{i}\) and \(d_{i}\) are independent real r.v.s uniformly distributed over \(\left( -1,1\right) \), so that \(\mathbb {E}\left[ X_{i}\left( t\right) \right] =0\), \(t\in \left[ -1,1\right] \). Each functional predictor is discretized over a grid of \(200\) equispaced design points \(\left\{ t_{j},\ j=1,\dots ,200\right\} \) to obtain the \(300\times 200\) matrix \(\left[ x_{i}\left( t_{j}\right) \right] \). A random selection of these functional data is plotted in Fig. 2.

Fig. 2 A random selection of \(30\) functional predictors used in the simulation experiments

The regression operators \(r\left[ X_{i}\right] \) have been obtained as the sum of two terms acting on \(I_{1}=\left[ -1,0\right] \) and \(I_{2}=\left( 0,1\right] \). As illustrated in the following, they may be linear, generalized linear or fully nonparametric terms, so as to cover a wide range of possible regression links and to show how the PFSIM behaves in the different cases.

In more detail, we introduced the real functional coefficients:

$$\begin{aligned} \varphi _{1}\left( t\right) =\kappa _{1}\cos (2\pi t^{2})\ \ \ \ \ \ \ \ \ \ \ \ \varphi _{2}\left( t\right) =\kappa _{2}\sin \left( \dfrac{3}{2}\pi t\right) ^{3} \end{aligned}$$

where the \(\kappa _{j}\) are such that \(\left( \int _{I_{j}}\left[ \varphi _{j}\left( t\right) \right] ^{2}dt\right) ^{1/2}=1\), and the random functions, obtained by transforming the original random data \(X_{i}\):

$$\begin{aligned} m_{1}^{X_{i}}\left( t\right) =\sin \left( X_{i}\left( t\right) \right) \ \ \ \ \ \ \ \ \ \ \ \ m_{2}^{X_{i}}\left( t\right) =\sqrt{\left| X_{i}\left( t\right) \right| }. \end{aligned}$$

Then we considered the following cases:

  1. The regression operator is linear with a discontinuous functional coefficient:

    $$\begin{aligned} r_{1}\left[ X_{i}\right] =\int _{-1}^{1}\left[ \varphi _{1}\left( t\right) ~ \mathbf {1}_{t\in I_{1}}+\varphi _{2}\left( t\right) ~\mathbf {1}_{t\in I_{2}} \right] X_{i}\left( t\right) dt, \end{aligned}$$

    where \(\mathbf {1}_{t\in A}\) is the indicator function of subset \(A\).

  2. The regression operator is linear over \(I_{1}\) and nonlinear over \(I_{2}\). For the second addend, we analyzed both the case of a generalized linear structure with a cubic link function:

    $$\begin{aligned} r_{2.a}\left[ X_{i}\right] =\int _{-1}^{0}\varphi _{1}\left( t\right) X_{i}\left( t\right) dt+4\left( \int _{0}^{1}\varphi _{2}\left( t\right) X_{i}\left( t\right) dt\right) ^{3}, \end{aligned}$$

    and that of a fully nonparametric term:

    $$\begin{aligned} r_{2.b}\left[ X_{i}\right] =\int _{-1}^{0}\varphi _{1}\left( t\right) X_{i}\left( t\right) dt+\int _{0}^{1}m_{2}^{X_{i}}\left( t\right) dt. \end{aligned}$$
  3. Both terms composing \(r\left[ X_{i}\right] \) are nonlinear:

    $$\begin{aligned} r_{3.a}\left[ X_{i}\right] =\sin \left( \pi \int _{-1}^{0}\varphi _{1}\left( t\right) X_{i}\left( t\right) dt\right) +4\left( \int _{0}^{1}\varphi _{2}\left( t\right) X_{i}\left( t\right) dt\right) ^{3}, \end{aligned}$$

    or fully nonparametric:

    $$\begin{aligned} r_{3.b}\left[ X_{i}\right] =\int _{-1}^{0}m_{1}^{X_{i}}\left( t\right) dt+\int _{0}^{1}m_{2}^{X_{i}}\left( t\right) dt. \end{aligned}$$
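As an illustration of the simulation design, here is a minimal R sketch of the data-generating process (9)-(10) in the case of the operator \(r_{1}\). Riemann sums on the discretization grid are our quadrature choice for the integrals and the normalization of the \(\varphi _{j}\); they are not prescribed by the paper.

```r
set.seed(1)
n <- 300; tt <- seq(-1, 1, length.out = 200); dt <- tt[2] - tt[1]

## Covariates (10): a_i, b_i, c_i, d_i iid U(-1, 1)
ab <- matrix(runif(4 * n, -1, 1), n, 4)
X <- t(apply(ab, 1, function(p)
  p[1] + p[2] * tt^2 + p[3] * exp(tt) + sin(p[4] * 2 * pi * tt)))

## phi_1, phi_2 supported on I_1 = [-1,0] and I_2 = (0,1], unit L2 norm there
phi1 <- cos(2 * pi * tt^2) * (tt <= 0); phi1 <- phi1 / sqrt(sum(phi1^2) * dt)
phi2 <- sin(1.5 * pi * tt)^3 * (tt > 0); phi2 <- phi2 / sqrt(sum(phi2^2) * dt)

## Operator r_1 (linear, discontinuous coefficient) and responses (9):
## the supports are disjoint, so one Riemann sum covers both terms
r1 <- drop(X %*% ((phi1 + phi2) * dt))
rho <- 0.1
sigma <- rho * sd(r1)                    # sigma^2 = rho^2 Var(r[X])
y <- r1 + sigma * rnorm(n)
```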

We estimated the previous models with the algorithm illustrated in Sect. 2.2 over training-samples of size \(200\), with \(\lambda \) fixed to zero. We used cubic splines with the same number of internal knots, \(k_{1}=k_{2}=3\); the smoothing parameters \(h_{j}\) were selected by a leave-one-out cross-validation procedure. Prediction outcomes were quantified on test-sets of size \(n.out=100\) by the Relative Mean Square Error of prediction:

$$\begin{aligned} RMSE=\frac{\sum _{i=1}^{n.out}\left( y_{i}^{out}-\widehat{y}_{i}\right) ^{2}}{ \sum _{i=1}^{n.out}\left( y_{i}^{out}-\overline{y}\right) ^{2}} \end{aligned}$$

where the \(y_{i}^{out}\) are the elements of the test-set, the \(\widehat{y}_{i}\) are the corresponding predicted values and \(\overline{y}=n.out^{-1}\sum _{i=1}^{n.out}y_{i}^{out}\). Each simulation was repeated \(100\) times to obtain a frequency distribution of the RMSE.
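For reference, this error measure is a one-liner in R:

```r
## Relative MSE of prediction over a test-set y_out, given predictions y_hat
rmse <- function(y_out, y_hat)
  sum((y_out - y_hat)^2) / sum((y_out - mean(y_out))^2)
```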

Prediction results with PFSIM were compared with those obtained from the following competitors:

  1. 1.

    Functional Single Index Model (FSIM) fitted with the first step of the alternating least squares algorithm proposed in Ferraty et al. (2013): we used cubic splines with \(5\) knots and a leave-one-out cross-validation procedure for the selection of the bandwidth;

  2. 2.

    Functional Linear Model (FLM), where the functional coefficient was estimated with a penalized B-spline procedure (based on cubic splines with \(20\) internal knots) and the smoothing parameter in the penalization (controlling second derivatives) was selected by cross-validation (see e.g. Cardot et al. 2003);

  3. 3.

    Functional Nonparametric Model (FNPM) estimated using the \(\kappa \)-nearest neighbour functional kernel estimator (with \(\kappa \) chosen by local cross-validation), with proximity between curves measured by the classical \(L^{2}\) norm (for details see Ferraty and Vieu 2006).
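For concreteness, a minimal sketch of such a \(\kappa \)-NN functional kernel estimate is given below. We use a single global \(\kappa \) and an Epanechnikov-type kernel for simplicity, whereas the competitor described above selects \(\kappa \) by local cross-validation.

```r
## kappa-NN functional kernel regression: for each new curve, the bandwidth
## is the distance to its (kappa+1)-th nearest training curve in L2 norm.
fnpm_knn <- function(X_tr, y_tr, X_new, kappa,
                     K = function(u) pmax(1 - u^2, 0)) {
  apply(X_new, 1, function(x) {
    d <- sqrt(colSums((t(X_tr) - x)^2))   # L2 distances on the common grid
    h <- sort(d)[kappa + 1]               # local bandwidth
    w <- K(d / h)
    sum(w * y_tr) / sum(w)
  })
}
```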

Comparisons between the empirical distributions of the \(RMSE\)s resulting from the above estimation strategies can be made by analyzing Figs. 3, 4, 5, 6 and 7.

Fig. 3 RMSE for regression model involving operator \(r_{1}\) with \(\rho =0.1\) and \(\rho =0.3\). a stands for PFSIM, b for FSIM, c for FLM and d for FNPM

Fig. 4 RMSE for regression model involving operator \(r_{2.a}\) with \(\rho =0.1\) and \(\rho =0.3\). a stands for PFSIM, b for FSIM, c for FLM and d for FNPM

Fig. 5 RMSE for regression model involving operator \(r_{2.b}\) with \(\rho =0.1\) and \(\rho =0.3\). a stands for PFSIM, b for FSIM, c for FLM and d for FNPM

Fig. 6 RMSE for regression model involving operator \(r_{3.a}\) with \(\rho =0.1\) and \(\rho =0.3\). a stands for PFSIM, b for FSIM, c for FLM and d for FNPM

Fig. 7 RMSE for regression model involving operator \(r_{3.b}\) with \(\rho =0.1\) and \(\rho =0.3\). a stands for PFSIM, b for FSIM, c for FLM and d for FNPM

The simulations show that our method performs very well in all the examples, also in comparison with the other methods considered. Indeed, when the model is linear (see Fig. 3) the PFSIM is practically equivalent to the FSIM and the FLM. Moreover, it produces the best prediction performances when the regression operator \(r\left[ X\right] \) is decomposable into two parts of the type \(g_{j}\left( \int _{I_{j}}\theta _{j}\left( t\right) X\left( t\right) dt\right) \) (see Figs. 4 and 6). Finally, when a fully nonparametric term appears, the PFSIM widely outperforms the FSIM and FLM estimators, and is equivalent to the fully nonparametric approach (see Figs. 5 and 7).

From the study it emerges that our approach represents a valid alternative to the purely nonparametric one. Compared to the latter, one can point out some advantages: the dimensionality problem is avoided, the task of choosing a “good” semi-norm is skipped, and, in some cases, it is possible to detect latent structures in the regression operator when they exist.

To appreciate the latter aspect, we reproduce in Fig. 8 the estimates of the link functions \(g_{j}\) and directions \(\theta _{j}\) when the responses \(y_{i}\) are generated by a model with the regression operator \(r_{2.a}\) and \(\rho =0.1\). In this case the graphs highlight the nature of the link between the predictor and the response: it is possible to detect the existence of a linear relation between the first part of the covariates and \(Y_{i}\), and a nonlinearity over the second part of the interval.

Fig. 8 Estimates of the directions \(\theta _{j}\) (top panels) and link functions \(g_{j}\) (bottom panels) in the case of the regression operator \(r_{2.a}\)

3.2 Illustrating the selection of the breaking-point

To show how the selection algorithm described in Sect. 2.3 works in practice, in this section we illustrate the results of a simulation study conducted using the regression operators \(r_{2.b}\) and \(r_{3.a}\) defined in Sect. 3.1, where the “true” breaking-point is \(\lambda _{0}=0\). Data have been generated according to (10) and the regression model (9) in the same way as in Sect. 3.1, with \(\rho =0.1\). Estimations were based on the same parameter settings used in the previous section (cubic splines with \(k_{1}=k_{2}=3\), and \(h_{j}\) selected by leave-one-out CV), and the search for the optimal \(\lambda \) was done between \(-0.9\) and \(0.9\) with grid width \(0.1\) (i.e. \(\Lambda =\left\{ -0.9,-0.8,\dots ,0.8,0.9\right\} \)). In order to evaluate the effect of a misspecification of the breaking-point on the predictive abilities of the PFSIM, once \(\widehat{\lambda }\) was identified, RMSEs were computed over the test-set using both \(\widehat{\lambda }\) and \(\lambda _{0}\). The experiment was replicated using \(100\) different random samples, of small, medium and moderately large sizes (\(n=50\), \(100\) and \(200\)), in order to relate the identification of \(\lambda \) to the sample size.

Observing the smoothed distributions of the selected \(\widehat{\lambda }\) for varying sample size \(n\) (see the plots in Fig. 9), it emerges that, in the proposed examples, the selection method produced estimates of the parameter \(\lambda \) that are slightly biased, with a variability that decreases, as expected, with \(n\). However, the detection problems, ascribable to those raised at the end of Sect. 2.3, do not affect the capacity of the PFSIM to provide good predictions, thanks to the flexibility of the procedure. Indeed, the box-plots in Figs. 10 and 11 show that the data-driven selected parameter \(\widehat{\lambda }\) gives prediction errors similar to those obtained with the true \(\lambda _{0}\), which is unknown in practice. This fact justifies the use of the proposed cross-validation principle in applied frameworks, where the prediction aspect plays a central role.

Fig. 9 Estimates of the density of the selected \(\widehat{\lambda }\) when one uses the regression operators \(r_{2.b}\) (left panel) and \(r_{3.a}\) (right panel), varying the sample size \(n\)

Fig. 10 Estimates of RMSEs when one uses the selected \(\widehat{\lambda }\) and the true breaking-point for regression operator \(r_{2.b}\), varying \(n\)

Fig. 11 Estimates of RMSEs when one uses the selected \(\widehat{\lambda }\) and the true breaking-point for regression operator \(r_{3.a}\), varying \(n\)

4 Application to spectrometric datasets

To determine the composition of a foodstuff, instead of relying on expensive chemical analysis, it is often preferable to obtain an estimate by spectroscopic analysis: a spectrometer measures the absorption of light emitted at different wavelengths by the studied substance. Absorption as a function of the wavelength represents a functional datum. In recent years, the use of various functional techniques has been widely explored for data of this nature (see for instance, Ferraty and Vieu 2002; Saeysa et al. 2008 or Ferraty et al. 2013).

In what follows, we illustrate an application of our PFSIM method in chemometric analysis: we use the well-known Tecator dataset (available at lib.stat.cmu.edu/datasets/tecator). It consists of \(215\) spectra in the near-infrared (NIR) wavelength range from \(850\) to 1,050 nm, discretized on a mesh of \(100\) equispaced measures, corresponding to as many finely chopped pork samples. The aim is to predict the fat content \(y_{i}\), measured by chemical analysis, from the spectrometric curve. To avoid the well-known “calibration problem”, due to the presence of shifts in the curves that introduce noise, it is conventional to take as regressor \(x_{i}\) the second derivatives of the spectrometric curves instead of the original ones (see Ferraty and Vieu 2002).
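A hedged sketch of this preprocessing step follows: second derivatives computed by central finite differences on the equispaced wavelength mesh, which is one simple choice; derivative estimates obtained from smoothing splines are a common alternative in this literature.

```r
## Second derivatives of discretized spectra (n x p matrix `spectra`) on an
## equispaced wavelength grid `wl`; returns an n x (p - 2) matrix.
second_derivative <- function(spectra, wl) {
  h <- wl[2] - wl[1]
  p <- ncol(spectra)
  (spectra[, 3:p] - 2 * spectra[, 2:(p - 1)] + spectra[, 1:(p - 2)]) / h^2
}
```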

The regression methodology described in Sect. 2 was applied over a learning-sample formed by the first \(160\) pairs \(\left( x_{i},y_{i}\right) \), and the goodness-of-fit was evaluated over a test-set containing the remaining \(55\).

We proceeded with the selection of the breaking-point \(\lambda \) by using the cross-validation procedure illustrated in Sect. 2.3, with candidates between \(860\) and \(1{,}040\) with mesh width \(10\), so that \(\Lambda =\left\{ 860,870,\dots ,1030,1040\right\} \). The estimator for each \(\lambda \in \Lambda \) was based on cubic splines with \(k_{1}=k_{2}=6\) internal knots. The minimum of \(CV\left( \lambda \right) \) was achieved at \(\widehat{\lambda }=960\) (see the left plot of Fig. 12).

Fig. 12 Estimated \(\lambda \) and predicted values

With the breaking-point fixed at \(960\), the PFSIM was applied to the data using cubic splines with \(k_{1}=9\) and \(k_{2}=5\) internal knots: this choice sensibly improves the performance with respect to taking the same number of knots for both additive terms. Applying the estimated model to the testing sample, we obtained a squared prediction error \(MSE=\frac{1}{55}\sum _{i=1}^{55}\left( y_{i}^{out}-\widehat{y}_{i}\right) ^{2}\) equal to \(1.3916\) and its relative version \(RMSE\) equal to \(0.00823\). The out-of-sample prediction accuracy can be appreciated by looking at the right panel of Fig. 12.

To provide an interpretation of the estimated model, we analyzed the estimated additive terms. First we computed the variance explained by each component as the ratio between the empirical variance of \(\widehat{g}_{j}\left( \int _{I_{j}}\widehat{\theta }_{j}\left( t\right) x_{i}\left( t\right) dt\right) \) and the variance of the responses \(y_{i}\). We obtained \(0.96\) for the first term and \(0.02\) for the second one: this tells us that the second part of the spectrum, corresponding to wavelengths longer than \(960\) nm, is in practice negligible in explaining the fat content. Therefore, to investigate the nature of the link over the relevant part of the spectrum, we look at the estimated directions plotted in Fig. 13: it appears that the wavelengths between \(850\) and \(890\) nm are not relevant, whereas the ones in the range 890–950 nm are the most important. This is consistent with the variable selection results in Ferraty et al. (2010), where this interval appears to be the most interesting.

Fig. 13 Estimated directions \(\theta _{j}\) and link functions \(g_{j}\) for spectrometric data

To conclude the analysis, we compared the obtained results with those given by the same competitors used in Sect. 3.1. Reading the out-of-sample performances reported in Table 1, one can conclude that the PFSIM is the best among the proposed techniques.

Table 1 Mean square errors \(MSE\) and relative \(MSE\) on the testing-set for PFSIM, FSIM, FLM and FNPM

Because these data have been widely explored in the literature, becoming a benchmark, it is possible to make a broad comparison with many methodologies. One can see, for instance, the summary Table 9 in Ferraty and Vieu (2011) and notice that our method appears to be one of the best in terms of prediction. In a nutshell, our method is of great interest on these data for exploratory purposes (see Fig. 13), but it is also one of the most powerful in terms of predictive performance (see Table 9 in Ferraty and Vieu 2011).

5 Conclusions

In this paper we have illustrated a methodology, in the framework of functional regression modeling with scalar response, which allows one to approximate the unknown regression operator in a semi-parametric way through a single index approach, while taking possible structural changes into account. The novelty of the methodology consists in treating a Single Index Model which can manage ruptures by using non-smooth functional directions and additive link functions. In this perspective, our work belongs to the stream of literature on variable selection, rather than only to the semi-parametric regression context. In that sense, our paper can be seen as taking part in the recent advances devoted to exploring links between Functional Data Analysis and Variable Selection procedures (see for instance Bongiorno et al. 2014).

An extensive simulation study, made to compare the predictive performance of the method with some classical functional regression competitors (parametric, semi-parametric and nonparametric), has pointed out the abilities of the proposed approach. Moreover, it has been shown, through an application to a real benchmark data set, that the method reaches the usual goals of semi-parametric modelling, in the sense that it combines good predictive power with interpretability of the outputs: indeed, the results obtained are relevant and corroborated by former studies of the same data.

It should be noted that even though the implemented method rests on regressors which are curves, extensions to general functional objects, such as images and arrays, are always possible. Moreover, one can consider situations with a binary response variable.