1 Introduction

This paper proposes a new factor rotation for functional principal components analysis (fPCA). In functional data analysis, the use of principal components has received considerable attention; means of defining principal components were studied in Rice and Silverman (1991) and Silverman (1996) and, for sparsely observed curves, in Yao et al. (2005a), Peng and Paul (2009) and Paul and Peng (2011). After carrying out fPCA, one of the most common ways of dealing with functional covariates has been to employ principal component scores within multivariate methods. Particular examples include their use in linear models (Yao et al. 2005b; Hall et al. 2006; Goldsmith et al. 2011), as responses (Chiou et al. 2004; Sentürk and Müller 2006) and in additive models (Müller and Yao 2008). In some of these cases, interpretation is gained by combining coefficients from the multivariate model with the principal component functions to create a functional parameter—see the functional linear models in Yao et al. (2005b)—but this is not always possible, as in the additive models in Müller and Yao (2008), in which case the model must be interpreted by treating the principal component directions as having particular meanings.

Despite this interest in fPCA, little has been proposed in the way of factor rotations that might make principal component directions more interpretable. Ramsay and Silverman (2005) examine an extension of the VARIMAX rotation (Kaiser 1958) from multivariate factor analysis, which tends to produce components that focus on particular ranges of the domain of the functions. Liu et al. (2012) propose a rotation toward periodic components in a remote sensing example with functions that cover multiple years with a distinct annual signal. Other methods from the multivariate factor rotation literature could be considered, but we have found no other suggested factor rotations that make use of the structure of functional data. In this paper, we propose a rotation toward maximally smooth principal components. These are the directions in which there is greatest predictability over time, and they are also more interpretable.

The factor rotation that we propose is derived from the definition of min/max autocorrelation factors (MAF) introduced by Shapiro and Switzer (1989) and Switzer and Green (1984) for the analysis of gridded multivariate data and parallel time series data, respectively. The principle underlying MAF is to find linear combinations of the original data that have maximum autocorrelation. This property of MAF is in contrast to PCA, which finds linear combinations that have maximum variance. Of particular importance for our setting is the fact that, when applied to parallel time series, a MAF analysis finds linear combinations of the data that are decreasingly smooth functions of time or, in other words, that are decreasingly predictable functions of time. In this vein, we show that a functional analog of MAF can be obtained that searches for the rotated components that have smallest integrated squared first derivative. We then demonstrate how this can be extended to any notion of smoothness given by a linear differential operator as defined in Ramsay and Silverman (2005). In our examples, we have developed our methods based on the numerical machinery in the fda package in R (see Ramsay et al. 2009), but they can be readily employed with alternative functional data representations.

In recent literature, interest in the application and theoretical properties of MAF has been increasing (Cunningham and Ghahramani 2014; Gallagher et al. 2014). Of particular relevance to the current work is the paper of Henderson et al. (2009) that compares PCA and MAF in the context of ToF-SIMS image interpretation. The authors conclude that MAF is more effective than PCA for the analysis of high signal intensity data. The importance of MAF for forecasting has been investigated by Woillez et al. (2009) in the context of fish stocks. The utility of MAF for forecasting correlated functional data is ongoing work.

The remainder of the paper is structured as follows: we derive the notion of maximally smooth rotation as an analog of maximally autocorrelated time series in Sect. 2 and describe the extension of these to any linear differential operator. The numerical implementation of this procedure using basis expansion methods is given in Sect. 3 and we demonstrate the effect of these methods in Sect. 4. We finish with some concluding remarks and further directions.

2 Maximal autocorrelation factor rotations (MAFR)

Our methods are developed on top of the maximum autocorrelation factors proposed in Switzer and Green (1984) for multivariate time series. Suppose that we have a multivariate time series \({\mathbf {x}}_1,\ldots ,{\mathbf {x}}_T\) in which the observed \({\mathbf {x}}_t\) at each time point is a vector. The maximally autocorrelated series is given by the vector of weights \({\mathbf {b}}\) such that \(\text{ cor }({\mathbf {b}}'{\mathbf {x}}_t, {\mathbf {b}}'{\mathbf {x}}_{t+1})\) is maximized. In order to apply this to functional data analysis, we re-interpret the criterion as

$$\begin{aligned} {\hat{{\mathbf {b}}}} = \mathop {\mathrm{argmin}}\limits _{{\mathbf {b}}} \frac{ \sum _{t=1}^{T-1} \left( {\mathbf {b}}' {\mathbf {x}}_{t+1} - {\mathbf {b}}' {\mathbf {x}}_t \right) ^2}{ {\mathbf {b}}' \left( \sum _{t=1}^{T} {\mathbf {x}}_t{\mathbf {x}}_t' \right) {\mathbf {b}}}. \end{aligned}$$

In a functional data analysis context, we consider \({\mathbf {x}}_t\) to derive from the evaluation of a vector of functions \({\mathbf {x}}(t)\) at times \(t = i(\Delta t)\) for \(i = 1,\ldots ,T\). After appropriate scaling by \(\Delta t\) and letting \(\Delta t \rightarrow 0\), we can re-represent this criterion as

$$\begin{aligned} {\hat{{\mathbf {b}}}} = \mathop {\mathrm{argmin}}\limits _{{\mathbf {b}}} \frac{ \int {\mathbf {b}}' \dot{{\mathbf {x}}}(t) \dot{{\mathbf {x}}}(t)' {\mathbf {b}}\mathrm{{d}}t }{ {\mathbf {b}}' \int {\mathbf {x}}(t) {\mathbf {x}}(t)' \mathrm{{d}}t {\mathbf {b}}}, \end{aligned}$$

where \(\dot{{\mathbf {x}}}(t)\) is the vector of time-derivatives of \({\mathbf {x}}(t)\).
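The discrete criterion above is a generalized Rayleigh quotient and can be minimized directly. The following is a minimal sketch in Python/NumPy (the function name and the two-column test series are illustrative only, and are not part of the fda-based implementation used in the paper):

```python
import numpy as np

def maf_direction(X):
    """Minimize sum_t (b'(x_{t+1} - x_t))^2 / (b' (sum_t x_t x_t') b)
    for a multivariate time series X (rows are time points) via a
    symmetric generalized eigenvalue problem."""
    D = np.diff(X, axis=0)                  # rows x_{t+1} - x_t
    A = D.T @ D                             # numerator matrix
    B = X.T @ X                             # denominator matrix
    # Whiten by B^{-1/2}, then take the eigenvector with the smallest
    # eigenvalue of B^{-1/2} A B^{-1/2}.
    w, V = np.linalg.eigh(B)
    B_inv_half = V @ np.diag(w ** -0.5) @ V.T
    lam, U = np.linalg.eigh(B_inv_half @ A @ B_inv_half)
    b = B_inv_half @ U[:, 0]                # smallest eigenvalue: smoothest
    return b / np.linalg.norm(b)

# A smooth sine alongside white noise: the MAF direction should load
# almost entirely on the smooth coordinate.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 200)
X = np.column_stack([np.sin(2 * np.pi * t), rng.normal(size=t.size)])
b = maf_direction(X)
```

The returned direction attains the minimum of the ratio over all unit vectors, so \({\mathbf {b}}'{\mathbf {x}}_t\) is the most slowly varying, and hence most autocorrelated, linear combination of the series.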

Here, we recognize the numerator as having the form of a classical first derivative smoothing penalty on the univariate function \(z(t) = {\mathbf {b}}' {\mathbf {x}}(t)\). In this spirit, we can more generally define a criterion by any linear differential smoothing operator L as in Ramsay and Silverman (2005). This allows us to define the MAFR criterion as

$$\begin{aligned} \text{ MAFR }_L({\mathbf {b}}) = \frac{ \int {\mathbf {b}}' L {\mathbf {x}}(t) L{\mathbf {x}}(t)' {\mathbf {b}}\mathrm{{d}}t }{ {\mathbf {b}}' \int {\mathbf {x}}(t) {\mathbf {x}}(t)' \mathrm{{d}}t {\mathbf {b}}}, \end{aligned}$$

where the operator L is a linear combination of derivatives:

$$\begin{aligned} L {\mathbf {x}}(t) = \frac{\mathrm{{d}}^k }{\mathrm{{d}}t^k} {\mathbf {x}}(t) + \sum _{j=0}^{k-1} a_j(t) \frac{\mathrm{{d}}^j}{\mathrm{{d}}t^j} {\mathbf {x}}(t). \end{aligned}$$

The most common choices for L correspond to the first and second derivatives, \(L {\mathbf {x}}(t) = \dot{{\mathbf {x}}}(t)\) or \(L {\mathbf {x}}(t) = \ddot{{\mathbf {x}}}(t)\), but more complex penalties can also be useful; we examine the harmonic acceleration penalty

$$\begin{aligned} L {\mathbf {x}}(t) = \frac{\mathrm{{d}}^3}{\mathrm{{d}}t^3} {\mathbf {x}}(t) + \left( \frac{2 \pi }{\omega } \right) ^2 \frac{\mathrm{{d}}}{\mathrm{{d}}t} {\mathbf {x}}(t) \end{aligned}$$

which defines smoothness in terms of sine and cosine functions with period \(\omega \) as well as constant shifts (see Ramsay and Silverman 2005).
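To see that this operator encodes the stated notion of smoothness, take the operator in the form \(L = \mathrm{{d}}^3/\mathrm{{d}}t^3 + (2\pi /\omega )^2\, \mathrm{{d}}/\mathrm{{d}}t\) (the form that annihilates period-\(\omega \) sinusoids, which we assume here). Then, for \(x(t) = \sin (2\pi t/\omega )\),

$$\begin{aligned} L \sin (2\pi t/\omega ) = -\left( \frac{2\pi }{\omega }\right) ^3 \cos (2\pi t/\omega ) + \left( \frac{2\pi }{\omega }\right) ^2 \frac{2\pi }{\omega } \cos (2\pi t/\omega ) = 0, \end{aligned}$$

with the same cancellation for \(\cos (2\pi t/\omega )\), while constants are annihilated by every derivative; the penalty is therefore exactly zero on the span of period-\(\omega \) sinusoids and constant shifts.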

We have written our criterion in terms of a collection of functions \({\mathbf {x}}(t)\) above, but this method is treated as a factor rotation to be applied following fPCA with a fixed number of components selected. Thus, below we will replace \({\mathbf {x}}\) with \({\varvec{\phi }}(t) = (\phi _1(t),\ldots ,\phi _K(t))\) to conform to more common notation. If the dimension of \({\mathbf {x}}(t)\) is allowed to grow, we will always be able to reduce the MAFR criterion by adding further components, yielding rotations in which \(L {\mathbf {b}}' {\mathbf {x}}\rightarrow 0\). This same phenomenon occurs for classical factor rotations in multivariate analysis when the number of variables increases, or in MAF with an increasing number of time series. Liu et al. (2012) found that applying VARIMAX rotations to a large number of principal components resulted in essentially uninterpretable results; similar comments may be made about the maximal autocorrelation factors in Switzer and Green (1984). Along similar lines, we expect that the trailing components after rotation will be the least interpretable, so there is a trade-off between increasing the smoothness of the leading components and allowing some variation to be absorbed into the remaining, less interpretable ones. These comments also apply to other factor rotations, although in the examples below, we find that the leading components are smoothed while the remaining ones are relatively unaffected.

3 Numerical implementation

In this section, we describe the numerical implementation of the factor rotation. This can be accomplished easily using the basis expansion methods in the fda package in R (Ramsay et al. 2013, 2009), but it relies only on our ability to obtain inner products of the derivatives of principal component functions.

We assume that a set of principal components \({\varvec{\phi }}(t)\) has been obtained from the data. Since these are orthonormal by definition, \(\int {\varvec{\phi }}(t)' {\varvec{\phi }}(t) \mathrm{{d}}t\) is the identity matrix, and thus the MAFR rotation corresponds to

$$\begin{aligned} {\hat{{\mathbf {b}}}} = \mathop {\mathrm{argmin}}\limits _{{\mathbf {b}}} {\mathbf {b}}' \left[ \int L{{\varvec{\phi }}}(t)' L{{\varvec{\phi }}}(t) \mathrm{{d}}t \right] {\mathbf {b}}, \hbox { subject to } {\mathbf {b}}' {\mathbf {b}}=1. \end{aligned}$$

By standard arguments, the solution to this problem is the eigenvector associated with the smallest eigenvalue of the matrix

$$\begin{aligned} P = \left[ \int L{\varvec{\phi }}(t)' L{\varvec{\phi }}(t) \mathrm{{d}}t \right] . \end{aligned}$$
(1)

We may define successive rotations \({\mathbf {b}}_2,\ldots ,{\mathbf {b}}_K\) by minimizing \(\text{ MAFR }_L({\mathbf {b}})\) subject to \({\mathbf {b}}_i' {\mathbf {b}}_j = I_{i=j}\). These are given by the successive columns of U in the eigendecomposition

$$\begin{aligned} P = UDU'. \end{aligned}$$

We can thus define a rotation to new components

$$\begin{aligned} {\varvec{\psi }}(t) = U' {\varvec{\phi }}(t). \end{aligned}$$

If, as is standard, the diagonal matrix D is ordered from largest to smallest eigenvalues, the final components of \({\varvec{\psi }}\) should be the smoothest. We observe that since both U and the \({\varvec{\phi }}\) are orthonormal, so are the \({\varvec{\psi }}\).

If we have an original set of curves represented in terms of principal component scores

$$\begin{aligned} x_i(t) = \sum _{j=1}^K s_{ij} \phi _j(t) = {\mathbf {s}}_i' {\varvec{\phi }}(t), \end{aligned}$$

then the score vector \({\mathbf {s}}_i\) can be re-represented in the basis defined by \({\varvec{\psi }}(t)\) in terms of \({\mathbf {t}}_{i} = U' {\mathbf {s}}_i\), so that \(x_i(t) = {\mathbf {t}}_i' {\varvec{\psi }}(t)\). If the variances of the original retained principal components are given in the diagonal matrix \(\Sigma \), the MAFR components have scores with associated covariance \(U' \Sigma U\).
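As a small numerical check of this re-representation (a sketch with a randomly generated orthogonal U, random score vectors, and a cosine basis standing in for \({\varvec{\phi }}\); none of these quantities come from the paper's data), note that invariance of the reconstruction \({\mathbf {s}}_i'{\varvec{\phi }} = {\mathbf {t}}_i'{\varvec{\psi }}\) requires \({\mathbf {t}}_i = U'{\mathbf {s}}_i\), under which the sample score covariance transforms to \(U'\Sigma U\):

```python
import numpy as np

rng = np.random.default_rng(1)
K, n = 4, 50

# Hypothetical ingredients: an orthogonal K x K rotation U, score
# vectors s_i (rows of S), and a basis phi evaluated on a grid.
U = np.linalg.qr(rng.normal(size=(K, K)))[0]
S = rng.normal(size=(n, K))
grid = np.linspace(0.0, 1.0, 101)
Phi = np.array([np.cos(2 * np.pi * k * grid) for k in range(K)])  # K x grid

# Rotated components psi = U' phi and rotated scores t_i = U' s_i
Psi = U.T @ Phi
T_scores = S @ U          # row i is t_i' = s_i' U

# Reconstruction on the grid is unchanged: s_i' phi = t_i' psi
assert np.allclose(S @ Phi, T_scores @ Psi)

# Sample covariance of the rotated scores is U' Sigma U
Sigma = np.cov(S, rowvar=False)
assert np.allclose(np.cov(T_scores, rowvar=False), U.T @ Sigma @ U)
```

Both identities hold exactly (up to floating point) because U is orthogonal, independent of the particular basis or scores used.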

We can summarize these calculations in the following pseudocode:

1. Initialize with a collection of curves \(x_1(t),\ldots ,x_n(t)\).

2. Obtain a functional principal components decomposition of these data (see e.g. Ramsay and Silverman 2005; Yao et al. 2005b), retaining \({\varvec{\phi }}(t) = (\phi _1(t),\ldots ,\phi _K(t))\) as the K leading components along with the scores \({\mathbf {s}}_1,\ldots ,{\mathbf {s}}_n\).

3. Calculate the \(K \times K\) matrix P in (1) and obtain its eigendecomposition \(P = UDU'\).

4. Represent the MAFR components as \({\varvec{\psi }}(t) = U'{\varvec{\phi }}(t)\) along with the MAFR scores \({\mathbf {t}}_i = U'{\mathbf {s}}_i\).
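These steps can be sketched numerically. The fragment below is an illustrative discretized version: grid-based inner products and a closed-form second-derivative operator stand in for the fda machinery, and the sinusoidal test basis is hypothetical. Note that np.linalg.eigh returns eigenvalues in ascending order, so here the smoothest rotated component comes first rather than last.

```python
import numpy as np

def mafr_rotation(LPhi, grid):
    """Build the penalty matrix P = int (L phi_i)(L phi_j) dt from the
    operator-transformed components L phi evaluated on a grid (rows of
    LPhi), then eigendecompose it.  Columns of U with small eigenvalues
    give the smoothest rotated components."""
    dt = grid[1] - grid[0]
    P = (LPhi @ LPhi.T) * dt          # Riemann-sum inner products
    lam, U = np.linalg.eigh(P)        # ascending eigenvalues
    return P, lam, U

# Illustrative basis: phi_k(t) = sqrt(2) sin((k+1) pi t) on [0, 1] is
# orthonormal, and higher k is rougher under a second-derivative penalty.
grid = np.linspace(0.0, 1.0, 2001)
K = 4
Phi = np.array([np.sqrt(2) * np.sin((k + 1) * np.pi * grid)
                for k in range(K)])
# Second derivatives, computed analytically: phi_k'' = -((k+1) pi)^2 phi_k
LPhi = np.array([-((k + 1) * np.pi) ** 2 * np.sqrt(2)
                 * np.sin((k + 1) * np.pi * grid) for k in range(K)])

P, lam, U = mafr_rotation(LPhi, grid)
Psi = U.T @ Phi                        # rotated components, smoothest first
```

Because this test basis already diagonalizes the penalty, U is close to the identity and the smallest eigenvalue is close to \(\pi ^4 = \int (\phi _1'')^2 \mathrm{{d}}t\); for genuine fPCA output, U would be a nontrivial rotation.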

4 Examples

4.1 A simulated experiment

We begin by experimenting with the effect of this rotation on simulated data in which rotation should help to capture a “true” set of leading principal component directions. For this simulation, we represented 100 curves via a Fourier basis expansion with bases

$$\begin{aligned} f_0(t) = 1, \ f_{2i}(t) = \sin (2 \pi i t), \ f_{2i+1}(t) = \cos (2 \pi i t) \end{aligned}$$

and simulated the coefficients of the first 25 such basis functions as independent normals with exponentially decreasing variance:

$$\begin{aligned} x_i(t) = \sum _{j = 0}^{24} c_{ij} f_j(t), \ c_{ij} \sim N(0, \exp (-j/4)). \end{aligned}$$

Under this framework, 10 principal components are required to capture 99 % of the variation in these data.
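This setup is easy to reproduce. The sketch below (the grid resolution and random seed are arbitrary choices, and the empirical component count will vary with the seed) generates the curves and counts the principal components needed for 99 % of the sample variation:

```python
import numpy as np

rng = np.random.default_rng(2)
n_curves, n_basis = 100, 25
grid = np.linspace(0.0, 1.0, 401)

def fourier_basis(t, J):
    """f_0 = 1, f_{2i} = sin(2 pi i t), f_{2i+1} = cos(2 pi i t)."""
    cols = [np.ones_like(t)]
    i = 1
    while len(cols) < J:
        cols.append(np.sin(2 * np.pi * i * t))
        if len(cols) < J:
            cols.append(np.cos(2 * np.pi * i * t))
        i += 1
    return np.array(cols)              # J x len(t)

F = fourier_basis(grid, n_basis)
# c_ij ~ N(0, exp(-j/4)): standard deviations exp(-j/8)
C = rng.normal(size=(n_curves, n_basis)) * np.exp(-np.arange(n_basis) / 8.0)
X = C @ F                              # curves evaluated on the grid

# Empirical number of components for 99% of the (grid-based) variation
svals = np.linalg.svd(X - X.mean(axis=0), compute_uv=False)
var_frac = np.cumsum(svals ** 2) / np.sum(svals ** 2)
n_comp = int(np.searchsorted(var_frac, 0.99) + 1)
```

Here variation is measured on the evaluation grid rather than through a quadrature-weighted inner product, which is adequate for this illustration.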

We employed a rotation based on minimizing the harmonic acceleration of the leading components and have plotted the original curves along with the first four and final two components in Fig. 1. We also plotted the VARIMAX components. Here we see there is a distinct smoothing of the leading components. Interestingly, the final MAFR component is more purely sinusoidal—with a higher frequency than the harmonic acceleration penalty—than its fPCA counterpart. Note that the MAFR components have been ordered by smoothness and this need not be in order of decreasing variance, although they largely track the original components. By contrast, the VARIMAX rotation spreads out variance among all components. One consequence of this is that the inclusion of more components can significantly change the VARIMAX rotation, while we would not necessarily expect this from MAFR.

Fig. 1

Results of factor rotation based on simulated data with sinusoidal functional principal components. Top row: data (left) and variance components for fPCA (solid circles), MAFR (stars), and VARIMAX (crosses) components. Second row: leading four fPCA (left), MAFR (center), and VARIMAX (right) components. Bottom row: ninth (left) and tenth (right) fPCA (solid); MAFR (dashed) and VARIMAX (dotted) components

These simulated data are intended as an illustrative example rather than as a quantitative investigation of the statistical properties of our method, and we do not pursue a simulation study here. MAFR by definition reduces the roughness of the leading principal components and can also be expected to reduce their variance. In this particular framework, it is also easy to show that rotating 25 principal components exactly recovers the original Fourier basis up to changes of sign and the order of sine and cosine pairs.

4.2 Electricity demand data

The electricity demand data are obtained from the R package fds (Shang and Hyndman 2013). The data comprise the half-hourly demand for electricity in Adelaide, Australia over the period 6/7/1997 to 31/3/2007. Electricity demand in Adelaide is highest in summer and winter. Interestingly, the variability in electricity demand is largest when temperature is high. For further information on the electricity demand data, the reader is referred to Magnano and Boland (2007) and Magnano et al. (2008). In our analysis, we restrict attention to the Monday demand for electricity and consider demand as a function of time of day. Figure 2 contains plots of the smoothed electricity demand versus time of day for each of the 508 Mondays over the period of observation.

Fig. 2

Smoothed electricity demand data (top) along with the fPCA (solid circles), MAFR (stars), and VARIMAX (crosses) variance components and fPCA (black), MAFR (dashed), and VARIMAX (dotted) functions. In this analysis, the number of fPCA components required to explain 99 % of the variation was retained

For this analysis, we applied a second derivative rotation to the first five principal component directions—these accounted for 99 % of variation in the data. The second derivative was chosen as a classical smoothness criterion—we seek leading components with low curvature. The remaining plots in Fig. 2 show the fPCA and MAFR components, where a smoothing effect is evident, particularly in the second component, while the later components largely retain their shape.

For these data, the leading MAFR component represents a nearly constant shift in demand level over the day, while the second component measures how quickly demand drops off in the afternoon. Component 3 provides a means to broaden or narrow the width of peak demand. Components 2 and 3 in particular are more interpretable as MAFR components than as their fPCA and VARIMAX counterparts. Component 4 measures early morning demand in all rotations, while component 5 emphasizes the specific change in demand between 6 am and 7 am in all rotations.

5 Discussion

This paper examines the development of factor rotations aimed specifically at providing more interpretable bases for the use of functional principal components. Our approach has been to find rotations that increase the smoothness of the leading principal components, and we find that we are able to provide smoother leading components without having a large effect on the rougher ones.

The proposed methods are distinct from methods which incorporate smoothing directly into a functional principal components analysis; see, for example, Silverman (1996). Here we fix the subspace onto which we will project our data and rotate within it to obtain a more interpretable representation, rather than changing the subspace so that the representation itself is smoother.

Our methods are also distinct from more classical factor rotation methods in that we target the smoothness of the factors in sequence rather than jointly. A joint rotation criterion could be obtained by considering a weighted sum of smoothing factors:

$$\begin{aligned} ({\mathbf {b}}_1,\ldots ,{\mathbf {b}}_k) = \mathop {\mathrm{argmin}}\limits \sum _{j=1}^k w_j \int \left[ L {\mathbf {b}}_j' {\varvec{\phi }}(t) \right] ^2 \mathrm{{d}}t, \hbox { subject to } {\mathbf {b}}_i' {\mathbf {b}}_j = I_{i=j} \end{aligned}$$

which is solved by the eigenvectors of \(W^{1/2} P W^{1/2}\) if W is a diagonal matrix with the \(w_j\) on the diagonal, and for which the MAFR rotation is a limiting case. However, this poses the problem of how to select the \(w_j\), and we argue that, even after deciding on a dimension, greatest attention is still paid to the leading components and these should be our focus in factor rotation. Our methods are also a variation on those of Liu et al. (2012), in that we define correlation with an orthogonal basis with respect to a smoothing norm.

While we have advanced some methods designed to improve the smoothness of principal component functions, we believe that there remains potential for the further development of factor rotations aimed at yielding interpretable bases specifically for functional data.