The present book expands SSA methodology in many different directions and unifies different approaches and modifications within the SSA framework. This chapter is introductory; it outlines the main principles and ideas of SSA, presents a unified view on SSA, reviews its computer implementation in the form of the Rssa package, and gives references to all data sources used. The chapter contains eight sections serving different objectives.

Section 1.1 describes the generic structure of all methods from the SSA family and introduces the main concepts essential for understanding different versions of SSA and for applying SSA efficiently in practice.

Section 1.2 classifies different versions of SSA. As explained in that section, there are two complementary directions in which versions of SSA can be created: one is related to geometrical features of the object \(\mathbb {X}\) and the other one is determined by the choice of the procedure of decomposition of the trajectory matrix into rank-one matrices. These two directions of variations of SSA are not related to each other so that any extension of SSA related to the geometry of \(\mathbb {X}\) can be combined with any procedure of decomposition of the trajectory matrix.

Section 1.3 discusses the concept of separability, which is the most theoretically important concept of SSA. Achieving separability (for example, of a signal from noise) is the key task of SSA in most applications. Correct understanding of this concept is therefore imperative for making a particular application of SSA reliable and efficient. We will be returning to separability in many discussions within the book.

Section 1.4 briefly introduces the main underlying model used to apply SSA to common problems such as forecasting, imputation of missing data, and monitoring structural stability of time series. In the one-dimensional case, this model assumes that a part of the series can be described by a linear recurrence relation and, in particular, by a sum of damped sinusoids. Estimation of the parameters of this model often constitutes the main objective in signal processing.

In Sect. 1.5, we give information about the best-known implementations of SSA, describe the general structure of the Rssa package, and discuss the efficiency of its implementation.

In Sect. 1.6 we briefly discuss the place of SSA among other methods of time series analysis, signal and image processing and provide a short overview of recent publications where a comparison of SSA with several traditional methods has been made.

In Sect. 1.7 we make a short historical survey of SSA, refer to recent applications of SSA and to papers that discuss combinations of SSA with other methods; we also list the main papers upon which a significant part of this book is based.

In Sect. 1.8 we make comments concerning installation of the Rssa package and describe the real-life data sets (taken from many different sources) which we have used in the book for illustrations. We provide the basic information about these data sets and specify their location. The corresponding references would help the reader to get more information about any of these data sets.

1.1 Generic Scheme of the SSA Family and the Main Concepts

We use SSA as a generic term for a family of methods based on a sequential application of the four steps schematically represented in Fig. 1.1 below and briefly described in the next section.

Fig. 1.1 SSA family: Generic scheme

1.1.1 SSA Methods

We define an SSA method (or simply SSA) as any method performing the four steps depicted in Fig. 1.1 and briefly described below. The input object \(\mathbb {X}\) is an ordered collection of N numbers (e.g., a time series or a digital image); the set of all such objects will be called the object space. Unless stated otherwise, the entries of \(\mathbb {X}\) are assumed to be real numbers, although a straightforward generalization of the main SSA method to the case of complex numbers is available, see Sect. 4.1.

Input: \(\mathbb {X}\), an ordered collection of N numbers.

Output: A decomposition of \(\mathbb {X}\) into a sum of identifiable components: \(\mathbb {X}=\widetilde {\mathbb {X}}_1+\ldots +\widetilde {\mathbb {X}}_m\).

Step 1: Embedding

The so-called trajectory matrix \(\mathbf{X}=\mathcal{T}(\mathbb{X})\) is constructed, where \(\mathcal{T}\) is a linear map transforming the object \(\mathbb {X}\) into an L × K matrix of certain structure. Let us denote the set of all possible trajectory matrices by \(\mathcal{H}\). The letter H is used to stress that these matrices have a Hankel-related structure.

As an example, in 1D-SSA (that is, SSA for the analysis of one-dimensional real-valued time series), \(\mathbb {X}=(x_1,\ldots ,x_N)\) and \(\mathcal{T}\) maps \(\mathsf{R}^N\) to the space of L × K Hankel matrices with equal values on the anti-diagonals, where N is the series length, L is the window length, which is a parameter, and K = N − L + 1:

$$\displaystyle \begin{aligned} \mathbf{X}=\mathcal{T}(\mathbb{X})= \begin{pmatrix} x_1 & x_2 & \cdots & x_K\\ x_2 & x_3 & \cdots & x_{K+1}\\ \vdots & \vdots & \ddots & \vdots\\ x_L & x_{L+1} & \cdots & x_N \end{pmatrix}. \end{aligned} $$

(1.1)
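The following minimal R sketch (ours, not part of the Rssa package; the function name trajectory_matrix is hypothetical) illustrates the embedding step (1.1).

```r
# Sketch of the 1D embedding step (1.1): build the L x K trajectory matrix
# of a series x; here K = N - L + 1.
trajectory_matrix <- function(x, L) {
  N <- length(x)
  K <- N - L + 1
  sapply(1:K, function(j) x[j:(j + L - 1)])  # column j is (x_j, ..., x_{j+L-1})
}
```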

Step 2: Decomposition of X into a Sum of Matrices of Rank 1

The result of this step is the decomposition

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathbf{X} = \sum_i {\mathbf{X}}_i,\quad {\mathbf{X}}_i = \sigma_i U_i V_i^{\mathrm{T}}, \end{array} \end{aligned} $$
(1.2)

where \(U_i \in \mathsf{R}^L\) and \(V_i \in \mathsf{R}^K\) are vectors such that \(\|U_i\|=1\) and \(\|V_i\|=1\) for all i, and the \(\sigma_i\) are non-negative numbers.

The main example of this decomposition is the conventional singular value decomposition (SVD) for real-valued matrices X. If this conventional SVD is used, then we call the corresponding SSA method “Basic SSA” (Golyandina et al. 2001; Chapter 1). Let \(\mathbf{S}=\mathbf{X}\mathbf{X}^{\mathrm{T}}\), let \(\lambda_1\geq\ldots\geq\lambda_L\geq 0\) be the eigenvalues of the matrix S, \(d= \mathop {\mathrm {rank}} \mathbf {X} = \max \{j:\,\lambda _j >0\}\), let \(U_1,\ldots,U_d\) be the corresponding eigenvectors, and let \(V_j={\mathbf {X}}^{\mathrm {T}} U_j/\sqrt {\lambda _j}\), j = 1, …, d, be the factor vectors. Denote \({\mathbf {X}}_j=\sqrt {\lambda _j}U_j V_j^{\mathrm {T}}\). Then the SVD of the trajectory matrix X can be written as

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathbf{X} = {\mathbf{X}}_1 + \ldots + {\mathbf{X}}_d. \end{array} \end{aligned} $$
(1.3)

The triple \((\sqrt {\lambda _j}, U_j, V_j)\), consisting of the singular value \(\sigma _j=\sqrt {\lambda _j}\), the left singular vector \(U_j\), and the right singular vector \(V_j\) of X, is called the jth eigentriple.
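As a toy illustration of the Decomposition step of Basic SSA, the following sketch (ours, using base R and the hypothetical trajectory_matrix function from the previous sketch) computes the eigentriples of a noisy sine wave with the conventional SVD.

```r
# Sketch: eigentriples of a noisy sine wave via the conventional SVD.
set.seed(1)
x <- sin(2 * pi * (1:100) / 12) + 0.1 * rnorm(100)
X <- trajectory_matrix(x, L = 50)
dec <- svd(X)
sigma <- dec$d   # singular values sqrt(lambda_j)
U <- dec$u       # eigenvectors (left singular vectors)
V <- dec$v       # factor vectors (right singular vectors)
```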

Step 3: Grouping

The input in this step is the expansion (1.2) and a specification of how to group the components of (1.2).

Let \(I = \{i_1,\ldots,i_p\} \subset \{1,\ldots,d\}\) be a set of indices. Then the resultant matrix \(\mathbf{X}_I\) corresponding to the group I is defined as \( {\mathbf {X}}_I= {\mathbf {X}}_{i_1}+\ldots +{\mathbf {X}}_{i_p}. \)

Assume that a partition of the set of indices {1, …, d} into m disjoint subsets \(I_1,\ldots,I_m\) is specified. Then the result of the Grouping step is the grouped matrix decomposition

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathbf{X}={\mathbf{X}}_{I_1}+\ldots+{\mathbf{X}}_{I_m}. \end{array} \end{aligned} $$
(1.4)

If only one subset, I, of {1, …, d} is specified, then we can still assume that a partition of {1, …, d} is provided: this is the partition consisting of two subsets, I and \(\bar I = \{1,\ldots ,d\} \setminus I\). In this case, \(\mathbf{X}_I\) is usually associated with the pattern of interest (for example, the signal) and \({\mathbf {X}}_{\bar I}= \mathbf {X} - {\mathbf {X}}_{I}\) can be treated simply as a residual.

The grouping of the expansion (1.2) in which \(I_k = \{k\}\) for each k is called elementary.

Step 4: Reconstruction

At this step, each matrix \({\mathbf {X}}_{I_k}\) from the decomposition (1.4) is transferred back to the form of the input object \(\mathbb {X}\). This transformation is performed optimally in the following sense: for a matrix \(\mathbf{Y} \in \mathsf{R}^{L\times K}\), we seek an object \(\widetilde{\mathbb{Y}}\) that minimizes \(\|\mathbf{Y}-\mathcal{T}(\widetilde{\mathbb{Y}})\|_{\mathrm{F}}\), where \(\|\mathbf {Z}\|{ }_{\mathrm {F}}=\left (\sum _{ij} |z_{ij}|{ }^2\right )^{1/2}\) is the Frobenius norm of \(\mathbf{Z}=[z_{ij}]\in \mathsf{R}^{L\times K}\).

Let \(\Pi_{\mathcal{H}}\) be the orthogonal projection from \(\mathsf{R}^{L\times K}\) onto \(\mathcal{H}\) in the Frobenius norm. Then \(\widetilde{\mathbb{Y}}=\mathcal{T}^{-1}(\Pi_{\mathcal{H}}\mathbf{Y})\). The projection \(\Pi_{\mathcal{H}}\) is simply the averaging of the entries corresponding to a given element of an object, see Sect. 1.1.2.6 for details. For example, in 1D-SSA the composite mapping \(\mathcal{T}^{-1}\circ\Pi_{\mathcal{H}}\) performs averaging along the anti-diagonals, so that \(\widetilde y_s=\frac{1}{|A_s|}\sum_{(i,j)\in A_s} y_{ij}\), where \(A_s=\{(i,j):\, i+j-1=s,\ 1\leq i\leq L,\ 1\leq j\leq K\}\).

Let \(\widehat {\mathbf {X}}_k = {\mathbf {X}}_{I_k}\) be the grouped matrices, \(\widetilde{\mathbf{X}}_k=\Pi_{\mathcal{H}}\widehat{\mathbf{X}}_k\) be the corresponding trajectory matrices, and \(\widetilde{\mathbb{X}}_k=\mathcal{T}^{-1}(\widetilde{\mathbf{X}}_k)\) be the reconstructed objects. Then the resulting decomposition of the initial object \(\mathbb {X}\) is

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \mathbb{X}=\widetilde{\mathbb{X}}_1+\ldots+\widetilde{\mathbb{X}}_m. \end{array} \end{aligned} $$
(1.5)

If the grouping is elementary, then the reconstructed objects \(\widetilde {\mathbb {X}}_k\) in (1.5) are called elementary components.

For convenience of referencing, Steps 1 and 2 of the generic SSA scheme are sometimes combined into the so-called “Decomposition stage” and Steps 3 and 4 are combined into “Reconstruction stage.”

1.1.2 The Main Concepts

1.1.2.1 Parameters of the SSA Methods

  1. Step 1:

    parameters of the linear map \(\mathcal{T}\). For a given object \(\mathbb {X}\), the trajectory matrix can be computed in different ways. In 1D-SSA, there is only one parameter in Step 1, the window length L.

  2. Step 2:

    no parameters if the conventional SVD is performed. Otherwise, if an alternative decomposition of X into a sum of rank-one matrices is used, there may be some parameters involved, see Sect. 1.2.1.

  3. Step 3:

    the parameter (or parameters) that defines the grouping.

  4. Step 4:

    no extra parameters.

1.1.2.2 Separability

A very important concept in the SSA methodology is separability. Let \(\mathbb {X}=\mathbb {X}_1+\mathbb {X}_2\). (Approximate) separability of \(\mathbb {X}_1\) and \(\mathbb {X}_2\) means that there exists a grouping such that the reconstructed object \(\widetilde {\mathbb {X}}_1\) is (approximately) equal to \(\mathbb {X}_1\). The representation \(\mathbb {X}=\mathbb {X}_1+\mathbb {X}_2\) can be associated with many different models such as “signal plus noise,” “trend plus the rest,” and “texture plus the main image.”

If \(\mathbb {X}=\mathbb {X}_1+\mathbb {X}_2\) and \(\mathbb {X}_1\) and \(\mathbb {X}_2\) are approximately separable, then SSA can potentially separate \(\mathbb {X}_1\) from \(\mathbb {X}_2\); that is, it can find a decomposition \(\mathbb {X}=\widetilde {\mathbb {X}}_1+\widetilde {\mathbb {X}}_2\) so that \(\widetilde {\mathbb {X}}_1 \approx \mathbb {X}_1\) and \(\widetilde {\mathbb {X}}_2 \approx \mathbb {X}_2\).

Consider, as an example, Basic SSA. Properties of the SVD yield that the (approximate) orthogonality of the columns and of the rows of the trajectory matrices \(\mathbf{X}_1\) and \(\mathbf{X}_2\) of \(\mathbb {X}_1\) and \(\mathbb {X}_2\) can be considered as natural separability conditions.

There is a well-elaborated theory of separability for one-dimensional time series (Golyandina et al. 2001; Sections 1.5 and 6.1). Many important decomposition problems, from noise reduction and smoothing to trend, periodicity, and signal extraction, can be solved by SSA. The success of 1D-SSA in separating separable objects is related to the simplicity of the Hankel structure of the trajectory matrices and the optimality features of the SVD.

We will come back to the important concept of separability in Sect. 1.3 where we define the main characteristic which is used in SSA for separability checking.

1.1.2.3 Information for Grouping

The theory of SSA shows how to detect, under the condition of separability, the components \((\sigma_i, U_i, V_i)\) in the decomposition (1.2) that are related to an object component with certain properties, and hence how to perform a proper grouping. One of the rules is that the \(U_i\) and \(V_i\) (eigenvectors and factor vectors in the case of Basic SSA) produced by an object component emulate the properties of this component. For example, in Basic 1D-SSA the eigenvectors produced by slowly varying series components are slowly varying, the eigenvectors produced by a sine wave are sine waves with the same frequency, and so on. These properties help to perform the grouping by visual inspection of the eigenvectors and also by some automatic procedures, see Sect. 2.7.
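For instance, with the Rssa package (introduced in Sect. 1.5), such visual inspection can be performed as sketched below; the ssa object s and the index range are illustrative placeholders.

```r
# Sketch: standard visual aids for grouping (s is an ssa object).
plot(s, type = "values")               # singular value spectrum
plot(s, type = "vectors", idx = 1:8)   # leading eigenvectors
plot(s, type = "paired", idx = 1:8)    # pairs of eigenvectors; sine waves show up as regular patterns
```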

1.1.2.4 Trajectory Spaces and Signal Subspaces

Let X be the trajectory matrix corresponding to some object \(\mathbb {X}\). The column (row) trajectory space of X is the linear subspace spanned by the columns (correspondingly, rows) of X. The term “trajectory space” usually means “column trajectory space.” The column trajectory space is a subspace of \(\mathsf{R}^L\), while the row trajectory space is a subspace of \(\mathsf{R}^K\). In general, for real-world data the trajectory spaces coincide with the corresponding Euclidean spaces, since the data are noisy. However, in the “signal plus noise” model, when the signal has a rank-deficient trajectory matrix, the signal trajectory space can be called the “signal subspace.” Both column and row signal subspaces can be considered; note that the dimensions of the row and column subspaces coincide.

1.1.2.5 Objects of Finite Rank

The class of objects that suit SSA is the class of so-called objects of finite rank. We say that an object (either a time series or an image) has L-rank r if the rank of its trajectory matrix is \(r<\min (L,K)\); that is, the trajectory matrix is rank-deficient. If the L-rank r does not depend on the choice of L for all sufficiently large object and trajectory matrix sizes, then we say that the object is of finite rank and has rank r, see Sect. 2.1.2 for rigorous definitions.

Since the trajectory matrices considered in SSA methods are either pure Hankel or consist of Hankel blocks, rank-deficient trajectory matrices are closely related to objects satisfying certain linear relations. These linear relations can be used for building forecasting methods. In the one-dimensional case, under some non-restrictive conditions, rank-deficient Hankel matrices are in one-to-one correspondence with the linear recurrence relations (LRRs) of the form

$$\displaystyle \begin{aligned}x_n= a_1 x_{n-1}+ \ldots +a_r x_{n-r}\end{aligned}$$

and therefore are related to the time series which can be expressed as sums of products of exponentials, polynomials, and sinusoids, see Sect. 2.1.2.2.

Each specific SSA extension produces a class of specific objects of finite rank. Knowledge of the ranks of finite-rank objects can help to recognize the rank-one components needed for the reconstruction of a given component. For example, in order to reconstruct an exponential trend in the one-dimensional case, we need to group only one rank-one component (the exponential function has rank 1), while to reconstruct a sine wave we generally need to group two SVD components (the rank of a sine wave equals 2).

Real-life time series and images are generally not of finite rank. However, if a given object \(\mathbb {X}\) is a sum of a signal of finite rank and noise, then, in view of approximate separability, SSA may be able to approximately extract the signal and subsequently use the methods that are designed for series of finite rank.

1.1.2.6 Reconstruction (Averaging)

Let us formally describe the operation of reconstruction of a matrix used in Step 4 of the generic scheme described in Sect. 1.1.1. By analogy with the one-dimensional case this operation can also be called “averaging over diagonals,” even though the averaging may be performed over more complicated patterns.

Assume that the entries \(x_\tau\) of the object \(\mathbb {X} =\{ x_\tau \} \) are indexed by \(\tau\), which can be a positive integer (for one-dimensional series) or a multi-index (for digital images).

The linear map \(\mathcal{T}\) is a one-to-one transformation of the object space onto \(\mathcal{H}\), the set of L × K matrices of a specified structure. It puts the elements of \(\mathbb {X}\) into certain places of the matrix \(\mathbf{X}=\mathcal{T}(\mathbb{X})\).

Let \(\mathbb{E}_\tau\) be the object with 1 as the τth entry and all other entries zero. Define the set of indices

$$\displaystyle \begin{aligned} A_\tau=\{(i,j):\,({\mathbf{E}}_\tau)_{ij}=1\}, \end{aligned} $$

where \({\mathbf{E}}_\tau=\mathcal{T}(\mathbb{E}_\tau)\) is the matrix corresponding to \(\mathbb{E}_\tau\).

If τ is the place of an element \(x_\tau \in \mathbb {X}\), then \((\mathbf{X})_{ij}=x_\tau\) for all \((i,j)\in A_\tau\).

Assume now that \(\widehat {\mathbf {X}} \in {\mathsf {R}^{L \times K}}\) is an arbitrary L × K matrix and we need to compute

$$\displaystyle \begin{aligned} \widetilde{\mathbb{X}}=\mathcal{T}^{-1}\big(\Pi_{\mathcal{H}}\widehat{\mathbf{X}}\big) \end{aligned} $$

by first making the orthogonal projection of \(\widehat {\mathbf {X}}\) onto the set \(\mathcal{H}\) and then writing the result in the object space. This operation is the extension of the “diagonal averaging” procedure applied in 1D-SSA: the elements \(\widetilde x_\tau \) of \(\widetilde {\mathbb {X}}\) are computed by the formula

$$\displaystyle \begin{aligned} \widetilde x_\tau = \langle \widehat{\mathbf{X}}, {\mathbf{E}}_\tau \rangle_{\mathrm{F}} \big/ w_\tau, \end{aligned} $$

where \(w_\tau=\|{\mathbf{E}}_\tau\|{}_{\mathrm{F}}^2\) is the number of elements in the set \(A_\tau\) and the Frobenius inner product 〈⋅, ⋅〉F is defined by

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \langle \mathbf{Z}, \mathbf{Y} \rangle_{\mathrm{F}} = \sum_{i, j} z_{ij} y_{ij}\, . \end{array} \end{aligned} $$
(1.6)
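In the 1D case the formula above reduces to averaging along anti-diagonals, as in the following sketch (ours, written for clarity rather than efficiency; the function name diag_average is hypothetical).

```r
# Sketch: diagonal averaging, the 1D case of the reconstruction formula:
# each series entry is the mean of the matrix entries on its anti-diagonal.
diag_average <- function(Y) {
  L <- nrow(Y); K <- ncol(Y); N <- L + K - 1
  x <- numeric(N); w <- numeric(N)
  for (i in 1:L) for (j in 1:K) {
    tau <- i + j - 1           # anti-diagonal index of entry (i, j)
    x[tau] <- x[tau] + Y[i, j]
    w[tau] <- w[tau] + 1       # w[tau] = |A_tau|
  }
  x / w
}
```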

1.2 Different Versions of SSA

Let us consider how the four steps of the generic scheme of SSA formulated in Sect. 1.1 can vary for different versions of SSA.

  1. Step 1:

    the form of the object \(\mathbb {X}\), and hence the specific form of the embedding operator \(\mathcal{T}\), strongly influences what a particular version of SSA looks like.

  2. Step 2:

    not only the conventional SVD but many other decompositions of X into rank-one matrices could be used.

  3. Step 3:

    formally, this step is exactly the same for all versions of SSA although the tools used to perform the grouping may differ.

  4. Step 4:

    the embedding operator defined in Step 1 fully determines the operations performed at this step.

Therefore, we have two directions for creating different versions of SSA: the first direction is related to the geometrical features of the object \(\mathbb {X}\) and the form of the embedding operator \(\mathcal{T}\), while the second direction is determined by the form of the decomposition at Step 2. Essentially, Step 1 is determined by the form of the object; therefore, its variations can be considered as extensions of 1D-SSA. If instead of the SVD we use some other decomposition of X into rank-one matrices at Step 2, then we call the corresponding algorithm a modification of Basic SSA. These two directions of variation of SSA are not related to each other, so that any extension of Step 1 can be combined with any modification of Step 2.

We start the discussion by outlining some modifications that can be offered for use at Step 2.

1.2.1 Decomposition of X into a Sum of Rank-One Matrices

1.2.1.1 Variations of SSA Related to Methods of Decomposition

The conventional SVD formulated in the description of SSA in Sect. 1.1.1 is a decomposition of X into a sum of rank-one matrices, which has some optimality properties, see Golyandina et al. (2001; Chapter 4). Therefore, Basic SSA , which is SSA with the conventional SVD used at Step 2, can be considered as the most fundamental version of SSA among all SSA methods.

Let us enumerate several variations of SSA, which could be useful for answering different questions within the framework of SSA.

A well-known modification of Basic SSA is Toeplitz SSA (Sect. 2.2), which was created for dealing with stationary time series. This modification is devised for the analysis of a natural estimate of the auto-covariance matrix of the original time series and assumes that this time series is stationary. However, if the time series \(\mathbb {X}\) is non-stationary, then the reconstruction obtained by Toeplitz SSA can have a considerable bias.

An important variation of SSA is SSA with projection (Sect. 2.3). If we have a parametric model (which should be linear in parameters and consistent with the finite-rank assumption) for one of the components of the series, such as the trend of a one-dimensional series, then a projection on a suitable subspace is performed and is followed by a decomposition of the residual, e.g., by the SVD. The known methods of SSA with centering and SSA with double centering, for extraction of constant and linear trends respectively, are special cases of SSA with projection. More generally, an arbitrary polynomial trend can be extracted by a suitable version of SSA with projection. Another use of SSA with projection is to build a subspace from a supporting series and project the main series onto this subspace.

In some versions of SSA the intention is to improve the separability properties of the SVD. If we use an oblique version of the SVD, then the resulting SSA method becomes Oblique SSA. The following two versions of Oblique SSA seem to be useful in practice, namely Iterative Oblique SSA (Sect. 2.4) and Filter-adjusted Oblique SSA (Sect. 2.5). The latter is useful for the separation of components with equal contributions.

1.2.1.2 Nested Application of Different Versions of SSA

Since Oblique SSA does not have good approximating features, it cannot replace Basic SSA, which uses the conventional SVD. Therefore, Oblique SSA should be used in a nested manner: Basic SSA is used first to extract several components, without attempting a careful split of these components, and then one of the proposed oblique methods is used for separating the mixed components.

If we use Basic SSA for denoising and then some other version of SSA (like Independent Component Analysis or Oblique SSA) for improvement of separability, then we can interpret this as if using Basic SSA for preprocessing and using another method for a more refined analysis. There is, however, a significant difference between this and some methods mentioned in Sect. 1.7.3, where Basic SSA is used for preprocessing and then ARIMA or other methods of different nature are employed. Indeed, when we use Basic SSA for denoising and some other SSA technique like Oblique SSA for improvement of separability, then we are using the signal subspace estimated by Basic SSA rather than the estimated signal itself (recall that in the transition from the estimated signal subspace to the estimated signal we incur an additional error).

Let us schematically demonstrate the nested use of the methods as follows. Let \(\mathbb {X}={\mathbb {X}}^{(1)}+{\mathbb {X}}^{(2)}+{\mathbb {X}}^{(3)}\) be a decomposition of the time series and \(\mathbf{X}={\mathbf{X}}^{(1)}+{\mathbf{X}}^{(2)}+{\mathbf{X}}^{(3)}\) be the corresponding decomposition of the trajectory matrix of \(\mathbb {X}\). Let Basic SSA return at the Decomposition stage \(\mathbf {X}=\widetilde {\mathbf {X}}^{(1,2)}+\widetilde {\mathbf {X}}^{(3)}\) and assume that a chosen nested method makes the decomposition \(\widetilde {\mathbf {X}}^{(1,2)}=\widetilde {\mathbf {X}}^{(1)}+\widetilde {\mathbf {X}}^{(2)}\). Then the final result is \(\mathbf {X}=\widetilde {\mathbf {X}}^{(1)}+\widetilde {\mathbf {X}}^{(2)}+\widetilde {\mathbf {X}}^{(3)}\) and, after the diagonal averaging, \(\mathbb {X}=\widetilde {\mathbb {X}}^{(1)}+\widetilde {\mathbb {X}}^{(2)}+\widetilde {\mathbb {X}}^{(3)}\). There is no need for reconstruction of the signal by Basic SSA, as only the estimated signal subspace is used for making the refined decomposition.

1.2.1.3 Features of Decompositions

The result of the Decomposition step of SSA (Step 2) can be written in the form (1.2). The SVD is a particular case of (1.2) and corresponds to orthonormal systems \(\{U_i\}\) and \(\{V_i\}\). By analogy with the SVD, we will call \((\sigma_i, U_i, V_i)\) eigentriples, \(\sigma_i\) singular values, \(U_i\) left singular vectors or eigenvectors, and \(V_i\) right singular vectors or factor vectors. For most SSA decompositions, each \(U_i\) belongs to the column space of X while each \(V_i\) belongs to the row space of X. We shall call such decompositions consistent.

If the systems \(\{U_i\}\) and \(\{V_i\}\) are linearly independent, then the decomposition (1.2) is minimal; that is, it has the smallest possible number of terms, equal to \(r= \mathop {\mathrm {rank}} \mathbf {X}\). If at least one of the systems \(\{U_i\}\) or \(\{V_i\}\) is not linearly independent, then the decomposition (1.2) is not minimal. If the decomposition (1.2) is not consistent, then it can be non-minimal even if \(\{U_i\}\) and \(\{V_i\}\) are linearly independent, since their projections on the column (or row) space can be dependent.

If both vector systems \(\{U_i\}\) and \(\{V_i\}\) are orthogonal, then the decomposition (1.2) is called biorthogonal. If \(\{U_i\}\) is orthogonal, then the decomposition is called left-orthogonal; if \(\{V_i\}\) is orthogonal, then the decomposition is called right-orthogonal.

If the matrices \(\mathbf{X}_i\) are F-orthogonal, that is, \(\langle\mathbf{X}_i,\mathbf{X}_j\rangle_{\mathrm{F}}=0\) for i ≠ j, then we say that the corresponding decomposition is F-orthogonal. Either left or right orthogonality is sufficient for F-orthogonality. For F-orthogonal decompositions (1.2), \(\|\mathbf {X}\|{ }_{\mathrm {F}}^2 = \sum _i \|{\mathbf {X}}_i\|{ }_{\mathrm {F}}^2\). In general, however, \(\|\mathbf {X}\|{ }_{\mathrm {F}}^2\) may differ from \(\sum _i \|{\mathbf {X}}_i\|{ }_{\mathrm {F}}^2\).

The contribution of the kth matrix component \(\mathbf{X}_k\) is defined as \( \sigma _k^2 / \|\mathbf {X}\|{ }_{\mathrm {F}}^2\), where \(\sigma _k^2= \|{\mathbf {X}}_k\|{ }_{\mathrm {F}}^2\). For F-orthogonal decompositions, the sum of the component contributions is equal to 1. Otherwise, this sum can differ considerably from 1 (e.g., the sum of component contributions can be 146%).

1.2.1.4 Decompositions in Different Versions of SSA

Let us list several versions of 1D-SSA that are implemented in the Rssa package and are based on different procedures used at the Decomposition step, and indicate their features. Some of these variations are also implemented for the multivariate and multidimensional versions of SSA.

Basic SSA: the conventional SVD; a consistent, minimal, biorthogonal and therefore F-orthogonal decomposition. Implemented in ssa with kind="1d-ssa".

Toeplitz SSA: generally, a non-consistent, non-minimal F-orthogonal decomposition. Implemented in ssa with kind="toeplitz-ssa".

SSA with projection: an F-orthogonal but non-consistent decomposition if at least one basis vector used for the projection does not belong to the column (row) trajectory space. The components obtained by projections are located at the beginning of the decomposition and have indices \(1,\ldots,n_{\mathrm{special}}\). Implemented in ssa with kind="1d-ssa" and non-NULL row.projector or column.projector arguments.

Oblique SSA with filter preprocessing (Filter-adjusted O-SSA): a consistent, minimal F-orthogonal decomposition. The main particular case is DerivSSA. Implemented in fossa.

Iterative Oblique SSA (Iterative O-SSA): a consistent, minimal oblique decomposition. Implemented in iossa.

Oblique versions of SSA are designed to be used in a nested manner (see Sect. 1.2.1.2).
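The sketch below shows how these decomposition variants are typically invoked in Rssa; the series x, the window length, and the nested groups are illustrative placeholders.

```r
# Sketch: invoking the decomposition variants listed above (illustrative arguments).
s1 <- ssa(x, L = 50, kind = "1d-ssa")              # Basic SSA
s2 <- ssa(x, L = 50, kind = "toeplitz-ssa")        # Toeplitz SSA
s3 <- ssa(x, L = 50, kind = "1d-ssa",
          column.projector = "centering")          # SSA with projection
s4 <- fossa(s1, nested.groups = list(1:2, 3:4))    # Filter-adjusted O-SSA (nested)
s5 <- iossa(s1, nested.groups = list(1:2, 3:4))    # Iterative O-SSA (nested)
```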

1.2.2 Versions of SSA Dealing with Different Forms of the Object

In this section, we briefly consider different versions of SSA which operate on objects \(\mathbb {X}\) of different forms. The main difference between the versions of SSA considered in this section is the form of the embedding operator \(\mathcal{T}\).

As has been mentioned above, SSA can be applied to multivariate and multidimensional objects. SSA for a system of series is called Multivariate (or Multichannel) SSA, MSSA for short (Sect. 4.2); SSA for digital gray-scale images is called 2D-SSA (Sect. 5.1).

Complex SSA (Sect. 4.1) is a special version of SSA for the analysis of two time series of equal length or a single one-dimensional complex-valued time series.

Shaped SSA (Sects. 2.6 and 5.2) can process data of complex structure and arbitrary shape; the dimension of the object \(\mathbb {X}\) is irrelevant. Shaped SSA can be applied to many different kinds of data including time series, systems of time series, and digital images of rectangular and non-rectangular shape.

For both series and images, circular versions of SSA are available. For series, circular SSA works in the metric of a circle and therefore this version is suitable for series that are indeed defined on a circle.

For images, circular versions of SSA provide the possibility to decompose images given on a cylinder (for example, obtained as a cylindrical projection of a sphere or an ellipsoid) or given on a torus. Circular versions allow one to eliminate the edge effects, which are unavoidable in the case of, e.g., a planar unfolding of a cylinder.

Table 1.1 contains a list of the extensions considered in this book.

Table 1.1 Classification of different versions of SSA based on different geometrical features of the object \( \mathbb {X}\)

1.3 Separability in SSA

In this section, we discuss the SSA separability in more detail; see also Sects. 2.1.3, 2.3.3, 2.4.2, 2.5.2 for special cases.

Let us assume that we observe a sum of two objects \(\mathbb {X}=\mathbb {X}^{(1)}+\mathbb {X}^{(2)}\). We say that SSA separates these two objects if a grouping at the Grouping step (Step 3) can be found such that \(\widetilde {\mathbb {X}}^{(k)}=\mathbb {X}^{(k)}\) for k = 1, 2. If these equalities hold approximately, then this defines approximate separability. Asymptotic separability can be introduced by letting the series length N → ∞; in this case, approximate separability takes place for large enough series lengths. The separability property is very important for SSA as it means that the method potentially works; that is, it is able to extract the object components.

In Basic SSA and its multidimensional extensions, (approximate) separability means (almost) orthogonality of the object components, since the biorthogonal SVD decomposition is used. In other versions of SSA, different conditions of separability can be formulated.

If the decomposition (1.2) at Decomposition step is not unique, then two variants of separability are introduced. Weak separability means that there exists a decomposition such that after grouping we obtain \(\widetilde {\mathbb {X}}^{(k)}=\mathbb {X}^{(k)}\). Strong separability means that this equality is achievable for any admissible decomposition.

Conditions of exact separability are very restrictive whereas asymptotic separability takes place for a wide range of object components. For example, slowly varying smooth components are asymptotically separable from regular oscillations and they both are asymptotically separable from noise.

In order to verify separability of the reconstructed components \(\widetilde {\mathbb {X}}_1\) and \(\widetilde {\mathbb {X}}_2\), we should check orthogonality of their reconstructed trajectory matrices \(\widetilde {\mathbf {X}}_1\) and \(\widetilde {\mathbf {X}}_2\). A convenient measure of their orthogonality is the Frobenius inner product \(\langle \widetilde {\mathbf {X}}_1,\widetilde {\mathbf {X}}_2\rangle _{\mathrm {F}}\) defined in (1.6).

The normalized measure of orthogonality is

$$\displaystyle \begin{aligned}\rho(\widetilde{\mathbf{X}}_1,\widetilde{\mathbf{X}}_2)=\langle \widetilde{\mathbf{X}}_1,\widetilde{\mathbf{X}}_2\rangle_{\mathrm{F}}/ (\|\widetilde{\mathbf{X}}_1\|{}_{\mathrm{F}} \|\widetilde{\mathbf{X}}_2\|{}_{\mathrm{F}})\, .\end{aligned}$$

Since each element \(x_\tau\) of the initial ordered object enters the trajectory matrix several times, we can introduce the weighted inner product in the space of objects: \((\mathbb {Y},\mathbb {Z})_{\mathbf {w}}=\sum _\tau w_\tau y_\tau z_\tau \), where \(w_\tau\) is the number of entries of the trajectory matrix occupied by \(x_\tau\) (see Sect. 1.1.2.6). Then the quantity

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} \rho_{\mathbf{w}}(\widetilde{\mathbb{X}}_1,\widetilde{\mathbb{X}}_2) = \rho(\widetilde{\mathbf{X}}_1,\widetilde{\mathbf{X}}_2)=\frac{(\widetilde{\mathbb{X}}_1,\widetilde{\mathbb{X}}_2)_{\mathbf{w}}} {\|\widetilde{\mathbb{X}}_1\|{}_{\mathbf{w}} \|\widetilde{\mathbb{X}}_2\|{}_{\mathbf{w}}} \end{array} \end{aligned} $$
(1.7)

will be called the w-correlation, by analogy with statistics. Note, however, that in this definition the means are not subtracted.

Let \(\widetilde {\mathbb {X}}_j\) be the elementary reconstructed components produced by the elementary grouping \(I_j=\{j\}\). Then the matrix of \(\rho ^{(\mathbf {w})}_{ij}= \rho _{\mathbf {w}}(\widetilde {\mathbb {X}}_i,\widetilde {\mathbb {X}}_j)\) is called the w-correlation matrix.

The norm \(\|\cdot\|{}_{\mathbf{w}}\) is called the weighted norm and serves as a measure of the contribution of the components in the decomposition (1.5): the contribution of \(\widetilde {\mathbb {X}}_k\) is defined as \(\|\widetilde {\mathbb {X}}_k\|{ }_{\mathbf {w}}^2\big /\|\mathbb {X}\|{ }_{\mathbf {w}}^2\).

If the weighted correlation between a pair of elementary components is large, then this suggests that these two components are highly correlated and should perhaps be included into the same group.
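In Rssa, the w-correlation matrix of the elementary components can be computed and inspected as in the following sketch (the ssa object s and the index range are illustrative).

```r
# Sketch: w-correlations between elementary reconstructed components.
w <- wcor(s, groups = 1:20)
plot(w)   # blocks of highly w-correlated components suggest which indices to group
```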

1.4 Forecasting, Interpolation, Low-Rank Approximation, and Parameter Estimation in SSA

There is a class of objects which is special for SSA: the class of objects satisfying linear recurrence relations. Trajectory matrices of these objects are rank-deficient and, moreover, for these objects the number of non-zero terms in the SVD (1.3) does not depend on the window length if this length is sufficiently large; we will say in such cases that the objects are of finite rank. The class of objects satisfying linear recurrence relations provides a natural model of the signal for SSA, which is fundamentally important for forecasting of time series.

A linear recurrence relation (LRR) for a time series \(\mathbb {S}_N=(s_i)_{i=1}^{N}\) is a relation of the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} s_{i+t}=\sum_{k=1}^t a_k s_{i+t-k},\ 1\leq i\leq N-t,\ a_t\neq 0, \ t<N. \end{array} \end{aligned} $$
(1.8)

It is well known (see, e.g., Hall (1998; Theorem 3.1.1)) that a sequence \(\mathbb {S}_{\infty }=(s_1,\ldots ,s_n,\ldots )\) satisfies the LRR (1.8) for all i if and only if, for some integer p, we have for all n

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} s_n = \sum_{k=1}^p P_k(n) \mu_k^n, \end{array} \end{aligned} $$
(1.9)

where \(P_k(n)\) are polynomials in n and \(\mu_k\) are some complex numbers. For real-valued time series, (1.9) implies that the class of time series governed by LRRs consists of sums of products of polynomials, exponentials, and sinusoids.

A simplified model of (1.9) is \(s_n = \sum \limits _{k=1}^p c_k \mu _k^n\). Estimation of the complex numbers \(\mu _k=\rho _k e^{\mathfrak {i} 2\pi \omega _k}\) is equivalent to estimating the frequencies \(\omega_k\) and the rates \(\ln \rho _k\).
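As a simple worked example (ours, added for illustration): an exponentially damped sinusoid corresponds in (1.9) to two conjugate roots \(\mu_{1,2}=\rho e^{\pm\mathfrak{i} 2\pi\omega}\) with constant polynomials \(P_k\), and it is governed by an LRR of order two,

$$\displaystyle \begin{aligned} s_n = A\rho^n\sin(2\pi\omega n+\phi) \quad\Longrightarrow\quad s_n = 2\rho\cos(2\pi\omega)\, s_{n-1} - \rho^2\, s_{n-2}, \end{aligned} $$

which agrees with the fact mentioned in Sect. 1.1.2.5 that the rank of a sine wave equals 2.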

For images \(\mathbb {S}=[s_{mn}]\), LRRs are two-dimensional and the common term of the signal (which can be called a pattern) under the model has the form

$$\displaystyle \begin{aligned} \begin{array}{rcl} {} s_{mn} = \sum_{k=1}^p P_k(m,n) \mu_k^m \nu_k^n. \end{array} \end{aligned} $$
(1.10)

This important fact has been well known since Kurakin et al. (1995; §2.20).

In many real-life problems, a noisy signal (or noisy pattern for images) is observed and the problem is to forecast the signal, impute gaps in the signal, estimate signal parameters, find change-points in the signal, and so on. Note that it is not compulsory to assume that the noise is random. In a very general sense, noise is a residual which does not require further investigation.

SSA may provide estimates of the signal subspace; that is, the subspace spanned by the basis vectors chosen at the Grouping step of SSA. The estimation of the signal subspace can also be performed by iterations (so-called Cadzow iterations), consisting of repeated SSA processing, see Sect. 3.4.

In the process of estimating the signal subspace we also obtain a parametric estimate of the signal; that is, estimates of the set \(\{\mu_k\}\) in the 1D case and of the set \(\{(\mu_k, \nu_k)\}\) in the 2D case. One of the most common methods is called ESPRIT for series and 2D-ESPRIT for images (see Sects. 3.1.1.2 and 5.3).
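In Rssa, such subspace-based parameter estimation is available through parestimate, as in the following sketch (the ssa object s and the choice of the group 1:2 are illustrative).

```r
# Sketch: ESPRIT-based estimation of the signal roots mu_k from an ssa object.
pe <- parestimate(s, groups = list(1:2), method = "esprit")
print(pe)   # prints the estimated periods and moduli (damping factors) of the roots
```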

For series, sequential on-line estimation of the signal structure produces natural algorithms of subspace tracking (e.g., for monitoring the structural stability of an object changing in time) and change-point detection. Also, we can forecast the found structure, e.g., by applying the constructed LRR. By performing interpolation (that is, forecasting inside the series), missing data can be filled in.

1.5 The Package

1.5.1 SSA Packages

There are many implementations of SSA. They differ in potential application areas, implemented methods, interactive or non-interactive form, free or commercial use, supported computer systems (Windows, Unix, Mac), and level of reliability and support. The best-known supported software packages implementing SSA are the following:

  1. http://gistatgroup.com: general-purpose interactive “Caterpillar”-SSA software (Windows) following the methodology described in Golyandina et al. (2001), Golyandina and Zhigljavsky (2013);

  2. http://www.atmos.ucla.edu/tcd/ssa: the SSA-MTM Toolkit for spectral analysis (Ghil et al. 2002) (Unix), oriented mainly at climatic applications, and its commercial extension kSpectra Toolkit (Mac); both are interactive;

  3. The commercial statistical software SAS has an econometric extension called SAS/ETS®, which includes SSA in its rather basic form; this version of SSA is based on the methodology of Golyandina et al. (2001).

  4. http://cran.r-project.org/web/packages/Rssa: the R package Rssa (Korobeynikov et al. 2017), an implementation of the main SSA procedures for major platforms, extensively developed.

We consider the Rssa package as an efficient implementation of a large variety of the main SSA algorithms. This package also contains many visual tools which are useful for making a proper choice of SSA parameters and for examination of the results.

Rssa is the only SSA package available from CRAN and we believe it is the fastest implementation of SSA. Another important feature of the package is its very close relation to the SSA methodology thoroughly described in Golyandina et al. (2001), Golyandina and Zhigljavsky (2013), Golyandina et al. (2015). As a result, the use of the package is well supported both theoretically and methodologically.

1.5.2 Tools for Visual Control and Choice of Parameters

SSA needs tools to help choose the method parameters and to control the results. To a great extent, SSA is an exploratory technique; hence visual tools are vital and they are extensively used in Rssa. For example, in order to help choose the groups in (1.4), Rssa allows plotting measures of separability of series components in the obtained decompositions.

The tools for accuracy control are divided into two groups. First, the stability of results with respect to parameter changes can be checked. Second, a bootstrap procedure can be used: a model (not necessarily parametric) of the series or other object is built on the basis of the signal reconstruction, and the accuracy of this model is then assessed by simulation according to the estimated model. In short, Rssa allows the user to enjoy a large variety of graphical tools and bootstrap procedures.

1.5.3 Short Introduction to Rssa

The Rssa package implements all methods and tools mentioned above.

The main function is ssa, which initializes an ssa object and by default performs the decomposition (the particular decomposition method depends on the chosen version of SSA). Together with reconstruct, it implements the SSA method. For nested versions, iossa and fossa serve for refined decompositions.

An ssa object s contains, among other things, the elements of the decomposition (1.2), which can be accessed as s$sigma, s$U, and s$V. The features of the decompositions differ for different versions of SSA (see Sect. 1.2.1.4). For Basic SSA, s$sigma are called singular values; squares of s$sigma are called eigenvalues; s$U are called eigenvectors. (We keep these names for other versions of SSA as well.) The relative contributions of components to the decomposition can be obtained as contributions(s); see Sect. 1.2.1.3, where formulas for their calculation are given and explained.

A variety of plot functions help to visualize the results and additional information. The functionality of SSA-related methods is supplemented by the functions forecast, parestimate, and some others.

All essential versions of SSA are implemented in Rssa, but not all further actions, such as forecasting and gap filling, are available for all implemented versions of SSA. The user can check the ssa object, which is returned by the main function ssa, for compatibility with the function ssa.capabilities. This function returns a list of capabilities marked TRUE or FALSE, respectively.

A general scheme of investigation by means of Rssa is as follows (a minimal code sketch following this scheme is given after the list):

  1. perform decomposition by ssa;

  2. visualize the result by plot;

  3. if necessary, refine the decomposition by iossa or fossa;

  4. again, visualize the result by plot;

  5. perform grouping based on the obtained visual and numerical information; in particular, choose the group of signal components;

  6. then perform one of the following actions: reconstruction of series components by reconstruct, forecasting by forecast, parameter estimation by parestimate;

  7. visualize the result by plot.
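A minimal end-to-end session following this scheme might look as follows; the data set (the built-in R series co2), the window length, and the grouping of component 1 as the trend and components 2 and 3 as the seasonality are purely illustrative.

```r
library(Rssa)
s <- ssa(co2, L = 120)                        # steps 1-2: embedding and decomposition
plot(s, type = "vectors", idx = 1:6)          # visual information for grouping
r <- reconstruct(s, groups = list(trend = 1,  # steps 5-6: grouping and reconstruction
                                  seasonality = c(2, 3)))
plot(r)                                       # plot the reconstructed components
```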

Note that Rssa contains more algorithms than this book describes. However, the book provides enough information to understand how to extend the algorithms, such as parameter estimation, filling in missing data, and O-SSA, to different dimensions and geometries. Many of these versions are implemented in Rssa but are not described in the book explicitly: there are infinitely many dimensions and geometries, and the algorithms are formulated in the book in such a manner that they can easily be generalized if needed.

1.5.4 Implementation Efficiency

The user does not need to know the specifics of the internal implementation of the Rssa functions. However, understanding the general principles of the implementation can help to use the package more effectively.

The fast implementation of SSA-related methods, which was suggested in Korobeynikov (2010), extended in Golyandina et al. (2015) and is used in the Rssa package (Korobeynikov et al. 2017), relies on the following techniques (see Shlemov and Golyandina (2014) for a more thorough discussion).

  1. The truncated SVD calculated by Lanczos methods (Golub and Van Loan 1996; Ch. 9) is used. In most SSA applications, only a number of leading SVD components correspond to the signal and therefore are used at the Grouping step of the SSA algorithm. Thus, a truncated SVD rather than the full SVD is usually required by SSA.

  2. Lanczos methods do not use an explicit representation of the decomposed matrix A. They need only the results of the multiplication of A and \(A^{\mathrm{T}}\) by some vectors. In view of the special Hankel-type structure of A in the SSA algorithms, multiplication by a vector can be implemented very efficiently with the help of the Fast Fourier Transform (FFT); a toy illustration of such an FFT-based matrix-vector product is sketched after this list. Fast SVD algorithms are implemented in the R package svd (Korobeynikov et al. 2016) in such a way that their input is a function of a vector which yields the result of the fast multiplication of the vector by the trajectory matrix. Therefore, the use of svd in Rssa allows a fast and space-efficient implementation of the SSA algorithms.

  3. At the Reconstruction step, the hankelization or quasi-hankelization of a matrix of rank 1, stored as \(\sigma U V^{\mathrm{T}}\), can be written by means of the convolution operator and therefore can also be implemented efficiently; this is also done in Rssa.
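The following toy sketch (ours; not Rssa's actual implementation) illustrates the idea behind item 2: for a Hankel trajectory matrix, the product with a vector reduces to a linear convolution, which can be computed via the FFT in \(O(N\log N)\) operations without ever forming the L × K matrix.

```r
# Sketch: multiply the Hankel trajectory matrix of a series x (window length L)
# by a vector v of length K = N - L + 1, using FFT-based linear convolution.
hankel_matvec <- function(x, v, L) {
  N <- length(x); K <- N - L + 1
  stopifnot(length(v) == K)
  n <- nextn(N + K - 1)                  # FFT-friendly padded length
  conv <- Re(fft(fft(c(x, rep(0, n - N))) *
                 fft(c(rev(v), rep(0, n - K))), inverse = TRUE)) / n
  conv[K:N]                              # (Hv)_i = conv[i + K - 1], i = 1, ..., L
}
```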

The overall complexity of the computations is \(O(k N \log (N)+ k^{2}N)\), where N is the number of elements in a shaped object and k is the number of considered eigentriples, see details in Korobeynikov (2010) and in Golyandina et al. (2015). This makes the computations feasible for large data sets and large window sizes. For example, the case of an image of size 299 × 299 and a window size of 100 × 100 can be handled rather easily, whereas the conventional algorithms (e.g., the full SVD (Golub and Van Loan 1996)) are impractical, because the matrix that needs to be decomposed has size \(10^4 \times 4\cdot 10^4\). Using larger window sizes is often advantageous, since, for example, the separability of the signal from noise (in the “signal plus noise” scenario) can be significantly improved.

1.6 Comparison of SSA with Other Methods

In this section, we provide some notes and a short bibliographical overview concerning comparison of SSA with several traditional methods.

1.6.1 Fourier Transform, Filtering, Noise Reduction

  • Fourier Transform uses a basis given in advance, while SSA uses an adaptive basis, which is not restricted to a frequency grid with resolution 1∕N. The wavelet transform also uses fixed bases; the advantage of the wavelet transform is that a change in the frequencies can be detected owing to the localized time-scale basis used. In the framework of SSA, the analysis of time series with changing frequency structure can be performed by using moving procedures, e.g., by subspace tracking.

  • One of the state-of-the-art methods of frequency estimation is the high-resolution method ESPRIT, which is a subspace-based method. This method can be considered as an SSA-related method and indeed it is frequently used in the present book and in Rssa.

  • Fourier Transform is very inefficient for series with modulations in amplitudes and frequencies. SSA can easily deal with amplitude modulation but cannot efficiently deal with frequency modulation.

  • SSA decomposition can sometimes be viewed as an application of a set of linear filters (Bozzo et al. 2010; Harris and Yan 2010; Golyandina and Zhigljavsky 2013) with an interpretation depending on the window length L. For small L, each decomposition component on the interval [L, K], where K = N − L + 1, can be obtained by a linear filter. Therefore, the filtering viewpoint on the decomposition result can be adequate. For example, the reconstruction by the leading components is close to the application of a triangular filter.

    If L ≃ K and hence the interval [L, K] is small, then this is no longer the case. In this case, the separability approach, which is based on the orthogonality of the separated components, is more appropriate. Note that oblique versions of SSA can weaken the condition of orthogonality, see Sect. 2.4.

  • There is a big difference between moving averaging and SSA for noise reduction. Consider an example of a noisy sinusoid. The moving average will add a bias in the estimation caused by the second derivative of the signal, while SSA with large L will provide an unbiased estimate of the signal.

    Note that even for small L, when the reconstruction by the leading component is a weighted moving average with positive weights and therefore has the same drawbacks as the moving average, the user can add additional components to remove the possible bias.

  • Filtering by SSA to obtain noise reduction can be considered from the viewpoint of low-rank approximation: good approximation properties yield appropriate noise suppression. Empirical-mode decomposition (EMD), in turn, starts its Intrinsic Mode Functions (IMFs) with the high frequencies, while the trend is contained in the last IMFs.

For an example of a comparison of SSA, the Fourier transform, and the wavelet transform see, e.g., Kumar et al. (2017). The authors' conclusion states: “the SSA-based filtering technique is robust for regional gravity anomaly separation and could be effectively exploited for filtering other geophysical data.”

In Barrios-Muriel et al. (2016), an SSA-based de-noising technique for removal of electrocardiogram interference in Electromyography signals is compared with the high-pass Butterworth filter, wavelets, and EMD. The authors of this paper state: “the proposed SSA approach is a valid method to remove the ECG artifact from the contaminated EMG signals without using an ECG reference signal.”

In Watson (2016), many different methods for trend extraction are compared for synthetic data simulating sea level behavior; SSA is compared against moving average, wavelets, regression, EMD. The author writes: “the optimum performing analytic is most likely to be SSA whereby interactive visual inspection (VI) techniques are used by experienced practitioners to optimize window length and component separability.”

Comparison of SSA filtering and Kalman filters (KF) can be found in Chen et al. (2016), where it is shown that “both SSA and KF obtain promising results from the stations with strong seasonal signals, while for the stations dominated by the long-term variations, SSA seems to be superior.”

1.6.2 Parametric Regression

Parametric regression naturally assumes a parametric model. There is a big difference between parametric and non-parametric models: if the assumed model is true, then the related parametric methods are the most appropriate methods (if there are no outliers in the data). If the assumed parametric model is not true, then the results of parametric methods are biased and may be very misleading. Drawbacks of non-parametric methods are also clear: there are problems with forecasting, testing the model, confidence interval construction, and so on. Frequently, non-parametric methods serve as preprocessing tools for parametric methods. As discussed in Sect. 1.7.3, this is often the case for SSA.

For a comparison of SSA with double centering and linear regression see, for example, Sect. 2.3 and Golyandina and Shlemov (2017). It appears that SSA with double centering as a preprocessing method considerably improves the accuracy of linear trend estimation.

SSA has a very rare advantageous property: it can be a non-parametric method for preliminary analysis and can also be parametric for modeling the series governed by LRRs. Moreover, the forecast by an LRR uses the parametric model in an implicit manner; therefore, it is more robust to deviations from the model than a forecast based on explicit parameter estimation.

One of the subspace-based methods for constructing the model of a signal governed by an LRR is Hankel low-rank approximation (HLRA). HLRA can be considered as a method of parameter estimation in a parametric model where only the rank of the signal is given rather than its exact parametric form, see Sect. 3.4.

1.6.3 ARIMA and ETS

First, the (Seasonal) ARIMA and Exponential smoothing (ETS, which stands for Error, Trend, Seasonal) models differ totally from the model behind SSA (for a comprehensive introduction to ARIMA and ETS, see Hyndman and Athanasopoulos (2013)). In particular, in ARIMA the noise is added at each recurrence step, while for SSA the noise is added after the signal is formed. Also, trends/seasonality in SSA are deterministic, while in ARIMA/ETS the trends/seasonality are random. As in many classical methods, ARIMA and ETS need the period values of the periodic components to be specified.

However, if one considers the analysis/forecast of real-life time series, then these time series do not exactly follow any model. Therefore, the problem of comparison of methods of different nature is not easy.

As a rule, confidence intervals for ARIMA forecasts are too large, but the mean forecast can often be adequate. An advantage of Seasonal ARIMA and ETS is that the model and its parameters can be fitted automatically on the basis of information criteria.

Rigorously substantiated information criteria have not been constructed for SSA. One of the reasons for this is the fact that SSA is a non-parametric method. The most standard approach to the choice of parameters, when there is no given model, is the minimization of the forecasting error on a validation period. In the most frequent case, when the forecast is constructed on the basis of the r leading eigentriples, SSA has only two parameters (L and r), which can be estimated by the minimization of the forecasting error for several forecasts performed within the validation period, see Sect. 3.5.7.

Comparison of SSA and ARIMA/ETS was performed in many papers. Some examples are as follows.

  • It is demonstrated in Hassani et al. (2015) that SSA outperformed several other methods in an example involving forecasting of tourist arrivals.

  • It is exhibited in Vile et al. (2012) that for predicting ambulance demands “SSA produces superior longer-term forecasts (which are especially helpful for EMS planning), and comparable shorter-term forecasts to well established methods.”

  • The author of Iqelan (2017) concludes: “The forecasting results are compared with the results of exponential smoothing state space (ETS) and ARIMA models. The three techniques do similarly well in forecasting process. However, SSA outperforms the ETS and ARIMA techniques according to forecasting error accuracy measures.”

  • In Hassani et al. (2009, 2013), univariate and multivariate SSA compared favorably with ARIMA and VAR for forecasting of several series of European industrial production.

1.7 Bibliographical Notes

1.7.1 Short History

The commencement of SSA is usually associated with the publication in 1986 of the papers Broomhead and King (1986) and Broomhead and King (1986b). However, some ideas, which later became parts of SSA, had been formulated long before that (de Prony 1795). Arguably, the most influential papers on SSA published in the 1980s and 1990s, in addition to Broomhead and King (1986,b), are Fraedrich (1986), Vautard and Ghil (1989), Vautard et al. (1992), Allen and Smith (1996). In view of many successful applications of SSA, the number of publications considering SSA methodology grows exponentially and has surely reached a few hundred.

A parallel development of SSA (under the name “Caterpillar”) has been conducted in the former USSR, especially in St. Petersburg (known at that time as Leningrad), see, e.g., Danilov and Zhigljavsky (1997). The authors of this book continue the traditions of the St. Petersburg school of SSA.

The monograph (Golyandina et al. 2001) contains a comprehensive description of the theoretical and methodological foundations of SSA for one-dimensional (1D) time series; the authors of that monograph tried to summarize all the knowledge about 1D-SSA available at that time. A short book (Golyandina and Zhigljavsky 2013) developed further the methodology of 1D-SSA. It reflected the authors’ new understandings as well as new SSA insights including subspace-based methods, filtering and rotations in the signal space for improving separability. A substantial paper (Golyandina et al. 2015) supplements the above books by expanding SSA for processing multivariate time series and digital images.

1.7.2 Some Recent Applications of SSA

The number of publications devoted to applications of SSA is steadily increasing. In addition to the standard applications areas such as climatology, meteorology, and geophysics, there are now many papers devoted to applications in engineering, economics, finance, biomedicine, and other areas. One can find many references to recent publications in Zhigljavsky (2010) and many papers in the two special issues of Statistics and Its Interface (2010, v.3, No.3 and 2017, v.10, No.1), which are either fully or partly devoted to SSA. In this short section we briefly mention some recent applications of SSA. In most of these papers, only the simplest versions of SSA (that is, Basic SSA of Sect. 2.1 and Toeplitz SSA of Sect. 2.2) have been used.

Advantages of 2D-SSA (described in Sect. 5.1) over some other methods of image processing are demonstrated in Zabalza et al. (2014, 2015) in application to hyperspectral imaging. Application of 2D-SSA to gap-filling is considered in von Buttlar et al. (2014). Application of Multivariate SSA for detecting oscillator clusters in multivariate datasets is proposed in Groth and Ghil (2015).

It is not easy to find applied areas related to the analysis of temporal data in which 1D-SSA has not been applied. Let us give some examples. In Salgado et al. (2013) and several other papers by the same authors, SSA has been used as the main technique in the development of a tool-wear monitoring system. Security of mobile devices is considered in Genkin et al. (2016), where SSA is used for preprocessing. In Sella et al. (2016), SSA was used for extraction of economic cycles. Filho and Lima (2016) use SSA for gap filling of precipitation data. Some recent applications in climatology were considered in Mudersbach et al. (2013), Monselesan et al. (2015) and in Pepelyshev and Zhigljavsky (2017). In Karnjana et al. (2017), SSA helps to solve the problem of unauthorized modification in speech signals. In Barrios-Muriel et al. (2016), SSA is used for de-noising in the problem of removal of electrocardiogram interference in electromyography signals. The paper (Hudson and Keatley 2017) is related to the decomposition and reconstruction of long-term flowering records of eight eucalypt species. In Wang et al. (2017), SSA was used as a preprocessing tool prior to classification of medical data; the authors wrote: “the results have demonstrated the robustness of the approach when testing on large scale datasets with clinically acceptable sensitivity and specificity.”

1.7.3 SSA for Preprocessing/Combination of Methods

For many different methods, SSA provides improvement if it is used as a preprocessing tool. There are dozens of papers, where hybrid methods incorporating SSA are considered. In most of the applications, SSA serves for either denoising or feature extraction. Let us give some examples of papers considering hybrids of SSA and other methods.

SSA is used as a preprocessing step for ARIMA in Zhang et al. (2011). A cooperative hybrid of SSA, ARIMA, and Holt-Winters is suggested in Xin et al. (2015). In Lakshmi et al. (2016) it is shown that the hybrid SSA + ARMAX outperforms ARMAX alone for the detection of structural damage in problems of Structural Health Monitoring.

In machine learning, SSA is frequently used to obtain new characteristics of time series for subsequent use in other models and methods; this is called feature extraction. The paper (Sivapragasam et al. 2001) is considered to be one of the first papers where SSA is used together with Support Vector Machines (SVM). A hybrid of SSA with Neural Networks was suggested in Lisi et al. (1995).

In Wang et al. (2016), support vector regression (SVR) is applied separately to the trend and to the fluctuations, which are extracted by SSA. The constructed method is applied to forecast time series of failure data gathered at the maintenance stage of Boeing 737 aircraft. It is shown that the suggested hybrid SSA+SVR outperforms Holt-Winters, autoregressive integrated moving average, multiple linear regression, the group method of data handling, SSA, and SVR used separately. Similar techniques are considered in Xiao et al. (2014), where SSA is employed for the extraction of the trend and seasonality; Neural Networks and fuzzy logic are then applied to them separately, with subsequent combination of the results. In Wu and Chau (2011), SSA is successfully used for noise removal before Neural Networks are applied; this work also contains a review of different approaches to rainfall-runoff modeling by means of SSA in combination with other methods.

In Zabalza et al. (2014), SSA has been applied in hyperspectral imaging for effective feature extraction (noise removal), with SVM subsequently used for classification. It appeared that SSA performed the preprocessing better than Empirical Mode Decomposition (EMD). Note that SSA and EMD do not only compete; they can also be successful as hybrids. For example, in Yang et al. (2012) EMD is used for trend extraction and then SSA is applied to forecast changes in the trend.

1.7.4 Materials Used in This Book

In writing this book we have used material from many different sources. Many sections contain material which is entirely new, while other sections are based on our previous publications. Let us briefly describe the main references we have used in writing the theoretical and methodological material of the book.

Chapter 1 (Introduction: Overview) presents an original approach which considers the SSA modifications from a general viewpoint. The generic scheme of SSA-family methods from Sect. 1.1 was suggested in Golyandina et al. (2015).

1D-SSA is well elaborated and therefore Chap. 2 (SSA analysis of one-dimensional time series), in addition to some new material (this especially concerns new examples and the discussions of Rssa), reviews standard SSA techniques. Sections 2.1 (Basic SSA) and 2.2 (Toeplitz SSA) contain standard material partially taken from Golyandina et al. (2001; Chapter 1). The ideas of Sect. 2.3 (SSA with projection) were first suggested in Golyandina et al. (2001; Section 1.7.1) (centering) and then extended in Golyandina and Shlemov (2017). The methods described in Sect. 2.4 (Iterative Oblique SSA) and Sect. 2.5 (Filter-adjusted O-SSA and SSA with derivatives) were suggested in Golyandina and Shlemov (2015); these two sections closely follow this paper. Section 2.6 (Shaped 1D-SSA) contains a particular case of Shaped SSA, which was suggested in Golyandina et al. (2015) for the multidimensional case and is described in Sect. 5.2. Section 2.7 (Automatic grouping in SSA) follows Alexandrov (2009) and Golyandina and Zhigljavsky (2013; Section 1.4.5).

Much of the theoretical material of Chap. 3 (Parameter estimation, forecasting, gap filling) is standard for the methodology of subspace-based methods. In writing Sect. 3.1 (Parameter estimation) we have extensively used Golyandina and Zhigljavsky (2013; Sections 2.2, 2.8). Section 3.2 (Forecasting) includes the algorithms from Golyandina et al. (2001; Chapter 2). Section 3.3 on gap filling contains two methods: the iterative method taken from Kondrashov and Ghil (2006) and the subspace-based method taken from Golyandina and Osipov (2007); both methods are described in accordance with Golyandina and Zhigljavsky (2013). Section 3.4, devoted to structured low-rank approximation (SLRA for short), describes Cadzow-like (Cadzow 1988) iterative algorithms for finding low-rank approximations. SLRA is a very standard approach, which was extended to weighted Cadzow iterations in Zhigljavsky et al. (2016a) and Zvonarev and Golyandina (2017).

In writing Chap. 4 (SSA for multivariate time series) we use Golyandina and Stepanov (2005) and Golyandina et al. (2015). In Chap. 5 (Image processing), we mainly follow Golyandina et al. (2015). Moreover, in Sect. 5.1 (2D-SSA) we use material from Golyandina and Usevich (2010) and in Sect. 5.2 (Shaped 2D-SSA) and Sect. 5.3 (2D ESPRIT) we incorporate the ideas developed in Golyandina et al. (2015) and Shlemov and Golyandina (2014).

Some material in the algorithmic and Rssa sections is based on the papers (Golyandina and Korobeynikov 2013; Golyandina et al. 2015).

1.8 Installation of Rssa and Description of the Data Used in the Book

1.8.1 Installation of Rssa and Usage Comments

The package Rssa is available from CRAN at http://CRAN.R-project.org/package=Rssa and can be installed via the standard install.packages routine; all the dependencies are then installed automatically.
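
For example, a minimal installation and loading sequence from an R session may look as follows (a sketch assuming a working Internet connection and a default CRAN mirror):

# install Rssa from CRAN; dependencies are installed automatically
install.packages("Rssa")
# load the package into the current R session
library(Rssa)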

There is a special library, FFTW (Frigo and Johnson 2005), which is not mandatory for the installation of Rssa; however, the capabilities of Rssa are considerably reduced if FFTW is not installed. The library FFTW should be installed prior to Rssa by the standard tools of the operating system in use. For example, FFTW can be installed by running apt-get install libfftw3-bin libfftw3-dev (Ubuntu Linux) or brew install fftw (MacOS, Homebrew). The pre-built Windows packages from CRAN already use FFTW.

Sources of Rssa can be found at https://github.com/asl/rssa, where the user can also ask questions about installation and usage problems. In addition, the current development version of Rssa can be installed directly from the GitHub repository using install_github from the devtools package (Wickham and Chang 2017).
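
For instance, assuming that devtools is already installed, the development version can be obtained as follows:

# install the current development version of Rssa from GitHub
library(devtools)
install_github("asl/rssa")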

In this book, Rssa v1.0 has been used for all illustrative examples. All SSA computations can be reproduced by the reader and should run correctly with any later version of Rssa. The sets of data used in these examples are included in the R-package ssabook, unless a particular set is contained in one of the following three R-packages: the built-in datasets package, fma, and Rssa. The description of the datasets is contained in Tables 1.2 and 1.3. In order to reproduce the examples from the book, ssabook should be installed, as well as lattice, latticeExtra, plyr, and fma. Source codes for all examples, as well as the R-package ssabook, can be downloaded from the web-site devoted to the book, https://ssa-with-r-book.github.io.
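
For example, assuming that all of these packages (including ssabook) are available from CRAN, they can be installed in a single call; otherwise ssabook can be obtained from the web-site mentioned above:

# install the packages needed to reproduce the examples of the book
install.packages(c("ssabook", "lattice", "latticeExtra", "plyr", "fma"))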

Table 1.2 Description of data and R-packages
Table 1.3 Description of data and sources

In the following chapters there are quite a few sections named “Description of functions.” In these sections, we describe the main Rssa functions and their basic arguments. For more information about the Rssa functions and their arguments, we refer the reader to the help information in the Rssa package.
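
For example, the help information can be accessed directly from an R session; the calls below assume that Rssa is installed and use the ssa function as an illustration:

# list the help topics provided by the Rssa package
help(package = "Rssa")
# open the help page of a particular function, e.g., ssa
library(Rssa)
?ssa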

1.8.2 Data Description

Tables 1.2 and 1.3 present the data used in the book for the examples. All these sets of data can be found in the R-packages indicated in the fourth column of Table 1.2. Table 1.3 contains one possible reference for each dataset; detailed descriptions and further references can be found in the corresponding R-packages.