1.1 Feature Extraction

Pattern recognition and data compression are two applications that rely critically on efficient data representation [1]. The task of pattern recognition is to decide to which class of objects an observed pattern belongs, and data compression is motivated by the need to reduce the number of bits required to represent the data while incurring the smallest possible distortion [1]. In these applications, it is desirable to extract measurements that are invariant or insensitive to the variations within each class. The process of extracting such measurements is called feature extraction. In other words, feature extraction is a data processing operation that maps a high-dimensional space to a low-dimensional space with minimum information loss.
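
As a concrete illustration of this mapping, the following minimal NumPy sketch (the synthetic data, the target dimension r, and all variable names are illustrative assumptions, not notation from this chapter) projects high-dimensional samples onto the r directions of largest variance:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((500, 20))      # 500 samples in a 20-dimensional space
r = 3                                   # target (reduced) dimension

Xc = X - X.mean(axis=0)                 # center the data
C = Xc.T @ Xc / (len(Xc) - 1)           # sample covariance matrix
eigvals, eigvecs = np.linalg.eigh(C)    # eigenvalues in ascending order
U_r = eigvecs[:, -r:]                   # r directions of largest variance

features = Xc @ U_r                     # low-dimensional feature vectors (500 x 3)
```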

Principal component analysis (PCA) is a well-known feature extraction method, while minor component analysis (MCA) and independent component analysis (ICA) can be regarded as variants or generalizations of PCA. MCA is most useful for solving total least squares (TLS) problems, and ICA is usually used for blind signal separation (BSS).

In the following, we briefly review PCA, PCA neural networks, and extensions or generalizations of PCA.

1.1.1 PCA and Subspace Tracking

The principal components (PCs) are the directions in which the data have the largest variances and capture most of the information content of the data. They correspond to the eigenvectors associated with the largest eigenvalues of the autocorrelation matrix of the data vectors. Expressing data vectors in terms of the PCs is called PCA. Conversely, the eigenvectors that correspond to the smallest eigenvalues of the autocorrelation matrix of the data vectors are defined as the minor components (MCs), and the MCs are the directions in which the data have the smallest variances (they represent the noise in the data). Expressing data vectors in terms of the MCs is called MCA. PCA has been successfully applied in many data processing problems, such as high-resolution spectral estimation, system identification, image compression, and pattern recognition, and MCA has been applied in total least squares, moving target indication, clutter cancelation, curve and surface fitting, digital beamforming, and frequency estimation.

PCA or MCA as described above is one-dimensional. In real applications, however, PCA or MCA is usually multidimensional. The eigenvectors associated with the r largest (or smallest) eigenvalues of the autocorrelation matrix of the data vectors are called the principal (or minor) components, and r is referred to as the number of principal (or minor) components. The eigenvector associated with the largest (or smallest) eigenvalue of the autocorrelation matrix is called the largest (or smallest) component. The subspace spanned by the principal components is called the principal subspace (PS), and the subspace spanned by the minor components is called the minor subspace (MS). In some applications, we are only required to find the PS (or MS) spanned by r orthonormal eigenvectors. The PS is sometimes called the signal subspace, and the MS the noise subspace. Principal and minor component analyzers of a symmetric matrix are matrix differential equations that converge on the PCs and MCs, respectively. Similarly, the principal (PSA) and minor (MSA) subspace analyzers of a symmetric matrix are matrix differential equations that converge on a matrix whose columns span the PS and MS, respectively. PCA/PSA and MCA/MSA are powerful techniques in many information processing fields. For example, PCA/PSA is a useful tool in feature extraction, data compression, pattern recognition, and time series prediction [2, 3], and MCA/MSA has been widely applied in total least squares, moving target indication, clutter cancelation, curve and surface fitting, digital beamforming, and frequency estimation [4].
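
In the batch setting, the principal and minor components and the subspaces they span reduce to an eigendecomposition of the sample autocorrelation matrix. The sketch below (variable names, dimensions, and the choice of r are illustrative assumptions) extracts both an r-dimensional PS and an r-dimensional MS:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((1000, 8))        # data vectors stored as rows
R = X.T @ X / len(X)                      # sample autocorrelation matrix
r = 2                                     # number of components to extract

eigvals, eigvecs = np.linalg.eigh(R)      # eigenvalues in ascending order
principal_components = eigvecs[:, -r:]    # span the principal subspace (PS)
minor_components = eigvecs[:, :r]         # span the minor subspace (MS)
```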

As discussed before, the PC is the direction corresponding to the eigenvector associated with the largest eigenvalue of the autocorrelation matrix of the data vectors, and the MC is the direction corresponding to the eigenvector associated with the smallest eigenvalue. Thus, implementations of these techniques can be based on batch eigenvalue decomposition (ED) of the sample correlation matrix or on singular value decomposition (SVD) of the data matrix. This approach is unsuitable for adaptive processing because it requires repeated ED/SVD, which is a very time-consuming task [5]. Thus, attempts to develop adaptive algorithms continue even though the field has been active for three decades.

1.1.2 PCA Neural Networks

In order to overcome the difficulty faced by ED or SVD, a number of adaptive algorithms for subspace tracking were developed in the past. Most of these techniques can be grouped into three classes [5]. In the first class, classical batch ED/SVD methods such as the QR algorithm, Jacobi rotation, power iteration, and the Lanczos method have been modified for use in adaptive processing [6–10]. In the second class, variations of Bunch’s rank-one updating algorithm [11], such as subspace averaging [12, 13], have been proposed. The third class of algorithms considers the ED/SVD as a constrained or unconstrained optimization problem. Gradient-based methods [14–19], Gauss–Newton iterations [20, 21], and conjugate gradient techniques [22] can then be applied to seek the largest or smallest eigenvalues and their corresponding eigenvectors adaptively. Rank-revealing URV decomposition [23] and rank-revealing QR factorization [24] have been proposed to track the signal or noise subspace.

Neural network approaches to PCA or MCA pursue an effective “online” approach that updates the eigen directions after each presentation of a data point; they possess many obvious advantages, such as lower computational complexity, compared with traditional algebraic approaches such as SVD. Neural network methods are especially suited for high-dimensional data, since the computation of the large covariance matrix can be avoided, and for the tracking of nonstationary data, where the covariance matrix changes slowly over time. The attempts to improve these methods and to suggest new approaches are continuing even though the field has been active for two decades.

In the last decades, many neural network learning algorithms were proposed to extract the PS [25–31] or the MS [4, 32–40]. In the class of PS tracking, many learning algorithms such as Oja’s subspace algorithm [41], the symmetric error correction algorithm [42], and the symmetric version of the back propagation algorithm [43] were proposed based on some heuristic reasoning [44]. Afterward, several information criteria were proposed and the corresponding algorithms, such as the LMSER algorithm [31], the projection approximation subspace tracking (PAST) algorithm [5], the conjugate gradient method [45], the Gauss–Newton method [46], and the novel information criterion (NIC) algorithm, were developed [44]. These gradient-type algorithms can be claimed to be globally convergent.
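
As an example of such online rules, the sketch below implements Oja’s subspace algorithm [41] in its commonly stated form \( {\varvec{W}} \leftarrow {\varvec{W}} + \eta ({\varvec{x}}{\varvec{y}}^{T} - {\varvec{W}}{\varvec{y}}{\varvec{y}}^{T} ) \) with \( {\varvec{y}} = {\varvec{W}}^{T} {\varvec{x}} \); the step size, the synthetic data model, and the initialization are illustrative assumptions rather than prescriptions from the original paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n, r, eta = 10, 3, 0.01
A = rng.standard_normal((n, n))
R_true = A @ A.T / n                               # covariance of the synthetic source
L = np.linalg.cholesky(R_true)                     # used to draw correlated samples

W = np.linalg.qr(rng.standard_normal((n, r)))[0]   # initial orthonormal weight matrix
for _ in range(20000):
    x = L @ rng.standard_normal(n)                 # one data sample
    y = W.T @ x                                    # network output
    W += eta * (np.outer(x, y) - W @ np.outer(y, y))   # Oja's subspace rule

# The columns of W now approximately span the principal subspace of R_true.
```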

In the class of MS tracking, many algorithms [32–40] have been proposed on the basis of feedforward neural network models. Mathew and Reddy proposed an MS algorithm based on a feedback neural network structure with a sigmoid activation function [46]. Using the inflation method, Luo and Unbehauen proposed an MSA algorithm that does not need any normalization operation [36]. Douglas et al. presented a self-stabilizing minor subspace rule that does not need periodic normalization or matrix inverses [40]. Chiang and Chen showed that a learning algorithm can extract multiple MCs in parallel with appropriate initialization instead of the inflation method [47]. On the basis of an information criterion, Ouyang et al. developed an adaptive MC tracker that automatically finds the MS without using the inflation method [37]. Recently, Feng et al. proposed the OJAm algorithm and extended it for tracking multiple MCs or the MS, which makes the corresponding state matrix tend to a column-orthonormal basis of the MS [35].

1.1.3 Extension or Generalization of PCA

It can be seen that the above-mentioned algorithms focus only on eigenvector extraction or eigen-subspace tracking with noncoupled rules. However, a serious speed–stability problem exists in most noncoupled rules [28]. In noncoupled PCA rules, the eigen motion in all directions mainly depends on the principal eigenvalue of the covariance matrix; thus, numerical stability and fast convergence can only be achieved by guessing this eigenvalue in advance [28]. In noncoupled MCA rules, the speed of convergence depends not only on the minor eigenvalue but also on all other eigenvalues of the covariance matrix, and if these extend over a large interval, no suitable learning rate may be found that still guarantees stability and ensures a sufficient speed of convergence in all eigen directions; therefore, the problem is even more severe for MCA rules. To solve this common problem, Möller proposed some coupled PCA algorithms and some coupled MCA algorithms based on a special information criterion [28]. In coupled rules, the eigen pair (eigenvector and eigenvalue) is estimated simultaneously in coupled equations, and the speed of convergence depends only on the eigenvalues of the Jacobian of the coupled system. Thus, the dependence on the eigenvalues of the covariance matrix can be eliminated [28]. Recently, some modified coupled rules have been proposed [48].
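
To illustrate the coupled idea only (this is not Möller’s exact rule [28], whose precise form differs), the following sketch updates an eigenvector estimate w and an eigenvalue estimate \( \lambda \) simultaneously; the particular update equations, step size, and normalization are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((6, 6))
C = A @ A.T / 6                      # covariance matrix whose principal eigen pair we seek

w = rng.standard_normal(6)
w /= np.linalg.norm(w)
lam = float(w @ C @ w)               # initial eigenvalue estimate
eta = 0.05

for _ in range(2000):
    w += eta * (C @ w - lam * w)     # eigenvector update driven by the residual C w - lam w
    lam += eta * (w @ C @ w - lam)   # eigenvalue estimate tracks the Rayleigh quotient
    w /= np.linalg.norm(w)           # keep the estimate on the unit sphere

# (w, lam) now approximates the principal eigen pair of C.
```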

It is well known that the generalized eigen decomposition (GED) plays a very important role in various signal processing applications, e.g., data compression, feature extraction, denoising, antenna array processing, and classification. Although PCA, which is a special case of the GED problem, has been widely studied, adaptive algorithms for the GED problem are scarce. Fortunately, a few efficient online adaptive algorithms for the GED problem that can be applied in real-time applications have been proposed [49–54]. In [49], Chatterjee et al. present new adaptive algorithms to extract the generalized eigenvectors from two sequences of random vectors or matrices. Most algorithms in the literature, including [49], are gradient-based algorithms [50, 51]. The main problems of this type of algorithm are slow convergence and the difficulty of selecting an appropriate step size, which is essential: too small a value leads to slow convergence and too large a value leads to overshooting and instability. Rao et al. [51] have developed a fast recursive least squares (RLS)-like, though not true RLS, sequential algorithm for GED. In [54], by reinterpreting the GED problem as an unconstrained minimization problem via constructing a novel cost function and applying the projection approximation method and RLS technique to the cost function, RLS-based parallel adaptive algorithms for generalized eigen decomposition were proposed. In [55], a power-method-based algorithm for tracking generalized eigenvectors was developed for the case where stochastic signals with unknown correlation matrices are observed. Attallah proposed a new adaptive algorithm for the generalized symmetric eigenvalue problem, which can extract the principal and minor generalized eigenvectors, as well as their corresponding subspaces, at a low computational cost [56]. Recently, a fast and numerically stable adaptive algorithm for the generalized Hermitian eigenvalue problem (GHEP) was proposed and analyzed in [48].
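
In the batch setting, the GED problem \( {\varvec{R}}_{1} {\varvec{w}} = \lambda {\varvec{R}}_{2} {\varvec{w}} \) can be solved directly for the symmetric-definite case, which is the baseline that the adaptive algorithms cited above try to approximate online. The sketch below uses scipy.linalg.eigh; the two sample matrices and their construction are illustrative assumptions:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
X1 = rng.standard_normal((500, 5))
X2 = rng.standard_normal((500, 5))
R1 = X1.T @ X1 / len(X1)                     # e.g., a "signal" covariance matrix
R2 = X2.T @ X2 / len(X2) + 0.1 * np.eye(5)   # e.g., a positive definite "noise" covariance

# Generalized eigenvalue problem: R1 w = lambda * R2 w
eigvals, eigvecs = eigh(R1, R2)              # generalized eigenvalues in ascending order
principal_gev = eigvecs[:, -1]               # principal generalized eigenvector
minor_gev = eigvecs[:, 0]                    # minor generalized eigenvector
```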

Other extensions of PCA also include dual-purpose algorithms [57–64], the details of which can be found in Chap. 5, and adaptive or neural network-based SVD singular vector tracking [6, 65–70], the details of which can be found in Chap. 9.

1.2 Basis for Subspace Tracking

In Sect. 1.1, we reviewed the PCA algorithm and its extensions and generalizations from the viewpoint of feature extraction. In this section, we discuss the concept of a subspace and subspace tracking methods from the subspace viewpoint.

1.2.1 Concept of Subspace

Definition 1

If \( {\varvec{S}} = \{ {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \} \) is a subset of the vector space V, then the set W of all linear combinations of \( {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \) is called the subspace spanned by \( {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \), namely

$$ {\varvec{W}} = {\text{Span}}\{ {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \} = \{ {\varvec{u}}:{\varvec{u}} = \alpha_{1} {\varvec{u}}_{1} + \alpha_{2} {\varvec{u}}_{2} + \cdots + \alpha_{m} {\varvec{u}}_{m} \} , $$
(1.1)

where each vector \( {\varvec{u}}_{i} \) is called a generator of W, and the set \( \{ {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \} \) composed of all the generators is called the spanning set of the subspace. A vector subspace which comprises only the zero vector is called a trivial subspace. If the vector set \( \{ {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{m} \} \) is linearly independent, then it is called a basis of W.

Definition 2

The number of vectors in any basis of a subspace W is called the dimension of W, denoted by dim(W). If no basis of W consists of finitely many linearly independent vectors, then W is called an infinite-dimensional vector subspace.

Definition 3

Assume that \( {\varvec{A}} = [{\varvec{a}}_{1} ,{\varvec{a}}_{2} , \ldots ,{\varvec{a}}_{n} ] \in {\varvec{C}}^{m \times n} \) is a complex matrix. All the linear combinations of its column vectors constitute a subspace, which is called the column space of the matrix A and is denoted by Col(A), namely

$$ {\text{Col}}({\varvec{A}}) = {\text{Span}}\{ {\varvec{a}}_{1} ,{\varvec{a}}_{2} , \ldots ,{\varvec{a}}_{n} \} = \left\{ {\varvec{y}} \in {\varvec{C}}^{m} :{\varvec{y}} = \sum\limits_{j = 1}^{n} {\alpha_{j} } {\varvec{a}}_{j} :\alpha_{j} \in {\varvec{C}}\right\} . $$
(1.2)

Row space of matrix A can be defined similarly.

As stated above, the column space and the row space of the matrix \( {\varvec{A}}_{m \times n} \) are spanned by its n column vectors and m row vectors, respectively. If rank(A) = r, then only r linearly independent column (or row) vectors are needed to constitute the column space Span(A) (or the row space Span(A^H)). Obviously, using basis vectors is a more economical way to represent a subspace. A basis of a subspace can be constructed by elementary transformations, and one can also use the singular value decomposition to set up an orthonormal basis of the column space.
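
For example, the following sketch (the matrix and its rank are illustrative assumptions) uses the SVD to obtain an orthonormal basis of the column space of a rank-deficient matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((8, 3)) @ rng.standard_normal((3, 6))   # an 8 x 6 matrix of rank 3

U, s, Vh = np.linalg.svd(A)
r = int(np.sum(s > 1e-10 * s[0]))    # numerical rank
basis = U[:, :r]                     # orthonormal basis of Col(A)
```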

Suppose that the data matrix A is subject to measurement errors or noise, and define the measurement data matrix as

$$ {\varvec{X}} = {\varvec{A}} + {\varvec{W}} = [{\varvec{x}}_{1} ,{\varvec{x}}_{2} , \ldots ,{\varvec{x}}_{n} ] \in {\varvec{C}}^{m \times n} , $$
(1.3)

where \( {\varvec{x}}_{i} \in {\varvec{C}}^{m \times 1} . \) In the fields of signal processing and system science, the column space of the measurement data matrix, \( {\text{Span}}({\varvec{X}}) = {\text{Span}}\{ {\varvec{x}}_{1} ,{\varvec{x}}_{2} , \ldots ,{\varvec{x}}_{n} \} \), is called the measurement data space.

Define the correlation matrix as:

$$ {\varvec{R}}_{X} = E\{ {\varvec{X}}^{H} {\varvec{X}}\} = E\{ ({\varvec{A}} + {\varvec{W}})^{H} ({\varvec{A}} + {\varvec{W}})\} . $$
(1.4)

Suppose that the error matrix \( {\varvec{W}} = [{\varvec{w}}_{1} ,{\varvec{w}}_{2} , \ldots ,{\varvec{w}}_{n} ] \) is statistically independent of the real data matrix A; then

$$ {\varvec{R}}_{X} = E\{ {\varvec{X}}^{H} {\varvec{X}}\} = E\{ {\varvec{A}}^{H} {\varvec{A}}\} + E\{ {\varvec{W}}^{H} {\varvec{W}}\} . $$
(1.5)

Define \( {\varvec{R}} = E\{ {\varvec{A}}^{H} {\varvec{A}}\} \) and \( E\{ {\varvec{W}}^{H} {\varvec{W}}\} = \sigma_{w}^{2} {\varvec{I}} \), namely the measurement noises are statistically independent of each other and have the same variance \( \sigma_{w}^{2} \); then it holds that

$$ {\varvec{R}}_{X} = {\varvec{R}} + \sigma_{w}^{2} {\varvec{I}}. $$
(1.6)

Let rank(A) = r; then the eigenvalue decomposition of the matrix \( {\varvec{R}}_{X} = E\{ {\varvec{X}}^{H} {\varvec{X}}\} \) can be written as \( {\varvec{R}}_{X} = {\varvec{U\varSigma U}}^{H} + \sigma_{w}^{2} {\varvec{I}} = {\varvec{U}}({\varvec{\varSigma}} + \sigma_{w}^{2} {\varvec{I}}){\varvec{U}}^{H} = {\varvec{U\varPi U}}^{H} , \) where \( {\varvec{\varPi}} = {\varvec{\varSigma}} + \sigma_{w}^{2} {\varvec{I}} = {\text{diag}}\left( {\sigma_{1}^{2} + \sigma_{w}^{2} , \ldots ,\sigma_{r}^{2} + \sigma_{w}^{2} ,\sigma_{w}^{2} , \ldots ,\sigma_{w}^{2} } \right) \), \( {\varvec{\varSigma}} = {\text{diag}}(\sigma_{1}^{2} , \ldots ,\sigma_{r}^{2} ,0, \ldots ,0), \) and \( \sigma_{1}^{2} \ge \sigma_{2}^{2} \ge \cdots \ge \sigma_{r}^{2} \) are the nonzero eigenvalues of the real autocorrelation matrix \( {\varvec{R}} = E\{ {\varvec{A}}^{H} {\varvec{A}}\} . \)

Obviously, if the signal-to-noise ratio is large enough, that is, \( \sigma_{r}^{2} \) is considerably larger than \( \sigma_{w}^{2} \), then the first r largest eigenvalues of the autocorrelation matrix \( {\varvec{R}}_{X} \), namely \( \lambda_{1} = \sigma_{1}^{2} + \sigma_{w}^{2} ,\lambda_{2} = \sigma_{2}^{2} + \sigma_{w}^{2} , \ldots ,\lambda_{r} = \sigma_{r}^{2} + \sigma_{w}^{2} \), are called the principal eigenvalues, and the remaining n − r small eigenvalues \( \lambda_{r + 1} = \sigma_{w}^{2} ,\lambda_{r + 2} = \sigma_{w}^{2} , \ldots ,\lambda_{n} = \sigma_{w}^{2} \) are called the minor eigenvalues. Thus, the eigen decomposition of the autocorrelation matrix \( {\varvec{R}}_{X} \) can be written as

$$ {\varvec{R}}_{X} = \left[ {\begin{array}{*{20}c} {{\varvec{U}}_{S} } & {{\varvec{U}}_{n} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{\varSigma}}_{S} } & {\varvec{O}} \\ {\varvec{O}} & {{\varvec{\varSigma}}_{n} } \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{U}}_{S}^{H} } \\ {{\varvec{U}}_{n}^{H} } \\ \end{array} } \right] = {\varvec{S\varSigma }}_{S} {\varvec{S}}^{H} + {\varvec{G\varSigma }}_{n} {\varvec{G}}^{H} , $$
(1.7)

where \( {\varvec{S}}\mathop = \limits^{\text{def}} [{\varvec{s}}_{1} ,{\varvec{s}}_{2} , \ldots ,{\varvec{s}}_{r} ] = [{\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{r} ], \) \( {\varvec{G}}\mathop = \limits^{\text{def}} [{\varvec{g}}_{1} ,{\varvec{g}}_{2} , \ldots ,{\varvec{g}}_{n - r} ] = [{\varvec{u}}_{r + 1} ,{\varvec{u}}_{r + 2} , \ldots ,{\varvec{u}}_{n} ], \) \( {\varvec{\varSigma}}_{S} = {\text{diag}}(\sigma_{1}^{2} + \sigma_{w}^{2} ,\sigma_{2}^{2} + \sigma_{w}^{2} , \ldots ,\sigma_{r}^{2} + \sigma_{w}^{2} ), \) and \( {\varvec{\varSigma}}_{n} = {\text{diag}}(\sigma_{w}^{2} ,\sigma_{w}^{2} , \ldots ,\sigma_{w}^{2} ) \); the \( n \times r \) matrix S with orthonormal columns is composed of the eigenvectors corresponding to the r principal eigenvalues, and the \( n \times (n - r) \) matrix G with orthonormal columns is composed of the eigenvectors corresponding to the n − r minor eigenvalues.
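
The following sketch illustrates this decomposition numerically; the dimensions, rank, and noise level are illustrative assumptions. It builds a correlation matrix of the form \( {\varvec{R}} + \sigma_{w}^{2} {\varvec{I}} \) from a rank-r term plus white noise, as in Eq. (1.6), and splits its eigenvectors into the signal part S and the noise part G:

```python
import numpy as np

rng = np.random.default_rng(6)
n, r, sigma_w = 8, 3, 0.1

B = rng.standard_normal((n, r))
R = B @ B.T                              # rank-r "signal" correlation matrix
R_X = R + sigma_w**2 * np.eye(n)         # noisy correlation matrix, cf. Eq. (1.6)

eigvals, U = np.linalg.eigh(R_X)         # eigenvalues in ascending order
S = U[:, -r:]                            # signal subspace basis (r principal eigenvectors)
G = U[:, :-r]                            # noise subspace basis (n - r minor eigenvectors)

# The r largest eigenvalues equal sigma_i^2 + sigma_w^2; the rest equal sigma_w^2.
print(eigvals[::-1])
```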

Definition 4

Define S as the eigenvector matrix corresponding to the first r largest eigenvalues \( \lambda_{1} ,\lambda_{2} , \ldots ,\lambda_{r} \) of the autocorrelation matrix of the measurement data. Then its column space \( {\text{Span}}({\varvec{S}}) = {\text{Span}}\{ {\varvec{u}}_{1} ,{\varvec{u}}_{2} , \ldots ,{\varvec{u}}_{r} \} \) is called the signal subspace of the measurement data space \( {\text{Span}}({\varvec{X}}) \), and the column space \( {\text{Span}}({\varvec{G}}) = {\text{Span}}\{ {\varvec{u}}_{r + 1} ,{\varvec{u}}_{r + 2} , \ldots ,{\varvec{u}}_{n} \} \) of the eigenvector matrix G corresponding to the n − r minor eigenvalues is called the noise subspace of the measurement data space.

In the following, we analyze the geometric meaning of the signal subspace and the noise subspace. From the construction of the subspaces and the properties of the unitary matrix U, we know that the signal subspace and the noise subspace are orthogonal, that is,

$$ {\text{Span}}\{ {\varvec{s}}_{1} ,{\varvec{s}}_{2} , \ldots ,{\varvec{s}}_{r} \} \bot {\text{Span}}\{ {\varvec{g}}_{1} ,{\varvec{g}}_{2} , \ldots ,{\varvec{g}}_{n - r} \} . $$
(1.8)

Since U is a unitary matrix, it holds that

$$ {\varvec{UU}}^{H} = \left[ {\begin{array}{*{20}c} {\varvec{S}} & {\varvec{G}} \\ \end{array} } \right]\left[ {\begin{array}{*{20}c} {{\varvec{S}}^{H} } \\ {{\varvec{G}}^{H} } \\ \end{array} } \right] = {\varvec{SS}}^{H} + {\varvec{GG}}^{H} = {\varvec{I}}, $$

that is,

$$ {\varvec{GG}}^{H} = {\varvec{I}} - {\varvec{SS}}^{H} . $$
(1.9)

Define the projection matrix of signal subspace as

$$ {\varvec{P}}_{S} \mathop = \limits^{\text{def}} {\varvec{S}}\left\langle {{\varvec{S}},{\varvec{S}}} \right\rangle^{ - 1} {\varvec{S}}^{H} = {\varvec{SS}}^{H} , $$
(1.10)

where the matrix inner product \( \left\langle {{\varvec{S}},{\varvec{S}}} \right\rangle = {\varvec{S}}^{H} {\varvec{S}} = {\varvec{I}} \).

Thus, \( {\varvec{P}}_{S} {\varvec{x}} \) can be considered as the projection of the vector x onto the signal subspace, and \( ({\varvec{I}} - {\varvec{P}}_{S} ){\varvec{x}} \) is the component of x orthogonal to the signal subspace. From \( \left\langle {{\varvec{G}},{\varvec{G}}} \right\rangle = {\varvec{G}}^{H} {\varvec{G}} = {\varvec{I}} \), it follows that the projection matrix onto the noise subspace is \( {\varvec{P}}_{n} = {\varvec{G}}\left\langle {{\varvec{G}},{\varvec{G}}} \right\rangle^{ - 1} {\varvec{G}}^{H} = {\varvec{GG}}^{H} \). Therefore, the matrix

$$ {\varvec{GG}}^{H} = {\varvec{I}} - {\varvec{SS}}^{H} = {\varvec{I}} - {\varvec{P}}_{S} $$
(1.11)

is usually called the orthogonal complement projection matrix of the signal subspace, i.e., the projection matrix onto the noise subspace.
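
The projection matrices of Eqs. (1.10) and (1.11) can be formed and checked directly, as in the following sketch (again with illustrative synthetic data):

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 8, 3
B = rng.standard_normal((n, r))
R_X = B @ B.T + 0.01 * np.eye(n)              # noisy correlation matrix as in Eq. (1.6)

_, U = np.linalg.eigh(R_X)
S, G = U[:, -r:], U[:, :-r]                   # signal and noise subspace bases

P_S = S @ S.T                                 # projection onto the signal subspace, Eq. (1.10)
P_n = G @ G.T                                 # projection onto the noise subspace

assert np.allclose(P_S + P_n, np.eye(n))      # Eq. (1.11): GG^H = I - SS^H
assert np.allclose(P_S @ P_n, np.zeros((n, n)))   # the two subspaces are orthogonal

x = rng.standard_normal(n)
x_sig = P_S @ x                               # projection of x onto the signal subspace
x_perp = (np.eye(n) - P_S) @ x                # orthogonal component, equal to P_n @ x
```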

The subspace applications have the following characteristics [5, 71]:

  1. (1)

    Only a few singular vectors or eigenvectors are needed. Since the number of large singular values (or eigenvalues) of the matrix \( {\varvec{A}}_{m \times n} \) is usually smaller than the number of small ones, it is more efficient to use the lower-dimensional signal subspace than the noise subspace.

  2. (2)

    In many applications, one does not need to know the singular values or eigenvalues themselves; only the rank of the matrix and its singular vectors or eigenvectors are needed.

  3. (3)

    In most instances, one does not need to know the singular vectors or eigenvectors of the matrix exactly; it suffices to know a set of basis vectors spanning the signal subspace or the noise subspace.

1.2.2 Subspace Tracking Method

The iterative computation of an extreme (maximal or minimal) eigen pair (eigenvalue and eigenvector) dates back to 1966 [72]. In 1980, Thompson proposed an LMS-type adaptive algorithm for estimating the eigenvector corresponding to the smallest eigenvalue of the sample covariance matrix and provided an adaptive angle/frequency tracking algorithm by combining it with Pisarenko’s harmonic estimator [14]. Sarkar et al. [73] used the conjugate gradient algorithm to track the variation of the extreme eigenvector corresponding to the smallest eigenvalue of the covariance matrix of a slowly changing signal and proved that it converges much faster than Thompson’s LMS-type algorithm. These methods were only used to track a single extreme eigenvalue and eigenvector, which limited their application, but they were later extended to eigen-subspace tracking and updating methods. In 1990, Comon and Golub [6] proposed the Lanczos method for tracking the extreme singular value and singular vector, a method originally designed for large, sparse symmetric eigenproblems \( {\varvec{Ax}} = \lambda {\varvec{x}} \) [74].

The earliest eigenvalue and eigenvector updating method was proposed by Golub in 1973 [75]. Later, Golub’s updating idea was extended by Bunch et al. [76, 77]; the basic idea is to update the eigenvalue decomposition of the covariance matrix after every rank-one modification, locate the latent roots by the interlacing theorem, and then refine their positions by an iterative root-finding method, from which the eigenvectors can be updated. Later, Schreiber [78] introduced a transformation that converts most of the complex-valued arithmetic into real-valued operations and made use of Karasalo’s subspace averaging method [79] to further reduce the computational load. DeGroat and Roberts [80] developed a numerically stabilized rank-one eigen structure updating method based on mutual Gram–Schmidt orthogonalization. Yu [81] extended the rank-one eigen structure update to block updates and proposed a recursive update of the eigenvalue decomposition of a covariance matrix.

The earliest adaptive signal subspace tracking method was proposed by Owsley [7] in 1978. Using the stochastic gradient method, Yang and Kaveh [18] proposed an LMS-type subspace tracking algorithm and extended Owsley’s and Thompson’s methods. This LMS-type algorithm has a highly parallel structure and low computational complexity. Karhunen [17] extended Owsley’s idea by developing a stochastic approximation method for computing the subspace. Just as Yang and Kaveh extended Thompson’s idea to develop an LMS-type subspace tracking algorithm, Fu and Dowling [45] extended Sarkar’s idea to develop a subspace tracking algorithm based on the conjugate gradient method. During the past 20 years, eigen-subspace tracking and updating has been an active research field. Since eigen-subspace tracking is mainly applied to real-time signal processing, these methods need to be fast.

According to [71], the eigen-subspace tracking and updating methods can be classified into the following four classes:

  1. (1)

    In some applications of eigen-subspace methods, such as MUSIC, one only needs an orthogonal basis of the noise subspace and does not need the eigenvectors themselves. This characteristic simplifies the adaptive tracking problem for a class of eigenvectors. Methods that only track an orthogonal basis of the noise subspace form the first class; they are based on the rank-revealing URV [82] and rank-revealing QR [83] decompositions, respectively.

  2. (2)

    For methods that track and update the eigenvalues and the eigen-subspace simultaneously, a common approach is to regard the covariance matrix of the nonstationary signal at time k as the sum of the covariance matrix at time k − 1 and a rank-one matrix (the outer product of the measurement vector with itself). Thus, tracking the eigenvalue decomposition of the covariance matrix becomes the so-called rank-one updating problem [81, 84].

  3. (3)

    Regarding the determination of the eigen-subspace as an optimization problem, either constrained or unconstrained. The constrained optimization problem can be solved using the stochastic gradient [18] and conjugate gradient [45] methods. The unconstrained formulation gives a new interpretation of the eigen-subspace, and its corresponding method is called projection approximation subspace tracking (PAST) [5]; a sketch of the PAST recursion is given after this list. Another classical representative is based on the Lanczos algorithm and uses Lanczos iterations together with stochastic approximation to compute the subspace of a slowly changing data matrix [85]. Xu et al. [86, 87] proposed three Lanczos and dual-Lanczos subspace tracking algorithms; the former are suitable for the eigen decomposition of the covariance matrix, the latter for the singular value decomposition of the data matrix, and during the Lanczos iterations they can test and estimate the number of principal eigenvalues and principal singular values. In view of the close mathematical connection between the Lanczos algorithm and the conjugate gradient method, these algorithms, though not directly connected with an optimization problem, still fall into the third class.

  4. (4)

    Modify and extend classical batch eigen decomposition/singular value decomposition methods such as QR decomposition, the Jacobi method, and power iteration to make them adaptive. For example, the singular value decomposition updating algorithm based on QR updating and a Jacobi-type method [88] falls into this class.
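
As an example of the third class, the following sketch implements the PAST recursion in its commonly presented form [5]; the forgetting factor, the initialization, and the synthetic data model are illustrative assumptions. It tracks an r-dimensional signal subspace from a stream of data vectors:

```python
import numpy as np

rng = np.random.default_rng(8)
n, r, beta = 10, 3, 0.97                  # data dimension, subspace rank, forgetting factor
A = rng.standard_normal((n, r))           # mixing matrix of a synthetic low-rank source

W = np.eye(n)[:, :r]                      # initial subspace estimate (orthonormal columns)
P = np.eye(r)                             # inverse of the compressed correlation matrix

for _ in range(5000):
    x = A @ rng.standard_normal(r) + 0.05 * rng.standard_normal(n)   # noisy observation
    y = W.T @ x                           # compressed data vector (projection approximation)
    h = P @ y
    g = h / (beta + y @ h)                # gain vector
    P = (P - np.outer(g, h)) / beta       # recursive update of the inverse correlation
    P = (P + P.T) / 2                     # keep P symmetric against round-off
    e = x - W @ y                         # projection error
    W = W + np.outer(e, g)                # subspace update

# The columns of W now approximately span the signal subspace of the data.
```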

1.3 Main Features of This Book

This book presents principal component analysis algorithms and their extensions using a neural network approach. Pertinent features include the following:

  1. (1)

    A tutorial-style overview of neural networks-based principal component analysis algorithms, minor component analysis algorithms, principal subspace tracking, and minor subspace tracking.

  2. (2)

    Analysis of self-stabilizing feature of neural-based PCA/MCA algorithms, and development of a self-stabilizing neural-based minor component analysis algorithm.

  3. (3)

    Total least squares estimation application of MCA algorithms, and development of a novel neural-based algorithm for total least squares filtering.

  4. (4)

    Development of a novel dual-purpose principal and minor subspace gradient flow and unified self-stabilizing algorithm for principal and minor components’ extraction.

  5. (5)

    Analysis of a discrete-time dynamics of a class of self-stabilizing MCA learning algorithms and a convergence analysis of deterministic discrete-time system of a unified self-stabilizing algorithm for PCA and MCA.

  6. (6)

    Extension of PCA algorithm to generalized feature extraction and development of a novel adaptive algorithm for minor generalized eigenvector extraction and a novel multiple generalized minor component extraction algorithm.

  7. (7)

    Development of unified and coupled PCA and MCA rules and an adaptive coupled generalized eigen pair extraction algorithm, based on Möller’s coupled PCA neural algorithm.

  8. (8)

    Generalization of feature extraction from autocorrelation matrix to cross-correlation matrix, and development of an effective neural algorithm for extracting cross-correlation feature between two high-dimensional data streams and a coupled principal singular triplet extraction algorithm of a cross-covariance matrix.

1.4 Organization of This Book

As reflected in the title, this book is concerned with three areas of the principal component analysis method, namely neural-based algorithms, performance analysis methods, and generalized/extended algorithms. Consequently, the book can be naturally divided into three parts with a common theme. In these three areas, many novel algorithms have been proposed by us. To appreciate these new algorithms, the conventional approaches and existing methods also need to be understood. Fundamental knowledge of conventional principal component analysis, neural-based feature extraction, subspace tracking, performance analysis methods, and even feature extraction based on matrix theory is essential for understanding the advanced material presented in this book. Thus, each part of this book starts with a tutorial-type introduction to the area.

Part I starts with Chap. 2, which provides an overview of some important concepts and theorems of eigen decomposition and singular value decomposition related to principal component analysis. Chapter 3 serves as a starting point to introduce neural-based principal component analysis. The key Hebbian network and Oja’s network, which form the core of neural network-based PCA algorithms, can be found in this chapter. Chapter 4 provides an introduction to neural network-based MCA algorithms and the self-stabilizing analysis of these algorithms, followed by a novel self-stabilizing MCA algorithm and a novel neural algorithm for total least squares filtering proposed by us. Part I ends with Chap. 5, which addresses the theoretical issue of dual-purpose principal and minor component analysis. In this chapter, several important dual-purpose algorithms proposed by us are introduced, and their performance and numerical considerations are analyzed. Part II starts with a tutorial-style introduction to the deterministic continuous-time (DCT), stochastic discrete-time (SDT), and deterministic discrete-time (DDT) systems, followed by a detailed analysis of the DDT systems of a new self-stabilizing MCA algorithm and Chen’s unified PCA/MCA algorithm in Chap. 6. Part III starts with Chap. 7. The generalized Hermitian eigenvalue problem and existing adaptive algorithms to extract generalized eigen pairs are reviewed, and then a minor generalized eigenvector extraction algorithm and a novel adaptive algorithm for generalized coupled eigen pairs of ours are introduced and discussed. The other two chapters of Part III are devoted to coupled principal component analysis and cross-correlation feature extraction, respectively, in which our novel coupled or extended algorithms are introduced and analyzed.

Some of the materials presented in this book have been published in archival journals by the authors and are included in this book after necessary modifications or updates (some of them major) to ensure accuracy, relevance, completeness, and coherence. This portion of the materials includes:

  • Section 4.4 of Chapter 4, reprinted from Neural Networks, Xiangyu Kong, Changhua Hu, Chongzhao Han, “A self-stabilizing MSA algorithm in high-dimensional data stream”, Vol. 23, 865–871, © 2010 Elsevier Ltd., with permission from Elsevier.

  • Section 4.5 of Chapter 4, reprinted from Neural Processing Letter, Xiangyu Kong, Changhua Hu, Chongzhao Han, “A self-stabilizing neural algorithm for total least squares filtering”, Vol. 30, 257–271, © 2009 Springer Science+Business Media, LLC., reprinted with permission.

  • Section 5.3 of Chapter 5, reprinted from IEEE Transactions on Signal Processing, Xiangyu Kong, Changhua Hu, Chongzhao Han, “A Dual purpose principal and minor subspace gradient flow”, Vol. 60, No.1, 197–210, © 2012 IEEE., with permission from IEEE.

  • Section 6.3 of Chapter 6, reprinted from IEEE Transactions on Neural Networks, Xiangyu Kong, Changhua Hu, Chongzhao Han, “On the discrete time dynamics of a class of self-stabilizing MCA learning algorithm”, Vol. 21, No. 1, 175–181, © 2010 IEEE., with permission from IEEE.

  • Section 6.4 of Chapter 6, reprinted from Neural Networks, Xiangyu Kong, Qiusheng An, Hongguang Ma, Chongzhao Han, Qi Zhang, “Convergence analysis of deterministic discrete time system of a unified self-stabilizing algorithm for PCA and MCA”, Vol. 36, 64–72, © 2012 Elsevier Ltd., with permission from Elsevier.

  • Section 7.3 and 7.4 of Chapter 7, reprinted from IEEE Transactions on Signal Processing, Gao Yingbin, Kong Xiangyu, Hu Changhua, Li Hongzeng, and Hou Li'an, “A Generalized Information Criterion for generalized Minor Component Extraction”, Vol. 65, No. 4, 947–959, © 2017 IEEE., with permission from IEEE.

  • Section 8.3 of Chapter 8, reprinted from Neural Processing Letter, Xiaowei Feng, Xiangyu Kong, Hongguang Ma, and Haomiao Liu, “Unified and coupled self-stabilizing algorithm for minor and principal eigen-pair extraction”, doi: 10.1007/s11063-016-9520-3, © 2016 Springer Science+Business Media, LLC., reprinted with permission.

  • Section 8.4 of Chapter 8, reprinted from IEEE Transactions on Signal Processing, Xiaowei Feng, Xiangyu Kong, Zhansheng Duan, and Hongguang Ma, “Adaptive generalized eigen-pairs extraction algorithm and their convergence analysis”, Vol. 64, No. 11, 2976–2989, © 2016 IEEE., with permission from IEEE.

  • Section 9.3 of Chapter 9, reprinted from Neural Processing Letter, Xiangyu Kong, Hongguang Ma, Qiusheng An, Qi Zhang, “An effective neural learning algorithm for extracting cross-correlation feature between two high-dimensional data streams”, Vol. 42, 459–477, © 2015 Springer Science+Business Media, LLC., reprinted with permission.