8.1 Introduction

Among neural network-based PCA or MCA algorithms, most of those previously reviewed do not consider eigenvalue estimates in the update equations of the weights, apart from an attempt to control the learning rate based on the eigenvalue estimates [1]. In [2], Moller provided a framework for a special class of learning rules in which eigenvectors and eigenvalues are estimated simultaneously in coupled update equations, and proved that coupled learning algorithms are a solution to the speed stability problem that plagues most noncoupled learning algorithms. The convergence speed of a system depends on the eigenvalues of its Jacobian, which in noncoupled PCA/MCA algorithms vary with the eigenvalues of the covariance matrix [2]. Moller showed that, in noncoupled PCA algorithms, the eigen motion in all directions depends mainly on the principal eigenvalue of the covariance matrix [2]. Numerical stability and fast convergence can then only be achieved by guessing this eigenvalue in advance [2]. In particular, for chains of principal component analyzers which simultaneously estimate the first few principal eigenvectors [3], choosing the right learning rate for every stage may be difficult. The problem is even more severe for MCA algorithms, which exhibit a wide range of convergence speeds in different eigen directions, since the eigenvalues of the Jacobian cover approximately the same range as the eigenvalues of the covariance matrix. With learning rates small enough to guarantee the stability of the numerical procedure, noncoupled MCA algorithms may therefore converge very slowly [2].

In [2], Moller derived a coupled learning rule by applying Newton's method to a common information criterion. A Newton descent yields learning rules with approximately equal convergence speeds in all eigen directions of the system. Moreover, all eigenvalues of the Jacobian of such a system are approximately equal; thus, the dependence on the eigenvalues of the covariance matrix can be eliminated [2]. Moller showed that, with respect to the averaged differential equations, this approach solves the speed stability problem for both PCA and MCA rules. However, these differential equations can only be turned into the aforementioned online rules for the PCA case, not for the MCA case, leaving the more severe MCA stability problem unresolved [2]. Interestingly, unlike most existing adaptive algorithms, the coupled learning rule for the Hermitian eigenvalue problem (HEP) effectively utilizes the latest eigenvalue estimate to update the eigenvector estimate [4]. Numerical examples in [2] showed that this algorithm achieves fast and stable convergence for both low-dimensional and high-dimensional data. Unfortunately, no explicit convergence analysis of the coupled learning rule has been reported. Thus, the conditions for convergence to the desired eigen pair are not clear; e.g., the region within which the initial estimate of the eigen pair must be chosen to guarantee convergence to the desired eigen pair is not yet known [4].

Recently, Tuan Duong Nguyen et al. proposed novel algorithms in [4] for the case in which explicit knowledge of the matrix pencil (R y , R x ) is given. These algorithms for estimating the generalized eigen pair associated with the largest/smallest generalized eigenvalue are designed (i) on the basis of a new characterization of the generalized eigen pair as a stationary point of a certain function and (ii) by combining a normalization step and a quasi-Newton step at each update. Moreover, a rigorous convergence analysis of the algorithms was established by the deterministic discrete-time (DDT) approach. For adaptive implementation of the algorithms, Nguyen et al. proposed to use exponentially weighted sample covariance matrices and the Sherman–Morrison–Woodbury matrix-inversion lemma.

The aim of this chapter is to develop coupled PCA and coupled generalized PCA algorithms. First, on the basis of a special information criterion in [5], we propose a coupled dynamical system obtained by modifying Newton's method. Based on this coupled system and some approximations, we derive two CMCA algorithms and two CPCA algorithms; thus, two unified coupled algorithms are obtained [6]. Then, we propose a coupled generalized system, which is obtained by using Newton's method and a novel generalized information criterion. Based on this coupled generalized system, we obtain two coupled algorithms with normalization steps for minor/principal generalized eigen pair extraction. The technique of multiple generalized eigen pair extraction is also introduced in this chapter. The convergence of the algorithms is justified via the DDT system.

In this chapter, we review and discuss the existing coupled PCA and coupled generalized PCA algorithms. Two coupled algorithms proposed by us are analyzed in detail. The remainder of this chapter is organized as follows. An overview of the existing coupled PCA and coupled generalized PCA algorithms is presented in Sect. 8.2. Unified and coupled self-stabilizing algorithms for minor and principal eigen pair extraction are discussed in Sect. 8.3. Adaptive generalized eigen pair extraction algorithms and their convergence analysis via the DDT method are presented in Sect. 8.4, followed by a summary in Sect. 8.5.

8.2 Review of Coupled Principal Component Analysis

8.2.1 Moller’s Coupled PCA Algorithm

Learning rules for principal component analysis are often derived by optimizing some information criterion, e.g., by maximizing the variance of the projected data or by minimizing the reconstruction error [2, 7]. In [2], Moller proposed the following information criterion as the starting point of his analysis

$$ p = \varvec{w}^{\text{T}} \varvec{Cw}\lambda^{ - 1} - \varvec{w}^{\text{T}} \varvec{w} + { \ln }\,\lambda . $$
(8.1)

where w denotes an n-dimensional weight vector, i.e., the estimate of the eigenvector, \( \lambda \) is the eigenvalue estimate, and \( \varvec{C} = E\{ \varvec{xx}^{\text{T}} \} \) is the \( n \times n \) covariance matrix of the data. From (8.1), by using the gradient method and the Newton descent, Moller derived a coupled system of differential equations for the PCA case

$$ \dot{\varvec{w}} = \varvec{Cw}\lambda^{ - 1} - \varvec{ww}^{\text{T}} \varvec{Cw}\lambda^{ - 1} - \frac{1}{2}\varvec{w}\,(1 - \varvec{w}^{\text{T}} \varvec{w}), $$
(8.2)
$$ \dot{\lambda } = \varvec{w}^{\text{T}} \varvec{Cw} - \varvec{w}^{\text{T}} \varvec{w}\lambda , $$
(8.3)

and another for MCA case

$$ \dot{\varvec{w}} = \varvec{C}^{ - 1} \varvec{w}\lambda + \varvec{ww}^{\text{T}} \varvec{Cw}\lambda^{ - 1} - \frac{1}{2}\varvec{w}\,(1 + 3\varvec{w}^{\text{T}} \varvec{w}), $$
(8.4)
$$ \dot{\lambda } = \varvec{w}^{\text{T}} \varvec{Cw} - \varvec{w}^{\text{T}} \varvec{w}\lambda . $$
(8.5)

For the stability of the above algorithms, see [2]. It has been shown that, for the above coupled PCA system, if we assume \( \lambda_{j} \ll \lambda_{1} \), the system converges with approximately equal speed in all its eigen directions, and this speed is largely independent of the eigenvalues \( \lambda_{j} \) of the covariance matrix. For the above coupled MCA system, if we assume \( \lambda_{1} \ll \lambda_{j} \), the convergence speed is again approximately equal in all eigen directions and independent of the eigenvalues of C.

By informally approximating \( \varvec{C} \approx \varvec{xx}^{\text{T}} \), the averaged differential equations of (8.2) and (8.3) can be turned into an online learning rule:

$$ \dot{\varvec{w}} = \gamma \left[ {y\lambda^{ - 1} (\varvec{x} - \varvec{w}y) - \frac{1}{2}\varvec{w}\,(1 - \varvec{w}^{\text{T}} \varvec{w})} \right], $$
(8.6)
$$ \dot{\lambda } = \gamma (y^{2} - \varvec{w}^{\text{T}} \varvec{w}\lambda ). $$
(8.7)

According to stochastic approximation theory, the resulting stochastic differential equation has the same convergence goal as the deterministic averaged equation if certain conditions are fulfilled, the most important of which is that the learning rate decreases to zero over time. The online rules (8.6) and (8.7) can be understood as a learning rule for the weight vector w of a linear neuron which computes its output y = w T x as the scalar product of the weight vector and the input vector.
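For illustration, a minimal NumPy sketch of the online rule (8.6)–(8.7) on synthetic data could look as follows; the data model, the initial values, and the learning-rate schedule are assumptions made here for the example, not taken from [2].

```python
import numpy as np

rng = np.random.default_rng(0)
n, steps = 5, 20000

# Synthetic zero-mean data whose covariance has distinct eigenvalues.
A = rng.standard_normal((n, n))
X = rng.standard_normal((steps, n)) @ A.T      # each row x = A z, so C = A A^T

w = rng.standard_normal(n)
w /= np.linalg.norm(w)
lam = 1.0                                      # eigenvalue estimate

for k, x in enumerate(X):
    gamma = 1.0 / (100.0 + k)                  # learning rate decaying toward zero
    y = w @ x                                  # neuron output y = w^T x
    # Coupled online PCA rule, Eqs. (8.6)-(8.7)
    w_new = w + gamma * ((y / lam) * (x - y * w) - 0.5 * w * (1.0 - w @ w))
    lam = lam + gamma * (y * y - (w @ w) * lam)
    w = w_new

C = X.T @ X / steps
print("estimated eigenvalue   :", lam)
print("largest eigenvalue of C:", np.linalg.eigvalsh(C)[-1])
```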

In [2], the analysis of the temporal derivative of the (squared) weight vector length in (8.6) has shown that the weight vector length may in general be fluctuating. By further approximating \( \varvec{w}^{\text{T}} \varvec{w} \approx \text{1} \) (which is fulfilled in the vicinity of the stationary points) in the averaged systems (8.2) and (8.3), the following system can be derived

$$ \dot{\varvec{w}} = \varvec{Cw}\lambda^{{ - \text{1}}} - \varvec{ww}^{\text{T}} \varvec{Cw}\lambda^{{ - \text{1}}} , $$
(8.8)
$$ \dot{\lambda } = \varvec{w}^{\text{T}} \varvec{Cw} - \lambda . $$
(8.9)

This learning rule system is known as ALA [1]. The eigenvalues of the system’s Jacobian are still approximately equal and widely independent of the eigenvalues of the covariance matrix. The corresponding online system is given by

$$ \dot{\varvec{w}} = \gamma y\lambda^{ - 1} (\varvec{x} - \varvec{w}y), $$
(8.10)
$$ \dot{\lambda } = \gamma (y^{2} - \lambda ). $$
(8.11)

It is obvious that ALA can be interpreted as an instance of Oja’s PCA rule.

From (8.4) and (8.5), it has been shown that having a Jacobian whose eigenvalues are equal and largely independent of the eigenvalues of the covariance matrix appears to be a solution to the speed stability problem. However, when attempting to turn this system into an online rule, a problem is encountered in replacing the inverse covariance matrix C −1 by a quantity involving the input vector x. An averaged equation depending linearly on C takes the form \( \dot{\varvec{w}} = f(\varvec{C},\varvec{w}) = f(E\left\{ {\varvec{xx}^{\text{T}} } \right\},\varvec{w}) = E\{ f(\varvec{xx}^{\text{T}} ,\varvec{w})\} \). In an online rule, the expectation of the gradient is approximated by slowly following \( \dot{\varvec{w}} = \gamma f(\varvec{xx}^{\text{T}} ,\varvec{w}) \) for subsequent observations of x. This transition is obviously not possible if the equation contains C −1. Thus, there is no online version of the MCA system (8.4) and (8.5). Even with the ALA-style normalization, the convergence speed in different eigen directions still depends on the entire range of eigenvalues of the covariance matrix, so the speed stability problem still exists.

8.2.2 Nguyen's Coupled Generalized Eigen Pair Extraction Algorithm

In [8], Nguyen proposed a generalized principal component analysis algorithm and its differential equation form is given as:

$$ \dot{\varvec{w}} = \varvec{R}_{x}^{{ - \text{1}}} \varvec{R}_{y} \varvec{w} - \varvec{w}^{H} \varvec{R}_{y} \varvec{ww}. $$
(8.12)

Let W = [w 1, w 2,…, w N ], in which w 1, w 2,…, w N are the generalized eigenvectors of matrix pencil (R y , R x ). The Jacobian in the stationary point is given as:

$$ \varvec{J}(\varvec{w}_{\text{1}} ) = \frac{{\partial \dot{\varvec{w}}}}{{\partial \varvec{w}^{\text{T}} }}\left| {_{{\varvec{w} = \varvec{w}_{1} }} } \right. = \varvec{R}_{x}^{{ - \text{1}}} \varvec{R}_{y} -\lambda_{\text{1}} \varvec{I} - 2\lambda_{\text{1}} \varvec{w}_{\text{1}} \varvec{w}_{\text{1}}^{H} \varvec{R}_{x} . $$
(8.13)

Solving for the eigenvalues of J can be simplified to solving for the eigenvalues of the similar matrix J * = P −1 JP, since J * and J have the same eigenvalues and the eigenvalues of the diagonal matrix J * are easy to obtain. Considering W H R x W = I, let P = W. Then we have P −1 = W H R x . Thus, it holds that

$$ \begin{aligned} \varvec{J}^{*} (\varvec{w}_{\text{1}} ) & = \varvec{W}^{H} \varvec{R}_{x} \left( {\varvec{R}_{x}^{{ - \text{1}}} \varvec{R}_{y} -\lambda_{\text{1}} \varvec{I} - 2\lambda_{\text{1}} \varvec{w}_{\text{1}} \varvec{w}_{\text{1}}^{H} \varvec{R}_{x} } \right)\varvec{W} \\ & =\varvec{\varLambda}-\lambda_{\text{1}} \varvec{I} - 2\lambda_{1} \varvec{W}^{H} \varvec{R}_{x} \varvec{w}_{\text{1}} \left( {\varvec{W}^{H} \varvec{R}_{x} \varvec{w}_{\text{1}} } \right)^{H} . \\ \end{aligned} $$
(8.14)

Since W H R x w 1 = e 1 = [1, 0,…, 0]T, (8.14) will be reduced to

$$ \varvec{J}^{*} (\varvec{w}_{\text{1}} ) =\varvec{\varLambda}-\lambda_{\text{1}} \varvec{I} - {2\lambda }_{\text{1}} \varvec{e}_{\text{1}} \varvec{e}_{\text{1}}^{H} . $$
(8.15)

The eigenvalues \( \alpha \) determined from \( { \det }\,(\varvec{J}^{*} - \alpha \varvec{I}) = 0 \) are given as:

$$ \upalpha_{1} = - {2\lambda }_{1} ,\;\upalpha_{j} =\lambda_{j} -\lambda_{1} ,\;j = \text{2}, \ldots ,N. $$
(8.16)

Since stability requires \( \alpha < 0 \) and thus \( \lambda_{j} < \lambda_{1} ,\;j = 2,3, \ldots ,N \), it can be seen that only the principal eigenvector–eigenvalue pair is a stable stationary point, and all other stationary points are saddles or repellers, which confirms that (8.12) is a generalized PCA algorithm. In practical signal processing applications, it always holds that \( \lambda_{ 1} \gg \lambda_{j} ,\;j = 2,3, \ldots ,N \). Thus, \( \alpha_{j} \approx - \lambda_{1} \), i.e., the eigen motion in all directions in algorithm (8.12) depends on the principal generalized eigenvalue. Thus, this algorithm has the speed stability problem.
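The eigenvalue pattern (8.16) can be checked numerically. The sketch below is a constructed example (a random symmetric-definite pencil, not taken from [8]): it forms the Jacobian (8.13) at the principal generalized eigen pair and compares its spectrum with {−2λ 1, λ j − λ 1}.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
N = 6
A = rng.standard_normal((N, N)); Ry = A @ A.T
B = rng.standard_normal((N, N)); Rx = B @ B.T + N * np.eye(N)

# Generalized eigen pairs of the pencil (Ry, Rx); eigh returns ascending
# eigenvalues and eigenvectors normalized so that W^T Rx W = I.
lam, W = eigh(Ry, Rx)
lam1, w1 = lam[-1], W[:, -1]          # principal generalized eigen pair

# Jacobian (8.13) evaluated at the stationary point w1
J = np.linalg.solve(Rx, Ry) - lam1 * np.eye(N) - 2.0 * lam1 * np.outer(w1, w1) @ Rx
alphas = np.sort(np.linalg.eigvals(J).real)

predicted = np.sort(np.append(lam[:-1] - lam1, -2.0 * lam1))   # Eq. (8.16)
print(np.allclose(alphas, predicted))                          # expect True
```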

In [4], an adaptive normalized quasi-Newton algorithm for generalized eigen pair extraction was proposed and its convergence analysis was conducted. This is a coupled generalized eigen pair extraction algorithm, which can be interpreted as a natural combination of a normalization step and a quasi-Newton step for finding the stationary points of the function

$$ \xi (\varvec{w},\lambda ) = \varvec{w}^{H} \varvec{R}_{\varvec{y}} \varvec{w}\lambda^{ - 1} - \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + { \ln }\,\lambda , $$
(8.17)

which is a generalization of the information criterion introduced in [2] for the HEP. The stationary point of \( \xi \) is defined as a zero of

$$ \left( {\begin{array}{*{20}c} {\frac{\partial \xi }{{\partial \varvec{w}}}} \\ {\frac{\partial \xi }{\partial \lambda }} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {2\varvec{R}_{\varvec{y}} \varvec{w}\lambda^{ - 1} - 2\varvec{R}_{\varvec{x}} \varvec{w}} \\ { - \varvec{w}^{H} \varvec{R}_{\varvec{y}} \varvec{w}\lambda^{ - 2} + \lambda^{ - 1} } \\ \end{array} } \right). $$
(8.18)

Hence, from

$$ \left\{ {\begin{array}{*{20}c} {\varvec{R}_{\varvec{y}} \bar{\varvec{w}} = \bar{\lambda }\varvec{R}_{\varvec{x}} \bar{\varvec{w}}} \\ {\bar{\varvec{w}}^{H} \varvec{R}_{\varvec{y}} \bar{\varvec{w}} = \bar{\lambda }} \\ \end{array} } \right., $$
(8.19)

it can be seen that \( (\bar{\varvec{w}},\bar{\lambda }) \in C^{N} \times \Re \) is a stationary point of \( \xi \) if and only if it satisfies (8.19), which implies that a stationary point \( (\bar{\varvec{w}},\bar{\lambda }) \) of \( \xi \) is a generalized eigen pair of the matrix pencil (R y , R x ). To avoid the computational difficulties encountered in Newton's method, Nguyen proposed to use approximations of \( \varvec{H}^{ - 1} (\varvec{w},\lambda ) \) in the vicinity of the two stationary points of special interest

$$ \varvec{H}^{ - 1} (\varvec{w},\lambda ) \approx \widetilde{\varvec{H}}_{P}^{ - 1} (\varvec{w},\lambda ) = \frac{1}{2}\left( {\begin{array}{*{20}c} {\frac{1}{2}\varvec{ww}^{H} - \varvec{R}_{\varvec{x}}^{{ - {\mathbf{1}}}} } & { - \varvec{w}\lambda } \\ { - \varvec{w}^{H} \lambda } & 0 \\ \end{array} } \right), $$
(8.20)

for \( (\varvec{w},\lambda ) \approx (\varvec{v}_{N} ,\lambda_{N} ) \), and

$$ \varvec{H}^{ - 1} (\varvec{w},\lambda ) \approx \widetilde{\varvec{H}}_{M}^{ - 1} \,(\varvec{w},\lambda ) = \frac{1}{2}\left( {\begin{array}{*{20}c} {\varvec{R}_{\varvec{y}}^{{ - {\mathbf{1}}}} \lambda - \frac{3}{2}\varvec{ww}^{H} } & { - \varvec{w}\lambda } \\ { - \varvec{w}^{H} \lambda } & 0 \\ \end{array} } \right), $$
(8.21)

for \( (\varvec{w},\lambda ) \approx (\varvec{v}_{1} ,\lambda_{1} ) \). By applying Newton’s strategy for finding the stationary point of \( \xi \) using the gradient (8.18) and the approximations (8.20) and (8.21), a learning rule for estimating the generalized eigen pair associated with the largest generalized eigenvalue was obtained as:

$$ \begin{aligned} \varvec{w}\,(k + 1) & = \varvec{w}(k) + \eta_{1} \left\{ {\varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w}(k)\lambda^{ - 1} (k)} \right. \\ & \quad \left. { - \varvec{w}^{H} (k)\varvec{R}_{\varvec{y}} \,\varvec{w}(k)\,\varvec{w}(k)\lambda^{ - 1} (k) - \frac{1}{2}\varvec{w}(k)[1 - \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \,\varvec{w}(k)]} \right\}, \\ \end{aligned} $$
(8.22)
$$ \lambda (k + 1) = \lambda (k) + \gamma_{1} \left[ {\varvec{w}^{H} (k + 1)\varvec{R}_{\varvec{y}} \varvec{w}(k + 1) - \varvec{w}^{H} (k + 1)\varvec{R}_{\varvec{x}} \varvec{w}(k + 1)\lambda (k)} \right], $$
(8.23)

and a learning rule for estimating the generalized eigen pair associated with the smallest generalized eigenvalue was obtained as:

$$ \begin{aligned} \varvec{w}\,(k + 1) & = \varvec{w}\,(k) + \eta_{2} \left\{ {\varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}\,(k)\lambda \,(k)} \right. + \varvec{w}^{H} (k)\varvec{R}_{\varvec{y}} \varvec{w}\,(k)\,\varvec{w}\,(k)\lambda^{ - 1} (k) \\ & \quad \quad \quad \quad \quad \quad \left. { - \frac{1}{2}\varvec{w}\,(k)\left[ {1 + 3\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{w}\,(k)} \right]} \right\}, \\ \end{aligned} $$
(8.24)
$$ \lambda (k + 1) = \lambda (k) + \gamma_{2} \left[ {\varvec{w}^{H} (k + 1)\varvec{R}_{\varvec{y}} \varvec{w}(k + 1) - \varvec{w}^{H} (k + 1)\varvec{R}_{\varvec{x}} \varvec{w}(k + 1)\lambda (k)} \right], $$
(8.25)

where \( \eta_{1} ,\gamma_{1} ,\eta_{2} ,\gamma_{2} > 0 \) are the step sizes, and \( [\varvec{w}(k),\lambda (k)] \) is the estimate at time k of the generalized eigen pair associated with the largest/smallest generalized eigenvalue. By introducing the normalization step in the above learning rules at each update, using the exponentially weighted sample covariance matrices \( \widehat{\varvec{R}}_{\varvec{y}} \) and \( \widehat{\varvec{R}}_{\varvec{x}} \) which are updated recursively, and using the Sherman–Morrison–Woodbury matrix-inversion lemma, Nguyen’s coupled generalized eigen pair extraction algorithm was obtained as follows.

Adaptive coupled generalized PCA algorithm:

$$ \begin{aligned} \tilde{\varvec{w}}(k) & = \varvec{w}(k - 1) + \frac{{\eta_{1} }}{{\lambda (k - 1 )}}\left( {\varvec{Q}_{\varvec{x}} (k)\widehat{\varvec{R}}_{\varvec{y}} (k)\varvec{w}(k - 1)} \right. \\ & \quad \left. { - \varvec{w}^{H} (k - 1)\widehat{\varvec{R}}_{\varvec{y}} (k)\varvec{w}(k - 1)\varvec{w}(k - 1)} \right), \\ \end{aligned} $$
(8.26)
$$ \varvec{w}(k) = \frac{{\tilde{\varvec{w}}(k)}}{{\left\| {\tilde{\varvec{w}}(k)} \right\|_{{\widehat{\varvec{R}}_{\varvec{x}} (k)}} }}, $$
(8.27)
$$ \lambda\,(k) = \left( {1 -\upgamma_{1} } \right)\lambda\,(k - 1) +\upgamma_{1} \varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{y}} (k)\varvec{w}(k), $$
(8.28)

and adaptive coupled generalized MCA algorithm:

$$ \begin{aligned} \tilde{\varvec{w}}(k) & = \varvec{w}(k - 1) + \eta_{2} \left( {\varvec{Q}_{\varvec{y}} (k)\widehat{\varvec{R}}_{\varvec{x}} (k)\varvec{w}\left( {k - 1} \right)} \right.\lambda\left( {k - 1} \right) \\ & \quad \left. { + \varvec{w}^{H} \left( {k - 1} \right)\widehat{\varvec{R}}_{\varvec{y}} (k)\,\varvec{w}\left( {k - 1} \right)\varvec{w}\left( {k - 1} \right)\lambda^{{\text{ - } 1}} \left( {k - 1} \right) - 2\varvec{w}\left( {k - 1} \right)} \right), \\ \end{aligned} $$
(8.29)
$$ \varvec{w}(k) = \frac{{\tilde{\varvec{w}}(k)}}{{\left\| {\tilde{\varvec{w}}(k)} \right\|_{{\widehat{\varvec{R}}_{\varvec{x}} (k)}} }}, $$
(8.30)
$$ \lambda(k) = \left( {1 -\upgamma_{2} } \right)\lambda\left( {k - 1} \right) +\upgamma_{2} \varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{y}} (k)\varvec{w}\,(k), $$
(8.31)

where \( \left\| \varvec{u} \right\|_{{\varvec{R}_{\varvec{x}} }} = \sqrt {\varvec{u}^{H} \varvec{R}_{\varvec{x}} \varvec{u}} \) is defined as the R x -norm, \( \varvec{Q}_{\varvec{x}} = \varvec{R}_{\varvec{x}}^{ - 1} \), \( \varvec{Q}_{\varvec{y}} = \varvec{R}_{\varvec{y}}^{ - 1} \), which are updated recursively as follows:

$$ \widehat{\varvec{R}}_{\varvec{y}} (k + 1) = \beta \widehat{\varvec{R}}_{\varvec{y}} (k) + \varvec{y}(k + 1)\varvec{y}^{H} (k + 1), $$
(8.32)
$$ \widehat{\varvec{R}}_{\varvec{x}} (k + 1) = \alpha \widehat{\varvec{R}}_{\varvec{x}} (k) + \varvec{x}(k + 1)\varvec{x}^{H} (k + 1), $$
(8.33)
$$ \varvec{Q}_{\varvec{x}} \left( {k + 1} \right){ = }\frac{1}{\alpha }\left( {\varvec{Q}_{\varvec{x}} (k) - \frac{{\varvec{Q}_{\varvec{x}} (k)\varvec{x}\left( {k + 1} \right)\varvec{x}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{x}} (k)}}{{\alpha + \varvec{x}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{x}} (k)\varvec{x}\left( {k + 1} \right)}}} \right), $$
(8.34)
$$ \varvec{Q}_{\varvec{y}} \left( {k + 1} \right){ = }\frac{1}{\beta }\left( {\varvec{Q}_{\varvec{y}} (k) - \frac{{\varvec{Q}_{\varvec{y}} (k)\varvec{y}\left( {k + 1} \right)\varvec{y}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{y}} (k)}}{{\beta + \varvec{y}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{y}} (k)\varvec{y}\left( {k + 1} \right)}}} \right), $$
(8.35)

where \( \alpha ,\beta \in (0,1) \) are the forgetting factors.
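A compact real-valued sketch of one iteration of the adaptive coupled GPCA recursion (8.26)–(8.28), together with the updates (8.32)–(8.34), is given below; the function interface, step sizes, and forgetting factors are illustrative choices, not values prescribed in [4].

```python
import numpy as np

def gpca_step(w, lam, Rb_y, Rb_x, Qx, y, x,
              eta1=0.5, gamma1=0.05, alpha=0.998, beta=0.998):
    """One iteration of the adaptive coupled GPCA recursion (8.26)-(8.28),
    with the covariance updates (8.32)-(8.34). Real-valued sketch."""
    # Exponentially weighted sample covariances, Eqs. (8.32)-(8.33)
    Rb_y = beta * Rb_y + np.outer(y, y)
    Rb_x = alpha * Rb_x + np.outer(x, x)
    # Sherman-Morrison update of Q_x = R_x^{-1}, Eq. (8.34)
    Qx_x = Qx @ x
    Qx = (Qx - np.outer(Qx_x, Qx_x) / (alpha + x @ Qx_x)) / alpha
    # Quasi-Newton step for the weight vector, Eq. (8.26)
    Ryw = Rb_y @ w
    w_tilde = w + (eta1 / lam) * (Qx @ Ryw - (w @ Ryw) * w)
    # Normalization in the R_x-norm, Eq. (8.27)
    w = w_tilde / np.sqrt(w_tilde @ Rb_x @ w_tilde)
    # Coupled eigenvalue update, Eq. (8.28)
    lam = (1 - gamma1) * lam + gamma1 * (w @ Rb_y @ w)
    return w, lam, Rb_y, Rb_x, Qx
```

One would typically initialize w with an R x -normalized random vector, λ(0) with a small positive value, and R̂ y , R̂ x , Q x with identity matrices; the GMCA recursion (8.29)–(8.31) follows the same pattern with Q y maintained via (8.35).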

Different from the analysis of Moller's coupled algorithm, the convergence analysis of Nguyen's algorithms in [4] was not conducted via the eigenvalues of the Jacobian matrix. Nguyen established a rigorous analysis of the DDT systems showing that, for a step size within a certain range, the algorithm converges to the orthogonal projection of the initial estimate onto the generalized eigen-subspace associated with the largest/smallest generalized eigenvalue.

Next, we analyze the convergence of Nguyen's algorithm via the eigenvalues of the Jacobian matrix.

By ignoring the normalization step (8.27), the differential equation form of GPCA algorithms (8.26) and (8.28) can be written as:

$$ \dot{\varvec{w}} =\lambda^{ - 1} \left( {\varvec{R}_{x}^{ - 1} \varvec{R}_{y} \varvec{w} - \varvec{w}^{H} \varvec{R}_{y} \varvec{ww}} \right), $$
(8.36)
$$ {\dot{\lambda }} = \varvec{w}^{H} \varvec{R}_{y} \varvec{w} -\lambda . $$
(8.37)

The Jacobian matrix at the stationary point (w 1, λ 1) is given as:

$$ \begin{aligned} \varvec{J}(\varvec{w}_{ 1} ,\lambda_{ 1} ) & = \left. {\left( {\begin{array}{*{20}c} {\frac{{\partial \dot{\varvec{w}}}}{{\partial \varvec{w}^{\text{T}} }}} & {\frac{{\partial \dot{\varvec{w}}}}{{\partial\lambda}}} \\ {\frac{{\partial {\dot{\lambda }}}}{{\partial \varvec{w}^{\text{T}} }}} & {\frac{{\partial {\dot{\lambda }}}}{{\partial\lambda}}} \\ \end{array} } \right)} \right|_{{(\varvec{w}_{ 1} ,\lambda_{ 1} )}} \\ & = \left( {\begin{array}{*{20}c} {\lambda_{1}^{ - 1} \varvec{R}_{x}^{ - 1} \varvec{R}_{y} - \varvec{I} - 2\varvec{w}_{1} \varvec{w}_{1}^{H} \varvec{R}_{x} } & {\mathbf{0}} \\ {2\lambda_{1} \varvec{w}_{1}^{H} \varvec{R}_{x} } & { - 1} \\ \end{array} } \right). \\ \end{aligned} $$
(8.38)

Let

$$ \varvec{P} = \left( {\begin{array}{*{20}c} \varvec{W} & {\mathbf{0}} \\ {{\mathbf{0}}^{\text{T}} } & 1 \\ \end{array} } \right). $$
(8.39)

Then, it can be easily seen that

$$ \varvec{P}^{{\text{ - } 1}} = \left( {\begin{array}{*{20}c} {\varvec{W}^{H} \varvec{R}_{x} } & {\mathbf{0}} \\ {{\mathbf{0}}^{\text{T}} } & 1 \\ \end{array} } \right). $$
(8.40)

Solving for the eigenvalues of J can then be simplified to solving for the eigenvalues of the similar matrix J * = P −1 JP. Then it holds that

$$ \varvec{J}^{*} (\varvec{w}_{ 1} ,\,\lambda_{ 1} ) = \left( {\begin{array}{*{20}c} {\lambda_{1}^{ - 1}\varvec{\varLambda}- \varvec{I} - 2\varvec{e}_{1} \varvec{e}_{1}^{H} } & {\mathbf{0}} \\ {2\lambda_{1} \varvec{e}_{1}^{H} } & { - 1} \\ \end{array} } \right). $$
(8.41)

The eigenvalues \( \alpha \) determined from \( { \det }\,(\varvec{J}^{*} - \alpha \varvec{I}) = 0 \) are

$$ \upalpha_{1} = - 2,\;\upalpha_{N + 1} = - 1,\;\upalpha_{j} = \frac{{\lambda_{j} }}{{\lambda_{1} }} - 1,\;j = 2, \ldots ,N. $$
(8.42)

Since stability requires \( \alpha < 0 \) and thus \( \lambda_{j} < \lambda_{1} ,\;j = 2,3, \ldots ,N \), it can be seen that only the principal eigenvector–eigenvalue pair is a stable stationary point, and all other stationary points are saddles or repellers. If we further assume that λ 1 ≫ λ j , then α j ≈ −1, \( j = 2,3, \ldots ,N \). That is to say, the eigen motion in all directions of the algorithm does not depend on the generalized eigenvalues of the matrix pencil. Thus, this algorithm does not have the speed stability problem. A similar analysis applies to the GMCA algorithm (8.29)–(8.31).

8.2.3 Coupled Singular Value Decomposition of a Cross-Covariance Matrix

In [9], a coupled online learning rule for the singular value decomposition (SVD) of a cross-covariance matrix was derived. In coupled SVD rules, the singular value is estimated alongside the singular vectors, and the effective learning rates for the singular vector rules are influenced by the singular value estimates [9]. In addition, a first-order approximation of Gram–Schmidt orthonormalization as decorrelation method for the estimation of multiple singular vectors and singular values was used. It has been shown that the coupled learning rules converge faster than Hebbian learning rules and that the first-order approximation of Gram–Schmidt orthonormalization produces more precise estimates and better orthonormality than the standard deflation method [9].

The neural network and its learning algorithm for the singular value decomposition of a cross-covariance matrix will be discussed in Chap. 9, in which the coupled online learning rules for the SVD of a cross-covariance matrix will be analyzed in detail.

8.3 Unified and Coupled Algorithm for Minor and Principal Eigen Pair Extraction

A coupled algorithm can mitigate the speed stability problem that exists in most noncoupled algorithms. Although unified algorithms and coupled algorithms have these advantages over single-purpose algorithms and noncoupled algorithms, respectively, only a few unified algorithms and coupled algorithms have been proposed. Moreover, to the best of the authors' knowledge, no algorithms that are both unified and coupled have been proposed. In this section, based on a novel information criterion, we propose two self-stabilizing algorithms which are both unified and coupled. In the derivation of our algorithms, it is easier to obtain the results compared with traditional methods, because there is no need to calculate the inverse of the Hessian matrix. Experimental results show that the proposed algorithms perform better than existing coupled algorithms and unified algorithms.

8.3.1 Coupled Dynamical System

The derivation of neural network learning rules often starts with an information criterion, e.g., the maximization of the variance of the projected data or the minimization of the reconstruction error [7]. However, as stated in [10], the freedom in choosing an information criterion is greater if Newton's method is applied, because the criterion only has to have stationary points at the desired solutions. Thus, in [2], Moller proposed a special criterion. Based on this criterion and by using Newton's method, Moller derived some CPCA learning rules and a CMCA learning rule. Based on another criterion, Hou [5] derived the same CPCA and CMCA learning rules as Moller's, and Appendix 2 of [5] showed that it is easier and clearer to approximate the inverse of the Hessian.

To start the analysis, we use the same information criterion as Hou’s, which is

$$ p = \varvec{w}^{\text{T}} \varvec{Cw} - \varvec{w}^{\text{T}} \varvec{w}\lambda + \lambda $$
(8.43)

where \( \varvec{C} = E\left\{ {\varvec{xx}^{\text{T}} } \right\} \in \Re^{n \times n} \) is the covariance matrix of the n-dimensional input data sequence x, and \( \varvec{w} \in \Re^{n \times 1} \) and \( \lambda \in \Re \) denote the estimates of an eigenvector (weight vector) and eigenvalue of C, respectively.

It is found that

$$ \frac{\partial p}{{\partial \varvec{w}}} = 2\varvec{Cw} - 2\lambda \varvec{w} $$
(8.44)
$$ \frac{\partial p}{\partial \lambda } = - \varvec{w}^{\text{T}} \varvec{w} + 1. $$
(8.45)

Thus, the stationary points \( (\bar{\varvec{w}},\bar{\lambda }) \) of (8.43) are defined by

$$ \left. {\frac{\partial p}{{\partial \varvec{w}}}} \right|_{{(\bar{\varvec{w}},\bar{\lambda })}} = {\mathbf{0}},\;\left. {\frac{\partial p}{\partial \lambda }} \right|_{{(\bar{\varvec{w}},\bar{\lambda })}} = 0. $$
(8.46)

Then, we can obtain

$$ {\varvec{C\bar{w}}} = \bar{\lambda }\bar{\varvec{w}}, $$
(8.47)
$$ \bar{\varvec{w}}^{\text{T}} \bar{\varvec{w}} = 1 $$
(8.48)

from which we can also conclude that \( \bar{\varvec{w}}^{\text{T}} {\varvec{C\bar{w}}} = \bar{\lambda } \). Thus, the criterion (8.43) fulfills the aforementioned requirement: The stationary points include all associated eigenvectors and eigenvalues of C. The Hessian of the criterion is given as:

$$ \varvec{H}\,(\varvec{w},\lambda ) = \left( {\begin{array}{*{20}c} {\frac{{\partial^{2} p}}{{\partial \varvec{w}^{2} }}} & {\frac{{\partial^{2} p}}{{\partial \varvec{w}\partial \lambda }}} \\ {\frac{{\partial^{2} p}}{{\partial \lambda \partial \varvec{w}}}} & {\frac{{\partial^{2} p}}{{\partial \lambda^{2} }}} \\ \end{array} } \right) = 2\left( {\begin{array}{*{20}c} {\varvec{C} - \lambda \varvec{I}} & { - \varvec{w}} \\ { - \varvec{w}^{\text{T}} } & 0 \\ \end{array} } \right). $$
(8.49)

Based on Newton's method, the equation used by Moller and Hou to derive the differential equations can be written as:

$$ \left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \varvec{H}^{ - 1} (\varvec{w},\lambda )\left( {\begin{array}{*{20}c} {\frac{\partial p}{{\partial \varvec{w}}}} \\ {\frac{\partial p}{\partial \lambda }} \\ \end{array} } \right). $$
(8.50)

Based on different information criteria, both Moller and Hou computed the inverse of their Hessian \( \varvec{H}^{ - 1} (\varvec{w},\lambda ) \). Although the inverse Hessians of Moller and Hou are different, they finally obtained the same CPCA and CMCA rules [5]. Here we propose to derive the differential equations with another technique, namely

$$ \varvec{H}(\varvec{w},\lambda )\left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \left( {\begin{array}{*{20}c} {\frac{\partial p}{{\partial \varvec{w}}}} \\ {\frac{\partial p}{\partial \lambda }} \\ \end{array} } \right). $$
(8.51)

In this case, there is no need to calculate the inverse Hessian. Substituting (8.44), (8.45), and (8.49) into (8.51) yields

$$ 2\left( {\begin{array}{*{20}c} {\varvec{C} - \lambda \varvec{I}} & { - \varvec{w}} \\ { - \varvec{w}^{\text{T}} } & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \left( {\begin{array}{*{20}c} {2\varvec{Cw} - 2\lambda \varvec{w}} \\ { - \varvec{w}^{\text{T}} \varvec{w} + 1} \\ \end{array} } \right). $$
(8.52)

Then we can get

$$ (\varvec{C} - \lambda \varvec{I})\dot{\varvec{w}} - \varvec{w}\dot{\lambda } = - (\varvec{C} - \lambda \varvec{I})\varvec{w} $$
(8.53)
$$ - 2\varvec{w}^{\text{T}} \dot{\varvec{w}} = \varvec{w}^{\text{T}} \varvec{w} - 1. $$
(8.54)

In the vicinity of the stationary point \( (\varvec{w}_{1} ,\lambda_{1} ) \), by approximating \( \varvec{w} \approx \varvec{w}_{1} ,\,\lambda \approx \lambda_{1} \ll \lambda_{j} \;(2 \le j \le n) \), and after some manipulations (see Appendix A in [6]), we get a coupled dynamical system as

$$ \dot{\varvec{w}} = \frac{{\varvec{C}^{ - 1} \varvec{w}(\varvec{w}^{\text{T}} \varvec{w} + 1)}}{{2\varvec{w}^{\text{T}} \varvec{C}^{ - 1} \varvec{w}}} - \varvec{w} $$
(8.55)
$$ \dot{\lambda } = \frac{{\varvec{w}^{\text{T}} \varvec{w} + 1}}{2}\left( {\frac{1}{{\varvec{w}^{\text{T}} \varvec{C}^{ - 1} \varvec{w}}} - \lambda } \right). $$
(8.56)
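As a side note on (8.51): in an implementation, the Newton direction \( (\dot{\varvec{w}},\dot{\lambda }) \) can be obtained from the linear system (8.52) by a direct solve, without forming the inverse Hessian explicitly. A small self-contained sketch (with arbitrary test data) is:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4
A = rng.standard_normal((n, n))
C = A @ A.T                        # a covariance-like matrix

w = rng.standard_normal(n)
lam = 1.0

# Gradient (8.44)-(8.45) and Hessian (8.49) of the criterion (8.43)
grad = np.concatenate([2.0 * C @ w - 2.0 * lam * w, [1.0 - w @ w]])
H = 2.0 * np.block([[C - lam * np.eye(n), -w[:, None]],
                    [-w[None, :],         np.zeros((1, 1))]])

# Newton direction from (8.51): solve H [dw; dlam] = -grad, no inverse formed
step = np.linalg.solve(H, -grad)
dw, dlam = step[:n], step[n]
print(dw, dlam)
```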

8.3.2 The Unified and Coupled Learning Algorithms

8.3.2.1 Coupled MCA Algorithms

The differential equations can be turned into the online form by informally approximating C = x(k)x T(k), where x(k) is a data vector drawn from the distribution. That is, the expression of the rules in online form can be approximated by slowly following w(k + 1) = f(x(k)x T(k); w(k)) for subsequent observations of x. Moller has pointed out [2] that this transition is infeasible if the equation contains C −1, because it is hard to replace the inverse matrix C −1 by an expression containing the input vector x. However, this problem can be solved in another way [11, 12], in which C −1 is updated as

$$ \widehat{\varvec{C}}^{ - 1} (k + 1) = \frac{k + 1}{k}\left[ {\widehat{\varvec{C}}^{ - 1} (k) - \frac{{\widehat{\varvec{C}}^{ - 1} (k)\,\varvec{x}(k + 1)\varvec{x}^{\text{T}} (k + 1)\,\widehat{\varvec{C}}^{ - 1} (k)}}{{k + \varvec{x}^{\text{T}} (k + 1)\,\widehat{\varvec{C}}^{ - 1} (k)\,\varvec{x}(k + 1)}}} \right] $$
(8.57)

where \( \widehat{\varvec{C}}^{ - 1} (k) \) starts with \( \widehat{\varvec{C}}^{ - 1} (0) = \varvec{I} \) and converges to \( \varvec{C}^{ - 1} \) as \( k \to \infty \). Then, the CMCA system (8.55)–(8.56) has the online form:

$$ \varvec{w}(k + 1) = \varvec{w}(k) + \gamma (k)\left\{ {\frac{{[\varvec{w}^{\text{T}} (k)\varvec{w}(k) + 1]\,\varvec{Q}(k)\varvec{w}(k)}}{{2\varvec{w}^{\text{T}} (k)\,\varvec{Q}(k)\varvec{w}(k)}} - \varvec{w}(k)} \right\} $$
(8.58)
$$ \lambda (k + 1) = \lambda (k) + \gamma (k)\frac{{\varvec{w}^{\text{T}} (k)\varvec{w}(k) + 1}}{2}\left[ {\frac{1}{{\varvec{w}^{\text{T}} (k)\varvec{Q}(k)\varvec{w}(k)}} - \lambda (k)} \right] $$
(8.59)
$$ \varvec{Q}(k + 1) = \frac{k + 1}{\alpha k}\left[ {\varvec{Q}(k) - \frac{{\varvec{Q}(k)\,\varvec{x}(k + 1)\,\varvec{x}^{\text{T}} (k + 1)\,\varvec{Q}(k)}}{{k + \varvec{x}^{\text{T}} (k + 1)\,\varvec{Q}(k)\,\varvec{x}(k + 1)}}} \right] $$
(8.60)

where \( 0 < \alpha \le 1 \) denotes the forgetting factor and \( \gamma (k) \) is the learning rate. If all training samples come from a stationary process, we choose \( \alpha = 1 \). \( \varvec{Q}(k) = \varvec{C}^{ - 1} (k) \) starts with \( \varvec{Q}(0) = \varvec{I} \). Here, we refer to the rule (8.55)–(8.56) and its online form (8.58)–(8.60) as "fMCA," where f means fast. In the rest of this section, the online form (which is used in the implementation) and the differential matrix form (which is used in the convergence analysis) of a rule have the same name, and we will not emphasize this again. If we further approximate w T w ≈ 1 (which is fulfilled in the vicinity of the stationary points) in (8.55)–(8.56), we can obtain a simplified CMCA system

$$ \dot{\varvec{w}} = \frac{{\varvec{C}^{ - 1} \varvec{w}}}{{2\varvec{w}^{\text{T}} \varvec{C}^{ - 1} \varvec{w}}} - \varvec{w} $$
(8.61)
$$ \dot{\lambda } = \frac{1}{{\varvec{w}^{\text{T}} \varvec{C}^{ - 1} \varvec{w}}} - \lambda $$
(8.62)

and the online form is given as:

$$ \varvec{w}(k + 1) = \varvec{w}(k) + \gamma (k)\left\{ {\frac{{\varvec{Q}(k)\,\varvec{w}(k)}}{{\varvec{w}^{\text{T}} (k)\,\varvec{Q}(k)\,\varvec{w}(k)}} - \varvec{w}(k)} \right\} $$
(8.63)
$$ \lambda (k + 1) = \lambda (k) + \gamma (k)\left[ {\frac{1}{{\varvec{w}^{\text{T}} (k)\,\varvec{Q}(k)\,\varvec{w}(k)}} - \lambda (k)} \right] $$
(8.64)

where Q(k) is updated by (8.60). In the following, we will refer to this algorithm as “aMCA,” where a means adaptive.
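A possible NumPy implementation of one aMCA step, Eqs. (8.63)–(8.64) with Q(k) maintained by (8.60), is sketched below; the fMCA form (8.58)–(8.59) differs only in the two update lines. The toy data model and the exponentially decaying learning rate mirror the experimental settings of Sect. 8.3.4 but are otherwise arbitrary.

```python
import numpy as np

def amca_step(w, lam, Q, x, k, gamma, alpha=1.0):
    """One aMCA update: Eqs. (8.63)-(8.64), with Q(k) = C^{-1}(k) from Eq. (8.60)."""
    # Recursive estimate of the inverse covariance matrix, Eq. (8.60)
    Qx = Q @ x
    Q = (k + 1) / (alpha * k) * (Q - np.outer(Qx, Qx) / (k + x @ Qx))
    # Coupled minor eigen pair update, Eqs. (8.63)-(8.64)
    Qw = Q @ w
    wQw = w @ Qw
    w = w + gamma * (Qw / wQw - w)
    lam = lam + gamma * (1.0 / wQw - lam)
    return w, lam, Q

# Toy usage: extract the minor eigen pair of C = A A^T from streaming samples.
rng = np.random.default_rng(3)
n, kmax = 10, 20000
A = rng.standard_normal((n, n))
w = rng.standard_normal(n); w /= np.linalg.norm(w)
lam, Q = 0.001, np.eye(n)
for k in range(1, kmax + 1):
    x = A @ rng.standard_normal(n)
    gamma = 1e-2 * (1e-2) ** (k / kmax)       # decays from 1e-2 toward 1e-4
    w, lam, Q = amca_step(w, lam, Q, x, k, gamma)
print(lam, np.linalg.eigvalsh(A @ A.T)[0])    # estimate vs smallest eigenvalue
```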

8.3.2.2 Coupled PCA Algorithms

It is known that in unified rules, MCA rules can be derived from PCA rules by changing the sign or using the inverse of the covariance matrix, and vice versa. Here we propose to derive unified algorithms by deriving CPCA rules from CMCA rules. Suppose that the covariance matrix C has an eigen pair \( (\varvec{w},\lambda ) \); then it holds that [13] \( \varvec{Cw} = \lambda \varvec{w} \) and \( \varvec{C}^{ - 1} \varvec{w} = \lambda^{ - 1} \varvec{w} \), which means that an eigenvector associated with the smallest eigenvalue of C is also an eigenvector associated with the largest eigenvalue of the inverse matrix \( \varvec{C}^{ - 1} \), and vice versa. Therefore, by replacing \( \varvec{C}^{ - 1} \) with C in the fMCA and aMCA rules, respectively, we obtain two modified rules to extract the principal eigen pair of C, which corresponds to the minor eigen pair of \( \varvec{C}^{ - 1} \). The modified rules are given as:

$$ \dot{\varvec{w}} = \frac{{\varvec{Cw}\,(\varvec{w}^{\text{T}} \varvec{w} + 1)}}{{2\varvec{w}^{\text{T}} \varvec{Cw}}} - \varvec{w} $$
(8.65)
$$ \dot{\lambda } = \frac{{\varvec{w}^{\text{T}} \varvec{w} + 1}}{2}\left( {\varvec{w}^{\text{T}} \varvec{Cw} - \lambda } \right) $$
(8.66)

and

$$ \dot{\varvec{w}} = \frac{{\varvec{Cw}}}{{2\varvec{w}^{\text{T}} \varvec{Cw}}} - \varvec{w} $$
(8.67)
$$ \dot{\lambda } = \varvec{w}^{\text{T}} \varvec{Cw} - \lambda . $$
(8.68)

Since the covariance matrix C is usually unknown in advance, we use its estimate at time k by \( \widehat{\varvec{C}}(k) \) suggested in [11], which is

$$ \widehat{\varvec{C}}(k + 1) = \alpha \frac{k}{k + 1}\widehat{\varvec{C}}(k) + \frac{1}{k + 1}\varvec{x}(k + 1)\varvec{x}^{\text{T}} (k + 1) $$
(8.69)

where \( \widehat{\varvec{C}}(k) \) starts with \( \widehat{\varvec{C}}(0) = \varvec{x}(0)\varvec{x}^{\text{T}} (0) \) (or I). Actually, (8.57) is obtained from (8.69) by using the SM-formula. Then, the online form of (8.65)–(8.66) and (8.67)–(8.68) is given as:

$$ \varvec{w}(k + 1) = \varvec{w}(k) + \gamma (k)\left\{ {\frac{{\left[ {\varvec{w}^{\text{T}} (k)\,\varvec{w}(k) + 1} \right]\widehat{\varvec{C}}(k)\varvec{w}(k)}}{{2\varvec{w}^{\text{T}} (k)\widehat{\varvec{C}}(k)\varvec{w}(k)}} - \varvec{w}(k)} \right\} $$
(8.70)
$$ \lambda (k + 1) = \lambda (k) + \gamma (k)\frac{{\varvec{w}^{\text{T}} (k)\varvec{w}(k) + 1}}{2}\left[ {\varvec{w}^{\text{T}} (k)\widehat{\varvec{C}}(k)\varvec{w}(k) - \lambda (k)} \right] $$
(8.71)

and

$$ \varvec{w}(k + 1) = \varvec{w}(k) + \gamma (k)\left\{ {\frac{{\widehat{\varvec{C}}(k)\,\varvec{w}(k)}}{{\varvec{w}^{\text{T}} (k)\,\widehat{\varvec{C}}(k)\,\varvec{w}(k)}} - \varvec{w}(k)} \right\} $$
(8.72)
$$ \lambda (k + 1) = \lambda (k) + \gamma (k)\left[ {\varvec{w}^{\text{T}} (k)\,\widehat{\varvec{C}}(k)\,\varvec{w}(k) - \lambda (k)} \right] $$
(8.73)

respectively. Here we name the algorithms deduced from fMCA and aMCA as "fPCA" and "aPCA," respectively. Finally, we obtain two unified and coupled algorithms: the first is fMCA + fPCA, and the second is aMCA + aPCA. These two unified algorithms are capable of both PCA and MCA by using either the covariance matrix or its inverse.
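As a sketch, one aPCA step ((8.72)–(8.73), with the covariance estimate updated by (8.69)) might be implemented as follows; the fPCA variant (8.70)–(8.71) differs only in the two update lines.

```python
import numpy as np

def apca_step(w, lam, C, x, k, gamma, alpha=1.0):
    """One aPCA update: Eqs. (8.72)-(8.73), with C(k) estimated by Eq. (8.69)."""
    # Recursive covariance estimate, Eq. (8.69)
    C = alpha * k / (k + 1) * C + np.outer(x, x) / (k + 1)
    # Coupled principal eigen pair update, Eqs. (8.72)-(8.73)
    Cw = C @ w
    wCw = w @ Cw
    w_new = w + gamma * (Cw / wCw - w)
    lam = lam + gamma * (wCw - lam)
    return w_new, lam, C

# C is typically initialized with x(0) x(0)^T or the identity, as noted above.
```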

8.3.2.3 Multiple Eigen Pairs Estimation

In some engineering practice, it is required to estimate the eigen-subspace or multiple eigen pairs. As introduced in [4], by using the nested orthogonal complement structure of the eigen-subspace, the problem of estimating the p(≤n)-dimensional principal/minor subspace can be reduced to the estimation of multiple principal/minor eigenvectors. The following shows how to estimate the remaining p − 1 principal/minor eigen pairs.

For the CMCA case, consider the following equations:

$$ \widehat{\varvec{C}}_{j} = \widehat{\varvec{C}}_{j - 1} + \eta \lambda_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} ,\;j = 2, \ldots ,p $$
(8.74)

where \( \widehat{\varvec{C}}_{1} = \widehat{\varvec{C}} \) and \( \eta \) is larger than the largest eigenvalue of \( \widehat{\varvec{C}} \), and \( (\varvec{w}_{j - 1} ,\lambda_{j - 1} ) \) is the (j − 1)th minor eigen pair of \( \widehat{\varvec{C}} \) that has been extracted. It is found that

$$ \begin{aligned} \widehat{\varvec{C}}_{j} \varvec{w}_{q} & = (\widehat{\varvec{C}}_{j - 1} + \eta \lambda_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} )\varvec{w}_{q} \\ & = (\widehat{\varvec{C}}_{1} + \eta \sum\limits_{r = 1}^{j - 1} {\lambda_{r} \varvec{w}_{r} \varvec{w}_{r}^{\text{T}} } )\varvec{w}_{q} \\ & = \widehat{\varvec{C}}_{1} \varvec{w}_{q} + \eta \sum\limits_{r = 1}^{j - 1} {\lambda_{r} \varvec{w}_{r} \varvec{w}_{r}^{\text{T}} } \varvec{w}_{q} \\ & = \left\{ {\begin{array}{*{20}l} {\widehat{\varvec{C}}_{1} \varvec{w}_{q} + \eta \lambda_{q} \varvec{w}_{q} = (1 + \eta )\lambda_{q} \varvec{w}_{q} } \hfill & {{\text{for}}\;q = 1, \ldots ,j - 1} \hfill \\ {\widehat{\varvec{C}}_{1} \varvec{w}_{q} = \lambda_{q} \varvec{w}_{q} } \hfill & {{\text{for}}\;q = j, \ldots ,p} \hfill \\ \end{array} } \right.. \\ \end{aligned} $$
(8.75)

Suppose that matrix \( \widehat{\varvec{C}}_{1} \) has eigenvectors \( \varvec{w}_{1} ,\varvec{w}_{2} , \ldots ,\varvec{w}_{n} \) corresponding to eigenvalues \( (0 < )\;\sigma_{1} < \sigma_{2} < \cdots < \sigma_{n} \), and then matrix \( \varvec{C}_{j} \) has eigenvectors \( \varvec{w}_{j} , \ldots ,\varvec{w}_{n} ,\;\varvec{w}_{1} , \ldots ,\varvec{w}_{j - 1} \) corresponding to eigenvalues \( (0 < )\sigma_{j} < \cdots < \sigma_{n} < (1 + \eta )\sigma_{1} < \cdots < (1 + \eta )\sigma_{j - 1} \). In this case, \( \sigma_{j} \) is the smallest eigenvalue of \( \varvec{C}_{j} \). Based on the SM-formula, we have

$$ \begin{aligned} \varvec{Q}_{j} & = \varvec{C}_{j}^{ - 1} = (\varvec{C}_{j - 1} + \eta \lambda_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} )^{ - 1} \\ & = \varvec{C}_{j - 1}^{ - 1} - \frac{{\eta \lambda_{j - 1} \varvec{C}_{j - 1}^{ - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \varvec{C}_{j - 1}^{ - 1} }}{{1 + \eta \lambda_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \widehat{\varvec{C}}_{j - 1}^{ - 1} \varvec{w}_{j - 1} }} \\ & = \varvec{Q}_{j - 1} - \frac{{\eta \lambda_{j - 1} \varvec{Q}_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \varvec{Q}_{j - 1} }}{{1 + \eta \lambda_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \varvec{Q}_{j - 1} \varvec{w}_{j - 1} }},\;j = 2, \ldots ,p. \\ \end{aligned} $$
(8.76)

Thus, by replacing \( \varvec{Q} \) with \( \varvec{Q}_{j} \) in (8.58)–(8.59) or (8.63)–(8.64), they can be used to estimate the jth minor eigen pair \( (\varvec{w}_{j} ,\lambda_{j} ) \) of \( \widehat{\varvec{C}} \).
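In code, the inflation step (8.74) acts on Q = Ĉ −1 through the Sherman–Morrison update (8.76), after which the same fMCA/aMCA recursion is rerun with Q j in place of Q. A hedged sketch (the extraction recursion itself is assumed available) is:

```python
import numpy as np

def inflate_Q(Q, w_prev, lam_prev, eta):
    """Sherman-Morrison update (8.76): obtain Q_j from Q_{j-1} after the
    (j-1)th minor eigen pair (w_prev, lam_prev) has been extracted."""
    Qw = Q @ w_prev
    return Q - eta * lam_prev * np.outer(Qw, Qw) / (1.0 + eta * lam_prev * (w_prev @ Qw))

# After (w1, lam1) has been extracted with fMCA/aMCA, choose eta larger than
# the largest eigenvalue of C and rerun the same recursion with
# Q2 = inflate_Q(Q, w1, lam1, eta) to obtain the second minor eigen pair.
```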

For the CPCA case, consider the following equations

$$ \varvec{C}_{j} = \varvec{C}_{j - 1} - \lambda_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} ,\;j = 2, \ldots ,p $$
(8.77)

where \( (\varvec{w}_{j - 1} ,\lambda_{j - 1} ) \) is the (j − 1)th principal eigen pair that has been extracted. It is found that

$$ \begin{aligned} \widehat{\varvec{C}}_{j} \varvec{w}_{q} & = (\widehat{\varvec{C}}_{j - 1} - \lambda_{j - 1} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} )\varvec{w}_{q} \\ & = (\widehat{\varvec{C}}_{1} - \sum\limits_{r = 1}^{j - 1} {\lambda_{r} \varvec{w}_{r} \varvec{w}_{r}^{\text{T}} } )\varvec{w}_{q} \\ & = \widehat{\varvec{C}}_{1} \varvec{w}_{q} - \sum\limits_{r = 1}^{j - 1} {\lambda_{r} \varvec{w}_{r} \varvec{w}_{r}^{\text{T}} } \varvec{w}_{q} \\ & = \left\{ {\begin{array}{*{20}l} 0 \hfill & {{\text{for}}\;q = 1, \ldots ,j - 1} \hfill \\ {\widehat{\varvec{C}}_{1} \varvec{w}_{q} = \lambda_{q} \varvec{w}_{q} } \hfill & {{\text{for}}\;q = j, \ldots ,p} \hfill \\ \end{array} } \right.. \\ \end{aligned} $$
(8.78)

Suppose that the matrix \( \widehat{\varvec{C}}_{1} \) has eigenvectors \( \varvec{w}_{1} ,\varvec{w}_{2} , \ldots ,\varvec{w}_{n} \) corresponding to eigenvalues \( \sigma_{1} > \sigma_{2} > \cdots > \sigma_{n} ( > 0) \), and then the matrix \( \varvec{C}_{j} \) has eigenvectors \( \varvec{w}_{j} , \ldots ,\varvec{w}_{n} ,\varvec{w}_{1} , \ldots ,\varvec{w}_{j - 1} \) corresponding to eigenvalues \( \sigma_{j} > \cdots > \sigma_{n} > \hat{\sigma }_{1} = \cdots = \hat{\sigma }_{j - 1} ( = 0) \). In this case, \( \sigma_{j} \) is the largest eigenvalue of \( \varvec{C}_{j} \). Thus, by replacing \( \widehat{\varvec{C}} \) with \( \widehat{\varvec{C}}_{j} \) in (8.70)–(8.71) or (8.72)–(8.73), they can be used to estimate the jth principal eigen pair \( (\varvec{w}_{j} ,\lambda_{j} ) \) of \( \widehat{\varvec{C}} \).
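The corresponding CPCA deflation (8.77) is even simpler. The sketch below only implements the deflation step; run_coupled_pca is a hypothetical placeholder for the fPCA/aPCA recursion above.

```python
import numpy as np

def deflate_C(C, w_prev, lam_prev):
    """Deflation step (8.77): remove an extracted principal eigen pair from C."""
    return C - lam_prev * np.outer(w_prev, w_prev)

# Sequential extraction of the first p principal eigen pairs of C_hat
# (run_coupled_pca is a hypothetical stand-in for the fPCA/aPCA recursion):
#
#   C_j = C_hat.copy()
#   for j in range(p):
#       w_j, lam_j = run_coupled_pca(C_j)
#       C_j = deflate_C(C_j, w_j, lam_j)
```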

8.3.3 Analysis of Convergence and Self-stabilizing Property

The major work of convergence analysis of coupled rules is to find the eigenvalues of the Jacobian

$$ \varvec{J}\,(\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\frac{{\partial \dot{\varvec{w}}}}{{\partial \varvec{w}^{\text{T}} }}} & {\frac{{\partial \dot{\varvec{w}}}}{\partial \lambda }} \\ {\frac{{\partial \dot{\lambda }}}{{\partial \varvec{w}^{\text{T}} }}} & {\frac{{\partial \dot{\lambda }}}{\partial \lambda }} \\ \end{array} } \right) $$
(8.79)

of the differential equations for a stationary point \( (\varvec{w}_{1} ,\lambda_{1} ) \). For fMCA rule, after some manipulations (see Appendix B in [6]), we get

$$ \varvec{J}_{fMCA} (\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\varvec{C}^{ - 1} \lambda_{1} - \varvec{I} - \varvec{w}_{1} \varvec{w}_{1}^{\text{T}} } & {\mathbf{0}} \\ { - 2\lambda_{1} \varvec{w}_{1}^{\text{T}} } & { - 1} \\ \end{array} } \right). $$
(8.80)

The Jacobian can be simplified by an orthogonal transformation with

$$ \varvec{U} = \left( {\begin{array}{*{20}c} {\overline{\varvec{W}} } & {\mathbf{0}} \\ {{\mathbf{0}}^{\text{T}} } & 1 \\ \end{array} } \right). $$
(8.81)

The transformed Jacobian \( \varvec{J}^{*} = \varvec{U}^{\text{T}} \varvec{JU} \) has the same eigenvalues as J. In the vicinity of a stationary point \( (\varvec{w}_{1} ,\lambda_{1} ) \), we approximate \( \overline{\varvec{W}}^{\text{T}} \varvec{w} \approx \varvec{e}_{1} \) and obtain

$$ \varvec{J}_{fMCA}^{*} (\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\bar{\varvec{\varLambda }}^{ - 1} \lambda_{1} - \varvec{I} - \varvec{e}_{1} \varvec{e}_{1}^{\text{T}} } & {\mathbf{0}} \\ { - 2\lambda_{1} \varvec{e}_{1}^{\text{T}} } & { - 1} \\ \end{array} } \right). $$
(8.82)

The eigenvalues \( \alpha \) of \( \varvec{J}^{*} \) are determined from \( { \det }(\varvec{J}^{*} - \alpha \varvec{I}) = 0 \), and are

$$ \alpha_{1} = \alpha_{n + 1} = - 1,\;\alpha_{j} = \frac{{\lambda_{1} }}{{\lambda_{j} }} - 1\mathop \approx \limits^{{\lambda_{1} \ll \lambda_{j} }} - 1,\;j = 2, \ldots ,n. $$
(8.83)

Since stability requires \( \alpha < 0 \) and thus \( \lambda_{1} < \lambda_{j} ,\;j = 2, \ldots ,n \), we find that only minor eigen pairs are stable stationary points, while all others are saddles or repellers. Moreover, if we further assume \( \lambda_{1} \ll \lambda_{j} , \) all eigenvalues are \( \alpha \approx - 1 \). Hence, the system converges with approximately equal speed in all its eigen directions, and this speed is largely independent of the eigenvalues \( \lambda_{j} \) of the covariance matrix [2]. That is to say, the speed stability problem does not exist in the fMCA algorithm.

Similarly, for aMCA rule, we analyze the stability by finding the eigenvalues of

$$ \varvec{J}_{aMCA}^{*} (\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\bar{\varvec{\varLambda }}^{ - 1} \lambda_{1} - \varvec{I} - 2\varvec{e}_{1} \varvec{e}_{1}^{\text{T}} } & {\mathbf{0}} \\ { - 2\lambda_{1} \varvec{e}_{1}^{\text{T}} } & { - 1} \\ \end{array} } \right) $$
(8.84)

which are

$$ \alpha_{1} = - 2,\;\alpha_{n + 1} = - 1,\;\alpha_{j} = \frac{{\lambda_{1} }}{{\lambda_{j} }} - 1,\;j = 2, \ldots ,n. $$
(8.85)

The situation of aMCA is similar to that of fMCA, and the only difference is that the first eigenvalue of Jacobian is \( \alpha_{1} = - 1 \) for fMCA and \( \alpha_{1} = - 2 \) for aMCA. Thus, the convergence speed of fMCA and aMCA is almost the same.

Similarly, the transformed Jacobian functions of fPCA and aPCA are given as:

$$ \varvec{J}_{fPCA}^{*} (\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\bar{\varvec{\varLambda }}^{ - 1} \lambda_{1} - \varvec{I} - \varvec{e}_{1} \varvec{e}_{1}^{\text{T}} } & {\mathbf{0}} \\ {2\lambda_{1} \varvec{e}_{1}^{\text{T}} } & { - 1} \\ \end{array} } \right) $$
(8.86)

and

$$ \varvec{J}_{aPCA}^{*} (\varvec{w}_{1} ,\lambda_{1} ) = \left( {\begin{array}{*{20}c} {\bar{\varvec{\varLambda }}^{ - 1} \lambda_{1} - \varvec{I} - 2\varvec{e}_{1} \varvec{e}_{1}^{\text{T}} } & {\mathbf{0}} \\ {2\lambda_{1} \varvec{e}_{1}^{\text{T}} } & { - 1} \\ \end{array} } \right) $$
(8.87)

respectively. And the eigenvalues of (8.86) and (8.87) are given as:

$$ \alpha_{1} = \alpha_{n + 1} = - 1,\;\alpha_{j} = \frac{{\lambda_{j} }}{{\lambda_{n} }} - 1\mathop \approx \limits^{{\lambda_{n} \gg \lambda_{j} }} - 1,\;j = 1, \ldots ,n - 1 $$
(8.88)
$$ \alpha_{1} = - 2,\;\alpha_{n + 1} = - 1,\;\alpha_{j} = \frac{{\lambda_{j} }}{{\lambda_{n} }} - 1\mathop \approx \limits^{{\lambda_{n} \gg \lambda_{j} }} - 1,\;j = 1, \ldots ,n - 1 $$
(8.89)

respectively. We can see that only principal eigen pairs are stable stationary points, while all others are saddles or repellers. If we further assume that the principal eigenvalue is much larger than the other eigenvalues \( \lambda_{j} \), then \( \alpha_{j} \approx - 1 \) for all j for both fPCA and aPCA.

The analysis of the self-stabilizing property of the proposed algorithms is omitted here. For details, see [6].

8.3.4 Simulation Experiments

In this section, we provide several experiments to illustrate the performance of the proposed algorithms in comparison with some well-known coupled algorithms and unified algorithms. Experiments 1 and 2 mainly show the stability of proposed CMCA and CPCA algorithms in comparison with existing CMCA and CPCA algorithms, respectively. In experiment 3, the self-stabilizing property of the proposed algorithm is shown. In experiment 4, we compare the performance of aMCA and aPCA with that of two unified algorithms. Experiments 5 and 6 illustrate some examples of practical applications.

In experiments 1–4, all algorithms are used to extract the minor or principal component from a high-dimensional input data sequence, which is generated from \( \varvec{x} = \varvec{B} \cdot \varvec{y}(t), \) where each column of \( \varvec{B} \in \Re^{30 \times 30} \) is Gaussian with variance 1/30, and \( \varvec{y}(t) \in \Re^{30 \times 1} \) is Gaussian and randomly generated.

In all experiments, to measure the estimation accuracy, we compute the norm of the eigenvector estimate (weight vector) \( \left\| {\varvec{w}(k)} \right\| \) and the projection \( \psi (k) \) of the weight vector onto the true eigenvector at each step:

$$ \psi (k) = \frac{{\left| {\varvec{w}^{\text{T}} (k)\varvec{w}_{1} } \right|}}{{\left\| {\varvec{w}(k)} \right\|}} $$

where w 1 is the true minor (for MCA) or principal (for PCA) eigenvector with unit length.
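For reference, the accuracy measure ψ(k) can be computed directly as in the following snippet, where w_true is assumed to be the unit-length true eigenvector.

```python
import numpy as np

def projection(w, w_true):
    """Accuracy measure psi(k) = |w(k)^T w_1| / ||w(k)||."""
    return abs(w @ w_true) / np.linalg.norm(w)
```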

Unless otherwise stated, we set the initial conditions of experiments 1–4 as follows: (1) The weight vector is initialized with a random vector of unit length. (2) The learning rate \( \gamma (k) \) starts at \( \gamma (0) = 10^{ - 2} \) and decays exponentially toward zero with a final value \( \gamma \,(k_{ \hbox{max} } ) = 10^{ - 4} \). (3) We set \( \alpha = 1 \) (if used) and \( \lambda \,(0) = 0.001 \) for all CMCA and CPCA algorithms.

In experiments 1 and 2, k max = 20,000 training steps are executed for all algorithms. In order to test the stability of the proposed algorithms, after 10,000 training steps we drastically change the input signals, so that the eigen information changes suddenly. All algorithms start to extract the new eigen pair from k = 10,001. The learning rate for nMCA is 10 times smaller than that for the others. Then, 20 Monte Carlo runs are executed for all experiments.

Figure 8.1 shows the time course of the projection of the minor weight vector. We can see that in all rules except mMCA the projection converges toward unity; thus, these weight vectors align with the true eigenvector. The convergence speed of mMCA is lower than that of the others, and the projection of mMCA cannot converge toward unity within 10,000 steps. We can also find that the convergence speeds of the fMCA and aMCA rules are similar and higher than those of the others. Moreover, at time step k = 10,001, where the input signals change suddenly, all algorithms start to extract the new eigen pair. Figure 8.2 shows the time course of the weight vector length. We can find that the vector length of nMCA converges to a nonunit length. The convergence speed and stability of fMCA and aMCA are better than those of the others. It can be seen that the convergence speed of aMCA is slightly higher than that of fMCA.

Fig. 8.1 Projection of weight vector onto the true minor eigenvector

Fig. 8.2 Weight vector length

Figure 8.3 shows the time course of the minor eigenvalue estimate. We can see that mMCA cannot extract the minor eigenvalue as effectively as the other algorithms after the input signals change. From Figs. 8.1 to 8.3, we can conclude that the performance of fMCA and aMCA is better than that of the other CMCA algorithms. Moreover, nMCA contains C and C −1 simultaneously in its equations, and we can prove that mMCA also has the speed stability problem even though it is a coupled rule. These may be the reasons why our algorithms perform better than nMCA and mMCA.

Fig. 8.3 Minor eigenvalue estimation

In experiment 2, we compare the performance of fPCA and aPCA with that of ALA and nPCA. The time courses of the projection and the length of the principal weight vector are shown in Figs. 8.4 and 8.5, and the principal eigenvalue estimate is shown in Fig. 8.6. In Fig. 8.5, the curves for fPCA and aPCA are shown in a subfigure because of their small amplitude. We can see that the convergence speed of fPCA and aPCA is similar to that of nPCA and ALA, but fPCA and aPCA have smaller fluctuations over time than nPCA and ALA. This is because in fPCA and aPCA the covariance matrix C is updated by (8.69), while in nPCA and ALA it is updated by C(k) = x(k)x T(k).

Fig. 8.4 Projection of weight vector onto the true principal eigenvector

Fig. 8.5 Weight vector length

Fig. 8.6 Principal eigenvalue estimation

Experiment 3 is used to test the self-stabilizing property of the proposed algorithms. Figure 8.7 shows the time course of the weight vector length estimates of fMCA, aMCA, fPCA, and aPCA, which are initialized with nonunit length. We can find that all algorithms converge to unit length rapidly, which shows the self-stabilizing property of the eigenvector estimates. The self-stabilizing property of the eigenvalue estimates is shown in Figs. 8.3 and 8.6. From the results of experiments 1–3, we can see that the performance of fMCA and fPCA is similar to that of aMCA and aPCA, respectively. Thus, in experiment 4 we only compare the performance of aMCA and aPCA with that of two unified algorithms proposed in recent years, i.e., (1) kMCA + kPCA [14], where k means this algorithm was proposed by Kong; (2) pMCA + pPCA [15], where p means this algorithm was proposed by Peng. The time courses of the projection of the weight vector onto the true principal/minor eigenvector and of the weight vector length are shown in Figs. 8.8 and 8.9, respectively. In Fig. 8.9, the first 1000 steps of aMCA and kMCA are shown in a subfigure. We can see that the proposed algorithms perform better than the existing unified algorithms.

Fig. 8.7 Weight vector length

Fig. 8.8 Projection of weight vector onto the true principal/minor eigenvector

Fig. 8.9 Weight vector length

In summary, we propose a novel method to derive neural network algorithms based on a special information criterion. We first obtain two CMCA algorithms based on the modified Newton's method. Then, two CPCA rules are obtained from the CMCA rules. In this way, two unified and coupled algorithms are obtained, which are capable of both PCA and MCA and can also mitigate the speed stability problem. The proposed algorithms converge faster and are more stable than existing algorithms. Moreover, all of the proposed algorithms are self-stabilizing.

8.4 Adaptive Coupled Generalized Eigen Pairs Extraction Algorithms

In [4], based on Moller's work, Nguyen developed two well-performing quasi-Newton-type algorithms to extract generalized eigen pairs. Actually, Nguyen's algorithms are generalizations of Moller's coupled learning algorithms. Using the DDT approach, Nguyen also reported an explicit convergence analysis for these learning rules, i.e., the region within which the initial estimate of the eigen pair must be chosen to guarantee convergence to the desired eigen pair. However, as stated in [4], the GMCA algorithm proposed in [4] may lose robustness when the smallest eigenvalue of the matrix pencil is far less than 1.

Motivated by the efficacy of the coupled learning rules in [2] and [4] for the HEP and GHEP, in this section we introduce novel coupled algorithms proposed by us to estimate generalized eigen pair information. Based on a novel generalized information criterion, we obtain an adaptive GMCA algorithm, as well as an adaptive GPCA algorithm obtained by modifying the GMCA algorithm. It is worth noting that the procedure for obtaining the algorithms in this section is easier than the existing methods, because it does not require calculating the inverse of the Hessian matrix when deriving the new algorithms. It can be seen that our algorithms do not involve the reciprocal of the estimated eigenvalue in their equations. Thus, they are numerically more robust than Nguyen's algorithms even when the smallest eigenvalue of the matrix pencil is far less than 1. Compared with Nguyen's algorithms, it is also much easier to choose the step size for online implementation of the algorithms.

8.4.1 A Coupled Generalized System for GMCA and GPCA

A. Generalized information criterion and coupled generalized system

Generally speaking, neural network model-based algorithms are often derived by optimizing some cost function or information criterion [2, 16]. As pointed out in [17], any criterion may be used if the maximum or minimum (possibly under a constraint) coincides with the desired principal or minor directions or subspace. In [2], Moller pointed out that the freedom in choosing an information criterion is greater if Newton’s method is applied; in that case, it suffices to find a criterion whose stationary points coincide with the desired solutions. Moller first proposed a special criterion that involves both eigenvector and eigenvalue estimates [2]. Based on Moller’s work, Nguyen [4] derived novel generalized eigen pair extraction algorithms by finding the stationary points of a generalized information criterion, which is a generalization of Moller’s criterion.

In this section, for a given matrix pencil (R y , R x ), we propose a generalized information criterion based on the criteria introduced in [2] and [4] as

$$ p(\varvec{w},\lambda ) = \varvec{w}^{H} \varvec{R}_{\varvec{y}} \varvec{w} - \lambda \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + \lambda . $$
(8.90)

We can see that

$$ \left( {\begin{array}{*{20}c} {\frac{\partial p}{\partial w}} \\ {\frac{\partial p}{\partial \lambda }} \\ \end{array} } \right) = \left( {\begin{array}{*{20}c} {2\varvec{R}_{\varvec{y}} \varvec{w} - 2\lambda \varvec{R}_{\varvec{x}} \varvec{w}} \\ { - \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + 1} \\ \end{array} } \right). $$
(8.91)

Thus, the stationary points \( (\bar{\varvec{w}},\bar{\lambda }) \) are defined by

$$ \left\{ {\begin{array}{*{20}c} {\varvec{R}_{\varvec{y}} \bar{\varvec{w}} = \bar{\lambda }\varvec{R}_{\varvec{x}} \bar{\varvec{w}}} \\ {\bar{\varvec{w}}^{H} \varvec{R}_{\varvec{x}} \bar{\varvec{w}} = 1} \\ \end{array} } \right., $$
(8.92)

from which we can conclude that \( \bar{\varvec{w}}^{H} \varvec{R}_{\varvec{y}} \bar{\varvec{w}} = \bar{\lambda }\bar{\varvec{w}}^{H} \varvec{R}_{\varvec{x}} \bar{\varvec{w}} = \bar{\lambda } \). These imply that a stationary point \( (\bar{\varvec{w}},\bar{\lambda }) \) of (8.90) is a generalized eigen pair of the matrix pencil (R y , R x ). The Hessian of the criterion is given as:

$$ \varvec{H}(\varvec{w},\lambda ) = \left( {\begin{array}{*{20}c} {\frac{{\partial^{2} p}}{{\partial \varvec{w}^{2} }}} & {\frac{{\partial^{2} p}}{{\partial \varvec{w}\partial \lambda }}} \\ {\frac{{\partial^{2} p}}{{\partial \lambda \partial \varvec{w}}}} & {\frac{{\partial^{2} p}}{{\partial \lambda^{2} }}} \\ \end{array} } \right) = 2\left( {\begin{array}{*{20}c} {\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} } & { - \varvec{R}_{\varvec{x}} \varvec{w}} \\ { - \varvec{w}^{H} \varvec{R}_{\varvec{x}} } & 0 \\ \end{array} } \right). $$
(8.93)
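Before applying Newton’s method, the stationary-point property (8.92) can be checked numerically. The following minimal sketch (written in Python with NumPy/SciPy, using a randomly generated symmetric positive definite pencil, which is an illustrative assumption and not part of the original derivation) evaluates the gradient (8.91) at a generalized eigen pair returned by a standard solver and confirms that it vanishes.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
N = 5
# Random symmetric positive definite pencil (R_y, R_x); purely illustrative.
A = rng.standard_normal((N, N)); Ry = A @ A.T + N * np.eye(N)
B = rng.standard_normal((N, N)); Rx = B @ B.T + N * np.eye(N)

# Generalized eigen pairs; eigh normalizes the eigenvectors so that V^T Rx V = I.
vals, V = eigh(Ry, Rx)
w, lam = V[:, 0], vals[0]              # smallest generalized eigen pair

# Gradient of p(w, lambda) from (8.91).
grad_w = 2 * Ry @ w - 2 * lam * Rx @ w
grad_lam = 1.0 - w @ Rx @ w

print(np.linalg.norm(grad_w), abs(grad_lam))   # both are ~1e-15
```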

After applying Newton’s method, the equation used to obtain the system can be written as:

$$ \left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \varvec{H}^{ - 1} (\varvec{w},\lambda )\left( {\begin{array}{*{20}c} {\frac{\partial p}{{\partial \varvec{w}}}} \\ {\frac{\partial p}{\partial \lambda }} \\ \end{array} } \right), $$
(8.94)

where \( \dot{\varvec{w}} \) and \( \dot{\lambda } \) are the derivatives of w and \( \lambda \) with respect to time t, respectively. Based on the above equation, the algorithms in [4] were obtained by computing the inverse of the Hessian \( \varvec{H}^{ - 1} (\varvec{w},\lambda ) \). Premultiplying both sides of the above equation by \( \varvec{H}(\varvec{w},\lambda ) \) yields

$$ \varvec{H}\,(\varvec{w},\lambda )\left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \left( {\begin{array}{*{20}c} {\frac{\partial p}{{\partial \varvec{w}}}} \\ {\frac{\partial p}{\partial \lambda }} \\ \end{array} } \right). $$
(8.95)

All the algorithms derived later in this section are built on the newly proposed Eq. (8.95). Substituting (8.91) and (8.93) into (8.95), we get

$$ 2\left( {\begin{array}{*{20}c} {\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} } & { - \varvec{R}_{\varvec{x}} \varvec{w}} \\ { - \varvec{w}^{H} \varvec{R}_{\varvec{x}} } & 0 \\ \end{array} } \right)\left( {\begin{array}{*{20}c} {\dot{\varvec{w}}} \\ {\dot{\lambda }} \\ \end{array} } \right) = - \left( {\begin{array}{*{20}c} {2\varvec{R}_{\varvec{y}} \varvec{w} - 2\lambda \varvec{R}_{\varvec{x}} \varvec{w}} \\ { - \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + 1} \\ \end{array} } \right). $$
(8.96)

From (8.96), we can get

$$ (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )\dot{\varvec{w}} - \varvec{R}_{\varvec{x}} \varvec{w}\dot{\lambda } = - (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )\varvec{w} $$
(8.97)
$$ - 2\varvec{w}^{H} \varvec{R}_{\varvec{x}} \dot{\varvec{w}} = \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} - 1. $$
(8.98)

Premultiplying both sides of (8.97) by \( (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \) gives the following:

$$ \dot{\varvec{w}} = (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}\dot{\lambda } - \varvec{w}. $$
(8.99)

Substituting (8.99) into (8.98), we have

$$ - 2\varvec{w}^{H} \varvec{R}_{\varvec{x}} \left( {(\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}\dot{\lambda } - \varvec{w}} \right) = \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} - 1. $$
(8.100)

Thus,

$$ \dot{\lambda } = \frac{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + 1}}{{2\varvec{w}^{H} \varvec{R}_{\varvec{x}} (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}. $$
(8.101)

Substituting (8.101) into (8.99), we get

$$ \dot{\varvec{w}} = \frac{{(\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}\,(\varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{w} + 1)}}{{2\varvec{w}^{H} \varvec{R}_{\varvec{x}} \,(\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \varvec{w}. $$
(8.102)

By approximating w H R x w = 1 in the vicinity of the stationary point (w 1, λ 1), we get a coupled generalized system as:

$$ \dot{\varvec{w}} = \frac{{(\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \varvec{w}, $$
(8.103)
$$ \dot{\lambda } = \frac{1}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \lambda . $$
(8.104)
B. Coupled generalized systems for GMCA and GPCA

Let \( \varvec{\varLambda} \) be a diagonal matrix containing all generalized eigenvalues of the matrix pencil (R y , R x ), i.e., \( \varvec{\varLambda}= {\text{diag}}\{ \lambda_{1} , \ldots ,\lambda_{N} \} \). Let \( \varvec{V} = [\varvec{v}_{1} , \ldots ,\varvec{v}_{N} ] \), where v 1, …, v N are the generalized eigenvectors associated with the generalized eigenvalues \( \lambda_{1} , \ldots ,\lambda_{N} \). It holds that \( \varvec{V}^{H} \varvec{R}_{\varvec{x}} \varvec{V} = \varvec{I},\;\varvec{V}^{H} \varvec{R}_{\varvec{y}} \varvec{V} =\varvec{\varLambda} \). Hence, \( \varvec{R}_{\varvec{x}} = (\varvec{V}^{H} )^{ - 1} \varvec{V}^{ - 1} \) and \( \varvec{R}_{\varvec{y}} = (\varvec{V}^{H} )^{ - 1} \varvec{\varLambda V}^{ - 1} \), and

$$ (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} = \varvec{V}(\varvec{\varLambda}- \lambda \varvec{I})^{ - 1} \varvec{V}^{H} . $$
(8.105)

If we consider \( \varvec{w} \approx \varvec{v}_{1} \) and \( \lambda \approx \lambda_{1} \ll \lambda_{j} (2 \le j \le N) \) in the vicinity of the stationary point \( (\varvec{w}_{1} ,\lambda_{1} ) \), then we have \( \lambda_{j} - \lambda \approx \lambda_{j} \). In that case, \( \varvec{V}^{H} \varvec{R}_{\varvec{x}} \varvec{w} \approx e_{1} = [1,0, \ldots ,0]^{H} \) and

$$ \begin{aligned}\varvec{\varLambda}- \lambda \varvec{I} & = {\text{diag}}\left\{ {\lambda_{1} - \lambda , \ldots ,\lambda_{N} - \lambda } \right\} \\ & \approx {\text{diag}}\left\{ {\lambda_{1} - \lambda ,\lambda_{2} , \ldots ,\lambda_{N} } \right\} \\ & =\varvec{\varLambda}- \lambda \varvec{e}_{1} \varvec{e}_{1}^{H} , \\ \end{aligned} $$
(8.106)

where diag{∙} is the diagonal function. Substituting (8.106) into (8.105), we get the following:

$$ \begin{aligned} (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} & = \varvec{V}(\varvec{\varLambda}- \lambda \varvec{I})^{ - 1} \varvec{V}^{H} \\ & \approx [(\varvec{V}^{H} )^{ - 1} (\varvec{\varLambda}- \lambda \varvec{e}_{1} \varvec{e}_{1}^{H} )\varvec{V}^{ - 1} ]^{ - 1} \\ & = [\varvec{R}_{\varvec{y}} - \lambda (\varvec{V}^{H} )^{ - 1} \varvec{e}_{1} \varvec{e}_{1}^{H} \varvec{V}^{ - 1} ]^{ - 1} \\ & \approx \left[ {\varvec{R}_{\varvec{y}} - \lambda (\varvec{V}^{H} )^{ - 1} (\varvec{V}^{H} \varvec{R}_{\varvec{x}} \varvec{w})(\varvec{V}^{H} \varvec{R}_{\varvec{x}} \varvec{w})^{H} \varvec{V}^{ - 1} } \right]^{ - 1} \\ & = \left[ {\varvec{R}_{\varvec{y}} - \lambda (\varvec{R}_{\varvec{x}} \varvec{w})(\varvec{R}_{\varvec{x}} \varvec{w})^{H} } \right]^{ - 1} . \\ \end{aligned} $$
(8.107)

It can be seen that

$$ \begin{aligned} & \left[ {\varvec{R}_{\varvec{y}} - \lambda_{1} (\varvec{R}_{\varvec{x}} \varvec{v}_{1} )(\varvec{R}_{\varvec{x}} \varvec{v}_{1} )^{H} } \right]\varvec{v}_{1} \\ & \quad = \varvec{R}_{\varvec{y}} \varvec{v}_{1} - (\lambda_{1} \varvec{R}_{\varvec{x}} \varvec{v}_{1} )(\varvec{v}_{1}^{H} \varvec{R}_{\varvec{x}} \varvec{v}_{1} ) = 0. \\ \end{aligned} $$
(8.108)

This holds since \( \varvec{R}_{\varvec{y}} \varvec{v}_{1} = \lambda_{1} \varvec{R}_{\varvec{x}} \varvec{v}_{1} \) and \( \varvec{v}_{1}^{H} \varvec{R}_{\varvec{x}} \varvec{v}_{1} = 1 \). This means that the matrix \( \varvec{R}_{\varvec{y}} - \lambda (\varvec{R}_{\varvec{x}} \varvec{w})(\varvec{R}_{\varvec{x}} \varvec{w})^{H} \) has an eigenvalue 0 associated with the eigenvector v 1. That is to say, the matrix \( \varvec{R}_{\varvec{y}} - \lambda (\varvec{R}_{\varvec{x}} \varvec{w})(\varvec{R}_{\varvec{x}} \varvec{w})^{H} \) is rank-deficient and hence cannot be inverted if \( (\varvec{w},\lambda ) = (\varvec{v}_{1} ,\lambda_{1} ) \). To address this issue, we add a penalty factor \( \varepsilon \approx 1 \) in (8.107), which yields the following:

$$ \begin{aligned} (\varvec{R}_{\varvec{y}} - \lambda \varvec{R}_{\varvec{x}} )^{ - 1} & \approx \left[ {\varvec{R}_{\varvec{y}} - \varepsilon \lambda (\varvec{R}_{\varvec{x}} \varvec{w})(\varvec{R}_{\varvec{x}} \varvec{w})^{H} } \right]^{ - 1} \\ & = \varvec{R}_{\varvec{y}}^{ - 1} + \frac{{\varepsilon \lambda \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{ww}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} }}{{1 - \varepsilon \lambda \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}, \\ \end{aligned} $$
(8.109)

The last step of (8.109) is obtained by using the SM-formula (Sherman–Morrison formula) [13]. Substituting (8.109) into (8.103), we get the following:

$$ \dot{\varvec{w}} = \frac{{\left( {\varvec{R}_{\varvec{y}}^{ - 1} + \frac{{\varepsilon \lambda \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{ww}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} }}{{1 - \varepsilon \lambda \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}} \right)\varvec{R}_{\varvec{x}} \varvec{w}}}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} \left( {\varvec{R}_{\varvec{y}}^{ - 1} + \frac{{\varepsilon \lambda \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{ww}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} }}{{1 - \varepsilon \lambda \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}} \right)\varvec{R}_{\varvec{x}} \varvec{w}}} - \varvec{w}. $$
(8.110)

Multiplying the numerator and denominator of (8.110) by \( 1 - \varepsilon \lambda \varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w} \) simultaneously, and after some manipulations, we get

$$ \dot{\varvec{w}} = \frac{{\varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \varvec{w}. $$
(8.111)

Similarly, substituting (8.109) into (8.104), we can get

$$ \dot{\lambda } = \frac{1}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \varepsilon \lambda . $$
(8.112)

It can be seen that the penalty factor \( \varepsilon \) is not strictly needed in the equations; in other words, we can set \( \varepsilon = 1 \) in the subsequent equations. Thus, we get the following:

$$ \dot{\lambda } = \frac{1}{{\varvec{w}^{H} \varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}}} - \lambda . $$
(8.113)

Thus, (8.111) and (8.113) are the coupled systems for the GMCA case.
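To illustrate the behaviour of the coupled GMCA system (8.111) and (8.113), a simple forward-Euler integration can be used. The sketch below reuses the randomly generated pencil assumption from the previous example; the step size and iteration count are arbitrary illustrative choices.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(1)
N = 5
A = rng.standard_normal((N, N)); Ry = A @ A.T + N * np.eye(N)
B = rng.standard_normal((N, N)); Rx = B @ B.T + N * np.eye(N)
Ry_inv = np.linalg.inv(Ry)

w = rng.standard_normal(N)
w /= np.sqrt(w @ Rx @ w)                   # R_x-normalized initial vector
lam, dt = 1.0, 0.1                         # initial eigenvalue estimate, Euler step

for _ in range(500):                       # forward Euler on (8.111) and (8.113)
    q = Ry_inv @ Rx @ w
    s = w @ Rx @ q                         # w^H R_x R_y^{-1} R_x w
    w = w + dt * (q / s - w)
    lam = lam + dt * (1.0 / s - lam)

vals, V = eigh(Ry, Rx)
print(lam, vals[0])                        # lambda approaches the smallest generalized eigenvalue
print(abs(w @ Rx @ V[:, 0]))               # |<w, v_1>_{R_x}| approaches 1
```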

It is known that the ith principal generalized eigenvector v i of the matrix pencil (R y , R x ) is also the ith minor generalized eigenvector of the matrix pencil (R x , R y ). Hence, the problem of extracting the principal generalized subspace of the matrix pencil (R y , R x ) is equivalent to that of extracting the minor generalized subspace of the matrix pencil (R x , R y ), and vice versa [4]. Therefore, by swapping R x with \( \varvec{R}_{\varvec{y}} \) and \( \varvec{R}_{\varvec{x}}^{ - 1} \) with \( \varvec{R}_{\varvec{y}}^{ - 1} \) in (8.111) and (8.113), we obtain a modified system

$$ \dot{\varvec{w}} = \frac{{\varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w}}}{{\varvec{w}^{H} \varvec{R}_{\varvec{y}} \varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w}}} - \varvec{w}, $$
(8.114)
$$ \dot{\lambda } = \varvec{w}^{H} \varvec{R}_{\varvec{y}} \varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w} - \lambda , $$
(8.115)

to extract the minor eigen pair of matrix pencil (R x , R y ) as well as the principal eigen pair of matrix pencil (R y , R x ).

As was pointed out in [4], by using the nested orthogonal complement structure of the generalized eigen-subspace, the problem of estimating the \( p\,( \le N) \)-dimensional minor/principal generalized subspace can be reduced to multiple GHEPs of estimating the generalized eigen pairs associated with the smallest/largest generalized eigenvalues of certain matrix pencils. In the following, we will show how to estimate the remaining p − 1 minor/principal eigen pairs. In the GMCA case, consider the following equations:

$$ \varvec{R}_{j} = \varvec{R}_{j - 1} + \rho \varvec{R}_{\varvec{x}} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \varvec{R}_{\varvec{y}} , $$
(8.116)
$$ \varvec{R}_{j}^{ - 1} = \varvec{R}_{j - 1}^{ - 1} - \frac{{\rho \varvec{R}_{j - 1}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{\text{T}} \varvec{R}_{\varvec{y}} \varvec{R}_{j - 1}^{ - 1} }}{{1 + \rho \varvec{w}_{j - 1}^{\text{T}} \varvec{R}_{\varvec{y}} \varvec{R}_{j - 1}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}_{j - 1} }}, $$
(8.117)

where \( j = 2, \ldots ,p,\,\rho \ge \lambda_{N} \,/\,\lambda_{1} ,\;\varvec{R}_{1} = \varvec{R}_{\varvec{y}} \) and \( \varvec{w}_{j - 1} = \varvec{v}_{j - 1} \) is the (j − 1)th minor generalized eigenvector extracted. It holds that

$$ \begin{aligned} \varvec{R}_{j} \varvec{v}_{q} & = (\varvec{R}_{\varvec{y}} + \rho \sum\nolimits_{i = 1}^{j - 1} {\varvec{R}_{\varvec{x}} \varvec{v}_{i} \varvec{v}_{i}^{\text{T}} \varvec{R}_{\varvec{y}} )\varvec{v}_{q} } \\ & = \varvec{R}_{\varvec{y}} \varvec{v}_{q} + \rho \sum\nolimits_{i = 1}^{j - 1} {\varvec{R}_{\varvec{x}} \varvec{v}_{i} \varvec{v}_{i}^{\text{T}} \varvec{R}_{\varvec{y}} \varvec{v}_{q} } \\ & = \lambda_{q} \varvec{R}_{\varvec{x}} \varvec{v}_{q} + \rho \lambda_{q} \sum\nolimits_{i = 1}^{j - 1} {\varvec{R}_{\varvec{x}} \varvec{v}_{i} \varvec{v}_{i}^{\text{T}} \varvec{R}_{\varvec{y}} \varvec{v}_{q} } \\ & = \left\{ {\begin{array}{*{20}l} {(1 + \rho )\lambda_{q} \varvec{R}_{\varvec{x}} \varvec{v}_{q} } \hfill & {{\text{for}}\;q = 1, \ldots ,j - 1} \hfill \\ {\lambda_{q} \varvec{R}_{\varvec{x}} \varvec{v}_{q} } \hfill & {{\text{for}}\;q = j, \ldots ,N} \hfill \\ \end{array} } \right.. \\ \end{aligned} $$
(8.118)

Thus, the matrix pencil (R j, R x ) has eigenvalues \( \lambda_{j} \le \cdots \le \lambda_{N} \le (1 + \rho )\lambda_{1} \le \cdots \le (1 + \rho )\lambda_{j - 1} \) associated with eigenvectors \( \varvec{v}_{j} , \ldots ,\varvec{v}_{N} ,\varvec{v}_{1} \ldots \varvec{v}_{j - 1} \). Equation (8.117) is obtained from (8.116) based on the SM-formula. That is to say, by replacing \( \varvec{R}_{\varvec{y}} \) with \( \varvec{R}_{j} \) and \( \varvec{R}_{\varvec{y}}^{ - 1} \) with \( \varvec{R}_{j}^{ - 1} \) in (8.111) and (8.113), we can estimate the jth minor generalized eigen pair \( (\varvec{v}_{j} ,\lambda_{j} ) \).

In the GPCA case, consider the following equation

$$ \varvec{R}_{j} = \varvec{R}_{j - 1} - \varvec{R}_{\varvec{x}} \varvec{w}_{j - 1} \varvec{w}_{j - 1}^{T} \varvec{R}_{\varvec{y}} , $$
(8.119)

where \( \varvec{R}_{1} = \varvec{R}_{\varvec{y}} \), and \( \varvec{w}_{j - 1} = \varvec{v}_{N - j + 1} \) is the (j − 1)th principal generalized eigenvector extracted. By replacing \( \varvec{R}_{\varvec{y}} \) with \( \varvec{R}_{j} \) in (8.114) and (8.115), we can estimate the jth principal generalized eigen pair \( (\varvec{v}_{N - j + 1} ,\lambda_{N - j + 1} ) \).
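The effect of the deflation steps (8.116)–(8.119) can be verified directly with a dense eigensolver: after one GMCA deflation the smallest generalized eigenvalue of the new pencil becomes λ 2, and after one GPCA deflation the largest becomes λ N−1. A minimal numerical check under the same random-pencil assumption as before:

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(2)
N = 5
A = rng.standard_normal((N, N)); Ry = A @ A.T + N * np.eye(N)
B = rng.standard_normal((N, N)); Rx = B @ B.T + N * np.eye(N)

vals, V = eigh(Ry, Rx)                 # ascending eigenvalues, V^T Rx V = I
v1, vN = V[:, 0], V[:, -1]
rho = vals[-1] / vals[0]               # rho >= lambda_N / lambda_1, as required below (8.117)

# GMCA deflation (8.116): lambda_1 is shifted to (1 + rho) * lambda_1.
R2 = Ry + rho * Rx @ np.outer(v1, v1) @ Ry
print(eigh(R2, Rx)[0][0], vals[1])     # smallest eigenvalue of (R_2, R_x) is now lambda_2

# GPCA deflation (8.119): the principal eigenvalue is removed (mapped to zero).
R2p = Ry - Rx @ np.outer(vN, vN) @ Ry
print(eigh(R2p, Rx)[0][-1], vals[-2])  # largest eigenvalue of (R_2, R_x) is now lambda_{N-1}
```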

8.4.2 Adaptive Implementation of Coupled Generalized Systems

In engineering practice, the matrices R y and R x are the covariance matrices of the random input sequences \( \left\{ {y(k)} \right\}_{k \in z} \) and \( \left\{ {x(k)} \right\}_{k \in z} \), respectively. Thus, the matrix pencil (R y , R x ) is usually unknown in advance and may even change slowly over time if the signals are nonstationary. In that case, R y and R x need to be estimated online. In this section, we propose to update R y and R x with:

$$ \widehat{\varvec{R}}_{\varvec{y}} (k + 1) = \beta \widehat{\varvec{R}}_{\varvec{y}} (k) + \varvec{y}(k + 1)\varvec{y}^{H} (k + 1), $$
(8.120)
$$ \widehat{\varvec{R}}_{\varvec{x}} (k + 1) = \alpha \widehat{\varvec{R}}_{\varvec{x}} (k) + \varvec{x}(k + 1)\varvec{x}^{H} (k + 1). $$
(8.121)

By using the SM-formula, \( \varvec{Q}_{\varvec{y}} (k) = \widehat{\varvec{R}}_{\varvec{y}}^{ - 1} (k) \) and \( \varvec{Q}_{\varvec{x}} (k) = \widehat{\varvec{R}}_{\varvec{x}}^{ - 1} (k) \) can be updated as:

$$ \varvec{Q}_{\varvec{y}} \left( {k + 1} \right) = \frac{1}{\beta }\left( {\varvec{Q}_{\varvec{y}} (k) - \frac{{\varvec{Q}_{\varvec{y}} (k)\varvec{y}\left( {k + 1} \right)\varvec{y}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{y}} (k)}}{{\beta + \varvec{y}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{y}} (k)\varvec{y}\left( {k + 1} \right)}}} \right), $$
(8.122)
$$ \varvec{Q}_{\varvec{x}} \left( {k + 1} \right){ = }\frac{1}{\alpha }\left( {\varvec{Q}_{\varvec{x}} (k) - \frac{{\varvec{Q}_{\varvec{x}} (k)\varvec{x}\left( {k + 1} \right)\varvec{x}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{x}} (k)}}{{\alpha + \varvec{x}^{H} \left( {k + 1} \right)\varvec{Q}_{\varvec{x}} (k)\varvec{x}\left( {k + 1} \right)}}} \right). $$
(8.123)
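The recursions (8.122) and (8.123) are direct applications of the SM-formula to the rank-one updates (8.120) and (8.121), so no explicit matrix inversion is required at run time. A one-step numerical check (β and the data are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(3)
N, beta = 4, 0.99
A = rng.standard_normal((N, N))
R_hat = A @ A.T + np.eye(N)               # current \hat{R}_y(k)
Q = np.linalg.inv(R_hat)                  # current Q_y(k)
y = rng.standard_normal(N)                # new sample y(k+1)

# Direct update (8.120) followed by an explicit inversion.
Q_direct = np.linalg.inv(beta * R_hat + np.outer(y, y))

# Sherman-Morrison update (8.122): only matrix-vector products.
Qy = Q @ y
Q_sm = (Q - np.outer(Qy, Qy) / (beta + y @ Qy)) / beta

print(np.max(np.abs(Q_sm - Q_direct)))    # ~1e-15: the two updates coincide
```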

It is known that

$$ \mathop { \lim }\limits_{k \to \infty } \frac{1}{k}\widehat{\varvec{R}}_{\varvec{y}} (k) = \varvec{R}_{\varvec{y}} $$
(8.124)
$$ \mathop { \lim }\limits_{k \to \infty } \frac{1}{k}\widehat{\varvec{R}}_{\varvec{x}} (k) = \varvec{R}_{\varvec{x}} $$
(8.125)

when α = β = 1. By replacing \( \varvec{R}_{\varvec{y}} ,\varvec{R}_{\varvec{x}} ,\varvec{R}_{\varvec{y}}^{ - 1} \) and \( \varvec{R}_{\varvec{x}}^{ - 1} \) in (8.111)–(8.115) with \( \widehat{\varvec{R}}_{\varvec{y}} (k),\,\widehat{\varvec{R}}_{\varvec{x}} (k),\,\varvec{Q}_{\varvec{y}} (k) \) and \( \varvec{Q}_{\varvec{x}} (k) \), respectively, we can easily obtain the online GMCA algorithm with normalized step as:

$$ \tilde{\varvec{w}}\,(k + 1) = \eta_{1} \frac{{\varvec{Q}_{\varvec{y}} \left( {k + 1} \right)\widehat{\varvec{R}}_{\varvec{x}} \left( {k + 1} \right)\varvec{w}\,(k)}}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{x}} \left( {k + 1} \right)\varvec{Q}_{\varvec{y}} \left( {k + 1} \right)\widehat{\varvec{R}}_{\varvec{x}} \left( {k + 1} \right)\varvec{w}\,(k)}} + (1 - \eta_{1} )\varvec{w}\,(k), $$
(8.126)
$$ \varvec{w}\left( {k + 1} \right) = \frac{{\tilde{\varvec{w}}\left( {k + 1} \right)}}{{\left\| {\tilde{\varvec{w}}\left( {k + 1} \right)} \right\|_{{\widehat{\varvec{R}}_{\varvec{x}} \left( {k + 1} \right)}} }}, $$
(8.127)
$$ \lambda\left( {k + 1} \right) = \gamma_{1} \frac{1}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{x}} (k + 1)\varvec{Q}_{\varvec{y}} (k + 1)\widehat{\varvec{R}}_{\varvec{x}} (k + 1)\varvec{w}(k)}} + (1 - \gamma_{1} ){\lambda (}k\text{)}, $$
(8.128)

and the online GPCA algorithm with normalized step as:

$$ \tilde{\varvec{w}}(k + 1) = \eta_{2} \frac{{\varvec{Q}_{\varvec{x}} (k + 1)\widehat{\varvec{R}}_{\varvec{y}} (k + 1)\varvec{w}(k)}}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{y}} (k + 1)\varvec{Q}_{\varvec{x}} (k + 1)\widehat{\varvec{R}}_{\varvec{y}} (k + 1)\varvec{w}(k)}} + (1 - \eta_{2} )\varvec{w}(k), $$
(8.129)
$$ \varvec{w}(k + 1) = \frac{{\tilde{\varvec{w}}(k + 1)}}{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\widehat{\varvec{R}}_{y} (k + 1)}} }}, $$
(8.130)
$$ \lambda\left( {k + 1} \right) = \gamma_{2} \varvec{w}^{H} (k)\,\widehat{\varvec{R}}_{\varvec{y}} \left( {k + 1} \right)\varvec{Q}_{\varvec{x}} \left( {k + 1} \right)\widehat{\varvec{R}}_{\varvec{y}} \left( {k + 1} \right)\varvec{w}\,(k) + (1 - \gamma_{2} )\lambda(k), $$
(8.131)

where η 1, η 2, γ 1, γ 2∈ (0, 1] are the step sizes.

In the rest of this section, for convenience, we refer to the GPCA and GMCA algorithms proposed in [4] as nGPCA and nGMCA for short, respectively, where n means that these algorithms were proposed by Nguyen. Similarly, we refer to the algorithm in (8.126)–(8.128) as fGMCA and the algorithm in (8.129)–(8.131) as fGPCA for short.
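For concreteness, a compact sketch of the online fGMCA recursion (8.120)–(8.128) is given below. The stream interface, the default forgetting factors, and the step sizes are illustrative assumptions rather than prescribed values; the R x -norm in (8.127) is computed explicitly here.

```python
import numpy as np

def fgmca(x_stream, y_stream, N, eta1=0.5, gamma1=0.5, alpha=1.0, beta=1.0):
    """Online fGMCA sketch: track the minor generalized eigen pair of (R_y, R_x)."""
    Rx_hat = np.eye(N)                   # \hat{R}_x(0)
    Qy = np.eye(N)                       # Q_y(0) = \hat{R}_y^{-1}(0)
    w = np.eye(N)[:, 0]                  # w(0) = e_1
    lam = 1.0
    for x, y in zip(x_stream, y_stream):
        # Covariance and inverse-covariance updates (8.121) and (8.122).
        Rx_hat = alpha * Rx_hat + np.outer(x, x)
        Qy_y = Qy @ y
        Qy = (Qy - np.outer(Qy_y, Qy_y) / (beta + y @ Qy_y)) / beta
        # Weight and eigenvalue updates (8.126)-(8.128).
        Rx_w = Rx_hat @ w
        q = Qy @ Rx_w
        denom = Rx_w @ q                 # w^H \hat{R}_x Q_y \hat{R}_x w
        w_tilde = eta1 * q / denom + (1 - eta1) * w
        w = w_tilde / np.sqrt(w_tilde @ Rx_hat @ w_tilde)   # R_x-normalization (8.127)
        lam = gamma1 / denom + (1 - gamma1) * lam
    return w, lam
```

The fGPCA recursion (8.129)–(8.131) is obtained in the same way by exchanging the roles of the two sequences and normalizing with the R y -norm.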

At the end of this section, we discuss the computational complexity of our algorithms. Taking fGMCA as an example, the computation of \( \widehat{\varvec{R}}_{\varvec{x}} (k) \) and \( \varvec{Q}_{\varvec{y}} (k) \) requires 5N 2 + O(N) multiplications. Moreover, by using (8.121), we have the following:

$$ \begin{aligned} & \widehat{\varvec{R}}_{\varvec{x}} (k + 1)\varvec{w}(k) \\ & \quad = \left[ {\frac{k}{k + 1}\widehat{\varvec{R}}_{\varvec{x}} (k) + \frac{1}{k + 1}\varvec{x}(k + 1)\varvec{x}^{H} (k + 1)} \right]\varvec{w}(k) \\ & \quad = \frac{k}{k + 1}\widehat{\varvec{R}}_{\varvec{x}} (k)\varvec{w}(k) + \frac{1}{k + 1}\varvec{x}(k + 1)[\varvec{x}^{H} (k + 1)\varvec{w}(k)], \\ \end{aligned} $$
(8.132)

where

$$ \widehat{\varvec{R}}_{\varvec{x}} (k)\varvec{w}(k) = \frac{{\widehat{\varvec{R}}_{\varvec{x}} (k)\tilde{\varvec{w}}(k)}}{{\sqrt {\tilde{\varvec{w}}(k)^{H} \widehat{\varvec{R}}_{\varvec{x}} (k)\tilde{\varvec{w}}(k)} }}. $$
(8.133)

Since \( \widehat{\varvec{R}}_{\varvec{x}} (k)\tilde{\varvec{w}}(k) \) has been computed at the previous step when calculating the R x -norm of w(k), the update of \( \widehat{\varvec{R}}_{\varvec{x}} (k + 1)\varvec{w}\,(k) \) requires only O(N) multiplications. Thus, the updates of w(k) and λ(k) in fGMCA requires 2N 2 + O(N) multiplications. Hence, fGMCA requires a total of 7N 2 +O(N) multiplications at each iteration. In a similar way, we can see that fGPCA also requires a total of 7N 2 + O(N) multiplications at each iteration. Thus, the computational complexity of both fGMCA and fGPCA is less than that of nGMCA and nGPCA (i.e., 10N 2 + O(N)).
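The saving described above comes from reusing the product \( \widehat{\varvec{R}}_{\varvec{x}} (k)\tilde{\varvec{w}}(k) \), which is already available from the normalization step. A minimal sketch of this O(N) update, following the sample-average form used in (8.132) (the function name and interface are illustrative):

```python
import numpy as np

def update_Rx_w(Rx_w_tilde, w_tilde, x_new, k):
    """Return \hat{R}_x(k+1) w(k) in O(N), given Rx_w_tilde = \hat{R}_x(k) \tilde{w}(k)."""
    norm = np.sqrt(w_tilde @ Rx_w_tilde)       # R_x-norm of \tilde{w}(k)
    Rx_w = Rx_w_tilde / norm                   # \hat{R}_x(k) w(k), Eq. (8.133)
    w = w_tilde / norm
    # Eq. (8.132): only vector operations, no N x N matrix-vector product.
    return k / (k + 1) * Rx_w + (x_new @ w) / (k + 1) * x_new
```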

8.4.3 Convergence Analysis

The convergence of neural network learning algorithms is a difficult topic for direct study and analysis. As pointed out in [18], from the application point of view, the DDT method is more reasonable for studying the convergence of algorithms than the traditional method. Using the DDT approach, Nguyen first reported an explicit convergence analysis of coupled generalized eigen pair extraction algorithms [4]. In this section, we also analyze the convergence of our algorithms with the DDT approach, on the basis of [4].

The DDT system of fGMCA is given as:

$$ \tilde{\varvec{w}}\left( {k + 1} \right) = \varvec{w}(k) + \eta_{1} \left[ {\frac{{\varvec{Q}_{\varvec{y}} \widehat{\varvec{R}}_{\varvec{x}} \varvec{w}(k)}}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{x}} \varvec{Q}_{\varvec{y}} \widehat{\varvec{R}}_{\varvec{x}} \varvec{w}(k)}} - \varvec{w}(k)} \right], $$
(8.134)
$$ \varvec{w}(k + 1) = \frac{{\tilde{\varvec{w}}(k + 1)}}{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\varvec{R}_{\varvec{x}} }} }}, $$
(8.135)
$$ \lambda (k + 1) = \lambda (k) + \gamma_{1} \left[ {\frac{1}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{x}} \varvec{Q}_{\varvec{y}} \widehat{\varvec{R}}_{\varvec{x}} \varvec{w}(k)}} - \lambda (k)} \right]. $$
(8.136)

This system is referred to as DDT System 1.

And the DDT system of fGPCA is given as:

$$ \tilde{\varvec{w}}(k + 1) = \varvec{w}(k) + \eta_{2} \left[ {\frac{{\varvec{Q}_{\varvec{x}} \widehat{\varvec{R}}_{\varvec{y}} \varvec{w}(k)}}{{\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{y}} \varvec{Q}_{\varvec{x}} \widehat{\varvec{R}}_{\varvec{y}} \varvec{w}(k)}} - \varvec{w}(k)} \right], $$
(8.137)
$$ \varvec{w}(k + 1) = \frac{{\tilde{\varvec{w}}(k + 1)}}{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\varvec{R}_{{\varvec{y}}} }} }}, $$
(8.138)
$$ \lambda(k + 1) =\lambda(k) + \gamma_{2} [\varvec{w}^{H} (k)\widehat{\varvec{R}}_{\varvec{y}} \varvec{Q}_{\varvec{x}} \widehat{\varvec{R}}_{\varvec{y}} \varvec{w}(k) -\lambda(k)]. $$
(8.139)

This system is referred to as DDT System 2.

Similar to [4], we denote by \( \left\| \varvec{u} \right\|_{\varvec{R}} \,= \sqrt {\varvec{u}^{H} \varvec{Ru}} \) the R-norm of a vector \( \varvec{u} \in C^{N} \), where \( \varvec{R} \in C^{N \times N} \). \( \varvec{P}_{\varvec{V}}^{\varvec{R}} (\varvec{u}) \in \varvec{V} \) denotes the R-orthogonal projection of u onto a subspace \( \varvec{V} \subset C^{N} \), i.e., \( \varvec{P}_{\varvec{V}}^{\varvec{R}} (\varvec{u}) \) is the unique vector satisfying \( \left\| {\varvec{u} - \varvec{P}_{\varvec{V}}^{\varvec{R}} (\varvec{u})} \right\|_{\varvec{R}} \,= { \hbox{min} }_{{\varvec{v} \in \varvec{V}}} \left\| {\varvec{u} - \varvec{v}} \right\|_{\varvec{R}} \). \( \varvec{V}_{{\lambda_{i} }} \) denotes the generalized eigen-subspace associated with the ith smallest generalized eigenvalue \( \lambda_{i} \), i.e., \( \varvec{V}_{{\lambda_{i} }} = \left\{ {\varvec{v} \in C^{N} |\varvec{R}_{\varvec{y}} \varvec{v} = \lambda_{i} \varvec{R}_{\varvec{x}} \varvec{v}} \right\}\;(i = 1,2, \ldots ,N) \) (note that \( \varvec{V}_{{\lambda_{i} }} = \varvec{V}_{{\lambda_{j} }} \) if \( \lambda_{i} = \lambda_{j} \) for some \( i \ne j \)). Finally, \( \varvec{V}_{{ < \varvec{R} > }}^{ \bot } \) denotes the R-orthogonal complement subspace of V for any subspace \( \varvec{V} \subset C^{N} \), i.e., \( \varvec{V}_{{ < \varvec{R} > }}^{ \bot } = \left\{ {\varvec{u} \in C^{N} \,|\, < \varvec{u},\varvec{v} >_{\varvec{R}} = \varvec{v}^{H} \varvec{Ru} = 0,\;\forall \varvec{v} \in \varvec{V}} \right\} \).
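The R-norm and the R x -orthogonal projection onto \( \varvec{V}_{{\lambda_{1} }} \) that appear in the statements below can be computed directly once the pencil is known. A small helper sketch (the tolerance used to group equal eigenvalues is an arbitrary choice):

```python
import numpy as np
from scipy.linalg import eigh

def r_norm(u, R):
    """R-norm of u: sqrt(u^H R u)."""
    return np.sqrt(np.real(u.conj() @ R @ u))

def project_onto_minor_subspace(u, Ry, Rx, tol=1e-8):
    """R_x-orthogonal projection of u onto V_{lambda_1}, the generalized
    eigen-subspace of the smallest eigenvalue of the pencil (Ry, Rx)."""
    vals, V = eigh(Ry, Rx)                       # columns are R_x-orthonormal
    basis = V[:, np.abs(vals - vals[0]) < tol]   # basis of V_{lambda_1}
    coeff = basis.conj().T @ Rx @ u              # <u, v_i>_{R_x}
    return basis @ coeff
```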

Next, we will present two theorems to show the convergence of our algorithms. In the following, two cases will be considered. In Case 1, λ 1 = λ 2 = ··· = λ N and in Case 2, λ 1 < λ N .

Theorem 8.1 (Convergence analysis of fGMCA)

Suppose that the sequence \( [\varvec{w}(k),\lambda (k)]_{k = 0}^{\infty } \) is generated by DDT System 1 with any \( \eta_{1} ,\gamma_{1} \in (0,1] \) , any initial R x -normalized vector \( \varvec{w}(0) \notin (\varvec{V}_{{\lambda_{1} }} )_{{ < \varvec{R}_{\varvec{x}} > }}^{ \bot } \) , and any λ(0) > 0. Then for Case 1, it holds that w (k) = w (0) for all k ≥ 0, which is also a generalized eigenvector associated with the generalized eigenvalue λ 1 of the matrix pencil ( R y , R x ), and \( \mathop { \lim }\limits_{k \to \infty } \lambda (k) = \lambda_{1} \) . For Case 2, it holds that

$$ \mathop { \lim }\limits_{k \to \infty } \varvec{w}\,(k) = \frac{{\varvec{P}_{{{\text{V}}_{{\lambda_{1} }} }}^{{\varvec{R}_{\varvec{x}} }} \left[ {\varvec{w}(0)} \right]}}{{\left\| {\varvec{P}_{{{\text{V}}_{{\lambda_{1} }} }}^{{\varvec{R}_{\varvec{x}} }} \left[ {\varvec{w}(0)} \right]} \right\|_{{\varvec{R}_{\varvec{x}} }} }}, $$
(8.140)
$$ \mathop { \lim }\limits_{k \to \infty } \,\lambda\,(k) =\lambda_{1} . $$
(8.141)

Proof

Case 1:

Since λ 1 = λ 2 = ··· = λ N ensures \( \varvec{V}_{{\lambda_{1} }} = C^{N} \), we can verify for all k ≥ 0 that \( \varvec{w}(k) = \varvec{w}(0) \ne {\mathbf{0}} \), which is also a generalized eigenvector associated with the generalized eigenvalue λ 1 of the matrix pencil (R y , R x ). Moreover, from (8.136) we have λ(k + 1) = (1 − γ 1)λ(k) + γ 1 λ 1 for all k ≥ 0. Hence

$$ \begin{aligned} \lambda (k + 1) & = (1 - \gamma_{1} )\lambda (k) + \gamma_{1} \lambda_{1} = \cdots \\ & = (1 - \gamma_{1} )^{k + 1} \lambda (0) + \gamma_{1} \lambda_{1} [1 + (1 - \gamma_{1} ) + \cdots + (1 - \gamma_{1} )^{k} ] \\ & = (1 - \gamma_{1} )^{k + 1} \lambda (0) + \lambda_{1} [1 - (1 - \gamma_{1} )^{k + 1} ] \\ & = \lambda_{1} + (1 - \gamma_{1} )^{k + 1} [\lambda (0) - \lambda_{1} ]. \\ \end{aligned} $$
(8.142)

Since \( \gamma_{1} \in (0,1] \), we can verify that \( \mathop { \lim }\limits_{k \to \infty } \lambda \,(k) = \lambda_{1} \).

Case 2: Suppose that the generalized eigenvalues of the matrix pencil (R y , R x ) have been ordered as \( \lambda_{1} = \cdots = \lambda_{r} < \lambda_{r + 1} \le \cdots \le \lambda_{N} \;(1 \le r \le N) \). Since \( \left\{ {\varvec{v}_{1} ,\varvec{v}_{2} , \ldots ,\varvec{v}_{N} } \right\} \) is an R x -orthonormal basis of \( C^{N} \), w(k) in DDT System 1 can be written uniquely as:

$$ \varvec{w}(k) = \sum\limits_{i = 1}^{N} {z_{i} (k)\,\varvec{v}_{i} ,\;k = 0,1, \ldots } $$
(8.143)

where \( z_{i} (k) = \left\langle {\varvec{w}(k),\varvec{v}_{i} } \right\rangle_{{\varvec{R}_{\varvec{x}} }} = \varvec{v}_{i}^{H} \varvec{R}_{\varvec{x}} \varvec{w}(k),\;i = 1,2, \ldots ,N. \)

First, we will prove by mathematical induction that for all k > 0, w(k) is well defined, R x -normalized, i.e.,

$$ \varvec{w}(k)^{H} \varvec{R}_{x} \varvec{w}(k) = \sum\limits_{i = 1}^{N} {\left| {z_{i} \left( k \right)} \right|^{2} } = 1, $$
(8.144)

and \( \varvec{w}\left( k \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \), i.e., [z 1(k), z 2(k),…, z r (k)] ≠ 0. Note that \( \varvec{w}\left( 0 \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \) and w(0) is R x -normalized. Assume that w(k) is well defined, R x -normalized, and \( \varvec{w}\left( k \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \) for some k ≥ 0. By letting \( \tilde{\varvec{w}}\,(k + 1) = \sum\nolimits_{i = 1}^{N} {\tilde{z}_{i} (k + 1){\mathbf{v}}_{i} } \), from (8.134) and (8.143), we have the following:

$$ \tilde{z}_{i} (k + 1) = z_{i} (k)\left\{ {1 + \eta_{1} \left[ {\frac{1}{{\lambda_{i} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right]} \right\}. $$
(8.145)

Since matrix pencil \( \left( {\varvec{R}_{\varvec{x}} ,\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} } \right) \) has the same eigen pairs as \( \left( {\varvec{R}_{\varvec{y}} ,\varvec{R}_{\varvec{x}} } \right) \), and w(k) is R x -normalized, it follows that

$$ \lambda_{ 1} \le \frac{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{w}(k)}}{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}\,(k)}} = \frac{1}{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w} (k )}} \le\lambda_{N} , $$
(8.146)

which is a generalization of the Rayleigh–Ritz ratio [19]. For i = 1,…, r, (8.146) and (8.145) guarantee that

$$ 1 + \eta_{1} \left[ {\frac{1}{{\lambda_{i} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right] = 1 + \eta_{1} \left[ {\frac{1}{{\lambda_{1} }}\frac{1}{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right] \ge 1, $$
(8.147)

and [z 1(k + 1), z 2(k + 1), …, z r (k + 1)] ≠ 0. These imply that \( \tilde{\varvec{w}}\,(k + 1) \ne 0 \) and \( \varvec{w}\,(k + 1) = \sum\nolimits_{i = 1}^{N} {z_{i} (k + 1)\,\varvec{v}_{i} } \) is well defined, R x -normalized, and \( \varvec{w}\left( {k + 1} \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \), where

$$ z_{i} (k + 1) = \frac{{\tilde{z}_{i} (k + 1)}}{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\varvec{R}_{\varvec{x}} }} }}. $$
(8.148)

Therefore, w(k) is well defined, R x -normalized, and \( \varvec{w}\left( k \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \) for all k ≥ 0.

Second, we will prove (8.140). Note that \( \varvec{w}\left( 0 \right) \notin \left( {\varvec{V}_{{\lambda_{1} }} } \right)_{{\langle \varvec{R}_{\varvec{x}} \rangle }}^{ \bot } \) implies the existence of some m∈ {1,…, r} satisfying z m (0) ≠ 0, where λ 1 = ··· = λ m  = ··· = λ r . From (8.145) and (8.148), we have z m (k + 1)/z m (0) > 0 for all k ≥ 0. By using (8.145) and (8.148), we can see that for i = 1,…, r, it holds that

$$ \begin{aligned} \frac{{z_{i} (k + 1)}}{{z_{m} (k + 1)}} & = \frac{{\tilde{z}_{i} (k + 1)}}{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\varvec{R}_{\varvec{x}} }} }}\frac{{\left\| {\tilde{\varvec{w}}(k + 1)} \right\|_{{\varvec{R}_{\varvec{x}} }} }}{{\tilde{z}_{m} (k + 1)}} \\ & = \frac{{z_{i} (k)}}{{z_{m} (k)}} \cdot \frac{{1 + \eta_{1} \left[ {\frac{1}{{\lambda_{i} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right]}}{{1 + \eta_{1} \left[ {\frac{1}{{\lambda_{m} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right]}} \\ & = \frac{{z_{i} (k)}}{{z_{m} (k)}} = \cdots = \frac{{z_{i} (0)}}{{z_{m} (0)}}. \\ \end{aligned} $$
(8.149)

On the other hand, by using (8.145) and (8.148), we have for all k ≥ 0 and i = r + 1, …, N that

$$ \begin{aligned} \frac{{\left| {z_{i} (k + 1)} \right|^{2} }}{{\left| {z_{m} (k + 1)} \right|^{2} }} & = \frac{{\left| {\tilde{z}_{i} (k + 1)} \right|^{2} }}{{\left| {\tilde{z}_{m} (k + 1)} \right|^{2} }} \\ & = \left[ {\frac{{1 + \eta_{1} \left( {\frac{1}{{\lambda_{i} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right)}}{{1 + \eta_{1} \left( {\frac{1}{{\lambda_{m} \varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} - 1} \right)}}} \right]^{2} \cdot \frac{{\left| {z_{i} (k)} \right|^{2} }}{{\left| {z_{m} (k)} \right|^{2} }} \\ & = \left[ {1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{i} }}}}{{(\frac{1}{{\eta_{1} }} - 1)\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k) + \frac{1}{{\lambda_{1} }}}}} \right]^{2} \cdot \frac{{\left| {z_{i} (k)} \right|^{2} }}{{\left| {z_{m} (k)} \right|^{2} }} = \psi (k)\frac{{\left| {z_{i} (k)} \right|^{2} }}{{\left| {z_{m} (k)} \right|^{2} }}, \\ \end{aligned} $$
(8.150)

where

$$ \psi \,(k) = \left[ {1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{i} }}}}{{\left( {\frac{1}{{\eta_{1} }} - 1} \right)\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k) + \frac{1}{{\lambda_{1} }}}}} \right]^{2} . $$
(8.151)

For all i = r + 1, …, N, together with \( \eta_{1} \in (0,1] \) and \( 1/\lambda_{1} - 1/\lambda_{i} > 0 \), Eq. (8.146) guarantees that

$$ \begin{aligned} 1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{i} }}}}{{\left( {\frac{1}{{\eta_{1} }} - 1} \right)\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k) + \frac{1}{{\lambda_{1} }}}} & \le 1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{r + 1} }}}}{{\left( {\frac{1}{{\eta_{1} }} - 1} \right)\frac{1}{{\lambda_{1} }} + \frac{1}{{\lambda_{1} }}}} \\ & = 1 - \eta_{1} \left( {1 - \frac{{\lambda_{1} }}{{\lambda_{r + 1} }}} \right) < 1, \\ \end{aligned} $$
(8.152)

and

$$ \begin{aligned} 1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{i} }}}}{{\left( {\frac{1}{{\eta_{1} }} - 1} \right)\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k) + \frac{1}{{\lambda_{1} }}}} & \ge 1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{N} }}}}{{\left( {\frac{1}{{\eta_{1} }} - 1} \right)\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k) + \frac{1}{{\lambda_{1} }}}} \\ & \text{ = }1 - \frac{{\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{N} }}}}{{\frac{1}{{\eta_{1} }}\frac{1}{{\lambda_{N} }} + \left( {\frac{1}{{\lambda_{1} }} - \frac{1}{{\lambda_{N} }}} \right)}} > 0. \\ \end{aligned} $$
(8.153)

From (8.152) and (8.153), we can verify that

$$ 0 < \psi (k) < 1,\;i = r + 1, \ldots ,N, $$
(8.154)

for all k ≥ 0. Denote ψ max = max{ψ(k)|k ≥ 0}. Clearly 0 < ψ max < 1. From (8.150), we have the following:

$$ \frac{{\left| {z_{i} (k + 1)} \right|^{2} }}{{\left| {z_{m} (k + 1)} \right|^{2} }} \le \psi_{ \hbox{max} } \frac{{\left| {z_{i} (k)} \right|^{2} }}{{\left| {z_{m} (k)} \right|^{2} }} \le \cdots \le \psi_{ \hbox{max} }^{k + 1} \frac{{\left| {z_{i} (0)} \right|^{2} }}{{\left| {z_{m} (0)} \right|^{2} }}. $$
(8.155)

Since w ( k) is R x -normalized, |z m (k)|2≤ 1 for all k ≥ 0, it follows from (8.155) that

$$ \begin{aligned} \sum\limits_{i = r + 1}^{N} {\left| {z_{i} (k)} \right|^{2} } & \le \sum\limits_{i = r + 1}^{N} {\frac{{\left| {z_{i} (k)} \right|^{2} }}{{\left| {z_{m} (k)} \right|^{2} }}} \le \cdots \\ & \le \psi_{\hbox{max} }^{k} \sum\limits_{i = r + 1}^{N} {\frac{{\left| {z_{i} (0)} \right|^{2} }}{{\left| {z_{m} (0)} \right|^{2} }}} \to 0\;{\text{as}}\;k \to \infty , \\ \end{aligned} $$
(8.156)

which along with (8.144) implies that

$$ \mathop { \lim }\limits_{k \to \infty } \sum\limits_{i = 1}^{r} {\left| {z_{i} (k)} \right|^{2} = 1.} $$
(8.157)

Note that z m (k)/z m (0) > 0 for all k ≥ 0. Then, from (8.149) and (8.157) we have the following:

$$ \mathop { \lim }\limits_{k \to \infty } z_{i} (k) = \frac{{z_{i} \left( 0 \right)}}{{\sqrt {\sum\nolimits_{j = 1}^{r} {\left| {z_{j} (0)} \right|^{2} } } }},\;i = 1,2, \ldots ,r. $$
(8.158)

Based on (8.156) and (8.158), (8.140) can be obtained as follows:

$$ \mathop { \lim }\limits_{k \to \infty } \,\varvec{w}(k) = \sum\limits_{i = 1}^{r} {\frac{{z_{i} (0)}}{{\sqrt {\sum\nolimits_{j = 1}^{r} {\left| {z_{j} (0)} \right|^{2} } } }}} \varvec{v}_{i} = \frac{{\varvec{P}_{{\varvec{V}_{{\lambda_{1} }} }}^{{\varvec{R}_{\varvec{x}} }} [\varvec{w}\,(0)]}}{{\left\| {\varvec{P}_{{\varvec{V}_{{\lambda_{1} }} }}^{{\varvec{R}_{\varvec{x}} }} [\varvec{w}(0)]} \right\|_{{\varvec{R}_{\varvec{x}} }} }}. $$
(8.159)

Finally, we will prove (8.141). From (8.159), we can see that

$$ \mathop { \lim }\limits_{k \to \infty } \frac{1}{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{x}} \varvec{R}_{\varvec{y}}^{ - 1} \varvec{R}_{\varvec{x}} \varvec{w}(k)}} =\lambda_{1} . $$
(8.160)

That is, for any small positive δ, there exists a K > 0 satisfying

$$ \lambda_{1} - \delta < \frac{1}{{\varvec{w}^{H} (k)\varvec{R}_{x} \varvec{R}_{y}^{ - 1} \varvec{R}_{x} \varvec{w}(k)}} <\lambda_{1} + \delta , $$
(8.161)

for all k > K. It follows from (8.136) that

$$ \begin{aligned}\lambda\,(k) & > \left( {1 - \gamma_{1} } \right)\lambda\left( {k - 1} \right) + \gamma_{1} \left( {\lambda_{1} - \delta } \right) > \cdots > \left( {1 - \gamma_{1} } \right)^{{k{ - }K}}\lambda\,(K) + \gamma_{1} \left( {\lambda_{1} - \delta } \right) \\ & \quad \times \,\left[ {1 + \left( {1 - \gamma_{1} } \right) + \cdots + \left( {1 - \gamma_{1} } \right)^{{k{ - }K - 1}} } \right] = \left( {1 - \gamma_{1} } \right)^{{k{ - }K}}\lambda\,(K) + \left( {\lambda_{1} - \delta } \right) \\ & \quad \times \,\left[ {1 - \left( {1 - \gamma_{1} } \right)^{{k{ - }K}} } \right] = \left( {\lambda_{1} - \delta } \right) + \left( {1 - \gamma_{1} } \right)^{{k{ - }K}} \left[ {\lambda\,(K) -\lambda_{1} + \delta } \right], \\ \end{aligned} $$
(8.162)

and

$$ \begin{aligned}\lambda\,(k) & < \left( {1 - \gamma_{1} } \right)\lambda\left( {k - 1} \right) + \gamma_{1} \left( {\lambda_{1} + \delta } \right) < \cdots < \left( {1 - \gamma_{1} } \right)^{{k{ - }K}}\lambda\,(K) + \gamma_{1} \left( {\lambda_{1} + \delta } \right) \\ & \quad \times \,\left[ {1 + \left( {1 - \gamma_{1} } \right) + \cdots + \left( {1 - \gamma_{1} } \right)^{{k{ - }K - 1}} } \right] = \left( {1 - \gamma_{1} } \right)^{{k{ - }K}}\lambda\,(K) + \left( {\lambda_{1} + \delta } \right) \\ & \quad \times \,\left[ {1 - \left( {1 - \gamma_{1} } \right)^{{k{ - }K}} } \right] = \left( {\lambda_{1} + \delta } \right) + \left( {1 - \gamma_{1} } \right)^{{k{ - }K}} \left[ {\lambda\,(K) -\lambda_{1} - \delta } \right], \\ \end{aligned} $$
(8.163)

for all k > K. Since γ 1 ∈ (0, 1], it is easy to verify from (8.162) and (8.163) that \( \mathop { \lim }\limits_{k \to \infty }\lambda\,(k) =\lambda_{1} \).

This completes the proof.
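Theorem 8.1 can also be checked numerically by iterating DDT System 1 with a fixed pencil. The sketch below uses the same random-pencil assumption as the earlier examples and compares the iterates with the limits (8.140) and (8.141); for simplicity it assumes λ 1 is simple, which holds for a generic random pencil.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(4)
N, eta1, gamma1 = 6, 0.8, 0.8
A = rng.standard_normal((N, N)); Ry = A @ A.T + N * np.eye(N)
B = rng.standard_normal((N, N)); Rx = B @ B.T + N * np.eye(N)
Qy = np.linalg.inv(Ry)

vals, V = eigh(Ry, Rx)
w0 = rng.standard_normal(N)
w0 /= np.sqrt(w0 @ Rx @ w0)                 # R_x-normalized initial vector
w, lam = w0.copy(), 1.0

for _ in range(500):                        # DDT System 1, (8.134)-(8.136)
    Rx_w = Rx @ w
    denom = Rx_w @ Qy @ Rx_w
    w_tilde = w + eta1 * (Qy @ Rx_w / denom - w)
    w = w_tilde / np.sqrt(w_tilde @ Rx @ w_tilde)
    lam = lam + gamma1 * (1.0 / denom - lam)

# Limits predicted by (8.140) and (8.141) for a simple lambda_1.
v1 = V[:, 0]
w_limit = np.sign(w0 @ Rx @ v1) * v1
print(np.max(np.abs(w - w_limit)), abs(lam - vals[0]))   # both should be close to zero
```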

Theorem 8.2 (Convergence analysis of fGPCA)

Suppose that the sequence \( [\varvec{w}(k),\lambda (k)]_{k = 0}^{\infty } \) is generated by DDT System 2 with any \( \eta_{2} ,\gamma_{2} \in (0,1] \) , any initial R y -normalized vector \( \varvec{w}(0) \notin (\varvec{V}_{{\lambda_{N} }} )_{{ < \varvec{R}_{x} > }}^{ \bot } \) , and any λ(0) > 0. Then for Case 1, it holds that w (k) = w (0) for all k ≥ 0, which is also a generalized eigenvector associated with the generalized eigenvalue λ N of the matrix pencil ( R y , R x ), and \( \mathop { \lim }\limits_{k \to \infty } \lambda (k) = \lambda_{N} \) . For Case 2, it holds that

$$ \mathop { \lim }\limits_{k \to \infty } \varvec{w}\,(k) = \sqrt {\frac{1}{{\lambda_{N} }}} \frac{{\varvec{P}_{{\varvec{V}_{{\lambda_{N} }} }}^{{\varvec{R}_{x} }} \left[ {\varvec{w}\,(0)} \right]}}{{\left\| {\varvec{P}_{{\varvec{V}_{{\lambda_{N} }} }}^{{\varvec{R}_{x} }} \left[ {\varvec{w}\,(0)} \right]} \right\|_{{\varvec{R}_{x} }} }}, $$
(8.164)
$$ \mathop { \lim }\limits_{k \to \infty } \,\lambda(k) =\lambda_{N} . $$
(8.165)

The proof of Theorem 8.2 is similar to that of Theorem 8.1. A minor difference is that we need to calculate the R y -norm of w(k) at each step. Another minor difference is that, in place of (8.146), we use the fact that the matrix pencil \( (\varvec{R}_{\varvec{y}} \varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} ,\varvec{R}_{\varvec{y}} ) \) has the same eigen pairs as (R y , R x ) and that w(k) is well defined, R y -normalized, and \( \varvec{w}\,(k) \notin (\varvec{V}_{{\lambda_{N} }} )_{{\left\langle {\varvec{R}_{x} } \right\rangle }}^{ \bot } \) for all \( k \ge 0 \). Therefore,

$$ \lambda_{ 1} \le \frac{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{y}} \varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w}(k)}}{{\varvec{w}^{H} (k)\varvec{R}_{\varvec{y}} \varvec{w}(k)}} = \varvec{w}^{H} (k)\varvec{R}_{\varvec{y}} \varvec{R}_{\varvec{x}}^{ - 1} \varvec{R}_{\varvec{y}} \varvec{w} (k )\le\lambda_{N} . $$
(8.166)

Particularly, if \( \lambda_{1} \) and \( \lambda_{2} \) are distinct \( (\lambda_{1} < \lambda_{2} \le \cdots \le \lambda_{N} ) \), we have \( \varvec{V}_{{\lambda_{ 1} }} = {\text{span}}\left\{ {\varvec{V}_{1} } \right\}, \) \( \varvec{P}_{{\varvec{V}_{{\lambda_{ 1} }} }}^{{\varvec{R}_{\varvec{x}} }} [\,\varvec{w}\,(0)] = \left\langle {\varvec{w}\,(0),\,\varvec{V}_{1} } \right\rangle_{{\varvec{R}_{\varvec{x}} }} \varvec{V}_{1} \), and \( \left\| {\varvec{P}_{{\varvec{V}_{{\lambda_{ 1} }} }}^{{\varvec{R}_{\varvec{x}} }} [\varvec{w}\,(0)]} \right\|_{{\varvec{R}_{\varvec{x}} }} = \left| {\left\langle {\varvec{w}\,(0),\,\varvec{V}_{1} } \right\rangle_{{\varvec{R}_{\varvec{x}} }} } \right| \). Moreover, if \( \lambda_{N - 1} \) and \( \lambda_{N} \) are distinct \( \text{(}\lambda_{1} \le \cdots \le \lambda_{N - 1} < \lambda_{N} ) \), we have \( \varvec{V}_{{\lambda_{\text{N}} }} = {\text{span}}\left\{ {\varvec{V}_{N} } \right\},\;\varvec{P}_{{\varvec{V}_{{\lambda_{\text{N}} }} }}^{{\varvec{R}_{\varvec{y}} }} [\varvec{w}(0)] = \left\langle {\varvec{w}(0),\varvec{V}_{N} } \right\rangle_{{\varvec{R}_{\varvec{y}} }} \varvec{V}_{N} \) and \( \left\| {\varvec{P}_{{\varvec{V}_{{\lambda_{\text{N}} }} }}^{{\varvec{R}_{\varvec{y}} }} [\varvec{w}\,(0)]} \right\|_{{\varvec{R}_{\varvec{y}} }} = \left| {\left\langle {\varvec{w}\,(0),\,\varvec{V}_{N} } \right\rangle_{{\varvec{R}_{\varvec{y}} }} } \right| \). Hence, the following corollaries hold.

Corollary 8.1

Suppose that \( \lambda_{1} < \lambda_{2} \le \cdots \le \lambda_{N} \) . Then the sequence \( [\varvec{w}\,(k),\lambda (k)]_{k = 0}^{\infty } \) generated by DDT System 1 with any \( \eta_{1} ,\gamma_{1} \in (0,1] \) , any initial R x -normalized vector \( \varvec{w}\,(0) \notin (\varvec{V}_{{\lambda_{1} }} )_{{\left\langle {\varvec{R}_{\varvec{x}} } \right\rangle }}^{ \bot } \) , and any \( \lambda (0) > 0 \) satisfies

$$ \mathop { \lim }\limits_{k \to \infty } \varvec{w}(k) = \frac{{\left\langle {\varvec{w}(0),\varvec{V}_{1} } \right\rangle_{{\varvec{R}_{\varvec{x}} }} \varvec{V}_{1} }}{{\left| {\left\langle {\varvec{w}(0),\varvec{V}_{1} } \right\rangle_{{{\mathbf{R}}_{{\mathbf{x}}} }} } \right|}}, $$
(8.167)
$$ \mathop { \lim }\limits_{k \to \infty } \lambda (k) = \lambda_{1} . $$
(8.168)

Corollary 8.2

Suppose that \( \lambda_{1} \le \cdots \le \lambda_{N - 1} < \lambda_{N} \) . Then the sequence \( [\varvec{w}(k),\lambda (k)]_{k = 0}^{\infty } \) generated by DDT System 2 with any \( \eta_{2} ,\gamma_{2} \in (0,1] \) , any initial R y -normalized vector \( \varvec{w}\,(0) \notin (\varvec{V}_{{\lambda_{N} }} )_{{\left\langle {\varvec{R}_{\varvec{y}} } \right\rangle }}^{ \bot } \) , and any \( \lambda (0) > 0 \) satisfies

$$ \mathop { \lim }\limits_{k \to \infty } \varvec{w}(k) = \sqrt {\frac{1}{{\lambda_{N} }}} \frac{{\left\langle {\varvec{w}(0),\varvec{V}_{N} } \right\rangle_{{\varvec{R}_{x} }} \varvec{V}_{N} }}{{\left| {\left\langle {\varvec{w}(0),\varvec{V}_{N} } \right\rangle_{{\varvec{R}_{x} }} } \right|}}, $$
(8.169)
$$ \mathop { \lim }\limits_{k \to \infty } \lambda (k) = \lambda_{N} . $$
(8.170)

8.4.4 Numerical Examples

In this section, we present two numerical examples to evaluate the performance of our algorithms (fGMCA and fGPCA). The first example estimates the principal and minor generalized eigenvectors of two random vector processes generated by sinusoids with additive noise. The second illustrates the performance of our algorithms on the BSS problem. Besides nGMCA and nGPCA, we also compare with the following algorithms, which were proposed in the last ten years:

(1) Gradient-based: adaptive version of ([4], Alg. 2) with negative (for GPCA) and positive (for GMCA) step sizes;

(2) Power-like: fast generalized eigenvector tracking [20] based on the power method;

(3) R-GEVE: reduced-rank generalized eigenvector extraction algorithm [21];

(4) Newton-type: adaptive version of Alg. I proposed in [22].

A. Experiment 1

In this experiment, the input samples are generated by:

$$ y\,(n) = \sqrt 2 \, \sin (0.62\pi n + \theta_{1} ) + \varsigma_{1} (n), $$
(8.171)
$$ x\,(n) = \sqrt 2 \,{ \sin }(0.46\pi n + \theta_{2} ) + \sqrt 2 \,{ \sin }(0.74\pi n + \theta_{3} ) + \varsigma_{2} (n), $$
(8.172)

where θ i (i = 1, 2, 3) are the initial phases, which follow uniform distributions within [0, 2π], and ζ 1(n) and ζ 2(n) are zero-mean white noises with variance \( \sigma_{1}^{2} = \sigma_{2}^{2} = 0.1 \).

The input vectors {y(k)} and {x(k)} are arranged in blocks of size N = 8, i.e., y(k) = [y(k),…, y(k − N+1)]T and x(k) = [x(k),…, x(k − N + 1)]T, k ≥ N. Define the N × N matrix pencil \( (\overline{\varvec{R}}_{\varvec{y}} ,\,\overline{\varvec{R}}_{\varvec{x}} ) \) with the (p, q) entry (p, q = 1,2,…,N) of \( \overline{\varvec{R}}_{\varvec{y}} \) and \( \overline{\varvec{R}}_{\varvec{x}} \) given by

$$ \left[ {\overline{\varvec{R}}_{\varvec{y}} } \right]_{pq} = { \cos }\left[ {0.62\pi \,(p - q)} \right] + \delta_{pq} \sigma_{1}^{2} , $$
(8.173)
$$ \left[ {\overline{\varvec{R}}_{\varvec{x}} } \right]_{pq} = { \cos }\left[ {0.46\pi \,(p - q)} \right] + { \cos }\left[ {0.74\pi \,(p - q)} \right] + \delta_{pq} \sigma_{2}^{2} . $$
(8.174)
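For reference, the theoretical pencil (8.173)–(8.174) and its generalized eigenstructure can be constructed directly; the sketch below is a plain transcription of the two formulas with N = 8.

```python
import numpy as np
from scipy.linalg import eigh

N, sigma1_sq, sigma2_sq = 8, 0.1, 0.1
idx = np.arange(N)
D = idx[:, None] - idx[None, :]                      # p - q

Ry_bar = np.cos(0.62 * np.pi * D) + sigma1_sq * np.eye(N)            # (8.173)
Rx_bar = (np.cos(0.46 * np.pi * D) + np.cos(0.74 * np.pi * D)
          + sigma2_sq * np.eye(N))                                    # (8.174)

vals, V = eigh(Ry_bar, Rx_bar)
print(vals[0], vals[-1])      # true minor and principal generalized eigenvalues
```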

For comparison, the direction cosine DC(k) is used to measure the accuracy of the direction estimate. We also measure the numerical stability of all algorithms by the sample standard deviation of the direction cosine:

$$ SSD(k) = \sqrt {\frac{1}{L - 1}\sum\limits_{j = 1}^{L} {\left[ {{\text{DC}}_{j} (k) - \overline{\text{DC}} (k)} \right]^{2} } } , $$
(8.175)

where DC j (k) is the direction cosine of the jth independent run (j = 1, 2,…, L) and \( \overline{\text{DC}} (k) \) is the average over L = 100 independent runs.
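Assuming the usual definition of the direction cosine as the magnitude of the cosine of the angle between the estimated and the true eigenvector (the definition is not restated here), DC(k) and SSD(k) in (8.175) can be computed as follows:

```python
import numpy as np

def direction_cosine(w_est, w_true):
    """|cos| of the angle between the estimate and the true generalized eigenvector."""
    return abs(w_est @ w_true) / (np.linalg.norm(w_est) * np.linalg.norm(w_true))

def ssd(dc_runs):
    """Sample standard deviation (8.175); dc_runs has shape (L, K) with DC_j(k)."""
    dc_mean = dc_runs.mean(axis=0)                # \bar{DC}(k)
    L = dc_runs.shape[0]
    return np.sqrt(((dc_runs - dc_mean) ** 2).sum(axis=0) / (L - 1))
```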

In this example, we conduct two simulations. In the first simulation, we use fGMCA, nGMCA, and the other aforementioned algorithms to extract the minor generalized eigenvector of the matrix pencil (R y , R x ). Note that in the gradient-based algorithm a positive step size is used, and the other algorithms are applied to estimate the principal generalized eigenvector of the matrix pencil (R x , R y ), which is also the minor generalized eigenvector of (R y , R x ). In the second simulation, we use fGPCA, nGPCA, and the other algorithms to extract the principal generalized eigenvector of the matrix pencil (R y , R x ). Note that in the gradient-based algorithm a negative step size is used. The parameter settings used in the simulations follow those in [4, 22]. All algorithms have been initialized with \( \widehat{\varvec{R}}_{x} (0) = \widehat{\varvec{R}}_{y} (0) = \varvec{Q}_{x} (0) = \varvec{Q}_{y} (0) = \varvec{I}_{N} \) (if used) and w(0) = e 1, where e 1 stands for the first column of I N .

The experimental results are shown in Figs. 8.10 to 8.12 and Table 8.1.

Fig. 8.10 Example 1: Direction cosine of the principal/minor generalized eigenvector. a First simulation. b Second simulation

Fig. 8.11 Example 1: Sample standard deviation of the direction cosine. a First simulation. b Second simulation

Fig. 8.12 Example 1: Generalized eigenvalues estimation. a First simulation: principal generalized eigenvalues estimation. b Second simulation: minor generalized eigenvalues estimation

Table 8.1 Computational complexity of all algorithms

Figures 8.10 and 8.11 depict the time course of the direction cosine for generalized eigenvector estimation and the sample standard deviation of the direction cosine, respectively. The results of minor and principal generalized eigenvalue estimation for all generalized eigen pair extraction algorithms are shown in Fig. 8.12. We find that fGMCA and fGPCA converge faster than nGMCA and nGPCA, respectively, in the initial steps, and that fGMCA and fGPCA have estimation accuracy similar to that of nGMCA and nGPCA, respectively. Figure 8.12 shows that all generalized eigen pair extraction algorithms can extract the principal or minor generalized eigenvalue efficiently.

The computational complexities of all aforementioned algorithms are shown in Table 8.1. We find that the Newton-type algorithm has the lowest computational complexity but the worst estimation accuracy and standard deviation. The Power-like algorithm has the highest computational complexity of all the algorithms. The nGMCA, nGPCA, and gradient-based algorithms have the same computational complexity. The computational complexities of R-GEVE and the proposed algorithms are similar and are lower than those of nGMCA, nGPCA, and the gradient-based algorithm.

B. Experiment 2

We perform this experiment to show the performance of our algorithms on the BSS problem. Consider a linear BSS model [23]:

$$ \varvec{x}(n) = \varvec{As}(n) + \varvec{e}(n), $$
(8.176)

where x(n) is an r-dimensional vector of observed signals at time n, s(n) is an l-dimensional vector of unknown source signals, A ∈ R r×l denotes the unknown mixing matrix, and e(n) is an unknown noise vector. In general, the BSS problem is that of finding a separating matrix W such that the output signal vector y = W T x contains components that are as independent as possible. In this experiment, we compare the proposed algorithms with the nGMCA and nGPCA algorithms, as well as with the batch-processing generalized eigenvalue decomposition method (the EVD method in MATLAB). We use the method given in [20, 22] to formulate the matrix pencil by applying FIR filtering. The output of the FIR filter, z(n), is given as

$$ \varvec{z}\,(n) = \sum\limits_{t = 0}^{m} {\tau \,(t)\varvec{x}\,(n - t)} , $$
(8.177)

where τ(t) are the coefficients of the FIR filter. Let \( \varvec{R}_{\varvec{x}} = E[\varvec{x}(k)\varvec{x}^{\text{T}} (k)] \) and \( \varvec{R}_{\varvec{z}} = E[\varvec{z}(k)\varvec{z}^{\text{T}} (k)] \). It was shown in [20] that the separating matrix W can be found by extracting the generalized eigenvectors of the matrix pencil (R z , R x ). Hence, the BSS problem can be formulated as that of finding the generalized eigenvectors associated with the two sample sequences x(k) and z(k). Therefore, we can directly apply our algorithms to solve the BSS problem.
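A minimal sketch of this formulation is given below: it filters the mixed signals with the FIR coefficients τ and forms the sample pencil (R z , R x ). The array shapes and the helper name are illustrative assumptions; the generalized eigenvectors of the returned pencil (e.g., via scipy.linalg.eigh(Rz, Rx) or the adaptive algorithms of this section) then give the separating matrix.

```python
import numpy as np

def bss_pencil(X, tau=(1.0, -1.0)):
    """Form (R_z, R_x) from mixed signals X of shape (r, T) via FIR filtering (8.177)."""
    _, T = X.shape
    m = len(tau) - 1
    Z = np.zeros_like(X)
    for t, c in enumerate(tau):                 # z(n) = sum_t tau(t) x(n - t)
        Z[:, m:] += c * X[:, m - t: T - t]
    Z, Xc = Z[:, m:], X[:, m:]                  # keep only the valid samples
    Rx = Xc @ Xc.T / Xc.shape[1]
    Rz = Z @ Z.T / Z.shape[1]
    return Rz, Rx
```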

In the simulation, four benchmark signals are extracted from the file ABio7.mat provided by ICALAB [23], as shown in Fig. 8.13. We use the mixing matrix

$$ {\mathbf{A}} = \left[ {\begin{array}{*{20}c} {2.7914} & { - 0.1780} & { - 0.4945} & {0.3013} \\ {1.3225} & { - 1.7841} & { - 0.3669} & {0.4460} \\ {0.0714} & { - 1.9163} & {0.4802} & { - 0.3701} \\ { - 1.7396} & {0.1302} & {0.9249} & { - 0.4007} \\ \end{array} } \right], $$
(8.178)

which was randomly generated. e(n) is a zero-mean white noise vector with covariance \( 10^{ - 5} \varvec{I} \). Figure 8.14 shows the mixed signals. We use a simple FIR filter with coefficients τ = [1, − 1]T.

Fig. 8.13 Four original signals

Fig. 8.14 Mixed signals

Suppose that the matrix pencil (R z , R x ) has four eigenvectors w 1, w 2, w 3, w 4 associated with the four eigenvalues \( \sigma_{1} < \sigma_{2} < \sigma_{3} < \sigma_{4} \). The separating matrix is then \( \varvec{B} = [\varvec{w}_{1} ,\varvec{w}_{2} ,\varvec{w}_{3} ,\varvec{w}_{4} ] \). We use fGPCA, nGPCA, and all other algorithms to extract the two principal generalized eigenvectors (w 3 and w 4). To extract the two minor generalized eigenvectors (w 1 and w 2), we use fGMCA, nGMCA, and the gradient-based algorithm to extract the minor generalized eigenvectors of the matrix pencil (R z , R x ), and the other algorithms to extract the principal generalized eigenvectors of the matrix pencil (R x , R z ). All parameters and initial values are the same as in Example 1.

Similar to Example 1, a total of L = 100 independent runs are evaluated in this example. The separating matrix B is calculated as \( \varvec{B} = (1\,/\,L)\sum\nolimits_{j = 1}^{L} {\varvec{B}_{j} } \), where B j is the separating matrix extracted from the jth independent run (j = 1, 2,…, L).

Figures 8.15 and 8.16 show the signals recovered by the EVD method and by our method, respectively. The signals separated by the other algorithms are similar to those in Figs. 8.15 and 8.16 and are therefore not shown. Table 8.2 shows the absolute values of the correlation coefficients between the sources and the recovered signals. The simulation results demonstrate that all methods can solve the BSS problem effectively, and that our algorithms and the algorithms proposed in [4] separate the signals more accurately than the other algorithms. Moreover, the advantage of neural network model-based algorithms over the EVD method for the BSS problem is that they are recursive and can therefore be implemented online, whereas EVD is a batch-processing method and requires intensive computation.

Fig. 8.15 Signals separated by EVD method

Fig. 8.16 Signals separated by proposed method

Table 8.2 Absolute values of correlation coefficients between sources and recovered signals

In this section, we have derived a coupled dynamic system for the GHEP based on a novel generalized information criterion. Compared with existing work, the proposed derivation is simpler because it does not require computing the inverse of the Hessian. Based on the dynamic system, a coupled GMCA algorithm (fGMCA) and a coupled GPCA algorithm (fGPCA) have been obtained. The convergence speed of fGMCA and fGPCA is similar to that of Nguyen’s well-performing algorithms (nGMCA and nGPCA), but their computational complexity is lower. Experimental results show that our algorithms have better numerical stability and can extract the generalized eigenvectors more accurately than the other algorithms.

8.5 Summary

In this chapter, the speed stability problem that plagues most noncoupled learning algorithms has been discussed, and the coupled learning algorithms that solve this problem have been analyzed. Moller’s coupled PCA algorithm, Nguyen’s coupled generalized eigen pair extraction algorithm, the coupled singular value decomposition of a cross-covariance matrix, etc., have been reviewed. Then, the unified and coupled algorithms for minor and principal eigen pair extraction proposed by us have been introduced, and their convergence has been analyzed. Finally, the fast and adaptive coupled generalized eigen pair extraction algorithms proposed by us have been analyzed in detail, and their convergence has been proved via the DDT method.