3.1 Introduction

The goal of independent component analysis (ICA) is to decompose observed signals into components that are as independent as possible. In linear instantaneous blind source separation, ICA is used to separate linear instantaneous mixtures of source signals into signals that are as close as possible to the original sources. In the estimation of the so-called demixing matrix, one has to distinguish two different factors:

  1.

    Variance of the estimated inverse mixing matrix in the noiseless case due to randomness of the sources. This variance can be lower bounded by the Cramér-Rao bound for ICA derived for the real case in [41, 45] and for the circular and noncircular complex case in [33, 35].

  2.

    Bias of the demixing matrix from the inverse mixing matrix: As already noted in [16], the presence of noise biases the demixing matrix away from the inverse mixing matrix. A bias of an estimator is often considered to be unwanted, but in the case of noisy ICA the bias of the demixing matrix from the inverse mixing matrix actually leads to a reduced noise level in the separated signals and hence can be considered to be desired.

This chapter studies both factors for circular and noncircular complex mixtures. It is important to note that the complex case is not directly equivalent to the real case of twice larger dimension [19]. In the derivations, we aim to clearly show the connections and differences between the complex and real cases.

In many practical applications such as audio processing in the frequency domain or telecommunication, the signals are complex. While many publications focus on circular complex signals (as traditionally assumed in signal processing), [4, 36, 44] provide a good overview of applications with noncircular complex signals and discuss how to properly deal with noncircularity. Many signals of practical interest are noncircular. Digital modulation schemes usually produce noncircular complex baseband signals, since the symbol constellations in the complex plane are rotationally symmetric only for a discrete set of rotation angles and not for every real rotation angle, as would be necessary for circularity [2]. Another source of noncircularity is an imbalance between the in-phase and quadrature (I/Q) components of communication signals. Noncircularity can also be exploited in feature extraction in electrocardiograms (ECGs) and in the analysis of functional magnetic resonance imaging (fMRI) [4]. Moreover, the theory of noncircularity has found applications in acoustics and optics [44].

Although a large number of different algorithms for complex ICA have been proposed [7, 10, 14, 17, 18, 20, 29, 30, 38, 39], the CRB for the complex demixing matrix has only been derived recently in [33, 35]. General conditions regarding identifiability, uniqueness, and separability for complex ICA can be found in Eriksson and Koivunen [19]. Yeredor [48] provides a performance analysis for the strong uncorrelating transform (SUT) in terms of the interference-to-signal ratio matrix. However, since the SUT uses only second-order statistics, the results from [48] do not apply for ICA algorithms exploiting also the non-Gaussianity of the sources. As discussed in [3, 4], many ICA approaches exploiting non-Gaussianity of the sources are intimately related and can be studied under the umbrella of a maximum likelihood framework whose asymptotic performance reaches the CRB if the assumed distribution of the sources matches the true distribution.

The structure of the separation problem changes substantially if we account for additive noise. As discussed in [12], the mixing model is no longer equivariant and the likelihood contrast can no longer be assimilated to mutual information. Furthermore, the ML estimate of the source signals is no longer a linear function of the observations [23]. Source estimation from noisy mixtures can be classified into linear and nonlinear separation. In linear ICA, the presence of noise leads to a bias in the estimation of the mixing matrix. Douglas et al. [16] introduced measures to reduce this bias. Cardoso [8] showed that the performance of noisy source separation depends on the distribution of the sources, the signal-to-noise ratio (SNR) and the mixing matrix. Davies [13] showed for the real case that it is not meaningful to estimate both the mixing matrix and the full covariance matrix of the noise from the data. Koldovsky and Tichavsky [27, 28] drew parallels between linear minimum mean squared error (MMSE) estimation and ICA for the real data case. Up to now, closed-form expressions for the bias of the ICA solution in the complex case have not been derived except for the recent work of Loesch and Yang [34].

After a review of notation for complex-valued signals, complex ICA, and the CRB for a complex parameter vector in Sect. 3.1.1, we derive a closed-form expression for the CRB of the demixing matrix for instantaneous noncircular complex mixtures in Sect. 3.2. We first introduce the signal model and the assumptions in Sect. 3.2.1 and then derive the CRB for the complex demixing matrix in Sect. 3.2.2. Section 3.2.3 discusses the circular complex case and the noncircular complex Gaussian case as two special cases of the CRB. In Sect. 3.2.4, we study the CRB numerically for a family of noncircular complex generalized Gaussian distributions (GGD) and compare it to simulation results of several ICA estimators.

In Sect. 3.3, we consider a linear noisy noncircular complex mixing model and derive an analytic expression for the demixing matrix of ICA based on the Kullback-Leibler divergence (KLD) [34]. This expression contains the circular complex and real cases as special cases. The derivation is done using a perturbation analysis valid for small noise variance. In Sect. 3.3.3, we show that for a wide range of both the shape parameter and the noncircularity index of the GGD, the signal-to-interference-plus-noise ratio (SINR) of KLD-based ICA is close to that of linear MMSE estimation. We also discuss the situations where the two solutions differ. Furthermore, we extend our derivations to the overdetermined case (\(M>N\)) with circular complex noise in Sect. 3.3.4.

Compared to our previous journal and conference publications [33–35], we extend the performance study to a larger number of ICA algorithms and generalize the results for noisy mixtures to the overdetermined case.

3.1.1 Notations for Complex-Valued Signals

3.1.1.1 Complex Random Vector

Let \(\mathbf{x} = \mathbf{x}_R + j\mathbf{x}_I \in \mathbb {C}^N\) be a complex random column vector with a corresponding probability density function (pdf) defined as the pdf \(\tilde{p}(\mathbf{x}_R, \mathbf{x}_I)\) of the real part \(\mathbf{x}_R\) and imaginary part \(\mathbf{x}_I\) of \(\mathbf{x}\). Since \(\mathbf{x}_R = \frac{\mathbf{x} + \mathbf{x}^*}{2}\) and \(\mathbf{x}_I = \frac{\mathbf{x} - \mathbf{x}^*}{2j}\), we can rewrite the pdf \(\tilde{p}(\mathbf{x}_R, \mathbf{x}_I)\) as a function of \(\mathbf{x}\) and \(\mathbf{x}^*\), i.e., \(\tilde{p}(\mathbf{x}_R, \mathbf{x}_I) = p(\mathbf{x},\mathbf{x}^*)\). In the following, we will use \(p(\mathbf{x})\) as a short notation for \(p(\mathbf{x},\mathbf{x}^*)\). The covariance matrix of \(\mathbf{x}\) is

$$\begin{aligned} {{\mathrm{cov}}}(\mathbf{x}) = \mathrm{E}\left[ (\mathbf{x}- \mathrm{E}[\mathbf{x}])(\mathbf{x}- \mathrm{E}[\mathbf{x}])^H\right] . \end{aligned}$$
(3.1)

The pseudo-covariance matrix of \(\mathbf{x}\) is

$$\begin{aligned} {{\mathrm{pcov}}}(\mathbf{x}) = \mathrm{E}\left[ (\mathbf{x}- \mathrm{E}[\mathbf{x}])(\mathbf{x}- \mathrm{E}[\mathbf{x}])^T\right] . \end{aligned}$$
(3.2)

\((\cdot )^T\) and \((\cdot )^H\) stand for transpose and complex conjugate transpose of a vector or matrix. The augmented covariance matrix of \(\mathbf{x}\) is the covariance matrix of the augmented vector \(\underline{\mathbf{x}} = \begin{bmatrix} \mathbf{x}^T&\mathbf{x}^H \end{bmatrix}^T\):

$$\begin{aligned} {{\mathrm{cov}}}(\underline{\mathbf{x}})=\begin{bmatrix} {{\mathrm{cov}}}(\mathbf{x})&{{\mathrm{pcov}}}(\mathbf{x}) \\ {{\mathrm{pcov}}}(\mathbf{x})^*&{{\mathrm{cov}}}(\mathbf{x})^* \end{bmatrix}. \end{aligned}$$
(3.3)

\(\mathbf{x}\) is called circular if \(p(\mathbf{x} e^{j\alpha })=p(\mathbf{x}) \; \forall \alpha \in \mathbb {R}\). Otherwise it is called noncircular. Actually, for a random variable \(s\), the circularity definition \(p(s e^{j\alpha })=p(s) \; \forall \alpha \in \mathbb {R}\) is much stronger than the second-order circularity given by \(\gamma =\mathrm{E}\left[ s^2\right] =0\). There exist noncircular complex random variables with \(\gamma =0\). For simplicity, however, we use the second-order noncircularity index \(\gamma =\mathrm{E}\left[ s^2\right] \) to quantify noncircularity in the remainder of this chapter.
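As a small numerical illustration (a minimal sketch, not from this chapter; the distributions and sample size are arbitrary choices), the second-order noncircularity index \(\gamma =\mathrm{E}\left[ s^2\right] \) can be estimated from samples:

```python
import numpy as np

rng = np.random.default_rng(0)
L_samples = 100_000

def gamma_hat(s):
    """Sample estimate of the second-order noncircularity index E[s^2]."""
    return np.mean(s ** 2)

# circular complex Gaussian: independent real/imag parts of equal variance
s_circ = (rng.normal(size=L_samples) + 1j * rng.normal(size=L_samples)) / np.sqrt(2)

# noncircular with index gamma: unequal variances of real and imag parts
gamma = 0.6
s_nc = (np.sqrt((1 + gamma) / 2) * rng.normal(size=L_samples)
        + 1j * np.sqrt((1 - gamma) / 2) * rng.normal(size=L_samples))

print(abs(gamma_hat(s_circ)))    # close to 0
print(gamma_hat(s_nc).real)      # close to 0.6, while E[|s|^2] stays 1
```

Scaling the real and imaginary parts to variances \((1+\gamma )/2\) and \((1-\gamma )/2\) yields \(\mathrm{E}[|s|^2]=1\) and \(\mathrm{E}[s^2]=\gamma \).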

3.1.1.2 Complex Gradient

Let a complex column parameter vector \(\pmb {\theta }= \pmb {\theta }_R + j\pmb {\theta }_I \in \mathbb {C}^{M}\), its real and imaginary part \(\pmb {\theta }_R, \pmb {\theta }_I \in \mathbb {R}^{M}\), and a real scalar cost function \(f(\pmb {\theta },\pmb {\theta }^*) = \tilde{f}(\pmb {\theta }_R, \pmb {\theta }_I) \in \mathbb {R}\) be given. For ease of notation, we will also use the simplified notation \(f(\pmb {\theta })\) instead of \(f(\pmb {\theta },\pmb {\theta }^*)\). Instead of calculating the derivatives of \(\tilde{f}(\cdot )\) with respect to \(\pmb {\theta }_R\) and \(\pmb {\theta }_I\), the Wirtinger calculus computes the partial derivatives of \(f(\pmb {\theta }, \pmb {\theta }^*)\) with respect to \(\pmb {\theta }\) and \(\pmb {\theta }^*\), treating \(\pmb {\theta }\) and \(\pmb {\theta }^*\) as two independent variables [21, 44]. The complex gradient vectors \(\nabla _{\pmb {\theta }} f\) and \(\nabla _{\pmb {\theta }^*} f\) are given by

$$\begin{aligned} \nabla _{\pmb {\theta }} f = \frac{\partial f}{\partial \pmb {\theta }} = \frac{1}{2} \left( \frac{\partial \tilde{f}}{\partial \pmb {\theta }_R} - j\frac{\partial \tilde{f}}{\partial \pmb {\theta }_I}\right) \; \in \mathbb {C}^{M},\nonumber \\ \nabla _{\pmb {\theta }^*} f=\frac{\partial f}{\partial \pmb {\theta }^*} = \frac{1}{2} \left( \frac{\partial \tilde{f}}{\partial \pmb {\theta }_R} + j\frac{\partial \tilde{f}}{\partial \pmb {\theta }_I}\right) \; \in \mathbb {C}^{M}. \end{aligned}$$
(3.4)

The stationary points of \(f(\cdot )\) and \(\tilde{f}(\cdot )\) are characterized equivalently by \(\frac{\partial \tilde{f}}{\partial \pmb {\theta }_R} = \mathbf{0}\) and \(\frac{\partial \tilde{f}}{\partial \pmb {\theta }_I} = \mathbf{0}\), by \(\frac{\partial f}{\partial \pmb {\theta }} = \mathbf{0}\), or by \(\frac{\partial f}{\partial \pmb {\theta }^*} = \mathbf{0}\). The direction of steepest descent of a real function \(f(\pmb {\theta },\pmb {\theta }^*)\) is given by \(-\frac{\partial f}{\partial \pmb {\theta }^*}\) and not by \(-\frac{\partial f}{\partial \pmb {\theta }}\) [6]. Note that \(-\frac{\partial f}{\partial \pmb {\theta }^*}\) is the direction of steepest descent for \(\pmb {\theta }\) and not for \(\pmb {\theta }^*\).
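As a simple example of (3.4) (a standard illustration added here for concreteness), consider the scalar cost function \(f(\theta ,\theta ^*) = \theta \theta ^* = \theta _R^2 + \theta _I^2\). Then

$$\begin{aligned} \nabla _{\theta } f = \frac{1}{2}\left( 2\theta _R - j 2\theta _I\right) = \theta ^*, \qquad \nabla _{\theta ^*} f = \frac{1}{2}\left( 2\theta _R + j 2\theta _I\right) = \theta . \end{aligned}$$

The direction of steepest descent is \(-\nabla _{\theta ^*} f = -\theta \), which indeed points from \(\theta \) toward the minimum at \(0\).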

As long as the real and imaginary part of a complex function \(g(\pmb {\theta },\pmb {\theta }^*)=g_R(\pmb {\theta }_R,\pmb {\theta }_I)+jg_I(\pmb {\theta }_R, \pmb {\theta }_I)\) are differentiable, the Wirtinger derivatives \(\frac{\partial g}{\partial \pmb {\theta }} = \frac{\partial g_R}{\partial \pmb {\theta }}+j\frac{\partial g_I}{\partial \pmb {\theta }}\) and \(\frac{\partial g}{\partial \pmb {\theta }^*} = \frac{\partial g_R}{\partial \pmb {\theta }^*}+j\frac{\partial g_I}{\partial \pmb {\theta }^*}\) also exist [43]. Furthermore, we note that the Wirtinger derivatives defined in (3.4) are also valid for partial derivatives of \(f\) with respect to a parameter matrix \(\pmb {\Theta }\). In this chapter, we will also use real derivatives which we denote as \((\cdot )^{\prime }\) wherever possible.

3.1.1.3 Cramér-Rao Bound for a Complex Parameter Vector

Assume that \(L\) complex observations of \(\mathbf{x}\) are iid with the pdf \(p(\mathbf{x}; \pmb {\theta })\) where \(\pmb {\theta }\) is an \(N\)-dimensional complex parameter vector. In principle, it would be possible to derive the CRB for complex parameter \(\pmb {\theta }=\pmb {\theta }_R+j\pmb {\theta }_I\) by considering the real CRB of the \(2N\)-dimensional real composite vector \(\bar{\pmb {\theta }}=\begin{bmatrix} \pmb {\theta }_R^T&\pmb {\theta }_I^T \end{bmatrix}^T\):

$$\begin{aligned} {{\mathrm{cov}}}(\bar{\pmb {\theta }})=\begin{bmatrix} {{\mathrm{cov}}}(\pmb {\theta }_R)&{{\mathrm{cov}}}(\pmb {\theta }_R, \pmb {\theta }_I) \\ {{\mathrm{cov}}}(\pmb {\theta }_I,\pmb {\theta }_R)&{{\mathrm{cov}}}(\pmb {\theta }_I) \end{bmatrix} \ge L^{-1} {\mathbf J}_{\bar{\pmb {\theta }}}^{-1}, \end{aligned}$$
(3.5)

where \({{\mathrm{cov}}}(\mathbf{x}, \mathbf{y}) = \mathrm{E}\left[ (\mathbf{x}- \mathrm{E}[\mathbf{x}])(\mathbf{y}- \mathrm{E}[\mathbf{y}])^T\right] \) denotes the cross-covariance matrix of \(\mathbf{x}\) and \(\mathbf{y}\), \({\mathbf J}_{\bar{\pmb {\theta }}} = \mathrm{E}\left[ \left\{ \nabla _{\bar{\pmb {\theta }}} \ln p(\mathbf{x};\bar{\pmb {\theta }})\right\} \left\{ \nabla _{\bar{\pmb {\theta }}}\ln p(\mathbf{x};\bar{\pmb {\theta }})\right\} ^T\right] \) is the real Fisher information matrix (FIM) and \(\nabla _{\bar{\pmb {\theta }}} \ln p(\mathbf{x};\bar{\pmb {\theta }})\) is the real gradient vector of \(\ln p(\mathbf{x};\bar{\pmb {\theta }})\).

However, it is often more convenient to directly work with the complex CRB introduced in this section: The complex FIM of \(\pmb {\theta }\) is defined as

$$\begin{aligned} \fancyscript{J}_{\pmb {\theta }}= \begin{bmatrix} \fancyscript{I}_{\pmb {\theta }}&\fancyscript{P}_{\pmb {\theta }}\\ \fancyscript{P}_{\pmb {\theta }}^*&\fancyscript{I}_{\pmb {\theta }}^* \end{bmatrix}, \end{aligned}$$
(3.6)

where \(\fancyscript{I}_{\pmb {\theta }}= \mathrm{E}\left[ \left\{ \nabla _{\pmb {\theta }^*} \ln p(\mathbf{x};\pmb {\theta })\right\} \left\{ \nabla _{\pmb {\theta }^*} \ln p(\mathbf{x};\pmb {\theta })\right\} ^H\right] \) is called the information matrix and \(\fancyscript{P}_{\pmb {\theta }}= \mathrm{E}\left[ \left\{ \nabla _{\pmb {\theta }^*} \ln p(\mathbf{x};\pmb {\theta })\right\} \left\{ \nabla _{\pmb {\theta }^*} \ln p(\mathbf{x};\pmb {\theta })\right\} ^T\right] \) the pseudo-information matrix.

The inverse of the FIM of \(\pmb {\theta }\) gives, under some regularity conditions, a lower bound for the augmented covariance matrix of an unbiased estimator \(\hat{\pmb {\theta }}\) of \(\pmb {\theta }\) [42, 44]

$$\begin{aligned} \begin{bmatrix} {{\mathrm{cov}}}(\hat{\pmb {\theta }})&{{\mathrm{pcov}}}(\hat{\pmb {\theta }}) \\ {{\mathrm{pcov}}}(\hat{\pmb {\theta }})^*&{{\mathrm{cov}}}(\hat{\pmb {\theta }})^* \end{bmatrix} \ge \left( L \fancyscript{J}_{\pmb {\theta }}\right) ^{-1} \!=\! L^{-1} \begin{bmatrix} \fancyscript{I}_{\pmb {\theta }}&\fancyscript{P}_{\pmb {\theta }}\\ \fancyscript{P}_{\pmb {\theta }}^*&\fancyscript{I}_{\pmb {\theta }}^* \end{bmatrix}^{-1}\! \! . \end{aligned}$$
(3.7)

Note that the complex CRB (3.7) can be transformed to the corresponding real CRB (3.5) by using the transform \({\mathbf J}_{\bar{\pmb {\theta }}}^{-1} = \frac{1}{2} {\mathbf T} \fancyscript{J}_{\pmb {\theta }}^{-1} {\mathbf T}^{-1}\) [42], where \({\mathbf T} = \frac{1}{2} \begin{bmatrix} {\mathbf I}&{\mathbf I} \\ -j{\mathbf I}&j{\mathbf I} \end{bmatrix}\) is a \(2N \times 2N\) matrix and \({\mathbf I}\) is the \(N \times N\) identity matrix.

By using the block matrix inversion lemma [22], we get from (3.7)

$$\begin{aligned} \begin{bmatrix} {{\mathrm{cov}}}(\hat{\pmb {\theta }}) \!& {{\mathrm{pcov}}}(\hat{\pmb {\theta }}) \\ {{\mathrm{pcov}}}(\hat{\pmb {\theta }})^* \!& {{\mathrm{cov}}}(\hat{\pmb {\theta }})^* \end{bmatrix} \ge L^{-1} \! \begin{bmatrix} {\mathbf R}_{\pmb {\theta }}^{-1} \!& - {\mathbf R}_{\pmb {\theta }}^{-1} {\mathbf Q}_{\pmb {\theta }} \\ - {\mathbf Q}_{\pmb {\theta }}^H {\mathbf R}_{\pmb {\theta }}^{-1} \!& {\mathbf R}_{\pmb {\theta }}^{-*} \end{bmatrix} \end{aligned}$$
(3.8)

with \({\mathbf R}_{\pmb {\theta }}= \fancyscript{I}_{\pmb {\theta }}- \fancyscript{P}_{\pmb {\theta }}\fancyscript{I}_{\pmb {\theta }}^{-*} \fancyscript{P}_{\pmb {\theta }}^{*}\) and \({\mathbf Q}_{\pmb {\theta }} = \fancyscript{P}_{\pmb {\theta }}\fancyscript{I}_{\pmb {\theta }}^{-*}\). \({\mathbf A}^{-*}\) is a short notation for \(\left( {\mathbf A}^{-1}\right) ^* = \left( {\mathbf A}^{*}\right) ^{-1}\). Often we are interested in the bound for \({{\mathrm{cov}}}(\hat{\pmb {\theta }})\) only, which can be obtained from (3.8) as

$$\begin{aligned} {{\mathrm{cov}}}(\hat{\pmb {\theta }}) \ge L^{-1} {\mathbf R}_{\pmb {\theta }}^{-1}=L^{-1} (\fancyscript{I}_{\pmb {\theta }}- \fancyscript{P}_{\pmb {\theta }}\fancyscript{I}_{\pmb {\theta }}^{-*} \fancyscript{P}_{\pmb {\theta }}^{*})^{-1}. \end{aligned}$$
(3.9)

Note that (3.9) gives a bound solely on the covariance matrix of an unbiased estimator. If an estimator reaches that bound, i.e., \({{\mathrm{cov}}}(\hat{\pmb {\theta }})=L^{-1} {\mathbf R}_{\pmb {\theta }}^{-1}\), it does not imply that it also reaches the general CRB defined in (3.7). Only if the pseudo-information matrix \(\fancyscript{P}_{\pmb {\theta }}\) vanishes, \({{\mathrm{cov}}}(\hat{\pmb {\theta }})=L^{-1} {\mathbf R}_{\pmb {\theta }}^{-1}\) implies that \(\hat{\pmb {\theta }}\) reaches the CRB (3.7).
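The step from (3.7) to (3.8) and (3.9) is a Schur-complement block inversion, which can be checked numerically. The following sketch (an illustration with synthetic score samples, not from this chapter) builds \(\fancyscript{I}_{\pmb {\theta }}\) and \(\fancyscript{P}_{\pmb {\theta }}\) from random vectors and compares the top-left block of the inverse augmented FIM with \({\mathbf R}_{\pmb {\theta }}^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, K = 3, 50_000

# synthetic noncircular "score" samples g, so that I = E[g g^H], P = E[g g^T]
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
u = rng.normal(size=(N, K)) + 0.3j * rng.normal(size=(N, K))  # noncircular
g = A @ u

I_mat = g @ g.conj().T / K    # information matrix
P_mat = g @ g.T / K           # pseudo-information matrix

# augmented FIM as in (3.6)
J = np.block([[I_mat, P_mat], [P_mat.conj(), I_mat.conj()]])

# top-left N x N block of J^{-1} vs. the Schur-complement form in (3.9)
R = I_mat - P_mat @ np.linalg.inv(I_mat.conj()) @ P_mat.conj()
print(np.allclose(np.linalg.inv(J)[:N, :N], np.linalg.inv(R)))  # True
```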

Sometimes, we are interested in introducing constraints on some or all of the complex parameters. The constrained CRB can be derived by following the steps in either [42] or [24]. If the unconstrained Fisher information matrix is singular, only the constrained CRB from [24] can be applied.

3.2 Cramér-Rao Bound for Complex ICA

For the performance analysis of ICA algorithms, it is useful to have a lower bound for the covariance matrix of estimators of the demixing matrix \({\mathbf W}\). The Cramér-Rao bound (CRB) is a lower bound on the covariance matrix of any unbiased estimator of a parameter vector. A closed-form expression for the CRB of the demixing matrix for real instantaneous ICA has recently been derived in [41, 45]; we summarize it in Appendix 1. However, in many practical applications such as audio processing in the frequency domain or telecommunication, the signals are complex and hence we need the CRB for complex ICA.

3.2.1 Signal Model and Assumptions

Throughout this section, we assume an instantaneous complex linear square noiseless mixing model

$$\begin{aligned} \mathbf{x}={\mathbf A}\mathbf{s} \end{aligned}$$
(3.10)

where \(\mathbf{x} \in \mathbb {C}^N\) are \(N\) linear combinations of the \(N\) source signals \(\mathbf{s} \in \mathbb {C}^N\). We make the following assumptions:

  A1.

    The mixing matrix \({\mathbf A}\in \mathbb {C}^{N\times N}\) is deterministic and invertible.

  A2.

    \(\mathbf{s}=[s_1,\dots , s_N]^T \in \mathbb {C}^N\) are \(N\) independent random variables with zero mean, unit variance \(\mathrm{E}\left[ |s_i|^2\right] = 1\), and second-order noncircularity index \(\gamma _i = \mathrm{E}\left[ s_i^2\right] \in [0,1]\). Since \(\gamma _i \in \mathbb {R}\), the real and imaginary parts of \(s_i\) are uncorrelated. \(\gamma _i\ne 0\) if and only if the variances of the real and imaginary parts of \(s_i\) differ. The probability density functions (pdfs) \(p_{s_i}(s_i)\) of different source signals \(s_i\) can be identical or different. \(p_{s_i}(s_i)\) is continuously differentiable with respect to \(s_i\) and \(s_i^*\) in the sense of Wirtinger derivatives [46], which have been briefly reviewed in Sect. 3.1.1. All required expectations exist.

The task of ICA is to demix the signals \(\mathbf{x}\) by a linear demixing matrix \({\mathbf W} \in \mathbb {C}^{N \times N}\)

$$\begin{aligned} \mathbf{y} = {\mathbf W} \mathbf{x} = {\mathbf W} {\mathbf A} \mathbf{s} \end{aligned}$$
(3.11)

such that \(\mathbf{y}\) is “as close to \(\mathbf{s}\)” as possible according to some metric.

The ideal solution for \({\mathbf W}\) is \({\mathbf A}^{-1}\), neglecting scaling, phase, and permutation ambiguity [19]. If we know the pdfs \(p_{s_i}(s_i)\) perfectly, there is no scaling ambiguity. Due to the “working” assumption \(\gamma _i \in [0,1]\) (see Appendix 2), there is no phase ambiguity for noncircular sources (\(\gamma _i>0\)) [1, 37]. A phase ambiguity occurs only for circular sources (\(\gamma _i=0\)). Noncircular sources \(s_i\) which do not comply with the assumption \(\gamma _i \in [0,1]\) can be transformed according to \(s_i e^{j\alpha _i}\) such that \(\gamma _i \in [0,1]\).
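To make the model concrete, here is a minimal sketch (illustrative source and mixing-matrix choices, not from this chapter) of the noiseless model (3.10) and the ideal demixing \({\mathbf W}={\mathbf A}^{-1}\):

```python
import numpy as np

rng = np.random.default_rng(0)
N, L_samples = 3, 1000

# unit-variance circular super-Gaussian sources (complex Laplace-like)
s = (rng.laplace(size=(N, L_samples))
     + 1j * rng.laplace(size=(N, L_samples))) / 2

# random mixing matrix, real/imag parts uniform in [-1, 1] (as in Sect. 3.2.4)
A = rng.uniform(-1, 1, (N, N)) + 1j * rng.uniform(-1, 1, (N, N))
x = A @ s                     # noiseless mixing model (3.10)

W = np.linalg.inv(A)          # ideal demixing matrix
y = W @ x                     # demixed signals (3.11); equal to s up to numerics
print(np.allclose(y, s))      # True
```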

In general, a complex source signal \(s\) can be described by the following statistical properties:

  • non-Gaussianity,

  • noncircularity,

  • nonwhiteness, i.e., \(s(t_1)\) and \(s(t_2)\) are dependent for different time instants \(t_1 \ne t_2\),

  • nonstationarity, i.e., the statistical properties of \(s(t)\) change over time.

In this section, we focus on noncircular complex source signals with independent and identically distributed (iid) time samples. An extension to temporally non-iid sources, i.e., to incorporate nonstationarity and nonwhiteness of the sources, has been given in [35].

Two temporally iid sources can be separated by ICA

  • if at least one of the two sources is non-Gaussian or

  • if both sources are Gaussian but differ in noncircularity [19].

3.2.2 Derivation of the Cramér-Rao Bound

We form the parameter vector

$$\begin{aligned} \pmb {\theta }= {{\mathrm{vec}}}({\mathbf W}^T) = [\mathbf{w}_1^T, \dots , \mathbf{w}_N^T]^T \in \mathbb {C}^{N^2} \end{aligned}$$
(3.12)

where \(\mathbf{w}_i^T\) denotes the \(i\)-th row vector of \({\mathbf W}\). The \({{\mathrm{vec}}}(\cdot )\) operator stacks the columns of its argument into one long column vector. Given the pdfs \(p_{s_i}(s_i)\) of the complex source signals \(s_i\) and the complex linear transform \(\mathbf{x}={\mathbf A} \mathbf{s}\), it is easy to derive the pdf of \(\mathbf{x}\) as \(p(\mathbf{x}; \pmb {\theta }) = |\det ({\mathbf W})|^2 \prod _{i=1}^N p_{s_i}(\mathbf{w}_i^T \mathbf{x})\). Here, in the derivation of the CRB, \({\mathbf W}\) is a short notation for \({\mathbf A}^{-1}\) and not the demixing matrix which would contain permutation, scaling, and phase ambiguity. By using matrix derivatives [2, 3, 21], we obtain

$$\begin{aligned} \frac{\partial }{\partial {\mathbf W}^H} \ln p(\mathbf{x};\pmb {\theta }) = {\mathbf A}^* - \mathbf{x}^* \pmb {\varphi }^T({\mathbf W} \mathbf{x}) = {\mathbf A}^* ({\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s}))^* \end{aligned}$$
(3.13)

where \(\pmb {\varphi }(\mathbf{s}) = [\varphi _1(s_1), \dots , \varphi _N(s_N)]^T\) and \(\varphi _i(s_i)\) is defined as

$$\begin{aligned} \varphi _i(s_i) = -\frac{\partial }{\partial s_i^*} \ln p_{s_i}(s_i) = -\frac{1}{2} \frac{1}{p_{s_i}(s_i)} \left[ \frac{\partial p_{s_i}(s_i)}{\partial s_{i,R}}+j\frac{\partial p_{s_i}(s_i)}{\partial s_{i,I}}\right] \!. \end{aligned}$$
(3.14)
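For concreteness (a standard example, not from the text), consider a circular complex Gaussian source with \(p_{s_i}(s_i)=\pi ^{-1} e^{-|s_i|^2}\). Then (3.14) gives

$$\begin{aligned} \varphi _i(s_i) = -\frac{\partial }{\partial s_i^*}\left( -\ln \pi - s_i s_i^*\right) = s_i, \end{aligned}$$

so that, e.g., \(\mathrm{E}\left[ |\varphi _i(s_i)|^2\right] = \mathrm{E}\left[ |s_i|^2\right] = 1\) and \(\mathrm{E}\left[ |s_i|^2 |\varphi _i(s_i)|^2\right] = \mathrm{E}\left[ |s_i|^4\right] = 2\).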

Since \(\pmb {\theta }= {{\mathrm{vec}}}({\mathbf W}^T)\), we get

$$\begin{aligned} \nabla _{\pmb {\theta }^*} \ln p_{\mathbf{x}}(\mathbf{x};\pmb {\theta }) = {{\mathrm{vec}}}\left( \frac{\partial }{\partial {\mathbf W}^H} \ln p_{\mathbf{x}}(\mathbf{x};\pmb {\theta })\right) =\left[ ({\mathbf I} \otimes {\mathbf A}) {{\mathrm{vec}}}\left( {\mathbf I} - \mathbf{s} \pmb {\varphi }(\mathbf{s})^H\right) \right] ^*, \end{aligned}$$
(3.15)

where \({\mathbf A} \otimes {\mathbf B} = \left[ a_{ij} {\mathbf B}\right] \) denotes the Kronecker product of \({\mathbf A}\) and \({\mathbf B}\). Hence, the information and pseudo-information matrix in (3.6) become

$$\begin{aligned} \fancyscript{I}_{\pmb {\theta }}&= \left( ({\mathbf I} \otimes {\mathbf A}) \mathrm{E}\left[ {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\} {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\}^H\right] ({\mathbf I} \otimes {\mathbf A}^H)\right) ^* \nonumber \\&=\left( ({\mathbf I} \otimes {\mathbf A}) {\mathbf M}_1 ({\mathbf I} \otimes {\mathbf A}^H)\right) ^*, \end{aligned}$$
(3.16)
$$\begin{aligned} \fancyscript{P}_{\pmb {\theta }}&=\left( ({\mathbf I} \otimes {\mathbf A}) \mathrm{E}\left[ {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\} {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\}^T\right] ({\mathbf I} \otimes {\mathbf A}^T)\right) ^* \nonumber \\&= \left( ({\mathbf I} \otimes {\mathbf A}) {\mathbf M}_2 ({\mathbf I} \otimes {\mathbf A}^T)\right) ^*, \end{aligned}$$
(3.17)

where

$$\begin{aligned} {\mathbf M}_1&= \mathrm{E}\left[ {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\} {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\}^H\right] \nonumber \\ \text {and} \quad {\mathbf M}_2&= \mathrm{E}\left[ {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\} {{\mathrm{vec}}}\{{\mathbf I} - \mathbf{s} \pmb {\varphi }^H(\mathbf{s})\}^T\right] . \end{aligned}$$
(3.18)

3.2.2.1 Induced CRB for the Gain Matrix \({\mathbf G}={\mathbf W}{\mathbf A}\)

Since the so-called gain matrix \({\mathbf G}={\mathbf W} {\mathbf A}\) is a linear function of \({\mathbf W}\), the CRB for \({\mathbf W}\) “induces” a bound for \({\mathbf G}\). For simplicity, we first derive this induced CRB (iCRB) for \({\mathbf G}={\mathbf W} {\mathbf A}={\mathbf A}^{-1} {\mathbf A}={\mathbf I}\), which is independent of the mixing matrix \({\mathbf A}\). Later we will obtain the CRB for \({\mathbf W}\) from the iCRB for \({\mathbf G}\). Let \(\hat{{\mathbf G}}=\hat{{\mathbf W}} {\mathbf A}\) denote the estimated gain matrix. The diagonal elements \(\hat{G}_{ii}\) should be close to 1; they reflect how well we can estimate the power of each source signal. The off-diagonal elements \(\hat{G}_{ij}\) should be close to 0 and reflect how well we can suppress interfering components. We define the corresponding stacked parameter vector

$$\begin{aligned} \pmb {\vartheta }={{\mathrm{vec}}}({\mathbf G}^T)={{\mathrm{vec}}}({\mathbf A}^T {\mathbf W}^T)=({\mathbf I} \otimes {\mathbf A}^T) {{\mathrm{vec}}}({\mathbf W}^T)=({\mathbf I} \otimes {\mathbf A}^T) \pmb {\theta }. \end{aligned}$$
(3.19)
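The vectorization step in (3.19) uses the identity \({{\mathrm{vec}}}({\mathbf X}{\mathbf Y}{\mathbf Z})=({\mathbf Z}^T \otimes {\mathbf X}) {{\mathrm{vec}}}({\mathbf Y})\); a quick numerical check (an illustration with random matrices):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3
A = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
W = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))

vec = lambda M: M.reshape(-1, order="F")     # stack columns
lhs = vec((W @ A).T)                         # vec(G^T) with G = W A
rhs = np.kron(np.eye(N), A.T) @ vec(W.T)     # (I kron A^T) vec(W^T), cf. (3.19)
print(np.allclose(lhs, rhs))                 # True
```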

The covariance matrix of \(\hat{\pmb {\vartheta }} = {{\mathrm{vec}}}((\hat{{\mathbf W}} {\mathbf A})^T)\) is given by \({{\mathrm{cov}}}(\hat{\pmb {\vartheta }}) = ({\mathbf I} \otimes {\mathbf A}^T) {{\mathrm{cov}}}(\hat{\pmb {\theta }}) ({\mathbf I} \otimes {\mathbf A}^*)\) where \(\hat{\pmb {\theta }} = {{\mathrm{vec}}}(\hat{{\mathbf W}}^T)\). By combining (3.9) with (3.16) and (3.17), we get

$$\begin{aligned} {{\mathrm{cov}}}(\hat{\pmb {\vartheta }}) \ge L^{-1} ({\mathbf I} \otimes {\mathbf A}^T) (\fancyscript{I}_{\pmb {\theta }}- \fancyscript{P}_{\pmb {\theta }}\fancyscript{I}_{\pmb {\theta }}^{-*} \fancyscript{P}_{\pmb {\theta }}^{*})^{-1} ({\mathbf I} \otimes {\mathbf A}^*)= L^{-1} {\mathbf R}_{\pmb {\vartheta }}^{-1} \end{aligned}$$
(3.20)

with

$$\begin{aligned} {\mathbf R}_{\pmb {\vartheta }}= ({\mathbf M}_1 - {\mathbf M}_2 {\mathbf M}_1^{-*} {\mathbf M}_2^*)^*. \end{aligned}$$
(3.21)

As shown in [35], \({\mathbf R}_{\pmb {\vartheta }}\) can be calculated as

$$\begin{aligned} {\mathbf R}_{\pmb {\vartheta }} = \sum _{i=1}^N d_i {\mathbf L}_{ii} \otimes {\mathbf L}_{ii} + \sum _{i=1}^N \sum _{j=1,j\ne i}^Na_{ij} {\mathbf L}_{ii} \otimes {\mathbf L}_{jj} +\sum _{i=1}^N \sum _{j=1,j\ne i}^Nb_{ij} {\mathbf L}_{ij} \otimes {\mathbf L}_{ji} \end{aligned}$$
(3.22)

where \(d_i = \frac{(\eta _i - 1)^2-|\beta _i-1|^2}{\eta _i-1} \; \in \mathbb {R}\), \(a_{ij} = \kappa _i-\frac{|\gamma _j \xi _i|^2}{\kappa _i} - \frac{1}{\kappa _j} \; \in \mathbb {R}\) and \(b_{ij} = - \left( \frac{\gamma _j^* \xi _i^*}{\kappa _i} + \frac{\gamma _i \xi _j}{\kappa _j}\right) =b_{ji}^* \; \in \mathbb {C}\). \({\mathbf L}_{ij}\) in (3.22) denotes an \(N\times N\) matrix with a 1 at the \((i,j)\) position and 0’s elsewhere.

The parameters \(\eta _i\), \(\kappa _i\), \(\beta _i\), \(\xi _i\) and \(\gamma _j\) are defined as

$$\begin{aligned} \eta _i&= \mathrm{E}\left[ |s_i|^2 |\varphi _i(s_i)|^2\right] > 1, \end{aligned}$$
(3.23)
$$\begin{aligned} \kappa _i&=\mathrm{E}\left[ |\varphi _i(s_i)|^2\right] \ge 1, \end{aligned}$$
(3.24)
$$\begin{aligned} \beta _i&= \mathrm{E}\left[ s_i^2 (\varphi ^*_i(s_i))^2\right] \in \mathbb {C}, \end{aligned}$$
(3.25)
$$\begin{aligned} \xi _i&=\mathrm{E}\left[ (\varphi ^*_i(s_i))^2\right] \in \mathbb {C}, \end{aligned}$$
(3.26)
$$\begin{aligned} \gamma _j&=\mathrm{E}\left[ s_j^2\right] \in \mathbb {R}. \end{aligned}$$
(3.27)

Properties and other equivalent forms of these parameters can be found in the appendix of [35].

\({\mathbf R}_{\pmb {\vartheta }}\) has a special sparse structure which is illustrated below for \(N=3\):
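For \(N=3\), writing (3.22) out entrywise gives (blocks are of size \(3\times 3\); all remaining entries are 0):

$$\begin{aligned} {\mathbf R}_{\pmb {\vartheta }} = \begin{bmatrix} d_1&0&0&0&0&0&0&0&0 \\ 0&a_{12}&0&b_{12}&0&0&0&0&0 \\ 0&0&a_{13}&0&0&0&b_{13}&0&0 \\ 0&b_{21}&0&a_{21}&0&0&0&0&0 \\ 0&0&0&0&d_2&0&0&0&0 \\ 0&0&0&0&0&a_{23}&0&b_{23}&0 \\ 0&0&b_{31}&0&0&0&a_{31}&0&0 \\ 0&0&0&0&0&b_{32}&0&a_{32}&0 \\ 0&0&0&0&0&0&0&0&d_3 \end{bmatrix}. \end{aligned}$$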

The \(i\)-th diagonal element of the \(i\)-th diagonal block is \({\mathbf R}_{\pmb {\vartheta }}[i,i]_{(i,i)} = d_i\). The \(j\)-th diagonal element of the \(i\)-th diagonal block is \({\mathbf R}_{\pmb {\vartheta }}[i,i]_{(j,j)} = a_{ij}\). The \((j,i)\) element of the \([i,j]\) block is \({\mathbf R}_{\pmb {\vartheta }}[i,j]_{(j,i)} = b_{ij}\). All remaining elements are 0. By permuting rows and columns of \({\mathbf R}_{\pmb {\vartheta }}\), it can be brought into a block-diagonal form. Then it consists only of \(1 \times 1\) blocks with elements \(d_i\) and \(2 \times 2\) blocks \(\begin{bmatrix} a_{ij}&b_{ij} \\ b_{ji}&a_{ji} \end{bmatrix}\). Hence, \({\mathbf R}_{\pmb {\vartheta }}\) can be easily inverted resulting in a block-diagonal matrix where all \(1\times 1\) and \(2\times 2\) blocks are individually inverted as long as \(d_i \ne 0\) and \(a_{ij}a_{ji}-b_{ij}b_{ji} \ne 0\). The result is

$$\begin{aligned} {\mathbf R}_{\pmb {\vartheta }}^{-1}&=\! \sum _{i=1}^N \frac{1}{d_i} {\mathbf L}_{ii} \otimes {\mathbf L}_{ii} + \! \sum _{i=1}^N \sum _{\mathop {j\ne i}\limits ^{j=1}}^N\frac{a_{ji}}{a_{ij} a_{ji} - b_{ij} b_{ji}} {\mathbf L}_{ii} \otimes {\mathbf L}_{jj} \nonumber \\&\quad \, + \! \sum _{i=1}^N \sum _{\mathop {j\ne i}\limits ^{j=1}}^N\frac{-b_{ij}}{a_{ij} a_{ji} - b_{ij} b_{ji}} {\mathbf L}_{ij} \otimes {\mathbf L}_{ji} \nonumber \\&=\sum _{i=1}^N f_i {\mathbf L}_{ii} \otimes {\mathbf L}_{ii} + \sum _{i=1}^N \sum _{\mathop {j\ne i}\limits ^{j=1}}^Ng_{ij} {\mathbf L}_{ii} \otimes {\mathbf L}_{jj} + \sum _{i=1}^N \sum _{\mathop {j\ne i}\limits ^{j=1}}^Nh_{ij} {\mathbf L}_{ij} \otimes {\mathbf L}_{ji} \end{aligned}$$
(3.28)

with

$$\begin{aligned} f_i&=\frac{1}{d_i}=\frac{\eta _i-1}{(\eta _i - 1)^2-|\beta _i-1|^2},\end{aligned}$$
(3.29)
$$\begin{aligned} g_{ij}&=\frac{a_{ji}}{a_{ij} a_{ji} - b_{ij} b_{ji}}=\frac{\kappa _j (\kappa _i \kappa _j -1) -|\gamma _i \xi _j|^2 \kappa _i}{(\kappa _i \kappa _j-1)^2 + |\gamma _i\gamma _j \xi _i \xi _j -1|^2 - 1 - \kappa _i^2|\gamma _i \xi _j|^2 - \kappa _j^2 |\gamma _j \xi _i|^2}, \end{aligned}$$
(3.30)
$$\begin{aligned} h_{ij}&=\frac{-b_{ij}}{a_{ij} a_{ji} - b_{ij} b_{ji}}=\frac{\gamma _j^* \xi _i^* \kappa _j + \gamma _i \xi _j \kappa _i}{(\kappa _i \kappa _j-1)^2 + |\gamma _i\gamma _j \xi _i \xi _j -1|^2 - 1 - \kappa _i^2|\gamma _i \xi _j|^2 - \kappa _j^2 |\gamma _j \xi _i|^2}. \end{aligned}$$
(3.31)

This means that \({{\mathrm{var}}}(\hat{G}_{ii})\) and \({{\mathrm{var}}}(\hat{G}_{ij})\) of \(\hat{{\mathbf G}} = \hat{{\mathbf W}} {\mathbf A}\) are lower bounded by the \((i,i)\)-th and \((j,j)\)-th element of the \((i,i)\)-th block of \(L^{-1} {\mathbf R}_{\pmb {\vartheta }}^{-1}\):

$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ii})&\ge \frac{1}{L} f_i= \frac{1}{L} \frac{\eta _i-1}{(\eta _i - 1)^2-|\beta _i-1|^2}, \end{aligned}$$
(3.32)
$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ij})&\ge \frac{1}{L} g_{ij}= \frac{1}{L} \frac{\kappa _j (\kappa _i \kappa _j -1) -|\gamma _i \xi _j|^2 \kappa _i}{(\kappa _i \kappa _j-1)^2 + |\gamma _i\gamma _j \xi _i \xi _j -1|^2 - 1 - \kappa _i^2|\gamma _i \xi _j|^2 - \kappa _j^2 |\gamma _j \xi _i|^2}. \end{aligned}$$
(3.33)

Note that \(L^{-1} {\mathbf R}_{\pmb {\vartheta }}^{-1}\) is the iCRB for \(\pmb {\vartheta }\) as in (3.9). In order to get the complete iCRB for \(\begin{bmatrix} \pmb {\vartheta } \\ \pmb {\vartheta }^* \end{bmatrix}\) as in (3.8), we would also need \({\mathbf P}_{\pmb {\vartheta }}=-{\mathbf R}_{\pmb {\vartheta }}^{-1} {\mathbf Q}_{\pmb {\vartheta }} = - {\mathbf R}_{\pmb {\vartheta }}^{-1} {\mathbf M}_2^* {\mathbf M}_1^{-1}\).

In a similar way, it can be shown that

$$\begin{aligned} {\mathbf P}_{\pmb {\vartheta }} =\sum _{i=1}^N \tilde{f}_i {\mathbf L}_{ii} \otimes {\mathbf L}_{ii} + \sum _{i=1}^N \sum _{\mathop {j\ne i}\limits ^{j=1}}^N\left( \tilde{g}_{ij} {\mathbf L}_{ii} \otimes {\mathbf L}_{jj} + \tilde{h}_{ij} {\mathbf L}_{ij} \otimes {\mathbf L}_{ji}\right) \end{aligned}$$
(3.34)

has the same form as \({\mathbf R}_{\pmb {\vartheta }}^{-1}\) in (3.28), with

$$\begin{aligned} \tilde{f}_i&=-\frac{f_i (\beta _i-1)^*}{\eta _i-1}=\frac{-(\beta _i-1)^*}{(\eta _i - 1)^2-|\beta _i-1|^2},\end{aligned}$$
(3.35)
$$\begin{aligned} \tilde{g}_{ij}&=-\frac{g_{ij} \gamma _j^* \xi _i^* + h_{ij}}{\kappa _i}=\frac{-(\kappa _j^2-|\gamma _i \xi _j|^2)\gamma _j^* \xi _i^* \gamma _i \xi _j}{(\kappa _i \kappa _j-1)^2 + |\gamma _i\gamma _j \xi _i \xi _j -1|^2 - 1 - \kappa _i^2|\gamma _i \xi _j|^2 - \kappa _j^2 |\gamma _j \xi _i|^2}, \end{aligned}$$
(3.36)
$$\begin{aligned} \tilde{h}_{ij}&=-\frac{g_{ij} + \gamma _i^* \xi _j^* h_{ij}}{\kappa _j}=\frac{1-\kappa _i \kappa _j - (\gamma _j \xi _i \gamma _i \xi _j)^*}{(\kappa _i \kappa _j-1)^2 + |\gamma _i\gamma _j \xi _i \xi _j -1|^2 - 1 - \kappa _i^2|\gamma _i \xi _j|^2 - \kappa _j^2 |\gamma _j \xi _i|^2}. \end{aligned}$$
(3.37)

Note that, according to (3.28) and (3.34), the iCRB for \({\mathbf G}={\mathbf W}{\mathbf A}\) has a nice decoupling property: the iCRB for \(G_{ii}\) depends only on the distribution of source \(i\), and the iCRB for \(G_{ij}\) depends only on the distributions of sources \(i\) and \(j\), not on any other sources. Note that (3.32) and (3.33) cannot be used as a bound for real ICA since the FIM would be singular.
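The closed-form inverse (3.28) can be verified numerically against the sparse structure (3.22). The sketch below (illustrative parameter values, chosen only to respect \(d_i, a_{ij} \in \mathbb {R}\) and \(b_{ji}=b_{ij}^*\)) builds \({\mathbf R}_{\pmb {\vartheta }}\) and compares (3.28) with direct matrix inversion:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3

# illustrative values respecting the stated symmetries: d_i, a_ij real,
# b_ij complex with b_ji = b_ij^* (kept small so that every 2x2 block
# [a_ij b_ij; b_ji a_ji] is invertible)
d = rng.uniform(1.0, 2.0, N)
a = rng.uniform(1.0, 2.0, (N, N))
b = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
b = 0.2 * (b + b.conj().T)

def L_mat(i, j):
    """N x N matrix with a single 1 at position (i, j), cf. (3.22)."""
    M = np.zeros((N, N))
    M[i, j] = 1.0
    return M

R = np.zeros((N * N, N * N), dtype=complex)    # R_vartheta from (3.22)
Rinv = np.zeros_like(R)                        # its inverse from (3.28)
for i in range(N):
    R += d[i] * np.kron(L_mat(i, i), L_mat(i, i))
    Rinv += (1 / d[i]) * np.kron(L_mat(i, i), L_mat(i, i))
    for j in range(N):
        if j == i:
            continue
        R += a[i, j] * np.kron(L_mat(i, i), L_mat(j, j))
        R += b[i, j] * np.kron(L_mat(i, j), L_mat(j, i))
        det = a[i, j] * a[j, i] - b[i, j] * b[j, i]
        Rinv += (a[j, i] / det) * np.kron(L_mat(i, i), L_mat(j, j))
        Rinv += (-b[i, j] / det) * np.kron(L_mat(i, j), L_mat(j, i))

print(np.allclose(Rinv, np.linalg.inv(R)))     # True
```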

3.2.2.2 CRB for the Demixing Matrix \({\mathbf W}\)

Starting with the iCRB \(L^{-1} {\mathbf R}_{\pmb {\vartheta }}^{-1}\) for the stacked gain matrix \(\pmb {\vartheta } = {{\mathrm{vec}}}(({\mathbf W} {\mathbf A})^T)=({\mathbf I} \otimes {\mathbf A}^T) \cdot {{\mathrm{vec}}}({\mathbf W}^T)\), it is now straightforward to derive the CRB for the stacked demixing matrix \(\pmb {\theta }= {{\mathrm{vec}}}({\mathbf W}^T) = ({\mathbf I} \otimes {\mathbf A}^T)^{-1} \pmb {\vartheta } = ({\mathbf I} \otimes {\mathbf W}^T) \pmb {\vartheta }\). Since \(\pmb {\theta }\) is a linear function of \(\pmb {\vartheta }\),

$$\begin{aligned} {{\mathrm{cov}}}(\hat{\pmb {\theta }}) \ge L^{-1} {\mathbf R}_{\pmb {\theta }}^{-1} = L^{-1} ({\mathbf I} \otimes {\mathbf W}^T) {\mathbf R}_{\pmb {\vartheta }}^{-1} ({\mathbf I} \otimes {\mathbf W}^*) \end{aligned}$$
(3.38)

holds for any unbiased estimator \(\hat{\pmb {\theta }}\) for \(\pmb {\theta }\). See [35] for a more detailed expression of the CRB for \({\mathbf W}\).

3.2.3 Special Cases of the iCRB

In the previous section, we derived the iCRB for the gain matrix \({\mathbf G} = {\mathbf W} {\mathbf A}\) for the general complex case. Below, we study some special cases of the iCRB.

3.2.3.1 Case A: All Sources Are Circular Complex

If all sources are circular complex, \(\gamma _i=0\) and \(\beta _i=\eta _i\) [35]. Due to the phase ambiguity in circular complex ICA, the Fisher information for the diagonal elements \(G_{ii}\) is 0 and hence their iCRB does not exist. However, we can constrain \(G_{ii}\) to be real and derive the constrained CRB [24] for \(G_{ii}\): As noted at the end of Sect. 3.2.2.1, \(G_{ii}\) is decoupled from \(G_{ij}\) and \(G_{jj}\) and hence it is sufficient to consider the constrained CRB for \(G_{ii}\) alone.

The constrained CRB for \(G_{ii}\) is given by [35]

$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ii}) \ge \frac{1}{4 L (\eta _i-1)}. \end{aligned}$$
(3.39)

The bound in (3.39) is valid for a phase-constrained \(G_{ii}\) such that \(G_{ii} \in \mathbb {R}\). Equation (3.39) looks similar to the real case (3.90) except for a factor of 4 since \(\eta _i\) is defined using Wirtinger derivatives instead of real derivatives.

For \({{\mathrm{var}}}(\hat{G}_{ij})\) we get from (3.33)

$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ij}) \ge \frac{1}{L} \frac{\kappa _j}{\kappa _i \kappa _j-1}, \end{aligned}$$
(3.40)

which again looks the same as in the real case (3.91). However, in the complex case, \(\kappa _i\) is defined using the Wirtinger derivative instead of the real derivative. Furthermore, in the complex case \(\kappa \) measures both non-Gaussianity and noncircularity, whereas in the real case \(\kappa \) measures only non-Gaussianity.

If sources \(i\) and \(j\) are both circular Gaussian, then \(\kappa _i=\kappa _j=1\) and \({{\mathrm{var}}}(\hat{G}_{ij}) \rightarrow \infty \). This corresponds to the known fact that circular complex Gaussian sources cannot be separated by ICA.

3.2.3.2 Case B: All Sources Are Noncircular Complex Gaussian

If all sources are noncircular Gaussian with different \(\gamma _i \in \mathbb {R}\), it can be shown using the expressions for \(\kappa ,\xi ,\eta \) and \(\beta \) in (3.86)–(3.89) with \(c=1\) that

$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ii})&\ge \frac{1}{L} \frac{1}{4 \gamma _i^2}, \end{aligned}$$
(3.41)
$$\begin{aligned} {{\mathrm{var}}}(\hat{G}_{ij})&\ge \frac{1}{L} \frac{\gamma _i^2 + \gamma _j^2 - 2 \gamma _i^2 \gamma _j^2}{(\gamma _j^2-\gamma _i^2)^2} (1-\gamma _i^2)\nonumber \\&\quad =\frac{1-\gamma _i^2}{2 L} \left[ \frac{1-\gamma _i\gamma _j}{(\gamma _i-\gamma _j)^2} + \frac{1+\gamma _i\gamma _j}{(\gamma _i+\gamma _j)^2}\right] . \end{aligned}$$
(3.42)

Note that (3.42) is exactly the same result as obtained in [48] for the performance analysis of the SUT, i.e., our result shows that for noncircular Gaussian sources the SUT is indeed asymptotically optimal.

If all sources are noncircular Gaussian with identical \(\gamma _i\), it can be shown that the iCRB for \(G_{ij}\) does not exist because \(\gamma _j^2-\gamma _i^2 \rightarrow 0\). This confirms the result obtained in [19, 29], which showed that ICA fails for two or more noncircular Gaussian signals with the same \(\gamma _i\).
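A quick numerical evaluation of the bounds (3.41) and (3.42) (a sketch; the noncircularity indices and sample size below are arbitrary illustrative choices):

```python
import numpy as np

def icrb_noncircular_gaussian(gammas, L):
    """iCRB (3.41)-(3.42) for noncircular Gaussian sources with gamma_i > 0."""
    g = np.asarray(gammas, dtype=float)
    N = len(g)
    var_Gii = 1.0 / (4 * L * g ** 2)                       # (3.41)
    var_Gij = np.zeros((N, N))
    for i in range(N):
        for j in range(N):
            if i != j:                                     # (3.42)
                num = (g[i]**2 + g[j]**2 - 2 * g[i]**2 * g[j]**2) * (1 - g[i]**2)
                var_Gij[i, j] = num / (L * (g[j]**2 - g[i]**2) ** 2)
    return var_Gii, var_Gij

# three sources with distinct noncircularity indices, L = 1000 samples
print(icrb_noncircular_gaussian([0.2, 0.5, 0.8], L=1000))
```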

3.2.4 Results for Generalized Gaussian Distribution

In order to verify the CRB derived in the previous sections, we now study complex ICA with noncircular complex generalized Gaussian distributed (GGD) sources. We choose this parametric family of pdfs since it enables an analytical calculation of the CRB. The pdf of such a noncircular complex source \(s\) with zero mean, variance \(\mathrm{E}[|s|^2]=1\), and noncircularity index \(\gamma \in [0,1]\) can be written as [40]

$$\begin{aligned} p(s,s^*) = \frac{c \alpha \exp \left( -\left[ \frac{\alpha /2}{\gamma ^2-1} \left( \gamma s^2 + \gamma {s^*}^2 - 2 s s^*\right) \right] ^c\right) }{\pi \varGamma (1/c)(1-\gamma ^2)^{1/2}}, \end{aligned}$$

where \(\alpha = \varGamma (2/c)/\varGamma (1/c)\) and \(\varGamma (\cdot )\) is the Gamma function. The shape parameter \(c>0\) varies the form of the pdf from super-Gaussian (\(c<1\)) to sub-Gaussian (\(c>1\)). For \(c=1\), the pdf is Gaussian. \(0\le \gamma \le 1\) controls the noncircularity of the pdf. The four parameters \(\kappa \), \(\beta \), \(\eta \), \(\xi \) required to calculate the CRB are derived in Appendix 1. For the simulation study, we consider \(N=3\) sources with random mixing matrices \({\mathbf A}\). The real and imaginary parts of all elements of \({\mathbf A}\) are independent and uniformly distributed in \([-1,1]\). We conducted 100 experiments with different random matrices \({\mathbf A}\) and consider the following ICA estimators: complex ML-ICA [29], complex ICA by entropy bound minimization (ICA-EBM) [30], noncircular complex FastICA (ncFastICA) [39], adaptable complex maximization of non-Gaussianity (ACMN) [38], and the strong uncorrelating transform (SUT) [18, 44]. The properties and assumptions of the five different ICA algorithms are summarized in Table 3.1.

Table 3.1 Considered separation algorithms and their properties

We want to compare the separation performance of ICA with respect to the iCRB and hence we define the performance metric as in [45]: After running an ICA algorithm, we correct the permutation ambiguity of the estimated demixing matrix and calculate the signal-to-interference ratio (SIR) averaged over all \(N\) sources:

$$\begin{aligned} \text {SIR} = \frac{1}{N} \sum _{i=1}^N \frac{\mathrm{E}\left[ |G_{ii}|^2\right] }{\sum _{j \ne i} \mathrm{E}\left[ |G_{ij}|^2\right] } = \frac{1}{N} \sum _{i=1}^N \frac{1\!+\!{{\mathrm{var}}}(G_{ii})}{\sum _{j \ne i} {{\mathrm{var}}}(G_{ij})}. \end{aligned}$$
(3.43)

In (3.43), the averaging over simulation trials takes place before taking the ratio.
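A direct implementation of (3.43) (a minimal sketch; it assumes the permutation ambiguity has already been corrected as described above):

```python
import numpy as np

def sir_db(G_trials):
    """SIR as in (3.43) for gain matrices of shape (trials, N, N):
    expectations are approximated by averaging over trials before
    taking the ratio."""
    P = np.mean(np.abs(G_trials) ** 2, axis=0)   # E[|G_ij|^2]
    N = P.shape[0]
    sir = np.mean([P[i, i] / (P[i].sum() - P[i, i]) for i in range(N)])
    return 10 * np.log10(sir)
```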

In practice, the accuracy of the estimated demixing matrix depends not only on the cost function but also on the optimization algorithm used to implement the estimator: in some rare cases, complex ML-ICA based on natural-gradient ascent converges to a local maximum of the likelihood and yields a lower SIR value than ICA-EBM. To overcome this problem, we initialized ML-ICA with the solution obtained by ICA-EBM, which is close to the optimal solution.

3.2.4.1 Case A: All Sources Are Identically Distributed

First, we study the performance when all sources are identically distributed with the same shape parameter \(c\) and the same noncircularity index \(\gamma \). Figure 3.1 shows the results: The SIR given by the iCRB increases with increasing non-Gaussianity (\(c \rightarrow \infty \) or \(c \rightarrow 0\)). For \(c \approx 1\), the SIR is low since (nearly) Gaussian sources with the same noncircularity index \(\gamma \) cannot be separated by ICA. For \(c \ne 1\), the SIR also increases with increasing noncircularity \(\gamma \), but much more slowly since all sources have the same noncircularity \(\gamma \). Clearly, all ICA algorithms work quite well except for \(c \approx 1\) (Gaussian). ML-ICA (Fig. 3.1b) achieves the best performance, followed by ICA-EBM (Fig. 3.1c) and ACMN (Fig. 3.1f). ncFastICA with the kurtosis cost function achieves better performance for sub-Gaussian sources (\(c>1\)) than for super-Gaussian sources (\(c<1\)), whereas ncFastICA with the square root (sqrt) nonlinearity works better for super-Gaussian sources than for sub-Gaussian sources. However, as also mentioned in [30], the square root nonlinearity overall leads to the best performance and hence we only consider ncFastICA with this nonlinearity in the following. As expected, SUT fails since it only uses noncircularity for separation, and hence we do not show its results. The reason why ML-ICA outperforms ICA-EBM is that ML-ICA uses nonlinearities matched to the source distributions while ICA-EBM uses a linear combination of prespecified nonlinear functions. Note that ICA-EBM allows one to select nonlinearities for approximating the source entropy. Hence, if prior knowledge about the source distributions is available, it can be incorporated into ICA-EBM, thus improving its performance.

Fig. 3.1
figure 1

Comparison of signal-to-interference ratio [dB] of different ICA estimators with CRB, sample size \(L=1000\), all sources follow a generalized Gaussian distribution with \(c_i=c\) and \(\gamma _i=\gamma \). iCRB (a), ML-ICA (b), ICA-EBM (c), ncFastICA kurtosis (d), ncFastICA sqrt (e), ACMN (f)

3.2.4.2 Case B: All Sources Have Different Shape Parameters and Different Noncircularities

Now we study the performance when the sources follow a GGD with different shape parameters \(c_1=1, c_2=c, c_3=1/c\) and different noncircularity indices \(\gamma _i = (i-1) \Delta \gamma \). Figure 3.2 shows that the SIR given by the iCRB increases both with increasing non-Gaussianity of sources 2 and 3 (i.e., \(c<1\)) and with increasing difference in noncircularity indices \(\Delta \gamma \). ML-ICA again achieves the best performance, followed by ICA-EBM. The reason is again that ML-ICA uses for each source \(s_i\) a nonlinearity \(\varphi _i(s_i)\) matched to its pdf \(p_{s_i}(s_i)\), whereas the nonlinearities used in ICA-EBM are fixed a priori. Although ncFastICA and ACMN exploit the noncircularity of the sources to improve the convergence, their cost functions only use non-Gaussianity and not noncircularity. This is clearly reflected in Fig. 3.2, since the performance of ncFastICA and ACMN is almost constant over \(\Delta \gamma \). SUT uses only noncircularity for separation, and hence its performance is almost constant over \(c\). SUT can work quite well as long as \(\Delta \gamma \) is large enough. Only ML-ICA and ICA-EBM use both non-Gaussianity and noncircularity, and hence the contour lines in Fig. 3.2b, c resemble those of the iCRB in Fig. 3.2a.

Fig. 3.2
figure 2

Comparison of signal-to-interference ratio [dB] of different ICA estimators with iCRB, sample size \(L=1000\), all sources follow a generalized Gaussian distribution with \(c_1=1, c_2=c, c_3=1/c\), \(\gamma _i=(i-1) \Delta \gamma \). iCRB (a), ML-ICA (b), ICA-EBM (c), ncFastICA sqrt (d), ACMN (e), SUT (f)

3.2.4.3 Performance as a Function of the Sample Size

Here, we study the performance as a function of the sample size \(L\). Clearly, Fig. 3.3 shows that for circular non-Gaussian sources and limited sample size \(L\), ML-ICA achieves the best performance, followed by ACMN and then ICA-EBM. The reason why ACMN outperforms ICA-EBM for circular sources could be the fact that ACMN has fewer parameters to adapt since it uses only non-Gaussianity. As expected, SUT fails since it only uses noncircularity for separation. For circular super-Gaussian sources (Fig. 3.3a), ACMN and ncFastICA perform almost the same. For sub-Gaussian sources (Fig. 3.3b), the sqrt nonlinearity is sub-optimal, as shown by the larger error of ncFastICA. Figure 3.4a shows results for noncircular Gaussian sources with distinct noncircularity indices: SUT and ML-ICA perform equally well since for noncircular Gaussian sources they are equivalent and asymptotically optimal. ICA-EBM approaches the performance of SUT and ML-ICA for large enough sample size. ncFastICA and ACMN, which use only non-Gaussianity for separation, fail. Figure 3.4b, c shows results for noncircular super-Gaussian (\(c=0.5\)) and sub-Gaussian (\(c=6\)) sources with distinct noncircularity indices: With limited sample size, ML-ICA again achieves the best performance, followed by ICA-EBM. For a large sample size (\(L \ge 1000\)) and a wide range of distributions including strongly super-Gaussian but excluding strongly sub-Gaussian sources, ICA-EBM comes close to the performance of ML-ICA, see Figs. 3.1, 3.3, 3.4. The reason for this behavior is that ML-ICA uses nonlinearities matched to the source distributions while ICA-EBM uses a linear combination of prespecified nonlinear functions. These could be extended to improve the performance for strongly sub-Gaussian sources. The performance of ncFastICA and ACMN is quite far from that given by the iCRB since these two algorithms do not use noncircularity for separation. For signals with distinct noncircularity indices, SUT can achieve decent separation, but for strongly non-Gaussian signals its performance is quite far from that given by the iCRB (see also Fig. 3.2).

Fig. 3.3
figure 3

Performance as a function of sample size \(L\), circular GGD sources. \(c=0.5\) (a), \(c=6\) (b)

Fig. 3.4
figure 4

Performance as a function of sample size \(L\), noncircular GGD sources with \(c_i=c\) and \(\gamma _i=(i-1) \Delta \gamma \). \(c=1, \Delta \gamma =0.45\) (a), \(c=0.5, \Delta \gamma =0.45\) (b), \(c=6, \Delta \gamma =0.45\) (c)

3.2.5 Conclusion

In this section, we have derived the CRB for the noncircular complex ICA problem with temporally iid sources. The induced CRB (iCRB) for the gain matrix, i.e., the demixing-mixing-matrix product, depends on the distribution of the sources through five parameters, which can be easily calculated. The derived bound is valid for the general noncircular complex case and contains the circular complex and the noncircular complex Gaussian case as two special cases. The iCRB reflects the phase ambiguity in circular complex ICA. In that case, we derived a constrained CRB for a phase-constrained demixing matrix. Simulation results using five ICA algorithms have shown that for sources following a noncircular complex generalized Gaussian distribution, some algorithms can achieve a signal-to-interference ratio (SIR) close to that given by the CRB. Among the studied algorithms, complex ML-ICA and ICA-EBM perform best. The complex ML-ICA algorithm, which uses for each source a nonlinearity matched to its pdf, outperforms ICA-EBM especially for small sample sizes. However, for ML-ICA the pdfs of the sources must be known, whereas no such knowledge is required for ICA-EBM. Hence, for practical applications where the pdfs of the sources might be unknown, ICA-EBM is an adequate algorithm whose performance comes quite close to the iCRB for large enough sample size \(L\).

3.3 Solution of Linear Complex ICA in the Presence of Noise

In this section, we study the bias of the demixing matrix from the inverse mixing matrix in linear noisy ICA. We first derive the ICA solution for the general complex determined case. We then show how the circular complex case and the real case can be derived as special cases. Next, we verify the results using simulations. Finally, we extend our derivations to the overdetermined case with circular complex noise.

3.3.1 Signal Model and Assumptions

We assume the linear noisy mixing model

$$\begin{aligned} \mathbf{x}={\mathbf A}\mathbf{s} + \mathbf{v} \end{aligned}$$
(3.44)

where \(\mathbf{x} \in \mathbb {C}^N\) are \(N\) linear combinations of \(N\) original signals \(\mathbf{s} \in \mathbb {C}^N\) with additive noise \(\mathbf{v} \in \mathbb {C}^N\). Here, all signals are modeled as temporally iid. In addition to the assumptions A1 and A2 (invertibility of mixing matrix \({\mathbf A}\) and assumptions about the source signals \(\mathbf{s}\)) defined in Sect. 3.2.1, we make the following two assumptions regarding the noise \(\mathbf{v}\):

  1.

    \(\mathbf{v}=[v_1,\dots , v_N]^T \in \mathbb {C}^N\) are \(N\) random variables with zero mean and the covariance matrix \(E[\mathbf{v} \mathbf{v}^H] = \sigma ^2 {\mathbf R}_{\mathbf{v}}\). \(\sigma ^2=\frac{1}{N} \text {tr}\left[ \mathrm{E}(\mathbf{v}\mathbf{v}^H)\right] \) is the average variance of \(\mathbf{v}\) and \(\text {tr}({\mathbf R}_{\mathbf{v}})=N\). \(\bar{{\mathbf R}}_{\mathbf{v}} = \frac{1}{\sigma ^2} \mathrm{E}[\mathbf{v} \mathbf{v}^T]\) is the normalized pseudo-covariance matrix. \( \bar{{\mathbf R}}_{\mathbf{v}}={\mathbf 0}\) if \(\mathbf{v}\) is circular complex. The pdf of \(\mathbf{v}\) is arbitrary but assumed to be symmetric, i.e., \(p_{\mathbf{v}}(\mathbf{v})=p_{\mathbf{v}}(-\mathbf{v})\). This implies \(\mathrm{E}(\prod _{i=1}^N v_i^{k_i} (v_i^*)^{\tilde{k}_i})=0\) for \(\sum _{i=1}^N \left( k_i+\tilde{k}_i\right) \) odd.

  2.

    \(\mathbf{s}\) and \(\mathbf{v}\) are independent.

The task of noisy linear ICA is to demix the signals \(\mathbf{x}\) by a linear transform \({\mathbf W} \in \mathbb {C}^{N \times N}\)

$$\begin{aligned} \mathbf{y} = {\mathbf W} \mathbf{x} = {\mathbf W} {\mathbf A} \mathbf{s} + {\mathbf W} \mathbf{v} \end{aligned}$$
(3.45)

so that \(\mathbf{y}\) is “as close to \(\mathbf{s}\)” as possible according to some metric.

3.3.2 KLD-Based ICA for the Determined Case

We focus on the ICA solution based on the KLD

$$\begin{aligned} D_{\text {KL}}({\mathbf W}) = \int p_{\mathbf{y}}(\mathbf{y}; {\mathbf W}) \ln \frac{p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})}{p_{\mathbf{s}}(\mathbf{y})} d\mathbf{y} \end{aligned}$$
(3.46)

where \(p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})\) is the pdf of \(\mathbf{y}\). It depends on the pdf of the observation \(\mathbf{x}\), i.e., on the pdfs of the original source signals \(\mathbf{s}\) and the noise \(\mathbf{v}\), as well as on the demixing matrix \({\mathbf W}\). \(p_{\mathbf{s}}(\mathbf{s})= \prod _{i=1}^N p_{s_i}(s_i)\) is the assumed pdf of the original signals. We assume that we have perfect knowledge about the distribution of the original signals, i.e., \(p_{\mathbf{s}}(\mathbf{s})\) is identical to the true pdf \(p_{\mathbf{s}}^0(\mathbf{s})\) of \(\mathbf{s}\). The KLD is known to have the following properties:

  • \(D_{\text {KL}}({\mathbf W}) \ge 0\) for any \(p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})\) and \(p_{\mathbf{s}}(\mathbf{y})\).

  • \(D_{\text {KL}}({\mathbf W})=0\) iff \(p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})=p_{\mathbf{s}}(\mathbf{y})\).

This means that minimizing the KLD with respect to \({\mathbf W}\) is equivalent to making the pdf of the demixed signals \(\mathbf{y}\) as similar as possible to the pdf of the source signals \(p_{\mathbf{s}}(\mathbf{s})\). Since we assume \(p_{\mathbf{s}}(\mathbf{s})= \prod _{i=1}^N p_{s_i}(s_i)\), minimizing the KLD corresponds to making (a) the \(y_i\) as independent as possible and (b) the pdf of each \(y_i\) as close as possible to \(p_{s_i}(s_i)\). This has been stated as “total mismatch = deviation from independence + marginal mismatch” by Cardoso in [9]. The ICA solution \({\mathbf W}_{\text {ICA}}\) for the demixing matrix based on the KLD is given by

$$\begin{aligned} {\mathbf W}_{\text {ICA}} = \arg \min _{{\mathbf W}} D_{\text {KL}}({\mathbf W}). \end{aligned}$$
(3.47)

In the following, we will first derive the ICA solution for the general noncircular complex case. The circular complex case and the real case are discussed as two special cases.

3.3.2.1 General Noncircular Complex Case

The KLD cost function of a complex demixing matrix \({\mathbf W}\) is a function of the real and imaginary part of \({\mathbf W}\). Using the Wirtinger calculus (see [21, 44] and the summary in Sect. 3.1.1.2), we can also write it as a function of \({\mathbf W}\) and \({\mathbf W}^*\):

$$\begin{aligned} D_{\text {KL}}({\mathbf W}, {\mathbf W}^*) = \int p_{\mathbf{y}}(\mathbf{y},\mathbf{y}^*;{\mathbf W} , {\mathbf W}^*) \ln \frac{p_{\mathbf{y}}(\mathbf{y}, \mathbf{y}^* ; {\mathbf W} , {\mathbf W}^* )}{p_{\mathbf{s}}(\mathbf{y}, \mathbf{y}^*)} d\mathbf{y}. \end{aligned}$$
(3.48)

The derivative \(\frac{\partial D_{\text {KL}}({\mathbf W},{\mathbf W}^*)}{\partial {\mathbf W}^*}\) of the KLD cost function in (3.48) is given by [21]

$$\begin{aligned} \frac{\partial D_{\text {KL}}({\mathbf W},{\mathbf W}^*)}{\partial {\mathbf W}^*} = -{\mathbf W}^{-H} +\mathrm{E}\left[ \pmb {\varphi }(\mathbf{y}, \mathbf{y}^*) \mathbf{x}^H\right] , \end{aligned}$$
(3.49)

where \(\pmb {\varphi }(\mathbf{y}, \mathbf{y}^*) = [\varphi _1(y_1, y_1^*), \dots , \varphi _N(y_N, y_N^*)]^T\) and \(\varphi _i(s_i, s_i^*)=-\frac{\partial \ln p_{s_i}(s_i,s_i^*)}{\partial s_i^*}\). The derivative \(\frac{\partial }{\partial s^*}\) is also defined using the Wirtinger calculus.

A necessary condition for minimizing \(D_{\text {KL}}({\mathbf W}, {\mathbf W}^*)\) at \({\mathbf W}= {\mathbf W}_{\text {ICA}}\) is

$$\begin{aligned} \left. \frac{\partial D_{\text {KL}}({\mathbf W},{\mathbf W}^*)}{\partial {\mathbf W}^*}\right| _{{\mathbf W}={\mathbf W}_{\text {ICA}}} \mathop {=}\limits ^{!} {\mathbf 0} \quad \text {or} \quad \mathrm{E}(\pmb {\varphi }(\mathbf{y}_{\text {ICA}},\mathbf{y}^*_{\text {ICA}}) \mathbf{y}_{\text {ICA}}^H) \mathop {=}\limits ^{!} {\mathbf I} \end{aligned}$$
(3.50)

with \(\mathbf{y}_{\text {ICA}} = {\mathbf W}_{\text {ICA}} \mathbf{x} = {\mathbf W}_{\text {ICA}} {\mathbf A} \mathbf{s} + {\mathbf W}_{\text {ICA}} \mathbf{v} = \hat{\mathbf{y}} + {\mathbf W}_{\text {ICA}} \mathbf{v}\). An equivalent condition to \(\mathrm{E}(\pmb {\varphi }(\mathbf{y}_{\text {ICA}},\mathbf{y}^*_{\text {ICA}}) \mathbf{y}_{\text {ICA}}^H) \mathop {=}\limits ^{!} {\mathbf I}\) in (3.50) is

$$\begin{aligned} \mathrm{E}(\pmb {\varphi }(\mathbf{y}_{\text {ICA}},\mathbf{y}^*_{\text {ICA}}) \mathbf{y}_{\text {ICA}}^H)^* =\mathrm{E}(\pmb {\varphi }^*(\mathbf{y}_{\text {ICA}},\mathbf{y}^*_{\text {ICA}}) \mathbf{y}_{\text {ICA}}^T)\mathop {=}\limits ^{!} {\mathbf I} \end{aligned}$$
(3.51)

which we will use in the following to facilitate comparison with Sect. 3.2.

The properties of the ICA solution based on KLD are:

  • \({\mathbf W}_{\text {ICA}}= {\mathbf A}^{-1}\) if \(\sigma ^2=0\) (no noise) and \(p_{\mathbf{s}}(\mathbf{s}) = p_{\mathbf{s}}^0(\mathbf{s})\).

  • To compute \({\mathbf W}_{\text {ICA}}\), we do not need to know \({\mathbf A}\) or \(\mathbf{s}\), but the pdf \(p_{\mathbf{s}}(\mathbf{s})= \prod _{i=1}^N p_{s_i}(s_i)\) is required. All \(p_{s_i}(s_i)\) must either be non-Gaussian or Gaussian with distinct noncircularity indices.

  • There is no permutation ambiguity if \(p_{s_i}(\cdot ) \ne p_{s_j}(\cdot ) \; \forall i \ne j\).

  • There is no scaling ambiguity if \(p_{s_i}(s_i) = p_i^0(s_i)\) is known \(\forall i\). Only a phase ambiguity remains if \(p_{s_i}(s_i)\) is circular.

As shown in Appendix 2, the ICA solution for the general noncircular complex case can be derived approximately using a two-step perturbation analysis for low noise and is given by

$$\begin{aligned} {\mathbf W}_{\text {ICA}} =({\mathbf I} + \sigma ^2 {\mathbf C}) {\mathbf A}^{-1} + \mathcal {O}(\sigma ^4). \end{aligned}$$
(3.52)

The elements of \({\mathbf C}\) can be obtained from (3.97) and (3.98). If \(p_{\mathbf{s}}(\mathbf{s})\) is symmetric in the real or imaginary part of \(\mathbf{s}\), they are given by (3.99) and (3.100).

For comparison, we consider the linear MMSE estimator

$$\begin{aligned} {\mathbf W}_{\text {MMSE}}&= {\mathbf A}^H \left( {\mathbf A} {\mathbf A}^H + \sigma ^2 {\mathbf R}_{\mathbf{v}}\right) ^{-1} \end{aligned}$$
(3.53)
$$\begin{aligned}&= \left[ {\mathbf I}-\sigma ^2 {\mathbf R}_{-1} \right] {\mathbf A}^{-1} + \mathcal {O}(\sigma ^4), \end{aligned}$$
(3.54)

where the last line is a first-order Taylor series expansion in \(\sigma ^2\) and \({\mathbf R}_{-1} = {\mathbf A}^{-1} {\mathbf R}_{\mathbf{v}} {\mathbf A}^{-H}\). Comparing (3.54) with (3.52), we see that \({\mathbf W}_{\text {ICA}}\) and \({\mathbf W}_{\text {MMSE}}\) are similar if \({\mathbf C} \approx - {\mathbf R}_{-1}\).
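The following short numeric sketch (with our own random test data; all variable names are ours) checks the first-order expansion (3.54) against the exact MMSE solution (3.53) in the low-noise regime:

```python
import numpy as np

# Sanity check of the expansion (3.54) of the MMSE demixing matrix (3.53)
# for a hypothetical random square mixing matrix and low noise.
rng = np.random.default_rng(0)
N = 3
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
R_v = np.eye(N, dtype=complex)        # spatially white noise
sigma2 = 1e-3                         # low-noise regime

W_mmse = A.conj().T @ np.linalg.inv(A @ A.conj().T + sigma2 * R_v)  # (3.53)
A_inv = np.linalg.inv(A)
R_m1 = A_inv @ R_v @ A_inv.conj().T                                 # R_{-1}
W_taylor = (np.eye(N) - sigma2 * R_m1) @ A_inv                      # (3.54)

# The residual is O(sigma2^2), i.e., orders of magnitude below sigma2:
print(np.linalg.norm(W_mmse - W_taylor))
```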

3.2.2 Circular Complex Case

We assume now that the source signals \(\mathbf{s}\) and the noise \(\mathbf{v}\) are circular. Hence, both the noncircularity index of the sources \(\gamma \) and the pseudo-covariance matrix \(\bar{{\mathbf R}}_{\mathbf{v}}\) are zero. As a consequence, (3.99) and (3.100) simplify to

$$\begin{aligned} C_{ii}&=-\frac{\kappa _i + \lambda _i}{1+\rho _i +\delta _i}\left[ {\mathbf R}_{-1}\right] _{ii} \in \mathbb {R},\nonumber \\ C_{ij}&= -\frac{\kappa _j (\kappa _i -1)}{\kappa _i \kappa _j -1} \left[ {\mathbf R}_{-1}\right] _{ij} \in \mathbb {C} \quad (i \ne j). \end{aligned}$$
(3.55)

3.2.3 Real Case

For real signals and noise, we have

$$\begin{aligned} \gamma _i = 1, \quad {\mathbf R}_{\mathbf{v}} = \bar{{\mathbf R}}_{\mathbf{v}}. \end{aligned}$$
(3.56)

In the derivation of \({\mathbf W}_{\text {ICA}}\) we have considered Taylor series expansions of \(\pmb {\varphi }(\mathbf{y})\) using Wirtinger derivatives. The Wirtinger derivatives \(\partial / \partial s\) and \(\partial / \partial s^*\) of \(\varphi (s) \in \mathbb {R}\) are now identical (see (3.4)) and hence

$$\begin{aligned} \xi _i = \kappa _i, \quad \rho _i = \delta _i, \quad \lambda _i = \omega _i = \tau _i. \end{aligned}$$
(3.57)

Furthermore, the Wirtinger derivatives of \(\varphi (s) \in \mathbb {R}\) are identical to the real derivatives except for a factor of \(\frac{1}{2}\) (see (3.4)). Hence, it holds that

$$\begin{aligned} \kappa _i = \frac{\mathring{\kappa }_i}{2}, \quad \rho _i =\frac{\mathring{\rho }_i}{2}, \quad \lambda _i = \frac{\mathring{\lambda }_i}{4}, \end{aligned}$$
(3.58)

where \(\mathring{\kappa }_i\), \(\mathring{\rho }_i\) and \(\mathring{\lambda }_i\) are defined using real derivatives of \(\varphi (s)\), denoted by \((\cdot )^{\prime }\):

$$\begin{aligned} \mathring{\kappa }_i&=\mathrm{E}(\varphi _i^{\prime }(s_i)) = \int \frac{d}{ds_i} \left( \frac{-p_{s_i}^{\prime }(s_i)}{p_{s_i}(s_i)}\right) p_i^0(s_i) ds_i,\nonumber \\ \mathring{\rho }_i&=\mathrm{E}(\varphi _i^{\prime }(s_i)s_i^2) = \int \frac{d}{ds_i} \left( \frac{-p_{s_i}^{\prime }(s_i)}{p_{s_i}(s_i)}\right) s_i^2 p_i^0(s_i) ds_i,\nonumber \\ \mathring{\lambda }_i&=\mathrm{E}(\varphi _i^{\prime \prime }(s_i)s_i) = \int \frac{d^2}{ds_i^2} \left( \frac{-p_{s_i}^{\prime }(s_i)}{p_{s_i}(s_i)}\right) s_i p_i^0(s_i) ds_i. \end{aligned}$$
(3.59)

Using (3.56)–(3.58), we get from (3.99) and (3.100)

$$\begin{aligned} C_{ii}&=-\frac{\mathring{\kappa }_i + \frac{1}{2}\mathring{\lambda }_i}{1+\mathring{\rho }_i}\left[ {\mathbf R}_{-1}\right] _{ii}=-M_{ii} \left[ {\mathbf R}_{-1}\right] _{ii}, \nonumber \\ C_{ij}&=-\frac{\mathring{\kappa }_j (\mathring{\kappa }_i -1)}{\mathring{\kappa }_i \mathring{\kappa }_j -1} \left[ {\mathbf R}_{-1}\right] _{ij} = -M_{ij} \left[ {\mathbf R}_{-1}\right] _{ij} \quad (i \ne j), \end{aligned}$$
(3.60)

where \(M_{ii} = \frac{\mathring{\kappa }_i + \mathring{\lambda }_i / 2}{1+\mathring{\rho }_i}\) and \(M_{ij} = \frac{\mathring{\kappa }_j (\mathring{\kappa }_i -1)}{\mathring{\kappa }_i \mathring{\kappa }_j -1}\). Note that (3.60) corresponds to the results in [32].
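As an illustration, a minimal sketch (our own helper with hypothetical inputs, not an implementation from the chapter) that assembles \({\mathbf C}\) for the real case according to (3.59) and (3.60) could look as follows:

```python
import numpy as np

# Assemble the real-case bias matrix C of (3.60) from the source statistics
# kappa_i, rho_i, lambda_i of (3.59) and R_{-1} = A^{-1} R_v A^{-T}.
def bias_matrix_real(kappa, rho, lam, R_m1):
    N = len(kappa)
    C = np.empty((N, N))
    for i in range(N):
        for j in range(N):
            if i == j:
                M = (kappa[i] + 0.5 * lam[i]) / (1.0 + rho[i])                 # M_ii
            else:
                M = kappa[j] * (kappa[i] - 1.0) / (kappa[i] * kappa[j] - 1.0)  # M_ij
            C[i, j] = -M * R_m1[i, j]
    return C

# W_ICA is then approximated by (I + sigma2 * C) @ A_inv, cf. (3.52).
```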

3.3 Results for Complex Generalized Gaussian Distribution

We study KLD-ICA for \(N=3\) sources with spatially white Gaussian noise with \(\mathrm{E}[\mathbf{v}\mathbf{v}^H] = \sigma ^2 {\mathbf I}\) and the square mixing matrix \({\mathbf A} = [a_{mn}]\), where \(a_{mn}=e^{- j\pi m \sin \theta _n}\) and \(\theta _n=-60^{\circ },0^{\circ },60^{\circ }\). As proposed in [26], we use the signal-to-interference-plus-noise ratio (SINR) to evaluate the separation performance. For spatially uncorrelated noise, we compute the SINR for a given demixing matrix \({\mathbf W}\) by averaging the SINR over all sources \(i\):

$$\begin{aligned} \text {SINR} = \frac{1}{N} \sum _{i=1}^N \frac{|[{\mathbf W} {\mathbf A}]_{ii}|^2}{\sum _{j \ne i} |[{\mathbf W} {\mathbf A}]_{ij}|^2 + \sigma ^2 \sum _j |{\mathbf W}_{ij}|^2}. \end{aligned}$$
(3.61)

The term \(|[{\mathbf W} {\mathbf A}]_{ii}|^2\) reflects the power of the desired source \(i\) in the demixed signal \(y_i\), the term \(\sum _{j \ne i} |[{\mathbf W} {\mathbf A}]_{ij}|^2\) corresponds to the power of the interfering signals \(j\ne i\) in \(y_i\), and \(\sigma ^2 \sum _j |{\mathbf W}_{ij}|^2\) is the noise power in \(y_i\). For the remainder of this section, the signal-to-noise ratio (SNR) is defined before the mixing process and not at the sensors, i.e., \(\text {SNR} = \frac{\mathrm{E}\left[ |s|^2\right] }{\sigma ^2} = \frac{1}{\sigma ^2}\). It can be shown that among all linear demixing matrices \({\mathbf W}\), \({\mathbf W}_{\text {MMSE}}\) from (3.53) is the one that maximizes the SINR [28]. We compare the SINR of the theoretical ICA solution \({\mathbf W}_{\text {ICA}}\) from (3.52), the average SINR of \(\hat{{\mathbf W}}_{\text {ICA}}\) obtained from 100 runs of KLD-based ICA using \(L\) samples, and the SINR of \({\mathbf W}_{\text {MMSE}}\) from (3.53). The ICA algorithm is initialized with \({\mathbf W} = {\mathbf I}\) and performs gradient descent using the relative gradient [12], i.e., it postmultiplies the gradient of the KLD (3.49) by \({\mathbf W}^H {\mathbf W}\). We normalize each row of the relative gradient, resulting in an adaptive step size for each source. In the derivation of the theoretical solution \({\mathbf W}_{\text {ICA}}\), we evaluated all expectations exactly. Hence, \({\mathbf W}_{\text {ICA}}\) only accounts for the bias from \({\mathbf A}^{-1}\) but not for the estimation variance, whereas \(\hat{{\mathbf W}}_{\text {ICA}}\) reflects both factors.
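The following sketch (all variable and function names are ours) reproduces this evaluation setup: it builds the steering-vector mixing matrix, computes \({\mathbf W}_{\text {MMSE}}\) from (3.53), and evaluates the SINR (3.61); the chapter reports approximately \(12.4\,\mathrm {dB}\) for this configuration.

```python
import numpy as np

# Steering-vector mixing matrix of Sect. 3.3 and the SINR metric (3.61)
# for spatially white noise with covariance sigma2 * I.
def mixing_matrix(M, thetas_deg):
    m = np.arange(1, M + 1)[:, None]                 # sensor index m
    theta = np.deg2rad(np.asarray(thetas_deg))[None, :]
    return np.exp(-1j * np.pi * m * np.sin(theta))   # a_mn = e^{-j pi m sin(theta_n)}

def sinr(W, A, sigma2):
    """Average per-source SINR according to (3.61)."""
    G = W @ A
    out = 0.0
    for i in range(A.shape[1]):
        signal = np.abs(G[i, i]) ** 2
        interference = np.sum(np.abs(G[i, :]) ** 2) - signal
        noise = sigma2 * np.sum(np.abs(W[i, :]) ** 2)
        out += signal / (interference + noise)
    return out / A.shape[1]

A = mixing_matrix(3, [-60.0, 0.0, 60.0])
sigma2 = 0.1                                         # SNR = 1/sigma2 = 10 dB
W_mmse = A.conj().T @ np.linalg.inv(A @ A.conj().T + sigma2 * np.eye(3))
print(10 * np.log10(sinr(W_mmse, A, sigma2)))        # approx. 12.4 dB, cf. Sect. 3.3.2

# A relative-gradient KLD-ICA step, as described above, would then update
# W <- W - mu * (E[phi(y) y^H] - I) @ W.
```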

In the following, all sources are GGD with the same shape parameter \(c_i=c\). The noncircular complex GGD with zero mean and \(\mathrm{E}[|s|^2]=1\) has already been introduced in Sect. 3.2.4. By integration in polar coordinates, it can be shown that

$$\begin{aligned} \kappa&= \int \frac{\partial \varphi ^*}{\partial s^*} p_{\mathbf{s}}^0(s) ds=\frac{c^2 \varGamma (2/c)}{(1-\gamma ^2) \varGamma ^2(1/c)}, \end{aligned}$$
(3.62)
$$\begin{aligned} \delta&= \int \frac{\partial \varphi ^*}{\partial s^*} s s^* p_{\mathbf{s}}^0(s) ds=\frac{2 c+(1-c) \gamma ^2}{2 (1-\gamma ^2)}, \end{aligned}$$
(3.63)
$$\begin{aligned} \rho&= \int \frac{\partial \varphi ^*}{\partial s} s^2 p_{\mathbf{s}}^0(s)ds = -\frac{2 c-2 +(1-3c) \gamma ^2}{2 (1 - \gamma ^2)}, \end{aligned}$$
(3.64)
$$\begin{aligned} \xi&= \int \frac{\partial \varphi ^*}{\partial s} p_{\mathbf{s}}^0(s) ds=- \gamma \kappa , \end{aligned}$$
(3.65)
$$\begin{aligned} \lambda&= \int \frac{\partial ^2 \varphi ^*}{\partial s \partial s^*} s p_{\mathbf{s}}^0(s) ds=(c-1) \kappa , \end{aligned}$$
(3.66)
$$\begin{aligned} \omega&= \int \frac{\partial ^2 \varphi ^*}{(\partial s)^2 } s p_{\mathbf{s}}^0(s) ds=-\frac{3}{2} (c-1) \gamma \kappa , \end{aligned}$$
(3.67)
$$\begin{aligned} \tau&= \int \frac{\partial ^2 \varphi ^*}{(\partial s^*)^2 } s p_{\mathbf{s}}^0(s) ds=-\frac{1}{2} (c-1) \gamma \kappa . \end{aligned}$$
(3.68)

Note that there exists a relationship between these parameters and the ones in the derivation of the CRB in Sect. 3.2: \(\kappa \) and \(\xi \) are identical. Using Corollary 2 from [35], we furthermore get

$$\begin{aligned} \delta =\eta -1 \quad \text {and} \quad \rho =\beta -2 \end{aligned}$$
(3.69)

where \(\eta =\mathrm{E}\left[ |s|^2 |\varphi (s)|^2\right] \) and \(\beta = \mathrm{E}\left[ s^2 (\varphi ^*(s))^2\right] \) have been defined in (3.23) and (3.25) in the previous section. These relationships hold not only for GGD but for all source distributions.
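For reference, the closed-form statistics (3.62)–(3.68) are easy to evaluate numerically; the following small sketch (our own helper, not from the chapter) does so and checks the circular Gaussian case \(c=1\), \(\gamma =0\):

```python
from scipy.special import gamma

# Closed-form GGD source statistics (3.62)-(3.68) as functions of the
# shape parameter c and the noncircularity index g = gamma.
def ggd_stats(c, g):
    kappa = c**2 * gamma(2 / c) / ((1 - g**2) * gamma(1 / c) ** 2)  # (3.62)
    delta = (2 * c + (1 - c) * g**2) / (2 * (1 - g**2))             # (3.63)
    rho = -(2 * c - 2 + (1 - 3 * c) * g**2) / (2 * (1 - g**2))      # (3.64)
    xi = -g * kappa                                                 # (3.65)
    lam = (c - 1) * kappa                                           # (3.66)
    omega = -1.5 * (c - 1) * g * kappa                              # (3.67)
    tau = -0.5 * (c - 1) * g * kappa                                # (3.68)
    return kappa, delta, rho, xi, lam, omega, tau

# Circular Gaussian sanity check (c = 1, gamma = 0): kappa = delta = 1,
# rho = 0, and all remaining statistics vanish, cf. Sect. 3.3.1.
print(ggd_stats(1.0, 0.0))
```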

3.3.1 Circular Complex Case

For a circular complex GGD, \(\gamma =0\) and hence we get \(\kappa = \frac{c^2 \varGamma (2/c)}{\varGamma ^2(1/c)}\), \(\delta = c\), \(\rho =c-1\), \(\lambda =(c-1) \kappa \) and \(\xi =\omega =\tau =0\). Figure 3.5 shows that for a wide range of the shape parameter \(c\), both the theoretical ICA solution \({\mathbf W}_{\text {ICA}}\) and its estimate \(\hat{{\mathbf W}}_{\text {ICA}}\) obtained by running KLD-ICA using \(L=10^4\) samples achieve an SINR close to that of the MMSE solution \({\mathbf W}_{\text {MMSE}}\). Note that for \(c\) close to \(1\), the SINR of the theoretical solution \({\mathbf W}_{\text {ICA}}\) is not achievable in practice, since all sources become Gaussian and the CRB approaches infinity for \(c \rightarrow 1\) (see Sect. 3.2 and [35]). Hence, estimation of \({\mathbf W}\) becomes impossible. This is reflected in Fig. 3.5: the SINR for \(\hat{{\mathbf W}}_{\text {ICA}}\) estimated by KLD-ICA decreases for \(c \rightarrow 1\).

Note that for strongly non-Gaussian sources (\(c \ll 1\) or \(c \gg 1\)) the SINR of the theoretical solution \({\mathbf W}_{\text {ICA}}\) might be smaller than that for \(\hat{{\mathbf W}}_{\text {ICA}}\) because \({\mathbf W}_{\text {ICA}}\) is based on a Taylor series expansion up to order \(\sigma ^2\). For strongly non-Gaussian sources, higher-order terms become important. These are implicitly taken into account by \(\hat{{\mathbf W}}_{\text {ICA}}\) but not by \({\mathbf W}_{\text {ICA}}\).

Fig. 3.5 SINR for circular complex GGD signals and circular complex noise, \(\text {SNR} = 10\,\mathrm {dB}\), \(L=10^4\) samples

3.3.2 Noncircular Complex Case

First, we study the performance with circular noise, i.e., \({\mathbf R}_{\mathbf{v}} = {\mathbf I}\) and \(\bar{{\mathbf R}}_{\mathbf{v}} = {\mathbf 0}\), and an SNR of \(10\,\mathrm {dB}\). The SINR of the MMSE solution \({\mathbf W}_{\text {MMSE}}\) is \(12.4\,\mathrm {dB}\). Figure 3.6 shows that for a wide range of the shape parameter \(c\) and the noncircularity index \(\gamma \), the theoretical ICA solution \({\mathbf W}_{\text {ICA}}\) achieves an SINR close to that of MMSE. Comparing Fig. 3.6a, b, we note that the contour plot for the simulation using \(L=10^3\) samples differs from the contour plot for the theoretical ICA solution. One reason is that for noncircular sources with the same noncircularity index \(\gamma _i=\gamma \), the estimation variance increases for \(c \rightarrow 1\) (see Sect. 3.2 and [35]). Hence, in the simulation the SINR decreases in the vicinity of \(c=1\). Furthermore, the smaller sample size of \(L=10^3\) leads to a larger variance of \(\hat{{\mathbf W}}_{\text {ICA}}\), which is not reflected in the theoretical ICA solution \({\mathbf W}_{\text {ICA}}\). With a larger sample size, the SINR of \(\hat{{\mathbf W}}_{\text {ICA}}\) would be much closer to that of \({\mathbf W}_{\text {MMSE}}\). However, Fig. 3.6b shows that even with a limited sample size, KLD-ICA can still achieve an SINR quite close to that of MMSE except for \(c\approx 1\).

Fig. 3.6 SINR [dB] of the ICA solution for noncircular complex GGD signals with \(\gamma _i=\gamma \), circular complex noise and \(\text {SNR} = 10\,\mathrm {dB}\). \({\mathbf W}_{\text {ICA}}\) (3.52) (a), \(\hat{{\mathbf W}}_{\text {ICA}}\) (simulation, \(L=10^3\) samples) (b)

Now, we consider the case where the sources are noncircular complex with \(\gamma _1=0.5, \gamma _{2,3} = 0.5 \pm \Delta \gamma \) and the noise \(\mathbf{v}\) is noncircular with \({\mathbf R}_{\mathbf{v}} = {\mathbf I}\) and \(\bar{{\mathbf R}}_{\mathbf{v}} = 0.5 \cdot {\mathbf I}\), i.e., \(\gamma _{\text {noise}} = 0.5\). Figure 3.7 shows decreasing SINR values for \(c\rightarrow 1\) and \(\Delta \gamma \rightarrow 0\), since in that region \(|\mathfrak {R}C_{ij}|\) in (3.100) becomes large if sources or noise are noncircular. However, outside this region, the SINR of the theoretical ICA solution (Fig. 3.7a) is still close to that of MMSE (\(12.4\,\mathrm {dB}\)). The shape of the contour plot for the simulation (Fig. 3.7b) is similar to that of the theoretical solution but shows slightly lower SINR, especially for \(c\approx 1\) and small \(\Delta \gamma \). This is again due to the increasing estimation variance for \(c\rightarrow 1\) and small \(\Delta \gamma \) (see Sect. 3.2 and [35]). Nevertheless, the performance obtainable in simulations can still be considered good as long as \(c\) is not close to \(1\) or \(\Delta \gamma \) is sufficiently large. Finally, note that in Fig. 3.7 the decrease in SINR for strongly noncircular (large \(\Delta \gamma \)), non-Gaussian (\(c \ne 1\)) sources is caused by the noncircularity of the noise. The reason is that the MMSE (or maximum SINR) criterion and the minimum KLD criterion yield different demixing matrices \({\mathbf W}\) for noncircular noise: as can be seen from (3.52), (3.97) and (3.98), \({\mathbf W}_{\text {ICA}}\) depends both on the noncircularity of the sources (\(\gamma _i \ne 0\)) and on the noncircularity of the noise (\(\bar{{\mathbf R}}_{\mathbf{v}} \ne {\mathbf 0}\)), whereas \({\mathbf W}_{\text {MMSE}}\) from (3.54) depends only on the normal covariance matrix of the noise \({\mathbf R}_{\mathbf{v}}\). This is due to the different cost functions: minimization of the KLD makes the pdf of the demixed signals as similar as possible to the assumed pdf of the sources, whereas MMSE minimizes the expected quadratic error between the demixed signals and the original sources. For circular noise, the difference between \({\mathbf W}_{\text {ICA}}\) and \({\mathbf W}_{\text {MMSE}}\) in terms of SINR is much smaller.

Fig. 3.7 SINR [dB] of the ICA solution for noncircular complex GGD signals with \(\gamma _1=0.5, \gamma _{2,3} = \gamma _1 \pm \Delta \gamma \), noncircular complex noise, and \(\text {SNR} = 10\,\mathrm {dB}\). \({\mathbf W}_{\text {ICA}}\) (3.52) (a), \(\hat{{\mathbf W}}_{\text {ICA}}\) (simulation, \(L=10^3\) samples) (b)

In summary, the results in this subsection have shown that

  • in many cases the theoretical solution \({\mathbf W}_{\text {ICA}}\) of KLD-ICA can achieve an SINR close to the optimum attainable by the MMSE demixing matrix \({\mathbf W}_{\text {MMSE}}\).

  • for sources following a GGD, \(\hat{{\mathbf W}}_{\text {ICA}}\) obtained by running KLD-ICA with a finite number of samples \(L\) can achieve an SINR quite close to that of \({\mathbf W}_{\text {MMSE}}\), except for (nearly) Gaussian sources with similar noncircularity indices.

  • for strongly noncircular, non-Gaussian sources and noncircular noise, the minimization of the KLD and of the MSE yield different solutions.

Although we have assumed that the distributions of the sources are perfectly known, other approaches such as ICA-EBM [30] exist which do not require such knowledge. As shown in Fig. 3.8, simulation results using ICA-EBM exhibit SINR performance similar to that of KLD-ICA (see Figs. 3.6b and 3.7b).

Fig. 3.8 SINR [dB] of the ICA-EBM solution for noncircular complex GGD signals, \(L=10^3\) samples and \(\text {SNR} = 10\,\mathrm {dB}\). \(\gamma _i=\gamma \), circular complex noise (a); \(\gamma _1=0.5\), \(\gamma _{2,3}=\gamma _1\pm \Delta \gamma \), noncircular complex noise (b)

3.4 Extension to Overdetermined Case

ICA algorithms for the overdetermined case have already been studied in a number of publications (see, e.g., [11, 25, 49, 50]). In the overdetermined case, \(\mathbf{x} \in \mathbb {C}^M\) with \(M>N\). In the noiseless case, we can select any \(N\) rows of \(\mathbf{x}\) to perform ICA as long as the corresponding square mixing matrix \(\widetilde{{\mathbf A}}\) is invertible. For noisy mixtures, this is no longer sufficient, since the information contained in the \(M-N\) additional observations is useful to improve demixing. Hence, we need to consider the KLD for \(M>N\). In this case, the demixing matrix \({\mathbf W}\) can be decomposed as \({\mathbf W}=\begin{bmatrix} {\mathbf W}_1&{\mathbf W}_2 \end{bmatrix}\), where \({\mathbf W}_1 \in \mathbb {C}^{N \times N}\) and \({\mathbf W}_2 \in \mathbb {C}^{N \times (M-N)}\). We define an auxiliary vector \(\bar{\mathbf{y}} \in \mathbb {C}^M\):

$$\begin{aligned} \bar{\mathbf{y}} = \begin{bmatrix} {\mathbf W}_1&{\mathbf W}_2\\ {\mathbf 0}&{\mathbf I}_{M-N} \end{bmatrix} \begin{bmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \end{bmatrix} = \breve{{\mathbf W}} \mathbf{x} = \begin{bmatrix} \mathbf{y} \\ \mathbf{x}_2\end{bmatrix} \end{aligned}$$
(3.70)

Then we calculate \(p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})\) by

$$\begin{aligned} p_{\bar{\mathbf{y}}}(\bar{\mathbf{y}}; \breve{{\mathbf W}})&= \frac{1}{|\text {det}(\breve{{\mathbf W}})|^2} p_{\mathbf{x}}(\mathbf{x})=\frac{1}{|\text {det}({\mathbf W}_1)|^2} p_{\mathbf{x}}(\mathbf{x}), \end{aligned}$$
(3.71)
$$\begin{aligned} p_{\mathbf{y}}(\mathbf{y}; {\mathbf W})&= \frac{1}{|\text {det}({\mathbf W}_1)|^2} \int p_{\mathbf{x}}(\mathbf{x}) d\mathbf{x}_2, \end{aligned}$$
(3.72)

since the linear transformation of a complex random vector involves the factor \(|\det ({\mathbf W})|^2\) instead of the \(|\det ({\mathbf W})|\) of the real case (see [4, 42]).

Using the above steps, we obtain the modified KLD for \(M>N\)

$$\begin{aligned} D_{\text {KL}}({\mathbf W}) = - \ln | \text {det}({\mathbf W}_1)|^2 - \sum _{i=1}^N \mathrm{E}\left[ \ln p_{s_i}(y_i)\right] + \text {const.} \end{aligned}$$
(3.73)

instead of \(D_{\text {KL}}({\mathbf W}) = - \ln | \text {det}({\mathbf W})|^2 - \sum _{i=1}^N \mathrm{E}\left[ \ln p_{s_i}(y_i)\right] + \text {const.}\) for \(M=N\).

To derive \({\mathbf W}_{\text {ICA}}\) for \(M>N\), we could now perform a Taylor series expansion similar to the one for \(M=N\). However, it is more convenient to reduce the overdetermined case \(M>N\) to the determined case by applying a linear transform to the data to condense all information about the source signals into the first \(N\) observations, and by applying another transform to decorrelate the noise terms in the first \(N\) observations from those in the remaining \(M-N\) observations. These two transforms have an effect similar to a dimension reduction using principal component analysis (PCA), except that the correlation matrix of the observations is only block-diagonal instead of diagonal. To derive \({\mathbf W}_{\text {ICA}}\), we can then combine the solution for the determined case with the linear transforms. Note that this approach is only used for the analysis of KLD-based ICA in the overdetermined case because it simplifies the theoretical derivation. In ICA applications, the transforms are performed implicitly by the algorithm itself.

The first step of this procedure is to use the unitary transform \({\mathbf Q}\) defined by the decomposition \({\mathbf A} = {\mathbf Q}^H \begin{bmatrix} \bar{{\mathbf A}}_1\\ {\mathbf 0} \end{bmatrix}\) to condense all information about the source signals into the first \(N\) observations:

$$\begin{aligned} \bar{\mathbf{x}}&= {\mathbf Q} \mathbf{x} = {\mathbf Q} ({\mathbf A} \mathbf{s} + \mathbf{v}) \nonumber \\&={\mathbf Q}{\mathbf Q}^H \begin{bmatrix} \bar{{\mathbf A}}_1 \\ {\mathbf 0} \end{bmatrix} \mathbf{s} + {\mathbf Q} \mathbf{v} =\begin{bmatrix} \bar{{\mathbf A}}_1 \\ {\mathbf 0} \end{bmatrix} \mathbf{s} + \bar{\mathbf{v}} = \begin{bmatrix} \bar{{\mathbf A}}_1 \mathbf{s} \\ {\mathbf 0} \end{bmatrix} + \begin{bmatrix} \bar{\mathbf{v}}_1\\ \bar{\mathbf{v}}_2 \end{bmatrix} = \begin{bmatrix} \bar{\mathbf{x}}_1\\ \bar{\mathbf{x}}_2 \end{bmatrix}, \end{aligned}$$
(3.74)
$$\begin{aligned} {\mathbf R}_{\bar{\mathbf{v}}}&= \frac{1}{\sigma ^2} \mathrm{E}\left( \bar{\mathbf{v}} \bar{\mathbf{v}}^H\right) = {\mathbf Q} {\mathbf R}_{\mathbf{v}} {\mathbf Q}^H = \begin{bmatrix} {\mathbf R}_{\bar{\mathbf{v}}_{11}}&{\mathbf R}_{\bar{\mathbf{v}}_{12}} \\ {\mathbf R}_{\bar{\mathbf{v}}_{21}}&{\mathbf R}_{\bar{\mathbf{v}}_{22}} \end{bmatrix}. \end{aligned}$$
(3.75)

Note that \(\bar{\mathbf{v}}_1\) and \(\bar{\mathbf{v}}_2\) may be correlated, so that \(\bar{\mathbf{x}}_2 = \bar{\mathbf{v}}_2\) is useful for the processing of \(\bar{\mathbf{x}}_1=\bar{{\mathbf A}}_1 \mathbf{s} + \bar{\mathbf{v}}_1\) to reduce the impact of \(\bar{\mathbf{v}}_1\). Hence, we decorrelate the noise terms \(\bar{\mathbf{v}}_1\) and \(\bar{\mathbf{v}}_2\) by a second transform \({\mathbf T} = \begin{bmatrix} {\mathbf I}_N&- {\mathbf R}_{\bar{\mathbf{v}}_{12}} {\mathbf R}_{\bar{\mathbf{v}}_{22}}^{-1} \\ {\mathbf 0}&{\mathbf I}_{M-N} \end{bmatrix}\):

$$\begin{aligned} \widetilde{\mathbf{x}}&= {\mathbf T} \bar{\mathbf{x}} = {\mathbf T} \begin{bmatrix} \bar{{\mathbf A}}_1 \mathbf{s} \\ {\mathbf 0} \end{bmatrix} + {\mathbf T} \bar{\mathbf{v}} =\begin{bmatrix} \bar{{\mathbf A}}_1 \mathbf{s} \\ {\mathbf 0} \end{bmatrix} + \widetilde{\mathbf{v}} = \begin{bmatrix} \bar{{\mathbf A}}_1 \mathbf{s} \\ {\mathbf 0} \end{bmatrix} + \begin{bmatrix} \widetilde{\mathbf{v}}_1 \\ \widetilde{\mathbf{v}}_2 \end{bmatrix} = \begin{bmatrix} \widetilde{\mathbf{x}}_1 \\ \widetilde{\mathbf{x}}_2 \end{bmatrix},\end{aligned}$$
(3.76)
$$\begin{aligned} {\mathbf R}_{\widetilde{\mathbf{v}}}&= \frac{1}{\sigma ^2} \mathrm{E}\left( \widetilde{\mathbf{v}} \widetilde{\mathbf{v}}^H\right) = \begin{bmatrix} {\mathbf R}_{\bar{\mathbf{v}}_{11}} - {\mathbf R}_{\bar{\mathbf{v}}_{12}} {\mathbf R}_{\bar{\mathbf{v}}_{22}}^{-1} {\mathbf R}_{\bar{\mathbf{v}}_{21}}&{\mathbf 0} \\ {\mathbf 0}&{\mathbf R}_{\bar{\mathbf{v}}_{22}} \end{bmatrix}. \end{aligned}$$
(3.77)

\(\widetilde{\mathbf{v}}_1\) and \(\widetilde{\mathbf{v}}_2\) are now uncorrelated and \(\widetilde{\mathbf{x}}_2=\widetilde{\mathbf{v}}_2\) does not contain any second-order information useful for the processing of \(\widetilde{\mathbf{x}}_1 = \bar{{\mathbf A}}_1 \mathbf{s} + \widetilde{\mathbf{v}}_1\).
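A compact numeric sketch of these two transforms (our variable names and random test data; a full QR decomposition is one way to obtain \({\mathbf Q}\) with \({\mathbf A} = {\mathbf Q}^H \begin{bmatrix} \bar{{\mathbf A}}_1\\ {\mathbf 0} \end{bmatrix}\)) is given below:

```python
import numpy as np

# Two transforms of (3.74)-(3.77): Q condenses the source information into
# the first N channels, T block-decorrelates the noise.
rng = np.random.default_rng(1)
M, N = 6, 3
A = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
B = rng.standard_normal((M, M)) + 1j * rng.standard_normal((M, M))
R_v = B @ B.conj().T                          # some spatially correlated noise covariance

Qfull, R = np.linalg.qr(A, mode="complete")   # A = Qfull @ [A1; 0]
Q = Qfull.conj().T                            # the transform of (3.74)
A1 = R[:N, :]                                 # invertible N x N block, i.e. \bar{A}_1
R_vbar = Q @ R_v @ Q.conj().T                 # (3.75)

R11, R12 = R_vbar[:N, :N], R_vbar[:N, N:]
R22 = R_vbar[N:, N:]
T = np.block([[np.eye(N), -R12 @ np.linalg.inv(R22)],
              [np.zeros((M - N, N)), np.eye(M - N)]])

# After T, the two noise blocks are uncorrelated, cf. (3.77):
R_vtilde = T @ R_vbar @ T.conj().T
print(np.linalg.norm(R_vtilde[:N, N:]))       # approx. 0
```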

The separated signals \(\mathbf{y}\) are now obtained by

$$\begin{aligned} \mathbf{y} = {\mathbf W} \mathbf{x} = {\mathbf W} {\mathbf Q}^H {\mathbf T}^{-1} \widetilde{\mathbf{x}} = \widetilde{{\mathbf W}} \widetilde{\mathbf{x}} =\begin{bmatrix} \widetilde{{\mathbf W}}_1&\widetilde{{\mathbf W}}_2 \end{bmatrix} \begin{bmatrix} \widetilde{\mathbf{x}}_1 \\ \widetilde{\mathbf{x}}_2 \end{bmatrix} = \widetilde{{\mathbf W}}_1 \widetilde{\mathbf{x}}_1 + \widetilde{{\mathbf W}}_2 \widetilde{\mathbf{x}}_2 = \mathbf{y}_1 + \mathbf{y}_2 \end{aligned}$$
(3.78)

with \(\widetilde{{\mathbf W}} = {\mathbf W} {\mathbf Q}^H {\mathbf T}^{-1}\). The noise-only contribution \(\mathbf{y}_2 = \widetilde{{\mathbf W}}_2 \widetilde{\mathbf{x}}_2 = \widetilde{{\mathbf W}}_2 \widetilde{\mathbf{v}}_2\) to \(\mathbf{y}\) is uncorrelated with \(\mathbf{y}_1 = \widetilde{{\mathbf W}}_1 \widetilde{\mathbf{x}}_1\). Hence, it is sufficient to consider the first \(N\) observations \(\widetilde{\mathbf{x}}_1\) to derive the ICA solution for \(\widetilde{{\mathbf W}}_1\).

Considering the KLD (3.73) for the transformed demixing model \(\mathbf{y} = \begin{bmatrix} \widetilde{{\mathbf W}}_1&\widetilde{{\mathbf W}}_2\end{bmatrix} \begin{bmatrix} \widetilde{\mathbf{x}}_1 \\ \widetilde{\mathbf{x}}_2 \end{bmatrix}\), we get

$$\begin{aligned} D_{\text {KL}}(\widetilde{{\mathbf W}}) = - \ln | \text {det}(\widetilde{{\mathbf W}}_1)|^2 - \sum _{i=1}^N \mathrm{E}\left[ \ln p_{s_i}(y_i)\right] + \text {const.} \end{aligned}$$
(3.79)

with \(\widetilde{{\mathbf W}} = \begin{bmatrix} \widetilde{{\mathbf W}}_1&\widetilde{{\mathbf W}}_2\end{bmatrix}\). The Wirtinger derivatives of \(D_{\text {KL}}(\widetilde{{\mathbf W}})\) with respect to \(\widetilde{{\mathbf W}}_1^*\) and \(\widetilde{{\mathbf W}}_2^*\) are given by

$$\begin{aligned} \frac{\partial D_{\text {KL}}(\widetilde{{\mathbf W}})}{\partial \widetilde{{\mathbf W}}_1^*}&= \mathrm{E}\left[ \pmb {\varphi }(\mathbf{y}) \widetilde{\mathbf{x}}_1^H\right] - \widetilde{{\mathbf W}}_1^{-H} = \left( \mathrm{E}\left[ \pmb {\varphi }(\mathbf{y}) \widetilde{\mathbf{y}}_1^H\right] - {\mathbf I} \right) \widetilde{{\mathbf W}}_1^{-H} \mathop {=}\limits ^{!} {\mathbf 0} \end{aligned}$$
(3.80)
$$\begin{aligned} \frac{\partial D_{\text {KL}}(\widetilde{{\mathbf W}})}{\partial \widetilde{{\mathbf W}}_2^*}&= \mathrm{E}\left[ \pmb {\varphi }(\mathbf{y}) \widetilde{\mathbf{x}}_2^H\right] = \mathrm{E}\left[ \pmb {\varphi }(\mathbf{y}) \widetilde{\mathbf{y}}_2^H \right] \widetilde{{\mathbf W}}_2^{-H} \mathop {=}\limits ^{!} {\mathbf 0}. \end{aligned}$$
(3.81)

A perturbation analysis of (3.81) at \(\mathbf{y} = \widetilde{{\mathbf W}}_1 \bar{{\mathbf A}}_1 \mathbf{s}\) yields \(\widetilde{{\mathbf W}}_2 = \mathcal {O}(\sigma ^4)\) due to \(\widetilde{\mathbf{x}}_2 = \widetilde{\mathbf{v}}_2\). Hence, \(\mathbf{y}\) is given by \(\mathbf{y} = \widetilde{{\mathbf W}}_1 \widetilde{\mathbf{x}}_1 + \mathcal {O}(\sigma ^4)\). The solution for \(\widetilde{{\mathbf W}}_1\) is similar to the case \(M=N\):

$$\begin{aligned} \widetilde{{\mathbf W}}_{1,\text {ICA}} = ({\mathbf I} + \sigma ^2 {\mathbf C})\bar{{\mathbf A}}_1^{-1} + \mathcal {O}(\sigma ^4), \end{aligned}$$
(3.82)

where the elements of \({\mathbf C}\) can be computed from (3.97) and (3.98) with \(\bar{{\mathbf R}}_{-1}={\mathbf 0}\) and \({\mathbf R}_{-1} = \bar{{\mathbf A}}_1^{-1} \left( {\mathbf R}_{\bar{\mathbf{v}}_{11}} - {\mathbf R}_{\bar{\mathbf{v}}_{12}} {\mathbf R}_{\bar{\mathbf{v}}_{22}}^{-1} {\mathbf R}_{\bar{\mathbf{v}}_{21}}\right) \bar{{\mathbf A}}_1^{-H}\), cf. (3.77).

Finally, we need to combine \(\widetilde{{\mathbf W}}_{1,\text {ICA}}\), \({\mathbf T}\) and \({\mathbf Q}\) to form the final solution:

$$\begin{aligned} \widetilde{{\mathbf W}}_{\text {ICA}}&= \begin{bmatrix} \widetilde{{\mathbf W}}_{1,\text {ICA}}&{\mathbf 0} \end{bmatrix} + \mathcal {O}(\sigma ^4),\nonumber \\ {\mathbf W}_{\text {ICA}}&=\widetilde{{\mathbf W}}_{\text {ICA}} {\mathbf T} {\mathbf Q} =\widetilde{{\mathbf W}}_{1,\text {ICA}} \begin{bmatrix} {\mathbf I}&-{\mathbf R}_{\bar{\mathbf{v}}_{12}} {\mathbf R}_{\bar{\mathbf{v}}_{22}}^{-1} \end{bmatrix} {\mathbf Q} + \mathcal {O}(\sigma ^4). \end{aligned}$$
(3.83)

Note that for noncircular complex noise, the presented transformation does not apply, since the pseudo-covariance matrix of the noise would also have to be taken into account.

3.4.1 Results for Circular Complex GGD

Here, we study the performance for the overdetermined case with \(M=6\) sensors, \(N=3\) sources, and circular complex noise. The sources follow a circular complex GGD with identical shape parameters \(c\). As in Sect. 3.3, we use the mixing matrix \({\mathbf A}=[a_{mn}]\) with \(a_{mn}=e^{-j \pi m \sin \theta _n}\), where \(\theta _n=-60^{\circ },0^{\circ },60^{\circ }\). We first consider spatially uncorrelated noise with \({\mathbf R}_{\mathbf{v}} = {\mathbf I}\). Figure 3.9a shows that for a wide range of the shape parameter \(c\), both the theoretical ICA solution \({\mathbf W}_{\text {ICA}}\) and its estimate \(\hat{{\mathbf W}}_{\text {ICA}}\) obtained by running KLD-ICA using \(L=10^4\) samples achieve an SINR close to that of the MMSE solution \({\mathbf W}_{\text {MMSE}}\). Furthermore, note that additional sensors can improve the SINR of the demixed signals: using only the first \(M=3\) sensors, \({\mathbf W}_{\text {MMSE}}\) achieves an SINR of \(12.4\,\mathrm {dB}\) (see Fig. 3.5), whereas with \(M=6\) sensors it achieves an SINR of \(17.4\,\mathrm {dB}\).

Fig. 3.9 SINR for the overdetermined case with circular complex GGD signals and circular complex noise, \(\text {SNR} = 10\,\mathrm {dB}\), \(L=10^4\) samples. \({\mathbf R}_{\mathbf{v}} = {\mathbf I}\) (a), \({\mathbf R}_{\mathbf{v}} \ne {\mathbf I}\) (b)

When the noise \(\mathbf{v}\) is correlated with the normalized correlation matrix

$$\begin{aligned} {\mathbf R}_{\mathbf{v}} = \left[ \begin{array}{rrrrrr} 1.00 + 0.00j & 0.62 + 0.23j & 0.44 - 0.16j & 0.46 + 0.11j & -0.09 + 0.26j & -0.03 + 0.09j\\ 0.62 - 0.23j & 1.00 + 0.00j & 0.56 + 0.06j & 0.47 - 0.13j & 0.44 + 0.18j & -0.09 + 0.26j\\ 0.44 + 0.16j & 0.56 - 0.06j & 1.00 + 0.00j & 0.52 + 0.09j & 0.47 - 0.13j & 0.46 + 0.11j\\ 0.46 - 0.11j & 0.47 + 0.13j & 0.52 - 0.09j & 1.00 + 0.00j & 0.56 + 0.06j & 0.44 - 0.16j\\ -0.09 - 0.26j & 0.44 - 0.18j & 0.47 + 0.13j & 0.56 - 0.06j & 1.00 + 0.00j & 0.62 + 0.23j\\ -0.03 - 0.09j & -0.09 - 0.26j & 0.46 - 0.11j & 0.44 + 0.16j & 0.62 - 0.23j & 1.00 + 0.00j \end{array}\right] , \end{aligned}$$

\({\mathbf W}_{\text {MMSE}}\) achieves an SINR of \(22.2\,\mathrm {dB}\) for an SNR of \(10\,\mathrm {dB}\) and all \(M=6\) sensors. With only the first \(M=3\) sensors, it achieves an SINR of just \(13.3\,\mathrm {dB}\). Compared to the case of uncorrelated noise, the shape of the SINR curve for \(\hat{{\mathbf W}}_{\text {ICA}}\) changes slightly, but it is still quite close to that of \({\mathbf W}_{\text {MMSE}}\) except for \(c \approx 1\) (see Fig. 3.9b).

4 Conclusion

We have derived an analytic expression for the demixing matrix of KLD-based ICA in the low-noise regime. We have considered the general noncircular complex determined case; the solutions for the circular complex and real cases can be derived as special cases. Furthermore, we have shown how to reduce the overdetermined case \(M>N\) to the determined case. Although the KLD and MMSE solutions differ, linear demixing based on these two criteria yields demixed signals with similar SINR in many cases. In practice, however, not only the bias studied in this chapter but also the variance of the estimate is important for the SINR. For the noiseless case, the variance of the estimated demixing matrix is lower bounded by the CRB derived in Sect. 3.2 and [35].