
1 Introduction

Blind source separation (BSS), an active topic in the signal processing community, aims to recover latent source signals from their linear or nonlinear mixtures, in both noise-free and noisy environments. It has become an important topic of research and development in many areas [13]. Various algorithms have been proposed in the last two decades to separate or extract source signals from their mixtures [17]. The purpose of BSS is to recover the underlying source signals from the mixtures obtained by the sensors, without a priori knowledge about the source signals or the mixing process. This is achieved by a variety of criteria, including minimization of mutual information (MI), maximization of non-Gaussianity (NG) and maximization of likelihood (ML) [8, 9]. A key factor in BSS is the assumption made about the statistical properties of the source signals, such as statistical independence among them. When the source signals are mutually independent, the BSS problem can be solved by the so-called independent component analysis (ICA) method, which has attracted considerable attention in the signal processing field, and several efficient algorithms have been proposed [13].

Despite the success of standard ICA for BSS in many applications, the original ICA algorithms assume that all sources are statistically independent random variables, and this basic assumption may not hold in some real-world situations. Several extended models have therefore been built on the basic ICA framework; this type of model can be called a dependent component analysis (DCA) model. In the multidimensional independent component analysis (MICA) model [11, 12], the first DCA model for the BSS problem, instead of assuming all the source signals to be mutually statistically independent, the source signals are divided into several groups, possibly of different sizes, such that signals from different groups are statistically independent while signals within the same group may be dependent; this DCA model is also called independent subspace analysis (ISA). DCA-related algorithms mostly concern the estimation of the entropy or of the MI. Various BSS algorithms have been developed in response to different DCA models [13–24].

Another extension of the original BSS task is the blind source extraction (BSE) problem. Common BSS algorithms separate all the source signals simultaneously by maximizing an independence measure between the output estimated signals; in some situations, however, it is more appropriate to extract only a single source of interest based on a certain fundamental signal property, which is the task of BSE [2, 10]. One of the main advantages of BSE over traditional BSS is its lower computational cost, since the degrees of freedom are reduced and the need for preprocessing or postprocessing can be relaxed. Furthermore, this procedure has great potential when the numbers of sensors and sources are unequal, unknown, or underdetermined.

In addition, compared with the overdetermined or determined BSS problem, the underdetermined one, in which the number of available recorded mixtures is smaller than the number of underlying source signals, is more difficult to treat and has attracted much attention in recent years. In this case, even if the mixing matrix is known or has been estimated, the source signals cannot be estimated directly. Therefore, in order to extract the sources from the mixed signals, some a priori knowledge about the whole system must be exploited, such as independence or sparsity [25, 26].

Assuming both extensions of BSS simultaneously, that is, DCA combined with BSE in the underdetermined situation, yields a more realistic model than either extension alone. For example, in biomedical signal processing, only a small number of sources need to be extracted, and they may be weakly spatially correlated. Cardoso showed that a strong relationship exists among the MI, the correlation and the NG of the source estimates [27]. Inspired by this conclusion, we observe that one cannot resort to minimizing the MI, but that, by maximizing the NG, the dependent sources can still be separated or extracted.

Here, we exploit weaker conditions for extracting or separating source signals, assuming that they are statistically dependent, in the underdetermined situation. Based on the generalization of the central limit theorem (CLT) to special classes of dependent variables, we tackle the DCA model by maximizing an NG measure. The proposed NG measure is defined in terms of the cumulative distribution function (CDF) instead of the widespread probability density function (PDF), using a nonparametric estimation method; the NG distance between the given CDF and the standard normal CDF, measured in the \( L^{2} \) norm, can then be estimated efficiently from the order statistics. The NG-distance-based cost function is optimized within a deflation procedure by a gradient iterative algorithm, whose local maximization performs the extraction of one dependent component.

2 Theoretical Fundamentals

The linear instantaneous BSS problem can be formulated as Eq. (1) (see [13] for an overview):

$$ {\mathbf{x}}(t) = {\mathbf{As}}(t) + {\mathbf{n}}(t) $$
(1)

where \( {\mathbf{s}}(t) = (s_{1} (t),s_{2} (t), \cdots ,s_{N} (t))^{\text{T}} \) is an unknown vector containing the \( N \) source signals. The \( M \) observed mixtures \( {\mathbf{x}}(t) = (x_{1} (t),x_{2} (t), \cdots ,x_{M} (t))^{\text{T}} \) are sometimes called sensor outputs. The matrix \( {\mathbf{A}} = [a_{ij} ] \in {\mathbb{R}}^{M \times N} \) is an unknown full column rank mixing matrix, and \( {\mathbf{n}}(t) = (n_{1} (t),n_{2} (t), \cdots ,n_{M} (t))^{\text{T}} \) is a vector of additive sensor noise.

The task of BSS consists of estimating the mixing matrix \( {\mathbf{A}} \), or its pseudoinverse, the separating (unmixing) matrix \( {\mathbf{W}} = {\mathbf{A}}^{\dag } \), in order to estimate the original source signals \( {\mathbf{s}}(t) \), given only a finite number of observations \( \{ {\mathbf{x}}(t),\;t = 1, \cdots ,T\} \).
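For concreteness, the following Python sketch generates data according to the model of Eq. (1); the Laplacian sources, the noise level and the dimensions are illustrative placeholders, not values from the paper.

```python
# Synthetic instance of the mixing model x(t) = A s(t) + n(t) of Eq. (1).
import numpy as np

rng = np.random.default_rng(0)
N, M, T = 4, 4, 10000            # sources, sensors, samples (hypothetical)

S = rng.laplace(size=(N, T))     # stand-in source signals, rows are s_i(t)
A = rng.standard_normal((M, N))  # unknown full column rank mixing matrix
noise = 0.01 * rng.standard_normal((M, T))

X = A @ S + noise                # observed mixtures, columns are x(t)
```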

To simplify the problem, most BSS algorithms first apply a spatial decorrelation (whitening) transform \( {\mathbf{V}} \) to the noiseless \( {\mathbf{x}}(t) \) to obtain the decorrelated signals \( {\mathbf{z}}(t) = (z_{1} (t),z_{2} (t), \cdots ,z_{M} (t))^{\text{T}} \), that is,

$$ {\mathbf{z}}(t) = {\mathbf{Vx}}(t) $$
(2)

So, the global mixture can be expressed as,

$$ {\mathbf{z}}(t) = {\mathbf{Us}}(t) $$
(3)

where \( {\mathbf{U = VA}} \) is an unknown orthogonal matrix.
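A minimal sketch of this whitening step under the noiseless model: it computes \( {\mathbf{V}} = {\mathbf{C}}^{-1/2} \) from the sample covariance \( {\mathbf{C}} \) of the mixtures, so that the whitened data have approximately identity covariance.

```python
# Whitening transform V of Eq. (2), via the symmetric inverse square root
# of the sensor covariance matrix.
import numpy as np

def whiten(X):
    """Return Z = V X and the whitening matrix V for data X (M x T)."""
    Xc = X - X.mean(axis=1, keepdims=True)   # remove the sample mean
    C = np.cov(Xc)                           # M x M sensor covariance
    eigvals, E = np.linalg.eigh(C)           # C = E diag(eigvals) E^T
    V = E @ np.diag(eigvals ** -0.5) @ E.T   # V = C^{-1/2}
    return V @ Xc, V

# Z, V = whiten(X)   # X from the sketch after Eq. (1); cov(Z) is near identity
```

Since \( {\mathbf{U}} \) is orthogonal after whitening, the subsequent search for separating vectors can be restricted to unit-norm directions, which the deflation procedure of Sect. 4 exploits.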

After a standard BSS method (such as ICA) is applied to the preprocessed decorrelated data \( {\mathbf{z}}(t) \), one obtains a unitary linear transformation \( {\mathbf{B}} \) and the estimates of the source signals \( {\mathbf{y}}(t) \):

$$ {\mathbf{y}}(t) = {\mathbf{Bz}}(t) = {\mathbf{BVx}}(t) = {\mathbf{BVAs}}(t) $$
(4)

Denoting \( {\mathbf{W}} = {\mathbf{BV}} \), one gets

$$ {\mathbf{y}}(t) = {\mathbf{WAs}}(t) $$
(5)

Recall that two indeterminacies cannot be resolved in BSS without some a priori knowledge: the scaling and permutation ambiguities. Thus, if the estimate \( {\hat{\mathbf{A}}} \) of the mixing matrix satisfies

$$ {\mathbf{P}} = {\mathbf{WA}} = {\hat{\mathbf{A}}}^{\dag } {\mathbf{A}} = {\mathbf{GD}} $$
(6)

where \( {\mathbf{P}} \) is the global transformation combining the mixing and separating systems, \( {\mathbf{G}} \) is a permutation matrix and \( {\mathbf{D}} \) is a nonsingular diagonal scaling matrix, then \( ({\hat{\mathbf{A}}},{\hat{\mathbf{s}}}) \) and \( ({\mathbf{A}},{\mathbf{s}}) \) are said to be related by a waveform-preserving relation.

The purpose of BSE is to design an extracting vector \( {\mathbf{w}} \) that extracts a desired source signal from the mixtures \( {\mathbf{x}}(t) \),

$$ y(t) = {\mathbf{w}}^{\text{T}} {\mathbf{x}}(t) = {\mathbf{w}}^{\text{T}} {\mathbf{As}}(t) $$
(7)

where \( y(t) \) is an estimate of a source signal, up to a scaling ambiguity.

In applications, a priori information about the desired source signal can be exploited to design a suitable extraction algorithm, so that the source with a particular property comes out first, such as the absolute normalized kurtosis value [2], temporal structure [28–30], sparseness [31], morphological structure [32] and so on. In this paper, as shown in Sect. 4, the a priori information about the desired source signal is that it maximizes the NG measure.

3 Generalized CLT and Nonparametric NG Measure

The Gaussian distribution has the maximum Shannon differential entropy (maximum uncertainty) among all continuous distributions on the real line with the same variance. This fact makes Gaussianity a very useful tool for characterizing data. In recent years, a connection between NG and ICA has been established, which can be explained by the CLT. Since the CLT does not hold for arbitrary sets of dependent variables, one may not always recover the original dependent source signals using the maximum NG criterion. Caiafa et al. give a very special condition on the sources under which linear combinations of the dependent signals are no more Gaussian than the components themselves, so that the maximum NG criterion fails; fortunately, this is not the case in most real-world scenarios [33]. Moreover, the independence of the source signals is not required when solving the blind deconvolution problem [34]. Additionally, dependent source signals can be recovered based on minimum entropy [35].

Conclusion 1.

The maximum NG method can be described as seeking a linear transformation of the mixed signals, within the space of unit-variance signals, such that the transformed signals (the source signal estimates) have maximally non-Gaussian distributions.

According to this conclusion, if we choose a robust and efficient NG measure, the source signals can be extracted or separated properly.

A natural measure of NG, based on the \( L^{2} \) distance between an estimated PDF and the Gaussian PDF, was introduced in [19, 33]. This NG measure is defined as Eq. (8):

$$ d(y,g) = \left( {\int {\left[ {p_{g} (y) - p_{y} (y)} \right]^{2} } dy} \right)^{1/2} $$
(8)

where the integral is defined in the Lebesgue sense and is taken over the whole range of the variable \( y \), and \( p_{g} (y) \) is the Gaussian PDF with the same variance as the variable \( y \), whose PDF is \( p_{y} (y) \). In this paper, we build the NG measure on the CDF instead of the traditional PDF. Let \( F_{y} \) and \( F_{g} \) denote the CDFs of the random variable \( y \) to be analyzed and of its equivalent Gaussian variable, respectively; then the CDF-based NG measure can be defined as Eq. (9):

$$ d(F_{{y_{i} }} ,F_{g} ) = \left( {\int_{ - \infty }^{\infty } {\left[ {F_{{y_{i} }} (x) - F_{g} (x)} \right]^{2} dx} } \right)^{1/2} $$
(9)

The NG measure \( d(F_{{y_{i} }} ,F_{g} ) \) possesses the following property, which any distance measure should have:

$$ \left\{ \begin{aligned} d(F_{{y_{i} }} ,F_{g} ) = 0,\quad c = 2 \hfill \\ d(F_{{y_{i} }} ,F_{g} ) > 0,\quad c \ne 2 \hfill \\ \end{aligned} \right. $$
(10)

where \( c \) is the shape parameter in the generalized Gaussian distribution (GGD).

The PDF of the GGD is \( p(y) = \frac{c}{2\gamma \Gamma (1/c)}\exp \left[ { - \left( {\left| {y - \mu_{y} } \right|/\gamma } \right)^{c} } \right] \), where \( \Gamma (z) = \int_{0}^{\infty } {e^{ - t} t^{z - 1} } dt \) is the Gamma function and \( \gamma = \sqrt {\sigma^{2} \Gamma (1/c)/\Gamma (3/c)} \) is the scale parameter. By varying \( c \) \( (c > 0) \), a family of distributions with different sharpness is obtained. From the relationship between the CDF-based NG measure \( d(F_{{y_{i} }} ,F_{g} ) \) and the shape (Gaussianity) parameter \( c \), we conclude that \( d(F_{{y_{i} }} ,F_{g} ) \) can be used as a measure of NG: as \( c \) varies from \( 0^{ + } \) to infinity, the distance has a single global minimum, attained at \( c = 2 \). In other words, the measure of NG attains its global minimum exactly when the analyzed distribution is Gaussian.
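This property can be checked numerically; the following sketch (not from the paper) evaluates the distance of Eq. (9) between a unit-variance GGD and the standard normal CDF on a grid, using scipy.stats.gennorm, whose shape parameter coincides with \( c \).

```python
# Numerical check that d(F_y, F_g) of Eq. (9) vanishes only at c = 2.
import numpy as np
from scipy.stats import gennorm, norm
from scipy.special import gamma

def cdf_distance(c):
    """L2 distance between a unit-variance GGD(c) CDF and the N(0,1) CDF."""
    grid = np.linspace(-8.0, 8.0, 4001)
    dx = grid[1] - grid[0]
    scale = np.sqrt(gamma(1.0 / c) / gamma(3.0 / c))  # unit-variance scaling
    diff = gennorm.cdf(grid, c, scale=scale) - norm.cdf(grid)
    return np.sqrt(np.sum(diff ** 2) * dx)            # Riemann-sum integral

for c in (0.5, 1.0, 2.0, 4.0, 8.0):
    print(f"c = {c}: d = {cdf_distance(c):.4f}")      # minimum (= 0) at c = 2
```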

By the definition of \( d(F_{{y_{i} }} ,F_{g} ) \), we must estimate the CDF \( F_{{y_{i} }} \). The next question is therefore how to obtain the estimate \( \hat{F}_{{y_{i} }} \) efficiently; nonparametric histograms would incur a high computational cost. Alternatively, an equivalent measure can be established in terms of the inverse CDFs, defined as:

$$ Q_{{y_{i} }} = F_{{y_{i} }}^{ - 1} ,\quad Q_{g} = F_{g}^{ - 1} $$
(11)

The property in Eq. (10) carries over from the CDF \( F \) to its inverse \( Q \), so the corresponding NG distance satisfies:

$$ \left\{ \begin{aligned} D(Q_{{y_{i} }} ,Q_{g} ) = 0,\quad c = 2 \hfill \\ D(Q_{{y_{i} }} ,Q_{g} ) > 0,\quad c \ne 2 \hfill \\ \end{aligned} \right. $$
(12)

Owing to the relationship between \( Q \) and \( F \), both exhibit the same monotonicity properties on the relevant intervals of \( c \); consequently, the distance \( d\left( { \cdot } \right) \) and its counterpart \( D\left( { \cdot } \right) \) share the same properties. As a result, one can conclude that

$$ D(Q_{{y_{i} }} ,Q_{g} ) = \left( {\int_{0}^{1} {\left[ {Q_{{y_{i} }} (x) - Q_{g} (x)} \right]^{2} dx} } \right)^{1/2} $$
(13)

is also a proper NG measure. In order to estimate \( D(Q_{{y_{i} }} ,Q_{g} ) \) from discrete samples, we must first estimate \( Q_{{y_{i} }} \). This can be done robustly and in a simple practical way by using the order statistics (OS) of a large set of discrete time samples. The quantiles of the CDF can then be constructed from the OS, which yields a consistent estimator of the distribution [34]:

$$ \hat{Q}_{{y_{i} }} \left( \frac{k}{T} \right) = y_{i(k)} \; \Leftrightarrow \hat{F}\left( {y_{i(k)} } \right) = \;\frac{k}{T} $$
(14)

As a result, the estimation of NG measure using the OS can be expressed as:

$$ \hat{D}(Q_{{y_{i} }} ,Q_{g} ) = \frac{1}{T}\left( {\sum\limits_{k = 1}^{T} {\left[ {y_{i(k)} - Q_{g} \left( \frac{k}{T} \right)} \right]^{2} } } \right)^{1/2} $$
(15)

where \( Q_{g} \left( {{k \mathord{\left/ {\vphantom {k T}} \right. \kern-0pt} T}} \right) \) is the \( {k \mathord{\left/ {\vphantom {k T}} \right. \kern-0pt} T} \) quantile of the equivalent Gaussian distribution.
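A direct implementation sketch of the estimator in Eq. (15); the only liberty taken is the plotting position \( k/(T + 1) \) in place of \( k/T \), a common practical adjustment that avoids the infinite Gaussian quantile at \( k = T \).

```python
# Order statistics estimate of the NG measure, Eq. (15).
import numpy as np
from scipy.stats import norm

def ng_measure(y):
    """Nonparametric NG measure of a 1-D sample array y, assumed to be
    zero-mean and unit-variance."""
    T = y.size
    y_os = np.sort(y)                              # order statistics y_(k)
    q_g = norm.ppf(np.arange(1, T + 1) / (T + 1))  # Gaussian quantiles Q_g(k/T)
    return np.sqrt(np.sum((y_os - q_g) ** 2)) / T
```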

4 Nonparametric NG Algorithm for Dependent Source Signals

Conclusion 2.

The nonparametric NG measure \( \hat{D}(Q_{{y_{i} }} ,Q_{g} ) \) reaches a local maximum at any output channel for each component if \( {\mathbf{b}}_{i} \) is constrained to have unit norm [36].

To extract a different source signal at each output channel, a multistage deflation procedure must be applied to the separation system. The NG measure \( \hat{D}(Q_{{y_{i} }} ,Q_{g} ) \) is maximized at each output channel successively, under the constraint that the vector \( {\mathbf{b}}_{i} \) be orthonormal to the previously obtained vectors; the separation matrix is composed of all the vectors \( {\mathbf{b}}_{i} \). Writing Eq. (4) in vector form:

$$ y_{i} (t) = {\mathbf{b}}_{i}^{\text{T}} {\mathbf{z}}(t) $$
(16)

where \( {\mathbf{b}}_{i}^{\text{T}} \) is the \( i \)-th row of the separation matrix \( {\mathbf{B}} \), the goal is to update \( {\mathbf{b}}_{i} \) at each stage by optimizing a cost function \( J({\mathbf{b}}_{i} ) \). We take the objective function \( J({\mathbf{b}}_{i} ) \) as:

$$ J({\mathbf{b}}_{i} ) = D(Q_{{y_{i} }} ,Q_{g} ) $$
(17)

\( J({\mathbf{b}}_{i} ) \) is optimized by the stochastic gradient rule of the constrained optimization method, subject to the constraints [36]:

$$ \left\{ \begin{aligned} & {\mathbf{b}}_{i} (k + 1) = {\mathbf{b}}_{i} (k) + \mu \nabla J|_{{{\mathbf{b}}_{i} (k)}} \\ & s.t. \, {\mathbf{b}}_{i} {\text{ is orthonormal to }}\{ {\mathbf{b}}_{1} , \cdots ,{\mathbf{b}}_{i - 1} \} \\ \end{aligned} \right. $$
(18)

Denoting \( Y_{i} (k) = y_{i(k)} - Q_{g} \left( k/T \right) \), the gradient of \( J({\mathbf{b}}_{i} ) \) in Eq. (15) is:

$$ \left. {\nabla J} \right|_{{{\mathbf{b}}_{i} (k)}} = \left. {\frac{1}{2T}\left( {\sum\limits_{t = 1}^{T} {\left[ {Y_{i} (t)} \right]^{2} } } \right)^{ - 1/2} \frac{{d\left( {\sum\limits_{t = 1}^{T} {\left[ {Y_{i} (t)} \right]^{2} } } \right)}}{{d{\mathbf{b}}_{i} }}} \right|_{{{\mathbf{b}}_{i} (k)}} = \left. {\frac{1}{T}\left( {\sum\limits_{t = 1}^{T} {\left[ {Y_{i} (t)} \right]^{2} } } \right)^{ - 1/2} \sum\limits_{t = 1}^{T} {Y_{i} (t)\,{\mathbf{z}}\frac{{dy_{i(t)} }}{{dy_{i} }}} } \right|_{{{\mathbf{b}}_{i} (k)}} $$
(19)

where \( \left. {\frac{{dy_{i(t)} }}{{dy_{i} }}} \right|_{{{\mathbf{b}}_{i} (k)}} = {\mathbf{e}}_{t} = [0,0, \cdots ,0,1,0, \cdots ,0]^{\text{T}} \) with \( {\mathbf{e}}_{t} (l) = \left. {\left\{ \begin{aligned} & 1\quad {\text{if}}\;y_{i} (l) = y_{i(t)} \\ & 0\quad {\text{else}} \\ \end{aligned} \right.} \right|_{t = 1, \cdots ,T} \); that is, \( {\mathbf{e}}_{t} \) selects the time instant at which the \( t \)-th order statistic occurs, so that \( {\mathbf{z}}\,{\mathbf{e}}_{t} \) is the corresponding sample of \( {\mathbf{z}} \).

After the \( i \)-th source signal is extracted, \( {\mathbf{b}}_{i} \) must be normalized and projected onto the subspace orthogonal to the vectors obtained at all previous stages, using the projection matrix \( {\mathbf{C}}_{i - 1} \), whose expression is

$$ {\mathbf{C}}_{i - 1} = {\mathbf{I}} - {\mathbf{B}}_{i - 1} ({\mathbf{B}}_{i - 1}^{\text{T}} {\mathbf{B}}_{i - 1} )^{ - 1} {\mathbf{B}}_{i - 1}^{\text{T}} $$
(20)

where \( {\mathbf{B}}_{i - 1} = ({\mathbf{b}}_{1} , \cdots ,{\mathbf{b}}_{i - 1} ) \).
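Putting Eqs. (16)–(20) together, a minimal sketch of the deflation procedure could read as follows; the step size, iteration count and random initialization are illustrative choices, not values from the paper, and the projector simplifies to \( {\mathbf{I}} - {\mathbf{B}}_{i - 1} {\mathbf{B}}_{i - 1}^{\text{T}} \) because the stored vectors are kept orthonormal.

```python
# Deflation-based extraction: gradient ascent on the NG measure of
# Eq. (15) under the constraints of Eqs. (18) and (20).
import numpy as np
from scipy.stats import norm

def extract_components(Z, n_components, mu=0.1, n_iter=500, seed=0):
    """Sequentially extract components from whitened data Z (M x T)."""
    M, T = Z.shape
    rng = np.random.default_rng(seed)
    q_g = norm.ppf(np.arange(1, T + 1) / (T + 1))  # Gaussian quantiles
    B = np.zeros((M, 0))                           # b_1, ..., b_{i-1}
    for _ in range(n_components):
        C = np.eye(M) - B @ B.T                    # projector of Eq. (20)
        b = C @ rng.standard_normal(M)
        b /= np.linalg.norm(b)
        for _ in range(n_iter):
            y = b @ Z                              # y_i(t) = b_i^T z(t), Eq. (16)
            order = np.argsort(y)                  # time indices of the OS
            Yk = y[order] - q_g                    # Y_i(k) of Eq. (19)
            grad = Z[:, order] @ Yk / (T * np.sqrt(np.sum(Yk ** 2)))
            b = C @ (b + mu * grad)                # ascend, then re-project
            b /= np.linalg.norm(b)                 # keep unit norm
        B = np.column_stack([B, b])
    return B

# Usage on the whitened mixtures of Sect. 2:
# B = extract_components(Z, n_components=4)
# Y = B.T @ Z    # estimated sources, up to permutation and scaling
```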

5 Computer Simulations

To show the performance and validity of the proposed algorithm, Matlab simulations are presented below, divided into four examples. The statistical performance, or accuracy, was measured by the signal-to-interference ratio (SIR) index,

$$ \text{SIR}\left( {s_{i} ,y_{j} } \right) = 10\log \frac{{\sum\limits_{t = 1}^{T} {\left( {s_{i} (t)} \right)^{2} } }}{{\sum\limits_{t = 1}^{T} {\left( {|s_{i} (t)| - |y_{j} (t)|} \right)^{2} } }},\quad i,j = 1, \cdots ,N $$

where \( y_{j} (t) \) is the estimate of \( s_{i} (t) \); the magnitudes in the denominator make the index insensitive to the sign ambiguity of the estimates.
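A one-function sketch of this index; \( s_{i} \) and \( y_{j} \) are assumed to be arrays of equal length, with \( y_{j} \) normalized to the scale of the corresponding source.

```python
# SIR (in dB) as defined above; the magnitudes make it sign-invariant.
import numpy as np

def sir_db(si, yj):
    return 10.0 * np.log10(np.sum(si ** 2) /
                           np.sum((np.abs(si) - np.abs(yj)) ** 2))
```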

5.1 Simulations on Determined BSS Case

In this simulation, we use \( N = 4 \) source signals extracted from a real-world photo. Each source signal is obtained by taking pixel columns of the image and stacking them one by one into a one-dimensional signal (a sketch of this construction is given after Table 1). By selecting different intervals between the columns of the image, we can control the level of dependence between the source signals. We chose columns of the photo that are relatively far apart, so the sources are mutually weakly correlated. The sources' correlation coefficients are shown in Table 1.

Table 1. The correlation coefficients between different source signals
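A sketch of this source construction, assuming a grayscale image file; the file name, the column indices and the number of columns per source are hypothetical.

```python
# Build weakly dependent 1-D sources from pixel columns of one image.
import numpy as np
from PIL import Image

img = np.asarray(Image.open("photo.png").convert("L"), dtype=float)

def column_sources(img, start_cols, width=4):
    """Stack `width` consecutive pixel columns, starting at each index in
    `start_cols`, into one 1-D signal per source; larger gaps between
    the start columns give more weakly correlated sources."""
    S = np.vstack([img[:, c:c + width].flatten(order="F")
                   for c in start_cols])
    S = S - S.mean(axis=1, keepdims=True)       # center each source
    return S / S.std(axis=1, keepdims=True)     # unit variance

S = column_sources(img, start_cols=[0, 80, 160, 240])   # N = 4 sources
# np.corrcoef(S) gives the analogue of the coefficients in Table 1.
```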

The input mixed signals of the algorithm are generated by mixing the four source signals with a \( 4 \times 4 \) random mixing matrix whose elements are drawn from \( N(0,1) \). After convergence of the proposed algorithm, the average performance, evaluated by the SIR over 10 experiments, is shown in Table 2.

Table 2. Average SIR for different source signals using the proposed algorithm over 10 experiments

5.2 Simulations on Underdetermined BSE Case

In this example, we use 4 source signals extracted from the same real-world photo as in Example 1. The \( 3 \times 4 \) mixing matrix is generated with Matlab's randn function. The sources' correlation coefficients are shown in Table 3.

Table 3. The correlation coefficients between different source signals

After convergence of the proposed algorithm, the average performance, evaluated by the SIR over 10 experiments, is shown in Table 4. The correlation coefficients between the source signals and their corresponding extracted signals are shown in Table 5.

Table 4. Average SIR for different source signals using the proposed algorithm over 10 experiments
Table 5. The correlation coefficients of source signals and their corresponding extracted signals using the proposed algorithm

5.3 Simulations on Effect of Strong Correlations

To verify the performance of the proposed algorithm under strong correlations between the source signals, we chose four face images from the face database of [37] as the sources. The sources' correlation coefficients are shown in Table 6. The input mixed signals are generated by mixing the four source signals with a \( 4 \times 4 \) random mixing matrix whose elements are drawn from \( N(0,1) \). After convergence of the proposed algorithm, the average performance, evaluated by the SIR over 10 experiments, is shown in Table 7. The definition of the SIR for images can be found in [38].

Table 6. The correlation coefficients between four different face images
Table 7. Average SIR for different source signals using the proposed algorithm over 10 experiments

5.4 Simulations on Comparison with Other Algorithms

In this simulation, under the same convergence conditions, the proposed algorithm was compared with other popular BSS algorithms, namely FastICA (with the nonlinearity chosen as \( y^{3} \)), COMBI, SOBI and the JADEop algorithm [39], along two sets of criteria: statistical and computational. The computational load was measured as the CPU time needed for convergence (using Matlab R2010b on a 3.0 GHz Pentium 4 computer). The four source signals and the mixed signals are the same as in Sect. 5.1.

The statistical performance, or accuracy, was measured using an alternative popular BSS performance index, the cross-talking error index \( E \), defined in [2]. The separation results for the 4 different sources, averaged over 100 Monte Carlo simulations, are shown in Table 8 for the various BSS algorithms.

Table 8. The separation results of various BSS algorithms

From Table 8 we conclude that the proposed algorithm achieves ideal separation results for statistically dependent source signals. The bounded component analysis (BCA) algorithm also obtains ideal results for these sources, whereas the other four popular BSS algorithms do not all work well in this setting; moreover, even when the boundedness assumptions on the source signals are not properly satisfied, the proposed algorithm still works well, so our method has a wider field of application. Regarding the computational load, the proposed algorithm, being nonparametric, requires more computation than all the other algorithms, but it shows better convergence behavior, and the nonparametric estimation is robust. Considering both convergence performance and computational load, the proposed algorithm is well suited to the DCA situation.

6 Conclusions

Most state-of-the-art algorithms for solving the BSS or BSE problem rely on independence, or at least second-order statistical assumptions, about the source signals. In this paper, we developed a nonparametric BSE algorithm for statistically dependent source signals using a nonparametric NG measure. We show that maximization of the NG measure can separate not only statistically independent but also dependent source signals, even in the underdetermined BSE situation. The NG measure is defined by a statistical distance between distributions based on the CDF instead of the traditional PDF, which can be estimated efficiently from the order statistics. Simulation results on both synthetic and real-world data show that the proposed nonparametric algorithm is able to extract the dependent source signals and yields ideal performance. The next goal of this study is to utilize the proposed method to extract ERP signals in brain-computer interface (BCI) applications.