1 Introduction

In signal and image processing, we frequently need to address the problem of recovering unknown sources from their mixtures, that is, the blind source separation (BSS) problem, which can be described as the following linear mixing model

$$\begin{aligned} \varvec{Y}=\varvec{\mathrm{AX}}+\varvec{V} \end{aligned}$$
(1)

where \(\varvec{Y}=[\varvec{y}_{1},\varvec{y}_{2},\ldots ,\varvec{y}_{m}]^{T}\in \mathbb {R}^{m\times N}\) and \(\varvec{X}=[\varvec{x}_{1},\varvec{x}_{2},\ldots ,\varvec{x}_{n}]^{T} \in \mathbb {R}^{n\times N}\) denote the observed matrix and the source matrix, respectively, \(\varvec{A}\in {\mathbb {R}^{m\times n}}\) is the mixing matrix, and \(\varvec{V}\in {\mathbb {R}^{m\times N}}\) denotes the additive Gaussian noise. Notice that \(\varvec{X}\) and \(\varvec{Y}\) contain n sources and m mixtures, respectively, and N is the data size. The task of BSS is to estimate the sources \(\varvec{X}\) (and the mixing matrix \(\varvec{A}\)) without any prior knowledge other than the observation \(\varvec{Y}\). As is well known, this problem does not have a unique solution in general. Therefore, it is necessary to impose some restrictions on the separation process to make the sources distinguishable.
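
For concreteness, the following NumPy sketch generates synthetic data according to model (1); the dimensions and noise level are illustrative choices of ours, not values used in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

n, m, N = 2, 4, 1000                # sources, mixtures, data size (illustrative)
X = rng.laplace(size=(n, N))        # synthetic sources
A = rng.standard_normal((m, n))
A /= np.linalg.norm(A, axis=0)      # column-normalized mixing matrix
V = 0.01 * rng.standard_normal((m, N))

Y = A @ X + V                       # observed mixtures, model (1)
```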

Independent component analysis (ICA) [1], one of the classical approaches to the over-determined BSS problem (\(m\ge n\)), assumes that the sources are non-Gaussian and as statistically independent as possible. It has led to many well-known approaches, such as Infomax [2], maximum likelihood estimation [3], the maximum a posteriori (MAP) estimator [4] and FastICA [1]. Infomax, which uses a stochastic gradient algorithm to maximize the likelihood, may require some hand-tuning and often fails to converge [5], or converges only slowly. Therefore, various methods have been proposed to maximize the Infomax likelihood much faster by using curvature information, especially the complete set of second-order derivatives (the Hessian). However, the Hessian is a large object and incurs a high computational cost for large data sets. To address the cost issue, [6] adopted a truncated Newton algorithm; another approach is to use approximations of the Hessian, such as the simple quasi-Newton methods in [7] and [8], and the trust-region method in [9]. Nevertheless, the work in [10] compared these methods [7,8,9] and found that the Hessian approximations do yield a low cost per iteration, but that they are not accurate enough on real data. Therefore, [10] proposed the Preconditioned ICA for Real Data (Picard) algorithm, a relative L-BFGS algorithm preconditioned with sparse Hessian approximations, to overcome this problem. However, classical ICA algorithms are weak at separating noisy sources.

Meanwhile, Lee and Seung proposed another effective method, nonnegative matrix factorization (NMF), which takes advantage of nonnegativity constraints. These constraints lead to a parts-based representation because they allow only additive, not subtractive, combinations [11]. Afterward, Boutsidis and Gallopoulos described nonnegative double singular value decomposition (NNDSVD) [12], a new method designed to enhance the initialization stage of NMF; it is well suited to initializing NMF algorithms with sparse factors, and many numerical examples showed that NNDSVD makes the approximation error of many NMF algorithms decrease rapidly.

During the past decade, sparsity has been widely used in many fields [13,14,15,16,17,18]. A vast body of work has attempted to exploit sparsity for the BSS problem, and many sparsity-based approaches have been proposed [19,20,21,22,23,24]. They are collectively termed sparse component analysis (SCA), which means that the sources can be sparsely represented using a known common basis or dictionary. For instance, morphological component analysis (MCA) [19] is a decomposition method based on the sparse representation of signals over a signal dictionary; it showed that different dictionaries produce entirely different representations and that the sparser the representation, the better the separation. Bobin et al. then described an extension of MCA, multichannel morphological component analysis (MMCA) [20], adapted to multichannel data. MMCA takes advantage of a large over-complete dictionary to obtain different sparse representations for the n sources based on their morphological diversity. In [21], generalized morphological component analysis (GMCA) was proposed for the much more general case in which each source is a linear combination of several components and each component is sparse in a given dictionary. Moreover, Abolghasemi et al. proposed BMMCA [22], which adaptively learns the dictionaries from the mixed images via K-SVD, adapting MMCA to cases where known dictionaries are not available. Further, the work in [24] extended BMMCA so that only one dictionary is used to sparsely represent the different sources. It incorporated the SimCO optimization framework [25] into the two-stage dictionary learning and mixture learning, thus significantly reducing the implementation effort.

It is worth noting that the above sparsity-related methods all rely on the synthesis model. In this model, a signal \(\varvec{w}\in {\mathbb {R}^{d}}\) can be composed as \(\varvec{w}=\varvec{D\alpha }\), where \(\varvec{D}\in {\mathbb {R}^{d\times k}}\) \((d\ge k)\) is a dictionary and the representation coefficient \(\varvec{\alpha }\in {\mathbb {R}^{k}}\) is expected to be sparse, i.e., \(\Vert \varvec{\alpha }\Vert _{0}=c\ll k\). As mentioned above, the synthesis model has been extensively applied to the BSS problem. Its dual counterpart, the analysis sparse model, has been barely touched in sparse representation [26, 27]. The analysis model is described as \(\varvec{z}=\varvec{\varOmega w}\), where \(\varvec{\varOmega }\in {\mathbb {R}^{p\times d}}\) \((p\ge d)\) is an analysis dictionary. The implication of this model is that the signal \(\varvec{w}\in {\mathbb {R}^{d}}\) can be transformed into a low-dimensional subspace via the linear operator \(\varvec{\varOmega }\). Therefore, the representation coefficient \(\varvec{z}\) is assumed to be sparse, i.e., \(\Vert \varvec{z}\Vert _{0}=c\ll p\).
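
The structural difference between the two models can be illustrated with a short NumPy sketch (our own illustration; all sizes are arbitrary): in the synthesis model the signal is built from a few columns of \(\varvec{D}\), whereas in the analysis model sparsity arises because many rows of \(\varvec{\varOmega }\) are orthogonal to the signal.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, p, c = 64, 32, 80, 4          # illustrative sizes only

# Synthesis model: w = D @ alpha with a sparse coefficient vector alpha.
D = rng.standard_normal((d, k))
alpha = np.zeros(k)
alpha[rng.choice(k, size=c, replace=False)] = rng.standard_normal(c)
w_syn = D @ alpha                   # ||alpha||_0 = c << k

# Analysis model: z = Omega @ w is sparse because w lies in the null space
# of the co-support rows of Omega (the rows orthogonal to w).
Omega = rng.standard_normal((p, d))
Omega /= np.linalg.norm(Omega, axis=1, keepdims=True)   # row-normalized
Lam = rng.choice(p, size=d - c, replace=False)          # co-support rows
_, _, Vt = np.linalg.svd(Omega[Lam])                    # (d-c) x d submatrix
w_ana = Vt[-c:].T @ rng.standard_normal(c)              # w in null(Omega_Lam)
z = Omega @ w_ana
print(int(np.sum(np.abs(z) > 1e-10)), "nonzeros out of", p)   # about p - (d - c)
```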

Although the two models have a very similar structure, the theoretical and simulation results provided in [28] indicate that there is a considerable gap between them, especially in the over-complete case. Moreover, in some respects, the analysis model has advantages over the synthesis model. For example, the analysis model contains a larger number of low-dimensional subspaces for the same subspace dimension \((k=d-l)\), which enriches its descriptive power [26]. In addition, in the synthesis framework, where only a small number of atoms are used to represent each signal, the significance of every atom grows enormously; any wrong choice could potentially lead to a 'domino effect.' In the analysis formulation, however, all atoms take an equal part in describing the signal, which minimizes the dependence on each individual atom and stabilizes the recovery process [28]. More generally, analysis-based methods are a very common structure in image processing and computer vision applications. Due to the smaller dimension of the unknown, compared with a similar-sized synthesis-based method, the analysis-based method leads to a simpler optimization problem in the over-complete case \((p>d)\), which is considerably easier to solve [28]. Therefore, it is natural to incorporate the analysis sparse constraint into the BSS problem.

Recently, one of the effective related attempts was the work in [29], whose experimental results demonstrated that an approach based on the analysis sparse model has better separation performance. Nevertheless, it should be noticed that this work replaced the \(l_{0}\)-norm with the \(l_{1}\)-norm as the sparsity measure to obtain an approximate solution, which will certainly reduce sparsity. The \(l_{0}\)-norm is the most intuitive measure of sparsity. Only when certain conditions, such as the exact recovery condition (ERC) and the restricted isometry property (RIP) [26], are met can the \(l_{1}\)-norm be considered equivalent to the \(l_{0}\)-norm.

Hence, in this paper, we propose a new method, termed analysis BSS (ABSS), to make better use of the sparsity constraint. Different from most BSS algorithms, which are based on the synthesis model, this method is based on the analysis model, and it directly uses the \(l_{0}\)-norm as the measure of sparsity to obtain a more exact solution. We thus arrive at a novel objective function for the BSS problem based on the analysis sparse model. In order to simplify this problem, the solution process is divided into three stages: a dictionary learning stage, a sources estimating stage and a mixing matrix estimating stage. In the first stage, we adopt Analysis KSVD [30] to adaptively estimate the analysis dictionary from the mixed sources. In the second and third stages, the sources and the mixing matrix are each obtained uniquely by a simple least squares linear regression. Finally, the results indicate that using the \(l_{0}\)-norm enhances the separability of the sources.

The organization of this manuscript is as follows. Sect. 2 introduces the proposed framework. The details of our algorithm are presented in Sect. 3. Finally, simulations comparing the proposed algorithm with several typical methods are provided in Sect. 4 to verify its competitive separation performance.

2 Problem formulation

Firstly, BSS problem (1) can be written as the following optimization problem

$$\begin{aligned} \min _{\varvec{A},\varvec{X}}\Vert \varvec{Y}-\varvec{\mathrm{AX}}\Vert _{\mathrm{F}}^2 \end{aligned}$$
(2)

where \(\Vert \cdot \Vert _{F}\) denotes the Frobenius norm. Notice that in this problem, the mixing matrix \(\varvec{A}\in {\mathbb {R}^{m\times n}}\) and the source matrix \(\varvec{X}:\{\varvec{x}_{i}\}_{i=1}^{n}\in {\mathbb {R}^{n\times N}}\) are completely unknown. We can only use the known observed matrix \(\varvec{Y}:\{\varvec{y}_{i}\}_{i=1}^{m}\in {\mathbb {R}^{m\times N}}\), which is the input of our algorithm, to estimate \(\varvec{X}\) and \(\varvec{A}\). Therefore, this problem is ill-posed, and here we impose the analysis sparse constraint on it, that is, the forward projection of each source \(\varvec{x}_{i}\) onto the basis elements is sufficiently sparse. The sparse representation of \(\varvec{x}_{i}\) under the analysis model can be described as

$$\begin{aligned} \min _{\varvec{\varOmega }_{i},\varvec{s}_{i}}\Vert \varvec{s}_{i}\Vert _{0} \quad s.t. \ \varvec{s}_{i}=\varvec{\varOmega }_{i} \varvec{x}_{i} \end{aligned}$$
(3)

where \(\varvec{s}_{i}\) denotes the representation coefficient of \(\varvec{x}_{i}\), most of whose entries are zero, and \(\Vert \cdot \Vert _{0}\) is the \(\ell _{0}\)-norm, which counts the number of nonzeros. In particular, \(\varvec{\varOmega }_{i}\in \mathbb {R}^{p\times d}\) is the analysis dictionary corresponding to the ith source. Optimization problem (3) can be written in its Lagrangian form

$$\begin{aligned} \min _{\varvec{\varOmega }_{i},\varvec{s}_{i}}\Vert \varvec{s}_{i}-\varvec{\varOmega }_{i} \varvec{x}_{i}\Vert _{\mathrm{F}}^2+\mu \Vert \varvec{s}_{i}\Vert _{0} \end{aligned}$$
(4)

Note that formula (4) describes the analysis model, which differs from the synthesis-model expression in BMMCA [22]. In addition, the \(l_{0}\)-norm is directly used as the measure of sparsity, which differs from the method proposed by Fang [29]. Overall, the BSS problem based on the analysis sparse model can be described as the joint optimization problem

$$\begin{aligned} \min _{\varvec{A},\varvec{\varOmega }_{i},\varvec{x}_{i},\varvec{s}_{i}}\lambda \Vert \varvec{Y}-\varvec{\mathrm{AX}}\Vert _{\mathrm{F}}^2 +\sum _{i}\mu \Vert \varvec{s}_{i}\Vert _{0}+\sum _{i}\Vert \varvec{s}_{i}-\varvec{\varOmega }_{i}\varvec{x}_{i}\Vert _{\mathrm{F}}^2 \end{aligned}$$
(5)

where the scalars \(\lambda\) and \(\mu\) control the noise power and the sparsity degree, respectively. To simplify the problem, the dictionary, the sources and the mixing matrix are estimated alternately, so overall optimization problem (5) is divided into the following stages

  • Dictionary learning stage (with \(\varvec{X}\) and \(\varvec{A}\) fixed)

    $$\begin{aligned} \min _{\varvec{\varOmega }_{i},\varvec{s}_{i}} \Vert \varvec{s}_{i}\Vert _{0} \quad s.t.\ \Vert \varvec{s}_{i}-\varvec{\varOmega }_{i} \varvec{x}_{i}\Vert _{\mathrm{F}}^2\le \delta \qquad i=1,\ldots ,n \end{aligned}$$
    (6)
  • Sources estimating stage (with \(\varvec{\varOmega }_{i}\), \(\varvec{s}_{i}\), and \(\varvec{A}\) fixed)

    $$\begin{aligned} \min _{\varvec{X}} \lambda \Vert \varvec{Y}-\varvec{\mathrm{AX}}\Vert _{\mathrm{F}}^2+\sum _{i}\Vert \varvec{s}_{i}-\varvec{\varOmega }_{i} \varvec{x}_{i}\Vert _{\mathrm{F}}^2 \end{aligned}$$
    (7)
  • Mixing matrix estimating stage (with \(\varvec{X}\), \(\varvec{\varOmega }_{i}\), and \(\varvec{s}_{i}\) fixed)

    $$\begin{aligned} \min _{\varvec{A}}\Vert \varvec{Y}-\varvec{\mathrm{AX}}\Vert _{\mathrm{F}}^2 \end{aligned}$$
    (8)

In the overall separation process, the above three optimization subproblems should be addressed iteratively until convergence. The specific details of the proposed algorithm, coined Analysis BSS (ABSS), are described in Sect. 3.

3 ABSS algorithm

Before presenting the ABSS algorithm, we introduce some conventions on the variables in the model, which make the proposed method easier to apply, for example to large image separation problems. Here, the jth \(\sqrt{N}\times \sqrt{N}\) observation image \(\mathcal {Y}_{j}\) is represented as the vector \(\varvec{y}_{j}\in \mathbb {R}^{N\times 1}\), and the ith \(\sqrt{N}\times \sqrt{N}\) source image \(\mathcal {X}_{i}\) is likewise vectorized as \(\varvec{x}_{i}\in \mathbb {R}^{N\times 1}\). Then, we divide each \(\varvec{x}_{i}\) and each \(\varvec{y}_{j}\) into M patches of size \(d\times 1\), which may overlap, where \(\varvec{x}_{i,f}\) is the fth patch of \(\varvec{x}_{i}\) and \(\varvec{y}_{j,f}\) is the fth patch of \(\varvec{y}_{j}\). The observation matrix \(\varvec{y}_{:,f}=[\varvec{y}_{1,f},\varvec{y}_{2,f},\ldots ,\varvec{y}_{m,f}]^{T}\in \mathbb {R}^{m\times d}\), \(f=1,\ldots ,M\), can be modeled as

$$\begin{aligned} \varvec{y}_{:,f}=\varvec{\mathrm{Ax}}_{:,f}+\varvec{v} \end{aligned}$$
(9)

where \(\varvec{x}_{:,f}=[\varvec{x}_{1,f},\varvec{x}_{2,f},\ldots ,\varvec{x}_{n,f}]^{T}\in \mathbb {R}^{n\times d}\) is the source matrix and \(\varvec{v}\) is the additive noise. Thus, the large-scale image separation problem (1) is transformed into M small-scale separation problems (9). Solving overall BSS problem (5) amounts to solving the following problem patch by patch

$$\begin{aligned} \min _{\varvec{A},\varvec{\varOmega }_{i},\varvec{x}_{i,f},\varvec{s}_{i,f}}\lambda \Vert \varvec{y}_{:,f} -\varvec{\mathrm{Ax}}_{:,f}\Vert _{\mathrm{F}}^2&+\sum _{i}\mu \Vert \varvec{s}_{i,f}\Vert _{0}\nonumber \\&+\sum _{i}\Vert \varvec{s}_{i,f}-\varvec{\varOmega }_{i} \varvec{x}_{i,f}\Vert _{\mathrm{F}}^2 \end{aligned}$$
(10)

or equivalently,

$$\begin{aligned} \min _{\varvec{A},\varvec{\varOmega }_{i},\varvec{x}_{i,f},\varvec{s}_{i,f}}\lambda \Vert \varvec{E}_{i,f} -\varvec{a}_{:,i} \, \varvec{x}_{i,f}^{T}\Vert _{\mathrm{F}}^2&+\sum _{i}\mu \Vert \varvec{s}_{i,f}\Vert _{0}\nonumber \\&+\sum _{i}\Vert \varvec{s}_{i,f}-\varvec{\varOmega }_{i} \varvec{x}_{i,f}\Vert _{\mathrm{F}}^2 \end{aligned}$$
(11)

where \(\varvec{E}_{i,f}\) is the multichannel residual of \(\varvec{x}_{i,f}\), defined as \(\varvec{E}_{i,f}=\varvec{y}_{:,f}-\sum _{r=1,r\ne i}^{n}\varvec{a}_{:,r}\,\varvec{x}_{r,f}^{T}\), and \(\varvec{a}_{:,i}\) denotes the ith column of \(\varvec{A}\). Because the above optimization subproblems are invariant to scaling [24], we require \(\varvec{A}\) to be column-normalized and \(\varvec{\varOmega }_{i}\) to be row-normalized, that is

$$\begin{aligned} \Vert \varvec{a}_{:,i}\Vert =1, \qquad i=1,\ldots ,n \end{aligned}$$
(12)

and

$$\begin{aligned} \Vert \varvec{w}_{j,i}\Vert _{2}=1, \qquad j=1,\ldots ,p \end{aligned}$$
(13)

where \(\varvec{w}_{j,i}\) is the jth row of \(\varvec{\varOmega }_{i}\). To minimize problem (11), we propose an alternating algorithm that keeps all but one unknown fixed at a time. The estimation of the analysis dictionary, the sources and the mixing matrix is discussed in the following three subsections, and the procedure of the proposed algorithm is given at the end of this section.
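
The patch conventions above can be made concrete with the following NumPy sketch; the helper names are our own, and a half-patch step reproduces the 50% overlap used in Sect. 4.

```python
import numpy as np

def extract_patches(img, patch=8, step=4):
    """Slice a square image into vectorized, possibly overlapping patches.
    Returns an (M, d) array with d = patch*patch; step = patch//2 gives the
    50% overlap used in the experiments of Sect. 4."""
    H, W = img.shape
    rows = []
    for r in range(0, H - patch + 1, step):
        for c in range(0, W - patch + 1, step):
            rows.append(img[r:r + patch, c:c + patch].reshape(-1))
    return np.stack(rows)

def assemble_patches(patches, shape, patch=8, step=4):
    """Rebuild an image from vectorized patches, averaging overlapping regions
    (the averaging step mentioned at the end of Sect. 3.2)."""
    out = np.zeros(shape)
    weight = np.zeros(shape)
    idx = 0
    for r in range(0, shape[0] - patch + 1, step):
        for c in range(0, shape[1] - patch + 1, step):
            out[r:r + patch, c:c + patch] += patches[idx].reshape(patch, patch)
            weight[r:r + patch, c:c + patch] += 1.0
            idx += 1
    return out / np.maximum(weight, 1.0)
```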

3.1 Dictionary learning stage

During the dictionary learning stage, we fix \(\{\varvec{x}_{i,f}\}_{f=1}^{M}\) and \(\varvec{A}\) to estimate the dictionary \(\varvec{\varOmega }_{i}\), so problem (6) becomes

$$\begin{aligned} \{\hat{\varvec{\varOmega }}_{i},\hat{\varvec{s}}_{i,f}\}=\min _{\varvec{\varOmega }_{i},\varvec{s}_{i,f}} \sum _{f}\Vert \varvec{s}_{i,f}\Vert _{0} \quad s.t.\Vert \varvec{s}_{i,f}-\varvec{\varOmega }_{i} \varvec{x}_{i,f}\Vert _{\mathrm{F}}^2\le \delta \end{aligned}$$
(14)

Note that for each image \(\varvec{x}_{i}\) we need to learn a corresponding dictionary \(\hat{\varvec{\varOmega }}_{i}\). Thus, we adopt Analysis KSVD for the above problem, which jointly learns the \(\ell _{0}\) analysis dictionary \(\varvec{\varOmega }_{i}\) and recovers the correct noiseless signals \(\{{\tilde{\varvec{x}}}_{i,f}\}_{f=1}^{M}\) from their noisy versions \(\{\varvec{x}_{i,f}\}_{f=1}^{M}\). Note that \(\varvec{x}_{i,f}={\tilde{\varvec{x}}}_{i,f}+\varvec{v}_{i,f}\), where \(\varvec{v}_{i,f}\) is a zero-mean white Gaussian additive noise vector, and the analysis coefficients can be described as \(\varvec{s}_{i,f}=\varvec{\varOmega }_{i}{\tilde{\varvec{x}}}_{i,f}\). Then formula (14) can be transformed into the objective function of Analysis KSVD

$$\begin{aligned} \{\hat{\varvec{\varOmega }}_{i},\hat{\tilde{\varvec{x}}}_{i,f}\}=\min _{\varvec{\varOmega }_{i}, {\tilde{\varvec{x}}}_{i,f}} \sum _{f}\Vert \varvec{\varOmega }_{i}{\tilde{\varvec{x}}}_{i,f}\Vert _{0} \quad s.t.\ \Vert {\tilde{\varvec{x}}}_{i,f}-\varvec{x}_{i,f}\Vert _{\mathrm{F}}^2\le \delta \ \end{aligned}$$
(15)

or

$$\begin{aligned}&\{\hat{\varvec{\varOmega }}_{i},\hat{\tilde{\varvec{x}}}_{i,f},\hat{\Lambda }_{i,f}\} =\min _{\varvec{\varOmega }_{i},{\tilde{\varvec{x}}}_{i,f},\Lambda _{i,f}} \sum _{f} \Vert {\tilde{\varvec{x}}}_{i,f}-\varvec{x}_{i,f}\Vert _{\mathrm{F}}^2 \nonumber \\&\qquad s.t.\ {\varvec{\varOmega }_{i}}_{\Lambda _{i,f}}{\tilde{\varvec{x}}}_{i,f}=0, \ \mathrm{Rank}({\varvec{\varOmega }_{i}}_{\Lambda _{i,f}})=d-r \ \end{aligned}$$
(16)

since we assume that the signal \({\tilde{\varvec{x}}}_{i,f}\) resides in an r-dimensional subspace, which implies that \(d-r\) rows of \(\varvec{\varOmega }_{i}\) are orthogonal to it. These rows define the co-support \(\Lambda _{i,f}\) of the signal \({\tilde{\varvec{x}}}_{i,f}\), and \({\varvec{\varOmega }_{i}}_{\Lambda _{i,f}}\) is the submatrix of \(\varvec{\varOmega }_{i}\) containing only the rows indexed by \(\Lambda _{i,f}\). In problem (16), we constrain the solution to reside in an r-dimensional subspace by fixing the rank of the co-support submatrix to \(d-r\), whereas in problem (15), we require the solution to be \(\delta\)-close to the given noisy signal, with the error tolerance derived from the noise power. Given the correct correspondence between r and \(\delta\), problems (15) and (16) can be considered equivalent. Note that in the proposed method, Analysis KSVD uses a two-phase block-coordinate-relaxation approach to solve problem (16). The first phase, called the sparse-coding stage, uses a pursuit method, such as the backward greedy (BG) algorithm or the optimized backward greedy (OBG) algorithm, to estimate \(\{ {\tilde{\varvec{x}}}_{i,f}\}_{1}^{M}\) with \(\varvec{\varOmega }_{i}\) fixed. This is a greedy algorithm that selects rows from \(\varvec{\varOmega }_{i}\) one by one to solve the \(l_{0}\)-norm problem, which can be described as follows

$$\begin{aligned}&\{\hat{\tilde{\varvec{x}}}_{i,f},\hat{\Lambda }_{i,f}\}=\min _{{\tilde{\varvec{x}}}_{i,f}, \Lambda _{i,f}} \Vert {\tilde{\varvec{x}}}_{i,f}-\varvec{x}_{i,f}\Vert _{F}^2 \nonumber \\&\qquad s.t.\ {\varvec{\varOmega }_{i}}_{\Lambda _{i,f}}{\tilde{\varvec{x}}}_{i,f}=0, \ \mathrm{Rank}({\varvec{\varOmega }_{i}}_{\Lambda _{i,f}})=d-r \ \end{aligned}$$
(17)
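
A minimal sketch of such a backward greedy pass is given below; it is a simplified variant of the BG pursuit (the exact selection rule of [30] differs slightly), with names chosen for illustration.

```python
import numpy as np

def backward_greedy(x, Omega, corank):
    """Simplified sketch of the sparse-coding step (17): rows of Omega are added
    to the co-support one by one (the row currently most orthogonal to the
    estimate), and the noisy patch x is projected onto the null space of the
    selected rows.  `corank` plays the role of d - r."""
    x_hat = x.copy()
    cosupport = []
    for _ in range(corank):
        scores = np.abs(Omega @ x_hat)
        scores[cosupport] = np.inf                 # do not reselect chosen rows
        cosupport.append(int(np.argmin(scores)))
        O = Omega[cosupport]                       # current co-support rows
        # x_hat = (I - O^T (O O^T)^{-1} O) x, the projection onto null(O)
        x_hat = x - O.T @ np.linalg.lstsq(O @ O.T, O @ x, rcond=None)[0]
    return x_hat, np.array(cosupport)
```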

Once \(\{\hat{\tilde{\varvec{x}}}_{i,f}\}_{1}^{M}\) is computed, we turn to updating \(\varvec{\varOmega }_{i}\) in the second phase, called the atom update step. The optimization is carried out sequentially for each of the rows \(\varvec{w}_{j,i}\) of \(\varvec{\varOmega }_{i}\). The update of \(\varvec{w}_{j,i}\) should depend only on those signals in \(\{ {\tilde{\varvec{x}}}_{i,f}\}_{1}^{M}\) that are orthogonal to it, and not on the remaining signals. Therefore, we denote by \({\tilde{\varvec{X}}}_{J,i}\) the matrix whose columns are the signals in \(\{ {\tilde{\varvec{x}}}_{i,f}\}_{1}^{M}\) that are orthogonal to \(\varvec{w}_{j,i}\), and by \(\varvec{X}_{J,i}\) the corresponding columns of \(\{\varvec{x}_{i,f}\}_{1}^{M}\). The update step for \(\varvec{w}_{j,i}\) can then be written as

$$\begin{aligned} \{\hat{\varvec{w}}_{j,i},\hat{\tilde{\varvec{X}}}_{J,i}\}=\min _{\varvec{w}_{j,i}, {\tilde{\varvec{X}}}_{J,i}}&\Vert {\tilde{\varvec{X}}}_{J,i}-\varvec{X}_{J,i}\Vert _{F}^{2}\nonumber \\ s.t.&\ {\varvec{\varOmega }_{i}}_{\Lambda _{i,f}}{\tilde{\varvec{x}}}_{i,f}=0, \ \forall f\in J\nonumber \\&\mathrm{Rank}({\varvec{\varOmega }_{i}}_{\Lambda _{i,f}})=d-r, \ \forall f\in J\nonumber \\&\Vert \varvec{w}_{j,i}\Vert _{2}=1 \end{aligned}$$
(18)

Unfortunately, solving problem (18) is in general a difficult task. The work in [30] proposed the following approximation to the above optimization goal

$$\begin{aligned} \hat{\varvec{w}}_{j,i}=\min _{\varvec{w}_{j,i}} \Vert \varvec{w}_{j,i}\varvec{X}_{J,i}\Vert _{F}^{2} \quad s.t.\ \Vert \varvec{w}_{j,i}\Vert _{2}=1 \end{aligned}$$
(19)

For this optimization problem, \(\varvec{w}_{j,i}\) is updated as the eigenvector associated with the smallest eigenvalue of \(\varvec{X}_{J,i}\varvec{X}_{J,i}^{T}\). The process is repeated until a fixed number of iterations is reached, and the analysis coefficients of \({\tilde{\varvec{x}}}_{i,f}\) are obtained as \(\hat{\varvec{s}}_{i,f}=\hat{\varvec{\varOmega }}_{i}\hat{\tilde{\varvec{x}}}_{i,f}\).
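
A minimal sketch of this atom update, assuming the columns of \(\varvec{X}_{J,i}\) have already been collected, is:

```python
import numpy as np

def update_row(X_J):
    """Sketch of the atom update (19): the new row w_{j,i} is the unit-norm
    eigenvector of X_J @ X_J.T associated with its smallest eigenvalue, where
    the columns of X_J (shape d x |J|) are the noisy patches whose estimates
    are orthogonal to the current row."""
    eigvals, eigvecs = np.linalg.eigh(X_J @ X_J.T)   # eigenvalues in ascending order
    w = eigvecs[:, 0]                                # smallest-eigenvalue eigenvector
    return w / np.linalg.norm(w)                     # row normalization (13)
```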

3.2 Sources estimating stage

In this stage, we estimate the source patch \(\varvec{x}_{i,f}\) by solving the following convex minimization problem

$$\begin{aligned} \{\hat{\varvec{x}}_{i,f}\}=\min _{\varvec{x}_{i,f}}\lambda \Vert \varvec{E}_{i,f}-\varvec{a}_{:,i}\, \varvec{x}_{i,f}^{T}\Vert _{\mathrm{F}}^2+\Vert \varvec{s}_{i,f}-\varvec{\varOmega }_{i}\varvec{x}_{i,f}\Vert _{\mathrm{F}}^2 \end{aligned}$$
(20)

where \(\varvec{\varOmega }_{i}\) and \(\varvec{s}_{i,f}\) were already obtained in the previous step, and the other sources in \(\varvec{x}_{:,f}\) are assumed to be given (\(\varvec{E}_{i,f}\) fixed). Then the solution can be obtained by setting the gradient of subproblem (20) to zero

$$\begin{aligned} 0&=\lambda (\varvec{x}_{i,f}\,\varvec{a}_{:,i}^{T}-\varvec{E}_{i,f}^{T})\varvec{a}_{:,i} +\varvec{\varOmega }_{i}^{T}(\varvec{\varOmega }_{i}\varvec{x}_{i,f}-\varvec{s}_{i,f})\nonumber \\&=\lambda \varvec{x}_{i,f}+\varvec{\varOmega }_{i}^{T}\varvec{\varOmega }_{i}\varvec{x}_{i,f}-\lambda \varvec{E}_{i,f}^{T}\,\varvec{a}_{:,i}-\varvec{\varOmega }_{i}^{T}\varvec{s}_{i,f} \end{aligned}$$
(21)

leading to

$$\begin{aligned} \hat{\varvec{x}}_{i,f}=(\lambda \varvec{I}+\varvec{\varOmega }_{i}^{T}\varvec{\varOmega }_{i})^{-1}(\lambda \varvec{E}_{i,f}^{T}\,\varvec{a}_{:,i}+\varvec{\varOmega }_{i}^{T}\varvec{s}_{i,f}) \end{aligned}$$
(22)

where \(\varvec{I}\) is the identity matrix. With the help of the estimated patches \(\{\hat{\varvec{x}}_{i,f}\}_{1}^{M}\), the source \(\varvec{x}_{i}\) can be obtained by averaging over the overlapping regions, and then the source matrix \(\varvec{X}\) can be recovered.
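
A minimal sketch of the closed-form update (22), with argument names chosen for illustration, is:

```python
import numpy as np

def update_source_patch(E_if, a_i, Omega_i, s_if, lam):
    """Closed-form update (22):
    x = (lam*I + Omega^T Omega)^{-1} (lam * E^T a + Omega^T s),
    where E_if is the m x d multichannel residual, a_i the (unit-norm) i-th
    column of A, Omega_i the p x d analysis dictionary and s_if the analysis
    coefficients of the patch."""
    d = Omega_i.shape[1]
    lhs = lam * np.eye(d) + Omega_i.T @ Omega_i
    rhs = lam * (E_if.T @ a_i) + Omega_i.T @ s_if
    return np.linalg.solve(lhs, rhs)                 # avoids an explicit inverse
```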

3.3 Mixing matrix estimating stage

In this stage, all the unknowns except \(\varvec{A}\) are fixed, and problem (5) simplifies to

$$\begin{aligned} \hat{\varvec{a}}_{:,i}=\min _{\varvec{a}_{:,i}}\Vert \varvec{E}_{i}-\varvec{a}_{:,i}\,\varvec{x}_{i}^{T}\Vert _{\mathrm{F}}^{2} \end{aligned}$$
(23)

where the multichannel residual of \(\varvec{x}_{i}\) is defined as \(\varvec{E}_{i}=\varvec{Y}-\sum _{r=1,r\ne i}^{n}\varvec{a}_{:,r}\,\varvec{x}_{r}^{T}\). This minimization subproblem is easily solved by setting its gradient to zero, which yields the estimate of \(\varvec{a}_{:,i}\)

$$\begin{aligned} \varvec{a}_{:,i}=(\varvec{E}_{i}\varvec{x}_{i})(\varvec{x}_{i}^{T}\varvec{x}_{i})^{-1} \end{aligned}$$
(24)

In order to maintain the column norm, the mixing matrix \(\varvec{A}\) needs to be normalized after each update of \(\varvec{a}_{:,i}\).
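
A minimal sketch of this column update, including the normalization, is:

```python
import numpy as np

def update_mixing_column(E_i, x_i):
    """Least-squares update (24), a = (E_i x_i)(x_i^T x_i)^{-1}, followed by the
    column normalization required by (12); E_i is m x N, x_i has length N."""
    a = (E_i @ x_i) / (x_i @ x_i)
    return a / np.linalg.norm(a)
```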

3.4 Complexity analysis

Since the proposed algorithm directly uses the \(l_{0}\)-norm as the measure of sparsity, its computational complexity is necessarily higher than that of Fang's method. The detailed per-iteration complexity of the proposed method is as follows. The complexity of the dictionary learning stage is divided into two parts: in Analysis KSVD, the backward greedy (BG) pursuit used in sparse coding has complexity \(\mathcal {O}(M d^{2} p)\), and the atom update costs \(\mathcal {O}(M d p)\); therefore, line 4 costs \(\mathcal {O}(M d^{2} p+M d p)\). The complexity of updating the representation coefficients (line 6) and computing the residual (line 8) is \(\mathcal {O}(M d p)\) and \(\mathcal {O}((n-1)m N)\), respectively. Updating the sources in line 9 costs \(\mathcal {O}(d^{2} p+d^{3})\). Finally, the mixing matrix estimating stage, including computing the residual (line 11), updating the mixing matrix (line 12) and normalization (line 13), costs \(\mathcal {O}((n-1)m N+m N+2m)\) in total. Since all these calculations are executed for each source (i.e., \(i=1, \ldots , n\)), the overall computational cost per iteration of the algorithm is \(\mathcal {O}(n N d^{2} p+n^{2} m N+n d^{3}+2 m n)\), where \(N=M\times d\).

[Algorithm listing: pseudocode of the proposed ABSS algorithm; the line numbers cited in Sect. 3.4 refer to this listing]

4 Simulations

In the first experiment, we chose two very similar source images (shown in Fig. 2b), mixed using a \(4\times 2\) full-rank, random, column-normalized mixing matrix \(\varvec{A}\), to examine the performance of the ABSS algorithm. Each \(128\times 128\) (\(=N\)) mixture and initialized source was divided into patches of size \(8\times 8\), which avoids overly large dictionaries and provides sufficient training signals for the dictionary learning stage. The overlap percentage of the patches was fixed at 50%; of course, a higher overlap percentage leads to a better recovery result. In addition, three hundred iterations were selected as the stopping criterion. As to the selection of the parameter \(\lambda\), we investigated the effect of different choices of \(\lambda\) on the recovery quality. For this experiment, we calculated the normalized mean square error (NMSE) while varying \(\lambda\). The NMSE in dB measures the distance between the mixing matrix \(\varvec{A}\) and the estimated matrix \(\hat{\varvec{A}}\) and is defined as

$$\begin{aligned} \mathrm{NMSE}=10\log _{10}\left( \frac{\sum _{p,q}(\hat{a}_{p,q}-a_{p,q})^2}{\sum _{p,q}(a_{p,q})^2}\right) \end{aligned}$$
(25)

where \(\hat{a}_{p,q}\) is the (p, q) element of the estimated matrix \(\hat{\varvec{A}}\); the smaller the NMSE, the better the mixing matrix estimation. Figure 1 shows the results for this experiment. It can be seen that \(\lambda =3\) is an optimal choice.
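
For reference, a one-line implementation of (25) (our own helper) is:

```python
import numpy as np

def nmse_db(A_hat, A):
    """NMSE of Eq. (25), in dB, between the estimated and true mixing matrices."""
    return 10 * np.log10(np.sum((A_hat - A) ** 2) / np.sum(A ** 2))
```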

Fig. 1 Normalized mean square error (NMSE) as a function of \(\lambda\)

Then, we applied the following methods for comparison: FastICA [1], Picard [10], BMMCA [22] and the algorithm proposed by Fang [29] (computer OS: Windows 8.1, CPU: Intel(R) Core(TM) i5-5200U @ 2.20 GHz, RAM: 4 GB). The peak signal-to-noise ratio (PSNR) is used to describe the reconstruction performance of the candidate algorithms. PSNR is calculated as

$$\begin{aligned} \mathrm{PSNR}=20\log _{10}(\mathrm{MAX}/\sqrt{\mathrm{MSE}}) \end{aligned}$$
(26)

where MAX is the maximum possible pixel value of the image (e.g., 255 for a uint8 image) and MSE is the mean square error given by \(\mathrm{MSE}=(1/N)\Vert \mathcal {X}-\hat{\mathcal {X}}\Vert _{\mathrm{F}}^{2}\), where \(\mathcal {X}\) and \(\hat{\mathcal {X}}\) are the source image and the recovered image, respectively. The better the quality of the image, the higher the PSNR. Note that in the ABSS algorithm, the number of dictionary atoms is set to 80. Although little can be said theoretically about choosing the optimal dictionary size (or redundancy factor p/d), we investigate it experimentally. Table 1 shows the PSNR values for different numbers of atoms.
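
For reference, a small helper implementing (26) under these conventions (our own illustration) is:

```python
import numpy as np

def psnr_db(x_src, x_rec, max_val=255.0):
    """PSNR of Eq. (26); max_val is the peak pixel value (255 for uint8 images)
    and MSE is averaged over all N pixels."""
    mse = np.mean((np.asarray(x_src, float) - np.asarray(x_rec, float)) ** 2)
    return 20 * np.log10(max_val / np.sqrt(mse))
```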

Table 1 PSNRs (dB) and computation times (in seconds) of reconstructed source images

As shown in Table 1, the separation effect is better when the number of atoms is 80 or 90, but the computation time for 90 atoms is greater than that for 80 atoms. Considering both factors, we set the number of atoms to 80. Figure 2 shows the reconstructed source images of the five test algorithms, and Table 2 presents the average PSNR results.

Fig. 2 a Mixtures (the first row). b Source images (the first column). c–g Separation results by ABSS, BMMCA, Fang, FastICA and Picard (the second to sixth columns), respectively

Table 2 PSNRs (dB) of the five test algorithms

We observe that the proposed algorithm achieves the best performance. BMMCA and Fang give similar results, and the results of FastICA and Picard are not as good as those of the other three algorithms when processing very similar source images. As another measure of the performance of the proposed method, the reconstruction error as a function of the number of iterations is shown in Fig. 3, which shows a monotonic decrease in the separation error. In addition, the error decreases relatively rapidly, reaching a very small value around the 20th iteration and becoming almost zero around the 300th iteration. Therefore, the proposed algorithm converges well.

Fig. 3 MSE as a function of the number of iterations; note that the elements of the image sources have amplitudes in the range [0, 1]

Next, in order to make the experimental results more convincing, we carried out another experiment to separate four sources from six noiseless mixtures. The four selected source images have very different morphologies, including noiselike and bricklike textures. As found from Fig. 4, the parameter \(\lambda\) should be set to 3. The rest of the parameters were the same as in the first experiment. Figure 5 and Table 3 show the separation results of all the methods mentioned above and the corresponding MSEs. As shown in Fig. 5c, the proposed method successfully recovers the image sources, and the achieved MSEs, given in Table 3, are the lowest for the proposed method for all image sources except the cartoon boy. The other methods do not perform as well. For example, as shown in Fig. 5d, although BMMCA perfectly recovers the cartoon boy with the lowest MSE among all the methods, it has a problem recovering the bricklike texture. In addition, Fang, FastICA and Picard all have some difficulty with certain images: they separate all the image sources but introduce interference from other sources.

Fig. 4 Normalized mean square error (NMSE) as a function of \(\lambda\)

Fig. 5 a Mixtures (the first row). b Source images (the first column). c–g Separation results by ABSS, BMMCA, Fang, FastICA and Picard (the second to sixth columns), respectively

Table 3 MSEs of the five test algorithms

In the last simulation, we investigated the performance of all the methods at different noise levels. Gaussian noise with standard deviation varying from 5 to 15 (\(\sigma =5, 10, 15\)) was added to mixtures generated from another two image sources (Lena and Boat). The noisy mixtures are shown in Fig. 7a. As shown in Fig. 6, the selection of \(\lambda\) depends on the noise standard deviation: at noise levels of 5, 10 and 15, the optimal choices of \(\lambda\) were 1, 5 and 10, respectively. The other experimental settings and procedure were the same as in the previous experiment. Table 4 presents the average PSNR results at the different noise levels, and the separation results at noise level 10 are shown in Fig. 7c–g. We observe that all algorithms successfully separated the noisy mixtures, but the proposed algorithm achieves better separation performance than the others. When the noise level is 5, the separation results of all methods except Picard are similar, but as the noise level increases, FastICA performs noticeably worse and Picard performs better than FastICA. In addition, ABSS also shows a clear improvement in denoising; for example, Lena's facial details are the most legible among the five methods. Overall, the above experimental results indicate that ABSS yields a significant performance improvement in both separation and denoising.

Fig. 6 Normalized mean square error (NMSE) as a function of \(\lambda\)

Fig. 7 a Noisy mixtures when \(\sigma =10\) (the first row). b Source images (the first column). c–g Separation results by ABSS, BMMCA, Fang, FastICA and Picard (the second to sixth columns), respectively

Table 4 PSNRs (dB) of the five test algorithms

5 Conclusions and future work

In this paper, we proposed a novel algorithm, based on an analysis sparse constraint on the sources over an adaptive analysis dictionary, to address the BSS problem in which the number of mixtures is not less than the number of sources. Moreover, the \(l_{0}\)-norm was chosen as the measure of sparsity to make better use of the sparsity constraint. The simulation results on both noisy and noiseless mixtures confirm that the proposed method improves the separation performance significantly.

In fact, the way the dictionaries and sources are updated is completely different from that in [29]. In the analysis dictionary learning stage, we adopt Analysis KSVD [30] to adaptively estimate the analysis dictionary from the mixed sources, whereas Fang directly uses SP-ADL, proposed in [31], an efficient algorithm for obtaining the analysis dictionary from a given data set in a KSVD-like manner. In the sources estimating stage, since the subproblem we need to solve is convex, the sources can be uniquely obtained by a simple least squares linear regression instead of the split Bregman iteration (SBI) algorithm introduced in [32], which is used directly in [29] and needs five iterations to converge to an approximate solution. Thus, the convergence rate of the proposed algorithm is faster than that of Fang's method. Finally, the simulation results on both noisy and noiseless mixtures confirm that using the \(l_{0}\)-norm enhances the separability of the sources and that the proposed method improves the separation performance significantly.

However, because the backward greedy algorithm is used in the dictionary learning stage, the proposed algorithm is inevitably slowed down, which is also why the computation times in Table 1 are so high. Further work is required to speed up the algorithm and make it suitable for large-scale problems, for example by using more efficient dictionary learning algorithms [33, 34] for the BSS problem. In addition, this work only focuses on the general over-determined BSS problem and tries to find a new angle from which to deal with it. It would be desirable to extend the method to the underdetermined BSS problem, in which the number of mixtures is less than the number of sources; this problem is harder to deal with than the one addressed in this work.