1 Introduction

In wireless communications, useful speech signals are severely degraded by acoustic noise from various sources, such as environmental background noise and communication channel noise. In recent years, estimating the speech of interest from its corrupted observations has become one of the main objectives of research on acoustic signal processing, which involves a wide variety of noise reduction and speech enhancement techniques employing one (Lee et al. 2017), two (Gabrea 2003) or multiple sensors (Bouchard 2003).

Single-channel speech enhancement remains a significant field of research owing to its simplicity of implementation and low computational cost. Upadhyay (2016) addressed single-channel speech enhancement in stationary environments and proposed Wiener filtering with a recursive noise estimation algorithm to enhance speech signals degraded by additive noise. Barysenkaa et al. (2018) proposed a single-channel speech enhancement technique based on new phase estimators that rely on the inter-component phase relations (ICPR) of a polyharmonic signal such as speech. The literature also contains many dual- and multi-channel speech enhancement methods. Among them, Nabi et al. (2017) proposed a dual-microphone speech enhancement algorithm that combines the coherence function with an improved speech enhancement algorithm based on the discrete wavelet transform (DWT), and Nabi et al. (2016) proposed a dual-channel speech enhancement algorithm dedicated to mobile phone applications using the coherence function and the Kalman filter. Additionally, Meyer and Simmer (1997) proposed a multi-channel speech enhancement method for car environments using Wiener filtering and spectral subtraction; the algorithm yields better noise reduction, with significantly less distortion and artificial noise, than spectral subtraction or Wiener filtering alone. Blind source separation (BSS) is another powerful approach for removing acoustic noise from only the observed noisy signals, without any a priori knowledge of the source signals (Syskind Pedersen et al. 2007; Kocinski 2008). Many researchers have suggested combinations of adaptive algorithms with the forward and backward blind source separation structures in different domains: the time domain (Henni et al. 2019; Gabrea et al. 1996), the frequency domain (Zoulikha 2016) and the wavelet domain (Ghribi et al. 2016). On the other hand, subband adaptive filtering (SAF) has been adopted in real applications of noise reduction and speech enhancement in order to improve the convergence speed and reduce the computational complexity of conventional fullband adaptive filters (Reza Abutalebi et al. 2004; Milani and Panahi 2009; Lee and Gan 2010; Kim et al. 2008). Several techniques based on this approach can be found in the literature. For example, Lee and Gan (2004) proposed a normalized SAF (NSAF) algorithm based on the principle of minimum disturbance to deal with colored input signals, and Yang et al. (2012) proposed an improved version of the NSAF algorithm to speed up the convergence.

In this paper, we propose a new dual subband implementation of the forward blind source separation (FBSS) structure based on the fast normalized least mean square (FNLMS) algorithm to enhance speech signals degraded by acoustic noise. In the proposed dual subband FNLMS algorithm, the fullband input signals are partitioned into a set of subband signals that occupy contiguous portions of the frequency band and are then downsampled, which facilitates the manipulation of the information contained in each subband. The FBSS structure is then applied, with only two adaptive filters shared by all subbands. This adaptive weight-control mechanism differs from that of the conventional SAF structure, where each subband has its own sub-filter and adaptation loop (Kokkinakis and Loizou 2007). The proposed dual subband algorithm shows the best convergence speed in comparison with (i) its fullband version, the dual fast normalized least mean square (DFNLMS) algorithm proposed recently in Sayoud et al. (2018), (ii) the classical fullband dual normalized least mean square (DNLMS) algorithm (Van Gerven et al. 1992), and (iii) the two-channel subband forward (2CSF) algorithm (Djendi and Bendoumia 2013).

This paper is organized as follows. Section 2 presents the FBSS structure for a convolutive mixture. Section 3 describes the proposed dual subband algorithm and its mathematical formulation. Section 4 presents the simulation results of the proposed dual subband algorithm in comparison with its fullband version (DFNLMS), the classical fullband DNLMS algorithm and the two-channel subband forward (2CSF) algorithm. Finally, the last section concludes the paper.

2 The forward blind source separation structure for the convolutive mixture

In real situations, the recorded speech signal is a combination of multiple reflections from the surroundings, which appear as delayed and filtered versions of the source signal. In such situations, the mixing process is well approximated by a convolutive mixing model (Syskind Pedersen et al. 2008; Djendi et al. 2007). In this work, we focus specifically on the case of two sources recorded by two microphones, as modeled in Fig. 1, and we make the following assumptions (Weinstein et al. 1993; Djendi 2010):

  • The two sources, the speech signal \(s(n)\) and the noise \(b(n)\), are statistically independent.

  • The direct acoustic paths are equal to the Kronecker unit impulse \(~\delta (n)\).

Fig. 1 The convolutive mixture model

The noisy signals are linear mixtures of filtered versions of the source signals, given by the following relations:

$${m_1}\left( n \right)=s\left( n \right)+{h_1}\left( n \right)*b(n)$$
(1)
$${m_2}\left( n \right)=b\left( n \right)+{h_2}\left( n \right)*s(n)$$
(2)

where \(*\) denotes the convolution operation, and \({h_1}(n)\) and \({h_2}(n)\) represent the cross-coupling effects between the channels. The main problem is to retrieve the original sources \(s(n)\) and \(b(n)\) from the two mixtures without any information on the sources. To solve this problem, we use the forward blind source separation structure (Darazirar and Djendi 2015; Van Gerven et al. 1992) depicted in Fig. 2.
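The mixing model of Eqs. (1) and (2) can be illustrated with a minimal sketch. The helper names (`convolve`, `convolutive_mixture`) and the toy signal and filter values are assumptions chosen for illustration; they are not taken from the simulations of this paper.

```python
def convolve(x, h):
    """Causal convolution (h * x)(n), truncated to len(x) samples."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        acc = 0.0
        for k in range(min(n + 1, len(h))):
            acc += h[k] * x[n - k]
        y[n] = acc
    return y


def convolutive_mixture(s, b, h1, h2):
    """Return (m1, m2) following Eqs. (1) and (2).

    The direct paths are unit impulses, so s and b appear unfiltered in
    m1 and m2, respectively. s and b are assumed to have equal length.
    """
    hb = convolve(b, h1)  # noise leaking into microphone 1
    hs = convolve(s, h2)  # speech leaking into microphone 2
    m1 = [s[n] + hb[n] for n in range(len(s))]
    m2 = [b[n] + hs[n] for n in range(len(b))]
    return m1, m2


# Toy example with hypothetical values.
s = [1.0, 0.0, 0.0, 0.0]
b = [0.0, 2.0, 0.0, 0.0]
h1 = [0.5, 0.25]
h2 = [0.1]
m1, m2 = convolutive_mixture(s, b, h1, h2)
```

In this toy case the noise impulse at \(n=1\) appears in \({m_1}(n)\) scaled and spread by \({h_1}(n)\), while the speech impulse leaks into \({m_2}(n)\) through \({h_2}(n)\).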

Fig. 2 The FBSS structure model

The two output signals of the FBSS structure are given by:

$${u_1}\left( n \right)={m_1}\left( n \right) - {m_2}\left( n \right)*{w_1}\left( n \right)$$
(3)
$${u_2}\left( n \right)={m_2}\left( n \right) - {m_1}(n)*{w_2}(n)$$
(4)

By inserting relations (1) and (2) into (3) and (4), respectively, the two output signals become:

$${u_1}(n)=b\left( n \right)*\left[ {{h_1}\left( n \right) - {w_1}(n)} \right]~+s\left( n \right)*\left[ {\delta \left( n \right) - {h_2}\left( n \right)*{w_1}\left( n \right)} \right]$$
(5)
$${u_2}\left( n \right)=s\left( n \right)*\left[ {{h_2}\left( n \right) - {w_2}\left( n \right)} \right]+b\left( n \right)*\left[ {\delta \left( n \right) - {h_1}(n)*{w_2}\left( n \right)} \right]$$
(6)

To obtain the speech and noise signals at the outputs \({u_1}\left( n \right)\) and \({u_2}\left( n \right)\), respectively, the adaptive filters must converge towards the optimal solution, i.e. \({w_1}(n)={h_1}(n)\) and \({w_2}(n)={h_2}(n)\). Equations (5) and (6) then become:

$${u_1}\left( n \right)=s\left( n \right)*[\delta \left( n \right) - {h_2}\left( n \right)*{h_1}\left( n \right)]$$
(7)
$$~{u_2}\left( n \right)=b\left( n \right)*[\delta \left( n \right) - {h_1}\left( n \right)*{h_2}\left( n \right)]$$
(8)

The FBSS structure has the drawback of distorting the output signals through the post-filter \(pf=[\delta \left( n \right) - {h_2}\left( n \right)*{h_1}\left( n \right)]\) (Djendi et al. 2006). To avoid this distortion, we consider in this paper the case where the two microphones are loosely spaced.
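As a numerical check of Eqs. (3), (4), (7) and (8), the following sketch applies the FBSS structure with the adaptive filters fixed at their optimal values and compares the outputs with the sources filtered by the post-filter. The function names and all numerical values are hypothetical and serve only to illustrate the derivation.

```python
def conv_full(a, b):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            y[i + j] += ai * bj
    return y


def filt(x, h):
    """Causal filtering of x by h, truncated to len(x) samples."""
    return conv_full(x, h)[:len(x)]


def fbss_outputs(m1, m2, w1, w2):
    """Forward BSS outputs of Eqs. (3) and (4)."""
    f1 = filt(m2, w1)
    f2 = filt(m1, w2)
    u1 = [a - c for a, c in zip(m1, f1)]
    u2 = [a - c for a, c in zip(m2, f2)]
    return u1, u2


# Hypothetical sources and cross-coupling filters.
s = [1.0, -0.5, 0.25, 0.0, 0.5, -1.0]
b = [0.3, 0.7, -0.2, 0.9, -0.4, 0.1]
h1 = [0.5, 0.25]
h2 = [0.4, -0.2]

# Mixtures of Eqs. (1) and (2).
m1 = [si + x for si, x in zip(s, filt(b, h1))]
m2 = [bi + x for bi, x in zip(b, filt(s, h2))]

# Optimal solution: w1 = h1 and w2 = h2.
u1, u2 = fbss_outputs(m1, m2, h1, h2)

# Post-filter pf = delta - h2 * h1, then Eqs. (7) and (8).
h21 = conv_full(h2, h1)
pf = [1.0 - h21[0]] + [-v for v in h21[1:]]
expected_u1 = filt(s, pf)
expected_u2 = filt(b, pf)
```

With the optimal filters, `u1` contains no contribution from `b` and `u2` none from `s`; each output is the corresponding source shaped by the same post-filter.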

3 Description of the proposed dual subband algorithm

In this section, we describe the proposed dual subband algorithm. A general scheme of the proposed dual subband FNLMS decomposition is given in Fig. 3. The proposed algorithm consists of the following steps:

Fig. 3 General scheme of the proposed dual subband FNLMS algorithm

3.1 Step 1

In the first step, we use the analysis filter banks to split the fullband mixing signals into \(M\) subband signals \({m_{1i}}(n),\,{m_{2i}}(n)\); each subband signal is then decimated by a factor \(D\). The decimated mixing sub-signals can be expressed as follows:

$${m_{1i,D}}(p)={m_{1i}}(pM)\,\;i=1,~2,~ \ldots ,M.$$
(9)
$${m_{2i,D}}(p)={m_{2i}}(pM)\;\,i=1,~2,~ \ldots ,M.$$
(10)
$$~{m_{1i}}(n)=H_{i}^{T}{m_1}(n)\,\;i=1,~2,~ \ldots ,M.$$
(11)
$${m_{2i}}(n)=H_{i}^{T}{m_2}(n)\;\,i=1,~2,~ \ldots ,M.$$
(12)

where \(M\) is the number of subbands and \(D\) is the decimation factor; we take \(D=M\). The variable \(n\) is the time index of the original fullband signals and \(p\) is the time index of the decimated sub-signals. \({m_{1i}}(pM)\) and \({m_{2i}}(pM)\) are the outputs of the analysis filter banks sampled at \(n=pM\). The vectors \({m_1}\left( n \right)=\left[ {{m_1}\left( n \right),~{m_1}\left( {n - 1} \right), \ldots ,{m_1}\left( {n - l+1} \right)} \right]\) and \({m_2}\left( n \right)=\left[ {{m_2}\left( n \right),~{m_2}\left( {n - 1} \right), \ldots ,{m_2}\left( {n - l+1} \right)} \right]\) contain the \(l\) most recent samples of the mixtures, where \(l\) is the length of the analysis filters \({H_i}\).
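Step 1 can be sketched as follows for one mixing signal. The two-band Haar filter pair used here is only an assumption made to keep the example short; it is not the analysis filter bank used in the simulations, and the function names are hypothetical.

```python
import math


def fir_filter(x, h):
    """Causal FIR filtering y(n) = sum_k h[k] x(n - k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]


def analysis_and_decimate(x, analysis_filters):
    """Split x into M subbands (Eqs. (11)-(12)) and keep every M-th sample
    of each subband (Eqs. (9)-(10)), with D = M."""
    M = len(analysis_filters)
    subbands = []
    for h in analysis_filters:
        xi = fir_filter(x, h)     # subband signal m_i(n)
        subbands.append(xi[::M])  # decimated subband signal m_{i,D}(p) = m_i(pM)
    return subbands


r = 1.0 / math.sqrt(2.0)
H = [[r, r], [r, -r]]  # assumed Haar lowpass / highpass pair (M = 2)
m1 = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
sub = analysis_and_decimate(m1, H)
```

Each of the two decimated sub-signals has half the length of the fullband signal, so the total number of samples processed per unit time is unchanged.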

3.2 Step 2

In the second step, we apply the FBSS structure described in Sect. 2 to estimate the decimated output sub-signals \({u_{1i,D}}\left( p \right)\), \({u_{2i,D}}\left( p \right)\) from only the decimated mixing sub-signals \({m_{1i,D}}(p)\), \({m_{2i,D}}(p)\). This FBSS structure uses two adaptive filters to extract the original speech from noise. In this paper, the coefficients of these adaptive filters are updated using the fast normalized least mean square (FNLMS) algorithm in subband form; a full mathematical description of the proposed dual subband FNLMS algorithm is presented in Sect. 3.4.

The decimated output sub-signals of the proposed dual subband FNLMS algorithm are given by the following formulas:

$${u_{1i,D}}\left( p \right)={m_{1i,D}}\left( p \right) - w_{1}^{T}\left( p \right){m_{2i,D}}(p)\,\,\,i=1,~2,~ \ldots ,M.$$
(13)
$${u_{2i,D}}\left( p \right)={m_{2i,D}}\left( p \right) - w_{2}^{T}\left( p \right){m_{1i,D}}(p)\,\,\,i=1,~2,~ \ldots ,M.$$
(14)

where \({m_{1i,D}}\left( p \right)=\left[ {{m_{1i,D}}\left( p \right),~{m_{1i,D}}\left( {p - 1} \right), \ldots ,{m_{1i,D}}\left( {p - L+1} \right)} \right]\) and \({m_{2i,D}}\left( p \right)=\left[ {{m_{2i,D}}\left( p \right),~{m_{2i,D}}\left( {p - 1} \right), \ldots ,{m_{2i,D}}\left( {p - L+1} \right)} \right]\). \(L\) is the length of the adaptive filters.

3.3 Step 3

In the last step, the synthesis filter banks are used to combine the \(M\) decimated output sub-signals \({u_{1i,D}}(p)\), \({u_{2i,D}}(p)\) into the fullband outputs \({u_1}(n)\) and \({u_2}(n)\). The synthesis filter bank consists of a bank of interpolators that upsample the subband signals by an interpolation factor \(I\) before filtering and adding these subband signals (Reza Abutalebi et al. 2004; Lee and Gan 2010). After interpolation, the new output sub-signals can be expressed as follows:

$${u_{1i}}\left( n \right)=\left\{ {\begin{array}{*{20}{l}} {{u_{1i,D}}(n/I),}&{n=0, \pm I, \pm 2I, \ldots } \\ {0,}&{{\text{otherwise}}} \end{array}} \right.\quad i=1,~2,~ \ldots ,M.$$
(15)
$${u_{2i}}\left( n \right)=\left\{ {\begin{array}{*{20}{l}} {{u_{2i,D}}(n/I),}&{n=0, \pm I, \pm 2I, \ldots } \\ {0,}&{{\text{otherwise}}} \end{array}} \right.\quad i=1,~2,~ \ldots ,M.$$
(16)

where \(I\) is the interpolation factor; in our case we take \(I=D=M\).

The fullband outputs \({u_1}(n)\) and \({u_2}(n)\) of the proposed dual subband FNLMS algorithm are given by the following relations:

$${u_1}\left( n \right)=\mathop \sum \limits_{{i=1}}^{M} {\varvec{G}}_{i}^{T}{{\varvec{U}}_{1i}}(n)$$
(17)
$${u_2}\left( n \right)=\mathop \sum \limits_{{i=1}}^{M} {\varvec{G}}_{i}^{T}{{\varvec{U}}_{2i}}(n)$$
(18)

where \({U_{1i}}\left( n \right)=\left[ {{u_{1i}}\left( n \right),~{u_{1i}}\left( {n - 1} \right), \ldots ,{u_{1i}}\left( {n - l+1} \right)} \right]\), \({U_{2i}}\left( n \right)=\left[ {{u_{2i}}\left( n \right),~{u_{2i}}\left( {n - 1} \right), \ldots ,{u_{2i}}\left( {n - l+1} \right)} \right].\)
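Step 3 can be sketched as zero insertion followed by synthesis filtering and summation, as in Eqs. (15) to (18). The two-band Haar analysis/synthesis pair below is an assumption chosen because it reconstructs the input exactly with a one-sample delay; the filters used in the simulations are different, and the function names are hypothetical.

```python
import math


def fir_filter(x, h):
    """Causal FIR filtering y(n) = sum_k h[k] x(n - k), with x(n) = 0 for n < 0."""
    return [sum(h[k] * x[n - k] for k in range(min(n + 1, len(h))))
            for n in range(len(x))]


def upsample(x_dec, I):
    """Insert I - 1 zeros between consecutive samples (Eqs. (15)-(16))."""
    out = [0.0] * (len(x_dec) * I)
    out[::I] = x_dec
    return out


def synthesize(subbands_dec, synthesis_filters):
    """Upsample each subband by I = M, filter it with G_i and sum the
    branches (Eqs. (17)-(18))."""
    I = len(synthesis_filters)
    y = [0.0] * (len(subbands_dec[0]) * I)
    for u_dec, g in zip(subbands_dec, synthesis_filters):
        branch = fir_filter(upsample(u_dec, I), g)
        y = [a + c for a, c in zip(y, branch)]
    return y


r = 1.0 / math.sqrt(2.0)
H = [[r, r], [r, -r]]   # assumed Haar analysis filters
G = [[r, r], [-r, r]]   # matching Haar synthesis filters
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]

# Analysis and decimation by M = 2, then synthesis.
sub = [fir_filter(x, h)[::2] for h in H]
y = synthesize(sub, G)
```

For this filter pair, `y` equals `x` delayed by one sample, which shows that decimation followed by interpolation loses no information when the analysis and synthesis filters are matched.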

In Table 1, the proposed subband decomposition is summarized.

Table 1 Summary of the proposed dual subband decomposition

3.4 Mathematical formulation of the proposed dual subband FNLMS algorithm

In this subsection, we derive the mathematical formulation of the dual forward fast normalized least mean square algorithm in its subband form. The scheme of the proposed dual subband algorithm is given in Fig. 4.

Fig. 4 Detailed scheme of the proposed dual subband FNLMS algorithm

From Fig. 4, we can deduce that the update relations of the adaptive filters \({w_1}\left( p \right)\) and \({w_2}(p)\) of the proposed dual subband FNLMS algorithm can be expressed as follows:

$${w_1}\left( {p+1} \right)={w_1}\left( p \right) - {\mu _1}\mathop \sum \limits_{{i=1}}^{M} \left[ {{u_{1i,D}}\left( p \right){c_{1i,D}}(p)} \right]$$
(19)
$${w_2}\left( {p+1} \right)={w_2}\left( p \right) - {\mu _2}\mathop \sum \limits_{{i=1}}^{M} \left[ {{u_{2i,D}}\left( p \right){c_{2i,D}}(p)} \right]$$
(20)

where \({w_1}(p)~=~{[{w_1}(p),{w_1}(p~ - ~1),~ \ldots ,~{w_1}(p~ - ~L~+~1)]^T}\) and \({w_2}(p)~=~{[{w_2}(p),{w_2}(p~ - ~1),~ \ldots ,~{w_2}(p~ - ~L~+~1)]^T}\). The step-size parameters \({\mu _1}\) and \({\mu _2}\), with \(0<{\mu _1},\,{\mu _2}<2\), control the convergence behavior of the filter weights. \({c_{1i,D}}\left( p \right)\) and \({c_{2i,D}}\left( p \right)\) are the decimated subband adaptation gain vectors, given by the following relations:

$${c_{1i,D}}\left( p \right)={\gamma _{1i,D}}\left( p \right){k_{1i,D}}(p)~\;\,i=1,~2,~ \ldots ,M.$$
(21)
$${c_{2i,D}}\left( p \right)={\gamma _{2i,D}}\left( p \right){k_{2i,D}}(p)~\,\,i=1,~2,~ \ldots ,M.$$
(22)

The scalars \({\gamma _{1i,D}}(p)\) and \({\gamma _{2i,D}}(p)\) that are used in Eqs. (21) and (22) respectively are called likelihood variables, and can be calculated using the following definition (Benallal and Arezki 2013; Benallal and Benkrid 2007):

$${\gamma _{1i,D}}\left( p \right)=\frac{1}{{1 - k_{{1i,D}}^{T}(p){m_{2i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$
(23)
$${\gamma _{2i,D}}\left( p \right)=\frac{1}{{1 - k_{{2i,D}}^{T}(p){m_{1i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$
(24)

The decimated subband vectors \({k_{1i,D}}(p)\) and \({k_{2i,D}}(p)\) are called the dual Kalman gains. They are obtained by discarding the forward and backward predictors and using only the forward prediction errors \({e_{1i,D}}(p)\) and \({e_{2i,D}}(p)\) (Benallal and Arezki 2013; Sayoud et al. 2018). In subband form, the dual Kalman gains of the proposed algorithm are calculated by the following relations:

$$\left[ {\begin{array}{*{20}{c}} {{k_{1i,D}}(p)} \\ * \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} { - \frac{{{e_{1i,D}}(p)}}{{\lambda {\alpha _{1i,D}}\left( {p - 1} \right)+{c_0}}}} \\ {~{k_{1i,D}}(p - 1)} \end{array}} \right]\,\,\,i=1,~2,~ \ldots ,M.$$
(25)
$$\left[ {\begin{array}{*{20}{c}} {{k_{2i,D}}(p)} \\ * \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} { - \frac{{{e_{2i,D}}(p)}}{{\lambda {\alpha _{2i,D}}\left( {p - 1} \right)+{c_0}}}} \\ {~{k_{2i,D}}(p - 1)} \end{array}} \right]\,\,\,i=1,~2,~ \ldots ,M.$$
(26)

where the asterisk ‘*’ represents the last, unused element of the dual Kalman gain vectors, \(\lambda \left( {0<\lambda <1} \right)\) is an exponential forgetting factor and \({c_0}\) is a small positive constant used to avoid division by very small values in the absence of the input signal. The decimated subband parameters \({\alpha _{1i,D}}\left( p \right)\) and \({\alpha _{2i,D}}\left( p \right)\) are the variances of the forward prediction errors, given by:

$${\alpha _{1i,D}}\left( p \right)=\lambda {\alpha _{1i,D}}\left( {p - 1} \right)+~{e^2}_{{1i,D}}(p)\,\,i=1,~2,~ \ldots ,M.$$
(27)
$${\alpha _{2i,D}}\left( p \right)=\lambda {\alpha _{2i,D}}\left( {p - 1} \right)+~{e^2}_{{2i,D}}(p)~\,\,i=1,~2,~ \ldots ,M.$$
(28)

The forward prediction errors \({e_{1i,D}}(p)\) and \({e_{2i,D}}(p)\) can be calculated using a first-order model, so the decimated subband forward prediction errors of the proposed algorithm can be expressed as follows:

$${e_{1i,D}}\left( p \right)={m_{2i,D}}\left( p \right) - {a_{1i,D}}{m_{2i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$
(29)
$${e_{2i,D}}\left( p \right)={m_{1i,D}}\left( p \right) - {a_{2i,D}}{m_{1i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$
(30)

where \({a_{1i,D}}\) and \({a_{2i,D}}\) are the decimated subband prediction coefficients. These coefficients are obtained by minimizing the functions \(E\left[ {e_{{1i,D}}^{2}(p)} \right]\) and \(E\left[ {e_{{2i,D}}^{2}(p)} \right]\), which gives:

$${a_{1i,D}}(p)=\frac{{E\left[ {{m_{2i,D}}\left( p \right){m_{2i,D}}\left( {p - 1} \right)} \right]}}{{E\left[ {{m_{2i,D}}^{2}\left( {p - 1} \right)} \right]}}=\frac{{{r_{1i,D}}(p)}}{{{r_{2i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$
(31)
$${a_{2i,D}}(p)=\frac{{E\left[ {{m_{1i,D}}\left( p \right){m_{1i,D}}\left( {p - 1} \right)} \right]}}{{E\left[ {{m_{1i,D}}^{2}\left( {p - 1} \right)} \right]}}=\frac{{{r_{3i,D}}(p)}}{{{r_{4i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$
(32)

where \({r_{1i,D}}\left( p \right)\) and \({r_{2i,D}}\left( p \right)\) represent, respectively, the first coefficient of the autocorrelation function and the power of the decimated subband mixture \({m_{2i,D}}(p)\). Similarly, \({r_{3i,D}}\left( p \right)\) and \({r_{4i,D}}\left( p \right)\) represent, respectively, the first coefficient of the autocorrelation function and the power of the decimated subband mixture \({m_{1i,D}}(p)\).

An estimation of the prediction coefficients can be performed by the following relations:

$${a_{1i,D}}(p)=\frac{{{r_{1i,D}}(p)}}{{{r_{2i,D}}(p)+{c_a}}}\,\,i=1,~2,~ \ldots ,M.$$
(33)
$${a_{2i,D}}(p)=\frac{{{r_{3i,D}}(p)}}{{{r_{4i,D}}(p)+{c_a}}}\,\,i=1,~2,~ \ldots ,M.$$
(34)

where \(~{r_{1i,D}}(p)\), \({r_{2i,D}}(p)\), \({r_{3i,D}}(p)\), and \({r_{4i,D}}(p)\) are estimated recursively by the following relations:

$${r_{1i,D}}(p)={\lambda _a}{r_{1i,D}}\left( {p - 1} \right)+{m_{2i,D}}(p)~{m_{2i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$
(35)
$${r_{2i,D}}(p)={\lambda _a}{r_{2i,D}}\left( {p - 1} \right)+{m_{2i,D}}^{2}(p)\,\,i=1,~2,~ \ldots ,M.$$
(36)
$${r_{3i,D}}(p)={\lambda _a}{r_{3i,D}}\left( {p - 1} \right)+{m_{1i,D}}(p)~{m_{1i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$
(37)
$${r_{4i,D}}(p)={\lambda _a}{r_{4i,D}}\left( {p - 1} \right)+{m_{1i,D}}^{2}(p)\,\,i=1,~2,~ \ldots ,M.$$
(38)

where \({\lambda _a}\) is a forgetting factor and \({c_a}\) is a small positive constant. A summary of the proposed dual subband algorithm is given in Table 2.

Table 2 The proposed dual subband FNLMS algorithm
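The recursions of Sect. 3.4 for the filter \({w_1}\) can be put together in the following minimal sketch, which follows Eqs. (13), (19), (21), (23), (25), (27), (29), (33), (35) and (36). Several points are assumptions made for illustration and are not taken from the paper: the function name, the default parameter values, the initial value `alpha0` of the prediction error variances, the omission of the symmetric \({w_2}\) branch and of the voice-activity control, and the single-band, noise-only test scenario.

```python
import math
import random


def dual_subband_fnlms_w1(m1_sub, m2_sub, L, mu=1.0, lam=0.9, lam_a=0.99,
                          c0=1e-6, ca=1e-6, alpha0=1.0):
    """Adapt w1 from M decimated subband mixtures (lists of equal-length lists)."""
    M, P = len(m1_sub), len(m1_sub[0])
    w1 = [0.0] * L                     # one filter shared by all subbands
    k = [[0.0] * L for _ in range(M)]  # dual Kalman gains k_{1i,D}
    alpha = [alpha0] * M               # prediction error variances (assumed init)
    r1 = [0.0] * M                     # lag-1 autocorrelation estimates
    r2 = [0.0] * M                     # power estimates
    x = [[0.0] * L for _ in range(M)]  # regressor vectors built from m_{2i,D}
    for p in range(P):
        grad = [0.0] * L
        for i in range(M):
            new, prev = m2_sub[i][p], x[i][0]
            r1[i] = lam_a * r1[i] + new * prev            # Eq. (35)
            r2[i] = lam_a * r2[i] + new * new             # Eq. (36)
            a = r1[i] / (r2[i] + ca)                      # Eq. (33)
            e = new - a * prev                            # Eq. (29)
            # Eq. (25): new first element, shift, drop the last element.
            k[i] = [-e / (lam * alpha[i] + c0)] + k[i][:L - 1]
            alpha[i] = lam * alpha[i] + e * e             # Eq. (27)
            x[i] = [new] + x[i][:L - 1]
            kx = sum(kj * xj for kj, xj in zip(k[i], x[i]))
            gamma = 1.0 / (1.0 - kx)                      # Eq. (23)
            u = m1_sub[i][p] - sum(wj * xj for wj, xj in zip(w1, x[i]))  # Eq. (13)
            for j in range(L):
                grad[j] += u * gamma * k[i][j]            # u * c, with c from Eq. (21)
        w1 = [wj - mu * gj for wj, gj in zip(w1, grad)]   # Eq. (19)
    return w1


# Hypothetical noise-only test with a single band (M = 1): s(n) = 0,
# so m1 = h1 * b and m2 = b, and the optimal solution is w1 = h1.
random.seed(0)
h1 = [0.6, -0.3, 0.2, 0.1]
b = [random.gauss(0.0, 1.0) for _ in range(3000)]
m1 = [sum(h1[j] * b[n - j] for j in range(min(n + 1, len(h1))))
      for n in range(len(b))]
w1 = dual_subband_fnlms_w1([m1], [b], L=4)
num = math.sqrt(sum((h - w) ** 2 for h, w in zip(h1, w1)))
den = math.sqrt(sum(h ** 2 for h in h1))
relative_mismatch = num / den
```

The function accepts any number of subbands; the gradient terms of all subbands are accumulated before the single shared filter is updated, which reflects the structure of Eq. (19).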

4 Analysis of simulation results

In this section, we investigate the potential of the proposed dual subband algorithm for achieving speech separation in adverse environments through intensive simulations. Figure 6 presents the source signals \(s(n)\) and \(b(n)\) used in our simulations, which are, respectively, a French sentence of about 4 s pronounced by a male speaker and a stationary USASI noise (United States of America Standards Institute, now ANSI), both sampled at \(8\;{\text{kHz}}\) and quantized on 16 bits. These two source signals and two real acoustic impulse responses (see Fig. 5) are used in the simplified convolutive mixture of Sect. 2 to generate the mixing signals \({m_1}(n)\) and \({m_2}(n)\), given in Fig. 6.

Fig. 5 Example of the simulated impulse responses, \({h_1}(n)\) [left] and \({h_2}(n)\) [right], with \(L=128\)

Fig. 6 Original signals: speech signal \(s(n)\) [top left] and noise signal \(b(n)\) [top right]; mixing signals: mixture 1 [bottom left] and mixture 2 [bottom right]

The proposed dual subband algorithm is controlled by a manual voice activity detector (MAVD). The filter \({w_1}(n)\) is adapted during speech pauses, when only the noise is present and its characteristics can be estimated, in order to obtain the speech signal at the output \({u_1}(n)\). To retrieve the noise signal at the output \({u_2}(n)\), the filter \({w_2}(n)\) is updated during speech presence periods. Figure 7 shows an example of the MAVD applied to the original speech signal.

Fig. 7 Original speech signal [in blue] and its manual segmentation [in magenta]. (Color figure online)

Figures 8, 9 and 10 present the frequency responses of the analysis and synthesis filters described in Sect. 3. The number of subbands is set to 2 and 4, and the corresponding lengths of the subband filters are 16 and 32, respectively.

Fig. 8 Frequency response characteristics of the analysis/synthesis filters: two subbands (\({H_1}(n)/{G_1}(n)\) left) and (\({H_2}(n)/{G_2}(n)\) right)

Fig. 9 Frequency response characteristics of the analysis/synthesis filters: four subbands (\({H_1}(n)/{G_1}(n)\) left) and (\({H_2}(n)/{G_2}(n)\) right)

Fig. 10 Frequency response characteristics of the analysis/synthesis filters: four subbands (\({H_3}(n)/{G_3}(n)\) left) and (\({H_4}(n)/{G_4}(n)\) right)

Figure 11 compares the time evolution of the original speech signal \(s(n)\) with that of the enhanced signal \({u_1}(n)\) obtained by the proposed dual subband algorithm with \(M=2\) and \(M=4\) and by its fullband version, the DFNLMS algorithm published recently in Sayoud et al. (2018). These results show that both algorithms, i.e. the proposed dual subband algorithm and its fullband version, are able to remove the noise from the output \({u_1}(n)\), which confirms their good behavior in noise reduction applications.

Fig. 11 Comparative time evolution of the original speech signal \(s(n)\) and the enhanced one \({u_1}(n)\) obtained by the following algorithms: DFNLMS [top], proposed with 2 subbands [middle], proposed with 4 subbands [bottom]

4.1 Performance measure

We have compared the noise cancellation performance of the proposed dual subband FNLMS algorithm with that of (i) its fullband version, the dual fast normalized least mean square (DFNLMS) algorithm (Sayoud et al. 2018), which combines the FBSS structure with the FNLMS algorithm; (ii) the classical dual normalized least mean square (DNLMS) algorithm (Van Gerven et al. 1992), a dual adaptive filtering algorithm that combines the FBSS structure with the NLMS algorithm; and (iii) the two-channel subband forward (2CSF) algorithm (Djendi and Bendoumia 2013), a subband adaptive filtering algorithm based on the forward blind source separation structure. This comparative evaluation is performed using the objective criteria described below. As we are interested in speech enhancement, the objective evaluation focuses only on the enhanced output \({u_1}(n)\) and the adaptive filter \({w_1}(n)\). The simulation parameters of each algorithm are given in Table 3.

Table 3 Simulation parameters of the fullband DNLMS algorithm (Van Gerven et al. 1992), the fullband DFNLMS algorithm (Sayoud et al. 2018), the 2CSF algorithm (Djendi and Bendoumia 2013) and the proposed dual subband FNLMS algorithm [in this paper]

4.1.1 Segmental signal-to-noise-ratio (SegSNR) criterion

To evaluate the noise cancellation performance of the proposed dual subband algorithm in comparison with the DNLMS, DFNLMS and 2CSF algorithms, we have used the SegSNR criterion, which is computed as follows (Sayoud et al. 2018):

$$SegSN{R_{dB}}=\frac{{10}}{M}\mathop \sum \limits_{{m=0}}^{{M - 1}} {\log _{10}}\left( {\frac{{\mathop \sum \nolimits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s\left( n \right)} \right|}^2}}}{{\mathop \sum \nolimits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s(n) - {u_1}\left( n \right)} \right|}^2}}}} \right)$$
(39)

where \(s(n)\) and \({u_1}\left( n \right)\) are the original and the enhanced speech signals, respectively, \(\left| \cdot \right|\) denotes the absolute value and \({\log _{10}}\) is the base-10 logarithm. The parameters \(M\) and \(N\) are the number of segments and the segment length, respectively (here \(M\) denotes the number of segments, not the number of subbands). We note that we obtain \(M\) values of the SegSNR criterion, each averaged over \(N\) samples, and that all \(M\) segments correspond to speech presence periods only. The simulation parameters of each algorithm are given in Table 3, and the results are reported in Figs. 12 and 13. In the first experiment (Fig. 12), white noise is used at the input of the convolutive mixture to evaluate the stability of the proposed dual subband algorithm with 2 and 4 subbands. In the second experiment (Fig. 13), USASI noise is used to test the convergence speed of each algorithm. With white noise, the four simulated algorithms (DNLMS, DFNLMS, 2CSF and the proposed dual subband algorithm) behave similarly in terms of SegSNR for both adaptive filter lengths (L = 64 and L = 128), which confirms that the proposed dual subband FNLMS algorithm is numerically stable. With USASI noise, the DNLMS algorithm performs poorly. In the transient regime, the proposed dual subband algorithm (with 2 and 4 subbands), its fullband version (DFNLMS) and the 2CSF algorithm with 2 subbands behave similarly; in the steady-state regime, however, the SegSNR values of the proposed dual subband algorithm decrease, particularly when the number of subbands is 4.
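The SegSNR criterion of Eq. (39) can be sketched as follows. The function name, the small constant `eps` that guards against empty or perfectly reconstructed segments, and the toy signals are assumptions; the inputs are assumed to be already restricted to speech presence periods.

```python
import math


def seg_snr_db(s, u1, N, eps=1e-12):
    """Segmental SNR in dB over non-overlapping segments of length N (Eq. (39))."""
    n_seg = len(s) // N
    total = 0.0
    for m in range(n_seg):
        seg = range(N * m, N * m + N)
        num = sum(s[n] ** 2 for n in seg)            # clean speech energy
        den = sum((s[n] - u1[n]) ** 2 for n in seg)  # residual error energy
        total += math.log10((num + eps) / (den + eps))
    return 10.0 * total / n_seg


# Toy signals (hypothetical): a constant 10 % amplitude error in every segment.
s = [1.0] * 8
u1 = [0.9] * 8
value = seg_snr_db(s, u1, N=4)
```

In this toy case the error energy is one hundredth of the signal energy in every segment, so the criterion is close to 20 dB.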

Fig. 12 Segmental SNR evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a white noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

Fig. 13 Segmental SNR evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a USASI noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

4.1.2 System mismatch criterion

The system mismatch (SM) criterion objectively quantifies the convergence speed of the adaptive filter of each algorithm (DNLMS, DFNLMS, 2CSF and the proposed dual subband algorithm) towards the optimal solution. As we are interested only in the enhanced output \({u_1}(n)\), we focus on the convergence of the adaptive filter \({w_1}(n)\) to the real impulse response \({h_1}(n)\). The SM is estimated by the following relation (Hu and Loizou 2008):

$$S{M_{dB}}=20~{\log _{10}}\left( {\frac{{\left\| {{h_1} - {w_1}(n)} \right\|}}{{\left\| {{h_1}} \right\|}}} \right)$$
(40)

where \(\left\| \cdot \right\|\) denotes the Euclidean norm. The simulation parameters of each algorithm are the same as given in Table 3. We have evaluated the SM criterion for two noise types (USASI and white) and two adaptive filter lengths (L = 64 and L = 128). The results are reported in Figs. 14 and 15. Figure 14 shows that, with white noise, the overall behavior of the four simulated algorithms is very close. With USASI noise (Fig. 15), the proposed dual subband FNLMS algorithm converges faster than the other algorithms. It reaches its fastest convergence speed when the number of subbands is 4, for both adaptive filter lengths (L = 64 and L = 128).
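The system mismatch of Eq. (40) reduces to a ratio of two Euclidean norms, as in the short sketch below. The function name and the toy coefficient values are assumptions; the sketch also assumes that \({w_1}\) differs from \({h_1}\), since the logarithm is undefined for a perfect match.

```python
import math


def system_mismatch_db(h1, w1):
    """System mismatch in dB between the true response h1 and the estimate w1 (Eq. (40))."""
    num = math.sqrt(sum((h - w) ** 2 for h, w in zip(h1, w1)))
    den = math.sqrt(sum(h ** 2 for h in h1))
    return 20.0 * math.log10(num / den)


# Toy example (hypothetical): 10 % error on the only non-zero coefficient.
sm = system_mismatch_db([1.0, 0.0], [0.9, 0.0])
```

A relative error of 10 % in the coefficient vector corresponds to about -20 dB, and each further tenfold reduction of the error lowers the criterion by another 20 dB.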

Fig. 14 System mismatch evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a white noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

Fig. 15 System mismatch evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a USASI noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

4.1.3 Segmental mean square error (SegMSE) criterion

We have used the segmental MSE of Eq. (41) to evaluate the convergence speed of the proposed dual subband FNLMS algorithm in comparison with its fullband version (DFNLMS), the classical fullband DNLMS algorithm and the 2CSF algorithm.

$$SegMS{E_{dB}}=\frac{{10}}{M}\mathop \sum \limits_{{m=0}}^{{M - 1}} {\log _{10}}\left( {\frac{1}{N}\mathop \sum \limits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s\left( n \right) - {u_1}\left( n \right)} \right|}^2}} \right)$$
(41)

where \(N\) is the segment length of the original signal \(s\left( n \right)\) and the enhanced one \({u_1}(n)\), and \(M\) represents the number of segments in silence periods. Relation (41) shows that the SegMSE criterion is evaluated only in the speech pauses (Ghribi et al. 2016). The simulation parameters are the same as given in the previous sections (Table 3).
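The SegMSE criterion of Eq. (41) can be sketched in the same way. The function name, the constant `eps` that avoids taking the logarithm of zero, and the toy signals are assumptions; the inputs are assumed to be already restricted to speech pauses.

```python
import math


def seg_mse_db(s, u1, N, eps=1e-12):
    """Segmental MSE in dB over non-overlapping segments of length N (Eq. (41))."""
    n_seg = len(s) // N
    total = 0.0
    for m in range(n_seg):
        seg = range(N * m, N * m + N)
        mse = sum((s[n] - u1[n]) ** 2 for n in seg) / N  # mean squared error of segment m
        total += math.log10(mse + eps)
    return 10.0 * total / n_seg


# Toy example (hypothetical): silence at the source, residual noise of amplitude 0.1.
s = [0.0] * 8
u1 = [0.1] * 8
value = seg_mse_db(s, u1, N=4)
```

During speech pauses \(s(n)\) is zero, so the criterion measures the power of the residual noise at the output \({u_1}(n)\); here a residual amplitude of 0.1 gives about -20 dB.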

Figure 16 shows the SegMSE evaluation with white noise; in this simulation, the overall behavior of the four simulated algorithms is again very close. With USASI noise (Fig. 17), however, the proposed dual subband FNLMS algorithm (with 2 and 4 subbands) clearly performs better than the other algorithms in the transient phase. These results confirm that the proposed dual subband FNLMS algorithm improves the convergence speed in real situations where the source signals are not stationary, even in noisy conditions (input SNR of 0 dB).

Fig. 16 Segmental MSE evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a white noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

Fig. 17 Segmental MSE evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a USASI noise source, for the adaptive filter lengths L = 64 [right] and L = 128 [left]. The input SNR at both inputs is 0 dB

4.1.4 Cepstral distance criterion

In order to compare the average cepstral distance between the original speech signal and the enhanced outputs obtained by the four algorithms (DNLMS, DFNLMS, 2CSF and the proposed dual subband algorithm), we have used the cepstral distance (CD) criterion, computed as follows (Rabiner 1993; Sayoud et al. 2018):

$$C{D_{dB}}=\sum\limits_{\lambda =0}^{T - 1} IFFT{\left[ \log \left( \left| S(\lambda ,\omega ) \right| \right) - \log \left( \left| {U_1}(\lambda ,\omega ) \right| VA{D_\lambda } \right) \right]^2}$$
(42)

where \(S(\lambda ,\omega )\) and \({U_1}(\lambda ,\omega )\) denote the short-time Fourier transforms of the original speech signal \(s(n)\) and of the enhanced one \({u_1}(n)\), respectively, at each frame \(\lambda\); \(T\) is the number of frames over which the CD criterion is averaged; and \(VA{D_\lambda }\) is the voice activity detector decision for frame \(\lambda\). Figures 18 and 19 present the results of the CD criterion evaluation. When white noise is used in the mixing model (see Fig. 18), the DNLMS and DFNLMS algorithms provide the same CD values, even with different adaptive filter lengths (i.e. 64 and 128). The proposed dual subband FNLMS algorithm (with 2 and 4 subbands) and the 2CSF algorithm (with 2 subbands) behave poorly in comparison with these two algorithms. Furthermore, when the USASI noise is used (see Fig. 19), the DFNLMS algorithm outperforms the other algorithms (i.e. DNLMS, 2CSF, and the proposed dual subband algorithm). We can also see that the DNLMS algorithm has the worst performance when the adaptive filter length is large (L = 128).
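The CD computation can be illustrated with the following Python sketch. It is not the authors' code and rests on several assumptions that are not stated in the text: the VAD decision is used to select the speech-active frames, the squared cepstral difference is averaged over those frames, and the result is converted to dB with \(10\log_{10}\). The function name `cepstral_distance_db`, the frame length, the hop size, the Hann window and the constant `eps` are hypothetical choices.

```python
import numpy as np


def cepstral_distance_db(s, u1, vad, frame_len=256, hop=128, eps=1e-12):
    """Cepstral distance (dB) between s and u1 over speech-active frames.

    vad : per-frame boolean decisions, True when speech is present
          (hypothetical input; the source only names a VAD parameter)
    eps : small constant avoiding log(0) (assumption)
    """
    s = np.asarray(s, dtype=float)
    u1 = np.asarray(u1, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(s) - frame_len) // hop

    distances = []
    for lam in range(n_frames):
        if not vad[lam]:
            continue  # assumption: VAD selects the frames that contribute
        start = lam * hop
        S = np.fft.fft(window * s[start:start + frame_len])
        U1 = np.fft.fft(window * u1[start:start + frame_len])
        # Difference of log-magnitude spectra, then back to the cepstral domain
        log_diff = np.log(np.abs(S) + eps) - np.log(np.abs(U1) + eps)
        cep_diff = np.fft.ifft(log_diff).real
        distances.append(np.sum(cep_diff ** 2))

    if not distances:
        raise ValueError("no speech-active frame found")
    # Assumption: average over the kept frames and express the result in dB
    return float(10.0 * np.log10(np.mean(distances) + eps))
```

With this convention, a pure gain mismatch between \(s(n)\) and \({u_1}(n)\) only affects the zeroth cepstral coefficient, whereas spectral-shape distortion spreads over the higher coefficients.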

Fig. 18

CD criterion evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and the proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a white noise source, for the adaptive filter lengths L = 64 [top] and L = 128 [bottom]. The input SNR at both inputs is 0 dB

Fig. 19

CD criterion evaluation of the DNLMS (Van Gerven et al. 1992), DFNLMS (Sayoud et al. 2018), 2CSF (Djendi and Bendoumia 2013) and the proposed dual subband (with 2 and 4 subbands) [in this paper] algorithms, using a USASI noise source, for the adaptive filter lengths L = 64 [top] and L = 128 [bottom]. The input SNR at both inputs is 0 dB

5 Conclusion

In this paper, we have proposed a new dual subband implementation of the forward blind source separation (FBSS) structure based on the fast normalized least mean square (FNLMS) algorithm. The proposed dual subband FNLMS algorithm has shown good properties in extracting the speech signal from very noisy observations. To evaluate its performance against its fullband version (the dual fast NLMS algorithm), the classical fullband dual NLMS algorithm and the two-channel forward 2CSF algorithm, intensive experiments were performed using several objective criteria, in different situations where punctual noise components are present at the input of the convolutive mixing model. When the noise in the mixing observations is white, the simulated algorithms show almost the same performance. However, when the noise is of USASI type, the results of the SM and Segmental MSE evaluations show the superiority of the proposed subband FNLMS algorithm in terms of convergence speed in the transient phase. The Segmental SNR evaluation also shows the good behavior of the proposed dual subband algorithm in reducing the acoustic noise components at the processing output. The only poor behavior of the proposed dual subband algorithm is observed with the cepstral distance criterion, specifically when the number of subbands is high, although the CD values of the proposed algorithm remain around −5 dB, which indicates good intelligibility of the output speech signal. From these results, we conclude that using the proposed dual subband FNLMS algorithm requires a compromise between the number of subbands and the targeted performance, either faster convergence or less speech distortion, and that this choice should be made according to the application.