A new dual subband fast NLMS adaptive filtering algorithm for blind speech quality enhancement and acoustic noise reduction

Djendi, Mohamed; Sayoud, Akila

doi:10.1007/s10772-019-09614-9


Published: 28 March 2019

Volume 22, pages 391–406, (2019)

International Journal of Speech Technology


Abstract

This paper addresses the problem of acoustic noise reduction and speech enhancement using the forward blind source separation structure. We recently proposed a combination of the forward blind source separation structure and the fast normalized least mean square algorithm, which provides an efficient dual algorithm for noise reduction and speech enhancement applications. In this paper, we propose a new subband implementation of this dual algorithm that improves the convergence speed of its fullband form. The performance of the proposed dual subband algorithm is compared with that of its fullband version (the dual fast normalized least mean square algorithm), the classical fullband dual normalized least mean square algorithm, and the two-channel subband forward algorithm in terms of several objective criteria. The obtained results show the good performance of the proposed dual subband algorithm.


1 Introduction

In wireless communications, the useful speech signals are severely degraded by acoustic noise components caused by different sources such as environmental background noise, communication channel noise, etc. In recent years, estimating the speech of interest from its corrupted observations has become one of the main objectives of research on acoustic signal processing, which involves a wide variety of noise reduction and speech enhancement techniques employing one (Lee et al. 2017), two (Gabrea 2003) or multiple sensors (Bouchard 2003).

Single-channel speech enhancement is still a significant field of research due to its simplicity of implementation and ease of computation. Recently, Upadhyay (2016) dealt with the problem of single-channel speech enhancement in stationary environments and proposed Wiener filtering with a recursive noise estimation algorithm to enhance speech signals degraded by additive noise. Barysenkaa et al. (2018) proposed a single-channel speech enhancement technique using new phase estimators that rely on the inter-component phase relations (ICPR) of a polyharmonic signal like speech. The literature also contains many works on dual- and multi-channel speech enhancement methods. Among them, Nabi et al. (2017) proposed a dual-microphone speech enhancement algorithm that combines the coherence function with an improved speech enhancement algorithm based on the discrete wavelet transform (DWT), and Nabi et al. (2016) proposed a dual-channel speech enhancement algorithm dedicated to mobile phone applications using the coherence function and the Kalman filter. Additionally, Meyer and Simmer (1997) proposed a multi-channel speech enhancement method for car environments using Wiener filtering and spectral subtraction; the proposed algorithm yields better noise reduction with significantly less distortion and artificial noise than spectral subtraction or Wiener filtering alone. The blind source separation (BSS) structure is another powerful approach for removing acoustic noise using only the observed noisy signals, without any a priori knowledge of the source signals (Syskind Pedersen et al. 2007; Kocinski 2008). Many researchers have suggested combinations of adaptive algorithms with the forward and backward blind source separation structures in different domains, such as the time domain (Henni et al. 2019; Gabrea et al. 1996), the frequency domain (Zoulikha 2016) and the wavelet domain (Ghribi et al. 2016). On the other hand, subband adaptive filtering (SAF) has been adopted in real applications of noise reduction and speech enhancement in order to improve the convergence speed and reduce the computational complexity of conventional fullband adaptive filters (Reza Abutalebi et al. 2004; Milani and Panahi 2009; Lee and Gan 2010; Kim et al. 2008). Several techniques based on this approach can be found in the literature; for example, Lee and Gan (2004) proposed a normalized SAF (NSAF) algorithm based on the principle of minimum disturbance to deal with colored input signals, and Yang et al. (2012) proposed an improved version of the normalized SAF algorithm to speed up the convergence.

In this paper, we propose a new dual subband implementation of the forward blind source separation (FBSS) structure based on the fast normalized least mean square (FNLMS) algorithm to enhance speech signals degraded by acoustic noise components. In the proposed dual subband FNLMS algorithm, the fullband input signals are partitioned into a set of subband signals that occupy contiguous portions of the frequency band and are then downsampled, which facilitates the manipulation of the information contained in each subband. The forward blind source separation structure is then applied, with the same two adaptive filters used in all subbands of the FBSS structure. This adaptive weight-control mechanism differs from that of the conventional SAF structure, where each subband has its own sub-filter and adaptation loop (Kokkinakis and Loizou 2007). The proposed dual subband algorithm shows better convergence speed than: (i) its fullband version, the dual fast normalized least mean square (DFNLMS) algorithm proposed recently in Sayoud et al. (2018), (ii) the classical fullband dual normalized least mean square (DNLMS) algorithm (Van Gerven et al. 1992), and (iii) the two-channel subband forward (2CSF) algorithm (Djendi and Bendoumia 2013).

This paper is organized as follows: Sect. 2 presents the FBSS structure for a convolutive mixture. Sect. 3 describes the proposed dual subband algorithm and its mathematical formulation. Sect. 4 presents the simulation results of the proposed dual subband algorithm in comparison with its fullband version (the DFNLMS algorithm), the classical fullband DNLMS algorithm and the two-channel subband forward (2CSF) algorithm. Finally, Sect. 5 concludes the paper.

2 The forward blind source separation structure for the convolutive mixture

In real situations, the recorded speech signal is a combination of multiple reflections from the surroundings, which arrive as delayed and filtered versions of the source signal. In such situations, the mixing model is well approximated by a convolutive mixing model (Syskind Pedersen et al. 2008; Djendi et al. 2007). In our work we focus specifically on the case of two sources recorded by two microphones, as modeled in Fig. 1, and we consider the following assumptions (Weinstein et al. 1993; Djendi 2010):

The two sources, the speech signal $s(n)$ and the noise $b(n)$, are statistically independent.
The direct acoustic paths are equal to the Kronecker unit impulse $\delta (n)$.

The noisy signals are linear mixtures of filtered versions of the source signals, and are given by the following relations:

$${m_1}\left( n \right)=s\left( n \right)+{h_1}\left( n \right)*b(n)$$

(1)

$${m_2}\left( n \right)=b\left( n \right)+{h_2}\left( n \right)*s(n)$$

(2)

where (*) denotes the convolution operation, and ${h_1}(n)$ and ${h_2}(n)$ represent the cross-coupling effects between the channels. The main problem is to retrieve the original sources $s(n)$ and $b(n)$ from the two mixing signals without any information on the sources. To solve this problem, we use the forward blind source separation structure (Darazirar and Djendi 2015; Van Gerven et al. 1992) depicted in Fig. 2.

The two output signals of the FBSS structure are given by:

$${u_1}\left( n \right)={m_1}\left( n \right) - {m_2}\left( n \right)*{w_1}\left( n \right)$$

(3)

$${u_2}\left( n \right)={m_2}\left( n \right) - {m_1}(n)*{w_2}(n)$$

(4)

By inserting relations (1) and (2) in (3) and (4) respectively, the relations of the two output signals become as follows:

$${u_1}(n)=b\left( n \right)*\left[ {{h_1}\left( n \right) - {w_1}(n)} \right]~+s\left( n \right)*\left[ {\delta \left( n \right) - {h_2}\left( n \right)*{w_1}\left( n \right)} \right]$$

(5)

$${u_2}\left( n \right)=s\left( n \right)*\left[ {{h_2}\left( n \right) - {w_2}\left( n \right)} \right]+b\left( n \right)*\left[ {\delta \left( n \right) - {h_1}(n)*{w_2}\left( n \right)} \right]$$

(6)

To obtain the speech and noise signals at the outputs ${u_1}\left( n \right)$ and ${u_2}\left( n \right)$, respectively, the adaptive filters must converge to the optimal solution, i.e. ${w_1}(n)={h_1}(n)$ and ${w_2}(n)={h_2}(n)$. Eqs. (5) and (6) then become:

$${u_1}\left( n \right)=s\left( n \right)*[\delta \left( n \right) - {h_2}\left( n \right)*{h_1}\left( n \right)]$$

(7)

$$~{u_2}\left( n \right)=b\left( n \right)*[\delta \left( n \right) - {h_1}\left( n \right)*{h_2}\left( n \right)]$$

(8)

The FBSS structure has the drawback of distorting the output signals through the post-filter $pf=[\delta \left( n \right) - {h_2}\left( n \right)*{h_1}\left( n \right)]$ (Djendi et al. 2006). To avoid this distortion, we consider in this paper the case where the two microphones are loosely spaced.
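As an illustration of Eqs. (1) to (8), the following Python sketch builds a toy convolutive mixture and applies the FBSS structure with the adaptive filters fixed at the optimal solution. It is only a numerical check under stated assumptions, not the authors' implementation: the helper names (`convolve`, `mix`, `fbss_outputs`, `post_filter`) and all signal and impulse-response values are hypothetical, and the convolutions are truncated to the signal length.

```python
def convolve(x, h):
    """Full linear convolution of two finite sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mix(s, b, h1, h2):
    """Eqs. (1)-(2): m1 = s + h1*b and m2 = b + h2*s, truncated to len(s)."""
    n = len(s)
    hb = convolve(b, h1)[:n]
    hs = convolve(s, h2)[:n]
    m1 = [s[k] + hb[k] for k in range(n)]
    m2 = [b[k] + hs[k] for k in range(n)]
    return m1, m2


def fbss_outputs(m1, m2, w1, w2):
    """Eqs. (3)-(4): u1 = m1 - m2*w1 and u2 = m2 - m1*w2."""
    n = len(m1)
    c1 = convolve(m2, w1)[:n]
    c2 = convolve(m1, w2)[:n]
    u1 = [m1[k] - c1[k] for k in range(n)]
    u2 = [m2[k] - c2[k] for k in range(n)]
    return u1, u2


def post_filter(h1, h2):
    """Post-filter delta(n) - h2(n)*h1(n) appearing in Eqs. (7)-(8)."""
    pf = [-v for v in convolve(h2, h1)]
    pf[0] += 1.0
    return pf


# Hypothetical toy data (not taken from the paper).
s = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0]   # "speech"
b = [0.3, 0.1, -0.2, 0.4, -0.1, 0.2]     # "noise"
h1 = [0.0, 0.5, 0.2]                     # cross-coupling path noise -> mic 1
h2 = [0.0, 0.3, -0.1]                    # cross-coupling path speech -> mic 2

m1, m2 = mix(s, b, h1, h2)
# Adaptive filters set to the optimal solution w1 = h1, w2 = h2.
u1, u2 = fbss_outputs(m1, m2, h1, h2)

# Eqs. (7)-(8) predict the sources filtered by the post-filter.
expected_u1 = convolve(s, post_filter(h1, h2))[:len(s)]
expected_u2 = convolve(b, post_filter(h1, h2))[:len(b)]
```

With these toy values, `u1` matches `expected_u1` up to rounding, which shows that the noise term vanishes at the optimal solution while the speech is shaped by the post-filter.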

3 Description of the proposed dual subband algorithm

In this section, we describe the proposed dual subband algorithm. A general scheme of the proposed dual subband FNLMS algorithm decomposition is given in Fig. 3. The proposed dual subband algorithm consists of the following steps:

3.1 Step 1

In the first step, we use the analysis filter banks to split the fullband mixing signals into a finite number $M$ of subband signals ${m_{1i}}(n),\,{m_{2i}}(n)$, and we then decimate each sub-signal by a factor $D$. The decimated mixing sub-signals can be expressed as follows:

$${m_{1i,D}}(p)={m_{1i}}(pM)\,\;i=1,~2,~ \ldots ,M.$$

(9)

$${m_{2i,D}}(p)={m_{2i}}(pM)\;\,i=1,~2,~ \ldots ,M.$$

(10)

$$~{m_{1i}}(n)=H_{i}^{T}{m_1}(n)\,\;i=1,~2,~ \ldots ,M.$$

(11)

$${m_{2i}}(n)=H_{i}^{T}{m_2}(n)\;\,i=1,~2,~ \ldots ,M.$$

(12)

where $M$ is the number of subbands and $D$ is the decimation factor; we take $D=M$. The variable $n$ is the time index of the original fullband signals and $p$ is the time index of the decimated sub-signals. ${m_{1i}}(pM)$ and ${m_{2i}}(pM)$ are the outputs of the analysis filter banks sampled at $n=pM$. ${m_1}\left( n \right)=\left[ {{m_1}\left( n \right),~{m_1}\left( {n - 1} \right), \ldots ,{m_1}\left( {n - l+1} \right)} \right]$ and ${m_2}\left( n \right)=\left[ {{m_2}\left( n \right),~{m_2}\left( {n - 1} \right), \ldots ,{m_2}\left( {n - l+1} \right)} \right]$ are the vectors of the $l$ most recent mixture samples, and $l$ is the length of the analysis filters ${H_i}$.
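Step 1 can be sketched in a few lines of Python. This is a minimal illustration of Eqs. (9) to (12), assuming causal FIR analysis filters with zero initial state; the two-tap filters below are a hypothetical two-band example chosen for readability and are not the length-16 or length-32 filters used in the simulations.

```python
def fir_filter(x, h):
    """Causal FIR filtering y(n) = sum_k h[k] * x(n - k), zero initial state."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def analysis_bank(x, H, D):
    """Eqs. (9)-(12): filter x with each analysis filter H[i], keep every D-th sample."""
    return [fir_filter(x, H_i)[::D] for H_i in H]


# Hypothetical two-band example (M = D = 2) with a low-pass and a high-pass filter.
H = [[0.5, 0.5], [0.5, -0.5]]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
m_sub = analysis_bank(x, H, D=2)
# m_sub[0] is the decimated low-pass subband, m_sub[1] the decimated high-pass one.
```

Each decimated sub-signal has roughly `len(x) / D` samples, so the adaptive filters of Step 2 run at the reduced rate.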

3.2 Step 2

In the second step, we apply the FBSS structure described in Sect. 2 to obtain the decimated output sub-signals ${u_{1i,D}}\left( p \right)$, ${u_{2i,D}}\left( p \right)$ from only the decimated mixing sub-signals ${m_{1i,D}}(p)$, ${m_{2i,D}}(p)$. This FBSS structure uses two adaptive filters to extract the original speech from noise. In this paper, we update the coefficients of these adaptive filters using the fast normalized least mean square (FNLMS) algorithm in a subband form; a full mathematical description of the proposed dual subband FNLMS algorithm is presented in Sect. 3.4.

The decimated output sub-signals of the proposed dual subband FNLMS algorithm are given by the following formulas:

$${u_{1i,D}}\left( p \right)={m_{1i,D}}\left( p \right) - w_{1}^{T}\left( p \right){m_{2i,D}}(p)\,\,\,i=1,~2,~ \ldots ,M.$$

(13)

$${u_{2i,D}}\left( p \right)={m_{2i,D}}\left( p \right) - w_{2}^{T}\left( p \right){m_{1i,D}}(p)\,\,\,i=1,~2,~ \ldots ,M.$$

(14)

where ${m_{1i,D}}\left( p \right)=\left[ {{m_{1i,D}}\left( p \right),~{m_{1i,D}}\left( {p - 1} \right), \ldots ,{m_{1i,D}}\left( {p - L+1} \right)} \right]$ and ${m_{2i,D}}\left( p \right)=\left[ {{m_{2i,D}}\left( p \right),~{m_{2i,D}}\left( {p - 1} \right), \ldots ,{m_{2i,D}}\left( {p - L+1} \right)} \right]$. $L$ is the length of the adaptive filters.
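A possible reading of Eqs. (13) and (14) is sketched below: the same weight vectors $w_1$ and $w_2$ are applied to the regressor of every subband. The function names and the toy values are hypothetical, and samples before the start of a signal are assumed to be zero.

```python
def regressor(x, p, L):
    """Vector [x(p), x(p-1), ..., x(p-L+1)], with zeros before the first sample."""
    return [x[p - k] if p - k >= 0 else 0.0 for k in range(L)]


def subband_outputs(m1_sub, m2_sub, w1, w2, p):
    """Eqs. (13)-(14) at decimated time p for all M subbands.

    m1_sub and m2_sub are lists of M decimated subband signals;
    w1 and w2 are shared by all subbands.
    """
    L = len(w1)
    u1, u2 = [], []
    for m1i, m2i in zip(m1_sub, m2_sub):
        x2 = regressor(m2i, p, L)
        x1 = regressor(m1i, p, L)
        u1.append(m1i[p] - sum(w * v for w, v in zip(w1, x2)))
        u2.append(m2i[p] - sum(w * v for w, v in zip(w2, x1)))
    return u1, u2


# Hypothetical toy values with M = 2 subbands and L = 2 taps.
m1_sub = [[1.0, 2.0, 3.0], [0.5, 0.5, 0.5]]
m2_sub = [[0.0, 1.0, 0.0], [1.0, 1.0, 1.0]]
u1, u2 = subband_outputs(m1_sub, m2_sub, w1=[0.5, 0.25], w2=[0.0, 1.0], p=2)
```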

3.3 Step 3

In the last step, the synthesis filter banks are used to combine the $M$ decimated output sub-signals ${u_{1i,D}}(p)$, ${u_{2i,D}}(p)$ into the fullband outputs ${u_1}(n)$ and ${u_2}(n)$. The synthesis filter bank consists of a bank of interpolators that upsample the subband signals by an interpolation factor $I$, before filtering and adding these subband signals (Reza Abutalebi et al. 2004; Lee and Gan 2010). After the interpolation procedure, the new output sub-signals can be expressed as follows:

$${u_{1i}}\left( n \right)=\left\{ {\begin{array}{*{20}{c}} {{u_{1i,D}}(n/I),~~~n=0, \pm I, \pm 2I, \ldots } \\ {0~~~~~~~~~~~~~~~~~~{\text{otherwise}}} \end{array}} \right.\,\,{\text{for}}\;i=1,~2,~ \ldots ,M.$$

(15)

$${u_{2i}}\left( n \right)=\left\{ {\begin{array}{*{20}{c}} {{u_{2i,D}}(n/I),~~~n=0, \pm I, \pm 2I, \ldots } \\ {0~~~~~~~~~~~~~~~~~~{\text{otherwise}}} \end{array}} \right.\,\,{\text{for}}\;i=1,~2,~ \ldots ,M.$$

(16)

where $I$ is the interpolation factor; in our case we take $I=D=M$.

The fullband outputs ${u_1}(n)$ and ${u_2}(n)$ of the proposed dual subband FNLMS algorithm are given by the following relations:

$${u_1}\left( n \right)=\mathop \sum \limits_{{i=1}}^{M} {\varvec{G}}_{i}^{T}{{\varvec{U}}_{1i}}(n)\,\,\,i=1,~2,~ \ldots ,M.$$

(17)

$${u_2}\left( n \right)=\mathop \sum \limits_{{i=1}}^{M} {\varvec{G}}_{i}^{T}{{\varvec{U}}_{2i}}(n)\,\,i=1,~2,~ \ldots ,M.$$

(18)

where ${U_{1i}}\left( n \right)=\left[ {{u_{1i}}\left( n \right),~{u_{1i}}\left( {n - 1} \right), \ldots ,{u_{1i}}\left( {n - l+1} \right)} \right]$, ${U_{2i}}\left( n \right)=\left[ {{u_{2i}}\left( n \right),~{u_{2i}}\left( {n - 1} \right), \ldots ,{u_{2i}}\left( {n - l+1} \right)} \right].$
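Step 3 can be illustrated as follows. The sketch inserts $I-1$ zeros between decimated samples (Eqs. 15 and 16), filters each upsampled sub-signal with its synthesis filter and sums over the subbands (Eqs. 17 and 18). The synthesis filters `G` and the subband values below are hypothetical; they were chosen to pair with a two-tap low-pass/high-pass analysis bank so that the reconstruction is easy to follow.

```python
def upsample(u_dec, I):
    """Eqs. (15)-(16): insert I - 1 zeros after each decimated sample."""
    out = [0.0] * (len(u_dec) * I)
    out[::I] = u_dec
    return out


def synthesize(sub_dec, G, I):
    """Eqs. (17)-(18): upsample each subband, filter it with G[i], sum over i."""
    n_out = len(sub_dec[0]) * I
    y = [0.0] * n_out
    for u_dec, G_i in zip(sub_dec, G):
        up = upsample(u_dec, I)
        for n in range(n_out):
            for k, g in enumerate(G_i):
                if n - k >= 0:
                    y[n] += g * up[n - k]
    return y


# Hypothetical two-band example (I = M = 2).
sub_dec = [[0.5, 2.5, 4.5], [0.5, 0.5, 0.5]]   # decimated low and high subbands
G = [[1.0, 1.0], [-1.0, 1.0]]                  # assumed synthesis filters
y = synthesize(sub_dec, G, I=2)
```

With these particular values, `y` is the ramp 1, 2, ..., 6 delayed by one sample, i.e. the subbands recombine into the fullband signal up to the filter-bank delay.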

In Table 1, the proposed subband decomposition is summarized.

Table 1 Summary of the proposed dual subband decomposition


3.4 Mathematical formulation of the proposed dual subband FNLMS algorithm

In this subsection, we derive the mathematical formulation of the dual forward fast normalized least mean square algorithm in its subband form. The scheme of the proposed dual subband algorithm is given in Fig. 4.

From Fig. 4, we can deduce that the update relations of the adaptive filters ${w_1}\left( p \right)$ and ${w_2}(p)$ of the proposed dual subband FNLMS algorithm can be expressed as follows:

$${w_1}\left( {p+1} \right)={w_1}\left( p \right) - {\mu _1}\mathop \sum \limits_{{i=1}}^{M} \left[ {{u_{1i,D}}\left( p \right){c_{1i,D}}(p)} \right]\,\,i=1,~2,~ \ldots ,M.$$

(19)

$${w_2}\left( {p+1} \right)={w_2}\left( p \right) - {\mu _2}\mathop \sum \limits_{{i=1}}^{M} \left[ {{u_{2i,D}}\left( p \right){c_{2i,D}}(p)} \right]\,\,i=1,~2,~ \ldots ,M.$$

(20)

where ${w_1}(p)~=~{[{w_1}(p),{w_1}(p~ - ~1),~ \ldots ,~{w_1}(p~ - ~L~+~1)]^T}$ and ${w_2}(p)~=~{[{w_2}(p),{w_2}(p~ - ~1),~ \ldots ,~{w_2}(p~ - ~L~+~1)]^T}$. The step-size parameters ${\mu _1}$ and ${\mu _2}$, with $0<{\mu _1},\,{\mu _2}<2$, affect the convergence behavior of the filter weights. ${c_{1i,D}}\left( p \right)$ and ${c_{2i,D}}\left( p \right)$ are the decimated subband adaptation gain vectors, which are given by the following relations:

$${c_{1i,D}}\left( p \right)={\gamma _{1i,D}}\left( p \right){k_{1i,D}}(p)~\;\,i=1,~2,~ \ldots ,M.$$

(21)

$${c_{2i,D}}\left( p \right)={\gamma _{2i,D}}\left( p \right){k_{2i,D}}(p)~\,\,i=1,~2,~ \ldots ,M.$$

(22)
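The weight update of Eqs. (19) and (20) can be sketched as below, taking the subband outputs and the adaptation gain vectors of Eqs. (21) and (22) as given. The function name and the toy numbers are hypothetical; the point is only that the contributions of all $M$ subbands are accumulated into a single correction of the shared weight vector.

```python
def update_weights(w, mu, u_sub, c_sub):
    """Eq. (19)/(20): w(p+1) = w(p) - mu * sum_i u_i(p) * c_i(p).

    u_sub holds the M scalar subband outputs at time p;
    c_sub holds the M adaptation gain vectors (each of length L).
    """
    L = len(w)
    step = [0.0] * L
    for u_i, c_i in zip(u_sub, c_sub):
        for k in range(L):
            step[k] += u_i * c_i[k]
    return [w[k] - mu * step[k] for k in range(L)]


# Hypothetical toy values with M = 2 subbands and L = 2 taps.
w_next = update_weights(w=[1.0, 0.0], mu=0.5,
                        u_sub=[2.0, -1.0],
                        c_sub=[[0.5, 0.0], [0.0, 1.0]])
```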

The scalars ${\gamma _{1i,D}}(p)$ and ${\gamma _{2i,D}}(p)$ that are used in Eqs. (21) and (22) respectively are called likelihood variables, and can be calculated using the following definition (Benallal and Arezki 2013; Benallal and Benkrid 2007):

$${\gamma _{1i,D}}\left( p \right)=\frac{1}{{1 - k_{{1i,D}}^{T}(p){m_{2i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$

(23)

$${\gamma _{2i,D}}\left( p \right)=\frac{1}{{1 - k_{{2i,D}}^{T}(p){m_{1i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$

(24)

The decimated subband vectors ${k_{1i,D}}(p)$ and ${k_{2i,D}}(p)$ are called the Kalman gains. They are obtained by discarding the forward and backward predictors and using only the forward prediction errors ${e_{1i,D}}(p)$ and ${e_{2i,D}}(p)$ (Benallal and Arezki 2013; Sayoud et al. 2018), so the dual Kalman gains of the proposed algorithm in their subband form can be calculated by the following relations:

$$\left[ {\begin{array}{*{20}{c}} {{k_{1i,D}}(p)} \\ * \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} { - \frac{{{e_{1i,D}}(p)}}{{\lambda {\alpha _{1i,D}}\left( {p - 1} \right)+{c_0}}}} \\ {~{k_{1i,D}}(p - 1)} \end{array}} \right]\,\,\,i=1,~2,~ \ldots ,M.$$

(25)

$$\left[ {\begin{array}{*{20}{c}} {{k_{2i,D}}(p)} \\ * \end{array}} \right]=\left[ {\begin{array}{*{20}{c}} { - \frac{{{e_{2i,D}}(p)}}{{\lambda {\alpha _{2i,D}}\left( {p - 1} \right)+{c_0}}}} \\ {~{k_{2i,D}}(p - 1)} \end{array}} \right]\,\,\,i=1,~2,~ \ldots ,M.$$

(26)

where the asterisk ‘*’ represents the last unused element of the dual Kalman gain vectors, $\lambda \left( {0<\lambda <1} \right)$ is an exponential forgetting factor and ${c_0}$ is a small positive constant used to avoid division by very small values in the absence of an input signal. The decimated subband parameters ${\alpha _{1i,D}}\left( p \right)$ and ${\alpha _{2i,D}}\left( p \right)$ are the forward prediction error variances, given by:

$${\alpha _{1i,D}}\left( p \right)=\lambda {\alpha _{1i,D}}\left( {p - 1} \right)+~{e^2}_{{1i,D}}(p)\,\,i=1,~2,~ \ldots ,M.$$

(27)

$${\alpha _{2i,D}}\left( p \right)=\lambda {\alpha _{2i,D}}\left( {p - 1} \right)+~{e^2}_{{2i,D}}(p)~\,\,i=1,~2,~ \ldots ,M.$$

(28)
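One way to read Eqs. (21), (23), (25) and (27) for a single subband is sketched below. The order of the operations is an assumption, since the summary table is not reproduced here, and the function name and the default values of `lam` and `c0` are hypothetical. The gain is updated by shifting the previous gain down by one position, dropping its last element, and placing the normalized prediction error in the first position.

```python
def fnlms_gain_step(k_prev, alpha_prev, e, x, lam=0.99, c0=1e-3):
    """One gain update for one subband (hypothetical helper).

    k_prev     : previous Kalman gain vector k(p-1), length L
    alpha_prev : previous forward prediction error variance alpha(p-1)
    e          : forward prediction error e(p)
    x          : current regressor vector of length L
    """
    # Eq. (25): new first element, shifted previous gain, last element dropped.
    first = -e / (lam * alpha_prev + c0)
    k = [first] + k_prev[:-1]
    # Eq. (27): recursive update of the prediction error variance.
    alpha = lam * alpha_prev + e * e
    # Eq. (23): likelihood variable.
    gamma = 1.0 / (1.0 - sum(ki * xi for ki, xi in zip(k, x)))
    # Eq. (21): adaptation gain vector.
    c = [gamma * ki for ki in k]
    return k, alpha, gamma, c


# Hypothetical toy values with L = 3.
k, alpha, gamma, c = fnlms_gain_step(k_prev=[0.0, 0.0, 0.0], alpha_prev=1.0,
                                     e=0.5, x=[1.0, 0.0, 0.0],
                                     lam=0.5, c0=0.5)
```

The shift structure is what keeps the cost per subband linear in $L$: only one new element is computed at each iteration.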

The forward prediction errors ${e_{1i,D}}(p)$ and ${e_{2i,D}}(p)$ can be calculated using a first-order model, so the decimated subband forward prediction errors of the proposed algorithm can be expressed as follows:

$${e_{1i,D}}\left( p \right)={m_{2i,D}}\left( p \right) - {a_{1i,D}}{m_{2i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$

(29)

$${e_{2i,D}}\left( p \right)={m_{1i,D}}\left( p \right) - {a_{2i,D}}{m_{1i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$

(30)

where ${a_{1i,D}}$ and ${a_{2i,D}}$ are the decimated subband prediction coefficients. To obtain these prediction coefficients, we minimize the functions $E\left[ {e_{{1i,D}}^{2}(p)} \right]$ and $E\left[ {e_{{2i,D}}^{2}(p)} \right]$, which gives the following relations:

$${a_{1i,D}}(p)=\frac{{E\left[ {{m_{2i,D}}\left( p \right){m_{2i,D}}\left( {p - 1} \right)} \right]}}{{E\left[ {{m_{2i,D}}^{2}\left( {p - 1} \right)} \right]}}=\frac{{{r_{1i,D}}(p)}}{{{r_{2i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$

(31)

$${a_{2i,D}}(p)=\frac{{E\left[ {{m_{1i,D}}\left( p \right){m_{1i,D}}\left( {p - 1} \right)} \right]}}{{E\left[ {{m_{1i,D}}^{2}\left( {p - 1} \right)} \right]}}=\frac{{{r_{3i,D}}(p)}}{{{r_{4i,D}}(p)}}\,\,i=1,~2,~ \ldots ,M.$$

(32)

where ${r_{1i,D}}\left( p \right)$ and ${r_{2i,D}}\left( p \right)$ represent, respectively, the first coefficient of the autocorrelation function of the decimated subband mixtures ${m_{2i,D}}(p)$ and the power of the decimated subband mixtures ${m_{2i,D}}(p)$. ${r_{3i,D}}\left( p \right)$ and ${r_{4i,D}}\left( p \right)$ represent, respectively, the first coefficient of the autocorrelation function of the decimated subband mixtures ${m_{1i,D}}(p)$ and the power of the decimated subband mixtures ${m_{1i,D}}(p)$.

An estimation of the prediction coefficients can be performed by the following relations:

$${a_{1i,D}}(p)=\frac{{{r_{1i,D}}(p)}}{{{r_{2i,D}}(p)+{c_a}}}\,\,i=1,~2,~ \ldots ,M.$$

(33)

$${a_{2i,D}}(p)=\frac{{{r_{3i,D}}(p)}}{{{r_{4i,D}}(p)+{c_a}}}\,\,i=1,~2,~ \ldots ,M.$$

(34)

where $~{r_{1i,D}}(p)$, ${r_{2i,D}}(p)$, ${r_{3i,D}}(p)$, and ${r_{4i,D}}(p)$ are estimated recursively by the following relations:

$${r_{1i,D}}(p)={\lambda _a}{r_{1i,D}}\left( {p - 1} \right)+{m_{2i,D}}(p)~{m_{2i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$

(35)

$${r_{2i,D}}(p)={\lambda _a}{r_{2i,D}}\left( {p - 1} \right)+{m_{2i,D}}^{2}(p)\,\,i=1,~2,~ \ldots ,M.$$

(36)

$${r_{3i,D}}(p)={\lambda _a}{r_{3i,D}}\left( {p - 1} \right)+{m_{1i,D}}(p)~{m_{1i,D}}(p - 1)\,\,i=1,~2,~ \ldots ,M.$$

(37)

$${r_{4i,D}}(p)={\lambda _a}{r_{4i,D}}\left( {p - 1} \right)+{m_{1i,D}}^{2}(p)\,\,i=1,~2,~ \ldots ,M.$$

(38)

where ${\lambda _a}$ is a forgetting factor and ${c_a}$ is a small positive constant. A summary of the proposed dual subband algorithm is given in Table 2.
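The first-order prediction of Eqs. (29), (33), (35) and (36) can be sketched for one subband as follows. The ordering of the updates, the function name and the default values of `lam_a` and `c_a` are assumptions made for illustration.

```python
def prediction_error_step(x_p, x_prev, r1_prev, r2_prev, lam_a=0.9, c_a=1e-3):
    """One first-order forward prediction step for one subband (hypothetical helper).

    x_p, x_prev : current and previous decimated subband mixture samples
    r1_prev     : previous lag-one autocorrelation estimate
    r2_prev     : previous power estimate
    """
    r1 = lam_a * r1_prev + x_p * x_prev   # Eq. (35)
    r2 = lam_a * r2_prev + x_p * x_p      # Eq. (36)
    a = r1 / (r2 + c_a)                   # Eq. (33)
    e = x_p - a * x_prev                  # Eq. (29)
    return e, a, r1, r2


# Hypothetical toy values.
e, a, r1, r2 = prediction_error_step(x_p=2.0, x_prev=1.0,
                                     r1_prev=0.0, r2_prev=0.0,
                                     lam_a=0.5, c_a=0.0)
```

The same helper applies to Eqs. (30), (34), (37) and (38) by feeding it the samples of ${m_{1i,D}}(p)$ instead of ${m_{2i,D}}(p)$.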

Table 2 The proposed dual subband FNLMS algorithm


4 Analysis of simulation results

In this section, we investigate the potential of the proposed dual subband algorithm for achieving speech separation in adverse environments through intensive simulations. In Fig. 6 we present the source signals $s(n)$ and $b(n)$ used in our simulations, which are, respectively, a French sentence of about 4 s, pronounced by one male speaker, and a stationary USASI noise (United States of America Standards Institute, now ANSI), digitized at an $8\;{\text{kHz}}$ sampling frequency with 16-bit quantization. These two source signals and two real acoustic impulse responses (see Fig. 5) are used in the simplified convolutive mixture of Sect. 2 to generate the mixing signals ${m_1}(n)$ and ${m_2}(n)$, given in Fig. 6.

The proposed dual subband algorithm is adapted using a manual activity voice detector (MAVD) system: the filter ${w_1}(n)$ is adapted during speech pauses, where the noise characteristics are estimated, in order to obtain the speech signal at the output ${u_1}(n)$, and the filter ${w_2}(n)$ is updated during speech presence periods in order to retrieve the noise signal at the output ${u_2}(n)$. Figure 7 shows an example of the MAVD of the original speech signal.

We present in Figs. 8, 9, 10 the frequency response characteristics of the analysis and synthesis filters (described in Sect. 3). The number of subbands is chosen equal to 2 and 4, and the length of these subband filters is equal to 16 and 32, respectively.

The comparative time evolution of the original speech signal $s(n)$ and the enhanced one ${u_1}(n)$ obtained by the proposed dual subband algorithm with $M=2$ and $M=4$ and by its fullband version, the DFNLMS algorithm published recently in Sayoud et al. (2018), is presented in Fig. 11. From these results, we observe that both algorithms, i.e. the proposed dual subband algorithm and its fullband version, are able to remove the noise from the output ${u_1}(n)$, which confirms the good behavior of the two algorithms in noise reduction applications.

4.1 Performance measure

We have compared the noise cancellation performance of the proposed dual subband FNLMS algorithm with: (i) its fullband version, the dual fast normalized least mean square (DFNLMS) algorithm (Sayoud et al. 2018), which is based on the combination of the FBSS structure with the FNLMS algorithm; (ii) the classical dual normalized least mean square (DNLMS) algorithm (Van Gerven et al. 1992), which is a dual adaptive filtering algorithm based on the FBSS structure combined with the NLMS algorithm; and (iii) the two-channel subband forward (2CSF) algorithm (Djendi and Bendoumia 2013), which is a subband adaptive filtering algorithm based on the forward blind source separation structure. This comparative evaluation is performed using the objective criteria given below. As we are interested in speech enhancement, we focus only on the enhanced output ${u_1}(n)$ and the adaptive filter ${w_1}(n)$ in the objective evaluation. The simulation parameters of each simulated algorithm are given in Table 3.

Table 3 Simulation parameters of the fullband DNLMS algorithm (Van Gerven et al. 1992), the fullband DFNLMS algorithm (Sayoud et al. 2018), the 2CSF algorithm (Djendi and Bendoumia 2013) and the proposed dual subband FNLMS algorithm (this paper)


4.1.1 Segmental signal-to-noise-ratio (SegSNR) criterion

To evaluate the noise cancellation performance of the proposed dual subband algorithm in comparison to the DNLMS, DFNLMS, 2CSF algorithms, we have used the segSNR criterion, which is computed as follows (Sayoud et al. 2018):

$$SegSN{R_{dB}}=\frac{{10}}{M}\mathop \sum \limits_{{m=0}}^{{M - 1}} {\log _{10}}\left( {\frac{{\mathop \sum \nolimits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s\left( n \right)} \right|}^2}}}{{\mathop \sum \nolimits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s(n) - {u_1}\left( n \right)} \right|}^2}}}} \right)$$

(39)

where $s(n)$ and ${u_1}\left( n \right)$ are the original and the enhanced speech signals, respectively. The parameters $M$ and $N$ are the number of segments and the segment length, respectively (here $M$ denotes the number of segments, not the number of subbands). Note that we obtain $M$ values of the SegSNR criterion at the output, each one averaged over $N$ samples. The symbol $\left| \cdot \right|$ denotes the absolute value operator and ${\log _{10}}$ is the base-10 logarithm. We recall that all $M$ segments correspond to speech presence periods only. The simulation parameters of each simulated algorithm are given in Table 3. The obtained results are reported in Figs. 12, 13. In the first experiment (Fig. 12), we used white noise at the input of the convolutive mixture to evaluate the stability of the proposed dual subband algorithm (with 2 and 4 subbands). In the second experiment (Fig. 13), we used the USASI noise to test the convergence speed of each algorithm. From these results, we note a close behavior of the four simulated algorithms (i.e. DNLMS, DFNLMS, 2CSF and the proposed dual subband algorithm) in terms of the SegSNR criterion with different adaptive filter lengths (i.e. L = 64 and L = 128) when the noise is white, which confirms that the proposed dual subband FNLMS algorithm is numerically stable. When the USASI noise is used, we observe a poor behavior of the DNLMS algorithm. We also note a similar behavior in the transient regime between the proposed dual subband algorithm (with 2 and 4 subbands), its fullband version (the DFNLMS algorithm) and the 2CSF algorithm with 2 subbands. However, in the steady-state regime, the SegSNR values of the proposed dual subband algorithm decrease, specifically when the number of subbands is equal to 4.
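For reference, Eq. (39) can be computed with a short Python function such as the one below. It is a sketch under simplifying assumptions: it uses all consecutive segments of length $N$ rather than only the speech presence segments selected in the paper, it assumes every segment has non-zero speech energy and non-zero error, and the function name is hypothetical.

```python
import math


def seg_snr_db(s, u1, N):
    """Segmental SNR of Eq. (39) over consecutive segments of length N."""
    n_seg = len(s) // N                     # number of segments (M in Eq. 39)
    total = 0.0
    for m in range(n_seg):
        seg = range(N * m, N * m + N)
        num = sum(s[n] ** 2 for n in seg)               # speech energy
        den = sum((s[n] - u1[n]) ** 2 for n in seg)     # residual error energy
        total += math.log10(num / den)
    return 10.0 * total / n_seg


# Hypothetical toy signals: a constant error of 10 % of the amplitude gives 20 dB.
value = seg_snr_db([10.0] * 4, [9.0] * 4, N=2)
```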

4.1.2 System mismatch criterion

The system mismatch (SM) criterion quantifies objectively the convergence speed of the adaptive filter of each algorithm (i.e. DNLMS, DFNLMS, 2CSF, and the proposed dual subband algorithm) towards the optimal solution. As we are interested only in the enhanced output ${u_1}(n)$, we focus on the convergence of the adaptive filter ${w_1}(n)$ to the real impulse response ${h_1}(n)$, so the SM is estimated by the following relation (Hu and Loizou 2008):

$$S{M_{dB}}=20~{\log _{10}}\left( {\frac{{\left\| {{h_1} - {w_1}(n)} \right\|}}{{\left\| {{h_1}} \right\|}}} \right)$$

(40)

where $\left\| \cdot \right\|$ represents the Euclidean norm operator. We recall that the simulation parameters of each algorithm are the same as given in Table 3. We have evaluated the SM criterion for two noise types (i.e. USASI and white) and two adaptive filter lengths (i.e. L = 64 and L = 128). The obtained results are reported in Figs. 14, 15. From Fig. 14 we can see that the overall behavior of the four simulated algorithms is very close. In the case of USASI noise (Fig. 15), we also note the superior convergence speed of the proposed dual subband FNLMS algorithm in comparison with the other ones. The proposed dual subband FNLMS algorithm reaches the fastest convergence speed when the number of subbands is equal to 4, for all selected adaptive filter lengths, i.e. L = 64 and L = 128.
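Eq. (40) translates directly into code. The sketch below assumes that the true impulse response and the adaptive filter have the same length; the function name and the toy values are hypothetical.

```python
import math


def system_mismatch_db(h1, w1):
    """System mismatch of Eq. (40) in dB between h1 and its estimate w1."""
    err = math.sqrt(sum((h - w) ** 2 for h, w in zip(h1, w1)))
    ref = math.sqrt(sum(h * h for h in h1))
    return 20.0 * math.log10(err / ref)


# Hypothetical toy values: the error norm is 10 % of the norm of h1, i.e. -20 dB.
sm = system_mismatch_db(h1=[3.0, 4.0], w1=[3.0, 3.5])
```

More negative values indicate that $w_1$ is closer to $h_1$, so a curve that falls faster over time corresponds to faster convergence.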

4.1.3 Segmental mean square error (SegMSE) criterion

We have used the segmental MSE of Eq. (41) to evaluate the convergence speed of the proposed dual subband FNLMS algorithm in comparison with its fullband version (the DFNLMS algorithm), the classical fullband DNLMS algorithm and the 2CSF algorithm.

$$SegMS{E_{dB}}=\frac{{10}}{M}\mathop \sum \limits_{{m=0}}^{{M - 1}} {\log _{10}}\left( {\frac{1}{N}\mathop \sum \limits_{{n=Nm}}^{{Nm+N - 1}} {{\left| {s\left( n \right) - {u_1}\left( n \right)} \right|}^2}} \right)$$

(41)

where $N$ is the segment length of the original signal ${\text{s}}\left( n \right)$ and the enhanced one ${u_1}(n)$, and $M$ represents the number of segments in silence periods. The SegMSE criterion of relation (41) is evaluated only in the speech pauses (Ghribi et al. 2016). The simulation parameters are the same as given in the previous sections (Table 3).
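Eq. (41) can be sketched in the same way as the other segmental criteria. The function below is an illustration only: it averages over all consecutive segments instead of selecting the speech pauses, it assumes a non-zero error in every segment, and its name is hypothetical.

```python
import math


def seg_mse_db(s, u1, N):
    """Segmental MSE of Eq. (41) over consecutive segments of length N."""
    n_seg = len(s) // N                     # number of segments (M in Eq. 41)
    total = 0.0
    for m in range(n_seg):
        mse = sum((s[n] - u1[n]) ** 2 for n in range(N * m, N * m + N)) / N
        total += math.log10(mse)
    return 10.0 * total / n_seg


# Hypothetical toy signals: a residual of amplitude 0.1 in a silent period gives -20 dB.
value = seg_mse_db([0.0] * 4, [0.1] * 4, N=2)
```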

Figure 16 shows the SegMSE evaluation with white noise. In this simulation, we can see again that the overall behavior of the four simulated algorithms is very close. However, in Fig. 17, when the noise signal is of USASI type, we can see clearly that, in the transient phase, the proposed dual subband FNLMS algorithm (with 2 and 4 subbands) performs better than the other algorithms. From these results, we can confirm that the proposed dual subband FNLMS algorithm improves the convergence speed in real situations when the source signals are not stationary, even in noisy conditions (input SNR of 0 dB).

4.1.4 Cepstral distance criterion

In order to compare the average cepstral distance between the original speech signal and the enhanced outputs obtained by the four algorithms (i.e. DNLMS, DFNLMS, 2CSF, and the proposed dual subband algorithm), we have used the cepstral distance criterion computed as follows (Rabiner 1993; Sayoud et al. 2018):

$$C{D_{dB}}=\mathop \sum \limits_{{\lambda =0}}^{{T - 1}} IFFT{\left[ {\log \left( {\left| {S\left( {\lambda ,\omega } \right)} \right|} \right) - \log \left( {\left| {{U_1}\left( {\lambda ,\omega } \right)} \right|VA{D_\lambda }} \right)} \right]^2}$$

(42)

where $S(\lambda ,\omega )$ and ${U_1}(\lambda ,\omega )$ are the short-time Fourier transforms of the original speech signal $s(n)$ and the enhanced one ${u_1}(n)$, respectively, at each frame $\lambda$, $T$ is the mean averaging value of the CD criterion, and $VAD$ is a voice activity detector. In Figs. 18, 19 we present the results of the CD criterion evaluation. When white noise is used in the mixing model (see Fig. 18), the DNLMS and DFNLMS algorithms provide the same CD values even with different adaptive filter lengths (i.e. 64 and 128). The proposed dual subband FNLMS algorithm (with 2 and 4 subbands) and the 2CSF algorithm (with 2 subbands) behave poorly in comparison with these two algorithms. Furthermore, when the USASI noise is used (see Fig. 19), the DFNLMS algorithm outperforms the other algorithms (i.e. DNLMS, 2CSF, and the proposed dual subband algorithm). We can also see that the DNLMS algorithm has the worst performance when the adaptive filter length is large (L = 128).

5 Conclusion

In this paper we have proposed a new dual subband implementation of the forward blind source separation (FBSS) structure based on the fast normalized least mean square (FNLMS) algorithm. The proposed dual subband FNLMS algorithm has shown good properties in extracting the speech signal from very noisy observations. To evaluate its performance in comparison with its fullband version (the dual fast NLMS algorithm), the classical fullband dual NLMS algorithm, and the two-channel subband forward (2CSF) algorithm, intensive experiments were performed using several objective criteria, in different situations where punctual (point-source) noise components are present at the input of the convolutive mixing model. When the noise in the mixing observations is white, the simulated algorithms have shown almost the same performance. However, when USASI noise is used, the results of the SM and segmental MSE evaluations have shown the superiority of the proposed subband FNLMS algorithm in terms of convergence speed in the transient phase. The segmental SNR evaluation has also shown the good behavior of the proposed dual subband algorithm in reducing the acoustic noise components at the processing output. The only poor behavior of the proposed dual subband algorithm is observed with the cepstral distance criterion, specifically when a high number of subbands is selected, although the CD values of the proposed algorithm remain around −5 dB, which indicates good intelligibility of the output speech signal. According to these results, we can conclude that using the proposed dual subband FNLMS algorithm requires a compromise between the number of subbands and the targeted performance, either faster convergence or less speech distortion, and that this choice depends on the application.

References

Barysenka, S. Y., Vorobiov, V. I., & Mowlaee, P. (2018). Single-channel speech enhancement using inter-component phase relations. Speech Communication, 99, 144–160.
Benallal, A., & Arezki, M. (2013). A fast convergence normalized least-mean-square type algorithm for adaptive filtering. International Journal of Adaptive Control and Signal Processing, 28(10), 1073–1080.
Benallal, A., & Benkrid, A. (2007). A simplified FTF-type algorithm for adaptive filtering. Signal Processing, 87(5), 904–917.
Bouchard, M. (2003). Multichannel affine and fast affine projection algorithms for active noise control and acoustic equalization systems. IEEE Transactions on Speech and Audio Processing, 11(1), 54–60.
Darazirar, I., & Djendi, M. (2015). A two-sensor Gauss-Seidel fast affine projection algorithm for speech enhancement and acoustic noise reduction. Applied Acoustics, 96, 39–52.
Djendi, M. (2010). Advanced techniques for two-microphone noise reduction in mobile communications (Ph.D. dissertation), University of Rennes, France (in French).
Djendi, M., & Bendoumia, R. (2013). A new adaptive filtering subband algorithm for two channel acoustic noise reduction and speech enhancement. Computers & Electrical Engineering, 39(8), 2531–2550.
Djendi, M., Gilloire, A., & Scalart, P. (2007). New frequency domain post-filters for noise cancellation using two closely spaced microphones. Proceedings of EUSIPCO, Poznan, 1, 218–221.
Djendi, M., Scalart, P., & Gilloire, A. (2006). Noise cancellation using two closely spaced microphones: Experimental study with a specific model and two adaptive algorithms. IEEE International Conference on Acoustics Speech and Signal Processing Proceedings, 3, 744–747.
Gabrea, M. (2003). Double affine projection algorithm-based speech enhancement algorithm. In IEEE International Conference on Acoustics, Speech, and Signal Processing, Montréal, vol. 2, pp. 904–907.
Gabrea, M., Mandridake, E., Menez, M., & Vallauri, A. (1996). Two microphones speech enhancement system based on a double fast recursive least squares (DFRLS) algorithm. 8th European Signal Processing Conference, IEEE.
Ghribi, K., Djendi, M., & Berkani, D. (2016). A new wavelet-based forward BSS algorithm for acoustic noise reduction and speech quality enhancement. Applied Acoustics, 105, 55–66.
Henni, R., Djendi, M., & Djebari, M. (2019). A new efficient two-channel fast transversal adaptive filtering algorithm for blind speech enhancement and acoustic noise reduction. Computers & Electrical Engineering, 73, 349–368.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech, and Language Processing, 16(1), 229–238.
Kim, G., Yoo, C. D., & Nguyen, T. Q. (2008). Alias-free subband adaptive filtering with critical sampling. IEEE Transactions on Signal Processing, 56(5), 1894–1904.
Kocinski, J. (2008). Speech intelligibility improvement using convolutive blind source separation assisted by denoising algorithms. Speech Communication, 50(1), 29–37.
Kokkinakis, K., & Loizou, P. C. (2007). Subband-based blind signal processing for source separation in convolutive mixtures of speech. IEEE Transactions on Acoustics Speech and Signal Processing, 4, 917–920.
Lee, K. A., & Gan, W. S. (2004). Improving convergence of the NLMS algorithm using constrained subband updates. IEEE Signal Processing Letters, 11(9), 736–739.
Lee, K. A., Gan, W. S., & Kuo, S. M. (2010). Subband adaptive filtering theory and implementation. Chichester: Wiley.
Lee, S., Han, D. K., & Ko, H. (2017). Single-channel speech enhancement method using reconstructive NMF with spectrotemporal speech presence probabilities. Applied Acoustics, 117(B), 257–262.
Meyer, J., & Simmer, K. U. (1997). Multi-channel speech enhancement in a car environment using Wiener filtering and spectral subtraction. IEEE International Conference on Acoustics, Speech, and Signal Processing.
Milani, A. A., Panahi, I. M. S., & Loizou, P. C. (2009). A new delayless subband adaptive filtering algorithm for active noise control systems. IEEE Transactions on Audio, Speech, and Language Processing, 17(5), 1038–1045.
Nabi, W., Aloui, N., & Cherif, A. (2016). Speech enhancement in dual-microphone mobile phones using Kalman filter. Applied Acoustics, 109, 1–4.
Nabi, W., Aloui, N., & Cherif, A. (2017). An improved speech enhancement algorithm for dual-channel mobile phones using wavelet and genetic algorithm. Computers & Electrical Engineering, 000, 1–14.
Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.
Reza Abutalebi, H., Sheikhzadeh, H., Brennan, R. L., & Freeman, G. H. (2004). A hybrid subband adaptive system for speech enhancement in diffuse noise fields. IEEE Signal Processing Letters, 11(1), 44–47.
Sayoud, A., Djendi, M., Medahi, S., & Guessoum, A. (2018). A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement. Applied Acoustics, 135, 101–110.
Syskind Pedersen, M., Larsen, J., Kjems, U., & Parra, L. C. (2007). A survey of convolutive blind source separation methods. Springer Handbook on Speech Processing and Speech Communication.
Syskind Pedersen, M., Wang, D., Larsen, J., & Kjems, U. (2008). Two-microphone separation of speech mixtures. IEEE Transactions on Neural Networks (in press).
Upadhyay, N., & Jaiswal, R. K. (2016). Single channel speech enhancement: using Wiener filtering recursive noise estimation. Procedia Computer Science, 84, 22–30.
Van Gerven, S., & Van Compernolle, D. (1992). Feedforward and feedback in a symmetric adaptive noise canceller: Stability analysis in a simplified case. Proceedings of the IEEE EUSIPCO, Belgium, Brussels, 1, 1081–1084.
Weinstein, E., Feder, M., & Oppenheim, A. V. (1993). Multi-channel signal separation by decorrelation. IEEE Transactions on Speech and Audio Processing, 1(4), 405–413.
Yang, F., Wu, M., Ji, P., & Yang, J. (2012). An improved multiband-structured subband adaptive filter algorithm. IEEE Signal Processing Letters, 19(10), 647–650.
Zoulikha, M., & Djendi, M. (2016). A new regularized forward blind source separation algorithm for automatic speech quality enhancement. Applied Acoustics, 112, 192–200.

Author information

Authors and Affiliations

Signal Processing and Image Laboratory (LATSI), University of Blida 1, Route de Soumaa, B.P. 270, 09000, Blida, Algeria
Mohamed Djendi & Akila Sayoud

Authors

Mohamed Djendi
Akila Sayoud

Corresponding author

Correspondence to Mohamed Djendi.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Djendi, M., Sayoud, A. A new dual subband fast NLMS adaptive filtering algorithm for blind speech quality enhancement and acoustic noise reduction. Int J Speech Technol 22, 391–406 (2019). https://doi.org/10.1007/s10772-019-09614-9

Received: 13 September 2018
Accepted: 21 March 2019
Published: 28 March 2019
Issue Date: 15 June 2019
DOI: https://doi.org/10.1007/s10772-019-09614-9
