A new speech enhancement adaptive algorithm based on fullband–subband MSE switching

Sayoud, Akila; Djendi, Mohamed; Guessoum, Abderrezak

doi:10.1007/s10772-019-09651-4


Published: 05 October 2019

Volume 22, pages 993–1005, (2019)

International Journal of Speech Technology


Abstract

This paper presents a new fullband–subband switching adaptive speech enhancement algorithm based on mean square error (MSE) estimation. The proposed algorithm automatically switches between two adaptive filtering algorithms, namely the two-channel fullband normalized least mean square (TC-FNLMS) algorithm and the two-channel subband normalized least mean square (TC-SNLMS) algorithm; this switching mechanism significantly improves the convergence speed of the proposed algorithm. To confirm the efficiency of the proposed algorithm in comparison with the fullband and subband versions of the two-channel NLMS algorithm, several experiments were carried out in terms of the segmental signal-to-noise ratio (SegSNR), segmental mean square error (SegMSE), system mismatch (SM) and cepstral distance (CD).


1 Introduction

The advent and wide dissemination of mobile voice communication systems have substantially increased the need for reliable communication in noisy environments. Environmental noise significantly limits the performance of hands-free voice communications (Thumchirdchupong and Tangsangiumvisai 2013). To address this issue, several noise reduction and speech enhancement techniques have been proposed in the literature (Loizou 2017; Kuo and Peng 1999; Dixit and Mulge 2014). These approaches can generally be classified into single-channel (Upadhyay and Jaiswal 2016; Benesty and Cohen 2018), dual-channel (Man Kima and Kook Kim 2014; Nabi et al. 2017) and multi-channel (Kim and Hasegawa-Johnson 2012; Habets and Benesty 2013) speech enhancement methods.

In recent years, dual-channel speech enhancement algorithms have been widely used because of their good performance in different noise conditions, i.e. with stationary or non-stationary noise (Sayoud et al. 2018; Nabi et al. 2016), unlike single-channel methods, which fail in the presence of non-stationary noise (Loizou 2007). They are also easy to implement and have a lower computational complexity than multi-channel speech enhancement techniques. Nabi et al. (2018) proposed a dual-microphone noise reduction algorithm based on the coherence function and the bionic wavelet transform using a Kalman filter; this method can handle two closely spaced microphones and does not require noise statistics estimation. Moreover, numerous dual-channel speech enhancement techniques combining adaptive algorithms with the forward and backward blind source separation structures have been proposed (Henni et al. 2019; Djendi and Zoulikha 2018; Djendi 2018). Subband adaptive filtering is another efficient approach that is often used in speech enhancement and noise reduction applications because it improves convergence for highly correlated input signals. Lee and Gan (2004) proposed the normalized subband adaptive filtering (NSAF) algorithm, which whitens the input and desired output signals by dividing them into multiple subbands, thereby improving the convergence rate. To further improve performance, several modified versions of the NSAF algorithm have been developed (Ni and Li 2010; Yu et al. 2016; Seo and Park 2014).
Furthermore, Djendi and Bendoumia (2013) proposed a two-channel subband adaptive filtering algorithm based on the forward blind source separation (FBSS) structure, which improved the convergence speed of the classical FBSS. However, the performance of this algorithm depends on a fixed step size, which imposes a compromise between a fast convergence rate and a low steady-state error. To overcome this conflict, a two-channel subband FBSS structure with variable step sizes was proposed in Djendi and Bendoumia (2016).

In this paper, we propose a new adaptive switching algorithm for noise reduction and speech enhancement applications. The proposed algorithm switches between the two-channel subband normalized least mean square (TC-SNLMS) algorithm (Djendi and Bendoumia 2013) and the two-channel fullband normalized least mean square (TC-FNLMS) algorithm (Van Gerven and Van Compernolle 1992), and the switching decision is based on mean square error (MSE) estimation. The simulation results show that the proposed switching algorithm converges faster than both the classical two-channel fullband NLMS algorithm and the two-channel subband NLMS algorithm.

The paper is organized as follows: in Sect. 2, we present the mixing and separation model. In Sect. 3, we describe the principle of the proposed algorithm. The simulation results of the proposed switching algorithm in comparison with other competitive algorithms are presented in Sect. 4. Finally, we conclude our work in Sect. 5.

2 Mixing and separation model

In a car environment, hands-free communication is usually disturbed by surrounding noise (engine, babble or street noise). As depicted in Fig. 1, the signals recorded by the two microphones are contaminated by noise components; these noisy observations are linear combinations of the useful speech signal and the disturbing noise signal. The main problem is to recover the original source signals from these two noisy observations only. The blind source separation structure is widely used to address this problem (Sayoud et al. 2018; Djendi and Bendoumia 2013; Syskind Pedersen et al. 2007).

In this paper, the acoustic environment is modeled by the simplified convolutive mixture shown in Fig. 2a. To retrieve the original signals, i.e. the speech signal $s\left( n \right)$ and the noise signal $b\left( n \right)$, from the observed mixtures $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$, we use the forward blind source separation structure (Sayoud et al. 2018; Djendi 2018; Djendi et al. 2016), as depicted in Fig. 2b.

Let $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$ denote the two noisy observations, which can be modeled as follows:

$$m_{1} \left( n \right) = s\left( n \right)*h_{11} \left( n \right) + b\left( n \right)*h_{21} \left( n \right)$$

(1)

$$m_{2} \left( n \right) = b\left( n \right)*h_{22} \left( n \right) + s\left( n \right)*h_{12} \left( n \right)$$

(2)

where $s\left( n \right)$ and $b\left( n \right)$ are the two independent speech and noise sources, respectively. The symbol ‘*’ denotes linear convolution. $h_{12} \left( n \right)$ and $h_{21} \left( n \right)$ represent the cross-coupling effects between the channels. We assume that the first microphone is close to the speaker and the second microphone is close to the noise source; thus, the direct acoustic paths $h_{11} \left( n \right)$ and $h_{22} \left( n \right)$ are taken equal to the Kronecker unit impulse $\delta \left( n \right)$ (Van Gerven and Van Compernolle 1995). Under this assumption, the two noisy observations can be rewritten as follows:

$$m_{1} \left( n \right) = s\left( n \right) + b\left( n \right)*h_{21} \left( n \right)$$

(3)

$$m_{2} \left( n \right) = b\left( n \right) + s\left( n \right)*h_{12} \left( n \right)$$

(4)
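As a minimal sketch (not the authors' implementation), the simplified mixture of Eqs. (3) and (4) can be generated by direct linear convolution. The helper names `convolve` and `mix` and the two-tap cross-coupling paths are hypothetical; the paper itself generates $h_{12} \left( n \right)$ and $h_{21} \left( n \right)$ with the physical model of Djendi et al. (2006).

```python
def convolve(x, h):
    """Full linear convolution y = x * h, of length len(x) + len(h) - 1."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def mix(s, b, h12, h21):
    """Simplified convolutive mixture of Eqs. (3)-(4), truncated to len(s).

    Assumes s and b have the same length and h11 = h22 = delta(n).
    """
    n = len(s)
    b_h21 = convolve(b, h21)[:n]
    s_h12 = convolve(s, h12)[:n]
    m1 = [s[k] + b_h21[k] for k in range(n)]   # m1(n) = s(n) + b(n) * h21(n)
    m2 = [b[k] + s_h12[k] for k in range(n)]   # m2(n) = b(n) + s(n) * h12(n)
    return m1, m2


s = [1.0, 0.0, 0.0, 0.0]          # unit impulse as toy "speech"
b = [0.0, 1.0, 0.0, 0.0]          # delayed impulse as toy "noise"
m1, m2 = mix(s, b, h12=[0.5, 0.25], h21=[0.3, 0.1])
print(m1)  # [1.0, 0.3, 0.1, 0.0]
print(m2)  # [0.5, 1.25, 0.0, 0.0]
```

Each microphone receives its own source directly plus a convolved copy of the other source, which is exactly the cross-coupling the FBSS structure has to undo.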

The two estimated output signals of the FBSS structure are given by the following relations:

$$u_{1} \left( n \right) = m_{1} \left( n \right) - m_{2} \left( n \right)*w_{21} \left( n \right)$$

(5)

$$u_{2} \left( n \right) = m_{2} \left( n \right) - m_{1} \left( n \right)*w_{12} \left( n \right)$$

(6)

where $w_{21} \left( n \right)$ and $w_{12} \left( n \right)$ are the adaptive filters of the FBSS structure.

By inserting Eqs. (3) and (4) into Eqs. (5) and (6), we obtain:

$$u_{1} \left( n \right) = b\left( n \right)*\left[ {h_{21} \left( n \right) - w_{21} \left( n \right)} \right] + s\left( n \right)*\left[ {\delta \left( n \right) - h_{12} \left( n \right)*w_{21} \left( n \right)} \right]$$

(7)

$$u_{2} \left( n \right) = s\left( n \right)*\left[ {h_{12} \left( n \right) - w_{12} \left( n \right)} \right] + b\left( n \right)*\left[ {\delta \left( n \right) - h_{21} \left( n \right)*w_{12} \left( n \right)} \right]$$

(8)

The ideal theoretical solution is obtained by setting $w_{21} \left( n \right) = h_{21} \left( n \right)$ and $w_{12} \left( n \right) = h_{12} \left( n \right)$; the estimated output signals $u_{1} \left( n \right)$ and $u_{2} \left( n \right)$ then become:

$$u_{1} \left( n \right) = s\left( n \right)*\left[ {\delta \left( n \right) - h_{12} \left( n \right)*h_{21} \left( n \right)} \right]$$

(9)

$$u_{2} \left( n \right) = b\left( n \right)*\left[ {\delta \left( n \right) - h_{21} \left( n \right)*h_{12} \left( n \right)} \right]$$

(10)

From Eqs. (9) and (10), we notice that the estimated output signals of the FBSS structure are distorted by the post-filter $\delta \left( n \right) - h_{12} \left( n \right)*h_{21} \left( n \right)$. The effect of this post-filter is significant when the two microphones are closely spaced (Djendi et al. 2007). To avoid the post-filter effect, we consider in this work the case where the two microphones are widely spaced.
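The post-filter effect of Eqs. (9) and (10) can be checked numerically. The sketch below assumes hypothetical short cross-coupling paths and random stand-in signals, builds the mixture of Eqs. (3)-(4), applies the FBSS outputs of Eqs. (5)-(6) with the ideal filters $w_{21} = h_{21}$ and $w_{12} = h_{12}$, and compares the outputs with $s\left( n \right)$ and $b\left( n \right)$ filtered by $\delta \left( n \right) - h_{12} \left( n \right)*h_{21} \left( n \right)$. The function names are illustrative only.

```python
import random


def convolve(x, h):
    """Full linear convolution (length len(x) + len(h) - 1)."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def fbss_outputs(m1, m2, w21, w12):
    """Forward BSS outputs of Eqs. (5)-(6), truncated to len(m1)."""
    n = len(m1)
    m2_w21 = convolve(m2, w21)[:n]
    m1_w12 = convolve(m1, w12)[:n]
    u1 = [m1[k] - m2_w21[k] for k in range(n)]
    u2 = [m2[k] - m1_w12[k] for k in range(n)]
    return u1, u2


rng = random.Random(0)
n = 64
s = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # stand-in speech
b = [rng.uniform(-1.0, 1.0) for _ in range(n)]   # stand-in noise
h12 = [0.0, 0.4, 0.2]                            # hypothetical cross paths
h21 = [0.0, 0.3, -0.1]

# Simplified mixture, Eqs. (3)-(4)
m1 = [s[k] + v for k, v in enumerate(convolve(b, h21)[:n])]
m2 = [b[k] + v for k, v in enumerate(convolve(s, h12)[:n])]

# Ideal filters: w21 = h21, w12 = h12
u1, u2 = fbss_outputs(m1, m2, w21=h21, w12=h12)

# Post-filter delta(n) - h12(n) * h21(n) from Eqs. (9)-(10)
post = [-c for c in convolve(h12, h21)]
post[0] += 1.0
err_u1 = max(abs(a - e) for a, e in zip(u1, convolve(s, post)[:n]))
err_u2 = max(abs(a - e) for a, e in zip(u2, convolve(b, post)[:n]))
print(err_u1 < 1e-12, err_u2 < 1e-12)  # True True
```

Even with perfect filters, $u_{1} \left( n \right)$ is a filtered version of $s\left( n \right)$ rather than $s\left( n \right)$ itself; the distortion only becomes negligible when the product of the cross-coupling paths is small.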

3 Proposed algorithm

In this section, we present the proposed adaptive switching algorithm, in which the switching mechanism is based on mean square error estimation. A block diagram of the proposed algorithm is shown in Fig. 3. The proposed algorithm is designed for both high and low noise level scenarios: it switches between the two-channel subband NLMS (TC-SNLMS) algorithm (Djendi and Bendoumia 2013) and the two-channel fullband NLMS (TC-FNLMS) algorithm (Van Gerven and Van Compernolle 1992). The main idea is to use the TC-SNLMS algorithm when the residual noise level, as measured by the estimated MSE, is high, and to switch to the TC-FNLMS algorithm when it is low. The switching mechanism can be described by the following steps:

Step 1 Energy calculation.

In the first step, we compute recursively the energy $E\left( n \right)$ of the filtering error $u_{1} \left( n \right)$ by the following relation:

$$E\left( n \right) = \beta E\left( {n - 1} \right) + \left( {1 - \beta } \right)u_{1} \left( n \right)^{2}$$

(11)

where $\beta$ is a smoothing factor.

Step 2 MSE estimation.

Once the energy has been calculated, the MSE estimate (in dB) is computed as follows:

$$MSE_{e} \left( n \right) = 10\log_{10} \left( {E\left( n \right)} \right)$$

(12)

Step 3 Switching rule.

The switching between the two adaptive filtering algorithms is performed by comparing the MSE estimate $MSE_{e} \left( n \right)$ with an MSE threshold $MSE_{th}$, whose value is determined experimentally. The switching rule is defined as follows:

$$\left\{ {\begin{array}{ll} {{\text{if}}\;MSE_{e} (n) < MSE_{th} ,} & {{\text{TC-FNLMS}}\;{\text{algorithm}}} \\ {{\text{else}},} & {{\text{TC-SNLMS}}\;{\text{algorithm}}} \\ \end{array} } \right.$$
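The three steps above can be sketched as a small per-sample selector. This is a minimal illustration, not the authors' code: the class name `MseSwitch`, the smoothing factor $\beta = 0.99$ and the initial energy $E(0) = 1$ are assumptions, while the −45 dB threshold is the value used in Sect. 4.2. The floor inside the logarithm is an added safeguard that Eq. (12) does not include.

```python
import math


class MseSwitch:
    """Hypothetical helper implementing Steps 1-3 (Eqs. (11)-(12) and the rule).

    mse_th_db = -45 dB follows Sect. 4.2; beta = 0.99 and the initial
    energy e0 = 1.0 are assumed values that the text does not specify.
    """

    def __init__(self, beta=0.99, mse_th_db=-45.0, e0=1.0, floor=1e-20):
        self.beta = beta
        self.mse_th_db = mse_th_db
        self.energy = e0
        self.floor = floor          # added guard against log10(0)

    def step(self, u1_n):
        # Step 1: recursive energy of the filtering error u1(n), Eq. (11)
        self.energy = self.beta * self.energy + (1.0 - self.beta) * u1_n ** 2
        # Step 2: MSE estimate in dB, Eq. (12)
        mse_db = 10.0 * math.log10(max(self.energy, self.floor))
        # Step 3: switching rule
        return "TC-FNLMS" if mse_db < self.mse_th_db else "TC-SNLMS"


sw = MseSwitch()
print(sw.step(1.0))        # TC-SNLMS: E(n) stays near 0 dB
for _ in range(1100):
    mode = sw.step(0.0)    # zero error: E(n) decays by beta per sample
print(mode)                # TC-FNLMS: E(n) is now about -48 dB
```

In a complete system, the returned label would select which of the two filtering algorithms of Sects. 3.1 and 3.2 produces $u_{1} \left( n \right)$ next; how filter states are handed over at each switch is not described in this section, so the sketch leaves it out.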

3.1 Two-channel subband NLMS (TC-SNLMS) algorithm

The TC-SNLMS algorithm is a subband implementation of the forward blind source separation structure based on the normalized least mean square (NLMS) algorithm (Djendi and Bendoumia 2013). As depicted in Fig. 4, this algorithm divides the two noisy signals $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$ into a set of subband signals using an M-channel analysis filter bank. The resulting subband signals $m_{1i} \left( n \right)$ and $m_{2i} \left( n \right)$, for $i = 1,2, \ldots ,M$, are decimated by a factor equal to the number of subbands. The forward BSS structure is then applied to estimate the decimated output sub-signals $u_{1i,D} \left( p \right)$ and $u_{2i,D} \left( p \right)$ from the decimated mixing sub-signals $m_{1i,D} \left( p \right)$ and $m_{2i,D} \left( p \right)$ only. Finally, the synthesis filter bank reconstructs the fullband estimated signals $u_{1} \left( n \right)$ and $u_{2} \left( n \right)$.

The decimated output sub-signals of the TC-SNLMS algorithm are given by the following formulas:

$$u_{1i,D} \left( p \right) = m_{1i,D} \left( p \right) - \varvec{w}_{21}^{T} \left( p \right)\varvec{m}_{2i,D} \left( p \right)\;\;\;i = 1, 2, \ldots ,M.$$

(13)

$$u_{2i,D} \left( p \right) = m_{2i,D} \left( p \right) - \varvec{w}_{12}^{T} \left( p \right)\varvec{m}_{1i,D} \left( p \right)\;\;\;i = 1, 2, \ldots ,M.$$

(14)

The decimated mixing sub-signals are defined as:

$$m_{1i,D} \left( p \right) = m_{1i} \left( {pM} \right)\;\;\;i = 1, 2, \ldots ,M.$$

(15)

$$m_{2i,D} \left( p \right) = m_{2i} \left( {pM} \right)\;\;\;i = 1, 2, \ldots ,M.$$

(16)

$$m_{1i} \left( n \right) = \varvec{H}_{i}^{T} \varvec{m}_{1} \left( n \right)\;\;\;i = 1, 2, \ldots ,M.$$

(17)

$$m_{2i} \left( n \right) = \varvec{H}_{i}^{T} \varvec{m}_{2} \left( n \right)\;\;\;i = 1, 2, \ldots ,M.$$

(18)

where M is the number of subbands and D is the decimation factor (D = M). The variable n is the time index of the original fullband mixing signals and p is the time index of the decimated sub-signals. $m_{1i} \left( {pM} \right)$ and $m_{2i} \left( {pM} \right)$ are the outputs of the analysis filter bank, taken every M samples. $\varvec{m}_{1} \left( n \right) = \left[ {m_{1} \left( n \right), m_{1} \left( {n - 1} \right), \ldots ,m_{1} \left( {n - l + 1} \right)} \right]^{T}$ and $\varvec{m}_{2} \left( n \right) = \left[ {m_{2} \left( n \right), m_{2} \left( {n - 1} \right), \ldots ,m_{2} \left( {n - l + 1} \right)} \right]^{T}$, where l is the length of the analysis filters $\varvec{H}_{i}$.

The estimated fullband signals $u_{1} \left( n \right)$ and $u_{2} \left( n \right)$ are given by the following relations:

$$u_{1} \left( n \right) = \sum\limits_{i = 1}^{M} {\varvec{G}_{i}^{T} \varvec{U}_{1i} \left( n \right)}$$

(19)

$$u_{2} \left( n \right) = \sum\limits_{i = 1}^{M} {\varvec{G}_{i}^{T} \varvec{U}_{2i} \left( n \right)}$$

(20)

where

$$u_{1i} \left( n \right) = \left\{ {\begin{array}{ll} {u_{1i,D} (n/I),} & {n = 0, \pm I, \pm 2I, \ldots } \\ 0 & {\text{otherwise}} \\ \end{array} } \right.\;\;\;{\text{for}}\;i = 1, 2, \ldots ,M.$$

(21)

$$u_{2i} \left( n \right) = \left\{ {\begin{array}{ll} {u_{2i,D} (n/I),} & {n = 0, \pm I, \pm 2I, \ldots } \\ 0 & {\text{otherwise}} \\ \end{array} } \right.\;\;\;{\text{for}}\;i = 1, 2, \ldots ,M.$$

(22)

$${\text{and}}\;\varvec{U}_{1i} \left( n \right) = \left[ {u_{1i} \left( n \right), u_{1i} \left( {n - 1} \right), \ldots ,u_{1i} \left( {n - l + 1} \right)} \right]^{T} ,\;\varvec{U}_{2i} \left( n \right) = \left[ {u_{2i} \left( n \right), u_{2i} \left( {n - 1} \right), \ldots ,u_{2i} \left( {n - l + 1} \right)} \right]^{T}$$

Here, I is the interpolation factor (in our case, $I = D = M$), and l is the length of the synthesis filters $\varvec{G}_{i}$.
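To illustrate the analysis, decimation, zero-insertion and synthesis chain of Eqs. (15)-(22), the sketch below uses a two-band Haar filter bank (l = 2) as a hypothetical stand-in for the filters $\varvec{H}_{i}$ and $\varvec{G}_{i}$, whose design is not specified here. With these filters, the chain reproduces the input delayed by one sample, which makes a convenient sanity check. The function names are illustrative.

```python
import math

R = 1.0 / math.sqrt(2.0)
H = [[R, R], [R, -R]]    # hypothetical analysis filters H_i (Haar, l = 2)
G = [[R, R], [-R, R]]    # matching synthesis filters G_i
M = 2                    # subbands; decimation D and interpolation I equal M


def fir(x, h):
    """Causal FIR filter: y(n) = sum_k h[k] * x(n - k), same length as x."""
    return [sum(h[k] * x[n - k] for k in range(len(h)) if n - k >= 0)
            for n in range(len(x))]


def analysis(x):
    """Eqs. (15)-(18): filter with each H_i, then keep samples n = pM."""
    return [fir(x, h)[::M] for h in H]


def synthesis(subbands, n_out):
    """Eqs. (19)-(22): zero-insert by I = M, filter with G_i and sum bands."""
    y = [0.0] * n_out
    for v, g in zip(subbands, G):
        up = [0.0] * n_out
        for p, val in enumerate(v):
            if p * M < n_out:
                up[p * M] = val
        for n, val in enumerate(fir(up, g)):
            y[n] += val
    return y


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = synthesis(analysis(x), len(x))
print([round(v, 9) for v in y])  # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
```

In the TC-SNLMS algorithm, the adaptive filtering of Eqs. (13)-(14) would operate on the decimated outputs of `analysis` before `synthesis` rebuilds $u_{1} \left( n \right)$ and $u_{2} \left( n \right)$; longer filter banks (l = 16, 32, 64 in Sect. 4) give better band separation than this two-tap example.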

Using the NLMS algorithm to update the filters $w_{21} \left( p \right)$ and $w_{12} \left( p \right)$, we obtain the following relations in vector notation:

$$\varvec{w}_{21} \left( {p + 1} \right) = \varvec{w}_{21} \left( p \right) + \mu_{1} \sum\limits_{i = 1}^{M} {\frac{{\varvec{m}_{2i,D} \left( p \right)u_{1i,D} \left( p \right)}}{{\left| {\varepsilon + \varvec{m}_{2i,D}^{T} \left( p \right)\varvec{m}_{2i,D} \left( p \right)} \right|}}}$$

(23)

$$\varvec{w}_{12} \left( {p + 1} \right) = \varvec{w}_{12} \left( p \right) + \mu_{2} \sum\limits_{i = 1}^{M} {\frac{{\varvec{m}_{1i,D} \left( p \right)u_{2i,D} \left( p \right)}}{{\left| {\varepsilon + \varvec{m}_{1i,D}^{T} \left( p \right)\varvec{m}_{1i,D} \left( p \right)} \right|}}}$$

(24)

where $\varvec{m}_{1i,D} \left( p \right) = \left[ {m_{1i,D} \left( p \right), m_{1i,D} \left( {p - 1} \right), \ldots ,m_{1i,D} \left( {p - L + 1} \right)} \right]^{T}$ and $\varvec{m}_{2i,D} \left( p \right) = \left[ {m_{2i,D} \left( p \right), m_{2i,D} \left( {p - 1} \right), \ldots ,m_{2i,D} \left( {p - L + 1} \right)} \right]^{T}$, and L is the length of the adaptive filters. The step sizes $\mu_{1}$ and $\mu_{2}$ ($0 < \mu_{1} ,\mu_{2} < 2$) are the control parameters of the TC-SNLMS algorithm; they control the convergence behavior of the adaptive filters $w_{21} \left( p \right)$ and $w_{12} \left( p \right)$, respectively. The parameter $\varepsilon$ is a small positive constant that avoids division by very small values when the input signal is absent (silence periods).
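A single iteration of the subband update in Eq. (23) can be sketched as follows, assuming the decimated regressor vectors and the subband errors of Eq. (13) are already available. The function names are hypothetical, and the toy regressors are chosen mutually orthogonal so that the result can be checked by hand; Eq. (24) has the same form with the roles of the two channels exchanged.

```python
def subband_errors(w, desired, regressors):
    """Eq. (13): u_{1i,D}(p) = m_{1i,D}(p) - w^T m_{2i,D}(p), one per subband i."""
    return [d - sum(wk * xk for wk, xk in zip(w, x))
            for d, x in zip(desired, regressors)]


def nsaf_update(w, regressors, errors, mu, eps=1e-6):
    """Eq. (23): w(p+1) = w(p) + mu * sum_i x_i e_i / |eps + x_i^T x_i|.

    All errors are computed with w(p) before the update, as in Eq. (23).
    """
    new_w = list(w)
    for x, e in zip(regressors, errors):
        norm = abs(eps + sum(v * v for v in x))   # per-subband normalization
        for k, xk in enumerate(x):
            new_w[k] += mu * xk * e / norm
    return new_w


w = [0.0, 0.0]
regs = [[1.0, 0.0], [0.0, 2.0]]     # toy, mutually orthogonal regressors
desired = [0.5, -1.0]               # toy decimated m_{1i,D}(p) values
e = subband_errors(w, desired, regs)
w = nsaf_update(w, regs, e, mu=1.0, eps=1e-12)
print([round(v, 6) for v in w])                                   # [0.5, -0.5]
print(max(abs(v) for v in subband_errors(w, desired, regs)) < 1e-9)  # True
```

With $\mu = 1$ and orthogonal regressors, one update cancels every a posteriori subband error, which shows how each subband term is scaled by the energy of its own regressor.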

3.2 Two-channel fullband NLMS (TC-FNLMS) algorithm

The TC-FNLMS algorithm is a two-channel adaptive filtering algorithm that combines the forward blind source separation structure with the normalized least mean square algorithm (Van Gerven and Van Compernolle 1992). A detailed scheme of the TC-FNLMS algorithm is presented in Fig. 5.

The enhanced output signals of the TC-FNLMS algorithm are given by the following relations:

$$u_{1} \left( n \right) = m_{1} \left( n \right) - \varvec{w}_{21}^{T} \left( n \right)\varvec{m}_{2} \left( n \right)$$

(25)

$$u_{2} \left( n \right) = m_{2} \left( n \right) - \varvec{w}_{12}^{T} \left( n \right)\varvec{m}_{1} \left( n \right)$$

(26)

where $\varvec{m}_{1} \left( n \right) = \left[ {m_{1} \left( n \right),m_{1} \left( {n - 1} \right), \ldots ,m_{1} \left( {n - L + 1} \right)} \right]^{T}$ and $\varvec{m}_{2} \left( n \right) = \left[ {m_{2} \left( n \right),m_{2} \left( {n - 1} \right), \ldots ,m_{2} \left( {n - L + 1} \right)} \right]^{T}$ are the vectors that contain the last L samples of the inputs $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$, respectively.

The update relations of the adaptive filters $w_{21} \left( n \right)$ and $w_{12} \left( n \right)$ are given as follows:

$$\varvec{w}_{21} \left( {n + 1} \right) = \varvec{w}_{21} \left( n \right) + \mu_{21} \frac{{\varvec{m}_{2} \left( n \right)u_{1} \left( n \right)}}{{\left| {\alpha + \varvec{m}_{2}^{T} \left( n \right)\varvec{m}_{2} \left( n \right)} \right|}}$$

(27)

$$\varvec{w}_{12} \left( {n + 1} \right) = \varvec{w}_{12} \left( n \right) + \mu_{12} \frac{{\varvec{m}_{1} \left( n \right)u_{2} \left( n \right)}}{{\left| {\alpha + \varvec{m}_{1}^{T} \left( n \right)\varvec{m}_{1} \left( n \right)} \right|}}$$

(28)

where $\mu_{21}$ and $\mu_{12}$ ($0 < \mu_{21} ,\mu_{12} < 2$) are the two step sizes that control the convergence behavior of the cross-adaptive filters $w_{21} \left( n \right)$ and $w_{12} \left( n \right)$, respectively. The parameter $\alpha$ is a small positive constant introduced to avoid division by zero.
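The TC-FNLMS recursion of Eqs. (25)-(28) can be sketched sample by sample in plain Python. The function `tc_fnlms`, the step sizes $\mu_{21} = \mu_{12} = 0.5$, the short filter length L = 4 (instead of the L = 128 used in Sect. 4) and the cross path below are illustrative assumptions. The usage example feeds a noise-only segment ($s\left( n \right) = 0$), for which $m_{1} \left( n \right) = b\left( n \right)*h_{21} \left( n \right)$ and $m_{2} \left( n \right) = b\left( n \right)$, so $w_{21} \left( n \right)$ should identify $h_{21} \left( n \right)$; the mismatch is then measured as in Eq. (29).

```python
import math
import random


def tc_fnlms(m1, m2, L=8, mu21=0.5, mu12=0.5, alpha=1e-6):
    """Two-channel fullband NLMS forward BSS, Eqs. (25)-(28)."""
    w21, w12 = [0.0] * L, [0.0] * L
    x1, x2 = [0.0] * L, [0.0] * L      # regressors [m(n), ..., m(n-L+1)]
    u1_out, u2_out = [], []
    for v1, v2 in zip(m1, m2):
        x1 = [v1] + x1[:-1]
        x2 = [v2] + x2[:-1]
        # Eqs. (25)-(26): outputs computed with the current filters
        u1 = v1 - sum(w * x for w, x in zip(w21, x2))
        u2 = v2 - sum(w * x for w, x in zip(w12, x1))
        # Eqs. (27)-(28): normalized cross-filter updates
        g21 = mu21 * u1 / abs(alpha + sum(x * x for x in x2))
        g12 = mu12 * u2 / abs(alpha + sum(x * x for x in x1))
        w21 = [w + g21 * x for w, x in zip(w21, x2)]
        w12 = [w + g12 * x for w, x in zip(w12, x1)]
        u1_out.append(u1)
        u2_out.append(u2)
    return u1_out, u2_out, w21, w12


rng = random.Random(1)
h21 = [0.0, 0.3, -0.2, 0.1]                 # hypothetical cross path
noise = [rng.gauss(0.0, 1.0) for _ in range(2000)]
# Noise-only segment, s(n) = 0: m1 = b * h21 and m2 = b (Eqs. (3)-(4))
m1 = [sum(h21[k] * noise[n - k] for k in range(len(h21)) if n >= k)
      for n in range(len(noise))]
m2 = noise
_, _, w21, _ = tc_fnlms(m1, m2, L=4)

# System mismatch of Eq. (29); max() guards against log10(0)
mis = math.sqrt(sum((h - w) ** 2 for h, w in zip(h21, w21)))
sm_db = 20.0 * math.log10(max(mis, 1e-300) / math.sqrt(sum(h * h for h in h21)))
print(sm_db < -60.0)  # True
```

During speech activity, the speech leakage $s\left( n \right)*h_{12} \left( n \right)$ in $m_{2} \left( n \right)$ perturbs this identification, which is one reason the noise-only case is used here as the cleanest check.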

4 Simulation results

4.1 Description of the used signals

In this simulation, the original speech signal $s\left( n \right)$ is a French sentence about 4 s long, taken from the AURORA database (Combescure 1981); it is shown in Fig. 6 together with its manual segmentation. For the point noise source signal $b\left( n \right)$, USASI noise (United States of America Standards Institute, now ANSI) is used (see Fig. 6). These signals are sampled at 8 kHz and coded on 16 bits. To generate the impulse responses $h_{12} \left( n \right)$ and $h_{21} \left( n \right)$, we used the physical model described in Djendi et al. (2006). An example of these impulse responses is given in Fig. 7.

We used the signals described above to generate the noisy observations $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$ according to the simplified convolutive mixture model described in Sect. 2. Fig. 8 shows the time evolution of the two noisy observations $m_{1} \left( n \right)$ and $m_{2} \left( n \right)$. The input signal-to-noise ratio (SNR) is set to $SNR_{1} = 0\;dB$ and $SNR_{2} = 0\;dB$ at the first and second microphones, respectively.

4.2 Time evolution of the output speech signal

In this section, we present a simple visual test on the output signal obtained by the proposed switching algorithm. As we are interested in speech enhancement, we focus only on the first output $u_{1} \left( n \right)$. The parameters of the proposed algorithm are set as follows: the adaptive filter length is L = 128; the subband filter lengths for M = 2, M = 4 and M = 8 are l = 16, l = 32 and l = 64, respectively; the MSE threshold is $MSE_{th} = - 45\;dB$; and the input SNR is 0 dB at both microphones. Fig. 9 shows the time evolution of the output speech signal $u_{1} \left( n \right)$ obtained by the proposed switching algorithm with the 2, 4 and 8 subband configurations. This figure shows that the proposed algorithm effectively reduces the acoustic noise components with all subband configurations.

4.3 Objective evaluation

To objectively evaluate the performance of the proposed switching algorithm in comparison with the classical two-channel fullband NLMS (TC-FNLMS) algorithm and the two-channel subband NLMS (TC-SNLMS) algorithm, extensive experiments were carried out using the following objective criteria:

(i) The system mismatch (SM),
(ii) The segmental mean square error (SegMSE),
(iii) The segmental signal-to-noise ratio (SegSNR),
(iv) The cepstral distance (CD).

4.3.1 System mismatch (SM) evaluation

To assess the convergence speed of the proposed switching algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, we used the SM criterion, computed between the adaptive filter $w_{21} \left( n \right)$ and the true one $h_{21} \left( n \right)$ as follows [31]:

$$SM_{dB} = 20\log_{10} \left( {\frac{{\left\| {\varvec{h}_{21} - \varvec{w}_{21} \left( n \right)} \right\|}}{{\left\| {\varvec{h}_{21} } \right\|}}} \right)$$

(29)

where $\left\| \cdot \right\|$ denotes the Euclidean norm. The simulation parameters of each algorithm (i.e. TC-FNLMS, TC-SNLMS and the proposed switching algorithm) are summarized in Table 1. The SM comparison between the TC-FNLMS and TC-SNLMS algorithms and the proposed switching algorithm, for three input SNRs (i.e. − 3 dB, 0 dB and 3 dB) and for different subband configurations (2, 4 and 8 subbands), is reported in Figs. 10, 11 and 12. These figures clearly show that the proposed algorithm (with 2, 4 and 8 subbands) outperforms the other simulated algorithms in terms of convergence speed for all input SNR levels. This performance is due to the MSE-based fullband–subband switching procedure: using the TC-SNLMS algorithm when $MSE_{e}$ exceeds $MSE_{th}$, and the TC-FNLMS algorithm otherwise, yields fast convergence in both the transient and steady-state phases.
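A direct transcription of the system mismatch of Eq. (29) is sketched below. The function name is hypothetical, and returning −∞ for a perfect match is an added convention, since the logarithm of zero is undefined.

```python
import math


def system_mismatch_db(h, w):
    """Eq. (29): SM = 20*log10(||h - w|| / ||h||), in dB."""
    num = math.sqrt(sum((hk - wk) ** 2 for hk, wk in zip(h, w)))
    den = math.sqrt(sum(hk * hk for hk in h))
    if num == 0.0:
        return float("-inf")  # perfect identification
    return 20.0 * math.log10(num / den)


print(round(system_mismatch_db([1.0, 0.0], [0.9, 0.0]), 6))  # -20.0
```

A 10 % coefficient error thus corresponds to −20 dB, and each further factor of 10 in accuracy lowers the SM by another 20 dB.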

Table 1 Simulation parameters of the simulated algorithms, i.e. the TC-FNLMS algorithm (Van Gerven and Van Compernolle 1992), the TC-SNLMS algorithm (Djendi and Bendoumia 2013) and the proposed switching algorithm (this paper)


4.3.2 Segmental mean square error (SegMSE) evaluation

In this subsection, we evaluate the segmental mean square error (SegMSE) of the TC-FNLMS and TC-SNLMS algorithms and of the proposed one. The SegMSE quantifies the convergence speed of each simulated algorithm and is given by the following relation:

$$SegMSE_{dB} = \frac{10}{M}\sum\limits_{m = 0}^{M - 1} {\log_{10} } \left( {\frac{1}{N}\sum\limits_{n = Nm}^{Nm + N - 1} {\left| {s\left( n \right) - u_{1} \left( n \right)} \right|^{2} } } \right)$$

(30)

where N is the segment length of the original signal $s\left( n \right)$ and the enhanced one $u_{1} \left( n \right)$, and M is the number of segments in silence periods. Note that the SegMSE criterion is evaluated only during speech-absence periods (Ghribi et al. 2016). The simulation parameters of each algorithm are summarized in Table 1. Figs. 13, 14 and 15 show the SegMSE evaluation of the TC-FNLMS algorithm, the TC-SNLMS algorithm (with 2, 4 and 8 subbands) and the proposed switching algorithm (with 2, 4 and 8 subbands) for three input SNRs (i.e. − 3 dB, 0 dB and 3 dB). They show that the proposed algorithm converges faster than the other algorithms, and this holds in all tests with 2, 4 and 8 subbands.
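Eq. (30) can be transcribed as follows, assuming the indices of the speech-absence segments are known (for example, from the manual segmentation mentioned in Sect. 4.1). The function name and the toy signals are hypothetical; a segment with zero error would make the logarithm undefined, so real use needs a floor.

```python
import math


def seg_mse_db(s, u1, N, silent_segments):
    """Eq. (30): mean over silence segments of 10*log10(segment MSE)."""
    values = []
    for m in silent_segments:
        idx = range(N * m, N * m + N)
        mse = sum((s[n] - u1[n]) ** 2 for n in idx) / N
        values.append(10.0 * math.log10(mse))
    return sum(values) / len(values)


s = [0.0] * 8                      # speech absent in both segments
u1 = [0.1] * 4 + [0.01] * 4        # toy residual noise at the output
print(round(seg_mse_db(s, u1, N=4, silent_segments=[0, 1]), 6))  # -30.0
```

Because the average is taken over dB values, the result (−30 dB here) is the mean of the per-segment levels (−20 dB and −40 dB), not the level of the overall MSE.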

4.3.3 Segmental signal-to-noise-ratio (SegSNR) evaluation

In order to analyze the noise reduction performance of the proposed algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, we have used the SegSNR criterion, which is evaluated for each algorithm as follows (Deller et al. 1993; Sayed 2003):

$$SegSNR_{dB} = \frac{10}{M}\mathop \sum \limits_{m = 0}^{M - 1} \log_{10} \left( {\frac{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right)} \right|^{2} }}{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right) - u_{1} \left( n \right)} \right|^{2} }}} \right)$$

(31)

where $s\left( n \right)$ and $u_{1} \left( n \right)$ are the original and the enhanced speech signals, respectively. The parameters M and N are the number of segments and the segment length, respectively. The criterion thus yields M segment values, each computed over N samples, which are then averaged. The symbol | · | denotes the absolute value, and $\log_{10}$ is the base-10 logarithm. We recall that all M segments correspond to speech-presence periods only. Figures 16, 17 and 18 show the SegSNR evaluation of the proposed algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms for three global input SNRs, i.e. − 3 dB, 0 dB and 3 dB. For each algorithm, we use the same parameters given in Table 1.
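Similarly, Eq. (31) can be sketched as below, assuming the indices of the speech-presence segments are known. The function name and toy signals are hypothetical.

```python
import math


def seg_snr_db(s, u1, N, speech_segments):
    """Eq. (31): mean over speech segments of 10*log10(signal / error energy)."""
    values = []
    for m in speech_segments:
        idx = range(N * m, N * m + N)
        sig = sum(s[n] ** 2 for n in idx)
        err = sum((s[n] - u1[n]) ** 2 for n in idx)
        values.append(10.0 * math.log10(sig / err))
    return sum(values) / len(values)


s = [1.0] * 8        # toy "speech" samples
u1 = [0.9] * 8       # enhanced signal with a constant 0.1 error
print(round(seg_snr_db(s, u1, N=4, speech_segments=[0, 1]), 6))  # 20.0
```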

According to the obtained results, the proposed algorithm performs better than the competing algorithms (TC-FNLMS and TC-SNLMS) for all tested input SNR levels, i.e. − 3 dB, 0 dB and 3 dB. We also note that the output SegSNR values of the TC-SNLMS algorithm decrease in the steady-state regime when the number of subbands is high (4 or 8 subbands), whereas the proposed switching algorithm (with 2, 4 and 8 subbands) gives the highest SegSNR values in both the transient and steady-state phases. This is the main benefit of the proposed switching algorithm, which aims to combine the fast convergence of the TC-SNLMS algorithm with a high number of subbands and the good steady-state values of the TC-FNLMS algorithm.

4.3.4 Cepstral distance (CD) evaluation

To quantify the distortion introduced in the output speech signal by the proposed switching algorithm, in comparison with the TC-FNLMS and TC-SNLMS algorithms, we used the cepstral distance criterion, which is estimated by the following relation (Hu and Loizou 2008; Rabiner and Juang 1993):

$$CD_{dB} = \sum\limits_{\lambda = 0}^{T - 1} {VAD_{\lambda } \left\{ {IFFT\left[ {\log \left| {S\left( {\lambda ,\omega } \right)} \right| - \log \left| {U_{1} \left( {\lambda ,\omega } \right)} \right|} \right]} \right\}^{2} }$$

(32)

where $S\left( {\lambda ,\omega } \right)$ and $U_{1} \left( {\lambda ,\omega } \right)$ are the short-time Fourier transforms of the original speech signal $s\left( n \right)$ and the enhanced one $u_{1} \left( n \right)$, respectively, at frame $\lambda$; T is the number of frames over which the CD criterion is computed; and $VAD_{\lambda }$ is the output of a voice activity detector for frame $\lambda$. Figs. 19, 20 and 21 report the CD results obtained by the three algorithms for three input SNRs (i.e. − 3 dB, 0 dB and 3 dB) and with different subband configurations (2, 4 and 8 subbands). The simulation parameters of each algorithm are the same as in Table 1. These results show that the TC-FNLMS algorithm outperforms the other algorithms (i.e. TC-SNLMS and the proposed algorithm) in terms of steady-state CD values, while the proposed algorithm behaves similarly to the TC-SNLMS algorithm. Nevertheless, the CD values of the proposed algorithm remain below − 5 dB for all subband configurations and in all tested situations, which indicates good intelligibility of the output speech signal.
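A minimal sketch of the cepstral distance follows, under stated assumptions: non-overlapping rectangular frames of K samples, natural logarithms, a naive DFT, and a VAD given as one 0/1 flag per frame that keeps only speech frames in the sum. The frame length, window and dB conversion used for the reported values are not specified in this section, so the sketch returns the raw sum; the function names are hypothetical.

```python
import cmath
import math


def real_cepstrum(frame, eps=1e-12):
    """Real cepstrum: inverse DFT of log|DFT(frame)| (naive O(K^2) DFT)."""
    K = len(frame)
    spec = [sum(frame[n] * cmath.exp(-2j * math.pi * k * n / K) for n in range(K))
            for k in range(K)]
    log_mag = [math.log(abs(X) + eps) for X in spec]   # eps avoids log(0)
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * q / K)
                for k in range(K)).real / K
            for q in range(K)]


def cepstral_distance(s, u1, K, vad):
    """Sum over VAD-active frames of the squared cepstral difference,
    using non-overlapping rectangular frames of K samples."""
    total = 0.0
    for lam, active in enumerate(vad):
        if not active:
            continue                                  # skip non-speech frames
        cs = real_cepstrum(s[lam * K:(lam + 1) * K])
        cu = real_cepstrum(u1[lam * K:(lam + 1) * K])
        total += sum((a - b) ** 2 for a, b in zip(cs, cu))
    return total


s = [1.0, 0.0, 0.0, 0.0, 0.5, 0.2, 0.1, 0.0]
u1 = [2.0 * v for v in s]          # pure gain change of 2
cd = cepstral_distance(s, u1, K=4, vad=[1, 0])
print(round(cd, 6))  # 0.480453, i.e. (ln 2)^2
```

A pure gain change of 2 shifts only the zeroth cepstral coefficient, by ln 2, so the distance is $(\ln 2)^2$ for the single active frame; identical signals give exactly zero.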

5 Conclusion

In this paper, we have proposed a new switching adaptive speech enhancement algorithm in which the two-channel fullband NLMS (TC-FNLMS) algorithm and the two-channel subband NLMS (TC-SNLMS) algorithm are used alternately according to the estimated MSE. To validate the performance of the proposed switching algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, extensive experiments were performed using several objective criteria. The SM and segmental MSE results confirmed the superiority of the proposed algorithm in terms of convergence speed; this performance is obtained thanks to the proposed fullband–subband switching technique. The SegSNR evaluation also confirmed the efficiency of the proposed algorithm in reducing the acoustic noise at the processing output. However, the CD evaluation revealed a slight degradation in the performance of the proposed algorithm in the steady-state regime. In future work, we aim to address this issue and improve the proposed switching algorithm in the situations where it underperforms.

References

Benesty, J., & Cohen, I. (2018) Single-channel speech enhancement in the time domain. In: Canonical correlation analysis in speech enhancement. Springer Briefs in Electrical and Computer Engineering. Cham: Springer.
Combescure, P. (1981). 20 listes de dix phrases phonétiquement équilibrées. Revue d’Acoustique,56, 34–38.
Google Scholar
Deller, J., Proakis, J., & Hansen, J. (1993). Discrete time processing of speech signals. New York: MacMillan Publishing.
Google Scholar
Dixit, S., & Mulge, M. Y. (2014). Review on speech enhancement techniques. International Journal of Computer Science and Mobile Computing,3(8), 285–290.
Google Scholar
Djendi, M. (2018). An efficient wavelet-based adaptive filtering algorithm for automatic blind speech enhancement. International Journal of Speech Technology,21, 355–367.
Article Google Scholar
Djendi, M., & Bendoumia, R. (2013). A new adaptive filtering subband algorithm for two channel acoustic noise reduction and speech enhancement. Computers & Electrical Engineering,39(8), 2531–2550.
Article Google Scholar
Djendi, M., & Bendoumia, R. (2016). Improved subband-forward algorithm for acoustic noise reduction and speech quality enhancement. Applied Soft Computing,42, 132–143.
Article Google Scholar
Djendi, M., Gilloire, A., & Scalart, P. (2006). Noise cancellation using two closely spaced microphones: experimental study with a specific model and two adaptive algorithms. IEEE Int. Conf. ICASSP, Toulouse, France, 14–19 May 2006 (Vol. 3, pp. 744–748).
Djendi, M., Gilloire, A., & Scalart, P. (2007). New frequency domain post-filters for noise cancellation using two closely spaced microphones. Proc. EUSIPCO, Poznan, 3–8 Sep. (Vol. 1, pp. 218–221).
Djendi, M., Henni, R., & Sayoud, A. (2016). A new dual forward BSS based RLS algorithm for speech enhancement. International Conference on Engineering and MIS, ICEMIS, Agadir, Morooco.
Djendi, M., & Zoulikha, M. (2018). A new efficient backward BSS crosstalk-resistant algorithm for automatic blind speech quality enhancement. International Journal of Speech Technology, 21, 809–823.
Ghribi, K., Djendi, M., & Berkani, D. (2016). A new wavelet-based forward BSS algorithm for acoustic noise reduction and speech quality enhancement. Applied Acoustics, 105, 55–66.
Habets, E. A. P., & Benesty, J. (2013). Multi-microphone noise reduction based on orthogonal noise signal decompositions. IEEE Transactions on Audio, Speech and Language Processing, 21(6), 1123–1133.
Henni, R., Djendi, M., & Djebari, M. (2019). A new efficient two-channel fast transversal adaptive filtering algorithm for blind speech enhancement and acoustic noise reduction. Computers & Electrical Engineering, 73, 349–368.
Hu, Y., & Loizou, P. C. (2008). Evaluation of objective quality measures for speech enhancement. IEEE Transactions on Audio, Speech and Language Processing, 16(1), 229–238.
Kim, L. H., & Hasegawa-Johnson, M. (2012). Optimal multi-microphone speech enhancement in cars. In Digital signal processing for in-vehicle systems and safety (pp. 195–204). Berlin: Springer.
Kuo, S. M., & Peng, W. M. (1999). Asymmetric crosstalk-resistant adaptive noise canceller. Proc. IEEE workshop on Signal Processing System, pp. 605–614.
Lee, K. A., & Gan, W. S. (2004). Improving convergence of the NLMS algorithm using constrained subband updates. IEEE Signal Processing Letters, 11(9), 736–739.
Loizou, P. C. (2007). Speech enhancement: Theory and practice. New York: CRC Press.
Loizou, P. C. (2017). Speech enhancement: theory and practice (2nd ed.). Boca Raton: CRC Press, Taylor & Francis Group.
Man Kim, S., & Kook Kim, H. (2014). Noise variance estimation based on dual-channel phase difference for speech enhancement. Digital Signal Processing, 26, 169–182.
Nabi, W., Aloui, N., & Cherif, A. (2016). Speech enhancement in dual-microphone mobile phones using Kalman filter. Applied Acoustics, 109, 1–4.
Nabi, W., Aloui, N., & Cherif, A. (2017). An improved speech enhancement algorithm for dual-channel mobile phones using wavelet and genetic algorithm. Computers & Electrical Engineering, 62, 692–705.
Nabi, W., BenNasr, M., Aloui, N., & Cherif, A. (2018). A dual-channel noise reduction algorithm based on the coherence function and the bionic wavelet. Applied Acoustics, 131, 186–191.
Ni, J., & Li, F. (2010). A variable step-size matrix normalized subband adaptive filter. IEEE Transactions on Audio, Speech and Language Processing, 18(6), 1290–1299.
Rabiner, L., & Juang, B. H. (1993). Fundamentals of speech recognition. Englewood Cliffs: Prentice-Hall.
Sayed, A. H. (2003). Fundamentals of adaptive filtering. New York: Wiley.
Sayoud, A., Djendi, M., Medahi, S., & Guessoum, A. (2018). A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement. Applied Acoustics, 135, 101–110.
Seo, J. H., & Park, P. G. (2014). Variable individual step-size subband adaptive filtering algorithm. Electronics Letters, 50(3), 177–178.
Syskind Pedersen, M., Larsen, J., Kjems, U., & Parra, L. C. (2007). A survey of convolutive blind source separation methods. Springer Handbook on speech processing and speech communication. Cham: Springer.
Thumchirdchupong, H., & Tangsangiumvisai, N. (2013). A two-microphone noise reduction scheme for hands-free telephony in a car environment. 10th International Conference on Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology.
Upadhyay, N., & Jaiswal, R. K. (2016). Single channel speech enhancement: Using Wiener filtering recursive noise estimation. Procedia Computer Science, 84, 22–30.
Van Gerven, S., & Van Compernolle, D. (1992). Feedforward and feedback in a symmetric adaptive noise canceller: Stability analysis in a simplified case. Proc. IEEE. EUSIPCO, Belgium, Brussels (Vol. 24–27, pp. 1081–1084).
Van Gerven, S., & Van Compernolle, D. (1995). Signal separation by symmetric adaptive decorrelation: Stability, convergence, and uniqueness. IEEE Transactions on Signal Processing, 43(7), 1602–1612.
Yu, Y., Zhao, H., & Chen, B. (2016). A new normalized subband adaptive filter algorithm with individual variable step sizes. Circuits, Systems, and Signal Processing, 35(4), 1407–1418.


Author information

Authors and Affiliations

Signal Processing and Image Laboratory (LATSI), University of Blida 1, Route de Soumaa, B.P. 270, Blida, 09000, Algeria
Akila Sayoud, Mohamed Djendi & Abderrezak Guessoum


Corresponding author

Correspondence to Mohamed Djendi.


About this article

Cite this article

Sayoud, A., Djendi, M. & Guessoum, A. A new speech enhancement adaptive algorithm based on fullband–subband MSE switching. Int J Speech Technol 22, 993–1005 (2019). https://doi.org/10.1007/s10772-019-09651-4


Received: 14 August 2019
Accepted: 28 September 2019
Published: 05 October 2019
Issue Date: December 2019
DOI: https://doi.org/10.1007/s10772-019-09651-4

