1 Introduction

The advent and wide dissemination of mobile voice communication systems have substantially increased the need for reliable communication in noisy environments. Environmental noise is a significant limitation on the performance of hands-free voice communications (Thumchirdchupong and Tangsangiumvisai 2013). To handle this issue, several noise reduction and speech enhancement techniques have been proposed in the literature (Loizou 2017; Kuo and Peng 1999; Dixit and Mulge 2014). Generally, these approaches can be classified into single-channel (Upadhyay and Jaiswal 2016; Benesty and Cohen 2018), dual-channel (Man Kima and Kook Kim 2014; Nabi et al. 2017) or multi-channel (Kim and Hasegawa-Johnson 2012; Habets and Benesty 2013) speech enhancement methods.

In recent years, dual-channel speech enhancement algorithms have been widely used owing to their good performance in different noise situations (i.e. stationary or non-stationary noise) (Sayoud et al. 2018; Nabi et al. 2016), unlike single-channel methods, which fail in the presence of non-stationary noise (Loizou 2007). They are also easy to implement and have a lower computational complexity than multi-channel speech enhancement techniques. Nabi et al. (2018) proposed a dual-microphone noise reduction algorithm based on the coherence function and the bionic wavelet transform using a Kalman filter; this method can deal with two closely spaced microphones and does not require noise statistics estimation. Moreover, numerous dual-channel speech enhancement techniques that combine adaptive algorithms with the forward and backward blind source separation structures have been proposed (Henni et al. 2019; Djendi and Zoulikha 2018; Djendi 2018). Subband adaptive filtering is another efficient approach that is often used for speech enhancement and noise reduction because it improves convergence for highly correlated input signals. Lee and Gan (2004) proposed the normalized subband adaptive filtering (NSAF) algorithm, which whitens the input and desired output signals by dividing them into multiple subbands and thereby improves the convergence rate. To further improve performance, several modified versions of the NSAF algorithm have been developed (Ni and Li 2010; Yu et al. 2016; Seo and Park 2014).
Furthermore, Djendi and Bendoumia (2013) proposed a two-channel subband adaptive filtering algorithm based on the forward blind source separation (FBSS) structure, which improves the convergence speed of the classical FBSS. However, the performance of this algorithm depends on a fixed step size that reflects a compromise between a fast convergence rate and a low steady-state error. To overcome this conflict, a two-channel subband FBSS structure with variable step sizes was proposed in Djendi and Bendoumia (2016).

In this paper, we propose a new adaptive switching algorithm for noise reduction and speech enhancement applications. The proposed algorithm switches between the two-channel subband normalized least mean square (TC-SNLMS) algorithm (Djendi and Bendoumia 2013) and the two-channel fullband normalized least mean square (TC-FNLMS) algorithm (Van Gerven and Van Compernolle 1992), with the switching procedure based on the mean square error (MSE) estimation. The simulations reported in Sect. 4 indicate that the proposed switching algorithm converges faster than both the classical two-channel fullband NLMS algorithm and the two-channel subband NLMS algorithm.

The organization of this paper is as follows: in Sect. 2, we present the mixing and separation model. In Sect. 3, we describe the principle of the proposed algorithm. The simulation results of the proposed switching algorithm in comparison with other competitive algorithms are presented in Sect. 4. Finally, we conclude our work in Sect. 5.

2 Mixing and separation model

In a car environment, hands-free communication is normally disturbed by the surrounding noise (engine, babble or street noise). As depicted in Fig. 1, the signals recorded by the two microphones are contaminated by noise components; these noisy observations are a linear combination of the useful speech signal and the disturbing noise signal. The main problem is to recover the original source signals from only these two available noisy observations. The blind source separation structure is widely used to address this issue (Sayoud et al. 2018; Djendi and Bendoumia 2013; Syskind Pedersen et al. 2007).

Fig. 1 Illustrative diagram of the mixing and separation model

In this paper the acoustical environment is modeled by the simplified convolutive mixture shown in Fig. 2a, and in order to retrieve the original signals, i.e. speech signal \(s\left( n \right)\) and noise signal \(b\left( n \right)\), from the observed mixed signals, i.e. \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\), we use the forward blind source separation structure (Sayoud et al. 2018; Djendi 2018; Djendi et al. 2016), as depicted in Fig. 2b.

Fig. 2 Diagrammatic representation of the convolutive mixing and the FBSS structure

Let \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\) denote the two noisy observation signals that can be modeled by the following formulations:

$$m_{1} \left( n \right) = s\left( n \right)*h_{11} \left( n \right) + b\left( n \right)*h_{21} \left( n \right)$$
(1)
$$m_{2} \left( n \right) = b\left( n \right)*h_{22} \left( n \right) + s\left( n \right)*h_{12} \left( n \right)$$
(2)

where \(s\left( n \right)\) and \(b\left( n \right)\) are the two independent sources of speech and noise, respectively. The symbol ‘*’ stands for the linear convolution operation. \(h_{12} \left( n \right)\) and \(h_{21} \left( n \right)\) represent the cross-coupling effects between the channels. We assume that the first microphone is close to the speaker and the second microphone is close to the noise source; thus, the direct acoustic paths \(h_{11} \left( n \right)\) and \(h_{22} \left( n \right)\) are equal to the Kronecker unit impulse \(\delta \left( n \right)\) (Van Gerven and Van Compernolle 1995). Under this assumption, the two noisy observation signals can be rewritten as follows:

$$m_{1} \left( n \right) = s\left( n \right) + b\left( n \right)*h_{21} \left( n \right)$$
(3)
$$m_{2} \left( n \right) = b\left( n \right) + s\left( n \right)*h_{12} \left( n \right)$$
(4)
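As an illustration only, the simplified mixture of Eqs. (3) and (4) can be simulated in a few lines of Python with NumPy. The function name `mix_signals`, the random test signals and the short decaying impulse responses below are assumptions made for this sketch; they are not the signals or the physical model used in the paper.

```python
import numpy as np

def mix_signals(s, b, h12, h21):
    """Simplified convolutive mixture of Eqs. (3) and (4).

    The direct paths are unit impulses, so only the cross-coupling
    responses h12 and h21 are applied. s and b must have the same
    length, and the outputs keep that length.
    """
    n = len(s)
    m1 = s + np.convolve(b, h21)[:n]  # Eq. (3)
    m2 = b + np.convolve(s, h12)[:n]  # Eq. (4)
    return m1, m2

# Hypothetical test data: white sequences and decaying coupling responses.
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
b = rng.standard_normal(1000)
decay = np.exp(-0.5 * np.arange(16))
h12 = 0.3 * decay * rng.standard_normal(16)
h21 = 0.3 * decay * rng.standard_normal(16)

m1, m2 = mix_signals(s, b, h12, h21)
```

Truncating the convolutions to the input length keeps the observations aligned sample by sample with the sources, which is convenient when the outputs are later compared with \(s\left( n \right)\).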

The two estimated output signals of the FBSS structure are given by the following relations:

$$u_{1} \left( n \right) = m_{1} \left( n \right) - m_{2} \left( n \right)*w_{21} \left( n \right)$$
(5)
$$u_{2} \left( n \right) = m_{2} \left( n \right) - m_{1} \left( n \right)*w_{12} \left( n \right)$$
(6)

where \(w_{21} \left( n \right)\) and \(w_{12} \left( n \right)\) are the adaptive filters of the FBSS structure.

By inserting Eqs. (3) and (4) in (5) and (6) respectively, we obtain:

$$u_{1} \left( n \right) = b\left( n \right)*\left[ {h_{21} \left( n \right) - w_{21} \left( n \right)} \right] + s\left( n \right)*\left[ {\delta \left( n \right) - h_{12} \left( n \right)*w_{21} \left( n \right)} \right]$$
(7)
$$u_{2} \left( n \right) = s\left( n \right)*\left[ {h_{12} \left( n \right) - w_{12} \left( n \right)} \right] + b\left( n \right)*\left[ {\delta \left( n \right) - h_{21} \left( n \right)*w_{12} \left( n \right)} \right]$$
(8)

The evident theoretical solution of the problem is obtained by setting \(h_{21} \left( n \right) = w_{21} \left( n \right)\) and \(h_{12} \left( n \right) = w_{12} \left( n \right)\). In that case, the estimated output signals \(u_{1} \left( n \right)\) and \(u_{2} \left( n \right)\) can be expressed as follows:

$$u_{1} \left( n \right) = s\left( n \right)*\left[ {\delta \left( n \right) - h_{12} \left( n \right)*h_{21} \left( n \right)} \right]$$
(9)
$$u_{2} \left( n \right) = b\left( n \right)*\left[ {\delta \left( n \right) - h_{21} \left( n \right)*h_{12} \left( n \right)} \right]$$
(10)

From Eqs. (9) and (10), we notice that the estimated output signals of the FBSS structure are distorted by the post filter \(\delta \left( n \right) - h_{12} \left( n \right)*h_{21} \left( n \right)\). The effect of this post filter is important when the two microphones are closely spaced (Djendi et al. 2007). To avoid the post filter effect, we consider in our work the case where the two microphones are loosely spaced.
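The ideal solution can be checked numerically. The sketch below, in Python with NumPy, builds the mixture of Eqs. (3) and (4), applies the FBSS structure of Eqs. (5) and (6) with the filters set to the true cross-coupling responses, and compares the first output with the right-hand side of Eq. (9). The function name `fbss_outputs` and the random signals and responses are assumptions made for illustration.

```python
import numpy as np

def fbss_outputs(m1, m2, w21, w12):
    """Forward BSS outputs of Eqs. (5) and (6) for fixed filters."""
    n = len(m1)
    u1 = m1 - np.convolve(m2, w21)[:n]
    u2 = m2 - np.convolve(m1, w12)[:n]
    return u1, u2

# Hypothetical sources and cross-coupling responses.
rng = np.random.default_rng(1)
n = 500
s = rng.standard_normal(n)
b = rng.standard_normal(n)
h12 = 0.3 * rng.standard_normal(16)
h21 = 0.3 * rng.standard_normal(16)

# Mixture of Eqs. (3) and (4).
m1 = s + np.convolve(b, h21)[:n]
m2 = b + np.convolve(s, h12)[:n]

# Ideal solution: w21 = h21 and w12 = h12.
u1, u2 = fbss_outputs(m1, m2, h21, h12)

# Post filter delta(n) - h12(n) * h21(n) appearing in Eqs. (9) and (10).
post = -np.convolve(h12, h21)
post[0] += 1.0

expected_u1 = np.convolve(s, post)[:n]  # Eq. (9)
expected_u2 = np.convolve(b, post)[:n]  # Eq. (10)
```

In this setting `u1` matches `expected_u1` up to rounding error: the noise is removed completely, and the speech is only filtered by the post filter.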

3 Proposed algorithm

In this section, we present the proposed adaptive switching algorithm, in which the switching mechanism is based on the mean square error estimation. A block diagram of the proposed algorithm is shown in Fig. 3. The proposed algorithm is well suited to both high and low noise level scenarios because it can switch between the two-channel subband NLMS algorithm (TC-SNLMS) (Djendi and Bendoumia 2013) and the two-channel fullband NLMS (TC-FNLMS) algorithm (Van Gerven and Van Compernolle 1992). The main idea is to use the TC-SNLMS algorithm to process high-level noise components and to employ the TC-FNLMS algorithm when the noise intensity is low. The switching mechanism can be described by the following steps:

Fig. 3 Flowchart of the proposed algorithm

  • Step 1 Energy calculation.

In the first step, we compute recursively the energy \(E\left( n \right)\) of the filtering error \(u_{1} \left( n \right)\) by the following relation:

$$E\left( n \right) = \beta E\left( {n - 1} \right) + \left( {1 - \beta } \right)u_{1} \left( n \right)^{2}$$
(11)

where \(\beta\) is a smoothing factor.

  • Step 2 MSE estimation.

Once the energy has been calculated, the mean square error estimate can be computed as follows:

$$MSE_{e} \left( n \right) = 10\log_{10} \left( {E\left( n \right)} \right)$$
(12)

  • Step 3 Switching rule.

The switching between the two adaptive filtering algorithms is performed by comparing the MSE estimate \(MSE_{e} \left( n \right)\) with a threshold \(MSE_{th}\), whose value is determined from experimental observations. The switching rule is defined as follows:

$$\left\{ {\begin{array}{ll} {{\text{if}}\;MSE_{e} (n) < MSE_{th} ,} & {{\text{TC - FNLMS}}\;{\text{algorithm}}} \\ {{\text{else}},} & {{\text{TC - SNLMS}}\;{\text{algorithm}}} \\ \end{array} } \right.$$
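The three steps above can be sketched as a single per-sample decision function in Python. This is a minimal sketch under stated assumptions: the function name `select_algorithm`, the smoothing factor value 0.99 and the small `floor` added before the logarithm (to avoid taking the logarithm of zero) are choices made here for illustration, while the default threshold of −45 dB is the value quoted in Sect. 4.2.

```python
import math

def select_algorithm(u1_sample, energy_prev, beta=0.99,
                     mse_th_db=-45.0, floor=1e-12):
    """Return (algorithm name, updated energy, MSE estimate in dB)."""
    # Step 1: recursive energy of the filtering error, Eq. (11).
    energy = beta * energy_prev + (1.0 - beta) * u1_sample ** 2
    # Step 2: MSE estimate in dB, Eq. (12); `floor` is an added safeguard.
    mse_db = 10.0 * math.log10(energy + floor)
    # Step 3: fullband filter below the threshold, subband filter otherwise.
    name = "TC-FNLMS" if mse_db < mse_th_db else "TC-SNLMS"
    return name, energy, mse_db

# Usage on a short, hypothetical sequence of error samples.
energy = 1.0
choices = []
for u in [0.5, 0.1, 0.001]:
    name, energy, mse_db = select_algorithm(u, energy)
    choices.append(name)
```

Because the energy is smoothed recursively, the decision does not flip on a single small error sample; the filtering error has to stay low for a number of samples before the fullband branch is selected.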

3.1 Two-channel subband NLMS (TC-SNLMS) algorithm

The TC-SNLMS algorithm is a subband implementation of the forward blind source separation structure based on the normalized least mean square (NLMS) algorithm (Djendi and Bendoumia 2013). As depicted in Fig. 4, this algorithm consists of dividing the two noisy signals \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\) into a set of subband signals by M-channel analysis filter banks. The resulting subband signals \(m_{1i} \left( n \right)\) and \(m_{2i} \left( n \right)\) for \(i = 1,2, \ldots ,M\) are decimated according to the number of subbands. The forward BSS structure is then applied to estimate the decimated output sub-signals \(u_{1i,D} \left( p \right)\) and \(u_{2i,D} \left( p \right)\) from only the decimated mixing sub-signals \(m_{1i,D} \left( p \right),\;m_{2i,D} \left( p \right)\). Finally, the synthesis filter banks are used to reconstruct the estimated signals in their fullband form \(u_{1} \left( n \right)\) and \(u_{2} \left( n \right)\).

Fig. 4 Descriptive scheme of the TC-SNLMS algorithm

The decimated output sub-signals of the TC-SNLMS algorithm are given by the following formulas:

$$u_{1i,D} \left( p \right) = m_{1i,D} \left( p \right) - \varvec{w}_{21}^{T} \left( p \right)\varvec{m}_{2i,D} \left( p \right)\;\;\;i = 1, 2, \ldots ,M.$$
(13)
$$u_{2i,D} \left( p \right) = m_{2i,D} \left( p \right) - \varvec{w}_{12}^{T} \left( p \right)\varvec{m}_{1i,D} \left( p \right)\;\;\;i = 1, 2, \ldots ,M.$$
(14)

The decimated mixing sub-signals are defined as:

$$m_{1i,D} \left( p \right) = m_{1i} \left( {pM} \right)\;\;\;i = 1, 2, \ldots ,M.$$
(15)
$$m_{2i,D} \left( p \right) = m_{2i} \left( {pM} \right)\;\;\;i = 1, 2, \ldots ,M.$$
(16)
$$m_{1i} \left( n \right) = \varvec{H}_{i}^{T} \varvec{m}_{1} \left( n \right)\;\;\;i = 1, 2, \ldots ,M.$$
(17)
$$m_{2i} \left( n \right) = \varvec{H}_{i}^{T} \varvec{m}_{2} \left( n \right)\;\;\;i = 1, 2, \ldots ,M.$$
(18)

where M is the number of subbands and D is the decimation factor (D = M). The variable n is the time index of the original fullband mixing signals and p is the time index of the decimated sub-signals. \(m_{1i} \left( {pM} \right)\) and \(m_{2i} \left( {pM} \right)\) are the outputs of the analysis filter banks. The vectors \(\varvec{m}_{1} \left( n \right) = \left[ {m_{1} \left( n \right), m_{1} \left( {n - 1} \right), \ldots ,m_{1} \left( {n - l + 1} \right)} \right]\) and \(\varvec{m}_{2} \left( n \right) = \left[ {m_{2} \left( n \right), m_{2} \left( {n - 1} \right), \ldots ,m_{2} \left( {n - l + 1} \right)} \right]\) contain the last l samples of the mixing signals, where l is the length of the analysis filters \(\varvec{H}_{i}\).
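The analysis and decimation stage of Eqs. (15) to (18) amounts to FIR filtering followed by keeping every M-th sample. The sketch below shows this in Python with NumPy. The function name `analyze_and_decimate` and the two-channel Haar filter pair used as the analysis bank are assumptions chosen to keep the example small; the paper does not state its filter bank design in this section.

```python
import numpy as np

def analyze_and_decimate(m, H):
    """Split a fullband signal into M decimated sub-signals.

    m : 1-D array, fullband mixing signal m_1(n) or m_2(n).
    H : array of shape (M, l), one analysis filter H_i per row.
    Returns an array of shape (M, ceil(len(m) / M)).
    """
    M = H.shape[0]
    n = len(m)
    # Eqs. (17)-(18): inner product of H_i with the last l input samples,
    # i.e. causal FIR filtering truncated to the input length.
    sub = np.stack([np.convolve(m, H[i])[:n] for i in range(M)])
    # Eqs. (15)-(16): keep the samples at n = pM (decimation by D = M).
    return sub[:, ::M]

# Hypothetical 2-channel Haar analysis bank (lowpass and highpass).
H = np.array([[1.0, 1.0],
              [1.0, -1.0]]) / np.sqrt(2.0)

m1 = np.array([1.0, 2.0, 3.0, 4.0])
m1_sub = analyze_and_decimate(m1, H)  # shape (2, 2)
```

Each row of `m1_sub` then runs at 1/M of the original sampling rate, which is the rate at which the adaptive filters are updated in the subband algorithm.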

The estimated fullband signals \(u_{1} \left( n \right)\) and \(u_{2} \left( n \right)\) are given by the following relations:

$$u_{1} \left( n \right) = \sum\limits_{i = 1}^{M} {\varvec{G}_{i}^{T} \varvec{U}_{1i} \left( n \right)}$$
(19)
$$u_{2} \left( n \right) = \sum\limits_{i = 1}^{M} {\varvec{G}_{i}^{T} \varvec{U}_{2i} \left( n \right)}$$
(20)

where

$$u_{1i} \left( n \right) = \left\{ {\begin{array}{ll} {u_{1i,D} (n/I),} & {n = 0, \pm I, \pm 2I, \ldots } \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.\;\;\;i = 1, 2, \ldots ,M.$$
(21)
$$u_{2i} \left( n \right) = \left\{ {\begin{array}{ll} {u_{2i,D} (n/I),} & {n = 0, \pm I, \pm 2I, \ldots } \\ {0,} & {{\text{otherwise}}} \\ \end{array} } \right.\;\;\;i = 1, 2, \ldots ,M.$$
(22)
$${\text{and}}\;\varvec{U}_{1i} \left( n \right) = \left[ {u_{1i} \left( n \right), u_{1i} \left( {n - 1} \right), \ldots ,u_{1i} \left( {n - l + 1} \right)} \right],\;\varvec{U}_{2i} \left( n \right) = \left[ {u_{2i} \left( n \right), u_{2i} \left( {n - 1} \right), \ldots ,u_{2i} \left( {n - l + 1} \right)} \right]$$

I is the interpolation factor; in our case we take \(I = D = M\), and l is the length of the synthesis filters \(\varvec{G}_{i}\).

Adopting the NLMS algorithm to update the filters \(w_{21} \left( p \right)\) and \(w_{12} \left( p \right)\), we get in a vector notation the following relations:

$$\varvec{w}_{21} \left( {p + 1} \right) = \varvec{w}_{21} \left( p \right) + \mu_{1} \sum\limits_{i = 1}^{M} {\frac{{\varvec{m}_{2i,D} \left( p \right)u_{1i,D} \left( p \right)}}{{\left| {\varepsilon + \varvec{m}_{2i,D}^{T} \left( p \right)\varvec{m}_{2i,D} \left( p \right)} \right|}}}$$
(23)
$$\varvec{w}_{12} \left( {p + 1} \right) = \varvec{w}_{12} \left( p \right) + \mu_{2} \sum\limits_{i = 1}^{M} {\frac{{\varvec{m}_{1i,D} \left( p \right)u_{2i,D} \left( p \right)}}{{\left| {\varepsilon + \varvec{m}_{1i,D}^{T} \left( p \right)\varvec{m}_{1i,D} \left( p \right)} \right|}}}$$
(24)

where \(\varvec{m}_{1i,D} \left( p \right) = \left[ {m_{1i,D} \left( p \right), m_{1i,D} \left( {p - 1} \right), \ldots ,m_{1i,D} \left( {p - L + 1} \right)} \right]\) and \(\varvec{m}_{2i,D} \left( p \right) = \left[ {m_{2i,D} \left( p \right), m_{2i,D} \left( {p - 1} \right), \ldots ,m_{2i,D} \left( {p - L + 1} \right)} \right]\). L is the length of the adaptive filters. The step sizes \(\mu_{1}\) and \(\mu_{2}\) \(( 0 < \mu_{1} ,\mu_{2} < 2)\) are the control parameters of the TC-SNLMS algorithm; they adjust the convergence behavior of the adaptive filters \(w_{21} \left( p \right)\) and \(w_{12} \left( p \right)\), respectively. The parameter \(\varepsilon\) is a small positive constant that avoids division by very small values in the absence of the input signal (silence periods).
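One iteration of Eqs. (13) and (23) can be written compactly with NumPy. The sketch below covers the filter \(\varvec{w}_{21}\) only; the update of \(\varvec{w}_{12}\) in Eqs. (14) and (24) has the same form with the roles of the two channels exchanged. The function name `tc_snlms_update_w21`, the array layout and the default values of `mu1` and `eps` are assumptions made for illustration.

```python
import numpy as np

def tc_snlms_update_w21(w21, m1_sub, m2_sub_vecs, mu1=0.5, eps=1e-6):
    """One TC-SNLMS iteration for w21.

    w21         : array (L,), current adaptive filter.
    m1_sub      : array (M,), samples m_{1i,D}(p) for i = 1..M.
    m2_sub_vecs : array (M, L), row i holds the vector m_{2i,D}(p).
    Returns the updated filter and the M subband errors u_{1i,D}(p).
    """
    # Eq. (13): subband output (error) signals.
    u1_sub = m1_sub - m2_sub_vecs @ w21
    # Normalization terms |eps + m_{2i,D}^T m_{2i,D}| of Eq. (23).
    norms = np.abs(eps + np.sum(m2_sub_vecs ** 2, axis=1))
    # Eq. (23): sum of the M normalized subband corrections.
    w21_next = w21 + mu1 * (m2_sub_vecs.T @ (u1_sub / norms))
    return w21_next, u1_sub

# Usage with hypothetical data: M = 2 subbands, L = 4 taps.
rng = np.random.default_rng(2)
w21 = np.zeros(4)
m1_sub = rng.standard_normal(2)
m2_sub_vecs = rng.standard_normal((2, 4))
w21_next, u1_sub = tc_snlms_update_w21(w21, m1_sub, m2_sub_vecs)
```

A single filter is shared by all subbands, so each iteration accumulates M normalized corrections before the filter is updated once at the decimated rate.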

3.2 Two-channel fullband NLMS (TC-FNLMS) algorithm

The TC-FNLMS algorithm is a two-channel adaptive filtering algorithm that combines the forward blind source separation structure with the normalized least mean square algorithm (Van Gerven and Van Compernolle 1992). A detailed scheme of the TC-FNLMS algorithm is presented in Fig. 5.

Fig. 5 Detailed scheme of the TC-FNLMS algorithm

The enhanced output signals of the TC-FNLMS algorithm are given by the following relations:

$$u_{1} \left( n \right) = m_{1} \left( n \right) - \varvec{w}_{21}^{T} \left( n \right)\varvec{m}_{2} \left( n \right)$$
(25)
$$u_{2} \left( n \right) = m_{2} \left( n \right) - \varvec{w}_{12}^{T} \left( n \right)\varvec{m}_{1} \left( n \right)$$
(26)

where \(\varvec{m}_{1} \left( n \right) = \left[ {m_{1} \left( n \right),m_{1} \left( {n - 1} \right), \ldots ,m_{1} \left( {n - L + 1} \right)} \right]^{T}\) and \(\varvec{m}_{2} \left( n \right) = \left[ {m_{2} \left( n \right),m_{2} \left( {n - 1} \right), \ldots ,m_{2} \left( {n - L + 1} \right)} \right]^{T}\) are the vectors that contain the last L samples of the inputs \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\), respectively.

The update relations of the adaptive filters \(w_{21} \left( n \right)\) and \(w_{12} \left( n \right)\) are given as follows:

$$\varvec{w}_{21} \left( {n + 1} \right) = \varvec{w}_{21} \left( n \right) + \mu_{21} \frac{{\varvec{m}_{2} \left( n \right)u_{1} \left( n \right)}}{{\left| {\alpha + \varvec{m}_{2}^{T} \left( n \right)\varvec{m}_{2} \left( n \right)} \right|}}$$
(27)
$$\varvec{w}_{12} \left( {n + 1} \right) = \varvec{w}_{12} \left( n \right) + \mu_{12} \frac{{\varvec{m}_{1} \left( n \right)u_{2} \left( n \right)}}{{\left| {\alpha + \varvec{m}_{1}^{T} \left( n \right)\varvec{m}_{1} \left( n \right)} \right|}}$$
(28)

where \(0 < \mu_{12} ,\mu_{21} < 2\) are the two step sizes that control the convergence behavior of the cross-adaptive filters \(w_{21} \left( n \right)\) and \(w_{12} \left( n \right)\). The parameter \(\alpha\) is a small constant introduced to avoid division by zero.
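For a single time index n, Eqs. (25) to (28) translate directly into the following Python sketch. The function name `tc_fnlms_step` and the default values of the step sizes and of `alpha` are assumptions made for illustration; the input vectors are ordered as in the text, with the most recent sample first.

```python
import numpy as np

def tc_fnlms_step(w21, w12, m1_vec, m2_vec, mu21=0.5, mu12=0.5, alpha=1e-6):
    """One TC-FNLMS iteration.

    m1_vec, m2_vec : arrays (L,), [m(n), m(n-1), ..., m(n-L+1)].
    Returns the updated filters and the two outputs u1(n), u2(n).
    """
    u1 = m1_vec[0] - w21 @ m2_vec  # Eq. (25)
    u2 = m2_vec[0] - w12 @ m1_vec  # Eq. (26)
    # Eqs. (27)-(28): normalized cross-channel updates.
    w21_next = w21 + mu21 * m2_vec * u1 / abs(alpha + m2_vec @ m2_vec)
    w12_next = w12 + mu12 * m1_vec * u2 / abs(alpha + m1_vec @ m1_vec)
    return w21_next, w12_next, u1, u2

# Usage with hypothetical input vectors of length L = 4.
w21 = np.zeros(4)
w12 = np.zeros(4)
m1_vec = np.array([1.0, 0.0, 0.0, 0.0])
m2_vec = np.array([2.0, 0.0, 0.0, 0.0])
w21, w12, u1, u2 = tc_fnlms_step(w21, w12, m1_vec, m2_vec,
                                 mu21=1.0, mu12=1.0, alpha=0.0)
```

With unit step sizes and `alpha` set to zero, as in this toy call, each update cancels the current output exactly when it is re-evaluated with the same input vectors; smaller step sizes trade this speed for a lower steady-state error.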

4 Simulation results

4.1 Description of the used signals

In this simulation, the original speech signal \(s\left( n \right)\) is a French sentence about 4 s long, taken from the AURORA database (Combescure 1981); it is shown in Fig. 6 with its manual segmentation. For the point noise source signal \(b\left( n \right)\), the USASI (United States of America Standards Institute, now ANSI) noise is used (see Fig. 6). These signals are sampled at 8 kHz and coded on 16 bits. To generate the impulse responses \(h_{12} \left( n \right)\) and \(h_{21} \left( n \right)\), we have used the physical model described in Djendi et al. (2006). An example of these impulse responses is given in Fig. 7.

Fig. 6 (left) The original speech signal with its segmentation, (right) the USASI noise

Fig. 7 Example of the simulated impulse responses, (left) \(h_{1} \left( n \right)\) and (right) \(h_{2} \left( n \right)\), with L = 128

We have used the signals described above to generate the noisy observations \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\) according to the simplified convolutive mixture model (described in Sect. 2). In Fig. 8, we show the time evolution of the two noisy observations \(m_{1} \left( n \right)\) and \(m_{2} \left( n \right)\). The input signal-to-noise ratio (SNR) is set to \(SNR_{1} = 0\;dB\) and \(SNR_{2} = 0\; dB\) at the first and the second microphone, respectively.

Fig. 8 Time evolution of the two noisy observations, (left) \(m_{1} \left( n \right)\) and (right) \(m_{2} \left( n \right)\)

4.2 Time evolution of the output speech signal

In this section, we present a simple visual test on the output signal obtained by the proposed switching algorithm. As we are interested in speech enhancement, we focus only on the first output \(u_{1} \left( n \right)\). The parameters of the proposed algorithm are set as follows: the adaptive filter length is L = 128; the subband filter lengths for M = 2, M = 4 and M = 8 are l = 16, l = 32 and l = 64, respectively; the MSE threshold is \(MSE_{th} = - 45\;dB\); and the input SNR is 0 dB at the two microphones. In Fig. 9, we show the time evolution of the output speech signal \(u_{1} \left( n \right)\) obtained by the proposed switching algorithm with 2, 4 and 8 subband configurations. This figure shows that the proposed algorithm reduces the acoustic noise components with all subband configurations.

Fig. 9 Time evolution of the output speech signal \(u_{1} \left( n \right)\) obtained by the proposed switching algorithm with (top) 2 subbands, (middle) 4 subbands and (bottom) 8 subbands

4.3 Objective evaluation

In order to objectively evaluate the performance of the proposed switching algorithm in comparison with the classical two-channel fullband NLMS (TC-FNLMS) algorithm and the two-channel subband NLMS (TC-SNLMS) algorithm, extensive experiments have been carried out using the following objective criteria:

  (i) The system mismatch (SM),
  (ii) The segmental mean square error (SegMSE),
  (iii) The segmental signal-to-noise ratio (SegSNR),
  (iv) The cepstral distance (CD).

4.3.1 System mismatch (SM) evaluation

To assess the convergence speed of the proposed switching algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, we have used the SM criterion, which is computed between the adaptive filter \(w_{21} \left( n \right)\) and the real one \(h_{21} \left( n \right)\) as follows [31]:

$$SM_{dB} = 20\log_{10} \left( {\frac{{\left\| {\varvec{h}_{21} - \varvec{w}_{21} \left( n \right)} \right\|}}{{\left\| {\varvec{h}_{21} } \right\|}}} \right)$$
(29)

where the symbol ∥·∥ represents the Euclidean norm. The simulation parameters of each algorithm (i.e. TC-FNLMS, TC-SNLMS and the proposed switching algorithm) are summarized in Table 1. The SM comparison between the TC-FNLMS, TC-SNLMS and proposed switching algorithms for three input SNRs, i.e. − 3 dB, 0 dB and 3 dB, and with different subband configurations (2, 4 and 8 subbands) is reported in Figs. 10, 11 and 12. These figures show that the proposed algorithm, with 2, 4 and 8 subbands, outperforms the other simulated algorithms in terms of convergence speed for all input SNR levels. This good performance is achieved thanks to the fullband–subband switching procedure based on MSE estimation: using the TC-SNLMS algorithm when \(MSE_{e}\) is not below \(MSE_{th}\), and the TC-FNLMS algorithm otherwise, yields fast convergence in both the transient and steady-state phases.
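The SM criterion of Eq. (29), read with Euclidean norms in the numerator and denominator as the text describes, is a one-line computation. The function name `system_mismatch_db` and the toy filter values below are assumptions made for illustration.

```python
import numpy as np

def system_mismatch_db(h21, w21):
    """System mismatch in dB between the true response and its estimate."""
    return 20.0 * np.log10(np.linalg.norm(h21 - w21) / np.linalg.norm(h21))

# Toy example: the estimate is off by 10 % of the norm of the true response.
h21 = np.array([1.0, 0.0])
w21 = np.array([0.9, 0.0])
sm = system_mismatch_db(h21, w21)  # approximately -20 dB
```

Evaluating this quantity at every iteration gives a convergence curve: the faster it drops, and the lower it settles, the better the adaptive filter identifies the cross-coupling path.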

Table 1 Simulation parameters of the simulated algorithms, i.e. the TC-FNLMS algorithm (Van Gerven and Van Compernolle 1992), the TC-SNLMS algorithm (Djendi and Bendoumia 2013) and the proposed switching algorithm (in this paper)

Fig. 10 SM evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = - 3\;{\text{dB}}\)

Fig. 11 SM evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 0\;{\text{dB}}\)

Fig. 12 SM evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 3\;{\text{dB}}\)

4.3.2 Segmental mean square error (SegMSE) evaluation

In this subsection, we evaluate the mean square error criterion for the TC-FNLMS and TC-SNLMS algorithms and the proposed one. The segmental mean square error (SegMSE) quantifies the convergence speed of each simulated algorithm. This SegMSE criterion is given by the following relation:

$$SegMSE_{dB} = \frac{10}{M}\sum\limits_{m = 0}^{M - 1} {\log_{10} } \left( {\frac{1}{N}\sum\limits_{n = Nm}^{Nm + N - 1} {\left| {s\left( n \right) - u_{1} \left( n \right)} \right|^{2} } } \right)$$
(30)

where N is the segment length of the original signal \(s\left( n \right)\) and the enhanced one \(u_{1} \left( n \right)\), and M represents the number of segments in silence periods (here M denotes the number of segments, not the number of subbands). We note that the SegMSE criterion is evaluated only in speech-absence periods (Ghribi et al. 2016). The simulation parameters of each algorithm are summarized in Table 1. Figures 13, 14 and 15 show the SegMSE evaluation of the TC-FNLMS, TC-SNLMS (with 2, 4, 8 subbands) and the proposed switching algorithm (with 2, 4, 8 subbands) for three input SNRs, i.e. − 3 dB, 0 dB and 3 dB. We notice that the proposed algorithm converges faster than the other ones. This is observed in all tests, for 2, 4 and 8 subbands.
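A minimal Python sketch of Eq. (30) is given below. It assumes that the indices of the speech-absence segments are already known (for instance from the manual segmentation) and are passed in as `silence_segments`; this argument, the function name `seg_mse_db` and the toy signals are assumptions made for illustration.

```python
import numpy as np

def seg_mse_db(s, u1, N, silence_segments):
    """Segmental MSE of Eq. (30), averaged over the given segment indices.

    Segment m covers samples N*m ... N*m + N - 1. A segment with zero
    error gives log10(0); no safeguard is added in this sketch.
    """
    logs = []
    for m in silence_segments:
        seg = slice(N * m, N * m + N)
        logs.append(np.log10(np.mean(np.abs(s[seg] - u1[seg]) ** 2)))
    return 10.0 * np.mean(logs)

# Toy example: silence in the reference, residual noise of amplitude 0.1.
s = np.zeros(8)
u1 = 0.1 * np.ones(8)
value = seg_mse_db(s, u1, N=4, silence_segments=[0, 1])  # about -20 dB
```

During silence \(s\left( n \right)\) is zero, so the criterion measures the residual noise power left in \(u_{1} \left( n \right)\), segment by segment.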

Fig. 13 SegMSE evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = - 3\;{\text{dB}}\)

Fig. 14 SegMSE evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 0\;{\text{dB}}\)

Fig. 15 SegMSE evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 3\;{\text{dB}}\)

4.3.3 Segmental signal-to-noise-ratio (SegSNR) evaluation

In order to analyze the noise reduction performance of the proposed algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, we have used the SegSNR criterion, which is evaluated for each algorithm as follows (Deller et al. 1993; Sayed 2003):

$$SegSNR_{dB} = \frac{10}{M}\mathop \sum \limits_{m = 0}^{M - 1} \log_{10} \left( {\frac{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right)} \right|^{2} }}{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right) - u_{1} \left( n \right)} \right|^{2} }}} \right)$$
(31)

where \(s\left( n \right)\) and \(u_{1} \left( n \right)\) are the original and the enhanced speech signals, respectively. The parameters M and N are the number of segments and the segment length, respectively. We note that, at the output, we get M values of the SegSNR criterion, each averaged over N samples. The symbol | · | represents the absolute value operator. We recall that all M segments correspond only to speech presence periods. The symbol log10 is the base-10 logarithm. Figures 16, 17 and 18 show the SegSNR evaluation of the proposed algorithm in comparison with the TC-FNLMS and TC-SNLMS ones for three global input SNRs, i.e. − 3 dB, 0 dB and 3 dB. For each algorithm, we use the same parameters given in Table 1.
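Eq. (31) can be sketched in Python in the same way. The indices of the speech-presence segments are assumed to be known and are passed in as `speech_segments`; this argument, the function name `seg_snr_db` and the toy signals are assumptions made for illustration.

```python
import numpy as np

def seg_snr_db(s, u1, N, speech_segments):
    """Segmental SNR of Eq. (31), averaged over the given segment indices.

    Segment m covers samples N*m ... N*m + N - 1. No safeguard against a
    zero denominator is added in this sketch.
    """
    logs = []
    for m in speech_segments:
        seg = slice(N * m, N * m + N)
        signal_energy = np.sum(np.abs(s[seg]) ** 2)
        error_energy = np.sum(np.abs(s[seg] - u1[seg]) ** 2)
        logs.append(np.log10(signal_energy / error_energy))
    return 10.0 * np.mean(logs)

# Toy example: the enhanced signal deviates from the reference by 10 %.
s = np.ones(8)
u1 = 0.9 * np.ones(8)
value = seg_snr_db(s, u1, N=4, speech_segments=[0, 1])  # about 20 dB
```

Averaging the logarithms rather than the energies keeps low-energy speech segments from being dominated by loud ones, which is the usual motivation for a segmental measure.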

Fig. 16 SegSNR evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = - 3\;{\text{dB}}\)

Fig. 17 SegSNR evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 0\;{\text{dB}}\)

Fig. 18 SegSNR evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 3\;{\text{dB}}\)

According to the obtained results, the proposed algorithm behaves more efficiently than the other competitive algorithms (TC-FNLMS and TC-SNLMS) for the different input SNR levels, i.e. − 3 dB, 0 dB and 3 dB. We have also noted that the output SegSNR values of the TC-SNLMS algorithm decrease in the steady-state regime when the number of subbands is high (4 or 8 subbands), whereas the proposed switching algorithm (with 2, 4 and 8 subbands) gives the highest SegSNR values in both the transient and steady-state phases. This is the main benefit of the proposed switching algorithm, which aims to combine the good convergence performance of the TC-SNLMS algorithm when the number of subbands is high with the good final values of the TC-FNLMS algorithm.

4.3.4 Cepstral distance (CD) evaluation

In order to quantify the distortion amount introduced in the output speech signal obtained by the proposed switching algorithm in comparison with the TC-FNLMS and TC-SNLMS ones, we have used the cepstral distance criterion which is estimated by the following relation (Hu and Loizou 2008; Rabiner and Juang 1993):

$$CD_{dB} = \sum\limits_{\lambda = 0}^{T - 1} {IFFT\left[ {\log \left( {\left| {S\left( {\lambda ,\omega } \right)} \right|} \right) - \log \left( {\left| {U_{1} \left( {\lambda ,\omega } \right)} \right|VAD_{\lambda } } \right)} \right]^{2} }$$
(32)

where \(S\left( {\lambda ,\omega } \right)\) and \(U_{1} \left( {\lambda ,\omega } \right)\) represent the short-time Fourier transforms of the original speech signal \(s\left( n \right)\) and the enhanced one \(u_{1} \left( n \right)\), respectively, at each frame \(\lambda\); T is the number of frames over which the CD criterion is averaged; and \(VAD_{\lambda }\) is the voice activity detector decision for frame \(\lambda\). Figures 19, 20 and 21 report the CD evaluation results obtained by the three algorithms for three input SNRs, i.e. − 3 dB, 0 dB and 3 dB, and with different subband configurations (2, 4 and 8 subbands). We recall that the simulation parameters of each algorithm are those given in Table 1. These results show that the TC-FNLMS algorithm outperforms the other algorithms (i.e. TC-SNLMS and the proposed algorithm) in terms of steady-state CD values. The proposed algorithm also behaves similarly to the TC-SNLMS one. It is worth noting that the CD values of the proposed algorithm for all subband configurations and in the various situations are below − 5 dB, which indicates good intelligibility of the output speech signal.

Fig. 19 CD evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = - 3\;{\text{dB}}\)

Fig. 20 CD evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 0\;{\text{dB}}\)

Fig. 21 CD evaluation of the TC-FNLMS (Van Gerven and Van Compernolle 1992), TC-SNLMS (Djendi and Bendoumia 2013) and the proposed switching (in this paper) algorithms, with different subband configurations, (left) M = 2, (middle) M = 4, (right) M = 8. Input \(SNR_{1} = SNR_{2} = 3\;{\text{dB}}\)

5 Conclusion

In this paper, we have proposed a new switching adaptive speech enhancement algorithm in which the two-channel fullband NLMS (TC-FNLMS) algorithm and the two-channel subband NLMS (TC-SNLMS) algorithm are used alternately according to the estimated MSE. To validate the performance of the proposed switching algorithm in comparison with the TC-FNLMS and TC-SNLMS algorithms, extensive experiments have been performed using several objective criteria. The SM and segmental MSE results have confirmed the superiority of the proposed algorithm in terms of convergence speed; this good performance is obtained thanks to the proposed fullband–subband switching technique. The SegSNR evaluation has also shown the efficiency of the proposed algorithm in reducing the acoustic noise at the processing output. For the CD evaluation, however, we have noted a slight degradation in the performance of the proposed algorithm in the steady-state regime. In future work, we aim to address this issue and improve the proposed switching algorithm in the situations where it fails.