1 Introduction

Speech quality is one of the major performance indicators in communication systems. Background noise and acoustic echo affect the listener's perception and can make voice communication difficult or even impossible [1]. Acoustic echo is caused by the reflection of sound waves and by the acoustic coupling between the loudspeaker and the microphone in teleconferencing and hands-free communication systems. Hence, acoustic echo cancellation (AEC) techniques are generally used to remove this undesired echo and improve speech intelligibility [2]. Adaptive filtering is considered the most effective solution to this problem. An adaptive filter contains two blocks: a digital filter and an adaptive algorithm. A finite impulse response (FIR) structure is an attractive choice for the digital filter because of its ease of design and its stability. Several adaptive filtering algorithms have been proposed to update the filter coefficients, including the least mean squares (LMS) [3], recursive least squares (RLS) [4], and affine projection (AP) [5] algorithms. The coefficient update is the cornerstone of an adaptive filtering algorithm: it drives the coefficients toward optimum values that reduce the effect of the echo signal. In addition, the step-size value must be chosen carefully for the adaptive algorithm to behave well.

In this paper, we investigate the step-size parameter of the adaptive filtering algorithm for acoustic echo cancellation. Many works have sought to improve adaptive filtering algorithms, for example through variable step-size versions [6,7,8,9,10] and partial-update schemes [11,12,13,14,15].

The rest of this paper is organized as follows. Section 2 presents the principle of AEC based on adaptive filtering and discusses the NLMS algorithm. Section 3 presents the proposed approach to updating the filter coefficients. Section 4 reports the simulation results obtained with the proposed approach, and Sect. 5 presents some conclusions.

2 Adaptive Filtering Based Acoustic Echo Cancellation

Acoustic echo cancellation is treated as a system identification problem in which the main role of the adaptive filter is to estimate the echo path between the loudspeaker and the microphone. This echo path is modeled by the impulse response of the loudspeaker-enclosure-microphone (LEM) system [16]. In addition, using adaptive filtering for acoustic echo cancellation allows simultaneous full-duplex communication and keeps the conversation comfortable for the speakers.

The principle of AEC is shown in Fig. 1, where the basic steps of the AEC can be summarized as follows:

  1. Estimate the characteristics of the echo path (impulse response \( {\mathbf{h}} \)).

  2. Create a replica of the echo signal (estimated echo \( \hat{y}\left( n \right) \)).

  3. Subtract the estimated echo signal \( \hat{y}\left( n \right) \) from the microphone signal \( d\left( n \right) \) to remove the undesirable echo \( y\left( n \right) \).

An adaptive filter \( {\mathbf{w}}\left( n \right) \) is well suited to producing this replica because the echo path is usually unknown and time-varying.

Fig. 1. A basic structure of the AEC system.

The acoustic echo signal \( y\left( n \right) \) results from filtering the far-end signal \( x\left( n \right) \) through the LEM system impulse response \( {\mathbf{h}} \), as depicted in Fig. 1.

At each sample time \( n \), the echo signal is modeled by the following equation:

$$ y\left( n \right) = {\mathbf{x}}^{T} \left( n \right) {\mathbf{h}} $$
(1)

where

$$ {\mathbf{h}} = \left[ {h_{0} , h_{1} , \ldots , h_{L - 1} } \right]^{T} $$
(2)

\( L \) is the length of the echo path, the superscript \( \left( \cdot \right)^{T} \) denotes the transpose of a vector, and

$$ {\mathbf{x}}\left( n \right) = \left[ {x\left( n \right) , x\left( {n - 1} \right) , \ldots , x\left( {n - L + 1} \right)} \right]^{T} $$
(3)

is the vector of the \( L \) most recent samples of the received (far-end) signal \( x\left( n \right) \).

The desired signal \( d\left( n \right) \) at the microphone input includes the echo signal \( y\left( n \right) \) and the background noise signal \( b\left( n \right) \):

$$ d\left( n \right) = y\left( n \right) + b\left( n \right) $$
(4)

In this paper, we assume that the near-end signal is absent (single-talk scenario) so that the adaptive algorithm can be evaluated without the perturbation of the near-end signal.
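The signal model of Eqs. (1), (3) and (4) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names `tap_vector`, `echo_sample` and `microphone_signal` are hypothetical, samples before the start of the signal are taken as zero, and the background noise is drawn from a seeded Gaussian generator.

```python
import random

def tap_vector(x, n, L):
    """Eq. (3): [x(n), x(n-1), ..., x(n-L+1)], assuming zeros before n = 0."""
    return [x[n - k] if n - k >= 0 else 0.0 for k in range(L)]

def echo_sample(x, n, h):
    """Eq. (1): y(n) = x^T(n) h."""
    xn = tap_vector(x, n, len(h))
    return sum(xk * hk for xk, hk in zip(xn, h))

def microphone_signal(x, h, noise_std=0.0, seed=0):
    """Eq. (4): d(n) = y(n) + b(n), single-talk (no near-end speech)."""
    rng = random.Random(seed)  # seeded only to make the sketch repeatable
    d = []
    for n in range(len(x)):
        b_n = rng.gauss(0.0, noise_std) if noise_std > 0.0 else 0.0
        d.append(echo_sample(x, n, h) + b_n)
    return d
```

With a unit impulse as far-end signal and no noise, `microphone_signal` simply returns the echo path followed by zeros, which is a quick way to check the indexing convention.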

The adaptive filter generates an echo estimate \( \hat{y}\left( n \right) \), which is a linear combination of the \( L \) most recent input samples at time \( n \). This signal is the echo replica and is expressed as:

$$ \hat{y}\left( n \right) = {\mathbf{x}}^{T} \left( n \right) {\mathbf{w}}\left( n \right) $$
(5)

where

$$ {\mathbf{w}}\left( n \right) = \left[ {w_{0} \left( n \right) , w_{1} \left( n \right) , \ldots , w_{L - 1} \left( n \right)} \right]^{T} $$
(6)

is the weight vector of the adaptive filter.

The error signal \( e\left( n \right) \) corresponds to the residual echo signal and it is obtained by subtracting this estimate \( \hat{y}\left( n \right) \) from the microphone signal \( d\left( n \right) \) [8]. This error signal is given by:

$$ e\left( n \right) = d\left( n \right) - \hat{y}\left( n \right) $$
(7)

The normalized least mean squares (NLMS) algorithm is one of the most popular adaptive filtering algorithms, owing to its low complexity and its robustness to finite-precision errors. It is used to update the filter coefficients in the AEC context and is defined as [4]:

$$ {\mathbf{w}}\left( {n + 1} \right) = {\mathbf{w}}\left( n \right) + \frac{\mu }{{{\mathbf{x}}^{T} \left( n \right){\mathbf{x}}\left( n \right) + \varepsilon }}e\left( n \right) {\mathbf{x}}\left( n \right) $$
(8)

where \( {\mathbf{w}}\left( n \right) \) is the current tap-weight vector of the adaptive filter, \( \mu \) is the step-size parameter used in the weight vector update, with \( 0 < \mu < 2 \), and \( \varepsilon > 0 \) is a regularization constant that prevents division by a very small value when the norm of the input data is small.
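As an illustration, one NLMS iteration combining Eqs. (5), (7) and (8) might be written as follows. The function name `nlms_update` and the default values of `mu` and `eps` are assumptions made for this sketch, not settings prescribed by the algorithm.

```python
def nlms_update(w, xn, d_n, mu=0.9, eps=1e-8):
    """One NLMS iteration; returns the updated weights and the error e(n).

    w   : current weight vector w(n), a list of L floats
    xn  : input vector x(n) = [x(n), ..., x(n-L+1)]
    d_n : microphone sample d(n)
    """
    y_hat = sum(wk * xk for wk, xk in zip(w, xn))    # Eq. (5): echo replica
    e_n = d_n - y_hat                                # Eq. (7): residual echo
    norm = sum(xk * xk for xk in xn) + eps           # regularized input energy
    gain = mu * e_n / norm
    w_next = [wk + gain * xk for wk, xk in zip(w, xn)]  # Eq. (8)
    return w_next, e_n
```

Calling this function once per sample, with `xn` shifted by one sample each time, reproduces the recursion of Eq. (8).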

3 Proposed Approach of Adaptive Filtering Updating

The two main characteristics of acoustic echo are reverberation and latency. Reverberation is the persistence of sound after the original sound has stopped. The impulse response is the pressure-time response function at the receiver position inside a room resulting from an impulse excitation. It contains three main parts: the direct sound, early reflections, and late reverberation, as depicted in Fig. 2 [17]. Various reverberation formulae have been derived, and most of them feature an exponential decay of reverberation in a room [18].

Fig. 2. Room impulse response.

The sound level in the room decays exponentially over time, so the envelope of the acoustic impulse response generally follows an exponential decay. Consequently, when estimating the impulse response, updating the non-significant coefficients (late reverberation) and the significant coefficients (early reflections) with the same step-size value slows down the global convergence of the filter coefficients.

For this reason, we propose a new adaptive filtering update strategy based on weighted updating. In this proposal, the filter coefficients are no longer updated with the same step-size value: the scalar step-size parameter \( \mu \) is replaced by a vector \( {\varvec{\upmu}} \). This vector contains step-size values that decay exponentially along the length of the adaptive filter and is defined as:

$$ {\varvec{\upmu}} = \mu \,\exp \left( { - \lambda t} \right) $$
(9)

where \( \mu \) is the step-size parameter, \( t = 1, \ldots ,L \) indexes the filter coefficients, and \( \lambda \) is the exponential decay constant with \( 0 < \lambda < 1 \), chosen such that \( { \hbox{min} }\left( {\varvec{\upmu}} \right) > 0 \) and \( { \hbox{max} }\left( {\varvec{\upmu}} \right) < 2 \). Generally, the length of the adaptive filter \( L \) is chosen to be equal to the length of the impulse response.
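A minimal sketch of Eq. (9) is given below. The helper name `step_size_vector` is hypothetical, and the values \( \mu = 0.9 \), \( \lambda = 0.008 \) and \( L = 512 \) are taken from the 512-tap experiments reported later in the paper purely as an example.

```python
import math

def step_size_vector(mu, lam, L):
    """Eq. (9): one step-size per coefficient, mu_t = mu * exp(-lam * t), t = 1..L."""
    return [mu * math.exp(-lam * t) for t in range(1, L + 1)]

# Example values borrowed from the 512-tap experiments.
mu_vec = step_size_vector(mu=0.9, lam=0.008, L=512)

# The constraints stated in the text: every entry stays inside (0, 2).
assert min(mu_vec) > 0.0 and max(mu_vec) < 2.0
```

Because the exponential is strictly decreasing, the first coefficients (direct sound and early reflections) receive the largest step-sizes and the last coefficients (late reverberation) the smallest.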

The update of the filter coefficients by the new approach for the NLMS algorithm can be expressed as follows, where \( \odot \) denotes the element-wise product of two vectors:

$$ {\mathbf{w}}\left( {n + 1} \right) = {\mathbf{w}}\left( n \right) + \frac{e\left( n \right)}{{{\mathbf{x}}^{T} \left( n \right){\mathbf{x}}\left( n \right) + \varepsilon }}{\varvec{\upmu}} \odot {\mathbf{x}}\left( n \right) $$
(10)
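The weighted update of Eq. (10) can be sketched as below, reading the product of the step-size vector and the input vector as a coefficient-by-coefficient product (an interpretation assumed here). The function name `weighted_nlms_update` and the default `eps` are hypothetical.

```python
def weighted_nlms_update(w, xn, d_n, mu_vec, eps=1e-8):
    """One iteration of the weighted NLMS update of Eq. (10).

    mu_vec holds one step-size per coefficient, for example the
    exponentially decaying values of Eq. (9).
    """
    y_hat = sum(wk * xk for wk, xk in zip(w, xn))   # echo replica
    e_n = d_n - y_hat                               # residual echo
    scale = e_n / (sum(xk * xk for xk in xn) + eps)
    # Each coefficient is moved with its own step-size mu_t.
    w_next = [wk + scale * mk * xk for wk, mk, xk in zip(w, mu_vec, xn)]
    return w_next, e_n
```

If every entry of `mu_vec` equals the same value \( \mu \), this reduces to the standard NLMS update of Eq. (8).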

Figure 3 shows an example of a room acoustic impulse response of \( 512 \) samples (in blue), together with the exponential curves obtained for different values of the exponential decay constant \( \lambda \) with the step-size \( \mu \) equal to \( 1 \). These curves represent the distribution of step-size values used for the coefficient update: the non-significant coefficients receive smaller step-size values than the significant coefficients.

Fig. 3. The proposed step-size function \( {\varvec{\upmu}} \) with different values of the exponential decay constant \( \lambda \).

4 Simulation Results and Discussions

In the evaluation, we used two types of input signals: stationary and non-stationary. The stationary signal is a white Gaussian noise (WGN). A speech signal taken from the TIMIT database [19] is used to evaluate the proposed approach for a non-stationary input and to simulate an AEC scenario; it represents the far-end speaker signal. Both input signals are sampled at \( 16\, {\text{kHz}} \) and are plotted in Fig. 4.

Fig. 4. The input signals. (a) the stationary signal, (b) the non-stationary signal.

Two measured impulse responses are used to model the echo paths: the first consists of 1024 samples [20] and the second of 512 samples [21], as shown in Fig. 5.

Fig. 5. Two acoustic echo paths.

The adaptive filter of the AEC system uses the NLMS algorithm to update its coefficients, with \( \mu = 0.9 \) and \( \varepsilon = 2.2204 \times 10^{ - 16} \). The length of this filter \( L \) is equal to the length of the echo path. The acoustic echo signal \( y\left( n \right) \) is obtained by linear convolution of the input signal with the measured impulse response (echo path).

To evaluate the proposed approach, we use two criteria: the normalized misalignment (system mismatch) and the echo return loss enhancement (ERLE), with a total number of iterations \( N = 40000 \). These criteria are defined as:

$$ {\text{Misalignment }}\left( {\text{dB}} \right) = 10 \,{ \log }_{10} \left[ {\frac{{\left\| {{\mathbf{w}}\left( n \right) - {\mathbf{h}}} \right\|^{2} }}{{\left\| {\mathbf{h}} \right\|^{2} }}} \right] $$
(11)

where \( \left\| {{\mathbf{w}}\left( n \right) - {\mathbf{h}}} \right\| \) is the Euclidean distance between the adaptive coefficient vector \( {\mathbf{w}}\left( n \right) \) and the true echo path vector \( {\mathbf{h}} \), and \( \left\| {\mathbf{h}} \right\| \) is the Euclidean norm of \( {\mathbf{h}} \).

$$ {\text{ERLE}}\left( {\text{dB}} \right) = 10\, {\text{log}}_{10} \left\{ {\frac{{E\left[ {\left| {y\left( n \right)} \right|^{2} } \right]}}{{E\left[ {\left| {e\left( n \right)} \right|^{2} } \right]}}} \right\} $$
(12)

where \( E\left[ . \right] \) denotes mathematical expectation. The role of AEC system is to minimize the misalignment and maximize ERLE.
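The two criteria of Eqs. (11) and (12) could be computed as in the sketch below. One assumption is made explicit: the expectation \( E\left[ . \right] \) is replaced by a time average over whatever block of samples is passed in, which is a common practical estimate but is not specified in the text. The function names are hypothetical.

```python
import math

def misalignment_db(w, h):
    """Eq. (11): normalized misalignment between w(n) and the true path h, in dB."""
    num = sum((wk - hk) ** 2 for wk, hk in zip(w, h))
    den = sum(hk * hk for hk in h)
    if num == 0.0:
        return float("-inf")  # perfect identification
    return 10.0 * math.log10(num / den)

def erle_db(y, e):
    """Eq. (12): ERLE in dB, with E[.] estimated by a time average (assumption)."""
    p_y = sum(v * v for v in y) / len(y)  # echo power
    p_e = sum(v * v for v in e) / len(e)  # residual echo power
    if p_e == 0.0:
        return float("inf")   # echo fully cancelled
    return 10.0 * math.log10(p_y / p_e)
```

A misalignment of 0 dB corresponds to an all-zero filter, and lower (more negative) values indicate a better estimate of the echo path; larger ERLE values indicate stronger echo attenuation.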

The real environment is modeled by a white Gaussian background noise signal \( b\left( n \right) \) that is added to the echo signal \( y\left( n \right) \) at different signal-to-noise ratio (SNR) values, where

$$ {\text{SNR}}\left( {\text{dB}} \right) = 10 \,{ \log }_{10} \left\{ {\frac{{E\left[ {\left| {y\left( n \right)} \right|^{2} } \right]}}{{E\left[ {\left| {b\left( n \right)} \right|^{2} } \right]}}} \right\} $$
(13)
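To illustrate Eq. (13), the sketch below scales a white Gaussian noise sequence so that the sample-average SNR with respect to a given echo signal matches a target value. The function name `add_noise_at_snr`, the seeded generator and the use of sample averages in place of \( E\left[ . \right] \) are assumptions of this example.

```python
import math
import random

def add_noise_at_snr(y, snr_db, seed=0):
    """Return (d, b) with d(n) = y(n) + b(n) and SNR(y, b) equal to snr_db.

    y must contain at least one non-zero sample.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in y]          # unit-variance WGN draft
    p_y = sum(v * v for v in y) / len(y)              # echo power
    p_n = sum(v * v for v in noise) / len(noise)      # draft noise power
    # Choose the gain so that p_y / (gain^2 * p_n) = 10^(snr_db / 10), Eq. (13).
    gain = math.sqrt(p_y / (p_n * 10.0 ** (snr_db / 10.0)))
    b = [gain * v for v in noise]
    d = [yv + bv for yv, bv in zip(y, b)]
    return d, b
```

For example, `add_noise_at_snr(y, 20.0)` produces a microphone signal at the 20 dB SNR used in the WGN experiments.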

Figures 6 and 7 show the misalignment and ERLE curves for the stationary input signal using echo path (a) and echo path (b), respectively. An abrupt change of the echo path is introduced at iteration 20000 to test the tracking capability. These learning curves show that the proposed approach outperforms the classical NLMS algorithm in terms of lower steady-state misalignment and higher ERLE values. The proposed approach also tracks the echo path change well.
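The structure of such a tracking experiment can be sketched on a toy scale. The example below does not reproduce the reported results: it uses a short synthetic, exponentially decaying echo path instead of the measured ones, a noise-free microphone signal, and models the echo path change by flipping the sign of the path at mid-run. All of these choices, as well as the function names and parameter values, are assumptions made for illustration.

```python
import math
import random

def dot(a, b):
    return sum(ak * bk for ak, bk in zip(a, b))

def misalignment_db(w, h):
    num = sum((wk - hk) ** 2 for wk, hk in zip(w, h))
    return 10.0 * math.log10(num / dot(h, h)) if num > 0.0 else float("-inf")

def update(w, xn, d_n, mu_vec, eps):
    """Eq. (10); with a constant mu_vec this is the standard NLMS of Eq. (8)."""
    scale = (d_n - dot(xn, w)) / (dot(xn, xn) + eps)
    return [wk + scale * mk * xk for wk, mk, xk in zip(w, mu_vec, xn)]

def run_tracking_demo(L=16, N=2000, mu=0.9, lam=0.05, eps=1e-8, seed=1):
    rng = random.Random(seed)
    # Synthetic echo path with an exponentially decaying envelope (assumption).
    h = [rng.gauss(0.0, 1.0) * math.exp(-0.2 * k) for k in range(L)]
    mu_prop = [mu * math.exp(-lam * t) for t in range(1, L + 1)]  # Eq. (9)
    mu_nlms = [mu] * L
    w_nlms, w_prop, xn = [0.0] * L, [0.0] * L, [0.0] * L
    curves = {"nlms": [], "proposed": []}
    for n in range(N):
        if n == N // 2:
            h = [-hk for hk in h]                 # abrupt echo path change
        xn = [rng.gauss(0.0, 1.0)] + xn[:-1]      # WGN far-end signal
        d_n = dot(xn, h)                          # noise-free microphone sample
        w_nlms = update(w_nlms, xn, d_n, mu_nlms, eps)
        w_prop = update(w_prop, xn, d_n, mu_prop, eps)
        curves["nlms"].append(misalignment_db(w_nlms, h))
        curves["proposed"].append(misalignment_db(w_prop, h))
    return curves

curves = run_tracking_demo()
```

Plotting `curves["nlms"]` and `curves["proposed"]` against the iteration index gives learning curves of the same kind as those described above, with a jump in misalignment at the point where the path changes.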

Fig. 6. Evaluation curves of the NLMS and the proposed approach for WGN input signal with the acoustic echo path (a) of 1024 taps, \( \mu = 0.9 \), \( \lambda = 0.0025 \), \( {\text{SNR}} = 20 \,{\text{dB}} \). Left: misalignment curves, right: ERLE curves.

Fig. 7. Evaluation curves of the NLMS and the proposed approach for WGN input signal with the acoustic echo path (b) of 512 taps, \( \mu = 0.9 \), \( \lambda = 0.008 \), \( {\text{SNR}} = 20 \,{\text{dB}} \). Left: misalignment curves, right: ERLE curves.

The results in Figs. 8 and 9 confirm that, for the non-stationary input signal and both echo paths, the proposed approach performs better, with a smaller steady-state error and larger ERLE values. These results indicate that the proposed approach can reduce the effect of the acoustic echo and enhance the communication quality.

Fig. 8. Evaluation curves of the NLMS and the proposed approach for speech input signal with the acoustic echo path (a) of 1024 taps, \( \mu = 0.9 \), \( \lambda = 0.005 \), \( {\text{SNR}} = 30 \,{\text{dB}} \). Left: misalignment curves, right: ERLE curves.

Fig. 9. Evaluation curves of the NLMS and the proposed approach for speech input signal with the acoustic echo path (b) of 512 taps, \( \mu = 0.9 \), \( \lambda = 0.008 \), \( {\text{SNR}} = 30 \,{\text{dB}} \). Left: misalignment curves, right: ERLE curves.

The temporal evolution of the error signal \( e\left( n \right) \) (residual echo) for the NLMS and the proposed approach is plotted in Fig. 10. This result shows that the proposed approach leaves less residual echo than the original NLMS in the acoustic echo cancellation scenario. It can therefore improve speech quality in communication systems.

Fig. 10. Comparison between the temporal evolution of the residual echo of the NLMS algorithm and the proposed approach with the acoustic echo path (a) of 1024 taps, \( \mu = 0.9 \), \( \lambda = 0.005 \), \( {\text{SNR}} = 30\, {\text{dB}} \).

5 Conclusion

In this paper, we have proposed a weighted updating of the adaptive filter coefficients for acoustic echo cancellation. The basic idea is to update each filter coefficient according to its significance: a small step-size is used for a non-significant coefficient and a larger one for a significant coefficient. The performance of the proposed approach is verified using stationary and non-stationary input signals. The proposed approach performs better than the original NLMS algorithm in terms of steady-state error reduction and echo return loss enhancement, and it has a good tracking capability. It also shows robustness against background noise.