1 Introduction

Echo- and noise-related problems are very common in many applications involving speech communication, e.g., audio conferencing and hands-free telephony. In these systems, the speech of the far-end speaker that is returned with a time delay (echo) causes perception problems. In such conditions, perception is further impaired when the speaker is situated in a noisy environment [1]. To provide a better communication service, an acoustic echo canceller (AEC) is required to cancel the echo returned to the transmission room and to allow uninterrupted communication between the rooms [2,3,4].

The literature contains a class of promising state-of-the-art adaptive techniques, based on one microphone [5, 6] or two microphones [7, 8], that have been proposed to resolve this critical issue, i.e. acoustic echo. However, the presence of punctual noise impedes the convergence of acoustic echo cancellation algorithms, which leads to poor echo cancellation. Therefore, hands-free mobile equipment strongly requires an acoustic noise canceller (ANC) in addition to the acoustic echo canceller. Techniques that reduce the noise separately have been proposed, for example, in previous studies [9, 10], and adaptive techniques based on one [11, 12] or two [13, 14] microphones have been proposed to correct speech distortion. In [15,16,17], the principle of source separation structures (forward and backward) was used to enhance corrupted speech signals [18, 19]. Combining these two structures with families of adaptive filtering algorithms has given new insight into the acoustic noise cancellation field. When two acoustic disturbances are present at the same time, i.e. acoustic noise and echo, the aforementioned noise reduction techniques should be followed by a second process, i.e. acoustic echo cancellation, to improve the quality of the communication. A survey of techniques for combining acoustic noise and echo cancellation systems can be found in [20,21,22,23,24].

The order in which the processing blocks are applied is very important. If the AEC is applied first and the ANC second, then, since most noise reduction techniques use multiple microphones, the acoustic echo canceller has to be repeated for each microphone, and these AEC systems must be robust against the noise that is still present in their input signals. The advantage of the second order, where the ANC precedes the AEC, is that only one acoustic echo canceller is needed.

In this paper, we propose a novel cascade structure for joint backward blind acoustic noise and echo cancellation. In the new structure, the backward blind source separation (BBSS) structure, adapted by the two-channel simplified fast transversal filter (SFTF) algorithm [25], suppresses the punctual noise components from the far-end signal before it is processed by the AEC system, which is based on the single-channel SFTF algorithm [26,27,28,29,30].

The outline of this paper is as follows. Sect. 2 presents the development of the proposed approach, including a detailed description of the acoustic environment and of the new cascade structure that combines the acoustic noise and echo cancellation systems. In Sect. 3, we present the experimental study: we describe the signals used and give the simulation and evaluation results of the proposed approach. Finally, we conclude our work in Sect. 4.

2 Development of the proposed approach

In realistic hands-free communication applications, the observed speech signal is corrupted by both echo and noise components. As shown in Fig. 1, in addition to the punctual noise, the two near-end microphones pick up the amplified and broadcast far-end signal.

Fig. 1 Illustrative scheme of the acoustical environment

To improve the AEC in the presence of punctual noise components, an acoustic noise canceller should be integrated to deal with the noise components. In this section, we present a novel cascade structure for joint backward blind acoustic noise and echo cancellation, in which the blind ANC stage is placed before the AEC stage. In Fig. 2, we give a detailed scheme of the proposed structure; each step is described in the following subsections.

Fig. 2 Detailed scheme of the novel cascading structure for joint backward blind acoustic noise and echo cancellation systems

2.1 Modeling of the acoustical environment

In Fig. 2 [1st step], we give the mixing model, which is physically coherent [17], where \(s(n)\) is the far-end signal and \(b(n)\) is the punctual noise. The impulse response \(h_{11}(n)\) models the direct acoustic path, while the impulse responses \(h_{12}(n)\) and \(h_{21}(n)\) model the cross acoustic paths between the source signals and the two microphones. In our work, the noise source is assumed to be close to the second microphone; hence, the direct impulse response \(h_{22}(n)\) is the Kronecker unit impulse \(\delta(n)\) [13]. The signals \(d_{1}(n)\) and \(d_{2}(n)\) are the echo signals picked up by the near-end microphones; they are given by (1) and (2), respectively.

$$d_{1} \left( n \right) = s\left( n \right)*h_{11} \left( n \right)$$
(1)
$$d_{2} \left( n \right) = s\left( n \right)*h_{12} \left( n \right)$$
(2)

However, the only observed (available) signals are \(p_{1}(n)\) and \(p_{2}(n)\), which are given by:

$$p_{1} \left( n \right) = d_{1} \left( n \right) + b\left( n \right)*h_{21} \left( n \right)$$
(3)
$$p_{2} \left( n \right) = d_{2} \left( n \right) + b\left( n \right)$$
(4)

where ‘*’ denotes the linear convolution operator.
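A minimal NumPy sketch of the mixing model (1) to (4) may help when reproducing the observations. It assumes that \(s(n)\) and \(b(n)\) have the same length, that the impulse responses are given as finite arrays, and that convolution outputs are truncated to the signal length; the helper name `mix_observations` and the short filters in the example are hypothetical (they are not the \(L = 256\) responses used in the experiments).

```python
import numpy as np


def mix_observations(s, b, h11, h12, h21):
    """Two-microphone mixing model of eqs. (1)-(4), with h22(n) = delta(n).

    Hypothetical helper: all convolutions are truncated to len(s) samples.
    """
    n = len(s)
    d1 = np.convolve(s, h11)[:n]           # eq. (1): echo at microphone 1
    d2 = np.convolve(s, h12)[:n]           # eq. (2): echo at microphone 2
    p1 = d1 + np.convolve(b, h21)[:n]      # eq. (3): echo + filtered noise
    p2 = d2 + b                            # eq. (4): noise source next to mic 2
    return p1, p2, d1, d2


# Example with short illustrative filters
rng = np.random.default_rng(0)
s = rng.standard_normal(1000)
b = rng.standard_normal(1000)
p1, p2, d1, d2 = mix_observations(s, b, h11=[1.0, 0.2], h12=[0.0, 0.5], h21=[0.0, 0.0, 0.3])
```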

2.2 Problem formulation of joint backward blind ANC and AEC systems

The proposed system is depicted in Fig. 2 [2nd and 3rd steps]. It consists of a backward blind ANC process, which performs the noise cancellation task, and an AEC process, which performs the echo cancellation task. The backward blind ANC process [see Fig. 2 (2nd step)] is a two-channel noise cancellation system based on the BBSS structure [25]. This blind approach aims to recover estimates of the original sources \(s(n)\) and \(b(n)\) using only the noisy observations \(p_{1}(n)\) and \(p_{2}(n)\), which are generated by the model of Fig. 2 [1st step].

The outputs of the ANC process are given by the following relations:

$$u_{1} \left( n \right) = p_{1} \left( n \right) - u_{2} \left( n \right)*w_{21} \left( n \right)$$
(5)
$$u_{2} \left( n \right) = p_{2} \left( n \right) - u_{1} \left( n \right)*w_{12} \left( n \right)$$
(6)

Substituting (6) into (5) and (5) into (6), and then using (3) and (4), we get:

$$u_{1} \left( n \right)*\left[ {\delta \left( n \right) - w_{12} \left( n \right)*w_{21} \left( n \right) } \right] = s\left( n \right)*\left[ {h_{11} \left( n \right) - h_{12} \left( n \right)*w_{21} \left( n \right)} \right] + b\left( n \right)*\left[ {h_{21} \left( n \right) - w_{21} \left( n \right)} \right]$$
(7)
$$u_{2} \left( n \right)*\left[ {\delta \left( n \right) - w_{12} \left( n \right)*w_{21} \left( n \right)} \right] = s\left( n \right)*\left[ {h_{12} \left( n \right) - h_{11} \left( n \right)*w_{12} \left( n \right)} \right] + b\left( n \right)*\left[ {\delta \left( n \right) - h_{21} \left( n \right)*w_{12} \left( n \right)} \right]$$
(8)

The optimal solutions for the two cross filters are:

$$w_{21} \left( n \right) = h_{21} \left( n \right)$$
(9)
$$w_{12} \left( n \right) = h_{12} \left( n \right)*\frac{1}{{h_{11} \left( n \right)}}$$
(10)

With these solutions, the BBSS structure recovers the original signals \(s(n)\) and \(b(n)\), as given in (11) and (12); note that the speech estimate in (11) is distorted by the direct path \(h_{11}(n)\).

$$u_{1} \left( n \right) = s\left( n \right)*h_{11} \left( n \right)$$
(11)
$$u_{2} \left( n \right) = b\left( n \right)$$
(12)
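The step from the optimal solutions (9) and (10) to (11) and (12) can be made explicit. The short derivation below assumes that \(1/h_{11}(n)\) in (10) denotes the inverse filter \(h_{11}^{-1}(n)\), i.e. \(h_{11}(n)*h_{11}^{-1}(n)=\delta(n)\), and that the filter \(\delta(n)-w_{12}(n)*w_{21}(n)\) is invertible. Each bracketed term of (7) and (8) simplifies as follows:

$$
\begin{aligned}
&\text{speech term of (8):} && h_{12}(n) - h_{11}(n)*w_{12}(n) = h_{12}(n) - h_{11}(n)*h_{11}^{-1}(n)*h_{12}(n) = 0 \\
&\text{noise term of (7):} && h_{21}(n) - w_{21}(n) = 0 \\
&\text{speech term of (7):} && h_{11}(n) - h_{12}(n)*h_{21}(n) = h_{11}(n)*\left[\delta(n) - h_{11}^{-1}(n)*h_{12}(n)*h_{21}(n)\right] = h_{11}(n)*\left[\delta(n) - w_{12}(n)*w_{21}(n)\right] \\
&\text{noise term of (8):} && \delta(n) - h_{21}(n)*w_{12}(n) = \delta(n) - w_{12}(n)*w_{21}(n)
\end{aligned}
$$

Relations (7) and (8) therefore reduce to \(u_{1}(n)*\left[\delta(n) - w_{12}(n)*w_{21}(n)\right] = s(n)*h_{11}(n)*\left[\delta(n) - w_{12}(n)*w_{21}(n)\right]\) and \(u_{2}(n)*\left[\delta(n) - w_{12}(n)*w_{21}(n)\right] = b(n)*\left[\delta(n) - w_{12}(n)*w_{21}(n)\right]\); cancelling the common factor gives (11) and (12).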

As depicted in Fig. 2 [3rd step], since the echo (the far-end speech) is present only at the first output, the AEC system is applied only to \(u_{1}(n)\), which gives the following output signal:

$$u_{3} \left( n \right) = u_{1} \left( n \right) - s\left( n \right)*w_{13} \left( n \right)$$
(13)

Inserting (11) into (13), the AEC output signal can be written as:

$$u_{3} \left( n \right) = s\left( n \right)*\left( {h_{11} \left( n \right) - w_{13} \left( n \right)} \right)$$
(14)

To suppress the acoustic echo, the filter \({\text{w}}_{13} \left( {\text{n}} \right)\) must identify the impulse response \({\text{h}}_{11} \left( {\text{n}} \right)\), i.e. \({\text{w}}_{13} \left( {\text{n}} \right) = {\text{h}}_{11} \left( {\text{n}} \right)\).

2.3 Algorithms outline

In this analysis, we consider two algorithms to update the filter coefficients of the proposed system: the two-channel SFTF algorithm proposed in our previous work [25], which is used in the blind ANC block, and the single-channel SFTF algorithm [26], which is used in the AEC block.

The output signals \(u_{1}(n)\) and \(u_{2}(n)\) of the blind ANC block are given in vector notation as follows:

$$u_{1} \left( n \right) = p_{1} \left( n \right) - \varvec{w}_{21}^{T} \left( n \right)\varvec{u}_{2} \left( n \right)$$
(15)
$$u_{2} \left( n \right) = p_{2} \left( n \right) - \varvec{w}_{12}^{T} \left( n \right)\varvec{u}_{1} \left( n \right)$$
(16)

where \(\mathbf{u}_{1}(n) = [u_{1}(n), \ldots, u_{1}(n-L+1)]^{T}\) and \(\mathbf{u}_{2}(n) = [u_{2}(n), \ldots, u_{2}(n-L+1)]^{T}\) are the two output (error) vectors of the BBSS structure of Fig. 2 [2nd step], and \(\mathbf{w}_{21}(n)\) and \(\mathbf{w}_{12}(n)\) are the length-\(L\) coefficient vectors of the two cross filters, updated by the two-channel SFTF algorithm as follows:

$$\varvec{w}_{21} \left( {n + 1} \right) = \varvec{w}_{21} \left( n \right) - u_{1} \left( n \right)\gamma_{1} \left( n \right)\varvec{k}_{1} \left( n \right)$$
(17)
$$\varvec{w}_{12} \left( {n + 1} \right) = \varvec{w}_{12} \left( n \right) - u_{2} \left( n \right)\gamma_{2} \left( n \right)\varvec{k}_{2} \left( n \right)$$
(18)
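Because \(\mathbf{u}_{1}(n)\) contains \(u_{1}(n)\) and \(\mathbf{u}_{2}(n)\) contains \(u_{2}(n)\), relations (15) and (16) are coupled through the current samples whenever the leading filter taps are nonzero. The sketch below shows one way to compute both outputs for given (frozen) cross filters, assuming the resulting 2x2 instantaneous loop is solved exactly at each sample; the function name `bbss_outputs` is hypothetical, and the adaptation of (17) and (18) is deliberately left out.

```python
import numpy as np


def bbss_outputs(p1, p2, w21, w12):
    """Outputs of the backward structure, eqs. (15)-(16), for fixed filters.

    u1(n) = p1(n) - sum_i w21[i] * u2(n - i)
    u2(n) = p2(n) - sum_i w12[i] * u1(n - i)
    The i = 0 terms couple u1(n) and u2(n); that 2x2 system is solved per sample.
    """
    w21 = np.asarray(w21, dtype=float)
    w12 = np.asarray(w12, dtype=float)
    N = len(p1)
    u1 = np.zeros(N)
    u2 = np.zeros(N)
    for n in range(N):
        # Contributions of past outputs only (i >= 1)
        past2 = sum(w21[i] * u2[n - i] for i in range(1, len(w21)) if n - i >= 0)
        past1 = sum(w12[i] * u1[n - i] for i in range(1, len(w12)) if n - i >= 0)
        c1 = p1[n] - past2                 # u1(n) = c1 - w21[0] * u2(n)
        c2 = p2[n] - past1                 # u2(n) = c2 - w12[0] * u1(n)
        det = 1.0 - w21[0] * w12[0]        # assumed nonzero
        u1[n] = (c1 - w21[0] * c2) / det
        u2[n] = c2 - w12[0] * u1[n]
    return u1, u2
```

As a usage check: with \(h_{11}(n) = \delta(n)\), (10) reduces to \(w_{12}(n) = h_{12}(n)\), so choosing \(w_{21} = h_{21}\) and \(w_{12} = h_{12}\) should return \(u_{1}(n) = s(n)\) and \(u_{2}(n) = b(n)\), i.e. (11) and (12) for this special case.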

The two-channel SFTF algorithm is obtained by eliminating the backward prediction process, so that only the forward predictors are used to compute the dual Kalman gain vectors \(\mathbf{k}_{1}(n)\) and \(\mathbf{k}_{2}(n)\), which are given by:

$$\left[ {\begin{array}{*{20}c} {\varvec{k}_{1} \left( n \right)} \\ * \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ {\varvec{k}_{1} \left( {n - 1} \right)} \\ \end{array} } \right] - \frac{{\varepsilon_{1} \left( n \right)}}{{\lambda \alpha_{1} \left( {n - 1} \right) + \delta }} \left[ {\begin{array}{*{20}c} 1 \\ { - {\mathbf{a}}_{1} \left( {n - 1} \right)} \\ \end{array} } \right]$$
(19)
$$\left[ {\begin{array}{*{20}c} {\varvec{k}_{2} \left( n \right)} \\ * \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ {\varvec{k}_{2} \left( {n - 1} \right)} \\ \end{array} } \right] - \frac{{\varepsilon_{2} \left( n \right)}}{{\lambda \alpha_{2} \left( {n - 1} \right) + \delta }} \left[ {\begin{array}{*{20}c} 1 \\ { - {\mathbf{a}}_{2} \left( {n - 1} \right)} \\ \end{array} } \right]$$
(20)

where \(\alpha_{1}\) and \(\alpha_{2}\) are the forward prediction error variances, given by:

$$\alpha_{1} \left( n \right) = \lambda \alpha_{1} \left( {n - 1} \right) + \gamma_{1} \left( {n - 1} \right) \varepsilon_{1}^{2} \left( n \right)$$
(21)
$$\alpha_{2} \left( n \right) = \lambda \alpha_{2} \left( {n - 1} \right) + \gamma_{2} \left( {n - 1} \right)\varepsilon_{2}^{2} \left( n \right)$$
(22)

The forward prediction coefficient vectors \(\mathbf{a}_{1}(n)\) and \(\mathbf{a}_{2}(n)\) are obtained by minimizing the cost functions \(E\left[\varepsilon_{1}^{2}(n)\right]\) and \(E\left[\varepsilon_{2}^{2}(n)\right]\), respectively. The update formulas of the forward predictors \(\mathbf{a}_{1}(n)\) and \(\mathbf{a}_{2}(n)\) are given by:

$${\mathbf{a}}_{1} \left( n \right) = \rho \left[ {{\mathbf{a}}_{1} \left( {n - 1} \right) - \varepsilon_{1} \left( n \right)\gamma_{1} \left( n \right)\varvec{k}_{1} \left( {n - 1} \right)} \right]$$
(23)
$${\mathbf{a}}_{2} \left( n \right) = \rho \left[ {{\mathbf{a}}_{2} \left( {n - 1} \right) - \varepsilon_{2} \left( n \right)\gamma_{2} \left( n \right)\varvec{k}_{2} \left( {n - 1} \right)} \right]$$
(24)

The prediction errors \(\varepsilon_{1} \left( n \right)\) and \(\varepsilon_{2} \left( n \right)\) can be estimated as follows:

$$\varepsilon_{1} \left( n \right) = u_{2} \left( n \right) - {\mathbf{a}}_{1}^{T} \left( {n - 1} \right)\varvec{u}_{2} \left( {n - 1} \right)$$
(25)
$$\varepsilon_{2} \left( n \right) = u_{1} \left( n \right) - {\mathbf{a}}_{2}^{T} \left( {n - 1} \right)\varvec{u}_{1} \left( {n - 1} \right)$$
(26)

The likelihood variables \(\gamma_{1}(n)\) and \(\gamma_{2}(n)\) are computed from the following definitions:

$$\gamma_{1} \left( n \right) = \frac{1}{{1 - \varvec{k}_{1}^{T} \left( n \right) \varvec{u}_{2} \left( n \right)}}$$
(27)
$$\gamma_{2} \left( n \right) = \frac{1}{{1 - \varvec{k}_{2}^{T} \left( n \right) \varvec{u}_{1} \left( n \right)}}$$
(28)
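Each branch of the two-channel SFTF algorithm runs the same forward-prediction recursion: branch 1 (filter \(\mathbf{w}_{21}\), gain \(\mathbf{k}_{1}\), predictor \(\mathbf{a}_{1}\)) uses the regressor \(\mathbf{u}_{2}(n)\) and the error \(u_{1}(n)\), while branch 2 swaps the roles. The sketch below implements one branch with relations (17), (19), (21), (23), (25) and (27). It is a sketch under stated assumptions, not the authors' code: the class name `SFTFBranch`, the initial values (\(\alpha(0)=1\), \(\gamma(0)=1\), zero vectors) and the default \(\lambda\), \(\rho\), \(\delta\) are hypothetical, and the forward prediction error is computed with \(\mathbf{a}(n-1)\), since \(\mathbf{a}(n)\) in (23) itself depends on \(\varepsilon(n)\).

```python
import numpy as np


class SFTFBranch:
    """One branch of the two-channel SFTF (e.g. w21, k1, a1, alpha1, gamma1).

    Hypothetical helper; default parameters and initial values are assumptions.
    Branch 2 uses the same code with eqs. (18), (20), (22), (24), (26), (28).
    """

    def __init__(self, L, lam=0.99, rho=0.99, delta=1e-3, alpha0=1.0):
        self.L, self.lam, self.rho, self.delta = L, lam, rho, delta
        self.w = np.zeros(L)   # cross filter, e.g. w21
        self.k = np.zeros(L)   # dual Kalman gain, e.g. k1
        self.a = np.zeros(L)   # forward predictor, e.g. a1
        self.alpha = alpha0    # forward prediction error variance
        self.gamma = 1.0       # likelihood variable

    def gain_update(self, x, x_prev):
        """x = [x(n), ..., x(n-L+1)], x_prev = [x(n-1), ..., x(n-L)]."""
        eps = x[0] - self.a @ x_prev                              # (25), with a(n-1)
        c = eps / (self.lam * self.alpha + self.delta)            # uses alpha(n-1)
        ext = np.concatenate(([0.0], self.k)) - c * np.concatenate(([1.0], -self.a))
        k_prev = self.k
        self.k = ext[: self.L]                                    # (19): drop last element '*'
        self.alpha = self.lam * self.alpha + self.gamma * eps**2  # (21): uses gamma(n-1)
        self.gamma = 1.0 / (1.0 - self.k @ x)                     # (27)
        self.a = self.rho * (self.a - eps * self.gamma * k_prev)  # (23)

    def filter_update(self, err):
        """(17): w(n+1) = w(n) - err(n) * gamma(n) * k(n)."""
        self.w = self.w - err * self.gamma * self.k
```

In a full two-channel loop (not shown), one would, at each sample, compute \(u_{1}(n)\) and \(u_{2}(n)\) from (15) and (16), call `gain_update` on branch 1 with the \(u_{2}\) regressors and on branch 2 with the \(u_{1}\) regressors, and then call `filter_update` with \(u_{1}(n)\) and \(u_{2}(n)\), respectively.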

We now describe the single-channel SFTF algorithm used in the AEC stage. The a priori filtering error \(u_{3}(n)\) and the update of the adaptive filter \(\mathbf{w}_{13}(n)\) of this algorithm are given by:

$$u_{3} \left( n \right) = u_{1} \left( n \right) - \varvec{w}_{13}^{T} \left( n \right)\varvec{s}\left( n \right)$$
(29)
$$\varvec{w}_{13} \left( {n + 1} \right) = \varvec{w}_{13} \left( n \right) - u_{3} \left( n \right)\gamma_{3} \left( n \right)\varvec{k}_{3} \left( n \right)$$
(30)

where \(\mathbf{s}(n) = [s(n), s(n-1), \ldots, s(n-L+1)]^{T}\) is the input vector containing the last \(L\) samples of the far-end signal, and \(\mathbf{k}_{3}(n)\) and \(\gamma_{3}(n)\) are the dual Kalman gain vector and the likelihood variable, respectively, computed as follows:

$$\left[ {\begin{array}{*{20}c} {k_{3} \left( n \right)} \\ * \\ \end{array} } \right] = \left[ {\begin{array}{*{20}c} 0 \\ {k_{3} \left( {n - 1} \right)} \\ \end{array} } \right] - \frac{{\varepsilon_{3} \left( n \right)}}{{\lambda \alpha_{3} \left( {n - 1} \right) + \delta }} \left[ {\begin{array}{*{20}c} 1 \\ { - {\mathbf{a}}_{3} \left( {n - 1} \right)} \\ \end{array} } \right]$$
(31)
$$\gamma_{3} \left( n \right) = \frac{1}{{1 - \varvec{k}_{3}^{T} \left( n \right)\varvec{ s}\left( n \right)}}$$
(32)

The forward prediction coefficient vector \(\mathbf{a}_{3}(n)\) is obtained by minimizing the cost function \(E\left[ \varepsilon_{3}^{2}(n) \right]\). The update formula of the forward predictor is given by the following relation:

$${\mathbf{a}}_{3} \left( n \right) = \rho \left[ {{\mathbf{a}}_{3} \left( {n - 1} \right) - \varepsilon_{3} \left( n \right)\gamma_{3} \left( n \right)\varvec{k}_{3} \left( {n - 1} \right)} \right]$$
(33)

The forward prediction error variance \(\alpha_{3} \left( n \right)\) is given by:

$$\alpha_{3} \left( n \right) = \lambda \alpha_{3} \left( {n - 1} \right) + \gamma_{3} \left( {n - 1} \right) \varepsilon_{3}^{2} \left( n \right)$$
(34)

and the forward prediction error \(\varepsilon_{3}(n)\) is calculated by the following relation:

$$\varepsilon_{3} \left( n \right) = s\left( n \right) - {\mathbf{a}}_{3}^{T} \left( {n - 1} \right)\varvec{s}\left( {n - 1} \right)$$
(35)

The asterisk “\(*\)” denotes the last element of the extended dual Kalman gain vector, which is not used. \(\lambda\) is a smoothing factor, and \(\delta\) is a small positive constant that avoids division by very small values during silence periods. The parameter \(\rho\) improves robustness against the propagation of numerical errors.
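Putting (29) to (35) together, a minimal single-channel SFTF echo canceller can be sketched as follows. As in any such sketch, the function name, the initial values and the default \(\lambda\), \(\rho\), \(\delta\) are assumptions, and \(\varepsilon_{3}(n)\) is computed with \(\mathbf{a}_{3}(n-1)\) because (33) makes \(\mathbf{a}_{3}(n)\) depend on \(\varepsilon_{3}(n)\). The example identifies a short hypothetical echo path from a white far-end signal without noise; it only illustrates the recursion and does not reproduce the paper's setting (\(L = 256\), speech input).

```python
import numpy as np


def sftf_aec(s, u1, L, lam=0.99, rho=0.99, delta=1e-3, alpha0=1.0):
    """Single-channel SFTF echo canceller following eqs. (29)-(35).

    s: far-end signal, u1: ANC output (AEC input), L: filter length.
    Returns the AEC output u3 and the final filter w13.
    """
    w = np.zeros(L)          # adaptive echo filter w13
    k = np.zeros(L)          # dual Kalman gain k3
    a = np.zeros(L)          # forward predictor a3
    alpha, gamma = alpha0, 1.0
    x = np.zeros(L)          # s(n) = [s(n), ..., s(n-L+1)]
    u3 = np.zeros(len(s))
    for n in range(len(s)):
        x_prev = x                                   # s(n-1) = [s(n-1), ..., s(n-L)]
        x = np.concatenate(([s[n]], x[:-1]))
        eps = s[n] - a @ x_prev                      # (35), with a3(n-1)
        c = eps / (lam * alpha + delta)              # uses alpha3(n-1)
        ext = np.concatenate(([0.0], k)) - c * np.concatenate(([1.0], -a))
        k_prev, k = k, ext[:L]                       # (31): drop last element '*'
        alpha = lam * alpha + gamma * eps**2         # (34): uses gamma3(n-1)
        gamma = 1.0 / (1.0 - k @ x)                  # (32)
        a = rho * (a - eps * gamma * k_prev)         # (33)
        u3[n] = u1[n] - w @ x                        # (29): a priori error
        w = w - u3[n] * gamma * k                    # (30)
    return u3, w


# Illustrative run: hypothetical 4-tap echo path, white far-end signal, no noise
rng = np.random.default_rng(0)
s = rng.standard_normal(4000)
h11 = np.array([0.5, -0.3, 0.2, 0.1])
u1 = np.convolve(s, h11)[: len(s)]     # ideal ANC output, eq. (11)
u3, w13 = sftf_aec(s, u1, L=4)
```

With white input the predictor stays close to zero, so the dual gain behaves like a shifted, normalized copy of the input vector; this is the regime where the simplification of dropping the backward predictor is least costly.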

In Tables 1 and 2, the proposed cascade structure is summarized.

Table 1 1st proposed processing algorithm: two-channel SFTF algorithm
Table 2 2nd proposed processing algorithm: single-channel SFTF algorithm

3 Experimental results

3.1 Description of the used signals

The observed signals \(p_{1}(n)\) and \(p_{2}(n)\) are generated by the model of Fig. 2 [1st step] from two statistically independent source signals: (1) the speech signal \(s(n)\), a short, phonetically balanced French sentence of about 4 s pronounced by a male speaker, and (2) the punctual noise \(b(n)\), a USASI noise (United States of America Standards Institute, now ANSI). The impulse responses (IRs) \(h_{12}(n)\) and \(h_{21}(n)\) are generated by the model of [9] and have length \(L = 256\); samples of these IRs are shown in Fig. 4. These IRs are used to generate the observations \(p_{1}(n)\) and \(p_{2}(n)\); samples of all these signals are shown in Fig. 3. The sampling frequency is set to \(fs = 8\,\text{kHz}\) and the input SNR to \(SNR = 15\;\text{dB}\).
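The input SNR (15 dB here) is fixed, but the point at which it is measured is not stated. One common way to impose it, shown below as a sketch, is to rescale the noise so that the energy ratio between a speech (echo) component and the corresponding noise component equals the target. The choice of components, e.g. \(d_{1}(n)\) and \(b(n)*h_{21}(n)\) at the first microphone, and the helper name `noise_gain_for_snr` are assumptions. For reference, at \(fs = 8\,\text{kHz}\) the 4 s sentence corresponds to about 32 000 samples.

```python
import numpy as np


def noise_gain_for_snr(speech_comp, noise_comp, snr_db):
    """Gain g such that 10*log10(sum(speech**2) / sum((g*noise)**2)) == snr_db."""
    p_speech = np.sum(np.asarray(speech_comp, dtype=float) ** 2)
    p_noise = np.sum(np.asarray(noise_comp, dtype=float) ** 2)
    return np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))


# Hypothetical usage with stand-in signals of 4 s at 8 kHz
rng = np.random.default_rng(0)
d1 = rng.standard_normal(32000)                   # stand-in for the echo component
noise_at_mic1 = 3.0 * rng.standard_normal(32000)  # stand-in for b(n) * h21(n)
g = noise_gain_for_snr(d1, noise_at_mic1, snr_db=15.0)
```

Because the mixing model (1) to (4) is linear in \(b(n)\), scaling \(b(n)\) by \(g\) before mixing scales both noise components by the same factor.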

Fig. 3 Time evolution of the signals used in simulation: (a) the far-end signal \(s(n)\); (b) the punctual noise signal \(b(n)\); (c) and (d) the microphone signals \(p_{1}(n)\) and \(p_{2}(n)\), respectively

Fig. 4 Simulated impulse responses with \(L = 256\): (a) \(h_{11}(n)\), (b) \(h_{12}(n)\) and (c) \(h_{21}(n)\)

The black stepped curve superimposed on the original speech signal in Fig. 3a is the manual voice activity detector (manual VAD). We use it to extract the speech signal at the first output \(u_{1}(n)\) of the ANC process. Recall that all the results presented here are averaged.

To evaluate the effect of the backward blind ANC process on the AEC stage, Fig. 5 compares the output signal obtained by the proposed structure with that obtained by the conventional AEC system, which is based on the adaptive single-channel SFTF algorithm. Fig. 5 shows that the proposed structure behaves better than the conventional AEC system: both disturbances have been canceled.

Fig. 5 Time evolution of the output signal obtained by (top) the AEC based SFTF system and (bottom) the proposed cascading structure

3.2 Objective criteria

To compare the proposed algorithm with the AEC based SFTF system, several experiments were performed under different conditions. We consider the echo return loss enhancement (ERLE) criterion given by [25]:

$$ERLE_{dB} = \frac{10}{M}\mathop \sum \limits_{m = 0}^{M - 1} log_{10} \left( {\frac{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right)} \right|^{2} }}{{\mathop \sum \nolimits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right) - u_{3} \left( n \right)} \right|^{2} }}} \right)$$
(36)

and the mean square error (MSE) gain, computed as follows [25]:

$$MSE_{dB} = \frac{10}{M} \mathop \sum \limits_{m = 0}^{M - 1} log_{10} \left( {\frac{1}{N}\mathop \sum \limits_{n = Nm}^{Nm + N - 1} \left| {s\left( n \right) - u_{3} \left( n \right)} \right|^{2} } \right)$$
(37)

where \(s(n)\) and \(u_{3}(n)\) are, respectively, the original speech signal and the output signal obtained by the proposed structure, and \(M\) and \(N\) are the number of segments and the segment length, respectively.
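Relations (36) and (37) are averages over segments of per-segment log ratios. A NumPy sketch, assuming non-overlapping segments of \(N\) samples starting at \(n = Nm\), that any incomplete final segment is discarded, and that no segment has zero energy; the function names are hypothetical. For instance, if the output equals \(0.9\,s(n)\) everywhere, then \(s(n) - u_{3}(n) = 0.1\,s(n)\) and each segment contributes \(10\log_{10}(1/0.01) = 20\) dB to (36).

```python
import numpy as np


def _segments(x, N):
    """Split x into M = len(x) // N non-overlapping segments of length N."""
    M = len(x) // N
    return np.asarray(x[: M * N], dtype=float).reshape(M, N)


def erle_db(s, u3, N):
    """Eq. (36): mean over segments of 10*log10(sum|s|^2 / sum|s - u3|^2)."""
    S = _segments(s, N)
    E = _segments(np.asarray(s, dtype=float) - np.asarray(u3, dtype=float), N)
    return np.mean(10.0 * np.log10(np.sum(S**2, axis=1) / np.sum(E**2, axis=1)))


def mse_db(s, u3, N):
    """Eq. (37): mean over segments of 10*log10((1/N) * sum|s - u3|^2)."""
    E = _segments(np.asarray(s, dtype=float) - np.asarray(u3, dtype=float), N)
    return np.mean(10.0 * np.log10(np.mean(E**2, axis=1)))
```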

We have evaluated the performance of the proposed joint backward blind acoustic noise and echo cancellation system against the conventional AEC system based on the single-channel SFTF algorithm, and we report the results in Figs. 6, 7 and 8. The simulation parameters of the algorithms, selected by simulation, are summarized in Table 3, and the results are given for different test conditions and noise levels of the observations (high and low input SNR). In these experiments, three input SNRs are used, i.e. \(5\,\text{dB}\), \(15\,\text{dB}\) and \(25\,\text{dB}\), together with three types of noise, i.e. USASI, white and babble. White noise is used to test the stability of the algorithms, USASI noise to evaluate their convergence speed, and a real babble noise to evaluate their ability to track the non-stationarity of the input signal.

Fig. 6 ERLE evaluation of the proposed approach and the AEC based SFTF system, with USASI noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

Fig. 7 ERLE evaluation of the proposed approach and the AEC based SFTF system, with white noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

Fig. 8 ERLE evaluation of the proposed approach and the AEC based SFTF system, with babble noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

Table 3 Summary of the simulation parameters

Figs. 6, 7 and 8 show that the proposed system achieves a higher ERLE than the AEC based SFTF system. This is due to the noise reduction provided by the integrated backward blind ANC process, which allows a better estimate of the acoustic echo. The proposed algorithm performs well even under strong punctual noise (input \(SNR = 5\,\text{dB}\)), unlike the classical AEC system, whose algorithm is disturbed by the punctual noise present in its input signal.

To support these results, we performed several other experiments with the MSE criterion and selected one of them to evaluate the AEC performance of the proposed cascade structure against the AEC based SFTF system. The simulation parameters are summarized in Table 3, and the results for three input SNRs, i.e. \(5\,\text{dB}\), \(15\,\text{dB}\) and \(25\,\text{dB}\), and three noise types, i.e. USASI, white and babble, are reported in Figs. 9, 10 and 11.

Fig. 9 MSE gain evaluation of the proposed approach and the AEC based SFTF system, with USASI noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

Fig. 10 MSE gain evaluation of the proposed approach and the AEC based SFTF system, with white noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

Fig. 11 MSE gain evaluation of the proposed approach and the AEC based SFTF system, with babble noise, and for three input SNRs: [left]: 5 dB, [middle]: 15 dB and [right]: 25 dB

These (averaged) results show that the proposed approach (cascade structure) outperforms the AEC based SFTF system in both the transient and the steady-state regimes. The advantage is more pronounced at low input SNR: the backward blind ANC stage makes the AEC process of the proposed cascade structure more robust against the noise components, whereas the AEC based SFTF system is strongly penalized by their presence. Finally, the proposed cascade structure allows better cancellation of both the acoustic echo and the noise components at the same time.

3.2.1 A comparative study between the proposed approach, AEC NLMS [2], and joint AEC and ANC system [31]

In this section, the performance of the proposed approach is compared with the following algorithms: (1) the AEC based NLMS [2], and (2) a joint AEC and ANC system that uses both the two-channel NLMS and the single-channel NLMS algorithms [31]. In this comparative study, the AEC NLMS algorithm [2] is taken as the reference algorithm. In this experiment, the real and adaptive filters both have length L = 256. The room impulse responses have about 256 points, so adaptive filters of the same length (256) can model them exactly. The parameters of each algorithm are summarized in Table 4.
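For reference, the update used by an NLMS echo canceller can be sketched as below. This is the textbook NLMS recursion; the exact variant, step size and regularization used in [2] and listed in Table 4 are not reproduced here, so the function name `nlms_aec`, `mu`, `eps` and the short echo path in the example are assumptions.

```python
import numpy as np


def nlms_aec(x, d, L, mu=0.5, eps=1e-6):
    """Textbook NLMS echo canceller.

    x: far-end (loudspeaker) signal, d: microphone signal, L: filter length.
    Returns the error (echo-cancelled) signal and the final filter.
    """
    w = np.zeros(L)
    xbuf = np.zeros(L)                                  # [x(n), ..., x(n-L+1)]
    e = np.zeros(len(x))
    for n in range(len(x)):
        xbuf = np.concatenate(([x[n]], xbuf[:-1]))
        e[n] = d[n] - w @ xbuf                          # a priori error
        w = w + mu * e[n] * xbuf / (xbuf @ xbuf + eps)  # normalized step
    return e, w


# Illustrative run: hypothetical 4-tap echo path, white far-end signal, no noise
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)
h = np.array([0.5, -0.3, 0.2, 0.1])
d = np.convolve(x, h)[: len(x)]
e, w = nlms_aec(x, d, L=4)
```

Unlike the SFTF recursions above, NLMS normalizes only by the instantaneous input energy and ignores the input correlation, which is one reason it is a natural low-complexity baseline for comparison.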

Table 4 Parameter values of the following algorithms: (1) AEC based NLMS [2], (2) joint system [31], and (3) the proposed approach [in this paper]

The parameters in Table 4 are selected to obtain the best performance of each algorithm. We evaluated the ERLE and MSE criteria for different input SNRs and noise types, and we report a selection of the ERLE and MSE results in Figs. 12 and 13, respectively. These results show that the proposed approach performs better than the AEC based NLMS [2] and the joint AEC and ANC system of [31]. We attribute this superiority to the good behavior of the single- and two-channel SFTF algorithms integrated in the proposed joint AEC and ANC structure.

Fig. 12 ERLE comparison of the proposed approach with the AEC based NLMS [2] and the joint system [31], with USASI noise, for an input SNR of 15 dB

Fig. 13 MSE gain comparison of the proposed approach with the AEC based NLMS [2] and the joint system [31], with USASI noise, for an input SNR of 15 dB

4 Conclusion

In this paper, we have proposed a novel algorithm for joint backward blind acoustic noise and echo cancellation. The proposed algorithm is based on two cascaded stages that cancel the punctual noise and then the acoustic echo. In the ANC stage, the BBSS structure is used to cancel the punctual noise; an AEC system is then used to suppress the echo signal. Both stages use the efficient SFTF algorithm, to take advantage of its adaptive behavior within the proposed cascade system.

To evaluate the proposed system, we performed extensive tests in terms of the ERLE and MSE criteria under various conditions of noisy observations (highly and slightly noisy). In these experiments, the proposed approach was compared with an AEC based SFTF system. The results show the superiority of the proposed algorithm in terms of the objective criteria, even under low input SNR (strong punctual noise). Finally, the proposed system can be a good alternative to AEC techniques in the presence of strong punctual noise components, where the classical techniques fail.