1 Introduction

A number of approaches have been proposed in the literature for speech quality improvement and acoustic noise reduction. The least-mean-square family (LMS and normalized LMS) is among the most popular (Ozeki & Umeda, 1984; Widrow & Hoff, 1960) because of its simplicity and robust performance. Blind source separation (BSS) is another efficient approach that has shown good performance. BSS methods usually estimate the original source signals using the information in the mixture signals available at the input channels. Recently, BSS has been combined with adaptive filtering to enhance the speech signal and reduce acoustic noise. Several two-channel BSS techniques based on adaptive filters have been proposed to enhance noisy speech (Bendoumia & Djendi, 2015; Djendi & Bendoumia, 2013, 2014; Mitra, 2020).

The adaptive identification of the unknown impulse responses of the cross-coupling filters is essentially equivalent to the blind source separation problem (Gerven & Compernolle, 1995; Mitra, 2020). The principle behind full-band algorithms also applies to sub-band techniques, which use analysis and synthesis filter banks with decimation and interpolation and yield better performance. Speech noise removal (Shrawankar & Thakare, 2010) uses optimal filtering in either the time domain or a transform domain. This filtering approach estimates the noise signal and helps reduce the noise while improving the characteristics of the speech signal.

Two-channel forward blind source separation is one such speech enhancement method; it uses two mixed signals measured at two distinct points. In this structure, two adaptive finite impulse response (FIR) filters are used to obtain the speech estimate from two mixtures of speech and noise. The adaptive filters are usually updated using a forward normalized least mean square algorithm (Priyanka Kajla & George, 2020). Speech enhancement that accounts for the different statistical properties of noise on the basis of noise classification is presented in Adam and Babikir (2020).

Enhancement algorithms generally aim to suppress noise and late reverberation, since early reverberation is not perceived as a separate sound source and usually improves speech quality and intelligibility. An adaptive denoising and dereverberation Kalman filtering framework that tracks the speech and reverberation spectral log-magnitudes is presented in Nicolas and Mike B (2019). The computational complexity of the fast transversal filtering algorithm can be further reduced by obtaining the adaptation gain while completely discarding the forward and backward predictors (Benallal & Arezki, 2014).

Several adaptive algorithms have been presented to reduce noise and enhance the speech signal (Djendi et al., 2007; Rakesh & Kumar, 2015). The most popular are the recursive least squares (RLS) algorithm (Cioffi & Kailath, 1984) and the least mean square (LMS) algorithm with its normalized version, NLMS (Al-Kindi & Dunlop, 1989; Bashar, 2019; Manoharan, 2019). The NLMS algorithm is known for its ease of implementation and lower computational complexity compared with RLS; however, RLS converges faster. Another approach widely used in the literature to address corrupted speech signals is blind source separation (BSS). Two widely employed BSS structures are forward BSS and backward BSS. These structures are often combined with different adaptive algorithms and used for applications such as acoustic noise reduction and speech enhancement (Mirchandani et al., 1992; Ghribi et al., 2016).

In Sayoud et al. (2018), a fast NLMS (FNLMS) algorithm combined with the forward BSS (FBSS) structure was presented and showed better performance than the classical double forward NLMS (DNLMS) algorithm. Classical LMS adaptive algorithms perform poorly for nonstationary signals, and RLS algorithms suffer from high computational complexity. In Rahima et al. (2018), a dual backward adaptive algorithm was presented that employs a simplified fast transversal filter structure and forward prediction to calculate the adaptation gain, yielding better values of the computational metrics.

In this paper, a double backward distributive weighted adaptive filtering approach is proposed for speech quality improvement using a two-channel convolutive mixture model. In this scheme, two mixed speech signals are used as inputs to estimate the original signals that created the mixtures.

The rest of the paper is organized as follows. The development of the proposed method is presented in Sect. 2. Section 3 discusses the simulation results and evaluates the performance of the proposed method using the considered objective measures. Concluding remarks are drawn in Sect. 4.

2 Proposed method of speech enhancement

As discussed earlier, speech quality in hearing devices can be enhanced by acquiring sound with two microphones. This motivates the proposed method.

In this section, the proposed method for speech quality improvement is presented with its mathematical formulation. A two-channel, two-microphone scenario (Priyanka Kajla & George, 2020) may be considered equivalent to the complete convolutive mixing model shown in Fig. 1.

Fig. 1 Complete convolutive mixing model

Here \(s\left( n \right)\) is the speech signal and \(i\left( n \right)\) is the noise. These signals pass through the mixing model, which has two forward impulse responses and two cross-coupled impulse responses, \(h_{12}\left( n \right)\) and \(h_{21}\left( n \right)\). To simplify the model, the forward paths are assumed to have unit impulse responses and the additional background noise is assumed to be zero. Under this simplification, the outputs \(y_1\left( n \right)\) and \(y_2\left( n \right)\) of the mixing model are given by

$$ y_1\left( n \right) = s\left( n \right) + h_{21}\left( n \right) * i\left( n \right) $$
(1)
$$ y_2\left( n \right) = i\left( n \right) + h_{12}\left( n \right) * s\left( n \right) $$
(2)
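
The simplified mixing model of Eqs. (1) and (2) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names `convolve` and `mix`, the random stand-in signals, and the two short cross-coupling responses are hypothetical and are not taken from the paper.

```python
import random

def convolve(h, x):
    """Causal FIR filtering: returns (h * x)(n) for n = 0 .. len(x) - 1."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k < 0:
                break  # no samples before n = 0
            acc += hk * x[n - k]
        out.append(acc)
    return out

def mix(s, i, h21, h12):
    """Two-channel mixture with unit forward paths and no background noise."""
    noise_leak = convolve(h21, i)    # h21(n) * i(n), leaks into channel 1
    speech_leak = convolve(h12, s)   # h12(n) * s(n), leaks into channel 2
    y1 = [a + b for a, b in zip(s, noise_leak)]
    y2 = [a + b for a, b in zip(i, speech_leak)]
    return y1, y2

# Hypothetical stand-ins for the speech and noise signals.
rng = random.Random(0)
s = [rng.gauss(0.0, 1.0) for _ in range(8)]
i = [rng.gauss(0.0, 1.0) for _ in range(8)]
# Hypothetical cross-coupling impulse responses.
y1, y2 = mix(s, i, h21=[0.5, 0.25], h12=[0.3, 0.1])
```

Each output has the same length as its inputs, so the two mixtures stay sample-aligned for the separation stage.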

These mixture signals are then passed through an adaptive filtering block for speech enhancement. This block deconvolves the mixture signals (Sayoud et al., 2018) using two adaptive filters with variable step sizes and a variable update algorithm, as shown in Fig. 2.

Fig. 2 Deconvolution (filtering) mechanism with update process

The deconvolved signals are denoted \(z_1\left( n \right)\) and \(z_2\left( n \right)\), and the corresponding weight vectors of the adaptive filters are

$$ \mathbf{p}_{21}\left( n \right) = \left[ p_{21,0}\left( n \right),\; p_{21,1}\left( n \right),\; p_{21,2}\left( n \right),\; \ldots,\; p_{21,M-1}\left( n \right) \right]^{T} $$
(3)
$$ \mathbf{p}_{12}\left( n \right) = \left[ p_{12,0}\left( n \right),\; p_{12,1}\left( n \right),\; p_{12,2}\left( n \right),\; \ldots,\; p_{12,M-1}\left( n \right) \right]^{T} $$
(4)

These weights are updated in order to remove the signal content from the noise-correlated component. The estimated output speech is then given by

$$ z_1\left( n \right) = y_1\left( n \right) - \mathbf{p}_{12}^{T}\,\mathbf{z}_2\left( n \right) $$
(5)

where \(\mathbf{z}_2\left( n \right) = \left[ z_2\left( n \right), z_2\left( n - 1 \right), z_2\left( n - 2 \right), \ldots, z_2\left( n - M + 1 \right) \right]^{T}\) is the tapped-delay vector of \(z_2\left( n \right)\). Noise estimation is done in a similar way. In the blind deconvolution procedure, the weights of the two adaptive filters \(\mathbf{p}_{12}\left( n \right)\) and \(\mathbf{p}_{21}\left( n \right)\) are updated (Rahima et al., 2018) using the normalized least mean square algorithm as follows.

$$ \mathbf{p}_{12}\left( n + 1 \right) = \mathbf{p}_{12}\left( n \right) + \sigma_1\,\mathbf{y}_2\left( n \right)\,k_2\left( n \right)\,z_1\left( n \right) $$
(6)
$$ \mathbf{p}_{21}\left( n + 1 \right) = \mathbf{p}_{21}\left( n \right) + \sigma_2\,\mathbf{y}_1\left( n \right)\,k_1\left( n \right)\,z_2\left( n \right) $$
(7)
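
As a point of reference, the sketch below illustrates two ingredients of the backward structure. First, `backward_separate` evaluates Eq. (5) and its symmetric counterpart for fixed weights; because the tap vector includes the current sample, the lag-0 coupling between the two outputs is solved in closed form, which is an assumption of this sketch rather than a step stated in the paper. Second, `nlms_step` is the textbook normalized LMS update, used here as a stand-in because the computation of the gain vectors in Eqs. (6) and (7) is not given in detail. All function names, the step size `mu`, and the regularizer `delta` are hypothetical.

```python
import random

def backward_separate(y1, y2, p12, p21):
    """z1(n) = y1(n) - p12^T z2_vec(n), and symmetrically for z2(n).

    Requires len(p12) == len(p21) and p12[0] * p21[0] != 1.
    """
    M = len(p12)
    z1, z2 = [], []
    for n in range(len(y1)):
        # Remove the contribution of past outputs (lags 1 .. M-1).
        a1 = y1[n] - sum(p12[k] * z2[n - k] for k in range(1, M) if n - k >= 0)
        a2 = y2[n] - sum(p21[k] * z1[n - k] for k in range(1, M) if n - k >= 0)
        # Solve z1 = a1 - p12[0]*z2 and z2 = a2 - p21[0]*z1 jointly.
        det = 1.0 - p12[0] * p21[0]
        z1.append((a1 - p12[0] * a2) / det)
        z2.append((a2 - p21[0] * a1) / det)
    return z1, z2

def nlms_step(p, x_vec, err, mu=0.5, delta=1e-8):
    """Textbook NLMS update: p + mu * err * x / (||x||^2 + delta)."""
    norm = sum(x * x for x in x_vec) + delta
    return [pk + mu * err * xk / norm for pk, xk in zip(p, x_vec)]

def identify(h, num_samples=500, seed=0):
    """Toy check: adapt p so that p^T x tracks a known response h."""
    rng = random.Random(seed)
    M = len(h)
    p = [0.0] * M
    x_vec = [0.0] * M  # tapped delay line, newest sample first
    for _ in range(num_samples):
        x_vec = [rng.gauss(0.0, 1.0)] + x_vec[:-1]
        desired = sum(hk * xk for hk, xk in zip(h, x_vec))
        err = desired - sum(pk * xk for pk, xk in zip(p, x_vec))
        p = nlms_step(p, x_vec, err)
    return p
```

With weights equal to the true cross-coupling responses, `backward_separate` returns the original sources exactly, which shows why identifying those responses is equivalent to separating the signals.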

Here \(k_1\left( n \right)\) and \(k_2\left( n \right)\) are the adaptation gain vectors with variable step sizes \(\sigma_1\) and \(\sigma_2\), which are calculated from the likelihood variables and the Kalman gain. These update rules are termed the double backward normalized least mean square algorithm in this paper. The cross-coupling model used for the mixing is a familiar acoustic path model in which the impulse responses are generally small. To take advantage of this property, a new update rule may be derived as

$$ \mathbf{p}_{21}\left( n + 1 \right) = \mathbf{p}_{21}\left( n \right) - y_2\left( n \right)\,k_2\left( n \right)\,z_1\left( n \right) - \lambda \left\{ \frac{\text{rect}\left( \mathbf{p}_{21}\left( n \right) \right)}{\left\| \mathbf{p}_{21}\left( n \right) \right\|^{2}} \right\} $$
(8)
$$ \mathbf{p}_{12}\left( n + 1 \right) = \mathbf{p}_{12}\left( n \right) - y_1\left( n \right)\,k_1\left( n \right)\,z_2\left( n \right) - \varepsilon \left\{ \frac{\text{rect}\left( \mathbf{p}_{12}\left( n \right) \right)}{\left\| \mathbf{p}_{12}\left( n \right) \right\|^{2}} \right\} $$
(9)

where λ and ε are the loss factors of the update filters used while minimizing the cost functions. The adaptation gains \(k_1\left( n \right)\) and \(k_2\left( n \right)\) are obtained by computing the dual Kalman variables while discarding the forward and backward predictors.
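
The structure of the update in Eqs. (8) and (9) can be transcribed term by term, as in the following sketch. It rests on several assumptions: the function name `penalized_update` is hypothetical, the gain vector `k` is supplied by the caller because its computation is not detailed in the text, the function `rect` is not defined in the text and is therefore passed in as a callable, and the small constant `delta` is added only to avoid division by zero.

```python
def penalized_update(p, y, k, z, loss, rect, delta=1e-12):
    """One step of p(n+1) = p(n) - y(n) k(n) z(n) - loss * rect(p(n)) / ||p(n)||^2.

    p    : current weight vector (list of floats)
    y    : current mixture sample (scalar)
    k    : adaptation gain vector, same length as p (supplied by the caller)
    z    : current output sample of the other channel (scalar)
    loss : loss factor (lambda or epsilon in the text)
    rect : callable mapping the weight vector to a vector of the same length
    """
    norm_sq = sum(pk * pk for pk in p) + delta
    r = rect(p)
    return [pk - y * kk * z - loss * rk / norm_sq
            for pk, kk, rk in zip(p, k, r)]

def sign_vector(p):
    """Hypothetical stand-in for rect: elementwise sign of the weights."""
    return [1.0 if v > 0 else (-1.0 if v < 0 else 0.0) for v in p]

p_next = penalized_update(
    p=[1.0, -1.0], y=0.5, k=[0.2, 0.4], z=2.0, loss=0.1, rect=sign_vector
)
```

With the sign stand-in, the penalty term pulls every weight toward zero, and the division by the squared norm makes that pull stronger when the weights are small. Whether this matches the intended `rect` is not something the text settles.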

3 Simulation results

For the simulation of the proposed method, the speech signal is taken from the GRID corpus and the noise from the NOISEX-92 database. The objective measures computed for the different input SNR scenarios are PESQ (Rix et al., 2001) and STOI (Taal et al., 2010), and both show significant improvement. PESQ evaluates the quality of the processed speech; the higher the PESQ score, the better the quality. The short-time objective intelligibility (STOI) measure evaluates the intelligibility of speech and has been shown to correlate highly with it; the larger the STOI score, the more intelligible the speech. The simulation results for babble noise are presented in Tables 1 and 2 and are compared with the available methods (Raj, 2019). The computed PESQ and STOI measures are shown graphically in Figs. 3 and 4.
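
The text does not state how the input SNR levels were set. One common convention, assumed here, is to scale the noise so that the ratio of average speech power to average noise power matches the target SNR in decibels. The function names and the short example signals below are hypothetical.

```python
import math

def mean_power(x):
    """Average power of a signal given as a list of samples."""
    return sum(v * v for v in x) / len(x)

def scale_noise_to_snr(speech, noise, target_db):
    """Return a scaled copy of noise so that speech/noise power equals target_db."""
    gain = math.sqrt(mean_power(speech) / (mean_power(noise) * 10 ** (target_db / 10)))
    return [gain * v for v in noise]

def measure_snr_db(speech, noise):
    """Input SNR in dB: 10 log10(speech power / noise power)."""
    return 10 * math.log10(mean_power(speech) / mean_power(noise))

speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, 0.5]
scaled = scale_noise_to_snr(speech, noise, target_db=6.0)
```

The scaled noise would then take the place of the noise source in the mixing model before the enhancement stage is run.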

Table 1 Comparison of the results of PESQ for different methods
Table 2 Comparison of the results of STOI for different methods
Fig. 3 PESQ score comparison for babble noise in different SNR scenarios

Fig. 4 STOI score comparison for babble noise in different SNR scenarios

The simulation results for white noise are presented in Tables 3 and 4 and are compared with the existing methods (Raj, 2019).

Table 3 Comparison of the results of PESQ for different methods
Table 4 Comparison of the results of STOI for different methods

From Figs. 5 and 6, it is evident that the PESQ and STOI scores improve significantly in comparison with the existing methods. The proposed method attains the highest scores on both metrics under all tested conditions.

Fig. 5 PESQ score comparison for white noise in different SNR scenarios

Fig. 6 STOI score comparison for white noise in different SNR scenarios

In conventional speech enhancement methods, the harmonic structure at high frequencies is easily removed owing to its weak energy, which reduces quality and intelligibility. The proposed method attains better results by considering the temporal relevance of the speech spectra between adjacent frames and by applying the double backward least mean square algorithm to update the weights of the adaptive filters, which results in better noise removal.

4 Concluding remarks

In this paper, a double backward distributive weighted adaptive filtering scheme has been proposed for improving the quality of speech signals. The proposed approach takes advantage of the distributive nature of the impulse responses used in the mixing scenario. In simulations with babble and white noise, the scheme provides improved speech quality in terms of PESQ and STOI compared with the traditional methods. Hence, for the tested conditions, it performs better in terms of both speech quality and intelligibility. The effectiveness of the method can be examined further by incorporating a modified filter structure and by testing with signals from other databases.