A new double backward distributive weighted adaptive filtering approach for speech quality improvement

Srinivasarao, V.; Ghanekar, Umesh

doi:10.1007/s10772-021-09894-0

Published: 20 October 2021

Volume 25, pages 831–836, (2022)

International Journal of Speech Technology

Abstract

In modern telecommunication systems, background noise degrades the overall intelligibility and quality of the speech signal. The problem of enhancing speech signals and reducing acoustic noise in noisy environments using adaptive filtering algorithms that incorporate a blind source separation approach has drawn particular attention in the recent past. In this paper, a dual-channel double backward distributive weighted adaptive filtering algorithm is proposed for speech quality enhancement. The proposed method has been evaluated using objective measures, namely Perceptual Evaluation of Speech Quality (PESQ) and Short Time Objective Intelligibility (STOI), in different noise setups, and the results indicate that it improves speech quality relative to the methods it is compared with.

1 Introduction

A number of approaches have been proposed in the literature for speech quality improvement and acoustic noise reduction. The least-mean-square family (LMS and normalized LMS) is among the most popular (Ozeki & Umeda, 1984; Widrow & Hoff, 1960) because of its simplicity and robust performance. Blind source separation (BSS) is another efficient approach that has shown good performance. BSS methods usually estimate the original source signals using the information in the mixture signals from the available input channels. Recently, BSS has been combined with adaptive filtering to enhance the speech signal and reduce acoustic noise. Several two-channel BSS techniques based on adaptive filters have been proposed to enhance noisy speech (Bendoumia & Djendi, 2015; Djendi & Bendoumia, 2013, 2014; Mitra, 2020).

The adaptive identification of the unknown impulse responses of the cross-coupling filters is equivalent to the blind source separation problem (Gerven & Compernolle, 1995; Mitra, 2020). The same principle used in full-band algorithms also applies to sub-band techniques, which use analysis and synthesis filter banks with decimation and interpolation and yield better performance. Shrawankar and Thakare (2010) addressed speech signal noise removal with an optimal filtering technique in the time or transform domain. This filtering approach estimates the noise signal and helps reduce it, improving the speech signal characteristics.

Two-channel forward blind source separation is one such speech quality enhancement method; it uses two mixed signals measured at two distinct points. In two-channel forward blind source separation, two adaptive finite impulse response (FIR) filters are used to obtain the speech signal estimate from two mixtures of speech and noise. The adaptive filters are usually updated using a forward normalized least mean square algorithm (Kajla & George, 2020). Speech enhancement that takes into account the different statistical properties of noise, on the basis of noise classification, is presented in Adam and Babikir (2020).

Enhancement algorithms generally aim to suppress noise and late reverberation, since early reverberation is not perceived as a separate sound source and usually improves speech quality and intelligibility. An adaptive denoising and dereverberation Kalman filtering framework that tracks the speech and reverberation spectral log-magnitudes is presented in Nicolas and Mike (2019). The computational complexity of the fast transversal filtering algorithm can be further reduced when the adaptation gain is obtained by completely discarding the forward and backward predictors (Benallal & Arezki, 2014).

Several adaptive algorithms have been presented to reduce noise and enhance the speech signal (Djendi et al., 2007; Rakesh & Kumar, 2015). The popular algorithms are the recursive least squares (RLS) algorithm (Cioffi & Kailath, 1984) and the least mean square (LMS) algorithm with its normalized version, NLMS (Al-Kindi & Dunlop, 1989; Bashar, 2019; Manoharan, 2019). The NLMS algorithm is known for its ease of implementation and lower computational complexity in comparison with RLS; however, RLS converges faster. Another approach widely used in the literature to address corrupted speech signals is blind source separation (BSS). Two widely employed BSS structures are forward BSS and backward BSS. These structures are often combined with different adaptive algorithms and used for applications such as acoustic noise reduction and speech enhancement (Mirchandani et al., 1992; Ghribi et al., 2016).

In Sayoud et al. (2018), a fast NLMS (FNLMS) algorithm combined with the forward BSS (FBSS) structure was presented and showed better performance than the classical double forward NLMS (DNLMS) algorithm. Classical LMS adaptive algorithms perform poorly for nonstationary signals, and RLS algorithms suffer from high computational complexity. In Rahima et al. (2018), a dual backward adaptive algorithm was presented that employs a simplified fast transversal filter structure and forward prediction to calculate the adaptive gain, yielding better values for the computational metrics.

In this paper, a double backward distributive weighted adaptive filtering approach is proposed for speech quality improvement using a two-channel convolutive mixture model. In this scheme, two mixed speech signals are used as inputs to estimate the original signals that created these mixtures.

The rest of the paper is organized as follows. The development of the proposed method is presented in Sect. 2. Section 3 discusses the simulation results and evaluates the performance of the proposed method with the considered objective measures. Concluding remarks are drawn in Sect. 4.

2 Proposed method of speech enhancement

Speech quality in hearing devices can be enhanced by employing double-microphone sound acquisition, which motivates the proposed method.

In this section, the proposed method for speech quality improvement is presented with its mathematical formulation. A two-channel, two-microphone scenario (Kajla & George, 2020) may be considered equivalent to the comprehensive convolutive mixing model shown in Fig. 1.

Here $s(n)$ is the speech signal and $i(n)$ is the noise. These signals are passed through the mixture model, which has two forward impulse responses and two cross-coupled impulse responses. To simplify the model, a unit impulse response is considered for the forward paths and the additional background noise is assumed to be zero. Under this simplification, the outputs $y_1(n)$ and $y_2(n)$ of the mixture model are given by the following equations, in which $*$ denotes convolution:

$$ y_1(n) = s(n) + h_{21}(n) * i(n) $$

(1)

$$ y_2(n) = i(n) + h_{12}(n) * s(n) $$

(2)
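The mixing model of Eqs. (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration only: the helper names `convolve` and `mix`, the tap values of `h12` and `h21`, and the random test signals are assumptions made for the example and are not taken from the paper. Only the standard library is used so that the sketch runs without extra dependencies.

```python
import random


def convolve(h, x):
    """Causal FIR convolution of taps h with signal x, truncated to len(x)."""
    out = []
    for n in range(len(x)):
        acc = 0.0
        for k, hk in enumerate(h):
            if n - k >= 0:
                acc += hk * x[n - k]
        out.append(acc)
    return out


def mix(s, noise, h12, h21):
    """Two-channel convolutive mixture with unit forward paths.

    y1(n) = s(n) + h21(n) * i(n)   (Eq. 1)
    y2(n) = i(n) + h12(n) * s(n)   (Eq. 2)
    """
    leak_into_1 = convolve(h21, noise)  # noise leaking into channel 1
    leak_into_2 = convolve(h12, s)      # speech leaking into channel 2
    y1 = [a + b for a, b in zip(s, leak_into_1)]
    y2 = [a + b for a, b in zip(noise, leak_into_2)]
    return y1, y2


# Hypothetical short signals and cross-coupling taps, for illustration only.
rng = random.Random(0)
speech = [rng.gauss(0.0, 1.0) for _ in range(16)]
interference = [rng.gauss(0.0, 1.0) for _ in range(16)]
h12 = [0.0, 0.5, 0.25]
h21 = [0.0, 0.4, 0.2]
y1, y2 = mix(speech, interference, h12, h21)
```

With the noise set to zero, `y1` reduces to the clean speech and `y2` contains only the speech filtered by `h12`, which is a convenient sanity check on the channel indexing.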

These mixture signals are then passed through an adaptive filtering block for speech enhancement. This block deconvolves the mixture signals (Sayoud et al., 2018) using two adaptive filters with variable step sizes and a variable update algorithm, as shown in Fig. 2.

The deconvolved signals are denoted $z_1(n)$ and $z_2(n)$, and the corresponding weight vectors of the adaptive filters are

$$ \mathbf{p}_{21}(n) = \left[ p_{21,0}(n),\; p_{21,1}(n),\; p_{21,2}(n),\; \ldots,\; p_{21,M-1}(n) \right]^{\text{T}} $$

(3)

$$ \mathbf{p}_{12}(n) = \left[ p_{12,0}(n),\; p_{12,1}(n),\; p_{12,2}(n),\; \ldots,\; p_{12,M-1}(n) \right]^{\text{T}} $$

(4)

These weights are updated in order to remove the signal content from the noise-correlated component. The estimated output speech is then given by

$$ z_1(n) = y_1(n) - \mathbf{p}_{12}^{\text{T}}\, \mathbf{z}_2(n) $$

(5)

where $\mathbf{z}_2(n) = \left[ z_2(n),\; z_2(n-1),\; z_2(n-2),\; \ldots,\; z_2(n-M+1) \right]^{\text{T}}$ is the tapped-delay mixed signal vector. The noise is estimated in a similar way. In the blind deconvolution procedure, the weights of the two adaptive filters $\mathbf{p}_{12}(n)$ and $\mathbf{p}_{21}(n)$ are updated (Rahima et al., 2018) using the normalized least mean square algorithm as follows.

$$ \mathbf{p}_{12}(n+1) = \mathbf{p}_{12}(n) + \sigma_1\, \mathbf{y}_2(n)\, k_2(n)\, z_1(n) $$

(6)

$$ \mathbf{p}_{21}(n+1) = \mathbf{p}_{21}(n) + \sigma_2\, \mathbf{y}_1(n)\, k_1(n)\, z_2(n) $$

(7)

Here $k_1(n)$ and $k_2(n)$ are the adaptation gain vectors with variable step sizes ($\sigma_1$, $\sigma_2$), which are calculated from the likelihood variables and the Kalman gain. These update rules are termed the double backward normalized least mean square algorithm in this paper. The cross-coupling model used for the mixing is a familiar acoustic path model in which the impulse responses are generally small. To take advantage of this property of the impulse responses, a new update rule may be derived as

$$ \mathbf{p}_{21}(n+1) = \mathbf{p}_{21}(n) - y_2(n)\, k_2(n)\, z_1(n) - \lambda \left\{ \frac{\text{rect}\left( \mathbf{p}_{21}(n) \right)}{\left\| \mathbf{p}_{21}(n) \right\|^{2}} \right\} $$

(8)

$$ \mathbf{p}_{12}(n+1) = \mathbf{p}_{12}(n) - y_1(n)\, k_1(n)\, z_2(n) - \varepsilon \left\{ \frac{\text{rect}\left( \mathbf{p}_{12}(n) \right)}{\left\| \mathbf{p}_{12}(n) \right\|^{2}} \right\} $$

(9)

where λ and ε are the loss factors of the update filters used while minimizing the cost functions. The adaptation gains $k_1(n)$ and $k_2(n)$ are obtained by computing the dual Kalman variables while discarding the forward and backward predictors.
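The backward structure of Eq. (5), with weight updates in the spirit of Eqs. (6) and (7), can be sketched as below. This is a simplified illustration under stated assumptions and not the authors' implementation. The Kalman-based gains $k_1(n)$ and $k_2(n)$ are replaced by a plain NLMS normalization. The regressors are taken to be delayed output vectors, so that each output depends only on past samples of the other channel. The penalty terms of Eqs. (8) and (9) are left out because rect(·) is not defined in the text. The names `backward_nlms`, `mu1`, `mu2` and `delta` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def backward_nlms(y1, y2, M, mu1, mu2, delta=1e-6):
    """Two-channel backward separation with NLMS-style updates (sketch).

    y1, y2   : mixture signals (lists of floats)
    M        : number of taps in each cross filter
    mu1, mu2 : step sizes (stand-ins for sigma_1, sigma_2)
    delta    : small regularizer that avoids division by zero
    """
    p12 = [0.0] * M       # cancels noise leakage in channel 1
    p21 = [0.0] * M       # cancels speech leakage in channel 2
    z1_past = [0.0] * M   # z1(n-1), ..., z1(n-M)
    z2_past = [0.0] * M   # z2(n-1), ..., z2(n-M)
    z1_out, z2_out = [], []
    for a, b in zip(y1, y2):
        # Backward structure: each output subtracts a filtered version
        # of the other channel's past outputs (assumed one-sample delay).
        z1 = a - dot(p12, z2_past)
        z2 = b - dot(p21, z1_past)
        # Plain NLMS normalization, swapped in for the Kalman-based gains.
        g12 = mu1 / (delta + dot(z2_past, z2_past))
        g21 = mu2 / (delta + dot(z1_past, z1_past))
        p12 = [w + g12 * z1 * x for w, x in zip(p12, z2_past)]
        p21 = [w + g21 * z2 * x for w, x in zip(p21, z1_past)]
        # Shift the delay lines.
        z1_past = [z1] + z1_past[:-1]
        z2_past = [z2] + z2_past[:-1]
        z1_out.append(z1)
        z2_out.append(z2)
    return z1_out, z2_out, p12, p21


# Tiny usage example with hypothetical values.
z1, z2, p12, p21 = backward_nlms([1.0, 1.0], [2.0, 0.0],
                                 M=1, mu1=1.0, mu2=0.0, delta=1.0)
```

Setting both step sizes to zero leaves the weights at zero, so the outputs equal the inputs; this gives a quick check that the filtering path is wired correctly before any adaptation is enabled.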

3 Simulation results

For the simulation of the proposed method, the speech signal is taken from the GRID corpus database and the noise is taken from the NOISEX-92 database. The performance measures computed for different input SNR scenarios are PESQ (Rix et al., 2001) and STOI (Taal et al., 2010); both show significant improvement. PESQ is used to evaluate the quality of the processed speech: the higher the PESQ score, the better the quality. STOI is used to evaluate the intelligibility of speech and has been shown to correlate highly with speech intelligibility: the larger the STOI score, the more intelligible the speech. The simulation results for babble noise are presented in Tables 1 and 2 and are compared with the available methods (Raj, 2019). The computed PESQ and STOI performance measures are depicted graphically in Figs. 3 and 4.
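One common way to set up different input SNR scenarios is to rescale the noise so that the ratio of speech power to noise power matches a target value in decibels. The sketch below assumes that the input SNR is defined on the source signals before mixing; the paper does not state which definition is used, and the function names are hypothetical.

```python
import math


def power(x):
    """Mean power of a signal given as a list of floats."""
    return sum(v * v for v in x) / len(x)


def scale_noise_to_snr(speech, noise, target_db):
    """Return a scaled copy of noise so that 10*log10(Ps/Pn) == target_db."""
    gain = math.sqrt(power(speech) / (power(noise) * 10 ** (target_db / 10)))
    return [gain * v for v in noise]


def measured_snr_db(speech, noise):
    """SNR in dB between a speech signal and a noise signal."""
    return 10 * math.log10(power(speech) / power(noise))


# Hypothetical toy signals, for illustration only.
speech = [1.0, -1.0, 1.0, -1.0]
noise = [0.5, 0.5, -0.5, -0.5]
noise_10db = scale_noise_to_snr(speech, noise, 10.0)
```

The scaled noise can then be fed to the mixing model together with the speech, and PESQ and STOI can be computed on the enhanced output for each SNR setting.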

Table 1 Comparison of the results of PESQ for different methods

Table 2 Comparison of the results of STOI for different methods

The simulation results for white noise are presented in Tables 3 and 4 and are compared with the existing methods (Raj, 2019).

Table 3 Comparison of the results of PESQ for different methods

Table 4 Comparison of the results of STOI for different methods

From Figs. 5 and 6, it is evident that the PESQ and STOI scores have improved significantly in comparison with the existing methods. The proposed method attains the highest scores on both performance metrics for all the conditions.

In conventional speech enhancement methods, the harmonic structure at high frequencies is easily removed owing to its weak energy, so that quality and intelligibility are reduced. Considering the temporal relevance of the speech spectra between adjacent frames, together with applying the double backward least mean square algorithm to update the weights of the adaptive filters, results in better noise removal and helps the proposed method attain better results.

4 Concluding remarks

In this paper, a double backward distributive weighted adaptive filtering scheme has been proposed for quality improvement of the speech signal. The proposed approach takes advantage of the distributive nature of the impulse responses used in the mixing scenario. In simulations with babble and white noise, the scheme has been shown to provide improved speech quality in terms of PESQ and STOI when compared with the traditional methods. Hence, it is a better method for speech enhancement in terms of speech quality and intelligibility. Its effectiveness can be examined further by incorporating a modified filter structure and by testing with signals from other databases.

References

Adam, E. E. B. (2020). Deep learning based NLP techniques in text to speech synthesis for communication recognition. Journal of Soft Computing Paradigm (JSCP), 2(04), 209–215.
Al-Kindi, M. J., & Dunlop, J. (1989). Improved adaptive noise cancellation in the presence of signal leakage on the noise reference channel. Signal Processing, 17(3), 241–250.
Bashar, A. (2019). Survey on evolving deep learning neural network architectures. Journal of Artificial Intelligence, 1(02), 73–82.
Benallal, A., & Arezki, M. (2014). A fast convergence normalized least-mean-square type algorithm for adaptive filtering. International Journal of Adaptive Control and Signal Processing, 28(10), 1073–1080.
Bendoumia, R., & Djendi, M. (2015). Two-channel variable-step-size forward and backward adaptive algorithms for acoustic noise reduction and speech enhancement. Signal Processing, 108, 226–244.
Cioffi, J., & Kailath, T. (1984). Fast recursive least squares transversal filters for adaptive filtering. IEEE Transactions on Acoustics, Speech, and Signal Processing, 32, 304–337.
Djendi, M., Henni, R., & Sayoud, A. (2016). A new dual forward BSS based RLS algorithm for speech enhancement. In International Conference on Engineering and MIS, ICEMIS 2016, Agadir, Morocco.
Djendi, M., & Bendoumia, R. (2013). A new adaptive filtering subband algorithm for two-channel acoustic noise reduction and speech enhancement. Computers & Electrical Engineering, 39(8), 2531–2550.
Djendi, M., & Bendoumia, R. (2014). A new efficient two-channel backward algorithm for speech intelligibility enhancement: A subband approach. Applied Acoustics, 76, 209–222.
Djendi, M., Gilloire, A., & Scalart, P. (2007). New frequency domain post-filters for noise cancellation using two closely spaced microphones. Proc EUSIPCO, Poznan, 1, 218–221.
Gerven, S. V., & Compernolle, D. V. (1995). Signal separation by symmetric adaptive decorrelation: Stability, convergence, and uniqueness. IEEE Transactions on Signal Processing, 43(7), 1602–1612.
Ghribi, K., Djendi, M., & Berkani, D. (2016). A new wavelet-based forward BSS algorithm for acoustic noise reduction and speech quality enhancement. Applied Acoustics, 105, 55–66.
Kajla, P., & George, N. V. (2020). Speech quality enhancement using a two channel sparse adaptive filtering approach. Applied Acoustics, 158, 107035.
Manoharan, S. (2019). A smart image processing algorithm for text recognition information extraction and vocalization for the visually challenged. Journal of Innovative Image Processing (JIIP), 1(01), 31–38.
Mirchandani, G., Zinser, R. L., & Evans, J. B. (1992). A new adaptive noise cancellation scheme in the presence of crosstalk. IEEE Transactions on Circuits and Systems, 39(10), 681–694.
Mitra, A. (2020). Sentiment analysis using machine learning approaches (Lexicon based on movie review dataset). Journal of Ubiquitous Computing and Communication Technologies (UCCT), 2(03), 145–152.
Nicolas, D., & Mike, B. (2019). Modulation domain Kalman filtering for monaural blind speech denoising and dereverberation. IEEE/ACM Transactions on Audio, Speech and Language Processing, 27(4), 799–814.
Ozeki, K., & Umeda, T. (1984). An adaptive filtering algorithm using an orthogonal projection to an affine subspace and its properties. Electronics and Communications in Japan, 67(5), 19–27.
Rahima, H., Mohamed, D., & Djebari, M. (2018). A dual backward adaptive algorithm for speech enhancement and acoustic noise reduction. In Proceedings of the Fourth International Conference on Engineering & MIS 2018 (pp. 1–4).
Raj, J. S. (2019). A comprehensive survey on the computational intelligence techniques and its applications. Journal of ISMAC, 1(03), 147–159.
Rakesh, P., & Kumar, T. K. (2015). A novel RLS adaptive filtering method for speech enhancement. Electrical, Computer, Energetic, 9(2), 225–229.
Rix, A. W., Beerends, J. G., Hollier, M. P., & Hekstra, A. P. (2001). Perceptual evaluation of speech quality (PESQ): A new method for speech quality assessment of telephone networks and codecs. In 2001 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No. 01CH37221) (Vol. 2, pp. 749–752). IEEE.
Sayoud, A., Djendi, M., Medahi, S., & Guessoum, A. (2018). A dual fast NLMS adaptive filtering algorithm for blind speech quality enhancement. Applied Acoustics, 135, 101–110.
Shrawankar, U., & Thakare, V. (2010). Noise estimation and noise removal techniques for speech recognition in adverse environment. In International Conference on Intelligent Information Processing (pp. 336–342). Springer, Berlin.
Taal, C. H., Hendriks, R. C., Heusdens, R., & Jensen, J. (2010). A short-time objective intelligibility measure for time-frequency weighted noisy speech. In 2010 IEEE International Conference on Acoustics, Speech and Signal Processing (pp. 4214–4217).
Widrow, B., & Hoff, M. (1960). Adaptive switching circuits. In Proceedings of IRE Western Electronic Show and Convention (Part 4, pp. 96–104).

Author information

Authors and Affiliations

ECE Department, National Institute of Technology, Kurukshetra, Haryana, India
V. Srinivasarao & Umesh Ghanekar

Corresponding author

Correspondence to V. Srinivasarao.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Srinivasarao, V., Ghanekar, U. A new double backward distributive weighted adaptive filtering approach for speech quality improvement. Int J Speech Technol 25, 831–836 (2022). https://doi.org/10.1007/s10772-021-09894-0

Received: 11 January 2021
Accepted: 21 August 2021
Published: 20 October 2021
Issue Date: December 2022
DOI: https://doi.org/10.1007/s10772-021-09894-0
