Abstract
This paper presents a comparative analysis of spectral subtraction and Wiener-based denoising techniques with respect to musical noise generation. Musical noise generation is tracked by observing the change in the kurtosis ratio of the noise spectrum produced by each denoising technique for different noisy signals. A MATLAB simulation is performed for four noisy environments (car noise, babble noise, operation room noise and machine gun noise) at −10, −5, 0, 5 and 10 dB input SNR levels. Wiener-based methods provide a larger improvement in SNR than spectral subtraction based methods, but they also generate more musical noise. The Wiener-based HRNR method gives a maximum SNR improvement of 35.77 dB, obtained for car noise at −10 dB input SNR. Iterative spectral subtraction generates the least musical noise in the noisy environments tested: it gives the minimum kurtosis ratio for all noises at all input SNR levels.
Keywords
- Speech enhancement
- Musical noise
- Spectral subtraction
- Iterative spectral subtraction
- Geometrical approach
- DD approach
- TSNR
- HRNR
1 Introduction
Speech quality deteriorates under adverse noise conditions in hearing aids and mobile phones, so noise reduction has received considerable attention from researchers. Spectral subtraction is a common and efficient noise reduction method that reduces both noise and computational complexity [1–5]. Wiener-based denoising techniques such as TSNR (two-step noise reduction) and HRNR (harmonic regeneration noise reduction), in turn, effectively remove the reverberation effect. Classic short-time noise reduction techniques remove noise but also introduce harmonic distortion. For example, TSNR enhances speech but introduces harmonic distortion because its estimators are unreliable at low signal-to-noise ratios [6]. HRNR brings a significant improvement over TSNR. However, the major disadvantage of these methods is the generation of musical noise caused by non-linear signal processing, which significantly degrades speech quality and intelligibility. This paper extends “Comparative analysis of speech enhancement methods” [7] with a comparison of the musical noise reduction capability of various speech enhancement methods.
The spectral subtraction method [7] provides efficient noise reduction with low musical noise. The amount of musical noise generated is highly correlated with the difference between the higher-order statistics of the power spectra before and after nonlinear signal processing [8, 9]. Here, spectral subtraction [10], iterative spectral subtraction [11–14], the geometrical approach to spectral subtraction [15], and Wiener-based denoising techniques such as TSNR and HRNR [16, 17] are compared for their musical noise reduction capability.
2 Mathematical Analysis of Musical Noise Generation via Higher-Order Statistics
The amount of musical noise generated is strongly correlated with the number of isolated power spectral components and with the isolation level of these components [18]. A higher-order statistic, kurtosis, is adopted to measure these isolated components among all components. A higher kurtosis value signifies a signal with many isolated components. However, kurtosis alone is not sufficient to measure the amount of musical noise generated. Therefore, the change in kurtosis between the signals before and after signal processing is used to identify only the musical-noise components. The kurtosis ratio is thus used as the measure of musical noise, defined as
\[ \text{kurtosis ratio} = \frac{\mathrm{kurt}_{\mathrm{proc}}}{\mathrm{kurt}_{\mathrm{org}}} \qquad (1) \]
where \( \mathrm{kurt}_{\mathrm{proc}} \) is the kurtosis of the processed signal and \( \mathrm{kurt}_{\mathrm{org}} \) is the kurtosis of the observed signal. The kurtosis ratio increases as the amount of musical noise increases.
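The kurtosis ratio defined above can be illustrated with a minimal Python sketch. It assumes kurtosis is computed as the fourth central moment divided by the squared variance (other conventions, such as raw moments or excess kurtosis, exist and would change the numbers), and the function names and sample spectra are hypothetical, not taken from the paper's MATLAB code.

```python
def kurtosis(values):
    """Kurtosis as fourth central moment over squared variance (assumed convention)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 ** 2)


def kurtosis_ratio(power_org, power_proc):
    """Kurtosis of the processed power spectrum over that of the observed one."""
    return kurtosis(power_proc) / kurtosis(power_org)


# Illustrative (made-up) power spectral components of a noise-only segment.
observed = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95]
# After nonlinear processing most components are zeroed and a few isolated
# peaks remain, which is the typical signature of musical noise.
processed = [0.0, 0.0, 0.9, 0.0, 0.0, 0.0, 0.05, 0.0]

ratio = kurtosis_ratio(observed, processed)
print("kurtosis ratio greater than 1:", ratio > 1.0)
```

A ratio above 1 indicates that processing has made the spectrum more "peaky" than the observed one, which is the condition the text associates with musical noise.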
3 Speech Enhancement Algorithms
Two classes of enhancement algorithms are presented: three are spectral subtraction based methods and the other two are Wiener-based methods. The noisy speech signal is given by Eq. (2):
\[ y(n) = s(n) + d(n) \qquad (2) \]
where s(n), d(n) and y(n) represent the clean speech signal, the uncorrelated additive noise and the degraded speech signal, respectively [10].
3.1 Spectral Subtraction
The principle of the spectral subtraction method [10] is to obtain an estimate of the clean signal spectrum by subtracting an estimated noise spectrum from the corrupted speech spectrum. The noise spectrum is estimated and updated during silence periods, when speech is absent and only noise is present. The noise is assumed to be additive and stationary or nearly stationary. Taking the Fourier transform of Eq. (2) gives Eq. (3):
\[ Y(w) = S(w) + D(w) \qquad (3) \]
The magnitude and phase of Y(w) can be expressed as
\[ Y(w) = |Y(w)| \, e^{j\phi_{y}} \qquad (4) \]
where |Y(w)| is the magnitude spectrum and \( \phi_{y} \) is the phase spectrum of the corrupted noisy speech signal. The noise signal can be expressed in the transform domain as
\[ D(w) = |D(w)| \, e^{j\phi_{d}} \qquad (5) \]
The clean speech signal is estimated by subtracting the noise spectrum from the noisy speech spectrum, as given in Eq. (6):
\[ \hat{S}(w) = \left[ |Y(w)| - |D(w)| \right] e^{j\phi_{y}} \qquad (6) \]
The unknown noise magnitude spectrum |D(w)| is estimated as its average value over segments in which speech is absent. The spectral subtraction method is illustrated in Fig. 1.
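The subtraction step described above can be sketched for a single frame in Python. This is an illustration under stated assumptions rather than the MATLAB implementation used in the paper: a plain DFT from the standard library stands in for a windowed FFT, negative magnitudes are clipped to zero (half-wave rectification, an assumption), and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def average_noise_magnitude(noise_frames):
    """Average magnitude spectrum over frames assumed to contain noise only."""
    spectra = [[abs(v) for v in dft(frame)] for frame in noise_frames]
    bins = len(spectra[0])
    return [sum(s[k] for s in spectra) / len(spectra) for k in range(bins)]


def spectral_subtract(noisy_frame, noise_mag):
    """Subtract the noise magnitude per bin and reuse the noisy phase."""
    enhanced = []
    for y_k, d_k in zip(dft(noisy_frame), noise_mag):
        mag = max(abs(y_k) - d_k, 0.0)       # clip negatives to zero (assumption)
        enhanced.append(cmath.rect(mag, cmath.phase(y_k)))
    return idft(enhanced)


# Usage with made-up samples: two speech-free frames give the noise estimate.
noise_only = [[0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]] * 2
frame = [0.0, 1.0, 0.5, -0.5, -1.0, 0.25, 0.75, -0.25]
cleaned = spectral_subtract(frame, average_noise_magnitude(noise_only))
```

Clipping each bin independently is what leaves isolated spectral peaks behind in noise-only regions, which is the source of the musical noise discussed in Sect. 2.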
3.2 Iterative Spectral Subtraction
The drawback of spectral subtraction is that a clearly perceptible narrowband noise still remains in the spectrum, even when the noise estimate is correct. To overcome this drawback, which is most pronounced for weak signals, spectral subtraction can be applied iteratively to the signal. Such approaches are commonly known as iterative spectral subtraction methods [11–14].
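One way to read the iterative idea is that the output of one subtraction pass becomes the input of the next, with the noise estimate recomputed each time. The Python sketch below works directly on magnitude spectra and rests on assumptions that the text does not specify: the first `noise_frame_count` frames are speech-free, the noise estimate is their per-bin average, and the number of passes is fixed. The cited methods [11–14] differ in these details, and all names here are hypothetical.

```python
def subtract_once(mag_frames, noise_frame_count):
    """One pass: estimate noise from the leading frames, subtract, clip at zero."""
    bins = len(mag_frames[0])
    noise = [sum(frame[k] for frame in mag_frames[:noise_frame_count])
             / noise_frame_count
             for k in range(bins)]
    return [[max(frame[k] - noise[k], 0.0) for k in range(bins)]
            for frame in mag_frames]


def iterative_spectral_subtraction(mag_frames, noise_frame_count, iterations=3):
    """Feed each pass's output back in, re-estimating the residual noise."""
    current = mag_frames
    for _ in range(iterations):
        current = subtract_once(current, noise_frame_count)
    return current


def residual_noise(mag_frames, noise_frame_count):
    """Total magnitude left in the frames assumed to be speech-free."""
    return sum(sum(frame) for frame in mag_frames[:noise_frame_count])


# Made-up magnitude frames: two noise-only frames followed by two speech frames.
frames = [[1.0, 0.2], [0.6, 0.4], [3.0, 0.3], [0.8, 2.5]]
one_pass = iterative_spectral_subtraction(frames, 2, iterations=1)
two_pass = iterative_spectral_subtraction(frames, 2, iterations=2)
```

In this toy case the residual in the speech-free frames shrinks with each pass, while the dominant speech components lose only the small amount subtracted in each pass.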
3.3 Spectral Subtraction Using Geometrical Approach
The geometric approach [15] is used to overcome this shortcoming of the spectral subtraction algorithm. The method involves estimating the phase difference between the noisy signal and the noise.
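The role of the phase difference can be illustrated geometrically: in each frequency bin the noisy, clean and noise spectra form a triangle in the complex plane, so the ratio of clean to noisy magnitude follows from the cosines of the angles between them. The Python sketch below expresses that ratio in terms of the a posteriori SNR `gamma` (noisy power over noise power) and the a priori SNR `xi` (clean power over noise power). It is an illustration of the triangle relation only: the clamping constants are assumptions, and the way [15] estimates these quantities in practice is not reproduced.

```python
import math


def geometric_gain(gamma, xi):
    """Gain |S|/|Y| from the triangle formed by Y = S + D in the complex plane.

    gamma = |Y|^2 / |D|^2 (a posteriori SNR), xi = |S|^2 / |D|^2 (a priori SNR).
    """
    # Cosine of the angle between the noisy spectrum and the noise spectrum.
    c_yd = (gamma + 1.0 - xi) / (2.0 * math.sqrt(gamma))
    # Cosine of the angle between the clean spectrum and the noise spectrum.
    c_sd = (gamma - 1.0 - xi) / (2.0 * math.sqrt(xi))
    numerator = max(1.0 - c_yd ** 2, 0.0)      # clamp against rounding (assumption)
    denominator = max(1.0 - c_sd ** 2, 1e-12)  # avoid division by zero (assumption)
    return math.sqrt(numerator / denominator)


# Check with a made-up bin where the true components are known.
clean = complex(2.0, 1.0)
noise = complex(0.5, -1.0)
noisy = clean + noise
gamma = abs(noisy) ** 2 / abs(noise) ** 2
xi = abs(clean) ** 2 / abs(noise) ** 2
recovered_magnitude = geometric_gain(gamma, xi) * abs(noisy)
```

With exact `gamma` and `xi` the recovered magnitude equals the clean magnitude. In practice both quantities have to be estimated, which is where the method's tuning parameters enter.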
3.4 Decision-Directed (DD) Approach
The a priori SNR estimator used by the Wiener-based methods follows the decision-directed (DD) approach proposed by Ephraim and Malah [19]. The main disadvantage of the DD approach is that it introduces a reverberation effect. The two-step noise reduction (TSNR) technique minimizes this reverberation effect while keeping the benefits of the DD method. However, TSNR introduces harmonic distortion in the enhanced speech because the estimator is unreliable at low SNR. Harmonic regeneration noise reduction (HRNR) is implemented to remove these harmonic distortions [16, 17, 20].
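The chain from the DD estimate to a TSNR gain can be sketched per frequency bin in Python. The sketch assumes the usual forms of these estimators: the DD a priori SNR mixes the previous frame's clean-speech power with the current a posteriori SNR through a smoothing factor `alpha`, the Wiener gain is xi / (1 + xi), and the second step re-estimates the a priori SNR from the spectrum already enhanced by the first-step gain. The function names are hypothetical, and the harmonic regeneration stage of HRNR is not covered.

```python
def wiener_gain(xi):
    """Wiener gain for a given a priori SNR."""
    return xi / (1.0 + xi)


def dd_prior_snr(prev_clean_power, noisy_power, noise_power, alpha=0.98):
    """Decision-directed a priori SNR estimate for one bin."""
    gamma = noisy_power / noise_power              # a posteriori SNR
    return (alpha * prev_clean_power / noise_power
            + (1.0 - alpha) * max(gamma - 1.0, 0.0))


def tsnr_gain(prev_clean_power, noisy_power, noise_power, alpha=0.98):
    """Two-step gain: refine the a priori SNR using the first-step output."""
    xi_dd = dd_prior_snr(prev_clean_power, noisy_power, noise_power, alpha)
    g_dd = wiener_gain(xi_dd)
    # Second step: a priori SNR of the spectrum enhanced by the DD gain.
    xi_tsnr = (g_dd ** 2) * noisy_power / noise_power
    return wiener_gain(xi_tsnr)


# Made-up powers for one bin: previous clean estimate 2.0, noisy 4.0, noise 1.0.
gain = tsnr_gain(2.0, 4.0, 1.0)
```

The default `alpha=0.98` mirrors the value the paper reports for the Wiener-based algorithms; the heavy weight on the previous frame is what makes the DD estimate lag behind the signal.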
4 Simulation Results and Discussion
All algorithms are implemented and simulated for speech enhancement in MATLAB. The algorithms are then compared for four different noises: car noise, F16 noise, operation room noise and machine gun noise. Sound quality is evaluated and compared on the basis of the improvement in output SNR and of higher-order statistics, using the kurtosis ratio computed before and after signal processing. One sample sentence from our database [21], “YAHA SAI LAGHBAG PANCH MEAL DAKSHIN PASCHIM MAI KATGHAR GAON HAI”, is used to check performance.
The noisy versions of this sentence were prepared by adding car noise and F16 noise from the NOISEX-92 database [22] to the clean sentence at −10, −5, 0, 5 and 10 dB SNR levels. In the spectral subtraction methods, β = 1.1 and ŋ = 0.8 are used, where β is the over-subtraction factor and ŋ is the spectral floor parameter. The parameter β controls the residual noise and the perceived musical noise. A small value of β gives audible musical noise but reduced residual noise, whereas for a large β the residual noise is audible but the musical noise is reduced. The amount of speech spectral distortion is strongly affected by the parameter α. For large α the resulting signal is highly distorted and suffers from poor intelligibility, while for small α noise remains in the enhanced speech signal. In the geometrical approach, α is set to 0.98 and β to 0.6. Similarly, in the Wiener-based algorithms, α = 0.98 gives the best result for the enhanced speech. Simulation results are shown in Tables 1, 2, 3 and 4 for car noise, babble noise, operation room noise and machine gun noise, respectively. Figure 2 shows the average improvement in SNR over all input SNR levels and all noises for each enhancement method.
Figure 3 shows the average kurtosis ratio over all input SNR levels and all noises for each enhancement method. These figures show that the Wiener-based methods give better results than the basic spectral subtraction based methods in terms of the increase in output SNR. Figure 2 shows that the HRNR algorithm gives the best output SNR among all the algorithms at all input SNR levels for all noises. Iterative spectral subtraction improves SNR with less musical noise generation than the other methods, as indicated by its lowest kurtosis ratio.
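The test conditions used in this section, noise added to a clean sentence at a fixed input SNR followed by a comparison of output SNR, can be sketched in Python as follows. The paper does not spell out its SNR formulas, so the definitions here are assumptions: SNR is the ratio of clean-signal power to the power of the difference from the clean signal, in dB, and the improvement is output SNR minus input SNR. The signals and function names are hypothetical stand-ins for the speech and NOISEX-92 recordings.

```python
import math


def power(x):
    """Mean squared value of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def snr_db(clean, degraded):
    """SNR in dB, treating (degraded - clean) as the noise.

    Not defined when degraded equals clean exactly (zero noise power).
    """
    noise = [d - c for c, d in zip(clean, degraded)]
    return 10.0 * math.log10(power(clean) / power(noise))


def mix_at_snr(clean, noise, target_snr_db):
    """Scale the noise so that clean + noise has the requested input SNR."""
    scale = math.sqrt(power(clean)
                      / (power(noise) * 10.0 ** (target_snr_db / 10.0)))
    return [c + scale * n for c, n in zip(clean, noise)]


def snr_improvement_db(clean, noisy, enhanced):
    """Output SNR minus input SNR, in dB (assumed definition)."""
    return snr_db(clean, enhanced) - snr_db(clean, noisy)


# Synthetic stand-ins for the clean sentence and a noise recording.
clean = [math.sin(0.3 * t) for t in range(200)]
noise = [((t * 37) % 11 - 5) / 5.0 for t in range(200)]
noisy_by_level = {level: mix_at_snr(clean, noise, level)
                  for level in (-10, -5, 0, 5, 10)}
```

For example, an enhancement that halves the noise amplitude without touching the speech would quarter the noise power and therefore report an improvement of about 6 dB under this definition.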
5 Conclusion
Higher-order statistics have been used to analyze musical noise generation in nonlinear noise reduction. The Wiener-based HRNR algorithm provided the best output SNR among all algorithms at all input SNR levels. The higher-order statistics show that iterative spectral subtraction has the lowest kurtosis ratio, so it enhances the signal with the least musical noise generation.
References
[1] P. C. Loizou, Speech Enhancement: Theory and Practice, Boca Raton, FL: CRC, Taylor & Francis Group, 2007.
[2] S. F. Boll, “Suppression of acoustic noise in speech using spectral subtraction,” IEEE Trans. Acoust., Speech, Signal Process., vol. 27, no. 2, 1979, pp. 113–120.
[3] M. Berouti, R. Schwartz, and J. Makhoul, “Enhancement of speech corrupted by acoustic noise,” Proc. ICASSP, 1979, pp. 208–211.
[4] R. McAulay and M. Malpass, “Speech enhancement using a soft-decision noise suppression filter,” IEEE Trans. Acoust., Speech, Signal Process., vol. 28, no. 2, 1980, pp. 137–145.
[5] R. Martin, “Spectral subtraction based on minimum statistics,” Proc. EUSIPCO, 1994, pp. 1182–1185.
[6] Cyril Plapous, Claude Marro, and Pascal Scalart, “Improved Signal-to-Noise Ratio Estimation for Speech Enhancement,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 14, no. 6, 2006, pp. 2098–2108.
[7] Pankaj Goel, Prateek Saxena, V. K. Gupta, and Mahesh Chandra, “Comparative analysis of speech enhancement methods,” Proc. 10th IEEE Int. Conference on Wireless and Optical Networks, 2013, pp. 1–5.
[8] Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, and K. Kondo, “Automatic optimization scheme of spectral subtraction based on musical noise assessment via higher-order statistics,” Proc. of Int. Workshop Acoust. Echo and Noise Control, 2008.
[9] Y. Uemura, Y. Takahashi, H. Saruwatari, K. Shikano, and K. Kondo, “Musical noise generation analysis for noise reduction methods based on spectral subtraction and MMSE STSA estimation,” Proc. of ICASSP, 2009, pp. 4433–4436.
[10] Purav Goel and Anil Garg, “Developments in spectral subtraction for speech enhancement,” International Journal of Engineering Research and Applications, vol. 2, no. 1, 2012, pp. 055–063.
[11] K. Yamashita, S. Ogata, and T. Shimamura, “Spectral subtraction iterated with weighting factors,” Proc. IEEE Speech Coding Workshop, 2002, pp. 138–140.
[12] K. Yamashita, S. Ogata, and T. Shimamura, “Improved spectral subtraction utilizing iterative processing,” IEICE Trans. A, vol. 88, no. 11, 2005, pp. 1246–1257.
[13] M. R. Khan and T. Hassan, “Iterative noise power subtraction technique for improved speech quality,” Proc. of Int. Conf. Elect. Comput. Eng., 2008, pp. 391–394.
[14] X. Li, G. Li, and X. Li, “Improved voice activity detection based on iterative spectral subtraction and double thresholds for CVR,” Proc. of Workshop Power Electron. Intell. Transport. Syst., 2008, pp. 153–156.
[15] Yang Lu and Philipos C. Loizou, “A geometric approach to spectral subtraction,” Speech Communication, vol. 50, 2008, pp. 453–466.
[16] C. Plapous, C. Marro, P. Scalart, and L. Mauuary, “A Two-Step Noise Reduction Technique,” IEEE Intl. Conf. Acoust., Speech, Signal Processing, Canada, vol. 1, 2004, pp. 289–292.
[17] C. Plapous, C. Marro, and P. Scalart, “Speech Enhancement Using Harmonic Regeneration,” IEEE Intl. Conf. Acoust., Speech, Signal Processing, USA, vol. 1, 2005, pp. 157–160.
[18] Ryoichi Miyazaki, Hiroshi Saruwatari, Takayuki Inoue, Yu Takahashi, Kiyohiro Shikano, and Kazunobu Kondo, “Musical-Noise-Free Speech Enhancement Based on Optimized Iterative Spectral Subtraction,” IEEE Transactions on Audio, Speech, and Language Processing, vol. 20, no. 7, 2012, pp. 2080–2094.
[19] Y. Ephraim and D. Malah, “Speech Enhancement Using a Minimum Mean-Square Error Short-Time Spectral Amplitude Estimator,” IEEE Trans. Acoust., Speech, Signal Processing, vol. 32, no. 6, 1984, pp. 1109–1121.
[20] O. Cappé, “Elimination of the Musical Noise Phenomenon with the Ephraïm and Malah Noise Suppressor,” IEEE Trans. Speech and Audio Processing, vol. 2, no. 2, 1994, pp. 345–349.
[21] Samudravijaya K et al., “Hindi Speech Database,” Proc. ICSLP00, Beijing, China, CDROM 00192.pdf.
[22] A. Varga, H. J. M. Steeneken, and D. Jones, “The NOISEX-92 study on the effect of additive noise on automatic speech recognition system,” Reports of NATO Research Study Group (RSG.10), June 1992.
© 2016 Springer India
Cite this paper
Saxena, P., Gupta, V.K., Chandra, M. (2016). Musical Noise Reduction Capability of Various Speech Enhancement Algorithms. In: Satapathy, S.C., Mandal, J.K., Udgata, S.K., Bhateja, V. (eds) Information Systems Design and Intelligent Applications. Advances in Intelligent Systems and Computing, vol 434. Springer, New Delhi. https://doi.org/10.1007/978-81-322-2752-6_68
DOI: https://doi.org/10.1007/978-81-322-2752-6_68
Publisher Name: Springer, New Delhi
Print ISBN: 978-81-322-2750-2
Online ISBN: 978-81-322-2752-6
eBook Packages: Engineering, Engineering (R0)