1 Introduction

Speech quality deteriorates under adverse noise conditions in devices such as hearing aids and mobile phones, so noise reduction has received considerable attention from researchers. A common and efficient method for noise reduction is spectral subtraction, which reduces noise at low computational complexity [15]. Wiener-based denoising techniques such as TSNR (two-step noise reduction) and HRNR (harmonic regeneration noise reduction), in contrast, effectively remove the reverberation effect. Classic short-time noise reduction techniques remove noise but also introduce harmonic distortion. For example, TSNR enhances speech but introduces harmonic distortion because its estimators are unreliable at low signal-to-noise ratios [6]. HRNR brings a significant improvement over TSNR. However, the major disadvantage of these methods is the musical noise generated by non-linear signal processing, which significantly degrades speech quality and intelligibility. This paper extends the paper titled “Comparative analysis of speech enhancement methods” [7] by comparing the musical noise reduction capability of various speech enhancement methods.

The spectral subtraction method [7] provides efficient noise reduction with low musical noise. The amount of musical noise generated is highly correlated with the difference between the higher-order statistics of the power spectra before and after nonlinear signal processing [8, 9]. Here, the amount of musical noise generated by spectral subtraction [10], iterative spectral subtraction [11–14], the geometric approach to spectral subtraction [15], and Wiener-based denoising techniques such as TSNR and HRNR [16, 17] is compared.

2 Mathematical Analysis of Musical Noise Generation via Higher-Order Statistics

The amount of musical noise generated is strongly correlated with the number of isolated power spectral components and with the isolation level of these components [18]. A higher-order statistic, kurtosis, is adopted to measure these isolated components among all components. A higher value of kurtosis signifies a signal with many isolated components. However, kurtosis alone is not sufficient to measure the amount of musical noise generated. The change in kurtosis between the signal before and after processing is therefore used to identify only the musical-noise components, and the kurtosis ratio is used as the measure of musical noise, defined as

$$ kurtosis\,ratio = \frac{{kurt_{proc} }}{{kurt_{org} }} $$
(1)

where \( kurt_{proc} \) is the kurtosis of the processed signal and \( kurt_{org} \) is the kurtosis of the observed signal. The kurtosis ratio increases as the amount of musical noise increases.
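The kurtosis ratio of Eq. (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: kurtosis is taken here as the fourth central moment divided by the squared variance, computed over the short-time power spectral components, and the frame length, hop size and function names are hypothetical choices that the paper does not specify.

```python
import numpy as np

def kurtosis(values):
    """Kurtosis as fourth central moment over squared variance (assumed definition)."""
    values = np.asarray(values, dtype=float).ravel()
    centered = values - values.mean()
    second_moment = np.mean(centered ** 2)
    fourth_moment = np.mean(centered ** 4)
    return fourth_moment / second_moment ** 2

def power_spectrum(signal, frame_len=256, hop=128):
    """Short-time power spectrum; frame_len and hop are illustrative assumptions."""
    signal = np.asarray(signal, dtype=float)
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2

def kurtosis_ratio(observed, processed, frame_len=256, hop=128):
    """Eq. (1): kurtosis of the processed signal over kurtosis of the observed signal."""
    kurt_proc = kurtosis(power_spectrum(processed, frame_len, hop))
    kurt_org = kurtosis(power_spectrum(observed, frame_len, hop))
    return kurt_proc / kurt_org
```

With this definition, a ratio of 1 means the processing left the distribution of spectral components unchanged, and larger ratios indicate more isolated components, that is, more musical noise.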

3 Speech Enhancement Algorithms

Two classes of enhancement algorithms are presented: three are spectral subtraction based methods and the other two are Wiener-based methods. The noisy speech signal is given by Eq. (2).

$$ y(n) = s(n) + d(n) $$
(2)

where s(n), d(n) and y(n) represent the pure speech signal, uncorrelated additive noise and the degraded speech signal respectively [10].
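As a minimal illustration of the additive model in Eq. (2), the sketch below scales a noise signal so that the mixture has a chosen SNR and then measures the SNR of any estimate against the clean signal. The function names `mix_at_snr` and `measure_snr_db` are hypothetical, and the scaling rule (matching average powers over the whole utterance) is an assumption rather than the procedure used by the authors.

```python
import numpy as np

def mix_at_snr(speech, noise, target_snr_db):
    """Return y(n) = s(n) + g * d(n), with g chosen to reach the target SNR in dB.

    Assumes the noise is at least as long as the speech and is not all zeros.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (target_snr_db / 10)))
    return speech + gain * noise

def measure_snr_db(clean, estimate):
    """SNR in dB of an estimate, treating (estimate - clean) as the noise."""
    clean = np.asarray(clean, dtype=float)
    error = np.asarray(estimate, dtype=float) - clean
    return 10 * np.log10(np.sum(clean ** 2) / np.sum(error ** 2))
```

For example, `mix_at_snr(s, d, -5)` returns a degraded signal whose SNR, as reported by `measure_snr_db(s, y)`, is −5 dB.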

3.1 Spectral Subtraction

The principle of the spectral subtraction method [10] is to obtain an estimate of the clean signal spectrum by subtracting an estimated noise spectrum from the corrupted speech signal spectrum. The noise spectrum is estimated and updated during silence periods, when the speech signal is absent and only noise is present. The noise is assumed to be additive and stationary or nearly stationary. Taking the Fourier transform of Eq. (2) gives Eq. (3).

$$ Y\left[ w \right] = X\left[ w \right] + D\left[ w \right] $$
(3)

Here X[w] and D[w] denote the spectra of the clean speech s(n) and the noise d(n). The noisy spectrum Y[w] can be expressed in terms of its magnitude and phase as follows

$$ Y\left[ w \right] = \left| {Y\left[ w \right]} \right| e^{{j\phi_{y} }} $$
(4)

where |Y[w]| is the magnitude spectrum and \( \phi_{y} \) is the phase spectrum of the corrupted noisy speech signal. The noise signal can be expressed in the transformed domain as follows

$$ D\left[ w \right] = \left| {D\left[ w \right]} \right| e^{{j\phi_{y} }} $$
(5)

The clean speech signal is then estimated by subtracting the noise magnitude spectrum from the noisy speech magnitude spectrum and reusing the noisy phase, as given in Eq. (6).

$$ X(w) = \left[ {\left| {Y(w)} \right| - \left| {D(w)} \right|} \right].e^{{j\phi_{y} }} $$
(6)

The unknown noise spectrum |D(w)| is estimated as its average value over periods in which the speech signal is absent. The spectral subtraction method is shown in Fig. 1.

Fig. 1 Magnitude spectral subtraction block diagram
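The processing chain of Eqs. (3) to (6) can be sketched as follows. This is an illustrative sketch, not the authors' MATLAB implementation. It assumes that the first `noise_frames` frames contain noise only, that negative magnitudes after subtraction are set to zero, and that frames are 256 samples long with 50% overlap; none of these choices is stated in the paper.

```python
import numpy as np

def spectral_subtraction(noisy, noise_frames=6, frame_len=256):
    """Magnitude spectral subtraction with the noisy phase, as in Eq. (6)."""
    noisy = np.asarray(noisy, dtype=float)
    hop = frame_len // 2
    # Periodic Hann window: shifted copies sum to 1 at 50% overlap,
    # so plain overlap-add reconstructs the interior of the signal.
    window = np.hanning(frame_len + 1)[:-1]
    n_frames = 1 + (len(noisy) - frame_len) // hop
    frames = np.stack([noisy[i * hop:i * hop + frame_len] * window
                       for i in range(n_frames)])

    spectrum = np.fft.rfft(frames, axis=1)           # Y[w] per frame
    magnitude = np.abs(spectrum)                     # |Y[w]|
    phase = np.angle(spectrum)                       # phi_y

    # |D[w]|: average magnitude over frames assumed to be speech-free.
    noise_magnitude = magnitude[:noise_frames].mean(axis=0)

    # Subtract and clip negative values to zero (assumed rectification).
    clean_magnitude = np.maximum(magnitude - noise_magnitude, 0.0)

    # Recombine with the noisy phase and return to the time domain.
    enhanced = np.fft.irfft(clean_magnitude * np.exp(1j * phase),
                            n=frame_len, axis=1)
    output = np.zeros(len(noisy))
    for i in range(n_frames):
        output[i * hop:i * hop + frame_len] += enhanced[i]
    return output
```

The first and last half-frames are attenuated by the window because only one frame covers them; a full implementation would pad the signal to avoid this edge effect.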

3.2 Iterative Spectral Subtraction

A drawback of spectral subtraction is that clear narrowband noise still remains in the spectrum, even if the noise estimate is correct. To overcome this drawback, particularly for weak signals, spectral subtraction can be applied iteratively to the signal; such approaches are commonly known as iterative spectral subtraction methods [11–14].

3.3 Spectral Subtraction Using Geometrical Approach

The geometric approach [15] is used to overcome the shortcomings of the spectral subtraction algorithm. This method involves estimating the phase difference between the noisy signal and the noise.

3.4 Decision-Directed (DD) Approach

The Wiener-based methods build on the decision-directed (DD) estimator proposed by Ephraim and Malah [19]. The main disadvantage of the DD approach is that it introduces a reverberation effect. The Two-Step Noise Reduction (TSNR) technique minimizes this reverberation effect while keeping the benefits of the DD method. However, TSNR introduces harmonic distortion in the enhanced speech because the estimator is unreliable at low SNR. Harmonic Regeneration Noise Reduction (HRNR) is used to remove these harmonic distortions [16, 17, 20].
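The decision-directed idea can be illustrated with a short sketch that estimates the a priori SNR frame by frame and applies a Wiener gain. It follows the commonly cited form of the DD rule and does not include the TSNR or HRNR refinements. The function name is hypothetical, the noise power estimate is assumed to be given and strictly positive, and treating the smoothing constant `alpha` as the α = 0.98 reported in the simulation section is an assumption.

```python
import numpy as np

def decision_directed_wiener(noisy_spec, noise_psd, alpha=0.98):
    """Apply a Wiener gain driven by a decision-directed a priori SNR estimate.

    noisy_spec: complex STFT of the noisy speech, shape (frames, bins).
    noise_psd:  estimated noise power per bin, shape (bins,), assumed > 0.
    """
    n_frames, n_bins = noisy_spec.shape
    enhanced = np.zeros_like(noisy_spec)
    previous_clean_power = np.zeros(n_bins)
    for frame in range(n_frames):
        # A posteriori SNR of the current frame.
        posterior_snr = np.abs(noisy_spec[frame]) ** 2 / noise_psd
        # DD rule: blend the previous clean-speech estimate with the
        # current instantaneous SNR estimate.
        prior_snr = (alpha * previous_clean_power / noise_psd
                     + (1.0 - alpha) * np.maximum(posterior_snr - 1.0, 0.0))
        gain = prior_snr / (1.0 + prior_snr)         # Wiener gain
        enhanced[frame] = gain * noisy_spec[frame]
        previous_clean_power = np.abs(enhanced[frame]) ** 2
    return enhanced
```

Because the a priori SNR of a frame depends mostly on the previous frame's estimate when `alpha` is close to 1, the gain lags the signal by about one frame, which is one way to understand the reverberation effect attributed to the DD approach.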

4 Simulation Results and Discussion

All algorithms are implemented and simulated in MATLAB and compared for four different noises: car noise, F16 noise, operation room noise and machine gun noise. Sound quality is evaluated and compared on the basis of the improvement in output SNR and of higher-order statistics, using the kurtosis ratio of the signal before and after processing. One sample sentence from our database [21], “YAHA SAI LAGHBAG PANCH MEAL DAKSHIN PASCHIM MAI KATGHAR GAON HAI”, has been used to check performance.

The noisy version of this sentence was prepared by adding car noise and F16 noise from the NOISEX-92 database [22] to the clean sentence at SNR levels of −10, −5, 0, 5 and 10 dB. In the spectral subtraction methods, β = 1.1 and ŋ = 0.8 are used, where β and ŋ are the over-subtraction factor and the spectral floor parameter respectively. Residual noise and perceived musical noise are controlled by the parameter β. A small value of β gives audible musical noise but reduced residual noise, whereas for a large β the residual noise is audible but the musical noise is reduced. The amount of speech spectral distortion is strongly affected by the parameter α: for large α the resulting signal is highly distorted and suffers from poor intelligibility, whereas for small α noise remains in the enhanced speech signal. In the geometric approach, the parameter α is set to 0.98 and the parameter β to 0.6. Similarly, in the Wiener-based algorithms, α = 0.98 gives the optimum result for enhanced speech. Simulation results are shown in Tables 1, 2, 3 and 4 for car noise, babble noise, operation room noise and machine gun noise respectively. Figure 2 shows the average improvement in SNR over all input SNR levels and all noises for each enhancement method.
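The role of an over-subtraction factor and a spectral floor can be illustrated with the widely used power-domain subtraction rule below. The paper does not state the exact rule used in the simulations, so the form shown here, the function name `oversubtract` and the use of power rather than magnitude spectra are assumptions; only the default values β = 1.1 and ŋ = 0.8 come from the text.

```python
import numpy as np

def oversubtract(noisy_power, noise_power, beta=1.1, eta=0.8):
    """Power spectral subtraction with over-subtraction and a spectral floor.

    beta scales the noise estimate before subtraction (over-subtraction factor).
    eta sets the floor, as a fraction of the noise power, below which the
    result is not allowed to fall (spectral floor parameter).
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    subtracted = noisy_power - beta * noise_power
    floor = eta * noise_power
    return np.where(subtracted > floor, subtracted, floor)
```

For a bin with noisy power 10 and noise power 1, the subtraction gives 8.9, which is above the floor of 0.8 and is kept. For a bin with noisy power 1.5, the subtraction gives 0.4, so the floor value 0.8 is used instead.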

Table 1 Output SNR and Kurtosis ratio for car noise
Table 2 Output SNR and Kurtosis ratio for babble noise
Table 3 Output SNR and Kurtosis ratio for operation room noise
Table 4 Output SNR and Kurtosis ratio for machine gun noise
Fig. 2 Improvement in output SNR for various speech enhancement algorithms

Figure 3 shows the average kurtosis ratio over all input SNR levels and all noises for each enhancement method. It is observed from these figures that the Wiener-based methods give better results than the basic spectral subtraction based methods in terms of increase in output SNR. Figure 2 shows that the HRNR algorithm gives the best output SNR among all the algorithms at all input SNR levels for all noises. Iterative spectral subtraction improves SNR with less musical noise generation than the other methods, as indicated by its having the lowest kurtosis ratio.

Fig. 3 Kurtosis ratio variations in speech enhancement algorithms

5 Conclusion

Higher-order statistics have been used to analyze musical noise generation in nonlinear noise reduction. The HRNR Wiener-based algorithm provided the best output SNR among all algorithms at all input SNR levels. The higher-order statistics show that iterative spectral subtraction has the lowest kurtosis ratio, so it enhances the signal with the least musical noise generation.