
1 Introduction

Communication is a basic human need, and speech is one of its most important forms: it is the most natural, intuitive and fastest means of interaction among humans [1]. An acquired speech signal generally contains background noise in addition to the speech itself, and this noise degrades the quality of the speech. Over the last few decades, the pre-processing of speech signals, which involves removing noise from speech, has been an active area of research [2]. Removing the noise is important because it can affect all further processing, so filtering the acquired signal plays a central role in speech signal analysis. Frequency-domain signal processing is easy to design, and most noise suppression methods rely on the Short-Time Fourier Transform (STFT); however, the Wavelet Transform (WT) is gaining importance as a simple and efficient method for speech signal de-noising. The wavelet transform can analyze the signal and select information in it that other de-noising techniques miss [2].

Researchers have therefore studied the suppression of noise in speech signals extensively. Aggarwal et al. [2] introduced a multi-resolution analysis approach using the WT and found that the modified universal threshold gives better de-noising results. Bhowmick et al. [3] proposed Voiced Speech Probability Detection Wavelet Decomposition (VSPDWD) and compared it with seven different techniques; VSPDWD gave improved SNR values at all levels of decomposition. Akkar et al. [4] compared different thresholding techniques and found that square wavelet thresholding gave better results than the traditional thresholding techniques. Babichev et al. [5] introduced a de-noising method that combines Empirical Mode Decomposition and wavelet decomposition. The relative change in the Shannon entropy, which was used as the quality criterion, indicated that the technique was effective. Hadi et al. [6] compared different threshold selection rules and found that sawtooth wavelet thresholding gave better results than the traditional thresholding techniques.

In this paper, a combination of the Discrete Wavelet Transform (DWT) and hard thresholding is proposed for noise reduction. The wavelet transform provides multi-resolution analysis, which makes it better suited to this task than the Fourier Transform and the STFT. The paper is organized as follows: Section 2 describes the pre-processing of noisy speech signals using wavelets. Section 3 presents the experimental setup, Sect. 4 explores the suitable wavelet family for speech signal analysis and Sect. 5 presents the simulation results. Section 6 concludes the analysis carried out in the proposed work.

2 Pre-processing of Noisy Speech Signals Using Wavelets

2.1 Discrete Wavelet Transform

The Discrete Wavelet Transform is used to suppress the noise present in the speech signal. It decomposes the speech signal in the time-frequency domain. The noise present in the speech signal cannot be easily removed with Kalman or Chebyshev filters, so the wavelet transform is applied instead [7,8,9]. A wavelet is a wave-like oscillation that begins at zero, increases and then returns to zero [1, 10]. The scaled and shifted version of the fundamental, or mother, wavelet Ψ is given by [3]:

$${\varPsi_ {\tau,\beta}} (t) = \beta^{ - 1/2} \varPsi \left( {\frac{t - \tau }{\beta }} \right)$$
(1)

where β is the scaling parameter and τ is the translation parameter. The DWT decomposes the noisy speech signal s(t) into sub-bands, yielding approximation and detail coefficients. The detail coefficients, i.e. the higher-frequency component D(p, k), are given by [3]:

$$D(p,k) = 2^{ - p/2} \sum\nolimits_{n} {s(n)\varPsi^{*} (2^{ - p} n - k)}$$
(2)

where p, k and n are integers and Ψ*(t) is the complex conjugate of Ψ(t). The approximation coefficients, i.e. the lower-frequency component A(p, k), are given by [3]:

$$A(p,k) = 2^{ - p/2} \sum\nolimits_{n} {s(n)\phi^{*} (2^{ - p} n - k)}$$
(3)

where \(\phi^{*}(n)\) is the complex conjugate of the scaling function \(\phi(n)\). When the DWT is applied to the noisy signal s(t) at successive levels, the speech signal is decomposed into approximation and detail coefficients [3, 11]. The detail coefficients are obtained by passing the noisy speech signal through a high-pass filter, which retains its high-frequency component, and the approximation coefficients are obtained by passing it through a low-pass filter, which retains its low-frequency component. The original speech signal is reconstructed by applying the Inverse Discrete Wavelet Transform (IDWT) to the filtered coefficients, combining the detail and approximation coefficients from the last level of decomposition back to the first. Figures 1 and 2 show the two-level wavelet decomposition and reconstruction process, in which s is the noisy speech signal, cA1 and cD1 are the first-level approximation and detail coefficients, and cA2 and cD2 are the second-level approximation and detail coefficients [2].

Fig. 1 Wavelet decomposition [2]

Fig. 2 Wavelet reconstruction [2]
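
The two-level decomposition and reconstruction of Figs. 1 and 2 can be sketched in a few lines of Python. The sketch is illustrative only: it uses the Haar wavelet because its filters are the simplest to write out by hand, and the function names `haar_dwt` and `haar_idwt` and the sample values are hypothetical, not taken from the proposed work.

```python
import math

def haar_dwt(signal):
    """One level of the orthonormal Haar DWT.

    Returns (cA, cD): the low-pass (approximation) and
    high-pass (detail) coefficients, each half the input length.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    r = math.sqrt(2.0)
    cA = [(signal[i] + signal[i + 1]) / r for i in range(0, len(signal), 2)]
    cD = [(signal[i] - signal[i + 1]) / r for i in range(0, len(signal), 2)]
    return cA, cD

def haar_idwt(cA, cD):
    """Inverse of haar_dwt: interleave the reconstructed sample pairs."""
    r = math.sqrt(2.0)
    out = []
    for a, d in zip(cA, cD):
        out.append((a + d) / r)
        out.append((a - d) / r)
    return out

# Hypothetical 8-sample signal (length must be divisible by 4 for two levels).
s = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]

# Decomposition: s -> (cA1, cD1), then cA1 -> (cA2, cD2).
cA1, cD1 = haar_dwt(s)
cA2, cD2 = haar_dwt(cA1)

# Reconstruction: from the last level back to the first.
cA1_rec = haar_idwt(cA2, cD2)
s_rec = haar_idwt(cA1_rec, cD1)
```

Without any thresholding, `s_rec` matches `s` up to floating-point rounding, which is the perfect-reconstruction property that the IDWT step relies on.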

2.2 Universal Thresholding Based Filtering Method for Pre-processing of Speech Signals

Noise in the speech signal is a major issue in speech signal analysis. This work proposes a universal-thresholding-based filtering technique using the DWT. The higher-frequency components obtained through the DWT contain residual noise that cannot be removed by a simple filtering process [9, 12, 13].

2.2.1 Threshold Selection

The universal threshold value T is computed as [3]:

$$T = \sigma \sqrt {2\ln (L)}$$
(4)

where L is the number of samples in the noisy speech signal. The noise standard deviation σ is estimated as [5]:

$$\sigma = \frac{{{\text{MAD}}(\text{|}D(n)\text{|})}}{0.6745}$$
(5)

where MAD is the Median Absolute Deviation and \(D(n)\) denotes the detail coefficients of the noisy speech signal.
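
As a rough illustration of Eqs. (4) and (5), the sketch below computes σ and T in plain Python. It assumes that MAD(|D(n)|) is read as the median of the absolute detail coefficients, which is the usual convention for zero-centred wavelet coefficients; the function names and sample values are hypothetical.

```python
import math
import statistics

def estimate_sigma(detail):
    """Eq. (5): noise standard deviation from the detail coefficients.

    Assumption: MAD(|D(n)|) is taken as median(|D(n)|).
    """
    return statistics.median([abs(d) for d in detail]) / 0.6745

def universal_threshold(detail, n_samples):
    """Eq. (4): T = sigma * sqrt(2 * ln(L)), with L = n_samples."""
    sigma = estimate_sigma(detail)
    return sigma * math.sqrt(2.0 * math.log(n_samples))

# Hypothetical first-level detail coefficients of a 16-sample signal.
detail = [0.2, -0.1, 0.05, -0.3, 0.15, -0.05, 0.1, -0.2]
sigma = estimate_sigma(detail)
T = universal_threshold(detail, n_samples=16)
```

Because the median is insensitive to a few large coefficients, this estimate of σ is driven mainly by the noise rather than by the speech content of the detail band.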

2.2.2 Threshold Function

Hard thresholding is used as the threshold function in the proposed work. The hard threshold function \((H_{m,n} )\) is given by [1]:

$${H_{m,n}} = \left\{ \begin{array}{ll} \omega_{m,n}, & \left| \omega_{m,n} \right| \ge \mu \\ 0, & \left| \omega_{m,n} \right| < \mu \end{array} \right.,$$
(6)

where \(\omega_{m,n}\) is the wavelet decomposition coefficient of the noisy speech signal and μ is the threshold value. Coefficients whose magnitude is below the threshold are set to zero, while coefficients whose magnitude is at or above the threshold are kept unchanged; this is known as hard thresholding.
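
Eq. (6) translates almost directly into code. The following is a minimal sketch in Python; the function name `hard_threshold` and the coefficient values are hypothetical and chosen only to show both branches of the rule.

```python
def hard_threshold(coeffs, mu):
    """Eq. (6): keep a coefficient if |w| >= mu, otherwise set it to zero."""
    return [w if abs(w) >= mu else 0.0 for w in coeffs]

# Hypothetical wavelet coefficients and threshold value.
coeffs = [0.9, -0.2, 0.05, -1.4, 0.3]
thresholded = hard_threshold(coeffs, mu=0.3)
print(thresholded)  # [0.9, 0.0, 0.0, -1.4, 0.3]
```

Surviving coefficients are not shrunk towards zero, which is what distinguishes hard thresholding from soft thresholding.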

3 Experimental Setup

The proposed method is evaluated on the NOIZEUS database [14]. The database contains 30 IEEE sentences contaminated with eight different noises at different SNRs. The noise is taken from the AURORA database [15] and includes train, babble, car, exhibition hall, restaurant, street, airport and train-station noise. In this experiment, the noise is removed from the noisy speech signal using the Discrete Wavelet Transform. The noisy speech signal is decomposed into approximation and detail coefficients using different types of wavelets: Daubechies, Coiflets, Symlets and the Haar wavelet. The noise in the detail coefficients is difficult to remove with filters, so noise suppression is performed through hard thresholding [16,17,18]. The proposed work is evaluated by calculating the SNR as given in Eq. (7) [3, 11]:

$${\text{SNR}} = 10\log_{10} \left( {\frac{{\sum\nolimits_{m = 1}^{L} {s^{2} (m)} }}{{\sum\nolimits_{m = 1}^{L} {\text{|}s(m) - \hat{s}(m)\text{|}^{2} } }}} \right)$$
(7)

where \(L\) is the sample size for the filtered speech signal, \(s\left( m \right)\) is the noisy speech signal and \(\hat{s}\left( m \right)\) is the clean speech signal.
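
Eq. (7) is the ratio, in decibels, of the energy of \(s(m)\) to the energy of the difference \(s(m) - \hat{s}(m)\). A minimal Python sketch follows; the function name `snr_db` and the sample values are hypothetical, and the two arguments simply play the roles of \(s\) and \(\hat{s}\) in the equation.

```python
import math

def snr_db(s, s_hat):
    """Eq. (7): 10 * log10( sum(s^2) / sum(|s - s_hat|^2) )."""
    if len(s) != len(s_hat):
        raise ValueError("signals must have the same length")
    signal_energy = sum(x * x for x in s)
    error_energy = sum((x - y) ** 2 for x, y in zip(s, s_hat))
    if error_energy == 0.0:
        return math.inf  # identical signals: no error energy
    return 10.0 * math.log10(signal_energy / error_energy)

# Hypothetical signals: every sample of s_hat is off by 0.1,
# so the energy ratio is 4 / 0.04 = 100, i.e. about 20 dB.
s = [1.0, -1.0, 1.0, -1.0]
s_hat = [0.9, -0.9, 0.9, -0.9]
value = snr_db(s, s_hat)
```

A higher value means the two signals differ less relative to the energy of \(s\), which is why the comparisons in the following section are read as "larger SNR is better".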

4 Exploration of Suitable Wavelet Family for Speech Signal Analysis

In the proposed work, different wavelet families have been explored for the suppression of noise in speech signals. Of the families used (Daubechies, Symlets, Coiflets and Haar), Coiflets tend to give the best SNR value. The comparative analysis of SNR values for the different wavelet families is shown in Table 1. As the input SNR rises from 0 dB to 10 dB, that is, as the noise in the signal decreases, the SNR value of the corresponding signal tends to increase. Order 5 of the Coiflet wavelet gives the best SNR value. The comparative analysis of SNR values for different orders of the Coiflet wavelet is shown in Table 2, where the SNR value increases as the order of the wavelet increases. The comparative analysis of SNR values for different levels of decomposition of the noisy speech signal is shown in Table 3. The SNR improves up to a certain level of decomposition and then stops improving, because the number of samples decreases in the lower sub-bands. The simulation was carried out with the parameters selected through this exploration.

Table 1 Comparative analysis of SNR values for different wavelet families
Table 2 Comparative analysis of SNR values for different order of Coiflet wavelet
Table 3 Comparative analysis of SNR values for different level of decomposition

5 Simulation Results

Speech signals contain noise, which must be removed because it makes further processing of the signal difficult. Figure 3 shows the noisy and the filtered speech signal. The noisy speech signal is decomposed through the DWT at various levels using different wavelet families. The threshold value \(T\) is calculated, and hard thresholding is applied to the coefficients obtained from the decomposition [19, 20]. The result is simulated with the wavelet family selected in the exploration above, and the combination of the DWT and hard thresholding gives an improved pre-processing result. The speech signal is then reconstructed from the thresholded coefficients through the IDWT. The important information present in the reconstructed speech signal is not lost [21, 22].

Fig. 3 Noisy and filtered speech signals. SNR (noisy signal) = 2.7767 dB. SNR (filtered signal) = 3.7698 dB

6 Conclusion

In the proposed work, noisy speech signals were pre-processed through the combination of the DWT and hard thresholding, and different wavelet families were explored to improve the result. A comparative analysis identified the best wavelet family, the best order of the wavelet and the best level of decomposition, and the noise suppression was simulated with these parameters.