1 Introduction

The work of Donoho and Johnstone [10] is the foundation of the wavelet-based shrinkage and thresholding methods that have been extensively developed for denoising signals [2, 4, 6, 9, 15, 27]. Wavelet thresholding methods are used in many applications, including the denoising of images [17], electrocardiogram (ECG) signals [1] and speech signals [19, 21]. Despite their success, traditional wavelet transforms sometimes exhibit visual artifacts. Coifman and Donoho [7] proposed the use of the translation-invariant (TI) wavelet transform—also known as the undecimated wavelet transform (UWT)—for denoising signals. This helps suppress the artifacts and consequently produces better results than the discrete wavelet transform (DWT), as measured by the mean squared error (MSE) and the signal-to-noise ratio (SNR) [20, 22, 25].

Although the hard thresholding rule is the simplest one, it can be unstable, i.e., sensitive to small changes in the data, because it is defined as a discontinuous function. On the other hand, the soft thresholding rule of Donoho and Johnstone [11]—despite being the most commonly used—creates unnecessary bias when the true coefficients are large. In order to achieve a compromise between the hard and soft thresholding rules, Fang and Huang [12] proposed the trimmed thresholding rule with a constant parameter \(\alpha \). The trimmed thresholding rule—defined as a continuous function—does not create excessive bias when the wavelet coefficients are large. In fact, the hard and soft thresholding rules are limiting cases of the trimmed rule, corresponding to \(\alpha \rightarrow \infty \) and \(\alpha =1\), respectively. One can achieve the best denoising result by carefully tuning the trimming parameter \(\alpha \) for the signal at hand.

An important step in applying any thresholding method is the determination of an appropriate value of the threshold. Many approaches have been proposed for calculating the threshold value including the mini-max, universal and Stein’s unbiased risk estimator (SURE) [24] thresholds. The SureShrink denoising method—which uses a hybrid of the universal threshold and the SURE based on soft thresholding—has been widely used and shown to perform well [5, 11]. A wavelet thresholding denoising method that uses a hybrid of the universal threshold and the SURE threshold based on a trimmed thresholding strategy for the UWT coefficients was proposed in [16].

In the present paper, a detailed rigorous derivation of the hybrid SURE with optimized trimmed thresholding strategy is given. Moreover, an optimal value of the trimmed thresholding parameter \(\alpha \) is computed using a simple heuristic technique. In order to illustrate the effectiveness of the proposed method, it is compared with hard thresholding, the SureShrink and the hybrid SURE with trimmed thresholding using a fixed value of \(\alpha \). It is also compared with non-wavelet-based methods such as the short-time Fourier transform (STFT) block thresholding [26], the spectral subtraction [3] and the phase spectrum compensation [23] methods. The extensive simulation study includes both the standard test signals of Donoho and Johnstone [10] and ECG signals. It shows an improvement in the signal-to-noise ratio (SNR) of the denoised signals extracted by the proposed method.

In Sect. 2 the signal denoising problem is formulated and wavelet-based extraction methods are reviewed. In Sect. 3 the hybrid SURE with soft thresholding (SureShrink) is surveyed. In Sect. 4 a detailed derivation of the hybrid SURE with optimized trimmed thresholding and the corresponding implementation algorithm are given. In Sect. 5 the proposed method is applied to test signals as well as ECG signals all contaminated by background additive white Gaussian noise (AWGN). A comparison of the denoising results for different noise levels is made.

2 Wavelet-Based Denoising

A sampled noisy signal \(y=\left( {y_1 ,y_2 ,\,\ldots ,\,y_n } \right) ^\mathrm{T}\,\) is generated as

$$\begin{aligned} y=f+\sigma e, \end{aligned}$$
(1)

where f is a vector of samples of a clean signal, e is a vector of independent identically distributed (i.i.d.) zero-mean Gaussian noise samples with unit variance, \(N\left( {0,1} \right) \), and \(\sigma \) is the noise level, which may be known or unknown. The UWT [18] removes the downsampling operator from the usual implementation of the DWT and upsamples the filter responses, thereby inserting holes between nonzero filter taps. The UWT of the one-dimensional signal y in (1) with sample length \(2^{J}\) leads to a set of coefficients \(\mathcal{W}=\left\{ {\hat{c}_{j_0 ,k} ,\hat{d}_{j,k} } \right\} , \quad k=0,1,\ldots ,2^{J}-1,\,j=j_0 ,\ldots ,J-1\), where \(\hat{c}_{j_0,k}\) and \(\hat{d}_{j,k}\) are, respectively, the scaling and wavelet coefficients of the noisy data and \(j_0\) is the coarsest resolution level.

Donoho and Johnstone [10] suggested the extraction of the significant wavelet coefficients by thresholding, that is, shrink the wavelet coefficients toward zero if their absolute value is below a certain threshold level, \({\lambda }\ge 0\). The simplest shrinkage operation is known as “hard thresholding” [10] which is expressed as:

$$\begin{aligned} \eta _\mathrm{H} \left( {\hat{d}_{j,k} ,{\lambda }} \right) =\,\left\{ {{ \begin{array}{ll} {\hat{d}_{j,k}} ,&{} \quad \left| {\hat{d}_{j,k}} \right| \ge \lambda \\ 0,&{} \quad \left| {\hat{d}_{j,k}} \right| <\lambda , \\ \end{array} }} \right. \end{aligned}$$
(2)

where \(\hat{d}_{j,k} \) and \(\eta _\mathrm{H} \left( {\hat{d}_{j,k} ,{\lambda }} \right) \) are the empirical wavelet coefficients before and after hard thresholding, respectively, and \({\lambda }\) is the threshold level. This is a keep-or-kill approach. The hard thresholding process, however, causes discontinuity in the amplitude of the shrunk coefficients. To prevent this discontinuity, another approach known as “soft thresholding” was introduced in [11] and can be expressed as:

$$\begin{aligned} \eta _\mathrm{S} \left( {\hat{d}_{j,k} ,{\lambda }} \right) =\left\{ {{ \begin{array}{ll} \hbox {sgn}\left( {\hat{d}_{j,k} } \right) \left( {\left| {\hat{d}_{j,k} } \right| -{\lambda }} \right) ,&{} \quad \left| {\hat{d}_{j,k} } \right| \ge \lambda \\ 0 ,&{} \quad \left| {\hat{d}_{j,k} } \right| <\lambda , \\ \end{array} }} \right. \end{aligned}$$
(3)

where \(\eta _\mathrm{S} \left( {\hat{d}_{j,k} ,{\lambda }} \right) \) are the wavelet coefficients after the soft thresholding process and \(\hbox {sgn}\left( . \right) \,\) is the signum function. This is a shrink-or-kill approach which has the merit that the artifacts caused by discontinuities—which appeared in the case of hard thresholding—are not observed. The disadvantage is that all other coefficients which represent the original signal are also shrunk causing a decrease in the SNR of the denoised signal. A more general thresholding scheme which incorporates the advantages of both soft thresholding and hard thresholding is called trimmed thresholding [13] and is expressed as:

$$\begin{aligned} \eta _\mathrm{T} \left( {\hat{d}_{j,k} ,{\lambda }} \right) =\,\left\{ {{ \begin{array}{ll} \hat{d}_{j,k} \left( {\frac{\left| {\hat{d} _{j,k} } \right| ^{\alpha }-{\lambda }^{{\alpha }}}{\left| {\hat{d}_{j,k} } \right| ^{\alpha }}}\right) ,&{} \quad \left| {\hat{d}_{j,k} } \right| >\lambda \\ 0,&{} \quad \left| {\hat{d}_{j,k} } \right| \le \lambda , \\ \end{array} }} \right. \end{aligned}$$
(4)

where \(\hat{d}_{j,k} \) and \(\eta _\mathrm{T} \left( {\hat{d}_{j,k} ,{\lambda }} \right) \) are the empirical wavelet coefficients before and after trimmed thresholding, respectively, \({\lambda }\) is the threshold value, and \(\alpha \) is a real-valued parameter with \(\alpha \ge 1\). The wavelet coefficients \(\eta \left( {{\hat{d} _{j,k}} ,{\lambda }} \right) \)—obtained by applying any of the thresholding rules given in (2), (3) or (4)—are used for a selective reconstruction of the function f in (1). The estimated function—denoted by \(\hat{f} \left( t \right) \)—is obtained by simply performing the inverse undecimated wavelet transform (IUWT) of \(\{\hat{c}_{j_0 ,k} ,\eta \left( {{\hat{d}_{j,k}} ,{\lambda }} \right) \}\).
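The three rules (2)–(4) can be sketched in NumPy as follows (a minimal illustration; the function names are ours, not drawn from the cited works). Note that (4) coincides with (3) at \(\alpha =1\) and approaches (2) as \(\alpha \) grows:

```python
import numpy as np

def hard_threshold(d, lam):
    """Keep-or-kill rule (2): zero out coefficients with magnitude below lam."""
    return np.where(np.abs(d) >= lam, d, 0.0)

def soft_threshold(d, lam):
    """Shrink-or-kill rule (3): shrink magnitudes toward zero by lam."""
    return np.sign(d) * np.maximum(np.abs(d) - lam, 0.0)

def trimmed_threshold(d, lam, alpha):
    """Trimmed rule (4): interpolates between soft (alpha=1) and hard (alpha->inf)."""
    d = np.asarray(d, dtype=float)
    out = np.zeros_like(d)
    keep = np.abs(d) > lam
    out[keep] = d[keep] * (1.0 - (lam / np.abs(d[keep])) ** alpha)
    return out
```

For \(\alpha =1\) the factor \(1-\lambda /|d|\) multiplied by \(d\) gives exactly \(\hbox {sgn}(d)(|d|-\lambda )\), i.e., soft thresholding, while for large \(\alpha \) the factor tends to 1 on the retained coefficients.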

3 The SURE Approach with Soft Thresholding

Donoho and Johnstone [11] have developed a wavelet-based denoising method, denoted by SureShrink, which is based on adaptively choosing threshold values \(\lambda _j \,\) at every resolution level j with soft thresholding. Given a set \(\left\{ {X_1 ,\ldots ,X_p } \right\} \) of statistically independent random variables normally distributed as \(X_i \sim N\left( {\mu _i ,1} \right) \) for \(i=1,\ldots \,,\,{p}\), the estimator of the mean vector of the clean signal \(\mu \) denoted by \({\hat{\mu }} \) is defined by

$$\begin{aligned} \hat{\mu } =X+g\left( X \right) , \end{aligned}$$
(5)

where \(g=\left( {g_i } \right) _{i=1}^p \) is a function from \(\mathcal{R}^{p}\) into \(\mathcal{R}^{p}\). Stein showed that for a weakly differentiable \(g\left( X \right) \), one has:

$$\begin{aligned} E_\mu \left\{ \left| \left| \hat{\mu } \left( X \right) -\mu \right| \right| ^{2} \right\} =\,p+E_\mu \left\{ {\left| \left| g\left( X \right) \right| \right| ^{2}+2\nabla .g\left( X \right) } \right\} , \end{aligned}$$
(6a)

where \(\nabla .g\left( X \right) \) is the divergence of vector \(g\left( X \right) \) defined by \(\nabla .g\left( X \right) \equiv \mathop \sum \nolimits _{i=1}^p \frac{\partial g_i }{\partial X_i }\). One can infer—from the above equation—that the quantity

$$\begin{aligned} \hbox {SURE}\left( {\lambda ;X} \right) =\,p+\left| \left| g\left( X \right) \right| \right| ^{2}+2\nabla .g\left( X \right) \end{aligned}$$
(6b)

is an unbiased estimate of the risk: \(E_\mu \left\{ \parallel {\hat{\mu } \left( X \right) -\mu \parallel ^{2}} \right\} =E_\mu \left\{ {\hbox {SURE}\left( {\lambda ;X} \right) } \right\} \).
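The unbiasedness stated above can be checked numerically. The following Monte Carlo sketch (the sparse test vector, sample sizes and variable names are ours) averages the SURE value of (6b)—specialized to the soft-threshold estimator, for which it takes the form derived below in (9)—against the empirical risk; by (6a) the two averages should agree:

```python
import numpy as np

rng = np.random.default_rng(0)
p, trials, lam = 100, 4000, 1.0
mu = np.concatenate([np.full(10, 3.0), np.zeros(p - 10)])  # sparse mean vector

sure_vals, risks = [], []
for _ in range(trials):
    X = mu + rng.standard_normal(p)                          # X_i ~ N(mu_i, 1)
    mu_hat = np.sign(X) * np.maximum(np.abs(X) - lam, 0.0)   # soft estimator
    n_small = np.sum(np.abs(X) <= lam)                       # cardinality of I
    sure = p - 2 * n_small + np.sum(np.minimum(np.abs(X), lam) ** 2)
    sure_vals.append(sure)
    risks.append(np.sum((mu_hat - mu) ** 2))                 # true squared error

print(np.mean(sure_vals), np.mean(risks))  # the two averages nearly coincide
```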

For the soft threshold estimator \(\hat{\mu } _i \left( X \right) =\eta _\mathrm{S} \left( {X_i ,{\lambda }} \right) \) [11], the function \(g_i \left( X \right) \) is defined by:

$$\begin{aligned} g_i \left( X \right) =-X_i I\left( i \right) -\lambda \,\hbox {sgn}\left( {X_i } \right) I^{\prime }\left( i \right) ,\, \end{aligned}$$
(7)

where \(I\left( i \right) \) and \(I^{\prime }\left( i \right) \) are, respectively, the set indicator functions of the sets \(\mathcal{I}=\left\{ {i:\left| {X_i } \right| \le \lambda } \right\} \) and \(\mathcal{I}^{{\prime }}=\left\{ {i:\left| {X_i } \right| >\lambda } \right\} \) and \({\lambda }\) is a scalar threshold. The above equation implies that \(\frac{\partial g_i }{\partial X_i }=-I\left( i \right) \) and consequently

$$\begin{aligned} \nabla .g\left( X \right) =\mathop \sum \limits _{i\in \mathcal{I}} -1=-\#\mathcal{I}, \end{aligned}$$
(8)

where #A denotes the cardinality of set A. Since (7) implies that \(\parallel g\left( X \right) \parallel ^{2}=\mathop \sum \nolimits _{i=1}^p [\hbox {min}(\left| {X_i } \right| ,\lambda )]^{2}\), the SURE function of (6b) reduces to:

$$\begin{aligned} \hbox {SURE}\left( {\lambda ;X} \right) =p-2\,\#\,\mathcal{I}+\,\sum \limits _{i=1}^p [\hbox {min}(\left| {X_i } \right| ,\lambda )]^{2}. \end{aligned}$$
(9)

Consequently, the problem of minimizing the MSE function reduces to that of minimizing \(\hbox {SURE}\left( {\lambda ;X} \right) \) in order to obtain the optimal threshold \(\lambda .\,\) Extending this principle to the entire set of resolution levels, at each level j, an expression for the SURE threshold \(\lambda _j^\mathrm{s} \) is given by:

$$\begin{aligned} \lambda _j^\mathrm{s} ={\mathop {\arg {\min }}\limits _{0\le \lambda \le \lambda _j^\mathrm{U} }} \, \hbox {SURE}\,\left( {\lambda ,\hat{\mathbf{d}} _j } \right) ,\, \quad j=j_0 ,\ldots ,J-1, \end{aligned}$$
(10)

where \(\lambda _j^\mathrm{U} =\hat{\sigma }\sqrt{2\ln 2^{j}}\) is the level-wise universal threshold [8], \(\hat{\sigma }\) is a robust estimate of the noise level, and \(\hat{\mathbf{d}} _j \) is the vector of empirical wavelet coefficients at level j whose elements are \(\hat{d}_{jk} \), \(k=\,0,\ldots ,2^{j}-1\). The main advantage of the SURE threshold is that when the underlying function has jump discontinuities, the reconstructed signal exhibits no artifacts near the discontinuities.
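Because the SURE function of (9) is piecewise quadratic and increasing in \(\lambda \) between consecutive sorted magnitudes \(|X_i|\), the minimization in (10) reduces to a finite search over the candidate set \(\{0\}\cup \{|X_i|\}\). A sketch (function names are ours):

```python
import numpy as np

def sure_soft(lam, X):
    """Unbiased risk estimate of eq. (9) for soft thresholding at level lam."""
    p = X.size
    n_small = np.sum(np.abs(X) <= lam)   # cardinality of I
    return p - 2 * n_small + np.sum(np.minimum(np.abs(X), lam) ** 2)

def sure_threshold_soft(X, lam_max):
    """Solve eq. (10): minimize SURE over [0, lam_max].
    SURE increases between sorted |X_i| values, so the minimum is attained
    at 0 or at one of the coefficient magnitudes; a finite search suffices."""
    absX = np.abs(X)
    cands = np.concatenate(([0.0, lam_max], absX[absX <= lam_max]))
    vals = [sure_soft(lam, X) for lam in cands]
    return cands[int(np.argmin(vals))]
```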

Donoho and Johnstone considered a hybrid scheme of the SURE threshold and the universal threshold—denoted by SureShrink—in order to avoid a serious drawback in situations of extreme sparsity of the wavelet coefficients as reported in [11]. This hybrid scheme is based on the following heuristic idea: If the set of empirical wavelet coefficients is judged to be sparsely represented, then the hybrid scheme defaults to the universal threshold\(\,\lambda _j^\mathrm{U} \) [8]; otherwise, the SURE criterion is used to select a threshold value.

In mathematical terms, the SureShrink is expressed as [11]:

$$\begin{aligned} \lambda _j^\mathrm{H} =\left\{ {{ \begin{array}{ll} {\lambda _{j}^\mathrm{U} }&{} \quad {\hbox {if}\,\left| \left| \hat{\mathbf{d}}_j\right| \right| _{2}^{2} \le \,\hat{\sigma } ^{2}2^{j/2}\left( {2^{j/2}+\,j^{3/2}} \right) } \\ {\lambda _j^\mathrm{s} \,}&{} \quad {\hbox {otherwise}} \\ \end{array} }} \right. , \quad j=j_0 ,\ldots ,J-1. \end{aligned}$$
(11)

The inequality in (11) is a good measure of sparsity [2, 11]. It is observed that when using the universal thresholds, the losses seem to be very small for sparse vectors compared to the case of using SURE, i.e., the hybrid method adaptively “keeps” or “shrinks” large coefficients and “kills” small ones.
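The hybrid rule (11) can be sketched directly (a simplified illustration assuming \(2^{j}\) coefficients at level j; names and the candidate-search strategy are ours):

```python
import numpy as np

def sureshrink_threshold(d_j, j, sigma_hat):
    """Hybrid threshold of eq. (11): default to the universal threshold when
    the level-j coefficients look sparse, otherwise minimize SURE of eq. (9)."""
    lam_U = sigma_hat * np.sqrt(2.0 * np.log(2.0 ** j))
    energy = np.sum(d_j ** 2)
    bound = sigma_hat ** 2 * 2 ** (j / 2) * (2 ** (j / 2) + j ** 1.5)
    if energy <= bound:                    # sparsity test of eq. (11)
        return lam_U
    # otherwise search SURE over candidate thresholds in [0, lam_U]
    absd = np.abs(d_j)
    cands = np.concatenate(([0.0], absd[absd <= lam_U]))
    p = d_j.size
    def sure(lam):
        return p - 2 * np.sum(absd <= lam) + np.sum(np.minimum(absd, lam) ** 2)
    return cands[int(np.argmin([sure(l) for l in cands]))]
```

An all-zero (maximally sparse) level triggers the universal branch, while a level full of large coefficients falls through to the SURE branch, which then keeps everything by selecting a zero threshold.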

4 The Hybrid Scheme with Trimmed Thresholding

The main advantage of the SureShrink method is that it combines the advantages of the universal threshold and the SURE threshold. Therefore, it is widely used in many applications such as speech and image denoising. In this section a hybrid scheme based on the SURE with trimmed thresholding and the universal threshold—to be referred to as the SureTrimm—is introduced. The trimmed thresholding Eq.  (4) can be rewritten as

$$\begin{aligned} \eta _\mathrm{T} \left( {X_j ,{\lambda }} \right) =X_j \left( {1-\frac{{\lambda }^{{\alpha }}}{\left| {X_j } \right| ^{\alpha }}} \right) I^{\prime }\left( j \right) \quad ,\quad \alpha \ge 1, \end{aligned}$$
(12)

where \(I^{\prime }\left( j \right) \,\)is the set indicator function of the set \(\mathcal{I}^{{\prime }}\) defined by \(\mathcal{I}^{{\prime }}\equiv \left\{ {j:\left| {X_j } \right| >\lambda } \right\} \) and \(X_1 ,\ldots ,X_p \,\) are independent \(N\left( {\mu _i ,1} \right) \), \(i=1,\ldots ,p\). Letting \(\hat{\mu } _j =\eta _\mathrm{T} \left( {X_j ,{\lambda }} \right) \) in (5) and solving for \(g_j \left( X \right) \) one obtains:

$$\begin{aligned} g_j \left( X \right) =\eta _\mathrm{T} \left( {X_j ,{\lambda }} \right) -X_j =X_j \left( {1-\frac{{\lambda }^{{\alpha }}}{\left| {X_j } \right| ^{\alpha }}} \right) I^{\prime }\left( j \right) -X_j. \quad \end{aligned}$$
(13)

Upon defining the set indicator function \(I\left( j \right) \) of the set \(\mathcal{I}\equiv \left\{ {j:\left| {X_j } \right| \le \lambda } \right\} \), the above equation reduces to:

$$\begin{aligned} g_j \left( X \right) =-X_j \,I\left( j \right) -\hbox {sgn}\left( {X_j } \right) .\frac{{\lambda }^{{\alpha }}}{\left| {X_j } \right| ^{\alpha -1}}I^{\prime }\left( j \right) . \end{aligned}$$
(14)
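A quick numeric sanity check (function names ours) confirms that (14) is consistent with (12) through the decomposition (5), i.e., \(\eta _\mathrm{T}(X_j,\lambda )=X_j+g_j(X)\):

```python
import numpy as np

def eta_T(X, lam, alpha):
    """Trimmed thresholding rule of eq. (12)."""
    out = np.zeros_like(X)
    big = np.abs(X) > lam
    out[big] = X[big] * (1.0 - (lam / np.abs(X[big])) ** alpha)
    return out

def g_trimmed(X, lam, alpha):
    """g of eq. (14): -X_j on I (|X_j| <= lam),
    -sgn(X_j) lam^alpha / |X_j|^(alpha-1) on I'."""
    g = -X.copy()
    big = np.abs(X) > lam
    g[big] = -np.sign(X[big]) * lam ** alpha / np.abs(X[big]) ** (alpha - 1)
    return g

X = np.array([-2.5, -0.7, 0.3, 1.4, 4.0])
for alpha in (1.0, 2.0, 6.0):
    assert np.allclose(eta_T(X, 1.0, alpha), X + g_trimmed(X, 1.0, alpha))
```

On \(\mathcal{I}^{\prime }\) the identity follows from \(\hbox {sgn}(X_j)/|X_j|^{\alpha -1}=X_j/|X_j|^{\alpha }\); on \(\mathcal{I}\) both sides vanish.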

Taking the partial derivative, one gets—as derived in “Appendix A”— the following formula:

$$\begin{aligned} \frac{\partial g_j }{\partial X_j }=-I\left( j \right) +\lambda ^{\alpha }\left\{ {\frac{\left( {\alpha -1} \right) }{\left| {X_j } \right| ^{\alpha }}-\frac{2\delta \left( {X_j } \right) }{\left| {X_j } \right| ^{\left( {\alpha -1} \right) }}} \right\} I^{{\prime }}\left( j \right) , \end{aligned}$$
(15a)

where \(\delta \left( \cdots \right) \) is the Dirac delta function. By ignoring the case of \(X_j =0\), the above equation simplifies to:

$$\begin{aligned} \frac{\partial g_j }{\partial X_j }=-I\left( j \right) +\frac{\lambda ^{\alpha }\left( {\alpha -1} \right) }{\left| {X_j } \right| ^{\alpha }}{I}^{\prime }(j). \end{aligned}$$
(15b)

It is straightforward to show that

$$\begin{aligned} \frac{\partial g_j }{\partial X_i }=0\quad ,\,i\ne j. \end{aligned}$$
(15c)

Using (15b), the divergence of g reduces to:

$$\begin{aligned} \nabla . g\left( X \right)\equiv & {} \mathop \sum \limits _{i=1}^p \frac{\partial g_i }{\partial X_i } \nonumber \\= & {} \mathop \sum \limits _{i\in \mathcal{I}} -1+\left( {\alpha -1} \right) \mathop {\sum }\limits _{i\in {\mathcal{I}}^{\prime }} \frac{{\lambda }^{{\alpha }}}{\left| {X_i } \right| ^{{\alpha }}} \nonumber \\= & {} \,-\#\mathcal{I}+\left( {\alpha -1} \right) {\lambda }^{{\alpha }}\mathop {\sum }\limits _{i\in {\mathcal{I}}^{\prime }} \frac{1}{\left| {X_i } \right| ^{{\alpha }}}. \end{aligned}$$
(16)
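The closed form (16) can be verified against central finite differences of (14), evaluated away from the kinks at \(|X_j|=\lambda \) and \(X_j=0\) where the Dirac term of (15a) lives (a numerical check; names and step size are ours):

```python
import numpy as np

def div_g(X, lam, alpha):
    """Closed-form divergence of eq. (16), ignoring the X_j = 0 case."""
    big = np.abs(X) > lam
    return -np.sum(~big) + (alpha - 1) * lam ** alpha * np.sum(
        1.0 / np.abs(X[big]) ** alpha)

def g_j(x, lam, alpha):
    """Scalar component g_j of eq. (14)."""
    if abs(x) <= lam:
        return -x
    return -np.sign(x) * lam ** alpha / abs(x) ** (alpha - 1)

# central finite differences away from the non-smooth points
X = np.array([-3.1, -0.6, 0.4, 1.7, 2.8])
lam, alpha, h = 1.0, 2.5, 1e-6
fd = sum((g_j(x + h, lam, alpha) - g_j(x - h, lam, alpha)) / (2 * h) for x in X)
assert abs(fd - div_g(X, lam, alpha)) < 1e-4
```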

Based on (14), the squared norm can be computed as:

$$\begin{aligned} \parallel g\left( X \right) \parallel ^{2}= & {} \mathop \sum \limits _{j=1}^p \left| {g_j } \right| ^{2} \nonumber \\= & {} \mathop \sum \limits _{j=1}^p \left| {X_j \,I\left( j \right) +\hbox {sgn}\left( {X_j } \right) \frac{\lambda ^{\alpha }}{\left| {X_j } \right| ^{\alpha -1}}{I}^{\prime }(j)} \right| ^{2} \nonumber \\= & {} \mathop \sum \limits _{j\in \mathcal{I}} \left| {X_j \,} \right| ^{2}+\mathop {\sum }\limits _{j\in {\mathcal{I}}^{\prime }} \left( {\frac{\lambda ^{\alpha }}{\left| {X_j } \right| ^{\alpha -1}}} \right) ^{2} \nonumber \\= & {} \mathop \sum \limits _{j\in \mathcal{I}} \left| {X_j \,} \right| ^{2}+\lambda ^{2\alpha }\mathop {\sum }\limits _{j\in {\mathcal{I}}^{\prime }} \frac{1}{\left| {X_j } \right| ^{2\left( {\alpha -1} \right) }}. \end{aligned}$$
(17)

For \(j\in \mathcal{I}\): \(\left| {X_j } \right| \le \lambda \) and \(\left| {X_j } \right| ^{\alpha }\le \lambda ^{\alpha }\) for \({\alpha }\ge 1\) resulting in \(\left| {X_j } \right| \le \frac{\lambda ^{\alpha }}{\left| {X_j } \right| ^{\alpha -1}}\) . For \(j\in \mathcal{I}^{{\prime }}\): \(\left| {X_j } \right| >\lambda \) and \(\left| {X_j } \right| ^{\alpha }>\lambda ^{\alpha }\) for \({\alpha }\ge 1\) resulting in \(\frac{\lambda ^{\alpha }}{\left| {X_j } \right| ^{\alpha -1}}<\left| {X_j } \right| \). Consequently, (17) reduces to:

$$\begin{aligned} \parallel g\left( X \right) \parallel ^{2}=\sum \limits _{i=1}^p \left[ \hbox {min}(\left| {X_i } \right| ,\frac{\lambda ^{\alpha }}{\left| {X_i } \right| ^{\alpha -1}})\right] ^{2}\,. \end{aligned}$$
(18)

According to Stein, if \(g\left( X \right) \) is a weakly differentiable function as in the case of trimmed thresholding, the \(\hbox {SURE}\left( {\lambda ;X} \right) \) function of (6b) upon utilizing (16) and (18) reduces to:

$$\begin{aligned} \hbox {SURE}\left( {\lambda ;X} \right)= & {} p-2\,\#\,\mathcal{I}+2\left( {\alpha -1} \right) \lambda ^{\alpha }\mathop \sum \limits _{i\in \mathcal{I}^{{\prime }}} \frac{1}{\left| {X_i } \right| ^{\alpha }} \nonumber \\&+\mathop \sum \limits _{i=1}^p \left[ {\min \left( {\left| {X_i } \right| ,\frac{\lambda ^{\alpha }}{\left| {X_i } \right| ^{\alpha -1}}} \right) } \right] ^{2}\quad . \end{aligned}$$
(19)
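A direct implementation of (19) (function name ours) also provides a consistency check: at \(\alpha =1\) the divergence term vanishes and \(\lambda ^{\alpha }/|X_i|^{\alpha -1}=\lambda \), so (19) must reduce to the soft-thresholding SURE of (9):

```python
import numpy as np

def sure_trimmed(lam, X, alpha):
    """Unbiased risk estimate of eq. (19) for trimmed thresholding."""
    p = X.size
    absX = np.abs(X)
    big = absX > lam                       # the set I'
    term_div = 2.0 * (alpha - 1.0) * lam ** alpha * np.sum(
        1.0 / absX[big] ** alpha)
    term_norm = np.sum(np.minimum(absX, lam ** alpha / absX ** (alpha - 1)) ** 2)
    return p - 2 * np.sum(~big) + term_div + term_norm

# alpha = 1 recovers the soft-thresholding SURE of eq. (9)
rng = np.random.default_rng(1)
X = 2.0 * rng.standard_normal(64)
lam = 1.3
soft = X.size - 2 * np.sum(np.abs(X) <= lam) + np.sum(np.minimum(np.abs(X), lam) ** 2)
assert np.isclose(sure_trimmed(lam, X, 1.0), soft)
```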

Applying the above formula to the entire set of resolution levels, an expression for the trimmed threshold \(\lambda _j^\mathrm{T} \) is obtained as:

$$\begin{aligned} \lambda _j^\mathrm{T} ={\mathop {\arg {\min }}\limits _{0\le \lambda \le \lambda _j^\mathrm{U} }} \, \hbox {SURE}\left( {\lambda ;\hat{\mathbf{d}} _j } \right) ,\quad j=j_0 ,\ldots ,J-1. \end{aligned}$$
(20)

Consequently, the threshold \(\lambda _j^\mathrm{HT} \) of the hybrid scheme based on the SURE with trimmed thresholding and the universal threshold (SureTrimm) reduces to:

$$\begin{aligned} \lambda _j^\mathrm{HT} =\left\{ {{\begin{array}{ll} {\lambda _j^\mathrm{U} }&{} {\quad \hbox {if}\,\left| \left| \hat{\mathbf{d}}_j\right| \right| _2^2 \le \,\hat{\sigma } ^{2}2^{j/2}\left( {2^{j/2}+\,j^{3/2}} \right) } \\ {\lambda _j^\mathrm{T} \,}&{} \quad {\hbox {otherwise}}. \\ \end{array} }} \right. \, \end{aligned}$$
(21)

The main difference between the hybrid scheme of the SURE with trimmed thresholding and the universal threshold (SureTrimm)— briefly presented in [16]—and the SureShrink algorithm of [11] is the use of Eq. (21) in place of Eq.  (11) in [11], whereas the main difference between the proposed algorithm of the present paper and that of [16] is the use of an optimal \(\alpha \) in contrast to using an arbitrarily fixed \(\alpha \) in [16].

The solution of the optimization problem (20) follows the same approach adopted in [11]. One starts by rewriting (19) as:

$$\begin{aligned} \hbox {SURE}\left( {\lambda ;\hat{\mathbf{d}} _j } \right)= & {} p-2\,\#\,\mathcal{I}+2\left( {\alpha -1} \right) \mathop {\sum }\limits _{i\in {\mathcal{I}}^{\prime }} \left( {\frac{\lambda }{\left| {\left( {\hat{\mathbf{d}} _j } \right) _i } \right| }} \right) ^{\alpha } \nonumber \\&+\,\mathop \sum \limits _{i=1}^p \left| {\left( {\hat{\mathbf{d}} _j } \right) _i } \right| ^{2}\left[ {\min \left( {1,\left( {\frac{\lambda }{\left| {\left( {\hat{\mathbf{d}} _j } \right) _i } \right| }} \right) ^{\alpha }} \right) } \right] ^{2}, \end{aligned}$$
(22)

where \(\left( {\hat{\mathbf{d}}_j } \right) _i \) is the ith element of \(\hat{\mathbf{d}}_j \). Rearrange the absolute values of the p elements of the wavelet coefficients vector \(\hat{\mathbf{d}}_j \) in ascending order to form the nonnegative vector \({{\varvec{v}}}_j =\left[ {{\begin{array}{lll} {y_1 }&{} \ldots &{} {y_p } \\ \end{array} }} \right] ^{T}\), where \(y_1 \le y_2 \le \ldots \le y_p \). Let \(\zeta \) be the index number in the set \(\left\{ {1,\ldots ,p} \right\} \) which satisfies \(y_\zeta \le \lambda \) and \(y_{\zeta +1} >\lambda \). Consequently, \(\hbox {SURE}\left( {\lambda ;\hat{\mathbf{d}} _j } \right) \) will be renamed \(\hbox {SURE}\left( {\lambda ;{{\varvec{v}}}_j } \right) \) and (22) will be rewritten in the following simpler form:

$$\begin{aligned} \hbox {SURE}\left( {\lambda ;{{\varvec{v}}}_j } \right) =p-2\zeta +2\left( {\alpha -1} \right) \mathop \sum \limits _{i=\zeta +1}^p \left( {\frac{\lambda }{y_i }} \right) ^{\alpha }+\mathop \sum \limits _{i=1}^\zeta y_i^2 +\mathop \sum \limits _{i=\zeta +1}^p y_i^2 \left( {\frac{\lambda }{y_i }} \right) ^{2\alpha }.\nonumber \\ \end{aligned}$$
(23)

The corresponding trimmed threshold \(\lambda _j^\mathrm{T} \) of (20) will take the form:

$$\begin{aligned} \lambda _j^\mathrm{T} ={\mathop {\arg {\min }}\limits _{0\le \lambda \le \lambda _j^\mathrm{U}} } \, \hbox {SURE}\,\left( {\lambda ;{{{\varvec{v}}}}_{j} } \right) ,\quad j=j_0 ,\ldots ,J-1. \end{aligned}$$
(24)

Finally, the threshold \(\lambda _j^\mathrm{HT} \) of the hybrid scheme based on the SURE with trimmed thresholding and the universal threshold (SureTrimm) will be obtained from (21). Moreover, one can achieve an overall mean squared error (MSE) lower than that achieved when using a fixed value of \(\alpha \) by optimizing the parameter \(\alpha \) at the coarsest resolution level \(j_0 \) and choosing the threshold accordingly. Toward that goal a heuristic approach is adopted here.

In order to specify a value for \(\alpha \ge 1\) that achieves a minimum value of \(\hbox {SURE}\left( {\lambda ;\hat{\mathbf{d}} _j } \right) \) of (22) over each decomposition level j, the rearranged simple form of SURE, namely \(\hbox {SURE}\left( {\lambda ;{{\varvec{v}}}_{j_0 } } \right) \) of (23), is computed at discrete values of \(\alpha \) over a suitably chosen interval \(\left[ {1,\bar{\alpha }} \right] \), where \(\bar{\alpha }\) is an arbitrary upper bound for \(\alpha \). Consequently, the minimum value of SURE over the chosen interval provides the optimal \(\hat{\alpha }\) with the corresponding optimal threshold \(\lambda _{j_0 }^\mathrm{T} \). As a result, at each level of decomposition j, a minimum value of \(\hbox {SURE}\left( {\lambda ;\hat{\mathbf{d}}_j } \right) \) of (22) is computed using the optimal \(\hat{\alpha }\). After finding the optimal threshold \(\lambda _j^\mathrm{HT} \) using (21), the trimmed thresholding of (12) using the optimal \(\hat{\alpha }\) is applied to the UWT coefficients to get \(\eta _\mathrm{T} \left( {\hat{d}_{jk} ,\lambda _j^\mathrm{HT} } \right) \). Finally, the denoised signal is reconstructed using the IUWT with the coefficients {\(\hat{c}_{j_0 ,k} , \quad \eta _\mathrm{T} \left( {\hat{d}_{jk} ,\lambda _j^\mathrm{HT} } \right) \}.\) The main steps are summarized in the following algorithm:

Algorithm:

Given the noisy signal samples y, find an estimate of the original signal f according to model (1).

Input arguments: y, J, \(j_0 \), \(\bar{\alpha }\) (an arbitrary upper bound for \(\alpha )\).

Procedure:

  i. Compute the forward UWT coefficients of the noisy signal y: \(w_{j,k} =\left( {\hat{c}_{j_0 ,k} , \hat{d}_{j,k} } \right) ,\) for \(k=0,\,1,\ldots ,2^{J}-1,\,j=j_0 ,\ldots ,J-1\).

  ii. Compute the minimum value of \(\hbox {SURE}\left( {\lambda ;{{\varvec{v}}}_j } \right) \) of (23) for \(j=j_0 \), and find the corresponding \(\lambda _{j_0 }^\mathrm{T} \) using (24) at discrete values of \(\alpha \) in \(\left[ {1,\bar{\alpha }} \right] \). Select the \(\hat{\alpha } \) which achieves the smallest value of SURE.

  iii. For \(j=j_0 ,\ldots ,J-1\):

    a. Compute \(\lambda _j^\mathrm{T} \) that minimizes SURE from Eq. (24) using \(\hat{\alpha }\) in (23).

    b. Determine \(\lambda _j^\mathrm{HT} \), the hybrid threshold of (21).

    c. Apply trimmed thresholding to the wavelet coefficients using (12) to get \(\eta _\mathrm{T} \left( {\hat{d}_{jk} ,\lambda _j^\mathrm{HT} } \right) \).

  iv. Compute the inverse undecimated wavelet transform (IUWT) of the thresholded coefficients \(\{\hat{c}_{j_0 ,k} ,\,\eta _\mathrm{T} \left( {\hat{d}_{jk} ,\lambda _j^\mathrm{HT} } \right) \}\) and use them in reconstructing the denoised signal.

This simple algorithm achieves a better output SNR than other wavelet-based methods as will be illustrated in the next section. The whole process is computationally efficient because the UWT and IUWT of steps (i) and (iv), respectively, can be performed by a fast algorithm having a computational load of the order of \(n\log \left( n \right) \), where n is the signal size.
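The \(\alpha \) selection of step ii can be sketched as a two-dimensional grid search (a simplified illustration; the grid resolutions and function names are ours, and the \(\lambda \) grid stands in for the exact candidate search of (24)):

```python
import numpy as np

def sure_trimmed(lam, X, alpha):
    """Eq. (19) evaluated at threshold lam and trimming parameter alpha."""
    p, absX = X.size, np.abs(X)
    big = absX > lam
    return (p - 2 * np.sum(~big)
            + 2 * (alpha - 1) * lam ** alpha * np.sum(1.0 / absX[big] ** alpha)
            + np.sum(np.minimum(absX, lam ** alpha / absX ** (alpha - 1)) ** 2))

def select_alpha(d_j0, lam_U, alpha_max, n_alpha=21, n_lam=64):
    """Step ii: search alpha in [1, alpha_max] and lam in [0, lam_U];
    return the (alpha, lam) pair achieving the smallest SURE value."""
    best = (np.inf, 1.0, 0.0)
    for alpha in np.linspace(1.0, alpha_max, n_alpha):
        for lam in np.linspace(0.0, lam_U, n_lam):
            s = sure_trimmed(lam, d_j0, alpha)
            if s < best[0]:
                best = (s, alpha, lam)
    return best[1], best[2]       # (alpha_hat, lambda_T at level j0)
```

By construction the returned pair can never do worse, in SURE terms, than any fixed-\(\alpha \) choice on the same grid, which is the rationale for preferring the optimized \(\hat{\alpha }\) over an arbitrarily fixed one.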

5 Comparative Study

In order to illustrate the effectiveness of the proposed algorithm, the four standard test signals of Donoho and Johnstone [10]—“Doppler,” “Bumps,” “Blocks” and “HeaviSine”—as well as two electrocardiogram signals—“ECG100.dat” and “ECG117.dat” downloaded from PhysioNet [14]—are selected. Additive white Gaussian noise (AWGN) is added to these signals such that the signal-to-noise ratio (SNR) is 5 dB. The standard SNR definition is:

$$\begin{aligned} \hbox {SNR}=10\log _{10} \left[ {\frac{\mathop \sum \nolimits _{l=0}^{n-1} f^{2}\left( l \right) }{\mathop \sum \nolimits _{l=0}^{n-1} \left[ \hat{f} \left( l \right) -f\left( l \right) \right] ^{2}}\,} \right] , \end{aligned}$$
(25)

where \(f\left( l \right) \) and \(\hat{f} \left( l \right) \) are the lth samples of the clean and denoised signals, respectively. A sample of size \(n\,=\,1024\) for the four standard test signals and a frame length \(n_\mathrm{F} =512\) for the ECG signals are chosen, and the undecimated wavelet transform based on the Symlet8 wavelets with five decomposition levels is applied. The noisy signal is filtered using the proposed hybrid scheme of the SURE with trimmed thresholding and the universal threshold (SureTrimm) with both fixed values of alpha \(\left( {\alpha =2} \right) \) and \(\left( {\alpha =6} \right) \) and the optimized alpha \(\left( {\hat{\alpha }} \right) \); the results are denoted by \(\hbox {TT}\left( {\alpha =2} \right) ,\,\hbox {TT}\left( {\alpha =6} \right) \) and \(\hbox {TT}^{*}\), respectively. For the sake of comparison, the following methods are also applied: hard thresholding (HT), hybrid SURE with soft thresholding (ST) [11], spectral subtraction (SS) [3], phase spectrum compensation (PSC) [23] and block thresholding (BT) [26].
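Equation (25) translates directly to code (function name ours):

```python
import numpy as np

def snr_db(f, f_hat):
    """Output SNR of eq. (25): clean-signal energy over residual energy, in dB."""
    f, f_hat = np.asarray(f, dtype=float), np.asarray(f_hat, dtype=float)
    return 10.0 * np.log10(np.sum(f ** 2) / np.sum((f_hat - f) ** 2))
```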

Fig. 1

a Doppler signal; b noisy signal with \(\hbox {SNR} =5\) dB

Fig. 2

a Bumps signal; b noisy signal with \(\hbox {SNR} =5\) dB

Fig. 3

a Blocks signal; b noisy signal with \(\hbox {SNR} =5\) dB

Fig. 4

a HeaviSine signal; b noisy signal with \(\hbox {SNR} =5\) dB

Fig. 5

Doppler signal shown in Fig. 1 denoised using a hard thresholding (HT), b SureShrink (ST), c SureTrimm with fixed alpha \(\hbox {TT}(\,\alpha =2)\), d \(\hbox {TT}(\alpha =6)\), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*}\)), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT). (The SNR is in dB.)

Fig. 6

Bumps signal shown in Fig. 2 denoised using a hard thresholding (HT), b SureShrink (ST), c SureTrimm with fixed alpha \(\hbox {TT}(\,\alpha =2)\), d \(\hbox {TT}(\alpha =6)\), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*})\), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT). (The SNR is in dB.)

Fig. 7

Blocks signal shown in Fig. 3 denoised using a hard thresholding (HT), b SureShrink (ST), c SureTrimm with fixed alpha \(\hbox {TT}(\,\alpha =2)\), d \(\hbox {TT}(\alpha =6)\), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*})\), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT). (The SNR is in dB.)

Fig. 8

HeaviSine signal shown in Fig. 4 denoised using a hard thresholding (HT), b SureShrink (ST), c SureTrimm with fixed alpha \(\hbox {TT}(\alpha =2)\), d \(\hbox {TT}(\alpha =6)\), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*})\), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT). (The SNR is in dB.)

Table 1 Denoising results of Doppler signal
Table 2 Denoising results of Bumps signal
Table 3 Denoising results of Blocks signal
Table 4 Denoising results of HeaviSine signal
Fig. 9

a ECG100 signal, b noisy signal, denoised signal using c hard thresholding (HT), d SureShrink (ST), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*})\), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT)

Fig. 10

a ECG117 signal; b noisy signal, denoised signal using c hard thresholding (HT), d SureShrink (ST), e SureTrimm with optimized \(\alpha \) (\(\hbox {TT}^{*})\), f spectral subtraction (SS), g phase spectrum compensation (PSC) and h block thresholding (BT)

Figures 1, 2, 3 and 4 show the clean signals in part (a) and the noisy signals with \(\hbox {SNR} = 5\) dB in part (b). The corresponding denoised signals are shown in Figs. 5, 6, 7 and 8, where the results of applying the HT and ST are shown in parts (a) and (b), respectively. Parts (c) and (d) show the results of the \(\hbox {TT}\left( {\alpha =2} \right) \) and \(\hbox {TT}\left( {\alpha =6} \right) \), respectively. Part (e) in those figures shows the results of applying the \(\hbox {TT}^{*}\), where the indicated values of \(\alpha \) are the optimal ones that achieve the minimal risk estimate. Parts (f), (g) and (h) in Figs. 5, 6, 7 and 8 show the denoised signal using the non-wavelet methods: the SS, the PSC and the BT, respectively. An examination of Figs. 5, 6, 7 and 8 reveals the improved performance of the proposed method (\(\hbox {TT}^{*}\)) over both wavelet-based and non-wavelet-based denoising methods. The \(\hbox {TT}^{*}\) method significantly reduces the noise while preserving the edges of the filtered signal.

The same methods are applied over a range of input SNRs {− 2, 0, 2, 5, 10, 15} dB, and the SNRs of the denoised signal—computed according to (25)—are listed in Tables 1, 2, 3 and 4 for the four standard test signals. The results testify to the improved performance of the proposed method (\(\hbox {TT}^{*})\) compared to the other methods for the entire range of input SNR values.

The results of denoising the two electrocardiogram signals—“ECG100.dat” and “ECG117.dat”—are shown in Figs. 9 and 10, respectively. In each of these two figures, part (a) shows the original ECG signal and part (b) shows the noisy signal with \(\hbox {SNR} = 5\)  dB. Parts (c)–(h), respectively, show the results obtained by applying the following six methods: HT, ST, \(\hbox {TT}^{*}\), SS, PSC and BT.

6 Conclusion

The problem of extracting a signal from measurements contaminated by additive noise is considered, and a wavelet-based technique is proposed in which the undecimated wavelet transform (UWT) detail coefficients are thresholded with the threshold determined according to a hybrid scheme incorporating trimmed thresholding. More specifically, an expression is derived for Stein's unbiased risk estimator (SURE) based on trimmed thresholding and is then minimized in order to determine the threshold value. The philosophy of the hybrid scheme is either to employ the SURE-optimal threshold value or to default to a universal threshold in the case of sparse wavelet detail coefficients. In addition to applying optimization for the determination of the threshold value, a heuristic approach is followed for the selection of the optimal trimmed thresholding parameter \(\alpha \). An extensive comparison is made between the proposed method and other wavelet-based methods (hard thresholding and the hybrid SURE with soft thresholding) as well as non-wavelet-based methods (spectral subtraction, phase spectrum compensation and block thresholding). The simulation study—including both test signals and ECG signals—demonstrates the merit of the proposed method in terms of the SNR of the denoised signal.