Introduction

Seismic data are invariably contaminated by various types of noise during field acquisition (Zhong et al. 2015; Zheng et al. 2017; Sun and Li 2020; Liu et al. 2023). Among them, random noise is one of the most prominent (Bing et al. 2020), and it is uncorrelated from trace to trace (Oropeza et al. 2011). Failure to suppress random noise effectively can severely affect subsequent seismic processing tasks, such as seismic amplitude interpretation, multiple attenuation, pre-stack migration imaging, and seismic inversion. Attenuating random noise, and thereby enhancing the signal-to-noise ratio (SNR) of seismic data, is therefore a primary issue in geophysics.

In the past several decades, numerous approaches have been proposed for the denoising of seismic data, and they can be generally classified into four categories. (I) Predictive filtering methods exploit the predictability of the signal to construct predictive filters for noise removal, for example, \(f-x\) deconvolution (Canales 1984), forward-backward prediction approach (Wang 1999), \(t-x\) predictive filtering (Abma and Claerbout 1995), non-stationary predictive filtering (Chen and Ma 2014), nonstationary polynomial fitting (Liu et al. 2011), nonstationary signal inversion (Yang et al. 2020), nonstationary predictive filtering (Wang et al. 2021), and 3-D structural complexity-guided predictive filtering (Fu et al. 2022). (II) Decomposition-based approaches transform noisy seismic data into an ensemble of modes and then select the signal-dominant modes to reconstruct the desired signals. The widely utilized techniques include empirical mode decomposition (EMD) (Huang et al. 1998; Long et al. 2021) and its variations (Wu and Huang 2009; Torres et al. 2011), variational mode decomposition (VMD) (Dragomiretskiy and Zosso 2014; Yu and Ma 2018; Li et al. 2018; Liu and Duan 2020; Yao et al. 2021), two-dimensional variational mode decomposition (2DVMD) (Zhang et al. 2022), mini-batch multivariate variational mode decomposition (MMVMD) (Wu et al. 2022), singular value decomposition (SVD) (Ji and Wang 2022), and local singular value decomposition (LSVD) (Bekara and van der Baan 2007). (III) Transformed-domain-based methods and dictionary learning can be classified as sparse representation methods (Dong et al. 2022). They first map seismic data into a transformed domain in which the data are sparse, then shrink the coefficients using a threshold function, and finally back-transform the processed coefficients to the time domain. Such algorithms include wavelet transform (Sinha et al. 2005; Lan et al. 2023), wavelet adaptive thresholding (Zhang et al. 2022), radon transform (Trad et al. 2002), curvelet transform (Herrmann et al. 2007; Yang et al. 2017), seislet transform (Chen and Fomel 2018), shearlet transform (Kong and Peng 2015), dreamlet transform (Wang et al. 2015), time-synchroextracting transform (Li et al. 2020), and matching demodulation synchrosqueezing S transform (Wang et al. 2022). (IV) Rank-based methods, e.g., rank reduction, and low-rank and sparse decomposition. The former assumes that the seismic signal is approximately low-rank while the noise increases the rank of the data matrix (Chen et al. 2017); thus, one can remove the random noise by implementing a rank-reduction procedure. In the second strategy, one obtains a low-rank component and a sparse component by applying an appropriate algorithm to the time-frequency representation of a noisy seismic trace, and then recovers the denoised data with the aid of the obtained low-rank component. Various denoising methods related to such techniques have gradually been used in practice, such as classical Cadzow filtering (Sacchi 2009; Trickett 2008), singular spectrum analysis (Oropeza et al. 2011), damped singular spectrum analysis (Huang et al. 2016; Chen et al. 2016), low-rank matrix approximation (Siahsar et al. 2016; Anvari et al. 2017), improved low-rank matrix approximation (Li et al. 2020), enhanced low-rank matrix estimation (Oboue and Chen 2021), low-rank tensor minimization (Feng et al. 2022), adaptive weighting rank-reduction (Bayati and Trad 2023), and low-rank total variation (Ghosh et al. 2023). In addition, the geophysics community has developed a number of deep-learning-based denoising methods for seismic data since 2016. Although these deep-learning-based methods show better denoising performance, they still have drawbacks, such as time-consuming training and dependence on the training dataset (Dong et al. 2019; Yuan et al. 2018; Yu et al. 2019; Dong and Li 2021; Dong et al. 2021).

More recently, a novel reassignment algorithm termed the time-reassigned synchrosqueezing transform (TSST) (He et al. 2019) was introduced as an effective technique for analyzing time-varying non-stationary signals. In the TSST, the time-frequency coefficients are reassigned along the time axis rather than along the frequency axis, as is done in the STFT-based synchrosqueezing transform (FSST). Such a technique achieves a sharpened time-frequency representation (TFR) for a wide class of strongly modulated signals. However, the original TSST algorithm is built on the short-time Fourier transform (STFT) with a fixed window. Therefore, once the window function is defined, the time-frequency resolution is also determined, which means that high resolution in both time and frequency cannot be achieved simultaneously. In fact, a wide window is more suitable for a low-frequency component of the signal, while a narrow window is helpful for characterizing the high-frequency content (Li et al. 2020). To deal with this issue, we propose in this paper to improve the existing TSST by introducing a time-varying window function; the resulting transform is called the adaptive TSST (ATSST). The ATSST achieves optimized energy concentration and signal recovery for a class of multicomponent non-stationary signals with fast time-varying features.

Inspired by the high-resolution property of the ATSST and the low-rank strategy for noise reduction, we integrate the ATSST and low-rank matrix recovery to achieve seismic random noise attenuation. The proposed method consists of three main steps. First, the noisy seismic trace is decomposed into a sparse time-frequency matrix by using the ATSST. Then, an optimization problem is solved via an improved OptShrink algorithm to obtain a low-rank matrix. Finally, the filtered signal is retrieved from the obtained low-rank time-frequency matrix through the inverse ATSST. Our contributions are as follows: (1) we propose an adaptive TSST (ATSST) algorithm to enhance the time-frequency concentration for seismic signals; (2) we combine the ATSST and the improved OptShrink algorithm for seismic random noise suppression; (3) our method shows a better ability to preserve signal amplitude compared with some traditional approaches.

The structure of this paper is as follows. Section II presents the ATSST in detail as a time-frequency representation of the non-stationary seismic signal, and then introduces the OptShrink algorithm and improves its implementation in order to obtain a low-rank component related to the seismic signal from the noisy time-frequency matrix. In Section III, the proposed method is tested on synthetic data and field datasets, and we compare the presented approach with the \(f-x\) deconvolution and the Cadzow filtering. A discussion concerning the parameter settings of the denoising methods is provided in Section IV. Section V draws some key conclusions.

Theory

Time-reassigned synchrosqueezing transform (TSST)

The standard STFT of an input signal f is expressed as below:

$$\begin{aligned} V_f^g \left( {\mu ,\xi }\right) = \int {f\left( \tau \right) } g\left( {\tau - \mu }\right) e^{ - 2i\pi \xi \tau } d\tau , \end{aligned}$$
(1)

where \(\mu\) and \(\xi\) denote the time and frequency, respectively, and g is a real-valued window function.

Now, an impulse signal is considered:

$$\begin{aligned} {f_\delta }\left( \tau \right) = A\delta \left( {\tau {-}{\tau _0}}\right) , \end{aligned}$$
(2)

where A denotes the amplitude.

Substituting Eq. (2) into Eq. (1), we get:

$$\begin{aligned} V_{{f_\delta }}^g\left( {\mu ,\xi }\right)&= \int {{f_\delta }\left( \tau \right) } g\left( {\tau - \mu }\right) {e^{ - i2\pi \xi \tau }}d\tau \nonumber \\&= Ag\left( {{\tau _0} - \mu }\right) {e^{ - i2\pi \xi {\tau _0}}}. \end{aligned}$$
(3)
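As a numerical illustration of Eqs. (1) and (3), the following minimal sketch evaluates the STFT integral by a direct Riemann sum and compares it with the closed form for an impulse. The function name `stft_direct`, the Gaussian choice of window, and all grid values are assumptions made for this example only; they are not taken from the paper.

```python
import numpy as np

def gaussian(tau, sigma=0.05):
    # Real-valued window g; a Gaussian is assumed here for illustration.
    return np.exp(-0.5 * (tau / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def stft_direct(f, tau, mu, xi, window):
    """Riemann-sum approximation of Eq. (1).

    f   : samples of the signal on the grid tau
    mu  : time positions of the window
    xi  : frequencies (Hz)
    Returns an array V of shape (len(mu), len(xi)).
    """
    dt = tau[1] - tau[0]
    win = window(tau[None, :] - mu[:, None])                      # g(tau - mu)
    kernel = np.exp(-2j * np.pi * xi[:, None] * tau[None, :])     # e^{-2 i pi xi tau}
    return (win * f[None, :]) @ kernel.T * dt

# Discrete impulse A * delta(tau - tau0): one sample of height A / dt.
tau = np.arange(0.0, 1.0, 0.001)
A, n0 = 2.0, 400
tau0 = tau[n0]
f_delta = np.zeros_like(tau)
f_delta[n0] = A / (tau[1] - tau[0])

mu = np.linspace(0.3, 0.5, 21)
xi = np.linspace(0.0, 100.0, 11)
V = stft_direct(f_delta, tau, mu, xi, gaussian)

# Closed form of Eq. (3): A g(tau0 - mu) e^{-2 i pi xi tau0}
V_closed = A * gaussian(tau0 - mu)[:, None] * np.exp(-2j * np.pi * xi[None, :] * tau0)
```

For the impulse, the two arrays agree to floating-point precision because the Riemann sum contains a single nonzero term.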

Then, the first-order group delay (GD) estimator is written as:

$$\begin{aligned} {t_{{f_\delta }}}\left( {\mu ,\xi }\right) = - R \left[ {\frac{{{\partial _\xi }V_{{f_\delta }}^g \left( {\mu ,\xi }\right) }}{{i2\pi V_{{f_\delta }}^g \left( {\mu ,\xi }\right) }}} \right] , \end{aligned}$$
(4)

where \(R\left\{ \bullet \right\}\) denotes the real part of a complex number.
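As a short check, using only Eqs. (3) and (4), the GD estimator of the impulse can be evaluated in closed form. Differentiating Eq. (3) with respect to \(\xi\) and inserting the result into Eq. (4) gives, wherever \(V_{{f_\delta }}^g \left( {\mu ,\xi }\right) \ne 0\):

$$\begin{aligned} {\partial _\xi }V_{{f_\delta }}^g \left( {\mu ,\xi }\right)&= - i2\pi {\tau _0}V_{{f_\delta }}^g \left( {\mu ,\xi }\right) , \nonumber \\ {t_{{f_\delta }}}\left( {\mu ,\xi }\right)&= - R \left[ {\frac{{ - i2\pi {\tau _0}V_{{f_\delta }}^g \left( {\mu ,\xi }\right) }}{{i2\pi V_{{f_\delta }}^g \left( {\mu ,\xi }\right) }}} \right] = {\tau _0}. \end{aligned}$$

In other words, every STFT coefficient of the impulse is assigned the same time location \(\tau _0\), regardless of \(\mu\), which is why reassigning along the time axis concentrates the energy at the true occurrence time.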

Finally, the TSST is formulated as:

$$\begin{aligned} T_{{f_\delta }}^g\left( {t,\xi }\right) = \int _{\psi \left( \xi \right) } {V_{{f_\delta }}^g \left( {\mu ,\xi }\right) \delta \left( {t - {t_{{f_\delta }}} \left( {\mu ,\xi }\right) }\right) } d\mu , \end{aligned}$$
(5)

and the mode reconstruction from TSST is achievable by:

$$\begin{aligned} f\left( \tau \right) = \frac{1}{{2\pi \hat{g} \left( 0\right) }}\iint _{{R^2}} {T_{{f_\delta }}^g \left( {t,\xi }\right) {e^{i2\pi \xi \tau }}dt d\xi } . \end{aligned}$$
(6)

Adaptive time-reassigned synchrosqueezing transform (ATSST)

It is noteworthy that the above-mentioned TSST algorithm is intrinsically based on the STFT with a fixed window size, that is to say, the high time resolution and frequency resolution cannot be attained simultaneously. However, for a multicomponent signal, a short window is helpful for analyzing the high-frequency content while a long window is beneficial to the low-frequency one (Li et al. 2020). Thus, the selection of the optimal window function for signal analysis is a vital issue. In this paper, we propose an adaptive TSST (ATSST) by introducing a time-varying window function, in which the window width can be adaptively determined.

Herein, we consider a time-varying window \({g_\sigma }\left( \tau \right)\):

$$\begin{aligned} {g_\sigma }\left( \tau \right) = \frac{1}{{\sigma \left( t\right) }}g \left( {\frac{\tau }{{\sigma \left( t\right) }}}\right) , \end{aligned}$$
(7)

where both \(\sigma\) and g are the positive functions with respect to t, and \(g\left( 0\right) \ne 0\). Here, \(g\left( t\right)\) is defined as:

$$\begin{aligned} g\left( t\right) = \frac{1}{{\sqrt{2\pi } }}{e^{ - \frac{{{t^2}}}{2}}}. \end{aligned}$$
(8)

Thus, \({g_\sigma }\left( \tau \right)\) is a Gaussian function. It should be pointed out that the width of the window function \({g_\sigma }\left( \tau \right)\) in the time domain depends on the parameter \(\sigma\) in Eq. (7), since the time duration of \({g_\sigma }\left( \tau \right)\) is \(\sigma\); the parameter \(\sigma\) therefore has an important influence on the TFR of a signal.
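The scaled window of Eqs. (7) and (8) can be sketched in a few lines. The function name `gaussian_window` and the two example values of \(\sigma\) are assumptions chosen for illustration; the paper does not prescribe them.

```python
import numpy as np

def gaussian_window(tau, sigma):
    """g_sigma(tau) = g(tau / sigma) / sigma with g the unit Gaussian of Eq. (8)."""
    tau = np.asarray(tau, dtype=float)
    return np.exp(-0.5 * (tau / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

# Illustrative grid and widths (assumed values, not from the paper).
tau = np.linspace(-1.0, 1.0, 4001)
dt = tau[1] - tau[0]
narrow = gaussian_window(tau, 0.01)   # short window: better time localization
wide = gaussian_window(tau, 0.05)     # long window: better frequency localization

# The 1/sigma factor keeps the area of the window equal to one for any sigma.
area_narrow = np.sum(narrow) * dt
area_wide = np.sum(wide) * dt
```

Because of the \(1/\sigma\) normalization, changing \(\sigma\) trades peak height against duration without changing the area of the window.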

Subsequently, combining Eqs. (7) and (8) and substituting them into Eq. (3), the STFT of \(f_\delta\) with a time-varying \(\sigma\) can be obtained:

$$\begin{aligned} \tilde{{V_{{f_\delta }}^{{g_\sigma }}}} \left( {\mu ,\xi }\right)&= \int {{f_\delta }\left( \tau \right) } {g_\sigma }\left( {\tau - \mu }\right) {e^{ - i2\pi \xi \tau }}d\tau \nonumber \\&= A\frac{1}{{\sqrt{2\pi } \sigma }} e^{ - \left[ \frac{\left( \tau _0 - \mu \right) ^2}{2\sigma ^2} + i2\pi \xi \tau _0 \right] }. \end{aligned}$$
(9)

Under the framework of TSST algorithm, the GD estimator can be rewritten as:

$$\begin{aligned} \tilde{{{t_{{f_\delta }}}}} \left( {\mu ,\xi }\right) =- R\left[ {\frac{{{\partial _\xi } \tilde{{V_{{f_\delta }}^{{g_\sigma }}}} \left( {\mu ,\xi }\right) }}{{i2\pi \tilde{{V_{{f_\delta }}^{{g_\sigma }}}} \left( {\mu ,\xi }\right) }}} \right] . \end{aligned}$$
(10)

Finally, the ATSST is reformulated as:

$$\begin{aligned} T_{{f_\delta }}^{{g_\sigma }}\left( {t,\xi }\right) =\int _{\psi \left( \xi \right) } {\tilde{{V_{{f_\delta }}^{{g_\sigma }}}} \left( {\mu ,\xi }\right) \delta \left( {t - \tilde{{{t_{{f_\delta }}}}} \left( {\mu ,\xi }\right) }\right) } d\mu . \end{aligned}$$
(11)

The mode reconstruction based on ATSST is expressed as:

$$\begin{aligned} f\left( \tau \right) = \frac{1}{{2\pi \hat{g} \left( 0\right) }}\iint _{{R^2}} {T_{{f_\delta }}^{{g_\sigma }} \left( {t,\xi }\right) {e^{i2\pi \xi \tau }}dt d\xi } . \end{aligned}$$
(12)

Calculation of time-varying \(\sigma\)

We employ the Rényi entropy to measure the distribution concentration of a TFR (Sheu et al. 2017). The \(\alpha\)-Rényi entropy of a nonzero function f is defined as:

$$\begin{aligned} {R_\alpha }\left( f\right) : = \frac{1}{{1 - \alpha }}{\log _2} {\left( {\frac{{{{\left\| f \right\| }_{2\alpha }}}}{{{{\left\| f \right\| }_2}}}}\right) ^{2\alpha }}, \end{aligned}$$
(13)

where \(\alpha > 0\) and \({\left\| f \right\| _\alpha }:={\left( {\int {{{\left| {f\left( x\right) } \right| }^\alpha }dx} }\right) ^{{1/\alpha }}}\). Generally speaking, \(\alpha > 2\) is recommended for TFR measures (Stankovic 2001). In this paper, we choose \(\alpha = 3\). A detailed description of these parameters can be found in Stankovic (2001). A lower Rényi entropy indicates a more concentrated TFR. Therefore, we aim to find the window width that yields the lowest Rényi entropy.

The measure of distribution concentration is represented as:

$$\begin{aligned} {C_{\sigma ,c}}\left( \mu \right) : = \frac{1}{{1 - \alpha }}{\log _2} \frac{{\iint _{{I_\mu }} {{{\left| {T_{{f_\delta }}^{{g_\sigma }} \left( {t,\xi }\right) } \right| }^{2\alpha }}dtd\xi }}}{{{{\left( {\iint _{{I_\mu }} {{{\left| {T_{{f_\delta }}^{{g_\sigma }} \left( {t,\xi }\right) } \right| }^2}dtd\xi }}\right) }^\alpha }}}, \end{aligned}$$
(14)

where \(\mu\) is the time, c denotes the size of the neighborhood, and \({I_\mu }: = \left[ {\mu - c,\mu + c} \right] \times \left[ {0,\infty }\right)\).

Finally, the local optimal window width at \(\mu\) is determined by:

$$\begin{aligned} \sigma _c \left( \mu \right) : = \arg \min _\sigma C_{\sigma ,c} \left( \mu \right) . \end{aligned}$$
(15)

It is worth noting that the final result is found to be insensitive to the parameter c; thus, c is set to 0.15 after several trials in this paper.
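The selection rule of Eqs. (14) and (15) can be sketched as follows, assuming the ATSST has already been evaluated for each candidate \(\sigma\) and cropped to the neighborhood \(I_\mu\). The names `renyi_concentration` and `select_sigma`, and the dictionary layout, are hypothetical and serve only to illustrate the idea; the integrals are replaced by plain sums, which shifts the measure by a constant that does not depend on \(\sigma\) when all candidates share the same grid.

```python
import numpy as np

def renyi_concentration(tfr_patch, alpha=3.0):
    """Discrete version of Eq. (14) on a local patch of the TFR.

    tfr_patch : complex or real array holding T(t, xi) restricted to I_mu.
    A lower value indicates a more concentrated representation.
    """
    energy = np.abs(tfr_patch) ** 2
    numerator = np.sum(energy ** alpha)       # sum of |T|^(2 alpha)
    denominator = np.sum(energy) ** alpha     # (sum of |T|^2)^alpha
    return np.log2(numerator / denominator) / (1.0 - alpha)

def select_sigma(patches_by_sigma, alpha=3.0):
    """Eq. (15): return the candidate sigma with the lowest concentration measure.

    patches_by_sigma : dict mapping a candidate sigma to its local TFR patch
                       (hypothetical container used for this sketch).
    """
    return min(patches_by_sigma,
               key=lambda s: renyi_concentration(patches_by_sigma[s], alpha))

# Toy patches: energy spread uniformly versus energy in a single cell.
spread = np.ones((4, 4))
peaked = np.zeros((4, 4))
peaked[1, 2] = 1.0
best = select_sigma({0.01: spread, 0.03: peaked})
```

In the toy example the uniformly spread patch has a measure of 4 bits (\(\log _2 16\)) and the single-cell patch a measure of 0, so the second candidate is selected.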

figure a

The proposed ATSST algorithm is outlined in Algorithm 1. Now, a two-component signal is taken into account:

$$\begin{aligned} s\left( t\right)&= {s_1}\left( t\right) + {s_2}\left( t\right) \nonumber \\&= \sin \left( {2\pi \left( {{{330} \big / {\sqrt{1000} t + 16\cos \left( {{3 \big / {\sqrt{1000} \pi t}}}\right) }}}\right) }\right) \nonumber \\&\quad + \sin \left( {2\pi \left( {{{190} \big / {\sqrt{1000} t + 9\cos \left( {{3 \big / {\sqrt{1000} \pi t}}}\right) }}}\right) }\right) .\nonumber \\&\qquad t \in \left[ {0,1} \right] . \end{aligned}$$
(16)

We calculate the time-frequency maps using the original TSST and the proposed ATSST, respectively. The spectrums are shown in Fig. 1. It can be found that the ATSST further enhances the time-frequency energy concentration compared with the original TSST. Figure 1c plots the calculated time-varying \(\sigma\).

Fig. 1
figure 1

Results on the two-component signal, a conventional TSST, b ATSST, and c time-varying \(\sigma\) curve

Next, we will focus on incorporating the ATSST with the improved OptShrink algorithm for denoising seismic data.

Improved OptShrink algorithm

The OptShrink algorithm, proposed by Nadakuditi (2014), is employed to produce the low-rank matrix related to the signal from the noisy time-frequency matrix. The key principle of the algorithm is that the low-rank signal matrix and the noisy measurement matrix share the same singular vectors, and it applies an automatically calculated threshold function to shrink the singular values (Cai et al. 2010). Thus, the low-rank signal matrix can be recovered from the measurement matrix associated with the noisy data by reweighting the singular vectors obtained from the measurement matrix (Anvari et al. 2017).

Here, we simplify the D transformation and its first derivative in the original OptShrink algorithm in order to reduce the computational cost; we refer to the result as the improved OptShrink algorithm. Furthermore, the new algorithm is more convenient and efficient to implement. We have tested the original OptShrink algorithm and the improved version on the single signal shown in Fig. 3b. The result demonstrates that the improved algorithm increases the computing efficiency by a factor of two, while its recovery performance is comparable to that of the original algorithm. The detailed workflow is as follows:

First of all, \(\hat{r}\) is taken as the rank of an input signal, and then the optimum weight for the kth singular vector can be computed based on Eq. (17).

$$\begin{aligned} \hat{\omega _{k,\widehat{r}}^{opt}} = - 2\frac{\hat{D} \left( {\hat{\sigma }_k ; \hat{\Sigma }_{\widehat{r}}}\right) }{\hat{{D^{'}}}\left( {\hat{\sigma }_k; \hat{\Sigma }_{\widehat{r}}}\right) }, \end{aligned}$$
(17)

where \(\hat{D}\left( {x;Z}\right)\) denotes a D transformation and the corresponding first derivative is \(\hat{D^{'}} \left( {x;Z}\right)\), which are expressed as follows:

$$\begin{aligned} \hat{D}\left( {x;Z}\right)&= \frac{1}{m}Tr \left[ {x{{\left( {{x^2}I - Z{Z^H}}\right) }^{ - 1}}} \right] \nonumber \\&\quad \cdot \frac{1}{n}Tr \left[ {x{{\left( {{x^2}I - {Z^H}Z}\right) }^{ - 1}}} \right] . \end{aligned}$$
(18)
$$\begin{aligned} \hat{D^{'}} \left( {x;Z}\right)&= \frac{1}{m}Tr \left[ {x{{\left( {{x^2}I - Z{Z^H}}\right) }^{ - 1}}} \right] \nonumber \\&\quad \cdot \frac{1}{n}Tr\left[ { - 2{x^2} {{\left( {{x^2}I - {Z^H}Z}\right) }^{ - 2}} + {{\left( {{x^2}I - {Z^H}Z}\right) }^{ - 1}}} \right] \nonumber \\&\quad + \frac{1}{n}Tr\left[ {x{{\left( {{x^2}I - {Z^H}Z} \right) }^{ - 1}}} \right] \nonumber \\&\quad \cdot \frac{1}{m}Tr\left[ { - 2{x^2} {{\left( {{x^2}I - Z{Z^H}}\right) }^{ - 2}} + {{\left( {{x^2}I - Z{Z^H}}\right) }^{ - 1}}} \right] , \end{aligned}$$
(19)

where n and m are the numbers of columns and rows of the data matrix corrupted by noise, respectively, and H denotes the conjugate transpose.

In our problem, the singular value matrix Z is a square and diagonal matrix; thus, \(m = n\) and \({Z^H} = Z\). Therefore, Eqs. (18) and (19) can be further simplified as:

$$\begin{aligned} \hat{D}\left( {x;Z}\right)&= \frac{1}{{{M^2}}}Tr \left[ {x{{\left( {{x^2}I - {Z^2}}\right) }^{ - 1}}} \right] \nonumber \\&\quad \cdot Tr\left[ {x{{\left( {{x^2}I - {Z^2}} \right) }^{ - 1}}} \right] . \end{aligned}$$
(20)
$$\begin{aligned} \hat{D^{'}} \left( {x;Z}\right)&= \frac{2}{{{M^2}}}Tr \left[ {x{{\left( {{x^2}I - {Z^2}}\right) }^{ - 1}}} \right] \nonumber \\&\quad \cdot Tr\left[ { - 2{x^2}{{\left( {{x^2}I - {Z^2}}\right) }^{ - 2}} + {{\left( {{x^2}I - {Z^2}}\right) }^{ - 1}}} \right] , \end{aligned}$$
(21)

where Z represents the singular value matrix constructed from the \(\left( {\widehat{r} + 1}\right)\)th singular value to the last one, M represents the size of the square matrix Z, \(Tr\left( X\right)\) represents the sum of elements on the main diagonal of a square matrix X, and

$$\begin{aligned} \hat{\Sigma }_{\widehat{r}} = diag\left( {{{\hat{\sigma }}_{\widehat{r} + 1}}, \cdots ,{{\hat{\sigma }}_q}}\right) , \end{aligned}$$
(22)

where \(\hat{\sigma }_{k}\) denotes the kth singular value of the noisy time-frequency matrix, extracted using the singular value decomposition:

$$\begin{aligned} \sum \limits _{k = 1}^q {{{\hat{\sigma } }_k}{{\hat{u}}_k} \hat{\upsilon }_k^H}. \end{aligned}$$
(23)

Finally, the denoised estimate of the signal matrix with rank \(\hat{r}\) can be computed by using the attained weighted singular values:

$$\begin{aligned} \hat{S_{opt}} = \sum \limits _{k = 1}^{\widehat{r}} {\hat{\omega _{k,\widehat{r}}^{opt}} {{\hat{u}}_k} \hat{\upsilon _k^H} }. \end{aligned}$$
(24)
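The weighting described by Eqs. (17) and (20) to (24) can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function names are hypothetical, the rank `r` is supplied by the user, and the trailing singular values are treated as the diagonal of the square matrix Z exactly as in Eqs. (20) and (21).

```python
import numpy as np

def d_transform(x, tail):
    """Eq. (20) for diagonal Z; `tail` holds the singular values r+1, ..., q."""
    trace_term = np.sum(x / (x ** 2 - tail ** 2))
    return trace_term * trace_term / tail.size ** 2

def d_transform_prime(x, tail):
    """Eq. (21) for diagonal Z."""
    diff = x ** 2 - tail ** 2
    trace_term = np.sum(x / diff)
    trace_deriv = np.sum(-2.0 * x ** 2 / diff ** 2 + 1.0 / diff)
    return 2.0 * trace_term * trace_deriv / tail.size ** 2

def optshrink_simplified(noisy, r):
    """Rank-r estimate with the weights of Eq. (17), assembled as in Eq. (24)."""
    u, s, vh = np.linalg.svd(noisy, full_matrices=False)   # Eq. (23)
    if not 0 < r < s.size:
        raise ValueError("r must satisfy 0 < r < number of singular values")
    tail = s[r:]                                           # diagonal of Eq. (22)
    weights = np.array([
        -2.0 * d_transform(sk, tail) / d_transform_prime(sk, tail)
        for sk in s[:r]
    ])
    estimate = (u[:, :r] * weights) @ vh[:r, :]
    return estimate, weights

# Toy example (assumed data): a rank-2 matrix plus white Gaussian noise.
rng = np.random.default_rng(0)
clean = rng.standard_normal((40, 2)) @ rng.standard_normal((2, 40))
noisy = clean + 0.5 * rng.standard_normal((40, 40))
estimate, weights = optshrink_simplified(noisy, r=2)
```

Note that the factor \(1/M^2\) cancels in the ratio of Eq. (17), so each weight depends only on the two trace terms; for leading singular values larger than every trailing one, the resulting weight is positive and never exceeds the singular value it replaces.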

Proposed denoising framework

In this paper, a new seismic denoising method using the proposed ATSST and the improved OptShrink algorithm is presented. Figure 2 illustrates the workflow of the proposed algorithm, and the detailed description is as follows:

Step 1: Calculate the sparse TFR of a seismic trace with noise using the ATSST algorithm.

Step 2: Extract the amplitude spectrum and phase spectrum based on the obtained TFR.

Step 3: Perform the improved OptShrink algorithm on the amplitude of TFR to separate the low-rank component.

Step 4: Treat the low-rank component as the desired amplitude spectrum.

Step 5: Generate the denoised seismic signal by back-transforming the aforementioned amplitude spectrum based on the inverse ATSST.

Step 6: Repeat steps (1)–(5) to denoise all seismic traces.
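The six steps above can be outlined as a small trace-by-trace driver. In this sketch the transform pair and the low-rank operator are passed in as callables (`forward`, `inverse`, and `low_rank` are hypothetical names), because a full ATSST implementation is beyond a short example; it is also assumed that the original phase spectrum is reattached to the filtered amplitude before the inverse transform.

```python
import numpy as np

def denoise_trace(trace, forward, inverse, low_rank):
    """Steps 1 to 5 for one trace.

    forward  : callable, trace -> complex time-frequency matrix (e.g. an ATSST)
    inverse  : callable, complex time-frequency matrix -> trace
    low_rank : callable, real amplitude matrix -> low-rank amplitude matrix
    """
    tfr = forward(trace)                                   # Step 1
    amplitude, phase = np.abs(tfr), np.angle(tfr)          # Step 2
    amplitude_lr = low_rank(amplitude)                     # Steps 3 and 4
    return inverse(amplitude_lr * np.exp(1j * phase))      # Step 5 (assumed phase reuse)

def denoise_section(section, forward, inverse, low_rank):
    """Step 6: apply the single-trace procedure to every column of the section."""
    traces = [denoise_trace(section[:, j], forward, inverse, low_rank)
              for j in range(section.shape[1])]
    return np.stack(traces, axis=1)

# Stand-in operators used only to exercise the driver (not an ATSST).
def toy_forward(x):
    return x.reshape(4, -1).astype(complex)

def toy_inverse(tfr):
    return tfr.real.ravel()

section = np.random.default_rng(1).standard_normal((16, 3))
passthrough = denoise_section(section, toy_forward, toy_inverse, low_rank=lambda a: a)
```

With an identity low-rank operator the driver returns the input section unchanged, which is a convenient sanity check before plugging in a real transform. Since each trace is processed independently, the loop in `denoise_section` is also the natural place to parallelize.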

For seismic data with complex geological structures, rank estimation becomes a challenging task; thus, an effective rank estimate plays a significant role in this algorithm. In addition, because the proposed approach operates on single traces, its computational efficiency is largely determined by the ATSST decomposition of each seismic trace. Compared with the ATSST, the computational cost of the singular value decomposition in the OptShrink algorithm may be negligible.

Fig. 2
figure 2

The proposed algorithm scheme

Examples

In this part, we show some examples including synthetic data and real field datasets to illustrate the validity of the proposed denoising method. Meanwhile, the denoised results based on our method, the Cadzow filtering, and the \(f-x\) deconvolution are compared. Here, the metrics, SNR and MSE (mean squared error), are utilized to evaluate the denoising performance of the aforementioned methods:

$$\begin{aligned} SNR&= 10{\log _{10}}\left( {\frac{{{{\left\| d \right\| }_2}}}{{{{\left\| {\hat{d} - d} \right\| }_2}}}}\right) . \end{aligned}$$
(25)
$$\begin{aligned} MSE&= \frac{1}{M}\sum \limits _{m = 1}^M {{{\left[ {\hat{d} \left( m\right) - d\left( m\right) } \right] }^2}}, \end{aligned}$$
(26)

where d or \(d\left( m\right)\) denotes the clean signal, \(\hat{d}\) or \(\hat{d}\left( m\right)\) denotes the denoised signal, and the signal length is M.
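The two metrics can be computed directly from Eqs. (25) and (26). The sketch below follows Eq. (25) exactly as written, i.e., with the ratio of the norms themselves; note that many references define the SNR with squared norms, which would double the value in dB. The function names are assumptions for this example.

```python
import numpy as np

def snr_db(clean, estimate):
    """Eq. (25) as written: 10 log10(||d|| / ||d_hat - d||)."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return 10.0 * np.log10(np.linalg.norm(clean) / np.linalg.norm(estimate - clean))

def mse(clean, estimate):
    """Eq. (26): mean of the squared sample-wise errors."""
    clean = np.asarray(clean, dtype=float)
    estimate = np.asarray(estimate, dtype=float)
    return np.mean((estimate - clean) ** 2)

# Small worked example: ||d|| = 5 and ||d_hat - d|| = 0.5.
d = np.array([3.0, 4.0])
d_hat = d + np.array([0.3, 0.4])
example_snr = snr_db(d, d_hat)   # 10 * log10(5 / 0.5) = 10 dB
example_mse = mse(d, d_hat)      # (0.09 + 0.16) / 2 = 0.125
```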

Single trace

First, a single trace example is presented, as plotted in Fig. 3. The clean synthetic seismic trace is composed of a 50 Hz normal Ricker wavelet at 0.125 s, a phase-rotated 40 Hz Ricker wavelet at 0.3 s, a polarity-reversed 20 Hz Ricker wavelet at 0.5 s, and two closely spaced 30 Hz Ricker wavelets at 0.7 and 0.775 s, respectively. The noisy single trace has an SNR of 2 dB. The clean signal and its noisy version are plotted in Fig. 3a, b, respectively. Figure 3c, d show the amplitude spectrums of the aforementioned signals, which are obtained by using the ATSST.
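A trace of this kind can be generated along the following lines. Several quantities are not stated in the text and are assumed here purely for illustration: the sampling interval (1 ms), the trace length (1 s), the phase-rotation angle (90 degrees), the random seed, and the use of the SNR definition of Eq. (25) as written when scaling the noise.

```python
import numpy as np

def ricker(t, f0, t0):
    """Zero-phase Ricker wavelet with peak frequency f0 (Hz) centered at t0 (s)."""
    arg = (np.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * arg) * np.exp(-arg)

def rotate_phase(x, degrees):
    """Constant phase rotation using an FFT-based analytic signal."""
    n = x.size
    spectrum = np.fft.fft(x)
    gain = np.zeros(n)
    gain[0] = 1.0
    if n % 2 == 0:
        gain[n // 2] = 1.0
        gain[1:n // 2] = 2.0
    else:
        gain[1:(n + 1) // 2] = 2.0
    analytic = np.fft.ifft(spectrum * gain)
    return np.real(analytic * np.exp(1j * np.deg2rad(degrees)))

def add_noise(clean, snr_db, rng):
    """Add white Gaussian noise scaled so that Eq. (25), as written, equals snr_db."""
    noise = rng.standard_normal(clean.size)
    scale = np.linalg.norm(clean) / (np.linalg.norm(noise) * 10.0 ** (snr_db / 10.0))
    return clean + scale * noise

dt = 0.001                                   # assumed sampling interval (s)
t = np.arange(0.0, 1.0, dt)                  # assumed 1 s trace
clean = (ricker(t, 50.0, 0.125)
         + rotate_phase(ricker(t, 40.0, 0.3), 90.0)   # assumed rotation angle
         - ricker(t, 20.0, 0.5)                        # polarity reversal
         + ricker(t, 30.0, 0.7)
         + ricker(t, 30.0, 0.775))
noisy = add_noise(clean, 2.0, np.random.default_rng(0))
```

If the squared-norm convention for the SNR is preferred, the exponent in `add_noise` would be `snr_db / 20.0` instead.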

Then, the improved OptShrink algorithm is employed to decompose the time-frequency matrix of the noisy signal into a low-rank matrix and a sparse one, as displayed in Fig. 4. It can be seen that the resulting low-rank component and sparse component can be approximately viewed as the denoised signal and the removed noise, respectively. Thus, the filtered result can be acquired by applying the inverse ATSST to the estimated low-rank part, which is plotted in Fig. 5a. As shown in Fig. 5, the signal recovered by the proposed method closely matches the original signal, with only a small reconstruction error. Moreover, the SNR of the single trace is improved from 2 dB (Fig. 3b) to 7 dB (Fig. 5a). In addition, we also estimate the noise by extracting the sparse component and transforming it back into the time domain, which is plotted in Fig. 5b.

Fig. 3
figure 3

TFR of a single trace using the ATSST. a Original synthetic trace and b noisy version with a SNR of 2 dB. The TFR of original trace (c) and noisy trace (d)

Fig. 4
figure 4

Decomposition result. a Low-rank component and b sparse component

Fig. 5
figure 5

Reconstructed result. a Recovered (red line) and original (dashed black line) signal. b Removed noise

Synthetic data

Next, a 2D synthetic dataset based on the convolutional model is considered. It comprises 150 traces, with a time sampling interval of 2 ms and a total time of 1 s. The data contain hyperbolic and curved events, and are corrupted by random noise with an SNR of 2 dB. Figure 6a, b exhibit the clean and noisy synthetic records, respectively.

We evaluate the Cadzow filtering, the \(f-x\) deconvolution, and the presented approach using the noisy synthetic data. In this example, the rank parameter is set to 5 for our algorithm. For the Cadzow filtering, the rank parameter is set to 6 because of the non-linear events in this data, and we obtain the best noise suppression performance in this case. For the \(f-x\) deconvolution, the length of the predictive filter is 15 and the processing frequency band is between 1 and 90 Hz. Figure 7a–c show the denoised results by the Cadzow filtering, the \(f-x\) deconvolution, and the presented method, respectively. Figure 7d–f are the corresponding removed noise. As shown in the figures, the presented approach does a good job of preserving seismic signals (see Fig. 7f), although some noise still remains in the denoised result (see Fig. 7c). The \(f-x\) deconvolution shows a greater ability to suppress random noise than the Cadzow filtering. However, inspection of the noise sections indicates that a significant amount of signal leakage can be perceived for the \(f-x\) deconvolution (see red arrows in Fig. 7e), and some useful signals are also leaked in the case of the Cadzow filtering (see red arrows in Fig. 7d).

We also calculated the amplitude spectrums of the clean, noisy, and denoised results via the Cadzow filtering, the \(f-x\) deconvolution, and the presented algorithm, which are displayed in Fig. 8. These figures indicate that residual noise exists in the different denoised results (see red boxes in Fig. 8d–f). It is obvious from Fig. 8f that the amplitude spectrum from the proposed method has the same features as that of the noise-free data within the effective frequency band, which means that the presented method is able to retrieve the seismic signals well. However, the other two methods, the Cadzow filtering and the \(f-x\) deconvolution, damage the seismic signals; for instance, the Cadzow filtering leads to leakage of the low-frequency component (see red arrows in Fig. 8d), and the \(f-x\) deconvolution cannot reconstruct the middle-frequency content effectively (see red arrows in Fig. 8e).

Furthermore, the SNR and MSE for the results of above-mentioned methods are also calculated, which are displayed in Fig. 9a, b, respectively. The figures demonstrate that the presented method is superior to the other two approaches, because it has the highest SNR and lowest MSE.

Fig. 6
figure 6

a 2D synthetic seismic data. b Noisy synthetic data with SNR = 2 dB

Fig. 7
figure 7

Denoised results using a the Cadzow filtering, b the \(f-x\) deconvolution, and c the proposed method. Removed noise sections corresponding to d the Cadzow filtering, e the \(f-x\) deconvolution, and f the proposed method

Fig. 8
figure 8

Amplitude spectrums of a original data, b noise, c noisy data, and denoised versions by using d the Cadzow filtering, e the \(f-x\) deconvolution, and f the proposed method

Fig. 9
figure 9

Output SNR (a) and MSE (b) comparison of the three methods for various input SNR

Pre-stack shot data

To further assess the effectiveness of the proposed algorithm in practice, a real pre-stack shot gather is taken as an example. It contains several types of noise, such as random noise, multiples, and external source interference noise; here we focus on the random noise. The data are composed of 100 traces, the time sampling interval is 2 ms, and each trace has 500 time samples. Figure 10 shows the noisy shot data. Figure 11a–c exhibit the filtered results of applying the Cadzow filtering, the \(f-x\) deconvolution, and the presented algorithm, respectively. Figure 11d–f are the corresponding noise. The rank parameters for the Cadzow filtering and the proposed algorithm are set to 18 and 20, respectively. For the \(f-x\) deconvolution, we set the filter length to 10, and the processing frequency band is between 1 and 120 Hz. It is obvious that, although the Cadzow filtering and the \(f-x\) deconvolution can attenuate most of the random noise, both of them damage the seismic reflection events to a certain degree (see blue arrows in Fig. 11d, e), especially the \(f-x\) deconvolution. By comparison, the proposed approach preserves the seismic events well (see Fig. 11f).

For the sake of clarity, we have enlarged one area between 0.5 and 0.8 s, and shown it in Fig. 12. As shown in these results, the presented method preserves the amplitude of the primary reflection events well compared with the Cadzow filtering and the \(f-x\) deconvolution. However, the continuity of some events may be slightly damaged. It should be noted that all figures are shown using the same amplitude interval. Figure 13 shows the three local similarity maps between the denoised shot data and the removed noise for the three methods. These maps indicate that both the Cadzow filtering and the \(f-x\) deconvolution leave a large amount of damaged signal in the noise, especially the \(f-x\) deconvolution, and that the signal damage caused by the proposed method is the least.

Besides, we also compute the amplitude spectrums for the aforementioned approaches, which are plotted in Fig. 14. It can be clearly observed that the presented method is able to retrieve the seismic signals well, but the Cadzow filtering and the \(f-x\) deconvolution cannot preserve the seismic events.

Fig. 10
figure 10

Noisy real shot data

Fig. 11
figure 11

Denoised results of shot data using a the Cadzow filtering, b the \(f-x\) deconvolution, and c the proposed algorithm, respectively. d–f are the removed noise corresponding to the three methods

Fig. 12
figure 12

Zoomed denoised results and removed noise using a, b the Cadzow filtering, c, d the \(f-x\) deconvolution and e, f the proposed algorithm, respectively

Fig. 13
figure 13

A local similarity comparison between denoised shot data and removed noise. a The Cadzow filtering, b the \(f-x\) deconvolution, c the proposed method

Fig. 14
figure 14

Amplitude spectrums of the output of the aforementioned methods

Post-stack data

In this section, a post-stack dataset is considered. The noisy data, shown in Fig. 15, consist of 400 traces; each trace has 1000 time samples and the time sampling interval is 2 ms. Figure 16e exhibits the filtered result based on the proposed algorithm. Similarly, the Cadzow filtering and the \(f-x\) deconvolution are also utilized for comparison, and their results are exhibited in Fig. 16a, c, respectively. Figure 16b, d, f are the removed noise related to the above-mentioned methods. In this example, the input rank parameters are 16 and 20 for the Cadzow filtering and the proposed method, respectively. For the \(f-x\) deconvolution, the frequency range is still from 1 to 120 Hz, but the length of the predictive filter is 8.

As can be clearly observed, the presented algorithm performs better, with less leakage of signal energy (see Fig. 16f). However, a significant amount of energy associated with seismic events leaks when applying the Cadzow filtering and the \(f-x\) deconvolution (see Fig. 16b, d). Furthermore, we also zoom in on one area of interest marked by the black box (see Fig. 16), as shown in Fig. 17. As indicated in the results, the proposed algorithm preserves the amplitude of seismic events better than the Cadzow filtering and the \(f-x\) deconvolution. To evaluate the signal damage for this example, we calculate the local similarity between the denoised post-stack data and the noise and show the maps in Fig. 18, where the map corresponding to the proposed method exhibits low local similarity.

Additionally, the amplitude spectra of the outputs of the above-mentioned methods also indicate that the presented algorithm performs better than the other two (see Fig. 19).
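
To illustrate the idea behind the signal-leakage check between a denoised section and its removed noise, the sketch below computes a windowed normalized correlation. This is a simplified proxy and not necessarily the local similarity computation used for Fig. 18; the function names, window sizes, and synthetic data are assumptions made for illustration.

```python
import numpy as np

def box_smooth(x, width, axis):
    """Moving average of length `width` along `axis` (zero-padded edges)."""
    kernel = np.ones(width) / width
    return np.apply_along_axis(
        lambda v: np.convolve(v, kernel, mode="same"), axis, x
    )

def local_correlation(denoised, noise, width=(21, 5), eps=1e-12):
    """Windowed normalized correlation in [0, 1]; a proxy for local similarity."""
    def smooth(x):
        # Separable smoothing: first along time (axis 0), then along traces.
        return box_smooth(box_smooth(x, width[0], 0), width[1], 1)
    num = smooth(denoised * noise)
    den = np.sqrt(smooth(denoised ** 2) * smooth(noise ** 2)) + eps
    return np.abs(num / den)

# Hypothetical synthetic section: one flat sinusoidal event plus random noise.
rng = np.random.default_rng(0)
nt, nx = 200, 30
t = np.arange(nt)[:, None]
signal = np.sin(2.0 * np.pi * t / 25.0) * np.ones((1, nx))
noise = 0.3 * rng.standard_normal((nt, nx))

# Case 1: ideal separation, the removed part is pure noise.
sim_clean = local_correlation(signal, noise)
# Case 2: leaky separation, 30% of the signal ends up in the removed part.
sim_leaky = local_correlation(0.7 * signal, noise + 0.3 * signal)

print("mean similarity, ideal separation:", sim_clean.mean())
print("mean similarity, leaky separation:", sim_leaky.mean())
```

In this toy setting, a map with high values flags regions where coherent signal has leaked into the removed noise, which is how the similarity maps are read in the comparison.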

Fig. 15

Noisy post-stack section

Fig. 16

Denoised post-stack sections using a the Cadzow filtering, c the \(f-x\) deconvolution, and e the proposed algorithm, respectively. b, d, and f are the removed noise corresponding to the three methods

Fig. 17

Zoomed denoised post-stack sections and corresponding noise using a, b the Cadzow filtering, c, d the \(f-x\) deconvolution, and e, f the proposed algorithm, respectively

Fig. 18

A local similarity comparison between denoised post-stack data and removed noise. a the Cadzow filtering, b the \(f-x\) deconvolution, c the proposed method

Fig. 19

Amplitude spectra of the outputs of the aforementioned methods

Discussion

In this paper, we have presented a new denoising approach that integrates the ATSST and the improved OptShrink algorithm. In the proposed approach, we first transform a noise-corrupted seismic signal into a time-frequency matrix through the ATSST. Next, the low-rank component and the sparse component are estimated via the improved OptShrink algorithm. Compared with the original TSST algorithm, the ATSST with a time-varying parameter further enhances the energy concentration in the time-frequency plane and the resolution of a multicomponent signal. It is therefore very promising for estimating the instantaneous frequency of the components of a multicomponent signal and for accurate component recovery. It is worth noting that the presented method operates trace by trace, so the complex geological structures in field data do not affect its performance, which is one of the prime reasons why the proposed method outperforms the traditional 2D denoising approaches. In addition, the improved OptShrink algorithm reduces the computational cost by simplifying the D transformation and its first derivative in the original OptShrink algorithm, and is thus more convenient and efficient to implement. However, the proposed method involves multiple time-frequency transforms in practice, which inevitably affects its computational efficiency; parallel computing may be a direction for future improvement.

Based on the aforementioned principle, we can recover the desired signal from the low-rank component of the time-frequency matrix of the noisy seismic data, so it is very important to separate the low-rank component accurately. In the OptShrink algorithm, the presence of noise has a great impact on the estimation of the matrix rank; thus, estimating the effective rank plays a significant part in the algorithm. Figure 20 shows the relationship between the output SNR and the rank value for the synthetic single trace in Fig. 3b. The SNR first increases rapidly, reaches a peak, and then decreases slightly as the rank value increases. Note that a larger rank increases the computational burden and weakens the noise removal. Thus, several trials are needed to obtain the optimal rank value in practice.
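
The rank-versus-SNR behaviour described above can be reproduced qualitatively with a much simpler stand-in. The sketch below sweeps the rank of a plain truncated SVD on a synthetic low-rank matrix with added noise; it uses neither the ATSST nor the OptShrink algorithm, and the matrix sizes, noise level, and SNR definition are assumptions chosen for illustration.

```python
import numpy as np

def snr_db(clean, estimate):
    """Output SNR in dB, using the common definition
    10*log10(||clean||^2 / ||clean - estimate||^2) (assumed here)."""
    signal_energy = np.sum(np.abs(clean) ** 2)
    error_energy = np.sum(np.abs(clean - estimate) ** 2)
    return 10.0 * np.log10(signal_energy / error_energy)

def truncated_svd(matrix, rank):
    """Best rank-`rank` approximation in the least-squares sense."""
    u, s, vh = np.linalg.svd(matrix, full_matrices=False)
    return (u[:, :rank] * s[:rank]) @ vh[:rank]

# Hypothetical test matrix: exact rank 4 plus white Gaussian noise.
rng = np.random.default_rng(0)
true_rank = 4
clean = rng.standard_normal((60, true_rank)) @ rng.standard_normal((true_rank, 80))
noisy = clean + 0.5 * rng.standard_normal(clean.shape)

# Sweep the rank parameter and record the output SNR for each trial.
ranks = np.arange(1, 21)
snrs = np.array([snr_db(clean, truncated_svd(noisy, r)) for r in ranks])
best_rank = int(ranks[np.argmax(snrs)])

for r, value in zip(ranks, snrs):
    print(f"rank {r:2d}: SNR = {value:6.2f} dB")
print("rank with the highest SNR:", best_rank)
```

In this toy case the clean reference is known, so the sweep can be scored directly. For field data no clean reference exists, which is why the rank has to be chosen by trial and inspection of the removed noise.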

Additionally, commonly used noise reduction methods, namely the Cadzow filtering and the \(f-x\) deconvolution, are used for comparison in this paper. In the Cadzow filtering method, a trade-off exists between noise suppression and the recovery of seismic events: a higher rank weakens the noise suppression but better preserves the seismic events, and vice versa. For the \(f-x\) deconvolution method, the input frequency band and the predictive filter length are two significant parameters that usually affect the level of noise reduction. We tested the aforementioned algorithms with a variety of parameters until the best results were obtained.
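
The rank trade-off in the Cadzow filtering can be illustrated with a minimal sketch of its core step: rank reduction of a Hankel matrix built from a single frequency slice, followed by anti-diagonal averaging. This is not the implementation used for the comparisons in this paper; the function name, the synthetic slice with two linear events, and the noise level are assumptions.

```python
import numpy as np

def cadzow_1d(series, rank):
    """Rank-reduce the Hankel matrix of a 1D complex series and map it back."""
    n = len(series)
    rows = n // 2 + 1
    cols = n - rows + 1
    # idx[i, j] = i + j, so hankel[i, j] = series[i + j].
    idx = np.arange(rows)[:, None] + np.arange(cols)[None, :]
    hankel = series[idx]
    u, s, vh = np.linalg.svd(hankel, full_matrices=False)
    low_rank = (u[:, :rank] * s[:rank]) @ vh[:rank]
    # Average along anti-diagonals to return to a series of length n.
    out = np.zeros(n, dtype=low_rank.dtype)
    np.add.at(out, idx, low_rank)
    counts = np.bincount(idx.ravel(), minlength=n)
    return out / counts

# Hypothetical frequency slice: two linear events appear as two complex
# exponentials along the trace axis, so the clean Hankel matrix has rank 2.
rng = np.random.default_rng(0)
n = 64
x = np.arange(n)
clean = np.exp(2j * np.pi * 0.05 * x) + 0.7 * np.exp(-2j * np.pi * 0.12 * x)
noise = 0.3 * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
noisy = clean + noise

def error(estimate):
    return np.linalg.norm(estimate - clean)

err_noisy = error(noisy)
err_rank1 = error(cadzow_1d(noisy, 1))    # too low: one event is removed
err_rank2 = error(cadzow_1d(noisy, 2))    # matches the number of events
err_rank10 = error(cadzow_1d(noisy, 10))  # too high: more noise is kept

print("error of noisy input:", err_noisy)
print("error at rank 1, 2, 10:", err_rank1, err_rank2, err_rank10)
```

In this toy slice, choosing the rank below the number of events damages the signal, while choosing it above leaves more noise in the output, which mirrors the trade-off described for the Cadzow filtering.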

Fig. 20

Output SNR versus different rank values

Conclusion

We propose an adaptive time-reassigned synchrosqueezing transform (ATSST), which is designed to achieve a high-resolution and reversible representation of strongly time-varying non-stationary signals. We then combine the ATSST with the improved OptShrink algorithm for random noise attenuation. In the proposed approach, a time-varying window function is introduced to improve the time-frequency energy concentration of the original TSST algorithm. Additionally, the low-rank property of the seismic signal is preserved from the time domain to the time-frequency domain; thus, the denoised seismic signal can be recovered by the inverse transformation of the low-rank matrix obtained from the ATSST decomposition. We have tested the presented approach on both synthetic and real field datasets and compared it with the Cadzow filtering and the \(f-x\) deconvolution approaches. The results demonstrate that the presented approach performs clearly better in seismic signal preservation. Future work will focus on testing other types of window functions and developing new applications.