Introduction

Distributed acoustic sensing (DAS) is a relatively new technology that uses optical fiber as an acoustic signal sensor (Hartog et al. 2014; Olofsson and Martine 2017). A DAS system measures the variation of the acoustic field along the fiber by transmitting laser pulses into it and receiving the Rayleigh backscatter that the fiber naturally generates. The acoustic signal is coupled to the fiber by friction or pressure, causing dynamic strain changes along the cable. These strain changes slightly displace the scattering elements, which in turn changes the relative phase of the backscattered photons (Frignet and Hartog 2014; Yu et al. 2018). Through phase demodulation, a DAS system can recover the external vibration signal sensed by the fiber (Madsen et al. 2013; Parker et al. 2014).

In borehole seismic data acquisition, the optical fiber of a DAS system serves not only as a sensor for seismic waves but also as a transmission medium for signals (Daley et al. 2013). Compared with conventional vertical seismic profile (VSP) acquisition using downhole geophones, DAS has several prominent advantages (Mestayer et al. 2011; Mateeva et al. 2014; Daley et al. 2016): (1) DAS equipment costs much less than conventional three-component geophones, and its acquisition and construction costs are also lower; (2) DAS can achieve high-density acquisition, with a minimum sampling interval of 0.25 m, which yields high-resolution data, and a single acquisition can cover the full well; (3) the optical fiber sensor is immune to electromagnetic interference, easy to conceal, and resistant to corrosion. However, DAS also has technical limitations. Because the fiber cable has a strong azimuthal response, multi-component observation is relatively difficult (Zhang et al. 2020). Although improvements have been proposed in recent years (Ning and Sava 2018), they still need to be verified in actual observation. Affected by many factors, such as acquisition, layout, demodulation technology, and cable noise, the signal-to-noise ratio (SNR) of DAS data is relatively low compared with that of conventional VSP records. When the fiber cable is not in close contact with the wellbore, the coupling between the fiber and the formation is poor, resulting in strong coupling noise (Constantinou et al. 2016; Correa et al. 2017).

The acquired DAS VSP seismic data contain strong coupling noise and background noise (Bakku et al. 2014; Binder et al. 2020), which reduce the data quality, so this interference must be removed (Dong et al. 2019). To extract the features of the effective signal conveniently and suppress the noise as completely as possible, we transform the DAS data into the time–frequency domain for analysis and processing.

Time–frequency analysis (TFA) is an effective way to analyze time-varying non-stationary signals (Pons-Llinares et al. 2015; Yu et al. 2017). It maps a time series from the one-dimensional time axis onto the two-dimensional time–frequency (TF) plane and reflects the joint time and frequency characteristics of the signal (Yang et al. 2014). Traditional TFA methods include the short-time Fourier transform (STFT) (Meignen and Pham 2018), the wavelet transform (WT) (Cai et al. 2001), empirical mode decomposition (EMD) (Gómez and Velis 2016; Chen et al. 2017), and so on. Although these methods are useful for signal processing, they suffer from various limitations, such as the resolution limit imposed by the Heisenberg uncertainty principle, unexpected cross terms, and mode aliasing (Thakur and Wu 2011; Yu et al. 2019). These limitations interfere with the description of signal features and make it difficult to identify effective information accurately. To approach the ideal time–frequency analysis (ITFA), many advanced TFA methods have been proposed, for instance, variational mode decomposition (VMD) (Liu et al. 2016; Liu and Duan 2020), the reassignment method (RM) (Auger and Flandrin 1995; Auger et al. 2013), and the synchrosqueezing transform (SST) (Daubechies et al. 2011; Huang et al. 2016). Among them, SST not only improves the resolution of the TF result but also allows signal reconstruction. In this paper, we adopt an SST-based time–frequency analysis method called the multisynchrosqueezing transform (MSST) (Yu et al. 2019). By applying multiple SST operations iteratively, MSST obtains a more precise frequency-reassignment operator, so that the TF result becomes more concentrated and approaches the ITFA result step by step. Moreover, it supports near-perfect signal reconstruction and is well suited to the processing and analysis of non-stationary signals.

The DAS seismic signal we process is a relatively complex non-stationary, time-varying signal, and a closer-to-ideal time–frequency representation helps us understand and extract its features. DAS seismic data contain strong coupling noise and non-negligible background noise, and their SNR is lower than that of conventional VSP data. Because the frequency bands of the coupling noise and the effective signal overlap, it is difficult to remove the high-energy coupling noise with conventional time–frequency domain filtering. In addition, apart from the direct wave, the effective reflected signals are relatively weak and are therefore easily lost. We therefore choose MSST, which approximately achieves the ITFA effect, to analyze the DAS seismic data in the time–frequency domain. With the time–frequency features constructed by MSST, we can better observe the data, discover their underlying feature structure, and separate and extract the features more easily.

Under MSST, the time–frequency characteristics of the effective signal and the coupling noise in DAS data are distinct and easy to tell apart. The energy-compact feature representation makes it easier to extract the effective components, and the transform allows signal reconstruction. To extract the effective features more accurately, we establish a suitable MSST time–frequency feature domain and decompose the feature matrix into a low-rank matrix (LM) and a sparse matrix (SM). We then collect statistics on the positions of the data points in the decomposed LM and SM and obtain their data position points distribution maps (DPM) in the time and frequency domains. Combined with the features of the DPM, this allows accurate extraction of the effective signals and preservation of weak signals. In this paper, we thus establish a denoising method based on low-rank and sparse matrix decomposition (LSMD) and DPM analysis in the MSST feature domain (LS-DPM-MSST). It removes coupling noise and extracts weak effective signals, while a low-rank constraint on the data reduces the background noise. The following sections introduce the basic principle of MSST and the specific procedure of the proposed method for VSP data denoising. The proposed method is then compared with several traditional methods on synthetic and field data to verify its feasibility and advantages.

Basic theory of LS-DPM-MSST

Basic principles of MSST

MSST is built on the short-time Fourier transform (STFT) framework. The STFT of a function \(s \in L^{2} (R)\) with respect to a real and even window \(g \in L^{2} (R)\) is defined by:

$$G(t,w) = \int_{ - \infty }^{ + \infty } {g(u - t)s(u)e^{ - iw(u - t)} } {\text{d}}u,$$
(1)

where the window \(g(u)\) is compactly supported in \([ - \Delta_{t} ,\Delta_{t} ]\), \(t\) and \(u\) denote time variables, \(w\) denotes the frequency variable, and \(\Delta_{t}\) denotes a minimum time. Although the TF representation given by the STFT is relatively blurry, the SST operation concentrates it into a compact region around the instantaneous frequency (IF) trajectory of each mode, which yields a clearer result. The SST employs a frequency-reassignment operator to gather the spread TF coefficients, which can be expressed as:

$$T_{s} (t,\eta ) = \int_{ - \infty }^{\infty } {G(t,w)} \delta (\eta - \widehat{w}(t,w)){\text{d}}w,$$
(2)

where \(\eta\) represents the reassigned frequency, \(\widehat{w}(t,w)\) is the instantaneous frequency estimation of STFT, and it can be expressed as:

$$\widehat{w}(t,w) = \frac{\partial_{t} G(t,w)}{iG(t,w)}.$$
(3)
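
As a quick check of Eq. (3), consider a pure tone \(s(u) = Ae^{iw_{0} u}\). This signal is chosen here only for illustration and is not taken from the data. Writing \(\widehat{g}(\xi ) = \int g(v)e^{ - i\xi v} {\text{d}}v\) for the Fourier transform of the window (a notation introduced only for this example), Eq. (1) with the substitution \(v = u - t\) gives:

$$G(t,w) = Ae^{iw_{0} t} \int_{ - \infty }^{ + \infty } {g(v)e^{ - i(w - w_{0} )v} } {\text{d}}v = Ae^{iw_{0} t} \,\widehat{g}(w - w_{0} ),\quad \partial_{t} G(t,w) = iw_{0} G(t,w),$$

so that \(\widehat{w}(t,w) = w_{0}\) wherever \(G(t,w) \ne 0\). Every STFT coefficient in the blurred band around \(w_{0}\) therefore points back to the true frequency, which is what allows Eq. (2) to gather the spread energy onto the IF trajectory.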

To obtain a much sharper TF representation, MSST applies multiple SST operations iteratively, so that the energy of the TF result is gradually concentrated and the IF estimate moves closer to the true IF, approximating the ITFA step by step. Thus, the MSST (Yu et al. 2019) can be formulated as:

$$\begin{aligned} T_{s}^{[N]} (t,\eta ) = & \int_{ - \infty }^{\infty } {T_{s}^{[N - 1]} (t,w)} \delta (\eta - \widehat{w}(t,w)){\text{d}}w \\ = & \int_{ - \infty }^{\infty } {G(t,w)} \delta (\eta - \widehat{w}^{[N]} (t,w)){\text{d}}w, \\ \end{aligned}$$
(4)

where N is the iteration number such that \(N \ge 2\).

Because MSST reassigns TF coefficients only in the frequency direction and no information is lost, it theoretically allows perfect signal reconstruction. The original signal can be recovered via:

$$s(t) = (2\pi g(0))^{ - 1} \int_{ - \infty }^{\infty } {T_{s}^{[N]} (t,w)} {\text{d}}w,$$
(5)

where \(g(0)\) is the value of window function \(g(t)\) at time 0.
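
The normalization in Eq. (5) can be checked directly. The following is a short sketch that assumes the usual distributional identity \(\int e^{ - iw(u - t)} {\text{d}}w = 2\pi \delta (u - t)\). Integrating Eq. (1) over frequency gives:

$$\int_{ - \infty }^{\infty } {G(t,w)} {\text{d}}w = \int_{ - \infty }^{ + \infty } {g(u - t)s(u)\left[ {\int_{ - \infty }^{\infty } {e^{ - iw(u - t)} } {\text{d}}w} \right]} {\text{d}}u = 2\pi \int_{ - \infty }^{ + \infty } {g(u - t)s(u)\delta (u - t)} {\text{d}}u = 2\pi g(0)s(t).$$

Because the reassignment in Eqs. (2) and (4) moves coefficients only along the frequency axis, \(\int T_{s}^{[N]} (t,\eta ){\text{d}}\eta = \int G(t,w){\text{d}}w\), and dividing by \(2\pi g(0)\) yields Eq. (5).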

Discrete MSST

For discrete data \(s[l],l = 0,1, \ldots ,L - 1\), the discrete STFT can be expressed as:

$$G[h,m] = \sum\limits_{l = 0}^{L - 1} {g[l - h]s[l]e^{ - i(2\pi /L)m[l - h]} },$$
(6)

where \(L\) is the number of samples, \(h\) is a discrete time variable, and \(m\) is a discrete frequency variable. Similarly, discrete MSST can be described as:

$$\begin{aligned} T_{s}^{[N]} [h,\xi ] = & \sum\limits_{m = 0}^{M - 1} {T_{s}^{[N - 1]} [h,m]\delta [\xi - \widehat{w}[h,m]]} \\ = & \sum\limits_{m = 0}^{M - 1} {G[h,m]\delta [\xi - \widehat{w}^{[N]} [h,m]]}, \\ \end{aligned}$$
(7)

where \(\xi\) denotes a discrete frequency variable, \(M\) is the number of frequency samples, \(m = 0,1, \ldots ,M - 1\).
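
Equations (6) and (7) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation used in this paper. The function names (`stft`, `msst`, `reconstruct`), the Gaussian window, and the parameter `n_iter` are hypothetical. The IF of Eq. (3) is approximated by the phase advance between adjacent time samples and rounded to the nearest bin, all \(M = L\) frequency bins are kept, and the iterated operator is taken as \(\widehat{w}^{[N]} [h,m] = \widehat{w}[h,\widehat{w}^{[N - 1]} [h,m]]\), which is what the equality of the two lines in Eq. (7) implies. Arrays are stored frequency-first, that is, as `G[m, h]`.

```python
import numpy as np

def stft(s, win):
    """Discrete STFT of Eq. (6); returns G[m, h] (frequency bin m, time index h)."""
    L, half = len(s), len(win) // 2       # win has odd length 2*half + 1 <= L
    G = np.zeros((L, L), dtype=complex)
    for h in range(L):
        frame = np.zeros(L, dtype=complex)
        for k in range(-half, half + 1):  # k = l - h is the lag inside the window
            if 0 <= h + k < L:
                frame[k % L] = win[k + half] * s[h + k]
        G[:, h] = np.fft.fft(frame)       # sum_k g[k] s[h+k] exp(-i 2 pi m k / L)
    return G

def msst(s, win, n_iter=3):
    """Sketch of Eq. (7): reassign STFT coefficients along frequency n_iter times."""
    L = len(s)
    G = stft(s, win)
    # Assumed IF estimator: phase advance per sample (finite-difference form of Eq. 3).
    dphi = np.angle(G[:, 1:] * np.conj(G[:, :-1]))
    dphi = np.concatenate([dphi, dphi[:, -1:]], axis=1)   # repeat the last column
    omega = np.rint(dphi * L / (2 * np.pi)).astype(int) % L  # IF in bins, wrapped
    cols = np.broadcast_to(np.arange(L), (L, L))          # cols[m, h] = h
    target = omega
    for _ in range(n_iter - 1):
        target = omega[target, cols]      # w^[N][m, h] = w[w^[N-1][m, h], h]
    Ts = np.zeros_like(G)
    np.add.at(Ts, (target, cols), G)      # accumulate coefficients at their target bin
    return Ts, G

def reconstruct(Ts, win):
    """Discrete analogue of Eq. (5): 2*pi*g(0) becomes L*g[0] in this discretization."""
    L = Ts.shape[0]
    return Ts.sum(axis=0) / (L * win[len(win) // 2])

# Illustrative use on a complex tone at bin 20 (synthetic, not DAS data).
L = 128
k = np.arange(-16, 17)
win = np.exp(-0.5 * (k / 4.0) ** 2)
s = np.exp(2j * np.pi * 20 * np.arange(L) / L)
Ts, G = msst(s, win, n_iter=3)
s_rec = reconstruct(Ts, win)
```

Because no coefficient is discarded in this sketch, summing `Ts` over frequency gives the same value as summing `G`, so `reconstruct` returns the input up to rounding error. A practical implementation would typically restrict the frequency range and ignore negligible coefficients, at the cost of exactness.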

DAS VSP data processing based on LS-DPM-MSST

The concentrated TF representation of MSST separates the various signal components and helps extract weak effective signals, so in this paper the DAS-VSP seismic data are observed and processed in the time–frequency feature domain under MSST. With the high-resolution MSST, the difference between signals and coupling noise in the time–frequency domain can be observed more clearly and intuitively. In Fig. 1, (a) and (b) show a noise-free trace and a trace with coupling noise from DAS seismic data in the time domain, and (c) and (d) show their time–frequency representations obtained by MSST with an appropriate window function. The effective signals have a wide frequency bandwidth, and their energy forms dot- or circle-shaped clusters; the coupling noise has a narrow bandwidth, and its energy forms distinct straight lines along the time axis. The TF results also contain some background noise and high-frequency random noise. We reduce the impact of this noise by restricting the frequency dimension of the TF results, which also reduces the data redundancy and the computational complexity.

Fig. 1 a, b A noise-free trace and a trace with coupling noise from DAS seismic data in the time domain, respectively; c, d their corresponding time–frequency representations by MSST, shown in the form of DPM

Ideally, we would extract only the effective signal components from the time–frequency feature map and reject the noise components. Although the MSST-based TFA result in Fig. 2a is highly concentrated and shows clearly different characteristics for signal and noise, some means is still needed to separate the two. We use LSMD to perform a first separation of the seismic data in the time–frequency domain; the resulting time–frequency representations of LM and SM are shown in Fig. 2b, c. For the decomposition, we suppose that the seismic data matrix \(D\) is the superposition of a low-rank component and a sparse component. Under suitable assumptions, both components can be recovered exactly by minimizing a weighted combination of the nuclear norm and the L1 norm, as shown in Eq. (8), where \(\left\| { \, \cdot \, } \right\|_{ * }\) denotes the nuclear norm of a matrix, \(\left\| { \, \cdot \, } \right\|_{1}\) denotes the L1 norm of a matrix, and \(\lambda\) is the weight that balances the two terms. Solving this convex optimization problem gives the LM \(L_{{\text{M}}}\) and the SM \(S_{{\text{M}}}\).

$$\begin{gathered} \min \;\left\| {L_{{\text{M}}} } \right\|_{ * } + \lambda \left\| {S_{{\text{M}}} } \right\|_{1} \hfill \\ {\text{s}}.{\text{t}}. \, D = L_{{\text{M}}} + S_{{\text{M}}} \hfill \\ \end{gathered}$$
(8)
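
To make Eq. (8) concrete, the sketch below solves it with a standard inexact augmented Lagrangian scheme (singular value thresholding for the low-rank part, entrywise soft thresholding for the sparse part). This is a generic solver chosen for illustration and is not the GreGoDec algorithm used in this paper. The names `pcp` and `soft_threshold`, the default \(\lambda = 1/\sqrt {\max (m,n)}\), and the step parameters are assumptions, and the input is assumed to be a nonzero real-valued matrix such as the magnitude of the TF coefficients.

```python
import numpy as np

def soft_threshold(x, tau):
    """Entrywise shrinkage: the proximal operator of tau * L1 norm."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def pcp(D, lam=None, rho=1.5, tol=1e-7, max_iter=500):
    """Sketch of Eq. (8): min ||L||_* + lam * ||S||_1  s.t.  D = L + S."""
    m, n = D.shape
    if lam is None:
        lam = 1.0 / np.sqrt(max(m, n))    # common default weight (assumption)
    mu = 1.25 / np.linalg.norm(D, 2)      # initial penalty from the spectral norm
    Y = np.zeros_like(D)                  # Lagrange multiplier for D = L + S
    S = np.zeros_like(D)
    d_norm = np.linalg.norm(D)
    for _ in range(max_iter):
        # Low-rank update: shrink the singular values of D - S + Y/mu by 1/mu.
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        Lm = (U * soft_threshold(sig, 1.0 / mu)) @ Vt
        # Sparse update: shrink the entries of D - L + Y/mu by lam/mu.
        S = soft_threshold(D - Lm + Y / mu, lam / mu)
        R = D - Lm - S                    # constraint residual
        Y = Y + mu * R
        mu *= rho
        if np.linalg.norm(R) <= tol * d_norm:
            break
    return Lm, S

# Illustrative use on a synthetic rank-2 matrix with 5% large sparse entries.
rng = np.random.default_rng(0)
L0 = rng.standard_normal((60, 2)) @ rng.standard_normal((2, 60))
S0 = np.zeros((60, 60))
spikes = rng.random((60, 60)) < 0.05
S0[spikes] = 10.0 * rng.choice([-1.0, 1.0], size=spikes.sum())
L_M, S_M = pcp(L0 + S0)
```

In this toy setting the low-rank part plays the role of line-like, highly repetitive structure and the sparse part the role of isolated, localized energy, which mirrors how LM and SM are interpreted in the text.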
Fig. 2 a The time–frequency representation of noisy DAS seismic data by MSST; b, c the LM and the SM of the noisy data obtained by LSMD. All panels are time–frequency representations in the form of DPM. The red arrows indicate the relatively obvious effective signals and coupling noise. Some obvious effective signal components and most of the coupling noise components are marked in (b), and part of the signal components are marked in (c)

The decomposed LM contains some effective signal components and most of the coupling noise components, while the SM contains most of the effective signal components (especially the weak ones) and possibly a small number of coupling noise components (Fig. 2b, c). Next, we extract the signal components further and remove the coupling noise from both LM and SM. Since most of the coupling noise is retained in LM, noise elimination is mainly carried out there. Most of the weak effective signals are retained in SM, so the focus there is on preserving and recovering weak signals. The two parts therefore differ in emphasis and in the degree of processing.

To process LM and SM further, we use DPM to assist the data processing. A DPM is obtained by applying a hard threshold function to the matrix and recording the positions of the data points that remain. The time–frequency diagrams in Figs. 1 and 2 are presented in the form of DPM. This representation shows the positions of data points in the time–frequency domain more clearly, which makes feature information easier to find. Then, by accumulating the position points of the DPM along the time direction and along the frequency direction, we obtain the frequency domain DPM (F-DPM) and the time domain DPM (T-DPM), respectively (Fig. 3). The values of F-DPM and T-DPM reflect, to a certain extent, how the data positions are distributed. From the F-DPM of LM, we can determine the frequency range of the coupling noise. From the T-DPM of SM and LM, we can roughly determine the time range of the signals and then combine it with the F-DPM to determine their frequency distribution. Through empirical analysis and numerical criteria, we decide which distributions to keep and which to discard. Given the distinct separation of signal and noise in the F-DPM and T-DPM, the effective signal components can be extracted manually from the F-DPM and T-DPM after smoothing.
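
The DPM construction described above can be sketched as follows. The function names (`dpm`, `f_dpm`, `t_dpm`, `narrowband_rows`), the threshold value, and the fraction-based criterion for flagging line-like rows are hypothetical choices made only for this illustration; the paper selects components through empirical analysis and smoothing rather than this exact rule. The TF matrix is assumed to be stored with frequency along rows and time along columns.

```python
import numpy as np

def dpm(tf, thresh):
    """Binary position map: 1 where |tf| survives the hard threshold, else 0."""
    return (np.abs(tf) > thresh).astype(int)

def f_dpm(mask):
    """F-DPM: count of surviving points per frequency row (accumulated over time)."""
    return mask.sum(axis=1)

def t_dpm(mask):
    """T-DPM: count of surviving points per time column (accumulated over frequency)."""
    return mask.sum(axis=0)

def narrowband_rows(mask, frac):
    """Hypothetical criterion: rows occupied in more than `frac` of all time samples."""
    return np.where(f_dpm(mask) / mask.shape[1] > frac)[0]

# Toy TF matrix: a line along the time axis (coupling-noise-like) at frequency
# row 2, and a compact cluster (signal-like) at rows 5-6, columns 3-4.
tf = np.zeros((8, 10))
tf[2, :] = 5.0
tf[5:7, 3:5] = 3.0

mask = dpm(tf, thresh=1.0)
print(f_dpm(mask))                  # [ 0  0 10  0  0  2  2  0]
print(t_dpm(mask))                  # [1 1 1 3 3 1 1 1 1 1]
print(narrowband_rows(mask, 0.8))   # [2]
```

In this toy case the line-like component dominates the F-DPM at a single row, while the compact cluster shows up as a short bump in the T-DPM, which is the contrast the method exploits to tell coupling noise from effective signal.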

Fig. 3 a, b The DPM of LM in the time direction and the frequency direction, respectively; c, d the DPM of SM in the time direction and the frequency direction, respectively. The amplitude in b, d is normalized. Some obvious effective signal components and coupling noise components are marked by red arrows

We constrain the rank of the data to suppress the background noise. The LSMD method we use is GreGoDec, proposed by Zhou and Tao (2013) on the basis of the GoDec algorithm. It is more robust to noise, converges faster, and is suitable for DAS records with large data volumes. The low-rank constraint is solved via nuclear norm minimization (Zhou and Zhang 2017). The workflow of DAS VSP data processing with the proposed LS-DPM-MSST method is shown in Fig. 4.

Fig. 4 The flow of DAS VSP data processing by the LS-DPM-MSST method

Experiments and results

Synthetic records

We first test the denoising performance of the proposed method on synthetic DAS-VSP data generated by a forward model. Figure 5 shows a 2-D forward geological model containing four layers with different wave velocities, where the abscissa is the horizontal distance (m), the ordinate is the depth (m), the inverted triangle represents the seismic source, and the vertical black line represents the fiber optic sensor. The parameters of the forward model are listed in Table 1, and the corresponding pure (noise-free) record is shown in Fig. 6a. By adding real background noise and coupling noise taken from DAS-VSP data to the pure record, we obtain the synthetic noisy DAS-VSP record shown in Fig. 6b. In the noisy record, most of the effective signals, especially those with weak energy, are seriously contaminated by noise, and the strong coupling noise destroys the continuity of the events.

Fig. 5 The forward model contains four layers with different wave velocities. The red rectangular box on the left marks the optical fiber cable. The red rectangular box in the top right corner marks the seismic source

Table 1 Parameters of the forward model
Fig. 6 Experiment with synthetic records. a The synthetic pure DAS-VSP record obtained by the forward model. b The synthetic noisy DAS-VSP record with real background noise and coupling noise. c–f Denoised results obtained with WT, VMD, BP, and LS-DPM-MSST, respectively

We apply the wavelet transform (WT), variational mode decomposition (VMD), bandpass filtering (BP), and the proposed method (LS-DPM-MSST) to the synthetic noisy DAS-VSP record; the denoised results are shown in Fig. 6c–f, respectively. The removed noise, that is, the difference between the noisy record and each denoised record, is shown in Fig. 7a–d. WT is a classical TFA method that is widely used in seismic data processing (Goudarzi and Riahi 2012; Ouadfeul and Aliouane 2014). The DAS seismic data are decomposed into scales by WT, and a threshold is set at each scale to screen the coefficients. We use the soft threshold function and select the scale and threshold parameters through repeated tests. VMD is a time–frequency method developed in recent years that has also been applied to noise suppression in seismic data (Liu et al. 2016; Liu and Duan 2020). It is a nonlinear TFA method that realizes adaptive decomposition. Compared with conventional EMD, VMD has a more solid mathematical foundation and avoids mode mixing to some extent. Through many tests, we select the appropriate decomposed modes to obtain a relatively good result. BP filtering is a classical and effective noise suppression method for most seismic data (Douglas 1997; Ma et al. 2019); here it is applied to the DAS seismic data with an appropriate frequency band.
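
As a point of reference for the BP baseline, the sketch below applies a zero-phase bandpass to a single trace by masking its spectrum. The brick-wall mask, the function name `fft_bandpass`, and the band edges are assumptions made for illustration; the filter design actually used for the comparison is not specified in the text.

```python
import numpy as np

def fft_bandpass(trace, fs, f_lo, f_hi):
    """Zero-phase bandpass: keep only spectral bins with f_lo <= f <= f_hi (Hz)."""
    n = len(trace)
    spec = np.fft.rfft(trace)
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    keep = (freqs >= f_lo) & (freqs <= f_hi)
    return np.fft.irfft(spec * keep, n=n)

# Synthetic trace sampled at 1 ms: a 30 Hz "signal", 200 Hz out-of-band noise,
# and a 40 Hz narrowband component standing in for in-band coupling noise.
fs, n = 1000.0, 1000
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 30 * t)
out_of_band = np.sin(2 * np.pi * 200 * t)
in_band = np.sin(2 * np.pi * 40 * t)

y1 = fft_bandpass(signal + out_of_band, fs, 10.0, 60.0)   # 200 Hz is removed
y2 = fft_bandpass(signal + in_band, fs, 10.0, 60.0)       # 40 Hz passes unchanged
```

The second case illustrates the limitation noted earlier: any noise whose band overlaps that of the effective signal passes through a bandpass filter untouched.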

Fig. 7 Removed noise records. a–d Noise removed by WT, VMD, BP, and LS-DPM-MSST, respectively

All four methods suppress the background noise, but the proposed method does so best: its record is the cleanest, with the least residual noise. The BP result shows the most residual background noise, followed by the VMD and WT results. In terms of signal preservation, the weak effective signals are difficult to observe in the WT result, whereas the proposed method recovers the effective signals clearly and continuously and preserves the weak signals better than VMD and BP. As for the coupling noise, the proposed method suppresses it well, while the other three methods leave considerable residual noise. In the LS-DPM-MSST result, apart from a very small part of the coupling noise, the noise is well suppressed, the denoised record is very clean, and the weak signals are well preserved. The removed noise records show that WT seriously attenuates the signals while eliminating the noise: more effective events leak into the WT removed noise record, and the loss of weak events is obvious. The removed noise record of VMD also contains obvious residues of the down-going direct event. The BP removed noise record shows no significant loss of effective signals, but it contains only background noise and almost no coupling noise, which indicates that BP has little effect on the coupling noise. The removed noise record of LS-DPM-MSST shows essentially no significant loss of effective events and contains most of the background noise and coupling noise.

Field records

To further demonstrate the effectiveness of the proposed method, we apply it to a field DAS-VSP record. Figure 8a presents field DAS-VSP data from the Tarim region of Xinjiang in western China, in which the abscissa is the trace number and the ordinate is the sample number. The sample interval of the DAS system is 1 ms, and each trace contains 6000 samples. In the field data, the down-going direct events and the weak up-going events are seriously contaminated by background noise and coupling noise. The denoised results (Fig. 8b–e) show that the proposed method effectively suppresses the coupling noise and background noise and makes the effective events more continuous, so that the quality of the field data is clearly improved. The other three methods (WT, VMD, and BP) have limited ability to suppress the coupling noise, and they cause a large energy loss of the effective signals, which destroys the continuity of the events.

Fig. 8 Experiment with field records. a A field DAS VSP record. b–e Denoised results obtained with WT, VMD, BP, and LS-DPM-MSST, respectively

Conclusions

We design an ITFA-like denoising method, LS-DPM-MSST, to eliminate the coupling noise and background noise in DAS-VSP data. The method determines the feature components of the effective signal by jointly analyzing the DPM of the LM and SM in the time and frequency domains, which enables more accurate extraction of effective information. Both synthetic and field examples show that the proposed method clearly suppresses coupling noise and background noise, reduces the leakage of effective information, and recovers weak effective signals.