1 State-of-the-Art Interpolation Methods

Physical or economic limitations often cause seismic data to be under-sampled or aliased, so interpolation or reconstruction is an important issue in seismic data processing (Trahanias and Venetsanopoulos 1993; Lin et al. 2015; Han et al. 2015; Wang et al. 2017; Schneider et al. 2017). Without reconstruction, under-sampled data may suffer from degraded amplitude quality and introduce sampling artifacts into subsequent processing workflows such as multi-channel deconvolution (Kazemi et al. 2016), surface-related multiple attenuation (Chen et al. 2017), simultaneous-source separation (Zhou et al. 2016; Zhou 2017), and amplitude variation with offset analysis (Keys 1981; Abma and Kabir 2006). There are generally three types of data reconstruction, namely, reconstruction of irregularly sampled data, of regularly sampled data, and of sparse geophysical data.

1.1 Interpolation for Irregularly Sampled Data

Interpolating irregularly sampled data is considered to be much easier than interpolating regularly sampled data because irregularly sampled data is less aliased (Yang et al. 2012). A large number of methods have been tailored for irregularly sampled data reconstruction. Sparsity-promoting reconstruction is one of the most widely used approaches in the literature and is closely related to the compressive sensing paradigm (Candès et al. 2006b; Donoho 2006; Candès and Plan 2010; Chen 2017). This type of method utilizes a sparse transform to distinguish between useful signals and missing traces. The incomplete data is first transformed into a sparse domain, where the useful signals are compressed into large-amplitude coefficients while the missing data produces small-amplitude coefficients. Rejecting the small-amplitude coefficients while preserving the large-amplitude ones extracts the useful signal components from the incomplete data. An inverse sparse transform is then used to recover the missing signal components at the under-sampled spatial locations. These steps are iterated multiple times until the missing data are optimally restored. The key in sparsity-promoting methods is the selection of the sparse transform, which should offer the best compression performance. Thus, considerable research effort has been dedicated to the development of sparse transforms. The state-of-the-art sparse transforms include the Fourier transform (Sacchi et al. 1998), wavelet transform (Antonini et al. 1992; Gilles 2013; Mousavi et al. 2016; Mousavi and Langston 2016, 2017), Radon transform (Beylkin 1987; Trad et al. 2002b; Deans 2007; Abbad et al. 2011; Ibrahim and Sacchi 2014; Xue et al. 2014), curvelet transform (Candès et al. 2006a; Zu et al. 2017), seislet transform (Fomel and Liu 2010), shearlet transform (Kong and Peng 2015), dreamlet transform (Wang et al. 2015), contourlet transform (Do and Vetterli 2005), bandlet transform (LePennec and Mallat 2005), dictionary-learning-based transforms (Elad et al. 2013; Elad and Aharon 2006; Mairal et al. 2008; Protter and Elad 2009; Liang et al. 2014), etc.

Low-rank approximation-based approaches are among the most popular interpolation approaches in recent years (Oropeza and Sacchi 2011; Wang et al. 2017). This type of method assumes that multi-channel data is inherently low rank in some mapping domain, where a rank-reduction operator, e.g., the singular value decomposition (SVD), can be applied to remove unwanted noise or to recover the missing data that causes the high-rank structure. The low-rank approximation approach can also be viewed as extracting the principal components from decimated and noise-corrupted data (Huang et al. 2017). Trickett et al. (2010) pioneered rank-reduction-based interpolation methods in the reflection seismic community, applying a truncated SVD to constant-frequency slices to reconstruct the missing data. Oropeza and Sacchi (2011) introduced the multi-channel singular spectrum analysis (MSSA)-based rank-reduction algorithm for simultaneous data reconstruction and noise rejection. The incomplete multi-channel data is first transformed into the frequency domain; then a Hankel matrix is constructed for each frequency slice, which is followed by SVD. The missing data is reconstructed through a POCS-like iterative solver, and the method performs well for reflection seismic data containing locally linear or planar structures. Huang et al. (2016) derived a theoretically optimal rank-reduction operator for better extracting useful signals from noise-corrupted and highly incomplete data, which they call the damped multi-channel singular spectrum analysis (DMSSA) operator. Chen et al. (2016b) further applied the DMSSA operator to interpolate highly incomplete 5D seismic data and demonstrated much improved performance. Siahsar et al. (2017) developed an adaptive optimum-shrink strategy for reconstructing the missing data.

1.2 Interpolation for Regularly Sampled Data

The two aforementioned interpolation methods can only be effectively used to restore irregularly missing data. For regularly sampled data, neither type of method can obtain satisfactory performance due to the strong aliasing. To tackle this challenge, interpolation methods based on different principles have been developed. Spitz (1993) developed a prediction-based interpolation method to deal with strongly aliased data. The idea is to construct prediction-error filters from the low-frequency components of the multi-channel data, which are less aliased or not aliased, and then to apply the prediction filters to the high-frequency components of the data. Naghizadeh and Sacchi (2007) extended the prediction method to interpolate irregularly sampled data that also contains aliased events; they likewise extracted the prediction filters from the low-frequency content. Based on a similar concept, Gan et al. (2015) obtained from the low-frequency data the local slope map that is crucial for the anti-aliasing ability of the seislet transform. Naghizadeh and Sacchi (2010) conducted eigen decomposition of the Hankel matrix at low frequencies and used the result as a beyond-alias preconditioning operator to regularize the eigen decomposition of the high-frequency slices. In summary, all these anti-aliasing interpolation methods for regularly sampled data rely on the less-aliased low-frequency information to help reconstruct the strongly aliased high-frequency data in some way, e.g., via prediction, local slope, or eigen decomposition.

1.3 Interpolation for Sparse Data

Interpolating sparse data, e.g., reflection seismic or ground penetrating radar (GPR) data with large gaps, or sparsely distributed well-log data, is currently a challenge in many geophysics domains. Although there have been tentative efforts, this challenge has not been well solved by existing techniques. Hale (2010) proposed a method for interpolating multiple wells located at different places on the 2D surface, called image-guided 3D interpolation. In this method, a blended neighbor scheme is used for interpolation. The method requires a 3D reflection seismic image for guiding the borehole data interpolation along the geological structure. In addition, the data used in Hale (2010) is a relatively flat 3D image; application of the method to more complicated data has not been reported. Because of these requirements, the method in Hale (2010) is only applicable to post-stack data interpolation and cannot be applied to more complicated data, e.g., pre-stack data. Karimi and Fomel (2015) proposed a well-log interpolation approach named predictive painting, which predicts unknown well logs from known well logs according to a prior geological trend. The geological trend is obtained from the local slope calculated from a known 3D reflection seismic image using the plane-wave destruction (PWD) algorithm. The limitation of this method is that it can only be used to predict a very smooth velocity model, since large discontinuities such as faults and fractures result in unreliable slope prediction.

In this paper, we first review the state-of-the-art interpolation methods that are widely used in the geophysics community. Then, a new inversion-based method is proposed to tackle the bottleneck of state-of-the-art algorithms in interpolating extremely sparse data. The method has minimal requirements for a priori information. Considering the severe ill-posedness of the inverse problem, shaping regularization is used iteratively to constrain the model. Compared with other well-known regularization methods, the shaping regularization method offers much more flexible control on the model and much faster convergence. We provide two straightforward applications of the method: one in interpolating data with large gaps, the other in interpolating sparse well-log data to prepare optimal initial velocity models that are required by high-fidelity full-waveform inversion (FWI). Both synthetic and real data examples are shown to demonstrate the effectiveness of the proposed method.

2 Theoretical Review of the Methods

2.1 Interpolation as an Inverse Problem

The problem of interpolating multi-channel geophysical data can be formulated as an inverse problem as follows (Trad et al. 2002a; Liu and Sacchi 2004; Yang et al. 2012; Zhang et al. 2017):

$$\begin{aligned} {\mathbf {d}} = {\mathbf {L}}{\mathbf {m}}, \end{aligned}$$
(1)

where \({\mathbf {d}}\) denotes the recorded geophysical data, \({\mathbf {L}}\) denotes the sampling operator, and \({\mathbf {m}}\) denotes the complete geophysical data. Both \({\mathbf {d}}\) and \({\mathbf {m}}\) are vectors, which are constructed by arranging all the traces in the data matrix \({\mathbf {D}}\) and model matrix \({\mathbf {M}}\) into vectors:

$$\begin{aligned} {\mathbf {d}}=\left[ \begin{array}{c} {\mathbf {D}}(1) \\ {\mathbf {D}}(2) \\ \vdots \\ {\mathbf {D}}(N) \end{array}\right] ,\quad {\text {and}},\quad {\mathbf {m}}=\left[ \begin{array}{c} {\mathbf {M}}(1) \\ {\mathbf {M}}(2) \\ \vdots \\ {\mathbf {M}}(N) \end{array}\right] , \end{aligned}$$
(2)

where \({\mathbf {D}}(i)\) and \({\mathbf {M}}(i)\) denote the ith trace in the matrices \({\mathbf {D}}\) and \({\mathbf {M}}\), and N is the number of traces in the data/model matrix. The sampling operator samples certain traces from the complete data and discards the others. It can be expressed as a block diagonal matrix

$$\begin{aligned} {\mathbf {L}}=\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} {\mathbf {I}} &{} &{} &{} &{} &{} \\ &{}{\mathbf {I}}&{} &{} &{} &{} \\ &{}&{} {\mathbf {O}}&{} &{} &{} \\ &{}&{} &{} {\mathbf {I}}&{} &{} \\ &{}&{} &{} &{}\ddots &{} \\ &{}&{} &{} &{} &{} {\mathbf {I}}\\ \end{array} \right] . \end{aligned}$$
(3)

\({\mathbf {I}}\) is an identity operator, the size of which is the same as the dimension of \({\mathbf {D}}(i)\) or \({\mathbf {M}}(i)\) in Eq. 2. Each \({\mathbf {I}}\) indicates sampling a trace from the complete data and each \({\mathbf {O}}\) indicates a missing trace. Because of the many \({\mathbf {O}}\) blocks in \({\mathbf {L}}\), the sampling operator is highly singular, and the observed data \({\mathbf {d}}\) contains a large portion of zero entries. The challenge of interpolation lies in inverting the highly under-determined Eq. 1.
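
To make Eq. 3 concrete, the following minimal sketch (our illustration; the function and variable names are not from the original) applies the sampling operator as a boolean trace mask rather than an explicit block matrix, which is equivalent and far cheaper in practice:

```python
import numpy as np

def apply_sampling(m_mat, trace_mask):
    """Apply the sampling operator L of Eq. 3: each I block keeps a trace
    of the model matrix M, and each O block zeroes a missing trace."""
    d_mat = m_mat.copy()
    d_mat[:, ~trace_mask] = 0.0
    return d_mat

# example: 100 time samples, 10 traces, with traces 3-5 missing
m_mat = np.random.randn(100, 10)
mask = np.ones(10, dtype=bool)
mask[3:6] = False
d_mat = apply_sampling(m_mat, mask)   # observed data with zeroed traces
```

Because the mask is diagonal with 0/1 entries, the same function also implements \({\mathbf {L}}^{\mathrm{T}}\) and \({\mathbf {L}}^{\mathrm{T}}{\mathbf {L}}\), a property used in Sect. 2.4.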

2.2 Regularized Optimization Problem

In order to solve the under-determined inverse problem shown in Eq. 1, the common way is to formulate it as an optimization problem with a regularization term (Spitz 1991; Porsani 1999; Wang 2002; Yang et al. 2013)

$$\begin{aligned} \hat{{\mathbf {m}}} = \arg \min _{{\mathbf {m}}} {\parallel }{\mathbf {d}}-{\mathbf {Lm}} {\parallel }_2^2 + \epsilon ^2{\mathbf {R}}({\mathbf {m}}), \end{aligned}$$
(4)

where \(\hat{{\mathbf {m}}}\) is the model to be estimated, \({\parallel } \cdot {\parallel }_2\) denotes the \(L_2\) norm of the input vector, and \({\mathbf {R}}\) denotes a regularization operator. \(\epsilon ^2\) is a balancing parameter that trades off the data misfit \({\parallel }{\mathbf {d}}-{\mathbf {Lm}} {\parallel }_2^2\) against the model constraint \({\mathbf {R}}({\mathbf {m}})\).

The two most widely used model constraints in the geophysics community are the sparsity-based and the low-rank-based constraints. In the sparsity-based regularized optimization problem (Ma 2009; Liang et al. 2014; Zhong et al. 2016)

$$\begin{aligned} \hat{{\mathbf {m}}} = \arg \min _{{\mathbf {m}}} {\parallel }{\mathbf {d}}-{\mathbf {Lm}} {\parallel }_2^2 + \epsilon ^2{\parallel } {\mathbf {A}}{\mathbf {m}}{\parallel }_1, \end{aligned}$$
(5)

the regularization operator is the \(L_1\) norm (denoted by \({\parallel }\cdot {\parallel }_1\)) of the sparse transform domain coefficients. \({\mathbf {A}}\) in Eq. 5 is a sparse transform, e.g., the Fourier transform (Sacchi et al. 1998), wavelet transform (Portilla et al. 2003), curvelet transform (Qiao et al. 2016; Ma and Plonka 2007), seislet transform (Gan et al. 2015; Chen 2016), etc.

The well-known iterative soft-shrinkage thresholding (IST) method can be used to solve the optimization problem shown in Eq. 5:

$$\begin{aligned} {\mathbf {m}}_{n+1} = {\mathbf {A}}^{-1}{\mathbf {T}}_{\tau }{\mathbf {A}}\left[ {\mathbf {m}}_n+\lambda {\mathbf {L}}^{\mathrm{T}}({\mathbf {d}}-{\mathbf {L}}{\mathbf {m}}_n) \right] , \end{aligned}$$
(6)

where \({\mathbf {A}}\) and \({\mathbf {A}}^{-1}\) are a pair of forward and inverse sparse transforms. \(\lambda\) is the step length for the model update. \({\mathbf {T}}_{\tau }\) denotes the soft thresholding operator with threshold \(\tau\).
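
For illustration, the sketch below implements the IST iteration of Eq. 6 with the 2D FFT standing in for the sparse transform \({\mathbf {A}}\) (one possible choice among the transforms listed above); the parameter values and names are our assumptions, not the authors':

```python
import numpy as np

def ist_interpolate(d, mask, tau=0.1, lam=1.0, niter=50):
    """IST iteration of Eq. 6 with A = 2D FFT (a minimal sketch).
    d: (n_time, n_traces) array with zero traces where mask is False."""
    m = d.copy()
    for _ in range(niter):
        # gradient step: m + lam * L^T (d - L m); L^T = L for a 0/1 mask
        r = np.zeros_like(m)
        r[:, mask] = d[:, mask] - m[:, mask]
        u = m + lam * r
        # soft thresholding T_tau of the transform-domain coefficients
        c = np.fft.fft2(u)
        t = tau * np.abs(c).max()          # threshold relative to max coefficient
        mag = np.maximum(np.abs(c), 1e-12)
        c *= np.maximum(1.0 - t / mag, 0.0)
        m = np.real(np.fft.ifft2(c))
    return m
```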

The low-rank-based regularized optimization problem is formulated as follows (Sacchi 2009; Oropeza and Sacchi 2011; Chen et al. 2016a, b; Huang et al. 2017; Wu and Bai 2018; Zhou et al. 2018):

$$\begin{aligned} \hat{{\mathbf {m}}} = \arg \min _{{\mathbf {m}}} {\parallel }{\mathbf {d}}-{\mathbf {Lm}} {\parallel }_2^2 + \epsilon ^2{\parallel } {\mathcal {M}}({\mathbf {m}}){\parallel }_*, \end{aligned}$$
(7)

where \({\mathcal {M}}\) denotes a mapping operation, e.g., Hankelization (Zhou et al. 2018) or local flattening (Zhou et al. 2017), and \({\parallel }\cdot {\parallel }_*\) denotes the nuclear norm. An iterative singular value thresholding framework (Cai et al. 2010) can be used to solve optimization problem 7:

$$\begin{aligned} {\mathbf {m}}_{n+1} = {\mathcal {M}}^{-1}{\mathcal {T}}_{k}{\mathcal {M}}\left[ {\mathbf {m}}_n+\lambda {\mathbf {L}}^{\mathrm{T}}({\mathbf {d}}-{\mathbf {L}}{\mathbf {m}}_n) \right] , \end{aligned}$$
(8)

where \({\mathcal {M}}\) and \({\mathcal {M}}^{-1}\) are a pair of forward and inverse mapping operations. \({\mathcal {T}}_{k}\) denotes the singular value thresholding (rank-reduction) operator with rank parameter k.
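
For intuition, the following sketch (ours, not the authors' implementation) applies the mapping, rank reduction, and inverse mapping of Eq. 8 to a single complex frequency slice, in the spirit of the MSSA construction of Oropeza and Sacchi (2011); the Hankelization and anti-diagonal averaging follow the standard recipe:

```python
import numpy as np
from scipy.linalg import hankel, svd

def rank_reduce_slice(s, k):
    """Hankelize a complex frequency slice s, keep the k largest
    singular values, and average anti-diagonals back to a slice."""
    n = len(s)
    l = n // 2 + 1
    h = hankel(s[:l], s[l - 1:])           # mapping M: slice -> Hankel matrix
    u, sig, vh = svd(h, full_matrices=False)
    hk = (u[:, :k] * sig[:k]) @ vh[:k]     # rank-reduction operator T_k
    # inverse mapping M^{-1}: average along anti-diagonals
    out = np.zeros(n, dtype=complex)
    cnt = np.zeros(n)
    for i in range(hk.shape[0]):
        for j in range(hk.shape[1]):
            out[i + j] += hk[i, j]
            cnt[i + j] += 1
    return out / cnt
```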

2.3 Inversion Based on Shaping Regularization

Fig. 1 a Reflector model. b Slope calculated from the reflector model. c The data after applying the extension operator. d The data after applying the averaging operator (adjoint of the extension operator)

Solving Eq. 4 using shaping regularization is similar to the iterative framework of IST, as shown in Eq. 6. The general iterative framework can be expressed as

$$\begin{aligned} {\mathbf {m}}_{n+1}={\mathbf {S}}\left[ {\mathbf {m}}_n+\lambda {\mathbf {B}}({\mathbf {d}}-{\mathbf {L}}{\mathbf {m}}_n) \right] , \end{aligned}$$
(9)

where \({\mathbf {S}}\) is the shaping operator and \({\mathbf {B}}\) is the backward operator. In the shaping regularization framework (Fomel 2007b; Chen et al. 2014), the shaping operator \({\mathbf {S}}\) constrains the model based on some a priori knowledge of the model, e.g., sparsity, coherency, or smoothness. The backward operator is an approximate inverse of the forward operator, providing an approximate mapping from the data space to the model space. The IST framework introduced above is clearly a specific case of the shaping regularization framework, in which the shaping operator is chosen as the transform-domain thresholding operator and the backward operator is simply chosen as the adjoint of the forward operator:

$$\begin{aligned} {\mathbf {S}}={\mathbf {A}}^{-1}{\mathbf {T}}_{\tau }{\mathbf {A}},\quad {\text {and}},\quad {\mathbf {B}}={\mathbf {L}}. \end{aligned}$$
(10)
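
In code, the generic iteration of Eq. 9 is only a few lines; in the sketch below (our own, with the operators passed as callables), choosing S as transform-domain thresholding and B as the adjoint of L recovers the IST scheme of Eq. 10:

```python
def shaping_solve(d, L, S, B, lam=1.0, niter=50):
    """Generic shaping-regularization iteration of Eq. 9 (a sketch).
    L: forward (sampling) operator; S: shaping operator enforcing the
    a priori model constraint; B: backward operator (e.g., adjoint of L)."""
    m = B(d)                      # a simple starting model
    for _ in range(niter):
        m = S(m + lam * B(d - L(m)))
    return m
```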

A linear operator is often preferred for \({\mathbf {S}}\), since, after some mathematical manipulation, the iterative framework can then be significantly accelerated by fast linear solvers such as the steepest descent and conjugate gradient methods.

When the operators \({\mathbf {S}}\) and \({\mathbf {B}}\) are both linear and the iteration in Eq. 9 converges to \(\hat{{\mathbf {m}}}\), then

$$\begin{aligned} \hat{{\mathbf {m}}}={\mathbf {S}}\left[ \hat{{\mathbf {m}}}+\lambda {\mathbf {B}}({\mathbf {d}}-{\mathbf {L}}\hat{{\mathbf {m}}}) \right] . \end{aligned}$$
(11)

\(\hat{{\mathbf {m}}}\) can thus be expressed as

$$\begin{aligned} \hat{{\mathbf {m}}}=[{\mathbf {I}}-{\mathbf {S}}+\lambda \mathbf {SBL}]^{-1}\lambda {\mathbf {SB}}{\mathbf {d}}. \end{aligned}$$
(12)

The shaping regularization solution (12) can be related to the well-known Tikhonov regularization. Consider the Tikhonov-regularized problem

$$\begin{aligned} \hat{{\mathbf {m}}} = \arg \min _{{\mathbf {m}}} {\parallel }{\mathbf {d}}-{\mathbf {Lm}} {\parallel }_2^2 + \epsilon ^2{\parallel }{\mathbf {D}}{\mathbf {m}}{\parallel }_2^2, \end{aligned}$$
(13)

where \({\mathbf {D}}\) is the regularization operator, e.g., a differential or Laplacian operator. With the shaping operator chosen as \({\mathbf {S}}=({\mathbf {I}}+\epsilon ^2{\mathbf {D}}^{\mathrm{T}}{\mathbf {D}})^{-1}\), shaping regularization is mathematically equivalent to traditional Tikhonov regularization, but it allows much more flexible control on the model: from Eq. 11, it is clear that we can flexibly apply any reasonable linear or nonlinear constraint to the model \(\hat{{\mathbf {m}}}\).

Fig. 2 a Complete data. b Incomplete data with big gaps (5.72 dB)

Fig. 3 a Reconstructed data using the POCS method (8.01 dB). b Reconstructed data using the proposed approach (22.30 dB)

Fig. 4 a Slope estimated from the complete data. b Slope estimated from the incomplete data

Fig. 5 Single-trace comparison for the middle trace shown in Figs. 2 and 3. The black line is the exact solution. The blue line corresponds to the POCS method. The green line corresponds to the proposed method. Note that the black and green lines are very close to each other while the blue line deviates significantly from the black line; thus the reconstruction error of the proposed approach is much smaller than that of the POCS method

Fig. 6 Reconstruction performance for noisy data. a Incomplete and noisy data (\(-\,1.46\,\hbox {dB}\)). b Reconstructed data using the proposed approach (9.54 dB)

Fig. 7 Single-trace comparison for the middle trace shown in Fig. 6. The black line is the exact solution. The blue line corresponds to the proposed method. Note that in the presence of noise, the proposed method can still obtain a very accurate reconstruction. Although the algorithm appears to convert the random noise into some “coherent noise,” the amplitude of this “coherent noise” is much weaker after reconstruction

2.4 Structural Smoothness Regularized Sparse Data Interpolation

In this paper, we propose to use a structural smoothness constraint to regularize the model. The structural smoothness shaping operator can be formulated as

$$\begin{aligned} {\mathbf {S}}={\mathbf {P}}{\mathbf {H}}{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}, \end{aligned}$$
(14)

where \({\mathbf {H}}\) is a triangle smoothing operator, as introduced in Fomel (2007b) and Xue et al. (2016b), and \({\mathbf {H}}^{\mathrm{T}}\) denotes its transpose or adjoint operator. \({\mathbf {P}}\) is a summation operator, which corresponds to averaging the traces along the structural direction. \({\mathbf {P}}^{\mathrm{T}}\) is its adjoint operator, which extends a trace to a group of traces along the structural direction so that the triangle smoothing operator can be applied along the structure; it is therefore called an extension operator. In this paper, the backward operator \({\mathbf {B}}\) is simply chosen as the transpose of the forward operator: \({\mathbf {B}}={\mathbf {L}}^{\mathrm{T}}\).

Substituting Eq. 14 and \({\mathbf {B}}={\mathbf {L}}^{\mathrm{T}}\) into Eq. 12 we obtain

$$\begin{aligned} \hat{{\mathbf {m}}}=[{\mathbf {I}}-{\mathbf {P}}{\mathbf {H}}{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}+\lambda {\mathbf {P}}{\mathbf {H}}{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}{\mathbf {L}}^{\mathrm{T}}{\mathbf {L}}]^{-1}\lambda {\mathbf {P}}{\mathbf {H}}{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}{\mathbf {L}}{\mathbf {d}}. \end{aligned}$$
(15)

It can be further derived as

$$\begin{aligned} \hat{{\mathbf {m}}}=\mathbf {PH}[{\mathbf {I}}+{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}(\lambda {\mathbf {L}}^{\mathrm{T}}{\mathbf {L}}-{\mathbf {I}}){\mathbf {P}}{\mathbf {H}}]^{-1}\lambda {\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}{\mathbf {L}}{\mathbf {d}}. \end{aligned}$$
(16)

Let \(\lambda =\frac{1}{c^2}\),

$$\begin{aligned} \hat{{\mathbf {m}}}=\mathbf {PH}[c^2{\mathbf {I}}+{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}({\mathbf {L}}^{\mathrm{T}}{\mathbf {L}}-c^2{\mathbf {I}}){\mathbf {P}}{\mathbf {H}}]^{-1} {\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}{\mathbf {L}}{\mathbf {d}}. \end{aligned}$$
(17)

Considering that \({\mathbf {L}}\) is a block diagonal matrix of identity and zero blocks, as shown in Eq. 3, \({\mathbf {L}}^{\mathrm{T}}={\mathbf {L}}\) and \({\mathbf {L}}^{\mathrm{T}}{\mathbf {L}}={\mathbf {L}}\). Equation 17 thus simplifies to

$$\begin{aligned} \hat{{\mathbf {m}}}=\mathbf {PH}[c^2{\mathbf {I}}+{\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}({\mathbf {L}}-c^2{\mathbf {I}}){\mathbf {P}}{\mathbf {H}}]^{-1} {\mathbf {H}}^{\mathrm{T}}{\mathbf {P}}^{\mathrm{T}}{\mathbf {L}}{\mathbf {d}}. \end{aligned}$$
(18)

When the inverted matrix is positive-definite, Eq. 18 can be solved via the conjugate gradient (CG) algorithm. The detailed CG workflow for the structural smoothness regularized interpolation (SSRI) is as follows:

Algorithm: CG workflow for SSRI
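
A minimal sketch of this workflow (our reconstruction under stated assumptions, not the authors' exact pseudocode) solves the inner system of Eq. 18 with SciPy's conjugate gradient; the operator callables and the extended-domain size n_ext are assumed to be supplied by the user:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def ssri_solve(d, L, PH, HtPt, n_ext, c=1.0, tol=1e-6, N=200):
    """Solve Eq. 18 by conjugate gradients (a sketch).
    d    : observed data, flattened to a vector
    L    : callable sampling operator (zeroes missing traces)
    PH   : callable mapping the extended/smoothed domain to the model domain
    HtPt : callable adjoint of PH
    n_ext: size of the extended domain
    """
    def matvec(x):
        # [c^2 I + H^T P^T (L - c^2 I) P H] x
        m = PH(x)
        return c**2 * x + HtPt(L(m) - c**2 * m)

    A = LinearOperator((n_ext, n_ext), matvec=matvec, dtype=d.dtype)
    b = HtPt(L(d))                           # right-hand side H^T P^T L d
    x, info = cg(A, b, rtol=tol, maxiter=N)  # requires A positive-definite
    return PH(x)                             # m_hat = P H x
```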

In the above algorithm, c is a regularization parameter; it is often fixed as the least-squares norm of the modeling matrix (Fomel 2007a). tol denotes the minimal tolerance for the model update, and N denotes the number of iterations.

Figure 1 demonstrates how the extension operator \({\mathbf {P}}^{\mathrm{T}}\) works. Figure 1a shows an input subsurface reflector model. Figure 1b shows the local slope calculated from the reflector model using the PWD algorithm. Figure 1c shows the result after applying the extension operator. Figure 1d shows the result after applying the adjoint extension operator to Fig. 1c. It is clear that the extension operator extends each trace in both the left and right directions along the structure so that the subsequent smoothing operator can be applied along the structural direction. The adjoint extension operator is simply an averaging operator along the structural direction. The extension width controls how far each trace is extended; in this case, a maximum extension width of 7 traces is used.

Next, we theoretically analyze the computational complexities of the POCS and SSRI methods.

In POCS, let the model and data have the same size M. With the FFT, \({\mathbf {A}}{\mathbf {m}}_n\) takes \(O(M\log M)\), \({\mathbf {T}}_{\tau }\) takes O(M), \({\mathbf {A}}^{-1}\) takes \(O(M\log M)\), and \({\mathbf {L}}\) takes O(M) because \({\mathbf {L}}\) is a diagonal matrix operator. The overall computational complexity is thus \(O(NM\log M)\), where N denotes the number of iterations.

In SSRI, \({\mathbf {L}}\mathbf {r} - c{\mathbf {m}}\) takes O(M), and \({\mathbf {P}}^{\mathrm{T}}\mathbf {g}_m\) takes O(M). \({\mathbf {H}}^{\mathrm{T}}\mathbf {g}_p + c\mathbf {p}\) takes O(MR), where R denotes the smoothing radius, and \({\mathbf {H}}\mathbf {g}_p\) also takes O(MR). \({\mathbf {P}}\mathbf {g}_m\) and \({\mathbf {L}}\,\mathbf {g}_m\) take O(M), and all other operations take O(M). Thus, the overall computational complexity of SSRI is O(NMR). Considering that the smoothing radius is often chosen as half the period of the wavelet or half of the maximum gap size, \(R<\log (M)\), so SSRI is faster than the POCS method.

Two key components of this algorithm are the matrices \({\mathbf {H}}\) and \({\mathbf {P}}\). The triangle smoothing operator \({\mathbf {H}}\) can be constructed by cascading two rectangle smoothing operators (see the numerical sketch after Eq. 21). Taking a smoothing radius of 3 as a simple example and assuming the model size is 10, \({\mathbf {H}}\) explicitly has the following structure

$$\begin{aligned} {\mathbf {H}} = \frac{1}{9}\left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c@{\quad }c} 3 &{} 2 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 2 &{} 3 &{} 2 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 1 &{} 2 &{} 3 &{} 2 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 1 &{} 2 &{} 3 &{} 2 &{} 1 &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 1 &{} 2 &{} 3 &{} 2 &{} 1 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 &{} 2 &{} 3 &{} 2 &{} 1 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 2 &{} 3 &{} 2 &{} 1 &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 2 &{} 3 &{} 2 &{} 1 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 2 &{} 2 &{} 1 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} 1 &{} 1 &{} 1 \\ \end{array}\right] . \end{aligned}$$
(19)

The averaging operator \({\mathbf {P}}\) explicitly has the following structure

$$\begin{aligned} {\mathbf {P}} = \left[ \overbrace{\begin{array}{c@{\quad }c@{\quad }c@{\quad }c} {\mathbf {I}}&{\mathbf {I}}&\cdots&{\mathbf {I}} \end{array}}^{7}\right] , \end{aligned}$$
(20)

where

$$\begin{aligned} {\mathbf {I}} = \left[ \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} 1 &{} 0 &{}0 &{}0\\ 0 &{} 1 &{}0 &{}0 \\ 0 &{} 0 &{}\ddots &{} 0\\ 0 &{} 0 &{} 0 &{}1 \end{array}\right] . \end{aligned}$$
(21)
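
The cascading construction is easy to verify numerically. The sketch below (our illustration) builds a length-3 rectangle smoother with simple edge truncation and cascades it; the interior rows reproduce the triangle weights of Eq. 19, while the boundary rows depend on the chosen edge convention and may differ from the explicit matrix above:

```python
import numpy as np

def boxcar_matrix(n, half=1):
    """Rectangle smoothing matrix (window length 2*half+1, truncated at edges)."""
    B = np.zeros((n, n))
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        B[i, lo:hi] = 1.0 / (2 * half + 1)
    return B

B = boxcar_matrix(10)
H = B @ B                        # cascade of two rectangle smoothers
print(np.round(9 * H[4], 2))     # interior row: [0 0 1 2 3 2 1 0 0 0] / 9

# the summation operator P of Eq. 20: seven identity blocks side by side
# (block size 4 here is arbitrary, for illustration only)
P = np.hstack([np.eye(4)] * 7)
```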

The fault-preserving capability of the proposed algorithm depends on the size of the largest gap in the data. If the gap is small, we can use a relatively small smoothing radius, which preserves the faults. However, if the gap is large, we must use a large smoothing radius, which will smooth over the faults. It is worth noting that a preferred strategy for determining the smoothing radius would be adaptive; ideally, the algorithm should automatically determine the smoothing radius based on the data distribution density, the wavelet resolution, and other related factors. However, the current algorithm does not support a spatially varying smoothing operator, which will be a focus of our future research. An adaptive algorithm that can adaptively determine the resolution and the implicit regularization contribution, as proposed in Hu (2017), would be helpful in solving this problem.

The proposed algorithm can be extended to multi-dimensional interpolation. The extension to 3D or 5D interpolation is part of our future research plan, and we do not show such examples here. As analyzed previously, the computational complexity is related only to the data size, independent of the dimension; thus, the higher the dimension, the greater the computational advantage of the proposed method. For 5D interpolation, there are generally two ways to enforce coherence. First, we can construct the data as a 5D structural tensor, so that 5D smoothing turns into 5D tensor calculation. Second, the 5D data occupies a \(t-hx-hy-x-y\) space: we can fix x and y to apply the coherence constraint to the \(t-hx-hy\) 3D object, and then fix hx and hy to apply the coherence constraint to the \(t-x-y\) 3D object; alternatively, we can fix hx and x and apply the coherence constraint to the \(t-hy-y\) 3D object. However, the result obtained with this strategy will differ from the 5D structural tensor approach; it is only a temporary strategy for applying the smoothness constraint to 5D data. To fully utilize the 5D structural pattern, we need to develop exact 5D smoothness constraints, and we are still investigating this possibility.

Fig. 8 Application in interpolating missing shot gathers. a Incomplete shot gathers. b Reconstructed shot gathers

3 Applications

3.1 Application in Interpolating Big Gaps in Geophysical Data: Synthetic Examples

Fig. 9 a Complete data. b Incomplete data with big gaps (4.23 dB)

Fig. 10 a Reconstructed data using the POCS method (5.95 dB). b Reconstructed data using the proposed approach (14.46 dB). c Reconstructed data using the POCS method after NMO correction (7.26 dB)

Fig. 11 Demonstration of the NMO correction. a NMO-corrected data using the picked velocity. b NMO-corrected data using a rough velocity. Note the strong NMO stretching in a

Fig. 12 a Incomplete data with two gaps (3.66 dB). b Slope estimated from the incomplete data. c Reconstructed data using the POCS method (5.42 dB). d Reconstructed data using the proposed approach (12.90 dB)

Fig. 13 a Incomplete data with three gaps (3.20 dB). b Slope estimated from the incomplete data. c Reconstructed data using the POCS method (4.89 dB). d Reconstructed data using the proposed approach (12.08 dB)

Fig. 14 a Incomplete data with six gaps (1.70 dB). b Slope estimated from the incomplete data. c Reconstructed data using the POCS method (2.62 dB). d Reconstructed data using the proposed approach (9.46 dB)

Fig. 15 a Data containing large gaps. b Estimated local slope. c Reconstructed data

The crucial requirement for using the proposed interpolation algorithm to interpolate big gaps in geophysical data is to obtain a fairly acceptable local slope model of the data. We find that using the low-frequency components of the data together with a large smoothing radius helps obtain the local slope from the incomplete data. Thus, the two important steps involved in big-gap interpolation are as follows:

  1. Calculate the local slope from the low-frequency data with a relatively large smoothing radius.

  2. Interpolate the large gaps using the structural smoothness regularized interpolation method.

We first show a synthetic example in Figs. 2 and 3. Figure 2a shows the complete gather, which contains hyperbolic events. Figure 2b shows the incomplete data; as can be seen, there is a large gap in the middle of the gather. The reconstructed result using the proposed method is shown in Fig. 3b, which is very close to the exact solution shown in Fig. 2a. As a comparison, we also show the result from the projection onto convex sets (POCS) method (Abma and Kabir 2006). The POCS method uses the following iteration for solving Eq. 1:

$$\begin{aligned} {\mathbf {m}}_{n+1} = ({\mathbf {I}}-{\mathbf {L}}){\mathbf {A}}^{-1}{\mathbf {T}}_{\tau }{\mathbf {A}}{\mathbf {m}}_n + {\mathbf {d}}, \end{aligned}$$
(22)

where the symbols have the same meanings as introduced in Sect. 2. The result from the POCS method is shown in Fig. 3a. The POCS method reconstructs the missing near-offset data to some extent, but the spatial coherence is not as good as that of the proposed method. To show that the PWD method is capable of estimating the local slope map even in the presence of big gaps, we plot the local slope fields calculated from the original data (Fig. 2a) and the incomplete data (Fig. 2b) in Fig. 4. It is clear that the two slope maps are almost the same.
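
For completeness, a minimal sketch of the POCS iteration of Eq. 22 follows, again with the 2D FFT as the sparse transform \({\mathbf {A}}\) and a simple hard threshold (the threshold schedule and transform choice are our assumptions):

```python
import numpy as np

def pocs_interpolate(d, mask, rel_tau=0.05, niter=100):
    """POCS iteration of Eq. 22 (a sketch).
    d: (n_time, n_traces) array with zero traces where mask is False."""
    m = d.copy()
    for _ in range(niter):
        c = np.fft.fft2(m)
        keep = np.abs(c) > rel_tau * np.abs(c).max()
        m = np.real(np.fft.ifft2(np.where(keep, c, 0)))  # A^{-1} T_tau A m
        m[:, mask] = d[:, mask]  # (I - L)(...) + d: reinsert observed traces
    return m
```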

To quantitatively compare the reconstruction performance, we use the commonly used SNR, defined as follows (Chang et al. 2000; Li and Orchard 2001; Gan et al. 2015):

$$\begin{aligned} {\text {SNR}}=10\log _{10}\frac{{\parallel } \mathbf {s} {\parallel } _2^2}{{\parallel }\mathbf {s} -\hat{\mathbf {s}}{\parallel } _2^2}, \end{aligned}$$
(23)

where \(\mathbf {s}\) denotes the exact solution and \(\hat{\mathbf {s}}\) denotes the data to be evaluated. Using this metric, the incomplete data has an SNR of 5.72 dB. The SNRs of the data reconstructed by the POCS method and the proposed method are 8.01 dB and 22.30 dB, respectively.
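
Equation 23 translates directly into code; a minimal sketch:

```python
import numpy as np

def snr_db(s, s_hat):
    """SNR of Eq. 23 in decibels; s is the exact solution."""
    return 10 * np.log10(np.sum(s**2) / np.sum((s - s_hat)**2))
```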

To compare the reconstruction performance in detail, we plot a single-trace amplitude comparison in Fig. 5. The three traces are extracted from the middle trace (the 128th trace) of the original data and of the two datasets reconstructed using the POCS and proposed methods. In this example, the black and green lines are very close to each other, while the blue line deviates considerably from the black line.

We also test the robustness of the proposed method in the presence of strong random noise. Figure 6a shows the incomplete data containing strong random noise. Figure 6b shows the reconstructed result using the proposed method. Figure 7 shows the single-trace comparison for the noise test; the black line shows the exact solution, and the blue line shows the reconstructed trace using the proposed method. In this test, the SNRs of the incomplete data and the reconstructed data are \(-\,1.46\,\hbox {dB}\) and 9.54 dB, respectively. Even for noisy data, the proposed method is robust enough to obtain a very accurate model estimation.

The previous examples all interpolate missing traces within a single shot gather. To see how the algorithm performs when one or several shot gathers are missing, we create another synthetic example, shown in Fig. 8. Figure 8a shows a pre-stack dataset with many missing shot gathers. Figure 8b displays the reconstructed pre-stack dataset with all missing shot gathers successfully filled.

3.2 Application in Interpolating Big Gaps in Geophysical Data: Exploration-Scale Seismic Examples

We then apply the proposed method to a simulated field dataset. In this test, we use a complete field dataset as the exact solution and remove big gaps from it to simulate incomplete data recorded in the field. The complete data is shown in Fig. 9a. Figure 9b shows the decimated data with a large gap of missing near-offset traces. The reconstructed data using the POCS method is shown in Fig. 10a. The POCS method recovers some energy in the near offset; however, there is spatial incoherency between the reconstructed and original data. Figure 10b shows the reconstructed data using the proposed method. The reflection events are spatially coherent, and the reconstructed data are almost the same as the original complete data. In this example, we also show the performance of the POCS method after normal moveout (NMO) correction; the result is shown in Fig. 10c. The performance is slightly better than directly applying the POCS method. It is worth pointing out the difficulty of applying the NMO correction. For the field data shown in Fig. 9a, applying NMO correction before POCS interpolation is challenging: if we use the velocity picked from the original gather, there is strong stretching, as shown in Fig. 11a, which makes the subsequent POCS method ineffective. Thus, to compromise between the stretching and flattening the gather, we apply only an approximate NMO correction with a roughly estimated velocity, as shown in Fig. 11b. The result shown in Fig. 10c is obtained with this approximate NMO correction.

In this example, the SNRs of the incomplete data, reconstructed data using POCS method, and reconstructed data using the proposed method are 4.23 dB, 5.95 dB, and 14.46 dB, respectively.

Next, we test the reconstruction performance when more data are missing. Figure 12 shows the reconstruction test for incomplete data containing two big gaps. Figure 12a shows the incomplete data, and Fig. 12b shows the slope estimated from it. Figure 12c, d shows the reconstructed data using the POCS method and the proposed method, respectively. In this case, the proposed method still performs well and reconstructs the data accurately. In Figs. 13 and 14, we show reconstruction results where three and six big gaps exist. As can be seen from both tests, the proposed method obtains successful data reconstruction, and the local slopes estimated from the highly incomplete data are very close to the exact slope. In both cases, the POCS method cannot obtain satisfactory results. It can also be observed that the reconstruction performance slightly deteriorates as more data are missing; for example, in the result shown in Fig. 14d, there are some discontinuities or over-smoothing around the 40th and 250th traces. The SNRs of the incomplete data and of the data reconstructed using the POCS method and the proposed method, for all three tests, are indicated in the captions of Figs. 12, 13 and 14.

We also use a real example to demonstrate the performance of the proposed method in interpolating big gaps in reflection seismic data. Figure 15 shows the real data example. Figure 15a shows a common-offset seismic profile with three large gaps. Figure 15b shows the local slope estimated from the incomplete data, and Fig. 15c shows the reconstructed data. In this example, we preserve the data that are originally in the seismic profile and display only the reconstructed part that is absent from the original profile. The reconstructed data is much smoother than the data originally in the seismic profile and follows the structural direction very well. We admit that for very large gaps over very complicated structures an accurate reconstruction is not possible; nevertheless, the proposed method can provide a decent reference model for interpretational purposes.

3.3 Application in Interpolating Big Gaps in Geophysical Data: Global-Scale Seismic Examples

We also show an example of the presented method in interpolating earthquake data in the global seismology community. Figure 16a shows stacks over many earthquakes at constant epicentral distance (offset angle). Figure 16a is already much improved over the raw data by stacking different earthquake records, and different seismic phases can be identified. However, there are missing traces in the data, indicating insufficient data coverage. By applying the proposed method to interpolate the incomplete stack data, we obtain a much improved result, shown in Fig. 16b. Different phases appear much more clearly in the reconstructed data, as highlighted by the labeled arrows. Note that in this example, we do not preserve the original seismic traces. In the POCS method, one can preserve the raw data and insert only the reconstructed components into the gaps; however, in that way we would inevitably leave some residual noise (in the raw data) and cause discontinuities around the gap edges.

3.4 Application in Interpolating Big Gaps in Geophysical Data: GPR Example

We next show an application of the presented method to interpolating multi-channel ground penetrating radar (GPR) data in the archeology community. The data is a 2D line from a 3D GPR survey conducted to investigate the burial depth of a mammoth. Because of a physical limitation in a certain area, there is a big gap in the data, as shown in Fig. 17a. Figure 17b shows the reconstructed data using the presented method. From the result, it is clear that the method recovers a spatially coherent part of the event and preserves the spatial variation of the amplitude.

Fig. 16 Application in interpolating earthquake data in the global seismology community. a Incomplete earthquake stack data. b Reconstructed earthquake stack data, with much more clearly depicted seismic phases

Fig. 17 Application in interpolating GPR data in the archeology community. a Incomplete GPR data. b Reconstructed GPR data

3.5 Application in Interpolating Well-Log Data for Full-Waveform Inversion

Another application of the proposed method is interpolating sparsely distributed well logs to prepare an optimal initial model for full-waveform inversion (FWI) (Tarantola 1984; Virieux and Operto 2009; Xu et al. 2016; Xue et al. 2016a, c, 2017).

The goal of FWI is to minimize the data misfit by iteratively updating the velocity model. The objective function, which measures the least-squares norm of the misfit vector, is

$$\begin{aligned} J({\mathbf {m}}) = \frac{1}{2} {\parallel } {\mathbf {d}}_{\mathrm{obs}} - {\mathbf {d}}_{\mathrm{syn}} {\parallel } ^2_2 = \frac{1}{2} {\parallel } {\mathbf {d}}_{\mathrm{obs}} - f({\mathbf {m}}) {\parallel } ^2_2, \end{aligned}$$
(24)

where \({\mathbf {m}}\) is the velocity model, \({\mathbf {d}}_{\mathrm{obs}}\) and \({\mathbf {d}}_{\mathrm{syn}}\) denote the recorded and synthetic data, respectively. f indicates the forward-modeling process.
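
The objective of Eq. 24 is straightforward to evaluate given a forward-modeling routine; in the sketch below, the forward callable is an assumed black box (a full FWI additionally requires the gradient of J, typically computed with the adjoint-state method):

```python
import numpy as np

def fwi_misfit(m, d_obs, forward):
    """Least-squares FWI objective J(m) of Eq. 24.
    forward: black-box modeling operator f that maps a velocity model
    to synthetic data (assumed supplied by the user)."""
    r = d_obs - forward(m)
    return 0.5 * np.sum(r**2)
```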

Well logs can be extremely sparse because there is only a limited number of wells in an exploration seismic survey. According to a number of tests, we found that when the local slope of the subsurface geological structure can be obtained accurately, the interpolation performs very well. The challenging problem thus becomes the local slope estimation. We calculate the local slope of the subsurface geological structure by first migrating the recorded surface seismic data to obtain an initial estimate of the subsurface reflector structure, and then calculating the local slope from the initially migrated seismic image. In fact, the local slope does not need to be very accurate; an approximate local slope calculated from the initial seismic image is sufficient to obtain a very reasonable initial velocity model, and the imperfect velocity estimation can be compensated by the subsequent FWI. Because the well-log interpolated model is much closer to the exact solution than the initial smooth velocity model obtained from an NMO-based velocity analysis or ray-based tomography, the subsequent FWI can converge quickly to a very accurate velocity inversion result. The main steps in the well-log interpolation aided FWI are as follows:

  1. Obtain an initial velocity model using an NMO-based velocity analysis method or a ray-based tomography method.

  2. Migrate the recorded reflection seismic data using the initial velocity model to obtain a rough estimate of the subsurface reflector model.

  3. Approximate the local slope field of the subsurface geological structure by calculating the local slope map from the migrated image of Step 2.

  4. Input the approximate local slope field to the proposed structural smoothness-based interpolation method to interpolate the sparsely distributed well-log data.

  5. Treat the interpolated velocity model as the initial model and conduct full-waveform inversion, iterating until the FWI reaches the stopping criteria.

We use a simulated example based on the Marmousi velocity model to demonstrate the application of the proposed method in preparing a highly accurate initial velocity model for FWI.

Figure 18a shows the exact velocity model. Figure 18b shows the smooth initial velocity model. Figure 18c shows the velocities from 20 evenly distributed wells. Figure 18d shows the interpolated velocity model from the well logs shown in Fig. 18c. In this example, the slope used for the interpolation is estimated from the initially migrated seismic image, as shown in Fig. 19. We use reverse time migration (RTM) to migrate the recorded seismic data using the initial velocity model shown in Fig. 18b, and the result is shown in Fig. 19.

The slopes calculated from the exact velocity model and from the initially migrated image are shown in Fig. 20; the two local slope maps are very similar. To demonstrate the influence of the well-log interpolated velocity model on the subsequent FWI, we show the inverted velocities at different stages of the inversion. The inverted velocities using the smooth velocity (Fig. 18b) and the well-log interpolated velocity are shown in Fig. 23. The finally inverted velocity shown in Fig. 23h is very close to the exact solution shown in Fig. 18a. Figure 21 shows the normalized convergence diagrams for the two methods. We also show the recorded seismic data and the seismic data simulated from the finally inverted model in Fig. 22.

To compare the velocities in detail, we extract two velocity traces from Fig. 23. We compare the true velocity, the smooth initial velocity, the inverted velocity using the well-log interpolated initial model, and the inverted velocity using the smooth initial velocity model at two locations, i.e., \(X=3000\,\hbox {m}\) and \(X=6000\,\hbox {m}\). The comparisons are shown in Fig. 24. The black line denotes the true velocity, the pink line the initial velocity, the blue line the inverted velocity after 200 FWI iterations with the well-log interpolated initial velocity model, and the red line the inverted velocity after 200 FWI iterations with the smooth initial velocity model. The inverted velocity using the proposed method is almost the same as the true velocity in both the shallow and deep parts; in the deep part, it deviates slightly from the true velocity due to the relatively low illumination there. The inverted velocity using the traditional model, however, still deviates considerably from the exact solution due to the cycle-skipping problem.

Fig. 18 Marmousi model example. a True velocity. b Initial velocity. c Velocity from 20 wells. d Velocity interpolated from the well logs in c

Fig. 19 Initially migrated reflection seismic image using the smooth velocity shown in Fig. 18b

Fig. 20 a Local slope map calculated from the true velocity. b Local slope map calculated from the initially migrated seismic image

Fig. 21 Convergence diagrams of FWI using the smooth initial velocity model and the well-log interpolated velocity model

Fig. 22 a Recorded data. b Synthesized data using the inverted velocity from Fig. 23h. c Difference between a and b

Fig. 23 Left column: traditional FWI with 10, 50, 100, 200 iterations. Right column: FWI using the 20-well interpolated model with 10, 50, 100, 200 iterations

Fig. 24 Comparison of different velocities at two locations. a Velocity comparison at \(X = 3000\,\hbox {m}\). b Velocity comparison at \(X = 6000\,\hbox {m}\). The black line denotes the true velocity, the pink line the initial velocity, the blue line the inverted velocity after 200 FWI iterations with the well-log interpolated initial velocity model, and the red line the inverted velocity after 200 FWI iterations with the smooth initial velocity model. The inverted velocity using the proposed method is almost the same as the true velocity in both the shallow and deep parts; in the deep part, it deviates slightly from the true velocity due to the relatively low illumination there. The inverted velocity using the traditional model, however, still deviates considerably from the exact solution due to the cycle-skipping problem

4 Conclusions

Interpolating sparse data is very challenging in many geophysical domains. We reviewed the state-of-the-art interpolation methods for different types of problems. After reviewing the theoretical background of the most popular regularized interpolation formulations, we proposed an inversion-based interpolation algorithm that can recover missing data from highly insufficient, sparse measurements. Due to the ill-posedness of the inverse problem, we use shaping regularization to solve the corresponding optimization problem. The structural smoothness constraint used in the shaping regularization framework helps stably recover the model, and the framework offers more flexible control on the model than state-of-the-art regularization methods. The interpolation method can be applied to pre-stack reflection seismic data, GPR data, and earthquake data with large acquisition gaps, and it can also interpolate sparse well-log data to prepare a fairly accurate initial velocity model for subsequent full-waveform inversion. The well-log interpolated initial velocity can help FWI converge to a very accurate model that is close to the exact solution.