1 Introduction

Velocity and density information characterizing an underground medium must be obtained from the pre-stack seismic gather with the help of optimization theory. This process is called seismic inversion, and it is essentially a mathematical optimization problem (Wang 2016). Seismic inversion methods generally fall into linear and nonlinear categories. Nonlinear methods are better suited to finding the global optimal solution because they search the solution space continuously, but linear inversion significantly reduces the computational cost and is more suitable for fast inversion of large-scale seismic data (Maurya et al. 2020). However, seismic inversion is generally an underdetermined problem, which makes linear inversion algorithms more dependent on the initial model. To make linear inversion converge to the global extremum and to reduce the instability and non-uniqueness of the inversion solutions, a regularization constraint is generally applied when solving inverse problems in geophysics (Tarantola 2005).

The sparse constraint is common in signal processing and, owing to its ability to suppress noise, has long been applied in geophysics. Claerbout and Muir (1973) showed that the \(L_{1}\)-norm yields much better results than the \(L_{2}\)-norm in most cases of seismic numerical modeling. In particular, with the rise of compressed sensing (CS) in signal processing, researchers have focused widely on deconvolution, seismic inversion and seismic data reconstruction (Kazemi and Sacchi 2014; She et al. 2019; Ma 2013). In seismic inversion, Zhang and Castagna (2011) applied the basis pursuit (BP) algorithm to post-stack seismic data to solve an \(L_{1}\)-norm constrained objective function; their results show that the inverted impedance has a higher resolution than that of the traditional method. Total variation (TV) regularization is a special form of sparse constraint: the \(L_{1}\)-norm is applied not to the parameters themselves but to their differences. In this way, sparse edges can be controlled, and the inverted parameters exhibit piecewise (blocky) variation. In seismic inversion, TV regularization can highlight the vertical variation of strata, and discontinuously varying elastic parameters can be obtained (Mozayan et al. 2018). Scholars have applied various methods, such as iteratively reweighted least squares (IRLS), split-Bregman and the alternating direction method of multipliers (ADMM), to retrieve elastic parameters from post-stack data (Zhang et al. 2014; Liu and Yin 2015; Pan et al. 2017). Compared with post-stack data, pre-stack data contain more lithology and fluid information, and a relatively stable solution can also be obtained by extending \(L_{1}\)-norm regularization to pre-stack AVA inversion (Li and Zhang 2017; Zhi et al. 2016).

The \(L_{1 - 2}\)-norm is a recently proposed sparse regularization term. It was first addressed by Lou et al. (2015) in the context of nonnegative least squares problems and group sparsity, with applications to spectroscopic imaging. Subsequent work on applications and solution algorithms has gradually brought the \(L_{1 - 2}\)-norm into geophysics. Wang et al. (2018) exploited the sparsity of the \(L_{1 - 2}\)-norm to compensate for the attenuation of seismic data and effectively improved signal resolution. Wang et al. (2019) carried out pre-stack seismic inversion under \(L_{1 - 2}\)-norm regularization, improved the lateral continuity of the results with f-x prediction filtering, and obtained high-quality elastic parameter estimates. Huang et al. (2021) used \(L_{1 - 2}\)-norm regularization for AVA joint inversion based on time-domain matching of PP- and PS-waves with the DTW algorithm, and their algorithm showed good stability.

However, for a large-scale problem such as pre-stack seismic inversion, solving the \(L_{1 - 2}\)-norm sub-problems in each iteration requires a relatively complex iterative algorithm, because the traditional difference-of-convex algorithm (DCA) may incur a high cost in large-scale inverse problems (Gotoh et al. 2018). Therefore, in this article, we propose a novel pre-stack seismic inversion scheme using the \(L_{1 - 2}\)-norm. The objective function is composed of a misfit function and an \(L_{1 - 2}\)-norm constraint term. Based on the proximal DCA, we develop an optimization algorithm by reformulating the objective function as the difference of two convex functions. In each DCA iteration, we extrapolate from the previous solution to obtain the starting point of the new iteration and then use the soft thresholding algorithm to compute the optimal solution of the current iteration. Moreover, we introduce an adaptive regularization parameter selection method into the new algorithm and propose a strategy for cases in which the amplitudes of the inversion parameters cannot otherwise be well recovered. To verify the effectiveness of the algorithm and the adaptive parameter selection method, we test them on synthetic and field data. The inversion results confirm that the new method is effective for \(L_{1 - 2}\)-norm constrained pre-stack seismic inversion and that the adaptive parameter selection method is appropriate.

2 Theory and Methodology

2.1 Pre-Stack Seismic Forward Model

According to the seismic convolution model (Robinson 1967), the pre-stack seismic gather can be represented as the reflectivity coefficients for different angles convolved with the seismic wavelet. A noise term should also be added for real field seismic records. Therefore, for an N-trace pre-stack angle gather, the forward modeling can be expressed as

$${\mathbf{s}}\left( {\theta_{i} } \right) = {\mathbf{w}}\left( {\theta_{i} } \right) * {\mathbf{r}}\left( {\theta_{i} } \right) + {\mathbf{n}}\left( {\theta_{i} } \right), \, i = 1,2, \cdots ,N$$
(1)

where \({\mathbf{s}}\left( {\theta_{i} } \right)\), \({\mathbf{w}}\left( {\theta_{i} } \right)\), \({\mathbf{r}}\left( {\theta_{i} } \right)\) and \({\mathbf{n}}\left( {\theta_{i} } \right)\) are the pre-stack seismic trace, the source wavelet, the reflectivity coefficient series and the noise at the ith angle of incidence \(\theta_{i}\), respectively, and \(*\) denotes the convolution operation. In Eq. (1), the reflection coefficient controls the amplitude of the seismic reflection and directly reflects the impedance contrast between the layers above and below the reflection interface.
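As a concrete illustration of Eq. (1), a single-angle synthetic trace can be generated by convolving a sparse reflectivity series with a zero-phase Ricker wavelet. The sketch below uses NumPy; the spike positions, amplitudes and noise level are arbitrary illustrative values, not the paper's model:

```python
import numpy as np

def ricker(f0, dt, nt):
    """Zero-phase Ricker wavelet with peak frequency f0 (Hz) and sampling dt (s)."""
    t = (np.arange(nt) - nt // 2) * dt
    a = (np.pi * f0 * t) ** 2
    return (1.0 - 2.0 * a) * np.exp(-a)

# Hypothetical sparse reflectivity series r(theta_i) for one incidence angle
r = np.zeros(200)
r[[50, 90, 140]] = [0.1, -0.08, 0.05]

w = ricker(30.0, 0.002, 61)          # 30 Hz wavelet, 2 ms sampling
s = np.convolve(r, w, mode="same")   # noise-free trace: s = w * r
s_noisy = s + 0.01 * np.random.default_rng(0).standard_normal(s.size)  # add n
```

Because the spikes are separated by more than the wavelet half-length, each reflection appears as an isolated scaled wavelet in the trace.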

Mathematically, the pre-stack seismic forward modeling can be written as, omitting the noise term for simplicity,

$${\mathbf{d}} = {\mathbf{Gm}}$$
(2)

where \({\mathbf{d}}\) represents the pre-stack seismic gather, \({\mathbf{G}}\) is the forward modeling operator, and \({\mathbf{m}}\) represents the model parameter vector.
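For a single angle, the convolution in Eq. (1) can be written as the matrix–vector product of Eq. (2); a sketch of building such a \({\mathbf{G}}\) is shown below (for the full pre-stack gather, one such block per angle would be assembled, e.g. block-diagonally; `conv_matrix` is a hypothetical helper name):

```python
import numpy as np

def conv_matrix(w, n):
    """Build a dense matrix G with G @ r == np.convolve(r, w, mode='same').
    Assumes an odd-length, centered wavelet w."""
    half = len(w) // 2
    G = np.zeros((n, n))
    for j in range(n):
        lo, hi = max(0, j - half), min(n, j + half + 1)
        G[lo:hi, j] = w[lo - j + half:hi - j + half]  # shifted copy of w
    return G

# Check against direct convolution for an arbitrary reflectivity vector
w = np.array([1.0, 2.0, 3.0, 2.0, 1.0])   # stand-in wavelet
r = np.zeros(12)
r[[3, 8]] = [0.2, -0.1]
G = conv_matrix(w, r.size)
d = G @ r                                  # forward model d = G m
```

Each column of \({\mathbf{G}}\) is a shifted copy of the wavelet, so \({\mathbf{G}}{\mathbf{m}}\) reproduces the convolution exactly.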

2.2 \(L_{1 - 2}\)-Norm in Seismic Inversion

The inverse problem of seismic inversion is highly ill-posed, which leads to multiple solutions, especially in the pre-stack case. By adding prior knowledge to the inverse problem, regularization can stabilize it and yield a solution consistent with the prior information. Because seismic inversion mainly targets the reflection coefficient sequence in the time domain, and the reflection coefficients of underground media are markedly sparse, we can incorporate this sparsity into the inversion process. This is the sparse regularization used in the mathematical optimization of inverse problems.

At present, three regularization operators are commonly used: the \(L_{0}\)-norm, \(L_{1}\)-norm and \(L_{2}\)-norm, of which the \(L_{0}\)-norm and \(L_{1}\)-norm are referred to as sparse regularization constraints. The optimization problem containing the \(L_{0}\)-norm is NP-hard and thus challenging to solve. As the optimal convex approximation of the \(L_{0}\)-norm, the \(L_{1}\)-norm guarantees sparsity while having better solving properties. Therefore, the \(L_{1}\)-norm is generally used to construct objective functions aimed at sparse solutions in image processing or geophysical inverse problems (Oldenburg et al. 1983; Yin et al. 2015b; Hamid and Pidlisecky 2015). As a recently emerging sparse constraint, the \(L_{{1{ - }2}}\)-norm is attracting increasing attention (Wang et al. 2018, 2019; Huang et al. 2021).

By comparing four different regularization terms (\(L_{0}\), \(L_{{1{ - }2}}\), \(L_{1}\) and \(L_{2}\)), the advantage of the \(L_{{1{ - }2}}\)-norm as a sparse constraint can be illustrated. A three-dimensional surface comparison of the four norm values is shown in Fig. 1. The function values are calculated for each norm over a set of two-dimensional data, and the projection of each surface onto the two-dimensional plane is a contour map. Assuming the misfit function is unchanged, minimizing the objective function can be regarded as finding the minimizer of the regularization term in three-dimensional space; the lowest point of the surface is the optimal point to be found. After the 3D surfaces are projected onto the 2D plane, the lowest points lie close to the x- and y-axes on the 2D contours. An obvious conclusion is that the closer a norm's contours lie to the axes, the more likely the inversion is to yield a sparse solution. Therefore, the solution under \(L_{0}\)-norm regularization is the sparsest among these terms. Although the solution under \(L_{2}\)-norm regularization is not sparse, it makes the solving process fast and stable. Another important conclusion is that, as a regularization penalty, the \(L_{1 - 2}\)-norm promotes sparsity better than the traditional \(L_{1}\)-norm and is more likely to yield a sparse solution.

Fig. 1
figure 1

Comparison of 3D surface and 2D contour of the values for the different norms. a L0-norm. b L1-2-norm. c L1-norm. d L2-norm
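The sparsity-promoting behavior compared in Fig. 1 can also be checked numerically: for two vectors with the same \(L_{2}\) energy, the \(L_{1 - 2}\) penalty separates the sparse one from the dense one even more strongly than the \(L_{1}\) penalty does. A minimal sketch with illustrative values:

```python
import numpy as np

def l12_penalty(x, alpha=1.0):
    """L1-2 penalty ||x||_1 - alpha * ||x||_2."""
    return np.sum(np.abs(x)) - alpha * np.linalg.norm(x)

sparse = np.array([1.0, 0.0, 0.0, 0.0])   # 1-sparse, ||.||_2 = 1
dense = np.full(4, 0.5)                   # fully dense, ||.||_2 = 1

# L1 penalties: 1.0 (sparse) vs 2.0 (dense).
# L1-2 penalties with alpha = 1: 0.0 (sparse) vs 1.0 (dense) --
# a 1-sparse vector attains the minimum possible L1-2 value of zero.
```

This reflects the general fact that \(\|x\|_{1} - \|x\|_{2} = 0\) exactly when \(x\) has at most one nonzero entry, which is why the \(L_{1-2}\) surface hugs the axes in Fig. 1b.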

According to the pre-stack seismic forward equation, Eq. (2), the pre-stack seismic inversion estimates the model parameters \(m\) by using the pre-stack seismic data \(d\). Moreover, Eq. (2) represents an underdetermined linear equation. Considering the layered distribution of underground media, a sparse regularization constraint is helpful to obtain sparse reflection coefficient sequences. As in the case of the seismic deconvolution method (Oldenburg et al. 1983), a stable and sparse solution is inverted by using the \(L_{2}\)-norm misfit function with \(L_{1}\)-norm regularization

$$J({\mathbf{m}}) = \left\| {{\mathbf{Gm - d}}} \right\|_{2}^{2} + \lambda \left\| {\mathbf{m}} \right\|_{1}.$$
(3)

In Eq. (3), \(\lambda\) is a trade-off parameter that balances the misfit (first) term against the regularization (second) term.

By adopting \(L_{{1{ - }2}}\)-norm to regularize the pre-stack seismic inverse problem, the constructed objective function includes a general misfit function and a sparse constraint regularization term

$$J\left( {\mathbf{m}} \right) = f\left( {\mathbf{m}} \right) + H\left( {\mathbf{m}} \right),$$
(4)

where \(f\left( {\mathbf{m}} \right) = \left\| {{\mathbf{Gm}} - {\mathbf{d}}} \right\|_{2}^{2}, \ H\left( {\mathbf{m}} \right) = \lambda \left( {\left\| {\mathbf{m}} \right\|_{1} - \alpha \left\| {\mathbf{m}} \right\|_{2} } \right). \ H\left( {\mathbf{m}} \right)\) is the \(L_{1 - 2}\)-norm regularization penalty term, and \(\alpha \in (0,1]\) is a constant, which can promote the generation of a sparse solution (Lou et al. 2015).
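Under these definitions, the objective of Eq. (4) can be evaluated directly. The following sketch uses an identity operator and small arbitrary parameter values as stand-ins:

```python
import numpy as np

def objective(m, G, d, lam, alpha):
    """J(m) = ||G m - d||_2^2 + lam * (||m||_1 - alpha * ||m||_2), Eq. (4)."""
    resid = G @ m - d
    misfit = resid @ resid                                       # f(m)
    penalty = lam * (np.sum(np.abs(m)) - alpha * np.linalg.norm(m))  # H(m)
    return misfit + penalty

G = np.eye(3)
d = np.array([1.0, 0.0, 0.0])
# With m = d the misfit vanishes and only the sparsity penalty remains:
# J = 0.1 * (1 - 0.5 * 1) = 0.05 for lam = 0.1, alpha = 0.5.
J = objective(d, G, d, lam=0.1, alpha=0.5)
```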

2.3 Inversion Algorithm

The objective function of the pre-stack seismic inversion in Eq. (4) can be solved with the difference-of-convex algorithm (DCA). In general, we need to write \(H\left( {\mathbf{m}} \right)\) as the difference of two convex norm terms

$$H\left( {\mathbf{m}} \right) = H_{1} \left( {\mathbf{m}} \right) - H_{2} \left( {\mathbf{m}} \right) = \lambda \left\| {\mathbf{m}} \right\|_{1} - \lambda \alpha \left\| {\mathbf{m}} \right\|_{2} .$$
(5)

Therefore, the algorithm then transforms the solution of Eq. (4) into alternate iterations of two variables

$$\left\{ \begin{gathered} {\mathbf{b}}^{k} \in \partial H_{2} \left( {{\mathbf{m}}^{k} } \right) \hfill \\ {\mathbf{m}}^{k + 1} = \arg \min \left( {f\left( {\mathbf{m}} \right) + H_{1} \left( {\mathbf{m}} \right)} \right) - \left( {H_{2} \left( {{\mathbf{m}}^{k} } \right) + \left\langle {{\mathbf{b}}^{k} ,{\mathbf{m}} - {\mathbf{m}}^{k} } \right\rangle } \right) \hfill \\ \end{gathered} \right.,$$
(6)

where \({\mathbf{b}}^{k}\) is a subgradient of \(H_{2} \left( {{\mathbf{m}}^{k} } \right)\) and \(\left\langle { \cdot , \cdot } \right\rangle\) denotes the inner product of two vectors. The key to Eq. (6) is solving for \({\mathbf{m}}^{k + 1}\) in a new convex optimization problem. A common approach is to introduce auxiliary variables and construct an augmented Lagrangian, which can then be solved with the ADMM algorithm (Yin et al. 2015a; Wang et al. 2019).

The mathematical derivation has shown that the stability and convergence of DCA depend on the concrete decomposition form of \(H({\mathbf{m}})\) (Tao and An 1998). To get the closed-form solution of each subproblem, Gotoh et al. (2018) designed a new type of DCA, which is called proximal DCA (pDCA)

$$H\left( {\mathbf{m}} \right) = \left[ {\frac{L}{2}\left\| {\mathbf{m}} \right\|_{2}^{2} + H_{1} \left( {\mathbf{m}} \right)} \right] - \left[ {\frac{L}{2}\left\| {\mathbf{m}} \right\|_{2}^{2} + H_{2} \left( {\mathbf{m}} \right)} \right].$$
(7)

where \(L\) is the Lipschitz constant, which can be obtained as \({\text{max}}\left( {{\text{svd}}\left( {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}}} \right)} \right)\), i.e., the largest singular value of \({\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}}\). With this decomposition, each component of the DC decomposition is strongly convex. Equation (4) can then be written as

$$J({\mathbf{m}}) = \left[ {\frac{L}{2}\left\| {\mathbf{m}} \right\|_{2}^{2} + H_{1} \left( {\mathbf{m}} \right)} \right] - \left[ {\frac{L}{2}\left\| {\mathbf{m}} \right\|_{2}^{2} - f\left( {\mathbf{m}} \right) + H_{2} \left( {\mathbf{m}} \right)} \right].$$
(8)

By taking the derivative with respect to \({\mathbf{m}}\) in Eq. (8) and making it equal to zero, we can solve the extremum problem of the objective function \(J({\mathbf{m}})\) to obtain the inversion solution in each iteration. The solution of Eq. (8) is given by Wen et al. (2018)

$$\begin{gathered} {\mathbf{m}}^{k + 1} = \arg \min \left\{ {\left\langle {\nabla f\left( {{\mathbf{m}}^{k} } \right) - {\mathbf{b}}^{k} ,{\mathbf{m}}} \right\rangle + \frac{L}{2}\left\| {{\mathbf{m}} - {\mathbf{m}}^{k} } \right\|_{2}^{2} + H_{1} \left( {\mathbf{m}} \right)} \right\} \\ = \arg \min \left\{ {\frac{L}{2}\left\| {{\mathbf{m}} - \left( {{\mathbf{m}}^{k} - \frac{1}{L}\left[ {\nabla f\left( {{\mathbf{m}}^{k} } \right) - {\mathbf{b}}^{k} } \right]} \right)} \right\|_{2}^{2} + H_{1} \left( {\mathbf{m}} \right)} \right\}. \\ \end{gathered}$$
(9)
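The constant \(L\) used throughout Eqs. (7)–(9) can be computed exactly as stated: the largest singular value of \({\mathbf{G}}^{\mathbf{T}}{\mathbf{G}}\), which equals the squared spectral norm of \({\mathbf{G}}\). A sketch with a random stand-in for \({\mathbf{G}}\):

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((40, 30))   # stand-in forward operator

# max(svd(G^T G)), as in the text
L = np.linalg.svd(G.T @ G, compute_uv=False).max()

# Equivalent and cheaper: square of the largest singular value of G itself
L_alt = np.linalg.norm(G, 2) ** 2
```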

To accelerate convergence, we can extrapolate a point \({\mathbf{y}}^{k}\) in the gradient descent direction on the line through \({\mathbf{m}}^{{k{ - }1}}\) and \({\mathbf{m}}^{k}\), following FISTA (Aster et al. 2012)

$${\mathbf{y}}^{k} = {\mathbf{m}}^{k} + \omega ({\mathbf{m}}^{k} - {\mathbf{m}}^{{k{ - }1}} ),$$
(10)

where \(\omega = \left( {\theta_{t - 1} - 1} \right)/\theta_{t}\) with \(\theta_{t} = \left( {1 + \sqrt {1 + 4\theta_{t - 1}^{2} } } \right)/2\). FISTA is an upgraded version of ISTA, which is often used to solve the \(L_{1}\)-norm regularized problem. Letting \({\mathbf{y}} = {\mathbf{m}}^{k + 1}\),

$${\mathbf{m}}^{k + 1} = \arg \min \left\{ {\left\langle {\nabla f({\mathbf{y}}^{k} ) - {\mathbf{b}}^{k} ,{\mathbf{y}}} \right\rangle + \frac{L}{2}\left\| {{\mathbf{y}} - {\mathbf{y}}^{k} } \right\|_{2}^{2} + H_{1} ({\mathbf{y}})} \right\}.$$
(11)
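The extrapolation weight \(\omega\) in Eq. (10) follows the standard FISTA recursion for \(\theta_{t}\); the sequence can be sketched as:

```python
import math

def fista_weights(n_iter):
    """Generate w_t = (theta_{t-1} - 1) / theta_t with theta_0 = 1 and
    theta_t = (1 + sqrt(1 + 4 * theta_{t-1}^2)) / 2, as in Eq. (10)."""
    theta_prev = 1.0
    out = []
    for _ in range(n_iter):
        theta = (1.0 + math.sqrt(1.0 + 4.0 * theta_prev ** 2)) / 2.0
        out.append((theta_prev - 1.0) / theta)
        theta_prev = theta
    return out

ws = fista_weights(50)
# The first weight is 0 (no extrapolation on the first step);
# the weights then increase monotonically toward 1.
```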

We use a variable \({\mathbf{h}}^{k}\) to represent the constant term. Equation (11) can be rewritten as

$${\mathbf{m}}^{k + 1} = \arg \min \left\{ {\frac{L}{2}\left\| {{\mathbf{y}} - {\mathbf{h}}^{k} } \right\|_{2}^{2} + \lambda \left\| {\mathbf{y}} \right\|_{1} } \right\}.$$
(12)

The variable \({\mathbf{h}}^{k}\) is obtained algebraically from the extrapolated point \({\mathbf{y}}^{k}\)

$${\mathbf{h}}^{k} = {\mathbf{y}}^{k} - \frac{1}{L}\left[ {\nabla f\left( {{\mathbf{y}}^{k} } \right) - {\mathbf{b}}^{k} } \right].$$
(13)

Taking the derivative with respect to \({\mathbf{y}}\) in Eq. (12),

$$L({\mathbf{y}} - {\mathbf{h}}^{k} ) + \lambda {\text{sgn}} ({\mathbf{y}}) = 0.$$
(14)

Thus, the solution is given by

$${\mathbf{y}} = {\mathbf{h}}^{k} - \frac{\lambda }{L}{\text{sgn}} \left( {\mathbf{y}} \right).$$
(15)

Since \({\mathbf{h}}^{k}\) and \(\lambda /L\) are constants in Eq. (15), we can obtain the solution elementwise by the soft thresholding algorithm (Aster et al. 2012). When \(h^{k} > \lambda /L\): if \(y < 0\), then \({\text{sgn}} (y) = - 1\) and \(y = h^{k} + \lambda /L > 0\), a contradiction; but if \(y > 0\), then \({\text{sgn}} (y) = 1\) and \(y = h^{k} - \lambda /L > 0\), which is consistent with the conditions and is therefore the correct solution. In the same way, we can derive the solution when \(h^{k} < - \lambda /L\) and when \(\left| {h^{k} } \right| \le \lambda /L\). In summary, the solution can be expressed as

$${\mathbf{y}} = \left\{ \begin{aligned} {\mathbf{h}}^{k} - \frac{\lambda }{L}, \, & {\mathbf{h}}^{k} > \frac{\lambda }{L} \\ {\mathbf{h}}^{k} + \frac{\lambda }{L}, \, & {\mathbf{h}}^{k} < - \frac{\lambda }{L} \\ 0, \, & {\text{otherwise}} \\ \end{aligned} \right..$$
(16)
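Eq. (16) is exactly the elementwise soft thresholding operator; a minimal NumPy sketch:

```python
import numpy as np

def sthresh(h, tau):
    """Elementwise soft thresholding of Eq. (16): shrink each entry of h
    toward zero by tau, zeroing any entry with magnitude at most tau."""
    return np.sign(h) * np.maximum(np.abs(h) - tau, 0.0)

y = sthresh(np.array([0.50, -0.30, 0.10, 0.0]), tau=0.2)
# y is approximately [0.30, -0.10, 0.0, 0.0]
```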

According to the above statements, the procedure for our inversion method is as follows:

(1) Set the initial solution \({\mathbf{m}}^{0}\), tolerance value \(\varepsilon\) and the number of iterations N.

(2) Calculate the Lipschitz constant \(L\).

(3) For each iteration k = 1, 2, 3, …, N:

(a) Update \(\omega\) and \({\mathbf{b}}^{k}\)

$${\mathbf{b}}^{k} = \lambda \alpha \,{\mathbf{m}}^{k} /\left\| {{\mathbf{m}}^{k} } \right\|_{2}$$
(17)

(b) Compute \({\mathbf{y}}^{k}\)

$${\mathbf{y}}^{k} = {\mathbf{m}}^{k} + \omega ({\mathbf{m}}^{k} - {\mathbf{m}}^{{k{ - }1}} ),$$
(18)

and \({\mathbf{h}}^{k}\)

$${\mathbf{h}}^{k} = {\mathbf{y}}^{k} - \frac{1}{L}\left[ {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{Gy}}^{k} - {\mathbf{G}}^{{\mathbf{T}}} {\mathbf{d}} - {\mathbf{b}}^{k} } \right].$$
(19)

(c) Update \({\mathbf{m}}^{k + 1}\) by the soft thresholding algorithm

$${\mathbf{m}}^{k + 1} = sthresh({\mathbf{h}}^{k} ,\frac{\lambda }{L})$$
(20)

(d) Check the convergence condition

$$\frac{{\left\| {{\mathbf{m}}^{k + 1} - {\mathbf{m}}^{k} } \right\|_{2} }}{{1 + \left\| {{\mathbf{m}}^{k + 1} } \right\|_{2} }} < \varepsilon$$
(21)
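The full procedure (1)–(3) can be sketched as follows. This is a minimal sketch, assuming the gradient \(\nabla f({\mathbf{y}}^{k}) = {\mathbf{G}}^{\mathbf{T}}({\mathbf{G}}{\mathbf{y}}^{k} - {\mathbf{d}})\) as in Eq. (19) and taking the subgradient of \(H_{2}({\mathbf{m}}) = \lambda\alpha\|{\mathbf{m}}\|_{2}\) as \(\lambda\alpha\,{\mathbf{m}}/\|{\mathbf{m}}\|_{2}\) (zero when \({\mathbf{m}} = {\mathbf{0}}\)); the identity operator and two-spike model at the end are illustrative stand-ins, not the paper's test data:

```python
import numpy as np

def sthresh(h, tau):
    """Soft thresholding, Eq. (16)/(20)."""
    return np.sign(h) * np.maximum(np.abs(h) - tau, 0.0)

def pdca_inversion(G, d, lam, alpha, n_iter=500, eps=1e-5):
    L = np.linalg.norm(G, 2) ** 2          # step (2): Lipschitz constant
    m_prev = m = np.zeros(G.shape[1])      # step (1): initial solution
    theta_prev = 1.0
    for _ in range(n_iter):
        # (a) subgradient of H2(m) = lam * alpha * ||m||_2
        nrm = np.linalg.norm(m)
        b = lam * alpha * m / nrm if nrm > 0 else np.zeros_like(m)
        # (b) extrapolate (Eq. 18) and take the gradient step (Eq. 19)
        theta = (1.0 + np.sqrt(1.0 + 4.0 * theta_prev ** 2)) / 2.0
        y = m + ((theta_prev - 1.0) / theta) * (m - m_prev)
        h = y - (G.T @ (G @ y) - G.T @ d - b) / L
        # (c) soft-thresholding update (Eq. 20)
        m_prev, m, theta_prev = m, sthresh(h, lam / L), theta
        # (d) convergence check (Eq. 21)
        if np.linalg.norm(m - m_prev) / (1.0 + np.linalg.norm(m)) < eps:
            break
    return m

# Toy check: identity operator, two spikes
m_true = np.zeros(30)
m_true[[5, 20]] = [1.0, -0.8]
m_hat = pdca_inversion(np.eye(30), m_true.copy(), lam=0.05, alpha=0.01)
```

With the identity operator the fixed point is simply a soft-thresholded copy of the data (slightly debiased by the \({\mathbf{b}}^{k}\) term), so both spikes are recovered with shrunken amplitudes.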

3 Synthetic Data Test and Analysis

3.1 Strata Model

A multilayer geological model (Fig. 2) is created and used to generate a pre-stack seismic gather dataset to assess the robustness, convergence speed and stability of the proposed inversion method. The detailed model parameters are listed in Table 1, including the P-wave velocity \(V_{{\text{P}}}\), S-wave velocity \(V_{{\text{S}}}\) and bulk density \(\rho\) of each layer. The model consists of 21 layers of different thicknesses, which generate 20 spike reflection coefficients at the interfaces between adjacent strata. Owing to the vertical variation of the model parameters, some reflection coefficients in the spike sequence are small (e.g., the 2nd, 9th and 11th). The pre-stack reflection coefficient series \(r\left( {t,\theta } \right)\) at different incidence angles are generated with the Aki–Richards approximation (Aki and Richards 1980). We assume a zero-phase Ricker wavelet with a peak frequency of 30 Hz. Synthetic gathers at different incidence angles (5°, 15° and 25°) are then obtained by convolution. Moreover, we add Gaussian random noise at different signal-to-noise ratios (SNR = 15, 10 and 5) to verify the robustness of the inversion method in the following section. Noise-free and noisy records are shown in Fig. 3.

Fig. 2
figure 2

Strata model with different velocities and bulk density in 21 layers

Table 1 Elastic parameters of the multilayer model in the synthetic example
Fig. 3
figure 3

Synthetic pre-stack gathers with different signal-to-noise (SNR). Three incidence angles seismic traces are shown, for 5, 15, and 25 degrees. a No noise. b SNR = 15. c SNR = 10. d SNR = 5

3.2 Analysis of \(L_{1 - 2}\)-Norm Sparse Constraint

To illustrate the sparsity advantage of the \(L_{1 - 2}\)-norm over the \(L_{1}\)-norm, inversion methods based on the two regularization terms are applied to the synthetic data. First, we adopt FISTA (Pérez et al. 2012) to perform the pre-stack inversion, solving the objective function with the \(L_{1}\)-norm regularization term. Then our proposed algorithm, which adopts \(L_{1 - 2}\)-norm regularization, is applied. Both algorithms stop iterating under the same convergence condition, Eq. (21), with the tolerance set to \(1 \times 10^{ - 5}\). The input is the synthetic data with SNR = 5. The inversion results, including the reflection coefficient estimates at the three angles, are shown in Fig. 4: the gray bars represent the true reflectivity series, and the black bars the inversion results.

Fig. 4
figure 4

Comparison of inversion results with two sparse constraints for the pre-stack seismic inversion. a-c Inversion results with L1-2-norm constraint. d-f Inversion results with L1-norm constraint. The results are estimated with different incident angles: (a, d) 5 degrees, (b, e) 15 degrees, and (c, f) 25 degrees. The gray and black bars correspond to the actual reflectivity series and the inversion results, respectively

In FISTA, the regularization parameter is set to \(5 \times 10^{ - 2}\), selected by the L-curve. In our proposed method, two regularization parameters need to be determined; here \(\lambda = 8 \times 10^{ - 3}\) and \(\alpha = 1 \times 10^{ - 3}\). The iteration counts of the new algorithm and FISTA are 163 and 59, respectively, taking about 3.3520 s and 0.4292 s in our test environment (i7-10700F processor, 32 GB memory, 64-bit operating system). The comparison shows that FISTA is faster in reaching the same convergence condition. We also carried out an error analysis of the different inversion results. For FISTA, the correlation coefficient (CC) is 0.9762 and the normalized root-mean-square error (NRMSe) is \(1.46\%\). Correspondingly, our proposed method with \(L_{1 - 2}\)-norm regularization significantly reduces the error of the solution: the CC increases to 0.9971, and the NRMSe drops to \(0.55\%\). Because most elements of a sparse solution are zero, the difference may appear less pronounced in these error metrics than it does in Fig. 4. Clearly, the inversion results based on \(L_{1}\)-norm regularization do not match the amplitude of the true reflection coefficients at some time points, a deficiency that \(L_{1 - 2}\)-norm regularization makes up for while preserving sparsity. This first numerical experiment shows that \(L_{1 - 2}\)-norm regularization can significantly improve the accuracy of the inversion solution compared with the common \(L_{1}\)-norm regularization.

3.3 Analysis of Inversion Solving Method

In the methodology section, we briefly introduced common methods for solving the \(L_{1 - 2}\)-norm regularized problem, one of which is DCA-ADMM; for its detailed derivation, please refer to Yin et al. (2015b). However, it differs considerably from our new algorithm, both in the construction of the DC decomposition and in the subsequent iterative solution. Compared with DCA-ADMM, our proposed algorithm has a more straightforward solving process for \(L_{1 - 2}\)-norm regularization, since there is no need to update multiple variables alternately.

We again use the synthetic data with SNR = 5 to compare the two solving methods. The inversion results are shown in Fig. 5; as before, the gray and black bars represent the true reflection coefficients and the inversion results. In contrast to the previous comparison, the solutions based on the same regularization show no noticeable difference in Fig. 5. The error analysis agrees: DCA-ADMM and the new algorithm differ little in CC (0.9963 and 0.9971), NRMSe (\(0.57\%\) and \(0.55\%\)) and running time (3.1973 s and 3.6673 s) in this example. The main difference between the two algorithms is sparsity. The model introduced in the previous section has 20 reflection interfaces, so with three incidence angles the \(L_{0}\)-norm of the true reflection coefficient vector is 60; we refer to this count as the sparsity. By computing this norm for the solutions of the two methods, we can assess their ability to recover sparse solutions. The sparsity of the solution obtained with the proposed algorithm is 80, whereas that of the DCA-ADMM solution reaches 1497, meaning the latter is not strictly sparse: many entries that should be zero contain tiny values. Even if values below \(1.9 \times 10^{ - 4}\) are ignored (0.1 times the minimum true reflection coefficient amplitude of \(1.9 \times 10^{ - 3}\)), the \(L_{0}\)-norm only drops to 158, still less sparse than the solution obtained by the new method. This can be observed in Fig. 5a-c, where there are fewer small spikes than in Fig. 5d-f. Therefore, our proposed method is more suitable for sparsity-constrained inversion.

Fig. 5
figure 5

Comparison of inversion results with different solving methods by L1-2-norm sparse constraint. a-c Inversion results by the new method. d-f Inversion results by DCA-ADMM. The results are estimated with different incident angles; (a, d) 5 degrees, (b, e) 15 degrees, and (c, f) 25 degrees. The gray and black bars correspond to the actual reflectivity series and the inversion results, respectively
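The thresholded sparsity count used above can be implemented directly; `effective_sparsity` is a hypothetical helper name, and the threshold follows the text's rule of 0.1 times the smallest true reflection coefficient amplitude:

```python
import numpy as np

def effective_sparsity(m, r_min, frac=0.1):
    """Count entries of m whose magnitude exceeds frac * r_min
    (a thresholded L0-norm)."""
    return int(np.count_nonzero(np.abs(m) > frac * r_min))

# Illustrative solution vector: two real spikes plus two numerical leftovers
m = np.array([0.10, 1.0e-5, -0.002, 0.0])
n_eff = effective_sparsity(m, r_min=1.9e-3)   # threshold = 1.9e-4
```

With `frac=0` the function reduces to the plain \(L_{0}\)-norm count of nonzero entries.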

3.4 Analysis of Noise Effects

Pre-stack seismic data generally contain different amounts of noise, so ensuring the stability of the inversion method in the presence of noise is a crucial problem. The SNRs of the synthetic data used for this analysis are 5, 10 and 15, and the corresponding results obtained with the new method are shown in Fig. 6. Regardless of the noise level, a suitable regularization parameter can be found that makes the inversion results match the true model well. The CC of the inversion result reaches 0.9996 when the SNR is 15; even when the SNR decreases to 5, the CC remains as high as 0.9971. This analysis shows that the proposed algorithm is robust to noisy seismic data.

Fig. 6
figure 6

Comparison of inversion results with different SNR. a-c SNR = 5, d-f SNR = 10, g-i SNR = 15. The results are estimated with different incident angles: (a, d, g) 5 degrees, (b, e, h) 15 degrees, and (c, f, i) 25 degrees. The gray and black bars correspond to the actual reflectivity series and the inversion results, respectively

4 Adaptive Regularization Parameter Selection Method

4.1 Analysis of Regularization Parameters

The last issue to consider is the selection of the regularization parameters during inversion. In our proposed algorithm, two regularization parameters, \(\lambda\) and \(\alpha\), need to be determined. The role of \(\lambda\) is similar to that of the regularization parameter in Tikhonov regularization: it adjusts the weight of the misfit function against the regularization constraint. The influence of \(\alpha\), the weight parameter within the \(L_{1 - 2}\)-norm, has rarely been analyzed. Here, we change the sparsity of the model parameters by adding or merging thin layers in the original multilayer model; the sparsities of the two new models are 30 and 90. By varying \(\alpha\) over a wide range, we analyze its influence on the inversion results for the three models of different sparsity. We ran the algorithm on the synthetic records for these models and calculated the NRMSe of the solutions. Figure 7 shows the variation of NRMSe with the regularization parameter \(\alpha\). We find that the influence of \(\alpha\) on the inversion results depends strongly on its value: the error of the solution decreases rapidly as \(\alpha\) decreases below 1 and then settles into a stable state. The trend of the curve resembles an L-curve, with an inflection point. More importantly, the influence of \(\alpha\) on the solution becomes stable once it is below 0.1. Therefore, to reduce the difficulty of solving the inverse problem, we fix \(\alpha = 0.01\) in the following tests.

Fig. 7
figure 7

Variation of NRMSe with regularization parameter \(\alpha\) in different sparsity models (sparsity = 30, 60, 90)
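The error metric used in these experiments can be sketched as follows; normalizing the RMS error by the range of the true model is one common convention, though the paper's exact NRMSe definition is not spelled out:

```python
import numpy as np

def nrmse(m_hat, m_true):
    """Root-mean-square error normalized by the range of the true model."""
    rmse = np.sqrt(np.mean((m_hat - m_true) ** 2))
    return rmse / (np.max(m_true) - np.min(m_true))

m_true = np.array([0.0, 1.0, 0.0, -1.0])
m_off = m_true + 0.1          # constant 0.1 error everywhere
# nrmse(m_off, m_true) = 0.1 / 2 = 0.05, i.e. 5%
```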

4.2 Adaptive Selection Method

Since the algorithm updates the solution in only one key step and does not need to update multiple variables alternately as ADMM does, other methods can be used to obtain the regularization parameter adaptively. By changing the regularization parameter appropriately to adjust the weight of the regularization term, \(L_{1 - 2}\)-norm regularization can effectively mitigate the ill-posedness of the inverse problem and help us obtain a sparse solution, as shown in the previous section. Generally, the larger the regularization parameter, the greater the dependence of the inversion solution on the initial model and the less sensitive it is to small perturbations (Thore 2015). A strategy for selecting an appropriate regularization parameter quickly and effectively is therefore an important problem. Common methods include the L-curve (Hansen 1992) and GCV (Golub et al. 1979), which have been widely used in seismic inversion (Gholami 2016; Huang et al. 2017). To adjust the regularization parameter adaptively during the inversion iterations, we introduce a parameter selection method based on the generalized Stein unbiased risk estimate (G-SURE) (Eldar 2008).

To find a function \(F\left( {\mathbf{d}} \right) = {\hat{\mathbf{m}}}\) of the noisy data \({\mathbf{d}}\left( {\theta_{i} } \right)\) that minimizes the mean square error (MSE), where \({\hat{\mathbf{m}}}\) is an arbitrary estimate of \({\mathbf{m}}\), Stein (1981) proposed Stein's unbiased risk estimate (SURE), which has been shown to outperform common maximum likelihood estimation. To extend the applicability of SURE to a broader class of problems, Eldar (2008) proposed a generalized SURE by adding a penalty term to the expression. For example, a sufficient statistic \({\mathbf{u}}\) for estimating \({\mathbf{m}}\) in the linear Gaussian model is given by

$${\mathbf{u}} = {\mathbf{G}}^{{\mathbf{T}}} {\mathbf{d}}.$$
(22)

In any case, \(F\left( {\mathbf{d}} \right)\) can be expressed as \(F\left( {\mathbf{u}} \right)\) based on sufficient statistic \({\mathbf{u}}\). We can express the MSE of \({\hat{\mathbf{m}}}\) as

$$E\left\{ {\left\| {{\hat{\mathbf{m}}} - {\mathbf{m}}} \right\|_{2}^{2} } \right\} = \left\| {\mathbf{m}} \right\|_{2}^{2} + E\left\{ {\left\| {F\left( {\mathbf{u}} \right)} \right\|_{2}^{2} } \right\} - 2E\left\{ {F^{T} \left( {\mathbf{u}} \right){\mathbf{m}}} \right\},$$
(23)

where \(\left\| {\mathbf{m}} \right\|_{2}^{2}\) is a constant. The purpose of G-SURE is to minimize MSE, so we define

$$v\left( {F,{\mathbf{m}}} \right) = E\left\{ {\left\| {F\left( {\mathbf{u}} \right)} \right\|_{2}^{2} } \right\} - 2E\left\{ {F^{T} \left( {\mathbf{u}} \right){\mathbf{m}}} \right\},$$
(24)

where \(v\left( {F,{\mathbf{m}}} \right)\) differs from the MSE only by the constant \(\left\| {\mathbf{m}} \right\|_{2}^{2}\), so minimizing it is equivalent to minimizing the MSE. In Eq. (24), \(v\left( {F,{\mathbf{m}}} \right)\) depends on both \(F\left( {\mathbf{u}} \right)\) and the actual model parameter \({\mathbf{m}}\), while \({\mathbf{m}}\) is unknown. Eldar (2008) solved this problem by constructing a function \(g\left( {F\left( {\mathbf{u}} \right)} \right)\) which satisfies

$$E\left\{ {g\left( {F\left( {\mathbf{u}} \right)} \right)} \right\} = E\left\{ {F^{T} \left( {\mathbf{u}} \right){\mathbf{m}}} \right\}.$$
(25)

Then, an unbiased estimate of \(v\left( {F,{\mathbf{m}}} \right)\) can be expressed as

$$\hat{v}\left( F \right) = \left\| {F\left( {\mathbf{u}} \right)} \right\|_{2}^{2} - 2g\left( {F\left( {\mathbf{u}} \right)} \right).$$
(26)

The details of Eq. (26) are provided in Appendix A. The unbiased risk estimation based on \(F\left( {\mathbf{u}} \right)\) can be expressed as

$$\hat{v}\left( F \right) = \left\| {F\left( {\mathbf{u}} \right)} \right\|_{2}^{2} + 2Tr\left[ {\frac{{\partial F\left( {\mathbf{u}} \right)}}{{\partial {\mathbf{u}}}}} \right] - 2F^{T} \left( {\mathbf{u}} \right)\left( {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}}} \right)^{ - 1} {\mathbf{u}}.$$
(27)

In each solving algorithm, once the critical step of the iterative solution is determined, the detailed form of Eq. (27) can be obtained. By differentiating \(\hat{v}\left( F \right)\) with respect to the regularization parameter \(\lambda\) and setting the derivative equal to zero, the regularization parameter in each iteration can be calculated.

Now, we extend the approach to the algorithm proposed in the paper. Substituting Eq. (19) and Eq. (22) into Eq. (16),

$$F\left( {\mathbf{u}} \right) = \left\{ \begin{aligned} {\mathbf{y}}^{k} - \frac{1}{L}\left[ {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{Gy}}^{k} - {\mathbf{u}} - {\mathbf{b}}^{k} } \right] - \frac{\lambda }{L}, \, & h^{k} > \frac{\lambda }{L} \\ {\mathbf{y}}^{k} - \frac{1}{L}\left[ {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{Gy}}^{k} - {\mathbf{u}} - {\mathbf{b}}^{k} } \right] + \frac{\lambda }{L}, \, & h^{k} < - \frac{\lambda }{L} \\ 0, \, & otherwise \\ \end{aligned} \right..$$
(28)
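The three-branch update in Eq. (28) is a gradient step followed by soft thresholding with threshold \(\lambda / L\). A minimal numpy sketch, where the function and variable names are illustrative rather than taken from the paper:

```python
import numpy as np

def critical_step(G, u, y, b, lam, L):
    """One critical-step update of Eq. (28): a gradient step on the data
    term, followed by soft thresholding with threshold lam / L."""
    h = y - (G.T @ G @ y - u - b) / L                         # gradient step
    return np.sign(h) * np.maximum(np.abs(h) - lam / L, 0.0)  # soft threshold
```

The `np.sign`/`np.maximum` form is equivalent to the three-branch definition in Eq. (28): components with \(|h^{k}| \le \lambda / L\) are set to zero, and the others are shrunk toward zero by \(\lambda / L\).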

We suppose that, in the \(k\)th iteration, there are \(p\) components of \(h^{k}\) greater than \(\lambda / L\) and \(q\) components less than \(- \lambda / L\). Arranging them together, we form two vectors of dimensions \(p\) and \(q\), denoted \({\mathbf{a}}_{1}\) and \({\mathbf{a}}_{2}\). In the same iteration, the components of \(({\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}})^{ - 1} {\mathbf{u}}\) corresponding to the indexes of \({\mathbf{a}}_{1}\) and \({\mathbf{a}}_{2}\) are denoted \({\mathbf{w}}_{1}\) and \({\mathbf{w}}_{2}\), respectively. Each term in Eq. (27) can then be expressed as

$$\left\| {F\left( {\mathbf{u}} \right)} \right\|_{2}^{2} = \sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right)}^{2} + \sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right)^{2} },$$
(29)
$$Tr\left[ {\frac{{\partial F\left( {\mathbf{u}} \right)}}{{\partial {\mathbf{u}}}}} \right] = \frac{p + q}{L},$$
(30)

and

$$F^{T} \left( {\mathbf{u}} \right)\left( {{\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}}} \right)^{ - 1} {\mathbf{u}} = \sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right)} {\mathbf{w}}_{1}^{i} + \sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right){\mathbf{w}}_{2}^{j} } .$$
(31)
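The trace term in Eq. (30) can be sanity-checked numerically: each active component of the thresholding map contributes \(1/L\) to the diagonal of \(\partial F / \partial {\mathbf{u}}\), so the trace equals \((p + q)/L\). A small sketch with random placeholder data (all names illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, lam = 8, 2.0, 0.5
G = rng.standard_normal((12, n))
y, b = rng.standard_normal(n), rng.standard_normal(n)

def F(u):
    # Eq. (28): gradient step then three-branch soft threshold
    h = y - (G.T @ G @ y - u - b) / L
    return np.where(h > lam / L, h - lam / L,
                    np.where(h < -lam / L, h + lam / L, 0.0))

u0 = G.T @ rng.standard_normal(12)
h0 = y - (G.T @ G @ y - u0 - b) / L
active = np.abs(h0) > lam / L          # the p + q active components

# numerical trace of dF/du, to compare with (p + q) / L from Eq. (30)
eps = 1e-6
trace = sum((F(u0 + eps * e)[i] - F(u0 - eps * e)[i]) / (2 * eps)
            for i, e in enumerate(np.eye(n)))
print(trace, active.sum() / L)
```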

The specific form of Eq. (27) under the framework of the new algorithm is

$$\begin{gathered} \hat{v}\left( F \right) = \sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right)}^{2} + \sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right)^{2} } + \frac{{2\left( {p + q} \right)}}{L} - 2\left[ {\sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right)} {\mathbf{w}}_{1}^{i} + \sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right){\mathbf{w}}_{2}^{j} } } \right] \\ = \sum\limits_{i = 1}^{p} {\left[ {\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right)^{2} - 2\left( {{\mathbf{a}}_{1}^{i} - \frac{\lambda }{L}} \right){\mathbf{w}}_{1}^{i} } \right]} + \sum\limits_{j = 1}^{q} {\left[ {\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right)^{2} - 2\left( {{\mathbf{a}}_{2}^{j} + \frac{\lambda }{L}} \right){\mathbf{w}}_{2}^{j} } \right]} + \frac{{2\left( {p + q} \right)}}{L}. \\ \end{gathered}$$
(32)

To obtain the optimal regularization parameter \(\lambda^{k}\) in the \(k\)th iteration, we differentiate Eq. (32) with respect to \(\lambda\):

$$\frac{{d\hat{v}}}{d\lambda } = \frac{{2\lambda \left( {p + q} \right)}}{{L^{2} }} - \frac{2}{L}\sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - {\mathbf{w}}_{1}^{i} } \right)} + \frac{2}{L}\sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} - {\mathbf{w}}_{2}^{j} } \right)} .$$
(33)

Setting Eq. (33) equal to zero, we conclude that

$$\lambda^{k} = \frac{{L\left[ {\sum\limits_{i = 1}^{p} {\left( {{\mathbf{a}}_{1}^{i} - {\mathbf{w}}_{1}^{i} } \right)} - \sum\limits_{j = 1}^{q} {\left( {{\mathbf{a}}_{2}^{j} - {\mathbf{w}}_{2}^{j} } \right)} } \right]}}{{p + q}}.$$
(34)

It is worth noting that the solution is unstable if \(({\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}})^{ - 1} {\mathbf{u}}\) in Eq. (31) is calculated directly; this problem can be avoided by computing the generalized inverse \({\mathbf{G}}^{\dag }\) (Infante-Pacheco et al. 2020). Both singular value decomposition (SVD) and truncated singular value decomposition (TSVD) are suitable for this purpose. If TSVD is selected, the truncation parameter can be determined by the GCV method.
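A possible numpy sketch of one parameter update via Eq. (34), with the unstable \(({\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}})^{ - 1} {\mathbf{u}}\) term computed through a TSVD pseudoinverse; the truncation rule shown is a simple placeholder (GCV could determine it instead), and all names are illustrative:

```python
import numpy as np

def update_lambda(h, w, lam, L):
    """One G-SURE update of the regularization parameter (Eq. 34).
    h: critical-step vector h^k before thresholding; w: (G^T G)^+ u."""
    pos, neg = h > lam / L, h < -lam / L      # a1 (p items) and a2 (q items)
    p, q = pos.sum(), neg.sum()
    if p + q == 0:
        return lam                            # no active components: keep lam
    return L * (np.sum(h[pos] - w[pos]) - np.sum(h[neg] - w[neg])) / (p + q)

# TSVD pseudoinverse of G^T G applied to u (stable alternative to a direct inverse)
rng = np.random.default_rng(1)
G = rng.standard_normal((30, 20))
u = G.T @ rng.standard_normal(30)
U, s, Vt = np.linalg.svd(G, full_matrices=False)
k = int(np.sum(s > 0.1 * s[0]))               # placeholder truncation level
w = (Vt[:k].T * (1.0 / s[:k] ** 2)) @ (Vt[:k] @ u)
lam_next = update_lambda(rng.standard_normal(20), w, 0.05, 2.0)
```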

4.3 Applicability of \(L_{1 - 2}\)-Norm Regularization

To study the applicability of this adaptive strategy to \(L_{1 - 2}\)-norm regularization, we use the noisy synthetic data (SNR = 5) from the numerical examples as input for verification. Before applying the adaptive strategy to seismic data, a significant problem is determining the iterative convergence condition. Once this condition is met, the inversion process is suspended to avoid meaningless computation. In the numerical experiments, we found that if Eq. (21) is taken as the convergence condition in the adaptive process, the solution becomes ever sparser until its components vanish. To illustrate this process, we tested the adaptive parameter selection method independently with three different regularization parameters as initial values and recorded the regularization parameter \(\lambda\) during the adaptive iterations.

Figure 8 shows the recorded results and depicts the variation of the solution and the regularization parameter. The change of the solution is represented by the \(L_{1 - 2}\)-norm, which reflects its sparsity. The three initial parameters are \(1 \times 10^{ - 3}\), \(1 \times 10^{ - 5}\) and \(1 \times 10^{ - 7}\), corresponding to the light gray, black and dark gray lines in Fig. 8, respectively. During the initial iterations, the regularization term increases significantly because the initial solution is zero and the solution moves rapidly in the direction of the descending gradient while not yet sparse. After several oscillating decreases, the value of the regularization term drops sharply, and the regularization parameter \(\lambda\) leaves the low-value range and increases significantly over a few successive iterations. At this point, the solution is optimal in the whole iterative process. If the iteration does not stop there, subsequent adaptive iterations continue to increase the sparsity of the solution until almost all elements of the inverse solution are zero, which obviously yields a solution different from the actual model. Equation (21) is not satisfied at the optimal points, which shows that it is not suitable as an iterative convergence condition.

Fig. 8
figure 8

a Variation of the \(L_{1 - 2}\)-norm in the iteration process. b Variation of the regularization parameter \(\lambda\) in the iteration process. The solid lines represent the variation of properties before the convergence condition is satisfied, and the dashed lines represent the variation after the convergence condition is satisfied. The optimal points are marked for the different initial regularization parameters

Combined with the above analysis results, we choose the following conditions as the convergence condition during iteration

$$\left( {\left\| {\mathbf{m}} \right\|_{1}^{k} - \alpha \left\| {\mathbf{m}} \right\|_{2}^{k} } \right) - \left( {\left\| {\mathbf{m}} \right\|_{1}^{k - 1} - \alpha \left\| {\mathbf{m}} \right\|_{2}^{k - 1} } \right) < 0\quad {\text{and}}\quad \lambda^{k} - \lambda^{k - 1} > \varepsilon .$$
(35)

The new convergence condition incorporates the rules of parametric variation shown in Fig. 8. Equation (35) requires that the slope of the regularization-term curve be negative, opposite to that of the regularization-parameter curve, and that the increase of the regularization parameter between the \(\left( {k - 1} \right)\)th and \(k\)th iterations exceed the tolerance \(\varepsilon\).
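The condition of Eq. (35) is straightforward to code; a minimal sketch (function name and defaults are illustrative):

```python
import numpy as np

def converged(m_prev, m_curr, lam_prev, lam_curr, alpha=0.01, eps=0.03):
    """Stopping rule of Eq. (35): the L1-2 regularization term decreases
    while the regularization parameter increases by more than eps."""
    reg = lambda m: np.linalg.norm(m, 1) - alpha * np.linalg.norm(m, 2)
    return bool(reg(m_curr) - reg(m_prev) < 0 and lam_curr - lam_prev > eps)
```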

In the synthetic data test, we set the tolerance \(\varepsilon\) to 0.03 and the parameter \(\alpha\) to 0.01. Figure 9a-c shows the inversion results using the three regularization parameters directly, and Fig. 9d-f shows the results using the adaptive adjustment strategy, where the parameters serve only as initial values. It is worth mentioning that we only compare the case of a 5° incidence angle, which is similar to that of the other incidence angles. Obviously, a wrong solution is obtained by directly applying the inversion method with inappropriate regularization parameters. After applying the adaptive strategy, the sparsity of the solution is greatly improved compared with Fig. 9a-c. Nevertheless, the solution is still not optimal: although the reflection coefficients appear at the correct times, their amplitudes are not well recovered. The amplitudes of the spikes can be adjusted to appropriate values with the help of a hybrid FISTA least-squares strategy (Pérez et al. 2013).

Fig. 9
figure 9

Inversion results of three strategies for regularization parameter. a-c Fixed regularization parameter strategy, d-f adaptive method and g-i hybrid strategy. The results are estimated with different initial regularization parameters: (a, d, g) \(1 \times 10^{ - 3}\), (b, e, h) \(1 \times 10^{ - 5}\), and (c, f, i) \(1 \times 10^{ - 7}\). The gray and black bars correspond to the actual reflectivity series and the inversion results, respectively

The main idea of the hybrid strategy is to obtain an optimized solution \({\mathbf{m}}_{hybrid}\) by rewriting the forward matrix \({\mathbf{G}}\) as \({\mathbf{B}}\), a reduced-dimension form restricted to the support of the sparse solution. The optimized solution \({\mathbf{m}}_{hybrid}\) is then computed by the least-squares method

$${\mathbf{m}}_{hybrid} = {\mathbf{m}}_{spike} + \left( {{\mathbf{B}}^{{\mathbf{T}}} {\mathbf{B}}} \right)^{ - 1} {\mathbf{B}}^{{\mathbf{T}}} \left( {{\mathbf{d}} - {\mathbf{Bm}}_{spike} } \right),$$
(36)

where \({\mathbf{m}}_{spike}\) is the rearranged solution obtained by removing the zero elements. The results are shown in Fig. 9g-i. After applying the hybrid strategy, the CC of the inversion results is 0.9977, 0.9973 and 0.9967, respectively; the hybrid strategy thus effectively improves the quality of the inversion. Considering that fixed regularization parameters may introduce uncertainty at non-well locations in actual data, the adaptive method can be applied to the inversion of real data as an optional method.
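A possible realization of the hybrid refinement, assuming \({\mathbf{B}}\) consists of the columns of \({\mathbf{G}}\) at the nonzero positions of the sparse solution (an assumption consistent with, but not spelled out in, the text); the least-squares step is done with `lstsq` rather than an explicit normal-equations inverse:

```python
import numpy as np

def hybrid_refine(G, d, m):
    """Hybrid least-squares amplitude correction in the spirit of Eq. (36)
    (Perez et al. 2013): keep the support of the sparse solution m and
    re-fit the nonzero amplitudes by least squares."""
    idx = np.flatnonzero(m)                 # support of the spiky solution
    B = G[:, idx]                           # reduced forward matrix
    m_spike = m[idx]
    m_spike = m_spike + np.linalg.lstsq(B, d - B @ m_spike, rcond=None)[0]
    out = np.zeros_like(m)
    out[idx] = m_spike
    return out
```

With noise-free data and a correct support, this recovers the true amplitudes exactly; with noisy data it returns the least-squares fit on that support.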

5 Real-Data Application

Finally, we apply the inversion method to real field data to verify the applicability of our proposed algorithm and adaptive parameter selection strategy. To enhance the SNR of the pre-stack gathers, we perform the pre-stack inversion on partially stacked gathers computed from the original gathers. Near-, mid- and far-angle partially stacked gathers are obtained, with corresponding incidence angles of \(8^{ \circ }\), \(16^{ \circ }\) and \(24^{ \circ }\), respectively. The target layer ranges from 3100 to 4400 ms on the seismic profile (Fig. 10a-c), where apparent strong amplitude anomalies and fault development occur, as it is a typical sandstone reservoir. The frequency range of the seismic data is about 4-80 Hz, and one drilled oil well is located at CDP number 292.

Fig. 10
figure 10

Three partial stacked seismic sections. a Near-angle stacked section, b mid-angle stacked section, and c far-angle stacked section

Before applying the inversion method, it is necessary to accurately obtain the time-depth relationship (TDR) by well-seismic calibration. The corresponding range of the well-logging data on the seismic profile is 3660 to 4100 ms. Three angle-dependent wavelets are estimated from the partially stacked seismic sections. To assess the accuracy of the TDR and the estimated wavelets, we use the velocity and density logging data in the well to generate synthetic records, which are compared with the field data at the well location. The result is shown in Fig. 11, where the x-axis represents incidence angle and the y-axis represents time. The synthetic data are shown as the black curves and the field data as the red curves. We analyzed the similarity of the seismic traces at different incidence angles: the CC is 0.8813, 0.8751 and 0.7300 for incidence angles of \(8^{ \circ }\), \(16^{ \circ }\) and \(24^{ \circ }\), respectively. The analysis shows that the synthetic records are in good agreement with the field data in the time domain, although the SNR of the far-angle stack is not as high as that of the near-angle stack.

Fig. 11
figure 11

Comparison of field data (red) and synthetic data (black) at the well location in the time domain

When applying a general inversion method to actual data, one must first determine appropriate regularization parameters from the seismic data at the well location and then extend these parameters to the whole seismic data volume. The regularization parameter selection method proposed in this paper adaptively selects appropriate parameters from the inversion data and the extracted wavelet, so it is no longer necessary to extend regularization parameters determined from synthetic data to the whole dataset.

As an effective parameter to identify reservoir properties, seismic impedance can be obtained via the following equation according to the convolution model

$$EI\left( t \right) = EI\left( {t_{0} } \right)\exp \left[ {2{\mathbf{C}}r\left( t \right)} \right],$$
(37)

where \(EI\) is the elastic impedance in the time domain, \({\mathbf{C}}\) is an integral matrix, and \(\exp [ \cdot ]\) denotes the exponential operation (Zhang et al. 2014). We adopt Eq. (37) to calculate the seismic impedance from the inverted reflectivity, which is a relative impedance. For real field data, it can be compared with the result calculated from the logging data. Considering the attenuation during seismic wave propagation, the logging data should be processed with a low-pass filter before calculating the relative impedance. Moreover, when calculating the seismic impedance with Eq. (37), an error in the reflection coefficient at a given time produces a low-frequency cumulative error over the whole integration. To avoid such cumulative errors, the usual method is to filter the calculated results, which makes them band-limited, so the results from the logging data must be processed with the same filter (Wang et al. 2019). Figure 12 compares the actual well-logging relative impedance trace and the relative impedance traces inverted by the two methods. The black solid line represents the relative impedance obtained by filtering the well-logging curves. The gray dashed and gray dotted lines represent the inversion results of the adaptive and non-adaptive methods, with CCs of 0.7585 and 0.7521, respectively. Although the CCs of the two inversion methods are comparable, significant errors may occur if the optimal regularization parameters obtained from the trace at the well location are extended to other traces.
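As a minimal illustration of Eq. (37), with the integral matrix \({\mathbf{C}}\) realized as a cumulative sum and the band-pass filtering discussed above omitted (function name and the unit starting impedance are illustrative):

```python
import numpy as np

def relative_impedance(r, ei0=1.0):
    """Relative impedance from a reflectivity series via Eq. (37):
    EI(t) = EI(t0) * exp(2 * cumulative sum of r)."""
    return ei0 * np.exp(2.0 * np.cumsum(r))
```

In practice the result would still be filtered to suppress the low-frequency cumulative error, as described above.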

Fig. 12
figure 12

Comparison of the actual well-logging relative impedance trace and the inverted relative impedance traces by using two inversion methods in the time domain. The black line is relative impedance by filtering from the well-logging curves, the gray dashed line represents the result by the adaptive method, and the gray dotted line represents the result by the non-adaptive method

The comparison of the inversion results at the well location verifies the accuracy and effectiveness of the proposed inversion method. We therefore inverted the whole seismic section (shown in Fig. 10). Three angle-dependent reflectivity sections are obtained by applying the proposed inversion method and the adaptive parameter selection strategy to the real field data, and the relative seismic impedance is calculated using Eq. (37), which is based on the convolution model of the reflectivity coefficients. The inversion results for the three angle-stacked seismic sections are shown in Fig. 13. Comparing the near- and mid-angle relative impedance sections, there is a bright spot at about 3700 ms whose impedance is significantly higher than that of the surrounding rocks, so it can be considered a favorable exploration target. The relative impedance section of the far-angle stacked seismic section (Fig. 13c) is not consistent with the inversion results of the near- and mid-angle stacked sections, but it is consistent with the pre-stack seismic gathers shown in Fig. 11 and the stacked seismic section shown in Fig. 10c.

Fig. 13
figure 13

Inverted relative impedance section of a the near-angle stacked seismic section, b the mid-angle stacked seismic section and c the far-angle stacked seismic section, which corresponds to Fig. 10

6 Conclusions

We have combined \(L_{{1 - 2}}\)-norm sparse regularization with the proximal difference-of-convex algorithm (pDCA) to implement a pre-stack seismic inversion technique that obtains reflectivity coefficients from pre-stack seismic gathers. The analysis of the synthetic data inversion results verified that the \(L_{1 - 2}\)-norm is a better sparsity-promoting regularization penalty than the traditional \(L_{1}\)-norm. At the same time, the analysis of the solving method and noise effects indicated that our pDCA-based inversion is more suitable for sparse-constraint inversion than the DCA-ADMM, which is itself a valuable algorithm in seismic inversion.

We presented an adaptive hybrid strategy, discussed the influence of the regularization parameter \(\alpha\) in the \(L_{{1 - 2}}\)-norm and derived the adaptive selection method for the regularization parameter \(\lambda\) based on G-SURE, which reduces the computation time needed to obtain an appropriate parameter. The analysis of synthetic data inversions with different sparsity indicated that our adaptive parameter selection method can effectively improve the quality of inversion results. Considering that fixed regularization parameters may introduce uncertainty at non-well locations in real data, the adaptive method can be applied to the inversion of real data as an optional method.

If only one optimization parameter is considered, the computing times of the adaptive and non-adaptive methods are the same. However, the non-adaptive method requires many tests to obtain a stable regularization parameter and is therefore more time-consuming in reaching a solution of the same precision. The adaptive method simplifies parameter selection, requiring only an initial value to update the regularization parameter during the iterations. Nevertheless, the G-SURE-based adaptive method needs the generalized inverse operator \({\mathbf{G}}^{\dag }\) to obtain a low-precision initial solution \(({\mathbf{G}}^{{\mathbf{T}}} {\mathbf{G}})^{ - 1} {\mathbf{u}}\), and each component of this initial solution is tested for suitability during the iterations. There is no doubt that the choice of the initial solution influences the effect of the adaptive method. In this paper, we use truncated singular value decomposition (TSVD) to obtain the generalized inverse, which is effective and avoids introducing additional regularization parameters.

Finally, we applied the proposed pre-stack seismic inversion and the adaptive hybrid strategy for regularization parameter selection to real field data. The results showed that our inversion method is effective and stable for real field seismic data, which are generally contaminated by ambient noise. We calculated the relative impedance from the inverted reflectivity coefficient series in the time domain according to the convolution model. The relative impedance is band-limited because the seismic data are band-limited and because filtering is required to eliminate the low-frequency error accumulated in the integration process. More importantly, the elastic parameters cannot be inverted directly from the relative impedance; this requires wide-band impedance data based on a low-frequency model, which is beyond the focus of this paper. Considering the wide application of sparse constraints in geophysics, as reviewed in the introduction, the proposed \(L_{1 - 2}\)-norm inversion method can be applied to many real problems, such as improving seismic resolution as a deconvolution method, which is another important application direction.