
1 Introduction

Seismic modelling has become a common tool to investigate the peculiarities of wave propagation in realistic complex models of the Earth’s interior [2, 12, 24], to verify seismic processing and inversion algorithms, and as a part of inversion methods [21]. However, the simulation of seismic wave propagation in complex media is one of the most computationally demanding problems, requiring intensive use of high-performance computing. In particular, for a typical seismic acquisition system, one has to simulate wavefields corresponding to hundreds of thousands of source positions (right-hand sides). Each simulation of a single shot gather is performed in a domain of about \(10^{3}\) km\(^{3}\), which corresponds to \(100^{3}\) wavelengths. Thus, up to \(8\cdot 10^{9}\) grid points are needed to obtain sufficiently accurate numerical results. Reducing the problem size by increasing the grid step leads to growth of the numerical error, which may completely destroy the solution. There are several ways to reduce numerical dispersion, including high-order finite-difference schemes [11], dispersion-suppressing schemes [14], and high-order finite element and discontinuous Galerkin methods [1, 8, 13]. However, increasing the accuracy of the approximation inevitably increases the computational intensity, including the number of flops and memory (RAM) access operations.
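The grid-size estimate above can be reproduced with a few lines of arithmetic. The sketch below assumes 20 points per wavelength (the fine discretization mentioned in the conclusions) and, for the memory figure, single-precision storage of only the nine wavefield components of the 3D velocity-stress system (three velocities and six stresses). Model parameters and absorbing layers are ignored, so the memory value is only a lower bound; the function names are illustrative.

```python
def grid_points(wavelengths_per_side: int, points_per_wavelength: int, dim: int = 3) -> int:
    """Number of grid nodes in a cube spanning `wavelengths_per_side` wavelengths per side."""
    nodes_per_side = wavelengths_per_side * points_per_wavelength
    return nodes_per_side ** dim


def wavefield_bytes(n_points: int, n_components: int = 9, bytes_per_value: int = 4) -> int:
    """Memory for the wavefield arrays only (assumed: 9 components stored as float32)."""
    return n_points * n_components * bytes_per_value


n = grid_points(100, 20)  # 100 wavelengths per side, 20 points per wavelength (assumed)
print(f"grid points: {n:.1e}")
print(f"wavefield memory: {wavefield_bytes(n) / 1e9:.0f} GB")
```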

Another option for reducing numerical dispersion in simulated wavefields is post-processing [9, 23]. However, the standard waveform correction procedures used in seismic processing are not efficient at mitigating numerical dispersion. The error associated with numerical dispersion depends on the wave propagation path, the velocity model, etc.; thus, it cannot be compensated by a single phase shift. In this paper, we suggest a post-processing approach based on deep learning.

Deep learning is widely applied in various fields of science. Given a large, representative training dataset, deep neural networks (DNNs) can approximate complex non-linear operators within the supervised learning workflow. Such DNNs can learn highly non-linear physics and are usually much faster than traditional simulation [6, 17].

To develop an efficient algorithm for numerical dispersion mitigation, we exploit the following peculiarity of seismic modelling. The entire seismic dataset includes wavefields corresponding to many source positions, which are relatively close to each other (10 to 100 m apart). Thus, for nearby sources, the velocity models and the simulated wavefields are similar. This allows us to simulate an accurate solution for a small number of sources and use it as a training dataset. At the same time, we simulate the entire dataset on a sufficiently coarse grid, train the deep neural network, and then use it to post-process the coarse-grid data.

The remainder of the paper is structured as follows. In Sect. 2, we recall the basic concepts of seismic modelling, including the main estimates of numerical dispersion as a function of the grid size. The numerical dispersion mitigation network (NDM-net) is described in Sect. 3. Numerical experiments illustrating the applicability of the NDM-net to the enhancement of synthetic seismic data are presented in Sect. 4.

2 Seismic Modelling

Seismic wave propagation in 2D isotropic elastic media is governed by the elastic wave equation:

$$\begin{aligned} \begin{array}{cc} \begin{array}{c} \rho \frac{\partial u_{1}}{\partial t}=\frac{\partial \sigma _{11}}{\partial x_{1}}+\frac{\partial \sigma _{13}}{\partial x_{3}}, \\ \rho \frac{\partial u_{3}}{\partial t}=\frac{\partial \sigma _{13}}{\partial x_{1}}+\frac{\partial \sigma _{33}}{\partial x_{3}}, \\ \end{array} &{} \begin{array}{c} \frac{\partial \sigma _{11}}{\partial t}=(\lambda +2\mu )\frac{\partial u_{1}}{\partial x_{1}}+\lambda \frac{\partial u_{3}}{\partial x_{3}}+f_{11}(t)\delta (\boldsymbol{x}-\boldsymbol{x}_{s}), \\ \frac{\partial \sigma _{33}}{\partial t}=\lambda \frac{\partial u_{1}}{\partial x_{1}}+(\lambda +2\mu )\frac{\partial u_{3}}{\partial x_{3}} +f_{33}(t)\delta (\boldsymbol{x}-\boldsymbol{x}_{s}), \\ \frac{\partial \sigma _{13}}{\partial t}=\mu \frac{\partial u_{1}}{\partial x_{3}}+\mu \frac{\partial u_{3}}{\partial x_{1}}+f_{13}(t)\delta (\boldsymbol{x}-\boldsymbol{x}_{s}), \\ \end{array} \end{array} \end{aligned}$$
(1)

where \(\rho \) is the mass density, \(\lambda \) and \(\mu \) are the Lamé parameters, \(\boldsymbol{u}=(u_{1},u_{3})^{T}\) is the particle velocity vector, \(\sigma \) is the stress tensor, \(f_{ij}(t)\) are the components of the source wavelet function, \(\delta (\boldsymbol{x})\) is the Dirac delta function, \(\boldsymbol{x}\) is the vector of spatial coordinates, and \(\boldsymbol{x}_{s}\) is the source position. The problem is posed in the half-space \(x_{3}>0\) over a bounded time interval \(t\in [0,T]\).

A common way to approximate the elastic wave equation is to use staggered-grid finite differences [11, 20], in which different components of the wavefield are defined at different spatial and temporal points, and symmetric stencils are used to approximate the derivatives:

$$\begin{aligned} \begin{array}{c} \rho D_{t}{[}u_{1}{]}_{i+1/2,j}^{n-1/2}=D_{1}{[}\sigma _{11}{]}_{i+1/2,j}^{n-1/2}+D_{3}{[}\sigma _{13}{]}_{i+1/2,j}^{n-1/2}, \\ \rho D_{t}{[}u_{3}{]}_{i,j+1/2}^{n-1/2}=D_{1}{[}\sigma _{13}{]}_{i,j+1/2}^{n-1/2}+D_{3}{[}\sigma _{33}{]}_{i,j+1/2}^{n-1/2}, \\ D_{t}{[}\sigma _{11}{]}^{n}_{i,j}=(\lambda +2\mu )D_1{[}u_{1}{]}_{i,j}^{n}+\lambda D_3{[}u_{3}{]}_{i,j}^{n}+f_{11}(t^n){[}\delta (\boldsymbol{x}-\boldsymbol{x}_{s}){]}_{i,j}, \\ D_{t}{[}\sigma _{33}{]}^{n}_{i,j}=\lambda D_{1}{[}u_{1}{]}_{i,j}^{n}+(\lambda +2\mu ) D_{3}{[}u_{3}{]}_{i,j}^{n}+f_{33}(t^n){[}\delta (\boldsymbol{x}-\boldsymbol{x}_{s}){]}_{i,j}, \\ D_{t}{[}\sigma _{13}{]}^{n}_{i+1/2,j+1/2}=\mu D_{1}{[}u_{3}{]}_{i+1/2,j+1/2}^{n}+\mu D_{3}{[}u_{1}{]}_{i+1/2,j+1/2}^{n}+\\ +f_{13}(t^{n}){[}\delta (\boldsymbol{x}-\boldsymbol{x}_{s}){]}_{i+1/2,j+1/2}, \\ \end{array} \end{aligned}$$
(2)

where finite-difference operators are

$$\begin{aligned} \begin{array}{c} D_{t}{[}g{]}_{I,J}^{N}=\frac{g_{I,J}^{N+1/2}-g_{I,J}^{N-1/2}}{\tau }, \\ D_{1}{[}g{]}_{I,J}^{N}=\frac{1}{h_{1}}\sum _{m=0}^{M}\alpha _{m}\left( g_{I+m+1/2,J}^{N}-g_{I-m-1/2,J}^{N}\right) ,\\ D_{3}{[}g{]}_{I,J}^{N}=\frac{1}{h_{3}}\sum _{m=0}^{M}\alpha _{m}\left( g_{I,J+m+1/2}^{N}-g_{I,J-m-1/2}^{N}\right) , \end{array} \end{aligned}$$
(3)

where the indices I, J, and N can be either integers or half-integers, and g is a sufficiently smooth scalar function. The operator \(D_{t}\) approximates the temporal derivative with second order in the time step \(\tau \). The operators \(D_{1}\) and \(D_{3}\) approximate the spatial derivatives; depending on the choice of the coefficients \(\alpha _{m}\), one may construct an approximation of order up to \(2M+2\) [11], or use these degrees of freedom to suppress numerical dispersion [14]. Note that we do not discuss the approximation of the right-hand sides, which is presented in [7], or the treatment of the model parameters, which is studied in [16, 22].
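For the maximum-order choice, the coefficients \(\alpha _{m}\) follow from Taylor expansions: the stencil must reproduce the first derivative and cancel the odd-order error terms up to \(h^{2M}\). The sketch below solves this small linear system exactly with Python's `fractions` module; for \(M=1\) it yields the standard fourth-order staggered-grid coefficients 9/8 and -1/24, and the error check shows the expected factor of about 16 when the step is halved. The function names are illustrative, and the dispersion-suppressing choice of [14] is not reproduced.

```python
import math
from fractions import Fraction


def staggered_coefficients(M: int) -> list:
    """Coefficients alpha_0..alpha_M giving a (2M+2)-order staggered first derivative.

    Row p requires sum_m alpha_m * 2 * (m + 1/2)**(2p + 1) to equal 1 for p = 0
    (consistency) and 0 for p = 1..M (cancellation of odd Taylor terms).
    """
    n = M + 1
    rows = []
    for p in range(n):
        row = [2 * Fraction(2 * m + 1, 2) ** (2 * p + 1) for m in range(n)]
        rows.append(row + [Fraction(1) if p == 0 else Fraction(0)])
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        lead = rows[col][col]
        rows[col] = [v / lead for v in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[m][n] for m in range(n)]


def staggered_derivative(f, x: float, h: float, alphas) -> float:
    """Apply the operator D_1 of Eq. (3) to a function f at the point x."""
    return sum(
        float(a) * (f(x + (m + 0.5) * h) - f(x - (m + 0.5) * h)) for m, a in enumerate(alphas)
    ) / h


alphas = staggered_coefficients(1)
print(alphas)  # [Fraction(9, 8), Fraction(-1, 24)]
for h in (0.1, 0.05):
    err = abs(staggered_derivative(math.sin, 0.3, h, alphas) - math.cos(0.3))
    print(f"h = {h}: error = {err:.2e}")
```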

The use of symmetric stencils to approximate the derivatives ensures that scheme (2) has an even order of approximation, with zero coefficients at the odd-order terms of its differential approximation. Thus, the numerical error appears in the solution as numerical dispersion without dissipation; see [3, 19] for details. This means that the emitted impulse is distorted as it propagates through the medium. An example of impulse deformation due to dispersion is presented in Fig. 1. We plot the true pulse and the pulse after travelling 30 wavelengths, simulated by the second-order scheme with spatial discretizations of 10, 20, and 40 points per wavelength and a Courant number of 0.8. Note that the peak of the impulse is delayed in time, which leads to an overestimation of the depths of reflectors in seismic processing and interpretation. Mesh refinement makes the numerical solution converge to the true one; however, refining the spatial step by a factor of two increases the problem size by a factor of 8 in 3D and 4 in 2D. Moreover, the number of flops increases by a factor of 16 in 3D and 8 in 2D, because the time step must be refined as well. On the other hand, the solution obtained on the grid with 20 points per wavelength is reasonably accurate, and relatively simple processing may bring it close to the true one.
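The lag visible in Fig. 1 can be estimated from the dispersion relation of the scheme. As a simplification, the sketch below considers a plane wave travelling along a grid axis in 1D, discretized by the second-order staggered scheme (\(M=0\)), for which \(\sin (\omega \tau /2)=\gamma \sin (kh/2)\) with Courant number \(\gamma =c\tau /h\). It reports the numerical phase and group velocities relative to the true velocity \(c\) and the lag of the pulse envelope, which travels with the group velocity, after 30 wavelengths. For \(\gamma <1\) both ratios are below one, so the numerical pulse arrives late. The 2D Courant number and the actual pulse shape are not modelled, so the numbers are only indicative.

```python
import math

# Simplified 1D analysis (assumption): plane wave along a grid axis, second-order
# staggered scheme (M = 0) with the dispersion relation
#     sin(omega * tau / 2) = gamma * sin(k * h / 2),   gamma = c * tau / h.


def _half_kh(ppw: float) -> float:
    """k*h/2 for a wavelength sampled by `ppw` steps (k = 2*pi/lambda, h = lambda/ppw)."""
    return math.pi / ppw


def phase_velocity_ratio(ppw: float, courant: float) -> float:
    """Numerical phase velocity divided by the true velocity c."""
    x = _half_kh(ppw)
    return math.asin(courant * math.sin(x)) / (courant * x)


def group_velocity_ratio(ppw: float, courant: float) -> float:
    """Numerical group velocity (d omega / d k) divided by c."""
    x = _half_kh(ppw)
    return math.cos(x) / math.sqrt(1.0 - (courant * math.sin(x)) ** 2)


def pulse_delay_in_periods(ppw: float, courant: float, n_wavelengths: float) -> float:
    """Lag of the pulse envelope, in periods, after travelling n_wavelengths."""
    return n_wavelengths * (1.0 / group_velocity_ratio(ppw, courant) - 1.0)


for ppw in (10, 20, 40):
    print(
        f"{ppw:2d} ppw: phase {phase_velocity_ratio(ppw, 0.8):.5f} c, "
        f"group {group_velocity_ratio(ppw, 0.8):.5f} c, "
        f"lag after 30 wavelengths {pulse_delay_in_periods(ppw, 0.8, 30):.3f} periods"
    )
```

The error shrinks roughly fourfold each time the number of points per wavelength doubles, as expected for a second-order scheme.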

Fig. 1.

An example of the pulse deformation due to numerical dispersion.

3 Numerical Dispersion Mitigation Network (NDM-net)

Convolutional neural networks (CNNs) are commonly applied to analyze visual imagery. A particular type of CNN is the U-Net [18], which was originally introduced for biomedical image segmentation. At present, the U-Net and its modifications are broadly applied in seismic inversion and in pre-stack seismic data processing and interpretation. In this work, we suggest using the Numerical Dispersion Mitigation deep neural network (NDM-net) to learn the mapping between synthetic seismic data modelled on a coarse grid and data modelled on a fine grid. In other words, we aim to suppress numerical dispersion using deep learning.

The architecture of the network is similar to the one used in [5]. The differences are the use of conventional convolutional layers instead of partial convolutions and different input/output dimensions; see Fig. 2. The DNN contains 16 convolutional layers, eight upsampling layers, and eight concatenation layers (skip connections). The dimensions of the input and output tensors are 1250 \(\times \) 512 \(\times \) 2. The activation function for the first eight convolutional layers (the encoding, or feature-extracting, part of the DNN) is ReLU, while the last eight convolutional layers (the decoding part) use LeakyReLU activation with a negative slope coefficient equal to 0.2. We implemented the NDM-net in TensorFlow. The DNN weights were initialized randomly, and the Adam stochastic optimization algorithm was used for training.
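Fitting a 1250-sample time axis into an encoder with eight downsampling steps needs some care. The sketch below assumes that each encoder level halves the spatial size with a stride-2 convolution using 'same' padding (a common U-Net design) and that each decoder level doubles it by upsampling before concatenation. Under that assumption, it lists the levels at which the upsampled tensor and its skip connection disagree in shape, and shows that zero-padding the time axis to a multiple of \(2^{8}\) (1280 samples) removes the mismatch. Padding is only one possible choice, not necessarily the one used in the NDM-net, and the function names are illustrative.

```python
import math


def encoder_shapes(height: int, width: int, n_levels: int = 8) -> list:
    """Spatial shapes after each stride-2 'same' convolution (sizes rounded up)."""
    shapes = [(height, width)]
    for _ in range(n_levels):
        h, w = shapes[-1]
        shapes.append((math.ceil(h / 2), math.ceil(w / 2)))
    return shapes


def skip_mismatches(height: int, width: int, n_levels: int = 8) -> list:
    """Decoder levels where 2x upsampling does not match the skip-connection shape."""
    shapes = encoder_shapes(height, width, n_levels)
    mismatches = []
    for level in range(n_levels, 0, -1):
        h, w = shapes[level]
        upsampled, skip = (2 * h, 2 * w), shapes[level - 1]
        if upsampled != skip:
            mismatches.append((level, upsampled, skip))
    return mismatches


def padded_length(n: int, n_levels: int = 8) -> int:
    """Smallest length >= n that is divisible by 2**n_levels."""
    block = 2 ** n_levels
    return -(-n // block) * block


print(skip_mismatches(1250, 512))
t_pad = padded_length(1250)
print(t_pad, skip_mismatches(t_pad, 512))  # 1280 []
```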

In the current implementation, we consider the input and output to be regularly sampled pre-stack seismic data. For training, we used every 10th common shot gather computed on a fine grid, together with its counterpart modelled on a coarse grid and corrupted by numerical dispersion. Each common shot gather is converted to a tensor of dimension 1250 \(\times \) 512 \(\times \) 2. Here, 1250 is the number of time samples (4 ms time discretization and 5 s record length), 512 is the number of two-component (2C) receivers, and 2 is the number of recorded components (vertical and horizontal particle velocity). Next, we split this dataset into training and validation datasets. Each common shot gather is normalized to unit variance before being processed by the NDM-net.
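The data preparation described above can be sketched in plain Python. The sketch takes every 10th shot as a training candidate, splits those shots into training and validation subsets, and scales a flattened shot gather to unit variance. The validation fraction, the random seed, and all function names are hypothetical, and returning the scale factor (for example, to restore amplitudes after prediction) is an assumption rather than a documented step.

```python
import math
import random

N_TIME = round(5.0 / 0.004)      # 5 s record at 4 ms sampling -> 1250 samples
GATHER_SHAPE = (N_TIME, 512, 2)  # time samples x 2C receivers x components


def training_shot_indices(n_shots: int, step: int = 10) -> list:
    """Shots that are also simulated on the fine grid: every `step`-th shot."""
    return list(range(0, n_shots, step))


def train_validation_split(indices, val_fraction: float = 0.2, seed: int = 0):
    """Random split into training and validation shots (fraction and seed hypothetical)."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    n_val = max(1, round(val_fraction * len(shuffled)))
    return sorted(shuffled[n_val:]), sorted(shuffled[:n_val])


def normalize_unit_variance(samples):
    """Scale a flattened shot gather to unit variance; also return the scale factor."""
    n = len(samples)
    mean = sum(samples) / n
    std = math.sqrt(sum((s - mean) ** 2 for s in samples) / n)
    scale = std if std > 0 else 1.0
    return [s / scale for s in samples], scale


train, val = train_validation_split(training_shot_indices(171))
print(GATHER_SHAPE, len(train), len(val))
```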

Fig. 2.

The architecture of NDM-net. Black right arrows indicate convolution, and red right arrows indicate concatenation. Up and down arrows indicate upsampling and batch normalization, respectively.

4 Numerical Experiments

We applied our approach to mitigate numerical dispersion in two datasets. Both were simulated in 2D to illustrate the applicability of the NDM-net for improving the accuracy and efficiency of seismic modelling.

4.1 Marmousi2 Model

First, we considered the elastic Marmousi2 model [15], presented in Fig. 3. The model was 17 km long in the horizontal direction and 3.6 km deep. Marmousi2 is an offshore model with water at the top. To make the setting consistent with land data acquisition, we replaced the water with the solid material used for the ocean bottom in the model. We simulated seismic wave propagation on meshes with steps of 1.25 m, 2.5 m, and 5 m, treating the solution obtained on the 1.25 m grid as the exact one. Such small grid steps were chosen because of the thick low-velocity layer introduced in place of the water at the top of the model. Note that the original model was provided on a grid with a step of 1.25 m. However, to exclude the effect of model changes as the simulation mesh is coarsened, we mapped the model to a mesh with a 5 m step, and this 5 m model was then used for all numerical simulations.
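One simple way to keep the medium identical on all three simulation grids is to replicate every cell of the 5 m model into a block of finer cells, so that differences between the solutions come from the discretization alone. The sketch below does this by nearest-neighbour replication for a model stored as a list of rows. It is an assumption about how such a mapping could be done, not the documented procedure; it ignores the placement of parameters at staggered nodes (treated in [16, 22]), and the velocity values are hypothetical.

```python
def upsample_model(model, factor: int):
    """Replicate each cell of a 2D model (list of rows) into a factor x factor block."""
    fine = []
    for row in model:
        fine_row = [value for value in row for _ in range(factor)]
        fine.extend(list(fine_row) for _ in range(factor))
    return fine


vs_5m = [[1500.0, 1600.0], [2000.0, 2200.0]]  # hypothetical S-velocity cells [m/s]
vs_125 = upsample_model(vs_5m, 4)             # 5 m -> 1.25 m grid
vs_25 = upsample_model(vs_5m, 2)              # 5 m -> 2.5 m grid
print(len(vs_125), len(vs_125[0]))  # 8 8
```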

Fig. 3.

Marmousi2 elastic velocity model used for synthetic data generation. The marker represents the source position at \(x=8\) km.

The acquisition included 171 sources spaced 100 m apart. For each shot, the wavefield was recorded by 512 2C receivers spaced 25 m apart, with a maximum source-receiver offset of 6.4 km. Simulations were performed using the fourth-order staggered grid scheme [11]. On an Nvidia V100 GPU, the average simulation time per shot was 5 s on the 5 m grid, 40 s on the 2.5 m grid, and 4 min on the 1.25 m grid. An example of a seismogram modelled on the 1.25 m grid for the source at x = 9 km is presented in Fig. 4.
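The acquisition parameters are mutually consistent: 171 sources at 100 m intervals cover the 17 km model, and 512 receivers at 25 m intervals give a maximum offset of 6.4 km if the spread is split around the source. The sketch below builds such a geometry; the split spread (with one more receiver on the negative-offset side) and the function name are assumptions.

```python
def acquisition_geometry(n_sources, source_step, n_receivers, receiver_step, x0=0.0):
    """Source coordinates and source-receiver offsets for an assumed split spread."""
    sources = [x0 + i * source_step for i in range(n_sources)]
    half = n_receivers // 2
    offsets = [(j - half) * receiver_step for j in range(n_receivers)]
    return sources, offsets


sources, offsets = acquisition_geometry(171, 100.0, 512, 25.0)
print(sources[-1] - sources[0], min(offsets), max(offsets))  # 17000.0 -6400.0 6375.0
```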

Fig. 4.

Synthetic seismograms for shot positioned at \(x=9\) km: horizontal (a) and vertical (b) components calculated on a numerical grid with the spatial step 1.25 m.

We performed two numerical experiments and trained an NDM-net for each of them. One network was trained to map the data simulated on the 2.5 m grid to the exact solution (data computed on the grid with a 1.25 m step). The other NDM-net was trained to map the 5 m data to the 1.25 m data. The training was performed on an Nvidia V100 GPU. For regularization, we used early stopping, interrupting the training when the error on the validation dataset started to grow. In both cases (2.5 m to 1.25 m and 5 m to 1.25 m), training took about 30 min. The prediction time is about 0.7 s for one full common shot gather, while one forward simulation with the FD technique on a GPU took about 40 s on the 2.5 m grid, about 5 s on the 5 m grid, and 5 min on the finest 1.25 m grid.
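The early-stopping rule can be written as a small function over the history of validation losses. In Keras, similar behaviour is available through `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=..., restore_best_weights=True)`; the plain-Python sketch below only makes the rule explicit. With `patience=0` it stops at the first epoch without improvement, as described in the text; the patience actually used for the NDM-net is not reported, and the loss values in the example are made up.

```python
def early_stopping(val_losses, patience: int = 0):
    """Return (best_epoch, stop_epoch) for a history of validation losses.

    Training stops once the validation loss has failed to improve for more
    than `patience` consecutive epochs; patience=0 stops at the first epoch
    whose loss does not improve on the best value so far.
    """
    best_epoch, best_loss, wait = 0, float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_epoch, best_loss, wait = epoch, loss, 0
        else:
            wait += 1
            if wait > patience:
                return best_epoch, epoch
    return best_epoch, len(val_losses) - 1


history = [0.91, 0.62, 0.48, 0.51, 0.47]  # made-up validation losses
print(early_stopping(history))              # (2, 3): keep the weights from epoch 2
print(early_stopping(history, patience=1))  # (4, 4)
```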

To estimate the quality of the DNN predictions, we use the normalized RMS (NRMS) as a measure of dataset similarity. NRMS is a strict sample-by-sample metric used to evaluate repeatability between two datasets in 4D seismic [10]. An acceptable level of NRMS in 4D seismic is about 20–40%. The DNN predictions were verified on a test dataset distinct from the training and validation datasets, i.e. on data not seen by the DNN during training. The NRMS plot, calculated trace by trace using a sliding window of 200 ms, is presented in Fig. 5. On average, the NRMS between the 1.25 m data and the 2.5 m data was 30%; application of the NDM-net reduced it to 14%. The average NRMS between the 5 m data and the 1.25 m data was about 59%, and the DNN produced a prediction with an NRMS of 33%. Thus, in both cases, the NDM-net reduced the NRMS to an acceptable level. To illustrate the effect of the NDM-net data enhancement, Figs. 6 and 7 show seismic traces computed on different grids and then improved by the NDM-net.
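In 4D seismic, NRMS is commonly defined as \(\mathrm{NRMS}=200\,\mathrm{RMS}(a-b)/(\mathrm{RMS}(a)+\mathrm{RMS}(b))\) percent, so it ranges from 0 for identical traces to 200 for traces of equal energy and opposite polarity. The sketch below implements this definition together with a sliding-window version; a 200 ms window at 4 ms sampling corresponds to 50 samples. The one-sample window step, the handling of all-zero windows, and the synthetic test traces are assumptions.

```python
import math


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x))


def nrms(a, b):
    """NRMS in percent between two equally long traces (0 = identical, at most 200)."""
    denom = rms(a) + rms(b)
    if denom == 0.0:
        return 0.0  # both windows silent: treated as identical (assumption)
    return 200.0 * rms([u - v for u, v in zip(a, b)]) / denom


def sliding_nrms(a, b, window: int):
    """NRMS in a window of `window` samples sliding by one sample along the traces."""
    return [nrms(a[i:i + window], b[i:i + window]) for i in range(len(a) - window + 1)]


WINDOW = round(0.200 / 0.004)  # 200 ms at 4 ms sampling -> 50 samples
reference = [math.sin(0.3 * n) for n in range(200)]
shifted = [math.sin(0.3 * n - 0.2) for n in range(200)]  # small phase error
print(f"NRMS = {nrms(reference, shifted):.1f} %, windows: {len(sliding_nrms(reference, shifted, WINDOW))}")
```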

Fig. 5.

NRMS plot calculated between seismograms computed on numerical grids with spatial steps of 1.25 m and 2.5 m (a), 1.25 m and DNN-predicted data using 2.5 m data as input (b), 1.25 m and 5 m (d), 1.25 m and DNN-predicted data using 5 m data as input (e), and the corresponding histograms (c, f).

Fig. 6.

Seismic traces at different positions and their DNN predictions for the 2.5 m data (a) and the 5 m data (b). Black plot: vertical component on the fine grid; red plot: input data for DNN prediction; blue plot: DNN-predicted data.

Fig. 7.

Seismic traces at different positions and their DNN predictions for the 2.5 m data (a) and the 5 m data (b). Black plot: vertical component on the fine grid; red plot: input data for DNN prediction; blue plot: DNN-predicted data.

4.2 Model with Vertical Intrusion

The second set of experiments was performed for a model with vertical high-contrast intrusions causing lateral heterogeneity, as presented in Fig. 8. The size of the entire model was 220 km by 2.6 km. The acquisition included 1901 sources spaced 100 m apart. For each shot, the wavefield was recorded by 512 receivers spaced 25 m apart, with a maximum source-receiver offset of 6.4 km. In this study, we simulated the wavefield without surface waves by using a perfectly matched layer for \(x_{3}<0\) [4]. The source wavelet was a Ricker pulse with a central frequency of 30 Hz.
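For reference, the Ricker wavelet with peak (central) frequency \(f_{0}\) is \(r(t)=\left( 1-2\pi ^{2}f_{0}^{2}(t-t_{0})^{2}\right) e^{-\pi ^{2}f_{0}^{2}(t-t_{0})^{2}}\). The sketch below samples it for \(f_{0}=30\) Hz; the delay \(t_{0}=1/f_{0}\) and the time step are hypothetical choices, not values taken from the simulations.

```python
import math


def ricker(t: float, f0: float, t0: float) -> float:
    """Ricker wavelet with peak frequency f0 [Hz], centred at time t0 [s]."""
    a = (math.pi * f0 * (t - t0)) ** 2
    return (1.0 - 2.0 * a) * math.exp(-a)


F0 = 30.0      # central frequency from the text [Hz]
T0 = 1.0 / F0  # hypothetical delay so that the wavelet starts near zero [s]
DT = 0.0005    # hypothetical simulation time step [s]
wavelet = [ricker(n * DT, F0, T0) for n in range(int(2 * T0 / DT) + 1)]
peak = max(range(len(wavelet)), key=wavelet.__getitem__)
print(f"{len(wavelet)} samples, peak at t = {peak * DT:.4f} s")
```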

Fig. 8.

Elastic velocity model used for synthetic data generation. The marker represents the source position at \(x=120\) km.

Originally, the model was provided on a grid with steps of 50 m in the horizontal and 5 m in the vertical direction. We computed three datasets using the fourth-order staggered grid scheme [11]. We considered the solution computed on the grid with a 2.5 m step as the accurate one, whereas the two others, generated on grids with 5 m and 10 m spatial steps, are contaminated by numerical dispersion. Examples of the seismograms are provided in Fig. 9.

Fig. 9.

Synthetic seismograms for shot positioned at \(x=120\) km: vertical component calculated on a numerical grid with the spatial steps 2.5 m (a), 5 m (b) and 10 m (c).

Fig. 10.

NRMS plot calculated between seismograms computed on numerical grids with spatial steps of 2.5 m and 5 m (a), 2.5 m and DNN-predicted data using 5 m data as input (b), 2.5 m and 10 m (d), 2.5 m and DNN-predicted data using 10 m data as input (e), and the corresponding histograms (c, f). NRMS was calculated in the area designated by the red rectangle in Fig. 9 (vertical component from 3 s to 5 s, all receiver positions).

Fig. 11.

Seismic traces at different positions and their DNN predictions for the 5 m data (a) and the 10 m data (b). Black plot: vertical component on the fine grid; red plot: input data for DNN prediction; blue plot: DNN-predicted data.

Fig. 12.

Seismic traces at different positions and their DNN predictions for the 5 m data (a) and the 10 m data (b). Black plot: vertical component on the fine grid; red plot: input data for DNN prediction; blue plot: DNN-predicted data.

As in the previous example, we trained two NDM-nets for two synthetic datasets. One was trained to map the data simulated on the 5 m grid to the exact solution (data computed on the grid with a 2.5 m step). The other NDM-net was trained to map the 10 m data to the 2.5 m data. In both cases (5 m to 2.5 m and 10 m to 2.5 m), training took about 40 min. The prediction time is about 0.7 s for one full common shot gather, while one forward simulation with the FD technique on a GPU took about 40 s on the 2.5 m grid and about 5 s on the 5 m grid. Since the main error accumulates in the late arrivals, we calculate the NRMS for the time range from 3 s to 5 s (red rectangle in Fig. 9). The corresponding NRMS plot is presented in Fig. 10. On average, the NRMS between the 2.5 m data and the 5 m data was 65%; application of the NDM-net reduced it to 30%. The average NRMS between the 10 m data and the 2.5 m data was about 120%, which means that the 10 m data are extremely far from the true solution. Consequently, the DNN was only able to reduce the NRMS to about 90%. The effect of the NDM-net data enhancement is illustrated in Figs. 11 and 12.

5 Conclusions

We present an original approach to the numerical simulation of seismic wavefields. The method combines conventional finite-difference seismic modelling with subsequent correction of the data by a DNN-based algorithm called the NDM-net. First, we generate a training dataset by simulating wavefields for at most 10% of the source positions using a sufficiently fine spatial discretization (up to 20 points per minimal wavelength, ppw). Second, the full dataset is generated on a coarse mesh with no more than 3–5 ppw. Note that in the 2D case, simulation with 5 ppw is 64 times faster than with 20 ppw. Third, the NDM-net is trained to reduce the numerical error in the coarse-grid solution. Finally, the NDM-net is applied to correct the entire dataset. The presented results demonstrate the ability of the NDM-net to produce high-quality seismic data from synthetics generated on a coarse grid. In particular, the application of the NDM-net reduced the computational time needed to simulate the full dataset of 171 common shot gathers for the Marmousi2 model from 684 min to 112 min.
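The cost figures above can be checked with simple arithmetic. Reducing the grid step by a factor \(r\) multiplies the flops by \(r^{d+1}\) in \(d\) space dimensions (more grid points and more time steps), so going from 20 to 5 ppw in 2D gives \(4^{3}=64\), and fine-grid simulation of all 171 Marmousi2 shots at 4 min per shot gives the 684 min baseline. The hybrid estimate in the sketch is only an illustration: the number of fine-grid shots (17), the inclusion of prediction time, and the function names are assumptions, so its result (about 114 min) is only of the same order as the reported 112 min.

```python
def refinement_cost_factor(ratio: float, dim: int) -> float:
    """Flop growth when the grid step shrinks by `ratio` in `dim` space dimensions:
    ratio**dim more grid points times ratio more (CFL-limited) time steps."""
    return ratio ** (dim + 1)


def hybrid_cost(n_shots, n_train, t_fine, t_coarse, t_train, t_predict):
    """Total time [min] for fine-grid training shots + coarse-grid shots + NDM-net."""
    return n_train * t_fine + n_shots * (t_coarse + t_predict) + t_train


print(refinement_cost_factor(2, 3), refinement_cost_factor(2, 2), refinement_cost_factor(4, 2))  # 16 8 64
fine_only = 171 * 4.0                                         # min
hybrid = hybrid_cost(171, 17, 4.0, 5.0 / 60, 30.0, 0.7 / 60)  # hypothetical split, min
print(f"fine only: {fine_only:.0f} min, hybrid: {hybrid:.0f} min")
```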