1 Introduction

High-frequency surface-wave methods (Xia et al. 1999, 2002) have been widely used for near-surface shear (S)-wave velocity surveys in both active (e.g., Xia et al. 2003, 2012; Xia 2014; Ivanov et al. 2006; Luo et al. 2007; Socco et al. 2010; Foti et al. 2011; Pan et al. 2016a; Zhang and Alkhalifah 2019a) and passive seismic investigations (e.g., Louie 2001; Okada 2003; Park and Miller 2008; Cheng et al. 2015, 2016; Zhang et al. 2020). Using dispersion imaging methods, such as the τ-p transformation (McMechan and Yedlin 1981), the F-K transformation (Yilmaz 1987), the phase shift (Park et al. 1998), frequency decomposition and slant stacking (Xia et al. 2007), and the high-resolution linear Radon transformation (HRLRT) (Luo et al. 2008), dispersion images can be readily generated from shot gathers. A one-dimensional (1D) S-wave velocity model can be obtained by inverting the dispersion curves (Xia et al. 1999; Socco and Boiero 2008; Boaga et al. 2011) picked from each dispersion image. Under the 1D approximation, each model inverted from dispersion curves represents the mean S-wave velocity of the underlying structure (Boiero and Socco 2010). A two-dimensional (2D) S-wave velocity profile or a three-dimensional (3D) S-wave velocity model can then be constructed by combining multiple 1D results. The key step in ensuring the accuracy of the inversion is acquiring reliable surface-wave dispersion curves in the frequency–velocity (f–v) domain (Shen et al. 2015). The most traditional way to acquire dispersion curves is to identify the dispersion energy in a dispersion image and manually pick phase velocities by following the energy peaks at different frequencies. At present, human–machine interaction methods can extract dispersion curves semi-automatically through manual clicking on dispersion energy areas (Shen 2014); however, manual identification of the different dispersion energy modes is still indispensable.

With the widespread use of high-frequency surface-wave methods and the growing volume of seismic data to be processed, practitioners are reluctant to spend much time on the repetitive task of acquiring dispersion curves. Common tasks include: 1. generating 2D S-wave velocity profiles by aligning a large number of 1D S-wave velocity models (Bohlen et al. 2004; Yin et al. 2016; Mi et al. 2017; Pan et al. 2019), 2. delineating shallow S-wave velocity structure using multiple ambient-noise surface-wave methods (Pan et al. 2016b), and 3. estimating a 3D S-wave velocity model (Pilz et al. 2013; Ikeda and Tsuji 2015; Pan et al. 2018; Mi et al. 2020). Meanwhile, a newly developed seismic data acquisition technology, distributed acoustic sensing (DAS) (Daley et al. 2013; Ning and Sava 2018; Song et al. 2019), can acquire enormous volumes of data. There is no doubt that manually picking dispersion curves will become unrealistic in the near future. In addition, manually extracted dispersion curves carry a certain subjectivity: some significant energy (i.e., spatial aliasing and "crossed" artifacts) (Dai et al. 2018a; Cheng et al. 2018a) can confuse interpreters during picking, because each interpreter has different experience in surface-wave data processing.

In recent years, data-driven deep learning has emerged as a new research direction and aroused the interest of geophysicists. Deep learning models, built as layered architectures loosely analogous to the human brain, extract features of the input data layer by layer from bottom to top, thus establishing a good mapping between signal and semantics (LeCun et al. 2015). With the rapid development of deep learning, more powerful computing, and increasing data processing capacity, recent studies have applied deep learning to geophysics. Perol et al. (2018) presented a convolutional neural network (CNN) for earthquake detection and location. Zachary et al. (2018) trained a CNN to detect seismic body-wave phases. Zhang et al. (2018) applied a CNN to predict seismic lithology. Mao et al. (2019) developed a CNN to predict subsurface velocity information. Wang and Chen (2019) used residual learning of a deep CNN for seismic random noise attenuation. Wang et al. (2019) developed a deep learning model to automatically and precisely pick a great number of first P- and S-wave arrival times from local earthquake seismograms. Wu et al. (2019) proposed a multitask learning CNN to simultaneously perform three seismic image processing tasks: detecting faults, structure-oriented smoothing with edge preserving, and estimating seismic local structure orientations. Yang and Ma (2019) proposed a deep fully convolutional neural network for velocity model building directly from raw seismograms. In addition, within the framework of full waveform inversion, Ovcharenko et al. (2019) used deep learning to extrapolate data at low frequencies, and Zhang and Alkhalifah (2019b) utilized deep neural networks to estimate the distribution of facies in the subsurface as constraints. Few studies (e.g., Dai et al. 2018b), however, have applied deep learning to surface-wave dispersion curve extraction. Adopting deep learning for tasks with large data volumes and tedious work is imperative: it often improves the efficiency of geophysical data processing, reduces cost, and reduces the biases of human–machine interaction that stem from individual professional experience.

Dai et al. (2018b) showed that dispersion curve extraction can be regarded as a semantic segmentation (Long et al. 2015; Badrinarayanan et al. 2017) task. For multimode or more complex dispersion images, however, this concept is not sufficient: the generated binary segmentation still needs to be separated into the different surface-wave modes, especially when the dispersion energy converges poorly and the energy of different surface-wave modes lies very close together. Inspired by the success of instance segmentation, which has been well studied in computer vision via deep learning models (e.g., Dai et al. 2015; Romera-Paredes and Torr 2016; Bai and Urtasun 2017; Ren and Zemel 2017; He et al. 2017; Neven et al. 2018; Chen et al. 2020), we regard dispersion curve extraction as an instance segmentation task.

In this paper, we adopt ideas from these previous studies and develop a deep learning model, called Dispersion Curves Network (DCNet), to extract dispersion curves in the f–v domain. DCNet is a multitask network model (e.g., Caruana 1997; Ruder 2017; Wu et al. 2019) consisting of a segmentation branch and an embedding branch. Learning multiple related tasks from data improves efficiency and prediction accuracy by exploiting the commonalities and differences among the tasks (Evgeniou and Pontil 2004). The segmentation branch segments the dispersion images into two classes, background and dispersion energy, while the embedding branch further distinguishes the segmented dispersion energy pixels into different mode instances. This also achieves surface-wave mode separation (Luo et al. 2009a), in which each dispersion curve mode forms its own instance within the dispersion energy class. We design a data augmentation method for surface-wave energy recognition and create a data set of 25,000 labeled surface-wave dispersion energy samples for training the DCNet model. We test the accuracy of the results extracted by the DCNet model by comparing them with the theoretical dispersion curves of simulated data; these tests indicate that our DCNet model is effective and highly accurate. We also apply the DCNet model to 3D passive surface-wave field data to automatically extract a large number of dispersion curves and generate a 3D S-wave velocity model by assembling the 1D S-wave velocity profiles inverted from each dispersion curve. The effectiveness and robustness of our method are demonstrated by comparing the 3D S-wave velocity model with borehole data.

2 Method and Experiments

2.1 Network Architecture

DCNet is trained end to end for surface-wave dispersion curve extraction by regarding dispersion curve extraction as an instance segmentation task. In this way, the network can extract multimode surface-wave dispersion curves. DCNet is a multitask encoder–decoder network consisting of a segmentation branch and an embedding branch (Fig. 1). Formulas of the operations defined in DCNet are listed in Table 1. The encoder part of DCNet is modified from VGGNet (Simonyan and Zisserman 2014), a network with a simple architecture of convolution, pooling, and fully connected layers that is mainly used for image classification. The decoder part is designed based on the idea of fully convolutional networks (Long et al. 2015).
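To make this structure concrete, the following is a minimal Keras sketch of such a two-branch encoder–decoder: a VGG-style shared encoder of four stages, a separate fifth stage plus FCN-style decoder per branch, a one-channel segmentation output, and a three-channel embedding output. The kernel sizes, layer widths, input channels, and the omitted Fusing (skip) connections of Table 1 are illustrative assumptions, not the exact published configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters, n_convs=2):
    # VGG-style stage: stacked 3 x 3 convolutions (BN + ReLU), then 2 x 2 max pooling
    for _ in range(n_convs):
        x = layers.Conv2D(filters, 3, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return layers.MaxPooling2D(2)(x)

def branch(x, out_channels, name):
    # Separate fifth encoder stage followed by an FCN-style decoder (stride-2 Deconv)
    x = conv_block(x, 512)
    for filters in (512, 256, 128, 64, 32):
        x = layers.Conv2DTranspose(filters, 4, strides=2, padding="same")(x)
        x = layers.BatchNormalization()(x)
        x = layers.ReLU()(x)
    return layers.Conv2D(out_channels, 1, name=name)(x)

inputs = layers.Input(shape=(512, 256, 3))       # rescaled dispersion image (RGB assumed)
x = inputs
for filters in (64, 128, 256, 512):              # four shared encoder stages
    x = conv_block(x, filters)
seg = branch(x, 1, "segmentation")               # energy/background logits
emb = branch(x, 3, "embedding")                  # 3-D pixel embeddings
dcnet = Model(inputs, [seg, emb])
```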

Fig. 1

The architecture of DCNet. The end of each Pooling marks the end of a stage of the encoder part in the middle of the architecture. The left and right parts show the segmentation and embedding branches, respectively. Each rounded block represents an input or output; each block corresponds to an operation. The abbreviations of the operations shown in the network architecture, i.e., Conv, Pooling, Deconv, and Fusing, are defined in Table 1. Specifically, Conv and Deconv represent the convolution and deconvolution layers, each followed by batch normalization (BN) and a rectified linear unit (ReLU). The first three-dimensional parameter of a Conv or Deconv operation is the size of its convolutional kernels; the last parameter is the number of kernels. The stride of Conv is 1 × 1 and the stride of Deconv is 2 × 2; the padding is "same", meaning the input is zero-padded so that the output has the same size as the input. The kernel size of Pooling is 2 × 2. The image size of each input or output is shown between two dashed lines

Table 1 Formulas of the operations defined in DCNet, where * denotes a 4-dimensional convolution; \(K\) and \(\bar{K}\) represent the convolutional kernels; \(b\) represents the bias; \(ksize\) represents the kernel size; and \(input1\) and \(input2\) represent the outputs of the Conv and Deconv at the same stage, respectively

DCNet shares the first four encoder stages between the two branches, while the last encoder stage and the full decoder are separate for each branch. The segmentation branch is trained to segment the dispersion images into two classes, background and dispersion energy. To distinguish the dispersion energy pixels identified by the segmentation branch, we train the other branch of DCNet for dispersion energy embedding. The last layer of the segmentation branch outputs a one-channel image, whereas the last layer of the embedding branch outputs a three-channel image. As shown in Fig. 1, the loss terms of the two branches are equally weighted and back-propagated through the network. Using the discriminative loss function proposed by De Brabandere et al. (2017), the embedding branch is trained to output an embedding for each surface-wave mode; this loss function performs well on single-class segmentation with multiple instances and accepts any number of instances. The distance between pixel embeddings belonging to the same surface-wave mode is kept small, whereas the distance between pixel embeddings belonging to different surface-wave modes is maximized. After the output of the segmentation branch (Fig. 2b) is applied as a mask on the output of the embedding branch (Fig. 2c, d), the masked surface-wave embeddings (Fig. 2e, f) are clustered and assigned to their mode cluster centers using DBSCAN (Ester et al. 1996), with the radius \(Eps = 0.5\) and the minimum number of neighboring points \(MinPts = 50\) in this paper (Fig. 2g). Finally, an instance segmentation image (Fig. 2h) of surface waves with mode separation is obtained, and we fit a curve through the corresponding phase-velocity peaks of each surface-wave mode instance to obtain the final extracted dispersion curves (Fig. 2i, j).
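As a minimal sketch of this post-processing, assuming `image` is the dispersion image amplitude and `seg_prob` and `emb` are the two branch outputs for that image, and using scikit-learn's DBSCAN with the stated parameters; the 0.5 mask threshold and the array layout (rows indexing velocity, columns indexing frequency) are assumptions:

```python
import numpy as np
from sklearn.cluster import DBSCAN

def extract_modes(image, seg_prob, emb, freqs, velocities):
    """image, seg_prob: (H, W); emb: (H, W, 3) pixel embeddings."""
    mask = seg_prob > 0.5                          # keep dispersion-energy pixels only
    coords = np.argwhere(mask)                     # (row, col) of each energy pixel
    labels = DBSCAN(eps=0.5, min_samples=50).fit_predict(emb[mask])
    curves = {}
    for mode in set(labels) - {-1}:                # -1 marks DBSCAN noise points
        rows, cols = coords[labels == mode].T
        curve = {}
        for col in np.unique(cols):                # one pick per frequency column:
            r = rows[cols == col]                  # the pixel with the peak energy
            best = r[np.argmax(image[r, col])]
            curve[freqs[col]] = velocities[best]
        curves[mode] = curve
    return curves                                  # {mode_id: {frequency: velocity}}
```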

Fig. 2

The process of extracting dispersion curves by DCNet. a The input dispersion image. b Output of the segmentation branch (white areas represent dispersion energy and black areas represent the background). c Output of the embedding branch (a 2D representation). d Output of the embedding branch (a 3D representation). e Using the output of the segmentation branch as a mask on the output of the embedding branch (a 2D representation). f Using the output of the segmentation branch as a mask on the output of the embedding branch (a 3D representation). g Using DBSCAN (Ester et al. 1996) to cluster the embeddings together. h The instance output. i The dispersion curves extracted from the energy of each mode. j The final extraction result

2.2 Datasets and Training

To ensure that the training results cover a variety of scenarios, we used a large number of labeled dispersion images (Fig. 3) to train our DCNet model. We used 200 real-world active-source records, 1000 real-world passive-source records, and 600 simulated records (the creation steps are shown in Table 2) to generate dispersion images using common imaging methods: the τ-p transformation (McMechan and Yedlin 1981), the phase shift (Park et al. 1998), frequency decomposition and slant stacking (Xia et al. 2007), and the HRLRT (Luo et al. 2008). These include not only dispersion images with the fundamental mode but also images with fundamental and higher modes, as well as theoretical multimode dispersion images. The data set contains real-world data with substantial noise, spatial aliasing (Dai et al. 2018a), or "crossed" artifacts (Cheng et al. 2018a), and synthetic data with spatial aliasing or "crossed" artifacts, so that the trained model can cope with useless or fake energy. With a large number of real-world surface-wave data, we can simulate many theoretical data; however, manual labeling alone is still insufficient for training a neural network. To solve this problem, we expanded the data set through data augmentation, which improves the robustness of the model and avoids overfitting. General data augmentation methods, such as flipping, rotating, zooming, translating, and color jittering (Howard 2013), are not suitable for dispersion energy recognition. Therefore, we added different levels of noise to each (real-world or simulated) shot gather and generated dispersion images with different frequency and velocity ranges. The imaging results are both normalized and non-normalized in the frequency domain (Fig. 4), which further increases the number of samples. In this way, we obtained 25,000 samples in total, of which 80% were used for training and the rest for validation.
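A minimal sketch of this physics-aware augmentation, assuming a shot-gather array and a generic `image_fn` that stands in for any of the cited imaging methods; the SNR levels and frequency/velocity windows are illustrative:

```python
import numpy as np

def augment(gather, dt, image_fn, rng=np.random.default_rng(0)):
    """gather: (n_traces, n_samples) shot gather; image_fn: any f-v imaging routine.
    Yields (image, f_axis, v_axis) training samples derived from one gather."""
    for snr_db in (30, 20, 10):                            # different noise levels
        noise = rng.standard_normal(gather.shape)
        scale = np.linalg.norm(gather) / (np.linalg.norm(noise) * 10 ** (snr_db / 20))
        noisy = gather + scale * noise
        for fmin, fmax in ((1, 30), (5, 50), (10, 100)):   # frequency windows (Hz)
            for vmin, vmax in ((50, 600), (100, 1000)):    # velocity windows (m/s)
                img, f, v = image_fn(noisy, dt, fmin, fmax, vmin, vmax)
                yield img / np.abs(img).max(axis=0), f, v  # normalized per frequency
                yield img, f, v                            # non-normalized variant
```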

Fig. 3

Examples from the data set for training DCNet. Each column represents one sample; each sample contains a dispersion image, a binary ground truth, and an instance ground truth. The top row shows the dispersion images, in which we hid the coordinate information that was not used during training. The middle and bottom rows are the binary and instance ground truths, respectively. Columns 1–6, respectively, show a common dispersion image, a dispersion image with one higher mode, a dispersion image with several higher modes, a dispersion image with a lot of noise, a dispersion image with spatial aliasing, and a dispersion image with "crossed" artifacts

Table 2 The steps of simulated data set creation
Fig. 4

Data augmentation process. One dispersion image, one binary label, and one instance label form one sample. The dispersion images in the odd and even rows are normalized and non-normalized, respectively

DCNet was trained using TensorFlow (Abadi et al. 2016). The dispersion images were rescaled to 512 × 256. We used the Adam optimizer (Kingma and Ba 2014) to update the network parameters. The initial learning rate was 0.0005 and decayed linearly to 1% of its initial value over 36 epochs; all 20,000 training samples were processed in each epoch. We initialized the weights randomly and used a batch size of 8, a trade-off between generalization performance and the memory limitation of a personal laptop with an NVIDIA GeForce RTX 2060 GPU.
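In Keras terms, this schedule could be set up as below; the step count follows from 20,000 samples at batch size 8 over 36 epochs, and `PolynomialDecay` with power 1 gives a linear decay (a sketch, not the authors' exact training script):

```python
import tensorflow as tf

steps_per_epoch = 20_000 // 8                      # 2500 parameter updates per epoch
schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=5e-4,
    decay_steps=36 * steps_per_epoch,              # decay spread over all 36 epochs
    end_learning_rate=5e-6,                        # 1% of the initial rate
    power=1.0,                                     # power = 1.0 -> linear decay
)
optimizer = tf.keras.optimizers.Adam(learning_rate=schedule)
```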

The loss function of the segmentation branch \({\mathcal{L}}_{seg}\) is

$${\mathcal{L}}_{seg} = - \frac{\mathop \sum \nolimits_{i = 1}^{N} y_{i}^{{\prime }} \log \left( {\frac{{e^{{y_{i} }} }}{{\mathop \sum \nolimits_{j = 1}^{N} e^{{y_{j} }} }}} \right)w_{class} }{N}$$
(1)

where \(w_{class} = \frac{1}{{\ln \left( {\varepsilon + \frac{{N_{class} }}{N}} \right)}}\) is a bounded inverse class weighting (Paszke et al. 2016), used because the dispersion energy area is always much smaller than the background area; \(N_{class}\) is the number of pixels of the class (dispersion energy or background) in the label and \(N\) is the number of pixels in the whole label; \(\varepsilon = 1.02\) is a control parameter that bounds \(w_{class}\) to the interval [1, 50]; and \(y_{i}\) and \(y_{i}^{\prime}\) represent the predicted values and label values, respectively.
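A sketch of this weighted pixel-wise cross-entropy with the bounded inverse class weighting, assuming integer labels of shape (H, W) and two-class logits of shape (H, W, 2); written with standard TensorFlow primitives rather than the authors' exact code:

```python
import tensorflow as tf

def seg_loss(logits, labels, eps=1.02):
    """logits: (H, W, 2) raw class scores; labels: (H, W) int32, 0 = background, 1 = energy."""
    n = tf.cast(tf.size(labels), tf.float32)
    counts = tf.cast(tf.math.bincount(tf.reshape(labels, [-1]), minlength=2), tf.float32)
    w_class = 1.0 / tf.math.log(eps + counts / n)   # bounded to roughly [1, 50]
    ce = tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits)
    return tf.reduce_mean(ce * tf.gather(w_class, labels))
```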

The loss function of the embedding branch \({\mathcal{L}}_{emb}\) is

$${\mathcal{L}}_{var} = \frac{\mathop \sum \nolimits_{c = 1}^{C} \frac{1}{N_{c}} \mathop \sum \nolimits_{i = 1}^{N_{c}} \left[ \left\| m_{c} - x_{i} \right\| - \delta_{v} \right]_{ + }^{2} }{C}$$
$${\mathcal{L}}_{dist} = \frac{\mathop \sum \nolimits_{c_{a} = 1}^{C} \mathop \sum \nolimits_{c_{b} = 1}^{C} \left[ 2\delta_{d} - \left\| m_{c_{a}} - m_{c_{b}} \right\| \right]_{ + }^{2} }{C\left( C - 1 \right)}, \quad \left( c_{a} \ne c_{b} \right)$$
$${\mathcal{L}}_{reg} = \frac{\mathop \sum \nolimits_{c = 1}^{C} \left\| m_{c} \right\| }{C}$$
$${\mathcal{L}}_{emb} = \alpha \cdot {\mathcal{L}}_{var} + \beta \cdot {\mathcal{L}}_{dist} + \gamma \cdot {\mathcal{L}}_{reg}$$
(2)

where \({\mathcal{L}}_{var}\) is a variance term that applies a pull force on each embedding toward the mean embedding of its surface-wave mode and activates only when an embedding is farther than \(\delta_{v}\) from its mode cluster center; \({\mathcal{L}}_{dist}\) is a distance term that pushes the cluster centers of different surface-wave modes away from each other and activates only when they are closer than \(2\delta_{d}\) to each other; \(C\) is the number of surface-wave modes; \(N_{c}\) is the number of elements in mode cluster \(c\); \(x_{i}\) is a pixel embedding and \(m_{c} = \frac{1}{N_{c}}\mathop \sum \nolimits_{i = 1}^{N_{c}} x_{i}\) is the mean embedding of the mode cluster; \(\left\| {\, \cdot \,} \right\|\) is the L2 distance and \(\left[ x \right]_{ + } = \max \left( 0, x \right)\); and \({\mathcal{L}}_{reg}\) is a regularization term that keeps the activations bounded by applying a small pull force on all mode cluster centers toward the origin.

When \(\delta_{d} > \delta_{v}\), each embedding is closer to its own mode cluster center than to any other center. When \(\delta_{d} > 2\delta_{v}\), each embedding is closer to all embeddings of its own mode cluster than to any embedding of a different mode cluster. We therefore set \(\delta_{v} = 0.5\) and \(\delta_{d} = 3\) to ensure that the embeddings of each mode stay far away from the embeddings of any other mode, and to keep \({\mathcal{L}}_{seg}\) and \({\mathcal{L}}_{emb}\) on a comparable order of magnitude. We give \({\mathcal{L}}_{var}\) and \({\mathcal{L}}_{dist}\) the same weight and \({\mathcal{L}}_{reg}\) a small weight, with \(\alpha = \beta = 1\) and \(\gamma = 0.001\).
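A compact NumPy sketch of Eq. (2) under these settings, assuming `emb` holds the masked pixel embeddings and `modes` their instance labels; it mirrors the discriminative loss of De Brabandere et al. (2017) rather than the authors' exact implementation:

```python
import numpy as np

def discriminative_loss(emb, modes, dv=0.5, dd=3.0, alpha=1.0, beta=1.0, gamma=0.001):
    """emb: (N, 3) pixel embeddings; modes: (N,) integer surface-wave mode labels."""
    ids = np.unique(modes)
    centers = np.stack([emb[modes == c].mean(axis=0) for c in ids])   # m_c
    # Variance term: pull embeddings farther than dv toward their own center.
    l_var = np.mean([
        np.mean(np.maximum(np.linalg.norm(emb[modes == c] - m, axis=1) - dv, 0) ** 2)
        for c, m in zip(ids, centers)])
    # Distance term: push centers closer than 2*dd apart away from each other.
    l_dist, C = 0.0, len(ids)
    if C > 1:
        gaps = np.linalg.norm(centers[:, None] - centers[None, :], axis=2)
        hinge = np.maximum(2 * dd - gaps, 0) ** 2
        l_dist = (hinge.sum() - np.trace(hinge)) / (C * (C - 1))      # exclude c_a = c_b
    # Regularization term: small pull on all centers toward the origin.
    l_reg = np.mean(np.linalg.norm(centers, axis=1))
    return alpha * l_var + beta * l_dist + gamma * l_reg
```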

The accuracy is calculated as the average fraction of correctly extracted points per image:

$$accuracy = \frac{1}{M}\mathop \sum \limits_{img = 1}^{M} \frac{{C_{img} }}{{T_{img} }}$$
(3)

where \(C_{img}\) is the number of correctly extracted points in an image, \(T_{img}\) is the number of ground-truth points, and \(M\) is the number of images.

As shown in Fig. 5, the loss functions for training (the blue curves in Fig. 5a, b) and validation (the orange curves in Fig. 5a, b) converge to small values, while the validation accuracy (the orange curve in Fig. 5c) gradually increases to 95% after 36 epochs. Figure 6 also shows that the loss converged rapidly during the first two epochs. Over the 36 epochs, the embeddings of different modes were gradually separated, while embeddings of the same mode were gathered together.

Fig. 5

The training log. a Losses of segmentation branch for training (the blue curve) and validation (the orange curve) data sets. b Losses of embedding branch for training (the blue curve) and validation (the orange curve) data sets. c Mean accuracy rate of DCNet for training (the blue curve) and validation (the orange curve) data sets

Fig. 6

Convergence of the training process on a dispersion image. a The input dispersion image, binary ground truth, and instance ground truth. b The outputs of epochs of 0, 1, 2, 12, and 36. The first row represents the outputs of the segmentation branch; the second row represents the outputs of the embedding branch (a 2D representation); the third row represents the outputs of the embedding branch (a 3D representation); the fourth row represents the outputs of the embedding branch (a 3D representation) with outputs of the segmentation branch as mask; and the last row represents the outputs of the embedding branch (a 3D representation) with instance ground truth labels

2.3 Theoretical Data Tests

To test the accuracy of the DCNet model, we simulated two typical models whose parameters are shown in Table 3. The geophone interval was 1 m with a nearest offset of 10 m. The record length was 600 ms with a 0.1-ms sampling interval. The shot gathers of Models 1 and 2 were simulated by finite-difference methods for modeling Rayleigh waves (Zeng et al. 2011) and Love waves (Luo et al. 2010), respectively. Their shot gathers and the dispersion images generated by the phase shift method (Park et al. 1998) are shown in Fig. 7. Model 1 is a common layered earth model with obvious fundamental-mode dispersion energy (Fig. 7a, c), whereas Model 2 exhibits stronger multimode surface-wave energy (Fig. 7b, d). We input their dispersion images into the DCNet model; the dispersion curve extraction process is similar to that in Fig. 2. The theoretical dispersion curves were calculated by Knopoff's method (Schwab and Knopoff 1972). Comparisons between the dispersion curves extracted by DCNet and those calculated theoretically by Knopoff's method are shown in Fig. 8a, b. The mean error between the theoretically calculated dispersion curves and those from the DCNet model is calculated by the following formula.

$$error = \frac{\mathop \sum \nolimits_{i = 1}^{{N_{f} }} \left| {\frac{{T_{{f_{i} }} - M_{{f_{i} }} }}{{T_{{f_{i} }} }}} \right| }{N_{f}}$$
(4)

where \(T_{{f_{i} }}\) and \(M_{{f_{i} }}\) represent the velocities from the theoretical calculation and from the DCNet model at the \(i\)th frequency, respectively, and \(N_{f}\) is the number of frequency points participating in the comparison.
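As a small worked example of Eq. (4) (the function and array names are assumptions):

```python
import numpy as np

def mean_relative_error(theoretical, extracted):
    """Eq. (4): mean absolute relative velocity error over the shared frequency points."""
    t, m = np.asarray(theoretical), np.asarray(extracted)
    return np.mean(np.abs((t - m) / t))

# e.g., three frequency points, each deviating by 1%:
print(mean_relative_error([200.0, 300.0, 400.0], [202.0, 297.0, 404.0]))  # 0.01
```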

Table 3 The parameters of two typical simulated models
Fig. 7

The shot gathers and dispersion images of the simulated models. a The shot gather of Model 1. b The shot gather of Model 2. c The dispersion image of Model 1 generated using the phase shift method. d The dispersion image of Model 2 generated using the phase shift method

Fig. 8

Comparisons of the dispersion curves extracted by different methods. The white lines with white dots represent the dispersion curves extracted by DCNet; the black dashed lines with smaller dots represent the dispersion curves calculated theoretically. a Comparisons of the dispersion curves extracted by DCNet and the theoretically calculated one corresponding to Model 1. b Comparisons of the dispersion curves extracted by DCNet and the theoretically calculated one for Model 2. c The dispersion image of Model 2 generated using the τp transformation. d The dispersion image of Model 2 generated using frequency decomposition and slant stacking. e The dispersion image of Model 2 generated using HRLRT. f Comparisons of the dispersion curves extracted by DCNet using different imaging methods

Calculated using Formula (4), the mean errors for the fundamental mode of surface waves (Fig. 8a) and the multiple modes of surface waves (Fig. 8b) are 0.77% and 1.18%, respectively. Owing to the errors expected in finite-difference simulation, the energy peaks of the low-frequency (Fig. 8a) and higher-mode (Fig. 8b) dispersion energy match the theoretical dispersion curves with relatively low accuracy. The comparisons demonstrate that our DCNet model is able to pick accurate dispersion curves effectively. Taking Model 2 as an example, as shown in Fig. 8f, the results extracted from dispersion images generated by different imaging methods (the phase shift, the τ-p transformation, frequency decomposition and slant stacking, and the HRLRT) are almost the same, which indicates that the proposed method has low sensitivity to the choice of imaging method.

2.4 Method Comparison

We compared our method with an automated extraction method for the fundamental-mode dispersion curve (Taipodia et al. 2020) that is based on threshold energy filtering of the dispersion image, using the simulated data of Model 2 (Table 3) for the comparative experiment. The threshold energy filtering method can be summarized in three steps: first, read an RGB (red–green–blue) dispersion image generated by SurfSeis® (developed by the Kansas Geological Survey) (Fig. 9a) into an HSV (hue–saturation–value) representation; second, select pixels of different colors in the HSV model using different thresholds and map them linearly (Fig. 9b) into different ranges (Table 4); third, find the energy higher than 99.9% (red points in Fig. 9d) among the local peaks in the 3D dispersion image (Fig. 9c). We extracted the dispersion curves of the same data (Model 2 in Table 3) with our DCNet; the outputs of the intermediate steps are shown in Fig. 10, where Fig. 10b and c shows the outputs of the segmentation and embedding branches, respectively. We compared the final results (Figs. 9d, 10d) of the DCNet and threshold energy filtering extractions with the theoretical dispersion curves (Fig. 11). Calculated by Formula (4), the mean errors of the fundamental-mode dispersion curve extracted by DCNet and by the threshold energy filtering method are 1.13% and 7.98%, respectively; the mean error of DCNet is only 0.14 times that of the threshold energy filtering method. The mean speed of DCNet is 0.5 s per dispersion image on a personal laptop, 18 times faster than the threshold energy filtering method. Table 5 shows the comparison in detail. The comparison shows that DCNet extracts more accurate and credible dispersion curves in a much shorter time and can separate different modes, which the threshold energy filtering method cannot do.
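A minimal sketch of those three steps, assuming an RGB image array in [0, 1] and using matplotlib's RGB-to-HSV conversion plus a local-maximum filter; the hue-to-energy mapping below is a simple stand-in for the threshold ranges of Table 4, not the published procedure:

```python
import numpy as np
from matplotlib.colors import rgb_to_hsv
from scipy.ndimage import maximum_filter

def threshold_filter_pick(rgb, top_quantile=0.999):
    """rgb: (H, W, 3) dispersion image in [0, 1]; returns (row, col) peak picks."""
    hsv = rgb_to_hsv(rgb)                              # step 1: RGB -> HSV
    # Step 2: map hue linearly to an energy level (warm colors = high energy
    # in typical colormaps); this stands in for the per-color ranges of Table 4.
    energy = (1.0 - hsv[..., 0]) * hsv[..., 2]
    # Step 3: keep local peaks whose energy exceeds the 99.9% quantile.
    peaks = energy == maximum_filter(energy, size=5)
    strong = energy >= np.quantile(energy, top_quantile)
    return np.argwhere(peaks & strong)
```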

Fig. 9

The process of extracting dispersion curves by a threshold energy filtering method. a The dispersion image obtained from SurfSeis® (frequency normalization is not applied). b The dispersion image divided into different energy levels according to the set thresholds. c The 3D dispersion image representation. d Extraction of all possible local peaks from the 3D dispersion image (gray points of different shades representing extractions at different thresholds) and identification of the fundamental-mode dispersion curve (the local highest energy peaks, the red points)

Table 4 Parameters of threshold energy filter method
Fig. 10

The process of extracting dispersion curves by DCNet. a The same dispersion image as Fig. 9a with a normal color-bar representation (frequency normalization is not applied). b Output of the segmentation branch (white areas represent dispersion energy and black areas represent background). c Output of the embedding branch (different colors represent different modes). d The instance output (light red and blue areas represent energy areas of the fundamental mode and the first higher mode, respectively) and the dispersion curves extracted from the energy of each mode (lines with red and blue dots represent fundamental mode and first higher mode, respectively)

Fig. 11

Comparison of the dispersion curves extracted by DCNet (red lines with red dots), by the threshold energy filtering method (the blue dotted line with blue crosses), and by theoretical calculation (the black dashed lines with black dots)

Table 5 Comparisons between DCNet and threshold energy filtering method

3 Field Data Application

3.1 Fieldwork

Our fieldwork was carried out in the city of Hangzhou, China (Fig. 12), in a relatively flat area near two busy roads located to the south and west of the survey area. The test site is covered by silty clay in the upper tens of meters. A borehole with a depth of 95 m lies about 50 m east of the fieldwork area (the red pentagram in Fig. 12); the borehole results were used to verify the reliability of the application results.

Fig. 12

The location of the fieldwork (produced by Google Earth and Google Maps). The red circle landmark in the map represents the location of the fieldwork area, the white dots in the satellite map represent the true coordinate information of all the three-component receivers, and the red pentagram represents the borehole near the survey line

We designed our field measurements as 14 linear passive survey arrays, each containing 13 three-component receivers (the white dots in Fig. 12) with a 10-m receiver separation (Fig. 13a). The fieldwork area was 130 × 120 m², with a total of 182 three-component receivers with a predominant frequency of 5 Hz placed at a 10-m interval in both the north and east directions (black triangles in Fig. 13a). The noise records were acquired at a sampling frequency of 1000 Hz from local time 17:24 on June 15 to 17:19 on June 16, 2019.

Fig. 13

a The field measurements. Black triangles represent the 14 arrays (east–west) × 13 traces (north–south) three-component receivers placed with a 10-m interval both in north and east directions. A receiver is named by “array number–trace number”. The colored pentagonal stars represent samples of measurement points. The colored dots represent the measurement points generated by recordings from receivers in different combinations. The colored boxes represent the combinations corresponding to the same colored measurement points. b, c Virtual shot gathers marked by red boxes for measurement point A; d the virtual shot gather marked by the orange box for measurement point B; e the virtual shot gather marked by the blue box for measurement point C; and f the virtual shot gather marked by the green box for measurement point D

In the passive seismic data processing for this application, we utilized only the vertical-component data to acquire Rayleigh-wave information. We used the multichannel analysis of passive surface waves (MAPS) method (Cheng et al. 2016) to process the passive seismic data. After preprocessing, cross-correlation functions between the line-arranged traces were used to form virtual shot gathers according to their interstation distances, and dispersion images were then computed by the phase shift method (Park et al. 1998). The 1D S-wave velocity models can be constructed by inverting the surface-wave dispersion curves (Xia et al. 1999).
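A bare-bones sketch of this step, assuming preprocessed vertical-component noise segments: frequency-domain cross-correlation against the first trace (the virtual source) forms a virtual shot gather, which a simple phase-shift transform then maps to the f–v domain. This is a stand-in for the MAPS workflow, not its full implementation, and the sign convention of the phase term is an assumption:

```python
import numpy as np

def virtual_gather(records):
    """records: (n_traces, n_samples) noise segment; trace 0 acts as the virtual source."""
    spec = np.fft.rfft(records, axis=1)
    cc = spec * np.conj(spec[0])                   # cross-correlate every trace with trace 0
    return np.fft.irfft(cc, n=records.shape[1], axis=1)

def phase_shift_image(gather, dt, offsets, freqs, velocities):
    """Phase-shift-style transform of a gather to the f-v domain (after Park et al. 1998)."""
    spec = np.fft.rfft(gather, axis=1)
    f_axis = np.fft.rfftfreq(gather.shape[1], dt)
    cols = np.searchsorted(f_axis, freqs)
    spec = spec[:, cols] / (np.abs(spec[:, cols]) + 1e-12)   # keep phase only
    image = np.zeros((len(velocities), len(freqs)))
    for j, f in enumerate(freqs):
        for k, v in enumerate(velocities):
            # stack the phase-compensated traces along the trial velocity v
            image[k, j] = np.abs(np.sum(np.exp(2j * np.pi * f * offsets / v) * spec[:, j]))
    return image
```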

3.2 Extracting Dispersion Curves

Based on the middle-of-receiver-spread assumption (Luo et al. 2009b), which implies that the 1D S-wave velocity profile obtained by inversion reflects the medium below the middle of the receiver spread, we used recordings from 7 successive receivers to obtain the dispersion curve of the measurement point at the middle of the 7-receiver spread (e.g., measurement point A, generated from the receivers in the red boxes in Fig. 13a, with its virtual shot gathers shown in Fig. 13b, c). The 7-receiver spread was moved by one receiver position toward the north each time, giving 7 dispersion curves along array 4; in this way, the dispersion curves of the red measurement points in Fig. 13a were obtained. In addition, we used recordings from 6 successive receivers to obtain the dispersion curves of the measurement points at the middle of the 6-receiver spreads (e.g., measurement point B, generated from the receivers in the orange box in Fig. 13a, with its virtual shot gather shown in Fig. 13d); the resulting orange measurement points in Fig. 13 made the acquired measurement data denser. We also used recordings from 6 successive receivers along the northeast direction to obtain the dispersion curves of the measurement points at the middle of those spreads (e.g., measurement point C, generated from the receivers in the blue box in Fig. 13a, with its virtual shot gather shown in Fig. 13e); by moving the receiver spread, the dispersion curves of the blue measurement points in Fig. 13a were obtained. This yielded a nominal resolution of 5 m × 5 m in the central part of the survey area. Finally, we used recordings from 5, 4, or 3 receivers to obtain the measurement data along the edges and at the corners (e.g., measurement point D, generated from the receivers in the green box in Fig. 13a, with its virtual shot gather shown in Fig. 13f). In total, 589 dispersion curves needed to be extracted.

We made a list of the receivers required for each measurement point, keeping the same frequency and velocity ranges (Table 6), and then calculated the dispersion images in batches. After all the dispersion images were generated, all the dispersion curves were extracted within 5 min (Fig. 14), at an average speed of only half a second per dispersion image. This batch processing saved a great deal of human–machine interaction time and required no experience in extracting dispersion curves. For the same measurement location, the dispersion curves obtained from different receiver spreads were averaged to obtain more reliable results (e.g., as Fig. 14 and Table 6 show, measurement point A can be obtained from the receivers in the red boxes along two directions). Finally, the dispersion curves of 533 measurement points were generated.

Table 6 Examples of the receivers list required for measurement points
Fig. 14

Examples of the dispersion curves extracted from the field data application. The white lines with white dots represent the dispersion curves extracted by DCNet

3.3 3D S-wave Velocity Model

After obtaining a large number of dispersion curves, batch inversion was carried out with the same initial model. In general, manual intervention was not required, although occasionally individual points needed to be adjusted. We obtained a total of 533 1D S-wave velocity profiles by inverting each extracted dispersion curve independently using the Levenberg–Marquardt algorithm discussed in Xia et al. (1999). Each inverted 1D model was placed, like a virtual borehole, at the location of its corresponding measurement point (cyan points in Fig. 15). We used the inverted model of measurement point 27-19 (Fig. 15), the point closest to the borehole, to compare with the borehole measurements (Fig. 16a); the RMS error dropped to 6 m/s (Fig. 16c). The inverted model fits the borehole measurements well, which shows that the processing results of the field data are reliable. Each virtual borehole at the edges and corners (red points in Fig. 15), where no dispersion curve was extracted for inversion, was obtained via linear interpolation of the virtual boreholes (cyan points in Fig. 15) at its 8 adjacent positions. We generated a 3D S-wave velocity model (Fig. 17) by assembling all the virtual boreholes. Figure 17b shows vertical sections along the Y direction displayed every 20 m from 15 m to 115 m; the lateral velocity variation is small, and there is a high-velocity layer between 40 and 50 m depth.
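A small sketch of this gap-filling step, assuming the inverted profiles sit on a regular grid stored as a 3D array with NaNs at the uninverted edge and corner positions; averaging the available 8-neighborhood profiles is one simple form of the linear interpolation described:

```python
import numpy as np

def fill_edges(models):
    """models: (ny, nx, nz) grid of 1D S-wave velocity profiles; NaN rows = missing."""
    filled = models.copy()
    ny, nx, _ = models.shape
    for iy, ix in zip(*np.where(np.isnan(models[..., 0]))):
        # average the available profiles among the (up to) 8 adjacent grid positions
        ys = slice(max(iy - 1, 0), min(iy + 2, ny))
        xs = slice(max(ix - 1, 0), min(ix + 2, nx))
        block = models[ys, xs]
        filled[iy, ix] = np.nanmean(block.reshape(-1, block.shape[-1]), axis=0)
    return filled
```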

Fig. 15

1D S-wave velocity model assembly. The cyan points represent the inverted 1D S-wave velocity models, and the red points represent the 1D S-wave velocity models generated by linear interpolation. Measurement points are named by “line number–point number”

Fig. 16

Comparison with borehole results. a The inversion result at the measurement point nearest the borehole (measurement point 27-19 in Fig. 15). The black line represents the borehole S-wave velocity, the blue dotted line represents the initial S-wave velocity, and the red line represents the inverted S-wave velocity. The initial S-wave velocity model was set according to the borehole data to constrain the inversion results. b The dispersion curve extracted by DCNet. c The inversion fit of the dispersion curve. The black dots represent the extracted dispersion curve, the blue dots represent the dispersion curve of the initial model with an RMS error of 16.1 m/s, and the red dots represent the dispersion curve of the inverted model with an RMS error of 6.0 m/s

Fig. 17

Final 3D S-wave velocity model of the field data application. a Complete 3D S-wave velocity structure via linear interpolation. b The vertical sections along the Y direction displayed every 20 m from 15 to 115 m

The isosurfaces of the 3D S-wave velocity model were compared with the borehole histogram (Fig. 18). The S-wave velocity of the first isosurface, near the ground surface, is 200 m/s, representing the S-wave velocity interface at the bottom of the silt. The second isosurface, at 250 m/s, is the interface between sandy silt and silty clay. The third and fourth isosurfaces, both at 370 m/s, represent the upper and lower S-wave velocity interfaces of the gravel layer. The S-wave velocity of the fifth isosurface is 400 m/s, below which the velocities increase dramatically, indicating a distinct bedrock interface consistent with the 56-m depth in the borehole histogram. These S-wave velocity isosurfaces fit well with the depths of the interfaces in the borehole histogram. In the entire process, we only needed to set the parameters; most of the work was then done by the computer to obtain such a fine 3D S-wave velocity model, instead of picking dispersion curves manually point by point as in the past.

Fig. 18

Isosurfaces of the 3D S-wave velocity model (left) compared with the borehole histogram (right)

4 Discussion and Conclusions

Besides the extracted dispersion curves that can be used for inversion, the different outputs of DCNet can serve different tasks as required. The output of the segmentation branch is exactly the area of surface-wave energy; it can prevent fitting useless and fake energy in phase-velocity spectra inversion (Ryden and Park 2006), and it can also be used for surface-wave extraction and suppression (Hu et al. 2016). The instance output is exactly the energy areas of the different surface-wave modes, which can be used for mode separation by the HRLRT (Luo et al. 2009a). These outputs identify surface-wave energy automatically instead of by manual selection. With the continuous improvement of surface-wave imaging processing (e.g., Bensen et al. 2007; Groos et al. 2012; Ikeda et al. 2013; Shen et al. 2015; Cheng et al. 2018a, b; Zhou et al. 2018; Pang et al. 2019), the quality of dispersion images keeps increasing. DCNet can be trained continuously, and its training set is expanding all the time; the identification of dispersion energy is becoming progressively easier, and the accuracy of the extracted dispersion curves will keep improving. In addition, because of its efficient extraction, this kind of automatic dispersion curve extraction combined with inversion can produce real-time profiles, offering immediate guidance to fieldwork.

We proposed a deep learning model (DCNet) to rapidly extract numerous multimode surface-wave dispersion curves in the f–v domain, and presented a method to generate a large number of labeled surface-wave data for training the DCNet model. Comparisons with the theoretical dispersion curves of synthetic data proved the reliability of our results. By automatically extracting dispersion curves in each window, we obtained a 3D S-wave velocity model by assembling 533 independently inverted 1D S-wave velocity models. The field data application demonstrated the effectiveness and robustness of our method. Such automation of data processing can improve efficiency and stability when dealing with tasks involving large amounts of data.