Article Highlights
  • The newly developed algorithm relieves the breakdown of Green’s function retrieval caused by heterogeneous ambient noise sources

  • We present the fundamental architecture of the algorithm, including the source-velocity parameterizations and joint inversions. Unbiased path velocities and the ensemble source model are well recovered in numerical tests

  • Near-surface seismic experiments in the Hangzhou urban area, together with local borehole datasets, verify the feasibility of the algorithm

1 Introduction

Over the past decades, ambient noise recordings from worldwide seismic networks have been used to infer the Earth’s interior based on seismic interferometry (e.g., Shapiro et al. 2005; Nakata et al. 2019). The Green’s function is empirically expected from the cross-correlation of two-station time series. The interstation correlation function is proportional to the medium impulse response between two receivers, as shown by both theory and laboratory experiments (Lobkis and Weaver 2001; Snieder 2004) under the assumption of a homogeneous source distribution or wavefield equipartitioning. Intensive studies have investigated continental or global structures using low-frequency vibrations, including the Earth’s hum and the primary and secondary microseisms (e.g., Yang et al. 2007; Ardhuin et al. 2015; Bao et al. 2015). These large-scale studies can readily approach the theoretical assumption of diffuse wavefields, owing to the averaging effects of long recordings of omnidirectional natural sources. However, for seismic observations in more localized regions with shorter durations, the microseisms may not sufficiently illuminate the subsurface, and this so-called azimuthal effect (Cheng et al. 2015, 2016; Xu et al. 2017) introduces uncertainties into Green’s function retrieval (Tsai 2009; Delaney et al. 2017). Near-surface seismology has become an important supplement in multiscale imaging of the solid earth (Picozzi et al. 2009; Xia et al. 2012; Pilz et al. 2012; de Ridder and Biondi 2013, 2015; Nakata et al. 2015; Mi et al. 2020; Chen et al. 2021). Recent studies commonly retrieve high-frequency surface waves from coherent noise signals above 1 Hz, which are dominantly generated by human activities (Xu et al. 2016; Zhou et al. 2021; Cheng et al. 2018, 2019, 2021). Abundant traffic and construction noise sources are crucial for passive seismic surveys in urban areas (Chang et al. 2016): they provide energetic surface wavefields but also impose preferential source directions (e.g., along roads). Such heterogeneous wavefields can hardly meet the premise of Green’s function recovery. This discrepancy indicates that the imaging of noise source distributions and velocity structures should be treated jointly.

To deal with the azimuthal effects on ambient noise imaging, the community has paid great attention to different kinds of algorithms. They can be generally summarized as: (1) Data segment stacking and selection. Preprocessing schemes (Bensen et al. 2007) intend to enhance noise source stationarity and have been widely applied in ambient noise imaging. Weaver et al. (2018) revised the conventional linear stacking by temporally reweighting the noise wavefields to improve Green’s function retrieval. Symmetric correlation functions with higher signal-to-noise ratios (SNRs) were selected to reduce the effects of non-stationary or destructive noise sources (Zhou et al. 2018; Li et al. 2020; Xie et al. 2020); (2) Iterative method. Yao and van der Hilst (2009) proposed an inversion approach that updates an azimuthal source amplitude model, so that the resulting surface wave velocity bias can be corrected iteratively. The method is efficient for tomographic applications provided that the simplified 1-D source distribution model is correctly recovered from the observed empirical Green’s functions, and this approximation can only be guaranteed with a reliable velocity model (Wang et al. 2016; Lehujeur et al. 2016); (3) Beamforming correction. Beamforming analysis performs the constructive summation of time series shifted appropriately for the matching azimuth and slowness (Rost and Thomas 2002). Cheng et al. (2016) corrected the azimuthal term of the phase-shift dispersion equation according to one dominant source direction revealed by beamforming. The method was further developed by adapting the seismic-array geometry to circumvent off-line sources or to improve beamforming resolution (Pan et al. 2016; Liu et al. 2020); (4) Waveform inversion. Tromp et al (2010), Hanasoge (2014) and Fichtner (2015) laid the theoretical foundations of cross-correlation waveform modeling and sensitivity kernel analysis, which make it feasible to apply interferometric wavefield adjoint techniques to more complex source–structure inversions. The method successfully reveals the global distribution of the Earth’s hum based on a reference velocity model (Ermert et al. 2016, 2017; Sager et al. 2020). Sager et al (2018) summarized the framework of full waveform ambient noise inversion and numerically investigated the performance of different misfit functions in terms of source- and structure-inversion resolution. Xu et al (2019, 2020) demonstrated the benefits of multicomponent source inversion (with the background velocity predefined as a planar homogeneous model; see also Datta et al. 2019) using both synthetic and field-data tests. The current stage calls for further steps toward structure inversion and more applied research on this theory.

Compared with the first three kinds of method, ambient noise cross-correlation waveform inversion exploits physical properties (e.g., symmetry, travel time and energy) of the cross-correlation function (CCF) and directly maps them into the noise source distribution and velocity structures. Instead of accommodating the fundamental assumptions of Green’s function retrieval, the actual CCF waveform, even with spurious arrivals (e.g., near the zero time lag), can be preserved as meaningful information. In this context, many preprocessing steps, such as time/frequency domain normalization or nonlinear stacking, are not suitable for waveform inversion, since they can change the sensitivities of the CCF to the coupled effects of source and structure (Fichtner et al. 2017). Xu et al (2019) and Bowden et al (2020) bridge matched field processing (MFP) and the theoretical source kernel of observed CCFs, which indicates that both have the potential to provide a reference source model for the inversion. The average path dispersions contained in CCFs between station pairs are jointly controlled by noise source distributions and velocity structures. Group/phase velocity measurements for classic tomography have to account for the possible bias caused by azimuthal effects. Therefore, neither the source distribution nor the velocity model can be treated as an isolated issue during the iterative inversions: each is needed to calculate the kernel functions of the other. This inherent trade-off (Fichtner 2015) also limits the pre-assumption in the iterative method. This study tries to relieve the trade-off, or to decouple the nonlinear inverse problem, by jointly optimizing source and velocity parameters in the waveform misfit functions. The noise sources are characterized by strengths and spatial distributions on a 2-D surface, while the velocity model is built from a collection of path velocities between all involved station pairs. We take the observed CCF waveforms as fitting targets because the waveforms integrally reflect the physical properties resulting from sources and structures. Partial derivatives of the bivariate objective function with respect to source and velocity parameters are computed, and both parameter sets are updated simultaneously in each iteration. We abbreviate this joint-imaging algorithm as ModAS (i.e., Modeling Ambient noise distributions for Surface wave imaging).

In this study, we develop ModAS to enrich the ambient noise joint inversion theory and show its applicability to near-surface imaging (e.g., urban underground detection and environmental noise monitoring). The sections are organized as follows: the workflows of ModAS, including forward/inversion procedures and model parameterizations, are elaborated in Sects. 2 and 3; azimuthal effects, sensitivity kernels, dense-array performance, and joint inversion schemes are investigated by synthetic tests in Sect. 4. In Sect. 5, field datasets collected in the Hangzhou urban area (Fig. 1) demonstrate the practicability of ModAS through local borehole verification.

Fig. 1
a Field-experiment locations in the Hangzhou urban area, southeast China. b Experimental seismic geometry (alphabetically named A to I in upper case) of field work #1 in Qianjiang new city. c Experimental seismic geometry (alphabetically named a to i in lower case) of field work #2 in Yunqi town. d Shear (S)-wave velocity loggings obtained from the boreholes of field work #2. The deployed seismic stations are represented by black dots, and the well-log locations are indicated by gray crosses

2 Forward Theory

2.1 Modeling Cross-Correlation Function

Noise correlation functions are conventionally simulated by cross-correlating recorded wavelets that are randomly activated by point-like distributed sources (Wapenaar et al. 2010; Lawrence et al. 2013). This conventional approach has been widely used to investigate azimuthal effects in layered media, but it is time-consuming and cannot handle laterally heterogeneous velocity models. Tromp et al (2010) derived a CCF simulation based on the ensemble-averaged noise source model. This forward calculation is more convenient and better suited to iterative waveform inversions, since it consists of two deterministic wavefield simulations for a station pair: one forward wavefield from the reference station and one ensemble correlation wavefield from the source distribution.

We focus on vertical-component seismic data in this study. The synthetic seismic record can be obtained by solving a linear time-invariant forward problem. The underlying physics is the time-domain convolution of the source time function with the Green’s function between source and receiver. The frequency-domain form of a trace record \(u\left( \omega \right)\) can be expressed as the linear equation (Eq. 1):

$$\begin{array}{*{20}c} {u\left( \omega \right) = \int G\left( {{\varvec{x}}_{{\varvec{r}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)F\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} } \\ \end{array}$$
(1)

where \({\varvec{x}}_{{\varvec{r}}}\) and \({\varvec{x}}_{{\varvec{s}}}\) represent receiver and source locations, and \(\omega\) is angular frequency. \(F\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\) represents the source spectrum. The Green’s function between sources and receiver is calculated using wavefield reciprocity, which avoids simulating wave propagation separately for each of the many sources (e.g., Ermert et al. 2017; Xu et al. 2019). Moreover, the analytical form of the fundamental-mode Rayleigh wave Green’s function (Eq. 2) is used to directly model the vertically polarized impulse response (Xu et al. 2019, 2020):

$$\begin{array}{*{20}c} {G\left( {{\varvec{x}}_{{\varvec{r}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right) = G\left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right) = \sqrt {\frac{c}{{8\pi \omega d_{r} }}} e^{{i\left( {\frac{{\omega d_{r} }}{c} - \frac{\pi }{4}} \right)}} } \\ \end{array}$$
(2)

where \(c\) is the Rayleigh wave velocity and \(d_{r}\) is the source–receiver distance:

$$\begin{array}{*{20}c} {d_{r} = \left| {{\varvec{x}}_{{\varvec{r}}} - {\varvec{x}}_{{\varvec{s}}} } \right|} \\ \end{array}$$
(3)

The frequency domain CCF is obtained by the conjugated multiplication of station-pair seismic noise records (substituting Eqs. 1 and 2 into the cross-correlation function in Eq. (4)):

$$\begin{aligned} C\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,\omega } \right) & = u\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)u^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right) \\ & = {\int }G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)F\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)F^{*} \left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ & = {\int }G\left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{{\varvec{rec}}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,\omega } \right)S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ & = {\int }\frac{c}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\left( {d_{rec} - d_{ref} } \right)}} }}{{\sqrt {d_{rec} \cdot d_{ref} } }}S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ \end{aligned}$$
(4)
$$\begin{array}{*{20}c} {S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right) = F\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)F^{*} \left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)} \\ \end{array}$$
(5)
$$\begin{array}{*{20}c} {C\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,t} \right) = {\mathbf{\mathcal{F}}}^{ - 1} \left[ {C\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,\omega } \right)} \right]} \\ \end{array}$$
(6)

where \({\varvec{x}}_{{{\varvec{rec}}}}\) and \({\varvec{x}}_{{{\varvec{ref}}}}\) are the sequences containing the locations of the receiver stations and their corresponding reference stations, respectively, and \(d_{rec}\) and \(d_{ref}\) are the distances from the source location \({\varvec{x}}_{{\varvec{s}}}\) to the receiver and reference stations (cf. Eq. 3). Neighboring noise sources are commonly assumed to be spatially uncorrelated, and thus, the source terms \(S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\) in Eq. (5) can be simplified as the source power spectral density (psd), which characterizes the average correlations of spatially distributed noise sources. The product of the reciprocal, conjugated Green’s function and the source psd describes forward wave propagation from the reference station to the distributed sources. This product then acts as the source term of the correlation wavefield modeling. Finally, the frequency-domain wavefield sampled at the receiver station is converted to the time-domain CCF by Eq. (6), where \({\mathbf{\mathcal{F}}}^{ - 1}\) denotes the inverse Fourier transform.
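As a minimal sketch of this forward calculation, the last line of Eq. (4) can be discretized as a sum over source grid cells, followed by the inverse transform of Eq. (6). The function names, the single point-like source, the Gaussian spectrum and all numerical values are illustrative assumptions, not the ModAS implementation. The phase sign follows the last line of Eq. (4); with NumPy's inverse FFT convention, a source located behind the reference station then appears at positive lag.

```python
import numpy as np

def ccf_spectrum(x_rec, x_ref, src_xy, src_psd, omega, c, cell_area=1.0):
    """Discretized last line of Eq. (4) for one station pair.

    x_rec, x_ref : (2,) receiver / reference station coordinates [m]
    src_xy       : (n_src, 2) source grid coordinates [m]
    src_psd      : (n_src, n_freq) source psd S(x_s, omega)
    omega        : (n_freq,) strictly positive angular frequencies [rad/s]
    c            : path velocity of this station pair [m/s]
    """
    d_rec = np.linalg.norm(src_xy - x_rec, axis=1)[:, None]
    d_ref = np.linalg.norm(src_xy - x_ref, axis=1)[:, None]
    amp = c / (8.0 * np.pi * omega[None, :] * np.sqrt(d_rec * d_ref))
    phase = np.exp(-1j * omega[None, :] * (d_rec - d_ref) / c)
    return np.sum(amp * phase * src_psd, axis=0) * cell_area  # sum over sources

def ccf_time(spec_pos, n_t):
    """Eq. (6): inverse real FFT, with the zero-frequency bin set to zero."""
    full = np.concatenate(([0.0], spec_pos))
    return np.fft.fftshift(np.fft.irfft(full, n=n_t))  # zero lag at index n_t // 2

# Illustrative setup: one point-like source on the station axis, behind x_ref.
n_t, dt = 1024, 0.005
freq = np.fft.rfftfreq(n_t, dt)[1:]          # drop 0 Hz because of the 1/omega factor
omega = 2.0 * np.pi * freq
x_ref, x_rec = np.array([0.0, 0.0]), np.array([100.0, 0.0])
src_xy = np.array([[-500.0, 0.0]])
spectrum = np.exp(-0.5 * ((freq - 10.0) / 2.0) ** 2)   # assumed Gaussian spectrum
src_psd = spectrum[None, :]
c_path = 200.0

ccf = ccf_time(ccf_spectrum(x_rec, x_ref, src_xy, src_psd, omega, c_path), n_t)
lags = (np.arange(n_t) - n_t // 2) * dt
peak_lag = lags[np.argmax(ccf)]              # compare with (d_rec - d_ref) / c
```

For this in-line source the peak lag equals the interstation travel time of 100 m / 200 m/s. Moving the source off the station axis shortens the lag, which is the azimuthal effect discussed in the Introduction.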

2.2 Example for Model Parameterizations and Waveform Comparisons

We present a simple numerical example based on the seismic geometry in Fig. 1b to exemplify our scheme of source-velocity parameterization, which is the key part for the CCF forward modeling in ModAS. Noise sources are discretized in a planar grid for the purpose of surface wave generation. These 2-D source power spectral density (psd) models consist of source strengths, geometric distributions (Fig. 2a) and the source spectrum (Fig. 2b). As for the velocity model, we use a set of path-velocity values to build the observable system. In this way, different interstation path velocities constitute an apparent structure, and then, the analytical Green’s function in Eq. (2) can be readily applied to CCF forward modeling for each station pair. Finally, the gather of waveforms is achieved in the context of our parameterized source model and heterogeneous apparent structures.

Fig. 2
Source-velocity model setups. a The blue Gaussian blob represents the source psd anomaly, while the colored lines indicate interstation path velocities. The black dots represent different stations (A to D extracted from Fig. 1b). b Gaussian source spectrum of the source anomaly

Note that the colored straight lines in Fig. 2a connecting different station pairs explicitly represent the surface wave velocities; they do not necessarily mean that the waves propagate along straight rays. The medium is parameterized by the average path velocity between each station pair, i.e., the velocity averaged along the ray path that connects the pair and conforms to Fermat’s principle. This representation is readily understood and widely used in tomographic data preparation for further inversions. We make two assumptions to consolidate the physics of the mathematical modeling for the above model parameterizations: (1) All ambient noise sources share a similar shape of source spectrum; (2) the formation of the CCF is most sensitive to the wavefields propagating through the medium between the two stations. The first assumption is supported by Xu et al (2020), who showed that the recorded signals possess similar energy spectral densities in small-scale and short-time observations. The second assumption is supported by the generally elliptical structure sensitivity kernel connecting the two involved stations (Tromp et al. 2010; Fichtner 2015). We calculate the CCFs of all possible station pairs for the seismic geometry and the source-velocity models given in Fig. 2 and compare them with the CCFs calculated from the same velocity models but with homogeneous source distributions (not shown here). The differences between the two kinds of waveform (Fig. 3) are caused by the Gaussian source anomaly. Obvious discrepancies in symmetry, travel time and energy between the acausal and causal sides of the waveforms reveal the azimuthal effects that bias the conventional Green’s function approximation.
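The parameterization described in this section can be sketched in a few lines. The station coordinates, grid extent, blob position, spectrum parameters and velocity value below are hypothetical placeholders (they are not the survey geometry of Fig. 1b). The separable form of the psd relies on assumption (1), namely that all sources share one spectral shape.

```python
import numpy as np
from itertools import combinations

# Hypothetical station coordinates in metres (placeholders, not Fig. 1b).
stations = {"A": (0.0, 0.0), "B": (100.0, 0.0), "C": (0.0, 100.0), "D": (100.0, 100.0)}

# Planar m x n source grid.
axis = np.linspace(-500.0, 500.0, 51)
gx, gy = np.meshgrid(axis, axis, indexing="ij")

def gaussian_blob(gx, gy, center, radius, strength=1.0):
    """Gaussian-shaped source strength anomaly on the grid."""
    r2 = (gx - center[0]) ** 2 + (gy - center[1]) ** 2
    return strength * np.exp(-0.5 * r2 / radius ** 2)

def gaussian_spectrum(freq, f0, sigma):
    """Gaussian source spectrum shared by all grid cells (assumption 1)."""
    return np.exp(-0.5 * ((freq - f0) / sigma) ** 2)

source_strength = gaussian_blob(gx, gy, center=(300.0, 300.0), radius=80.0)
freq = np.linspace(1.0, 30.0, 59)
spectrum = gaussian_spectrum(freq, f0=10.0, sigma=3.0)

# Separable psd model: S(x_s, omega) = strength(x_s) * spectrum(omega).
source_psd = source_strength[..., None] * spectrum[None, None, :]

# One average path velocity per station pair (the "apparent structure").
pairs = list(combinations(sorted(stations), 2))
path_velocity = {pair: 250.0 for pair in pairs}
```

Four stations give six station pairs, so the velocity model here has six parameters, while the source model has 51 × 51 strengths.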

Fig. 3
Cross-correlation functions of different station pairs modeled with a homogeneous source distribution (orange lines) and with the Gaussian source anomaly (blue lines)

3 Inversion Procedures

3.1 Source Misfit Kernel and Velocity Gradient

We define the misfit function \(\chi\) based on the cross-correlation waveform differences. A traditional and widely applied strategy is to find model parameters that minimize the L2-norm (Euclidean length) of the residual vector (Tarantola 2005).

$$\begin{array}{*{20}c} {\chi = \frac{1}{2}\int \left[ {C_{syn}^{w\left( t \right)} \left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,t} \right) - C_{obs}^{w\left( t \right)} \left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} ,t} \right)} \right]^{2} {\text{d}}t} \\ \end{array}$$
(7)
$$\delta \chi = \int \left[ {C_{{{\text{syn}}}}^{{w^{2} \left( t \right)}} \left( t \right) - C_{{{\text{obs}}}}^{{w^{2} \left( t \right)}} \left( t \right)} \right]\delta C_{{{\text{syn}}}} \left( t \right){\text{d}}t = \frac{1}{2\pi }\int \left[ {C_{{{\text{syn}}}}^{{w^{2} \left( \omega \right)}} \left( \omega \right) - C_{{{\text{obs}}}}^{{w^{2} \left( \omega \right)}} \left( \omega \right)} \right]^{*} \delta C_{{{\text{syn}}}} \left( \omega \right){\text{d}}\omega$$
(8)
$$\begin{array}{*{20}c} {f\left( \omega \right) = \frac{1}{2\pi }\left[ {C_{{{\text{syn}}}}^{{w^{2} \left( \omega \right)}} \left( \omega \right) - C_{{{\text{obs}}}}^{{w^{2} \left( \omega \right)}} \left( \omega \right)} \right]^{*} } \\ \end{array}$$
(9)
$$\begin{array}{*{20}c} {\delta \chi = \int f\left( \omega \right)\delta C_{{{\text{syn}}}} \left( \omega \right){\text{d}}\omega } \\ \end{array}$$
(10)

In Eq. (7), the sum of squared windowed residuals between the synthetic and the observed CCFs in the time domain (\(C_{{{\text{syn}}}}^{w\left( t \right)}\) and \(C_{{{\text{obs}}}}^{w\left( t \right)}\), respectively) is halved to simplify the differentiation. Perturbations of the misfit function with respect to perturbations of the synthetic CCF waveforms are then transformed into the frequency domain (the Fourier transforms are omitted in Eq. (8) for conciseness). The first factor on the right-hand side of Eq. (8) is referred to as the adjoint source (\(f\left( \omega \right)\) in Eq. 9), which corresponds to the Fréchet derivative with respect to the modeled CCF waveforms, as indicated in Eq. (10).
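A short numerical check can make the role of the adjoint source concrete. The sketch below works in the time domain, where the counterpart of Eq. (9) is the squared window times the residual; the frequency-domain \(f\left( \omega \right)\) follows from it by Fourier transform. The Hann window, the random test waveforms and the function names are assumptions made only for this illustration.

```python
import numpy as np

def windowed_misfit(c_syn, c_obs, window, dt):
    """Eq. (7): half the squared L2 norm of the windowed residual."""
    res = window * (c_syn - c_obs)
    return 0.5 * np.sum(res ** 2) * dt

def adjoint_source_time(c_syn, c_obs, window):
    """Time-domain derivative of Eq. (7) with respect to C_syn: w^2 (C_syn - C_obs)."""
    return window ** 2 * (c_syn - c_obs)

rng = np.random.default_rng(0)
n_t, dt = 512, 0.01
window = np.hanning(n_t)                 # assumed taper
c_obs = rng.standard_normal(n_t)         # stand-ins for observed / synthetic CCFs
c_syn = rng.standard_normal(n_t)

adj = adjoint_source_time(c_syn, c_obs, window)

# Perturb C_syn along the steepest-descent direction and compare the actual
# misfit change with the first-order prediction (time-domain form of Eq. 8).
eps = 1e-6
delta_c = -eps * adj
dchi_true = (windowed_misfit(c_syn + delta_c, c_obs, window, dt)
             - windowed_misfit(c_syn, c_obs, window, dt))
dchi_pred = np.sum(adj * delta_c) * dt
```

The two values agree to first order in the perturbation, and both are negative because the perturbation points downhill.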

According to the forward equation (Eq. 4), the perturbation of CCF waveforms can be calculated by the Fréchet derivatives of source parameters and velocity models as expressed in Eq. (11):

$$\begin{aligned} \delta C_{{{\text{syn}}}} \left( \omega \right) & = \int G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\delta S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ & \quad \begin{array}{*{20}c} { + \int \delta \left[ {G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)} \right]S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} } \\ \end{array} \\ \end{aligned}$$
(11)

Substituting Eq. (11) into Eq. (10), the partial differentials of sources (\(\delta \chi_{S}\)) can be firstly expressed as follows:

$$\begin{aligned} \delta \chi_{S} & = \int G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)f\left( \omega \right)\delta S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ & \begin{array}{*{20}c} { = \int K_{S} \left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} } \right)\delta S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} } \\ \end{array} \\ \end{aligned}$$
(12)

where \(K_{S}\) in Eq. (12) represents the source kernel function. Its general form (Eq. 13) consists of the Green’s functions of the subsurface structure and the adjoint source of a specific misfit function. This source misfit kernel indicates the sensitivity of the misfit to changes in the synthetic source model. Detailed numerical investigations of \(K_{S}\) are presented in Sect. 4.

$$\begin{array}{*{20}c} {K_{S} = \int G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)f\left( \omega \right){\text{d}}\omega } \\ \end{array}$$
(13)
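A discretized analogue of Eq. (13) is sketched below for a single station pair. It is a simplification: the misfit is an unwindowed frequency-domain L2 norm, the psd is separable (strength times a shared Gaussian spectrum), and the grid, station positions and velocity are hypothetical. Under these assumptions the kernel is the real part of the conjugated Green's function product applied to the residual spectrum, and it can be checked against a finite difference.

```python
import numpy as np

def pair_operator(x_rec, x_ref, src_xy, omega, c):
    """G(x_rec, x_s) G*(x_ref, x_s) from the last line of Eq. (4), shape (n_src, n_freq)."""
    d_rec = np.linalg.norm(src_xy - x_rec, axis=1)[:, None]
    d_ref = np.linalg.norm(src_xy - x_ref, axis=1)[:, None]
    amp = c / (8.0 * np.pi * omega[None, :] * np.sqrt(d_rec * d_ref))
    return amp * np.exp(-1j * omega[None, :] * (d_rec - d_ref) / c)

def misfit_and_kernel(op, s_syn, c_obs):
    """Frequency-domain L2 misfit and its gradient w.r.t. the real source strengths."""
    residual = s_syn @ op - c_obs                 # C_syn - C_obs per frequency
    chi = 0.5 * np.sum(np.abs(residual) ** 2)
    kernel = np.real(np.conj(op) @ residual)      # one value per source grid cell
    return chi, kernel

# Hypothetical grid and station pair (stations do not coincide with grid nodes).
axis = np.linspace(-400.0, 400.0, 21)
gx, gy = np.meshgrid(axis, axis, indexing="ij")
src_xy = np.column_stack([gx.ravel(), gy.ravel()])
freq = np.linspace(5.0, 15.0, 41)
omega = 2.0 * np.pi * freq
spectrum = np.exp(-0.5 * ((freq - 10.0) / 2.0) ** 2)
x_ref, x_rec = np.array([-30.0, 0.0]), np.array([30.0, 0.0])
op = pair_operator(x_rec, x_ref, src_xy, omega, c=250.0) * spectrum[None, :]

blob = np.exp(-0.5 * ((gx - 240.0) ** 2 + (gy - 240.0) ** 2) / 60.0 ** 2).ravel()
s_true = 0.1 + blob                               # target: background plus anomaly
s_ini = np.full_like(s_true, 0.1)                 # homogeneous initial model
c_obs = s_true @ op
chi, kernel = misfit_and_kernel(op, s_ini, c_obs)

# For this quadratic misfit, kernel . (s_true - s_ini) equals -2 * chi.
descent_check = np.sum(kernel * (s_true - s_ini))

# Central finite difference at the most negative kernel value (exact for a
# quadratic; the negative strengths on the minus side are only a numerical probe).
idx = int(np.argmin(kernel))
step = np.zeros_like(s_ini)
step[idx] = 0.1
fd = (misfit_and_kernel(op, s_ini + step, c_obs)[0]
      - misfit_and_kernel(op, s_ini - step, c_obs)[0]) / 0.2
```

The negative projection of the kernel onto the missing anomaly shows that stepping against the kernel raises the source strength where energy is missing.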

For the partial derivative of the misfit with respect to velocity (\(\delta \chi_{c}\)) in Eq. (14), we use the chain rule to compute the differentials (\(\delta_{c}\)) of the Green’s functions with respect to the path velocities in ModAS. Since the analytical form of the Green’s function (Eq. 2) depends only on phase velocity, our parameterization of two-station velocities can be explicitly involved in the gradient (\(g_{c} \left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} } \right)\)) of the waveform misfit function. We expand the Green’s function differential and then recast the formula as Eq. (15) to present the final \(g_{c}\). Note that we omit the summation of the misfit over all station pairs, since both \(K_{S}\) and \(g_{c}\) in this section are calculated separately from the corresponding Green’s functions and adjoint sources of each two-station measurement.

$$\begin{aligned} \delta \chi_{c} & = \int \delta_{c} \left[ {G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)} \right]f\left( \omega \right)S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}x_{s} \\ & = \int g_{c} \left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} } \right)\delta c\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{{\varvec{ref}}}} } \right){\text{d}}x_{s} \\ \end{aligned}$$
(14)
$$\begin{aligned} g_{c} & = \int \frac{1}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\left( {d_{{{\text{rec}}}} - d_{{{\text{ref}}}} } \right)}} }}{{\sqrt {d_{{{\text{rec}}}} \cdot d_{{{\text{ref}}}} } }}\left( {1 + \frac{i\omega }{c}\left( {d_{{{\text{rec}}}} - d_{{{\text{ref}}}} } \right)} \right)f\left( \omega \right)S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right){\text{d}}\omega \\ & \begin{array}{*{20}c} { = \int G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)f\left( \omega \right)S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\left( {\frac{1}{c} + \frac{i\omega }{{c^{2} }}\left( {d_{{{\text{rec}}}} - d_{{{\text{ref}}}} } \right)} \right){\text{d}}\omega } \\ \end{array} \\ \end{aligned}$$
(15)
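The step from the first to the second line of Eq. (15) can be verified by differentiating the Green’s function product in the last line of Eq. (4) with respect to \(c\). The shorthand \(\Delta d = d_{{{\text{rec}}}} - d_{{{\text{ref}}}}\) and the symbol \(P\) are introduced here only for brevity and are not part of the original notation:

$$\begin{aligned} P\left( c \right) & = G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right) = \frac{c}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\Delta d}} }}{{\sqrt {d_{{{\text{rec}}}} \cdot d_{{{\text{ref}}}} } }} \\ \frac{\partial P}{{\partial c}} & = \frac{1}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\Delta d}} }}{{\sqrt {d_{{{\text{rec}}}} \cdot d_{{{\text{ref}}}} } }} + \frac{c}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\Delta d}} }}{{\sqrt {d_{{{\text{rec}}}} \cdot d_{{{\text{ref}}}} } }} \cdot \frac{i\omega \Delta d}{{c^{2} }} \\ & = \frac{1}{8\pi \omega } \cdot \frac{{e^{{\frac{ - i\omega }{c}\Delta d}} }}{{\sqrt {d_{{{\text{rec}}}} \cdot d_{{{\text{ref}}}} } }}\left( {1 + \frac{i\omega }{c}\Delta d} \right) = P\left( c \right)\left( {\frac{1}{c} + \frac{i\omega }{{c^{2} }}\Delta d} \right) \\ \end{aligned}$$

Multiplying by \(f\left( \omega \right)S\left( {{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\) and integrating over \(\omega\) reproduces the two lines of Eq. (15).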

3.2 Joint Inversion Scheme

Estimating the two kinds of model parameters boils down to minimizing a multivariate misfit function. The CCF waveform misfit functions of all station pairs are summed to obtain a total misfit \(\hat{\chi }\) (Eq. 16). The source sensitivity kernels are likewise summed (\(\widehat{{K_{S} }}\) in Eq. 17) to characterize the sensitivity of the whole observing system to variations of the source model.

$$\begin{array}{*{20}c} {\hat{\chi } = \mathop \sum \limits_{{{\varvec{x}}_{{{\varvec{ref}}}} }} \mathop \sum \limits_{{{\varvec{x}}_{{{\varvec{rec}}}} }} \chi } \\ \end{array}$$
(16)
$$\begin{array}{*{20}c} {\widehat{{K_{S} }} = \mathop \sum \limits_{{{\varvec{x}}_{{{\varvec{ref}}}} }} \mathop \sum \limits_{{{\varvec{x}}_{{{\varvec{rec}}}} }} K_{S} } \\ \end{array}$$
(17)

As introduced in our velocity parameterization section, the path velocities corresponding to one finite frequency band are gathered to form an interstation-velocity model. Thus, the velocity gradient \(g_{c}\) calculated from each station-pair misfit derivative is then allocated to the updating of corresponding path velocity value. The total misfit variations (\(\Delta \hat{\chi }\)) with respect to the model updating (\(\Delta {\mathcal{M}}\)) can be rewritten as Eq. (18) by combining Eqs. (8), (11) and (16):

$$\begin{array}{*{20}c} {\Delta \hat{\chi } = \fancyscript{g}\Delta {\mathcal{M}}} \\ \end{array}$$
(18)

For a set of source-velocity model parameters (\({\mathcal{M}}\)) within a finite frequency band, the 2-D source model is discretized as \(m \times n\) grid points and the path velocities are constant values related to total \(p\) station pairs (Eq. 19). The set of corresponding gradients (\(\fancyscript{g}\)) consists of the total source kernel \(\widehat{{K_{S} }}\) and all velocity gradients \(g_{c1} \sim g_{cp}\) (Eq. 20). Note that the 2-D kernel \(\widehat{{K_{S} }}\) is flattened as a 1-D sequence in Eq. (20) to achieve pointwise correspondences with source model grids \(s_{1} \sim s_{m \times n}\) in Eq. (19).

$$\begin{array}{*{20}c} {{\mathcal{M}} = \{ s_{1} ,s_{2} \ldots s_{m \times n} ,c_{1} ,c_{2} \ldots c_{p} \} } \\ \end{array}$$
(19)
$$\begin{array}{*{20}c} {\fancyscript{g} = \left\{ {\widehat{{K_{S} }},g_{{c1}} ,g_{{c2}} \ldots g_{{cp}} } \right\}} \\ \end{array}$$
(20)
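The bookkeeping of Eqs. (19) and (20) amounts to flattening the 2-D source grid (or kernel) and appending the per-pair velocity values (or gradients). A minimal sketch follows; the helper names `pack_model` and `unpack_model` and the example sizes are hypothetical.

```python
import numpy as np

def pack_model(source_grid, path_values):
    """Eq. (19)/(20): flatten an (m, n) grid and append p per-pair values."""
    return np.concatenate([np.ravel(source_grid), np.asarray(path_values, dtype=float)])

def unpack_model(model, grid_shape):
    """Inverse of pack_model: recover the (m, n) grid and the p per-pair values."""
    m, n = grid_shape
    return model[: m * n].reshape(m, n), model[m * n:]

# Example with m x n = 3 x 4 source cells and p = 6 station pairs.
source_grid = np.arange(12, dtype=float).reshape(3, 4)
path_velocities = np.array([210.0, 220.0, 230.0, 240.0, 250.0, 260.0])
model = pack_model(source_grid, path_velocities)

# The gradient set uses the same layout, so entries correspond pointwise.
kernel_total = np.ones((3, 4))
velocity_gradients = np.zeros(6)
gradient = pack_model(kernel_total, velocity_gradients)

grid_back, velocities_back = unpack_model(model, (3, 4))
```

Because both vectors share one layout, a generic optimizer can treat source strengths and path velocities as a single unknown vector.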

An iterative gradient-based method is used in the inversion module of ModAS. The L-BFGS algorithm (Nocedal and Wright 2006; Modrak and Tromp 2016) is an effective and widely used limited-memory quasi-Newton method. Its approximated inverse Hessian matrix can generally overcome singularities and improve numerical stability during the iterations, and it is appropriate for inverse problems where batch optimization makes sense.

Implementing the L-BFGS scheme, the estimated total gradients are packaged as \(\fancyscript{g}\) to jointly update the corresponding source and velocity parameters in \({\mathcal{M}}\) and minimize the total waveform misfit \(\hat{\chi }\). We apply Gaussian smoothing as regularization, in which the updated grid models are smoothed over the minimum wavelength (Xu et al. 2020; Ermert et al. 2020). Furthermore, we use implicit constraints on the expected models to regularize the inversions. For both source and velocity models, we set lower bounds to ensure nonnegativity or accordance with possible prior information. It can be reasonably deduced that the azimuthal effects lead to early arrivals on CCFs in most cases. We therefore suggest that the inverted interstation velocities should be no larger than the apparent velocities extracted from the main arrivals or from time–frequency analysis. Accordingly, the upper bound of the velocity parameters can be the observed velocities themselves, which forces the path velocities to be updated generally downward. The iterative inversion is terminated when the norm of the projected gradient \(\left| {\left| \fancyscript{g} \right|} \right|\) falls below the stopping threshold \(\delta\) (\(\delta = 10^{ - 5}\)). Meanwhile, the L-BFGS optimizations satisfy the Wolfe conditions (Liu and Nocedal 1989; Modrak and Tromp 2016), which are commonly adopted as built-in criteria to terminate the line search. For practical cases where prior information is absent, the observed apparent velocities can be used as initial velocity models, and source beamforming analysis can provide reference models to initiate the source inversions and mitigate non-uniqueness or local minima.
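The bound-constrained joint update can be illustrated with SciPy's `L-BFGS-B` routine, used here as a stand-in because the text does not name a specific implementation. The forward operator below is a toy random linear map rather than the CCF modeling of Eq. (4), and all sizes and values are assumptions. Only the structure matters: one packed model vector, a function returning misfit and gradient together, nonnegative source strengths, velocities capped by the "observed" apparent velocities, and a projected-gradient tolerance of \(10^{ - 5}\).

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n_src, n_vel = 6, 3

# Toy target model: source strengths followed by path velocities (Eq. 19 layout).
s_true = rng.uniform(0.2, 1.0, n_src)
c_true = np.array([220.0, 250.0, 280.0])
m_true = np.concatenate([s_true, c_true])

# Toy linear forward operator standing in for the CCF modeling.
A = rng.standard_normal((40, n_src + n_vel))
d_obs = A @ m_true

def misfit_and_gradient(m):
    """Return the L2 misfit and the packed gradient (Eq. 20 layout)."""
    r = A @ m - d_obs
    return 0.5 * (r @ r), A.T @ r

# Apparent velocities overestimate the true ones; use them as start and upper bound.
c_apparent = 1.2 * c_true
m0 = np.concatenate([np.full(n_src, 0.5), c_apparent])
bounds = [(0.0, None)] * n_src + [(100.0, v) for v in c_apparent]

result = minimize(misfit_and_gradient, m0, jac=True, method="L-BFGS-B",
                  bounds=bounds, options={"gtol": 1e-5})
m_est = result.x
```

In this toy problem the true model lies inside the bounds, so the bounded search recovers it while the velocities are only ever allowed to move downward from their apparent values.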

4 Synthetic Tests

4.1 Seismic-Array Performance in Source Inversion

Recent dense-array studies have enabled and renewed high-resolution ambient noise imaging techniques (de Ridder and Maddison 2018; Wang et al. 2020). In this section, we investigate how the path density of a seismic array affects the recovery of different source distributions. We use the geometry in Fig. 1b and the source spectrum in Fig. 2b. We predefine the path-velocity models (Fig. 4a) as background inputs for the source inversions. The observed CCFs are generated from two synthetic noise source models (Fig. 4b, c). For each case, we compare the imaging performance of the 4-station (A ~ D) and 9-station (A ~ I) seismic arrays.

Fig. 4
a Path-velocity models of the seismic array given in Fig. 1b. b A Gaussian source power spectral density model. c A complex source model including two Gaussian anomalies and a banded source distribution

The source distribution in the first case (Fig. 4b) is the same as that in Fig. 2a: one Gaussian-shaped source anomaly is placed to the northeast, outside the recording geometry. We set a homogeneous noise source distribution as the initial model (\(S_{{{\text{ini}}}}\)) to compute the synthetic CCF waveforms for the first iteration. The stacked misfit kernels (Eq. 17) for the 4-station and 9-station arrays are presented in Fig. 5a, b. Negative gradient values in the kernel indicate the directions in which the source model is updated in the next iteration. With more stations involved, the 9-station array achieves more sufficient interference in the misfit kernel stacking than the 4-station array. Accordingly, the 9-station array is better able to reduce the imaging artifacts in the inverted source model (Fig. 5c, d). Both the 9-station and the 4-station inversions recover the main direction of the true source anomaly and fit the observed CCF waveforms well (Fig. 6).

Fig. 5
Results for the target source model in Fig. 4b. Upper panels: initial misfit kernels of waveform differentials from the a 4-station and b 9-station arrays. Lower panels: final source inversion results from the c 4-station and d 9-station arrays. The seismic stations involved are marked by either white circles or black dots

Fig. 6
Waveform comparisons (based on the target source model in Fig. 4b) from source inversion conducted in a 9-station and b 4-station array, respectively. They are normalized by their maximum amplitudes and arranged as shot gathers with respect to the involved station pairs. c Misfits comparison between two kinds of array. The misfit curves are normalized by their initial misfits

We further consider a more complex source model in the second case (Fig. 4c). One smaller Gaussian-shaped source anomaly is added inside the array, and a NE–SW banded source distribution is set across the array to imitate the urban traffic environment. The interference stacking again leads to different resolutions of the initial misfit kernels from the 4-station and 9-station arrays (Fig. 7a, b). After the iterative optimizations, three source anomalies can be distinguished in the inverted source model of the 4-station array, but they are more smeared than those of the 9-station array (Fig. 7c, d).

Fig. 7

Same as Fig. 5 except for different target source models (i.e., Fig. 4c)

Although both the 9-station and 4-station arrays fit the waveforms well in the two cases (Figs. 6a, b, 8a, b), the 4-station source inversions converge to equivalent models or local minima with more imaging artifacts than those of the 9-station array. Moreover, the relative waveform misfits are reduced further in the denser network (Figs. 6c, 8c). The source anomalies inside the array are better constrained in terms of their shapes, locations and relative strengths, while the noise sources located far from the seismic network are less constrained. The reason is that all station-pair source sensitivity kernels are stacked, and their hyperbolic Fresnel zones extend outward from the array geometry. Thus, the interference outside the array is less complete than the kernel stacking inside it. Consequently, source signals far from the network provide less detailed information for model updating, and only their directions, rather than their accurate shapes or locations, are expected to be recovered.

Fig. 8

The same as Fig. 6 except for different target source models (i.e., Fig. 4c)

4.2 Source-Velocity Joint Inversions

We have demonstrated good recovery of heterogeneous source models given the predefined true path velocities. However, an incorrect background velocity model tends to mislead the source kernels when only the source distribution is inverted, and vice versa. Given the complex structures and the lack of reference velocity models in near-surface applications, both kinds of model parameters are unknown targets. Thus, they are not suited to separate or alternating/sequential inversions. In ModAS, they are updated together iteratively to account for the coupled dependencies in the Fréchet derivatives.

The source-velocity models shown in Fig. 4a, c are again taken as the targets of joint imaging. In Fig. 9a, 36 CCFs are forward-modeled for all station pairs of the 9-station array. Then, the phase arrivals are manually picked in the main wavepackets of the CCFs to calculate the path velocities (\(V_{{{\text{obs}}}}\) in Fig. 9b). All of these roughly “observed” velocities exceed the true path velocities (\(V_{{{\text{true}}}}\)), reflecting the overestimation caused by the azimuthal effect in the common practice of velocity extraction. The degree of bias varies with the source–receiver geometry. For station pairs mainly affected by sources in the perpendicular direction, the resulting wavelet lies very close to the zero lag of the CCF. Thus, in Fig. 9b, very large or even infinite velocity values are measured under this extreme condition (e.g., the NW–SE station pairs C–I, C–H, C–E and B–C).
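The overestimation described above can be illustrated with a simplified far-field model, which is an assumption made here for illustration and not the forward model of ModAS. For a single plane wave crossing a station pair at an angle \(\theta\) from the interstation axis, the differential arrival time is \(d\cos \theta /v\), so the apparent velocity is \(v/\cos \theta\). The velocity of 300 m/s is a hypothetical value.

```python
import math

def apparent_velocity(v_true, azimuth_deg):
    """Apparent interstation velocity for a single far-field plane wave.

    azimuth_deg is the angle between the propagation direction and the
    station-pair axis (0 = in line with the pair, 90 = perpendicular).
    """
    c = abs(math.cos(math.radians(azimuth_deg)))
    if c < 1e-12:            # the arrival collapses onto the zero lag
        return math.inf
    return v_true / c

for az in (0, 30, 60, 90):
    print(az, apparent_velocity(300.0, az))
```

The apparent velocity never falls below the true value in this model and diverges as the source direction approaches the perpendicular, which mirrors the very large or infinite picks reported for the NW–SE station pairs.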

Fig. 9

a Observed CCF gather forwarded from the target source-velocity models. The picked phase travel times are indicated by the red arrows. b Histograms of interstation path-velocity models. c Relative differences (the formula is shown in the abscissa) between \(V_{{{\text{true}}}}\) and \(V_{{{\text{obs}}}}\) are shown by the red-gray dots. The averaged bias percentage among the dots is located by the purple line

We exclude the CCF and path velocity of station pair C–I from the observed dataset since its velocity can hardly be determined. The relative differences between \(V_{{{\text{true}}}}\) and \(V_{{{\text{obs}}}}\) (Fig. 9c) show a 20% average bias, and about 78% of the station pairs have biases lower than the average. For the following inversions, we keep the starting source model the same as the homogeneous noise source model used in the previous section. In the first case, we set the observed path velocities in Fig. 9b as the initial velocity model, while in the second case we set the initial velocity model to be 20% (the average bias in Fig. 9c) higher than the true velocities. These two initial velocity datasets (\(V_{{{\text{ini}}}}\)) are used to obtain the joint imaging results in the first (Fig. 10) and the second (Fig. 11) case, respectively. The inverted and observed CCF waveforms match well in both cases (Figs. 10a, 11a). Except for several obvious velocity deviations in the first case (station pairs A–B, A–C, B–C, C–E and C–H in Fig. 10b), the inverted velocity models in both cases agree with the true models to within 5% relative difference (cyan area in Figs. 10b, 11b). As for the inverted source models (Figs. 10c, 11c), the smaller Gaussian source anomaly inside the array is well recovered in both cases, and the main trend of the NE–SW source distribution can be identified, although it is more smeared in the first case. The initial waveform misfits are reduced by 80–90% in the two cases (Figs. 10d, 11d) after convergence within 70 iterations.
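The bias statistics quoted above can be computed along the following lines. This is a sketch under two assumptions: the relative difference is taken as \(\left| {V_{{{\text{obs}}}} - V_{{{\text{true}}}} } \right|/V_{{{\text{true}}}}\) (the exact formula is the one given on the abscissa of Fig. 9c), and unresolvable picks are stored as infinite values. The function name and the velocities are hypothetical.

```python
import numpy as np

def bias_summary(v_obs, v_ref):
    """Relative bias (%) per station pair, its mean, and the share below the mean."""
    v_obs = np.asarray(v_obs, dtype=float)
    v_ref = np.asarray(v_ref, dtype=float)
    keep = np.isfinite(v_obs)                       # drop unresolvable picks
    rel = np.abs(v_obs[keep] - v_ref[keep]) / v_ref[keep] * 100.0
    mean_bias = rel.mean()
    frac_below = np.mean(rel < mean_bias)           # fraction of pairs under the mean
    return rel, mean_bias, frac_below

# Hypothetical velocities in m/s; the last pair has a zero-lag pick.
rel, mean_bias, frac_below = bias_summary(
    [315.0, 330.0, 390.0, np.inf], [300.0, 300.0, 300.0, 300.0])
```

Because a few strongly biased pairs pull the mean upward, most pairs can sit below the average bias, as in the synthetic test where about 78% of the pairs do.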

Fig. 10

Source-velocity joint inversion results (\(V_{{{\text{ini}}}} = V_{{{\text{obs}}}}\)). a Waveform comparisons between the inverted and the observed CCFs. Note that the station pair C–I was not involved in the optimizations for the first case. b Histograms (the upper red abscissa) of interstation velocity models between \(V_{{{\text{true}}}}\) and \(V_{{{\text{inv}}}}\). Corresponding relative differences (the lower cyan abscissa) overlay the histograms. c Inverted source model. d Normalized misfit curve of the iterative inversions

Fig. 11

The same as Fig. 10 except for different initial velocity models (i.e., \(V_{{{\text{ini}}}} = 1.2V_{{{\text{true}}}}\))

The second case entirely avoids the extreme outliers above the 20% average bias in the initial velocity model, which results in more accurate inverted solutions. Comparing the two cases helps to investigate the effects of the seemingly unreasonable values in the observed velocities. Take station pairs A–C, B–C, C–E and C–H as examples: their waveforms are fitted better in the first case, but this does not necessarily yield velocity models closer to the truth than those in the second case. Because their initial velocities in the first case are observed far from the true models, the joint inversion converges to local minima of the waveform misfit. Thus, they ultimately do not approach the true velocities as closely as in the second case, and the inverted source distribution is also more smeared relative to the true model. In general, the inverted source models in both cases can be regarded as equivalent models that approach the globally optimal solution, and most of the inverted path velocities (those with good initial models) recover the true models with deviations of less than 5%.

To further improve the noise source imaging in the first case, we modify the initial source model based on the sensitivity kernel of the observed CCFs. The source misfit kernel \(K_{S}\) (Eq. 13) results from the difference between the synthetic and the observed source kernels according to the adjoint source function \(f\left( \omega \right)\) (Eq. 9). The observed source kernel function \(K_{{{\text{obs}}}}\) (Eq. 21) indicates the sensitivity of the observed CCF to the source distribution, and it is similar to the spatial-domain beamforming technique termed matched field processing (MFP).

$$\begin{array}{*{20}c} {K_{{{\text{obs}}}} = \frac{1}{2\pi }\int G\left( {{\varvec{x}}_{{{\varvec{rec}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)G^{*} \left( {{\varvec{x}}_{{{\varvec{ref}}}} ,{\varvec{x}}_{{\varvec{s}}} ,\omega } \right)\left( {C_{{{\text{obs}}}}^{{w^{2} \left( \omega \right)}} \left( \omega \right)} \right)^{*} {\text{d}}\omega } \\ \end{array}$$
(21)
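A numerical evaluation of Eq. (21) on a set of candidate source positions might look like the sketch below. Several choices are assumptions rather than details taken from the paper: the Green's function is the far-field approximation of the 2-D acoustic Green's function with a single path velocity `v`, the integral is a Riemann sum over positive frequencies only (so the real part is kept), and the observed spectrum `c_obs` is assumed to be windowed already. All function names are hypothetical.

```python
import numpy as np

def greens_2d(dist, omega, v):
    """Far-field 2-D Green's function (assumed form, distances must be > 0)."""
    k = omega / v
    return np.exp(-1j * (k * dist + np.pi / 4)) / np.sqrt(8 * np.pi * k * dist)

def observed_source_kernel(x_rec, x_ref, grid_xy, omega, c_obs, v):
    """Evaluate K_obs at grid points by a Riemann sum over omega.

    grid_xy : (n_points, 2) candidate source positions (not at a station)
    omega   : (n_freq,) evenly spaced positive angular frequencies
    c_obs   : (n_freq,) windowed observed correlation spectrum
    """
    d_rec = np.linalg.norm(grid_xy - x_rec, axis=1)[:, None]
    d_ref = np.linalg.norm(grid_xy - x_ref, axis=1)[:, None]
    integrand = (greens_2d(d_rec, omega[None, :], v)
                 * np.conj(greens_2d(d_ref, omega[None, :], v))
                 * np.conj(c_obs)[None, :])
    d_omega = omega[1] - omega[0]
    return np.real(integrand.sum(axis=1)) * d_omega / (2 * np.pi)

# Synthetic check: one point source generates the "observed" spectrum.
x_rec = np.array([50.0, 0.0])
x_ref = np.array([-50.0, 0.0])
src = np.array([400.0, 300.0])
omega = 2 * np.pi * np.linspace(4.0, 6.0, 81)
v = 300.0
c_obs = (greens_2d(np.linalg.norm(src - x_rec), omega, v)
         * np.conj(greens_2d(np.linalg.norm(src - x_ref), omega, v)))
grid = np.array([[400.0, 300.0], [-400.0, 300.0]])   # true and mirrored position
k_obs = observed_source_kernel(x_rec, x_ref, grid, omega, c_obs, v)
```

At the true source position the integrand is real and positive at every frequency, so the kernel is constructive there, while at the mirrored position the phases rotate with frequency and partly cancel.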

As introduced above, correct background velocity models are essential for accurate source kernels. The theoretical source kernels of the observed CCFs calculated from \(V_{{{\text{true}}}}\) can generally reconstruct the complex source model (Fig. 12a), while the practically observed source kernels calculated from \(V_{{{\text{obs}}}}\) are more smeared and biased (Fig. 12b) but still preserve the features of the source anomalies. The observed source kernels cannot be taken directly as a source model because a source model must be nonnegative. We suggest averaging the observed source kernel (Fig. 12b) and the homogeneous source strength model, each normalized by its maximum, to establish an initial strength-normalized source model (Fig. 12c). In this way, the prior information on constructive or destructive sensitivity in the source kernel is brought into the source model.
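The construction of the initial source model can be sketched as follows. One assumption is made explicit here: the kernel is normalized by its maximum absolute value, so that the normalized kernel lies in [-1, 1] and the average with the unit homogeneous model is guaranteed to be nonnegative. The function name is hypothetical.

```python
import numpy as np

def initial_source_model(k_obs):
    """Average of the normalized observed kernel and a unit homogeneous model."""
    k_norm = k_obs / np.max(np.abs(k_obs))    # values in [-1, 1]
    homogeneous = np.ones_like(k_norm)        # homogeneous model, already normalized
    return 0.5 * (k_norm + homogeneous)       # values in [0, 1]

k = np.array([[-2.0, 0.0], [1.0, 4.0]])       # toy kernel with both signs
s_ini = initial_source_model(k)
```

Cells with destructive (negative) sensitivity start below the homogeneous level of 0.5 and cells with constructive sensitivity start above it, so the sign information of the kernel is kept without violating nonnegativity.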

Fig. 12

Noise source kernels of the observed CCFs calculated from a true velocity models and b observed velocity models. Note that the positive/negative sensitivity values in the kernels indicate the source locations where they were constructive/destructive to form the observed CCFs. c The initial source model (\(S_{{{\text{ini}}}}\)) established from the observed source kernel (b) and the homogeneous source model

Compared with the results of the first case (Fig. 10), the waveform inversion using the new initial source model also fits the CCFs well (Fig. 13a) and achieves a similar recovery of path velocities, with relative biases of less than 5% except for certain station pairs (Fig. 13b). We further obtain a clearer inverted source image (Fig. 13c) and faster convergence of the optimization (Fig. 13d) than in the first case. The modified source model extracted from the observed CCFs provides an initial model closer to the globally optimal solution than the purely homogeneous source model, and thus helps to improve the source imaging. However, improving the velocity inversion, as in the second case (Fig. 11b), relies on the absence of extreme initial velocity biases (Fig. 9c). Moreover, the inverted source model in the second case (Fig. 11c) also benefits from the good performance of the velocity optimization and achieves the best recovery of the banded source anomaly crossing the array. These results suggest that better initial velocity parameters have the greater impact on the recovery in source-velocity joint imaging.

Fig. 13

The same as Fig. 10 except for different initial source models (i.e., \(S_{{{\text{ini}}}}\) in Fig. 12c)

5 Experiments in the Hangzhou Urban Area

In the tourist city of Hangzhou, southeast China (Fig. 1a), we conducted two field experiments (Fig. 1b, c) to collect ambient noise data for verifying the utility of ModAS. Field works #1 and #2 each consist of one 9-station seismic array equipped with 5 Hz Zland-3C nodal sensors. The same preprocessing procedures were applied to the two field datasets. The observed one-day noise records (vertical component) were downsampled from a sampling frequency of 1000 Hz to 50 Hz after anti-aliasing low-pass filtering. Then, they were segmented into 30 s time series, which were detrended and demeaned. The segments were synchronously cross-correlated for all station pairs. Finally, the stacked CCFs were filtered at a center frequency of 5 Hz (to serve as examples in this study) using Gaussian narrow-bandpass filters (Bensen et al. 2007). We use the power spectrograms of the filtered CCFs as the source-term input for the forward modeling of synthetic data to focus on the source characterization of finite-frequency coherent signals. We define the signal windows \(w\left( t \right)\) as [− 1.5 s, 1.5 s] for the CCFs involved in the waveform inversions; these windows contain all main wavepackets retrieved from the two field datasets.
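The preprocessing chain described above can be sketched with NumPy alone. This is an illustrative stand-in, not the processing code used for the field data: the anti-aliasing filter is a simple brick-wall cut in the frequency domain, the Gaussian filter takes the form \(\exp \left[ { - \alpha \left( {\left( {f - f_{0} } \right)/f_{0} } \right)^{2} } \right]\) with a hypothetical width \(\alpha = 20\), and all function names are assumptions.

```python
import numpy as np

def lowpass_decimate(x, q):
    """Brick-wall low-pass below the new Nyquist, then keep every q-th sample."""
    n = x.size - x.size % q
    spec = np.fft.rfft(x[:n])
    spec[np.fft.rfftfreq(n) >= 0.5 / q] = 0.0    # frequencies in cycles/sample
    return np.fft.irfft(spec, n)[::q]

def detrend(seg):
    """Remove the best-fitting straight line (this also removes the mean)."""
    t = np.arange(seg.size)
    slope, intercept = np.polyfit(t, seg, 1)
    return seg - (slope * t + intercept)

def stacked_ccf(a, b, seg_len, max_lag):
    """Average the segment-wise correlations c[lag] = sum_t a[t] * b[t + lag]."""
    nseg = min(a.size, b.size) // seg_len
    nfft = 2 * seg_len                            # zero-pad to avoid wrap-around
    stack = np.zeros(nfft)
    for k in range(nseg):
        sa = detrend(a[k * seg_len:(k + 1) * seg_len])
        sb = detrend(b[k * seg_len:(k + 1) * seg_len])
        cross = np.conj(np.fft.rfft(sa, nfft)) * np.fft.rfft(sb, nfft)
        stack += np.fft.irfft(cross, nfft)
    lags = np.arange(-max_lag, max_lag + 1)
    ccf = np.concatenate((stack[-max_lag:], stack[:max_lag + 1])) / nseg
    return lags, ccf

def gaussian_narrowband(ccf, fs, f0, alpha=20.0):
    """Zero-phase Gaussian filter centered at f0 (alpha is a hypothetical width)."""
    f = np.fft.rfftfreq(ccf.size, d=1.0 / fs)
    gain = np.exp(-alpha * ((f - f0) / f0) ** 2)
    return np.fft.irfft(np.fft.rfft(ccf) * gain, ccf.size)

# Synthetic example: station b records the same noise 10 samples (0.2 s) later.
rng = np.random.default_rng(0)
raw = rng.standard_normal(1000 * 300)             # 5 min of noise at 1000 Hz
a = lowpass_decimate(raw, 20)                     # 1000 Hz -> 50 Hz
b = np.roll(a, 10)
lags, ccf = stacked_ccf(a, b, seg_len=30 * 50, max_lag=75)   # +/- 1.5 s window
ccf_5hz = gaussian_narrowband(ccf, fs=50.0, f0=5.0)
```

With this sign convention, a signal that reaches station b after station a produces a peak at a positive lag, and the ±75-sample range corresponds to the [− 1.5 s, 1.5 s] signal window at 50 Hz.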

5.1 Qianjiang New City (Field Work #1)

Following the workflow established in the synthetic tests, we first pick the apparent path velocities from the filtered observed CCFs (Fig. 14a, b). No obvious spurious signals peaking at the zero lag are observed in this field-data case (Fig. 14a). Such observed path velocities have routinely been taken as the input for tomographic inversion without fully accounting for the complex source azimuthal effects, whereas in ModAS they are used to derive both the initial velocity (Fig. 14b) and source (Fig. 14c, d) models for the subsequent joint inversion.

Fig. 14

a Observed CCF gather retrieved from the noise records in field work #1. The picked phase travel times are indicated by the red arrows. The CCFs in signal windows [− 1.5 s, 1.5 s] are isolated by two shadow blocks to concentrate the coherent wavepackets. b Apparent path velocities observed from the picked phase travel times (\(V_{{{\text{obs}}}}\)). c Noise source kernels of the observed CCFs calculated from the observed path velocities. d The initial source model (\(S_{{{\text{ini}}}}\)) established from the observed source kernels (c) and the homogeneous source model

In general, the final inverted CCF waveforms match the observed waveforms well (Fig. 15a). To further quantify the waveform fit of the ith trace (\(i = 1, 2, \ldots, 36\)), we define the relative confidence of each station-pair inversion result (\({\text{RC}}\left( i \right)\), the cyan area in Fig. 15b). In Eq. (22), the middle value \(\chi_{{{\text{mid}}}}\), the average of the minimum and the maximum misfits, is used as a reference to measure the relative performance among the station-pair waveform misfits.

$$\left\{ {\begin{array}{*{20}l} {\chi_{{{\text{mid}}}} = \frac{{\chi_{\max } + \chi_{\min } }}{2}} \hfill \\ {{\text{RC}}\left( i \right) = \frac{{\chi_{{{\text{mid}}}} - \chi_{i} }}{{\chi_{{{\text{mid}}}} }} \times 50\% + 50\% } \hfill \\ \end{array} } \right.$$
(22)
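Eq. (22) translates directly into a few lines of code. The sketch below uses a hypothetical function name and hypothetical misfit values, and returns the relative confidence in percent.

```python
import numpy as np

def relative_confidence(misfits):
    """Relative confidence RC(i) in percent, following Eq. (22)."""
    chi = np.asarray(misfits, dtype=float)
    chi_mid = 0.5 * (chi.max() + chi.min())       # midpoint of the misfit range
    return (chi_mid - chi) / chi_mid * 50.0 + 50.0

rc = relative_confidence([0.1, 0.2, 0.3])         # hypothetical station-pair misfits
```

A trace whose misfit equals \(\chi_{{{\text{mid}}}}\) receives 50%, traces with smaller misfits receive more, and for nonnegative misfits the values stay between 0 and 100%.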
Fig. 15

Source-velocity joint inversion results (\(V_{{{\text{ini}}}} = V_{{{\text{obs}}}}\), see \(S_{{{\text{ini}}}}\) in Fig. 14d). a Waveform comparisons between the inverted and the observed CCFs. b Histograms (the upper red abscissa) of inverted velocity models \(V_{{{\text{inv}}}}\). Corresponding relative confidence (the lower cyan abscissa) overlays the histograms. c Inverted source model. d Normalized misfit curve of the iterative inversions

We find that not only are the coherent wavepackets in the signal windows recovered with high confidence, but the “coda”-like waveforms outside the windows (shadow area in Fig. 14a) are also restored accordingly. The inverted source model (Fig. 15c) generally reflects the in situ noise environment, in which the active sources were mainly human-induced and located inside the network. The stronger noise strengths inside the array are consistent with the findings in Sect. 4.1 (Fig. 7d) that sources inside the array are better constrained by the more complete interference there. We finally achieve a waveform misfit reduction of nearly 80% after 85 iterations (Fig. 15d) in this field case.

5.2 Yunqi Town (Field Work #2)

For this field-data case, we use the borehole datasets of shear (S)-wave loggings (Fig. 1d) provided by the China Geological Survey to build a reference velocity model. The Poisson's ratio is set to 0.42 and the density to 2 g/cm³ according to local geomechanical parameters. We obtain the fundamental-mode Rayleigh wave phase velocity dispersion curves (Fig. 16a) by forward modeling the S-wave dataset of each borehole using Geopsy (Wathelet et al. 2020). Then, phase velocities at 5 Hz are extracted from the dispersion curves and interpolated into a 2-D slice (Fig. 16b). With this prior information, the station-pair path velocities \(V_{{{\text{prior}}}} \left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right)\) can be obtained from the accumulated ray-path lengths \(l_{p}\) and the station-pair travel times \(t\left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right)\) in Eq. (23), where \(t\left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right)\) is forward-calculated by the fast marching method (Rawlinson and Sambridge 2005) using the slowness \(v_{l}^{ - 1} \left( \omega \right)\) along the ray paths (black lines in Fig. 16b).

$$\left\{ {\begin{array}{*{20}l} {t\left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right) = \mathop \int \limits_{{{\varvec{x}}_{{\varvec{s}}} }}^{{{\varvec{x}}_{{\varvec{r}}} }} v_{l}^{ - 1} \left( \omega \right)\, {\text{d}}{\varvec{l}}} \hfill \\ {l_{p} = \mathop \int \limits_{{{\varvec{x}}_{{\varvec{s}}} }}^{{{\varvec{x}}_{{\varvec{r}}} }} {\text{d}}{\varvec{l}}} \hfill \\ {V_{{{\text{prior}}}} \left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right) = l_{p} \cdot t^{ - 1} \left( {{\varvec{x}}_{{\varvec{s}}} ,{\varvec{x}}_{{\varvec{r}}} ,\omega } \right)} \hfill \\ \end{array} } \right.$$
(23)
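The three relations in Eq. (23) can be illustrated with a simplified sketch. It replaces the fast marching ray tracing by a straight ray between the two stations, which is an assumption made only to keep the example short; the function names and the two-block velocity field are hypothetical.

```python
import numpy as np

def path_velocity(x_s, x_r, velocity_fn, n_steps=1000):
    """Path-averaged velocity along a straight ray from x_s to x_r.

    velocity_fn(x, y) returns the phase velocity at the given coordinates.
    """
    x_s = np.asarray(x_s, dtype=float)
    x_r = np.asarray(x_r, dtype=float)
    length = np.linalg.norm(x_r - x_s)                  # l_p for a straight ray
    frac = (np.arange(n_steps) + 0.5) / n_steps         # midpoints of ray segments
    points = x_s + frac[:, None] * (x_r - x_s)
    dl = length / n_steps
    travel_time = np.sum(dl / velocity_fn(points[:, 0], points[:, 1]))
    return length / travel_time                         # V_prior = l_p / t

def two_block(x, y):
    """Hypothetical slice: 200 m/s for x < 50 m and 400 m/s elsewhere."""
    return np.where(x < 50.0, 200.0, 400.0)

v_prior = path_velocity((0.0, 0.0), (100.0, 0.0), two_block)
```

Because the travel time integrates slowness, the path velocity is the harmonic rather than the arithmetic mean of the velocities sampled along the ray: about 267 m/s in this example instead of 300 m/s.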
Fig. 16

a Fundamental-mode Rayleigh wave dispersions calculated from all borehole datasets. b Phase velocity slice at 5 Hz extracted from the dispersions. The black lines represent ray paths connecting all station pairs

First, we again pick the apparent phase travel times from the observed CCFs (Fig. 17a) and then calculate the path velocities \(V_{{{\text{obs}}}}\) (Fig. 17b) based on the ray paths in Fig. 16b. The obvious zero-lag signals lead to infinite path velocities for station pairs b-h and e-i. Most observed interstation velocities are higher than the prior values; only seven station pairs have lower \(V_{{{\text{obs}}}}\) than \(V_{{{\text{prior}}}}\), which may be caused by real small-scale karst anomalies that are not sufficiently sampled by the slice interpolated from the well logs. Therefore, we do not take it for granted that \(V_{{{\text{prior}}}}\) can be treated like \(V_{{{\text{true}}}}\) in Fig. 9, that is, as exactly “the standard answers.” Instead, the prior velocities act as essential constraints and references for the subsequent optimization and re-evaluation.

Fig. 17

a Observed CCF gather retrieved from the noise records in field work #2. The picked phase travel times are indicated by the red arrows. The CCFs in signal windows [− 1.5 s, 1.5 s] are isolated by two shadow blocks to concentrate the coherent wavepackets. b Histograms of interstation path-velocity models. c Relative differences between \(V_{{{\text{prior}}}}\) and \(V_{{{\text{obs}}}}\) are shown by the red-gray dots. The averaged bias percentage among the dots is located by the purple line

We find a 28% average bias between \(V_{{{\text{prior}}}}\) and \(V_{{{\text{obs}}}}\) (with the two abnormal traces excluded), and their differences help to reasonably confine the expected velocity model space. We replace the two extremely large values of \(V_{{{\text{obs}}}}\) to constitute a complete initial velocity model. \(V_{{{\text{obs}}}}\) and \(V_{{{\text{prior}}}}\) together define the bounds of the velocity updates: the minimum and the maximum values are set as \(\left[ {{\text{min}}\left( {V_{{{\text{obs}}}} , V_{{{\text{prior}}}} } \right),{\text{max}}\left( {V_{{{\text{obs}}}} , V_{{{\text{prior}}}} } \right) } \right]\). The observed source kernels calculated either from \(V_{{{\text{prior}}}}\) (Fig. 18a) or from the initial velocity model \(V_{{{\text{obs}}}}\) (Fig. 18b) show similar constructive source interference, mainly from the southwest. We again build the initial source model (Fig. 18c) in the way introduced at the end of Sect. 4.2.
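The bound constraint on the velocity updates can be sketched as a simple projection. Two points are assumptions: the bounds are applied separately to each station pair, and an updated velocity that leaves its interval is clipped back to the nearest bound. The function names and velocities are hypothetical.

```python
import numpy as np

def velocity_bounds(v_obs, v_prior):
    """Per-pair interval [min(V_obs, V_prior), max(V_obs, V_prior)]."""
    return np.minimum(v_obs, v_prior), np.maximum(v_obs, v_prior)

def project(v, lower, upper):
    """Clip updated path velocities back into their allowed intervals."""
    return np.clip(v, lower, upper)

v_obs = np.array([320.0, 250.0, 400.0])      # hypothetical picks (m/s)
v_prior = np.array([300.0, 280.0, 310.0])    # hypothetical borehole-based values
lower, upper = velocity_bounds(v_obs, v_prior)
v_new = project(np.array([350.0, 240.0, 350.0]), lower, upper)
```

A pair for which \(V_{{{\text{obs}}}}\) and \(V_{{{\text{prior}}}}\) nearly agree is tightly constrained, whereas a strongly biased pick leaves the inversion more room to move.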

Fig. 18

Noise source kernels of the observed CCFs calculated from prior velocity models (a) and observed (initial) velocity models (b). c The initial source model (\(S_{{{\text{ini}}}}\)) established from the observed source kernel (b) and homogeneous source model

The inverted CCF waveforms fit the observed waveforms (Fig. 19a), with the normalized misfit reduced by 60% after 16 iterations (Fig. 19e). Moreover, the station-pair waveform fits are quantified by the relative confidence (\({\text{RC}}\left( i \right)\), the cyan area in Fig. 19b). Note that a lower \({\text{RC}}\), especially for station pairs c-h and c-d, may in turn indicate less reliable constraints provided by the corresponding \(V_{{{\text{prior}}}}\). The average bias drops to 10%, which indicates that the inverted path velocities approach the prior information more uniformly (Fig. 19c). The strong southwest source anomaly in the inverted source model coincides well with the location of nearby ongoing drilling works (the red star in Fig. 19d).

Fig. 19

Source-velocity joint inversion results (\(V_{{{\text{ini}}}} = V_{{{\text{obs}}}}\), see \(S_{{{\text{ini}}}}\) in Fig. 18c). a Waveform comparisons between the inverted and the observed CCFs. b Histograms (the upper red abscissa) of inverted velocity models \(V_{{{\text{inv}}}}\). Corresponding relative confidence (the lower cyan abscissa) overlays the histograms. c Relative differences between \(V_{{{\text{prior}}}}\) and \(V_{{{\text{inv}}}}\) are shown by the red-dark gray dots. The averaged bias percentage among the dots is located by the purple line. d Inverted source model. The working drilling machine is denoted by the red star. e Normalized misfit curve of the iterative inversions

6 Discussion

6.1 Practical Significance of the Path-Velocity Model Assumption

A heterogeneous noise source distribution affects noise cross-correlations and hence should be accounted for before a more accurate Earth model can be constructed. The new waveform approach in this study simultaneously inverts for both the noise source distribution and the ray-path velocities. The concept is derived from the earlier interferometric waveform inversion theory (e.g., Tromp et al. 2010). In principle, accurate Green’s functions of a heterogeneous medium can hardly be obtained with common numerical techniques, which is a major source of error in seismic waveform inversion (Yagi and Fukahata 2011). Green’s functions can be computed once at the beginning and then reused only when the velocity model is predefined for pure source inversion (Ermert et al. 2020). Although well-constructed databases such as Syngine (IRIS 2015) and Instaseis (van Driel et al. 2015) provide pre-calculated Green’s functions based on various reference models (e.g., PREM), they can hardly meet the demands of small-scale or near-surface high-frequency wavefield information and of source-velocity joint inversion. Although the computational cost of 2-D wavefield modeling is acceptable, the limited bandwidth of high-frequency surface wave simulation hampers the recovery of exact depth information for a velocity slice. A better solution would be to use structure sensitivity kernels based on 3-D wave-equation simulations to provide a depth model directly.

For the above reasons, the conventional 2-D ambient noise waveform inversion theory needs further development before it can be readily applied to practical high-frequency surface wave imaging, given the inaccurate and unstable Green’s function modeling for heterogeneous media and long-duration wavefield propagation from far-field sources, especially when depth information is required. For this reason, earlier studies focused purely on source inversion, assuming a homogeneous or predefined velocity model. We extend the theory in a practical direction to explore the unknown velocities by assuming that the noise cross-correlation is predicted from the noise source distribution and the averaged ray-path velocity connecting the two stations (Eq. 4). This assumption is credible because the cross-correlation function is mostly sensitive to the structure between the two stations (i.e., the banana-doughnut structure kernel; details can be found in Dahlen et al. 2000; Zhou et al. 2004; Tromp et al. 2005; Fichtner et al. 2017; Sager et al. 2018), which is also the basis of the Green’s function extraction used for subsequent tomography. We note that the structure kernel may be distorted when the source distribution is so complicated that the medium sensitivity becomes more inclined to the source–receiver paths. However, the sensitivity kernel still remains connected around the two stations, which is also the prerequisite for ambient noise imaging. Therefore, the assumption emphasizes that the final cross-correlated seismic response is not affected by the medium unrelated to the two-station system, but instead reflects the shortest-time paths between station pairs. The analytical form of the Green’s function helps to build this heterogeneous apparent structure, which is the manifestation of the unknown medium within the observing system. The inverted apparent structure of surface wave velocities (e.g., station-pair path velocities at 5 Hz in this study) can then be converted into an integrated 2-D velocity slice at a given frequency by tomographic inversion. In this context, this study aims to produce more reliable station-pair surface wave velocities as the input for convincing tomographic imaging.

6.2 Prospects

The new method proposed in this study inherits the classic ambient noise inversion theory. It is developed to realize, in practice, the physical interpretation and reduction of source-induced velocity bias, which has been a basic concern in ambient noise seismology. Although the extremely large or infinite velocities caused by zero-lag wavepackets have to be excluded, they indicate a roughly perpendicular direction of the noise sources, which can be significant for constraining the source model updates. We suggest investigating the use and influence of these special signals in joint inversion in future work. Direct evidence from well logs is indispensable to aid near-surface imaging of the solid earth. Thus, prior velocity information that can fill in for the unresolvable values is essential to solve this dilemma. Moreover, we do not routinely apply preprocessing schemes, not even SNR selection (e.g., Pang et al. 2019), not only because of the good quality of the dataset, but also out of concern that out-of-sync segment selection/stacking would fail to establish the CCFs as self-consistent observables for joint inversion. This concern is similar to the concern that nonlinear operations may bias the waveforms (Zeng et al. 2012; Fichtner et al. 2017, 2020; Delaney et al. 2017; Zhang et al. 2021). The issues of simultaneous preprocessing for the inversion of an equivalent source model (e.g., possible secondary vibrations excited by the anisotropic medium) deserve further study. More research can also focus on multiple objective functions, such as the waveform energy ratio of causal/acausal CCFs and the f-k spectrum (e.g., Pan et al. 2019, 2020), to update velocities that are trapped in local minima caused by waveform cycle skipping. The computational domain of the source distribution can be defined depending on the scale of the seismic geometry and the inversion performance. Since our near-surface studies focus on local-scale and ultrashort-time observations, seismic noise sources far away from the network are reasonably not considered in our computational domain. The source kernel also shows its major sensitivities around the seismic network, and the inverted distribution can be regarded as an equivalent local source model (see also Datta et al. 2019).

A dense network provides high-resolution source distributions using the L2 waveform misfit function (Ermert et al. 2020). The scales or resolutions of the noise source distributions and interstation velocities are related to the wavelengths or frequency bands that the seismic network can resolve (Zhou et al. 2021). The small-scale cases in this study suggest that ambient noise waveform inversion is capable of imaging with short interstation distances, even when the wavelengths do not satisfy the restriction of the conventional plane-wave assumption (Yao et al. 2009; Luo et al. 2015). In particular, for the nearly zero-lag signals in field work #2, reliable phase travel times can be retrieved for high-quality joint imaging. However, they may depend more on subjective judgment and on constraints from prior information. The global solution of the source-velocity optimization is better approached when more station-pair data are available in ModAS, and denser seismic networks benefit the source constraint inside the network. Triplets of stations along a common great-circle path can also provide consistent velocity constraints and strengthen the dependence among related path velocities in the inversion. Larger-scale networks covering a whole urban area are useful for city noise monitoring, in which human-induced sources play the main role, and the sources covered by the network are the ones best recovered by ambient noise waveform inversion.

We also note a limitation of ModAS: the simultaneously inverted results for the two kinds of unknown parameters, especially the source models, are not exactly recovered in the synthetic tests. We attribute this to the incomplete decoupling of source and velocity in the partial derivatives. The relative confidence defined for the field-data cases can be used to weight the inverted path velocities when they are input to beamforming and tomography. The evaluation and mitigation of this trade-off deserve further study. Nonlinear Monte Carlo methods may help to enable global searching in the constrained model space, at the cost of large computing resources. This study has demonstrated the framework of correlation wavefield modeling for vertical-component seismic data. However, radial/transverse kernel functions could also be constructed based on proper component rotations (Lin et al. 2008; Wang et al. 2019; Xu et al. 2019), which would extend the scope of this work to multicomponent seismic data as well as to the recently developed fiber-optic sensing data (Ajo-Franklin et al. 2019; Song et al. 2021).

7 Conclusions

In this study, we describe a new algorithm, ModAS, for source-velocity joint imaging based on ambient noise waveform inversion. This study provides a possible solution for decoupling noise sources and velocity structures in passive seismic surveys, and we further present applications of the algorithm. We establish novel model parameterizations for the interstation path velocity and joint inversion schemes based on the partial derivatives with respect to the two kinds of parameters. We build the basic framework of ModAS (Fig. 20) as an integrated workflow and verify its modules by synthetic tests. ModAS improves the applicability of ambient noise cross-correlation waveform inversion theory to source-velocity joint imaging in small-scale studies. It has been evaluated with the field datasets from local-scale seismic experiments in the Hangzhou urban area.

Fig. 20

Algorithm architecture of ModAS