1 Introduction

The large number of moderate-size earthquakes (M4–M6.5) around the globe provides valuable information about lithospheric mechanical properties, Earth’s structure and seismogenic processes. Earthquake focal depth is a particularly useful parameter. For example, Stein and Wiens (1986) reported progress in shallow earthquake depth determination and presented analyses using focal depth information to infer lithospheric properties. By comparing the observed earthquake depth distribution with the depth profile predicted from temperature-dependent rheology, they demonstrated that focal depth provides helpful constraints on the thermal–mechanical structure of the crust and upper mantle. Improved resolution of earthquake depths also contributes to better delineation of subducted plate geometry (Engdahl et al. 1998). More recently, focal depths have provided crucial information on the seismogenesis of injection-induced earthquakes (Keranen et al. 2013; Kim 2013).

Various methods have been developed to determine focal depth, using arrival times of direct body waves (P or S waves), differential times between depth phases and direct phases, or full waveform inversion in the time or frequency domain. For earthquakes recorded by dense local seismic networks, focal depth can be well determined with P and S arrival times. However, some regions of high seismicity are still covered only by sparse networks (e.g., Tibetan Plateau, Zagros collision zone), and focal depths of earthquakes in these regions are often inaccurate unless P and S times are available from a close station at a distance less than twice the depth (Mori 1991). Moreover, the accuracy of focal depth from teleseismic P and S arrivals is limited by the trade-off with origin time (Billings et al. 1994). In contrast, differential times between teleseismic P waves and depth phases (pP, sP, pwP, etc.) provide tight constraints on focal depth (Kind and Seidl 1982; Nyblade and Langston 1995), but teleseismic depth phases are clearly observable on multiple stations only for relatively large earthquakes (M5.5+). For smaller earthquakes recorded by local networks, methods based on time-domain waveform inversion, such as the Cut-and-Paste (CAP) method (Zhao and Helmberger 1994), have been developed to determine source parameters including centroid depth. Because independent time-shifts and different weights are adopted for Pnl (the waveform in the time window between the Pn and Pl waves) and surface waves, CAP inversion is robust against inaccurate velocity models and produces reliable focal mechanisms (Zhu and Helmberger 1996). Therefore, CAP has been widely applied to earthquakes in Alaska, California, the Tibetan Plateau and other regions (Zhu and Helmberger 1996; Zhu et al. 2006; Tape et al. 2013).

However, CAP does not use the complete frequency band; therefore, it provides only partial constraints on source parameters such as focal depth. When the seismic network is dense, the incomplete information due to the limited band in CAP can be compensated by the abundance of data from many stations. For a sparse network, CAP may not provide tight depth constraints. For example, the waveform misfit versus depth curve is almost flat beyond 10 km in the CAP inversion of the 2008 Wells, Nevada earthquake when only three stations are used (Fig. 1). To improve the resolution of centroid depth, the CAP algorithm can be combined with another inversion scheme in the frequency domain. Fundamental mode Rayleigh wave amplitude spectra (RWAS) have been demonstrated to be sensitive to centroid depth (Tsai and Aki 1970; Nguyen and Herrmann 1992). To illustrate this sensitivity, a comparison between synthetic local (within epicentral distance of 5°) waveforms and regional (epicentral distance of 5°–15°) fundamental mode Rayleigh wave amplitude spectra is presented in Fig. 2. For centroid depths of 10, 15 and 20 km, broadband vertical component seismograms show observable differences (Fig. 2a), but waveforms band-pass filtered in the frequency range of 0.02–0.1 Hz are very similar (Fig. 2b). In contrast, fundamental mode Rayleigh wave amplitude spectra show obvious sensitivity to centroid depth, as the period of the “spectral nulls” changes as depth increases (Fig. 2c). Hence the inclusion of Rayleigh wave amplitude spectra should improve the resolution of centroid depth determination.

Fig. 1 CAP source parameter inversion with 3 local stations. a Epicenter of the 2008 Nevada earthquake (beachball) and seismic stations (triangle). b Scaled misfit errors and optimal source mechanisms at each depth. Focal mechanism of each depth is indicated with the beach ball

Fig. 2 Comparison of synthetic waveforms and Rayleigh wave amplitude spectra (RWAS) at different focal depths for station H07A (epicentral distance of 4.90°) of the TA network for the Nevada earthquake. a Broadband vertical component synthetic waveforms. b Vertical component synthetic waveforms band-pass filtered between 0.02 and 0.1 Hz. c Fundamental mode RWAS extracted from the waveforms with do_mft. The black line in each plot denotes the amplitude spectra at the corresponding depth (10 km for the left panel, 15 km for the middle panel, and 20 km for the right panel), while gray lines indicate amplitude spectra at the other depths

To determine centroid depth with Rayleigh wave amplitude spectra, we need reliable focal mechanisms. Figure 3 shows the bias in the best-fitting depths determined from Rayleigh wave amplitude spectra due to perturbations in dip angle, for the case of the 2008 Nevada earthquake. When the dip is perturbed by ±20°, the deviation from the true depth increases with input depth, and reaches up to 15 km (Fig. 3a, b). The estimated depth is smaller than the input depth for both positive and negative dip perturbations, probably because the Nevada earthquake is almost a pure 45° dip-slip event. This interpretation is supported by Fig. 3b, which shows a maximum of depth around a dip of 45°. CAP has been demonstrated to retrieve reliable focal mechanisms, even when it does not constrain centroid depth well. Therefore, Rayleigh wave amplitude spectra can be naturally combined with CAP inversion of local waveforms to achieve better constraints on both focal mechanism and centroid depth.

Fig. 3 Trade-off between RWAS best-fitting depths and dip angle perturbations. The earthquake and station are the same as those in Fig. 2. a Bias of centroid depth with assumed dip perturbations. Squares (open for +20° perturbation, solid for −20°) indicate the deviation between determined and input focal depth. b Sensitivity of RWAS best-fitting depth to dip angle variations. The input depths are all 15 km. The large circle denotes the optimal depth determined with all correct source parameters

In this paper, we propose a method, named CAP_RWAS, that jointly inverts three-component local seismic waveforms and regional fundamental mode Rayleigh wave amplitude spectra for centroid depth, to better constrain depths of earthquakes in sparse networks. In the next two sections, we describe the procedures of this method, and then determine centroid depths of the well-studied Mw6.0 Wells, Nevada earthquake on Feb. 21st, 2008 and a Mw5.6 earthquake in the outer-rise region of the Japan Trench on Oct. 27th, 2013 as case studies. We adopt a bootstrapping algorithm to quantify the improvement in depths estimated with joint inversion. We also verify the accuracy of the retrieved centroid depth with teleseismic waveform data, and assess the dependency of joint inversion on velocity model and focal mechanism.

2 Data and Method

2.1 Data

In this paper, we choose the Mw6.0 Nevada earthquake on Feb. 21st, 2008 and the Mw5.6 east of Japan earthquake on Oct. 27th, 2013 as case studies. The Nevada earthquake was recorded by the dense EarthScope Transportable Array (TA), and its depth and focal mechanism have been well studied (Dreger et al. 2011; Chen et al. 2015). Taking these results as reference, we can assess the performance of the CAP_RWAS joint inversion. Furthermore, the dense coverage of TA seismic stations allows us to assess, with a bootstrapping technique, how robust the estimated centroid depth is when only a sparse network is available. The 2013 outer-rise earthquake is selected because accurate centroid depths of outer-rise events are important for understanding slab bending and how deep the oceanic plate can fracture before subduction, which may affect the level of hydration and serpentinization in the subduction process (Ranero et al. 2003). However, depths of outer-rise events are often difficult to determine with only body wave arrivals because land-based stations may be hundreds of kilometers away from the epicenters and are usually on one side with limited azimuthal coverage (Hino et al. 2001). Studying the centroid depth of the outer-rise earthquake with the CAP_RWAS method may therefore indicate the depth accuracy achievable with joint inversion under such conditions.

We use seismic waveform data in three distance ranges including local (within epicentral distance of 5°), regional (epicentral distance of 5°–15°) and teleseismic (epicentral distance of 30°–90°) ones. Data at teleseismic distances are collected for assessing the depths retrieved from the joint inversion, because centroid depth constrained with teleseismic depth phases is usually believed to be accurate. We request waveform data from IRIS data center (http://www.iris.edu/hq/), and then remove instrumental responses and linear trends of waveforms, followed by rotation to radial and transverse components of velocity seismograms.
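The detrending and rotation steps described above can be sketched in plain Python. This is only a minimal illustration, not the processing code used in the study: the function names `remove_linear_trend` and `rotate_ne_to_rt` are hypothetical, the back azimuth is assumed to be measured clockwise from north at the station toward the epicenter, and the radial component is assumed positive away from the source. Instrument-response removal is left out because it requires station metadata.

```python
import math


def remove_linear_trend(trace):
    """Subtract the least-squares straight line from a list of samples."""
    n = len(trace)
    if n < 2:
        return list(trace)
    x_mean = (n - 1) / 2.0
    y_mean = sum(trace) / n
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(trace))
    slope = sxy / sxx
    return [y - (y_mean + slope * (i - x_mean)) for i, y in enumerate(trace)]


def rotate_ne_to_rt(north, east, back_azimuth_deg):
    """Rotate north/east horizontals to radial/transverse components.

    Assumed convention: radial points away from the source, and the
    back azimuth is given in degrees clockwise from north.
    """
    ba = math.radians(back_azimuth_deg)
    radial = [-e * math.sin(ba) - n * math.cos(ba) for n, e in zip(north, east)]
    transverse = [-e * math.cos(ba) + n * math.sin(ba) for n, e in zip(north, east)]
    return radial, transverse
```

With a source due south of the station (back azimuth 180°), northward ground motion maps entirely onto the radial component under this convention.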

2.2 Method

The CAP_RWAS algorithm includes three steps: (1) CAP waveform inversion for source parameters, (2) Rayleigh wave amplitude spectra modeling, and (3) introduction of a joint depth-error function. These three steps are described below.

2.2.1 Waveform Inversion for Focal Mechanisms

In CAP_RWAS, the Cut-and-Paste (CAP) method is adopted for source parameter inversion. The CAP method is capable of resolving robust source parameters for moderate earthquakes (Zhu and Helmberger 1996), since it allows independent time-shifts of synthetic body and surface waves to accommodate travel time variations caused by inaccurate velocity models as well as errors in earthquake locations and origin times. Even for earthquakes recorded by only 2–3 local stations, CAP has been demonstrated to yield reliable source mechanisms (Zhao and Helmberger 1994; Tan et al. 2006).

In CAP inversions in this study, waveforms are integrated to displacement seismograms, then cut to Pnl and surface wave windows, and band-pass filtered between 0.02 and 0.1 Hz (e.g., Chu et al. 2011; Herrmann et al. 2011; Chen et al. 2015). In the calculation of Green’s functions, we use the 1D velocity model from CRUST2.0 (Laske et al. 2001) at the epicenter, and apply a frequency-wavenumber method (Zhu and Rivera 2002). Then, CAP performs a grid search for optimal double couple focal mechanism (strike, dip and rake angle), moment magnitude and centroid depth. The CAP misfit function is quantified by L2-norm misfit over all waveform windows (waveform window number n), as

$${\text{Ec}}(h) = \sum\limits_{i = 1}^{n} {\left( {\frac{{r_{i} }}{{r_{o} }}} \right)}^{p} \cdot \parallel u_{i} (h) - s_{i} (h)\parallel ,$$
(1)

in which \(r_{i}\) and \(r_{o}\) stand for the epicentral distance of the ith station and a reference distance, respectively. \(u_{i} (h)\) and \(s_{i} (h)\) are the data and synthetic waveforms of the ith station for a centroid depth of h, respectively, and p is a scaling factor that equalizes the contributions of waveform traces whose amplitudes differ because of differences in epicentral distance. In this study, p is 1.0 and 0.5 for Pnl and surface waves, respectively.
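As a rough illustration of Eq. (1), the distance-weighted L2 misfit could be evaluated as in the sketch below. The function names and the reference distance of 100 km are assumptions made for the example (the text does not state \(r_{o}\)), and the time-shift search that CAP performs before measuring the misfit is omitted.

```python
import math


def l2_norm(data, synthetic):
    """L2 norm of the residual between two equal-length traces."""
    return math.sqrt(sum((u - s) ** 2 for u, s in zip(data, synthetic)))


def cap_misfit(windows, r0_km=100.0):
    """Distance-weighted misfit summed over waveform windows, as in Eq. (1).

    windows: iterable of (distance_km, p, data, synthetic), where p is
    1.0 for Pnl windows and 0.5 for surface-wave windows in this study.
    r0_km is a hypothetical reference distance.
    """
    total = 0.0
    for dist_km, p, data, synthetic in windows:
        total += (dist_km / r0_km) ** p * l2_norm(data, synthetic)
    return total
```

Evaluating `cap_misfit` for the synthetics of each trial depth gives the depth-misfit curve Ec(h).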

2.2.2 Constraints on Centroid Depth with Rayleigh Wave Amplitude Spectra Modeling

We measure fundamental mode Rayleigh spectra from the vertical components of observed seismograms. We first apply multiple filter analysis (MFT) using the do_mft program in the CPS package. In MFT, the filter with a center frequency \(\omega_{0}\) is defined as

$$H(\omega ) = \exp \left[ { - \frac{{\alpha (\omega - \omega_{0} )^{2} }}{{\omega_{0}^{2} }}} \right].$$
(2)

According to Levshin et al. (1992), a larger α is needed at longer distances to obtain reliable measurements of spectral amplitudes; hence α is chosen following the CPS package manual (Herrmann and Ammon 2004).
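The role of α in Eq. (2) can be seen with a small numerical sketch. The function names are hypothetical and the α values in the usage note are purely illustrative, not the values tabulated in the CPS manual.

```python
import math


def mft_gaussian(omega, omega0, alpha):
    """Gaussian narrow-band filter response of Eq. (2)."""
    return math.exp(-alpha * (omega - omega0) ** 2 / omega0 ** 2)


def relative_half_width(alpha):
    """Relative offset |omega - omega0| / omega0 at which H drops to 1/e."""
    return 1.0 / math.sqrt(alpha)
```

For example, `relative_half_width(25.0)` is 0.2 while `relative_half_width(100.0)` is 0.1, so a larger α corresponds to a narrower filter around the center frequency.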

Once the observed Rayleigh amplitude spectra are obtained, we calculate synthetic Rayleigh spectra with the focal mechanism inverted with CAP and a range of centroid depths. An efficient semi-analytical approach to computing the theoretical fundamental mode Rayleigh wave eigenfunctions is the Haskell propagator matrix method (Haskell 1953). The Computer Programs in Seismology (CPS) package is one of the widely used tools that implement this propagator matrix method (Herrmann and Ammon 2004); thus we adopt this software to calculate the theoretical fundamental mode Rayleigh wave eigenfunctions and amplitude spectra for a 1D layered structure model.

Then we perform a grid search over a depth range to find the optimal depth. The misfit of Rayleigh spectra is defined by the L1 norm of logarithmic spectral amplitudes, because “spectral nulls” are observed to be more pronounced on a logarithmic scale. The short period end of the Rayleigh amplitude spectra is set to 15 s, since shorter period Rayleigh waves are more affected by structural complexity and attenuation. The long period end is chosen according to the rule that the epicentral distance should be larger than three times the wavelength (Bensen et al. 2007). Spectral amplitudes at periods longer than 100 s are not used, since “spectral nulls” are unlikely to occur at such long periods for crustal earthquakes. Then the Rayleigh spectra misfit of the ith station at centroid depth h is calculated from

$${\text{Es}}_{\text{sta}} (h,i) = \sum\limits_{{T = T_{\text{start}} }}^{{T_{\text{end}} }} {\frac{{\left| {\log_{10} \left( {\frac{{A_{\text{obs}} (T)(i)}}{{A_{\text{syn}} (T)(i)}}} \right)} \right|}}{k}} ,$$
(3)

in which \(T_{\text{start}}\) and \(T_{\text{end}}\) denote the starting and ending periods of the amplitude spectra, and k denotes the number of periods. \(A_{\text{obs}} (T)\) and \(A_{\text{syn}} (T)\) denote the observed and synthetic Rayleigh spectral amplitudes at period T, respectively. For all m regional stations, misfit values are summed as

$${\text{Es}}(h) = \sum\limits_{i = 1}^{m} {{\text{Es}}_{\text{sta}} (h,i),}$$
(4)

where \({\text{Es}}(h)\) is defined as Rayleigh wave amplitude spectra (RWAS) misfit error function with respect to centroid depth.
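Equations (3) and (4), together with the period-band rule, might be coded along the following lines. This is a sketch under stated assumptions: the function names are hypothetical, and a single phase velocity of 4 km/s is assumed for converting the three-wavelength rule into a maximum period, whereas in practice the phase velocity varies with period.

```python
import math


def period_band(dist_km, phase_velocity_kms=4.0, t_short=15.0, t_cap=100.0):
    """Usable period range for one station.

    The long-period end satisfies distance > 3 wavelengths, with
    wavelength = phase velocity * period (the velocity is an assumed
    constant), and is capped at 100 s as in the text.
    """
    t_long = min(t_cap, dist_km / (3.0 * phase_velocity_kms))
    return t_short, t_long


def station_misfit(a_obs, a_syn):
    """Mean absolute log10 amplitude ratio over k periods, as in Eq. (3)."""
    k = len(a_obs)
    return sum(abs(math.log10(o / s)) for o, s in zip(a_obs, a_syn)) / k


def rwas_misfit(stations):
    """Sum of station misfits over all regional stations, as in Eq. (4).

    stations: iterable of (a_obs, a_syn) pairs of amplitude lists.
    """
    return sum(station_misfit(a_obs, a_syn) for a_obs, a_syn in stations)
```

Under the assumed velocity, a station at 600 km would use periods of 15 to 50 s, while a station at 1500 km would reach the 100 s cap.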

If seismic stations are sparse, the moment magnitude that best fits the local waveforms and the magnitude that best fits the regional RWAS can differ. Such a difference is probably due to structural complexities along the different paths to local and regional stations (e.g., inaccurate attenuation models). Therefore, the CAP moment magnitude needs to be adjusted when fitting the amplitude spectra; otherwise the best spectra fitting depth could deviate significantly from the true depth. For example, CAP inversion for the Mw5.4 Illinois earthquake on Apr. 18th, 2008 yields a moment magnitude of 5.14 (Fig. 4a). When we use this magnitude to compute synthetic Rayleigh spectra, we obtain a best spectra fitting depth of 9 km. However, the synthetic amplitude spectra for a depth of 9 km do not fit the data well (Fig. 4b). This misfit is due to the underestimation of moment magnitude in the CAP inversion. To avoid this problem, the CAP magnitude is allowed to shift when modeling Rayleigh spectra. That is, we grid search simultaneously for the depth and for the moment magnitude, near the value estimated from CAP, that give the best spectra fit. We performed such a search for the Illinois earthquake, and obtained a best-fitting centroid depth and moment magnitude of 14 km and 5.27, respectively (Fig. 4c). Using the optimized magnitude, we observed an improved RWAS fit (Fig. 4d). This shows that the centroid depth of 14 km is better resolved in this case. Therefore, a relaxed moment magnitude, rather than the CAP magnitude itself, is adopted in the joint inversion method.
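The simultaneous search over depth and magnitude can be sketched as follows. The sketch assumes that synthetic spectra have been computed once per depth at the CAP magnitude, and it uses the standard scaling in which spectral amplitude is proportional to seismic moment, so that a magnitude change ΔMw shifts log10 amplitude by 1.5 ΔMw. The function names and the toy numbers in the usage note are hypothetical.

```python
import math


def shifted_log_misfit(a_obs, a_syn, d_mw):
    """Mean absolute log10 misfit after shifting synthetics by d_mw.

    Amplitudes scale with moment, so log10 amplitude shifts by 1.5 * d_mw.
    """
    shift = 1.5 * d_mw
    k = len(a_obs)
    return sum(abs(math.log10(o / s) - shift) for o, s in zip(a_obs, a_syn)) / k


def search_depth_and_magnitude(a_obs, syn_by_depth, mw_cap, d_mw_values):
    """Grid search over depth and magnitude shift around the CAP magnitude.

    syn_by_depth maps depth (km) to synthetic amplitudes computed at mw_cap.
    Returns (misfit, depth, magnitude) of the best-fitting grid point.
    """
    best = None
    for depth, a_syn in syn_by_depth.items():
        for d_mw in d_mw_values:
            misfit = shifted_log_misfit(a_obs, a_syn, d_mw)
            if best is None or misfit < best[0]:
                best = (misfit, depth, mw_cap + d_mw)
    return best
```

In a toy case where the observations equal the 14 km synthetics scaled by a factor of 10^0.15, the search returns 14 km with a magnitude 0.1 units above the starting value, whereas fixing the magnitude would bias the depth choice.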

Fig. 4 Trade-off between focal depth and moment magnitude in RWAS modeling. a Waveform fit in CAP inversion for the 2008 Mw5.4 Illinois earthquake. Red and black lines represent synthetic and observed seismic waveforms, respectively. The numbers below seismograms are cross-correlation coefficients in percentage. The beach ball indicates the estimated source parameters of 295°/85°/8°/5.14 for strike, dip, rake and moment magnitude. The black triangles inside the beach ball denote P wave polarity projected on the lower hemisphere. b RWAS fit using CAP moment magnitude of 5.14. Black points indicate observed data, while solid lines denote synthetics for different depths. Numbers on lines indicate corresponding depths. The red solid line denotes the synthetic RWAS of the best-fitting centroid depth. Azimuths and distances of the regional stations used are presented in the top right corner. Boxes below the plots show the time-domain Rayleigh waveform fits for the estimated depth of 9 km in this case. Red and black lines denote synthetic and observed waveforms, respectively. All waveforms are filtered between 0.01 and 0.067 Hz. c RWAS misfits for all searched moment magnitudes and centroid depths. The orange circles denote estimated best-fitting centroid depths for all moment magnitudes. The larger square and circle indicate the determined depth of 9 km using CAP moment magnitude, and the globally optimized depth of 14 km, respectively. d Same as (b), but using the optimized moment magnitude of 5.27. The best-fitting depth is 14 km

2.2.3 The Joint Misfit Function

The CAP_RWAS method comprises 3 steps as illustrated in the flow chart (Fig. 5). In step 1, we obtain source parameters with CAP waveform inversion. The focal mechanism is then fixed to get waveform misfits versus depth. In step 2, we obtain RWAS depth-misfit function via calculation of the spectral amplitudes misfits for a grid of depths and moment magnitudes. In step 3, we define a joint depth-misfit function \({\text{Ej}}(h)\) by combining waveform and amplitude spectra misfits. To balance their contributions, we normalize CAP and RWAS residuals as

$$\begin{aligned} {\text{Ec}}_{\text{Norm}} (h) &= \frac{{{\text{Ec}}(h)}}{{{\text{Ec}}\left| {_{\hbox{max} } } \right. - {\text{Ec}}\left| {_{\hbox{min} } } \right.}} \\ {\text{Es}}_{\text{Norm}} (h) &= \frac{{{\text{Es}}(h)}}{{{\text{Es}}\left| {_{\hbox{max} } } \right. - {\text{Es}}\left| {_{\hbox{min} } } \right.}}, \end{aligned}$$
(5)

where \(E\left| {_{\hbox{max} } } \right.\) and \(E\left| {_{\hbox{min} } } \right.\) stand for maximum and minimum misfits of searched depths. The joint depth-misfit function is defined as

$${\text{Ej}}(h) = w_{1} \cdot {\text{Ec}}_{\text{Norm}} (h) + w_{2} \cdot {\text{Es}}_{\text{Norm}} (h),$$
(6)

where \(w_{1}\) and \(w_{2}\) are the weighting coefficients. In this paper, we set \(w_{1} = w_{2} = 1\). The optimal centroid depth is determined by finding the minimum of \({\text{Ej}}(h)\) as a function of h.
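A compact sketch of Eqs. (5) and (6) is given below, with each misfit curve stored as a dictionary keyed by depth. The function names are hypothetical, and the sketch assumes that both curves are sampled at the same depths.

```python
def normalize(misfit_by_depth):
    """Divide a depth-misfit curve by its range (max minus min), Eq. (5)."""
    values = list(misfit_by_depth.values())
    span = max(values) - min(values)
    if span == 0:
        raise ValueError("flat misfit curve cannot be normalized")
    return {h: e / span for h, e in misfit_by_depth.items()}


def joint_depth(ec, es, w1=1.0, w2=1.0):
    """Weighted sum of normalized CAP and RWAS misfits, Eq. (6).

    Returns the depth minimizing the joint misfit and the joint curve.
    """
    ec_norm = normalize(ec)
    es_norm = normalize(es)
    ej = {h: w1 * ec_norm[h] + w2 * es_norm[h] for h in ec}
    best = min(ej, key=ej.get)
    return best, ej
```

Dividing by the range rather than by the maximum means that a curve with a weak minimum still spans a unit range after normalization, so neither dataset dominates simply because its raw misfit values are larger.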

Fig. 5 Flowchart of the joint inversion method

2.2.4 Teleseismic Body Wave Modeling for Centroid Depth

To confirm the accuracy of joint inversion depths, teleseismic body waves are used to provide independent verification, by comparing observed vertical component P waves and synthetics of different depths. The calculation of synthetics is based on the propagator matrix method with plane wave approximation (Kikuchi and Kanamori 1991). In the synthetics calculation, CAP source parameters are used since they are not sensitive to depth (e.g., Fig. 1a). CRUST2.0 and IASPEI91 (Kennet 1991) wave velocity models are used for crust and mantle, respectively.

3 Case Studies of the Joint Inversion

3.1 The 2008 Mw6.0 Nevada Event

We used seismic data from 168 local and 189 regional TA stations for the depth determination. These stations are dense in both azimuth and distance; hence they are capable of providing robust constraints on source parameters. We take the centroid depth determined with the whole dataset as the “true” depth, and use it to assess inversion results with sparse stations. An inversion with three local stations and four regional stations is then presented to show the improved depth resolution of joint inversion. Finally, a bootstrapping test (Tichelaar and Ruff 1989) was applied to statistically assess the depth uncertainty of the joint inversion.

With CAP waveform inversion alone, using all local stations, we determined a best-fitting depth of 8 km, a strike/dip/rake of 211°/50°/−86° and a moment magnitude of 6.01. These source parameters are consistent with the Global CMT, SLU (Saint Louis University) and USGS body wave double couple solutions (Table 1). We then performed joint inversion using both local waveforms and regional fundamental mode Rayleigh amplitude spectra, and again obtained a centroid depth of 8 km, indicating that both CAP and joint inversion depths are well determined with abundant seismic data. The best spectra fitting moment magnitude was also 6.01, which is consistent with the CAP solution.

Table 1 Source parameters and depth of the 2008 Nevada earthquake obtained from CAP inversion and solution from other institutions

We then applied joint inversion to the Nevada earthquake with only three local stations and four regional stations. These stations were selected to give good azimuthal coverage (Fig. 6a). With CAP waveform inversion, the centroid depth was determined to be 11 km, 3 km deeper than the result obtained from inversion using all local stations. Source parameters were estimated as 212°/50°/−84°/6.06 for strike/dip/rake angle and moment magnitude. The mechanism is very close to that obtained from inversion with all local stations. This is consistent with previous findings that CAP inversion with three local stations can reliably resolve focal mechanisms (Zhao and Helmberger 1994; Tan et al. 2006). We then applied joint inversion and obtained an optimal centroid depth of 8 km. The joint misfit function shows sharper convergence than the CAP misfit function (Fig. 6b). This demonstrates that joint inversion provides tighter constraints on centroid depth than the traditional CAP method in this case. Waveform fits for the estimated optimal source parameters are displayed in Fig. 6c. The high waveform cross-correlation coefficients of Pnl and Rayleigh waves suggest a well-resolved focal mechanism. Love wave segments show larger misfit (e.g., station I10A) than other segments, which is probably caused by structural complexity affecting the transverse component. The fit between data and synthetic Rayleigh spectra is shown in Fig. 6d. The best fit of the Rayleigh spectra, especially of the periods of the “spectral nulls”, was observed for a depth of 8 km. Taking station A05A as an example, the position of the “spectral null” in the synthetic amplitude spectra migrates to longer periods as centroid depth increases. When the depth equals 8 km, the period of the “spectral null” in the synthetic spectra matches the observed data. For this case of sparse stations, the jointly estimated depth is 8 km, the same as the “true” depth based on the entire dataset.

Fig. 6 Centroid depth determination for the Nevada earthquake with CAP_RWAS method. a Locations of the Nevada earthquake epicenter (star) and seismic stations (triangle). The inset shows local stations in the black box. b Normalized misfits of CAP, RWAS and joint inversion as a function of depth. Largest symbols denote the optimal depths estimated using the corresponding datasets. c Local waveform fit for optimal depth and focal mechanism. Black and gray lines represent observed seismic waveform data and synthetics, respectively. The numbers below seismograms are cross-correlation coefficients in percentage. d Same as Fig. 4b, but for the Nevada earthquake. Observed amplitude spectra are indicated by gray points. Time-domain Rayleigh waveforms and synthetics are represented by black and gray lines, respectively

To statistically assess the accuracy of joint inversion using a sparse network, we applied bootstrapping tests in which stations (2–4 local and 2–4 regional) were randomly selected. To balance the fraction of the whole sampling space covered by re-sampled cases against computational cost, we re-sampled 2, 3 and 4 local CAP stations 100, 600 and 1000 times, respectively. For comparison, bootstrapping tests of CAP inversion alone were also performed. Distributions of the estimated centroid depths are shown in Fig. 7. For cases using the same number of local stations, joint inversion provides a more compact distribution of centroid depth around 8 km than CAP inversion. For example, only 41 % of CAP inversions using 2 local stations determined a centroid depth of 8 km, while 64 % of joint inversions using 2 local and 4 regional stations led to the same optimal result. Additionally, we quantify the depth errors by estimating the 95 % confidence limit of depth. Since the depth distributions are probably not Gaussian, this confidence limit is determined by eliminating 2.5 % of the estimates from each of the shallow and deep ends of the distribution (Zhan et al. 2012). Smaller depth errors were observed for joint inversion than for CAP inversion when the numbers of local stations are the same. In addition, as the number of regional RWAS stations increases, the percentage of inversions yielding the accurate depth of 8 km becomes higher. This means that better estimates of centroid depth can be achieved by increasing the number of regional stations used in the joint inversions.
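The resampling test and the percentile-based confidence limit can be sketched as below. The sketch assumes that stations are drawn without replacement for each trial, which the text does not state explicitly, and `invert` is a hypothetical placeholder for a routine that returns a depth for a given station subset.

```python
import random


def confidence_limits(depths, tail=0.025):
    """Drop a fraction `tail` of the estimates from each end; return the bounds."""
    ordered = sorted(depths)
    n_cut = int(len(ordered) * tail)
    kept = ordered[n_cut:len(ordered) - n_cut]
    return kept[0], kept[-1]


def resample_depths(local, regional, n_local, n_regional, n_trials, invert, seed=0):
    """Run `invert` on random station subsets and collect the depths.

    `invert(local_subset, regional_subset)` is a hypothetical callable
    standing in for the joint inversion; it must return a depth in km.
    """
    rng = random.Random(seed)
    depths = []
    for _ in range(n_trials):
        loc = rng.sample(local, n_local)
        reg = rng.sample(regional, n_regional)
        depths.append(invert(loc, reg))
    return depths
```

Because the limits are read from the sorted estimates rather than from a fitted standard deviation, no Gaussian assumption is needed.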

Fig. 7 Histograms of centroid depth with random choices of stations. In each plot, the station selection is indicated in the upper left corner. For example, “2CAP” means CAP inversion with 2 randomly chosen local stations, and “C2R2” means joint inversion with 2 CAP stations and 2 RWAS stations. The percentage of optimal focal depth (“true depth” estimated using all stations) in each plot is indicated in the upper right corner. Black dots represent the “true depth”. Black squares and gray lines indicate the corresponding 95 % confidence limit. a Histograms of depth determined by CAP inversion. b–d Histograms of depth determined by joint inversion using both CAP and RWAS stations

3.2 The 2013 Mw5.6 East of Japan Event

Following the same data-processing procedure as for the Nevada earthquake, we applied joint inversion to the 2013 Mw5.6 east of Japan earthquake. As the study region features substantial 3D heterogeneity including the trench, continental crust and oceanic crust, we filtered the waveforms at longer periods in the inversion to suppress 3D effects. The local waveforms were filtered between 0.02 and 0.05 Hz, and the shortest period of the Rayleigh spectra was 20 s. All available local waveforms at 16 stations and regional fundamental mode Rayleigh wave amplitude spectra at 49 regional stations of F-network were used in this study (Fig. 8a). Using the whole dataset, the centroid depth of the earthquake was determined to be 26 km. The optimal focal mechanism was estimated to be 204°/50°/−80° for strike/dip/rake angle, which is consistent with the source parameters from the Global CMT and USGS body wave solutions (Table 2). Local waveform fits of data and synthetics are displayed in Fig. 8b. Fits of body waves are better than those of surface waves, probably because surface waves are more sensitive to shallow heterogeneities sampled by paths across the forearc. Fits of fundamental mode RWAS at four stations are presented in Fig. 8c. The positions of the “spectral nulls” in these fits provide tight constraints on centroid depth. For all RWAS stations, the average misfit value defined by Eq. (3) is shown in Fig. 8a. The average spectral misfit of most stations is less than 0.3, indicating consistency between observed and synthetic Rayleigh wave amplitude spectra.

Fig. 8 Centroid depth determination for the Japan earthquake with CAP_RWAS method. a Location of the earthquake epicenter (beachball) and the seismic stations used (triangles and squares). Local stations for waveform inversions are denoted by triangles, while stations used for RWAS modeling are indicated by squares. Gray scale of squares represents RWAS misfit. Gray line delineates the boundary of the Pacific Plate. b Same as Fig. 6c, but for the Japan earthquake. c RWAS fits of selected stations. Locations of these stations are indicated in (a). The symbols are defined in the same way as in Fig. 6d. d Histograms of centroid depths determined with CAP and CAP_RWAS inversion, respectively. The symbols are defined in the same way as in Fig. 7

Table 2 Source parameters and depths of east of Japan earthquake

To assess the improvement of depth estimates in joint inversion, we conducted a bootstrapping test by randomly re-sampling 4 local and 4 regional stations. Depth histograms of CAP and joint inversion are displayed in Fig. 8d. With simulated sparse stations, the largest proportion of inverted depths is at 25 km, 1 km shallower than the centroid depth from joint inversion with all stations. This is probably caused by an inaccurate dip angle (Fig. 3a). Only 14 % of the CAP inversions determined a depth of 25 km, whereas 22 % of the joint inversions resolved a depth of 25 km, the most probable value. We also measured the 95 % confidence limit in the same way as for the Nevada earthquake. The depth error of joint inversion was 5 km, much smaller than the CAP depth error of 9 km (Fig. 8d), suggesting improved accuracy of centroid depth determination with joint inversion.

3.3 Validation of the Inverted Depth with Teleseismic Depth-Phase Modeling

It has been demonstrated that teleseismic depth phases (pP, sP, etc.) are particularly sensitive to focal depth; therefore, previous researchers have adopted these phases in determining centroid depths of moderate-size earthquakes (e.g., Chen and Molnar 1983; Stein and Wiens 1986; Fox et al. 2012). To confirm the accuracy of the centroid depths determined with our joint inversion, we applied teleseismic body wave modeling to both the Nevada and Japan outer-rise earthquakes. For the Nevada earthquake, vertical displacement seismograms and synthetics were filtered between 0.1 and 0.5 Hz. Synthetics for different depths and data are displayed in Fig. 9a. Because of the shallow depth and relatively long source duration of the Mw6.0 earthquake, the P, pP and sP phases overlap with each other, and their arrival times are difficult to measure. Despite such interference, synthetic waveforms for different depths are still distinguishable. The best fit between data and synthetics occurs when the centroid depth equals 8 km. This result is consistent with the depth obtained by joint inversion.
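To see why the depth phases of a shallow event arrive so close to the direct P wave, the delays can be estimated with the standard expressions for a uniform layer above the source, pP − P ≈ 2hη_α and sP − P ≈ h(η_α + η_β), where η = sqrt(1/v² − p²) is the vertical slowness and p the ray parameter. The sketch below is illustrative only: the velocities and the ray parameter are assumed round values, not parameters taken from this study, and the function names are hypothetical.

```python
import math


def vertical_slowness(velocity_kms, ray_param_s_per_km):
    """Vertical slowness (s/km) for a given wave speed and ray parameter."""
    return math.sqrt(1.0 / velocity_kms ** 2 - ray_param_s_per_km ** 2)


def depth_phase_delays(depth_km, vp=6.0, vs=3.5, ray_param=0.06):
    """Approximate pP-P and sP-P delays (s) for a uniform layer above the source.

    vp, vs (km/s) and ray_param (s/km) are assumed illustrative values.
    """
    eta_p = vertical_slowness(vp, ray_param)
    eta_s = vertical_slowness(vs, ray_param)
    pp_minus_p = 2.0 * depth_km * eta_p
    sp_minus_p = depth_km * (eta_p + eta_s)
    return pp_minus_p, sp_minus_p
```

With these assumed values, a source at 8 km depth gives delays of only a few seconds, which helps explain why the phases interfere when the source duration is of comparable length. The delays grow linearly with depth, so the phases separate for deeper events.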

Fig. 9 Centroid depth verifications using teleseismic body waves. a Vertical component P waveform fit of stations ALE, MDJ and OTAV for the Nevada earthquake. The black and red lines indicate observed and synthetic waveforms, respectively. The bold black line indicates the waveform fit for the optimal depth. b Fit of observed and synthetic teleseismic P waves for the Japan earthquake. The black and red lines denote observed and synthetic P waves for a centroid depth of 32 km. Three dashed lines indicate seismic phases identified as P, pP, and pwP, respectively

For the 2013 east of Japan outer-rise earthquake, depth phases can be directly identified because of its deeper focal depth. P waves in velocity and corresponding synthetics of different depths were filtered between 0.3 and 1.0 Hz and compared, leading to best-fitting centroid depth of 32 km (Fig. 9b). Because the velocity model used in CAP_RWAS inversion does not contain a water layer, the joint inversion estimated depth of 26 km actually denotes the vertical distance from seafloor to the earthquake centroid. Since the water thickness is about 6 km at the epicenter (according to CRUST2.0 model), the actual centroid depth from joint inversion should be 32 km, which is consistent with teleseismic depth phase information.

4 Discussion

In this joint inversion method, a 1D layered structure model is used in the forward calculation of waveforms and fundamental mode Rayleigh wave amplitude spectra. However, the assumed model may sometimes be inaccurate. To test the robustness of the joint inversion method to 1D model variations, we took the Nevada earthquake as an example and perturbed the P- and S-wave velocities of each layer of the CRUST2.0 model by up to ±5 %, keeping the original velocities in the other layers, and then performed the joint inversion with the stations presented in Fig. 6a. The centroid depths determined in this way deviate by no more than 1 km from the depth obtained with the original model. The perturbed velocity models and the corresponding joint inversion depth-misfit functions are displayed in Figs. S1-S2. To investigate this deviation, we tested the sensitivity of the “spectral nulls” to the velocity model perturbations (Fig. S3). The periods of the “spectral nulls” for the perturbed models in Figs. S1-S2 are all within the range corresponding to ±1 km depth differences relative to the original model. This minor sensitivity of the “spectral nulls” to the perturbed models may explain why the depth deviation is only 1 km. Hence CAP_RWAS is robust to misestimates of up to ±5 % in the 1D velocity model. On the other hand, a very anomalous structure may substantially influence the centroid depth determination with CAP_RWAS. A previous study showed that a thick sediment layer can produce zero spectral values in the Rayleigh wave vertical eigenfunctions (Tanimoto and Rivera 2005). These zeros may cause spurious spectral nulls unrelated to the “spectral nulls” due to source depth. To avoid errors in the depth estimate under this circumstance, spectral amplitudes from radial-component Rayleigh waves should be used.
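The layer-by-layer perturbation test can be sketched as below. This is a minimal illustration and not the code used in this study: the `Layer` class, the function names and the three-layer reference model are hypothetical, and the layer values are placeholders rather than CRUST2.0 values. Each generated model scales Vp and Vs of a single layer by the same factor and leaves the other layers unchanged.

```python
from dataclasses import dataclass, replace
from typing import List, Sequence


@dataclass(frozen=True)
class Layer:
    thickness_km: float
    vp_km_s: float
    vs_km_s: float


def perturb_one_layer(model: List[Layer], index: int, fraction: float) -> List[Layer]:
    """Return a copy of the model in which Vp and Vs of one layer are
    scaled by (1 + fraction); all other layers are left untouched."""
    layer = model[index]
    scaled = replace(
        layer,
        vp_km_s=layer.vp_km_s * (1.0 + fraction),
        vs_km_s=layer.vs_km_s * (1.0 + fraction),
    )
    return model[:index] + [scaled] + model[index + 1:]


def perturbed_models(model: List[Layer],
                     fractions: Sequence[float] = (-0.05, 0.05)) -> List[List[Layer]]:
    """Build one perturbed model per (layer, fraction) pair."""
    return [perturb_one_layer(model, i, f)
            for i in range(len(model))
            for f in fractions]


# Placeholder three-layer crust (assumed values, NOT taken from CRUST2.0).
reference = [Layer(10.0, 6.0, 3.5), Layer(15.0, 6.6, 3.7), Layer(10.0, 7.1, 3.9)]
models = perturbed_models(reference)
print(len(models))  # two perturbations for each of the three layers
```

Each perturbed model would then be passed to the forward calculation and the depth grid search, and the resulting optimal depths compared with the depth from the reference model.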

In areas of strong lateral heterogeneity, an inadequate 1D velocity model may also cause large errors in depth. For example, the depth error of the joint inversion with four local and four regional stations was only 1 km for the continental Nevada earthquake, whereas it reached 5 km for the Japan outer-rise earthquake, which is located near a highly heterogeneous subduction zone. In addition, the Japan earthquake occurred in an outer-rise region that is covered by a water layer, while the seismic stations are mostly on land. In this situation, a 1D layered velocity model is not capable of representing the structure on both the source and the receiver side. A proper way to reduce the depth error caused by 3D structure is to perform quality control on the amplitude spectra by rejecting paths that sample strong heterogeneity.

Only local waveforms and regional Rayleigh wave spectra are used in the joint inversion method, although in principle data from all possible distance ranges should be collected. Rayleigh wave spectra at local stations are not used because S waves may interfere with Rayleigh waves at local distances. We also did not use regional seismic waveforms in the CAP inversion, since the complexity of Pnl waves in this distance range is substantial and is difficult to account for with 1D modeling. In this case, source parameter inversion with 3D Green's functions significantly reduces the waveform misfit and provides reliable determination of the focal mechanism (Liu et al. 2004; Zhao et al. 2006; Zhu and Zhou 2016). Approaches that make use of 3D models have also been applied to depth determination for shallow events with amplitude spectra (Fox et al. 2012). However, accurate 3D structure models with high resolution are not available for many regions of the world. Another current limitation is that the calculation of synthetics with a 3D model is computationally costly. Therefore, although more robust and reliable source parameters are expected once 3D models are incorporated into the joint inversion method, further studies are needed to implement the algorithm.

For most focal mechanisms, CAP_RWAS is capable of constraining centroid depth tightly because a spectral null is available. However, for dip-slip events with a dip angle close to 0° or 90°, Rayleigh wave amplitude spectra lack resolution of centroid depth, because “spectral nulls” are most pronounced for strike-slip and near-45° dip-slip earthquakes, but not significant for events on nearly flat or nearly vertical fault planes (Douglas et al. 1971; Fox et al. 2012). For some shallow-dipping earthquakes that occur at the plate interfaces of subduction zones, CAP_RWAS may not effectively improve the accuracy of centroid depth. That is, CAP_RWAS is capable of providing improved constraints on centroid depth only for strike-slip events and for normal or thrust earthquakes with a moderate dip angle (e.g., outer-rise earthquakes).

Since surface wave eigenfunctions decrease significantly with increasing source depth, source depth is weakly constrained by Rayleigh wave amplitude spectra for intermediate-depth or deep earthquakes (i.e., focal depth larger than 70 km). For such events, accurate depth determination should rely more on depth-phase arrival time information from both local and teleseismic distances. Therefore, the joint inversion method is capable of providing improved resolution and accuracy in depth determination only for earthquakes occurring in the crust or uppermost mantle.

To test the performance of this method on earthquakes that lack good teleseismic records and are covered only by a sparse regional network, we take the Mw5.1 earthquake that occurred on July 28th, 1998 in Xinjiang Province, China as an example (Fig. S4A). For this event, we found only a few teleseismic records with clear P waves (BFO, INK, etc.). At local and regional distances, the FDSN station spacing is approximately 500 km, so the local network is sparse. With the joint inversion, we resolved a pure thrust focal mechanism for this earthquake, consistent with the Global CMT solution (Table S1). However, our solution indicates a focal depth of 11 km (Fig. S4), in contrast to 42.9 km from the ISC catalog and 46.9 km from the Global CMT catalog. The short differential time (~3.5 s) between teleseismic P and pP at stations BFO (distance of 67.3°) and INK (distance of 50.4°) favors a shallow depth of around 11 km (Fig. S5). This test suggests that the joint inversion method works for very sparse networks.
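The link between the pP-P differential time and focal depth can be illustrated with a back-of-the-envelope calculation. The sketch below assumes a uniform layer above the source, for which pP − P = 2h·sqrt(1/v² − p²), where h is the source depth, v the P velocity above the source and p the horizontal slowness of the ray. The P velocity of 6.0 km/s and the slowness of 6.1 s/deg (a rough value for a P wave near 67° distance) are assumptions made for this illustration and are not taken from this study; the function names are likewise hypothetical.

```python
import math

KM_PER_DEGREE = 111.19  # length of one degree of arc at the Earth's surface


def vertical_slowness(vp_km_s, slowness_s_per_deg):
    """Vertical slowness (s/km) of a P ray in a uniform layer."""
    p = slowness_s_per_deg / KM_PER_DEGREE  # horizontal slowness in s/km
    eta_squared = 1.0 / vp_km_s ** 2 - p ** 2
    if eta_squared <= 0.0:
        raise ValueError("ray does not propagate in this layer")
    return math.sqrt(eta_squared)


def pp_minus_p(depth_km, vp_km_s, slowness_s_per_deg):
    """pP-P delay (s): two-way vertical travel between source and surface."""
    return 2.0 * depth_km * vertical_slowness(vp_km_s, slowness_s_per_deg)


def depth_from_pp_minus_p(delay_s, vp_km_s, slowness_s_per_deg):
    """Invert the relation above for source depth (km)."""
    return delay_s / (2.0 * vertical_slowness(vp_km_s, slowness_s_per_deg))


# Assumed values for illustration only (see the text above).
ASSUMED_VP_KM_S = 6.0
ASSUMED_SLOWNESS_S_PER_DEG = 6.1

shallow_depth_km = depth_from_pp_minus_p(3.5, ASSUMED_VP_KM_S, ASSUMED_SLOWNESS_S_PER_DEG)
deep_delay_s = pp_minus_p(42.9, ASSUMED_VP_KM_S, ASSUMED_SLOWNESS_S_PER_DEG)
print(round(shallow_depth_km, 1), round(deep_delay_s, 1))
```

Under these simplified assumptions, a delay of about 3.5 s corresponds to a depth close to 11 km, whereas a source near the catalog depth of 42.9 km would produce a delay of more than 10 s. A layered model and ray tracing would be needed for a quantitative comparison.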

Table 3 Seismic networks whose data were used in this study

5 Conclusions

In this paper, we proposed a joint inversion technique (CAP_RWAS) that combines three-component local seismic waveforms (the Cut-and-Paste method, CAP) with fundamental mode Rayleigh wave amplitude spectra at regional distances to determine the centroid depth of moderate earthquakes in sparse networks. This method exploits the fact that focal mechanisms can be well determined with time-domain waveform inversion, and that the combination of time-domain and frequency-domain information provides extra constraints on centroid depth. In this method, earthquake source parameters are first retrieved from CAP waveform inversion. Then Rayleigh wave amplitude spectra and local waveforms are computed for a range of depths, and a grid search with a joint misfit is conducted to find the optimal depth. We applied CAP_RWAS to the 2008 Mw6.0 Nevada earthquake and the 2013 Mw5.6 east of Japan earthquake, and obtained centroid depths of 8 and 32 km, respectively. These results are confirmed by independent tests with teleseismic depth phases. For each case, we estimated the uncertainty of the centroid depth with a bootstrap re-sampling approach. In a test for the Nevada earthquake, the probability of obtaining the optimal depth increases from 41 % with CAP inversion to 64 % with joint inversion. For the east of Japan earthquake, the 95 % confidence limit reveals 44 % less error in centroid depth compared with traditional CAP waveform inversion. The robustness of the CAP_RWAS method was tested for ±5 % velocity perturbations of the 1D velocity model.
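The depth grid search summarized above can be sketched schematically as follows. The joint misfit used here, a weighted sum of the waveform and spectral misfits after each is normalized by its own minimum, is an assumption chosen for illustration and is not necessarily the definition used in this study. The function name, the weight and the misfit values are all hypothetical.

```python
def joint_depth_grid_search(depths_km, waveform_misfit, spectral_misfit, weight=0.5):
    """Return (best depth, joint misfit curve) for trial depths.

    Each misfit curve is divided by its own minimum so that the two data
    types are comparable, then combined as
        joint = (1 - weight) * waveform + weight * spectral.
    """
    if not (len(depths_km) == len(waveform_misfit) == len(spectral_misfit)):
        raise ValueError("all inputs must have the same length")
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie between 0 and 1")
    w_min, s_min = min(waveform_misfit), min(spectral_misfit)
    if w_min <= 0.0 or s_min <= 0.0:
        raise ValueError("misfits must be strictly positive")
    joint = [(1.0 - weight) * w / w_min + weight * s / s_min
             for w, s in zip(waveform_misfit, spectral_misfit)]
    best_index = min(range(len(joint)), key=joint.__getitem__)
    return depths_km[best_index], joint


# Made-up misfit curves: the waveform misfit is nearly flat around its
# minimum, while the spectral misfit has a sharper minimum.
depths = [4, 6, 8, 10, 12]
waveform = [1.30, 1.10, 1.02, 1.00, 1.05]
spectra = [3.0, 1.8, 1.0, 1.6, 2.9]

best_joint, _ = joint_depth_grid_search(depths, waveform, spectra, weight=0.5)
best_waveform_only, _ = joint_depth_grid_search(depths, waveform, spectra, weight=0.0)
print(best_joint, best_waveform_only)
```

In this toy example the sharper spectral minimum controls the joint result, which mirrors the motivation for adding amplitude spectra to a waveform misfit that varies only weakly with depth.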

Because it provides better-constrained earthquake centroid depths, this method is expected to find applications in studies of seismotectonics, seismogenesis, and hazard assessment. For outer-rise earthquakes, the joint inversion method can improve the accuracy of centroid depths, which are difficult to obtain with methods that use only arrival times. This may provide more information for studies of the seismogenic features of the subduction process. For regions with active seismicity, especially those monitored by sparse seismic networks, we expect that fully automatic centroid depth determination with this joint inversion scheme would be helpful in the construction of earthquake shakemaps.