1 Introduction

The teleseismic receiver function time series (RFs) provides an estimate of the Earth’s seismic impulse response. RFs are obtained by deconvolving the incident P-wavefield of teleseismic earthquakes from the P-to-S (Ps) converted wavefield, thereby equalising source and path effects (e.g., Langston, 1979; Vinnik, 1977). The pulses, expressed as peaks and troughs in the radial receiver function component (RFR), indicate the timing and strength of wave conversions at seismic discontinuities beneath the station, as well as the associated multiples. By analysing the amplitudes and timing of the signals in the RFR time series, the underlying S-wave velocity structure can be estimated, usually assuming a horizontally layered medium beneath the seismic station. A quantitative analysis may involve a formal inverse algorithm that estimates an S-wave velocity model minimising the misfit between observed and synthetic RFR under the given assumptions. However, inferring the Earth’s properties from RFR inversion can be challenging due to the non-uniqueness and non-linearity of the time-depth-velocity relationship of the P-to-S delay time in the RFR waveform (Ammon et al., 1990; Jacobsen & Svenningsen, 2008). Ammon et al. (1990) developed a linearised iterative inversion scheme for RFR, based on the methods of Owens et al. (1984) and Shaw and Orcutt (1985). Ammon et al. (1990) also examined the non-uniqueness of RFR waveform inversion and showed how the trade-off between layer thickness and S-wave velocity affects the resulting velocity model. Jacobsen and Svenningsen (2008) further investigated the non-uniqueness and non-linearity of RFR inversion and demonstrated that changes in delay time during the computation of the RFR Jacobian are the main source of non-linearity in the inverse problem.
They showed that parameterising the velocity model using delay time thicknesses instead of spatial layer thicknesses improves the uniqueness and performance of the RFR inversion (Jacobsen & Svenningsen, 2008).

The inherent non-uniqueness and non-linearity of RFR inversion have encouraged many researchers to develop and adopt alternative methods for fitting RFR time series. These include grid search schemes (e.g., Sandvol et al., 1998; Zhu & Kanamori, 2000; Zor et al., 2006; Ogden et al., 2019, 2022), global search algorithms such as simulated annealing (e.g., Vergne et al., 2002; Vinnik et al., 2004; Zhao et al., 1996), and stochastic methods such as genetic and neighbourhood algorithms (e.g., Frederiksen et al., 2003; Levin & Park, 1997; Reading et al., 2003; Shibutani et al., 1996).

The sensitivity of both RFR and surface waves to S-wave velocity structure led Julià et al. (2000) to propose a joint inversion framework that reduces the non-linearity and non-uniqueness of RFR inversion. This method has become an established tool for quantifying crustal and upper mantle structure and has been widely used (e.g., Julià et al., 2000, 2005; Motaghi et al., 2015, 2017; Rastgoo et al., 2018; Priestley et al., 2022). Its advantage lies in combining two independent observations, one sensitive to absolute S-wave velocity (the surface wave dispersion curve) and one sensitive to S-wave velocity contrasts (the RFR time series). However, the rather low lateral resolution of the surface wave tomography images used to obtain dispersion curves may dampen the effect of small-scale features that are well observed in receiver function time series, especially at shallow depths. Furthermore, the frequently used dispersion curves from teleseismic earthquake surface waves have very limited resolution in the upper crust. Even if shorter periods (< 10 s) are analysed, for example in ambient noise tomography, the ray coverage is often poor, leading to significant lateral smearing. Consequently, the dispersion curves extracted from these tomographic models, which may be jointly inverted with RFR, lack essential information about shallow structure. A joint inversion of RFR and surface waves often fits the dispersion data well at the cost of an increased misfit to the RFR time series. Owing to the limited upper-crustal resolution, the smearing and smoothing of surface wave and ambient noise tomography, and the over-fitting of dispersion curves, shallow structure is likely less well resolved in joint inversions of RFR and surface wave dispersion.

Svenningsen and Jacobsen (2007) showed that RFs themselves are in fact sensitive to the absolute S-wave velocity of the subsurface through the incidence angle of the teleseismic wave, expressed through the amplitudes of the RF components at the P arrival for a known ray parameter. They used RF polarisation to derive the apparent S-wave velocity (VS,app) beneath seismic stations, which is primarily sensitive to upper-crustal structure.

The observed angle of incidence (\({i}_{OP}\)) on a three-component seismogram results from the superposition of the incident P plane wave and the reflected P and converted P-to-S phases at the free surface. Therefore, its value differs from the true P-wave incidence angle (\({i}_{TP}\)). Wiechert (1907) derived the relation between the observed and true angles of incidence as follows (Eq. 1):

$$\frac{\sin({i}_{OP}/2)}{{V}_{S}} =\frac{\sin({i}_{TP})}{{V}_{P}} = p$$
(1)

Equation 1 can be used to estimate the apparent half-space S-wave velocity (VS,app) beneath a seismic station at zero time delay (the P-wave arrival). For a horizontally layered subsurface, VS,app is a function of the S- and P-wave velocities of the subsurface, the incoming P waveform and the event slowness (ray parameter); it is additionally affected by three-dimensional crustal variations, which introduce a dependence on incidence angle and back azimuth. A full description of the apparent S-wave velocity inversion can be found in Svenningsen and Jacobsen (2007), Hannemann et al. (2016) and Park and Ishii (2018); the background theory is covered in chapter five of Aki and Richards (2002). The effect of the complex incoming source waveform on the observed angle of incidence can be eliminated by using the Z and R components of the RFs instead of the seismograms (Svenningsen & Jacobsen, 2007). The observed angle of incidence (\({i}_{OP}\)) and VS,app are determined by convolving the Z and R components of the receiver functions with an integration function of varying period, covering different time windows (Chong et al., 2018; Svenningsen & Jacobsen, 2007). The resulting values of \({i}_{OP}\) and VS,app depend on the velocities of the phases present within the integrated time span, which corresponds to the associated wavelength. Thus, as the period (T) of the integration kernel increases, the apparent velocities represent successively deeper S-wave velocity structure.
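To make Eq. 1 and the period-dependent smoothing concrete, the Python sketch below computes a VS,app curve from the Z and R receiver function components. It assumes a cosine-squared smoothing kernel of total width T centred on the P arrival (our assumption for the integration function; see Svenningsen & Jacobsen, 2007, for the original formulation) and estimates \(i_{OP}\) as the arctangent of the smoothed R and Z amplitudes at t = 0. The function names are hypothetical.

```python
import numpy as np


def cos2_window(t, T):
    """Cosine-squared window of total width T centred on t = 0 (assumed kernel shape)."""
    w = np.where(np.abs(t) <= T / 2.0, np.cos(np.pi * t / T) ** 2, 0.0)
    return w / w.sum()  # normalise so the weights sum to one


def apparent_vs(rf_z, rf_r, t, p, periods):
    """Apparent S-wave velocity for each smoothing period T, via Eq. 1.

    rf_z, rf_r : vertical and radial receiver function components
    t          : time axis in s, with t = 0 at the direct P arrival
    p          : ray parameter in s/km
    periods    : smoothing periods T in s
    """
    vs_app = []
    for T in periods:
        w = cos2_window(t, T)
        # The window is symmetric, so this weighted sum is the convolution at t = 0
        z_bar = np.sum(w * rf_z)
        r_bar = np.sum(w * rf_r)
        i_op = np.arctan2(r_bar, z_bar)        # observed incidence angle in rad
        vs_app.append(np.sin(i_op / 2.0) / p)  # Eq. 1 solved for V_S
    return np.array(vs_app)
```

For an RF consisting only of spikes at t = 0, the smoothed R/Z ratio is identical for every T and the curve is flat; real RFs give period-dependent values because later conversions fall inside the wider windows.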

The VS,app inversion of Svenningsen and Jacobsen (2007) was implemented in a joint inversion framework together with RFR by Schiffer et al. (2015), and the algorithm was recently extended to a random model search scheme (Schiffer et al., 2022). The joint inversion of RFR and VS,app has since become a robust method for imaging the Earth’s crust and upper mantle (e.g., Chong et al., 2018; Park & Ishii, 2018; Schiffer et al., 2015, 2022; Wang et al., 2022) and the crust of Mars (Joshi et al., 2023).

The joint inversion of RFR and VS,app maps the complex interference of primary and multiple arrivals in the RFR, together with the amplitude of VS,app, into velocity changes in the appropriate depth ranges. It is therefore essential to find a layer setup that enables the inversion to map the observations into absolute S-wave velocity. Our new joint inversion core algorithm uses a grid search over different layer setups while estimating S-wave velocity through a linearised joint inversion scheme. Because the linearised joint inversion of RFR and VS,app depends on a good starting model, as demonstrated by larger posterior errors when inaccurate or inappropriate starting models are used, we propose a two-stage inversion. Its major difference from previous approaches is a pre-conditioning stage that estimates a suitable starting model (hereafter called the pseudo-initial model). This pseudo-initial model is then used in the joint inversion core algorithm to estimate the final S-wave velocity model. To achieve this, we modify the method proposed by Schiffer et al. (2015) and explain the effect of the initial model selection, parameterised in layer delay time thickness. We then propose a stochastic method to estimate an initial model for the RFR-VS,app joint inversion, and we modify the random model search method proposed by Schiffer et al. (2022) so that it uses our joint inversion core algorithm to find appropriate layer setups instead of pre-conditioning them according to the peaks in the observed RFR. We explain the joint inversion core algorithm and our joint inversion framework for estimating a high-resolution S-wave velocity model. As a case example, we apply this framework to estimate the S-wave velocity beneath a permanent station in the Makran subduction zone and compare the results with the S-wave velocity previously estimated from joint inversion of RFR and surface wave dispersion curves, and with a controlled-source seismic velocity model.
To evaluate the method’s reliability in diverse tectonic settings, we also applied it to real data from a station on the Indian craton and a station in the Himalaya. The results for the Makran subduction zone are presented in the main text, and those for the latter two stations in the Supplementary Material.

2 Method

2.1 General Description of the 2-Stage Inversion

The observed RFR and VS,app curves depend on the subsurface structure beneath the station and can be reconstructed with a suitable forward function if that structure is known. Joint inversion of these datasets works in reverse, estimating the model parameters that reduce a defined objective function, which usually includes the data misfit. Here, the objective function is primarily defined by the data misfit of both RFR and VS,app, but it also includes terms for model roughness and for the misfit to the starting model. These terms are weighted internally. Since both the RFR and VS,app datasets are functions of the model properties, including P- and S-wave velocity, density and layer thickness, we can express these observations as in Eq. 2:

$$ \begin{gathered} RF_{R} = f\left( m \right) \hfill \\ V_{s,app} = g\left( m \right) \hfill \\ \end{gathered} $$
(2)

where f and g are non-linear functions and \(m\) represents the model properties (e.g., S-wave velocity and layer thickness). Finding a model through a linearised joint inversion involves linearising f and g about an initial model. However, the effect of the non-linear terms neglected in the linearisation grows as the difference between the initial and true models increases. To address this issue, we propose a two-stage joint inversion procedure. Stage 1 finds a good starting model (the pseudo-initial model; Sect. 2.4.1), and stage 2 estimates the detailed S-wave velocities and layer thicknesses that minimise the defined objective function through the joint inversion core algorithm (Sect. 2.4.2).

We parameterise the model by defining the vertical extent of each layer not by its spatial thickness but by its delay time (Jacobsen & Svenningsen, 2008), hereafter called delay time thickness. We first describe delay time thickness and how it combines S-wave velocity and spatial layer thickness in the joint inversion core algorithm (Sect. 2.2). We then detail the linearisation of the two datasets used in the linearised joint inversion (Sect. 2.3.1), present the algorithm for finding the best layer setup (Sect. 2.3.2), and finally explain stages 1 and 2 of the joint inversion framework in Sects. 2.4.1 and 2.4.2, respectively.

2.2 Description of Delay Time Thickness

To mitigate the inherent non-uniqueness and non-linearity of the RFR inverse problem, Jacobsen and Svenningsen (2008) parameterised Earth structure in terms of fixed delay time thickness instead of fixed spatial thickness. For any change in S- and P-wave velocity, the spatial layer thickness is then updated according to Eq. 3 so that the delay time thickness is retained.

$$ \Delta z_{new} = \Delta z_{old} \frac{\sqrt{V_{S,old}^{-2} - p^{2}} - \sqrt{V_{P,old}^{-2} - p^{2}}}{\sqrt{V_{S,new}^{-2} - p^{2}} - \sqrt{V_{P,new}^{-2} - p^{2}}} $$
(3)

where \(\Delta {z}_{new}\) and \(\Delta {z}_{old}\) are the new and old spatial layer thicknesses, \({V}_{S, new}\) and \({V}_{S, old}\) are the new and old S-wave velocities of the layer, \({V}_{P, new}\) and \({V}_{P, old}\) are the new and old P-wave velocities of the layer, and \(p\) is the ray parameter. This correction has been implemented in the inversion algorithm of Schiffer et al. (2015) and its subsequent versions, although there the delay time thicknesses were allowed to change during the inversion.

In our proposed RFR-VS,app joint inversion, we use fixed delay time thicknesses calculated from given S-wave velocities and spatial layer thicknesses. We first describe the medium by several layers, each with an initial S-wave velocity and layer thickness. We then determine the delay time thickness of each layer of the initial model and calculate the Jacobian matrices (partial derivatives of the observations with respect to the inversion model parameters) for RFR and VS,app. Calculating the Jacobian matrices involves perturbing the S-wave velocity of each layer and finding the sensitivity of RFR and VS,app to that S-wave velocity for all time samples and periods. In this process, we adjust the spatial thickness of the layers according to the S-wave velocity perturbation using Eq. 3, so that the delay time thickness remains fixed. The Jacobian matrices are then used to estimate an S-wave velocity change in an iterative weighted least-squares scheme. In each iteration of the linearised joint inversion, the spatial layer thicknesses therefore change according to the estimated S-wave velocity perturbation to retain the fixed delay time thicknesses, and the Jacobian matrices are recalculated.
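The thickness bookkeeping described above can be illustrated with a short Python sketch that evaluates the delay time thickness of Eq. 9 and rescales spatial thicknesses after an S-wave velocity perturbation so that this delay time is preserved (the relation expressed by Eq. 3). For simplicity, the usage example assumes a constant VP/VS ratio of 1.73 instead of the Christensen and Mooney (1995) relations; all function names are hypothetical.

```python
import numpy as np


def vertical_slowness(v, p):
    """Vertical slowness sqrt(v**-2 - p**2) for velocity v (km/s) and ray parameter p (s/km)."""
    return np.sqrt(np.asarray(v, dtype=float) ** -2.0 - p ** 2)


def delay_time_thickness(dz, vs, vp, p):
    """Ps minus P delay time accumulated across each layer (Eq. 9)."""
    return dz * (vertical_slowness(vs, p) - vertical_slowness(vp, p))


def update_thickness(dz_old, vs_old, vp_old, vs_new, vp_new, p):
    """Rescale spatial thicknesses so that each layer keeps its delay time thickness."""
    num = vertical_slowness(vs_old, p) - vertical_slowness(vp_old, p)
    den = vertical_slowness(vs_new, p) - vertical_slowness(vp_new, p)
    return dz_old * num / den
```

A faster S-wave velocity shortens the Ps delay per kilometre, so a layer must become thicker to keep the same delay time, which is exactly the behaviour that fixes the time positions of conversions during the Jacobian computation.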

2.3 Joint Inversion Core Algorithm

2.3.1 Linearised Joint Inversion

The Jacobians of RFR and VS,app are calculated by linearising the observables (Eqs. 5 and 6). We linearise the RFR by expanding the time series about an initial model (Eq. 4):

$$RF_{R}\left(t_{i_{RF_{R}}}, m_{1}\right) = RF_{R}\left(t_{i_{RF_{R}}}, m_{0}\right) + \frac{\partial RF_{R}\left(t_{i_{RF_{R}}}, m_{0}\right)}{\partial m}\left(m_{1} - m_{0}\right) + \frac{1}{2!}\frac{\partial^{2} RF_{R}\left(t_{i_{RF_{R}}}, m_{0}\right)}{\partial m^{2}}\left(m_{1} - m_{0}\right)^{2} + \dots$$
(4)

where \({m}_{1}\) and \({m}_{0}\) are the perturbed and initial models and \({t}_{{i}_{R{F}_{R}}}\) is the time of the ith sample of the RFR. We linearise the RFR by truncating Eq. 4 after the first-order term and obtain the Jacobian of the RFR by a forward finite difference (Eq. 5):

$$J_{RF_{R}}\left(t_{i_{RF_{R}}}, j\right) = \frac{RF_{R}\left(t_{i_{RF_{R}}}, m_{0} + \Delta m_{j}\, e_{j}\right) - RF_{R}\left(t_{i_{RF_{R}}}, m_{0}\right)}{\Delta m_{j}}$$
(5)

where \(e_{j}\) is the jth column of the identity matrix \(I\) and \(\Delta m_{j}\) is the perturbation of the S-wave velocity of the jth layer, so that only the jth layer is perturbed while all other layers remain unchanged. A similar linearisation is performed for the Jacobian of the VS,app curve (Eq. 6):

$$J_{V_{S,app}}\left(T_{i_{app}}, j\right) = \frac{V_{S,app}\left(T_{i_{app}}, m_{0} + \Delta m_{j}\, e_{j}\right) - V_{S,app}\left(T_{i_{app}}, m_{0}\right)}{\Delta m_{j}}$$
(6)

where \({V}_{S, app}({T}_{{i}_{app}}, {m}_{0})\) is the apparent velocity for the period \({T}_{{i}_{app}}\).
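The forward-difference Jacobians of Eqs. 5 and 6 can be sketched generically in Python, assuming `forward` is any callable that maps a vector of layer S-wave velocities to a data vector (an RFR trace or a VS,app curve). In the actual scheme, the forward function would also rescale the spatial thicknesses via Eq. 3 so that the delay time thicknesses stay fixed; that step is left inside the hypothetical `forward`.

```python
import numpy as np


def fd_jacobian(forward, m0, dm):
    """Forward-difference Jacobian, built one layer (column) at a time.

    forward : callable mapping a model vector (layer S-wave velocities) to a data vector
    m0      : initial model, shape (n_layers,)
    dm      : scalar or per-layer perturbation size
    """
    m0 = np.asarray(m0, dtype=float)
    dm = np.broadcast_to(np.asarray(dm, dtype=float), m0.shape)
    d0 = np.asarray(forward(m0), dtype=float)
    J = np.empty((d0.size, m0.size))
    for j in range(m0.size):
        e_j = np.zeros_like(m0)
        e_j[j] = 1.0                            # j-th column of the identity matrix
        d_j = np.asarray(forward(m0 + dm[j] * e_j), dtype=float)
        J[:, j] = (d_j - d0) / dm[j]            # sensitivity of every sample to layer j
    return J
```

For a linear forward function the finite difference is exact up to rounding, which gives a simple check of the implementation.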

We use the S-wave velocities as the unknown model parameters of the linearised joint inversion since they are the main parameters affecting RFR and VS,app (Ammon et al., 1990; Chong et al., 2018; Svenningsen & Jacobsen, 2007). We apply the relations of Christensen and Mooney (1995) to calculate P-wave velocity and density. Using \({J}_{{V}_{S, app}}\) and \({J}_{R{F}_{R}}\), we estimate the absolute S-wave velocity with a standard iterative least-squares formula that includes smoothing and damping constraints (Eq. 7).

$${m}_{k+1}= {m}_{k} +({\alpha }^{2} {J}_{R{F}_{R}}{ }^{T} {J}_{R{F}_{R}} +{\beta }^{2}{{J}_{{V}_{S, app}}}^{T}{J}_{{V}_{S, app}}+{\lambda }^{2}{S}^{T}S+{\mu }^{2}{I}^{T}I{)}^{-1}\times (\alpha {J}_{R{F}_{R}}{ }^{T}\delta R{F}_{R}+\beta {{J}_{{V}_{S, app}}}^{T}\delta {V}_{s, app}+\lambda {S}^{T}\delta {s}_{k}+\mu {I}^{T}\delta {m}_{k})$$
(7)

In this equation, S is the second-derivative smoothing matrix (Menke, 2018), \(I\) is the identity matrix, \(\delta R{F}_{R}\) is the difference between the observed RFR and the RFR predicted from \({m}_{k}\), and \(\delta {V}_{s, app}\) is the difference between the observed and predicted apparent velocity curves. \(\delta {s}_{k}= 0-S{m}_{k}\) is the model roughness, and \({\delta m}_{k}={m}_{0}-{m}_{k}\) is the difference between the initial and current models. The coefficients \(\alpha \), \(\beta \), \(\lambda \) and \(\mu \) control the influence of the receiver function, apparent velocity, smoothing and damping misfits (\(\delta R{F}_{R}, \delta {V}_{s, app}, \delta {s}_{k}\) and \({\delta m}_{k}\)), respectively, on the output S-wave velocity model.
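A single update of Eq. 7 can be written with NumPy as below. The sketch transcribes the equation as printed (squared weights in the normal matrix, unsquared weights on the right-hand side), builds S as a standard second-difference operator, and omits the normalisation by the numbers of observations and layers; the function names are hypothetical.

```python
import numpy as np


def second_derivative_matrix(n):
    """(n - 2) x n second-difference roughening matrix with rows [1, -2, 1]."""
    S = np.zeros((n - 2, n))
    for i in range(n - 2):
        S[i, i:i + 3] = [1.0, -2.0, 1.0]
    return S


def joint_update(m_k, m_0, J_rf, d_rf, J_app, d_app, alpha, beta, lam, mu):
    """One model update following Eq. 7 as written.

    d_rf, d_app : observed-minus-predicted RFR samples and V_S,app values for m_k
    """
    n = m_k.size
    S = second_derivative_matrix(n)
    I = np.eye(n)
    ds = -S @ m_k          # delta s_k = 0 - S m_k (model roughness)
    dm = m_0 - m_k         # delta m_k = m_0 - m_k (departure from the initial model)
    A = (alpha**2 * J_rf.T @ J_rf + beta**2 * J_app.T @ J_app
         + lam**2 * S.T @ S + mu**2 * I.T @ I)
    b = (alpha * J_rf.T @ d_rf + beta * J_app.T @ d_app
         + lam * S.T @ ds + mu * I.T @ dm)
    return m_k + np.linalg.solve(A, b)   # solve instead of forming the inverse
```

Solving the normal equations with `np.linalg.solve` avoids explicitly inverting the bracketed matrix, which is numerically preferable and gives the same update.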

We estimated the best set of \(\alpha \), \(\beta \), \(\lambda \) and \(\mu \) through a grid search over 900 combinations of these parameters, using a synthetic test with 40% error in delay time thickness (see Fig. 2). From the combinations that produced low objective function values, we selected several candidates, visually inspected the estimated models and chose the set of weights that recovered the true S-wave velocity model most adequately. This yields \(\alpha =5.0\), \(\beta =2.5\), \(\lambda =1.5\) and \(\mu =0.5\), which recover the best-fitting models while suppressing the effect of over-parameterisation at depths where the S-wave velocity gradient is small (see Sect. 2.3.2 for more details). Note that we normalise the equations by dividing them by the number of RFR observations, the number of VS,app observations and the number of layers, respectively; thus, these factors represent the absolute effect of each equation on the estimated model.

2.3.2 Definition of Model Layer Parameterisation

One of the most important and novel aspects of our approach is the model parameterisation in terms of the setup of layer boundaries. Model properties are often described in terms of a layered medium, with each layer having a density and an S- and P-wave velocity. P-wave velocity and density variations have only a minor effect on RFR and VS,app (Ammon et al., 1990). Therefore, the remaining two parameters (i.e., S-wave velocity and layer thickness) are chosen as the unknown model properties in the joint inversion core algorithm. Uncertainty and artefacts in the output model can arise from the choice of the number and depths of the layers used in the linearised joint inversion (Eq. 7). In principle, increasing the number of layers, i.e., increasing the degrees of freedom of the inverse problem, yields a better fit to the data at the cost of potentially unrealistic fluctuations in the estimated S-wave velocities of adjacent layers that serve only to fit the data exactly. Imposing damping and smoothing constraints can reduce these fluctuations. However, a very strong smoothing or damping constraint (large values of \(\lambda \) and \(\mu \) in Eq. 7) limits the change in S-wave velocity of each layer between consecutive iterations and may also remove small-scale structure. A practical way to parameterise the joint inversion of RFR and VS,app is to increase the number of layers in steps (Motaghi et al., 2015; Schiffer et al., 2015). We have implemented this approach in our joint inversion core algorithm, which aims to estimate both the S-wave velocity and the layer setup. Starting from an arbitrary initial model, we parameterise it into different layer setups, estimate a model for each setup with Eq. 7, and select the model that yields the lowest residual.
Thus, the joint inversion core algorithm consists of several linearised joint inversions of the initial model, each stratified into a different layer setup. The delay time thicknesses of the initial model remain constant throughout all of these linearised joint inversions, while the spatial layer thicknesses change with the S-wave velocities at each iteration (see Sect. 2.2 for more detail). This process examines different layer interface depths, which allows the linearised joint inversion to map the primary and multiple phase arrivals into the appropriate depth ranges. When we invert the data with finer resolution, we also increase the smoothing and damping constraints slightly to dampen fluctuations in the over-parameterised parts of the model. The initial smoothing and damping constraints and the size of the increment were determined by trial and error. The final output of this procedure is the model (i.e., S-wave velocity and layer thickness) with the smallest objective function according to Eq. 7 (Fig. 1a).
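The selection logic of the joint inversion core algorithm (Fig. 1a) can be sketched in Python as follows. Here `invert` stands for one run of the linearised joint inversion (Eq. 7) on a given layer setup and is a hypothetical callable; the additive increments for \(\lambda \) and \(\mu \) follow the description above, but their values are placeholders to be tuned.

```python
import math


def core_algorithm(initial_model, layer_setups, invert, lam0, mu0, lam_step, mu_step):
    """Invert one layer setup after another and keep the lowest-objective model.

    initial_model : starting model passed unchanged to every run
    layer_setups  : layer setups ordered from coarse to fine
    invert        : hypothetical callable (model, setup, lam, mu) -> (model, objective)
    """
    best_model, best_obj = None, math.inf
    lam, mu = lam0, mu0
    for setup in layer_setups:
        model, obj = invert(initial_model, setup, lam, mu)
        if obj < best_obj:
            best_model, best_obj = model, obj
        # Finer setups get slightly stronger smoothing and damping
        lam += lam_step
        mu += mu_step
    return best_model, best_obj
```

Because every run starts from the same initial model with the same delay time thicknesses, the runs are independent and could also be executed in parallel before the minimum is taken.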

Figure 1

a Flowchart of the joint inversion core algorithm for an arbitrary initial model to estimate S-wave velocity and the best layer setup (see Sect. 2.3). b Flowchart of the two-stage joint inversion framework used in this study (see Sect. 2.4)

2.4 Joint Inversion Framework

2.4.1 Inversion Stage 1—Pseudo-Initial Model Estimation

The procedure explained above estimates the S-wave velocity and layer thicknesses for the initial model used in the inversion. We assume that the linear term of the Taylor expansion about this initial model adequately expresses the relation between the observed RFR, VS,app and S-wave velocity (Eqs. 5 and 6). However, even for a model stratified by delay time thickness, this assumption is only valid near the initial S-wave velocity model used to compute the Jacobians. Thus, the initial layer delay time thicknesses used in the Jacobian calculation should be optimised to adequately represent the Earth’s crustal structure. We implemented a synthetic test to demonstrate the impact of errors in the delay time thicknesses of the initial model on the inversion performance (Sect. 3.1). This experiment compares the results of linearised joint inversion using initial models whose delay time thicknesses match the synthetic model against those using initial models with random variations in delay time thickness that do not match it. Our analysis demonstrates that finding a well-suited pseudo-initial model is one of the crucial steps in our inverse approach. To address this, we propose a stochastic optimisation scheme to find the pseudo-initial model. In addition, we implemented a modified version of the random model search of Schiffer et al. (2022) in our joint inversion core algorithm, instead of pre-conditioning the interface depths according to the observed RFR. The pseudo-initial model prevents multiples from being mapped into separate, artefactual layers, because it enables the linearised joint inversion to fit multiples and primary conversions simultaneously. Our analysis confirms that the delay time thicknesses of the initial model in the deeper part of the layered medium can significantly affect the accuracy of the final S-wave velocity model.
Thus, it is essential to find major discontinuities in the observed data to improve the reliability of the joint inversion.

Schiffer et al. (2022) attempted to achieve a similar effect by placing a random fraction of all layer boundaries at positive and negative peaks observed in the first 5–8 s of the RFR waveforms. Furthermore, they randomised the number and depths of the layers in general, with lower and upper limits on the numbers of sedimentary, crustal and upper mantle layers, to investigate both finely and coarsely sampled models. Their final S-wave velocity model was defined as the maximum model density of the weighted velocity model population computed from 1000 individual inversion runs.
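The boundary placement of Schiffer et al. (2022) can be illustrated with the following Python sketch. It is a simplified, hypothetical version, not the original implementation: boundaries are expressed as delay times, the end of the peak window is drawn uniformly between 5 and 8 s, the fraction of peak-based boundaries is drawn up to `peak_fraction`, and the separate limits for sedimentary, crustal and upper mantle layers are omitted. The function name and defaults are our assumptions.

```python
import numpy as np


def random_layer_setup(t, rf_r, rng, n_min=5, n_max=30, peak_fraction=0.3):
    """Draw layer boundaries (delay times in s) for one random model.

    A random fraction of the boundaries sits at positive or negative RFR extrema
    within the first 5-8 s; the rest are spread uniformly over the trace.
    """
    t_end = rng.uniform(5.0, 8.0)                  # peak window, drawn per model
    inner = np.arange(1, len(t) - 1)
    is_max = (rf_r[inner] > rf_r[inner - 1]) & (rf_r[inner] > rf_r[inner + 1])
    is_min = (rf_r[inner] < rf_r[inner - 1]) & (rf_r[inner] < rf_r[inner + 1])
    extrema = inner[(is_max | is_min) & (t[inner] > 0.0) & (t[inner] <= t_end)]

    n_bound = int(rng.integers(n_min, n_max + 1))  # total number of boundaries
    n_peak = min(int(round(rng.uniform(0.0, peak_fraction) * n_bound)), extrema.size)
    if n_peak > 0:
        at_peaks = t[rng.choice(extrema, size=n_peak, replace=False)]
    else:
        at_peaks = np.empty(0)
    elsewhere = rng.uniform(0.0, t[-1], size=n_bound - n_peak)
    return np.unique(np.concatenate([at_peaks, elsewhere]))  # sorted, no duplicates
```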

The need for well-conditioned initial models motivates a new scheme for estimating the pseudo-initial model using the Particle Swarm Optimization (PSO) algorithm (Kennedy & Eberhart, 1995). PSO is a stochastic search method originally developed for continuous optimisation problems. It employs a population of particles, each representing a random model, to explore a model space whose dimension equals the number of unknown parameters. In each iteration, a cost function is evaluated for every particle to measure its fit to the data. The algorithm seeks the particle with the lowest cost function by iteratively updating the unknown parameters of all particles within the model space. In the first step, each particle represents a randomly selected initial model from the available model space. The particles are then updated in each iteration, taking into account their current state, the best state each has experienced so far, and the best state found by the whole swarm. This iterative process allows the algorithm to find the unknown parameters (in this study, the depths of the major discontinuities) that minimise the cost function.
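A compact Python sketch of PSO is given below. It uses the widely used inertia-weight velocity update (a later variant of the Kennedy & Eberhart, 1995, scheme); the inertia and acceleration coefficients, the handling of bounds by clipping, and the function name are our assumptions. The defaults of 30 particles and 11 iterations mirror the settings reported later in this section.

```python
import numpy as np


def pso_minimise(cost, lower, upper, n_particles=30, n_iter=11,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimise `cost` over the box [lower, upper] with a basic particle swarm."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    x = rng.uniform(lower, upper, size=(n_particles, lower.size))  # random start
    v = np.zeros_like(x)
    pbest = x.copy()                                   # best state of each particle
    pbest_cost = np.array([cost(xi) for xi in x])
    g = pbest[np.argmin(pbest_cost)].copy()            # best state of the swarm
    for _ in range(n_iter):
        r1 = rng.random(x.shape)
        r2 = rng.random(x.shape)
        # Inertia + pull towards own best + pull towards swarm best
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lower, upper)               # stay inside the model space
        c = np.array([cost(xi) for xi in x])
        improved = c < pbest_cost
        pbest[improved] = x[improved]
        pbest_cost[improved] = c[improved]
        g = pbest[np.argmin(pbest_cost)].copy()
    return g, float(pbest_cost.min())
```

Each cost evaluation in the real framework is a full joint inversion, which is why the number of particles and iterations dominates the computational cost.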

The objective of this optimisation is to find the depths of the Moho and other major intra-crustal discontinuities that may exist in the study area. The first step of the optimisation is to define an initial model space. We initiate a model with two major discontinuities at the depths given by the global reference model IASP91 (Kennett & Engdahl, 1991) and use an approximate S-wave velocity model derived from the observed VS,app curve. This approximation keeps the difference between the initial and true S-wave velocity small enough for the linearisation assumed in calculating the RFR and VS,app Jacobians to remain valid. To approximate the S-wave velocity of a given depth range, we first use the apparent velocity Jacobian to determine the filter periods that are significantly affected by the S-wave velocity of that depth range. The VS,app observations at these filter periods are then used to approximate the S-wave velocity of the depth range. This process is repeated from the surface to the bottom of the model to approximate the S-wave velocities of the initial model.
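A minimal Python sketch of this approximation is shown below. It assumes that the "significantly affected" periods are those where a layer's absolute VS,app Jacobian exceeds a fixed fraction (`threshold`) of its maximum, and that the approximate velocity is the sensitivity-weighted mean of the observed VS,app at those periods. Both criteria are our assumptions, the function name is hypothetical, and each layer is processed independently rather than strictly from top to bottom.

```python
import numpy as np


def approximate_vs(J_app, vs_app_obs, threshold=0.5):
    """Approximate initial S-wave velocities from an observed V_S,app curve.

    J_app      : apparent velocity Jacobian, shape (n_periods, n_layers)
    vs_app_obs : observed V_S,app values, shape (n_periods,)
    threshold  : fraction of the peak sensitivity that defines 'significant' periods
    """
    sens = np.abs(np.asarray(J_app, dtype=float))
    vs_app_obs = np.asarray(vs_app_obs, dtype=float)
    vs0 = np.empty(sens.shape[1])
    for j in range(sens.shape[1]):
        col = sens[:, j]
        if col.max() == 0.0:
            # No sensitivity at any period: fall back to the mean of the curve
            vs0[j] = vs_app_obs.mean()
            continue
        selected = col >= threshold * col.max()
        vs0[j] = np.average(vs_app_obs[selected], weights=col[selected])
    return vs0
```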

Next, we create an initial model space by specifying the permissible depth ranges for the main intra-crustal discontinuities. We use these bounds to define the PSO particles, each of which represents a set of randomly generated depths for the primary intra-crustal discontinuities. We also correct the approximate initial S-wave velocity of each particle by recalculating it according to that particle’s discontinuity depths. This approach yields a set of well-suited starting models obtained by applying minor changes to the presumed reference model (i.e., IASP91 or a model from previous studies). The PSO algorithm searches the model space by using these particles (initial models) in the joint inversion of the dataset and evaluating their cost functions from the residuals of the estimated models (Eq. 8).

$$PSO\_CF = ||R{F}_{RO}-R{F}_{RC}|| +||{V}_{s, appO}-{V}_{s, appC}||$$
(8)

Here, the subscripts O and C denote the observed and calculated data, respectively. The pseudo-initial model is the model that generates the lowest cost function, and it is used in the joint inversion of the RFR and VS,app dataset in stage 2.
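Eq. 8 and the particle initialisation can be sketched in Python as follows, assuming Euclidean (L2) norms in Eq. 8 and independent uniform draws within each permitted depth range. The function names are hypothetical.

```python
import numpy as np


def pso_cost(rf_obs, rf_calc, vs_app_obs, vs_app_calc):
    """Cost function of Eq. 8, assuming L2 norms of both residual vectors."""
    rf_res = np.asarray(rf_obs, dtype=float) - np.asarray(rf_calc, dtype=float)
    app_res = np.asarray(vs_app_obs, dtype=float) - np.asarray(vs_app_calc, dtype=float)
    return float(np.linalg.norm(rf_res) + np.linalg.norm(app_res))


def random_particles(depth_ranges, n_particles, rng):
    """Each particle holds one random depth per discontinuity, inside its permitted range."""
    lo = np.array([r[0] for r in depth_ranges], dtype=float)
    hi = np.array([r[1] for r in depth_ranges], dtype=float)
    return rng.uniform(lo, hi, size=(n_particles, lo.size))
```

With non-overlapping ranges, for example the hypothetical (10, 25) km for an intra-crustal interface and (30, 50) km for the Moho, every particle keeps its interfaces in depth order.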

Using the PSO algorithm enables us to choose a specific cost function and a preferred initial model space, and to control the speed of convergence while estimating the pseudo-initial model. However, the computational cost of this method is very high. Given our computational resources, we chose 30 particles and 11 iterations for the PSO algorithm. Increasing the number of particles and iterations would improve the suitability and data misfit of the pseudo-initial model. Nevertheless, our approach still achieves at least a 30% error reduction with the estimated pseudo-initial model compared with defining the initial S-wave velocity model from reference studies or standard global models. The jrfapp package also offers a modified random model search scheme (RMS; e.g., Schiffer et al., 2022), which obtains almost equally good results as PSO (see below). Adopting RMS as the pseudo-initial model estimator facilitates a comprehensive assessment of the quality of the reference initial model and indicates how well the output S-wave velocity model fits the datasets. Additionally, this method offers flexibility in selecting the maximum and minimum S-wave velocities for the pseudo-initial model estimation.

2.4.2 Inversion Stage 2—Joint Inversion Procedure

The pseudo-initial model forms the basis of stage 2 of the inverse algorithm, which derives a detailed S-wave velocity model. The pseudo-initial model can be generated by either the RMS or the PSO method, and the user may decide which pre-conditioning method to use. Our synthetic experiment illustrates that the output S-wave velocity models from both methods are comparable and that the difference between them is smaller than the error (see Figs. 3 and 4).

In inversion stage 2, the RFR and VS,app are inverted with the joint inversion core algorithm described in Sect. 2.3 (Fig. 1a), starting from the pseudo-initial model estimated by either the RMS or the PSO method. The model with the minimum objective function according to Eq. 7 is then selected as the final model of the joint inversion framework (Fig. 1b).

3 Synthetic Tests

To assess the performance and reliability of our inversion algorithm, we performed three synthetic tests. For all tests, we generated a synthetic model by perturbing a two-layer reference model extending to a depth of 70 km. The unperturbed reference model consists of two layers at depths of 0–40 km and 40–70 km, with S-wave velocities of 3.0 km/s and 4.1 km/s, respectively. In the first experiment, we analyse the effect of errors in the initial delay time thicknesses; the second focuses on estimating the S-wave velocity error using noise-free synthetic data; and in the third, we investigate the effect of noise and of stacking receiver functions for different ray parameters, for both the random model search and the PSO method.

3.1 The Effect of Initial Delay Time Thickness Errors

This experiment tests the sensitivity of the linearised joint inversion to deviations in spatial layer thicknesses and initial S-wave velocities such that the initial delay time thicknesses no longer match those of the synthetic model. We generated a synthetic model by perturbing the reference model by 0.4 km/s in four depth ranges and parameterised the perturbed model into 32 layers. The delay time thicknesses of these layers were calculated from the time difference between the Ps and Pp arrivals (Jacobsen & Svenningsen, 2008; Zhu & Kanamori, 2000), as expressed by Eq. 9:

$$\Delta {t}_{P{s}_{j}}=\Delta {z}_{j} (\sqrt{({{V}_{{s}_{j}}}^{-2}- {p}^{2})} - \sqrt{({{V}_{{P}_{j}}}^{-2}- {p}^{2})})$$
(9)

where \(\Delta {t}_{P{s}_{j}}\) is the delay time thickness, \(\Delta {z}_{j}\) is the spatial thickness, \({V}_{{s}_{j}}\) and \({V}_{{P}_{j}}\) are the S- and P-wave velocities of the jth layer, and \(p\) is the ray parameter.

We then inverted the calculated RFR and VS,app of the synthetic model using four sets of 250 random initial models, each parameterised into 32 layers. In all sets, the random initial models were generated by perturbing the reference model with cubic Legendre polynomials (Ammon et al., 1990; Jacobsen & Svenningsen, 2008). The first set uses the actual delay time thicknesses of the synthetic model calculated from Eq. 9. The other three sets use delay time thicknesses with random errors of up to 10%, 30%, and 60%, respectively (Fig. 2a–d).
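A minimal sketch of how such initial models might be generated is given below. It rests on three assumptions:

- The cubic Legendre perturbation is a Legendre series of degrees 0 to 3 with random coefficients, evaluated along the layer index and rescaled to a chosen maximum amplitude.
- Delay time errors are independent uniform relative errors per layer.
- The layering, the 0.5 km/s amplitude and the delay times are hypothetical.

The exact scheme of Ammon et al. (1990) may differ.

```python
import numpy as np
from numpy.polynomial import legendre


def cubic_legendre_perturbation(n_layers, max_dv, rng):
    """Smooth random perturbation: Legendre series of degree <= 3 on [-1, 1],
    rescaled so that its largest absolute value equals max_dv."""
    x = np.linspace(-1.0, 1.0, n_layers)      # layer position mapped to [-1, 1]
    coeffs = rng.uniform(-1.0, 1.0, size=4)   # coefficients of P0..P3
    dv = legendre.legval(x, coeffs)
    return max_dv * dv / np.max(np.abs(dv))


def perturb_delay_times(dt_s, max_rel_error, rng):
    """Multiply each delay time by (1 + e), e uniform in [-max_rel_error, max_rel_error]."""
    dt_s = np.asarray(dt_s, dtype=float)
    if max_rel_error == 0.0:
        return dt_s.copy()                    # first set: exact delay times
    e = rng.uniform(-max_rel_error, max_rel_error, size=dt_s.shape)
    return dt_s * (1.0 + e)


rng = np.random.default_rng(seed=42)
n_layers = 32
vs_ref = np.where(np.arange(n_layers) < 18, 3.0, 4.1)  # hypothetical layering
dt_true = np.full(n_layers, 0.25)                       # hypothetical delay times (s)

initial_sets = {}
for max_err in (0.0, 0.1, 0.3, 0.6):                    # 0, 10, 30 and 60 % errors
    initial_sets[max_err] = [
        (vs_ref + cubic_legendre_perturbation(n_layers, 0.5, rng),
         perturb_delay_times(dt_true, max_err, rng))
        for _ in range(250)
    ]
```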

Figure 2

Results of the linearised joint inversion for initial models with different delay time thickness errors. For each initial model, the S-wave velocity of the reference model is perturbed with cubic Legendre polynomials. The synthetic delay times, with different perturbation levels, are then used to calculate the spatial thickness of each layer in the initial models: a synthetic delay times, b synthetic delay times with a 10% perturbation, c synthetic delay times with a 30% perturbation, and d synthetic delay times with a 60% perturbation. The initial models subpanel shows the initial models (grey lines) and the synthetic model (black line). The estimated models subpanels show the estimated models (grey lines), the synthetic model (black line), and the mean estimated model (red line). The mean difference subpanel illustrates the difference between the mean estimated and synthetic models. Estimated RFR and apparent velocities are shown in the Estimated RFs and Estimated Apparent velocities subpanels. The last subpanel shows the evolution of the objective function for each initial model, coloured according to its final objective function.

Figure 2a shows the linearised joint inversion results for a set of 250 initial models with the synthetic initial delay time thicknesses. Each subpanel is read as follows:

- Initial models: the grey lines are the randomly generated initial models and the black line is the synthetic model.
- Estimated models: the grey lines are the estimated S-wave velocities for each initial model, interpolated on a 32-layer setup. The red line is the mean estimated S-wave velocity of each layer, obtained by averaging the estimated S-wave velocities from the linearised joint inversion of all initial models.
- Mean difference: the difference between the mean estimated and synthetic S-wave velocities.
- Estimated RFs and estimated apparent velocities: the RFR and apparent velocity curves calculated from the estimated S-wave velocity models.
- Objective function evolution: the reduction of the objective function for each initial model over the linearised joint inversion iterations, colour-coded by the objective function value.

Our first synthetic test shows that the linearised joint inversion estimates an accurate model when the delay time thicknesses of the model are known (Fig. 2a), even when the initial and synthetic S-wave velocities differ greatly.

The accuracy depends on depth and on the delay time errors:

- In Fig. 2a, the difference between the mean estimated and synthetic S-wave velocities is smaller than 0.05 km/s over most depth ranges.
- Near the sharp boundaries, the difference increases to a maximum of 0.25 km/s and decays again within ~ 4 km of these boundaries. These variations are characteristic of our linearised joint inversion approach and are caused by the imposed smoothing constraint (Eq. 7).
- As the maximum error in the initial delay time thicknesses grows to 60%, the average difference between the mean estimated and synthetic S-wave velocities increases to ~ 0.1 km/s (Fig. 2d).

These results show the value of estimating a pseudo-initial model with optimised delay time thicknesses. For real datasets the delay time thicknesses are unknown, so such a model reduces the estimation errors.

Given the sensitivity of the VS,app inversion, we expect good resolution in the upper 20 km of the model. However, in the first layer of the inversion (0–1.25 km), the mean difference of the estimated velocities is slightly larger than expected for the upper 5 km of the model. This larger difference results from the chosen Gaussian filter width, which limits the frequency content of the RFs and thereby reduces the sensitivity to the shallowest layer. Wang et al. (2022) illustrated the effect of the Gaussian filter width on the joint inversion of VS,app and RFs. They used Gaussian factors ranging from 1.0 to 5.0 in their receiver function calculations, which, for high-quality data, can improve the near-surface velocity estimates.
Nevertheless, considering the general quality of available datasets, we chose a lower Gaussian filter width: a Gaussian factor of 3.5, corresponding to a vertical resolution of ~ 0.6 km for a velocity of, e.g., 4 km/s. This reduces the effect of noise in the receiver function computation, at the cost of a slight reduction in near-surface resolving power.
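The trade-off controlled by the Gaussian factor can be explored with the Gaussian low-pass filter commonly used in iterative time-domain deconvolution, G(f) = exp(-(2πf)^2 / (4a^2)), where a is the Gaussian factor. The sketch below solves for the frequency at which this response falls to a given level. How a "cut-off" is defined (for example at G = 0.5 or G = 0.1) is a convention, so both levels are reported. The sketch only illustrates that the passband scales linearly with a, as in the 1.0 to 5.0 range used by Wang et al. (2022).

```python
import numpy as np


def gaussian_response(f_hz, a):
    """Amplitude of the Gaussian filter G(w) = exp(-w^2 / (4 a^2)), w = 2*pi*f."""
    w = 2.0 * np.pi * np.asarray(f_hz, dtype=float)
    return np.exp(-(w**2) / (4.0 * a**2))


def frequency_at_level(a, level):
    """Frequency (Hz) where the Gaussian response drops to `level` (0 < level < 1)."""
    # exp(-(pi f / a)^2) = level  ->  f = a * sqrt(-ln(level)) / pi
    return a * np.sqrt(-np.log(level)) / np.pi


for a in (1.0, 2.5, 3.5, 5.0):
    print(f"a = {a}: G = 0.5 at {frequency_at_level(a, 0.5):.2f} Hz, "
          f"G = 0.1 at {frequency_at_level(a, 0.1):.2f} Hz")
```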

3.2 Joint Inversion of Noise-Free Data

In this experiment, we assumed that the initial delay time thicknesses are unknown. We applied the two-stage joint inversion procedure to estimate the S-wave velocities from the noise-free synthetic dataset, using both the random model search (Fig. 3a) and PSO (Fig. 3b) as the pseudo-initial model estimator. The synthetic dataset is generated by perturbing the two-layer reference model by 0.6 km/s in four different depth ranges (light blue line in the estimated models subpanel of Fig. 3a). We increased the perturbation by 0.2 km/s relative to the previous experiment to test whether the joint inversion can estimate S-wave velocities without good initial models.

Figure 3

a Two-stage joint inversion of noise-free data using the random model search method to find the pseudo-initial model. The initial models subpanel shows the unperturbed reference model (light blue line) and the random initial models (grey lines). The estimated models subpanel shows the best estimated S-wave velocity model for each initial model, coloured according to its objective function. In the same subpanel, the thick red line shows the average S-wave velocity of all estimated models interpolated on a 40-layer setup, and the light blue line shows the synthetic model. The mean difference subpanel shows the difference between the average and synthetic S-wave velocities. The estimated RFs and estimated apparent velocities subpanels show the RFR and VS,app curves calculated for each estimated model, coloured according to its objective function. The light blue line in these subpanels represents the synthetic RFR and VS,app curves. The rightmost panel shows the objective function evolution of each initial model, coloured according to its objective function. b Two-stage joint inversion of noise-free data using the PSO method to find the pseudo-initial model. The estimated pseudo-initial model is shown in grey in the initial models subpanel. The estimated models subpanel shows three curves: the S-wave velocity model estimated from the pseudo-initial model (navy blue line), the interpolated S-wave velocity obtained by averaging the estimated S-wave velocities on a 40-layer setup (red line), and the synthetic S-wave velocity (light blue line). The interpolated difference subpanel shows the difference between the interpolated and synthetic S-wave velocities. The estimated and synthetic RFR and VS,app curves are shown in the estimated RFs and estimated apparent velocities subpanels. The cost function evolution panel shows the cost function of the global best particle in each iteration of the pseudo-initial model estimation using PSO.

The random model search method works in the following steps:

1. Generate a series of correlated random 10-layer models with fixed spatial layer thicknesses, using cubic Legendre polynomials (grey lines in the initial models subpanel of Fig. 3a). Because the S-wave velocities of the initial models vary, each initial model has different delay time thicknesses, which ensures that the delay time thickness search space is explored.
2. For each initial model, estimate eight S-wave velocity models with the joint inversion core algorithm (Fig. 1a; see Sect. 2.3.2). The final output model is the one among these eight that yields the minimum objective function according to Eq. 7. The estimated S-wave velocity models, RFR, and VS,app curves for each initial model are presented in Fig. 3a.
3. Calculate the mean S-wave velocity model by averaging the S-wave velocities of all estimated models interpolated on a 40-layer parameterisation (red line in the estimated models subpanel of Fig. 3a).
4. Choose, as the pseudo-initial model of the random model search, the initial model whose estimated S-wave velocity model is most similar to the mean S-wave velocity model.
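The last two steps of the random model search, averaging on a common 40-layer grid and picking the closest estimate, can be sketched as follows. The sketch assumes that "most similar" means the smallest root-mean-square deviation from the mean model, and that a model shallower than the 70 km grid is extended with its last layer. The function names and the toy models are hypothetical.

```python
import numpy as np


def sample_layered_model(thickness_km, vs_kms, z_km):
    """Sample a layered model at depths z_km; the last layer is extended downward."""
    tops = np.concatenate(([0.0], np.cumsum(thickness_km)[:-1]))
    idx = np.searchsorted(tops, z_km, side="right") - 1
    return np.asarray(vs_kms, dtype=float)[idx]


def choose_pseudo_initial(results, z_max_km=70.0, n_layers=40):
    """results: list of (initial_model, thickness_km, vs_kms), one per initial model.
    Returns the initial model whose estimate is closest to the mean model,
    the mean model on the 40-layer grid, and the RMS misfit of every estimate."""
    edges = np.linspace(0.0, z_max_km, n_layers + 1)
    z_mid = 0.5 * (edges[:-1] + edges[1:])       # centre of each grid layer
    grid = np.array([sample_layered_model(h, v, z_mid) for _, h, v in results])
    mean_vs = grid.mean(axis=0)
    # Assumption: similarity measured as RMS deviation from the mean model
    rms = np.sqrt(np.mean((grid - mean_vs) ** 2, axis=1))
    best = int(np.argmin(rms))
    return results[best][0], mean_vs, rms


# Toy usage with three hypothetical estimated models
results = [
    ("init0", [35.0, 35.0], [3.0, 4.0]),
    ("init1", [35.0, 35.0], [3.1, 4.1]),
    ("init2", [35.0, 35.0], [3.2, 4.2]),
]
chosen, mean_vs, rms = choose_pseudo_initial(results)
```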

Figure 3b illustrates the joint inversion results when the PSO algorithm is used to estimate the pseudo-initial model. The red line in the estimated models subpanel of Fig. 3b shows the S-wave velocities estimated from the PSO pseudo-initial model, interpolated on a 40-layer setup. The difference between the interpolated and synthetic S-wave velocities is less than 0.15 km/s over most depth ranges. For both methods, however, the difference exceeds this value in several parts of the model. In general, the resolving power of our inversion approach deteriorates in the deeper part of the model, because the apparent velocity has low sensitivity at depths greater than ~ 50 km. Figure 3 also shows an increase in error near the major boundaries, caused by the smoothing constraint discussed in the previous synthetic test.

3.3 Effects of Data Noise and Stacking

This experiment simulates the application of the joint inversion framework to a real dataset. We used the ray parameters and back azimuths of the teleseismic events recorded by the permanent station CHBR in southeast Iran (see Sect. 4). With these, we created seismograms with noise levels of up to 50% from a synthetic S-wave velocity model perturbed by 0.6 km/s in different depth ranges (light blue line in the estimated models subpanel of Fig. 4). The RFs of all seismograms are then calculated and stacked, and the stack is used to compute the synthetic VS,app observations. We then apply the joint inversion framework, using both the random model search (Fig. 4a) and PSO (Fig. 4b) as the pseudo-initial model estimator, to retrieve the S-wave velocity model.
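The benefit of stacking can be illustrated with a toy example. The sketch builds identical synthetic RFs from a P pulse and a hypothetical Ps pulse at 4 s, both shaped by a Gaussian with factor a = 3.5. It adds uniform white noise with an amplitude of up to 50% of the direct-P peak, then compares the residual noise of a single trace with that of the linear stack. The noise definition, pulse amplitudes and number of traces are assumptions. The sketch also ignores the ray-parameter dependence of the Ps time, which affects stacks of real data.

```python
import numpy as np


def gaussian_pulse(t_s, t0_s, a=3.5):
    """Time-domain pulse exp(-a^2 (t - t0)^2) corresponding to a Gaussian factor a."""
    return np.exp(-((a * (t_s - t0_s)) ** 2))


rng = np.random.default_rng(seed=1)
dt = 0.05                                    # 20 samples per second
t = np.arange(-5.0, 30.0, dt)
clean = gaussian_pulse(t, 0.0) + 0.3 * gaussian_pulse(t, 4.0)  # P + hypothetical Ps

n_traces = 124                               # hypothetical number of RFs
noise_level = 0.5                            # up to 50 % of the unit P amplitude
noisy = clean + noise_level * rng.uniform(-1.0, 1.0, size=(n_traces, t.size))

stack = noisy.mean(axis=0)                   # linear stack: sample-wise mean
single_rms = np.sqrt(np.mean((noisy[0] - clean) ** 2))
stack_rms = np.sqrt(np.mean((stack - clean) ** 2))
print(f"noise RMS: single trace {single_rms:.3f}, stack {stack_rms:.3f}")
```

For independent noise, the residual RMS of the stack decreases roughly as 1/sqrt(N) with the number of stacked traces N.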

Figure 4

Two-stage joint inversion of data with a 50% noise level. a Using the random model search with 200 initial models. b Using the PSO optimisation method. Subpanels are the same as in Fig. 3.

Figure 4 illustrates the resolution and reliability of our joint inversion procedures. Down to a depth of 40 km, the mean S-wave velocities estimated with the random model search are resolved with errors of less than 0.2 km/s. The PSO optimiser also generated a pseudo-initial model (grey line in the initial models subpanel of Fig. 4b). This model yields an S-wave velocity model with an average error of less than 0.2 km/s in the upper 40 km. However, the initial models of the random model search were generated from the reference model, which differs from the synthetic S-wave velocity model by at most 0.6 km/s. That search therefore started close to the true structure, an advantage that is not available for real data. We conclude that, when no prior estimate of the S-wave velocity model exists for a real dataset, the PSO optimiser provides the more reliable S-wave velocity model.

Nonetheless, the mean error of the estimated S-wave velocities increases in the deeper part of the model. Several factors may contribute to these larger deep errors:

- the sharp S-wave velocity change at 40 km depth;
- the low sensitivity of VS,app to the deeper part of the model;
- the noise imposed on the seismograms;
- the stacking of RFs.

The sharp increase in error at a depth of ~ 60 km in both methods may result from the noise imposed on the dataset. The joint inversion of RFR and VS,app is often used to image upper crustal S-wave velocities (e.g., Park & Ishii, 2018; Wang et al., 2022). Our synthetic tests also show that the estimated S-wave velocities in the shallow part of the model (< 50 km) are more reliable than those in the deeper part. Nevertheless, the resolving power of our approach in the deeper part of the model (> 50 km) shows that the framework can be used to infer the general S-wave velocity structure at these depths.

4 Real Data Application

We applied our joint inversion framework to a dataset recorded by the permanent station CHBR in the Makran subduction zone, southeast Iran (Fig. 5). We selected 377 teleseismic events at epicentral distances of 30° to 90° with magnitudes greater than 5.
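The event selection amounts to a simple filter on epicentral distance and magnitude. The sketch below uses a spherical-Earth haversine distance. The event tuples in the usage example are hypothetical. In practice, a library routine such as ObsPy's `obspy.geodetics.locations2degrees` can compute the same distance.

```python
import numpy as np


def epicentral_distance_deg(lat1, lon1, lat2, lon2):
    """Great-circle distance in degrees on a spherical Earth (haversine formula)."""
    phi1, phi2 = np.radians(lat1), np.radians(lat2)
    dphi = phi2 - phi1
    dlmb = np.radians(lon2 - lon1)
    h = np.sin(dphi / 2) ** 2 + np.cos(phi1) * np.cos(phi2) * np.sin(dlmb / 2) ** 2
    return np.degrees(2.0 * np.arcsin(np.sqrt(np.clip(h, 0.0, 1.0))))


def select_events(events, sta_lat, sta_lon, d_min=30.0, d_max=90.0, m_min=5.0):
    """events: iterable of (lat, lon, magnitude). Keep events at d_min..d_max
    degrees from the station with magnitude greater than m_min."""
    kept = []
    for lat, lon, mag in events:
        d = epicentral_distance_deg(sta_lat, sta_lon, lat, lon)
        if d_min <= d <= d_max and mag > m_min:
            kept.append((lat, lon, mag))
    return kept
```

For example, with a station at (0°, 0°), an M 6.0 event at (0°, 45°) is kept. The same event at (0°, 20°), an M 4.5 event at (0°, 45°), and an event at (0°, 100°) are all rejected.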

Figure 5

Topography map of the Makran subduction zone in southeast Iran. The red triangle shows the location of station CHBR.

We resampled the raw data to 20 samples per second and applied a 0.05–2.5 Hz bandpass filter to remove noise outside this frequency range. We computed the RFs using the iterative deconvolution method of Ligorría and Ammon (1999) with a Gaussian factor of 3.5 (full width at half maximum (FWHM) = 0.557 Hz, equivalent to a cut-off frequency of 1.74 Hz). We used a series of cosine low-pass filters with widths from 0 to 25 s to determine the apparent velocity of the RFs (Fig. 6d). We then retained only the RFR that meet two criteria: their maximum amplitude occurs at the P arrival, and their apparent velocities exceed 1.5 km/s at all filter widths. This resulted in a dataset of 124 RFR time series (Fig. 6c). We stacked the selected RFs for station CHBR to mitigate the influence of noise, anisotropy, and structural heterogeneity (Fig. 6a). The RF stacks for stations HYB and CAD are shown in Figs. S3 and S8 of the supplementary materials, respectively.

Several stacking methods are available in the jrfapp package. To compute the different stacks, we first divide all calculated RFR into 24 bins according to their back azimuth and stack the RFR in each bin linearly. We then apply several stacking methods (see Figs. S2 and S7 of the supplementary materials):

- a stack weighted by the number of RFR in each of the 24 back-azimuth bins;
- a phase-weighted stack;
- a K0 stack (e.g., Bianchi et al., 2010; Dashti et al., 2020);
- a linear stack of the bin stacks.
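The apparent-velocity computation and the selection criteria can be sketched as follows. The sketch assumes the definition of Svenningsen and Jacobsen (2007), VS,app(T) = sin(½ arctan(R̄(T)/Z̄(T)))/p. Here R̄ and Z̄ are the radial and vertical RFs smoothed around the P arrival with a window of width T. The cosine-squared window shape, the tolerance on the P-arrival time, and the function names are assumptions and need not match the jrfapp implementation.

```python
import numpy as np


def apparent_s_velocity(rf_r, rf_z, t_s, p_skm, widths_s):
    """Vs,app(T) = sin(0.5 * arctan(Rbar / Zbar)) / p for each filter width T.
    Rbar and Zbar are the radial and vertical RFs smoothed around t = 0 with a
    cosine-squared window of half-width T (T = 0 uses the sample at t = 0).
    The window normalisation cancels in the ratio, so it is omitted."""
    rf_r, rf_z, t_s = (np.asarray(x, dtype=float) for x in (rf_r, rf_z, t_s))
    i0 = int(np.argmin(np.abs(t_s)))          # sample closest to the P arrival
    vs_app = []
    for T in widths_s:
        if T == 0:
            r_bar, z_bar = rf_r[i0], rf_z[i0]
        else:
            w = np.where(np.abs(t_s) < T, np.cos(np.pi * t_s / (2.0 * T)) ** 2, 0.0)
            r_bar, z_bar = np.sum(w * rf_r), np.sum(w * rf_z)
        vs_app.append(np.sin(0.5 * np.arctan(r_bar / z_bar)) / p_skm)
    return np.array(vs_app)


def passes_selection(rf_r, t_s, vs_app, v_min=1.5, t_tol_s=0.1):
    """Keep an RFR if its largest amplitude is at the P arrival (within t_tol_s)
    and its apparent velocity exceeds v_min at every filter width."""
    i_max = int(np.argmax(rf_r))
    return bool(abs(t_s[i_max]) <= t_tol_s and np.all(np.asarray(vs_app) > v_min))
```

As a check, a radial spike of amplitude tan(2 arcsin(p·Vs)) over a unit vertical spike returns VS,app = Vs at every filter width.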

Figure 6

Receiver function and apparent velocity data obtained for station CHBR. a Stacked RFR. b Apparent velocity curve (VS,app) of the stacked RFR. c Radial receiver function time series stacked in 24 back-azimuth bins; the back azimuth and the number of RFR in each bin are given on the left and right sides of the stacked RFR, respectively. d Apparent velocities (VS,app) of the stacked RFR shown in panel (c); the red line represents the average of all apparent velocity curves.

For station CHBR, we stacked the RFR from all back azimuths linearly. Each sample of the stacked RFR is therefore the mean of the corresponding samples of all RFR that pass the apparent velocity criteria. We then computed the apparent velocity of the stacked receiver function (Fig. 6b) and performed a joint inversion of the stacked RFR and its VS,app curve. The ray parameter of the stacked RFR is the average of all ray parameters included in the stack.

Figure 7 shows the eight S-wave velocity models estimated for CHBR in stage 2 of the framework. These models were obtained with the joint inversion core algorithm, starting from the pseudo-initial model estimated with the PSO method, using different numbers of layers and two sets of smoothing and damping factors. The highlighted model yields the minimum objective function according to Eq. 7 (Fig. 7; the inversion outputs for stations HYB and CAD are presented in Figs. S3 and S8).

Our estimated S-wave velocity model shows the following structure:

- a sharp decrease in velocity at a depth of 2 km;
- a sharp increase at 6–8 km depth;
- a nearly constant velocity of around 3.0 km/s between 10 and 18 km depth;
- a slight decrease at ~ 18 km depth;
- another velocity interface at ~ 24–30 km depth, across which the velocity increases;
- a velocity of about 4.0 km/s below 30 km.
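The back-azimuth binning and the linear stack with its averaged ray parameter can be sketched as follows. The sketch assumes equal 15° bins (360°/24) starting at north. The function names are hypothetical, and the other stacking options (number-weighted, phase-weighted, and K0 stacks) are not reproduced.

```python
import numpy as np


def backazimuth_bin_stacks(rfs, baz_deg, n_bins=24):
    """Linear stack of the RFR inside each back-azimuth bin.
    Returns {bin index: (stacked trace, number of traces)} for non-empty bins."""
    rfs = np.asarray(rfs, dtype=float)
    width = 360.0 / n_bins                                  # 15 degrees for 24 bins
    bins = np.floor(np.mod(baz_deg, 360.0) / width).astype(int)
    return {int(b): (rfs[bins == b].mean(axis=0), int(np.sum(bins == b)))
            for b in np.unique(bins)}


def linear_stack(rfs, ray_params):
    """Sample-wise mean of all selected RFR; the stack is assigned the mean ray parameter."""
    return np.asarray(rfs, dtype=float).mean(axis=0), float(np.mean(ray_params))
```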

Figure 7

The estimated S-wave velocity, receiver function, and apparent velocity for different layer setups (output of joint inversion stage 2), obtained using the pseudo-initial model estimated with the PSO method. The delay time thicknesses of the 9-layer pseudo-initial model are subdivided by factors of 2, 3, 4, and 5 in the respective subplots. The resulting delay time thicknesses are converted to spatial layer thicknesses using the S-wave velocity of each layer of the pseudo-initial model. These four initial models are used in the linearised joint inversion (Eq. 7) with the initial damping and smoothing constraints (upper subplots) and with increased damping and smoothing (lower subplots). The highlighted model is the one that yields the lowest objective function according to Eq. 7. The objective function (norm) of each linearised joint inversion is reported in the upper part of each subpanel.
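One possible reading of this layer-setup procedure is sketched below. Each delay time layer of the pseudo-initial model is split into n equal sublayers (n = 2 to 5), and the sublayer delay times are converted to spatial thicknesses with the inverse of Eq. 9. This gives four layer setups, which, combined with two regularisation settings, yield the eight candidate inversions. The fixed Vp/Vs ratio of 1.73, the ray parameter, and the example pseudo-initial model are hypothetical.

```python
import itertools

import numpy as np


def subdivide_layers(dt_s, vs_kms, n_sub, p_skm, vp_vs=1.73):
    """Split every delay-time layer into n_sub equal parts and return the
    sublayer spatial thicknesses (km) and S-wave velocities (km/s)."""
    dt_sub = np.repeat(np.asarray(dt_s, dtype=float) / n_sub, n_sub)
    vs_sub = np.repeat(np.asarray(vs_kms, dtype=float), n_sub)
    vp_sub = vp_vs * vs_sub                      # assumption: constant Vp/Vs
    q_s = np.sqrt(vs_sub**-2 - p_skm**2)
    q_p = np.sqrt(vp_sub**-2 - p_skm**2)
    return dt_sub / (q_s - q_p), vs_sub          # inverse of Eq. 9


# Hypothetical 9-layer pseudo-initial model
dt_pseudo = np.full(9, 0.4)                      # delay time thicknesses (s)
vs_pseudo = np.linspace(2.5, 4.2, 9)             # S-wave velocities (km/s)
candidates = {n: subdivide_layers(dt_pseudo, vs_pseudo, n, 0.06) for n in (2, 3, 4, 5)}

# Eight candidate runs: four layer setups x two (hypothetical) regularisation levels
setups = list(itertools.product((2, 3, 4, 5), ("initial", "increased")))
```

Splitting in delay time rather than depth keeps the total thickness of each pseudo-initial layer unchanged, because each sublayer shares the parent layer's velocities.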

4.1 Results and Discussion

Station CHBR is located in the Makran accretionary wedge (MAW), part of an active subduction zone in southeast Iran and southern Pakistan (Fig. 5). Data from this station were previously used in several studies to infer the structure of the MAW through joint inversion of RFR and surface wave dispersion curves (Irandoust et al., 2022; Motaghi et al., 2020; Penney et al., 2017; Priestley et al., 2022; Taghizadeh-Farahmand et al., 2015). Additionally, Haberland et al. (2021) present a P-wave velocity model of the MAW along three deep seismic sounding profiles. These studies allow us to compare the results of our approach with those of other methods that infer the subsurface structure from RFR time series and controlled seismic sources.

Figure 8 compares the S-wave velocity model estimated for CHBR in this study with those of Priestley et al. (2022), Motaghi et al. (2020), Penney et al. (2017), Irandoust et al. (2022), and Haberland et al. (2021). The S-wave velocity model of Haberland et al. (2021) was deduced from their P-wave velocity model using a constant VP/VS of 1.8.

Our velocity model is consistent with the other models at depths of 10–50 km. However, our model has much lower velocities in its shallow part than those of the other RF studies. Interestingly, the model of Haberland et al. (2021), converted from their P-wave velocities, shows a similar decrease in S-wave velocity at shallow depths. Because the apparent velocity is more sensitive at these depths, we can resolve a shallow low velocity structure that was not detectable by the joint inversion of RFR and surface wave dispersion. Surface waves from earthquakes have very limited or no resolution in the upper crust.

The S-wave velocity decrease at a depth of 2 km in our model results from the decrease in RFR amplitude observed at ~ 1 s after the P onset (light blue line in Fig. 8b). This feature appears as a decrease in the otherwise increasing trend of VS,app at periods of 0–2 s (light blue line in Fig. 8c), which is mapped as a shallow low velocity layer in the estimated S-wave velocity model. This low velocity anomaly represents the shallow sediments (Makran Sand) that overlie older sediments (Himalayan Turbidite) with higher S-wave velocities (Grando & McClay, 2007; Pajang et al., 2021).

Figure 8

a Comparison of the S-wave velocity estimated in this study (black line) with those of Penney et al. (2017; green line), Priestley et al. (2022; blue line), Irandoust et al. (2022; red line), Motaghi et al. (2020; purple line), and Haberland et al. (2021; orange line). b Receiver functions calculated from the models shown in (a); the light blue line represents the observed RFR. All RFR are normalised for consistency of illustration. c Apparent velocity curves calculated from the models shown in (a); the light blue line represents the observed VS,app curve.

The S-wave velocity increase at 26–30 km depth agrees with the other models shown in Fig. 8. This boundary marks the oceanic Moho and results from the sharp Ps conversion at ~ 4 s in the stacked RFR. Our method produces S-wave velocities similar to those deduced from the joint inversion of RFR and surface wave dispersion. This similarity indicates that RFs alone can provide a good estimate of the S-wave velocity beneath a seismic station.
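As a rough consistency check, Eq. 9 applied to a single homogeneous layer converts the ~ 4 s Ps delay into an interface depth. The average crustal S-wave velocities, the Vp/Vs ratio of 1.8, and the ray parameter of 0.06 s/km below are assumptions, not values from our inversion.

```python
import numpy as np


def depth_from_ps_delay(t_ps_s, vs_kms, vp_kms, p_skm):
    """Single-layer inverse of Eq. 9: H = t_Ps / (sqrt(Vs^-2 - p^2) - sqrt(Vp^-2 - p^2))."""
    q_s = np.sqrt(vs_kms**-2 - p_skm**2)
    q_p = np.sqrt(vp_kms**-2 - p_skm**2)
    return t_ps_s / (q_s - q_p)


p = 0.06                                   # assumed ray parameter (s/km)
depths = {}
for vs_mean in (3.0, 3.5):                 # assumed mean crustal Vs (km/s)
    depths[vs_mean] = depth_from_ps_delay(4.0, vs_mean, 1.8 * vs_mean, p)
    print(f"mean Vs {vs_mean} km/s -> interface depth {depths[vs_mean]:.1f} km")
```

Under these assumptions, mean S-wave velocities of 3.0 and 3.5 km/s place the interface at roughly 26 and 30 km depth, bracketing the 26–30 km range of our model.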

The difference between our model and the previous studies shown in Fig. 8 increases at depths greater than 50 km. We again note that VS,app is more sensitive to S-wave velocity than surface waves at shallow depths, whereas this relationship is reversed at greater depths. Comparisons of the estimated S-wave velocities for stations HYB and CAD are presented in Figs. S5, S6, and S10 of the supplementary materials.

5 Conclusion

We presented a new framework for the joint inversion of the RFR and VS,app curve, together with a Python package (jrfapp) that implements it. Using several synthetic tests, we showed that this method can estimate high-resolution absolute S-wave velocities at crustal depths. In our approach, we simultaneously estimate the S-wave velocity and find the best layer setup. In addition, a pseudo-initial model is estimated using one of two alternative methods, which helps us overcome the inherent non-uniqueness of the joint inversion of receiver functions and apparent velocity curves.

To test the ability of the method to resolve crustal structure, we estimated the S-wave velocity model beneath station CHBR in the Makran subduction zone, southeast Iran. Our estimated model is consistent with previous research at depths of 10–50 km. Its shallower features agree more closely with the model resolved by active-source deep seismic sounding in the region. At depths greater than 50 km, our tests show that the method can recover only the general velocity structure, owing to the low sensitivity of VS,app at these depths.

The method has two main advantages. First, it is more sensitive to absolute velocities at upper crustal depths than surface wave dispersion. Second, the two datasets are fully consistent, because both are derived from the same raw data, namely the teleseismic P-wave recording. Because no pre-existing complementary datasets (such as dispersion curves) are required, the method is straightforward to apply to any recorded teleseismic P-wave data.
In contrast, surface wave dispersion curves are derived from an entirely different part of the earthquake waveform and require entirely different processing. They generally have much lower resolution, with limited or no resolution at upper-crustal scale. They therefore provide a more regional representation of the structure around the station, depending on the setup and station distribution of the tomography model.