1 Introduction

Site-specific seismic hazard is often assessed in a two-step procedure where the hazard is first computed at a reference rock horizon and then local ground conditions are integrated through soil response analysis (e.g., Rodriguez-Marek et al. 2014). We use here the term “reference bedrock” to indicate a site (or horizon) unaffected by the response of the deeper velocity structure, thus having a flat (unamplified) response over a sufficiently wide frequency range. The use of empirical ground-motion models (GMMs) to estimate reference bedrock motion is challenging because 1) few models are developed for bedrock conditions due to the limited number of hard-rock sites in strong-motion databases, especially in the magnitude-distance range of interest for seismic hazard, and 2) the adjustment of models from standard rock to reference bedrock is affected by significant epistemic uncertainties in both host and target conditions. This translates into large epistemic uncertainties in modelling the ground motion at reference bedrock.

Several attempts have been made in the last decades to derive GMMs for reference bedrock conditions. Many of these attempts aimed to adjust existing models from soft-rock to reference-rock conditions through proxy-based generic (Cotton et al. 2006; Van Houtte et al. 2011; Ktenidou and Abrahamson 2016; Lanzano et al. 2022) or site-specific (Biro and Renault 2012; Ameri et al. 2017; Tromans et al. 2019; Bommer et al. 2015) corrections. Others addressed the problem differently, mainly by deconvolving time histories from site effects under 1D assumptions before GMM development, resulting in corrected hard-rock motions (Cadet et al. 2012; Laurendeau et al. 2018; Shible et al. 2018). The detailed review of hard-rock motion predictions by Bard et al. (2020) resulted in several recommendations, one of which is to use generalized inversion techniques (GIT) to robustly predict and remove site effects. Following these recommendations, Shible (2021) and Shible et al. (2023) extended the deconvolution approach of surface recordings beyond the limitation to 1D conditions through the use of site terms from GIT. Although the results of the deconvolution approach are promising, the potential lack of data in low or moderate seismicity regions remains an obstacle to the development of regional GMMs for reference bedrock. Moreover, GMMs generally provide ground-motion intensity measures, whereas full time histories may be of interest for soil response and dynamic structural analyses. Hence, alternatives to empirical GMMs should be considered to estimate reference ground motion for seismic hazard analyses.

One alternative is to simulate earthquake ground motions using, for example, 3D physics-based approaches that are capable of generating synthetic time histories at bedrock considering a local or regional 3D crustal model and an extended-source model (e.g., Frankel et al. 2018; Paolucci et al. 2015; Pitarka et al. 2021). However, such approaches still suffer from the limited knowledge of the propagation medium, which prevents, in most cases, obtaining time histories covering a sufficiently large frequency band (0.1–20 Hz). Another particularly appealing approach is the Empirical Green’s Functions (EGF) simulation method (Hartzell 1978; Hutchings 1994; Irikura 1986; Irikura and Kamae 1994), which combines empirical data and theoretical models. The basic idea of the EGF approach is to interpret recordings of small events at the site of interest as reasonable approximations of Green’s functions (describing the impulse response of the medium) and to convolve them suitably with a more or less complex source model to simulate time histories that correspond to larger earthquakes.

The power of this technique lies in its ability to map the linear site- and path-specific effects into the ground-motion field, without the need for computationally expensive approaches to simulate 3D wave propagation. The drawback of this approach is that, being based on site and path effects from small earthquakes, the simulated ground motion for large events is conditioned on the location, quality and availability of such data. Moreover, in an application to reference bedrock sites, the challenge is to remove the site effects already included in the recordings at each site.

In this article, we adopt the EGF simulation approach and, building upon previous works (i.e., Laurendeau et al. 2018; Shible et al. 2023), we use the GIT to estimate empirical source and site terms and remove them from observed ground motions. The obtained path effects (effectively EGFs) are coupled with kinematic rupture models to simulate region-specific reference bedrock time histories that can be used alone or in combination with existing empirical datasets to adjust existing GMMs, derive new GMMs, or select hazard-consistent time histories to be used in soil and structural response analyses.

The proposed method is applied to a case study in a moderate-seismicity region characterized by a limited amount of local data in order to challenge the approach and to assess its relevance in such a context. Further studies are ongoing to apply the proposed approach in low-seismicity regions.

2 Methodology

2.1 Overview

The workflow of the proposed methodology is as follows:

  1. The proposed methodology relies first on the nonparametric spectral decomposition approach (also called the generalized inversion technique, GIT), which has been developed and used in many studies to separate the contributions of source, path, and site terms (e.g., Castro et al. 1990; Oth et al. 2011; Castro et al. 2013; Ameri et al. 2020; Shible et al. 2022; Davatgari et al. 2021). A dataset of recordings is collected over a wide region around the target site for the application of the GIT. This region is typically chosen large enough to resolve source and site terms at regional scale.

  2. GIT is applied to the selected data to separate source, path, and site terms from the observed Fourier spectra. The attenuation terms in the adopted nonparametric GIT approach are unaffected by site/source constraints applied to inversions (Bindi and Kotha 2020; Oth et al. 2011).

  3. Deconvolved EGFs (representing only the path terms) are obtained by removing from the observed records the average source and site terms estimated by GIT, through Fourier-domain deconvolution similar to Shible (2021). The deconvolved EGF in the time domain is obtained by inverse Fourier transform assuming no phase modification. This is done for a subset of EGFs selected for use in the simulations in order to sample the region around the target site.

  4. For each target magnitude (e.g., Mw = 6), a set of kinematic rupture models following a k−2 slip distribution and approximating an ω−2 source spectrum (Herrero and Bernard 1994; Bernard et al. 1996; Causse et al. 2009) is generated according to the approach presented by Dujardin et al. (2020). Uncertainties in kinematic rupture parameters (e.g., slip distribution, rupture velocity, hypocentral location, stress drop, rupture dimensions) are considered.

  5. The source time function and the EGF associated with each sub-fault are then convolved to produce 3-component time histories that combine a simulated source contribution with the empirical path effect. Because site effects have been removed, the simulated ground motions are representative of reference bedrock conditions.

2.2 Source and site spectral modeling

Generalized spectral inversion schemes are based on the principle of separation of the Fourier amplitude spectrum into three main components, as indicated in Eq. (1):

$$\:{FAS}_{ij}\left(f\right)={E}_{j}\left(f\right){A}_{ij}({r}_{ij},f){S}_{i}\left(f\right)$$
(1)

where \({FAS}_{ij}\left(f\right)\) is the Fourier amplitude spectrum at each frequency \(f\) recorded at site \(i\) for event \(j\), \({E}_{j}\left(f\right)\) is the source function, \({A}_{ij}({r}_{ij},f)\) is the path contribution over the event-site distance \({r}_{ij}\), and \({S}_{i}\left(f\right)\) is the site response term.

Taking the logarithm of Eq. (1), we obtain a linear system of the form \(A\cdot x=b\), where \(b\) is the data vector, \(x\) is the solution of the system, and \(A\) is the system matrix (e.g. Andrews 1986; Castro et al. 1990). By construction, the system has two undetermined degrees of freedom, which can be resolved if two constraints are added in the inversion. The first constraint is often applied to one (site term = 1) or several site responses (mean of site terms = 1), while the second one is applied to the attenuation functions by defining a reference distance \({R}_{0}\) at which \(A(r={R}_{0},f)=1\) at all frequencies \(f\). Solving the linear system leads to non-parametric terms, and thus to a so-called non-parametric GIT (Bindi et al. 2009; Oth et al. 2011; Ameri et al. 2011; Bindi and Kotha 2020).
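For illustration, the log-linearized form of Eq. (1) and the two constraints can be written as follows. This is only a sketch: the mean of the site terms is interpreted here as a geometric mean over a set of reference sites, and the symbols \(ref\) and \(N_{ref}\) (the set and number of reference sites) are introduced for this example and are not taken from the cited studies.

$$\log {FAS}_{ij}(f)=\log {E}_{j}(f)+\log A({r}_{ij},f)+\log {S}_{i}(f)$$

$$\log A({R}_{0},f)=0,\qquad \frac{1}{N_{ref}}\sum_{i\in ref}\log {S}_{i}(f)=0$$

Each record contributes one row of the first kind to the system, and the two constraint equations remove the two undetermined degrees of freedom.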

As discussed further in this paper, we apply the inversion scheme to a large region around a target site to determine the source spectra \(\:E\left(f\right)\:\)for each earthquake, the attenuation curves as functions of the hypocentral distance \(\:r\) at each frequency \(\:A\left(r,f\right)\), and the site amplification \(\:S\left(f\right)\) as a function of frequency for each site in the dataset. The source and site terms are then combined to form a non-parametric correction term, which is used to derive the EGFs prior to simulations. In this study, we do not intend to provide a detailed interpretation of inversion results in terms of source or attenuation parameters and as such we will only focus on the non-parametric terms.

2.3 Ground motion simulations using the EGF technique

According to the representation theorem (Aki and Richards 2002), assuming a rectangular fault rupture characterized by length L and width W, the simulated acceleration \(\:U\left(\overrightarrow{r},t\right)\) for a station at position \(\:\overrightarrow{r}\) can be written as:

$$\:U\left(\overrightarrow{r},t\right)={\int\:}_{0}^{L}{\int\:}_{0}^{W}R\left(x,y;t\right)\cdot\:F{G}_{x,y}(\overrightarrow{r},t)dxdy$$
(2)

where \(R\left(x,y;t\right)\) represents the contribution to the moment rate function at position (x, y) on the fault, and \(F{G}_{x,y}\left(\overrightarrow{r},t\right)\) is the Green’s function in acceleration associated with the same subfault.
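In practice the integral of Eq. (2) is evaluated as a sum over sub-faults. The sketch below is a minimal illustration of that discretization, assuming that the product in Eq. (2) stands for a convolution in time and that all sub-faults share the same area; the function names `convolve` and `simulate` and the argument `cell_area` are hypothetical and not taken from the cited implementations.

```python
def convolve(a, b, dt):
    """Discrete approximation of the time convolution of two sampled signals."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj * dt
    return out


def simulate(moment_rates, greens, dt, cell_area):
    """Sum the convolved sub-fault contributions (discrete form of Eq. 2).

    moment_rates[k] and greens[k] are the time series of sub-fault k,
    both sampled with the time step dt.
    """
    n = max(len(r) + len(g) - 1 for r, g in zip(moment_rates, greens))
    total = [0.0] * n
    for r, g in zip(moment_rates, greens):
        for i, value in enumerate(convolve(r, g, dt)):
            total[i] += value * cell_area
    return total
```

With a single sub-fault whose moment rate is a unit impulse (amplitude 1/dt in the first sample), `simulate` returns the Green's function itself, which is a convenient sanity check.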

Originally, Hartzell (1978) proposed to use small-magnitude events as EGFs, which implicitly accounts for the complexity of the propagation and linear site effects over a broad frequency range. The simulation approach adopted in this study couples the EGF technique with a kinematic description of the extended fault assuming a k−2 slip model. We provide here a general overview of the approach and detail the specific choices made for the present application, referring to Dujardin et al. (2018, 2020) for further reading on the general formulation of the method. The main steps are:

  1. Rupture Area Dimensions. The seismic moment M0, the focal mechanism, and the size of the fault on which the rupture is expected are postulated. Then, the size of the rupture area on this fault, which is assumed to be rectangular, is automatically calculated from the stress drop (Δσ) and the seismic moment M0, as originally proposed by Herrero and Bernard (1994). The dimensions of the rupture area are derived as follows. From the input stress drop, the theoretical corner frequency (\({f}_{c}\)) is derived following the Brune (1970) model. Then, according to the approximation \({f}_{c}=1/{T}_{rup}\) (Hanks 1979; Hanks and McGuire 1981), where \({T}_{rup}={D}_{rup}/{V}_{r}\) is the rupture duration and \({V}_{r}\) is the rupture velocity (m/s), we derive the length (L) and width (W) of the rupture area by assuming that the characteristic size of the rupture is \({D}_{rup}=\sqrt{{L}^{2}+{W}^{2}}\). Thus, only the ratio between L and W is needed to derive the dimensions of the rupture area. \({V}_{r}\) depends on the shear-wave velocity (\({V}_{S}\)) in the vicinity of the fault, and it commonly varies between 0.7 \({V}_{S}\) and 0.85 \({V}_{S}\) (Heaton 1990). \({V}_{S}\) in the vicinity of the fault is also used to derive the differences in travel times between the different parts of the rupture area and the target station. Both \({V}_{S}\) and the ratio between \({V}_{S}\) and \({V}_{r}\) are parameters to be chosen by the user.

  2. Static Slip Generation. Once rupture dimensions are defined, the static slip distributions of the source are generated in two steps, as the low- and high-frequency parts of the static slip are constrained separately. The low-frequency part is set to a constant value over the rupture area (the mean slip derived from the seismic moment and rupture area). The high-frequency part is defined in the wavenumber domain following Herrero and Bernard (1994) and should have a k−2 decay at high wavenumbers.

  3. Spatial Sampling. The rupture area is discretized into sub-faults whose size (\({SF}_{dim}\)) is defined according to the target maximum frequency (\({f}_{kmax}\)) of the simulations, which is 20 Hz in our application.

$$\:{SF}_{dim}=\frac{{V}_{r}}{2\:\times\:\:{f}_{kmax}}$$
(3)

Note that this approach differs from the EGF formulation based on scaling laws between large and small earthquakes, in which the sub-fault size depends on the EGF seismic moment (e.g., Irikura and Kamae 1994; Miyake et al. 2003).

  4. Rupture kinematics. The rupture kinematics is defined based on the position of the rupture starting point and on the rupture velocity. The slip rate function is defined as a sum of isosceles triangles as proposed by Hisada (2001). It is described by three parameters: the slip rate function duration \({\tau}_{rise}\) (or rise time), the number of summed triangles (Nv), and Ar, which is the ratio of the area of the (j + 1)th triangle to the area of the jth triangle (i.e., \({A}_{r}={A}_{j+1}/{A}_{j}\)). For the present application, we use Nv = 4 and Ar = \(\sqrt{2}\) (Dujardin et al. 2020). Hisada (2001) showed that this function has two characteristic frequencies: \({f}_{1}=1/(2{\tau}_{rise})\) and \({f}_{max}=1/{\tau}_{1}\), where \({\tau}_{1}\) is the duration of the first triangle. \({\tau}_{rise}\) is assumed to be constant over the rupture area, and is defined according to Somerville et al. (1999):

$$\:{\tau\:}_{rise}=2.03\:\times\:{10}^{-9}{\left({M}_{0}\right)}^{1/3}$$
(4)

Finally, the absolute source time function is obtained by summing the contribution of each sub-fault.

  5. Green’s function adjustments. Several adjustments can be applied in order to correct the EGF for amplitude and time differences when it is shifted from its original location to the subfault position (Dujardin et al. 2018, 2020). Moreover, a radiation pattern correction is also proposed by Dujardin et al. (2020) for cases where the focal mechanisms of the EGF and of the target event are not the same. In this application, we corrected the EGF in amplitude for geometrical and anelastic attenuation (Qs) based on the GIT results. However, we did not use the radiation pattern correction because information on the focal mechanism of the small events was not available. The simulated target events are thus assumed to have the same focal mechanism as the small events selected for the EGF. This appears reasonable since the selected small events are considered representative of the event type that may occur close to the target site. The correction for the travel-time difference of the EGF was tested but ultimately not adopted because of the relatively small dimensions of the rupture and thus the small time shift of the EGF (note also that, as discussed in the following, the simulated ruptures are assumed to be centered on the EGF hypocenter).

  6. Time series generation. The source time function and the EGF associated with each sub-fault are then convolved and integrated over the fault in order to obtain ground motions from the simulated target earthquake.
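The geometric quantities in steps 1, 3 and 4 can be sketched numerically as below. This is an illustration under stated assumptions, not the implementation of Dujardin et al. (2020): the Brune corner frequency is written in the common form \(f_c = 0.4906\,V_S(\Delta\sigma/M_0)^{1/3}\) in SI units, the moment is obtained from Mw with the standard relation \(\log_{10} M_0 = 1.5 M_w + 9.1\) (N·m), and Eq. (4) is evaluated with M0 in dyne·cm as in Somerville et al. (1999). All function names are hypothetical.

```python
import math


def seismic_moment(mw):
    """Seismic moment in N·m from moment magnitude (assumed standard relation)."""
    return 10 ** (1.5 * mw + 9.1)


def brune_corner_frequency(m0, stress_drop, vs):
    """Brune corner frequency in Hz (m0 in N·m, stress_drop in Pa, vs in m/s)."""
    return 0.4906 * vs * (stress_drop / m0) ** (1.0 / 3.0)


def rupture_dimensions(m0, stress_drop, vs, vr_ratio, aspect_ratio):
    """Length and width (m) from D_rup = V_r / f_c and D_rup = sqrt(L^2 + W^2)."""
    fc = brune_corner_frequency(m0, stress_drop, vs)
    vr = vr_ratio * vs
    d_rup = vr / fc  # T_rup = 1 / f_c and D_rup = V_r * T_rup
    width = d_rup / math.sqrt(1.0 + aspect_ratio ** 2)
    return aspect_ratio * width, width


def subfault_size(vr, f_kmax):
    """Sub-fault dimension from Eq. (3)."""
    return vr / (2.0 * f_kmax)


def rise_time(m0):
    """Rise time in s from Eq. (4); m0 in N·m is converted to dyne·cm (assumption)."""
    return 2.03e-9 * (m0 * 1e7) ** (1.0 / 3.0)


if __name__ == "__main__":
    m0 = seismic_moment(6.0)
    length, width = rupture_dimensions(m0, 2.0e6, 3200.0, 0.8, 2.0)
    print(f"L = {length / 1e3:.1f} km, W = {width / 1e3:.1f} km")
    print(f"sub-fault size = {subfault_size(0.8 * 3200.0, 20.0):.0f} m")
    print(f"rise time = {rise_time(m0):.2f} s")
```

For Mw = 6, a 2 MPa stress drop, \(V_S\) = 3.2 km/s, \(V_r = 0.8 V_S\) and L/W = 2, this gives a rupture of roughly 12.5 km by 6 km, sub-faults of 64 m for a 20 Hz target frequency, and a rise time slightly below 0.5 s.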

3 Application to a case study

3.1 Evaluation of source and site terms

The proposed approach is applied to a case study corresponding to a hypothetical site located on the western coast of Italy, a few tens of kilometers north of Rome.

The data selection relies on the earthquake recordings at stations as close as possible to the target site as reported in the ESM database (Engineering Strong-Motion Database, Luzi et al. 2016; Lanzano et al. 2019b), which contains uniformly processed strong-motion data for magnitude ≥ 4.0 (though recordings from some events with magnitudes between 3.5 and 4.0 are also reported), mainly recorded in the European-Mediterranean regions and the Middle East. Starting from the ESM flatfile containing data up to 2016 (Lanzano et al. 2018), additional recordings have been compiled up to the end of 2021.

The data processing by ESM (described in detail in several publications, e.g., Paolucci et al. 2011; Luzi et al. 2016; Lanzano et al. 2019b) included the application of a 4th-order Butterworth band-pass filter over the frequency range [Hp, Lp], where “Hp” and “Lp” are the high-pass and low-pass corner frequencies reported in the ESM flatfile, respectively. In our analysis, we define the largest usable frequency bandwidth as [\({f}_{min}=1.3Hp\); \({f}_{max}=Lp/1.3\)], which avoids edge effects of the filter.
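The usable-band rule can be written as a short helper. The sketch below is illustrative only: the function names `usable_band` and `covers` are hypothetical, and `covers` simply checks whether a requested band (for example the 0.3–25 Hz range required later in Sect. 3.2) lies inside the usable band of a record.

```python
def usable_band(hp, lp, margin=1.3):
    """Usable band [f_min, f_max] in Hz from the filter corners Hp and Lp."""
    return margin * hp, lp / margin


def covers(hp, lp, f_low, f_high):
    """True if the usable band of a record contains [f_low, f_high]."""
    f_min, f_max = usable_band(hp, lp)
    return f_min <= f_low and f_max >= f_high
```

For instance, a record filtered between 0.1 and 40 Hz has a usable band of about 0.13–30.8 Hz and therefore covers 0.3–25 Hz, whereas a record with Hp = 0.3 Hz does not, because its usable band starts at 0.39 Hz.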

A first dataset, dedicated to the GIT analysis and hereafter called DATAGIT, includes a wide selection of stations and earthquakes in Central Italy as shown in Fig. 1.

Fig. 1

Map showing the ESM subset of stations (triangles) and earthquakes (circles) used for the GIT application (DATAGIT). The black lines represent the source-site ray paths in the dataset. The scatter plots show the magnitude distribution versus epicentral distance REPI (a) and focal depth (b) for the defined ESM subset DATAGIT

We apply the following selection criteria on ESM data to define DATAGIT:

  • We discard all records with epicentral distance (REPI) > 250 km to avoid the impact of long-distance attenuation on the results of the inversions.

  • Only events at depths < 40 km are kept to restrict the analysis to shallow-crustal earthquakes.

  • All events with missing depth information are discarded.

  • The minimum number of recordings per station and per event is set to 5.
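The criteria above can be sketched as a simple filter. The record layout (dictionaries with the keys `event`, `station`, `repi_km` and `depth_km`) and the function name `select_datagit` are hypothetical, and repeating the count filter until the selection is stable is an assumption of this sketch rather than a documented step of the procedure.

```python
from collections import Counter


def select_datagit(records, max_repi_km=250.0, max_depth_km=40.0, min_count=5):
    """Apply the distance, depth and minimum-count criteria to a list of records."""
    kept = [
        r for r in records
        if r["depth_km"] is not None
        and r["depth_km"] < max_depth_km
        and r["repi_km"] <= max_repi_km
    ]
    # Dropping a sparse station can make an event sparse (and vice versa),
    # so the count filter is repeated until nothing else is removed.
    while True:
        per_event = Counter(r["event"] for r in kept)
        per_station = Counter(r["station"] for r in kept)
        filtered = [
            r for r in kept
            if per_event[r["event"]] >= min_count
            and per_station[r["station"]] >= min_count
        ]
        if len(filtered) == len(kept):
            return filtered
        kept = filtered
```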

The criteria imposed result in the data magnitude-distance distribution shown in Fig. 1. The subset appears to include enough recordings at different magnitudes and distance ranges, covering magnitudes between 3.5 and 6.5 and REPI starting from a few kilometers. The depths of considered earthquakes are dispersed between a few kilometers and 40 km. DATAGIT contains 10 113 recordings in total.

We assume that DATAGIT covers a region with homogeneous attenuation, as also assumed in previous GIT applications in Central Italy (e.g., Morasca et al. 2023). This allows us to apply inversions using a single attenuation model. This assumption has limited implications for the results of this study because it is only used to separate source and site terms in the GIT, and variations in path effects due to inhomogeneous attenuation are directly mapped into the corrected EGFs finally used in the simulations. We set the reference hypocentral distance to \({R}_{0}\) = 10 km (Morasca et al. 2023) because of the small number of data in shorter distance bins. The impact of this choice is usually interpreted as shifting the source spectra to a distance \({R}_{0}\). This shift is generally counterbalanced by rescaling them with \({R}_{0}\) (e.g., Castro et al. 1990; Oth et al. 2011; Bindi and Kotha 2020). However, the nonparametric source terms do depend on the choice of \({R}_{0}\) and, although preliminary tests showed a minor impact on the resulting corrected EGFs, we recommend that this dependence be assessed in future applications to other datasets.

The way of dealing with site constraints in GIT is not unique (Bindi and Kotha 2020; Drouet et al. 2010; Nakano et al. 2015; Pacor et al. 2016; Shible et al. 2022; Morasca et al. 2023), and the constraint can be applied to one or several site terms. The six reference stations proposed by Morasca et al. (2023) are used in this application. As already mentioned, the choice of the site constraint has no impact in the present methodology because source and site terms are corrected jointly and trade off against each other in the GIT. This is illustrated in Figure S1 for a simple example showing that the combination of GIT-derived source and site terms is stable and unaffected by the choice of site constraint. This is also in agreement with previous studies (Bindi and Kotha 2020; Oth et al. 2011).
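One way to see why the constraint drops out is sketched below, under the assumption that changing the site constraint rescales all site terms by a common frequency-dependent factor \(c(f)\) while the attenuation term is held fixed. The primed symbols and \(c(f)\) are introduced here for illustration only.

$$E'_{j}(f)=\frac{{E}_{j}(f)}{c(f)},\qquad S'_{i}(f)=c(f)\,{S}_{i}(f)\quad\Rightarrow\quad E'_{j}(f)\,S'_{i}(f)={E}_{j}(f)\,{S}_{i}(f)$$

The product of source and site terms, which is the quantity removed from the recordings, is therefore the same under either constraint.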

3.2 Correction of empirical Green’s functions (EGFs)

An automated procedure is implemented to search for recordings to be used as input to EGF simulations. It is important to note here that few recordings in ESM are reported at stations within 50 km of the target site (Fig. 2). Moreover, most of these stations sample earthquakes from the central or northern Apennines, thus providing recordings at large epicentral distances (> 100 km). Hence, the number of recordings per station as well as the number of stations increase with increasing distance from the target site. Recordings of low-magnitude earthquakes at short distances (i.e., M ≤ 4 and epicentral distance lower than 50 km) are very few and are found only at stations beyond 80 km from the target site.

Fig. 2

Distribution of the number of recordings per station versus the epicentral distance (REPI) to the target site. The total number of all recordings is represented by circles, recordings filtered to keep magnitudes < 4.0 and REPI < 50 km by triangles, and recordings filtered to keep magnitudes < 5.0 and REPI < 100 km by squares. The blue, green and red colors mark the distance ranges 0–100 km, 100–120 km and 120–140 km, respectively

Thus, we searched for stations farther from the site (within a maximum distance of 140 km), while giving priority to data recorded at stations at shorter distances. We allowed for the selection of events up to magnitude 5.0 and we included recordings sampling source-to-station distances up to 100 km. We ensure an acceptable sampling of source-to-site distances by iteratively filling 10 equally spaced bins of REPI (in log10 scale), between 1 and 100 km. Distance bins are filled with recordings from different stations until a minimum number (Nmin) of 10 recordings is reached. In the procedure, only recordings with \({f}_{min}\) and \({f}_{max}\) covering the frequency range 0.3–25 Hz are considered.
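A possible reading of the bin-filling step is sketched below. It is an interpretation rather than the procedure actually implemented: the record keys `repi_km` and `station_dist_km` (station-to-target distance) and the function names are hypothetical, and the priority given to closer stations is modeled simply by sorting the candidates by station distance before filling the bins.

```python
import math


def log_bin_edges(r_min, r_max, n_bins):
    """Edges of n_bins bins equally spaced in log10 between r_min and r_max."""
    start = math.log10(r_min)
    step = (math.log10(r_max) - start) / n_bins
    return [10 ** (start + k * step) for k in range(n_bins + 1)]


def fill_bins(records, edges, n_min=10):
    """Fill each REPI bin with at most n_min records, closest stations first."""
    bins = [[] for _ in range(len(edges) - 1)]
    for rec in sorted(records, key=lambda r: r["station_dist_km"]):
        for k in range(len(bins)):
            # Bins are half-open: edges[k] <= REPI < edges[k + 1].
            if edges[k] <= rec["repi_km"] < edges[k + 1]:
                if len(bins[k]) < n_min:
                    bins[k].append(rec)
                break
    return bins
```

With `log_bin_edges(1.0, 100.0, 10)` the bin edges are spaced by a factor of \(10^{0.2}\approx 1.58\), so a record at REPI = 50 km falls in the bin spanning roughly 40–63 km.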

Figure 3 shows the magnitude-distance distribution of the set of recordings resulting from the automatic procedure (hereafter called DATAEGF). We observe that most of the distance bins are well sampled, also spanning different magnitudes, except for those within 1–3 km. We also remark that REPI bins between 35 and 100 km mostly contain stations within 100 km from the target site whereas data points with REPI between 15 and 35 km are mainly from stations at 100–120 km from the target site and data points at REPI < 15 km are mostly from stations farther than 120 km.

Fig. 3

(a) The magnitude-distance distribution of selected recordings to construct the database for EGF simulations (DATAEGF). (b) Magnitude-depth distribution of these recordings. The color scale represents the distance to the target site. Nmin indicates that distance bins have been filled, whenever possible, by recordings from different stations until a minimal number of 10 recordings is reached

Afterwards, we applied the correction procedure to remove source and site effects from the recordings in DATAEGF. The correction function is defined as the combination of the corresponding average event-specific source and site-specific terms resulting from the GIT. The correction is applied in the Fourier domain assuming no phase modification of the original signals, and the corrected time histories are obtained by inverse Fourier transform. An example is shown in Fig. 4, where we consider the recording of an earthquake of magnitude 3.9 at the station SNI of the IT network (REPI = 60 km). We can observe that after removing the source and site terms, the shape of the seismic signal is largely preserved because no phase modification was applied. Here, it is important to note that the signals are corrected for “averaged” source and site effects. Consequently, the corrected signal may still carry contributions from anisotropic source effects (e.g., directivity) and site effects (e.g., basin effects).
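The amplitude-only correction can be illustrated with a small, self-contained sketch. It uses a naive discrete Fourier transform written with the standard library (an FFT routine would be used in practice), and the function names and the `correction` callable, which returns the real, positive source-site correction at a given frequency, are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform (O(n^2), for illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spec):
    """Inverse transform; the real part is returned for Hermitian spectra."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]


def remove_source_site(signal, dt, correction):
    """Divide the spectrum by a real, positive correction(f); phases are untouched."""
    n = len(signal)
    corrected = []
    for k, value in enumerate(dft(signal)):
        # |frequency| of bin k; using it for both halves keeps Hermitian symmetry.
        f = min(k, n - k) / (n * dt)
        corrected.append(value / correction(f))
    return idft(corrected)
```

Because each spectral value is divided by a real positive number, only the amplitudes change, which is why the overall shape of the time history is largely preserved.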

Fig. 4

The correction of source and site effects applied to the 2 components of an example recording. (a) The correction function for source and site effects. The correction function is padded to 1 and smoothed (half-cosine tapering in logarithmic scale) for \(f\) < 0.3 Hz and \(f\) > 28 Hz in order to cover the whole frequency band necessary for the spectral division. (b, c) The FAS of the original signal and the corrected one. (d, e) The original and source-site corrected time histories (in black and red, respectively) for both components E and N

A post-correction verification is undertaken in order to identify potential outliers in the frequency domain. For each corrected recording, the geometric mean of the Fourier amplitude spectra (FAS) of the two horizontal components is computed and the FAS are grouped in distance bins (the same bins used to select the data initially, i.e., 10 distance bins in the logarithmic scale between 0.1 and 100 km). Then, for each distance bin, the median FAS and its standard deviation are computed, and the FAS that fall outside the range of ± 2.5 standard deviations from the median FAS within the frequency range 5–20 Hz are removed. The process is repeated until no outliers are found. In the end, this process led to the identification of very few records as outliers (6 out of 103). This is shown in Fig. 5 for 4 distance bins as an example. The increasing attenuation with distance due to path effects is clearly observed in the FAS.
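The iterative screening within one distance bin can be sketched as follows. Several details are assumptions of this example and are not stated in the text: the statistics are computed on log10 amplitudes, a spectrum is rejected if it exceeds the bounds at any frequency in the 5–20 Hz band, and the sample standard deviation is used. The function name `flag_outliers` is hypothetical.

```python
import math
import statistics


def flag_outliers(freqs, spectra, f_low=5.0, f_high=20.0, n_sigma=2.5):
    """Return the indices of spectra rejected by iterative median clipping.

    spectra holds one list of FAS amplitudes per recording, sampled at freqs.
    """
    band = [i for i, f in enumerate(freqs) if f_low <= f <= f_high]
    logs = [[math.log10(s[i]) for i in band] for s in spectra]
    active = set(range(len(spectra)))
    rejected = set()
    while len(active) > 2:
        bad = set()
        for col in range(len(band)):
            values = [logs[j][col] for j in active]
            med = statistics.median(values)
            sd = statistics.stdev(values)
            bad |= {j for j in active if abs(logs[j][col] - med) > n_sigma * sd}
        if not bad:
            break
        active -= bad
        rejected |= bad
    return sorted(rejected)
```

Note that with very few spectra in a bin a single extreme value inflates the standard deviation enough to escape a 2.5-sigma test, so the screening is only meaningful for reasonably populated bins.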

Figure 6 shows the source-to-site paths and the magnitude-distance distribution of the EGFs finally retained for ground-motion simulations of the target events. We note that the source-to-station paths of the selected EGFs show a weak coverage of the region within 80 km of the study site. As already pointed out, this is because stations close to the target site have recorded distant events in the Apennines well but lack well-recorded local events. We also note that some of the EGFs are related to the same event recorded at multiple stations, as well as to the same station that recorded multiple events. Overall, we selected 97 EGFs corresponding to 39 events and 35 stations.

Fig. 5

The FAS of combined horizontal components of the source-site corrected signals, passing through the outlier-detection procedure before being input for EGF simulations. The identified outlier in the distance bin 1–3 km is highlighted in red. The blue dotted lines correspond to the median ± 2.5 standard deviations, the limits beyond which the signals are flagged as outliers. The dashed black lines show the frequency range \({f}_{min}\)–\({f}_{max}\) in which source-site corrected recordings are considered reliable

Fig. 6

(Left) Map showing the source-site ray paths of DATAEGF from the initial dataset. Red ray paths correspond to outliers identified and excluded from simulations. (Right) Magnitude-distance distribution of DATAEGF with red triangles representing the identified outliers

3.3 Source modeling for target magnitudes

Once the EGFs have been selected, they are convolved with a kinematic source model for the target magnitudes to simulate synthetic time histories for events of relevance for seismic hazard. Here we consider Mw = 5.0, 5.5 and 6.0. In order to consider the uncertainties in the parameters describing the rupture geometry and kinematics of the target scenario events, the generation of the rupture models relies on random sampling. For each of the 39 events in DATAEGF, strike and dip angles of the target rupture are defined assuming uniform distributions within the following ranges: strike [0°–360°] and dip [40°–90°]. The rupture is constrained to have an aspect ratio of 2 and to be centered at the hypocentral location of the EGF event. Then, 30 kinematic rupture models are simulated for each of the considered target magnitudes and for each event in DATAEGF, as follows:

  • The stress drop (Δσ) values, which ultimately control (together with the magnitude) the rupture dimension and the corner frequency (fc), are sampled (using Latin Hypercube Sampling) assuming a lognormal distribution. A median Δσ = 2 MPa is selected as representative of the median stress drop obtained from the GIT. This value agrees with results from Morasca et al. (2023) for the bulk of events considered in their dataset (Mw below 5) in Central Italy. For larger magnitudes, Morasca et al. (2023) suggest increasing values of stress drop with average values of about 10 MPa for Mw ≈ 6. However, in this study we favor a simpler assumption, keeping in mind that estimates of stress drop are generally characterized by large scatter and that self-similar earthquake scaling is still a subject of debate. A standard deviation \({\sigma}_{\ln(\Delta\sigma)}\) = 0.5 is assumed, in agreement with typical values inferred from empirical ground-motion models (Cotton et al. 2013).

  • The k−2 slip distributions are randomly generated as described in Sect. 2.3.

  • The position of the hypocenter is randomly located along the strike of the rupture and in its lower half.

  • The rupture velocity is randomly sampled between 0.7 \({V}_{S}\) and 0.85 \({V}_{S}\) following a uniform distribution. \({V}_{S}\) = 3.2 km/s is assumed for the Central Italy region (Morasca et al. 2023). Moreover, rupture velocities are randomly perturbed by 0.1% in order to mimic realistic rupture propagation.
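The random sampling described above can be sketched as follows. This is a simplified illustration, not the generator used in the study: the Latin Hypercube step is a one-dimensional stratified draw, strike and dip are drawn once per call (one call standing for one EGF event), the slip distribution and the 0.1% velocity perturbation are omitted, and all function and key names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def lhs_lognormal(n, median, sigma_ln, rng):
    """1-D Latin Hypercube sample of a lognormal variable (one draw per stratum)."""
    probs = [(k + rng.random()) / n for k in range(n)]
    rng.shuffle(probs)
    std_normal = NormalDist()
    return [median * math.exp(sigma_ln * std_normal.inv_cdf(p)) for p in probs]


def sample_rupture_models(n, vs_km_s=3.2, seed=0):
    """Draw n sets of rupture parameters for one EGF event."""
    rng = random.Random(seed)
    strike = rng.uniform(0.0, 360.0)
    dip = rng.uniform(40.0, 90.0)
    stress_drops = lhs_lognormal(n, median=2.0, sigma_ln=0.5, rng=rng)
    models = []
    for stress_drop in stress_drops:
        models.append({
            "stress_drop_mpa": stress_drop,
            "strike_deg": strike,
            "dip_deg": dip,
            "vr_km_s": rng.uniform(0.7, 0.85) * vs_km_s,
            "hypo_along_strike": rng.random(),       # fraction of rupture length
            "hypo_down_dip": rng.uniform(0.5, 1.0),  # lower half of the rupture
        })
    return models
```

Calling `sample_rupture_models(30)` returns 30 parameter sets whose stress drops cover each of the 30 equal-probability strata of the lognormal distribution exactly once.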

Figure S2 shows examples of the randomly generated Mw = 5.5 kinematic rupture scenarios in terms of rupture dimensions, slip distributions, and location of the hypocenter of the rupture. The distributions of some other relevant source parameters are shown in Fig. 7. The distributions of hypocentral depths are similar for the three magnitudes because they are controlled by the hypocentral depths of the EGF events, especially for the smaller rupture dimensions (Mw = 5.0). The distributions of the depth of the top of the rupture (Ztor) are also quite similar, although the number of scenarios with Ztor smaller than 5 km increases with magnitude. As expected, the values of average slip, rupture length and rupture width increase with the target magnitude.

Figure 8 shows the generated absolute source time functions for the Mw = 5.5 scenarios in the time and frequency domains. The source spectra adequately follow the omega-squared model up to the requested maximum frequency (i.e., 20 Hz) according to the Hisada (2001) method. We note that the mean source spectrum of the 30 simulations is in good agreement with the Brune model for the mean stress drop of the input distribution (2.3 MPa).
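As a quick consistency check, the 2.3 MPa mean agrees with the input lognormal distribution (median 2 MPa, \({\sigma}_{\ln(\Delta\sigma)}\) = 0.5) when the standard relation between the median and the mean of a lognormal variable is applied. The symbol \(\Delta\sigma_{med}\) is introduced here for the median.

$$E\left[\Delta\sigma\right]=\Delta\sigma_{med}\,\exp\left(\frac{{\sigma}_{\ln(\Delta\sigma)}^{2}}{2}\right)=2\times\exp\left(0.125\right)\approx 2.27\ \text{MPa}$$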

Fig. 7

Representative rupture parameters obtained for the 30 rupture models for Mw 5.0, 5.5 and 6.0 earthquakes

Fig. 8

Absolute source time functions generated by the k−2 method (left) and corresponding source spectra (right) for the Mw = 5.5 scenarios (Nsim = 30 simulations). The Brune source spectra for the minimum (0.67 MPa), mean (2.3 MPa) and maximum (6.2 MPa) stress drop values are also reported (in black) for comparison, as well as the mean of the simulated source spectra (in gray)

3.4 Results for selected target magnitudes

3.4.1 Comparison with empirical GMM

We used the 97 EGFs and the 30 rupture models for each target magnitude, resulting in slightly fewer than 3000 ground motions for each horizontal component and covering source-to-site distances up to about 100 km. Starting from the whole set of simulated time histories, we computed response spectra (geometric mean of horizontal components) as well as other ground-motion intensity measures of interest (e.g., PGA, PGV, duration). The results of the simulated ground motions are first presented in terms of spectral acceleration values at selected spectral periods for the Mw = 5 and Mw = 6 scenarios. Note that the simulations were band-pass filtered between 0.3 Hz and 20 Hz, which approximately represent the usable frequency band of the EGFs and the maximum target frequency of the simulations.
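As a simple illustration of the bookkeeping described above, the sketch below counts the simulations per horizontal component and combines two horizontal response spectra through their geometric mean, period by period. The function and variable names are hypothetical and the spectral values are placeholders, not results of this study.

```python
import math

N_EGF = 97       # number of EGFs stated in the text
N_RUPTURES = 30  # rupture models per target magnitude

# Simulations per horizontal component and target magnitude.
n_simulations = N_EGF * N_RUPTURES


def geometric_mean_spectrum(sa_h1, sa_h2):
    """Period-by-period geometric mean of two horizontal response spectra."""
    if len(sa_h1) != len(sa_h2):
        raise ValueError("spectra must be sampled at the same periods")
    return [math.sqrt(a * b) for a, b in zip(sa_h1, sa_h2)]


# Placeholder spectral ordinates (arbitrary units), for illustration only.
example = geometric_mean_spectrum([1.0, 4.0], [4.0, 9.0])
```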

In Fig. 9, the simulated horizontal spectral accelerations at three spectral periods (T = 0.01 s, 0.2 s and 2 s) are represented as a function of distance and are compared with the empirical GMM for Italy (ITA18) by Lanzano et al. (2019a) for Vs30 = 800 m/s. The ITA18 model modified by Lanzano et al. (2022) for generic reference rock (ITA18ref) is also shown in order to provide a more appropriate comparison. Indeed, the ITA18ref model was derived from a subset of Italian recordings at stations classified as reference rock sites by applying a multi-proxy technique (Lanzano et al. 2020).

We observe that the simulated values are in good agreement with the ITA18 GMM, the mean values of the simulations being in general within one standard deviation of the GMM for both magnitudes. The distance scaling of the simulated values is also very consistent with that of ITA18, confirming that the use of the selected EGFs to account for the path effect is appropriate. We note that the mean simulated values are in better agreement with the ITA18ref GMM, which supports the procedure adopted to remove the site effects from the recordings.

Interestingly, the standard deviation of the simulated short-period spectral values is also in broad agreement with that of ITA18, at least up to about 20 km. At longer distances, the standard deviation of the simulations decreases, likely because the variability of the rupture models mostly affects the simulated ground motions at short distances, whereas at longer distances the variability is mostly controlled by differences in the path which, in our case, is related to a small region and a limited number of EGFs. Similarly, the lower standard deviation of the simulated spectral accelerations at long periods (T = 2 s) may be related to the similarity of the EGFs, which sample a much smaller region than the one considered in the ITA18 GMM. Moreover, as shown in Fig. 5, at long periods the EGF variability decreases as the distance increases.
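The kind of binned statistics discussed here can be sketched as follows. The sketch assumes that the mean and standard deviation are computed on natural-log spectral accelerations within distance bins; the text does not specify the units or bin edges used for the figures, and the function names and input values are hypothetical.

```python
import math
import statistics


def binned_log_stats(distances_km, sa_values, bin_edges_km):
    """Mean and sample std of ln(SA) in each distance bin [lo, hi)."""
    stats = []
    for lo, hi in zip(bin_edges_km[:-1], bin_edges_km[1:]):
        logs = [math.log(sa) for r, sa in zip(distances_km, sa_values)
                if lo <= r < hi]
        if len(logs) >= 2:
            stats.append((lo, hi, statistics.mean(logs), statistics.stdev(logs)))
        else:
            # Too few values to estimate a standard deviation in this bin.
            stats.append((lo, hi, None, None))
    return stats


def within_one_sigma(sim_mean_ln, gmm_median_ln, gmm_sigma_ln):
    """True if the simulated mean lies within one GMM standard deviation."""
    return abs(sim_mean_ln - gmm_median_ln) <= gmm_sigma_ln


# Placeholder values for illustration only (not results of this study).
example_stats = binned_log_stats(
    [1.0, 2.0, 15.0, 18.0],
    [math.exp(1.0), math.exp(3.0), math.exp(-1.0), math.exp(-1.0)],
    [0.0, 10.0, 30.0],
)
```

Comparing the per-bin standard deviation with the GMM sigma at the same distance gives a direct view of where the simulated variability falls below the empirical one.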

A further comparison of the distance scaling of the simulated spectral accelerations is presented in Fig. 10. In this case, the simulations are compared with the median predictions from the backbone GMM adopted in the latest European Seismic Hazard Model (ESHM20) as described by Weatherill et al. (2020). The median ground motion model is represented by nine logic tree branches accounting for epistemic uncertainties in the attenuation and source terms. Here we consider the predictions for the attenuation cluster related to the location of the target site (cluster 3 in Weatherill et al. 2020), corresponding to a fast attenuation compared to the default model. The comparison shows that the simulated spectral accelerations at T = 0.01 s and T = 0.2 s are generally lower than the predictions by the ESHM20 GMM, whereas they are in closer agreement for T = 2 s. The distance scaling at short spectral periods seems stronger for the simulated values than for the GMM, which suggests a different attenuation in the target region compared to that of the ESHM20 GMM (although a regional attenuation term correction is considered). We note that the ESHM20 GMM is evaluated for a generic Vs30 = 1100 m/s, which does not correspond to the reference bedrock conditions of the simulations.

In Fig. 11, the comparison with the ITA18 and ITA18ref models is presented in terms of response spectra at a distance of 20 km. The response spectra of the simulated time histories are in generally good agreement with those predicted by ITA18ref. We observe, however, that the peaks of the mean simulated spectra are slightly shifted toward higher frequencies compared to the ITA18 models. It is difficult to assess to what extent the reference rock conditions of ITA18ref are comparable to our simulations. In the proposed approach, all site effects are effectively removed in amplitude, and we expect to no longer have site-specific amplification or high-frequency attenuation (\(\:{\kappa}_{0}\), Anderson and Hough 1984) in the recordings. On the other hand, the ITA18ref reference condition corresponds to the average response of the selected rock sites (Lanzano et al. 2022), which were found to have a wide range of \(\:{\kappa}_{0}\) values (between 0.007 and about 0.05 s), as shown by Morasca et al. (2023).

The position of the spectral peak also depends on the characteristics of the EGFs employed to simulate the ground motions at such distances. In order to illustrate this, Fig. 12 shows, for the case of Mw = 5 at 20 km, the mean of the simulated spectra (in spectral response and Fourier domains) for each adopted EGF (denoted by an event and a station). We note that the simulated spectra depend significantly on the considered EGF both in amplitude and in spectral shape, especially for frequencies higher than about 2 Hz. Some of the EGFs produce response spectra with peaks at frequencies higher than 10 Hz, whereas others lead to spectral shapes more in agreement with the ITA18 model. This suggests that wave propagation effects can be highly variable even in such a small region, which is in agreement with recent studies pointing out that path-to-path variability is a major contribution to the total ergodic aleatory variability in ground motion models (Sung et al. 2023). We note, however, that a trade-off exists between source and path effects and that phenomena such as rupture directivity (as highlighted by Colavitti et al. 2022 for small magnitudes in Central Italy), which are not accounted for by the average (i.e., isotropic) source correction in the GIT, may also contribute to the observed variability of the EGF spectra.

In order to further validate the simulated ground motions, we compare in Fig. 13 the significant duration (D5-95) of the simulated time histories for the Mw = 5 and Mw = 6 scenarios with the estimates from the Pan-European empirical model by Sandıkkaya and Akkar (2017). The comparison as a function of distance shows a good agreement between simulated values and the empirical model, although some relevant differences are noted at the shortest and longest distances. Such differences may be due to several factors, such as: the source duration (depending on the source dimensions via the stress drop) controlling the D5-95 duration at short distances; the duration of some specific EGFs that may be longer/shorter than average due to local site effects not accounted for by the simple correction applied; region-specific differences with respect to the considered Pan-European model. Regardless of the origin of such differences, it is important to remark that the simulated time histories are characterized by realistic durations for the considered magnitudes and distances.
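For clarity, the significant duration D5-95 can be computed from an acceleration time history as sketched below. The sketch assumes a uniformly sampled record and a simple rectangular integration; the constant factor of the Arias intensity cancels out in the normalized cumulative curve, so it is omitted. The function name and the synthetic test signal are hypothetical.

```python
def significant_duration(acc, dt, lower=0.05, upper=0.95):
    """Time (s) between the `lower` and `upper` fractions of the cumulative
    squared acceleration (normalized Arias intensity)."""
    cumulative = []
    total = 0.0
    for a in acc:
        total += a * a * dt  # rectangular integration of a(t)^2
        cumulative.append(total)
    if total == 0.0:
        raise ValueError("signal has zero energy")
    # First sample at which each energy fraction is reached.
    i_lower = next(i for i, c in enumerate(cumulative) if c >= lower * total)
    i_upper = next(i for i, c in enumerate(cumulative) if c >= upper * total)
    return (i_upper - i_lower) * dt


# Synthetic check: 10 s of constant amplitude, padded with quiet segments.
dt = 0.01
signal = [0.0] * 200 + [1.0] * 1000 + [0.0] * 300
d595 = significant_duration(signal, dt)
```

For a signal of constant amplitude, the energy accumulates linearly, so D5-95 is close to 90% of the active duration regardless of the quiet padding.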

Fig. 9

Simulated spectral acceleration (PSA) for Mw = 5 (left) and Mw = 6 (right) as a function of the Joyner-Boore distance (Rjb) for three spectral periods (T = 0.01 s, 0.2 s and 2 s). The gray circles represent the geometric mean of the horizontal components, and the vertical black bars represent the mean and standard deviation of simulated values over distance bins. Stations within the surface projection of the rupture are plotted at Rjb = 0.1 km. The GMM for Italy (ITA18) by Lanzano et al. (2019a) is plotted in light green (median ± 1 standard deviation) considering a Vs30 = 800 m/s and normal fault mechanism. The median ITA18ref (adjusted for reference rock conditions according to Lanzano et al. 2022) is shown in red

Fig. 10

Simulated spectral acceleration (PSA) for Mw = 5 (left) and Mw = 6 (right) as a function of the Joyner-Boore distance (Rjb) for three spectral periods (T = 0.01 s, 0.2 s and 2 s). The gray circles represent the geometric mean of the horizontal components, and the vertical black bars represent the mean and standard deviation of simulated values over distance bins. Stations within the surface projection of the rupture are plotted at Rjb = 0.1 km. The European GMM adopted by Weatherill et al. (2020) is plotted in red for a Vs30 = 1100 m/s and considering the 9 branches of the logic tree proposed to capture uncertainties in median ground motion for attenuation cluster 3

Fig. 11

Simulated response spectra (in gray) for the Mw = 5 (left) and Mw = 6 (right) scenarios at 20 km (stations at distances between 15 km and 25 km are used) for the geometric mean of horizontal components. The mean ± 1 standard deviation of the simulated spectra is shown in black. The GMM for Italy (ITA18) by Lanzano et al. (2019a) is plotted in green (median ± 1 standard deviation) considering a Vs30 = 800 m/s and normal fault mechanism. The median ITA18 adjusted for reference rock conditions according to Lanzano et al. (2022) is shown in red

Fig. 12

(Left) simulated response spectra (in gray) for the Mw = 5 scenarios at 20 km (stations at distances between 15 km and 25 km are used) for the geometric mean of horizontal components. The mean ± 1 standard deviation of the simulated spectra is shown in black. The mean spectrum obtained for each EGF is shown in color. (Right) the same as in the left panel but in terms of Fourier amplitude spectra (FAS)

Fig. 13

Comparison between simulated horizontal significant durations (D5-95: time elapsed between 5% and 95% of the total Arias Intensity) as a function of distance for Mw = 5 (left) and Mw = 6 (right) scenarios and the predictions by the Sandıkkaya and Akkar (2017) empirical model for the Pan-European region (in blue, median ± 1 standard deviation) for Vs30 = 800 m/s and normal-fault mechanism

3.4.2 Comparison with observations from similar events

A second comparison is presented with respect to spectral accelerations observed for events that occurred in Central Italy with magnitudes close to those of the simulated target scenarios. To this aim, we selected from the ITACA database (Russo et al. 2022) the available recordings at stations located between 41.5° and 43.5° latitude and 11.0° and 13.5° longitude from events at distances up to 100 km from the stations. In order to have a meaningful comparison, we considered only stations identified as reference rock sites in ITACA according to the abovementioned procedure by Lanzano et al. (2020). The comparison between simulated and observed spectral accelerations as a function of distance is presented in Fig. 14 for Mw = 5.0 and Mw = 6.0. Because our objective is to simulate regional ground motions and not to model a specific event, we mix observations from several events in this comparison. The mean and the range of simulated values are in good agreement with the observations for both magnitudes and for the considered spectral periods. The simulations show a slight tendency to underestimate the observations at short periods for Mw = 5 at large distances (> 70 km). Unfortunately, most of the data are at distances larger than 10 km and we cannot comment much on the comparison at shorter distances; however, the few observations at close distances are captured by the range of simulated values.
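The selection criteria listed above can be expressed as a simple filter, sketched below. The record structure and field names (sta_lat, sta_lon, dist_km, mw, is_reference_rock) are hypothetical and do not correspond to the ITACA data format; the ± 0.1 magnitude tolerance is the one reported in the caption of Fig. 14.

```python
def select_records(records, target_mw, mw_tol=0.1, max_dist_km=100.0,
                   lat_range=(41.5, 43.5), lon_range=(11.0, 13.5)):
    """Keep reference-rock recordings inside the box, within the distance
    limit and within the magnitude tolerance around the target magnitude."""
    selected = []
    for rec in records:
        in_box = (lat_range[0] <= rec["sta_lat"] <= lat_range[1]
                  and lon_range[0] <= rec["sta_lon"] <= lon_range[1])
        # Small epsilon guards against floating-point rounding at the limit.
        close_mw = abs(rec["mw"] - target_mw) <= mw_tol + 1e-9
        if (in_box and close_mw and rec["dist_km"] <= max_dist_km
                and rec["is_reference_rock"]):
            selected.append(rec)
    return selected


# Made-up records, for illustration only.
records = [
    {"sta_lat": 42.8, "sta_lon": 13.1, "dist_km": 22.0, "mw": 5.1, "is_reference_rock": True},
    {"sta_lat": 42.8, "sta_lon": 13.1, "dist_km": 22.0, "mw": 5.4, "is_reference_rock": True},
    {"sta_lat": 44.0, "sta_lon": 12.0, "dist_km": 30.0, "mw": 5.0, "is_reference_rock": True},
    {"sta_lat": 42.0, "sta_lon": 12.5, "dist_km": 120.0, "mw": 5.0, "is_reference_rock": True},
    {"sta_lat": 42.0, "sta_lon": 12.5, "dist_km": 50.0, "mw": 4.9, "is_reference_rock": False},
]
selected = select_records(records, target_mw=5.0)
```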

Figure 15 shows a similar comparison in terms of response spectra for three magnitudes (Mw = 5.0, 5.5 and 6.0) at 20 km. Despite the fact that the observed data are limited in number, especially for Mw = 6, and that the variability is quite large, also due to the selected range of distances (15 km to 25 km), we note a generally good agreement between the mean simulated and observed spectra, particularly for Mw = 5.5 and 6.0. As already mentioned concerning Fig. 11, the difference between the spectral shapes of the mean simulated and observed spectra, particularly evident for Mw = 5.0, is likely related to the difference between the reference conditions obtained by the GIT-based source-site deconvolution we adopted and those of the sites classified as reference rock in Central Italy.

Overall, these comparisons support the appropriateness of both the Green’s function correction approach and the source modelling for target events in the region.

Fig. 14

Comparison between simulated (gray circles) and observed (red triangles) spectral acceleration (PSA) for Mw = 5 (left) and Mw = 6 (right) as a function of the Joyner-Boore distance (Rjb) for three spectral periods (T = 0.01 s, 0.2 s and 2 s). The vertical black bars represent the mean and standard deviation of simulated values over distance bins. Stations within the surface projection of the rupture are plotted at Rjb = 0.1 km. The observed data are for ± 0.1 magnitude units with respect to the target magnitudes

Fig. 15

Comparison between simulated (in gray) and observed (in red) response spectra for Mw = 5 (left), Mw = 5.5 (center) and Mw = 6 (right) scenarios at 20 km (distances between 15 km and 25 km are used and, for observations only, magnitudes within ± 0.1 units from the target) for the geometric mean of horizontal components. The median ± 1 standard deviation of the simulated spectra is shown in black. The thick red curve represents the mean of the observed spectra. See the text for further details on the selected data

3.4.3 Simulated time histories

One of the advantages of the proposed simulation approach is that it allows generating synthetic time histories based on empirical path terms including complexities in the wave propagation that would be difficult to model numerically. Moreover, the time histories are simulated at reference bedrock (i.e., corrected for site response) and they may be used as realistic region-specific input motions for dynamic soil response. In this case, the site-specific soil profile should be defined down to the bedrock ensuring that all relevant impedance contrasts are accounted for in the soil response analysis.

Figure 16 shows an example of simulated acceleration and velocity horizontal time histories for Mw = 6.0 at several stations with increasing distances. We note that the durations of the simulated time histories realistically increase with distance, as does the time difference between the P- and S-wave arrivals. At the largest distances, we also observe the presence of surface waves after the S-wave phase in the velocity signal.

Figure 17 shows an example of the impact of the variation of the source stress drop on the simulated time histories for a Mw = 6 scenario at a station above the fault (Rjb = 0 km). The acceleration and velocity time histories as well as the corresponding response spectra are presented for three stress drop values chosen to be close to the average, the minimum and the maximum values of the distribution of the stress drop used in the simulations (see Sect. 3.3). As expected, the PGA and PGV values as well as the spectral ordinates increase with increasing stress drop because the source radiates more high-frequency energy. The duration decreases with increasing stress drop because the stress drop controls the dimensions of the rupture in the present approach. We remark that rupture velocity, slip distribution and rupture nucleation point are also randomly varied in the set of 30 simulations, which further contributes to the differences in the presented time histories.
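The link between stress drop and rupture dimensions can be illustrated with a simple circular-crack relation, r = (7 M0 / (16 Δσ))^(1/3). This is only an illustrative assumption: the actual scaling between stress drop and rupture length and width adopted in the simulations is described in Sect. 3.3 and may differ. The function names are hypothetical; the three stress drop values are those reported in the caption of Fig. 17.

```python
def moment_from_mw(mw):
    """Seismic moment (N*m) from moment magnitude, standard relation."""
    return 10.0 ** (1.5 * mw + 9.1)


def circular_crack_radius_km(mw, stress_drop_mpa):
    """Radius (km) of a circular crack with the given moment and stress drop."""
    m0 = moment_from_mw(mw)
    stress_drop_pa = stress_drop_mpa * 1.0e6
    radius_m = (7.0 * m0 / (16.0 * stress_drop_pa)) ** (1.0 / 3.0)
    return radius_m / 1000.0


# Source radius for Mw = 6 and the three stress drops shown in Fig. 17.
radii_km = {sd: circular_crack_radius_km(6.0, sd) for sd in (0.7, 2.1, 4.9)}
```

Under this assumption the source dimension scales with the inverse cube root of the stress drop, so a lower stress drop implies a larger rupture and, for a given rupture velocity, a longer source duration.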

Fig. 16

Example of simulated acceleration (left) and velocity (right) time-histories (east-west component) for a Mw = 6.0 scenario at increasing distances (station code and Rjb are indicated in the figure). The original EGF event is EMSC-20161031_0000053

Fig. 17

Example of simulated acceleration (left) and velocity (center) time histories (east-west component) for a Mw = 6.0 scenario above the fault (Rjb = 0 km) considering station IT.PRE and EGF event EMSC-20161030_0000135. The selected simulations are for three values of stress drop (#30 = 2.1 MPa, #15 = 4.9 MPa and #13 = 0.7 MPa). The corresponding response spectra are presented in the right-most plot and compared with the ensemble of simulations using this EGF for Mw = 6 and Rjb of about 0 km

4 Discussion and conclusions

The method proposed in this article allows the generation of 3-component time histories for reference bedrock conditions (i.e., a virtual reference site assumed to be amplification-free) relying on empirical region-specific path effects. Provided that the usable frequency band of the EGF is large enough (e.g., 0.3 to 25 Hz), the simulated data cover the wide frequency range of interest for engineering applications. Thanks to the use of the EGF, the method accounts for 3D wave propagation without the need for detailed modeling of the crust and using modest computational resources compared to 3D simulations. The resulting time histories have credible properties both in the time (i.e., shape, amplitude, duration) and frequency (i.e., FAS, response spectra) domains for the various magnitude-distance rupture scenarios considered.

In the present proof-of-concept application, the EGF recordings were retrieved from events having magnitudes between 3.5 and 4.5. However, lower magnitudes could be used as EGF, the most relevant limitation being the appropriate signal-to-noise ratio of the records over a broad frequency range. The wealth of small-magnitude data that may be available in low-seismicity regions could then be used to simulate large-magnitude events, in particular at short distances, which are of major concern for seismic hazard assessment. In future studies, we plan an application to low-seismicity regions in order to assess the limitations and challenges of the proposed approach. The use of smaller magnitude events would allow sampling, more appropriately than in the present application, the paths contained in the few tens of kilometers around the target site, accounting more accurately for the geological structure and wave propagation. This calls for the reinforcement of the seismological instrumentation of critical facilities at the site scale and in the site vicinity in order to expand the database with low-magnitude recordings at close distances from the target sites.

The use of the nonparametric GIT to estimate the source and site terms makes the path term (the deconvolved EGF) sensitive neither to the reference site(s) used in the inversion nor to the metadata (magnitude, VS30) of the collected recordings. This is of great interest given the difficult identification of appropriate reference sites in many areas (e.g., Po plain, Parisian basin). In the proposed framework, even soft-soil sites close enough to the target site are of interest to evaluate the reference bedrock ground motion, provided that these recordings can be merged in the GIT with other measurements to determine source, path and site terms. The weak sensitivity of the approach to the EGF magnitude metadata is also a great advantage since small events are often poorly characterized in terms of moment magnitude. One should nonetheless recognize that the lack of constraints on the hypocentral depth, in particular for the small-magnitude events, remains an issue when using the EGF. Thanks to the densification of the seismological network, this limitation may vanish in the forthcoming years.

The proposed approach could be used to develop region-specific GMMs for reference bedrock, to adjust median estimates from existing GMMs to the target regional path effects and reference site conditions or to generate reference bedrock ground motions to select hazard-consistent time histories for subsequent soil response analysis and structural response evaluation. The comparisons of ground motions from host empirical models with the simulated ones can also support the characterization of epistemic uncertainties in a target region helping in the assessment of alternative models to quantify the expected ground motion at the site. Indeed, the lack of reference rock recordings is a strong limitation to assess the applicability of GMMs to a target site/region.

Although we believe the proposed approach is promising, it also presents some challenges and needs for improvement. The strongest limitation is that the proposed methodology cannot be applied if no data are available in the target region, contrary to purely numerical simulation approaches.

Further improvements in the methodology may account for phase correction in the source-site deconvolution. At present, phase is not considered in the deconvolution and this will undoubtedly bias to some extent the duration of the simulated time histories, particularly for soft-soil sites. Another important improvement may concern the kinematic source model, which is still quite simplified in the current approach. Pseudo-dynamic rupture models and fractal approaches may provide more realistic source radiation, a better control of the directivity, and a better correlation between the slip, rupture velocity and rise time. On the one hand, better accounting for the directivity may increase the ground motion variability at short-to-intermediate distances. On the other hand, a better representation of the source kinematics may help to discard unrealistic rupture realizations, contributing to a better control of the variability.