1 Introduction

Probabilistic seismic hazard analysis (PSHA) requires ground motion models (GMMs) to describe the probability distribution of ground motion intensity measures (IMs) for given earthquake scenarios and site conditions. The probability distribution of an IM is often characterized by a log-normal distribution with a median and standard deviation which are estimated from GMMs as functions of earthquake characteristics (e.g., earthquake magnitude—M, rupture distance—\(R_{rup}\), etc.) and site conditions (e.g., time-averaged shear wave velocity in the top 30 m of the soil profile—\(V_{s30}\)).

Current GMMs are based on the ergodic assumption that the distribution of ground motion IMs over time at a single site is the same as the distribution of ground motion IMs over space (Anderson and Brune 1999). Under this assumption, GMMs are developed using global databases, providing median and standard deviation estimates that are the same for a single source-site combination. However, the standard deviations of ergodic GMMs are affected by location-specific systematic and repeatable effects; hence they represent an inflated aleatory variability, which has become evident as regional differences in ground motions have been observed in recent studies (Walling 2009). As a result, the ergodic assumption leads to biased estimates of IM hazard curves due to an inadequate trade-off between aleatory variability and epistemic uncertainties, where the latter are associated with repeatable and systematic effects (Walling 2009). In contrast, non-ergodic GMMs model location-specific systematic and repeatable effects as adjustments to the median ground motions and hence yield a better trade-off between aleatory variability and epistemic uncertainties. The reduced standard deviations (associated with the reduced aleatory variability) are compensated by additional epistemic uncertainties introduced during the estimation of systematic effects (Abrahamson et al. 2019; Landwehr et al. 2016; Liu et al. 2022; Lavrentiadis et al. 2021). Many studies show that non-ergodic GMMs have significant impacts on PSHA, as the median adjustments shift the hazard curves horizontally while smaller standard deviations lead to steeper hazard curves (e.g., Rodriguez-Marek et al. (2014); Stewart et al. (2017); Abrahamson et al. (2019)), as schematically illustrated in Figure 1 (see details in the figure caption).

Fig. 1

(a) The probability density of an ergodic model is compared with the densities of three non-ergodic models with smaller standard deviations but different medians. (b) The non-ergodic models lead to steeper and horizontally shifted hazard curves as compared to the ergodic one (adapted from Abrahamson et al. (2019))

The fast-growing ground motion databases enable relaxing the ergodic assumption and transitioning from ergodic PSHA to non-ergodic PSHA, which requires estimating location-specific systematic source, path, and site effects. Several efforts of varying complexity have been made to develop non-ergodic GMMs that account for systematic and repeatable effects for non-ergodic PSHA. An initial effort is the so-called single-station-sigma GMMs (e.g., Rodriguez-Marek et al. (2013)), which estimate systematic site effects for well-recorded sites. Specifically, if repeated ground motion recordings from multiple earthquakes at a single site are available, then it is possible to estimate and remove the systematic site effect of that site from the residuals and hence reduce the aleatory standard deviation. Examples of single-station-sigma GMMs include models developed by Rodriguez-Marek et al. (2013), Abrahamson and Hollenback (2012), Rodriguez-Marek et al. (2011), Atkinson (2006), Chen and Tsai (2002), and Lin et al. (2011a). Another approach for developing non-ergodic GMMs is to regionalize the functional forms that represent the scaling of IMs. For example, Kuehn et al. (2020) developed a non-ergodic GMM with several regionalized coefficients (e.g., Japan, Taiwan, South America, etc.) to capture regional differences in the scaling of spectral accelerations. Dawood and Rodriguez-Marek (2013) developed a GMM for Japan with different anelastic attenuation scaling for different small regions, which is known as cell-specific attenuation. In a different effort, Landwehr et al. (2016) developed a varying-coefficient GMM for California with coefficients varying smoothly with the geographical coordinates of earthquakes and sites. Lanzano et al. (2021) developed a non-ergodic GMM with spatially varying coefficients for crustal earthquakes in Italy, using a multi-source geographically weighted regression (Caramenti et al. 2022).
More recent studies decompose the residuals of ergodic GMMs into systematic and aleatory components for source, path, and site terms. The systematic effects are then modeled as functions of geographical coordinates, while the aleatory residuals represent the inherent randomness of the process. For example, Kuehn and Abrahamson (2020) developed ergodic GMMs for Taiwan and California from which residuals were calculated and used as inputs into Gaussian processes that evaluated the systematic source and path effects in the estimated residuals. The estimated systematic effects were subsequently used in the implementation of non-ergodic GMMs for Taiwan and California. In addition, Sgobba et al. (2021) developed a non-ergodic GMM by modeling the event, source, path, and site effects in the residuals of a region-specific GMM for central Italy, which was subsequently used to generate shaking maps.

Although significant efforts have been made to provide valuable insights into the transition from ergodic to non-ergodic GMMs, there are still several gaps to be bridged. For example, one of the challenges is the estimation of the spatial correlation structures for systematic source, path, and site effects, which are expected to affect PSHA and the risk assessment of engineering systems significantly (Sgobba et al. 2019, 2021; Park et al. 2007; Macedo et al. 2020, 2022; Liu et al. 2021; Patel et al. 2021; Ceferino et al. 2020). Hence, it is important to develop accurate and robust spatial correlation models for systematic effects. However, most studies developed spatial correlation models based on ergodic or partially ergodic assumptions, or have not fully exploited the functional forms of spatial correlation models. For example, Jayaram and Baker (2009) modeled the spatial correlation of within-event residuals of ergodic GMMs using an isotropic semi-variogram that solely depends on the distances between sites. However, the within-event residuals contain both systematic path and site effects that cannot be fully modeled by an isotropic semi-variogram alone (Kuehn and Abrahamson 2020; Kuehn et al. 2019; Villani and Abrahamson 2015; Lin et al. 2011b; Schiappapietra and Smerzini 2021). Foulser-Piggott and Goda (2015) evaluated the spatial correlation for site effects in single-station-sigma GMMs, but they did not investigate the correlation structures of source and path effects. More recently, Kuehn and Abrahamson (2020) used Gaussian processes with several covariance functions to estimate the spatial correlation of systematic source and path effects from the ANZA array (Berger et al. 1984) database. However, only a small number of correlation models were tested for path effects, and a single model was used for source effects.
Schiappapietra and Douglas (2020) conducted a comprehensive review of previously developed correlation models, most of which are based on semivariograms with isotropic and stationary exponential models. They also pointed out that a single rate of decay of the correlation as a function of the inter-site separation distance may not be sufficient for seismic hazard and risk assessment. Moreover, most previous studies have used ground motion databases that are too sparse to estimate repeatable effects effectively (e.g., Jayaram and Baker (2009); Hong et al. (2009)). Lastly, the transferability of spatial correlation structures across different regions is not fully understood. Schiappapietra and Douglas (2020) observed that differences in ground motion databases and regional geological conditions could result in different rates of decay of the correlation with increasing inter-site separation distance, in the context of second-order stationary and isotropic correlation models. Kuehn and Abrahamson (2020) investigated the possibility of applying the correlation structures of non-stationary path-effect models developed using the ANZA array in California to Taiwan. They found this transfer of correlation structures (i.e., from California to Taiwan) promising if the magnitude extrapolation was properly accounted for. Even though the aforementioned studies provided interesting insights, more effort is required to investigate the transferability of correlation structures. Exploring this transferability is beneficial as it could reduce the computational effort in PSHA with non-ergodic GMMs and also improve the prediction of IMs for areas with scarce data (Abrahamson and Kuehn 2021).

In this study, we develop several spatial correlation models for systematic source, path, and site effects for peak ground acceleration (PGA) using the Ridgecrest database (Rekoske et al. 2020). The Ridgecrest database contains a dense set of ground motions in a localized region with a range of magnitudes, which is suitable for estimating systematic effects (Liu et al. 2022). We evaluate the efficacy of different correlation models based on their predictive performance on a hold-out set of ground motions representing future earthquakes. We also investigate the impact of the cell-specific attenuation approach (Dawood and Rodriguez-Marek 2013) on the correlation models for path effects. The performance of the different models developed in this study is compared, and insights from the comparison are shared. Finally, we also compare the performance of our models with the Kuehn and Abrahamson (2020) models, which were developed using a different database, to evaluate the transferability of correlation structures across different regions.

2 Database

This study uses the Ridgecrest ground motion database developed by the United States Geological Survey (Rekoske et al. 2020), which contains 22,375 recordings at 968 stations from 133 earthquakes. In this database, multiple recordings are available from a single event or site, making it suitable for estimating the spatial correlation of systematic effects. The database is further filtered based on two criteria. First, ground motions recorded at distances larger than 200 km are removed, as PSHA studies commonly exclude scenarios with rupture distances beyond 200 km. Second, an ergodic GMM (see the “Functional form of the ground motion model” section) is fitted using the database, and additional records with high leverage (Hastie et al. 2009) or abnormal residuals are removed. The filtering process yields a final subset of 12,612 recordings from 131 earthquakes at 458 stations with rupture distances up to 200 km and magnitudes between 3.6 and 7.1. To evaluate the performance of spatial correlation models, some earthquakes and stations are moved to a hold-out test set while the remaining recordings are used as a training set. The training set contains around 70% (9206 recordings from 87 events at 382 stations) while the test set contains around 30% (3405 recordings from 73 events at 347 stations) of the ground motion recordings in the final subset. As elaborated upon later, correlation models are trained using the training set, and their performance is evaluated based on their predictions on the test set. Figures 2 and 3 show the magnitude-distance distribution and the spatial distribution of sites and earthquake epicenters for ground motion recordings in this database, respectively.

Fig. 2

Magnitude-distance distribution of ground motion recordings in the Ridgecrest database

Fig. 3

Spatial distribution of earthquakes and sites in (a) the training set and (b) the test set

3 Estimation of systematic effects

3.1 Components of ground motion residuals

Systematic effects can be estimated from residuals of an ergodic GMM, as shown in Eq. 1.

$$\begin{aligned} \ln {Y} = f_{ergodic}(M,R_{rup},V_{s30}) + \delta B + \delta W \end{aligned}$$
(1)

where Y is the IM of interest (i.e., PGA in this study), \(f_{ergodic}\) is a function of earthquake characteristics (e.g., M, \(R_{rup}\), etc.) and soil conditions (e.g., \(V_{s30}\), etc.), which provides median estimates for the IM of interest. \(\delta B\) and \(\delta W\) are the between-event and within-event residuals, respectively, which can be estimated by random-effect regressions (e.g., Abrahamson and Youngs (1992)). \(\delta B\) and \(\delta W\) have Gaussian distributions with zero means and standard deviations of \(\tau\) and \(\phi\), respectively.

Using a single-station-sigma model, \(\delta W\) can be divided into two components: the between-site residual \(\delta S\), which contains systematic site effects; and the event-site-corrected residual \(\delta WS\), which includes the systematic path effects. Hence, \(\delta B\), \(\delta S\), and \(\delta WS\) follow Gaussian distributions with zero means and standard deviations of \(\tau\), \(\phi _{S}\), and \(\phi _{SS}\), respectively, as shown in Eqs. 2 to 4.

$$\begin{aligned} \delta B&\sim {\mathcal {N}}(0, \tau ^2) \end{aligned}$$
(2)
$$\begin{aligned} \delta S&\sim {\mathcal {N}}(0, \phi _{S}^2)\end{aligned}$$
(3)
$$\begin{aligned} \delta WS&\sim {\mathcal {N}}(0, \phi _{SS}^2) \end{aligned}$$
(4)

Following the notation in Atik et al. (2010), \(\delta B\), \(\delta S\), and \(\delta WS\) can be further partitioned into epistemic components accounting for the systematic effects and aleatory components representing the natural randomness of the process:

$$\begin{aligned} \delta B&= \delta L2L + \delta B_0 \end{aligned}$$
(5)
$$\begin{aligned} \tau ^2&= \tau _{L2L}^2 + \tau _0^2 \end{aligned}$$
(6)
$$\begin{aligned} \delta WS&= \delta P2P + \delta WS_0 \end{aligned}$$
(7)
$$\begin{aligned} \phi _{SS}^2&= \phi _{P2P}^2 + \phi _{0,SS}^2 \end{aligned}$$
(8)
$$\begin{aligned} \delta S&= \delta S2S + \delta S_0 \end{aligned}$$
(9)
$$\begin{aligned} \phi _{S}^2&= \phi _{S2S}^2 + \phi _{0,S2S}^2 \end{aligned}$$
(10)

where \(\delta L2L\), \(\delta P2P\), and \(\delta S2S\) represent systematic source, path, and site effects, respectively. \(\delta B_0\), \(\delta WS_0\), and \(\delta S_0\) are the remaining aleatory between-event, event-site-corrected, and between-site residuals, respectively. \(\tau _{L2L}\), \(\phi _{P2P}\), and \(\phi _{S2S}\) are (epistemic) standard deviations of systematic source, path, and site effects; \(\tau _0\), \(\phi _{0, SS}\), and \(\phi _{0, S2S}\) are standard deviations of aleatory between-event, event-site-corrected, and between-site residuals, respectively.
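As a rough illustration of the partition in Eqs. 1 to 4, the sketch below simulates total residuals for a fully crossed set of events and sites and splits them into event terms, site terms, and a remainder using simple group means. This is only a stand-in assumed for illustration: the residuals in this study are obtained by random-effect regression, and the function name, array names, and standard deviations used here are hypothetical.

```python
import numpy as np

def partition_residuals(total_resid, event_id, site_id):
    """Split total residuals into event terms, site terms, and a remainder.

    Group means are used as a crude substitute for random-effect regression
    (an assumption made for this sketch only).
    """
    n_events = event_id.max() + 1
    n_sites = site_id.max() + 1
    # Between-event residual: mean total residual of each earthquake
    dB = (np.bincount(event_id, weights=total_resid, minlength=n_events)
          / np.bincount(event_id, minlength=n_events))
    dW = total_resid - dB[event_id]          # within-event residual
    # Between-site residual: mean within-event residual at each site
    dS = (np.bincount(site_id, weights=dW, minlength=n_sites)
          / np.bincount(site_id, minlength=n_sites))
    dWS = dW - dS[site_id]                   # event- and site-corrected residual
    return dB, dS, dWS

# Hypothetical synthetic data: every event recorded at every site
rng = np.random.default_rng(0)
n_events, n_sites = 40, 60
tau, phi_s, phi_ss = 0.4, 0.5, 0.3           # placeholder standard deviations
event_id = np.repeat(np.arange(n_events), n_sites)
site_id = np.tile(np.arange(n_sites), n_events)
total = (rng.normal(0.0, tau, n_events)[event_id]
         + rng.normal(0.0, phi_s, n_sites)[site_id]
         + rng.normal(0.0, phi_ss, event_id.size))

dB, dS, dWS = partition_residuals(total, event_id, site_id)
print("sample std of dB, dS, dWS:", np.std(dB), np.std(dS), np.std(dWS))
```

In this balanced synthetic case the three components add back to the total residual exactly; with real, unbalanced data the group means would be biased, which is one reason a mixed-effects fit is preferred.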

3.2 Modeling spatial correlation of systematic effects

We model systematic source, path, and site effects as functions of their geographic coordinates as defined in Eqs. 11 to 13:

$$\begin{aligned} \delta L2L&= f_1(x_e) \end{aligned}$$
(11)
$$\begin{aligned} \delta P2P&=f_2(x_e,x_s)\end{aligned}$$
(12)
$$\begin{aligned} \delta S2S&= f_3(x_s) \end{aligned}$$
(13)

in which \(x_e\) and \(x_s\) are column vectors containing geographical coordinates of earthquakes and sites, respectively. Since the functional forms of \(f_i\)’s (for \(i =1 ,2, \text {and } 3\)) are unknown, we assume they are drawn from a Gaussian process (GP) prior:

$$\begin{aligned} f_1(x_e) \sim GP(0, k(x_e,x_e')) \end{aligned}$$
(14)
$$\begin{aligned} f_2(x_e,x_s) \sim GP(0, k([x_e,x_s], [x_e',x_s']))\end{aligned}$$
(15)
$$\begin{aligned} f_3(x_s) \sim GP(0, k(x_s,x_s')) \end{aligned}$$
(16)

where \(k(x,x')\) is a covariance function that describes the covariance between the values of \(f_i\) at locations x and \(x'\). The variations of the \(f_i\)’s are controlled by the choice of \(k(x,x')\). A covariance function \(k(x,x')\) is called stationary if \(k(x,x') = g(h)\) with \(h = x'- x\) (i.e., the covariance only depends on the vector from x to \(x'\)). Further, if \(k(x,x') = g(||h||)\) (i.e., the covariance only depends on the distance between x and \(x'\)), \(k(x,x')\) is called isotropic. Hence, isotropy implies stationarity in covariance functions; equivalently, a non-stationary covariance function is also anisotropic. As detailed in later sections, we build different isotropic stationary and anisotropic non-stationary covariance functions and evaluate their performance in estimating the spatial correlation of systematic effects. Modeling the spatial correlation of systematic effects as GPs is an alternative to geostatistical methods such as fitting semivariograms, which have been used in previous studies (e.g., Jayaram and Baker (2009); Foulser-Piggott and Goda (2015)).

Considering that the aleatory residuals \(\delta B_0\), \(\delta WS_0\), and \(\delta S_0\) are each independent and identically distributed Gaussian random variables, \(\delta B\), \(\delta WS\), and \(\delta S\) follow multivariate Gaussian distributions:

$$\begin{aligned} \delta B&\sim {\mathcal {N}}(0, k(x_e,x_e') + \delta _{ij}\tau _0^2) \end{aligned}$$
(17)
$$\begin{aligned} \delta WS&\sim {\mathcal {N}}(0, k([x_e,x_s], [x_e',x_s']) + \delta _{ij}\phi _{0,SS}^2)\end{aligned}$$
(18)
$$\begin{aligned} \delta S&\sim {\mathcal {N}}(0, k(x_s,x_s') + \delta _{ij}\phi _{0,S2S}^2) \end{aligned}$$
(19)

where \(\delta _{ij}\) is the Kronecker delta, which takes a value of 1 if the two earthquakes, paths, or sites are the same in Eqs. 17, 18, and 19, respectively, and 0 otherwise. When predictions of systematic effects at new locations are required, one can calculate the conditional distributions of systematic effects given the observed \(\delta B\), \(\delta WS\), and \(\delta S\) in the training set. Details on the mathematical derivations for predictions are outlined in Rasmussen (2003) and Kuehn and Abrahamson (2020). In this study, we first compute the residuals of the Ridgecrest database (including both training and test sets) using an ergodic GMM. Then, we estimate the correlation structures of systematic effects based on the residuals in the training set and evaluate their predictive performance on the test set.
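The conditioning step can be sketched with the standard Gaussian-process prediction equations. The example below is a minimal illustration for source effects (Eq. 17), assuming an isotropic exponential covariance and hypothetical hyperparameters and coordinates; it is not the implementation used in this study, where the hyperparameters are estimated from the training residuals.

```python
import numpy as np

def exp_kernel(xa, xb, sigma, ell):
    """Isotropic exponential covariance between two sets of 2-D coordinates (km)."""
    dist = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    return sigma**2 * np.exp(-dist / ell)

def gp_predict(x_train, y_train, x_new, sigma, ell, sigma0):
    """Conditional mean and variance of the systematic effect at x_new.

    sigma0 is the aleatory standard deviation; it enters only on the diagonal
    of the training covariance (the Kronecker-delta term in Eq. 17).
    """
    K = exp_kernel(x_train, x_train, sigma, ell) + sigma0**2 * np.eye(len(x_train))
    K_s = exp_kernel(x_new, x_train, sigma, ell)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s @ alpha
    v = np.linalg.solve(L, K_s.T)
    var = sigma**2 - np.sum(v**2, axis=0)    # epistemic variance of the systematic effect
    return mean, var

# Hypothetical epicenters (km) and between-event residuals
x_train = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
y_train = np.array([0.3, 0.2, -0.1])
x_new = np.array([[0.0, 0.0], [1000.0, 1000.0]])   # one observed and one distant location
mean, var = gp_predict(x_train, y_train, x_new, sigma=0.4, ell=5.0, sigma0=0.2)
print(mean, var)
```

Far from all observations the prediction reverts to the prior (zero mean, variance equal to the squared standard deviation of the systematic effect), which is how the reduced aleatory variability is traded against epistemic uncertainty where data are lacking.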

4 Functional form of the ground motion model

We use residuals of the ergodic GMM defined in Eqs. 20 and 21 to model spatial correlation of systematic effects. This functional form has also been used by GeoPentech (2015) and Abrahamson et al. (2019) in previous studies for other areas in California.

$$\begin{aligned}&\begin{aligned} f_{ergodic} = \theta _0&+ g(M) + (\lambda _4 + \lambda _5(M-5))\ln {\sqrt{R_{rup}^2 + \lambda _6^2}} \\&+ \lambda _7R_{rup} + \theta _8Z_{hyp} + \lambda _9F_n + \lambda _{10}F_{rv} + \theta _{11}\ln {\frac{V_{s30}}{760}} \end{aligned} \end{aligned}$$
(20)
$$\begin{aligned}&\quad g(M) = {\left\{ \begin{array}{ll} -\lambda _1 + \lambda _2(M-5.5), &{} \text {if } M<5.5\\ \lambda _1(M-6.5), &{} \text {if } 5.5 \le M \le 6.5\\ \lambda _3(M-6.5), &{} \text {if } M > 6.5 \end{array}\right. } \end{aligned}$$
(21)

where \(Z_{hyp}\) is the hypocentral depth, and \(F_n\) and \(F_{rv}\) are flags that take a value of 1 for normal and reverse fault mechanisms, respectively, and 0 otherwise. \(\lambda _i\) and \(\theta _i\) are coefficients that are determined using Bayesian regression (Gelman et al. 2013). Because the earthquakes in the Ridgecrest database have mainly strike-slip fault mechanisms and only a small fraction of the recordings come from large magnitudes at short distances, the coefficients representing magnitude scaling, geometric spreading, anelastic attenuation, and fault styles (i.e., the \(\lambda _i\)’s) are constrained using the NGA-West2 ground motion database (Ancheta et al. 2014), while the remaining coefficients (i.e., the \(\theta _i\)’s) are fitted using the Ridgecrest training set.
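For readers who wish to evaluate the functional form, Eqs. 20 and 21 can be transcribed as follows. This is a sketch only: the dictionary keys are naming choices made here, and the coefficient values are arbitrary placeholders, not the values reported in Table 1.

```python
import numpy as np

def g_mag(M, lam1, lam2, lam3):
    """Piecewise-linear magnitude scaling of Eq. 21."""
    M = np.asarray(M, dtype=float)
    return np.where(
        M < 5.5, -lam1 + lam2 * (M - 5.5),
        np.where(M <= 6.5, lam1 * (M - 6.5), lam3 * (M - 6.5)))

def f_ergodic(M, R_rup, Z_hyp, V_s30, F_n, F_rv, c):
    """Median ln(PGA) of Eq. 20 for a coefficient dictionary c."""
    geom = (c["lam4"] + c["lam5"] * (M - 5.0)) * np.log(np.sqrt(R_rup**2 + c["lam6"]**2))
    return (c["theta0"] + g_mag(M, c["lam1"], c["lam2"], c["lam3"]) + geom
            + c["lam7"] * R_rup + c["theta8"] * Z_hyp
            + c["lam9"] * F_n + c["lam10"] * F_rv
            + c["theta11"] * np.log(V_s30 / 760.0))

# Placeholder coefficients for illustration only (NOT the fitted values)
placeholder = {"theta0": 1.0, "lam1": 1.2, "lam2": 0.9, "lam3": 0.5,
               "lam4": -1.5, "lam5": 0.2, "lam6": 5.0, "lam7": -0.005,
               "theta8": 0.01, "lam9": -0.1, "lam10": 0.1, "theta11": -0.5}

# Strike-slip scenario: M 6.0 at 20 km, 8 km depth, reference site condition
ln_pga = f_ergodic(6.0, 20.0, 8.0, 760.0, 0, 0, placeholder)
print(float(ln_pga))
```

Note that the three branches of Eq. 21 meet at M = 5.5 (value \(-\lambda _1\)) and at M = 6.5 (value 0), so the magnitude scaling is continuous for any choice of coefficients.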

Figure 4 shows the residuals of the ergodic GMM against M, \(Z_{hyp}\), \(R_{rup}\), and \(V_{s30}\), where there are no noticeable trends, indicating that the ergodic GMM is adequately constrained. The coefficients and standard deviations of the ergodic GMM are presented in Table 1.

Fig. 4

Residuals of the ergodic GMM for PGA against M, \(Z_{hyp}\), \(V_{s30}\) and \(R_{rup}\)

Table 1 Coefficients and standard deviations of the ergodic GMM

5 Spatial correlation of systematic source effects

Systematic source effects \(\delta L2L\) are often modeled using an isotropic covariance function for the GP defined in Eq. 14 (Abrahamson et al. 2019; Liu et al. 2022; Kuehn and Abrahamson 2020), which means that the correlation only depends on the distance between the locations of two earthquakes. A typical isotropic covariance function is defined as:

$$\begin{aligned} k(x_e,x_e') = \tau _{L2L}^2\exp {\left(-\frac{||x_e-x_e'||}{\ell }\right)} \end{aligned}$$
(22)

where \(\ell\) is the correlation length that controls the decaying rate of correlation with respect to distances between earthquakes. \(x_e\) is a 2-dimensional column vector representing the longitude and latitude of an earthquake epicenter (all the coordinates are converted into the Universal Transverse Mercator coordinates in this study). One limitation of Eq. 22 is that it ignores the geometry of a fault and hence could inappropriately extrapolate source effects to distant regions. This is illustrated in Figure 5 and subsequent sections.

Fig. 5

Three pairs of earthquake epicenters with equal separation distances are marked with circles, triangles, and squares; in addition, two faults are represented by two solid lines. An isotropic stationary covariance function, in this case, would be unable to differentiate the correlation among the three earthquake pairs

In Figure 5, we consider 3 pairs of equidistant earthquake epicenters (denoted with circles, squares, and triangles) and 2 fault segments (denoted with black lines), which are representative of the two main faults in the Ridgecrest area. Since all 3 pairs of earthquakes have the same separation distance, Eq. 22 generates the same correlation for the 3 cases. However, the pair of earthquakes located on the same fault (marked as squares) should be more correlated than the pair on different faults (marked as triangles). In addition, the correlation for the pair marked as circles is expected to be the weakest. Considering this observation, we propose an anisotropic non-stationary covariance function by making the correlation length \(\ell\) in Eq. 22 depend on the distance from the earthquake to the fault. Since the covariance function should be positive definite, we use the methodology proposed by Paciorek and Schervish (2006) to construct an anisotropic non-stationary positive definite covariance function as defined in Eqs. 23 and 24.

$$\begin{aligned} Q(x_e,x_e')= & {} \left(x_e -x_e'\right)^T\left(\frac{\Lambda (x_e) + \Lambda (x_e')}{2}\right)^{-1}(x_e -x_e') \end{aligned}$$
(23)
$$\begin{aligned} k(x_e,x_e')= & {} \tau _{L2L}^2 2^{D/2}|\Lambda (x_e)|^{0.25}|\Lambda (x_e')|^{0.25}|\Lambda (x_e) + \Lambda (x_e')|^{-0.5}\exp {(-\sqrt{Q(x_e,x_e')})} \end{aligned}$$
(24)

in which D is the dimension of \(x_e\) (i.e., \(D=2\) in this case). \(\Lambda\) is a D by D matrix-valued function. The premultiplication terms in Eq. 24 ensure the positive definiteness of \(k(x_e,x_e')\). \(\Lambda (x)\) describes the relationship between correlation length and the earthquake location x and should also be positive definite. As a result, we model the correlation length \(\ell\) as a positive function of \(x_e\) and put it on the diagonal entries of \(\Lambda\):

$$\begin{aligned} \Lambda (x) = \begin{pmatrix} \ell ^2(x) &{} 0\\ 0 &{} \ell ^2(x) \end{pmatrix} \end{aligned}$$
(25)

if \(i = j\):

$$\begin{aligned} \ell ^2(x_e) = (a*\exp {(-bd(x_e,i))})^2 \end{aligned}$$
(26)
$$\begin{aligned} \ell ^2(x_e') = (a*\exp {(-bd(x_e',j))})^2 \end{aligned}$$
(27)

if \(i \ne j\):

$$\begin{aligned} \ell ^2(x_e) = (a*\exp {(-bd(x_e,j))})^2 \end{aligned}$$
(28)
$$\begin{aligned} \ell ^2(x_e') = (a*\exp {(-bd(x_e',i))})^2 \end{aligned}$$
(29)

where i and j are the index numbers of the fault segments, to which \(x_e\) and \(x_e'\) are closest, respectively. \(d(x_e,i)\) is the shortest distance from \(x_e\) to the fault segment i. a and b are positive coefficients to be estimated. \(\ell\) is an exponentially decaying function of distances from earthquake locations to fault segments. As a result, the correlation between two earthquakes \(x_e\) and \(x_e'\) decreases as their distances to the fault segments increase.
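Eqs. 23 to 29 can be sketched for a single pair of epicenters as shown below. The fault geometry, the coefficients a and b, and the helper names are hypothetical and chosen only to reproduce the qualitative behavior expected in Figure 5; because \(\Lambda\) is a multiple of the identity matrix (Eq. 25), the determinants and the inverse in Eqs. 23 and 24 are written in closed form.

```python
import numpy as np

def dist_to_segment(p, seg):
    """Shortest distance from point p to the straight segment seg = (start, end)."""
    a, b = np.asarray(seg[0], float), np.asarray(seg[1], float)
    p = np.asarray(p, float)
    t = np.clip(np.dot(p - a, b - a) / np.dot(b - a, b - a), 0.0, 1.0)
    return np.linalg.norm(p - (a + t * (b - a)))

def nonstationary_cov(x1, x2, faults, tau_l2l, a, b):
    """Covariance of Eqs. 23 to 29 between two epicenters (D = 2)."""
    d1 = [dist_to_segment(x1, f) for f in faults]
    d2 = [dist_to_segment(x2, f) for f in faults]
    i, j = int(np.argmin(d1)), int(np.argmin(d2))   # closest fault of each epicenter
    if i == j:   # Eqs. 26 and 27
        ell1, ell2 = a * np.exp(-b * d1[i]), a * np.exp(-b * d2[j])
    else:        # Eqs. 28 and 29: distance to the other epicenter's fault
        ell1, ell2 = a * np.exp(-b * d1[j]), a * np.exp(-b * d2[i])
    lam1, lam2 = ell1**2, ell2**2                   # diagonal entries of Lambda
    diff = np.asarray(x1, float) - np.asarray(x2, float)
    Q = diff @ diff / ((lam1 + lam2) / 2.0)         # Eq. 23
    # |Lambda| = lam^2 for D = 2, so the prefactor of Eq. 24 simplifies
    prefactor = 2.0 * np.sqrt(lam1) * np.sqrt(lam2) / (lam1 + lam2)
    return tau_l2l**2 * prefactor * np.exp(-np.sqrt(Q))

# Hypothetical geometry (km): two parallel fault segments 5 km apart
faults = [((0.0, 0.0), (0.0, 20.0)), ((5.0, 0.0), (5.0, 20.0))]
params = dict(faults=faults, tau_l2l=0.4, a=10.0, b=0.1)   # placeholder values

k_same = nonstationary_cov((0.0, 5.0), (0.0, 10.0), **params)    # both on one fault
k_diff = nonstationary_cov((0.0, 5.0), (5.0, 5.0), **params)     # on different faults
k_off = nonstationary_cov((20.0, 5.0), (20.0, 10.0), **params)   # both far from faults
print(k_same, k_diff, k_off)
```

All three pairs are 5 km apart, so an isotropic model would assign them the same covariance, whereas here the covariance decreases from the same-fault pair to the different-fault pair to the off-fault pair.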

We train the isotropic stationary (Eq. 22) and anisotropic non-stationary models (Eqs. 23 to 29) using the Ridgecrest training set and evaluate their predictive performance on the test set. In addition, we compare our models with the one developed in Kuehn and Abrahamson (2020). The Kuehn and Abrahamson (2020) model shares the same functional form as our isotropic stationary model but has coefficients estimated using the ANZA array data. The coefficients in Eqs. 22 to 29 are determined by maximum a posteriori (MAP) estimation using the program Stan (Carpenter et al. 2017). For simplicity, we consider the two main faults in the Ridgecrest area (i.e., the Eastern Little Lake fault and the Southern Little Lake fault, as described in Plesch et al. (2020)) and parameterize them as two straight line segments, as shown in Figure 6. It is important to note that the anisotropic non-stationary model also works for faults with nonlinear geometries. For subsequent discussion, we denote the isotropic stationary and anisotropic non-stationary models developed in this study for source effects as SRC-1 and SRC-2, respectively. In addition, we denote the source effects model developed in Kuehn and Abrahamson (2020) as SRC-KA. The estimated coefficients for these models are shown in Table 2.

Fig. 6

Comparison of the spatial distribution of (a) between-event residuals in the training set, and predicted source effects, \(\delta L2L\), by the (b) SRC-1, (c) SRC-2, and (d) SRC-KA models. The Eastern Little Lake and the Southern Little Lake faults in the Ridgecrest area are represented by the solid and dashed black lines, respectively

Table 2 Coefficients of correlation models for source effects

Figure 6 shows the spatial distribution of the between-event residuals \(\delta B\) in the training set and systematic source effects (\(\delta L2L\)) predicted for the Ridgecrest area by SRC-1, SRC-2, and SRC-KA models. The lower half of the Eastern Little Lake fault shows on average higher between-event residuals than the upper half of the fault. This spatial pattern is captured by all three models. The earthquake epicenters in the training set exhibit clustered structures, leading to the circular contours around earthquake epicenters in the SRC-1 model. In contrast, the source effects predicted by SRC-2 show no clusters; instead, the contours of the source effects are interpolated and constrained along the two faults. As opposed to SRC-1 and SRC-2, the predicted source effects from SRC-KA extrapolate to distant locations where there are no data to validate the extrapolation. This result is likely influenced by the sparsely distributed seismicity in the ANZA array (based on which SRC-KA is developed) as compared to the clustered epicenters in the Ridgecrest area, which yields a longer correlation length in SRC-KA (i.e., 11.8 km vs. 2 km as shown in Table 2).

It is difficult to compare the distribution of source effects based on visual inspection alone; therefore, we quantify the performance of these models by computing the root-mean-square error (RMSE) and the mean negative log likelihood (MNLL) on the test set. The test set consists of earthquakes with different epicenter locations than the training set, which reduces the bias of the performance estimation. The RMSE and MNLL are defined as:

$$\begin{aligned} RMSE = \sqrt{\frac{\sum _{i=1}^{N} (y_i-\mu _i)^2}{N}} \end{aligned}$$
(30)
$$\begin{aligned} MNLL = \frac{1}{N}\sum _{i=1}^{N} \left[ \frac{1}{2}\log {\left( 2\pi \sigma _i^2\right) } + \frac{(y_i-\mu _i)^2}{2\sigma _i^2}\right] \end{aligned}$$
(31)

where N is the total number of data points, \(\mu _i\) is the predicted mean source, path, or site effect, and \(y_i\) corresponds to \(\delta B\), \(\delta WS\), or \(\delta S\), respectively (for example, in the case of source terms, N represents the total number of earthquakes, and \(y_i\) and \(\mu _i\) are the observed between-event residual (\(\delta B\)) and the predicted mean source effect for the i-th earthquake). \(\sigma _i\) is the total standard deviation (including both the epistemic and aleatory standard deviations) associated with the i-th prediction. RMSE measures the differences between predicted and observed values, whereas the MNLL also takes into account the uncertainty associated with the predictions (Bosman and Thierens 2000). Specifically, the expected negative log likelihood equals the Kullback-Leibler divergence (Kullback and Leibler 1951) between the true distribution of the data and the predicted distribution, plus a term that does not depend on the model; hence, the MNLL measures the difference between the distributions predicted by different models and the true distribution of the data. For both metrics, a lower value indicates better predictive performance.
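Both metrics are straightforward to compute from the observed residuals, the predicted means, and the total standard deviations. The sketch below follows Eqs. 30 and 31 directly; the function names and the numerical values are hypothetical and serve only to show how an overconfident model is penalized by the MNLL but not by the RMSE.

```python
import numpy as np

def rmse(y, mu):
    """Root-mean-square error of Eq. 30."""
    y, mu = np.asarray(y, float), np.asarray(mu, float)
    return np.sqrt(np.mean((y - mu) ** 2))

def mnll(y, mu, sigma):
    """Mean negative log likelihood of Eq. 31 for Gaussian predictions."""
    y, mu, sigma = (np.asarray(v, float) for v in (y, mu, sigma))
    return np.mean(0.5 * np.log(2.0 * np.pi * sigma**2)
                   + (y - mu) ** 2 / (2.0 * sigma**2))

# Hypothetical observed residuals and predictions from two models that share
# the same means but report different total standard deviations
y_obs = np.array([0.30, -0.20, 0.10, -0.40])
mu_pred = np.array([0.10, -0.10, 0.00, -0.10])
print("RMSE:", rmse(y_obs, mu_pred))
print("MNLL, sigma = 0.25:", mnll(y_obs, mu_pred, np.full(4, 0.25)))
print("MNLL, sigma = 0.05:", mnll(y_obs, mu_pred, np.full(4, 0.05)))
```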

Fig. 7

Performance comparison of correlation models for source effects in terms of (a) RMSE and (b) MNLL on the test set

The RMSE and MNLL results are presented in Figure 7 for the three source effects models. The anisotropic non-stationary SRC-2 model performs best, with the lowest RMSE and MNLL. The SRC-1 model has similar RMSE and MNLL to the ergodic model (without any spatial correlation modeling of systematic source effects), indicating that it has a similar prediction error as the ergodic model. This is because the SRC-1 model is limited in extrapolating the source effect to new locations along the faults compared to the SRC-2 model (see Figure 6) and hence over-fits the data (see \(\tau _0\) in Table 2). Interestingly, the SRC-KA model has comparable performance to the SRC-2 model. One possible explanation is the similarity in earthquake characteristics between the database used in its derivation and the one used in this study. Both the ANZA array and the Ridgecrest databases contain mainly earthquakes in a narrow magnitude range, and the earthquakes in both databases are within the same tectonic region (i.e., Southern California). Hence, it is reasonable for the two regions to have similar spatial correlations of source effects. This may also be related to the fact that most of the earthquakes in the Ridgecrest database are located close to the considered faults, and the SRC-2 and SRC-KA models predict similar source effects along the faults (see Figure 6), which may be an artifact of the correlation length in the SRC-KA model. However, the extrapolation of source effects to distant areas in the SRC-KA model is undesirable. It is worth highlighting that the performance of the SRC-2 and SRC-KA models should be further evaluated considering earthquakes that are far from the faults, which is not assessed in this study due to the characteristics of the Ridgecrest ground motion database (i.e., most earthquakes occur near the faults).
Moreover, the SRC-2 model takes into account the fault geometry; it is not only controlled by the spatial patterns of the between-event residuals, which is the case for the isotropic stationary model. This advantage is illustrated in Figure 8, where we select a different training set to estimate the coefficients of the SRC-1 and SRC-2 models. In this dataset, the between-event residuals show strong negative and positive biases in the upper and lower portions of the Eastern Little Lake fault, respectively. In contrast with Figure 6, the SRC-1 model in Figure 8 shows a stronger extrapolation of source effects to distant locations due to its large correlation length (30 km—influenced by the different training set), while this extrapolation is constrained by the fault geometries for the SRC-2 model, regardless of the training set. In certain scenarios where fault characteristics are not known or explicitly modeled, the traditional isotropic stationary correlation model for source effects could be used with engineering judgment to assess the model extrapolation carefully.

Fig. 8

Comparison of the spatial distribution of (a) between-event residuals, and predicted source effects, \(\delta L2L\), by (b) the SRC-1 and (c) SRC-2 models for a different ground motion training subset. The Eastern Little Lake and the Southern Little Lake faults in the Ridgecrest area are represented by the solid and dashed black lines, respectively

6 Spatial correlation of systematic path effects

Previous research efforts have found it difficult to model the spatial correlation of path effects using an isotropic stationary covariance model that is a function of the distance between two sites (i.e., \(||x_s-x_s'||\)) (Loth and Baker 2013; Foulser-Piggott and Stafford 2012; Kuehn and Abrahamson 2020). This difficulty results from the fact that the correlation between two paths depends not only on the distance between sites but also on the source-to-site distances. For example, the two pairs of sites shown in Figure 9 have the same between-site distance, but the path effects for the pair far from the earthquake epicenter (represented by squares) should be more correlated because the propagation paths of seismic waves to the farther sites are likely more similar than those to the closer sites.

Fig. 9

Illustration of the spatial correlation of path effects. The earthquake epicenter is marked as a circle and sites are denoted as triangles and squares. Path effects for the sites marked as squares are likely more similar than those for the sites marked as triangles

To capture this phenomenon, we first apply the anisotropic non-stationary covariance function described in Kuehn and Abrahamson (2020), which is similar to the SRC-2 model, to estimate the correlation of path effects, as shown in Eqs. 32 to 38.

If paths \([x_s, x_e]\) and \([x_s', x_e']\) come from the same earthquake or travel to the same site:

$$\begin{aligned} \begin{aligned} k([x_s, x_e],[x_s', x_e'])&=\tau _{L2L}^2 2^{D/2}|\Lambda (x_s, x_e)|^{0.25}|\Lambda (x_s', x_e')|^{0.25}\\&*|\Lambda (x_s, x_e) + \Lambda (x_s', x_e')|^{-0.5}\exp {(-\sqrt{Q([x_s, x_e],[x_s', x_e'])})} \end{aligned} \end{aligned}$$
(32)

Otherwise:

$$\begin{aligned} k([x_s, x_e],[x_s', x_e']) = 0 \end{aligned}$$
(33)

Specifically, \(Q([x_s, x_e],[x_s', x_e'])\) is modeled as a function of between-site or between-earthquake distances for paths from the same earthquakes or to the same sites, respectively:

$$\begin{aligned}&Q([x_s, x_e],[x_s', x_e']) = {\left\{ \begin{array}{ll} (x_s -x_s')^T(\frac{\Lambda (x_s, x_e) + \Lambda (x_s', x_e')}{2})^{-1}(x_s -x_s') &{}\text { same earthquake} \\ (x_e -x_e')^T(\frac{\Lambda (x_s, x_e) + \Lambda (x_s', x_e')}{2})^{-1}(x_e -x_e') &{}\text { same site} \end{array}\right. } \end{aligned}$$
(34)
$$\begin{aligned}&\quad \Lambda (x_s,x_e) = \begin{pmatrix} \ell ^2(x_s,x_e) &{} 0\\ 0 &{} \ell ^2(x_s,x_e) \end{pmatrix} \end{aligned}$$
(35)
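
As a worked simplification (not part of the original derivation), the determinants in Eq. 32 can be written in closed form because \(\Lambda\) in Eq. 35 is a scaled identity matrix. Writing \(\ell = \ell (x_s,x_e)\) and \(\ell ' = \ell (x_s',x_e')\), letting d denote the relevant separation distance (between sites for a common earthquake, or between earthquakes for a common site; d is a symbol introduced here for illustration), and taking \(D=2\), we have \(|\Lambda | = \ell ^4\), \(|\Lambda + \Lambda '| = (\ell ^2+\ell '^2)^2\), and \(Q = 2d^2/(\ell ^2+\ell '^2)\), so that Eq. 32 becomes:

$$\begin{aligned} k([x_s, x_e],[x_s', x_e']) = \tau _{L2L}^2\, \frac{2\ell \ell '}{\ell ^2+\ell '^2}\, \exp {\left( -\frac{\sqrt{2}\, d}{\sqrt{\ell ^2+\ell '^2}}\right) } \end{aligned}$$

When \(\ell = \ell '\), the prefactor equals 1 and the expression reduces to the stationary exponential kernel \(\tau _{L2L}^2\exp {(-d/\ell )}\). When the two correlation lengths differ, the prefactor is smaller than 1, which lowers the covariance between paths with dissimilar length scales.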

In the above equations, \(D=2\) since the covariance function depends on either the earthquake or site locations. We have conducted an initial investigation of the spatial correlation of \(\delta WS\) using semivariograms and observed a generally positive relationship between the correlation length of path effects and the earthquake magnitude. This observation is consistent with that in Heresi and Miranda (2019) that spatial correlation parameters for within-event residuals are magnitude dependent. Consistently, Kuehn and Abrahamson (2020) also found that the performance of correlation models for path effects was improved by making the correlation length magnitude-dependent. For simplicity, in this study, we model the correlation length \(\ell (x_s,x_e)\) to be dependent on the earthquake magnitude in addition to the source-to-site distance (we use \(R_{rup}\) in this study). Regarding the functional forms of \(\ell (x)\), we first consider a linear dependence on \(R_{rup}\) and M:

$$\begin{aligned} \ell ^2(x_s,x_e) = \ell ^2(M,R_{rup}) = (p + qMR_{rup})^2 \end{aligned}$$
(36)

where p and q are positive coefficients to be estimated, M is the magnitude of the earthquake with epicenter \(x_e\), and \(R_{rup}\) is the rupture distance from the site \(x_s\) to the earthquake \(x_e\). This functional form is selected to be consistent with Kuehn and Abrahamson (2020) to facilitate comparisons, as discussed later. However, Eq. 36 implies that \(\ell\) is unbounded, and can unrealistically increase to infinity when \(R_{rup}\) approaches infinity. To address this issue, we evaluate additional functional forms with a slower increasing rate for \(\ell\) and an upper bound, as shown in Eqs. 37 and 38.

$$\begin{aligned} \ell ^2(x_s,x_e) = \ell ^2(M,R_{rup}) = (p + q\ln {(MR_{rup}+1)})^2 \end{aligned}$$
(37)
$$\begin{aligned} \ell ^2(x_s,x_e) = \ell ^2(M,R_{rup}) = \left( p + q\frac{1-\exp {(-MR_{rup})}}{1+\exp (-MR_{rup})}\right) ^2 \end{aligned}$$
(38)

In Eq. 37, a natural logarithm is applied to \(MR_{rup}\) to slow the growth of \(\ell\) when \(R_{rup}\) becomes very large. Equation 38 is based on the sigmoid function (Han and Moraga 1995), so that \(\ell\) approaches a lower bound of p and an upper bound of \(p + q\) when \(MR_{rup}\) approaches 0 and infinity, respectively. By making \(\ell\) a constant to be estimated from the data, we also develop an isotropic stationary model to benchmark the performance of the anisotropic non-stationary models. We denote the isotropic stationary model as PATH-1 and the anisotropic non-stationary models with the length functions in Eqs. 36, 37, and 38 as PATH-1a, PATH-1b, and PATH-1c, respectively. Similar to SRC-KA, the path-effect model in Kuehn and Abrahamson (2020) is denoted as PATH-KA, which has the same functional form as PATH-1a but with coefficients estimated from the ANZA array. These model definitions are summarized in Table 3. Lastly, as also outlined in Schiappapietra and Smerzini (2021) and Stafford et al. (2019), other factors such as rupture processes and fault extent could also influence the correlation lengths, especially for near-field ground motions. Hence, more advanced correlation models that account for the different characteristics of far-field and near-field path effects could also be explored with different databases in the future.
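
The three length functions in Eqs. 36 to 38 can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the values of p and q used in any call are placeholders rather than the coefficients estimated in this study. The sigmoid form uses the identity \((1-e^{-x})/(1+e^{-x}) = \tanh (x/2)\), which avoids overflow for large arguments.

```python
import numpy as np

def length_linear(m, r_rup, p, q):
    """Eq. 36: correlation length grows linearly with M * R_rup (unbounded)."""
    return p + q * m * r_rup

def length_log(m, r_rup, p, q):
    """Eq. 37: logarithmic growth, slower than Eq. 36 but still unbounded."""
    return p + q * np.log(m * r_rup + 1.0)

def length_sigmoid(m, r_rup, p, q):
    """Eq. 38: bounded between p (M * R_rup -> 0) and p + q (M * R_rup -> inf).

    (1 - exp(-x)) / (1 + exp(-x)) is rewritten as tanh(x / 2).
    """
    return p + q * np.tanh(0.5 * m * r_rup)
```

For the scenario of Figure 10a (M = 4.3 and \(R_{rup} = 90\) km), the product \(MR_{rup}\) is 387, so Eq. 38 evaluated exactly as written is numerically indistinguishable from \(p + q\) if \(R_{rup}\) is entered in km without rescaling.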

It is important to note that the above models ignore the correlation between paths that travel from two different earthquakes to two different sites. This might be a proper assumption for a database with sparsely distributed seismicity and sites. However, for the Ridgecrest database used in this study where earthquakes and sites are closely located, it is reasonable to consider the correlation of paths from different earthquakes to different sites, which results in the following models:

$$\begin{aligned}&\begin{aligned} k([x_s, x_e],[x_s', x_e'])&=\tau _{L2L}^2 2^{D/2}|\Lambda (x_s, x_e)|^{0.25}|\Lambda (x_s', x_e')|^{0.25}\\&*|\Lambda (x_s, x_e) + \Lambda (x_s', x_e')|^{-0.5}\exp {(-\sqrt{Q([x_s, x_e],[x_s', x_e'])})} \end{aligned} \end{aligned}$$
(39)
$$\begin{aligned}&\quad Q([x_s, x_e],[x_s', x_e']) = ({\hat{x}}-{\hat{x}}')^T(\frac{\Lambda (x_s, x_e) + \Lambda (x_s', x_e')}{2})^{-1}({\hat{x}}-{\hat{x}}') \end{aligned}$$
(40)
$$\begin{aligned}&\quad \Lambda (x_s,x_e) = \begin{pmatrix} \ell ^2(x_s,x_e) &{} 0 &{} 0 &{} 0\\ 0 &{}\ell ^2(x_s,x_e) &{} 0 &{} 0 \\ 0 &{} 0 &{}\ell ^2(x_s,x_e) &{} 0 \\ 0 &{} 0 &{} 0 &{}\ell ^2(x_s,x_e) \end{pmatrix} \end{aligned}$$
(41)

where \({\hat{x}} = [x_e^T,x_s^T]^T\) is a 4-by-1 column vector, so that D in Eq. 39 becomes 4. The functional forms of \(\ell\) are the same as in Eqs. 36, 37, and 38. These models are similar to the previous ones (i.e., PATH-1a, 1b, and 1c) except that the correlation between any two paths now depends on both the distance between the two earthquakes and the distance between the two sites. Accordingly, we denote the models with the length functions in Eqs. 36, 37, and 38 as PATH-2a, PATH-2b, and PATH-2c, respectively, and the corresponding isotropic stationary model is denoted as PATH-2. The details of the models are presented in Table 3.
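
A minimal sketch of the kernel in Eqs. 39 to 41 is given below, assuming planar coordinates in km and exploiting the fact that \(\Lambda\) is a scaled identity matrix, so that the determinant terms collapse to \((2\ell \ell '/(\ell ^2+\ell '^2))^{D/2}\). The function name `path_kernel` and the argument `tau` (standing in for \(\tau _{L2L}\)) are hypothetical, and the correlation lengths are passed in directly rather than computed from M and \(R_{rup}\).

```python
import numpy as np

def path_kernel(xhat1, xhat2, ell1, ell2, tau=1.0):
    """Non-stationary kernel of Eqs. 39-41 for Lambda = ell^2 * I.

    xhat1, xhat2: coordinate vectors of length D; D = 4 for [x_e, x_s]
    (PATH-2 type), D = 2 for site-only or earthquake-only coordinates.
    ell1, ell2: correlation lengths of the two paths.
    """
    xhat1 = np.asarray(xhat1, dtype=float)
    xhat2 = np.asarray(xhat2, dtype=float)
    D = xhat1.size
    s = ell1**2 + ell2**2
    # 2^(D/2) |L1|^(1/4) |L2|^(1/4) |L1 + L2|^(-1/2) in closed form
    prefactor = (2.0 * ell1 * ell2 / s) ** (D / 2.0)
    # Quadratic form with ((L1 + L2) / 2)^(-1) = (2 / s) * I
    Q = 2.0 * np.sum((xhat1 - xhat2) ** 2) / s
    return tau**2 * prefactor * np.exp(-np.sqrt(Q))

# Two paths with equal correlation length (5 km) whose stacked
# coordinates differ by (3, 0, 4, 0), i.e. a separation of 5 km.
k_equal = path_kernel([0, 0, 10, 0], [3, 0, 14, 0], ell1=5.0, ell2=5.0)
# Same geometry, but dissimilar correlation lengths.
k_unequal = path_kernel([0, 0, 10, 0], [3, 0, 14, 0], ell1=5.0, ell2=20.0)
```

With equal lengths the kernel reduces to \(\exp (-1)\) for this geometry. With unequal lengths the prefactor is below 1, so the covariance of a path with itself remains the largest entry in its row.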

Table 3 Characteristics of different models for path effects

Figure 10a shows the path-effect correlation as a function of separation distance given \(R_{rup} = 90\) km and \(M=4.3\) for all correlation models developed in this study. The anisotropic non-stationary models have similar trends compared to the isotropic stationary models. This may be because the considered \(R_{rup}\) and M values are close to the average values in the Ridgecrest database. The PATH-2a, 2b, and 2c models show a lower correlation compared to the PATH-1a, 1b, and 1c models; this is likely because they consider the correlation of paths from different earthquakes to different sites. Figure 10b shows the correlation considering a different magnitude and rupture distance (i.e., \(R_{rup} = 150\) km and \(M = 7.5\)). While the trends of the isotropic stationary models remain unchanged, the anisotropic non-stationary models exhibit increased correlations due to their magnitude- and distance-dependent length functions. The models with a linearly dependent length function (i.e., PATH-1a and PATH-2a) show a higher correlation compared to models with other length functions.

Fig. 10

Correlation of path effects as a function of separation distances between stations and/or earthquakes for different models considering (a) \(R_{rup}\) = 90 km, M = 4.3 and (b) \(R_{rup}\) = 150 km, M = 7.5

We also examine the impact of the cell-specific attenuation approach (Dawood and Rodriguez-Marek 2013) on the spatial correlation of path effects. In this case, we divide the Ridgecrest area into multiple 20-by-20-km cells and compute attenuation coefficients for each cell. Then the anelastic attenuation term \(\lambda _7R_{rup}\) in the ergodic GMM (Eq. 20) is replaced by the cell-specific attenuation term:

$$\begin{aligned} \sum _{i=1}^{N_{cell}} c_iR_i \end{aligned}$$
(42)

where \(N_{cell}\) is the total number of cells in the Ridgecrest area, \(c_i\) is the attenuation coefficient for the ith cell, and \(R_i\) is the fraction of a path within the ith cell. The coefficients of the correlation models are re-estimated using the event-site-corrected residuals (\(\delta WS\)) adjusted by the cell-specific attenuation.
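
The cell-specific term in Eq. 42 can be illustrated with the following sketch. It rests on several assumptions that are not taken from this study: the path is a straight line in planar coordinates (km), \(R_i\) is interpreted as the length of the path inside cell i, the lengths are approximated by sampling midpoints along the path, and the coefficients in the usage example are made-up values. The function names are hypothetical.

```python
import numpy as np

def cell_path_lengths(x_e, x_s, cell_size=20.0, n_samples=2000):
    """Approximate length (km) of the straight path x_e -> x_s in each cell.

    Cells are squares of side cell_size indexed by (ix, iy).
    """
    x_e = np.asarray(x_e, dtype=float)
    x_s = np.asarray(x_s, dtype=float)
    total = np.linalg.norm(x_s - x_e)
    # Midpoints of n_samples equal sub-segments along the path
    t = (np.arange(n_samples) + 0.5) / n_samples
    pts = x_e + t[:, None] * (x_s - x_e)
    idx = np.floor(pts / cell_size).astype(int)
    lengths = {}
    for ix, iy in idx:
        key = (int(ix), int(iy))
        # Each sampled midpoint carries one sub-segment of path length
        lengths[key] = lengths.get(key, 0.0) + total / n_samples
    return lengths

def cell_attenuation(lengths, coeffs):
    """Eq. 42: sum over cells of c_i * R_i (cells without a coefficient add 0)."""
    return sum(coeffs.get(cell, 0.0) * r for cell, r in lengths.items())

# A 40 km path crossing three 20-by-20-km cells
lengths = cell_path_lengths([5.0, 10.0], [45.0, 10.0])
# Hypothetical attenuation coefficients per cell
coeffs = {(0, 0): -0.01, (1, 0): -0.02, (2, 0): -0.01}
term = cell_attenuation(lengths, coeffs)
```

Because the path lengths sum to the total path length, setting every \(c_i\) to the same value recovers a single-coefficient term of the form \(\lambda _7R\) for the straight-line distance R.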

Figure 11 shows the RMSE and MNLL of the path-effect predictions on the test set using the correlation models without cell-specific attenuation. The isotropic stationary models have a higher RMSE and MNLL than the anisotropic non-stationary models, which show a similar reduction in RMSE and MNLL with respect to the ergodic one. The PATH-2a, 2b, and 2c models have slightly lower RMSE and MNLL than the other models. This may be associated with the Ridgecrest earthquakes being located in a relatively narrow region; hence, there are potential correlations between paths that come from two nearby earthquake sources to two different stations, which cannot be captured by the PATH-1-type models. The influence of the three correlation length functions on the RMSE is negligible, but the linear functional form (i.e., \(\ell ^2(M,R_{rup}) = (p + qMR_{rup})^2\), corresponding to the PATH-1a and PATH-2a models) shows a slightly better MNLL than the others. It is also interesting to observe that the PATH-KA model has a comparable performance (in terms of both RMSE and MNLL) to the models developed in this study, which possibly results from the similarity in the scales of crustal heterogeneity between Ridgecrest and the ANZA array (Kuehn and Abrahamson 2020).
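
For readers who want to reproduce this type of comparison, the two metrics can be sketched as follows. The MNLL shown here assumes independent Gaussian predictive distributions with a predicted mean and standard deviation for each test record; this is an assumption made for illustration and may differ from the exact definition used in this study (for example, if a joint likelihood is evaluated). The function names are hypothetical.

```python
import numpy as np

def rmse(y_obs, y_pred):
    """Root-mean-square error between observed and predicted values."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))

def mnll(y_obs, y_pred, sd_pred):
    """Mean negative log-likelihood under independent Gaussian predictions."""
    y_obs = np.asarray(y_obs, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    sd_pred = np.asarray(sd_pred, dtype=float)
    nll = 0.5 * np.log(2.0 * np.pi * sd_pred**2) \
        + 0.5 * ((y_obs - y_pred) / sd_pred) ** 2
    return float(np.mean(nll))
```

Unlike the RMSE, the MNLL penalizes both biased means and poorly calibrated standard deviations, which is why two models with nearly identical RMSE can still differ in MNLL.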

Fig. 11

Comparing the predictive performance of different correlation models for path effects in terms of (a) RMSE and (b) MNLL

Similar comparisons for correlation models with cell-specific attenuation are shown in Figure 12. The prediction performance of these models is similar to that shown in Figure 11, indicating that the cell-specific attenuation approach has a minimal impact on the spatial correlation of path effects. This is also reflected in the estimated model coefficients in Tables 4 and 5, where only small changes in coefficients can be observed between models with and without cell-specific attenuation. This observation could be attributed to the spatial distribution of earthquakes and sites in the Ridgecrest database. The clustered earthquake epicenters and stations lead to high correlation and similarity in the propagation paths of seismic waves (e.g., Kotha et al. (2017)). As a result, it is likely that multiple paths with similar lengths pass through the same cells, concealing the potential variations of path effects among these paths. More sophisticated cell grid designs (e.g., grids with heterogeneous cell shapes or sizes and different orientations) could be considered in future studies to improve the cell-specific attenuation approach.

Fig. 12

Comparing the predictive performance of different correlation models with cell-specific attenuation for path effects in terms of (a) RMSE and (b) MNLL

Table 4 Coefficients of correlation models for path effects
Table 5 Coefficients of correlation models for path effects with cell-specific attenuation adjustments

7 Spatial correlation of systematic site effects

The risk assessment of spatially distributed infrastructures (such as portfolios of buildings, pipeline networks, etc.) requires the estimation of the spatial correlation of systematic site effects for regional PSHA (Giorgio and Iervolino 2016; Chioccarelli et al. 2019). Following the common methodologies in previous studies (Foulser-Piggott and Goda 2015; Abrahamson et al. 2019; Jayaram and Baker 2009; Chao et al. 2021; Rahpeyma et al. 2018), we model the spatial correlation of site effects using an isotropic stationary covariance function of between-site distances:

$$\begin{aligned} k(x_s,x_s') = \phi _{S2S}^2\exp {\left(-\frac{||x_s-x_s'||}{\ell }\right)} \end{aligned}$$
(43)

The estimated coefficients in Eq. 43 are shown in Table 6. The site effects predicted using Eq. 43 are compared with the between-site residuals (\(\delta S\)) in Ridgecrest in Figure 13. The predicted site effects vary spatially in a manner consistent with \(\delta S\) and decrease to zero when extrapolating to distant regions with no data. This spatial variation of site effects represents potential spatial deviations from the average correlation between \(V_{s30}\) and deep shear wave velocity profiles, as well as potential differences from the average topographic effects described by the ergodic GMM. Figure 14 shows the predictive performance of the correlation model on the test set, where a significant reduction in RMSE and MNLL can be observed as compared to the ergodic model. We denote this correlation model as SITE. One could further investigate whether the spatial correlation of site effects could be better modeled with a non-stationary correlation function that takes into account, for example, differences between stations inside and outside basins (Chen et al. 2021).
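
The decay of the predicted site effects to zero away from the data follows directly from the kernel in Eq. 43, as the following sketch of a zero-mean Gaussian process prediction illustrates. The station coordinates, residuals, and parameter values are made up for illustration, and the `noise_sd` term (a small observation-noise standard deviation added to the diagonal) is an assumption that is not taken from this study.

```python
import numpy as np

def exp_kernel(xa, xb, phi_s2s, ell):
    """Eq. 43: isotropic stationary exponential covariance between sites."""
    d = np.linalg.norm(xa[:, None, :] - xb[None, :, :], axis=-1)
    return phi_s2s**2 * np.exp(-d / ell)

def predict_site_effects(x_obs, dS_obs, x_new, phi_s2s, ell, noise_sd):
    """Posterior mean of a zero-mean Gaussian process at new site locations."""
    K = exp_kernel(x_obs, x_obs, phi_s2s, ell) + noise_sd**2 * np.eye(len(x_obs))
    K_star = exp_kernel(x_new, x_obs, phi_s2s, ell)
    return K_star @ np.linalg.solve(K, dS_obs)

# Hypothetical stations (km) with between-site residuals
x_obs = np.array([[0.0, 0.0], [10.0, 0.0]])
dS_obs = np.array([0.3, -0.2])
# Predict at an existing station and at a site 1000 km away
x_new = np.array([[0.0, 0.0], [1000.0, 0.0]])
pred = predict_site_effects(x_obs, dS_obs, x_new,
                            phi_s2s=0.4, ell=15.0, noise_sd=0.05)
```

At the existing station the prediction stays close to the observed residual, while at the distant site the covariance with every station is effectively zero and the prediction reverts to the prior mean of zero, i.e., to the ergodic site term.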

Table 6 Coefficient of the correlation model for site effects
Fig. 13

Spatial distribution of (a) between-site residuals compared with (b) site effects predicted by the correlation model

Fig. 14

Comparing the predictive performance of the correlation model for site effects with the ergodic model in terms of (a) RMSE and (b) MNLL

8 Evaluation of overall performance

In this section, we compare the performance of different combinations of source-, path-, and site-effect models on the PGA prediction for the test set. The comparisons also include the ergodic GMM to provide additional insights. For simplicity, we consider only the PATH-1a and PATH-2a models without cell-specific attenuation for path effects since all path-effect models have similar performance. Hence, five models are compared, as summarized in Table 7.

Table 7 Different combinations of correlation models for systematic effects to generate PGA prediction on the test set

Figure 15 shows the RMSE and MNLL for the five models. Significant reductions in RMSE and MNLL with respect to the ergodic GMM are observed for all models. The similar performance between Models A and C, as well as between Models B and D, is associated with the similar performance of the SRC-1 and SRC-2 models, which in turn offer only a minimal improvement over the ergodic source-effect model (see Figure 7). By considering the correlation between paths from different earthquakes to different sites, Models B and D show slightly lower RMSE and MNLL than Models A and C. This is associated with the slightly better performance of the PATH-2a model over the PATH-1a model. Interestingly, despite being developed using a different database, Model KA exhibits significant improvement with respect to the ergodic GMM. The models developed in this study show a slightly better performance than Model KA. Considering the prediction performance and computational efficiency, we recommend the use of Model C. Model B could be preferred for its highest prediction accuracy and precision if the computational cost is not a major concern.

Fig. 15

Comparing the predictive performance of different combinations of correlation models for systematic effects in terms of (a) RMSE and (b) MNLL

9 Discussion

Compared to the correlation models for systematic path and site effects, the models for systematic source effects do not show a significant improvement in predictive performance with respect to their ergodic counterpart. This observation could be attributed to the database used in this study. The Ridgecrest database mainly consists of aftershocks with similar magnitudes ranging from 3 to 5 and with epicenters densely located in a small region. Hence, it is challenging to model the spatial correlation of source effects at large scales and assess the potential magnitude dependence of correlation structures. Similarly, the cell-specific attenuation approach shows a minimal impact on the prediction of path effects and the coefficients of correlation models for the reasons discussed in the previous section. Additional correlation models for source effects and alternatives to the current formulation of the cell-specific attenuation approach should be further investigated when more comprehensive ground motion databases are available.

The PATH-2a to 2c models have slightly better predictive performance than the PATH-1a to 1c models. However, it is worth pointing out that the PATH-1a, 1b, and 1c models produce sparse covariance matrices (Eq. 33), whereas the PATH-2a, 2b, and 2c models generate large dense matrices; hence, the PATH-1a, 1b, and 1c models can achieve a higher computational efficiency by using specialized matrix computation algorithms (Lawrence et al. 2003; Melkumyan and Ramos 2009). For example, in this study, the coefficients in the PATH-1a, 1b, and 1c models are estimated using STAN on a local CPU, while the coefficients in the PATH-2a, 2b, and 2c models have to be solved using TensorFlow Probability (Dillon et al. 2017) with TPU (Tensor Processing Unit, Jouppi et al. (2017)) acceleration on the cloud (Bisong 2019). Specifically, the PATH-2a, 2b, and 2c models would take 30 times longer than the PATH-1a, 1b, and 1c models if they were run on a local CPU. More sophisticated correlation models for path effects should be further explored in future studies to account for both computational efficiency and predictive performance.
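
The sparsity implied by Eq. 33 can be quantified with a short sketch. It assumes a hypothetical, fully balanced data set in which every one of 50 earthquakes is recorded at every one of 40 sites; these counts are placeholders and do not describe the Ridgecrest database. An entry of the PATH-1-type covariance matrix can be non-zero only when the two records share an earthquake or a site.

```python
import numpy as np

def path1_mask(event_ids, site_ids):
    """True where two records share an earthquake or a site (Eq. 32 applies);
    False where the PATH-1-type covariance is exactly zero (Eq. 33)."""
    e = np.asarray(event_ids)
    s = np.asarray(site_ids)
    return (e[:, None] == e[None, :]) | (s[:, None] == s[None, :])

# Hypothetical balanced design: every event recorded at every site
n_events, n_sites = 50, 40
event_ids = np.repeat(np.arange(n_events), n_sites)
site_ids = np.tile(np.arange(n_sites), n_events)

mask = path1_mask(event_ids, site_ids)
# Fraction of covariance entries that can be non-zero
fill = mask.mean()
```

In this balanced case each row has \(N_e + N_s - 1\) potentially non-zero entries out of \(N_eN_s\), so the fill fraction is 89/2000, or about 4.5%, whereas the PATH-2-type matrix is fully dense.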

It is also interesting to highlight that the correlation models for systematic path effects developed in Kuehn and Abrahamson (2020) for the ANZA array perform comparably to the models developed in this study. Incidentally, Kuehn and Abrahamson (2020) also found that the correlation structures of path effects for the ANZA array were similar to those for a ground motion database in Taiwan (NCREE 2015). These observations suggest that the spatial correlation structures for path effects in different regions might be similar. Transferring spatial correlation structures from one region to another, under the assumption of similar scales of crustal heterogeneity across regions, is therefore promising, which is consistent with the observations in Kuehn and Abrahamson (2020). Lastly, the correlation models investigated in this study are restricted to PGA; correlation models for spectral accelerations at other periods or other IMs should be evaluated in future studies.

10 Conclusions

In this study, we have developed spatial correlation models for systematic source, path, and site effects for the Ridgecrest area and compared their performance against existing correlation models. Specifically, we first develop an ergodic GMM and then partition its residuals to estimate the spatial correlation structures of systematic effects. We propose an anisotropic non-stationary correlation model that considers fault geometries and an isotropic stationary model for source effects, several isotropic stationary or anisotropic non-stationary models for path effects, and an isotropic stationary model for site effects.

We find that the anisotropic non-stationary correlation model for source effects captures the fault geometries and extrapolates more reasonably to regions with no data than isotropic stationary correlation models. In terms of path effects, the proposed anisotropic non-stationary correlation models show similar overall performance, which is better than that of the isotropic stationary models. In addition, we find that an isotropic stationary model performs well in capturing the spatial distribution of site effects, which is consistent with previous studies. We also observe that the performance gain from the correlation models for source effects is smaller than the gain from including path and site effects, which may be associated with the spatial distribution of seismicity in the Ridgecrest area. Among the models considered in this study, we recommend the use of Model C (SITE+SRC-2+PATH-1a) as it provides a good balance between prediction performance and computational efficiency. Model B (SITE+SRC-1+PATH-2a) has the highest prediction accuracy and could be preferred if the computational cost is not a major concern. The different correlation models developed in this study can also be incorporated in a logic tree for non-ergodic PSHA if additional epistemic uncertainties associated with the spatial correlation structures of systematic effects need to be considered.

The correlation models developed in this study are also compared with those developed by Kuehn and Abrahamson (2020) using different databases. We find that the models for path effects in Kuehn and Abrahamson (2020) have comparable performance to those developed in this study, which may suggest that the spatial correlation structures of path effects are similar for different regions, making them potentially transferable. Interestingly, these models are developed based on very different databases (e.g., number of recordings, locations, spatial density, etc.); hence, this observation should be further investigated in future studies, with emphasis on potential physical constraints. Lastly, this study investigated correlation structures for PGA; future studies should consider spectral accelerations at other periods and other IMs.

11 Data and resources

The United States Geological Survey Ridgecrest (Rekoske et al. 2020) and the Pacific Earthquake Engineering Research (PEER) NGA-West2 (Ancheta et al. 2014) ground motion databases are used in this study. The programs STAN (Carpenter et al. 2017) and TensorFlow Probability (Dillon et al. 2017) are used to infer the parameters of correlation models.