1 Introduction

Peak ground acceleration (PGA) and spectral acceleration ordinates (Sa) are the most widely used ground motion intensity measures (IMs) in seismic hazard analyses. The probabilistic nature of these parameters is a well-researched topic, and several ground motion prediction equations (GMPEs), which provide estimates of their median and dispersion, have been developed (e.g., the NGA-West2 generation of GMPEs; Bozorgnia et al. 2014). Moreover, estimating seismic risk for spatially distributed infrastructure (e.g., lifelines) or for many structures in a region requires not only the estimation of the mean and dispersion of ground motion parameters at each location, but also the characterization of the correlation of their residuals (e.g., Wesson and Perkins 2001; Lee and Kiremidjian 2007). Note that this correlation applies to the ground motion residuals; it is therefore related to how the variability about the median prediction changes between earthquakes and between locations.

The spatial correlation of Sa values at different locations has been investigated in several studies over the last 15 years (Wesson and Perkins 2001; Boore et al. 2003; Kawakami and Mogi 2003; Wang and Takada 2005; Park et al. 2007; Goda and Hong 2008; Goda and Atkinson 2009, 2010; Hong et al. 2009; Jayaram and Baker 2009; Goda 2011; Sokolov et al. 2012; Sokolov and Wenzel 2013a; Loth and Baker 2013). The interested reader is also referred to the exhaustive literature review on spatial correlations of ground motions by Sokolov and Wenzel (2013b). In general, previous studies grouped sets of a few different events to develop overall spatial correlation equations, mainly because, with the exception of the 1999 Chi–Chi and 1994 Northridge earthquakes, individual earthquakes have not produced a sufficient number of records for accurate estimation. Although this procedure yields smooth estimates of spatial correlations, it has the drawback of neglecting the variability between different events: the “event-to-event” variability. Even though some authors have commented on this variability when comparing the results of California events with the 1999 Chi–Chi earthquake and its aftershocks (Goda and Hong 2008; Jayaram and Baker 2009), the first to highlight the large event-to-event variability was Goda (2011), who compared spatial correlations of intraevent terms (also referred to as intraevent spatial correlations) computed from 41 different earthquakes. He found, for example, that the intraevent correlation at 10 km between Sa values at a vibration period T = 0.2 s has a median of 0.5 but can vary approximately between 0.1 and 0.8. Regional seismic risk estimates computed with these two bounding values would differ significantly from those computed with a single value obtained from an approximate equation that groups all the events and neglects this high variability (Goda and Atkinson 2009; Sokolov and Wenzel 2011).

In this context, the main objective of the present work is to quantitatively evaluate the event-to-event uncertainty of the spatial correlation of PGA and Sa ordinates, and to propose an approach for explicitly considering it in seismic hazard and risk analyses. In particular, this investigation presents a correlation model, which is then fitted to 39 well-recorded earthquakes. A statistical study of the resulting parameters is conducted in order to propose a new methodology for considering the uncertainty of the spatial correlation model in future regional seismic hazard and risk analyses.

2 Intraevent spatial correlation model

The intraevent spatial correlation model described in this section is similar to the one presented by other authors (Goda and Hong 2008; Goda and Atkinson 2009, 2010; Hong et al. 2009; Jayaram and Baker 2009; Goda 2011). Several studies have shown that response spectral ordinates can be assumed to be lognormally distributed random variables (e.g., Abrahamson 1988; Jayaram and Baker 2008), and GMPEs can be used to estimate their median and dispersion as a function of the event magnitude, source-to-site distance, and other variables, such as the local soil conditions, fault mechanism, and tectonic region. In the case of spectral acceleration ordinates, this is expressed as follows:

$$\ln Sa_{ik} \left( T \right) = f\left( {T, M_{k} ,R_{ik} ,{\boldsymbol \theta_{\bf{ik}}} } \right) + \eta_{k} \left( T \right) + \varepsilon_{ik} \left( T \right),$$
(1)

where Saik(T) is the pseudo-acceleration spectral ordinate for a vibration period T at the i-th site in the k-th event. The function f(T, M, R,θ) is the logarithmic mean value estimated by a GMPE as a function of the event magnitude M, the source-to-site distance R, and a set of other explanatory variables θ. Note that this function is deterministic for a given set of input parameters. The randomness of the intensity measure is accounted for by ηk(T) and εik(T), which are the interevent (also referred to as between-event) and intraevent (also referred to as within-event) residual terms, respectively. These terms are assumed to be independent, normally distributed random variables with zero mean and standard deviations ση(T) and σε(T), respectively. Note that in some models these standard deviations may also be expressed as functions of the event magnitude and other variables. The interevent term, ηk(T), represents the variability between different earthquake events, independent of the site, while the intraevent term, εik(T), represents the site-to-site variability within an earthquake event. Finally, since ηk(T) and εik(T) are assumed to be independent, the total standard deviation of ln Saik(T) is σT(T), given by:

$$\sigma_{T} \left( T \right) = \sqrt {\sigma_{\eta }^{2} \left( T \right) + \sigma_{\varepsilon }^{2} \left( T \right)} .$$
(2)

Estimating a correlation directly between spectral ordinates Sa(T) at different locations would not be appropriate, even when using records from a single earthquake. This is because each realization (i.e., each value of the IM) comes from a different underlying distribution, as the sites have different values of Rik and θik. However, the realizations of the residual terms ηk(T) + εik(T) come from the same distribution (a normal distribution with zero mean and standard deviation σT(T), as explained above). Thus, it can be shown that the correlation between the residual terms, η(T) + ε(T), at two sites i and j separated by a distance Δij, and for two periods Ti and Tj, respectively, is given by:

$$\rho_{T} \left( {\Delta_{ij} ,T_{i} ,T_{j} } \right) = \frac{{\rho_{\eta } \left( {T_{i} ,T_{j} } \right)\sigma_{\eta } \left( {T_{i} } \right)\sigma_{\eta } \left( {T_{j} } \right) + \rho_{\varepsilon } \left( {\Delta_{ij} ,T_{i} ,T_{j} } \right)\sigma_{\varepsilon } \left( {T_{i} } \right)\sigma_{\varepsilon } \left( {T_{j} } \right)}}{{\sigma_{T} \left( {T_{i} } \right)\sigma_{T} \left( {T_{j} } \right)}},$$
(3)

where ρT(Δij,Ti,Tj) is the correlation between ηk(Ti) + εik(Ti) and ηk(Tj) + εjk(Tj), ρη(Ti,Tj) is the interevent correlation between ηk(Ti) and ηk(Tj), and ρε(Δij,Ti,Tj) is the intraevent correlation between εik(Ti) and εjk(Tj). Note that, even though ρε(Δij,Ti,Tj) is expected to decay to zero at very large separation distances, some total correlation ρT(Δij,Ti,Tj) will always remain, due to the correlation between the interevent residuals of the same earthquake. Goda and Hong (2008) proposed that the total spatial correlation can be approximated by:

$$\rho_{T} \left( {\Delta_{ij} ,T_{i} ,T_{j} } \right) \cong \frac{{\rho_{0} \left( {T_{i} ,T_{j} } \right)\left[ {\sigma_{\eta } \left( {T_{i} } \right)\sigma_{\eta } \left( {T_{j} } \right) + \rho_{\varepsilon } \left( {\Delta_{ij} ,T_{\max } ,T_{\max } } \right)\sigma_{\varepsilon } \left( {T_{i} } \right)\sigma_{\varepsilon } \left( {T_{j} } \right)} \right]}}{{\sigma_{T} \left( {T_{i} } \right)\sigma_{T} \left( {T_{j} } \right)}},$$
(4)

where Tmax is the larger of the two periods Ti and Tj, and ρ0(Ti,Tj) represents the empirical approximation of ρT(Δij = 0,Ti,Tj), previously studied by several authors (e.g., Inoue and Cornell 1990; Baker and Cornell 2006; Abrahamson and Silva 2007; Baker and Jayaram 2008; Abrahamson et al. 2013). The approximation of Eq. (3) by Eq. (4) follows from the assumption of a Markov dependence of residuals at different periods, and its correctness was demonstrated by Loth and Baker (2013). Thus, the focus of this paper is the study of the intraevent spatial correlation ρε(Δij,Tmax,Tmax), which will be denoted ρε(Δ,T) hereafter.
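The exact expression and its approximation can be compared numerically with a few lines of code. The sketch below is a minimal illustration of Eqs. (2) to (4) using only the Python standard library; the function and argument names are our own, and the standard deviations used in the example are hypothetical placeholders rather than values from any particular GMPE.

```python
import math


def sigma_total(sigma_eta, sigma_eps):
    """Eq. (2): total log standard deviation from the inter- and intraevent parts."""
    return math.sqrt(sigma_eta ** 2 + sigma_eps ** 2)


def rho_total_exact(rho_eta, rho_eps, s_eta_i, s_eta_j, s_eps_i, s_eps_j):
    """Eq. (3): correlation of eta + eps between site i (period Ti) and site j (period Tj)."""
    num = rho_eta * s_eta_i * s_eta_j + rho_eps * s_eps_i * s_eps_j
    return num / (sigma_total(s_eta_i, s_eps_i) * sigma_total(s_eta_j, s_eps_j))


def rho_total_markov(rho0, rho_eps_tmax, s_eta_i, s_eta_j, s_eps_i, s_eps_j):
    """Eq. (4): Goda and Hong (2008) approximation.

    rho0 is the cross-period correlation rho_0(Ti, Tj) at zero separation, and
    rho_eps_tmax is the intraevent spatial correlation evaluated at Tmax.
    """
    num = rho0 * (s_eta_i * s_eta_j + rho_eps_tmax * s_eps_i * s_eps_j)
    return num / (sigma_total(s_eta_i, s_eps_i) * sigma_total(s_eta_j, s_eps_j))
```

For example, with hypothetical values ση = 0.4 and σε = 0.6 at a single period, a vanishing intraevent correlation still leaves rho_total_exact(1.0, 0.0, 0.4, 0.4, 0.6, 0.6) = 0.16/0.52 ≈ 0.31, which is the floor contributed by the interevent residual shared by all sites.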

To estimate ρε(Δ,T), a regression analysis must first be carried out to determine f(T, M, R,θ), ση(T), and σε(T). In this paper, the GMPE developed by Boore et al. (2014) is used to compute the mean and dispersions of the regression. The regression residuals are then used to evaluate the intraevent spatial correlation. For a given event and period, the interevent residual term, ηk(T), is constant for all sites; therefore, the residuals ln Saik(T) − f(T, Mk, Rik, θik) = ηk(T) + εik(T) only provide information about the intraevent spatial correlation. One approach for estimating ρε(Δ,T) is to directly compute the covariance and the correlation coefficient between residuals at sites separated by a certain distance Δ. Another approach, first recommended by Goda and Hong (2008), consistent with geostatistical practice, and also used by Jayaram and Baker (2009) and Loth and Baker (2013), is to assume stationarity and isotropy and then use the sample semivariogram (Goovaerts 1997), \(\left[ {\sigma_{d} \left( {\Delta ,T} \right)} \right]^{2} /2\). The term σd(Δ,T) is the standard deviation of εd(Δ,T) = εik(T) − εjk(T), where the i-th and j-th sites have a separation distance Δ; because ηk(T) is common to both sites, it cancels in this difference. (Note that, as it is unlikely to find several pairs of sites with exactly the same separation Δ, the pairs are grouped into bins of separation distance.) In other words, for all pairs of sites whose separation distance falls within a given bin, the residuals are subtracted to generate a new variable εd(Δ,T), and σd(Δ,T) is then computed as its standard deviation. Finally, the intraevent spatial correlation is evaluated as:

$$\rho_{\varepsilon } \left( {\Delta ,T} \right) = 1 - \frac{1}{2}\left[ {\frac{{\sigma_{d} \left( {\Delta ,T} \right)}}{{\hat{\sigma }_{\varepsilon } \left( T \right)}}} \right]^{2} ,$$
(5)

where \(\hat{\sigma }_{\varepsilon } \left( T \right)\) is the intraevent standard deviation of event k. It is paramount to note that the intraevent standard deviation of the original GMPE, σε(T), must not be used as \(\hat{\sigma }_{\varepsilon } \left( T \right)\) in Eq. (5), because it is a single value for all events (all those considered in the development of the GMPE), whereas in reality \(\hat{\sigma }_{\varepsilon } \left( T \right)\) may vary significantly from one event to another. Using a constant \(\hat{\sigma }_{\varepsilon } \left( T \right) = \sigma_{\varepsilon } \left( T \right)\) would therefore introduce a bias into the estimate of ρε(Δ,T). Instead, it is generally recommended to assume that the semivariogram reaches a plateau at long separation distances, where the correlation is theoretically zero, so that \(\hat{\sigma }_{\varepsilon } \left( T \right)^{2}\) can be taken as equal to \(0.5\,\sigma_{d} \left( {\Delta ,T} \right)^{2}\) at a very large separation distance. In this work, the semivariogram approach is adopted: separation distances that differ by no more than 3 km are grouped into the same bin, and \(\hat{\sigma }_{\varepsilon } \left( T \right)^{2}\) is computed as the plateau value of \(0.5\,\sigma_{d} \left( {\Delta ,T} \right)^{2}\) at distances between 85 and 180 km. Using other bin widths and other large-distance ranges for estimating \(\hat{\sigma }_{\varepsilon } \left( T \right)\) produces almost the same results as those shown in the following sections.
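The binning and plateau procedure can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the authors' code: station coordinates are assumed to be projected positions in km, pairs are placed in 3 km bins starting at zero separation, each bin uses the classical estimator 0.5·mean(d²) (equal to 0.5σd² when the residual differences have zero mean), and the sill is taken as the average of the binned semivariogram between 85 and 180 km.

```python
import numpy as np


def intraevent_correlation(xy_km, resid, bin_width=3.0, sill_range=(85.0, 180.0)):
    """Empirical intraevent spatial correlation via the sample semivariogram, Eq. (5).

    xy_km : (n, 2) projected station coordinates in km (assumption)
    resid : (n,) residuals eta_k + eps_ik for one event and one period
    Returns bin-centre distances and rho_eps for every non-empty bin.
    """
    xy_km = np.asarray(xy_km, dtype=float)
    resid = np.asarray(resid, dtype=float)
    i, j = np.triu_indices(len(resid), k=1)          # every station pair once
    dist = np.linalg.norm(xy_km[i] - xy_km[j], axis=1)
    diff = resid[i] - resid[j]                        # eta_k cancels here
    bins = np.floor(dist / bin_width).astype(int)

    centres = []
    gammas = []
    for b in np.unique(bins):
        d = diff[bins == b]
        centres.append((b + 0.5) * bin_width)
        gammas.append(0.5 * np.mean(d ** 2))          # semivariogram, 0.5 * sigma_d^2
    centres = np.array(centres)
    gammas = np.array(gammas)

    # Plateau (sill) at long distances = event-specific sigma_hat_eps^2
    in_sill = (centres >= sill_range[0]) & (centres <= sill_range[1])
    if not np.any(in_sill):
        raise ValueError("no station pairs in the sill distance range")
    sill = gammas[in_sill].mean()
    return centres, 1.0 - gammas / sill
```

Because only differences between residuals enter the calculation, adding a constant interevent term to every residual leaves the result unchanged, which is why the total residuals ηk(T) + εik(T) can be passed in directly.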

At this point, it is important to note that the methodology described above is theoretically inconsistent if the GMPE that estimates f(T, M, R,θ), ση(T), and σε(T) in Eq. (1) does not explicitly include the spatial correlation of residuals, which is common practice (e.g., the NGA-West2 generation of GMPEs assumes no spatial correlation). However, Hong et al. (2009) included the spatial correlation in the regression analysis of a GMPE developed from a set of California records and concluded that its effect is negligible compared with a GMPE developed without spatial correlation. Therefore, even though the procedure described above is theoretically inconsistent, in practice it can be used with standard GMPEs without significant errors.

Once the empirical ρε(Δ,T) was obtained using Eq. (5), the following functional form was fitted to the data:

$$\hat{\rho }_{\varepsilon } \left( {\Delta ,T} \right) = \exp \left[ { - \left( {\frac{\Delta }{\beta \left( T \right)}} \right)^{\alpha \left( T \right)} } \right],$$
(6)

where α and β are the model parameters obtained by nonlinear regression, both of which are functions of the vibration period T. The parameter α controls the rate at which the correlation decays with increasing distance Δ, and β is the distance at which the correlation equals exp(− 1) = 0.368. Note that, for a fixed value of β, a higher value of α produces higher correlation values for distances Δ < β and lower correlation values for Δ > β. A least-squares regression was used to fit the model and obtain α and β as functions of the vibration period. However, because the standard error of a sample correlation coefficient is not constant (it depends on the value of the coefficient itself), the regression was performed on values transformed with the Fisher z transformation:

$$z = \frac{1}{2}\ln \left( {\frac{1 + \rho }{1 - \rho }} \right),$$
(7)

where ρ is the correlation estimated from Eq. (5) and z is the transformed value, whose standard error is approximately constant (about 1/√(n − 3) for a correlation estimated from n independent pairs). The least-squares regression used to obtain the model parameters α and β was then conducted on the z values.
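A sketch of the z-space least-squares fit of Eq. (6) is shown below. To keep it dependency-light it uses NumPy only and replaces the nonlinear regression with a brute-force grid search over α and β; the grid ranges are assumptions chosen for illustration, and a proper nonlinear solver (e.g., scipy.optimize.curve_fit applied to the z values) would normally be preferred.

```python
import numpy as np


def fisher_z(rho):
    """Eq. (7): z = 0.5 * ln((1 + rho) / (1 - rho)), i.e. artanh(rho).

    Values are clipped just inside (-1, 1) so that z stays finite.
    """
    return np.arctanh(np.clip(rho, -0.999999, 0.999999))


def model_rho(delta, alpha, beta):
    """Eq. (6): rho = exp(-(delta / beta) ** alpha)."""
    return np.exp(-(np.asarray(delta, dtype=float) / beta) ** alpha)


def fit_alpha_beta(delta, rho_emp,
                   alphas=np.linspace(0.2, 1.5, 131),
                   betas=np.linspace(1.0, 60.0, 591)):
    """Least-squares fit of Eq. (6) in Fisher-z space by grid search.

    The grid ranges (alpha in 0.2 to 1.5, beta in 1 to 60 km) are illustrative
    assumptions. Returns the (alpha, beta) pair with the smallest sum of
    squared z residuals.
    """
    delta = np.asarray(delta, dtype=float)
    z_emp = fisher_z(np.asarray(rho_emp, dtype=float))
    best_sse, best_alpha, best_beta = np.inf, None, None
    for a in alphas:
        # All beta candidates at once for this alpha: shape (n_beta, n_delta)
        z_mod = fisher_z(np.exp(-(delta[None, :] / betas[:, None]) ** a))
        sse = np.sum((z_mod - z_emp[None, :]) ** 2, axis=1)
        k = int(np.argmin(sse))
        if sse[k] < best_sse:
            best_sse, best_alpha, best_beta = sse[k], float(a), float(betas[k])
    return best_alpha, best_beta
```

For noise-free input generated with α = 0.55 and β = 12 km at bin centres between 1.5 and 58.5 km, the grid search recovers those values to within the grid spacing.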

Although this paper focuses on ρε(Δ,T), note that once this correlation is estimated, the total correlation between residuals, ρT(Δij,Ti,Tj), can easily be computed using Eq. (4). Furthermore, as the residual terms are the only source of uncertainty in Eq. (1), the correlation between ln Sa(T) values is equal to the total correlation between the residual terms.

3 Ground motion database

The correlation model of Eq. (6) was fitted to the empirical correlations of each earthquake individually. The reliability of these empirical correlations increases with the number of stations that recorded the earthquake (i.e., with the sample size). Therefore, only well-recorded events with more than 200 recorded ground motions (i.e., 100 stations with two perpendicular horizontal components) were considered, consistent with the selection criteria used by Goda (2011). A total of 39 earthquakes were selected, with magnitudes ranging between 4.0 and 7.9, in order to study the influence of the event magnitude on the spatial correlation. Only ground motions recorded at stations with NEHRP site class C or D (average shear wave velocity of the top 30 m, VS30, between 180 and 760 m/s) were considered, as these are the most common site classes in most urban areas. Table 1 summarizes the earthquakes considered. All the ground motions used in this study were obtained from the NGA-West2 database (Ancheta et al. 2014).

Table 1 Summary of considered earthquakes

It should be noted that, for a given sample size, the reliability of an estimated correlation coefficient increases with the absolute value of the coefficient. Thus, the correlation coefficients from this study are particularly well estimated for separation distances smaller than 30 km, where the correlations are highest; these are also the distances with the largest influence on regional seismic hazard and risk analyses of urban areas.
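To make this point concrete, applying the delta method to the Fisher z transformation gives an approximate standard error of a sample correlation, (1 − ρ²)/√(n − 3). The sketch below evaluates it; the sample size is hypothetical, and because residual pairs within a distance bin are not independent, the numbers are only indicative.

```python
import math


def approx_se_rho(rho, n_pairs):
    """Approximate standard error of a sample correlation coefficient.

    Delta method on the Fisher z transform: SE(z) ~ 1/sqrt(n - 3) and
    dz/drho = 1/(1 - rho^2), hence SE(rho) ~ (1 - rho^2)/sqrt(n - 3).
    Assumes n_pairs > 3 independent pairs.
    """
    return (1.0 - rho ** 2) / math.sqrt(n_pairs - 3)


# Same (hypothetical) sample size, different correlation levels
print(approx_se_rho(0.8, 103))  # about 0.036
print(approx_se_rho(0.2, 103))  # about 0.096
```

With the same number of pairs, the strong correlation is estimated almost three times more precisely than the weak one, which is why short separation distances are the best constrained.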

4 Variability of intraevent spatial correlation

The procedure described in Sect. 2 was applied to every earthquake event, for PGA and for pseudo-acceleration spectral ordinates at vibration periods between 0.1 and 6.0 s. As an example, Fig. 1 illustrates the results for the 2007 Chuetsu-oki, Japan earthquake for PGA and Sa(5.0 s). The fitted parameters for this event are (α, β) = (0.59, 20.05 km) for PGA and (α, β) = (0.56, 11.69 km) for Sa(5.0 s). Figure 2a shows the results of fitting the model of Eq. (6) to the 39 earthquake events of Table 1 for Sa(1.0 s), together with the mean correlation coefficient and the 16th and 84th percentiles as functions of separation distance. Figure 2b compares the spatial correlation results for Sa(1.0 s) from the 39 events and their mean with previous models developed by Goda and Hong (2008), Jayaram and Baker (2009), and Goda and Atkinson (2010). Figure 2c presents the coefficient of variation (COV) of the correlation coefficients for Sa at four different vibration periods as a function of separation distance. As can be observed, there is significant variability in the intraevent correlation coefficient at a given separation distance for every period. To quantify this variability with an overall measure, the following sections focus on the probability distribution of the model parameters α and β.
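The summary curves described above (the mean, 16th and 84th percentiles, and COV of the correlation across events at each separation distance) can be sketched as follows. The paper does not state whether these statistics are weighted; this sketch assumes unweighted statistics across events and NumPy's default linear interpolation for percentiles.

```python
import numpy as np


def correlation_spread(alpha, beta, delta_km):
    """Evaluate Eq. (6) for each event and summarize across events.

    alpha, beta : one fitted pair per earthquake (at least two events)
    delta_km    : separation distances at which to evaluate the curves
    Returns the mean, 16th and 84th percentiles, and COV at each distance.
    """
    a = np.asarray(alpha, dtype=float)[:, None]
    b = np.asarray(beta, dtype=float)[:, None]
    d = np.asarray(delta_km, dtype=float)[None, :]
    rho = np.exp(-(d / b) ** a)                  # shape (n_events, n_distances)
    mean = rho.mean(axis=0)
    p16, p84 = np.percentile(rho, [16, 84], axis=0)
    cov = rho.std(axis=0, ddof=1) / mean         # coefficient of variation
    return mean, p16, p84, cov
```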

Fig. 1

Empirical and fitted spatial intraevent correlation for PGA and Sa (5.0 s) in the 2007 Chuetsu-oki, Japan earthquake. For PGA (α, β) = (0.59, 20.05 km), while for Sa (5.0 s) (α, β) = (0.56, 11.69 km)

Fig. 2

a Fitted spatial intraevent correlation for Sa(1.0 s) for 39 individual earthquake events, their mean, and their 16/84th percentiles. b Comparison of the fitted spatial intraevent correlation for 39 individual earthquake events and their mean, with previous models. c Coefficient of variation (COV) of the intraevent correlation coefficient as a function of separation distance, for four different periods of vibration

4.1 Central tendency and variability of correlation model parameters

The resulting values of α and β as a function of period for each individual earthquake are shown as light gray lines in Fig. 3a, b, together with their central tendencies and the empirical 16th and 84th percentiles. As noted before, the parameters fitted to earthquakes with more records are smoother and more reliable than those fitted to events with fewer records; the central tendency is therefore computed as a weighted geometric mean, with each earthquake weighted by the square of its number of stations (see Table 1 for the number of stations of each event). The dispersions of α and β as a function of period were computed as the weighted standard deviations of the natural logarithms of the α and β values, and are shown in Fig. 3c. As can be seen, α is fairly constant across periods, with a value of approximately 0.55, and has a significantly lower dispersion than β. Thus, the total variability of \(\hat{\rho }_{\varepsilon } \left( {\Delta ,T} \right)\) is dominated by the dispersion of β. Therefore, in order to simplify the model of Eq. (6), a new functional form can be used:

Fig. 3

a Variation of parameter α as a function of the period of vibration for individual earthquakes, and its central tendency weighted by the square of the number of stations of each event. b Variation of parameter β as a function of the period of vibration for individual earthquakes, and its central tendency weighted by the square of the number of stations of each event. c Dispersion (weighted logarithmic standard deviation) of parameters α and β as a function of the period of vibration

$$\hat{\rho }_{\varepsilon } \left( {\Delta ,T} \right) = \exp \left[ { - \left( {\frac{\Delta }{\beta \left( T \right)}} \right)^{0.55} } \right].$$
(8)

This also simplifies the comparison between different curves, since for a fixed α, a higher value of β translates directly into higher spatial correlations. New β values were then computed by regression using Eq. (8), consistent with the fixed value α = 0.55. Figure 4a shows the resulting β values for each individual event (gray lines) and their weighted geometric mean as functions of the vibration period; the weighted geometric mean of β is also listed in Table 2. For comparison with previous correlation models, the β values are compared with the distances at which the correlation equals exp(− 1) = 0.368 according to the models proposed by Goda and Hong (2008) and by Jayaram and Baker (2009). The former model was developed with 39 California earthquakes, whereas the latter considered only seven events and has two branches for periods shorter than 1.0 s, depending on the clustering of site conditions. Figure 4b and Table 2 show the dispersion of the β values computed for individual events around the central tendency, for a fixed α = 0.55. Note the high dispersion of β, with logarithmic standard deviations between 0.6 and 1.2. For reference, these dispersion values, which are currently neglected in regional seismic hazard analyses, are higher than the dispersions of spectral acceleration ordinates in GMPEs, which are routinely incorporated in probabilistic seismic hazard analyses.
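The weighted statistics used for Figs. 3 and 4 and Table 2 can be sketched as below. The sketch assumes normalized weights proportional to the squared number of stations and applies no small-sample bias correction to the weighted standard deviation; the paper does not specify one.

```python
import numpy as np


def weighted_log_stats(values, n_stations):
    """Weighted geometric mean and weighted logarithmic standard deviation.

    Each event is weighted by the square of its number of stations; the
    weighted standard deviation has no bias correction (an assumption).
    """
    logs = np.log(np.asarray(values, dtype=float))
    w = np.asarray(n_stations, dtype=float) ** 2
    w = w / w.sum()                                   # normalized weights
    mu = np.sum(w * logs)                             # weighted mean of ln(values)
    sigma = np.sqrt(np.sum(w * (logs - mu) ** 2))     # weighted log std
    return float(np.exp(mu)), float(sigma)
```

With equal station counts the weights cancel; for instance, β values of 1 and e² km give a geometric mean of e km and a logarithmic standard deviation of 1.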

Fig. 4

a Variation of β (for α = 0.55) as a function of the period of vibration for individual earthquakes, and its central tendency weighted by the square of the number of stations of each event. The results are compared with the models proposed by Goda and Hong (2008) and by Jayaram and Baker (2009). b Dispersion (weighted logarithmic standard deviation) of β (for α = 0.55) as a function of the period of vibration

Table 2 Weighted geometric mean and weighted logarithmic standard deviation of β

4.2 Probability distribution of β

At every vibration period, the empirical cumulative probability distribution of β was computed. For this, the β values were sorted in ascending order, and the i-th observation was assigned a probability (i.e., plotting position) equal to (i − 3/8)/(n + 1/4), where n is the sample size (here, the number of earthquakes). This plotting position, proposed by Blom (1958), has been shown to be a suitable approximation of the unbiased plotting position (Cunnane 1978). An example for T = 2.0 s is presented in Fig. 5, where a positively skewed distribution (longer upper tail) of the data points can be observed. Thus, the Kolmogorov–Smirnov (K-S) goodness-of-fit test (Massey 1951) is used to evaluate whether a lognormal distribution can characterize the probability distribution of β. The fitted lognormal distribution and its K-S 10% significance confidence boundaries are also presented in Fig. 5. This test was repeated for every vibration period. Figure 6 shows the maximum absolute difference between the empirical cumulative distribution of β, Fβ, and the fitted lognormal distribution of β, F*β, as a function of period, along with the K-S 10% significance limit for this sample size, Dcrit,10%. As can be seen, β can be adequately assumed to follow a lognormal distribution at all periods.
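A sketch of this check is given below. It assumes the lognormal is fitted with the sample mean and standard deviation of ln β and uses the asymptotic 10% critical value 1.22/√n. Following the text, it compares the Blom plotting positions with the fitted CDF; the textbook K-S statistic instead evaluates the empirical step function at i/n and (i − 1)/n. Because the distribution parameters are estimated from the same sample, the standard K-S limit is conservative (the Lilliefors variant of the test addresses this).

```python
import math

import numpy as np


def blom_positions(n):
    """Blom (1958) plotting positions (i - 3/8) / (n + 1/4) for i = 1..n."""
    i = np.arange(1, n + 1)
    return (i - 0.375) / (n + 0.25)


def lognormal_cdf(x, mu_ln, sigma_ln):
    """CDF of a lognormal distribution with log-mean mu_ln and log-std sigma_ln."""
    z = (np.log(np.asarray(x, dtype=float)) - mu_ln) / (sigma_ln * math.sqrt(2.0))
    return 0.5 * (1.0 + np.vectorize(math.erf)(z))


def ks_check(beta_values, c_alpha=1.22):
    """Return (D, D_crit): the largest |F_beta - F*_beta| over the sorted sample,
    and the asymptotic K-S critical value c_alpha / sqrt(n) (1.22 for 10%)."""
    x = np.sort(np.asarray(beta_values, dtype=float))
    n = len(x)
    logs = np.log(x)
    # Assumption: lognormal fitted by the sample moments of ln(beta)
    mu, sigma = logs.mean(), logs.std(ddof=1)
    d_max = np.max(np.abs(blom_positions(n) - lognormal_cdf(x, mu, sigma)))
    return float(d_max), c_alpha / math.sqrt(n)
```

The lognormal hypothesis is retained at a given period when D stays below D_crit; for n = 39 events the asymptotic limit is about 0.195.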

Fig. 5

Empirical cumulative distribution of β for T = 2.0 s, along with a fitted lognormal distribution and its K-S 10% significance confidence boundaries

Fig. 6

Maximum absolute difference between the empirical cumulative distribution of β, Fβ, and the fitted lognormal distribution of β, F*β, as a function of period, along with the K-S 10% significance limit for this sample size, Dcrit,10%

4.3 Influence of earthquake magnitude and clustering of site conditions

To evaluate the influence of the event moment magnitude, Mw, the β values were plotted against the magnitudes of their corresponding events for each vibration period. An example for T = 3.0 s is shown in Fig. 7. The Pearson’s empirical correlation coefficient between β and Mw at this period is 0.35, and a slight trend can be observed, with higher magnitudes associated with higher β values. This correlation was found to be higher for periods greater than 1.0 s than for shorter periods, as can be seen in Fig. 8a, which presents the Pearson’s empirical correlation coefficient between β and event moment magnitude as a function of the vibration period. However, despite this relatively notable correlation for periods longer than 1.0 s, the variability of β decreases only slightly (by less than 20%) when the event magnitude is explicitly taken into account in estimating β. This is shown in Fig. 8b, which compares the dispersion of β before and after accounting for the earthquake magnitude through a linear trend between β and Mw. This means that only a small fraction of the high variability of β is explained by changes in Mw. Considering a nonlinear model between β and Mw did not improve these results, as no clear trend, either linear or nonlinear, is observed between β and Mw.
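The magnitude check can be sketched as follows. The text describes a linear trend between β and Mw; this sketch assumes the trend is fitted to ln β, so that the dispersion before and after is measured on the same logarithmic scale, and it ignores the station-count weights. The function name predictor_effect is hypothetical.

```python
import numpy as np


def predictor_effect(beta, x):
    """Pearson correlation between beta and a predictor x (e.g. Mw), and the
    log-dispersion of beta before and after removing a linear trend of
    ln(beta) on x. Unweighted; the trend on ln(beta) is an assumption."""
    beta = np.asarray(beta, dtype=float)
    x = np.asarray(x, dtype=float)
    r = np.corrcoef(beta, x)[0, 1]                # Pearson's empirical coefficient
    ln_b = np.log(beta)
    slope, intercept = np.polyfit(x, ln_b, 1)     # linear trend of ln(beta) on x
    resid = ln_b - (intercept + slope * x)
    sigma_before = ln_b.std(ddof=1)
    sigma_after = resid.std(ddof=2)               # two fitted coefficients
    return r, sigma_before, sigma_after
```

The ratio sigma_after / sigma_before then gives the fractional reduction in dispersion; the same function can be reused with βVS30 in place of Mw for the site-clustering check.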

Fig. 7

Influence of event moment magnitude on β for T = 3.0 s. The corresponding Pearson’s empirical correlation coefficient is 0.35

Fig. 8

a Pearson’s empirical correlation coefficient between β and event moment magnitude as a function of the vibration period. b Comparison of variability of β before and after considering event magnitude, Mw, through a linear regression analysis for β as a function of Mw

On the other hand, Jayaram and Baker (2009) showed a trend between β values and the clustering of soils with similar geological conditions. To evaluate this, they used the spatial correlation of VS30 values as a proxy for the clustering of site conditions. Considering seven earthquakes, they concluded that regions with higher spatial correlations of VS30 present higher spatial correlations between spectral ordinates at short periods of vibration. This is the reason for the two branches for periods shorter than 1.0 s shown in Fig. 4a, where the top branch (higher β values) corresponds to regions with clustering of VS30 and the bottom branch (lower β values) to regions where the soil conditions vary widely. Sokolov et al. (2012) and Sokolov and Wenzel (2013a) drew similar conclusions about the influence of clustering of site conditions in Taiwan and Japan, using the same procedure as Jayaram and Baker (2009). In this study, we follow the same approach for evaluating the influence of clustering of site conditions. As with the spatial correlation of intensity measures, the empirical semivariogram was computed from the VS30 values of the recording stations of each earthquake event. From these semivariograms, empirical spatial correlation coefficients were calculated using Eq. (5), and Eq. (8) was fitted to the resulting data. The resulting β values are termed βVS30 and serve as a proxy for the clustering of site conditions: higher βVS30 values indicate higher spatial correlations of VS30, which are related to the clustering of site conditions. Then, βVS30 was used as a candidate explanatory variable that could partially explain the high variability of β, by conducting a linear regression analysis for β as a function of βVS30. Note, however, that this procedure only takes into account the influence of VS30, and it does not consider other possible geological variables, such as the depth of sediments.

Similarly to the procedure with the magnitude, Fig. 9 shows an example of a scatter plot between β and βVS30 for T = 0.2 s, where the corresponding Pearson’s empirical correlation coefficient is only −0.03, illustrating a negligible influence of βVS30 on β. Pearson’s empirical correlation coefficients computed for every vibration period are presented in Fig. 10a, again showing a very low correlation at all periods. Moreover, Fig. 10b shows that the reduction in the dispersion of β when explicitly considering a regression analysis between β and βVS30 is negligible, indicating that βVS30 (and thus the clustering of VS30) has no significant influence on the spatial correlation of these intensity measures.

Fig. 9

Influence of βVS30 (proxy for clustering of site conditions) on β for T = 0.2 s. The corresponding Pearson’s empirical correlation coefficient is − 0.03

Fig. 10

a Pearson’s empirical correlation coefficient between β and βVS30 (proxy for clustering of site conditions) as a function of the vibration period. b Comparison of variability of β before and after considering clustering of soil conditions through a linear regression analysis for β as a function of βVS30

Finally, a linear regression analysis was conducted for every vibration period with four predictor variables: the event moment magnitude, βVS30, and two earthquake characteristics, namely tectonic region and fault mechanism. Of these, only the event moment magnitude was found to be statistically significant at the 5% significance level, and only for β values at periods greater than 1.0 s. Moreover, the resulting reduction in the variability of β is similar to that presented in Fig. 8b for regressions using only the event moment magnitude as the explanatory variable. This is also consistent with Figs. 7 and 9, where no clear difference between fault mechanisms is observed.

5 Monte Carlo approach for considering the variability of spatial correlation

The previous section showed that the intraevent spatial correlation of intensity measures during a given earthquake is characterized by a high inherent variability. Therefore, rather than considering a single correlation model (derived from either one event or a set of events), regional risk assessments can be improved by explicitly considering this dispersion. In this context, Eq. (8) can be used with β treated as a lognormally distributed random variable. It is proposed that the median and dispersion of β be computed with the following simplified equations, which are also shown in Fig. 11:

Fig. 11

a Computed weighted geometric mean and proposed fitted model. b Computed weighted logarithmic standard deviation and proposed fitted model

$$\hat{\beta }\left( T \right) = \left\{ {\begin{array}{*{20}c} {4.231 \cdot T^{2} - 5.180 \cdot T + 13.392 \quad T < 1.37\, \rm {s}} \\ {0.140 \cdot T^{2} - 2.249 \cdot T + 17.050 \quad T \ge 1.37\, \rm{s}} \\ \end{array} } \right.$$
(9)
$$\sigma_{\ln \beta } \left( T \right) = 4.63 \times 10^{ - 3} \cdot T^{2} + 0.028 \cdot T + 0.713$$
(10)
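Eqs. (9) and (10) translate directly into code. The sketch below also draws lognormal samples of β with NumPy's Generator.lognormal, whose mean and sigma arguments are those of the underlying normal distribution (i.e., the logarithm of the median and σlnβ). β is assumed to be in km, as in Fig. 1, and T in seconds; the function names are hypothetical.

```python
import math

import numpy as np


def beta_median_km(T):
    """Eq. (9): median of beta (assumed in km) for period T in seconds."""
    if T < 1.37:
        return 4.231 * T ** 2 - 5.180 * T + 13.392
    return 0.140 * T ** 2 - 2.249 * T + 17.050


def sigma_ln_beta(T):
    """Eq. (10): logarithmic standard deviation of beta for period T in seconds."""
    return 4.63e-3 * T ** 2 + 0.028 * T + 0.713


def sample_beta(T, n, seed=None):
    """Draw n lognormal realizations of beta(T)."""
    rng = np.random.default_rng(seed)
    # Generator.lognormal takes the mean and sigma of the underlying normal
    return rng.lognormal(mean=math.log(beta_median_km(T)), sigma=sigma_ln_beta(T), size=n)
```

For example, beta_median_km(1.0) ≈ 12.44 km with sigma_ln_beta(1.0) ≈ 0.75, and the two branches of Eq. (9) meet at T = 1.37 s to within about 0.005 km.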

A simple and direct approach for incorporating the variability of the spatial correlation is to perform Monte Carlo simulations in which the parameters of the spatial correlation model are treated as random variables. In most cases, different intensity measures (at different vibration periods) must be simulated at many sites. Therefore, taking advantage of the Markov approximation to reduce the number of correlated variables that must be simulated, the following simulation sequence can be adopted for a given scenario earthquake (note that, at each Monte Carlo simulation, this scenario earthquake can either stay fixed, for estimating ground motion intensity measures for that particular event, or vary, for an event-based probabilistic seismic hazard analysis):

1. Define the set of locations (subscript j = 1, 2, …, J) and periods (subscript i = 1, 2, …, I) at which ground motion intensity measures will be simulated.
2. Obtain the maximum period of vibration to be simulated, Tmax = max(Ti).
3. Choose a GMPE to be used for simulating the ground motion intensity measures.
4. Use Eqs. (9) and (10) with T = Tmax to compute the median and dispersion of β(Tmax).
5. At the k-th simulation, obtain a realization of β(Tmax) from a lognormal distribution with the median and dispersion computed in step 4.
6. Assemble the total spatial correlation model with Eqs. (4) and (8).
7. Obtain a realization of the residuals for Tmax, ηk(Tmax) + εjk(Tmax), at every location j = 1, 2, …, J from a multivariate normal distribution with zero mean, total standard deviations computed from Eq. (2), and spatial correlation given by the model obtained in step 6.
8. At each location j, obtain a realization of the residuals for the remaining periods, ηk(Ti) + εjk(Ti), conditioned on the value of the residual ηk(Tmax) + εjk(Tmax) computed in step 7 (this is where the Markov approximation is used). The conditional distributions of the residuals for the remaining periods, Ti, can be computed using a correlation model ρ0(Ti,Tmax), such as those proposed by Inoue and Cornell (1990), Baker and Cornell (2006), Abrahamson and Silva (2007), Baker and Jayaram (2008), or Abrahamson et al. (2013).
9. Compute f(Ti, Mk, Rjk, θjk) for every location j and each period i, and finally obtain the ground motion intensity measures using Eq. (1).
10. Repeat steps 5 through 9 for the total number of simulations.
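Steps 5 to 8 of this sequence can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not a reference implementation: site coordinates are assumed to be projected in km, the GMPE standard deviations and the cross-period correlations ρ0(Ti, Tmax) are supplied by the caller (the values in any example are hypothetical), the spatially correlated draw uses a Cholesky factor with a small diagonal jitter, and step 9 (adding the GMPE mean f and exponentiating) is left out because it depends on the chosen GMPE.

```python
import numpy as np


def simulate_residuals(xy_km, sig_eta, sig_eps, rho0_to_tmax, i_max,
                       beta_median, sigma_ln_beta, n_sims, seed=0):
    """Sketch of steps 5-8: residuals eta + eps with shape (n_sims, J, I).

    xy_km          : (J, 2) projected site coordinates in km (assumption)
    sig_eta        : (I,) interevent log standard deviations from the GMPE
    sig_eps        : (I,) intraevent log standard deviations from the GMPE
    rho0_to_tmax   : (I,) cross-period correlations rho_0(T_i, T_max)
    i_max          : index of T_max in the period arrays
    beta_median, sigma_ln_beta : Eqs. (9) and (10) evaluated at T_max (step 4)
    """
    rng = np.random.default_rng(seed)
    xy_km = np.asarray(xy_km, dtype=float)
    sig_eta = np.asarray(sig_eta, dtype=float)
    sig_eps = np.asarray(sig_eps, dtype=float)
    rho0 = np.asarray(rho0_to_tmax, dtype=float)
    sig_t = np.sqrt(sig_eta ** 2 + sig_eps ** 2)              # Eq. (2)
    dist = np.linalg.norm(xy_km[:, None, :] - xy_km[None, :, :], axis=-1)
    n_sites, n_per = dist.shape[0], sig_t.size
    out = np.empty((n_sims, n_sites, n_per))

    for k in range(n_sims):
        # Step 5: one lognormal realization of beta(T_max) per simulation
        beta = rng.lognormal(np.log(beta_median), sigma_ln_beta)
        # Step 6: Eq. (4) with Ti = Tj = Tmax (rho_0 = 1) combined with Eq. (8)
        rho_eps = np.exp(-(dist / beta) ** 0.55)
        cov = sig_eta[i_max] ** 2 + rho_eps * sig_eps[i_max] ** 2
        # Step 7: spatially correlated draw via Cholesky (small jitter added)
        chol = np.linalg.cholesky(cov + 1e-10 * np.eye(n_sites))
        r_max = chol @ rng.standard_normal(n_sites)
        # Step 8: conditional normal at each site for the other periods
        for i in range(n_per):
            if i == i_max:
                out[k, :, i] = r_max
                continue
            mean = rho0[i] * sig_t[i] / sig_t[i_max] * r_max
            sd = sig_t[i] * np.sqrt(1.0 - rho0[i] ** 2)
            out[k, :, i] = mean + sd * rng.standard_normal(n_sites)
    return out
```

The conditional mean and standard deviation in step 8 preserve the marginal total standard deviation σT(Ti) at every period, so each simulated residual can be added directly to the GMPE mean f(Ti, Mk, Rjk, θjk).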

This procedure can also be used with efficient sampling schemes, such as importance sampling (Rubinstein 1981), since standard Monte Carlo simulation is not computationally efficient for estimating low-probability, high-consequence risks (Au and Beck 2003). Moreover, this simulation sequence can also be applied with intraevent spatial correlation models other than the one presented in Eq. (8), by incorporating variability into their model parameters, as was done with β in this study. However, the dispersion values must then be estimated with an approach similar to the one presented in this paper.

6 Conclusions

This study quantitatively evaluated the event-to-event variability of the intraevent spatial correlation of common ground motion intensity measures (PGA and spectral acceleration ordinates). For this purpose, 39 worldwide seismic events, each recorded by more than 100 stations, were considered, for a total of 15,940 ground motion records. An exponential model of separation distance with a single parameter, β, was fitted to every event independently. A probabilistic assessment of this model parameter showed that it follows a lognormal distribution and that its logarithmic standard deviation around its central tendency can be as high as 1.2. Different linear regression analyses were performed; although the event moment magnitude was found to be statistically significant as a predictor variable at long periods, it explains less than 20% of the total variability of β. Moreover, the spatial correlation of VS30 values (as a proxy for clustering of site geological conditions; the depth of sediments was not considered), the tectonic region, and the fault mechanism of the earthquakes were found not to be statistically significant at the 5% significance level.

Finally, this paper has presented a simple and direct Monte Carlo simulation approach for explicitly considering the event-to-event variability of the spatial correlation model when performing regional seismic hazard and risk analyses. The proposed sequence of simulation takes advantage of the Markov dependence of residuals for reducing the total number of correlated variables to be simulated, therefore greatly decreasing the computational effort involved. Explicit consideration of the event-to-event variability of the spatial correlation model will provide improved results when conducting regional hazard and risk assessments.