Statistical modeling of ground motion relations for seismic hazard analysis

Raschke, Mathias

doi:10.1007/s10950-013-9386-z

Statistical modeling of ground motion relations for seismic hazard analysis

Original Article
Published: 13 September 2013

Volume 17, pages 1157–1182, (2013)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Seismology Aims and scope Submit manuscript

Statistical modeling of ground motion relations for seismic hazard analysis

Download PDF

Mathias Raschke¹

404 Accesses
17 Citations
Explore all metrics

Abstract

We introduce a new approach for ground motion relations (GMR) in the probabilistic seismic hazard analysis (PSHA), being influenced by the extreme value theory of mathematical statistics. Therein, we understand a GMR as a random function. We derive mathematically the principle of area equivalence, wherein two alternative GMRs have an equivalent influence on the hazard if these GMRs have equivalent area functions. This includes local biases. An interpretation of the difference between these GMRs (an actual and a modeled one) as a random component leads to a general overestimation of residual variance and hazard. Beside this, we discuss important aspects of classical approaches and discover discrepancies with the state of the art of stochastics and statistics (model selection and significance, test of distribution assumptions, extreme value statistics). We criticize especially the assumption of logarithmic normally distributed residuals of maxima like the peak ground acceleration (PGA). The natural distribution of its individual random component (equivalent to exp(ε ₀) of Joyner and Boore, Bull Seism Soc Am 83(2):469–487, 1993) is the generalized extreme value. We show by numerical researches that the actual distribution can be hidden and a wrong distribution assumption can influence the PSHA negatively as the negligence of area equivalence does. Finally, we suggest an estimation concept for GMRs of PSHA with a regression-free variance estimation of the individual random component. We demonstrate the advantages of event-specific GMRs by analyzing data sets from the PEER strong motion database and estimate event-specific GMRs. Therein, the majority of the best models base on an anisotropic point source approach. The residual variance of logarithmized PGA is significantly smaller than in previous models. We validate the estimations for the event with the largest sample by empirical area functions, which indicate the appropriate modeling of the GMR by an anisotropic point source model. The constructed distances like the Joyner–Boore distance do not work well for event-specific GMRs. We discover also a strong relation between magnitude and the squared expectation of the PGAs being integrated in the geo-space for the event-specific GMRs. One of our secondary contributions is the simple modeling of anisotropy for a point source model.

Uncertainty in the estimates of peak ground acceleration in seismic hazard analysis

Article 19 November 2015

Overview and introduction to development of non-ergodic earthquake ground-motion models

Article Open access 17 August 2022

A novel model and estimation method for the individual random component of earthquake ground-motion relations

Article 16 June 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

The level of local seismic impact is estimated for modern building codes and the earthquake-resistant design of industrial facilities by probabilistic seismic hazard analysis (PSHA) as a part of seismology and earthquake engineering. Therein, the average annual exceedance frequency of local earthquake ground motion intensity is estimated. An important element of PSHA is the ground motion relation (GMR), which describes the relation between the local ground motion intensity and different event parameters such as the magnitude (Bommer and Abrahamson 2006). It is also called ground motion prediction equation. We prefer the term GMR of Atkinson (2006) because we research an appropriate relation for the PSHA that needs not to be the best prediction and its residual variance for a single event. The previous GMRs are mostly modeled by a statistical regression analysis (Strasser et al. 2009) wherein the event parameters are predictors. Douglas (2001, 2002) provides a good overview of GMRs being published before 2002, and Douglas (2003) gives an excellent overview of all aspects of GMRs like estimation methods or source models. Therein, the physical unit of local ground motion intensity is the peak ground acceleration (PGA) or the maximum of another type of local time history. The parameters of current GMRs are fixed, not event-specific, including the depth parameter. The conditional probability distribution of the local ground motion intensity is generally modeled by the logarithmic normal (log-normal) distribution in the GMR, which implies a normal distribution for the logarithmized ground motion intensity (Joyner and Boore 1993; Strasser et al. 2009). This approach results in unrealistically high estimations of ground motion intensities for low exceedance frequency (Stepp et al. 2001; Abrahamson et al. 2002; Bommer and Abrahamson 2006), which has not been improved by the next generation of GMR (NGA; Abrahamson et al. 2008). Beside this, truncation of the log-normal distribution was suggested to avoid overestimations, but choosing the truncation point is difficult according to Strasser et al. (2008). Therein, statistical estimation methods for truncation points (Raschke 2011) have not been considered. We generally note a lack of consideration of current knowledge of stochastics and statistics in the research of GMR. For example, it is known for a long time that the statistical significance of regression models of GMR should be validated (Joyner and Boore 1981), but many NGAs are not validated in this sense (see Table 2). Beside this, at least the individual random component (ε ₀ of Joyner and Boore 1993) of the PGA should follow an extreme value distribution according to the extreme value statistics (Leadbetter at al. 1983; Coles 2001). Dupuis and Flemming (2006) have introduced the concept of extreme value statistics into GMR, but their paper was not considered any further. In the following section on regression models for GMRs, we criticize important statistical aspects of previous GMR and briefly call arguments for the extreme value distribution of the individual random component in Section 3. However, our break with the traditional approaches to GMRs is deeper; in Section 4, we mathematically derive the area equivalence for GMRs in PSHA inspired by equivalences in max-stable random fields (Schlather 2002; Kabluchko et al. 2009). Therein, GMRs are random functions, which include event-specific GMRs and distinction between GMRs for an actual prediction and GMRs for the PSHA. We also introduce an approach to an anisotropic point source model in this section. In Section 5, we numerically research the detectability of the distribution model and the influence of this and other items like the area equivalence on PSHA. Then, we suggest an estimation concept for our approach to GMR in Section 6, including a regression-free estimation of the variance of the individual random component. We partly apply this concept to nine suitable data sets and research the link between the event-specific GMR and the magnitude. Finally, we conclude our results in the last section. We follow here the rules of statistics, use its terms (see Upton and Cook 2008).

2 Regression model for GMR

2.1 Basic formulation

The GMR is usually formulated by a regression model with the basic formulation (Lindsey 1996; Rawlings et al. 1998; Montgomery et al. 2006)

$$ Y=g\left(\mathbf{X}\right)+{\varepsilon}^{*},E(Y)=g\left(\mathbf{X}\right),E\left({\varepsilon}^{*}\right)=0,V(Y)=V\left({\varepsilon}^{*}\right), $$

(1)

Y is the predicted variable (response variable, dependent variable, conditional variable, or regressand). The regression function g(X) includes a parameter vector θ, which is estimated. The predictors (independent variables, predicting variables, or regressors) are the elements of the random vector X = (X ₁, X _2,.., X _m). E(.) are the expectations and V(.) are the variances. The random variable ε ^* is the random component (residual, random term, or measurement error) and determines the cumulative distribution function (CDF) F _y of Y under condition of X. If g(X) ≥ 0 and V(ε ^*) is proportional to g ²(X), then we write the equivalent formulation

$$ Y=\varepsilon\;g\left(\mathbf{X}\right),\varepsilon =1+{\varepsilon}^{*},E(Y)=g\left(\mathbf{X}\right),E\left(\varepsilon \right)=1,V(Y)=V\left(\varepsilon \right){g}^2\left(\mathbf{X}\right). $$

(2)

We prefer this formulation for GMRs, wherein Y is the PGA or something similar, because the expectation is a very important characterization of a random variable and ε can be neglected under certain conditions (Section 4.1). If Y ≥ 0, then we can logarithm and formulate the popular model for GMRs (Douglas 2001, Abrahamson et al. 2008)

$$ \ln (Y)={g}^{*}\left(\mathbf{X}\right)+\xi, \kern0.5em E\left( \ln (Y)\right)={g}^{*}\left(\mathbf{X}\right),E\left(\xi \right)=0,V\left( \ln (Y)\right)=V\left(\xi \right). $$

(3)

It is assumed for most GMRs for PSHA that ξ is normally distributed (Joyner and Boore 1993; Strasser et al. 2009). This implies a model according to Eq. (2) with log-normally distributed ε. The link between Eqs. (2 and 3) is (Johnson et al. 1994, Eq. (14.8))

$$ E(Y)=g\left(\mathbf{X}\right)= \exp \left({g}^{*}\left(\mathbf{X}\right)+V\left(\xi \right)/2\right)\mathrm{and} $$

(4a)

$$ V(Y)= \exp \left(2{g}^{*}\left(\mathbf{X}\right)\right) \exp \left(V\left(\xi \right)\right)\left( \exp \left(V\left(\xi \right)\right)-1\right), $$

(4b)

$$ \varepsilon = \exp \left(\xi \right)- \exp \left(V\left(\xi \right)/2\right). $$

(4c)

We apply Eqs. (3 and 4) simultaneously even if ε is not exactly log-normally distributed (Johnson et al. 1994, Eq. (12.67) with ξ _Johnson ≈ 0). A typical formulation for a GMR is (Douglas 2002)

$$ {g}^{*}\left(\mathbf{X}\right)={\theta}_0+{\theta}_1m-{\theta}_2r-{\theta}_3 \ln (r)+{\theta}_{\mathrm{s}}{x}_{\mathrm{s}}+\dots, r=\sqrt{d^2+{h}^2},{\theta}_1>0,{\theta}_2\ge 0,{\theta}_3\ge 0 $$

(5)

with predicting variables magnitude m, source distance d (and r) and indicator variable x _s and its parameter θ _s for the site condition. The source depth is considered by h and can be a parameter or an event-specific predictor. There are many variants and extensions for g ^*(X) (Douglas 2002; Abrahamson et al. 2008).

2.2 Random components, estimation methods, and errors

The random components ε resp. ξ are independently and identically distributed (iid) variables, and the predictors are measured exactly in simple regression models. For such cases, the least squared (LS) estimation can be applied, which is equivalent to the maximum-likelihood estimation for normally distributed residuals (see Rawlings et al. 1998, p. 77). This is not popular in seismology, e.g., Castellaro et al. (2006) incorrectly claim that the residuals have to be normally distributed for the LS regression. The LS method has often been used for GMRs and is extended to random components that are not iid. Douglas (2003, Section 11) gives an overview of approaches from before 2003. The two most important approaches seem to be the one and two stage regression method with the following random components (Joyner and Boore 1993; with assumption of normal distribution)

$$ \xi ={\xi}_{\mathrm{E}}+{\xi}_{\mathrm{S}}+{\xi}_0,E\left({\xi}_{\mathrm{E}}\right)=E\left({\xi}_{\mathrm{S}}\right)=E\left({\xi}_0\right)=0, $$

(6)

wherein ξ _E is event-specific, ξ _S is site-specific, and ξ ₀ has an individual realization for each site (station) and event. We prefer the product formulation according to Eq. (2) with

$$ \varepsilon ={\varepsilon}_{\mathrm{E}}{\varepsilon}_{\mathrm{S}}{\varepsilon}_0{\varepsilon}_{\mathrm{Q}},E\left(\varepsilon \right)=E\left({\varepsilon}_{\mathrm{E}}\right)=E\left({\varepsilon}_{\mathrm{S}}\right)=E\left({\varepsilon}_0\right)=E\left({\varepsilon}_{\mathrm{Q}}\right)=1,\kern0.5em {\varepsilon}_{\mathrm{Q}}={g}_{\mathrm{actual}}\left(\mathbf{X}\right)/{g}_{\mathrm{eqivalent}}\left(\mathbf{X}\right), $$

(7)

wherein the additional pseudo-random component ε _Q (resp. ξ _Q) results from the ratio between actual and equivalent function g(X) according to Section 4. A general distribution assumption is not required, but it is obvious that ε _Q has a finite upper bound and a lower bound larger than 0 for a fixed distance d. That is one reason why ε _Q cannot be log-normally distributed.

In other GMRs, the component ε _S resp. ξ _s had been replaced by site-specific predictors x _s in Eq. (5). But there is no proof that one additional predictor can completely replace ε _S resp. ξ _S and we doubt this because site response is very complex. Independent of this, one condition of the regression models of Joyner and Boore (1993) is that predictors m and r do not include a measurement error. However, magnitudes are not measured exactly. Rhoades (1997) considers the known variance of the seismological magnitude estimation in his regression analysis for GMR. It is not considered that this known error needs not to be the only one. The actual error of the seismological estimation can be higher according to Giardini (1984). Arguments for this: The source mechanism influences the ground motion (Campbell 1981, 1993; Crouse and McGuire 1996; Sadigh et al. 1997). This acts like a measurement error of magnitudes in the GMR. An application of fewer classes of source mechanism would reduce but not eliminate it. Furthermore, the inter-event variability ξ _E can be interpreted as an error in magnitudes because the seismological magnitudes can be exact for a certain aspect of the rupture process but do not need to be exact for the GMR. The actual magnitude of GMR could be a non-measurable, latent variable, which is estimated by common magnitudes with an error in the sense of statistical error-in models (Cheng and van Ness 1999, Section 1.1). The considerable differences between the estimated residual variances of GMRs for one sample of PGAs but for different magnitude scales (see, e.g., Atkinson and Boore 1995, Table 5) support this assumption. Additionally, the magnitudes of the analyzed sample could be from different scales (e.g., Bommer et al. 2007), which acts like a measurement error.

The source-to-site distance d is also treated as exactly measured predictor. But it should include an error because there are many definitions for this distance (Douglas 2003, Section 9). How could it be possible that all these measures for the same physical aspect act without a measurement error? Moreover, the distances are determined by the seismological source estimation, which also includes errors. Even if parameters of this error would be known, it would be difficult to consider it in a regression analysis [personal communication with (pcw) Douglas, spring 2013]. Beside this, the influence of the source depth is often reduced to a fixed parameter for a defined class of earthquakes (e.g., shallow events). In other GMRs (Ambraseys and Bommer 1991), h is the seismological epicenter depth. But neither is the influence of the source depth the same for every earthquake nor is the seismological depth exactly measured. In both cases, a kind of measurement error is neglected. Furthermore, it is assumed for current GMR that the parameter vector θ of the GMR is the same for each event (Joyner and Boore 1993; Abrahamson et al. 2008).

There are more estimation methods for a regression model (e.g., Rawlings et al. 1998, Section 10; Stromeyer et al. 2004). The models for unknown measurement errors of predictors (Cheng and van Ness 1999, Section 4) are not applied for GMR as far as we know. Beside this, the aspect of estimating the estimation errors of the regression parameters is not considered in all approaches. These standard errors can be easily estimated for a simple linear LS regression with iid random components (Rawlings et al. 1998, Section 4.6). But it is more difficult for models with random effects. Joyner and Boore (1993) applied the Monte Carlo simulation to estimate the estimation error; Rhoades (1997) has computed these standard errors using the likelihood function. Chen and Tsai (2002) also give a method to estimate the standard error. But Abrahamson and Young (1992) do not give any advice for this issue regarding their procedure. We draw attention here to the fact that an estimation error can be computed by the Jackknife technique (Quenouille 1956; Efron 1979). This also applies for clustered data according to Raschke (2012, 2013), as is the case for the mixed effects. The estimation error can be applied directly to construct the confidence range and verify the statistical significance of a predictor and its parameter.

2.3 The danger of over-parameterization

We could explain the entire variance of a predicted variable Y or ln(Y) by a regression model if we use a large number of predictors and related parameters, although not all predictors have an actual influence (Rawlings et al. 1998, Fig. 8.2). The question is: how can we distinguish between significant and insignificant predictors and/or parameters? Different statistical tools can solve this problem. The first one is the significance test for the regression parameters θ _I in g(X) resp. g ^*(X) = … + θ _i X _i. We test here if θ _i ≠ 0, θ _i ≤ 0, or θ _i ≥ 0 for a defined significance level α (5 % is often used and recommended here). The last two variants are applied when physical reason bounds the influence of a predictor, e.g., a larger magnitude should be related to a larger PGA. In this case, we can be sure with a probability of 100 % − α that the actual parameter θ _I does not have a contrary sign. The smaller α is, the more rigorous is the test. The t test is such a test (Rawlings et al. 1998, Sections 1.6 and 5.3), which has seldom been applied for GMR, e.g., by Joyner and Boore (1981), Molas and Yamazaki (1995), and Ambraseys et al. (2005, pcw Douglas March 2013). Note that the classical t test cannot be applied without modification or acceptance of inaccuracies to the case of mixed effects (clustered data). The significance of a published GMR is also examined implicitly by published standard errors of the parameter estimation. If the related quantile, corresponding with α, is not smaller/larger than 0, then it implies statistical significance. This is the case for the estimations of Joyner and Boore (1993, Table 3) and Rhoades (1997, Table 1). Applying model selection criterions in the model building (Rawlings et al. 1998, Section 7) is a further possibility for guaranteeing the statistical significance, e.g., the Akaike information criterion (AIC) or the Bayesian information criterion.

The significance has to be verified for each statistical model. Otherwise, danger of over-parameterization arises. This problem applies to a considerable amount of GMRs; we list 15 examples in Table 1. This is also an issue in other researches (Raschke and Thürmer 2008).

Table 1 Examples of GMRs without sufficient validation of significance/model selection

Full size table

2.4 The test of the distribution assumption

Any statistical distribution model should be validated (D’Augustino and Stephens 1986). This also applies to the residual distribution of a GMR in PSHA although a distribution assumption is not necessary for the LS regression. A powerful goodness-of-fit test is the best method of examining the distribution assumption, as the Anderson–Darling (AD) test for a normal distribution (Landry and Lepage 1992). Contrary to the aforementioned t test, the test is the more rigorous the larger the selected significance α is. There are such tests for different distribution functions with estimated parameters (Stephens 1986). If all parameters are known, then the distribution is fully specified and the classical Kolmogorov–Smirnov (KS) test can be applied. If the KS test is applied to estimated parameters, then the test does not work (Raschke 2009). If there is not an applicable goodness-of-fit test for the distribution type used, then a quantile plot (q–q plot) can be used for a visual, qualitative test as done by Dupuis and Flemming (2006) for residuals with a mixed, non-normal distribution. However, there is no objective criterion for rejecting the distribution hypothesis in this case. A histogram is a kind of parameter-free distribution model, but it is not a tool for validating a distribution model (not mentioned by D’Augustino and Stephens 1986) because there is no objective criterion for rejection and there are many possible histograms for a sample. We state that the assumption of normally distributed ξ resp. its components in Eq. (3) is often not correctly validated for GMRs. For example, Ambraseys and Bommer (1991), Ambraseys and Simpson (1996), Ambraseys et al. (1996), Atkinson and Boore (1995), Spudich et al. (1999), Douglas and Smit (2001), Atkinson (2004), and Kalkan and Gülkan (2004) have neither assumed nor tested a distribution model. Beside this, the assumed normal distribution of ξ has been tested by the inappropriate KS test in other studies (e.g., McGuire 1977; Campbell 1981; Abrahamson 1988; Monguilner et al. 2000; Restrepo-Velez and Bommer 2003). The quantile plot (e.g., Chang et al. 2001; Bommer et al. 2004) and the histogram (e.g., Atkinson 2006; Beyer and Bommer 2007; Morikawa et al. 2008) have also been applied to validate the normal distribution although these methods are not appropriate. Note that even though ξ seems to be normally distributed, it needs not to be (see Section 5). The inappropriate test of a distribution assumption is also a problem in other researches, e.g., of flood hazard in Germany (Raschke and Thürmer 2008).

3 The distribution of the maximum of a random sequence

The popular assumption for GMR for PGAs that all random components ε _… are log-normally distributed (ξ _… normally distributed, Joyner and Boore 1993; Strasser et al. 2009) is in contradiction to the extreme value theory. According to this special field of stochastics and statistics, the maximum of a sample Y = Max{Z ₁,Z ₂,..,Z _i,…,Z _n} of iid random variables has a generalized extreme value distribution in most cases (GED, Fisher and Tippett 1928; Gnedenko 1943; Beirlant et al. 2004; de Haan and Ferreira 2006). The maximum of a sequence of non-iid random variables also has a GED under some weak conditions (Leadbetter et al. 1983, Parts II and III; Falk et al. 2011, Part III). Its CDF is written with the extreme value index γ (shape parameter), scale parameter σ, and location parameter μ

$$ G(y)= \exp \left(-{\left(1+\gamma \left(y-\mu \right)/\sigma \right)}^{\raisebox{1ex}{$-1$}\!\left/ \!\raisebox{-1ex}{$\gamma $}\right.}\right),\gamma \ne 0,\kern0.62em 1+\gamma \left(y-\mu \right)/\sigma >0, $$

(8a)

$$ G(y)= \exp \left(- \exp \left(-\left(y-\mu \right)/\sigma \right)\right),\gamma =0, $$

(8b)

with the Fréchet domain for γ > 0, the Weibull domain for γ < 0, and the Gumbel domain for γ = 0. The latter one is also called the Gumbel distribution. A fundamental property of the GED is max stability. This means that the maximum max{Y ₁ , Y ₂} of extreme value distributed variable Y is again extreme value distributed with the same extreme value index γ. If Y would be log-normally distributed, then max{Y ₁,Y ₂} would not be log-normally distributed. Neither the normal nor the log-normal distributions are max-stable. This problem is typical for the combination of the two horizontal components (Y ₁, Y ₂) of the earthquake record, it could be defined by a maximum max{Y ₁, Y ₂} (Douglas 2003, Section 6). All combinations with a maximum definition of Douglas (2003, Section 6, #2, 4, 5) result in a classical extreme value for which the GED is the natural distribution. The log-normal assumption for random component ε ₀ is wrong in this case according to the state of the art of stochastics and statistics. We consider here only maxima. In case of other combinations of the horizontal components, it is also unlikely that random component ε ₀ becomes log-normally distributed. We have investigated the distribution of the combination arithmetic mean, geometric mean, and vectorial addition of Gumbel distributed components ε ₀₁ and ε ₀₂ numerically (Appendix 2). The log-normal assumptions are rejected.

The argument of missing max stability of the log-normal assumption also applies for sub-sections of the ground motion time history. If the sub-maxima of not overlapping sub-sections of the time history are log-normally distributed, then the maximum of the entire time history cannot be log-normally distributed (exception: all sub-maxima are identical). Log-normal assumptions would also contradict all our experiences with extreme values (Hüsler et al. 2011, Raschke 2011, 2012, 2013).

We briefly investigate the possible domain of attraction for PGAs and analyze the tail of three acceleration time histories (Fig. 1). The tails are exponentially distributed, which indicates the Gumbel domain of attraction for the maxima of the accelerations (see Coles 2001, Section 4). Besides, Dupuis and Flemming (2006) have estimated a GMR with GED for the residuals of PGA with extreme value index γ ≈ 0, which also indicates the Gumbel domain.

4 GMR in the PSHA as random function in geo-space

4.1 GMRs as random functions in space

A random function is a function randomly selected from a set of functions (population). Schlather (2002, Theorem 1 and its proof; we use different symbols) applies a measurable random function W(s − t) ≥ 0 to construct a max-stable process. This max-stable process max(Y(s _i)) refers to the maxima at site s from all point events i with local event intensity Y(s _i) = m _oi W(s _i − t). Therein, t is a source allocation resp. the source point in the sense of PSHA (not necessary a point source), being part of a homogeneous Poisson process in the geo-space and at a moment scale m _o ≥ 0 with density m _o ⁻². Its (annual) maximum max(Y(s _i)) has the CDF G(y) = exp(−λ(y)) with annual exceedance function (AEF) λ(y) of annual average frequency of exceedance Y(s) ≥ y according to the limit law for extremes (Coles 2001, Section 7.3). This distribution is equivalent for different sets of random functions if their expectations E(V _o) of the volume V _o = ∫ _s W(s − t)ds are equal. This construction can be interpreted as homogeneous seismicity with m _o W(s − t) = g(X). The spatial distance s − t is part of the distance elements of the predicting vector X beside the event parameters in sub-vector X _E of X. It is scaled by the earthquake magnitude with m = ln(m _o) without influence on the shape of W(s − t) resp. g(X). The magnitude is exponentially distributed without an upper bound. According to Theorem 1 of Schlather, the AEF λ(y) is equal for different GMRs if the expectation E(V _o) is equal for any fixed event parameters like magnitude m and h. Furthermore, we can extend this equivalence to m _o W(s − t) = ε(s)g(X) according to Theorem 2 of Schlather (2002; pcw Kabluchko 2012) if the random component ε has the expectation E(ε) = 1. This includes that for an event is ∫ _s g(X)d s ≈ ∫ _s ε(s)g(X)d s if the variance and the spatial correlation of ε(s) is relative small. In all cases, we can interpret a GMR as a random function in space being an element of a set (population). Different sets of GMR act equivalent if the expectations E(V _o) are equivalent. Note that the variance and the distribution type of ε have no influence on λ(y). This could be the reason why Cornel (1968) did not explicitly consider a random component in his GMR for the PSHA with exponentially distributed magnitudes without upper bound—he did not need and could have find this out by numerical researches. Non-mathematicians can check all results in the same way.

This equivalence of GMRs works only for exponentially distributed magnitudes without upper bound. We introduce the area function K(y) to derive a general equivalence of GMRs being independent of the magnitude distribution. For this purpose, we consider at first GMR g(X) in a simple one-dimensional geo-space as shown in Fig. 2b. We use an example with two maxima of g(X) to demonstrate the general application of this equivalence. For fixed event parameters and a fixed value y, the GMR covers a certain area of all points with y ≤ g(X). The area function K(y) is for homogeneous site conditions

$$ \begin{array}{lllll}K(y)={\displaystyle \int \mathbf{1}}\left(g\left(\mathbf{X}\right)\right)d\mathbf{t},\hfill & \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=1\hfill & \mathrm{for}\hfill & y\le g\left(\mathbf{X}\right),\hfill & \mathrm{otherwise}\kern0.62em \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=0.\hfill \end{array} $$

(9)

The first derivation is the related area density measuring the amount of points with y = g(X)

$$ k(y)=- dK(y)/ dy. $$

(10)

The area function is defined according to Fig. 2a, b for a fixed source point t. But we could also fix s and draw g(X) at t although it acts at site s. We reflect the GMR in this way for 1D geo-space, as shown in Fig. 2c. We have an equivalent formulation for K(y) with

$$ \begin{array}{lllll}K(y)={\displaystyle \int \mathbf{1}\left(g\left(\mathbf{X}\right)\right)}d\mathbf{s},\hfill & \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=1\hfill & \mathrm{for}\hfill & y\le g\left(\mathbf{X}\right),\hfill & \mathrm{otherwise}\kern0.62em \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=0.\hfill \end{array} $$

(11)

The reflection becomes complicated for the two-dimensional geo-space, but Eqs. (9–11) still apply. We can illustrate the reflection for an isoline with fixed y = g(X), as shown in Fig. 2d for an anisotropic GMR for a point source and an isotropic GMR for a line source in Fig. 2e.

Now, let us assume the case of homogeneous seismicity: Each point t in the geo-space represents a source allocation with equivalent occurrence intensity ν, equivalent g(X) with X = (s,t,X _E) with event parameter X _E = (m,h,x _i) and its multivariate probability density function f _E. The distances d resp. r of the GMR are determined by s, t, h and the source model. We formulate for the AEF λ(y) of annual average frequency of exceedance Y(s) > y

$$ \lambda (y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_{\mathbf{t}}{f}_{\mathrm{E}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=g\left(\mathbf{X}\right.\right),V\left(Y\left(\mathbf{s}\right)\right)={g}^2\left(\mathbf{X}\right)V\left(\varepsilon \left)\right)\right.\right)}}d\mathbf{t}d{\mathbf{X}}_{\mathrm{E}},\kern0.5em \mathbf{X}=\left({\mathbf{X}}_{\mathrm{E}},\mathbf{s},\mathbf{t}\right). $$

(12)

Therein, the CDF F _y is parameterized by its expectation E(Y(s)) and variance V(Y(s)) according to Eq. (2). V(ε) can be influenced by X _E but does not include ε _Q. Equation (12) is oriented on the absolute probability integral of McGuire 1995, but there are many equivalent formulations. One includes a replacement of the integration on t by the area density k(y) and the integration on y = g(X) with

$$ \lambda (y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_zk\left(z\Big|{\mathbf{X}}_{\mathrm{E}}\right){f}_{\mathrm{E}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=z,V\left(Y\left(\mathbf{s}\right)\right)={z}^2V\left(\varepsilon \right)\right)\right)}} dzd{\mathbf{X}}_{\mathrm{E}}, $$

(13)

because the integration in the geo-space in Eq. (12) is nothing else than a computation of the amount of points with y = g(X) in the sense of measure theory (Billingsley 1995, Chapter 2). Now, it is obvious that two GMRs g ₁(X) ≠ g ₂(X) result in equivalent hazard with λ ₁(y) = λ ₂(y) if the area density is equal with k ₁(y) = k ₂(y) resp. K ₁(y) = K ₂(y). Note, all other components in Eqs. (12 and 13) are equal, including the parameterization of CDF F _y by V(ε), which does not include ε _Q. The equivalence of λ(y) of Eqs. (12 and 14) applies only to one site s with homogeneous seismicity in its surrounding. We introduce now an expansion of this equivalence to the influence function λ ^*(y). This function describes the influence of any fixed source point t to the seismic hazard of all sites s with homogenized site conditions

$$ {\lambda}^{*}(y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_{\mathrm{s}}{f}_{{}_{\mathrm{E}}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=g\left(\mathbf{X}\right.\right),V\left(Y\left(\mathbf{s}\right)\right)={g}^2\left(\mathbf{X}\right)V\left(\varepsilon \left)\right)\right.\right)}}d\mathbf{s}d{\mathbf{X}}_{\mathrm{E}},\kern0.5em \mathbf{X}=\left({\mathbf{X}}_{\mathrm{E}},\mathbf{s},\mathbf{t}\right). $$

(14)

This integral is basically equivalent to the integral of Eq. (12), and the principle of area equivalence also applies. Area-equivalent GMRs result in equal influence functions. Therein, homogeneity of seismicity is not required for Eq. (14); f _E and ν can be source point specific. Time dependence is also possible.

What is the consequence? For an actual and area-equivalent GMR is g _actual(X) ≠ g _equivalent(X) for almost all X, what includes a local bias. An example is given in Fig. 3b. But ε _Q = g _actual(X)/g _equivalent(X) of Eq. (7) is not an actual random component and is not considered in Eqs. (12–14). If ε _Q resp. ξ _Q are interpreted as an actual random component, and the estimated residual variance from the regression analysis for the GMR is directly applied to the GMR in PSHA, then we overestimate the entire random component (residual) ε resp. ξ and the variances V(Y(s)) and by this the influence function λ ^*(y) resp. the entire influence of each source point t on the AEFs λ(y) of all sites. The hazard estimation of all sites is systematically overestimated in this way. This does not exclude the possibility of local underestimation of λ(y) according to Fig. 3b. The only exception of the systematic overestimation is the case if all random components ε _Q, ε ₀, and ε _E have no influence (see above).

Non-mathematicians can numerically check these results. Beside this, we state that GMRs could be random functions because there is no proof that all events need to have equal parameters θ in Eqs. (2–5).

4.2 A model of anisotropic GMR for a point source

An anisotropic GMR can be simply formulated for a point source model according to the intercept theory (Fig. 3a) by a unit-isoline which includes area π equal to the unit circle of angle functions. The radius function d _unit(φ) determines the unit-isoline with azimuth φ of local polar coordinates with origin t. The distance d in r ² = h ² + d ² is replaced by d ^* = d/d _unit (φ). An example is pictured in Fig. 3b. Different unit-isolines can obviously be combined by the sum d ² _unit(φ) = ∑ _i a _i ⋅ d ² _i,unit(φ) with weighting 0 ≤ a _i ≤ 1 and ∑ _i a _i = 1. We do not discuss any physical interpretation because our focus is on statistical modeling and physics is also not discussed in other anisotropic models (Enescu and Enescu 2007; Sørensen et al. 2010).

4.3 Examples of area-equivalent GMRs

We illustrate the action of misinterpretation of ε _Q as an actual random component in example I. We apply the unit-isoline of Fig. 3b, and we set θ ₃ = 1 and θ ₂ = 0 of a GMR according to Eq. (5) (see Fig. 4a). The parameters are θ ₀ = θ ₁ = θ _s = 0 because they are not relevant here. Furthermore, we fix h = 10 km and simulate for a fixed site in the center of a source region with homogeneous seismicity as described in Appendix 3. There is no actual random component in the actual GMR because V(ξ) = 0. We have plotted ln(Y) in relation to distance r in Fig. 4b with the regression function for the isotropic, circular GMR. The estimated parameters are almost equal to the actuals. If the observed residuals are interpreted as actual random components, then the residual variance is overestimated with V(ξ) = 0.10. An interesting aspect constitutes the distance dependency of ξ _Q in Fig. 2b: It increases with increasing distance. But we can also construct an example II of area equivalence with decreasing variance using the Joyner–Boore distance. In Fig. 4c, an actual and an estimated vertical projection of the rupture is pictured. The shape of both projections and the included area is equal; only the azimuth is different. Obviously, the actual and the modeled Joyner–Boore distance are area equivalent for the same GMR. But there is a component ε _Q resp. ξ _Q. We simulate again observations without actual random components and show these in Fig. 4d. V(ξ _Q) decreases with increasing, large distances.

5 Numerical studies

5.1 The influence of the distribution type

We numerically investigate the influence of the type of CDF F _y, which is determined by the distribution of ε resp. ξ, on the hazard curve for equivalent residual variance. For this purpose, we use again the constructed situation of seismicity according to Appendix 3 with fixed depth h = 10 km and consider different upper magnitudes m _max = 7 and 9. Additionally, we consider different variances V(ξ) = 0.15² and 0.3² for log₁₀, which are typical for previous GMRs. The applied GMR is g ^*(X) = 0.5m − ln(r) − 0.002r + 4.7. We consider different distributions of random component ε: Gumbel, the log-normal and truncated log-normal distribution. The latter has an upper and lower bound at three times its standard variation. The computed AEFs are shown in Fig. 5. We note that the influence of the distribution type depends on the maximum magnitude, the residual variance, and the range of y. The hazard of rare events is the largest for the log-normal distribution with high variance. Of course, further seismicity parameters influence the contribution of the distribution model, and the distribution model has no influence in case of unbounded exponentially distributed magnitudes (Section 4.1).

5.2 An example of area-equivalent GMRs in a PSHA

We research the influence of the misinterpretation of ε _Q of example I (Section 4.3, Fig. 3b) on the PSHA. The GMR is g ^*(X) = 0.5m − ln(r) + 4.7. The considered site s again is the center of the quadratic source region of uniform seismicity with m _max = 8; for further details, see Appendix 3. We compute the AEF for an isotropic (circular) GMR and the anisotropic one, both without a random component resp. residual. In the third variant, we consider the isotropic GMR and consider a normally distributed random component ξ with V(ξ) = 0.10 according to the estimation of example I in Section 3.5. The results are illustrated in Fig. 6. As expected according to the theory given in Section 4.1, the area-equivalent GMRs without random component result in equivalent hazard and the misinterpretation of the differences as random component results in the overestimated hazard.

5.3 The obscuration of a Gumbel distributed random component ε ₀

The Gumbel distribution of random component ε ₀ (Eq. (8b)) could be hidden. If we observe/estimate the product ε ₀ ε _S (ε _S as unknown site effect, acting as random variable according to Joyner and Boore 1993), then we cannot test the assumption of log-normal distribution for each component. But the product could be distributed similarly to a log-normal distribution. An example: ε ₀ is Gumbel distributed with E(ε ₀) = 1 and V(ε ₀) = 0.199, ε _s is beta distributed (see Eq. (22)) with E(ε _s) = 1 and V(ε _s) = 0.091 and with bound 0.5 ≤ ε _s ≤ 2.8. The product ε ₀ ε _S has a mixed distribution as shown in Fig. 7a. It looks very similar to a log-normal distribution. However, their tails in Fig. 7b differ considerably. The tail is important for PSHA according to the studies of Restrepo-Velez and Bommer (2003) and Strasser et al. (2008), and it is a specially studied object in extreme value statistics (Leadbetter et al. 1983). Therein, the tail of a distribution should be modeled by the generalized Pareto distribution. Huyse et al. (2010) has already modeled the tail for the residuals of a ground motion using a generalized Pareto distribution.

Furthermore, we research in detail the possibility of hidden distributions for a GMR Y = ε ₀ ε _Sexp(0.7m − ln(r) + θ _s,1 x _s,1 + θ _s,2 x _s,2 + θ _s,1 x _s,2) with a site parameter exp(θ _s,i) = 0.8, 1.1, and 1.2 and indicator variables x _s,i for three site types. We Monte Carlo generate a large sample (ln(Y),m,ln(r),x) with uniform distributed magnitude with 4 ≤ m ≤ 7.5, uniform distributed ln(r) with ln(5) ≤ ln(r) ≤ ln(200) and an occurrence probability of 1/3 for each site type. Then we carry out a LS regression using this sample and analyze the estimated residuals ξ to be normally distributed. We have done this for a sample size of n = 1,500. The estimated residual variance is V(ξ) = 0.293. The q–q normal plot of the residuals is shown in Fig. 7c. Similar plots (see Bommer et al. 2004, Fig. 2) have been interpreted as a proof for a normally distributed ξ, and the wrong KS test would also indicate a log-normal distribution. Only the AD test would reject the false assumption. However, also this correct test would often accept the false distribution model. We state that the actual distribution of ε resp. ξ could be hidden.

5.4 The influence of the different effects on the estimation of GMR and PSHA

Now we research the effects of the combination of misinterpreted random component ε _Q, incorrect distribution assumption for F _y, and measurement errors in the predictors on the PSHA using examples of anisotropic GMRs with point sources. For this purpose, we assume again the constructed situation of a site s and surrounding homogeneous seismicity according to Appendix 3. The magnitude is upper bounded here by m _max = 8. The seismicity parameters are precisely known for the PSHA, but the parameters of the GMRs are estimated. For the latter, regression models are estimated for Monte Carlo simulated samples of (Y,m _measured,r _measured). Therein, the actual hypocenter depth h is fixed and the accidental distance d to the point source is beta distributed; the related parameter depends partly on the simulated beta distributed magnitude (for details, see Appendix 3).

The measurement errors of depth h and distance d are also Monte Carlo simulated; h _measured is a log-normally distributed random variable; the measured distance is d _measured = |d _actual + d _error| (in kilometers) with normally distributed d _error with a standard deviation of 5 km and an expectation of 0. A seismological epicenter or point of maximum energy could be estimated more precisely, but the seismological epicenter can differ from the epicenter of the GMR—the point of maximum g(X) resp. g ^*(X). We also assume a normally distributed measurement error for the magnitude with a standard deviation of 0.15 and 0.25. This is plausible according to our discussion in Section 2.2 and the magnitude errors in the PEER database (2013, NGA_Flatfile_2005Version.xls). We simulate 500 pairs of (m _measured,r _measured) for each sample using this procedure. Examples are illustrated in Fig. 8. These are conceivable possibilities according to actual samples (e.g., Ambraseys and Simpson 1996; Ambraseys et al. 1996; Spudich et al. 1999; Atkinson 2004; Kalkan and Gülkan 2004; Massa et al. 2008).

The related ground motion intensity Y = ε _S ε ₀ g(X) is computed by the actual pair (m.r) and the defined GMR. It is formulated by g ^*(X) and V(ξ ₀) of Eq. (3) and is transformed to g(X) and V(ε ₀) by Eqs. (4a, 4b, and 4c). Its relevant parameters are listed in Table 2. The individual random component ε ₀ is Gumbel distributed (Eq. (8b)). The site-specific random component ε _S is beta distributed with expectation E(ε _S) = 1 with a small share to ε (see Table 2, rows 9 and 10). Anisotropy is considered by an elliptic unit-isoline (see Section 4.2 and Fig. 16b). The actual random component (residuals) ε is not log-normally distributed resp. ξ is not normally distributed.

Table 2 Investigated variants of GMRs according to Eq. (1–5) and the estimations (±standard error of the estimations; parameters θ _i are according to Eq. (5); see also Tables 8 and 9 of “Appendix 4”)

Full size table

We estimate a GMR with the LS regression for each sample of size n = 500 and test the estimated residuals ξ to be normally distributed using the KS test as done in previous researches. We repeat this 100 times for each researched variant. The averages of the estimated parameters are listed in Table 2 as well as the shares of rejection of the KS test. The false normal assumption for ξ is accepted in 68 to 98 % of the samples. The residual variances of all four variants are overestimated according to rows 10 and 11 in Table 2. Therein, the contribution of the magnitude error is small (raw 15). We show the GMRs g(X) in Fig. 9 with actual parameters and with the averages of the estimated parameters. They do not differ from each other very much, but there is a certain bias.

Furthermore, we compute the AEF by PSHA and for the assumed seismicity described above. We compare the influence of the actual and the estimated GMRs. We apply the averages of the estimations to the latter one. The corresponding AEFs for the constructed seismicity are depicted in Fig. 10. The actual AEFs are shown for site condition E(ε _S) = 1 and the 80 % quantile of ε _S. This gives an impression of the small influence of the considered site effects. Furthermore, we show an AEF for the area-equivalent isotropic GMR with the actual type and variance of F _y. We state: The area-equivalence works well, as expected. The overestimated variance and the wrong log-normal assumption lead to an overestimation of the hazard for long return periods (reciprocal of exceedance frequency). The bias in parameter vector θ partly compensates this overestimation. The theoretical results of Section 4.1 are confirmed.

6 Alternative estimation of GMR

6.1 The basic concept

We have stated in Section 4.1 that the GMR of an earthquake is a random function in our model. That means that the GMR has event-specific parameters. In consequence, the parameters of the GMR can and should be estimated event-specific. Therein, we assume that it is practically impossible to find the “true” GMR being absolutely exact for each azimuth and with a perfect source model. Furthermore, we assume that an event-specific, area-equivalent GMR g(X) can be formulated and estimated by a regression model, except the variances of the actual random components. The relation of the event-specific GMRs to the event magnitudes has to be researched in this concept after a number of GMRs have been estimated. This approach has already been applied by Joyner and Boore (1981): They estimated the parameter θ ₁ of Eq. (5) by a regression analysis of the pairs (m, θ ₀), wherein θ ₀ of our Eq. (5) is event-specific. Event-specific random component ε _E resp. ξ _E would be the residual variance of such secondary regression analysis. The influence of side effects can be analyzed using a posterior analysis of the estimated residuals (Morikawa et al. 2008; negligence of non-iid). Thus, event-specific randomness of the site effects could be considered. The remaining problem is to estimate V(ε ₀) resp. V(ξ ₀) under exclusion of any influence of ε _Q resp. ξ _Q. This should be possible by an analysis of the two horizontal components Y ₁ and Y ₂. The difference

$$ \begin{array}{l} \ln \left({Y}_1\right)- \ln \left({Y}_2\right)= \ln \left(g(X){\varepsilon}_{\mathrm{E}}{\varepsilon}_S{\varepsilon}_{01}{\varepsilon}_Q\right)- \ln \left(g(X){\varepsilon}_{\mathrm{E}}{\varepsilon}_{\mathrm{S}}{\varepsilon}_{02}{\varepsilon}_Q\right)\hfill \\ {}= \ln \left({\varepsilon}_{01}\right)- \ln \left({\varepsilon}_{02}\right)={\xi}_{01}-{\xi}_{02},E\left( \ln \left({\varepsilon}_{01}\right)- \ln \left({\varepsilon}_{02}\right)\right)=0\hfill \end{array} $$

(15)

includes only the horizontal random components ε ₀₁ and ε ₀₂ resp. ξ ₀₁ and ξ ₀₂. Therein ε ₀₁ and ε ₀₂ (resp. ξ ₀₁ and ξ ₀₂) have an equivalent distribution, and they are interdependent. If we would know the dependence structure between ε ₀₁ and ε ₀₂ according to Mari and Kotz (2001, Section 4, copula), then we could estimate the distribution of ε ₀₁ and ε ₀₂ with the difference ln(ε ₀₁) − ln(ε ₀₂) by statistical computations. Therefore, the dependence structure should be investigated in future researches. A differentiation by classes of magnitudes, regions, site conditions, or something else is possible because there should be enough ground motion observations to compute a large number ln(ε ₀₁) − ln(ε ₀₂). We cannot prove the functionality of our entire suggestion, but we estimate the GMRs of different earthquakes to demonstrate the potential of our approach.

6.2 Analysis of empirical data

We analyze the observed PGAs of nine earthquakes of the PEER Strong Motion Database (2013, files: NGA_Flatfile_2005Version.xls, NGA_Documentation.xls). We select such earthquakes with a large number of records and an event center inside the cloud of strong motion stations, which should cover the entire event area (event # 136 differs slightly). Furthermore, we consider only one event from a cluster: try to consider different regions and to cover a relatively large range of the magnitude scale. The selected earthquakes are listed in Table 3. We consider different models: a point source model with isotropic GMR, a point source model with anisotropic GMR, and (if available) the source models which lead to the constructed Joyner–Boore distance, the Campbell distance, the root-mean-squared distance (RmsD), and the closest distance to the ruptured area (ClstD). Our basic formulation is g*(X) = θ ₀ − θ ₂ln(r) − θ ₃ r according to Eq. (5) with r ² = d ² + h ² resp. r ² = d ² /d ² _unit(φ) + h ² for the anisotropic point source. We use an eccentric circle and an ellipse (Fig. 16) to model anisotropy. Additionally, we consider different combinations of defined and estimated parameters. The depth parameter h can be set by the hypocenter depth or be estimated with limit h ≥ 0.1 km. The parameter θ ₃ for ln(r) is estimated or set to 1; we do not consider a bound. The parameter θ ₂ is either set to 0 or estimated with limit θ ₂ ≥ 0. We divide the models into groups: the constructed distances, the isotropic point source with epicenter as projected point source, the isotropic point source with estimated coordinates of the point source (start values are the epicenter coordinates), and the anisotropic point source model with estimated coordinates of the point source. For each division, we select the variant of best combination of estimated/set parameters by the smallest AIC (Rawlings et al. 1998, Section 7) with sample size n and parameter number N

$$ \mathrm{AIC}= \ln \left(\widehat{V}\left(\xi \right)\right)+2N/n. $$

(16)

Table 3 Analyzed earthquakes of the PEER strong motion database

Full size table

The parameters of the best models are listed in Table 10 of Appendix 5. Their AICs are given in Table 4, V(ξ) in Table 5. The constructed distances do not result in good estimations; the anisotropic point source model is frequently the best model. This may be relative because constructed distances do not exist for each event, but we consider four constructed distances and only two simple variants of anisotropy. Additionally, it may be possible that we have estimated only a local minimum of least squares for the anisotropic models; the global one could be much better. The average variance of the best models of all point source models is V(ξ) = 0.19. This also includes the site effects, but it is nevertheless significantly smaller than the residual variance of the intra-event component of NGA. For example, Abrahamson and Silva (2008, standard derivations s ₁ and s ₂ of Table 6, Eq. 27) have variances between 0.35 (m ≤ 5) and 0.22 (m ≥ 7). This fact indicates the advantage of our approach. Examples of the estimated GMRs are shown in Fig. 11. The graphs of the GMRs look partly very individual, which is a result of event-specific parameters. This validates the approach of individual GMRs for individual earthquakes. There are also some cases with estimated depth h = 0.1 km; our defined lower bound for h. Reason for such poor estimations is the non-regular situation in the regression model: A parameter h defines the predicting variable − distance r. Smith (1985) has mathematically researched the problem of irregularity for distribution functions; we do not know a similar research for regression models. However, this problem could be minimized by the Bayesian approach of parameter estimation: The seismological source estimation could provide a prior distribution for the depth parameter. Furthermore, the estimations for event #136, the Kocaeli, Turkey 1999 earthquake are poor. The parameter θ ₃ is <0 in some estimations. However, we do not change or remove this event.

Table 4 Best AICs of the different model approaches (bolted—absolute best, cursive—best of point sources)

Full size table

Table 5 Residual variances V(ξ) of g ^*(X) for ln(Y) of the different model approaches

Full size table

6.3 Area functions and site effects of the Chi-Chi earthquake

The Chi-Chi, Taiwan 1999 earthquake (#137) is the one with the largest sample size n = 420. We can use it to compare the area function of the estimated GMR to the actual one. But we cannot compute the area function directly because the records are from stations that are not uniformly distributed; there are concentrations and thinning that have to be considered. We do it using an empirical area function wherein the integration of Eq. (9) is replaced by a discrete accumulation with

$$ \begin{array}{l}\kern2em \\ {}\begin{array}{cccc}\hfill {\widehat{K}}_{\mathrm{observed}}(y)={\displaystyle {\sum}_{i=1}^n{a}_i^{*}\mathbf{1}\left({Y}_i\right)},\hfill & \hfill \mathbf{1}\left({Y}_i\right)=1\kern1em \mathrm{if}\kern0.5em {Y}_i\ge y,\hfill & \hfill \mathrm{otherwise}\kern1em \mathbf{1}\left({Y}_i\right)=0\hfill & \hfill \mathrm{and}\hfill \end{array}\end{array} $$

(17)

$$ \begin{array}{ccc}\hfill \widehat{K_{\mathrm{GMR}}}(y)={\displaystyle {\sum}_{i=1}^n{a}_i^{*}\mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right),}\hfill & \hfill \mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right)=1\kern1em \mathrm{if}\kern0.5em g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\ge y,\hfill & \hfill \mathrm{otherwise}\kern1em \mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right)=0.\hfill \end{array} $$

(18)

We estimate these discrete steps a _i ^* by a Voronoi analysis of the stations, and our estimated area functions are defined with indicator function (see Eq. (9)). The results are shown in Fig. 12. A good model should include a good accordance between Eqs. (17 and 18) with larger differences for larger y because of a larger influence of the random components. Additionally, a certain bias is conceivable for very small values of y because of the effect of the truncation of the geo-space by the finite sample resp. station number in Eqs. (17 and 18). A good agreement is detected for the point source model with estimated coordinates. Especially the anisotropic variant fits well, in contrary to the constructed distances.

The plausibility of the comparison of $ {\widehat{K}}_{\mathrm{observed}}(y) $ and $ {\widehat{K}}_{\mathrm{GMR}}(y) $ can be simply tested by generation of $ Y=\varepsilon g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right) $ with the estimated GMR and Monte Carlo simulated random component ε. We do it for anisotropic GMR for a point source and simulate 100 times the entire sample and adopted the weighting a _i ^* for $ {\widehat{K}}_{\mathrm{observed}}(y) $ by factor 0.01. We consider a log-normal and gamma distributed random component to demonstrate the generality of the approach (gamma distribution, see Johnson et al. 1994, Section 17). Therein is E(ε) = 1 and V(ε) = 0.181, what corresponds with V(ξ) = 0.166 (see Eqs. (4a, 4b, and 4c)). The results are pictured in Fig. 12 h. The approach works and the distribution type of ε is not relevant for the medium range of PGA. We also estimate the site effects for the Campbells GEOCODE of the PEER data (2013, NGA_Documentation.xls) using the expectation of the residuals (see Table 6).

Table 6 Expectations of residuals of g ^*(X) and the statistical significance for different site classes

Full size table

6.4 Relation of specific GMRs to the magnitude

There is the need to find a relation between the event-specific GMRs and the earthquake magnitudes when the event size in the PSHA is quantified by the magnitude. We search such a relation by a statistical analysis of the relations between the parameters of the GMRs and the magnitudes. The results are shown in Fig. 13a–d. Obviously, there is not a significant relation. But we follow the idea of the max-stable random fields and compute the volume of GMRs in the geo-space. We compute the volume

$$ {V}_{\mathrm{o}}={\displaystyle {\int}_{\mathbf{s}}{g}^2\left(\mathbf{X}\right)d\mathbf{s}} $$

(19)

numerically for distance d resp. d ^* ≤ 1,000 km in steps of 25 m. Therein we squared the event-specific GMR because the PGA is approximately proportional to the PGV (Wald et al. 2006, Section 2.5, Eqs. (1.1–1.4)), the squared velocity is proportional to the energy, and the energy is strongly related to the magnitude. The logarithms of these volumes have a strong statistical linear relation to the magnitude according to Fig. 13e, f with a minor influence of V(ξ).

Of course, the applied sample size is small, which causes an uncertainty of the result. But the magnitudes and the volumes are also only estimated and include an estimation error. Such errors rather disturb the observed relation. Therefore, the actual relation should be really strong. The relation is also not changed significantly if we eliminate event #136 with the poor estimations. Such a relation could be applied to GMRs in a PSHA. The distance parameters θ ₂ and θ ₃ in Eq. (5) could be random variables, and θ ₀ is computed using the relation magnitude to volume. The previous magnitude parameter θ ₁ would not be needed anymore.

7 Conclusion and outlook

We have discussed here important aspects of earlier approaches to GMR by a regression model and discovered in Section 2 that many models have not been built according to the rules of statistics regarding statistical significance, model selection, and test of the distribution assumption. But even if the log-normal assumption for residuals of ε are tested positively, it is not because the individual component ε ₀ according to Eq. (7) of a PGA or another maximum value should be generalized extreme value distributed according to the extreme value statistics (Sections 3 and 5). Its domain of attraction seems to be the Gumbel one, but this issue should be examined by future researches. Our major contribution is the introduction of area equivalence of GMR for PSHA in Section 4, which implies a distinction between an appropriate prediction of the PGA for a concrete earthquake by a conventional regression model and an appropriate GMR for the PSHA. These models need not to be equal regarding the residual variances. In contrary, the residual of the regression model for the random function GMR includes the component ε _Q of Eqs. (4a, 4b, and 4c). This may not apply as an actual random component in the GMR for the PSHA; otherwise, the variance V(ε) is overestimated, which leads to an overestimated hazard in the PSHA (except for one case, Section 4.1). The possible influence of the distribution of ε and the misinterpretation of ε _Q have been researched in Section 5. The overestimation of hazard can be remarkable. Our numerical studies consider a broader constellation of parameters and distribution of predictors than the numerical studies of Joyner and Boore (1993) and Chen and Tsai (2002) about estimation procedures for GMRs. Nevertheless, the benefit is limited. More extensive numerical studies with different sample sizes would be needed to quantify more exactly the bias in the PSHA. Independent of this fact, we have suggested an estimation concept for GMRs in PSHA in the last section, including the independent estimation of the parameters of the individual random component ε ₀. We stated that the dependence structure of the horizontal components has to be researched in the future to apply this concept. However, we were able to show that the event-specific modeling of GMR leads to smaller variances V(ξ) than earlier models. Therein the anisotropic point source approach results in the best regression models, while the constructed distances (e.g., Joyner–Boore) do not work well. The relation between the event magnitude and the GMRs is given by the integration of g ²(X) over the geo-space. Details of this relation and its consideration in PSHA should be studied in further researches. Beside this, the empirical area functions for the Chi-Chi, Taiwan 1999 earthquake confirm that the anisotropic point source approach works well. Nevertheless, we also suggest developing a detailed theory of this geo-statistical approach in the future. This also applies to the estimation of point source coordinates and depth by a regression analysis. Further statistical methods like Bayes estimation, local regression, or kernel regression could provide better estimations, and the models of extreme value statistics for the distribution tails could improve the GMR in PSHA. A large challenge for future researches is also the discovering, estimation, and/or examination of the distribution of every single random component of the GMR.

References

Abrahamson NA (1988) Statistical properties of peak ground motion accelerations recorded by the SMART 1 array. Bull Seismol Soc Am 78:26–41
Google Scholar
Abrahamson N, Silva W (2008) Summary of the Abrahamson & Silva NGA ground-motion relations. Earthquake Spectra 24:67–97
Article Google Scholar
Abrahamson NA, Youngs RR (1992) A stable algorithm for regression analyses using the random effects model. Bull Seismol Soc Am 82:505–510
Google Scholar
Abrahamson NA, Birkhauser P, Koller M et al (2002) PEGASOS—a comprehensive probabilistic seismic hazard assessment for nuclear power plants in Switzerland. In: Proceedings of the Twelfth European Conference on Earthquake Engineering, Paper no 633, London
Abrahamson N, Atkinson G, Boore D, Bozorgnia Y et al (2008) Comparisons of the NGA ground-motion relations. Earthquake Spectra 24:45–66
Article Google Scholar
Al Atik L, Abrahamson N, Bommer JJ, Scherbaum F et al (2010) The variability of ground-motion prediction models and its components. Seismol Res Lett 81:794–801
Article Google Scholar
Ambraseys NN, Bommer J (1991) The attenuation of ground accelerations in Europe. Earthq Eng Struct Dyn 20:1179–1202
Article Google Scholar
Ambraseys NN, Simpson KA (1996) Prediction of vertical response spectra in Europe. Earthq Eng Struct Dyn 25:401–412
Article Google Scholar
Ambraseys NN, Simpson KA, Bommer JJ (1996) Prediction of horizontal response spectra in Europe. Earthq Eng Struct Dyn 25:401–412
Article Google Scholar
Ambraseys NN, Douglas J, Sarma SK (2005) Equations for the estimation of strong ground motions from shallow crustal earthquakes using data from Europe and the Middle East: horizontal peak ground acceleration and spectral acceleration. Bull Earthq Eng 3:1–53
Article Google Scholar
Anderson JG, Uchiyama Y (2011) A methodology to improve ground-motion prediction equations by including path corrections. Bull Seismol Soc Am 101:1822–1846
Article Google Scholar
Atkinson GM (2004) Empirical attenuation of ground-motion spectral amplitudes in southeastern Canada and the northeastern United States. Bull Seismol Soc Am 94:1079–1095
Article Google Scholar
Atkinson GM (2006) Single station sigma. Bull Seismol Soc Am 96:446–445
Article Google Scholar
Atkinson GM, Boore DM (1995) Ground-motion relations for Eastern North America. Bull Seismol Soc Am 85:17–30
Google Scholar
Beirlant J, Goegebeur Y, Teugels J, Segers J (2004) Statistics of extremes: theory and applications. Wiley series in probability and statistics. Wiley, Chichester
Book Google Scholar
Beyer K, Bommer JJ (2007) Relationships between median values and between aleatory variabilities for different definitions of the horizontal component of motion. Bull Seismol Soc Am 96(4A):1512–1522
Article Google Scholar
Billingsley P (1995) Probability and measure. Wiles series in probability and mathematical statistics. Wiley, USA
Google Scholar
Bommer JJ, Abrahamson A (2006) Why do modern probabilistic seismic hazard analyses often lead to increased hazard estimates? Bull Seismol Soc Am 96:1967–1977
Article Google Scholar
Bommer JJ, Abrahamson NA, Strasser FO et al (2004) The challenge of defining the upper limits on earthquake ground motions. Seismol Res Lett 75(1):82–95
Article Google Scholar
Bommer JJ, Stafford PJ, Alarcón JE, Akkar S (2007) The influence of magnitude range on empirical ground-motion prediction. Bull Seism Soc Am 97(6):2152–2170
Article Google Scholar
Boore DM, Atkinson GM (2007) Boore–Atkinson NGA ground motion relations for the geometric mean horizontal component of peak and spectral ground motion parameters. PEER Report 2007/01, Pacific Earthquake Engineering Research Center, College of Engineering, University of California, Berkeley
Campbell KW (1981) Near-source attenuation of peak horizontal acceleration. Bull Seismol Soc Am 71:2039–2070
Google Scholar
Campbell KW (1993) Empirical prediction of near-source ground motion from large earthquakes. In: Proceedings of the International Workshop on Earthquake Hazard and Large Dams in the Himalaya. Indian National Trust for Art and Cultural Heritage, New Delhi, India
Campbell K, Bozorgnia Y (2008) NGA ground motion model for the geometric mean horizontal component of PGA, PGV, PGD and 5 % damped linear elastic response spectra for periods ranging from 0.01 to 10 s. Earthquake Spectra 24:139–171
Article Google Scholar
Castellaro S, Mulargia F, Kagan YY (2006) Regression problems for magnitudes. Geophys J Int 165:913–930
Article Google Scholar
Chang T, Cotton EJ, Anglier J (2001) Seismic attenuation and peak ground acceleration in Taiwan. Bull Seismol Soc Am 91:1,229–1,246
Google Scholar
Chen Y-H, Tsai CCP (2002) A new method for estimation of the attenuation relationship with variance components. Bull Seismol Soc Am 92:1984–1991
Article Google Scholar
Cheng C-L, van Ness JW (1999) Statistical regression with measurement error. Kendall’s Library of Statistics, 6. Arnold, London
Chiou BS-J, Youngs RR (2008) NGA model for average horizontal component of peak ground motion and response spectra. PEER Report 2008/09, Pacific Engineering Research Center College of Engineering, University of California, Berkeley
Coles S (2001) An introduction to statistical modeling of extreme values. Springer, London
Google Scholar
Cornell CA (1968) Engineering seismic risk analysis. Bull Seismol Soc Am 58:1583–1606
Google Scholar
Cosentino P, Ficarra V, Luzio D (1977) Truncated exponential frequency–magnitude relationship in earthquake statistics. Bull Seismol Soc Am 67:1615–1623
Google Scholar
Crouse CB, McGuire JW (1996) Site response studies for purpose of revising NEHRP seismic provisions. Earthquake Spectra 12:407–439
Article Google Scholar
D’Augustino RB, Stephens MA (eds) (1986) Goodness-of-Fit techniques. Statistics: textbooks and monographs, Vol. 68. Marcel Dekker, New York
de Haan L, Ferreira A (2006) Extreme value theory. Springer, New York
Google Scholar
Douglas J (2001) A comprehensive worldwide summary of strong-motion attenuation relationships for peak ground acceleration and spectral ordinates (1969 to 2000). ESEE Report 01–1. Department of Civil and Environmental Engineering, Imperial College, London. http://nisee.berkeley.edu/library/douglas/ESEE01-1.pdf
Douglas J (2002) Errata of and additions to ESEE report no. 01–1: ‘A comprehensive worldwide summary of strong-motion attenuation relationships for peak ground acceleration and spectral ordinates (1969 to 2000)’. Dept. Report, Imperial College of Science, Technology and Medicine Department of Civil & Environmental Engineering, London. http://nisee.berkeley.edu/library/douglas/douglas2002.pdf
Douglas J (2003) Earthquake ground motion estimation using strong-motion records: a review of equations for the estimation of peak ground acceleration and response spectral ordinates. Earth Sci Rev 61:43–104
Article Google Scholar
Douglas J, Smit PM (2001) How accurate can strong ground motion attenuation relations be? Bull Seismol Soc Am 91:1917–1923
Article Google Scholar
Dupuis DJ, Flemming JM (2006) Modeling peak acceleration from earthquakes. Earthq Eng Struct Mech 35:969–987
Article Google Scholar
Efron B (1979) Bootstrap methods: another look at the jackknife. The Annals of statistics 7:1–26
Article Google Scholar
Enescu D, Enescu BD (2007) A procedure for assessing seismic hazard generated by Vrancea earthquakes and its application. III. Method for developing isoseismal maps and isoacceleration maps. Application. Rom Rep Phys 59:121–145
Google Scholar
Falk M, Hüsler J, Reiss R-D (2011) Laws of small numbers: extremes and rare events, 3rd edn. Birkhäuser, Basel
Book Google Scholar
Fisher RA, Tippett LHC (1928) Limiting forms of the frequency distributions of largest or smallest member of a sample. Proc Cambridge Philos Soc 24:180–190
Article Google Scholar
Giardini D (1984) Systematic analysis of deep seismicity: 200 centroid-moment tensor solutions for earthquakes between 1977 and 1980. Geophys J R astr Soc 77:883–914
Article Google Scholar
Gnedenko BV (1943) Sur la distribution limite du terme d’une série aléatoire. Ann Math 44:423–453
Article Google Scholar
Hüsler J, Li D, Raschke M (2011) Estimation for the generalized Pareto distribution using maximum likelihood and goodness-of-fit. Commun Stat Theory Methods 40:2500–2510
Article Google Scholar
Huyse L, Chen R, Stamatakos AJ (2010) Application of generalized Pareto distribution to constrain uncertainty in peak ground accelerations. Bull Seismol Soc Am 100(1):87–101
Article Google Scholar
Idriss IM (2007) Empirical model for estimating the average horizontal values of pseudo-absolute spectral accelerations generated by crustal earthquakes. Vol.1. Interim Report Issued for USGS Review, PEER. http://peer.berkeley.edu/ngawest/nga_models.html
Johnson NL, Kotz S, Balakrishnan N (1994) Continuous univariate distributions—Vol. I. 2nd edn. Wiley, New York
Johnson NL, Kotz S, Balakrishnan N (1995) Continuous univariate distributions—Vol. II. 2nd edn. Wiley, New York
Joyner WB, Boore DM (1981) Peak horizontal acceleration and velocity from strong-motion records including records from the 1979 Imperial Valley, California, earthquake. Bull Seism Soc Am No 6:2011–2038
Google Scholar
Joyner WB, Boore DM (1993) Methods for regression analysis of strong motion data. Bull Seism Soc Am 83(2):469–487
Google Scholar
Kabluchko Z, Schlather M, de Haan L (2009) Stationary max-stable random fields associated to negative definite functions. Ann Probab 37(5):2042–2065
Article Google Scholar
Kaklamanos J, Baise LG (2010) Model validation of recent ground motion prediction relations for shallow crustal earthquakes in active tectonic regions. In: Proceedings: 5th International Conference on Recent Advances in Geotechnical Earthquake Engineering and Soil Dynamics, May 24–29, 2010, San Diego, California
Kalkan E, Gülkan P (2004) Empirical attenuation equations for vertical ground motion in Turkey. Earthquake Spectra 20:853–882
Article Google Scholar
Landry L, Lepage Y (1992) Empirical behavior of some tests for normality. Commun Stat Simul Comput 21:971–999
Article Google Scholar
Leadbetter MR, Lindgren G, Rootzen H (1983) Extremes and related properties of random sequences and processes. Springer series in statistics. Springer, New York
Book Google Scholar
Lin PS, Chiou B, Abrahamson N, Walling M (2011) Repeatable source, site, and path effects on the standard deviation for empirical ground-motion prediction models. Bull Seismol Soc Am 101:2281–2295
Article Google Scholar
Lindsey JK (1996) Parametric statistical inference. Oxford Science Publications, Oxford University Press, Oxford
Mari DD, Kotz S (2001) Correlation and dependence. Imperial College Press, London
Book Google Scholar
Massa M, Morasca P, Moratto L et al (2008) Empirical ground-motion prediction equations for northern Italy using weak- and strong-motion amplitudes, frequency content, and duration parameters. Bull Seismol Soc Am 98:1319–1342
Article Google Scholar
McGuire RK (1977) Seismic design spectra and mapping procedures using hazard analysis based directly on oscillator response. Earthq Eng Struct Dyn 5:211–234
Article Google Scholar
McGuire RK (1995) Probabilistic seismic hazard analysis and design earthquakes: closing the loop. Bull Seismol Soc Am 85:1275–1284
Google Scholar
Molas GL, Yamazaki F (1995) Attenuation of earthquake ground motion in Japan including deep focus events. Bull Seismol Soc Am 85:1343–1358
Google Scholar
Monguilner C A, Ponti N, Pavoni S B et al. (2000) Statistical characterization of the response spectra in the Argentine Republic. In: Proceedings of 12th World Conference on Earthquake Engineering, paper no. 1825
Montgomery CM, Peck EA, Vining GG (2006) Introduction to linear regression analysis. Wiley, Hoboken
Google Scholar
Morikawa N, Kanno T, Narita A et al (2008) Strong motion uncertainty determined from observed records by dense network in Japan. J Seismol 12:529–546
Article Google Scholar
PEER Strong Motion Database (2010) http://peer.berkeley.edu/smcat/. Accessed December 2010
PEER Strong Motion Database (2013) http://peer.berkeley.edu/peer_ground_motion_database. Accessed March 2013
Quenouille MH (1956) Notes on bias in estimation. Biometrika 43:353–60
Google Scholar
Raschke M (2009) The biased transformation and its application in goodness-of-fit tests for the beta and gamma distribution. Commun Stat Simul Comput 38:1870–1890
Article Google Scholar
Raschke M (2011) Inference for the truncated exponential distribution. Stoch Env Res Risk A. doi:10.1007/s00477-011-0458-8
Google Scholar
Raschke M (2012) Möglichkeiten der mathematischen Statistik zur Schätzung der Hochwasserwahrscheinlichkeit (German, Possibilities of mathematical statistics to estimate flood probability). Wasser und Abfall 14(6):49–53. cms.springerprofessional.de/journals/JOU=35152/VOL=2012.14/ISU=6/ART=193/BodyRef/PDF/35152_2012_Article_193.pdf
Raschke M (2013) Parameter estimation for the tail distribution of a random sequence. Commun Stat Simul Comput 42:1013–1043
Article Google Scholar
Raschke M, Thürmer K (2008) Defizite der Modellselektion in der Hochwasserstatistik (German, Shortcomings of model selection in flood statistics). Wasser und Abfall 10(12):43–48
Google Scholar
Rawlings JO, Pantula SG, Dickey DA (1998) Applied regression analysis: a research tool, 2nd edn. Springer, New York. http://web.nchu.edu.tw/∼numerical/course992/ra/Applied_Regression_Analysis_A_Research_Tool.pdf
Restrepo-Velez LF, Bommer JJ (2003) An exploration of the nature of the scatter in ground-motion prediction equations and the implications for seismic hazard assessment. J Earthq Eng 7:171–199
Google Scholar
Rhoades DA (1997) Estimation of attenuation relations for strong-motion data allowing for individual earthquake magnitude uncertainties. Bull Seismol Soc Am 87:1674–1678
Google Scholar
Sadigh K, Chang C-Y, Egan JA et al (1997) Attenuation relationships for shallow crustal earthquakes based on California strong motion data. Seismol Res Lett 68:180–189
Article Google Scholar
Scherbaum F, Cotton F, Smit P (2004) On the use of response spectral-reference data for selection and ranking of ground-motion models for seismic-hazard analysis in regions of moderate seismicity: the case of rock motion. Bull Seismol Soc Am 94:2164–2185
Article Google Scholar
Schlather M (2002) Models for stationary max-stable random fields. Extremes 33–44
Smith RL (1985) Maximum likelihood estimation in a class of nonregular cases. Biometrika 72:67–90
Article Google Scholar
Sørensen M, Stromeyer D, Grünthal G (2010) A macroseismic intensity prediction equation for intermediate depth earthquakes in the Vrancea region, Romania. Soil Dyn Earthq Eng 30(11):1268–1278
Article Google Scholar
Spudich P, Joyner WB, Lindh AG et al (1999) SEA99: a revised ground motion prediction relation for use in extensional tectonic regimes. Bull Seismol Soc Am 89:1156–1170
Google Scholar
Stafford PJ, Strasser FO, Bommer JJ (2008) An evaluation of the applicability of the NGA models to ground-motion prediction in the Euro-Mediterranean region. Bull Earthquake Eng 6:149–177
Article Google Scholar
Stephens MA (1986) Test based on EDF statistics. In: D’Augustino RB, Stephens MA (eds) Goodness-of-fit techniques. Statistics: textbooks and monographs, vol. 68. Marcel Dekker, New York
Stepp JC, Wong I, Whitney J et al (2001) Probabilistic seismic hazard analyses for ground motions and fault displacements at Yucca Mountain, Nevada. Earthquake Spectra 17(1):113–151
Article Google Scholar
Strasser FO, Bommer JJ, Abrahamson NA (2008) Truncation of the distribution of ground-motion residuals. J Seismol 12(1):79–105
Article Google Scholar
Strasser FO, Abrahamson NA, Bommer JJ (2009) Sigma: issues, insights and challenges. Seism Res Lett 80:40–56
Article Google Scholar
Stromeyer D, Grünthal G, Wahlström R (2004) Chi-square maximum likelihood regression for seismic strength parameter relations, and their uncertainties, with applications to an mw based earthquake catalogue for central, northern and northwestern Europe. J Seismol 8:143–153
Article Google Scholar
Upton G, Cook I (2008) A dictionary of statistics, 2nd rev. edn. Oxford University Press, Oxford
Utsu T (1999) Representation and analysis of the earthquake size distribution: a historical review and some new approaches. Pure Appl Geophys 155:509–535
Article Google Scholar
Wald DJ, Worden BC, Quitoriano V, Pankow KL (2006) ShakeMap® Maunual. Advanced national seismic system USGS. http://pubs.usgs.gov/tm/2005/12A01/
Youngs RR, Abrahamson N, Makdisi FI, Sadigh K (1995) Magnitude-dependent variance of peak ground acceleration. Bull Seism Soc Am 85(4):1161–1176
Google Scholar

Download references

Acknowledgments

We would expressly like to thank Jürg Hüsler and Zakhar Kabluchko for their explanations of details of the extreme value theory and statistics.

Author information

Authors and Affiliations

Stolze-Schrey Str.1, 65195, Wiesbaden, Germany
Mathias Raschke

Authors

Mathias Raschke
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Mathias Raschke.

Appendices

Appendix 1: An inappropriate approach to model selection

Scherbaum et al. (2004) formulated the criterion for model selection, which is the median of the statistic LH, defined with (symbols according to the reference)

$$ LH\left({Z}_0\right)=2\left[1-\varPhi \left(\left|{Z}_0/{\sigma}_0\right|\right)\right] $$

(20)

wherein Z ₀ is the estimated residual (here ξ) and its estimated standard deviation σ ₀. Φ is the CDF of the standard normal distribution; a normal distributed Z ₀ is desired resp. assumed. The smaller the value |Median(LH) − 0.5|, the better is the model. The problem is that |Median(LH) − 0.5| = 0 for different distributions of Z ₀. Examples are shown in Fig. 14. The criterion does not work.

Appendix 2: Numerical research of distributions of combinations of horizontal components

The horizontal components (Y ₁,Y ₂) of ground motion are combined for some GMRs as geometric mean, arithmetic mean, or by a vectorial addition (see Douglas 2003, Section 6). The possibility of a logarithmic normal distribution of resulting ε ₀ is researched here numerically. Therein, we assume Gumbel distributed components (ε ₀₁,ε ₀₂). The other components of the GMR are not considered here because they scale (ε ₀₁,ε ₀₂) simultaneously. The dependence structure of (ε ₀₁,ε ₀₂) is assumed to be of a bivariate normal distribution (Mari and Kotz 2001, Section 4.4) and is quantified by the correlation coefficient R. We simulate pairs of horizontal component ε ₀₁ and ε ₀₂ with this copula and the Gumbel distributions as marginal and combine them. We Monte Carlo simulate a large sample (10,000) of such combinations and check its logarithm for normal distribution with the Anderson–Darling test according to Stephens (1986) for a significance level α = 5 %. We repeat this procedure 100 times and get a share of rejected assumptions to be normal distributed. This share should be around 5 %; otherwise, the considered combinations of Gumbel distributed maxima are not log-normally distributed. This is the case according to the results in Table 7.

Table 7 Share of rejections (in percent) for test of normality (estimated standard error in brackets; sample size n = 10,000, 100 repetitions, parameter b = 1 of Gumbel distribution, see Eq. (8b))

Full size table

Appendix 3: Details of the constructed situation of seismicity

The constructed source region and the considered site s are depicted in Fig. 15. The truncated exponential distribution for the magnitudes is formulated according to Cosentino et al. (1977) with

$$ {F}_{\mathrm{m}}(m)=\left(1- \exp \left(-{\beta}_{\mathrm{m}}\left(m-{m}_{\min}\right)\right)\right)/\left(1- \exp \left(-{\beta}_{\mathrm{m}}\left({m}_{\max }-{m}_{\min}\right)\right)\right),\kern1em {m}_{\min}\le M\le {m}_{\max }, $$

(21)

wherein β _m is a scale parameter, m _max is the upper bound magnitude, and m _min is the smallest considered magnitude. We set m _min = 4 and β _m = 2.3 (see Utsu 1999). The maximum magnitude m _max depends on the investigated variant. The annual seismicity is set to ν = 4.4/600² [km⁻²], which means that 4.4/600² earthquakes with M ≥ 4 occur per km² in the source region (Fig. 15).

Appendix 4: Details of the simulations of Section 5.4

We assume the following for the Monte Carlos simulation of sample in Section 5.4. The beta distribution is applied to simulate a sample of random magnitude m which is generally written with (see Johnson et al. 1995)

$$ f(x)={\left(x/\left(b-a\right)\right)}^{p-1}{\left(\left(1-x\right)/\left(b-a\right)\right)}^{q-1}\varGamma (p)\varGamma (q)/\left(\left(b-a\right)\varGamma \left(p+q\right)\right),a\le x\le b,\kern0.5em p>0,q>0. $$

(22)

The parameters for the beta distributed magnitude m are listed for all variants in Table 8. The real epicenter distance is also simulated by a beta distribution with b = 0 and with parameter a

$$ a=c{M}^d. $$

(23)

The parameters c, d, p, and q of the variants are listed in Table 9.

Table 8 Parameters for the constructed beta distribution of real magnitudes M

Full size table

Table 9 Parameters for the constructed beta distribution of real epicenter distance D

Full size table

Appendix 5: Details of the modeling and estimations of Section 6

Fig. 16

Table 10

Table 10 Estimated parameters of the best variant (smallest AIC) of the different approaches and data sets according to Section 6.2 (E and W geo-coordinates of t; Δ_X, Δ_y, b, and ω according to Fig. 16; θ _i according to Eq. (5))

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Raschke, M. Statistical modeling of ground motion relations for seismic hazard analysis. J Seismol 17, 1157–1182 (2013). https://doi.org/10.1007/s10950-013-9386-z

Download citation

Received: 13 May 2013
Accepted: 01 August 2013
Published: 13 September 2013
Issue Date: October 2013
DOI: https://doi.org/10.1007/s10950-013-9386-z

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Statistical modeling of ground motion relations for seismic hazard analysis

Abstract

Similar content being viewed by others

Uncertainty in the estimates of peak ground acceleration in seismic hazard analysis

Overview and introduction to development of non-ergodic earthquake ground-motion models

A novel model and estimation method for the individual random component of earthquake ground-motion relations

1 Introduction

2 Regression model for GMR

2.1 Basic formulation

2.2 Random components, estimation methods, and errors

2.3 The danger of over-parameterization

2.4 The test of the distribution assumption

3 The distribution of the maximum of a random sequence

4 GMR in the PSHA as random function in geo-space

4.1 GMRs as random functions in space

4.2 A model of anisotropic GMR for a point source

4.3 Examples of area-equivalent GMRs

5 Numerical studies

5.1 The influence of the distribution type

5.2 An example of area-equivalent GMRs in a PSHA

5.3 The obscuration of a Gumbel distributed random component ε 0

5.4 The influence of the different effects on the estimation of GMR and PSHA

6 Alternative estimation of GMR

6.1 The basic concept

6.2 Analysis of empirical data

6.3 Area functions and site effects of the Chi-Chi earthquake

6.4 Relation of specific GMRs to the magnitude

7 Conclusion and outlook

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix 1: An inappropriate approach to model selection

Appendix 2: Numerical research of distributions of combinations of horizontal components

Appendix 3: Details of the constructed situation of seismicity

Appendix 4: Details of the simulations of Section 5.4

Appendix 5: Details of the modeling and estimations of Section 6

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation

5.3 The obscuration of a Gumbel distributed random component ε ₀