1 Introduction

The level of local seismic impact is estimated for modern building codes and the earthquake-resistant design of industrial facilities by probabilistic seismic hazard analysis (PSHA) as a part of seismology and earthquake engineering. Therein, the average annual exceedance frequency of local earthquake ground motion intensity is estimated. An important element of PSHA is the ground motion relation (GMR), which describes the relation between the local ground motion intensity and different event parameters such as the magnitude (Bommer and Abrahamson 2006). It is also called ground motion prediction equation. We prefer the term GMR of Atkinson (2006) because we research an appropriate relation for the PSHA that needs not to be the best prediction and its residual variance for a single event. The previous GMRs are mostly modeled by a statistical regression analysis (Strasser et al. 2009) wherein the event parameters are predictors. Douglas (2001, 2002) provides a good overview of GMRs being published before 2002, and Douglas (2003) gives an excellent overview of all aspects of GMRs like estimation methods or source models. Therein, the physical unit of local ground motion intensity is the peak ground acceleration (PGA) or the maximum of another type of local time history. The parameters of current GMRs are fixed, not event-specific, including the depth parameter. The conditional probability distribution of the local ground motion intensity is generally modeled by the logarithmic normal (log-normal) distribution in the GMR, which implies a normal distribution for the logarithmized ground motion intensity (Joyner and Boore 1993; Strasser et al. 2009). This approach results in unrealistically high estimations of ground motion intensities for low exceedance frequency (Stepp et al. 2001; Abrahamson et al. 2002; Bommer and Abrahamson 2006), which has not been improved by the next generation of GMR (NGA; Abrahamson et al. 2008). Beside this, truncation of the log-normal distribution was suggested to avoid overestimations, but choosing the truncation point is difficult according to Strasser et al. (2008). Therein, statistical estimation methods for truncation points (Raschke 2011) have not been considered. We generally note a lack of consideration of current knowledge of stochastics and statistics in the research of GMR. For example, it is known for a long time that the statistical significance of regression models of GMR should be validated (Joyner and Boore 1981), but many NGAs are not validated in this sense (see Table 2). Beside this, at least the individual random component (ε 0 of Joyner and Boore 1993) of the PGA should follow an extreme value distribution according to the extreme value statistics (Leadbetter at al. 1983; Coles 2001). Dupuis and Flemming (2006) have introduced the concept of extreme value statistics into GMR, but their paper was not considered any further. In the following section on regression models for GMRs, we criticize important statistical aspects of previous GMR and briefly call arguments for the extreme value distribution of the individual random component in Section 3. However, our break with the traditional approaches to GMRs is deeper; in Section 4, we mathematically derive the area equivalence for GMRs in PSHA inspired by equivalences in max-stable random fields (Schlather 2002; Kabluchko et al. 2009). Therein, GMRs are random functions, which include event-specific GMRs and distinction between GMRs for an actual prediction and GMRs for the PSHA. We also introduce an approach to an anisotropic point source model in this section. In Section 5, we numerically research the detectability of the distribution model and the influence of this and other items like the area equivalence on PSHA. Then, we suggest an estimation concept for our approach to GMR in Section 6, including a regression-free estimation of the variance of the individual random component. We partly apply this concept to nine suitable data sets and research the link between the event-specific GMR and the magnitude. Finally, we conclude our results in the last section. We follow here the rules of statistics, use its terms (see Upton and Cook 2008).

2 Regression model for GMR

2.1 Basic formulation

The GMR is usually formulated by a regression model with the basic formulation (Lindsey 1996; Rawlings et al. 1998; Montgomery et al. 2006)

$$ Y=g\left(\mathbf{X}\right)+{\varepsilon}^{*},E(Y)=g\left(\mathbf{X}\right),E\left({\varepsilon}^{*}\right)=0,V(Y)=V\left({\varepsilon}^{*}\right), $$
(1)

Y is the predicted variable (response variable, dependent variable, conditional variable, or regressand). The regression function g(X) includes a parameter vector θ, which is estimated. The predictors (independent variables, predicting variables, or regressors) are the elements of the random vector X = (X 1, X 2,.., X m ). E(.) are the expectations and V(.) are the variances. The random variable ε * is the random component (residual, random term, or measurement error) and determines the cumulative distribution function (CDF) F y of Y under condition of X. If g(X)  0 and V(ε *) is proportional to g 2(X), then we write the equivalent formulation

$$ Y=\varepsilon\;g\left(\mathbf{X}\right),\varepsilon =1+{\varepsilon}^{*},E(Y)=g\left(\mathbf{X}\right),E\left(\varepsilon \right)=1,V(Y)=V\left(\varepsilon \right){g}^2\left(\mathbf{X}\right). $$
(2)

We prefer this formulation for GMRs, wherein Y is the PGA or something similar, because the expectation is a very important characterization of a random variable and ε can be neglected under certain conditions (Section 4.1). If Y ≥ 0, then we can logarithm and formulate the popular model for GMRs (Douglas 2001, Abrahamson et al. 2008)

$$ \ln (Y)={g}^{*}\left(\mathbf{X}\right)+\xi, \kern0.5em E\left( \ln (Y)\right)={g}^{*}\left(\mathbf{X}\right),E\left(\xi \right)=0,V\left( \ln (Y)\right)=V\left(\xi \right). $$
(3)

It is assumed for most GMRs for PSHA that ξ is normally distributed (Joyner and Boore 1993; Strasser et al. 2009). This implies a model according to Eq. (2) with log-normally distributed ε. The link between Eqs. (2 and 3) is (Johnson et al. 1994, Eq. (14.8))

$$ E(Y)=g\left(\mathbf{X}\right)= \exp \left({g}^{*}\left(\mathbf{X}\right)+V\left(\xi \right)/2\right)\mathrm{and} $$
(4a)
$$ V(Y)= \exp \left(2{g}^{*}\left(\mathbf{X}\right)\right) \exp \left(V\left(\xi \right)\right)\left( \exp \left(V\left(\xi \right)\right)-1\right), $$
(4b)
$$ \varepsilon = \exp \left(\xi \right)- \exp \left(V\left(\xi \right)/2\right). $$
(4c)

We apply Eqs. (3 and 4) simultaneously even if ε is not exactly log-normally distributed (Johnson et al. 1994, Eq. (12.67) with ξ Johnson ≈ 0). A typical formulation for a GMR is (Douglas 2002)

$$ {g}^{*}\left(\mathbf{X}\right)={\theta}_0+{\theta}_1m-{\theta}_2r-{\theta}_3 \ln (r)+{\theta}_{\mathrm{s}}{x}_{\mathrm{s}}+\dots, r=\sqrt{d^2+{h}^2},{\theta}_1>0,{\theta}_2\ge 0,{\theta}_3\ge 0 $$
(5)

with predicting variables magnitude m, source distance d (and r) and indicator variable x s and its parameter θ s for the site condition. The source depth is considered by h and can be a parameter or an event-specific predictor. There are many variants and extensions for g *(X) (Douglas 2002; Abrahamson et al. 2008).

2.2 Random components, estimation methods, and errors

The random components ε resp. ξ are independently and identically distributed (iid) variables, and the predictors are measured exactly in simple regression models. For such cases, the least squared (LS) estimation can be applied, which is equivalent to the maximum-likelihood estimation for normally distributed residuals (see Rawlings et al. 1998, p. 77). This is not popular in seismology, e.g., Castellaro et al. (2006) incorrectly claim that the residuals have to be normally distributed for the LS regression. The LS method has often been used for GMRs and is extended to random components that are not iid. Douglas (2003, Section 11) gives an overview of approaches from before 2003. The two most important approaches seem to be the one and two stage regression method with the following random components (Joyner and Boore 1993; with assumption of normal distribution)

$$ \xi ={\xi}_{\mathrm{E}}+{\xi}_{\mathrm{S}}+{\xi}_0,E\left({\xi}_{\mathrm{E}}\right)=E\left({\xi}_{\mathrm{S}}\right)=E\left({\xi}_0\right)=0, $$
(6)

wherein ξ E is event-specific, ξ S is site-specific, and ξ 0 has an individual realization for each site (station) and event. We prefer the product formulation according to Eq. (2) with

$$ \varepsilon ={\varepsilon}_{\mathrm{E}}{\varepsilon}_{\mathrm{S}}{\varepsilon}_0{\varepsilon}_{\mathrm{Q}},E\left(\varepsilon \right)=E\left({\varepsilon}_{\mathrm{E}}\right)=E\left({\varepsilon}_{\mathrm{S}}\right)=E\left({\varepsilon}_0\right)=E\left({\varepsilon}_{\mathrm{Q}}\right)=1,\kern0.5em {\varepsilon}_{\mathrm{Q}}={g}_{\mathrm{actual}}\left(\mathbf{X}\right)/{g}_{\mathrm{eqivalent}}\left(\mathbf{X}\right), $$
(7)

wherein the additional pseudo-random component ε Q (resp. ξ Q) results from the ratio between actual and equivalent function g(X) according to Section 4. A general distribution assumption is not required, but it is obvious that ε Q has a finite upper bound and a lower bound larger than 0 for a fixed distance d. That is one reason why ε Q cannot be log-normally distributed.

In other GMRs, the component ε S resp. ξ s had been replaced by site-specific predictors x s in Eq. (5). But there is no proof that one additional predictor can completely replace ε S resp. ξ S and we doubt this because site response is very complex. Independent of this, one condition of the regression models of Joyner and Boore (1993) is that predictors m and r do not include a measurement error. However, magnitudes are not measured exactly. Rhoades (1997) considers the known variance of the seismological magnitude estimation in his regression analysis for GMR. It is not considered that this known error needs not to be the only one. The actual error of the seismological estimation can be higher according to Giardini (1984). Arguments for this: The source mechanism influences the ground motion (Campbell 1981, 1993; Crouse and McGuire 1996; Sadigh et al. 1997). This acts like a measurement error of magnitudes in the GMR. An application of fewer classes of source mechanism would reduce but not eliminate it. Furthermore, the inter-event variability ξ E can be interpreted as an error in magnitudes because the seismological magnitudes can be exact for a certain aspect of the rupture process but do not need to be exact for the GMR. The actual magnitude of GMR could be a non-measurable, latent variable, which is estimated by common magnitudes with an error in the sense of statistical error-in models (Cheng and van Ness 1999, Section 1.1). The considerable differences between the estimated residual variances of GMRs for one sample of PGAs but for different magnitude scales (see, e.g., Atkinson and Boore 1995, Table 5) support this assumption. Additionally, the magnitudes of the analyzed sample could be from different scales (e.g., Bommer et al. 2007), which acts like a measurement error.

The source-to-site distance d is also treated as exactly measured predictor. But it should include an error because there are many definitions for this distance (Douglas 2003, Section 9). How could it be possible that all these measures for the same physical aspect act without a measurement error? Moreover, the distances are determined by the seismological source estimation, which also includes errors. Even if parameters of this error would be known, it would be difficult to consider it in a regression analysis [personal communication with (pcw) Douglas, spring 2013]. Beside this, the influence of the source depth is often reduced to a fixed parameter for a defined class of earthquakes (e.g., shallow events). In other GMRs (Ambraseys and Bommer 1991), h is the seismological epicenter depth. But neither is the influence of the source depth the same for every earthquake nor is the seismological depth exactly measured. In both cases, a kind of measurement error is neglected. Furthermore, it is assumed for current GMR that the parameter vector θ of the GMR is the same for each event (Joyner and Boore 1993; Abrahamson et al. 2008).

There are more estimation methods for a regression model (e.g., Rawlings et al. 1998, Section 10; Stromeyer et al. 2004). The models for unknown measurement errors of predictors (Cheng and van Ness 1999, Section 4) are not applied for GMR as far as we know. Beside this, the aspect of estimating the estimation errors of the regression parameters is not considered in all approaches. These standard errors can be easily estimated for a simple linear LS regression with iid random components (Rawlings et al. 1998, Section 4.6). But it is more difficult for models with random effects. Joyner and Boore (1993) applied the Monte Carlo simulation to estimate the estimation error; Rhoades (1997) has computed these standard errors using the likelihood function. Chen and Tsai (2002) also give a method to estimate the standard error. But Abrahamson and Young (1992) do not give any advice for this issue regarding their procedure. We draw attention here to the fact that an estimation error can be computed by the Jackknife technique (Quenouille 1956; Efron 1979). This also applies for clustered data according to Raschke (2012, 2013), as is the case for the mixed effects. The estimation error can be applied directly to construct the confidence range and verify the statistical significance of a predictor and its parameter.

2.3 The danger of over-parameterization

We could explain the entire variance of a predicted variable Y or ln(Y) by a regression model if we use a large number of predictors and related parameters, although not all predictors have an actual influence (Rawlings et al. 1998, Fig. 8.2). The question is: how can we distinguish between significant and insignificant predictors and/or parameters? Different statistical tools can solve this problem. The first one is the significance test for the regression parameters θ I in g(X) resp. g *(X) = … + θ i X i . We test here if θ i  ≠ 0, θ i  ≤ 0, or θ i  ≥ 0 for a defined significance level α (5 % is often used and recommended here). The last two variants are applied when physical reason bounds the influence of a predictor, e.g., a larger magnitude should be related to a larger PGA. In this case, we can be sure with a probability of 100 % − α that the actual parameter θ I does not have a contrary sign. The smaller α is, the more rigorous is the test. The t test is such a test (Rawlings et al. 1998, Sections 1.6 and 5.3), which has seldom been applied for GMR, e.g., by Joyner and Boore (1981), Molas and Yamazaki (1995), and Ambraseys et al. (2005, pcw Douglas March 2013). Note that the classical t test cannot be applied without modification or acceptance of inaccuracies to the case of mixed effects (clustered data). The significance of a published GMR is also examined implicitly by published standard errors of the parameter estimation. If the related quantile, corresponding with α, is not smaller/larger than 0, then it implies statistical significance. This is the case for the estimations of Joyner and Boore (1993, Table 3) and Rhoades (1997, Table 1). Applying model selection criterions in the model building (Rawlings et al. 1998, Section 7) is a further possibility for guaranteeing the statistical significance, e.g., the Akaike information criterion (AIC) or the Bayesian information criterion.

The significance has to be verified for each statistical model. Otherwise, danger of over-parameterization arises. This problem applies to a considerable amount of GMRs; we list 15 examples in Table 1. This is also an issue in other researches (Raschke and Thürmer 2008).

Table 1 Examples of GMRs without sufficient validation of significance/model selection

2.4 The test of the distribution assumption

Any statistical distribution model should be validated (D’Augustino and Stephens 1986). This also applies to the residual distribution of a GMR in PSHA although a distribution assumption is not necessary for the LS regression. A powerful goodness-of-fit test is the best method of examining the distribution assumption, as the Anderson–Darling (AD) test for a normal distribution (Landry and Lepage 1992). Contrary to the aforementioned t test, the test is the more rigorous the larger the selected significance α is. There are such tests for different distribution functions with estimated parameters (Stephens 1986). If all parameters are known, then the distribution is fully specified and the classical Kolmogorov–Smirnov (KS) test can be applied. If the KS test is applied to estimated parameters, then the test does not work (Raschke 2009). If there is not an applicable goodness-of-fit test for the distribution type used, then a quantile plot (qq plot) can be used for a visual, qualitative test as done by Dupuis and Flemming (2006) for residuals with a mixed, non-normal distribution. However, there is no objective criterion for rejecting the distribution hypothesis in this case. A histogram is a kind of parameter-free distribution model, but it is not a tool for validating a distribution model (not mentioned by D’Augustino and Stephens 1986) because there is no objective criterion for rejection and there are many possible histograms for a sample. We state that the assumption of normally distributed ξ resp. its components in Eq. (3) is often not correctly validated for GMRs. For example, Ambraseys and Bommer (1991), Ambraseys and Simpson (1996), Ambraseys et al. (1996), Atkinson and Boore (1995), Spudich et al. (1999), Douglas and Smit (2001), Atkinson (2004), and Kalkan and Gülkan (2004) have neither assumed nor tested a distribution model. Beside this, the assumed normal distribution of ξ has been tested by the inappropriate KS test in other studies (e.g., McGuire 1977; Campbell 1981; Abrahamson 1988; Monguilner et al. 2000; Restrepo-Velez and Bommer 2003). The quantile plot (e.g., Chang et al. 2001; Bommer et al. 2004) and the histogram (e.g., Atkinson 2006; Beyer and Bommer 2007; Morikawa et al. 2008) have also been applied to validate the normal distribution although these methods are not appropriate. Note that even though ξ seems to be normally distributed, it needs not to be (see Section 5). The inappropriate test of a distribution assumption is also a problem in other researches, e.g., of flood hazard in Germany (Raschke and Thürmer 2008).

3 The distribution of the maximum of a random sequence

The popular assumption for GMR for PGAs that all random components ε are log-normally distributed (ξ normally distributed, Joyner and Boore 1993; Strasser et al. 2009) is in contradiction to the extreme value theory. According to this special field of stochastics and statistics, the maximum of a sample Y = Max{Z 1,Z 2,..,Z i ,…,Z n } of iid random variables has a generalized extreme value distribution in most cases (GED, Fisher and Tippett 1928; Gnedenko 1943; Beirlant et al. 2004; de Haan and Ferreira 2006). The maximum of a sequence of non-iid random variables also has a GED under some weak conditions (Leadbetter et al. 1983, Parts II and III; Falk et al. 2011, Part III). Its CDF is written with the extreme value index γ (shape parameter), scale parameter σ, and location parameter μ

$$ G(y)= \exp \left(-{\left(1+\gamma \left(y-\mu \right)/\sigma \right)}^{\raisebox{1ex}{$-1$}\!\left/ \!\raisebox{-1ex}{$\gamma $}\right.}\right),\gamma \ne 0,\kern0.62em 1+\gamma \left(y-\mu \right)/\sigma >0, $$
(8a)
$$ G(y)= \exp \left(- \exp \left(-\left(y-\mu \right)/\sigma \right)\right),\gamma =0, $$
(8b)

with the Fréchet domain for γ > 0, the Weibull domain for γ < 0, and the Gumbel domain for γ = 0. The latter one is also called the Gumbel distribution. A fundamental property of the GED is max stability. This means that the maximum max{Y 1 , Y 2} of extreme value distributed variable Y is again extreme value distributed with the same extreme value index γ. If Y would be log-normally distributed, then max{Y 1,Y 2} would not be log-normally distributed. Neither the normal nor the log-normal distributions are max-stable. This problem is typical for the combination of the two horizontal components (Y 1, Y 2) of the earthquake record, it could be defined by a maximum max{Y 1, Y 2} (Douglas 2003, Section 6). All combinations with a maximum definition of Douglas (2003, Section 6, #2, 4, 5) result in a classical extreme value for which the GED is the natural distribution. The log-normal assumption for random component ε 0 is wrong in this case according to the state of the art of stochastics and statistics. We consider here only maxima. In case of other combinations of the horizontal components, it is also unlikely that random component ε 0 becomes log-normally distributed. We have investigated the distribution of the combination arithmetic mean, geometric mean, and vectorial addition of Gumbel distributed components ε 01 and ε 02 numerically (Appendix 2). The log-normal assumptions are rejected.

The argument of missing max stability of the log-normal assumption also applies for sub-sections of the ground motion time history. If the sub-maxima of not overlapping sub-sections of the time history are log-normally distributed, then the maximum of the entire time history cannot be log-normally distributed (exception: all sub-maxima are identical). Log-normal assumptions would also contradict all our experiences with extreme values (Hüsler et al. 2011, Raschke 2011, 2012, 2013).

We briefly investigate the possible domain of attraction for PGAs and analyze the tail of three acceleration time histories (Fig. 1). The tails are exponentially distributed, which indicates the Gumbel domain of attraction for the maxima of the accelerations (see Coles 2001, Section 4). Besides, Dupuis and Flemming (2006) have estimated a GMR with GED for the residuals of PGA with extreme value index γ ≈ 0, which also indicates the Gumbel domain.

Fig. 1
figure 1

Tails of the time series of ground acceleration a of the PEER Strong Motion Database (2000): a station: CDMG 24278, component: 090, earthquake: Northridge earthquake 01/17/94; b station: ARAKYR, component: 090, earthquake: GAZLI 5/17/76; c station: SMART1 I07, component: NS, earthquake: Taiwan

4 GMR in the PSHA as random function in geo-space

4.1 GMRs as random functions in space

A random function is a function randomly selected from a set of functions (population). Schlather (2002, Theorem 1 and its proof; we use different symbols) applies a measurable random function W(s − t)  0 to construct a max-stable process. This max-stable process max(Y(s i )) refers to the maxima at site s from all point events i with local event intensity Y(s i ) = m oi W(s i  − t). Therein, t is a source allocation resp. the source point in the sense of PSHA (not necessary a point source), being part of a homogeneous Poisson process in the geo-space and at a moment scale m o ≥ 0 with density m o −2. Its (annual) maximum max(Y(s i )) has the CDF G(y) = exp(−λ(y)) with annual exceedance function (AEF) λ(y) of annual average frequency of exceedance Y(s) ≥ y according to the limit law for extremes (Coles 2001, Section 7.3). This distribution is equivalent for different sets of random functions if their expectations E(V o) of the volume V o = ∫ s W(s − t)ds are equal. This construction can be interpreted as homogeneous seismicity with m o W(st) = g(X). The spatial distance s − t is part of the distance elements of the predicting vector X beside the event parameters in sub-vector X E of X. It is scaled by the earthquake magnitude with m = ln(m o) without influence on the shape of W(s − t) resp. g(X). The magnitude is exponentially distributed without an upper bound. According to Theorem 1 of Schlather, the AEF λ(y) is equal for different GMRs if the expectation E(V o) is equal for any fixed event parameters like magnitude m and h. Furthermore, we can extend this equivalence to m o W(s − t) = ε(s)g(X) according to Theorem 2 of Schlather (2002; pcw Kabluchko 2012) if the random component ε has the expectation E(ε) = 1. This includes that for an event is ∫  s g(X)d s ≈ ∫  s ε(s)g(X)d s if the variance and the spatial correlation of ε(s) is relative small. In all cases, we can interpret a GMR as a random function in space being an element of a set (population). Different sets of GMR act equivalent if the expectations E(V o) are equivalent. Note that the variance and the distribution type of ε have no influence on λ(y). This could be the reason why Cornel (1968) did not explicitly consider a random component in his GMR for the PSHA with exponentially distributed magnitudes without upper bound—he did not need and could have find this out by numerical researches. Non-mathematicians can check all results in the same way.

This equivalence of GMRs works only for exponentially distributed magnitudes without upper bound. We introduce the area function K(y) to derive a general equivalence of GMRs being independent of the magnitude distribution. For this purpose, we consider at first GMR g(X) in a simple one-dimensional geo-space as shown in Fig. 2b. We use an example with two maxima of g(X) to demonstrate the general application of this equivalence. For fixed event parameters and a fixed value y, the GMR covers a certain area of all points with y ≤ g(X). The area function K(y) is for homogeneous site conditions

$$ \begin{array}{lllll}K(y)={\displaystyle \int \mathbf{1}}\left(g\left(\mathbf{X}\right)\right)d\mathbf{t},\hfill & \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=1\hfill & \mathrm{for}\hfill & y\le g\left(\mathbf{X}\right),\hfill & \mathrm{otherwise}\kern0.62em \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=0.\hfill \end{array} $$
(9)
Fig. 2
figure 2

GMR and area equivalence: a g(X) in a 1D geo-space, b resulting area function K(y), c g(X) for fixed source point t and reflected version for fixed s, d 2D geo-space with isolines of g(X) for different t (light gray) and reflected isoline (dark gray) for fixed s for anisotropic point source model, e as d but for a line source

The first derivation is the related area density measuring the amount of points with y = g(X)

$$ k(y)=- dK(y)/ dy. $$
(10)

The area function is defined according to Fig. 2a, b for a fixed source point t. But we could also fix s and draw g(X) at t although it acts at site s. We reflect the GMR in this way for 1D geo-space, as shown in Fig. 2c. We have an equivalent formulation for K(y) with

$$ \begin{array}{lllll}K(y)={\displaystyle \int \mathbf{1}\left(g\left(\mathbf{X}\right)\right)}d\mathbf{s},\hfill & \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=1\hfill & \mathrm{for}\hfill & y\le g\left(\mathbf{X}\right),\hfill & \mathrm{otherwise}\kern0.62em \mathbf{1}\left(g\left(\mathbf{X}\right)\right)=0.\hfill \end{array} $$
(11)

The reflection becomes complicated for the two-dimensional geo-space, but Eqs. (911) still apply. We can illustrate the reflection for an isoline with fixed y = g(X), as shown in Fig. 2d for an anisotropic GMR for a point source and an isotropic GMR for a line source in Fig. 2e.

Now, let us assume the case of homogeneous seismicity: Each point t in the geo-space represents a source allocation with equivalent occurrence intensity ν, equivalent g(X) with X = (s,t,X E) with event parameter X E = (m,h,x i ) and its multivariate probability density function f E. The distances d resp. r of the GMR are determined by s, t, h and the source model. We formulate for the AEF λ(y) of annual average frequency of exceedance Y(s) > y

$$ \lambda (y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_{\mathbf{t}}{f}_{\mathrm{E}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=g\left(\mathbf{X}\right.\right),V\left(Y\left(\mathbf{s}\right)\right)={g}^2\left(\mathbf{X}\right)V\left(\varepsilon \left)\right)\right.\right)}}d\mathbf{t}d{\mathbf{X}}_{\mathrm{E}},\kern0.5em \mathbf{X}=\left({\mathbf{X}}_{\mathrm{E}},\mathbf{s},\mathbf{t}\right). $$
(12)

Therein, the CDF F y is parameterized by its expectation E(Y(s)) and variance V(Y(s)) according to Eq. (2). V(ε) can be influenced by X E but does not include ε Q. Equation (12) is oriented on the absolute probability integral of McGuire 1995, but there are many equivalent formulations. One includes a replacement of the integration on t by the area density k(y) and the integration on y = g(X) with

$$ \lambda (y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_zk\left(z\Big|{\mathbf{X}}_{\mathrm{E}}\right){f}_{\mathrm{E}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=z,V\left(Y\left(\mathbf{s}\right)\right)={z}^2V\left(\varepsilon \right)\right)\right)}} dzd{\mathbf{X}}_{\mathrm{E}}, $$
(13)

because the integration in the geo-space in Eq. (12) is nothing else than a computation of the amount of points with y = g(X) in the sense of measure theory (Billingsley 1995, Chapter 2). Now, it is obvious that two GMRs g 1(X) ≠ g 2(X) result in equivalent hazard with λ 1(y) = λ 2(y) if the area density is equal with k 1(y) = k 2(y) resp. K 1(y) = K 2(y). Note, all other components in Eqs. (12 and 13) are equal, including the parameterization of CDF F y by V(ε), which does not include ε Q. The equivalence of λ(y) of Eqs. (12 and 14) applies only to one site s with homogeneous seismicity in its surrounding. We introduce now an expansion of this equivalence to the influence function λ *(y). This function describes the influence of any fixed source point t to the seismic hazard of all sites s with homogenized site conditions

$$ {\lambda}^{*}(y)=\nu {\displaystyle {\int}_{{\mathbf{X}}_{\mathrm{E}}}{\displaystyle {\int}_{\mathrm{s}}{f}_{{}_{\mathrm{E}}}\left({\mathbf{X}}_{\mathrm{E}}\right)\left(1-{F}_y\left(y;E\left(Y\left(\mathbf{s}\right)\right)=g\left(\mathbf{X}\right.\right),V\left(Y\left(\mathbf{s}\right)\right)={g}^2\left(\mathbf{X}\right)V\left(\varepsilon \left)\right)\right.\right)}}d\mathbf{s}d{\mathbf{X}}_{\mathrm{E}},\kern0.5em \mathbf{X}=\left({\mathbf{X}}_{\mathrm{E}},\mathbf{s},\mathbf{t}\right). $$
(14)

This integral is basically equivalent to the integral of Eq. (12), and the principle of area equivalence also applies. Area-equivalent GMRs result in equal influence functions. Therein, homogeneity of seismicity is not required for Eq. (14); f E and ν can be source point specific. Time dependence is also possible.

What is the consequence? For an actual and area-equivalent GMR is g actual(X) ≠ g equivalent(X) for almost all X, what includes a local bias. An example is given in Fig. 3b. But ε Q = g actual(X)/g equivalent(X) of Eq. (7) is not an actual random component and is not considered in Eqs. (1214). If ε Q resp. ξ Q are interpreted as an actual random component, and the estimated residual variance from the regression analysis for the GMR is directly applied to the GMR in PSHA, then we overestimate the entire random component (residual) ε resp. ξ and the variances V(Y(s)) and by this the influence function λ *(y) resp. the entire influence of each source point t on the AEFs λ(y) of all sites. The hazard estimation of all sites is systematically overestimated in this way. This does not exclude the possibility of local underestimation of λ(y) according to Fig. 3b. The only exception of the systematic overestimation is the case if all random components ε Q, ε 0, and ε E have no influence (see above).

Fig. 3
figure 3

Construction of anisotropic GMR by an unit-isoline: a the intercept theorem with relation between distances (A′ − A)/(A − t) = (A′ − B′)/(A − B), b example of an unit-isoline with \( {d}_{\mathrm{unit}}\left(\varphi \right)=\left.0.96+0.352\right|{\left. \sin \left(2\varphi \right)\right|}^{{}^{1.5}}/ \sin \left(2\varphi \right) \) (gray regions overestimation, white regions underestimation, A and B for Fig. 4a)

Non-mathematicians can numerically check these results. Beside this, we state that GMRs could be random functions because there is no proof that all events need to have equal parameters θ in Eqs. (25).

4.2 A model of anisotropic GMR for a point source

An anisotropic GMR can be simply formulated for a point source model according to the intercept theory (Fig. 3a) by a unit-isoline which includes area π equal to the unit circle of angle functions. The radius function d unit(φ) determines the unit-isoline with azimuth φ of local polar coordinates with origin t. The distance d in r 2 = h 2+ d 2 is replaced by d * = d/d unit (φ). An example is pictured in Fig. 3b. Different unit-isolines can obviously be combined by the sum d 2 unit(φ) = ∑  i a i  ⋅ d 2 i,unit(φ) with weighting 0 ≤ a i  ≤ 1 and ∑  i a i  = 1. We do not discuss any physical interpretation because our focus is on statistical modeling and physics is also not discussed in other anisotropic models (Enescu and Enescu 2007; Sørensen et al. 2010).

4.3 Examples of area-equivalent GMRs

We illustrate the action of misinterpretation of ε Q as an actual random component in example I. We apply the unit-isoline of Fig. 3b, and we set θ 3 = 1 and θ 2 = 0 of a GMR according to Eq. (5) (see Fig. 4a). The parameters are θ 0 = θ 1 = θ s = 0 because they are not relevant here. Furthermore, we fix h = 10 km and simulate for a fixed site in the center of a source region with homogeneous seismicity as described in Appendix 3. There is no actual random component in the actual GMR because V(ξ) = 0. We have plotted ln(Y) in relation to distance r in Fig. 4b with the regression function for the isotropic, circular GMR. The estimated parameters are almost equal to the actuals. If the observed residuals are interpreted as actual random components, then the residual variance is overestimated with V(ξ) = 0.10. An interesting aspect constitutes the distance dependency of ξ Q in Fig. 2b: It increases with increasing distance. But we can also construct an example II of area equivalence with decreasing variance using the Joyner–Boore distance. In Fig. 4c, an actual and an estimated vertical projection of the rupture is pictured. The shape of both projections and the included area is equal; only the azimuth is different. Obviously, the actual and the modeled Joyner–Boore distance are area equivalent for the same GMR. But there is a component ε Q resp. ξ Q. We simulate again observations without actual random components and show these in Fig. 4d. V(ξ Q) decreases with increasing, large distances.

Fig. 4
figure 4

Area-equivalent GMR g *(X): a isotropic and anisotropic variant of example I (directions A and B according to Fig. 3b), b estimation of isotropic (circular) g *(X) for a with a Monte Carlo simulated sample, c vertical projection of example II, d estimated g *(X) for estimated projection and simulated sample

5 Numerical studies

5.1 The influence of the distribution type

We numerically investigate the influence of the type of CDF F y , which is determined by the distribution of ε resp. ξ, on the hazard curve for equivalent residual variance. For this purpose, we use again the constructed situation of seismicity according to Appendix 3 with fixed depth h = 10 km and consider different upper magnitudes m max = 7 and 9. Additionally, we consider different variances V(ξ) = 0.152 and 0.32 for log10, which are typical for previous GMRs. The applied GMR is g *(X) = 0.5m − ln(r) − 0.002r + 4.7. We consider different distributions of random component ε: Gumbel, the log-normal and truncated log-normal distribution. The latter has an upper and lower bound at three times its standard variation. The computed AEFs are shown in Fig. 5. We note that the influence of the distribution type depends on the maximum magnitude, the residual variance, and the range of y. The hazard of rare events is the largest for the log-normal distribution with high variance. Of course, further seismicity parameters influence the contribution of the distribution model, and the distribution model has no influence in case of unbounded exponentially distributed magnitudes (Section 4.1).

Fig. 5
figure 5

AEF for different distribution types: a V(ξ) = 0.152 and m max = 7, b V(ξ) = 0.152 and m max = 9, c V(ξ) = 0.32 and m max = 7, d V(ξ) = 0.32 and m max = 9 (ξ here for log10)

5.2 An example of area-equivalent GMRs in a PSHA

We research the influence of the misinterpretation of ε Q of example I (Section 4.3, Fig. 3b) on the PSHA. The GMR is g *(X) = 0.5m − ln(r) + 4.7. The considered site s again is the center of the quadratic source region of uniform seismicity with m max = 8; for further details, see Appendix 3. We compute the AEF for an isotropic (circular) GMR and the anisotropic one, both without a random component resp. residual. In the third variant, we consider the isotropic GMR and consider a normally distributed random component ξ with V(ξ) = 0.10 according to the estimation of example I in Section 3.5. The results are illustrated in Fig. 6. As expected according to the theory given in Section 4.1, the area-equivalent GMRs without random component result in equivalent hazard and the misinterpretation of the differences as random component results in the overestimated hazard.

Fig. 6
figure 6

Hazard curves for the example of misinterpreted differences of GMR (see Fig. 4a, b)

5.3 The obscuration of a Gumbel distributed random component ε 0

The Gumbel distribution of random component ε 0 (Eq. (8b)) could be hidden. If we observe/estimate the product ε 0 ε S (ε S as unknown site effect, acting as random variable according to Joyner and Boore 1993), then we cannot test the assumption of log-normal distribution for each component. But the product could be distributed similarly to a log-normal distribution. An example: ε 0 is Gumbel distributed with E(ε 0) = 1 and V(ε 0) = 0.199, ε s is beta distributed (see Eq. (22)) with E(ε s) = 1 and V(ε s) = 0.091 and with bound 0.5 ≤ ε s ≤ 2.8. The product ε 0 ε S has a mixed distribution as shown in Fig. 7a. It looks very similar to a log-normal distribution. However, their tails in Fig. 7b differ considerably. The tail is important for PSHA according to the studies of Restrepo-Velez and Bommer (2003) and Strasser et al. (2008), and it is a specially studied object in extreme value statistics (Leadbetter et al. 1983). Therein, the tail of a distribution should be modeled by the generalized Pareto distribution. Huyse et al. (2010) has already modeled the tail for the residuals of a ground motion using a generalized Pareto distribution.

Fig. 7
figure 7

Possibility of hidden Gumbel distribution: a mixed distribution of beta and Gumbel distribution and log-normal distribution with E(ln(Y)) = −0.13 and V(ln(Y)) = 0.292, b survival functions of a, c qq plot

Furthermore, we research in detail the possibility of hidden distributions for a GMR Y = ε 0 ε Sexp(0.7m − ln(r) + θ s,1 x s,1+ θ s,2 x s,2+ θ s,1 x s,2) with a site parameter exp(θ s,i ) = 0.8, 1.1, and 1.2 and indicator variables x s,i for three site types. We Monte Carlo generate a large sample (ln(Y),m,ln(r),x) with uniform distributed magnitude with 4 ≤ m ≤ 7.5, uniform distributed ln(r) with ln(5) ≤ ln(r) ≤ ln(200) and an occurrence probability of 1/3 for each site type. Then we carry out a LS regression using this sample and analyze the estimated residuals ξ to be normally distributed. We have done this for a sample size of n = 1,500. The estimated residual variance is V(ξ) = 0.293. The qq normal plot of the residuals is shown in Fig. 7c. Similar plots (see Bommer et al. 2004, Fig. 2) have been interpreted as a proof for a normally distributed ξ, and the wrong KS test would also indicate a log-normal distribution. Only the AD test would reject the false assumption. However, also this correct test would often accept the false distribution model. We state that the actual distribution of ε resp. ξ could be hidden.

5.4 The influence of the different effects on the estimation of GMR and PSHA

Now we research the effects of the combination of misinterpreted random component ε Q, incorrect distribution assumption for F y , and measurement errors in the predictors on the PSHA using examples of anisotropic GMRs with point sources. For this purpose, we assume again the constructed situation of a site s and surrounding homogeneous seismicity according to Appendix 3. The magnitude is upper bounded here by m max = 8. The seismicity parameters are precisely known for the PSHA, but the parameters of the GMRs are estimated. For the latter, regression models are estimated for Monte Carlo simulated samples of (Y,m measured,r measured). Therein, the actual hypocenter depth h is fixed and the accidental distance d to the point source is beta distributed; the related parameter depends partly on the simulated beta distributed magnitude (for details, see Appendix 3).

The measurement errors of depth h and distance d are also Monte Carlo simulated; h measured is a log-normally distributed random variable; the measured distance is d measured = |d actual + d error| (in kilometers) with normally distributed d error with a standard deviation of 5 km and an expectation of 0. A seismological epicenter or point of maximum energy could be estimated more precisely, but the seismological epicenter can differ from the epicenter of the GMR—the point of maximum g(X) resp. g *(X). We also assume a normally distributed measurement error for the magnitude with a standard deviation of 0.15 and 0.25. This is plausible according to our discussion in Section 2.2 and the magnitude errors in the PEER database (2013, NGA_Flatfile_2005Version.xls). We simulate 500 pairs of (m measured,r measured) for each sample using this procedure. Examples are illustrated in Fig. 8. These are conceivable possibilities according to actual samples (e.g., Ambraseys and Simpson 1996; Ambraseys et al. 1996; Spudich et al. 1999; Atkinson 2004; Kalkan and Gülkan 2004; Massa et al. 2008).

Fig. 8
figure 8

Examples of simulated samples

The related ground motion intensity Y = ε S ε 0 g(X) is computed by the actual pair (m.r) and the defined GMR. It is formulated by g *(X) and V(ξ 0) of Eq. (3) and is transformed to g(X) and V(ε 0) by Eqs. (4a, 4b, and 4c). Its relevant parameters are listed in Table 2. The individual random component ε 0 is Gumbel distributed (Eq. (8b)). The site-specific random component ε S is beta distributed with expectation E(ε S) = 1 with a small share to ε (see Table 2, rows 9 and 10). Anisotropy is considered by an elliptic unit-isoline (see Section 4.2 and Fig. 16b). The actual random component (residuals) ε is not log-normally distributed resp. ξ is not normally distributed.

Table 2 Investigated variants of GMRs according to Eq. (15) and the estimations (±standard error of the estimations; parameters θ i are according to Eq. (5); see also Tables 8 and 9 of “Appendix 4”)

We estimate a GMR with the LS regression for each sample of size n = 500 and test the estimated residuals ξ to be normally distributed using the KS test as done in previous researches. We repeat this 100 times for each researched variant. The averages of the estimated parameters are listed in Table 2 as well as the shares of rejection of the KS test. The false normal assumption for ξ is accepted in 68 to 98 % of the samples. The residual variances of all four variants are overestimated according to rows 10 and 11 in Table 2. Therein, the contribution of the magnitude error is small (raw 15). We show the GMRs g(X) in Fig. 9 with actual parameters and with the averages of the estimated parameters. They do not differ from each other very much, but there is a certain bias.

Fig. 9
figure 9

Actual and estimated GMRs g(X) according to Table 1, a #1 (black line) and #2 (gray line) for m = 4, b as a for m = 8, c #3 (black line) and #4 (gray line) for m = 4, d as c for m = 8 (full actual, broken estimation)

Furthermore, we compute the AEF by PSHA and for the assumed seismicity described above. We compare the influence of the actual and the estimated GMRs. We apply the averages of the estimations to the latter one. The corresponding AEFs for the constructed seismicity are depicted in Fig. 10. The actual AEFs are shown for site condition E(ε S) = 1 and the 80 % quantile of ε S. This gives an impression of the small influence of the considered site effects. Furthermore, we show an AEF for the area-equivalent isotropic GMR with the actual type and variance of F y . We state: The area-equivalence works well, as expected. The overestimated variance and the wrong log-normal assumption lead to an overestimation of the hazard for long return periods (reciprocal of exceedance frequency). The bias in parameter vector θ partly compensates this overestimation. The theoretical results of Section 4.1 are confirmed.

Fig. 10
figure 10

Estimated AEFs λ(y) for actual and estimated parameters θ, variances, and distributions, # of variant right upper corner (bold, black line actual θ and V(ξ), Gumbel, anisotropic, ε S = 1; bold, dotted, gray line actual θ and V(ξ), Gumbel, isotropic, ε S = 1; bold, broken, black line actual θ and V(ξ), Gumbel, anisotropic, 80 % quantile of ε S; bold, dotted, gray line actual θ and V(ξ), log-normal, isotropic; bold, broken, gray line estimated θ and V(ξ), log-normal; thin, black line actual θ and estimated V(ξ), log-normal; bold, gray line estimated θ and actual V(ξ), log-normal)

6 Alternative estimation of GMR

6.1 The basic concept

We have stated in Section 4.1 that the GMR of an earthquake is a random function in our model. That means that the GMR has event-specific parameters. In consequence, the parameters of the GMR can and should be estimated event-specific. Therein, we assume that it is practically impossible to find the “true” GMR being absolutely exact for each azimuth and with a perfect source model. Furthermore, we assume that an event-specific, area-equivalent GMR g(X) can be formulated and estimated by a regression model, except the variances of the actual random components. The relation of the event-specific GMRs to the event magnitudes has to be researched in this concept after a number of GMRs have been estimated. This approach has already been applied by Joyner and Boore (1981): They estimated the parameter θ 1 of Eq. (5) by a regression analysis of the pairs (m, θ 0), wherein θ 0 of our Eq. (5) is event-specific. Event-specific random component ε E resp. ξ E would be the residual variance of such secondary regression analysis. The influence of side effects can be analyzed using a posterior analysis of the estimated residuals (Morikawa et al. 2008; negligence of non-iid). Thus, event-specific randomness of the site effects could be considered. The remaining problem is to estimate V(ε 0) resp. V(ξ 0) under exclusion of any influence of ε Q resp. ξ Q. This should be possible by an analysis of the two horizontal components Y 1 and Y 2. The difference

$$ \begin{array}{l} \ln \left({Y}_1\right)- \ln \left({Y}_2\right)= \ln \left(g(X){\varepsilon}_{\mathrm{E}}{\varepsilon}_S{\varepsilon}_{01}{\varepsilon}_Q\right)- \ln \left(g(X){\varepsilon}_{\mathrm{E}}{\varepsilon}_{\mathrm{S}}{\varepsilon}_{02}{\varepsilon}_Q\right)\hfill \\ {}= \ln \left({\varepsilon}_{01}\right)- \ln \left({\varepsilon}_{02}\right)={\xi}_{01}-{\xi}_{02},E\left( \ln \left({\varepsilon}_{01}\right)- \ln \left({\varepsilon}_{02}\right)\right)=0\hfill \end{array} $$
(15)

includes only the horizontal random components ε 01 and ε 02 resp. ξ 01 and ξ 02. Therein ε 01 and ε 02 (resp. ξ 01 and ξ 02) have an equivalent distribution, and they are interdependent. If we would know the dependence structure between ε 01 and ε 02 according to Mari and Kotz (2001, Section 4, copula), then we could estimate the distribution of ε 01 and ε 02 with the difference ln(ε 01) − ln(ε 02) by statistical computations. Therefore, the dependence structure should be investigated in future researches. A differentiation by classes of magnitudes, regions, site conditions, or something else is possible because there should be enough ground motion observations to compute a large number ln(ε 01) − ln(ε 02). We cannot prove the functionality of our entire suggestion, but we estimate the GMRs of different earthquakes to demonstrate the potential of our approach.

6.2 Analysis of empirical data

We analyze the observed PGAs of nine earthquakes of the PEER Strong Motion Database (2013, files: NGA_Flatfile_2005Version.xls, NGA_Documentation.xls). We select such earthquakes with a large number of records and an event center inside the cloud of strong motion stations, which should cover the entire event area (event # 136 differs slightly). Furthermore, we consider only one event from a cluster: try to consider different regions and to cover a relatively large range of the magnitude scale. The selected earthquakes are listed in Table 3. We consider different models: a point source model with isotropic GMR, a point source model with anisotropic GMR, and (if available) the source models which lead to the constructed Joyner–Boore distance, the Campbell distance, the root-mean-squared distance (RmsD), and the closest distance to the ruptured area (ClstD). Our basic formulation is g*(X) = θ 0 − θ 2ln(r) − θ 3 r according to Eq. (5) with r 2 = d 2+ h 2 resp. r 2 = d 2 /d 2 unit(φ) + h 2 for the anisotropic point source. We use an eccentric circle and an ellipse (Fig. 16) to model anisotropy. Additionally, we consider different combinations of defined and estimated parameters. The depth parameter h can be set by the hypocenter depth or be estimated with limit h ≥ 0.1 km. The parameter θ 3 for ln(r) is estimated or set to 1; we do not consider a bound. The parameter θ 2 is either set to 0 or estimated with limit θ 2 ≥ 0. We divide the models into groups: the constructed distances, the isotropic point source with epicenter as projected point source, the isotropic point source with estimated coordinates of the point source (start values are the epicenter coordinates), and the anisotropic point source model with estimated coordinates of the point source. For each division, we select the variant of best combination of estimated/set parameters by the smallest AIC (Rawlings et al. 1998, Section 7) with sample size n and parameter number N

$$ \mathrm{AIC}= \ln \left(\widehat{V}\left(\xi \right)\right)+2N/n. $$
(16)
Table 3 Analyzed earthquakes of the PEER strong motion database

The parameters of the best models are listed in Table 10 of Appendix 5. Their AICs are given in Table 4, V(ξ) in Table 5. The constructed distances do not result in good estimations; the anisotropic point source model is frequently the best model. This may be relative because constructed distances do not exist for each event, but we consider four constructed distances and only two simple variants of anisotropy. Additionally, it may be possible that we have estimated only a local minimum of least squares for the anisotropic models; the global one could be much better. The average variance of the best models of all point source models is V(ξ) = 0.19. This also includes the site effects, but it is nevertheless significantly smaller than the residual variance of the intra-event component of NGA. For example, Abrahamson and Silva (2008, standard derivations s 1 and s 2 of Table 6, Eq. 27) have variances between 0.35 (m ≤ 5) and 0.22 (m ≥ 7). This fact indicates the advantage of our approach. Examples of the estimated GMRs are shown in Fig. 11. The graphs of the GMRs look partly very individual, which is a result of event-specific parameters. This validates the approach of individual GMRs for individual earthquakes. There are also some cases with estimated depth h = 0.1 km; our defined lower bound for h. Reason for such poor estimations is the non-regular situation in the regression model: A parameter h defines the predicting variable − distance r. Smith (1985) has mathematically researched the problem of irregularity for distribution functions; we do not know a similar research for regression models. However, this problem could be minimized by the Bayesian approach of parameter estimation: The seismological source estimation could provide a prior distribution for the depth parameter. Furthermore, the estimations for event #136, the Kocaeli, Turkey 1999 earthquake are poor. The parameter θ 3 is <0 in some estimations. However, we do not change or remove this event.

Table 4 Best AICs of the different model approaches (bolted—absolute best, cursive—best of point sources)
Table 5 Residual variances V(ξ) of g *(X) for ln(Y) of the different model approaches
Fig. 11
figure 11

Estimated GMRs g(X): a iso. point source, source coord. = epicenter, b iso. point source, estimated source coord., c an-iso. point source, estimated source coord., d Joyner–Boore distance, e Campbell distance, f RmsD, g ClstD, h legend

6.3 Area functions and site effects of the Chi-Chi earthquake

The Chi-Chi, Taiwan 1999 earthquake (#137) is the one with the largest sample size n = 420. We can use it to compare the area function of the estimated GMR to the actual one. But we cannot compute the area function directly because the records are from stations that are not uniformly distributed; there are concentrations and thinning that have to be considered. We do it using an empirical area function wherein the integration of Eq. (9) is replaced by a discrete accumulation with

$$ \begin{array}{l}\kern2em \\ {}\begin{array}{cccc}\hfill {\widehat{K}}_{\mathrm{observed}}(y)={\displaystyle {\sum}_{i=1}^n{a}_i^{*}\mathbf{1}\left({Y}_i\right)},\hfill & \hfill \mathbf{1}\left({Y}_i\right)=1\kern1em \mathrm{if}\kern0.5em {Y}_i\ge y,\hfill & \hfill \mathrm{otherwise}\kern1em \mathbf{1}\left({Y}_i\right)=0\hfill & \hfill \mathrm{and}\hfill \end{array}\end{array} $$
(17)
$$ \begin{array}{ccc}\hfill \widehat{K_{\mathrm{GMR}}}(y)={\displaystyle {\sum}_{i=1}^n{a}_i^{*}\mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right),}\hfill & \hfill \mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right)=1\kern1em \mathrm{if}\kern0.5em g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\ge y,\hfill & \hfill \mathrm{otherwise}\kern1em \mathbf{1}\left(g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right)\right)=0.\hfill \end{array} $$
(18)

We estimate these discrete steps a i * by a Voronoi analysis of the stations, and our estimated area functions are defined with indicator function (see Eq. (9)). The results are shown in Fig. 12. A good model should include a good accordance between Eqs. (17 and 18) with larger differences for larger y because of a larger influence of the random components. Additionally, a certain bias is conceivable for very small values of y because of the effect of the truncation of the geo-space by the finite sample resp. station number in Eqs. (17 and 18). A good agreement is detected for the point source model with estimated coordinates. Especially the anisotropic variant fits well, in contrary to the constructed distances.

Fig. 12
figure 12

Area functions of different GMRs for the Chi-Chi earthquake (black model, gray observed): a iso. point source with coordinates = seismo. epicenter, b iso. point source with estimated coordinates, c an-iso. point source with estimated source coordinates, d Joyner–Boore distance, e Campbell distance, f RmsD, g ClstD, h validation of the comparison of the empirical area functions (bold light gray line log-normally distributed ε, bold dotted dark gray line gamma distributed ε, thin black line Eq. (18))

The plausibility of the comparison of \( {\widehat{K}}_{\mathrm{observed}}(y) \) and \( {\widehat{K}}_{\mathrm{GMR}}(y) \) can be simply tested by generation of \( Y=\varepsilon g\left({\mathbf{X}}_i,\widehat{\boldsymbol{\uptheta}}\right) \) with the estimated GMR and Monte Carlo simulated random component ε. We do it for anisotropic GMR for a point source and simulate 100 times the entire sample and adopted the weighting a i * for \( {\widehat{K}}_{\mathrm{observed}}(y) \) by factor 0.01. We consider a log-normal and gamma distributed random component to demonstrate the generality of the approach (gamma distribution, see Johnson et al. 1994, Section 17). Therein is E(ε) = 1 and V(ε) = 0.181, what corresponds with V(ξ) = 0.166 (see Eqs. (4a, 4b, and 4c)). The results are pictured in Fig. 12 h. The approach works and the distribution type of ε is not relevant for the medium range of PGA. We also estimate the site effects for the Campbells GEOCODE of the PEER data (2013, NGA_Documentation.xls) using the expectation of the residuals (see Table 6).

Table 6 Expectations of residuals of g *(X) and the statistical significance for different site classes

6.4 Relation of specific GMRs to the magnitude

There is the need to find a relation between the event-specific GMRs and the earthquake magnitudes when the event size in the PSHA is quantified by the magnitude. We search such a relation by a statistical analysis of the relations between the parameters of the GMRs and the magnitudes. The results are shown in Fig. 13a–d. Obviously, there is not a significant relation. But we follow the idea of the max-stable random fields and compute the volume of GMRs in the geo-space. We compute the volume

$$ {V}_{\mathrm{o}}={\displaystyle {\int}_{\mathbf{s}}{g}^2\left(\mathbf{X}\right)d\mathbf{s}} $$
(19)

numerically for distance d resp. d * ≤ 1,000 km in steps of 25 m. Therein we squared the event-specific GMR because the PGA is approximately proportional to the PGV (Wald et al. 2006, Section 2.5, Eqs. (1.1–1.4)), the squared velocity is proportional to the energy, and the energy is strongly related to the magnitude. The logarithms of these volumes have a strong statistical linear relation to the magnitude according to Fig. 13e, f with a minor influence of V(ξ).

Fig. 13
figure 13

Relation of magnitude to the regression models with correlation coefficient R: a to h, b to θ 3, c to θ 0, d to V(ξ), d to volume according to Eq. (19), f as e but with V(ξ) = 0 in g(X) according to Eq. (4a)

Of course, the applied sample size is small, which causes an uncertainty of the result. But the magnitudes and the volumes are also only estimated and include an estimation error. Such errors rather disturb the observed relation. Therefore, the actual relation should be really strong. The relation is also not changed significantly if we eliminate event #136 with the poor estimations. Such a relation could be applied to GMRs in a PSHA. The distance parameters θ 2 and θ 3 in Eq. (5) could be random variables, and θ 0 is computed using the relation magnitude to volume. The previous magnitude parameter θ 1 would not be needed anymore.

7 Conclusion and outlook

We have discussed here important aspects of earlier approaches to GMR by a regression model and discovered in Section 2 that many models have not been built according to the rules of statistics regarding statistical significance, model selection, and test of the distribution assumption. But even if the log-normal assumption for residuals of ε are tested positively, it is not because the individual component ε 0 according to Eq. (7) of a PGA or another maximum value should be generalized extreme value distributed according to the extreme value statistics (Sections 3 and 5). Its domain of attraction seems to be the Gumbel one, but this issue should be examined by future researches. Our major contribution is the introduction of area equivalence of GMR for PSHA in Section 4, which implies a distinction between an appropriate prediction of the PGA for a concrete earthquake by a conventional regression model and an appropriate GMR for the PSHA. These models need not to be equal regarding the residual variances. In contrary, the residual of the regression model for the random function GMR includes the component ε Q of Eqs. (4a, 4b, and 4c). This may not apply as an actual random component in the GMR for the PSHA; otherwise, the variance V(ε) is overestimated, which leads to an overestimated hazard in the PSHA (except for one case, Section 4.1). The possible influence of the distribution of ε and the misinterpretation of ε Q have been researched in Section 5. The overestimation of hazard can be remarkable. Our numerical studies consider a broader constellation of parameters and distribution of predictors than the numerical studies of Joyner and Boore (1993) and Chen and Tsai (2002) about estimation procedures for GMRs. Nevertheless, the benefit is limited. More extensive numerical studies with different sample sizes would be needed to quantify more exactly the bias in the PSHA. Independent of this fact, we have suggested an estimation concept for GMRs in PSHA in the last section, including the independent estimation of the parameters of the individual random component ε 0. We stated that the dependence structure of the horizontal components has to be researched in the future to apply this concept. However, we were able to show that the event-specific modeling of GMR leads to smaller variances V(ξ) than earlier models. Therein the anisotropic point source approach results in the best regression models, while the constructed distances (e.g., Joyner–Boore) do not work well. The relation between the event magnitude and the GMRs is given by the integration of g 2(X) over the geo-space. Details of this relation and its consideration in PSHA should be studied in further researches. Beside this, the empirical area functions for the Chi-Chi, Taiwan 1999 earthquake confirm that the anisotropic point source approach works well. Nevertheless, we also suggest developing a detailed theory of this geo-statistical approach in the future. This also applies to the estimation of point source coordinates and depth by a regression analysis. Further statistical methods like Bayes estimation, local regression, or kernel regression could provide better estimations, and the models of extreme value statistics for the distribution tails could improve the GMR in PSHA. A large challenge for future researches is also the discovering, estimation, and/or examination of the distribution of every single random component of the GMR.