1 Introduction

Probabilistic seismic hazard analysis (PSHA) is a complicated and crucial problem of modern seismology as it is related to the effects of strong earthquakes and their consequences for the inhabitants. PSHA is applied to estimate the possible amplitudes of destructive seismic ground motion and to provide the design loads for the construction of critical structures such as dams and power plants. The main goal of such analysis is to minimize the negative effect of future strong earthquakes. Although there are several methods of PSHA (Cornell 1968; Shumilina et al. 2000; Kijko 2008), the most frequently used method is the Cornell–McGuire procedure (Cornell 1968). The theoretical foundations, formulated by Cornell and Esteva (McGuire 2008), were supplemented by the computer programs developed by McGuire (1976; 1978), which led to a method of PSHA known as the Cornell–McGuire procedure.

Ground motion variability is an important component of this method (Bender 1984; Bommer and Abrahamson 2006). This component was introduced in the Cornell–McGuire procedure to account for the effect of the scatter of the amplitude of seismic ground motion at a site (Cornell 1971) and is included in the main equation of this procedure. The common assumption is that the ground motion variability can be modelled by a random variable with a log-normal distribution (Joyner and Boore 1981). This implies that the residuals of ground motion parameters are log-normally distributed about the predicted value or, equivalently, that the residuals of the logarithms of these parameters are normally distributed. However, this hypothesis has not been reliably tested. Moreover, the assumption of log-normally distributed residuals has become a standard and as a result usually is not tested but is accepted as a given.

The evidence for a log-normal distribution was confirmed by the Kolmogorov–Smirnov (KS) test at the 90 % confidence limit (Campbell 1981). Nevertheless, although the hypothesis was not rejected by the KS test, it does not imply that the hypothesis is true. The KS test does perform well in a central part of a distribution; however, it is widely known that the test demonstrates poor sensitivity to deviations from the hypothesized distribution that occur in the tails.

The log-normal assumption is criticized in Raschke (2013), where author notes that the natural distribution for residuals of maxima, such as peak ground acceleration (PGA), is the generalized extreme value. The theory of extreme values is widely applied in the analysis of natural disasters in general and in the analysis of seismic hazard in particular. Pisarenko and Rodkin (2010) provides the results of the application of the extreme value theory for various aspects of the analysis of natural disasters.

In general, a PSHA is applied to estimate ground motions with an annual probability of exceedance down to 10−4, a typical annual exceedance probability value designated for nuclear power plant design. However, in a PSHA performed for the Yucca Mountain nuclear waste repository, probabilistic hazard curves were extrapolated to an annual exceedance probability of 10−8. The peak characteristics of ground motion corresponding to an annual exceedance probability of 10−7 were as high as \(20\ g\) for PGA and up to \(1800\,{\rm cm/s}\) for peak ground velocity (Corradini 2003; Stamatakos 2004). This instance revealed a controversy in a fundamental assumption of the modern Cornell–McGuire method. As pointed out, for example, in Abrahamson (2000), at these low annual probabilities, the hazard estimates are controlled by the tail of the distribution of the ground motion residuals. Since log-normal distribution is unbounded, extrapolation of a hazard curve leads to the unlimited increase in the amplitudes of expected ground motions, with the decrease in the annual probability of exceedance.

On the other hand, some recent studies of the results of the Global Seismic Hazard Assessment Program (GSHAP) revealed discrepancies between the observed seismicity and that predicted by the resulting maps of this program (e.g. Kossobokov and Nekrasova 2011, 2012; Wyss et al. 2012). The authors of these studies concluded that the common methods of PSHA are inadequate and need to be revised and probably modified.

One probable source of the revealed inadequacy is the assumption of the log-normal distribution of residuals of the ground motion parameter (e.g. PGA). An upper tail of the distribution of the ground motion residuals controls the behaviour of hazard curves at long return periods. Therefore, an accurate modelling of this distribution, especially at an upper tail region, is a significant problem.

The current study is methodological in nature, and its main purpose is to introduce a suitable method of studying the ground motion variability. In this study, an analysis of the distribution of the residuals of the logarithm of PGA is performed in order to select a parametric law that describes this distribution most accurately. Data obtained from the Japanese strong-motion seismograph networks were used in the study. The Japanese database was chosen mainly because of a dense net of strong-motion stations that allow obtaining enough observations. A ground motion prediction equation (GMPE) of Zhao et al. (2006) was used for the calculation of the forecast values of PGA. Statistical criteria show that the best approximation for the distribution of residuals of the logarithm of PGA is achieved with the generalized extreme value distribution (GEVD). The generalized Pareto distribution (GPD) is used to capture the behaviour of an upper tail more accurately.

The remainder of the paper is organized as follows. In Sect. 2, the methods used for data analysis are described. In Sect. 3, the main results of data analysis are presented and discussed. An implication for PSHA is demonstrated in Sect. 4. Concluding remarks are summarized in Sect. 5.

2 Methods

The method for studying the distribution of residuals is based on the sequential application of the KS test and the Akaike information criterion (AIC), and a quantile–quantile plot. The distribution of the residuals of the natural logarithm of PGA is modelled by a number of parametric distributions. The residual is defined as

$$\begin{aligned} \varepsilon = \ln (\hbox {PGA}_{\mathrm{observed}})-\ln (\hbox {PGA}_{\mathrm{predicted}}) \end{aligned}$$
(1)

where \(\hbox {PGA}_{\mathrm{observed}}\) is the observed value and \(\hbox {PGA}_{\mathrm{predicted}}\) is the value calculated by using an appropriate GMPE.

The typical GMPEs allow the calculation of median values of the ground motion parameters by using their dependence on the magnitude, source-to-site distance, local soil conditions at a site, source mechanism, and others. Such equations often have an empirical nature and are developed based on vast databases of observed values of ground motion parameters (Boore and Joyner 1982). The selection of the most appropriate GMPE is not a trivial task, and some guidance and criteria for choosing the most appropriate GMPE for the application in a PSHA for a particular site can be found in Scherbaum et al. (2009) and Arroyo et al. (2014). A comprehensive list of GMPEs developed during the period 1964–2010 is presented in Douglas (2011).

In this study, data recorded by the Japanese strong-motion seismograph networks were used. The GMPE of Zhao et al. (2006) was used for the calculation of the forecast values of PGA. This GMPE was developed for the calculation of the ground motion parameters of subduction zone earthquakes, and it allows calculating a geometrical mean of the horizontal components of PGA, or 5 % damped acceleration response spectrum.

In this study, the logistic distribution, the Student’s t-distribution, and the GEVD were considered as alternatives to the normal distribution. Following a standard notation, where μ is a location parameter and σ is a scale parameter, PDFs of these distributions can be written as follows:

The PDF of Student’s t-distribution is defined as

$$\begin{aligned} f_{\mu ,\sigma ,n}(x)= \frac{ \varGamma \big (\frac{n+1}{2}\big )}{ \varGamma \big ( \frac{n}{2}\big ) \sqrt{\pi n} \sigma } \bigg (1+\frac{1}{n} \frac{(x-\mu )^2}{\sigma ^2}\bigg )^{- \frac{n+1}{2}}, \quad x \in (-\infty ;+\infty ) \end{aligned}$$

where \(\varGamma\) is an Euler’s gamma function and n is a number of degrees of freedom.

The PDF of logistic distribution is defined as

$$\begin{aligned} f_{\mu ,\sigma }(x)= \frac{ \exp \Big (-\frac{x-\mu }{\sigma }\Big )}{\sigma \Big (1+\exp \Big (-\frac{x-\mu }{\sigma }\Big ) \Big )^2 }, \quad x \in (-\infty ;+\infty ) \end{aligned}$$

The PDF of the GEVD is defined as

$$\begin{aligned} f_{\mu ,\sigma ,\xi }= \frac{1}{\sigma } \left\{ \begin{array}{ll} \Big (1+\xi \frac{(x-\mu )}{\sigma }\Big )^{-\frac{1}{\xi }-1} \exp \Big \{ -\Big (1+\xi \frac{(x-\mu )}{\sigma }\Big )^{-\frac{1}{\xi }}\Big \}, &{} \quad {1+\xi \frac{(x-\mu )}{\sigma } \ge 0, \xi \ne 0}\\ \exp \Big \{ -\exp \Big (-\frac{x-\mu }{\sigma }\Big )-\frac{x-\mu }{\sigma } \Big \}, &{} \quad {x \in (-\infty ;+\infty ), \xi =0} \end{array}\right. \end{aligned}$$

where ξ is a shape parameter.

Statistical analysis was performed in the following order:

  • Estimation of the distribution parameters by the maximum likelihood (ML) method.

  • Testing the hypothesis that a sample belongs to the current distribution by the KS test at 0.05 significance level.

  • Calculation of the AIC for hypotheses that were accepted by the KS test.

The application of the KS test (Massey 1951) for one sample allows rejecting the distributions that do not fit the empirical data. The test statistic of this test with the Bol’shev’s amendment (Bol’shev and Smirnov 1965) is calculated by using the formula

$$\begin{aligned} S_k = \frac{6 n D_n +1}{6 \sqrt{n}} \end{aligned}$$
(2)

where \(D_n= \max {(D_{n}^{+},D_{n}^{-})}\), \(D_{n}^{+}=\max {\left\{ \frac{i}{n}-F(x_i , \theta )\right\} }\), \(D_{n}^{-}=\max {\left\{ F(x_i , \theta )- \frac{i-1}{n}\right\} }\); n is a sample size, \(x_1,\ldots ,x_n\) are elements of a sample, sorted in ascending order, and \(F(x, \theta )\) is a cumulative distribution function (CDF) of a parametric model that undergoes the test.

An attractive feature of this test is that the distribution of its test statistic itself does not depend on the underlying cumulative distribution function being tested. However, in composite hypotheses testing, when the parameters of the probability distribution are estimated on the analysed sample, the KS test loses this feature. In such instances, the conditional distribution of a test statistic depends on a number of factors (such as form of \(F(x, \theta )\), number of estimated parameters, and method of parameter estimation). Lemeshko and Lemeshko (2009) presents the updated results (tables of percentage points and models of the distributions of statistics) for nonparametric goodness-of-fit tests in testing composite hypotheses in case of using ML estimations.

The KS test rejects hypotheses for which the maximum deviation of the theoretical CDF from the empirical CDF exceeds a critical value at a given significance level.

However, the KS test alone does not allow unambiguous conclusion about which parametric model approximates the empirical distribution most accurately. Such a conclusion can be made based on the calculation of the AIC (Akaike 1974) for hypotheses that were accepted by the KS test. The criterion is defined as

$$\begin{aligned} \hbox {AIC}= -2 \ln (L)+2k \end{aligned}$$
(3)

where L is a maximized likelihood function and k is a number of parameters of the probability distribution model.

The parametric distribution for which the value of criterion is minimal is considered the best approximation among the considered alternatives for the empirical distribution.

The quantile–quantile plot allows comparing the quantiles of empirical and theoretical distributions. The conception of such a plot has emerged from the observation that for important classes of distributions, the quantiles are linearly related to the corresponding quantiles of a standard example from this class (Beirlant et al. 2004). Linearity in a graph can be easily checked by the eye and can further be quantified by means of a correlation coefficient.

3 Results and discussion

The results of the statistical analysis show that the best approximation of the distribution of residuals is achieved with the GEVD. It is important to note that a similar conclusion was reached in Dupuis and Flemming (2006) from theoretical considerations. In Dupuis and Flemming (2006), the regression analysis was performed using both the GEVD and the normal distribution as a model for the distribution of residuals; it was demonstrated that a better fit to the data and in turn more accurate acceleration estimates are obtained with the use of the GEVD. A similar conclusion in regard to the distribution of the ground motion residuals was also reached in Raschke (2013).

Figure 1 demonstrates the histogram of residuals together with the fitted PDFs.

Fig. 1
figure 1

Histogram of residuals \(\Big (\varepsilon = \ln (\hbox {PGA}_{\mathrm{observed}})-\ln (\hbox {PGA}_{\mathrm{predicted}})\Big )\) and the fitted PDFs. 1. Histogram, 2. GEVD, 3. normal, 4. Student’s t, and 5. logistic

Corresponding values of the AIC are presented in Table 1.

Table 1 Values of the AIC for considered distributions

The CDF of the GEVD is defined as

$$\begin{aligned} H_{\xi ,\mu ,\sigma }(x)=\left\{ \begin{array}{ll} \exp \Big \{-\Big (1+\xi \frac{(x-\mu )}{\sigma }\Big )^{-1/\xi }\Big \}, &{} { \xi \ne 0}\\ \exp \Big \{-\exp \Big (-\frac{x-\mu }{\sigma }\Big )\Big \}, &{} {\xi =0} \end{array}\right. \end{aligned}$$

This is generalized form, also known as the Jenkinson-von Mises representation, which combines three types of extreme value distributions. When ξ = 0, it is equivalent to the Gumbel distribution (type I), when ξ > 0, it is equivalent to the Fréchet distribution (type II), and when ξ < 0, it is equivalent to the Weibull distribution (type III) (Embrechts et al. 1997).

Figure 2 demonstrates the quantile–quantile plot for the quantiles of the GEVD. It can be seen from the plot that an upper tail of the distribution of the residuals slightly deviates from the GEVD. Accurate modelling of that tail is an important problem because it defines the hazard curve at long return periods.

Fig. 2
figure 2

Quantile–quantile plot of sample data versus quantiles of the GEVD

Therefore, the “peaks over threshold” method was applied to fit an upper tail of the distribution of residuals more precisely. This method is based on fitting the GPD to values that exceed a reasonably large threshold (Embrechts et al. 1997). The GPD arises as a limiting distribution of the excesses for a sufficiently large threshold value and is often used for modelling the tails of empirical distributions. The CDF of the GPD is defined by the following function

$$\begin{aligned} G_{\xi ,\nu ,\beta }(x)=\left\{ \begin{array}{ll} 1-\Big (1+\xi \frac{(x-\nu )}{\beta }\Big )^{-1/\xi }, &{}\quad {\xi \ne 0} \\ 1-e^{-\frac{x-\nu }{\beta }}, &{}\quad {\xi =0}\\ \end{array}\right. \end{aligned}$$

Similar to the GEVD, the GPD is also characterized by three parameters, namely location ν, scale β, and the shape parameter ξ. When the GPD is used as a model for a tail of some other distribution, its parameter ν defines the threshold from which a tail region of that distribution begins. When ξ = 0, the GPD is equivalent to the exponential distribution, when ξ > 0, the GPD has a heavy tail, and when ξ < 0, the GPD has a finite upper bound defined as \(x_F=\nu -\frac{\beta }{\xi }\). Three possible types of tail of the PDF of the GPD with different values of the shape parameter are shown in Fig. 3.

Fig. 3
figure 3

Tails of PDF of the GPD with different values of parameter ξ

There are several methods for the estimation of the shape parameter. Well-known methods such as the Hill estimator (1975) and the Pickands estimator (1975) are both based on the asymptotic properties and require a significant number of observations. Applicability of these methods to the real observations is doubtful (Pictet et al. 1998). In this study, therefore, a shape parameter was estimated by using the ML method.

The robust estimation of the shape parameter \(\hat{\xi }\) requires an optimal choice of the threshold value \(\hat{\nu }\). If too high a value of \(\hat{\nu }\) is chosen, too few exceedances and, consequently, high variance estimators will be the result. When \(\hat{\nu }\) is too small, the estimators become biased. The procedure for the optimal determination of the threshold value is proposed in Embrechts et al. (1997). This procedure utilizes the linearity of the mean excess function for the GPD, which is defined as

$$\begin{aligned} e(\hat{\nu })=E(X-\hat{\nu }|X>\hat{\nu })=\frac{\hat{\beta }+\hat{\xi } \hat{\nu }}{1-\hat{\xi }} \end{aligned}$$
(4)

This procedure suggests selecting the threshold value \(\hat{\nu }\) as a starting point of a linear segment of the mean excess graph. Such a graph for sample data is presented in Fig. 4.

Fig. 4
figure 4

Mean excess graph of sample data

By varying the threshold value and observing changes in the estimates of the rest of the parameters of the GPD, the optimal threshold value can be determined. Evidence of such a choice is the stabilization of the estimates of the scale and shape parameters. Once again, a quantile–quantile plot is used as a tool for comparing the data and the model. An estimator of the quantile of the GPD can be written as

$$\begin{aligned} {\hat{x}}_p=\hat{\nu }+\frac{\hat{\beta }}{\hat{\xi }} \bigg [\Big (\frac{n}{N_{\hat{\nu }}}(1-p)\Big )^{-\hat{\xi }}-1\bigg ] \end{aligned}$$
(5)

where n and \(N_{\hat{\nu }}\) are sample size and number of exceedances, respectively, and \(\hat{\nu }\), \(\hat{\beta }\) and \(\hat{\xi }\) are estimates of the GPD parameters.

Quantile–quantile plots of a tail of residual data versus quantiles of the GPD with the values of the shape parameter estimated by the ML method for the GEVD and the GPD are shown in Figs. 5 and 6. The estimates of a shape parameter differ for the instances where the GEVD is used as a model for a full range of residuals and where the GPD is used additionally to fit an upper tail more accurately. The estimates are \(\hat{\xi }=-0.245\) for the first instance and \(\hat{\xi }=-0.359\) for the second.

Therefore, the distribution of residuals is represented by a hybrid distribution model that consists of the GEVD in a central region and the GPD in a region of an upper tail.

Fig. 5
figure 5

Quantile–quantile plot of the tail fraction of residuals, ξ is estimated for the GEVD

Fig. 6
figure 6

Quantile–quantile plot of the tail fraction of residuals, ξ is estimated for the GPD

A similar analysis was performed during this study by using the GMPEs of Atkinson and Boore (2003) and Kanno et al. (2006) to check how generally applicable these results are. The results obtained with these GMPEs are very close to those presented in this study.

4 Implication for probabilistic seismic hazard analysis

For a demonstration of the effect of replacing a normal distribution, hazard curves were calculated in the following manner:

  1. 1.

    Using an unbounded normal distribution.

  2. 2.

    Using a normal distribution, truncated at a specified level of ground motion.

  3. 3.

    Using the GEVD.

  4. 4.

    Using a hybrid distribution model that consists of the GEVD for a central region and the GPD for an upper tail.

A brief review of the Cornell–McGuire probabilistic seismic hazard analysis procedure could be helpful for understanding the following material.

To begin with, recap of a PDF of log-normal distribution could be useful. If the distribution of a random variable is log-normal, its PDF has the following form

$$\begin{aligned} f_{\mu ,\sigma }(x)= \frac{1}{x \sigma \sqrt{2 \pi }}e^{-\frac{(\ln x-\mu )^2}{2 \sigma ^2}},\quad x>0 \end{aligned}$$

The transformation \(Y=\ln X\) leads to a normally distributed random variable with a location μ and a scale σ parameters. The values of these parameters are estimated by using an appropriate GMPE. Given an earthquake with magnitude m, the probability can be calculated that ground motion at distance r from the source will exceed a particular level a 0 by the following equation

$$\begin{aligned} P(y\ge \ln (a_0)|m,r)=\frac{1}{\sqrt{2 \pi } \sigma } \int _{a_0}^{\infty } e^{-\frac{(y-\mu )^2}{2 \sigma ^2}}\hbox {d}y \end{aligned}$$
(6)

This equation can be conveniently expressed in terms of the standard normal distribution

$$\begin{aligned} P(y\ge \ln (a_0)|m,r)=1-\varPhi (z) \end{aligned}$$
(7)

where \(z=\frac{\ln (a_0)-\mu }{\sigma }\) is a standardized normal random variable and \(\varPhi (z)\) is the standard normal CDF.

Next, consider a site surrounded by N seismic sources. Each seismic source is characterized by magnitude \(M_i\), distance to site \(R_i\), and annual activity rate \(\nu _i\). The parameters of future seismic events are yet unknown; therefore, \(M_i\) and \(R_i\) are random variables with corresponding PDFs \(f_{M_i}(m)\) and \(f_{R_i}(r)\). The total annual rate of exceedance of a particular level of ground motion a 0 can be calculated as follows

$$\begin{aligned} \lambda (y\ge \ln (a_0))=\sum _{i=1}^{N} \nu _i \int \!\!\!\!\int P(y\ge \ln (a_0)|m,r) f_{M_i}(m) f_{R_i}(r)\, \hbox {d}r \hbox {d}m \end{aligned}$$
(8)

From an assumption that the sequence of major seismic events can be modelled by the Poisson distribution, it follows that the probability for a particular level of ground motion a 0 to be exceeded at least once during the time interval T can be calculated (Anderson and Brune 1999) as follows

$$\begin{aligned} P(y\ge \ln (a_0), T) = 1 - \exp (-\lambda (y\ge \ln (a_0)) \times T) \end{aligned}$$
(9)

Equation 9 with T = 1 year defines the seismic hazard curve, the main result of the PSHA. For small values of the annual rate of exceedance (\(\lambda (y\ge \ln (a_0)) \ll 1\)), Eq. 9 can be approximated as

$$\begin{aligned} P(y\ge \ln (a_0), T = 1) = 1 - \exp (-\lambda (y\ge \ln (a_0))) \cong \lambda (y\ge \ln (a_0)) \end{aligned}$$
(10)

As emphasized in Wang (2011), T = 1 year is neglected on the right side of Eq. 10; thus, both sides of this equation contain a dimensionless quantity, i.e. the annual probability of exceedance.

As can be seen from Eq. 8, the ground motion variability is explicitly incorporated in the calculation of the seismic hazard. It is, namely, used in a calculation of the conditional exceedance probability of a ground motion of a particular level a 0.

The normal distribution is unbounded; therefore, the further a hazard curve extrapolated, the higher the level of ground motion is expected to be exceeded. The necessity of an upper bound of the ground motion and the difficulties related to its determination are summarized in Bommer et al. (2004). Strasser et al. (2004) proposed the truncation of the distribution of residuals at a level of three standard deviations above the median as a measure to prevent the effect of unbounded normal distribution.

Given a normal distribution, truncated at a value a T , the PDF has to be renormalized to satisfy the fundamental properties of PDF. Then, the probability that an earthquake with magnitude m will produce ground motion at distance r from the source that exceeds a particular level a 0 can be expressed as

$$\begin{aligned} P(y\ge \ln (a_0)|m,r)=\left\{ \begin{array}{ll} 1 - \frac{\varPhi (z)}{\varPhi (z_T)}, &{} {y\le a_T}\\ 0, &{} {y>a_T} \end{array}\right. \end{aligned}$$
(11)

where \(z_T=\frac{\ln (a_T)-\mu }{\sigma }\).

After the replacement of a normal distribution by the GEVD, the same probability can be expressed as

$$\begin{aligned} P(y\ge \ln (a_0)|m,r)= 1-H_\xi (z) \end{aligned}$$
(12)

where \(H_\xi\) is a standardized CDF of the GEVD.

And after the replacement of a normal distribution by a hybrid model, this probability can be written as

$$\begin{aligned} P(y\ge \ln (a_0)|m,r)=\left\{ \begin{array}{ll} 1-(1-p)\frac{H_\xi (z)}{H_\xi (z_\nu )}, &{} {y\le \mu + \nu }\\ p(1-G_\xi (z)), &{} {y > \mu + \nu } \end{array}\right. \end{aligned}$$
(13)

where \(z_\nu = \frac{\ln (a_\nu )-\mu }{\sigma }\), \(a_\nu =\exp (\mu +\nu )\), \(G_\xi (z)\) is a standardized CDF of the GPD, p is a fraction of the residual values that fall in a tail region.

For the purpose of demonstration, a simple hypothetical example was considered. This example is similar to an example used in Baker (2008) and assumes there are two seismic sources that may affect the site. Both sources are subduction slab sources. The first source is capable of producing an earthquake of magnitude \(m_1 = 5.5\) every 100 years (\(\nu _1 = 0.01\)) and is located at a depth of \(d_1 = 30 \ km\) and a distance of \(r_1 = 140 \ km\) from the site. The second source is capable of producing an earthquake of magnitude \(m_2 = 6.5\) every 500 years (\(\nu _2 = 0.002\)) and is located at a depth of \(d_2 = 30 \ km\) and a distance of \(r_2 = 200 \ km\) from the site. The soil conditions at a site are characterized as medium soil (\(V_{S30} = 250 \, {\rm m/s}\)). For the given combinations of parameters, a GMPE of Zhao et al. (2006) gives \(\ln (PGA)\) values of \(\mu _1 = 1.8404 \, {\rm cm/s^2}\), \(\mu _2 = 2.0233 \, {\rm cm/s^2}\) and a standard deviation \(\sigma = 0.6840\), which is a constant in this GMPE for seismic events generated by sources of identical type. With the defined earthquake scenarios, Eq. 8 simplifies to the following

$$\begin{aligned} \lambda (y\ge \ln (a_0))= \nu _1 \times P(y\ge \ln (a_0)|m_1,r_1)+\nu _2 \times P(y\ge \ln (a_0)|m_2,r_2) \end{aligned}$$
(14)

By repeating these calculations for a range of values of PGA, a total hazard curve can be constructed. Hazard curves calculated by using the above-mentioned distributions are represented in Fig. 7.

Fig. 7
figure 7

Hazard curves calculated using different parametric distributions

As can be seen, the hazard curve calculated by using the GEVD displays the highest ground motion estimates, almost down to an annual exceedance probability of 10−6 where it crosses with the hazard curve calculated by using an unbounded normal distribution. The hazard curve calculated by using a hybrid distribution model is very close to the curve calculated by using a truncated normal distribution, down to an annual exceedance probability of 10−5, after which it estimates higher ground motions and the difference gradually increases.

As can be seen from Fig. 7, the hazard curves calculated by using the GEVD and a hybrid distribution model depend strongly on the shape parameter ξ. Although the method applied for statistical analysis in this study is satisfactory, the estimations of the shape parameter can only be called preliminary. These estimations were obtained based on the strong ground motion records from a particular region and, therefore, are valid only for this particular region, which in this instance is Japan. Such analysis should be performed for a multiple number of datasets of the recordings of seismic ground motions that were induced by earthquakes of various types and magnitudes and were recorded worldwide, for a possible generalization of these results.

5 Conclusion

In this study, the distribution of the residuals of \(\ln (\hbox{PGA})\) was modelled by a number of probability distribution laws, using the database of the strong-motion seismograph networks of Japan and a GMPE of Zhao et al. (2006). The results of the analysis indicate that the best approximation for the distribution of residuals was obtained with the GEVD. This result is consistent with the conclusions of Dupuis and Flemming (2006) and Raschke (2013). The “peaks over threshold” method was applied in an attempt to model an upper tail of the distribution of residuals more precisely. Thus, the resulting distribution of residuals is a hybrid model that consists of the GEVD in a central region and the GPD in a region of an upper tail. Similar analysis was performed during this study by using GMPEs of Atkinson and Boore (2003) and Kanno et al. (2006), which demonstrated analogous regularities.

The estimations of the shape parameter of the GEVD and the GPD resulted in negative values, indicating that the distribution of residuals has a finite upper bound. Consequently, a maximum value of PGA can be associated with an earthquake scenario involved in the PSHA. This approach is preferred to the truncation procedures proposed in Strasser et al. (2004), because a maximum value of PGA, unlike the truncation of a distribution, has a clear physical meaning.

Hazard curves were calculated for a simple hypothetical example to demonstrate the effect of the replacement of the normal distribution. Hazard curves were calculated by using the GEVD and a hybrid distribution model, which differ from each other and from the curves calculated by using the normal distribution. This difference is particularly evident at low annual exceedance probabilities.