1 Introduction

This short note is a comment on the paper “Characterization of the Tail of the Distribution of Earthquake Magnitudes by Combining the GEV and GPD Descriptions of Extreme Value Theory” by Pisarenko et al. (2014, hereafter referred to as Pisarenko et al.). In a continuation of the research of Pisarenko et al. (2008), the authors suggest applying both the generalized extreme value distribution (GEVD) and the generalized Pareto distribution (GPD) to the distribution of extreme magnitudes in order to estimate the upper bound magnitude and the quantiles of the maximum magnitude of a defined time period. They also present the cumulative distribution function (CDF) of the truncated exponential distribution (TED) as a distribution model for earthquake magnitudes (see, e.g., Cosentino et al. 1977). This is a popular model that is often applied in hazard models, e.g., for the Euro-Mediterranean region (Seismic Hazard Harmonization in Europe [SHARE] 2014). The CDF of the TED is written as

$$F\left( x \right) = \begin{array}{*{20}c} {0;} & {x \, < \, m_{\hbox{min} } ;} \\ {\frac{{1 - \exp ( - \beta \left( {x\,- \,m_{\hbox{min} } } \right))}}{{1 - \exp \left( { - \beta \left( {m_{\hbox{max} } - m_{\hbox{min} } } \right)} \right)}}; \quad} & {m_{\hbox{min} } \, \le x \, \le \, m_{\hbox{max} } ;} \\ {1;} & {x\,>\, m_{\hbox{max} } .} \\ \end{array}$$
(1)

Here, the exponential function is preferred rather than the power function with base 10, as the exponential function is normally used in mathematics (e.g., Hannon and Dahiya 1999). The parameter m max represents the upper bound magnitude, and its estimation has been the subject of many studies (Pisarenko et al. 1996; Kijko and Graham 1998; Raschke 2012).
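The following minimal sketch evaluates the CDF of Eq. (1) numerically. The function name `ted_cdf` and the values of `m_min` and `m_max` are hypothetical and chosen only for illustration; β = 2.3 is the typical value for earthquake magnitudes mentioned later in this comment.

```python
import math

def ted_cdf(x, beta, m_min, m_max):
    """CDF of the truncated exponential distribution, Eq. (1)."""
    if x < m_min:
        return 0.0
    if x > m_max:
        return 1.0
    # -expm1(-t) equals 1 - exp(-t) and stays accurate for small t
    num = -math.expm1(-beta * (x - m_min))
    den = -math.expm1(-beta * (m_max - m_min))
    return num / den

# Illustrative parameters only (not taken from any catalog)
beta, m_min, m_max = 2.3, 4.0, 8.0
for m in (4.0, 5.0, 6.0, 8.0):
    print(m, ted_cdf(m, beta, m_min, m_max))
```

With a wide magnitude range such as this one, the denominator is close to 1, which is why the TED is nearly indistinguishable from an untruncated exponential distribution over most of its range.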

As mentioned previously, Pisarenko et al. suggest applying the GEVD and the GPD for the estimation of the upper bound magnitude, and explain the link between GEVD and GPD. I present this link in a more straightforward and transparent manner in the following section. I also explain why these models and methods of extreme value statistics do not work well in the case of the TED and other truncated exponential distributions. In Sect. 3, I comment on various issues concerning the parameter estimation for GPD and GEVD by Pisarenko et al.

Additionally, Pisarenko et al. decluster the earthquake catalogs and consider only the main events before applying extreme value analysis to the empirical earthquake data. This is based on their assumption that the event occurrence must follow a homogeneous Poisson process. This is, in fact, not necessary for applying extreme value models, as I will explain in Sect. 4. Finally, my comments are summarized in Sect. 5.

The notation of extreme value theory and statistics used in the following sections is provided in Table 1 of the appendix, which also includes the corresponding symbols employed by Pisarenko et al.

2 Comments on the Extreme Value Theory

The first important results of extreme value theory were achieved by Fisher and Tippett (1928) and Gnedenko (1943). Today, this theory is a well-established field within probability theory and mathematical statistics. There are numerous available studies dealing with many aspects of this field (e.g., Leadbetter et al. 1983; de Haan and Ferreira 2006; Falk et al. 2011). One fundamental aspect of extreme value theory is the distribution of peaks over threshold (POT, see, e.g., Coles 2001, Chap. 4; Beirlant et al. 2004a, b, Sect. 5.3), which is defined as

$$Y = X - x_{\text{threshold}} , Y > 0,$$
(2)

with a real-valued random variable X and the excess Y over a certain threshold. The threshold acts in the same way as m min in Eq. (1). Under certain conditions, the CDF of Y increasingly approximates the GPD as the threshold increases. The CDF of the GPD is written as

$$H\left( x \right) = \begin{array}{*{20}c} {1 - \left( {1 + \gamma x/\sigma^{*} } \right)^{{ - \frac{1}{\gamma }}} ,} & {\gamma \ne 0, \; x > 0{\text{ and}} \quad x < - \sigma^{*} /\gamma \quad {\text{if}} \; \gamma < 0 } \\ {1 - \exp \left( { - x/\sigma^{*} } \right)} & {\gamma = 0, \; x > 0, \quad } \\ \end{array} ,$$
(3)

where σ* is the scale parameter and γ is the extreme value index (also called the tail index). The Weibull case, which has a finite right endpoint, corresponds to γ < 0; the Gumbel case corresponds to γ = 0; and the Fréchet case corresponds to γ > 0. The Gumbel case also represents the exponential distribution (ED). There, the scale parameter σ* is equal to the reciprocal 1/β of the parameter β of the TED of Eq. (1) with an infinite m max.
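The three cases of Eq. (3) can be sketched in a few lines. The function name `gpd_cdf` is hypothetical, and the handling of x ≤ 0 and of values beyond the finite endpoint (returning 0 and 1, respectively) is an assumption added to make the function total.

```python
import math

def gpd_cdf(x, gamma, sigma_star):
    """CDF of the generalized Pareto distribution, Eq. (3)."""
    if x <= 0:
        return 0.0
    if gamma == 0.0:
        # Gumbel case: exponential distribution with scale sigma_star
        return -math.expm1(-x / sigma_star)
    if gamma < 0 and x >= -sigma_star / gamma:
        # Weibull case: at or beyond the finite right endpoint
        return 1.0
    return 1.0 - (1.0 + gamma * x / sigma_star) ** (-1.0 / gamma)

# gamma = -1 gives a uniform distribution on (0, sigma_star)
print(gpd_cdf(1.0, -1.0, 2.0))
# gamma = 0 with sigma_star = 1/beta is the exponential distribution
print(gpd_cdf(1.0, 0.0, 1.0 / 2.3), 1.0 - math.exp(-2.3))
```

The first call illustrates that γ = −1 yields a uniform distribution, which is relevant for the tail of the TED discussed below.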

If the tail of a distribution is a member of one of the domains of attraction of the GPD, then the CDF of the block maxima

$$Z = \hbox{max} \left\{ {X_{1} ,X_{2} , \ldots ,X_{n} } \right\}$$
(4)

can be approximated by the GEVD in the case of a large block (sample) size n (see, e.g., Beirlant et al. 2004a, b, Sect. 5.1), with CDF

$$G\left( x \right) = \begin{array}{*{20}c} {\exp \left( { - \left( {1 + \gamma (x - \mu )/\sigma } \right)^{{ - \frac{1}{\gamma }}} } \right),} & {\gamma \ne 0, x > \mu - \frac{\sigma }{\gamma } \quad {\text{if}} \; \gamma > 0 \; {\text{otherwise}} \; x < \mu - \frac{\sigma }{\gamma }} \\ {\exp \left( { - \exp \left( { - \frac{x - \mu }{\sigma }} \right)} \right), } & {\gamma = 0} \\ \end{array} .$$
(5)

The block size can also be determined by the defined length of time of a given period of observation. The actual and exact distribution of the block maximum Z is simply formulated for independent and identically distributed random variables X with

$$G\left( x \right) = F(x)^{n} .$$
(6)

The probability that the maximum Z ≤ x is equal to the probability of no realization with X > x in the block (sample). In the case of a large n, we can apply the Poisson approximation (see, e.g., Falk et al. 2011, Part I). The number m of realizations with X > x is binomially distributed, and this binomial distribution can be approximated by the Poisson distribution, so that the probability of m = 0 can be written as

$$G\left( x \right) = \Pr \left\{ {m = 0\left| {X > x} \right.} \right\} = \exp \left( { - n\left( {1 - F(x)} \right)} \right).$$
(7)
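The quality of the Poisson approximation G(x) = exp(−n(1 − F(x))) relative to the exact expression F(x)^n of Eq. (6) can be checked numerically. The sketch below assumes an exponential distribution with β = 2.3 for F; the function names, the grid, and the block sizes are hypothetical choices for illustration.

```python
import math

def exact_max_cdf(F_x, n):
    """Exact CDF of the block maximum, Eq. (6)."""
    return F_x ** n

def poisson_max_cdf(F_x, n):
    """Poisson approximation exp(-n (1 - F(x)))."""
    return math.exp(-n * (1.0 - F_x))

def max_abs_gap(n, beta=2.3, steps=2000, x_hi=20.0):
    """Largest absolute difference between both CDFs on a grid of x values."""
    gap = 0.0
    for i in range(steps + 1):
        x = x_hi * i / steps
        F_x = -math.expm1(-beta * x)  # exponential CDF 1 - exp(-beta x)
        gap = max(gap, abs(exact_max_cdf(F_x, n) - poisson_max_cdf(F_x, n)))
    return gap

for n in (10, 100, 1000):
    print(n, max_abs_gap(n))
```

The gap shrinks as the block size n grows, which is the sense in which the approximation requires a large n.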

Furthermore, in the case of a large sample size n, n and F(x) can be replaced by the (average) number of excesses n threshold and the GPD H of the excess x − x threshold, respectively. Hence, we obtain [bounds of x according to Eqs. (5, 9a–9c)]

$$G\left( x \right) = \begin{array}{*{20}c} {\exp \left( { - n_{\text{threshold}} \left( {1 + \gamma (x - x_{\text{threshold}} )/\sigma^{*} } \right)^{{ - \frac{1}{\gamma }}} } \right),} & {\gamma \ne 0} \\ {\exp \left( { - n_{\text{threshold}} \exp \left( { - \frac{{x - x_{\text{threshold}} }}{{\sigma^{*} }}} \right)} \right),} & {\gamma = 0} \\ \end{array} .$$
(8)

This equation is equivalent to Eq. (5) with the following parameter transformation

$$\sigma = n_{\text{threshold}}^{\gamma } \sigma^{*} ,$$
(9a)
$$\mu = x_{\text{threshold}} - \sigma^{*} \left( {1 - n_{\text{threshold}}^{\gamma } } \right)/\gamma \quad {\text{if}} \; \gamma \ne 0,$$
(9b)
$$\mu = x_{\text{threshold}} + \sigma^{*} \log \left( {n_{\text{threshold}} } \right) \quad {\text{if}} \; \gamma = 0,$$
(9c)

and the extreme value index γ is the same. This transformation is already well known and needs no further explanation (see, e.g., Coles 2001, Chap. 4). Note that a distribution that belongs to a domain of attraction of the GEVD and GPD has only one exact asymptotic extreme value index γ, although different estimates of it can be obtained.
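The equivalence of Eq. (8) and Eq. (5) under the transformation of Eqs. (9a)–(9c) can be verified numerically. In the sketch below, all function names and parameter values are hypothetical; outside the support, the functions return 0 or 1 according to the bounds stated with Eq. (5).

```python
import math

def gev_cdf(x, gamma, mu, sigma):
    """GEVD CDF, Eq. (5)."""
    if gamma == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    t = 1.0 + gamma * (x - mu) / sigma
    if t <= 0.0:
        return 0.0 if gamma > 0 else 1.0  # outside the support
    return math.exp(-t ** (-1.0 / gamma))

def pot_max_cdf(x, gamma, x_thr, sigma_star, n_thr):
    """CDF of the maximum expressed with GPD parameters, Eq. (8)."""
    if gamma == 0.0:
        return math.exp(-n_thr * math.exp(-(x - x_thr) / sigma_star))
    t = 1.0 + gamma * (x - x_thr) / sigma_star
    if t <= 0.0:
        return 0.0 if gamma > 0 else 1.0  # outside the support
    return math.exp(-n_thr * t ** (-1.0 / gamma))

def gpd_to_gev(gamma, x_thr, sigma_star, n_thr):
    """Parameter transformation of Eqs. (9a)-(9c); returns (mu, sigma)."""
    if gamma == 0.0:
        return x_thr + sigma_star * math.log(n_thr), sigma_star
    sigma = n_thr ** gamma * sigma_star
    mu = x_thr - sigma_star * (1.0 - n_thr ** gamma) / gamma
    return mu, sigma

# Hypothetical values: threshold 5.0, GPD scale 0.5, 20 expected excesses
for gamma in (-1.0, -0.25, 0.0, 0.3):
    mu, sigma = gpd_to_gev(gamma, 5.0, 0.5, 20.0)
    for x in (5.1, 5.3, 5.45, 6.1, 7.3):
        diff = gev_cdf(x, gamma, mu, sigma) - pot_max_cdf(x, gamma, 5.0, 0.5, 20.0)
        print(gamma, x, diff)
```

The printed differences are zero up to floating-point rounding for all four values of γ.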

A crucial point is the convergence speed: how fast does the GEVD approximate the distribution of block maxima in Eq. (6) and the GPD approximate the actual tail of the distribution? For example, the exponential distribution is equal to the GPD with an extreme value index of γ = 0, and the tail of an ED is again an ED. The convergence speed is therefore infinite. Correspondingly, Leadbetter et al. (1983) showed that the block maxima of an exponentially distributed random variable converge to a GEVD with an extreme value index of γ = 0. For application of the GEVD in this case, no large block size is needed (Beirlant et al. 2004a, b, Fig. 2.9).

The TED is similar to the exponential distribution when the upper bound m max is relatively high. This applies in many cases, such as the magnitude distribution according to the well-known Gutenberg–Richter law. According to Leadbetter et al. (1983), the extreme value index of the TED is γ = −1, which means that the POT of a TED converges to a uniform distribution. However, the convergence is often poor due to the similarity between ED and TED. Figure 1 presents the POTs of various TEDs with parameter x threshold = m min of Eqs. (1, 2) and the corresponding GPD with an extreme value index of γ = −1. The scale parameter of the GPD is determined by σ* = m max. While it is obvious that different TEDs with equal upper bounds have the same asymptotic tail distribution, the approximation of the TEDs by a GPD works only for very small differences of m max − m min in relation to the parameter β (β ≈ 2.3 in the case of earthquake magnitudes). Similarly, the block size must be very large for the approximation of Eq. (6) with the GEVD to work for the TED. This is in line with previous results by Raschke (2012, Sect. 2.6), wherein the upper bound m max can be better estimated by the methods described by Pisarenko et al. (1996), Kijko and Graham (1998), and Raschke (2012) than by extreme value statistics.

Fig. 1 Demonstration of the poor convergence speed of the tail of the TED to the GPD for different scale parameters β of the TED and upper bounds m max, with m min = 0. In all cases, the GPDs have an extreme value index of γ = −1 and the same upper bound as the TEDs: (a) m max = 2, (b) as (a) but with logarithmic scale, (c) m max = 0.5, (d) as (c) but with logarithmic scale, (e) m max = 0.2, (f) as (e) but with logarithmic scale
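The effect can also be quantified with a short numerical sketch. It compares the TED with m min = 0 to the GPD with γ = −1 and σ* = m max, which is a uniform distribution on (0, m max), and reports the largest absolute difference between the two CDFs. The function names and the grid are hypothetical; β = 2.3 and the three upper bounds are those mentioned in the text and the figure caption.

```python
import math

def ted_excess_cdf(y, beta, width):
    """TED CDF of Eq. (1) with m_min = 0 and m_max = width."""
    return math.expm1(-beta * y) / math.expm1(-beta * width)

def uniform_cdf(y, width):
    """GPD of Eq. (3) with gamma = -1 and sigma* = width."""
    return y / width

def max_gap(beta, width, steps=1000):
    """Largest absolute CDF difference on an equidistant grid over (0, width)."""
    return max(
        abs(ted_excess_cdf(width * i / steps, beta, width)
            - uniform_cdf(width * i / steps, width))
        for i in range(steps + 1)
    )

for width in (2.0, 0.5, 0.2):
    print(width, max_gap(2.3, width))
```

The difference decreases as the range m max − m min shrinks, but it is still large for ranges that are typical in applications to earthquake magnitudes.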

Of course, there are alternatives to the TED model for the distribution of magnitudes. The generalized truncated exponential distribution (GTED) model recently formulated by Raschke (2014) is one such alternative. In all cases, the magnitude distribution is similar to the ED (as a GPD with an extreme value index of γ = 0) in a large share of the definition range of the random variable (cf. the Gutenberg–Richter law). This implies a poor convergence of the upper tail of the magnitude distribution to the corresponding asymptotic GPD with an extreme value index of γ << 0, and hence I strongly advise against applying classical extreme value statistics for the approximation of earthquake magnitudes.

3 Comments on Statistical Inference

Pisarenko et al. apply the GEVD using the block maxima method and the GPD using the POT method to estimate the upper bound magnitude and the quantile of the maximum magnitude for a defined time period. In both cases, the procedures they use are confusing and inconsistent with those of classical extreme value statistics. Pisarenko et al. consider different lengths T of time periods for the block maxima method. This is unusual, but possible, and is analogous to the classical POT analysis with different thresholds. The authors apply the moment method for estimating GEVD parameters, referring to Pisarenko et al. (2008), which departs from common practice in extreme value statistics as presented, e.g., by Coles (2001, Sect. 3.3) and Beirlant et al. (2004a, b, Sect. 5.1). Based on these references, Pisarenko et al. should have used the probability-weighted moment (PWM) and maximum likelihood (ML) methods. The authors justify their choice by noting the better estimation results in Pisarenko et al. (2008). This argument is weak, however, because Pisarenko et al. (2008) did not investigate the asymptotic behavior of the moment method, and only numerically investigated the case of an extreme value index of γ = −0.2. In addition, the PWM and ML methods do not work well for an extreme value index of γ < −0.5 (Coles 2001, Sect. 3.3.1), as is the case for the TED with γ = −1. The ML method has a large bias (Hosking et al. 1985, Fig. 2), and the PWM method has lower efficiency than the ML method (Hosking et al. 1985, Fig. 4). The authors do not present an argument for why the moment method could work better for small extreme value indices with γ ≤ −0.25.

Pisarenko et al. also apply the ML method to estimate the Kolmogorov distance for the GEVD [their Eq. (26)], which is not consistent with their point estimation using the moment method. The Kolmogorov distance is used by the authors to detect the optimal threshold of the POT analysis. This distance is the test statistic of the Kolmogorov–Smirnov goodness-of-fit test. The idea of applying goodness-of-fit statistics for threshold selection is not new. The procedure described by Goegebeur et al. (2008), for example, includes goodness-of-fit statistics, is based on a stringent theory, and is validated by numerical investigation of different situations. Pisarenko et al., however, provide no such validation.

POT analysis is generally used in extreme value statistics for estimating an extreme value index, as the sample size is normally larger than that of the block maxima. Further, the GPD has only two parameters. Both properties result in a smaller estimation error of the extreme value index than the estimation using the block maxima. As mentioned previously, the authors estimate the parameters of the GPD in POT analysis using the ML method. In the case of a small sample size, the ML method does not work well for a small extreme value index of γ ≤ −0.25 (Hüsler et al. 2011, Figs. 3, 5). In addition, the equations of the ML method do not have a solution for every sample in the case of γ < −0.5 according to Grimshaw (1993). The estimation method described by Hüsler et al. (2011, 2014) has no such limitation, and is hence recommended for a small extreme value index. It is important to remember that the extreme value index of the TED is γ = −1.

A further crucial point is the error quantification in the Pisarenko et al. estimation procedure, which includes a bootstrap, an averaging, and a Monte Carlo simulation for applying the GEVD and GPD. As their final estimate, the authors use the average of the estimated parameters for different block lengths in the case of a GEVD and different thresholds in the case of a GPD. I do not see the advantage in such averaging. On the contrary, if simple point estimation is used, the asymptotic variance–covariance matrix of the estimation method can be used for quantifying the estimation error (Hüsler et al. 2011, Chap. 2.1). The authors also quantify the standard deviation of the estimation error in a different manner, in which they apply reshuffling according to the bootstrap method using 100 reshuffled samples, which would seem to be too few according to DasGupta (2008). The standard deviations (standard errors) shown in their Figs. 2–8 are computed using this procedure. Confusingly, there are different standard deviations for the lower and upper lines in these figures (e.g., Fig. 6, T = 7), although there is only one standard deviation. Furthermore, Pisarenko et al. compute the standard deviation and the mean squared error (MSE) using Monte Carlo simulations with 500 generated samples of a GEVD and GPD in the case of POT analysis. The resulting standard deviations differ widely between the bootstrap and Monte Carlo simulations. For example, for the GEVD of the Harvard catalog, we have an MSE(γ) = 0.047 for T = 80 days, which means a standard deviation of Std(γ) ≈ 0.21. The standard deviation in the authors' corresponding Fig. 2, however, has the value Std(γ) = 0.02–0.025 for T = 75 days, which is only a small fraction of 0.21. I strongly reject their interpretation that "this is not surprising since the latter gives only the scatter conditional to the same unique data sample", as the bootstrap method is specifically used for quantifying the error distribution and its corresponding standard error (DasGupta 2008). Such a sizable difference may be an indicator that approximating the distribution of extreme magnitudes by GEVD and GPD does not work well.

This leads again to the key issue of poor convergence speed in applying GPD and GEVD to earthquake magnitudes. As mentioned above, the GEVD and corresponding estimation methods do not work well for the TED (Raschke 2012). The upper bound magnitude should be estimated using the methods described by Pisarenko et al. (1996), Kijko and Graham (1998), Hannon and Dahiya (1999), or Raschke (2012), where the quantile of a maximum magnitude of a defined time period can be computed by the inverse of Eq. (7). The inherent limitations of the extreme value theory and statistics also apply to other distribution models for magnitudes, as they are very similar to the ED, according to the Gutenberg–Richter law.
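As a sketch of the last step, the quantile of the maximum magnitude follows from solving G(x) = exp(−n(1 − F(x))) = p for x, with F being the TED of Eq. (1). The function names and parameter values below are hypothetical. Interpreting n as the expected number of events with magnitude of at least m min in the period is an assumption of this sketch, as is returning m min when p < exp(−n).

```python
import math

def ted_quantile(u, beta, m_min, m_max):
    """Inverse of the TED CDF of Eq. (1) for 0 <= u <= 1."""
    c = -math.expm1(-beta * (m_max - m_min))  # 1 - exp(-beta (m_max - m_min))
    return m_min - math.log1p(-u * c) / beta

def max_magnitude_quantile(p, n, beta, m_min, m_max):
    """Magnitude x with G(x) = p for G(x) = exp(-n (1 - F(x)))."""
    u = 1.0 + math.log(p) / n  # required value of F(x)
    if u < 0.0:
        # G(m_min) = exp(-n) already exceeds p (assumed convention)
        return m_min
    return ted_quantile(u, beta, m_min, m_max)

# Illustrative values only: 50 expected events above m_min = 4 in the period
for p in (0.5, 0.9, 0.99):
    print(p, max_magnitude_quantile(p, 50.0, 2.3, 4.0, 8.0))
```

The quantiles approach the upper bound `m_max` as p tends to 1, so they depend directly on how well the upper bound magnitude has been estimated.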

4 Is a Homogeneous Poisson Process Needed?

Finally, I want to point out that earthquake data do not need to occur as a homogeneous Poisson process in order to apply extreme value theory and statistics. Pisarenko et al. decluster the earthquake catalog and use only the main events to provide a homogeneous Poisson process with independent magnitudes, which is similar, e.g., to Zöller et al. (2014). But a homogeneous Poisson process is not a necessary condition for applying GEVD and GPD. A homogeneous process would imply that the number of events with a magnitude over a defined threshold would be Poisson-distributed (Johnson et al. 1994, p. 553), but the GEVD works asymptotically for different distributions of excess numbers. A simple example is the maxima of an exponentially distributed random variable with a fixed, not Poisson-distributed, block (sample) size that converges quickly to a GEVD (Beirlant et al. 2004a, b, Fig. 2.9). Leadbetter et al. (1983, Part II) and Falk et al. (2011, Part III) provide further extensions. The GEVD can also be applied for random variables with serial correlation under certain conditions. The extremal index, an additional parameter, compensates for the influence of the serial correlation.

In addition, it has been shown that the parameters of the GPD can be estimated in a POT analysis even if there is serial correlation between the members of excess clusters (Raschke 2013a). In this case, the estimation error can be quantified using the jackknife method. The crucial point of earthquake magnitudes is the poor convergence of their tail to the GPD, not the earthquake process in time.

5 Conclusions

In this comment, I have discussed important aspects of the research conducted by Pisarenko et al. from the perspective of extreme value statistics and theory. In summary, I advise against using the procedures as applied by Pisarenko et al. GPD and GEVD work well only for the extremes of TED or GTED when the block size is very large and/or the threshold is very close to the upper bound magnitude. The crucial point of earthquake magnitudes is the poor convergence of their upper tail to the GPD. Therefore, the classical methods for estimating the upper bound of the TED should be applied as shown by Pisarenko et al. (1996), Kijko and Graham (1998), Hannon and Dahiya (1999), and Raschke (2012). The appropriateness of these methods for other distribution models such as the GTED should be examined in further research. Corresponding parameters of the maximal magnitudes such as the quantile of random block maxima of a defined time period can be computed by the inverse of Eq. (7) in all cases.