Abstract
In this short note, I comment on the research of Pisarenko et al. (Pure Appl. Geophys 171:1599–1624, 2014) regarding extreme value theory and statistics in the case of earthquake magnitudes. The link between the generalized extreme value distribution (GEVD) as an asymptotic model for the block maxima of a random variable and the generalized Pareto distribution (GPD) as a model for the peaks over threshold (POT) of the same random variable is presented more clearly. Pisarenko et al. (Pure Appl. Geophys 171:1599–1624, 2014) have neglected to note that the approximations by GEVD and GPD work only asymptotically in most cases. This is particularly the case with the truncated exponential distribution (TED), a popular distribution model for earthquake magnitudes. I explain why the classical models and methods of extreme value theory and statistics do not work well for truncated exponential distributions. Consequently, the classical methods for estimating the upper bound of the TED, rather than extreme value statistics, should be used for the estimation of the upper bound magnitude and corresponding parameters. Furthermore, I comment on various issues of statistical inference in Pisarenko et al. and propose alternatives. I argue why GPD and GEVD would work for various types of stochastic earthquake processes in time, and not only for the homogeneous (stationary) Poisson process as assumed by Pisarenko et al. (Pure Appl. Geophys 171:1599–1624, 2014). The crucial point of earthquake magnitudes is the poor convergence of their tail distribution to the GPD, and not the earthquake process over time.
1 Introduction
This short note is a comment on the paper “Characterization of the Tail of the Distribution of Earthquake Magnitudes by Combining the GEV and GPD Descriptions of Extreme Value Theory” by Pisarenko et al. (2014, hereafter referred to as Pisarenko et al.). In a continuation of the research of Pisarenko et al. (2008), the authors suggest applying both generalized extreme value distribution (GEVD) and generalized Pareto distribution (GPD) to the distribution of extreme magnitudes for estimating the upper bound magnitude and the quantiles of the maximum magnitude of a defined time period. They also present the cumulative distribution function (CDF) of the truncated exponential distribution (TED) as a distribution model for earthquake magnitudes (see, e.g., Cosentino et al. 1977). This is a popular method often applied in hazard models, e.g., for the Euro-Mediterranean region (Seismic Hazard Harmonization in Europe [SHARE] 2014). The CDF of the TED is written as
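In the usual parametrization, and assuming that β denotes the rate parameter and $m_{\min}$ the lower bound magnitude as in the remainder of this note (the exact typesetting of Eq. (1) is an assumption here), the CDF reads

$$F(m) = \frac{1 - \exp\left(-\beta\,(m - m_{\min})\right)}{1 - \exp\left(-\beta\,(m_{\max} - m_{\min})\right)}, \qquad m_{\min} \le m \le m_{\max}. \tag{1}$$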
Here, the exponential function is preferred over the power function with base 10, as the exponential function is normally used in mathematics (e.g., Hannon and Dahiya 1999). The parameter $m_{\max}$ represents the upper bound magnitude, and its estimation has been the subject of many studies (Pisarenko et al. 1996; Kijko and Graham 1998; Raschke 2012).
As mentioned previously, Pisarenko et al. suggest applying the GEVD and the GPD for the estimation of the upper bound magnitude, and explain the link between GEVD and GPD. I present this link in a more straightforward and transparent manner in the following section. I also explain why these models and methods of extreme value statistics do not work well in the case of the TED and other truncated exponential distributions. In Sect. 3, I comment on various issues concerning the parameter estimation for GPD and GEVD by Pisarenko et al.
Additionally, Pisarenko et al. decluster the earthquake catalogs and consider only the main events before applying extreme value analysis to the empirical earthquake data. This is based on their assumption that the event occurrence must follow a homogeneous Poisson process. This is, in fact, not necessary for applying extreme value models, as I will explain in Sect. 4. Finally, my comments are summarized in Sect. 5.
The notation of extreme value theory and statistics used in the following sections is provided in Table 1 of the appendix, which also includes the corresponding symbols employed by Pisarenko et al.
2 Comments on the Extreme Value Theory
The first important results in extreme value theory were achieved by Fisher and Tippett (1928) and Gnedenko (1943). Today, this theory is a well-established field within probability theory and mathematical statistics. There are numerous available studies dealing with many aspects of this field (e.g., Leadbetter et al. 1983; de Haan and Ferreira 2006; Falk et al. 2011). One fundamental aspect of extreme value theory is the distribution of peaks over threshold (POT, see, e.g., Coles 2001, Chap. 4; Beirlant et al. 2004a, b, Sect. 5.3), which is defined as
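In standard notation (the symbol $x_{\mathrm{threshold}}$ is taken from later in this section, and the textbook definition below is assumed to match Eq. (2)):

$$Y = X - x_{\mathrm{threshold}}, \qquad X > x_{\mathrm{threshold}}, \tag{2}$$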
with a real-valued random variable X and the excess Y over a certain threshold. The threshold acts in the same way as $m_{\min}$ in Eq. (1). Under certain conditions, the CDF of Y increasingly approximates the GPD as the threshold increases. The CDF of the GPD is written as
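In the standard parametrization, assuming that H denotes the CDF of the excess Y as in the derivation further below, this is

$$H(y) = \begin{cases} 1 - \left(1 + \gamma\,\dfrac{y}{\sigma^*}\right)^{-1/\gamma}, & \gamma \neq 0, \\[2ex] 1 - \exp\left(-\dfrac{y}{\sigma^*}\right), & \gamma = 0, \end{cases} \qquad y \ge 0 \ \ (\text{and } y \le -\sigma^*/\gamma \text{ if } \gamma < 0), \tag{3}$$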
where σ* is the scale parameter and γ is the extreme value index (also called the tail index). The Weibull case, with a finite right endpoint, corresponds to γ < 0, the Gumbel case to γ = 0, and the Fréchet case to γ > 0. The Gumbel case of the GPD is the exponential distribution (ED). There, the scale parameter σ* equals 1/β, the reciprocal of the parameter β of the TED of Eq. (1) with an infinite $m_{\max}$.
If the tail of a distribution is a member of one of the domains of attraction of the GPD, then the CDF of the block maxima

$$Z = \max\{X_1, X_2, \ldots, X_n\} \tag{4}$$

can be approximated by the GEVD in the case of a large block (sample) size n (see, e.g., Beirlant et al. 2004a, b, Sect. 5.1), with CDF
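In the standard parametrization, with the function name G, the location μ, and the scale σ as assumed symbols (the text itself fixes only γ and, for the GPD, σ*), the GEVD can be written as

$$G(x) = \exp\left(-\left(1 + \gamma\,\frac{x - \mu}{\sigma}\right)^{-1/\gamma}\right), \qquad 1 + \gamma\,\frac{x - \mu}{\sigma} > 0, \tag{5}$$

with the Gumbel limit $G(x) = \exp\left(-\exp\left(-(x - \mu)/\sigma\right)\right)$ for γ = 0.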
The block size can also be determined by the defined length of time of a given period of observation. The actual and exact distribution of the block maximum Z is simply formulated for independent and identically distributed random variables X with
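Writing F for the CDF of X, as the derivation below does, this is the standard result

$$P(Z \le x) = P(X_1 \le x, \ldots, X_n \le x) = F(x)^n. \tag{6}$$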
The probability that the maximum Z ≤ x is equal to the probability of no realization with X > x in the block (sample). In the case of a large n, we can apply the Poisson approximation (see, e.g., Falk et al. 2011, Part I). The number of realizations X > x is binomially distributed, which can be approximated by the Poisson distribution, so that the probability of no such realization can be written as
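In symbols, and assuming that F again denotes the CDF of X, the expected number of realizations X > x in the block is n(1 − F(x)), and the Poisson probability of observing none of them gives

$$P(Z \le x) = F(x)^n \approx \exp\left(-n\,\bigl(1 - F(x)\bigr)\right). \tag{7}$$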
Furthermore, n and F(x) can be replaced by the (average) number of excesses $n_{\mathrm{threshold}}$ and H(x), respectively, in the case of a large sample size n. Hence, we obtain [bounds of x according to Eqs. (5, 9a–9c)]
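A sketch of this step, under the assumptions that H is evaluated at the excess $x - x_{\mathrm{threshold}}$ and that $n_{\mathrm{threshold}} = n\,(1 - F(x_{\mathrm{threshold}}))$: for $x > x_{\mathrm{threshold}}$, the tail factorizes as $1 - F(x) = (1 - F(x_{\mathrm{threshold}}))\,(1 - H(x - x_{\mathrm{threshold}}))$, so the Poisson approximation $\exp(-n(1 - F(x)))$ becomes

$$P(Z \le x) \approx \exp\left(-n_{\mathrm{threshold}}\,\bigl(1 - H(x - x_{\mathrm{threshold}})\bigr)\right) = \exp\left(-n_{\mathrm{threshold}} \left(1 + \gamma\,\frac{x - x_{\mathrm{threshold}}}{\sigma^*}\right)^{-1/\gamma}\right). \tag{8}$$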
This equation is equivalent to Eq. (5) with the following parameter transformation
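Assuming that μ and σ denote the location and scale of the GEVD, that σ* is the GPD scale, and that $n_{\mathrm{threshold}}$ is the average number of excesses per block, matching the two exponents term by term gives the well-known relations (how they are split across Eqs. (9a–9c) is not assumed here):

$$\sigma = \sigma^*\, n_{\mathrm{threshold}}^{\gamma}, \qquad \mu = x_{\mathrm{threshold}} + \frac{\sigma^*}{\gamma}\left(n_{\mathrm{threshold}}^{\gamma} - 1\right),$$

with the limit $\mu = x_{\mathrm{threshold}} + \sigma^* \ln n_{\mathrm{threshold}}$ and $\sigma = \sigma^*$ for γ = 0,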
and the extreme value index γ is the same. This transformation is already well known and needs no further explanation (see, e.g., Coles 2001, Chap. 4). Note that a distribution that belongs to a domain of attraction of the GEVD and GPD has only one exact asymptotic extreme value index γ, although different estimates of it can be obtained.
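The equivalence can also be checked numerically. The following is a minimal sketch, not code from either paper: it assumes the standard GPD and GEVD parametrizations and the textbook transformation σ = σ* n^γ, μ = x_threshold + σ*(n^γ − 1)/γ, and all function names are illustrative.

```python
import math

def gpd_survival(y, sigma_star, gamma):
    """Survival function 1 - H(y) of the GPD for an excess y >= 0."""
    if gamma == 0.0:
        return math.exp(-y / sigma_star)
    base = 1.0 + gamma * y / sigma_star
    if base <= 0.0:  # beyond the finite right endpoint (gamma < 0)
        return 0.0
    return base ** (-1.0 / gamma)

def gevd_cdf(x, mu, sigma, gamma):
    """CDF of the GEVD with location mu, scale sigma and index gamma."""
    if gamma == 0.0:
        return math.exp(-math.exp(-(x - mu) / sigma))
    base = 1.0 + gamma * (x - mu) / sigma
    if base <= 0.0:  # outside the support
        return 1.0 if gamma < 0.0 else 0.0
    return math.exp(-(base ** (-1.0 / gamma)))

def pot_to_gevd(x_thr, sigma_star, gamma, n_thr):
    """Map POT parameters to GEVD location and scale (assumed textbook form)."""
    if gamma == 0.0:
        return x_thr + sigma_star * math.log(n_thr), sigma_star
    sigma = sigma_star * n_thr ** gamma
    mu = x_thr + (sigma - sigma_star) / gamma
    return mu, sigma

def max_gap(x_thr, sigma_star, gamma, n_thr, steps=200, span=3.0):
    """Largest difference between the Poisson form and the GEVD on a grid."""
    mu, sigma = pot_to_gevd(x_thr, sigma_star, gamma, n_thr)
    gap = 0.0
    for i in range(1, steps + 1):
        x = x_thr + span * i / steps
        poisson_form = math.exp(-n_thr * gpd_survival(x - x_thr, sigma_star, gamma))
        gap = max(gap, abs(poisson_form - gevd_cdf(x, mu, sigma, gamma)))
    return gap

for g in (-1.0, -0.25, 0.0, 0.3):
    print(g, max_gap(5.0, 1.0, g, 50.0))
```

The printed gaps are at the level of floating-point rounding for all four indices, which illustrates that the two expressions are the same function and that only the parametrization differs.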
A crucial point is the convergence speed: how fast does the GEVD approximate the distribution of block maxima in Eq. (6), and how fast does the GPD approximate the actual tail of the distribution? For example, the exponential distribution is equal to the GPD with an extreme value index of γ = 0, and the tail of an ED is again an ED, so the GPD is exact at every threshold and the convergence speed is infinite. Correspondingly, Leadbetter et al. (1983) showed that the block maxima of an exponentially distributed random variable converge to a GEVD with an extreme value index of γ = 0. No large block size is needed for applying the GEVD in this case (Beirlant et al. 2004a, b, Fig. 2.9).
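The fast convergence for the exponential case can be illustrated with a short sketch. It assumes a unit-rate exponential distribution and the usual Gumbel normalization with location ln(n) and scale 1; the function names are illustrative and the block sizes are arbitrary.

```python
import math

def exact_max_cdf(x, n):
    """Exact CDF F(x)**n of the maximum of n i.i.d. unit-rate exponentials."""
    return (1.0 - math.exp(-x)) ** n if x > 0.0 else 0.0

def gumbel_cdf(x, n):
    """GEVD with gamma = 0 (Gumbel), location ln(n) and scale 1."""
    return math.exp(-math.exp(-(x - math.log(n))))

def sup_gap(n, x_max=30.0, steps=30000):
    """Largest absolute difference between both CDFs on a grid over [0, x_max]."""
    gap = 0.0
    for i in range(steps + 1):
        x = x_max * i / steps
        gap = max(gap, abs(exact_max_cdf(x, n) - gumbel_cdf(x, n)))
    return gap

for n in (10, 100, 1000):
    print(n, sup_gap(n))
```

Even for a block size as small as n = 10, the largest difference between the exact distribution of the maximum and the Gumbel approximation stays at a few percent, and it shrinks roughly in proportion to 1/n.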
The TED is similar to the exponential distribution when the upper bound $m_{\max}$ is relatively high. This applies in many cases, such as the magnitude distribution according to the well-known Gutenberg–Richter law. According to Leadbetter et al. (1983), the extreme value index of the TED is γ = −1, which means that the POT of a TED converges to a uniform distribution. However, the convergence is often poor due to the similarity between ED and TED. Figure 1 presents the POTs of various TEDs with parameter $x_{\mathrm{threshold}} = m_{\min}$ of Eqs. (1, 2) and the corresponding GPD with an extreme value index of γ = −1. The scale parameter of the GPD is determined by the upper bound, σ* = $m_{\max} - x_{\mathrm{threshold}}$, which equals $m_{\max}$ for a threshold of zero. While it is obvious that different TEDs with equal upper bounds have the same asymptotic tail distribution, the approximation of the TEDs by a GPD works only when the difference $m_{\max} - m_{\min}$ is very small in relation to 1/β, that is, when $\beta\,(m_{\max} - m_{\min})$ is small (β ≈ 2.3 in the case of earthquake magnitudes). Similarly, the block size must be very large for the approximation of Eq. (6) with the GEVD to work for the TED. This is in line with previous results by Raschke (2012, Sect. 2.6), according to which the upper bound $m_{\max}$ can be better estimated by the methods described by Pisarenko et al. (1996), Kijko and Graham (1998), and Raschke (2012) than by extreme value statistics.
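The poor convergence can be made concrete with a minimal sketch. It assumes the standard TED form, for which the excess over a threshold is again a TED on [0, width] with width = m_max − threshold, and it takes the GPD with γ = −1 as the uniform distribution on the same interval. The function names and the chosen widths are illustrative and are not the settings of Fig. 1.

```python
import math

def ted_excess_cdf(y, beta, width):
    """CDF of the excess over a threshold for a TED; width = m_max - threshold."""
    return (1.0 - math.exp(-beta * y)) / (1.0 - math.exp(-beta * width))

def uniform_gpd_cdf(y, width):
    """GPD with gamma = -1 and scale equal to width: uniform on [0, width]."""
    return y / width

def sup_gap(beta, width, steps=2000):
    """Largest absolute difference between both CDFs on a grid over [0, width]."""
    gap = 0.0
    for i in range(steps + 1):
        y = width * i / steps
        gap = max(gap, abs(ted_excess_cdf(y, beta, width) - uniform_gpd_cdf(y, width)))
    return gap

beta = 2.3  # roughly ln(10), i.e. a Gutenberg-Richter b-value of about 1
for width in (0.1, 0.5, 1.0, 2.0, 4.0):
    print(width, round(sup_gap(beta, width), 3))
```

In this sketch the gap is small only when the threshold lies within about a tenth of a magnitude unit of the upper bound, and it grows steadily as the threshold moves away from it.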
Of course, there are alternatives to the TED model for the distribution of magnitudes. The generalized truncated exponential distribution (GTED) model recently formulated by Raschke (2014) is one such alternative. In all cases, the magnitude distribution is similar to the ED (a GPD with an extreme value index of γ = 0) over a large share of the range of the random variable (cf. Gutenberg–Richter law). This implies a poor convergence of the upper tail of the magnitude distribution to the corresponding asymptotic GPD with an extreme value index of γ << 0, and hence I strongly advise against applying classical extreme value statistics to earthquake magnitudes.
3 Comments on Statistical Inference
Pisarenko et al. apply the GEVD using the block maxima method and the GPD using the POT method to estimate the upper bound magnitude and the quantile of the maximum magnitude for a defined time period. In both cases, the procedures they use are confusing and inconsistent with those of classical extreme value statistics. Pisarenko et al. consider different lengths T of time periods for the block maxima method. This is unusual, but possible, and is analogous to the classical POT analysis with different thresholds. The authors apply the moment method for estimating GEVD parameters, referring to Pisarenko et al. (2008), which is in contrast to common practice in extreme value statistics as presented, e.g., by Coles (2001, Sect. 3.3) and Beirlant et al. (2004a, b, Sect. 5.1). Based on these references, Pisarenko et al. should have used the probability-weighted moment (PWM) and maximum likelihood (ML) methods. The authors justify their choice by noting the better estimation results in Pisarenko et al. (2008). This argument is weak, however, because Pisarenko et al. (2008) did not investigate the asymptotic behavior of the moment method, and only numerically investigated the case of an extreme value index of γ = −0.2. In addition, the PWM and ML methods do not work well for an extreme value index of γ < −0.5 (Coles 2001, Sect. 3.3.1), as is the case for the TED with γ = −1. The ML method has a large bias (Hosking et al. 1985, Fig. 2), and the PWM method has lower efficiency compared with the ML method (Hosking et al. 1985, Fig. 4). The authors do not present an argument for why the moment method could work better for small extreme value indexes with γ ≤ −0.25.
Pisarenko et al. also apply the ML method to estimate the Kolmogorov distance for the GEVD [their Eq. (26)], which is not consistent with their point estimation using the moment method. The Kolmogorov distance is used by the authors to detect the optimal threshold of the POT analysis. This distance represents the test statistic of the Kolmogorov–Smirnov goodness-of-fit test. The idea of applying goodness-of-fit statistics for threshold selection is not new. The procedure described by Goegebeur et al. (2008), for example, includes goodness-of-fit statistics, is based on a stringent theory, and is validated by numerical investigation of different situations. Pisarenko et al., however, provide no such validation.
POT analysis is generally used in extreme value statistics for estimating an extreme value index, as the sample size is normally larger than that of the block maxima. Further, the GPD has only two parameters. Both aspects result in a smaller estimation error of the extreme value index than estimation using the block maxima. As mentioned previously, the authors estimate the parameters of the GPD in POT analysis using the ML method. In the case of a small sample size, the ML method does not work well for a small extreme value index of γ ≤ −0.25 (Hüsler et al. 2011, Figs. 3, 5). In addition, the equations of the ML method do not have a solution for every sample in the case of γ < −0.5, according to Grimshaw (1993). The estimation method described by Hüsler et al. (2011, 2014) has no such limitation, and is hence recommended for a small extreme value index. It is important to remember that the extreme value index of the TED is γ = −1.
A further crucial point is the error quantification in the Pisarenko et al. estimation procedure, which includes a bootstrap, an averaging, and a Monte Carlo simulation for applying the GEVD and GPD. Finally, the authors use the average of the estimated parameters for different block lengths in the case of a GEVD and different thresholds in the case of a GPD. I do not see the advantage in such averaging. On the contrary, if simple point estimation is used, the asymptotic variance–covariance matrix of the estimation method can be used for quantifying the estimation error (Hüsler et al. 2011, Chap. 2.1). The authors also quantify the standard deviation of the estimation error in a different manner, in which they apply reshuffling according to the bootstrap method using 100 reshuffled samples, which would seem to be too few according to DasGupta (2008). The standard deviations (standard errors) shown in their Figs. 2–8 are computed using this procedure. Confusingly, there are different standard deviations for the lower and upper lines in these figures (e.g., Fig. 6, T = 7), although there is only one standard deviation. Furthermore, Pisarenko et al. compute the standard deviation and the mean squared error (MSE) using Monte Carlo simulations with 500 generated samples of a GEVD and GPD in the case of POT analysis. The resulting standard deviations differ widely between the bootstrap and Monte Carlo simulations. For example, for the GEVD of the Harvard catalog, we have an MSE(γ) = 0.047 for T = 80 days, which means a standard deviation of Std(γ) ≈ 0.21. The standard deviation in the authors' corresponding Fig. 2, however, has the value Std(γ) = 0.02–0.025 for T = 75 days, which is only a small fraction of 0.21. 
I strongly reject their interpretation that "this is not surprising since the latter gives only the scatter conditional to the same unique data sample", as the bootstrap method is specifically used for quantifying the error distribution and its corresponding standard error (DasGupta 2008). Such a sizable difference may be an indicator that approximating the distribution of extreme magnitudes by GEVD and GPD does not work well.
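The size of the discrepancy follows from the usual decomposition of the MSE into variance and squared bias. The value Std(γ) ≈ 0.21 quoted above corresponds to the case of negligible bias, which is an assumption; in general, the square root of the MSE is an upper bound for the standard deviation:

$$\mathrm{MSE}(\hat\gamma) = \mathrm{Var}(\hat\gamma) + \mathrm{Bias}(\hat\gamma)^2 \quad\Longrightarrow\quad \mathrm{Std}(\hat\gamma) \le \sqrt{\mathrm{MSE}(\hat\gamma)} = \sqrt{0.047} \approx 0.217.$$

Compared with the bootstrap values of 0.02–0.025, this is a factor of about 0.217/0.025 ≈ 9 to 0.217/0.02 ≈ 11.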
This leads again to the key issue of poor convergence speed in applying GPD and GEVD to earthquake magnitudes. As mentioned above, the GEVD and corresponding estimation methods do not work well for the TED (Raschke 2012). The upper bound magnitude should be estimated using the methods described by Pisarenko et al. (1996), Kijko and Graham (1998), Hannon and Dahiya (1999), or Raschke (2012), where the quantile of a maximum magnitude of a defined time period can be computed by the inverse of Eq. (7). The inherent limitations of the extreme value theory and statistics also apply to other distribution models for magnitudes, as they are very similar to the ED, according to the Gutenberg–Richter law.
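Computing such a quantile is straightforward once the TED parameters have been estimated. The following minimal sketch assumes that Eq. (7) is the Poisson approximation exp(−n(1 − F(x))), that F is the standard TED CDF, and that n is the expected number of events above m_min in the period. The function names and parameter values are illustrative only.

```python
import math

def ted_cdf(m, beta, m_min, m_max):
    """CDF of the truncated exponential distribution (standard form assumed)."""
    if m <= m_min:
        return 0.0
    if m >= m_max:
        return 1.0
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return (1.0 - math.exp(-beta * (m - m_min))) / norm

def ted_quantile(u, beta, m_min, m_max):
    """Inverse of ted_cdf for 0 <= u <= 1."""
    norm = 1.0 - math.exp(-beta * (m_max - m_min))
    return m_min - math.log(1.0 - u * norm) / beta

def max_cdf(x, n, beta, m_min, m_max):
    """Poisson approximation of the CDF of the block maximum."""
    return math.exp(-n * (1.0 - ted_cdf(x, beta, m_min, m_max)))

def max_quantile(p, n, beta, m_min, m_max):
    """Solve exp(-n * (1 - F(x))) = p for x, with 0 < p <= 1."""
    u = 1.0 + math.log(p) / n  # required value of F(x)
    if u <= 0.0:               # p is below the probability of no event at all
        return m_min
    return ted_quantile(u, beta, m_min, m_max)

# Illustrative values: median of the maximum for 100 expected events
x_median = max_quantile(0.5, 100.0, 2.3, 4.0, 8.0)
print(x_median, max_cdf(x_median, 100.0, 2.3, 4.0, 8.0))
```

The inversion is available in closed form, so no asymptotic extreme value model is needed for this step.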
4 Is a Homogeneous Poisson Process Needed?
Finally, I want to point out that earthquake data do not need to occur as a homogeneous Poisson process in order to apply extreme value theory and statistics. Pisarenko et al. decluster the earthquake catalog and use only the main events to provide a homogeneous Poisson process with independent magnitudes, which is similar to the approach of, e.g., Zöller et al. (2014). But a homogeneous Poisson process is not a necessary condition for applying GEVD and GPD. A homogeneous Poisson process would imply that the number of events with a magnitude over a defined threshold is Poisson-distributed (Johnson et al. 1994, p. 553), but the GEVD works asymptotically for different distributions of excess numbers. A simple example is the maximum of an exponentially distributed random variable with a fixed, not Poisson-distributed, block (sample) size, whose distribution converges quickly to a GEVD (Beirlant et al. 2004a, b, Fig. 2.9). Leadbetter et al. (1983, Part II) and Falk et al. (2011, Part III) provide further extensions. The GEVD can also be applied to random variables with serial correlation under certain conditions. The extremal index, an additional parameter, compensates for the influence of the serial correlation.
In addition, it has been shown that the parameters of the GPD can be estimated in a POT analysis even if there is serial correlation between the members of excess clusters (Raschke 2013a). In this case, the estimation error can be quantified using the jackknife method. The crucial point of earthquake magnitudes is the poor convergence of their tail to the GPD, not the earthquake process in time.
5 Conclusions
In this comment, I have discussed important aspects of the research conducted by Pisarenko et al. from the perspective of extreme value statistics and theory. In summary, I advise against using the procedures as applied by Pisarenko et al. GPD and GEVD work well only for the extremes of TED or GTED when the block size is very large and/or the threshold is very close to the upper bound magnitude. The crucial point of earthquake magnitudes is the poor convergence of their upper tail to the GPD. Therefore, the classical methods for estimating the upper bound of the TED should be applied as shown by Pisarenko et al. (1996), Kijko and Graham (1998), Hannon and Dahiya (1999), and Raschke (2012). The appropriateness of these methods for other distribution models such as the GTED should be examined in further research. Corresponding parameters of the maximal magnitudes such as the quantile of random block maxima of a defined time period can be computed by the inverse of Eq. (7) in all cases.
References
Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004a), Statistics of Extremes—Theory and Application, Wiley series in probability and statistics, John Wiley, Chichester.
Beirlant, J., Goegebeur, Y., Segers, J., and Teugels, J. (2004b), Statistics of Extremes—Theory and Application, http://lstat.kuleuven.be/Wiley, accessed Dec 2009.
Coles, S. (2001), An Introduction to Statistical Modeling of Extreme Values, Springer, London, 208 pages.
Cosentino, P., Ficarra, V., and Luzio, D. (1977), Truncated Exponential Frequency-Magnitude Relationship in Earthquake Statistics, Bull. Seism. Soc. Am. 67, 1615–1623.
DasGupta, A. (2008), Asymptotic theory of statistics and probability, Springer Texts in Statistics, Springer, New York, 461–492.
de Haan, L., and Ferreira, A. (2006), Extreme Value Theory: An Introduction, Springer series in operations research and financial engineering, Springer, USA.
Falk, M., Hüsler, J., and Reiss, R.-D. (2011), Laws of small numbers: extremes and rare events, 3rd Ed. Birkhäuser, Basel.
Fisher, R.A., and Tippett, L.H.C. (1928), Limiting forms of the frequency distribution of the largest and smallest member of a sample, Proc. Cambridge Philosophical Society 24, 180–190.
Gnedenko, B.V. (1943), Sur la distribution limite du terme maximum d’une serie aleatoire, Annals of Mathematics 44, 423–453.
Goegebeur, Y., Beirlant, J., and de Wet, T. (2008), Linking Pareto-tail kernel goodness-of-fit statistics with tail index at optimal threshold and second order estimation, REVSTAT—Statistical Journal 6, 51–69.
Grimshaw, S. (1993), Computing maximum likelihood estimates for the generalized Pareto distribution, Technometrics 35, 185–191.
Hannon, P.M., and Dahiya, R.C. (1999), Estimation of Parameters for the Truncated Exponential Distribution, Commun Stat Theory Methods, 28, 2591–2612.
Hosking, J.R.M., Wallis, J.R., and Wood, E.F. (1985), Estimation of the Generalized Extreme-Value Distribution by the Method of Probability-Weighted Moments, Technometrics 27, 251–261.
Hüsler, J., Li, D., and Raschke, M. (2011), Estimation for the Generalized Pareto Distribution Using Maximum Likelihood and Goodness-of-fit, Communications in Statistics - Theory and Methods 40, 2500–2510.
Hüsler, J., Li, D., and Raschke, M. (2014), Extreme value index estimator using maximum likelihood and moment estimation, Communications in Statistics - Theory and Methods (accepted).
Johnson, N.L., Kotz, S. and Balakrishnan, N. (1994), Continuous Univariate Distributions, Vol. 1, Wiley Series in probability and mathematical statistics, Wiley, New York.
Kijko, A., and Graham, G. (1998), Parametric-historic Procedure for Probabilistic Seismic Hazard Analysis Part I: Estimation of Maximum Regional Magnitude $m_{\max}$, Pure Appl. Geophys. 152, 413–442.
Leadbetter, M.R., Lindgren, G., and Rootzén, H. (1983), Extremes and Related Properties of Random Sequences and Processes, Springer, New York.
Pisarenko, V.F., Lyubushin, A.A., Lysenko, V.B., and Golubeva, T.V. (1996), Statistical Estimation of Seismic Hazard Parameters: Maximum Possible Magnitude and Related Parameters, Bull. Seism. Soc. Am. 86, 691–700.
Pisarenko, V.F., Sornette, A., Sornette, D., and Rodkin, M. (2008), New Approach to the Characterization of Mmax and of the Tail of the Distribution of Earthquake Magnitudes, Pure Appl. Geophys. 165, 847–888.
Pisarenko, V.F., Sornette, A., Sornette, D., and Rodkin, M.V. (2014), Characterization of the Tail of the Distribution of Earthquake Magnitudes by Combining the GEV and GPD Descriptions of Extreme Value Theory, Pure Appl. Geophys. 171, 1599–1624.
Raschke, M. (2012), Inference for the Truncated Exponential Distribution, Stochastic Environmental Research and Risk Assessment 26, 127–138.
Raschke, M. (2013a), Parameter Estimation for the Tail Distribution of a Random Sequence, Communications in Statistics - Simulation and Computation 42, 1013–1043.
Raschke, M. (2013b), Statistical Modelling of Ground Motion Relations for Seismic Hazard Analysis, Journal of Seismology 17, 1157–1182.
Raschke, M. (2014), Modeling of Magnitude Distributions by the Generalized Truncated Exponential Distribution, Journal of Seismology (accepted).
SHARE (2014), Seismic Hazard Harmonization in Europe, scientific project, www.share-eu.org (last accessed October 2014).
Zöller, G., Holschneider, M., and Hainzl, S. (2014), The Largest Expected Earthquake Magnitudes in Japan: The Statistical Perspective, Bull. Seism. Soc. Am. 104, 769–779.
Appendix
See Table 1.
Raschke, M. Comment on Pisarenko et al., “Characterization of the Tail of the Distribution of Earthquake Magnitudes by Combining the GEV and GPD Descriptions of Extreme Value Theory”. Pure Appl. Geophys. 173, 701–707 (2016). https://doi.org/10.1007/s00024-015-1031-z