1 Introduction

It has long been observed that moderate to great earthquakes usually occur in a repetitive manner (Rikitake 1976; Utsu 1984; Kagan and Jackson 1991; Faenza et al. 2008; Working Group 2008; Chen et al. 2013). Various probability distributions, namely the exponential (Poisson), gamma, lognormal, and Weibull distributions, are regularly used to model interevent times of such earthquakes (Utsu 1984; Cornell and Winterstein 1986; Nishenko and Buland 1987; Anagnos and Kiremidjian 1988; Parvez and Ram 1997; SSHAC 1997; Yadav et al. 2010; Yazdani and Kowsari 2011; Chen et al. 2013; Pasari and Dikshit 2013). These distributions, though very popular because of their easy interpretation and robust application in several fields, have certain drawbacks. For instance, the exponential distribution, which is also connected to the discrete Poisson distribution, possesses the memoryless property, which contradicts the physics of the earthquake-generating mechanism as illustrated in the ‘elastic rebound theory’ (Reid 1910). Similarly, computation of the distribution function or survival function of a gamma distribution is difficult when the shape parameter is non-integer (Johnson et al. 1995). A closed form of the distribution function of the sum of independent and identically distributed (i.i.d.) Weibull or lognormal random variables is rarely available. This implies that the distribution of the mean of i.i.d. Weibull or lognormal random variables is difficult to obtain (Gupta and Kundu 1999). Besides, the Weibull and lognormal distributions do not show a reproductive (hereditary) property. Therefore, situations often arise in which other probability models need to be studied.

Apart from the exponential (Poisson), gamma, lognormal, and Weibull distributions, previous studies have used the triple exponential model (Kijko and Sellevoll 1981), Brownian passage time distribution (Matthews et al. 2002), Pareto distribution (Kagan and Schoenberg 2001; SSHAC 1997), Rayleigh distribution (Yazdani and Kowsari 2011), negative binomial distribution (Dionysiou and Papadopoulos 1992), generalized gamma distribution (Bak et al. 2002), and a few other distributions (SSHAC 1997; Working Group 2008) to determine the underlying pattern of earthquake interevent times and subsequently to strengthen the concept of empirical recurrence modeling. Nevertheless, the most appropriate and versatile distribution function for earthquake recurrence modeling still remains an open research question.

1.1 Scope and objective

Earthquake recurrence modeling in northeast India using the exponential, gamma, lognormal, and Weibull models was carried out previously by Parvez and Ram (1997), Yadav et al. (2010), and Pasari and Dikshit (2013). This article uses the generalized exponential (GE) distribution to estimate the recurrence time of large earthquakes in the same region.

The GE distribution (Gupta and Kundu 1999) is a particular member of the general class of exponentiated distributions proposed by Gupta et al. (1998) as \( F(t) = \left[ G(t) \right]^{\beta } \), where G(t) is the base distribution and β > 0 is a shape parameter. It is also known as the exponentiated exponential distribution (Gupta and Kundu 2007). The GE distribution shares many physical properties of gamma and Weibull distributions. This distribution, as an important tool for lifetime data analysis, has widespread applications in the field of medical and biological research (Gupta and Kundu 1999, 2007). However, no study has been carried out to date to explore the suitability of the GE distribution in the field of natural hazards. In view of this, the present research aims to introduce the three-parameter GE model in seismic recurrence studies and to investigate its effectiveness in earthquake recurrence modeling of large earthquakes. The efficacy of the GE model is determined by comparing it to the popular gamma and Weibull models that have been serving as leading earthquake recurrence models for the last few decades. The overall methodology for the present study is arranged in three steps: model description, parameter estimation, and model comparison. Toward the end, we also provide a few conditional probability curves (using the GE distribution) to assess future earthquake hazards in the study region.

1.2 Seismotectonic settings of the study area and earthquake data file description

We consider the northeast part of India and its adjoining regions (20°–32°N and 87°–100°E) for the present study. This region falls under seismic zones V (most seismically active zone), IV (high), and III (moderate) on the seismic zonation map of India (BIS 2002). A number of active thrust faults (Fig. 1), namely the main boundary thrust (MBT), main central thrust (MCT), Lohit thrust, Misami thrust, and Sagaing thrust, are present in the study area (Gupta et al. 1986; Yadav et al. 2009). This region has experienced many large earthquakes in the past (Thingbaijam et al. 2008). Among these, two great earthquakes (marked in Fig. 1), namely the Shilong plateau earthquake of 12 June 1897 (Mw 8.1) and the upper Assam earthquake of 15 August 1950 (Mw 8.5), caused extensive loss of life and property in the Indian subcontinent (Oldham 1899; Poddar 1950; Molnar and Pandey 1989; Bilham and England 2001). A detailed discussion of the general seismotectonic setting, historical seismicity, and earthquake losses of northeast India may be found in Gupta et al. (1986), Nandy (1986), Kayal (1996), Thingbaijam et al. (2008), and the references therein.

Fig. 1

Seismotectonic map of northeast India and its adjacent regions consisting of several faults, thrusts, lineaments, and structural features; two stars mark the epicentral locations of the 1897 Shilong earthquake and 1950 Assam earthquake. The map also highlights four active seismogenic source zones: eastern syntaxis-zone I, Arakan-Yoma subduction belt-zone II, Shilong plateau-zone III, and Himalayan frontal thrusts including MCT and MBT-zone IV (modified after Gupta et al. 1986 and Yadav et al. 2009)

We use a real, complete, and homogeneous earthquake catalog (Yadav et al. 2010) of 20 events (M ≥ 7.0) spanning the period 1846–1995. These events are listed in Table 1, and their epicentral locations are shown in Fig. 2. Note that no major earthquake (M ≥ 7.0) has occurred in the study region since 1995. Thus, the present catalog accounts for all main shocks with magnitude M ≥ 7.0 for the period 1846–2013.

Table 1 List of M ≥ 7.0 earthquakes that occurred (since 1846) in the study region (after Yadav et al. 2010)
Fig. 2

Epicentral locations of M ≥ 7.0 earthquakes (as listed in Table 1) that occurred in northeast India and its surrounding regions (after Yadav et al. 2009, 2010)

It is evident from Table 1 that most of the Himalayan earthquakes are shallow (focal depth < 80 km) and thus more hazardous. The present study, however, aims to estimate future earthquake occurrences purely on the basis of empirical modeling of the time gaps between earthquake events; hence, it does not consider any social, positional, geological, or geophysical influences in the analysis.

2 Preliminaries and the GE model description

Let T be a positive random variable representing the inter-occurrence times of successive events, with density function f(t), distribution function F(t), and hazard function h(t). Let \( t_{\text{e}} \) and τ denote the elapsed time (time since the last occurrence) and the residual time (remaining time to a future occurrence), respectively. Given the elapsed time, the residual time is a random variable. Let \( P\left( {\tau |t_{\text{e}} } \right) \) be the conditional probability of earthquake occurrence, i.e., the probability that an earthquake occurs in the time interval between \( t_{\text{e}} \) and \( t_{\text{e}} + \tau \), given that no earthquake has occurred during the last \( t_{\text{e}} \) year(s).

2.1 Three-parameter generalized exponential (GE) distribution

The three-parameter generalized (exponentiated) exponential distribution (Gupta and Kundu 1999) uses the two-parameter exponential distribution \( F_{\text{Exp}} \left( {t;\alpha ,\gamma } \right) \) as the base distribution. Thus, the distribution function \( F_{\text{GE}} \) of \( T \sim {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) can be defined (Gupta et al. 1998) as

$$ F_{\text{GE}} \left( {t;\alpha ,\beta ,\gamma } \right) = \left[ {F_{\text{Exp}} \left( {t;\alpha ,\gamma } \right)} \right]^{\beta } = \left( {1 - \text{e}^{{ - \frac{t - \gamma }{\alpha }}} } \right)^{\beta } \quad \left( {t > \gamma ,\alpha > 0,\beta > 0} \right) $$
(1)

The GE distribution is controlled by three parameters: the scale parameter α(> 0) that controls the spread of t, the shape parameter β(> 0) that determines the appearance of the distribution, and the location parameter γ(< t) that controls the range of the distribution. The shape parameter β, among all three parameters, plays the most important role in GE model description. It is easy to note that if β = 1, the GE distribution exactly coincides with its base distribution, i.e., the two-parameter exponential distribution. Therefore, the GE distribution, like the gamma and Weibull distributions, is an extension (generalization) of the classical exponential distribution. The density function \( f_{\text{GE}} (t) \) and the hazard function \( h_{\text{GE}} (t) \) of \( T \sim {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) are given below.

$$ f_{\text{GE}} \left( {t;\alpha ,\beta ,\gamma } \right) = \frac{\beta }{\alpha }\left( {1 - e^{{\text{ - }\frac{t - \gamma }{\alpha }}} } \right)^{\beta - 1} e^{{\text{ - }\frac{t - \gamma }{\alpha }}} \quad \left( {t > \gamma > 0,\alpha > 0,\beta > 0} \right) $$
(2)
$$ h_{\text{GE}} \left( {t;\alpha ,\beta ,\gamma } \right) = \frac{\beta }{\alpha }\frac{{\left( {1 - e^{{\text{ - }\frac{t - \gamma }{\alpha }}} } \right)^{\beta - 1} e^{{\text{ - }\frac{t - \gamma }{\alpha }}} }}{{1 - \left( {1 - \text{e}^{{\text{ - }\frac{t - \gamma }{\alpha }}} } \right)^{\beta } }}\quad \left( {t > \gamma > 0,\alpha > 0,\beta > 0} \right) $$
(3)

The expression for the corresponding conditional probability \( P_{\text{GE}} \left( {\tau |t_{\text{e}} } \right) \) for \( {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) is given as

$$ P_{\text{GE}} \left( {\tau |t_{\text{e}} } \right) = P\left( {T \le t_{\text{e}} + \tau |T > t_{\text{e}} } \right) = \frac{{\left( {1 - \text{e}^{{\text{ - }\frac{{t_{\text{e}} - \gamma + \tau }}{\alpha }}} } \right)^{\beta } - \left( {1 - \text{e}^{{\text{ - }\frac{{t_{\text{e}} - \gamma }}{\alpha }}} } \right)^{\beta } }}{{1 - \left( {1 - \text{e}^{{\text{ - }\frac{{t_{\text{e}} - \gamma }}{\alpha }}} } \right)^{\beta } }}\quad \left( {t > \gamma > 0,\alpha > 0,\beta > 0} \right) $$
(4)
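Equations (1)–(4) translate directly into code. The following is a minimal Python sketch, not part of the original study; the function names are illustrative, and the convention that all functions return 0 for t ≤ γ is an assumption made here for convenience.

```python
import math


def ge_cdf(t, alpha, beta, gamma=0.0):
    """Eq. (1): F_GE(t) = (1 - exp(-(t - gamma)/alpha))**beta for t > gamma."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / alpha
    return (-math.expm1(-z)) ** beta


def ge_pdf(t, alpha, beta, gamma=0.0):
    """Eq. (2): density of GE(alpha, beta, gamma)."""
    if t <= gamma:
        return 0.0
    z = (t - gamma) / alpha
    return (beta / alpha) * (-math.expm1(-z)) ** (beta - 1.0) * math.exp(-z)


def ge_hazard(t, alpha, beta, gamma=0.0):
    """Eq. (3): h(t) = f(t) / (1 - F(t))."""
    return ge_pdf(t, alpha, beta, gamma) / (1.0 - ge_cdf(t, alpha, beta, gamma))


def ge_conditional_probability(tau, t_e, alpha, beta, gamma=0.0):
    """Eq. (4): P(T <= t_e + tau | T > t_e)."""
    f_now = ge_cdf(t_e, alpha, beta, gamma)
    f_later = ge_cdf(t_e + tau, alpha, beta, gamma)
    return (f_later - f_now) / (1.0 - f_now)
```

With β = 1 the sketch reduces to the two-parameter exponential case: the hazard is the constant 1/α and the conditional probability no longer depends on the elapsed time. For very large t the term 1 − F(t) loses precision, so a production implementation would compute the survival function directly.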

Since the present study also aims to compare GE distribution with gamma and Weibull distributions, we mention the gamma and Weibull density functions below. The associated distribution functions and hazard functions can be easily calculated from the respective density functions.

$$ f_{\text{Weibull}} \left( {t;\alpha ,\beta } \right) = \frac{\beta }{{\alpha^{\beta } }}t^{\beta - 1} e^{{ - \left( {\frac{t}{\alpha }} \right)^{\beta } }} \quad \left( {t > 0,\alpha > 0,\beta > 0} \right) $$
(5)
$$ \begin{gathered} f_{\text{Gamma}} \left( {t;\alpha ,\beta } \right) = \frac{1}{\varGamma \left( \beta \right)}\frac{{t^{\beta - 1} }}{{\alpha^{\beta } }}e^{{ - \frac{t}{\alpha }}} \quad \left( {\,t > 0,\,\alpha > 0,\beta > 0} \right) \hfill \\ \end{gathered} $$

where

$$ \begin{gathered} \varGamma \left( z \right) = \int\limits_{0}^{\infty } {e^{ - t} t^{z - 1} {\text{d}}t} \quad \left( {z > 0} \right) \hfill \\ \end{gathered} $$
(6)

2.2 Model properties

The \( {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) density function assumes a variety of shapes depending on the shape parameter β (Fig. 3). It is monotonically decreasing for β ≤ 1, and for β > 1, it is unimodal, skewed, and right-tailed, similar to Weibull or gamma density functions.

Fig. 3

Different shapes of the \( {\text{GE}}\left( {1,\beta ,0} \right) \) density function for different values of shape parameter β

The hazard function of the GE distribution, like the gamma and Weibull hazard functions, assumes various shapes depending on the shape parameter β. Specifically, for fixed (α, γ), the GE hazard function is increasing for β > 1 and decreasing for β < 1 (Gupta and Kundu 1999). For β = 1, the hazard function becomes constant, and the distribution follows the memoryless property. The different shapes of the hazard function provide salient information about the instantaneous rate of failure of an earthquake reliability system and thus are of specific interest to the scientists who study and try to predict earthquakes (Matthews et al. 2002). The plots of the GE hazard function corresponding to different values of β are shown in Fig. 4.

Fig. 4

Different shapes of the \( {\text{GE}}\left( {1,\beta ,0} \right) \) hazard function for different values of shape parameter β

The moment generating function of \( T \sim {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) is given (see Appendix 1) as

$$ M_{T} (x) = E\left( {e^{xT} } \right) = e^{x\gamma } \frac{{\varGamma \left( {\beta + 1} \right)\varGamma \left( {1 - \alpha x} \right)}}{{\varGamma \left( {\beta + 1 - \alpha x} \right)}}\quad \left( {\alpha x < 1} \right) $$
(7)

We use \( E\left( {T^{n} } \right) = \frac{{d^{n} M_{T} (x)}}{{dx^{n} }}\left( 0 \right) \) to obtain respective moments (about origin) of \( T \sim {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \). The mean and variance of the GE distribution are calculated as \( \gamma + \alpha \left[ {\psi \left( {\beta + 1} \right) - \psi \left( 1 \right)} \right] \) and \( \alpha^{2} \left[ {\psi^{\prime } \left( 1 \right) - \psi^{\prime } \left( {\beta + 1} \right)} \right] \), respectively; ψ(x) and ψ′(x) denote the digamma function and its first derivative. It is observed (Gupta and Kundu 1999) that the mean and variance of the \( {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) are increasing functions of β (for fixed α). More specifically, as β increases, the variance of the GE distribution increases to \( \frac{{\pi^{2} \alpha^{2} }}{6} \) (because ψ′(1) = π²/6 and ψ′(β + 1) tends to 0), unlike the variance of the gamma distribution, which tends to infinity, and the variance of the Weibull distribution, which approximately equals \( \frac{{\pi^{2} \alpha^{2} }}{{6\beta^{2} }} \) (for large values of β). We also observe that the \( {\text{GE}}\left( {1,\beta ,0} \right) \) is unimodal with mode at ln β if β > 1 and at 0 if β ≤ 1. In addition, \( {\text{GE}}\left( {1,\beta ,0} \right) \) has its median at \( - \ln \left( {1 - \left( {0.5} \right)^{{\frac{1}{\beta }}} } \right) \). For large values of β, each of the mean, median, and mode of \( {\text{GE}}\left( {1,\beta ,0} \right) \) approximately equals ln β (Gupta and Kundu 1999, 2007). A comprehensive representation of the mean, mode, median, variance, and coefficient of variation (CV) is given in Fig. 5.

Fig. 5

Shapes of mean, mode, median, variance, and coefficient of variation (CV) for the \( {\text{GE}}\left( {1,\beta ,0} \right) \) distribution for different values of shape parameter β
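The summary measures of the GE distribution discussed in this subsection can be evaluated numerically. The sketch below is illustrative only: the digamma and trigamma functions are approximated here with a standard recurrence plus asymptotic series (an assumption of this example; a library routine such as SciPy's special functions could be used instead), and the location-scale forms of the median and mode follow from the \( {\text{GE}}\left( {1,\beta ,0} \right) \) results by the transformation T = γ + αY.

```python
import math


def digamma(x):
    """psi(x) for x > 0: shift x upward by recurrence, then asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x          # psi(x) = psi(x + 1) - 1/x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series


def trigamma(x):
    """psi'(x) for x > 0: same strategy as digamma."""
    result = 0.0
    while x < 6.0:
        result += 1.0 / (x * x)    # psi'(x) = psi'(x + 1) + 1/x**2
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    series = inv * inv2 * (1.0 / 6.0 - inv2 * (1.0 / 30.0 - inv2 / 42.0))
    return result + inv + 0.5 * inv2 + series


def ge_mean(alpha, beta, gamma=0.0):
    return gamma + alpha * (digamma(beta + 1.0) - digamma(1.0))


def ge_variance(alpha, beta):
    return alpha ** 2 * (trigamma(1.0) - trigamma(beta + 1.0))


def ge_median(alpha, beta, gamma=0.0):
    return gamma - alpha * math.log(1.0 - 0.5 ** (1.0 / beta))


def ge_mode(alpha, beta, gamma=0.0):
    return gamma + alpha * math.log(beta) if beta > 1.0 else gamma
```

For integer β the mean and variance reduce to finite harmonic sums, e.g. `ge_mean(1, 2)` is 1 + 1/2 and `ge_variance(1, 2)` is 1 + 1/4, which gives a quick sanity check on the approximations.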

It can be noted that the GE moment generating function \( M_{T} (x) \) is not very convenient to handle; thus, the distribution of the sum of i.i.d. GE random variables is difficult to obtain. As a result, the GE distribution, like the Weibull distribution, does not support the reproductive property (Gupta and Kundu 2007).

3 Inference from the modified maximum likelihood estimation (MMLE)

We apply the modified maximum likelihood estimation (MMLE) method to estimate GE model parameters. The MMLE method is entirely based on the classical maximum likelihood estimation (MLE) method proposed by Fisher in 1912 (Aldrich 1997). If \( \left\{ {t_{1} ,t_{2} , \ldots ,t_{n} } \right\} \) is a random sample from the \( {\text{GE}}\left( {\alpha ,\beta ,\gamma } \right) \) distribution, then the log-likelihood function (\( \ln L_{\text{GE}} \)) is

$$ \ln \,L_{\text{GE}} \left( {\alpha ,\beta ,\gamma ;t_{1} ,t_{2} ,{ \ldots },t_{n} } \right) = n\;\ln \;\beta - n\;\ln \;\alpha - \sum\limits_{i = 1}^{n} {\left( {\frac{{t_{i} - \gamma }}{\alpha }} \right)} + \left( {\beta - 1} \right)\sum\limits_{i = 1}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} } \right)} $$
(8)

The associated log-likelihood equations are

$$ n\alpha + \left( {\beta - 1} \right)\sum\limits_{i = 1}^{n} {\left( {\frac{{\left( {t_{i} - \gamma } \right)e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} }}{{1 - e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} }}} \right) - \sum\limits_{i = 1}^{n} {\left( {t_{i} - \gamma } \right)} } = 0 $$
(9)
$$ \frac{n}{\beta } + \sum\limits_{i = 1}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} } \right)} = 0 $$
(10)
$$ n\alpha - \alpha \left( {\beta - 1} \right)\sum\limits_{i = 1}^{n} {\left( {\frac{{e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} }}{{1 - e^{{ - \frac{{t_{i} - \gamma }}{\alpha }}} }}} \right)} = 0 $$
(11)

The estimated parameters are the solution of Eqs. (9)–(11) satisfying the constraint that \( \gamma < t_{i} \) for all i. However, the regularity conditions, existence, and uniqueness of the solution of Eqs. (9)–(11) require extensive investigation (Gupta and Kundu 1999, 2007; Raqab and Ahsanullah 2001). Therefore, in practice, we first estimate the location parameter γ as

$$ \hat{\gamma } = \mathop { \hbox{min} }\limits_{i} t_{i} = t_{\left( 1 \right)} $$
(12)

and then the other parameters are estimated. It is observed that for fixed (α, β), the likelihood function (\( L_{\text{GE}} \)) is a monotonically increasing function of γ. Further, γ can only assume values between 0 and \( t_{\left( 1 \right)} \). Thus, the best possible estimate for γ is \( \hat{\gamma } = t_{\left( 1 \right)} \). However, by doing so, the concept of searching the parameter values that maximize the likelihood function (\( L_{\text{GE}} \)) or log-likelihood function (\( \ln L_{\text{GE}} \)) is violated. This is precisely the reason for the adopted method to be called the local or the modified MLE (MMLE).

Once \( \hat{\gamma } \) is obtained, all data points \( \left\{ {t_{1} ,t_{2} , \ldots ,t_{n} } \right\} \) are shifted along the abscissa by \( \hat{\gamma } \), and the minimum shifted point (which is 0) is discarded from the set of shifted points. This is essential because for β < 1, the likelihood function (\( L_{\text{GE}} \)) is unbounded [\( L_{\text{GE}} \) goes to infinity when the value of the location parameter approaches \( t_{\left( 1 \right)} \)]; hence, the underlying parameter estimation technique becomes questionable. The shape and scale parameters are estimated from the remaining (n − 1) shifted data points \( \left\{ {t_{\left( 2 \right)} - t_{\left( 1 \right)} ,t_{\left( 3 \right)} - t_{\left( 1 \right)} , \ldots ,t_{\left( n \right)} - t_{\left( 1 \right)} } \right\} \).

The modified log-likelihood function (\( \ln L_{\text{GE}}^{\prime } \)) is now obtained as

$$ \ln L_{\text{GE}}^{{\prime }} \left( {\alpha ,\beta ,0;t_{2}^{{\prime }} ,t_{3}^{{\prime }} ,{ \ldots },t_{n}^{{\prime }} } \right) = \left( {n - 1} \right)\,\ln \;\beta - \left( {n - 1} \right)\,\ln \,\alpha - \sum\limits_{i = 2}^{n} {\frac{{t_{i}^{{\prime }} }}{\alpha }} + \left( {\beta - 1} \right)\sum\limits_{i = 2}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} } \right)} $$
(13)

In the above expression, \( t_{i}^{\prime } \) denotes the shifted data point, i.e., \( t_{i}^{\prime } = t_{\left( i \right)} - t_{\left( 1 \right)} \). The corresponding log-likelihood equations are given below.

$$ \left( {n - 1} \right)\alpha + \left( {\beta - 1} \right)\sum\limits_{i = 2}^{n} {\left( {\frac{{t_{i}^{{\prime }} e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} }}{{1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} }}} \right) - \sum\limits_{i = 2}^{n} {t_{i}^{{\prime }} } } = 0 $$
(14)
$$ \frac{n - 1}{\beta } + \sum\limits_{i = 2}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} } \right)} = 0 $$
(15)

Solving Eq. (15) for β and substituting the result into Eq. (14) gives Eq. (16) in terms of the single variable α, which can be solved numerically with any standard software package such as MAPLE, MATLAB, or R.

$$ \left( {n - 1} \right)\alpha - \left( {\frac{n - 1}{{\sum\limits_{i = 2}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} } \right)} }} + 1} \right)\sum\limits_{i = 2}^{n} {\left( {\frac{{t_{i}^{{\prime }} e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} }}{{1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} }}} \right) - \sum\limits_{i = 2}^{n} {t_{i}^{{\prime }} } } = 0 $$
(16)

Alternatively, we can estimate α directly from Eq. (13) by maximizing

$$ \begin{gathered} g\left( \alpha \right) = \ln L_{\text{GE}}^{{\prime }} \left( {\alpha ,\beta \left( \alpha \right),0;t_{2}^{{\prime }} ,t_{3}^{{\prime }} ,{ \ldots }t_{n}^{{\prime }} } \right) \hfill \\\quad \quad\,= K - \left( {n - 1} \right)\ln \left( { - \sum\limits_{i = 2}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} } \right)} } \right) - \left( {n - 1} \right)\,\ln \;\alpha - \sum\limits_{i = 2}^{n} {\ln \left( {1 - e^{{ - \frac{{t_{i}^{{\prime }} }}{\alpha }}} } \right)} - \sum\limits_{i = 2}^{n} {\frac{{t_{i}^{{\prime }} }}{\alpha }} \hfill \\\quad K \equiv {\text{constant term}} \hfill \\ \end{gathered} $$
(17)

Two approaches to maximize Eq. (17) are provided in Appendix 2.
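The MMLE steps above (fix \( \hat{\gamma } = t_{\left( 1 \right)} \), shift and drop the minimum, maximize the profile function of Eq. (17), then recover β from Eq. (15)) can be sketched as follows. This is an illustration under stated assumptions, not the procedure of Appendix 2: the golden-section search in ln α, the search bracket, and all function names are choices made for this example, and the search presumes that g(α) has a single maximum inside the bracket. The sampler `ge_sample` is a hypothetical helper used only to generate synthetic data.

```python
import math
import random


def log1mexp(z):
    """Numerically stable ln(1 - exp(-z)) for z > 0."""
    if z < math.log(2.0):
        return math.log(-math.expm1(-z))
    return math.log1p(-math.exp(-z))


def profile_loglik(alpha, shifted):
    """g(alpha) of Eq. (17), omitting the constant K."""
    m = len(shifted)
    s = sum(log1mexp(t / alpha) for t in shifted)
    if s >= 0.0:                      # underflow guard for tiny alpha
        return -math.inf
    return -m * math.log(-s) - m * math.log(alpha) - s - sum(shifted) / alpha


def golden_section_max(f, lo, hi, tol=1e-10):
    """Maximize a unimodal function f on [lo, hi]."""
    invphi = (math.sqrt(5.0) - 1.0) / 2.0
    c, d = hi - invphi * (hi - lo), lo + invphi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                   # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - invphi * (hi - lo)
            fc = f(c)
        else:                         # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + invphi * (hi - lo)
            fd = f(d)
    return 0.5 * (lo + hi)


def ge_mmle(times):
    """Return (alpha_hat, beta_hat, gamma_hat) following Eqs. (12)-(17)."""
    gamma_hat = min(times)                                    # Eq. (12)
    shifted = [t - gamma_hat for t in times if t > gamma_hat]
    t_max = max(shifted)
    u = golden_section_max(
        lambda v: profile_loglik(math.exp(v), shifted),
        math.log(1e-3 * t_max), math.log(1e3 * t_max))        # assumed bracket
    alpha_hat = math.exp(u)
    s = sum(log1mexp(t / alpha_hat) for t in shifted)
    beta_hat = -len(shifted) / s                              # Eq. (15)
    return alpha_hat, beta_hat, gamma_hat


def eq16_residual(alpha, shifted):
    """Left-hand side of Eq. (16); close to zero at the maximizer of g."""
    m = len(shifted)
    s = sum(log1mexp(t / alpha) for t in shifted)
    w = sum(t * math.exp(-t / alpha) / (-math.expm1(-t / alpha)) for t in shifted)
    return m * alpha - (m / s + 1.0) * w - sum(shifted)


def ge_sample(n, alpha, beta, gamma, seed=0):
    """Hypothetical helper: inverse-CDF sampling from GE(alpha, beta, gamma)."""
    rng = random.Random(seed)
    return [gamma - alpha * math.log(1.0 - rng.random() ** (1.0 / beta))
            for _ in range(n)]
```

A typical call is `alpha_hat, beta_hat, gamma_hat = ge_mmle(interevent_times)`, where `interevent_times` is a list of positive gaps in years. Evaluating `eq16_residual` at the returned scale estimate provides a check that maximizing Eq. (17) and solving Eq. (16) agree.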

Table 2 provides the estimated model parameter values of the GE, gamma, and Weibull distributions. The maximum likelihood estimation methods of the gamma and Weibull models can be found in Johnson et al. (1995).

Table 2 Parameter estimation from the (modified) maximum likelihood estimation technique

From Table 2, we see that the estimated shape parameter β of the GE model is greater than 1, meaning the GE density function for the present earthquake catalog is unimodal, right-tailed, and log-concave, and the estimated GE hazard function is monotonically increasing.

4 Model selection

We use three well-established model selection criteria, namely the maximum likelihood criterion and its modification, known as the Akaike information criterion (AIC), the Kolmogorov-Smirnov (K-S) minimum distance criterion, and the chi-square criterion to appraise the suitability of GE distribution in comparison to gamma and Weibull distributions (Pasari and Dikshit 2013).

The maximum likelihood criterion uses the maximum log-likelihood value (ln L) to determine the best suitable model. This criterion, however, assumes that the number of parameters (k) in each competitive model is the same. To overcome this limitation, several modifications have been proposed. Among these, the Akaike information criterion (Akaike 1974) defined by AIC = 2 k − 2 ln L has been widely accepted. The AIC involves a penalty function to account for model complexity due to the unequal number of parameters. The model with the minimum AIC value is tagged as the most suitable model. The log-likelihood values and AIC values of each distribution are listed in Table 4.
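As a small illustration of the AIC rule, the sketch below ranks candidate models by AIC = 2k − 2 ln L. The log-likelihood values in the usage note are placeholders invented for this example and are not the values of Table 4; how many parameters to count for each model is left to the analyst.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def best_by_aic(models):
    """models maps a model name to (ln L, k); returns the name with minimum AIC."""
    return min(models, key=lambda name: aic(*models[name]))
```

For instance, with the hypothetical input `{"A": (-50.0, 3), "B": (-50.5, 2)}`, model A has AIC 106 and model B has AIC 105, so `best_by_aic` selects B even though A has the larger log-likelihood. This is the penalty for the extra parameter at work.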

The Kolmogorov-Smirnov (K-S) minimum distance criterion uses the K-S distance between the empirical distribution function and the fitted (model) distribution function to choose the most appropriate model. The K-S test belongs to the family of non-parametric and distribution-free goodness-of-fit tests (Johnson et al. 1995). A step-by-step procedure to obtain the K-S value is described in Appendix 3, and the calculated K-S distances are listed in Table 4. The associated K-S test plot is given in Fig. 6.

Fig. 6

The K-S test plot, showing the difference between the estimated cumulative distribution function of each test model and the empirical distribution function
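The K-S distance can be computed with a few lines of code. The sketch below uses the standard definition of the one-sample K-S statistic (the largest gap between the step-shaped empirical distribution function and the model distribution function, checked on both sides of each jump); it is assumed, not confirmed, that this matches the procedure of Appendix 3. The function name and arguments are illustrative.

```python
def ks_distance(sample, cdf):
    """One-sample K-S distance between the empirical CDF and a model CDF."""
    xs = sorted(sample)
    n = len(xs)
    distance = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # Empirical CDF jumps from (i - 1)/n to i/n at the i-th order statistic.
        distance = max(distance, i / n - f, f - (i - 1) / n)
    return distance
```

Here `cdf` is any callable returning the fitted distribution function, for example a GE, gamma, or Weibull distribution function with its estimated parameters. The model with the smallest returned value is preferred under this criterion.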

The minimum chi-square criterion is one of the oldest conventional techniques for model selection. This criterion uses the observed and expected frequencies of class intervals to calculate the chi-square value (see Appendix 4). However, in the chi-square test, there is no specified rule to choose the number and width of class intervals (Johnson et al. 1995). For this reason, we choose a uniform interval size of 3 years (6 classes: <3, 3–6, 6–9, 9–12, 12–15, and >15) to calculate the chi-square value. According to the chi-square test, the model with the minimum chi-square value is considered the best model. The different chi-square values along with their observed and expected frequencies are summarized in Table 3.

Table 3 The chi-square criterion for different models
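The chi-square value for a given set of class intervals can be sketched as below. The code assumes the usual Pearson form, the sum of (observed − expected)²/expected over classes, with expected frequencies obtained from the model distribution function; whether Appendix 4 uses exactly this form is an assumption. The function name and the convention that the first class starts at 0 and the last class is open-ended are choices made for this example.

```python
def chi_square_statistic(observed, cdf, upper_edges):
    """Pearson chi-square for classes [0, e1), [e1, e2), ..., [ek, infinity).

    observed    : observed counts, one per class (len(upper_edges) + 1 values)
    cdf         : model distribution function, with cdf(0) equal to 0
    upper_edges : increasing finite class boundaries, e.g. [3, 6, 9, 12, 15]
    """
    n = sum(observed)
    edges = [0.0] + list(upper_edges)
    cumulative = [cdf(e) for e in edges] + [1.0]   # open-ended last class
    statistic = 0.0
    for k, obs in enumerate(observed):
        expected = n * (cumulative[k + 1] - cumulative[k])
        statistic += (obs - expected) ** 2 / expected
    return statistic
```

With the 3-year classes used in this study, `upper_edges` would be `[3, 6, 9, 12, 15]` and `observed` would hold the six class counts. Classes with very small expected frequencies make the statistic unstable, which is one reason the choice of class width matters.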

Table 4 shows that the GE distribution, among three competitive distributions, has the minimum AIC value. This strongly suggests that the GE distribution is more economical as compared to gamma or Weibull distribution. In addition, we observe that the GE distribution has the minimum K-S distance value. This implies that the nonparametric K-S criterion also suggests GE distribution as the most appropriate distribution to represent the present earthquake catalog. The chi-square criterion, on the other hand, suggests the gamma distribution to be the best fitted model for the present data set.

Table 4 Model comparison using three criteria: the maximum log-likelihood (ln L) criterion and its modification (AIC), the minimum K-S distance criterion, and the minimum chi-square (χ²) criterion

A close look at the K-S curve (Fig. 6) substantiates that all three distributions cross each other, and all of them are quite close to the empirical distribution function. Therefore, at this stage, it is difficult to decide which one of these three competitive distributions performs better in a global sense. As a matter of fact, we avoid claiming that the GE is the most appropriate distribution for recurrence modeling. Rather, we argue that the GE distribution can be effectively used as a practical alternative to the gamma and Weibull distributions in earthquake recurrence modeling and associated problems. In the following section, we apply the GE model to generate a number of estimated conditional probability curves to measure earthquake hazards in the study region.

5 Earthquake hazard assessment

Following Utsu (1984), we assess earthquake hazards in terms of the estimated recurrence interval and conditional probability values. Subsequently, we generate a number of conditional probability curves (hazard curves) for different combinations of elapsed time (\( t_{\text{e}} \)) and residual time (τ). These estimated hazard curves play a significant role in seismic zonation and microzonation, evaluation of existing building codes, city planning and infrastructure development, designing of highway/railway bridges, location choice of nuclear power plants, schools, and hospitals, and many other engineering applications (Baker 2008; Yadav et al. 2009, 2010).

The expected mean recurrence interval of an earthquake (M ≥ 7.0) from the GE model is 8.23 ± 6.40 years in comparison to 7.83 ± 6.39 years from the gamma model and 7.84 ± 5.84 years from the Weibull model. The estimated cumulative probability (from the GE model) of a large M ≥ 7.0 magnitude earthquake by 2015 is 0.94, which is critically high. The same is found to be 0.95 and 0.96 from the gamma and Weibull models, respectively. On the other hand, the conditional probability (from the GE model) reaches 0.80–0.90 after about 10–14 years (2023–2027) and 0.90–0.95 after about 14–18 years (2027–2031) for an elapsed time of 18 years (i.e., 2013). A comprehensive list of estimated conditional probabilities using the GE model is presented in Table 5, and the corresponding conditional probability curves (hazard curves) are schematically shown in Fig. 7.

Table 5 Conditional probabilities of an earthquake of magnitude M ≥ 7.0 occurring in the next τ year(s), given that no earthquake has occurred in the \( t_{\text{e}} \) year(s) since the last occurrence in 1995
Fig. 7

The estimated conditional probability curves (hazard curves) for elapsed time \( t_{\text{e}} \) = 5, 10, …, 35 years using the GE distribution for earthquake events M ≥ 7.0 in the study region. The dotted line corresponds to an elapsed time of 18 years, i.e., 2013
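A grid of conditional probabilities of the kind summarized in Table 5 and Fig. 7 can be generated from Eq. (4). The sketch below is illustrative: it rewrites Eq. (4) as 1 − S(t_e + τ)/S(t_e), with the survival function S = 1 − F evaluated in a numerically stable way, and the parameter values in the usage note are placeholders chosen for this example, not the estimates of Table 2. All function names are assumptions.

```python
import math


def log1mexp(z):
    """Numerically stable ln(1 - exp(-z)) for z > 0."""
    if z < math.log(2.0):
        return math.log(-math.expm1(-z))
    return math.log1p(-math.exp(-z))


def ge_survival(t, alpha, beta, gamma):
    """S(t) = 1 - (1 - exp(-(t - gamma)/alpha))**beta, stable for large t."""
    z = (t - gamma) / alpha
    if z <= 0.0:
        return 1.0
    return -math.expm1(beta * log1mexp(z))


def conditional_probability(tau, t_e, alpha, beta, gamma):
    """Eq. (4) written as 1 - S(t_e + tau) / S(t_e)."""
    s_now = ge_survival(t_e, alpha, beta, gamma)
    s_later = ge_survival(t_e + tau, alpha, beta, gamma)
    return 1.0 - s_later / s_now


def conditional_probability_table(elapsed, residual, alpha, beta, gamma):
    """Map each elapsed time t_e to a list of P(tau | t_e) over the residual times."""
    return {t_e: [conditional_probability(tau, t_e, alpha, beta, gamma)
                  for tau in residual]
            for t_e in elapsed}
```

For example, `conditional_probability_table(range(5, 40, 5), [1, 5, 10, 20], alpha=6.0, beta=1.37, gamma=0.5)` returns one row per elapsed time. With β > 1 the hazard function is increasing, so each column grows with the elapsed time, which is the behavior displayed by the hazard curves.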

6 Summary and Conclusions

This article has presented a study on the three-parameter generalized (exponentiated) exponential distribution (Gupta and Kundu 1999) and has investigated its scope in seismic recurrence studies. The purpose was to increase the choices of potential models to estimate earthquake interevent times. A detailed description of the GE model, its parameter estimation, and model selection techniques was provided. It was observed that the GE distribution, unlike the gamma distribution, has a tractable distribution function or survival function, which makes the GE distribution computationally much easier to handle than the gamma distribution. The GE distribution, like the gamma or Weibull distribution, offers both monotonically increasing and decreasing hazard functions, which play a significant role in seismic reliability analysis. This facility, however, was not previously available from an exponential (Poisson) distribution that only provides a constant failure rate. The moment generating function of the GE distribution is quite intractable. As a consequence, the GE distribution, unlike the gamma distribution and like the Weibull distribution, does not preserve the hereditary (reproductive) property.

For illustrative purposes, we used a real, homogeneous, and complete earthquake catalog of 20 events (M ≥ 7.0) from northeast India and its adjoining regions. The estimated (MMLE) shape parameter (\( \hat{\beta } = 1.374836 \)) reveals that the underlying GE distribution is unimodal and right-tailed, and the corresponding hazard function increases monotonically. In order to compare the GE distribution with the gamma and Weibull distributions, we applied three model selection criteria, namely, the maximum likelihood criterion and its extension (AIC), the K-S minimum distance criterion, and the chi-square criterion. It was observed that two criteria, namely the maximum likelihood criterion (AIC) and the K-S criterion, suggest that the GE model fits the present data set comparatively better. This shows the efficacy of the GE model as a practical alternative to other popular probability models for analysis by seismologists and earthquake professionals. We have also presented a few conditional probability curves (hazard curves) for elapsed time \( t_{\text{e}} \) = 5, 10, …, 35 years. These curves indicate very high chances of future earthquakes in the study region. The conditional probability of a large magnitude event by 2023–2027 and by 2027–2040 reaches 0.80–0.90 and 0.90–0.99, respectively.

The present study has demonstrated the use of GE distribution in probabilistic earthquake recurrence modeling of northeast India and its adjoining regions. However, more work is needed to confirm the suitability of GE distribution for a broad spectrum of the earthquake catalog from different parts of the globe.