A new probability model with application to heavy-tailed hydrological data

Hussain, Tassaddaq; Bakouch, Hassan S.; Chesneau, Christophe

doi:10.1007/s10651-019-00422-7


Published: 31 May 2019

Volume 26, pages 127–151, (2019)


Environmental and Ecological Statistics


Tassaddaq Hussain¹,
Hassan S. Bakouch² &
Christophe Chesneau³


Abstract

Because of the dramatic changes that are being observed in the climatic conditions of the world, such as excess of rains, drought and huge floods, we introduce a versatile hydrologic probability model with three parameters. The proposed model is a combination of the Lomax and generalized Weibull distributions based on an exponent odd function. Main properties of the distribution are obtained, such as shapes of the probability density and hazard rate functions, quantile function, asymptotic distribution, information matrix and characterization via hazard rate function. Parameters are estimated via the maximum likelihood estimation method. Four data sets are used to compare the proposed model with a number of well-known hydrologic models. The proposed model is found to be suitable and representative for heavy-tailed hydrological data sets, with least loss of information attitude and a realistic return period.


1 Introduction with the proposed model

The frequency of heavy precipitation, or the proportion of total rainfall from heavy falls, will increase in the 21st century over many areas of the globe, which will increase the likelihood of floods and the devastation of infrastructure (Intergovernmental Panel on Climate Change IPCC 2012). These anticipated effects call for accurate modeling of flood, precipitation and earthquake data that not only produces a good fit but also yields realistic return periods. Modeling such hydrologic phenomena requires assessing the tail behavior, which is the only realistic knowledge regarding the behavior of the distribution beyond the range of the sample (Markovich 2007). Tail behavior has its own significance in each discipline. For example, light-tailed distributions (kurtosis less than or equal to 3) can effectively be used in the analysis of global warming in terms of extreme warm and cold temperatures (Loikith and Neelin 2015) and in rainfall data analysis (Papalexiou et al. 2013), while heavy-tailed distributions are used to model the service time and input in queuing models, flood levels of rivers, major insurance claims, wave heights during a storm, and low and high temperatures (Markovich 2007). The heavy-tailed distributions include the Pareto, the lognormal, the Weibull with shape parameter less than 1, the Cauchy, the Burr and the Fréchet, while the light-tailed distributions include the exponential, the gamma, the Weibull with shape parameter greater than 1, and the normal distributions.

In this regard, the conventional frequency analysis (CFA) is used for modeling these extreme events via heavy-tailed distributions, which include the Log-Pearson type III (LP(3)) distribution defined by Bobee (1975), the three-parameter log-normal distribution (LN(3)) as stated by Krige (1960), the generalized Pareto (GP) distribution used by Hosking and Wallis (1987) and Dargahi-Noubary (1989), the generalized logistic (GLO) distribution studied by Dyrrdal (2012) and Balakrishnan and Leung (1988), the Gumbel distribution, known as the generalized extreme value type I (GEV-I) distribution, investigated by Mujere (2011), the three-parameter Kappa distribution studied by Junior and Johnson (1973), the gamma distribution, the generalized exponentiated exponential Lindley (GEEL) distribution proposed by Hussain et al. (2018) and the Fréchet distribution discussed by Ramos et al. (2018).

Moreover, the domains of attraction for maxima and minima of the LP(3), LN(3), GP, GLO and GEEL distributions are the Gumbel–Weibull, the Gumbel–Gumbel, the Fréchet–Weibull, the Gumbel–Gumbel and the Gumbel–Fréchet, respectively. However, the Gumbel distribution is not considered an appropriate model for the analysis of extreme events because its right tail is light. Furthermore, in the analysis of extreme events, researchers generally prefer positive skewness coupled with heavy tails over light- or intermediate-tailed distributions (Hussain et al. 2018).

Here, we study a procedure designed for modeling heavy-tailed environmental data sets, based on flexible parametric modeling by adding shape, location or scale parameters, while constraining the number of parameters to be at most four. However, three is the recommended number of parameters for drawing valid conclusions (Mudholkar et al. 1996; Johnson et al. 2005), as it provides a broader range of hazard shapes and allows an assessment of each competing model relative to a more comprehensive one (Mudholkar et al. 1996). Also, the addition of an extra parameter is useful in exploring tail behavior, which not only improves the goodness of fit of the proposed models but also lowers the information criteria. Moreover, the exploration of tails usually reflects the importance of extreme and intermediate order statistics or exceedances over high thresholds. Focusing our attention on tails may advantageously inspire the introduction of new parametric statistical models.

The first motivation for modeling extreme events and heavy-tailed phenomena is the selection of the Pareto Type II distribution, which was first proposed by Lomax (1987). It is effectively used in reliability modelling, life testing, income, wealth, business failure, firm size, queuing problems, biological sciences, and modeling the distribution of the sizes of computer files on servers, and as an alternative to the exponential distribution when the data are heavy-tailed. The Lomax distribution is defined as follows.

Definition 1

A random variable T has the Lomax distribution with shape parameter $ \beta >0 $ if its cumulative distribution function (cdf) is given by

$$\begin{aligned} \pi (t)=1-(1+t)^{-\beta }, \quad t> 0, \, \beta > 0, \end{aligned}$$

and its corresponding probability density function (pdf) is expressed as

$$\begin{aligned} \ell (t)=\beta (1+t)^{-\beta -1}, \quad t> 0, \, \beta > 0. \end{aligned}$$

The second motivation is to handle the shortcomings of the Weibull distribution (an asymptotic version of one of the three heavy-tailed phenomena), which is frequently used in the statistical analysis of practical data but is inappropriate when the failure rate is unimodal or bathtub shaped, as well as when the data exhibit high kurtosis, usually greater than or equal to twenty. The third motivation is the selection of the two-parameter generalized Weibull (GW) distribution defined by Mudholkar et al. (1996), which possesses a bathtub hazard rate, a simple structure and heavy-tailed behavior, as the baseline distribution. Its cdf on the interval $(0,\infty )$ is $G(x)=1-\left( 1-\lambda x^{\theta }\right) ^{\frac{1}{\lambda }}$, with pdf $ g(x)=\theta x^{\theta -1}\left( 1-\lambda x^{\theta }\right) ^{\frac{1}{\lambda }-1}$ for $x > 0, \lambda \le 0$. The fourth motivation is the choice of an appropriate link function $ D(.):[a,b]\rightarrow [0,\infty ] $, which helps to overcome the scaling issues of environmental data sets and satisfies the following conditions: (i) D(.) is differentiable and monotonically non-decreasing; (ii) $ D(x) \rightarrow a$ as $ x \rightarrow 0 $ and $ D(x) \rightarrow b$ as $ x \rightarrow \infty $. It can be taken as the exponent odd function $ D(x)=(e^{\theta \frac{G(x)}{1-G(x)}}-1)/(e^{\theta }-1)$, $ \theta > 0 $, where $ \theta $ and $ e^{\theta }-1 $ behave as scale parameters. On the basis of the above four motivations, we define the cdf of the Lomax D-GW distribution, abbreviated as LDGW, as follows.

Definition 2

If X is a continuous random variable, then the cdf of the LDGW distribution with parameters $ \lambda ,\theta , \beta $ is defined as

$$\begin{aligned} {F}_{{LDGW}}(x|\lambda , \theta ,\beta )&= \int _{0}^{D(x)}\ell (t)dt=1-\left( 1+ \dfrac{e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}}-1}{e^{\theta }-1}\right) ^{-\beta }, \end{aligned}$$

(1)

where $\beta > 0$, $\theta > 0$, $\lambda \le 0$ and $x > 0$.

The corresponding probability density function (pdf) of the LDGW distribution is expressed as

$$\begin{aligned}&f_{{LDGW}}(x|\lambda , \theta , \beta )\nonumber \\&\quad = \beta \theta ^{2} \left( 1+ \dfrac{e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}}-1}{e^{\theta }-1}\right) ^{-\beta -1} \dfrac{e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}} x^{\theta -1}\left( 1-\lambda x^{\theta }\right) ^{-\frac{1}{\lambda }-1}}{e^{\theta }-1}. \nonumber \\ \end{aligned}$$

(2)

Similarly the survival function (sf) of the LDGW distribution is

$$\begin{aligned} {S}_{{LDGW}}(x|\lambda , \theta ,\beta )=\left( 1+ \dfrac{e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}}-1}{e^{\theta }-1}\right) ^{-\beta } \end{aligned}$$

(3)

and the hazard rate function (hrf) of the LDGW distribution is given by

$$\begin{aligned} {h}_{{LDGW}}(x|\lambda , \theta ,\beta )=\frac{\beta \theta ^2 e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}} x^{-1+\theta } \left( 1- \lambda x^{\theta } \right) ^{-1-\frac{1}{\lambda }}}{e^{-\theta +\theta \left( 1- \lambda x^{\theta } \right) ^{-\frac{1}{\lambda }}}+ e^{\theta }-2}. \end{aligned}$$

(4)
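Equations (1)–(4) translate directly into code. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the sketch assumes $\lambda < 0$ (the boundary case $\lambda = 0$ would need a limiting form that is not handled here). It evaluates the exponentials directly, so very large arguments can overflow.

```python
import math


def _check(x, lam, theta, beta):
    # The sketch covers lam < 0 only; lam = 0 is not handled.
    if not (x > 0 and lam < 0 and theta > 0 and beta > 0):
        raise ValueError("need x > 0, lam < 0, theta > 0, beta > 0")


def _parts(x, lam, theta):
    """Return (u, w): u is the exponent in Eq. (1), w the factor shared by pdf and hrf."""
    base = 1.0 - lam * x**theta                      # 1 - lambda * x^theta
    u = -theta + theta * base ** (-1.0 / lam)
    w = theta**2 * x ** (theta - 1.0) * base ** (-1.0 / lam - 1.0)
    return u, w


def ldgw_sf(x, lam, theta, beta):
    """Survival function, Eq. (3)."""
    _check(x, lam, theta, beta)
    u, _ = _parts(x, lam, theta)
    return (1.0 + math.expm1(u) / math.expm1(theta)) ** (-beta)


def ldgw_cdf(x, lam, theta, beta):
    """Cumulative distribution function, Eq. (1)."""
    return 1.0 - ldgw_sf(x, lam, theta, beta)


def ldgw_pdf(x, lam, theta, beta):
    """Probability density function, Eq. (2)."""
    _check(x, lam, theta, beta)
    u, w = _parts(x, lam, theta)
    d = math.expm1(u) / math.expm1(theta)
    return beta * (1.0 + d) ** (-beta - 1.0) * math.exp(u) * w / math.expm1(theta)


def ldgw_hrf(x, lam, theta, beta):
    """Hazard rate function, Eq. (4)."""
    _check(x, lam, theta, beta)
    u, w = _parts(x, lam, theta)
    return beta * w * math.exp(u) / (math.exp(u) + math.exp(theta) - 2.0)
```

A quick consistency check is that `ldgw_pdf` equals `ldgw_hrf` times `ldgw_sf` at any admissible point, which mirrors how Eq. (4) follows from Eqs. (2) and (3).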

In addition, we have observed that $ \lim _{x\rightarrow \infty }e^{\alpha x}{S}_{{LDGW}}(x|\lambda , \theta ,\beta )= \infty $, for all $ \alpha > 0 $ and $ \theta < -\lambda $. Furthermore, $ \lim _{x\rightarrow \infty }{h}_{{LDGW}}(x|\lambda , \theta ,\beta ) = 0 $ for $ \theta < -\lambda $. Hence, the LDGW distribution has a heavy tail when $ \theta < -\lambda $ (Foss et al. 2011). This behavior makes the distribution suitable for the heavy-tailed hydrological data considered in the application portion of this paper. Other motivational factors include the handling of the devastating effects of floods. Floods are generally caused by heavy concentrated rainfall, sometimes augmented by snowmelt flows in rivers, and they usually result in financial loss, destruction of infrastructure and spreadness distance issues. To cope with these effects, a model such as the proposed one is desirable: it not only addresses flood and precipitation data at a site or a region but also produces a realistic return period; see the real-life applications to the four data sets in the application portion of this paper. Moreover, the proposed model exhibits not only increasing but also upside-down bathtub shapes, see Figs. 1 and 2. In addition, the proposed model exhibits positive skewness, symmetry and negative skewness coupled with leptokurtic, mesokurtic and platykurtic behavior (see Figs. 3, 4 and 5). Last but not least, the asymptotic distributions of both maxima and minima lie in the Weibull domain of attraction, which rarely occurs in the literature.
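A short asymptotic sketch may make the heavy-tail condition $ \theta < -\lambda $ more concrete. It assumes $ \lambda < 0 $ and uses only Eq. (3); the constant $ c $ is notation introduced here for illustration and does not appear in the original. As $ x \rightarrow \infty $,

$$\begin{aligned} \left( 1-\lambda x^{\theta }\right) ^{-\frac{1}{\lambda }} \sim |\lambda |^{\frac{1}{|\lambda |}}\, x^{\frac{\theta }{|\lambda |}}, \qquad -\ln {S}_{{LDGW}}(x|\lambda , \theta ,\beta ) \sim c\, x^{\frac{\theta }{|\lambda |}}, \qquad c=\beta \theta |\lambda |^{\frac{1}{|\lambda |}}. \end{aligned}$$

Under this approximation the survival function decays like a stretched exponential with exponent $ \theta /|\lambda | $. When $ \theta < -\lambda $ this exponent is below 1, so $ \alpha x $ eventually dominates $ c\, x^{\theta /|\lambda |} $ and $ e^{\alpha x}{S}_{{LDGW}}(x|\lambda , \theta ,\beta ) \rightarrow \infty $ for every $ \alpha > 0 $, in line with the limit stated above.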

The rest of this article is arranged as follows. Section 2 deals with some statistical properties of the LDGW distribution, such as the quantile function, the asymptotic distributions of the largest and smallest order statistics, and a characterization via the hazard function. Parameter estimation by maximum likelihood is studied in Sect. 3, where hydrological applications are also explored and comparisons are made with reputed hydrological probability models. Concluding remarks, findings and recommendations are given in Sect. 4.

2 Some properties of the model

Here, we shall discuss some properties of the LDGW distribution.

2.1 Quantile function

If X is an absolutely continuous random variable having the pdf defined in (2), then its quantile function, say $Q_{p}$ (quantile of order p), is defined by ${F}_{{LDGW}}(Q_{p}|\lambda , \theta ,\beta )=p$, where $0<p<1$. The quantile function gives a number $ Q_{p}$ such that the area to the left of $Q_{p}$ is p, i.e., it is the root of this equation, which is

$$\begin{aligned} Q_p=\lambda ^{-\frac{1}{\theta }}\left( 1-\left( \dfrac{ \theta +\ln \left\{ 2-e^{\theta }-(1-e^{\theta })(1-p)^ {-\frac{1}{\beta }}\right\} }{\theta }\right) ^{-\lambda }\right) ^ {\frac{1}{\theta }}. \end{aligned}$$

(5)

Equation (5) can be used to simulate LDGW random variables.
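As an illustration of how Eq. (5) supports simulation, the following minimal sketch implements the quantile function and an inverse-transform sampler with the Python standard library. The function names are hypothetical and $\lambda < 0$ is assumed. The quantile is computed in the algebraically equivalent ratio form $\left( (1-r^{-\lambda })/\lambda \right) ^{1/\theta }$, so that only positive numbers are raised to fractional powers.

```python
import math
import random


def ldgw_cdf(x, lam, theta, beta):
    """Cdf of Eq. (1) for lam < 0, used here only for the round-trip check."""
    u = -theta + theta * (1.0 - lam * x**theta) ** (-1.0 / lam)
    return 1.0 - (1.0 + math.expm1(u) / math.expm1(theta)) ** (-beta)


def ldgw_quantile(p, lam, theta, beta):
    """Quantile of order p, following Eq. (5)."""
    if not (0.0 < p < 1.0 and lam < 0 and theta > 0 and beta > 0):
        raise ValueError("need 0 < p < 1, lam < 0, theta > 0, beta > 0")
    q = (1.0 - p) ** (-1.0 / beta)
    # Argument of the logarithm in Eq. (5): 2 - e^theta - (1 - e^theta) q,
    # rewritten as 1 + (e^theta - 1)(q - 1).
    inner = 1.0 + math.expm1(theta) * (q - 1.0)
    r = (theta + math.log(inner)) / theta
    # (1 - r^(-lam)) / lam is positive for lam < 0, so the outer power is real.
    return ((1.0 - r ** (-lam)) / lam) ** (1.0 / theta)


def ldgw_sample(n, lam, theta, beta, rng=None):
    """Draw n variates by applying the quantile function to uniform draws."""
    rng = rng or random.Random()
    out = []
    while len(out) < n:
        p = rng.random()          # uniform on [0, 1)
        if p > 0.0:               # skip the endpoint p = 0
            out.append(ldgw_quantile(p, lam, theta, beta))
    return out
```

For example, `ldgw_sample(100, -1.0, 0.5, 2.0, random.Random(1))` returns 100 positive draws, and the median is `ldgw_quantile(0.5, -1.0, 0.5, 2.0)`.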

Using (5), we can find the median, skewness and kurtosis of the LDGW distribution as Median $ = Q_{0.5}$ (that is, $p = 0.5 $),

$$\begin{aligned} {Sk.}= & {} \frac{Q_{\frac{3}{4}}-2 Q_{\frac{1}{2}}+Q_{\frac{1}{4}}}{Q_{\frac{3}{4}}-Q_{\frac{1}{4}}}, \end{aligned}$$

(6)

$$\begin{aligned} {Ku.}= & {} \frac{Q_{\frac{7}{8}}- Q_{\frac{5}{8}} +Q_{\frac{3}{8}}-Q_{\frac{1}{8}}}{Q_{\frac{6}{8}} -Q_{\frac{2}{8}}}. \end{aligned}$$

(7)
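The quantile-based measures in Eqs. (6) and (7), commonly known as the Bowley skewness and the Moors kurtosis, need only a quantile function. The sketch below is a minimal illustration in Python; the function names are hypothetical, and `q` stands for any callable returning $Q_p$, for instance an implementation of Eq. (5).

```python
import math


def bowley_skewness(q):
    """Eq. (6): quartile-based skewness for a quantile function q(p)."""
    return (q(0.75) - 2.0 * q(0.5) + q(0.25)) / (q(0.75) - q(0.25))


def moors_kurtosis(q):
    """Eq. (7): octile-based kurtosis for a quantile function q(p)."""
    return (q(7 / 8) - q(5 / 8) + q(3 / 8) - q(1 / 8)) / (q(6 / 8) - q(2 / 8))


# Stand-in quantile function (unit exponential), used only to exercise the formulas.
def exp_quantile(p):
    return -math.log(1.0 - p)


skew_exp = bowley_skewness(exp_quantile)
kurt_exp = moors_kurtosis(exp_quantile)
```

Both measures are ratios of quantile differences, so they are unchanged by shifting or rescaling the variable and remain defined when moments do not exist.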

Equation (7) indicates that the tail of the distribution becomes heavier as the kurtosis increases. These measures are less sensitive to outliers and they exist even for distributions without moments (Cakmakyapan and Ozel 2016). Figure 5 gives the plots of skewness and kurtosis for the LDGW distribution. From this figure, the distribution can be leptokurtic, mesokurtic or platykurtic. It is thus evident that the distribution reflects both light- and heavy-tailed behavior. Moreover, it is also observed that kurtosis and skewness decrease as $ \beta $ increases.

2.2 Asymptotic distribution

Here, we present the asymptotic distributions of $W_{n}=X_{n:n}$ and $w_{n}=X_{1:n}$ that are the largest and smallest observations, respectively, from the random sample of n observations. For this purpose we take the cdf and pdf of the LDGW distribution. Then, when $ \theta < -\lambda $, the distribution of maxima is

$$\begin{aligned} \lim _{t\rightarrow \infty }\frac{1-{F}_{{LDGW}}(t+x|\lambda , \theta , \beta )}{1-{F}_{{LDGW}} (t|\lambda , \theta , \beta )}=\lim _{t\rightarrow \infty }\frac{{S}_{{LDGW}}(t+x|\lambda , \theta , \beta )}{{S}_{{LDGW}} (t|\lambda , \theta , \beta )}=1. \end{aligned}$$

Hence, the tail of $F_{LDGW}(.)$ is of slow variation. Distributions with slowly varying tails have been shown to be of valuable use in practice (Alves et al. 2009). So, the standardized form of the distribution of maxima is Weibull (type III). Now, the distribution of minima for all values of $ \lambda $, after using L’Hôpital’s rule, is

$$\begin{aligned} \lim _{t\rightarrow 0}\frac{{F}_{{LDGW}} (tx|\lambda , \theta , \beta )}{{F}_{{LDGW}} (t|\lambda , \theta , \beta )}=\lim _{t\rightarrow 0} \frac{x f_{{LDGW}}(tx|\lambda , \theta , \beta )}{f_{{LDGW}}(t|\lambda , \theta , \beta )}=x^{\theta }, \end{aligned}$$

and its standardized form is Weibull (type III), i.e., $ 1-e^{-x^{\theta }} $. According to Leadbetter et al. (1987, Theorem 1.6.2), for norming constants $a_{n}>0,b_{n}>0,c_{n}>0$ and $d_{n}>0$, we obtain

$$\begin{aligned} P\left\{ a_{n}(W_{n}-b_{n})\le x\right\} \rightarrow e^{-1}, \quad P\left\{ c_{n}(w_{n}-d_{n})\le x\right\} \rightarrow 1-e^{-x^{\theta }}, \end{aligned}$$

as $n\rightarrow \infty .$ From this, it is clear that the asymptotic distribution of sample maxima and sample minima is in the domain of attraction of Weibull distribution.
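The lower-tail limit used for the minima, $F_{LDGW}(tx)/F_{LDGW}(t) \rightarrow x^{\theta }$ as $t \rightarrow 0$, can be probed numerically. The following minimal sketch assumes $\lambda < 0$, uses hypothetical function names, and evaluates the cdf of Eq. (1) through `expm1` and `log1p` so that small probabilities keep their precision.

```python
import math


def ldgw_cdf(x, lam, theta, beta):
    """Cdf of Eq. (1), written as 1 - exp(-beta * ln(1 + D(x))) for accuracy near 0."""
    u = -theta + theta * (1.0 - lam * x**theta) ** (-1.0 / lam)
    d = math.expm1(u) / math.expm1(theta)
    return -math.expm1(-beta * math.log1p(d))


def lower_tail_ratio(t, x, lam, theta, beta):
    """F(t x) / F(t); by the limit above it should approach x**theta as t -> 0."""
    return ldgw_cdf(t * x, lam, theta, beta) / ldgw_cdf(t, lam, theta, beta)


# Illustrative parameter values (assumptions, not taken from the paper).
ratio_coarse = lower_tail_ratio(1e-4, 4.0, -1.0, 0.5, 2.0)
ratio_fine = lower_tail_ratio(1e-8, 4.0, -1.0, 0.5, 2.0)
target = 4.0**0.5
```

With these values the ratio moves toward `target` as `t` shrinks, which is the behavior the limit describes.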

2.3 Characterization based on the hazard rate function

Characterization usually helps researchers to determine the exact probability distribution. The hrf is twice differentiable and usually satisfies the first order differential equation

$$\begin{aligned} \dfrac{f'(x)}{f(x)}=\dfrac{h'_{f}(x)}{h_{f}(x)}-h_{f}(x), \end{aligned}$$

where $ h_{f}(x)=\dfrac{f(x)}{1-F(x)} $ is the hrf. The following theorem establishes a non-trivial characterization of the LDGW distribution in terms of the hazard rate function.

Theorem 1

Let $X : \Omega \rightarrow (0, \infty ) $ be an absolutely continuous random variable with sf $ {S}_{{F}}(x) $ and pdf $ f(x) $. Then X has the pdf defined in Eq. (2) iff its hrf $ h(x) $ satisfies the expression

$$\begin{aligned} \frac{(\ln h(x))^{\prime }}{h(x)}&=\bigg \lbrace \theta ^{2}x^{\theta -1}(1-\lambda x^{\theta })^{-\frac{1}{\lambda }-1}+\frac{\theta -1}{x}+\frac{\lambda \theta x^{\theta -1}(1-\lambda x^{\theta })^{-1}(1+\lambda )}{\lambda }\nonumber \\&\quad -\frac{(\beta +1)\theta ^{2}x^{\theta -1}(1-\lambda x^{\theta })^{-\frac{1}{\lambda }-1}e^{-\theta +\theta (1-\lambda x^{\theta })^{-\frac{1}{\lambda }}}}{e^{-\theta +\theta (1-\lambda x^{\theta })^{-\frac{1}{\lambda }}}+e^{\theta }-2}\bigg \rbrace h(x) + 1, \end{aligned}$$

(8)

for $\beta > 0$, $\theta > 0$, $\lambda \le 0$ and $x > 0$.

The proof is deferred to “Appendix-A”.

3 Maximum likelihood estimation and hydrological applications

3.1 Maximum likelihood estimation

Let $X_{1}, X_{2}, \ldots , X_{n}$ be a random sample from the distribution characterized by (2) with parameter vector ${\varvec{\Theta }}=(\lambda ,\beta ,\theta )$ and $ x_{1}, x_{2}, \ldots ,x_{n}$ are the corresponding observed values, then its log-likelihood is expressed as

$$\begin{aligned} \ell (\Theta )&=\ln [{L}(x_{1},x_{2},\ldots x_{n}|\Theta )]=(\theta -1)\sum _{i=1}^n \ln (x_i)-n \ln (e^{\theta }-1)\nonumber \\&\quad -n \theta +\theta \sum _{i=1}^n \left( 1- \lambda x_{i}^{\theta } \right) {}^{-\frac{1}{\lambda }}\nonumber \\&\quad + n \ln (\beta )+2 n \ln (\theta )-\left( 1+\frac{1}{\lambda }\right) \sum _{i=1}^n \ln (1- \lambda x_{i}^{\theta }) \nonumber \\&\quad -(\beta +1)\sum _{i=1}^n \ln \left[ 1+\frac{-1+e^{-\theta +\theta \left( 1-\lambda x_i^{\theta } \right) {}^{-\frac{1}{\lambda }}}}{e^{\theta }-1}\right] . \end{aligned}$$

(9)

The maximum-likelihood estimators (MLEs) of $ \lambda $, $ \theta $ and $ \beta $ are obtained by numerically solving the nonlinear system of equations $ \frac{\partial \ell (\Theta )}{\partial \theta }=0, \frac{\partial \ell (\Theta )}{\partial \lambda }=0 $ and $ \frac{\partial \ell (\Theta )}{\partial \beta }=0 $. These partial derivatives are given in “Appendix-B”. Moreover, determining the variance–covariance matrix and the confidence intervals for the distribution parameters requires the information matrix, which can be obtained by taking the expectation of the second order derivatives. The second order derivatives of the log-likelihood can be provided on demand.
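In practice the likelihood equations are usually handled by passing the log-likelihood of Eq. (9) to a general-purpose numerical optimizer. The following minimal sketch codes only the objective function, with the Python standard library; the function names are hypothetical, $\lambda < 0$ is assumed, and the overflow safeguards are a design choice of this sketch and not part of the original derivation.

```python
import math


def _log_one_plus_d(u, theta):
    """ln[1 + (e^u - 1)/(e^theta - 1)], with e^u factored out when u is large."""
    em1 = math.expm1(theta)
    if u < 30.0:
        return math.log1p(math.expm1(u) / em1)
    return u + math.log1p((em1 - 1.0) * math.exp(-u)) - math.log(em1)


def ldgw_loglik(params, data):
    """Log-likelihood of Eq. (9); params = (lam, theta, beta).

    Returns -inf outside the assumed parameter space (lam < 0, theta > 0,
    beta > 0), for non-positive data, or on numerical overflow.
    """
    lam, theta, beta = params
    if not (lam < 0 and theta > 0 and beta > 0) or any(x <= 0 for x in data):
        return -math.inf
    n = len(data)
    try:
        # Terms of Eq. (9) that do not depend on the individual observations.
        total = n * (math.log(beta) + 2.0 * math.log(theta) - theta
                     - math.log(math.expm1(theta)))
        for x in data:
            base = 1.0 - lam * x**theta
            g = base ** (-1.0 / lam)
            total += (theta - 1.0) * math.log(x) + theta * g
            total -= (1.0 + 1.0 / lam) * math.log(base)
            total -= (beta + 1.0) * _log_one_plus_d(-theta + theta * g, theta)
    except OverflowError:
        return -math.inf
    return total
```

A maximizer would then minimize `-ldgw_loglik(params, data)` over the three parameters, starting from several initial values because the surface need not be concave.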

3.2 Evaluation tests

In order to demonstrate the proposed methodology, we consider four different real-world data sets and compare the proposed model with ten distributions that are popular in hydrologic data analysis, namely the Weibull, generalized Pareto (3), log-normal (3), log Pearson Type 3, Kappa(3), extreme value, Fréchet, generalized logistic, generalized exponentiated exponential Lindley (GEEL) and generalized Weibull distributions; for the probability density functions of these distributions, readers are referred to Hussain et al. (2018). The Akaike information criterion (AIC), corrected Akaike information criterion (AICc), Hannan–Quinn information criterion (HQIC) and consistent Akaike information criterion (CAIC), along with the Vuong test, are used to select the best model among the competing models. The definitions of AIC, AICc, HQIC and CAIC are given as:

$$\begin{aligned} {AIC}= & {} 2 k -2\ell ({\hat{\Theta }}), \qquad {AICc}={AIC}+\frac{2k(k+1)}{n-k-1},\\ {HQIC}= & {} -2 \ell ({\hat{\Theta }})+2k \ln (\ln (n)), \qquad {CAIC}=-2 \ell ({\hat{\Theta }} )+\frac{2 k n}{n-k-1}. \end{aligned}$$
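The four criteria are simple functions of the maximized log-likelihood, the number of parameters k and the sample size n. The following minimal sketch follows the formulas exactly as displayed above; the function name and the dictionary layout are hypothetical.

```python
import math


def information_criteria(loglik, k, n):
    """AIC, AICc, HQIC and CAIC as defined in the display above.

    loglik is the maximized log-likelihood, k the number of parameters,
    n the number of observations.
    """
    if n - k - 1 <= 0:
        raise ValueError("need n > k + 1")
    aic = 2.0 * k - 2.0 * loglik
    return {
        "AIC": aic,
        "AICc": aic + 2.0 * k * (k + 1) / (n - k - 1),
        "HQIC": -2.0 * loglik + 2.0 * k * math.log(math.log(n)),
        "CAIC": -2.0 * loglik + 2.0 * k * n / (n - k - 1),
    }


# Illustrative inputs only: a made-up log-likelihood for a 3-parameter model.
example = information_criteria(-100.0, 3, 39)
```

With the formulas as printed, the AICc and CAIC expressions are algebraically identical, since $2k+2k(k+1)/(n-k-1)=2kn/(n-k-1)$, so the sketch returns equal values for both.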

Moreover, the goodness of fit of the competing models is also tested via the Kolmogorov–Smirnov (K-S), the Anderson–Darling ($A_{0}^{*}$) and the Cramér–von Mises ($\hbox {W}_{0}^{*}$) statistics. The mathematical expressions for these statistics are given by

$$\begin{aligned} {KS}= & {} \max \left\{ \frac{i}{m}-z_{i},z_{i}-\frac{i-1}{m}\right\} ,\chi ^{2}=\sum _{i=1}^m \frac{(o_{i}-e_{i})^{2}}{e_{i}},\\ A_{0}^{*}= & {} \left( \frac{2.25}{m^{2}}+\frac{0.75}{m}+1\right) \left\{ -m- \frac{1}{m}\sum _{i=1}^{m}(2i-1)\ln (z_{i}(1-z_{m-i+1})) \right\} ,\\ W_{0}^{*}= & {} \sum _{i=1}^{m}\left( z_{i}-\frac{2i-1}{2m}\right) ^{2} +\frac{1}{12m}, \end{aligned}$$

where k denotes the number of parameters, n denotes the number of observations, m denotes the number of classes, $z_{i}={\mathfrak {Q}}(x_{i}) $, the $x_{i}$’s are the ordered observations, $o_{i}$ and $e_{i}$ are the observed and expected frequencies of the ith class, respectively.
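The K-S, $A_{0}^{*}$ and $W_{0}^{*}$ expressions can be evaluated directly from the fitted cdf values $z_{i}$. The sketch below is a minimal illustration in Python. It takes m to be the number of ordered values $z_{i}$ supplied, which is an assumption about how the formulas are applied, and the function name is hypothetical. The chi-square statistic is omitted because it needs class frequencies.

```python
import math


def gof_statistics(z):
    """Return (KS, A0*, W0*) from fitted cdf values z_i of the observations."""
    z = sorted(z)
    m = len(z)
    if m == 0 or not all(0.0 < v < 1.0 for v in z):
        raise ValueError("need a non-empty list of values strictly inside (0, 1)")
    # KS: largest gap between the empirical and fitted cdf over the ordered points.
    ks = max(max((i + 1) / m - z[i], z[i] - i / m) for i in range(m))
    # Anderson-Darling sum; z[m - i] is z_{m-i+1} in 1-based notation.
    s = sum((2 * i - 1) * math.log(z[i - 1] * (1.0 - z[m - i]))
            for i in range(1, m + 1))
    a_star = (2.25 / m**2 + 0.75 / m + 1.0) * (-m - s / m)
    # Cramer-von Mises statistic.
    w_star = sum((z[i - 1] - (2 * i - 1) / (2 * m)) ** 2
                 for i in range(1, m + 1)) + 1.0 / (12 * m)
    return ks, a_star, w_star


# Stand-in input: four evenly spread cdf values, used only to exercise the code.
ks, a_star, w_star = gof_statistics([0.125, 0.375, 0.625, 0.875])
```

In an application, `z` would be obtained by evaluating the fitted cdf of each competing model at the ordered observations.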

The Vuong test and its procedure are outlined as follows.

Vuong test. The chi-square approximation to the distribution of the likelihood ratio test statistic is valid only for testing restrictions on the parameters of a statistical model (i.e., $ {\mathbf {H}}_{0} $ and $ {\mathbf {H}}_{1} $ are nested hypotheses). When the models are non-nested, we cannot use likelihood ratio tests for model comparison. In this scenario, the AIC, AICc, CAIC and HQIC as well as the Vuong test for non-nested models are useful. Vuong (1989) proposed a likelihood ratio-based statistic for testing the null hypothesis that the competing models are equally close to the true data generating process against the alternative that one model is closer. Let us consider two statistical models based on the probability density functions $ f_{A}(x;\xi )=F_{A}(b)- F_{A}(a)$ and $ f_{B}(x;\varrho )=F_{B}(b)-F_{B}(a) $, possessing an equal or unequal number of parameters. Now, we define the likelihood ratio statistic for the model $ f_{A}(x;\xi ) $ against $ f_{B}(x;\varrho ) $ as

$$\begin{aligned} {{\mathfrak {L}}}{{\mathfrak {R}}}(\hat{\xi _{n}},\hat{\varrho _{n}})=\sum _{k=0}^{k_{max}} f_{k} ln \left( \frac{f_{A}(x;\hat{\xi _{n}})}{f_{B}(x;\hat{\varrho _{n}})}\right) , \end{aligned}$$

where $ \hat{\xi _{n}} $ and $ \hat{\varrho _{n}} $ are the MLEs in each model based on the sample $ k_{1}, k_{2},\ldots k_{n} $ and $ f_{k} $ denotes the observed frequency (Denuit et al. 2007). If both models are strictly non-nested, then under $ {\mathbf {H}}_{0}$ the test statistic ${\mathbf {Z}} = \frac{{{\mathfrak {L}}}{{\mathfrak {R}}}(\hat{\xi _{n}}, \hat{\varrho _{n}})}{\sqrt{n}\hat{\omega _{n}}}$ is approximately ${\mathcal {N}}(0,1)$ distributed when n is large, where

$$\begin{aligned} \hat{\omega _{n}}^{2}=\frac{1}{n}\sum _{i=1}^{n} \left( \ln \frac{f_{A}(x_{i}; \hat{\xi _{n}})}{f_{B}(x_{i};\hat{\varrho _{n}})} \right) ^{2}- \left( \frac{1}{n}\sum _{i=1}^{n} \ln \frac{f_{A}(x_{i}; \hat{\xi _{n}})}{f_{B}(x_{i};\hat{\varrho _{n}})} \right) ^{2}. \end{aligned}$$

In order to select an appropriate model, we usually construct the following hypotheses

$$\begin{aligned} {\mathbf {H}}_{0}:{{\mathfrak {K}}}{{\mathfrak {L}}}(f_{A}(x;\xi ))={{\mathfrak {K}}}{{\mathfrak {L}}}(f_{B}(x;\varrho )), \end{aligned}$$

against two one-sided alternatives. Model A is defined to be better than model B if model A’s $ {{\mathfrak {K}}}{{\mathfrak {L}}} $ distance to the truth is smaller than that of model B,

$$\begin{aligned} {\mathbf {H}}_{1A}:{{\mathfrak {K}}}{{\mathfrak {L}}}(f_{A}(x;\xi ))<{{\mathfrak {K}}}{{\mathfrak {L}}}(f_{B}(x;\varrho )), \end{aligned}$$

with the corresponding critical region, i.e., we reject ${\mathbf {H}}_{0} $ in favor of A when $ {\mathbf {Z}} > {\mathbf {Z}}_{\gamma } $. Similarly, model B is defined to be better than model A if model B’s $ {{\mathfrak {K}}}{{\mathfrak {L}}} $ distance to the truth is smaller than that of model A,

$$\begin{aligned} {\mathbf {H}}_{1B}:{{\mathfrak {K}}}{{\mathfrak {L}}} (f_{A}(x;\xi ))>{{\mathfrak {K}}}{{\mathfrak {L}}}(f_{B}(x;\varrho )), \end{aligned}$$

with the corresponding critical region, i.e., we reject ${\mathbf {H}}_{0} $ in favor of B when $ {\mathbf {Z}} < - {\mathbf {Z}}_{\gamma } $; otherwise no decision can be made. Here $ {{\mathfrak {K}}}{{\mathfrak {L}}} $ denotes the Kullback–Leibler distance measure and $ \gamma $ denotes the level of significance. Therefore, if the value of the test statistic is higher than $ {\mathbf {Z}}_{\gamma } $, then one can reject the null hypothesis that the models are equivalent in favor of $ f_{A}(x;\xi ) $ being better than $ f_{B}(x;\varrho ) $, and if the test statistic is smaller than $ - {\mathbf {Z}}_{\gamma } $, then one rejects the null hypothesis in favor of $ f_{B}(x;\varrho ) $ being better than $ f_{A}(x;\xi ) $; otherwise no decision can be made. The comparisons of the competing models for the four data sets are displayed in Table 17.
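The Vuong procedure described above can be sketched in a few lines. The example below is a minimal illustration in Python under stated assumptions: the data are ungrouped, so the likelihood ratio is the sum of pointwise log-density differences, and the variance expression is read as $\hat{\omega }_{n}^{2}$ (as in Vuong 1989), so that its square root enters Z. The function names and the default critical value 1.645 (upper 5% point of the standard normal) are hypothetical choices.

```python
import math


def vuong_z(logf_a, logf_b):
    """Vuong Z from log-densities of models A and B at their MLEs, per observation."""
    if len(logf_a) != len(logf_b) or len(logf_a) < 2:
        raise ValueError("need two equally long sequences with at least 2 entries")
    m = [a - b for a, b in zip(logf_a, logf_b)]   # pointwise log-likelihood ratios
    n = len(m)
    lr = sum(m)                                   # likelihood ratio statistic
    omega2 = sum(v * v for v in m) / n - (lr / n) ** 2
    if omega2 <= 0.0:
        raise ValueError("log-ratios have zero variance; Z is undefined")
    return lr / (math.sqrt(n) * math.sqrt(omega2))


def vuong_decision(z, z_gamma=1.645):
    """Return 'A', 'B' or 'undecided' following the rejection rules above."""
    if z > z_gamma:
        return "A"
    if z < -z_gamma:
        return "B"
    return "undecided"


# Made-up log-density values, used only to exercise the functions.
la = [-1.0, -1.2, -0.8, -1.1]
lb = [-1.5, -1.3, -1.4, -1.2]
z_ab = vuong_z(la, lb)
```

Swapping the roles of the two models changes only the sign of Z, so a single computation serves both one-sided alternatives.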

Example 1

The first data set (I) consists of the annual flood discharge rates ($ ft^3/s $) of the Floyd River during 1935–1973; it was first reported by Pickands (1975) and later by Akinsete et al. (2008). The data consist of the following values: 1460, 4050, 3570, 2060, 1300, 1390, 1720, 6280, 1360, 7440, 5320, 1400, 3240, 2710, 4520, 4840, 8320, 13900, 71500, 6250, 2260, 318, 1330, 970, 1920, 15100, 2870, 20600, 3810, 726, 7500, 7170, 2000, 829, 17300, 4740, 13400, 2940, 5660.

Table 1 Descriptive statistics for data set I
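Summary statistics of the kind such a table reports can be recomputed from the 39 values listed above. The following minimal sketch uses only the Python standard library; the selection of statistics and the moment-based (population) forms of skewness and kurtosis are assumptions of this sketch, and the names are hypothetical.

```python
import statistics

# Annual flood discharge rates (ft^3/s) of the Floyd River, 1935-1973, as listed above.
FLOYD_DISCHARGE = [
    1460, 4050, 3570, 2060, 1300, 1390, 1720, 6280, 1360, 7440,
    5320, 1400, 3240, 2710, 4520, 4840, 8320, 13900, 71500, 6250,
    2260, 318, 1330, 970, 1920, 15100, 2870, 20600, 3810, 726,
    7500, 7170, 2000, 829, 17300, 4740, 13400, 2940, 5660,
]


def describe(data):
    """Basic descriptive statistics with moment-based skewness and kurtosis."""
    n = len(data)
    mean = statistics.fmean(data)
    centred = [x - mean for x in data]
    m2 = sum(c**2 for c in centred) / n   # central moments (divisor n)
    m3 = sum(c**3 for c in centred) / n
    m4 = sum(c**4 for c in centred) / n
    return {
        "n": n,
        "min": min(data),
        "max": max(data),
        "mean": mean,
        "median": statistics.median(data),
        "sd": statistics.stdev(data),     # sample standard deviation (divisor n - 1)
        "skewness": m3 / m2**1.5,
        "kurtosis": m4 / m2**2,
    }


summary = describe(FLOYD_DISCHARGE)
```

The single value 71500 lies far above the rest of the sample, so the mean exceeds the median and the moment-based skewness is positive.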
