Abstract
This study introduces a modified hybrid gamma and generalized Pareto distribution. We first define a general spliced distribution together with its two components: a gamma distribution, which forms the head, and a generalized Pareto (GP) distribution, which forms the tail. We then examine the threshold conditions for the modified hybrid gamma and GP distribution and define its probability density function. We also derive the negative log-likelihood function of the proposed distribution and, for each simulation, obtain approximate maximum likelihood estimates by minimizing it with the differential evolution algorithm. Moreover, we report the mean square error for each sample size to evaluate how the model performs as the sample size changes. Finally, we apply the model to daily summer precipitation observed in Seoul, Korea, from 1961 to 2011, which comprises 4692 observations, of which the 2051 corresponding to wet conditions are used. The estimated threshold of the modified hybrid gamma and GP distribution is 0.1455. After deriving the Fisher information from the Hessian matrix, we also present the standard errors of the maximum likelihood estimators.
1 Introduction
Establishing a reliable and accurate probability distribution is vital for hydrology, meteorology, and other related fields of study. The choice of distribution particularly affects the verification and analysis of heavy rainfall and the improvement of simulation results using hydrological models (Deguenon et al. 2009; Hanum et al. 2015). Accurate probability distributions of rainfall data can be used to forecast precipitation amounts several hours ahead. For these reasons, various approaches for determining rainfall probability distributions have been actively researched (Coe and Stern 1982; Baxevani and Lennartsson 2015).
Gamma (Coe and Stern 1982) or lognormal distributions are commonly used as probability distributions for rainfall data, although mixed distributions have recently become popular. The gamma distribution does not perform well for heavy precipitation (high rainfall with low frequency), which is where high accuracy is required. For this reason, the generalized Pareto (GP) distribution has been applied to extreme rainfall (Deguenon et al. 2009); however, it cannot be used if the frequency of light rain is high. To compensate for these deficits, this research focuses on spliced distributions, which fit a different distribution on each part of the support and can consider covariates for seasonality and teleconnections. Hanum et al. (2015) showed that fitting a spliced distribution consisting of a gamma distribution and a Pareto distribution to tropical heavy rainfall data from Jakarta, Indonesia, gave a better result than fitting a single gamma, Pareto, or GP distribution (see also Li et al. 2012). However, a spliced distribution is in general neither continuous nor differentiable at the threshold between the two distributions. The hybrid distribution was developed to compensate for this problem, although it satisfies only continuity at the threshold and not differentiability. Previous studies have attempted to splice the gamma distribution into a GP distribution. For example, Baxevani and Lennartsson (2015) proposed a precipitation generator composed of a hybrid gamma and GP distribution after fixing the threshold as a constant.
Stochastic weather generators are often used to simulate a time series of weather variables on a daily scale. They can also be used to temporally downscale climate information, such as monthly or seasonal forecasts (Wilks and Wilby 1999; Benestad et al. 2008). The parametric weather generator can incorporate covariates for seasonality and teleconnections associated with El Niño (Furrer and Katz 2007). However, this approach tends to underestimate the observed inter-annual variance of seasonally aggregated variables, a phenomenon termed “overdispersion” and shown in Fig. 1 (Buishand 1978; Katz and Parlange 1998; Benestad et al. 2008). Recently, filtered time series of seasonal total precipitation obtained by locally weighted scatter-plot smoothing (LOESS, Cleveland 1979; Hastie and Tibshirani 1990), or a hidden variable reflecting the unobserved seasonal shifts in climate regimes, have been incorporated as covariates in the GLM-based weather generator (Kim et al. 2012; Kim and Lee 2017). However, there are still some issues concerning the choice of smoothing parameter.
The purpose of this study is to investigate the conditions under which a hybrid distribution is differentiable as well as continuous, and to propose a modified hybrid gamma and generalized Pareto distribution for rainfall data that satisfies both conditions. These properties are essential for maximum likelihood estimation of the parameters of a hybrid distribution. To this end, we evaluate the appropriateness of the modified hybrid gamma and GP distribution model through various simulation results. In addition, we demonstrate the practical application of the model using daily summer precipitation amounts observed in Seoul, Korea, from 1961 to 2011. Finally, we suggest additional studies aimed at producing a weather generator that uses our proposed distribution to reduce overdispersion.
2 Modified Hybrid Gamma and Generalized Pareto Distribution
Here, we first describe the spliced distribution suggested by Klugman et al. (2004) and Nadarajah and Bakar (2014). We then separately introduce the gamma and GP distributions before finally presenting our modified hybrid gamma and GP distribution.
2.1 Spliced Distribution
A spliced distribution is typically constructed using several distribution functions f1(x), f2(x), … , fk(x). The general form of the spliced distribution denoted by f(x) can be expressed as follows (see Klugman et al. 2004; Nadarajah and Bakar 2014):
where ai is the mixing weight and satisfies \( {\sum}_{i=1}^k{a}_i=1,\left({a}_i>0\right) \) and \( {f}_i^{\ast }(x) \) denotes the truncated probability density function, which is of the form
In the special case of (2.1), a simple spliced distribution combines the head part of the probability density function f1(x) with the tail part of f2(x), and can be shown as follows:
\( f(x)=\begin{cases}{a}_1{f}_1^{\ast }(x), & x\le \theta \\ {a}_2{f}_2^{\ast }(x), & x>\theta \end{cases}\quad (2.2) \)
As mentioned before, a1, a2 are positive mixing weights that satisfy a1 + a2 = 1, and \( {f}_1^{\ast }(x) \) and \( {f}_2^{\ast }(x) \) have the following forms
\( {f}_1^{\ast }(x)=\frac{f_1(x)}{F_1\left(\theta \right)},\quad {f}_2^{\ast }(x)=\frac{f_2(x)}{1-{F}_2\left(\theta \right)}, \)
where F1(x) and F2(x) are cumulative distribution functions corresponding to the density functions f1(x) and f2(x), respectively. Note that θ denotes the limit of the domain and is regarded as a model parameter (Klugman et al. 2004). However, it is not generally guaranteed that a spliced distribution of the form (2.2) is a valid continuous density function. Thus, a spliced distribution requires the following condition to be continuous: f(θ−) = f(θ+). In addition, it needs another critical condition, namely differentiability (Bakar et al. 2015). To be differentiable at every point x, the probability density function f(x) is required to satisfy the following condition at threshold θ:
\( {f}^{\prime}\left({\theta}^{-}\right)={f}^{\prime}\left({\theta}^{+}\right). \)
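To illustrate why continuity is not automatic, the following minimal sketch builds a two-piece spliced density, assuming the head f1 is truncated to x ≤ θ and the tail f2 to x > θ. The function name `spliced_pdf`, the fixed weight a1 = 0.5, and the parameter values are hypothetical and chosen only for illustration.

```python
import numpy as np
from scipy import stats


def spliced_pdf(x, head, tail, theta, a1):
    """Two-piece spliced density: weight a1 on the head truncated to x <= theta,
    weight 1 - a1 on the tail truncated to x > theta."""
    x = np.asarray(x, dtype=float)
    head_part = a1 * head.pdf(x) / head.cdf(theta)
    tail_part = (1.0 - a1) * tail.pdf(x) / tail.sf(theta)
    return np.where(x <= theta, head_part, tail_part)


# Hypothetical components: a gamma head and a GP tail located at the threshold.
theta = 16.0
head = stats.gamma(a=5.0, scale=4.0)
tail = stats.genpareto(c=0.3, loc=theta, scale=8.0)

# Evaluate the density just at and just above the threshold.
left_limit = float(spliced_pdf(theta, head, tail, theta, a1=0.5))
right_limit = float(spliced_pdf(theta + 1e-9, head, tail, theta, a1=0.5))
print(left_limit, right_limit)  # with an arbitrary fixed weight the two values differ
```

With an arbitrary constant weight the density jumps at θ, which is the reason the weights are tied to the component parameters in what follows.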
Under the above conditions, the spliced distribution with probability density function f(x) is differentiable on its support. In addition, these conditions yield relationships between the parameters of the head and the tail of the distribution, which commonly reduces the number of free parameters of the spliced distribution.
According to the continuity condition, the mixing weights a1, a2 in (2.2) can be represented as
where
Note that δ > 0 and it is well known that mixing weights that depend on the combined distribution parameters give better fits than constant weights (Scollnik 2007). However, this function might not always have an explicit form.
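As a sketch of how continuity fixes the weights: solving a1 f1(θ)/F1(θ) = a2 f2(θ)/(1 − F2(θ)) together with a1 + a2 = 1 gives a1 = 1/(1 + δ) and a2 = δ/(1 + δ) with δ = f1(θ)[1 − F2(θ)] / (f2(θ) F1(θ)). This arrangement of δ is derived here under the assumption that each component is truncated at θ; the paper's own display may arrange it differently. The function name `continuity_weights` and the parameter values are hypothetical.

```python
from scipy import stats


def continuity_weights(head, tail, theta):
    """Return (a1, a2, delta) that make the two-piece density continuous at theta."""
    delta = head.pdf(theta) * tail.sf(theta) / (tail.pdf(theta) * head.cdf(theta))
    a1 = 1.0 / (1.0 + delta)
    a2 = delta / (1.0 + delta)
    return a1, a2, delta


theta = 16.0
head = stats.gamma(a=5.0, scale=4.0)
tail = stats.genpareto(c=0.3, loc=theta, scale=8.0)
a1, a2, delta = continuity_weights(head, tail, theta)

# Left and right limits of the weighted, truncated densities at the threshold.
left = a1 * head.pdf(theta) / head.cdf(theta)
right = a2 * tail.pdf(theta) / tail.sf(theta)
print(a1, a2, left, right)
```

The two limits agree up to floating-point error, and the weights depend on the component parameters rather than being constants.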
Under both continuity and differentiability conditions of the probability density function f(x), the equation \( {f}_1^{\prime}\left(\theta \right){f}_2\left(\theta \right)-{f}_1\left(\theta \right){f}_2^{\prime}\left(\theta \right)=0 \) is satisfied at threshold θ. Alternatively, it can be achieved by solving
\( \frac{d}{d\theta}\ln \left[\frac{f_1\left(\theta \right)}{f_2\left(\theta \right)}\right]=0\quad (2.3) \)
as shown by Bakar et al. (2015). In general, the parameters of both the tail of the probability distribution function f1(x) and the head of the probability distribution function f2(x), as well as the mixing weights a1, a2, depend on threshold θ. However, threshold θ can also be obtained after estimating all parameters of the spliced distribution. Thus, a certain number of iterations is necessary.
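The link between the two formulations can be sketched as follows, assuming f1(θ) > 0 and f2(θ) > 0 and assuming the head and tail enter as a1 f1(x)/F1(θ) and a2 f2(x)/(1 − F2(θ)). Dividing the differentiability condition by the continuity condition removes the weights and the normalizing constants, and the quotient rule then gives the log-ratio form.

```latex
\frac{a_1 f_1'(\theta)/F_1(\theta)}{a_1 f_1(\theta)/F_1(\theta)}
  = \frac{a_2 f_2'(\theta)/\{1-F_2(\theta)\}}{a_2 f_2(\theta)/\{1-F_2(\theta)\}}
\;\Longleftrightarrow\;
\frac{f_1'(\theta)}{f_1(\theta)} = \frac{f_2'(\theta)}{f_2(\theta)},
\qquad
\left(\ln\frac{f_1}{f_2}\right)'(\theta)
  = \frac{f_1'(\theta)}{f_1(\theta)} - \frac{f_2'(\theta)}{f_2(\theta)}
  = \frac{f_1'(\theta)\,f_2(\theta) - f_1(\theta)\,f_2'(\theta)}{f_1(\theta)\,f_2(\theta)} .
```

Hence the log-ratio derivative vanishes at θ exactly when the numerator \( f_1'(\theta) f_2(\theta) - f_1(\theta) f_2'(\theta) \) is zero.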
2.2 Modified Hybrid Gamma and Generalized Pareto Distribution
For the case of frequent light rainfall, a gamma distribution is commonly cited as appropriate because it has a short right tail. When the data have a long right tail, however, it might not fit well. In contrast, the GP distribution generally fits well when heavy rainfall produces a long right tail, although it might not fit well when the right tail is short. In addition, data loss occurs because the GP distribution is truncated below the threshold value.
Therefore, we suggest a modified hybrid gamma and generalized Pareto distribution that uses the gamma distribution when rainfall is light and the GP distribution when rainfall is heavy.
Suppose that part of the head, f1(x), is a gamma distribution with shape parameter α and scale parameter β, and part of the tail, f2(x), is a GP distribution with location θ, scale parameter σ, and shape parameter ξ. Then, threshold θ satisfying (2.3) is indicated by
\( \theta =\beta \left(\alpha -1\right). \)
Thus, threshold θ of the modified hybrid gamma and GP distribution only depends on the parameters of the gamma distribution when α > 1. In addition, the δ value for mixing weights can be expressed as follows:
where Γ(α, θ/β) is the lower incomplete gamma function. Note that δ is determined by parameters of both the GP and gamma distributions. Therefore, the probability density function of the proposed distribution is
where α > 1.
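A minimal sketch of the resulting density, under stated assumptions: the threshold is taken as θ = β(α − 1), which matches every parameter set used in Section 3.1; the GP tail is located at θ, so that F2(θ) = 0 and f2(θ) = 1/σ; and the weights are a1 = 1/(1 + δ), a2 = δ/(1 + δ) with δ = σ f1(θ)/F1(θ). The function name `hybrid_pdf` is hypothetical. The sketch only checks numerically that the density integrates to one and has no jump at θ.

```python
import numpy as np
from scipy import integrate, stats


def hybrid_pdf(x, alpha, beta, xi, sigma):
    """Hybrid gamma/GP density with threshold beta * (alpha - 1); needs alpha > 1."""
    theta = beta * (alpha - 1.0)
    head = stats.gamma(a=alpha, scale=beta)
    tail = stats.genpareto(c=xi, loc=theta, scale=sigma)
    # Assumed continuity-based weights (F2(theta) = 0, f2(theta) = 1 / sigma).
    delta = sigma * head.pdf(theta) / head.cdf(theta)
    a1, a2 = 1.0 / (1.0 + delta), delta / (1.0 + delta)
    x = np.asarray(x, dtype=float)
    return np.where(x <= theta, a1 * head.pdf(x) / head.cdf(theta), a2 * tail.pdf(x))


params = dict(alpha=5.0, beta=4.0, xi=0.3, sigma=8.0)  # Simulation 1 values
theta = params["beta"] * (params["alpha"] - 1.0)

# Integrate the head and tail pieces separately to avoid the kink at theta.
head_mass, _ = integrate.quad(lambda t: float(hybrid_pdf(t, **params)), 0.0, theta)
tail_mass, _ = integrate.quad(lambda t: float(hybrid_pdf(t, **params)), theta, np.inf)
total_mass = head_mass + tail_mass
jump = float(hybrid_pdf(theta + 1e-9, **params) - hybrid_pdf(theta, **params))
print(theta, total_mass, jump)
```

Under these assumptions the head and tail masses equal a1 and a2, so the total is one by construction.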
2.3 Parameter Estimation of the Proposed Distribution
Here, we compute the maximum likelihood estimators of the parameters of the modified hybrid gamma and GP distribution. From the general form of the likelihood function of the spliced distribution, the log-likelihood function of (α, β, ξ, σ, θ) can be expressed as follows:
where n = M + m, \( M={\sum}_{i=1}^nI\left({x}_i\le \theta \right) \), \( m={\sum}_{i=1}^nI\left({y}_i>\theta \right) \), f1(x) is the probability density function of the gamma distribution, F1(x) is the cumulative distribution function of the gamma distribution, and f2(x) is the probability density function of the GP distribution. As it is not possible to obtain the explicit form of the maximum likelihood estimator of (α, β, ξ, σ, θ) by maximizing the log-likelihood logL(α, β, ξ, σ, θ), we use the differential evolution (DE) algorithm to achieve global optimization. Note that, unlike classical optimization methods, the DE algorithm does not require the objective function to be differentiable, and so it is useful for finding approximate solutions in the presence of non-differentiability, multiple local minima, non-linearities, etc.
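The estimation step can be sketched with `scipy.optimize.differential_evolution`. This is an illustration under stated assumptions, not the authors' implementation: θ is tied to the gamma parameters through θ = β(α − 1), the weights are a1 = 1/(1 + δ) and a2 = δ/(1 + δ) with δ = σ f1(θ)/F1(θ), the search bounds are hypothetical, and the data are simulated with the Simulation 1 parameter values.

```python
import numpy as np
from scipy import optimize, stats


def neg_log_lik(params, x):
    """Negative log-likelihood of the assumed hybrid gamma/GP density."""
    alpha, beta, xi, sigma = params
    theta = beta * (alpha - 1.0)  # assumed threshold
    with np.errstate(all="ignore"):
        f1_theta = stats.gamma.pdf(theta, a=alpha, scale=beta)
        F1_theta = stats.gamma.cdf(theta, a=alpha, scale=beta)
        delta = sigma * f1_theta / F1_theta
        head_x, tail_x = x[x <= theta], x[x > theta]
        # Head: log a1 + log f1(x) - log F1(theta), with log a1 = -log(1 + delta).
        ll = np.sum(stats.gamma.logpdf(head_x, a=alpha, scale=beta))
        ll -= head_x.size * (np.log1p(delta) + np.log(F1_theta))
        # Tail: log a2 + log f2(x), with log a2 = log(delta) - log(1 + delta).
        ll += np.sum(stats.genpareto.logpdf(tail_x, c=xi, loc=theta, scale=sigma))
        ll += tail_x.size * (np.log(delta) - np.log1p(delta))
    return -ll if np.isfinite(ll) else 1e12  # penalize invalid parameter sets


# Simulate n = 500 values from the assumed density by inverse-CDF sampling.
rng = np.random.default_rng(0)
alpha0, beta0, xi0, sigma0 = 5.0, 4.0, 0.3, 8.0
theta0 = beta0 * (alpha0 - 1.0)
F1 = stats.gamma.cdf(theta0, a=alpha0, scale=beta0)
delta0 = sigma0 * stats.gamma.pdf(theta0, a=alpha0, scale=beta0) / F1
n = 500
u = rng.random(n)
is_head = rng.random(n) < 1.0 / (1.0 + delta0)
x = np.where(is_head,
             stats.gamma.ppf(u * F1, a=alpha0, scale=beta0),
             theta0 + stats.genpareto.ppf(u, c=xi0, scale=sigma0))

# Hypothetical box constraints for (alpha, beta, xi, sigma); alpha > 1 is enforced.
bounds = [(1.05, 15.0), (0.5, 15.0), (0.01, 1.5), (0.5, 30.0)]
result = optimize.differential_evolution(
    neg_log_lik, bounds, args=(x,), seed=1, maxiter=60, popsize=10, tol=1e-8)
alpha_hat, beta_hat, xi_hat, sigma_hat = result.x
theta_hat = beta_hat * (alpha_hat - 1.0)
print(result.x, theta_hat, result.fun)
```

Because DE only evaluates the objective, returning a large penalty for invalid parameter combinations is enough to keep the search inside the admissible region.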
3 Numerical Studies
We focus on the class of problems where the behavior of distributions over (or below) a high (or low) threshold is of interest; i.e. those that characterize extreme events. As mentioned in the introduction, a mixture of the gamma and GP distributions with a threshold has emerged as an efficient way to generate more realistic weather scenarios for impact assessments.
3.1 Simulation Study
In this section, we report the simulation results for the optimal threshold of the proposed distribution and examine the efficiency of the model estimation method. The results of the maximum likelihood estimators and threshold of the spliced distribution are provided. The parameters α and β belong to the gamma distribution of the head part, ξ and σ belong to the GP distribution of the tail part, and θ is the threshold, which is also the location parameter of the GP distribution. Each sample is drawn from Eq. (2.4), and the mixing weights are represented as a function of the other parameters of the model. To compare and summarize the performance of the simulation results, we consider the mean square error (MSE) as follows:
\( \mathrm{MSE}=\frac{1}{N}{\sum}_{i=1}^N{\left({\widehat{\theta}}^{(i)}-\theta \right)}^2, \)
where N is the number of iterations, \( {\widehat{\theta}}^{(i)} \) is the estimated threshold at the ith iteration, and θ is the true threshold.
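As a small worked example of this criterion, the sketch below averages the squared deviations of the estimates from the true value. The function name `mse` and the four threshold estimates are hypothetical and serve only to show the arithmetic.

```python
import numpy as np


def mse(estimates, true_value):
    """Mean square error of repeated estimates around the true value."""
    estimates = np.asarray(estimates, dtype=float)
    return float(np.mean((estimates - true_value) ** 2))


# Hypothetical threshold estimates from N = 4 replications with true theta = 16.
theta_hats = [15.0, 17.0, 16.5, 15.5]
value = mse(theta_hats, 16.0)
print(value)  # (1 + 1 + 0.25 + 0.25) / 4 = 0.625
```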
Table 1 shows the simulation results from the modified hybrid gamma and GP distribution with α = 5, β = 4, ξ = 0.3, σ = 8, θ = 16, and N = 100 (Simulation 1). For each simulation, the sample size n is 500, 1000, and 2000, respectively; as the sample size increases, the maximum likelihood estimator of the parameters becomes more stable. Figure 2 is a plot of the fitted modified hybrid gamma and GP distribution for Simulation 1, which clearly shows an effective estimate of the proposed distribution. Table 2 shows the simulation results under different parameter conditions (α = 9, β = 7, ξ = 0.2, σ = 3, θ = 56, and N = 100; Simulation 2). Tables 3 and 4 show the results for a simulation with a relatively small threshold, where α = 2, β = 4, ξ = 0.7, σ = 4, θ = 4 (Simulation 3), and a simulation with a very large threshold, where α = 14, β = 20, ξ = 1, σ = 4, θ = 260 (Simulation 4), respectively. The overall simulation results indicate that, as the sample size increases, the MSE decreases and the estimate becomes more stable. Figures 3, 4 and 5 show the fitted modified hybrid gamma and generalized Pareto distribution corresponding to the results of Simulations 2, 3, and 4.
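Drawing a sample for such a simulation can be sketched by inverse-CDF sampling. The sketch assumes the threshold θ = β(α − 1), which holds for all four parameter sets above, and weights a1 = 1/(1 + δ) with δ = σ f1(θ)/F1(θ). The function name `sample_hybrid` and the seed are hypothetical.

```python
import numpy as np
from scipy import stats


def sample_hybrid(n, alpha, beta, xi, sigma, rng):
    """Draw n values from the assumed hybrid gamma/GP density."""
    theta = beta * (alpha - 1.0)
    head = stats.gamma(a=alpha, scale=beta)
    p_theta = head.cdf(theta)
    delta = sigma * head.pdf(theta) / p_theta
    a1 = 1.0 / (1.0 + delta)
    from_head = rng.random(n) < a1  # choose the head with probability a1
    u = rng.random(n)
    head_draws = head.ppf(u * p_theta)  # gamma truncated to (0, theta]
    tail_draws = theta + stats.genpareto.ppf(u, c=xi, scale=sigma)  # GP above theta
    return np.where(from_head, head_draws, tail_draws), theta, a1


rng = np.random.default_rng(2019)
x, theta, a1 = sample_hybrid(2000, 5.0, 4.0, 0.3, 8.0, rng)  # Simulation 1 values
share_below = float(np.mean(x <= theta))
print(theta, a1, share_below)
```

The share of draws at or below θ should be close to a1, which gives a quick sanity check on the sampler.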
3.2 Real Data
To verify and demonstrate the performance of the proposed modified hybrid gamma and GP distribution on real rainfall data, we use the daily summer precipitation amounts observed at 62 weather stations in Korea from 1961 to 2011 and obtain the maximum likelihood estimates. At each station there are 4692 daily observations; for Seoul, we used the 2051 nonzero values, excluding the days on which no rain was recorded. Descriptive statistics for Seoul are summarized in Table 5. The 51-year rainfall data are positively skewed, and the maximum value differs greatly from both the 1st and 3rd quartiles. The data feature many instances of low rainfall and only a few large rainfall amounts (Table 6). Summary statistics for the estimated parameters of the modified hybrid gamma and generalized Pareto distribution using rainfall data at the 62 weather stations during 1961–2011 are provided in Table 7. Plots of the fitted modified hybrid gamma and generalized Pareto distribution, together with histograms of the rainfall data at several weather stations in Korea during 1961–2011, are provided in Fig. 6. In addition, Fig. 7 and Fig. 8 show the estimated parameters of the modified hybrid gamma and generalized Pareto distribution using rainfall data at the 62 weather stations during 1961–2011. Some parameters share a geographical trend with the threshold parameter (θ).
To determine the maximum likelihood estimator of the proposed distribution, we used the DE algorithm. Through multidimensional global optimization, we found the approximate maximum likelihood estimates that minimize the negative log-likelihood function. In addition, we calculated the standard error, which is the standard deviation of each estimator, after deriving the Fisher information through a Hessian matrix. This result is summarized in Table 6. Finally, we compared the goodness of fit results for the gamma distribution, GP distribution, and modified hybrid gamma and GP distribution using the rainfall data, which confirm that the proposed model results in a better estimate.
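The standard-error step can be sketched as follows: approximate the Hessian of the negative log-likelihood at the estimate by finite differences, treat it as the observed Fisher information, invert it, and take square roots of the diagonal. A normal model is swapped in here instead of the hybrid density purely because its standard errors are known in closed form, which makes the recipe checkable; the helper names `numerical_hessian` and `normal_nll` are hypothetical.

```python
import numpy as np


def numerical_hessian(f, p, rel_step=1e-4):
    """Central finite-difference Hessian of a scalar function f at point p."""
    p = np.asarray(p, dtype=float)
    k = p.size
    h = rel_step * np.maximum(np.abs(p), 1.0)
    hess = np.empty((k, k))
    for i in range(k):
        for j in range(k):
            ei = np.zeros(k)
            ej = np.zeros(k)
            ei[i] = h[i]
            ej[j] = h[j]
            hess[i, j] = (f(p + ei + ej) - f(p + ei - ej)
                          - f(p - ei + ej) + f(p - ei - ej)) / (4.0 * h[i] * h[j])
    return hess


def normal_nll(params, x):
    """Normal negative log-likelihood, dropping the constant 0.5 * n * log(2 * pi)."""
    mu, sigma = params
    return x.size * np.log(sigma) + np.sum((x - mu) ** 2) / (2.0 * sigma ** 2)


rng = np.random.default_rng(42)
x = rng.normal(loc=10.0, scale=2.0, size=1000)
mle = np.array([x.mean(), x.std()])  # closed-form MLE (x.std() uses ddof=0)

# Observed information = Hessian of the negative log-likelihood at the MLE.
observed_info = numerical_hessian(lambda p: normal_nll(p, x), mle)
std_err = np.sqrt(np.diag(np.linalg.inv(observed_info)))
closed_form = np.array([mle[1] / np.sqrt(x.size), mle[1] / np.sqrt(2 * x.size)])
print(std_err, closed_form)
```

For the hybrid model the same two lines (Hessian, then inverse and square root) would be applied to its negative log-likelihood at the DE estimate, provided the Hessian is positive definite there.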
4 Concluding Remarks
In this study, we first introduced a general spliced distribution and its two components: a gamma distribution, which forms the head of the curve, and a generalized Pareto distribution, which forms the tail. Then, we examined the threshold condition for our proposed distribution and defined a new probability density function accordingly. We further derived a likelihood function for the distribution and, for multiple simulations, obtained approximate maximum likelihood estimates by minimizing the negative log-likelihood with the DE algorithm. At the same time, by presenting the MSE for each sample size, the precipitation generator model was evaluated according to the size of the sample. Finally, we used 2051 observations of measurable daily summer precipitation recorded in Seoul, Korea, from 1961 to 2011. As a result, the estimated threshold of the modified hybrid gamma and generalized Pareto distribution was 0.1455. After deriving the Fisher information using a Hessian matrix, we also presented the standard error of the maximum likelihood estimator.
This study represents the first attempt to use a modified hybrid approach, which will be built on in future research. Our work has two major advantages. Firstly, the threshold is usually fixed as a constant in a spliced distribution that includes a generalized Pareto distribution, whereas with the modified hybrid gamma and generalized Pareto distribution proposed in this study, the threshold and other parameters can be estimated simultaneously. The result will therefore differ from that of a general mixed distribution because it satisfies both continuity and differentiability at the threshold. Secondly, generating rainfall data using the modified hybrid gamma and generalized Pareto distribution will reduce the overdispersion that occurs in existing parametric weather generators and provide more accurate probability estimates.
References
Bakar, S., Hamzah, N.A., Maghsoudi, M., Nadarajah, S.: Modeling loss data using composite models. Insur. Math. and Econ. 61, 146–154 (2015)
Baxevani, A., Lennartsson, L.: A spatiotemporal precipitation generator based on a censored latent Gaussian field. Water Resour. Res. 51, 4338–4358 (2015)
Benestad, R.E., Hanssen-Bauer, I., Chen, D.: Empirical Statistical Downscaling, p. 228. World Scientific Publishing Company (2008)
Buishand, T.A.: Some remarks on the use of daily rainfall models. J. Hydrol. 47, 235–249 (1978)
Cleveland, W.S.: Robust locally weighted regression and smoothing scatterplots. J. Amer. Stat. Assoc. 74, 829–836 (1979)
Coe, R., Stern, R.D.: Fitting models to daily rainfall data. J. Appl. Meteorol. 21, 1024–1031 (1982)
Deguenon, J., Barbulescu, A., Sarr, M.: GPD models for extreme rainfall in Dobrudja. Comput. Eng. Sys. Appl. 2, 131–136 (2009)
Furrer, E.M., Katz, R.W.: Generalized linear modeling approach to stochastic weather generators. Clim. Res. 34, 129–144 (2007)
Hanum, H., Hamim, A., Djuraidah, A., Mangku, W.: Modeling extreme rainfall with gamma-Pareto distribution. Appl. Math. Sci. 9, 6029–6039 (2015)
Hastie, T.J., Tibshirani, R.J.: Generalized additive models, p. 352. Chapman and Hall (1990)
Katz, R.W., Parlange, M.B.: Overdispersion phenomenon in stochastic modeling of precipitation. J. Clim. 11, 591–601 (1998)
Kim, Y., Lee, G.W.: Stochastic precipitation generator with hidden state covariates. Asia-Pac. J. Atmos. Sci. 53(3), 353–359 (2017)
Kim, Y., Katz, R.W., Rajagopalan, B., Podest, G.P., Furrer, E.M.: Reducing overdispersion in stochastic weather generators using a generalized linear modeling approach. Clim. Res. 53, 13–24 (2012)
Klugman, S.A., Panjer, H.H., Willmot, G.E.: Loss models: From data to decisions, 2nd edn. Wiley, New York (2004) 720pp
Li, C., Singh, V.P., Mishra, A.K.: Simulation of the entire range of daily precipitation using a hybrid probability distribution. Water Resour. Res. 48, W03521 (2012)
Nadarajah, S., Bakar, S.: New composite models for the Danish reinsurance data. Scand. Actuar. J. 2014, 180–187 (2014)
Scollnik, D.P.M.: On composite lognormal-Pareto models. Scand. Actuar. J. 2007, 20–33 (2007)
Wilks, D.S., Wilby, R.L.: The weather generator game: a review of stochastic weather models. Prog. Phys. Geogr. 23, 329–357 (1999)
Acknowledgements
This work was supported by the Korea Ministry of Environment (MOE) through the “Water Management Research Program” and by the “Development of Nowcasting Applications Algorithms” project, funded by ETRI, which is a subproject of the “Development of Geostationary Meteorological Satellite Ground Segment (NMSC-2018-01)” program funded by the NMSC (National Meteorological Satellite Center) of the KMA (Korea Meteorological Administration).
Additional information
Responsible Editor: Kyong-Hwan Seo.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Cite this article
Kim, Y., Kim, H., Lee, G. et al. A Modified Hybrid Gamma and Generalized Pareto Distribution for Precipitation Data. Asia-Pacific J Atmos Sci 55, 609–616 (2019). https://doi.org/10.1007/s13143-019-00114-z
DOI: https://doi.org/10.1007/s13143-019-00114-z