Abstract
Estimation of rainfall and temperature for a desired return period is a prerequisite for the planning, design and operation of hydraulic structures and for the technical and engineering appraisal of large infrastructure projects. These estimates can be computed through Extreme Value Analysis (EVA) by fitting probability distributions to the annual series of 1-day maximum rainfall, maximum temperature and minimum temperature. This paper details a study on the adoption of Extreme Value Type-1, Extreme Value Type-2, 2-parameter Log Normal and Log Pearson Type-3 (LP3) probability distributions in EVA of rainfall and temperature for Hissar. Depending on applicability, standard parameter estimation procedures, namely the method of moments, the maximum likelihood method (MLM) and the order statistics approach, are used to determine the parameters of the distributions. The adequacy of fit of the probability distributions is evaluated by goodness-of-fit (GoF) tests, viz. Anderson–Darling and Kolmogorov–Smirnov, and by a diagnostic test using the D-index. The GoF and diagnostic test results suggest that LP3 (MLM) is better suited amongst the four probability distributions adopted in EVA of rainfall and temperature for Hissar.
Introduction
Technical and engineering appraisal of large infrastructure projects such as nuclear, hydro and thermal power plants, dams, bridges and flood control measures needs to be carried out during the planning and formulation stages. In the hydrological context, it is a well-recognized fact that, however extreme the design loading, more severe conditions are likely to occur in nature. For assessing such phenomena, Extreme Value Analysis (EVA) of rainfall and temperature for the geographical region of the project is one of the basic requirements. Depending on the size and the design life of the structure, the estimated extreme events corresponding to a particular return period are used. These can be computed by fitting a probability distribution to the series of Annual 1-day Maximum Rainfall (AMR), Annual Maximum Temperature (AMAXT) and Annual Minimum Temperature (AMINT) observed at the India Meteorological Department observatory located in the vicinity of the project site. In general, the selection of a suitable distribution plays a key role in estimating the extreme values.
Out of a number of available probability distributions, Extreme Value Type-1 (EV1), Extreme Value Type-2 (EV2), 2-parameter Log Normal (LN2) and Log Pearson Type-3 (LP3) are extensively used for EVA. Depending on applicability, standard parameter estimation procedures, viz. Method of Moments (MoM) and Maximum Likelihood Method (MLM), are generally used for determination of parameters (Bobee and Ashkar 1991). In addition to MoM and MLM, the Atomic Energy Regulatory Board (AERB 2008) guidelines state that the Order Statistics Approach (OSA) can also be considered for determination of the parameters of EV1 and EV2 distributions. The AERB guidelines also state that, although a number of methods are available for parameter estimation, the OSA estimates are popular owing to their lower bias and minimum variance. The AERB guidelines suggest that the 1000-year return period Mean + SE value (where Mean denotes the estimated value \( x_{T} \) of rainfall or temperature and SE the standard error) is generally considered for arriving at a design value of rainfall and maximum temperature, whereas the Mean − SE value is considered for minimum temperature. In the recent past, a number of studies have been carried out by researchers adopting probability distributions for EVA of rainfall and temperature.
Hughes et al. (2007) carried out statistical analysis using generalized extreme value (GEV) distribution and time series models for minimum and maximum temperatures in the Antarctic Peninsula. Varathan et al. (2010) found that the Gumbel (i.e. EV1) distribution is the best fitting distribution to analyse the annual maximum rainfall of Colombo district. AlHassoun (2011) carried out a study on developing empirical formulae to estimate rainfall intensity in the Riyadh region using EV1, LN2 and LP3. He concluded that the LP3 distribution gives better accuracy amongst the three distributions studied in estimation of rainfall intensity. Baratti et al. (2012) carried out flood frequency analysis on seasonal and annual time scales for the Blue Nile River adopting EV1 distribution. Hasan et al. (2013) applied GEV distribution for modelling the AMAXT observed at 22 meteorological stations in Malaysia. Esteves (2013) applied EV1 distribution to estimate the extreme rainfall depths at different rain-gauge stations in southeast UK. Rasel and Hossain (2015) applied EV1 distribution for development of intensity duration frequency curves for seven divisions in Bangladesh. Vivekanandan (2015) applied EV1 and EV2 distributions for modelling the series of AMR, AMAXT, AMINT and annual hourly maximum wind speed (HMWS) for the Kanyakumari region. He found that EV1 (OSA) is a better-suited probability distribution for modelling the series of AMR, AMAXT and AMINT, whereas EV2 (OSA) is better suited for modelling HMWS for Kanyakumari. Vivekanandan (2016) carried out statistical analysis of rainfall data using EV1 distribution for estimation of peak flood discharge for ungauged catchments of Bhul and Manjuhi Khad, Himachal Pradesh.
Generally, when different probability distributions are used for EVA, a common problem is how to determine which model best fits a given set of data. This can be evaluated by goodness-of-fit (GoF) and diagnostic tests. GoF tests, viz. Anderson–Darling (A2) and Kolmogorov–Smirnov (KS), are applied for checking the adequacy of fitting of probability distributions to the rainfall and temperature data (Zhang 2002). However, the GoF tests can offer diverging inferences, which leads to the adoption of the D-index to aid the selection of a suitable distribution for EVA of rainfall and temperature. Research efforts thus exist in assessing the extreme values of rainfall and temperature (maximum and minimum) to arrive at the design parameter of interest, and the present work is an effort in this direction. This paper details the procedures involved in assessing the suitable probability distribution for estimation of rainfall and temperature through GoF and diagnostic tests, with an illustrative example.
Methodology
The objective of the study is to assess the adequacy of probability density functions (PDFs) for EVA of rainfall and temperature. In this context, the steps followed for data processing, validation and analysis are: (i) prepare the AMR data series from the daily rainfall data; (ii) prepare the AMAXT and AMINT data series from the hourly temperature data; (iii) select the PDFs for EVA (say, EV1, EV2, LN2 and LP3); (iv) select parameter estimation methods (say, MoM, MLM and OSA) wherever applicable; (v) select quantitative GoF and diagnostic tests; and (vi) conduct EVA and analyse the results obtained. The PDF and quantile estimator \( x_{T} \) of the distributions are presented in Table 1.
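Steps (i) and (ii) amount to reducing a dated record to one extreme value per calendar year. The following is a minimal sketch of that reduction, not the author's code; the function name `annual_extremes` and the sample values are hypothetical and serve only as illustration.

```python
from collections import defaultdict
from datetime import date


def annual_extremes(records, pick=max):
    """Reduce (date, value) pairs to one value per calendar year.

    `pick` is max for AMR and AMAXT, or min for AMINT.
    Records whose value is None (missing observation) are skipped.
    """
    by_year = defaultdict(list)
    for day, value in records:
        if value is not None:
            by_year[day.year].append(value)
    return {year: pick(values) for year, values in sorted(by_year.items())}


# Hypothetical daily values, for illustration only (not Hissar data).
sample = [
    (date(2000, 7, 1), 12.0),
    (date(2000, 8, 15), 85.5),
    (date(2001, 6, 30), 40.2),
    (date(2001, 9, 2), None),
    (date(2001, 9, 3), 33.1),
]

print(annual_extremes(sample))            # annual maxima
print(annual_extremes(sample, pick=min))  # annual minima
```

A year with no valid observation simply does not appear in the result, so gaps in the annual series remain visible for later scrutiny.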
In Table 1, \( \alpha \), \( \beta \) and \( \gamma \) denote the location, scale and shape parameters of the distributions, respectively. For EV1 and EV2 distributions, the reduced variate \( Y_{T} \) for a given return period (T) is defined by \( Y_{T} = -\ln(-\ln(1 - (1/T))) \), while in the mathematical representation of LN2 and LP3, \( K_{P} \) denotes the frequency factor corresponding to the probability of exceedance. The coefficient of skewness (CS) is CS = 0.0 for LN2, whereas CS is based on the log-transformed series of the observed data for LP3 (Rao and Hameed 2000). For the data series of AMINT, \( Y_{T} \) is taken as \( Y_{T} = -\ln(-\ln(1/T)) \) and \( K_{P} \) is the frequency factor corresponding to the probability of non-exceedance for LN2 and LP3 distributions.
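The two forms of the reduced variate given above can be evaluated directly. The sketch below is illustrative only; the function name `reduced_variate` and the `minima` flag are assumptions introduced here, not names from the study.

```python
import math


def reduced_variate(T, minima=False):
    """Reduced variate Y_T for return period T (T > 1).

    minima=False: Y_T = -ln(-ln(1 - 1/T)), used for AMR and AMAXT.
    minima=True:  Y_T = -ln(-ln(1/T)), used for AMINT.
    """
    if T <= 1:
        raise ValueError("return period T must be greater than 1")
    p = 1.0 / T if minima else 1.0 - 1.0 / T
    return -math.log(-math.log(p))


for T in (2, 10, 100, 1000):
    print(T, round(reduced_variate(T), 4), round(reduced_variate(T, minima=True), 4))
```

For T = 2 the two forms coincide, and for larger T the minima form becomes increasingly negative, which moves the estimate into the lower tail.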
Goodness-of-fit tests
Generally, the A2 test is applied for checking the adequacy of fitting of EV1 and EV2 distributions. The procedures involved in applying the A2 test to LN2 and LP3 are more complex, although the test statistic can still be used for quantitative assessment of these distributions. In view of the above, the KS test is also widely applied for quantitative assessment. Theoretical descriptions of the GoF test statistics are as follows:
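A2 test statistic is defined by:
\[ A2 = -N - \frac{1}{N}\sum_{i=1}^{N} \left\{ (2i-1)\ln(Z_{i}) + (2N+1-2i)\ln(1-Z_{i}) \right\} \]
Here, \( Z_{i} = F(x_{i}) \) for i = 1, 2, 3, …, N with \( x_{1} < x_{2} < \cdots < x_{N} \), \( F(x_{i}) \) is the cumulative distribution function (CDF) of the ith sample \( x_{i} \) and N is the sample size.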
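As an illustration, the A2 statistic can be computed from the ordered sample and a fitted CDF. The sketch below uses the standard Anderson–Darling form; the function names `anderson_darling` and `ev1_cdf` and the parameter values in the usage lines are hypothetical and are not taken from the study.

```python
import math


def anderson_darling(sample, cdf):
    """A2 statistic of `sample` against the fitted CDF `cdf`.

    Uses Z_i = cdf(x_i) on the sample sorted in ascending order.
    """
    z = [cdf(x) for x in sorted(sample)]
    n = len(z)
    total = 0.0
    for i, zi in enumerate(z, start=1):
        total += (2 * i - 1) * math.log(zi) + (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
    return -n - total / n


def ev1_cdf(x, alpha, beta):
    """EV1 (Gumbel) CDF with location alpha and scale beta."""
    return math.exp(-math.exp(-(x - alpha) / beta))


# Hypothetical sample and parameters, for illustration only.
data = [62.0, 48.5, 110.2, 75.3, 91.0, 55.8]
print(anderson_darling(data, lambda x: ev1_cdf(x, alpha=60.0, beta=20.0)))
```

The CDF is passed in as a callable so that the same routine can serve any of the four distributions once its parameters have been estimated.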
KS test statistic is defined by:
\[ KS = \max_{i=1}^{N} \left| F_{e}(x_{i}) - F_{D}(x_{i}) \right| \]
Here, \( F_{e}(x_{i}) \) is the empirical CDF of \( x_{i} \) and \( F_{D}(x_{i}) \) is the CDF of \( x_{i} \) derived from the fitted PDF. In this paper, the Weibull plotting position formula is used for computation of the empirical CDF. The theoretical values of the A2 and KS test statistics for different sample sizes (N) at 5% significance level are available in the technical note on ‘goodness-of-fit tests for statistical distributions’ by Charles Annis (2009).
Test criteria: If the computed value of a GoF test statistic for a distribution is less than the theoretical value at the desired significance level, then the distribution is considered suitable for EVA of rainfall and temperature at that level.
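A minimal sketch of the KS computation and the test criterion follows. It assumes the Weibull plotting position in its usual form, i/(N + 1) for the ith smallest value; the function names are hypothetical, and the theoretical (critical) value must be supplied by the user from tables, since none is reproduced here.

```python
def ks_statistic(sample, cdf):
    """Largest absolute gap between empirical and fitted CDF.

    Empirical CDF: Weibull plotting position i / (N + 1) for the
    i-th smallest observation (assumed form).
    """
    xs = sorted(sample)
    n = len(xs)
    return max(abs(i / (n + 1) - cdf(x)) for i, x in enumerate(xs, start=1))


def is_acceptable(computed, theoretical):
    """Test criterion: accept the distribution if computed < theoretical."""
    return computed < theoretical


# Toy check with the identity as the fitted CDF on [0, 1].
stat = ks_statistic([0.9, 0.1, 0.5], lambda x: x)
print(stat)
```

The same criterion applies to the A2 statistic, with its own theoretical value for the sample size and significance level in use.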
Diagnostic test
Sometimes, the GoF test results do not offer a conclusive inference, thus posing a problem for the user in selecting a suitable PDF for their application. In such cases, a diagnostic test is applied in addition to the GoF tests for making the inference. The selection of the most suitable probability distribution for estimation of rainfall and temperature (maximum and minimum) is performed through the D-index test (USWRC 1981), which is defined as below:
\[ \text{D-index} = \frac{1}{\bar{x}} \sum_{i=1}^{6} \left| x_{i} - x_{i}^{*} \right| \]
Here, \( \bar{x} \) is the average value of the observed data, whereas \( x_{i} \) (i = 1 to 6) and \( x_{i}^{*} \) are the six highest observed values and the corresponding values estimated by the different PDFs. The distribution with the least D-index is considered the better-suited distribution for EVA of rainfall and temperature.
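The D-index as described here (the sum of absolute deviations over the six highest observations, divided by the mean of the observed data) can be sketched as below. The function name `d_index` and the toy numbers are hypothetical; the estimated values are assumed to be supplied in the same order as the observations they correspond to.

```python
def d_index(observed, estimated, k=6):
    """D-index over the k highest observed values.

    `estimated[j]` is the value a fitted distribution assigns to
    `observed[j]`; both sequences must have the same length.
    """
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    # Keep the k pairs with the largest observed values.
    top = sorted(zip(observed, estimated), key=lambda pair: pair[0], reverse=True)[:k]
    mean_observed = sum(observed) / len(observed)
    return sum(abs(obs - est) for obs, est in top) / mean_observed


# Toy numbers, for illustration only.
obs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
est = [value + 1.0 for value in obs]
print(d_index(obs, est))
```

Because the sum is restricted to the largest observations, the index rewards distributions that reproduce the upper tail, which is the region of interest for design values.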
Application
In this paper, a study on EVA of rainfall and temperature adopting EV1, EV2, LN2 and LP3 probability distributions was carried out. MoM, MLM and OSA were used for determination of the parameters of EV1 and EV2 distributions, whereas MoM and MLM were used for LN2 and LP3 distributions. The series of AMR derived from the daily rainfall data for the period 1969 to 2011 and the series of AMAXT and AMINT extracted from the hourly temperature data for the period 1977 to 2007, observed at Hissar observatory, were used for EVA. From the scrutiny of the rainfall data, it was observed that the data for the year 2002 are not available. Similarly, from the scrutiny of the hourly temperature data, it was observed that the data for 6 years (1978, 1979, 1981, 1983, 1987 and 1989) are missing. The missing data for these years were therefore imputed by the series maximum value, i.e. 256.5 mm for AMR, 48.4 °C for AMAXT and 2.7 °C for AMINT as per AERB guidelines, and the entire data set was used for EVA. Table 2 gives the descriptive statistics of AMR, AMAXT and AMINT for Hissar.
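The imputation rule described above, replacing each missing year with the maximum of the available series, can be sketched as follows. The function name and the sample values are hypothetical and are not the Hissar data.

```python
def impute_with_series_max(series):
    """Fill missing years (value None) with the maximum observed value.

    `series` maps year -> value or None.
    """
    available = [value for value in series.values() if value is not None]
    if not available:
        raise ValueError("series has no observed values")
    fill = max(available)
    return {year: (fill if value is None else value) for year, value in series.items()}


# Hypothetical annual maximum temperatures (deg C), for illustration only.
amaxt = {1977: 46.1, 1978: None, 1979: None, 1980: 47.3}
print(impute_with_series_max(amaxt))
```

Filling with the series maximum keeps the record length intact while erring on the high side, so it raises the sample mean relative to the observed years alone.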
Results and discussion
Based on the parameter estimation procedures of EV1, EV2, LN2 and LP3 distributions (Rao and Hameed 2000), computer codes were developed in FORTRAN language and used for EVA of rainfall and temperature. These programs compute the parameters of the distributions, estimated values (x T ) of rainfall and temperature with standard error (SE) for different return periods, GoF tests statistic and D-index values.
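As an illustration of what such a program computes for one case, the sketch below fits EV1 by the method of moments and evaluates the quantile for a return period. It is written in Python rather than FORTRAN and is not the author's code. It assumes the usual Gumbel relations (scale = √6·s/π, location = mean − 0.5772·scale, quantile = location + scale·Y_T); standard errors, the other distributions and the other estimation methods are not covered.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_ev1_mom(sample):
    """Method-of-moments estimates (alpha, beta) for EV1 (Gumbel)."""
    mean = statistics.mean(sample)
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    beta = math.sqrt(6.0) * sd / math.pi
    alpha = mean - EULER_GAMMA * beta
    return alpha, beta


def ev1_quantile(alpha, beta, T):
    """Estimated value x_T for return period T (maxima series)."""
    y_t = -math.log(-math.log(1.0 - 1.0 / T))
    return alpha + beta * y_t


# Toy series, for illustration only.
alpha, beta = fit_ev1_mom([1.0, 2.0, 3.0, 4.0, 5.0])
for T in (10, 100, 1000):
    print(T, round(ev1_quantile(alpha, beta, T), 3))
```

Because the EV1 quantile is linear in Y_T, the fitted curve appears as a straight line when the estimates are plotted against the reduced variate.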
EVA of rainfall
The estimated 1-day maximum rainfall (ER) values with SE obtained from EV1, EV2, LN2 and LP3 distributions are presented in Tables 3 and 4. The ER values obtained from EV1, LN2 and LP3 are used to develop the plots presented in Fig. 1. For Hissar, it is noted that the ER values (Table 3) obtained from EV2 (using MoM, MLM and OSA) are higher than the corresponding values of EV1 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM); the plots of EV2 were therefore not combined with the plots of EV1, LN2 and LP3. Figure 2 presents the plots of observed and estimated ER by EV2 (MoM, MLM and OSA) for Hissar.
From Fig. 1, it can be seen that the values of ER obtained from LP3 (MoM) are relatively higher than the corresponding values of EV1 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MLM) for return periods varying from 50 years to 1000 years. Also, from Fig. 2, it can be seen that the values of ER by EV2 (MLM) are comparatively higher than the corresponding values of EV2 (MoM) and EV2 (OSA). From Fig. 1, it can also be seen that the fitted curves of ER values by EV1 and LN2 follow a linear trend. Similarly, from Fig. 2, it can be seen that the fitted curves follow an exponential trend when EV2 is adopted for EVA of rainfall.
EVA of temperature
The estimated values of maximum temperature (ET(Max)) with SE computed from EV1, EV2, LN2 and LP3 are presented in Tables 5 and 6. Similarly, the estimated minimum temperature (ET(Min)) with SE using EV1, LN2 and LP3 are presented in Table 7.
As OSA is not applicable to LN2 and LP3, no EVA results for LN2 (OSA) and LP3 (OSA) are presented in Tables 6 and 7. Similarly, from the EVA of AMINT data, it was observed that EV2 is not feasible for fitting, and therefore EVA results of EV2 are not presented in Table 7. The values of ET(Max) obtained from EV1, EV2, LN2 and LP3 are used to develop the plots presented in Fig. 3.
From Table 6, it is noted that there is no appreciable difference between the ET(Max) values given by LN2 (MoM and MLM) and LP3 (MoM and MLM). From Fig. 3, it can be seen that the values of ET(Max) obtained from EV2 (OSA) are relatively higher than the corresponding values of EV1 (MoM, MLM and OSA), EV2 (MoM and MLM), LN2 (MoM and MLM) and LP3 (MoM and MLM). Also, from Fig. 3, it can be seen that the fitted curves of the ET(Max) values obtained from EV1, EV2, LN2 and LP3 follow a linear trend.
From Table 7, it is noticed that the values of ET(Min) by EV1 (MoM, MLM and OSA) vary between − 3.6 and 5.0 °C, whereas the corresponding values given by LN2 (MoM and MLM) and LP3 (MoM and MLM) vary between 1.7 and 4.6 °C. By considering the amount of variation in ET(Min) values in lower tail region, the plots of observed and estimated values of minimum temperature by EV1 are presented in Fig. 4, whereas the plots of LN2 and LP3 are presented in Fig. 5.
Analysis based on GoF tests
The adequacy of fitting the four different PDFs for EVA of rainfall and temperature was assessed by adopting GoF tests, viz. A2 and KS, as described earlier. The GoF test results for Hissar are presented in Table 8.
From the GoF tests results, the following observations were made:
(i) A2 test results confirmed the applicability of LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall.
(ii) A2 test results did not support the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of temperature.
(iii) KS test results suggested the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall and temperature of Hissar.
Analysis based on diagnostic test
A diagnostic test of D-index was used for assessing the adequate probability distribution for EVA of rainfall and temperature for Hissar. The D-index values of EV1, EV2, LN2 and LP3 distributions are computed and presented in Table 9.
From Table 9, it is noted that the D-index value computed from EV1 (MoM) is minimum when compared to the corresponding values of other probability distributions for the series of AMR. Also, from Table 9, it may be noted that the D-index values computed from LN2 (MoM) and LP3 (MLM) are minimum for the series AMAXT and AMINT, respectively.
Selection of probability distribution
Based on the findings obtained through GoF and D-index tests, the selection of probability distributions for EVA of rainfall and temperature for Hissar was made by considering the following points:
(i) For EVA of rainfall using the series of AMR, it is noted that:
  (a) The D-index value of EV1 (MoM) is the minimum when compared with the corresponding values of the other distributions. However, the A2 test results did not support EV1 (MoM, MLM and OSA) for EVA of rainfall.
  (b) After eliminating the D-index value of EV1 (MoM), it is observed that the D-index values computed by LN2 (MoM), LP3 (MoM) and LP3 (MLM) are 1.737, 1.816 and 1.972, respectively.
(ii) For EVA of temperature using the series of AMAXT, it is noted that:
  (a) The D-index values of LN2 (MoM), LP3 (MoM) and LP3 (MLM) are computed as 0.069, 0.076 and 0.078, respectively.
  (b) There is no difference between the D-index values of LN2 (MLM) and LP3 (MLM).
(iii) Generally, the EVA results obtained from MoM may not be satisfactory because (a) it is difficult to obtain exact information about the shape of a distribution from its moments of third and higher order; and (b) the estimated parameters of the distributions are biased in comparison with other estimation procedures.
(iv) After eliminating the D-index values of MoM, the D-index value of LP3 (MLM) is found to be the minimum; hence LP3 (MLM) is considered the most appropriate distribution for EVA of rainfall and maximum temperature.
(v) For EVA of temperature using the series of AMINT, the D-index value computed by LP3 (MLM) is found to be the minimum when compared with the corresponding values of the other distributions.
(vi) The study suggested the LP3 (MLM) distribution as better suited amongst the four distributions adopted in EVA of rainfall and temperature (maximum and minimum).
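The elimination step in this selection can be sketched as a filter followed by a minimum. The example uses only the three AMR D-index values quoted in the text; the remaining entries of Table 9 are not reproduced here, so the sketch illustrates the rule and does not re-derive the study's result. The function name `select_distribution` is hypothetical.

```python
# D-index values for the AMR series as quoted in the text
# (other entries of Table 9 are not available here).
d_index_amr = {
    ("LN2", "MoM"): 1.737,
    ("LP3", "MoM"): 1.816,
    ("LP3", "MLM"): 1.972,
}


def select_distribution(d_index, excluded_methods=("MoM",)):
    """Return the (distribution, method) pair with the least D-index
    after dropping every pair whose method is excluded."""
    candidates = {
        key: value for key, value in d_index.items() if key[1] not in excluded_methods
    }
    if not candidates:
        raise ValueError("no candidates left after exclusion")
    return min(candidates, key=candidates.get)


print(select_distribution(d_index_amr))                       # MoM eliminated
print(select_distribution(d_index_amr, excluded_methods=()))  # nothing eliminated
```

A fuller version would first drop the combinations rejected by the GoF tests, as was done for EV1 in point (i)(a), before applying the D-index ranking.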
Conclusions
The paper presents the study carried out for EVA of rainfall and temperature for Hissar by adopting four probability distributions and also assessing the adequacy of the probability distributions using GoF and diagnostic tests with applicable parameter estimation methods. The following conclusions were drawn from the study:
(i) It was found that the estimated extreme values by EV2 (MLM) for rainfall, EV2 (OSA) for maximum temperature and LP3 (MoM) for minimum temperature are higher than the corresponding values of the other probability distributions used in EVA of rainfall and temperature.
(ii) Adequacy of fitting of the probability distributions to the series of rainfall and temperature was evaluated by GoF tests (using A2 and KS).
  (a) A2 test results confirmed the applicability of LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall.
  (b) A2 test results did not support the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of annual maximum and minimum temperature.
  (c) KS test results suggested the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall and temperature of Hissar.
(iii) On the basis of the diagnostic test results, the study identified that the LP3 (MLM) distribution is better suited for EVA of rainfall and temperature (maximum and minimum).
(iv) The study suggested that the 1000-year return period ER + SE value of rainfall of 594 mm, ET(Max) + SE value of maximum temperature of 56.6 °C and ET(Min) − SE value of minimum temperature of 1.5 °C given by the LP3 (MLM) distribution could be used for design purposes while designing hydraulic structures in the vicinity of the Hissar region.
References
AERB (2008) Extreme values of meteorological parameters. Atomic Energy Regulatory Board (AERB) safety guide AERB/NF/SG/S-3
AlHassoun SA (2011) Developing an empirical formulae to estimate rainfall intensity in Riyadh region. J King Saud Univ Eng Sci 23(1):81–88
Baratti E, Montanari A, Castellarin A, Salinas JL, Viglione A, Bezzi A (2012) Estimating the flood frequency distribution at seasonal and annual time scales. Hydrol Earth Syst Sci 16(12):4651–4660
Bobee B, Ashkar F (1991) The Gamma family and derived distributions applied in hydrology. Water Resources Publications, Littleton
Charles Annis PE (2009) Goodness-of-Fit tests for statistical distributions. http://www.statisticalengineering.com/goodness.html. Accessed 5 Dec 2017
Esteves LS (2013) Consequences to flood management of using different probability distributions to estimate extreme rainfall. J Environ Manag 115(1):98–105
Hasan H, Salam N, Adam MB (2013) Modelling extreme temperature in Malaysia using generalized extreme value distribution. Int J Math Comput Natural Phys Eng 7(6):618–624
Hughes GL, Rao SS, Rao TS (2007) Statistical analysis and time-series models for minimum/maximum temperatures in the Antarctic Peninsula. Proc R Soc Ser A 463:241–259
Rao AR, Hameed KH (2000) Flood frequency analysis. CRC Publications, Washington
Rasel M, Hossain SM (2015) Development of rainfall intensity duration frequency equations and curves for seven divisions in Bangladesh. Int J Sci Eng Res 6(5):96–101
USWRC (1981) Guidelines for determining flood flow frequency. United States Water Resources Council (USWRC) Bulletin No. 17B
Varathan N, Perera K, Nalin (2010) Board of study in statistics and computer science of the postgraduate institute of science. University of Peradeniya, Sri Lanka
Vivekanandan N (2015) Modelling of annual extreme rainfall, temperature and wind speed Using OSA of EV1 and EV2 distributions. Int J Innov Res Comput Sci Technol 3(4):57–60
Vivekanandan N (2016) Statistical analysis of rainfall data and estimation of peak flood discharge for ungauged catchments. Int J Res Eng Technol 5(2):27–31
Zhang J (2002) Powerful goodness-of-fit tests based on the likelihood ratio. J R Stat Soc Ser B 64:281–294
Acknowledgements
The author is grateful to Dr. (Mrs.) V.V. Bhosekar, Additional Director, Central Water and Power Research Station, Pune, for providing the research facilities to carry out the study. The author is thankful to M/s Nuclear Power Corporation of India Limited, Mumbai, and India Meteorological Department, Pune, for supply of rainfall and temperature data used in the study.
Cite this article
Vivekanandan, N. Comparison of probability distributions in extreme value analysis of rainfall and temperature data. Environ Earth Sci 77, 201 (2018). https://doi.org/10.1007/s12665-018-7356-z