Introduction

Technical and engineering appraisal of large infrastructure projects such as nuclear, hydro and thermal power plants, dams, bridges and flood control measures needs to be carried out during the planning and formulation stages. In the hydrological context, it is well recognized that, however extreme the design loading, more severe conditions are likely to occur in nature. For assessing such phenomena, Extreme Value Analysis (EVA) of rainfall and temperature for the geographical region of the project is one of the basic requirements. Depending on the size and the design life of the structure, the estimated extreme event corresponding to a particular return period is used. This can be computed by fitting a probability distribution to the series of Annual 1-day Maximum Rainfall (AMR), Annual Maximum Temperature (AMAXT) and Annual Minimum Temperature (AMINT) observed at the India Meteorological Department observatory located in the vicinity of the project site. In general, the selection of a suitable distribution plays a key role in estimating the extreme values.

Out of a number of available probability distributions, Extreme Value Type-1 (EV1), Extreme Value Type-2 (EV2), 2-parameter Log Normal (LN2) and Log Pearson Type-3 (LP3) are extensively used for EVA. Depending on their applicability, standard parameter estimation procedures, viz. Method of Moments (MoM) and Maximum Likelihood Method (MLM), are generally used for determination of parameters (Bobee and Ashkar 1991). In addition to MoM and MLM, the Atomic Energy Regulatory Board (AERB 2008) guidelines state that the Order Statistics Approach (OSA) can also be considered for determination of parameters of EV1 and EV2 distributions. The AERB guidelines also note that, although a number of methods are available for parameter estimation, the OSA estimates are popular owing to their lower bias and minimum variance. The guidelines suggest that the 1000-year return period Mean + SE value (where Mean denotes the estimated value \( x_{T} \) of rainfall or temperature and SE the standard error) is generally considered for arriving at a design value of rainfall and maximum temperature, whereas the Mean − SE value is considered for minimum temperature. In the recent past, a number of studies have been carried out by researchers adopting probability distributions for EVA of rainfall and temperature.

Hughes et al. (2007) carried out statistical analysis using the generalized extreme value (GEV) distribution and time series models for minimum and maximum temperatures in the Antarctic Peninsula. Varathan et al. (2010) found that the Gumbel (i.e. EV1) distribution is the best-fitting distribution for the annual maximum rainfall of Colombo district. AlHassoun (2011) developed empirical formulae to estimate rainfall intensity in the Riyadh region using EV1, LN2 and LP3, and concluded that LP3 gives the best accuracy amongst the three distributions studied. Baratti et al. (2012) carried out flood frequency analysis on seasonal and annual time scales for the Blue Nile River adopting the EV1 distribution. Hasan et al. (2013) applied the GEV distribution for modelling the AMAXT observed at 22 meteorological stations in Malaysia. Esteves (2013) applied the EV1 distribution to estimate the extreme rainfall depths at different rain-gauge stations in southeast UK. Rasel and Hossain (2015) applied the EV1 distribution for development of intensity duration frequency curves for seven divisions in Bangladesh. Vivekanandan (2015) applied EV1 and EV2 distributions for modelling the series of AMR, AMAXT, AMINT and annual hourly maximum wind speed (HMWS) for the Kanyakumari region, and found that EV1 (OSA) is better suited for modelling the series of AMR, AMAXT and AMINT, whereas EV2 (OSA) is better suited for modelling HMWS. Vivekanandan (2016) carried out statistical analysis of rainfall data using the EV1 distribution for estimation of peak flood discharge for the ungauged catchments of Bhul and Manjuhi Khad, Himachal Pradesh.

Generally, when different probability distributions are used for EVA, a common problem is how to determine which model fits a given set of data best. This can be evaluated by goodness-of-fit (GoF) and diagnostic tests. GoF tests, viz. Anderson–Darling (A2) and Kolmogorov–Smirnov (KS), are applied for checking the adequacy of fitting of probability distributions to the rainfall and temperature data (Zhang 2002). However, the results of the GoF tests offered diverging inferences, which led to the adoption of the D-index to aid the selection of a suitable distribution for EVA of rainfall and temperature. Research efforts thus exist in assessing the extreme values of rainfall and temperature (maximum and minimum) to aid the choice of design parameters, and the present work is an effort in this direction. This paper details the procedures involved in assessing the suitable probability distribution for estimation of rainfall and temperature through GoF and diagnostic tests, with an illustrative example.

Methodology

The objective of the study is to assess the adequacy of probability density functions (PDFs) for EVA of rainfall and temperature. In this context, the steps followed for data processing, validation and analysis are: (i) prepare the AMR data series from the daily rainfall data; (ii) prepare the AMAXT and AMINT data series from the hourly temperature data; (iii) select the PDFs for EVA (say, EV1, EV2, LN2 and LP3); (iv) select parameter estimation methods (say, MoM, MLM and OSA) wherever applicable; (v) select quantitative GoF and diagnostic tests; and (vi) conduct EVA and analyse the results obtained. The PDF and quantile estimator \( x_{T} \) of the distributions are presented in Table 1.

Table 1 PDF and quantile estimator of probability distributions

In Table 1, \( \alpha \), \( \beta \) and \( \gamma \) denote the location, scale and shape parameters of the distributions, respectively. For EV1 and EV2 distributions, the reduced variate \( Y_{T} \) for a given return period T is defined by \( Y_{T} = -\ln(-\ln(1 - (1/T))) \), while in the mathematical representation of LN2 and LP3, \( K_{P} \) denotes the frequency factor corresponding to the probability of exceedance. The coefficient of skewness (CS) is taken as 0.0 for LN2, whereas for LP3 it is computed from the log-transformed series of the observed data (Rao and Hameed 2000). For the AMINT data series, \( Y_{T} \) is instead computed as \( Y_{T} = -\ln(-\ln(1/T)) \), and \( K_{P} \) is the frequency factor corresponding to the probability of non-exceedance for LN2 and LP3 distributions.
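The reduced variate can be illustrated with a minimal Python sketch. The function names are hypothetical, and the EV1 quantile form \( x_{T} = \alpha + Y_{T}\beta \) is assumed here as the standard Gumbel expression; the parameter values in the usage lines are made up and are not estimates for Hissar.

```python
import math


def reduced_variate(T, minima=False):
    """Reduced variate Y_T for a return period T (years).

    For maxima (AMR, AMAXT): Y_T = -ln(-ln(1 - 1/T)).
    For minima (AMINT):      Y_T = -ln(-ln(1/T)).
    """
    if T <= 1:
        raise ValueError("return period must be greater than 1 year")
    p = 1.0 / T if minima else 1.0 - 1.0 / T
    return -math.log(-math.log(p))


def ev1_quantile(alpha, beta, T, minima=False):
    """EV1 quantile, assuming the standard form x_T = alpha + Y_T * beta."""
    return alpha + beta * reduced_variate(T, minima)


# Illustrative values only (alpha and beta are invented for the example).
y_100 = reduced_variate(100)
x_100 = ev1_quantile(alpha=10.0, beta=2.0, T=100)
```

For maxima the reduced variate grows with T, whereas the AMINT form gives negative values for large T, so the estimated quantile moves into the lower tail.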

Goodness-of-fit tests

Generally, the A2 test is applied for checking the adequacy of fitting of EV1 and EV2 distributions. The procedures involved in applying the A2 test to LN2 and LP3 are more complex, although the test statistic can still be used for quantitative assessment. In view of this, the KS test is also widely applied for quantitative assessment. Theoretical descriptions of the GoF test statistics are as follows:

A2 test statistic is defined by:

$$ {\text{A}}^{2} = - N - \frac{1}{N}\sum\limits_{i = 1}^{N} \left\{ (2i - 1)\ln (Z_{i}) + (2N + 1 - 2i)\ln (1 - Z_{i}) \right\} $$
(1)

Here, \( Z_{i} = F(x_{i}) \) for \( i = 1, 2, 3, \ldots, N \) with \( x_{1} < x_{2} < \cdots < x_{N} \), \( F(x_{i}) \) is the cumulative distribution function (CDF) value of the ith sample \( x_{i} \) and N is the sample size.
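As a hedged illustration, the A2 statistic of Eq. (1) can be computed as in the Python sketch below. The helper `ev1_cdf` assumes the standard Gumbel CDF, and the sample values and parameters are invented for the example; this is not the FORTRAN code used in the study.

```python
import math


def ev1_cdf(x, alpha, beta):
    """Gumbel (EV1) CDF with location alpha and scale beta (assumed form)."""
    return math.exp(-math.exp(-(x - alpha) / beta))


def anderson_darling(z_values):
    """A2 statistic from fitted CDF values Z_i = F(x_i), following Eq. (1)."""
    z = sorted(z_values)  # ascending order, matching x_1 < x_2 < ... < x_N
    n = len(z)
    if n == 0:
        raise ValueError("at least one value is required")
    if z[0] <= 0.0 or z[-1] >= 1.0:
        raise ValueError("CDF values must lie strictly between 0 and 1")
    total = 0.0
    for i, zi in enumerate(z, start=1):
        total += (2 * i - 1) * math.log(zi)
        total += (2 * n + 1 - 2 * i) * math.log(1.0 - zi)
    return -n - total / n


# Made-up sample and parameters, for illustration only.
sample = [52.0, 61.5, 70.2, 88.0, 120.4]
a2 = anderson_darling(ev1_cdf(x, alpha=70.0, beta=25.0) for x in sample)
```

The computed value would then be compared with the theoretical value for the given sample size and significance level.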

KS test statistic is defined by:

$$ {\text{KS}} = \mathop {\text{Max}}\limits_{i = 1}^{N} \left| F_{\text{e}} (x_{i}) - F_{D} (x_{i}) \right| $$
(2)

Here, \( F_{\text{e}}(x_{i}) \) is the empirical CDF of \( x_{i} \) and \( F_{D}(x_{i}) \) is the CDF of \( x_{i} \) derived from the fitted PDF. In this paper, the Weibull plotting position formula is used for computation of the empirical CDF. The theoretical values of the A2 and KS test statistics for different sample sizes (N) at the 5% significance level are available in the technical note on ‘goodness-of-fit tests for statistical distributions’ by Charles Annis (2009).
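A minimal Python sketch of the KS computation follows. It assumes the Weibull plotting position \( i/(N + 1) \) for the ith smallest observation and takes the largest absolute difference between empirical and fitted CDF values, which is the usual form of the statistic; the function names and input values are hypothetical.

```python
def weibull_plotting_positions(n):
    """Empirical CDF values i / (n + 1) for ranks i = 1..n (ascending order)."""
    return [i / (n + 1) for i in range(1, n + 1)]


def ks_statistic(fitted_cdf_values):
    """Largest absolute gap between empirical and fitted CDF values.

    fitted_cdf_values holds F_D(x_i) for each observation; sorting them
    ascending pairs each with the plotting position of the same rank.
    """
    fitted = sorted(fitted_cdf_values)
    if not fitted:
        raise ValueError("at least one value is required")
    empirical = weibull_plotting_positions(len(fitted))
    return max(abs(fe - fd) for fe, fd in zip(empirical, fitted))


# Made-up fitted CDF values, for illustration only.
ks = ks_statistic([0.10, 0.50, 0.90])
```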

Test criteria: If the computed value of a GoF test statistic for a distribution is less than the theoretical value at the desired significance level, then the distribution is considered suitable for EVA of rainfall and temperature at that level.

Diagnostic test

Sometimes, the GoF test results do not offer a conclusive inference, which poses a problem for the user in selecting a suitable PDF for their application. In such cases, a diagnostic test is applied in addition to the GoF tests for making the inference. The selection of the most suitable probability distribution for estimation of rainfall and temperature (maximum and minimum) is performed through the D-index test (USWRC 1981), which is defined as:

$$ {\text{D-index}} = \frac{1}{\bar{x}}\sum\limits_{i = 1}^{6} \left| x_{i} - x_{i}^{*} \right| $$
(3)

Here, \( \bar{x} \) is the average value of the observed data, whereas \( x_{i} \) (i = 1 to 6) and \( x_{i}^{*} \) are the six highest observed values and the corresponding values estimated by the different PDFs. The distribution with the least D-index is considered the better-suited distribution for EVA of rainfall and temperature.
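The D-index of Eq. (3) can be sketched in Python as below. The function name and the input layout (one estimated value paired with each observation) are assumptions made for the example, and the numbers are invented.

```python
def d_index(observed, estimated, k=6):
    """D-index: sum of |x_i - x_i*| over the k highest observations,
    divided by the mean of the whole observed series.

    observed and estimated are paired lists: estimated[j] is the value
    given by the fitted distribution for observed[j].
    """
    if len(observed) != len(estimated):
        raise ValueError("observed and estimated must have the same length")
    if len(observed) < k:
        raise ValueError("need at least k observations")
    mean_observed = sum(observed) / len(observed)
    # Rank the pairs by observed value, largest first, and keep the top k.
    top = sorted(zip(observed, estimated), key=lambda pair: pair[0], reverse=True)[:k]
    return sum(abs(x - x_star) for x, x_star in top) / mean_observed


# Made-up series: every estimate is 1.0 above the observation.
obs = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0, 70.0, 80.0]
est = [v + 1.0 for v in obs]
d = d_index(obs, est)  # six deviations of 1.0 divided by a mean of 45.0
```

Computing this value for each candidate distribution and parameter estimation method allows the candidates to be ranked, with the smallest value indicating the closest fit to the upper tail.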

Application

In this paper, a study on EVA of rainfall and temperature adopting EV1, EV2, LN2 and LP3 probability distributions was carried out. MoM, MLM and OSA were used for determination of parameters of EV1 and EV2 distributions, whereas MoM and MLM were used for LN2 and LP3 distributions. The series of AMR derived from the daily rainfall data for the period 1969 to 2011 and the series of AMAXT and AMINT extracted from the hourly temperature data for the period 1977 to 2007, observed at Hissar observatory, were used for EVA. Scrutiny of the rainfall data showed that the data for the year 2002 are not available. Similarly, scrutiny of the hourly temperature data showed that the data for 6 years (1978, 1979, 1981, 1983, 1987 and 1989) are missing. The missing data for these years were therefore imputed by the series maximum value, i.e. 256.5 mm for AMR, 48.4 °C for AMAXT and 2.7 °C for AMINT, as per AERB guidelines, and the entire data set was used for EVA. Table 2 gives the descriptive statistics of AMR, AMAXT and AMINT for Hissar.
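The data preparation described above can be sketched in Python as follows. The function names and the (year, value) record layout are hypothetical, and the sample records are invented; the sketch only mirrors the two steps named in the text, namely extracting one extreme per year and filling years with no data by the series maximum.

```python
def annual_extremes(records, pick=max):
    """Reduce (year, value) records to one value per year.

    pick=max gives annual maxima (AMR, AMAXT); pick=min gives annual
    minima (AMINT).
    """
    by_year = {}
    for year, value in records:
        by_year.setdefault(year, []).append(value)
    return {year: pick(values) for year, values in by_year.items()}


def fill_missing_years(series, first_year, last_year):
    """Fill years without data using the series maximum value."""
    fill_value = max(series.values())
    return {y: series.get(y, fill_value) for y in range(first_year, last_year + 1)}


# Made-up daily rainfall records (year, mm); 2002 has no data.
records = [(2000, 10.0), (2000, 25.0), (2001, 40.0), (2003, 5.0)]
amr = fill_missing_years(annual_extremes(records), 2000, 2003)
```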

Table 2 Descriptive statistics of AMR, AMAXT and AMINT for Hissar

Results and discussion

Based on the parameter estimation procedures of EV1, EV2, LN2 and LP3 distributions (Rao and Hameed 2000), computer codes were developed in FORTRAN and used for EVA of rainfall and temperature. These programs compute the parameters of the distributions, the estimated values \( x_{T} \) of rainfall and temperature with standard error (SE) for different return periods, the GoF test statistics and the D-index values.

EVA of rainfall

The estimated 1-day maximum rainfall (ER) values with SE obtained from EV1, EV2, LN2 and LP3 distributions are presented in Tables 3 and 4. The ER values obtained from EV1, LN2 and LP3 are plotted in Fig. 1. For Hissar, the ER values obtained from EV2 (using MoM, MLM and OSA) in Table 3 are higher than the corresponding values of EV1 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM); the EV2 plots were therefore not combined with those of EV1, LN2 and LP3. Figure 2 presents the plots of observed and estimated ER by EV2 (MoM, MLM and OSA) for Hissar.

Table 3 Estimates of 1-day maximum rainfall with SE adopting EV1 and EV2 distributions
Table 4 Estimates of 1-day maximum rainfall with SE adopting LN2 and LP3 distributions
Fig. 1 Plots of observed and estimated values of rainfall by EV1, LN2 and LP3 distributions

Fig. 2 Plots of observed and estimated values of rainfall by EV2 distribution

From Fig. 1, it can be seen that the ER values obtained from LP3 (MoM) are relatively higher than the corresponding values of EV1 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MLM) for return periods ranging from 50 to 1000 years. From Fig. 2, the ER values by EV2 (MLM) are comparatively higher than the corresponding values of EV2 (MoM) and EV2 (OSA). Figure 1 also shows that the fitted curves of ER values by EV1 and LN2 follow a linear trend, whereas Fig. 2 shows that the fitted curves follow an exponential trend when EV2 is adopted for EVA of rainfall.

EVA of temperature

The estimated values of maximum temperature (ET(Max)) with SE computed from EV1, EV2, LN2 and LP3 are presented in Tables 5 and 6. Similarly, the estimated minimum temperature (ET(Min)) with SE using EV1, LN2 and LP3 are presented in Table 7.

Table 5 Estimates of maximum temperature with SE adopting EV1 and EV2 distributions
Table 6 Estimates of maximum temperature with SE adopting LN2 and LP3 distributions
Table 7 Estimates of minimum temperature with SE adopting EV1, LN2 and LP3 distributions

As OSA is not applicable to LN2 and LP3, no OSA results for these distributions are presented in Tables 6 and 7. Similarly, EV2 was found not to be feasible for fitting the AMINT data, and therefore EVA results of EV2 are not presented in Table 7. The values of ET(Max) obtained from EV1, EV2, LN2 and LP3 are plotted in Fig. 3.

Fig. 3 Plots of observed and estimated values of maximum temperature by EV1, EV2, LN2 and LP3 distributions

From Table 6, it is noted that there is no appreciable difference between the ET(Max) values given by LN2 (MoM and MLM) and LP3 (MoM and MLM). From Fig. 3, it can be seen that the values of ET(Max) obtained from EV2 (OSA) are relatively higher than the corresponding values of EV1 (MoM, MLM and OSA), EV2 (MoM and MLM), LN2 (MoM and MLM) and LP3 (MoM and MLM). Figure 3 also shows that the fitted curves of ET(Max) values obtained from EV1, EV2, LN2 and LP3 follow a linear trend.

From Table 7, it is noted that the values of ET(Min) by EV1 (MoM, MLM and OSA) vary between −3.6 and 5.0 °C, whereas the corresponding values given by LN2 (MoM and MLM) and LP3 (MoM and MLM) vary between 1.7 and 4.6 °C. Considering the amount of variation in ET(Min) values in the lower tail region, the plots of observed and estimated values of minimum temperature by EV1 are presented in Fig. 4, whereas the plots of LN2 and LP3 are presented in Fig. 5.

Fig. 4 Plots of observed and estimated values of minimum temperature by EV1 distribution

Fig. 5 Plots of observed and estimated values of minimum temperature by LN2 and LP3 distributions

Analysis based on GoF tests

The adequacy of fitting the four PDFs for EVA of rainfall and temperature was assessed by adopting the GoF tests, viz. A2 and KS, as described earlier. The GoF test results for Hissar are presented in Table 8.

Table 8 Computed and theoretical values of GoF test statistics adopting EV1, EV2, LN2 and LP3

From the GoF test results, the following observations were made:

  (i) A2 test results confirmed the applicability of LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall.

  (ii) A2 test results did not support the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of temperature.

  (iii) KS test results suggested the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall and temperature of Hissar.

Analysis based on diagnostic test

The D-index diagnostic test was used for identifying the most adequate probability distribution for EVA of rainfall and temperature for Hissar. The D-index values of EV1, EV2, LN2 and LP3 distributions were computed and are presented in Table 9.

Table 9 D-index values computed by EV1, EV2, LN2 and LP3 distributions

From Table 9, it is noted that the D-index value computed from EV1 (MoM) is the minimum when compared to the corresponding values of other probability distributions for the series of AMR. Table 9 also shows that the D-index values computed from LN2 (MoM) and LP3 (MLM) are the minimum for the series of AMAXT and AMINT, respectively.

Selection of probability distribution

Based on the findings obtained through GoF and D-index tests, the selection of probability distributions for EVA of rainfall and temperature for Hissar was made by considering the following points:

  (i) For EVA of rainfall using the series of AMR, it is noted that:

    (a) The D-index value of EV1 (MoM) is the minimum when compared with the corresponding values of other distributions. However, the A2 test results did not support EV1 (MoM, MLM and OSA) for EVA of rainfall.

    (b) After eliminating EV1 (MoM), the D-index values computed by LN2 (MoM), LP3 (MoM) and LP3 (MLM) are 1.737, 1.816 and 1.972, respectively.

  (ii) For EVA of temperature using the series of AMAXT, it is noted that:

    (a) The D-index values of LN2 (MoM), LP3 (MoM) and LP3 (MLM) are computed as 0.069, 0.076 and 0.078, respectively.

    (b) There is no difference between the D-index values of LN2 (MLM) and LP3 (MLM).

  (iii) Generally, MoM may not give satisfactory EVA results because it is difficult to assess exact information about the shape of a distribution from its moments of third and higher order, and because the estimated parameters of the distributions are biased in comparison to those obtained by other procedures.

  (iv) After eliminating the D-index values of MoM, the D-index value of LP3 (MLM) is found to be the minimum; hence, LP3 (MLM) is considered the most appropriate distribution for EVA of rainfall and maximum temperature.

  (v) For EVA of temperature using the series of AMINT, the D-index value computed by LP3 (MLM) is found to be the minimum when compared with the corresponding values of other distributions.

  (vi) The study suggested the LP3 (MLM) distribution as better suited amongst the four distributions adopted in EVA of rainfall and temperature (maximum and minimum).

Conclusions

The paper presents the study carried out for EVA of rainfall and temperature for Hissar by adopting four probability distributions and also assessing the adequacy of the probability distributions using GoF and diagnostic tests with applicable parameter estimation methods. The following conclusions were drawn from the study:

  (i) It was found that the estimated extreme values by EV2 (MLM) for rainfall, EV2 (OSA) for maximum temperature and LP3 (MoM) for minimum temperature are higher than the corresponding values of the other probability distributions used in EVA of rainfall and temperature.

  (ii) Adequacy of fitting of the probability distributions to the series of rainfall and temperature was evaluated by GoF tests (using A2 and KS).

    (a) A2 test results confirmed the applicability of LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall.

    (b) A2 test results did not support the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of annual maximum and minimum temperature.

    (c) KS test results suggested the EV1 (MoM, MLM and OSA), EV2 (MoM, MLM and OSA), LN2 (MoM and MLM) and LP3 (MoM and MLM) for EVA of rainfall and temperature of Hissar.

  (iii) On the basis of the diagnostic test results, the study identified that the LP3 (MLM) distribution is better suited for EVA of rainfall and temperature (maximum and minimum).

  (iv) The study suggested that the 1000-year return period ER + SE value of rainfall of 594 mm, ET(Max) + SE value of maximum temperature of 56.6 °C and ET(Min) − SE value of minimum temperature of 1.5 °C given by the LP3 (MLM) distribution could be used for design purposes while designing hydraulic structures in the vicinity of the Hissar region.