1 Introduction

For developing countries like Pakistan, agriculture plays the part of heart in economy with 21% contribution to gross domestic product (GDP) and with 3.2% annual growth (Govt.-of-Pakistan 2008). Agriculture in Pakistan is a major user of water. Precipitation provides water for agricultural, for livestock, and also for human use. In tropical countries, rainfall is the vital natural input source for the production of crops. The spatio-temporal variation, occurrence, and distribution of precipitation are unpredictable in nature. Future probabilities of occurrence from the interpretation of past records of rainfall are the main problem in hydrology. Analysis and determination of daily maximum rainfall by probability distributions help in proper management and utilization of water resources. According to Bhakar et al. (2008), historical data can be used to find rainfall against different return periods through frequency analysis which helps to calculate occurrence probability of extreme rainfall events and can be used to estimate annual maximum precipitation for different return periods using the rainfall data (Bhakar et al. 2006).

The quantity and pattern of rainfall for specific place are significant parameters which affect management of water resources, flood protection, tourism, forestry, and agriculture. The floods are mainly caused by extreme rainfall events. The damages, as a result of floods and storms, can be minimized by accurate approximation of expected precipitation along with proper design of hydraulic structures. According to Tao et al. (2002), the data on extreme rainfall events with high return periods is prerequisite for operation and control of water resource policies, to lessen the damages caused by floods, and for the safe designing of hydrologic structures such as dams and urban drainage systems. Bhakar et al. (2008) reported that for good economic return, the farmers should be aware of the prediction of rainfall through scientific methods and systematic planning of their crops. Thus, many problems associated with water management can be solved through probability analysis of rainfall data. George and Kolappadan (2002) reported the prediction of rainfall of various quantities and frequencies by using probability distributions. As there are spatial and temporal variations in rainfall, projected rainfall during various return periods are estimated using different probability distributions. The projected rainfall which might be more or less than the recorded value is estimated using the best-fitted probability model. Precipitation data frequency analysis has been accomplished for various return periods (Barkotulla et al. 2009; Bhakar et al. 2006; Vivekanandan 2012).

In engineering practice, the main problem is to select proper distribution model. The main criterion for the selection of proper distribution model for a particular site is the available data of precipitation. Assessment of existing distribution models is required to choose appropriate distribution model to get the precise estimate of extreme rainfall. According to Tao et al. (2002), several probability models have been developed for the description of distribution for extreme precipitation values at a particular site. Anil (2000) and Singh (2001) reported that log–normal distribution fitted best to 24-h annual maximum precipitation in India. According to Bhakar et al. (2008), Gumbel distribution fitted best to monthly maximum rainfall in India. In Iran, generalized extreme value distribution and Pearson type 3 distribution were the best-fitted probability distributions for monthly maximum rainfall (Eslamian and Feizi 2007). In lower parts of northern areas of Pakistan, log–Pearson type 3 probability distribution fitted best to 24-h annual maximum rainfall values (Amin et al. 2016). Weibull, log–normal, and Pearson type 5 were the best-fitted distributions to monsoon, pre-monsoon, and winter seasons while for annual, summer, and post-monsoon seasons, normal distribution was selected as the best-fitted distribution, based on the study of 24-h monthly, seasonal, and annual maximum precipitation for Sagar Island which is on the continental shelf of Bay of Bengal (Mandal and Choudhury 2015). According to Lee (2005), log–Pearson type 3 was selected as the best-fitted probability model to 50% of stations of Chia-Nan plain area in Southern Taiwan. For 24-h maximum rainfall, log–Pearson type 3 was selected as the best-fit model in Nigeria (Ogunlela 2001). According to Olofintoye et al. (2009), log–Pearson type 3 probability model fitted well to half of the stations in Nigeria and Pearson type 3 probability model fitted well to 40% of stations for 24-h maximum rainfall. According to Kwaku and Duke (2007), log–normal was selected as the best-fitted probability model for maximum precipitation of 1–5 consecutive days at in Ghana. For monthly maximum rainfall, Gamma distribution was selected as the best-fitted probability model in the arid areas of Libya (ŞEN and Eljadid 1999).

Most of the population in Pakistan depends on agriculture for their food and fiber requirements as agriculture is the major contributor in the country’s economy. Water is essential for agriculture and precipitation provides water for agricultural production (Adnan and Khan 2009) Agriculture, biological diversity, and ecosystem are directly affected by extreme events of precipitation. It is therefore vital to estimate expected events of extreme rainfall to minimize the risk factors in the long-term measures of saving property and lives. At present, very limited work have been done in Pakistan to find the best-fitted probability distributions. No study has been carried out in SRIR which contributes water to rivers in Pakistan.

The current research was conducted in the study area that is the lifeline for the Indus Basin as it provides water for the inhabitants and for agriculture of this basin. Previously, only one study of this kind was conducted in the southwestern part of SRIR using rainfall data of only six rainfall stations and only four probability distributions. But, this study was conducted in the whole source region of Indus River and rainfall data of 50 stations was used. The objective of this research was the selection of appropriate probability model for daily maximum rainfall from 1965 to 2015 and to calculate the expected rainfall against the return period of 5, 10, 15, 20, 25, 50, and 100 years. The predicted amounts of rainfall will be helpful for making policies and developing plans to minimize the damages and risks of flooding from extreme rainfall events.

2 Materials and methods

2.1 Study area and data collection

This research was conducted in SRIR (Fig. 1) where elevation ranges from 193 to 8062 m above sea level. The Indus basin with a drainage area of about 1.08 million km2 is among the largest trans-boundary river basins with the major part being in Pakistan, i.e., 56% while 26.6% of area lies in India, 10.7% lies in China, and 6.7% lies in Afghanistan (Wolf et al. 1999). Upper Indus river basin is a distinctive area with multifarious climate (Lutz et al. 2016), diverse physical and geographical features, and divergent hydrological systems (Hasson et al. 2017). Indus River initiates from Mansarovar Lake in the Third Pole and flows through Northern Pakistan and ends its journey into the Arabian Sea. Source region of Indus River, also called Upper Indus Basin (UIB), is positioned in the mountainous range of Tibetan Plateau, Himalaya, Karakoram, and Hindu–Kush in the global range of 32.48° to 37.07° N and 67.33° to 81.83° E (Hasson et al. 2017; Khattak et al. 2011; Lutz et al. 2016). There are about 11,000 glaciers in these mountainous ranges (Hasson et al. 2017); with a glacier surface area of about 22,000 km2, it becomes one of the most glaciated areas in the world (Bajracharya and Shrestha 2011). Indus River and its tributaries (Jhelum, Chenab, Ravi, Satluj, and Kabul) flow through this region. Jhelum River is among the largest tributary of Indus River. The drainage area of Jhelum River Basin is approximately 33,435 km2 and it lies in the global range of 33 to 35° N and 73 to 75.62° E. The drainage point of Jhelum River Basin is the Mangla reservoir which is the largest reservoir in Pakistan after the Tarbela reservoir. There are five rivers/tributaries (Jhelum, Kanshi, Poonch, Neelum, and Kunhar) which flow through this basin and add water to Mangla reservoir. Jhelum River Basin and its tributaries get water from the southern Himalaya and some parts of Pir Panjal mountains in Kashmir. The digital elevation model of SRIR and location of meteorological stations and gridded stations are shown in Fig. 2.

Fig. 1
figure 1

Location map of study area

Fig. 2
figure 2

Study area, showing elevation and location of rainfall stations

Twenty-four-hour annual maximum rainfall from 28 meteorological stations and 22 gridded stations of the study area during the period of 1961–2015 were used in this study. Meteorological stations are located within the boundary of Pakistan and rainfall data of these stations are obtained from Pakistan Meteorology Department. Gridded stations are located outside the boundary of Pakistan and rainfall data of these stations are extracted from APHRODITE. A summary of statistics along with basic information about the rainfall stations is shown in Table 1. Most of the stations (36 No.) of this region are highly skewed as their coefficient of skewness is greater than 1 while 14 stations are moderately skewed.

Table 1 Statistics of 24-h annual maximum rainfall and characteristics of rainfall stations’ location

2.2 Probability distributions

A probability distribution is a method for showing the probable outcomes that a random variable may have along with their probabilities. The choice of proper probability model is very vital for selection of best-fitted model for a specific area. Most generally used probability distributions for the analysis of extreme precipitation are used in this study. These distributions are described in Table 2. Out of nine distributions, five have two parameters while four have three parameters. A distribution with more parameters is generally expected to give better results but its estimation method will be more difficult.

Table 2 Probability distributions and their parameters

2.3 Goodness-of-fit test

The fitness of probability distributions is usually checked by the statistics of goodness-of-fit tests. Three most generally used goodness-of-fit tests are used in this study and are described below:

  • Kolmogorov–Smirnov test

This test finds whether the sample is from assumed continuous distribution. The statistic (D) of this test is the maximum vertical change between theoretical distribution and empirical distribution (Conover 1998). It is defined as:

$${D}_{\mathrm{n}}=\mathrm{max}\left|{F}_{\mathrm{n}}\left(\mathrm{x}\right)-F(\mathrm{x})\right|$$
(1)

The hypothesis that the data follow the selected probability model is rejected if the calculated value of D is higher than its critical value that is 0.17981 at α = 0.05.

  • Anderson–Darling test

It is a common statistical test to find whether the dataset follows the given probability model. The AndersonDarling test statistics (A2) is calculated using Eq. (2):

$${A}^{2}= -n-\frac{1}{n}\sum_{i=1}^{n}\left(2\mathrm{i}-1\right)\left[\mathrm{lnF}\left({\mathrm{X}}_{\mathrm{i}}\right)+\mathrm{ln}\left(1-F\left({\mathrm{X}}_{\mathrm{n}-\mathrm{i}+1}\right)\right)\right]$$
(2)

The hypothesis that data comes from a particular probability model is rejected if the calculated value of A2 is higher than its critical value that is 2.5018 at α = 0.05.

  • Chi-squared test

This test is based on the difference between observed and expected values of each class. Best-fitted distribution was selected by comparing each distribution’s chi-square value and selecting the function that gives the smallest chi-square value (Agarwal et al. 1988). The statistic of this test is defined as:

$${\mathrm{x}}^{2}={\sum }_{\mathrm{i}}\frac{{({O}_{i}-{E}_{i})}^{2}}{{E}_{i}}$$
(3)

where Oi is the observed number and Ei is the expected number of cases in class i. The critical value for the chi-squared test is 11.07 at α = 0.05.

2.4 Return period (T)

Calculation of return period is the main task of frequency analysis. Exceedance probability is inverse of return period. A 5-year return period has an exceedance probability of 1/5 = 0.2 in EACH YEAR. If a variable whose value (x) becomes equal or higher than the event with value (\({x}_{T}\)) once in T years, then exceedance probability (P) of the variable in a given year is given by:

$$P\left(\mathrm{x}\ge {\mathrm{x}}_{T}\right)=\frac{1}{T}$$
(4)
$$T= \frac{1}{P}$$
(5)

3 Results and discussion

3.1 Best-fit probability distribution

Twenty-four-hour annual maximum precipitation from 50 rainfall stations in SRIR from 1961 to 2015 was used in this study. All the rainfall stations are located at different elevations (Fig. 2). Fourteen stations are located at an elevation of 0–1000 m and the average value of their daily maximum rainfall was 75.6 mm, and twenty stations are located at an elevation of 1001 to 3000 m with average rainfall of 57 mm while sixteen stations are located at an elevation of 3001 to 6000 m, and their average daily maximum rainfall was 21.4 mm. Average amount of daily maximum rainfall is decreasing with increasing elevation in the study area.

Nine probability distributions were used in this research work; five distributions have 2 parameters while four distributions have 3 parameters which were calculated using methods of moments. Three goodness-of-fit tests were applied to find the most suited probability distribution at each of the rainfall station in the study area. These goodness-of-fit tests were used following the procedure adopted by numerous authors in previous studies (Adegboye and Ipinyomi 1995; Chowdhury et al. 1991; Leavenworth and Grant 2000). The test statistics were calculated using Eq. (1) for the Kolmogorov–Smirnov test, Eq. (2) for the Anderson–Darling test, and Eq. (3) for the chi-squared test. The critical values were obtained from table at significance level of 0.05. The hypothesis that the rainfall data follow the probability distribution is rejected if the calculated value of test statistics is more than its critical value. These goodness-of-fit tests were given the value of 1 for best-fitted and 9 for least-fitted probability distribution. The values of all goodness-of-fit tests used in this study were added for all the rainfall stations and are shown in Table 3. It is clear from Table 3 that only log–Pearson type 3 is rejected at four stations, i.e., Cherat, Mangla, Peshawar, and Risalpur. The best-fitted probability model was selected based on total minimum scores of values from used goodness-of-fit test and is shown by gray color in Table 3. The best-fitted probability models based on these results for all rainfall stations are shown in Fig. 3a. As can be seen from Fig. 3a, generalized extreme value (GEV) is the best-fitted model at 54% of the stations in the source region of Indus River (SRIR). Log–Pearson type 3 (LP3) is the best-fitted distribution at 14% of the stations, followed by Gamma (12%), Weibull 3P (12%), Weibull (4%), log–normal (2%), and extreme value type 1 (2%) in the source region of Indus River. According to Khudri and Sadia (2013), GEV was also the best-fitted probability model for annual maximum rainfall in Bangladesh. Eslamian and Feizi (2007) also reported GEV and Pearson type 3 as the best-fitted probability models for maximum monthly values of rainfall in Iran. Second best-fitted models are shown in Fig. 3b. Log–Pearson type 3 (LP3) is the second best-fit probability distribution on most of the station in the study area.

Table 3 Sum of values of goodness-of-fit tests for probability distributions used in the study
Fig. 3
figure 3

Best-fit (a) and second best-fit (b) probability distributions at rainfall stations of SRIR

Probability density function and cumulative distribution function at 50 rainfall stations of the study area were calculated using the best-fitted model of that station and results are presented in Figs. 4 and 5. Fifteen rainfall stations whose daily maximum rainfall is greater than or equal to 250 mm have lower probability, i.e., less than 1.5% (Fig. 4a), fourteen stations with daily maximum rainfall of 150–250 mm have probability up to 3% (Fig. 4b), and ten stations with daily maximum rainfall of 100–150 mm have probability up to 5% (Fig. 4c) while eleven rainfall stations with daily maximum rainfall less than equal to 60 mm have highest probability, i.e., up to 15% (Fig. 4d). The cumulative probability for fifteen rainfall stations is about 1 with daily maximum rainfall equal to or less than 250 mm (Fig. 5a), the cumulative probability is about 1 for fourteen stations with rainfall less than or equal to 150 mm (Fig. 5b), and the cumulative probability is about 1 for ten stations with daily maximum rainfall less than equal to 100 mm (Fig. 5c) while cumulative probability for eleven stations is about 1 with daily maximum rainfall less than equal to 60 mm (Fig. 5d).

Fig. 4
figure 4

Probability density functions at rainfall stations of SRIR. a Daily maximum rainfall ≥ 250 mm. b Daily maximum rainfall 150–250 mm. c Daily maximum rainfall 100–150 mm. d Daily maximum rainfall ≤ 60 mm

Fig. 5
figure 5

Cumulative distribution functions at rainfall stations of SRIR. a Daily maximum rainfall ≥ 250 mm. b Daily maximum rainfall 150–250 mm. c Daily maximum rainfall 100–150 mm. d Daily maximum rainfall ≤ 60 mm

3.2 Expected rainfall against various return periods

Daily maximum rainfall against the return periods of 5, 10, 15, 20, 25, 50, and 100 years is estimated using the best-fitted probability model at each of the rainfall station of the study area and the rainfall estimates are shown in Fig. 6. For the 5-year return period, daily maximum rainfall of less than 50 mm is estimated at 21 stations and 50–100 mm is estimated at 15 stations while more than 100 mm is estimated at 14 stations of the study area. For the 20-year return period, daily maximum rainfall of less than 50 mm is estimated at 12 stations, 50–100 mm is estimated at 12 stations, and 100–150 mm is estimated at 11 stations while more than 150 mm is estimated at 15 stations of the study area.

Fig. 6
figure 6

Rainfall estimates against different return periods using best-fitted probability distribution at rainfall stations in SRIR

The spatial distributions of estimated rainfall against return periods of 5, 10, 15, 20, 25, 50, and 100 years in source region of Indus River are shown in Fig. 7. Daily values of annual maximum rainfall ranged from 12 mm in the eastern part of SRIR to 186 mm in the middle of source region of Indus River against return period of 10 years (Fig. 7b). Annual maximum rainfall ranged from 17 to 233 mm and 20 to 272 mm against return period of 25 and 50 years respectively (Fig. 7e and f). Amount of expected rainfall is increasing from the eastern side toward the middle and from the western side toward the middle of SRIR. For the 100-year return period, daily values of annual maximum rainfall ranged from 23 to 317 mm (Fig. 7g).

Fig. 7
figure 7

Spatial distribution of rainfall against return period of 5 years (a), 10 years (b), 15 years (c), 20 years (d), 25 years (e), 50 years (f), and 100 years (g)

4 Conclusions

Source region of Indus River is the area providing water to Indus River and its tributaries. Daily maximum rainfall observations from 1961 to 2015 of 50 rainfall stations of SRIR were used in this study to find the best-fitted probability model out of nine distributions used. Three goodness-of-fit tests were employed to find the most suited probability model. Each probability model is given the value of 1 to 9 based on the results of the goodness-of-fit tests. The values of these goodness-of-fit tests were added and best-fitted probability distribution was selected based on the minimum sum of values. Daily maximum rainfall estimated using the non-best-fitted probability distributions could give high or low values as compared to the actual ones which will have adverse effects on the safety of hydrologic structures.

Generalized extreme value probability model was selected as best-fitted distribution on twenty-seven stations of the source region of Indus River (SRIR), followed by log–Pearson type 3 distribution on seven stations, Gamma distribution on six stations, Weibull 3P distribution on six stations, Weibull distribution on two stations, log–normal distribution on one station, and extreme value type 1 distribution on one station. It is clear from the findings of this study that generalized extreme value distribution fitted best to more than 50% stations of SRIR.

The more applied result of this study was the predicted rainfall against different return periods for all the rainfall stations of the study area. The best-fitted probability distributions, e.g., GEV, LP3, Gamma, W3P, W, LN, and EV1, were used for the estimation of rainfall against return periods of 5, 10, 15, 20, 25, 50, and 100 years. The spatial distribution of estimated daily values of annual maximum rainfall shows that the middle part of SRIR has the highest values, i.e., 186 mm for the 10-year return period and 317 mm for the 100-year return period. The eastern side of the study area has the lowest values of 24-h annual maximum rainfall, i.e., 12 mm and 23 mm for 10- and 100-year return period respectively.

A policymaker can use 25 or 50-year rainfall estimates for making policies or risk analysis of 25- or 50-year plan to minimize the threat and reparations from extreme events of rainfall and flooding.