1 Introduction

Climate change is a global phenomenon with varying degrees of regional impact. Although there are different schools of thought regarding the contribution of individual climate drivers such as greenhouse gases and aerosols, and regarding the sequence and pace of the phenomenon, there is broad agreement that climate change and global warming are a reality. The third assessment report of the Intergovernmental Panel on Climate Change (IPCC 2001) described the impact of climate change on water resources. Climate change intensifies some processes within the hydrological cycle, affecting ground and surface water supply for irrigation, domestic and industrial uses, water-based recreation and hydropower generation. The projections indicate changes in the variability of climate and in the frequency and intensity of some extreme climatic phenomena. Therefore, it is necessary to evaluate the consequences of climate change at the river basin level. Xu (1999) discussed the gaps between general circulation model (GCM) realizations and hydrological needs at both spatial and temporal scales, and stated that the skill of GCMs decreases at fine resolution, whereas the importance of hydrological processes increases at fine scale (Kundzewicz et al. 2007). Furthermore, Chen et al. (2006) argued that coarse-scale GCM outputs cannot be used directly in hydrological studies at finer scales, and that they are unable to capture the circulation patterns that cause hydrological extremes (Christensen and Christensen 2007). GCM outputs are therefore inadequate for assessing the spatial and temporal variability of rainfall required for hydrologic modelling (Wilby et al. 1999). Downscaling is thus used to link large-scale climatic variability to historical observations of the surface parameters of interest, in order to quantify possible changes at the regional level.
The method of modelling hydrologic variables at a regional scale from large-scale GCM outputs is known as downscaling. Because the physical processes are incompletely understood and the future representative scenarios are themselves vague, the uncertainty in GCM-simulated outputs is high (Mujumdar and Ghosh 2008). Additionally, Maraun et al. (2010) stated that the scale mismatch can be addressed by downscaling the GCM output, under the assumption that the large-scale global circulation has a significant impact on local-scale weather.

Statistical downscaling uses relationships derived from observed data (Wigley et al. 1990; Hewitson and Crane 1996). In this method, a statistical relationship is established between the predictors and the predictand to downscale the global projection to a regional-level projection (Von Storch et al. 1993); the method is therefore considered prognosis downscaling (Kalnay 2003; Wilks 2006). Ghosh and Mujumdar (2006) calculated future precipitation over Orissa from GCM projections using fuzzy clustering. Mujumdar and Ghosh (2008) combined fuzzy clustering with a relevance vector machine to predict the streamflow of the Mahanadi River under greenhouse gas emission scenarios. Statistical downscaling has gained popularity in various areas of hydrology, such as projecting river runoff (Rao 1995; Simonovic and Li 2004; Samadi et al. 2013), low flows in river basins (Diaz-Nieto and Wilby 2005), temperature (Coulibaly et al. 2005; Chu et al. 2010), daily precipitation (Coulibaly et al. 2005; Chen et al. 2006; Maraun et al. 2010) and pan evaporation (Chu et al. 2010). Furthermore, Wilby et al. (1998) performed a more comprehensive analysis of hydrometeorological variables through statistical downscaling, using the Hadley Centre coupled ocean-atmosphere model with two GCM airflow scenarios. Bronstert et al. (2007) adopted a different approach, assessing the climate change impact on hydrology by developing future climate projection scenarios based on regional climate features and the large-scale future climate changes provided by GCMs. In the present study, a regression-based statistical downscaling method is used to evaluate the impact of climate change over a river basin, namely the Godavari River basin in India, using RCP scenarios from the CanESM2 GCM.
The historical precipitation data are treated as the dependent variable, and the predictors obtained from the National Centers for Environmental Prediction (NCEP) are treated as independent variables to establish the statistical relationship.

The paper is organized as follows. Details of the study area and the data used are presented in Section 2. Model formulation and the synthesis of future events are described in Section 3. The model output is compared and analysed in Section 4. The summary and conclusions of the study are given in Section 5.

2 Study Area and Data

The Godavari basin is considered to be the second largest river basin in India and extends over parts of Maharashtra, Andhra Pradesh, Telangana, Odisha, Madhya Pradesh and Karnataka. The geographical area of the basin is 302,065.10 km2, which is 9.5% of the total geographical area of India. The basin has a tropical climate, and about 85% of its annual rainfall is received during the south-west monsoon. The annual rainfall depth varies from 600 mm to 3000 mm. The basin lies between longitudes 73°24′ and 83°4′ E and latitudes 16°19′ and 22°34′ N. The location and basin map of the study area are shown in Fig. 1.

Fig. 1 Position and basin map of Godavari basin

Data for the selected downscaling predictors, (a) mean sea level pressure (MSLP), (b) specific humidity and (c) 500 hPa geopotential height, are collected from the National Centers for Environmental Prediction/National Center for Atmospheric Research (NCEP/NCAR) official website (http://www.cdc.noa.gov/cdc/reanalysis/reanalysis.html). The climate data are extracted for the 40 (5 × 8) grid points covering the study area, between latitudes 15° and 25° N and longitudes 70° and 87.5° E.

A high-resolution (1° × 1° lat/long) gridded daily rainfall data set for the Indian region, developed by the India Meteorological Department (IMD), is used in this study (Rajeevan et al. 2008). Daily data for the monsoon season are converted to monthly data over the long-term baseline period for the 25 grid points representing the study area.

For the future scenarios RCP 2.6, 4.5 and 8.5, the climatic data are extracted from the CMIP5 output of the fourth generation global climate model (CanESM2) and used to project the future monsoon rainfall.

3 Model Formulation

The purpose of this study is to examine possible future rainfall variations at the river basin scale under different climate change scenarios as projected by a GCM. A regression-based statistical downscaling method, i.e. a combination of fuzzy clustering and multiple regression, is used to project the future monsoon precipitation. The detailed model formulation and the estimation of the predictand are presented as a flow chart in Fig. 2. The steps of the complete procedure are as follows:

  1. Step 1:

    Regrid the GCM output to the NCEP grid points. Perform Principal Component Analysis (PCA) to remove the interdependency between the predictors, so that the variability in the data is represented by a reduced number of variables. Standardize the data to remove bias.

  2. Step 2:

    Determine the optimum number of clusters and the fuzzification parameter on the basis of the Fuzziness Performance Index (FPI).

  3. Step 3:

    Take the membership functions along with the principal components as independent variables and perform multiple regression with rainfall as the dependent variable.

  4. Step 4:

    Use the equiprobability transformation to correct the bias in the simulated historical data, and apply the corresponding CDF-based correction to the simulated future data.

Fig. 2 Flow chart for estimation of future rainfall

The GCM data have an atmospheric horizontal resolution of 2.8125° × 2.8125°, whereas the NCEP data have a resolution of 2.5° × 2.5°. To avoid errors arising from this mismatch in grid resolution, the GCM data are interpolated (regridded) onto the NCEP grid points.
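The regridding step can be illustrated with a minimal sketch. The text does not state which interpolation scheme is used, so bilinear interpolation is assumed here; the function name `regrid_bilinear`, the evenly spaced source grid and the synthetic field are hypothetical stand-ins for the real CanESM2 output.

```python
import numpy as np

def regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon):
    """Bilinearly interpolate a 2-D field from a source grid to a target grid.

    field has shape (len(src_lat), len(src_lon)); both source axes must be
    increasing, as np.interp requires.
    """
    # Interpolate along longitude for every source latitude row.
    along_lon = np.array([np.interp(dst_lon, src_lon, row) for row in field])
    # Interpolate along latitude for every target longitude column.
    out = np.array([np.interp(dst_lat, src_lat, col) for col in along_lon.T])
    return out.T  # shape (len(dst_lat), len(dst_lon))

# Hypothetical source grid at 2.8125 degree spacing (assumption: evenly spaced).
src_lat = np.arange(11.25, 30.0, 2.8125)
src_lon = np.arange(67.5, 92.0, 2.8125)
# Target grid: the 5 x 8 points at 2.5 degree spacing described in Section 2.
dst_lat = np.arange(15.0, 25.1, 2.5)
dst_lon = np.arange(70.0, 87.6, 2.5)

# Synthetic field that is linear in latitude and longitude.
field = 2.0 * src_lat[:, None] + 0.5 * src_lon[None, :]
regridded = regrid_bilinear(field, src_lat, src_lon, dst_lat, dst_lon)
```

Because the synthetic field is linear in both coordinates, the interpolated values match the analytic field on the target grid, which makes the sketch easy to check. Target points outside the source grid would be clamped to the edge values by `np.interp`, so the source grid should enclose the target grid.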

Standardization is the initial stage of bias correction (Wilby et al. 2004). It is carried out by subtracting the long-term mean (location parameter) from the predictor variable and dividing by its standard deviation (scale parameter), which reduces the systematic bias between the GCM and NCEP data sets before downscaling (Ghosh and Mujumdar 2008). The long-term baseline period is taken from 1971 to 2007.
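A minimal sketch of the standardization described above is given below. It assumes that each data set is standardized with its own baseline mean and standard deviation and that the sample standard deviation is used; the array names and the synthetic values are hypothetical.

```python
import numpy as np

def standardize(series, baseline):
    """Subtract the baseline mean and divide by the baseline standard deviation."""
    mean = baseline.mean(axis=0)
    std = baseline.std(axis=0, ddof=1)  # assumption: sample standard deviation
    return (series - mean) / std

rng = np.random.default_rng(0)
# Synthetic stand-ins: rows are monsoon months, columns are predictor attributes.
ncep = rng.normal(loc=1010.0, scale=4.0, size=(148, 3))
gcm = rng.normal(loc=1013.0, scale=6.0, size=(148, 3))  # different mean and spread

ncep_std = standardize(ncep, ncep)
gcm_std = standardize(gcm, gcm)
```

After this step both series have zero mean and unit standard deviation over the baseline, so the systematic offset and scale difference between them no longer enter the downscaling relationship.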

The predictors for downscaling are extracted at each NCEP grid point. The total number of data attributes is therefore 120, i.e. 3 predictors at each of 40 grid points. Because the predictors are also correlated among themselves, such high-dimensional correlated data are difficult to handle directly. PCA is a tool used to remove the interdependency among the variables and reduce the dimensionality (Hannachi et al. 2007). PCA uses an orthogonal transformation to convert a set of correlated observations into a set of linearly uncorrelated variables called principal components (Huth 1999). In this study, the first 6 principal components explain 96% of the information (variability) of the original predictors. The eigenvectors obtained, which act as principal directions and preserve the variability of the observed series, are multiplied with the GCM data to obtain the principal components of the GCM data.
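The PCA step can be sketched as follows, under the assumption that the principal directions are obtained from a singular value decomposition of the centred NCEP predictors and then applied unchanged to the GCM predictors. The helper names `fit_pca` and `project` and the synthetic data are hypothetical.

```python
import numpy as np

def fit_pca(X, n_components):
    """Return the column means, leading principal directions and explained variance ratios."""
    mean = X.mean(axis=0)
    # Rows of vt are the eigenvectors of the covariance matrix of X.
    _, s, vt = np.linalg.svd(X - mean, full_matrices=False)
    explained = s**2 / np.sum(s**2)
    return mean, vt[:n_components].T, explained[:n_components]

def project(X, mean, directions):
    """Project centred data onto the principal directions."""
    return (X - mean) @ directions

rng = np.random.default_rng(1)
# Synthetic correlated predictors: 148 months x 120 attributes driven by 6 latent factors.
latent = rng.normal(size=(148, 6))
mixing = rng.normal(size=(6, 120))
ncep = latent @ mixing + 0.1 * rng.normal(size=(148, 120))
gcm = latent @ mixing + 0.1 * rng.normal(size=(148, 120))

mean, directions, explained = fit_pca(ncep, n_components=6)
pc_ncep = project(ncep, mean, directions)
pc_gcm = project(gcm, mean, directions)  # same directions applied to the GCM data
```

Reusing the NCEP-derived directions for the GCM data keeps both sets of principal components in the same coordinate system, which is what allows a regression fitted on one to be applied to the other.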

3.1 Fuzzy Clustering and Multiple Linear Regression

A fuzzy clustering based downscaling technique developed by Ghosh and Mujumdar (2006) is used in the present study to downscale the monsoon precipitation. Clustering is the grouping of a data set into a number of classes. In fuzzy clustering, the membership function is calculated from the Euclidean distance between the cluster centre and the data points, and membership values range between 0 and 1 (Raju and Nagesh 2007). Data points closer to the centre of a cluster receive a membership value close to 1 in that cluster, and vice versa. In the present study, the fuzzy clustering analysis is carried out on the data set containing the six principal components obtained from the atmospheric predictors (i.e. PC1t, PC2t, PC3t, PC4t, PC5t, PC6t) for different time periods. The fuzzification parameter (m) and the number of clusters (c) are the two important parameters of the fuzzy clustering algorithm. The values of both are determined on the basis of the Fuzziness Performance Index (FPI), which is calculated using Eqs. 1 and 2.

$$ F=\frac{1}{N}\sum_{i=1}^{c}\sum_{t=1}^{N}{\mu}_{it}^{2} $$
(1)
$$ FPI=1-\frac{cF-1}{c-1} $$
(2)

where μit is the membership of the principal components at time t in cluster i, c is the number of clusters, N is the number of time steps and t is the time in months (i.e. June, July, August and September). The fuzzification parameter is varied from 1.2 to 3.0 and the number of clusters between 2 and 5. An FPI value of about 0.25 is recommended for selecting the parameters m and c (Guler and Thyne 2004; Ghosh and Mujumdar 2008). The results for the selection of the optimum number of clusters and the corresponding fuzzification parameter are shown in Table 1.

Table 1 Fuzziness performance index result
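The parameter search based on Eqs. 1 and 2 can be sketched as below. The clustering routine is a textbook fuzzy c-means iteration written for illustration, not necessarily the implementation used in the study; the step of 0.2 in m, the rule of picking the pair whose FPI is closest to 0.25, the function names and the synthetic principal components are all assumptions.

```python
import numpy as np

def fuzzy_c_means(X, c, m, n_iter=100, seed=0):
    """Textbook fuzzy c-means; returns memberships of shape (c, N) and cluster centres."""
    rng = np.random.default_rng(seed)
    u = rng.random((c, X.shape[0]))
    u /= u.sum(axis=0)  # memberships of each data point sum to 1
    for _ in range(n_iter):
        w = u ** m
        centres = (w @ X) / w.sum(axis=1, keepdims=True)
        # Euclidean distance between every centre and every data point.
        d = np.linalg.norm(X[None, :, :] - centres[:, None, :], axis=2)
        d = np.fmax(d, 1e-12)  # guard against division by zero
        inv = d ** (-2.0 / (m - 1.0))
        u = inv / inv.sum(axis=0)
    return u, centres

def fuzziness_performance_index(u):
    """FPI from Eqs. 1 and 2 for a membership matrix of shape (c, N)."""
    c, n = u.shape
    F = np.sum(u ** 2) / n
    return 1.0 - (c * F - 1.0) / (c - 1.0)

rng = np.random.default_rng(2)
pcs = rng.normal(size=(124, 6))  # synthetic stand-in for six principal components

results = {}
for c in range(2, 6):                        # number of clusters: 2 to 5
    for m in np.arange(1.2, 3.01, 0.2):      # fuzzification parameter: 1.2 to 3.0
        u, _ = fuzzy_c_means(pcs, c, m, n_iter=50)
        results[(c, round(float(m), 1))] = fuzziness_performance_index(u)

# Assumed selection rule: the (c, m) pair whose FPI is closest to 0.25.
best = min(results, key=lambda key: abs(results[key] - 0.25))
```

FPI is 0 for a crisp partition and 1 when every point belongs equally to all clusters, so a target near 0.25 favours partitions that are mostly, but not completely, separated.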

The membership values of the principal components and the principal components themselves are taken as independent variables to fit a statistical relationship with the monsoon rainfall (dependent variable) by multiple linear regression.

$$ {Rain}_t={C}_1\times {\mu}_{1t}+{C}_2\times {\mu}_{2t}+\dots +{C}_n\times {\mu}_{nt}+{B}_1\times {PC}_{1t}+\dots +{B}_6\times {PC}_{6t} $$
(3)

where C1 to Cn are the coefficients of the membership functions of the n clusters and B1 to B6 are the coefficients of the principal components.
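A minimal sketch of fitting Eq. 3 by ordinary least squares is shown below. It assumes no separate intercept term, as in Eq. 3; since fuzzy memberships sum to 1 at each time step, the membership coefficients already absorb a constant. The function names, the time-major array layout and the synthetic data are hypothetical.

```python
import numpy as np

def fit_rainfall_regression(memberships, pcs, rain):
    """Least-squares fit of Eq. 3; returns the coefficient vectors (C, B)."""
    design = np.hstack([memberships, pcs])  # columns: mu_1..mu_n, PC_1..PC_6
    coef, *_ = np.linalg.lstsq(design, rain, rcond=None)
    n = memberships.shape[1]
    return coef[:n], coef[n:]

def r_squared(observed, fitted):
    """Coefficient of determination."""
    ss_res = np.sum((observed - fitted) ** 2)
    ss_tot = np.sum((observed - observed.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(3)
T, n = 124, 3
pcs = rng.normal(size=(T, 6))
raw = rng.random((T, n))
memberships = raw / raw.sum(axis=1, keepdims=True)  # rows sum to 1, like fuzzy memberships

# Synthetic rainfall generated from known coefficients plus noise.
true_C = np.array([150.0, 250.0, 400.0])
true_B = np.array([20.0, -10.0, 5.0, 0.0, 3.0, -2.0])
rain = memberships @ true_C + pcs @ true_B + rng.normal(scale=5.0, size=T)

C, B = fit_rainfall_regression(memberships, pcs, rain)
fitted = memberships @ C + pcs @ B
r2 = r_squared(rain, fitted)
```

In the study, one such regression would be fitted per rainfall grid point, with the same predictors and a different rainfall series each time.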

4 Results and Discussions

The membership functions and principal components are used to fit the monsoon rainfall at each IMD grid point falling over the study area. The coefficient of determination (R2) is calculated for every grid point, as shown in Table 2. The results indicate that the regression model fits well at all the grid points.

Table 2 Goodness of fit for different grid points

4.1 Bias Correction after Downscaling

Ghosh and Mujumdar (2008) stated that regression-based downscaling generally cannot capture the entire variance of the predictand, which results in bias near extreme events. This residual bias must be corrected; otherwise it will propagate into the computations for subsequent years. One method for bias correction after downscaling is the equiprobability transformation. To remove such bias from the downscaled output for all the scenarios, the following methodology (Ghosh and Mujumdar 2007, 2008) is used.

  1. Initially, the probability density function (PDF) of the observed data is computed for all the grid points. Based on the PDF, the probability plotting position is chosen (e.g. Gringorten for the extreme value distribution, Weibull for the Gumbel distribution) to obtain the cumulative distribution function (CDF).

  2. The CDFs are then calculated for the downscaled GCM data and the historical data for the years 1977–2007.

  3. For a given GCM-simulated precipitation value, the corresponding CDF value (CDFGCM) is computed.

  4. The observed value corresponding to CDFGCM is estimated from the observed CDF.

  5. The GCM-generated rainfall is then replaced by the estimated rainfall with the same CDF value.

  6. The correction factor is calculated for the reference period (1977–2007) and applied to the GCM-generated future precipitation.

The main assumption in this method of bias correction is that the correction factor will remain the same when the model is used to predict the future scenarios.
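The steps above can be sketched as an empirical quantile mapping. This is an illustration under stated assumptions: empirical CDFs with the Weibull plotting position i / (n + 1) are used for every series, linear interpolation is used between sample points, and the reference-period mapping is applied unchanged to future values. The function names and the synthetic rainfall series are hypothetical.

```python
import numpy as np

def empirical_cdf(sample):
    """Sorted values and Weibull plotting positions i / (n + 1)."""
    values = np.sort(sample)
    probs = np.arange(1, values.size + 1) / (values.size + 1)
    return values, probs

def equiprobability_transform(x, gcm_reference, observed):
    """Map x to the observed value with the same non-exceedance probability."""
    gcm_values, gcm_probs = empirical_cdf(gcm_reference)
    obs_values, obs_probs = empirical_cdf(observed)
    p = np.interp(x, gcm_values, gcm_probs)     # CDF_GCM of each value
    return np.interp(p, obs_probs, obs_values)  # inverse observed CDF at p

rng = np.random.default_rng(4)
# Synthetic monthly monsoon rainfall (mm) for the reference period.
observed = rng.gamma(shape=4.0, scale=60.0, size=124)
gcm_reference = 0.7 * rng.gamma(shape=4.0, scale=60.0, size=124) + 20.0
gcm_future = 0.7 * rng.gamma(shape=4.0, scale=66.0, size=124) + 20.0

corrected_reference = equiprobability_transform(gcm_reference, gcm_reference, observed)
corrected_future = equiprobability_transform(gcm_future, gcm_reference, observed)
```

One limitation of this simple form is that `np.interp` clamps at the ends of the sample, so future values beyond the reference-period range are mapped to the observed minimum or maximum. A fitted parametric distribution, as implied by step 1, would extrapolate instead.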

Figure 3 shows the spatial pattern of average annual monsoon rainfall for the historical period and for the future projections under the RCP 2.6 scenario. The projected spatial pattern is quite similar to the historical one: the eastern part of the basin receives more rainfall during the monsoon, and average monsoon rainfall decreases towards the west of the basin. The projections under RCP 2.6 indicate a significant increase in monsoon rainfall in the lower reaches compared with the middle and upper reaches of the river, whereas the rainfall depth decreases over the upper reach.

Fig. 3 RCP 2.6 scenario generated annual average monsoon rainfall

Future projections under the RCP 4.5 scenario, shown in Fig. 4, indicate that monsoon precipitation increases. The precipitation pattern shifts from the lower reaches towards the middle reaches. The average monsoon rainfall pattern for 2070–2100 shows that almost all sub-basins receive more rainfall, except those in the upper reaches.

Fig. 4 RCP 4.5 scenario generated annual average monsoon rainfall

Figure 5 shows the average projected monsoon rainfall for the periods 2008–2038, 2039–2069 and 2070–2100 simulated under the RCP 8.5 scenario. The spatially plotted long-term average total monsoon rainfall indicates that the depth of monsoon rainfall will increase in the future over the coastal region, whereas the upper reaches will experience lower rainfall depth. Moreover, there is no significant change in the precipitation pattern compared with the historical observations.

Fig. 5 RCP 8.5 scenario generated annual average monsoon rainfall

All the sub-basins in the study area are divided into two groups. Group 1 includes Indravati, Weinganga, Wardha and Pranhita; group 2 includes Godavari lower, Godavari middle, Manjra and Godavari upper. The average annual monsoon rainfall of both groups is plotted as box-plots for different periods. Figure 6 shows the box-plot for group 1. Both the lower and the extreme values of average annual monsoon rainfall increase under the RCP 2.6 and 4.5 scenarios for all the sub-basins. High extreme values are observed under RCP 4.5, and the trend is increasing for both scenarios. RCP 8.5 shows a mixed trend, with rainfall decreasing in the first thirty years of the projection and then increasing gradually over the next sixty years.

Fig. 6 Box-plot of annual average monsoon rainfall for group-1 sub-basins

The group-2 sub-basins are plotted in Fig. 7. Godavari lower, Godavari middle and Manjra show an increasing trend under the RCP 2.6 and 4.5 scenarios. Godavari upper shows a decreasing trend under all the scenarios, and its highest extreme is observed under the RCP 8.5 scenario. The Weinganga, Wardha and Indravati sub-basins fall in the zone of major rainfall, and the interannual variation of the anomaly is therefore analysed for these sub-basins using the downscaled future monsoon rainfall.

Fig. 7 Box-plot of annual average monsoon rainfall for group-2 sub-basins

The interannual variation of the anomaly (defined as the actual value in any year minus the mean value) of these sub-basins is computed as a percentage of the mean value during 1977–2007. Years in which the rainfall is above the mean by more than one standard deviation are considered excess years, while years in which the rainfall is below the mean by more than one standard deviation are considered deficit years.
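The classification rule can be sketched as follows, with the one-standard-deviation threshold expressed as a percentage of the reference mean. The sketch assumes the sample standard deviation of the reference period; the function name and the synthetic rainfall totals are hypothetical.

```python
import numpy as np

def classify_years(rain, reference):
    """Percentage anomaly relative to the reference mean, with excess/deficit flags."""
    mean = reference.mean()
    # One standard deviation of the reference period, as a percentage of the mean.
    threshold = 100.0 * reference.std(ddof=1) / mean
    anomaly = 100.0 * (rain - mean) / mean
    excess = anomaly > threshold
    deficit = anomaly < -threshold
    return anomaly, threshold, excess, deficit

rng = np.random.default_rng(5)
reference = rng.normal(loc=1000.0, scale=200.0, size=31)  # synthetic 1977-2007 totals (mm)
future = rng.normal(loc=1100.0, scale=220.0, size=81)     # synthetic 2020-2100 totals (mm)

anomaly, threshold, excess, deficit = classify_years(future, reference)
print(f"threshold = +/-{threshold:.1f}%, excess years = {excess.sum()}, "
      f"deficit years = {deficit.sum()}")
```

Counting the flagged years over a chosen window gives frequencies of the form "k in n years" used in the discussion of each sub-basin.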

The interannual variation of the anomaly for the Weinganga sub-basin is shown in Fig. 8. For Weinganga, a variation above 20.5% is considered excess, while a variation below −20.5% is considered deficit. The historical anomaly is plotted from 1977 to 2007 and shows a deficit frequency of 3 in 18 years (1987–2004). The future downscaled anomaly for the different scenarios is plotted from 2020 to 2100. For RCP 2.6, the frequency of deficit is very low, i.e. 4 in 81 years, but the frequency of excess rainfall is quite high relative to the historical period, and the excess events occur in clusters. For RCP 4.5, there is no deficit period, the frequency of excess is very high, and the magnitude is also very large relative to the mean. For RCP 8.5, the frequency of excess is highest between 2090 and 2100, i.e. 9 in 11 years.

Fig. 8 Interannual variation of anomaly (as % of the mean) of Weinganga for different scenarios

The interannual variation for the Wardha sub-basin (Fig. 9) shows that the frequency of deficit under RCP 2.6 is much lower than in the historical observations, while both the frequency and the magnitude of excess are quite high. Under RCP 4.5, the frequency of deficit is zero and the frequency of excess precipitation is very high. Under RCP 8.5, the frequency of deficit is 7 in 20 years, which is higher than in the historical observations, and excess periods are fewer than under RCP 2.6 and RCP 4.5.

Fig. 9 Interannual variation of anomaly (as % of the mean) of Wardha for different scenarios

The historical interannual variability of the anomaly for Indravati (Fig. 10) shows that the frequency of excess is higher than in the other two sub-basins. Excess rainfall is also more frequent under RCP 2.6 and RCP 4.5 than in the other two sub-basins. No deficit period is observed under RCP 4.5, which is similar to the other sub-basins. Under RCP 8.5, more frequent excess rainfall anomalies are observed in the last decade, i.e. 2090–2100.

Fig. 10 Interannual variation of anomaly (as % of the mean) of Indravati for different scenarios

5 Conclusions

A multiple linear regression model built with principal components and fuzzy clusters is used to downscale the monsoon rainfall over the Godavari River basin and to evaluate the impact of climate change using GCM-simulated atmospheric variables. The spatial distribution of total monsoon rainfall is plotted over the study area from 2008 to 2100 using the CanESM2 global climate model, and the critical findings are as follows.

The statistical relationship between the GCM output and monsoon rainfall is modelled by fuzzy clustering based multiple linear regression. The variations in the spatial distribution indicate that the zone of high monsoon rainfall in the historical observations (lower and middle reaches) will receive more rainfall, and the zone of low rainfall (upper reaches) will receive less precipitation. Furthermore, there is a significant change in the spatial pattern under RCP 2.6 and 4.5, where the middle reaches receive more monsoon rainfall than in the past. The highest amount of precipitation is observed under the RCP 4.5 scenario. The interannual variability of the anomaly for the sub-basins in the monsoon zone shows that the frequency of deficit rainfall is very low in all the scenarios. The RCP 4.5 scenario has no deficit period, and the magnitude of the anomaly with respect to the mean is also very high. A sudden increase in precipitation is observed at different temporal scales under RCP 2.6 and 4.5. Under RCP 8.5 the rainfall increases gradually, i.e. the highest rainfall is observed during 2070–2100. The predictors selected for downscaling show a positive feedback for the radiative forcings of 2.6 W/m2 and 4.5 W/m2. The forcing of 8.5 W/m2 shows a negative feedback for the period 2008–2038 and a positive feedback for the next 60 years.