1 Introduction

Summer monsoon rainfall is essential for agricultural production, water supply, and livelihoods in India. However, accurately quantifying summer monsoon rainfall for future periods at regional and local scales is challenging owing to its erratic behavior and skewed statistics (Meher et al. 2017). Varying orography and land-sea contrast produce an uneven distribution of rainfall over the Indian landmass, which makes rainfall difficult to predict. Global climate models (GCMs) are used to predict rainfall but are unable to provide information at regional or smaller scales (Solomon et al. 2007). To improve prediction, in the 1990s the World Climate Research Programme (WCRP) coordinated the Coupled Model Intercomparison Project (CMIP), which carried out a control experiment and a variety of sensitivity experiments (Meehl et al. 2000); further phases of CMIP, termed CMIP2, CMIP2+, CMIP3, CMIP5, and recently CMIP6, have since been performed. Numerous studies have validated and projected rainfall in CMIP3 and CMIP5 model experiments over the land points of India (Sarthi et al. 2015, 2016); however, projecting summer monsoon rainfall under a changing climate using GCMs remains challenging (Pattnaik and Kumar 2010; Turner and Annamalai 2012). In addition, GCM predictions of monsoon rainfall carry uncertainties due to the vastness of GCMs, their coarse resolution, and the inadequate inclusion of local or regional factors (Christensen et al. 2008; Saini et al. 2015).

Rainfall predictions from coarse-resolution GCMs are not sufficient to provide regional-scale information for agricultural planning, water resources management, the power industry, and environmental policymaking (Maraun et al. 2010). In general, the globally available GCM outputs at coarse resolution (varying from 1.0 to 2.5°) (Pepler et al. 2016) must be scaled down to local scales, which is done by either dynamical or statistical downscaling. A variety of dynamical and statistical downscaling techniques have been developed over the last decades to provide information at a regional scale (Wilby and Wigley 1998; Mearns et al. 1999; Maraun et al. 2010; Sunyer et al. 2012; Ekström et al. 2015). The statistical technique uses an empirical relation between large-scale climate predictors from GCMs and the local-scale predictands of interest taken from observations (station data) (Huang et al. 2011; Wilby et al. 1998). The dynamical downscaling technique employs regional climate models (RCMs) driven by the output of GCMs (Fowler et al. 2007; Giorgi 1990). In the end, the widely used statistical downscaling is considered more applicable than dynamical downscaling (Sun and Chen 2012) because of its easy implementation, low computational effort (Fowler et al. 2007), and ability to provide point-scale outputs (Wilby et al. 2002). However, there is no single best downscaling approach, since the choice depends on the desired spatial and temporal resolution of the outputs and the climate characteristics of the region of interest (Trzaska and Schnarr 2014). Statistical downscaling methods that have been developed include the WEather GEnerator (WGEN), the Long Ashton Research Station-Weather Generator (LARS-WG), and the Statistical Downscaling Model (SDSM) (Hashmi et al. 2011; Mahmood and Babel 2013).

To meet the need for downscaled climate data, NASA applied statistical downscaling techniques to the CMIP5 GCMs to generate a high-resolution dataset for long-term projections, called “NASA Earth Exchange Global Daily Downscaled Projections” (NEX-GDDP), which was released in June 2015 (Thrasher et al. 2013). Raghavan et al. (2018) examined the NEX-GDDP dataset over Southeast Asia for the historical (1976–2005) and future (2020–2050, 2070–2099) periods (under RCP 4.5 and 8.5), for daily rainfall and surface temperature at a spatial resolution of 25 km. Over China, the NEX-GDDP data have been evaluated for their performance in simulating rainfall extremes and climate changes (Chen et al. 2017). The historical dataset shows good agreement with observations on monthly scales but fails to capture daily statistics. Sahany et al. (2019) validated NEX-GDDP and the NCAR-CCSM4 model under CMIP5 experiments and suggested a greater underestimation of rainfall extremes by CCSM4-CMIP5 than by CCSM4-NEX-GDDP. Both CCSM4-CMIP5 and CCSM4-NEX-GDDP project an increase in annual rainfall over India under RCP8.5. It is worth noting that the extreme daily rainfall values projected by CCSM4-NEX-GDDP are two to three times larger than those projected by CCSM4-CMIP5.

As mentioned earlier, simulating June-July-August-September (JJAS) rainfall over the homogeneous monsoon regions of India (HMRI), in both past and future periods, remains challenging because of the different physics and parameterization schemes applied in the models (Christensen and Christen, 2007). To fill this gap, the newly available NEX-GDDP rainfall data (https://nex.nasa.gov/nex/projects/1356), provided by NASA for multiple climate models, are evaluated for JJAS rainfall over India. The current study may be a novel assessment of how well NEX-GDDP captures the characteristics of observed rainfall over individual HMRI.

In this paper, the first section reviews the existing literature on the pros and cons of the spatial resolution of GCMs and dynamically downscaled RCMs in the simulation and statistical downscaling of Indian summer monsoon rainfall (ISMR) and, finally, states the major objective of the current research. Section 2 describes the study area, data, and methods, followed by Sect. 3, which presents the results and discussion. The conclusions are given in Sect. 4.

2 Study area, data, and methods

2.1 Study area

In this study, the five homogeneous monsoon regions of India are considered (Parthasarathy et al. 1993). The five homogeneous regions are (i) North West India (NWI), (ii) West Central India (WCI), (iii) Central Northeast India (CNI), (iv) North East India (NEI), and (v) Peninsular India (PI), as shown in Fig. 1 (Source: IITM, Pune, India). Monsoon rainfall variability differs among the homogeneous monsoon regions (Parthasarathy 1984; Walker 1925; Shukla 1987; Gregory 1989). The Himalayan Region (HR) of India is not included in the present study because it has fewer observations and is distantly located (Rajeevan et al. 2006). Well-validated NASA NEX-GDDP model data at finer resolution may be helpful for impact assessment in sectors such as hydrology, agriculture, and economics in the near- and far-future periods.

Fig. 1 General overview of terrain in the homogeneous monsoon regions of India (HMRI). (DEM data source: http://clima-dods.ictp.it/data/Data/RegCM_Data/SURFACE/)

2.2 Data

The high-resolution daily rainfall data of the NASA Earth Exchange Global Daily Downscaled Projections (NEX-GDDP), at a spatial resolution of 0.25° (~ 25 km × 25 km), are derived from the output of twenty-one (21) CMIP5 GCMs and are available for the period of 1950–2100 (1950–2005 as a hindcast/retrospective run and 2006–2099 as a prospective run). Since these data provide climate change information for past and future periods at the finest possible scales (Thrasher et al. 2012a, b), the dataset may be used for climate change assessment studies at the city or basin level. The methodology applied in generating these data is explained by Maurer and Hidalgo (2008), Thrasher et al. (2012a, b), Thrasher et al. (2013), and Wood et al. (2004). The bias correction spatial disaggregation (BCSD) method is used to produce the NEX-GDDP datasets. BCSD is a statistical downscaling algorithm that addresses the limitations of GCM outputs (coarse resolution and bias at regional/local scales) (Wood et al. 2002, 2004; Thrasher et al. 2012a, b). For this purpose, five NEX-GDDP models, namely, ACCESS, CanESM, CCSM4, MPI-ESM-LR, and NorESM, are considered and listed in Table 1. These five models (under the CMIP5 experiment) have shown better performance in the simulation of JJAS rainfall (Sarthi et al. 2015, 2016; McSweeney et al. 2015; Sonali et al. 2017). The NEX-GDDP data of these selected GCMs are taken from the NASA data portal (ftp://ftp.nccs.nasa.gov/ NEX-GDDP). Observational data (either station or gridded data) play an important role as the reference for model evaluation; therefore, the gridded dataset Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE), at a spatial resolution of 0.25° (~ 25 km × 25 km), is considered (Yatagai et al. 2012) for the period of 1961–2005.
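The text names BCSD but does not spell out its steps. As a loose illustration only, the bias-correction stage of BCSD-type methods is commonly described as quantile mapping: a model value is located on the model's historical distribution and replaced by the observed value at the same quantile. The sketch below assumes that description; the function names and toy samples are hypothetical, the spatial disaggregation stage is omitted, and this is not the NEX-GDDP production code.

```python
import bisect


def empirical_cdf_value(sorted_sample, x):
    """Fraction of sample values that are <= x (sample must be sorted)."""
    return bisect.bisect_right(sorted_sample, x) / len(sorted_sample)


def quantile(sorted_sample, p):
    """Linearly interpolated quantile of a sorted sample for p in [0, 1]."""
    pos = p * (len(sorted_sample) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_sample) - 1)
    frac = pos - lo
    return sorted_sample[lo] * (1 - frac) + sorted_sample[hi] * frac


def quantile_map(value, model_hist, obs_hist):
    """Map one model value onto the observed distribution (assumed scheme)."""
    model_sorted = sorted(model_hist)
    obs_sorted = sorted(obs_hist)
    p = empirical_cdf_value(model_sorted, value)  # quantile in model climate
    return quantile(obs_sorted, p)                # same quantile in observations


# Toy samples (made-up numbers, mm/day): the model is too dry by a factor of 2.
model_hist = [1.0, 2.0, 3.0, 4.0]
obs_hist = [2.0, 4.0, 6.0, 8.0]
corrected = [quantile_map(v, model_hist, obs_hist) for v in model_hist]
```

In practice the mapping is fitted on a common historical period and then applied to both retrospective and prospective runs; how the tails and dry days are handled is a design choice that this sketch does not address.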

Table 1 List of NASA NEX-GDDP datasets included from CMIP5 models

2.3 Methodology

Grid point correlation (GPC) between observed rainfall and the rainfall simulated by each of the five selected models is calculated at each grid point over the HMRI, as shown in Fig. 2. The annual and seasonal (JJAS) rainfall is area-averaged over the land points of individual HMRI for the period of 1961–2005. Further, the models’ ability to simulate the ISMR for the past period over individual HMRI is assessed by comparing the daily climatology, the distribution of rainfall using box plots, the probability density function (PDF), the Taylor (2001) statistics, and the percentage bias. For the spatial distribution, mean JJAS rainfall is considered for the retrospective run (1961–2005).
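The grid point correlation described above can be sketched as a Pearson correlation computed across years at every grid cell. The following is a minimal illustration under that assumption; the nested-list layout `[year][lat][lon]`, the function names, and the toy values are hypothetical and do not come from the study. Note that Pearson's coefficient can in general range from −1 to 1.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length series; NaN if one is constant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def grid_point_correlation(obs, sim):
    """Correlate obs and sim across years at each cell of a [year][lat][lon] grid."""
    n_lat = len(obs[0])
    n_lon = len(obs[0][0])
    gpc = []
    for i in range(n_lat):
        row = []
        for j in range(n_lon):
            obs_series = [year[i][j] for year in obs]
            sim_series = [year[i][j] for year in sim]
            row.append(pearson(obs_series, sim_series))
        gpc.append(row)
    return gpc


# Toy example: 4 years of JJAS totals on a 1 x 2 grid (made-up values, mm).
obs = [[[800.0, 300.0]], [[900.0, 350.0]], [[700.0, 320.0]], [[850.0, 310.0]]]
sim = [[[780.0, 330.0]], [[880.0, 300.0]], [[720.0, 340.0]], [[860.0, 320.0]]]
gpc_map = grid_point_correlation(obs, sim)
```

Both datasets are on the same 0.25° grid here, so no regridding step is shown; with real NetCDF files an array library would normally replace the nested loops.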

Fig. 2 Grid point correlation (GPC) between observation and (a) ACCESS, (b) CanESM, (c) CCSM4, (d) MPI-ESM-LR, and (e) NorESM

3 Results and discussion

3.1 Spatial distribution of ISMR over HMRI

GPC between observed and simulated rainfall at each grid point over different regions has been computed by many researchers (Guhathakurta and Rajeevan, 2008; Sagar et al., 2017; Mandal et al., 2006). The GPC of JJAS rainfall between the NEX-GDDP models and observation is shown in Fig. 2a–e. The GPC, which varies from 0 (no correlation) to 1 (strong correlation), is presented for individual HMRI. A strong positive GPC between observed and simulated JJAS rainfall of ACCESS, CanESM, CCSM4, MPI-ESM-LR, and NorESM is noticed over WCI and parts of NWI and CNI, which are the core monsoon regions of India (Sinha et al. 2007). The western parts of WCI and PI also show a strong GPC. The other regions have relatively weaker GPC. It seems that JJAS rainfall simulated by the NEX-GDDP models shows strong GPC over the core monsoon regions of India. This might be attributed to a better representation of orography in the driving models (GCMs here), which leads to a good capture of the JJAS rainfall pattern over the core monsoon region of India. The orography represented in GCMs, together with appropriate parameterizations and convection schemes, may make it possible to capture the large-scale monsoon dynamics here (Xie et al. 2006). It may be emphasized that the best model for a particular area is not necessarily the best performer over other regions (Errasti et al. 2011).

3.2 Temporal variability of ISMR over HMRI

Figure 3a–e shows the daily JJAS rainfall climatology in observation and simulation for the period of 1961–2005 over CNI, WCI, NWI, PI, and NEI. The NEX-GDDP-simulated daily JJAS rainfall appears to follow the observed pattern over each HMRI, with differences in magnitude in the range of − 3 to 3 mm/day. Over CNI (Fig. 3a), the simulated rainfall shows considerably large variation relative to observations; however, MPI-ESM-LR- and NorESM-simulated rainfall is underestimated in comparison to the observation. Here, large variation means the degree to which rainfall amounts vary through time from the mean (not including the extremes). Further, the ACCESS-simulated rainfall shows large variation compared with observation over CNI, WCI, NWI, and NEI. Similarly, Raghavan et al. (2018) suggest that over Southeast Asia, NEX-GDDP-simulated daily rainfall statistics are not close to observation. Over the UK, Rivington et al. (2008) found that RCM-simulated rainfall during 1960–1990 shows an excess of small (< 0.3 mm) precipitation events in comparison with observation, while the annual mean is overestimated at some places and underestimated at others. It seems that the effect of topography in model simulations occasionally excites or intensifies precipitation extremes. Therefore, the high-resolution NEX-GDDP dataset (Bao and Wen, 2017) may not follow the extremes in observation.

Fig. 3 Daily climatology during June 1–September 30 in NEX-GDDP-simulated and observed rainfall (APHRODITE) over CNI (a), WCI (b), NWI (c), PI (d), and NEI (e)

Time series analysis is carried out to characterize the trend in JJAS-accumulated rainfall for the period of 1961–2005 over individual HMRI, as shown in Fig. 4a–e. Over CNI (Fig. 4a), all models perform relatively well in capturing the trend of JJAS rainfall except ACCESS (overestimation). However, during the period of 1975–1995, a small bias is observed in the model. Over WCI, ACCESS and CCSM4 show significant positive and negative bias, while the other models show an excellent predictability of JJAS rainfall. CCSM4, NorESM, CanESM, and MPI-ESM-LR follow the observed trend over Peninsular India. Over NEI, all the models show significant variability in comparison to observation. Again, over NEI, CanESM and CCSM4 capture the trend better than the other models. Further, the annual cycle is analyzed to determine how well each model follows the pattern of the observed annual cycle. The annual cycle of simulated rainfall for the period of 1961–2005 is constructed for the initial evaluation of the models’ performance (Sarthi et al., 2015) and shown in Fig. 5a–e. The annual climatology is obtained by averaging the monthly data over the period of 1961–2005. The result shows that NEX-GDDP rainfall, except in a few cases (models), follows the pattern of observations. Over CNI, CanESM-simulated rainfall overestimates the observation; over PI, all models overestimate (with small bias) in comparison to observations for the rainy months of August and September (highest). During the initial monsoon month (June), the models overestimate (with small bias) over NEI. Over NWI, the models overestimate slightly during July, while they show a good resemblance with observation for the other months. All models show resemblance with the observed pattern during the monsoon months.

Fig. 4 Trend of JJAS rainfall over HMRI, in observation (APHRODITE) and datasets of NEX-GDDP

Fig. 5 Annual cycle for NEX-GDDP models and observations over CNI (a), WCI (b), NWI (c), PI (d), and NEI (e)
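The annual cycle is described as the average of monthly data over 1961–2005. A minimal sketch of that averaging, together with the JJAS seasonal total, is given below; the input layout (one list of 12 monthly values per year), the function names, and the numbers are illustrative assumptions only.

```python
def monthly_climatology(monthly_by_year):
    """Average each calendar month over all years.

    monthly_by_year: list of years, each a list of 12 monthly rainfall
    values ordered January..December (assumed layout).
    """
    if not monthly_by_year:
        raise ValueError("no data")
    if any(len(year) != 12 for year in monthly_by_year):
        raise ValueError("each year needs exactly 12 monthly values")
    n_years = len(monthly_by_year)
    return [sum(year[m] for year in monthly_by_year) / n_years
            for m in range(12)]


def jjas_total(year):
    """Accumulated June-September rainfall for one year (indices 5..8)."""
    return sum(year[5:9])


# Toy input: two made-up years of monthly rainfall (mm).
years = [
    [10, 12, 15, 20, 40, 150, 280, 260, 170, 60, 20, 8],
    [14, 10, 11, 24, 36, 170, 300, 240, 150, 80, 16, 12],
]
cycle = monthly_climatology(years)
jjas_series = [jjas_total(y) for y in years]
```

The same climatology would be computed separately for the observations and for each model, after area-averaging over a region, so that the twelve values can be plotted against each other.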

Box and whisker plots are used to evaluate the models’ performance in terms of the shape of the distribution, its central value, and its variability (Sarthi et al. 2015, 2016; Rana et al. 2012; Durai and Bhardwaj, 2014; Saha et al. 2014; Ghosh et al. 2016). Figure 6a–e shows box plots of observed and simulated JJAS rainfall for CNI, WCI, NWI, PI, and NEI. The median of simulated JJAS rainfall shows good agreement with observation over WCI (Fig. 6b), NWI (Fig. 6c), PI (Fig. 6d), and NEI (Fig. 6e), whereas the medians of JJAS rainfall simulated by MPI-ESM-LR and NorESM are not close to observation over CNI. It seems that over regions of large rainfall variability, such as CNI and WCI, the NEX-GDDP-simulated rainfall is relatively less close to observations, while over regions of small rainfall variability, such as NWI, PI, and NEI, NEX-GDDP shows good agreement with the median as well as the range (maximum and minimum values) of observed rainfall.
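The quantities a box plot displays (median, quartiles, and range) can be computed directly. The sketch below is one possible way to do so with Python's standard library; the quartile convention (`method="inclusive"`) is an assumption, since the text does not state which convention the figures use, and the sample values are invented.

```python
import statistics


def box_stats(values):
    """Five-number summary plus interquartile range for one rainfall series."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    # Quartiles with linear interpolation between data points (assumed convention).
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "min": data[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": data[-1],
        "iqr": q3 - q1,
    }


# Toy JJAS totals (made-up values, mm) for observation and one model.
obs_summary = box_stats([820.0, 910.0, 700.0, 860.0, 790.0])
sim_summary = box_stats([640.0, 700.0, 560.0, 690.0, 610.0])
median_gap = sim_summary["median"] - obs_summary["median"]
```

Comparing `median` and the `min`/`max` pair between observation and model mirrors the comparison of central value and range made in the text.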

Fig. 6 Box plots for NEX-GDDP-simulated and observed JJAS rainfall over CNI (a), WCI (b), NWI (c), PI (d), and NEI (e)

The Taylor analysis is carried out over individual HMRI, as shown in Fig. 7a–e, and models are judged on the basis of relatively low standard deviation (SD), high correlation, and low root mean square error (RMSE) of simulated JJAS rainfall in comparison to observations. In the Taylor plot, Pearson’s correlation is shown along the circular (azimuthal) axis, and a strong value is located close to the observation point on the x-axis. The SD is normalized so that the SD of observation equals one, and each model’s normalized SD is shown as its radial distance from the origin. Similarly, the centered RMSE of each model is shown as its distance from the observation point on the x-axis (Taylor et al. 2012). The distance from the observation point shows the actual performance of each model: the closer a model lies to the observation, the more accurate it is in simulating the ISMR over a particular HMRI. The models differ in their capability to simulate the ISMR compared to observation; hence, the Taylor score varies among the models. It is noticed that NEX-GDDP-CanESM performs relatively better than the other downscaled models over NEI, PI, NWI, and WCI, while NEX-GDDP-MPI-ESM-LR performs relatively better than the others over CNI. MPI-ESM-LR and NorESM do well in simulating the ISMR over CNI, followed by ACCESS, CanESM, and CCSM4. Similarly, CanESM performs very well over PI, NWI, and WCI. It seems that the relative performance of downscaled NEX-GDDP models varies over individual HMRI (Sahany et al. 2019; Raghavan et al. 2018).
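The three statistics summarized in a Taylor diagram are linked by the relation E′² = σ_m² + σ_o² − 2 σ_m σ_o R, where R is the correlation, σ_m and σ_o are the model and observed standard deviations, and E′ is the centered root mean square difference. The sketch below computes the normalized versions for one model series; it is an illustration with hypothetical names and made-up numbers, not the code used for Fig. 7.

```python
import math


def taylor_stats(obs, sim):
    """Correlation, SD ratio, and normalized centered RMS difference."""
    n = len(obs)
    if n != len(sim) or n < 2:
        raise ValueError("need two equal-length series with at least 2 values")
    mean_o = sum(obs) / n
    mean_s = sum(sim) / n
    sd_o = math.sqrt(sum((o - mean_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mean_s) ** 2 for s in sim) / n)
    if sd_o == 0 or sd_s == 0:
        raise ValueError("series must not be constant")
    cov = sum((o - mean_o) * (s - mean_s) for o, s in zip(obs, sim)) / n
    # Centered RMS difference: means are removed before differencing.
    crmsd = math.sqrt(
        sum(((s - mean_s) - (o - mean_o)) ** 2 for o, s in zip(obs, sim)) / n
    )
    return {
        "corr": cov / (sd_o * sd_s),
        "sd_ratio": sd_s / sd_o,       # observation plots at 1.0
        "crmsd_norm": crmsd / sd_o,    # distance from the observation point
    }


# Toy JJAS series (made-up values, mm).
obs = [820.0, 910.0, 700.0, 860.0, 790.0]
sim = [780.0, 880.0, 720.0, 870.0, 760.0]
stats = taylor_stats(obs, sim)
```

Because the mean is removed, a model with a large overall bias can still sit close to the observation point, which is one reason a separate bias measure is reported alongside the diagram.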

Fig. 7 Taylor’s diagram for JJAS rainfall in NEX-GDDP-simulated and observed rainfall (APHRODITE) over CNI (a), WCI (b), NWI (c), PI (d), and NEI (e)

The percentage bias (PBIAS) is another way to assess NEX-GDDP model performance. Table 2 shows the statistical scores of the considered NEX-GDDP models (compared to observations) over each of the HMRI. The result shows positive PBIAS over PI; the highest PBIAS is given by CCSM4 and the lowest by ACCESS. The CNI region has large negative PBIAS in the simulations of NorESM (− 50.5) and MPI-ESM-LR (− 47.2), while a small positive PBIAS is found in the ACCESS (5.4), CanESM (10.9), and CCSM4 (11.3) simulations. The PBIAS is highest over NWI in MPI-ESM-LR, while over WCI, a negative (but small) PBIAS is found in the ACCESS model. Interestingly, all models in the high-resolution NEX-GDDP dataset (Bao and Wen, 2017) can simulate the JJAS rainfall with good confidence, except over PI.
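The text does not state the PBIAS formula. A common definition, assumed here, is PBIAS = 100 × Σ(sim − obs) / Σ obs, under which negative values indicate underestimation, consistent with how the sign is used in this section. Some hydrological studies use the opposite sign convention, so the choice below should be treated as an assumption.

```python
def pbias(obs, sim):
    """Percentage bias of sim relative to obs (positive = overestimation).

    Assumed definition: 100 * sum(sim - obs) / sum(obs).
    """
    if len(obs) != len(sim) or not obs:
        raise ValueError("need two non-empty series of equal length")
    total_obs = sum(obs)
    if total_obs == 0:
        raise ValueError("observed total is zero; PBIAS is undefined")
    return 100.0 * (sum(sim) - total_obs) / total_obs


# Toy JJAS totals (made-up values, mm): the model is dry by half.
obs = [800.0, 900.0, 700.0]
sim = [400.0, 450.0, 350.0]
bias_percent = pbias(obs, sim)
```

A value near − 50, as in this toy case, would correspond to simulated seasonal totals that are roughly half of the observed totals.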

Table 2 Percentage bias (PBIAS) prediction by model over individual HMRI

Examining climate statistics other than climate means is not new; earlier researchers have used probability distribution functions (PDFs) to analyze the frequency and severity of climate extremes. The PDF is used to understand the models’ ability to simulate daily rainfall during the monsoon season, while monthly rainfall analysis is carried out to see how the models simulate monthly rainfall. Researchers have already reported that many climate models fail to simulate rainfall on a daily/monthly basis although they simulate seasonal rainfall reasonably well. To investigate possible shifts in daily rainfall probability (Bokhari et al. 2018), the PDF of daily rainfall (during 1961–2005) in observation and model simulation over each HMRI is shown in Fig. 8a–e. Over CNI, the frequency of daily rainfall simulated by NEX-GDDP-CanESM and NEX-GDDP-CCSM4 shows good agreement with observation in the range of 1–9 mm day−1. NEX-GDDP-ACCESS-simulated rainfall shows good agreement with observation in the range of 1–4 mm day−1. The other models, NEX-GDDP-MPI-ESM-LR and NEX-GDDP-NorESM, underestimate the observed daily rainfall frequency. CanESM, CCSM4, and ACCESS show good agreement for daily rainfall between 11 and 15 mm day−1. Over NEI, each model shows good agreement with the observed frequency of rainfall in the range of 1–8 mm day−1. The overestimation and underestimation in frequency are very small, which suggests that all models perform well in reproducing the rainfall probability distribution. Over NWI, all five models have similar rainfall probability for daily rainfall of 1–3 mm day−1. NEX-GDDP-CCSM4 follows the frequency of daily climatological rainfall over the entire range of observed rainfall, whereas the remaining four models overestimate the frequency. Compared to observation, NorESM and CCSM4 show a very slight underestimation of the rainfall frequency in the range of 4–5 mm day−1, but a strong underestimation in the range of 3–7 mm day−1. Over WCI, all models except ACCESS and NorESM perform quite well over the entire range of rainfall, whereas ACCESS and NorESM underestimate the frequency over the entire range of daily rainfall. It is further observed that over PI, the frequency of daily rainfall is higher in all models than in other regions of HMRI and that the models underestimate the rainfall frequency; however, in the range of 2–4 mm day−1, all models perform well in simulating the daily frequency.
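The frequency comparison above can be reproduced in outline with a histogram-based density estimate of daily rainfall. The sketch below is one simple way to do this; the bin width, the upper limit, the function names, and the sample values are assumptions chosen for illustration, and the paper does not state how the curves in Fig. 8 were estimated.

```python
import math


def empirical_pdf(values, bin_width=1.0, upper=20.0):
    """Histogram density: share of days per bin divided by the bin width.

    Values outside [0, upper) are left out of the bins but still count in
    the total, so the densities sum to at most 1 / bin_width.
    """
    if not values:
        raise ValueError("no data")
    if bin_width <= 0 or upper <= 0:
        raise ValueError("bin_width and upper must be positive")
    n_bins = math.ceil(upper / bin_width)
    counts = [0] * n_bins
    for v in values:
        if 0.0 <= v < upper:
            idx = min(int(v // bin_width), n_bins - 1)
            counts[idx] += 1
    total = len(values)
    return [c / (total * bin_width) for c in counts]


def density_difference(obs, sim, bin_width=1.0, upper=20.0):
    """Per-bin model minus observed density (positive = model too frequent)."""
    p_obs = empirical_pdf(obs, bin_width, upper)
    p_sim = empirical_pdf(sim, bin_width, upper)
    return [s - o for o, s in zip(p_obs, p_sim)]


# Toy daily rainfall samples (made-up values, mm/day).
obs_daily = [0.5, 1.5, 1.7, 2.2, 6.0, 9.5]
sim_daily = [0.8, 1.1, 1.3, 1.9, 2.5, 4.0]
diff = density_difference(obs_daily, sim_daily, bin_width=1.0, upper=10.0)
```

Positive entries in `diff` mark rainfall ranges where the model produces days more often than observed, which corresponds to the overestimation in frequency discussed in the text.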

Fig. 8 Probability distribution function for NEX-GDDP models and observations over CNI (a), WCI (b), NWI (c), PI (d), and NEI (e)

It is generally accepted that a model that performs better in the current climate provides a more reliable future projection (Errasti et al. 2011; Zamani and Berndtsson, 2018). A model that shows good skill against observation in simulating rainfall in the historical experiment may (probabilistically) be a good predictor for the future period (Reichler and Kim 2008). Hence, the models selected on the basis of this evaluation may be relatively better at predicting rainfall in the future period.

4 Conclusions

The coarse resolution of GCMs in CMIP5 does not provide much scope for climate change assessment at the regional scale, where orography, terrain, and climatic conditions vary. To fill this gap, high-resolution NEX-GDDP data may provide information at the regional level. In this study, the NEX-GDDP models are evaluated for their performance in capturing the observed ISMR over individual homogeneous monsoon regions of India. For this purpose, the observational rainfall data of APHRODITE and simulated rainfall in five models of NEX-GDDP are considered. The individual models are assessed over individual HMRI and validated against the observation by applying GPC, daily climatology, annual cycle, box plot distribution, the Taylor statistics, and PDF. In capturing the spatial pattern of ISMR over individual HMRI, the models in NEX-GDDP show much improved accuracy. The considered models widely agree with observation; however, over a few regions of HMRI, a mixed response is noticed. An important finding is that, over NEI, the model CanESM performs well. Over CNI, MPI-ESM-LR performs better than the other models. Similarly, over PI, NWI, and WCI, CanESM and NorESM capture the observed rainfall pattern relatively better. The relative performance of the models in predicting the JJAS rainfall over the individual monsoon regions of India is summarized in Table 3. Overall, predictions by the model ACCESS are relatively weak. The lower percentage bias and high GPC in the NEX-GDDP simulations indicate relatively better reliability of the models for impact studies, and NEX-GDDP may provide more reliable projections for the near- and far-future periods than coarse-resolution GCMs.

Table 3 Relative performance of model in predicting the JJAS rainfall over HMRI