1 Introduction

Precipitation plays a crucial role in sustaining life on Earth. It is an important element in assessing the impact of global warming on agricultural production over regions like India. The economy of the Indian region depends on rain-fed agriculture, which is highly sensitive to variations in rainfall distribution (both spatial and temporal). Precipitation over the Indian region is highly variable in time and space, leading to large-scale floods and droughts.

Over the Indian region, the summer monsoon period (June to September) contributes most of the annual rainfall. The Indian summer monsoon (ISM) rainfall results from the interaction of several complex atmospheric and oceanic processes evolving at different spatial and temporal scales (Webster 1987). The amount of rainfall in the monsoon season is highly variable, ranging spatially from 160 to 1800 mm/year from the northwest to the northeast and from the far north to the extreme south. Spatially coherent monsoon regions, when aggregated, result in five homogeneous regions covering 95 % of India (excluding the hilly regions in the extreme north), as shown in Fig. 1a. The ISM rainfall patterns are modulated by the steep topography (see Fig. 1a) of the Himalayas and the western Ghats (WG) (e.g., Bookhagen and Burbank 2010), which alters the pathways of water vapor transport. The WG and the northeast region of India receive the highest annual rainfall. Both mountain terrains have unique characteristics. The hilly terrain acts as a barrier to the southwest winds coming from the Arabian Sea, leading to intensified downpours on the windward side of the mountains. The spatial distribution of annual rainfall in India exhibits a north–south oriented belt of heavy rainfall over the WG. In addition, the spatially coherent monsoon regions defined by the India Meteorological Department (IMD) (Rao 1976) are also shown in Fig. 1a. Rainfall over the Indian region varies on time scales ranging from hours through intraseasonal to longer periods. The representation of precipitation in gridded datasets involves a number of challenges that do not arise for continuous variables such as temperature or pressure, which vary smoothly over large spatial scales.

Fig. 1

a Topographic map (based on ETOPO1 data provided by NOAA) of the Indian sub-continent. The major regions shown are the west coast (WC), east coast (EC), interior peninsula (IP), north east (NE), north central (NC), north west (NW) and western Himalayas (WH). The Himalayas form a high topographic barrier in the north, resulting in orographic rainfall. Along the west coast of India, the western Ghats form an additional orographic barrier. b Number of rain gauge stations used for generating the gridded IMD data

The IMD has produced high-quality, high-resolution (1° × 1°) long-term gridded precipitation data (Rajeevan et al. 2006, 2008) for the period 1901–2007 using a large network of 6300 rain gauge stations, as shown in Fig. 1b. A number of studies on different aspects of Indian rainfall have been undertaken using these high-resolution gridded datasets (Lau and Waliser 2005; Lau et al. 2012; Bollasina et al. 2011; Ghosh et al. 2012; Rajeevan et al. 2012). Earlier, Rajeevan et al. (2006) described the technique used to interpolate the rainfall data onto regular grids, including directional effects and barriers. They compared the IMD gridded dataset with the Variability Analysis of Surface Climate Observations (VASClimo) dataset during the monsoon season for the period 1951–2004. Mishra et al. (2012) carried out a comparative study of the IMD rainfall dataset with three reanalysis datasets during the monsoon season for the period 1951–2004. Later, Rajeevan et al. (2008) studied the variability and long-term trends of extreme rainfall events using the IMD data. They found that the frequency of extreme rainfall events shows significant inter-decadal variations in addition to a statistically significant long-term trend of 6 % per decade. However, a detailed evaluation of the high-resolution IMD data against both observational and reanalysis datasets has not been documented, and this is what we attempt in the present study. Evaluation of the products over specific geographic regions will help assess the performance of the models under different circumstances.

The spatio-temporal patterns of rainfall and its trends have been studied over the Indian region using 107 years of precipitation data. The present study has three objectives. The first is to evaluate the IMD gridded precipitation against other observational datasets [Global Precipitation Climatology Project (GPCP) and Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of water resources (APHRODITE)]. The second is to compare the reanalysis datasets [Climate Forecast System Reanalysis (CFSR), European Centre for Medium-Range Weather Forecasts Interim Re-Analysis (ERA-Interim), Japanese 25-year Reanalysis (JRA-25), and National Aeronautics and Space Administration Modern Era Retrospective-analysis for Research and Applications (NASA-MERRA)] with IMD data over the Indian region. The third is to examine the spatial trends over the Indian region. Given the growing number of climate change related studies over this region, this information would be useful for verifying climate model characteristics, as well as for agriculture and water management activities. In the present study, we consider four seasons, namely winter (December, January, and February), pre-monsoon (March, April, and May), monsoon (June, July, August, and September), and post-monsoon (October and November).

This study is organized as follows: Sect. 2 briefly describes the different datasets used in this study. Section 3 presents a brief discussion of the methodology. Section 4 discusses the results for climatology, trends, and comparison with observed and reanalysis datasets; seasonal comparisons and differences between IMD and reanalysis datasets are also described. Finally, Sect. 5 presents the conclusions of the study.

2 Data

2.1 IMD gridded precipitation data

The IMD precipitation datasets are available at 1° × 1° latitude–longitude resolution over the Indian region (Rajeevan et al. 2006, 2008). Before the station rainfall data were interpolated onto regular grids, multistage quality control (QC) of the observed data was carried out (Rajeevan et al. 2008). The data are interpolated onto regular grids using the weighted-sum method with radii of influence, including directional effects, developed by Shepard (1968). Using the daily data, we estimated the monthly, seasonal, and yearly rainfall at each grid point.
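The aggregation from daily values to monthly, seasonal, and yearly totals at a single grid point can be sketched as follows. This is a minimal illustration, not the processing code used for the IMD product: the function names, the dictionary-based data layout, and the calendar-year treatment of winter (January, February, and December of the same year) are assumptions made for the example. The season boundaries are the four seasons considered in this study.

```python
from collections import defaultdict
from datetime import date, timedelta

# Season definitions used in this study (month numbers).
SEASONS = {
    "winter": (12, 1, 2),
    "pre-monsoon": (3, 4, 5),
    "monsoon": (6, 7, 8, 9),
    "post-monsoon": (10, 11),
}

def monthly_totals(daily):
    """Sum daily rainfall (mm/day) into {(year, month): mm/month}.

    `daily` maps datetime.date -> rainfall at one grid point
    (a hypothetical layout chosen for this sketch).
    """
    totals = defaultdict(float)
    for day, rain in daily.items():
        totals[(day.year, day.month)] += rain
    return dict(totals)

def seasonal_totals(monthly, year):
    """Seasonal totals (mm/season) for one calendar year.

    Assumption: winter is taken as Jan + Feb + Dec of the same year.
    """
    return {
        name: sum(monthly.get((year, m), 0.0) for m in months)
        for name, months in SEASONS.items()
    }

def yearly_total(monthly, year):
    """Annual total (mm/year) from the monthly totals."""
    return sum(value for (y, _), value in monthly.items() if y == year)

# Usage with synthetic data: 1 mm of rain on every day of 2001.
start = date(2001, 1, 1)
daily = {start + timedelta(days=i): 1.0 for i in range(365)}
monthly = monthly_totals(daily)
print(seasonal_totals(monthly, 2001)["monsoon"])  # 122.0 (30 + 31 + 31 + 30 days)
print(yearly_total(monthly, 2001))                # 365.0
```

A season-year convention that assigns December to the following winter would be an equally reasonable choice; the source does not state which convention was used.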

2.2 GPCP precipitation data

The GPCP data is one of the most accurate and complete in-situ precipitation datasets. The GPCP data (Version 6) comprise quality-controlled gridded monthly precipitation data from 85,000 rain-gauge stations, received in near-real time via the World Meteorological Organization (WMO) Global Telecommunication System (GTS) and in non-real time through bilateral contributions from most meteorological and hydrological services of the world, together with historic data. The GPCP products consist of monthly precipitation over the global land surface and are available at different spatial resolutions. The non-real-time products, based on the complete GPCP monthly rainfall database, cover the period from January 1901 to December 2010. Non-real-time data from the dense national observation networks of individual countries and from other global and regional collections of climate data are integrated in the GPCP full database (Schneider et al. 2011).

2.3 APHRODITE’s precipitation data

The APHRODITE project developed state-of-the-art daily precipitation datasets on high-resolution (0.25°) grids for the Asian region. APHRODITE’s Water Resources project has been executed by the Research Institute for Humanity and Nature (RIHN) and the Meteorological Research Institute of the Japan Meteorological Agency (MRI/JMA) since 2006 (Yatagai et al. 2012). The database is generated primarily from data obtained from the in-situ rain-gauge observation network. The quantitative estimation of precipitation, and thus the development of a gridded precipitation product, is crucial for many scientific studies. Many methods of interpolating station data have therefore been developed in the last century, and considerable efforts have been made over the last two decades to derive precipitation information from satellites. The data are available from 1961 to 2007 and can be accessed at http://www.chikyu.ac.jp/precip/scope/index.html (Accessed 2014).

2.4 ERA-Interim precipitation data

ERA-Interim is one of the long-term global atmospheric reanalysis datasets produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). The primary goal of ERA-Interim was to address several difficult data assimilation problems encountered during the production of ERA-40. Reanalysis data provide a multivariate, spatially complete, and coherent record of the global atmospheric circulation (Dee et al. 2011). In 2006, a new reanalysis starting from January 1979 was initiated to produce an interim reanalysis (ERA-Interim). ERA-Interim forecasts have a more uniform quality in time and space than forecasts from ERA-40 (Uppala et al. 2005). ERA-Interim uses a four-dimensional variational (4D-var) system based on spectral Global Circulation Model (GCM) simulations (Dee et al. 2011). The main advances in the ERA-Interim system are improved model physics, a new humidity analysis, a variational bias correction technique, direct assimilation of early satellite radiance data, and an improved fast radiative transfer model. The datasets are available on a 1.5° × 1.5° longitude–latitude grid.

2.5 JRA-25 precipitation data

In JRA-25, a six-hourly global data assimilation cycle was carried out from 1979 to present. The JRA-25 reanalysis datasets are a combined effort of the Japan Meteorological Agency (JMA) and the Central Research Institute of Electrical Power Industry (CRIEPI). The main components of the system are a spectral forecast model, various QC processes, a three-dimensional variational (3D-var) data assimilation process, and a land surface model (Onogi et al. 2005, 2007). The JRA-25 reanalysis is a basic meteorological grid point dataset with a uniform horizontal resolution of 120 km, extending from the surface to about 50 km in the vertical; it has the highest amount of historical observational data available at present. The datasets used in this study cover the period from 1989 to 2007 on a 1.25° × 1.25° longitude–latitude grid.

2.6 NASA-MERRA (MERRA)

The Global Modeling and Assimilation Office (GMAO) at NASA’s Goddard Space Flight Center is producing a satellite-era analysis called the National Aeronautics and Space Administration (NASA) Modern Era Retrospective-analysis for Research and Applications (MERRA) (Bosilovich et al. 2006). The objectives are to support NASA’s climate strategies by placing current research satellites in a climate context and to improve the representation of the water cycle in the reanalysis. The model uses a topographically based approach instead of a layer-based approach to soil moisture. The MERRA data are available from 1979 to present on a native grid of 2/3° longitude by ½° latitude (540 × 361 global grid points). The MERRA reanalysis datasets are produced using the grid point statistical analyses method (Kliest et al. 2009; Cullather and Bosilovich 2011), a three-dimensional variational assimilation system with a 6-h analysis window.

2.7 Climate forecast system reanalysis (CFSR)

The CFSR (Saha et al. 2010) is the latest global reanalysis from the National Centers for Environmental Prediction (NCEP). Major modeling improvements include the hybrid sigma-pressure coordinate system to improve the stratosphere, a better land surface model, better moist physics, and a coupled ocean model/analysis (Saha et al. 2010). The major advances in the CFSR include a 6-h coupled forecast for the first guess field, an interactive sea-ice model, assimilation of satellite radiances, and high-resolution atmospheric and oceanic models. The CFSR datasets are available as model outputs with high spatial (0.5° × 0.5°) and temporal (hourly) resolution. The CFSR has recently been examined in accompanying papers, including assessments of the surface climate variability (Wang et al. 2010) and the tropospheric variability (Chelliah et al. 2011). Although the observational (GPCP and APHRODITE) and reanalysis (CFSR, ERA-Interim, JRA-25, NCEP, and MERRA) datasets are available for different time periods, we used all the datasets from 1989 to 2007 in order to maintain uniformity.

3 Methodology

To allow a uniform comparison between the different datasets, precipitation must be spatially averaged over the entire country on a common grid. Therefore, the reanalysis and observational datasets are regridded to a 1° × 1° grid corresponding to that of the IMD gridded data. Using the IMD and reanalysis precipitation data, we calculated the daily, monthly, and seasonal means at each grid point. For trend analysis, we use robust regression, which was developed as an improvement on least squares estimation in the presence of outliers and was introduced by Holland and Welsch (1977). This technique performs well relative to least squares on clean data (without outliers) (O’leary 1990). The robust regression is based on iteratively reweighted least squares (IRLS), which uses weighted least squares to dampen the influence of outliers. The weights are based on the residuals, which measure how far each observation is from its predicted value (Kutner et al. 2004). The main purpose of robust regression analysis is to fit a model that represents the information in the majority of the data. Trends are estimated at both the annual and the seasonal scale using 107 years of IMD data. The statistical significance of the trend at each grid point is assessed with the non-parametric Mann–Kendall test (Mann 1945), which confirms the existence of a positive or negative trend at a given confidence level.
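The two trend tools described above can be sketched for a single grid-point time series as follows. The source does not state which weight function or scale estimate was used, so the Huber weights with tuning constant 1.345, the scale estimate (median absolute residual divided by 0.6745), and all function names are assumptions made for illustration. The Mann–Kendall part uses the usual normal approximation with a tie correction.

```python
import math
from statistics import median

def weighted_line_fit(x, y, w):
    """Weighted least-squares fit of y = slope * x + intercept."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx

def irls_trend(x, y, k=1.345, n_iter=50, tol=1e-10):
    """Robust linear trend by iteratively reweighted least squares.

    Assumption: Huber weights; the paper only says that IRLS was used.
    """
    slope, intercept = weighted_line_fit(x, y, [1.0] * len(x))  # OLS start
    for _ in range(n_iter):
        resid = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
        scale = median(abs(r) for r in resid) / 0.6745  # robust scale
        if scale == 0.0:
            break  # at least half of the points are fitted exactly
        # Residuals within k * scale keep full weight; larger ones are damped.
        w = [1.0 if abs(r) <= k * scale else k * scale / abs(r) for r in resid]
        new_slope, new_intercept = weighted_line_fit(x, y, w)
        done = (abs(new_slope - slope) < tol
                and abs(new_intercept - intercept) < tol)
        slope, intercept = new_slope, new_intercept
        if done:
            break
    return slope, intercept

def mann_kendall(y):
    """Mann-Kendall S statistic, normal score Z, and two-sided p-value."""
    n = len(y)
    s = sum((y[j] > y[i]) - (y[j] < y[i])
            for i in range(n - 1) for j in range(i + 1, n))
    counts = {}
    for v in y:
        counts[v] = counts.get(v, 0) + 1
    ties = sum(t * (t - 1) * (2 * t + 5) for t in counts.values())
    var_s = (n * (n - 1) * (2 * n + 5) - ties) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal p-value
    return s, z, p
```

In this sketch a trend at a grid point would be called significant at the 95 % level when the returned p-value is below 0.05; the sign of S gives the direction of the trend.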

4 Results and discussion

4.1 Characteristics of Indian rainfall

It is important to understand the rainfall characteristics over the small homogeneous regions of India, which matter for agriculture and related industries. The daily mean climatology over different regions from IMD rainfall data is shown in Fig. 2. The seven homogeneous regions of India are shown in different colors in the top right corner of Fig. 2. The maximum rainfall is observed over the northeast and west coast regions, peaking in July at about 17 mm/day. The next highest peak, about 14 mm/day, is noticed over the western Himalayas. The minimum values are observed over northwest India, where the southwest monsoon arrives late and withdraws early. The east coast region shows a double-peak structure, with one peak in July–August and the other in October and November. This is due to the occurrence of both the southwest monsoon (June to September) and the northeast monsoon (October to December). The all-India average rainfall closely follows that of the north central region, with peak values differing by about 2 mm/day during July and August.

Fig. 2

Daily climatological mean (mm/day), smoothed with a 10-day running mean, of rainfall area-averaged over India (black) and over seven different regions (colored lines; the regions are shown in the top right corner) during 1901–2007

The spatially organized summer monsoon arrives over the Indian Ocean, bringing much-needed rainfall to the sub-continent during the summer. Figure 3 shows the spatial distribution of climatological monthly precipitation from the IMD, APHRODITE, and GPCP datasets for both the annual mean and the monsoon season for the period from 1989 to 2007. As far as the annual rainfall is concerned, the northeast, west coast, western Himalayas, and north central parts of India receive the highest amounts of rainfall in all three datasets (Fig. 3a, c, e). Summer rainfall makes the most significant contribution to the annual rainfall. The highest rainfall is observed over the WG and the Himalayan foothills, which is attributed to the regional orography (Rahman et al. 2009). About 70 % of the annual rainfall occurs during the summer monsoon months (June, July, August, and September), with large spatial and temporal variability (Fig. 3b, d, f). The GPCP data show very fine features similar to those of the IMD rainfall data in both the annual mean and the summer season. The APHRODITE data do not show similar features when compared with IMD and GPCP. During the monsoon season, rainfall in July and August is higher than in June and September. From June to July, the sensible heat input at the surface is close to its maximum, as are the vertical motion and atmospheric moisture over the northern hemisphere tropical landmasses. The pressure gradient force also attains its maximum at this stage, and the monsoon reaches its maximum intensity with the maximum amount of precipitation (Webster et al. 1998). The higher precipitation values are observed in the northeast, north central, western Himalayas, and west coast regions. The minimum rainfall is observed over the northwest part of India throughout the year. The monthly rainfall statistics for 1901 to 2007 over seven different regions of India are given in Table 1. The maximum monthly rainfall is observed in July over the west coast, at about 499.7 mm/month, and the minimum in February, also over the west coast, at about 4.9 mm/month. Another maximum (473.5 mm/month) is observed in July over the northeast region. Comparing the two regions of maximum rainfall, the variability is larger over the northeast (82.4 mm) than over the west coast (68.1 mm). From Table 1, the mean (1901–2007) rainfall in July is 304.37 mm, which is the highest of any month and contributes 25.6 % of the annual rainfall (1189.7 mm). The August rainfall is slightly lower and contributes 22.8 % of the annual rainfall. The June and September rainfall amounts are similar and contribute 14.1 and 14.9 % of the annual rainfall, respectively. The mean southwest monsoon rainfall (921.2 mm) contributes 77.4 % of the annual rainfall (1189.7 mm). The contributions of pre-monsoon and post-monsoon rainfall are about 9.8 and 8.5 %, respectively.

Fig. 3

Mean annual (top panels) and mean summer monsoon (bottom panels) precipitation based on IMD (1° × 1°), APHRODITE and GPCP data for the period 1989–2007

Table 1 Monthly mean rainfall (mm/month) and their standard deviations observed by IMD data over seven homogeneous regions in India during 1901 to 2007
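The percentage contributions quoted in the text can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the text; the winter share is not given in the source and is computed here as the implied remainder, which is an inference rather than a reported value.

```python
# Values quoted in the text (IMD, 1901-2007 means, in mm).
annual = 1189.7
july = 304.37
monsoon = 921.2

# Shares of the annual total, rounded to one decimal as in the text.
july_share = round(100 * july / annual, 1)
monsoon_share = round(100 * monsoon / annual, 1)

# The four monsoon-month shares quoted in the text (Jun, Jul, Aug, Sep).
monthly_sum = round(14.1 + 25.6 + 22.8 + 14.9, 1)

# Implied winter share: remainder after monsoon, pre- and post-monsoon.
winter_share = round(100 - (monsoon_share + 9.8 + 8.5), 1)

print(july_share, monsoon_share, monthly_sum, winter_share)
# 25.6 77.4 77.4 4.3
```

The monthly shares sum to the stated monsoon contribution, so the quoted figures are internally consistent.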

4.2 Comparison of IMD data with observational datasets

This section discusses the comparison between IMD and the other observational datasets (GPCP and APHRODITE). Figure 4 shows the seasonal mean difference of rainfall, in percent, between APHRODITE and IMD and between GPCP and IMD over India during the different seasons for the period from 1989 to 2007. The percentage difference between dataset A and dataset B is given by D = 100 × (A − B)/B, where A is the observational (APHRODITE or GPCP) data and B is the IMD precipitation data at each grid point. The percentage difference D gives the magnitude of the difference between fields A and B relative to the reference field B. To examine whether these differences are statistically significant, we apply Student’s t test to the seasonal precipitation at each grid point. Grid points where the differences between IMD and the observations (APHRODITE and GPCP) are significant at the 95 % confidence level are indicated by black marks. A large difference is noticed between the APHRODITE and IMD datasets, which indicates that APHRODITE underestimates the rainfall over the Indian region. The difference is smaller during the monsoon season than in other seasons and is especially large during the winter season. The GPCP data compare very well with the IMD dataset, particularly during the monsoon months, with a very low mean difference. The IMD data underestimate rainfall over the northern part of India (Himalayas), which may be due to the smaller number of observations available when gridding the IMD data. The GPCP data compare well with the IMD data over central and northeast India and over the western parts of India, where high precipitation occurs. The GPCP data capture the major rainfall features over the northeast and the WG but underestimate rainfall over the northwest, western Himalayas, and west coast regions. The APHRODITE data show large deviations from the IMD data, especially over the southern peninsula and northwest India.

Fig. 4

Seasonal relative difference observed in rainfall between APHRODITE and IMD (top row), between GPCP and IMD (bottom row) over Indian region during the period 1989–2007. The solid black marks indicate regions where the difference is significant at the 95 % confidence level (based on Student’s t test)
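The percentage difference D = 100 × (A − B)/B and a per-grid-point significance check can be sketched as follows. The source does not say which form of Student’s t test was applied, so the paired form used here (the same 19 seasons, 1989–2007, in both datasets, giving 18 degrees of freedom and a two-sided 95 % critical value of 2.101) is an assumption, as are the function names and the synthetic input series.

```python
import math
from statistics import mean, stdev

# Two-sided 95 % critical value of Student's t for 18 degrees of freedom
# (19 paired seasonal values, 1989-2007). Assumes a paired test.
T_CRIT_95_DF18 = 2.101

def percent_difference(a, b):
    """D = 100 * (A - B) / B, with B (IMD) as the reference field."""
    return 100.0 * (a - b) / b

def paired_t(a_series, b_series):
    """Paired t statistic for the year-by-year differences A - B."""
    d = [a - b for a, b in zip(a_series, b_series)]
    sd = stdev(d)
    if sd == 0.0:
        return math.inf if mean(d) != 0 else 0.0
    return mean(d) / (sd / math.sqrt(len(d)))

def is_significant(a_series, b_series, t_crit=T_CRIT_95_DF18):
    """True when |t| exceeds the critical value at this grid point."""
    return abs(paired_t(a_series, b_series)) > t_crit

# Synthetic seasonal totals at one grid point (illustrative values only).
imd = [100.0 + i for i in range(19)]
other = [0.8 * v + (i % 3) for i, v in enumerate(imd)]
print(percent_difference(mean(other), mean(imd)))  # about -19 %
print(is_significant(other, imd))                  # True
```

Applying these two functions at every grid point and season would give maps of D with significance marks of the kind described for Fig. 4.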

4.3 Comparison of IMD data with reanalysis datasets

Before discussing the comparison between IMD and the reanalysis (CFSR, ERA-Interim, JRA-25, and MERRA) data, we briefly discuss the rainfall climatology of the reanalysis datasets. The seasonal rainfall structures over the Indian sub-continent from IMD and the reanalyses (CFSR, ERA-Interim, JRA-25, and MERRA) during the NH winter, monsoon, pre-monsoon, and post-monsoon seasons are shown in Fig. 5. In this study, all the reanalysis data are used for the years 1989 to 2007. The basic features of rainfall in the reanalysis datasets are similar to those of the IMD rainfall. In general, enhanced rainfall is observed during the monsoon and post-monsoon seasons and less rainfall during the pre-monsoon and winter seasons. During the winter season over the western Himalayas, IMD precipitation is lower than in the reanalysis data. JRA-25 and MERRA monsoon precipitation values are slightly higher than IMD, especially over the northeastern parts of India. Bosilovich et al. (2008) showed that JRA-25 reanalysis precipitation is spread out more than some other global regions, and they found larger biases in JRA-25 over the tropical regions relative to other reanalysis precipitation. The IMD and all reanalysis datasets show higher amounts of rainfall over the northeast, west coast, and western Himalayas. The north central part of India receives the highest rainfall during the pre-monsoon season. These findings are in good agreement with the results reported by Xie et al. (2006) using Tropical Rainfall Measuring Mission (TRMM) precipitation radar measurements and special sensor microwave/imager (SSM/I) data. In terms of the seasonal mean, the IMD precipitation is closer to ERA-Interim and CFSR than to the other reanalyses. During the post-monsoon season, CFSR precipitation values are less distinct than those of IMD and the other reanalysis datasets over the north central parts of India. Precipitation along the east coasts of the mid-latitude continents is generally underestimated, relative to GPCP, not only by the reanalyses but also by the Climate Prediction Center (CPC) Merged Analysis of Precipitation (CMAP) (Bosilovich et al. 2008). Recently, Rahman et al. (2009) compared rain rates from IMD, GPCP, and the Tropical Rainfall Measuring Mission Multi-satellite Precipitation Analysis (TMPA) during a low-pressure system period (27–29 Aug 2007). They suggested that TMPA captures both the spatial distribution and the intensity of rainfall seen in IMD, whereas GPCP does not capture the rain over the western parts of India.

Fig. 5

Seasonal climatological mean rainfall for winter (1st row), monsoon (2nd row), pre-monsoon (3rd row) and post-monsoon (4th row) observed by IMD and reanalysis (CFSR, ERA-Interim, JRA-25, and MERRA) over India during 1989–2007

The spatially integrated seasonal values of IMD and reanalysis precipitation during the period 1989–2007 are shown in Table 2. The JRA-25 (MERRA) precipitation values are close to those of IMD, within 226 (234) mm/month during the monsoon season. ERA-Interim has precipitation values markedly smaller than the other reanalyses (except in the winter season). Comparison between the ERA-Interim and IMD data reveals that ERA-Interim is drier during the monsoon season and wetter in the other three seasons. From Table 2, the IMD seasonal mean values are in good agreement with the reanalysis datasets. The seasonal precipitation values of MERRA are slightly closer to the IMD values, which may be due to the assimilation of Advanced Microwave Sounding Unit (AMSU) radiances into MERRA (Cullather and Bosilovich 2011).

Table 2 Mean rainfall (mm/month) and standard deviations observed by IMD and models (CFSR, ERA-Interim, JRA-25, and MERRA) for all months, pre-monsoon (MAM), winter (DJF), monsoon (JJAS), and post-monsoon (ON) over India from 1989 to 2007

The longitude–latitude structure of the rainfall differences, in percent, is shown in Fig. 6. Significance at the 95 % confidence level has been computed at each grid point using Student’s t test and is indicated by solid black marks. The most significant differences between the IMD and reanalysis precipitation datasets occur in the interior peninsula and western Himalayas regions during the monsoon and pre-monsoon seasons. Note that red (blue) denotes that IMD precipitation is lower (higher). From Fig. 6, CFSR shows higher precipitation values than IMD over the central and southern parts of India during the monsoon and post-monsoon seasons. MERRA precipitation values are higher than IMD in all seasons except winter. The JRA-25 reanalysis precipitation values are higher than IMD in most parts of the country, especially during the pre-monsoon season. The ERA-Interim precipitation shows lower values than IMD in several parts of India.

Fig. 6

Seasonal relative difference observed in rainfall between CFSR and IMD (1st column), between ERA-Interim and IMD (2nd column), between JRA-25 and IMD (3rd column), and between MERRA and IMD (4th column) over Indian region during the period 1989–2007. Grid points where the difference is significant at the 95 % level (based on Student’s t test) are marked by solid black marks

The mean percentage differences of precipitation for each season and region of India are given in Table 3. For each reanalysis, the maximum mean differences are highlighted in Table 3. Positive (negative) values indicate that IMD is lower (higher) than the model values. During the pre-monsoon season, JRA-25 overestimates precipitation over India, on average by ~76 % relative to IMD. Over the interior peninsula, MERRA precipitation values are ~40 % larger than IMD during the monsoon and post-monsoon seasons. Over the western Himalayas, IMD precipitation is ~50 % higher than CFSR during the monsoon season, whereas CFSR is ~85 % higher than IMD during the post-monsoon season. Significant differences between IMD and ERA-Interim are observed during the pre- and post-monsoon seasons over the western Himalayas, with small differences during the monsoon season over the north central part of India. The difference between ERA-Interim and IMD is negative in all seasons along the east coast, interior peninsula, northwest, and west coast, implying that ERA-Interim underestimates rainfall there in all seasons. In contrast, ERA-Interim overestimates the heavy rainfall amounts over the northeast part of India. From Table 3, the maximum differences in the reanalysis precipitation datasets are observed over the western Himalayas during the pre- and post-monsoon seasons, with the largest positive values in the post-monsoon season (except for JRA-25). The mean percentage differences between the reanalyses and IMD over all of India for all years (1989–2007) are about 12.05, −2.06, 17.27, and 1.03 % for CFSR, ERA-Interim, JRA-25, and MERRA, respectively. The CFSR and MERRA precipitation is drier than IMD over the southern part of India in most of the seasons.

Table 3 Seasonal mean difference of rainfall in percentage between model (CFSR, ERA-Interim, JRA-25, and MERRA) and IMD datasets, ratio = (100*(model − IMD)/IMD) for different regions over India during 1989–2007

The agreement between the IMD and reanalysis datasets is quite good except for some specific regions. All reanalysis datasets show higher precipitation than IMD over the western Himalayas during the winter, pre-monsoon, and post-monsoon seasons. The relative difference is large for CFSR and MERRA over the southern part of India during the monsoon and post-monsoon seasons. The reanalysis values are underestimated compared to the IMD dataset, which can be attributed to the accuracy of the rain gauges and to spatial heterogeneity that is not sufficiently resolved for the gridding procedure. On the other hand, problems with the reanalysis data in these regions cannot be excluded either, because it is unclear how many real measurements (e.g. rainfall data and satellite measurements) went into the reanalyses at these locations. From Fig. 6, the interior and continental parts of India exhibit relatively large differences between IMD and JRA-25 rainfall during the pre-monsoon season (see Table 3). The reanalyses use different assimilation methodologies. Therefore, it is quite difficult to attribute the differences between them quantitatively to a specific physical process (Mishra et al. 2012).

Histograms of the probability distribution functions (PDFs) of the monthly IMD, observational, and reanalysis rainfall datasets are shown in Fig. 7a–f. Scatter plots are included in the same figure, along with the best-fit line (black solid line), the associated linear equation, and the correlation coefficient. The scatter plots display the linear relationship between IMD and the observational and reanalysis datasets. Using PDFs of rainfall over an extended period for a given spatial domain usefully augments the traditional validation approach. The scatter plots for IMD versus CFSR (Fig. 7c) and IMD versus MERRA (Fig. 7f) also differ by a few percentage points, which could partly be a spatial sampling issue that is likely to have contributed to the apparent errors of the model precipitation values. The maximum mean absolute difference of 27.04 mm and root-mean-square error of 39.571 are observed for IMD versus CFSR, and the minimum values for IMD versus MERRA. The mean absolute difference and root-mean-square (RMS) values are small between the IMD and GPCP datasets. The scatter plots also show fewer outliers between IMD and the observational data than between IMD and the reanalysis datasets. The correlation with IMD is higher for the observations than for the reanalysis datasets. These results indicate that the IMD data is suitable for studying temporal variation.

Fig. 7

Histograms showing the monthly rainfall distributions of a IMD and APHRODITE, b IMD and GPCP, c IMD and CFSR, d IMD and ERA-Interim, e IMD and JRA-25, and f IMD and MERRA over India during 1989–2007. The blue (red) line is for IMD (observational/model). Dots indicate monthly rainfall values, and the black line represents the best fit between the IMD and model rainfall data
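The scatter-plot statistics discussed above (mean absolute difference, root-mean-square error, best-fit line, and correlation coefficient) can be computed for two monthly series as in the following minimal sketch. The function name, the returned dictionary layout, and the sample numbers are illustrative assumptions and do not come from the datasets used in this study.

```python
import math

def comparison_stats(ref, other):
    """Pairwise statistics between a reference series and another series.

    Returns the mean absolute difference, the root-mean-square error,
    the least-squares line other = slope * ref + intercept, and the
    Pearson correlation coefficient.
    """
    n = len(ref)
    mae = sum(abs(o - r) for r, o in zip(ref, other)) / n
    rmse = math.sqrt(sum((o - r) ** 2 for r, o in zip(ref, other)) / n)
    mean_r = sum(ref) / n
    mean_o = sum(other) / n
    sxx = sum((r - mean_r) ** 2 for r in ref)
    syy = sum((o - mean_o) ** 2 for o in other)
    sxy = sum((r - mean_r) * (o - mean_o) for r, o in zip(ref, other))
    slope = sxy / sxx
    return {
        "mae": mae,
        "rmse": rmse,
        "slope": slope,
        "intercept": mean_o - slope * mean_r,
        "corr": sxy / math.sqrt(sxx * syy),
    }

# Illustrative monthly values (mm/month), not taken from the paper.
imd = [12.0, 30.0, 160.0, 290.0, 250.0, 170.0]
model = [20.0, 45.0, 150.0, 330.0, 270.0, 140.0]
stats = comparison_stats(imd, model)
print({key: round(value, 3) for key, value in stats.items()})
```

A slope near one with a small intercept and a high correlation would indicate close agreement with the reference series, which is how the panels of Fig. 7 are read in the text.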

Figure 8 shows the correlation and standard deviation values used to produce the Taylor diagram (Taylor 2001; Bosilovich et al. 2008) over the Indian sub-continent during the period from 1989 to 2007. The standard deviations are lower in winter than in other seasons, for the reanalyses as well as for the observations (APHRODITE and GPCP). On the other hand, the CFSR standard deviation is high during the monsoon season. The correlation coefficients between IMD and the observational and model precipitation are found to be highly significant. The seasonal correlations of the observations (APHRODITE and GPCP) are higher (>0.9) than those of any reanalysis dataset. At large scales, the seasonal correlations of both the observations and the reanalyses are significant.

Fig. 8

Taylor diagram for the seasonal precipitation correlations and standard deviations from observations and reanalysis using IMD as a reference during the period from 1989 to 2007
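A Taylor diagram places each dataset using its standard deviation and its correlation with the reference. The centered RMS difference then follows from the law-of-cosines relation given by Taylor (2001). The sketch below computes these three quantities in plain Python. The function name `taylor_stats` and the sample values are hypothetical, and population (divide by n) statistics are assumed so that the identity holds exactly.

```python
import math

def taylor_stats(ref, model):
    """Return (std_ref, std_model, correlation, centered RMS difference)."""
    n = len(ref)
    mean_r = sum(ref) / n
    mean_m = sum(model) / n
    std_r = math.sqrt(sum((x - mean_r) ** 2 for x in ref) / n)
    std_m = math.sqrt(sum((x - mean_m) ** 2 for x in model) / n)
    cov = sum((a - mean_r) * (b - mean_m) for a, b in zip(ref, model)) / n
    corr = cov / (std_r * std_m)
    # Centered RMS difference: the means are removed before differencing.
    crmsd = math.sqrt(
        sum(((b - mean_m) - (a - mean_r)) ** 2 for a, b in zip(ref, model)) / n
    )
    return std_r, std_m, corr, crmsd

# Hypothetical seasonal precipitation values (mm/month), illustration only.
ref_series = [12.0, 30.0, 210.0, 60.0, 15.0, 28.0, 230.0, 55.0]
model_series = [18.0, 25.0, 190.0, 70.0, 20.0, 35.0, 240.0, 50.0]
std_r, std_m, corr, crmsd = taylor_stats(ref_series, model_series)

# Taylor (2001): E'^2 = sigma_m^2 + sigma_r^2 - 2 * sigma_m * sigma_r * R
identity_rhs = std_m ** 2 + std_r ** 2 - 2.0 * std_m * std_r * corr
print(std_r, std_m, corr, crmsd)
```

The identity is what allows a single point on the diagram to encode all three statistics at once.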

4.4 Trend analysis

In the previous sections, we discussed the IMD rainfall characteristics and their validation against observational and reanalysis datasets. It is well known that rainfall over the Indian region exhibits strong inter-annual variability, which is controlled by tropical Pacific sea surface temperature (SST) anomalies related to the El Niño-Southern Oscillation (ENSO) (Kumar et al. 2006). At lower frequencies, monsoon variations are linked to SST related to the Atlantic multi-decadal oscillation (Kucharski et al. 2009). Turner and Annamalai (2012) discussed anthropogenic climate change and its effects on long-term monsoon patterns. Therefore, it is necessary to have insights into historical rainfall trends, which will be useful for both agriculture and water development planning. In this section, we focus on the spatial inter-annual variability of rainfall using the IMD datasets. Previous studies reported trends of the Indian monsoon and rainfall for different aspects and different regions (Guhatakurtha and Rajeevan 2008; Dash et al. 2009; Ranade et al. 2008; Lacombe and McCartney 2014). All of these studies concentrated on regional average rainfall, while other studies relied on short-term rainfall datasets.

Figure 9a shows the inter-annual variability of IMD mean annual rainfall over India during the period from 1901 to 2007. The horizontal dotted line indicates the mean value (1190 mm) of the 107 years from 1901 to 2007. The solid line shows the 5-year running mean of the yearly precipitation dataset. The inter-annual variability is evident during the period from 1910 to 2007. The figure clearly shows that the periods 1901–1905 (1056 mm), 1950–1954 (1090 mm), and 2000–2004 (1050 mm) had very low rainfall, particularly in the winter seasons, and that rainfall was below average during these periods. In particular, the annual rainfall was extremely low in 1951 (984 mm), 1965 (974 mm), 1972 (910 mm) and 2002 (918 mm). This may be due to less rainfall during the summer monsoon. The spectral amplitudes of the periodicities are estimated from the yearly averaged precipitation data over the Indian sub-continent, as shown in Fig. 9b. Here we used the Lomb–Scargle periodogram analysis (Scargle 1982). This technique is equivalent to a pure harmonic least-squares analysis. The 95 % confidence level is indicated with a dashed line. The major feature is a 15.7-year period of moderate intensity in the 107 years of precipitation data. It is worthwhile to note that there are other peaks representing periodicities of 10, 12, 23, and 33 years, but these fall below the 95 % confidence level.

Fig. 9

a Mean annual rainfall over the Indian continent for each year from 1901 to 2007. The dashed and solid lines show the mean of the whole period (1901–2007) and the 5-year running mean, respectively. b The spectral amplitudes of the IMD precipitation data during the period from 1901 to 2007. The 95 % significance level is indicated by the horizontal dashed line
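The Lomb–Scargle periodogram used for Fig. 9b can be illustrated with a short sketch. It follows the classical formulation of Scargle (1982), with the power normalized by the sample variance, which is an assumption here because the normalization used in the study is not stated. The function name `lomb_scargle`, the trial-period grid, and the synthetic rainfall series (a 16-year cycle around a 1190 mm mean) are hypothetical, and the 95 % confidence level is not computed.

```python
import math

def lomb_scargle(times, values, periods):
    """Variance-normalized Lomb-Scargle power at each trial period."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / (n - 1)
    resid = [v - mean for v in values]
    power = []
    for period in periods:
        w = 2.0 * math.pi / period
        # Time offset tau makes the sine and cosine terms orthogonal.
        s2 = sum(math.sin(2.0 * w * t) for t in times)
        c2 = sum(math.cos(2.0 * w * t) for t in times)
        tau = math.atan2(s2, c2) / (2.0 * w)
        cos_t = [math.cos(w * (t - tau)) for t in times]
        sin_t = [math.sin(w * (t - tau)) for t in times]
        c_term = sum(r * c for r, c in zip(resid, cos_t)) ** 2 / sum(c * c for c in cos_t)
        s_term = sum(r * s for r, s in zip(resid, sin_t)) ** 2 / sum(s * s for s in sin_t)
        power.append((c_term + s_term) / (2.0 * var))
    return power

# Synthetic annual rainfall (mm): a 16-year cycle plus a weak 3.7-year cycle.
years = list(range(1901, 2008))
rain = [1190.0
        + 60.0 * math.sin(2.0 * math.pi * (y - 1901) / 16.0)
        + 10.0 * math.sin(2.0 * math.pi * (y - 1901) / 3.7)
        for y in years]

# Trial periods from 5 to 40 years in 0.5-year steps (kept above the
# 2-year Nyquist period, where the sine term degenerates for annual data).
periods = [5.0 + 0.5 * k for k in range(71)]
power = lomb_scargle(years, rain, periods)
best_period = periods[power.index(max(power))]
print(best_period)
```

Because the method fits sinusoids by least squares at each trial frequency, it does not require evenly spaced samples, although the annual series used here is evenly spaced.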

Further, we examine the spatial distribution of trends over the Indian region. In order to study the secular variations of regional rainfall, we computed the spatial trends for each season. Here, we computed the precipitation trends at each grid point using a robust regression technique (Holland and Welsch 1977). The spatial trend distributions are estimated using 107 years of IMD data and are shown in Fig. 10. The major features are significant and remarkable variations on the regional scale. In Fig. 10, solid black marks indicate areas that reach the 95 % confidence level, as determined for each individual grid box using the non-parametric Mann–Kendall test. The spatial trend structures of the IMD rainfall during 1901–2007 are shown for the annual mean (Fig. 10a) and for the winter (Fig. 10b), pre-monsoon (Fig. 10c), monsoon (Fig. 10d), and post-monsoon (Fig. 10e) seasons.
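The robust regression of Holland and Welsch (1977) is based on iteratively reweighted least squares (IRLS). The sketch below fits a straight-line trend at a single grid point in plain Python. The bisquare weight function, the tuning constant 4.685, and the scale estimate based on the median absolute residual are assumptions made for illustration, since the text does not state which settings were used. Leverage adjustment is omitted, and the names `weighted_line` and `robust_trend` are hypothetical.

```python
import statistics

def weighted_line(t, y, w):
    """Weighted least-squares line; returns (intercept, slope)."""
    sw = sum(w)
    t_mean = sum(wi * ti for wi, ti in zip(w, t)) / sw
    y_mean = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (ti - t_mean) ** 2 for wi, ti in zip(w, t))
    sxy = sum(wi * (ti - t_mean) * (yi - y_mean) for wi, ti, yi in zip(w, t, y))
    slope = sxy / sxx
    return y_mean - slope * t_mean, slope

def robust_trend(t, y, tuning=4.685, iterations=50, tol=1e-10):
    """IRLS linear trend with bisquare weights (assumed settings)."""
    intercept, slope = weighted_line(t, y, [1.0] * len(t))  # ordinary LS start
    for _ in range(iterations):
        resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
        scale = statistics.median([abs(r) for r in resid]) / 0.6745
        if scale == 0.0:
            break  # perfect fit for at least half the points
        cutoff = tuning * scale
        w = [(1.0 - (r / cutoff) ** 2) ** 2 if abs(r) < cutoff else 0.0
             for r in resid]
        new_intercept, new_slope = weighted_line(t, y, w)
        converged = abs(new_slope - slope) < tol
        intercept, slope = new_intercept, new_slope
        if converged:
            break
    return intercept, slope

# Synthetic series: 0.3 mm/year trend, small noise, one extreme outlier.
t = list(range(40))
y = [1000.0 + 0.3 * ti + (1.0 if ti % 2 else -1.0) for ti in t]
y[38] += 500.0
_, ols_slope = weighted_line(t, y, [1.0] * len(t))
_, robust_slope = robust_trend(t, y)
print(ols_slope * 10.0, robust_slope * 10.0)  # trends in mm/decade
```

In this synthetic case the single outlier pulls the ordinary least-squares slope far from the underlying trend, while the reweighted fit assigns it zero weight and stays close to 0.3 mm/year.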

Fig. 10

Longitude–latitudinal IMD rainfall trend structures of the a annual mean, b winter, c pre-monsoon, d monsoon, and e post-monsoon season during 1901–2007. Solid black marks indicate regions where the trends are significant at the 95 % confidence level (based on the Mann–Kendall test)
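The Mann–Kendall test used to mark significant grid boxes in Fig. 10 can be sketched as below. This is the standard form of the test with a tie correction and a normal approximation for the two-sided p value. The function name `mann_kendall` and the sample series are hypothetical. No correction for serial correlation is applied, and the study may have used a different variant.

```python
import math

def mann_kendall(values):
    """Return (S statistic, Z score, two-sided p value)."""
    n = len(values)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            d = values[j] - values[i]
            s += (d > 0) - (d < 0)  # sign of each later-minus-earlier pair
    # Variance of S, corrected for groups of tied values.
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(c * (c - 1) * (2 * c + 5) for c in counts.values() if c > 1)
    var_s = (n * (n - 1) * (2 * n + 5) - tie_term) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    # Two-sided p value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return s, z, p

# Illustration: a steadily rising series and a series with no trend.
rising = [float(k) for k in range(20)]
flat = [1.0, 2.0] * 10
s_up, z_up, p_up = mann_kendall(rising)
s_flat, z_flat, p_flat = mann_kendall(flat)
print(p_up < 0.05, p_flat < 0.05)
```

A grid box would be marked as significant at the 95 % confidence level when its p value is below 0.05.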

We first discuss the annual mean trends. Positive trends are observed over several parts of India. The maximum positive trends (~2.5 mm/decade) are observed over the northeastern parts of India, and negative trends (~3 mm/decade) over the western Himalayas region. A significant negative trend of about 3 mm/decade is noticed in all seasons over the north central east region. Moderate negative trends are observed over the north-central and southern parts of India. Over the 107-year period, little change is noticed over the northwestern part of India. For seasonal rainfall, negative trends are observed over several parts of the country in winter, except the northeastern part of India. Northeast and interior peninsular India experienced an increasing trend in the summer monsoon, while two other regions (western Himalayas and north central east) experienced a decreasing trend. During the monsoon season, rainfall shows clear positive and negative trend variations over the Indian region. The western Himalayas show negative trends in all seasons except the post-monsoon season. A negative trend is also observed in the southern parts of India in all seasons. From Fig. 10, positive trends are clearly observed over the northeastern (~2.75 mm/decade, monsoon) and west coast parts of India, and clear negative trends over the western Himalayas (~4.7 mm/decade, monsoon), north central east (~3.6 mm/decade), and southern (~1.35 mm/decade, post-monsoon) parts of India. Sinha Ray and Srivastava (1999) reported that the monsoon season shows an increasing trend over certain parts of the country, whereas a decreasing trend has been observed during the winter and pre-monsoon seasons. Previous studies carried out trend analyses by region (Mooley and Parthasarathy 1984; Thapliyal and Kulshreshtha 1991; Lal 2001). Here, for the first time over the Indian region, the spatial trends of rainfall are shown with high-resolution (1° × 1°) long-term gridded datasets.

5 Conclusions

The characteristics of Indian rainfall are explored using observational (IMD, GPCP and APHRODITE) and reanalysis (CFSR, ERA-Interim, JRA-25, and NASA-MERRA) datasets. The spatial distribution of rainfall over India and its validation against observational and reanalysis datasets have been studied. In addition, annual and seasonal (pre-monsoon, summer monsoon, post-monsoon and winter) trends were analyzed using the robust regression technique. To help synthesize the results, the field significance of the positive and negative trends is assessed across the Indian region, which indicates the increasing and decreasing tendencies of Indian rainfall at regional scales. Thus, we have shown the spatial distribution of rainfall trends with the 95 % confidence level at each grid point. The spatial trend maps generated in this study could be used to anticipate future changes in water resource availability and, to some extent, agricultural production across the country. The most important results are summarized below.

  1. The monthly mean precipitation from the IMD data shows a maximum of about 499.7 mm/month over the west coast region, followed by the northeast region (473.5 mm/month), during the month of July. The minimum precipitation, about 5 mm/month, is observed in February in the west coast region. Mean annual rainfall over India shows high inter-annual variability during the period 1901–2007.

  2. The spatial patterns of mean annual and summer monsoon precipitation are similar for the IMD and GPCP datasets, but the APHRODITE data show more deviation. These differences may be attributed partially to the different resolutions of the observations.

  3. Among the reanalysis datasets, ERA-Interim shows the most realistic values relative to those observed by IMD, followed by the CFSR, MERRA and JRA-25 reanalyses. The JRA-25 data heavily underestimate rainfall during the pre-monsoon season. These differences are found only in the spatial distribution, whereas the monthly mean values show reasonably good agreement. Moreover, very high correlations are observed between IMD and the reanalysis datasets (see Fig. 7).

  4. The strong annual cycle over the Indian sub-continent is reasonably well captured by the IMD and model datasets. However, CFSR is slightly drier, and JRA-25 and ERA-Interim are slightly wetter, than the IMD precipitation. The correlations between IMD and the observational and reanalysis datasets reveal that JRA-25 shows the highest correlation and CFSR a lower correlation.

  5. The spatial trends were computed over India for the four seasons at each grid point (1° × 1°) using the robust regression technique. The significance of the trends was assessed at the 95 % level using the non-parametric Mann–Kendall test. The significance analysis helps to locate the regions where rainfall is either increasing or decreasing.

  6. The annual and seasonal trends were derived at each grid point from the IMD precipitation during the period from 1901 to 2007. The maximum positive trend is noticed over the northeastern part of India in all seasons, whereas a decreasing trend is observed in the western Himalayas, north central east, and southern parts of India. In addition, moderate positive trends are observed over the west coast and interior peninsula during the monsoon and post-monsoon seasons.

The good agreement between IMD and the other observational datasets suggests that IMD measurements can be used to describe the spatial and temporal variability of precipitation. This study characterizes Indian rainfall using observational and reanalysis datasets. The results presented in this study provide useful insights for the climate modeling community to establish appropriate benchmarks for model evaluation.