1 Introduction

The Indian subcontinent is one of the most densely populated regions of the world. The region's socioeconomic development is significantly influenced by rainfall, and rainfed crops constitute a major part of the agriculture sector (Ashrit et al. 2020). Rainfall is an essential part of various hydrological and meteorological scenarios over the area. Spatiotemporal rainfall patterns play a crucial role in energy cycles that rely on land coupling and atmosphere (Chakraborty et al. 2015). Therefore, the availability of precise or reanalysed rainfall datasets is a crucial prerequisite for monitoring weather variabilities, modelling natural phenomena, forecasting and hydrometeorological studies.

Most accurate observations can be obtained from gauge measurements, but the scarcity of gauge stations remains a challenge. The sparsity of gauge stations over remote areas and oceans hinders in prediction and forecasting of climatic variations in those regions. Reanalysis datasets are characterised by the combined effect of empirical and satellite datasets to provide the most recent gridded atmospheric conditions for a certain time period (Blacutt et al. 2015). The precision of reanalysis datasets depends on the data assimilation scheme and the underlying model. Reanalysis products provide high-quality, long-term information about climate variability and atmospheric circulations. With the aid of a stable and individual data assimilation technique, the modelling system can combine limited and varying observations producing gridded coherent meteorological data (Rienecker et al. 2011). Over the past few decades, further developments of reanalysis datasets and their analysis have been extensively carried out (Kalnay et al. 1996; Saha et al. 2010; Dee et al. 2011; Maussion et al. 2014). The establishment of various global and regional reanalysis datasets paved the way for new high-resolution datasets that can provide insight into different atmospheric variabilities (Fortelius et al. 2002; Mesinger et al. 2006; Onogi et al. 2007; Kobayashi et al. 2015; Bollmeyer et al. 2015). Due to the requirement of enormous computational power, labour, and storage, global reanalysis products are produced with coarse grid resolutions. This moderate resolution of the global reanalysis product impedes its application in various mesoscale processes and prediction of climatic extremes which necessitates high spatial resolution. A regional reanalysis product employs a high-resolution model with restricted spatial coverage with boundary conditions and initial conditions derived from global reanalysis and can evaluate regional synoptic incidents more precisely (Dahlgren et al. 2016). National Centre for Medium Range Weather Forecasting (NCMRWF), Ministry of Earth Sciences, India, recently released two high-resolution reanalysis datasets, namely Indian Monsoon Data Assimilation and Analysis (IMDAA) and NCMRWF Global Forecast System model (NGFS). With the help of these reanalysis products, high-resolution datasets can be made available. Another positive aspect is a better understanding of extreme events and monsoon patterns at finer scales, particularly over India. IMDAA reanalysis product is developed for a limited area to produce a high-resolution (12 km) dataset rather than global reanalyses. IMDAA is better at quantifying rainfall in the past period and predicting future trends than NGFS (Mahmood et al. 2018).

Spatial aggregations of India show an adverse increase in extreme precipitation events, mainly during the monsoon season (Ghosh et al. 2016). About 80% of the overall rainfall received by India is during the monsoon season, i.e., June to September. Climatic changes including global warming, greenhouse gas emissions and land cover increase due to urbanisation inclining the chances of occurrence of extreme events as shown by the study in (Kharin et al. 2007). Every 1 °C of warming results in an increase in the water-holding capacity of the atmosphere, by 6–7% which leads to an increase in extreme rainfall (Trenberth 2005). The urbanisation rate of India in 2020 is estimated as 34.92%, out of which the South Peninsular Region’s contribution is very high. An increase in the number of extreme events during 1901–2010 in many urban cities indicated the influence of urbanisation on the occurrence of such events (Ali et al. 2014). Extreme rainfall events associated with the water vapour transport from the Arabian Sea and the Indian Ocean caused floods in Kerala in 2018 (Lyngwa and Nayak 2021). Similarly, many states of South Peninsular India are facing havoc due to extreme rainfall events. Explanations of spatiotemporal trends in extremes, mean precipitation and other hydrometeorological factors remain challenging because of the uncertainties in the multi-model observations and disagreement in data assimilation strategies in reanalysis products (Zhang et al. 2017). Henceforth, an investigation of the reanalysis products in estimating extreme precipitation scenarios and quantifying the rainfall amount received by the area is essential for societal growth and prediction of future trends in the spatial variability of precipitation and other factors. A limited number of studies have been conducted to analyse the performance of IMDAA over India; notably, no studies were conducted to study the rainfall occurrences and extreme events over South Peninsular India (SPI) using IMDAA. In (Aggarwal et al. 2022), an investigation of IMDAA’s ability to characterise the Indian Summer Monsoon (ISM) was carried out over Northwest India for the years 1979–2018. The results showed the capability of IMDAA in realistically representing the ISM features and the linkage between moisture availability and convective precipitation formation. For the year 2018, a study was conducted to assess the ability of NCMRWF in predicting ISM for the months of June to September revealing the qualitative reliability of IMDAA in forecasting precipitation and zonal winds (Chakraborty et al. 2021). Precipitation forecast analysis was done with respect to Integrated Multi-satellite Retrievals for Global Precipitation Measurement (IMERG GPM), predictive skills were calculated based on Tropical Rainfall Measuring Mission (TRMM) observations and skills for zonal winds forecasts were computed using European Centre for Medium-Range Weather Forecasts Reanalysis (ERA-interim). A study over Western Himalaya region depicts that IMDAA is efficient in demonstrating heavy rainfalls as well as characteristics of winter precipitation at seasonal, diurnal and interannual scales associated with western disturbances. The study also implied that IMDAA is coherent in exhibiting spatial patterns comparing to gauge based and satellite products, even in higher magnitudes due to its high resolution (Sharma et al. 2022). Another intercomparison research on gridded rainfall datasets of nine global datasets over Sri Lanka implies that IMDAA well identified rainfall patterns and suitable for hydrological applications (Bandara et al. 2022). Significance of high-resolution datasets is emphasized in a study over Western Himalayas, in which, authors analysed winter seasonal precipitation over the area using different satellite (IMERG), observational (IMD) and reanalysis products (European Centre for Medium-Range Weather Forecasts (ERA5), IMDAA). Evaluation also revealed the efficiency of using IMDAA precipitation data in moisture transport, cloud cover, upper tropospheric circulations and surface temperature simulations (Punde et al. 2022).

This study focuses on the evaluation of the high-resolution reanalysis dataset, IMDAA during the years 2000–2020 in estimating precipitation characteristics and extreme events over SPI. The research study is categorised into the following details:

  1. 1.

    Spatiotemporal variability of daily average precipitation amounts on annual and seasonal scale is carried out using IMDAA with Indian Meteorological Department (IMD) daily gridded dataset as the benchmark.

  2. 2.

    Reliability of IMDAA is further emphasised by statistically analysing IMDAA in capturing precipitation using different agreement and categorical indices.

  3. 3.

    Spatial analysis over the grid points showing significant differences in percentile values of precipitation is done using a new approach known as Mahalanobis distance. Mahalanobis distance is preferred for multivariate comparison since a multivariate system is converted to a univariate system in the process and it exhibits sensitivity in intervariable changes in the system (Mahalanobis 1930). Also, a study conducted over San Cristobal, Venezuela concluded that Mahalanobis distance is better than Euclidean distance and Manhattan distance in detecting atypical observations in monthly precipitation series (Rivas et al. 2020).

  4. 4.

    Further, a statewise analysis of extreme rainfall episodes detection by IMDAA and extreme event prediction using extreme value distribution is carried out.

  5. 5.

    ENSO (El Niño-Southern Oscillation) climatic phenomenon is used to analyse its resonance with the extreme years using both IMDAA and IMD standardised precipitation data.

The paper is organised as follows: Sect. 2 describes the data and climatology of the site; Sect. 3 consists of the methodology of the research done; the results are discussed in Sect. 4; Summary and main conclusions of the study are presented in Sect. 5.

2 Data and climatology of the site

2.1 Data

Indian Monsoon Data Assimilation and Analysis (IMDAA) reanalysis datasets of 12 km resolution are downloaded from the NCMRWF database from 2000 to 2020 (Rani et al. 2021). This high-resolution reanalysis dataset is comprehensively analysed using the Indian Meteorological Department (IMD) observations for the same period (Pai et al. 2014).

2.2 IMD precipitation data

Indian Meteorological Department (IMD) provides 0.25\(^\circ \) × 0.25\(^\circ \) daily gridded precipitation datasets over India. This agency provides datasets based on ground-based gauge stations across the country. Around 6955 gauge stations are utilised to estimate the precipitation over the region. These 6955 gauge stations include observations from 547 IMD observatory stations, 494 hydro-meteorology observatories, 74 Agromet observatories, and the rest are rainfall stations maintained by the government (Rajeevan et al. 2008). Due to the topography and complex hilly terrains, there is a paucity of gauge stations in the country, mainly in the northernmost regions of the country. There is a non-uniform distribution of the gauge stations in the country, and the Inverse Distance Weighted interpolation (IDW) method was implemented to reduce this effect of nonuniformity and temporal unavailability of gauge stations (Rajeevan et al. 2006). The precipitation data from 2000 to 2020 downloaded from its official website are used as a benchmark in assessing the reanalysis dataset.

2.3 IMDAA precipitation data

IMDAA is a high-resolution reanalysis dataset implemented in the combined endeavour of the National Centre for Medium Range Weather Forecasting (NCMRWF), India and the Met Office (MO), UK, in association with the IMD as part of the National Monsoon Mission (NMM) project of the Ministry of Earth Sciences, Government of India (Shepard 1968). It is the foremost reanalysis dataset produced by NCMRWF with a horizontal resolution of 12 km spanning south Asia with neighbouring regions to capture all the necessary variables affecting the climatic variations or occurrence of the monsoon season. The database consists of hourly, three hourly and six hourly datasets of different observed variables, such as precipitation, moisture, temperature, and humidity, and non-observed variables, such as cloud cover and fluxes, for the period of 1979–2022 at present. This mesoscale version is built on a four-dimensional assimilation method and with 63 vertical levels leading to a height of 40 km. The precipitation data are downloaded from RDS NCMRWF repository.

2.4 Climatology of SPI

Assessment of reanalysis product over a region necessitates the knowledge of climatology of the study region. The performance of each precipitation product differs from area to area due to topological and climatic characteristics. This study focuses on the precipitation characteristics over the South Peninsular region of India as shown in Fig. 1 using IMDAA and IMD.

Fig. 1
figure 1

Map of South Peninsular India

SPI is a triangle-shaped landmass surrounded by the Bay of Bengal on the east, the Arabian Sea on the west and the Indian Ocean on the south. SPI consists of five states: Kerala, Karnataka, Tamil Nadu, Andhra Pradesh, and Telangana.

Along with the sea, ocean and other water resources, the Western Ghats and the Eastern Ghats present in the borders play a significant role in the wind circulations. The circulations transport moisture content across the country, leading to monsoon and several precipitations phenomena. These moisture contents from the Arabian sea move towards the Narmada and Tapi rivers causing precipitation in areas of central India, and the combined flow of moisture content from the Indian Ocean and the Bay of Bengal leads to widespread rain in the northeast parts of India (Singh et al. 2021). Hence, an analysis related to SPI is critical in the context of climatology because of the area’s topography.

3 Methodology

Precipitation datasets for 21 years from 2000 to 2020 are utilised in this study. Total hourly precipitation estimated by IMDAA during the analysis time is extracted from NCMRWF official web page (Tan and Santo 2018). Daily precipitation data over India for the study period are downloaded from IMD website. The resolutions of the two datasets used here are different. So, the first step was to regrid the reanalysis dataset to a horizontal resolution of 0.25 \(^\circ \). After interpolating the datasets to the same resolution, precipitation over SPI is obtained. A comparison of the estimation capacity of the reanalysis dataset with the observation dataset is done by examining seasonal and interannual variability. Based on the climatology of SPI, the data are classified into four seasons: (1) Winter (January–February), (2) Pre-monsoon (March–May), (3) Monsoon (June–September), and (4) Post-monsoon (October–December). For the comprehensive assessment of the reanalysis product and the observation datasets, different statistical indices are made use of.

3.1 Assessment indices

Statistical indices are calculated over 810 grid points over the land within the geographical boundaries of the states representing SPI spanning 7\(^\circ \) N to 22\(^\circ \) N to 73.5\(^\circ \) E to 84.25\(^\circ \) E for the observation period. The performance of IMDAA is analysed at annual, monthly, and seasonal scales. Here, \({O}_{i}\) and \({S}_{i}\) is the average precipitation obtained by IMD and IMDAA, respectively, over the area, \(\overline{O }\) and \(\overline{S }\) is the mean of \({O}_{i}\) and \({S}_{i}\), respectively, \(n\) is the number of samples corresponding to space and time \(rd\) and \(cd\) are the standard deviations of IMD and IMDAA values, respectively. A detailed description of the statistical indices used is given in Table 1.

Table 1 Description of statistical indices

Rather than the above-mentioned metrics, Kling Gupta Efficiency (KGE) takes into account the pattern similarity, variance and error, while Nash Sutcliffe Efficiency Coefficient (NSE) gives the index value totally based on residuals. Hence, these two indexes altogether give different scores and are important to make any decision regarding the best model selection. The details of these two metrics are given as follows:

Kling Gupta Efficiency (KGE): A measure to show the degree of fit and helps to understand the relationship between variability, correlation, standard deviation, bias, etc.

$$ {\text{KGE}} = 1 - \sqrt {\left( {CC - 1} \right)^{2} + \left( {\frac{cd}{{rd}} - 1} \right)^{2} + \left( {\frac{{\overline{S}}}{{\overline{O}}} - 1} \right)^{2} } $$
(1)

Nash Sutcliffe Efficiency Coefficient (NSE): A statistic that establishes the magnitude of variability between the variances in observed and reanalysis data. Its value shows how strongly a 1:1 line fits in the observed versus reanalysis graph. The optimal value of NSE is 1 which implies an ideal model.

$$ {\text{NSE}} = 1 - \frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - S_{i} } \right)^{2} }}{{\mathop \sum \nolimits_{i = 1}^{n} \left( {O_{i} - \overline{O}} \right)^{2} }} $$
(2)

Moreover, seasonal and interannual variability of data is compared using spatial distributions and the significant difference between average precipitation during different seasons and years is analysed using the Student’s t test with the null hypothesis that there is no difference in the mean precipitation of IMD and IMDAA estimation.

Student’s t test is a statistical method of testing how significant the differences can be between two groups which are drawn from a normally distributed population where the standard deviation is unknown.

$$ {\text{Test}}\,{\text{statistic}},\quad t = \frac{{\left| {O_{i} - S_{i} } \right|}}{{D\sqrt {\frac{1}{{n_{1} }} + \frac{1}{{n_{2} }}} }} $$
(3)

where \({n}_{1}\) and \({n}_{2}\) are sample sizes of IMD and IMDAA, but here \({n}_{1}\) and \({n}_{2}\) are same, \(D\) is the pooled standard deviation of IMD and IMDAA.

3.2 Mahalanobis distance

Other than the usual distance metric calculated for single mean variables, a new approach is utilised which makes use of a multivariate distance metric known as Mahalanobis distance to find out if there exists any significant distance between IMD observations and reanalysis product datasets across seasonal and annual percentiles of precipitation of each year corresponding to each grid point.

Mahalanobis distance (MD) is an approach which provides a covariance distance between data that helps to detect anomalies and similarities of unidentified samples effectively (Mimmack et al. 2001). It gives distance between centroids of two or more groups to get an idea about the divergence of characteristics associated with the groups (Matcharashvili et al. 2017). Rather than Euclidean distance, it addresses different groups based on the correlation of data points and all variables are equally characterised by giving less weightage to highly correlated variables (Kishore et al. 2016). Other important properties of Mahalanobis distances include their relationship to multidimensional scaling and the log-likelihood of multivariate normal distributions. Mahalanobis distance is used to produce univariate distance for multivariate groups with various parameters. Mahalanobis distance is given by the formula:

$$ MD^{2} = \left( {x - y} \right)^{T} P^{ - 1} \left( {\overline{x} - \overline{y}} \right) $$
(4)

where \(\overline{x}\) and \(\overline{y}\) represent sample means with sizes \(m\) and \(n\), respectively, \(T\) is the transpose operation and \(P\) is the covariance matrix:

$$ P = \frac{{\left( {\left( {m - 1} \right)P_{1} + \left( {n - 1} \right)P_{2} } \right)}}{m + n - 2} $$
(5)

where \(P_{i}\)’s denote the covariance matrices of the corresponding groups.

As the MD value gets smaller, the variables tend to fall into the same group or it implies that the groups possess similar characteristics. After the calculation of MD values, Hotelling’s \({T}^{2}\) test is used to calculate the significant difference in the precipitation values. The \(T\) statistic is given by:

$$ F = \frac{m + n - p - 1}{{\left( {m + n - 2} \right)p}}MD^{2} \sim F_{p,m + n - p - 1} $$
(6)

where \(p\) is the number of variables used. The significance test is undergone with the assumption that there is no discrepancy in the performance parameters of the groups as the null hypothesis. If \(F>{F}_{p,m+n-p-1},\) then the null hypothesis is rejected implying the existence of a significant difference between the group parameters.

3.3 Performance diagram

Performance diagram is used to analyse the detection capability of IMDAA in seasonal, annual and percentile scales. The performance diagram plots four forecast statistics such as Probability of Detection (POD), Success Ratio (SR), Critical Success Index (CSI) and bias simultaneously. Points close to the upper right corner of the graph show the best efficiency in forecasting. POD is the probability with which a product will detect the rainfall correctly (Haile et al. 2016). CSI gives the capability of the product in estimating precipitation and SR is the inverse of the False Alarm Ratio (FAR) which gives the ratio of wrongly detected occurrences.

$$ {\text{POD}} = \frac{H}{H + M} $$
(7)
$$ {\text{FAR}} = \frac{F}{F + M} $$
(8)
$$ {\text{CSI}} = \frac{H}{H + M + F} = \frac{1}{{\frac{1}{{\left( {1 - {\text{FAR}}} \right)}} + \frac{1}{{{\text{POD}}}} - 1}} $$
(9)

where \(H\) is the number of hits, \(M \) is the number of misses and \(F\) is the number of false alarms (Tang et al. 2020).

3.4 Extreme value distribution

The occurrence of extreme precipitation events over SPI has increased during the past few decades. The geographical conditions of SPI with the ocean surrounding the landmass on three sides lead to the development of thermal and pressure gradients increasing the chances of formation of hydrometeorological extremes more often (Baki et al. 2021). Extreme flood events are causative of these extreme precipitations. During the study period of 2000–2020, the states of SPI were prone to severe flooding events that caused catastrophic destructions of agricultural, economic, and societal aspects. The prediction of these disastrous extreme events attributed to variations in several dynamical parameters is a need. Spatial distributions of extreme rainfall episodes of each state detected by IMDAA are analysed and compared against the IMD observation dataset. Not only the detection or estimation of precipitation is needed, but a reanalysis dataset must also be capable of detailing small variations in precipitation amounts in calibrating future extremes. An extreme value distribution was used to give an insight into the prediction capability of IMDAA. Extreme value Type 1 distribution was used for finding the large extremes of the data (Phien 1987).

General representation of the probability density function of the Gumbel (maximum) distribution is given by:

$$ f\left( x \right) = \frac{1}{\beta }e^{{ - \frac{x - \mu }{\beta }}} e^{{e^{{ - \frac{x - \mu }{\beta }}} }} $$
(10)

where \(x\) is the precipitation data, \(\mu \) is the location parameter and \(\beta \) is the scale parameter. For the standard Gumbel distribution \(\mu \) = 0 and \(\beta \) = 1, lead to the following formula:

$$ f\left( x \right) = e^{ - x} e^{{ - e^{ - x} }} $$
(11)

The spatial distribution of grid points showing extreme precipitation amounts during the observed extreme rainfall episodes was compared based on the IMD observations.

Gumbel distribution is widely used to predict extreme values of different phenomena such as wind speeds, flood frequency, and extreme event prediction. Extreme value prediction can be done using different distribution such as Normal, Log-Normal, Weibull, Gamma, and Gumbel. The best fit distribution differs from data to data. Hence, goodness of fit measures such as the Kolmogorov–Smirnov (K–S) Test, Anderson–Darling (A–D) Test and Root Mean Square Error (RMSE) were used to determine the best fit distribution. Gumbel distribution resulted in the best fit for the data and in predicting the extremes. Also Gumbel distribution seems to be better in predicting extremes using IMD rainfall data as seen in studies related to rainfall intensity duration frequency curve over Bangladesh (Rasel and Islam 2015), intensity duration frequency duration with maximum rainfall (Barrie and Scott 2021) and the distribution shows reliable results in flood forecasting, extreme events prediction, estimate river discharge or flow (Solomon and Prince 2013; Meeyaem and Polpinit 2014; Mamman et al. 2017; Bhagat 2017; Naz et al. 2019).

4 Results and discussion

As the primary analysis, the annual cycle of IMDAA and IMD is shown in Fig. 2. It can be noticed that daily average rainfall given by IMDAA is in close correlation with IMD rainfall data. However, a little exaggeration of the data is seen for IMDAA. A broad overview of the similar behaviour of average daily data can be seen from the figure. It is clear that IMDAA and IMD data share same patterns of crest and trough throughout the daily average of the 21 years of study. However, to guarantee the precision of IMDAA data in all respects, particularly in grid-based and extreme event prediction, an analysis in higher level skills is important. The following subsections are devoted to spatiotemporal analysis, variability assessment using different metrics and research on reliability of the data in predicting extremes.

Fig. 2
figure 2

Daily average precipitation estimated by IMDAA and IMD

4.1 Interannual variability of precipitation between IMDAA and IMD

The behavioural patterns of the reanalysis product throughout the years in comparison to the IMD observations over SPI are studied. Here, the findings reveal that during monsoon season, the estimated rainfall data are more spread out which can be observed from the standard deviation (SD) values of IMD (SD = 4.49 mm/day) and IMDAA (SD = 4.72 mm/day), whereas the data are clustered around the average values during pre-monsoon (SD of IMD = 0.8 mm/day, SD of IMDAA = 1.28 mm/day) and winter seasons(SD of IMD 0.2 mm/day, SD of IMDAA = 0.31 mm/day). Figure 3 shows interannual variability of average precipitation estimated by IMDAA and IMD over the years 2000–2020 during different seasons and total yearly precipitation. Annual precipitation of IMDAA is comparable with IMD but with an overestimation by IMDAA. Figure 3a, d shows a common pattern indicating average rainfall estimation of IMDAA and IMD is closely related during monsoon season and overall annual precipitation values. As depicted in Fig. 3b, e, Winter and post-monsoon season show similar characteristics implying that IMDAA has well-captured the rainfall during these seasons on overall seasonal scale. Out of these seasonal and annual estimations, Fig. 3c exhibits the highest deviation of precipitation amount obtained by IMDAA when compared to IMD. That is, during the pre-monsoon period, a high level of overestimation by IMDAA occurred during these 21 years.

Fig. 3
figure 3

Year to year variation of average daily rainfall on annual and seasonal scale, a Annual, b Winter, c Pre-monsoon, d Monsoon and e Post-monsoon

4.2 Daily climatology comparison using scatter plots

Scatter plots define the correlation between two quantities with respect to the direction in which the points are scattered and the closeness of the points. If the data points are closely correlated, the scatter plot will show a straight line fit, and the direction of correlation depends on whether the line possesses a positive or negative slope.

Here, Fig. 4 presents scatter plots of average daily precipitation amount over SPI for 21 years during different seasons and the whole year. For a comprehensive analysis of the correlation, daily rainfall data of all grid points of SPI from both IMD and IMDAA for annual and respective seasons are pooled. All the seasonal and annual scatter plots show a positive correlation between IMDAA and IMD. In the scatter plots of annual (CC = 0.8018) and monsoon (CC = 0.8196) seasons, IMDAA displays a high correlation with the IMD observations. The winter and post-monsoon seasonal rainfall are dominant over some parts of SPI with significant interannual variability. The dynamical variation in the rainfall amounts during these seasons is causing anomalies in the correlation values. As inferred from Fig. 4b, e, the correlation values are comparatively less in the case of winter (CC = 0.3294) and post-monsoon (CC = 0.69315) seasons. A correlation coefficient of 0.7315 indicates that pre-monsoon seasonal IMDAA data shows a moderate level of correlation with the IMD daily grid wise precipitation data. An additional information about the correlational relationship can be derived from the best fit trend line and the corresponding equation of the line which is included in the figure.

Fig. 4
figure 4

Scatter plot of average daily precipitation estimated by IMD versus IMDAA during a annual, b winter, c pre-monsoon, d monsoon and e post-monsoon

4.3 Spatial variability of IMDAA and IMD on annual and seasonal range

Spatiotemporal patterns of precipitation estimation resulting from a reanalysis product are an important analysis technique in establishing many hydrometeorological parameters.

Figure 5 depicts spatial distributions of grid points showing significant difference in annual and seasonal precipitation between IMDAA and IMD across the landmass during 2000–2020. Different colours in the distribution represent grid points which are manifesting notable positive or negative changes in mean precipitation amount when compared to the observation dataset.

Fig. 5
figure 5

Spatial distribution of significantly different grid points of seasonal and annual daily average precipitation, a annual, b winter, c pre-monsoon, d monsoon, e post-monsoon

Student’s t test was used to analyse the extent to which there is a significant difference between IMDAA precipitation values and the observation dataset. For the 810 data points, a t test was done between 21 years’ annual and seasonal precipitation values. The test resulted in zero for the grid points showing the same rainfall value as that of IMD during the years 2000–2020 period and the statistic value is one for grid points exhibiting a difference in precipitation.

In Fig. 5a, the spatial distribution shows that over the Telangana and most of the areas of Andhra Pradesh, there is a significant positive difference of 0.5–1 mm/day. It is evident that there is an overestimation by IMDAA over some grids of Karnataka over which an annual rainfall of 3 mm/day is received. Along the coastal sides of the peninsular region, IMDAA shows good agreement with the IMD observations and is having average precipitation of 10–12 mm/day. Additionally, along the coastal sides of SPI, an underestimation of IMDAA annual rainfall can be seen. Furthermore, IMDAA overestimated the amount of rainfall in most areas of SPI during the winter by 0–1 mm/day.

Pre-monsoon rainfall predicted by IMDAA is analogous to IMD as the number grids indicating significant difference is very less and an overestimation is only seen in relatively few number of grids along the coastal regions Andhra Pradesh, Karnataka and Kerala as shown in Fig. 5c. Precipitation during monsoon season estimated by IMDAA demonstrate a great homogeneity over all the parts of the landmass as given in Fig. 5d. The area received mean rainfall of 5.35 mm/day according to IMD and 6.43 mm/day based on IMDAA during the monsoon period from 2000 to 2020. A mixture of over and under estimations of IMDAA rainfall can be seen with a concentration of significantly different grids over Telangana. For the post-monsoon season, IMD and IMDAA estimated 2–3 mm/day along the northwest sides of Karnataka and over Telangana, 7–8 mm/day over Kerala and some regions of Fig. 5e indicate that IMDAA rainfall during post-monsoon season shows significant difference in the northern states of the region, especially overestimation in the range of 0–0.5 mm/day over Telangana and Karnataka. Underestimation of the rainfall amount predicted by IMDAA can be visualised through the mean difference value which is in the range of 0–1.5 mm/day.

Quantitatively, the estimations of IMDAA are comparable with IMD observations excluding overestimations in the course of some seasons over particular areas of the landmass. Hence, during 2000–2020, IMDAA well-captured the precipitation for all the seasons and overall times of the years.

The performance of IMDAA in correctly extracting precipitation amounts for each grid point in different years shows differences in several grid points during winter and post-monsoon seasons. But for the monsoon and pre-monsoon seasons, the spatial distribution follows a pattern in which there is a larger number of white grids than shaded grids indicating the efficiency of IMDAA.

4.4 Comparison using statistical metrics

Apart from the visual representation of the homogeneity of the two products, a statistical evaluation gives a better interpretation of the performance of the reanalysis product. Tables 2 and 3 illustrate the values of various statistical measures to predict the reliability of the reanalysis product.

Table 2 Statistical indices results for monthly analysis
Table 3 Statistical indices results for seasonal and annual analysis

Monthly, seasonal, and annual variations of statistical indices are observed for IMDAA with an observation dataset from IMD across the SPI region during the span of 2000–2020. CC values are high for all the months indicating a good linear relationship between reanalysis and observation. RMSE and MAE show a similar trend moving from January to December. RMSE and MAE values are lowest for the month of January (~ 3–4 mm/month), from February to June there is a linear increase in RMSE and MAE values i.e., ~ 5.5–60 mm/month and a sudden trend of decreasing RMSE and MAE values can be seen from July to December with approximately in the range of 35–7 mm/month. Relative Bias is also showing a similar pattern to that of RMSE and MAE values. Index of Agreement values is close to one, indicating a good agreement over all twelve months even though there are some differences in the performance of IMDAA in each month as indicated by other statistics. This may be because of the sensitivity of the index due to the presence of extreme values leading to varying squared differences.

Table 3 gives statistical values for seasonal and annual scales. A similar pattern is shown by the seasonal values also. That is, the NSE, KGE and CC values are the smallest and have a high relative bias for the pre-monsoon (March–May) season. Monsoon and annual periods have a negative NSE but a lower relative bias indicating the efficiency of IMDAA. The RMSE and MAE values are very high in the case of monsoon and annual since these are calculated on a seasonal (mm/season) and annual (mm/year) basis in which the total precipitation amount is very high. Overall results show that IMDAA estimations are in close relation to IMD based on all the statistical indices even though some months are showing some discrepancies.

4.5 Geometrical evaluation using performance diagrams

Evaluation of performance of IMDAA is done using a geometrical relationship representation approach known as performance diagram. Performance diagram can depict the relationship between four important forecast metrics such as Probability of Detection (POD), Bias, Success Ratio (SR) and Critical Success Index (CSI). Percentiles of data are normalised to find out the hits and misses by fixing a threshold of zero. Hit occurs when both observation and reanalysis data behave analogously, that is, when observation is greater than zero, reanalysis is also greater than zero or vice versa. Misses are defined as a condition where observation and predicted data behaves oppositely.

Figure 6a indicates performance diagram of IMDAA during seasonal and annual scales where the data are taken yearly over areal average of 810 grid points and temporal average corresponding to seasonal or annual basis. In all the seasonal and annual data, IMDAA is showing good performance. IMDAA performs best during pre-monsoon season where the POD and SR values are close to one, winter and post-monsoon values are coinciding, annual and monsoon season plots are varying similarly. Figure 6b shows performance of IMDAA in the case of daily data for annual and seasonal scale. Prediction during annual (POD = 0.98 and SR = 0.98) and post-monsoon (POD = 0.98 and SR = 0.99) season is relatively satisfactory with respect to other seasons in daily case. POD and CSI of winter (POD = 0.87) and monsoon (POD = 0.84) are less than other points. Spatial scale comparison of annual and seasonal data is shown in Fig. 6c. IMDAA works better in monsoon season, which is having high POD (0.95), CSI, and SR (0.97) values. But annual, pre-monsoon and post-monsoon seasons show similar properties. Out of the five forecast values, IMDAA during winter season (POD = 0.82 and SR = 0.88) shows relatively worst performance. Moreover, the mean values over yearly, spatial, and daily scale data, an analysis was carried out to evaluate the reliability using percentile values. Performance diagram of 90th percentile (Fig. 6d), 95th percentile (Fig. 6e) and 99th percentile (Fig. 6f) resulted in the excellence of detection of precipitation by IMDAA during monsoon season with large POD and SR values. 90th and 95th percentile performance plots exhibit similar characteristics with annual( POD for 90th = 0.90, SR for 90th = 0.91 and POD for 95th = 0.88, SR for 95th = 0.93), pre-monsoon (POD for 90th = 0.91,SR for 90th = 0.94 and POD for 95th = 0.89, SR for 95th = 0.93), monsoon (POD for 90th = 0.93, SR for 90th 0.97 and POD for 95th = 0.95, SR for 95th = 0.97) and post-monsoon(POD for 90th = 0.96, SR for 90th = 0.93, POD for 95th = 0.92, SR for 95th = 0.91) seasons. Working of IMDAA during the winter season is relatively bad when compared to other seasons as seen in all the three percentile diagrams. Analysing all the performance diagrams, it can be concluded that prediction by IMDAA is close enough in all aspects, whether it is annual, daily, spatial or percentiles indicating the robustness of IMDAA in capturing precipitation occurrences.

Fig. 6
figure 6

Performance diagrams of IMDAA with respect to a annual variation, b daily variation, c spatial variation, d 90th percentile variation, e 95th percentile variation, f 99th percentile variation

4.6 Mahalanobis distance metric for percentiles

The intensity of rainfall events can adversely affect agriculture, biodiversity, lives of people, constructions, etc. Extreme rainfall can cause flash floods leading to catastrophes. The frequency of such extreme events increases the chance of distorting the balance of the climatic conditions. Hence a study verifying the changes in percentiles of precipitation over the years 2000–2020 of each grid point rather than mean rainfall is also important. Rather than the usual Euclidean metric, a new technique of finding multivariate distance is used to evaluate the variations in percentiles of IMDAA and IMD. Corresponding to each grid point, the 90th, 95th, and 99th percentile of precipitation for 21 years are calculated. Two groups are formed relative to IMDAA and IMD datasets using these percentiles as the three variables. Mahalanobis distance was calculated between these groups in annual and seasonal cases.

Mahalanobis distance considers correlation between variables explaining the variability between groups by measuring distance between centroids of clusters of different groups. Average MD values over the area are smaller for winter (1.29), monsoon (3.09) and post-monsoon (2.64) seasons while annual (8.38) and pre-monsoon (6.45) shows distinguishably high MD values. With reference to the IMD, IMDAA shows distinct percentile characteristics during different seasons as depicted by the MD values.

After calculating the distance, Hotelling’s \({T}^{2}\) test was utilised to find the grid points with a significant difference in percentile values or which have high Mahalanobis distance. Spatial plots showing the grid points of significant variations are shown in Fig. 7. At 5% significance level, the annual percentile groups show the highest number of significantly different grids with account of 810 grids. Grid points spread out in the central regions of SPI show distinguishable variation in percentiles considering the yearly data (Fig. 7a). Telangana and different parts of Andhra Pradesh also contain grid points with percentile differences during the pre-monsoon season, as seen in Fig. 7c. 188 out of 810 grid points, mostly lying in Telangana and coastal regions of Andhra Pradesh shows significant difference during pre-monsoon season. Apart from the lower MD values, the similarity between IMDAA and IMD percentile groups for winter (1 grid), monsoon (19 grids) and post-monsoon (5 grids) seasons is again shown by the smaller number of significantly different grids. Indifferent behavioural patterns in percentiles are not displayed by IMDAA during monsoon and post-monsoon seasons as presented in Fig. 7d, e. In a nutshell, a comparison of IMDAA and IMD in terms of distance between groups dependent on percentiles resulted in the depiction of proficiency of reanalysis product in capturing precipitation to a great extent.

Fig. 7
figure 7

Spatial plot of grid points showing significant Mahalanobis distance during a annual, b winter, c pre-monsoon, d monsoon and e post-monsoon

4.7 Spatial distribution of extreme rainfall episodes over SPI

The agricultural and economic growth of India is largely dependent on rainfall received by the country. The occurrence of extreme rainfall events over the country is exceeding the normal frequency for the past few years. Regional variations in land cover, urbanization, global warming, deforestation, etc. affect the amount of rainfall received by the country. Low-pressure systems over the Bay of Bengal and the increase in cyclonic activity over the Arabian sea are also causing extreme events in many parts of the country (Chaluvadi et al. 2021). To avert the significant economic damage caused by such catastrophic events, an effective reanalysis dataset is needed.

This section examines the statewise assessment of severe rainfall incidents that occurred between 2000 and 2020. Analyses are done to see how well IMDAA and IMD can estimate supposedly severe occurrences that have already taken place.

Between the Arabian Sea to the west and the Western Ghats to the east sits Kerala, which covers 38,863 km2 of the total area of the mainland. Kerala has a humid marine climate with the monsoon season from June to August receiving an average of 3100–7030 mm annual rainfall. The state consists of land imposed by the Western Ghats composed of mountains, valleys, etc. About 44 rivers are present in Kerala, out of which 40 are originating in the Western Ghats. For the past few years, the frequency of flooding has increased in the state (Lal et al. 2020). The spatial distribution of average rainfall obtained by the area during extreme rainfall events from 6 to 18 August 2018 estimated by IMDAA and IMD is displayed in Fig. 8a, b. This extreme rainfall event caused havoc in the state with huge economic loss. Spatial distributions are plotted with the original resolutions of both datasets. That is, IMDAA is having 12 km resolution and IMD with 25 km resolution. IMDAA and IMD are exhibiting comparable spatial patterns from 6 August to 18 August 2018 with heavy rains in several areas of Kerala.

Fig. 8
figure 8figure 8

Spatial distribution of extreme rainfall episodes estimated by IMDAA over a Kerala, c Tamil Nadu, e Andhra Pradesh, g Telangana, i Karnataka, and IMD over b Kerala, d Tamil Nadu, f Andhra Pradesh, h Telangana, j Karnataka

Tamil Nadu is one of the largest states in India, having a total area of 130,058 km2. The Indian Ocean borders it on the south, the Bay of Bengal to the east, Kerala to the west, Karnataka to the northwest, and Andhra Pradesh to the north. Much of the rain-bearing clouds are obstructed by the Western Ghats, leading to variations in monsoon over Tamil Nadu, effectively causing droughts in some regions. But a distinct sudden rainfall is causing severe floods in Tamil Nadu. An extreme rainfall scenario is observed from 26th November to 2nd December 2015 (Singh et al. 2018). Figure 8c, d depicts spatial analysis on these rainy days. Results imply that IMDAA underestimated rain in some regions of Tamil Nadu as seen in the distribution image. Discrepancies between the IMDAA and IMD outcomes are seen in the figures where IMD showed heavy rainfall in coastal lines but IMDAA showed heavy rainfall in some grid points other than one shown by IMD.

Andhra Pradesh is located between latitudes 12° 41′ N and 19.07°′N and longitudes 77° E and 84° 40′ E. Andhra Pradesh has the second longest coastal line in the country and consists of major parts of the Eastern Ghats and Deccan Plateau. A flood occurred from 29th September to 2nd October 2009 due to heavy precipitation in the cause of a mesoscale system (Kumar et al. 2012). Figure 8e, f represents these rainfall episodes and shows how well IMDAA and IMD captured this event. IMD fails in capturing these events properly, whereas IMDAA shows higher precipitation values in Andhra Pradesh during these days.

Telangana is a semi-arid state located on Deccan Plateau. Godavari and Krishna are the two major rivers in the state. A yearly rainfall of 700–900 mm is received by the southern region and 900–1500 mm by the northern region of the state, with major influence by the summer monsoon season. About 250 mm of rainfall is received by the north-western parts of the state from 23–24 September 2016 (Boyaj et al. 2020). On 24 September 2016, a rainfall of 390 mm was reported at the station Armoor (18.9\(^\circ \) N, 78.29\(^\circ \) E). These rainfall episodes are accurately estimated by the two products, as seen in Fig. 8g, h.

Karnataka is in the plateau of convergence of the Western Ghats and the Eastern Ghats. The topographical features of Karnataka are diverse, with forests, mountains, coastal plains, and remnant hills. About 80% of the rainfall received by the state is during the southwest monsoon season. Depression in the Bay of Bengal caused heavy rainfall of around 224 mm in the state on 6–10 August 2019. It is evident from Fig. 8i, j that IMDAA and IMD well-captured the precipitation amount.

4.8 Extreme event prediction—utilisation of extreme value distribution

The accuracy of IMDAA in estimating the level of severe rainfall occurrences was examined in the preceding section. In this section, an extreme value distribution is utilised to predict extremes when given precipitation data. Extremes are predicted using precipitation data during each of the rainfall episodes mentioned in Sect. 4.7, so that the effectiveness of the data may be evaluated. Corresponding to each state’s extreme rainfall days, extreme value distribution was used to generate grid points illustrating the extremes on the specified days. Gumbel distribution was used to fit the data and examine the extreme grid points. On each rainfall episodes, distribution was utilised to find out the grid points showing extremes. For the five states, precipitation data predicted by IMDAA and IMD during extreme rainfall episodes happened in each state were observed. The observed rainfall during these episodes predicted by IMDAA and IMD were used to determine extremes, which in hand presented the difference between the reliability of IMDAA and IMD data in predicting extremes. The extreme grid points predicted by IMDAA and IMD are shown in Fig. 9.

Fig. 9
figure 9figure 9

Spatial plots of grid points with extreme rainfall events estimated using IMDAA over a Kerala, c Tamil Nadu, e Andhra Pradesh, g Telangana, i Karnataka, and IMD over b Kerala, d Tamil Nadu, f Andhra Pradesh, h Telangana, j Karnataka

In Fig. 9a, b, IMDAA is showing overall overestimation when compared to IMD, but over Kerala, IMDAA shows more grid points with extremes. Extreme valued grid points are similar for both IMDAA and IMD in the case of Tamil Nadu rainfall episodes (Fig. 9c, d). IMDAA and IMD are forecasting the same grid points with extreme value during the Andhra Pradesh extremes, with the exception of two or three grid points (Fig. 9e, f). IMDAA overestimated the number of extreme grid points but showed compatible results over Telangana (Fig. 9g, h).

A record amount of rainfall was received by Karnataka during 6–10 August 2019. Still, the extreme grid points detected using IMD data are unsatisfactory since no grid points exhibit extremes in Karnataka. In contrast, IMDAA spatial plot reveals most of the grid points which had extreme precipitation, especially over Karnataka. IMDAA precipitation data, as opposed to IMD precipitation data, accurately represents extremes, according to the findings. Also, IMDAA being 12 km resolution data available even on an hourly scale, utilisation of it for predicting extreme events stands to reason. The unreliable behaviour of IMD, since it failed to determine extreme grid points which was correctly predicted by IMDAA implies that usage of IMDAA helps in accurately predicting extremes which in turn helps assessing the spatiotemporal variability much better.

4.9 Teleconnection of ENSO with SPI rainfall

The interannual variability of tropical climate system is influenced by a large-scale phenomenon known as ENSO ( El Niño-Southern Oscillation). Research suggests that an inverse relationship exists between ENSO and Indian summer monsoon rainfall. El Niño is referred to as a situation where five consecutive three month moving average Oceanic Nino Index Average (ONIA) values exceed 0.5 °C (Azad and Rajeevan 2016).

An attempt is made to investigate the role of ENSO in the interannual variation of rainfall over SPI by determining how resonant the extreme years predicted using IMDAA with the ENSO signal. In general, El Niño events are associated with dry weather and La Niña occurrences are related to wet conditions. Figure 10 represents the interannual variability of standardised rainfall time series with depiction of El Niño and La Niña year’s corresponding to IMDAA and IMD from 2000–2020.

Fig. 10
figure 10

Interannual variability of standardised rainfall time series with El Niño and La Niña year’s corresponding to a IMDAA and b IMD from 2000 to 2020

El Niño years are shown in red and La Niña years are shown in blue colour. Fig. 10a, b implies that IMDAA and IMD behaves similarly considering the variability with respect to ENSO. El Niño and La Niña are the warm and cold phases of the climate patterns, respectively. In most of the El Niño years, normalised precipitation has got a negative value and for the La Niña years, a positive value can be seen which is in tune with the characteristics of El Niño and La Niña events. The La Niña years of 2000, 2007, 2008, 2010 and 2020 imply a positive normalised precipitation with consistent behaviour in both IMDAA and IMD. The wet year 2014 is not associated with La Niña episodes. Over the SPI, the relatively dry year of 2002 is not related to El Niño in case of IMD data, but it is related when normalised precipitation is from IMDAA. The wet years of 2006, 2015, 2016 and 2020 is not completely account for La Niña episodes. During the year of 2018 and 2019, the southern parts of SPI has experienced flood, though there was El Niño episodes. Even though IMD normalised precipitation is positive, it is far less than IMDAA, which in turn represent the wet year more efficiently. Conclusively, it is obvious that the interannual variability is dependent on the ENSO events but cannot completely rely on the El Niño and La Niña episodes to predict extreme years.

5 Summary and conclusions

For the past few decades, South Peninsular India has been facing extreme rainfall events causing severe meteorological impacts, including agricultural and infrastructural losses. A robust reanalysis product with high resolution is required for predicting future forecasts to take adequate measurements and reduce risks of extreme events such as flash floods and landslides.

A high-resolution 12 km gridded rainfall dataset produced by National Centre for Medium Range Weather Forecasting (NCMRWF), namely IMDAA, is analysed to estimate rainfall characteristics and extreme events over South Peninsular India during 2000–2020. IMD daily gridded rainfall dataset is explored for the benchmark of the study. The datasets considered are acquired using different assimilation techniques and can be utilised for comparison without any overlapping. Proper assessment of monthly, seasonal and annual scale precipitation estimation by IMDAA is done with IMD observations. Investigation of spatial trends of mean precipitation during seasonal and annual scale, statewise rainfall episodes, identification of significant differences in precipitation using distance metric known as Mahalanobis distance, and prediction of extreme events using extreme value distribution is done. Several visual and statistical assessment is also implemented, such as calculating several statistical measures and graphical representation of detection capability using a Performance diagram. The major conclusions of the research can be summed up as follows:

  1. 1.

    IMDAA slightly overestimated average precipitation over SPI throughout all the years under investigation. Seasonal scale measurements reveal that the IMDAA's assessment varies from season to season. Daily average precipitation obtained by IMDAA over SPI during 2000–2020 represents a close correlation with IMD observations.

  2. 2.

    Utilisation of Student’s t test to find the grid points showing significant variations in average precipitation over the years implied that the number of grid points indicating substantial difference is more prominent in winter and post-monsoon season compared to monsoon and pre-monsoon season, according to a spatial plot displaying grid points exhibiting significant differences throughout each season and yearly.

  3. 3.

    Correlation values are pretty high for all monthly, seasonal, and annual cases, but RMSE, RB, and MAE values vary in a similar pattern. Both seasonal and monthly statistics show identical characteristics, such as RMSE, RB, and MAE being smaller for the winter season, with much higher values till starting of the monsoon season and decreasing from July to December. Index of Agreement values is close to one in all cases. NSE and KGE values are also changing concerning the months or seasons of study. It is clear that, even though the statistical metrics show different values based on the seasons, all metrics are close to optimal values.

  4. 4.

    Graphical representation of precipitation detection capacity of IMDAA during daily, annual, spatial and percentile using Performance diagram indicates that POD and SR are close to one in most cases. For the annual scale, yearly and seasonal plots, IMDAA exhibits good performance. Daily scale performance of IMDAA is better in annual and post-monsoon seasons, whereas over the spatial scale, it performs best during monsoon season. 90th and 95th percentile performance diagrams revealed comparable characteristics.

  5. 5.

    Mahalanobis distance for percentiles implied that monsoon and post-monsoon seasons demonstrated strong compatibility between IMDAA and IMD; however, annual and pre-monsoon seasons displayed some considerably dissimilar grid points in terms of distance.

  6. 6.

    The five states constituting SPI are diverse in their climatology. Each state is impacted by rainfall differently due to its unique geographic and climatic characteristics. Spatial distributions of rainfall episodes of each state acted differently. For the state of Kerala, IMDAA and IMD exhibited similar distributions. A dry bias was shown by IMDAA over Tamil Nadu for precipitation from 26th November to 2nd December 2015. On the other hand, IMD could not effectively capture the rainfall episodes from 29th September to 2nd October 2009 over Andhra Pradesh, while IMDAA successfully captured the rainfall. IMDAA indicated the reported rainfall exactly over Telangana and Karnataka.

  7. 7.

    The extreme value distribution overestimated extreme grid points by IMDAA data in the Telangana and Kerala rainfall episodes. Extreme grid points were satisfactorily the same for IMDAA and IMD in the case of Tamil Nadu and Telangana. However, IMDAA correctly detected extreme grid points in the Karnataka region while IMD failed to do so.

  8. 8.

    ENSO phenomenon impacts the tropical climate including Indian monsoon. Interannual variability of standardised rainfall time series of IMDAA and IMD with respect to El Niño and La Niña years indicated that extreme years are resonant with the ENSO signal over the region but cannot completely rely on ENSO to predict extremes.

The results suggest that IMDAA reanalysis dataset is better at estimating accurate precipitation over SPI. The research study also gives insight into the certainty that IMDAA data may be used to forecast extreme rainfall events. The overestimation of rainfall amount estimated by IMDAA in some seasons and regions can be corrected using bias correction. The future study will include the bias correction of the reanalysis product and consider other factors responsible for the extreme events.