1 Introduction

Recent studies indicate that global warming and its consequences substantially impact the world's socio-economic development. It is reported that the available moisture in the atmosphere will increase at a rate of ~ 7% per kelvin of temperature rise, potentially increasing the magnitude and frequencies of extreme precipitation (EP) events ( Trenberth et al. 2003; Donat et al. 2013; O'Gorman 2015; Trenberth 2012). Most of the water infrastructure designs are based on the assumption that EP over time remains stationary. However, due to the effects of the warming climate, the stationarity assumption is not valid. Therefore, non-stationary modelling approaches are developed to address the changes in EP over time. Coles (2001) proposed the non-stationary generalized extreme value (NSGEV) modelling framework based on the Extreme Value Theory (EVT). It provides a robust mathematical framework and has been widely used for modelling of EP (Cheng et al. 2014; Zhang et al. 2010).

The NSGEV models are fitted by assuming the distribution parameters as a function of dependent variables (also known as covariates) (Coles 2001). Most of the studies considered the location or scale or both the parameters of the NSGEV model as a function of time (Cheng and Aghakouchak 2014; El Adlouni et al. 2007; Sarhadi and Soulis 2017; Yilmaz and Perera 2014). However, large-scale climate teleconnections resulting from the variations in the atmosphere–ocean oscillations show strong correlations with EP and have improved the performance of the NSGEV model. Zhang et al. (2010) used covariates such as Pacific Decadal Oscillation (PDO), Southern Oscillation Index (SOI), and North Atlantic Oscillation (NAO) to model the EP over North America. The results showed that the SOI and PDO strongly influence EP. Agilan and Umamahesh (2016) developed the NSGEV model by selecting five covariates, i.e., land-use changes, local and global temperature (annual mean), El Nino Southern Oscillation (ENSO) (averaged Nov-Mar), Indian Ocean Dipole (IOD) (averaged Jun-Nov), and time. They found that the teleconnections as a covariate are appropriate for the study area. Thiombiano et al. (2018) found that Arctic Oscillation (AO) and Pacific North American (PNA) patterns significantly influence the extreme daily precipitation over South-Eastern Canada. Ouarda et al. (2019) modeled NSGEV for three stations in Canada and California, USA with AMO, Western Hemisphere Warm Pool, SOI, and PDO covariates and showed that the NSGEV is a better fit than the stationary. Jha et al. (2021) assessed the influence of ENSO, IOD, and Atlantic Multidecadal Oscillation (AMO) indices (averaged over Jun-Sep) on EP using the NSGEV model for 24 river basins of India, assuming a simultaneous relationship between changes in SST and precipitation and found that NSGEV models perform better than the SGEV models.

The climate index series considered are usually averaged over a few months/seasons or annually, depending on the spatio-temporal effects of teleconnections on the EP. These uniform time-lag climate index series were used as covariates in the non-stationary models (Gao et al. 2016; Jha et al. 2021; Zhang et al. 2010). However, the variations in teleconnection patterns and their effect on extreme precipitation occur non-uniformly in space and time. He and Guan (2013) observed that the time-lags between precipitation and teleconnections better interpret precipitation's temporal variability. Kim et al. (2018) showed that the non-uniform climate indices could account for the delayed effect of changes in atmospheric circulations on precipitation. It was observed that the ENSO indices with four- and ten-month lags strongly represented periodic oscillations in monthly precipitation. Armal et al. (2018) investigated the frequency of precipitation across the United States, and they suggested examining the time-lagged dependence of the climate variables and precipitation spatially. Most of these works focused on using uniform time-lag covariates with hydroclimatic series but not for non-stationary modelling of EP. Therefore, it is envisaged that non-uniform time lag climate indices as covariates may improve the performance of the NSGEV models.

Many studies were carried out to understand the effect of teleconnections on EP over a large-scale region. Mondal and Mujumdar (2015) modelled non-stationary extreme rainfall over India using covariates such as Nino 3.4 SST anomalies, global temperature, and local temperature. Significant spatial variability was observed in EP characteristics with the teleconnections. Tan and Gan (2017) considered teleconnections ENSO, NAO, PDO, North Pacific Oscillations (NPO) for analyzing the frequency and intensity of EP under non-stationary conditions over Canada. They found that the ENSO's effect on EP was consistent at all stations across the region. Also, the PDO, NAO, and NPO significantly influence EP variability at a few stations across Canada. Vu and Mishra (2019) studied non-stationary EP events over the United States using time, an annual average maximum, and mean temperature and ENSO as covariates. It was observed that the different combinations of covariates influence EP variability over the US under non-stationary conditions. One of the possible ways to overcome this variability in large-scale study areas is to cluster the regions of EP. The identification of zones helps in reducing the prediction uncertainties. The regionalization of precipitation in a stationary framework is well studied (Irwin et al. 2017; Roushangar et al. 2018; Roushangar and Alizadeh 2018). However, the regionalization in a non-stationary framework accounting for the influence of teleconnection indices on EP has not been well explored, especially for Monsoon Asia Region (MAR).

The proposed study addresses three key issues: i) Selection of non-uniform time-lag climate indices: most of the studies in the NSGEV framework have used uniform time-lagged covariates, i.e., irrespective of their EP month of occurrence, the covariate is averaged over particular months (uniform). However, the maximum correlation between EP and climate indices will vary in space and time and hence the covariates series should be non-uniformly lagged. It is envisaged in this study that identifying the non-uniform lag structure between precipitation and the multiple climatic indices would improve the prediction of EP and contribute to a better understanding/representation of the mechanics underlying these large-scale EP responses to teleconnection patterns; ii) Regionalization of NSGEV model parameters: identification of coherent regional spatial patterns and the uncertainty associated with non-stationary extreme precipitation across large study domains need to be addressed. This is achieved by regionalizing NSGEV model parameters by grouping similar EP sites into coherent zones. The zones will reduce the uncertainties in the prediction of EP. The performance of the regional NSGEV model needs to be evaluated to the individual sites within the zone; and (iii) Estimation of Zonal EPs': the regional NSGEV model is used to estimate the intensity of extreme precipitation for the selected return periods in each zone, and compared with the estimates from the SGEV model. The EP intensity for various return periods is investigated and presented for each cluster. In addition, the efficacy of the proposed modelling framework is also compared with traditional NSGEV models using corrected Akaike Information Criteria (AICc) for MAR.

The paper comprises the following sections: Sect. 2 describes the MAR study area, precipitation data, and teleconnections. Following this, Sect. 3 presents a detailed methodology explaining (i) NSGEV model framework, (ii) Selection of non-uniform climate indices, (iii) Clustering of EP zones. In Sect. 4, the results and discussions showing the efficacy of the non-uniform climate index time series and regional parameters are presented. Further, the section presents the variations and changes in EP characteristics for stationary and non-stationary. Finally, the conclusion and future scope of the proposed study are presented in Sect. 5.

2 Case study

In the present study, the Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) gridded precipitation dataset for 1951–2007 with a 0.5 × 0.5-degree spatial resolution for the Monsoon Asia Region (MAR) is used as illustrated in Fig. 1 (Yatagai et al. 2009, 2012) (http://www.chikyu.ac.jp/precip/index.html). It is developed by the Research Institute for Humanity and Nature (RIHN) and the JMA (Japan Meteorological Agency). The region includes mainly South Asia (India, Nepal, Pakistan, Bangladesh), South-East Asia (SEA), and Northern Australia. The \(180 \times 140\) grid locations cover \(60^\circ -150^\circ \mathrm{E}, 15^\circ -55^\circ \mathrm{N}\) region. There are 25,200 grid locations, of which 13,672 and 11,528 are ocean and land grids. The Annual Maximum Precipitation (AMP) series is extracted for each grid location and used in the study, referred to as EP. Aphrodite data are extensively used for various applications such as (i) Study of seasonal variations of the summer monsoon events (Befort et al. 2016); (ii) Climate change's effect on non-monsoonal precipitation in the Western Himalayas (Krishnan et al. 2019); (iii) Simulating the response of East Asia's rainfall to ENSO (Ng et al. 2019); (iv) Understanding dry spell characteristics over India (Sushama et al. 2014); (v) Understand the El Nino and Non-El Nino effects on droughts (Varikoden et al. 2015).

Fig. 1
figure 1

Map with APHRODITE grids representing Monsoon Asia Region (MAR)

2.1 Climate indices

Climate patterns and circulation processes are influenced by global atmospheric circulation at multiple scales. The low-frequency oscillations that reflect large-scale atmospheric circulation patterns are called "teleconnections patterns" and arise from internal atmospheric dynamics. Many planetary-scale teleconnection patterns have wide-scale effects, cover large areas, and contribute to changes in the earth's temperature, precipitation, and storm tracks. These are responsible for severe weather conditions, including extreme precipitation and flooding (Barnston and Livezey 1987; Mo and Livezey 1986). The climate centers worldwide monitor and tabulate the teleconnection patterns, also known as "climate indices," based on geopotential height, sea level pressure, and sea surface temperature. Several observations indicate that extreme precipitation variability has a strong correlation with teleconnections. For example, ENSO is reported as the most significant climate index affecting global extreme precipitation variability (Dai and Wigley 2000; Krishnaswamy et al. 2015; Sun et al. 2015; Wang 2002). Krishnaswamy et al. (2015) showed ENSO and IOD as significant drivers for Indian Monsoon variability and increased extreme precipitation events. Wei et al. (2017) found that extreme precipitation changes are mainly induced by teleconnections like ENSO, PDO, and, East Asian summer monsoon over China. Few studies over Australia reported that the ENSO phases (Verdon et al. 2004), SAM (Meneghini et al. 2007), and IOD (Ashok et al. 2003) have a high impact on precipitation changes. Ng et al. (2019) observed that the extreme precipitation varies during ENSO phases (El Nino and La Nino) over East Asia. Duan et al. (2015) found that climate indices like SOI have significant associations with the increasing extreme precipitation over Japan. Rathinasamy et al. (2019) confirmed a strong correlation between the IOD, PDO, and NAO and precipitation extremes across India. In the present study, 15 climate indices are used for the NSGEV model as covariates to analyze the extreme precipitation over MAR. Table 1 shows the list of climate indices with a brief description.

Table 1 Selected climate indices and their description

3 Methodology

The proposed methodology has five main modules: (1) Selection of uniform and non-uniform time-lag climate indices; (2) NSGEV model; (3) Model selection using AICc; (4) Zone formation using clustering algorithm; (5) Performance evaluation of the model. The proposed methodology is schematically represented in Fig. 2. The details for each of the modules are explained in the following subsections.

Fig. 2
figure 2

Schematic representation of the proposed methodology for the NSGEV framework

3.1 Selection of uniform and non-uniform time-lag climate indices

This study identifies the time-lags between precipitation and climate indices based on the cross-correlation technique. The lag corresponding to the maximum correlation coefficient is the offset or strong dependence between precipitation and the corresponding climate index. The advantage of cross-correlation is that it is simple to compute and identify the covarying pattern between precipitation and climate indices in space and time instantaneously. It is to be noted that the cross-correlation analysis addresses:(i) The identification of the time-lags with each climate indices at each grid location and (ii) Extraction of the non-uniform climate index series corresponding to Annual Maximum Precipitation (AMP)/EP at each grid location. The hypothetical example of extracting the uniform and non-uniform climate index series corresponding to a grid location in MAR is shown in Fig. 3 and are presented below:

  • (i) Extraction of Month Index: Consider AMP series of 'n' years at a grid location. The shaded boxes in Fig. 3a represent the month-index of EP. For example, in year-1, EP is observed in the month-7, i.e., the month index is 7 for year 1; for year-2, AMP's month-index- 8 and so on.

  • (ii) Uniform time-lag climate index series is selected based on the average over a season or few months irrespective of EP occurrence in any month/time of a year. Figure 3b shows that the climate index series selection is constant. In this hypothetical example, the index value is obtained by taking an average over the three-month window as shown in Fig. 3b, i.e., months 6–8, and remains uniform throughout the study region.

  • (iii) Non-uniform time-lag climate index series

    Step 1::

    The cross-correlation between monthly precipitation and climate index is carried out. A time lag corresponding to the maximum correlation coefficient is selected at each grid location. In this example, a time lag of 5 months is assumed.

    Step 2::

    The climate index series is selected corresponding to the time-lag for each year's EP series, as shown in Figure 3c. From Figure 3a, c, the AMP of year-1 occurs in the month-7, the climate index value averaged over three months windows (i.e., month-2 with time-lag of 5 as the center of the time-window) for year-1 is obtained as shown in Figure 3c. Similarly, for the subsequent years, the non-uniform time-lag climate index series will be derived for MAR.

Fig. 3
figure 3

Hypothetical example of extracting the climate index series at a grid location a annual maximum precipitation series b uniform climate index series c non-uniform climate index series assuming cross-correlation lag of 5

3.2 NSGEV model

The GEV model is extensively used for analyzing EP based on Extreme Value Theory (EVT) (Coles 2001). The location \(\left( {\upmu } \right)\), scale \(\left( {\upsigma } \right)\), and shape \(\left( {\upxi } \right)\) are the three parameters of the GEV distribution. Consider \({\text{x}} = {\text{x}}_{1} ,{\text{ x}}_{2} ,{ } \ldots \ldots \ldots .{\text{x}}_{{\text{n}}}\) to be independent and identically distributed AMP series. The Probability Density Function (PDF), f is given by Eq. 1 and the Cumulative Distribution Function (CDF), F of the GEV is shown in Eq. 2

$${\text{ The PDF}},{\text{ f}}:{\text{f}}\left( {\text{x}} \right) = \frac{1}{{\upsigma }}\left\{ {1 + \frac{{{\upxi }\left( {{\text{x}} - {\upmu }} \right)}}{{\upsigma }}} \right\}^{{\left( { - \frac{1}{{\upxi }}} \right) - 1{ }}} {\text{exp}}[ - \left\{ {1 + \frac{{{\upxi }\left( {{\text{x}} - {\upmu }} \right)}}{{\upsigma }}} \right\}^{{ - \frac{1}{{\upxi }}}}$$
(1)
$${\text{ The CDF}},{\text{ F}}:{\text{F}}\left( {\text{x}} \right) = {\text{exp}}\{ - \left[ {1 + \frac{{{\upxi }\left( {{\text{x}} - {\upmu }} \right)}}{{\upsigma }}} \right]^{{ - \frac{1}{{\upxi }}}} ,{ }1 + \frac{{{\upxi }\left( {{\text{x}} - {\upmu }} \right)}}{{\upsigma }} > 0{ }$$
(2)

Non-stationarity is included by considering one or more GEV distribution parameters as a function of time-varying covariates (Coles 2001; Katz 2002). In the present study, covariates are introduced as a function of only location parameters (Eq. 3) and substituted in CDF of GEV (Eq. 4). The scale parameter is not considered for computational simplicity (Ouarda and Charron 2018), and the shape parameter is kept constant due to its uncertainty and difficulty to estimate with precision (Coles 2001).

$${\upmu }\left( {\text{C}} \right) = {\upmu }_{0} + {\upmu }_{1} {\text{C}}$$
(3)
$${\upsigma }\left( {\text{C}} \right) = {\upsigma }$$
(4)
$$\xi \left( C \right) = \xi$$
(5)

The climate Indices described in Table 1 are considered covariates for the NSGEV model and are represented as "C". There will be 15 NSGEV models corresponding to 15 climate indices and an SGEV model at each grid location over MAR. The maximum likelihood estimation is used to estimate the model parameters. The log-likelihood for the cumulative distribution function for both stationary (Eq. 6) and non-stationary (Eq. 7) GEV model is given below:

$$\log L\left( {\mu ,\sigma ,\xi |X} \right) = - n \log \sigma - \left( {1 + \frac{1}{\xi }} \right)\sum \log \left[ {1 + \frac{{\xi \left( {x - \mu } \right)}}{\sigma }} \right]^{{ - \frac{1}{\xi }}} - \sum \left[ {1 + \frac{{\xi \left( {x - \mu } \right)}}{\sigma }} \right]^{{ - \frac{1}{\xi }}}$$
(6)
$$\log L\left( {\mu \left( C \right),\sigma ,\xi |X} \right) = - n \log \sigma \left( {1 + \frac{1}{\xi }} \right)\sum \log \left[ {1 + \frac{{\xi \left( {x - \mu \left( {C} \right)} \right)}}{\sigma }} \right]^{{ - \frac{1}{\xi }}} - \sum \left[ {1 + \frac{{\xi \left( {x - \mu \left( {C} \right)} \right)}}{\sigma }} \right]^{{ - \frac{1}{\xi }}}$$
(7)

The parameters for the SGEV and NSGEV model are obtained by maximizing the negative log-likelihood (Eqs. 6 and 7). The 'ismev' R package estimates the models' parameters (Coles 2001;  Heffernan 2018).

The extreme precipitation is estimated for various return periods under stationary and non-stationary conditions using the low-risk approach (Coles 2001). In this study, the ninety-fifth percentile of the location parameter values (Eq. 8) is estimated and used to calculate the precipitation intensity for selected return periods for the stationary \((Z_{rs} )\) and non-stationary \((Z_{rns} )\) conditions are given below (Eqs. 9 and 10):

$$\mu_{95} = Q_{95} \left( {\mu_{t1} , \mu_{t2} , \ldots \ldots \ldots \ldots \ldots \ldots \ldots \mu_{tm} } \right)$$
(8)
$${\text{ Z}}_{{{\text{rs}}}} = {\upmu } + \frac{{\upsigma }}{{\upxi }}\left[ {\left( { - \log \left( {1 - {\text{p}}} \right)} \right)^{{\upxi }} - 1} \right]$$
(9)
$${\text{Z}}_{{{\text{rns}}}} = \left( {{\upmu }_{95} } \right) + \frac{{\upsigma }}{{\upxi }}\left[ {\left( { - \log \left( {1 - {\text{p}}} \right)} \right)^{{\upxi }} - 1} \right]$$
(10)

where p = 1/T, the probability of occurrence and T is the return period. \({\upmu }_{95}\) is the 95th percentile of the location parameter. The Relative Percentage Change (RPC) in extreme precipitation intensity between non-stationary and stationary are computed using Eq. 11

$$RPC = \frac{{Z_{rns} - Z_{rs} }}{{Z_{rs} }} \times 100\left( \% \right)$$
(11)

where \(Z_{rns}\) and \(Z_{rs}\) are precipitation intensities for non-stationary and stationary models.

3.3 Selection of the model

In this study, the corrected Akaike Information Criterion (AICc) is adopted to choose a model from a set of alternative models arising from the 15 covariates. The negative log-likelihood function (− logL) is penalized according to the number of estimated parameters (p) for each model. The AICc helps prevent the bias of over-fitting the data and is recommended when n/p > 40, where n is the sample size and p is the number of parameters (Burnham and Anderson 2004; Hurvich and Tsai 1989). The AICc for the fitted model is expressed as (Eq. 12)

$${\text{AICc}} = - 2{\text{logL}} + 2{\text{p}} + \frac{{2{\text{p}}\left( {{\text{p}} + 1} \right)}}{{{\text{n}} - {\text{p}} - 1}}$$
(12)

where − log L is the negative log-likelihood value, the first two terms of Eq. 12 represent the classic AIC. Second-order bias correction is added in AICc by the last term. It is worth noting that as n increases, AICc converges to AIC. Each model is evaluated for its AICc, and the one with the lowest AICc is chosen as the best model.

3.4 Clustering using gaussian mixture model (GMM)

The GMM based clustering algorithm is applied to identify an adequate number of EP zones over MAR. This method is very flexible and has advantages over traditional heuristic methods such as k-means and hierarchical clustering, in which the grouping of coherent regions is based on measured distances (Euclidean distance). The number of clusters and initial cluster center must specify a priori. These limitations can be overcome using model-based methods such as GMM in a probability framework involving multiple mixtures (clusters), each defined by a Gaussian distribution. The Bayesian Information Criterion (BIC) is used as an objective metric to determine the best model and an optimal number of clusters. It uses soft clustering, where probability is assigned to each data location in a cluster. The probability value indicates the degree of association between the cluster and the data locations. The mathematical expression of the Gaussian model representing each cluster is given in Eq. 13

$$\begin{aligned} \emptyset_{k} \left( {{\text{z|}}\mu_{k} ,{\Sigma }_{k} } \right) &= \left( {2\pi } \right)^{{ - \frac{1}{2}}} \times \left| {{\Sigma }_{k} } \right|^{{ - \frac{1}{2}}} \\ &\quad\times {\text{exp}}\left\{ { - \frac{1}{2}\left( {z_{i} - \mu_{k} } \right)^{T} \times {\Sigma }_{k}^{ - 1} \left( {z_{i} - \mu_{k} } \right)} \right\}\end{aligned}$$
(13)

where \(z_{i}\) is the attribute data for ith location, \(\mu_{k}\) is the centre of the distribution for kth cluster, and \({\Sigma }_{k}\) is the variance matrix. The Expectation–Maximization (EM) algorithm is used to estimate the maximum likelihood for the parameters (mean and variance) of the Gaussian clusters initialized by hierarchical model-based agglomerative clustering. EM proceeds with an 'E’-step in which a matrix z is computed such that zik is a conditional probability estimate that ith location is in kth cluster for the parameters \(\mu_{k}\) and \({\Sigma }_{k}\). Following 'M-step' estimates the parameters, \(\mu\) and \(\sum\) converges to the maximum log-likelihood values (Eq. 14) of the model (Dempster et al. 1977).

$${\text{log L}}\left( {{\text{z}}|\mu_{k} ,{\Sigma }_{k} } \right) = \mathop \sum \limits_{i = 1}^{n} {\text{log}}\left[ {\mathop \sum \limits_{k = 1}^{K} \pi_{k} N(x_{i} ;\mu_{k} ,{\Sigma }_{k} )} \right]$$
(14)

The BIC determines the adequate number of clusters. It penalizes log-likelihood with increment in sample size by introducing a second component for the parameter size. The BIC is typically used in association with GMMs and takes the following form (Eq. 15)

$${\text{BIC}}_{{\text{M,K}}} {\text{ = 2 x log L}}_{{\text{M,G}}} {-}p{\text{x log}}\left( {\text{n}} \right)$$
(15)

where log LM,G is the log-likelihood for model M, K is the number of clusters, n is the sample size, and p is the number of parameters. The paired {M, K} which optimizes BICM,K is found by varying K between a range of clusters. In this study, K is varied between 1 and 20 to find the optimal clusters. The GMM clustering analysis is carried out using the 'mclust' R package (Scrucca et al. 2016).

3.5 Performance evaluation of models

The quality of the fitted NSGEV models must be evaluated to ensure the selected model fits the data well. The Quantile–Quantile (QQ) plots are used to check the quality of the selected model based on model and empirical quantiles. The AMP data of n- years (\({\text{Zi}}\)) used to fit the NSGEV model is not identically distributed and has to be transformed into residuals (\({\upvarepsilon })\) (Eq. 16) (Coles 2001; Katz 2002). Model quantiles \(({\text{M}}_{{\text{i}}} ){ }\) are the ordered values of residuals and Empirical quantile (\({\text{E}}_{{\text{i}}}\)) is given by Eq. 17. In this study, the Root Mean Square Error (RMSE) based on quantile plots are calculated using the model and empirical quantiles given in Eq. 18. It measures the model's efficacy while quantifying its quality.

$${\upvarepsilon }_{{\text{i}}} = { }\frac{1}{{{\upxi }_{{\text{t}}} }}\log \left\{ {1 + {\upxi }_{{\text{t}}} \left( {\frac{{{\text{Z}}_{{\text{i}}} - {\upmu }_{{\text{t}}} }}{{{\upsigma }_{{\text{t}}} }}} \right)} \right\}$$
(16)
$${\text{E}}_{{\text{I}}} { } = { } - \log \left( { - \log \left( {\frac{{\text{i}}}{{{\text{n}} + 1}}} \right)} \right),{\text{i}} = 1, \ldots \ldots {\text{n}}$$
(17)
$${\text{RMSE}} = \left( {\frac{1}{{\text{n}}}\mathop \sum \limits_{{{\text{i}} = 1}}^{{\text{n}}} \left( {{\text{E}}_{{\text{i}}} - {\text{M}}_{{\text{i}}} } \right)^{2} } \right)^{0.5}$$
(18)

4 Results and discussions

4.1 Extreme precipitation

The EP (annual daily maximum precipitation) is extracted from daily precipitation data for each grid location from 1951 to 2007. The statistics of EP such as mean and standard deviation are extracted for each grid location and are presented in Fig. 4a, b. It is observed that the mean and standard deviation vary spatially across MAR. The range of mean EP varies from 3 to 189 mm, and the standard deviation is between 0.92 mm and 86 mm. The variations show high EPs in the coastal and monsoon windward regions. The Western Ghats, Himalayan foothills, a few parts of Southeast Asia, and most eastern coastal areas show the highest EP. One of the primary reasons for high EP in these regions could be the elevated terrain, windward parts of mountains, and ocean proximity (Saini et al. 2020; Shige et al. 2017; Khouakhi and Villarini 2017; Zhang and Zhou 2019). On the other hand, the EP gradually decreases to inland areas where there is not much moisture left for precipitation and are far from large water bodies. In addition, it is worth to be noted that the number of EPs (i.e., frequency) is more prevalent in coastal regions and generally decreasing towards inland areas. The number of grid locations having mean and the standard deviation of EPs is shown in Fig. 4c, d, respectively. It is observed that the maximum number of grids locations are of EP less than 20 mm and 10 mm, respectively.

Fig. 4
figure 4

Map illustrating the spatial distribution a mean and b standard deviation of EP; Distribution of grid locations over MAR c mean and d standard deviations, i.e., how many grids are in each bin (precipitation interval)

The percentage of EP's per month occurring over the 57 years for MAR is shown in Fig. 5. It is observed that the maximum number of EP's in the MAR occur during summer monsoon or the Southwest monsoon months (i.e., June–September), i.e., about 68% of grids (June—13.72%, July—23.3%, August—20.64%, and September—10.48). Following this, about 9% of EP occurs during the Northeast monsoon months (October–December), mainly in the Bay of Bengal coastal (Southern India), a few East Asian countries, and Sri Lanka. It is worth to be noted that although a few months have a higher occurrence of EP, there is a considerable variation within the months of the year and spatially across MAR. It is evident from the results that the selection of constant lag time series of covariates for the NSGEV model may not represent the plausible delayed-effect of the teleconnections. Thus the effect of variations in the EP occurrence has to be accounted for extracting the covariates.

Fig. 5
figure 5

Month-wise number of EP occurrence (in %) over MAR (color bar represents the number of EP occurrences in percentage)

To understand the temporal variations in the EP, 50 grid locations across MAR is selected (10 regions containing five grids each, as shown in Figure S1). It is observed from Fig. 6 that sites closer to each other have similar patterns of occurrences. Two significant observations are: (i) the grid location sampled from a similar EP characteristics region shows relatively similar patterns; and (ii) EP varies in space and time and is not uniform throughout the MAR. Therefore, since the month index of EP varies over the study domain, the corresponding effect of teleconnection time-lag should also vary in space and time. Consequently, it is anticipated that using non-uniform time-lagged climate indices will capture the dynamics of EP variability better than the uniform/constant lagged climate indices.

Fig. 6
figure 6

Spatial variation of month index of EPs at selected 50 grid locations over MAR (color bar represents the month index)

4.2 Time-lagged climate indices

The uniform time-lagged (UTL) series for a covariate is selected based on the MAR region's averaged months. For example, ENSO is averaged over Nov-Mar (Agilan and Umamahesh 2016; Mondal and Mujumdar 2015), DMI is averaged over Jun–Nov (Agilan and Umamahesh 2016), PNA is averaged over Dec–Mar, and AO is averaged over Jan–Dec (Thiombiano et al. 2018), AMO is averaged over Jun–Aug (Ouarda and Charron 2018), PDO is averaged over Nov–Mar (Gao et al. 2016), global temperature is the annual mean (Mondal and Mujumdar 2015) and NAO is averaged over Dec–Feb (Hao et al. 2019). It is to be noted that the UTL series for a climate index will be identical irrespective of the grid location over MAR. On the other hand, the non-uniform time-lagged (NUTL) climate indices are selected based on the maximum cross-correlation between monthly precipitation and climate index at each grid location. For brevity, NAO results only are presented in this section (and for other indices, refer to supplementary material—Figure S2, S3 and S4). For the NUTL series, the time-lag corresponding to the maximum correlation for the NAO climate index of all grid locations is shown in Fig. 7. It is observed that the time-lags obtained are spatially non-uniform over the region. These lags represent the possible time-delayed relationship between precipitation and NAO teleconnection. It is also worth to be noted that the NUTL are randomly clustered into smaller regions, which might have similar localized or geographic conditions. NUTL-NAO series are extracted at each grid location based on these identified time-lags.

Fig. 7
figure 7

Time lag corresponding to the maximum correlation for NAO at each grid over MAR (color bar represents the time-lag in months)

The performance comparison of UTL- and NUTL based NSGEV models is presented in Fig. 8a. It is observed that in about 94.7% of grid locations, the performance of the NUTL based NSGEV model is better (lower AICc values) when compared to UTL. For brevity, the difference in the AICc values at three grid locations are highlighted in Fig. 8a. In addition, the boxplot (Fig. 8b) of the deviance statistics (difference in AICc) shows that NUTL has the lower AICc for all the grid locations. It is also observed that the deviance statistics for most of the grid locations is greater than two, i.e., the selected NUTL has shown significant improvements in the NSGEV model performance compared to UTL.

Fig. 8
figure 8

Comparison of Uniform (UTL) and Non-Uniform Time-Lagged (NUTL) NSGEV models a AICc (X and Y represent data label corresponding to the x-axis and y-axis respectively at three randomly selected points (lower the AICc value better the model)); b deviance statistics (difference in AICc between UTL and NUTL NSGEV)

4.3 NSGEV versus SGEV models

The total number of models at each grid location is 16, i.e., 15 NSGEV models and 1 SGEV model. The spatial distribution of selected best models based on AICc over MAR is shown in Fig. 9. It is observed that about 92% of grid locations over MAR, the NSGEV models perform better than SGEV models. However, the spatial pattern of grid-wise selected models (Figs. 10 and 11) shows that no single/group of climate index/covariate-based NSGEV models are chosen over the entire MAR. Similar results were observed by Mondal and Mujumdar (2015) over India; they found that the significant covariates differed from one grid location to another showing a non-uniform pattern. The results indicate that among all the indices, the GLBT and AMO are majorly influencing MAR. The reason could be that the GLBT represents the global warming phenomenon and directly affects the EP changes through the warming of oceans, transporting moist air to land and into the weather systems (Groisman et al. 2005; IPCC 2007; Res and Trenberth 2011). Similarly, the SSTs in the tropical Atlantic (AMO) strongly influence Monsoon precipitation (Azad and Rajeevan 2016; Kucharski et al. 2007). Overall, the use of NULT based climate indices for NSGEV has significantly improved the modelling of EP compared to SGEV. However, it is evident from Fig. 11 that there is a considerable variation in the selected NSGEV model, covariates and NULT depending on the effect of the teleconnections over the MAR region. The uncertainty in the prediction of the EP will vary significantly over MAR. Therefore, it is proposed to investigate the variability of NSGEV model parameters by grouping the grid locations having similar magnitude EP into coherent zones. In a large-scale domain, the grid locations closer to each other are expected to have identical EP behaviour, as discussed earlier.

Fig. 9
figure 9

Performance of SGEV and NSGEV (includes 15 covariates) models at each grid over MAR

Fig. 10
figure 10

Percentage distribution of grid locations over MAR for selected NSGEV models (X-axis represents the15 climate indices corresponding to the NSGEV models, and Y-axis represents the number of grid locations in percentage)

Fig. 11
figure 11

Selected best covariate based NSGEV and SGEV models at each grid over MAR

4.4 Clustering of EP zones

The analysis over MAR suggests that the within-year distribution of the EPs (Fig. 6) for the grids in a region are almost similar, indicating the NSGEV model parameters can be grouped over an area. Regionalization of model parameters reduces the model and parametric uncertainties in simulating the EP. In the present study, the EP zones are derived based on the magnitude of AMPs using the GMM probability-based clustering algorithm. An optimal number of six EP zones are identified over MAR, as shown in Fig. 12. The percentage of grid locations covered by each zone and their descriptive statistics are presented in Table 2. Zone-5 is the largest zone enclosing 30.95% of gird locations, mainly representing precipitation from the Southwest monsoon, which has an EP range from 35.95 to 78.74 mm and mean of 53.05 mm with a variation of 11.07 mm. Zone-6 is the smallest zone with a 5.95 percentage of grid locations having EP range between 78.76 to 189.99 mm with a mean of 101.06 mm and variation of 20.98 mm and lies on the windward side of the Western ghats (wettest peninsular India) and grids closer to coastlines such as parts of Tamil Nadu, Vietnam, Japan, and receives Northeast and East Asian monsoons. The remaining zone-1, zone-2, zone-3, and zone-4 each cover 14–16 percentage of the study area with EP mean and variation in the range of 3–35 mm and 1.6 -3.5 mm, respectively. The zones (1–3) cover the lee-ward portions of the Himalayas or Hindu-Kush, and the Arakan Mountains, Tibetan Plateau, and Tarim Basins, primarily arid and semi-arid regions receiving low precipitation, and zone-4 is mainly confined to the Northern Australia, Southeast countries, and leeward sides of Western ghats.

Fig. 12
figure 12

Six EP zones obtained over MAR from GMM based clustering (color bar represents the zone number)

Table 2 Percentage of grid locations and descriptive statistics of AMP for identified zones over MAR

The zone-wise distribution of the number of grid locations for the selected NSGEV model (each climate index) is represented as a stacked column in Fig. 13a. It is observed that the GLBT and AMO are dominant among all climate indices in each of the zone, which is similar to the earlier results. The zone-wise observations show that the NSGEV models are the best alternative in all the EP zones compared to the SGEV model (Fig. 13b). However, the grids within each zone have different covariate-based NSGEV models. The selected best NSGEV models' parameters at each grid location within each zone are pooled up to obtain the regional NSGEV parameters. Table 3 provides descriptive statistics of Zone-wise (Z-) NSGEV parameters across MAR. It is observed that the location parameter is lower for zone-1 ranging between 3.24 and 12.44, and a mean of 7.64, and parameter values increase across the zones, with zone-6 having the highest values in a range of 59.95–191.17 and mean of 98.64. The following sections present the evaluation of Z-NSGEV models' performance and zone-wise EP changes for various return periods compared to SGEV.

Fig. 13
figure 13

a Selected covariate based NSGEV models at a number of grids within each zone; b Performance comparison of the NSGEV and SGEV models across the zones

Table 3 Descriptive statistics of the zone-wise parameters of NSGEV models for six zones

4.5 Z-NSGEV model performance

The efficacy of the model's performance is brought out in two steps. In step one, an intra-comparison of the Z-NSGEV model with an NSGEV model within a zone is carried out. The Z-NSGEV models within the area are compared to the individual grid NSGEV models. Figure 14 shows the intra-comparison of models in each zone (grid location (G) and zone-wise (Z)). It is observed that the RMSE for Z-NSGEV models is comparable to the grid NSGEV models in the respective zones over MAR, indicating that the former can capture the EP dynamics of the respective zones. In step two, an inter-comparison of Z-NSGEV models across the zones is carried out to validate the performance of the regional models. It is to be noted that since EP is used for the formation of the region, it cannot be used for homogeneity/validation of model performance. Alternatively, a simplified approach is proposed to validate the zone model's performance by assuming that the model derived for a particular zone will underperform in other zones. A heat map representing the performance of the Z-NSGEV models across the zones is shown in Fig. 15. The non-diagonal RMSE values show the zone model parameters used to predict EP in other zones, whereas the diagonal values show the zone-wise model performance. All the non-diagonal RMSE values are higher than the diagonal, indicating that the estimated regional parameters for all the six zones can capture the dynamics of the EP within the cluster grids better when compared to the other zones.

Fig. 14
figure 14

Intra-comparison of the Z-NSGEV model with an NSGEV model at all the grids within a zone (G represents the grids within the zone and Z represents the zone-wise model i.e., 1-G and 1-Z refer to zone-1)

Fig. 15
figure 15

Validation of Z-NSGEV models across the zones (Diagonal elements represent the within-zone performance and non-diagonal elements represent the model performance with other zones)

4.6 Zone-wise: estimation of EP

The zone-wise EP is estimated for various return periods of 5-, 10-, 25- 50-, and 100- years for SGEV and Z-NSGEV models. The relative percentage change in intensity of EP (NSGEV vs. SGEV) is shown in Fig. 16. It is observed that the percentage change ranges between 7 -10%, 5 –8%, 4–8%, and 4–10% for 5-, 10-, 50- and 100- year return periods, respectively, across the six zones. For each zone, the percentage change ranges between 6.8–9.8%, 7.11–9.24%, 6.4–8.9%, 4.4–7.6%, 8.4–10.85% and 7.4–10.5% for various return periods. Percentage change is observed higher in zone-5 and -6, which lie in monsoonal windward and coastal regions (prone to high tropical cyclones). The EP in other zones (1–4), mainly arid and semi-arid regions (inland of MAR), show a moderate increase in the EP intensities. It is observed that for all the six zones of MAR, the intensities of the EP are increasing, and return periods are decreasing, i.e., high-intensity EP will occur more frequently. The revised return periods based on the SGEV and Z-NSGEV models are provided in Table 4. The results show that the return periods have reduced on an average from 5- to 3.4-year, 10- to 6.7-year, 25- to 16.6-year, 50- to 32.2-year, and 100- to 61.64- year in six zones, respectively. In addition, the relative percentage change in zone-wise return periods (Fig. 17) shows that the percentage change in return periods ranges from 26 to 47% across all the zones. Overall, the results indicate an increase in extreme precipitation and reduction in the return periods across all zones over MAR compared to SGEV models.

Fig. 16
figure 16

Zone-wise relative percentage change in EP (NSGEV vs. SGEV) for various return periods

Table 4 Revised return periods based on the SGEV and Z-NSGEV models across the zones
Fig. 17
figure 17

Zone-wise relative percentage change in various Return Period (RP) across the six zones over MAR

5 Summary and conclusions

This study presents an NSGEV framework for modelling EP over Monsoon Asia Region (MAR). The non-uniform time-lag climate indices are used as covariates for the NSGEV models and have improved the model's performance compared to uniform time-lag climate indices. It is observed that the NSGEV models selected at about 92% of the grid locations over MAR outperform the SGEV model. The MAR region is divided into six zones using the GMM clustering algorithm. The Z-NSGEV models are derived by regionalizing the model parameters for analyzing the EPs. The results show that the Z-NSGEV models can mimic the EP characteristics in all the zones. Finally, the zone-wise EPs are investigated for various return periods for Z-NSGEV and SGEV models. It is observed that there is an increase in intensity and reduction in return periods across all zones, especially in zone 5 and 6 there is a relatively higher increase in intensity, which lies in monsoonal windward regions and coastal, respectively. Due to climate variability, these regions are highly receptive to EP and are expected to intensify (in both magnitude and frequency). The Z-NSGEV model outcomes will help in the reliable design and rehabilitation of water infrastructure, risk assessment, and to develop adaptation and mitigation strategies over MAR.

5.1 Limitations and extensions

  1. 1.

    In the NSGEV model, only the location parameter as a function of covariate is varied to address the non-stationarity. The effect of scale and shape parameters is not considered in this study. The work can be extended to understand other parameters' role on the NSGEV model's performance in MAR.

  2. 2.

    The Peak Over Threshold (POT) series can be considered instead of the AMP series in the proposed framework. POT has advantages over AMP as more maximums can be extracted for a set threshold value (Cheng and Aghakouchak 2014).

  3. 3.

    In this study, the location parameter for the NSGEV model is only a function of a single climate index (covariate). However, the association between EP and one teleconnection pattern is most likely modulated by other teleconnections or may include other effects of teleconnections. The framework can be extended to include the distribution parameters as a function of various combinations of covariates. Further, the study analyzes only the influence of teleconnections representing the global processes on EP variability over MAR. Other covariates, such as local temperature, topography, and land cover changes, can also be included to understand their effect on EP.

  4. 4.

    The NSGEV models consider linear functions to relate covariates and parameters. However, the linear assumption may not be able to capture the complex and nonlinear EP processes. The work is in progress to develop a nonlinear functional relationship to capture/mimic the processes between teleconnections and EP.

  5. 5.

    The regional homogeneity of the clusters requires the formation of the zones based on independent hydro-climatic datasets. However, in this study, EP was directly used as an attribute to cluster the MAR region, and hence homogeneity of the region is not carried out. Instead, a simplified approach is adopted for validating the zones, i.e., inter-comparison Z-NSGEV models. Alternatively, the homogeneous zones of EP can be formed using other hydro-climatic data influencing EP such as temperature, pressure, location parameters, geopotential height derived using reanalysis datasets (NCAR–NCEP) as attributes to cluster the zones. In addition, geographic characteristics such as elevation, topographic features, and location variables can also be considered.