Introduction

Groundwater contributes to one-third of the global freshwater supply, which supports a population of over 2 billion (Gorelick and Zheng 2015). In India, groundwater is a source of water supply for more than 80% of the rural and 50% of the urban populations, and for 50% of irrigation demand, which contributes to 70–80% of irrigated production (Mall et al. 2006). There is a consensus among climate scientists that global warming will intensify, accelerate, or enhance the water cycle, which will have important consequences for the world’s freshwater resources (UNESCO 2009). Although the effects of climate change on water resources are already visible worldwide, the greatest concern of water experts is its impact on groundwater as this is a more dependable source of water supply for domestic, irrigation and industrial sectors (e.g., Holman 2006; Gurdak et al. 2009; Gorelick and Zheng 2015). Groundwater is affected by climate through major hydrological processes such as precipitation, evapotranspiration, and runoff as well as through interaction with surface-water bodies. The extremes of climate (droughts and floods) are often related to the drivers of climate variability, i.e., large-scale climatic patterns/oscillations such as the North Atlantic Oscillation (NAO), Arctic Oscillation (AO), Pacific Decadal Oscillation (PDO), El Niño Southern Oscillation (ENSO), etc. These large-scale and long-term climatic cycles can have the most discernible impacts on groundwater due to slow aquifer recharge processes and long aquifer response times (Russo et al. 2014); however, proper assessment of climate-change impacts on groundwater is challenging due to its complex relations with hydro-climatic variables (Mishra and Singh 2010). This challenge is further complicated for data-scarce developing countries.

In the recent past, several researchers ascertained the connection between groundwater and large-scale climate patterns across the world (e.g., Jones and Banner 2003; Fleming and Quilty 2006; Hanson et al. 2006; Luque-Espinar et al. 2008; Gurdak et al. 2009; Tremblay et al. 2011; Perez-Valdivia et al. 2012). Fleming and Quilty (2006) studied this link in the aquifer of southwest British Columbia, Canada, and reported that the groundwater of the area has a significant correlation with seasonal ENSO-related precipitation anomalies, i.e., groundwater levels are higher during La Niña years and lower during El Niño years. Using wavelet and coherence analysis, Tremblay et al. (2011) investigated the links of climatic oscillations (NAO, AO, PDO, Pacific-Northern-America-Pattern, and multivariate ENSO Index) with the groundwater levels of three unconfined aquifers in Canada and concluded that the inter-annual cycles observed in large-scale climatic patterns were also found in groundwater levels, thereby suggesting substantial influence of these climatic patterns on groundwater. All of these studies confirm that there exists a definite linkage between groundwater levels and long-term climatic cycles occurring thousands of kilometers away from the area/basin under study.

India receives around 70–90% of its rainfall from the southwest monsoon, and its interannual variability is mostly influenced by the large-scale climatic pattern of ENSO (Revadekar et al. 2012). The ENSO is a climatic phenomenon that affects global climate variability owing to the interaction between the tropical Pacific Ocean and its surrounding atmosphere. It has two phases, El Niño and La Niña, which are linked to sea surface temperatures and alternate approximately every 2–7 years (IRI 2017). During the warm phase (El Niño), the sea surface temperature is anomalously warm, while in the cool phase (La Niña), it is anomalously cool. In a recent study, it was found that 10 out of the past 13 droughts in India had a high correlation with the El Niño phase (Singh 2014). A study conducted in an arid region of Western Rajasthan (India) revealed that the droughts during El Niño phases were more severe than those during La Niña phases (Ganguli and Reddy 2013). In another study, Reddy and Ganguli (2012b) applied bivariate copulas and reported that during La Niña (El Niño), there was higher (lower) precipitation and shallower (deeper) groundwater levels in the Manjra River basin of western India.

The preceding review suggests that the association between hydro-climatic variables can be studied using emerging tools and techniques. One such technique is the copula function, which has been extensively used in hydro-meteorological studies (ICSH 2017). The copula technique has salient features that are helpful in hydrological studies: (1) it derives joint distributions independently of the marginals, (2) along with the composite likelihood approach, it reduces uncertainty in the estimates of frequency distribution parameters, and (3) it handles non-linearity when modeling dependence between random variables (Genest and Favre 2007; Chowdhary and Singh 2010). Although copula-based bivariate, trivariate and quadravariate analyses have been used for the frequency analysis of extreme events (Salvadori and De Michele 2004; Shiau et al. 2007; Kao and Govindaraju 2008; Karmakar and Simonovic 2009; Wong et al. 2010), bivariate analysis is mostly preferred due to its simplicity (Klein et al. 2011).

It is also apparent from the aforementioned review that, to date, only one study (Reddy and Ganguli 2012b) has applied copula models for the risk assessment of changes in hydro-climatic variables on groundwater; however, that study considered only one observation well as representative of the entire river basin (143.32 km²), which is not practically appropriate because its findings may not hold for the entire basin. To address this shortcoming, the present study was conceived to explore the applicability of the copula technique at a larger scale (e.g., basin/sub-basin scale) so as to ensure more realistic findings for the area under study. Also, studies on groundwater linkage with the ENSO phenomenon are very limited in developing countries in general and the Indian subcontinent in particular (Reddy and Ganguli 2012b; Susilo et al. 2013; Seeboonruang 2014). Given these research gaps and the increasing drought incidences in India, this study was carried out in the Sina River basin, which is located in the semi-arid region of Maharashtra, western India, and is classified as a ‘chronically drought-prone area’ (PACS 2004). As a result, frequent droughts occur and water scarcity is a serious problem in the area (Chary et al. 2010; DTE 2016; News World India 2016). In addition, to the best of the authors’ knowledge, no scientific study has been conducted so far in the study area and the present study is the first of its kind in the area. The specific objectives of this study are: (1) to evaluate dependence among hydro-climatic variables, (2) to model joint dependence between hydro-climatic variables using suitable Archimedean copulas, and (3) to explore the effect of hydro-climatic variables (rainfall and ENSO phenomena) on groundwater using copula-based conditional distributions.
The methodology used to fulfill these objectives is an extension of that reported by Reddy and Ganguli (2012b).

Overview of copulas

A copula is a multivariate probability distribution whose marginal distributions are all uniform on the interval (0, 1). It can represent and model dependence between associated random variables irrespective of their marginal distributions. Sklar’s theorem (Sklar 1959) states that every joint distribution F can be expressed as:

$$ {F}_{X_1,X_2,\dots,X_n}\left(x_1,x_2,\dots,x_n\right)=C\left[{F}_{X_1}\left(x_1\right),{F}_{X_2}\left(x_2\right),\dots,{F}_{X_n}\left(x_n\right)\right] $$
(1)

where \( {F}_{X_1,X_2,\dots,X_n}\left(x_1,x_2,\dots,x_n\right) \) stands for the joint cumulative distribution function (CDF) with continuous marginal distributions \( {F}_{X_1}\left(x_1\right),{F}_{X_2}\left(x_2\right),\dots,{F}_{X_n}\left(x_n\right) \) of the random variables \( X_1,\dots,X_n \), and C is a copula, i.e., a CDF whose margins are uniform on the interval (0, 1).

This study focuses on two variables at a time, i.e., n = 2. These variables are denoted X and Y; their joint distribution \( F_{X,Y} \) can be expressed in terms of their CDFs \( F_X \) and \( F_Y \) as follows:

$$ {F}_{X,Y}\left(x,y\right)=C\left[{F}_X(x),{F}_Y(y)\right]=C\left[u,v\right] $$
(2)

where \( u = F_X(x) \) and \( v = F_Y(y) \). The copula C is unique whenever \( F_X \) and \( F_Y \) are continuous; otherwise, it is uniquely determined on Range \( F_X \) × Range \( F_Y \).

Archimedean copulas

In general, the copula C in Eq. (2) is assumed to come from a parametric class. The Archimedean, elliptical, and extreme-value copula families have been applied in hydrological studies. However, Archimedean copulas are most frequently used due to their flexibility and simplicity (e.g., Genest and MacKay 1986; Zhang and Singh 2006; Klein et al. 2011) and, hence, this type was selected for this study. A bivariate copula C is said to be Archimedean if it can be written in the following form:

$$ C\left(u,v\right)={\phi}^{-1}\left(\phi\left(u\right)+\phi\left(v\right)\right) $$
(3)

where the generator ϕ is a function ϕ: [0, 1] → [0, ∞] which is convex, decreasing and such that ϕ(1) = 0. Its pseudo-inverse is denoted \( {\phi}^{-1} \). Various parametric classes of Archimedean copulas are listed in Table 1, in terms of their CDF, generator function and other properties. For more information about copulas and their application, the interested reader can refer to Salvadori and De Michele (2007), Genest and Nešlehová (2012a, b) or Genest and Chebana (2016).
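As an illustration of Eq. (3), the following minimal Python sketch builds a copula from its generator and checks it against the closed form. It assumes the standard Clayton generator ϕ(t) = (t^(−θ) − 1)/θ with θ > 0; the function names are hypothetical and are not taken from the study, which used MATLAB.

```python
def clayton_generator(t, theta):
    """Assumed standard Clayton generator: phi(t) = (t**-theta - 1) / theta, theta > 0."""
    return (t ** (-theta) - 1.0) / theta


def clayton_generator_inverse(s, theta):
    """Inverse of the generator: phi^{-1}(s) = (1 + theta * s)**(-1 / theta)."""
    return (1.0 + theta * s) ** (-1.0 / theta)


def archimedean_cdf(u, v, theta):
    """Eq. (3): C(u, v) = phi^{-1}(phi(u) + phi(v)), here with the Clayton generator."""
    s = clayton_generator(u, theta) + clayton_generator(v, theta)
    return clayton_generator_inverse(s, theta)


def clayton_closed_form(u, v, theta):
    """Closed-form Clayton CDF, used only to cross-check the generator route."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


if __name__ == "__main__":
    # Both routes should agree up to floating-point error.
    print(archimedean_cdf(0.3, 0.7, 2.0), clayton_closed_form(0.3, 0.7, 2.0))
```

Because ϕ(1) = 0, setting v = 1 returns u, which is the uniform-margin property required of any copula.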

Table 1 Copula cumulative distribution functions, their generator function \( {\phi}_{\theta} \), and the relation of Kendall’s tau (τ) with parameter θ

Methodology

Study area

For the present study, the Sina River basin was selected as the study area. This basin is located in Maharashtra, western India (Fig. 1), between 17° 28′ N and 19° 16′ N latitude, 74° 28′ E and 76° 7′ E longitude. The basin has an area of 12,244 km2, with the topographic elevation ranging from 420 to 964 m (above mean sea level; MSL). It comprises four districts, namely Ahmednagar, Beed, Osmanabad and Solapur, but the largest portion (42%) of the basin falls in Solapur district. The 19 smaller subdivisions, i.e., blocks, for these four administrative districts are shown by different colors in Fig. 1. The average maximum and minimum air temperatures are 40.5 °C in the month of May and 10.5 °C in the month of December, respectively. The rainy season extends from mid-June to the end of October. The average annual rainfall of the study area is 644 mm; most of the rainfall occurs due to the southwest monsoon.

Fig. 1 Location of the study basin along with boundaries of administrative ‘blocks’ and locations of gauging sites

Geologically, the study area is underlain by Deccan basalts, which are composed of vesicular amygdaloidal basalt and fraction jointed basalt (Deolankar 1980). The water-bearing formations are generally shallow unconfined or semi-confined aquifers in the cover of weathered or fractured upper portions of Deccan basalts, along with a patch of local alluvium. The depth of the weathered/fractured zones under unconfined conditions ranges from 7.2 to 22.5 m below the ground level. The specific yield (effective porosity) of the unconfined aquifers ranges from 0.010 to 0.026, which indicates a relatively low storage capability of the aquifers.

Data

Hydro-meteorological data used in this study were collected from various government organizations/agencies. Daily rainfall data of nine raingauge stations for the period of 1985–2009 were collected from the India Meteorological Department (IMD), Pune, and the State Data Storage Center, Hydrology Project (HP), Nashik, India. It should be noted that the World Meteorological Organization recommendation of 1 rainfall station per 600–900 km² for plain areas could not be met, so the rainfall data were supplemented by data from stations in the vicinity of (but outside) the study area, to better represent the spatial average rainfall within the study area. Pre-monsoon (May) and post-monsoon (October) groundwater-level data of 132 sites (observation wells) over the basin for the 1985–2009 period were also acquired from the Groundwater Survey and Development Agency (GSDA), Pune, India. These data are from unconfined aquifers, which are predominant in the study area. The locations of observation wells and raingauge stations are shown in Fig. 1. Groundwater-level data for many sites and for some years are missing from the 1985–2009 dataset; this is a common problem in most developing nations of the world. As a result, the application of time-series analysis techniques under limited-data conditions becomes a challenging task for the researchers of developing nations; therefore, considering the low availability and continuity of time-series groundwater-level data in the study area, the present study was carried out under data-scarce conditions. In this study, 17 years (1990–2006) of groundwater-level and rainfall data have been used to investigate the applicability of the copula technique at a larger scale. Thiessen polygons were created using the rainfall stations available in the study area (Fig. 2). The areas of the Thiessen polygons and the number of observation wells falling within each Thiessen polygon are given in Table 2.
It is also worth mentioning that some recent studies on copula modeling have used limited datasets (15–18 years) (e.g., Durocher et al. 2016; Reddy and Ganguli 2012a).

Fig. 2 Locations of raingauge stations and areas of their Thiessen polygons

Table 2 Number of observation wells, area and percentage of total area under the Thiessen polygons of nine raingauge stations

The impacts of hydro-climatic factors on groundwater are reflected in recharge and discharge processes occurring in a groundwater basin. However, detailed information about these processes is often lacking at a basin scale, especially in the developing world. Groundwater level is generally monitored and, hence, it is more readily available than data on other components of groundwater. In fact, spatio-temporal variations of groundwater levels in a basin are the outcome of spatially and temporally varying recharge and discharge processes occurring in the basin. Given this fact and the unavailability of other groundwater-related data in the study area, the effects of hydro-climatic factors on groundwater have been explored in this study using groundwater-level data.

The ENSO phenomenon is well represented by a recent index known as ‘multivariate ENSO Index (MEI)’. MEI is defined using the first un-rotated principal component of six observed variables—sea-level pressure, zonal and meridional components of the surface wind, sea surface temperature, surface air temperature and total cloudiness fraction of the sky over the tropical Pacific (Wolter and Timlin 2011). Monthly MEI values for the study period (1990–2006) were obtained from the database provided by the National Oceanic and Atmospheric Administration (NOAA 2017).

Dependence among the hydro-climatic variables

Before evaluating dependence, the serial independence of each time series was checked by an auto-correlation test (Ljung-Box Q-test). For evaluating dependence, rank-based (and hence scale-free) measures of dependence, such as Spearman’s rho (ρ) and Kendall’s tau (τ), are preferred over Pearson’s correlation coefficient, given that they do not rely on any assumption of linearity between the random variables and are not affected by outliers (Klein et al. 2011). In this study, Spearman’s ρ was used to evaluate dependence among hydro-climatic variables at all nine raingauge stations for the 17-year period (1990–2006). The elevation of post-monsoon groundwater levels (PMGL) was considered instead of depth to groundwater below the ground surface in order to maintain a common datum for all the groundwater-monitoring sites.

In standard climatology, variables affected by large-scale climatic patterns should be averaged over the area (Fleming and Quilty 2006); hence, to study the effect of monsoon rainfall and the ENSO phenomenon on groundwater levels, post-monsoon groundwater levels for the observation wells of a particular Thiessen polygon were averaged. To perform this analysis, the cumulative monsoon rainfall and the average of monthly MEI values for the period June to October were used. The presence of dependence between each pair of hydro-climatic variables was examined at the 1 and 5% levels of significance based on the p-values of the standard two-tailed t-test. It is worth mentioning that to reject the null hypothesis, the p-value should be less than or equal to the level of significance (α). For a visual illustration, the variation of hydro-climatic variables over their standardized value was also plotted.
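The dependence screening described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, ties are given average ranks, and the significance check uses the usual t-statistic for a correlation coefficient with n − 2 degrees of freedom.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)


def t_statistic(rho, n):
    """t = rho * sqrt((n - 2) / (1 - rho**2)), with n - 2 degrees of freedom."""
    return rho * math.sqrt((n - 2) / (1.0 - rho ** 2))


if __name__ == "__main__":
    # Hypothetical rho for a 17-year series (15 degrees of freedom).
    # Standard t-table two-tailed critical values: 2.131 (5%) and 2.947 (1%).
    print(t_statistic(0.5, 17))
```

With 17 annual values, |t| must exceed the tabulated critical value for 15 degrees of freedom for the dependence to be declared significant at the chosen level.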

Fitting marginal distributions to hydro-climatic variables

After evaluating the dependence, marginal distributions were fitted to each of the variables. For PMGL and rainfall, the most popular parametric distributions were used, namely gamma (GM), lognormal (LN) and Weibull (WB); however, for the ENSO Index, non-parametric kernel-density-based normal and quadratic distributions were considered because parametric distributions do not fit climate indices properly (Reddy and Ganguli 2012b). The probability density functions and parameter estimates for the parametric and non-parametric kernel-density-based distributions are shown in Table 3. In all cases, the estimates were obtained using the method of maximum likelihood. The best distribution was selected based on three univariate statistical indicators, namely root mean square error (RMSE), Akaike information criterion (AIC), and the Kolmogorov-Smirnov (KS) test, and a graphical indicator (cumulative distribution function plot).
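Two of the indicators named above, RMSE and the KS statistic, can be illustrated with a short Python sketch for an already-fitted Weibull distribution. The Gringorten plotting position used for the empirical CDF is an assumption made here for illustration (the source does not state which plotting position was used), and all function names are hypothetical.

```python
import math


def weibull_cdf(x, shape, scale):
    """Two-parameter Weibull CDF: F(x) = 1 - exp(-(x / scale)**shape)."""
    return 1.0 - math.exp(-((x / scale) ** shape))


def empirical_cdf(n):
    """Assumed Gringorten plotting positions (i - 0.44) / (n + 0.12), i = 1..n."""
    return [(i - 0.44) / (n + 0.12) for i in range(1, n + 1)]


def rmse(sample, cdf):
    """Root mean square error between fitted and empirical non-exceedance probabilities."""
    xs = sorted(sample)
    emp = empirical_cdf(len(xs))
    return math.sqrt(sum((cdf(x) - p) ** 2 for x, p in zip(xs, emp)) / len(xs))


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov statistic: largest gap between the step ECDF and the fitted CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d
```

A candidate distribution with smaller RMSE and KS statistic tracks the empirical CDF more closely; the parameters passed to `weibull_cdf` would come from the maximum-likelihood fit.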

Table 3 Expressions of probability density function and parameter estimation for parametric and non-parametric distributions used in the study

Archimedean copulas for modeling dependence

A priori, the choice of parametric Archimedean copula families as possible models for the dependence between hydro-climatic variables is guided by the range of association they allow. The Clayton and Gumbel-Hougaard copulas are used if the dependence is positive, whereas the Ali-Mikhail-Haq and Frank copulas are applied for modeling both positive and negative dependence. The Frank copula can model the entire range of dependence values [−1, +1], whereas the Ali-Mikhail-Haq family of copulas is only suitable for weakly dependent variables (Nelsen 2006). There exists a connection between a rank-based non-parametric measure of dependence called Kendall’s τ and the generator of an Archimedean copula, which is given as follows (Genest and MacKay 1986):

$$ \tau =1+4{\int}_0^1\frac{\phi (t)}{\phi^{\prime }(t)}\, dt $$
(4)

where ϕ′ denotes the derivative of ϕ with respect to t. This relation can be used to estimate the parameter θ of an Archimedean copula by the method of moments, which consists of replacing τ by an estimate thereof in Eq. (4) and solving for θ for any given choice of Archimedean copula; thus, paired random variables can be modeled through copulas by preserving their mutual dependence. In this study, four families of Archimedean copulas (Clayton, Gumbel-Hougaard, Ali-Mikhail-Haq and Frank) were applied. The expression of the generator function for each copula family with its derivative, together with the relation of Kendall’s τ with copula parameter θ, is presented in Table 1. Copula modeling was performed using MATLAB software.
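The method-of-moments step can be sketched as follows. This is a minimal Python illustration rather than the MATLAB code used in the study. It assumes the standard closed-form results of Eq. (4) for the Clayton (τ = θ/(θ + 2)) and Gumbel-Hougaard (τ = 1 − 1/θ) families, and for the Frank family it inverts τ = 1 − (4/θ)[1 − D₁(θ)] numerically, where D₁ is the first Debye function. All function names are hypothetical.

```python
import math


def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / total pairs, no tie correction."""
    n = len(x)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            prod = (x[i] - x[j]) * (y[i] - y[j])
            s += (prod > 0) - (prod < 0)
    return s / (n * (n - 1) / 2)


def clayton_theta(tau):
    """Clayton: tau = theta / (theta + 2), hence theta = 2 * tau / (1 - tau)."""
    return 2.0 * tau / (1.0 - tau)


def gumbel_theta(tau):
    """Gumbel-Hougaard: tau = 1 - 1 / theta, hence theta = 1 / (1 - tau)."""
    return 1.0 / (1.0 - tau)


def frank_tau(theta, steps=2000):
    """Frank: tau = 1 - 4/theta * (1 - D1(theta)); D1 evaluated by Simpson's rule."""
    def integrand(t):
        return 1.0 if abs(t) < 1e-12 else t / math.expm1(t)
    h = theta / steps
    total = integrand(0.0) + integrand(theta)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(k * h)
    debye = (total * h / 3.0) / theta
    return 1.0 - 4.0 / theta * (1.0 - debye)


def frank_theta(tau, lo=1e-6, hi=100.0, tol=1e-10):
    """Invert frank_tau by bisection on |tau| (tau is odd in theta); valid for |tau| below about 0.96."""
    if tau == 0:
        return 0.0  # independence limit
    target = abs(tau)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < target:
            lo = mid
        else:
            hi = mid
    return math.copysign(0.5 * (lo + hi), tau)
```

For instance, a sample τ of 0.5 gives θ = 2 for both the Clayton and Gumbel-Hougaard families, while a negative τ yields a negative Frank parameter, which is why only the Frank family can represent negatively associated pairs among those with a simple inversion.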

Goodness-of-fit tests for selecting copulas

Goodness-of-fit tests can be used to check whether a specific copula family fits the data at hand. In this study, both graphical and statistical indicators were used to assess the fitness of Archimedean copulas.

Graphical diagnostics

In order to assess the fit of a given Archimedean copula family \( C_{\theta} \), 1,000 observations were generated from \( C_{\theta} \) after estimating its parameter. These pseudo-observations were then transformed back into the variables’ original units using the inverses of the marginal distributions \( F_X \) and \( F_Y \). The scatter plot of the resulting pairs was then visualized and compared to the original data. Algorithms to generate random pairs from different copula families (\( C_{\theta} \)) can be found in Whelan (2004) and Genest and Favre (2007).
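The simulation step can be illustrated with the following minimal Python sketch for the Clayton family. It uses the conditional-distribution (inverse) method, which is one of several possible sampling algorithms and is not necessarily the one used in the study; the marginal parameters passed to the quantile functions are placeholders to be supplied by the user, and all function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def sample_clayton(theta, n, seed=0):
    """Draw n pairs (u, v) from a Clayton copula (theta > 0) by the conditional method."""
    rng = random.Random(seed)
    eps = 1e-12  # keep draws strictly inside (0, 1)
    pairs = []
    for _ in range(n):
        u = min(max(rng.random(), eps), 1.0 - eps)
        w = min(max(rng.random(), eps), 1.0 - eps)
        # Solve dC/du = w for v, given u.
        v = (u ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (-1.0 / theta)
        pairs.append((u, v))
    return pairs


def weibull_quantile(p, shape, scale):
    """Inverse Weibull CDF."""
    return scale * (-math.log(1.0 - p)) ** (1.0 / shape)


def lognormal_quantile(p, mu, sigma):
    """Inverse lognormal CDF, with mu and sigma on the log scale."""
    return math.exp(mu + sigma * NormalDist().inv_cdf(p))


def to_original_units(pairs, inv_fx, inv_fy):
    """Map copula-scale pairs back to the variables' units through the inverse marginals."""
    return [(inv_fx(u), inv_fy(v)) for u, v in pairs]
```

The back-transformed pairs can then be overlaid on the observed data in a scatter plot, and Kendall's τ of the simulated sample can be compared with that of the observations.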

Statistical indicators

Apart from the graphical diagnostics, three statistical indicators for bivariate copulas were used in this study, namely the RMSE, the AIC, and the KS goodness-of-fit test. Detailed descriptions of these statistical indicators can be found in Klein et al. (2011).

Effect of rainfall and the ENSO phenomenon on groundwater

In order to study the impacts of rainfall and the ENSO phenomenon on groundwater, the copula-based conditional distribution probabilities of PMGL ≤ PMGLavg for average and non-average monsoon rainfall scenarios as well as for ENSO phases were determined from the following equation (Zhang and Singh 2006; Reddy and Ganguli 2012b):

$$ {F}_{X\mid Y\le y}={C}_{U\mid V\le v}=\frac{C\left(u,v\right)}{v} $$
(5)

Further, the spatial variation of these probabilities over the study area was analyzed by generating probability maps using ArcGIS software.
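Equation (5) can be evaluated directly once a copula has been fitted. The following minimal Python sketch does so for the Clayton family, where u and v are the marginal non-exceedance probabilities of the PMGL threshold and the rainfall threshold, respectively; the function names and the parameter value in the comments are illustrative assumptions, not results from the study.

```python
def clayton_cdf(u, v, theta):
    """Clayton copula CDF: C(u, v) = (u**-theta + v**-theta - 1)**(-1/theta), theta > 0."""
    return (u ** (-theta) + v ** (-theta) - 1.0) ** (-1.0 / theta)


def conditional_cdf(u, v, theta):
    """Eq. (5): P(U <= u | V <= v) = C(u, v) / v."""
    return clayton_cdf(u, v, theta) / v


if __name__ == "__main__":
    # Illustrative theta = 2: probability that PMGL is at or below its median,
    # given rainfall at or below its 20th and 80th percentiles, respectively.
    print(conditional_cdf(0.5, 0.2, 2.0), conditional_cdf(0.5, 0.8, 2.0))
```

Under positive dependence, the conditional probability of a low PMGL is larger when rainfall is restricted to low values, and it reduces to u when v = 1 (no conditioning).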

Results and discussion

Preliminary data analysis

Rainfall characteristics

The monthly variation of rainfall for stations in the upper, middle, and lower parts of the study area is shown in Fig. 3a–c. Most of the rainfall in the study area is confined to the five monsoon months, i.e., from June to October. For all rainfall stations, the maximum amount of rainfall among the monsoon months is received in September. In the upper, middle, and lower parts of the study area, the maximum rainfall is received at the Chinchondipatil, Alni and Solapur stations, respectively. The variation of the monthly rainfall over the period of 25 years for that particular month is shown by standard error bars in Fig. 3a–c. All stations have the highest standard error in the month of September, except Kasegaon station, which has it in August. Minimum standard errors are found in the months of January and February at all the stations.

Fig. 3 Temporal variation of mean monthly rainfall for a upper, b middle, c lower part stations of the study area during 1990–2006 period

Groundwater characteristics

The pre- and post-monsoon groundwater-level elevation time series data for Chinchondipatil, Jamkhed and Kasegaon stations representing, respectively, the upper, middle, and lower parts of the study area are plotted in Fig. 4a–c along with the annual rainfall time series data. The pre-monsoon groundwater-level elevations for Chinchondipatil, Jamkhed and Kasegaon stations are 616.09–626.73, 549.99–566.16, 479.52–493.07 m MSL, and the post-monsoon groundwater-level elevations are in the range of 622.73–630.39, 558.45–564.09, 483.38–489.71 m MSL, respectively. These values clearly show the response of post-monsoon groundwater-level elevation to the variation of rainfall, i.e., post-monsoon groundwater-level elevation increases with increase in rainfall and vice-versa. In case of pre-monsoon groundwater-level elevation, a sudden peak is observed in year 1999 for Jamkhed and Kasegaon stations, which is attributed to recharge from the maximum rainfall in the previous year (i.e., 1998).

Fig. 4 Pre-monsoon and post-monsoon groundwater-level fluctuations for a upper, b middle, c lower parts of the study area during 1990–2006 period

ENSO Index

The monthly variation of the ENSO Index during the 1990–2006 period is shown in Fig. 5. For the ENSO phases during the considered period, the top 30th percentile of ENSO Index values represents El Niño years, the bottom 30th percentile represents La Niña years, and the remaining values represent neutral years (Wolter and Timlin 2011). Accordingly, 1995–1996 and 1998–2000 are La Niña years; 1990 and 2001–2006 are neutral years; and the remaining 5 years, i.e., 1991–1994 and 1997, are El Niño years.

Fig. 5 Monthly variation of ENSO Index during 1990–2006 period

Evaluating dependence among the hydro-climatic variables

The auto-correlation test revealed that there is no significant time autocorrelation at the 5% level of significance, thereby suggesting that each time series is independent during the study period. However, it is apparent from Fig. 6a–d that all the hydro-climatic variables, i.e., PMGL, rainfall and ENSO Index are cross-correlated to one another. For brevity, the graphs of four selected stations are shown in Fig. 6a–d as an example and the dependence measured using Spearman’s ρ is presented in Table 4. The evaluation of dependence indicated that there is positive dependence in the PMGL-Rainfall pair, which means increase in rainfall increases the PMGL. On the other hand, PMGL-ENSO Index and Rainfall-ENSO Index are negatively associated with each other. These relationships suggest that there will be a decrease in the PMGL as well as rainfall with increase in the ENSO Index values.

Fig. 6 Variation of standardized values of ENSO Index, post-monsoon groundwater levels (PMGL) and monsoon rainfall for the 1990–2006 period at rainfall stations: a Alni, b Chinchondipatil, c Jamkhed, and d Kasegaon

Table 4 Summary of Spearman’s rho (ρ) values for the pairs of PMGL-Rainfall, PMGL-ENSO Index, and Rainfall-ENSO Index

For the PMGL-Rainfall pair, a high level of dependence was found to be significant for all the stations, except at the Bandalgi station, which is located in the downstream portion of the study area. This lower dependence between PMGL and Rainfall at the Bandalgi station could be attributed to concentrated runoff (overland flow) at the downstream end. For the PMGL-ENSO Index pair, only Jamkhed, Tembhurni, Solapur and Bandalgi stations, which cover 49% of the study area (6,185 km²), exhibited statistically significant negative dependence (Table 4). There is no statistically significant dependence for the remaining five stations and, hence, in the areas covered by these stations, the relationship between PMGL and ENSO can only be used for qualitative prediction (high or low PMGL). This insignificant dependence may be attributed to other climatic oscillations (Jones and Banner 2003). For the Rainfall-ENSO Index pair, only three stations (Alni, Kasegaon and Supa) that cover 32% of the study area showed statistically significant negative dependence; hence, this pair was not considered in subsequent analyses.

Identifying marginal distribution for fitting hydro-climatic variables

The performance evaluation for the distribution fitting of PMGL at all the stations was carried out using cumulative distribution function (CDF) plots and statistical indicators as shown in Fig. 7a,b and Table 5, respectively. Table 6 summarizes the estimated parameters of GM, LN, and WB distributions. Upon visually assessing the CDF fit for the PMGL time series (Fig. 7a,b) for different stations and the AIC values (Table 5), it can be seen that the WB distribution provides a better fit than the GM and LN distributions. For the rainfall time series, CDF plots (Fig. 8a,b) and RMSE values (Table 7) suggest that it is better represented by the LN distribution compared to WB and GM distributions. The parameter estimates for the GM, LN, and WB distributions fitted to the rainfall time series are given in Table 8.

Fig. 7 Cumulative distribution function of gamma, lognormal, and Weibull distributions fitted to post-monsoon groundwater levels (PMGL) in the zones of rainfall stations: a Chinchondipatil and b Jamkhed

Table 5 Performance evaluation of different probability distributions fitted to PMGL
Table 6 Parameters of the probability distributions fitted to PMGL
Fig. 8 Cumulative distribution function of gamma, lognormal, and Weibull distributions fitted to rainfall at stations: a Chinchondipatil and b Jamkhed

Table 7 Performance evaluation of different probability distributions fitted to rainfall
Table 8 Estimated parameters of the probability distributions fitted to rainfall

Moreover, the CDF plot for the ENSO Index is depicted in Fig. 9, which reveals that both the non-parametric kernel-based normal and quadratic distributions performed nearly the same; the KS-test also supports both the distributions (Table 9). However, the statistical evaluation confirmed that the ENSO Index is best fitted by the ‘non-parametric kernel-based normal distribution’ with lower values of AIC (−32.54) and RMSE (0.05989) as shown in Table 9. The optimal bandwidth is the only parameter for the two non-parametric kernel-based distributions and its value is estimated as 0.4455.

Fig. 9 Cumulative distribution function of non-parametric normal and quadratic distributions fitted to the ENSO Index

Table 9 Performance evaluation of different probability distributions fitted to ENSO Index

Selecting suitable copula for modeling dependence

As mentioned in the previous section, the PMGL and rainfall time series followed different distributions and, hence, the traditional bivariate distribution cannot be used for dependence modeling. Even when the PMGL and rainfall time series share the same marginal distribution, a copula function is preferred to the traditional bivariate distribution due to its better performance (Ganguli and Reddy 2012). Hence, the dependence between PMGL and rainfall is modeled using a copula function, because it does not require the random variables to follow the same marginal distribution family. As the PMGL-Rainfall pair exhibited highly positive dependence (p < 0.01), an attempt was made to capture their dependence using Clayton (Cl), Frank (Fr) and Gumbel-Hougaard (GH) copula models. The estimates of copula model parameters for the PMGL-Rainfall pair are shown in Table 10. The scatter plots of observed and simulated data from the three fitted copula models are shown in Fig. 10a–c, together with the Kendall’s τ values computed from simulated samples for the three copula models. It is evident that the random pairs generated by all three copula models (shown as gray dots) are well intertwined with the observed data (shown as black dots). Furthermore, the values of Kendall’s τ for the simulated data are close to those of the observed data (Fig. 10a–c and Table 11). However, the Clayton copula better simulates the trend of the observed data compared to the other two copulas. In addition, all the statistical indicators (Table 12) also confirm that the Clayton copula is a better choice among the three copula families considered. It should be noted from Fig. 10a–c that the upper bound appears for groundwater levels and rainfall dependence. For this, the estimated non-parametric upper tail dependence coefficient for all stations in the study area is found to vary from 0.30 to 0.72.
Also, at a certain threshold of high rainfall, very weak dependency exists between PMGL and rainfall. In fact, the rate of recharge is a function of depth to the water table. Hence, when the water table reaches the threshold value, the recharge rate is drastically reduced, which causes less dependency on rainfall.

Table 10 Parameters of different copulas fitted to PMGL-Rainfall pairs
Fig. 10 Scatter plots of observed (black dots) versus 1,000 simulated (gray dots) samples using Clayton, Frank and Gumbel-Hougaard copulas for the PMGL-Rainfall pair at a Alni, b Chinchondipatil, and c Jamkhed stations. In this figure, Kendall’s tau (τ) value is shown for simulated samples

Table 11 Kendall’s tau (τ) values for the PMGL-Rainfall and PMGL-ENSO Index pairs
Table 12 Results of goodness-of-fit tests for the different PMGL-Rainfall copula models

For the PMGL-ENSO Index pair, a negative dependence was found (Table 4). Therefore, this pair was modeled using only the Frank copula, which is applicable to the entire range of dependence [−1, +1]. The scatter plots show a good overlap, and Kendall’s τ values are close, between the observed data and the pseudo-samples generated by the Frank copula for all the stations (Fig. 11a–c and Table 11). The results of the KS-test (Table 13) also suggest that the dependence in the PMGL-ENSO Index pair is adequately captured by the Frank copula. The estimates of the parameters of the Frank copula fitted to the PMGL-ENSO Index pair are shown in Table 13. This choice of copula model corroborates the earlier study by Reddy and Ganguli (2012b), in which depth-to-groundwater data were considered instead of PMGL.
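To illustrate why the Frank copula can accommodate negative dependence, the following Python sketch evaluates the standard relation between the Frank parameter θ and Kendall’s τ, τ = 1 − (4/θ)[1 − D1(θ)], where D1 is the first-order Debye function, and inverts it by bisection. This is only a sketch under stated assumptions: the target value τ = −0.4, the numerical settings and the function names are hypothetical, and the parameters in Table 13 may have been estimated by a different method.

```python
import math


def debye1(theta, steps=2000):
    """D1(theta) = (1/theta) * integral from 0 to theta of t / (e^t - 1) dt (midpoint rule)."""
    h = theta / steps
    total = 0.0
    for i in range(steps):
        t = (i + 0.5) * h  # midpoints never hit t = 0
        total += t / math.expm1(t)
    return total * h / theta


def frank_tau(theta):
    """Kendall's tau implied by a Frank copula with parameter theta."""
    if abs(theta) < 1e-6:
        return 0.0  # independence limit
    return 1.0 - 4.0 / theta * (1.0 - debye1(theta))


def frank_theta_from_tau(tau, lo=-50.0, hi=50.0, tol=1e-10):
    """Solve frank_tau(theta) = tau by bisection; tau increases with theta."""
    if not frank_tau(lo) < tau < frank_tau(hi):
        raise ValueError("tau is outside the bracketed range")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if frank_tau(mid) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


tau_target = -0.4  # hypothetical negative dependence, not from Table 11
theta_hat = frank_theta_from_tau(tau_target)
print(f"tau = {tau_target} corresponds to theta = {theta_hat:.3f}")
```

A negative τ maps to a negative θ, which the Clayton and Gumbel-Hougaard families in their usual forms cannot represent.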

Fig. 11 Scatter plots of observed (black dots) versus 1,000 simulated (gray dots) samples using Frank copula for the PMGL-ENSO Index pair at (a) Jamkhed, (b) Tembhurni, and (c) Solapur stations

Table 13 Estimated parameters and KS-test values of the Frank copula for PMGL-ENSO Index pairs

Impacts of rainfall and the ENSO phenomenon on groundwater

Impacts of rainfall

To study the effects of rainfall on groundwater, graphs of the Clayton-copula-based conditional distribution probabilities of PMGL for given average and non-average (5th, 25th, 50th, 75th and 95th percentiles) rainfall conditions were prepared for four rainfall stations as examples (Fig. 12a–d). For a given average rainfall, the probability of PMGL not exceeding a level of lower magnitude is small, whereas the probability of not exceeding a level of higher magnitude is large. This can be explained by considering Chinchondipatil station (Fig. 12b). The PMGL values of 624 and 628 m are, respectively, of lower and higher magnitude in the zone/area represented by this station. For a given average rainfall, the probability of PMGL being less than or equal to 624 m (MSL) is 15%, whereas that of PMGL being less than or equal to 628 m (MSL) is nearly 80% (Fig. 12b). These probability values are not symmetrical when two different values of PMGL are considered. However, if only one value of PMGL is considered, then the probability values are complementary (x% and 100 − x%); for example, the probability of PMGL being less than or equal to 624 m (MSL) is 15%, whereas that of PMGL being greater than 624 m (MSL) is 85%.
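The conditional probabilities discussed above can be sketched in Python for the Clayton copula. The sketch assumes that the conditioning is on a fixed rainfall quantile, so that P(V ≤ v | U = u) is the partial derivative of the copula with respect to u; the study may instead condition on a rainfall interval. The parameter θ = 2, the PMGL quantile v = 0.3 and the function name are hypothetical, and the marginal transforms (u = rainfall CDF, v = PMGL CDF) are taken as given.

```python
def clayton_conditional_cdf(v, u, theta):
    """P(V <= v | U = u) for a Clayton copula with theta > 0.

    u and v are marginal probabilities in (0, 1], e.g. u = F_rain(r), v = F_pmgl(x).
    """
    if theta <= 0.0:
        raise ValueError("theta must be positive")
    if not (0.0 < u <= 1.0 and 0.0 < v <= 1.0):
        raise ValueError("u and v must lie in (0, 1]")
    inner = u ** (-theta) + v ** (-theta) - 1.0
    return u ** (-(theta + 1.0)) * inner ** (-(1.0 + theta) / theta)


theta = 2.0  # hypothetical copula parameter
v = 0.3      # hypothetical non-exceedance probability of a given PMGL
rain_percentiles = [0.05, 0.25, 0.50, 0.75, 0.95]

# Conditional non-exceedance probability of the same PMGL under each rainfall scenario
probs = [clayton_conditional_cdf(v, u, theta) for u in rain_percentiles]
for u, p in zip(rain_percentiles, probs):
    # The exceedance probability is the complement of the non-exceedance probability
    print(f"rainfall percentile {u:.2f}: P(<=) = {p:.3f}, P(>) = {1.0 - p:.3f}")
```

Because the dependence is positive, the probability that PMGL stays at or below a fixed level decreases as the rainfall percentile increases.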

Fig. 12 Conditional probabilities of post-monsoon groundwater levels (PMGL) for given average and non-average rainfall scenarios (5th, 25th, 50th, 75th and 95th percentiles) at four stations: (a) Alni, (b) Chinchondipatil, (c) Jamkhed, and (d) Kasegaon. In this figure, P denotes percentile of rainfall

Furthermore, based on Fig. 12a–d, a spatial map of the probability of PMGL ≤ PMGLavg, i.e., the probability of non-exceedance for a given average rainfall scenario, was generated (Fig. 13). It can be seen from Fig. 13 that for a given average rainfall, the conditional probability of PMGL ≤ PMGLavg is above 70% for the areas/zones covered by Alni, Tembhurni and Kolgaon stations, which encompass about 33% of the study area (4,019 km²). This indicates that PMGL in these areas (Barshi and Madha blocks, and some parts of nearby blocks) will be much lower than its average value; hence, the groundwater of these areas should be managed with high priority or an alternative water source should be utilized. In addition, rainwater harvesting and artificial recharge structures are recommended for these areas. The conditional probabilities of PMGL ≤ PMGLavg for a given average rainfall are in the range of 65–70% for the areas (2,316 km²) covered by Chinchondipatil and Supa stations, which suggests a moderate groundwater scenario under average rainfall conditions. Groundwater extraction from these areas should be carefully monitored to prevent them from shifting into the higher conditional probability class. Furthermore, the conditional probability values for a given average rainfall vary from 60 to 65% in the zones covered by Jamkhed, Kasegaon and Solapur stations, indicating that post-monsoon groundwater levels in the southern and central parts of the study area (46% of the area; 5,674 km²) would be close to their average values under average rainfall conditions; hence, these areas are the most favorable zones for groundwater extraction for domestic and irrigation needs as compared to other parts of the study area.
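The three probability classes described above can be expressed as a simple rule, sketched here in Python. The class boundaries follow the ranges stated in the text; the handling of values exactly on a boundary, the function name and the example probabilities are hypothetical and are not values read from Fig. 13.

```python
def management_class(p_non_exceedance):
    """Map P(PMGL <= PMGLavg | average rainfall) to the classes described in the text."""
    if not 0.0 <= p_non_exceedance <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    if p_non_exceedance > 0.70:
        return "high priority"
    if p_non_exceedance >= 0.65:
        return "moderate"
    if p_non_exceedance >= 0.60:
        return "most favorable"
    return "outside the ranges discussed"


# Hypothetical station probabilities for illustration only
examples = {"station A": 0.73, "station B": 0.67, "station C": 0.62}
classes = {name: management_class(p) for name, p in examples.items()}
for name, label in classes.items():
    print(f"{name}: {label}")
```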

Fig. 13 Probability (P) map of PMGL ≤ PMGLavg for a given average rainfall scenario. NS non-significant dependence

Impacts of ENSO phenomenon

To evaluate the effects of the ENSO phenomenon on groundwater, the Frank-copula-based conditional distributions of PMGL for different phases of ENSO were plotted (Fig. 14a–d). For this study, the average of the top 30th percentile of ENSO Index values (1.23) for the 1990–2006 period was considered as representative of the El Niño phase, whereas the average of the bottom 30th percentile of the ENSO Index values (−0.25) was deemed as representative of the La Niña phase. This figure reveals that, with an increase in the ENSO Index, the probability of PMGL for a particular interval increases at lower magnitudes of PMGL but decreases at higher magnitudes of PMGL. The probability of occurrence of higher PMGL is greater for a negative ENSO Index (La Niña phase) than for a positive ENSO Index (El Niño phase). For example, at the Jamkhed station (Fig. 14a), for the ENSO Index value of Z ≤ 1.23 (El Niño phase), the chance of occurrence of PMGL less than 563 m (above MSL) is 74%, whereas it is about 52% in the La Niña phase (Z ≤ −0.25).
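The phase-wise probabilities above condition on the ENSO Index not exceeding a threshold, which for a copula C gives P(X ≤ x | Z ≤ z) = C(u, w)/w with u = F_X(x) and w = F_Z(z). A minimal Python sketch for the Frank copula follows. The parameter θ = −3 and the marginal probabilities u and w are hypothetical stand-ins, since the fitted values (Table 13) and the marginal CDF values of the two thresholds are not reproduced here.

```python
import math


def frank_cdf(u, w, theta):
    """Frank copula C(u, w) for theta != 0."""
    if theta == 0.0:
        raise ValueError("theta must be non-zero")
    num = math.expm1(-theta * u) * math.expm1(-theta * w)
    return -math.log1p(num / math.expm1(-theta)) / theta


def prob_given_index_below(u, w, theta):
    """P(X <= x | Z <= z) = C(u, w) / w, with u = F_X(x) and w = F_Z(z)."""
    if not (0.0 < u <= 1.0 and 0.0 < w <= 1.0):
        raise ValueError("u and w must lie in (0, 1]")
    return frank_cdf(u, w, theta) / w


theta = -3.0     # hypothetical negative-dependence parameter
u = 0.5          # hypothetical F_PMGL at the level of interest
w_la_nina = 0.3  # hypothetical F_ENSO(-0.25)
w_el_nino = 0.9  # hypothetical F_ENSO(1.23)

p_la_nina = prob_given_index_below(u, w_la_nina, theta)
p_el_nino = prob_given_index_below(u, w_el_nino, theta)
print(f"P(PMGL <= x | La Nina threshold) = {p_la_nina:.3f}")
print(f"P(PMGL <= x | El Nino threshold) = {p_el_nino:.3f}")
```

With negative dependence, the non-exceedance probability of a given PMGL is larger under the El Niño threshold than under the La Niña threshold, which matches the direction of the Jamkhed example.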

Fig. 14 Conditional probability of PMGL ≤ PMGLavg for the ENSO Index Z ≤ −0.25 for La Niña phase and Z ≤ 1.23 for El Niño phase at stations (a) Jamkhed, (b) Tembhurni, (c) Solapur, and (d) Bandalgi

The conditional probability (PMGL ≤ PMGLavg) values during ENSO phases were determined only for Jamkhed, Tembhurni, Solapur and Bandalgi stations (Fig. 15a,b), where the dependence is statistically significant (Table 4). During the El Niño phase (Fig. 15a), the Tembhurni and Solapur stations (covering southwestern portions of the study area) show a higher (35–40%) non-exceedance probability of PMGL with respect to its average. This suggests that the blocks under maximum conditional probability (PMGL ≤ PMGLavg) will be more severely affected during El Niño years than the other parts of the study area. The affected blocks will be Madha, Mohal and North-Solapur blocks and some portions of Parenda and Karmala blocks, encompassing an area of 2,739 km² (23% of the study area). It is also apparent from Fig. 15a that the minimum probability of non-exceedance of PMGL during the El Niño phase is in the range of 30–35% for Jamkhed and Bandalgi stations (central part of the study area), which cover 3,228 km² (26% of the study area). On the other hand, in the La Niña phase (Fig. 15b), the non-exceedance probability of PMGL with respect to its average value is below 15% for Jamkhed, Bandalgi and Solapur stations, covering an area of 4,283 km² (35% of the study area); thus, the central and southern portions of the study area will benefit from increased PMGL during La Niña years.

Fig. 15 Probability (P) maps of PMGL ≤ PMGLavg for (a) El Niño and (b) La Niña phases. NS non-significant dependence

Conclusions

In this report, Archimedean copulas were applied under limited data conditions to assess the effects of the ENSO phenomenon and rainfall on the groundwater resource of a semi-arid river basin of western India. With regard to the availability and continuity of hydro-climatic time-series data in the study area, the dataset used in this study comprised monsoon rainfall of nine stations, post-monsoon groundwater levels (PMGL) at 132 sites, and the ENSO Index for the 1990–2006 period. Based on the salient goodness-of-fit criteria, marginal distributions were selected to formulate copula-based joint distributions for modeling dependence between hydro-climatic variables. Thereafter, out of the four Archimedean copula families, the best-performing copula was used to derive conditional probability distributions of groundwater-level time series with respect to rainfall events and ENSO phases.

The results of this study revealed that the dependence for the PMGL-Rainfall pair is positive, whereas that for the PMGL-ENSO Index pair is negative. The PMGL and rainfall time series are best represented by the parametric Weibull and lognormal distributions, respectively, whereas the ENSO Index time series is best represented by the non-parametric kernel-based normal distribution. The performance evaluation of the Archimedean copula family indicated that the Clayton copula is the best for modeling dependence between PMGL and rainfall, while the Frank copula is the best for the PMGL-ENSO Index pair. The spatial variation of the probability of PMGL ≤ PMGLavg for a given mean rainfall in the study area suggests that, for managing groundwater, the areas having above 70% probability (33% of the study area in the eastern and western portions) should be given higher priority. In addition, the results of the probability of PMGL ≤ PMGLavg during ENSO phases indicated that 23% of the study area in the southwestern portion will be severely affected during El Niño years, but 35% of the study area in the central and southern portions will benefit from increased PMGL (greater than PMGLavg) during La Niña years.

Finally, it can be concluded that the copula-based approach is very useful for understanding the impacts of environmental factors on vital groundwater resources at a basin or sub-basin scale. The methodology demonstrated in this study can be replicated for the effective planning and management of water resources at a basin scale under data-scarce conditions, particularly in the drought-prone regions of the Indian subcontinent and other parts of the world.