Introduction

Groundwater is a finite resource that supplies nearly half of all drinking water in the world (WHO 2011). India is considered to be one of the largest consumers of groundwater and has overexploited most of its aquifers (Aeschbach-Hertig and Gleeson 2012). Demand for freshwater is increasing rapidly due to population growth, urbanization, and industrialization, which particularly increases water stress in developing nations (UN-Water 2020). It is estimated that two-thirds of the global population will be living in cities by the year 2050 (DeSA 2013). Due to erratic climate events, the available surface and groundwater resources have been depleted in most urbanized regions (Gober et al. 2010; Famiglietti 2014). In addition to increasing demand and climate change, groundwater is contaminated by human activities such as industrial wastewater discharge (Selvakumar et al. 2017), fertilizer use in agriculture (Nas and Berktay 2006), poor sewerage systems (Bishop et al. 1998), and dumping sites (Han et al. 2016). Once a groundwater resource is contaminated, restoring it takes considerable cost and time (Kazemzadeh-Parsi et al. 2015). To protect this valuable resource from contamination, a proper management strategy needs to be deployed (El-Naqa and Al-Shayeb 2009; Wachniew et al. 2016). Groundwater vulnerability assessment is considered to be an effective tool for groundwater protection and planning (Panagopoulos et al. 2006). Generally, the groundwater vulnerability of an aquifer is assessed through three different techniques: (i) index-based models (Foster 1987; Aller et al. 1987; Van Stempvoort et al. 1993; Civita 1994; Chachadi and Lobo-Ferreira 2001), (ii) statistical models (Burkart et al. 1999; Johnson and Belitz 2009; Rawat et al. 2019), and (iii) process-based models (Jeannin et al. 2001; Schlosser et al. 2002; Milnes 2011).

The statistical method is associated with uncertainty and applies to regions with similar groundwater contamination factors, whereas the process-based model requires adequate data to capture the associated physical, chemical, and biological reactions beneath the surface (Shirazi et al. 2012). The index-based method has an advantage over the abovementioned methods because it is applicable to large regions with minimal data requirements. Previous studies have employed different index-based methods, including (i) DRASTIC (Secunda et al. 1998; Kim and Hamm 1999; Shouyu and Guangtao 2003; Koo and O’Connell 2006; Pacheco and Sanches Fernandes 2013), (ii) SINTACS (Kumar et al. 2013; Noori et al. 2019; Jahromi et al. 2020), (iii) GALDIT (Recinos et al. 2015; Kazakis et al. 2018; Bordbar et al. 2020), (iv) GOD (Ghazavi and Ebrahimi 2015; Boufekane and Saighi 2018; Mfonka et al. 2018), (v) AVI (Raju et al. 2014; Jilali et al. 2015; Luoma et al. 2017), and (vi) SI (Ribeiro et al. 2017; Ghouili et al. 2021). DRASTIC is the most widely adopted index-based model for groundwater vulnerability assessment (Rahman 2008). However, index-based methods have certain limitations because the ratings and weights associated with this approach are site specific and based on expert judgment, which increases the subjectivity of the DRASTIC model.

In recent years, many researchers have developed various approaches to reduce the subjectivity of the DRASTIC model. Two general approaches are followed to improve the DRASTIC method. The first category involves the inclusion or exclusion of certain parameters from the original DRASTIC model based on aquifer characteristics. Kazakis and Voudouris (2015) replaced original parameters such as soil media, aquifer type, and vadose zone characteristics with nitrogen losses from the soil, aquifer thickness, and hydraulic resistance. The modified DRASTIC-PA and DRASTIC-PAN (which includes land use) performed better than the original DRASTIC model. Jia et al. (2019) considered parameters like aquifer thickness, nitrate attenuation intensity, hydraulic resistance, groundwater velocity, and pollutant intensity to develop the DRANTHVP model. In conjunction with the developed model, parameter ratings were altered using the projection pursuit dynamic clustering (PPDC) model. Kumar and Pramod Krishna (2020) considered a land use parameter in addition to the seven hydrogeological parameters for estimating the groundwater vulnerability of a hard rock aquifer system. Abunada et al. (2021) utilized the SWAT model to calculate the net recharge parameter in the DRASTIC model and improved the model accuracy.

In the second category, the most common approach to improving the DRASTIC framework is to optimize its rates and weights using different techniques. Neshat et al. (2014) developed five different approaches for vulnerability assessment, in which the Wilcoxon rank-sum nonparametric statistical test was used to adjust the rates of model parameters and an analytical hierarchy process was used to modify the criteria and sub-criteria of the DRASTIC model. Neshat and Pradhan (2015) adopted the frequency ratio (FR) approach to modify the rates, which were further integrated with weights obtained from an analytical hierarchy process (AHP) and single-parameter sensitivity analysis (SPSA). Pacheco et al. (2015) attempted weight modification of the DRASTIC model based on Spearman correlations, sensitivity analysis, correspondence analysis, and logistic regression. Jang et al. (2016) employed discriminant analysis to alter the weighting coefficients of the factors. Khosravi et al. (2018b) used four objective methods, namely Shannon entropy (SE), weights-of-evidence (WOE), logistic model tree (LMT), and bootstrap aggregating (BA), to modify the original DRASTIC model. Barzegar et al. (2019) used the Wilcoxon rank-sum test and frequency ratio (FR) to modify rates and optimized weights using the correlation coefficient, analytic hierarchy process (AHP), and genetic algorithms (GA). Torkashvand et al. (2020) hybridized stepwise weight assessment ratio analysis (SWARA) with entropy and a genetic algorithm (GA) to alter the rates and weights. Balaji et al. (2021) used the Wilcoxon rank-sum test to modify rates and further used five metaheuristic algorithms (IWO, FA, PSO, TLBO, and SFLA) to modify the weights of DRASTIC parameters. Elzain et al. (2021) developed deep learning neural networks (DLNN) with two optimization algorithms, PSO and DE, to improve vulnerability assessment.
Agossou and Yang (2021) used DRASTIC, an additive model of DRASTIC (DRASTICLcLu), classic DRASTIC weights modified using Shannon’s entropy (entropy weight DRASTIC), entropy weight DRASTICLcLu, and AVI for assessing groundwater vulnerability. Khosravi et al. (2021b) developed DRASTIC, modified DRASTIC, and three statistical bivariate models (frequency ratio (FR), evidential belief function (EBF), and weights-of-evidence (WOE)). Norouzi et al. (2021) used random forest (RF) and a genetic algorithm (GA) for DRASTIC framework optimization. Previous studies show the efficiency of multi-criteria decision-making (MCDM) methods and metaheuristic algorithms in improving the DRASTIC framework. Among the MCDM techniques, AHP was adopted by several researchers to alter the rates and weights of DRASTIC (Thirumalaivasan et al. 2003; Sener and Davraz 2013; Yang et al. 2017; Santhosh and Sivakumar Babu 2018; Karan et al. 2018). Garewal et al. (2017) modified weights using AHP and the analytic network process (ANP). SWARA is a recently developed, efficient MCDM technique that has been adopted in several complicated environmental studies (Hong et al. 2018; Khosravi et al. 2018a; Chen et al. 2019b). Despite this interest, few researchers have applied the SWARA method to modify the rates of index-based models (Torkashvand et al. 2020; Khosravi et al. 2021a). Torkashvand et al. (2021) employed particle swarm optimization (PSO) and a genetic algorithm (GA) for optimizing the weights and SWARA for rate modification of the DRASTIC model.

In this study, to improve the robustness of the groundwater vulnerability assessment model, bivariate statistical and MCDM approaches were coupled with biogeography-based optimization (BBO). The SE and SWARA methods have not so far been coupled with the BBO algorithm in DRASTIC-based studies. The novelty of this study lies in applying the new frameworks (SE, SWARA, Shannon-MH, and SWARA-MH) and comparing their performance in groundwater vulnerability assessment.

The main objectives of this study are (i) to develop an improved DRASTIC model by modifying rates using bivariate and MCDM approaches; (ii) to develop a coupled DRASTIC model by integrating the bivariate and MCDM models with biogeography-based optimization; (iii) to compare the performance of the newly developed models with the original DRASTIC model based on the weighted F1 score and the area under the receiver operating characteristic (AU-ROC) curve; and (iv) to delineate a groundwater vulnerability map using the best model for this region.

Study area

Chennai Metropolitan Area (CMA), the fourth biggest city of India, is selected for this study as it is one of the rapidly urbanizing cities in the world. CMA is located in the northern portion of Tamil Nadu, extending between latitudes 12° 52′ 48″ and 13° 15′ 48″ and longitudes 80° 03′ 04″ and 80° 20′ 37″ as shown in Fig. 1a. The rapid urbanization has resulted in uneven and unplanned growth, which in turn has increased the vulnerability of certain areas of the city in terms of water pollution and climate resilience (Krishnamurthy and Desouza 2015); hence, this city is selected for the study. CMA covers an area of 1189 km² and is spread over three districts, namely, Chennai, Thiruvallur, and Kancheepuram. Most of the study area is flat and low lying, with elevation ranging from 0 to 162 m above mean sea level. The geology comprises four major formations: black sandy clay and sand and silt of the Quaternary era, followed by Proterozoic granite and Mio-Pliocene sandstone. In addition to these four major formations, Archean age amphibolite, gneiss, mylonite, Late Cretaceous argillaceous, Quaternary fluvial marine, and Proterozoic syenite occur in the study area (Fig. 1b). Three rivers, namely Cooum, Adyar, and Kosasthalaiyar, flow from west to east through the city and reach the Bay of Bengal (Fig. 1c). The study area has four major lakes, namely Chembarambakkam, Red Hills, Korattur, and Sholavaram. The aquifers of Chennai city are phreatic. Groundwater extraction in the city is only for domestic purposes (CGWB, 2019). CMA has a tropical semiarid climate with an annual average maximum temperature of 36.9 °C and a minimum temperature of 20.9 °C. The humidity ranges between 58 and 84%. Rainfall in the study area is received chiefly from the northeast monsoon during October to December.

Fig. 1 Study area description: (a) location and elevation, (b) geology, and (c) waterbody

Methodology

Aller et al. (1987) developed the original DRASTIC model, which considers seven parameters related to hydrogeological characteristics: D, depth to groundwater table; R, net recharge; A, aquifer media; S, soil media; T, topography; I, impact of vadose zone; and C, hydraulic conductivity. The original rates and weights proposed by Aller et al. (1987) are based on expert opinion and also vary from region to region. To overcome this limitation, an improved framework is proposed, which is summarized below:

  i. The seven data layers were processed in the ArcGIS environment by using original rates and weights. Original DRASTIC vulnerability indices were calculated.

  ii. The nitrate concentration was spatially distributed over each class of DRASTIC parameters to modify rates using the bivariate statistical (Shannon entropy) and multiple criteria decision-making (stepwise weight assessment ratio analysis) approaches.

  iii. Modifying weights using biogeography-based optimization considering each pixel of the entire study region.

  iv. Using optimized rates and weights, four different models (SE, SWARA, Shannon-MH, SWARA-MH) were developed to assess groundwater vulnerability.

  v. Determining the best model based on weighted F1 score from confusion matrix and AU-ROC curve evaluation.

  vi. Estimating groundwater vulnerability using the best model.

The proposed methodology for the improved DRASTIC framework is shown in Fig. 2.

Fig. 2 Methodology flowchart for DRASTIC framework improvement

DRASTIC layers

Depth to groundwater (D)

Depth to groundwater is the distance between the ground surface and the water table (Aller et al. 1987). A greater depth to groundwater lowers the groundwater vulnerability, whereas a shallow water table allows pollutants to contaminate the aquifer easily. Groundwater level data collected from 45 wells were used to interpolate the depth to groundwater layer using the inverse distance weighting (IDW) method. The depth to groundwater in the study area varies from 1.6 to 10.6 m bgl. Hence, it is considered a dominant factor for groundwater vulnerability; to give more weightage to this parameter, the interpolated layer was classified into two classes with higher ratings (Santhosh and Sivakumar Babu 2018).

Net recharge (R)

Net recharge transports contaminants to the groundwater table and spreads them laterally in the aquifer (Khosravi et al. 2018b). It was estimated using the Piscopo method, which depends on three factors: topography, rainfall, and soil permeability (Piscopo 2001).

The percentage slope of the study area was estimated using a digital elevation model (DEM). Rainfall data from the Indian Meteorological Department (IMD) for a 10-year period were used to estimate rainfall. A soil map was procured from the Institute for Water Studies (IWS), Tamil Nadu, to estimate soil permeability. The net recharge was estimated using the following equation, and the obtained rates are shown in Table 1.

$$ \mathrm{Recharge}\ \mathrm{value}=\mathrm{Slope}\%+\mathrm{Rainfall}\ \left(\mathrm{mm}\right)+\mathrm{Soil}\ \mathrm{permeability} $$
(1)

Table 1 Net recharge ratings using Piscopo approach

Aquifer media (A)

Aquifer media is the saturated zone that controls contaminant movement within the media (Babiker et al. 2005). It was prepared using a map collected from the aquifer system of Tamil Nadu and Puducherry atlas (prepared by CGWB). It consists of four lithological classes, namely alluvium, charnockite, laterite, and sandstone.

Soil media (S)

Soil media reflects the water holding capacity and contaminant travel time (El Naqa 2004). This layer was prepared using a soil map obtained from the Institute for Water Studies (IWS), Tamil Nadu. It was divided into five classes: sandy, marsh, fine, coarse loamy, and clayey.

Topography (T)

Shuttle Radar Topography Mission (SRTM) satellite data was used to develop topography map. It shows the slope variability of the study region and implies the runoff time based on a flat or steep slope. The major study region falls under the flat slope (0–6%).

Impact of vadose zone (I)

The vadose zone is the layer between upper soil zone and aquifer media. It controls the attenuation behavior below the ground surface and water table (Baghapour et al. 2016). This layer was prepared using the lithology profile of 93 wells within the study region maintained by CGWB. It is divided into five categories, namely clay, sandy clay, weathered rock, topsoil, and sand.

Hydraulic conductivity (C)

Hydraulic conductivity indicates the potential of the aquifer to transmit groundwater through its void spaces. It plays a critical role in pollutant velocity and mechanical dispersion (Khosravi et al. 2018b). Hydraulic conductivity is estimated by Eq. (2).

$$ K=T.{b}^{-1} $$
(2)

where K and T are the hydraulic conductivity (m/day) and transmissivity (m²/day) of the aquifer and b is the thickness of the aquifer (m).

The estimated values of K lie between 0.0286 and 172.8 m/day and are classified into six classes.

Table 2 presents the description about data source and its processing for groundwater vulnerability assessment. The original ratings from Aller et al. (1987) were assigned to seven hydrogeological parameters. Spatial distribution of the seven DRASTIC layers is shown in Fig. 3a–g.

Table 2 Data source for DRASTIC layers
Fig. 3 DRASTIC parameters map: (a) depth to groundwater, (b) net recharge, (c) aquifer media, (d) soil media, (e) topography, (f) impact of vadose zone, (g) hydraulic conductivity

Nitrate concentration

Nitrate (NO3) concentration is considered to be an influential parameter for assessing groundwater vulnerability in urban regions. Nitrate concentration data for 46 wells were acquired from the Central Groundwater Board, South Eastern Coastal Region, for the year 2018. The sampled wells are uniformly spread over the study region and represent the status of groundwater. The nitrate concentration surface of the study region was obtained through the inverse distance weighting (IDW) method (Nadiri et al. 2017). IDW interpolation yields a smooth and gradual surface. In addition, IDW estimates are weighted averages of the observations and therefore remain within the observed range, whereas kriging can produce estimates outside the range of the observed values (Li and Heap 2014). The estimated nitrate concentration of the study region varies between 4 and 217 mg/L (Fig. 4). The highest nitrate concentration was observed in the northeastern fringe of the study area, and the lowest was found in the western region.

Fig. 4 Nitrate concentration distribution in the study area
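The IDW interpolation described above can be illustrated with a minimal Python sketch. It is not the ArcGIS implementation used in the study: the function name `idw_interpolate`, the power of 2, the well coordinates, and the two intermediate nitrate values are assumptions made for illustration (only the 4 and 217 mg/L extremes come from the text).

```python
import numpy as np

def idw_interpolate(xy_obs, values, xy_query, power=2.0):
    """Inverse distance weighting: weighted mean of observations, weights = 1 / d**power."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    xy_query = np.asarray(xy_query, dtype=float)
    # Pairwise distances between query points and wells, shape (n_query, n_obs)
    d = np.linalg.norm(xy_query[:, None, :] - xy_obs[None, :, :], axis=2)
    estimates = np.empty(len(xy_query))
    for q, row in enumerate(d):
        exact = row == 0.0
        if exact.any():
            # Query point coincides with a well: return the observed value
            estimates[q] = values[exact][0]
        else:
            w = 1.0 / row**power
            estimates[q] = np.sum(w * values) / np.sum(w)
    return estimates

# Hypothetical wells (coordinates in km) and nitrate concentrations (mg/L)
wells = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
nitrate = [4.0, 60.0, 120.0, 217.0]
grid = [(5.0, 5.0), (0.0, 0.0), (9.0, 9.0)]
est = idw_interpolate(wells, nitrate, grid)
```

Because every estimate is a weighted average with positive weights, the interpolated surface cannot leave the range of the observed values, which is the convexity property noted above.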

Original DRASTIC method

Seven hydrogeological parameter ratings were assigned using original DRASTIC rates (Aller et al. 1987). The classified range and rates are given in Table 3. The weighted sum method in the ArcGIS environment is used to calculate groundwater vulnerability indices.

Table 3 Original ratings of DRASTIC parameters and modified ratings from improved framework

The assigned rating and weights are based on the expert approach as suggested by Aller et al. (1987). The higher the resultant groundwater vulnerability index, the greater the pollution risk (Babiker et al. 2005; Jamrah et al. 2008; Huan et al. 2012; Khosravi et al. 2018b).

DRASTIC framework improvement

As discussed earlier, the subjectivity of the original DRASTIC model can be reduced through modification of rates and weights. In this study, bivariate statistical and multiple criteria decision-making approaches are used for rate modification, followed by a metaheuristic algorithm for weight modification.

Stepwise weight assessment ratio analysis (SWARA)

SWARA, developed by Keršulienė et al. (2010), is a multiple criteria decision-making (MCDM) method that considers expert opinion in evaluating the rates and weights of criteria and sub-criteria. Based on the experts' implicit knowledge, information, and experience, the criteria are ranked according to their significance (Zolfani and Saparauskas 2014). The main feature of this method is the ability to include the importance ratio of the criteria based on expert opinion for weight determination (Keršulienė et al. 2010). The steps to determine the relative weights of criteria are explained below (Torkashvand et al. 2020):

Step 1: Criteria are prioritized according to the importance and sorted in descending order.

Step 2: Estimate the relative importance of criterion j in relation to the preceding criterion j−1, for every criterion (Stanujkic et al. 2015), and calculate the average comparative importance, Sj (Keršulienė et al. 2010).

Step 3: Determine the coefficient Kj for each criterion:

$$ {K}_j=\left\{\begin{array}{c}1\kern2.75em if\ j=1\\ {}{S}_j+1\kern0.5em if\ j>1\end{array}\right. $$
(3)

Step 4: The recalculated weight Qj is estimated recursively as follows:

$$ {Q}_j=\left\{\begin{array}{c}1\kern2.75em if\ j=1\\ {}\frac{Q_{j-1}}{K_j}\kern1em if\ j>1\end{array}\right. $$
(4)

Step 5: Estimation of relative weights of the criteria.

$$ {W}_j=\frac{Q_j}{\sum_{k=1}^n{Q}_k} $$
(5)

where Wj represents the relative weight of criterion j and n denotes the number of criteria.
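As a worked illustration of Steps 3 to 5, the following Python sketch computes relative weights from a list of comparative importance values. It assumes the recursive form of the recalculated weight, Qj = Qj−1/Kj, from the original SWARA formulation; the function name `swara_weights` and the example Sj values are hypothetical.

```python
def swara_weights(s):
    """Relative SWARA weights from comparative importance values.

    s[j] is the comparative importance S_j of criterion j relative to
    criterion j-1, with criteria already sorted in descending importance.
    s[0] is ignored because the first criterion has no predecessor.
    """
    n = len(s)
    k = [1.0] + [s[j] + 1.0 for j in range(1, n)]   # Eq. (3)
    q = [1.0]
    for j in range(1, n):
        q.append(q[j - 1] / k[j])                   # Eq. (4), recursive form (assumed)
    total = sum(q)
    return [qj / total for qj in q]                 # Eq. (5)

# Hypothetical comparative importance values for three ranked classes
weights = swara_weights([0.0, 0.5, 0.25])
```

For these inputs the coefficients are K = (1, 1.5, 1.25), the recalculated weights are Q = (1, 0.667, 0.533), and the relative weights are approximately (0.455, 0.303, 0.242), which sum to 1.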

Shannon entropy (SE)

Entropy is a measure of uncertainty, disorder, and imbalance related to the system (Khosravi et al. 2016). It is a measure of the average of proportional differences between unit groups and the overall system (Naghibi et al. 2015). Shannon developed information theory by modifying the Boltzmann model (Shi and Jin 2009; Khosravi et al. 2018b). The information entropy method is used to estimate the weight index of several systems and has been integrated with flash flood susceptibility mapping (Khosravi et al. 2016), qanat potential mapping (Naghibi et al. 2015), and urban flood (Xu et al. 2018). The equation used to estimate the information coefficient (Vj) representing the weight value of each parameter is as follows (Constantin et al. 2011; Reza et al. 2012):

$$ {E}_{ij}=\frac{FR_{ij}}{\sum_{i=1}^{M_j}{FR}_{ij}} $$
(6)

where FRij is the frequency ratio of class i of parameter j, Mj is the total number of classes of parameter j, and Eij is the probability density for class i in parameter j.

$$ {H}_j=-{\sum}_{i=1}^{M_j}{E}_{ij}\times {\log}_2{E}_{ij} $$
(7)

where Hj is the entropy value of parameter j.

$$ {H}_{j\ \mathit{\max}}={\log}_2{M}_j $$
(8)

where Hj max is the maximum entropy value of parameter j.

$$ {I}_j=\frac{\left({H}_{j\ \mathit{\max}}-{H}_j\right)}{H_{j\ \mathit{\max}}} $$
(9)

where Ij is the information coefficient of parameter j, which ranges between 0 and 1.

$$ {V}_j={I}_j\times \frac{\sum_{i=1}^{M_j}{FR}_{ij}}{M_j} $$
(10)

where Vj is the weight of parameter j.
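Equations (6) to (10) can be chained into a single calculation, sketched below in Python for one parameter. The function name `shannon_weight` and the example frequency ratios are hypothetical; classes with a zero frequency ratio are assumed to contribute nothing to the entropy (the usual convention 0 × log 0 = 0), and at least two classes are required so that the maximum entropy is nonzero.

```python
import math

def shannon_weight(fr):
    """Weight V_j of one parameter from the frequency ratios of its classes."""
    m = len(fr)                                       # M_j, number of classes (>= 2)
    total = sum(fr)
    e = [f / total for f in fr]                       # Eq. (6)
    h = -sum(p * math.log2(p) for p in e if p > 0)    # Eq. (7)
    h_max = math.log2(m)                              # Eq. (8)
    info = (h_max - h) / h_max                        # Eq. (9)
    return info * total / m                           # Eq. (10)

# Hypothetical frequency ratios
v_uniform = shannon_weight([1.0, 1.0, 1.0, 1.0])   # nitrate spread evenly over classes
v_peaked = shannon_weight([4.0, 0.0])              # nitrate concentrated in one class
```

A parameter whose classes all carry the same frequency ratio has maximum entropy and therefore a weight of zero, whereas a parameter with nitrate concentrated in one class receives a large weight.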

Biogeography-based optimization

Biogeography-based optimization is a population-based metaheuristic algorithm introduced by Simon (2008) and inspired by biogeography. BBO shares features with other population-based metaheuristic algorithms such as the genetic algorithm (GA) and particle swarm optimization (PSO) in finding the best candidate solution in the search space of an optimization problem (Lim et al. 2016). The genes in GA and particles in PSO are equivalent to habitats in the BBO algorithm (Khosravi et al. 2021a). Each habitat has its own habitat suitability index (HSI). Depending on the optimization problem, low or high HSI habitats are considered for further iterations. Migration and mutation are the two significant operators of BBO. The BBO algorithm solves many complex numerical problems and is highly efficient in the optimization of many real-world complex problems like reservoir operation (Haddad et al. 2016), mapping of groundwater potential (Chen et al. 2019a), and flood susceptibility assessment (Ahmadlou et al. 2019).

Migration

The migration operator is used to improve the solutions of habitats within the population by transferring information among them. Emigration and immigration are the two forms of migration (Simon 2011). The immigration rate is used to decide whether a habitat solution needs to be modified (Ahmadlou et al. 2019). If a particular solution needs to be improved, the emigration rates of the other habitats are used to decide which of them should migrate (Bhattacharya and Chattopadhyay 2010). The emigration rate and immigration rate are obtained using the following equations (Simon 2008; Ahmadlou et al. 2019):

$$ {\lambda}_s=I\left(1-\frac{S}{S_{max}}\right) $$
(11)
$$ {\mu}_s=\frac{ES}{S_{max}} $$
(12)

where μs and λs are the emigration rate and immigration rate of a habitat containing S species, respectively. Smax stands for the maximum number of species. E indicates the maximum emigration rate and I indicates the maximum immigration rate.
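The linear migration model of Eqs. (11) and (12) is illustrated by the short Python sketch below. It assumes that E and I are the maximum emigration and immigration rates, both set to 1 here for illustration; the function name `migration_rates` is hypothetical.

```python
def migration_rates(s, s_max, e_max=1.0, i_max=1.0):
    """Return (immigration rate, emigration rate) for a habitat with s species."""
    immigration = i_max * (1.0 - s / s_max)   # Eq. (11)
    emigration = e_max * s / s_max            # Eq. (12)
    return immigration, emigration

empty_habitat = migration_rates(0, 10)    # (1.0, 0.0)
half_full = migration_rates(5, 10)        # (0.5, 0.5)
full_habitat = migration_rates(10, 10)    # (0.0, 1.0)
```

A habitat with few species (a poor solution) therefore mostly receives features from others, while a crowded habitat (a good solution) mostly shares its features.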

Mutation

Mutation operator enhances the habitat diversity to prevent trapping in local minima. The mutation value of species is estimated using the following equation:

$$ m(S)={m}_{max}\left(1-\frac{1-{p}_s}{p_{max}}\right) $$
(13)

where m(S) is the mutation rate of habitat containing S species, mmax is the maximum mutation rate, and ps and pmax are the probability of each species and maximum probability, respectively.

The hyperparameters of the BBO algorithm used for optimization are shown in Table 4.

Table 4 Hyperparameters of BBO algorithm

Weight optimization using BBO

The metaheuristic algorithms employed in previous studies of index-based groundwater assessment models such as DRASTIC (Jafari and Nikoo 2016; Barzegar et al. 2020; Balaji et al. 2021) and GALDIT (Bordbar et al. 2020; Khosravi et al. 2021a) used the contaminant concentrations observed at representative wells in the respective study region to optimize the weights of the index-based models. This approach reduces the capability of the metaheuristic algorithm because of the small sample size available for optimization (Davoudi et al. 2020). Previous multiple-model studies likewise used well concentrations to optimize the weights of DRASTIC parameters. In this study, to take advantage of advances in computing power and to improve the DRASTIC framework, the nitrate concentration of each pixel of the entire study region was considered to optimize the weights by creating a spatial profile of nitrate concentration. The improved framework (SWARA and SE models) for groundwater vulnerability assessment is coupled with biogeography-based optimization for modifying the weights. Therefore, the objective function of the BBO algorithm was to maximize the Pearson correlation between the groundwater vulnerability indices of the SWARA and SE models and the nitrate concentration.

$$ \mathit{\operatorname{Maximize}}\ F= corr\left( GVI,N\right) $$
(14)
$$ {\displaystyle \begin{array}{c}\mathrm{Constraint}:\ 1<{w}_j<5,\kern0.5em j=1,2,\dots ,7\\ {}F=\frac{\sum_{i=1}^n\left({GVI}_i-\overline{GVI}\right)\left({N}_i-\overline{N}\right)}{\sqrt{\sum_{i=1}^n{\left({GVI}_i-\overline{GVI}\right)}^2}\sqrt{\sum_{i=1}^n{\left({N}_i-\overline{N}\right)}^2}}\end{array}} $$
(15)

where F represents the objective function, n denotes the number of pixels, GVIi is the groundwater vulnerability index of the ith pixel, \( \overline{GVI} \) is the average groundwater vulnerability index, Ni indicates the nitrate concentration of the ith pixel (NO3) (mg/L), \( \overline{N} \) is the average nitrate concentration (NO3) (mg/L), and wj is the weight.
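The objective function of Eqs. (14) and (15) can be sketched in Python as follows. This is not the MATLAB code used in the study: the function names, the synthetic rates, and the synthetic nitrate values are assumptions for illustration, and candidate weight vectors outside the bounds of 1 and 5 are simply rejected by returning negative infinity.

```python
import numpy as np

def vulnerability_index(rates, weights):
    """GVI of every pixel: rates has shape (n_pixels, 7), weights has shape (7,)."""
    return np.asarray(rates, dtype=float) @ np.asarray(weights, dtype=float)

def objective(weights, rates, nitrate):
    """Pearson correlation between GVI and nitrate, to be maximized by the optimizer."""
    weights = np.asarray(weights, dtype=float)
    if np.any(weights < 1.0) or np.any(weights > 5.0):
        return -np.inf  # candidate violates the weight bounds
    gvi = vulnerability_index(rates, weights)
    return float(np.corrcoef(gvi, nitrate)[0, 1])

# Synthetic example: 200 pixels whose nitrate is an exact linear function of the GVI
rng = np.random.default_rng(0)
rates = rng.uniform(0.0, 1.0, size=(200, 7))
w_true = np.array([4.5, 2.0, 3.0, 1.5, 1.2, 3.5, 2.5])
nitrate = 2.0 * (rates @ w_true) + 3.0

f_true = objective(w_true, rates, nitrate)
f_other = objective(np.full(7, 1.5), rates, nitrate)
f_invalid = objective(np.full(7, 6.0), rates, nitrate)
```

In this synthetic case the generating weights give a correlation of 1, any other admissible weight vector scores lower, and an out-of-bounds vector is never selected. A population-based optimizer such as BBO would call `objective` once per habitat in each iteration.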

Groundwater vulnerability map

Groundwater vulnerability indices obtained from all the developed models are used to construct vulnerability maps. Each vulnerability map was generated using the weighted sum method in the ArcGIS environment with a pixel size of 30 × 30 m, which is consistent with the DEM. The rates and weights of the seven DRASTIC layers were assigned according to the original DRASTIC model (Tables 3 and 7). For the modified groundwater vulnerability framework, the optimized rates and weights were assigned from the SWARA, Shannon, SWARA-MH, and Shannon-MH models (Tables 3 and 7). The pixel values of all generated vulnerability maps were categorized into five classes, namely very low, low, moderate, high, and very high, using the Jenks natural breaks method (Jenks and Caspall 1971; Torkashvand et al. 2020). After the classification, each pixel belongs to a vulnerability category that indicates the probability of pollution occurrence (Khosravi et al. 2018b).

Model performance using the confusion matrix

To evaluate the performance of the groundwater vulnerability classification, the F1 score and confusion matrix were selected (Zhang et al. 2020). In this case, classification accuracy is the number of correct predictions divided by the total number of pixels. The precision and recall metrics are used to evaluate classification accuracy from the confusion matrix. Generally, there is a trade-off between precision and recall; the F1 score, on the other hand, considers both precision and recall to evaluate the classification performance. Precision, recall, and F1 score were estimated using the following equations (Zhang et al. 2020):

$$ precision=\frac{y_{ij}}{y_i}\times 100\% $$
(16)
$$ recall=\frac{y_{ij}}{y_j}\times 100\% $$
(17)

where yij indicates the value in row i and column j; yi represents a marginal total of row i; and yj represents a marginal total of column j of the confusion matrix.

$$ F1\ score=\frac{2\times precision\times recall}{precision+ recall} $$
(18)

For multi-class classification, the weighted F1 score was used, which is calculated by weighting the F1 score of each class by the number of pixels in that class (Pan et al. 2019).
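The weighted F1 score can be computed directly from a confusion matrix, as in the minimal Python sketch below. Several conventions are assumed here rather than taken from the study: rows hold the predicted classes and columns the reference classes, the correctly classified pixels of each class lie on the diagonal, and each class is weighted by its column total. The function name and the 2 × 2 example matrix are hypothetical.

```python
import numpy as np

def weighted_f1(cm):
    """Weighted F1 score from a square confusion matrix (rows: predicted, columns: reference)."""
    cm = np.asarray(cm, dtype=float)
    correct = np.diag(cm)
    row_total = cm.sum(axis=1)   # marginal total of each row, y_i
    col_total = cm.sum(axis=0)   # marginal total of each column, y_j
    zeros = np.zeros_like(correct)
    precision = np.divide(correct, row_total, out=zeros.copy(), where=row_total > 0)  # Eq. (16)
    recall = np.divide(correct, col_total, out=zeros.copy(), where=col_total > 0)     # Eq. (17)
    denom = precision + recall
    f1 = np.divide(2.0 * precision * recall, denom, out=zeros.copy(), where=denom > 0)  # Eq. (18)
    # Weight each class F1 by the number of reference pixels in that class
    return float(np.sum(f1 * col_total) / col_total.sum())

# Hypothetical two-class confusion matrix (pixel counts)
score = weighted_f1([[5, 1],
                     [2, 4]])
```

For this matrix the class F1 scores are 10/13 and 8/11, the class sizes are 7 and 5 pixels, and the weighted F1 score is about 0.752. The equations in the text express precision and recall as percentages; the sketch keeps them as fractions.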

Comparison and validation of maps

To compare the prediction accuracy of the developed groundwater vulnerability models, the area under the receiver operating characteristic (AU-ROC) curve was used (Torkashvand et al. 2020; Abunada et al. 2021). The ROC value indicates the prediction ability of the model and ensures map accuracy quantitatively (Khosravi et al. 2018b). The receiver operating characteristic (ROC) is a plot of sensitivity (true positive rate) against 1-specificity (false positive rate). In this study, nitrate concentration pixel values and groundwater vulnerability indices were used to prepare the ROC curve. The area under the curve (AUC) is a scalar measure which ranges between 0 and 1.0. AUC ranges were ranked as follows: 0.5–0.6 (poor), 0.6–0.7 (average), 0.7–0.8 (good), 0.8–0.9 (very good), 0.9–1.0 (excellent) (Marmion et al. 2009). A model with a higher AUC has better predictive ability.
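For illustration, the AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen positive pixel receives a higher score than a randomly chosen negative pixel. The Python sketch below uses this pairwise definition. It assumes that the nitrate values have already been converted to binary labels (how the study binarized nitrate is not stated here), and the scores and labels are hypothetical. The pairwise comparison is only practical for small samples, not for a full raster.

```python
import numpy as np

def auc_roc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]
    # Count pairs ranked correctly; ties count as half
    wins = np.sum(diff > 0) + 0.5 * np.sum(diff == 0)
    return float(wins / (pos.size * neg.size))

# Hypothetical vulnerability scores and binary contamination labels
auc = auc_roc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1])
```

Here three of the four positive-negative pairs are ranked correctly, so the AUC is 0.75, which falls in the "good" range of the ranking above.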

Results

Original DRASTIC model

Groundwater vulnerability indices were calculated using Equation (19) and respective rates and weights of DRASTIC parameters were assigned from Tables 3 and 7. The thematic layers were assigned with rates (1 to 10) and weights (1 to 5) to calculate vulnerability indices based on the original DRASTIC model. The estimated vulnerability indices range between 102 and 188 as shown in Fig. 5a.

$$ GVI={D}_r.{D}_w+{R}_r.{R}_w+{A}_r.{A}_w+{S}_r.{S}_w+{T}_r.{T}_w+{I}_r.{I}_w+{C}_r.{C}_w $$
(19)

where D, R, A, S, T, I, and C represent the hydrogeological parameters, subscripts r and w are the respective rates and weights, and GVI is the groundwater vulnerability index.

Fig. 5 Groundwater vulnerability maps based on (a) original DRASTIC, (b) Shannon entropy, (c) SWARA, (d) SWARA-MH, and (e) Shannon-MH
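Equation (19) is a weighted sum over the seven layers, which the following Python sketch reproduces for a set of pixels. The weights are the standard DRASTIC weights of Aller et al. (1987); the function name `drastic_index` and the example ratings are hypothetical and do not come from the study area.

```python
import numpy as np

# Original DRASTIC weights (Aller et al. 1987), in the order D, R, A, S, T, I, C
DRASTIC_WEIGHTS = np.array([5.0, 4.0, 3.0, 2.0, 1.0, 5.0, 3.0])

def drastic_index(ratings, weights=DRASTIC_WEIGHTS):
    """Groundwater vulnerability index; ratings has shape (7,) or (n_pixels, 7)."""
    return np.asarray(ratings, dtype=float) @ weights

# Hypothetical ratings for three pixels (each rating between 1 and 10)
pixels = [
    [1, 1, 1, 1, 1, 1, 1],          # lowest possible ratings
    [10, 10, 10, 10, 10, 10, 10],   # highest possible ratings
    [9, 6, 8, 5, 10, 6, 4],
]
gvi = drastic_index(pixels)
```

With these weights the index is bounded by 23 and 230, so the range of 102 to 188 reported for the study area sits in the middle of the theoretical scale.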

To overcome the subjectivity and improve the performance of the vulnerability assessment framework, the SWARA and SE approaches were coupled with a metaheuristic algorithm.

Shannon entropy

The Shannon entropy model modifies both the weights and rates of the DRASTIC parameters using nitrate density. The weight values obtained from the SE model for D, R, A, S, T, I, and C were 0.00624, 0.00078, 0.00882, 0.00680, 0.00061, 0.00447, and 0.01700, respectively. The modified rates of each class of DRASTIC parameters are given in Table 5. Hydraulic conductivity and topography received the highest and lowest weights, respectively, for estimating groundwater vulnerability indices. The resultant vulnerability map, computed with Equation (20), was divided into five classes using the Jenks natural breaks method as shown in Fig. 5b.

Table 5 Spatial distribution of nitrate concentration across DRASTIC parameters and modified rates using Shannon entropy method

The groundwater vulnerability indices of SE model were calculated using the following equation:

$$ {GVI}_{SE}=0.00624\times {D}_{FR}+0.00078\times {R}_{FR}+0.00882\times {A}_{FR}+0.0068\times {S}_{FR}+0.00061\times {T}_{FR}+0.00447\times {I}_{FR}+0.017\times {C}_{FR} $$
(20)

where FR indicates the rates obtained from SE model of respective parameters.

SWARA

The SWARA technique improves the reliability of vulnerability assessment by ranking each class of the DRASTIC parameters according to the nitrate concentration distributed over that class. The nitrate concentration observed at 46 wells was spatially distributed over the entire region of interest using the inverse distance weighting (IDW) method. The relative weights and final rates of the DRASTIC parameters were modified by calculating the nitrate density of each class using the SWARA method. Based on the SWARA rate modification, laterite (0.302) exerts a greater impact than alluvium (0.203) among the aquifer media. In soil media, marsh (0.257) exhibits more impact on vulnerability indices than sandy (0.202) soil. Hydraulic conductivity in the range 28.7–41 m/day (0.247) is the most influential class, whereas the >82 m/day class has a lower rate (0.118). Due to the normalization step, the modified rates lie in the range of 0 to 1.

The resultant vulnerability indices from SWARA framework were calculated from Equation (21) using modified SWARA rates (Table 6) and original DRASTIC weights (Table 7). The final vulnerability map using SWARA model is shown in Fig. 5c.

$$ {GVI}_{SWARA}={D}_{Rsw}\times {D}_w+{R}_{Rsw}\times {R}_w+{A}_{Rsw}\times {A}_w+{S}_{Rsw}\times {S}_w+{T}_{Rsw}\times {T}_w+{I}_{Rsw}\times {I}_w+{C}_{Rsw}\times {C}_w $$
(21)

where Rsw indicates the rates obtained from SWARA model of respective parameters and w is the weight of original DRASTIC model.

Table 6 Spatial distribution of nitrate concentration across DRASTIC parameters and modified rates using SWARA approach

Table 7 Original and optimized weight using hybridized metaheuristic framework
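The inverse distance weighting step used to spread the well observations over the study area can be sketched as follows. This is a minimal illustration, not the GIS routine used in the study: the function name `idw`, the distance exponent of 2, and the sample coordinates and values are all assumptions.

```python
import math


def idw(x, y, samples, power=2.0):
    """Inverse distance weighted estimate at point (x, y).

    samples: iterable of (xi, yi, value) tuples, e.g. well locations with
             observed nitrate concentration.
    power:   distance exponent; 2 is a common default and is assumed here.
    """
    num = 0.0
    den = 0.0
    for xi, yi, vi in samples:
        d = math.hypot(x - xi, y - yi)
        if d == 0.0:
            return vi  # query point coincides with a sample location
        w = 1.0 / d ** power
        num += w * vi
        den += w
    if den == 0.0:
        raise ValueError("no samples given")
    return num / den


# Hypothetical wells: (x, y, nitrate concentration).
wells = [(0.0, 0.0, 10.0), (2.0, 0.0, 30.0)]
print(idw(1.0, 0.0, wells))  # midpoint between the two wells
```

Evaluating `idw` at the centre of every raster cell gives a continuous nitrate surface from which a per-class nitrate density could then be derived.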

SWARA-MH and Shannon-MH

Hybridizing the modified DRASTIC model with a metaheuristic algorithm further improves the reliability of the groundwater vulnerability assessment model. As discussed earlier, the metaheuristic algorithm employed in this study, biogeography-based optimization (BBO), considers each pixel of the entire region to optimize the weights of the SWARA and SE models. The objective of the BBO algorithm is to optimize the weights of the DRASTIC parameters. The algorithm was coded in MATLAB 2016b, with the DRASTIC weights constrained between a lower bound of 1 and an upper bound of 5. The weights of the SWARA and SE models calculated using BBO are given in Table 7. Note that these optimized weights were obtained from the rate-altered SWARA and SE models.

$$ {GVI}_{SWARA- MH}={D}_{Rsw}\times {D}_{MH}+{R}_{Rsw}\times {R}_{MH}+{A}_{Rsw}\times {A}_{MH}+{S}_{Rsw}\times {S}_{MH}+{T}_{Rsw}\times {T}_{MH}+{I}_{Rsw}\times {I}_{MH}+{C}_{Rsw}\times {C}_{MH} $$
(22)

where subscript Rsw indicates the rates obtained from the SWARA model for the respective parameters and subscript MH denotes the optimal weights obtained using the BBO algorithm.

$$ {GVI}_{Shannon- MH}={D}_{FR}\times {D}_{MH}+{R}_{FR}\times {R}_{MH}+{A}_{FR}\times {A}_{MH}+{S}_{FR}\times {S}_{MH}+{T}_{FR}\times {T}_{MH}+{I}_{FR}\times {I}_{MH}+{C}_{FR}\times {C}_{MH} $$
(23)

where MH indicates the optimal weights obtained from BBO algorithm.

The resultant groundwater vulnerability maps using SWARA-MH and Shannon-MH model are shown in Fig. 5d and e.
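The weight optimization described in this subsection was coded by the authors in MATLAB; the Python sketch below only illustrates the general shape of a simplified biogeography-based optimization with box constraints of 1 to 5. It is not the authors' implementation: the linear migration model, the uniform mutation, the elitism count, all parameter defaults, and the `toy_cost` objective are assumptions, and the study's actual objective function is not reproduced here.

```python
import random


def bbo_minimize(cost, dim, lower, upper, pop_size=20, generations=50,
                 mutation_rate=0.05, n_elite=2, seed=0):
    """Simplified biogeography-based optimization (minimization)."""
    rng = random.Random(seed)
    # Each habitat is one candidate weight vector inside [lower, upper].
    pop = [[rng.uniform(lower, upper) for _ in range(dim)]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=cost)  # best habitat first
        # Linear migration model: good habitats emigrate more, immigrate less.
        mu = [(pop_size - i) / (pop_size + 1) for i in range(pop_size)]
        lam = [1.0 - m for m in mu]
        new_pop = [h[:] for h in pop]
        for i in range(pop_size):
            for d in range(dim):
                if rng.random() < lam[i]:
                    # Roulette-wheel choice of the emigrating habitat by mu.
                    j = rng.choices(range(pop_size), weights=mu)[0]
                    new_pop[i][d] = pop[j][d]
                if rng.random() < mutation_rate:
                    new_pop[i][d] = rng.uniform(lower, upper)
        # Elitism: the worst new habitats are replaced by the previous best.
        new_pop.sort(key=cost)
        if n_elite > 0:
            new_pop[-n_elite:] = [h[:] for h in pop[:n_elite]]
        pop = new_pop
    return min(pop, key=cost)


def toy_cost(w):
    """Placeholder objective with optimum at w = 3 (not the study's)."""
    return sum((wi - 3.0) ** 2 for wi in w)


best = bbo_minimize(toy_cost, dim=7, lower=1.0, upper=5.0)
print(best, toy_cost(best))
```

In an application like the one described here, `toy_cost` would be replaced by a function that rebuilds the vulnerability index for every pixel from the candidate weights and scores its agreement with the nitrate map.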

Validation of model

The vulnerability indices obtained from the five developed groundwater vulnerability models were validated and compared using the ROC curve method. In this study, the ROC curve was plotted by considering each pixel of the vulnerability maps. The vulnerability indices and nitrate concentration pixels were each classified into five classes using the Jenks natural breaks method, and the class assigned to each pixel of the nitrate concentration map and of the vulnerability index map was used to plot the ROC curve. The ROC curves for the developed vulnerability models are shown in Fig. 6. The Shannon-MH model had the highest AUC value of 0.8249. The AUC values of the other models were 0.8186 (SWARA-MH), 0.7714 (SWARA), 0.7672 (Shannon), and 0.7378 (original DRASTIC). These results show that the MCDM and SE approaches improve the vulnerability assessment model, and that hybridizing the framework with a metaheuristic algorithm improves it further.
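For readers unfamiliar with the AUC, the sketch below computes it for a simplified binary case using the pairwise (Mann-Whitney) formulation. It does not reproduce the study's five-class, per-pixel procedure: the binary labelling of pixels, the function name `auc`, and the sample values are assumptions made for illustration.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney U).

    scores: vulnerability index per pixel (higher = more vulnerable).
    labels: 1 if the pixel is treated as contaminated, else 0 (assumed
            binary here; the study works with five classes).
    """
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive pixel ranked above negative pixel
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical pixels: index values and contamination labels.
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))
```

The pairwise loop is quadratic in the number of pixels, so for a full raster a sort-based rank formulation would be the more practical choice.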

Fig. 6 Assessment of model performance using the ROC curve

Confusion matrix and F1 score

Confusion matrices for the five developed models are shown in Fig. 7a–e. The confusion matrices show that the Shannon-MH and SWARA-MH models performed better than the generic DRASTIC model. The nitrate concentration pixels and vulnerability index pixels were found to be in good agreement for the metaheuristic-coupled models. Classification of the high and very high classes was poor in all five models because most of the nitrate concentration pixels fell in the very low and low categories. The resulting vulnerability maps likewise place most of the area in the very low and low categories, which agrees with the observed nitrate concentration map. The F1 scores of the five models for each category are compared in Fig. 8. The Shannon-MH model had the best weighted F1 score, 0.452. The SWARA-MH model had a weighted F1 score of 0.437, followed by 0.419 and 0.370 for the SWARA and SE models, respectively. The original DRASTIC model had the lowest weighted F1 score, 0.234. In most instances, the groundwater vulnerability framework improved with the MCDM and bivariate statistical methods performs better than the original DRASTIC model. The per-category F1 scores (Fig. 8) show that coupling a metaheuristic algorithm with the SWARA and Shannon frameworks considerably improves the performance of vulnerability mapping. For multi-class classification, the overall quality of a model is assessed by the weighted F1 score (Castro et al. 2017). The weighted F1 score of the Shannon-MH model was better than those of the other models because of its higher classification accuracy.
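A weighted F1 score of the kind reported above can be derived from a multi-class confusion matrix as sketched below. The convention that rows hold the true class and columns the predicted class, the support-based weighting, the function name `weighted_f1`, and the example matrix are assumptions; the matrix is illustrative and is not taken from Fig. 7.

```python
def weighted_f1(cm):
    """Weighted F1 from a square confusion matrix.

    cm[i][j] = number of pixels with true class i predicted as class j.
    Each per-class F1 is weighted by that class's support (row sum).
    """
    n = len(cm)
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("empty confusion matrix")
    score = 0.0
    for k in range(n):
        tp = cm[k][k]
        support = sum(cm[k])                          # pixels truly in class k
        predicted = sum(cm[i][k] for i in range(n))   # pixels predicted as k
        precision = tp / predicted if predicted else 0.0
        recall = tp / support if support else 0.0
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0
        score += f1 * support / total
    return score


# Hypothetical two-class matrix (rows: true class, columns: predicted class).
print(weighted_f1([[5, 1], [2, 2]]))
```

Because each class is weighted by its support, classes with few pixels (here, the high and very high categories) contribute little to the overall score even when they are poorly classified.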

Fig. 7 Confusion matrix corresponding to a original DRASTIC, b Shannon entropy, c SWARA, d SWARA-MH, and e Shannon-MH

Fig. 8 F1 and weighted F1 score for each model across different classes

Discussion

The original DRASTIC model has several shortcomings in groundwater vulnerability assessment because the rates and weights it adopts are not universal. Previous studies used different approaches to alter the rates and weights of the DRASTIC parameters (Khosravi et al. 2018b; Nadiri et al. 2019; Barzegar et al. 2020; Torkashvand et al. 2020; Balaji et al. 2021). In this study, to reduce this subjectivity, a bivariate statistical method (Shannon entropy) and an MCDM approach (SWARA) were each coupled with a metaheuristic algorithm, advancing the SE and SWARA methods to improve the accuracy of vulnerability assessment. The framework uses the bivariate and MCDM methods to improve the DRASTIC ratings and the BBO algorithm to optimize the weights.

Effects of modified rates and weights

The ratings for the D, R, A, S, T, I, and C parameters modified by the SE and SWARA methods give analogous results. The modified rates in the SE and SWARA models show a similar pattern because both consider the distribution of nitrate concentration over each class. According to both methods, in soil media marsh (SE_S = 1.40; SWARA_S = 0.257) had a higher rate than sandy soil (SE_S = 1.13; SWARA_S = 0.202); this is because the highest nitrate concentrations in the region coincide with this soil class. For hydraulic conductivity, the modified rate for the range 28.7–41 m/day (SE_C = 1.51; SWARA_C = 0.247) is higher than that for >82 m/day (SE_C = 0.68; SWARA_C = 0.118). In topography, water is retained longer on the surface where the slope is flat. The SE and SWARA results showed that the rate for slopes of 6–12% (SE_T = 1.008; SWARA_T = 1.0069) was higher than that for slopes of 0–2% (SE_T = 0.98; SWARA_T = 0.9834).

The weights optimized using biogeography-based optimization show that hydraulic conductivity is the most significant parameter (weight = 5) in both the Shannon-MH and SWARA-MH models for estimating groundwater vulnerability indices.

In contrast, the weight of the net recharge parameter is considerably reduced (from 4 to 1) in both the Shannon-MH and SWARA-MH models. The weight of the topography factor increased considerably, from 1 to 3.36 for Shannon-MH and to 3.23 for SWARA-MH. The weight of depth to groundwater was reduced from 5 to 4.59 for Shannon-MH but dropped sharply to 1.55 for SWARA-MH. Depth to groundwater thus retained a high weight in the bivariate model (Shannon-MH), whereas the MCDM model (SWARA-MH) assigned it a low weight in attaining the defined objective function. Among all parameters, hydraulic conductivity had the highest weight and net recharge the lowest weight in both models. The rates in Table 3 show an interesting finding: in this region, moderate hydraulic conductivity combined with moderate terrain slope is subject to higher groundwater vulnerability than flat terrain with high hydraulic conductivity, which is in agreement with previous studies (Khosravi et al. 2018b).

Comparison with previous studies

The need to reduce the subjectivity of the original DRASTIC model has led to several modification approaches for groundwater vulnerability assessment. Researchers have modified the DRASTIC model using statistical (Pacheco and Sanches Fernandes 2013), multi-criteria decision-making (Torkashvand et al. 2020), bivariate (Khosravi et al. 2021b), metaheuristic (Balaji et al. 2021), and deep learning neural network (DLNN) (Elzain et al. 2021) approaches.

It is evident from the results that modifying the rates and weights of the DRASTIC model can improve its performance. The results obtained with the bivariate statistical method in this study are in accordance with those of Sahoo et al. (2016) and Khosravi et al. (2018b). Generally, a bivariate model optimizes both rates and weights, whereas this study takes the rates from the bivariate model and the weights from the BBO algorithm to develop a coupled (Shannon-MH) model. This improved framework further enhances the capability of the bivariate model (AUC = 0.7672 to AUC = 0.8249) for groundwater vulnerability assessment. The SWARA model was obtained by modifying the ratings through the MCDM approach while retaining the original DRASTIC weights. This model was further coupled with a metaheuristic algorithm (SWARA-MH) to improve its accuracy (AUC = 0.7714 to AUC = 0.8186). The results show that employing a metaheuristic algorithm improves the vulnerability assessment framework, in agreement with previous studies (Jafari and Nikoo 2019; Barzegar et al. 2020; Balaji et al. 2021; Torkashvand et al. 2021). The value of a multiple-model framework for improving robustness has been shown in previous studies such as Elzain et al. (2021), Norouzi et al. (2021), and Torkashvand et al. (2021). This study further reinforces the need for a multiple-model framework to address the inherent subjectivity of the DRASTIC model.

According to the weighted F1 scores from the confusion matrices, the Shannon-MH model had better prediction accuracy, showing its superiority in vulnerability assessment. However, the SWARA-MH model had a better F1 score than the Shannon-MH model for the very low and moderate classes. The AU-ROC results are based on sensitivity and specificity, which are derived from the true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) (Khosravi et al. 2018b). The findings indicate that higher classification accuracy was obtained after optimizing rates and weights. The AU-ROC results show that the bivariate and MCDM approaches improve the accuracy of vulnerability assessment, from AUC = 0.7378 (original DRASTIC) to AUC = 0.7672 (Shannon) and 0.7714 (SWARA). The MCDM method performs marginally better than the bivariate statistical method. Coupling the modified models with a metaheuristic algorithm further improved performance, to AUC = 0.8186 (SWARA-MH) and AUC = 0.8249 (Shannon-MH).

Application of the new models reduces error considerably because rates and weights are modified objectively, by considering nitrate density. The main advantage of SE is that it is nonparametric and needs no assumptions about the distribution of variables (Reza et al. 2012). The SWARA method is straightforward, and its major advantage is that model criteria are prioritized based on the observed nitrate concentration (Zolfani and Saparauskas 2014; Torkashvand et al. 2021). This study further improves the effectiveness of these models by coupling them with the BBO algorithm for vulnerability assessment of this region.

Conclusion

This study proposes a novel approach to improve the DRASTIC framework by coupling bivariate and MCDM approaches with a metaheuristic algorithm. The performance of four developed models, namely SWARA, Shannon, SWARA-MH, and Shannon-MH, was compared for vulnerability assessment. The SE and SWARA approaches were employed first to modify the rates of the DRASTIC model, and the biogeography-based optimization algorithm was then used to optimize its weights. The results reveal that the Shannon-MH model performs better than the other developed models. The vulnerability map obtained from this best-performing model classifies the northeastern part of the study area as high to very high vulnerability, although most of the study area falls under the very low and low categories. The performance of the developed models was also evaluated through confusion matrices and weighted F1 scores. The results show that bivariate and MCDM approaches coupled with a metaheuristic algorithm improve groundwater vulnerability mapping. The framework developed in this study can be adopted in similar regions for accurate vulnerability assessment, which helps policymakers protect aquifers from contamination. However, this study has certain limitations that should be addressed in future improvements of the DRASTIC framework. Land use is considered an important factor in the anthropogenic vulnerability of urban regions, and integrating land use with this framework could improve its prediction accuracy. In addition to spatial variation, the spatiotemporal variation of nitrate concentration could be integrated into the model through continuous monitoring of wells. The estimation of recharge rate could be improved by considering the ratio of groundwater pumping to recharge together with land use.

The SWARA, SE, Shannon-MH, and SWARA-MH models are new for assessing vulnerability in this study region. Their higher predictive capability helps policymakers identify vulnerable zones and protect them from further exploitation. This study concludes that the developed DRASTIC framework yields an accurate groundwater vulnerability map of the study region, which in turn helps policymakers plan and adopt strategies for sustainable groundwater protection and management.