1 Introduction

The drinking water crisis and groundwater scarcity are among the major contemporary challenges facing the earth. Groundwater is the most important but fastest-depleting natural resource, and its appropriate delineation and management are critical at this juncture. India is the largest groundwater-consuming country in the world, using nearly 230 km3 year−1 of groundwater (World Bank 2010). According to the World Bank report of 2010, if India does not reduce its use of groundwater, more than 60% of its aquifers will dry up within 20 years. In India, the demand for groundwater has been increasing with the green revolution and the pace of industrialization, urbanization, and agricultural practices (Suhag 2016). There are two main types of aquifers in India: crystalline aquifers (located in the peninsular area) and alluvial aquifers (developed in the Indo-Gangetic plain). The former are characterized by low permeability and hard rocks, whereas the latter lead in terms of groundwater resources (Suhag 2016). Thus, assessing groundwater quality and potentiality is an important task for reasons of sustainability and livelihood.

Most aquifers are therefore in a critical condition, particularly in semi-arid and arid regions, and this may become a severe problem. Several researchers have tried to determine aquifer characteristics from the underlying sediments, pore spaces, and fractures in surface rocks, which is not adequate for identifying reliable aquifers (Naghibi et al. 2017). Generally, groundwater potentiality assessment depends on the hydro-geological nature of the region, especially the porosity, permeability, storage capacity, groundwater recharge, and hydraulic conductivity of the aquifer materials. These are broadly dependent on physical variables such as geomorphology, geology, rainfall, soil, drainage, and land use/land cover (LULC) (Saha 2017; Haque et al. 2020). Presently, excessive use of groundwater and unscientific management strategies are affecting the groundwater recharge level (Chaudhry et al. 2019). In such circumstances, an adequate management strategy based on groundwater potentiality assessment is required (Chen et al. 2019). A groundwater potential map can help to identify the prospect of groundwater yield and thereby guide the proper management of groundwater.

Different popular and well-accepted models have been developed for groundwater potentiality mapping (Corsini et al. 2009; Ozdemir 2011; Lee et al. 2017; Saha 2017; Chen et al. 2019). Examples include the analytical hierarchy process (Razandi et al. 2015; Ghosh et al. 2020), the weight of evidence (Tahmassebipoor et al. 2016), the frequency ratio (Guru et al. 2017; Das 2019), and fuzzy logic (Mohamed and Elmahdy 2017). These models are now applied less often because they struggle with complex multi-criteria decision problems. An examination of the literature reveals that machine learning models have provided better results (Kenda et al. 2018; Chen et al. 2019). Machine learning models handle high-dimensional data and provide more accurate results using geographical information systems and remote sensing data (Gayen et al. 2019; Rudin 2019; Haque et al. 2020). Guzman et al. (2015) applied artificial neural network (ANN) and support vector machine (SVM) models to predict groundwater potentiality and reported that the SVM models outperformed the ANN models in prediction. Naghibi et al. (2018) also applied some well-accepted machine learning models, i.e., boosted regression tree, classification and regression tree, and random forest, for groundwater potentiality prediction. Their study shows that the boosted regression tree model provided a better result, with an AUC value of 0.8103. Sajedi-Hosseini et al. (2018) also implemented a few machine learning models for groundwater risk assessment. Thus, previous research confirms the capacity of machine learning models to predict groundwater potentiality. The present study focuses on novel ensemble machine learning models that combine a radial basis function neural network (RBFnn) with Bagging (RBFnn-Bagging) and Dagging (RBFnn-Dagging).
The primary objective of this research is to prepare a groundwater potentiality map, along with groundwater quality, of Md. Bazar Block in Birbhum District, India. Finally, the relative influence of the applied groundwater controlling factors was evaluated using the mean decrease Gini (MDG).

2 Materials and Methods

2.1 Study Area

Md. Bazar is a western block of Birbhum District, West Bengal, India, adjacent to Jharkhand. It extends from 87°25′ E to 87°40′ E and from 23°55′ N to 24°50′ N (Fig. 15.1). The block has been recognized as a drought-affected area of West Bengal. The region consists of gneisses and associated rocks, older alluvium, and older alluvium with laterite as aquifer media. The older alluvium has high to moderate yield potentiality, but where older alluvium occurs with laterite rocks the yield potentiality is limited, with hydraulic conductivity between 100 and 700 gpd ft−2 in the study area (Thapa et al. 2018). The area falls under a warm monsoon climate, where annual precipitation is approximately 1200 mm and temperature ranges from 6 to 40 °C (Saha 2017). The maximum precipitation occurs from July to September (monsoon period). The long gap between rainy seasons and the ever-increasing pressure of agriculture lead to continuous withdrawal of groundwater for irrigation, which is one of the major issues of this region. The main routes of groundwater recharge in the Md. Bazar block are natural processes and anthropogenic structures such as artificial canals, hydropower dams, and check dams.

Fig. 15.1 Location map of the study area: a West Bengal, b Birbhum District of West Bengal, c Md. Bazar block

2.2 Data Used

First, dug-well locations were collected from the Central Ground Water Board. A total of 85 dug wells and one piezometer were identified in Md. Bazar Block of Birbhum District, verified using GPS and a field survey, and used to build a groundwater inventory map (CGWB, 2017). The well and no-well locations were then divided into two sets in a 70:30 ratio. Seventy percent of the locations were used as the training dataset to predict the groundwater potentiality maps (GWPMs), and the remaining 30% were used as the validation dataset for the modeling results (Naghibi et al. 2017; Chen et al. 2019).
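The 70:30 partition described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the point identifiers, the number of no-well points (set equal to the 86 inventory points), the per-class split, and the random seed are hypothetical and are not taken from the study.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=42):
    """Shuffle labelled points and split them into training and validation sets."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Hypothetical inventory: (point_id, label) with 1 = well, 0 = no-well.
wells = [(f"W{i}", 1) for i in range(86)]
non_wells = [(f"N{i}", 0) for i in range(86)]  # assumed count, not from the study

# Split each class separately so both sets keep the same well/no-well balance.
train_w, valid_w = split_inventory(wells)
train_n, valid_n = split_inventory(non_wells)
training = train_w + train_n
validation = valid_w + valid_n
print(len(training), len(validation))
```

Splitting each class separately is a design choice made here so that the validation set does not end up dominated by one class.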

Fourteen groundwater controlling factors, viz., aspect, elevation, curvature, topographical positioning index (TPI), topographical wetness index (TWI), slope, stream junction frequency (SJF), geomorphology, distance to a river, rainfall, pond frequency, land use/land cover (LULC), geology, and soil texture, were selected for the development of the GWPMs (Fig. 15.2). Thematic data layers of the parameters were prepared using the GIS spatial analysis tool. The PALSAR Digital Elevation Model (DEM) was obtained from the Alaska Satellite Facility; the LULC map was developed from Sentinel-2 data; rainfall data were obtained from the Indian Meteorological Department (IMD); the soil map from NBSS-LUP; and the geological map from the Geological Survey of India (GSI).

Fig. 15.2 Flowchart illustrating the methodology applied for groundwater potentiality mapping

2.3 Preparing Groundwater Influencing Factors

First, PALSAR-DEM data with a spatial resolution of 12.5 × 12.5 m were used to prepare the aspect, elevation, curvature, TWI, and TPI maps (Fig. 15.3a–e), because several researchers (Naghibi et al. 2016, 2017; Chen et al. 2019) consider these to be essential parameters of the GWPM. Aspect and elevation are both associated with soil moisture, sunlight, temperature, wind, soil development, and precipitation; therefore, both factors can influence the rate of groundwater recharge (Golkarian et al. 2018; Gayen et al. 2019). The slope is an important terrain factor that increases the velocity of surface runoff, so that steep slopes limit infiltration (Arabameri et al. 2019). The regional slope angle ranges from 0° to 34.21°. The TWI is applied for measuring the influence of topographic conditions on hydro-geomorphic processes. It combines the slope and the upstream contributing area per unit width orthogonal to the direction of flow (Arabameri et al. 2019). The TWI is calculated following Moore et al. (1991):

$${\text{TWI}} = \ln \left( {\frac{{{\text{As}}}}{\tan \beta }} \right)$$
(15.1)

where As denotes the cumulative catchment area (m2 m−1) and \(\beta\) denotes the slope angle.

Fig. 15.3 The spatial data layers: a aspect, b elevation, c curvature, d TPI, e TWI, f slope, g stream junction frequency, h geomorphology, i distance to river, j rainfall, k pond frequency, l LULC, m geology, n soil texture
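As a worked illustration of Eq. (15.1), the sketch below evaluates the TWI for a single raster cell. The function name, the sample values, and the small slope floor used to avoid division by zero on flat cells are assumptions made for illustration, not part of the study's workflow.

```python
import math

def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(As / tan(beta)) for one cell (Eq. 15.1)."""
    # Flat cells give tan(beta) = 0, so the slope is clamped to a small floor (assumed).
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(specific_catchment_area / math.tan(beta))

# Same contributing area, gentle versus steep slope (hypothetical values).
gentle = twi(500.0, 2.0)
steep = twi(500.0, 30.0)
print(round(gentle, 2), round(steep, 2))
```

For the same contributing area, the gentler cell receives the higher index, which matches the expectation that water accumulates on low-gradient terrain.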

Both TPI and curvature have been shown to affect groundwater potentiality (Grohmann and Riccomini 2009; Arabameri et al. 2019). The TPI and curvature maps were developed from the PALSAR-DEM data. The TPI was calculated using Eqs. (15.2) and (15.3).

$${\text{TPI}} = Z_0 - \overline{Z}$$
(15.2)
$$\overline{Z} = \frac{1}{n_R }\sum_{i \in R} {Z_i }$$
(15.3)

where Z0 denotes the altitude of the central point, \(\overline{Z}\) represents the mean altitude within a particular radius (R), nR is the number of cells within that radius, and a small R picks out small ridges and valleys (Weiss 2001). The lowest and highest TPI values within the study area are 0.00 and 1.00.
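Equations (15.2) and (15.3) can be illustrated on a toy elevation grid. The sketch below is a simplification under stated assumptions: it uses a square window instead of a circular radius, excludes the centre cell from the mean, and the grid values are hypothetical.

```python
def tpi(dem, row, col, radius=1):
    """TPI = Z0 minus the mean elevation of the neighbouring cells (Eqs. 15.2 and 15.3)."""
    neighbours = []
    for r in range(max(0, row - radius), min(len(dem), row + radius + 1)):
        for c in range(max(0, col - radius), min(len(dem[0]), col + radius + 1)):
            if (r, c) != (row, col):  # centre cell excluded (an assumed convention)
                neighbours.append(dem[r][c])
    return dem[row][col] - sum(neighbours) / len(neighbours)

# Hypothetical 3 x 3 grids: a small peak and a small pit.
peak = [[10, 10, 10], [10, 14, 10], [10, 10, 10]]
pit = [[5, 5, 5], [5, 1, 5], [5, 5, 5]]
print(tpi(peak, 1, 1), tpi(pit, 1, 1))
```

Positive values mark cells higher than their surroundings (ridges), and negative values mark cells lower than their surroundings (valleys).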

Ponds, drainage, and streams reflect the structural characteristics and permeability of an area, which influence groundwater storage and movement through the hydraulic gradient (Tien Bui et al. 2017). The distance to river and stream junction frequency maps were developed using the drainage network extracted from 1:50,000 toposheet maps. A junction indicates the confluence of two rivers. Generally, the chances of groundwater availability are greater in areas of high pond frequency and in confluence zones because both enhance groundwater recharge. The LULC also affects groundwater potentiality (Saha 2017). A LULC map of the study area was developed using Sentinel-2 data, and the classification was verified with Cohen's Kappa index, which gave a Kappa value of 89.6%. The block is covered by eight LULC classes: reservoir, watercourse, sand cover, settlement, agricultural land, mining area, wasteland, and vegetation cover (Fig. 15.3l). The duration and intensity of rainfall also play a key role in groundwater recharge (Shekhar and Pandey 2014). Jothibasu and Anbazhagan (2016) noted that rainfall influences the accuracy of groundwater potential mapping and the percolation of water; for that reason, the spatial distribution of rainfall was taken as a predisposing factor for this study (Fig. 15.3j).
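For readers unfamiliar with the Kappa check applied to the LULC map, the following sketch computes Cohen's Kappa from a confusion matrix. The two-class matrix is hypothetical and serves only to show the arithmetic; it is not the study's accuracy assessment.

```python
def cohens_kappa(confusion):
    """Cohen's Kappa from a square confusion matrix (rows: reference, columns: classified)."""
    k = len(confusion)
    total = sum(sum(row) for row in confusion)
    # Observed agreement: share of samples on the diagonal.
    observed = sum(confusion[i][i] for i in range(k)) / total
    # Chance agreement: product of row and column totals for each class.
    expected = sum(
        sum(confusion[i]) * sum(row[i] for row in confusion) for i in range(k)
    ) / total ** 2
    return (observed - expected) / (1 - expected)

matrix = [[45, 5], [5, 45]]  # hypothetical reference-versus-classified counts
print(round(cohens_kappa(matrix), 3))
```

Kappa discounts the agreement expected by chance, so it is lower than the raw share of correctly classified samples (0.90 in this toy matrix).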

Soil type is among the most important predisposing factors for assessing the infiltration rate of any region. The study area has six major soil types: sandy, clay loam, loamy, sandy loam, sandy clay, and sandy clay loam. Most of the Md. Bazar block is covered by sandy loam and clay loam soils (Fig. 15.3n). Generally, the infiltration rate is higher in sandy regions than in loamy or clayey strata. The Md. Bazar block is composed of eight geological formations. The western part is dominated by pink granite, whereas rocks belonging to the Vindhyan formation occur to the east (Fig. 15.3m). Pisolitic and kankar ferruginous concretions are mostly found in the lateritic tract. Some parts of the block are covered by basaltic rocks and younger alluvium. The block falls under three primary geomorphological regions, i.e., depositional plain, anthropogenic origin, and denudational plain (Fig. 15.3h).

2.4 Machine Learning Ensemble Meta-classifier Models for the GWPMs

Novel ensemble models, RBFnn-Bagging and RBFnn-Dagging, are used for mapping groundwater potentiality in this study. The RBFnn, which originated in the late 1980s, is a type of artificial neural network. RBFs are embedded in a two-layer neural network in which each hidden unit implements a radial activation function, and the output units compute a weighted sum of the hidden unit outputs. Although the output layer is linear, the mapping from the input to the hidden layer is nonlinear. The approximation capabilities of RBF networks have been widely investigated, and their nonlinear approximation properties allow them to model complex mappings. The RBFnn was used as the base learner in this study. Bagging and Dagging were applied as the meta-learners because of their utility in ensemble estimation.
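The two-layer structure described above can be made concrete with a minimal forward pass. The sketch assumes Gaussian radial units with a shared width; the centres, weights, and inputs are hypothetical, and no training step is shown.

```python
import math

def rbf_forward(x, centres, width, weights, bias=0.0):
    """Output of a single-output RBF network with Gaussian hidden units (assumed form)."""
    hidden = []
    for centre in centres:
        sq_dist = sum((xi - ci) ** 2 for xi, ci in zip(x, centre))
        # Nonlinear hidden layer: activation depends only on distance to the centre.
        hidden.append(math.exp(-sq_dist / (2.0 * width ** 2)))
    # Linear output layer: weighted sum of the hidden activations.
    return bias + sum(w * h for w, h in zip(weights, hidden))

centres = [(0.0, 0.0), (1.0, 1.0)]  # hypothetical centres
weights = [1.0, -1.0]               # hypothetical output weights
at_first_centre = rbf_forward((0.0, 0.0), centres, width=1.0, weights=weights)
midway = rbf_forward((0.5, 0.5), centres, width=1.0, weights=weights)
print(round(at_first_centre, 3), round(midway, 3))
```

In practice the centres, width, and output weights would be fitted to the training data; here they are fixed by hand only to show how the nonlinear hidden layer and the linear output layer interact.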

2.4.1 Bagging

The bagging algorithm, introduced by Breiman (1996), builds on bootstrapping (Freedman 1981). Several researchers have applied this model to produce susceptibility maps (e.g., flood and landslide) because of its strong predictive performance (Hong et al. 2020). The bagging tree is a bagging algorithm composed of decision-tree-based models. This algorithm was selected because it builds a decision tree from each generated subset and finally assembles them into the final model (Hong et al. 2020). It improves classification accuracy by reducing the variance of the classification error (Saha et al. 2021; Wu et al. 2020). A bagging classifier can be viewed as a three-step system (Breiman 1996; Yariyan et al. 2020). First, bootstrap samples are drawn from the original training samples by sampling with replacement (Saha et al. 2021). A base classifier is then trained on each bootstrap sample, and the outputs of all classifiers are finally combined in the bagging classification stage; because the classifiers are trained independently, this machine learning algorithm (MLA) does not depend on the precision of previously built models (Breiman 1996; Yariyan et al. 2020).
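The three steps of bagging (bootstrap sampling, training one base learner per sample, and combining the outputs) can be sketched as follows. For brevity a 1-nearest-neighbour classifier stands in for the base learner; it is not the RBFnn used in this study, and the toy data, number of models, and seed are hypothetical.

```python
import random
from collections import Counter

def train_1nn(sample):
    """Stand-in base learner: a 1-nearest-neighbour classifier over `sample`."""
    def predict(x):
        nearest = min(sample, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x)))
        return nearest[1]
    return predict

def bagging_fit(data, n_models=15, seed=0):
    """Train one base learner per bootstrap sample."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(data) for _ in data]  # same size as data, drawn with replacement
        models.append(train_1nn(sample))
    return models

def bagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature records: ((feature_1, feature_2), label).
data = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0), ((0.2, 0.2), 0),
        ((5.0, 5.1), 1), ((5.2, 4.9), 1), ((4.8, 5.0), 1), ((5.1, 5.3), 1), ((4.9, 4.7), 1)]
models = bagging_fit(data)
print(bagging_predict(models, (0.1, 0.1)), bagging_predict(models, (5.0, 5.0)))
```

An odd number of models is used so that a two-class majority vote cannot tie.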

2.4.2 Dagging

The Dagging algorithm was introduced by Ting and Witten (1997) and uses a different sampling method to derive the base classifiers. Dagging is very similar to bagging; the name is a portmanteau derived from the phrase "disjoint bagging." In dagging, once data have been used to train a classifier, that subset is "disjointed" (set aside). In bagging, the subsets are not disjoint, and the data are returned to the full set to be used again. Dagging is a well-known group-sampling technique that uses majority voting to combine several classifiers and improve the prediction accuracy of the base classifiers (Kotsianti and Kanellopoulos 2007).
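The contrast with bagging is easiest to see in the sampling step. The sketch below deals the training records into non-overlapping folds, trains one stand-in learner per fold, and combines them by majority vote. The 1-nearest-neighbour learner, the simple (non-stratified) shuffle, the fold count, and the toy data are assumptions for illustration only.

```python
import random
from collections import Counter

def train_1nn(sample):
    """Stand-in base learner: a 1-nearest-neighbour classifier over `sample`."""
    def predict(x):
        nearest = min(sample, key=lambda rec: sum((a - b) ** 2 for a, b in zip(rec[0], x)))
        return nearest[1]
    return predict

def disjoint_folds(data, n_folds, seed=0):
    """Shuffle once, then deal the records into n_folds non-overlapping subsets."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::n_folds] for i in range(n_folds)]

def dagging_fit(data, n_folds=3, seed=0):
    """Train one base learner per disjoint fold; no record is reused."""
    return [train_1nn(fold) for fold in disjoint_folds(data, n_folds, seed)]

def dagging_predict(models, x):
    """Combine the base learners by majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical two-feature records: ((feature_1, feature_2), label).
data = [((0.0, 0.1), 0), ((0.2, 0.0), 0), ((0.1, 0.3), 0), ((0.3, 0.2), 0), ((0.2, 0.2), 0),
        ((5.0, 5.1), 1), ((5.2, 4.9), 1), ((4.8, 5.0), 1), ((5.1, 5.3), 1), ((4.9, 4.7), 1)]
folds = disjoint_folds(data, 3)
models = dagging_fit(data)
print([len(fold) for fold in folds])
print(dagging_predict(models, (0.1, 0.1)), dagging_predict(models, (5.0, 5.0)))
```

Because each learner sees only a fraction of the records, dagging trades some per-model accuracy for cheaper training, whereas each bagging learner sees a full-size resample.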

2.5 Validation of Groundwater Potentiality Models

Model evaluation and validation are important steps in prediction work; without validation, a model has no scientific significance (Talukdar and Pal 2020; Pal and Mandal 2021). The prediction capacity of the applied models was investigated using the ROC curve, RMSE, MAE, accuracy, and precision (Chen et al. 2018). Two types of ROC curve, the prediction rate curve and the success rate curve, were developed using the validation and training datasets, respectively. The ROC curve is a graphical illustration of model prediction through a diagnostic test (Chen et al. 2019). The area under the curve (AUC) varies from 0.5 to 1.0, and values close to 1.0 indicate high predictive power (Mishra et al. 2020).

In addition, the error of the predictive models was calculated using the RMSE and MAE tests to assess their prediction capacity (Abedinpour et al. 2012). Each error was calculated by comparing the model values with the field-observed values (Rahmati et al. 2017). The precision, RMSE, MAE, and AUC were calculated using Eqs. (15.4)–(15.7).

$${\text{Precision}} = \frac{{{\text{TP}}}}{{{\text{TP}} + {\text{FP}}}}$$
(15.4)
$${\text{RMSE}} = \sqrt {\frac{1}{n}\sum_{i = 1}^n {(O_{i} - S_{i} )^2 } }$$
(15.5)
$${\text{MAE}} = \frac{1}{n}\sum_{i = 1}^n {\left| {S_{i} - O_{i} } \right|}$$
(15.6)
$${\text{AUC}} = \frac{{\Sigma {\text{TP}} + \Sigma {\text{TN}}}}{P + N}$$
(15.7)

where TP and TN denote true positives and true negatives, FP and FN denote false positives and false negatives, Oi and Si are the observed and predicted values, n is the number of observations, P is the total number of dug-well location points, and N is the total number of non-dug-well location points.
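The measures in Eqs. (15.4)–(15.7) can be checked on a small example. The sketch below uses the standard definitions of RMSE (root of the mean squared error) and MAE (mean of the absolute errors) and the ratio given in Eq. (15.7); the observed labels, predicted scores, and the 0.5 threshold used to turn scores into classes are hypothetical.

```python
import math

def confusion_counts(observed, predicted):
    """Return (TP, TN, FP, FN) for binary labels coded as 1 (well) and 0 (no-well)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn

def precision(tp, fp):
    return tp / (tp + fp)                                  # Eq. (15.4)

def rmse(observed, simulated):
    n = len(observed)
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(observed, simulated)) / n)  # Eq. (15.5)

def mae(observed, simulated):
    n = len(observed)
    return sum(abs(s - o) for o, s in zip(observed, simulated)) / n               # Eq. (15.6)

def ratio_15_7(tp, tn, p, n):
    return (tp + tn) / (p + n)                             # Eq. (15.7)

observed = [1, 1, 1, 0, 0, 0]                # hypothetical field observations
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1]      # hypothetical model outputs
labels = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

tp, tn, fp, fn = confusion_counts(observed, labels)
print(precision(tp, fp), rmse(observed, scores), mae(observed, scores))
print(ratio_15_7(tp, tn, p=observed.count(1), n=observed.count(0)))
```

RMSE penalizes large individual errors more heavily than MAE, so it is never smaller than MAE on the same data.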

3 Results and Analysis

3.1 Groundwater Potentiality Models

First, the two meta-classifier-based MLAs were trained on the training dataset. The resulting groundwater potentiality indices (GWPI) were divided into four classes (i.e., low, moderate, high, and very high) (Chen et al. 2018) (Fig. 15.4a, b). A user-defined classification of GWPMs is hard for readers to justify and interpret. Therefore, the natural breaks method, which follows the histogram of the data distribution, was the most convenient for classifying the GWPI (Chen et al. 2019).

Fig. 15.4 The groundwater potentiality maps produced by the RBFnn-Bagging and RBFnn-Dagging models

The RBFnn-Bagging result shows that the low potentiality zone covers the largest area (68.64%), followed by the very high (16.92%), moderate (11.51%), and high (2.92%) zones. The corresponding areas for the RBFnn-Dagging model are 68.03%, 13.70%, 13.59%, and 4.68% for the low, very high, moderate, and high zones, respectively. Both GWPMs show that the largest area of high GWP is found in the southern part of the Md. Bazar Block because of greater forest cover and the presence of a water reservoir (Table 15.1).

Table 15.1 Areal share under potentiality classes of groundwater potentiality models (area in km2)

3.2 Validation and Comparison of Applied Models

To validate and compare the applied models, RMSE, MAE, accuracy, precision, and ROC were computed using the validation and training datasets (Fig. 15.5a, b), as these measures are important for judging the prediction capacity of the applied models (Pal and Mandal 2021).

Fig. 15.5 Validation of groundwater potentiality maps using the ROC curve: a success rate curve (training dataset) and b prediction rate curve (validation dataset)

The results show that the RBFnn-Bagging algorithm has higher AUC values of 0.837 and 0.847 for the success and prediction rate curves, respectively, followed by the RBFnn-Dagging algorithm with AUC values of 0.793 and 0.829, respectively. It is therefore concluded that both models have excellent GWP prediction capacity. The RMSE and MAE values of RBFnn-Bagging and RBFnn-Dagging were calculated for the training phase as 0.237, 0.057, 0.270, and 0.74 and for the validation phase as 0.039, 0.198, 0.51, and 0.227, respectively (Table 15.2). The results of the accuracy and precision tests are also presented in Table 15.2 for both models. The accuracy and precision values were 0.88, 0.79, 0.83, and 0.75 for RBFnn-Bagging and RBFnn-Dagging, respectively, which indicates that the two models have comparable prediction capacity for the assessment of groundwater potentiality.

Table 15.2 Estimation of root mean square error (RMSE), MAE, accuracy, and precision for both models

3.3 Significant Factors Identification by MDGs

Identifying the significant factors is a challenging task because groundwater recharge is affected by various groundwater controlling factors (Conforti et al. 2010). The mean decrease Gini was applied to evaluate the relative importance of each factor using the random forest (RF) algorithm (Breiman 2001). The MDG varies from 14.09 to 286.01. Distance to a river (286.01), pond frequency (229.96), aspect (103.56), stream junction frequency (101.45), elevation (62.06), and geomorphology (61.10) were the most important factors. These were followed in order of influence by the slope (45.32), TWI (42.94), soil types (36.11), curvature (31.45), geology (28.38), rainfall (26.88), TPI (24.99), and LULC (14.09) (Fig. 15.6 and Table 15.3). All fourteen predisposing factors were retained for modelling because all of them contribute to GWP occurrence.
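The quantity behind the mean decrease Gini can be illustrated for a single tree node. The sketch below computes the Gini impurity of a set of labels and the impurity decrease produced by one split; the labels and splits are hypothetical. Random forest implementations typically accumulate such decreases for each factor over all nodes and trees, often weighted by node size, so the scale of the resulting MDG values depends on the implementation and is not reproduced here.

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def gini_decrease(parent, left, right):
    """Impurity of the parent node minus the size-weighted impurity of its children."""
    n = len(parent)
    weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
    return gini(parent) - weighted

parent = [1, 1, 1, 0, 0, 0]  # hypothetical well (1) and no-well (0) labels at a node
informative = gini_decrease(parent, [1, 1, 1], [0, 0, 0])  # split separates the classes
weak = gini_decrease(parent, [1, 1, 0], [1, 0, 0])         # split barely separates them
print(informative, round(weak, 4))
```

A factor whose splits separate wells from non-wells cleanly accumulates large decreases, which is why factors such as distance to a river rank highest in Table 15.3.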

Fig. 15.6 Identification of significant factors by mean decrease Gini

Table 15.3 Calculated MDG values for significant factors identification

4 Discussion

For the groundwater potentiality (GWP) assessment, factors such as rainfall, LULC, slope, elevation, pond frequency, stream junction frequency, distance to a river, TWI, TPI, soil texture, geology, geomorphology, curvature, and aspect were used. The elevation and slope are very low in the south-eastern portion of the Md. Bazar block. Groundwater recharge is negatively related to the elevation of the study area. Thus, high groundwater potentiality is concentrated in the low-elevation parts of the study area rather than being uniformly distributed across it.

In other works (e.g., Corsini et al. 2009; Ozdemir 2011; Rahmati et al. 2017; Naghibi et al. 2017; Chen et al. 2019), similar factors have been used for assessing GWP, and the relationships found between the factors and the wells are also the same. No algorithm reproduces natural processes perfectly, and groundwater modeling is a complex, nonlinear process that cannot be based on ordinary models with a linear structure (Chen et al. 2019). Several researchers have applied MLAs such as Bagging and Dagging in various fields of research, including gully erosion, landslide, flood hazard, and deforestation susceptibility assessment (Chen et al. 2018, 2019; Arabameri et al. 2020; Hong et al. 2020; Pal et al. 2020; Talukdar et al. 2020; Saha et al. 2021). In every case, the prediction capacity of the meta-classifier ensemble models was highly satisfactory. The application of machine learning algorithms (MLAs) is therefore not new, but the application of these machine learning meta-classifier models to groundwater potentiality assessment is unique.

Previous research, such as that by Corsini et al. (2009), Ozdemir (2011), Lee et al. (2017), and Naghibi et al. (2018), concluded that MLAs provided adequate results compared with multivariate and bivariate statistical models. In other applications, such as flood, landslide, and spring potential assessments, the RBFnn-Bagging model has also given good results. The RBFnn-Bagging model is notable in that it is less prone to overfitting. It consists of multiple decision trees and captures the interactions between predisposing factors and their non-linearity (Hong et al. 2020; Saha et al. 2021). The results also revealed that the processing speed of RBFnn-Bagging is much higher than that of the RBFnn-Dagging model, which means that the assignment of input factors is very important. With respect to the percentage of the low and high GWP zones, the two models displayed a similar spatial distribution. The RBFnn-Bagging and RBFnn-Dagging models can therefore be applied for hazard vulnerability and susceptibility mapping, such as flood, landslide, forest fire, and gully erosion, at local and regional scales.

5 Conclusion

Groundwater potential mapping, applying various predisposing factors, is an important aspect of groundwater research. Several algorithms have been applied around the globe for the accurate assessment of groundwater conditions. In this study, a well-accepted methodology was applied to delineate GWP zones in Md. Bazar Block. Fourteen predisposing factors were integrated using the RBFnn-Bagging and RBFnn-Dagging models. The RBFnn-Bagging and RBFnn-Dagging models identified 16.92 and 13.70% of the area as having very high groundwater potentiality and 68.64 and 68.03% of the block as having low groundwater potentiality. The results indicate that this block may face vulnerable conditions in the future if the government does not introduce various schemes (e.g., rainwater harvesting and dam construction) and generate awareness among common people. Based on the experimental results, the following conclusions can be drawn. First, the RBFnn-Bagging model has better prediction capacity than the RBFnn-Dagging model because the Bagging algorithm can be applied to find reliable features of the real data. Second, researchers can address model overfitting problems by applying the RBFnn-Bagging model. Third, based on the mean decrease Gini, the most influential factors of groundwater potentiality are, in order, the distance to a river, pond frequency, aspect, stream junction frequency, elevation, and geomorphology. Finally, the proposed approach should be useful for the exploration, development, and management of groundwater. It is therefore pertinent that measures for groundwater recharge and its management are taken up at the earliest in Md. Bazar Block.