Abstract
The present study aimed to create novel hybrid models to produce groundwater potentiality models (GWP) in the Teesta River basin of Bangladesh. Six ensemble machine learning (EML) algorithms, such as random forest (RF), random subspace, dagging, bagging, naïve Bayes tree (NBT), and stacking, coupled with fuzzy logic (FL) models and a ROC-based weighting approach have been used for creating hybrid models integrated GWP. The GWP was then verified using both parametric and nonparametric receiver operating characteristic curves (ROC), such as the empirical ROC (eROC) and the binormal ROC curve (bROC). We conducted an RF-based sensitivity analysis to compute the relevancy of the conditioning variables for GWP modeling. The very high and high groundwater potential regions were predicted as 831–1200 km2 and 521–680 km2 areas based on six EML models. Based on the area under the curve of the ROC, the NBT (eROC: 0.892; bROC: 0.928) model outperforms rest of the models. Six GPMs were considered variables for the next step and turned into crisp fuzzy layers using the fuzzy membership function, and the ROC-based weighting approach. Subsequently four fuzzy logic operators were used to assimilate the crisp fuzzy layers, including AND, OR, GAMMA0.8, and GAMMA 0.9, as well as GAMMA0.9. Thus, we created four hybrid models using FL model. The results of the eROC and bROC curve showed that GAMMA 0.9 operator outperformed other fuzzy operators-based GPMs in terms of accuracy. According to the validation outcomes, four hybrid models outperformed six EML models in terms of performance. The present study will aid in enhancing the efficiency of GPMs in preparing viable planning for groundwater management.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Avoid common mistakes on your manuscript.
Introduction
In all climatic areas across the globe, groundwater is a highly significant and stable water source. Groundwater resource is depleted because of overexploitation with other natural resources (Falkenmark et al. 2019). The agriculture for developing countries like Bangladesh relies on irrigation based on groundwater. Therefore, groundwater exploitation is higher, causing the loss of groundwater supplies, which is a significant cause of worry (Khan et al. 2021; Nzama et al. 2021). In Bangladesh, Groundwater provides around 79 percent of the water supply (Shahinuzzaman et al. 2021). Groundwater provides 95 percent of irrigation supplies in certain sections, like the northwest (Shahinuzzaman et al. 2021). Agriculture accounts for around 18 percent of Bangladesh’s GDP and provides jobs to about 48 percent of the workforce (Shahinuzzaman et al. 2021). As a result, the development of groundwater resources is critical to the country's social and economic growth. It is also critical for the agricultural policy of the government toward attaining food independence and poverty reduction (Salem et al. 2017).
Consequently, it is critical to design a tactical plan to properly evaluate and manage groundwater resources using an assimilated method that considers a variety of ecological, socioeconomic, and scientific aspects. For the extension of irrigation-based agriculture and the execution of government initiatives, a thorough understanding of the spatial distribution of groundwater accessibility is critical. In order to minimize overdraft, it is also critical to utilize groundwater wisely (Benjmel et al. 2020).
Delineation of the areas having groundwater is one of the most fundamental aspects of groundwater research. Recently, there has been a strong interest in potential groundwater mapping among the researchers, especially in dry places, where the shortage of safe freshwater is a significant issue, and the growth of irrigation, industry, and urbanization is nearly entirely dependent on groundwater (Portoghese et al. 2021; Zhu and Abdelkareem 2021). The geographic information systems (GISs) and remote sensing (RS) technology have been used recently to analyze large-scale spatial and temporal databases. These technologies help delineate the potential groundwater zones with high precision and very little time. These technologies have replaced the time-consuming and costly groundwater research methods, such as drilling and geological and geophysical procedures. Now, GIS is crucial for dealing with large geographic datasets, such as spring and qanat sites (Pham et al. 2021; Nwankwo et al. 2020). By integrating RS and GIS, many topographical, hydrological, climatic, pedogenic parameters for extensive areas can be extracted with very high precision (Arabameri et al. 2020, 2021) (Table 1). Also, these technologies help to assimilate multi-parameters and are able to produce highly accurate groundwater potential maps for large to a small areas within a short time (Dau et al. 2021; Namous et al. 2021). Therefore, these technologies have added a new dimension to groundwater research (Mallick et al. 2021c; Nguyen et al. 2020). The robust and effective models generally rely on the choice of conditioning variables and standard assimilation methods (Kumar et al. 2020). The proper parameters choice for GWP modeling is challenging because redundant parameters can produce the erroneous results (Malik and Bhagwat 2021). However, researchers have used several topographical, hydrological, climatic, pedogenic parameters for modeling (Tolche 2021; Zhu and Abdelkareem 2021). In the present study, we chose the conditioning variables for the modeling based on the literature survey (Table 2). We chose those variables, which many researchers have extensively used. In the plain regions, topographic and climatic parameters have been recognized as important variables, while in the mountains, along with topographic, geological variables have been described as critical variables for GWP mapping (Mallick et al. 2021c; Al-Djazouli et al. 2021; Pathak et al. 2021; Namous et al. 2021; Al-Abadi et al. 2021). For example, drainage density could be a valid variable in flood plains, not in mountainous regions (Bhattacharya et al. 2021; Fadhillah et al. 2021). Therefore, researchers should pay attention while choosing variables for modeling the spatial features of the study area (Pal et al. 2020b). Consequently, as shown in Table 1, the groundwater potentiality conditioning factors utilized in this study were determined after a comprehensive literature review.
Recently, researchers have found that just the assimilation of several parameters does not provide highly accurate and robust GWP maps; therefore, to achieve the accurate GWP maps, researchers have to use different methods, which can assimilate mathematically all parameters having different data patterns and direction (Hembram et al. 2019; Das et al. 2021). Therefore, researchers have been paid higher interest in developing such accurate GWP modeling methods (Nguyen et al. 2020). Several approaches have been developed and used for GWP modeling (Pande et al. 2020). Therefore, we classified all methods as per their operational background, such as (1) statistical approaches for zoning groundwater potential, which have a long history of use (Mallick et al. 2021c; Pham et al. 2021). Statistical procedures that are now in use include frequency ratio (Abd Manap et al. 2014; Guru et al. 2017), logistic regression (Rizeei et al. 2019), weight of evidence (Rane and Jayaraj 2021; Das et al. 2021), certainty factor (Razandi et al. 2015), and evidential belief function (Tahmassebipoor et al. 2016). However, they have several disadvantages, including a lack of precision (Chen et al. 2020). (2) Techniques for multi-criteria decision analysis (MCDA), such as the analytic hierarchy process (AHP) (Kumar et al. 2020; Murmu et al. 2019) and TOPSIS (Mandal et al. 2021; Zaree et al. 2019). Experts' judgment has tuned semiquantitative models (AHP), but for comparable geo-environmental elements or locations, the models need extensive understanding of groundwater and conditioning variables, which is seldom accessible (El Bilali et al. 2021; Mogaji et al. 2016). Statistical approaches have been widely regarded as the best way for GWP mapping at sizes of 1:20,000 to 1:50,000, as they can map springs and wells in detail (Mallick et al. 2021a; Arshad et al. 2020). Statistical models cannot account for nonlinear interactions. Therefore, machine learning (ML) models based on artificial intelligence have been created (Mallick et al. 2021a). The conditions necessary to enhance groundwater capacity have been established using machine learning algorithms based on data mining. (3) Machine learning (ML) models include CART (Gayen and Pourghasemi 2019), random forest (RF) (Golkarian et al. 2018), support vector machine (Panahi et al. 2020), artificial neural network (Nguyen et al. 2020; Naghibi et al. 2017; Mallick et al. 2021c), neuro-fuzzy (Termeh et al. 2019), and decision trees (Choubin et al. 2019). Each has the same goal: to discover the best cost-effective and efficient technique. It is also worth noting that utilizing field data in GIS-based models enhances outcomes (Phong et al. 2021; Zhao and Chen 2020).
The utilization of EML algorithms has been substantially increased for higher accuracy in GWP mapping (Al-Abadi and Shahid 2015). Ensemble modeling included two or more ML algorithms to enhance the prediction accuracy (Muavhi et al. 2021; Pham et al. 2021; Farzin et al. 2021). Ensemble modeling can mitigate the flaws of an individual model (Talukdar et al. 2020, 2021b; Rahmati et al. 2016). Susceptibility, vulnerability, hazards, potentiality, and other issues can now be studied using a multi-model approach and ensemble modeling (Talukdar and Pal, 2019; Islam et al., 2021; Mahato et al. 2021; Talukdar et al. 2021a). The ensemble models include AdaBoost (Ha et al. 2021), bagging (Yen et al. 2021), Reptree-bagging (Chen et al. 2019a), dagging (Talukdar et al. 2021a, b), and rotation forest (Mallick et al. 2021c). Therefore, to increase the model's resilience for GWP mapping, we utilized six ensemble machine learning techniques in the present study, including RF, RS, bagging, dagging, NBT, and stacking. The EML-based prediction approach is rarely utilized in the Teesta River Basin of Bangladesh for GWP mapping.
Experimental hybrid models for GWP mapping have now been investigated in recent years, as there is a necessity to investigate contemporary prediction methodologies and procedures to collect more scientific knowledge to make fair findings (Table 2). Several hybrid approaches have been effectively utilized for groundwater potentiality modeling, which has been produced by combining statistical techniques with machine learning approaches, such as bagging based linear discriminant function (Chen et al. 2019b), EML models with discriminant analysis (Ha et al. 2021), and adaptive neuro-fuzzy (Termeh et al. 2019).
To create hybrid models, six EML models were combined with four operators of fuzzy logic models and a ROC-based weighting technique in the current work. The hybrid models have a higher capacity to help researchers in future groundwater potentiality studies by increasing the popularity of this approach. Predictions of GWP utilizing contemporary hybrid techniques are significant since the models are more accurate in detecting and predicting than machine learning models.
Additionally, prior studies devoted minimal emphasis to thematic layer sensitivity analysis. This research has treated thematic layers with sensitivity tests following creating hybrid models. The most significant thematic layers have been determined utilizing several machine learning-based sensitivity studies to improve the model's predictive performance. This technique was utilized to minimize uncertainty in other research, such as gully erosion prediction, land subsidence prediction, and landslide susceptibility (Forkuor et al. 2017; Abdulkadir et al. 2019; Chen et al. 2018). In this study, RF-based sensitivity analyses were implemented to identify the model's significant thematic layer output. In addition, the model's efficiency was assessed using the ROC curve. Only a few researchers have used parametric and nonparametric ROC curves for validation. Therefore, to address the research as mentioned earlier gaps, the study's main objectives are to:
-
1.
Develop hybrid algorithm-based GPMs by combining EMLs such as RF, RS, bagging, dagging, NBT, and stacking with four fuzzy logic operators;
-
2.
Undertake sensitivity analysis; and
-
3.
Apply eROC and bROC curves for validation.
This study will aid governments and scientists in effectively proposing plans for groundwater management.
Materials and methods
Description of the study area
The Teesta River is originated from eastern Himalayas, flows across Bangladesh's northern area (Fig. 1), and is recognized as the lifeblood of Bangladesh's northern area. This river covers fourteen percent land, which provides direct and indirect livelihood for twenty-one million people of Bangladesh, which accounts for seven percent population of the Bangladesh (BBS 2016). The floodplain of this river is considered as the important geomorphic units, including fourteen northern districts. It flows across the five districts of Bangladesh (Gaibandha, Kurigram, Lalmonirhat, Nilphamari, and Rangpur districts). The basin area of this river is around 2,000 km2 and comprises alluvial floodplain having fine to medium sand.
This river is a vital supply of water in the northern drought-prone area, and millions of people rely on it for their lives. The study area is in a subtropical monsoon climatic zone where rainfall occurs only during monsoon months (June to September), dry for the rest of the year (Akter et al. 2019). Although the northern region stays dry throughout the post- and pre-monsoon seasons, the area receives over 1900 mm of annual rainfall on average. Summer and winter mean temperatures in the Teesta River basin are about 35 °C and 15 °C, respectively (Islam et al. 2014).
Preparation of groundwater inventory
In the present research, GWP was predicted using ML and EML algorithms with conditioning parameters. To create the inventory, we used well locations of the study area. The inventory map for the study area contains 230 well points gathered across multiple sources and a thorough field examination. First, non-groundwater data should be generated that is comparable to the groundwater data used in GWP modeling. The field survey has been used to make the selection, along with an equal quantity of non-groundwater data (230 points). All datasets have been separated into 80 percent (368): 20 percent (92 points) training and testing datasets based on arbitrary partitioning (Fig. 11). Groundwater and non-groundwater training data are used to calibrate the model, while groundwater and non-groundwater testing data are used to validate it (Mallick et al. 2021b).
Data preparation
In the present study, we selected 12 conditioning variables based literature review, availability of data, and technological setup (Table 1). Therefore, the variables are elevation, aspect, TWI, SPI, STI, LULC, TRI, distance to the river, curvature, soil condition, slope, and rainfall. Employing a resampling approach, all relevant factors have been converted to a spatial resolution of 30 m.
Topographic factors, derived from ASTER GDEM, are important for GWP modeling since they influence the study region's hydrological properties both directly and indirectly (Bui et al. 2020b).
Elevation
The elevation primarily shows surface terrain irregularity, crucial to groundwater potentiality. There is a reduced infiltration rate in locations connected with steep elevation due to increased surface runoff. In contrast, plain land with lower elevation has an extended water retention period, increasing the water infiltration rate for higher groundwater recharge (Arulbalaji et al. 2019). We created the elevation map of the research region using an SRTM-DEM. The research area's elevation ranges from 18 to 69 m (Fig. 2a). The majority of the region (about 70% of the total area) has elevations ranging from 18 to 40 m, while fewer than 10% of the entire area has elevations over 60 m.
Curvature
Curvature values describe the shape of regional topography (Ginesta Torcivia and Ros López 2020). A positive curvature indicates that the surface is convex, whereas a negative curvature indicates that it is concave (Costache and Tien Bui 2020). The value zero denotes a fat surface. Convex slopes, on the other hand, drain more runoff water than concave slopes. The concave down regions have been the most vulnerable to groundwater recharge (Fig. 2b).
TRI
Riley et al. (1999) created TRI (Fig. 2c) by computing the discrepancy between the elevation values of a given cell in a DEM (Arabameri et al. 2021). Each of the numbers is squared to keep them all positive, and then, the squares are averaged. To obtain the TRI, the square root of this average is calculated. The TRI value in the study area ranges 0–27.
Aspect
Aspect is the direction in which a slope faces, and it impacts the physical properties of a slope such as lineament, and exposure to sunlight (Masroor et al. 2021). DEM was used to construct aspect data (Fig. 2d), which were divided into nine categories: north, east, south, west, northeast, northwest, southeast, southwest, and flat.
Slope
Slope is the magnitude of inclination of a surface in reference to a horizontal plane that affects water flow under the influence of gravity, thereby determining subsurface lateral transmissivity rate (Bhattacharya et al. 2021; Al-Abadi et al. 2021). It controls the quantity of water that collects in a certain area, and hence plays an essential role in groundwater recharge. Lower slopes and flat regions define the research area, which contribute to good groundwater recharge. The study area belongs to flat regions, therefore, has a high probability of groundwater (Fig. 2e).
TWI
The topographic wetness index (TWI) was first established by Beven and Kirkby (1979) as part of the runoff model TOPMODEL (Arulbalaji et al. 2019). This index is an indicator of availability of water in an area as a result of topographic effects on water accumulation (Mokarram et al. 2015). This index represents the amount of water contained in the region at each pixel scale (Saha et al. 2021) and is calculated using Eq. (1):
As and β denote, respectively, the same catchment area (m2m1) and slope (in degrees). High TWI values and GWP have a strong association in general (Shit et al. 2020). TWI values range from − 1.54 to 7.72 in the research region (See Fig. 2f.)
SPI
The slope and contributing area are used to determine SPI, which is a measure of the erosive strength of flowing water (Namous et al. 2021). The SPI is calculated using Eq. 2.
As indicates the catchment area, while \(\beta\) denotes the slope. The SPI in the study area ranges between 0 and > 3 (Fig. 3a).
STI
The sediment transport index (STI) represents the quantity of erosion and depositions that might affect infiltration and recharging (Pham et al. 2021). The channel's bed alters owing to silt deposition, limiting the channel's capacity to retain water and creating groundwater potentiality. The STI is calculated from the DEM using Eq. 3.
where each pixel of the slope of the upstream region is defined by As. The STI value in the study area varies between 0 and 140.64. (See Fig. 3b.)
Rainfall
Rainfall, collected from meteorological stations of Bangladesh, has been identified as a critical component in influencing the possibility for groundwater to be recharged (Arulbalaji et al. 2019). In part, an excessive amount of rain in a short period may cause a low groundwater potential (Fadhillah et al. 2021). The kriging interpolation method constructed a rainfall map in the ArcGIS software version 10.3 environment using recorded rainfall data from four Bangladesh meteorological stations. The data were imported into ArcGIS 10.3 and processed. Because of the tiny quantity of information available, this strategy is highly recommended (Zhu and Abdelkareem 2021). The yearly rainfall in the study region, on the other hand, varies from 361 to 550 mm each year (Das 2021; Das and Wahiduzzaman 2021) (Fig. 3c).
Soil types
Soil type affects the rainfall-runoff process (Tolche 2021). Soil qualities directly regulate water penetration, therefore affects rainfall-runoff production. If the degree of penetration seems to be high, groundwater incidents are more likely to happen. According to USDA soil classification, the research area comprises 12 different types of soil (Fig. 3d).
Land use/land cover
The influence of LULC on surface runoff and sediment flow has a substantial effect on the incidence of groundwater potentiality (Senapati and Das 2021). Usually, the LULC has a complete control over surface runoff production and penetration. The built-up regions prohibit water from accessing and creating surface water, and groundwater potentiality is quite low. The forest environment, on the other hand, favors water infiltration, resulting in lower groundwater potentiality (Elmahdy et al. 2020). The association between GWP and plant density is inverse when evaluating hydrological responses at different time scales (Senapati and Das 2021). We collected Landsat 8 OLI (path/row: 138/42) for LULC mapping. The ANN model was used to produce a LULC map in ENVI software (version 5.3). The LULC map was categorized into six classes: bare land, forest, sand bar, built-up, agricultural land, and water body (Fig. 3e).
Distance to the river
The majority of groundwater potentiality-inundation regions are often located around the river's edge. Because river distance effects groundwater potentiality and river flow to river aspect, it is an important factor for finding basin regions with high groundwater potential (Namous et al. 2021). The greater the distance between a place and a river, the less probable it is that the area has a big amount of groundwater capacity. The basin-scale storage of terrestrial water accounts for regional groundwater potentiality. In this investigation, we used a topographic map with a scale of 1: 50,000 and Google Earth to compute the distance to the river map (Fig. 3f).
Methods for information gain ratio and multicollinearity test
Before using ML models to measure GWP in this study, two preliminary tests have been executed, such as multicollinearity and feature selection. When two or more variables in an analysis have a linear correlation, multicollinearity arises. If there is multicollinearity, slight adjustments in the model or data might cause considerable variations in the multiple regression coefficient estimations. This circumstance may impair the precision of the generated models' predictions. The feature selection (FS) test is the other preliminary test, and it seeks to pick the appropriate characteristics for utilization in model creation. The FS minimizes the complexity of a model while also improving the predictors' effectiveness. It also allows for a deeper grasp of the underpinning mechanism that produced the data (Tien Bui et al. 2020). The information gain ratio (IGR) approach was employed in this investigation. IGR measures the information gain with regard to the class to determine the value of a feature. The IGR measures the value of a characteristic in relation to the class, with a larger information gain ratio indicating a stronger prediction power of the utilized models. Tien Bui et al. (2020) provide more information on this approach.
Method for groundwater potentiality modeling
RF
A random forest model is an ensemble machine learning approach that may build many decision trees to elucidate the spatial link between landslides. It operates by training many decision trees and then generating classes that represent the mode of classification or regression of individual trees (Breiman 2001). A decision tree is used to output the class in the classification process. The average of the findings is used to predict the dependent variable in the regression process. There are no preconceptions regarding the connection between explanatory factors and response variables in the random forest. This is an effective way to investigate hierarchical relationships and nonlinearities in big data. As a result, a random forest method may be used to anticipate new data cases more accurately.
Random subspace
Random subspace is a successful EML algorithm developed by Ho (1998) that uses a pseudo-randomly selected subset of characteristics to separate classifiers and combines their outputs via voting. RSS is a forest creation approach that uses an ensemble classifier to enhance the performance of individual classifiers that are underperforming (Kotsiantis 2011). The RS method includes selecting samples from the original training set at random to create a bootstrap sample, which would then be utilized to construct the decision tree (Kotsiantis 2011). A subset of features gets picked at random for each node of the decision tree, and the best split gets determined. Attributes, predictors, and independent variables are all included in an RS model. The correlation between estimators is reduced when randomly chosen features are used instead of the whole feature set. Ultimately, the tree is constructed to its full potential. As a result of leveraging random subspaces in both creation and aggregation, this strategy generates an effective hybrid model for minimizing over-fitting difficulties and managing datasets with a large number of repetitive variables. Ho (1998) has detailed information of the RS model.
Dagging
Ting and Witten (1997) pioneered the dagging technique. The dagging approach divides the training dataset into a number of disjoint, stratified folds and uses the given base learner to train each fold. Predictions are produced using a majority vote approach for classification issues and an averaging procedure for regression problems.
Bagging
The bagging (Bootstrap aggregating) EML algorithm is a fundamental group learning model to manufacture and aggregate (Quinlan 2006). It was offered as a way to reduce variation without raising bias error too much. Hong et al. (2019) found that bagging is a useful strategy for simulating a variety of environmental concerns. Bagging combines the bootstrap technique with the auxiliary approach to create several sets of samples, which are referred to as bootstrapped subsets. Each subgroup trains a base classifier on its own till the outputs get combined into a unified strong classifier via majority voting approach.
NBT
The machine learning classifier naive Bayes (NB) produces a probability-based model, which operates using the Bayes' theorem. The NB's structure is based on a decision tree (DT), and it arranges an NB model on each of the DT's leaf nodes (Jiang and Li 2011). The NBT performs well in terms of categorization and reliability (Arabameri et al. 2020).
The influence of a feature values on a given class throughout the NB process is independent of the value of another feature, which is referred to as class conditional independence. NB's conditional independence speeds up the training of datasets by treating all vectors as independent and using the Bayes rule.
Stacking
Stacking is an ensemble model in which the training data are utilized to generate a variety of algorithms. This method was developed by Wolpert (1992), and it works by computing the raw classifiers of the poor performance in relation to independent or bootstrapped reference data. Ensemble stacking is also known as blending since all of the statistics may be blended to create an estimation or classification. The stacking method increases the classifier's predictive power over the bagging and boosting procedures. Remote sensing, computer science, and finance are just a few of the fields where this ensemble method has shown potential. Table 3 shows the parameters that have been optimized.
Validation of the models
The ROC curve has been employed to evaluate the precision GWP models. The ROC is a relative factor that indicates the probability of a class employing the Boolean method. The vertical axis of the ROC curve shows the actual positive proportion, while the horizontal axis represents the false positive percentage. AUC stands for the area under the ROC curve, and value ranges from 0 to 1. The high values indicate good performance of the models. If the value is close to 1.0, the predicted model's accuracy will be very high.
Both nonparametric and parametric methodologies were used to determine the area under the ROC curve. In this study, we applied nonparametric and nonparametric approaches for validation.
Proposing fuzzy logic-ROC weighting-based hybrid EML models for GWP mapping
We combined the fuzzy logic model with previously utilized EMLs to increase the accuracy of GPMs. A variety of procedures were taken to achieve this. The EML algorithms (six models) were utilized as parameters to build GPMs using a fuzzy logic model. Zadeh (1965) was the first to propose the concept of fuzzy sets. It makes it possible to grasp non-discrete natural events mathematically. The following are the specifics of fuzzy logic-based hybrid models:
We combined different operators of FL model with already developed six EML models to enhance the precision of the GWP models. To do so, we followed several steps, such as first we considered six EML models as input of the FL model. Then, we applied linear fuzzy membership function to the six EML models as the value of EML models showed the monotonic trend of potentiality like low potential to high potential.
We did not use conditioning factors directly in this investigation; instead, we used six GPMs that had already been created using conditioning variables. The concept behind using six GPMs is that each GPM was created using distinct ensemble machine learning models and numerous parameters. As a result, the GPM result revealed the intricate functioning of algorithms, conditioning parameters, and existing inventories. After converting the input variables (six EML models) into fuzzy crisp layers, the subsequent process is the integration of the parameters.
For integrating several input variables, fuzzy operators have been used. Five operators, such as AND, OR, SUM, PRODUCT, and GAMMA, have been extensively used (Chung and Fabbri 2001). To obtain very high precision prediction, a suitable operator should be selected for integration. In the present study, we used all the operators for combining the input variables. Based on the initial screening, we excluded the final output of AND and PRODUCT operators. The following formulas have been used to integrate the input variables using fuzzy operators:
where \({f_{{\text{RF}}}}\), \({f_{{\text{RS}}}}\), \({f_{{\text{Bagging}}}}\), \({f_{{\text{Dagging}}}}\), \({f_{{\text{NBT}}}}\), \({f_{{\text{Stacking}}}}\) are fuzzy crisp layers of RF, RS, bagging, dagging, NBT, and stacking, respectively. Also \({R_i}\) represents the fuzzy membership function of the \(ith\) map, \(i = 1,2,...,n\).
For GAMMA operator, we used six coefficients, such as 0.7, 0.75, 0.8, 0.85, 0.9, and 0.95. After screening, we excluded the final output of 0.7, 0.75, 0.8, and 0.85 coefficient. The value of final output after integration ranges from 0 to 1, where close to 1 indicates the higher potentiality. Then, we applied natural break algorithm to classify the models into five classes, such as very low, low, moderate, high, and very high potentiality.
Sensitivity analysis
In the present study, we performed machine learning algorithm (RF)-based sensitivity analysis to compute the relevancy of the conditioning variables. The mean decrease in accuracy (MDA) and mean decrease in Gini (MDG) coefficient are two measures based on RF for evaluating the sensitivity power of the input variables. To determine a variable's MDA, their values are permuted arbitrarily for the OOB data, whereas the other variables' values remain unchanged. The variable's relevance is determined through evaluating the resultant misclassification rate to the rate obtained without arbitrarily permuting the variable's values. This process is carried out for each parameter. Using the Gini splitting criteria, a variable's MDG is calculated considering the number of trees in the forest as a normalization factor. (For details of RF, see method section.)
The methodology of this research is summarized in Fig. 4.
Results
Computation of the multicollinearity analysis and importance of the parameters
In multicollinearity diagnostics tests, the highest VIF is discovered in elevation (2.71) and rainfall (2.67), followed by STI, SPI, and slope. The lowest VIF has been found in the case of aspect, curvature, and TWI (Table 4). The results also show that the variables have no collinearity among themselves; therefore, we can use them for modeling GWP.
Table 4 also provides the results of the tenfold cross-validation method used to calculate each parameter's InGR. The InGR data indicated that the LULC (0.516), distance to river (0.124), and elevation (0.114) have the high InGR value that indicates the most influential parameters for modeling GWP. The TRI (0.031), SPI (0.027), and STI (0.015) have just a slight impact on the GWP models. The TWI (0.008) and curvature (0.011) are all statistically insignificant. It is worth noting that the aspect factor has a value of InGR = 0.007, suggesting that it has the least impact on groundwater potential zones prediction.
GWP modeling and their validation
We created GWP models in Fig. 5 utilizing six EMLs, including RF, RS, bagging, dagging, NBT, and stacking. We classified GWP models into five classes, as illustrated in Fig. 5, as follows: very high to very low. The possible GWP areas follow the drainage route of the watershed, running northwest–southeast. Zones with high GWP dominate the south and southeast part of the study area, whereas zones with low GWP comprise the north and northwest part of the study area.
According to the RF model, 2.26 percent and 36.69 percent area predicted as very high GWP and high GWP zones (Table 5). While the RS, bagging, dagging, NBT, and stacking models categorized roughly thirty percent of the entire basin area as having a high GWP zone. The NBT model revealed the lowest area for extremely low class, whereas RF, bagging, and RS covered the maximum area (Table 5). All the models identified the river catchment region as possessing many possibilities for groundwater storage. However, since the size of the area varies, it is crucial to describe the most appropriate model.
Using the obtained GPS coordinates, the AUC of ROC has been utilized to verify the GWP models (Meten et al. 2015; Nahayo et al. 2019). NBT (AUC: 0.892 and 0.928) seemed to be the best model for both ROC curves, preceded by stacking (AUC: 0.889 and 0.931), RS (AUC: 0.889 and 0.912), dagging (AUC: 0.87 and 0.882), RF (AUC: 0.882 and 0.936), and bagging (AUC: 0.861 and 0.87) (Fig. 6). However, according to the binormal ROC curve, RF was the finest model (bROC: 0.936), followed by stacking (bROC: 0.931), NBT (bROC: 0.928), RS (bROC: 0.912), dagging (bROC: 0.882), and bagging (bROC: 0.87) (Fig. 6).
Sensitivity analysis using machine learning algorithms
Advanced EML models showed the zonation of GWP areas for the present study area. Furthermore, none of these models include the influence of any variables to the prediction of GWP modeling. The problem emerges in developing and implementing management plans without a thorough grasp of the link between parameters and GWP models.
If the influence of the conditioning variables is not possible to compute, it would be very unclear how management strategies would be developed and implemented. Identifying factors linked with GWP models could sometimes help reduce the exploitation of groundwater resources and formulation of groundwater management plans. As a consequence, determining which elements have the most influence is crucial. For this, we used RF-based two error matrices, such as MDG and MDA to compute the influence of the variables to the GWP models (Hollister et al. 2016). The results showed that the distance to the river, TWI, aspect, STI, slope, elevation, and rainfall were the most relevant parameters for GWP modeling (Fig. 7). The least significant factors in defining the relative importance of the 12 variables included in EML models were soil kinds, LULC, and SPI, with soil types, LULC, and SPI being the least important.
Development of FL and ROC weighting-based hybrid models and their validation
The GWP models must be very resilient and precise before providing sustainable management approaches. As a result, we attempted to increase the robustness and accuracy of EML-based GWP models to provide highly effective sustainable management strategies in this work. Integrating fuzzy logic and a ROC-based weighting technique has enhanced the EML models even further. Before using fuzzy logic, the ROC-based weighting technique was used to weight the EML-based GWP layers in this work. The rationale for using a ROC-based weighting method rather than an expert-based approach, AHP, or weighted linear combination is because the ROC measures how similar EML-based GWP models are to the ground truth or reality. Therefore, the value of ROC curve shows that the high value reflects the prediction of the models is quite similar with ground conditions. As a result, the model with the highest AUC value can be very appropriate and given a higher weight than other models. Hence, we utilized the AUC values of the ROC curve as the weighted value in this investigation. For the weighting technique, we used the AUC value of the binormal ROC curve. In this study, the RF model received the highest score of all the models, since it had higher AUC values than the other models. RF, stacking, NBT, RS, dagging, and bagging are the layers in the hierarchical sequence for allocating weights based on AUC values.
The fuzzy logic model was deployed after the EML-based GWP models were transformed into weighted layers. Before applying the weighted method, the models were normalized because the EML algorithms predicted GWP as 0–1 values. The data patterns of the layers reflect a similar tendency, such as monotonous growth, which shows a constant growing or declining trend. The data pattern in this investigation revealed a constant GW decreasing tendency. As a result, we used a linear fuzzy membership function to normalize all the GWP layers. After applying the fuzzy membership function, the fuzzy crisp layers of six EML-based GWP models are shown in Fig. 8a–f.
After converting the crisp fuzzy layers, we integrated all fuzzified layers using different fuzzy operators, such as AND, OR, SUM, PRODUCT, GAMMA 0.7, GAMMA 0.75, GAMMA 0.8, and GAMMA 0.9. Then, we inspected the generated output through visualization. Subsequently, we excluded the output generated from SUM, PRODUCT, GAMMA 0.7, and GAMMA 0.75, as these outputs seem not good enough. We considered the output from AND, OR, GAMMA 0.8, and GAMMA 0.9 as excellent results based on our inspection. After that, we classified the output into five classes as we did for previous EML models. Then, we validated the models using eROC and bROC curves (Fig. 9a–d). The area coverage for various GPW categories was calculated. According to all models, 1045–1200 km2 of the area were classified as very high GPW zones, whereas 780–895km2 of the area was projected as very-low GPW zones (Fig. 9a–d).
Based on the AUC of ROC curve, GAMMA 0.9 appeared as best model (eAUC: 0.903 and bAUC: 0.932), followed by GAMMA 0.8 (eAUC: 0.902 and bAUC: 0.949), AND (eAUC: 0.899 and bAUC: 0.948), and OR (eAUC: 0.866 and bAUC: 0.919) models (see Fig. 9a–d). However, according to the bROC curve, GAMMA 0.8 was shown to be the superior model for prediction of natural hazards (bAUC: 0.949), followed by AND (bAUC: 0.948), GAMMA 0.9 (bAUC: 0.932), and bagging (bAUC: 0.919) (Fig. 10). All models are highly accurate and robust than the EML-based models. Therefore, it can be stated that after integrating ROC-based weighting approach and fuzzy logic, the efficiency of the GPW models is increased further.
Discussion
Delineation of GWP or other natural hazards using ML and EML algorithms is highly timely work since future circumstances should be known to professionals and governments to promote sustainable development. Decision-makers can suggest management plans based on this information. No model, however, is ideal for predicting GWP and natural hazards using ML and EML algorithms. As a result, researchers are constantly attempting to create and use new models for predicting occurrences through the complicated nonlinear process. Therefore, in the present study, we proposed six ROC weighting integrated ensemble machine learning models, such as RF, RS, bagging, dagging, and stacking, which had never been used before, were tested and coupled with fuzzy logic operators (AND, OR, GAMMA 0.8, and GAMMA 0.9), a widely employed advanced model, in the current study. The criteria that are beneficial for groundwater occurrence were initially detected for groundwater resource identification. The precision of the outputs entirely relies on the model's predictive capacity and the input data's quality. Therefore, the impact of these variables was evaluated (Table 1), and we eliminated the variables having less impact from modeling (Table 4). These less important variables could affect the prediction procedure (Maskooni et al. 2020; Muavhi et al. 2021).
Also, we applied machine learning technique like random forest for computing the importance of the GWP conditioning variables to the GWP models (Fig. 7). The results showed that the distance to the river and the TWI have the largest impact since water penetration is stronger near the river and in the higher TWI zone, resulting in larger GWP ability. Our work is quite identical to the findings of Pham et al. (2021) and Pal et al. (2020a). Rainfall ranks third in the MDA and seventh in the MDG (Fig. 7) because it has a moderate influence on the groundwater potentiality model. The relevance of the factors in potential groundwater mapping, on the other hand, is heavily determined by the study region's features and the research method used.
There have been several statistical, and ML models applied in GWP modeling, and many of these models have yielded excellent prediction results, as shown in the literature review (Mallick et al. 2021d). In recent years, hybrid models, on the other hand, have become more popular. For groundwater-related studies, the effectiveness of hybrid approaches could be helpful to researchers in the future (Farzin et al. 2021). Because of this, we proposed ROC weighting-based ensemble machine learning algorithms (RF, RS, bagging and dagging, NBT, and stacking) for groundwater potentiality modeling in the present study. We combined these algorithms with different operators of fuzzy logic (AND, OR, GAMMA 0.8, and GAMMA 0.9). In this way, we built hybrid models for GWP modeling.
The RS, bagging, and dagging models, together with the NBT and stacking models, categorized approximate thirty percent area of the total study area as considered high GWP zone. The NBT model predicted that the very low class would have the lowest coverage area, while the RF, bagging, and RS classes would have the maximum coverage area (Table 4). The river catchment region, in general, was identified by all the models as having a significant impact on GWP model. Furthermore, the six advanced EMLs were validated using the eROC and bROC curves and showed NBT model (eROC: 0.892; bROC: 0.928) appeared as best model, followed by stacking (eROC: 0.889; bROC: 0.931), RS (eROC: 0.889; bROC: 0.912), dagging (eROC: 0.87; bROC: 0.882), RF (eROC: 0.882; bROC: 0.936), and bagging (eROC: 0.861; bROC: 0.87). These six models performed better, with AUC values greater than 0.8. As a result, it is reasonable to conclude that NBT outperformed other models because it is a fast decision algorithm ensemble with naïve Bayes that has been successfully applied to achieving trustworthy findings for forecasting natural disasters and other environmental factors (Pham et al. 2021; Phong et al. 2021).
Finally, to the best of the authors' knowledge, the fuzzy logic-ROC weighted integrated hybrid EML models were proposed for the first time. The outputs were found to be very high rather than standalone ML and EML, such as AND-hybrid (eROC: 0.899; bROC: 0.948), OR-hybrid (eROC: 0.866; bROC: 0.919), GAMMA 0.8 (eROC: 0.902; bROC: 0.949), and GAMMA 0.9 (eROC: 0.903; bROC: 0.932) can improve the accuracy and robustness of advanced machine learning models.
We concluded that hybrid EML models outperformed other EML models and ML models for GWP modeling based on the above discussion and results. Therefore, the present study recommends using hybrid EML models to predict natural hazards and other natural resource predictions in different regions. These models would yield high precision prediction results.
Conclusion
Specifically, the present work is concerned with creating fuzzy logic, and EML integrated hybrid models to predict groundwater potentiality models. We summarized the main findings below:
-
(i)
Using six EML models and four fuzzy-based hybrid models, researchers determined that the extremely high groundwater potential zone encompasses an area ranging from 830 to 21200km2.
-
(ii)
The NBT model performed as superior for GWP modeling (eROC = 0.892; bROC: 0.928). It was followed by stacking, RS, dagging, RF, and bagging. However, the suggested FL-based hybrid models, such as GAMMA 0.9 (eROC − 0.903; bROC: 0.932), outperformed all other models in terms of AUC. The best models, according to binormal ROC, would be GAMMA 0.8 (bROC: 0.949), followed by AND (bROC: 0.948), GAMMA 0.9 (bROC: 0.932), and OR (bROC: 0.919), respectively. All four models outperformed the six EML models by a significant margin.
-
(iii)
We performed machine learning algorithm like random forest for sensitivity analysis to compute the influence of the parameters for GWP modeling. The results showed that the distance to the river, elevation, and slope are mostly sensitive parameters for GWP.
Among GWP models, hybrid models beat EML-based models in accuracy and sensitivity. These findings encourage the researchers to adopt the hybrid EML-based models for integrating multi-parameters for any predictive model. In addition, we recommend using more numbers of conditioning variables for generating the high precision predictive models. Also, the application and integration of hybrid models with deep learning algorithms may produce very high precision findings. The present study also recommends proper management of the conditioning variables, reducing groundwater exploitation, and increasing groundwater recharge. Consequently, the maintenance of forest cover will help in the recharging of groundwater. For a scientific evaluation of groundwater in different potential zones, more research is required to provide more accurate advice on how much water may be taken from each prospective zone.
Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request”.
References
Abdulkadir TS, Muhammad RUM, Wan Yusof K et al (2019) Quantitative analysis of soil erosion causative factors for susceptibility assessment in a complex watershed. Cogent Eng. https://doi.org/10.1080/23311916.2019.1594506
Abd Manap M, Nampak H, Pradhan B, Lee S, Sulaiman WNA, Ramli MF (2014) Application of probabilistic-based frequency ratio model in groundwater potential mapping using remote sensing data and GIS. Arab J Geosci 7(2):711–724
Ajibade FO, Olajire OO, Ajibade TF et al (2021) Groundwater potential assessment as a preliminary step to solving water scarcity challenges in Ekpoma, Edo State, Nigeria. Acta Geophys 69:1367–1381. https://doi.org/10.1007/s11600-021-00611-8
Akter S, Howladar MF, Ahmed Z, Chowdhury TR (2019) The rainfall and discharge trends of Surma River area in North-eastern part of Bangladesh: an approach for understanding the impacts of climatic change. Environ Syst Res 8:1–12. https://doi.org/10.1186/s40068-019-0156-y
Al-Abadi AM, Shahid S (2015) A comparison between index of entropy and catastrophe theory methods for mapping groundwater potential in an arid region. Environ Monit Assess. https://doi.org/10.1007/s10661-015-4801-2
Al-Abadi AM, Fryar AE, Rasheed AA, Pradhan B (2021) Assessment of groundwater potential in terms of the availability and quality of the resource: a case study from Iraq. Environ Earth Sci 80:1–22. https://doi.org/10.1007/S12665-021-09725-0
Al-Djazouli MO, Elmorabiti K, Rahimi A et al (2021) Delineating of groundwater potential zones based on remote sensing, GIS and analytical hierarchical process: a case of Waddai, eastern Chad. GeoJournal 86:1881–1894. https://doi.org/10.1007/s10708-020-10160-0
Arabameri A, Lee S, Tiefenbacher JP, Ngo PTT (2020) Novel ensemble of MCDM-artificial intelligence techniques for groundwater-potential mapping in arid and semi-arid regions (Iran). Remote Sens 12:490. https://doi.org/10.3390/rs12030490
Arabameri A, Pal SC, Rezaie F et al (2021) Modeling groundwater potential using novel GIS-based machine-learning ensemble techniques. J Hydrol Reg Stud 36:100848. https://doi.org/10.1016/j.ejrh.2021.100848
Arshad A, Zhang Z, Zhang W, Dilawar A (2020) Mapping favorable groundwater potential recharge zones using a GIS-based analytical hierarchical process and probability frequency ratio model: a case study from an agro-urban region of Pakistan. Geosci Front 11:1805–1819. https://doi.org/10.1016/j.gsf.2019.12.013
Arulbalaji P, Padmalal D, Sreelash K (2019) GIS and AHP techniques based delineation of groundwater potential zones: a case study from Southern Western Ghats. India Sci Rep. https://doi.org/10.1038/s41598-019-38567-x
Aykut T (2021) Determination of groundwater potential zones using Geographical Information Systems (GIS) and Analytic Hierarchy Process (AHP) between Edirne-Kalkansogut (northwestern Turkey). Groundw Sustain Dev 12:100545. https://doi.org/10.1016/j.gsd.2021.100545
BBS (Bangladesh Bureau of Statistics) (2016) Bangladesh Disaster-related Statistics 2015: Climate Change and Natural Disaster Perspectives. Statistics and Informatics Division, Ministry of Planning, Government of the People’s Republic of Bangladesh, pp 165–171
Beven KJ, Kirkby MJ (1979) A physically based, variable contributing area model of basin hydrology. Hydrol Sci J 24(1):43–69
Benjmel K, Amraoui F, Boutaleb S, Ouchchen M, Tahiri A, Touab A (2020) Mapping of groundwater potential zones in crystalline terrain using remote sensing, GIS techniques, and multicriteria data analysis (Case of the Ighrem Region, Western Anti-Atlas, Morocco). Water 12(2):471
Bhattacharya S, Das S, Das S et al (2021) An integrated approach for mapping groundwater potential applying geospatial and MIF techniques in the semiarid region. Environ Dev Sustain 23:495–510. https://doi.org/10.1007/s10668-020-00593-5
Breiman L (2001) Random forests. Mach Learn 45:5–32. https://doi.org/10.1023/A:1010933404324
Chen W, Li H, Hou E et al (2018) GIS-based groundwater potential analysis using novel ensemble weights-of-evidence with logistic regression and functional tree models. Sci Total Environ 634:853–867. https://doi.org/10.1016/j.scitotenv.2018.04.055
Chen W, Panahi M, Khosravi K, Pourghasemi HR, Rezaie F, Parvinnezhad D (2019a) Spatial prediction of groundwater potentiality using ANFIS ensembled with teaching-learning-based and biogeography-based optimization. J Hydrol 572:435–448
Chen W, Pradhan B, Li S, Shahabi H, Rizeei HM, Hou E, Wang S (2019b) Novel hybrid integration approach of bagging-based fisher’s linear discriminant function for groundwater potential analysis. Nat Resourc Res 28(4):1239–1258
Chen W, Li Y, Tsangaratos P et al (2020) Groundwater spring potential mapping using artificial intelligence approach based on kernel logistic regression, random forest, and alternating decision tree models. Appl Sci 10:425. https://doi.org/10.3390/app10020425
Choubin B, Rahmati O, Soleimani F, et al (2019) Regional groundwater potential analysis using classification and regression trees. In: Spatial modeling in GIS and R for earth and environmental sciences. pp 485–498
Chung CF, Fabbri AG (2001) Prediction models for landslide hazard using a Fuzzy set approach. Marchetti M, Rivas V, Ed Geomorphol Environ impact Assess, pp. 31–47
Costache R, Tien Bui D (2020) Identification of areas prone to flash-flood phenomena using multiple-criteria decision-making, bivariate statistics, machine learning and their ensembles. Sci Total Environ 712:136492. https://doi.org/10.1016/j.scitotenv.2019.136492
Das S (2021) Extreme rainfall estimation at ungauged locations: information that needs to be included in low-lying monsoon climate regions like Bangladesh. J Hydrol 601:126616
Das S, Wahiduzzaman M (2021) Identifying meaningful covariates that can improve the interpolation of monsoon rainfall in a low-lying tropical region. Int J Climatol. https://doi.org/10.1002/joc.7316
Das N, Sutradhar S, Ghosh R, Mondal P (2021) applicability of geospatial technology, weight of evidence, and multilayer perceptron methods for groundwater management: a geoscientific study on Birbhum District, West Bengal, India. In: Groundwater and Society. pp 473–499
Dau QV, Kuntiyawichai K, Adeloye AJ (2021) Future changes in water availability due to climate change projections for Huong Basin. Vietnam Environ Process 8:77–98. https://doi.org/10.1007/s40710-020-00475-y
El Bilali A, Taleb A, Brouziyne Y (2021) Comparing four machine learning model performances in forecasting the alluvial aquifer level in a semi-arid region. J African Earth Sci 181:104244. https://doi.org/10.1016/j.jafrearsci.2021.104244
Elmahdy S, Mohamed M, Ali T (2020) Land use/land cover changes impact on groundwater level and quality in the northern part of the United Arab Emirates. Remote Sens. https://doi.org/10.3390/rs12111715
Fadhillah MF, Lee S, Lee CW, Park YC (2021) Application of support vector regression and metaheuristic optimization algorithms for groundwater potential mapping in gangneung-si. South Korea Remote Sens 13:1196. https://doi.org/10.3390/rs13061196
Falkenmark M, Lindh G, Tanner RG et al (2019) Water for a starving world. Taylor and Francis
Farzin M, Avand M, Ahmadzadeh H et al (2021) Assessment of Ensemble models for groundwater potential modeling and prediction in a Karst Watershed. Water 13:2540. https://doi.org/10.3390/W13182540
Forkuor G, Hounkpatin OKL, Welp G, Thiel M (2017) High resolution mapping of soil properties using remote sensing variables in south-western Burkina Faso: a comparison of machine learning and multiple linear regression models. PLoS ONE. https://doi.org/10.1371/journal.pone.0170478
Gayen A, Pourghasemi HR (2019) Spatial modeling of gully erosion: a new ensemble of CART and GLM data-mining algorithms. In: Spatial modeling in GIS and R for earth and environmental sciences. Elsevier, pp 653–669
Ginesta Torcivia CE, Ríos López NN (2020) Preliminary morphometric analysis: Río Talacasto Basin, Central Precordillera of San Juan, Argentina. pp 158–168
Golkarian A, Naghibi SA, Kalantar B, Pradhan B (2018) Groundwater potential mapping using C5. 0, random forest, and multivariate adaptive regression spline models in GIS. Environ monit Assess 190(3):1–16
Guru B, Seshan K, Bera S (2017) Frequency ratio model for groundwater potential mapping and its sustainable management in cold desert. India. J King Saud Univ-Sci 29(3):333–347
Ha DH, Nguyen PT, Costache R, Al-Ansari N, Van Phong T, Nguyen HD, Amiri M, Sharma R, Prakash I, Van Le H, Nguyen HBT (2021) Quadratic discriminant analysis based ensemble machinelearning models for groundwater potential modeling and mapping. Water Res Manag 35(13):4415–4433
Hembram TK, Paul GC, Saha S (2019) Comparative analysis between morphometry and geo-environmental factor based soil erosion risk assessment using weight of evidence model: a study on Jainti River Basin, Eastern India. Environ Process 6:883–913. https://doi.org/10.1007/s40710-019-00388-5
Ho TK (1998) The random subspace method for constructing decision forests. IEEE Trans Pattern Anal Mach Intell 20:832–844. https://doi.org/10.1109/34.709601
Hollister JW, Milstead WB, Kreakie BJ (2016) Modeling lake trophic state: a random forest approach. Ecosphere 7(3):e01321
Hong H, Liu J, Zhu AX (2019) Landslide susceptibility evaluating using artificial intelligence method in the Youfang district (China). Environ Earth Sci. https://doi.org/10.1007/s12665-019-8415-9
Islam ARMT, Talukdar S, Mahato S et al (2021) Flood susceptibility modelling using advanced ensemble machine learning models. Geosci Front. https://doi.org/10.1016/j.gsf.2020.09.006
Islam S, Reza A, Islam T, et al (2014) Geomorphology and land use mapping of Northern Part of Rangpur District, Bangladesh. J Geosci Geomatics 2:145–150. https://doi.org/10.12691/jgg-2-4-2
Jiang L, Li C (2011) Scaling up the accuracy of decision-tree classifiers: a naive-bayes combination. J Comput 6:1325–1331. https://doi.org/10.4304/jcp.6.7.1325-1331
Khan M, Ul H, Shakeel M, Ahsan N et al (2021) Groundwater contamination and health risk posed by industrial effluent in NCR region. Mater Today Proc. https://doi.org/10.1016/j.matpr.2021.02.192
Kotsiantis S (2011) Combining bagging, boosting, rotation forest and random subspace methods. Artif Intell Rev 35:223–240. https://doi.org/10.1007/s10462-010-9192-8
Kumar M, Goswami R, Patel AK, Srivastava M, Das N (2020) Scenario, perspectives and mechanism of arsenic and fluoride co-occurrence in the groundwater: a review. Chemosphere 249:126126
Mahato S, Pal S, Talukdar S et al (2021) Field based index of flood vulnerability (IFV): a new validation technique for flood susceptible models. Geosci Front 12:101175. https://doi.org/10.1016/j.gsf.2021.101175
Malik A, Bhagwat A (2021) Modelling groundwater level fluctuations in urban areas using artificial neural network. Groundw Sustain Dev. https://doi.org/10.1016/j.gsd.2020.100484
Mallick J, Talukdar S, Alsubih M et al (2021a) Proposing receiver operating characteristic-based sensitivity analysis with introducing swarm optimized ensemble learning algorithms for groundwater potentiality modelling in Asir region. Saudi Arabia Geocarto Int. https://doi.org/10.1080/10106049.2021.1878291
Mallick J, Talukdar S, Alsubih M et al (2021b) Integration of statistical models and ensemble machine learning algorithms (MLAs) for developing the novel hybrid groundwater potentiality models: a case study of semi-arid watershed in Saudi Arabia. Geocarto Int. https://doi.org/10.1080/10106049.2021.1939439
Mallick J, Talukdar S, Ben KN et al (2021c) A novel hybrid model for developing groundwater potentiality model using high resolution Digital Elevation Model (DEM) derived factors. Water 13:2632. https://doi.org/10.3390/w13192632
Mallick J, Talukdar S, Pal S, Rahman A (2021d) A novel classifier for improving wetland mapping by integrating image fusion techniques and ensemble machine learning classifiers. Ecol Info 65:101426
Mandal P, Saha J, Bhattacharya S, Paul S (2021) Delineation of groundwater potential zones using the integration of geospatial and MIF techniques: A case study on Rarh region of West Bengal. India. Environ Challenges 5:100396
Maskooni EK, Naghibi SA, Hashemi H, Berndtsson R (2020) Application of advanced machine learning algorithms to assess groundwater potential using remote sensing-derived data. Remote 12:2742. https://doi.org/10.3390/RS12172742
Masroor M, Rehman S, Sajjad H et al (2021) Assessing the impact of drought conditions on groundwater potential in Godavari Middle Sub-Basin, India using analytical hierarchy process and random forest machine learning algorithm. Groundw Sustain Dev 13:100554. https://doi.org/10.1016/j.gsd.2021.100554
Meten M, PrakashBhandary N, Yatabe R (2015) Effect of landslide factor combinations on the prediction accuracy of landslide susceptibility maps in the Blue Nile Gorge of Central Ethiopia. Geoenviron Disasters 2(1):1–17
Mogaji KA, Omosuyi GO, Adelusi AO, Lim HS (2016) Application of GIS-based evidential belief function model to regional groundwater recharge potential zones mapping in Hardrock geologic terrain. Environ Process 3:93–123. https://doi.org/10.1007/s40710-016-0126-6
Mokarram M, Roshan G, Negahban S (2015) Landform classification using topography position index (case study: salt dome of Korsia-Darab plain, Iran). Model Earth Syst Environ 1:1–7. https://doi.org/10.1007/s40808-015-0055-9
Muavhi N, Thamaga KH, Mutoti MI (2021) Mapping groundwater potential zones using relative frequency ratio, analytic hierarchy process and their hybrid models: case of Nzhelele-Makhado area in South Africa. Geocarto Int. https://doi.org/10.1080/10106049.2021.1936212
Murmu P, Kumar M, Lal D, Sonker I, Singh SK (2019) Delineation of groundwater potential zones using geospatial techniques and analytical hierarchy process in Dumka district, Jharkhand. India. Groundwater Sustainable Develop 9:100239
Naghibi SA, Ahmadi K, Daneshi A (2017) Application of support vector machine, random forest, and genetic algorithm optimized random forest models in groundwater potential mapping. Water Res Manag 31(9):2761–2775
Nahayo L, Kalisa E, Maniragaba A, Nshimiyimana FX (2019) Comparison of analytical hierarchy process and certain factor models in landslide susceptibility mapping in Rwanda. Model Earth Syst Environ 5(3):885–895
Namous M, Hssaisoune M, Pradhan B et al (2021) (2021) Spatial prediction of groundwater potentiality in large semi-arid and karstic mountainous region using machine learning models. Water 13:2273. https://doi.org/10.3390/W13162273
Nguyen PT, Ha DH, Avand M et al (2020) Soft computing ensemble models based on logistic regression for groundwater potential mapping. Appl Sci 10:2469. https://doi.org/10.3390/app10072469
Nwankwo CB, Hoque MA, Islam MA, Dewan A (2020) Groundwater constituents and trace elements in the basement aquifers of Africa and sedimentary aquifers of Asia: medical hydrogeology of drinking water minerals and toxicants. Earth Syst Environ 4:369–384. https://doi.org/10.1007/s41748-020-00151-z
Nzama SM, Kanyerere TOB, Mapoma HWT (2021) Using groundwater quality index and concentration duration curves for classification and protection of groundwater resources: relevance of groundwater quality of reserve determination. South Africa Sustain Water Resour Manag. https://doi.org/10.1007/s40899-021-00503-1
Pal S, Kundu S, Mahato S (2020a) Groundwater potential zones for sustainable management plans in a river basin of India and Bangladesh. J Clean Prod 257:120311. https://doi.org/10.1016/j.jclepro.2020.120311
Panahi M, Sadhasivam N, Pourghasemi HR, Rezaie F, Lee S (2020) Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). J Hydrol 588:125033
Pande CB, Moharir KN, Singh SK, Dzwairo B (2020) Groundwater evaluation for drinking purposes using statistical index: study of Akola and Buldhana districts of Maharashtra, India. Environ Dev Sustain 22:7453–7471. https://doi.org/10.1007/s10668-019-00531-0
Pathak D, Maharjan R, Maharjan N et al (2021) Evaluation of parameter sensitivity for groundwater potential mapping in the mountainous region of Nepal Himalaya. Groundw Sustain Dev 13:100562. https://doi.org/10.1016/j.gsd.2021.100562
Pham BT, Jaafari A, Van PT et al (2021) Naïve Bayes ensemble models for groundwater potential mapping. Ecol Inform 64:101389. https://doi.org/10.1016/j.ecoinf.2021.101389
Phong TV, Pham BT, Trinh PT, Ly HB, Vu QH, Ho LS, Le HV, Phong LH, Avand M, Prakash I (2021) Groundwater Potential Mapping Using GIS-Based Hybrid Artificial Intelligence Methods. Groundwater 59(5):745–760
Portoghese I, Giannoccaro G, Giordano R, Pagano A (2021) Modeling the impacts of volumetric water pricing in irrigation districts with conjunctive use of surface and groundwater resources. Agric Water Manag. https://doi.org/10.1016/j.agwat.2020.106561
Rane NL, Jayaraj GK (2021) Comparison of multi-influence factor, weight of evidence and frequency ratio techniques to evaluate groundwater potential zones of basaltic aquifer systems. Environment, Development and Sustainability, pp 1–30
Razandi Y, Pourghasemi HR, Neisani NS, Rahmati O (2015) Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Sci Informatics 8(4):867–883
Tahmassebipoor N, Rahmati O, Noormohamadi F, Lee S (2016) Spatial analysis of groundwater potential using weights-of-evidence and evidential belief function models and remote sensing. Arab J Geosci 9(1):1–18
Quinlan JR (2006) Bagging, boosting, and C4. 5. Univ Sydney Sydney, Aust 1:725–730
Rahmati O, Pourghasemi HR, Melesse AM (2016) Application of GIS-based data driven random forest and maximum entropy models for groundwater potential mapping: a case study at Mehran Region. Iran CATENA 137:360–372. https://doi.org/10.1016/J.CATENA.2015.10.010
Rizeei HM, Pradhan B, Saharkhiz MA, Lee S (2019) Groundwater aquifer potential modeling using an ensemble multi-adoptive boosting logistic regression technique. J Hydrol 579:124172
Saha TK, Pal S, Talukdar S et al (2021) How far spatial resolution affects the ensemble machine learning based flood susceptibility prediction in data sparse region. J Environ Manage. https://doi.org/10.1016/j.jenvman.2021.113344
Salem GSA, Kazama S, Shahid S, Dey NC (2017) Impact of temperature changes on groundwater levels and irrigation costs in a groundwater-dependent agricultural region in Northwest Bangladesh. Hydrol Res Lett 11(1):85–91
Shit PK, Bhunia GS, Pourghasemi HR (2020) Gully erosion susceptibility mapping based on bayesian weight of evidence. In: Gully Erosion Studies from India and Surrounding Regions. Springer, Cham, 133–146
Senapati U, Das TK (2021) Assessment of basin-scale groundwater potentiality mapping in drought-prone upper Dwarakeshwar River basin, West Bengal, India, using GIS-based AHP techniques. Arab J Geosci 14:1–22. https://doi.org/10.1007/s12517-021-07316-8
Shahinuzzaman M, Haque MN, Shahid S (2021) Delineation of groundwater potential zones using a parsimonious concept based on catastrophe theory and analytical hierarchy process. Hydrogeol J 29(3):1091–1116
Talukdar S, Pal S (2019) Effects of damming on the hydrological regime of Punarbhaba river basin wetlands. Ecol Eng 135:61–74. https://doi.org/10.1016/j.ecoleng.2019.05.014
Talukdar S, Eibek KU, Akhter S et al (2021a) Modeling fragmentation probability of land-use and land-cover using the bagging, random forest and random subspace in the Teesta River Basin. Bangladesh Ecol Indic 126:107612. https://doi.org/10.1016/j.ecolind.2021.107612
Talukdar S, Pal S, Singha P (2021b) Proposing artificial intelligence based livelihood vulnerability index in river islands. J Clean Prod 284:124707. https://doi.org/10.1016/j.jclepro.2020.124707
Talukdar S, Ghose B, Shahfahad, et al (2020) Flood susceptibility modeling in Teesta River basin, Bangladesh using novel ensembles of bagging algorithms. Stoch Environ Res Risk Assess 34:2277–2300. https://doi.org/10.1007/s00477-020-01862-5
Termeh SVR, Khosravi K, Sartaj M et al (2019) Optimization of an adaptive neuro-fuzzy inference system for groundwater potential mapping. Hydrogeol J 27:2511–2534. https://doi.org/10.1007/s10040-019-02017-9
Tien Bui D, Hoang ND, Martínez-Álvarez F et al (2020) A novel deep learning neural network approach for predicting flash flood susceptibility: a case study at a high frequency tropical storm area. Sci Total Environ. https://doi.org/10.1016/j.scitotenv.2019.134413
Ting KM, Witten IH (1997) Stacking bagged and dagged models
Tolche AD (2021) Groundwater potential mapping using geospatial techniques: a case study of Dhungeta-Ramis sub-basin, Ethiopia. Geol Ecol Landscapes 5:65–80. https://doi.org/10.1080/24749508.2020.1728882
Wolpert DH (1992) Stacked generalization. Neural Net 5(2):241–259
Yen HPH, Pham BT, Van Phong T, Ha DH, Costache R, Van Le H, Nguyen HD, Amiri M, Van Tao N, Prakash I (2021) Locally weighted learning based hybrid intelligence models for groundwater potential mapping and modeling: A case study at Gia Lai province. Vietnam. Geosci Front 12(5):101154
Zadeh LA (1965) Electrical engineering at the crossroads. IEEE Trans Educ 8(2):30–33
Zaree M, Javadi S, Neshat A (2019) Potential detection of water resources in karst formations using APLIS model and modification with AHP and TOPSIS. J Earth Syst Sci 128(4):1–12
Zhao X, Chen W (2020) GIS-based evaluation of landslide susceptibility models using certainty factors and functional trees-based ensemble techniques. Appl Sci. https://doi.org/10.3390/app10010016
Zhu Q, Abdelkareem M (2021) Mapping groundwater potential zones using a knowledge-driven approach and GIS analysis. Water (switzerland) 13:579. https://doi.org/10.3390/w13050579
Acknowledgements
The authors extend their appreciation to the Deanship of Scientific Research at King Khalid University for funding this work through Research Group under grant number (GRP2 /169/43).
Funding
Funding for this research was given by the Deanship of Scientific Research; King Khalid University, Ministry of Education, Kingdom of Saudi Arabia.
Author information
Authors and Affiliations
Contributions
Conceptualization: Swapan Talukdar, Javed Mallick; Data curation, Swapan Talukdar, Abu Reza Md Towfiqul Islam; Formal analysis, Javed Mallick, Swapan Talukdar, Atiqur Rahman; Methodology, Javed Mallick and Swapan Talukdar; Project administration, Javed Mallick; Resources, Swapan Talukdar; Software, Swapan Talukdar; Supervision, Atiqur Rahman; Validation: Swapan Talukdar and Javed Mallick; Writing—original draft, Swapan Talukdar and Showmitra Kumar Sarkar; Writing—review & editing, Javed Mallick, Sujit Kumar Roy, Swapan Talukdar, Bushra Praveen, Mohd Waseem Naikoo, Mohoua Sobnam.
Corresponding authors
Ethics declarations
Conflict of interest
No potential conflict of interest was reported by the authors”.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Talukdar, S., Mallick, J., Sarkar, S.K. et al. Novel hybrid models to enhance the efficiency of groundwater potentiality model. Appl Water Sci 12, 62 (2022). https://doi.org/10.1007/s13201-022-01571-0
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s13201-022-01571-0