Introduction

Iran is located in an arid and semi-arid region and is exposed to water scarcity (Chezgi et al. 2016). Water scarcity is defined as the lack of sufficient accessible water resources to meet the water demand of an area. It can result from excessive population growth, climate change, and inappropriate water resource management (Asia and Richman 1991). Better water resource management plans can change how people affect water resources and thereby mitigate the impact of water scarcity on their lives. One important aspect of water resource management is groundwater (GW) resource management. In general, different hydrogeological mechanisms control the location and discharge of springs in an area. Modeling GW potential with different techniques can provide a deeper understanding of GW resources and lead to better GW management strategies.

Recently, several approaches and techniques have been employed in different countries for generating groundwater potential maps (GPMs). For example, the frequency ratio has been used in some works to generate GPMs (Manap et al. 2014; Naghibi et al. 2015; Mousavi et al. 2017). Weights-of-evidence is another method, which has been employed for this purpose by Ozdemir (2011a). Similarly, Ozdemir (2011b) and Pourtaghi and Pourghasemi (2014) implemented logistic regression for producing GPMs in two basins in Turkey and Iran, respectively. The evidential belief function (EBF) is another bivariate approach that has been implemented for producing GPMs (Naghibi and Pourghasemi 2015; Rahmati and Melesse 2016; Tahmassebipoor et al. 2016; Ghorbani Nejad et al. 2016; Kordestani et al. 2018). More recently, a number of machine learning models, including tree-based algorithms, i.e., boosted regression trees (BRT), classification and regression tree (CART), and random forest (RF); discriminant analysis models; and neural networks and support vector machines, have been employed for obtaining GPMs in several areas with different environmental features (Naghibi and Pourghasemi 2015; Naghibi and Moradi Dashtpagerdi 2016; Hong et al. 2016a, b; Rahmati et al. 2016; Zabihi et al. 2016; Naghibi et al. 2017b, 2018a, b; Hong et al. 2018; Golkarian et al. 2018; Rahmati et al. 2018). These machine learning models have been shown to produce reliable outputs. Rotation forest (RTF) is another tree-based algorithm, which has been implemented in other applications such as a breast cancer recognition study (Aličković and Subasi 2017), an image classification study (Xia et al. 2017), a gene expression data classification study (Lu et al. 2017), and landslide studies (Pham et al. 2017, 2018a; Hong et al. 2018). All those researchers confirmed the acceptable efficacy of the RTF in classification problems. In addition, some other researchers have implemented data mining models in landslide, flood, and groundwater studies (Chen et al. 2017a, b; Pham et al. 2018a, b, c, d, e; Pham and Prakash 2018; Pham 2018; Khosravi et al. 2018a, b; Chen et al. 2018a, b, c; Hong et al. 2019).

A review of the published articles in spatial modeling fields such as landslide susceptibility modeling, flood susceptibility mapping, and ground subsidence hazard mapping reveals that ensemble methods are frequently used and that their outputs are more reliable than those of single methods. Park et al. (2014) employed frequency ratio, logistic regression, and artificial neural network models in ground subsidence hazard mapping. In the next stage, the outputs of these models were used as inputs to a fuzzy logic model to create a novel ensemble model. Their findings showed that the ensemble model produced more accurate results. Lee et al. (2012a, b), Umar et al. (2014), Althuwaynee et al. (2014), Youssef et al. (2015), and Aghdam et al. (2016) introduced new ensemble models in the field of landslide susceptibility mapping and reported better performance of the ensemble models. Tehrany et al. (2013, 2014) employed ensemble models in the field of flood susceptibility assessment. Tehrany et al. (2013) proposed an ensemble algorithm by combining bivariate and multivariate models, whereas in Tehrany et al. (2014), the outputs of weights-of-evidence and support vector machines were combined to build a new ensemble model. Both papers reported better efficacy of the ensemble algorithms than the single ones. In the field of GW potential assessment, Naghibi et al. (2017a) used several data mining methods such as AdaBoost, Bagging, generalized additive model, and Naive Bayes for determining GW potential. They combined the outputs of those models with the frequency ratio model to build a new ensemble model, whose results were reported as acceptable. Ensemble models are reported to give better outputs than single methods since they overcome the weaknesses of single models (Wan et al. 2012; Naghibi et al. 2017a).

The literature shows a clear shift in recent years from statistical models to more sophisticated ensemble models. However, only a limited number of studies have addressed ensemble models in groundwater modeling, which can take advantage of both statistical and data mining algorithms. With this in mind, the main novelty of this study is the application of a new ensemble model (EBFTM), which combines the BRT, CART, RF, and RTF algorithms through the EBF method, in GW potential mapping. Another novelty of this research is the application of the RTF in producing GPMs. The objectives of this study are (i) to apply two new models, i.e., the RTF and the ensemble model (EBFTM), in producing GPMs, and (ii) to determine the importance of the groundwater conditioning factors (GCFs) in modeling GW potential.

Study area

The Meshgin Shahr area lies between latitudes 38° 13′ 14″ and 38° 38′ 26″ N and longitudes 47° 21′ 11″ and 48° 11′ 25″ E in Iran (Fig. 1). The study region covers about 2032 km². The altitude ranges from 800 to 4117 m a.s.l., with a mean of about 1556 m. The average annual precipitation is estimated at 361 mm, most of which falls in winter and spring. The annual average humidity in Meshgin Shahr is 63%. The area comprises six land use classes: agriculture, forest, orchard, rangeland, residential area, and water body. Most of the area is covered by rangeland. Geologically, the area comprises 16 lithological units ranging from andesite to basaltic volcanic to granite to diorite (Table 1).

Fig. 1 Location of the study area in Iran (a) and location of the training and validation springs in the study area (b)

Table 1 The lithological characteristics of the study area

Material and methods

Figure 2 shows the flowchart of the methodology employed in the current research.

Fig. 2 Flowchart of the methodology implemented in this study

Spring inventory

The locations of the springs in the Meshgin Shahr area were imported into a GIS environment from a report by the Khorasan Razavi Regional Water Authority (2015). The spring inventory map contains 279 springs. The average discharge of the springs is 2.2 L/s, and their average pH is 6.64. In modeling a phenomenon, it is essential to separate the training and validation datasets, and the ratio between them is important. This study followed Oh et al. (2011), Naghibi et al. (2017a), and Pradhan et al. (2014) in selecting the ratio. Accordingly, the springs were divided into training (70% of locations) and validation (30% of locations) datasets. Note that most of the springs are located very close to each other in the west and southwestern parts of the Meshgin Shahr area. The training dataset was used to train the models and tune their parameters, and the validation dataset was used to validate the produced GPMs (Fig. 1).
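
The 70/30 partition described above can be sketched as follows. This is only an illustration: the text does not state how the split was drawn, so a simple random shuffle is assumed, and the function name `split_springs`, the integer spring identifiers, and the seed are hypothetical.

```python
import random


def split_springs(spring_ids, train_fraction=0.7, seed=42):
    """Randomly split spring identifiers into training and validation sets.

    Assumption: a plain random split; the seed is arbitrary and only makes
    the sketch reproducible.
    """
    ids = list(spring_ids)
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


# 279 springs, as reported for the inventory map; the IDs are placeholders.
springs = list(range(279))
train, valid = split_springs(springs)
print(len(train), len(valid))
```

With 279 springs, this rounding rule assigns 195 locations to training and 84 to validation.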

Construction of the conditioning factors

In this study, considering the hydrogeological and topographical conditions of the basin and literature (Ozdemir 2011a, b; Naghibi et al. 2016), several GCFs including hydrogeological, topographical, and land use factors were mapped as input variables and employed for the modeling process.

Topographical factors

The topographic parameters, which include slope, aspect, altitude, plan curvature, profile curvature, and slope length (LS), were generated from a digital elevation model (DEM) with a spatial resolution of 30 × 30 m extracted from the topographic maps at a 1:50,000 scale (ASTER-DEM). ArcGIS and the System for Automated Geoscientific Analyses (SAGA) were used for these operations.

Infiltration is largely controlled by the slope angle of the area, and this factor has a high importance in GW potential mapping (Adiat et al. 2012). On steep slopes, overland flow is fast and there is not adequate time for the flow to infiltrate (i.e., an inverse relationship exists between infiltration rate and slope degree) (Prasad et al. 2008). In the studied region, slope angle varies from 0 to 68.79° with an average value of 9.69° (Fig. 3a). Slope aspect affects the duration of sunlight in a given area (Kordestani et al. 2018) and consequently influences the snowmelt and infiltration rates. This factor was classified into nine categories, including main and sub-main classes (Fig. 3b). Altitude determines the level of drainage system development in an area, in that lower areas normally have more developed drainage systems. This factor was derived from the DEM of the study region and was included in this work because it has been reported as an important factor in GW potential (Naghibi et al. 2015) (Fig. 3c).

Fig. 3 Groundwater conditioning factors implemented in the modeling process

Plan and profile curvatures were calculated from the DEM using SAGA software. These two curvatures influence the speed of water flow over the surface and subsequently affect the infiltration rate. Negative plan curvature indicates a concave surface, positive curvature a convex surface, and zero curvature a flat surface (Fig. 3d). Profile curvature is measured parallel to the slope, in the direction of maximum slope (Fig. 3e).

LS, an indicator of soil loss potential, combines slope steepness and slope length (Fig. 3f). A higher value of LS indicates higher accumulation of water and erosion, and influences the infiltration rate. This factor can be expressed mathematically as follows (Moore and Burch 1986):

$$ \mathrm{LS}={\left(\frac{\mathrm{Bs}}{22.13}\right)}^{0.6}{\left(\frac{\sin\ \upalpha}{0.0896}\right)}^{1.3} $$
(1)

where Bs is the specific catchment area (m²) and α is the slope angle.
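
As a hedged illustration of Eq. 1, the following sketch evaluates LS for a single cell. The function name `ls_factor` and the choice to pass the slope in degrees are assumptions made for the example; the study itself derived this layer with GIS software.

```python
import math


def ls_factor(specific_catchment_area, slope_deg):
    """Evaluate Eq. 1: LS = (Bs / 22.13)^0.6 * (sin(alpha) / 0.0896)^1.3.

    specific_catchment_area: Bs in the equation.
    slope_deg: slope angle alpha in degrees (converted to radians here).
    """
    slope_rad = math.radians(slope_deg)
    area_term = (specific_catchment_area / 22.13) ** 0.6
    slope_term = (math.sin(slope_rad) / 0.0896) ** 1.3
    return area_term * slope_term


# Reference case: Bs = 22.13 and sin(alpha) = 0.0896 give LS = 1 by construction.
reference_slope = math.degrees(math.asin(0.0896))
print(ls_factor(22.13, reference_slope))
```

Both terms equal one at the reference values of 22.13 and 0.0896, so LS grows with either a larger catchment area or a steeper slope.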

Hydrogeological factors

The topographic wetness index (TWI) has been used as a topographical indicator of the location of saturated source areas in a given area (Beven 1997). TWI shows the tendency of water to accumulate at any point in the basin and can be expressed mathematically as follows (Moore et al. 1991; Fig. 3g):

$$ \mathrm{TWI}=\ln \left(\upalpha /\tan \upbeta \right) $$
(2)

where α is the cumulative upslope area draining to a point and β is the slope angle at the point.
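
Eq. 2 can be sketched for a single cell as below. The function name `twi` is hypothetical, and the `min_slope_deg` floor for flat cells is an assumption added here to avoid division by zero; the text does not say how flat cells were handled.

```python
import math


def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Evaluate Eq. 2: TWI = ln(alpha / tan(beta)).

    upslope_area: alpha, the cumulative upslope area draining to the point.
    slope_deg: beta, the slope angle at the point, in degrees.
    min_slope_deg: assumed lower bound on the slope so that tan(beta) > 0.
    """
    slope_rad = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(slope_rad))


# Gentler slopes and larger contributing areas both raise the index.
print(twi(100.0, 45.0), twi(100.0, 5.0), twi(1000.0, 5.0))
```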

Streams, which reflect slope, infiltration rate, and flow direction, were derived from a DEM using ArcGIS. It is known that streams are one of the main sources of groundwater recharge in semi-arid areas such as Iran. Therefore, distance from rivers could show the impact of this recharge source. River density shows the development of the drainage system and impacts groundwater recharge. Distance from rivers was computed by employing the river layer and Euclidean distance module implemented in ArcGIS 10.5 (Fig. 3h). In addition, river density was calculated using ArcGIS 10.5 and line density module (Fig. 3i). In this study, distance and density layers of the faults were built and regarded as GCFs (Fig. 3j, k). Lithology impacts porosity and infiltration of water into the ground (Rahmati et al. 2014). The lithology layer was produced by implementing a 1:100,000 scale geological map (GSI 1997). It was then categorized into 16 groups (Fig. 3l and Table 1).

Land use factor

Landsat 7/ETM images from 2015 were downloaded from USGS (http://earthexplorer.usgs.gov/), and the land use layer of the Meshgin Shahr region was generated with a supervised maximum likelihood algorithm in IDRISI software (Fig. 3m). This map contains six land use classes: agriculture, forest, orchard, rangeland, residential area, and water body.

Methodology

This section presents the methodological steps to obtain the final GPM by the EBFTM. In the first stage, tree-based models, i.e., BRT, CART, RF, and RTF, are explained. It should be noted that these models were applied using the training dataset and produced four GPMs. Then, these maps were classified into four classes of GW potential according to a previous research by Naghibi et al. (2017a). The groundwater potential values (i.e., spring occurrence probability) obtained by the BRT, CART, RF, and RTF models for all the pixels of the study area were classified into four potential classes. These values were used as inputs of the EBF model to construct the new ensemble model (i.e., EBFTM). In other words, probability classes obtained by the BRT, CART, RF, and RTF were extracted for the training dataset and used in the EBFTM to calculate the belief values for each class.

Application of the tree-based models in GW potential modeling

GW potential modeling by boosted regression tree

BRT is fundamentally different from classical regression algorithms, which generate a single best model. It uses boosting to adaptively combine a large number of comparatively simple trees in order to optimize predictive performance (Elith et al. 2008). Fitting several trees overcomes the major weakness of single tree methods, which is their comparatively weak predictive performance (Aertsen et al. 2010). Elith et al. (2008) clarified that the final model can be considered an additive regression model in which simple trees are fitted in a forward, stepwise fashion. In the BRT, three parameters need to be tuned: the number of trees; the interaction depth, which defines the size of the trees; and the shrinkage, which represents the contribution of each tree to the model (Elith and Leathwick 2013). Parameter optimization in the BRT was carried out by grid search. The grid covered interaction depths of 1 to 9 and numbers of trees from 0 to 1500 in steps of 100. The algorithm was applied in R statistical software using the gbm (gradient boosting) and caret packages (Naghibi et al. 2016), and a ten-fold cross-validation was employed to select the parameters (Naghibi et al. 2016).
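
To make the size of this search concrete, the sketch below enumerates the tuning grid and assigns samples to ten folds. It is not the caret implementation used in the study: it reads "0 to 1500 with 100 intervals" as steps of 100, takes the lower bound of 0 trees literally from the text, and uses a hypothetical round-robin `fold_assignments` helper.

```python
from itertools import product

# Grid as described in the text (0 trees is kept literally as the lower bound).
interaction_depths = range(1, 10)   # 1 to 9
n_trees = range(0, 1501, 100)       # 0, 100, ..., 1500
grid = list(product(interaction_depths, n_trees))


def fold_assignments(n_samples, k=10):
    """Assign each sample index to one of k folds in round-robin order.

    Assumption: caret draws its folds differently; this only illustrates
    the idea of k-fold cross-validation.
    """
    return [i % k for i in range(n_samples)]


folds = fold_assignments(25)
print(len(grid), "candidate settings, each evaluated on", len(set(folds)), "folds")
```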

GW potential modeling by classification and regression tree

CART is a widely used tree-based model because it presents information in a way that is simple to interpret. The CART model is a binary recursive partitioning procedure that can handle continuous and nominal features as both inputs and outputs. Each node (or split) is determined by a simple rule based on a single variable, which divides the data into two groups. Within each group, the homogeneity of the response is maximized in order to minimize the total deviance (Vorpahl et al. 2012). The value estimated at a “terminal node” is the mean of the response in that node (Breiman et al. 1984). McKenney and Pedlar (2003) stated that a classification tree should be pruned to retain only the most contributing factors, i.e., the nodes explaining the largest amount of deviance. CART was applied in R statistical software using the rpart package (Ripley 2015).

GW potential modeling by random forest

Random forest (RF) is a nonparametric technique derived from CART. Each tree is grown on a bootstrapped subset of the dataset using the CART methodology, with a random subspace of factors chosen at every node (Micheletti et al. 2013). The number of factors in this subspace is smaller than the number of factors in the input space. The trees are decorrelated because of the random selection of factors at each node, which strengthens the ensemble model by decreasing the variance without increasing the bias (Breiman 2001). Two parameters are tuned in the RF: the number of trees, called ntrees, and the number of factors randomly selected from the available set at each node, called mtry (Micheletti et al. 2013). One of the capabilities of RF is that it quantifies the contribution of input factors through two indices, mean decrease in accuracy and mean decrease in Gini (Pardo and Sberveglieri 2008). The RF was carried out in R statistical software using the randomForest package and its built-in tuning function.
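
The two sources of randomness described above, bootstrap sampling of the data and random selection of mtry factors at a node, can be sketched as follows. This is not the randomForest package: the helper names, the seed, and the sample size of 10 are hypothetical, while the factor list and mtry = 2 follow the text.

```python
import random

# The 13 conditioning factors named in this study.
FACTORS = [
    "slope", "aspect", "altitude", "plan curvature", "profile curvature",
    "LS", "TWI", "distance from rivers", "river density",
    "distance from faults", "fault density", "lithology", "land use",
]


def bootstrap_indices(n_samples, rng):
    """Draw n_samples row indices with replacement (one bootstrap sample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def candidate_factors(factors, mtry, rng):
    """Pick mtry distinct factors to be considered at one node."""
    return rng.sample(factors, mtry)


rng = random.Random(0)             # arbitrary seed for reproducibility
rows = bootstrap_indices(10, rng)  # 10 is a placeholder sample size
subset = candidate_factors(FACTORS, 2, rng)
print(rows, subset)
```

Because each node sees only a small random subset of the 13 factors, different trees tend to split on different variables, which is the decorrelation effect described above.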

GW potential modeling by rotation forest with decision trees as base classifier

Rotation forest (RTF) draws on the RF idea. The RTF ensemble can be used to strengthen the performance of weak classifiers such as data mining models (Chen et al. 2017c; Pham et al. 2018a). In RTF, the feature set is randomly split into K subsets, and a sparse rotation matrix is built by extracting features from each subset. RTF uses principal component analysis to extract the features that form the training data space on which the base classifiers are learned (Koyuncu and Ceylan 2013). Every tree is trained using all data in a rotated feature space; however, the base classifiers are separately constructed decision trees (Rodriguez and Kuncheva 2007). Since decision trees build classification regions with hyperplanes parallel to the feature axes, a small rotation of the axes can result in a completely different tree. RTF thus aims to build classifiers that are both accurate and diverse (Rodriguez and Kuncheva 2007). Bootstrap samples are used as the training data for the individual classifiers, as in bagging. The main heuristic is to apply feature extraction and then reconstruct a complete feature set for each classifier in the ensemble. RTF was applied in the R statistical software using the rotationForest and caret packages.

EBFTM ensemble approach

This section first describes the EBF model and then explains the ensemble modeling procedure used in this study.

Description of evidential belief function model

EBF is based on the Dempster-Shafer theory (Dempster 1968; Shafer 1976). This model comprises degrees of belief, disbelief, uncertainty, and plausibility, denoted Bel, Dis, Unc, and Pls, respectively (Nampak et al. 2014). EBF considers two different relationships in the data: the relationship between a GCF and the occurrence of springs, and the relationship between the classes of each GCF (Pradhan et al. 2014). The degree of belief ranges from 0 to 1 (Carranza et al. 2008; Pradhan et al. 2014). In this method, the generalized Bayesian lower and upper probabilities give the belief and plausibility values, respectively. Unc is computed by subtracting Bel from Pls and is therefore never negative. Unc can be interpreted as the doubt that the evidence supports a proposition (Pradhan et al. 2014). Finally, Dis can be computed as 1 − Pls or 1 − Unc − Bel (Tien Bui et al. 2012; Pradhan et al. 2014).
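
The relationships among the four EBF degrees stated above can be checked with a short sketch. It only encodes the identities Unc = Pls − Bel and Dis = 1 − Pls = 1 − Unc − Bel; it does not compute Bel and Dis from spring data. The function name `complete_ebf` and the input values are hypothetical.

```python
def complete_ebf(bel, dis):
    """Derive Unc and Pls from given Bel and Dis using the stated identities.

    Requires 0 <= bel, 0 <= dis and bel + dis <= 1 so that Unc is non-negative.
    """
    if bel < 0 or dis < 0 or bel + dis > 1:
        raise ValueError("Bel and Dis must be non-negative and sum to at most 1")
    pls = 1 - dis          # Dis = 1 - Pls
    unc = pls - bel        # Unc = Pls - Bel
    return {"Bel": bel, "Dis": dis, "Unc": unc, "Pls": pls}


# Hypothetical degrees for one class of one factor.
degrees = complete_ebf(0.6, 0.1)
print(degrees)
```

By construction Bel + Unc + Dis equals one, which is why Dis can be written either as 1 − Pls or as 1 − Unc − Bel.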

GW potential modeling by the new ensemble model EBFTM

To construct the ensemble model (i.e., EBFTM) in this work, first, the tree-based models were carried out and their corresponding GPMs were built. In the next stage, the GPMs were grouped into four categories of potentiality (i.e., low, moderate, high, and very high). Afterward, the EBF degrees were computed for each class of the GPMs produced by the tree-based models. Then, Bel values were assigned to the classes of the GPMs, and the EBFTM values were obtained as follows:

$$ \mathrm{EBFTM}={\mathrm{Bel}}_{\mathrm{BRT}}+{\mathrm{Bel}}_{\mathrm{CART}}+{\mathrm{Bel}}_{\mathrm{RF}}+{\mathrm{Bel}}_{\mathrm{RTF}} $$
(3)

where EBFTM is the final value for each pixel in the ensemble model; Bel_BRT is the Bel value calculated for the classes of the BRT model; and Bel_CART, Bel_RF, and Bel_RTF are the corresponding values for the CART, RF, and RTF models, respectively.
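
A minimal sketch of Eq. 3 for a single pixel is given below. The Bel values are those quoted in the Results for the low and very high classes; the moderate and high classes are omitted because the text does not list them for every model. The lookup structure and the function name `ebftm` are assumptions made for the example.

```python
# Bel values for the low and very high classes, as quoted in the Results.
BEL = {
    "BRT": {"low": 0.003, "very high": 0.856},
    "CART": {"low": 0.014, "very high": 0.774},
    "RF": {"low": 0.002, "very high": 0.897},
    "RTF": {"low": 0.010, "very high": 0.857},
}


def ebftm(pixel_classes):
    """Eq. 3: sum the Bel values of the classes a pixel falls in per model.

    pixel_classes maps each model name to the potential class that the
    model assigned to the pixel.
    """
    return sum(BEL[model][cls] for model, cls in pixel_classes.items())


# Two extreme pixels: all four models agree on very high, or on low.
all_very_high = ebftm({m: "very high" for m in BEL})
all_low = ebftm({m: "low" for m in BEL})
print(round(all_very_high, 3), round(all_low, 3))
```

Pixels on which the four models disagree fall between these two extremes, so the sum rewards agreement among the tree-based models.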

Validation of the GPMs

The last stage is to evaluate the created GPMs and determine which one performs best. In this investigation, receiver operating characteristic (ROC) curve analysis was used for this purpose (Chen et al. 2017a, b; Sangchini et al. 2016). The ROC curve plots sensitivity against 1 − specificity over a range of cut-off thresholds (Tien Bui et al. 2016). For quantitative comparison, the area under the ROC curve (AUC), which summarizes the overall performance of an algorithm, was employed. An AUC value of 1 indicates a perfect algorithm that correctly classifies all the spring and non-spring pixels, whereas a value of 0.5 indicates an algorithm that performs no better than chance (Hong et al. 2016b, 2017a). Additionally, the Wilcoxon signed-rank test was conducted to determine the significance of the differences between the new ensemble model, EBFTM, and its components (i.e., the tree-based models BRT, CART, RTF, and RF).
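
The AUC can be illustrated with a small rank-based sketch. It uses the equivalence between the AUC and the probability that a randomly chosen spring pixel receives a higher score than a randomly chosen non-spring pixel, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical and unrelated to the study data.

```python
def roc_auc(labels, scores):
    """Compute the AUC as the fraction of (positive, negative) pairs that
    are ranked correctly, counting ties as half a correct pair."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 1 = spring pixel, 0 = non-spring pixel.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
print(roc_auc(labels, scores))
```

In this toy case three of the four spring/non-spring pairs are ordered correctly, which lies between the chance level of 0.5 and the perfect score of 1.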

Results

Boosted regression tree model

The BRT was tuned with the caret package, and the final BRT model had 100 trees, an interaction depth of 1, and a shrinkage of 0.1, with accuracy and kappa values of 0.78 and 0.57, respectively. The variable importance results are shown in Table 2. Based on the results, land use, altitude, distance from faults, and TWI were the most important GCFs, while river density, fault density, and aspect had the lowest relative influence values. The GPM generated using the BRT algorithm is presented in Fig. 4a and Table 3. The low, moderate, high, and very high categories of this GPM covered 44.34, 27.11, 15.65, and 12.90% of the study area, respectively.

Table 2 Importance of groundwater conditioning factors in GPMs using BRT model

Fig. 4 Groundwater potential maps produced by a BRT, b CART, c RF, d RTF, and e EBFTM

Table 3 The distribution of the spring potential values and areas with respect to the groundwater spring potential zones

Classification and regression tree model

The CART model was trained and pruned with the rpart package in R software, with a total classification error of 20.4% (Fig. 5). The contribution of the GCFs was also determined and is displayed in Table 2. The findings showed that land use, distance from faults, and lithology had the highest contributions, while river density and LS had the lowest contributions to the modeling procedure. The GPM obtained by the CART model is shown in Fig. 4b and Table 3. According to the results, the low, moderate, high, and very high categories of the GPM produced by this algorithm cover 55.44, 15.39, 17.35, and 11.82% of the studied area, respectively.

Fig. 5 Pruned classification tree by CART model

Random forest model

The final model was fitted with a node size of 6, an mtry of 2, and 1000 trees. The “node size” is the minimum size of the terminal nodes. The “mtry” is the number of factors randomly sampled as candidates at each split. The mean decrease in Gini was used to show the contribution of the GCFs to the final model (Table 2). It showed that distance from faults, land use, and altitude had the highest contributions, while fault density and aspect had the lowest contributions to the modeling procedure. The GPM produced by the RF model is shown in Fig. 4c and Table 3. The low, moderate, high, and very high categories of this GPM cover 36.15, 32.14, 18.21, and 13.51% of the study area, respectively.

Rotation forest model

The RTF model was tuned and fitted using the caret and rotationForest packages in the R software. The final model included a K of 4 and an L of 9 resulting in a ROC value of 0.80 (Fig. 6). K refers to the number of variable subsets and L determines the number of base classifiers. It can be observed that when variable sets range from 1 to 3.5, changes in the number of base classifiers do not have a dramatic effect, while for variable subset of 4, a very distinct result is obtained. The GPM obtained by the RTF algorithm is displayed in Fig. 4d. Considering Table 3, low, moderate, high, and very high categories of this GPM include 43.99, 26.51, 15.21, and 14.29% of the study region, respectively.

Fig. 6 The results of cross-validation for rotation forest

EBFTM results

The findings of the EBFTM are listed in Table 4. According to the table, Bel values for the BRT classes range from 0.003 for the low class to 0.856 for the very high class of potentiality. For the CART algorithm, the low class has a Bel value of 0.014, while 0.774 is assigned to its very high potential class. For the RF, the low, moderate, high, and very high categories have Bel values of 0.002, 0.015, 0.086, and 0.897, respectively. For the RTF, belief values range from 0.010 for the low class to 0.857 for the very high class. The GPM produced by the EBFTM is shown in Fig. 4e and Table 3. A high percentage of the area is assigned to the low class (72.51%), whereas a low percentage is assigned to the very high potential class (10.99%).

Table 4 Belief values obtained from EBF for different classes of the tree-based models

Validation of the models

The ROC curves and AUC values of the models are presented in Fig. 7. The AUC of the implemented methods varies between 86.2 and 90.4%, which shows the different prediction capabilities of the methods. The EBFTM, RF, BRT, CART, and RTF algorithms had AUC-ROC values of 90.4, 90.1, 89.8, 86.9, and 86.2%, respectively. The results of the Wilcoxon signed-rank test also showed significant differences between the results of the new ensemble model, EBFTM, and those of the single tree-based models, i.e., BRT, CART, RF, and RTF (Table 5).

Fig. 7 Prediction rate of the curve for the GPMs produced by tree-based and ensemble models

Table 5 The Wilcoxon signed-rank test results between the accuracy of EBFTM and BRT, CART, RTF, and RF algorithms

Discussion

GW potential as a hydrogeological phenomenon involves a wide variety of uncertainties due to the complex nature of GW resources in different areas (Ozdemir 2011a). The situation is made more complex by the overfitting problem of the modeling techniques employed in this field of study. Tree-based models face a high amount of uncertainty because they are fitted to the datasets as closely as possible. The high complexity of GW resources, together with the different sources of uncertainty in the modeling procedures, makes the outputs and methodologies regional and reduces the possibility of generalizing them. Bartels (1997) recommended model averaging as a solution to this problem. However, there are some more mathematically appropriate methods, such as the application of ensemble models. With this in mind, this study applied the EBFTM as a new ensemble model in order to reduce the overfitting issue in each of the tree-based models (i.e., BRT, CART, RF, and RTF algorithms). The results showed the higher efficacy of the EBFTM compared with the single models. The findings also showed that the BRT and RF had similar AUC-ROC values and higher performance than the CART and RTF models, and that all the tree-based algorithms provided GPMs with AUC values higher than 70%, which indicates acceptable efficacy (Naghibi et al. 2016). The EBFTM produced a more accurate GPM than all the other models since it works as a model averaging technique and can be used as a method to reduce the variance. It is known that error can be decomposed into bias and variance. Naghibi et al. (2017a) stated that ensemble models are believed to combine several weak learners and create a new powerful model. From a statistical viewpoint, this procedure is regarded as a boosting technique. Since the EBFTM used the same data as the tree-based models, it can be inferred that implementing the ensemble approach can enhance prediction efficacy without additional data. This finding is in agreement with the results obtained by Lee et al. (2012a, b) in the field of landslide susceptibility mapping.

The BRT model has some strong characteristics that could explain its high efficacy in classification problems. For instance, BRT can retain important GCFs, detect interactions, and discard variables with a large number of missing values (Elith et al. 2008; Carty 2011). The RF algorithm has been reported to have acceptable efficacy in different fields of study (Peters et al. 2007; Vorpahl et al. 2012; Chen et al. 2018d; Hong et al. 2017b, 2018), and it also showed acceptable results in this work. The advantages of the RF algorithm include its low susceptibility to overfitting and its capability to handle high-dimensional datasets (Caruana and Niculescu-Mizil 2006). In terms of interpretation, the CART has the highest interpretability among all the tree-based algorithms employed in this research. This kind of information could be very helpful for water resource planners and managers who are not very familiar with the modeling procedures. Breiman et al. (1984) mentioned that CART can be applied to nonlinear data and does not require factor transformation. Furthermore, outliers do not have a high influence on the results (Strobl et al. 2008). Another positive feature of CART is that it has no adverse impact on numerical calculations. The RTF algorithm produced a GPM with lower accuracy than the other three tree-based models; however, its AUC-ROC of 86.2% confirms that the RTF is an appropriate GW predictor, especially under the uncertain conditions of GW resources and the lack of data. The main limitation of this work, as in other works in spatial sciences such as groundwater potential mapping, is that the results of the models need to be investigated in more than one study area to make sure that they are general. Therefore, the authors suggest that future works examine two or more study areas to achieve this goal.

This research also investigated the importance of the GCFs in modeling GW potential. Land use was the most important factor in the BRT and CART, and the second most important factor in the RF. Regarding the land use classes of the study area, most of the springs occur in the agriculture and orchard categories. This concentration of springs could be a reason for the importance of land use in the modeling process. Distance from faults, altitude, and lithology were the next most important GCFs in this study. An investigation of the distance from faults layer shows that most of the training springs occur at large distances from faults. This shows a direct relationship between GW potential and distance from faults. In other words, springs do not occur at the location of the faults. Altitude has an inverse influence on spring occurrence as it affects slope angle, river system development, and agricultural and industrial development. All these factors influence GW potential in an area. An investigation of the location of the training springs with respect to altitude shows that some altitude ranges do not have any spring occurrence. The lithological characteristics of a watershed affect its hydrological as well as hydrogeological features, including soil permeability, transmissivity, and river system development. The springs are concentrated in piedmont fan and valley terraces with high permeability. On the other hand, river density and fault density were the least important GCFs.

One important characteristic of a classification model is how precisely it delineates the very high class of potentiality. The findings of this work showed that the very high class of potentiality in the GPMs generated by the BRT, CART, RF, RTF, and EBFTM algorithms covered 12.9, 11.82, 13.51, 14.29, and 10.99% of the study area, respectively. This finding indicates that the new ensemble model (i.e., EBFTM), in addition to its higher performance, can delineate the very high potential class more precisely than the other models.

Conclusions

Water resource-related issues are growing, which increases the need to gain a better understanding of these important natural resources. Thus, investigating different aspects of water, including surface and GW resources, has a high priority. This paper attempted to develop a new methodology (i.e., the EBFTM method) to model GW potentiality as a useful guideline for water resource managers. The performance of the EBFTM was validated using the ROC curve and accuracy index. The findings of this investigation showed that the EBFTM had better efficacy than all the tree-based models implemented in this work. Another novelty of this study was the application of the RTF algorithm in producing a GPM for the first time, and its performance was compared to other data mining algorithms (i.e., BRT, CART, and RF). The findings showed that although the RTF had weaker efficacy than the other tree-based models, it gave acceptable efficacy based on the area under the ROC curve. Model averaging techniques such as the one implemented in this research (i.e., EBFTM) can be implemented with less uncertainty, resulting in more general outputs. For this reason, this study suggests the application of ensemble models like the EBFTM in cross-application studies. Such general models could be more applicable in situations where managers deal with a larger area with a high variety of hydrogeological, climatic, and topographical features. Together with other papers on ensemble models, this research may encourage the research community to work on more advanced ensemble models and obtain better results in order to solve complex geospatial issues. Overall, considering the five GPMs constructed in this study, the authors suggest the west central parts of the study area for water extraction for different water uses. A smaller area in the eastern part of the watershed can also be conservatively considered for water extraction.