1 Introduction

Groundwater is one of the most important natural resources worldwide and a major source of water for industrial and agricultural purposes (Nampak et al. 2014). As the worldwide demand for fresh groundwater increases, delineating groundwater spring potential zones has become an increasingly important tool for implementing successful groundwater exploration, protection, and management programs. In the last decade, researchers have employed several statistical models in groundwater potential mapping, such as frequency ratio (Oh et al. 2011; Manap et al. 2012; Pourtaghi and Pourghasemi 2014; Davoodi Moghaddam et al. 2015; Naghibi et al. 2015), weights-of-evidence (Ozdemir 2011a; Pourtaghi and Pourghasemi 2014), logistic regression (Ozdemir 2011a; Pourtaghi and Pourghasemi 2014), index of entropy (Naghibi et al. 2015), artificial neural network (Lee et al. 2012), analytical hierarchy process (Rahmati et al. 2014; Razandi et al. 2015), and evidential belief function (Pourghasemi and Beheshtirad 2014) models.

Other researchers have used fuzzy clustering (Moradi Dashtpagerdi et al. 2013) for flood spreading, spatial optimization techniques (Durga Rao 2014) for planning a groundwater supply scheme, a distributed hydrogeological budget (Mazza et al. 2014) for evaluating available regional groundwater resources, multi-criteria analysis (Esquivel et al. 2015) for groundwater level monitoring, an optimization-simulation approach (Zekri et al. 2015) for groundwater abstraction under recharge uncertainty, and spatial multi-criteria evaluation (Chezgi et al. 2015) for underground dam site selection.

According to the literature, the boosted regression tree (BRT), classification and regression tree (CART), and random forest (RF) models have not been used in groundwater potential mapping, although several studies have assessed the accuracy of these machine learning models in other applications, such as landslide susceptibility and hazard mapping (Stumpf and Kerle 2011; Vorpahl et al. 2012; Lee et al. 2013; Trigila et al. 2013), ground subsidence hazard mapping (Oh and Lee 2010), wildfire (Oliveira et al. 2012; Leuenberger et al. 2013), gully susceptibility mapping (Gutiérrez et al. 2009a, 2009b), ecology (Elith et al. 2008; Aertsen et al. 2010, 2011), and environmental modeling (Bachmair and Weiler 2012; Catani et al. 2013). In these studies, the machine learning models generally performed better than bivariate and multivariate models. Thus, the aim of the current study is to evaluate the capability of the BRT, CART, and RF models, together with the evidential belief function (EBF) and generalized linear model (GLM), in groundwater potential mapping and to compare their performance. The main difference between this research and the approaches described in the aforementioned publications is that three machine learning models were applied and their results were compared with those of bivariate and multivariate models in the study area. Thus, the application of the BRT, CART, and RF models to groundwater potential mapping is original to the current study.

2 The Study Area

The Beheshtabad Watershed is located in Chaharmahal-e-Bakhtiari Province, Iran, between 31° 50′ 36″ N and 32° 34′ 16″ N latitude and 51° 26′ 57″ E and 59° 21′ 51″ E longitude (Fig. 1). It covers an area of approximately 2321 km². Elevation in the study area ranges from 1660 m to 3560 m above sea level (a.s.l.). The mean annual point precipitation recorded at the weather station is 618.8 mm (Mojiri and Zarei 2006). According to the Geological Survey of Iran (GSI 1997), 49 % of the study area is covered by lithological unit A, comprising low-level piedmont fan and valley terrace deposits. Most of the area (66.26 %) is covered by rangeland/pasture. Groundwater resources in this area are exploited through qanats, springs, and deep and semi-deep wells. The average spring discharge in the study area is approximately 4 gal per second. Groundwater generally flows from the north of the basin to the south of the plain, and the general topographic gradient of the plain is also north to south.

Fig. 1

Location of the study area in Chaharmahal-e-Bakhtiari Province and spring locations, with the digital elevation model (DEM) of the study area

3 Methods

3.1 Spring Characteristics

In total, 1425 springs were identified in the Beheshtabad Watershed and mapped at 1:50,000 scale (Fig. 1). Using a random partition (Oh et al. 2011; Ozdemir 2011a), 998 (70 %) of the spring locations were used for groundwater potential mapping (training) and the remaining 427 (30 %) for validation.
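A 70/30 random partition like the one described above can be sketched as follows. The function name, the fixed seed, and the round-half-up rule for the training count are assumptions for illustration; with 1425 springs this rule gives the 998/427 split reported here.

```python
import random

def split_springs(spring_ids, train_percent=70, seed=42):
    """Randomly partition spring IDs into training and validation subsets.

    Hypothetical helper: the seed and the round-half-up rule are assumptions.
    """
    ids = list(spring_ids)
    random.Random(seed).shuffle(ids)                   # reproducible random order
    n_train = (len(ids) * train_percent + 50) // 100   # integer round-half-up (1425 -> 998)
    return ids[:n_train], ids[n_train:]

train, validation = split_springs(range(1425))
print(len(train), len(validation))  # 998 427
```

Fixing the seed makes the partition reproducible; different seeds give different, equally valid random partitions.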

3.2 Groundwater Conditioning Factors

Fourteen thematic data layers were prepared in a GIS environment and used in this study: slope angle, slope aspect, altitude, plan curvature, profile curvature, slope length-steepness (LS), stream power index (SPI), topographic wetness index (TWI), distance from rivers, distance from faults, river density, fault density, lithology, and land use.

A digital elevation model (DEM) with 20 m resolution was created from 1:50,000-scale topographic maps. Slope angle, slope aspect, and altitude were derived from the DEM in ArcGIS 9.3 (Fig. 2a–c).
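For readers reproducing the DEM derivatives outside ArcGIS, slope and aspect can be approximated with finite differences. This is a minimal NumPy sketch, not the ArcGIS 9.3 algorithm: it uses numpy.gradient (central differences in the interior), assumes a north-up raster, measures aspect clockwise from north, and flags flat cells with −1 (a convention assumed here).

```python
import numpy as np

def slope_aspect(dem, cell_size):
    """Slope (degrees) and aspect (degrees clockwise from north) of a north-up DEM."""
    dz_drow, dz_dcol = np.gradient(np.asarray(dem, dtype=float), cell_size)
    dz_dx = dz_dcol    # columns increase eastward
    dz_dy = -dz_drow   # rows increase southward, so flip the sign for northward change
    slope = np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))
    # Aspect is the compass direction of steepest descent, i.e. of the vector -grad(z).
    aspect = np.degrees(np.arctan2(-dz_dx, -dz_dy)) % 360.0
    aspect = np.where(slope == 0.0, -1.0, aspect)   # flat cells have no aspect
    return slope, aspect

# Example: a plane rising 20 m per 20 m cell toward the east.
dem = np.tile(np.arange(5) * 20.0, (5, 1))
slope, aspect = slope_aspect(dem, cell_size=20.0)
print(slope[2, 2], aspect[2, 2])   # slope about 45 degrees, aspect 270 (west-facing)
```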

Fig. 2

Groundwater effective factors maps of the study area; a slope degree, b slope aspect, c altitude, d plan curvature, e profile curvature, f slope length, g stream power index, h topographic wetness index, i distance from rivers, j distance from faults, k drainage density, l fault density, m landuse, n lithology

Plan curvature describes the convergence and divergence of flow and can be used to discriminate between watershed divides and hollows channelized by a 0th-order drainage network (Fig. 2d). Profile curvature is the rate of change of the slope gradient in the direction of maximum slope (Catani et al. 2013) (Fig. 2e).

The slope length-steepness factor (LS; Eq. 1) combines slope steepness (S) and slope length (L) and was calculated following Moore and Burch (1986) (Fig. 2f).

$$ \mathrm{LS}={\left(\frac{B_s}{22.13}\right)}^{0.6}{\left(\frac{\sin\alpha}{0.0896}\right)}^{1.3} $$
(1)

where α is the local slope gradient in degrees and Bs is the specific catchment area (m² m⁻¹, i.e., upslope area per unit contour width).
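Eq. 1 translates directly into code. This small Python sketch assumes the slope is supplied in degrees, as stated above, and checks the reference conditions of the formula (Bs = 22.13 and sin α = 0.0896), where both ratios equal 1 and therefore LS = 1. The function name is illustrative.

```python
import math

def ls_factor(bs, slope_deg):
    """LS factor of Eq. 1 (Moore and Burch 1986).

    bs: specific catchment area (the 22.13 reference length implies m2 per m of contour);
    slope_deg: local slope gradient in degrees.
    """
    return (bs / 22.13) ** 0.6 * (math.sin(math.radians(slope_deg)) / 0.0896) ** 1.3

# Sanity check at the reference conditions, where LS should equal 1.
ref_slope = math.degrees(math.asin(0.0896))   # about 5.14 degrees
print(ls_factor(22.13, ref_slope))           # approximately 1.0
```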

The SPI (Fig. 2g) is defined by Moore et al. (1991) as:

$$ \mathrm{SPI}=B_s\tan\alpha $$
(2)

The TWI (Fig. 2h) is defined as ln(A/tan β), where A is the upslope contributing area (or flow accumulation) and β is the slope angle (Beven and Kirkby 1979).
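The SPI (Eq. 2) and TWI definitions above can be evaluated per cell as in the following sketch. Slopes are assumed to be in degrees, and the floor min_tan is an assumption added here so that flat cells (tan β = 0) do not produce an infinite TWI; GIS tools handle flat cells in their own ways.

```python
import math

def spi(bs, slope_deg):
    """Stream power index of Eq. 2: SPI = Bs * tan(alpha)."""
    return bs * math.tan(math.radians(slope_deg))

def twi(area, slope_deg, min_tan=1e-6):
    """Topographic wetness index: ln(A / tan(beta)).

    min_tan is an assumed floor that keeps flat cells finite.
    """
    tan_beta = max(math.tan(math.radians(slope_deg)), min_tan)
    return math.log(area / tan_beta)

print(spi(500.0, 10.0))   # about 88.2
print(twi(500.0, 10.0))   # about 7.95
```

Steep cells with large contributing areas get a high SPI, whereas gentle cells with large contributing areas get a high TWI, which is why the two indices carry different information.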

Distance from rivers and drainage density maps were created from topographic maps, whereas distance from faults and fault density maps were derived from a geological map. The distance from rivers and distance from faults layers were classified into five classes at 100 m and 250 m intervals, respectively (Fig. 2i–j). The drainage density and fault density maps (Fig. 2k–l) were classified into four classes using the natural breaks method.
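The fixed-interval classification of the distance layers can be sketched as follows. The brute-force distance computation, the open-ended fifth class, and the function names are assumptions for illustration; for full rasters a Euclidean distance transform (for example scipy.ndimage.distance_transform_edt) or the GIS's own distance tool would normally be used. With interval=100.0 the sketch follows the river-distance scheme, and with interval=250.0 the fault-distance scheme.

```python
import numpy as np

def distance_to_features(mask, cell_size):
    """Euclidean distance (m) from each cell centre to the nearest feature cell.

    Brute force over all feature cells: suitable for small demo grids only.
    mask must contain at least one True (feature) cell.
    """
    rows, cols = np.indices(mask.shape)
    feat_rows, feat_cols = np.nonzero(mask)
    d = np.hypot(rows[..., None] - feat_rows, cols[..., None] - feat_cols)
    return d.min(axis=-1) * cell_size

def classify_distance(dist, interval, n_classes=5):
    """Fixed-interval classes 1..n_classes; the last class is open-ended."""
    edges = interval * np.arange(1, n_classes)   # e.g. 100, 200, 300, 400 m
    return np.digitize(dist, edges) + 1

# Hypothetical 1 x 30 strip of 20 m cells with a river in the first cell.
river = np.zeros((1, 30), dtype=bool)
river[0, 0] = True
classes = classify_distance(distance_to_features(river, 20.0), interval=100.0)
print(classes)
```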

The landuse map was prepared from Landsat 7 ETM+ images acquired in 2010 using supervised classification with the maximum likelihood algorithm. The landuse types are agriculture, residential area, orchard, and rangeland (Fig. 2m).

The lithology map was digitized from a 1:100,000-scale geological map in ArcGIS 9.3. The study area is covered by various lithological formations, which were grouped into thirteen classes labeled A to M. The low-level piedmont fan and valley terrace deposits (A) cover about 45.83 % of the study area. The general geological setting of the area is shown in Fig. 2n. Class B represents low-weathering grey marls alternating with bands of more resistant shelly limestone. Class C is pale-red polygenic conglomerate and sandstone. Class D is undifferentiated metamorphic rocks, including phyllite, meta-volcanics, calcschist, and crystallized limestone. Class E is cream to brown-weathering, feature-forming, well-jointed limestone with intercalations of shale. Class F is grey, thick-bedded, oolitic, fetid limestone. Class G is grey, thick-bedded to massive Orbitolina limestone. Class H is high-level piedmont fan and valley terrace deposits, and class I is marl and calcareous shale with intercalations of limestone. Class J is polymictic conglomerate and sandstone. Class K is the undivided Bangestan Group, mainly limestone and shale, of Albian to Campanian age. Class L represents undivided Eocene rocks, and class M is unconsolidated wind-blown sand deposits and backshore sand dunes.

3.3 Application of Models

3.3.1 Boosted Regression Tree (BRT)

BRT, also called stochastic gradient boosting (Elith et al. 2006), combines classification and regression trees with the gradient boosting algorithm (Friedman 2001). Boosting is a machine learning technique similar to model averaging, in which the results of several competing models are combined. Unlike model averaging, however, boosting uses a forward, stage-wise procedure in which trees are fitted iteratively to subsets of the training data. At each iteration, the subset used to fit the tree is selected at random without replacement, and its size, as a proportion of the training data, is set by the modeler through the “bag fraction” parameter. This procedure introduces stochasticity that improves model accuracy and reduces overfitting (Elith et al. 2008).
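To make the roles of the learning rate and the bag fraction concrete, the following is a deliberately small pure-Python sketch of stochastic gradient boosting for a binary spring/non-spring response. It is not the implementation used in this study: it fits one-split trees (stumps, i.e. tree complexity 1) to the logistic-loss pseudo-residuals y − p, uses leaf means rather than Newton-step leaf values, and all function names are hypothetical. The parameters n_trees, learning_rate, and bag_fraction play the roles of the number of trees, learning rate, and bag fraction described above.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, r):
    """Least-squares one-split tree fitted to pseudo-residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    if best is None:                        # no valid split: constant leaf
        mean = sum(r) / len(r)
        return lambda row: mean
    _, j, t, ml, mr = best
    return lambda row: ml if row[j] <= t else mr

def fit_brt(X, y, n_trees=100, learning_rate=0.1, bag_fraction=0.5, seed=0):
    """Stochastic gradient boosting for 0/1 labels under logistic loss."""
    rng = random.Random(seed)
    n = len(y)
    p0 = min(max(sum(y) / n, 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))            # initial log-odds
    F = [f0] * n
    stumps = []
    k = max(1, round(bag_fraction * n))     # size of each random subsample
    for _ in range(n_trees):
        idx = rng.sample(range(n), k)       # drawn WITHOUT replacement
        residuals = [y[i] - sigmoid(F[i]) for i in idx]
        stump = fit_stump([X[i] for i in idx], residuals)
        stumps.append(stump)
        for i in range(n):                  # add each tree's shrunken contribution
            F[i] += learning_rate * stump(X[i])

    def predict_proba(row):
        return sigmoid(f0 + learning_rate * sum(s(row) for s in stumps))
    return predict_proba

X = [[float(v)] for v in range(20)]   # hypothetical single predictor
y = [0] * 10 + [1] * 10
model = fit_brt(X, y, n_trees=50, learning_rate=0.5, bag_fraction=0.5, seed=1)
print(model([2.0]) < 0.5 < model([17.0]))   # True
```

A smaller learning rate needs more trees to reach the same fit, which is why a low rate such as 0.005 is paired with several hundred trees.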

3.3.2 Classification and Regression Tree (CART)

CART is a popular non-parametric machine learning technique for classification and regression (Breiman et al. 1984). CART grows a decision tree with a binary partitioning algorithm that recursively splits the data until each group is either homogeneous or contains fewer observations than a user-defined threshold (Aertsen et al. 2010). Regression trees are relatively insensitive to outliers and can accommodate missing data in predictor factors using surrogate splits (Breiman et al. 1984).
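The recursive binary partitioning described above can be sketched for a single predictor as follows, using Gini impurity as the splitting criterion and a minimum node size as the user-defined threshold. This is a simplified illustration, not the CART implementation used in the study: it considers only one predictor, has no pruning or surrogate splits, and the distances and labels in the example are invented.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels (non-spring/spring)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 1.0 - p * p - (1.0 - p) * (1.0 - p)

def best_split(x, y):
    """Threshold on one predictor that minimizes the weighted Gini impurity."""
    best_impurity, best_t = float("inf"), None
    for t in sorted(set(x))[:-1]:             # the largest value cannot split
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        w = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
        if w < best_impurity:
            best_impurity, best_t = w, t
    return best_impurity, best_t

def majority(y):
    return 1 if 2 * sum(y) > len(y) else 0    # ties go to class 0

def grow(x, y, min_size=4):
    """Split recursively until a node is pure or has fewer than min_size cases."""
    if len(set(y)) == 1 or len(y) < min_size:
        return majority(y)
    _, t = best_split(x, y)
    if t is None:                             # all x identical: cannot split
        return majority(y)
    left = [(xi, yi) for xi, yi in zip(x, y) if xi <= t]
    right = [(xi, yi) for xi, yi in zip(x, y) if xi > t]
    return (t,
            grow([a for a, _ in left], [b for _, b in left], min_size),
            grow([a for a, _ in right], [b for _, b in right], min_size))

def predict(node, value):
    while isinstance(node, tuple):
        t, left, right = node
        node = left if value <= t else right
    return node

# Invented distances from faults (m) and spring presence (1) or absence (0).
x = [50, 120, 180, 400, 650, 900, 1200, 1500]
y = [1, 1, 1, 1, 0, 0, 0, 0]
tree = grow(x, y)
print(tree)   # (400, 1, 0)
```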

3.3.3 Random Forest (RF)

RFs are powerful and flexible ensemble classifiers based on decision trees, first developed by Breiman (2001) (Catani et al. 2013; Micheletti et al. 2014). An RF combines many trees, each grown on a bootstrap sample of the data, which leaves about a third of the sample out for validation (the out-of-bag, or OOB, predictions) (Oliveira et al. 2012). The algorithm estimates the importance of a variable by measuring how much the prediction error increases when the OOB values of that variable are permuted while all other variables are left unchanged (Liaw and Wiener 2002; Catani et al. 2013).

RFs require two user-tuned parameters: (1) the number of trees, T, and (2) the number of variables, m, randomly selected from the available set of features at each split. In addition, two variable-importance measures were calculated: mean decrease in accuracy and mean decrease in node impurity (mean decrease Gini). These measures can be used for ranking and selecting variables (Calle and Urrea 2010).
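Two ideas from the RF description can be checked with a short stdlib sketch: a bootstrap sample of size n leaves roughly 1/e (about 36.8 %) of the observations out of bag, and permutation importance is the drop in accuracy after shuffling one predictor. For simplicity the sketch permutes a column of the whole dataset for an arbitrary fitted classifier; in RF proper, the permutation is applied to each tree's OOB cases. The toy model and data are hypothetical.

```python
import random

def oob_fraction(n, seed=0):
    """Share of observations left out of one bootstrap sample (about 1/e)."""
    rng = random.Random(seed)
    in_bag = {rng.randrange(n) for _ in range(n)}   # n draws with replacement
    return 1 - len(in_bag) / n

def accuracy(model, X, y):
    return sum(model(row) == label for row, label in zip(X, y)) / len(y)

def permutation_importance(model, X, y, j, seed=0):
    """Drop in accuracy after shuffling column j while the other columns stay fixed."""
    rng = random.Random(seed)
    column = [row[j] for row in X]
    rng.shuffle(column)
    X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
    return accuracy(model, X, y) - accuracy(model, X_perm, y)

rng = random.Random(1)
X = [[rng.random(), rng.random()] for _ in range(200)]
toy_model = lambda row: int(row[0] > 0.5)   # hypothetical fitted classifier
y = [toy_model(row) for row in X]
print(oob_fraction(10_000))                               # close to 0.368
print(permutation_importance(toy_model, X, y, 0) > 0)     # True: column 0 matters
print(permutation_importance(toy_model, X, y, 1))         # 0.0: column 1 is ignored
```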

3.3.4 Generalized Linear Model (GLM)

Regression approaches, including linear regression, log-linear regression, and logistic regression (LR), are commonly used. The primary goal of LR is to find the best model describing the relationship between a dependent variable and multiple independent variables (Ozdemir and Altural 2013). In its simplest form, the logistic regression model can be expressed as:

$$ P=\frac{1}{1+e^{-Z}} $$
(4)

where P is the estimated probability of an event occurring. Because Z can vary from −∞ to +∞, P varies from 0 to 1 along an S-shaped curve. Z is defined as:

$$ Z={B}_0+{B}_1{X}_1+{B}_2{X}_2+\dots +{B}_n{X}_n $$
(5)

where B0 is the intercept, n is the number of independent variables, Bi (i = 1, 2, …, n) are the slope coefficients, and Xi (i = 1, 2, …, n) are the independent variables. Substituting Eq. 5 into Eq. 4 gives the extended form of the logistic regression:

$$ P=\frac{1}{1+e^{-\left(B_0+B_1X_1+B_2X_2+\dots +B_nX_n\right)}} $$
(6)
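Eqs. 4 to 6 can be evaluated as in the following sketch. The coefficients and predictor values in the example are hypothetical, not the fitted values of Table 2, and the branch for negative Z is an algebraically equivalent form added only to avoid floating-point overflow.

```python
import math

def logistic_probability(b0, coefficients, x):
    """P of Eq. 6: 1 / (1 + exp(-Z)) with Z = B0 + B1*X1 + ... + Bn*Xn (Eq. 5)."""
    z = b0 + sum(b * xi for b, xi in zip(coefficients, x))
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)   # equal to the line above, but avoids overflow for very negative Z
    return e / (1.0 + e)

def logit(p):
    """Inverse of the logistic function: recovers Z from P."""
    return math.log(p / (1.0 - p))

# Hypothetical coefficients and predictor values.
p = logistic_probability(-1.0, [0.5, -0.2], [3.0, 2.5])
print(p)   # Z = 0.0, so P = 0.5
```

Because logit(P) = Z, the slope coefficients Bi can be read as changes in the log-odds of spring occurrence per unit change of each factor, which is why their signs indicate positive or negative effects.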

3.3.5 Evidential Belief Function (EBF)

The Dempster–Shafer theory of evidence (Dempster 1968; Shafer 1976) provides a mathematical framework that, applied as a bivariate statistical method, integrates spatial evidence using Dempster's rule of combination. The main advantages of EBF are its relative flexibility in accepting uncertainty and its ability to combine beliefs from multiple sources of evidence (Thiam 2005). The EBFs are Bel (degree of belief), Dis (degree of disbelief), Unc (degree of uncertainty), and Pls (degree of plausibility). Bel and Pls are, respectively, the lower and upper degrees of belief that a proposition is true based on the given evidence. The difference between Pls and Bel is the uncertainty (Unc), which represents ignorance of whether the evidence supports the proposition. Disbelief (Dis) is the belief that the proposition is false based on the given evidence; it equals 1 − Pls (or 1 − Unc − Bel). Therefore, the sum of Bel, Unc, and Dis is always 1.

Details of the EBF algorithm can be found in Carranza et al. (2008) and Nampak et al. (2014).
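A minimal sketch of the EBF calculation for the classes of a single factor follows, using the data-driven formulation commonly applied after Carranza et al. (2008): belief and disbelief weights are computed for each class from spring and cell counts and then normalized across classes. Function and variable names are hypothetical, and the counts in the example are invented.

```python
def ebf(class_cells, class_springs):
    """Bel, Dis, Unc, Pls for the classes of one conditioning factor.

    class_cells[i]: cells in class i; class_springs[i]: spring cells in class i.
    Assumes each class, and its complement, contains at least one non-spring cell.
    """
    n_total = sum(class_cells)       # N(T): all cells in the study area
    n_springs = sum(class_springs)   # N(L): spring cells
    w_bel, w_dis = [], []
    for n_c, n_lc in zip(class_cells, class_springs):
        w_bel.append((n_lc / n_springs) / ((n_c - n_lc) / (n_total - n_springs)))
        w_dis.append(((n_springs - n_lc) / n_springs)
                     / ((n_total - n_springs - n_c + n_lc) / (n_total - n_springs)))
    bel = [w / sum(w_bel) for w in w_bel]
    dis = [w / sum(w_dis) for w in w_dis]
    unc = [1.0 - b - d for b, d in zip(bel, dis)]
    pls = [b + u for b, u in zip(bel, unc)]   # equals 1 - Dis
    return bel, dis, unc, pls

# Invented counts for a three-class factor (e.g. slope angle classes).
bel, dis, unc, pls = ebf([1000, 3000, 6000], [50, 30, 20])
print([round(b, 3) for b in bel])
```

The class holding a larger share of springs than of area receives the highest belief, matching the interpretation of belief values in the results.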

3.3.6 Validation and Comparison of the GPMs

Validation of the predictive groundwater potential maps (GPMs) is an essential component of the modeling process. The five GPMs were validated against known spring locations using success-rate and prediction-rate curves.

Success rates were computed separately for each of the five GPMs using the training dataset (998 spring grid cells).

Because the success rate measures the goodness of fit of the five models to the training dataset, it is not a suitable measure of their prediction capability (Tien Bui et al. 2012). The prediction-rate curve, computed with the validation springs, explains how well the model and groundwater conditioning factors predict springs that were not used in training (Lee 2007).
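Success-rate and prediction-rate curves can be sketched as cumulative curves: cells are sorted from highest to lowest potential, and the cumulative share of springs captured is plotted against the cumulative share of the area; the AUC is then integrated with the trapezoidal rule. The sketch assumes equal-area cells and breaks ties in potential by input order. Passing the training springs gives a success rate and passing the validation springs a prediction rate.

```python
def rate_curve_auc(potential, is_spring):
    """Area under a success/prediction-rate curve (cumulative % springs vs % area).

    potential: model value per cell; is_spring: 1 if the cell holds a spring, else 0.
    """
    order = sorted(range(len(potential)), key=lambda i: potential[i], reverse=True)
    n_cells, n_springs = len(order), sum(is_spring)
    auc = x_prev = y_prev = 0.0
    found = 0
    for rank, i in enumerate(order, start=1):
        found += is_spring[i]
        x, y = rank / n_cells, found / n_springs
        auc += (x - x_prev) * (y + y_prev) / 2.0   # trapezoidal rule
        x_prev, y_prev = x, y
    return auc

# Toy map of 10 cells whose two springs fall in the two highest-potential cells.
print(rate_curve_auc([10, 9, 8, 7, 6, 5, 4, 3, 2, 1], [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]))  # about 0.9
```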

4 Results

4.1 BRT Model

For the main-effects BRT model (learning rate = 0.005, tree complexity = 5, bag fraction = 0.005), the optimal number of trees was 900. The final BRT model included 71.93 % of the mean total deviance (1 − mean residual deviance / mean total deviance = 1 − (0.49/1.38) = 0.64) (Abeare 2009). An index of relative influence was calculated by summing the contribution of each variable, which is equivalent to summing the branch lengths for each variable in the regression tree (Abeare 2009). The measure is based on the number of times a variable is selected for splitting, weighted by the squared improvement to the model resulting from each split, and averaged over all trees (Friedman and Meulman 2003). For the main-effects BRT model fitted here, the five most influential variables were altitude (20.24 %), distance from faults (19.56 %), SPI (12.98 %), distance from rivers (10.67 %), and fault density (10.33 %) (Table 1). Six factors, namely profile curvature, plan curvature, river density, landuse, slope aspect, and lithology, were removed in the final analysis.
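The relative-influence index described above can be sketched as follows, assuming each fitted tree is summarized as a list of (variable, squared improvement) pairs, one per split. The data structure, function name, and numbers are hypothetical; the result is normalized to percentages, as in Table 1.

```python
def relative_influence(trees):
    """Relative influence (%) of each variable in a boosted ensemble.

    trees: one list per tree of (variable, squared_improvement) pairs, one pair per split.
    """
    totals = {}
    for splits in trees:
        for variable, improvement in splits:
            totals[variable] = totals.get(variable, 0.0) + improvement
    averaged = {v: s / len(trees) for v, s in totals.items()}   # average over all trees
    grand_total = sum(averaged.values())
    return {v: 100.0 * s / grand_total for v, s in averaged.items()}

# Hypothetical split summaries for a two-tree ensemble.
print(relative_influence([[("altitude", 3.0), ("SPI", 1.0)], [("altitude", 1.0)]]))
# {'altitude': 80.0, 'SPI': 20.0}
```

A variable that is never chosen for a split receives no influence at all, which is how factors drop out of a final BRT model.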

Table 1 Summary of the relative contributions of predictor variables for BRT, CART, and RF models

4.2 CART Model

The variable importance results for the CART model are presented in Table 1. Of the 14 independent factors, CART used only six to generate the optimal model: distance from faults, fault density, altitude, SPI, TWI, and distance from rivers, with variable importance values of 25, 18, 16, 8, 7, and 7 %, respectively. Landuse, profile curvature, slope aspect, and lithology had the lowest variable importance values. The resulting CART model was a tree with 10 non-terminal nodes and 10 terminal nodes (Fig. 3).

Fig. 3

Optimal tree obtained by CART with terminal nodes resulting in spring (highlighted) and non-spring (grey)

4.3 RF Model

The results of variable selection with RF are presented in Table 1, which lists the 14 variables ranked by two importance measures (mean decrease in accuracy and mean decrease in Gini). Higher values indicate that a variable is relatively more important (Williams 2011). The mean decrease in accuracy ranks altitude, distance from faults, distance from rivers, SPI, and fault density as the most important factors. According to the mean decrease in Gini, on the other hand, distance from faults is the most important factor.

4.4 GLM Model

According to the results, slope aspect, profile curvature, slope length, SPI, TWI, fault density, and lithology affect the logistic regression (LR) function positively (Table 2). The highest positive β coefficients belong to profile curvature and TWI (7.991 and 0.07672, respectively). On the other hand, slope angle, altitude, plan curvature, distance from rivers, distance from faults, river density, and landuse have a negative effect on spring occurrence, as they all have negative β coefficients (Table 2). Among these, plan curvature and river density had the largest negative values (−9.515 and −1.043, respectively). The estimates of a regression model cannot be uniquely computed when a perfect linear relationship exists between the predictors. Tolerance (TOL) and the variance inflation factor (VIF) are two important indices for diagnosing multicollinearity (O’Brien 2007). Both were calculated for this study; variables with VIF > 5 and TOL < 0.1 would have been excluded from the LR analysis, but none of the factors used in this study showed a multicollinearity problem.
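The tolerance and VIF checks can be reproduced by regressing each predictor on all the others: TOL = 1 − R² and VIF = 1/TOL for that predictor. The NumPy sketch below uses ordinary least squares via numpy.linalg.lstsq; the synthetic data in the example is made up for illustration.

```python
import numpy as np

def vif_and_tolerance(X):
    """Return (VIF, TOL) for each column of the predictor matrix X (n_samples x n_vars)."""
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    results = []
    for j in range(k):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])  # intercept + other predictors
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        residual = target - others @ coef
        centered = target - target.mean()
        r2 = 1.0 - (residual @ residual) / (centered @ centered)
        tol = 1.0 - r2
        results.append((1.0 / tol if tol > 0 else float("inf"), tol))
    return results

rng = np.random.default_rng(0)
a, b = rng.normal(size=500), rng.normal(size=500)
c = a + 0.01 * rng.normal(size=500)                   # nearly a copy of a (synthetic)
print(vif_and_tolerance(np.column_stack([a, b])))     # VIFs close to 1
print(vif_and_tolerance(np.column_stack([a, b, c])))  # VIF of a and c far above 10
```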

Table 2 Spatial relationship between effective factors and springs using EBF and GLM models

4.5 EBF Model

The spatial factor datasets were evaluated using EBFs to reveal the correlation between the existing springs and each spatial factor in the study area. Table 2 shows the estimated EBFs (belief, disbelief, uncertainty, and plausibility). Each class of an effective factor has a belief value, and a higher belief value indicates that the class has a greater effect on groundwater potential. For example, for slope angle, the 5–15° and 15–30° classes had the highest belief values (0.45 and 0.27, respectively).

4.6 Groundwater Potential Mapping (GPM)

The cell values of each GPM were then classified with the natural breaks scheme (Pourghasemi and Beheshtirad 2014; Naghibi et al. 2015) into low, moderate, high, and very high potential classes (Fig. 4a–e and Table 3).
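Natural breaks (Jenks) classification can be computed exactly for small samples with a dynamic program that minimizes the within-class sum of squared deviations (the Fisher–Jenks formulation). The O(k·n²) sketch below is intended for a sample of cell values and assumes there are at least as many values as classes; GIS software applies its own, possibly sampled, implementation, so class limits may differ slightly.

```python
def jenks_breaks(values, n_classes):
    """Class limits [min, upper_1, ..., upper_k] minimizing within-class squared deviations."""
    data = sorted(values)
    n = len(data)
    s1 = [0.0] * (n + 1)   # prefix sums of values
    s2 = [0.0] * (n + 1)   # prefix sums of squared values
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):  # sum of squared deviations of data[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):        # last class is data[i:j]
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    uppers, j = [], n
    for k in range(n_classes, 0, -1):        # walk the split points backwards
        uppers.append(data[j - 1])
        j = back[k][j]
    return [data[0]] + uppers[::-1]

print(jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))   # [1, 3, 12, 22]
```

With n_classes=4, the four resulting intervals would correspond to the low, moderate, high, and very high potential classes.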

Fig. 4

Groundwater potential maps produced by BRT (a), CART (b), RF (c), GLM (d), and EBF (e) models

Table 3 The distribution of the spring potential values and areas with respect to the groundwater occurrence potential zones, success-rate and prediction-rate curves for GPMs

Based on the GPMs of BRT, CART, RF, GLM, and EBF, the low class covered 48, 12, 40, 30, and 20 % of the study area, respectively, while the high and very high classes together covered 28, 52, 32, 40, and 51 %, respectively. Thus, BRT assigned the smallest area to the high and very high classes, while CART and EBF assigned the largest areas to these two classes.

4.7 Validation of Groundwater Potential Maps (VGPM)

Table 3 presents the success rates of the five GPMs. The area under the curve (AUC) values of the success-rate curves vary from 0.692 to 0.975, indicating that all the models fit the training data reasonably well. The BRT model has the highest success rate (97.50 %), while the EBF model has the lowest (69.20 %). The other models, with almost equal success rates, fall between the BRT and EBF models.

Table 3 shows the prediction rates of the methods implemented for groundwater potential mapping. The AUC of the prediction-rate curves ranges from 67.72 to 86.39 %. The CART, BRT, and RF techniques performed very well in groundwater potential mapping, with values of 86.39, 86.12, and 86.05 %, respectively, indicating closely comparable performance. In contrast, the EBF and GLM models performed worse, with AUC values of 67.72 and 77.26 %, respectively.

5 Discussion

In this section, the results are discussed in two parts: (1) the performance of the models and their characteristics and (2) the importance of the variables in groundwater potential mapping and their relationships in each model used in the current study.

5.1 The Performance of Models and Their Comparison

BRT models can select relevant variables, fit accurate functions, and automatically identify and model interactions, which sometimes gives them a substantial predictive advantage over methods such as GLM and GAM (generalized additive models). A growing body of literature quantifies this difference in performance (Elith et al. 2006; Leathwick et al. 2006; Moisen et al. 2006; Vorpahl et al. 2012). Efficient variable selection means that large suites of candidate variables are handled better than in a GLM or GAM developed with stepwise selection.

According to the results, the RF method performed better than GLM, which is consistent with studies in other fields, including wildfire, landslide susceptibility mapping, and ecology (Peters et al. 2007; Oliveira et al. 2012; Vorpahl et al. 2013). According to Ozdemir (2011b), GLM (LR) was a poor estimator for groundwater potential mapping. Likewise, Nampak et al. (2014) found that the EBF model gave better results than GLM, although both had prediction rates below 78 %.

In their final form, the BRT model retained eight of the original 14 variables, while CART, EBF, GLM, and RF included all 14. Other authors have noted that a parsimonious model is likely to be more stable and easier to generalize (Catry et al. 2009; Vilar et al. 2010), particularly at broad spatial scales.

5.2 The Importance of Variables in GPMs and Their Relationship

According to the three machine learning methods, altitude, distance from faults, SPI, and fault density had the highest importance in groundwater potential mapping. However, Pourtaghi and Pourghasemi (2014) found that conditioning factors such as slope aspect, altitude, plan curvature, and lithology affect the LR function positively. Thus, the importance of variables in groundwater potential mapping depends considerably on the method used and on the properties of the study area. In other words, different geological, topographical, and climatic conditions change the priority of the effective factors in groundwater potential mapping. For example, altitude may be less important in a semi-flat watershed than in a mountainous one. The precision and accuracy of the models also affect the relative importance of the effective factors, as the results of the current study show.

According to the results, there was a direct relationship between LS, TWI, and fault density and the degree of belief, meaning that groundwater potential increases as the values of these factors increase. On the other hand, the results showed an inverse relationship between altitude, distance from rivers, distance from faults, and river density and the degree of belief. A growing body of literature examines the relationship between groundwater conditioning factors and groundwater potential (Oh et al. 2011; Naghibi et al. 2015). Ozdemir (2011b) found that elevation and slope-related factors were negatively correlated with groundwater potential, whereas other factors (TWI, river density, and lineament-related factors) showed a positive correlation. Naghibi et al. (2015) found that TWI had a direct relationship, while altitude, slope angle, distance to faults, and profile curvature had an inverse relationship with groundwater potential.

6 Conclusions

This study applied three machine learning models, together with a bivariate and a multivariate model, to groundwater potential mapping in the Beheshtabad Watershed, Chaharmahal-e-Bakhtiari Province, Iran. The three machine learning techniques produced very good results, with prediction-rate AUC values of approximately 86 %, whereas the bivariate (EBF) and multivariate (GLM) models performed worse, with AUC values of 67.72 and 77.26 %, respectively. The GPMs produced in this study could therefore assist planners and engineers in development and water resource planning. Such maps identify areas with high groundwater potential that can be targeted for exploitation, as well as areas with low groundwater potential, where planners can apply conservation measures such as flood spreading. In their final form, BRT retained 8 of the original 14 variables, which should make it easier to generalize, while CART, EBF, GLM, and RF included all 14 variables. Finally, altitude, distance from faults, SPI, and fault density had the highest importance in groundwater potential mapping.

The results of this study may provide technical support to government agencies and the private sector for groundwater exploration and assessment in Iran. The proposed methods provided rapid, accurate, and cost-effective results. Furthermore, the analysis may be transferable to other watersheds with similar topographic and hydrogeological characteristics.