Groundwater potential assessment using GIS-based ensemble learning models in Guanzhong Basin, China

Wang, Zitao; Wang, Jianping; Yu, Dongmei; Chen, Kai

doi:10.1007/s10661-023-11388-2

Environmental Monitoring and Assessment

Published: 18 May 2023

Volume 195, article number 690 (2023)

Abstract

Groundwater plays a crucial role in sustaining industrial and agricultural production and in meeting the water demands of the growing population in the semi-arid Guanzhong Basin of China. The objective of this study was to evaluate the groundwater potential of the region using GIS-based ensemble learning models. Fourteen factors were considered: landform, slope, slope aspect, curvature, precipitation, evapotranspiration, distance to fault, distance to river, road density, topographic wetness index, soil type, lithology, land cover, and normalized difference vegetation index. Three ensemble learning models, namely random forest (RF), extreme gradient boosting (XGB), and local cascade ensemble (LCE), were trained and cross-validated using 205 samples and then applied to predict groundwater potential across the region. The XGB model performed best, with an area under the curve (AUC) of 0.874, followed by RF (AUC 0.859) and LCE (AUC 0.810). XGB and LCE discriminated between areas of high and low groundwater potential more effectively than RF: most RF predictions were concentrated in the moderate groundwater potential class, indicating that RF is less decisive for binary classification. In areas predicted to have very high and high groundwater potential, the proportions of samples with abundant groundwater were 33.6%, 69.31%, and 52.45% for RF, XGB, and LCE, respectively. In contrast, in areas predicted to have very low and low groundwater potential, the proportions of samples without groundwater were 57.14%, 66.67%, and 74.29% for RF, XGB, and LCE, respectively. Because the XGB model required the least computational resources and achieved the highest accuracy, it is the most practical option for predicting groundwater potential.
The results can be useful for policymakers and water resource managers in promoting the sustainable use of groundwater in the Guanzhong Basin and other similar regions.

Application of Ensemble Machine Learning Models to Assess the Sub-regional Groundwater Potentiality: A GIS-Based Approach

Combining high resolution input and stacking ensemble machine learning algorithms for developing robust groundwater potentiality models in Bisha watershed, Saudi Arabia

Article Open access 21 March 2022

Integrated machine learning and remote sensing for groundwater potential mapping in the Mekong Delta in Vietnam

Article 03 April 2024

Discover the latest articles, news and stories from top researchers in related subjects.

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

Groundwater is a vital resource that supports residential use, agricultural and industrial activities, and ecosystem sustainability (Anand et al., 2021; Cui & Shao, 2005). Groundwater accounts for nearly a quarter of the world’s freshwater resources (Panahi et al., 2020), and its share can be even higher in arid and semi-arid regions, where surface water is scarce. In Northwest China, a typical arid and semi-arid region, the population accounts for only 7.3% of the national total, while groundwater reserves account for one-eighth of the country’s total (Chen, 1986; Wang et al., 2008). Effective utilization of groundwater can therefore help alleviate freshwater scarcity in these regions (Arabameri et al., 2019). However, groundwater availability is currently threatened by over-exploitation, climate change, and land-use change. In semi-arid regions such as China’s Guanzhong Basin, limited groundwater resources, high population density, and rapid economic development have together intensified the demand for groundwater (Kong et al., 2019). Accurate assessments of groundwater potential are therefore crucial for sustainable groundwater management in these areas.

The assessment of groundwater potential aims to identify areas with a high likelihood of containing groundwater resources (Jhariya et al., 2021). It is crucial for optimizing the placement of groundwater wells, estimating potential groundwater yield, and promoting the sustainable use of groundwater resources (Tegegne, 2022). Typically, the assessment uses geological, hydrological, and environmental information to identify zones favorable for groundwater development (Farhat et al., 2023). However, traditional methods for determining groundwater potential, such as pumping tests and borehole drilling, are time-consuming and expensive. To overcome this limitation, researchers have increasingly turned to technologies such as geographic information systems (GIS) (Bera et al., 2021), remote sensing (Shamsudduha & Taylor, 2020; Sun et al., 2019), and drones (Jansen, 2019) to predict and assess groundwater potential (Wang et al., 2022). The main advantage of these methods is that they can cover the entire study area by combining satellite images with regional geographic or geological information. Their main disadvantage is that the data they produce are only indirectly related to groundwater, which makes them less efficient and less reliable than drilling or pumping tests. To address this issue, researchers have developed several techniques for analyzing such data, and two main approaches have emerged for assessing groundwater potential. The first approach weights the various factors that affect groundwater and overlays them (Singh et al., 2019). Weight-based methods include the analytic hierarchy process (Ahmad et al., 2023; Arefin, 2020), which depends on human judgment and expert ratings, and the entropy weight method (Al-Abadi et al., 2016; Zhang et al., 2021), which derives weights from the distribution characteristics of the factors. 
Although the calculation process of these methods is relatively interpretable, their final results often fall short of expectations. The second approach divides the study area into multiple points and then either scores or ranks each point, using methods such as the technique for order of preference by similarity to ideal solution (TOPSIS) (Zaree et al., 2019), or compresses the factor values at these points into a one-dimensional vector using principal component analysis (Sun et al., 2021). Point-based methods usually produce more accurate results than weight-based methods. The two types of methods can also be combined, as in the entropy-TOPSIS method (Li et al., 2019), to statistically determine the distribution of groundwater potential. However, these methods are difficult to integrate with hydrogeological knowledge, and valuable drilling and pumping test data are often not fully exploited.

In recent decades, rapidly developing machine learning algorithms have been adopted across many industries (Kaur & Sood, 2020; Reichstein et al., 2019), including hydrogeology. In hydrology, machine learning has been applied to hydrogeological modeling (Sun et al., 2019), model parameter inversion (Mo et al., 2020; Zhou et al., 2014), groundwater pollution source identification (Han et al., 2020), and groundwater potential assessment. The machine learning approach links the factors that affect groundwater potential with borehole data or field survey information. With the factors as features and the borehole data as labels, machine learning can capture non-linear associations among features and build complex black-box models (Wang et al., 2022). One advantage of this approach is that drilling and pumping test data can be used for validation, which helps ensure the reliability of the predictions. Common machine learning models, such as decision trees (Duan et al., 2016), linear regression, support vector machines (Panahi et al., 2020), k-nearest neighbors, and Bayesian models (Pham et al., 2021), as well as ensemble models such as random forest (RF) (Wang et al., 2022), adaptive boosting (Rizeei et al., 2019), and extreme gradient boosting (XGB) (Ibrahem Ahmed Osman et al., 2021), are widely used for groundwater potential assessment. Ensemble models improve the accuracy and usability of machine learning compared with single base learners (Arabameri et al., 2021), and in most cases they predict groundwater potential more accurately (Pham et al., 2021). However, because of the complexity of geological settings, model performance and feature engineering can differ greatly between study areas. 
Therefore, it is necessary to identify the most suitable machine learning model for predicting groundwater potential in each study area and to validate it against the actual conditions of that area.

The Guanzhong Basin is a semi-arid region that relies heavily on groundwater for agriculture and other human activities (Zhang et al., 2022). Rapid urbanization and economic growth are increasing the demand on the local water supply, which poses significant challenges. Assessing the groundwater potential of the Guanzhong Basin is therefore crucial for sustainable water resource management in the area. In this study, we compiled the factors that influence groundwater potential in the Guanzhong Basin and used three ensemble learning models, RF, XGB, and local cascade ensemble (LCE), to predict groundwater potential. We ran repeated calculations and cross-validation to determine the most suitable parameter sets, enabling a comprehensive and accurate GIS-based assessment of groundwater potential in the basin. The results identify likely groundwater enrichment areas at low cost and can guide subsequent drilling and related activities. The study can also serve as a reference for other semi-arid regions with characteristics similar to those of the Guanzhong Basin.

Data and data processing

Description of the study area

The Guanzhong Basin is a large geological basin in central China (Fig. 1), covering an area of approximately 20,722.63 km² (Zhang et al., 2022). It extends from 107° to 110°30′E and from 34° to 35°10′N, with altitudes of about 200–2000 m. The basin is an important agricultural and industrial region and is home to several large cities, including Xi’an, the capital of Shaanxi Province. Its geology is dominated by a thick layer of sedimentary rock deposited during the Mesozoic Era (Xu et al., 2019). The climate is temperate, with four distinct seasons. The average annual temperature is 12–13.6 ℃, and the average annual rainfall is about 500–800 mm, falling mainly in summer; nevertheless, the area is prone to droughts and water shortages (Ren et al., 2021). The average annual evaporation is 800–1200 mm (Bei et al., 2016). The main river in the Guanzhong Plain is the Weihe River, the largest tributary of the Yellow River. The Weihe River is an important water source for the basin and interacts closely with its groundwater (Kong et al., 2019). Given the basin’s importance as a center of agriculture and industry and its vulnerability to water scarcity, accurate assessments of groundwater potential are critical for the sustainable development and management of the region’s water resources.

Borehole datasets

The accuracy of machine learning models used for groundwater potential assessment depends largely on the quality of the training data (Chen et al., 2019a; Panahi et al., 2020). The borehole datasets used in this study were collected from two sources: GeoCloud (http://geoscience.cn) and field observations. In total, 205 sets of borehole data were collected and analyzed; their distribution is shown in Fig. 1. For model training and testing, the borehole data were divided into two classes by pumping rate, using a threshold of 5 t/h to distinguish enriched groundwater (> 5 t/h) from lack of groundwater (≤ 5 t/h). The samples in each class were then randomly partitioned into training and test sets at a ratio of 0.7:0.3 (Panahi et al., 2020). The training set was used for model calibration, and the test set was used to assess model accuracy. Random partitioning was used so that the training and test sets would be representative of the entire dataset (Wang et al., 2022).
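The labeling and class-wise split described above can be sketched in Python as follows. This is a minimal illustration, assuming a pumping rate in t/h is available for each borehole; the function names, the seed, and the example rates are hypothetical, not the study’s data or code. Splitting each class separately keeps the proportions of enriched and non-enriched boreholes similar in both subsets; scikit-learn’s `train_test_split(..., test_size=0.3, stratify=labels)` offers the same behavior.

```python
import random

PUMPING_THRESHOLD_T_PER_H = 5.0  # threshold stated in the text


def label_borehole(pumping_rate_t_per_h):
    """Return 1 for enriched groundwater (> 5 t/h), 0 for lack of groundwater (<= 5 t/h)."""
    return 1 if pumping_rate_t_per_h > PUMPING_THRESHOLD_T_PER_H else 0


def stratified_split(labels, train_frac=0.7, seed=42):
    """Shuffle and split sample indices 70:30 separately within each class."""
    rng = random.Random(seed)
    train, test = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        n_train = round(train_frac * len(idx))
        train.extend(idx[:n_train])
        test.extend(idx[n_train:])
    return sorted(train), sorted(test)


# Hypothetical pumping rates (t/h), not the study's boreholes
rates = [2.0, 7.5, 5.0, 12.3, 0.8, 6.1, 4.9, 9.0, 3.3, 15.0]
labels = [label_borehole(r) for r in rates]
train_idx, test_idx = stratified_split(labels)
```

Note that a rate of exactly 5 t/h falls into the "lack of groundwater" class, matching the ≤ 5 t/h definition.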

Database of conditioning factors

The selection of indicators that influence groundwater potential is critical to accurately assess the potential for groundwater in a given area (Zaree et al., 2019). These indicators are variables that can affect the recharge and availability of groundwater. By identifying and analyzing these factors, it is possible to map the groundwater potential of a region, which can aid in groundwater management, water resources planning, and sustainable development. Based on the characteristics of semi-arid areas (Arabameri et al., 2019) and previous literature reviews (Díaz-Alcaide & Martínez-Santos, 2019), this study selected 14 factors as features for ensemble learning, including landform, slope, slope aspect, curvature, precipitation, evapotranspiration, distance to fault, distance to river, road density, topographic wetness index (TWI), soil type, lithology, land cover, and normalized difference vegetation index (NDVI) (Figs. 2–4).

Landform is a crucial factor in groundwater potential assessment since it determines the surface water recharge (Razandi et al., 2015). In this study, landform was classified into five categories: floodplain, hill, mountain, plain, and plateau (Fig. 2a). Generally, high groundwater levels are common in floodplain areas near rivers and streams, while hills and mountains have lower groundwater potential due to steep slopes and limited surface water recharge.

Slope is another critical factor that influences groundwater potential. The slope of the land surface affects the rates of infiltration and runoff, which ultimately control groundwater recharge (Doke et al., 2021). In the Guanzhong Basin, slope was calculated from a digital elevation model (DEM) with a 30 m resolution (obtained from https://www.gscloud.cn) and ranges from 0 to 74.88° (Fig. 2b). In general, concave slopes have higher groundwater recharge than flat or convex slopes. Slope aspect also affects groundwater potential because it determines the amount of solar radiation and wind exposure a surface receives (Naghibi et al., 2015b). Flat areas are more likely to be recharged because of their even exposure to solar radiation and wind. Slope aspect was also computed from the DEM (Wang et al., 2020) and classified by aspect angle into ten categories: flat, north, northeast, east, southeast, south, southwest, west, northwest, and north (Fig. 2c); north appears twice because the north sector straddles 0° and is split into two classes on either side of it. Curvature, defined as the rate of change of slope along a contour line, can also affect groundwater flow and recharge by influencing the direction and velocity of water movement (Arabameri et al., 2019). We classified curvature into three types (concave, flat, and convex) according to its value (Fig. 2d). Concave areas have negative curvature, with lower slope angles in the center and higher angles on the periphery. Such areas promote the convergence of surface water and enhance its interaction with groundwater (Chen et al., 2019b). Conversely, convex areas, with higher slope angles in the center and lower angles on the periphery, tend to make surface water diverge, limiting its interaction with groundwater. Flat areas, with zero curvature and a uniform slope, play a role between those of concave and convex areas.
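The aspect and curvature classes can be assigned per DEM cell with simple rules. The sketch below assumes the common GIS convention that flat cells carry an aspect of -1, and that positive curvature means convex and negative means concave (matching the text). The two north classes are merged into one here, and the flat tolerance `flat_tol` is a hypothetical value, since the study does not state its class thresholds.

```python
ASPECT_SECTORS = ["north", "northeast", "east", "southeast",
                  "south", "southwest", "west", "northwest"]


def classify_aspect(aspect_deg):
    """Map an aspect angle (degrees clockwise from north) to a direction class.

    Assumes flat cells carry aspect -1 (a common GIS convention).
    The two north classes (either side of 0 degrees) are merged here.
    """
    if aspect_deg < 0:
        return "flat"
    # Shift by half a sector (22.5 degrees) so each 45-degree sector is centred on its direction
    sector = int(((aspect_deg % 360.0) + 22.5) % 360.0 // 45.0)
    return ASPECT_SECTORS[sector]


def classify_curvature(curvature, flat_tol=0.01):
    """Negative = concave, positive = convex; |curvature| <= flat_tol counts as flat.

    flat_tol is a hypothetical tolerance, not a value from the study.
    """
    if curvature < -flat_tol:
        return "concave"
    if curvature > flat_tol:
        return "convex"
    return "flat"


print([classify_aspect(a) for a in (-1, 10, 90, 200, 350)])
print([classify_curvature(c) for c in (-0.5, 0.0, 0.5)])
```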

Precipitation and evapotranspiration are two additional factors that affect groundwater potential. Precipitation supplies the water for groundwater recharge, while evapotranspiration limits the amount of water available for recharge (Jia et al., 2011; Jin et al., 2013). Areas with high precipitation and low evapotranspiration therefore have higher groundwater potential. In this study, precipitation data were obtained from WorldClim2 (Fick & Hijmans, 2017) and range from 500 to 757 mm (Fig. 2e), while evapotranspiration data (obtained from https://data.cma.cn/) range from 794 to 1277 mm (Fig. 2f). Overall, evapotranspiration in the Guanzhong Basin exceeds precipitation, which is typical of a semi-arid area.

Fault zones can serve as pathways through which groundwater interacts with surface water (Ahmad et al., 2021), while rivers and streams serve as surface water sources (Golkarian et al., 2018). In this study, distance to faults was divided into five classes with breakpoints at 1, 5, 10, 20, and 50 km (Fig. 3a), and distance to rivers into six classes with breakpoints at 1, 2, 5, 10, 20, and 50 km (Fig. 3b). Road density is defined as the length of road per unit area and is calculated from road network data (Velis et al., 2017). High road density can increase surface runoff, reducing infiltration and groundwater recharge. In the Guanzhong Basin, the calculated road density ranges from 0 to 2.29 (Fig. 3c).
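One way to reclassify distances and compute road density is sketched below. It assumes that the listed distances are the upper bounds of the classes (for example, fault class 2 covers 1–5 km), that distances beyond the last break fall into the last class, and that road density is expressed in km of road per km² of area; none of these details is stated in the text, and the function names are illustrative.

```python
import bisect

FAULT_BREAKS_KM = [1, 5, 10, 20, 50]      # five fault-distance classes
RIVER_BREAKS_KM = [1, 2, 5, 10, 20, 50]   # six river-distance classes


def distance_class(distance_km, breaks):
    """Return a 1-based class: class 1 is <= breaks[0], class k is (breaks[k-2], breaks[k-1]].

    Distances beyond the last break are clipped into the last class (an assumption).
    """
    k = bisect.bisect_left(breaks, distance_km)
    return min(k, len(breaks) - 1) + 1


def road_density(segment_lengths_km, area_km2):
    """Total road length inside a unit area divided by that area (km per km^2)."""
    return sum(segment_lengths_km) / area_km2


print(distance_class(3.0, FAULT_BREAKS_KM), distance_class(1.5, RIVER_BREAKS_KM))
print(road_density([0.3, 0.15], 0.09))  # 0.45 km of road in one 300 m x 300 m cell
```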

TWI is an important factor that indicates the degree of land surface wetness, which is used to estimate the spatial distribution of soil moisture and potential groundwater recharge (Sørensen & Seibert, 2007). TWI is calculated based on the slope and contributing area of each pixel in the DEM (Wang et al., 2022). The calculation method assumes that water accumulates in areas with low slope and high contributing area. The TWI is computed using the following formula (Wang et al., 2022):

$$\mathrm{TWI}=\ln\left(\frac{a}{\tan\beta}\right)$$

(1)

$$a=\frac{\tan\alpha}{S}+1$$

(2)

where a is the specific catchment area, defined as the upslope contributing area per unit contour length, and β is the local slope. In Eq. (2), α is the local flow direction and S is the local slope; both are calculated from the DEM. TWI can in principle take any real value, with higher values indicating areas with greater potential for groundwater recharge. In the Guanzhong Basin, TWI ranges from 5.44 to 30.56 (Fig. 3d).
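A per-cell TWI calculation following Eq. (1) is sketched below. As an assumption, the specific catchment area a is approximated in a common GIS way, as the upslope area ((flow accumulation + 1) cells, counting the cell itself) divided by a contour width of one cell; this does not reproduce Eq. (2). The minimum slope used to avoid division by zero on flat cells is a hypothetical choice, and the 30 m default cell size mirrors the DEM resolution given above.

```python
import math


def twi(flow_acc_cells, slope_deg, cell_size_m=30.0, min_slope_deg=0.01):
    """Topographic wetness index ln(a / tan(beta)) for one DEM cell.

    a: specific catchment area, approximated as the upslope area
       (flow_acc_cells + 1 cells, counting the cell itself) divided by a
       contour width of one cell size (an assumption, not Eq. (2)).
    beta: local slope, clamped to min_slope_deg so tan(beta) is never zero.
    """
    upslope_area_m2 = (flow_acc_cells + 1) * cell_size_m ** 2
    a = upslope_area_m2 / cell_size_m
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(a / math.tan(beta))


print(round(twi(0, 45.0), 3))    # steep cell with no upslope area
print(round(twi(500, 2.0), 3))   # gentle cell draining 500 upslope cells
```

Steep cells and cells with little upslope area receive lower TWI values, consistent with the assumption that water accumulates where slope is low and contributing area is high.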

Soil type and lithology are important factors in assessing groundwater potential. Soil types affect water infiltration and storage, while lithology affects subsurface permeability and porosity (Arabameri et al., 2021). In the Guanzhong Basin, soil types were categorized into seven types: paddy soils, brown earths, cinnamon soils, fluvo-aquic soils, dark loessial soils, cultivated loessial soils, and gray-cinnamon soils (Fig. 4a). Lithology was divided into seven categories: intrusive rock, Lower Quaternary, Upper Quaternary, Tertiary, Paleozoic, Proterozoic, and Archean (Fig. 4b).

Land cover is another factor that influences groundwater potential. The land cover types in the Guanzhong Basin include cropland, forest, grassland, shrub, wetland, aquatic, artificial surface, and bareland (Fig. 4c) (data from http://www.globallandcover.com/). Changes in land use can significantly affect groundwater recharge and discharge, and thus groundwater potential. For example, deforestation can reduce groundwater recharge by decreasing the interception and infiltration of precipitation. Conversely, land cover types such as wetlands and grasslands can enhance groundwater recharge by increasing infiltration and reducing surface runoff.

NDVI is a remote sensing-based conditioning factor that measures the density of green vegetation cover. NDVI is calculated using the spectral reflectance values of the red and near-infrared bands of remote sensing data. NDVI values range from − 1 to 1, with higher values indicating denser vegetation cover (Han et al., 2021). Vegetation cover can influence groundwater recharge by reducing surface runoff and increasing infiltration. In the Guanzhong Basin, the NDVI ranges from 0.18 to 0.90 (Fig. 4d) (data from https://www.resdc.cn/).
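NDVI is the normalized difference of near-infrared (NIR) and red reflectance, NDVI = (NIR − Red)/(NIR + Red). A minimal per-pixel sketch follows, assuming reflectances scaled to [0, 1]; returning 0 for pixels where both bands are zero is a hypothetical convention, not something the source specifies.

```python
def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red) from surface reflectances in [0, 1]."""
    denom = nir + red
    if denom == 0:
        return 0.0  # hypothetical convention for pixels with no signal in either band
    return (nir - red) / denom


print(round(ndvi(0.5, 0.1), 3))  # vegetated pixel: strong NIR, weak red reflectance
print(round(ndvi(0.1, 0.5), 3))  # non-vegetated pixel: negative NDVI
```

Healthy vegetation reflects strongly in the near-infrared and absorbs red light, which is why denser cover pushes the index toward 1.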

Methodology

In this study, we first divided the study area into a grid of 300 × 300 m cells, yielding 230,224 points. The values of the 14 factors influencing groundwater potential in the Guanzhong Basin were extracted at these points, producing a 230,224 × 14 database of conditioning factors. A 205 × 15 sample dataset, consisting of the 14 factors plus a column of result values, was created and used to train and validate the RF, XGB, and LCE algorithms. Iterative parameter sweeps and cross-validation were used to obtain the optimal model parameters. The three models were then applied to the 230,224-point database, and the results were converted into raster data (Wang et al., 2022). The best model was identified by comparing the model results. Figure 5 shows the multi-phase methodological framework employed in this study.
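As a rough check on the grid size, 20,722.63 km² divided by the 0.09 km² area of one 300 × 300 m cell gives about 230,251 cells, close to the 230,224 points used here (the small difference presumably comes from how cells along the basin boundary were treated). The sketch below shows one way to flatten co-registered factor rasters into the point-by-factor matrix and to write per-point predictions back into a raster. It assumes that all factor rasters share one grid and that a boolean mask marks cells inside the basin; the function names are hypothetical.

```python
def stack_factors(factor_rasters, mask):
    """Flatten co-registered rasters into one feature row per in-basin cell.

    factor_rasters: list of 2-D lists (rows x cols), one per factor, same shape.
    mask: 2-D list of booleans, True where the cell lies inside the basin.
    Returns (features, positions), where positions[i] is the (row, col) of features[i].
    """
    features, positions = [], []
    for r, row in enumerate(mask):
        for c, inside in enumerate(row):
            if inside:
                features.append([raster[r][c] for raster in factor_rasters])
                positions.append((r, c))
    return features, positions


def to_raster(values, positions, shape, nodata=None):
    """Scatter per-point predictions back onto a grid; cells outside the mask get nodata."""
    rows, cols = shape
    grid = [[nodata] * cols for _ in range(rows)]
    for v, (r, c) in zip(values, positions):
        grid[r][c] = v
    return grid


# Toy 2 x 3 grid with two factors (illustrative values only)
mask = [[True, False, True], [True, True, False]]
slope = [[1, 2, 3], [4, 5, 6]]
ndvi_grid = [[10, 20, 30], [40, 50, 60]]
features, positions = stack_factors([slope, ndvi_grid], mask)
prediction_raster = to_raster([0.1, 0.2, 0.3, 0.4], positions, (2, 3))
```

Keeping the `positions` list alongside the feature matrix is what allows the model output for each point to be placed back in its original cell.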

Random forest (RF)

RF is a powerful machine learning algorithm used for tasks such as classification and regression (Breiman, 2001). It is an ensemble learning method that combines multiple decision trees to create a more robust and accurate model (Paul et al., 2018). The basic principle of RF is to build many decision trees on randomly sampled subsets of the training data and combine their predictions. This randomness keeps the trees diverse and reduces overfitting to the training data. The RF algorithm involves the following steps (Breiman, 2001):

Step 1: Randomly select a subset of the training data (with replacement) to build a decision tree. This subset is known as the bootstrap sample.

Step 2: At each node in the decision tree, randomly select a subset of features to use for splitting. This subset is known as the random feature subset.

Step 3: Repeat steps 1 and 2 to create multiple decision trees.

Step 4: To make a prediction for a new instance, pass it through each decision tree in the forest and take the majority vote of the predictions. For regression tasks, the average of the predictions is taken.
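Steps 1–4 can be illustrated with a deliberately small random forest in pure Python. To stay short, each tree is a single split (a depth-1 tree, or stump) chosen by Gini impurity, so the feature subset of Step 2 is drawn once per tree; a practical model would grow deeper trees, for example with scikit-learn’s `RandomForestClassifier`. Function names and defaults are illustrative.

```python
import math
import random
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    """Most common label (ties broken by first occurrence)."""
    return Counter(labels).most_common(1)[0][0]


def fit_stump(X, y, features):
    """Depth-1 tree: the single split over `features` with the lowest weighted Gini."""
    best = None
    for f in features:
        for thr in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= thr]
            right = [yi for row, yi in zip(X, y) if row[f] > thr]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or score < best[0]:
                best = (score, f, thr, majority(left), majority(right))
    if best is None:  # no usable split: behave as a constant leaf
        label = majority(y)
        return (0, math.inf, label, label)
    return best[1:]   # (feature, threshold, left label, right label)


def fit_forest(X, y, n_trees=25, max_features=None, seed=0):
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    k = max_features or max(1, int(math.sqrt(d)))   # "sqrt" rule by default
    forest = []
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]  # Step 1: bootstrap sample
        feats = rng.sample(range(d), k)              # Step 2: random feature subset
        forest.append(fit_stump([X[i] for i in boot], [y[i] for i in boot], feats))
    return forest                                    # Step 3: many trees


def predict_forest(forest, row):
    votes = [left if row[f] <= thr else right for f, thr, left, right in forest]
    return majority(votes)                           # Step 4: majority vote


# Toy data: feature 0 separates the classes, feature 1 is noise
X = [[i, (i * 7) % 5] for i in range(20)]
y = [1 if i >= 10 else 0 for i in range(20)]
forest = fit_forest(X, y, n_trees=25, max_features=2, seed=1)
print(predict_forest(forest, [2, 0]), predict_forest(forest, [17, 3]))
```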

The RF algorithm is a highly flexible and versatile tool that finds application in various fields, including finance, medicine, and environmental science (Gislason et al., 2006; Wang et al., 2020). Its capabilities have also been leveraged in groundwater potential prediction (Naghibi et al., 2015a).

Extreme gradient boosting (XGB)

XGB is a scalable gradient boosting algorithm that supports both tree and linear base learners and performs parallel computation on a single machine (Chen & Guestrin, 2016). It is faster than other gradient boosting algorithms because it uses a block structure for parallel learning. It also uses a distributed weighted quantile sketch algorithm to handle weighted data. XGB minimizes an objective function consisting of a loss function and a regularization term, which can be written as (Ibrahem Ahmed Osman et al., 2021):

$$obj\left(\theta\right)=\sum_{i=1}^{n}l\left(y_i,\hat{y}_i\right)+\sum_{k=1}^{K}\Omega\left(f_k\right)$$

(3)

where θ is the set of model parameters, n is the number of training samples, the two arguments of l are the observed and predicted values of the i-th sample, l is a differentiable convex loss function that measures how well the model fits the data, Ω is a regularization term that controls the complexity of each tree f_k, and K is the number of trees. The regularization term Ω(f_k) is defined as:

$$\Omega\left(f_k\right)=\gamma T+\frac{1}{2}\lambda\lVert w\rVert^2$$

(4)

where T is the number of leaves in tree f_k, w is a vector of leaf scores, γ is a parameter that penalizes the number of leaves, and λ is a parameter that penalizes large leaf scores.
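Eqs. (3) and (4) can be evaluated directly. The sketch below assumes the logistic (log) loss as l, which suits the binary labels used in this study, and represents each tree only by its vector of leaf scores w; the γ and λ values in the example are illustrative.

```python
import math


def log_loss(y, p, eps=1e-12):
    """Logistic loss for a binary label y in {0, 1} and predicted probability p."""
    p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def omega(leaf_weights, gamma, lam):
    """Eq. (4): gamma * T + 0.5 * lambda * ||w||^2 for one tree."""
    T = len(leaf_weights)
    return gamma * T + 0.5 * lam * sum(w * w for w in leaf_weights)


def objective(y_true, y_prob, trees, gamma=0.0, lam=1.0):
    """Eq. (3): loss summed over samples plus complexity summed over trees."""
    loss = sum(log_loss(y, p) for y, p in zip(y_true, y_prob))
    return loss + sum(omega(w, gamma, lam) for w in trees)


# One tree with three leaves: 0.1 * 3 + 0.5 * 1.0 * (0.25 + 0.25 + 1.0)
print(omega([0.5, -0.5, 1.0], gamma=0.1, lam=1.0))
```

Raising γ penalizes trees with many leaves, and raising λ shrinks large leaf scores; both make each added tree more conservative.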

The loss function measures how well the model fits the data, and the regularization term controls the complexity of the model. XGB uses gradient descent to update the model parameters based on the gradients of the objective function. The gradient descent update rule can be written as:

$${\theta }^{\left(t+1\right)}={\theta }^{\left(t\right)}-{\eta g}^{\left(t\right)}$$

(5)

where t is the iteration index, η is the learning rate, and g^(t) is the vector of partial derivatives of the objective function with respect to each element of θ. In XGB, each element of θ corresponds to a leaf score in one tree. XGB grows trees sequentially, adding one tree at a time that fits the current pseudo-residuals (the negative gradients of the objective function). Each tree is a weak learner that makes a small improvement over the previous prediction, and the final prediction is a weighted sum of all the trees.
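The sequential fitting described above can be sketched for a single feature. This is a simplified illustration, not the XGBoost library: each tree is a single split, the loss is the logistic loss, and, following Chen and Guestrin (2016), each leaf weight is set to −G/(H + λ), where G and H are the sums of the first and second derivatives of the loss over the samples in that leaf. The learning rate η (here `eta`) shrinks each tree’s contribution; the toy data and defaults are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_boosted_stumps(x, y, n_rounds=50, eta=0.3, lam=1.0):
    """Boost depth-1 trees on one feature under the logistic loss.

    Each round uses the first (g) and second (h) derivatives of the loss at the
    current raw scores, picks the split with the largest gain, sets each leaf
    weight to -G / (H + lam), and adds eta times that weight to the scores.
    """
    scores = [0.0] * len(x)              # raw scores; probability = sigmoid(score)
    trees = []
    candidates = sorted(set(x))[:-1]     # split points between distinct values
    for _ in range(n_rounds):
        p = [sigmoid(s) for s in scores]
        g = [pi - yi for pi, yi in zip(p, y)]   # gradient of log loss w.r.t. score
        h = [pi * (1.0 - pi) for pi in p]       # Hessian of log loss w.r.t. score
        best = None
        for thr in candidates:
            GL = sum(gi for xi, gi in zip(x, g) if xi <= thr)
            HL = sum(hi for xi, hi in zip(x, h) if xi <= thr)
            GR, HR = sum(g) - GL, sum(h) - HL
            gain = GL ** 2 / (HL + lam) + GR ** 2 / (HR + lam) - (GL + GR) ** 2 / (HL + HR + lam)
            if best is None or gain > best[0]:
                best = (gain, thr, -GL / (HL + lam), -GR / (HR + lam))
        if best is None:                 # fewer than two distinct x values
            break
        _, thr, w_left, w_right = best
        trees.append((thr, eta * w_left, eta * w_right))  # store shrunk leaf values
        scores = [s + (eta * w_left if xi <= thr else eta * w_right)
                  for s, xi in zip(scores, x)]
    return trees


def predict_proba(trees, xi):
    """Sum the (already shrunk) leaf values of all trees, then apply the sigmoid."""
    return sigmoid(sum(wl if xi <= thr else wr for thr, wl, wr in trees))


x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]
trees = fit_boosted_stumps(x, y)
print(round(predict_proba(trees, 1.5), 3), round(predict_proba(trees, 7.5), 3))
```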

XGB is a popular and efficient ensemble model that has been applied in various domains (Rasool et al., 2022). In addition to RF, we used XGB to predict groundwater potential in the Guanzhong Basin.

Local cascade ensemble (LCE)

The LCE algorithm is a recent machine learning method that enhances the prediction performance of RF and XGB by combining their strengths and adopting a complementary diversification approach (Fauvel et al., 2022). It can be used for both classification and regression. The LCE algorithm consists of two main steps. First, it trains an RF model on the original dataset and uses its predictions as new features for each instance. Second, it trains an XGB model on a subset of instances selected by their proximity to the decision boundary of the RF, measured by a score function that depends on the number of trees that agree on the predicted class for each instance. The final LCE prediction combines the predictions of both models in a weighted average, given by (Fauvel et al., 2022):

$$\hat{y}=\alpha\hat{y}_{RF}+\left(1-\alpha\right)\hat{y}_{XGB}$$

(6)

where ŷ is the final prediction, ŷ_RF is the prediction of RF, ŷ_XGB is the prediction of XGB, and α is a weight parameter that controls the balance between the two models.
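Eq. (6), as written here, is a convex combination of the two models’ predicted probabilities. The short sketch below illustrates only that formula; it is not the reference LCE implementation of Fauvel et al. (2022), and the α and probability values are hypothetical.

```python
def lce_blend(p_rf, p_xgb, alpha):
    """Eq. (6): alpha * RF prediction + (1 - alpha) * XGB prediction."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * p_rf + (1.0 - alpha) * p_xgb


# Hypothetical per-point probabilities from the two models, blended with alpha = 0.3
p_rf = [0.55, 0.20]
p_xgb = [0.90, 0.10]
blended = [lce_blend(a, b, 0.3) for a, b in zip(p_rf, p_xgb)]
print(blended)
```

With α = 1 the blend reduces to the RF prediction and with α = 0 to the XGB prediction, so α directly trades off the two models.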

Results and discussion

Application of the three models

To make the models as accurate as possible and to avoid overfitting, we evaluated the performance of the three models, RF, XGB, and LCE, over repeated calculations with different parameter settings. Each model has dozens of parameters, but not all of them significantly affect accuracy, so we tuned only the key parameters. For the RF model, these are n estimators, max depth, min samples split, and max features (Breiman, 2001). For the XGB model, they are n estimators, learning rate, max depth, and subsample (Chen & Guestrin, 2016). For the LCE model, they are n estimators and max depth (Fauvel et al., 2022). Within a reasonable range of each parameter, we trained the model on the training set while stepping the parameter at a fixed interval, and scored its accuracy on the test set. This identified the parameter ranges that yield high accuracy scores. Restricting cross-validation and grid search to these ranges substantially improves computational efficiency and accuracy (Pedregosa et al., 2011). The parameter tests of the three models are shown in Figs. 6–8.
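The cross-validated grid search can be sketched without external libraries. The sketch assumes plain (unstratified) k-fold splitting and accuracy scoring, and takes the model as a pair of `fit`/`predict` callables; these names and the toy threshold model are hypothetical. In practice, scikit-learn’s `GridSearchCV` (optionally with `StratifiedKFold`) performs the same search over a `param_grid` dictionary, such as the parameter ranges chosen for RF, XGB, and LCE in the following paragraphs.

```python
import itertools
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and deal them into k folds of near-equal size."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def grid_search_cv(param_grid, X, y, fit, predict, k=5, seed=0):
    """Return the parameter combination with the highest mean k-fold accuracy."""
    names = sorted(param_grid)
    folds = k_fold_indices(len(X), k, seed)
    best_params, best_score = None, -1.0
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        scores = []
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            model = fit([X[i] for i in train], [y[i] for i in train], **params)
            correct = sum(predict(model, X[i]) == y[i] for i in held_out)
            scores.append(correct / len(held_out))
        mean = sum(scores) / len(scores)
        if mean > best_score:
            best_params, best_score = params, mean
    return best_params, best_score


# Toy model (hypothetical): classify as 1 when the single feature exceeds `threshold`
def fit_threshold(X_train, y_train, threshold):
    return threshold


def predict_threshold(model, row):
    return int(row[0] > model)


X = [[i] for i in range(20)]
y = [int(i >= 10) for i in range(20)]
best, score = grid_search_cv({"threshold": [2.5, 9.5, 15.5]}, X, y,
                             fit_threshold, predict_threshold, k=5)
print(best, score)
```

Narrowing each grid entry first, as done in this study, matters because the number of combinations evaluated is the product of the list lengths, each multiplied by k model fits.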

The RF parameter “n estimators” is the number of decision trees. Increasing the number of trees can enhance the generalization ability of the RF model, but it also raises computational cost and the risk of overfitting (Sexton & Laake, 2009). In this study, the accuracy of the RF model increased rapidly with the number of trees, peaked at about 150 trees, and then gradually declined (Fig. 6a). We therefore set the “n estimators” range of RF to 100–150. The parameter “max depth” sets the maximum depth of each tree. If it is too small, the model cannot capture the details of the data; if it is too large, the generalization ability of the model decreases. The accuracy of the RF model did not change once “max depth” reached 10 (Fig. 6b), so, to reduce computational cost, we set its range to 10–12. The “min samples split” parameter specifies the minimum number of samples required to split a node. As shown in Fig. 6c, accuracy gradually decreases as min samples split increases from 2 to 6, so we set this hyperparameter to 2. The “max features” parameter is the number of predictors the model examines at each split, which affects the diversity and accuracy of the trees. In this study, “sqrt” and “log2” performed about equally (Fig. 6d), so we kept both as candidates for the grid search.

The n estimators parameter of the XGB model is the number of boosting trees (Chen & Guestrin, 2016). As with RF, this value should be neither too large nor too small. We evaluated the model for values from 30 to 800 in steps of 1 and, based on the results in Fig. 7a, set the range of this parameter to 250–350. The learning rate controls the weight of each new tree added to the model. Generally, a smaller learning rate gives higher accuracy but requires more trees and longer training time. Figure 7b shows that accuracy stabilizes once the learning rate reaches 0.2, so we selected a learning rate range of 0.2–0.4. The max depth parameter controls the depth of each boosting tree. Figure 7c indicates that accuracy peaks at max depth values of 6 and 7 and does not change beyond 10, so we searched max depth over 5–8. The subsample parameter sets the fraction of training samples drawn at random for each tree, which can reduce the risk of overfitting; however, too small a fraction reduces accuracy by limiting the samples each tree learns from. We therefore searched subsample over 0.4–0.8.

Because LCE is a hybrid ensemble method, it includes nearly all the parameters of both the RF and XGB models (Fauvel et al., 2022). However, the two hyperparameters with the greatest impact are n estimators and max depth. The n estimators parameter denotes the number of base learners for each division of LCE. As Fig. 8a shows, accuracy improves little beyond 10 base learners and fluctuates considerably, so we chose a range of 9–12 for this parameter. The max depth parameter controls the maximum depth of the base learners; based on Fig. 8b, we set its range to 6–10.

Within these ranges, we performed a grid search with cross-validation on the training data to find the optimal hyperparameter combination for each model. The optimal values were n_estimators = 123, max_depth = 10, min_samples_split = 2, and max_features = log2 for RF; n_estimators = 304, learning_rate = 0.38, max_depth = 6, and subsample = 0.77 for XGB; and n_estimators = 10 and max_depth = 8 for LCE. The three models were then built with these hyperparameters and used to predict groundwater potential at each of the 230,224 points. For ease of comparison, the predictions were converted to raster data and divided into five classes at intervals of 0.2: very low [0.0–0.2), low [0.2–0.4), moderate [0.4–0.6), high [0.6–0.8), and very high [0.8–1.0). The resulting prediction maps are shown in Fig. 9.
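The five-class reclassification amounts to binning the predicted probabilities at 0.2, 0.4, 0.6, and 0.8. A minimal NumPy sketch, assuming the inputs are each model’s predicted probability of enriched groundwater; placing a value of exactly 1.0 in the very high class is our assumption, since the interval [0.8–1.0) as written excludes it.

```python
import numpy as np

# Class names and 0.2-wide half-open intervals follow the text.
CLASSES = ["very low", "low", "moderate", "high", "very high"]
EDGES = [0.2, 0.4, 0.6, 0.8]  # interior class boundaries


def classify_potential(prob):
    """Map predicted probabilities to the five groundwater potential classes.

    np.digitize (right=False) returns i with EDGES[i-1] <= p < EDGES[i], so a
    value equal to a boundary goes to the upper class, matching [a, b).
    A probability of exactly 1.0 lands in 'very high' (assumption).
    """
    idx = np.digitize(np.asarray(prob, dtype=float), EDGES)
    return np.array(CLASSES)[idx]


print(classify_potential([0.05, 0.2, 0.55, 0.79, 1.0]))
```

Applied to the 230,224 predicted probabilities, the same function yields the class raster that is mapped in Fig. 9.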

Figure 9 shows that the three models produce broadly similar spatial distributions of groundwater potential in the Guanzhong Basin. The central and southern parts of the basin have high and very high groundwater potential, whereas the northeastern and northwestern parts have low and very low groundwater potential. This pattern appears reasonable, given that the central part of the basin corresponds to Xi’an, the most densely populated area in the region. In addition, higher groundwater potential tends to occur on both sides of the Weihe River and its tributary, the Bahe River. The RF model appears more uncertain: it assigns few cells to the very low and very high classes, and its predictions are concentrated in the moderate class, which could limit its usefulness to decision-makers formulating water resource management policies. The XGB and LCE models, by contrast, more often predict clearly very high or very low groundwater potential and therefore give a sharper picture of regional groundwater potential.

Validation of groundwater potential maps

To assess how accurately the three models predict groundwater potential in the Guanzhong Basin, we used the receiver operating characteristic (ROC) curve and the area under the curve (AUC) (Arabameri et al., 2021). ROC and AUC are commonly used to evaluate binary classification models. The ROC curve plots the true positive rate (sensitivity) against the false positive rate (1 − specificity) across classification thresholds, and the AUC is the area under this curve (Chen et al., 2019b; Wang et al., 2020). A perfect classifier has an AUC of 1, meaning it separates the two classes completely; a model with an AUC of 0.5 performs no better than random guessing, and one with an AUC below 0.5 performs worse than random guessing. Because the training-set accuracies of all three models were high (close to or equal to 1), the ROC-AUC analysis focused on the test set, as shown in Fig. 10.
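The AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counting one half. The sketch below computes it that way and cross-checks the result against scikit-learn’s roc_auc_score; roc_curve returns the false and true positive rates that make up a plotted ROC curve. The labels and scores are toy values, not the study’s test set.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve


def pairwise_auc(y_true, scores):
    """AUC as P(score of a random positive > score of a random negative),
    counting ties as one half (the Mann-Whitney formulation)."""
    y_true = np.asarray(y_true)
    scores = np.asarray(scores, dtype=float)
    pos = scores[y_true == 1]
    neg = scores[y_true == 0]
    diff = pos[:, None] - neg[None, :]  # every positive-negative pair
    return float((np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / diff.size)


y_test = [0, 0, 1, 1]            # toy labels: 1 = enriched groundwater
p_test = [0.1, 0.4, 0.35, 0.8]   # toy predicted probabilities
fpr, tpr, thresholds = roc_curve(y_test, p_test)  # points on the ROC curve
print(pairwise_auc(y_test, p_test), roc_auc_score(y_test, p_test))  # 0.75 0.75
```

Three of the four positive-negative pairs are ranked correctly here, hence 0.75; an AUC of 0.5 would mean the pairs are ranked no better than by chance.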

As measured by AUC (Fig. 10), the XGB model achieved the highest accuracy (0.874), followed by RF (0.859) and LCE (0.810). The advantage of XGB may be attributed to its pruning of splits that contribute little gain and its control of model complexity, whereas RF grows unpruned trees on randomly selected feature subsets. In addition, gradient boosting can fit the data distribution and loss function more closely while limiting the risk of overfitting. Although the LCE model combines elements of XGB and RF, it did not improve accuracy and instead increased the computational load (Table 1). LCE required 7.7 s, 11 min 10.7 s, and 64 min 32.6 s for 1, 100, and 1000 runs, respectively, compared with 0.1 s, 18.1 s, and 2 min 45.3 s for XGB and 0.1 s, 20.8 s, and 9 min 6.3 s for RF. Because the computational time of LCE grows considerably with the number of runs, it can place a heavy burden on the overall workflow. LCE takes much longer than the other two models because it trains two layers of base learners, with RF in the first layer and XGB in the second. Furthermore, LCE makes multiple predictions per sample using different subsets of base learners and requires parameter tuning for each base learner. We therefore do not generally recommend LCE for predicting groundwater potential. XGB, by contrast, computes split gains in parallel within each boosting tree, which speeds up training, whereas RF parallelizes only across whole trees. Overall, XGB was the best model for predicting groundwater potential in the Guanzhong Basin.
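Timings such as those in Table 1 could be collected with a simple wall-clock loop. This sketch assumes that one “run” means building and fitting a fresh model once on the training data; the text does not specify exactly what a run includes. The synthetic data and the choice of the tuned RF settings are illustrative.

```python
import time

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


def time_repeated_fits(make_model, X, y, n_runs):
    """Wall-clock seconds needed to build and fit a fresh model n_runs times."""
    start = time.perf_counter()
    for _ in range(n_runs):
        make_model().fit(X, y)
    return time.perf_counter() - start


# Synthetic stand-in data; NOT the study data.
X, y = make_classification(n_samples=205, n_features=14, random_state=0)


def make_rf():
    # Tuned RF hyperparameters reported in the text.
    return RandomForestClassifier(n_estimators=123, max_depth=10,
                                  min_samples_split=2, max_features="log2",
                                  random_state=0)


timings = {n: time_repeated_fits(make_rf, X, y, n) for n in (1, 3)}
for n, secs in timings.items():
    print(f"{n} run(s): {secs:.2f} s")
```

Swapping make_rf for factories that build the other models gives comparable per-model timings on the same machine; absolute values depend on hardware and library versions.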

Table 1 Time consumed by the three models for 1, 100, and 1000 runs


Figure 11 shows the distribution of the 230,224 predicted points for the three models. The blue histogram gives the proportion of raster cells in each groundwater potential class. The larger the proportion of very low and very high groundwater potential, the clearer the model’s prediction of whether groundwater is present; conversely, a larger proportion of moderate groundwater potential makes the prediction more ambiguous. The RF predictions for the Guanzhong Basin fall mostly into the low, moderate, and high classes, which together cover 96.81% of the study area, while only about 3.2% is predicted as very low or very high. RF’s predicted groundwater potential is thus centered around 0.5, which is precisely the threshold separating the labels enriched groundwater (1) and lack of groundwater (0). Because RF assigns very similar probabilities to the two classes, its separation of the test set is not sufficiently reliable. In contrast, XGB predicted very high or very low groundwater potential for 61.18% of the study area, and LCE for 46.85%, indicating that these two models are more decisive in determining the groundwater potential of the basin.
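The proportions in the blue histogram amount to counting class labels over all prediction points. A minimal sketch with a toy list of ten labels (not the study’s 230,224 points); “decisive share” is our shorthand for the combined very low and very high proportion discussed above.

```python
import numpy as np

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def class_shares(labels):
    """Percentage of prediction points falling in each potential class."""
    labels = np.asarray(labels)
    return {c: float(100.0 * np.mean(labels == c)) for c in CLASSES}


def decisive_share(shares):
    """Combined percentage of the two extreme classes (very low + very high)."""
    return shares["very low"] + shares["very high"]


# Toy labels for 10 points (not the study data).
toy = ["moderate"] * 6 + ["low", "high", "very low", "very high"]
shares = class_shares(toy)
print(shares["moderate"], decisive_share(shares))  # 60.0 20.0
```

In these terms, the text reports a decisive share of about 3.2% for RF, 61.18% for XGB, and 46.85% for LCE.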

The green and red histograms show how the 205 samples are distributed among the groundwater potential classes across the Guanzhong Basin. The green histogram represents samples with scarce groundwater: the larger its share in the very low and low classes, the more accurate the model’s prediction of low groundwater potential. The red histogram represents samples with enriched groundwater: the larger its share in the very high and high classes, the more accurate the model’s prediction of high groundwater potential. For the RF, XGB, and LCE models, the proportions of scarce-groundwater samples in areas predicted to have very low or low potential were 57.14%, 66.67%, and 74.29%, respectively, whereas the proportions of enriched-groundwater samples in areas predicted to have very high or high potential were 33.66%, 69.31%, and 52.45%, respectively. Overall, LCE was more accurate in predicting low groundwater potential, and XGB was more accurate in predicting high groundwater potential. The credibility of the RF output, by contrast, is questionable: its predicted probabilities for enriched groundwater and for lack of groundwater are nearly identical, which leaves its results inconclusive.
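These sample proportions can be computed from the labels and predicted probabilities. This sketch reads each percentage as the share of all samples with a given label that fall in the corresponding zones (for example, scarce-groundwater samples with a probability below 0.4); that reading of the denominator is our assumption, and the toy data are illustrative.

```python
import numpy as np


def zone_hit_rates(y_true, prob, low_cut=0.4, high_cut=0.6):
    """Return (low_hit, high_hit) in percent.

    low_hit:  share of scarce-groundwater samples (label 0) predicted in the
              very low/low zones (prob < low_cut).
    high_hit: share of enriched-groundwater samples (label 1) predicted in the
              high/very high zones (prob >= high_cut).
    """
    y_true = np.asarray(y_true)
    prob = np.asarray(prob, dtype=float)
    low_hit = 100.0 * np.mean(prob[y_true == 0] < low_cut)
    high_hit = 100.0 * np.mean(prob[y_true == 1] >= high_cut)
    return float(low_hit), float(high_hit)


# Toy example: 4 scarce and 4 enriched samples with made-up probabilities.
y = [0, 0, 0, 0, 1, 1, 1, 1]
p = [0.1, 0.3, 0.5, 0.7, 0.9, 0.65, 0.62, 0.2]
print(zone_hit_rates(y, p))  # (50.0, 75.0)
```

A model whose probabilities cluster around 0.5, as described for RF, places many samples of both labels in the moderate zone and so scores low on both rates.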

Of the three ensemble models used to assess groundwater potential in the Guanzhong Basin, XGB achieved the highest accuracy and required the least computation time. Although LCE was better at predicting low groundwater potential, its AUC was lower and its computation time was more than 20 times that of XGB (Table 1). RF had intermediate accuracy and computational cost but produced vaguer results, mostly in the moderate class. In conclusion, XGB is the most suitable model for evaluating and predicting groundwater potential in the study area.

Conclusion

This study assessed the groundwater potential of the Guanzhong Basin, China, using GIS-based ensemble learning models. Fourteen influencing factors were considered: landform, slope, slope aspect, curvature, precipitation, evapotranspiration, distance to fault, distance to river, road density, TWI, soil type, lithology, land cover, and NDVI. Their values were extracted for 205 sample sets and for 230,224 points discretized from the study area. The 205 samples were divided into training and test sets at a ratio of 0.7:0.3. The three ensemble models (RF, XGB, and LCE) were trained and cross-validated on these samples, and their hyperparameters were tuned to obtain the best models, which were then applied to the 230,224 points to predict the groundwater potential of the basin. The AUC values of the RF, XGB, and LCE models were 0.859, 0.874, and 0.810, respectively. RF predictions were concentrated in areas of moderate groundwater potential, indicating some uncertainty in predicting the test set labels, whereas XGB and LCE performed better at identifying and distinguishing areas of very high and very low groundwater potential. For RF, XGB, and LCE, the proportions of samples without groundwater in areas predicted to have very low or low groundwater potential were 57.14%, 66.67%, and 74.29%, respectively, and the proportions of samples with abundant groundwater in areas predicted to have very high or high groundwater potential were 33.66%, 69.31%, and 52.45%, respectively. In terms of computational load, LCE required the most time and resources, and XGB the least.
Based on these results, XGB was the best model for predicting the groundwater potential of the Guanzhong Basin. This study provides valuable insights into groundwater potential assessment using GIS-based ensemble learning models, and its results can help policymakers and water resource management authorities manage groundwater sustainably in the Guanzhong Basin and similar regions.

Availability of data and materials

The datasets used and/or analyzed during the current study, together with the Python code, are available from the corresponding author on reasonable request.

References

Ahmad, I., Dar, M. A., Fenta, A., et al. (2021). Spatial configuration of groundwater potential zones using OLS regression method. Journal of African Earth Sciences, 177, 104147. https://doi.org/10.1016/j.jafrearsci.2021.104147
Ahmad, I., Hasan, H., Jilani, M. M., & Ahmed, S. I. (2023). Mapping potential groundwater accumulation zones for Karachi city using GIS and AHP techniques. Environmental Monitoring and Assessment, 195, 381. https://doi.org/10.1007/s10661-023-10971-x
Al-Abadi, A. M., Al-Temmeme, A. A., & Al-Ghanimy, M. A. (2016). A GIS-based combining of frequency ratio and index of entropy approaches for mapping groundwater availability zones at Badra–Al Al-Gharbi–Teeb areas, Iraq. Sustainable Water Resources Management, 2, 265–283. https://doi.org/10.1007/s40899-016-0056-5
Anand, B., Karunanidhi, D., & Subramani, T. (2021). Promoting artificial recharge to enhance groundwater potential in the lower Bhavani River basin of South India using geospatial techniques. Environmental Science and Pollution Research, 28, 18437–18456. https://doi.org/10.1007/s11356-020-09019-1
Arabameri, A., Pal, S. C., Rezaie, F., et al. (2021). Modeling groundwater potential using novel GIS-based machine-learning ensemble techniques. Journal of Hydrology: Regional Studies, 36, 100848. https://doi.org/10.1016/j.ejrh.2021.100848
Arabameri, A., Rezaei, K., Cerda, A., et al. (2019). GIS-based groundwater potential mapping in Shahroud plain, Iran. A comparison among statistical (bivariate and multivariate), data mining and MCDM approaches. Science of the Total Environment, 658, 160–177. https://doi.org/10.1016/j.scitotenv.2018.12.115
Arefin, R. (2020). Groundwater potential zone identification using an analytic hierarchy process in Dhaka City, Bangladesh. Environmental Earth Sciences, 79, 268. https://doi.org/10.1007/s12665-020-09024-0
Bei, N., Xiao, B., Meng, N., & Feng, T. (2016). Critical role of meteorological conditions in a persistent haze episode in the Guanzhong basin, China. Science of the Total Environment, 550, 273–284. https://doi.org/10.1016/j.scitotenv.2015.12.159
Bera, A., Mukhopadhyay, B. P., Chowdhury, P., et al. (2021). Groundwater vulnerability assessment using GIS-based DRASTIC model in Nangasai River basin, India with special emphasis on agricultural contamination. Ecotoxicology and Environmental Safety, 214, 112085. https://doi.org/10.1016/j.ecoenv.2021.112085
Breiman, L. (2001). Random forests. Machine Learning, 45, 5–32. https://doi.org/10.1023/A:1010933404324
Chen, M. (1986). Regional characteristics and assessment of groundwater resource in China. Journal of Natural Resources, 1, 18–27. https://doi.org/10.11849/zrzyxb.1986.01.004
Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, pp 785–794.
Chen, W., Panahi, M., Khosravi, K., et al. (2019a). Spatial prediction of groundwater potentiality using ANFIS ensembled with teaching-learning-based and biogeography-based optimization. Journal of Hydrology, 572, 435–448. https://doi.org/10.1016/j.jhydrol.2019.03.013
Chen, W., Tsangaratos, P., Ilia, I., et al. (2019b). Groundwater spring potential mapping using population-based evolutionary algorithms and data mining methods. Science of the Total Environment, 684, 31–49. https://doi.org/10.1016/j.scitotenv.2019.05.312
Cui, Y., & Shao, J. (2005). The role of ground water in arid/semiarid ecosystems, Northwest China. Groundwater, 43, 471–477. https://doi.org/10.1111/j.1745-6584.2005.0063.x
Díaz-Alcaide, S., & Martínez-Santos, P. (2019). Review: Advances in groundwater potential mapping. Hydrogeology Journal, 27, 2307–2324. https://doi.org/10.1007/s10040-019-02001-3
Doke, A. B., Zolekar, R. B., Patel, H., & Das, S. (2021). Geospatial mapping of groundwater potential zones using multi-criteria decision-making AHP approach in a hardrock basaltic terrain in India. Ecological Indicators, 127, 107685. https://doi.org/10.1016/j.ecolind.2021.107685
Duan, H., Deng, Z., Deng, F., & Wang, D. (2016). Assessment of groundwater potential based on multicriteria decision making model and decision tree algorithms. Mathematical Problems in Engineering, 2016, 2064575. https://doi.org/10.1155/2016/2064575
Farhat, B., Souissi, D., Mahfoudhi, R., et al. (2023). GIS-based multi-criteria decision-making techniques and analytical hierarchical process for delineation of groundwater potential. Environmental Monitoring and Assessment, 195, 285. https://doi.org/10.1007/s10661-022-10845-8
Fauvel, K., Fromont, E., Masson, V., et al. (2022). XEM: An explainable-by-design ensemble method for multivariate time series classification. Data Mining and Knowledge Discovery, 36, 917–957. https://doi.org/10.1007/s10618-022-00823-6
Fick, S. E., & Hijmans, R. J. (2017). WorldClim2: New 1-km spatial resolution climate surfaces for global land areas. International Journal of Climatology, 37, 4302–4315. https://doi.org/10.1002/joc.5086
Gislason, P. O., Benediktsson, J. A., & Sveinsson, J. R. (2006). Random forests for land cover classification. Pattern Recognition Letters, 27, 294–300. https://doi.org/10.1016/j.patrec.2005.08.011
Golkarian, A., Naghibi, S. A., Kalantar, B., & Pradhan, B. (2018). Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS. Environmental Monitoring and Assessment, 190, 149. https://doi.org/10.1007/s10661-018-6507-8
Han, J., Wang, J., Chen, L., et al. (2021). Driving factors of desertification in Qaidam Basin, China: An 18-year analysis using the geographic detector model. Ecological Indicators, 124, 107404. https://doi.org/10.1016/j.ecolind.2021.107404
Han, K., Zuo, R., Ni, P., et al. (2020). Application of a genetic algorithm to groundwater pollution source identification. Journal of Hydrology, 589, 125343. https://doi.org/10.1016/j.jhydrol.2020.125343
Ibrahem Ahmed Osman, A., Najah Ahmed, A., Chow, M. F., et al. (2021). Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Engineering Journal, 12, 1545–1556. https://doi.org/10.1016/j.asej.2020.11.011
Jansen, J. (2019). Drone based geophysical surveys for groundwater applications. In: 2019 groundwater week.
Jhariya, D. C., Khan, R., Mondal, K. C., et al. (2021). Assessment of groundwater potential zone using GIS-based multi-influencing factor (MIF), multi-criteria decision analysis (MCDA) and electrical resistivity survey techniques in Raipur city, Chhattisgarh, India. J Water Supply Res Technol-Aqua, 70, 375–400. https://doi.org/10.2166/aqua.2021.129
Jia, S., Zhu, W., Lü, A., & Yan, T. (2011). A statistical spatial downscaling algorithm of TRMM precipitation based on NDVI and DEM in the Qaidam basin of China. Remote Sensing of Environment, 115, 3069–3079. https://doi.org/10.1016/j.rse.2011.06.009
Jin, X., Guo, R., & Xia, W. (2013). Distribution of actual evapotranspiration over Qaidam basin, an arid area in China. Remote Sensing, 5, 6976–6996. https://doi.org/10.3390/rs5126976
Kaur, A., & Sood, S. K. (2020). Deep learning based drought assessment and prediction framework. Ecological Informatics, 57, 101067. https://doi.org/10.1016/j.ecoinf.2020.101067
Kong, F., Song, J., Zhang, Y., et al. (2019). Surface water-groundwater interaction in the Guanzhong section of the Weihe river basin, China. Groundwater, 57, 647–660. https://doi.org/10.1111/gwat.12854
Li, M., Sun, H., Singh, V. P., et al. (2019). Agricultural water resources management using maximum entropy and entropy-weight-based TOPSIS methods. Entropy, 21, 364. https://doi.org/10.3390/e21040364
Mo, S., Zabaras, N., Shi, X., & Wu, J. (2020). Integration of adversarial autoencoders with residual dense convolutional networks for estimation of non-Gaussian hydraulic conductivities. Water Resources Research, 56. https://doi.org/10.1029/2019WR026082
Naghibi, S. A., Pourghasemi, H. R., & Dixon, B. (2015a). GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environmental Monitoring and Assessment, 188, 44. https://doi.org/10.1007/s10661-015-5049-6
Naghibi, S. A., Pourghasemi, H. R., Pourtaghi, Z. S., & Rezaei, A. (2015). Groundwater qanat potential mapping using frequency ratio and Shannon’s entropy models in the Moghan watershed Iran. Earth Science Informatics, 8, 171–186. https://doi.org/10.1007/s12145-014-0145-7
Panahi, M., Sadhasivam, N., Pourghasemi, H. R., et al. (2020). Spatial prediction of groundwater potential mapping based on convolutional neural network (CNN) and support vector regression (SVR). Journal of Hydrology, 588, 125033. https://doi.org/10.1016/j.jhydrol.2020.125033
Paul, A., Mukherjee, D. P., Das, P., et al. (2018). Improved random forest for classification. IEEE Transactions on Image Processing, 27, 4012–4024. https://doi.org/10.1109/TIP.2018.2834830
Pedregosa, F., Varoquaux, G., Gramfort, A., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825–2830.
Pham, B. T., Jaafari, A., Phong, T. V., et al. (2021). Naïve Bayes ensemble models for groundwater potential mapping. Ecological Informatics, 64,
Rasool, U., Yin, X., Xu, Z., et al. (2022). Mapping of groundwater productivity potential with machine learning algorithms: A case study in the provincial capital of Baluchistan, Pakistan. Chemosphere, 303, 135265. https://doi.org/10.1016/j.chemosphere.2022.135265
Razandi, Y., Pourghasemi, H. R., Neisani, N. S., & Rahmati, O. (2015). Application of analytical hierarchy process, frequency ratio, and certainty factor models for groundwater potential mapping using GIS. Earth Science Informatics, 8, 867–883. https://doi.org/10.1007/s12145-015-0220-8
Reichstein, M., Camps-Valls, G., Stevens, B., et al. (2019). Deep learning and process understanding for data-driven Earth system science. Nature, 566, 195–204. https://doi.org/10.1038/s41586-019-0912-1
Ren, X., Li, P., He, X., et al. (2021). Hydrogeochemical processes affecting groundwater chemistry in the central part of the Guanzhong basin, China. Archives of Environmental Contamination and Toxicology, 80, 74–91. https://doi.org/10.1007/s00244-020-00772-5
Rizeei, H. M., Pradhan, B., Saharkhiz, M. A., & Lee, S. (2019). Groundwater aquifer potential modeling using an ensemble multi-adoptive boosting logistic regression technique. Journal of Hydrology, 579, 124172. https://doi.org/10.1016/j.jhydrol.2019.124172
Sexton, J., & Laake, P. (2009). Standard errors for bagged and random forest estimators. Computational Statistics & Data Analysis, 53, 801–811. https://doi.org/10.1016/j.csda.2008.08.007
Shamsudduha, M., & Taylor, R. G. (2020). Groundwater storage dynamics in the world’s large aquifer systems from GRACE: Uncertainty and role of extreme precipitation. Earth System Dynamics, 11, 755–774. https://doi.org/10.5194/esd-11-755-2020
Singh, S. K., Zeddies, M., Shankar, U., & Griffiths, G. A. (2019). Potential groundwater recharge zones within New Zealand. Geoscience Frontiers, 10, 1065–1072. https://doi.org/10.1016/j.gsf.2018.05.018
Sørensen, R., & Seibert, J. (2007). Effects of DEM resolution on the calculation of topographical indices: TWI and its components. Journal of Hydrology, 347, 79–89. https://doi.org/10.1016/j.jhydrol.2007.09.001
Sun, A. Y., Scanlon, B. R., Zhang, Z., et al. (2019). Combining physically based modeling and deep learning for fusing GRACE satellite data: Can we learn from mismatch? Water Resources Research, 55, 1179–1195. https://doi.org/10.1029/2018WR023333
Sun, X., Zhou, Y., Yuan, L., et al. (2021). Integrated decision-making model for groundwater potential evaluation in mining areas using the cusp catastrophe model and principal component analysis. Journal of Hydrology: Regional Studies, 37,
Tegegne, A. M. (2022). Applications of convolutional neural network for classification of land cover and groundwater potentiality zones. Journal of Engineering, 2022, 6372089. https://doi.org/10.1155/2022/6372089
Velis, M., Conti, K. I., & Biermann, F. (2017). Groundwater and human development: Synergies and trade-offs within the context of the sustainable development goals. Sustainability Science, 12, 1007–1017. https://doi.org/10.1007/s11625-017-0490-9
Wang, Y., Guo, H., Li, J., et al. (2008). Investigation and assessment of groundwater resources and their environmental issues in the Qaidam basin. Geology Press.
Wang, Z., Liu, Q., & Liu, Y. (2020). Mapping landslide susceptibility using machine learning algorithms and GIS: A case study in Shexian county, Anhui province, China. Symmetry-Basel, 12, 1954. https://doi.org/10.3390/sym12121954
Wang, Z., Wang, J., & Han, J. (2022). Spatial prediction of groundwater potential and driving factor analysis based on deep learning and geographical detector in an arid endorheic basin. Ecological Indicators, 142, 109256. https://doi.org/10.1016/j.ecolind.2022.109256
Xu, P., Zhang, Q., Qian, H., et al. (2019). Characterization of geothermal water in the piedmont region of Qinling mountains and Lantian-Bahe group in Guanzhong Basin China. Environmental Earth Sciences, 78, 442. https://doi.org/10.1007/s12665-019-8418-6
Zaree, M., Javadi, S., & Neshat, A. (2019). Potential detection of water resources in karst formations using APLIS model and modification with AHP and TOPSIS. Journal of Earth System Science, 128, 76. https://doi.org/10.1007/s12040-019-1119-4
Zhang, Q., Li, P., Lyu, Q., et al. (2022). Groundwater contamination risk assessment using a modified DRATICL model and pollution loading: A case study in the Guanzhong basin of China. Chemosphere, 291, 132695. https://doi.org/10.1016/j.chemosphere.2021.132695
Zhang, Y., Jia, R., Wu, J., et al. (2021). Evaluation of groundwater using an integrated approach of entropy weight and stochastic simulation: A case study in East Region of Beijing. International Journal of Environmental Research and Public Health, 18, 7703. https://doi.org/10.3390/ijerph18147703
Zhou, H., Gómez-Hernández, J. J., & Li, L. (2014). Inverse methods in hydrogeology: Evolution and recent trends. Advances in Water Resources, 63, 22–37. https://doi.org/10.1016/j.advwatres.2013.10.014


Funding

The study was jointly supported by the Second Tibetan Plateau Scientific Expedition and Research Program (2019QZKK0805-02), the Key Deployment Projects of the Chinese Academy of Sciences (ZDRW-ZS-2020–3), the National Natural Science Foundation of China (U20A2088), the Innovation Team Foundation of Qinghai Office of Science and Technology (2022-ZJ-903), and the Kunlun Talented People of Qinghai Province, High-end Innovation and Entrepreneurship Talents-Leading Talents (E140DZ3901).

Author information

Authors and Affiliations

Key Laboratory of Comprehensive and Highly Efficient Utilization of Salt Lake Resources, Qinghai Institute of Salt Lakes, Chinese Academy of Sciences, Xining, 810008, China
Zitao Wang, Jianping Wang & Dongmei Yu
Qinghai Provincial Key Laboratory of Geology and Environment of Salt Lakes, Xining, 810008, China
Zitao Wang, Jianping Wang & Dongmei Yu
University of Chinese Academy of Sciences, Beijing, 100049, China
Zitao Wang & Dongmei Yu
School of Earth and Environment, Anhui University of Science and Technology, Huainan, 232001, China
Kai Chen


Contributions

Zitao Wang contributed to the study’s methodology, performed the formal analysis, drafted the original manuscript, and created the visualizations. Jianping Wang conceptualized the study and reviewed and edited the manuscript. Dongmei Yu was responsible for data curation. Kai Chen contributed to the validation of the study’s findings. All authors reviewed and approved the final manuscript for submission.

Corresponding author

Correspondence to Jianping Wang.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Highlights

• The groundwater potential in the Guanzhong Basin was assessed using three ensemble learning models: Random Forest (RF), Extreme Gradient Boosting (XGB), and Local Cascade Ensemble (LCE).

• The XGB model was found to be the most accurate, with an AUC value of 0.874, followed by the RF model with an AUC of 0.859, and the LCE model with an AUC of 0.810.

• The XGB model required the least amount of computational resources and achieved the highest accuracy, making it the most practical option for predicting groundwater potential in the study area.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Cite this article

Wang, Z., Wang, J., Yu, D. et al. Groundwater potential assessment using GIS-based ensemble learning models in Guanzhong Basin, China. Environ Monit Assess 195, 690 (2023). https://doi.org/10.1007/s10661-023-11388-2


Received: 04 March 2023
Accepted: 11 May 2023
Published: 18 May 2023
DOI: https://doi.org/10.1007/s10661-023-11388-2

Keywords
