Abstract
Floods occur when high-intensity, long-duration rainfall, often accompanied by snowmelt, flows out of the main river channel onto flood-prone areas, damaging buildings, roads, and facilities and causing loss of life. A flood susceptibility map is an efficient decision-making tool for flood control. This study aims to implement the extreme gradient boosting (EGB) method for the first time in flood susceptibility modelling and to compare its performance with three advanced benchmark models: Frequency Ratio (FR), Random Forest (RF), and Generalized Additive Model (GAM). To do this, altitude, slope degree, profile curvature, topographic wetness index (TWI), distance from rivers, normalized difference vegetation index, plan curvature, rainfall, land use, stream power index, and lithology were fed to the models. To run the models, 243 flood locations were identified from field surveys and national reports. The same number of locations was randomly created in the study region and treated as non-flood locations. The flood and non-flood locations were split into a training dataset (70%) and a testing dataset (30%). Both flood and non-flood locations were fed into the models, and output flood susceptibility maps were produced. The receiver operating characteristic (ROC) curve was used to evaluate the performance of the algorithms. The results show that the RF and EGB models performed best, with areas under the ROC curve (AUC) of 0.985 and 0.980, respectively, followed by the GAM and FR algorithms with AUC values of 0.97 and 0.953, respectively. The variable importance results of the RF model show that distance from rivers has an important influence on flood susceptibility mapping (FSM), followed by profile curvature, slope, TWI, and altitude. Given the high performance of the RF and EGB models in flood susceptibility modelling, these models are recommended for such studies.
Introduction
In the recent past, floods have occurred more frequently as a result of climate change, including variations in air temperature and in rainfall amount and intensity. Apart from the increase in flood frequency, inappropriate land use planning and management has increased both damage costs and loss of life. To manage the situation and reduce or even prevent the damage, it is essential to first determine the flood-prone areas (Lee et al. 2017).
Given the complicated hydrological features of watersheds and the ever-increasing anthropogenic impacts, floods are hard to predict using simple non-linear algorithms (Khosravi et al. 2018). For this reason, machine learning and statistical models have been implemented in flood prediction, landslide and gully susceptibility, and groundwater potential studies because of their higher efficiency (Bui et al. 2019; Chen et al. 2020b, 2020c; Li and Chen 2020; Zhao and Chen 2020a, 2020b).
Some examples of these models are: artificial neural networks (Sahoo et al. 2006; Youssef et al. 2011), support vector machines (Shafapour et al. 2015), logistic regression (Nandi et al. 2016), evidential belief function and decision trees (Rahmati and Pourghasemi 2017), frequency ratio (Rahmati et al. 2016), random forest and boosted-tree (Lee et al. 2017), Genetic Algorithm Rule-Set Production (GARP) and Quick Unbiased Efficient Statistical Tree (QUEST) (Darabi et al. 2019), weakly labeled support vector machine (WELLSVM) (Zhao et al. 2019), reduced-error pruning trees (REPTree) with Bagging (Bag-REPTree) and Random subspace (RS-REPTree) ensemble frameworks (Chen et al. 2019), classification and regression trees and alternating decision tree (Janizadeh et al. 2019), and alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP) and quadratic discriminant analysis (QDA) (Janizadeh et al. 2019). Additionally, other studies indicated that hybrid models had better performances than their single models. These include ensembles of Decision Tree, weights-of-evidence and support vector machines (Tehrany et al. 2014a, 2019), neuro-fuzzy systems integrated with metaheuristic algorithms (Bui et al. 2016; Termeh et al. 2018), logistic model tree with bagging ensembles (Chapi et al. 2017), swarm optimized neural networks (Ngo et al. 2018), RF, ANN, and SVM (Zhao et al. 2018), ensembles of evolutionary models and ANFIS (Hong et al. 2018), ensembles of multivariate discriminant analysis, CART, and SVM (Choubin et al. 2019), ensembles of multi-criteria decision making (Wang et al. 2019a), fuzzy rule based ensembles (Bui et al. 2019), and ensembles of RF, Stochastic Gradient Boosted Model, and Extreme Learning Machine (Shin et al. 2019).
A review of the literature shows that many kinds of algorithms have been used to model flood susceptibility, but newer and more advanced models are still needed to find the best way to control flood disasters, given their complicated behavior. Therefore, this study aims to apply the new EGB model to flood susceptibility mapping (FSM) and to compare its performance with three benchmark models, i.e., FR, RF, and GAM. The FR, RF, and GAM models have been successfully implemented in flood susceptibility modelling and various other fields (Rahmati et al. 2016; Golkarian et al. 2018; Motevalli et al. 2019; Naghibi et al. 2019a; Vafakhah et al. 2020). The fundamental advantage of the EGB is its use of the boosting method, which produces strong predictions by “combining several weak learners”. Application of the EGB can diminish the impact of the “over-fitting issue” in the final model and produce more generalized outputs.
Material and methods
First, the flood locations were determined from field surveys and national reports, and non-flood locations were produced with a “random-systematic” strategy. Then, the flood conditioning factors were prepared, and the flood and non-flood locations were divided into training and testing datasets. These datasets were used to model flood susceptibility. The output susceptibility maps were validated by the Accuracy and Kappa indices as well as the receiver operating characteristic (ROC) curve. A detailed methodology flow chart is shown in Fig. 1.
Study area
The Talar watershed is a mountainous region with an area of roughly 1765 km². Yousefi et al. (2017) showed that the Talar River has been affected by many floods in past years. The elevation in the Talar River watershed varies from 221 to 3944 m above sea level (a.s.l.), with an average value of 1966 m a.s.l. The average width of the Talar River at the outlet of the basin is about 25.5 m (Fig. 2). The land-use classes in the Talar watershed include bare land, agriculture, forest, rangeland, and residential areas (Fig. 3). The average annual rainfall and temperature in the Talar watershed are 610 mm and 11 °C, respectively. The Talar watershed has a Mediterranean climate. The main soil textures in the study region are loamy-silty, clay-silty, loamy-clay, and clay-loamy (Maghsood et al. 2019).
Flood dataset
To detect the flood locations in the Talar watershed, several field surveys were carried out in the lowland areas of the watershed. We also used the reports of the Mazandaran Regional Water Authority and information gathered from residents. In addition, hydrology and flood reports as well as the findings of Motevalli and Vafakhah (2016) and Yousefi et al. (2017) were used. Overall, 243 flood locations were detected in the study area. Because the machine learning models also need non-occurrence (in this study, non-flood) locations, 243 such locations were selected systematic-randomly. The points were first generated in ArcGIS and then inspected to check that they had been correctly selected. Following the literature in FSM (Tehrany et al. 2014b; Wang et al. 2019b; Chen et al. 2020c; Pourghasemi et al. 2020) and other geospatial sciences (Naghibi et al. 2018, 2019b, 2020; Motevalli et al. 2019; Li and Chen 2020), the presence and absence locations, i.e., the flood and non-flood locations, were divided into training and testing groups covering 70% and 30% of the points, respectively (Fig. 2).
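The 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the point identifiers, the `split_points` helper, the seeds, and the choice to split each class separately are all assumptions made for the example (the paper does not state whether the split was stratified by class).

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of points and split it into training and testing parts."""
    rng = random.Random(seed)
    shuffled = points[:]          # copy so the caller's list is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical identifiers standing in for the 243 flood and 243 non-flood points.
flood = [("flood", i) for i in range(243)]
non_flood = [("non_flood", i) for i in range(243)]

# Assumption: each class is split separately so both keep the 70/30 proportion.
flood_train, flood_test = split_points(flood, seed=42)
non_train, non_test = split_points(non_flood, seed=43)

train = flood_train + non_train
test = flood_test + non_test
print(len(flood_train), len(flood_test), len(train), len(test))
```

With 243 points per class, this yields 170 training and 73 testing points in each class.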
Flood conditioning factors
This study considered several flood susceptibility conditioning factors based on the literature (Tehrany et al. 2014a; Shafapour et al. 2015; Rahmati et al. 2016; Hong et al. 2018; Khosravi et al. 2018; Termeh et al. 2018; Vafakhah et al. 2020) and data availability. The input factors include altitude, slope, profile curvature, topographic wetness index (TWI), distance from rivers, normalized difference vegetation index (NDVI), plan curvature, rainfall, land use, stream power index (SPI), and lithology. The altitude of the study region was obtained from the ASTER-Global digital elevation model (DEM) having a 30×30 m spatial resolution. Generally, higher altitudes have high drainage density and low discharge, while the situation is different in the lowland areas. Slope impacts the water flow velocity over the ground surface and in the channels. This factor was calculated using DEM and is presented in Fig. 4b. The study area has slopes ranging from 0 to 69 degrees.
Plan and profile curvature were created using the DEM of the study region and used in the modelling process (Fig. 4c). These curvatures influence the water flow velocity as well as erosion and deposition processes (Fig. 4d).
SPI presents the river strength for the erosion process. SPI has a direct influence on flood occurrence because it increases with slope and upland watershed area (Lee et al. 2018).
SPI can be computed as follows (Dewan and Yamaguchi 2008) (Fig. 4e):
\[ \mathrm{SPI} = A_s \tan \beta \]
where A_s is the specific basin (catchment) area and β is the slope angle (in degrees) at each point of the basin.
TWI can be calculated as follows (Beven and Kirkby 1979) (Fig. 4f):
\[ \mathrm{TWI} = \ln\left(\frac{a}{\tan \beta}\right) \]
where a is the cumulative upslope area draining to a specific pixel, and β is the slope angle at that pixel.
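As a small worked illustration of the two indices, the sketch below assumes the commonly used forms SPI = A_s · tan β and TWI = ln(a / tan β), with the slope given in degrees. The function names, the sample values, and the `min_slope_deg` clamp for flat cells are assumptions for this example and are not taken from the study.

```python
import math

def spi(specific_area, slope_deg):
    """Stream power index, assuming SPI = A_s * tan(beta)."""
    return specific_area * math.tan(math.radians(slope_deg))

def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, assuming TWI = ln(a / tan(beta)).

    Flat cells have tan(0) = 0, which would divide by zero, so the slope is
    clamped to a small positive value (an assumption of this sketch).
    """
    slope = max(slope_deg, min_slope_deg)
    return math.log(upslope_area / math.tan(math.radians(slope)))

# Hypothetical cell: 100 units of contributing area on a 45-degree slope.
print(spi(100.0, 45.0))
print(twi(100.0, 45.0))

# For the same contributing area, a gentler slope raises TWI and lowers SPI.
print(spi(100.0, 5.0) < spi(100.0, 45.0), twi(100.0, 5.0) > twi(100.0, 45.0))
```

The last line reflects the opposite roles of slope in the two indices: steep cells concentrate erosive power, while gentle cells accumulate water.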
Distance from river influences the discharge and spread of the flooding in a given area (Wan et al. 2010; Glenn et al. 2012). Distance from river layer was created by the Euclidean distance function (Fig. 4g).
Land use and NDVI are indicators of land cover in an area. Land use was created by a “supervised learning algorithm”, which is a common way of classifying land use (Myint et al. 2011; Alganci et al. 2013; Kantakumar and Neelamsetti 2015; Basukala et al. 2017; Thakkar et al. 2017). We used Landsat OLI images from four dates (31 May 2017, 2 July 2017, 20 September 2017, and 22 October 2017) to derive land use maps with the Maximum Likelihood Classification algorithm (Kamali Maskooni et al. 2020). The full methodology and results of this part can be found in Mirzaei et al. (2020). The Talar River watershed was classified into five classes: rangeland, agriculture, forest, residential areas, and bare land (Fig. 4h). The vegetated parts of the watershed have a lower susceptibility to flooding because there is an inverse relationship between flood probability and vegetation cover (Tehrany et al. 2013). NDVI was computed from the red and infrared bands of a Landsat OLI-TIRS image acquired on 2 July 2017 (Row: 35, Path: 163) (Mirzaei et al. 2020).
Rainfall data were obtained from 14 raingauges and climatological stations in and around the study region (Table 1).
In this study, several interpolation methods were evaluated using ArcGIS software: universal and ordinary Kriging and Co-kriging with circular, spherical, exponential, Gaussian, Stable, J-Bessel, K-Bessel, Hole Effect, and Rational Quadratic models, as well as Inverse Distance Weighting (IDW), Radial Basis Function (RBF), Global Polynomial Interpolation (GPI), Local Polynomial Interpolation (LPI), and general and local estimators. After interpolating with these geostatistical and deterministic methods, the Root Mean Square Error (RMSE) index was used to compare them and select a suitable method. Results showed that, for annual rainfall, Ordinary Kriging with the J-Bessel model was the most appropriate. The spatial variation of annual rainfall is shown in Fig. 4k. The lithology of the study basin was obtained from the Geological Survey of Iran (GSI) (1997). Lithology affects soil permeability and has an important role in flooding and its magnitude. There are 26 different lithology classes in the study region (Table 2; Fig. 4j).
It should be noted that the RF, GAM, and EGB models used the continuous form of the factors, except for the categorical ones such as land use and lithology. To apply the FR method, however, the continuous factors had to be classified into distinct classes. To do so, NDVI was classified into five classes with an equal classification algorithm; plan and profile curvatures were classified into three classes of < −0.001, (−0.001)–(0.001), and > 0.001, representing convex, flat, and concave curvatures, respectively; and, since most of the floods occurred in the proximity of the rivers, distance from rivers was classified into five classes of 0–50, 50–100, 100–150, 150–200, and > 200 m to better distinguish the relationship between this factor and flood occurrence.
Classification models
Frequency ratio
FR was introduced by Bonham-Carter (1994) and is explained as the probability of incidence of a specific event. This model has been used in many studies in order to define the relationship between target factors such as flood, gully, forest fire, and groundwater spring and their conditioning factors. The output of the FR is simple and helps managers and stakeholders to understand the relationships between input and output factors (Nourani and Komasi 2013). FR can be calculated as below:
\[ FR = \frac{F / FF}{A / AA} \]
where F is the number of floods in each class, FF is the total number of floods in the study region, A is the number of pixels in each class, and AA is the total number of pixels in the study region. The final FR value of a pixel is obtained by summing the FR values of all factors. FR values are assigned to the pixels by the “lookup” function in ArcMap and summed by the “weightedsum” function.
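The FR calculation and the per-pixel summation can be sketched in a few lines. The ratio below assumes the usual form FR = (F/FF) / (A/AA) built from the four quantities defined above. The counts in the first example are hypothetical; the two small lookup tables reuse FR values reported in the FR results of this paper purely for illustration, and the `pixel_susceptibility` helper is an assumed stand-in for the ArcMap lookup and weighted-sum steps.

```python
def frequency_ratio(floods_in_class, total_floods, pixels_in_class, total_pixels):
    """FR = share of floods in a class divided by share of area in that class."""
    flood_share = floods_in_class / total_floods
    area_share = pixels_in_class / total_pixels
    return flood_share / area_share

# Hypothetical class: 25% of the floods fall on 10% of the pixels.
print(frequency_ratio(50, 200, 1000, 10000))

# Per-factor lookup tables (class label -> FR value).
fr_tables = {
    "altitude": {"220-1000": 4.7, "1000-1650": 1.4},
    "slope": {"0-2": 5.5, "15-70": 1.3},
}

def pixel_susceptibility(pixel_classes, tables):
    """Sum the FR values of the classes a pixel falls into, one per factor."""
    return sum(tables[factor][cls] for factor, cls in pixel_classes.items())

lowland_pixel = {"altitude": "220-1000", "slope": "0-2"}
print(pixel_susceptibility(lowland_pixel, fr_tables))
```

An FR above 1 means a class holds more floods than its share of the area would suggest, and an FR below 1 means fewer.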
Random forest
RF can be regarded as an ensemble algorithm built from several decision trees as predictors, and it is used for both classification and regression (Breiman 2001). RF is a flexible and strong algorithm that grows random trees on sets of cases drawn through a bootstrapping method. The cases that are not used in constructing a given tree are called out-of-bag (Catani et al. 2013; Hong et al. 2017). Two indices define the contribution of the factors in the RF model, namely “mean decrease accuracy and mean decrease Gini” (Naghibi et al. 2016). RF is appropriate for working with large data sets and produces satisfactory outputs (Arabameri et al. 2019). In RF, the outputs of the constructed trees are combined by voting to predict the target variable, in this case flood susceptibility. To run this model, the randomForest package in R was used, and the maps were prepared and classified in ArcMap 10.2.
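The bootstrapping, out-of-bag, and voting ideas mentioned above can be illustrated without any tree-fitting code. The sketch below is a simplified illustration and does not reproduce the randomForest package; the helper names, the seed, and the sample size of 340 cases are assumptions chosen for the example.

```python
import random
from collections import Counter

def bootstrap_indices(n_cases, rng):
    """Draw n_cases indices with replacement, as done for each tree."""
    return [rng.randrange(n_cases) for _ in range(n_cases)]

def out_of_bag(n_cases, sample):
    """Cases never drawn into the bootstrap sample of a given tree."""
    drawn = set(sample)
    return [i for i in range(n_cases) if i not in drawn]

def majority_vote(votes):
    """Combine the class predictions of the individual trees."""
    return Counter(votes).most_common(1)[0][0]

rng = random.Random(0)
n_cases = 340                      # hypothetical training-set size
sample = bootstrap_indices(n_cases, rng)
oob = out_of_bag(n_cases, sample)

# On average roughly 37% of the cases are left out of each bootstrap sample.
print(round(len(oob) / n_cases, 2))
print(majority_vote(["flood", "flood", "non-flood"]))
```

Because each tree has its own out-of-bag cases, they act as a built-in validation set for that tree.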
Generalized additive model
GAM is categorized as a “semi-parametric” regression method (Hastie and Tibshirani 1990; Chambers and Hastie 1992). Response curves of this model are predicted by smooth functions; this leads to an extensive variety of response curves to be predicted (Maggini et al. 2006; Pourtaghi et al. 2016). An advantage of the GAM is that it could be interpreted easily, unlike other data mining, black-box, complex models (Goetz et al. 2011). GAM is able to model non-linear features that are influenced by many factors like flood susceptibility (Petschko et al. 2014). The main difference between the generalized linear model and GAM is that the first one implements the parametric impact of solitary variables, while the second one has smoother additive terms (Vorpahl et al. 2012). GAM was applied using caret and mgcv packages in R software.
Extreme gradient boosting
The EGB method, introduced by Chen and Guestrin (2016), is a new implementation of the “gradient boosting machine”. EGB is founded on “boosting”, which can be explained as creating a “strong learner” by combining the outputs of several “weak learners” (Fan et al. 2018a). The EGB attempts to tune the parameters without over-fitting the model. The optimization procedure in EGB begins by fitting a first learner to the whole dataset and continues by fitting each subsequent model to the residuals. The procedure finishes when it reaches the “stopping criteria” (Fan et al. 2018a). EGB also uses parallel processing, which reduces the required computation time (Fan et al. 2018b; Naghibi et al. 2020). It also copes better than other algorithms with missing data in the dataset. To apply the EGB, we used the caret package in the R statistical software.
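The idea of fitting each new learner to the residuals of the current model can be sketched with one-split trees (stumps) and a squared-error loss. This is plain gradient boosting with a learning rate, shown only to illustrate the residual-fitting loop; it omits the regularized objective, second-order terms, and parallelism that distinguish EGB. All names and the toy data (a hypothetical distance from the river and a 0/1 flood label) are assumptions for this example.

```python
def fit_stump(x, residuals):
    """Pick the threshold on x whose two leaf means best fit the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:   # last value would leave the right leaf empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean

def boost(x, y, rounds=100, eta=0.3):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        # Shrink each update by the learning rate eta.
        predictions = [pi + eta * predict_stump(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, eta, stumps

def predict(model, xi):
    base, eta, stumps = model
    return base + eta * sum(predict_stump(s, xi) for s in stumps)

# Toy data: hypothetical distance from the river (m) and flood label.
distance = [10, 30, 60, 120, 250, 400]
flooded = [1, 1, 1, 0, 0, 0]
model = boost(distance, flooded)
print(round(predict(model, 20), 3), round(predict(model, 300), 3))
```

Each round removes a fixed fraction of the remaining error, so the predictions for the two toy groups approach 1 and 0 as the rounds accumulate.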
Results and discussion
Flood susceptibility maps obtained by the used algorithms
Frequency ratio
The results of the FR model are presented in Table 3. The highest FR for altitude belongs to the elevation class of 220–1000 m, with an FR value of 4.7. The class of 1000–1650 m has the second-highest FR value of 1.4. For land use, agriculture and residential areas have the highest FR values of 7.5 and 9.7, respectively. FR for NDVI shows that classes of less than 0.75 have high FR values. The NDVI class of 0.1–0.25 and the NDVI class lower than 0.1 have the highest FR values of 1.8 and 1.6, respectively. For plan curvature, the class of (−0.001)–(0.001) had the highest FR value of 4.6. For profile curvature, the class of more than 0.001 has the highest FR value of 1.7. Rainfall classes of 725–880 and 617–728 have the highest FR values of 2.7 and 1.3, respectively. For distance from rivers, the classes of 50–100 and 150–200 m have the highest FR values of 10.7 and 10.1, respectively. FR results for slope showed that the classes of 0–2 (FR = 5.5) and 15–70 (FR = 1.3) have higher FR values than other classes. For SPI, the class of 2.5–80 has a high FR value of 12.3. Regarding TWI, the class of more than 18.3 has the highest FR value of 36.5. It should be mentioned that this class covers only 1% of the study region; thus, it does not have much importance in this model. The second-highest FR value was observed for the TWI class of 14.1–18.3. Figure 5 shows the flood susceptibility map produced by the FR model, classified by the natural break classification scheme. The area percentages of the flood susceptibility classes for the FR, GAM, RF, and EGB algorithms are shown in Table 4.
Random forest
The RF model was optimized for the training dataset with a node size of 3, mtry of 2, and 1000 trees. The confusion matrix for predictions of the RF on training data is shown in Table 5. Based on Table 6, the RF has predicted 161 non-flood cases and 164 flood cases correctly, while 10 non-floods and 5 floods are predicted incorrectly. This leads us to a class error of 0.0584 for non-flood prediction and a class error of 0.0295 for flood prediction. The importance of the factors in flood susceptibility mapping was defined through the calculation of mean decrease Gini and is presented in Table 5. Based on the results, altitude, distance from rivers, TWI, slope, and land use had the highest importance in modelling flood susceptibility. On the contrary, lithology, NDVI, and SPI were reported to be the least important factors. For defining the flood susceptibility classes, we used natural break classification scheme with four classes according to the literature (Termeh et al. 2018; Khosravi et al. 2019). Figure 6 shows the flood susceptibility map produced by the RF model. According to the flood susceptibility map, low, moderate, high, and very high susceptibility classes cover 77.6, 14.2, 4.3, and 3.9% of the study area, respectively.
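The class errors quoted above follow directly from the four counts in the text. The sketch below recomputes them and, as an assumed extension, also derives overall accuracy and Cohen's Kappa from the same counts; the function names are hypothetical, and the accuracy and Kappa values printed here describe this training confusion matrix only, not the test-set scores reported elsewhere in the paper.

```python
def class_error(misclassified, correct):
    """Share of a class's cases that were predicted as the other class."""
    return misclassified / (misclassified + correct)

def accuracy_and_kappa(tn, fp, fn, tp):
    """Overall accuracy and Cohen's Kappa for a 2x2 confusion matrix."""
    total = tn + fp + fn + tp
    observed = (tn + tp) / total
    # Agreement expected by chance, from the row and column totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
    return observed, (observed - expected) / (1 - expected)

# Counts from the text: 161 and 164 correct, 10 non-floods and 5 floods wrong.
tn, fp = 161, 10    # non-flood cases: predicted non-flood / predicted flood
tp, fn = 164, 5     # flood cases: predicted flood / predicted non-flood

print(round(class_error(fp, tn), 4), round(class_error(fn, tp), 4))
accuracy, kappa = accuracy_and_kappa(tn, fp, fn, tp)
print(round(accuracy, 3), round(kappa, 3))
```

The two class errors come out at about 0.058 and 0.030, matching the values given in the text to within rounding.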
Generalized additive model
The GAM was optimized by a select parameter of FALSE with accuracy and Kappa indices of 0.98 and 0.97, respectively. For optimizing the GAM, the tuning parameter of the method was selected to be “generalized cross-validation Cp”. Figure 7 shows the flood susceptibility map produced by the GAM. Based on the flood susceptibility map classified by natural break, low, moderate, high, and very high susceptibility classes occupy 90.4, 0.7, 0.9, and 8% of the studied region, respectively.
Extreme gradient boosting
Based on the results, the final EGB model was optimized with 100 rounds, a λ of 0.1, an α of 0.1, and an η of 0.3 (Fig. 8). Furthermore, 100 iterations produce the best accuracy for the different alpha and regularization terms. The accuracy and Kappa values of the optimum EGB algorithm were calculated as 0.95 and 0.90, respectively. Low, moderate, high, and very high classes of susceptibility cover 91.6, 14.2, 4.3, and 3.9%, respectively (Fig. 9). The findings of the EGB in Fig. 5 also showed the high importance of distance from the rivers, NDVI, slope, and TWI. Lower contributions of lithology, plan curvature, rainfall, and land use were also reported.
Evaluating the performance of the models
Because performance evaluation is an important step, this study used the ROC curve for this purpose. ROC is a common and strong method for evaluating binary problems and has been used in different fields of study including groundwater, flood, flood spreading, and landslide (Naghibi et al. 2017, 2018; Rahmati et al. 2018; Golkarian et al. 2018; Chen et al. 2019, 2020a, 2020c; Kordestani et al. 2019; Chen and Li 2020; Wang et al. 2020; Zhao and Chen 2020a, 2020b; Lei et al. 2020a, 2020b; Li and Chen 2020; Chen and Chen 2021). The ROC curve plots “sensitivity” against “1-specificity” at different cut-off values (Conoscenti et al. 2016; Naghibi and Moradi Dashtpagerdi 2016). The area under the ROC curve (AUC) varies from 0 to 1, where an AUC close to 1 indicates a high-performance model and an AUC close to 0 indicates a low-performance model (Sangchini et al. 2016; Hong et al. 2017; Mousavi et al. 2017). Based on the ROC results in Table 7, the RF and EGB are the leading models with the highest AUCs of 0.985 and 0.98, respectively. The GAM and FR models had lower accuracy than the leading models, with AUC scores of 0.94 and 0.953, respectively. Based on the accuracy scores, RF had the highest performance with an accuracy of 0.965, followed by the EGB and GAM. The Kappa index also showed the high performance of the RF and EGB compared to the other models.
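One way to make the AUC concrete is its rank interpretation: it equals the probability that a randomly chosen flood location receives a higher susceptibility score than a randomly chosen non-flood location, with ties counted as one half. The sketch below computes it by comparing all flood and non-flood pairs; the labels and scores are hypothetical, and this pairwise form is an illustration rather than the routine used in the study.

```python
def auc(labels, scores):
    """AUC as the share of (flood, non-flood) pairs ranked correctly."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0       # flood location ranked above non-flood
            elif p == n:
                wins += 0.5       # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical test points: 1 = flood, 0 = non-flood, with model scores.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
print(round(auc(labels, scores), 3))
```

Here eight of the nine flood and non-flood pairs are ordered correctly, so the AUC is 8/9, or about 0.889.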
Discussion
Floods occur frequently in different countries, particularly in Middle Eastern countries, as a result of a lack of proper water resources management plans and strategies. This leads researchers toward more advanced algorithms that generate high-accuracy flood susceptibility maps and provide initial information for further actions to reduce damage and save lives. This study used EGB as a new machine learning algorithm and assessed its performance for this purpose. Based on Fressard et al. (2014), all the algorithms produced excellent predictions (AUC > 0.9). In addition to AUC, Accuracy and Kappa were calculated for the algorithms and showed that the RF and EGB had the best performances, followed by the GAM and FR algorithms. The higher performance of the RF could result from its strong features. RF is robust to noise and outliers (Sameen et al. 2019), issues that are frequent in geospatial studies like flood susceptibility. RF is capable of estimating the importance or influence ratio of the input factors in the modeling process (Naghibi et al. 2016). This capability makes the model more interpretable than other black-box tree-based models (Pal 2005). RF is able to work with many different inputs without removing factors (Naghibi and Pourghasemi 2015; Sameen et al. 2019). RF is also able to work with very large datasets. GAM and FR have also shown acceptable performances. EGB, on the other hand, applies the boosting technique, which is known as a strong feature in data mining models and results in better outputs for classification problems. The “gradient boosting method” suffers from the lack of a “strong regularization parameter”, which makes it vulnerable to “over-fitting”, but the regularization parameter in EGB overcomes this shortcoming (Georganos et al. 2018). The impact of boosting was also confirmed in another study, i.e., Naghibi et al. (2017), who used the FR model to combine the results of several data mining models.
Their ensemble model, constructed on the basis of boosting, had better performance, which is consistent with the results of this research. The results of Georganos et al. (2018) in object-based land-use classification showed a superior performance of the EGB compared to other models like RF and support vector machines. In another study, Naghibi et al. (2020) also reported the superior efficiency of the EGB in groundwater potential studies, which is in line with our findings. Babajide Mustapha and Saeed (2016) found that the EGB operated well in classifying biological datasets and pointed out that EGB is capable of handling both “homogenous and heterogenous” datasets. It also does not require separate handling of missing cases, which enhances its computational speed (Timofeev and Denisov 2020). FR, as a statistical model, provides easy-to-interpret outputs that could be useful for managers as well as stakeholders (Nourani et al. 2014). Therefore, the selected models in this study provide both complex high-performance and simple interpretable results. The more complex structure of the RF and EGB might explain their superior performance over the GAM and FR models, which have simpler structures.
The factor importance results of the RF model, the best algorithm in this research, showed that distance from the rivers had an important influence on flood susceptibility, followed by profile curvature, slope, TWI, and altitude. The results of Khosravi et al. (2018) showed that altitude had the highest importance in modelling flood susceptibility, followed by distance from the river, NDVI, soil type, and slope. This shows that, in spite of differences in the importance of factors affecting flood susceptibility, there are some shared results, for instance for distance from the river and slope. The differences between the importance values in this study and those of Khosravi et al. (2018) could be related to the physical, topographical, and hydrological characteristics of the watersheds. Floods occur at certain distances from rivers; thus, this factor has had a high contribution to the modelling. Higher slopes are related to higher elevations, where drainage density is higher and flood discharge is lower. Therefore, we do not expect flood occurrence in those areas. A range of slopes between mountainous and plain areas, where discharge reaches higher amounts, is more susceptible to flood occurrence. Profile curvature and TWI, as secondary topographical factors, as well as altitude, affect drainage development in different parts of the watershed, runoff speed, and the erosion and sediment ratio.
Conclusion
The current study confirmed the high performance of the EGB in FSM compared with the RF as the benchmark algorithm in such studies. The RF and EGB models had AUC values of 0.98 or more, which is regarded as excellent prediction ability in classification problems. Therefore, it can be concluded that the EGB can be utilized for FSM. In addition to the performance analysis, the importance of the factors was assessed and showed the high importance of distance from the river, profile curvature, slope, TWI, and altitude in modelling flood occurrence. It is also concluded that the topographical or DEM-derived factors have great influence. This finding helps researchers target and select the factors for future studies. The current study gives a regional perspective to the flood control sector so that it can focus on potentially disastrous areas and mitigate the damage. More precisely, the northern parts of the watershed are more susceptible, and flood control strategies should be concentrated on those spots. For future studies, it is recommended to apply different optimization algorithms to enhance the performance of the EGB and produce more reliable flood susceptibility maps.
References
Alganci U, Sertel E, Ozdogan M, Ormeci C (2013) Parcel-level identification of crop types using different classification algorithms and multi-resolution imagery in southeastern Turkey. Photogrammetric Engineering & Remote Sensing 79:1053–1065. https://doi.org/10.14358/PERS.79.11.1053
Arabameri A, Pradhan B, Rezaei K (2019) Gully erosion zonation mapping using integrated geographically weighted regression with certainty factor and random forest models in GIS. J Environ Manag 232:928–942
Babajide Mustapha I, Saeed F (2016) Bioactive molecule prediction using extreme gradient boosting. Molecules 21:983. https://doi.org/10.3390/molecules21080983
Basukala AK, Oldenburg C, Schellberg J, Sultanov M, Dubovyk O (2017) Towards improved land use mapping of irrigated croplands: performance assessment of different image classification algorithms and approaches. European Journal of Remote Sensing 50:187–201. https://doi.org/10.1080/22797254.2017.1308235
Beven K, Kirkby MJ (1979) A physically based, variable contributing area model of basin hydrology. Hydrol Sci J 24:43–69
Bonham-Carter GF (1994) Geographic information systems for geoscientists-modeling with GIS. Computer Methods in the Geosciences 13:398
Breiman L (2001) Random forests. Mach Learn 45:5–32
Bui DT, Ngo P-TT, Pham TD, Jaafari A, Minh NQ, Hoa PV, Samui P (2019) A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. Catena 179:184–196
Bui DT, Pradhan B, Nampak H et al (2016) Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modeling in a high-frequency tropical cyclone area using GIS. J Hydrol 540:317–330
Catani F, Lagomarsino D, Segoni S, Tofani V (2013) Landslide susceptibility estimation by random forests technique: sensitivity and scaling issues. Nat Hazards Earth Syst Sci 13:2815–2831. https://doi.org/10.5194/nhess-13-2815-2013
Chambers JM, Hastie TJ (1992) Statistical models in S. Wadsworth & Brooks/Cole Advanced Books & Software Pacific Grove, CA
Chapi K, Singh VP, Shirzadi A, Shahabi H, Bui DT, Pham BT, Khosravi K (2017) A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ Model Softw 95:229–245. https://doi.org/10.1016/j.envsoft.2017.06.012
Chen T, Guestrin C (2016) Xgboost: a scalable tree boosting system. In: proceedings of the 22nd ACM sigkdd international conference on knowledge discovery and data mining. ACM, pp 785–794
Chen W, Chen X, Peng J, Panahi M, Lee S (2020a) Landslide susceptibility modeling based on ANFIS with teaching-learning-based optimization and satin bowerbird optimizer. Geosci Front 12:93–107. https://doi.org/10.1016/j.gsf.2020.07.012
Chen W, Hong H, Li S, Shahabi H, Wang Y, Wang X, Ahmad BB (2019) Flood susceptibility modelling using novel hybrid approach of reduced-error pruning trees with bagging and random subspace ensembles. J Hydrol 575:864–873. https://doi.org/10.1016/j.jhydrol.2019.05.089
Chen W, Li Y (2020) GIS-based evaluation of landslide susceptibility using hybrid computational intelligence models. Catena 195:104777. https://doi.org/10.1016/j.catena.2020.104777
Chen W, Li Y, Tsangaratos P, Shahabi H, Ilia I, Xue W, Bian H (2020b) Groundwater spring potential mapping using artificial intelligence approach based on kernel logistic regression, random forest, and alternating decision tree models. Appl Sci 10:425
Chen W, Zhao X, Tsangaratos P, Shahabi H, Ilia I, Xue W, Wang X, Ahmad BB (2020c) Evaluating the usage of tree-based ensemble methods in groundwater spring potential mapping. J Hydrol 583:124602. https://doi.org/10.1016/j.jhydrol.2020.124602
Chen X, Chen W (2021) GIS-based landslide susceptibility assessment using optimized hybrid machine learning methods. Catena 196:104833. https://doi.org/10.1016/j.catena.2020.104833
Choubin B, Moradi E, Golshan M, Adamowski J, Sajedi-Hosseini F, Mosavi A (2019) An ensemble prediction of flood susceptibility using multivariate discriminant analysis, classification and regression trees, and support vector machines. Sci Total Environ 651:2087–2096. https://doi.org/10.1016/j.scitotenv.2018.10.064
Conoscenti C, Rotigliano E, Cama M, Caraballo-Arias NA, Lombardo L, Agnesi V (2016) Exploring the effect of absence selection on landslide susceptibility models: a case study in Sicily, Italy. Geomorphology 261:222–235. https://doi.org/10.1016/j.geomorph.2016.03.006
Darabi H, Choubin B, Rahmati O, Torabi Haghighi A, Pradhan B, Kløve B (2019) Urban flood risk mapping using the GARP and QUEST models: a comparative study of machine learning techniques. J Hydrol 569:142–154. https://doi.org/10.1016/j.jhydrol.2018.12.002
Dewan AM, Yamaguchi Y (2008) Effect of land cover changes on flooding: example from greater Dhaka of Bangladesh. Int J Geoinform 4:11–20
Fan J, Wang X, Wu L, Zhou H, Zhang F, Yu X, Lu X, Xiang Y (2018a) Comparison of support vector machine and extreme gradient boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: a case study in China. Energy Convers Manag 164:102–111. https://doi.org/10.1016/j.enconman.2018.02.087
Fan J, Wang X, Wu L, Zhou H, Zhang F, Yu X, Lu X, Xiang Y (2018b) Comparison of support vector machine and extreme gradient boosting for predicting daily global solar radiation using temperature and precipitation in humid subtropical climates: a case study in China. Energy Convers Manag 164:102–111
Georganos S, Grippa T, Vanhuysse S, Lennert M, Shimoni M, Wolff E (2018) Very high resolution object-based land use–land cover urban classification using extreme gradient boosting. IEEE Geosci Remote Sens Lett 15:607–611. https://doi.org/10.1109/LGRS.2018.2803259
Glenn EP, Morino K, Nagler PL, Murray RS, Pearlstein S, Hultine KR (2012) Roles of saltcedar (Tamarix spp.) and capillary rise in salinizing a non-flooding terrace on a flow-regulated desert river. J Arid Environ 79:56–65. https://doi.org/10.1016/j.jaridenv.2011.11.025
Goetz JN, Guthrie RH, Brenning A (2011) Integrating physical and empirical landslide susceptibility models using generalized additive models. Geomorphology 129:376–386. https://doi.org/10.1016/j.geomorph.2011.03.001
Golkarian A, Naghibi SA, Kalantar B, Pradhan B (2018) Groundwater potential mapping using C5.0, random forest, and multivariate adaptive regression spline models in GIS. Environ Monit Assess 190:149
Hastie TJ, Tibshirani RJ (1990) Generalized additive models. Chapman and Hall, London
Hong H, Naghibi SA, Dashtpagerdi MM et al (2017) A comparative assessment between linear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidence models for forest fire susceptibility mapping in China. Arab J Geosci 10:167
Hong H, Panahi M, Shirzadi A, Ma T, Liu J, Zhu AX, Chen W, Kougias I, Kazakis N (2018) Flood susceptibility assessment in Hengfeng area coupling adaptive neuro-fuzzy inference system with genetic algorithm and differential evolution. Sci Total Environ 621:1124–1141. https://doi.org/10.1016/j.scitotenv.2017.10.114
Janizadeh S, Avand M, Jaafari A, Phong TV, Bayat M, Ahmadisharaf E, Prakash I, Pham BT, Lee S (2019) Prediction success of machine learning methods for flash flood susceptibility mapping in the Tafresh watershed, Iran. Sustainability 11:5426
Kamali Maskooni E, Naghibi SA, Hashemi H, Berndtsson R (2020) Application of advanced machine learning algorithms to assess groundwater potential using remote sensing-derived data. Remote Sens 12:2742
Kantakumar LN, Neelamsetti P (2015) Multi-temporal land use classification using hybrid approach. Egypt J Remote Sens Space Sci 18:289–295
Khosravi K, Pham BT, Chapi K et al (2018) A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran. Sci Total Environ 627:744–755
Khosravi K, Shahabi H, Pham BT, Adamowski J, Shirzadi A, Pradhan B, Dou J, Ly HB, Gróf G, Ho HL, Hong H, Chapi K, Prakash I (2019) A comparative assessment of flood susceptibility modeling using multi-criteria decision-making analysis and machine learning methods. J Hydrol 573:311–323. https://doi.org/10.1016/j.jhydrol.2019.03.073
Kordestani MD, Naghibi SA, Hashemi H, Ahmadi K, Kalantar B, Pradhan B (2019) Groundwater potential mapping using a novel data-mining ensemble model. Hydrogeol J 27:211–224
Lee S, Lee S, Lee M-J, Jung H-S (2018) Spatial assessment of urban flood susceptibility using data mining and geographic information system (GIS) tools. Sustainability 10:648
Lee SS, Kim J-C, Jung H-S, Lee MJ, Lee S (2017) Spatial prediction of flood susceptibility using random-forest and boosted-tree models in Seoul metropolitan city, Korea. Geomatics, Natural Hazards and Risk 8:1185–1203. https://doi.org/10.1080/19475705.2017.1308971
Lei X, Chen W, Avand M, Janizadeh S, Kariminejad N, Shahabi H, Costache R, Shahabi H, Shirzadi A, Mosavi A (2020a) GIS-based machine learning algorithms for gully erosion susceptibility mapping in a semi-arid region of Iran. Remote Sens 12:2478
Lei X, Chen W, Pham BT (2020b) Performance evaluation of GIS-based artificial intelligence approaches for landslide susceptibility modeling and spatial patterns analysis. ISPRS Int J Geo Inf 9:443
Li Y, Chen W (2020) Landslide susceptibility evaluation using hybrid integration of evidential belief function and machine learning techniques. Water 12:113
Maggini R, Lehmann A, Zimmermann NE, Guisan A (2006) Improving generalized regression analysis for the spatial prediction of forest communities. J Biogeogr 33:1729–1749. https://doi.org/10.1111/j.1365-2699.2006.01465.x
Maghsood FF, Moradi H, Bavani ARM et al (2019) Climate change impact on flood frequency and source area in northern Iran under CMIP5 scenarios. Water 11:273. https://doi.org/10.3390/w11020273
Mirzaei S, Vafakhah M, Pradhan B, Alavi SJ (2020) An improved land use classification scheme using multi-seasonal satellite images and secondary data. ECOPERSIA 8:97–107
Motevalli A, Naghibi SA, Hashemi H, Berndtsson R, Pradhan B, Gholami V (2019) Inverse method using boosted regression tree and k-nearest neighbor to quantify effects of point and non-point source nitrate pollution in groundwater. J Clean Prod 228:1248–1263
Motevalli A, Vafakhah M (2016) Flood hazard mapping using synthesis hydraulic and geomorphic properties at watershed scale. Stoch Env Res Risk A 30:1889–1900. https://doi.org/10.1007/s00477-016-1305-8
Mousavi SM, Golkarian A, Naghibi SA et al (2017) GIS-based groundwater spring potential mapping using data mining boosted regression tree and probabilistic frequency ratio models in Iran. AIMS Geosci 3:91–115. https://doi.org/10.3934/geosci.2017.1.91
Myint SW, Gober P, Brazel A, Grossman-Clarke S, Weng Q (2011) Per-pixel vs. object-based classification of urban land cover extraction using high spatial resolution imagery. Remote Sens Environ 115:1145–1161. https://doi.org/10.1016/j.rse.2010.12.017
Naghibi S, Vafakhah M, Hashemi H, Pradhan B, Alavi S (2018) Groundwater augmentation through the site selection of floodwater spreading using a data mining approach (case study: Mashhad plain, Iran). Water 10:1405
Naghibi SA, Dolatkordestani M, Rezaei A, Amouzegari P, Heravi MT, Kalantar B, Pradhan B (2019a) Application of rotation forest with decision trees as base classifier and a novel ensemble model in spatial modeling of groundwater potential. Environ Monit Assess 191:248
Naghibi SA, Hashemi H, Berndtsson R, Lee S (2020) Application of extreme gradient boosting and parallel random forest algorithms for assessing groundwater spring potential using DEM-derived factors. J Hydrol 125197
Naghibi SA, Moghaddam DD, Kalantar B, Pradhan B, Kisi O (2017) A comparative assessment of GIS-based data mining models and a novel ensemble model in groundwater well potential mapping. J Hydrol 548:471–483
Naghibi SA, Moradi Dashtpagerdi M (2016) Evaluation of four supervised learning methods for groundwater spring potential mapping in Khalkhal region (Iran) using GIS-based features. Hydrogeol J 1–21. https://doi.org/10.1007/s10040-016-1466-z
Naghibi SA, Pourghasemi HR (2015) A comparative assessment between three machine learning models and their performance comparison by bivariate and multivariate statistical methods in groundwater potential mapping. Water Resour Manag 29:5217–5236. https://doi.org/10.1007/s11269-015-1114-8
Naghibi SA, Pourghasemi HR, Dixon B (2016) GIS-based groundwater potential mapping using boosted regression tree, classification and regression tree, and random forest machine learning models in Iran. Environ Monit Assess 188:44. https://doi.org/10.1007/s10661-015-5049-6
Naghibi SA, Vafakhah M, Hashemi H et al (2019b) Water resources management through flood spreading project suitability mapping using frequency ratio, k-nearest neighbours, and random forest algorithms. Nat Resour Res 29:1915–1933
Nandi A, Mandal A, Wilson M, Smith D (2016) Flood hazard mapping in Jamaica using principal component analysis and logistic regression. Environ Earth Sci 75:465. https://doi.org/10.1007/s12665-016-5323-0
Ngo P-T, Hoang N-D, Pradhan B, Nguyen Q, Tran X, Nguyen Q, Nguyen V, Samui P, Tien Bui D (2018) A novel hybrid swarm optimized multilayer neural network for spatial prediction of flash floods in tropical areas using Sentinel-1 SAR imagery and geospatial data. Sensors 18:3704. https://doi.org/10.3390/s18113704
Nourani V, Komasi M (2013) A geomorphology-based ANFIS model for multi-station modeling of rainfall–runoff process. J Hydrol 490:41–55
Nourani V, Pradhan B, Ghaffari H (2014) Landslide susceptibility mapping at Zonouz Plain, Iran using genetic programming and comparison. Nat Hazards 71:523–547. https://doi.org/10.1007/s11069-013-0932-3
Pal M (2005) Random forest classifier for remote sensing classification. Int J Remote Sens 26:217–222
Petschko H, Brenning A, Bell R, Goetz J, Glade T (2014) Assessing the quality of landslide susceptibility maps–case study Lower Austria. Nat Hazards Earth Syst Sci 14:95–118. https://doi.org/10.5194/nhess-14-95-2014
Pourghasemi HR, Razavi-Termeh SV, Kariminejad N, Hong H, Chen W (2020) An assessment of metaheuristic approaches for flood assessment. J Hydrol 582:124536
Pourtaghi ZS, Pourghasemi HR, Aretano R, Semeraro T (2016) Investigation of general indicators influencing on forest fire and its susceptibility modeling using different data mining techniques. Ecol Indic 64:72–84
Rahmati O, Naghibi SA, Shahabi H, Bui DT, Pradhan B, Azareh A, Rafiei-Sardooi E, Samani AN, Melesse AM (2018) Groundwater spring potential modelling: comprising the capability and robustness of three different modeling approaches. J Hydrol 565:248–261
Rahmati O, Pourghasemi HR (2017) Identification of critical flood prone areas in data-scarce and ungauged regions: a comparison of three data mining models. Water Resour Manag 31:1473–1487. https://doi.org/10.1007/s11269-017-1589-6
Rahmati O, Pourghasemi HR, Zeinivand H (2016) Flood susceptibility mapping using frequency ratio and weights-of-evidence models in the Golastan Province, Iran. Geocarto International 31:42–70. https://doi.org/10.1080/10106049.2015.1041559
Sahoo GB, Ray C, De Carlo EH (2006) Use of neural network to predict flash flood and attendant water qualities of a mountainous stream on Oahu, Hawaii. J Hydrol 327:525–538
Sameen MI, Pradhan B, Lee S (2019) Self-learning random forests model for mapping groundwater yield in data-scarce areas. Nat Resour Res 28:757–775
Sangchini EK, Emami SN, Tahmasebipour N, Pourghasemi HR, Naghibi SA, Arami SA, Pradhan B (2016) Assessment and comparison of combined bivariate and AHP models with logistic regression for landslide susceptibility mapping in the Chaharmahal-e-Bakhtiari Province, Iran. Arab J Geosci 9:201. https://doi.org/10.1007/s12517-015-2258-9
Shafapour M, Biswajeet T, Tehrany MS et al (2015) Flood susceptibility analysis and its verification using a novel ensemble support vector machine and frequency ratio method. Stoch Env Res Risk A 29:1149–1165. https://doi.org/10.1007/s00477-015-1021-9
Shin J-Y, Ro Y, Cha J-W et al (2019) Assessing the applicability of random forest, stochastic gradient boosted model, and extreme learning machine methods to the quantitative precipitation estimation of the radar data: a case study to Gwangdeoksan radar, South Korea, in 2018. Adv Meteorol 2019:
Tehrany MS, Jones S, Shabani F (2019) Identifying the essential flood conditioning factors for flood prone area mapping using machine learning techniques. Catena 175:174–192. https://doi.org/10.1016/j.catena.2018.12.011
Tehrany MS, Lee MJ, Pradhan B, Jebur MN, Lee S (2014a) No title. Environ Earth Sci 72:4001–4015
Tehrany MS, Pradhan B, Jebur MN (2014b) Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J Hydrol 512:332–343
Tehrany MS, Pradhan B, Jebur MN (2013) Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J Hydrol 504. https://doi.org/10.1016/j.jhydrol.2013.09.034
Termeh SVR, Kornejady A, Pourghasemi HR, Keesstra S (2018) Flood susceptibility mapping using novel ensembles of adaptive neuro fuzzy inference system and metaheuristic algorithms. Sci Total Environ 615:438–451
Thakkar AK, Desai VR, Patel A, Potdar MB (2017) Post-classification corrections in improving the classification of land use/land cover of arid region using RS and GIS: the case of Arjuni watershed, Gujarat, India. Egypt J Remote Sens Space Sci 20:79–89
Timofeev AV, Denisov VM (2020) Machine learning based predictive maintenance of infrastructure facilities in the cryolithozone. In: Recent developments on industrial control systems resilience. Springer, pp 49–74
Vafakhah M, Mohammad Hasani Loor S, Pourghasemi HR, Katebikord A (2020) Comparing performance of random forest and adaptive neuro-fuzzy inference system data mining models for flood susceptibility mapping. Arab J Geosci 13:417. https://doi.org/10.1007/s12517-020-05363-1
Vorpahl P, Elsenbeer H, Märker M, Schröder B (2012) How can statistical models help to determine driving factors of landslides? Ecol Model 239:27–39. https://doi.org/10.1016/j.ecolmodel.2011.12.007
Wan S, Lei TC, Chou TY (2010) A novel data mining technique of analysis and classification for landslide problems. Nat Hazards 52:211–230. https://doi.org/10.1007/s11069-009-9366-3
Wang G, Lei X, Chen W, Shahabi H, Shirzadi A (2020) Hybrid computational intelligence methods for landslide susceptibility mapping. Symmetry 12:325
Wang Y, Hong H, Chen W, Li S, Pamučar D, Gigović L, Drobnjak S, Bui DT, Duan H (2019a) A hybrid GIS multi-criteria decision-making method for flood susceptibility mapping at Shangyou, China. Remote Sens 11:62. https://doi.org/10.3390/rs11010062
Wang Y, Hong H, Chen W et al (2019b) A hybrid GIS multi-criteria decision-making method for flood susceptibility mapping at Shangyou, China. Remote Sens 11:62
Yousefi S, Moradi HR, Pourghasemi HR, Khatami R (2017) Assessment of floodplain landuse and channel morphology within meandering reach of the Talar River in Iran using GIS and aerial photographs. Geocarto International 6049:1–14. https://doi.org/10.1080/10106049.2017.1353645
Youssef AM, Pradhan B, Hassan AM (2011) Flash flood risk estimation along the St. Katherine road, southern Sinai, Egypt using GIS based morphometry and satellite imagery. Environ Earth Sci 62:611–623. https://doi.org/10.1007/s12665-010-0551-1
Zhao G, Pang B, Xu Z, Peng D, Xu L (2019) Assessment of urban flood susceptibility using semi-supervised machine learning model. Sci Total Environ 659:940–949
Zhao G, Pang B, Xu Z, Yue J, Tu T (2018) Mapping flood susceptibility in mountainous areas on a national scale in China. Sci Total Environ 615:1133–1142
Zhao X, Chen W (2020a) Optimization of computational intelligence models for landslide susceptibility evaluation. Remote Sens 12:2180
Zhao X, Chen W (2020b) GIS-based evaluation of landslide susceptibility models using certainty factors and functional trees-based ensemble techniques. Appl Sci 10:16
Additional information
Communicated by: H. Babaie
Cite this article
Mirzaei, S., Vafakhah, M., Pradhan, B. et al. Flood susceptibility assessment using extreme gradient boosting (EGB), Iran. Earth Sci Inform 14, 51–67 (2021). https://doi.org/10.1007/s12145-020-00530-0