Introduction

In recent years, floods have occurred more frequently as a result of climate change, including variations in air temperature and in rainfall amount and intensity. Apart from the increase in flood frequency, inappropriate land-use planning and management have increased both damage costs and loss of life. To manage the situation and reduce or even prevent the damage, it is essential to first determine the flood-prone areas (Lee et al. 2017).

Given the complicated hydrological features of watersheds and the ever-increasing anthropogenic impacts, floods are hard to predict with simple non-linear algorithms (Khosravi et al. 2018). For this reason, machine learning and statistical models have been applied in flood, landslide, and gully susceptibility studies as well as groundwater potential studies because of their higher efficiency (Bui et al. 2019; Chen et al. 2020b, 2020c; Li and Chen 2020; Zhao and Chen 2020a, 2020b).

Some examples of these models are artificial neural networks (Sahoo et al. 2006; Youssef et al. 2011), support vector machines (Shafapour et al. 2015), logistic regression (Nandi et al. 2016), evidential belief function and decision trees (Rahmati and Pourghasemi 2017), frequency ratio (Rahmati et al. 2016), random forest and boosted-tree (Lee et al. 2017), Genetic Algorithm Rule-Set Production (GARP) and Quick Unbiased Efficient Statistical Tree (QUEST) (Darabi et al. 2019), weakly labeled support vector machine (WELLSVM) (Zhao et al. 2019), reduced-error pruning trees (REPTree) with Bagging (Bag-REPTree) and Random subspace (RS-REPTree) ensemble frameworks (Chen et al. 2019), classification and regression trees and alternating decision tree (Janizadeh et al. 2019), and alternating decision tree (ADT), functional tree (FT), kernel logistic regression (KLR), multilayer perceptron (MLP) and quadratic discriminant analysis (QDA) (Janizadeh et al. 2019). Additionally, other studies indicated that hybrid models, such as an ensemble of decision tree, weights-of-evidence and support vector machines (Tehrany et al. 2014a, 2019), a neuro-fuzzy system integrated with metaheuristic algorithms (Bui et al. 2016; Termeh et al. 2018), logistic model tree with bagging ensembles (Chapi et al. 2017), swarm-optimized neural networks (Ngo et al. 2018), RF, ANN, and SVM (Zhao et al. 2018), an ensemble of evolutionary models and ANFIS (Hong et al. 2018), an ensemble of multivariate discriminant analysis, CART, and SVM (Choubin et al. 2019), an ensemble of multi-criteria decision making (Wang et al. 2019a), fuzzy rule-based ensembles (Bui et al. 2019), and an ensemble of RF, Stochastic Gradient Boosted Model, and Extreme Learning Machine (Shin et al. 2019), performed better than their single constituent models.

A review of the literature shows that different kinds of algorithms have been used for modelling flood susceptibility, but newer and more advanced models are still needed to find the best way to control flood disasters, given their complicated behavior. Therefore, this study aims to model flood susceptibility with a new model, extreme gradient boosting (EGB), and to compare its performance with three benchmark models, i.e., frequency ratio (FR), random forest (RF), and the generalized additive model (GAM). The FR, RF, and GAM models have been successfully implemented in flood susceptibility modelling and various other fields (Rahmati et al. 2016; Golkarian et al. 2018; Motevalli et al. 2019; Naghibi et al. 2019a; Vafakhah et al. 2020). The EGB is therefore used for flood susceptibility mapping (FSM) in this paper. The fundamental advantage of the EGB is its use of the boosting method, which produces strong predictions by “combining several weak learners”. Applying the EGB can reduce the impact of over-fitting in the final model and produce more generalized outputs.

Material and methods

First, the flood locations were determined from field surveys and national reports, and non-flood locations were generated with a “random-systematic” strategy. Then, the flood conditioning factors were prepared, and the flood and non-flood locations were split into training and testing datasets, which were used to model flood susceptibility. The output susceptibility maps were validated with the Accuracy and Kappa indices as well as the receiver operating characteristic (ROC) curve. A detailed methodology flow chart is shown in Fig. 1.

Fig. 1 Flowchart of the methodology in the current study

Study area

The Talar watershed is a mountainous region with an area of roughly 1765 km². Yousefi et al. (2017) showed that the Talar River has been affected by many floods in past years. The elevation in the Talar River watershed varies from 221 to 3944 m above sea level (a.s.l.), with an average of 1966 m a.s.l. The average width of the Talar River at the outlet of the basin is about 25.5 m (Fig. 2). Land-use classes in the Talar watershed include bare land, agriculture, forest, rangeland, and residential areas (Fig. 3). The average annual rainfall and temperature in the Talar watershed are 610 mm and 11 °C, respectively. The Talar watershed has a Mediterranean climate. The main soil textures in the study region are loamy-silty, clay-silty, loamy-clay, and clay-loamy (Maghsood et al. 2019).

Fig. 2 Location of the study area in Iran and Mazandaran province, and locations of the training and testing points (flood and non-flood)

Fig. 3 Four different locations affected by flood in the Talar watershed (photos by Sajjad Mirzaei, Zirab City)

Flood dataset

To detect the flood locations in the Talar watershed, several field surveys were carried out in the lowland areas of the watershed. We also used the reports of the Mazandaran Regional Water Authority and information gathered from residents. In addition, hydrology and flood reports as well as the findings of Motevalli and Vafakhah (2016) and Yousefi et al. (2017) were used. Overall, 243 flood locations were detected in the study area. Because the machine learning models also need non-occurrence (here, non-flood) locations, 243 such locations were selected with a systematic-random strategy. The points were first generated in ArcGIS and then inspected to check that they had been correctly selected. Following the literature in FSM (Tehrany et al. 2014b; Wang et al. 2019b; Chen et al. 2020c; Pourghasemi et al. 2020) and other geospatial sciences (Naghibi et al. 2018, 2019b, 2020; Motevalli et al. 2019; Li and Chen 2020), the presence and absence locations, i.e., the flood and non-flood locations, were divided into training and testing groups covering 70% and 30% of the points, respectively (Fig. 2).
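The 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the point identifiers, the random seed, and the choice to split each class separately are assumptions made here for the example.

```python
import random

def split_points(points, train_fraction=0.7, seed=42):
    """Shuffle a list of points and split it into training and testing subsets."""
    rng = random.Random(seed)          # fixed seed (assumption) for reproducibility
    shuffled = points[:]               # copy so the input list is not modified
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical point identifiers; 243 flood and 243 non-flood points as in the text.
flood = [("flood", i) for i in range(243)]
non_flood = [("non_flood", i) for i in range(243)]

# Splitting each class separately (an assumption) keeps the classes balanced.
flood_train, flood_test = split_points(flood)
non_flood_train, non_flood_test = split_points(non_flood)

train = flood_train + non_flood_train
test = flood_test + non_flood_test
print(len(train), len(test))
```

With 243 points per class, this yields 340 training and 146 testing points in total.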

Flood conditioning factors

This study considered several flood susceptibility conditioning factors based on the literature (Tehrany et al. 2014a; Shafapour et al. 2015; Rahmati et al. 2016; Hong et al. 2018; Khosravi et al. 2018; Termeh et al. 2018; Vafakhah et al. 2020) and data availability. The input factors include altitude, slope, profile curvature, topographic wetness index (TWI), distance from rivers, normalized difference vegetation index (NDVI), plan curvature, rainfall, land use, stream power index (SPI), and lithology. The altitude of the study region was obtained from the ASTER Global digital elevation model (DEM) with a 30 × 30 m spatial resolution. Generally, higher altitudes have high drainage density and low discharge, while the situation is different in the lowland areas. Slope affects the water flow velocity over the ground surface and in the channels. This factor was calculated from the DEM and is presented in Fig. 4b. The study area has slopes ranging from 0 to 69 degrees.

Fig. 4 Input predictor variables: (a) altitude, (b) slope angle, (c) plan curvature, (d) profile curvature, (e) SPI, (f) TWI, (g) distance from the river, (h) land use, (i) NDVI, (j) lithology, and (k) rainfall

Plan and profile curvature were derived from the DEM of the study region and used in the modelling process (Fig. 4c, d). These curvatures influence the water flow velocity as well as erosion and deposition processes.

SPI represents the erosive power of the river. SPI has a direct influence on flood occurrence because it increases with slope and upslope watershed area (Lee et al. 2018).

SPI can be computed as follows (Dewan and Yamaguchi 2008) (Fig. 4e):

$$ SPI={A}_s\times \tan b $$
(1)

where As is the specific basin area, and b is the slope in degrees at each point of the basin.

TWI can be calculated as follows (Beven and Kirkby 1979) (Fig. 4f):

$$ TWI=\ln \left(\frac{a}{\tan b}\right) $$
(2)

where a is the cumulative upslope area draining to a given pixel, and b is the slope angle at that pixel.
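Equations (1) and (2) can be evaluated per pixel as in the sketch below. The function names and the minimum-slope clamp used to avoid division by zero on flat pixels are assumptions for illustration; the source does not say how flat pixels were handled.

```python
import math

def spi(specific_area, slope_deg):
    """Stream power index, Eq. (1): SPI = As * tan(b), with b given in degrees."""
    return specific_area * math.tan(math.radians(slope_deg))

def twi(upslope_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index, Eq. (2): TWI = ln(a / tan(b)).

    min_slope_deg is a hypothetical clamp so that tan(b) is never zero.
    """
    slope = max(slope_deg, min_slope_deg)
    return math.log(upslope_area / math.tan(math.radians(slope)))

# Hypothetical pixel: upslope area of 100 units and a slope of 45 degrees.
print(spi(100.0, 45.0))
print(twi(100.0, 45.0))
```

Because tan(45°) is 1, this pixel has an SPI close to 100 and a TWI close to ln(100); gentler slopes lower the SPI and raise the TWI.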

Distance from the river influences the discharge and spread of flooding in a given area (Wan et al. 2010; Glenn et al. 2012). The distance-from-river layer was created with the Euclidean distance function (Fig. 4g).

Land use and NDVI are indicators of land cover in an area. The land-use map was created with a supervised learning algorithm, which is a common way of classifying land use (Myint et al. 2011; Alganci et al. 2013; Kantakumar and Neelamsetti 2015; Basukala et al. 2017; Thakkar et al. 2017). We used Landsat OLI images from four dates (31 May 2017, 2 July 2017, 20 September 2017, and 22 October 2017) to derive land-use maps with the Maximum Likelihood Classification algorithm (Kamali Maskooni et al. 2020). The full methodology and results of this part can be found in Mirzaei et al. (2020). The Talar River watershed was classified into five classes: rangeland, agriculture, forest, residential areas, and bare land (Fig. 4h). The vegetated parts of the watershed have a lower susceptibility to flooding because there is an inverse relationship between flood probability and vegetation cover (Tehrany et al. 2013). NDVI was computed from the red and infrared bands of a Landsat OLI-IRS image acquired on 2 July 2017 (Row: 35, Path: 163) (Mirzaei et al. 2020).
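The paper does not write out the NDVI formula; the sketch below uses the standard definition, NDVI = (NIR − Red) / (NIR + Red). The reflectance values and the choice to return 0 for pixels where both bands are zero are assumptions for illustration.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from near-infrared and red reflectance."""
    denominator = nir + red
    if denominator == 0:
        return 0.0  # assumption: treat empty pixels as neutral instead of failing
    return (nir - red) / denominator

# Hypothetical reflectances: dense vegetation reflects strongly in the near-infrared.
vegetated = ndvi(nir=0.5, red=0.1)
bare = ndvi(nir=0.25, red=0.2)
print(vegetated, bare)
```

Values lie between −1 and 1, with higher values indicating denser vegetation cover.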

Rainfall data were obtained from 14 rain gauges and climatological stations in and around the study region (Table 1).

Table 1 Average annual rainfall at the rain-gauge stations, their location and height

In this study, universal and ordinary Kriging and Co-kriging interpolation methods with circular, spherical, exponential, Gaussian, Stable, J-Bessel, K-Bessel, Hole Effect, and Rational Quadratic models, as well as Inverse Distance Weighting (IDW), Radial Basis Function (RBF), Global Polynomial Interpolation (GPI), Local Polynomial Interpolation (LPI), and general and local estimators, were evaluated using ArcGIS software. After interpolation with these geostatistical and deterministic methods, the root mean square error (RMSE) index was used to compare and evaluate them and to select a suitable interpolation method. The results showed that, for annual rainfall, Ordinary Kriging with the J-Bessel model was the most appropriate. The spatial variation of annual rainfall is shown in Fig. 4k. The lithology of the study basin was obtained from the Geological Survey of Iran (GSI) (1997). Lithology affects soil permeability and plays an important role in flooding and its magnitude. There are 26 different lithology classes in the study region (Table 2; Fig. 4j).
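The RMSE-based comparison of interpolation methods can be illustrated with a small sketch. It is not the ArcGIS workflow used in the study: it compares only IDW with different power parameters, evaluates them by leave-one-out cross-validation (an assumption, since the validation scheme is not stated), and uses hypothetical station coordinates and rainfall values.

```python
import math

def idw(x, y, stations, power):
    """Inverse-distance-weighted estimate at (x, y) from (sx, sy, value) stations."""
    numerator = denominator = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0:
            return value               # exactly on a station: return its value
        weight = 1.0 / distance ** power
        numerator += weight * value
        denominator += weight
    return numerator / denominator

def loo_rmse(stations, power):
    """Leave-one-out RMSE: predict each station from all the other stations."""
    squared_errors = []
    for i, (sx, sy, value) in enumerate(stations):
        others = stations[:i] + stations[i + 1:]
        squared_errors.append((idw(sx, sy, others, power) - value) ** 2)
    return math.sqrt(sum(squared_errors) / len(squared_errors))

# Hypothetical stations: (x in km, y in km, mean annual rainfall in mm).
stations = [(0, 0, 600.0), (10, 0, 650.0), (0, 10, 580.0), (10, 10, 700.0), (5, 5, 630.0)]

scores = {power: loo_rmse(stations, power) for power in (1, 2, 3)}
best_power = min(scores, key=scores.get)   # candidate with the lowest RMSE
print(scores, best_power)
```

The same selection rule, lowest RMSE wins, applies when the candidates are Kriging variants instead of IDW powers.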

Table 2 Lithological characteristics of the study area

Note that the RF, GAM, and EGB models used the continuous form of the factors, except for the categorical ones such as land use and lithology. To apply the FR method, however, the continuous factors had to be classified into distinct classes. To do so, NDVI was classified into five classes with an equal-interval classification algorithm; plan and profile curvatures were classified into three classes of < −0.001, (−0.001)–(0.001), and > 0.001, representing convex, flat, and concave curvatures, respectively; and, since most of the floods occurred in the vicinity of rivers, distance from rivers was classified into five classes of 0–50, 50–100, 100–150, 150–200, and > 200 m to better distinguish the relationship between this factor and flood occurrence.
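Assigning a continuous value to one of the classes above can be sketched with a sorted list of break values. The class labels follow the distance classes in the text; how a value lying exactly on a break is assigned is not stated in the source, so placing it in the lower class is an assumption.

```python
from bisect import bisect_left

# Upper bounds (in metres) of the first four distance-from-river classes.
DISTANCE_BREAKS = [50, 100, 150, 200]
DISTANCE_LABELS = ["0-50", "50-100", "100-150", "150-200", ">200"]

def distance_class(distance_m):
    """Return the distance class label; a value on a break goes to the lower class."""
    return DISTANCE_LABELS[bisect_left(DISTANCE_BREAKS, distance_m)]

for distance in (30, 75, 200, 250):
    print(distance, distance_class(distance))
```

The curvature and NDVI classes can be handled the same way by swapping in their own break values and labels.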

Classification models

Frequency ratio

FR was introduced by Bonham-Carter (1994) and expresses the probability of occurrence of a specific event. This model has been used in many studies to define the relationship between target phenomena, such as floods, gullies, forest fires, and groundwater springs, and their conditioning factors. The output of the FR is simple and helps managers and stakeholders understand the relationships between input and output factors (Nourani and Komasi 2013). FR is calculated as follows:

$$ FR=\frac{F/ FF}{A/ AA} $$
(3)

where F is the number of floods in each class, FF is the total number of floods in the study region, A is the number of pixels in each class, and AA is the total number of pixels in the study region. Note that the final FR value of a pixel is obtained by summing the FR values of all factors. FR values are assigned to the pixels with the “lookup” function in ArcMap and summed with the “weightedsum” function.
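Equation (3) and the subsequent summation over factors can be sketched as follows. The flood and pixel counts are hypothetical and do not come from Table 3; the dictionary lookup stands in for the ArcMap “lookup” step only by analogy.

```python
def frequency_ratio(floods_in_class, total_floods, pixels_in_class, total_pixels):
    """Eq. (3): FR = (F / FF) / (A / AA) for one class of one conditioning factor."""
    return (floods_in_class / total_floods) / (pixels_in_class / total_pixels)

def fr_table(class_counts):
    """Map each class to its FR, given {class: (floods, pixels)} counts for one factor."""
    total_floods = sum(floods for floods, _ in class_counts.values())
    total_pixels = sum(pixels for _, pixels in class_counts.values())
    return {name: frequency_ratio(floods, total_floods, pixels, total_pixels)
            for name, (floods, pixels) in class_counts.items()}

def susceptibility_index(pixel_classes, fr_tables):
    """Sum the FR values of the classes a pixel falls into, over all factors."""
    return sum(fr_tables[factor][cls] for factor, cls in pixel_classes.items())

# Hypothetical counts per class: (number of floods, number of pixels).
fr_tables = {
    "distance": fr_table({"0-50": (120, 5_000), "50-100": (60, 6_000), ">100": (63, 89_000)}),
    "slope": fr_table({"0-2": (150, 20_000), ">2": (93, 80_000)}),
}

pixel = {"distance": "0-50", "slope": "0-2"}
index = susceptibility_index(pixel, fr_tables)
print(fr_tables["distance"]["0-50"], index)
```

An FR above 1 means that a class holds a larger share of the floods than of the area, so pixels falling in several such classes receive a high summed index.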

Random forest

RF can be regarded as an ensemble algorithm built from several decision trees as predictors, and it is used for both classification and regression (Breiman 2001). RF is a flexible and strong algorithm that grows random trees on sets of cases drawn by bootstrapping. The cases that are not used in constructing a given tree are called out-of-bag (Catani et al. 2013; Hong et al. 2017). Two indices define the contribution of the factors in the RF model: “mean decrease accuracy” and “mean decrease Gini” (Naghibi et al. 2016). RF is appropriate for working with large data sets and produces satisfactory outputs (Arabameri et al. 2019). In RF, the outputs of the constructed trees are combined by voting to predict the target variable, in this case flood susceptibility. The model was run with the randomForest package in R, and the maps were prepared and classified in ArcMap 10.2.
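Two of the ingredients mentioned above, bootstrap sampling with out-of-bag cases and voting across trees, can be sketched in a few lines. This is only an illustration of the mechanism, not the randomForest implementation; the tree outputs are hypothetical and no trees are actually grown here.

```python
import random
from collections import Counter

def bootstrap_sample(n_cases, rng):
    """Draw n_cases indices with replacement; unused indices are out-of-bag."""
    in_bag = [rng.randrange(n_cases) for _ in range(n_cases)]
    out_of_bag = sorted(set(range(n_cases)) - set(in_bag))
    return in_bag, out_of_bag

def majority_vote(tree_predictions):
    """Class predicted by most trees (1 = flood, 0 = non-flood)."""
    return Counter(tree_predictions).most_common(1)[0][0]

def flood_probability(tree_predictions):
    """Share of trees voting for flood, usable as a susceptibility score."""
    return sum(tree_predictions) / len(tree_predictions)

rng = random.Random(0)
in_bag, out_of_bag = bootstrap_sample(20, rng)
print(len(in_bag), out_of_bag)

# Hypothetical outputs of five trees for one pixel.
votes = [1, 0, 1, 1, 0]
print(majority_vote(votes), flood_probability(votes))
```

The out-of-bag cases of each tree serve as a built-in validation set, which is what the class errors reported for the training data rely on.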

Generalized additive model

GAM is categorized as a “semi-parametric” regression method (Hastie and Tibshirani 1990; Chambers and Hastie 1992). The response curves of this model are estimated by smooth functions, which allows an extensive variety of response curves to be fitted (Maggini et al. 2006; Pourtaghi et al. 2016). An advantage of the GAM is that it can be interpreted easily, unlike other complex, black-box data mining models (Goetz et al. 2011). GAM is able to model non-linear phenomena that are influenced by many factors, such as flood susceptibility (Petschko et al. 2014). The main difference between the generalized linear model and GAM is that the former uses parametric effects of individual variables, while the latter uses smooth additive terms (Vorpahl et al. 2012). GAM was applied using the caret and mgcv packages in R.
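For reference, the additive structure described above is commonly written as follows. The notation is the usual textbook one and is not taken from this paper:

$$ g\left(E\left[Y\right]\right)={\beta}_0+\sum_{j=1}^p{f}_j\left({x}_j\right) $$

where Y is the response (here, flood or non-flood), g is a link function (the logit link is assumed for a binary response), β0 is the intercept, and each f_j is a smooth function of the conditioning factor x_j. A generalized linear model is the special case in which every f_j is a straight line, f_j(x_j) = β_j x_j.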

Extreme gradient boosting

The EGB method, introduced by Chen and Guestrin (2016), is a new implementation of the “gradient boosting machine”. EGB is founded on “boosting”, which can be explained as creating a “strong learner” by combining the outputs of several “weak learners” (Fan et al. 2018a). The EGB attempts to tune the parameters without over-fitting the model. Optimization in EGB begins by fitting the first learner to the whole dataset and continues by fitting each subsequent model to the residuals. The procedure finishes when it reaches the “stopping criteria” (Fan et al. 2018a). EGB also uses parallel processing, which reduces the required computation time (Fan et al. 2018b; Naghibi et al. 2020). It is also more robust than other algorithms when the dataset contains missing data. To apply the EGB, we used the caret package in the R statistical software.
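The idea of fitting each new learner to the residuals of the current model can be illustrated with a deliberately simplified sketch. It is plain gradient boosting with squared-error loss and one-split regression stumps on a single predictor, not the EGB algorithm itself: it has no regularization terms, no second-order information, and no parallelism. The data, function names, and the use of a fixed number of rounds as the stopping criterion are assumptions for illustration.

```python
def fit_stump(x, residuals):
    """Weak learner: the single threshold split on x that best fits the residuals."""
    best = None
    for threshold in sorted(set(x))[:-1]:      # exclude the maximum so both sides are non-empty
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda xi: left_mean if xi <= threshold else right_mean

def boost(x, y, n_rounds=100, eta=0.3):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    prediction = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):                  # stopping criterion: fixed number of rounds
        residuals = [yi - pi for yi, pi in zip(y, prediction)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        prediction = [pi + eta * stump(xi) for xi, pi in zip(x, prediction)]
    return lambda xi: base + eta * sum(stump(xi) for stump in stumps)

# Hypothetical data: distance from the river (m) and flood (1) / non-flood (0) labels.
distance = [10, 30, 60, 90, 150, 300]
flooded = [1, 1, 1, 0, 0, 0]

model = boost(distance, flooded)
print(model(20), model(250))
```

The learning rate eta shrinks each stump's contribution, so many weak learners are needed to approach the labels; this shrinkage is one of the simplest guards against over-fitting.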

Results and discussion

Flood susceptibility maps obtained by the used algorithms

Frequency ratio

The results of the FR model are presented in Table 3. The highest FR is related to the elevation class of 220–1000 m, with an FR value of 4.7. The class of 1000–1650 m has the second-highest FR value of 1.4. In the case of land use, agriculture and residential areas have the highest FR values of 7.5 and 9.7, respectively. FR for NDVI shows that classes of less than 0.75 have high FR values. The NDVI class of 0.1–0.25 and the NDVI class lower than 0.1 have the highest FR values of 1.8 and 1.6, respectively. For plan curvature, the class of (−0.001)–(0.001) had the highest FR value of 4.6. In the case of profile curvature, the class of more than 0.001 has the highest FR value of 1.7. Rainfall classes of 725–880 and 617–728 have the highest FR values of 2.7 and 1.3, respectively. In the case of distance from rivers, the classes of 50–100 and 150–200 m have the highest FR values of 10.7 and 10.1, respectively. FR results for slope showed that the classes of 0–2 (FR = 5.5) and 15–70 (FR = 1.3) have higher FR values than other classes. In the case of SPI, the class of 2.5–80 has a high FR value of 12.3. Regarding TWI, the class of more than 18.3 has the highest FR value of 36.5. It should be mentioned that this class covers only 1% of the study region; thus, it does not have much importance in this model. The second-highest FR value was observed for the TWI class of 14.1–18.3. Figure 5 shows the flood susceptibility map produced by the FR model, classified by the natural break classification scheme. The area percentages of the flood susceptibility classes for the FR, GAM, RF, and EGB algorithms are shown in Table 4.

Table 3 Results of the FR model for different classes of the factors
Fig. 5 Flood susceptibility map obtained by the FR algorithm

Table 4 Area percent of flood susceptibility classes for the FR, GAM, RF and EGB algorithms

Random forest

The RF model was optimized for the training dataset with a node size of 3, an mtry of 2, and 1000 trees. The confusion matrix for the predictions of the RF on the training data is shown in Table 6. Based on Table 6, the RF predicted 161 non-flood cases and 164 flood cases correctly, while 10 non-flood and 5 flood cases were predicted incorrectly. This gives a class error of 0.0584 for non-flood prediction and a class error of 0.0295 for flood prediction. The importance of the factors in flood susceptibility mapping was determined by calculating the mean decrease Gini and is presented in Table 5. Based on the results, altitude, distance from rivers, TWI, slope, and land use had the highest importance in modelling flood susceptibility. In contrast, lithology, NDVI, and SPI were the least important factors. To define the flood susceptibility classes, we used the natural break classification scheme with four classes, following the literature (Termeh et al. 2018; Khosravi et al. 2019). Figure 6 shows the flood susceptibility map produced by the RF model. According to the flood susceptibility map, low, moderate, high, and very high susceptibility classes cover 77.6, 14.2, 4.3, and 3.9% of the study area, respectively.
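The class errors quoted above follow directly from the four confusion-matrix counts, as the sketch below shows; it also derives the Accuracy and Kappa indices from the same counts. The function name and the standard Cohen's kappa formula are assumptions about how the indices were computed, which the text does not spell out.

```python
def confusion_metrics(tn, fp, fn, tp):
    """Accuracy, Cohen's kappa, and per-class error from a 2x2 confusion matrix.

    tn: non-floods predicted as non-flood    fp: non-floods predicted as flood
    fn: floods predicted as non-flood        tp: floods predicted as flood
    """
    total = tn + fp + fn + tp
    accuracy = (tn + tp) / total
    # Chance agreement from the actual (row) and predicted (column) totals.
    expected = ((tn + fp) * (tn + fn) + (fn + tp) * (fp + tp)) / total ** 2
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "kappa": kappa,
        "non_flood_error": fp / (tn + fp),
        "flood_error": fn / (fn + tp),
    }

# Counts reported for the RF model on the training data.
metrics = confusion_metrics(tn=161, fp=10, fn=5, tp=164)
for name, value in metrics.items():
    print(name, value)
```

The computed class errors, 10/171 and 5/169, agree with the reported 0.0584 and 0.0295 to within the last digit.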

Table 5 Importance of the factors in modelling flood susceptibility in the study area
Table 6 Confusion matrix of the RF model for the training dataset
Fig. 6 Flood susceptibility map obtained by the RF algorithm

Generalized additive model

The GAM was optimized with the select parameter set to FALSE, giving accuracy and Kappa indices of 0.98 and 0.97, respectively. For optimizing the GAM, the tuning parameter for the method was set to “generalized cross-validation Cp”. Figure 7 shows the flood susceptibility map produced by the GAM. Based on the flood susceptibility map classified by natural breaks, low, moderate, high, and very high susceptibility classes occupy 90.4, 0.7, 0.9, and 8% of the studied region, respectively.

Fig. 7 Flood susceptibility map obtained by the GAM algorithm

Extreme gradient boosting

Based on the results, the final EGB model was optimized with 100 rounds, a λ of 0.1, an α of 0.1, and an η of 0.3 (Fig. 8). Further, it can be seen that 100 iterations produce the best accuracy for the different alpha and regularization terms. The accuracy and Kappa values of the optimum EGB algorithm were 0.95 and 0.90, respectively. Low, moderate, high, and very high classes of susceptibility cover 91.6, 14.2, 4.3, and 3.9%, respectively (Fig. 9). The EGB results in Table 5 also showed the high importance of distance from the rivers, NDVI, slope, and TWI, and a lower contribution of lithology, plan curvature, rainfall, and land use.

Fig. 8 Training results of the EGB-Linear

Fig. 9 Flood susceptibility map obtained by the EGB algorithm

Evaluating the performance of the models

Given the importance of the performance evaluation step, this study used the ROC curve for this purpose. ROC is a common and robust method for evaluating binary classification problems and has been used in different fields of study, including groundwater, flood, flood spreading, and landslide (Naghibi et al. 2017, 2018; Rahmati et al. 2018; Golkarian et al. 2018; Chen et al. 2019, 2020a, 2020c; Kordestani et al. 2019; Chen and Li 2020; Wang et al. 2020; Zhao and Chen 2020a, 2020b; Lei et al. 2020a, 2020b; Li and Chen 2020; Chen and Chen 2021). The ROC curve plots “sensitivity” against “1-specificity” at different cut-off values (Conoscenti et al. 2016; Naghibi and Moradi Dashtpagerdi 2016). The area under the ROC curve (AUC) varies from 0 to 1, where an AUC close to 1 indicates a high-performance model and an AUC close to 0 indicates a low-performance model (Sangchini et al. 2016; Hong et al. 2017; Mousavi et al. 2017). Based on the ROC results in Table 7, the RF and EGB are the leading models, with the highest AUCs of 0.985 and 0.98, respectively. The GAM and FR models had lower AUC scores of 0.94 and 0.953, respectively. Based on the accuracy scores, RF had the highest performance with an accuracy of 0.965, followed by the EGB and GAM. The Kappa index also showed the high performance of the RF and EGB compared to the other models.
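The AUC can be computed without drawing the curve, because it equals the probability that a randomly chosen flood point receives a higher susceptibility score than a randomly chosen non-flood point. The sketch below uses this pairwise formulation; the labels and scores are hypothetical, and the study itself does not state which implementation was used.

```python
def roc_auc(labels, scores):
    """AUC as the share of (flood, non-flood) pairs ranked correctly; ties count half."""
    positives = [score for label, score in zip(labels, scores) if label == 1]
    negatives = [score for label, score in zip(labels, scores) if label == 0]
    wins = 0.0
    for positive in positives:
        for negative in negatives:
            if positive > negative:
                wins += 1.0
            elif positive == negative:
                wins += 0.5
    return wins / (len(positives) * len(negatives))

# Hypothetical test points: 1 = flood, 0 = non-flood, with model susceptibility scores.
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.5, 0.1]
print(roc_auc(labels, scores))  # 3 of the 4 pairs are ranked correctly -> 0.75
```

A model that ranks every flood point above every non-flood point reaches an AUC of 1, and random scores give about 0.5.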

Table 7 Results of area under the ROC curve (AUC)

Discussion

Floods occur frequently in different countries, particularly in Middle Eastern countries, as a result of a lack of proper water resources management plans and strategies. This leads researchers to use more advanced algorithms to generate high-accuracy flood susceptibility maps and provide initial information for further actions to reduce damage and save lives. This study used EGB as a new machine learning algorithm (MLA) and assessed its performance for this purpose. Based on Fressard et al. (2014), all the algorithms produced excellent predictions (AUC > 0.9). In addition to AUC, Accuracy and Kappa were calculated for the algorithms and showed that the RF and EGB had the best performances, followed by the GAM and FR algorithms. The higher performance of the RF could have resulted from its strong features. RF is robust to noise and outliers (Sameen et al. 2019), which are frequent in geospatial studies such as flood susceptibility. RF is capable of estimating the importance or influence ratio of the input factors in the modelling process (Naghibi et al. 2016). This capability makes the model more interpretable than other black-box tree-based models (Pal 2005). RF can handle multiple different inputs without removing factors (Naghibi and Pourghasemi 2015; Sameen et al. 2019), and it can work with very large datasets. GAM and FR also showed acceptable performances. EGB, on the other hand, applies the boosting technique, which is known as a strong feature in data mining models and results in better outputs for classification problems. The “gradient boosting method” suffers from the lack of a “strong regulation parameter”, which makes it vulnerable to “over-fitting”, but the regularization parameter in EGB overcomes this shortcoming (Georganos et al. 2018). The impact of boosting was also confirmed in another study, i.e., Naghibi et al. (2017), where the FR model was used to combine the results of some data mining models.
Their ensemble model, constructed on the basis of boosting, performed better, which is consistent with the results of this research. The results of Georganos et al. (2018) in object-based land-use classification showed a superior performance of the EGB compared to other models such as RF and support vector machines. In another study, Naghibi et al. (2020) also reported the superior efficiency of the EGB in groundwater potential studies, which is in line with our findings. Babajide Mustapha and Saeed (2016) showed that the EGB performed well in classifying biological datasets and pointed out that EGB is capable of handling both “homogenous and heterogenous” datasets. It also does not require separate handling of missing cases, which enhances its computational speed (Timofeev and Denisov 2020). FR, as a statistical model, provides easy-to-interpret outputs that can be useful for managers as well as stakeholders (Nourani et al. 2014). Therefore, the selected models in this study provide both complex high-performance and simple interpretable results. The more complex structure of the RF and EGB might have caused their superior performance over the GAM and FR models, which have simpler structures.

The factor importance results of the RF model, the best algorithm in this research, showed that distance from the rivers had an important influence on flood susceptibility, followed by profile curvature, slope, TWI, and altitude. The results of Khosravi et al. (2018) showed that altitude had the highest importance in modelling flood susceptibility, followed by distance from the river, NDVI, soil type, and slope. This shows that, in spite of differences in the importance of the factors affecting flood susceptibility, there are some shared results, for instance for distance from the river and slope. The differences between the importance values in this study and those of Khosravi et al. (2018) could be related to the physical, topographical, and hydrological characteristics of the watersheds. Floods occur within certain distances of rivers; thus, this factor has contributed strongly to the modelling. Steeper slopes are associated with higher elevations, where drainage density is higher and flood discharge is lower. Therefore, we do not expect floods to occur in those areas. A range of slopes between mountainous and plain areas, where discharge reaches higher amounts, is more susceptible to flood occurrence. Profile curvature and TWI, as secondary topographical factors, as well as altitude affect drainage development in different parts of the watershed, runoff speed, and the erosion and sediment ratio.

Conclusion

The current study confirmed the high performance of the EGB in FSM compared with the RF as the benchmark algorithm in such studies. The RF and EGB models had AUC values of at least 0.98, which is regarded as excellent prediction ability in classification problems. Therefore, it can be concluded that the EGB can be utilized for FSM. In addition to the performance analysis, the importance of the factors was assessed and showed the high importance of distance from the river, profile curvature, slope, TWI, and altitude in modelling flood occurrence. It is also concluded that the topographical or DEM-derived factors have a great influence. This finding helps researchers to target and better select the factors for future studies. The current study gives the flood control sector a regional perspective for focusing on potentially disastrous areas and mitigating damage. More precisely, the northern parts of the watershed are more susceptible, and flood control strategies should be concentrated on those areas. For future studies, it is recommended to apply different optimization algorithms to enhance the performance of the EGB and produce more reliable flood susceptibility maps.