Introduction

Landslides are the downslope movement, under gravity, of slope-forming materials, including natural rock, soil, artificial fill, or a mixture of these (Guzzetti 2015). Landslides are among the most common catastrophic natural hazards; they occur in every region of the world and cause hundreds of millions of dollars in economic losses, soil erosion, and hundreds of thousands of deaths and injuries every year (Aleotti and Chowdhury 1999; De Sy et al. 2013; Lee et al. 2017). In recent years, many governments and research institutions have invested in preparing maps that show the spatial distribution of landslides (Xie et al. 2005; Guo-liang et al. 2017). Despite progress in landslide identification, measurement, forecasting, and warning systems, losses from landslides are still increasing worldwide (Kincal et al. 2010). Landslides result from interconnected local and temporal processes and factors, including hydrological processes (precipitation, evaporation, and groundwater), vegetation weight, root strength, soil condition, parent rock, elevation above sea level, slope degree and aspect, topography, and human activities (Youssef 2015; Myronidis et al. 2016). Over the last decades, different methods have been developed to evaluate landslide susceptibility in different parts of the world. These methods can be divided into qualitative and quantitative categories. Qualitative methods are based on field observations and expert knowledge; the weights of the conditioning factors are determined by experts (Wen et al. 2017; Achour et al. 2017). Quantitative methods, which have often been used for landslide susceptibility mapping in the last few decades, fall into three main categories: deterministic approaches, statistical methods, and computational intelligence methods. Statistical methods can be further divided into bivariate and multivariate methods.
The bivariate methods include the information value method (Chen et al. 2016a, b; Ba et al. 2017; Achour et al. 2017), frequency ratio (Wu et al. 2016; Li et al. 2017), certainty factor (Wen et al. 2017; Hong et al. 2017a, b; Kornejady et al. 2017), and evidential belief function (Ding et al. 2016; Pourghasemi and Kerle 2016). Multivariate statistical analyses, in contrast, include logistic regression (Wang et al. 2015; Colkesen et al. 2016; Horafas and Gkeki 2017; Guo-liang et al. 2017). Computational intelligence models include artificial neural networks (Dou et al. 2015; Moosavi and Niazi 2015; Wang et al. 2016; Chen et al. 2016a, b; Pourghasemi et al. 2017; Zeng et al. 2017), support vector machines (Ren et al. 2015; Hong et al. 2016a, b; Colkesen et al. 2016; Chen et al. 2017a, b), and random forest (Hong et al. 2016a, b; Zhang et al. 2017; Chen et al. 2017a, b; Kim et al. 2017; Lai et al. 2017; Pourghasemi and Rahmati 2017). Several studies have also examined various scenarios (Avolioa et al. 2000; Du et al. 2013; Mantovani et al. 2000; Prompera et al. 2014). Both quantitative and qualitative methods have advantages and disadvantages reported in the literature. The accuracy of qualitative methods depends strongly on the expertise of the researcher (Feizizadeh et al. 2014). Deterministic models are usually applied to small areas, because they rely on computing the balance between resisting and driving forces, which requires precise data on slope geometry, soil and rock properties, and hydrological conditions (Armas et al. 2014). Among the statistical methods, bivariate models can quantify the impact of each conditioning factor class on landslide occurrence but do not consider the relative importance of the factors themselves, whereas multivariate statistical methods do the opposite (Guo-liang et al. 2017).
Statistical models have proven suitable for studying landslide susceptibility over large areas, but they do not easily capture unforeseen relationships between large numbers of landslide conditioning factors and complex landslide systems (Pourghasemi et al. 2013). Computational intelligence models focus on learning approaches that recognize the nonlinear relationships between conditioning factors and landslides (Gordan et al. 2015). These models have been successfully implemented for landslide susceptibility mapping.

The main objective of this study is therefore to provide different scenarios for landslide susceptibility assessment in the Ghaemshahr Watershed, Mazandaran Province, Iran, using a combination of statistical and computational intelligence methods. Frequency ratio was selected from among the statistical methods, and random forest and support vector machines from among the computational intelligence methods, to apply ten scenarios to landslide modeling. The study also considers scenario-based landslide modeling with landslides represented in both point and polygon formats.

Materials and methods

Study area

The Ghaemshahr Watershed, with a total area of 1637 km2, is located approximately 43 km southwest of the city of Sari in Mazandaran Province. The study area lies between latitudes 35°44′–36°09′N and longitudes 52°36′–53°23′E (Fig. 1). The maximum elevation, 3877 m above sea level (a.s.l.), is in the southwest of the study area, and the minimum elevation, 476 m a.s.l., is in the northeast. The mean annual rainfall of the basin is more than 500 mm, and maximum rainfall occurs from January to April, according to records from nine rainfall stations (Alasht, Tale-Savadkoh, Alvand-Doab, Ori-Melk, Zardgol-Sorkhabad, Doabe-Savadkoh, Veresk, Docal, and Nesa) for the years 1985–2015 (Meteorological Organization, http://www.irimo.ir/far/). The study area is covered by lithological formations of various ages, including Triassic, Pliocene, Cretaceous, Eocene, Miocene, Cambrian, Devonian, Quaternary, and Paleozoic. The largest land use class is forest (B), which covers about 636 km2. The other land use types are orchard (A), range (C), forest and dryfarming (D), irrigation agriculture and range (E), forest and range (F), and residential area (G).

Fig. 1 Study area location

Methodology

The flowchart of the methodology used in this study is shown in Fig. 2 and consists of five phases:

  (1) Preparation of data,
  (2) Multi-collinearity analysis among conditioning factors using tolerance and VIF indices,
  (3) Determination of the relationship between landslide occurrence and conditioning factors using the FR model,
  (4) Running the SVM and RF intelligent models by applying different scenarios to landslide point and polygon formats, and
  (5) Validation of the landslide susceptibility maps using the ROC curve.

Fig. 2 Flowchart of the research in the study area

Conditioning factors database

The tools used in this research are ArcGIS 10.5, ENVI 4.8 (for extraction of LU/LC), SAGA-GIS 2.1.1, and Global Positioning System (GPS). The base data were geological maps at a scale of 1:100,000, aerial photos (08/02/1964) at a scale of 1:40,000, topographic maps at a scale of 1:50,000, Landsat 8 satellite images, the ASTER DEM, LISS-III images, and rainfall data for a 30-year period (1985–2015) from nine rainfall stations (Alasht, Tale-Savadkoh, Alvand-Doab, Ori-Melk, Zardgol-Sorkhabad, Doabe-Savadkoh, Veresk, Docal, and Nesa).

The first step in landslide susceptibility mapping and risk analysis is to collect data on past landslides, so preparing a landslide inventory map is a prerequisite for such studies (Guzzetti et al. 2012). In the study area, a total of 294 landslides were mapped using 1:40,000-scale aerial photographs, satellite images (IRS LISS-III), and several field surveys. Most of the landslides are shallow rotational slides, with a few translational slides. The landslide inventory was randomly split into a training dataset (70%, 206 landslide locations) for building the models and a validation dataset (the remaining 30%, 88 landslide locations) (Chen et al. 2017a, b). Field photographs of some identified landslides in the study area are shown in Fig. 3.
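The random 70/30 split of the inventory can be sketched as follows. This is only an illustration under stated assumptions: the landslide identifiers, the function name `split_inventory`, and the fixed seed are hypothetical and are not taken from the study.

```python
import random


def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation sets.

    The seed is an arbitrary choice made here for reproducibility.
    """
    rng = random.Random(seed)
    shuffled = list(landslide_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical identifiers for the 294 mapped landslides.
train_ids, validation_ids = split_inventory(range(1, 295))
print(len(train_ids), len(validation_ids))
```

With 294 landslides and a 0.7 fraction, rounding gives the 206 and 88 locations reported in the text.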

Fig. 3 Field photographs of some identified landslides in the study area

Identifying and selecting the conditioning factors is one of the most important steps in landslide susceptibility mapping (Ercanoglu and Gokceoglu 2002). In this research, based on previous studies (Zhang et al. 2017; Zeng et al. 2017) and the characteristics of the study area, 13 landslide conditioning factors were selected: slope aspect, slope degree, altitude, plan curvature, distance from river, drainage density, distance from fault, distance from road, LU/LC, topographic wetness index (TWI), annual rainfall, geology, and convergence index (Fig. 4). The factor maps were prepared for processing in ArcGIS 10.5. The ASTER digital elevation model (DEM), with 30 m spatial resolution, was used to derive altitude, slope aspect, slope degree, and geomorphometric parameters such as TWI, plan curvature, and convergence index. Elevation does not contribute directly to landslide occurrence, but through its relation to other parameters, such as tectonics, erosion–weathering processes, and precipitation, it influences the whole system (Rozos et al. 2011). The elevation map of the study area, with a cell size of 30 m × 30 m, was produced from the ASTER DEM and classified into six classes: 477–1200, 1200–1600, 1600–2000, 2000–2400, 2400–2800, and > 2800 m (Fig. 4a). The slope map was derived from the ASTER DEM using the slope function in ArcGIS 10.5. The slope values (in degrees) were divided into six classes using the natural break scheme (Wu et al. 2016): flat-gentle slope (< 10°), fair slope (10°–15°), low slope (15°–20°), moderate slope (20°–30°), steep slope (30°–40°), and very steep slope (> 40°) (Fig. 4b). Slope aspect strongly affects hydrological processes through evapotranspiration and the direction of frontal precipitation, and thus affects weathering processes and vegetation and root development (Sidle and Ochiai 2006).
The aspect layer was categorized into nine classes: flat, north, northeast, east, southeast, south, southwest, west, and northwest (Fig. 4c). The convergence index (CI) gives a measure of how flow in a cell diverges (convergence index < 0) or converges (convergence index > 0) (Claps et al. 1994). The CI map was produced in SAGA-GIS 2.1.1 and divided into five classes: −100 to −22, −22 to −6, −6 to 6, 6–21, and > 21 (Fig. 4d). Plan curvature reflects the directional variation of the surface along a curve. Its effect on the slope erosion process is the convergence and divergence of water along the flow direction (Ercanoglu and Gokceoglu 2002). The plan curvature map was produced using ArcGIS 10.5 and classified into three categories (Pourghasemi and Kerle 2016): concave, flat, and convex (Fig. 4e). The TWI combines the upslope contributing area and the local slope gradient (Eq. 1). It is an indicator of the spatial distribution of soil moisture across the landscape and is therefore used for landslide susceptibility mapping (Pourghasemi et al. 2014; Naghibi et al. 2015).

$${\text{TWI}} = \ln \left( {\frac{S}{\tan \alpha }} \right)$$
(1)

where S is the cumulative upslope drainage area and α is the slope gradient in degrees (Moore et al. 1991). The TWI map was divided into four classes (Hong et al. 2016a, b): −5.4 to −0.28, −0.28 to 1.26, 1.26–4, and > 4 (Fig. 4f).
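As a minimal sketch of the TWI formula, TWI = ln(S / tan α), the per-cell calculation could look like the following. The function name, the sample values, and the small minimum slope used to avoid division by zero on flat cells are assumptions made here for illustration; they do not describe the GIS workflow used in the study.

```python
import math


def topographic_wetness_index(upslope_area, slope_deg, min_slope_deg=0.1):
    """TWI = ln(S / tan(alpha)) for one cell.

    upslope_area  : cumulative upslope drainage area S
    slope_deg     : slope gradient alpha in degrees
    min_slope_deg : assumed lower bound so that tan(alpha) is never zero
    """
    slope = max(slope_deg, min_slope_deg)
    return math.log(upslope_area / math.tan(math.radians(slope)))


# Hypothetical cells: same drainage area, different slopes.
gentle = topographic_wetness_index(100.0, 5.0)
steep = topographic_wetness_index(100.0, 45.0)
print(gentle > steep)
```

For a fixed drainage area, gentler slopes give larger TWI values, which is why the index is read as a proxy for where soil moisture accumulates.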

Fig. 4 Landslide conditioning factors

The annual rainfall map was prepared by applying the rainfall–altitude gradient equation of the region (Eq. 2) to the digital elevation model and was classified into five classes (Chen et al. 2016a, b): 475–520, 520–580, 580–620, 620–670, and > 670 mm/yr (Fig. 4g).

$$Y = 0.5667X + 108.33$$
(2)

where Y is annual rainfall and X is altitude. R² is 0.923.
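The rainfall gradient, Y = 0.5667X + 108.33, and the subsequent reclassification can be sketched as below. The units (altitude in m, rainfall in mm/yr), the helper names, the handling of values that fall exactly on a class break, and the sample altitudes are assumptions made for this illustration.

```python
from bisect import bisect_right

# Upper bounds of the first four rainfall classes (mm/yr), from the text.
RAIN_CLASS_BREAKS = [520, 580, 620, 670]
RAIN_CLASS_LABELS = ["475-520", "520-580", "580-620", "620-670", ">670"]


def annual_rainfall(altitude_m):
    """Annual rainfall estimated from altitude: Y = 0.5667 X + 108.33."""
    return 0.5667 * altitude_m + 108.33


def rainfall_class(rain_mm):
    """Return the class label; a value equal to a break goes to the upper class.

    In this sketch, values below 475 are also assigned to the first class.
    """
    return RAIN_CLASS_LABELS[bisect_right(RAIN_CLASS_BREAKS, rain_mm)]


# Hypothetical cell altitudes in metres.
for altitude in (700, 900):
    rain = annual_rainfall(altitude)
    print(altitude, round(rain, 2), rainfall_class(rain))
```

The same break-point lookup applies to the other classified factors, such as altitude or distance from roads, by swapping in their class limits.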

The distance from linear features such as rivers, roads, and faults was calculated using the distance function available in ArcGIS 10.5. Distance from roads (Fig. 4h) and distance from rivers (Fig. 4i) were divided into six classes: 0–100, 100–200, 200–300, 300–400, 400–500, and > 500 m. Distance from faults was divided into five classes: 0–500, 500–1000, 1000–1500, 1500–2000, and > 2000 m (Fig. 4j). Drainage density was prepared by applying the line density function available in ArcGIS 10.5 and divided into five classes (Chauhan et al. 2010): 0–0.5, 0.5–0.8, 0.9–1.1, 1.1–1.4, and > 1.4 km/km2 (Fig. 4k). The land use/land cover (LU/LC) map of the study area was prepared from Landsat 8 images using a supervised classification with the maximum likelihood algorithm (Pourghasemi and Kerle 2016). Seven land use types were extracted: orchard (A), forest (B), range (C), forest and dryfarming (D), irrigation agriculture and range (E), forest and range (F), and residential area (G) (Fig. 4l). The generated LU/LC map was validated using 285 field verification points. The Kappa coefficient of the final map was estimated by Eq. 3 (Lo and Yeung 2002).

$$K = \frac{{N\sum\nolimits_{i = 1}^{r} {X_{ii} } - \sum\nolimits_{i = 1}^{r} {\left( {X_{i + } \cdot X_{ + i} } \right)} }}{{N^{2} - \sum\nolimits_{i = 1}^{r} {\left( {X_{i + } \cdot X_{ + i} } \right)} }}$$
(3)

where r is the number of rows in the error matrix; X_ii is the number of observations in row i and column i; X_i+ is the total of observations in row i; X_+i is the total of observations in column i; and N is the total number of observations included in the matrix. The Kappa coefficient of the generated LU/LC map was 97.65%.
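For readers who want to reproduce the accuracy check, the Kappa coefficient in its standard form, K = (N ΣX_ii − Σ(X_i+ · X_+i)) / (N² − Σ(X_i+ · X_+i)), can be computed from an error matrix as sketched below. The 2 × 2 matrix is hypothetical and is not the error matrix of the LU/LC map.

```python
def kappa_coefficient(error_matrix):
    """Kappa from a square error matrix (rows and columns are the classes)."""
    r = len(error_matrix)
    n = sum(sum(row) for row in error_matrix)                 # N
    diagonal = sum(error_matrix[i][i] for i in range(r))      # sum of X_ii
    row_totals = [sum(error_matrix[i]) for i in range(r)]     # X_i+
    col_totals = [sum(error_matrix[i][j] for i in range(r))   # X_+i
                  for j in range(r)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(r))
    return (n * diagonal - chance) / (n * n - chance)


# Hypothetical error matrix for two classes and 100 verification points.
example_matrix = [[45, 5],
                  [10, 40]]
print(kappa_coefficient(example_matrix))
```

Here 85 of 100 points agree, but half of that agreement is expected by chance, so Kappa is noticeably lower than the raw agreement.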

The geological map of the region was prepared by digitizing the polygons of the lithological units from the 1:100,000-scale geological map in ArcGIS 10.5. The lithological units were classified into ten categories according to formation and their susceptibility to landslide occurrence (Fig. 4m and Table 1).

Table 1 Lithology of the study area (GSI, 1997)

For the classification of the conditioning factors, different methods such as manual, equal interval, and natural break were used.

Multi-collinearity analysis

An important issue in using conditioning factors to prepare a landslide susceptibility map is the effect of correlation among them. When two or more independent variables are highly correlated, the problem is called multi-collinearity, and it reduces the accuracy of the results. Tolerance and the variance inflation factor (VIF) are two important indices for detecting multi-collinearity. A tolerance of less than 0.20 or 0.10 and/or a VIF of 5 or 10 and above indicates a multi-collinearity problem (Pourghasemi et al. 2013).
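The two indices can be illustrated with a short sketch. It relies on a standard identity: the VIF of each variable is the corresponding diagonal element of the inverse of the correlation matrix, and tolerance is 1/VIF. The helper names and the two sample variables are hypothetical; the study's own values come from its statistical software, not from this code.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of equally long lists of values."""
    n = len(columns[0])
    centered = [[v - sum(col) / n for v in col] for col in columns]
    norms = [math.sqrt(sum(v * v for v in col)) for col in centered]
    k = len(columns)
    return [[sum(a * b for a, b in zip(centered[i], centered[j]))
             / (norms[i] * norms[j]) for j in range(k)] for i in range(k)]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting."""
    k = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(k)]
           for i, row in enumerate(matrix)]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(k):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[k:] for row in aug]


def vif_and_tolerance(columns):
    """Return a list of (VIF, tolerance) pairs, one per variable."""
    inverse = invert(correlation_matrix(columns))
    return [(inverse[i][i], 1.0 / inverse[i][i]) for i in range(len(columns))]


# Two hypothetical conditioning factors sampled at five locations.
factor_a = [1, 2, 3, 4, 5]
factor_b = [2, 1, 4, 3, 5]
results = vif_and_tolerance([factor_a, factor_b])
print(results)
```

The sample factors have a correlation of 0.8, which gives a VIF of about 2.78 and a tolerance of 0.36, inside the acceptable range quoted above.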

Determination of the relationship between landslide occurrence and conditioning factors using FR model

The FR model is considered the most popular and simplest approach for preparing landslide susceptibility maps (Wu et al. 2016). The FR is based on the observed relationship between the distribution of landslides and each landslide-related factor (Tay et al. 2014). To compute it, the proportion of landslides that fall in each class of a conditioning factor, relative to all landslides, is obtained, and the proportion of the study area occupied by each class is also calculated. The frequency ratio of each class is then obtained by dividing the first proportion by the second. The calculation of the frequency ratio for each class of the conditioning factors is expressed in Eq. 4 (Pradhan and Lee 2010):

$${\text{FR}} = \frac{A/B}{C/D} = \frac{E}{F}$$
(4)

where A is the number of landslide pixels in each class, B is the total number of landslide pixels of the whole area, C is the number of pixels in each class of conditioning factors, D is the total number of pixels in the area, E is the percentage of landslide occurrences in each class of conditioning factors, and F is the relative percentage of the area of each class.
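The frequency ratio, FR = (A/B)/(C/D), can be sketched for all classes of one factor as follows. The class names and pixel counts are hypothetical and serve only to show the arithmetic.

```python
def frequency_ratio(a, b, c, d):
    """FR = (A / B) / (C / D).

    a: landslide pixels in the class      b: landslide pixels in the whole area
    c: pixels in the class                d: pixels in the whole area
    """
    return (a / b) / (c / d)


def frequency_ratio_table(landslide_pixels, class_pixels):
    """FR for every class of one conditioning factor."""
    b = sum(landslide_pixels.values())
    d = sum(class_pixels.values())
    return {name: frequency_ratio(landslide_pixels[name], b, class_pixels[name], d)
            for name in class_pixels}


# Hypothetical pixel counts for a factor with three classes.
landslide_pixels = {"class_1": 10, "class_2": 30, "class_3": 60}
class_pixels = {"class_1": 5000, "class_2": 3000, "class_3": 2000}
fr_values = frequency_ratio_table(landslide_pixels, class_pixels)
print(fr_values)
```

An FR above 1 means the class holds a larger share of the landslides than of the area, and an FR below 1 means the opposite.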

Support vector machine (SVM)

Support vector machine (SVM) is a supervised learning method based on statistical learning theory (Vapnik 1995) and the principle of structural risk minimization (Chen et al. 2016a, b). It seeks an optimal hyperplane that separates two classes (Tien Bui et al. 2016). More detailed descriptions of the SVM algorithm for landslide assessment are given by Marjanovic et al. (2011) and Colkesen et al. (2016).

The landslide susceptibility map was produced with SVM using the “rminer” package (Cortez 2015). Of the four kernel types available for SVM modeling (linear, polynomial, radial basis function (RBF), and sigmoid), the RBF kernel was used in this research.

Random forest (RF)

The random forest algorithm, developed by Breiman (2001), is based on an ensemble of decision trees and is now regarded as one of the best learning methods (Zhang et al. 2017). The model averages the results of all decision trees. The random forest method is widely used for data prediction and interpretation and is suitable for nonlinear, high-dimensional landslide susceptibility modeling problems (Messenzehl et al. 2016). The RF algorithm tends to produce quite accurate models, because it decreases the variance of the model without increasing the bias (Hastie et al. 2009). The algorithm requires two parameters to be set by the user: the number of trees (T) and the number of variables tried at each split (m).

The main advantage of this approach is that it can handle a large number of input variables without variable deletion (Immitzer et al. 2012). Compared with other algorithms, this model is more efficient in classifying large datasets; moreover, it can process high-dimensional datasets without feature selection and can rank the parameters by importance after training (Zhang et al. 2017). The method uses unbiased estimation in model building, and training is fast and simple to run. It is suitable for regional-scale applications and is useful for landslide susceptibility mapping (Zhang et al. 2017). Random forest was run in the R statistical software using the randomForest package (Breiman and Cutler 2015).

Applying different scenarios for ensemble of intelligent techniques

After the class weights of each factor were calculated using the bivariate statistical method (FR) and the RF and SVM computational intelligence methods were run, different scenarios were developed to produce a reasonable landslide susceptibility map in both landslide point and polygon formats, according to Eqs. 5–16:

$${\text{Scenario}}1 = \frac{{\left( {{\text{FR}} + {\text{SVM}}} \right)}}{2}$$
(5)
$${\text{Scenario}}2 = \left( {2 \times {\text{RF}}} \right) + {\text{SVM}}$$
(6)
$${\text{Scenario}}3 = \left( {2 \times {\text{SVM}}} \right) + {\text{RF}}$$
(7)
$${\text{Scenario}}4 = {\text{SVM}} + {\text{RF}}$$
(8)
$${\text{Scenario}}5 = {\text{SVM}} \times {\text{RF}}$$
(9)
$${\text{Scenario}}6 = \frac{{\left( {{\text{SVM}} + {\text{RF}}} \right)}}{3}$$
(10)
$${\text{Scenario}}7 = \left( {\frac{{\left( {{\text{FR}} \times {\text{AUC}}_{\text{RF}} } \right) + \left( {{\text{SVM}} \times {\text{AUC}}_{\text{SVM}} } \right)}}{{\left( {{\text{AUC}}_{\text{RF}} + {\text{AUC}}_{\text{SVM}} } \right)}}} \right)$$
(11)
$${\text{Scenario}}8 = {\text{RF weighted by FR}}$$
(12)
$${\text{Scenario}}9_{\text{RF}} = {\text{Individual RF model}}$$
(13)
$${\text{Scenario}}9_{\text{SVM}} = {\text{Individual SVM model}}$$
(14)
$${\text{Scenario}}10_{\text{RF}} = {\text{Individual RF model}}$$
(15)
$${\text{Scenario}}10_{\text{SVM}} = {\text{Individual SVM model}}$$
(16)

where scenarios 10SVM and 10RF use landslides in point format, and all other scenarios use landslides in polygon format. AUC is the area under the curve of the corresponding model.
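The arithmetic scenarios that combine only the SVM and RF outputs (Eqs. 6–10) amount to cell-by-cell raster algebra. The sketch below applies them to plain Python lists instead of rasters; the score values and helper names are hypothetical, and the study itself performed this step in a GIS environment.

```python
def combine_scenarios(svm, rf):
    """Cell-by-cell combinations of SVM and RF scores (Eqs. 6-10)."""
    pairs = list(zip(svm, rf))
    return {
        "scenario2": [2 * r + s for s, r in pairs],    # (2 x RF) + SVM
        "scenario3": [2 * s + r for s, r in pairs],    # (2 x SVM) + RF
        "scenario4": [s + r for s, r in pairs],        # SVM + RF
        "scenario5": [s * r for s, r in pairs],        # SVM x RF
        "scenario6": [(s + r) / 3 for s, r in pairs],  # (SVM + RF) / 3
    }


def ranking(values):
    """Cell indices ordered from the lowest to the highest score."""
    return sorted(range(len(values)), key=lambda i: values[i])


# Hypothetical susceptibility scores for three cells.
svm_scores = [0.2, 0.8, 0.5]
rf_scores = [0.4, 0.6, 0.7]
scenario_maps = combine_scenarios(svm_scores, rf_scores)
print(ranking(scenario_maps["scenario4"]), ranking(scenario_maps["scenario6"]))
```

Because Scenario 6 as written is Scenario 4 divided by a constant, the two order the cells identically in this sketch; they can differ only in the absolute values passed to the classification step.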

Results and discussion

In the Ghaemshahr Watershed, landslides are the most serious natural hazard; they affect economic development and cause loss of fertile soil and great damage to land and property. According to earlier studies by the Iranian Landslide Working Party (ILWP 2007), the highest frequency of landslide occurrence in Iran is in Mazandaran Province.

Soil is one of the most important components of the Earth system, as it controls erosional, hydrological, biological, and geochemical cycles and provides a wide range of services, goods, and resources to humankind (Keesstra et al. 2016, 2018; Comino et al. 2016; Vaezi et al. 2017). Soil erosion in agricultural areas is a major problem worldwide because it causes loss of productivity and land degradation (Kirchhoff et al. 2017), and it must be addressed through nature-based strategies in order to achieve sustainability (Cerdà et al. 2017). Soil erosion is also a key factor in desertification and affects the sustainability goals of the United Nations (Keesstra et al. 2016). Many of the 17 Sustainable Development Goals defined by the UN are strongly related to land and water management and call for sustainable use of resources, ecosystem restoration, biodiversity, and sustainable basin management (Keesstra et al. 2016). The development of human societies requires the prudent use of natural resources such as soil (Keesstra et al. 2016). Soil conservation depends not only on wise decisions by foresters, farmers, and land planners, but also on political decisions about rules and regulations (Keesstra et al. 2016).

Multi-collinearity among conditioning factors

In this study, multi-collinearity was tested using two indices, VIF and tolerance (Table 2). According to Table 2, the smallest tolerance and the highest VIF were 0.31 and 3.20, respectively, so there is no multi-collinearity among the independent factors in the current research.

Table 2 Multi-collinearity test

Spatial relationship between landslides and conditioning factor using FR model

The spatial relationship between landslides and the conditioning factors obtained with the frequency ratio model is shown in Table 3. For altitude, most landslide events are located between 477 and 2400 m, in the classes 477–1200 m (FR = 2.53), 1200–1600 m (3.39), 1600–2000 m (1.21), and 2000–2400 m (1.11). The high susceptibility at altitudes below 1200 m is attributed to human activities such as agriculture and road construction. In contrast, at altitudes above 2800 m, the probability of landslides is negligible because of rocky outcrops. For slope degree, most landslides occur on slopes of 20°–40°, in the classes 20°–30° (1.085) and 30°–40° (1.289). The results show that the probability of landslide occurrence increases with slope degree: as the slope becomes steeper, the shear stress on the slope material increases. However, very steep slopes may not be susceptible to shallow landslides because of outcropping bedrock (Mohammady et al. 2012; Wu et al. 2016). For slope aspect, southwest-facing slopes have the highest FR (2.171). For convergence index, the class −22 to −6 has the highest FR (1.043) and the class > 21 has the lowest (0.000). For plan curvature, the concave class (FR = 1.110) is the most susceptible to landslides, followed by the convex (0.987) and flat (0.894) classes. For TWI, the class −5.4 to −0.28 (FR = 1.02) ranks first in terms of landslide susceptibility, followed by the classes > 4, −0.28 to 1.26, and 1.26–4, with FR values of 1.01, 0.99, and 0.98, respectively.
For rainfall, the class > 670 mm, which has the highest rainfall, shows the highest susceptibility to landslides (FR = 1.589). The results for distance from rivers and distance from roads show that landslide susceptibility decreases with increasing distance from these features; the 0–100 m class has the highest FR for both distance from roads (1.98) and distance from rivers (1.41). This is in line with the results of Mohammady et al. (2012). For distance from faults, the 1500–2000 m class (FR = 1.71) shows the highest susceptibility, followed by the classes 1000–1500, 0–500, 500–1000, and > 2000 m, with FR values of 1.61, 1.21, 1.19, and 0.73, respectively.

Table 3 Relationship between landslide occurrence and conditioning factors using frequency ratio model

For drainage density, there is a direct relationship with landslide susceptibility: susceptibility increases with drainage density. The FR analysis of LU/LC indicates that the forest (B) and irrigation agriculture and range (E) classes, with the highest FR values (1.85 and 1.12), are more susceptible to landslides than the other classes. For geology, the TRJs class, which consists of dark gray shale and sandstone (Shemshak Formation) and has an FR of 1.20, is highly susceptible to landslides in the study area.

Random forest model

The variable importance results from the random forest technique are shown in Fig. 5. According to the mean decrease in accuracy, altitude, slope aspect, drainage density, and distance from rivers are the most important factors for landslide occurrence in the study area, followed in order by annual rainfall, slope degree, distance from roads, distance from faults, geology, convergence index, LU/LC, plan curvature, and TWI. The out-of-bag (OOB) error rate in this study was 3.84% with 1000 trees and three variables tried at each split. Finally, the landslide susceptibility map (LSM) was produced using the RF algorithm and classified with the natural break classification scheme in ArcGIS 10.5 (Pourghasemi and Kerle 2016) into five susceptibility classes: very low, low, moderate, high, and very high.

Fig. 5 Two measures of variable importance calculated by the random forest algorithm

Support vector machine (SVM) model

In this research, the SVM model with a radial basis function (RBF) kernel was trained in the R statistical software. The RBF kernel is one of the most powerful kernels, and in many studies, especially of nonlinear problems, it has provided better prediction results for landslide susceptibility mapping than other kernels. The predicted probability of landslide occurrence ranges between 0 and 1.

The sigma hyper-parameter and the number of support vectors were 0.055 and 6557, respectively.

The results were then exported to ArcGIS 10.5 for visualization. Finally, the landslide susceptibility map was divided, based on the natural break classification, into five susceptibility classes: very low, low, moderate, high, and very high.

Applying different scenarios to produce LSMs

The scenarios cover both landslide point and polygon formats. The landslide susceptibility maps produced by the ten scenarios are shown in Fig. 6a–l. The scenarios were implemented in ArcGIS 10.5 using the Raster Calculator tool. The resulting pixel values were then classified with the natural break classification scheme (Pourghasemi et al. 2013) into five susceptibility classes: very low, low, moderate, high, and very high. The results showed that a large share of the landslide area is located in the very high susceptibility class, even though this class covers only a small part of the watershed (Youssef 2015). In scenarios 1, 2, 3, 4, 5, 6, 7, 8, 9 (RF), 9 (SVM), 10 (RF), and 10 (SVM), the very high susceptibility class covers 8.98, 6.358, 11.28, 8.98, 1.76, 8.98, 8.66, 4.44, 3.11, 13.05, 12.05, and 13.92% of the total area and contains 36.94, 31.68, 39.20, 36.94, 15.44, 36.94, 36.41, 22.99, 22.18, 36.81, 27.89, and 27.77% of the landslide pixels, respectively.

Fig. 6 Landslide susceptibility maps prepared using various scenarios

Validation of landslide susceptibility maps

Validation of landslide susceptibility models (LSMs) is considered one of the most important steps in landslide susceptibility assessment. It is essential for assessing the predictive capability of the landslide susceptibility maps; without validation, an LSM has no scientific significance (Wu et al. 2016). In this study, the receiver operating characteristic (ROC) curve was used to assess the accuracy of the LSMs produced by the different scenarios. In ROC analysis, the area under the curve (AUC) is used to evaluate model accuracy. An AUC of 1.0 indicates a perfect model, and the closer the AUC is to 1.0, the better the model (Tien Bui et al. 2016). In addition, the frequency ratio (FR) and the seed cell area index (SCAI) were used to verify the separation between the susceptibility classes. The SCAI of each class is obtained by dividing the percentage of area in that susceptibility class by the percentage of landslide cells in it. Because the landslides used to build a model cannot also be used to evaluate it (Komac 2006), the landslides detected in the study area were randomly divided into two groups: 70% (206 landslide locations) for training the models and the remaining 30% (88 landslide locations) for validation. According to the classification accuracy assessment with the SCAI and FR indicators (Table 4 and Figs. 7 and 8), in all models the FR shows an almost continuous upward trend as susceptibility increases from very low to very high, whereas the SCAI shows a marked downward trend. This indicates a high correlation between the susceptibility classes and the landslide pixels and field observations of the study area. The susceptibility classes are therefore well separated in the different scenarios.
The results of the AUC evaluation are shown in Table 5.
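Both validation measures can be sketched in a few lines. The AUC is computed here through its pairwise interpretation: the proportion of (landslide, non-landslide) pairs in which the landslide location receives the higher score, with ties counted as one half. The use of a non-landslide sample, the score values, and the function names are assumptions made for this illustration. The SCAI example uses the Scenario 1 figures for the very high class quoted earlier (8.98% of the area, 36.94% of the landslide pixels).

```python
def auc_from_scores(landslide_scores, non_landslide_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for pos in landslide_scores:
        for neg in non_landslide_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(landslide_scores) * len(non_landslide_scores))


def scai(area_percent, landslide_percent):
    """Seed cell area index: class area (%) divided by landslide cells (%)."""
    return area_percent / landslide_percent


# Hypothetical susceptibility scores at validation locations.
landslide_scores = [0.9, 0.8, 0.4]
non_landslide_scores = [0.7, 0.3, 0.2]
auc_value = auc_from_scores(landslide_scores, non_landslide_scores)

# Very high susceptibility class of Scenario 1, values from the text.
scai_very_high = scai(8.98, 36.94)
print(auc_value, scai_very_high)
```

A low SCAI in the very high class, as here, means that a small part of the area contains a large part of the landslides, which is the pattern the downward SCAI trend describes.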

Table 4 Values of FR and SCAI

Fig. 7 FR values in different scenarios

Fig. 8 SCAI values in different scenarios

Table 5 Area under the curve

The AUC of the scenarios varied from 0.668 to 0.749. In general, the maps produced with the landslide polygon format (scenarios 1–9) showed better prediction accuracy than those produced with the landslide point format (scenario 10) and can be used for spatial prediction in landslide hazard analysis in the study area. This is because a polygon comprises many points, whereas the point format represents each landslide by a single point such as the polygon centroid, the landslide toe, or the landslide crown. The polygon format is therefore preferable to sample points. The results of scenarios 1–9 also indicated that the ensemble models are more accurate than the individual SVM model, while the accuracy of random forest is similar to that of the ensemble models. Comparison of scenarios 9 and 10 showed that the SVM and RF models built with landslide polygons were more accurate than the same models built with landslide points.

Pourghasemi and Kerle (2016) used random forest and evidential belief function-based models for landslide susceptibility assessment in Western Mazandaran, Iran, and stated that the combination of these models, with AUC = 81.77, has a high ability to identify areas susceptible to landslides. This is in line with the results achieved here by ensemble modeling compared with the individual SVM technique. Chen et al. (2016a, b) used support vector machine models for landslide susceptibility mapping in Qianyang County, China. Their results indicated that, among four kernels, RBF with AUC = 83.15 performs well in producing a landslide susceptibility map. In our study, in both landslide formats (polygon and point), the SVM–RBF machine learning technique shows the lowest accuracy. Zhang et al. (2017) applied random forest and decision tree methods for landslide susceptibility mapping in the Three Gorges Reservoir area, China, and stated that RF with AUC = 97.0 is suitable for landslide susceptibility mapping. Our results are in line with Zhang et al. (2017), as the individual RF model in scenario 9 had a high accuracy.

Chen et al. (2017a, b) introduced new ensembles of ANN, MaxEnt, and SVM machine learning techniques for landslide spatial modeling. They stated that ensemble models perform well for landslide susceptibility mapping. Our results suggest that ensemble modeling can be recommended to other researchers, as it was more accurate than some individual scenarios (scenario 9SVM and scenario 10).

Conclusion

Landslides are among the most important natural hazards in the world, so landslide susceptibility maps are very important tools that can help planners and decision makers in disaster management. The accuracy of landslide susceptibility maps depends mainly on the amount and quality of the data, the scale, and the methodology. In the present study, landslide susceptibility maps were prepared using a combination of a statistical method (FR) and computational intelligence methods (RF and SVM), by applying different scenarios to landslides in polygon and point formats. These maps will help planners and policy makers to mitigate landslide hazards in the construction of roads and settlements. To produce the landslide susceptibility maps, 13 conditioning factors were used: elevation, slope angle, plan curvature, slope aspect, topographic wetness index (TWI), lithology, LU/LC, distance from rivers, drainage density, distance from faults, distance from roads, convergence index, and annual rainfall. The FR model was applied as a bivariate statistical method to evaluate the correlation between the landslides and the classes of each conditioning factor. Finally, the ROC curve was used to validate the LSMs. The validation results indicated that the AUC of the individual and ensemble models varied from 0.668 to 0.749. The landslide susceptibility maps showed that the high susceptibility areas are mainly distributed in the north to northeast of the study area. Because of the high residential density in this area, it is suggested that any construction there be carried out with greater caution. In addition, because landslides cause loss of fertile soil and land degradation, it is recommended, for the sake of soil conservation, that farmers and foresters avoid unplanned actions on slopes that are sensitive to landslides.