1 Introduction

Landslides occur on steep slopes in hilly and mountainous areas, causing mortality, economic losses and damage to water and soil resources (Schlögel et al. 2015; Raja et al. 2017). This mass movement occurs whenever the loading on an earth material exceeds its shear strength (Lin and Lin 2017). Although landslides have always occurred, factors such as changing climate patterns, continued deforestation of mountainous regions and urban expansion into susceptible areas have increased their frequency around the world in recent years (Goetz et al. 2011). According to a report by the Centre for Research on Epidemiology of Natural Disasters (CRED), landslides are responsible for at least 17% of losses caused by natural disasters in the world (Chen et al. 2019a, b). According to Nadim et al. (2006), South America, the northern parts of the USA and Canada, Iran, Turkey, the Himalayas, the Philippines, Indonesia, Japan and New Zealand are the most landslide-vulnerable areas in the world. Landslides are therefore a problem of global importance. Landslide-induced mortality and economic losses exceed the reported numbers in most countries, and in some cases the damage exceeds that of other natural disasters (Kjekstad and Highland 2009). In addition, the environmental impacts of mass movements, such as damage to forests and rangelands and the increased sediment load of rivers and its transport to dams, should not be ignored. Hence, research on landslides has recently received much attention from policymakers because of the consequences of this destructive phenomenon (Mohammady et al. 2012). Iran is highly susceptible to landslides due to the mountainous topography of the Alborz and Zagros Mountains, its physiography and seismicity, and its diverse climatic and geological conditions.
According to the statistics recorded in Iran, 4900 landslides had occurred by September 2007, resulting in 187 casualties and about USD 12,700 million in financial losses (Pourghasemi et al. 2012a). The study area, located in the Zagros Mountains, is highly susceptible to landslides. This susceptibility is due to several factors, including the growth of rural settlements, changes in land use, the addition of river tributaries and irrigation canals for agricultural use, and the construction of the Karun 3 and 4 dams. The region is therefore of great importance in terms of human, economic, environmental and energy supply resources, and spatial modeling of landslides in this region is necessary.

Landslide modeling includes using available data for landslide susceptibility mapping (LSM) by selecting an appropriate model. LSMs are in fact tools capable of dividing the ground surface into zones of various stability degrees by evaluating the effect of various factors on slope instability (Youssef et al. 2015a). Therefore, the development of landslide zoning studies and the preparation of susceptibility maps lead to better management and planning of land use and thus reduce the destructive risks arising from it (Baena et al. 2019).

In recent years, many researchers have tried to develop landslide susceptibility maps using new methods, with GIS as a powerful tool (Lorentz et al. 2016; Kadavi et al. 2018; Sameen et al. 2020). Various qualitative and quantitative methods have been used for different areas. However, quantitative (data-driven) approaches have received much attention and are often used in the related studies (Pradhan 2013; Ciurleo et al. 2017; Juliev et al. 2019). Generally, quantitative approaches are categorized into statistical and soft computing methods. Bivariate and multivariate probability models (Pradhan and Lee 2010; Erener et al. 2016; Nicu 2018; Chen et al. 2019a; Zhao et al. 2019) and their combinations (Althuwaynee et al. 2014; Youssef et al. 2015b) are among the statistical methods widely used for preparing LSMs. Soft computing methods, including various machine learning methods such as ANN (Moayedi et al. 2018; Can et al. 2019; Harmouzi et al. 2019), SVM (Zhang et al. 2010), ANFIS (Mehrabi et al. 2020), random forest (Đurić et al. 2019), boosted regression tree and meta-heuristic algorithms (Kavzoglu et al. 2015), have also been used for this purpose. Unlike quantitative methods, in which the relationships between landslide controlling factors are numerically expressed, qualitative (knowledge-driven) approaches such as multi-criteria decision analysis methods (Feizizadeh et al. 2014; Yan et al. 2019; Ozioko and Igwe 2020) consider these factors inferentially, and their results depend on experts’ views. These methods vary in computational process and efficiency. Hence, analyzing both established and new methods is a significant step toward more realistic results. There is no agreement on which type of method should be used for a given area. Although the use of new methods is essential for making progress in landslide studies, the use of new combinations as a complementary solution can provide better results.
The quality of the input data also affects the output. In other words, under the same conditions and with the same method, higher-quality data yield a more accurate output than lower-quality data. The above statements confirm the complexity of this issue and show that the preparation of a landslide susceptibility map depends on various factors, all of which must be carefully considered in order to obtain a realistic result. This study aimed at combining ANFIS with the GWO and PSO algorithms using the outputs of the qualitative SWARA and quantitative CF methods. The results obtained from these methods were also compared to produce the best LSM for the Karun watershed.

As the first innovative aspect, this study compared the ability of the meta-heuristic GWO and PSO algorithms to train the ANFIS model. Furthermore, for the first time, instead of a single model, the outputs of the qualitative SWARA and quantitative CF models were used as input data for training the ANFIS-GWO and ANFIS-PSO hybrid models. Finally, the performance of the LSMs was evaluated by the ROC curve and compared.

2 Study area

The study area, part of the Karun watershed with an approximate area of 7380 km2, lies between longitudes 49° 46′ and 51° and latitudes 31° 27′ and 32° 33′ (Fig. 1). The region is located in the Zagros Mountains, at elevations of 503–3970 m above sea level. The main river flowing in the region is the Karun River, which originates from Zard-Kuh in Koohrang. This watershed is of great economic and environmental importance given its large contribution to hydropower generation as well as 13% of agricultural products in the region. The climatic data of the region were obtained from the Meteorological Organization and the Ministry of Energy. The lowest average temperature is recorded in February, whereas August is the warmest month (− 1.5 and 19.3 °C, respectively). According to data collected from rain-gauge stations, the mean annual precipitation is 950 mm, which generates considerable surface runoff in this mountainous region. The surface runoff in turn induces many landslides by destabilizing slopes. Human factors such as road construction, settlements and agricultural activities exacerbate this natural hazard. The diverse regional lithology also contributes to landslides. The study area is important for landslide susceptibility assessment for several reasons. One reason, of economic and human importance, is the presence of farmland and settlements. According to the land use maps, 20.6% of the study area is covered by agricultural and orchard lands. Some of these lands and settlements are located on gentle hillsides or foothills, which are prone to landslides. The collapse of unstable slopes would therefore destroy agricultural land and cause human casualties. Another reason, of environmental importance, is the accumulation of sediment in rivers and its transfer to dam reservoirs. Two important dams in southwest Iran, namely Karun 3 and 4, are located in this area (Fig. 1). Therefore, given the increase in landslides in this area and the conditions mentioned above, spatial modeling of landslides is necessary.

Fig. 1 Location map and landslide inventory map of the study area

3 Data preparation

3.1 Inventory map

Locating landslide points is the basis for finding the correlation between the geographical distribution of landslides and the conditioning factors (Lee et al. 2013). In this paper, the landslide locations were obtained from the Forest Range and Watershed Management Organization (Fig. 1). According to the available estimates, most of the landslides in the region are rotational, translational or complex slides, and a small number are debris flows. There is no definite approach for dividing landslide points into training and validation sets. Following the literature (Choi et al. 2012; Juliev et al. 2019), 70% of the identified landslides (185 landslide pixels) were used for the training phase, while the remaining 30% (79 landslide pixels) were used for validation.
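The 70/30 division can be illustrated with a short sketch. The text does not state how the points were assigned to each subset, so the random shuffle, the seed and the function name `split_inventory` below are assumptions made only for illustration; the integer IDs stand in for the real inventory.

```python
import random

def split_inventory(points, train_fraction=0.7, seed=0):
    """Randomly split landslide points into training and validation subsets.

    The random assignment is an assumption; the source only gives the 70/30 ratio.
    """
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 264 hypothetical landslide pixel IDs stand in for the real inventory.
train, valid = split_inventory(range(264))
print(len(train), len(valid))  # 185 79
```

With 264 points, rounding 70% gives the 185 training and 79 validation points reported above.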

3.2 Preparation of factor maps

Selection of effective factors as input variables is considered an important step in spatial modeling of landslides for evaluating the potential of susceptible areas (Trigila et al. 2015). According to the regional properties and available data, a total of 12 conditioning factors including slope, aspect, altitude, distance to faults, distance to rivers, distance to roads, land use, lithology, rainfall, plan curvature, profile curvature and TWI were selected. These factors were obtained from different sources and, after digitization (spatial resolution 30 m), used for calculating the landslide susceptibility index (Table 1). All preparation stages and the display of data layers were carried out using ArcGIS 10.5.

Table 1 Source of the conditioning factors

Slope gradient is considered the main slope stability parameter due to its direct relationship with landslides (Pourghasemi et al. 2012b) and was therefore included as a conditioning factor (Fig. 2a). Because rainfall and solar radiation affect slopes of different orientations differently, slope aspect has always been considered an important factor in the literature (Basharat et al. 2016). For spatial modeling of landslides, this factor was classified into flat, north, northeast, east, southeast, south, southwest, west and northwest classes (Fig. 2b). Altitude is another important conditioning factor in landslide assessment because of its significant impact on soil properties (Gomez and Kavzoglu 2005). This factor, derived from the DEM, was categorized into eight classes (Fig. 2c). Water infiltrating the pores of slope-forming materials raises the pore water pressure, which reduces the strength of the soil. Increased moisture thus leads to slope instability and landslide risk. In addition to flow accumulation, TWI indicates the tendency of water to flow downslope (Tehrany et al. 2015) (Fig. 2d).

Fig. 2 Produced causative factors of the study: a slope, b aspect, c altitude, d TWI, e plan curvature, f profile curvature, g rainfall, h distance to rivers, i distance to roads, j distance to faults, k land use and l lithology

Plan curvature is defined as the curvature of the contour line formed by the intersection of a horizontal plane with the surface (Pradhan and Sameen 2017) (Fig. 2e). Profile curvature represents the curvature of the ground surface in the vertical plane along the direction of maximum slope, that is, along the flow line. This parameter controls the water velocity and erosion rate (Sujatha et al. 2013) (Fig. 2f).

Saturation increases with the mean precipitation rate, which decreases the shear strength of slopes and increases mass movements. Rainfall intensity therefore affects slope failure and landslide occurrence (Su et al. 2015) (Fig. 2g). Surface runoff adversely affects slope stability by eroding the slope toe or saturating the slope-forming materials (Conforti et al. 2014). Distance to rivers was categorized into eight classes at 100-m intervals using the Euclidean distance method (Fig. 2h). Road construction in mountainous areas negatively affects slope stability, and it is thus known as a destructive human activity in the literature (Regmi et al. 2014; Althuwaynee et al. 2016). The distance to roads was also classified into eight classes at 200-m intervals using the Euclidean distance method (Fig. 2i). The movement of tectonic plates also causes failure of unstable slopes.

Faults are known as the main triggering factor, especially in earthquake-prone regions. Distance to active faults in the region was also divided into eight classes at 500-m intervals using the Euclidean distance method (Fig. 2j). The land use map is considered a key factor in the study of landslides (Persichillo et al. 2017). Land use was categorized into 12 classes (Fig. 2k). Given the role of different lithological units in landslide susceptibility, lithology has always received much attention from scholars (Rozos et al. 2011) (Fig. 2l). Tables 2 and 3 show the details of the classification of conditioning factors, and Fig. 3 shows the flowchart of the study.

Table 2 Spatial relationship between landslides and lithological units by CF and SWARA

Table 3 Spatial relation between conditioning factors and landslides using CF and SWARA models

Fig. 3 Flowchart of the study showing all steps to produce landslide susceptibility maps

4 Methodology

4.1 Stepwise weight assessment ratio analysis (SWARA)

Stepwise weight assessment ratio analysis (SWARA), introduced by Keršulienė and Turskis (2011), is a multi-criteria decision-making (MCDM) method aimed at ranking and calculating the weights of criteria and sub-criteria. As in other MCDM methods, experts’ views play a key role in evaluating and calculating weights in SWARA, so that each expert ranks the criteria based on personal knowledge, information and experience and assigns a weight to each criterion based on its significance (Hashemkhani Zolfani and Bahrami 2014). The main feature of this method is its ability to evaluate experts’ views on the relative significance of criteria in order to determine their weights (Keršulienė and Turskis 2011). The method for calculating the final weight of criteria is discussed below:

  1. The first step is to identify criteria and sub-criteria.

  2. Here, experts are provided with the final criteria to rank them based on their relative significance. Accordingly, the top rank is assigned to the most significant criterion, placed in the first row, and the bottom rank is assigned to the least significant criterion, in the last row.

  3. After determining the relative significance of each criterion (Sj) relative to the previous criterion, the normalized weight of the criteria is calculated from the following equations:

    $$S_{j} = \frac{{\mathop \sum \nolimits_{i}^{n} A_{i} }}{n}$$
    (1)

    where j is the criteria index, n is the number of experts and Ai is the ranks suggested for each criterion. Kj is a function of relative significance of each criterion, and the initial weight, Qj, is determined from the following equations:

    $$K_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {j = 1} \hfill \\ {S_{j} + 1} \hfill & {j > 1} \hfill \\ \end{array} } \right.$$
    (2)
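    $$Q_{j} = \left\{ {\begin{array}{*{20}l} 1 \hfill & {j = 1} \hfill \\ {\frac{{Q_{j - 1} }}{{K_{j} }}} \hfill & {j > 1} \hfill \\ \end{array} } \right.$$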
    (3)

The final normalized weight is obtained from Eq. 4:

$$W_{j} = \frac{{Q_{j} }}{{\mathop \sum \nolimits_{j = 1}^{m} Q_{j} }}$$
(4)

where j and m are, respectively, the criterion index and the total number of criteria (Keršulienė and Turskis 2011).
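The weighting procedure of Eqs. 1–4 can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the study: the function names and the example scores are hypothetical, and the initial weight is computed with the recurrence commonly given for SWARA (Q1 = 1 and Qj = Qj−1/Kj for j > 1), which is stated here as an assumption.

```python
def mean_significance(expert_scores):
    """Eq. 1: average the comparative-significance scores given by n experts."""
    return sum(expert_scores) / len(expert_scores)

def swara_weights(s):
    """Return normalized SWARA weights for criteria ordered from most to least significant.

    s[j] is the averaged comparative significance S_j of criterion j relative to
    the criterion ranked just above it; s[0] is ignored (no predecessor).
    """
    k = [1.0] + [sj + 1.0 for sj in s[1:]]   # Eq. 2: K_1 = 1, K_j = S_j + 1
    q = [1.0]                                # assumed: Q_1 = 1
    for kj in k[1:]:
        q.append(q[-1] / kj)                 # assumed recurrence: Q_j = Q_{j-1} / K_j
    total = sum(q)
    return [qj / total for qj in q]          # Eq. 4: normalize so weights sum to 1

# Hypothetical scores from three experts for the second and third criteria.
s = [0.0, mean_significance([0.2, 0.25, 0.3]), mean_significance([0.5, 0.5, 0.5])]
weights = swara_weights(s)
print([round(w, 3) for w in weights])
```

Because every Kj is at least 1, the weights never increase down the ranking, which matches the intent of ordering the criteria by significance before weighting.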

Figure 4 (Keršulienė and Turskis 2011) shows the detailed procedure of the SWARA model.

Fig. 4 Diagram of the SWARA model process

4.2 Certainty factor (CF)

The certainty factor (CF) model was first proposed by Buchanan and Shortliffe (1984) as a function to deal with problems arising from the combination of various data layers and the unreliability of input data (Devkota et al. 2013). The model was originally designed for medical diagnostic systems and was later modified by Heckerman (1985). It belongs to the category of bivariate probabilistic methods and has been extensively used in different fields, including landslide studies (Fan et al. 2017; Chen et al. 2019a). To calculate the certainty factor, the inventory map was intersected with the maps of the conditioning factors to determine the number of landslides that occurred in each class of each conditioning factor. The CF was then calculated from the following equation:

$${\text{CF}} = \left\{ {\begin{array}{*{20}l} {\frac{{{\text{PP}}_{a} - {\text{PP}}_{s} }}{{{\text{PP}}_{a} \left( {1 - {\text{PP}}_{s} } \right)}}} \hfill & {{\text{if}}\quad {\text{PP}}_{a} \ge {\text{PP}}_{s} } \hfill \\ {\frac{{{\text{PP}}_{a} - {\text{PP}}_{s} }}{{{\text{PP}}_{s} \left( {1 - {\text{PP}}_{a} } \right)}}} \hfill & {{\text{if}}\quad {\text{PP}}_{a} < {\text{PP}}_{s} } \hfill \\ \end{array} } \right.$$
(5)

where PPa is the ratio of the number of landslide pixels in class a to the total number of pixels in that class (the conditional probability of a landslide occurring in class a) and PPs is the ratio of all landslide pixels to all pixels in the region (the prior probability of landslide occurrence). Larger positive values indicate higher certainty and thus a higher probability of landslides. In contrast, negative values indicate lower certainty and thereby a lower probability of landslides. Values close to zero give no information on the certainty of landslide occurrence (Devkota et al. 2013; Fan et al. 2017).
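As a worked illustration of Eq. 5, the sketch below computes the CF of a single class from pixel counts. The function names and the counts in the example are hypothetical, and PPa is taken as the landslide density within the class, as is usual for the CF model.

```python
def certainty_factor(pp_a, pp_s):
    """Eq. 5: certainty factor from the conditional (pp_a) and prior (pp_s) probabilities."""
    if pp_a == pp_s:
        return 0.0  # no information; also avoids 0/0 when both probabilities are zero
    if pp_a > pp_s:
        return (pp_a - pp_s) / (pp_a * (1.0 - pp_s))
    return (pp_a - pp_s) / (pp_s * (1.0 - pp_a))

def class_cf(landslides_in_class, pixels_in_class, landslides_total, pixels_total):
    """CF of one class of a conditioning factor, computed from pixel counts."""
    pp_a = landslides_in_class / pixels_in_class
    pp_s = landslides_total / pixels_total
    return certainty_factor(pp_a, pp_s)

# Hypothetical counts: the class is four times denser in landslides than the region.
print(round(class_cf(20, 1000, 100, 20000), 3))  # 0.754
# A class with no landslides at all reaches the lower bound of the CF range.
print(class_cf(0, 1000, 100, 20000))             # -1.0
```

The two calls show the sign convention described above: a class denser in landslides than the region as a whole gets a positive CF, and a landslide-free class gets −1.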

4.3 Combined adaptive neuro-fuzzy inference system (ANFIS)

Introduced in 1993, the ANFIS method combines an artificial neural network (ANN) with fuzzy logic in order to solve complicated nonlinear problems. The ANFIS structure is composed of the conventional components of a fuzzy system, except that the computations at each stage are performed by a layer of hidden neurons. In addition, the training capacity of the neural network is used to increase the knowledge of the system. Sugeno and Mamdani are two common fuzzy systems, based on the Takagi–Sugeno–Kang method and Lotfi Zadeh’s paper, respectively. These systems work without any limitation on the black box and can also be applied in an uncertain environment. The Takagi–Sugeno system considered here is based on two if–then rules, which are as follows:

$${\text{Rule}}1:{\text{If}}\,x\,{\text{is}}\,A_{1} \,{\text{and}}\,y\,{\text{is}}\,B_{1} ,{\text{then}}\,Z_{1} = p_{1} x + q_{1} y + r_{1} .$$
(6)
$${\text{Rule}}\,2:{\text{If}}\,x\,{\text{is}}\,A_{2} \,{\text{and}}\,y\,{\text{is}}\,B_{2} ,{\text{then}}\,Z_{2} = p_{2} x + q_{2} y + r_{2} .$$
(7)

where x and y are inputs, \(A_{1} ,A_{2}\), \(B_{1}\) and \(B_{2}\) are fuzzy sets determined during the training process and \(p_{i}\), \(q_{i}\) and \(r_{i}\) (i = 1, 2) are parameters obtained in the training phase (Zhang et al. 2010).

Layer 1 In the first layer, the values of input variables are fuzzified, so that each node i is defined as an adaptive node with a node function. They are responsible for producing membership grades of the inputs (Oh and Pradhan 2011). The following equations are used to obtain the output of this layer:

$$O_{{A_{i} }}^{1} = \mu_{{A_{i} }} \left( x \right),\quad i = 1,2,$$
(8)
$$O_{{B_{i} }}^{1} = \mu_{{B_{i} }} \left( y \right),\quad i = 1,2,$$
(9)

where … and … are inputs for the node i, \(A_{i}\) and \(B_{i}\) indicate the associated linguistic labels and \(\mu_{{A_{i} }} \left( x \right)\) and \(\mu_{{B_{i} }} \left( y \right)\) are the membership functions from different forms including triangular, trapezoidal, Gaussian functions, generalized bell or other functions.

Layer 2 Here, all nodes are fixed, and they are denoted by Π to show that they play a role of a simple multiplier.

In this layer, the output of each node (\(\omega_{i}\)) represents the firing strength of a rule, which is expressed by the following equation:

$$\omega_{i} = \mu_{{A_{i} }} \left( x \right) \cdot \mu_{{B_{i} }} \left( y \right),\quad i = 1,2$$
(10)

Layer 3 In this layer, every node is also a fixed node, labeled N. The ith node calculates the ratio of the ith rule’s firing strength to the sum of all rules’ firing strengths, which is also called the normalized firing strength, and the outputs are given by the following equation (Zhang et al. 2010):

$$O_{3,i} = \frac{{\omega_{i} }}{{\sum \omega_{i} }} = \frac{{\omega_{i} }}{{\omega_{1} + \omega_{2} }} = \bar{\omega }_{i} \quad i = 1, 2$$
(11)

Layer 4 In this layer, an output is specified for each rule. Every node is an adaptive node, and the output of each node is simply the product of the normalized firing strength and a first-order polynomial of the inputs. The following equation can be used to obtain the outputs of this layer:

$$O_{i}^{4} = \bar{\omega }_{i} \cdot f_{i} = \bar{\omega }_{i} (p_{i} x + q_{i} y + r_{i} )\quad i = 1, 2$$
(12)

where \(\bar{\omega }_{i}\) is the output of the third layer and \((p_{i} ,q_{i} \,{\text{and}}\,r_{i} )\) are consequent parameters.

Layer 5 In this layer, the single node is considered a fixed node which is labeled as Σ. The fixed nodes calculate the entire output as the sum of all incoming signals. It can be described by the following equation:

$$O_{5} = \mathop \sum \limits_{i} \bar{\omega }_{i} \cdot f_{i} = \frac{{\mathop \sum \nolimits_{i} \omega_{i} f_{i} }}{{\mathop \sum \nolimits_{i} \omega_{i} }} = f_{{\text{out}}}$$
(13)

The ultimate results for ANFIS are given by the above equation.
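The five layers can be traced with a minimal forward pass for the two-rule system of Eqs. 6 and 7. The sketch assumes Gaussian membership functions (one of the forms listed for Layer 1), and all parameter values are hypothetical; in the hybrid models these parameters would be the quantities tuned by the optimizer.

```python
import math

def gaussian(x, center, sigma):
    """Gaussian membership function (assumed form for the Layer 1 nodes)."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)

def anfis_forward(x, y, premise, consequent):
    """Forward pass of a first-order Sugeno ANFIS.

    premise:    one ((center_A, sigma_A), (center_B, sigma_B)) pair per rule
    consequent: one (p, q, r) triple per rule
    """
    # Layers 1 and 2: membership grades and their product, the firing strength (Eqs. 8-10)
    w = [gaussian(x, *a) * gaussian(y, *b) for a, b in premise]
    # Layer 3: normalized firing strengths (Eq. 11)
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Layer 4: normalized firing strength times the first-order polynomial (Eq. 12)
    rule_out = [wb * (p * x + q * y + r) for wb, (p, q, r) in zip(w_bar, consequent)]
    # Layer 5: sum of all incoming signals (Eq. 13)
    return sum(rule_out)

# Hypothetical parameters: two rules with unit-width Gaussians centered at 0 and 1.
premise = [((0.0, 1.0), (0.0, 1.0)), ((1.0, 1.0), (1.0, 1.0))]
consequent = [(1.0, 1.0, 0.0), (2.0, 0.0, 1.0)]
print(anfis_forward(0.5, 0.5, premise, consequent))
```

At x = y = 0.5 both rules fire equally, so the output is the plain average of the two rule polynomials (1.0 and 2.0).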

4.4 Gray wolf optimizer (GWO)

Gray wolf optimizer (Mirjalili et al. 2014) is a nature-inspired meta-heuristic algorithm based on two principles in the life of gray wolves, namely social hierarchy and hunting strategy. This is a population-based algorithm (number of wolves) and thus considered as a swarm intelligence (SI) algorithm.

There is no SI technique available in the literature that mimics the leadership hierarchy of gray wolves, which are well known for their pack hunting (Mirjalili et al. 2014).

Different steps of GWO algorithm are discussed below:

4.4.1 Social hierarchy

Social hierarchy is determined at this stage. To this end, gray wolf population is generated randomly. The generated population in each pack is modeled as a pyramid consisting of four groups of alpha, beta, delta and omega wolves. Alpha is the best solution while beta and delta are the second and third, respectively. In the GWO algorithm, hunting is carried out with the help of alpha, beta and delta wolves while omega wolves follow these three types to explore the best solution.

4.4.2 Encircling prey

The following relations show the encircling process of the gray wolves around the prey:

$$\vec{D} = \left| {\vec{C} \cdot \vec{X}_{P} \left( t \right) - \vec{X}\left( t \right)} \right|$$
(14)
$$\vec{X}\left( {t + 1} \right) = \vec{X}_{P} \left( t \right) - \vec{A} \cdot \vec{D}$$
(15)

where t is the current iteration; A and C are coefficient vectors, \(\vec{X}_{P}\) is the prey position vector and X is the gray wolf position vector. The coefficient vectors (A, C) are evaluated as follows:

$$\vec{A} = 2\vec{a} \cdot \vec{r}_{1} - \vec{a}$$
(16)
$$\vec{C} = 2 \cdot \vec{r}_{2}$$
(17)

where r1 and r2 are random vectors ranging from 0 to 1. \(\vec{a}\) is linearly reduced from 2 to 0 through the iterations.

4.4.3 Hunting process

The hunting process is directed by the alpha wolves. However, the optimal position (the prey) is not known in advance, because hunting positions change constantly in the algorithm. Therefore, it is presumed that the alpha, beta and delta wolves have the best knowledge of the prey position. The positions of the alpha (best candidate solution), beta and delta wolves are stored as the best solutions. Then, the remaining solutions (omega) are updated based on the positions of these best search agents.

The following equations describe mathematically how the wolves update their positions relative to the α, β and δ wolves (Mirjalili et al. 2014):

$$\vec{D}_{\alpha } = \left| {\vec{C}_{1} \cdot \vec{X}_{\alpha } - \vec{X}} \right|,\vec{D}_{\beta } = \left| {\vec{C}_{2} \cdot \vec{X}_{\beta } - \vec{X}} \right|,\vec{D}_{\delta } = \left| {\vec{C}_{3} \cdot \vec{X}_{\delta } - \vec{X}} \right|$$
(18)
$$\vec{X}_{1} = \vec{X}_{\alpha } - \vec{A}_{1} \cdot \left( {\vec{D}_{\alpha } } \right),\vec{X}_{2} = \vec{X}_{\beta } - \vec{A}_{2} \cdot \left( {\vec{D}_{\beta } } \right),\vec{X}_{3} = \vec{X}_{\delta } - \vec{A}_{3} \cdot \left( {\vec{D}_{\delta } } \right)$$
(19)
$$\vec{X}\left( {t + 1} \right) = \frac{{\vec{X}_{1} + \vec{X}_{2} + \vec{X}_{3} }}{3}$$
(20)

4.4.4 Attacking prey (exploitation)

When the prey stops moving, the gray wolves attack it and the hunting process ends. Mathematically, this is modeled by reducing \(\vec{a}\), which in turn narrows the fluctuation range of \(\vec{A}\). By Eq. 16, \(\vec{A}\) takes random values in the range [− a, a], where \(\vec{a}\) is decreased from 2 to 0 over the iterations. The wolves attack the prey when \(\left| A \right| < 1\) (Mirjalili et al. 2014).

4.4.5 Searching for prey

In this situation, prey searching (discovery) is evaluated according to the positions of alpha, beta and delta wolves. Gray wolves are separated for finding a prey and then approach each other for hunting.

The exploration capability is incorporated in the GWO algorithm when \(\vec{A}\) values lie outside the − 1 to 1 range.

When \(\left| A \right| > 1\), the wolves diverge to search for prey. C is another exploratory component in the GWO algorithm and contains random values ranging from 0 to 2. This component gives random weights that emphasize (C > 1) or de-emphasize (C < 1) the effect of the prey in defining the distance (Eq. 14). This helps the GWO algorithm to avoid being trapped in a local optimum. It should be noted that, unlike A, C does not decrease linearly. After the above steps, the algorithm terminates when a stopping criterion is met (Mirjalili et al. 2014). For more details on the gray wolf optimizer, the reader may refer to the original paper http://www.alimirjalili.com/GWO.html.
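The steps of Sect. 4.4 can be put together in a compact sketch that minimizes a simple test function. It is an illustration under stated assumptions rather than the implementation used in this study: the sphere objective, the population size, the iteration count, the clipping to the search bounds and the way the three leaders are kept (best three of the previous leaders plus the current pack) are all choices made here for brevity.

```python
import random

def gwo(objective, dim, bounds, n_wolves=20, n_iter=200, seed=0):
    """Minimal gray wolf optimizer for minimization (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    wolves = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_wolves)]
    leaders = []
    for t in range(n_iter):
        # Social hierarchy: alpha, beta and delta are the three best solutions so far.
        leaders = sorted(leaders + wolves, key=objective)[:3]
        a = 2.0 - 2.0 * t / n_iter                         # decreases linearly from 2 toward 0
        new_wolves = []
        for wolf in wolves:
            position = []
            for d in range(dim):
                candidates = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a         # Eq. 16
                    C = 2.0 * rng.random()                 # Eq. 17
                    D = abs(C * leader[d] - wolf[d])       # Eq. 18
                    candidates.append(leader[d] - A * D)   # Eq. 19
                x_new = sum(candidates) / 3.0              # Eq. 20
                position.append(min(max(x_new, lo), hi))   # keep inside the bounds (assumption)
            new_wolves.append(position)
        wolves = new_wolves
    return min(leaders + wolves, key=objective)

def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best = gwo(sphere, dim=3, bounds=(-10.0, 10.0))
print(sphere(best))
```

In a hybrid ANFIS-GWO setting, the position vector would hold the ANFIS membership and consequent parameters, and the objective would be a training error such as the MSE.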

4.5 Particle swarm optimization (PSO) algorithm

For the first time in 1995, Eberhart and Kennedy introduced particle swarm optimization (PSO) algorithm. It is a population-based evolutionary algorithm inspired by the social behavior of bird flocks. This algorithm has been used as a powerful tool for solving nonlinear random optimization problems.

In PSO, a group of particles (optimization variables) is randomly distributed in the search space. Each particle in this space is defined by two main features, namely position and velocity. Each particle selects a movement direction considering its current position, the best position it has experienced so far and information on the best position found by one or more other particles. A step of the algorithm ends after all particles have moved. This process is repeated, and the best location visited by any particle is finally presented as the solution. Given the impact of the position of other particles on the search of each particle, PSO is also known as a swarm intelligence (SI) algorithm. A d-dimensional search space is assumed for a mathematical description of the above-mentioned process. The position (Xi) and velocity (Vi) vectors for the ith particle in the search space are defined as follows (Eberhart and Kennedy 1995):

$$X_{i} = \left( {X_{i1} , X_{i2} , X_{i3} , \ldots , X_{id} } \right)$$
(21)
$$V_{i} = \left( {V_{i1} , V_{i2} , V_{i3} , \ldots , V_{id} } \right)$$
(22)

The algorithm updates both the position and velocity of each particle in the iteration t + 1 according to the following equations (Eberhart and Kennedy 1995):

$$V_{i}^{t + 1} = \omega \cdot V_{i}^{t} + C_{1} \cdot r_{1} \cdot \left( {P_{i}^{t} - X_{i}^{t} } \right) + C_{2} \cdot r_{2} \cdot \left( {g_{i}^{t} - X_{i}^{t} } \right)$$
(23)
$$X_{i}^{t + 1} = \left( {X_{i}^{t} + V_{i}^{t + 1} } \right)$$
(24)

where t is the iteration number, \(X_{i}^{t}\) and \(V_{i}^{t}\) are the position and velocity of the ith particle in iteration t, \(P_{i}^{t}\) is the best position of the ith particle, \(g_{i}^{t}\) is the best position recorded among all particles and r1 and r2 are random weights generated in the range of 0–1. In addition, ω is the inertia coefficient, and C1 and C2 are the cognitive and social coefficients, respectively. It is noteworthy that selecting appropriate inertia and acceleration coefficients may lead to a balance between local and global searches (Assareh et al. 2010).
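The update rules of Eqs. 23 and 24 can be sketched as follows. The coefficient values (inertia 0.7, C1 = C2 = 1.5), the sphere objective, the zero initial velocities and the clipping to the search bounds are assumptions chosen for illustration and are not the settings of this study.

```python
import random

def pso(objective, dim, bounds, n_particles=20, n_iter=200,
        inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal particle swarm optimizer for minimization (illustrative sketch)."""
    rng = random.Random(seed)
    lo, hi = bounds
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    p_best = [list(xi) for xi in x]                  # best position of each particle
    p_score = [objective(xi) for xi in x]
    g_idx = min(range(n_particles), key=p_score.__getitem__)
    g_best, g_score = list(p_best[g_idx]), p_score[g_idx]   # best position of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v[i][d] = (inertia * v[i][d]
                           + c1 * r1 * (p_best[i][d] - x[i][d])
                           + c2 * r2 * (g_best[d] - x[i][d]))      # Eq. 23
                x[i][d] = min(max(x[i][d] + v[i][d], lo), hi)      # Eq. 24, clipped to bounds
            score = objective(x[i])
            if score < p_score[i]:
                p_best[i], p_score[i] = list(x[i]), score
                if score < g_score:
                    g_best, g_score = list(x[i]), score
    return g_best, g_score

def sphere(x):
    """Hypothetical test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)

best_position, best_value = pso(sphere, dim=3, bounds=(-10.0, 10.0))
print(best_value)
```

A larger inertia coefficient keeps particles moving along their previous direction (global search), while larger C1 and C2 pull them toward known good positions (local search), which is the balance referred to above.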

One of the first applications of PSO was neural network (NN) training, for which it was shown to be an efficient method.

5 Results and analysis

Tables 2 and 3 list the values obtained from applying SWARA to the criteria and sub-criteria. According to the results, the slope class 7.33°–14.59° presented the highest SWARA value of 0.319. The eastern and southern slope aspects showed the highest SWARA values of 0.463 and 0.244, respectively. The altitude class of 503–1189 m, with a SWARA value of 0.372, indicated the highest landslide probability (the SWARA weight decreases with increasing altitude). The TWI class of 8.56–22.38 showed the highest landslide probability (0.275). The flat and convex plan curvatures showed the lowest and highest SWARA values of 0.301 and 0.393, respectively. For profile curvature, the lowest SWARA value was found for the convex class and the highest (0.395) for the flat class. The rainfall classes 435–586 and 954–1138 mm, with SWARA values of 0.296 and 0.156, respectively, showed the highest landslide probability. As shown in Table 2, the highest weights of 0.313, 0.418 and 0.356 were obtained for the distance to the stream (0–100 m), distance to the road (200–400 m) and distance to the fault (2000–2500 m), respectively. Among the land use classes, the highest SWARA value (0.277) was observed for the surface runoff class. The highest SWARA value (0.168) among the lithological units was found for the pC-Ch class (Table 2).

Tables 2 and 3 also show the results obtained from the CF model. The slope class 7.33°–14.59° showed the highest CF value (0.444), indicating its high landslide potential. Similar to the results obtained from SWARA, the eastern and southern slope aspects showed the highest CF values of 0.277 and 0.177, respectively. The altitude class 1574–1910 m, with a weight of 0.642, showed the highest landslide probability. The TWI class 8.56–22.38 showed the most significant correlation. As clearly shown in Table 2, the flat plan curvature and flat profile curvature showed the highest CF values of 0.197 and 0.252, respectively. The rainfall classes 435–586 and 956–1138 mm, with CF values of 0.937 and 0.928, showed the most significant correlation with landslide occurrence. The landslide probability decreased with increasing distance to the river. Accordingly, the 0–100 m and more than 700 m (> 700 m) classes showed the highest (0.733) and lowest CF values, respectively. For distance to roads and distance to faults, the 0–400-m and 2000–2500-m classes, respectively, showed the highest landslide probability. In the case of land use, surface runoff showed the highest probability of 0.892. According to the results on lithological units, the pC-Ch and MuPlai classes, with CF values of 0.965 and 0.879, showed the most significant correlation with landslide probability (Table 2).

5.1 Application of ANFIS-GWO and ANFIS-PSO hybrid models

The GWO and PSO intelligent algorithms were used in this study instead of classic functions for ANFIS training. The ANFIS-GWO and ANFIS-PSO hybrid models were implemented in MATLAB R2015b. Training and validation data were required for implementing the algorithms: the algorithms were trained on the training data, and their accuracy was estimated on the validation data. The SWARA and CF outputs were used to generate the training and validation data. As mentioned earlier, 70% of the 264 landslides were used for training and the rest (30%) for the validation of the algorithms. The same number of training points (185) was generated for non-landslide locations, placed randomly in the landslide-free areas. Thereafter, a value of 1 was assigned to the 185 landslide points and a value of 0 to the 185 non-landslide points. These points were then intersected with the conditioning factors to obtain the value of each factor at each point. These values were used as input data in MATLAB for training the algorithms. The mean square error (MSE) and root mean square error (RMSE) were used for validating the training dataset (Figs. 5, 6a). MSE and RMSE are defined as follows:

$${\text{MSE}} = \frac{1}{n} \mathop \sum \limits_{i = 1}^{n} ( X_{i} - \bar{X}_{i} )^{2}$$
(25)
$${\text{RMSE}} = \sqrt {\frac{1}{n} \mathop \sum \limits_{i = 1}^{n} ( X_{i} - \bar{X}_{i} )^{2} }$$
(26)
Fig. 5 Comparative results between ANFIS-GWO and ANFIS-PSO for the training and validation datasets using the SWARA model (scenario 1): a MSE and RMSE values in the training phase, b frequency of errors in the training phase, c MSE and RMSE values in the testing phase and d frequency of errors in the testing phase

Fig. 6 Comparative results between ANFIS-GWO and ANFIS-PSO for the training and validation datasets using the CF model (scenario 2): a MSE and RMSE values in the training phase, b frequency of errors in the training phase, c MSE and RMSE values in the testing phase and d frequency of errors in the testing phase

Here, n is the total number of samples, \(X_{i}\) is the target value of sample i, and \(\bar{X}_{i}\) is the corresponding model output. RMSE is the square root of MSE.
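As an illustration of Eqs. 25 and 26, the two error measures can be computed as in the following minimal sketch. The function names and the sample values are hypothetical; targets are the 0/1 labels of the points and outputs are the model predictions.

```python
import math


def mse(targets, outputs):
    """Mean square error between target values and model outputs (Eq. 25)."""
    if len(targets) != len(outputs) or len(targets) == 0:
        raise ValueError("targets and outputs must be non-empty and of equal length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs)) / len(targets)


def rmse(targets, outputs):
    """Root mean square error, the square root of the MSE (Eq. 26)."""
    return math.sqrt(mse(targets, outputs))


# Hypothetical example: two landslide points (1) and two non-landslide points (0).
example_targets = [1, 1, 0, 0]
example_outputs = [0.9, 0.6, 0.2, 0.1]
example_mse = mse(example_targets, example_outputs)
example_rmse = rmse(example_targets, example_outputs)
```

Because RMSE is the square root of MSE, an MSE of 0.081 corresponds to an RMSE of about 0.285.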

Similar to the training data generation process, the same number of non-landslide points (79 points) was randomly produced for the test dataset in the landslide-free areas. A value of 1 was assigned to the 79 landslide points and a value of 0 to the 79 non-landslide points. After intersecting these points with the conditioning factors, the resulting values were used as the test dataset in MATLAB. To determine the most efficient model, the MSE of the test dataset should be evaluated. Figures 5c and 6c show the MSE and RMSE for the test dataset in the first and second scenarios, respectively. According to the results, MSE values of 0.121 and 0.110 were obtained for the ANFIS-GWO and ANFIS-PSO hybrid models, respectively, in the first scenario. The corresponding values in the second scenario were 0.107 and 0.081. It is worth mentioning that the MSE of the test dataset should be consistent with the validation results of the final maps. In other words, a lower MSE (closer to zero) is expected to correspond to a higher prediction rate (AUC) of the LSM. As shown in Table 4, ANFIS-PSO in the second scenario, with the lowest MSE of 0.081, showed the highest accuracy among the algorithms, and ANFIS-GWO in the first scenario, with the highest MSE of 0.121, showed the lowest.
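The construction of the training and test datasets described above can be sketched as follows. This is only an illustration under stated assumptions: integer IDs stand in for the point locations, the random seed and all function names are hypothetical, and the intersection with the conditioning factor layers (done in GIS in the study) is not reproduced.

```python
import random


def split_inventory(n_landslides=264, train_fraction=0.7, seed=0):
    """Randomly split landslide IDs into training and test subsets."""
    rng = random.Random(seed)
    ids = list(range(n_landslides))
    rng.shuffle(ids)
    n_train = round(n_landslides * train_fraction)  # 185 of 264 points
    return ids[:n_train], ids[n_train:]


def label_points(landslide_points, non_landslide_points):
    """Attach the target value 1 to landslide points and 0 to non-landslide points."""
    return ([(p, 1) for p in landslide_points]
            + [(p, 0) for p in non_landslide_points])


train_ids, test_ids = split_inventory()

# Hypothetical non-landslide IDs: one per landslide point in each subset.
# Negative numbers are used only to keep them distinct from landslide IDs.
train_non = [-(k + 1) for k in range(len(train_ids))]
test_non = [-(len(train_ids) + k + 1) for k in range(len(test_ids))]

train_set = label_points(train_ids, train_non)
test_set = label_points(test_ids, test_non)
```

Each subset is balanced, with 185 + 185 points for training and 79 + 79 points for testing, which matches the counts given in the text.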

Table 4 Relationship between MSE, RMSE and AUC values

The final value for each pixel was calculated after entering the final data into the algorithms. The resulting values were exported to ArcGIS for landslide susceptibility mapping (Figs. 7, 8). A variety of classification methods were tested for zoning the produced maps, and the geometric interval method outperformed the others. Therefore, the maps were classified by geometric interval into five classes: very low, low, moderate, high and very high.

Fig. 7 Landslide susceptibility maps using ANFIS-GWO and ANFIS-PSO models in scenario 1

Fig. 8 Landslide susceptibility maps using ANFIS-GWO and ANFIS-PSO models in scenario 2

5.2 Validation of the landslide susceptibility maps

Model validation is a critical step in evaluating the ability of LSMs to predict future events (Bui et al. 2012). As mentioned previously, 70% of the identified landslides were used for model training and 30% for validation. The ROC curve was used in this study to validate the ANFIS-GWO and ANFIS-PSO hybrid models.

5.2.1 Receiver operating characteristics (ROC) curve

The ROC curve represents the quality of a predictive system, indicating probable and definite predictions (Assareh et al. 2010). The x and y axes of the ROC curve show the false positive and true positive rates, respectively. The area under the curve (AUC) shows the quality of the predictive method by characterizing its ability to correctly forecast the occurrence or non-occurrence of a pre-defined event (Devkota et al. 2013). Since the AUC expresses the accuracy of the method quantitatively, it must be estimated in order to compare the performance of the models. The AUC ranges from 0.5 to 1: values closer to 1 indicate reasonable performance, whereas values closer to 0.5 indicate weak performance. An AUC of 0.5 corresponds to a random guess (Dehnavi et al. 2015; Pourghasemi et al. 2012b).
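The construction of the ROC curve and its AUC can be illustrated with the following minimal sketch. It assumes binary labels (1 for landslide, 0 for non-landslide) and a susceptibility score per point, and it integrates the curve with the trapezoidal rule; the function names and the example values are hypothetical, and the software used in the study may compute the AUC differently.

```python
def roc_curve(labels, scores):
    """Return false positive and true positive rates for all score thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    # Sort points from the highest to the lowest susceptibility score.
    pairs = sorted(zip(scores, labels), key=lambda p: -p[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # Points with equal scores are added together (one ROC point per threshold).
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        fpr.append(fp / n_neg)
        tpr.append(tp / n_pos)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((fpr[k] - fpr[k - 1]) * (tpr[k] + tpr[k - 1]) / 2
               for k in range(1, len(fpr)))


# Hypothetical example with three landslide and three non-landslide points.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(*roc_curve(example_labels, example_scores))
```

In this example, eight of the nine landslide/non-landslide pairs are ranked correctly, so the AUC equals 8/9; identical scores for all points give an AUC of 0.5, the random-guess case.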

Figure 9 shows the ROC curves for the LSMs. According to the results, the second scenario outperforms the first one. In the first scenario, ANFIS-PSO with an AUC of 83% outperformed ANFIS-GWO with an AUC of 78.9%. In the second scenario, ANFIS-PSO with an AUC of 87.9% outperformed ANFIS-GWO with an AUC of 85%. It should be noted that a lower standard error accompanies a larger area and thus a more accurate model (Table 5). Although the AUC of all four models was greater than 75%, ANFIS-PSO in the second scenario, with an AUC of 87.9% and a standard error of 0.027, gives the most accurate landslide forecast in this research.

Fig. 9 ROC curves for the landslide susceptibility maps: a scenario 1 and b scenario 2

Table 5 Details of ROC curve for landslide susceptibility maps

6 Discussion

Landslides are a complicated phenomenon because many different factors control their occurrence. Thus, the use of novel analysis methods to create high-precision maps is a crucial step. In the current study, a new hybrid ANFIS approach was used to prepare LSMs for the Karun watershed, Iran. To achieve this, ANFIS was first combined with the PSO and GWO algorithms and then trained on the outputs of the SWARA and CF models. Landslide susceptibility mapping was carried out by the ANFIS-GWO and ANFIS-PSO hybrid models, and their performance was evaluated by the ROC curve. In recent years, researchers have combined a variety of quantitative and qualitative models with machine learning and data mining algorithms. Studying these methods and their combinations, and comparing their results, supports more realistic modeling for producing landslide susceptibility maps. The results obtained with such methods in other studies are reviewed below.

The SWARA multi-criteria decision-making method and the CF probability model have been applied in various studies. Dehnavi et al. (2015) prepared an LSM in Iran using SWARA alone and in combination with ANFIS. According to their results, the SWARA-ANFIS hybrid model with an AUC of 0.8 gave better forecasts than SWARA with an AUC of 0.78. The CF probability model has also produced reasonable results in indicating the correlation between landslides and conditioning factors. Arabameri et al. (2019) proposed a new ensemble of the geographically weighted regression (GWR) method with the CF and RF models for gully erosion zonation mapping in the Mahabia watershed of Iran. The RF model showed that distance to stream, distance to road and land use have the highest influence on gully formation. In addition, validation results indicated the better performance of the new GWR-CF-RF ensemble model, with a prediction rate of 96.7%, compared with the CF, RF and CF-RF models, with AUCs of 76.3%, 77.6% and 89.7%, respectively. For spatial modeling of landslides in Ziyang district in China, Fan et al. (2017) used the CF model and its combination with AHP and compared them with the bivariate statistical index to evaluate landslide susceptibility. According to their results, the CF-AHP and LSI models showed the highest and lowest prediction rates, with AUCs of 78.3% and 69.2%, respectively.

The adaptive neuro-fuzzy inference system (ANFIS) combines neural networks and fuzzy logic, using fuzzy logic to implement the knowledge learned by the neural network (Oh and Pradhan 2011). To evaluate flood susceptibility maps, Bui et al. (2016) combined ANFIS with two meta-heuristic algorithms (PSO and EG) in a new approach named MONF and compared it with the J48DT, RF, MLP neural nets and SVM models. They concluded that although J48DT, RF, MLP neural nets and SVM models have high accuracy with an AUC of 89.5%, 89.4%, 90.3%, 90.5% and 76.7%, respectively, the MONF model has better performance (AUC = 91.1%). Aghdam et al. (2017) used the FR and WOE statistical methods and ANFIS machine learning to identify landslide-susceptible areas in the southern provinces of the Zagros Mountains. After landslide susceptibility mapping by the FR and WOE models, these models were combined with ANFIS to overcome the drawbacks of bivariate statistical methods. Validation results indicated the better performance of the FR-ANFIS and WOE-ANFIS hybrid models, with prediction rates of 0.85 and 0.84, respectively, compared with FR and WOE (AUC = 0.82).

The most important advantage of the population-based random PSO algorithm over other optimization algorithms is its ability to exchange information among members. Moayedi et al. (2018) used ANN and its combination with the meta-heuristic PSO algorithm for landslide susceptibility mapping of Layle Village in Kermanshah Province. For ANN, analysis of the test data showed a coefficient of determination and a root mean square error of 0.9733 and 0.111, respectively. The corresponding values for PSO-ANN were 0.9899 and 0.0389. After landslide susceptibility mapping and model evaluation by color intensity rating (CER), they found that PSO-ANN provides a more realistic evaluation of landslide probability than ANN. In our study, the PSO algorithm likewise showed better results in combination with ANFIS. Termeh et al. (2018) combined ANFIS with the ACO, GA and PSO algorithms for zoning flood risk in Jahrom, Fars Province. According to their results, the prediction accuracy of ANFIS-ACO, ANFIS-GA and ANFIS-PSO was 91.8%, 92.6% and 94.5%, respectively, whereas a prediction accuracy of 91.4% was obtained for FR. ANFIS-PSO thus evaluated flood risk in the region most accurately. Similar to Termeh et al. (2018), our results indicated that ANFIS-PSO performs slightly better than the ANFIS-GWO hybrid model in both scenarios.
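The exchange of information among PSO members mentioned above can be illustrated with a minimal global-best PSO sketch. It is not the implementation used in this study: the parameter values, the random seed and the simple quadratic objective are assumptions chosen for illustration, whereas in ANFIS training the objective would be an error measure such as the MSE over the ANFIS parameters.

```python
import random


def pso_minimize(objective, dim, bounds, n_particles=20, n_iter=200,
                 inertia=0.7, c_personal=1.5, c_global=1.5, seed=0):
    """Minimize `objective` over a box with a basic global-best PSO."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]                 # personal best positions
    best_val = [objective(p) for p in pos]         # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]     # best position of the swarm
    for _ in range(n_iter):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # The last term pulls every particle toward the swarm's best
                # position: this is the information shared among members.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Hypothetical stand-in objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


solution, value = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```

Each particle is guided both by its own best position and by the best position found by the whole swarm, so a good solution found by one member influences all others in the next iteration.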

As mentioned earlier, the gray wolf optimizer is a nature-inspired meta-heuristic algorithm based on the social hierarchy and behavior of wolves during the hunting process (Mirjalili et al. 2014). This new algorithm has provided reasonable results in the literature (Termeh et al. 2018; Yu and Lu 2018). Jaafari et al. (2019) used the novel ANFIS-BBO and ANFIS-GWO data mining techniques for landslide susceptibility mapping. According to their results, ANFIS-GWO and ANFIS-BBO, with prediction rates of 0.945 and 0.95, respectively, can be used as reliable methods in other studies.

According to the literature, soft computing methods such as machine learning and intelligent algorithms estimate the relationship between data more accurately. Although machine learning methods such as ANN and ANFIS provide better results than statistical and qualitative models, their combination with meta-heuristic algorithms yields more efficient outputs, because meta-heuristic algorithms train the learning networks better by reducing the effect of local minima. The outputs obtained in this study, as well as their comparison with the results of previous studies, support this statement. Based on the obtained values, all four models, \(\left( {{\text{ANFIS - PSO}}\,{\text{and}}\,{\text{ANFIS - GWO}}} \right)_{\text{SWARA }}\) and \(\left( {{\text{ANFIS - PSO}}\,{\text{and}}\,{\text{ANFIS - GWO}}} \right)_{{\text{CF}}}\), with an accuracy of more than 75%, show acceptable performance in predicting a nonlinear problem. The \({\text{ANFIS - PSO}}_{{\text{CF}}}\) model, with an area under the curve of 87.9% and an MSE of 0.081, and the \({\text{ANFIS - GWO}}_{{\text{CF}}}\) model, with an area under the curve of 85% and an MSE of 0.107, showed the best and second-best performance, respectively. Comparison of the GWO and PSO algorithms showed that PSO outperformed GWO in both scenarios in terms of optimization and data convergence. Moreover, the method used to evaluate the correlation between landslides and conditioning factors plays a key role in the performance of the data mining techniques: the quantitative CF model performed better than the qualitative SWARA model. In addition to the type of method, the type of combination also plays a key role in enhancing the precision of LSMs.

7 Conclusion

Failure of unstable slopes in mountainous regions has destructive social and environmental consequences in addition to casualties and economic losses. Hence, identification of landslide-susceptible areas has become an essential tool for regional authorities and policymakers (Jaafari et al. 2019). In this study, two data mining techniques were compared for landslide susceptibility mapping. The qualitative SWARA and quantitative CF models were first used to determine the relationship between conditioning factors (slope, aspect, altitude, TWI, plan curvature, profile curvature, rainfall, distance to rivers, distance to roads, distance to faults, land use and lithology) and the identified landslides. For effective training, ANFIS was combined with the GWO and PSO intelligent algorithms. As a result, ANFIS-GWO and ANFIS-PSO hybrid models were generated separately using the SWARA and CF outputs. Finally, LSMs were generated and then assessed by the ROC curve.

Given the AUC values, the hybrid models trained by the CF probability model (\({\text{ANFIS - GWO}}_{{\text{AUC}}} = 85\%\) and \({\text{ANFIS - PSO}}_{{\text{AUC}}} = 87.9\%\)) gave better estimates than those generated by the SWARA multi-criteria model (\({\text{ANFIS - GWO}}_{{\text{AUC}}} = 78.9\%\) and \({\text{ANFIS - PSO}}_{{\text{AUC}}} = 83.8\%\)). According to the results, the PSO algorithm outperformed the GWO algorithm in calculating the final value of pixels in both scenarios. In addition, the CF probability model evaluated the relationship between landslides and conditioning factors better than SWARA. In general, data mining techniques allow a more effective understanding of latent data patterns, leading to more realistic outputs. It should be noted, however, that the use of data mining techniques does not necessarily guarantee reaching an optimal solution. As shown in the current study, the type of method used to describe the correlation between effective factors and landslides is important. The models used in this study are recommended for evaluating the risk of landslides in other susceptible areas to help regional authorities and policymakers.