1 Introduction

Traditional approaches to natural slope failure analysis have employed various engineering design tools [1, 2]. More advanced tools, such as machine learning-based predictive algorithms, have drawn the attention of many researchers [3, 4], and most studies have shown that machine learning techniques are dependable methods for approximating complex engineering solutions [5]. The stability of local slopes against failure is a critical matter that has to be investigated meticulously [6, 7] because of its high impact on adjacent engineering structures (e.g., projects that involve excavation, transmission roads, etc.). Moreover, slope failures cause considerable damage (e.g., the loss of property and human life) worldwide every year. Many factors need to be considered when assessing the stability of such slopes; for example, the degree of saturation, along with other intrinsic soil properties, strongly affects the likelihood of slope failure [8, 9]. To date, many scientists have sought to provide effective models of slope stability [10, 11]. Disadvantages of traditional approaches, such as the need for laboratory equipment [12,13,14] and their high level of complexity, make them difficult to apply [15,16,17,18]. Additionally, they cannot be utilized as a general solution, because each analysis is limited to a specific slope condition (e.g., slope height, slope angle, soil properties, depth of the groundwater level, etc.).

Because of the criticality of slope stability evaluation, much research has concentrated on tackling this problem. At present, expert assessment, analytical methods, and machine learning are usually employed to analyze slope conditions [19]. The first of these approaches is based on experts' knowledge and experience [20,21,22,23,24]: using the judgments of slope stability specialists, the key factors that most strongly influence further slope failure can be recognized [25]. The main disadvantage of the expert evaluation approach, though, is that it relies mostly on subjective judgment, so the consistency of the prediction results cannot be guaranteed [26].

In recent years, landslide failures have caused plenty of losses, both financial and psychological, around the world. Varnes and Radbruch-Hall [27] defined the landslide as any sort of gravity-driven downward movement of slope masses (i.e., soil, natural cliffs, and artificial deposits). Scholars [28] have stated that developing countries are more exposed to landslides, with around 90% of landslides occurring in these countries. Analyzing the landslide susceptibility of a zone is an effective task for reducing the above-mentioned losses [29]. Landslide susceptibility is defined as the spatial probability of landslide incidents given a set of geological and environmental conditions. Different landslide hazard mapping methods have been developed for the various landslide-prone zones worldwide [30, 31].

Researchers have compared the performance of different landslide prediction methods. For predicting the landslide hazard in the Longhai zone (China), He et al. [32] used Naïve Bayes (NB), a radial basis function (RBF) network, and an RBF classifier and compared the obtained results. They used FR and SVM approaches to assess the predictive performance of the landslide conditioning factors and showed that the RBF classifier performs better than NB and the RBF network, with a precision of 88.1%. Chen et al. [33] applied several new predictive approaches to landslide probability prediction for a specific zone in China, including a generalized additive model (GAM), ANFIS-FR (an adaptive neuro-fuzzy inference system combined with FR), and SVM. They showed that the SVM performs better than ANFIS-FR and GAM, with precisions of around 87.5, 85.1, and 84.6%, respectively. In this regard, other scholars used LR and ANFIS predictive methods to map the unstable rockfall of the Firooz Abad-Kojour earthquake, which occurred in 2004, and indicated that ANFIS performed better. In addition, many researchers have developed hybrid evolutionary approaches that enhance the artificial neural network (ANN) and ANFIS in landslide susceptibility mapping [34,35,36,37,38] as well as flood susceptibility mapping [39,40,41], improving the performance of the usual approaches. Pham et al. [42] utilized the Reduced Error Pruning Tree (REPT) as a base classifier to design the ensembles of Random Subspace-based REPT (RSREPT), MultiBoost-based REPT (MBREPT), Bagging-based REPT (BREPT), and Rotation Forest-based REPT (RFREPT) for landslide hazard prediction; the findings of this research revealed the superiority of the BREPT method. Also, Moayedi et al. [37] coupled an MLP neural network with particle swarm optimization (PSO) for spatial landslide hazard modeling in Kermanshah province, western Iran, and concluded that using PSO facilitates obtaining more precise outputs. Chen et al. [43] studied the robustness of the PSO, differential evolution (DE), and genetic algorithm (GA) techniques for enhancing ANFIS efficiency. Their results showed that the ANFIS-DE method outperformed ANFIS-GA and ANFIS-PSO, with calculated areas under the curve (AUC) of 0.844, 0.821, and 0.780, respectively. For the assessment of rainfall-triggered landslide risk, Tien Bui et al. [44] enhanced the Least-Squares Support Vector Machine (LSSVM) model using the DE method; the proposed LSSVM model performed better than the MLP, J48, and SVM algorithms, with an accuracy of 82%. The required approximation is commonly performed using a spatial dataset that consists of various landslide conditioning parameters such as the stream power index (SPI), altitude, soil, rainfall, climate, lithology, aspect, and the distances to linear phenomena. Many scholars have selected predictive techniques such as the statistical index (SI), certainty factor (CF), index of entropy (IOE), and frequency ratio (FR), as well as regression-based methods, and evaluated landslide risk using these approaches [45,46,47,48]. In this regard, Yang et al. [49] conducted a case study investigating the efficiency of a spatial logistic regression (SLR) method for modeling the landslide hazard in the Duwen Highway Basin, Sichuan Province, China.
Moreover, researchers have proposed a GeoDetector-based approach for properly selecting the landslide-related factors; the estimation accuracy of the proposed model was around 11.9% higher than that of the usual logistic regression (LR) model. Additionally, a case study has been conducted to analyze the landslide occurrence risk of a specific location in China using the IOE and certainty factor (CF) methods with the conditioning factors of slope angle, distance to rivers, plan curvature, profile curvature, distance to faults, geomorphology, distance to roads, topographic wetness index (TWI), slope aspect, general curvature, rainfall, lithology, altitude, the sediment transport index (STI), and the stream power index (SPI) [50]. Based on their respective accuracies of 82.32% and 80.88%, it was determined that CF produced the landslide susceptibility map with more validity than IOE. For making efficient, fast, and inexpensive predictions of landslide hazard, soft computing (SC) approaches have been highly recommended by scholars because of their computational advantages [36, 48, 51,52,53,54]. In this regard, researchers have performed various studies. Lee et al. [55] used a support vector machine (SVM) method for specific zones in Korea (Pyeong Chang and Inje). They employed the SVM as a reliable tool for analyzing landslide hazard and found that it produced proper results, with accuracies of around 81.36% and 77.49% for the Pyeong Chang and Inje zones, respectively. Pradhan and Lee [56] utilized a back-propagation neural network to produce the landslide hazard map of a specific area in Malaysia with 83% precision.

This paper presents a comprehensive optimization of landslide hazard analysis using six state-of-the-art metaheuristic algorithms. To this end, we first evaluate the capability of seven traditional classification techniques, using various statistical indices to distinguish the most capable model. The proposed elite model is then coupled with six evolutionary algorithms to enhance its performance. In addition to the statistical indices, the area under the receiver operating characteristic curve (AUROC) and the classification rate are considered to compare the efficiency of the resulting ensembles.

2 Methodology and data collection

2.1 Data collection

Referring to previous research in the context of landslide hazard assessment, fifteen key factors, namely altitude, slope, total curvature, profile curvature, plan curvature, SPI, TRI, TWI, fault, river, road, aspect, soil, land use, and geology, are considered as the independent variables in this paper. The response variable is the landslide occurrence index, which takes two values: 1 (i.e., 133 rows indicating nonlandslide) and 2 (i.e., 133 rows indicating landslide occurrence). Overall, out of 266 samples, 212 (i.e., 80%) were randomly selected for training the proposed models, and their accuracy was then evaluated by means of the remaining 54 samples (i.e., 20%). The mentioned landslide independent factors were produced in a geographic information system (GIS). In fact, some pre-processing was carried out so that each layer could be created from its basic formats such as contours, polygons, and tabular data. In the next step, the values of each GIS raster were extracted for each landslide and nonlandslide point. Table 1 shows an example of the dataset used in this study.

Table 1 An example of the provided dataset for prediction of the actual landslide target
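To make the split concrete, the following is a minimal Python sketch of the 80/20 partition described above. The file name `landslide_points.csv` and the `landslide` column are hypothetical stand-ins for the GIS-extracted table, and stratified sampling is an assumption used here to preserve the 133/133 class balance; the paper only states that the selection was random.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

data = pd.read_csv("landslide_points.csv")    # hypothetical file: 266 rows
X = data.drop(columns=["landslide"])          # the 15 conditioning factors
y = data["landslide"]                         # 1 = nonlandslide, 2 = landslide

# 212 training samples (80%) and 54 test samples (20%), drawn at random;
# stratification keeps the 133/133 class balance in both subsets (assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=54, random_state=0, stratify=y)
print(len(X_train), len(X_test))  # 212 54
```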

2.2 Conventional machine learning classification techniques

Learning algorithms have attracted considerable attention in many fields of research [57]. The machine learning models utilized in this work are introduced as follows:

The idea of logistic regression (LR) is based on determining a target with dichotomous values, such as true/false or 0/1, influenced by a number of independent factors [58]. For each classification task, it aims to find a reasonable fit that relates the presence or absence of the target event to its key factors. It then computes the result through a linear equation in which a weight is multiplied by each conditioning factor [59].

The multi-layer perceptron (MLP) is the most common form of ANN, the idea of which was first proposed in 1943 [60]. The MLP is capable of discovering non-linear relationships between the given variables. An MLP is composed of an input layer, one or more hidden layer(s), and an output layer, each containing several computational units called neurons. Each neuron receives the input vector and applies weights and biases to establish a mathematical equation.

SGD stands for the stochastic gradient descent learning method [61], a common iterative optimizer. Instead of computing the gradient of the cost over the entire dataset, SGD breaks the samples into mini-batches and calculates the gradient on each batch separately. In other words, it optimizes a pre-defined cost function to obtain the most accurate parameters of the problem [62].

The decision table (DT) is a tabular classification model in which the data are sorted to find an exact match in the table. Two responses are possible: (1) if the desired value is met, it is returned as the response; (2) otherwise, the answer is that no match was found [63]. Generally, the DT is constructed from four major sections: condition stubs, condition entries, action stubs, and action entries. During the validation stage, the DT checks for cases such as incompleteness and contradiction [64].

Self-organizing maps (SOM) denote a special kind of ANN devoid of a hidden layer [65]. In this model, the input vector is mapped onto a lower-dimensional map. If two inputs are closely related in the reference dataset, they remain closely related on this lower-dimensional map and are mapped into the same map unit [65].

Locally weighted learning (LWL) [66] is a well-known lazy learning model. LWL responds to queries by creating a so-called local model, here a Naive Bayes (NB) classifier, developed from the training samples that are most similar to the query data. Based on the distance between a training point and the prediction point, a weight is assigned to each training sample; in other words, training data located closer to the estimation point receive a larger weight [67]. This model is described in detail in [66].

The REP tree is a fast kind of decision tree learner. Depending on the type of problem, it builds a regression tree or a decision tree using information gain as the splitting criterion, and it is pruned using reduced-error pruning. Numeric attributes are sorted only once, and missing values are dealt with by splitting the related instances into pieces [68].
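As an illustration, the following hedged scikit-learn sketch trains three of the listed classifiers (LR, MLP, and SGD) on the split from Sect. 2.1. The study itself implemented all seven models in WEKA, and DT, SOM, LWL, and the REP tree have no direct scikit-learn equivalents; the hyperparameters shown here (hidden layer size, iteration limits) are illustrative assumptions, not the study's settings.

```python
from sklearn.linear_model import LogisticRegression, SGDClassifier
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

models = {
    "LR": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000),
    # logistic loss ("log" in older scikit-learn versions)
    "SGD": SGDClassifier(loss="log_loss"),
}

for name, clf in models.items():
    pipe = make_pipeline(StandardScaler(), clf)  # scale the conditioning factors
    pipe.fit(X_train, y_train)                   # X_train/y_train from the split above
    print(name, pipe.score(X_test, y_test))      # fraction correctly classified
```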

2.3 Metaheuristic evolutionary techniques

Due to advances in soft computing, diverse optimization techniques have been successfully used for different applications [69]. For the prediction of landslide occurrence, many nature-inspired algorithms have been employed by researchers. These algorithms operate on a pre-defined objective function (OF), which measures the quality of candidate solutions; minimizing the OF improves the accuracy of the estimation, whether regression or classification. In evolutionary approaches, a random population of candidate solutions is first initialized. The quality of each solution is then evaluated during a repetitive procedure: if a new solution is better, it replaces the previous one; otherwise, it is discarded, and the procedure continues until one of the stopping criteria is met. For predictive tasks, an MLP neural network can be used as a general function approximator. In the case of the MLP, the targets of the optimization are its weights and biases (for given activation functions). To achieve proper performance, a sufficient number of iterations is needed; during the optimization of the MLP, the computational error decreases in each iteration, and the enhanced weights and biases can be used to generate new outputs. In addition to the MLP algorithm, various evolutionary algorithms are used in the literature, as described in the following.
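Before turning to the individual algorithms, the following is a minimal sketch of such an objective function, assuming a one-hidden-layer MLP whose weights and biases are encoded as a single flat vector (the representation evolutionary algorithms typically operate on). The network size and activation functions are illustrative assumptions, not the study's configuration.

```python
import numpy as np

N_IN, N_HID = 15, 8  # 15 conditioning factors; hidden size is illustrative
DIM = N_IN * N_HID + N_HID + N_HID + 1  # total number of weights and biases

def mlp_forward(w, X):
    """Decode the flat vector w into layer parameters and run the MLP."""
    i = 0
    W1 = w[i:i + N_IN * N_HID].reshape(N_IN, N_HID); i += N_IN * N_HID
    b1 = w[i:i + N_HID]; i += N_HID
    W2 = w[i:i + N_HID]; i += N_HID
    b2 = w[i]
    h = np.tanh(X @ W1 + b1)                     # hidden layer (tanh activation)
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))  # sigmoid output in (0, 1)

def objective(w, X, y01):
    """Objective function (OF): MSE against the 0/1-coded landslide target."""
    return float(np.mean((mlp_forward(w, X) - y01) ** 2))
```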

Ant colony optimization (ACO) was first introduced in Ref. [70] and is a well-known branch of evolutionary computation. The algorithm mimics the foraging behavior of ant colonies, whose members collaborate closely. Each ant chooses a path probabilistically and leaves a chemical pheromone trail along the way; the other ants are guided by this trail, which helps them choose the more promising paths. This mechanism enables the colony to discover the shortest path between, for example, the food sources and the nest.

The biogeography-based optimization (BBO) algorithm, inspired by the geographical distribution of species, was suggested in Ref. [71]. Its two central concepts are the habitat and the habitat suitability index, which represent a possible solution of the given problem and its fitness, respectively. Each habitat comprises a number of features (decision variables), and the method improves the fitness of the candidate solutions by sharing features among them. BBO relies on migration operators to modify the calculated habitat suitability index and is detailed in Refs. [72, 73]. The method also enhances the diversity of the population to prevent trapping in local minima [74].

The evolution strategy (ES), a stochastic metaheuristic approach, was first proposed in Ref. [75] and further developed in Ref. [76]. ES employs selection and mutation operators and is built on the two principles of evolution and adaptation. The canonical version consists of six major steps: the population is initialized and evaluated; an offspring population is produced; and the quality of the offspring is compared with that of the parents to select the elite population.

Holland [77] first designed the genetic algorithm (GA), which is known as a robust search approach and has been widely utilized for various optimization problems [78, 79]. Like other metaheuristic algorithms, the GA starts with an initial group of number strings, each of which may be a possible solution. The quality of these strings is then evaluated, and the algorithm gradually evolves the group toward a more promising string set. A reproduction procedure is applied to produce each new population and reach an excellent generation. More details about this method are presented in Refs. [80, 81].

The literature also contains special variants of the GA, such as population-based incremental learning (PBIL). This algorithm evolves the genotype as a probability vector rather than as individual solutions, and it is introduced as a combination of evolutionary computation and reinforcement learning. The learning procedure of PBIL resembles that of the GA: it starts with a probability initialization; samples are then generated from the current probability vector and the best sample is identified; the probability vector is updated toward this elite sample; a mutation operator is applied probabilistically; and, finally, a termination criterion ends the algorithm [82].

Particle swarm optimization (PSO), first proposed in Ref. [83], mimics the social behavior and flocking lifestyle of animals. It has been used to enhance various conventional intelligent methods [84, 85]. Higher learning speed and lower memory usage are considerable merits of this algorithm compared with other optimization methods such as the imperialist competitive algorithm (ICA), artificial bee colony (ABC), and GA [37]. In PSO, the candidate solutions and the population are called "particles" and the "swarm", respectively. Each particle has a position and a velocity as its determinative factors, and each particle compares its position against the rest of the population in order to improve it. More details about this method are presented in [86, 87].
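For concreteness, the following is a minimal global-best PSO sketch that could minimize the MLP objective sketched above. The swarm size, inertia weight, and acceleration coefficients are common textbook values, not the settings used in this study.

```python
import numpy as np

rng = np.random.default_rng(0)

def pso(objective, dim, n_particles=30, n_iter=1000, w=0.7, c1=1.5, c2=1.5):
    """Minimize objective(x) for x in R^dim with a basic global-best PSO."""
    pos = rng.uniform(-1.0, 1.0, (n_particles, dim))   # particle positions
    vel = np.zeros((n_particles, dim))                 # particle velocities
    pbest = pos.copy()                                 # personal best positions
    pbest_f = np.array([objective(p) for p in pos])    # personal best values
    gbest = pbest[pbest_f.argmin()].copy()             # global best position
    history = []                                       # convergence curve (cf. Fig. 1)
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        f = np.array([objective(p) for p in pos])
        improved = f < pbest_f                         # update personal bests
        pbest[improved], pbest_f[improved] = pos[improved], f[improved]
        gbest = pbest[pbest_f.argmin()].copy()         # update global best
        history.append(float(pbest_f.min()))
    return gbest, history

# Usage with the MLP objective sketched earlier (hypothetical data names):
# best_w, curve = pso(lambda v: objective(v, X_train, y01_train), DIM)
```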

3 Results and discussion

As stated above, this study outlines the optimization of a typical machine learning model for landslide occurrence prediction. Two major steps form the body of this paper. First, the performance of seven conventional machine learning classification techniques, including LR, MLP, SGD, DT, SOM, LWL, and REP tree, is evaluated. Then the elite model is selected to be optimized with six metaheuristic algorithms, namely ACO, BBO, ES, GA, PBIL, and PSO. Note that five accuracy criteria are used: the kappa statistic (KS), mean absolute error (MAE), root mean square error (RMSE), relative absolute error (RAE in %), and root relative squared error (RRSE in %). Equations (1)–(5) describe these indices:

$$\kappa = \frac{{p_{\text{o}} - p_{\text{e}}}}{{1 - p_{\text{e}}}},$$
(1)
$${\text{MAE}} = \frac{1}{S}\sum\limits_{i = 1}^{S} {\left| {Y_{i_{\text{observed}}} - Y_{i_{\text{predicted}}}} \right|},$$
(2)
$${\text{RMSE}} = \sqrt{\frac{1}{S}\sum\limits_{i = 1}^{S} {\left( {Y_{i_{\text{observed}}} - Y_{i_{\text{predicted}}}} \right)^{2}}},$$
(3)
$${\text{RAE}} = \frac{\sum\nolimits_{i = 1}^{S} {\left| {Y_{i_{\text{predicted}}} - Y_{i_{\text{observed}}}} \right|}}{\sum\nolimits_{i = 1}^{S} {\left| {Y_{i_{\text{observed}}} - \overline{Y}_{\text{observed}}} \right|}},$$
(4)
$${\text{RRSE}} = \sqrt{\frac{\sum\nolimits_{i = 1}^{S} {\left( {Y_{i_{\text{predicted}}} - Y_{i_{\text{observed}}}} \right)^{2}}}{\sum\nolimits_{i = 1}^{S} {\left( {Y_{i_{\text{observed}}} - \overline{Y}_{\text{observed}}} \right)^{2}}}},$$
(5)

in which \(p_{\text{o}}\) is the percentage agreement between the classifier and the ground truth, and \(p_{\text{e}}\) represents the chance agreement. Also, \(Y_{i_{\text{observed}}}\) and \(Y_{i_{\text{predicted}}}\) stand for the actual and predicted values of landslide occurrence, respectively. The term \(S\) denotes the number of instances, and \(\overline{Y}_{\text{observed}}\) is the average of the target landslide values (i.e., 1 and 2).
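A direct transcription of Eqs. (1)–(5) into Python, assuming NumPy arrays for the observed and predicted targets, reads as follows.

```python
import numpy as np

def kappa(y_true, y_pred):
    """Cohen's kappa, Eq. (1): (p_o - p_e) / (1 - p_e)."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    p_o = np.mean(y_true == y_pred)                        # observed agreement
    p_e = sum(np.mean(y_true == c) * np.mean(y_pred == c)  # chance agreement
              for c in classes)
    return (p_o - p_e) / (1 - p_e)

def error_indices(y_true, y_pred):
    """MAE, RMSE, RAE (%), and RRSE (%), Eqs. (2)-(5)."""
    err = y_pred - y_true                 # prediction errors
    dev = y_true - y_true.mean()          # deviations from the target mean
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    rae = 100 * np.sum(np.abs(err)) / np.sum(np.abs(dev))
    rrse = 100 * np.sqrt(np.sum(err ** 2) / np.sum(dev ** 2))
    return mae, rmse, rae, rrse
```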

3.1 Conventional machine learning technique implementation

In this part, the performance of the LR, MLP, SGD, DT, SOM, LWL, and REP tree classification models is evaluated for estimating the landslide occurrence, using altitude, slope, total curvature, profile curvature, plan curvature, SPI, TRI, TWI, fault, river, road, aspect, soil, land use, and geology as the landslide independent factors. The Waikato Environment for Knowledge Analysis (WEKA) software was used to implement the mentioned models. Table 2 shows an example of the results produced by each model. Also, the implemented models are compared in Table 3 in terms of the KS, MAE, RMSE, RAE (%), and RRSE (%) accuracy indices. Note that a score-based ranking system was also developed to determine the most capable models; in this system, the cells indicating higher accuracy for each model are shaded with a more intense red color, and the overall score of each model determines its ranking (see the sketch after Table 3). According to this table, the MLP outperforms the other six models in terms of all defined indices. In this regard, the KS, MAE, RMSE, RAE, and RRSE of the MLP are obtained as 0.796, 0.219, 0.312, 53.976, and 62.392, respectively. In addition, considering the total ranking scores (16, 35, 12, 9, 12, 21, and 30, respectively, obtained for LR, MLP, SGD, DT, SOM, LWL, and REP tree), the MLP presents the most accurate estimation, followed by the REP tree and LWL as the second and third most efficient models.

Table 2 Assessment of conventional machine learning classification
Table 3 The obtained results of the conventional machine learning models for the landslide occurrence prediction
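The exact scoring scheme is not spelled out beyond the description above, so the following sketch reconstructs one plausible version under the assumption that each model earns points equal to its rank on every index (the best of n models gets n points) and the points are summed. The numbers in the example are dummy values for illustration only, not the paper's results.

```python
import pandas as pd

# Dummy illustrative results: results[model] = [KS, MAE, RMSE, RAE, RRSE].
results = pd.DataFrame(
    {"KS":   [0.70, 0.80, 0.60],
     "MAE":  [0.25, 0.22, 0.30],
     "RMSE": [0.35, 0.31, 0.40],
     "RAE":  [60.0, 54.0, 70.0],
     "RRSE": [70.0, 62.0, 80.0]},
    index=["model_A", "model_B", "model_C"])

# KS is better when higher; the four error indices are better when lower.
higher_is_better = {"KS": True, "MAE": False, "RMSE": False,
                    "RAE": False, "RRSE": False}

# rank(ascending=True) gives the largest value the highest rank, so the
# best model on each index receives the most points.
scores = pd.DataFrame({col: results[col].rank(ascending=better)
                       for col, better in higher_is_better.items()})
total = scores.sum(axis=1).sort_values(ascending=False)
print(total)  # model_B scores highest on every index in this dummy example
```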

3.2 Metaheuristic evolutionary technique implementation

In this part, the aim is to optimize the performance of the elite model, i.e., the one that showed the highest prediction accuracy among the typical machine learning approaches. As explained, the MLP outperformed the other models, and it is coupled with the ACO, BBO, ES, GA, PBIL, and PSO evolutionary algorithms to achieve a more reliable approximation of landslide occurrence risk. It is noteworthy that the mentioned methods try to find the optimal values of the weights and biases of the MLP.

MATLAB 2014 was used as the programming environment for this part. Each optimization process was executed within 1000 iterations, and the mean square error (MSE) was defined as the objective function to measure the accuracy of the ACO-MLP, BBO-MLP, ES-MLP, GA-MLP, PBIL-MLP, and PSO-MLP ensembles in each iteration. Figure 1 illustrates the convergence path of each model. According to this figure, the GA-MLP, PBIL-MLP, and BBO-MLP reached lower MSEs than the other ensembles. Notably, the MSEs obtained for the ACO-MLP and ES-MLP (around 0.0080) were a little higher than that of the PSO-MLP (around 0.0027). Also, considering the number of iterations that each model needed to reach its minimum error, the ACO-MLP had the best convergence speed; however, its final MSE was higher than those of the other models.

Fig. 1 The convergence curves of the implemented evolutionary models in terms of MSE

In the following, two accuracy criteria are used: the area under the receiver operating characteristic curve (AUROC), a well-known measure for evaluating the accuracy of a diagnostic problem, and the classification rate. The ROC curves for the typical MLP as well as the metaheuristic ensembles are shown in Fig. 2. Needless to say, the higher the AUROC, the higher the accuracy of the results. As a first result, it can be deduced that the performance of the MLP is enhanced considerably by applying the mentioned algorithms. Moreover, the GA-MLP gained the highest AUROC (0.983), followed by the BBO-MLP (0.971) and PBIL-MLP (0.948). After those, PSO was a more successful technique for optimizing the MLP (0.917), in comparison with ACO (0.896) and ES (0.878).
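A minimal sketch of these two validation measures, assuming a fitted binary classifier that exposes predicted class probabilities (e.g., via scikit-learn's predict_proba), is given below.

```python
from sklearn.metrics import roc_auc_score, accuracy_score

def validate(model, X_test, y_test):
    """AUROC and classification rate (%) for a fitted binary classifier."""
    proba = model.predict_proba(X_test)[:, 1]  # probability of the higher class
    auroc = roc_auc_score(y_test, proba)       # with labels {1, 2}, class 2 is positive
    rate = 100.0 * accuracy_score(y_test, model.predict(X_test))
    return auroc, rate
```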

Fig. 2 The ROC curves plotted for the results of the applied models

Furthermore, the percentage of correctly classified samples provides another measure for evaluating the accuracy of the applied models. Figure 3 shows the classification rate of the proposed models. First, the increase in the classification rate of the MLP demonstrates the efficiency of the used optimization algorithms. Also, consistent with the AUROC results, the highest classification rate is obtained for the GA-MLP (85%), followed by the BBO-MLP (81.5%) and PBIL-MLP (79.6%). The PSO-MLP classified 75.4% of the samples correctly, while this value was calculated as 62.8% and 60.1% for the ES-MLP and ACO-MLP, respectively. All in all, the results of the study show that the GA, BBO, PBIL, and PSO have the higher capability for optimizing the MLP neural network.

Fig. 3 The classification rate of the typical MLP and the evolutionary ensembles

4 Conclusions

Due to the importance of an appropriate approximation of landslide occurrence risk, this study presented a comprehensive optimization of the MLP neural network for landslide occurrence prediction using six capable evolutionary methods, namely ACO, BBO, ES, GA, PBIL, and PSO. To do so, a proper dataset was provided. First, seven conventional machine learning techniques, including LR, MLP, SGD, DT, SOM, LWL, and REP tree, were evaluated. The results showed that the MLP performed most efficiently, with respective KS, MAE, RMSE, RAE, and RRSE of 0.796, 0.219, 0.312, 53.976, and 62.392. In the next step, the elite model (i.e., the MLP) was coupled with the mentioned optimization algorithms to achieve a more reliable prediction. The results showed that the AUROC and classification rate of the MLP (i.e., 0.816 and 58.3%) increased to 0.983, 0.971, 0.948, 0.917, 0.878, and 0.896, and to 85, 81.5, 79.6, 75.4, 62.8, and 60.1%, respectively, by applying the GA, BBO, PBIL, PSO, ES, and ACO techniques. From the comparison viewpoint, the GA outperformed the other optimization methods. Finally, it is worth noting that the presented work could be improved by optimizing the selection of the landslide conditioning factors, which would be a good direction for future work.