1 Introduction

Slope stability assessment is a key subject in mining and geotechnical engineering because it concerns the safety of natural and man-made structures and, above all, human lives. Its importance has led to the development of various models that use the factor of safety (FoS) as the stability index. This index is the conventional basis for evaluating slope stability and choosing design treatments. The most established models for slope stability evaluation are based on the limit equilibrium (LE) approach and on conventional numerical techniques founded on the theory of elasticity and plasticity (Liao and Liao 2020). The pioneering works on LE methods were done by Fellenius (1936), Janbu (1954), Bishop (1955), Morgenstern and Price (1965), and Spencer (1967). More information on their respective assumed failure mechanisms, sliding surfaces, satisfied equilibrium conditions, and applications can be found in Azarafza et al. (2021).

The major drawback of the LE approach is that the resulting equations are statically indeterminate, so simplifying assumptions have to be made to render them solvable (Lawal 2018; Xu and Lawal 2021; Lawal 2021). LE methods have also been described as computationally tedious, and their quick application for assessing slopes against natural hazards such as liquefaction may not be possible (Marrapu et al. 2021). To use the LE approach, engineers usually face the problem of locating the critical slip surface, and several procedures, including the grid center method, Siegel’s method, Carter’s method, genetic algorithms, the leapfrog algorithm, and the annealing algorithm, have been proposed (Boutrup and Lovell 1980; Siegel 1975; Carter 1971; Goh 1999; Zolfaghari et al. 2005; Bolton et al. 2003). Nevertheless, owing to the rigor involved in implementing these procedures, engineers still rely on prior experience and a trial-and-error approach to locate the critical slip surface. The methods proposed in this study remove the need to locate the critical failure surface before computing the factor of safety, and they are simple to implement. The conventional numerical techniques, on the other hand, are complex and costly to implement because they require a constitutive model that accurately reflects the mechanical behavior of soil and rock slopes, which is a tedious task (Liao and Liao 2020).

Recently, the multiple linear regression (MLR) method has also been used to provide models for quick assessment of slope stability (Erzin and Cetin 2013; Cetina 2014). The accuracy of this method is usually low because the relationships between the factors contributing to slope failure are nonlinear. The quest for models that can accurately predict slope stability has led to the adoption of the artificial intelligence (AI) approach, of which the artificial neural network (ANN) has been the most widely used for slope stability assessment. Sakellariou and Ferentinou (2005) assessed the factor of safety of slopes using the ANN method based on data recorded in the literature. Erzin and Cetin (2013) developed ANN and MLR models for factor of safety prediction based on a database generated with the simplified Bishop method. Abdalla et al. (2015) also proposed ANN-based models for computing the minimum factor of safety in clayey soil, using a data source similar to that of Erzin and Cetin (2013). Liao and Liao (2020) adopted multivariate adaptive regression splines (MARS) and ANN models to predict the factor of safety from a historical dataset. Marrapu et al. (2021) also adopted an ANN model to assess the factor of safety of slopes.

Other authors, such as Liu et al. (2014), Xue et al. (2014), and Hoang and Pham (2016), have used different machine learning (ML) methods to assess slope stability. Most recently, Mahmoodzadeh et al. (2022) used Gaussian process regression (GPR), support vector regression (SVR), decision trees (DT), long short-term memory (LSTM), deep neural networks (DNN), and K-nearest neighbors (KNN) to predict the FoS of slopes. However, the performance of all their adopted methods is modest, as the highest R² value obtained in their study is around 81%, and hence more reliable models are still needed. The reviewed studies show that ANN has been the most widely used AI method for slope stability prediction. However, metaheuristic-based ANNs, which have been used successfully to improve ANN performance elsewhere, have scarcely been applied to factor of safety prediction. In particular, the multiverse optimization and salp swarm optimization algorithms are yet to be used for slope stability analysis.

Although there are many methods for assessing slope stability, as mentioned above, no single method is agreed to be the best or preferred over the others (Albataineh 2006). Judging the reliability of the solution from each method is left entirely to the field engineer (Albataineh 2006; Shiferaw 2021). Hence, selecting reliable models among the several LE and ANN methods scattered in the literature may be difficult for users. Therefore, in this study, we propose two different hybrid ANN models and perform a reliability analysis of the existing models together with the proposed models using another historical dataset that was not used in developing the new models. Another problem with the current ANN models for factor of safety prediction is that they are not readily implementable, because they are not presented as tractable mathematical models or codes. This problem compels users to keep relying on the traditional LE models. The hybrid ANNs proposed in this study are transformed into a simple mathematical form, coupled with MATLAB code, for easy implementation. Hence, the study offers several novel contributions, ranging from the development of hybrid ANN models to the selection of the most reliable models for factor of safety prediction using rigorous statistical methods.

2 Materials and method

2.1 Data description and analysis

The adopted dataset, summarized in Table 1, is a historical dataset obtained from Sakellariou and Ferentinou (2005) and Sah et al. (1994). Table 1 lists the geotechnical and geometrical parameters that are prominent in slope stability prediction. Forty-six (46) data points for each of the geotechnical parameters (friction angle, cohesion, unit weight, and pore water pressure) and geometrical parameters (slope height and slope angle) were obtained for model development. The statistical descriptions of the dataset are shown in Table 1, and its correlation matrix is presented in Fig. 1.

Table 1 Summary statistics of the datasets

Fig. 1 Correlogram of the model datasets

In general, the relationship between the geotechnical and geometrical parameters and the factor of safety is weak. Therefore, a model that can capture the complex relationship between the parameters is necessary to ensure accurate prediction of FoS. The number of data points used in model development is adequate, as many established studies have used smaller datasets to develop AI models related to slope stability. For instance, Zhao (2008) used ten data points for an SVM model, Wang et al. (2005) used 27 for an ANN model, Lu and Rosenbaum (2003) used 32 for an ANN model, and Choobbasti et al. (2009) used 36, while Samui (2008), Das et al. (2011), and Xue (2017) each used 46, which is the same number used in this study.

Aside from the model development, one of the key features of this study is the selection of the most reliable model for factor of safety determination. The methods of Bishop (1955), Morgenstern and Price (1965), Fellenius (1936), and Janbu (1975) are the most widely used LE-based methods for factor of safety prediction. These methods and the ANN model proposed by Marrapu et al. (2021) are assessed, together with the ordinary and hybrid ANNs developed in this study, to select the most suitable model among them. The assessment is done using nineteen (19) new historical slope cases that were not part of the data used in developing the models. The FoS values predicted by the Bishop (1955), Morgenstern and Price (1965), Fellenius (1936), and Janbu (1975) methods, tagged FSBis, FSM.P, FSFel, and FSJan in this study, were obtained from the SlopeW software, while those tagged FSANN were obtained from the ANN model of Marrapu et al. (2021).

2.2 Model developments

2.2.1 ANN model

ANN is an AI method that mimics the function of the human nervous system, and it appears to be the most widely used AI method in various fields of application, including geotechnical engineering. Various properties of geomaterials have been determined using ANN, and it has proven versatile in its predictive performance (Lawal et al. 2021a, 2022). ANN has also been widely used in slope stability investigation, as outlined in the introduction, and it is adopted in this study to assess slope stability. The proposed ANN-based model is unique compared to the current ANN models for FoS prediction because it is self-iterated. Its mathematical form, which can serve as an alternative to the current LE-based equations, is also available. Self-iteration means that, instead of manually training and retraining the ANN structure through the ANN interface in MATLAB, a code is generated that trains and retrains the networks automatically until the optimum performance of each tried architecture is achieved. At each iteration stage, the mean squared error (MSE) is evaluated. To build the self-iterated ANN model, the adopted datasets were preprocessed by normalizing them into the range of -1 to 1 to enable dimensional uniformity and avoid overfitting (Lawal et al. 2021b, c). The processed datasets were then imported into MATLAB, and ANN parameters such as the number of hidden neurons and the training and transfer functions were defined. The input layer had six (6) neurons and the output layer had one (1). The number of neurons in the hidden layer was varied between 2 and 10 to obtain the optimum ANN architecture. The ANN was trained with the backpropagation algorithm coupled with the Levenberg–Marquardt training function. The transfer function used at the hidden and output layers was the hyperbolic tangent function.
Each tried ANN structure was subjected to the same number of iterations to obtain the optimum result. The results obtained for each simulated network are presented in Table 2. The 6-9-1 structure has the highest overall coefficient of correlation (R) and is therefore selected for further analysis, although its R value for the validation phase is slightly lower than those of the 6-7-1 and 6-8-1 structures. To further enhance the validation and overall performance of the optimum 6-9-1 structure, we employed two novel metaheuristic optimization algorithms, the MVO and SSA algorithms described below.
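The normalization step described above can be sketched in a few lines. The paper does not state the scaling formula, so the linear min-max mapping below is an assumption, as are the function names and the sample slope heights, which are purely illustrative. Python is used here for convenience rather than the MATLAB of the Appendix.

```python
def normalize(values, lo=None, hi=None):
    """Linearly map values onto [-1, 1] (assumed min-max scaling)."""
    lo = min(values) if lo is None else lo
    hi = max(values) if hi is None else hi
    if hi == lo:
        raise ValueError("cannot scale a constant column")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def denormalize(scaled, lo, hi):
    """Invert normalize(): map values in [-1, 1] back to [lo, hi]."""
    return [0.5 * (s + 1.0) * (hi - lo) + lo for s in scaled]


# Hypothetical slope heights in metres, for illustration only.
heights = [10.0, 25.0, 40.0]
scaled = normalize(heights)
restored = denormalize(scaled, min(heights), max(heights))
```

Under this assumed mapping, new inputs must be scaled with the minimum and maximum of the training data, not of the new data, before they are passed to a trained network.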

Table 2 Simulated ANN models

2.2.2 Development of hybrid models

The optimum ANN structure obtained after several simulations, as presented in Table 2, is subjected to stochastic metaheuristic algorithms. The multiverse optimizer (MVO) and the salp swarm algorithm (SSA) are used in this study. Both are relatively new optimization algorithms and have never been used for FoS prediction, and hence the proposed hybrid ANNs are new. Lawal and Kwon (2022) adopted the SSA algorithm for predicting the ultimate bearing capacity of shallow foundations, and its performance there was encouraging. In addition, Faris et al. (2019) stated that SSA has some unique merits that are not obtainable in the traditional particle swarm optimization (PSO), gray wolf optimization (GWO), and grasshopper algorithm (GSA) methods. The use of more than one algorithm is further motivated by the No-Free-Lunch theorem, according to which no single algorithm performs optimally on all problems.

The MVO algorithm is one of the recent population-based metaheuristic algorithms proposed by Mirjalili et al. (2016). The motivation behind this algorithm is the theory of the multiverse in astrophysics. The multiverse theory explains the creation of universes by big bangs and the interaction between them through different holes named white holes, black holes, and wormholes. Mirjalili et al. (2016) utilized the concept of white and black holes to explore search space, while the wormhole was used to exploit the search space to formulate the population-based algorithm. Each solution was assumed to stand for a universe with an individual variable/attribute in the solution space standing for an object in the universe. Furthermore, each solution has a fitness value otherwise called inflation rate to reflect/mirror the solution quality which is computed by the corresponding objective function. The main mathematical models of the MVO algorithm are presented in Eqs. (1) to (4)

$${X}_{i}^{j}=\left\{\begin{array}{ll}{X}_{k}^{j}, & {r}_{1}<NI({U}_{i})\\ {X}_{i}^{j}, & {r}_{1}\ge NI({U}_{i})\end{array}\right.$$
(1)

where $X_{i}^{j}$ stands for the jth object of the ith universe, $r_{1}$ is a random number in the range (0, 1), $NI(U_{i})$ stands for the normalized inflation rate of the ith universe, and $X_{k}^{j}$ stands for the jth object of the kth universe.

$${X}_{i}^{j}=\left\{\begin{array}{c}\begin{array}{l}\left({X}_{j}+TDR\times \left(ub-lb\right)\times {r}_{4}+lb\right), {r}_{3}<0.5, \\ \left({X}_{j}-TDR\times \left(ub-lb\right)\times {r}_{4}+lb\right), {r}_{3}\ge 0.5, \end{array} {r}_{2}<WEP\\ {X}_{i}^{j} {r}_{2}\ge WEP\end{array}\right.$$
(2)

where $X_{j}$ stands for the jth parameter of the best universe obtained so far, ub and lb are the upper and lower bounds, TDR is the traveling distance rate, WEP is the wormhole existence probability, and $r_{2}$ to $r_{4}$ are random numbers in the same range as $r_{1}$. WEP and TDR are adaptive coefficients that enhance exploitation. They are computed as follows:

$$WEP=min+l\times \left(\frac{max-min}{L}\right)$$
(3)
$$TDR=1-\frac{{l}^{1/p}}{{L}^{1/p}}$$
(4)

where p represents the exploitation factor, min and max are the minimum and maximum values of WEP, l is the current iteration, and L is the maximum number of iterations. The pseudo-code for implementing the MVO is presented below (Mirjalili et al. 2016).

Algorithm 1 Pseudo-code of MVO algorithm
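The two adaptive coefficients in Eqs. (3) and (4) can be illustrated with a short Python sketch. The default values used below (a WEP range of 0.2 to 1 and p = 6) are assumptions taken from commonly quoted MVO settings, not values reported in this study, and the function names are hypothetical.

```python
def wep(l, L, wep_min=0.2, wep_max=1.0):
    """Wormhole existence probability, Eq. (3): grows linearly with l."""
    return wep_min + l * (wep_max - wep_min) / L


def tdr(l, L, p=6.0):
    """Traveling distance rate, Eq. (4): shrinks from 1 towards 0."""
    return 1.0 - (l ** (1.0 / p)) / (L ** (1.0 / p))


# Schedule over a run of 1000 iterations (the maximum used in this study).
L_MAX = 1000
schedule = [(l, wep(l, L_MAX), tdr(l, L_MAX)) for l in (0, 250, 500, 1000)]
```

As the iteration counter approaches L, WEP rises and TDR falls, so wormhole moves become more frequent but cover a shorter distance around the best universe, which is how the algorithm shifts from exploration to exploitation.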

On the other hand, SSA is a population-based algorithm like MVO that was recently proposed by Mirjalili et al. (2017). It mimics the swarm behavior of salps in nature. According to Faris et al. (2019), SSA has some unique merits that are not obtainable in the traditional PSO, GWO, and GSA methods. The dynamic movements of salps enhance the searching capabilities of the SSA in escaping from local optima (LO) and premature convergence drawbacks. This study adopts this method based on this unique advantage and its previous application for the optimization of ANN models (e.g., Lawal and Kwon 2022). To mathematically model the salp chains, the population is divided into two groups: leaders and followers; leaders play the frontline role, while others follow each leader either in the same direction or opposite direction. The main mathematical equations for SSA algorithm are presented in Eqs. (5) to (7):

$${z}_{j}^{1}=\left\{\begin{array}{c}{F}_{j}+{\delta }_{1}\times \left(-\left({LB}_{j}{-UB}_{j}\right)\times {\delta }_{2}+{LB}_{j}\right), {\delta }_{3}\ge 0.5\\ {F}_{j}-{\delta }_{1}\times \left(-\left({LB}_{j}{-UB}_{j}\right)\times {\delta }_{2}+{LB}_{j}\right), {\delta }_{3}<0.5\end{array}\right.$$
(5)

where $z_{j}^{1}$ stands for the position of the leader in the jth dimension, and $F_{j}$ represents the position of the food source in the jth dimension. The coefficient δ1 is defined in Eq. (6), and δ2 and δ3 are random variables that range between 0 and 1.

$${\delta }_{1}=2{e}^{-{\left(\frac{4l}{L}\right)}^{2}}$$
(6)

where l is the current iteration and L is the maximum number of iterations. The final mathematical transformation required in SSA is presented in Eq. (7).

$${z}_{j}^{i}=1/2\left({z}_{j}^{i}+{z}_{j}^{i-1}\right)$$
(7)

where i ≥ 2 and $z_{j}^{i}$ denotes the position of the ith follower salp in the jth dimension. The pseudo-code for implementing the SSA is presented below (Mirjalili et al. 2017).

Algorithm 2 Pseudo-code of SSA algorithm
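To make Eqs. (5) to (7) concrete, the following Python sketch performs one position update of a salp chain. It is a minimal illustration under stated assumptions: the exponent of δ1 uses the factor 4l/L from the original SSA formulation of Mirjalili et al. (2017), followers are averaged with their already updated predecessor, and positions are clamped to the bounds. The names `delta1` and `ssa_step` and the sample values are hypothetical.

```python
import math
import random


def delta1(l, L):
    """Eq. (6), assuming the factor 4 of the original SSA formulation."""
    return 2.0 * math.exp(-((4.0 * l / L) ** 2))


def ssa_step(salps, food, lb, ub, l, L, rng=random):
    """Return the salp chain after one update around the food source."""
    d1 = delta1(l, L)
    dims = range(len(food))
    new = [list(s) for s in salps]
    # Leader (first salp), Eq. (5): move around the food source.
    for j in dims:
        d2, d3 = rng.random(), rng.random()
        step = d1 * ((ub[j] - lb[j]) * d2 + lb[j])
        new[0][j] = food[j] + step if d3 >= 0.5 else food[j] - step
    # Followers, Eq. (7): average with the salp ahead in the chain.
    for i in range(1, len(salps)):
        for j in dims:
            new[i][j] = 0.5 * (salps[i][j] + new[i - 1][j])
    # Assumption: keep every salp inside the search bounds.
    for s in new:
        for j in dims:
            s[j] = min(max(s[j], lb[j]), ub[j])
    return new


chain = [[0.5, -0.5], [0.2, 0.1], [-0.3, 0.4]]
updated = ssa_step(chain, food=[0.0, 0.0], lb=[-1.0, -1.0],
                   ub=[1.0, 1.0], l=10, L=1000, rng=random.Random(0))
```

In a hybrid ANN setting each salp would hold a flattened vector of network weights and biases, and the food source would be the vector with the lowest training error found so far.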

The MVO and SSA were implemented in MATLAB based on their pseudo-codes. The ANN was first initialized and then subjected to stochastic simulations using MVO for the MVO-ANN model and SSA for the SSA-ANN model. The architecture adopted for the initialization is 6-9-1, which is the optimum architecture obtained for the ordinary ANN models as presented in Table 2. The number of search agents in both optimization algorithms was varied from 10 to 100, with the maximum number of iterations fixed at 1000. The solution space and the convergence curve that gave the optimum result for the MVO-ANN model are presented in Fig. 2, while those of the SSA-ANN model are illustrated in Fig. 3. Notably, Figs. 2 and 3 show the trend of the convergence curves at each iteration stage. The predictive performances of the MVO-ANN and SSA-ANN models are compared with the ordinary ANN and with the existing models using an independent historical dataset. Thereafter, the models are subjected to detailed statistical analysis to test their reliability and select suitable models for slope stability analysis.

Fig. 2 An example function used in evaluating MVO and results of the MVO code in finding the minima

Fig. 3 An example function used in evaluating SSA and results of the SSA code in finding the minima

Fig. 4 Empirical CDF for (a) measured FoS, (b) proposed ANN, (c) MVO-ANN, and (d) SSA-ANN

3 Results and discussion

3.1 Model comparison

The proposed models are compared with the measured FoS (here, the FoS values in the historical dataset) using the empirical cumulative distribution function (CDF) of the overall dataset, as presented in Fig. 4. The mean value of the measured FoS is 1.245 with a standard deviation of 0.3814 (Fig. 4a). Comparing these with the ordinary ANN (Fig. 4b), MVO-ANN (Fig. 4c), and SSA-ANN (Fig. 4d), the mean values of MVO-ANN and SSA-ANN are identical to that of the measured FoS, while that of the ordinary ANN is lower. The standard deviations of the proposed models are close to that of the measured values, and the SSA-ANN has the lowest standard deviation. Since the mean values of the hybrid ANNs are identical to the measured mean, the empirical CDFs of MVO-ANN and SSA-ANN are both closer to that of the measured FoS than the CDF of the ordinary ANN. Even though the ordinary ANN has a very high R value, as given in Table 2, the optimization algorithms have improved its performance, as evidenced by the empirical CDF. The identical mean values also suggest that the algorithms have reduced the errors of the ordinary ANN.

To further establish the value of the proposed models for reliable slope stability evaluation, an independent historical dataset, outside the one used in developing the models, is used to assess their performance alongside the predictions of the classical LE methods and the ANN model of Marrapu et al. (2021). The comparison is again done using the empirical CDF of the models, as presented in Fig. 5. The mean values of all eight models in the comparison are lower than that of the measured data points, as shown in Fig. 5. The standard deviation gives an insight into how far the models deviate from the expected targets. The standard deviations of the proposed models, and of the hybrid ANNs in particular, are the closest to the standard deviation of the actual values and the lowest among the models (Fig. 5). This indicates that the proposed models may be more reliable and safer for FoS assessment than the existing models. However, the information provided by the figures may not be sufficient to ascertain the reliability of the proposed models. Therefore, further statistical analysis is conducted to select the most reliable models among those in this comparison. This is presented in the next section.

Fig. 5 Empirical CDF of the models

The performance of the proposed models with the existing models is further evaluated prior to the detailed statistical analysis using the percentage relative error (RE) criteria based on the independent historical dataset. The expression for the RE is presented in Eq. (8),

$$RE=\frac{{\Delta }_{m}-{\Delta }_{p}}{{\Delta }_{m}}\times 100\%$$
(8)

where ∆m stands for the actual FoS value, while ∆p stands for the value predicted by the models. The mean absolute relative error (MARE), expressed as a percentage, is computed from RE as in Eq. (9)

$$MARE=\frac{1}{n}\sum_{i=1}^{n}\left|{RE}_{i}\right|$$
(9)

where n stands for the number of data points used in the validation or evaluation. The results obtained for RE and MARE are presented in Table 3. The MARE values of the proposed SSA-ANN and MVO-ANN are the smallest among all the models, as shown in Table 3. This observation agrees with the empirical CDF results in Fig. 4. However, these criteria are still shallow, and further justification via detailed statistical analysis is needed. This is presented in what follows.
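Equations (8) and (9) translate directly into code. The short Python sketch below is illustrative only: the function names are hypothetical and the measured and predicted FoS values are made up, not taken from Table 3.

```python
def relative_error(measured, predicted):
    """Percentage relative error of one prediction, Eq. (8)."""
    return (measured - predicted) / measured * 100.0


def mare(measured, predicted):
    """Mean absolute relative error in percent, Eq. (9)."""
    errors = [abs(relative_error(m, p)) for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)


# Hypothetical FoS values, for illustration only.
measured_fos = [1.0, 2.0, 1.25]
predicted_fos = [0.9, 2.2, 1.25]
score = mare(measured_fos, predicted_fos)
```

Because RE keeps its sign, a positive value means the model underpredicts the FoS (a conservative estimate) and a negative value means it overpredicts, while MARE discards the sign and reports only the average magnitude.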

Table 3 Relative error analysis in the predicted FoS

3.2 Model selection with statistical analysis

The values of FoS are predicted for all the models under evaluation with the historical data outside those used in developing the models. Thereafter, the root-mean-square error (RMSE) (Eq. (10)), the mean absolute error (MAE) (Eq. (11)), and the R² values between the actual and predicted data points are computed as a preliminary statistical analysis.

$$RMSE=\sqrt{\frac{\sum_{i=1}^{n}{({Y}_{p}-{Y}_{a})}^{2}}{n}}$$
(10)
$$MAE=\frac{\sum_{i=1}^{n}\left|{Y}_{p}-{Y}_{a}\right|}{n}$$
(11)

where $Y_{p}$ and $Y_{a}$ are the predicted and actual values, respectively. The RMSE, MAE, and R² values obtained for the eight models in this evaluation are shown in Table 4. From Table 4, the FSANN model has the highest RMSE of 0.393321 and the highest MAE of 0.344316. The FSM.P model has the lowest RMSE of 0.339828. The FSFel model has the highest R² value of 0.8877, while SSA-ANN has the lowest MAE of 0.259729.

Table 4 Performance evaluation with basic statistical indices

According to Chai and Draxler (2014), the accuracy of a model is better assessed using the RMSE than the MAE when the error distribution is normal, but if two models have the same RMSE and different MAE, then the MAE can be used. The normality test conducted on the errors between the actual and predicted values revealed that the errors are normally distributed for all the models, as their p-values are greater than 0.05, except for the SSA-ANN model with a p-value < 0.05 (Table 5). A careful examination of Table 4 shows inconsistencies between the RMSE and MAE rankings of the models. For instance, the FSM.P model has the lowest RMSE of 0.339828, but its MAE of 0.279579 is greater than those of MVO-ANN and SSA-ANN, both of which have a higher RMSE than the FSM.P model. The R² value of FSFel is 0.8877, which is the highest, but its RMSE and MAE are higher than those of models with lower R² values. Hence, the information provided by RMSE and MAE is insufficient and may not be reliable for selecting the most suitable model. The R² has also been described as a weak indicator that cannot be used for the reliable assessment of models (Willmott and Matsuura 2005; Lawal and Kwon 2023).
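The kind of ranking disagreement between RMSE and MAE described above is easy to reproduce. The Python sketch below implements Eqs. (10) and (11) and applies them to two hypothetical sets of predictions (the numbers are invented for illustration and are not from Table 4): one with uniform small errors and one with a single large error.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error, Eq. (10)."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for a, p in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, Eq. (11)."""
    n = len(actual)
    return sum(abs(p - a) for a, p in zip(actual, predicted)) / n


# Hypothetical data: model A is off by 0.3 everywhere,
# model B is exact except for one error of 0.8.
actual = [1.0, 1.0, 1.0, 1.0]
model_a = [1.3, 1.3, 1.3, 1.3]
model_b = [1.0, 1.0, 1.0, 1.8]

rmse_a, mae_a = rmse(actual, model_a), mae(actual, model_a)
rmse_b, mae_b = rmse(actual, model_b), mae(actual, model_b)
```

Here model A has the lower RMSE (about 0.3 against 0.4) while model B has the lower MAE (about 0.2 against 0.3), because squaring penalizes the single large error more heavily. The two indices can therefore rank the same pair of models in opposite order.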

Table 5 Outcome of statistical analyses

Since the traditional statistical indicators gave inconsistent and insufficient results, they cannot be relied on to select the best model among the eight stability analysis models. For the detailed statistical evaluation of the models, the procedure in the flowchart shown in Fig. 6 is adopted.

Fig. 6 Flowchart showing the procedures of the detailed statistical analysis (Mohammed et al. 2019)

Based on Fig. 6, the actual and predicted values were subjected to a normality test, and the p-values were then computed with a test chosen according to the distribution obtained. According to Table 5, the actual values have a p-value greater than 0.05, which implies normality. The p-values of FSBis, FSFel, FSM.P, and FSJan were also greater than 0.05, indicating their normality. However, the p-values of FSANN, ANN, MVO-ANN, and SSA-ANN are less than 0.05, which implies that they are non-normal. Hence, for the normally distributed predictions, a one-way ANOVA was first conducted to determine the F-test value and assess the homogeneity of variance of the data points. Thereafter, the p-values between the actual and predicted values were computed using the two-sample t test. For the non-normal predictions, the nonparametric Mann–Whitney test was used, since it applies when either the measured or the predicted data are non-normal. Hence, the p-values between the measured data and the predictions of the FSANN, ANN, MVO-ANN, and SSA-ANN models were computed with the Mann–Whitney test, and the results are presented in Table 5.
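For readers unfamiliar with the Mann–Whitney test used for the non-normal predictions, the following Python sketch computes a two-sided p-value from the rank sums. It is a simplified illustration, not the procedure of the statistical software used in the study: it relies on the large-sample normal approximation without tie or continuity corrections, and the function names and sample values are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while (end + 1 < len(order)
               and values[order[end + 1]] == values[order[pos]]):
            end += 1
        shared = (pos + end) / 2.0 + 1.0
        for k in range(pos, end + 1):
            ranks[order[k]] = shared
        pos = end + 1
    return ranks


def mann_whitney_p(a, b):
    """Two-sided p-value via the normal approximation (no corrections)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u1 - mean_u) / sd_u
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical measured and predicted FoS values, for illustration only.
p_same = mann_whitney_p([1.0, 1.2, 1.5], [1.0, 1.2, 1.5])
p_shifted = mann_whitney_p([1.0, 1.2, 1.5], [1.8, 2.0, 2.2])
```

A p-value near 1 means the ranks of the measured and predicted values are thoroughly interleaved, which is the sense in which a higher p-value is read as closer agreement in this study. For samples as small as the 19 cases used here, an exact test from a statistics package would normally be preferred over this approximation.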

The p-values of the proposed hybrid MVO-ANN and SSA-ANN models are 0.9427, which is the highest among all the models. Hence, the proposed hybrid models are reliable and suitable for FoS prediction. This also agrees with the MARE results (Table 3). The other models also have relatively high p-values and can be used as well, but the higher the p-value, the better the model reliability. The FSBis, FSM.P, and FSANN models have the lowest p-values. Therefore, the most reliable models for FoS prediction are the proposed hybrid ANN models, SSA-ANN and MVO-ANN. The performance of the hybrid models may be attributed to their stochastic nature, in contrast to the entirely deterministic classical models. Hence, the practical implementation of these two models is important, and this is made possible by transforming their weights and biases into a mathematical form and MATLAB code (Appendix) that can easily be used for FoS computation. This is also one of the unique features of the paper, as none of the existing soft computing (SC) models for FoS prediction has been simplified for practical implementation in this way.

3.3 Transformation of the hybrid models into mathematical form

The models selected by the statistical analyses, MVO-ANN and SSA-ANN, are transformed into implementable mathematical forms. To achieve this, the weights and biases of MVO-ANN and SSA-ANN are presented in Table 6. From Table 6, Eqs. (12) to (15) are obtained.

$${FoS}_{MVO-ANN}=0.7125\mathrm{tanh}\left(\sum_{i=1}^{9}({X}_{i}-0.9403)\right)+1.3375$$
(12)

where the terms $X_{1}$ to $X_{9}$ in Eq. (12) can be obtained from Table 6, as demonstrated for $X_{1}$ and $X_{9}$ in Eq. (13). The detailed expressions for $X_{2}$ to $X_{8}$ are presented in the Appendix.

$$\begin{array}{c}{X}_{1}=-1.8636\mathrm{tanh}\left(\begin{array}{c}-1.13304{\gamma }^{n}+ 1.7032{c}^{n}+ 0.4959{\varphi }^{n}+ 1.0912{\beta }^{n}+\dots \\ 0.4626{H}^{n}-0.6961{ru}^{n}-1.1326\end{array}\right)\\ \vdots \\ {X}_{9}=-0.24799\mathrm{tanh}\left(\begin{array}{c}0.80533{\gamma }^{n}+ 0.02456{c}^{n}+ 0.70387{\varphi }^{n}- 0.24874{\beta }^{n}-\dots \\ 0.7204{H}^{n}+0.5224{ru}^{n}+0.4426\end{array}\right)\end{array}$$
(13)
$${FoS}_{SSA-ANN}=0.7125\mathrm{tanh}\left(\sum_{i=1}^{9}({Y}_{i}-0.19421)\right)+1.3375$$
(14)
$$\begin{array}{c}{Y}_{1}=1.96097\mathrm{tanh}\left(\begin{array}{c}0.4634{\gamma }^{n}-0.2245{c}^{n}-0.62067{\varphi }^{n}-2.11399{\beta }^{n}+\dots \\ 0.1952{H}^{n}-0.6747{ru}^{n}-0.110006\end{array}\right)\\ \vdots \\ {Y}_{9}=1.47531\mathrm{tanh}\left(\begin{array}{c}-0.72398{\gamma }^{n}-1.78201{c}^{n}+1.3258{\varphi }^{n}+ 0.91056{\beta }^{n}-\dots \\ 0.50955{H}^{n}+0.4153{ru}^{n}-2.3654\end{array}\right)\end{array}$$
(15)

where the terms $Y_{1}$ to $Y_{9}$ in Eq. (14) can be obtained from Table 6, as demonstrated for $Y_{1}$ and $Y_{9}$ in Eq. (15). The detailed expressions for $Y_{2}$ to $Y_{8}$ are also presented in the Appendix. Equations (12) and (14) can serve as an alternative to the LE-based equations for slope stability analysis. The marginal plots of Eqs. (12) and (14) are presented in Fig. 7. It can be seen that the histogram of the SSA-ANN is closer to that of the measured points than the histogram of the MVO-ANN. This could be the reason why the statistical analysis and MARE favored SSA-ANN over MVO-ANN.
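The structure of Eqs. (12) to (15) can be sketched in Python as follows. This is an illustration under stated assumptions rather than a port of the Appendix code: the output bias is applied once outside the sum, as is usual for a single output neuron, whereas Eq. (12) as typeset places it inside the summation, so this reading should be checked against Table 6 and the Appendix. Only the coefficients of $X_{1}$ printed in Eq. (13) are used, and the function names are hypothetical.

```python
import math


def hidden_term(out_w, in_w, bias, x_norm):
    """One hidden-neuron contribution, e.g. X_1 in Eq. (13)."""
    z = sum(w * x for w, x in zip(in_w, x_norm)) + bias
    return out_w * math.tanh(z)


def fos_from_terms(terms, out_bias, half_range=0.7125, mid=1.3375):
    """Output layer plus rescaling to FoS units (assumed bias placement)."""
    return half_range * math.tanh(sum(terms) + out_bias) + mid


# Coefficients of X_1 from Eq. (13); normalized inputs are ordered
# unit weight, cohesion, friction angle, slope angle, height, ru.
X1_IN = [-1.13304, 1.7032, 0.4959, 1.0912, 0.4626, -0.6961]
X1_BIAS = -1.1326
X1_OUT = -1.8636

# All-zero normalized inputs correspond to mid-range parameter values.
x1 = hidden_term(X1_OUT, X1_IN, X1_BIAS, [0.0] * 6)
```

A full prediction would pass all nine hidden terms to `fos_from_terms`. Since tanh is bounded by ±1, the result always lies between 1.3375 − 0.7125 = 0.625 and 1.3375 + 0.7125 = 2.05, which indicates the FoS range over which the models can be expected to apply.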

Table 6 Obtained weights and biases from the MVO-ANN and SSA-ANN

Fig. 7 Marginal plots of the predictions of MVO-ANN (Eq. 12) and SSA-ANN (Eq. 14) models

3.4 Limitation and findings

The limitation of this study is that only circular failure in soil slopes is considered, unlike Zhu et al. (2022) and Azarafza et al. (2022), who worked on rock slopes. Hence, the work can be extended to rock slopes. Like other models, the proposed models will perform most reliably when the input data fall within the range of those used in developing them. Within that range, the proposed methods can conveniently be used for rapid assessment of the FoS of soil slopes with a high degree of accuracy, and neither simplifying assumptions nor the location of the critical failure surface is required. The study has also made it possible to select the most accurate model among the existing LE methods, and it provides an ANN-based mathematical model and MATLAB code for FoS prediction. These advantages are rarely found together in the existing related studies.

3.5 Sensitivity analysis

A sensitivity test was performed using the cosine amplitude method (CAM) to identify the parameters that most affect the factor of safety. This method is commonly used by researchers to assess the influence of a model’s independent parameters on the dependent parameter (Lawal and Kwon 2022). The impact of the input parameters on the factor of safety is expressed in terms of $r_{ij}$, as illustrated in Fig. 8. An r value close to 1 indicates a parameter with a high impact on the output, whereas an r value of 0 indicates no effect. The results in Fig. 8 indicate that γ (kN/m3) has the highest influence on the factor of safety with an r value of 0.957, followed by φ (°) with an r value of 0.909, while c (kPa) has the least impact. However, all the parameters influence the predicted factor of safety, because none has an $r_{ij}$ below 0.5, which implies at least an average or moderate impact.

Fig. 8 Sensitivity analysis using CAM
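The paper does not print the CAM expression, so the Python sketch below assumes the standard form of the cosine amplitude method, in which the strength of relation between an input column and the output column is their dot product divided by the product of their Euclidean norms. The function name and the sample vectors are hypothetical.

```python
import math


def cam_strength(x, y):
    """Cosine amplitude r_ij between an input column x and output y."""
    num = sum(a * b for a, b in zip(x, y))
    den = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return num / den


# Hypothetical columns, for illustration only.
proportional = cam_strength([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
unrelated = cam_strength([1.0, 0.0], [0.0, 1.0])
```

Note that for strictly positive data, such as unit weight and FoS, this measure tends to be high even for weakly related variables, so the r values are best read as a relative ranking of the inputs rather than as correlation coefficients.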

4 Conclusion

This study presents a rigorous statistical technique for selecting suitable slope stability models among the ordinary ANN, stochastically modified ANNs, and the classical deterministic slope stability analysis methods. A historical database was compiled from the scattered literature to achieve this aim. The datasets comprised the geotechnical properties of the soil and the geometric parameters of the slopes. The ANN models were simulated, and the optimum ANN model was selected and then subjected to the MVO and SSA stochastic algorithms to enhance its performance. The performances of the ordinary and hybrid ANN models were then compared using the empirical CDF. Thereafter, 19 independent datasets outside those used in developing the models were used to validate the proposed models, the classical slope stability analysis models, and an existing ANN model. The validation was done using both the empirical CDF and MARE, and the results in all the validation cases favored MVO-ANN and SSA-ANN. The models were further subjected to rigorous statistical analysis, because the basic statistical indices such as RMSE, MAE, and R² gave inconsistent results that cannot be fully relied on for model selection. We subjected the models to a normality test, ANOVA, a variance homogeneity test, two-sample t tests, and a nonparametric test. The outcome of these tests gave a strong basis for selecting SSA-ANN and MVO-ANN as the most reliable models for slope stability analysis. The selected models were then transformed into a mathematical form and written in MATLAB code, as presented in the Appendix, for easy assessment of slopes. It can be concluded that the proposed hybrid ANN models are suitable and reliable for evaluating slope stability, and that they are also practically implementable. The SSA-ANN performed best, followed by the MVO-ANN.