1 Introduction

Slope failures are among the natural hazards most amenable to prevention, mitigation, or remediation (Collins and Znidarcic 2004). In India, landslides occur across a large portion of the land surface, excluding snow-covered areas (Chawla et al. 2018). This amounts to a total area of 0.42 million km², of which 43% lies in the North-Eastern Himalayan Region (NEHR) (GSI 2014). According to the National Crime Records Bureau's (NCRB) statistics on accidental fatalities (2010–2019), landslides kill around 304 people in India each year. Recent changes in global climatic conditions have resulted in extreme weather events that increase the likelihood of landslides (Zou et al. 2021), and their frequency is being aggravated by uncontrolled urbanization and unplanned land-use changes in steep terrain (Khanna et al. 2021; Phong et al. 2021; Pourghasemi et al. 2012). The Kalimpong district is located in the NEHR and is susceptible to small- and large-scale landslides, particularly during the monsoon season, which lasts from July to September. The Kalimpong area has steeply sloping mountainous topography that is regularly subjected to heavy rains, making it very prone to landslides. The main town is located on a ridge near the Teesta River, but other rivers such as the Relli, Neora, Geesh, Leesh, Jaldhaka, and Murti, as well as several small streams, also drain Kalimpong. These water bodies actively denude the valley-side slopes through erosion, which makes them steeper. The interfluvial area (the area between two streams) has been narrowed, making the environment more prone to landslides. The average monthly rainfall in Kalimpong during the rainy season, from June to September, ranges from 119 to 417 cm (https://worldclim.org). Anthropogenic infrastructure such as roads, settlements, and hydropower projects loosens the slope material at the expense of vegetative cover. This allows the loose debris to slide downslope with only minimal lubrication from water. All of these elements combine to make Kalimpong an important site for landslide research. It is also critical to demarcate the hazard-prone zones so that suitable steps can be taken to limit the risks to people and property (Roy et al. 2022).

In general, slope stability is governed by the shear strength on the slip surface, which is a function of normal stress, cohesion, and internal friction angle. The factor of safety (FoS) quantifies the slope's stability as the ratio of the available “shear strength” to the mobilized “shear stress”. A slope generally fails when the mobilized shear stress exceeds the available shear strength of the soil (Kabir et al. 2023). Limit equilibrium methods (LEMs), a basic and conventional analytical tool for slope stability investigations, may be used to compute FoS and are widely utilized because of their simplicity, low model complexity, and short processing times (Mafi et al. 2021). LEM can be applied to both static and dynamic scenarios in 2D and 3D settings (Azarafza et al. 2014; Agam et al. 2016). The FoS is estimated using several equilibrium approaches, the best known of which include the Fellenius, Bishop, Janbu, Modified Swedish, and Morgenstern–Price methods (Alejano et al. 2011). Most of these approaches produce similar FoS values, with the variation among them often being less than 6% (Huang et al. 2012). In recent decades, LEM has been extensively researched for slope stability evaluation (Yue and Kang 2021; Liu et al. 2015; Wang et al. 2011; Cheng et al. 2007; Zhu et al. 2005; Zhu et al. 2003). LEM has remained the method of choice in practice, with the specific formulation selected according to the nature of the problem to be addressed (e.g., circular or non-circular slip surface) and the desired precision of the findings (Matthews et al. 2014). The method of slices is also used to identify the most critical slip surface while accounting for probabilistic soil parameters (Johari and Rahmati 2019). Traditional stability analysis methods, which are affected by the stabilization process, struggle to deliver reliable conclusions because of the uncertainties involved in assessing FoS values. To address this issue, researchers have applied computational intelligence methodologies that give precise forecasts of the slope condition, failure mechanism, and landslide hazard (Azarafza et al. 2022; Zhu et al. 2003; Ahangari Nanehkaran et al. 2022; Li and Yang 2019; Mathe and Ferentinou 2021). In particular, machine learning approaches have attracted considerable attention for reducing uncertainty in FoS computations.
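As a minimal illustration of the FoS ratio described above, the following Python sketch evaluates FoS at a single point on a slip surface using the Mohr–Coulomb strength (cohesion plus normal stress times the tangent of the friction angle). The function name and all numerical inputs are hypothetical and chosen only for illustration; this is not the Morgenstern–Price procedure applied later in SLOPE/W.

```python
import math

def factor_of_safety(cohesion_kpa, normal_stress_kpa, friction_angle_deg, shear_stress_kpa):
    """FoS = available shear strength / mobilized shear stress (Mohr-Coulomb)."""
    if shear_stress_kpa <= 0:
        raise ValueError("shear stress must be positive")
    strength = cohesion_kpa + normal_stress_kpa * math.tan(math.radians(friction_angle_deg))
    return strength / shear_stress_kpa

# Hypothetical values, for illustration only (not measured data)
fos = factor_of_safety(cohesion_kpa=10.0, normal_stress_kpa=50.0,
                       friction_angle_deg=30.0, shear_stress_kpa=30.0)
print(f"FoS = {fos:.2f} ->", "stable" if fos >= 1.0 else "failure")
```

A value of FoS below 1 means the mobilized shear stress exceeds the available strength at that point.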

Artificial intelligence (AI), and particularly machine learning (ML), has greatly assisted the assessment of slope stability by predicting FoS with prognostic models. Such models estimate FoS with an accuracy that depends on the learning process and on the specified precision of the model. These algorithms learn from “training data” in order to characterize the state of the “target data”. They are categorized as “shallow” or “deep” learning approaches and produce likelihoods or predictions (Raschka et al. 2020). The precision of the predictions is directly related to the algorithms’ learning mechanism, which may be supervised, unsupervised, or reinforcement learning (Schmidhuber 2015). The last 25 years have seen the effective use of AI and ML techniques in the fields of engineering and sciences (Asteris et al. 2021a, b, c; 2022; Johari et al. 2016; Harandizadeh et al. 2021; Zhao et al. 2021; Zhang et al. 2021; Armaghani et al. 2021; Zhou et al. 2021a,b; 2016; Yang et al. 2020; Kardani et al. 2021). ML models are also used in slope stability analysis to offer insights into prospective slope failure processes and rates through predictive modeling, risk assessment, and uncertainty analysis (Bui et al. 2020; Erzin and Cetin 2012; Abdalla et al. 2015; Verma et al. 2016; Samui 2013; Sakellariou and Ferentinou 2005; Ferentinou and Sakellariou 2007). MATLAB-based programs (Johari and Fooladi 2020; Kalantari et al. 2023), ANFIS, and other algorithms have also been used to forecast the FoS of slopes, and those predictions have been compared with the results of the LEM approach (Mohamed and Kasa 2014). In another study, the particle swarm optimization (PSO) technique was used to compare the FoS of slopes with 3D-FEM (Kalatehjari et al. 2014); the authors demonstrated effective use of PSO under 3D conditions but less effective use under 2D slope stability conditions. To forecast slope stability in comparison with LEM-based analysis, many researchers (Ferentinou and Sakellariou 2007; Lu and Rosenbaum 2003; Sakellariou and Ferentinou 2005) employed artificial neural networks (ANN), a fundamental and common AI model. The LEM and ANN findings were found to be in agreement, and the sample observations could be categorized by the anticipated failure mechanism. In another study comparing a support vector machine (SVM) model with ANN outcomes, SVM achieved slightly higher accuracy (Samui 2008). Gradient boosting was used to determine the FoS and its connection to the triggering factors of slope instability (Zhou et al. 2019). A similar comparison was made with support vector regression (SVR) and the radial basis function (Wei et al. 2021a). Different artificial intelligence-based methods were employed to forecast the FoS values of slopes with the necessary precision, which were then applied to slope stabilization (Qi and Tang 2018). The “extreme learning machine” (Liu et al. 2014), “attribute recognition method” (Tao et al. 2021), ANN (Wei et al. 2021b), “fuzzy comprehensive evaluation method” (Wang and Lin 2021), “aggregative indicator method” (Yan et al. 2019), “particle swarm optimization” (Gupta et al. 2016) and “cloud model” (Cui et al. 2021) have all produced a number of promising results. Accounting for the stress within the slope body and characterizing its deformation, stability, and corresponding failure mechanism play a crucial role in predicting slope stability using numerical simulation techniques and the limit equilibrium approach.

Therefore, in this study, critical sites in Kalimpong are identified and used to assess future risk under dry and saturated conditions. To this end, the factor of safety is calculated by LEM and each slope is labeled as stable or unstable. Machine learning models are then trained and tested on real-site data samples. Machine learning algorithms draw on computer science, database theory, data analysis, probability theory, and other fields, and they offer major advantages including quick processing and good generalization. Standard machine learning techniques may be able to solve some problems that are challenging for conventional experimentation and simulation techniques. The seven traditional ML algorithms used in this research offer the benefits of a straightforward structure, good prediction, and high classification accuracy. Python and fundamental machine learning techniques are used in this study to evaluate the accuracy of the prediction models. Additionally, random cross-validation is used to further assess the models' accuracy. The outcome is a versatile, accurate, and reliable slope stability forecasting model. In addition, approaches are adopted to enhance model accuracy on such scattered datasets, namely data scaling (normalizing or standardizing the real-valued input and output variables) and ensembling, and a new stacked model, R-Boost, is developed in this research to maximize the accuracy of the output prediction.

2 Study Area

Kalimpong is a small hill town in the Indian state of West Bengal, close to the border of Nepal. It lies about 1250 m above sea level and is recognized for its moderate temperatures and natural beauty. Kalimpong is surrounded by lush green hills and is known for its tea plantations, flowers, and breathtaking Himalayan vistas. It is bounded on the western side by the Teesta River and on the eastern side by the Relli River. The average temperature in this region ranges between 5 and 27 °C. The strong monsoons in this region generate devastating floods that annually cut off Kalimpong from the rest of the state. Because of its location and natural characteristics, Kalimpong, like many other high regions, is prone to landslides. During the summer, the region receives heavy rainfall, which, combined with the steep slopes and loose soil, often results in landslides. Despite many efforts, landslides continue to be a severe threat to Kalimpong and the surrounding communities. Local leaders and individuals must be vigilant and take the appropriate precautions to prevent and mitigate their effects in the region (Das et al. 2022). Figure 1 shows the geographic location of Kalimpong in India and a Google image of Kalimpong with the critical sites marked (L1-6) at Mahakal Dara Bhalukhop, Chandraloke, Upper Tashiding, Ngassey Busty, Mongbol Road, and Deolo, respectively. Figure 2 shows site images of these locations after the recurring landslides, with their latitudes and longitudes in the caption; soil samples were subsequently collected from these locations.

Fig. 1 a Geographic location of Kalimpong in India. b Google image of Kalimpong with locations 1–6 (Source: USGS)

Fig. 2 Post-landslide images of critical locations

2.1 Geology

Kalimpong is situated in the Eastern Himalayas, a region known for its complex geological features. The region is characterized by the collision of the “Indian” and “Eurasian” tectonic plates, which has resulted in the formation of the Himalayan mountain range. The eastern flanks of Kalimpong are rather flat and safe, whereas the western face is primarily steep and rocky. The town of Kalimpong is primarily underlain by soft phyllite, Archean gneiss, and schists. The area has several cracks and joints that increase the chance of rock decomposition and dissolution, leading to the formation of unconsolidated material (Dikshit and Satyam 2018). The mountain soils found in the area are marked by high organic matter content and water-holding capacity, which can cause swelling. The bedrock in the region is a golden to silver-colored quartz mica schist of the Daling series, with small variations. The constant percolation of water at the bottom layer of the soil horizon, together with the coarse texture of the middle part of the soil, results in a reduction in soil shear strength (Abraham et al. 2020). These geological features shape the landscape and natural resources of Kalimpong. Because of the geological activity in the region, the Eastern Himalayas are also prone to earthquakes, which in the past have caused significant damage to infrastructure and loss of human lives. The elevation map of Kalimpong in Fig. 3 provides information on slope topography, which is a crucial factor for understanding the forces acting on a slope.

Fig. 3 Elevation map of Kalimpong (Source: USGS)

2.2 Geohydrology

Kalimpong, like other steep places, has a complex hydrological system governed by the region's topography, geology, and climate. The capacity of the soil to absorb water increases the soil mass and, ultimately, the soil unit weight. Landslides become more likely when pore pressure rises because of increasing water absorption by the soil. Water flow weathers the rocks along the banks of streams and rivers, causing rock and other materials to break down over time, resulting in slides. The rainy season in Kalimpong produces high-intensity rainfall, making it the region's peak landslide season. A number of smaller channels known as kholas (second- and third-order streams) and jhoras (mainly first- and second-order streams) drain the area. The jhoras get their water from a large number of perennial springs at the hilltop (Source: Save the Hills). The region is crossed by five subbasins, all of which feed the Teesta River (Mukherjee and Mitra 2001). First-order streams in the Teesta region combine to create second- and higher-order streams (Fig. 4).

Fig. 4 Geohydrological details showing streams of 1, 2, and 3 orders (Source: USGS)

3 Methodology

Studying the geotechnical properties of the soil at the critical sites in Kalimpong is important for determining the strength of the soil. For the landslide analysis, soil samples were collected from the bottom, center, and top parts of each slope after the event. These were obtained with a core cutter at an approximate depth of 0.5 m. All of these undisturbed samples were carefully transferred to a laboratory and tested for various properties, viz. grain size distribution (Fig. 5), Atterberg limits as per IS: 2720 (Part-5), water content, maximum dry density as per IS: 2720 (Part-7), and cohesion and internal friction angle as per ASTM standards. In situ bulk density was also measured by the core cutter method. The data obtained are used for LEM modeling with SLOPE/W in GeoStudio. Because the samples contain sand, both undrained and drained behavior were studied using the direct shear test and the triaxial shear test (consolidated drained); however, since the worst conditions are considered in order to identify the future risk threshold, only drained parameters are reported here, represented by the Mohr–Coulomb failure envelope (Fig. 6).

Fig. 5 Grain-size distribution curve

Fig. 6 Mohr–Coulomb failure envelope

3.1 SLOPE/W Results

The present study computes the FoS for a variety of critical cut slopes with varying soil properties in Kalimpong using the Morgenstern–Price (M–P) method (Morgenstern and Price 1965) in SLOPE/W (GeoStudio 2021.4), with the results confirmed by field survey. This section explains the full approach, including mathematical modeling and field validation. Figure 7 depicts the SLOPE/W results for the samples at L1-6 under dry (bulk) and saturated conditions, and the soil properties together with the computed factors of safety are given in Table 1.

Fig. 7 SLOPE/W results for L1-6 under dry and saturated conditions

Table 1 Geotechnical analysis and factor of safety of collected soil samples

3.2 Data Collection and Processing of Slope Field Cases

In this research, 97 field cases of slope stability were analyzed, comprising 12 cases from the critical sites in the Kalimpong area, whose findings are reported above, and 85 cases from the literature on “slope stability” assessment (Sah et al. 1994; Zhou and Chen 2009; Li and Wang 2010). Each sample represents a field study in slope engineering and comprises five input parameters (i.e., five independent variables). The stability of the slope is described by one label (the dependent variable), either "stable" or "failure." Table 2 shows the range of each parameter. To make it easier to apply ML models, “failure” and “stable” are encoded as 0 and 1, respectively, for prediction and afterwards converted back to the original labels.

Table 2 Ranges of different inputs

3.3 Sanity of the Data

Each record consists of five independent attributes and one dependent outcome. Because the data are merged, each sample is distinct and carries a definite label. Of the 97 records, 41 are categorized as "stable" and the remaining 56 as "failure", a ratio of about 1:1.37, indicating that the two classes are roughly balanced. To examine the data more directly, violin plots (with strip plots revealing the underlying data points) are used. Figure 8a–e shows the violin plots for UW, C, Phi, SA, and SH for both the “Stable” and “Failure” categories. The white circle at the center of each plot shows the median. The box spans the first to third quartiles. The thin black line in each violin indicates the 95% confidence interval. The outline of each violin approximates the kernel density of the given feature. The plots indicate that the data are well behaved and approximately normally distributed.

Fig. 8 Normal kernel density violin plots for different input parameters

3.4 Attributes of the Data Distribution

This section examines the statistics of each feature to check whether its distribution is skewed. Because the five inputs have distinct SI units and meanings, they are evaluated independently. The unit weight's minimum, maximum, mean, mode, median, and standard deviation are 13.97, 31.30, 20.827, 18.5, 19.97, and 3.79 kN/m³, respectively, indicating that it approximately follows a normal distribution. Table 3 lists all statistical values such as mean, median, mode, min, max, standard deviation, and dispersion. Figure 9 depicts the fitted parametric distributions, together with the mean (mu) and standard deviation (sigma) for the normal distribution and the rate parameter (lambda) for the exponential distribution. The slope height in Fig. 9d, however, fits an exponential distribution better than a normal distribution, with rate parameter lambda = 0.01379. For an exponential distribution, the mean and standard deviation both equal the inverse of the rate parameter, which comes out to be 72.49 for slope height. The remaining parameters, UW, C, Phi, and SA, follow a normal distribution.
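The relation between the rate parameter and the mean of an exponential distribution can be sketched in a few lines of Python. The helper below uses the maximum-likelihood estimate (rate equals one over the sample mean); the function name and the sample heights are hypothetical stand-ins, not the study's dataset, and the fitting procedure actually used for Fig. 9 is not stated in the text.

```python
import statistics

def fit_exponential_rate(samples):
    """Maximum-likelihood rate of an exponential distribution: 1 / sample mean."""
    mean = statistics.fmean(samples)
    if mean <= 0:
        raise ValueError("samples must have a positive mean")
    return 1.0 / mean

# Hypothetical slope heights in metres (illustrative only)
heights = [12.0, 30.5, 45.0, 88.0, 150.0, 109.5]
rate = fit_exponential_rate(heights)
print(f"lambda = {rate:.5f}, mean = std = {1.0 / rate:.2f} m")

# Inverse of the rate reported in the text
print(f"1 / 0.01379 = {1.0 / 0.01379:.2f}")
```

Because the reported rate is rounded to four significant figures, its inverse can differ from the reported mean in the second decimal place.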

Table 3 Statistical characteristics of data

Fig. 9 Distribution histogram of different indexes

3.5 Assessment of Correlations Among Parameters

It is crucial to first investigate the relationships among the five attributes (i.e., factors) before selecting prediction models. A strong relationship between features may affect the accuracy of the prediction models and lead to incorrect inferences that contradict reality. Pearson's correlation coefficient between any two variables is given by Eq. (1) (Cohen et al. 2009).

$$r=\frac{\sum ({x}_{i}-\overline{x })({y}_{i}-\overline{y })}{\sqrt{\sum {({x}_{i}-\overline{x })}^{2}\sum {({y}_{i}-\overline{y })}^{2}}}$$
(1)

where r is the correlation coefficient of x and y (range −1 to 1), xi is the x variable value, yi is the y variable value, \(\overline{x }\) = the mean of x values, and \(\overline{y }\) = the mean of y values. Table 4 contains the matrix of correlation values for all five attributes. If the absolute value of the correlation between two attributes approaches 1, they are regarded as strongly correlated; otherwise, the relationship between them is weak. According to Table 4, the correlation between cohesion and internal friction angle is −0.22, which shows that the two are negatively correlated. The slope angle and friction angle have the highest positive relationship, with an r value of 0.522 (among the five attributes considered in Table 4). However, two attributes with a correlation coefficient of about 0.5 are not strongly related. As a result, the correlations among the five attributes are negligible. To illustrate the relationships among the five attributes chosen for this paper, and the variables' ranges, more clearly, the correlation matrix of the factors influencing the stability of the slopes concerned is displayed in Fig. 10, produced with plotting software.
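Eq. (1) can be evaluated directly, as in the following pure-Python sketch. The function name and the sample values are hypothetical and serve only to show the calculation; they are not taken from the study's dataset.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient following Eq. (1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Numerator: sum of products of deviations from the means
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    # Denominator terms: sums of squared deviations
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical cohesion (kPa) and friction angle (deg) values
cohesion = [5.0, 12.0, 20.0, 33.0, 40.0]
friction = [35.0, 30.0, 32.0, 25.0, 28.0]
print(f"r = {pearson_r(cohesion, friction):.3f}")
```

The function assumes neither input is constant; a constant column would make the denominator zero.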

Table 4 Evaluation metrics for different ML methods

Fig. 10 Correlation matrix

4 Prediction from Models

4.1 Conventional ML Models

In this work, seven models are used: six supervised models, namely support vector machine (SVM), decision tree (DT), k-nearest neighbors (KNN), logistic regression (LR), random forest (RF), and AdaBoost, and one probabilistic model, naïve Bayes (NB). SVM is a supervised machine learning approach that may be applied to classification, regression, and outlier identification. In its basic form it is a linear classifier that seeks the most effective hyperplane for separating the data, chosen so that the margin between the two classes is maximized (Samui 2008). A decision tree is a supervised machine learning method for classification and regression analysis. It is a graphical representation of all the possible solutions to a decision based on certain conditions. In a decision tree, each node represents a decision, and each edge represents the outcome of that decision (Hwang et al. 2009). Since kNN is a non-parametric method, it makes no assumptions about how the data are distributed. To obtain the class or regression value of a particular data point, it examines only the k nearest neighbors (Cheng and Hoang 2016). Logistic regression is a linear model in which a logistic function represents the likelihood that the output variable falls into a certain class (Bhagat et al. 2022). In the random forest algorithm, multiple decision trees are trained on random subsets of the training data, and the final prediction is obtained by aggregating the predictions of the individual trees (Xie et al. 2022). AdaBoost (adaptive boosting) is a supervised machine learning technique used for classification and regression analysis. It is an ensemble learning technique that combines a number of weak learners to increase the model's performance and accuracy (Lin et al. 2021). Naïve Bayes is a probabilistic ML algorithm used for classification. It is based on Bayes' theorem and presupposes that the input features are conditionally independent of one another (Feng et al. 2018).
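To make the kNN description above concrete, the following is a minimal pure-Python sketch that classifies a point by majority vote among its k nearest training points, using Euclidean distance. The function name, the two-feature data, and the choice k = 3 are hypothetical; the text does not state which implementation or settings were used in the study.

```python
import math
from collections import Counter

def knn_predict(train_x, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    if not 1 <= k <= len(train_x):
        raise ValueError("k must be between 1 and the number of training points")
    # Sort training points by Euclidean distance to the query
    distances = sorted(
        (math.dist(features, query), label)
        for features, label in zip(train_x, train_y)
    )
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]

# Hypothetical (cohesion kPa, friction angle deg) pairs; 1 = stable, 0 = failure
train_x = [(30.0, 35.0), (28.0, 33.0), (32.0, 36.0), (5.0, 20.0), (8.0, 22.0), (6.0, 18.0)]
train_y = [1, 1, 1, 0, 0, 0]
print(knn_predict(train_x, train_y, query=(29.0, 34.0)))
print(knn_predict(train_x, train_y, query=(7.0, 21.0)))
```

In practice the features would usually be scaled first, since distance-based methods are sensitive to differing units.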

4.2 Consideration of Impacting Parameters on Slope Stability

Slope stability is influenced by both quantitative and qualitative parameters. The quantitative parameters include cohesion, slope height and angle, pore water pressure, unit weight, internal friction angle, and others. Qualitative parameters include failure patterns, physical characteristics and quality of soil and rock, subsurface water, and more. Here, the objective is to determine whether a slope is stable or failing, and this is based on numerical calculations. However, converting qualitative characteristics into quantitative values is difficult when field data are insufficient. Therefore, ML algorithms are used to develop prediction models based on five indicators, C, SA, SH, Phi, and UW, with the dependent variable classified as "stable" or "failure". Pore water pressure is not included in the prediction models because it is often unclear in field cases and its values are assigned according to diverse standards. This study uses the 97 slope cases and finds that the five chosen indicators adequately reflect slope stability. Thus, pore water pressure is excluded to ensure sufficient accuracy and reliability in the prediction models.

4.3 ML Models Analysis

Standard k-fold cross-validation with 2, 3, 5, 10, and 20 folds is applied to the original data. To build the slope stability forecasting model, 29 samples are selected at random as the test set and the remaining data form the training set. This random selection is repeated five times, and the model's final result is the average of the five prediction outcomes. The randomized cross-validation is performed in Python. Owing to space restrictions, only the scatter plots and linear fitting curves of unit weight on the horizontal axis (x) against the other parameters on the vertical axis (y) are displayed in this article. Figure 11 also shows the fitted line equation, its slope and intercept, Pearson’s coefficient (r), and the coefficient of determination (COD).
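The repeated random hold-out described above (29 test samples, five repetitions, averaged result) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the fixed seed, the synthetic feature values, and the placeholder majority-class model are all hypothetical and stand in for the study's actual models and data.

```python
import random

def repeated_holdout(samples, labels, fit_predict, test_size=29, repeats=5, seed=0):
    """Average accuracy over `repeats` random train/test splits."""
    rng = random.Random(seed)
    indices = list(range(len(samples)))
    scores = []
    for _ in range(repeats):
        rng.shuffle(indices)
        test_idx, train_idx = indices[:test_size], indices[test_size:]
        predictions = fit_predict(
            [samples[i] for i in train_idx], [labels[i] for i in train_idx],
            [samples[i] for i in test_idx],
        )
        correct = sum(p == labels[i] for p, i in zip(predictions, test_idx))
        scores.append(correct / test_size)
    return sum(scores) / repeats

def majority_baseline(train_x, train_y, test_x):
    """Placeholder model: always predicts the most frequent training label."""
    majority = max(set(train_y), key=train_y.count)
    return [majority] * len(test_x)

# Synthetic stand-in for the 97 cases (41 stable = 1, 56 failure = 0)
labels = [1] * 41 + [0] * 56
samples = [[float(i)] for i in range(97)]
print(f"mean accuracy = {repeated_holdout(samples, labels, majority_baseline):.3f}")
```

Any classifier exposing the same `fit_predict(train_x, train_y, test_x)` shape could be passed in place of the baseline.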

Fig. 11 Regression fitting line and scatter plots of different parameters

4.4 Evaluation of Models

Common assessment metrics for prediction models include classification accuracy (CA), precision (P), recall (R), the F1 score (F1), and the area under the curve (AUC). Comparing predictions with reality yields four categories: true positive (TP), true negative (TN), false positive (FP), and false negative (FN). CA, given by Eq. (2), measures how well the model correctly predicts both positive and negative instances (Begum et al. 2021).

$${\text{CA}} = \, \frac{{\left( {{\text{TP }} + {\text{TN}}} \right)}}{{\left( {{\text{TP }} + {\text{ FP }} + {\text{ FN }} + {\text{ TN}}} \right)}}$$
(2)

Precision is a metric used in machine learning to evaluate the accuracy of a model's positive predictions, as indicated by Eq. (3) (Begum et al. 2021).

$${\text{P}} = \frac{{{\text{TP}}}}{{\left( {{\text{TP}} + {\text{FP}}} \right)}}$$
(3)

Recall, given by Eq. (4) (Chen et al. 2022), is the fraction of actual positive instances that the model correctly identifies.

$${\text{R}} = \frac{{{\text{TP}}}}{{\left( {{\text{TP }} + {\text{ FN}}} \right)}}$$
(4)

The F1 score, given by Eq. (5), balances precision and recall and is particularly useful where the class distribution is uneven or where the costs of false positives and false negatives are similar. The higher the F1 score, the better the balance between precision and recall achieved by the model. Plotting recall on the horizontal axis and precision on the vertical axis gives the 'PR' curve [further details can be found in (Begum et al. 2021)]. In general, models whose PR curve lies further toward the outside perform better.

$${\text{F1}} = \frac{{{\text{TP}}}}{{\left[ {{\text{TP}} + 0.5\left( {{\text{FP}} + {\text{FN}}} \right)} \right]}}$$
(5)

Equations (6) and (7) calculate the true-positive rate (TPR) and false-positive rate (FPR).

$${\text{TPR}} = \frac{{{\text{TP}}}}{{\left( {{\text{TP}} + {\text{FN}}} \right)}}$$
(6)

$${\text{FPR}} = \frac{{{\text{FP}}}}{{\left( {{\text{FP}} + {\text{TN}}} \right)}}$$
(7)

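The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the trade-off between the true-positive rate (TPR) and the false-positive rate (FPR) of a binary classifier as the decision threshold is varied. The area under this curve (AUC) summarizes the trade-off in a single number; a higher AUC value indicates a better-performing model.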
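Eqs. (2)–(7) can be computed together from the four confusion-matrix counts, as in the short Python sketch below. The function name and the counts are hypothetical (chosen to sum to a 29-sample test set) and do not reproduce any result from Table 4.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return CA, P, R, F1, TPR and FPR from confusion-matrix counts (Eqs. 2-7)."""
    return {
        "CA": (tp + tn) / (tp + tn + fp + fn),
        "P": tp / (tp + fp),
        "R": tp / (tp + fn),
        "F1": tp / (tp + 0.5 * (fp + fn)),
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
    }

# Hypothetical confusion counts for a 29-sample test set
metrics = classification_metrics(tp=10, tn=11, fp=4, fn=4)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that TPR in Eq. (6) is the same quantity as recall in Eq. (4), and that the F1 form in Eq. (5) is algebraically equal to 2PR/(P + R).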

5 Examination of Results from Predictions

5.1 Model Assessment Based on the Unprocessed Data

Seven distinct machine learning approaches and one stacking approach combining random forest and AdaBoost, i.e., R-Boost, are used to conduct the random cross-validation. Table 4 shows the classification accuracy, precision, recall, F1, and AUC values derived from Eqs. (2)–(7). AUC is an important metric in machine learning because it provides a reliable and easy-to-interpret measure of a model's performance in binary classification problems, especially when the dataset is imbalanced. In terms of AUC, RF has an average value of 0.81, R-Boost has 0.798, and LR follows with 0.74. In terms of classification accuracy, R-Boost has the greatest forecasting skill, with an average CA of 0.725. AdaBoost and RF come second with an average accuracy of 0.723. AdaBoost also produces strong results since it converts a high-bias, low-variance model into a low-bias, low-variance model, which aids in the development of an ideal machine learning model that provides a highly accurate estimate. It is also simpler to use, requiring less tuning than algorithms such as SVM. Here, however, the R-Boost algorithm developed gives the maximum CA because it first applies random forest to the dataset to generate an initial set of decision trees, which are then boosted with AdaBoost to increase their efficiency and precision. This method can increase the model's precision and decrease its variance, making it more reliable and better able to handle complicated datasets with numerous features. Accuracy alone can be a misleading indicator of model behavior on skewed data; when both F1 and AUC values are also taken into account, prediction models can be judged more reliably. According to the findings in Table 4, the models with F1 values above 70% are R-Boost, AdaBoost, and RF. Furthermore, RF, R-Boost, and LR have AUC values of about 74% or higher. As a result, RF and R-Boost are regarded as the most accurate predictors on the basis of AUC and CA, which also adds to the novelty of this research.

5.2 Sensitivity Analysis

Here we focus on sensitivity analysis by weight determination, in which an importance factor is computed for each input parameter. A method based on criteria importance through inter-criteria correlation (CRITIC) is used for this purpose; it relies on the coefficient of variation of each parameter, Ij, which equals the standard deviation divided by the average. The objective weight (Wj) of any criterion j is determined using Eq. (8) (Krishnan et al. 2021).

$${\text{Wj}} = \frac{{{\text{Ij}}}}{{\left( {\sum\limits_{{j = 1}}^{n} {{\text{Ij}}} } \right)}}$$
(8)

Weightage plays an important role for the five input parameters. The weightage of each parameter is shown in Fig. 12. The weightages come out to be 2.7, 12, 7.5, 6.3, and 71.5% for UW, C, Phi, SA, and SH, respectively. According to these findings, slope height has the greatest influence on slope stability, followed by cohesion, while unit weight has the least.
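The weighting in Eq. (8) can be sketched in Python as below. The sketch implements only the coefficient-of-variation weighting described in the text, not a full CRITIC procedure with inter-criteria correlations. The function name, the use of the population standard deviation, and the sample columns are assumptions made for illustration; the data are not the study's dataset.

```python
import statistics

def cv_weights(columns):
    """Objective weights W_j = I_j / sum(I_j), with I_j = std / mean of column j."""
    # Assumption: population standard deviation; the text does not specify which form
    importance = {
        name: statistics.pstdev(values) / statistics.fmean(values)
        for name, values in columns.items()
    }
    total = sum(importance.values())
    return {name: value / total for name, value in importance.items()}

# Hypothetical data, not the study's dataset
data = {
    "UW": [18.0, 20.0, 22.0, 21.0],
    "C": [5.0, 15.0, 30.0, 10.0],
    "SH": [10.0, 50.0, 200.0, 30.0],
}
weights = cv_weights(data)
for name, weight in weights.items():
    print(f"{name}: {100 * weight:.1f}%")
```

Parameters with a wide spread relative to their mean, such as slope height in this toy example, receive the largest weights.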

Fig. 12 Weightage indicator of each parameter

5.3 ROC Curve

The receiver operating characteristic (ROC) curve is a graphical depiction of a binary classifier's performance. The area under the ROC curve (ROC-AUC) for the current scenario, shown in Fig. 13, is used to quantify the level of competence of all models. Based on these ROC curves, RF has the highest AUC of 0.81, followed by R-Boost with 0.798 and LR with 0.74, while Naïve Bayes has the lowest AUC of 0.654. The ROC curve findings indicate that strong machine learning algorithms such as RF and R-Boost can forecast FoS for various earth slopes and give appropriate outcomes. In addition, SVM can be recommended as an alternative method that produces acceptable results. It can therefore be stated that RF and R-Boost give FoS results that are dependable and in close agreement with LEMs. This type of ML technology can help in developing an optimal approach for determining the stability state of such slopes and for selecting relevant stabilization measures.

Fig. 13 ROC curve
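To make the construction of an ROC curve and its AUC concrete, the following minimal sketch sweeps the decision threshold over predicted scores and integrates the curve with the trapezoidal rule. The function names and the scores are hypothetical; they do not reproduce the curves in Fig. 13.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Return (FPR, TPR) points obtained by lowering the decision threshold step by step."""
    pos = sum(y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    prev = None
    for i in order:
        # Emit a point only when the score changes, so tied scores move together.
        if prev is not None and scores[i] != prev:
            points.append((fp / neg, tp / pos))
        if y_true[i] == 1:
            tp += 1
        else:
            fp += 1
        prev = scores[i]
    points.append((fp / neg, tp / pos))  # final point is (1, 1)
    return points


def auc_trapezoid(points: Sequence[Tuple[float, float]]) -> float:
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = failure) and classifier scores; not data from the study.
labels = [1, 1, 0, 1, 0, 0]
probs = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_points(labels, probs))
print(f"AUC = {auc:.3f}")
```

An AUC of 1 corresponds to a classifier that ranks every positive sample above every negative one, whereas 0.5 corresponds to random ranking, which gives a scale for reading the AUC values reported for the models.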

5.4 Comparison with GeoStudio Results

The modeling was done in stages, beginning with geometrical modeling and progressing through boundary conditions, behavioral specifications, materials, and mechanical modeling. In the geometrical modeling phase, each slope is created based on the geographic conditions, the angle and height of the slope surface, and other geometric index properties. The boundary criteria are implemented via the “external boundary”, a closed polyline encircling the soil region to be studied. It can also be drawn manually in SLOPE/W by defining the regions with appropriate coordinates. The present work used schematic coordinates. Table 5 contains details of the stability predictions from the best models and from the SLOPE/W program. The predictive models, particularly R-Boost followed by RF, deliver results similar or close to the stability condition given by SLOPE/W, as listed in Table 5, where S and F represent stable and failure, respectively.

Table 5 Testing results on stability condition criteria for Kalimpong
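The kind of comparison summarized in Table 5 can be sketched as follows. The threshold FoS = 1.0 separating stable (S) from failure (F) is the usual limit equilibrium convention and is assumed here; the FoS values and model predictions are hypothetical placeholders, not the entries of Table 5.

```python
from typing import List, Sequence


def stability_label(fos: float, threshold: float = 1.0) -> str:
    """Label a slope as stable ("S") or failure ("F") from its factor of safety."""
    # Assumption: FoS >= 1.0 is treated as stable, following common LEM practice.
    return "S" if fos >= threshold else "F"


def agreement_rate(reference: Sequence[str], predicted: Sequence[str]) -> float:
    """Fraction of slopes for which the model label matches the reference label."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(r == p for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical FoS values from a limit equilibrium run (placeholders only).
fos_values = [1.05, 0.92, 0.88, 1.02, 0.75, 0.97]
reference_labels: List[str] = [stability_label(f) for f in fos_values]

# Hypothetical labels predicted by a classifier for the same slopes.
model_labels = ["S", "F", "F", "F", "F", "F"]

rate = agreement_rate(reference_labels, model_labels)
print(reference_labels, f"agreement = {rate:.2f}")
```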

6 Conclusions

To investigate the effects of topography, weather, and other climatic conditions on slopes, a thorough numerical examination is conducted with LEM under static conditions for both dry and saturated states, and the slopes are further simulated with machine learning models. The major findings of this study are summarized as follows:

  1. The current study is concerned with the modeling and assessment of stability at critical sites in Kalimpong identified through site visits and landslide inventory maps. Limit equilibrium software (SLOPE/W) is used to determine the factor of safety and slip surfaces, giving a perspective on the future stability of these locations. The comparison of FoS values derived from the simulations shows that under dry conditions all slopes are stable, whereas under saturated conditions two slopes are marginally stable and four are unstable, which demonstrates the destabilizing effect of rainfall on the study area.

  2. Five parameters, namely cohesion, internal friction angle, unit weight, slope angle, and slope height, are chosen as random variables, with the stability condition as the output. The inter-criteria correlation (CRITIC) method indicates that slope height has the greatest impact on slope stability, with cohesion second.

  3. Cross-validation with different numbers of folds is applied to seven distinct machine learning approaches and a novel stacking approach, R-Boost, to conduct the random cross assessment. In terms of AUC, RF attains an average value of 0.81, followed by R-Boost with 0.798 and LR with 0.74; in terms of classification accuracy, R-Boost has the greatest forecasting skill, with an average CA of 0.725. Among all predictive models, R-Boost, followed by RF, provides the results most similar to those obtained by SLOPE/W.

  4. The current study includes a comprehensive analysis as well as evidence from experiments, which show that slopes under saturated conditions typically fail or are only marginally stable, a finding supported by the machine learning models. This technique will be highly beneficial in anticipating and reducing the impact of such disasters, which are one of the major impediments to the nation's socioeconomic progress.