Introduction

A landslide is defined as rock or debris sliding down a slope (Gudiyangada Nachappa et al. 2019). According to statistics, 3,876 landslides occurred between 1995 and 2014, causing 163,658 deaths worldwide (Haque et al. 2019). The factors causing landslides are complex and diverse, including natural factors, such as heavy rain, long continuous rainfall, reservoir river erosion, and earthquakes, and human factors, such as excavation of slopes, mining of mineral resources, and the combination of natural and human factors (Wilde et al. 2018). Landslides have significantly affected residents' living standards and infrastructure, especially in underdeveloped countries (Bovenga et al. 2017; Chen et al. 2019a). The occurrence of landslides may increase in the future due to rapid economic and population growth, environmental damage, and over-exploitation of natural resources (Gudiyangada Nachappa et al. 2019). Crucially, global warming has caused a rise in temperatures and a sharp increase in rainfall events and rainfall frequency (reduced summer precipitation); thus, it contributes to landslides (Kavzoglu et al. 2019).

Landslide susceptibility is the probability of slope instability occurring in an area based on geological conditions (Bui et al. 2012). The existing modeling methods can be divided into qualitative methods and quantitative methods (Chen and Yang 2023; Saha et al. 2024). The qualitative method relies on the subjective opinions of experts (Saha et al. 2023b), which may deviate the results from reality (Chen and Yang 2023). Quantitative methods could be divided into traditional statistical models, deterministic models, and machine learning algorithms (Chen and Yang 2023). For example, logistic regression (Bai et al. 2010; Ohlmacher and Davis 2003; Zhu and Huang 2006), weight of evidence (Lee 2013; Othman et al. 2015; Wang et al. 2020a), and frequency ratios (FRs) (Chen et al. 2020a; Hong et al. 2017; Khan et al. 2019) are commonly used statistical methods that have also been used as benchmarks. Although ground-based geotechnical surveys can be used to create reliable landslide susceptibility maps, this method is time-consuming, costly, and not suitable for large areas (Kim et al. 2018). Therefore, researchers have proposed numerous statistical and machine learning methods for creating landslide susceptibility maps since the 1970s (Kavzoglu et al. 2019). Machine learning methods can learn from data and adapt models by analyzing the potential relationships in the data to create analytical models. In landslide susceptibility analysis, the characteristics of landslide and non-landslide locations are analyzed, and advanced algorithms are used to simulate the inherently complex relationship (Kavzoglu et al. 2019). Many machine learning methods have been used to conduct landslide susceptibility assessments. Examples include support vector machines (Ballabio and Sterlacchini 2012; Saha et al. 2023a), Bayesian networks (BN) (Lee 2010; Song et al. 2012), artificial neural network (ANN) (Cao et al. 2023; Saha et al. 2022; Wu et al. 2024), credal decision trees (He et al. 2019), convolutional neural network (CNN) (Aslam et al. 2023; Deng et al. 2024; Ge et al. 2023), random forest (RF) (Guo et al. 2024; Lagomarsino et al. 2017; Rao and Leng 2024; Youssef et al. 2016), classification and regression trees (CART) (Felicísimo et al. 2013; Pham et al. 2018), and logistic model trees (LMT) (Bui et al. 2016; Truong et al. 2018).

In this context, the present study endeavors to address the pressing concerns of landslide susceptibility in Yanchuan County, China. By harnessing the capabilities of advanced machine learning models including RF, CART, LMT, and BN, coupled with a comprehensive array of terrain and environmental factors, this study seeks to unveil the intricate relationship between these factors and landslide occurrence. By doing so, we aim to provide not only a robust predictive framework for identifying vulnerable areas but also insights that can inform targeted mitigation strategies and emergency planning. As the region grapples with the increasing challenges posed by climate change and rapid urbanization, the outcomes of this research promise to contribute meaningfully to the ongoing efforts in disaster resilience and sustainable land use management.

Study area

Yanchuan County is located in the northeastern part of Yan'an City, Shaanxi Province, China (109°36′20″ E—110°26′44″ E; 36°37′15 N–37°5′55″ N) (Fig. 1). The county extends 70 km from east to west and 39 km from north to south and covers an area of 1941 km2. Yanchuan County is located in the Loess Plateau. The terrain is undulating, and the loess cliffs have steep slopes. The multi-year average evaporation is 1541.7 mm. Heavy and continuous rain during the flood season often causes geological disasters, such as landslides, collapses, and mudslides of different scales, although landslides are the predominant disaster type.

Fig. 1
figure 1

Locations of study area and landslides

Data sources and parameters

The research method has seven steps (Fig. 2): (1) Data collection; (2) landslide inventory mapping and determining causes of landslides; (3) establishment of landslide susceptibility model; (4) optimization of the model training parameters; (5) model verification and comparison; (6) landslide susceptibility map generation; (7) selecting the optimal model for the study area.

Fig. 2
figure 2

Flowchart of the study

A landslide inventory map is required to analyze the landslide susceptibility in the study area and understand the causes of landslides (Kim et al. 2018). We identified 311 landslides in the landslide inventory map. The same number of non-landslide points were selected in the study area to construct a sample dataset. The dataset was divided into a training dataset (218 landslides) and a verification dataset (93 landslides) at a ratio of 7:3.

No authoritative criterion exists to select factors contributing to landslides (Wang et al. 2020b). In the present study, 16 factors based on the geomorphological characteristics of the study area and previous studies, were used including elevation, plan curvature, profile curvature, slope aspect, slope angle, slope length, topographic position index (TPI), terrain ruggedness index (TRI), convergence index (CI), distance to roads, distance to rivers, rainfall, normalized difference vegetation index (NDVI), soil type, lithology, and land use. A digital elevation model (DEM) with a resolution of 30 m was used to extract the profile curvature, plan curvature, slope angle, slope aspect, and slope length. The raster files of the 16 factors were resampled to the same spatial resolution (30 × 30 m).

In general, the temperature, rainfall, and gravitational energy of landslides differ for different altitudes (Wang et al. 2016). These factors may affect the stability of the slope (Meng et al. 2016). The altitude in the study area has a range of 600–1300 m. It was divided into nine categories with an interval of 100 m: < 600, 600–700, 700–800, 800–900, 900–1000, 1000–1100, 1100–1200, 1200–1300, and > 1300 (Fig. 3a). The plan curvature reflects the shape of the terrain. Positive values represent convex areas, and negative values represent concave areas. This parameter has been commonly used to analyze the stability of landslides (Zhang et al. 2018). Profile curvature is a critical factor because it affects the rate of deposition and erosion (Wang et al. 2015). The plan curvature and profile curvature values are divided into three categories according to the shape of the terrain: concave, plane, and convex (Fig. 3b-c). The slope aspect affects landslide susceptibility because of the influence of sunlight, resulting in different temperature and climatic conditions (Kumar and Anbalagan 2019). Similar to a recent study (Pham and Prakash 2017), the aspect values were divided into nine categories, as shown in Fig. 3d. The slope angle influences the movement of material and the distance it moves (Wang et al. 2015). Therefore, the slope should be considered in landslide susceptibility studies. We divided the slope into 7 categories: < 10, 10–20, 20–30, 30–40, 40–50, 50–60, and > 60 (Fig. 3e). The slope length is the distance from the beginning of overland flow to the location of deposition (Wischmeier and Smith 1978). It controls the water flow velocity down the slope and the distance of material movement (Gomez and Kavzoglu 2005). The slope length in the study area is in the range of 0–923.97 m and was divided into five categories: 0–39.86, 39.86–112.33, 112.33–202.91, 202.91–329.73, and 329.73–923.97 m (Fig. 3f). The TPI is the relative height of a pixel based on the neighboring terrain (Mokarram et al. 2015). Positive values (TPI > 0) indicate terrain higher than the adjacent terrain and vice versa (Neuh user et al. 2012). The TPI values were divided into five categories: -87.9-(-12.52), (-12.52)-(-4.81), (-4.81)-2.32, 2.32–10.03, and 10.03–63.45 (Fig. 3g). The TRI is a critical factor influencing landslides because it can divides the landslides into small groups (Nguyen et al. 2019b). The TRI was extracted from the DEM, and the values were divided into five categories: 0–5.05, 5.05–8.62, 8.62–12.48, 12.48–17.82, and 17.82–75.75 (Fig. 3h). The CI describes of material or water flows into or out of a grid cell (Neuh user et al. 2012). The maximum and minimum values of the CI in the study area were 97.05 and -99.34, respectively (Fig. 3i).

Fig. 3
figure 3figure 3figure 3

Thematic maps of the study: a elevation, b plan curvature, c profile curvature, d slope aspect, e slope angle, f slope length, g TPI, h TRI, i convergence Index, j distance to roads, k distance to rivers, l rainfall, m NDVI, n soil type, o lithology, p land use

The construction of roads affects slope stability. Thus, the distance to roads is a critical factor affecting the occurrence of landslides (Dang et al. 2019). The distance to roads was divided into five categories with an interval of 300 m: < 300, 300–600, 600–900, 900–1200, and > 1200 m (Fig. 3j). River erosion can also contribute to landslides (Zhang et al. 2018). The distance to rivers is a theoretical indicator of the extent of river erosion (Manzo et al. 2013). The values of this indicator were divided into five categories with 200 m intervals: < 200, 200–400, 400–600, 600–800, and > 800 m (Fig. 3k). Rainfall is a key factor affecting landslides due to runoff and pore water pressure (Yang et al. 2015). Rainfall was divided into six categories: < 460, 460–470, 470–480, 480–490, 490–500, and > 500 mm/yr (Fig. 3i).

The NDVI reflects the surface vegetation cover (Shirzadi et al. 2018). It is calculated as (Chapi et al. 2017):

$$\text{NDVI}=\frac{\left(\text{NIR}\left(\text{Band }4\right)\right)-\text{Red}\left(\text{Band }3\right)}{\left(\text{NIR}\left(\text{Band }4\right)\right)+\text{Red}\left(\text{Band }3\right)}$$
(1)

where Red and NIR are the spectral reflectance values in the red and near-infrared regions, respectively. The NDVI value of the study area has a range of -0.22–0.51. The NDVI values were divided into five categories: -0.22–0.1, 0.1–0.15, 0.15–0.20, 0.20–0.25, and 0.25–0.51.

Different soil types have different structures and properties, resulting in different impacts on landslides (Pham et al. 2017a). The area had four soil types: Type A: Calcaric Cambisol (CMc), Type B: Eutric Cambisol (CMe), Type C: Calcaric Fluvisol (FLc), and Type D5: Rendzic Leptosol (LPk) (Fig. 3n). The lithology also affects slope stability (Kumar and Anbalagan 2019). The lithology of the study area is shown in Fig. 2o. There are five categories. Group B comprises the largest area (the lower layer is gray, sandy gravel, and the upper layer is Malan loess). In addition, there are Group A (the lower layer is grayish-yellow sandy gravel, and the upper layer is yellow loess-like soil), Group C (the lower layer is sandy gravel composed of feldspar sandstone, mudstone, and calcareous nodules, and the upper layer is Lishi loess), Group D (red-brown silty clay with scattered calcareous nodules; the bottom layer is grayish white conglomerate, sandstone, and sandy mudstone) and Group E (sandstone, silty mudstone, argillaceous siltstone, sandy mudstone, siltstone, carbonaceous shale, and oil shale). Previous studies have shown that land use is a critical factor in landslide susceptibility analysis (Bui et al. 2012; Oh et al. 2018). The land use map was created from aerial photographs of the study area. There are six land-use types in the study area: farmland, forest land, grassland, water, construction land, and unused land (Fig. 3p).

Methodology

Random forest

RF is a machine learning algorithm used for classification and prediction problems (Chen et al. 2020b). Each tree in an RF depends on the values of independently sampled random vectors, and all have the same distribution (Breiman 2001). RF has advantages over CART because of the fewer number of nodes (Breiman 2001). This method is not prone to overfitting because of the law of large numbers and is an accurate classification and regression technique when suitable randomness is used (Breiman 2001). It utilizes the bagging technique. Samples are selected from the training dataset to establish classification or regression trees. The remaining samples are used to evaluate the error of the model. Because an RF consists of many classification trees or regression trees, resulting in high model stability and accuracy (Trigila et al. 2015). The results of individual CART are combined (Zhu et al. 2019). A RF model has two critical parameters, ntree and mrty, where ntree is the number of classification regression trees; mtry is the number of node splits. If the ntree value is very large, the modeling time is high, and if the ntree value is very small, modeling errors occur (Chen et al. 2020b).

Classification and regression trees

Breiman proposed a rule-based binary recursive partitioning method called CART (Steinberg 2009). It uses binary recursive partitioning to generate a binary tree and divides the results of a node into yes/no answers (Felicísimo et al. 2013). CART produces an overgrown tree that is pruned by cross-validation to prevent overfitting (Felicísimo et al. 2013). Researchers have used CART to construct complex trees to handle complex problems with large datasets (Felicísimo et al. 2013). CART is regarded as a modern version of a decision tree (Fig. 4) (Mondal et al. 2019).

Fig. 4
figure 4

Structure of the CART classifier

CART has been applied in many fields, such as medical research (Temkin et al. 1995), natural disaster prediction (e.g., floods) (Mosavi et al. 2018), and computational science (Westreich et al. 2010). It has also been successfully applied to landslide prediction research (Pham et al. 2017b). Landslide susceptibility assessment studies consist of constructing trees, pruning trees, and selecting the optimum trees for landslide and non-landslide categories (Pham et al. 2018).

Bayesian network

A BN is an inference method to quantity uncertainty. It has been used in artificial intelligence to determine probability and uncertainty (Larra aga et al. 1996) and model decision-making processes (Gheisari and Meybodi 2016). BNs have been widely used in many fields in recent years for landslide stability analysis (Lee 2010). They have two learning subtasks: structure learning and parameter learning (Chen et al. 2018). Structure learning determines the topology of the network, a critical aspect of BNs. Parameter learning is used to define the numerical parameters of a given network topology (Gheisari and Meybodi 2016). BNs have many advantages (Tingyao and Dinglong 2013). First, they are well suited for incomplete data. Second, they can be combined with other methods for cause-and-effect analysis. Third, they utilize prior knowledge and are not prone to overfitting.

Logistic model trees

LMTs are efficient and flexible for establishing a logical model. They use the CART algorithm to determine the model error and the model complexity to conduct pruning (Landwehr et al. 2005) to prevent overfitting of the LMT. LMT have the following characteristics (Landwehr et al. 2005): (1) They provide more accurate results than the C4.5 decision tree and independent logistic regression; (2) they do not require any parameter tuning.

The purpose of using LMT is to establish a tree model and classify the training dataset into two categories: landslides have a value of 1, and non-landslide areas have a value of 0. The predicted value of the landslide category is used as a sensitivity indicator.

The LogitBoost algorithm performs additive logistic regression and least-squares fitting for each type of Ci (landslide or non-slide). It is expressed in Eq. (2) (Bui et al. 2016):

$$Lc\left(\text{x}\right)={\sum }_{i=1}^{D}{\beta }_{i}{x}_{i}+{\beta }_{o}$$
(2)

where D is the total number of landslide input factors; bi is the logistic coefficient.

Linear logistic regression is used to calculate the posterior probability of the leaf node, as expressed in Eq. (3) (Bui et al. 2016). This value is the landslide susceptibility index (LSI).

$$P\left(\text{C}/\text{x}\right)=\text{exp}\left(\text{Lc}\left(\text{x}\right)\right)/{\sum }_{{c}^{\prime}=1}^{c}\text{exp}\left({\text{L}}_{{\text{c}}^{\prime}}\left(\text{x}\right)\right)$$
(3)

Model performance and comparison

Receiver operating characteristic curves

The area under the ROC curve (AUC) is used to compare the performance of the four landslide susceptibility models (Saha et al. 2023a; Williams et al. 1999). The ROC curve can be used to distinguish two types of events and visualize the performance of the classifiers (Swets 1988). The ROC curve represents the probability of a true positive (correctly predicted event response) and a false positive (falsely predicted event response). In the spatial prediction of landslide susceptibility (Gorsevski et al. 2006), the true positive is the prediction of the location of the landslide, and the false positive is the prediction of the location where the landslide will not occur. The ideal model is obtained when P (true positive) = 1 and P (false positive) = 0 (the area under the ROC curve is 1) (Williams et al. 1999; Zhang et al. 2023).

$$AUC=\frac{\sum TP+\sum TN}{P+N}$$
(4)

where TP denotes true positive, TN denotes true negative, P is the total number of positives, and N is the total number of negatives.

Statistical measures

Other statistical indicators were used to evaluate the model performance, including the positive predictive rate (PPR), negative predictive rate (NPR), sensitivity, specificity, accuracy, F-score, Matthews correlation coefficient (MCC), and true skill statistic (TSS). These statistical indicators have been used in landslide susceptibility mapping and have been explained in detail elsewhere (Hong et al. 2020; Nguyen et al. 2019a; Saha et al. 2022, 2023a; Wu et al. 2020). They can be calculated using the following equations:

$$PPR=\frac{TP}{TP+FP}$$
(5)
$$NPR=\frac{TN}{TN+FN}$$
(6)
$$Sensitivity=\frac{TP}{TP+FN}$$
(7)
$$Specificity=\frac{TN}{TN+TP}$$
(8)
$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$
(9)
$$F-score=\frac{2TP}{2TP+FP+FN}$$
(10)
$$MCC=\frac{\left(TP\times TN\right)-\left(FP\times FN\right)}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TP+FN\right)\left(TN+FN\right)}}$$
(11)
$$TSS=\frac{TP}{TP+FN}+\frac{TN}{TN+FP}$$
(12)

where the true positive (TP) and true negative (TN) are the correctly classified pixels, and the false positive (FP) and false negative (FN) are the incorrectly classified pixels. PPV represents the accuracy of predicting positive results and is the proportion of predictions that accurately represent the true conditions (Powers 2011). The NPR represents the accuracy of the prediction model in predicting missing (or non-event) locations (Thai Pham et al. 2019). Sensitivity, also known as the true positive rate (TPR), represents the probability of correctly predicting the positive results observed in reality (Rahmati et al. 2019; Saha et al. 2023b). Specificity, also known as the true negative rate (TNR), is the probability of quantifying the negative factors observed in the correct prediction of reality. The accuracy, also known as the model efficiency, reflects the overall success of the prediction model. The harmonic mean of the precision and the recall (F-score) ranges from 0 to 1, where 1 indicates the highest accuracy and recall, and 0 indicates the worst. The MCC is the correlation coefficient between the observed and predicted value. It assumes true and false positive and negative values according to the calculated metrics (Rahmati et al. 2019). The TSS is used to measure the ability of a predicted value to distinguish between events and non-events (Allouche et al. 2006).

Results

Correlation analysis results and factor selection

It is important to calculate the correlation between landslide influencing factors in landslide susceptibility mapping (Chen et al. 2017). It is difficult to predict landslides if the factors are correlated (Chen et al. 2019b). Therefore, a multicollinearity analysis was conducted to determine the correlation between the 16 factors (Bui et al. 2019). The tolerance (TOL) and variance inflation factor (VIF) are critical statistical parameters in multicollinearity analysis (Saha et al. 2024). They are calculated using Eqs. 13 and 14, respectively (Li and Wang 2019):

$$TOL=\frac{1}{VIF}$$
(13)
$$VIF=\frac{1}{1-{R}_{i}^{2}}$$
(14)

where Ri represents the negative correlation coefficient of the ith independent variable. Studies have shown that when the TOL is less than 0.1 or the VIF is greater than 10, multicollinearity exists between the factors (Bui et al. 2016). Table 1 shows no multicollinearity occurs between the 16 landslide susceptibility factors.

Table 1 Outcomes of multicollinearity analysis

The FR is calculated to analyze the relationship between the 16 factors and landslide occurrence. Figure 5 shows the results. In the land use category, construction land has the highest FR value (2.191), followed by farmland (1.069), forest land (1.067), and grassland (0.921). The water and unused land categories have FR values of 0. The ranking of the FR value in the lithology category is Group D (2.192), Group E (1.198), Group B (0.892), Group C (0.845), and Group A (0.776). The category with the highest FR value in the soil type is Type A (1). The minimum value is Type D (0.516). The FR values in the NDVI category have a range of 0.814–1.177, and the highest FR value occurs in the 0.15–0.20 class. In the rainfall category, the FR value is the highest in the first class (0–460), and the values decrease with a decrease in rainfall. In distance to rivers, the FR value for the 0–200 classification is 1.806, followed by 200–400 (1.644), 600–800 (0.696), 400–600 (0.620), and > 800 (0.588). In the distance to roads classification, the FR value is the highest in the 300–600 category (1.825) and the lowest in the 900–1,200 category (0.880). The FR value in the CI category is in the range of 0 to 1.209. In the five categories of TRI, The FR value of the TRI is the highest in the category 17.82–75.75 (2.347). The highest FR value occurs in the TPI category -87.9- -12.52 (1.126 The FR values of the slope angle in the 10–20, 20–30, 30–40, and 40–50 classes are 0.036, 2.286, 2.486, and 2.907, respectively. The FR value of the Flat category is 0, and the highest FR value of the North category is 1.515 for the aspect. The FR values of the Plan curvature and Profile curvature range from 0 to 1.095 and 0 to 1.067, respectively. The highest FR value for elevation occurs in the 1,300–1,392 range (1.535).

Fig. 5 
figure 5

Frequency ratios of landslide influencing factors

Result of parameter optimization

The grid search method was used in parameter optimization. The RF model does assume a relationship between the explanatory and response variables, enabling the analysis of hierarchical and nonlinear interactions in large datasets (Kim et al. 2018). Therefore, the RF algorithm was selected for landslide susceptibility analysis. The optimal parameters of the RF model obtained after parameter adjustment are as follows: number of alterations = 60, number of seeds = 7, number of decimal places = 2, and batch size = 100 (Fig. 6). The LMT model is a popular machine learning model and has been widely used in recent years. We chose the LMT because it uses decision rules to simplify complex problems (Bui et al. 2016). The probability calculation for the LMT model was performed in Weka software (Frank et al. 2016). The parameters are: batch size = 100, minimum number of instances = 15, number of boosting iterations = 8, number of decimal places = 2, and weightTrimBeta is 0 (Fig. 7). The BN used in this study uses a hill-climbing algorithm constrained by the order of variables. It was implemented in Weka software, and the alpha parameter was 0.5. The CART model is a non-parametric prediction model (Markham et al. 2000). The main parameters to establish the CART model in Weka software are: SValue is 1, batch size is 100, NumFoldsPruning is 5, and number of seeds is 3 (Fig. 8).

Fig. 6
figure 6

Parameter optimization of the RF model

Fig. 7
figure 7

Parameter optimization of the LMT model

Fig. 8
figure 8

Parameter optimization of the CART model

The landslide susceptibility maps were created in GIS software (ESRI 2014). We calculated the LSI, ranked it, and generated the maps. The landslide susceptibility maps generated by the RF, CART, LMT, and BN models are shown in Fig. 9a-d. Natural breaks were used in GIS software to classify the landslide susceptibility zones into five classes: very low, low, moderate, high, and very high (Zhao and Chen 2020).

Fig. 9
figure 9

Landslide susceptibility maps for four models: a RF model, b CART model, c BN model, and d LMT model

Comparison of model performance

Figure 10 shows the ROC curves of the four models for the different datasets. For the training dataset, the AUC values of the four models (RF, CART, BN, and LMT) are 0.875, 0.830, 0.786, and 0.870, respectively. We considered the AUC value based on the training dataset for comparing and validating the models. The AUC value of the validation dataset is equally important. The AUC values of the four models are 0.878, 0.880, 0.886, and 0.893, respectively, for the validation dataset.

Fig. 10
figure 10

ROC curves for the four models: a training data; b validation data

In addition to the AUC value, we calculated the standard error, 95% confidence interval, and significance level (p-value). Tables 2 and 3 list the statistical results. In general, the smaller the standard error, the confidence interval, and the p-value, the better the model performance is (Wang et al. 2020b).

Table 2 Parameters of ROC curves using the training data
Table 3 Parameters of ROC curves using the validation data

Table 4 list the performance of the four machine learning models. The LMT model has the highest performance for predicting landslides (the sensitivity is 93.6% for the training data set and 94.6% for the validation data set). The RF model has the best performance for classifying non-landslide areas for the training data set (specificity = 70. 2%). The RF and LMT models have the same specificity for the validation data set (specificity = 76.3%). The LMT model has the highest accuracy (85.5%), the highest F-score value (86.7%), the highest MCC (0.772), and the highest TSS (0.710) for the validation data set. These results indicate that the LMT model has superior performance, followed by the RF model and the CART model. The BN model is less stable for the training and validation data sets.

Table 4 Statistical measures of the four models

A Chi-square test was used to determine if there were significant differences between the models (Zhao and Chen 2020). Table 5 indicates no significant differences between the RF and LMT models (the Chi-square value is 0.021, and the p-value is 0.885). The other five groups showed significant differences.

Table 5 Results of chi-Square test of models

Discussion

Significant climate change in recent years has produced frequent heavy rainfall, increasing the number of landslides. Therefore, an assessment of landslide susceptibility in the study area can identify areas susceptible to landslide disasters to enable the government to warn residents and manage susceptible areas (Lee et al. 2019). The present study introduces various aspects in the field of landslide susceptibility assessment. It systematically integrates four advanced machine learning algorithms RF, CART, LMT, and BN to comprehensively analyze and map landslide-prone areas. By evaluating these models and optimizing their parameters, the research provides a comparative framework that highlights the strengths and weaknesses of each approach. After consulting the scientific literature, 16 factors influencing landslide were selected: elevation, plan curvature, profile curvature, slope aspect, slope angle, slope length, TPI, TRI, CI, distance to roads, distance to rivers, rainfall, NDVI, soil type, lithology, and land use. The relationship between landslide occurrence and the 16 factors was analyzed by calculating the FR. The meticulous factor selection process, guided by a multicollinearity analysis, ensures that relevant and non-correlated variables are considered. Moreover, the study's emphasis on parameter optimization enhances the accuracy and robustness of the models' predictions. The investigation of dominant factors through variable importance analysis enhances the understanding of local landslide triggers. By generating multiple susceptibility maps, the research acknowledges the variability of model outputs, paving the way for potential integrated modeling approaches. The study showcases a systematic and rigorous methodology, offering novel insights into effective landslide susceptibility assessment techniques that can be adapted and extended to diverse geographical settings.

Researchers have proposed various landslide susceptibility modeling methods in recent years, resulting in high variability of prediction accuracy (Akgun 2012). Several novel machine learning methods have outperformed traditional methods (Tien Bui et al. 2012), and these methods have been combined with GIS analysis. Therefore, it is critical to compare new landslide susceptibility modeling methods. This study evaluated four machine learning methods for landslide susceptibility mapping. The model parameters were optimized according to the classification accuracy and the area under the ROC curve. However, the model performance cannot be evaluated solely based on the AUC value. Studies have shown that the AUC value may be high, but the prediction accuracy may not necessarily be sufficient (Gutiérrez et al. 2013). Therefore, other statistical parameters were included to evaluate model performance. It was found that the performance of the machine learning models was substantially affected by the parameters. The performance stability of the RF, CART, and LMT models was higher than that of the BN model. The RF and LMT models had the best performance and stability in this study. However, more research is required on the optimization of the model parameters because of the large workload of adjusting the model parameters and the wide range of factors.

Determining the optimal number of influencing factors is a critical problem in landslide susceptibility assessments. Different study areas may have different landslide susceptibility factors (Van Westen et al. 2003). No unified standard has been developed to date to solve this problem. We selected 16 landslide susceptibility factors according to the literature and the geological characteristics of the study area. The variable importance must be considered because of the large number of factors influencing landslides. Figure 11 shows that the slope angle, TRI, and distance to rivers were the most important influencing factors of existing landslides in the study area, although the remaining 13 factors also contributed to landslide susceptibility. Many studies have shown that the slope angle significantly influenced landslides (Pourghasemi and Rossi 2017). Generally, landslides are more likely in areas with high slopes than in areas with low slopes (Kumar and Anbalagan 2019). In Yanchuan County, the undulating terrain and deep valleys contribute to variations in slope angles, leading to areas with higher angles being more prone to landslides. The TRI value reflects the degree of surface fluctuation and erosion (Zhang et al. 2019). Rivers typically erode the slope foot and adversely affect slope stability; thus, the distance to rivers is an important factor in the study area. Rivers can significantly impact slope stability by eroding the base of slopes and weakening their structural integrity. Areas closer to rivers are more vulnerable to landslides due to increased erosion and potential undercutting of slopes. In Yanchuan County, the proximity to rivers and their erosional forces likely contributes to the heightened susceptibility observed in these areas.

Fig. 11
figure 11

The prediction capability of landslide susceptibility factors

The four machine learning models showed differences in stability. Therefore, the landslide susceptibility maps generated by the four models exhibit differences. In addition, the factor selection strategy used in this study is applicable to other areas.

The performance evaluation of the four models in this study reveals distinct advantages and limitations for each other. The RF model demonstrates high stability and competitive accuracy, making it suitable for complex interactions among variables. Its ability to handle large datasets and nonlinear relationships contributes to robust predictions. However, its parameter optimization process can be time-consuming. The CART model is characterized by its simplicity and interpretability, making it useful for identifying dominant factors. Yet, it may suffer from overfitting and instability, especially with complex datasets. The LMT exhibit remarkable accuracy, particularly in classifying positive instances, and their logical structure aids in explaining results. However, they can be sensitive to parameter settings and may require substantial data preprocessing. The BN model approach leverages probabilistic relationships, incorporating prior knowledge effectively. Still, it can be constrained by the assumption of conditional independence and may necessitate thorough domain expertise for proper structure specification. While RF and LMT offer strong overall performance, each model's suitability depends on the specific context, data availability, and desired interpretability, highlighting the importance of selecting models based on trade-offs between accuracy, stability, and simplicity.

Conclusions

The present study systematically examined landslide susceptibility in Yanchuan County, China, using four machine learning models: Random Forest (RF), Classification and Regression Trees (CART), Logistic Model Trees (LMT), and Bayesian Network (BN). Through a rigorous analysis of 16 influential factors and parameter optimization, the RF and LMT models emerged as the most stable and accurate choices for predicting landslides. The comprehensive analysis of landslide susceptibility in Yanchuan County highlighted the significant importance of slope angle, TPI, and distance to rivers as key contributing factors. These factors exhibited strong influence on landslide occurrence, reflecting the intricate interplay of terrain characteristics, erosion dynamics, and hydrological conditions in shaping the susceptibility landscape. Their prominent roles underscored the critical need for effective management and mitigation strategies, as well as the importance of considering localized geological and topographical features in landslide susceptibility assessments.

The landslide susceptibility maps generated by RF and LMT provided valuable insights for disaster prevention and mitigation in the study area. The study highlights the significance of data preprocessing, model parameter adjustment, and the multifaceted nature of landslide prediction. While each model presents distinct advantages and limitations, the RF and LMT models stood out for their robust performance and applicability to complex scenarios. The research contributes to advancing our understanding of landslide susceptibility assessment and underscores the importance of model selection based on accuracy, stability, and interpretability considerations.

Despite its valuable contributions, this study has certain limitations that should be acknowledged. Firstly, the study relies on the assumption that the selected factors are the sole contributors to landslide susceptibility, potentially overlooking other unaccounted variables. Secondly, while parameter optimization enhances model performance, the process can be time-consuming and resource-intensive, potentially limiting its practical implementation on larger scales. Additionally, the study employs a single-model approach to generate landslide susceptibility maps, which may not fully capture the complexity of real-world scenarios that could benefit from integrated models. Moreover, the study's focus on Yanchuan County's unique geological and environmental conditions might limit the generalizability of findings to other regions with distinct characteristics. Lastly, while comprehensive evaluation metrics are used, the assessment does not consider temporal variations in landslide occurrences, which could affect model robustness over time. Recognizing these limitations provides avenues for future research to address these challenges and refine the methodologies for more accurate and broadly applicable landslide susceptibility assessments.

In summary, this study provides a reasonable basis for landslide prediction. The conclusions and results of this study can be used to prevent landslides and reduce the harm caused by landslide disasters in the study area and other areas with similar geological environment conditions.