Exploring advanced machine learning techniques for landslide susceptibility mapping in Yanchuan County, China

Chen, Wei; Guo, Chao; Lin, Fanghao; Zhao, Ruixin; Li, Tao; Tsangaratos, Paraskevas; Ilia, Ioanna

doi:10.1007/s12145-024-01455-8

Exploring advanced machine learning techniques for landslide susceptibility mapping in Yanchuan County, China

Research
Published: 27 August 2024

(2024)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Earth Science Informatics Aims and scope Submit manuscript

Exploring advanced machine learning techniques for landslide susceptibility mapping in Yanchuan County, China

Download PDF

Wei Chen¹,
Chao Guo¹,
Fanghao Lin¹,
Ruixin Zhao²,
Tao Li³,
Paraskevas Tsangaratos⁴ &
…
Ioanna Ilia⁴

74 Accesses
Explore all metrics

Abstract

Many landslides occurred every year, causing extensive property losses and casualties in China. Landslide susceptibility mapping is crucial for disaster prevention by the government or related organizations to protect people's lives and property. This study compared the performance of random forest (RF), classification and regression trees (CART), Bayesian network (BN), and logistic model trees (LMT) methods in generating landslide susceptibility maps in Yanchuan County using optimization strategy. A field survey was conducted to map 311 landslides. The dataset was divided into a training dataset and a validation dataset with a ratio of 7:3. Sixteen factors influencing landslides were identified based on a geological survey of the study area, including elevation, plan curvature, profile curvature, slope aspect, slope angle, slope length, topographic position index (TPI), terrain ruggedness index (TRI), convergence index, normalized difference vegetation index (NDVI), distance to roads, distance to rivers, rainfall, soil type, lithology, and land use. The training dataset was used to train the models in Weka software, and landslide susceptibility maps were generated in GIS software. The performance of the four models was evaluated by receiver operating characteristic (ROC) curves, confusion matrix, chi-square test, and other statistical analysis methods. The comparison results show that all four machine learning models are suitable for evaluating landslide susceptibility in the study area. The performances of the RF and LMT methods are more stable than those of the other two models; thus, they are suitable for landslide susceptibility mapping.

GIS-based landslide susceptibility mapping in the Longmen Mountain area (China) using three different machine learning algorithms and their comparison

Article 13 July 2023

Spatial prediction of landslide susceptibility using GIS-based statistical and machine learning models in Wanzhou County, Three Gorges Reservoir, China

Article 22 April 2019

Selecting optimal conditioning parameters for landslide susceptibility: an experimental research on Aqabat Al-Sulbat, Saudi Arabia

Article 14 August 2021

Discover the latest articles, news and stories from top researchers in related subjects.

Artificial Intelligence

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Introduction

A landslide is defined as rock or debris sliding down a slope (Gudiyangada Nachappa et al. 2019). According to statistics, 3,876 landslides occurred between 1995 and 2014, causing 163,658 deaths worldwide (Haque et al. 2019). The factors causing landslides are complex and diverse, including natural factors, such as heavy rain, long continuous rainfall, reservoir river erosion, and earthquakes, and human factors, such as excavation of slopes, mining of mineral resources, and the combination of natural and human factors (Wilde et al. 2018). Landslides have significantly affected residents' living standards and infrastructure, especially in underdeveloped countries (Bovenga et al. 2017; Chen et al. 2019a). The occurrence of landslides may increase in the future due to rapid economic and population growth, environmental damage, and over-exploitation of natural resources (Gudiyangada Nachappa et al. 2019). Crucially, global warming has caused a rise in temperatures and a sharp increase in rainfall events and rainfall frequency (reduced summer precipitation); thus, it contributes to landslides (Kavzoglu et al. 2019).

Landslide susceptibility is the probability of slope instability occurring in an area based on geological conditions (Bui et al. 2012). The existing modeling methods can be divided into qualitative methods and quantitative methods (Chen and Yang 2023; Saha et al. 2024). The qualitative method relies on the subjective opinions of experts (Saha et al. 2023b), which may deviate the results from reality (Chen and Yang 2023). Quantitative methods could be divided into traditional statistical models, deterministic models, and machine learning algorithms (Chen and Yang 2023). For example, logistic regression (Bai et al. 2010; Ohlmacher and Davis 2003; Zhu and Huang 2006), weight of evidence (Lee 2013; Othman et al. 2015; Wang et al. 2020a), and frequency ratios (FRs) (Chen et al. 2020a; Hong et al. 2017; Khan et al. 2019) are commonly used statistical methods that have also been used as benchmarks. Although ground-based geotechnical surveys can be used to create reliable landslide susceptibility maps, this method is time-consuming, costly, and not suitable for large areas (Kim et al. 2018). Therefore, researchers have proposed numerous statistical and machine learning methods for creating landslide susceptibility maps since the 1970s (Kavzoglu et al. 2019). Machine learning methods can learn from data and adapt models by analyzing the potential relationships in the data to create analytical models. In landslide susceptibility analysis, the characteristics of landslide and non-landslide locations are analyzed, and advanced algorithms are used to simulate the inherently complex relationship (Kavzoglu et al. 2019). Many machine learning methods have been used to conduct landslide susceptibility assessments. Examples include support vector machines (Ballabio and Sterlacchini 2012; Saha et al. 2023a), Bayesian networks (BN) (Lee 2010; Song et al. 2012), artificial neural network (ANN) (Cao et al. 2023; Saha et al. 2022; Wu et al. 2024), credal decision trees (He et al. 2019), convolutional neural network (CNN) (Aslam et al. 2023; Deng et al. 2024; Ge et al. 2023), random forest (RF) (Guo et al. 2024; Lagomarsino et al. 2017; Rao and Leng 2024; Youssef et al. 2016), classification and regression trees (CART) (Felicísimo et al. 2013; Pham et al. 2018), and logistic model trees (LMT) (Bui et al. 2016; Truong et al. 2018).

In this context, the present study endeavors to address the pressing concerns of landslide susceptibility in Yanchuan County, China. By harnessing the capabilities of advanced machine learning models including RF, CART, LMT, and BN, coupled with a comprehensive array of terrain and environmental factors, this study seeks to unveil the intricate relationship between these factors and landslide occurrence. By doing so, we aim to provide not only a robust predictive framework for identifying vulnerable areas but also insights that can inform targeted mitigation strategies and emergency planning. As the region grapples with the increasing challenges posed by climate change and rapid urbanization, the outcomes of this research promise to contribute meaningfully to the ongoing efforts in disaster resilience and sustainable land use management.

Study area

Yanchuan County is located in the northeastern part of Yan'an City, Shaanxi Province, China (109°36′20″ E—110°26′44″ E; 36°37′15 N–37°5′55″ N) (Fig. 1). The county extends 70 km from east to west and 39 km from north to south and covers an area of 1941 km². Yanchuan County is located in the Loess Plateau. The terrain is undulating, and the loess cliffs have steep slopes. The multi-year average evaporation is 1541.7 mm. Heavy and continuous rain during the flood season often causes geological disasters, such as landslides, collapses, and mudslides of different scales, although landslides are the predominant disaster type.

Data sources and parameters

The research method has seven steps (Fig. 2): (1) Data collection; (2) landslide inventory mapping and determining causes of landslides; (3) establishment of landslide susceptibility model; (4) optimization of the model training parameters; (5) model verification and comparison; (6) landslide susceptibility map generation; (7) selecting the optimal model for the study area.

A landslide inventory map is required to analyze the landslide susceptibility in the study area and understand the causes of landslides (Kim et al. 2018). We identified 311 landslides in the landslide inventory map. The same number of non-landslide points were selected in the study area to construct a sample dataset. The dataset was divided into a training dataset (218 landslides) and a verification dataset (93 landslides) at a ratio of 7:3.

No authoritative criterion exists to select factors contributing to landslides (Wang et al. 2020b). In the present study, 16 factors based on the geomorphological characteristics of the study area and previous studies, were used including elevation, plan curvature, profile curvature, slope aspect, slope angle, slope length, topographic position index (TPI), terrain ruggedness index (TRI), convergence index (CI), distance to roads, distance to rivers, rainfall, normalized difference vegetation index (NDVI), soil type, lithology, and land use. A digital elevation model (DEM) with a resolution of 30 m was used to extract the profile curvature, plan curvature, slope angle, slope aspect, and slope length. The raster files of the 16 factors were resampled to the same spatial resolution (30 × 30 m).

In general, the temperature, rainfall, and gravitational energy of landslides differ for different altitudes (Wang et al. 2016). These factors may affect the stability of the slope (Meng et al. 2016). The altitude in the study area has a range of 600–1300 m. It was divided into nine categories with an interval of 100 m: < 600, 600–700, 700–800, 800–900, 900–1000, 1000–1100, 1100–1200, 1200–1300, and > 1300 (Fig. 3a). The plan curvature reflects the shape of the terrain. Positive values represent convex areas, and negative values represent concave areas. This parameter has been commonly used to analyze the stability of landslides (Zhang et al. 2018). Profile curvature is a critical factor because it affects the rate of deposition and erosion (Wang et al. 2015). The plan curvature and profile curvature values are divided into three categories according to the shape of the terrain: concave, plane, and convex (Fig. 3b-c). The slope aspect affects landslide susceptibility because of the influence of sunlight, resulting in different temperature and climatic conditions (Kumar and Anbalagan 2019). Similar to a recent study (Pham and Prakash 2017), the aspect values were divided into nine categories, as shown in Fig. 3d. The slope angle influences the movement of material and the distance it moves (Wang et al. 2015). Therefore, the slope should be considered in landslide susceptibility studies. We divided the slope into 7 categories: < 10, 10–20, 20–30, 30–40, 40–50, 50–60, and > 60 (Fig. 3e). The slope length is the distance from the beginning of overland flow to the location of deposition (Wischmeier and Smith 1978). It controls the water flow velocity down the slope and the distance of material movement (Gomez and Kavzoglu 2005). The slope length in the study area is in the range of 0–923.97 m and was divided into five categories: 0–39.86, 39.86–112.33, 112.33–202.91, 202.91–329.73, and 329.73–923.97 m (Fig. 3f). The TPI is the relative height of a pixel based on the neighboring terrain (Mokarram et al. 2015). Positive values (TPI > 0) indicate terrain higher than the adjacent terrain and vice versa (Neuh user et al. 2012). The TPI values were divided into five categories: -87.9-(-12.52), (-12.52)-(-4.81), (-4.81)-2.32, 2.32–10.03, and 10.03–63.45 (Fig. 3g). The TRI is a critical factor influencing landslides because it can divides the landslides into small groups (Nguyen et al. 2019b). The TRI was extracted from the DEM, and the values were divided into five categories: 0–5.05, 5.05–8.62, 8.62–12.48, 12.48–17.82, and 17.82–75.75 (Fig. 3h). The CI describes of material or water flows into or out of a grid cell (Neuh user et al. 2012). The maximum and minimum values of the CI in the study area were 97.05 and -99.34, respectively (Fig. 3i).

The construction of roads affects slope stability. Thus, the distance to roads is a critical factor affecting the occurrence of landslides (Dang et al. 2019). The distance to roads was divided into five categories with an interval of 300 m: < 300, 300–600, 600–900, 900–1200, and > 1200 m (Fig. 3j). River erosion can also contribute to landslides (Zhang et al. 2018). The distance to rivers is a theoretical indicator of the extent of river erosion (Manzo et al. 2013). The values of this indicator were divided into five categories with 200 m intervals: < 200, 200–400, 400–600, 600–800, and > 800 m (Fig. 3k). Rainfall is a key factor affecting landslides due to runoff and pore water pressure (Yang et al. 2015). Rainfall was divided into six categories: < 460, 460–470, 470–480, 480–490, 490–500, and > 500 mm/yr (Fig. 3i).

The NDVI reflects the surface vegetation cover (Shirzadi et al. 2018). It is calculated as (Chapi et al. 2017):

$$\text{NDVI}=\frac{\left(\text{NIR}\left(\text{Band }4\right)\right)-\text{Red}\left(\text{Band }3\right)}{\left(\text{NIR}\left(\text{Band }4\right)\right)+\text{Red}\left(\text{Band }3\right)}$$

(1)

where Red and NIR are the spectral reflectance values in the red and near-infrared regions, respectively. The NDVI value of the study area has a range of -0.22–0.51. The NDVI values were divided into five categories: -0.22–0.1, 0.1–0.15, 0.15–0.20, 0.20–0.25, and 0.25–0.51.

Different soil types have different structures and properties, resulting in different impacts on landslides (Pham et al. 2017a). The area had four soil types: Type A: Calcaric Cambisol (CMc), Type B: Eutric Cambisol (CMe), Type C: Calcaric Fluvisol (FLc), and Type D5: Rendzic Leptosol (LPk) (Fig. 3n). The lithology also affects slope stability (Kumar and Anbalagan 2019). The lithology of the study area is shown in Fig. 2o. There are five categories. Group B comprises the largest area (the lower layer is gray, sandy gravel, and the upper layer is Malan loess). In addition, there are Group A (the lower layer is grayish-yellow sandy gravel, and the upper layer is yellow loess-like soil), Group C (the lower layer is sandy gravel composed of feldspar sandstone, mudstone, and calcareous nodules, and the upper layer is Lishi loess), Group D (red-brown silty clay with scattered calcareous nodules; the bottom layer is grayish white conglomerate, sandstone, and sandy mudstone) and Group E (sandstone, silty mudstone, argillaceous siltstone, sandy mudstone, siltstone, carbonaceous shale, and oil shale). Previous studies have shown that land use is a critical factor in landslide susceptibility analysis (Bui et al. 2012; Oh et al. 2018). The land use map was created from aerial photographs of the study area. There are six land-use types in the study area: farmland, forest land, grassland, water, construction land, and unused land (Fig. 3p).

Methodology

Random forest

RF is a machine learning algorithm used for classification and prediction problems (Chen et al. 2020b). Each tree in an RF depends on the values of independently sampled random vectors, and all have the same distribution (Breiman 2001). RF has advantages over CART because of the fewer number of nodes (Breiman 2001). This method is not prone to overfitting because of the law of large numbers and is an accurate classification and regression technique when suitable randomness is used (Breiman 2001). It utilizes the bagging technique. Samples are selected from the training dataset to establish classification or regression trees. The remaining samples are used to evaluate the error of the model. Because an RF consists of many classification trees or regression trees, resulting in high model stability and accuracy (Trigila et al. 2015). The results of individual CART are combined (Zhu et al. 2019). A RF model has two critical parameters, ntree and mrty, where ntree is the number of classification regression trees; mtry is the number of node splits. If the ntree value is very large, the modeling time is high, and if the ntree value is very small, modeling errors occur (Chen et al. 2020b).

Classification and regression trees

Breiman proposed a rule-based binary recursive partitioning method called CART (Steinberg 2009). It uses binary recursive partitioning to generate a binary tree and divides the results of a node into yes/no answers (Felicísimo et al. 2013). CART produces an overgrown tree that is pruned by cross-validation to prevent overfitting (Felicísimo et al. 2013). Researchers have used CART to construct complex trees to handle complex problems with large datasets (Felicísimo et al. 2013). CART is regarded as a modern version of a decision tree (Fig. 4) (Mondal et al. 2019).

CART has been applied in many fields, such as medical research (Temkin et al. 1995), natural disaster prediction (e.g., floods) (Mosavi et al. 2018), and computational science (Westreich et al. 2010). It has also been successfully applied to landslide prediction research (Pham et al. 2017b). Landslide susceptibility assessment studies consist of constructing trees, pruning trees, and selecting the optimum trees for landslide and non-landslide categories (Pham et al. 2018).

Bayesian network

A BN is an inference method to quantity uncertainty. It has been used in artificial intelligence to determine probability and uncertainty (Larra aga et al. 1996) and model decision-making processes (Gheisari and Meybodi 2016). BNs have been widely used in many fields in recent years for landslide stability analysis (Lee 2010). They have two learning subtasks: structure learning and parameter learning (Chen et al. 2018). Structure learning determines the topology of the network, a critical aspect of BNs. Parameter learning is used to define the numerical parameters of a given network topology (Gheisari and Meybodi 2016). BNs have many advantages (Tingyao and Dinglong 2013). First, they are well suited for incomplete data. Second, they can be combined with other methods for cause-and-effect analysis. Third, they utilize prior knowledge and are not prone to overfitting.

Logistic model trees

LMTs are efficient and flexible for establishing a logical model. They use the CART algorithm to determine the model error and the model complexity to conduct pruning (Landwehr et al. 2005) to prevent overfitting of the LMT. LMT have the following characteristics (Landwehr et al. 2005): (1) They provide more accurate results than the C4.5 decision tree and independent logistic regression; (2) they do not require any parameter tuning.

The purpose of using LMT is to establish a tree model and classify the training dataset into two categories: landslides have a value of 1, and non-landslide areas have a value of 0. The predicted value of the landslide category is used as a sensitivity indicator.

The LogitBoost algorithm performs additive logistic regression and least-squares fitting for each type of Ci (landslide or non-slide). It is expressed in Eq. (2) (Bui et al. 2016):

$$Lc\left(\text{x}\right)={\sum }_{i=1}^{D}{\beta }_{i}{x}_{i}+{\beta }_{o}$$

(2)

where D is the total number of landslide input factors; bi is the logistic coefficient.

Linear logistic regression is used to calculate the posterior probability of the leaf node, as expressed in Eq. (3) (Bui et al. 2016). This value is the landslide susceptibility index (LSI).

$$P\left(\text{C}/\text{x}\right)=\text{exp}\left(\text{Lc}\left(\text{x}\right)\right)/{\sum }_{{c}^{\prime}=1}^{c}\text{exp}\left({\text{L}}_{{\text{c}}^{\prime}}\left(\text{x}\right)\right)$$

(3)

Model performance and comparison

Receiver operating characteristic curves

The area under the ROC curve (AUC) is used to compare the performance of the four landslide susceptibility models (Saha et al. 2023a; Williams et al. 1999). The ROC curve can be used to distinguish two types of events and visualize the performance of the classifiers (Swets 1988). The ROC curve represents the probability of a true positive (correctly predicted event response) and a false positive (falsely predicted event response). In the spatial prediction of landslide susceptibility (Gorsevski et al. 2006), the true positive is the prediction of the location of the landslide, and the false positive is the prediction of the location where the landslide will not occur. The ideal model is obtained when P (true positive) = 1 and P (false positive) = 0 (the area under the ROC curve is 1) (Williams et al. 1999; Zhang et al. 2023).

$$AUC=\frac{\sum TP+\sum TN}{P+N}$$

(4)

where TP denotes true positive, TN denotes true negative, P is the total number of positives, and N is the total number of negatives.

Statistical measures

Other statistical indicators were used to evaluate the model performance, including the positive predictive rate (PPR), negative predictive rate (NPR), sensitivity, specificity, accuracy, F-score, Matthews correlation coefficient (MCC), and true skill statistic (TSS). These statistical indicators have been used in landslide susceptibility mapping and have been explained in detail elsewhere (Hong et al. 2020; Nguyen et al. 2019a; Saha et al. 2022, 2023a; Wu et al. 2020). They can be calculated using the following equations:

$$PPR=\frac{TP}{TP+FP}$$

(5)

$$NPR=\frac{TN}{TN+FN}$$

(6)

$$Sensitivity=\frac{TP}{TP+FN}$$

(7)

$$Specificity=\frac{TN}{TN+TP}$$

(8)

$$Accuracy=\frac{TP+TN}{TP+TN+FP+FN}$$

(9)

$$F-score=\frac{2TP}{2TP+FP+FN}$$

(10)

$$MCC=\frac{\left(TP\times TN\right)-\left(FP\times FN\right)}{\sqrt{\left(TP+FP\right)\left(TP+FN\right)\left(TN+FP\right)\left(TP+FN\right)\left(TN+FN\right)}}$$

(11)

$$TSS=\frac{TP}{TP+FN}+\frac{TN}{TN+FP}$$

(12)

where the true positive (TP) and true negative (TN) are the correctly classified pixels, and the false positive (FP) and false negative (FN) are the incorrectly classified pixels. PPV represents the accuracy of predicting positive results and is the proportion of predictions that accurately represent the true conditions (Powers 2011). The NPR represents the accuracy of the prediction model in predicting missing (or non-event) locations (Thai Pham et al. 2019). Sensitivity, also known as the true positive rate (TPR), represents the probability of correctly predicting the positive results observed in reality (Rahmati et al. 2019; Saha et al. 2023b). Specificity, also known as the true negative rate (TNR), is the probability of quantifying the negative factors observed in the correct prediction of reality. The accuracy, also known as the model efficiency, reflects the overall success of the prediction model. The harmonic mean of the precision and the recall (F-score) ranges from 0 to 1, where 1 indicates the highest accuracy and recall, and 0 indicates the worst. The MCC is the correlation coefficient between the observed and predicted value. It assumes true and false positive and negative values according to the calculated metrics (Rahmati et al. 2019). The TSS is used to measure the ability of a predicted value to distinguish between events and non-events (Allouche et al. 2006).

Results

Correlation analysis results and factor selection

It is important to calculate the correlation between landslide influencing factors in landslide susceptibility mapping (Chen et al. 2017). It is difficult to predict landslides if the factors are correlated (Chen et al. 2019b). Therefore, a multicollinearity analysis was conducted to determine the correlation between the 16 factors (Bui et al. 2019). The tolerance (TOL) and variance inflation factor (VIF) are critical statistical parameters in multicollinearity analysis (Saha et al. 2024). They are calculated using Eqs. 13 and 14, respectively (Li and Wang 2019):

$$TOL=\frac{1}{VIF}$$

(13)

$$VIF=\frac{1}{1-{R}_{i}^{2}}$$

(14)

where R_i represents the negative correlation coefficient of the ith independent variable. Studies have shown that when the TOL is less than 0.1 or the VIF is greater than 10, multicollinearity exists between the factors (Bui et al. 2016). Table 1 shows no multicollinearity occurs between the 16 landslide susceptibility factors.

Table 1 Outcomes of multicollinearity analysis

Full size table

The FR is calculated to analyze the relationship between the 16 factors and landslide occurrence. Figure 5 shows the results. In the land use category, construction land has the highest FR value (2.191), followed by farmland (1.069), forest land (1.067), and grassland (0.921). The water and unused land categories have FR values of 0. The ranking of the FR value in the lithology category is Group D (2.192), Group E (1.198), Group B (0.892), Group C (0.845), and Group A (0.776). The category with the highest FR value in the soil type is Type A (1). The minimum value is Type D (0.516). The FR values in the NDVI category have a range of 0.814–1.177, and the highest FR value occurs in the 0.15–0.20 class. In the rainfall category, the FR value is the highest in the first class (0–460), and the values decrease with a decrease in rainfall. In distance to rivers, the FR value for the 0–200 classification is 1.806, followed by 200–400 (1.644), 600–800 (0.696), 400–600 (0.620), and > 800 (0.588). In the distance to roads classification, the FR value is the highest in the 300–600 category (1.825) and the lowest in the 900–1,200 category (0.880). The FR value in the CI category is in the range of 0 to 1.209. In the five categories of TRI, The FR value of the TRI is the highest in the category 17.82–75.75 (2.347). The highest FR value occurs in the TPI category -87.9- -12.52 (1.126 The FR values of the slope angle in the 10–20, 20–30, 30–40, and 40–50 classes are 0.036, 2.286, 2.486, and 2.907, respectively. The FR value of the Flat category is 0, and the highest FR value of the North category is 1.515 for the aspect. The FR values of the Plan curvature and Profile curvature range from 0 to 1.095 and 0 to 1.067, respectively. The highest FR value for elevation occurs in the 1,300–1,392 range (1.535).

Result of parameter optimization

The grid search method was used in parameter optimization. The RF model does assume a relationship between the explanatory and response variables, enabling the analysis of hierarchical and nonlinear interactions in large datasets (Kim et al. 2018). Therefore, the RF algorithm was selected for landslide susceptibility analysis. The optimal parameters of the RF model obtained after parameter adjustment are as follows: number of alterations = 60, number of seeds = 7, number of decimal places = 2, and batch size = 100 (Fig. 6). The LMT model is a popular machine learning model and has been widely used in recent years. We chose the LMT because it uses decision rules to simplify complex problems (Bui et al. 2016). The probability calculation for the LMT model was performed in Weka software (Frank et al. 2016). The parameters are: batch size = 100, minimum number of instances = 15, number of boosting iterations = 8, number of decimal places = 2, and weightTrimBeta is 0 (Fig. 7). The BN used in this study uses a hill-climbing algorithm constrained by the order of variables. It was implemented in Weka software, and the alpha parameter was 0.5. The CART model is a non-parametric prediction model (Markham et al. 2000). The main parameters to establish the CART model in Weka software are: SValue is 1, batch size is 100, NumFoldsPruning is 5, and number of seeds is 3 (Fig. 8).

The landslide susceptibility maps were created in GIS software (ESRI 2014). We calculated the LSI, ranked it, and generated the maps. The landslide susceptibility maps generated by the RF, CART, LMT, and BN models are shown in Fig. 9a-d. Natural breaks were used in GIS software to classify the landslide susceptibility zones into five classes: very low, low, moderate, high, and very high (Zhao and Chen 2020).

Comparison of model performance

Figure 10 shows the ROC curves of the four models for the different datasets. For the training dataset, the AUC values of the four models (RF, CART, BN, and LMT) are 0.875, 0.830, 0.786, and 0.870, respectively. We considered the AUC value based on the training dataset for comparing and validating the models. The AUC value of the validation dataset is equally important. The AUC values of the four models are 0.878, 0.880, 0.886, and 0.893, respectively, for the validation dataset.

In addition to the AUC value, we calculated the standard error, 95% confidence interval, and significance level (p-value). Tables 2 and 3 list the statistical results. In general, the smaller the standard error, the confidence interval, and the p-value, the better the model performance is (Wang et al. 2020b).

Table 2 Parameters of ROC curves using the training data

Full size table

Table 3 Parameters of ROC curves using the validation data

Full size table

Table 4 list the performance of the four machine learning models. The LMT model has the highest performance for predicting landslides (the sensitivity is 93.6% for the training data set and 94.6% for the validation data set). The RF model has the best performance for classifying non-landslide areas for the training data set (specificity = 70. 2%). The RF and LMT models have the same specificity for the validation data set (specificity = 76.3%). The LMT model has the highest accuracy (85.5%), the highest F-score value (86.7%), the highest MCC (0.772), and the highest TSS (0.710) for the validation data set. These results indicate that the LMT model has superior performance, followed by the RF model and the CART model. The BN model is less stable for the training and validation data sets.

Table 4 Statistical measures of the four models

Full size table

A Chi-square test was used to determine if there were significant differences between the models (Zhao and Chen 2020). Table 5 indicates no significant differences between the RF and LMT models (the Chi-square value is 0.021, and the p-value is 0.885). The other five groups showed significant differences.

Table 5 Results of chi-Square test of models

Full size table

Discussion

Significant climate change in recent years has produced frequent heavy rainfall, increasing the number of landslides. Therefore, an assessment of landslide susceptibility in the study area can identify areas susceptible to landslide disasters to enable the government to warn residents and manage susceptible areas (Lee et al. 2019). The present study introduces various aspects in the field of landslide susceptibility assessment. It systematically integrates four advanced machine learning algorithms RF, CART, LMT, and BN to comprehensively analyze and map landslide-prone areas. By evaluating these models and optimizing their parameters, the research provides a comparative framework that highlights the strengths and weaknesses of each approach. After consulting the scientific literature, 16 factors influencing landslide were selected: elevation, plan curvature, profile curvature, slope aspect, slope angle, slope length, TPI, TRI, CI, distance to roads, distance to rivers, rainfall, NDVI, soil type, lithology, and land use. The relationship between landslide occurrence and the 16 factors was analyzed by calculating the FR. The meticulous factor selection process, guided by a multicollinearity analysis, ensures that relevant and non-correlated variables are considered. Moreover, the study's emphasis on parameter optimization enhances the accuracy and robustness of the models' predictions. The investigation of dominant factors through variable importance analysis enhances the understanding of local landslide triggers. By generating multiple susceptibility maps, the research acknowledges the variability of model outputs, paving the way for potential integrated modeling approaches. The study showcases a systematic and rigorous methodology, offering novel insights into effective landslide susceptibility assessment techniques that can be adapted and extended to diverse geographical settings.

Researchers have proposed various landslide susceptibility modeling methods in recent years, resulting in high variability of prediction accuracy (Akgun 2012). Several novel machine learning methods have outperformed traditional methods (Tien Bui et al. 2012), and these methods have been combined with GIS analysis. Therefore, it is critical to compare new landslide susceptibility modeling methods. This study evaluated four machine learning methods for landslide susceptibility mapping. The model parameters were optimized according to the classification accuracy and the area under the ROC curve. However, the model performance cannot be evaluated solely based on the AUC value. Studies have shown that the AUC value may be high, but the prediction accuracy may not necessarily be sufficient (Gutiérrez et al. 2013). Therefore, other statistical parameters were included to evaluate model performance. It was found that the performance of the machine learning models was substantially affected by the parameters. The performance stability of the RF, CART, and LMT models was higher than that of the BN model. The RF and LMT models had the best performance and stability in this study. However, more research is required on the optimization of the model parameters because of the large workload of adjusting the model parameters and the wide range of factors.

Determining the optimal number of influencing factors is a critical problem in landslide susceptibility assessments. Different study areas may have different landslide susceptibility factors (Van Westen et al. 2003). No unified standard has been developed to date to solve this problem. We selected 16 landslide susceptibility factors according to the literature and the geological characteristics of the study area. The variable importance must be considered because of the large number of factors influencing landslides. Figure 11 shows that the slope angle, TRI, and distance to rivers were the most important influencing factors of existing landslides in the study area, although the remaining 13 factors also contributed to landslide susceptibility. Many studies have shown that the slope angle significantly influenced landslides (Pourghasemi and Rossi 2017). Generally, landslides are more likely in areas with high slopes than in areas with low slopes (Kumar and Anbalagan 2019). In Yanchuan County, the undulating terrain and deep valleys contribute to variations in slope angles, leading to areas with higher angles being more prone to landslides. The TRI value reflects the degree of surface fluctuation and erosion (Zhang et al. 2019). Rivers typically erode the slope foot and adversely affect slope stability; thus, the distance to rivers is an important factor in the study area. Rivers can significantly impact slope stability by eroding the base of slopes and weakening their structural integrity. Areas closer to rivers are more vulnerable to landslides due to increased erosion and potential undercutting of slopes. In Yanchuan County, the proximity to rivers and their erosional forces likely contributes to the heightened susceptibility observed in these areas.

The four machine learning models showed differences in stability. Therefore, the landslide susceptibility maps generated by the four models exhibit differences. In addition, the factor selection strategy used in this study is applicable to other areas.

The performance evaluation of the four models in this study reveals distinct advantages and limitations for each other. The RF model demonstrates high stability and competitive accuracy, making it suitable for complex interactions among variables. Its ability to handle large datasets and nonlinear relationships contributes to robust predictions. However, its parameter optimization process can be time-consuming. The CART model is characterized by its simplicity and interpretability, making it useful for identifying dominant factors. Yet, it may suffer from overfitting and instability, especially with complex datasets. The LMT exhibit remarkable accuracy, particularly in classifying positive instances, and their logical structure aids in explaining results. However, they can be sensitive to parameter settings and may require substantial data preprocessing. The BN model approach leverages probabilistic relationships, incorporating prior knowledge effectively. Still, it can be constrained by the assumption of conditional independence and may necessitate thorough domain expertise for proper structure specification. While RF and LMT offer strong overall performance, each model's suitability depends on the specific context, data availability, and desired interpretability, highlighting the importance of selecting models based on trade-offs between accuracy, stability, and simplicity.

Conclusions

The present study systematically examined landslide susceptibility in Yanchuan County, China, using four machine learning models: Random Forest (RF), Classification and Regression Trees (CART), Logistic Model Trees (LMT), and Bayesian Network (BN). Through a rigorous analysis of 16 influential factors and parameter optimization, the RF and LMT models emerged as the most stable and accurate choices for predicting landslides. The comprehensive analysis of landslide susceptibility in Yanchuan County highlighted the significant importance of slope angle, TPI, and distance to rivers as key contributing factors. These factors exhibited strong influence on landslide occurrence, reflecting the intricate interplay of terrain characteristics, erosion dynamics, and hydrological conditions in shaping the susceptibility landscape. Their prominent roles underscored the critical need for effective management and mitigation strategies, as well as the importance of considering localized geological and topographical features in landslide susceptibility assessments.

The landslide susceptibility maps generated by RF and LMT provided valuable insights for disaster prevention and mitigation in the study area. The study highlights the significance of data preprocessing, model parameter adjustment, and the multifaceted nature of landslide prediction. While each model presents distinct advantages and limitations, the RF and LMT models stood out for their robust performance and applicability to complex scenarios. The research contributes to advancing our understanding of landslide susceptibility assessment and underscores the importance of model selection based on accuracy, stability, and interpretability considerations.

Despite its valuable contributions, this study has certain limitations that should be acknowledged. Firstly, the study relies on the assumption that the selected factors are the sole contributors to landslide susceptibility, potentially overlooking other unaccounted variables. Secondly, while parameter optimization enhances model performance, the process can be time-consuming and resource-intensive, potentially limiting its practical implementation on larger scales. Additionally, the study employs a single-model approach to generate landslide susceptibility maps, which may not fully capture the complexity of real-world scenarios that could benefit from integrated models. Moreover, the study's focus on Yanchuan County's unique geological and environmental conditions might limit the generalizability of findings to other regions with distinct characteristics. Lastly, while comprehensive evaluation metrics are used, the assessment does not consider temporal variations in landslide occurrences, which could affect model robustness over time. Recognizing these limitations provides avenues for future research to address these challenges and refine the methodologies for more accurate and broadly applicable landslide susceptibility assessments.

In summary, this study provides a reasonable basis for landslide prediction. The conclusions and results of this study can be used to prevent landslides and reduce the harm caused by landslide disasters in the study area and other areas with similar geological environment conditions.

References

Akgun A (2012) A comparison of landslide susceptibility maps produced by logistic regression, multi-criteria decision, and likelihood ratio methods: a case study at zmir, Turkey. Landslides 9:93–106
Article Google Scholar
Allouche O, Tsoar A, Kadmon R (2006) Assessing the accuracy of species distribution models: prevalence, kappa and the true skill statistic (TSS). J Appl Ecol 43:1223–1232
Article Google Scholar
Aslam B, Zafar A, Khalil U (2023) Comparative analysis of multiple conventional neural networks for landslide susceptibility mapping. Nat Hazards 115:673–707. https://doi.org/10.1007/s11069-022-05570-x
Article Google Scholar
Bai S-B, Wang J, Lü G-N, Zhou P-G, Hou S-S, Xu S-N (2010) GIS-based logistic regression for landslide susceptibility mapping of the Zhongxian segment in the Three Gorges area China. Geomorphology 115:23–31
Article Google Scholar
Ballabio C, Sterlacchini S (2012) Support vector machines for landslide susceptibility mapping: the Staffora River Basin case study, Italy. Math Geosci 44:47–70
Article Google Scholar
Bovenga F, Pasquariello G, Pellicani R, Refice A, Spilotro G (2017) Landslide monitoring for risk mitigation by using corner reflector and satellite SAR interferometry: The large landslide of Carlantino (Italy). CATENA 151:49–62
Article Google Scholar
Breiman L (2001) Random forests. Machine Learn 45:5–32
Article Google Scholar
Bui DT, Pradhan B, Lofman O, Revhaug I, Dick OB (2012) Spatial prediction of landslide hazards in Hoa Binh province (Vietnam): a comparative assessment of the efficacy of evidential belief functions and fuzzy logic models. Catena 96:28–40
Article Google Scholar
Bui DT, Tuan TA, Klempe H, Pradhan B, Revhaug I (2016) Spatial prediction models for shallow landslide hazards: a comparative assessment of the efficacy of support vector machines, artificial neural networks, kernel logistic regression, and logistic model tree. Landslides 13:361–378
Article Google Scholar
Bui DT, Ngo P-TT, Pham TD, Jaafari A, Minh NQ, Hoa PV, Samui P (2019) A novel hybrid approach based on a swarm intelligence optimized extreme learning machine for flash flood susceptibility mapping. Catena 179:184–196
Article Google Scholar
Cao W, Pan D, Xu Z, Fu Y, Zhang W, Ren Y, Nan T (2023) Landslide disaster vulnerability mapping study in Henan Province: Comparison of different machine learning models. Bull Geol Sci Technol. https://doi.org/10.19509/j.cnki.dzkq.tb20230338.
Chapi K, Singh VP, Shirzadi A, Shahabi H, Bui DT, Pham BT, Khosravi K (2017) A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ Model Softw 95:229–245
Article Google Scholar
Chen W, Yang Z (2023) Landslide susceptibility modeling using bivariate statistical-based logistic regression, naïve Bayes, and alternating decision tree models. Bull Eng Geol Env 82:190. https://doi.org/10.1007/s10064-023-03216-1
Article Google Scholar
Chen W, Xie X, Peng J, Wang J, Duan Z, Hong H (2017) GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Na ve-Bayes tree, and alternating decision tree models. Geomat Nat Haz Risk 8:950–973
Article Google Scholar
Chen W, Peng J, Hong H, Shahabi H, Pradhan B, Liu J, Zhu A-X, Pei X, Duan Z (2018) Landslide susceptibility modelling using GIS-based machine learning techniques for Chongren County, Jiangxi Province, China. ScTEn 626:1121–1135. https://doi.org/10.1016/j.scitotenv.2018.01.124
Article CAS Google Scholar
Chen W, Hong H, Panahi M, Shahabi H, Wang Y, Shirzadi A, Pirasteh S, Alesheikh AA, Khosravi K, Panahi S (2019a) Spatial prediction of landslide susceptibility using gis-based data mining techniques of anfis with whale optimization algorithm (woa) and grey wolf optimizer (gwo). Appl Sci 9:3755
Article Google Scholar
Chen W, Pradhan B, Li S, Shahabi H, Rizeei HM, Hou E, Wang S (2019b) Novel hybrid integration approach of bagging-based fisher¡¯s linear discriminant function for groundwater potential analysis. Nat Resour Res 28:1239–1258
Article Google Scholar
Chen W, Fan L, Li C, Pham BT (2020a) Spatial prediction of landslides using hybrid integration of artificial intelligence algorithms with frequency ratio and index of entropy in Nanzheng County, China. Appl Sci 10:29
Article CAS Google Scholar
Chen W, Li Y, Xue W, Shahabi H, Li S, Hong H, Wang X, Bian H, Zhang S, Pradhan B (2020b) Modeling flood susceptibility using data-driven approaches of na ve bayes tree, alternating decision tree, and random forest methods. Sci Total Environ 701:134979
Article CAS Google Scholar
Dang V-H, Dieu TB, Tran X-L, Hoang N-D (2019) Enhancing the accuracy of rainfall-induced landslide prediction along mountain roads with a GIS-based random forest classifier. Bull Eng Geol Env 78:2835–2849
Article Google Scholar
Deng R, Zhang Q, Liu W, Chen L, Tan J, Gao Z, Zheng X (2024) Collapse susceptibility evaluation based on an improved two-step sampling strategy and a convolutional neural network. Bull Geol Sci Technol 43(2):186–200. https://doi.org/10.19509/j.cnki.dzkq.tb20220535
Article Google Scholar
ESRI (2014) ArcGIS desktop: release 10.2 Redlands, CA: Environmental Systems Research Institute
Felicísimo ÁM, Cuartero A, Remondo J, Quirós E (2013) Mapping landslide susceptibility with logistic regression, multiple adaptive regression splines, classification and regression trees, and maximum entropy methods: a comparative study. Landslides 10:175–189
Article Google Scholar
Frank E, Hall AM, Witten HI (2016) The WEKA Workbench. Online Appendix for "Data Mining: Practical Machine Learning Tools and Techniques", Morgan Kaufmann, Fourth Edition
Ge Y, Liu G, Tang H, Zhao B, Xiong C (2023) Comparative analysis of five convolutional neural networks for landslide susceptibility assessment. Bull Eng Geol Env 82:377. https://doi.org/10.1007/s10064-023-03408-9
Article Google Scholar
Gheisari S, Meybodi MR (2016) Bnc-pso: structure learning of bayesian networks by particle swarm optimization. Inf Sci 348:272–289
Article Google Scholar
Gomez H, Kavzoglu T (2005) Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng Geol 78:11–27
Article Google Scholar
Gorsevski PV, Gessler PE, Foltz RB, Elliot WJ (2006) Spatial prediction of landslide hazard using logistic regression and ROC analysis. Trans GIS 10:395–415
Article Google Scholar
Gudiyangada Nachappa T, TavakkoliPiralilou S, Ghorbanzadeh O, Shahabi H, Blaschke T (2019) Landslide susceptibility mapping for Austria using Geons and optimization with the Dempster-Shafer theory. Appl Sci 9:5393
Article Google Scholar
Guo Y, Dou J, Xiang Z, Ma H, Dong A, Luo W (2024) Susceptibility evaluation of Wenchuan coseismic landslides by gradient boosting decision tree and random forest based on optimal negative sample sampling strategies. Bull Geol Sci Technol 43(3):251–265. https://doi.org/10.19509/j.cnki.dzkq.tb20230037
Article Google Scholar
Gutiérrez JA, Carvalheiro LG, Polce C, van Loon EE, Raes N, Reemer M, Biesmeijer JC (2013) Fit-for-purpose: species distribution model performance depends on evaluation criteria CDutch hoverflies as a case study. PloS One 8:e63708
Article Google Scholar
Haque U, Da Silva PF, Devoli G, JR Pilz, Zhao B, Khaloua A, Wilopo W, Andersen P, Lu P, Lee J (2019) The human cost of global warming: deadly landslides and their triggers (1995-2014). Sci Total Environ 682:673–684
Article CAS Google Scholar
He Q, Xu Z, Li S, Li R, Zhang S, Wang N, Pham BT, Chen W (2019) Novel entropy and rotation forest-based credal decision tree classifier for landslide susceptibility modeling. Entropy 21:106
Article Google Scholar
Hong H, Naghibi SA, Dashtpagerdi MM, Pourghasemi HR, Chen W (2017) A comparative assessment between linear and quadratic discriminant analyses (LDA-QDA) with frequency ratio and weights-of-evidence models for forest fire susceptibility mapping in China. Arab J Geosci 10:167
Article Google Scholar
Hong H, Liu J, Zhu A-X (2020) Modeling landslide susceptibility using LogitBoost alternating decision trees and forest by penalizing attributes with the bagging ensemble. ScTEn 718:137231
CAS Google Scholar
Kavzoglu T, Colkesen I, Sahin EK (2019) Machine learning techniques in landslide susceptibility mapping: A survey and a case study. Landslides: Theory, Practice and Modelling. Springer. pp 283–301
Khan H, Shafique M, Khan MA, Bacha MA, Shah SU, Calligaris C (2019) Landslide susceptibility assessment using Frequency Ratio, a case study of northern Pakistan. Egyptian J Remote Sens Space Sci 22:11–24
Google Scholar
Kim J-C, Lee S, Jung H-S, Lee S (2018) Landslide susceptibility mapping using random forest and boosted tree models in Pyeong-Chang, Korea. Geocarto Int 33:1000–1015
Article Google Scholar
Kumar R, Anbalagan R (2019) Landslide susceptibility mapping of the Tehri reservoir rim area using the weights of evidence method. J Earth Syst Sci 128:153
Article Google Scholar
Lagomarsino D, Tofani V, Segoni S, Catani F, Casagli N (2017) A tool for classification and regression using random forest methodology: applications to landslide susceptibility mapping and soil thickness modeling. Environ Model Assess 22:201–214
Article Google Scholar
Landwehr N, Hall M, Frank E (2005) Logistic model trees. Machine Learn 59:161–205
Article Google Scholar
Larraaga P, Poza M, Yurramendi Y, Murga RH, Kuijpers CMH (1996) Structure learning of Bayesian networks by genetic algorithms: A performance analysis of control parameters. IEEE Trans Pattern Anal Machine Intell 18:912–926
Article Google Scholar
Lee S-H (2010) Landslide susceptibility analysis using bayesian network and semantic technology. J Korean Soc Geospatial Inform Syst 18:61–69
Google Scholar
Lee S (2013) Landslide detection and susceptibility mapping in the Sagimakri area, Korea using KOMPSAT-1 and weight of evidence technique. Environ Earth Sci 70:3197–3215
Article Google Scholar
Lee S, Lee M-J, Jung H-S, Lee S (2019) Landslide susceptibility mapping using na ve bayes and bayesian network models in Umyeonsan, Korea. Geocarto Int 34:1–15
Google Scholar
Li R, Wang N (2019) Landslide susceptibility mapping for the Muchuan county (China): A comparison between bivariate statistical models (woe, ebf, and ioe) and their ensembles with logistic regression. Symmetry 11:762
Article CAS Google Scholar
Manzo G, Tofani V, Segoni S, Battistini A, Catani F (2013) GIS techniques for regional-scale landslide susceptibility assessment: the Sicily (Italy) case study. Int J Geogr Inf Sci 27:1433–1452
Article Google Scholar
Markham IS, Mathieu RG, Wray BA (2000) Kanban setting through artificial intelligence: a comparative study of artificial neural networks and decision trees. Integr Manuf Syst 11:239–246
Article Google Scholar
Meng Q, Miao F, Zhen J, Wang X, Wang A, Peng Y, Fan Q (2016) GIS-based landslide susceptibility mapping with logistic regression, analytical hierarchy process, and combined fuzzy and support vector machine methods: a case study from Wolong Giant Panda Natural Reserve, China. Bull Eng Geol Env 75:923–944
Article Google Scholar
Mokarram M, Roshan G, Negahban S (2015) Landform classification using topography position index (case study: salt dome of Korsia-Darab plain, Iran). Model Earth Syst Environ 1:40
Article Google Scholar
Mondal P, Liu X, Fatoyinbo TE, Lagomasino D (2019) Evaluating Combinations of Sentinel-2 Data and Machine-Learning Algorithms for Mangrove Mapping in West Africa. Remote Sens 11:2928
Article Google Scholar
Mosavi A, Ozturk P, Chau K-w (2018) Flood prediction using machine learning models: Literature review. Water 10:1536
Article Google Scholar
Neuh user B, Damm B, Terhorst B (2012) GIS-based assessment of landslide susceptibility on the base of the weights-of-evidence model. Landslides 9:511–528
Article Google Scholar
Nguyen V-T, Tran TH, Ha NA, Ngo VL, Nadhir A-A, Tran VP, Duy Nguyen H, Malek MA, Amini A, Prakash I (2019a) GIS based novel hybrid computational intelligence models for mapping landslide susceptibility: A case study at Da Lat City, Vietnam. Sustainability 11:7118
Article Google Scholar
Nguyen VV, Pham BT, Vu BT, Prakash I, Jha S, Shahabi H, Shirzadi A, Ba DN, Kumar R, Chatterjee JM (2019b) Hybrid machine learning approaches for landslide susceptibility modeling. Forests 10:157
Article Google Scholar
Oh H-J, Kadavi PR, Lee C-W, Lee S (2018) Evaluation of landslide susceptibility mapping by evidential belief function, logistic regression and support vector machine models. Geomat Nat Haz Risk 9:1053–1070
Article Google Scholar
Ohlmacher GC, Davis JC (2003) Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng Geol 69:331–343
Article Google Scholar
Othman A, Gloaguen R, Andreani L, Rahnama M (2015) Landslide susceptibility mapping in Mawat area, Kurdistan Region, NE Iraq: a comparison of different statistical models. Nat Hazards Earth Syst Sci Discuss 3:1789–1833
Google Scholar
Pham BT, Bui DT, Pourghasemi HR, Indra P, Dholakia M (2017a) Landslide susceptibility assesssment in the Uttarakhand area (India) using GIS: a comparison study of prediction capability of na ve bayes, multilayer perceptron neural networks, and functional trees methods. Theoret Appl Climatol 128:255–273
Article Google Scholar
Pham BT, Khosravi K, Prakash I (2017b) Application and comparison of decision tree-based machine learning methods in landside susceptibility assessment at Pauri Garhwal Area, Uttarakhand, India. Environ Process 4:711–730
Article Google Scholar
Pham BT, Prakash I, Bui DT (2018) Spatial prediction of landslides using a hybrid machine learning approach based on random subspace and classification and regression trees. Geomorphology 303:256–270
Article Google Scholar
Pham BT, Prakash I (2017) A novel hybrid intelligent approach of random subspace ensemble and reduced error pruning trees for landslide susceptibility modeling: A Case Study at Mu Cang Chai District, Yen Bai Province, Viet Nam. International Conference on Geo-Spatial Technologies and Earth Resources. Springer. pp 255–269
Pourghasemi HR, Rossi M (2017) Landslide susceptibility modeling in a landslide prone area in Mazandarn Province, north of Iran: a comparison between GLM, GAM, MARS, and M-AHP methods. Theoret Appl Climatol 130:609–633
Article Google Scholar
Powers DM (2011) Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation
Rahmati O, Kornejady A, Samadi M, Deo RC, Conoscenti C, Lombardo L, Dayal K, Taghizadeh-Mehrjardi R, Pourghasemi HR, Kumar S (2019) PMT: New analytical framework for automated evaluation of geo-environmental modelling approaches. ScTEn 664:296–311
CAS Google Scholar
Rao S, Leng X (2024) Debris flow susceptibility evaluation of Liangshan Prefecture based on the RSIV-RF model. Bull Geol Sci Technol 43(1):275–287. https://doi.org/10.19509/j.cnki.dzkq.tb20220267
Article Google Scholar
Saha A, Villuri VGK, Bhardwaj A (2022) Development and Assessment of GIS-Based Landslide Susceptibility Mapping Models Using ANN, Fuzzy-AHP, and MCDA in Darjeeling Himalayas, West Bengal. India Land 11:1711
Article Google Scholar
Saha A, Villuri VGK, Bhardwaj A (2023a) Development and assessment of a novel hybrid machine learning-based landslide susceptibility mapping model in the Darjeeling Himalayas. Stoch Env Res Risk Assess. https://doi.org/10.1007/s00477-023-02528-8
Article Google Scholar
Saha A, Villuri VGK, Bhardwaj A, Kumar S (2023b) A Multi-Criteria Decision Analysis (MCDA) Approach for Landslide Susceptibility Mapping of a Part of Darjeeling District in North-East Himalaya. India 13:5062
CAS Google Scholar
Saha A, Tripathi L, Villuri VGK, Bhardwaj A (2024) Exploring machine learning and statistical approach techniques for landslide susceptibility mapping in Siwalik Himalayan Region using geospatial technology. Environ Sci Pollut Res 31:10443–10459. https://doi.org/10.1007/s11356-023-31670-7
Article Google Scholar
Shirzadi A, Soliamani K, Habibnejhad M, Kavian A, Chapi K, Shahabi H, Chen W, Khosravi K, Thai Pham B, Pradhan B (2018) Novel GIS based machine learning algorithms for shallow landslide susceptibility mapping. Sensors 18:3777
Article Google Scholar
Song Y, Gong J, Gao S, Wang D, Cui T, Li Y, Wei B (2012) Susceptibility assessment of earthquake-induced landslides using Bayesian network: a case study in Beichuan, China. Comput Geosci 42:189–199
Article Google Scholar
Steinberg D (2009) CART: classi cation and regression trees. The top ten algorithms in data mining. Chapman and Hall/CRC. pp 193–216
Swets JA (1988) Measuring the accuracy of diagnostic systems. Science 240:1285–1293
Article CAS Google Scholar
Temkin NR, Holubkov R, Machamer JE, Winn HR, Dikmen SS (1995) Classification and regression trees (CART) for prediction of function at 1 year following head trauma. J Neurosurg 82:764–771
Article CAS Google Scholar
Thai Pham B, Shirzadi A, Shahabi H, Omidvar E, Singh SK, Sahana M, Talebpour Asl D, Bin Ahmad B, Kim Quoc N, Lee S (2019) Landslide susceptibility assessment by novel hybrid machine learning algorithms. Sustainability 11:4386
Article Google Scholar
Tien Bui D, Pradhan B, Lofman O, Revhaug I (2012) Landslide susceptibility assessment in vietnam using support vector machines, decision tree, and Naive Bayes Models. Math Problems Eng 2012:26
Article Google Scholar
Tingyao J, Dinglong W (2013) A landslide stability calculation method based on Bayesian network. 2013 2nd International Symposium on Instrumentation and Measurement, Sensor Network and Automation (IMSNA). IEEE. pp 905–908
Trigila A, Iadanza C, Esposito C, Scarascia-Mugnozza G (2015) Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 249:119–136
Article Google Scholar
Truong XL, Mitamura M, Kono Y, Raghavan V, Yonezawa G, Truong XQ, Do TH, Tien Bui D, Lee S (2018) Enhancing prediction performance of landslide susceptibility model using hybrid machine learning approach of bagging ensemble and logistic model tree. Appl Sci 8:1046
Article Google Scholar
Van Westen C, Rengers N, Soeters R (2003) Use of geomorphological information in indirect landslide susceptibility assessment. Nat Hazards 30:399–419
Article Google Scholar
Wang Q, Li W, Chen W, Bai H (2015) GIS-based assessment of landslide susceptibility using certainty factor and index of entropy models for the Qianyang County of Baoji city, China. J Earth Syst Sci 124:1399–1415
Article Google Scholar
Wang Q, Li W, Wu Y, Pei Y, Xie P (2016) Application of statistical index and index of entropy methods to landslide susceptibility assessment in Gongliu (Xinjiang, China). Environ Earth Sci 75:599
Article Google Scholar
Wang G, Chen X, Chen W (2020a) Spatial prediction of landslide susceptibility based on GIS and discriminant functions. ISPRS Int J Geo Inf 9:144
Article Google Scholar
Wang G, Lei X, Chen W, Shahabi H, Shirzadi A (2020b) Hybrid computational intelligence methods for landslide susceptibility mapping. Symmetry 12:325
Article Google Scholar
Westreich D, Lessler J, Funk MJ (2010) Propensity score estimation: neural networks, support vector machines, decision trees (CART), and meta-classifiers as alternatives to logistic regression. J Clin Epidemiol 63:826–833
Article Google Scholar
Wilde M, Günther A, Reichenbach P, Malet J-P, Hervás J (2018) Pan-European landslide susceptibility mapping: ELSUS Version 2. J Maps 14:97–104
Article Google Scholar
Williams CJ, Lee SS, Fisher RA, Dickerman LH (1999) A comparison of statistical methods for prenatal screening for Down syndrome. Appl Stoch Model Bus Ind 15:89–101
Article Google Scholar
Wischmeier WH, Smith DD (1978) Predicting rainfall erosion losses: a guide to conservation planning. Department of Agriculture, Science and Education Administration
Wu Y, Ke Y, Chen Z, Liang S, Zhao H, Hong H (2020) Application of alternating decision tree with AdaBoost and bagging ensembles for landslide susceptibility mapping. Catena 187:104396
Article Google Scholar
Wu L, Yin K, Zeng T, Liu, Shuhao L, Zhenyi (2024) Evaluation of geological disaster susceptibility of transmission lines under different grid resolutions. Bull Geol Sci Technol 43(1):241–252. https://doi.org/10.19509/j.cnki.dzkq.tb202203
Article Google Scholar
Yang Z-h, Lan H-x, Gao X, Li L-p, Meng Y-s, Wu Y-m (2015) Urgent landslide susceptibility assessment in the 2013 Lushan earthquake-impacted area, Sichuan Province, China. Nat Hazards 75:2467–2487
Article Google Scholar
Youssef AM, Pourghasemi HR, Pourtaghi ZS, Al-Katheeri MM (2016) Landslide susceptibility mapping using random forest, boosted regression tree, classification and regression tree, and general linear models and comparison of their performance at Wadi Tayyah Basin, Asir Region, Saudi Arabia. Landslides 13:839–856
Article Google Scholar
Zhang T, Han L, Chen W, Shahabi H (2018) Hybrid integration approach of entropy with logistic regression and support vector machine for landslide susceptibility modeling. Entropy 20:884
Article Google Scholar
Zhang T, Han L, Han J, Li X, Zhang H, Wang H (2019) Assessment of Landslide Susceptibility Using Integrated Ensemble Fractal Dimension with Kernel Logistic Regression Model. Entropy 21:218
Article Google Scholar
Zhang W, Chen H, Ji C, Yang Q, Xi w, Sun X, Zhang Y, Yu T, Ni B, Xu Z, Li D (2023) Landslide Susceptibility Assessment in the Alpine and Canyon Areas based on Ascending and Descending InSAR Data. Bull Geol Sci Technol. https://doi.org/10.19509/j.cnki.dzkq.tb20230560
Zhao X, Chen W (2020) Gis-based evaluation of landslide susceptibility models using certainty factors and functional trees-based ensemble techniques. Appl Sci 10:16
Article Google Scholar
Zhu L, Huang J-f (2006) GIS-based logistic regression method for landslide susceptibility mapping in regional scale. J Zhejiang University-Sci A 7:2007–2017
Article Google Scholar
Zhu A-X, Miao Y, Liu J, Bai S, Zeng C, Ma T, Hong H (2019) A similarity-based approach to sampling absence data for landslide susceptibility mapping using data-driven methods. Catena 183:104188
Article Google Scholar

Download references

Funding

This study was supported by the Innovation Capability Support Program of Shaanxi (Program No. 2020KJXX-005).

Author information

Authors and Affiliations

College of Geology and Environment, Xi’an University of Science and Technology, Xi’an, 710054, China
Wei Chen, Chao Guo & Fanghao Lin
School of Highway, Chang’an University, Xi’an, 710064, China
Ruixin Zhao
School of Mining & Civil Engineering Liupanshui Normal University, Liupanshui, 553000, Guizhou, China
Tao Li
Laboratory of Engineering Geology and Hydrogeology, Department of Geological Sciences, School of Mining and Metallurgical Engineering, National Technical University of Athens, 15780, Zografou, Greece
Paraskevas Tsangaratos & Ioanna Ilia

Authors

Wei Chen
View author publications
You can also search for this author in PubMed Google Scholar
Chao Guo
View author publications
You can also search for this author in PubMed Google Scholar
Fanghao Lin
View author publications
You can also search for this author in PubMed Google Scholar
Ruixin Zhao
View author publications
You can also search for this author in PubMed Google Scholar
Tao Li
View author publications
You can also search for this author in PubMed Google Scholar
Paraskevas Tsangaratos
View author publications
You can also search for this author in PubMed Google Scholar
Ioanna Ilia
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Wei Chen.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Communicated by: Hassan Babaie

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article

Cite this article

Chen, W., Guo, C., Lin, F. et al. Exploring advanced machine learning techniques for landslide susceptibility mapping in Yanchuan County, China. Earth Sci Inform (2024). https://doi.org/10.1007/s12145-024-01455-8

Download citation

Received: 28 May 2024
Accepted: 14 August 2024
Published: 27 August 2024
DOI: https://doi.org/10.1007/s12145-024-01455-8

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Exploring advanced machine learning techniques for landslide susceptibility mapping in Yanchuan County, China

Abstract

Similar content being viewed by others

GIS-based landslide susceptibility mapping in the Longmen Mountain area (China) using three different machine learning algorithms and their comparison

Spatial prediction of landslide susceptibility using GIS-based statistical and machine learning models in Wanzhou County, Three Gorges Reservoir, China

Selecting optimal conditioning parameters for landslide susceptibility: an experimental research on Aqabat Al-Sulbat, Saudi Arabia

Introduction

Study area

Data sources and parameters