Abstract
This study introduces four rock–soil characteristics factors, that is, Lithology, Rock Structure, Rock Infiltration, and Rock Weathering, which based on the properties of rock formations, to predict Landslide Susceptibility Mapping (LSM) in Three Gorges Reservoir Area from Zigui to Badong. Logistic regression, artificial neural network, support vector machine is used in LSM modeling. The study consists of three main steps. In the first step, these four factors are combined with the 11 basic factors to form different factor combinations. The second step randomly selects training (70% of the total) and validation (30%) datasets out of grid cells corresponding to landslide and non-landslide locations in the study area. The final step constructs the LSM models to obtain different landslide susceptibility index maps and landslide susceptibility zoning maps. The specific category precision, receiver operating characteristic curve, and 5 other statistical evaluation methods are used for quantitative evaluations. The evaluation results show that, in most cases, the result based on Rock Structure are better than the result obtained by traditional method based on Lithology, have the best performance. To further study the influence of rock–soil characteristic factors on the LSM, these four factors are divided into “Intrinsic attribute factors” and “External participation factors” in accordance with the participation of external factors, to generate the LSMs. The evaluation results show that the result based on Intrinsic attribute factors are better than the result based on External participation factors, indicating the significance of Intrinsic attribute factors in LSM. The method proposed in this study can effectively improve the scientificity, accuracy, and validity of LSM.
Similar content being viewed by others
Explore related subjects
Discover the latest articles, news and stories from top researchers in related subjects.Introduction
China is located in the eastern part of the Asian continent and has active geotectonic movement and a complex geological environment. Its climate varies from cold temperate to a tropical climate, it is densely populated, and there are extensive and unrestrained engineering activities. It is a country that suffers from frequent landslides1. According to data from the National Land and Resources Bulletins and National Geological Hazard Notifications issued by the National Bureau of Statistics of China and the Ministry of Natural Resources of China, there were 379,596 geological hazard events in China during the 19 years from 2001 to 2019, and a total of 21,241 people were injured, missing, or died due to these events. The direct economic loss was as much as RMB 77.87 billion. Among the geological hazard events, there were 274,204 landslides, accounting for 72.2% of the total number of geological hazard events, as shown in Fig. 1.
The Three Gorges Reservoir Area (TGRA) is located in western Hubei and the mountainous areas of Chongqing. It has complex geological and topographic conditions, and suffers frequent occurrences of geological hazards. According to surveys and statistics of the Ministry of Natural Resources, there are over 2000 landslides in the entire TGRA. Among them, there are at least 250 large-scale landslides with total volume of nearly 2.39 billion cubic meters, accounting for 79% of the total number of geological hazard events in the entire TGRA5.
The theoretical basis of Landslide Susceptibility Mapping (LSM) is engineering-geological analogy, as a non-deterministic method. It differs from the deterministic method that uses the traditional slope failure mechanics calculation model and predicts the risk of a single landslide combined with basic spatial data6. Globally, researchers have conducted many studies on the formation of landslides. With the increasing understanding of landslides, researchers have realized that it is not enough to study the mechanism of single landslide. As landslides feature characteristics of banding and patchy distribution, more attention needs to be paid to research on regional landslides, and LSM is an effective method for the same. With the beginning of the twenty-first century, there has been increasing research on LSMs. Asmelash et al. took the Tamale region on the edge of the rift valley in central Ethiopia as the study area and used Weighted Linear Combination to calculate the Landslide Susceptibility Index (LSI) through the distribution weights and grades given by the analytic hierarchy process. The landslide prediction precision analysis result was about 88.6%7. Riegel and Alves applied the Forward Logistic Regression model and the Conditional Analysis model in Sicily, Italy, with the aim to identify the statistical correlation between the spatial distribution of landslides and the past controlling factors. Their studies proved that it was feasible to perform regional LSMs based on factor graphs and landslide records, which can save a lot of time and money8. Shahri et al. used the Artificial Neural Network (ANN) to draw a large-scale LSM of southwestern Sweden. The high-precision results obtained using the ANN model were important and cost-effective with potential significance for urban planners9. Pandey, Pourghasemi, and Sharma generated the landslide-prone area along Tirpari of Gawal-Himalayas to the Gutu corridor to study the performance of Maximum Entropy and Support Vector Machine (SVM) in LSM. The results of the Receiver Operating Characteristic (ROC) curve showed that SVM had a better prediction rate10. Chen et al. set Longhai City in China as the study area and used Priority Decision Tree, Random Forest (RF), and Naive Bayes (NB) models to generate LSMs. The ROC curve results showed that the RF model had the highest accuracy, with an Area Under the Curve (AUC) value of 0.86911. Yu et al. considered the spatial correlation of LSM factors, and used the Geographically Weighted Regression model to segment the study area. The LSM was generated with the combination of the Particle Swarm Optimization algorithm and SVM. Compared with the traditional model—not considering the spatial correlation—the results had been significantly improved6,12. Sameen, Pradhan, and Lee set the south area of Yangyang City in South Korea as the study area. They used the Bayes Optimization algorithm to obtain the hyperparameters of the One-Dimensional Convolution Neural Network (1D-CNN). And the 1D-CNN was used to generate the LSM of the study area. The results showed that the performance of the 1D-CNN model was better than ANN and SVM, its AUC and prediction precision were the most accurate at 0.893 and 83.11%, respectively13. Fang et al. set Yongxin County, Ji’an City, Jiangxi Province as the study area and generated LSMs integrating CNN with three traditional machine learning classifiers, namely, SVM, RF, and Logistic Regression (LR). The experimental results showed that integrating CNN can effectively improve the performance of machine learning classifiers. As for the precision of LSMs, the proposed three methods showed significant improvement compared with traditional machine learning classifiers14. Nhu et al. set the Halong region in Vietnam as the study area and studied the ability of Keras’s Deep Learning (DL) model to model the landslide hazard space, and compared it with traditional methods such as RF, J48 Decision Tree, Classification Tree, and Logical Model Tree. The results showed that the prediction precision of Keras’s DL model was 84.0%, indicating that its performance was better than those of traditional methods15. The LSM theory has been consistently developing, and has experienced a process from qualitative research to semi-qualitative research, and then to quantitative research1. In addition, with the continuous improvement of data acquisition methods and the sharing degree, research on LSM has led to the introduction of the DL model that requires massive amounts of data and complex calculations, greatly promoting the development of LSM theories and applications. Realizing the importance of models for LSM, many researchers devote themselves to the research of LSM models and developed many LSM models with higher accuracy, which often implies that the model is overfitting or the it takes a lot of time to adjust the model parameters. However, few researchers have focused on the underlying elements of landslide events, namely various LSM factors.
Recent years, most researchers continue to focus on rainfall-induced and earthquake-induced landslides in research on LSM factors16,17,18,19,20,21,22. Moreover, many researchers have focused on other types of LSM factors and explored the influence of LSM factors on LSMs. Lee et al. used web-based digital aerial photographs to generate the LSM in the Jinbu area of South Korea. According to the characteristics of landslides, Lee considered factors related to landslides, such as geomorphology, soil, forest, geology, and land use. The type, average diameter, density, age of trees on the landslide mass were also used as input factors. The RF, Weights of Evidence, LR, and ANN models were used to generate the LSMs. Their overall satisfaction levels were over 80%23. Chen, Pourghasemi, and Naghibi studied the influences of 12 factors on the landslides in the study area. They used the precision measurement of the RF model and realized that the most important factors in the study area were lithology, fault distance, and altitude. In the Gini measurement method, the three most important factors were altitude, fault distance, and distance to highway24. Pawluszek and Borkowski took the Rhone Lake region in Poland as the study area and studied the Digital Elevation Model (DEM) and the terrain conditions calculated by DEM to discuss the influence of DEM-derived factors on LSMs25. Skilodimou et al. used the mountainous area in the northern Peloponnese in Southern Greece as the study area and applied the statistical analysis of landslide frequency and density to evaluate the collected data along with Geographic Information System (GIS), thus determining the influence of natural and human factors on landslide activities26. Al-Najjar et al. extracted 14 LSM factors in the study area and divided them into 4 groups based on different methods. Then, they used three machine learning algorithms—RF, NB, and Enhanced Logistic Regression to generate the LSMs. The research results showed that the fourth group with 8 factors selected by factor analysis and optimization methods had the highest AUC value27. Yu and Gao used GIS and Remote Sensing (RS) theories to extract 58 LSM factors in the study area. By combining Pearson Correlation Coefficient (PCC), the Principal Component Analysis, and the factor importance analysis, a total of 18 LSM factors were obtained. They also generated the LSM of the TGRA6. Mind’je et al. combined 10 factors without multicollinearity with the Frequency Ratio method to generate the LSM of Rwanda. After analysis, it was concluded that parts of the western, northern, and southern regions were the most susceptible areas for landslides and the altitude was the main influencing factor28. Bourenane et al. studied the landslide disaster in Azazga city, the statistical results show that the landslides in this area are affected by the dip of the flysch formation layers, the schistosity planes and fractures downward slope direction, and the interface contact between the quaternary scree and flysch substratum29. Tang et al. argued that the differences of different types of landslides should be considered when mapping landslide susceptibility, and used loess landslides as a research object. The results of the study showed that rainfall and land use are the keys to predict the occurrence of loess landslides and avalanches30. Huang et al. discussed the influence of different attribute intervals (AINs) numbers on the frequency ratio (FR) analysis of continuous environmental factors and the uncertainty of landslide susceptibility prediction (LSP), the results showed that for a certain model, the LSP accuracy gradually increases with the AINs increasing from 4 to 8, and then the accuracy is stable with the AINs increasing from 8 to 2031. Huang et al. used machine learning methods such as C5.0 decision tree, LR, multilayer perceptron, and SVM to study the effect of soil erosion (SE) on landslide events in Ningdu County, China. The results show that the SE-based model has higher prediction accuracy than a single model without SE factors32. Although many scholars have tried to expand the scope of LSM factors and explore the relationship between factors and LSMs without being limited by traditional influences of rainfall and earthquakes on LSMs, they have achieved different results. These factors and corresponding influences are often concentrated on the influencing factors—the internal conditions formed by landslides are considered as controlling factors, and the external conditions formed by landslides are considered as influencing factors—and there is a lack of discussion on controlling factors.
The purpose of this study is to discuss the influence of rock–soil characteristics factors on LSM, and take the TGRA as the study area. We try to explore the influence of rock–soil characteristics factors and their specific combinations on LSM. We use several statistical methods to evaluate the effectiveness of LSM based on rock–soil characteristics factors, which provides meaningful information for further research.
Study area and data source
Overview of study area
The TGRA is located in the transition area from the second step to the third step among the three major steps of Chinese terrain. The study area is in the eastern part of the two natural geographic units of the TGRA, starting from Xinling Town in Badong County and ending at Quyuan Town in Zigui County. It spans about 54 km from east to west and about 16 km from north to south33, as shown in Fig. 2. The strata in the study area are well developed, with outcrops from Sinian to Quaternary, and only a few stratigraphic gaps34. The structural features in the study area—folds and faults—are mainly formed from the late Yanshan movement to the early Himalayan movement, they form the basic structural background for the evolution and development of the TGRA, and include the Huangling anticline, Zigui syncline, Xiannv Mountain fault, Jiuwanxi fault, Niukou fault, and Xiangluping fault35, as shown in Fig. 3. The study area has a well-developed water system, and the density of rivers is as high as 1.2 km per square kilometer6. The study area belongs to the mid-latitude subtropical monsoon climate zone, affected by the alternate control of tropical ocean air masses and polar continental air masses. The temperature and rainfall vary significantly with seasons. The average annual rainfall in Badong is 1093 mm, while the average annual rainfall in Zigui is 1274 mm36. Since the study area is mostly mountainous, the vegetation (including arable land, shrubs, and woodland, etc.) is lush and occupies the largest area of 258.8 km2, accounting for 52.1% of the total area. The water system is well developed, with an area of 120.2 km2, accounting for 24.2%. The area of wasteland is affected by the seasons, with a smaller area of 24.1 km2, accounting for 18.9%. And the artificial impervious surface (including houses, roads, and bridges, etc.) are mainly concentrated in the area located in the northwest of the study area in Badong County and in the southeast near Zigui County, with an area of 24.1 km2, accounting for 4.8%.
Landslide inventory mapping
The existing landslide database shows the spatial distribution of landslide events in the study area, which is also helpful to understand the relationship between LSM factor and landslide occurrence37. TGRA landslide inventory map is produced through extensive field investigation, landslide history and bibliographic data on landslides, and visual interpretation of remote sensing images. In this study, 202 historical landslide locations were extracted from the study area, and the distribution is shown in Fig. 4.
Landslide disasters occurred frequently in the study area, especially after the establishment of TGRA. Affected by the reservoir water storage and rainfall, some ancient landslides revived and new landslide disaster were produced. The total area of landslides in the study area is 23.4 km2, of which the largest is Fanjiaping landslide with an area of 1.51 km2, and the smallest is Lianhua Street landslide with an area of 2068.8 m2.
It can be seen from Fig. 4 that the landslide disasters in the study area are distributed along the Yangtze River, showing the characteristics of banding and patchy distribution, and there are compound landslides formed by the superposition of multiple landslides, such as Taijiaozi, Shizishubao, Tanjiawan, Huanglashi, Hualianshu, Zhangjialiangzi landslide group on the left bank of the Yangtze River.
Data
The data sources used in this paper are shown in Table 1.
The scale of topographic maps and geological maps are 1:50,000, the scale of landslide distribution maps is 1:10,000, and fully matches the precision requirements of RS data sources and DEM data sources with a spatial resolution of 30 m. In order to ensure that data of different scales/resolutions can be used properly in this study, without being affected by the data of different scales/resolutions on the modeling process, the lowest precision of the available data, i.e., Landsat 8 and DEM data with 30-m resolution, was taken as the research precision. While other data, such as geological maps, topographic maps, and landslide maps, are higher than the research precision, they are resampled to 30-m resolution data.
Methods
Factor analysis model
PCC analysis
In the field of statistics, the PCC also known as the Pearson Product-Moment Correlation Coefficient (PPMCC), which was proposed by Karl Pearson38. It is a common method used to measure the degree of linear correlation between two variables39, as shown in Formula 1.
where cov refers to the covariance, and E refers to the mathematical expectation, X, Y are the individual sample points, μX, μY are the sample means, σX, σY are the sample standard deviations.
The value of PCC is between − 1 and 1. A positive value represents a positive correlation, and a negative value represents a negative correlation40. The larger the value, the greater the correlation, and vice versa41. In LSM, it is necessary to remove LSM factors with strong correlations (with the value > 0.7) to ensure that the selected factors are relatively independent39.
Multicollinearity analysis
Multicollinearity analysis is one of the methods to evaluate the correlation between LSM factors. Multicollinearity means that the characteristics of at least two predictive features exhibit a high degree of linear correlation in the multiple regression42. Variance expansion factor (VIF) and tolerance (TOL) can be used to objectively calculate multicollinearity results, where VIF and TOL are reciprocal to each other. In the case of LSM, if the VIF coefficient of a factor is greater than 10 or the TOL coefficient is less than 0.1, the corresponding factor can be considered to have multicollinearity and should be excluded from the subsequent modeling process43.
Relief-F analysis
The Relief-F method evaluates the LSM factor value by calculating the correlation between the LSM factor and the landslide to determine the relative importance of the factor to the occurrence of the landslide44. The Relief-F algorithm will randomly select a sample R from the training set D, and construct sample sets H and M by using k-nearest neighbor samples with a sample label of R and different labels from R, respectively45. For factor set A, the weight of the ith factor is calculated by Formula 2.
where C is the sample label, p(C) is the probability of class C, Class (R) is the sample label of class R, Mj(C) is the jth sample of class C, the diff (Ai, R, Hi) and diff (Ai, R, Mi(C)) are distance functions, and the factor importance will be calculated after repeating this process m times.
Classifiers
LR model
LR is a multivariate statistical method, which uses logic function to model binary dependent variables46. The principle of LR is to transform each LSM factor into a logical variable, and then use the maximum likelihood estimation to obtain the probability value of each factor for the occurrence of the landslide events47. The output prediction of LR is defined as Formula 348.
where p is the probability and z is the linear combination of variables, as show in Formula 4.
where (β0, β1, β2, …, βn) and (X1, X2, …, Xn) are the regression variables and the explanatory variables, respectively.
ANN model
ANN is a computational program that simulates the work of the human brain49. The goal of the ANN model is to create a method for predicting the output of input factors that are not used in the modeling process50. The standard ANN model consists of three layers: an input layer (i.e., LSM factor), a hidden layer, and an output layer (i.e., LSM). In the training step, the network uses the weights and bias values of the samples to predict the labels of each sample, the cost function finds the difference between the computed labels and the true labels, while in the backpropagation step, each weight receives an update based on the gradient of the cost function, and this process continues until the convergence condition is met or the maximum number of epochs is reached in training50,51. A sketch of the ANN architecture is shown in Fig. 5.
SVM model
The SVM model was first proposed by Vapnik (1995) as a machine learning algorithm, and was established based on the VC dimension theory and the principle of minimum structural risk52. It seeks the best compromise between the complexity of the model and the learning ability based on limited sample information, to obtain the best generalization ability. It have many unique advantages in small sample, nonlinear, and high-dimensional pattern recognitions52,53. Due to the relatively small number of landslide samples and the large number of LSM factors in the study area, the SVM model is used as the LSM model in this paper.
SVM can be a binary classification model. In order to find an n-dimensional hyperplane, this model divides the statistical samples to ensure the distance between the sample point closest to the hyperplane and the dividing line is the largest53. In other words, it is a space classifier that maximizes the interval between sample points. Its function is defined as show in Formula 5.
where xi refers to the point on the hyperplane, yi refers to the classification mark, i = 1, 2, …, R, R refers to the number of samples, w refers to the vector perpendicular to the hyperplane, b refers to the constant to prevent the hyperplane from passing through the origin of the coordinate axis, ||w|| is 2-norm of w.
Formula 6 introduces a non-negative slack variable ζi, however, a penalty factor C must also be introduced to represent the distance from a misclassified point to its correct position. Therefore, Formula 6 can be expressed as:
For the problem of transforming training samples into n-dimensional space, Vapnik considered K (xi, yi) as a kernel function and introduced SVM. The essence of this kernel function is a mapping function. Its basic function is to accept vectors in the low-dimensional space and calculate the inner product value of vectors in the n-dimensional space after a certain transformation, that is, it can map low-dimensional inseparable linear training samples to n-dimensional space and make them linearly separable52. In this paper, the Radial Basis Function is selected as the kernel function of the SVM model to map vectors in the low-dimensional space to the high-dimensional feature space for classification. The function can be expressed as Formula 7.
where γ refers to the nuclear parameters of different radial basis functions.
Result evaluation model
Specific category precision analysis
The traditional quantitative analysis method of LSM is based on Landslide Susceptibility Zoning (LSZ), which calculates proportion of the area of the landslide in each type of landslide-prone zones using the landslide distribution data. The analysis result is based on the proportion of the landslide area in the highest susceptibility zones to the total area of the landslides. However, when the prediction results of the model are polarized, and many areas in the LSZ belong to the highest susceptibility zone, it is natural that most landslides are in the highest susceptibility zone, which will lead to a better result of the model. Obviously, this cannot be used to verify the effect of the method, and it is not appropriate to the quantitative analysis of LSM.
The specific category precision analysis method is an improved quantitative analysis method that was used to solve the above problems12. In this paper, the specific category precision method takes into account the number of calculation units in the classification area, and can be expressed as Formula 8:
where i = 1, 2, …, S, S refers to the number of LSZ classification, Ai is the number of calculation units occupied by landslides in the ith LSZ classification, Bi is the number of calculation units in the ith LSZ classification, and pi is the specific category precision in the ith LSZ classification.
ROC curve and AUC value
The ROC curve analysis is a classic method in statistical theory and is also a method commonly used to analyze LSM54. This method mainly analyzes the classification results of the binary classification model55. The ROC curve is in the form of coordinates on a rectangular coordinate system, describes the process of classifier performance as the classifier threshold changes, with a value range of [0,1]. Each point on the curve reflects the sensitivity to the same signal. The horizontal axis is the specificity of the False Positive Rate (FPR), the vertical axis is the sensitivity of the True Positive Rate (TPR). There are 4 situations: (1) the result is a positive type and the prediction is also positive, it is a True Positive (TP); (2) the result is a negative type and the prediction is positive, it is a False Positive (FP); (3) the result is a negative type and the prediction is also negative, it is a True Negative (TN); and (4) the result is a positive type and the prediction is negative, it is a False Negative (FN)6, as shown in Table 2.
The AUC value is calculated by the ROC curve. This indicator is widely used in studies in different disciplines and has been tested in various precision prediction models. The prediction effect has also been widely recognized.
Five statistical measures
In addition to the specific category accuracy analysis, ROC curve analysis and AUC value mentioned above, five statistical methods, including overall accuracy (OA), precision, recall, F-measure, and Matthews correlation coefficient (MCC), were used to evaluate the calculation results of the model14. The formulas of these 5 evaluation methods are as Formula 9–13.
where the TP, FP, TN, and FN are the same as the definitions in “ROC curve and AUC value” section.
Experimental process
Selection of calculation units
According to the conclusions of Guzzetti et al., all LSM calculation units were summarized into the following five types: grid unit, geographic unit, single condition unit, slope unit, and sub-basin unit56. Based on the research purpose, the grid unit is selected as the calculation unit in this paper. After invalid data is extracted and eliminated using ArcGIS 10.5 developed by ESRI, a total of 422,242 valid calculation units in study area are finally obtained.
Selection of factors
Selection of basic factors
Based on previous research results, eleven factors in the study area are selected in this paper, including Elevation, Slope, Aspect, Slope Form, Slope Structures, Distance from River, Topographic Wetness Index (TWI), Stream Power Index (SPI), Rainfall, Landuse, and Normalized Difference Vegetation Index (NDVI). These eleven factors can be divided into two major categories, that is, landslide controlling factors and influencing factors, and four sub-categories, that is, topography, basic geology, hydrological conditions, and land cover. The effect of each factor on landslides is shown in Table 3.
Rock–soil characteristic factors
Rock–soil mass is the material basis for landslides. Rock–soil mass with different characteristics has diverse effects on the development of landslides. It not only affects the development degree of landslides in the study area, but also determines the type and scale of landslides. It is an important controlling factor for landslides1. The structure and composition of rock–soil mass in the study area constitute unique rock–soil characteristic factors. Based on the characteristics of the rock–soil mass in the study area, the properties of rock–soil mass are summarized into four rock–soil characteristic factors: lithology, Rock Structure, rock infiltration, and rock weathering. The effects of each rock–soil characteristic factor on landslides are shown in Table 4.
In the National Standards of the People's Republic of China, the classification standards for these four factors are specified in detail65,66. Therefore, combining the existing lithological data of the study area, the distribution map of these four geotechnical characteristics can be obtained.
For example, in the lithological factors, hard rocks represented by the Huanglong Formation, soft rocks represented by the Liantuo Formation, and soft-hard alternating rocks represented by the Qianfuya Formation. In the rock structure factors, massive structure represented by the Maokou Formation, stratified structure represented by the Liangshan Formation, cataclastic structure represented by the Tongzhuyuan Formation, and granular structure represented by the Badong Formation. In the rock weathering factors, slightly weathered represented by the Qixia Formation, weakly weathered represented by the Danying Formation, strongly weathered represented by the Shilongdong Formation, and completely weathered represented by the upper part of the Jialingjiang Formation. In the rock infiltration factors, very slightly permeable represented by the four sections of the Badong Formation, slightly permeable represented by the Daye Formation, weakly permeable represented by the Penglaizhen Formation, moderately permeable represented by the Tianhepan Formation, and strongly permeable represented by the Nantuo Formation.
Factor correlation analysis
To ensure the relative independence of the selected factors, IBM SPSS Statistics software is used to perform PCC analysis on the 15 factors above, to evaluate existence of strong correlation between the factors and ensure the accuracy of the LSMs. The PCC matrix between each factor is shown in Fig. 6.
It can be seen from Fig. 6 that the correlation between the factors is relatively low. The highest correlation appears between TWI and SPI, which is only 0.489, showing a weak correlation. It has no adverse effects on the establishment of the LSM model.
Factor multicollinearity analysis
In order to ensure that there is no multicollinearity in the selected factors in the study, which affects the calculation of the weight of the factors and causes the inaccuracy of LSM, all factors must be checked for multicollinearity. The results of multicollinearity of all factors in this study are shown in the Table 5.
It can be seen from Table 5 that VIF values of all factors are less than 10 and TOL values are greater than 0.1, so there is no multicollinearity in the selected factors in this study.
Factor Relief-F analysis
Through Relief-F calculation, the factors that are not important to the occurrence of landslide events can be eliminated from the selected factors, the number of input factors for modeling can be reduced, the redundancy of model calculations can be eliminated, and the accuracy of LSM can be improved. The Relief-F coefficients of each LSM factor are shown in Fig. 7.
As shown in Fig. 7, although the Relief-F coefficients of some LSM factors are very low, for example, NDVI factor is only 0.06, but the coefficients of all factors are greater than 0, so all LSM factors are retained.
The final LSM factors
The LSM factors in the study area are finally obtained after selection and analysis, as shown in Table 6 and Fig. 8.
LSM model based on three classifiers and rock–soil characteristic factors
The establishment process of the LSM model based on three classifiers and rock–soil characteristic factors is as follows:
-
1.
After calculation units and LSM factors are selected, it is necessary to establish the training sample set and the validation sample set of the LSM model. In this paper, the landslides in the study area are randomly divided according to the traditional ratio of 70:30. In other words, 141 landslides are used as the landslide distribution data in the training samples, and the remaining 61 landslides are only existed as validation samples, as shown in Fig. 9.
-
2.
After the landslides are randomly divided, there are 141 landslides with 19,077 calculation units (the remaining 61 landslides had 14,796 calculation units) that are distributed in study area. Moreover, a buffer with 3 times raster resolution (90 m) is set up for these 141 landslides to solve the problem of landslide distribution data deviations due to inaccurate landslide surface depictions. Then, the landslides distribution data is marked as “1”, namely, landslide; the area outside the buffer area and within the study area is marked as “0”, namely, non-landslide. All the landslide calculation units are selected, and the same number of non-landslide calculation units as landslide calculation units are randomly selected, forming training sample points composed of 38,154 calculation units, to participate in the training and modeling of the LR model, the ANN model, and the SVM model.
-
3.
Eleven basic factors including Elevation, Slope, Aspect, Slope Form, Slope Structures, Distance from River, TWI, SPI, Rainfall, Landuse, and NDVI, as well as Lithology, which is the commonly used rock–soil characteristic factor, are selected to form a combination of traditional factors. The values of these factors are assigned into the training sample points established in the previous step to obtain the training sample set, and then input into the three models for modeling.
-
4.
All the calculation units (422,242) are used as the total sample set. All calculation units, except for the landslide training sample set involved in the modeling, were used as validation data set (403,165). Based on the model built in the previous step, the membership degree of each calculation unit to the landslide is obtained through calculations, which refers to the probability of a landslide occurring in each calculation unit. Consequently, the LSM based on traditional factor combinations is obtained.
-
5.
As the LSM model is sensitive to the input factors, only the Lithology factor in traditional factor combinations is replaced with Rock Structure factor, Rock Infiltration factor, and Rock Weathering factor, in order to further discuss the influence of different rock–soil characteristic factors on the LSMs. Then, in a classifier model, three sample sets of LSM models are established based on different rock–soil characteristic factor combinations, generating three groups of LSMs for comparison and analysis with traditional factor combinations.
-
6.
Repeat step (5) with different classifiers to obtain LSMs based on different classifiers and different sample sets, so as to study the influence of rock–soil characteristic factors on LSM and the stability and universality of this influence.
-
7.
The four rock–soil characteristic factors are classified into two categories, that is, Intrinsic attribute factors and External participation factors, to generate the LSMs by three classifiers, in order to further study the influence of the introduction of different rock–soil characteristic factors on the LSM.
The flow chart is shown in Fig. 10.
Results and analysis
Experimental results of traditional LSI based on lithology
After LSM factors analysis and three classifiers modeling, the LSM model is established, and the traditional LSMs based on Lithology factor are finally obtained. The LSI is a continuous variable used to express the LSM. Its value ranges from 0 to 1. The larger the value, the greater the occurrence probability of landslides, and vice versa, as shown in Fig. 11.
Experimental results of LSI based on the other three rock–soil characteristic factors
As mentioned in Step (5) in “LSM model based on three classifiers and rock–soil characteristic factors” section, in order to better study the influence of different rock–soil characteristic factors on the LSM, based on the principle of single variate with other conditions remaining unchanged, only the Lithology factor in traditional factor combinations is replaced with Rock Structure factor, Rock Infiltration factor, and Rock Weathering factor. Modeling with three classifiers, the nine LSIs are shown in Fig. 12.
Experimental results of LSZ based on four rock–soil characteristic factors
To increase the readability of the LSIs, areas are reclassified into 5 categories in this paper—very low susceptibility, low susceptibility, medium susceptibility, high susceptibility, and very high susceptibility, according to the value range of 0–0.5, 0.5–0.75, 0.75–0.85, 0.85–0.95, and 0.95–1, respectively. The LSZs based on four different rock–soil characteristic factors are obtained, as shown in Fig. 13.
Experimental results analysis
To quantitatively analyze the LSMs obtained in “Experimental results of LSZ based on four rock–soil characteristic factors” section, the evaluation methods mentioned in “Classifiers” section are used. It should be noted that in the analysis of the three different sample sets, including the overall sample set, the training sample set, and the validation sample set, the definitions of landslide and non-landslide are not the same, therefore, the results of these three sample sets are not comparable.
Specific category precision analysis
The specific category precision analysis of the LSMs based on different rock–soil characteristic factors and different classifiers is shown in Table 7.
It can be seen from Table 7 that for the very high susceptibility area, the results of traditional methods based on lithological factors are not all the best in the specific category accuracy analysis of three different sample sets and three different classifiers. It ranked second in the overall and training sample sets (46.44%, 40.98%), and third in the validation sample set (14.72%) in the LR model, following result based the rock structure factor (47.19%, 41.24%, and 16.09%), which slightly better than the result based on the rock weathering factor and rock infiltration factor. The results are similar in the SVM model, the best results are based on rock structure factor (51.87%, 45.54%, and 19.44%), followed by the rock weathering factor (50.95%, 45.04%), rock infiltration factor (50.55%, 44.32%) and lithology factor (50.13%, 43.41%) in the overall and training sample sets, respectively, and followed by the lithology factor (19.23%), rock infiltration factor (18.47%), and rock weathering factor (17.98%) in the validation sample set, respectively. However, the results in the ANN model are different. In the ANN model, the top ranking factor in the training sample set is the rock structure factor (49.68%), followed by the lithology factor (49.55%), rock infiltration factor (49.46%), and rock weathering (48.37%), which is similar to the results of other models, but in the overall and validation sample sets, the top ranking factor is the lithology factor (56.17%, 23.04%), followed by rock structure (56.01%, 22.24%), rock infiltration (54.94%, 19.44%) and rock weathering (53.87%, 18.77%), respectively.
ROC curve and AUC value
The ROC curve of the LSMs based on different rock–soil characteristic factors and different classifiers, is shown in Fig. 14.
In the ROC curve, the curve closer to the upper left corner indicates higher precision. It can be seen from Fig. 14 that in the ROC curve analysis of the three different sample sets and three different classifiers, the rock infiltration factor results are the worst. In specifically, in the LR model, that is, Fig. 14a–c, the ROC curves based on rock structure factor and rock weathering factor are closest to the upper left corner, while the ROC curve based on rock lithology is slightly further. In the ANN model, that is, Fig. 14d–f, the ROC curve based on lithology factor and rock structure are closer to the upper left corner, while the ROC curve based on rock weathering factor further. In the SVM model, that is, Fig. 14g–i, the result is similar to the ANN model.
To better understand the results of the ROC curve analysis, AUC is used to quantitatively analyze the ROC curve, as shown in Table 8.
The conclusion of Table 8 and Fig. 14 is consistent, and have more details than ROC curve analysis. As show in Fig. 14, in the Table 8, the performance of rock infiltration factor in different sample sets and different classifiers is the worst. For the LR model, the AUC values based on rock structure factor and rock weathering factor are the same (0.887, 0.851), slightly better than the AUC values based on lithology factor (0.886, 0.848) in overall and validation sample sets, and in the training sample set, the AUC value based on lithology factor is the best (0.915), slightly better than the AUC values based on rock structure factor and rock weathering factor (0.914). For the ANN model, the AUC values based on rock structure factor (0.918, 0.949) are better than the AUC values based on the lithology factor (0.917, 0.947) and rock weathering factor (0.911, 0.872) in the overall and training sample sets, however, the AUC value based on the lithology factor (0.879) is the best, followed by the AUC values based on the rock structure factor (0.877) and rock weathering factor (0.872) in the validation sample set. For the SVM model, the AUC values based on the lithology factor is the best in three sample sets (0.918, 0.949, and 0.877), and the AUC values base on rock structure factor (0.917, 0.948, and 0.877) is almost equal to them, which is better than the AUC value based on rock weathering factor (0.914, 0.945, and 0.876).
Five statistical methods
In order to analyze the statistical results of LSM results obtained with different sample sets and different classifiers, the calculation results of five statistical methods, including OA, precision, recall, F-measure, and MCC are shown in Table 9. In this section, only the overall sample set is used.
The Table 9 shows that, in the LR model, the results of statistical calculation based on rock structure factor are the best (80.49%, 0.2666, 0.8179, 0.4022, and 0.4371), followed by the result based on rock weathering factor (80.60%, 0.2678, 0.8181, 0.4035, and 0.4383), lithology factor (80.36%, 0.2643, 0.8122, 0.3988, and 0.4343), and rock infiltration factor (79.73%, 0.2582, 0.8148, 0.3921, and 0.4287). In the ANN model, the results are slightly different. The results of statistical calculation based on rock structure factor are still the best (84.82%, 0.3242, 0.8225, 0.4651, and 0.4898), but the second and third ranking has changed. The results based on lithology factor ranked second (84.51%, 0.3205, 0.8309, 0.4626, and 0.4881), rock weathering factor ranked third (83.91%, 0.3107, 0.8248, 0.4513, and 0.4784), and rock infiltration ranked last (83.10%, 0.2976, 0.8131, 0.4357, and 0.4650). Compared with the other two models, the results have changed a lot in the SVM model. The results of statistical calculation based on lithology factor are the best (85.51%, 0.3377, 0.8391, 0.4815, and 0.5043), followed by the results based on rock structure factor (85.28%, 0.3334, 0.8354, 0.4766, and 0.5000), the rock weathering factor (84.68%, 0.3229, 0.8295, 0.4649. and 0.4900), and rock infiltration factor (84.33%, 0.3174, 0.8280, 0.4589, and 0.4849).
Summary of experimental results based on 4 rock–soil characteristic factors
In the specific category accuracy analysis, a total of nine results were obtained from three sample sets and three classifiers. Among these nine results in the very high susceptibility category, the best ones are the results based on rock structure factor and the lithology factor (7 and 2 times, respectively), and the second ranked results are those based on lithology factor, rock weathering factor and rock structure factor (4, 3 and 2 times, respectively). In this analysis, it can be found that the results based on rock structure did not appear in the third and fourth place, that is, its influence on LSM will be more stable, while the traditional lithology factor, which appeared in the first, second and fourth place, indicates the instability of its influence on LSM.
In the AUC value, which also obtains nine results from three sample sets and three classifiers, due to the occurrence of the same AUC value, the results based on rock structure factor, lithology factor and rock weathering factor are ranked first (5 times, 5 times and 2 times, respectively), and the second ranked results based on rock structure factor, lithology factor and rock weathering factor (7 times, 3 times and 3 times, respectively). In this analysis, it can be observed that although the number of occurrences of rock structure factor and lithology factor are the same in the first place of the results, just like the specific categories accuracy analysis, the results based on rock structure factor are more concentrated and ranked higher than those of lithology factor, indicating that rock structure factors have a greater influence on LSM.
Among the five statistical methods, the study only analyzed the overall sample set, therefore, only three results were obtained. The results show that the top ranking is the result based on rock structure factor and lithology factor (2 times and 1 time, respectively), which also indicates statistically that the rock structure factor is more important than the lithology factor in the LSM.
In summary, although in some cases, the results based on rock structure factors are not the best, and often alternate with the result based on lithology factors in the first place, but in most cases, the results based on rock structure factors are the best. It shows that its stability in LSM and its influence on LSM are better than lithological factors.
Experimental results and analysis based on two rock–soil characteristic factor combinations
To further study the influence of the introduction of different rock–soil characteristic factors on the LSM, the four rock–soil characteristic factors are classified into two categories— “Intrinsic attribute factors” and “External participation factors”. According to Table 4, Lithology factor and Rock Structure factor are the internal attributes of rock soil, and are determined by the nature of rock–soil mass, while Rock Infiltration factor and Rock Weathering factor can be realized with the participation of external conditions (water, wind, sunlight, air, and so on). Based on this, two new factor combinations can be obtained, namely, the “Basic factors and intrinsic attribute factors” and “Basic factors and external participation factors”. According to the new factor combinations and three classifiers, the LSI based on the intrinsic attribute factor and the external participation factor can be obtained, as shown in Fig. 15.
Similarly, to better interpret the LSIs, the results are reclassified with the interval of 0–0.5, 0.5–0.75, 0.75–0.85, 0.85–0.95, and 0.95–1.0, and the LSZs are obtained, as shown in Fig. 16.
To analyze the two groups of LSMs based on different types of rock–soil characteristic factor combinations, the two evaluation methods and five statistical methods mentioned in the “Result evaluation model” section are still adopted in this section. The results are shown in Fig. 17 and Table 10.
From a qualitative perspective, it can be seen from Fig. 17 that the ROC curve of the LSM based on the “Intrinsic attribute factors” is closer to the upper left corner than the ROC curve of the LSM based on the “External participation factors”, indicating that its prediction effect is better.
From a quantitative perspective, as can be seen in Table 10, whether it is specific category precision analysis, ROC curve analysis, or five statistical methods, the LSM based on intrinsic attribute factors is better than that based on external participation factors in three classifiers. It shows that the intrinsic attribute factor has a more important role and influence on LSM than the external participation factors.
A comprehensive analysis of the data in Tables 7, 8, 9 and 10 reveal some interesting phenomena. Compared to the results based on rock structure factors and traditional lithology factor, the LSM based on intrinsic attribute factors has a significant improvement in the evaluation method used in the study. There is no doubt that the improvement of the evaluation results is due to the fact that the intrinsic attribute factors can express rock–soil characteristics more comprehensively, but it also needs to be noted that it may also be due to more factors participating in the LSM model during the modeling process.
In view of the latter possible problem, it can be seen from the data in Tables 7, 8, 9 and 10 that the LSM based on external participation factors is not all better than the results when the four rock–soil characteristics factors are combined with the basic factors alone; for example, in the overall sample set of the LR model, the LSM based on external participation factors are only slightly better than those based on rock permeability and rock weathering, and it even gives the worst results in the overall sample set of the ANN model.
It shows that the improvement of the LSM is not only due to the increase in the number of factors participating in the LSM modeling, but it is more closely related to the significance of the factors participating in the modeling of the development and occurrence of landslides.
Discussions
LSM plays a vital role in the management and prevention of landslide disasters. Therefore, it is very important to improve the accuracy of prediction and help managers and decision makers obtain more accurate LSI and LSZ45. To this end, this study expands the lithology factor, which is traditionally considered as the only factor representing rock–soil characteristics among geological factors. According to the national standards of the People’s Republic of China65,66, three new rock–soil characteristic factors have been obtained: rock structure, rock weathering and rock infiltration, and using the traditional LR model, ANN model and SVM model to obtain different LSMs.
In general, morphological, geological, and hydrological conditions are highly correlated with landslide occurrence6,45. In this study, PCC coefficients, multicollinearity and Relief-F methods were used for factor screening and examination to ensure the validity of LSM factors. Elevation was found to be the most critical LSM factor when using the Relief-F method because it determines the stress distribution on slopes and is associated with human activities that affect landslide stability. These observations are consistent with previous studies45,68.
Once the LSM factor is determined, the basic factors can be combined with the rock–soil characteristic factors to obtain LSM based on different rock–soil characteristic factors by constructing different sample sets and using different classifiers. Two evaluation methods (specific category accuracy analysis, ROC curve analysis and AUC value) and five statistical methods (OA, Precision, Recall, F-measure, MCC) are used to evaluate LSM results. The experimental results show that the traditional method simply considered lithology as the only rock–soil characteristic factor, which is one-sided or even wrong. In most cases, the results based on the rock structure factor are better than those based on lithology factor. From all the experiments, the former ranked more highly and concentrated, indicating that rock structure factor has more influence on the LSM, and this influence is more stable.
To further verify the influence of rock–soil characteristic factors on LSMs, the four rock–soil characteristic factors in the study were classified into internal attribute factors (lithology factor and rock structure factor) and external participation factors (rock weathering factor and rock infiltration factor) according to whether external conditions were involved as classification standard to formed two factor combinations with the basic factors, respectively, and then three classifiers are used to obtain LSMs based on the overall sample set. The experimental results show that the results based on internal attribute factors are better than those based on external participation factors in all evaluation methods. That is to say, in terms of rock–soil characteristics, the influence of internal attributes of the rick–soil mass themselves on LSMs are greater than that with the participation of external conditions. This can be seen very implicitly in previous studies for factor analysis12,32,67.
In recent years, researchers have paid attention to the importance of LSM factors for LSM. Some researchers have found some special LSM factors23, some are committed to studying the LSM factors in different regions24,25,26,28, some are concerned about the influence of different factor combinations on LSMs27, some are attention about the influence of the mathematical attributes of the factors themselves on LSM31, and some are more concerned about the number of LSM factors6. However, little attention has been paid to the influence of rock–soil characteristic factors in LSM. The rock–soil characteristic factors play an important role in the occurrence of landslide, which are very important in geological factors and one of the most essential controlling factors for the development of landslides. Therefore, it is very promising to explore the influence of rock–soil characteristic factors on LSM.
Although in the study, the rock–soil characteristic factors, especially the rock structure factor and the inherent attribute factor composed of the rock structure factor and lithology factor, have achieved good performance in LSM, but there are two points that need to be noted. First, in the national standards of the People's Republic of China, there are many other rock–soil characteristic factors. Whether these factors can be used in LSM research and what kind of influence they have on LSM is worth exploring. Second, researchers all know that in different study areas, influenced by topography, geology, hydrology, meteorology rain, earthquakes, human engineering activities and other factors, each LSM factors often exhibits different importance. Whether rock–soil characteristic factors perform well in other study areas is also the focus of further research.
Conclusion
In traditional LSM studies, researchers tend to include lithology as the only geological factor related to rock–soil characteristic in LSM modeling. Rocks can be classified into hard rocks, soft rocks, and soft-hard interbedded rocks according to their lithology or engineering geological conditions. For a more detailed study, rocks can be further classified into 5–7 or more categories by combining the hardness and proportions of rocks in different strata. However, it is undeniable that regardless of the classification method used, the concept of geotechnical properties is only related to rock hardness. This is a one-sided understanding of the concept of rock–soil characteristic and the influencing factors.
In this study, the application of rock–soil characteristic factors in LSM is researched using the section from Zigui to Badong in the TGRA as the study area. This method, which is based on the stratigraphic properties of the study area, is universal and can be applied to other regions in the world with similar characteristics. All 15 LSM factors used in this study passed the validity checks of PCC coefficients, multicollinearity, and Refile-F methods. Different LSMs were obtained using traditional LR models, ANN models and SVM models based on different combinations of 11 basic factors and 4 geotechnical characteristics factors. The validation of the results was carried out with objective indicators from two evaluation methods and five statistical methods.
The experiments confirmed the following conclusions. First, the proposed rock structure factor and internal attribute factors are more practical for landslide management and prevention than the traditional lithology-based LSM due to the improved accuracy. Second, the results based on the rock structure factor ranked high and concentration in most of the evaluations. Third, the internal attribute factors consisting of rock structure and lithology had the best results in all evaluations. And finally, the evaluation results show that the rock structure factor and internal attribute factors have a stronger and more stable influence on LSM. In summary, LSM factor analysis, especially the rock–soil characteristics factor, is promising for landslide spatial prediction. In the future, our research will investigate more efficient factor analyses for LSM.
Data availability
The public data such as remote sensing data and DEM data, can be downloaded directly through the link provided in Table 1. However, basic geographic data, basic geological data, and landslide distribution data are all confidential data in China. According to the requirements of relevant laws, these confidential data have been decrypted when we use them. Any researchers in related fields that need these decrypted data can contact the corresponding author to obtain them.
References
Yu, X. Study on the Landslide Susceptibility Evaluation Method Based on Multi-source Data and Multi-scale Analysis Doctor thesis (China University of Geosciences, 2016).
Statistics, N. B. o. (National Bureau of Statistics, 2001–2005).
China, M. o. n. r. o. t. p. s. R. o. (2006–2013).
China, M. o. n. r. o. t. p. s. R. o. (2014–2019).
Peng, L. Landslide Risk Assessment in the Three Gorges Reservoir Doctor thesis, (China University of Geosciences, 2013).
Yu, X. & Gao, H. A landslide susceptibility map based on spatial scale segmentation: A case study at Zigui-Badong in the Three Gorges Reservoir Area, China. PLoS ONE 15, e0229818 (2020).
Abay, A., Barbieri, G. & Woldearegay, K. GIS-based landslide susceptibility evaluation using analytical hierarchy process (AHP) approach: The case of Tarmaber District, Ethiopia. Momona Ethiop. J. Sci. 11, 14–36 (2019).
Riegel, R. P. et al. Assessment of susceptibility to landslides through geographic information systems and the logistic regression model. Natural Hazards 103 (2020).
Shahri, A. A., Spross, J., Johansson, F. & Larsson, S. Landslide susceptibility hazard map in southwest Sweden using artificial neural network. CATENA 183, 104225 (2019).
Pandey, V. K., Pourghasemi, H. R. & Sharma, M. C. Landslide susceptibility mapping using maximum entropy and support vector machine models along the Highway Corridor, Garhwal Himalaya. Geocarto Int. 35, 168–187 (2020).
Chen, W., Zhang, S., Li, R. & Shahabi, H. Performance evaluation of the GIS-based data mining techniques of best-first decision tree, random forest, and naïve Bayes tree for landslide susceptibility modeling. Sci. Total Environ. 644, 1006–1018 (2018).
Yu, X., Wang, Y., Niu, R. & Hu, Y. A combination of geographically weighted regression, particle swarm optimization and support vector machine for landslide susceptibility mapping: A case study at Wanzhou in the Three Gorges Area, China. Int. J. Environ. Res. Public Health 13, 487 (2016).
Sameen, M. I., Pradhan, B. & Lee, S. Application of convolutional neural networks featuring Bayesian optimization for landslide susceptibility assessment. CATENA 186, 104249 (2020).
Fang, Z., Wang, Y., Peng, L. & Hong, H. Integration of convolutional neural network and conventional machine learning classifiers for landslide susceptibility mapping. Comput. Geosci. 139, 104470 (2020).
Nhu, V.-H. et al. Effectiveness assessment of keras based deep learning with different robust optimization algorithms for shallow landslide susceptibility mapping at tropical area. CATENA 188, 104458 (2020).
Bai, S., Lu, P. & Thiebes, B. Comparing characteristics of rainfall-and earthquake-triggered landslides in the Upper Minjiang catchment, China. Eng. Geol. 268, 105518 (2020).
Lin, J.-W., Hsieh, M.-H. & Li, Y.-J. Factor analysis for the statistical modeling of earthquake-induced landslides. Front. Struct. Civ. Eng. 14, 123–126 (2020).
Zhao, B., Li, W., Wang, Y., Lu, J. & Li, X. Landslides triggered by the Ms 6.9 Nyingchi earthquake, China (18 November 2017): analysis of the spatial distribution and occurrence factors. Landslides 16, 765–776 (2019).
Maheshwari, B. Earthquake-induced landslide hazard assessment of Chamoli District, Uttarakhand Using relative frequency ratio method. Indian Geotech. J. 49, 108–123 (2019).
Chen, Y. C. et al. Controls of preferential orientation of earthquake-and rainfall-triggered landslides in Taiwan’s orogenic mountain belt. Earth Surf. Proc. Land. 44, 1661–1674 (2019).
Kuradusenge, M., Kumaran, S. & Zennaro, M. Rainfall-induced landslide prediction using machine learning models: The case of Ngororero District, Rwanda. Int. J. Environ. Res. Public Health 17, 4147 (2020).
Segoni, S., Tofani, V., Rosi, A., Catani, F. & Casagli, N. Combination of rainfall thresholds and susceptibility maps for dynamic landslide hazard assessment at regional scale. Front. Earth Sci. 6, 85 (2018).
Lee, S., Song, K.-Y., Oh, H.-J. & Choi, J. Detection of landslides using web-based aerial photographs and landslide susceptibility mapping using geospatial analysis. Int. J. Remote Sens. 33, 4937–4966 (2012).
Chen, W., Pourghasemi, H. R. & Naghibi, S. A. Prioritization of landslide conditioning factors and its spatial modeling in Shangnan County, China using GIS-based data mining algorithms. Bull. Eng. Geol. Env. 77, 611–629 (2018).
Pawluszek, K. & Borkowski, A. Impact of DEM-derived factors and analytical hierarchy process on landslide susceptibility mapping in the region of Rożnów Lake, Poland. Nat. Hazards 86, 919–952 (2017).
Skilodimou, H. D., Bathrellos, G. D., Koskeridou, E., Soukis, K. & Rozos, D. Physical and anthropogenic factors related to landslide activity in the Northern Peloponnese, Greece. Land 7, 85 (2018).
Al-Najjar, H. A. H., Kalantar, B., Pradhan, B. & Saeidi, V. In Earth Resources and Environmental Remote Sensing/GIS Applications X. 111560K (International Society for Optics and Photonics). https://doi.org/10.1117/12.2532687.
Mind’je, R. et al. Landslide susceptibility and influencing factors analysis in Rwanda. Environ. Dev. Sustain. 22, 7985–8012 (2020).
Bourenane, H., Braham, M., Bouhadad, Y. & Meziani, A. A. Spatial distribution, controlling factors and failure mechanisms of the large-scale landslides in the urban area of Azazga city (northern Algeria). Environ. Earth Sci. 80, 1–23 (2021).
Tang, Y. et al. Integrating principal component analysis with statistically-based models for analysis of causal factors and landslide susceptibility mapping: A comparative study from the loess plateau area in Shanxi (China). J. Clean. Prod. 277, 124159 (2020).
Huang, F. et al. Uncertainty study of landslide susceptibility prediction considering the different attribute interval numbers of environmental factors and different data-based models. CATENA 202, 105250 (2021).
Huang, F. et al. Landslide susceptibility prediction considering regional soil erosion based on machine-learning models. ISPRS Int. J. Geo Inf. 9, 377 (2020).
Li, S. et al. Characterizing the spatial distribution and fundamental controls of landslides in the three gorges reservoir area, China. Bull. Eng. Geol. Environ. 78, 4275–4290 (2019).
Guo, Z. et al. Regional rainfall warning system for landslides with creep deformation in Three Gorges using a statistical black box model. Sci. Rep. 9, 1–14 (2019).
Chen, T. et al. Mapping landslide susceptibility at the Three Gorges Reservoir, China, using gradient boosting decision tree, random forest and information value models. J. Mt. Sci. 17, 670–685 (2020).
Tang, H., Wasowski, J. & Juang, C. H. Geohazards in the three Gorges Reservoir Area, China-Lessons learned from decades of research. Eng. Geol. 261, 105267 (2019).
Tsangaratos, P., Ilia, I., Hong, H., Chen, W. & Xu, C. Applying Information Theory and GIS-based quantitative methods to produce landslide susceptibility maps in Nancheng County, China. Landslides 14, 1091–1111 (2017).
Pearson, K. Correlation coefficient. R. Soc. Proc. 58, 214 (1895).
Hong, H., Liu, J. & Zhu, A.-X. Modeling landslide susceptibility using LogitBoost alternating decision trees and forest by penalizing attributes with the bagging ensemble. Sci. Total Environ. 718, 137231 (2020).
Ahlgren, P., Jarneving, B. & Rousseau, R. Requirements for a cocitation similarity measure, with special reference to Pearson’s correlation coefficient. J. Am. Soc. Inform. Sci. Technol. 54, 550–560 (2003).
Zhao, L., Wu, X., Niu, R., Wang, Y. & Zhang, K. Using the rotation and random forest models of ensemble learning to predict landslide susceptibility. Geomat. Nat. Haz. Risk 11, 1542–1564 (2020).
Dormann, C. F. et al. Collinearity: A review of methods to deal with it and a simulation study evaluating their performance. Ecography 36, 27–46 (2013).
Kavzoglu, T., Sahin, E. K. & Colkesen, I. Landslide susceptibility mapping using GIS-based multi-criteria decision analysis, support vector machines, and logistic regression. Landslides 11, 425–439 (2014).
Rendell, L. & Kira, K. in International Conference on Machine Learning, 249–256. https://doi.org/10.1016/B978-1-55860-247-2.50037-1.
Fang, Z., Wang, Y., Peng, L. & Hong, H. A comparative study of heterogeneous ensemble-learning techniques for landslide susceptibility mapping. Int. J. Geogr. Inf. Sci. 35, 321–347 (2021).
Erener, A., Mutlu, A. & Düzgün, H. S. A comparative study for landslide susceptibility mapping using GIS-based multi-criteria decision analysis (MCDA), logistic regression (LR) and association rule mining (ARM). Eng. Geol. 203, 45–55 (2016).
Reichenbach, P., Rossi, M., Malamud, B. D., Mihir, M. & Guzzetti, F. A review of statistically-based landslide susceptibility models. Earth Sci. Rev. 180, 60–91 (2018).
King, G. & Zeng, L. Logistic regression in rare events data. Polit. Anal. 9, 137–163 (2001).
Pradhan, B. & Lee, S. Delineation of landslide hazard areas on Penang Island, Malaysia, by using frequency ratio, logistic regression, and artificial neural network models. Environ. Earth Sci. 60, 1037–1054 (2010).
Kalantar, B., Pradhan, B., Naghibi, S. A., Motevalli, A. & Mansor, S. Assessment of the effects of training data selection on the landslide susceptibility mapping: A comparison between support vector machine (SVM), logistic regression (LR) and artificial neural networks (ANN). Geomat. Nat. Haz. Risk 9, 49–69 (2018).
Sevgen, E., Kocaman, S., Nefeslioglu, H. A. & Gokceoglu, C. A novel performance assessment approach using photogrammetric techniques for landslide susceptibility mapping with logistic regression, ANN and random forest. Sensors 19, 3940 (2019).
Vapnik, V. N. The Nature of Statistical Learning Theory Vol. 1 (Springer, 1995).
Pham, B. T., Jaafari, A., Prakash, I. & Bui, D. T. A novel hybrid intelligent model of support vector machines and the MultiBoost ensemble for landslide susceptibility modeling. Bull. Eng. Geol. Env. 78, 2865–2886 (2019).
Zhao, X. & Chen, W. Optimization of computational intelligence models for landslide susceptibility evaluation. Remote Sens. 12, 2180 (2020).
Hanley, J. A. & McNeil, B. J. The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 143, 29–36 (1982).
Guzzetti, F., Carrara, A., Cardinali, M. & Reichenbach, P. Landslide hazard evaluation: A review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology 31, 181–216 (1999).
Pourghasemi, H. R., Pradhan, B. & Gokceoglu, C. Application of fuzzy logic and analytical hierarchy process (AHP) to landslide susceptibility mapping at Haraz watershed, Iran. Nat. Hazards 63, 965–996 (2012).
Aditian, A., Kubota, T. & Shinohara, Y. Comparison of GIS-based landslide susceptibility models using frequency ratio, logistic regression, and artificial neural network in a tertiary region of Ambon, Indonesia. Geomorphology 318, 101–111 (2018).
Chang, K.-T., Merghadi, A., Yunus, A. P., Pham, B. T. & Dou, J. Evaluating scale effects of topographic variables in landslide susceptibility models using GIS-based machine learning techniques. Sci. Rep. 9, 1–21 (2019).
Fallah-Zazuli, M., Vafaeinejad, A., Alesheykh, A. A., Modiri, M. & Aghamohammadi, H. Mapping landslide susceptibility in the Zagros Mountains, Iran: A comparative study of different data mining models. Earth Sci. Inf. 12, 615–628 (2019).
Carrara, A. Multivariate models for landslide hazard evaluation. J. Int. Assoc. Math. Geol. 15, 403–426 (1983).
He, Q. et al. Landslide spatial modelling using novel bivariate statistical based Naïve Bayes, RBF Classifier, and RBF Network machine learning algorithms. Sci. Total Environ. 663, 1–15 (2019).
Ciurleo, M., Cascini, L. & Calvello, M. A comparison of statistical and deterministic methods for shallow landslide susceptibility zoning in clayey soils. Eng. Geol. 223, 71–81 (2017).
Zhao, C., Chen, W., Wang, Q., Wu, Y. & Yang, B. A comparative study of statistical index and certainty factor models in landslide susceptibility mapping: A case study for the Shangzhou District, Shaanxi Province, China. Arabian J. Geosci. 8, 9079–9088 (2015).
Ministry of housing and urban rural development of the People's Republic of China, G. A. o. q. s., inspection and Quarantine of the people's Republic of China. Vol. GB 50487–2008 (2008).
Ministry of housing and urban rural development of the People's Republic of China, G. A. o. q. s., inspection and Quarantine of the people's Republic of China. Vol. GB/T 50218–2014 (2014).
Wang, Y., Fang, Z. & Hong, H. Comparison of convolutional neural networks for landslide susceptibility mapping in Yanshan County, China. Sci. Total Environ. 666, 975–993 (2019).
Bui, D. T., Tsangaratos, P., Nguyen, V.-T., Van Liem, N. & Trinh, P. T. Comparing the prediction performance of a Deep Learning Neural Network model with conventional machine learning models in landslide susceptibility assessment. CATENA 188, 104426 (2020).
Survey, H. P. G. (Hubei Province Geological Survey Press, Wuhan, China, 1997).
Reservoir, H. o. P. a. C. o. G.-H. i. A. o. T. G. (2011).
Acknowledgements
We are grateful to the Headquarters of Prevention and Control of Geo-Hazards in the Area of the Three Gorges Reservoir for providing data and material.
Author information
Authors and Affiliations
Contributions
Conceptualization, X.Y. and K.Z.; methodology, X.Y., K.Z., Y.S., and W.J.; software, K.Z., Y.S., and J.Z.; validation, K.Z. and J.Z.; formal analysis, X.Y. and J.Z.; investigation, K.Z. and J.Z.; resources, X.Y. and W.J.; data curation, W.J. and J.Z.; writing—original draft preparation, X.Y., K.Z., W.J., and J.Z.; writing—review and editing, X.Y. and K.Z.; visualization, K.Z., Y.S., and W.J.; supervision, X.Y.; project administration, X.Y.; funding acquisition, X.Y., Y.S., and W.J. All authors have read and agreed to the published version of the manuscript.
Corresponding author
Ethics declarations
Competing interests
The authors declare no competing interests.
Additional information
Publisher's note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Yu, X., Zhang, K., Song, Y. et al. Study on landslide susceptibility mapping based on rock–soil characteristic factors. Sci Rep 11, 15476 (2021). https://doi.org/10.1038/s41598-021-94936-5
Received:
Accepted:
Published:
DOI: https://doi.org/10.1038/s41598-021-94936-5
- Springer Nature Limited
This article is cited by
-
Research on the influence of different sampling resolution and spatial resolution in sampling strategy on landslide susceptibility mapping results
Scientific Reports (2024)
-
Geospatial technologies for landslide monitoring: a case study of Sighetu Marmației, Romania
Environmental Earth Sciences (2024)
-
Landslide susceptibility, ensemble machine learning, and accuracy methods in the southern Sinai Peninsula, Egypt: Assessment and Mapping
Natural Hazards (2024)
-
Comparative study on landslide susceptibility mapping based on unbalanced sample ratio
Scientific Reports (2023)
-
Evaluating the influence of road construction on landslide susceptibility in Saudi Arabia’s mountainous terrain: a Bayesian-optimised deep learning approach with attention mechanism and sensitivity analysis
Environmental Science and Pollution Research (2023)