Introduction

Landslides are one of the major geological disasters. The frequent occurrence of landslides has caused huge economic losses and casualties to society (Pourghasemi et al. 2018; Tang et al. 2019; Gong et al. 2021). Landslide susceptibility assessment deals with “where” landslides are most likely to occur (Guzzetti et al. 2005), which is a preliminary and imperative task for landslide prevention and control. The key content of this task is to select suitable landslide susceptibility models. It is known that landslide susceptibility assessment is based on the general assumption that slope failure in the future will be more likely to occur under those conditions which led to landslides in the past (Gemitzi et al. 2011). Thus, in landslide susceptibility modelling, the input should be a set of landslide conditioning factors which are important attributes to slope instability, while the output should be a landslide susceptibility zonation map that delineates the spatial incidence of landslides.

There are extensive studies on landslide susceptibility modelling and mapping in the past few decades. According to the difference in model principles and evaluation bases, landslide susceptibility models in the existing studies could be categorized into empirical approaches, statistical approaches, machine learning approaches, and others (Melchiorre et al. 2008). The empirical models usually combine engineering information with disaster prevention experience, based on which the evaluation standard of landslide susceptibility is formulated (Pourghasemi et al. 2012). The statistical models carry out mathematical and statistical analysis on regional landslide database, and then establish the relationship between landslide susceptibility levels and conditioning factors (Goetz et al. 2015). The most common statistical models include information value model (Achour et al. 2017), weight evidence analysis model (Ilia and Tsangaratos 2016), fuzzy assessment model (Gemitzi et al. 2011), and frequency ratio model (Mohammady et al. 2012; Ozdemir and Altural 2013). Machine learning models input relevant geological data into computer programs and adopt algorithms to derive landslide susceptibility maps (Pham et al. 2016). In recent years, there is an increasing preference towards machine learning models for landslide susceptibility analysis. The well-established machine learning models include the support vector machine model (Micheletti et al. 2014; Huang and Zhao 2018), decision tree model (Pradhan 2013), logistic regression model (Budimir et al. 2015), and artificial neural network model (Pradhan and Lee 2010).

It is noted that all of the above landslide susceptibility models have their specific advantages and limitations (Reichenbach et al. 2018). Although empirical models take advantage of prior knowledge and experience, the analysis results might be quite different from the actual situations (Goetz et al. 2011). Statistical models are able to reflect the relationship between the input geoenvironmental factors and the output evaluation results, but the model structure is relatively simple and prone to deviations (Shahabi and Hashim 2015; Juliev et al. 2019). Machine learning models have the advantages of strong adaptability and generalization, but are subjected to the selection of model parameters and the size of the training dataset (Goetz et al. 2015; Zhou et al. 2018). As a result, it can be found from previous studies that the performance of landslide susceptibility models varies in different geoenvironmental settings, and no individual model proves to be superior across all conditions (Reichenbach et al. 2018; Pourghasemi et al. 2018). Nonetheless, the majority of previous studies related to landslide susceptibility assessment implement only one single model, which might lead to unsatisfactory results (Vorpahl et al. 2012).

It is known that multi-model ensemble methods provide an efficient and practical tool to improve the results of single classification models by combining their outcomes. The trait of an ensemble method is the harnessing of collective intelligence of diverse individual models (referred to hereinafter as base models). Thus, the effect of the ensemble method can surpass that of the best base model. In recent years, diverse ensemble approaches have been developed and shown great effectiveness in the spatial evaluation of various environmental hazards, such as flood assessment (Mojaddadi et al. 2017; Shahabi et al. 2020), groundwater contamination mapping (Barzegar et al. 2018), gully erosion assessment (Hembram et al. 2020), and landslide susceptibility mapping (Aghdam et al. 2016; Saha et al. 2020).

The focus of this study is placed on the use of multi-model ensemble methods to combine the predictions made by individual base models to achieve high-accuracy and reliable prediction of landslide susceptibility. In this study, three ensemble methods are introduced and modified for landslide susceptibility mapping: matrix ensemble method (Wei et al. 2018; Ahmed et al. 2018), partition ensemble method (Hong et al. 2018b; Martinello et al. 2020), and probability-weighted ensemble method (Baldassarre et al. 2009; Zhang et al. 2013). These ensemble methods adopt different ensemble strategies. To illustrate the effectiveness of the proposed ensemble methods, four of the most commonly used landslide susceptibility models are selected as base models for performing the ensemble methods through a case study of Fengjie County located in the Three Gorges Reservoir Region, China. This study was carried out in the Faculty of Engineering, China University of Geosciences, Wuhan, during July 2020 to May 2021.

Materials and methods

Framework for implementation of ensemble methods

This study proposes three different types of ensemble methods to seek better prediction accuracy and reliability of landslide susceptibility mapping, including the qualitative matrix ensemble method, semi-quantitative partition ensemble method, and quantitative probability-weighted ensemble method. With the support of remote sensing and geographic information system (GIS) (Awais et al. 2021a, 2021b; Cheng et al. 2021; Shao et al. 2021; You et al. 2021), the historical landslides and associated landslide conditioning factors in the study area are derived as basic inputs for landslide susceptibility modelling.

The four base models selected for ensembles in the case study include two typical statistical models, i.e. frequency ratio (FR) model and fuzzy assessment (FA) model, and two prevailing machine learning models, i.e. backpropagation neural network (BPNN) model and support vector machine (SVM) model. These four base models are chosen mainly for the following two reasons. Firstly, the quality of base models directly affects the prediction performance of ensembles, and the four base models have proved to provide reasonably good performance in previous studies (Gemitzi et al. 2011; Bui et al. 2012; Reichenbach et al. 2018; He et al. 2019; Shano et al. 2021). Secondly, ensembles tend to yield better prediction performance when there is a significant diversity among the constituent base models (Kuncheva and Whitaker 2003), and the four base models are built on different modelling principles. It should be noted that the base models are not limited to the above four landslide susceptibility models, and the proposed three ensemble methods are also suitable for other well-established landslide susceptibility models.

Figure 1 shows a systematic framework for the application process of the three ensemble methods. Firstly, based on the geological and geomorphological settings of the study area (i.e. Fengjie County), a preliminary analysis of landslide conditioning factors is conducted. Secondly, the selected four base models are well trained and validated to evaluate the landslide susceptibility in the study area within a GIS environment. Thirdly, the three ensemble methods are employed to combine the outcomes of the four base models. Finally, an extensive comparative analysis of the prediction accuracy and reliability between the proposed three ensemble methods and the four base models is performed to validate the effects of the ensemble methods.

Fig. 1
figure 1

Framework for implementation of ensemble methods

Study area and database

This section first introduces the geological and geomorphological settings of the study area, then presents a landslide inventory database consisting of historical landslides and landslide conditioning factors, and finally performs a sensitivity analysis of landslide conditioning factors to highlight their importance in causing landslides.

Geological and geomorphological settings in Fengjie County

Fengjie County is located in the east of Chongqing, China, as well as in the hinterland of the Three Gorges Reservoir Region, as shown in Fig. 2. It covers an area of 4098 km2 between latitudes 30°29′19″ to 31°22′33″N, and longitudes 109°1′17″ to 109°45′58″E. Fengjie County belongs to the mountainous landform in the east of Sichuan Basin with undulating mountains and gullies. The elevation of the study area ranges from 175 to 2123 m above mean sea level. The terrain of Fengjie County is higher in the south-east, and lower in the middle and north-west. The river valley near the Yangtze River system in the middle of the county is relatively developed. The topography and geomorphology are crucial to the distribution of land coverage. Affected by topography and geomorphology, land coverage presents spatial differences in Fengjie County. With the increase in the elevation and slope degree, the exposed surface area gradually decreases and the vegetation coverage gradually increases.

Fig. 2
figure 2

Location of the study area and spatial distribution of historical landslides

Since Fengjie County is situated at the intersection of several fold belts, there exists complex geological structures mainly composed of folds. It has a humid subtropical monsoon climate with an average annual precipitation of 1132 mm, and the temperature drops sharply due to the great height of the mountains and the immense depth of the valleys. The Yangtze River runs through Fengjie County, and there are 49 reservoirs in the study area. Affected by the complex geological structures, continuous rainfalls, and periodic changes in reservoir water level, Fengjie County is saddled with frequent geological disasters such as landslides and collapses.

Establishment and preliminary analysis of landslide database

The description of the spatial relationship between historical landslides and their conditioning factors is essential to landslide susceptibility modelling. Firstly, it is necessary to prepare the landslide inventory database regarding landslides and associated landslide conditioning factors. A total of 1,550 historical landslides occurred from 1950 to 2019 in Fengjie County have been well documented, each annotated with crucial information on occurrence time, geographical coordinates of landslide location, geological and geomorphic conditions, triggering factors, type, scale, and consequences. Rainfall is observed to be the predominant triggering factor of landslides in Fengjie County. For spatial landslide susceptibility mapping, this study emphasizes the effect of the conditioning factors which are the intrinsic attributes that govern the stability condition of slopes, regardless of the role of rainfall which is the external triggering factor with a relative yearly recurrence pattern.

Meanwhile, to assist in the training of landslide susceptibility models, an equal number of non-landslide sample points are generated using the seed cell sampling strategy (Dagdelenler et al. 2016). To be specific, a total of 1550 non-landslide sample points are randomly extracted from the undisturbed areas that are over 2000 m away from the landslide locations. The 1550 historical landslides and 1550 non-landslide sample points are mapped out in the study area, which are, respectively, represented by the red and black dots as illustrated in Fig. 2.

Moreover, based on previous studies (Budimir et al. 2015; Wen et al. 2017; Hong et al. 2018a; Zhao and Chen 2020; Sun et al. 2020) and the local geoenvironmental conditions, a total of 12 key conditioning factors are determined: elevation, slope degree, slope aspect, terrain curvature, terrain roughness index (TRI), lithology, distance to fold, distance to river, stream power index (SPI), topographic wetness index (TWI), normalized difference vegetation index (NDVI), and distance to road. These factors basically capture the overall intrinsic features of the slope with respect to topography, geology, hydrology, land use, and human activities.

Elevation, slope degree, slope aspect, terrain curvature, and TRI characterize the terrain features in the study area. Elevation is an important topographic factor, which has significant influence on slope degree, terrain curvature, and other factors, and further affects slope stability (Pourghasemi et al. 2012; Zhou et al. 2018). Slope degree is a significant geomorphological feature of the slope, and varying slopes lead to spatial differences in stress distribution, loose material accumulation, and weathering degree of rock strata (Moayedi et al. 2019). Slope aspect has a certain influence on illumination, soil texture, wind speed, and vegetation development, which could affect the spatial distribution of landslides (Bui et al. 2011). Terrain curvature reflects terrain structure and shape (Kalantar et al. 2018). TRI describes surface fluctuations and reflects the topographic and geomorphic conditions (Hong et al. 2018a). The distribution maps for elevation, slope degree, slope aspect, terrain curvature, and TRI are produced from the digital elevation model (DEM) of Fengjie County (Fig. 3a–e). The DEM data, with a spatial resolution of 30 m, are derived from the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) launched on NASA’s Terra satellite.

Fig. 3
figure 3

Spatial distribution maps of landslide conditioning factors in the study area: a elevation; b slope; c aspect; d terrain curvature; e terrain roughness index (TRI); f lithology; g distance to fold; h distance to river; i stream power index (SPI); j topographic wetness index (TWI); k normalized difference vegetation index (NDVI); l distance to road

Lithology and distance to fold characterize geological conditions. Lithology determines rainfall erosion resistance and rock mass stability, and hence can strongly influence landslide occurrence (Roodposhti et al. 2014). The lithology data of the study area are extracted from the geological map of China Geological Survey (CGS) in 1:1 million scale (Fig. 3f). The presence of folds exerts significant influence on the distribution characteristics of rock strata, groundwater, and geological activities such as collapse and earthquake (Kayastha et al. 2013). The distribution map of distance to fold is generated by the Euclidean distance function in ArcGIS software (Fig. 3g).

Distance to river, SPI, and TWI characterize hydrogeological conditions. Distance to river directly affects the rock erosion process due to water level change and rainfall in the Three Gorges Reservoir Region. Based on the river network database in Fengjie County, the distribution map of distance to river is generated by the Euclidean distance function in ArcGIS software (Fig. 3h). SPI represents the degree of regional erosion by water flow, and TWI indicates ground humidity (Park et al. 2018). SPI and TWI are jointly determined by the river network database and the DEM data (Fig. 3i–j).

NDVI reflects regional land use and land cover in terms of the degree of vegetation coverage, which has great influence on the soil and hydrological conditions, and further affects slope stability (Jaafari et al. 2014). Using the Moderate-Resolution Imaging Spectroradiometer (MODIS) satellite data, the NDVI map is generated from the Environment for Visualizing Images (ENVI) remote sensing image processing platform (Fig. 3k). Distance to road shows the influence of human engineering activities, as the modification of slopes in the process of road construction could influence landslide occurrence to some extent (Sujatha et al. 2012; Meneses et al. 2019). According to the road network database, the distribution map of distance to road is generated by the Euclidean distance function in ArcGIS software (Fig. 3l).

In Fig. 3, the whole study area is appropriately discretized into 4,861,338 square grids of pixels, and each pixel contains the data information of the 12 conditioning factors. Moreover, each conditioning factor is reclassified into five classes by the Jenks optimization method (Jenks 1967), as illustrated in Table 1. The Jenks optimization method classifies the conditioning factors using natural breaks in data values. The best arrangement of values into various classes is determined by minimizing the squared deviations of the class means. Consequently, the spatial distribution maps and statistical database for all of the 12 conditioning factors are constructed with their subclasses.

Table 1 Statistics of landslide conditioning factors

To establish the spatial relationship between the landslides and the conditioning factors, for each class of every conditioning factor, the corresponding percentage of landslides and percentage of covered area are calculated and summarized in Table 1. The results demonstrate that landslide occurrence decreases with increasing elevation, terrain curvature, distance to river, distance to fold, distance to road, SPI, TWI, and TRI. The historical landslides in Fengjie County predominantly occur in elevation of 130–530 m, which is in line with the findings of Ercanoglu and Gokceoglu (2002) and Gemitzi et al. (2011). In addition, most landslides take place in slope degree between 12° and 30° in Fengjie County, which is similar to the resulting 15°–35° of the study area in Sujatha et al. (2012).

Sensitivity analysis of conditioning factors

A quantitative sensitivity analysis is performed to rank the importance of the 12 conditioning factors in causing landslides. The certainty factor is used to measure the relationship between the landslide density of a subarea and that of the whole study area (Sujatha et al. 2012). For each class of every conditioning factor, certainty factor is quantified by Eq. (1), and the result is given in Table 1. A positive value of certainty factor means the landslide density of the class is greater than the average value of the whole study area. The greater the certainty factor, the higher the probability of landslide occurrence.

$${\text{CF}}_{i} = \left\{ {\begin{array}{*{20}l} {\frac{{D_{i} - D}}{{D_{i} (1 - D)}},} \hfill & {D_{i} \ge D} \hfill \\ {\frac{{D_{i} - D}}{{D(1 - D_{i} )}},} \hfill & {D_{i} < D} \hfill \\ \end{array} } \right.$$
(1)

where CFi and Di, respectively, denote the certainty factor and landslide density in the ith class of a specified conditioning factor; and D denotes the average landslide density in the study area.

The sensitivity index of a conditioning factor is defined by the maximum difference between the certainty factor values in various classes of this conditioning factor, as seen in Eq. (2). A large value of the sensitivity index is indicated as a strong correlation with landslide susceptibility.

$${\text{SI}} = {\text{CF}}_{\max } - {\text{CF}}_{\min }$$
(2)

where SI stands for the sensitivity index of a conditioning factor; and CFmax and CFmin, respectively, represent the maximum value and minimum value of certainty factor among the five classes of this conditioning factor.

The obtained sensitivity indexes of the 12 landslide conditioning factors are then sorted in descending order to highlight their roles in causing slope failures in the study area, as shown in Fig. 4. The result shows that elevation is of utmost importance in landslide susceptibility, followed by lithology, distance to fold, and terrain curvature. As the result suggests, terrain features and geological formations and structures have dominant influence on slope stability. In contrast, topographic wetness index, aspect, and stream power index have lower sensitivity index values, indicating these three factors exert relatively less influence on landslide occurrence in the study area.

Fig. 4
figure 4

Sensitivity index ranking of the 12 conditioning factors

Four base models selected for ensembles

The above analysis of the spatial correlations between landslide conditioning factors and landslide distribution provides the basis for landslide susceptibility modelling and mapping. In this section, four of the most commonly used landslide susceptibility models (i.e. FR model, FA model, BPNN model, and SVM model) are introduced as the base models for ensembles. Each one of the four base models has been well trained and validated before performing ensemble methods.

Frequency ratio (FR) model

Frequency ratio is simply defined by the ratio of the landslide occurrence percentage to the area occupation percentage for various classes of every conditioning factor, as formulated in Eq. (3) (Samanta et al. 2018). A higher frequency ratio implies greater chances of landslides.

$${\text{FR}}_{i} = \frac{{{{N_{i} } \mathord{\left/ {\vphantom {{N_{i} } N}} \right. \kern-\nulldelimiterspace} N}}}{{{{S_{i} } \mathord{\left/ {\vphantom {{S_{i} } S}} \right. \kern-\nulldelimiterspace} S}}}$$
(3)

where FRi stands for the frequency ratio of the ith class of a conditioning factor; Ni and Si, respectively, denote the number of landslides and the area size within the ith class of this conditioning factor; N is the total number of landslides in the study area; and S is the total study area.

The FR values of the 12 conditioning factors (each with five classes) are calculated and presented in Table 1. For every pixel in the study area, the overall FR value is obtained by aggregating the FR values of the 12 conditioning factors. The overall FR values of all pixels are then grouped into five intervals by the Jenks optimization method. The study area is subsequently demarcated into five landslide susceptibility zones for visual interpretation: very low, low, medium, high, and very high, as shown in Fig. 5a. Comparing the produced landslide susceptibility map to the landslide/non-landslide points, it is found that the high- and very high-susceptibility classes agree well with the historical landslides along the Yangtze River, and the low- and very low-susceptibility classes agree well with the non-landslide sample points at the southern end of the study area.

Fig. 5
figure 5

Landslide susceptibility zonation maps generated by: a FR model; b FA model; c BPNN model; d SVM model; e matrix ensemble method; f partition ensemble method; g probability-weighted ensemble method

Fuzzy assessment (FA) model

The key of the fuzzy assessment model is to establish appropriate membership functions for assigning membership values of landslide susceptibility to each pixel for every individual conditioning factor (Gemitzi et al. 2011). In this study, the membership function is expressed by Eq. (4). The membership values range between 0 and 1, reflecting the degree of certainty of membership.

$$\begin{aligned} \, M(T_{1} ) & = \left\{ \begin{gathered} 0, \, V > T_{2} \hfill \\ - (V - T_{2} )/(T_{2} - T_{1} ), \, T_{1} < V \le T_{2} \hfill \\ 1, \, V \le T_{1} \hfill \\ \end{gathered} \right. \\ M(T_{i = 2,3,4} ) & = \left\{ \begin{gathered} 0, \, V > T_{i + 1} \hfill \\ - (V - T_{i + 1} )/(T_{i + 1} - T_{i} ), \, T_{i} < V \le T_{i + 1} \hfill \\ (V - T_{i} )/(T_{i} - T_{i - 1} ), \, T_{i - 1} < V \le T_{i} \hfill \\ 0, \, V \le T_{i - 1} \hfill \\ \end{gathered} \right. \\ M(T_{5} ) & = \left\{ \begin{gathered} 1, \, V \ge T_{5} \hfill \\ (V - T_{4} )/(T_{5} - T_{4} ), \, T_{4} \le V \le T_{5} \hfill \\ 0, \, V < T_{4} \hfill \\ \end{gathered} \right. \\ \end{aligned}$$
(4)

where M(Ti) is the membership value; V is the value of a conditioning factor; and T1T5 are the five control points of this conditioning factor derived by the natural breakpoint method. T1–T5 are corresponded to the five susceptibility levels (i.e. very low, low, medium, high, and very high) according to the certainty factor value of the five classes to which T1T5 belong (as seen in Table 1). Thus, for every conditioning factor in each pixel, the membership values corresponding to the five susceptibility levels are determined. Subsequently, a 5 × 12 membership matrix is obtained for the 12 conditioning factors in each pixel.

In addition, the importance weight vector is also obtained as the normalized sensitivity index values of the 12 conditioning factors. Then through the product of the membership matrix and the importance weight vector, the weighted membership values of the five susceptibility levels are derived for each pixel. Finally, the susceptibility level of each pixel is recognized using the maximum membership principle. As shown in Fig. 5b, the resulting susceptibility map visually shows that most of the study area is predicted to be highly susceptible to landslides, and even some non-landslide sample points fall into the high- and very high-susceptibility classes.

Backpropagation neural network (BPNN) model

Backpropagation neural network is a multi-layer feedforward network trained by error backward propagation which is an optimization algorithm using gradient descent for supervised learning of artificial neural networks (Pradhan and Lee 2010). In this study, 1050 out of 1550 historical landslides and 1050 out of 1550 non-landslide sample points are randomly selected as the training dataset, and the remaining 500 landslides and 500 non-landslide sample points as the validation dataset. Then, the Bayesian regularization algorithm is applied to the BPNN model training, since this algorithm can achieve good generalization, fast convergence speed, and high learning accuracy (Bui et al. 2012). Meanwhile, the error backpropagation process is adjusted using the weight decay method to reduce the verification error.

Using the trained BPNN model, 72.14% of the training dataset and 85.10% of the validation dataset are successfully classified into the corresponding landslide susceptibility levels. It is noted that correct classifications include two circumstances: (1) historical landslides fall into the high- and very high-susceptibility classes, and (2) non-landslide sample points fall into the low- and very low-susceptibility classes. Finally, the landslide susceptibility zonation map of the study area is generated by the optimized BPNN model, as shown in Fig. 5c. It visually shows that the BPNN model has good performance in the north-west of the study area (where the densely distributed landslides fall into the very high-susceptibility classes) and at the south-east end of the study area (where the densely distributed non-landslide sample points fall into the very low-susceptibility classes).

Support vector machine (SVM) model

Support vector machine is a supervised nonlinear machine learning algorithm, which differentiates the classes with an optimal hyper-plane that maximizes the margin between the classes, and the data points closest to the hyper-plane are called support vectors (Vapnik 1995; Christianini and Shawe-Taylor 2000). Same as the dataset division for the BPNN model training, 1050 out of 1550 historical landslides and 1050 out of 1550 non-landslide sample points are randomly selected as the training dataset, and the remaining 500 landslides and 500 non-landslide sample points as the validation dataset. Then, the radial basis function (RBF) is employed as the kernel function to train the SVM model, since RBF kernel has been proved to produce the best prediction results in many studies (Pradhan 2013; Pourghasemi et al. 2013; He et al. 2019). The detailed description of underlying mathematics of applying the SVM model with RBF kernel to landslide prediction is presented in Pourghasemi et al. (2013).

Using the trained SVM model, 83.38% of the training dataset and 90.10% of the validation dataset are successfully classified into the corresponding landslide susceptibility levels. The resulting spatial prediction of landslide susceptibility in the study area is shown in Fig. 5d. It is found that the landslide and non-landslide points show satisfactory consistency with the detected very high- and very low-susceptibility classes, respectively.

Three types of ensemble methods proposed

Three different ensemble approaches, i.e. the matrix ensemble method, partition ensemble method, and probability-weighted ensemble method, are proposed to integrate the outcomes of the selected four base landslide susceptibility models mentioned above. The ensemble strategies and processes are described in detail below.

Matrix ensemble method

The matrix ensemble method is a qualitative decision support tool that uses a decision matrix to integrate two sources of information and make a comprehensive decision (Wei et al. 2018; Ahmed et al. 2018). In this study, a decision matrix is created to integrate landslide susceptibility results of different models, as shown in Table 2. The decision matrix is a two-dimensional symmetric matrix listing all possible combinations of landslide susceptibility levels (obtained from Model A and Model B) and the corresponding final susceptibility classifications. According to this decision matrix, for example, if the susceptibility levels of a pixel derived by Model A and Model B are “very high” and “medium”, respectively, then the ensemble result should be “high”; if the two models result in the same susceptibility level for a pixel, then the ensemble result will remain unchanged.

Table 2 Decision matrix for ensembling two landslide susceptibility models

The process of applying the decision matrix to ensemble the four base models includes two steps, as illustrated in Fig. 6. First, the two statistical models (i.e. FR model and FA model) and the two machine learning models (i.e. BPNN model and SVM model) are ensembled separately according to the decision matrix. Then, the two sets of ensemble results obtained from the first step are ensembled again by the decision matrix. Consequently, the final landslide susceptibility zonation map is produced as seen in Fig. 5e. The map shows that most of the historical landslides fall into the predicted high- and very high-landslide-susceptibility areas such as the north-west region and the areas on both sides of the Yangtze River. The non-landslide sample points also agree well with the predicted low- and very low-landslide-susceptibility areas.

Fig. 6
figure 6

Operation steps of the matrix ensemble method

Partition ensemble method

The partition ensemble method firstly divides the study area into several subareas based on the regional topographic distribution characteristics, then selects the best evaluation results in each subarea, and assembles them together to generate a new landslide susceptibility map for the whole study area (Hong et al. 2018b; Martinello et al. 2020). Therefore, it is a semi-quantitative method. According to the presence of the main fold belts, the study area is partitioned into five subareas I–V as shown in Fig. 7. To compare the performance of the four base models in each subarea, the success rate involving both historical landslides and non-landslide sample points is calculated as below.

$${\text{SR}}_{i} = \frac{{n_{{i{\text{\_L}}}} + n_{{i{\text{\_NL}}}} }}{{N_{{i{\text{\_L}}}} + N_{{i{\text{\_NL}}}} }}$$
(5)

where SRi is the success rate of a base model in subarea i; ni_L is the number of historical landslides that fall into the high- and very high-susceptibility classes in subarea i; ni_NL is the number of non-landslide sample points that fall into the low- and very low-susceptibility classes in subarea i; and Ni_L and Ni_NL, respectively, denote the total number of historical landslides and the total number of non-landslide sample points in subarea i.

Fig. 7
figure 7

Partition of the study area into five subareas

Table 3 shows the calculated success rates of the four base models in each subarea. The largest success rate is considered to be the best evaluation result for each subarea among the four base models, which is highlighted in bold in each column of Table 3. It is found that the FA model performs best in terms of success rate in subarea I, and the SVM model performs best in the remaining four subareas. Consequently, the best evaluation results of landslide susceptibility in the five subareas are extracted to piece together the final landslide susceptibility zonation map, as shown in Fig. 5f. By selecting and combining the optimal result from the four base models in each subarea, the partition ensemble method yields better results than each base model.

Table 3 Comparison of the success rate between the four base models in each subarea

Probability-weighted ensemble method

The probability-weighted ensemble method assigns probability weights to the outputs of the base models (Zhang et al. 2013; Hong et al. 2018a; Martinello et al. 2020). In this study, the weighting factors of the four base models are determined by the overall success rate of historical landslides as well as non-landslide sample points in the whole study area. Then the normalized weights assigned to various base models are calculated as follows (Zhang et al. 2013).

$${\text{P}} \left( {M_{i} |D} \right) = \frac{{{\text{P}} \left( {D|M_{i} } \right){\text{P}} \left( {M_{i} } \right)}}{{\sum\nolimits_{k = 1}^{K} {{\text{P}} \left( {D|M_{k} } \right){\text{P}} \left( {M_{k} } \right)} }}$$
(6)

where P(Mi|D) is the normalized posterior probability weight of the ith base model Mi; P(D|Mi) is the observation probability of Mi, which is equal to the overall success rate of historical landslides and non-landslide sample points in the whole study area obtained from Mi; P(Mi) is the prior probability weight of Mi, which is equally set to 0.25 for the four base models; D is the dataset consisting of the 1550 historical landslides and 1550 non-landslide sample points; and K is the number of base models which equals four.

Table 4 presents the overall success rates and the resulting normalized weights of the four base models. For each pixel in the study area, the landslide susceptibility levels predicted by the four base models are weighted correspondingly by the normalized weights to determine the final landslide susceptibility level of the pixel. Finally, the landslide susceptibility zonation map of the study area is produced using the probability-weighted ensemble method, as shown in Fig. 5g. This susceptibility map illustrates that most historical landslides fall into the high- and very high-susceptibility areas, and most non-landslide sample points fall into the low- and very low-susceptibility areas. A careful visual inspection reveals that the ensembled landslide susceptibility zonation maps obtained from the matrix and probability-weighted ensemble methods are very similar over the entire study area.

Table 4 Overall success rate and normalized weight of the four base models

Results and discussion

To demonstrate the effectiveness of the proposed three ensemble methods for landslide susceptibility mapping, this section presents an extensive comparative analysis on the prediction performance of the three ensemble methods and four base models in a quantitative manner. Prediction performance has two connotations: prediction accuracy and reliability. In this study, prediction accuracy is examined by the two basic decision rules (used to define a high-quality landslide susceptibility zonation map) and the overall prediction accuracy (measured with the receiver operating characteristics curve). Reliability is characterized by the magnitude of statistical discrepancy calculated by the information entropy. Additionally, a brief discussion is provided to explicate what makes this study different from previous studies related to landslide susceptibility mapping, reveal the reasons behind the different effects of the proposed three ensemble methods, and recommend the most preferable ensemble method based upon prediction performance as well as practical operability.

Comparative analysis between the ensemble methods and base models

Two basic decision rules of a high-quality landslide susceptibility zonation map

In order to quantitatively evaluate and compare the prediction performance of the three ensemble methods and four base models, the percentage of landslide occurrence and the percentage of area occupation in each landslide susceptibility level are obtained from the produced landslide susceptibility zonation maps. The detailed statistical results are presented in Fig. 8.

Fig. 8
figure 8

Statistical results of the three ensemble methods and four base models: a percentage of landslide occurrence in each landslide susceptibility level; b percentage of area occupation in each landslide susceptibility level

A high-quality landslide susceptibility zonation map is supposed to fulfil the two basic decision rules: (1) most of the historical landslides should be located in the predicted high-susceptibility zones; (2) the high-susceptibility classes should cover small areas (Sujatha et al. 2012). In other words, the predicted highly susceptible zones are better to enclose more historical landslides and cover smaller areas. Based on the statistical results in Fig. 8, the percentage of landslide occurrence and the percentage of area occupation are calculated within the high- to very high-susceptibility classes for the three ensemble methods and four base models. The results are shown in Table 5.

Table 5 Statistical results of the four base models and three ensemble methods

In the high- and very high-susceptibility classes obtained from the matrix ensemble method, partition ensemble method, and probability-weighted ensemble method, the percentages of landslide occurrence (i.e. success rate of historical landslides) are 96.00%, 90.26%, and 96.39%, respectively, and the percentages of area occupation are 54.71%, 35.85%, and 51.43%, respectively. Based on the two basic decision rules, it manifests that the matrix and probability-weighted ensemble methods produce relatively close results and perform better than the partition ensemble method in terms of success rate of historical landslides. The partition ensemble method yields smaller highly susceptible areas than the other two ensemble methods.

Among the four base models, the FA model has the highest success rate of historical landslides (95.23%) but also covers the largest areas (63.83%), while the SVM model covers the smallest highly susceptible areas (36.76%) but meanwhile encloses the least historical landslides (86.19%). The statistical results of the percentage of landslide occurrence and the percentage of area occupation in the high- to very high-susceptibility classes derived from the FR model and BPNN model are between that of the FA model and SVM model. The results demonstrate that none of the four base models can synchronously satisfy the above-mentioned two basic decision rules very well. Compared with the FA model, the matrix and probability-weighted ensemble methods perform better because they achieve not only higher success rate of historical landslides but also cover smaller highly susceptible areas. Compared with the SVM model, the partition ensemble method performs better because it achieves not only smaller highly susceptible areas but also higher success rate of historical landslides. The results show that the proposed three ensemble methods could better satisfy the two basic decision rules than the four base models, and thus are encouraging in improving the quality of the landslide susceptibility map for the study area.

It should be noted that although the two basic decision rules are widely used for directly assessing landslide susceptibility model performance (Sujatha et al. 2012), they only emphasize the classification accuracy and efficiency of landslide prone areas, and neglect the fitting goodness between the non-landslide sample points and the predicted stable zones, which might lead to limited insight into the full model performance. Moreover, the two basic decision rules could not identify the best ensemble method.

Overall prediction accuracy measured by the receiver operating characteristics curve

The receiver operating characteristics (ROC) curve is a standard tool for statistically measuring the overall prediction accuracy of landslide susceptibility models and the associated zonation maps by considering both landslides and non-landslides (Reichenbach et al. 2018). The ROC curve plots the true positive rate (sensitivity) on Y-axis and the false positive rate (1-specificity) on X-axis. The area under the curve (AUC) measures the probability of correct classification and thus characterizes the global accuracy of a model. An AUC value close to 1 indicates high accuracy. In this study, the ROC curve is acquired to further compare the overall prediction accuracy of the proposed three ensemble methods versus the four base models by comparing the susceptibility zonation maps to the 1550 historical landslides and 1550 non-landslide sample points.

Figure 9a depicts the ROC curves of the four base models. The FR model, FA model, BPNN model, and SVM model achieve AUC of 65.74%, 74.34%, 70.98%, and 78.60%, respectively. The AUC values are less than 80%, indicating the overall prediction accuracy of the four base models is not desirable. Nonetheless, the SVM model performs best among the four base models in terms of overall prediction accuracy. Figure 9b depicts the ROC curves obtained from the three ensemble methods. The matrix ensemble method, partition ensemble method, and probability-weighted ensemble method achieve AUC of 83.78%, 80.09%, and 82.61%, respectively. The ROC plots confirm that all the three ensemble methods have high performance (AUC > 80%) compared with the four base models (AUC < 80%). The matrix ensemble method provides the best improvement of overall prediction accuracy over the base models, followed by the probability-weighted ensemble method and partition ensemble method.

Fig. 9
figure 9

ROC test results: a four base models; b three ensemble methods

Statistical discrepancy quantification using information entropy

Reliability measures the consistency of results, which can be reasonably identified by the magnitude of statistical discrepancy (Karagrigoriou 2012). The smaller the observed discrepancy, the more consistent the evaluation results and the higher the reliability. There are various approaches to examine statistical discrepancy such as McNemar test (Kavzoglu et al. 2015), Friedman test (Zhou et al. 2018), Wilcoxon test (Sahin 2020), and information entropy theory (Grunwald and Dawid 2004).

In this study, information entropy theory is adopted to measure the discrepancy of landslide susceptibility mapping results. The map of information entropy provides a direct visualization of the discrepancy distribution in each pixel of the study area, and the average information entropy characterizes the overall statistical discrepancy in the whole study area. The information entropy in each pixel and the average information entropy in the whole study area, denoted as E(i) and EA, respectively, are formulated as follows (Wellmann and Regenauer-Lieb 2012; Zhao et al. 2021).

$$E(i) = - \sum\limits_{l \in L} {{\text{P}}_{l} (i)} \log {\text{P}}_{l} (i)$$
(7)
$$E_{{\text{A}}} = - \frac{1}{{\left| {\mathbf{S}} \right|}}\sum\limits_{i \in S} {E(i)}$$
(8)

where Pl(i) is the probability of each landslide susceptibility level l in the pixel i; L is the set of the total five susceptibility levels; and |S| denotes the cardinality of the set S (herein |S| is equal to the total number of pixels). The information entropy in a pixel is 0 when the three ensemble methods or the four base models produce the exact same result of landslide susceptibility level, while the information entropy is the highest (i.e. equal to 1) when the methods/models result in completely distinct landslide susceptibility levels.

Figure 10 illustrates the information entropy distribution map of the four base models and the three ensemble methods. It is shown that the pixels with high entropy (the areas in red colour) in the information entropy map of the three ensemble methods are remarkably reduced compared to that of the four base models. Besides, the calculated average information entropies of the four base models and the three ensemble methods are 0.4937 and 0.2991, respectively, with a significant decrease of average information entropy after ensembles. The result indicates that the discrepancy of landslide susceptibility mapping results is effectively decreased by applying the proposed ensemble methods, even though their ensemble strategies are quite different. Hence, the uncertainty in landslide susceptibility mapping introduced by model diversity can be reduced through multi-model ensembles. From this point of view, the proposed ensemble methods can enhance the reliability of landslide susceptibility mapping in addition to increasing the prediction accuracy.

Fig. 10
figure 10

Information entropy plots: a four base models; b three ensemble methods

Discussion

The vast majority of previous studies related to landslide susceptibility mapping were carried out to develop optimized models for high prediction accuracy. For example, one of the most recent studies conducted by Sun et al. (2020) aimed to develop optimized random forest model involving hyper-parameter optimization using the Bayesian algorithm. Different from previous studies, this study adopts three different ensemble strategies to combine multiple well-developed landslide susceptibility models for an increase in prediction accuracy and higher reliability as well, and therefore is a systematic optimization research for landslide susceptibility mapping. Another merit of this study is that in addition to comparing the overall prediction accuracy, the reliability of the ensemble methods is further verified by measuring and comparing the statistical discrepancies between the methods/models.

As the evaluation results show, the proposed three ensemble methods exhibit different effects in improving the quality of landslide susceptibility mapping. The reason behind that is their different ensemble units and principles. As mentioned previously, the basic ensemble unit of the matrix and probability-weighted ensemble methods is the pixel, much finer than the subarea used in the partition ensemble method. As for the ensemble principles, the matrix and probability-weighted ensemble methods can fully integrate the outcomes of the four base models and produce comprehensive evaluation results. In other words, each base model contributes to the ensemble results. The partition ensemble method directly selects the result of a single base model with local optimum among the four base models. Therefore, the matrix and probability-weighted ensemble methods perform superior to the partition ensemble method.

Overall, among the three ensemble methods proposed, the matrix ensemble method is the most effective to improve the overall prediction accuracy of landslide susceptibility mapping. Besides, as it adopts a rather simple qualitative ensemble rule without complex calculation, this method is well universal and operable. To summarize, the matrix ensemble method has the best prediction performance as well as the strongest practical operability, and thus is the most preferable for regional landslide susceptibility mapping.

Conclusion

Each existing landslide susceptibility model has its specific strengths and limitations, which leads to the fact that there is no universal model that fits all geoenvironmental conditions. To improve the prediction accuracy and reliability of landslide susceptibility mapping, three different ensemble approaches for combining multiple susceptibility models are investigated in this paper, including the matrix ensemble method, partition ensemble method, and probability-weighted ensemble method. Within a GIS environment, a case study of Fengjie County located in the Three Gorges Reservoir Region, China, is conducted to illustrate the effectiveness of the proposed ensemble methods. Four of the most commonly used landslide susceptibility models are selected as base models for ensembles, including frequency ratio, fuzzy assessment, backpropagation neural network, and support vector machine models. The following conclusions can be drawn from this study.

  1. 1.

    The three ensemble methods proposed can effectively improve the overall prediction accuracy of landslide susceptibility mapping by taking advantage of the base models. Compared to the four base models, the three ensemble methods better satisfy the two basic decision rules used to define a high-quality landslide susceptibility map. Moreover, the ensemble methods achieve higher overall prediction accuracy (AUC > 80%) than the four base models (AUC < 80%). In particular, the matrix ensemble method provides the best improvement in the overall prediction accuracy.

  2. 2.

    The three ensemble methods proposed can also improve the reliability of landslide susceptibility mapping. The reliability of the ensemble methods is reasonably interpreted by the magnitude of statistical discrepancy of the landslide susceptibility mapping results which is measured by information entropy. The calculated average information entropy of the three ensemble methods is 0.2991, which is significantly smaller than that of the four base models (i.e. 0.4937). The results demonstrate that the proposed three ensemble methods could produce more consistent and reliable results. Therefore, the proposed multi-model ensemble methods are powerful tools for improving not only prediction accuracy but also reliability in landslide susceptibility mapping.

  3. 3.

    Comparison between the three ensemble methods demonstrates that the matrix ensemble method is most advantageous in both prediction accuracy and practical operability, followed by the probability-weighted ensemble method and partition ensemble method. The final landslide susceptibility zonation map produced by the matrix ensemble method more accurately portrays the spatial likelihood of landslides occurring in Fengjie County. The percentages of area occupation in the very high-, high-, medium-, low-, and very low-landslide susceptibility classes are 34.42%, 20.30%, 10.90%, 8.74%, and 25.64%, respectively. This refined landslide hazard map can better facilitate local land planning, early warning, landslide mitigation, and other practical applications.