Introduction

In recent decades, the assessment of landslide-susceptible zones has become one of the most discussed topics in the literature because the prediction of landslide events is particularly difficult owing to the complex nature of landslides (Tien Bui et al. 2016e). Consequently, various methods and techniques have been proposed for landslide modeling, and they can be classified into three main groups: physical-based, statistical, and soft computing methods. Since physical-based methods are not suitable for large areas, statistical and soft computing methods have received considerable attention. Among the statistical methods, bivariate analysis (Suzen and Doyuran 2004; Yalcin et al. 2011), multivariate analysis (Chung et al. 1995; Suzen and Doyuran 2004), and logistic regression (Costanzo et al. 2014; Felicisimo et al. 2013; Kavzoglu et al. 2015; Lee et al. 2014; Pradhan and Lee 2010; Tien Bui et al. 2011) are considered to be the most suitable methods for landslide susceptibility assessment at medium and regional scales. However, the prediction capability of these landslide models is still not satisfactory; therefore, data mining methods have been proposed (Tien Bui et al. 2016e).

Data mining, which is a branch of applied artificial intelligence, is defined as the exploration of observational datasets to find internal relationships and represent the data in understandable ways (Mennis and Guo 2009). It includes multiple steps such as data selection and preprocessing, transformation, incorporation of prior knowledge, analysis with computational algorithms, and interpretation and evaluation of the results (Fayyad et al. 1996). The literature shows that data mining can handle nonlinear real-world problems, including landslide modeling, with high accuracy (Hoang and Tien Bui 2016; Hoang et al. 2016; Tien Bui et al. 2016a; Were et al. 2015).

Among data mining methods and techniques, neuro-fuzzy (Pradhan et al. 2010; Tien Bui et al. 2012d), artificial neural networks (Gomez and Kavzoglu 2005; Hong et al. 2015b; Lee et al. 2003; Tien Bui et al. 2012c; Yilmaz 2009), and support vector machines (Kavzoglu et al. 2014; Yao et al. 2008) may be the most widely used. Several studies have compared the prediction performance of these methods with conventional methods and concluded that the performance of data mining models is better than that of conventional methods (Cheng and Hoang 2015; Pham et al. 2015, 2016a; Pradhan 2013; Tien Bui et al. 2012a, 2013a; Were et al. 2015; Yilmaz 2009).

Recent developments in geographic information systems (GIS) technology, in combination with soft computing tools (such as Weka, R, and MATLAB), have provided new and powerful techniques for landslide modeling (Tien Bui et al. 2016e), such as rule-based systems, probabilistic reasoning, decision tables, J48 decision trees, logistic model trees, and functional trees (Kumar et al. 2012). The main advantage of these methods is that they provide not only a more transparent calculation in the modeling process but also better accuracy (Hong et al. 2015a; Park and Lee 2014; Pham et al. 2016b; Tien Bui et al. 2014; Tsangaratos and Ilia 2015). Exploration of new methods and techniques for landslide modeling is therefore highly necessary (Tien Bui et al. 2012e), because an increase of even a few percent in accuracy could affect the spatial distribution of landslide-susceptible areas (Jebur et al. 2014; Kavzoglu et al. 2014; Tien Bui et al. 2012b, 2013a, 2014).

More recently, ensemble frameworks have received much attention in many fields because they can improve the prediction performance of models and deal with complex and high-dimensional data (Lee et al. 2012; Rokach 2010). Various ensemble frameworks have been proposed, such as Stacking, Random subspace, Random forests, and Rotation forests (Rodriguez et al. 2006), Bagging (Breiman 1996), AdaBoost (Freund and Schapire 1997), and MultiBoost (Webb 2000), and they can be grouped into two main categories: heterogeneous and homogeneous (Shun and Wenjia 2006). The first category incorporates models from different algorithms to form the final ensemble classifier, for example in Lee et al. (2012), whereas in the second category only one algorithm is used, but the original training data are split into several subsets to build classifiers, and a committee is then constructed (Maudes et al. 2012). Nevertheless, ensemble frameworks have seldom been explored for landslide susceptibility modeling.

This study fills this gap in the literature by proposing and verifying a novel ensemble methodology for landslide susceptibility modeling. In the proposed approach, functional trees (Gama 2004) and three ensemble techniques (AdaBoost, Bagging, and MultiBoost) were used. Functional trees (FT) are classification trees that use linear functions at the leaves, whereas AdaBoost, Bagging, and MultiBoost are homogeneous ensemble frameworks that are able to improve the performance of prediction models significantly (Pham et al. 2016b; Tien Bui et al. 2013a, 2014). The prediction performance of the ensemble models was assessed using the training and validation datasets, statistical evaluation measures, the receiver operating characteristic (ROC) curve, and the area under the curve (AUC). In addition, landslide models derived from J48 decision trees and artificial neural networks were included for comparison, and finally, concluding remarks were given. The data processing was carried out using Microsoft Excel 2013, ArcGIS 10.2, and IDRISI Selva 17.01. The modeling process was carried out using the R programming environment and Weka 3.7.

Study area and data used

Geographic setting of the study area

The corridor of the National Road No. 32 section between the Yen Bai and Lao Cai provinces (Fig. 1) was selected as the study area. The area is located in the northwestern region of Vietnam and covers around 3164 km², between longitudes 103°33′23″E and 104°52′58″E and between latitudes 21°19′53″N and 22°20′18″N. The total length of the road section is about 250 km.

Fig. 1 Landslide inventory map of the study area

The altitude of the study area ranges from 120 to 3140 m a.s.l., with an average altitude of 1078 m and a standard deviation of 555.9 m. Areas in the slope group 0°–15° account for 22.3 % of the total area. About 52.9 % of the study area has slopes greater than 25°, whereas areas in the slope category 15°–25° account for 24.8 % of the total area. Topographically, around 30.6 % of the total area is saddle hillside, whereas ridge areas account for 18.2 %. Approximately 17.8 % of the total area is ravine. Convex and concave areas account for 13.1 and 12.0 % of the total area, respectively.

The climate of the area is characterized by a tropical monsoon regime with hot, rainy, and dry seasons. The average temperature is 22–23 °C and the average humidity is 83–87 %. Rainfall is mainly concentrated in the rainy season from March to November, with an average annual rainfall of around 1500–2200 mm. Rainfall is generally low from December to February. The highest temperature can reach 41 °C, whereas the lowest is around 0 °C (Ho et al. 2010).

Three main fault zones that weaken the rock mass pass through the study area: Fansipan, Tu Le, and Song Da. Thirty-four lithological formations crop out in the study area; among them, 10 formations (Fig. 1) are dominant and account for 88.9 % of the total area. They are Sinh Quyen (2.2 %), Bac Son (1.5 %), Suoi Bang (6.6 %), Muong Trai (2.6 %), Nam Mu (6.1 %), Tu Le complex (22.4 %), Phu Sa Phin complex (9.0 %), Ngoi Thia (12.0 %), Tram Tau (13.2 %), and Phu San Cap complex (13.2 %). Our analysis of these formations shows that tuff, sandstone, clay shale, clayey limestone, siltstone, limestone, trachyte porphyry, rhyolite, and granite are the main lithologies. Landslides are highly concentrated in the Tu Le complex and the Tram Tau formation (Ho et al. 2010).

Data collection and processing

Landslide inventory map

In this study, data collection and processing were carried out by means of a geographic information system. Landslide modeling is based on the statistical hypothesis that landslides will occur in the future under the same conditions that produced them in the past and present (Guzzetti et al. 1999); therefore, a landslide inventory map is highly necessary to understand the conditioning factors that trigger slope failures and their mechanisms (Dai et al. 2002). In this study, a landslide inventory map (Fig. 1) with 262 landslide locations that occurred during the last 20 years was used. These landslides were collected and interpreted using aerial photographs with a resolution of 1 m, and this work was carried out in a national project by Ho et al. (2010). The landslides, comprising 16 translational slides and 246 soil-mixed-boulder slides, are depicted by polygons with a maximum size of 37,326 m² and a minimum size of about 476 m².

Around 14.5 % of the landslides are larger than 10,000 m², whereas only 1.5 % are smaller than 1000 m². Landslides with sizes between 1000 and 5000 m² account for 56.5 % of the total. The remaining landslides (27.5 %) have sizes from 5000 to 10,000 m². It is important to note that some types of failures, such as rock falls and topples, were eliminated because their failure mechanisms are different. Our extensive fieldwork showed that landslides were mainly triggered by heavy rainfall that caused saturation of soils. Photographs of some landslides in the study area are shown in Fig. 2, and detailed explanations of these landslides can be found in Ho et al. (2010).

Fig. 2 Photographs of landslides that occurred in the study area, taken by Ho et al. (2010): a the Khau Pha health clinic center, b Tram Tau area, c Deo Khau Pha area, d Tu Le area, e Cao Pha area and f Nam Kip area

Landslide conditioning factors

Because landslide susceptibility assessments employing soft computing techniques are considered indirect approaches, a large number of input parameters should be considered (Tien Bui et al. 2016b), although a model with too many factors does not necessarily result in higher prediction capability (Floris et al. 2011). Lithology, slope, and aspect are the most widely used conditioning factors (Tien Bui et al. 2015, 2016c), whereas the effectiveness of other factors such as soil type, land use, and road and river networks is still debated among landslide researchers. Conditioning factors should be selected based on the landslide typology and failure mechanism, the characteristics of the study area, the scale of analysis, the available data sets, and the methodology used (Ercanoglu 2005; Manzo et al. 2013).

The relationships between the landslide inventory map and related conditioning factors for this study area were investigated by Ho et al. (2010), and based on their findings, a total of ten conditioning factors were selected, constructed, and converted to a raster format with a resolution of 20 m. They are lithology, distance to faults, slope, aspect, relief amplitude, toposhape, topographic wetness index (TWI), distance to roads, distance to rivers, and rainfall. The detailed classes of these factors are shown in Table 1.

Table 1 Landslide conditioning factors and their class intervals used in this study

Lithology is considered one of the most important factors (Ilia and Tsangaratos 2016) because it influences the geomechanical and hydraulic characteristics of terrain and therefore controls the types and mechanisms of landslides (Dai et al. 2001; Ercanoglu 2005). Faults are considered a critical factor that influences the distribution of landslides (Dou et al. 2015; Hong et al. 2016); therefore, distance to faults was also selected. In this study, the lithology and faults were extracted from the Geological and Mineral Resources Map of Vietnam at a scale of 1:200,000. The lithologic map with 12 groups (Fig. 3) compiled by Ho et al. (2010) was used. A distance to faults map with four classes (Fig. 4a) was constructed.

Fig. 3 Lithologic map

Fig. 4 a Distance to faults map, b slope map, c aspect map and d relief amplitude map

It is well known that slope failures are directly linked to types of terrain; therefore, a digital elevation model (DEM) with a resolution of 20 m was constructed for the study area using national topographic maps at the scale of 1:50,000. Based on the DEM, five geomorphometric factors were extracted: slope, aspect, relief amplitude, toposhape, and topographic wetness index (TWI). Slope is selected for instability analysis because it controls the shear stresses that drive the displacement of hill slopes (Dai et al. 2001). Aspect is a factor that indirectly influences slope failure because slope direction determines the exposure of the terrain to solar radiation and rainfall, which control the soil moisture (Magliulo et al. 2008) and therefore influence landslides. In this study, the slope map (Fig. 4b) was constructed with six classes, whereas the aspect map (Fig. 4c) was built with nine classes.

Relief amplitude, which represents the difference between the highest and lowest points in the terrain, is considered a highly sensitive factor for landslide occurrence (Tang et al. 2010; Vergari et al. 2011). The relief amplitude map with six classes (Fig. 4d) was compiled for the study area. Since landslide occurrences are closely related to topographic attributes (Lineback Gritzner et al. 2001; Zhang et al. 2014), topographic shape is used in landslide susceptibility assessment (Caniani et al. 2008; Ercanoglu 2005). The toposhape map in this study (Fig. 5a) was constructed with ten classes. TWI, which was developed by Beven and Kirkby (1979), combines the local upslope contributing area and the local slope. TWI can quantify the effect of topography on hydrological processes and characterize the distribution of soil moisture and surface saturation (Sørensen et al. 2006); therefore, it is used in landslide susceptibility analysis. In this study, the TWI map (Fig. 5b) with five classes was constructed.

Fig. 5 a Toposhape map, b TWI map, c distance to roads map, d distance to rivers map and e rainfall map

An anthropogenic factor, distance to roads, is used for the assessment of landslide susceptibility because excavations for road cuts may induce slope failures (Lay 2009). In the case of distance to rivers, water may influence the saturation of slopes when it undercuts the banks of streams (Highland and Bobrowsky 2008); therefore, the distance to rivers should be used for landslide modeling. In this study, road and river networks were obtained from the national topographic maps at the scale of 1:50,000, and road and river sections that undercut slopes steeper than 15° were then extracted. The distance to roads map (Fig. 5c) and the distance to rivers map (Fig. 5d) were constructed by buffering the road and river sections. Regarding rainfall, the rainfall map (Fig. 5e) with five classes that was constructed by Ho et al. (2010) was used. This map was constructed based on the average rainfall for the period 1980–2008 using the Inverse Distance Weighted method (Tien Bui et al. 2011). The rainfall data were obtained from the Institute of Meteorology and Hydrology in Vietnam.

Methodology

Sampling strategy and preparation of training and validation data

To build the landslide models and evaluate their performance, the landslide inventory and the ten conditioning factor maps were converted to a grid cell format with a cell size of 20 m. Since the dates of these landslides are not known, the landslide polygons were randomly split into two subsets with a ratio of 70/30 (Tien Bui et al. 2012d). The first subset (2781 landslide pixels) was used for building models, whereas the second one (1011 landslide pixels) was used for model validation.

Landslide susceptibility assessment using data mining methods can be considered a binary classification problem; therefore, it requires both positive data (in the current case, the presence of landslides) and negative data (the absence of landslides). Because the number of landslide pixels (3792 pixels) is much smaller than the total number of pixels in the study area (7,871,195 pixels), we used the under-sampling method (Pradhan 2013; Tien Bui et al. 2016d) in this study. Accordingly, the same number of non-landslide pixels was randomly sampled from the landslide-free area. The landslide pixels were assigned a value of “1”, whereas the non-landslide pixels were assigned a value of “0”. Finally, values for the ten landslide conditioning factors were extracted to build the training and validation datasets.
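The balanced sampling described above can be illustrated with a minimal Python sketch. It assumes that pixels are identified by integer indices; the function name `build_balanced_sample`, the fixed random seed, and the toy pixel counts are illustrative assumptions and are not taken from the study.

```python
import random

def build_balanced_sample(landslide_ids, all_ids, seed=0):
    """Pair the landslide pixels (label 1) with an equal number of randomly
    drawn non-landslide pixels (label 0)."""
    rng = random.Random(seed)
    landslide_set = set(landslide_ids)
    # Candidate negatives: every pixel outside the mapped landslide polygons.
    candidates = [p for p in all_ids if p not in landslide_set]
    # Under-sampling: draw without replacement as many negatives as positives.
    negatives = rng.sample(candidates, len(landslide_set))
    labelled = [(p, 1) for p in sorted(landslide_set)]
    labelled += [(p, 0) for p in negatives]
    return labelled

# Toy example (hypothetical): 5 landslide pixels in a 100-pixel area.
sample = build_balanced_sample([3, 17, 42, 58, 90], range(100))
```

The conditioning factor values would then be looked up for each sampled pixel to form the rows of the training or validation table.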

Feature selection and correlation analysis

The overall performance of landslide models using soft computing methods may be improved with the use of feature selection (Doshi and Chaturvedi 2014). This is because the training dataset may contain noisy features that confuse the models; therefore, feature selection was used in this study. Various methods and techniques have been proposed for this task, such as Information Gain (Quinlan 1993), Symmetrical uncertainty (Senthamarai Kannan and Ramaraj 2010), fuzzy rough set (Dai and Xu 2013), and PSO-based feature selection (Ajit Krisshna et al. 2014). In this study, Information Gain was used because it is one of the most widely used feature selection techniques in soft computing (Martínez-Álvarez et al. 2013; Witten et al. 2011), including landslide modeling (Tien Bui et al. 2016e). In addition, Information Gain helps to identify the importance of the input variables (Yang et al. 2011).

The Information Gain value for landslide conditioning factor \(L_{i}\) with respect to the output class Y (landslide and non-landslide) is measured (Eq. 1) by calculating the reduction of the information (entropy) in bits.

$${\text{Information}}\;{\text{Gain}}(Y,L_{i} ) = H(Y) - H(Y|L_{i} )$$
(1)

where H(Y) is the entropy of Y and is calculated using Eq. (2), and \(H(Y|L_{i})\) is the entropy of Y after observing the values of landslide conditioning factor \(L_{i}\) and is estimated using Eq. (3)

$$H(Y) = - \sum\limits_{k} {P(Y_{k} )} \log_{2} (P(Y_{k} ))$$
(2)
$$H(Y|L_{i} ) = - \sum\limits_{j} {P(l_{ij} )} \sum\limits_{k} {P(Y_{k} |l_{ij} )} \log_{2} (P(Y_{k} |l_{ij} ))$$
(3)

where \(P(Y_{k})\) is the prior probability of output class \(Y_{k}\), \(P(l_{ij})\) is the probability that factor \(L_{i}\) takes its jth class \(l_{ij}\), and \(P(Y_{k}|l_{ij})\) is the posterior probability of \(Y_{k}\) given that class.
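Equations (1)–(3) can be sketched in a few lines of Python. This is an illustration only, not the Weka implementation used in the study; it assumes a categorical (classified) conditioning factor and weights the entropy within each factor class by the relative frequency of that class, which is the standard definition of conditional entropy. The function names and the toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy H(Y) in bits of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(factor_values, labels):
    """H(Y) - H(Y|L) for one categorical conditioning factor L."""
    n = len(labels)
    groups = {}
    for value, label in zip(factor_values, labels):
        groups.setdefault(value, []).append(label)
    # Conditional entropy: entropy inside each factor class,
    # weighted by the relative frequency of that class.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data (hypothetical): one factor separates the classes, one does not.
labels = [1, 1, 0, 0]
perfect = information_gain(["a", "a", "b", "b"], labels)
useless = information_gain(["a", "b", "a", "b"], labels)
```

In this toy case the first factor recovers the full 1 bit of class entropy, whereas the second recovers none, which is the sense in which a higher Information Gain indicates a more informative factor.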

The prediction performance of landslide susceptibility models may be degraded if the conditioning factors depend on one another; therefore, the degree of correlation between these factors should be checked. In this study, Spearman’s rank correlation (Myers and Sirois 2014) was used to analyze the relationships between the conditioning factors. The main advantage of Spearman’s rank correlation is that it is not affected by the distribution of the data. In addition, it can still be efficient with small sample sizes (Gautheir 2001).

The strength of correlation according to Spearman’s rank coefficient is classified as: very strong (0.9–1.0); strong (0.7–0.9); moderate (0.4–0.7); low, not very significant (0.2–0.4); and very weak to negligible (0.0–0.2) (Passman et al. 2011).
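As an illustration of this check, the following Python sketch computes Spearman’s coefficient as the Pearson correlation of rank-transformed values (tied values receive the mean of their ranks) and maps the result to the verbal scale quoted above. The function names are hypothetical, and assigning each shared boundary value (for example 0.7) to the stronger category is an assumption, since the quoted intervals overlap at their end points.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman(x, y):
    """Spearman's rho: Pearson correlation of the rank-transformed data."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

def strength(rho):
    """Map |rho| to the verbal scale quoted in the text (boundary handling assumed)."""
    r = abs(rho)
    if r >= 0.9:
        return "very strong"
    if r >= 0.7:
        return "strong"
    if r >= 0.4:
        return "moderate"
    if r >= 0.2:
        return "low"
    return "very weak to negligible"

# Toy data (hypothetical) with ties in the second variable.
rho = spearman([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```

In practice the coefficient would be computed for every pair of conditioning factors over the sampled pixels, and pairs falling in the strong or very strong categories would be candidates for removal.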

Functional trees classifier

A decision tree is a hierarchical model composed of decision rules that can be used for both regression and classification problems. Decision trees comprise a large number of algorithms, and some of them have been proposed for landslide modeling with promising results, such as Classification and Regression Trees (Felicisimo et al. 2013), Chi-square Automatic Interaction Detector Decision Trees (Althuwaynee et al. 2014), C4.5 or J48 (Tien Bui et al. 2013a), Random forests (Trigila et al. 2015), Alternating decision tree (Hong et al. 2015a), and Logistic model trees (Tien Bui et al. 2016e). A newer algorithm, functional trees (FT) (Gama 2004), has shown promising results in other fields (Witten et al. 2011) but has seldom been explored for landslide modeling and was therefore selected in this study.

Consider a training dataset D with n samples \((X_{i}, Y_{i})\) with \(X_{i} \in R^{10}\) and \(Y_{i} \in \left\{ {\text{1,0}} \right\}\). \(X_{i}\) is an input vector comprising the ten landslide conditioning factors (slope, aspect, relief amplitude, topographic wetness index, topographic shape, distance to roads, distance to rivers, distance to faults, lithology, and rainfall), and \(Y_{i}\) is the output that consists of two classes, landslide and no-landslide. The aim of FT is to build a decision tree that separates the two classes in this set of training data. The main difference between traditional decision tree algorithms and FT is that the traditional algorithms divide the input data at tree nodes by comparing the value of some input attribute with a constant, whereas FT uses logistic regression functions for the splitting in the inner nodes (called oblique split) and for prediction at the leaves (Witten et al. 2011). There are three variants of FT: (1) the full FT, which uses regression models for both the inner nodes and the leaves; (2) FT inner, which uses regression models for only the inner nodes; and (3) FT leaves, which uses regression models for only the leaves. In this study, the FT leaves variant was used.

The FT uses (1) the gain ratio as the splitting criterion to select an input attribute to split on; (2) standard C4.5 pruning (Quinlan 1996) to prevent over-fitting; and (3) LogitBoost (iterative reweighting) with least-squares fits (Doetsch et al. 2009) to fit the logistic regression functions at the leaves for each class \(Y_{i}\) (Eq. 4).

$$f_{{Y_{i} }} (X) = \sum\limits_{i = 1}^{10} {\beta_{i} X_{i} \text{ }{ + }\beta_{0} } \,$$
(4)

where \(\beta_{i}\) is the coefficient of the ith component of the input vector X and \(\beta_{0}\) is the intercept. The posterior probability at a leaf, P(X), is calculated as follows (Landwehr et al. 2005):

$$P\text{(}X\text{)} = \frac{{{\text{e}}^{{2f_{{Y_{i} }} (X)}} }}{{1 + {\text{e}}^{{2f_{{Y_{i} }} (X)}} }}$$
(5)
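Equations (4) and (5) can be evaluated as in the following Python sketch. The coefficients and the two-component input vector are hypothetical values chosen only to show the arithmetic; in the FT model they would be fitted by LogitBoost, and the input vector would have ten components. The probability is written in an algebraically equivalent form, 1 / (1 + e^(-2f)).

```python
from math import exp

def leaf_score(x, betas, beta0):
    """Linear function of Eq. (4): beta0 + sum_i beta_i * x_i."""
    return beta0 + sum(b * v for b, v in zip(betas, x))

def leaf_probability(score):
    """Eq. (5): exp(2f) / (1 + exp(2f)), rewritten as 1 / (1 + exp(-2f))."""
    return 1.0 / (1.0 + exp(-2.0 * score))

# Hypothetical coefficients and input, for illustration only.
score = leaf_score([1.0, 2.0], betas=[0.5, -0.25], beta0=0.1)
probability = leaf_probability(score)
```

A score of zero corresponds to a probability of 0.5, and positive scores push the probability toward 1 for the class whose function is being evaluated.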

Ensemble learning algorithms

This section briefly describes the three ensemble learning algorithms, Bagging, AdaBoost, and MultiBoost, that were used to establish ensemble models for landslide susceptibility in this study.

Bagging

Bagging (also known as bootstrap aggregation), a machine learning ensemble method proposed by Breiman (1996), is used in this study to obtain more robust and accurate landslide models. Bagging has been shown to be useful in landslide susceptibility modeling because it stabilizes classifiers that are sensitive to small changes in the training data and may therefore improve the prediction capability of the model (Tien Bui et al. 2014). The procedure of the bagging algorithm consists of three steps: (1) first, bootstrap samples are obtained by randomly resampling from the training dataset to form a set of training subsets; (2) then, multiple classifier-based models are constructed, one on each subset; and (3) finally, the final model is formed by aggregating all classifier-based models.
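The three steps can be sketched in Python as follows. To keep the example self-contained, a one-threshold decision stump on a single feature stands in for the FT base classifier, aggregation is done by majority vote, and the toy data are hypothetical; none of these choices is taken from the study.

```python
import random

def fit_stump(samples):
    """Toy base learner: best single threshold on a 1-D feature.
    samples is a list of (x, y) pairs with y in {0, 1}."""
    best_err, best_rule = None, None
    for t in sorted({x for x, _ in samples}):
        for above in (0, 1):
            # 'above' is the class predicted for x >= t.
            err = sum((above if x >= t else 1 - above) != y for x, y in samples)
            if best_err is None or err < best_err:
                best_err, best_rule = err, (t, above)
    t, above = best_rule
    return lambda x: above if x >= t else 1 - above

def bagging(samples, fit, n_models=25, seed=0):
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        # Step 1: bootstrap sample of the same size, drawn with replacement.
        boot = [rng.choice(samples) for _ in samples]
        # Step 2: one base classifier per bootstrap sample.
        models.append(fit(boot))

    # Step 3: aggregate the committee by majority vote.
    def predict(x):
        votes = sum(m(x) for m in models)
        return 1 if 2 * votes > len(models) else 0

    return predict

# Toy data (hypothetical): class 1 for large feature values.
train = [(0.1, 0), (0.2, 0), (0.3, 0), (0.4, 0),
         (0.6, 1), (0.7, 1), (0.8, 1), (0.9, 1)]
model = bagging(train, fit_stump)
```

Because each committee member sees a slightly different resample, individual errors caused by small perturbations of the training data tend to be averaged out in the vote.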

AdaBoost

AdaBoost (also known as adaptive boosting) is a relatively new machine learning ensemble algorithm proposed by Freund and Schapire (1997). In contrast to Bagging, where training subsets are randomly sampled independently of the previous step, training subsets are obtained sequentially in the adaptive boosting ensemble. Compared to Bagging, AdaBoost provides control of both bias and variance; however, Bagging achieves better variance reduction (Ganjisaffar et al. 2011). The procedure of the AdaBoost algorithm is as follows: (1) first, a subset is generated from the training dataset and an initial classifier-based model is constructed, with all instances assigned equal weights; (2) the initial model is used to predict all instances in the training dataset, and the misclassified instances are given higher weights, whereas the weights of the correctly classified instances remain unchanged; (3) in the next step, the weights of all instances in the training dataset are normalized and a new subset is randomly sampled to build the next classifier-based model. This process continues until a termination condition is reached (Tien Bui et al. 2013a). The final model is obtained as a weighted sum of all the classifier-based models.
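A minimal Python sketch of this procedure is given below. Two simplifications are assumed and should not be read as the implementation used in the study: the base learner is a decision stump rather than an FT, and the instance weights are passed directly to the base learner instead of being used to draw a weighted subset (a common, deterministic variant of the same idea). All names and the toy data are hypothetical.

```python
from math import log

def fit_weighted_stump(samples, weights):
    """Toy base learner: threshold on a 1-D feature minimising weighted error."""
    best_err, best_rule = None, None
    for t in sorted({x for x, _ in samples}):
        for above in (0, 1):
            err = sum(w for (x, y), w in zip(samples, weights)
                      if (above if x >= t else 1 - above) != y)
            if best_err is None or err < best_err:
                best_err, best_rule = err, (t, above)
    t, above = best_rule
    return (lambda x: above if x >= t else 1 - above), best_err

def adaboost(samples, n_rounds=3):
    n = len(samples)
    weights = [1.0 / n] * n                      # (1) equal initial weights
    committee = []                               # (vote weight, classifier) pairs
    for _ in range(n_rounds):
        clf, err = fit_weighted_stump(samples, weights)
        if err >= 0.5:                           # no better than chance: stop
            break
        err = max(err, 1e-10)                    # guard against division by zero
        factor = (1.0 - err) / err
        committee.append((log(factor), clf))
        # (2) raise the weights of misclassified instances, keep the others
        weights = [w * factor if clf(x) != y else w
                   for (x, y), w in zip(samples, weights)]
        # (3) normalise before training the next classifier
        total = sum(weights)
        weights = [w / total for w in weights]

    def predict(x):
        # Final model: weighted vote of all committee members.
        score = sum(a if clf(x) == 1 else -a for a, clf in committee)
        return 1 if score > 0 else 0

    return predict

# Toy data (hypothetical): class 1 lies in the middle, so one stump cannot fit it.
train = [(0.1, 0), (0.2, 0), (0.4, 1), (0.5, 1), (0.6, 1), (0.8, 0), (0.9, 0)]
booster = adaboost(train, n_rounds=3)
```

On this toy set a single stump necessarily misclassifies some points, whereas the weighted committee of three stumps classifies all seven training points correctly, which illustrates how the sequential reweighting focuses later classifiers on earlier mistakes.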

MultiBoost

MultiBoost is an extension of the AdaBoost algorithm that combines the strengths of Boosting and Wagging to prevent overfitting (Webb 2000). Wagging is a variant of Bagging; however, instead of using random bootstrap samples to form a set of training subsets, it assigns random weights to the cases in each training subset. The procedure of the MultiBoost algorithm is as follows: (1) using the training dataset, random selection with replacement is carried out to build a set of training subsets, which are then used to build classifier-based models; (2) the instance weights are reset according to the overall accuracy of the classifier-based models; (3) new subsets are then sampled repeatedly according to the instance weights to train further classifier-based models, and the result is a committee of classifiers.

Performance assessment and comparison of landslide susceptibility models

Accuracy, Sensitivity, and Specificity are the three statistical evaluation measures generally used to assess the overall performance of the landslide susceptibility models (Tien Bui et al. 2016b). Accuracy is the proportion of pixels that are classified correctly. Sensitivity is the proportion of landslide pixels that are correctly classified whereas Specificity is the proportion of the non-landslide pixels that are correctly classified.

$${\text{Sensitivity}} = \frac{\text{TP}}{\text{TP + FN}};\quad {\text{Specificity = }}\frac{\text{TN}}{\text{FP + TN}};\quad {\text{Accuracy = }}\frac{\text{TP + TN}}{\text{TP + TN + FP + FN}}$$
(6)

where true positives (TP) and true negatives (TN) are the numbers of pixels that are correctly classified, and false positives (FP) and false negatives (FN) are the numbers of pixels that are erroneously classified.
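As a worked example of Eq. (6), the following Python sketch counts the four outcomes and derives the three measures. The label vectors are hypothetical and the function name `evaluate` is illustrative.

```python
def evaluate(y_true, y_pred):
    """Sensitivity, specificity and accuracy as defined in Eq. (6)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (fp + tn),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Hypothetical labels: 4 landslide and 4 non-landslide pixels.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
scores = evaluate(y_true, y_pred)
```

Here TP = 3, FN = 1, TN = 3, and FP = 1, so all three measures equal 0.75.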

The overall performance of the landslide susceptibility models is assessed through the receiver operating characteristic (ROC) curve. The ROC curve is constructed by plotting the true positive rate against the false positive rate in a two-dimensional space (Fawcett 2006). The ROC curve technique is attractive because it is insensitive to changes in class distribution: if the proportions of landslide and non-landslide pixels in the validation dataset are varied, the ROC curve remains unchanged. The area under the ROC curve (AUC) is a summary measure of the ROC analysis that quantifies (1) the goodness-of-fit of the landslide models on the training dataset and (2) the prediction capability of the landslide models on the validation data. An AUC value of 1 indicates a perfect model, whereas an AUC value of 0.5 indicates a non-informative model. The closer the AUC value is to 1, the better the landslide model.
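The AUC can be computed without drawing the curve, using its rank interpretation: it equals the proportion of (landslide, non-landslide) pixel pairs in which the landslide pixel receives the higher susceptibility score, with ties counted as one half. The Python sketch below uses this pairwise form, which is quadratic in the number of pixels and is meant only as an illustration; the scores and labels are hypothetical.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical susceptibility indices for 3 landslide and 3 non-landslide pixels.
scores = [0.9, 0.8, 0.35, 0.6, 0.4, 0.2]
labels = [1, 1, 1, 0, 0, 0]
base = auc(scores, labels)

# Doubling the non-landslide pixels changes the class proportions
# but leaves the AUC unchanged.
doubled = auc(scores + [0.6, 0.4, 0.2], labels + [0, 0, 0])
```

The second call illustrates the insensitivity to class distribution noted above: duplicating the non-landslide pixels alters the class ratio but not the AUC.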

Assessing model performance using only the ROC curve may not be the best approach, because a high AUC value is not necessarily associated with high spatial accuracy in some cases (Aguirre-Gutiérrez et al. 2013). Therefore, in this study, the prediction–rate curve method (Chung and Fabbri 2003) was also used. The prediction–rate results were obtained by overlaying the landslide pixels of the validation dataset on the landslide susceptibility maps, and the prediction–rate curve was then constructed by plotting the cumulative percentage of the susceptibility map area against the cumulative percentage of the landslide pixels. The area under the prediction–rate curve (AUC_P) was used to quantify the prediction capability of the landslide models; an AUC_P equal to 1 indicates perfect prediction accuracy.
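The construction of the prediction–rate curve can be sketched in Python as follows. The sketch assumes that pixels are ranked from the highest to the lowest susceptibility index before accumulation and that the area is obtained with the trapezoidal rule; both are assumptions made for illustration, and the ten-pixel map is hypothetical. Pixels with equal indices are kept in their input order.

```python
def prediction_rate_curve(susceptibility, is_landslide):
    """Cumulative share of map area (x) versus cumulative share of
    validation landslide pixels (y), ranking pixels from high to low index."""
    order = sorted(range(len(susceptibility)),
                   key=lambda i: susceptibility[i], reverse=True)
    total = sum(is_landslide)
    xs, ys, found = [0.0], [0.0], 0
    for rank, i in enumerate(order, start=1):
        found += is_landslide[i]
        xs.append(rank / len(order))
        ys.append(found / total)
    return xs, ys

def area_under_curve(xs, ys):
    """Trapezoidal area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:]))

# Hypothetical map of 10 pixels, 3 of which are validation landslide pixels.
susceptibility = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_landslide = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
xs, ys = prediction_rate_curve(susceptibility, is_landslide)
auc_p = area_under_curve(xs, ys)
```

In this toy map all three landslide pixels fall within the 40 % of the area with the highest indices, so the curve rises steeply and the area is well above the 0.5 expected from a random ranking.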

Results and analysis

Feature selection and correlation analysis

Using Information Gain, the predictive ability of the ten conditioning factors was quantified. The result is shown in Table 2, in which the average merit is the average Information Gain, reported with its standard deviation over tenfold cross-validation. It can be seen that the distance to roads has the highest Information Gain (0.266), followed by the slope (0.09), the aspect (0.048), the toposhape (0.045), the TWI (0.043), the relief amplitude (0.04), the distance to rivers (0.038), the rainfall (0.031), the lithology (0.029), and the distance to faults (0.014). Since all ten factors have positive Information Gain, all of them were included in this analysis.

Table 2 Average Information Gain for the landslide conditioning factors

The result of the Spearman correlation analysis of the ten conditioning factors is shown in Table 3. It can be observed that no pair of factors is strongly correlated: the highest correlation value, 0.497, is found between the slope and the relief amplitude. This value is less than the critical value of 0.7 (Martín et al. 2012); therefore, none of the ten factors was eliminated in this analysis.

Table 3 Spearman’s correlation between pairs of landslide conditioning factors

Performance assessment of landslide susceptibility models

The performance of the FT model may be influenced by the minimum number of instances per leaf; therefore, a test was carried out by varying the number of instances per leaf and recording the classification accuracy on both the training and validation data (Tien Bui et al. 2012a). The result showed that 30 instances per leaf is the best value for this study. For building the FT model, LogitBoost with 15 iterations (the default parameter) was used. Using tenfold cross-validation, the FT model was constructed with the standard top-down approach. Accordingly, at each internal node, the splitting was carried out using the gain ratio, and logistic regression models were then constructed for the leaves of the FT model.

The resulting FT model for the assessment of landslide susceptibility is shown in Fig. 6. It can be seen that the size of the tree is 71, comprising (1) the root node (orange); (2) 34 internal nodes (purple); and (3) 36 leaves (green rectangular boxes). In the leaves, LS denotes the landslide class, No-LS denotes the non-landslide class, and FT denotes the functional tree number. The largest number of instances in a leaf node of the FT model is 508, whereas the smallest is 62.

Fig. 6 The functional tree model for landslide susceptibility assessment of this study area

As an example, the label FT25:15/210(152) in Fig. 6 is read as follows: (1) the first number (15) is the number of LogitBoost iterations performed at this node; (2) the second number (210) is the total number of LogitBoost iterations performed, including iterations at higher levels of the tree; and (3) the number in parentheses (152) is the number of training instances at this node. The logistic regression functions for node 25 are:

$$\begin{aligned} & {\text{Non-Landslide class}}{:} \, 12.66 - 2.76*{\text{Slope}}-0.07\,* \\ & \quad {\text{Aspect}} - 0.94*{\text{RF}} + 1.1*{\text{TWI}} + 0.46*{\text{TopoShade}} + 0.29\,* \\ & \quad {\text{Lithology}}-0.5*{\text{Faults}} - 0.61*{\text{Roads}} - 0.19*{\text{Rivers}} - 2.87*{\text{Rainfall}}.\\ \end{aligned}$$
$$\begin{aligned} & {\text{Landslide class}}{:} \, - 12.66 - 2.76*{\text{Slope}}-0.07\,* \\ & \quad {\text{Aspect}} - 0.94*{\text{RF}} + 1.1*{\text{TWI}} + 0.46*{\text{TopoShade}} + 0.29\,* \\ & \quad {\text{Lithology}}-0.5*{\text{Faults}} - 0.61*{\text{Roads}} - 0.19*{\text{Rivers}} - 2.87*{\text{Rainfall}}. \\ \end{aligned}$$

Since the aim of this study is to propose and verify three novel ensemble frameworks (Bagging, AdaBoost, and MultiBoost) for landslide susceptibility modeling, three ensemble models that use FT as the base classifier were constructed, and the results are shown in Table 4. All three ensemble algorithms improved the model performance and achieved a higher goodness-of-fit to the training data than the FT model. The highest fit to the training data was obtained by the FT with AdaBoost model (96.1 %), followed by the FT with MultiBoost model (95.9 %), the FT with Bagging model (94.6 %), and the FT model (91.5 %). The FT with AdaBoost model also has the highest overall classification accuracy (90.919 %), followed by the FT with MultiBoost model (90.685 %), the FT with Bagging model (88.563 %), and the FT model (87.7 %).

Table 4 Training results for the four landslide models with tenfold cross-validation

The FT with AdaBoost model has the highest sensitivity (93.492 %), indicating that 93.492 % of the landslide pixels are correctly classified to the landslide class. It is closely followed by the FT with MultiBoost model (92.844 %), the FT model (90.076 %), and the FT with Bagging model (89.824 %). Regarding specificity, the three ensemble models have almost equal values, meaning that the probability of classifying non-landslide pixels to the non-landslide class is almost the same. The Kappa index of the four susceptibility models varies from 0.754 (the FT model) to 0.818 (the FT with AdaBoost model), indicating good agreement between the models and the training data.
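The statistical measures discussed above can be derived from a two-class confusion matrix. The following is a minimal sketch, assuming the standard definitions of sensitivity, specificity, overall accuracy, and Cohen's Kappa; the function name and the pixel counts are hypothetical and do not reproduce Table 4.

```python
def classification_metrics(tp, fn, tn, fp):
    """Sensitivity, specificity, accuracy, and Kappa from confusion-matrix counts.

    tp/fn: landslide pixels classified correctly/incorrectly;
    tn/fp: non-landslide pixels classified correctly/incorrectly.
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    chance = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / n ** 2
    kappa = (accuracy - chance) / (1 - chance)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "accuracy": accuracy, "kappa": kappa}

# Hypothetical counts for illustration only.
metrics = classification_metrics(tp=90, fn=10, tn=80, fp=20)
print(metrics)
```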

Once the FT and the three ensemble models were built in the training phase, they were used to calculate the susceptibility index for all the pixels in the study area. These indexes were exported into a GIS format using an application developed in C++ and then opened in ArcGIS 10.2. For visualization of the landslide susceptibility maps, the indexes were grouped into five susceptibility levels: very high, high, moderate, low, and very low (Chung et al. 1995). Although various methods can be used to classify susceptibility indexes, such as the equal interval method, the natural break method, and the standard deviation method (Ayalew and Yamagishi 2005), the classification method based on the graphical curve (Chung and Fabbri 2008; Tien Bui et al. 2012e; Van Westen et al. 2003) is considered the most widely used and was adopted in this study.

In this method, all landslide pixels were first overlaid on the four landslide susceptibility maps. Then, the cumulative percentages of the landslide pixels versus the percentages of landslide susceptibility indexes were calculated, and finally, the graphical curve was derived. A detailed explanation of how to build the graphical curve can be found in Chung et al. (1995) and Chung and Fabbri (2008). Based on the graphical curves, five susceptibility classes were determined as very high (5 %), high (10 %), moderate (15 %), low (20 %), and very low (50 %) (Fig. 7).
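The class percentages above can be applied to a map by ranking the pixels. The sketch below is one possible reading, assuming that the percentages are shares of the map area taken from the highest index downward; the function name `classify_by_area` and the input indexes are hypothetical.

```python
# Area shares (percent of pixels) per class, from the highest index downward.
SHARES = (("very high", 5), ("high", 10), ("moderate", 15),
          ("low", 20), ("very low", 50))

def classify_by_area(indexes, shares=SHARES):
    """Label each pixel by its rank, using cumulative area shares."""
    n = len(indexes)
    # Pixel positions ordered from the highest to the lowest index.
    order = sorted(range(n), key=lambda i: indexes[i], reverse=True)
    labels = [shares[-1][0]] * n  # default: the lowest class
    start, cumulative = 0, 0
    for name, share in shares:
        cumulative += share
        end = cumulative * n // 100  # integer arithmetic avoids float drift
        for i in order[start:end]:
            labels[i] = name
        start = end
    return labels

# Hypothetical susceptibility indexes for 100 pixels.
indexes = [i / 100 for i in range(100)]
labels = classify_by_area(indexes)
print({name: labels.count(name) for name, _ in SHARES})
```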

Fig. 7 Landslide susceptibility map using: a the functional tree model, b the functional tree with AdaBoost model, c the functional tree with Bagging model; and d the functional tree with MultiBoost model

Model validation and comparison

The prediction capability of the four susceptibility models was evaluated and compared using the validation dataset, which was not used in the training phase. The results are shown in Table 5 and Fig. 8. The FT with Bagging model has the highest AUC (0.917), indicating a prediction accuracy of 91.7 %, followed closely by the FT with MultiBoost model (91 %), the FT model (89.8 %), and the FT with AdaBoost model (88.2 %). The FT with AdaBoost model has the lowest Kappa index (0.604), whereas the FT with Bagging model has the highest one (0.711) (Table 5).

Table 5 Model validation
Fig. 8 Model validation with the ROC curves and AUC analysis for the four landslide susceptibility maps using the functional tree model, the functional tree with AdaBoost model, the functional tree with Bagging model, and the functional tree with MultiBoost model
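For readers who want to reproduce an AUC value from susceptibility indexes, the sketch below uses the rank interpretation of the AUC: the probability that a randomly chosen landslide pixel receives a higher index than a randomly chosen non-landslide pixel, with ties counted as one half. This is a generic illustration, not the software used in the study; the function name and the scores are hypothetical.

```python
def roc_auc(landslide_scores, non_landslide_scores):
    """AUC as the share of (landslide, non-landslide) pairs ranked correctly."""
    wins = 0.0
    for p in landslide_scores:
        for q in non_landslide_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(landslide_scores) * len(non_landslide_scores))

# Hypothetical susceptibility indexes for validation pixels.
landslide_scores = [0.9, 0.8, 0.4]
non_landslide_scores = [0.7, 0.3, 0.2]
auc = roc_auc(landslide_scores, non_landslide_scores)
print(round(auc, 3))
```

The pairwise loop is quadratic in the number of pixels, which is acceptable for a validation set of about a thousand pixels per class; a rank-based formula would be preferable for much larger sets.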

The detailed statistical measures of the validation results are shown in Table 5. The highest classification accuracy is for the FT with Bagging model (85.552 %), whereas the lowest is for the FT with AdaBoost model (80.208 %). The classification accuracy is almost equal for the FT with MultiBoost model (83.869 %) and the FT model (83.671 %). The FT with Bagging model has the highest sensitivity (81.998 %), indicating that the probability of correctly classifying landslide pixels to the landslide class is 81.998 %, followed by the FT model (81.503 %), the FT with MultiBoost model (76.855 %), and the FT with AdaBoost model (68.447 %). The highest specificity is for the FT with AdaBoost model (91.98 %), indicating that 91.98 % of the non-landslide pixels are correctly classified to the non-landslide class. It is closely followed by the FT with MultiBoost model (90.891 %) and the FT with Bagging model (89.109 %). The lowest specificity is for the FT model (85.842 %), indicating that the probability of correctly classifying non-landslide pixels to the non-landslide class is 85.842 %.

The prediction rate of the four susceptibility models was assessed using the spatial cross-validation procedure described in Sect. 3.5. The areas under the prediction–rate curves (AUC_P) were then estimated and are shown in Fig. 9. The FT with Bagging and the FT with MultiBoost models have the highest prediction capability (89.7 %). They are followed by the FT model (86.2 %) and the FT with AdaBoost model (85.6 %).

Fig. 9 Model validation with the prediction–rate curve and AUC_P analysis for the four landslide susceptibility maps using the functional tree model, the functional tree with AdaBoost model, the functional tree with Bagging model, and the functional tree with MultiBoost model
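A prediction–rate curve can be sketched as follows, assuming that the map pixels are ranked from the highest to the lowest susceptibility index and that the curve plots the cumulative share of validation landslide pixels against the cumulative share of the map. The area is integrated with the trapezoidal rule. The function name and the data are hypothetical, and ties between indexes are resolved by sort order.

```python
def prediction_rate_auc(indexes, is_landslide):
    """Area under the cumulative landslide share vs. cumulative map share curve."""
    n = len(indexes)
    total_landslides = sum(is_landslide)
    # Pixel positions ordered from the highest to the lowest index.
    order = sorted(range(n), key=lambda i: indexes[i], reverse=True)
    area, found = 0.0, 0
    x_prev, y_prev = 0.0, 0.0
    for rank, i in enumerate(order, start=1):
        found += is_landslide[i]
        x, y = rank / n, found / total_landslides
        area += (x - x_prev) * (y + y_prev) / 2  # trapezoid between two points
        x_prev, y_prev = x, y
    return area

# Hypothetical map of ten pixels, three of which are validation landslides.
indexes = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
is_landslide = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
auc_p = prediction_rate_auc(indexes, is_landslide)
print(round(auc_p, 4))
```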

Based on the aforementioned results, it can be concluded that the FT with Bagging is the best model for landslide susceptibility mapping in this study.

Similarities and dissimilarities of the four landslide susceptibility maps and their classes

To evaluate similarities and dissimilarities of the geographic patterns in the five classes of the four landslide susceptibility maps, three Kappa statistics (Kappa index, Kappa location, and Kappa histogram) were used. This task was carried out using the Map Comparison Kit (Visser and de Nijs 2006). The Kappa index (Cohen 1960), which is based on the level of agreement, is widely used to measure the similarity between a pair of landslide susceptibility maps. Kappa location (Pontius 2000) and Kappa histogram (Hagen 2002) are extensions of the Kappa index. Kappa location compares the actual success rate to the success rate expected by chance, to assess the similarity of location regarding the spatial distribution of categories on the maps (Pontius 2000). Kappa histogram measures quantitative similarity (fraction of pixels) based on the histograms of the two maps (Prasad et al. 2006). The values of the Kappa statistics typically range from 0 to 1. A value of 1 indicates that the two classes are identical (total agreement), while a value of 0 indicates no agreement between the two classes. Following Landis and Koch (1977), the degree of agreement is interpreted as almost perfect for 0.8–1.0, substantial for 0.6–0.8, moderate for 0.4–0.6, fair for 0.2–0.4, slight for 0–0.2, and poor for ≤0.
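The three statistics can be sketched for a pair of categorical maps as follows. The sketch assumes the common decomposition in which the Kappa index is the product of Kappa location and Kappa histogram, with the maximum attainable agreement taken from the two class histograms. It is an illustration under these assumptions and not the Map Comparison Kit implementation; the function name and the two small maps are hypothetical.

```python
from collections import Counter

def kappa_statistics(map_a, map_b):
    """Return (Kappa index, Kappa location, Kappa histogram) for two maps.

    Both maps are flat sequences of class labels for the same pixels.
    """
    n = len(map_a)
    count_a, count_b = Counter(map_a), Counter(map_b)
    classes = set(count_a) | set(count_b)
    # Observed agreement: share of pixels with the same class on both maps.
    p_observed = sum(a == b for a, b in zip(map_a, map_b)) / n
    # Agreement expected by chance, from the two class histograms.
    p_expected = sum(count_a[c] * count_b[c] for c in classes) / n ** 2
    # Maximum agreement attainable given the two histograms.
    p_max = sum(min(count_a[c], count_b[c]) for c in classes) / n
    kappa = (p_observed - p_expected) / (1 - p_expected)
    k_location = (p_observed - p_expected) / (p_max - p_expected)
    k_histogram = (p_max - p_expected) / (1 - p_expected)
    return kappa, k_location, k_histogram

# Two hypothetical six-pixel maps with classes H (high), M (moderate), L (low).
map_a = ["H", "H", "L", "L", "M", "M"]
map_b = ["H", "L", "L", "L", "M", "H"]
kappa, k_location, k_histogram = kappa_statistics(map_a, map_b)
print(kappa, k_location, k_histogram)
```

Under the same assumptions, a per-class value could be obtained by first recoding each map to the class of interest versus all other classes.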

Table 6 shows the results of the comparison of the four landslide susceptibility maps in terms of Kappa statistics. The Kappa indexes for the four susceptibility maps vary from 0.246 to 0.423, indicating that the similarity between the four susceptibility maps is low. Looking at the Kappa index values for the susceptibility classes (Table 6), the highest similarity is in the very high class obtained from the FT and the FT with Bagging models (Kappa index of 0.810). The largest dissimilarity is for the low susceptibility classes produced by the FT and the FT with MultiBoost models (Kappa index of 0.057). The highest value of Kappa location is 0.482, for the two maps obtained from the FT with AdaBoost and the FT with MultiBoost models, indicating that the similarity of the spatial distributions of susceptibility indexes over the two maps is moderate, whereas the very high classes of the FT and the FT with Bagging models have the highest similarity in terms of spatial distribution. The largest dissimilarity in the spatial distributions is for the low susceptibility classes obtained from the FT and the FT with AdaBoost models (Kappa location of 0.073). The values of Kappa histogram are generally high when comparing the four susceptibility maps, indicating a very high quantitative similarity. An interpretation of the Kappa histogram values for the five susceptibility classes shows that the highest quantitative dissimilarity (Kappa histogram of 0.521) is for two pairs of low susceptibility classes: those obtained from the FT and the FT with MultiBoost models, and those obtained from the FT with Bagging and the FT with MultiBoost models.

Table 6 Kappa index, Kappa location, and Kappa histogram for the four landslide susceptibility maps and their five classes

Discussion and conclusion

Landslide susceptibility maps are of great help in land use planning, hazard management, and mitigation (Burby 1998); therefore, these maps should be constructed using prediction models with high accuracy. However, a perfect landslide model with no error is almost impossible; therefore, new algorithms and frameworks that may help to increase the prediction performance of landslide models should be explored and verified. We address this issue in this paper by proposing and verifying a new ensemble methodology for landslide susceptibility modeling based on FT and three ensemble frameworks: AdaBoost, Bagging, and MultiBoost. The paper focuses on three main aims: (1) feature selection and variable importance for landslide conditioning factors using the Information Gain technique; (2) exploration, for the first time, of the potential application of the FT and the three ensemble techniques for the assessment of landslide susceptibility along the corridor of national road No. 32 (Vietnam); and (3) assessment of similarities and dissimilarities of the landslide susceptibility maps and their susceptibility classes using Kappa index, Kappa location, and Kappa histogram.

In landslide modeling, the predictive ability of a set of widely used conditioning factors should be quantified (Tien Bui et al. 2016c). Although various techniques and methods have been proposed for feature selection, such as linear correlation (Irigaray et al. 2007), the Goodman-Kruskal and Kolmogorov–Smirnov tests (Costanzo et al. 2012; Fernández et al. 2003), and the GIS matrix combination method (Cross 2002), none of them is widely accepted as the standard guideline for the assessment of landslide susceptibility. The result in this study shows that the Information Gain technique can be used for feature selection. The main advantage of this technique is that it measures the decrease in entropy of the output (landslide and non-landslide classes) when the output is associated with a landslide conditioning factor, and uses this decrease to assess the importance of the factor. The larger the decrease in entropy, the more important the conditioning factor. This study shows that all ten conditioning factors have significant predictive ability, indicating that the collection, processing, and coding of these factors were carried out successfully. Distance to roads and slope are the most important factors, which is a logical and reasonable result, because this study mainly investigated landslides that occurred in the corridor of national road No. 32, and slope is widely accepted in the literature as the most important factor (Costanzo et al. 2012; Van Den Eeckhaut et al. 2006).
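The entropy decrease described above can be sketched for a single coded conditioning factor as follows. This is a minimal illustration of the standard information gain definition, assuming the factor has already been coded into discrete classes; the function names, the factor classes, and the labels are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(factor_classes, labels):
    """Entropy of the output minus its entropy conditioned on the factor."""
    n = len(labels)
    groups = {}
    for value, label in zip(factor_classes, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy of the output within each class of the factor.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical coded slope classes and landslide (1) / non-landslide (0) labels.
slope_classes = ["steep"] * 4 + ["gentle"] * 4
labels = [1, 1, 1, 0, 0, 0, 0, 1]
gain = information_gain(slope_classes, labels)
print(round(gain, 4))
```

A factor that separates the two classes perfectly would reach the full entropy of the output, while a factor unrelated to the output would give a value near zero.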

Using the ten conditioning factors, four landslide susceptibility maps were produced using the FT and the three ensemble techniques. All four susceptibility models performed reasonably well, with high degrees of fit and high prediction capabilities. The FT model, with its visible structure, provided useful insights into how the model works. The AUC for the FT model shows a high degree of fit on the training dataset (91.5 %). The degree of fit improved further when the FT was integrated with the three ensemble techniques: the AUC increased by 3.1 % for the FT with Bagging, 4.6 % for the FT with AdaBoost, and 4.4 % for the FT with MultiBoost. The prediction power of the FT with Bagging and the FT with MultiBoost models also improved, by 1.9 and 1.2 %, respectively, compared to the FT model. In contrast, the prediction power of the FT with AdaBoost is 1.6 % lower than that of the FT model. Therefore, the Bagging and the MultiBoost ensemble frameworks should be used for landslide susceptibility modeling. In fact, Bagging and MultiBoost are well-recognized techniques in soft computing that not only improve a single classifier but also help to deal with complex and high-dimensional modeling problems (Trawiński et al. 2013). In general, the findings of this study agree with Althuwaynee et al. (2014), Jebur et al. (2014), and Tien Bui et al. (2014), who state that ensemble models outperform the single model.

The prediction powers of the four susceptibility models were further estimated using the prediction–rate method, which uses only the landslide pixels in the validation set. The FT with Bagging and the FT with MultiBoost models have the highest prediction power (89.7 %), followed by the FT model (86.2 %) and the FT with AdaBoost model (85.6 %). The prediction power of all the models estimated by the prediction–rate method is slightly lower than that calculated using the ROC curve method. The largest difference is for the FT model (3.6 %), followed by the FT with AdaBoost model (2.6 %), the FT with Bagging model (2.0 %), and the FT with MultiBoost model (1.3 %). These differences arise because the ROC curve analysis uses the entire validation dataset (1011 landslide and 1011 non-landslide pixels), whereas the prediction–rate method uses only the 1011 landslide pixels in the validation dataset to estimate the area under the curve for the four susceptibility maps. In fact, the ROC curve and AUC in landslide susceptibility models are affected by several factors: (1) the methods or techniques used; (2) the selection of conditioning factors; (3) the landslide inventory map; and (4) the characteristics of the study area. Consequently, AUC values may not correspond strictly to the prediction capability of the susceptibility models; therefore, the prediction–rate method should be considered as well.

To evaluate the geographic consistency of the susceptibility index distributions, Kappa index, Kappa location, and Kappa histogram should be used. These statistics help to reveal similarities and dissimilarities of the four landslide susceptibility maps and their classes. For example, although the performances of the FT with Bagging and the FT with MultiBoost models are almost the same, the similarity of the spatial distributions of susceptibility indexes over the two maps is only moderate. However, the degree of similarity is high for the high landslide susceptibility classes, whereas the dissimilarities are concentrated in the low susceptibility classes.

Overall, the results of this study clearly show that the FT with Bagging model has the highest accuracy. Compared with the susceptibility models produced by the same authors using well-known soft computing algorithms such as the J48 Decision Tree (Tien Bui et al. 2013a) and artificial neural networks (Tien Bui et al. 2013b), the prediction capability of the FT with Bagging model is better. Therefore, we conclude that the FT with Bagging is a promising technique that should be considered as an alternative for the assessment of landslide susceptibility. Since these results are representative of the currently implemented versions of these techniques, the performance of the susceptibility models may improve if the implementations of the algorithms change in the future. However, these results are only representative of the current study area. Investigations in other areas with different terrain and geological contexts should be further considered. As a final conclusion, the results of this study may be useful for land use planning and decision making in areas prone to landslides.