
1 Introduction

Landslide susceptibility mapping is regarded as a first step in landslide hazard mitigation [1]. It identifies areas of high and very high susceptibility so that suitable land use planning strategies and decisions can be made in advance to protect slope stability and prevent future landslides. The mapping is based on an analysis of the spatial relationship between past and current landslides and a set of influencing factors [2, 3]. Geographic Information Systems (GIS), which are widely used as a basic tool for data manipulation and spatial management [4], are also helpful for producing landslide susceptibility maps.

In recent years, many studies have reported landslide susceptibility mapping using GIS. Many of these have applied machine learning methods such as artificial neural networks [5, 6], decision trees [7, 8], support vector machines [9, 10], and logistic regression [11, 12]. More recently, new hybrid techniques have been developed and used for landslide susceptibility mapping. Bui et al. [13] developed a novel hybrid model of least-squares support vector machines and differential evolution optimization for landslide spatial prediction. Pham et al. [14] proposed a novel hybrid machine learning model combining the rotation forest ensemble and the fuzzy unordered rules induction algorithm classifier for landslide prediction. In general, hybrid models often outperform single models for landslide susceptibility mapping [15, 16].

In this paper, we propose a novel hybrid model, Rotation Forest based Functional Trees (RFFT), which combines two state-of-the-art machine learning techniques, the Functional Trees (FT) classifier and the Rotation Forest (RF) ensemble, for landslide susceptibility mapping in Kon Tum Province, Viet Nam. A further objective of this study is to compare the RFFT model with a well-known landslide model, Support Vector Machines (SVM), and with the single FT model. To validate and compare the models, Receiver Operating Characteristic (ROC) curve analysis and statistical indexes were applied to both the training and testing datasets.

2 Methods Used

2.1 Rotation Forest

Rotation Forest (RF), one of the popular ensemble techniques, was proposed by Rodriguez et al. [17]. It is known as an efficient technique for improving weak classifiers [18]. The RF separates the original training dataset into subsets and applies Principal Component Analysis (PCA), a multivariate technique for analyzing large multivariate datasets and reducing their dimensionality [19], to them; the resulting subsets are then used to train the classifiers [20]. Even though the RF is a relatively new ensemble technique, it has been applied in many different fields such as medicine [21, 22] and remote sensing data classification [23, 24]. In landslide susceptibility modeling, the RF has been applied in only a few cases [14, 25, 26].

2.2 Functional Trees

Functional Trees (FT), proposed by Gama [27], combine a discriminant function and multivariate decision trees by means of constructive induction [28]. They are considered a generalization of multivariate trees. The FT combines attributes at decision nodes, at leaf nodes, or at both when learning classification trees [28]: decision nodes are constructed while the tree is grown, whereas functional leaves are constructed while the tree is pruned [27]. In prediction, the FT is used to predict the value of the target variable for unclassified samples. First, the sample traverses the tree from the root node to a leaf; at each decision node, the set of attributes of the sample is extended using the constructor function built at that node. The decision test of the node is then applied to determine the path the sample follows. Finally, when a leaf is reached, the sample is classified using either the constructor function built at the leaf or the constant associated with the leaf [28].
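The prediction procedure described above can be illustrated with a minimal Python sketch. It is not the FT implementation used in this study: the class names, the toy tree, the two input attributes (a slope value and an NDVI value), and all coefficients are hypothetical and chosen only to show how a sample is extended at a decision node and classified at a functional or constant leaf.

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Optional, Union


@dataclass
class Leaf:
    """Leaf with a logistic model if weights are given, otherwise a constant class."""
    weights: Optional[List[float]] = None
    bias: float = 0.0
    constant: int = 0

    def classify(self, x: List[float]) -> int:
        if self.weights is None:
            return self.constant
        # zip() stops at the shorter list, so the weights apply to the first attributes.
        z = self.bias + sum(w * v for w, v in zip(self.weights, x))
        return 1 if 1.0 / (1.0 + math.exp(-z)) >= 0.5 else 0


@dataclass
class DecisionNode:
    constructor: Callable[[List[float]], float]  # builds one new attribute
    test_index: int                              # attribute tested at this node
    threshold: float
    left: Union["DecisionNode", Leaf]
    right: Union["DecisionNode", Leaf]


def ft_predict(node, sample):
    """Traverse from the root to a leaf, extending the attribute set at each node."""
    x = list(sample)
    while isinstance(node, DecisionNode):
        x.append(node.constructor(x))  # extend the sample with the constructed attribute
        node = node.left if x[node.test_index] <= node.threshold else node.right
    return node.classify(x)


# Hypothetical toy tree over the attributes [slope, ndvi]; values are illustrative only.
toy_tree = DecisionNode(
    constructor=lambda x: 0.1 * x[0] - 2.0 * x[1],
    test_index=2,          # test the attribute that was just constructed
    threshold=1.0,
    left=Leaf(constant=0),
    right=Leaf(weights=[0.2, -1.0], bias=-4.0),
)

print(ft_predict(toy_tree, [35.0, 0.2]))  # steep, sparsely vegetated sample
print(ft_predict(toy_tree, [5.0, 0.6]))   # gentle, vegetated sample
```

In this toy tree the first sample reaches the functional leaf and the second reaches the constant leaf, which mirrors the two leaf types mentioned in the text.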

2.3 Proposed Rotation Forest Based Functional Trees

Rotation Forest Based Functional Trees (RFFT) is a hybrid approach of the RF ensemble and the FT classifier. In landslide modeling, the RFFT is constructed in three main steps as follows:

  • Step 1, Optimization: In this step, the RF ensemble was used to optimize the input data for modeling. The original training dataset was divided into different sub-training datasets, which were then used for classification. The optimal sub-training datasets were those for which the RFFT model performed best. The number of iterations for training the RF ensemble was set to “16” on the basis of a trial-and-error test.

  • Step 2, Classification: In this step, the FT classifier was used to classify samples into two classes (landslide and non-landslide) for the spatial prediction of landslides. Different FT classifiers were generated from the different sub-training datasets produced by the RF ensemble. The parameters of the algorithm were selected by a trial-and-error process: the minimum number of instances at which a node is considered for splitting was set to “15”, and the number of iterations for the RF to “15”.

  • Step 3, Combination: In the final step of modeling, the RF ensemble was used to combine the generated FT classifiers into the final RFFT model.
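The three steps above can be sketched roughly in Python with NumPy. This is a simplified illustration under stated assumptions, not the model used in this study: the base learner is a one-split decision stump standing in for the FT classifier, the 75% bootstrap fraction, the number of feature subsets, the function names, and the synthetic data are all assumptions, and the ensemble size of 15 is only an illustrative default.

```python
import numpy as np


def rotation_matrix(X, n_subsets, rng):
    """Step 1: rotation built from PCA on random, disjoint feature subsets."""
    n_samples, n_features = X.shape
    R = np.zeros((n_features, n_features))
    for idx in np.array_split(rng.permutation(n_features), n_subsets):
        # PCA is fitted on a bootstrap sample (assumed 75% of the rows).
        rows = rng.choice(n_samples, size=int(0.75 * n_samples), replace=True)
        block = X[np.ix_(rows, idx)]
        block = block - block.mean(axis=0)
        # Rows of vt are the principal axes; all components are kept.
        _, _, vt = np.linalg.svd(block, full_matrices=True)
        R[np.ix_(idx, idx)] = vt.T
    return R


class Stump:
    """Stand-in base learner (one-split tree); NOT a Functional Tree."""

    def fit(self, X, y):
        self.j, self.t = 0, np.inf
        self.p_left = self.p_right = float(y.mean())
        best = -1.0
        for j in range(X.shape[1]):
            for t in np.unique(X[:, j])[:-1]:
                left = X[:, j] <= t
                p_l, p_r = y[left].mean(), y[~left].mean()
                # Accuracy when each side predicts its majority class.
                acc = (max(p_l, 1 - p_l) * left.sum()
                       + max(p_r, 1 - p_r) * (~left).sum()) / len(y)
                if acc > best:
                    best, self.j, self.t = acc, j, t
                    self.p_left, self.p_right = p_l, p_r
        return self

    def predict_proba(self, X):
        return np.where(X[:, self.j] <= self.t, self.p_left, self.p_right)


def fit_rotation_forest(X, y, n_estimators=15, n_subsets=2, seed=0):
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_estimators):
        R = rotation_matrix(X, n_subsets, rng)          # Step 1: rotate the inputs
        members.append((R, Stump().fit(X @ R, y)))      # Step 2: train a base classifier
    return members


def predict_proba(members, X):
    # Step 3: combine the members by averaging their landslide probabilities.
    return np.mean([clf.predict_proba(X @ R) for R, clf in members], axis=0)


# Synthetic demonstration data (not from the study): 4 features, oblique class boundary.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 4))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(float)
forest = fit_rotation_forest(X_demo, y_demo)
proba_demo = predict_proba(forest, X_demo)
```

Because each member sees a differently rotated feature space, the axis-aligned splits of the members differ, which is the source of the ensemble diversity discussed later in the paper.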

2.4 Validation Methods

The Receiver Operating Characteristic (ROC) curve, a useful tool for representing the quality of models [29, 30], and the area under the ROC curve (AUC) were selected to validate the overall performance of the hybrid RFFT model in this study. On the ROC graph, “sensitivity” is plotted on the y-axis and “100-specificity” on the x-axis. An AUC value of “1” indicates perfect performance, and a higher AUC value indicates a better model [31,32,33].
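As a minimal illustration of how the ROC curve and the AUC relate, the following Python sketch computes ROC points on the axes described above and obtains the AUC in two equivalent ways (pairwise ranking and trapezoidal integration). The function names and the six toy labels and scores are assumptions for illustration; they are not data from this study.

```python
def roc_points(labels, scores):
    """(100 - specificity, sensitivity) pairs in percent, one per score threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((100.0 * fp / n_neg, 100.0 * tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under the ROC points, rescaled from percent axes to the range 0..1."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area / 1e4


def auc_rank(labels, scores):
    """Probability that a landslide sample scores above a non-landslide sample."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example: 1 = landslide, 0 = non-landslide; scores are susceptibility indexes.
toy_labels = [1, 1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
print(auc_rank(toy_labels, toy_scores))
print(auc_trapezoid(roc_points(toy_labels, toy_scores)))
```

Both functions return the same value for the toy data (8 of the 9 landslide/non-landslide pairs are ranked correctly).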

In addition, statistical indexes, namely negative predictive value (NPV), positive predictive value (PPV), specificity (SPF), sensitivity (SST), accuracy (ACC), Kappa (K), and Root Mean Squared Error (RMSE), which describe model performance in more detail, were also selected to validate the predictive capability of the proposed hybrid model. Detailed descriptions of these indexes are given in Bennett et al. [34] and Pham et al. [35].
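For readers unfamiliar with these indexes, the sketch below computes them from the four confusion-matrix counts using their standard definitions (Cohen's Kappa for K). The function names and the example counts are assumptions for illustration and do not reproduce the study's data; the cited references remain the authoritative descriptions.

```python
import math


def classification_indexes(tp, tn, fp, fn):
    """Standard indexes from true/false positives and negatives (as fractions)."""
    n = tp + tn + fp + fn
    acc = (tp + tn) / n
    # Agreement expected by chance, used by Cohen's Kappa.
    p_exp = ((tp + fn) * (tp + fp) + (tn + fp) * (tn + fn)) / (n * n)
    return {
        "PPV": tp / (tp + fp),   # predicted landslides that are landslides
        "NPV": tn / (tn + fn),   # predicted non-landslides that are non-landslides
        "SST": tp / (tp + fn),   # landslides correctly classified
        "SPF": tn / (tn + fp),   # non-landslides correctly classified
        "ACC": acc,
        "K": (acc - p_exp) / (1.0 - p_exp),
    }


def rmse(observed, predicted):
    """Root mean squared error between observed labels and predicted indexes."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Hypothetical counts for 100 samples.
example = classification_indexes(tp=40, tn=45, fp=5, fn=10)
print({name: round(value, 3) for name, value in example.items()})
```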

3 General Characteristics of the Study Area

The study area is Kon Tum province, located in the Central Highland plateau, where many geo-hazards (e.g. landslides and human-induced earthquakes) have occurred as a result of the interaction between natural hazards and human activities [36,37,38,39,40]. Geographically, the study area lies on the western part of the Truong Son mountain range, between longitudes 107° 20ʹ E and 108° 32ʹ E and latitudes 13° 55ʹ N and 15° 26ʹ N (Fig. 1). It covers an area of approximately 9689.6 km² and has nine administrative units: Kon Tum city and the districts of Dak Glei, Dak To, Kon Ray, Kon Plong, TuMoRong, Sa Thay, Ngoc Hoi, and Dak Ha. The study area belongs to the high-precipitation zone of the Central Highland, where total annual rainfall varies from 1700 mm to above 3000 mm and most of the rainfall (90.9%) is concentrated in the rainy season (April to November). The Kon Tum area has three main drainages: the Dak Bla River, the KrongPoKo River, and the Sa Thay River. The drainage density in the study area is around 0.45 km/km².

Fig. 1. Landslide inventory map and location map of study region

In the study area, the topography is aligned in the N-S direction. Elevation ranges from 800 to 1200 m in the northern part and from 500 to 530 m in the southern part, and the highest point, at 2598 m, is Ngoc Linh peak in the southern part of Vietnam.

The geology is complex, with 33 geologic formations and complexes ranging in age from Archean to Cenozoic. Metamorphic rocks dominate, followed by intrusive igneous rocks, extrusive igneous rocks, and sedimentary rocks. The study area belongs to the Kon Tum block, one of the main uplift blocks in the Central Highland, which is strongly affected by tectonic activity. Fault zones in Kon Tum province follow sub-meridian and northwest-southeast directions. Major faults traversing the area include Ngoc Linh - Dak To, Po Co River, Xe Ca Man, Dak Sa, and Sa Thay.

4 GIS Database

4.1 Landslide Inventory Map

The landslide inventory map of this area was constructed from 1404 landslides that occurred between 2004 and 2015. Data for the inventory were collected from various sources such as interpretation of Google Earth images and field investigation.

Landslides in this area can be classified by size into three main types: large landslides (above 10,000 m²), medium landslides (1000–10,000 m²), and small landslides (less than 1000 m²). The largest landslide covers an area of approximately 361,485 m², whereas the smallest covers about 118.4 m².

Landslides in the study area were identified as polygons. However, because a number of the landslides are large, the representative center pixels converted from the polygons were used for landslide susceptibility analysis. Of these, 983 landslide pixels were used to generate the training dataset, and the remaining 420 pixels were used to create the testing dataset.

4.2 Landslide Affecting Parameters

In the present study, ten parameters (Table 1), namely slope, aspect, elevation, land use, lithology, Normalized Difference Vegetation Index (NDVI), soil type, distance to faults, distance to geology boundaries, and valley depth, were selected for landslide susceptibility modeling, as they are known influencing factors for landslide spatial prediction [13, 41].

Table 1. Landslide affecting parameters and their classes

A Digital Elevation Model (DEM) with a spatial resolution of 30 × 30 m was generated from national topographic maps (1:50,000) and then used to generate the slope map (Fig. 2), aspect map, elevation map, and valley depth map (Fig. 3) using a GIS application and SAGA software.

Fig. 2. Slope map of the study area

Fig. 3. Valley depth map of the study area

The soil type map (Fig. 4), at a scale of 1:100,000, was collected from the Agricultural Planning and Projection of Vietnam in 2000. Geological and mineral resource maps at a scale of 1:200,000, collected from the Ministry of Natural Resources and Environment of Vietnam, were used to construct the lithology map (Fig. 5). These maps were also used to extract the faults, from which the distance to faults map was generated using the buffering function in the ArcGIS application. Landsat 8 OLI imagery (30 m resolution, three paths/rows: 124-50, 125-50, and 125-49) acquired on 14 April 2015 was used to generate the NDVI map. The land use map, at a scale of 1:50,000, was provided by the local authority of the study area. In addition, distance to geology boundaries was used for landslide susceptibility modeling. The distance to geology boundaries map was constructed by extracting the geology boundaries from the geological and mineral resource maps and buffering them across the study area. For landslide susceptibility assessment, the classes of these factors were defined on the basis of the literature [13, 42] and an analysis of the characteristics of the study area in relation to landslide occurrences.
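The text does not describe how the NDVI map was computed from the Landsat 8 OLI scenes, so the following Python sketch only illustrates the standard NDVI definition, NDVI = (NIR − Red) / (NIR + Red), where for Landsat 8 OLI the near-infrared and red bands are band 5 and band 4. The function name, the handling of zero denominators, and the small reflectance grids are assumptions for illustration.

```python
def ndvi(nir, red, nodata=None):
    """Per-pixel NDVI for two equally sized grids of reflectance values."""
    result = []
    for nir_row, red_row in zip(nir, red):
        row = []
        for n, r in zip(nir_row, red_row):
            total = n + r
            # Pixels with no signal in either band are flagged as nodata.
            row.append(nodata if total == 0 else (n - r) / total)
        result.append(row)
    return result


# Hypothetical 2 x 2 reflectance grids (band 5 = NIR, band 4 = red for Landsat 8 OLI).
band5 = [[0.5, 0.2], [0.0, 0.3]]
band4 = [[0.1, 0.2], [0.0, 0.1]]
ndvi_grid = ndvi(band5, band4)
print(ndvi_grid)
```

Values close to 1 indicate dense vegetation, values near 0 indicate bare surfaces, and the resulting grid would then be grouped into the NDVI classes used for modeling.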

Fig. 4. Land use map of the study area

Fig. 5. Land use/land cover map of the study area

5 Results and Discussion

A landslide susceptibility model based on the novel hybrid RFFT model was constructed and validated using the training and testing datasets, respectively, and the results are shown in Figs. 6, 7, 8, and 9. According to the ROC curve analysis (Figs. 6 and 7), the RFFT model has high AUC values for the training (0.926) and testing (0.811) datasets. These values are higher than those of the SVM model (0.904 for the training dataset and 0.806 for the testing dataset) and the single FT model (0.892 for the training dataset and 0.761 for the testing dataset).

Fig. 6. The ROC curve analysis of landslide models using training dataset

Fig. 7. The ROC curve analysis of landslide models using testing dataset

Fig. 8. Model performance using training dataset

Fig. 9. Model performance using testing dataset

Likewise, according to the statistical index analysis (Figs. 8 and 9), the RFFT model has the highest values of PPV (80.57% for the training dataset and 70.31% for the testing dataset), SST (90.00% and 76.29%), SPF (82.41% and 72.47%), ACC (85.81% and 74.23%), and K (0.716 and 0.485), where each pair gives the training and testing values, respectively. Its NPV is the highest for the testing dataset (78.15%), although its reported NPV for the training dataset (90.05%) is slightly lower than those of the other two models. The RFFT model is followed by the SVM model, with PPV (80.16% and 70.30%), NPV (90.74% and 77.67%), SST (89.65% and 75.90%), SPF (82.06% and 72.35%), ACC (85.45% and 73.99%), and K (0.710 and 0.480), and then by the single FT model, with PPV (79.86% and 68.65%), NPV (90.54% and 76.96%), SST (89.41% and 74.87%), SPF (81.80% and 71.05%), ACC (85.20% and 72.80%), and K (0.704 and 0.456). In contrast, the RFFT model has the lowest RMSE (0.322 for the training dataset and 0.412 for the testing dataset), followed by the SVM model (0.349 and 0.419) and the single FT model (0.395 and 0.454).

In landslide susceptibility modeling, hybrid models are known as a promising alternative tool [16]. In this study, the RFFT model, a hybrid of the RF ensemble and the FT classifier, also performed very well: based on the above results, it outperforms both the well-known SVM model and the single FT model. A reasonable explanation is that the RFFT model takes advantage of the RF ensemble method to improve the prediction ability of the FT classifier [17]. In the RF, diversity is ensured by the rotation strategy, so the classification accuracy of the RFFT model is better than that of the single FT classifier [43]. In addition, the PCA used in the RF is a good filter method that helps reduce learning errors [17].

A landslide susceptibility map of Kon Tum province, Viet Nam, was constructed using the novel hybrid RFFT model. For this, all 24,021,009 pixels of the study area were assigned susceptibility indexes generated from the trained RFFT model. Thereafter, five susceptibility classes (very low, low, moderate, high, and very high) were determined by classifying the susceptibility indexes with the natural breaks method [44] in the ArcGIS environment to construct the landslide susceptibility map (Fig. 10). Furthermore, a frequency analysis of landslide pixels on the map was carried out to validate the reliability of the landslide susceptibility map, and the results are shown in Fig. 11. The highest frequency of landslides is in the very high class (69.82%), followed by the high class (15.63%), the moderate class (9.47%), the low class (3.71%), and the very low class (1.37%). Based on these results, it is reasonable to state that the produced landslide susceptibility map is reliable for landslide hazard management in the study area.
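The classification and frequency analysis described above can be sketched as follows. The sketch uses a simple dynamic-programming version of natural breaks (minimizing the within-class sum of squared deviations); the ArcGIS implementation used in this study may differ, and this quadratic-time version would have to be run on a sample rather than on all pixels of a large map. The function names and the toy index values are assumptions, not data from the study.

```python
from bisect import bisect_left

CLASS_NAMES = ["very low", "low", "moderate", "high", "very high"]


def natural_breaks(values, n_classes):
    """Upper bound of each class, minimizing within-class squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= n_classes <= n:
        raise ValueError("n_classes must be between 1 and the number of values")
    s1, s2 = [0.0], [0.0]  # prefix sums of values and of squared values
    for v in data:
        s1.append(s1[-1] + v)
        s2.append(s2[-1] + v * v)

    def ssd(i, j):  # squared deviation of data[i:j] around its mean
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: best cost of splitting the first j values into k classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    cut = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], cut[k][j] = c, i
    bounds, j = [], n
    for k in range(n_classes, 0, -1):  # walk back through the stored cut points
        bounds.append(data[j - 1])
        j = cut[k][j]
    return bounds[::-1]


def class_frequency(indexes, bounds, names=CLASS_NAMES):
    """Percentage of the given pixels falling in each susceptibility class."""
    counts = [0] * len(bounds)
    for v in indexes:
        counts[min(bisect_left(bounds, v), len(bounds) - 1)] += 1
    return {name: 100.0 * c / len(indexes) for name, c in zip(names, counts)}


# Toy susceptibility indexes for all pixels and for landslide pixels only.
all_indexes = [0.02, 0.04, 0.20, 0.22, 0.45, 0.47, 0.70, 0.72, 0.95, 0.97]
breaks = natural_breaks(all_indexes, 5)
frequency = class_frequency([0.96, 0.93, 0.71, 0.46, 0.03], breaks)
print(breaks)
print(frequency)
```

A map is considered plausible in this kind of check when, as in Fig. 11, most landslide pixels fall in the high and very high classes.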

Fig. 10. Landslide susceptibility map using the proposed hybrid model RFFT

Fig. 11. Pixels distribution analysis on landslide susceptibility map

6 Concluding Remarks

Landslide susceptibility mapping was conducted in Kon Tum province, Viet Nam, using the novel hybrid RFFT model, which combines two state-of-the-art machine learning methods, the RF ensemble and the FT classifier. The performance of the proposed RFFT model was validated by ROC curve analysis and statistical indexes, and compared with a well-known landslide model, Support Vector Machines (SVM), and with the single FT model.

The results show that the proposed RFFT model performs very well for landslide susceptibility assessment. It has better predictive capability than the well-known SVM model and the single FT model. Therefore, the proposed RFFT model can be considered a promising alternative method for landslide susceptibility assessment in landslide-prone areas. The results of this study also reconfirm that hybrid models using ensemble techniques such as the RF are a promising approach for landslide susceptibility modeling.