Introduction

Landslides cause extensive damage to residential areas, economic losses, and human casualties all over the world. They are among the most dangerous geo-hazards in hilly and mountainous terrain because of steep topography, inappropriate land use, and climatic conditions that favor slope failure (Solaimani et al. 2013; Sujatha et al. 2012; Akgun et al. 2008). Globally, landslides cause almost 1000 deaths per year and property damage of about 4 billion dollars (Lee and Pradhan 2007). Landslides are frequent in China, where they often result in many casualties and great economic losses. It is reported that more than 30,737 landslide-related hazards occurred in 2012, 2013 and 2014, leaving a total of 1256 people dead or missing and causing a direct economic loss of 15.41 billion CNY (http://www.cigem.gov.cn). It is therefore necessary to evaluate the factors that affect slope instability and to study and forecast potential landslide hazards, in order to reduce landslide damage and develop rational mitigation methods (Sujatha et al. 2012).

Many GIS-based methods are currently available for assessing landslide susceptibility. Many studies have used probabilistic methods such as the frequency ratio and weight of evidence models (Solaimani et al. 2013; Lee and Pradhan 2007; Regmi et al. 2014; Pradhan et al. 2010; Choi et al. 2012; Vijith and Madhu 2008; Demir et al. 2013; Ozdemir and Altural 2013; Sujatha et al. 2014). Among statistical models, the bivariate and multivariate logistic regression models have also been used for landslide susceptibility mapping (Solaimani et al. 2013; Lee and Pradhan 2007; Akgun 2012; Bai et al. 2010; Ozdemir and Altural 2013; Pradhan and Lee 2010; Choi et al. 2012; Devkota et al. 2013; Yilmaz 2010b; Park et al. 2013; Raman and Punia 2012; Mihaela et al. 2011; Yalcin et al. 2011; Youssef et al. 2015b). In addition, some researchers (Kayastha et al. 2013; Kanungo et al. 2011; Pourghasemi et al. 2012a; Guettouche 2013; Akgun et al. 2012; Sharma et al. 2013; Pradhan 2010, 2011, 2013; Ercanoglu and Gokceoglu 2002, 2004; Oh and Pradhan 2011) have produced landslide susceptibility maps using deterministic models such as the analytic hierarchy process (AHP) and fuzzy models. Other newer techniques such as fuzzy-logic, artificial neural network (ANN), and neuro-fuzzy models (Park et al. 2013; Pradhan and Buchroithner 2010; Pouydal et al. 2010; Chauhan et al. 2010; Sharma et al. 2013; Guettouche 2013; Choi et al. 2012; Vahidnia et al. 2010; Sezer et al. 2011) have also been used to evaluate landslide susceptibility. To determine which model is most accurate for landslide susceptibility mapping in a given study area, some studies have applied two or three models and compared their accuracy, for example probability and statistical analyses; probability and fuzzy-logic analyses; statistical and ANN analyses; analytic hierarchy process, probability and statistical analyses; and probability, statistical, and ANN analyses (Jaafari et al. 2014; Constantin et al. 2011; Kanungo et al. 2011; Ozdemir and Altural 2013; Park et al. 2013; Pourghasemi et al. 2013a; Pouydal et al. 2010; Solaimani et al. 2013; Yalcin et al. 2011; Youssef et al. 2015a; Demir et al. 2013; Lee and Pradhan 2007; Akgun 2012; Devkota et al. 2013).

The main purpose of this study is to assess landslide susceptibility for Qianyang County, Baoji City, Shaanxi Province, China, using a geographical information system (GIS). To achieve this aim, the support vector machine (SVM) with four different kernel functions was used to obtain landslide susceptibility maps with the ArcGIS 10.0 software.

SVM models have relatively seldom been used for landslide susceptibility mapping. Furthermore, comparing different kernel classifiers is reasonable because SVM can apply various types of kernel functions.

The study area

The study area is located in the City of Baoji, Shaanxi Province, China, covering a surface area of about 996.46 km² between longitudes 106°56′15″–107°22′31″E and latitudes 34°34′34″–34°56′56″N. The altitude decreases from north to south and varies in the range from 752 to 1560 m. The climate of the study area is a warm, semi-arid to semi-humid monsoon climate; the winter is dry and cold, whereas the summer is hot and rainy. According to the China Meteorological Administration, the temperature of this region varies between −20.6 °C in winter and 40.5 °C in summer, with a yearly average of 11.8 °C. The mean relative humidity varies between 59 and 82 %. The mean annual rainfall is around 627.4 mm, and the rainy season is mainly from July to September, with its total rainfall accounting for half of the yearly rainfall. The Wei and Jing rivers are the main streams in this area, and their tributaries form dendritic drainage systems because of the topographical and geological characteristics of the area.

Data preparation

Landslide inventory map

Landslide inventory mapping is the backbone of landslide susceptibility studies, because it helps determine the events affecting landslide development in the study region and the terrain instability factors involved (Guinau et al. 2005; van Westen et al. 2006; Youssef et al. 2015a, b). The first step is to collect all available information and data concerning landslides in the area; the reliability and accuracy of these data affect the success of the methodology used (Ercanoglu and Gokceoglu 2004; Melchiorre et al. 2011). Landslide inventory maps can be acquired by collecting landslide data and interpreting satellite imagery and aerial photographs, combined with field surveys using a GPS device (Pradhan and Kim 2014). In this study, 81 landslides were identified by interpreting aerial photos at 1:50,000 scale coupled with field surveys, and were subsequently digitized for further analysis. The inventory was then randomly divided into two parts (70/30), which were used for training and validation, respectively (Fig. 1).
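The random 70/30 division of the inventory can be sketched as follows. This is a minimal illustration, not the procedure actually used in the study: the integer landslide IDs, the function name `split_inventory`, and the fixed seed are assumptions, and the subset sizes follow from rounding 70 % of 81 rather than from counts reported in the text.

```python
import random

def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation subsets."""
    rng = random.Random(seed)      # fixed seed (assumed) so the split is reproducible
    shuffled = list(landslide_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# 81 mapped landslides, identified here by hypothetical integer IDs 1..81
train_ids, validation_ids = split_inventory(range(1, 82))
print(len(train_ids), len(validation_ids))  # 57 24
```

Fixing the seed keeps the training and validation subsets identical across runs, which matters when several kernel functions are compared on the same data.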

Fig. 1 Location of the study area

Landslide conditioning factors

To apply the SVM model in the study area, 15 landslide conditioning factors were used: slope angle, slope aspect, altitude, plan curvature, profile curvature, distance to faults, distance to rivers, distance to roads, NDVI, STI, SPI, TWI, geomorphology, rainfall, and lithology. All of the data were converted into raster format with a pixel size of 50 × 50 m. These conditioning factors were classified into four groups: topographic factors (i.e., slope angle, slope aspect, altitude, plan curvature, profile curvature, STI, SPI, TWI), distance-related factors (i.e., distance to roads, distance to rivers, and distance to faults), ground conditions (i.e., geomorphology, NDVI, and lithology) and triggering factors (i.e., rainfall) (Akgun et al. 2012; Melchiorre et al. 2011).

Topographic factors, including slope angle, slope aspect, altitude, plan curvature, profile curvature, STI, SPI, and TWI, were mainly derived from the DEM of the study area. The slope angle, directly related to landslide incidence, is frequently applied in landslide susceptibility studies (He et al. 2012; Dai and Lee 2001). Slope angles in the study area ranged from 0° to 38°, and were reclassified into five classes, i.e., 0°–7°, 7°–14°, 14°–21°, 21°–28°, and 28°–38° (Fig. 2a). Aspect, describing the direction of slope, is also an important factor for landslide susceptibility analysis, as it is related to factors that control landslide formation, such as lineaments, rainfall, wind effects, and exposure to sunshine (Yalcin and Bulut 2007; Pourghasemi et al. 2012a; He et al. 2012). Aspect in the study area was classified into nine categories: flat (−1), north (337.5°–360°, 0°–22.5°), northeast (22.5°–67.5°), east (67.5°–112.5°), southeast (112.5°–157.5°), south (157.5°–202.5°), southwest (202.5°–247.5°), west (247.5°–292.5°), and northwest (292.5°–337.5°) (Fig. 2b). Altitude or elevation, controlled by several geologic and geomorphologic processes, is also frequently used in landslide susceptibility mapping (Pourghasemi et al. 2012b; Pradhan and Kim 2014). Elevation values in the study area ranged from 720 to 1560 m, and five categories of elevation were identified, i.e., 720–850, 850–1000, 1000–1150, 1150–1300, and 1300–1560 m (Fig. 2c). The plan curvature influences the convergence and divergence of flow across a surface. The profile curvature, measured in the vertical plane parallel to the slope direction, affects the acceleration and deceleration of downslope flows and, as a result, influences erosion and deposition (He et al. 2012; Kritikos and Davies 2014; Kannan et al. 2013). In this study, plan curvature and profile curvature were calculated in ArcGIS 10.0 and were each divided into three classes: <−0.05, −0.05 to 0.05, and >0.05 (Fig. 2d, e).
The sediment transport index (STI) reflects the process of erosion and deposition (Devkota et al. 2013). In the study, STI was classified into four classes: <3, 3–9, 9–15, and >15 (Fig. 2f). The stream power index (SPI), describing erosion capability of water flow, is also considered as a factor influencing the stability in the study region (Regmi et al. 2014; Conforti et al. 2011). The SPI map was grouped into four different classes: <5, 5–10, 10–40, and >40 (Fig. 2g). The topographic wetness index (TWI) describes the effect of topography on the location and size of saturated source areas of runoff generation, and was considered as another contributing factor (Pourghasemi et al. 2013b; Pradhan and Kim 2014). The TWI values of this area were arranged in four classes: <7, 7–10, 10–13, and >13, respectively (Fig. 2h).
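The text does not state which formulas were used for STI, SPI, and TWI. The sketch below assumes the widely used DEM-based definitions, TWI = ln(A_s / tan β), SPI = A_s · tan β, and STI = (A_s / 22.13)^0.6 · (sin β / 0.0896)^1.3, where A_s is the specific catchment area and β the local slope. The function names and the sample cell values are hypothetical; only the class breaks are taken from the text.

```python
import bisect
import math

def terrain_indices(specific_catchment_area, slope_deg):
    """Return (TWI, SPI, STI) for a single raster cell.

    specific_catchment_area: upslope contributing area per unit contour
        width, in metres (assumed to come from a flow-accumulation step
        that the text does not describe).
    slope_deg: local slope angle in degrees; must be > 0 so that tan(slope)
        is not zero.
    """
    beta = math.radians(slope_deg)
    twi = math.log(specific_catchment_area / math.tan(beta))
    spi = specific_catchment_area * math.tan(beta)
    sti = (specific_catchment_area / 22.13) ** 0.6 * (math.sin(beta) / 0.0896) ** 1.3
    return twi, spi, sti

def classify(value, breaks):
    """Return a 1-based class index for a list of ascending class breaks."""
    return bisect.bisect_right(breaks, value) + 1

# Class breaks as listed in the text
TWI_BREAKS = [7, 10, 13]   # <7, 7-10, 10-13, >13
SPI_BREAKS = [5, 10, 40]   # <5, 5-10, 10-40, >40
STI_BREAKS = [3, 9, 15]    # <3, 3-9, 9-15, >15

# Hypothetical cell: 500 m specific catchment area on a 15 degree slope
twi, spi, sti = terrain_indices(specific_catchment_area=500.0, slope_deg=15.0)
print(classify(twi, TWI_BREAKS), classify(spi, SPI_BREAKS), classify(sti, STI_BREAKS))
```

In this sketch a value that falls exactly on a break is assigned to the upper class; the text does not say how boundary values were handled.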

Fig. 2 Landslide conditioning factors of the study area: a slope angle, b slope aspect, c elevation, d plan curvature, e profile curvature, f STI, g SPI, h TWI, i distance to faults, j distance to rivers, k distance to roads, l geomorphology, m NDVI, n rainfall, and o lithology

Faults are responsible for triggering a large number of landslides because tectonic breaks usually decrease the rock strength (Devkota et al. 2013). Therefore, the distance to faults was also a necessary parameter in the susceptibility analysis. In the study area, the distance to faults map was reclassified into five classes, i.e., 0–2000, 2000–4000, 4000–6000, 6000–8000, and >8000 m (Fig. 2i).

The distance to rivers, which influences the stability of a slope, is another important factor for landslide susceptibility analysis. On the basis of rivers and streams, a map of proximity to drainage was generated using ArcGIS 10.0 and was divided into five categories, i.e., 0–200, 200–400, 400–600, 600–800, and >800 m (Fig. 2j).

The distance to roads is an important anthropogenic factor influencing landslide occurrence. In the present study, the distance to roads was calculated and the resultant map was reclassified into five classes: 0–1000, 1000–2000, 2000–3000, 3000–4000, and >4000 m (Fig. 2k).
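The three distance-related factors (faults, rivers, roads) share the same two steps: compute the distance from each cell to the nearest feature, then reclassify it with the class limits given in the text. The sketch below illustrates this on a tiny hypothetical grid with a brute-force search. It is not the ArcGIS tool used in the study, and the grid, the feature cells, and the function names are assumptions made for illustration.

```python
import bisect
import math

CELL_SIZE_M = 50.0  # raster resolution stated in the text

def distance_raster(feature_cells, n_rows, n_cols, cell_size=CELL_SIZE_M):
    """Brute-force Euclidean distance (m) from each cell to the nearest feature cell."""
    grid = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            nearest = min(math.hypot(r - fr, c - fc) for fr, fc in feature_cells)
            row.append(nearest * cell_size)
        grid.append(row)
    return grid

def reclassify(value, upper_bounds):
    """Return a 1-based class index; values above the last bound fall in the final class.

    Assumption: a value exactly on a limit belongs to the lower class.
    """
    return bisect.bisect_left(upper_bounds, value) + 1

RIVER_BOUNDS_M = [200, 400, 600, 800]  # 0-200, 200-400, 400-600, 600-800, >800 m

# Hypothetical 1 x 20 strip of cells with a river in the first cell
distances = distance_raster(feature_cells=[(0, 0)], n_rows=1, n_cols=20)
classes = [reclassify(d, RIVER_BOUNDS_M) for d in distances[0]]
print(distances[0][5], classes[5])
```

The brute-force search is only practical for small grids; a real 50 m raster of the whole county would need a dedicated distance transform.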

Geomorphology is an important factor which is closely related to landslide occurrence (Kannan et al. 2013). Four geomorphologic units of the study area can be identified, i.e., mountain areas, loess ridge and hill areas, loess tableland areas and plain areas (Fig. 2l).

The NDVI is also considered a conditioning factor related to landslide occurrence. In general, the higher the NDVI value, the larger the area covered by vegetation (He et al. 2012). In this study, the NDVI map was obtained from a Landsat satellite image and reclassified into five classes, i.e., −0.31 to 0.08, 0.08 to 0.26, 0.26 to 0.40, 0.40 to 0.53, and 0.53 to 0.71 (Fig. 2m).
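For reference, NDVI is conventionally computed from near-infrared and red reflectance as (NIR − Red) / (NIR + Red). The sketch below applies that standard definition and the class limits listed above. The reflectance values are hypothetical, and the text does not specify which Landsat bands or scene were used.

```python
import bisect

def ndvi(nir, red):
    """Standard NDVI; returns None when both bands are zero (undefined ratio)."""
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

NDVI_BREAKS = [0.08, 0.26, 0.40, 0.53]  # upper limits of the first four classes in the text

def ndvi_class(value):
    """Return a 1-based class index (1 = lowest NDVI class, 5 = highest)."""
    return bisect.bisect_right(NDVI_BREAKS, value) + 1

# Hypothetical reflectances for a densely vegetated cell
value = ndvi(nir=0.5, red=0.1)
print(round(value, 3), ndvi_class(value))
```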

The rainfall, closely associated with landslide initiation, is one of the main parameters in landslide susceptibility mapping. The annual rainfall of the study area was classified into five classes: <600, 600–650, 650–700, 700–750, and >750 mm/year, respectively (Fig. 2n).

Lithology is one of the most common determinant factors in most landslide stability studies. The geological map of the study area was compiled from existing geological maps and publications in ArcGIS 10.0. The lithological units of the study area are shown in Table 1, and the general geological setting of the area is shown on the source map (Fig. 2o).

Table 1 Description of geological units of the study area

Support vector machines

The support vector machine (SVM), a supervised learning method, is based on statistical learning theory. To find an optimal separating hyperplane, the method maps the original input space into a higher-dimensional feature space (Vapnik 1998; Xu et al. 2012; Bui et al. 2015).

For example, consider a training dataset of instance-label pairs (x_i, y_i), where i = 1, 2, …, n, x_i is an input vector that includes the 15 landslide conditioning factors, y_i ∈ {1, −1} is the corresponding output class, i.e., landslide or non-landslide, and n is the number of training samples. The aim of SVM is to find a hyperplane that separates the two classes with the maximum margin. Mathematically, this means minimizing Eq. (1) subject to the constraints in Eq. (2) (Yao et al. 2008; Xu et al. 2012; Tehrany et al. 2015):

$$1/2\left\| w \right\|^{2}$$
(1)
$$y_{i} \left( {\left( {w \times x_{i} } \right) + b} \right) \ge 1$$
(2)

where ‖w‖ is the norm of the normal of the hyperplane and b is a constant. Introducing the Lagrangian multipliers (λ_i), the cost function can be defined as:

$$L = 1/2\left\| w \right\|^{2} - \sum\limits_{i = 1}^{n} {\lambda_{i} \left( {y_{i} \left( {\left( {w \times x_{i} } \right) + b} \right) - 1} \right)}$$
(3)

For the non-separable case, introducing slack variables ξ_i (Vapnik 1995), Eq. (2) can be modified as:

$$y_{i} \left( {\left( {w \times x_{i} } \right) + b} \right) \ge 1 - \xi_{i}$$
(4)

Then, introducing v ∈ (0, 1) to account for misclassification (Schölkopf et al. 2000; Wu et al. 2014), Eq. (1) can be rewritten as:

$$L = \frac{1}{2}\left\| w \right\|^{2} - \frac{1}{vn}\sum\limits_{i = 1}^{n} {\xi_{i} }$$
(5)

In addition, a kernel function K(x_i, x_j) is applied to account for a nonlinear decision boundary (Vapnik 1995). In this study, the following four types of kernel function were applied to examine the efficiency of each kernel function in landslide susceptibility mapping (Xu et al. 2012; Pourghasemi et al. 2013b):

$${\text{Linear:}}\;K\left( {x_{i} ,x_{j} } \right) = x_{i}^{\text{T}} \times x_{j}$$
(6)
$${\text{Polynomial:}}\;K\left( {x_{i} ,x_{j} } \right) = \left( {\gamma \times x_{i}^{\text{T}} \times x_{j} + r} \right)^{d} ,\quad \gamma > 0$$
(7)
$${\text{Sigmoid:}}\;K\left( {x_{i} ,x_{j} } \right) = \tanh \left( {\gamma \times x_{i}^{\text{T}} \times x_{j} + r} \right)$$
(8)
$${\text{Radial}}\;{\text{basis}}\;{\text{function:}}\;K\left( {x_{i} ,x_{j} } \right) = \exp \left( { - \gamma \left\| {x_{i} - x_{j} } \right\|^{2} } \right),\quad \gamma > 0$$
(9)

where d, r, and γ are parameters of the kernel functions (Pourghasemi et al. 2013b).
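The four kernel functions can be written out directly, which makes the role of d, r, and γ explicit. The sketch below is a plain-Python illustration, not the ENVI implementation used in the study. The default parameter values are arbitrary assumptions, and the radial basis function is written in its standard form exp(−γ‖x_i − x_j‖²).

```python
import math

def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))

def linear_kernel(x, y):
    return dot(x, y)

def polynomial_kernel(x, y, gamma=1.0, r=0.0, d=2):
    # d = 2 matches the polynomial degree reported in the study
    return (gamma * dot(x, y) + r) ** d

def sigmoid_kernel(x, y, gamma=1.0, r=0.0):
    return math.tanh(gamma * dot(x, y) + r)

def rbf_kernel(x, y, gamma=1.0):
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)

# Two hypothetical, already scaled feature vectors (3 of the 15 factors for brevity)
x_i = [0.2, 0.5, 0.1]
x_j = [0.4, 0.1, 0.3]
for kernel in (linear_kernel, polynomial_kernel, sigmoid_kernel, rbf_kernel):
    print(kernel.__name__, kernel(x_i, x_j))
```

Because the RBF kernel depends only on the distance between samples, its value is 1 for identical vectors and decays toward 0 as they move apart, at a rate set by γ.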

Results and discussion

The spatial relationships between landslides and conditioning factors, computed using the frequency ratio, are shown in Table 2. For slope angle, landslides were most abundant in the class 14°–21°, indicating the highest probability of landslide occurrence in this group, followed by the class 21°–28°. For slope aspect, the frequency ratio was highest for north-facing slopes (1.67) and lowest for flat areas (0.0). For elevation, the frequency ratio was highest for the class 720–850 m. In the case of plan curvature, the frequency ratio was 1.16 for the class −0.05 to 0.05, indicating a higher-than-average probability of landslide occurrence. Similarly, for profile curvature in the class >0.05, the frequency ratio was 1.44, which indicates a high probability of landslide occurrence. In the case of STI, SPI and TWI, most of the landslides occurred in the classes >15, >40 and 10–13, respectively. The relationship between landslides and distance to faults, rivers and roads shows that as the distance to a fault, river or road increases, the probability of landslide occurrence decreases. For geomorphology, the loess tableland areas had the highest frequency ratio (2.52) and the mountain areas had the lowest (0.12). The frequency ratio for NDVI was high in the class 0.08 to 0.26, which indicates a very high probability of landslide occurrence. As shown in Table 2, the landslide frequency generally increases as rainfall increases.
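The frequency ratio of a class is commonly defined as the share of landslide cells that fall in the class divided by the share of the study area that the class occupies, so a value above 1 marks a class with more landslides than its area alone would suggest. The sketch below assumes this common definition. The cell counts are hypothetical and are not taken from Table 2.

```python
def frequency_ratio(landslide_cells_in_class, total_landslide_cells,
                    cells_in_class, total_cells):
    """FR = (share of landslide cells in the class) / (share of area in the class)."""
    landslide_share = landslide_cells_in_class / total_landslide_cells
    area_share = cells_in_class / total_cells
    return landslide_share / area_share

# Hypothetical class holding 30 % of the landslide cells on 20 % of the area
fr = frequency_ratio(landslide_cells_in_class=30, total_landslide_cells=100,
                     cells_in_class=2000, total_cells=10000)
print(round(fr, 2))  # 1.5
```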

Table 2 Spatial relationship between each landslide conditioning factor and landslide using frequency ratio model

In this study, the SVM model with four types of kernel classifiers, namely linear (LN), polynomial of degree 2 (PL), sigmoid (SIG), and radial basis function (RBF), was trained using the ENVI 5.1 software. The resulting probability of landslide occurrence falls in the range between 0 and 1. The results were then exported into the ArcGIS 10.0 software for visualization. Finally, the landslide susceptibility index (LSI) of the produced maps was grouped into five classes (very low, low, moderate, high, and very high) using the natural breaks method. The four landslide susceptibility maps are shown in Fig. 3.

Fig. 3 Landslide susceptibility maps: a the LN-SVM model; b the PL-SVM model; c the SIG-SVM model; d the RBF-SVM model

Validation and comparison of susceptibility maps

Validation is an essential component in the development of landslide susceptibility maps and the determination of their quality (Pourghasemi et al. 2012c). Landslide susceptibility maps are meaningless without validation (Chung and Fabbri 2003). In this study, the receiver operating characteristic (ROC) curve was used to assess the overall performance of the four models, because the ROC curve is helpful for representing the quality of deterministic and probabilistic forecast systems (Akgun et al. 2012; Youssef et al. 2015a, b). The ROC curve plots the false positive rate on the X-axis and the true positive rate on the Y-axis, which shows the trade-off between the two rates (Pradhan 2013). The area under the curve (AUC) represents the ability of the probabilistic model to reliably predict the occurrence or non-occurrence of landslides (Youssef et al. 2015a, b). The success rate was obtained using the training dataset. As shown in Fig. 4a, the RBF-SVM model had the highest success rate (83.15 %), followed by the PL-SVM model (82.72 %), the LN-SVM model (81.77 %) and the SIG-SVM model (79.99 %).
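The AUC has a convenient interpretation: it equals the probability that a randomly chosen landslide cell receives a higher susceptibility score than a randomly chosen non-landslide cell. The sketch below computes the AUC from that pairwise definition. It is an illustration only, not the tool used in the study; the scores and labels are hypothetical, and the function name `roc_auc` is an assumption.

```python
def roc_auc(scores, labels):
    """AUC from pairwise comparisons; labels are 1 (landslide) or -1 (non-landslide)."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y != 1]
    if not positives or not negatives:
        raise ValueError("need at least one sample of each class")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0      # landslide cell ranked above non-landslide cell
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(positives) * len(negatives))

# Hypothetical susceptibility scores for five validation cells
scores = [0.9, 0.8, 0.4, 0.3, 0.2]
labels = [1, -1, 1, -1, -1]
print(round(roc_auc(scores, labels), 3))  # 0.833
```

Applied to the training cells this gives the success rate, and applied to the validation cells it gives the prediction rate, both expressed as percentages in the text.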

Fig. 4 a Success rate and b prediction rate for the LN-SVM, the PL-SVM, the SIG-SVM, and the RBF-SVM models

The prediction capability of the four landslide susceptibility maps was obtained using the validation dataset. The result is shown in Fig. 4b. It can be observed that the RBF-SVM model had the highest prediction rate (77.98 %). Moreover, the prediction rates were 77.07, 77.50 and 76.08 % for LN-SVM model, PL-SVM model, and SIG-SVM model, respectively.

Discussion and conclusions

The preparation of landslide susceptibility maps is a crucial step that can help planners, local administrations, and decision makers in disaster planning. The accuracy of landslide susceptibility maps is important for reducing the loss of life and property (Kavzoglu et al. 2014). Landslide susceptibility can be assessed using different methods, and many research papers have been published to address the deficiencies and difficulties in landslide susceptibility mapping (Yilmaz 2010a). The main objective of this research is to produce landslide susceptibility maps for Qianyang County, China, using SVM with four types of kernel classifiers, namely linear, polynomial, sigmoid and radial basis function.

As the first step, a reliable landslide inventory map is necessary for landslide susceptibility mapping. In this study, 70 % of the landslides were used for training the models and the remaining 30 % were used for validation.

Secondly, 15 landslide conditioning factors such as slope angle, slope aspect, altitude, plan curvature, profile curvature, distance to faults, distance to rivers, distance to roads, NDVI, STI, SPI, TWI, geomorphology, rainfall, and lithology were constructed and used for producing landslide susceptibility maps.

Finally, five landslide susceptibility classes, i.e., very low, low, moderate, high, and very high susceptibility to landsliding, were derived with the natural breaks method. The spatial performance of the resulting landslide susceptibility maps was compared using ROC curves.

The validation results showed that the landslide susceptibility map generated by the RBF-SVM model had the highest prediction rate (77.98 %), followed by the PL-SVM model (77.50 %), the LN-SVM model (77.07 %), and the SIG-SVM model (76.08 %). The success rate curves gave similar results, with the RBF-SVM model having the highest AUC value (83.15 %), followed by the PL-SVM model (82.72 %), the LN-SVM model (81.77 %), and the SIG-SVM model (79.99 %).

The SVM model has been used in many studies. Brenning (2005) obtained sufficiently smooth prediction surfaces for creating susceptibility maps using SVM. Yao et al. (2008) used SVM in landslide susceptibility mapping and found that it was a useful tool for landslide susceptibility assessment, with better prediction efficiency than logistic regression (LR). Marjanović et al. (2011) commented on the strengths and weaknesses of the SVM model, and indicated that SVM models do not need any feature selection technique, as opposed to some other methods such as decision trees. Xu et al. (2012) found that the radial basis and polynomial kernel functions were suitable for modeling any input training data. San (2014) used SVM to generate medium-scale landslide susceptibility maps and also found that SVM presented high classification accuracy.

The results of the present study show that SVM, based on four types of kernel classifiers, has been applied successfully to the production of landslide susceptibility maps. The landslide susceptibility maps provide valuable information on slope stability in the study area, which could benefit infrastructure planning, land use, engineering and hazard mitigation design.