Introduction

Wolong Natural Reserve is one of the largest habitats for giant pandas in the world. This region was listed as one of the world’s top 25 biodiversity hotspots by Conservation International, named a Global 200 eco-region by the World Wildlife Fund, and inscribed as a World Heritage Site in 2006 (UNESCO World Heritage Center 2006). The Wenchuan earthquake (May 12, 2008) triggered abundant secondary landslides and created unstable landslide areas, which have threatened the ecological environment for giant pandas in Wolong Natural Reserve (Ouyang et al. 2008). Therefore, landslide susceptibility zonation (LSZ) of this area is urgently needed so that decision makers can select areas of low landslide susceptibility as suitable locations for giant pandas.

In order to assess landslide hazards and construct maps portraying their spatial distribution, many researchers have attempted to use different methods, either qualitative or quantitative (Aleotti and Chowdhury 1999; Guzzetti et al. 1999; Dai and Lee 2002; Ayalew and Yamagishi 2005). Qualitative methods represent the susceptibility level based on expert opinion. Scholars used these methods very frequently in the 1970s (Carrara and Merenda 1976; Fenti et al. 1979; Kienholz 1978; Ives and Messerli 1981; Rupke et al. 1988). To minimize the subjective bias from the experts, quantitative methods, such as bivariate statistical, multivariate statistical, and probabilistic prediction models, were developed (Corominas et al. 2014). In the meantime, geographical information systems (GIS), with their ability to integrate various thematic layers, became increasingly popular. Many researchers have carried out GIS-based landslide susceptibility mapping by rating, weighting, and superimposing various thematic maps corresponding to the causative factors, using probabilistic models (Rowbotham and Dudycha 1998; Luzi et al. 2000; Lee and Min 2004; Akgun et al. 2008, 2011; Ozdemir 2009; Yilmaz 2010a; Oh and Lee 2010, 2011; Pourghasemi et al. 2012a, b; Mohammady et al. 2012), bivariate statistics (Brabb et al. 1972; Yilmaz and Yildirim 2006; Constantin et al. 2011; Yilmaz et al. 2012; Yalcin et al. 2008, 2011; Magliulo et al. 2008; Lucà et al. 2011), multivariate analysis (Carrara 1983; Chung et al. 1995; Santacana et al. 2003; Komac 2006; Piegari et al. 2009; Pradhan et al. 2010a; Nandi and Shakoor 2010), logistic regression (Dai et al. 2001, 2003, 2004; Lee and Min 2001; Lee and Pradhan 2007; Can et al. 2005; Yesilnacar and Topal 2005; Goesevski et al. 2006; Lee and Evangelista 2006; Nefeslioglu et al. 2008a; Yilmaz 2009; Lei et al. 2011; Pradhan et al. 2008, 2010a, b, 2011a, b; Chauhan et al. 2010; Bai et al. 2010; Akgun et al. 2012; Bui et al. 2011a; Felicisimo et al. 2013; Süzen and Kaya 2012), and the analytical hierarchy process (Ayalew et al. 2004; Yoshimatsu and Abe 2006; Ercanoglu et al. 2008; Akgun and Türk 2010; Pourghasemi et al. 2012c; Kayastha et al. 2013).

In recent years, machine learning approaches such as artificial neural networks (ANN) and support vector machines (SVM) have been implemented with some success, with the advantage of overcoming the deficiency of statistical methods that require two-class samples (Pradhan et al. 2010c, d; Sezer et al. 2011; Oh and Pradhan 2011; Tien et al. 2012; Micheletti et al. 2013; Yao et al. 2008; Yilmaz 2008, 2010a; Yilmaz and Yuksek 2008a, b; Polykretis et al. 2015).

Proposed as indirect assessment strategies that combine the advantages of quantitative and qualitative assessments, hybrid models have recently become a research focus, with the intent of creating improved and more objective models. Kanungo et al. (2006), Lee et al. (2009), and Vahidnia et al. (2010) combined a fuzzy inference system (FIS) with an ANN to generate LSZ. Goesevski et al. (2006) integrated fuzzy logic with the analytical hierarchy process (AHP). Tehrany et al. (2013) applied an ensemble rule based on decision tree (DT) and multivariate statistical methods in the spatial prediction of flood areas in Malaysia. Damasevicius et al. (2010) pointed out that robustness and clustering algorithms can be positively affected by combining grammar inference and SVM.

The hybrid methods cited above suggest combining two different models in order to reduce the sensitivity to noise and isolated samples, an idea that appeals to many scholars (Pradhan 2010a). The combined fuzzy similarity and SVM (F-SVM) method is an improved SVM algorithm that can overcome the weaknesses of either approach. However, attempts to create F-SVM are relatively few.

In this paper, the F-SVM method is developed for landslide susceptibility mapping in Wolong Giant Panda Natural Reserve. Nine factors were selected as landslide controlling factors: slope, aspect, altitude, geology and lithology, distance from rivers, distance from roads, distance from faults, profile curvature, and the normalized difference vegetation index. The factor layers were constructed using ArcGIS software for spatial data analysis and manipulation. Then, LSZ maps were generated with three different approaches (logistic regression (LR), AHP, and F-SVM) and compared. Finally, the result of the optimum method for this particular study area could provide practical suggestions to government and decision makers for the future conservation of the giant panda.

The study area

General characteristics

Wolong Natural Reserve provides a suitable living environment for endangered species, especially giant pandas. It is located in the west of Sichuan province, China, approximately between 102°52′00″E and 103°25′00″E longitude and 30°45′00″N and 31°25′00″N latitude, with an area of approximately 3600 km2. The epicenter of the Wenchuan earthquake is located 30 km northeast of the study area; the earthquake dissected the rock masses into small blocks. The fault zones near the well-known Longmenshan mountain fault zone include, from northwest to southeast, the Pitiao river fault, the Gengda fault, and the Yingxiu fault, characterized by a series of parallel folds and faults that extend NE 40–50°. The rocks in this area are intensively fractured, and a number of joint sets are developed. The elevation ranges from 1194 to 5789 m. The terrain is very steep, with slope angles varying from 0° to 86.117°. Owing to its particular geographical position and complex geological structure, the area is frequently subjected to landslides (Fig. 1).

Fig. 1 Study area

Geological setting

The rocks outcropping in the study area range in age from the Early Paleozoic to the Mesozoic. Jurassic and Cretaceous formations of the Mesozoic are missing, and Tertiary units of the Cenozoic are sparse. The Silurian Maoxian Group consists of celadon sericite phyllite, silver sand phyllite with a thin layer of quartzite, and thin-bedded and lenticular crystalline limestone, and occurs in the southeast of the study area along the Pitiao River. Triassic formations are distributed in the northwest along the Pitiao River and consist of feldspar quartz sandstone, slate, carbonaceous phyllite, thin-bedded limestone, and fine siltstone. Additionally, the Proterozoic Jinning-Chengjing formation is distributed in the northeast of the study area and is mainly composed of diorite and granodiorite, which are densely jointed and crushed. Since it is the oldest formation and is susceptible to weathering, large numbers of landslides were observed in these units during field investigation.

Hydrological characteristics

The climate of the study area is very humid. According to the data obtained from a local meteorological station, the average humidity is up to 80 %. The average annual precipitation is 890 mm and generally concentrates in spring and summer. The main streams in the study area are Pitiao, Jin, Zhong, and Xi rivers. These rivers and their tributaries form a dendritic drainage pattern due to topographical and geological features of the study area.

Slope failures

Landslides in this region are widely distributed and represent a serious threat to humans and giant pandas. Large numbers of potential landslides and large amounts of detrital material formed on the slopes during the earthquake. The rock mass on the slopes became loose after the earthquake and therefore provides source material for potential precipitation-induced landslides. These unstable slopes are very likely to slide when triggered by rainstorms or earthquakes. Figure 2 shows a landslide approximately 70 m long, 50 m wide, and 40 m high that occurred just after a heavy rainfall. The main body is presumed to have been created by the May 12, 2008 Wenchuan earthquake. After heavy rain on June 19, 2014, new tension cracks appeared at the back of the main scarp. As material accumulated, movement accelerated and a secondary landslide occurred. Many similar landslides are found in the study area.

Fig. 2 a A typical landslide in the study area. b A profile map of a typical landslide in the study area

Construction of a landslide spatial database

For the landslide susceptibility mapping, the primary step is to construct the spatial database from relevant landslide conditioning factors. This stage is thought to be the most important part of landslide susceptibility and hazard mitigation studies (Guzzetti et al. 1999; Ercanoglu and Gokceoglu 2004; Kincal et al. 2009). The spatial database for the study area is composed of slope degree, aspect, altitude, profile curvature, geology and lithology, distance from faults, distance from rivers, distance from roads, and the normalized difference vegetation index (NDVI). These spatial conditioning factors make the slope susceptible without trigger conditions and thus are considered responsible for the occurrence of landslides in the study area. As we know, rainfall and earthquakes, as triggering factors and temporal phenomena, set off the movement by shifting the slope from the quasi-stable state to an unstable state. However, past data on these trigger factors in relation to landslide occurrence are not available and thus are not considered in this study. The sources of this spatial database are shown in Table 1.

Table 1 The source of spatial database used in landslide susceptibility analysis

In this paper, a digital elevation model (DEM) with a ground resolution of 25 m was constructed by interpolation of 1:50,000 scale local digital contour lines using ArcGIS software. Some significant terrain attributes such as slope gradient, aspect, altitude, and profile curvature were derived from this DEM. All other digital lines such as geology maps, fault distribution, river distribution, and road distribution were converted into raster format and resampled with the same pixel size as the DEM.

Landslide inventory map

Since a reliable landslide inventory map plays the most important role in mapping landslide susceptibility, it is necessary to determine the locations and outlines of landslides accurately (Pradhan and Lee 2007). However, employing field survey and observation as the initial method is difficult and time consuming on account of the complex and dangerous terrain conditions after the earthquake. Instead, remote sensing methods, such as high resolution remote sensing and aerial photographs, are used to extract significant and cost-effective information on landslides. A field survey can then be used to verify the results of aerial photograph interpretation and remote sensing imagery analysis.

It should be noted that different sampling strategies for representing landslides lead to different results and have different meanings as well (Nefeslioglu et al. 2008b). Nevertheless, the conceptual differentiation of sampling strategies applied in susceptibility evaluation is commonly ignored. Moreover, there is no agreement on the technique for producing a landslide inventory map. Generally, point, seed cell (Süzen and Doyuran 2004; Yesilnacar and Topal 2005; Sujatha et al. 2012), and scarp (Clerici et al. 2006) samples are used by researchers as training data to represent the failure condition of landslides. Yilmaz (2010b) first compared the effect of these three sampling strategies on a landslide susceptibility assessment, and the result showed that the scarp sampling strategy performed better than the other two. According to Yilmaz (2010b), the point sampling strategy, described by a single X, Y coordinate, could not reflect the landslide-affected area. As is well known, two genetically and morphologically distinct zones can be identified in a landslide: the depletion zone (the upper part of the landslide where the failure is effectively generated) and the accumulation zone (the lower part which is simply affected by the arrival of the depleted material) (Clerici et al. 2006). If the whole landslide is considered in assessing landslide susceptibility, the accumulation zones are erroneously considered to be prone to landsliding. The depletion zone is generally difficult to identify completely since it is partially occupied by the displaced material. Thus, the main scarp (the higher portion of the depletion zone, especially its upper edge) is the most evident morphological feature of a landslide and can be easily distinguished from the accumulation/depletion zone or rupture zone as a polygon feature (Yilmaz 2010b) (Fig. 3).

Fig. 3 Landslide inventory map. a Main scarps of landslides analyzed from IKONOS imagery; b main scarps of landslides interpreted from SPOT imagery

Different map scales (large, medium, and small) should be considered in general natural hazard zonation (Holec et al. 2013). Considering the purpose of the assessment, the extent of the study area, and data availability (Aleotti et al. 1996), a medium scale (1:50,000) was chosen as the working scale for landslide susceptibility zonation. Additionally, only a few landslide areas (about 0.05 % of the total landslide number) are less than 100 m2 (Chong Xu et al. 2013), so the 1:50,000 map scale is deemed sufficient to delineate a landslide.

In this study, a total of 4771 landslides were identified from RapidEye (5 m resolution), SPOT-5 (2.5 m resolution), IKONOS (1 m resolution), and QuickBird (0.6 m resolution) imagery, and about 80 % of these landslides were verified by field surveys. Only the main scarps of landslides larger than one cell (25 m × 25 m) were selected for the landslide inventory map (1773 in total; Fig. 3). Most of the landslides were rock slides according to the classification system proposed by Varnes (1978). A random 70 % of these data were chosen as training data for the landslide susceptibility map, while the remaining 30 % were used for model validation. The pixel size of the landslide inventory and the other thematic maps was 25 m. The study area includes 2,264,362 pixels, and the main scarps of landslides include 63,631 pixels.
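The random 70/30 partition described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_inventory`, the fixed seed, and the use of integer IDs for the 1773 main scarps are assumptions made only for the example.

```python
import random

def split_inventory(landslide_ids, train_fraction=0.7, seed=42):
    """Randomly split landslide IDs into training and validation subsets.

    The seed is a hypothetical choice that only makes the example repeatable.
    """
    ids = list(landslide_ids)
    random.Random(seed).shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]

# 1773 main scarps, identified here by hypothetical integer IDs.
train_ids, valid_ids = split_inventory(range(1773))
print(len(train_ids), len(valid_ids))  # 1241 532
```

With 1773 scarps, a 70 % share corresponds to 1241 training landslides and 532 validation landslides.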

Slope degree

The main parameter of the landslide stability analysis is the slope degree, since it dictates the distribution of slope stress (Lee and Min 2001; Saha et al. 2005; Ercanoglu et al. 2002). Meanwhile, the slope degree also restricts the redistribution of material and energy of the earth’s surface and controls terrestrial plumbing, the thickness of the loose material, and recharge and discharge of groundwater on the slope. Most importantly, the slope influences the effective free face of the slope body, for landslides tend to increase with the free face of the slope body. For these reasons, the slope degree map of the study area is crucial for this research. The slope degree map is derived from DEM and divided into six slope categories (Fig. 4a).

Fig. 4 Thematic maps of the landslide affecting factors. a Slope degree; b aspect; c altitude; d profile curvature; e geology and lithology; f distance from faults; g distance from rivers; h distance from roads; i normalized difference vegetation index

Aspect

Aspect is defined as the direction of the maximum slope of the terrain surface. It has an indirect influence on slope instability. Aspect-related factors, such as exposure to sunlight, land use, drying winds, rainfall (degree of saturation), and discontinuities, may control the occurrence of landslides (Yalcin 2008). For example, Xu et al. (2013b) reported that large numbers of landslides caused by the Wenchuan earthquake occurred on south-facing slopes. Therefore, in this study, the aspect map is also derived from the DEM and divided into nine classes: flat (−1°), north (337.5°–360°, 0°–22.5°), northeast (22.5°–67.5°), east (67.5°–112.5°), southeast (112.5°–157.5°), south (157.5°–202.5°), southwest (202.5°–247.5°), west (247.5°–292.5°), and northwest (292.5°–337.5°) (Fig. 4b).
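The nine-class aspect scheme can be expressed compactly in code. The sketch below is illustrative only; the function name `aspect_class` is hypothetical, and it assumes that aspect is given in degrees clockwise from north, that flat cells carry a negative value (−1), and that a value exactly on a class boundary is assigned to the upper class.

```python
def aspect_class(aspect_deg):
    """Map an aspect angle in degrees to one of the nine classes in the text."""
    if aspect_deg < 0:
        return "flat"  # flat cells are assumed to be coded as -1
    names = ["north", "northeast", "east", "southeast",
             "south", "southwest", "west", "northwest"]
    # Shift by 22.5 degrees so that north (337.5-360 and 0-22.5) maps to index 0.
    index = int(((aspect_deg + 22.5) % 360) // 45)
    return names[index]

for angle in (-1, 10, 100, 180, 350):
    print(angle, aspect_class(angle))
```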

Altitude

Altitude is also a relevant landslide conditioning factor. It is well known that altitude influences temperature, vegetation, human activity, and the gravitational energy of landslides. In turn, these conditions have the potential to affect slope stability and generate slope failure. The altitude map is derived from the DEM and reclassified into seven classes (Fig. 4c).

Profile curvature

The profile curvature is theoretically defined as the rate of change of slope gradient or aspect, usually in one particular direction (Wilson and Gallant 2000). In slope erosion processes, profile curvature influences the convergence or divergence of water during downhill flow (Ercanoglu and Gokceoglu 2002; Oh and Pradhan 2011). In addition, it controls the change of velocity of mass flowing down the slope (Talebi et al. 2007). It is negative when the concavity of the normal section is directed upward, and positive otherwise (Hengl et al. 2003). The profile curvature map was created using a spatial geo-scientific analysis model in ArcGIS software (Fig. 4d).

Geology and lithology

Geology and lithology describe the material basement of landslides. Rock types and structures decide the physical properties of rocks and thus affect the stability of landslides. For this reason, it is essential to group the lithology properties properly (Dai et al. 2001; Duman et al. 2006). In this study area, different lithology associations are developed in different geological periods (Table 2). The geological map was prepared by the Geological Survey of Sichuan province with 1:20,000 scale, then digitized and converted into raster format with 25-m pixel size in GIS (Fig. 4e).

Table 2 Geology and lithology of the study area

Distance from faults

The specific shape, type, and displacement mechanism of landslides are determined by pre-landslide geological features. Tectonic action plays an important role in landslide occurrence. Faults form a line or zone of weakness characterized by tectonic structure (Foumelis et al. 2004). Generally speaking, landslides occur more frequently near faults. Selective erosion and water movement along fault planes promote landsliding. In this study, the distance-from-faults map was extracted from the geology map at 1:200,000 scale. The buffer intervals were set to 200 m, and then the buffer map was converted into raster format (Fig. 4f).

Distance from rivers

The distance of the slope to drainage structure is another important factor in terms of landslide stability. Streams may adversely affect stability by eroding the slope or saturating the lower part of the material resulting in water level increases (Gokceoglu and Aksoy 1996; Saha et al. 2002). For this reason, six different buffer zones were defined with 100-m intervals to determine how the streams affected the slopes (Fig. 4g).

Distance from roads

Distance from roads is another important factor. A high slope caused by road excavation is more prone to slide owing to disruption of the stress state and slope equilibrium. In fact, a large number of landslides were observed closer to the road during the field investigation. For this reason, five different buffer zones were created with 100-m intervals to determine how the roads affected the stability of slope (Fig. 4h).
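The equal-interval buffer zones used for rivers (six zones) and roads (five zones) can be sketched as a simple lookup. The function names are hypothetical, and the example assumes that the last zone is open-ended (for instance ">400 m" for roads).

```python
def buffer_class(distance_m, interval_m, n_classes):
    """Return the 0-based buffer zone index; the last zone is open-ended."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    return min(int(distance_m // interval_m), n_classes - 1)

def buffer_label(index, interval_m, n_classes):
    """Human-readable label for a buffer zone index."""
    lower = index * interval_m
    if index == n_classes - 1:
        return f">{lower} m"
    return f"{lower}-{lower + interval_m} m"

# Roads: five zones at 100-m intervals; rivers: six zones at 100-m intervals.
road_zone = buffer_class(250, 100, 5)
river_zone = buffer_class(950, 100, 6)
print(buffer_label(road_zone, 100, 5), buffer_label(river_zone, 100, 6))
```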

Normalized difference vegetation index (NDVI)

The incidence of landslides is closely related to vegetation density. Barren slopes are more prone to landslides than those with higher vegetation coverage. The NDVI was derived from German remote sensing images (RapidEye) with 5 × 5 m resolution. The NDVI value was calculated using the following equation:

$$ \text{NDVI} = \frac{\text{NIR} - R}{\text{NIR} + R} $$
(1)

where NIR and R are the values in the near-infrared and red portions of the electromagnetic spectrum, respectively. The study area was divided into six classes to demonstrate how the NDVI influences landslide occurrence (Fig. 4i).
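Equation (1) translates directly into a per-pixel computation. The sketch below is a minimal example; the function name `ndvi` and the choice to return `None` where both bands are zero are assumptions made for illustration.

```python
def ndvi(nir, red):
    """Compute NDVI = (NIR - R) / (NIR + R) for a single pixel.

    Returns None when NIR + R is zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

# Hypothetical band values for a vegetated and a barren pixel.
print(ndvi(0.5, 0.1))
print(ndvi(0.2, 0.2))
```

Values close to 1 indicate dense vegetation, while values near or below 0 indicate barren or non-vegetated surfaces.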

Landslide susceptibility mapping

Logistic regression model

Logistic regression (LR) is a multivariate analysis model used to find the optimal fitting to describe the relationship between the presence and absence of landslides based on a set of independent variables such as slope angle, aspect, and lithology. In the present situation, the dependent variable is a binary variable 0 or 1 that represents the absence or presence of a landslide. The LR model generates coefficients to estimate ratios for each of the independent variables.

Quantitatively, the relationship between occurrence and its dependency on several variables can be expressed as:

$$ p\, = \,\frac{1}{{1\, + \,e^{ - z} }} $$
(2)

where p is the estimated probability of landslide occurrence, and z is the linear combination of the affecting factors.

It follows that logistic regression involves fitting an equation of the following form to the data

$$ z = b_{0} + b_{1} x_{1} + b_{2} x_{2} + \cdots + b_{n} x_{n}, $$
(3)

where \( b_{0} \) is the intercept of the model; \( b_{i} \) (i = 1, 2, …, n) are the partial regression coefficients; and \( x_{i} \) (i = 1, 2, …, n) are the independent variables.

Before using the logistic regression model, the spatial databases of each factor influencing the landslide were converted to ASCII format files. Then, the coefficient between the landslide and each affecting factor was calculated by statistical software (SPSS 15.0).

$$ \begin{aligned} z = {} & (0.12 \times \text{SLOPE}) + (0.178 \times \text{ASPECT}) + (0.058 \times \text{LITHOLOGY}) \\ & + (0.071 \times \text{NDVI}) + (0.012 \times \text{FAULT}) + (0.193 \times \text{ROAD}) + (0.064 \times \text{RIVER}) \\ & + (0.275 \times \text{ALTITUDE}) - (0.027 \times \text{Procur}) - 0.594 \end{aligned} $$
(4)

where SLOPE is the slope value; ASPECT is the aspect value; LITHOLOGY is the geology and lithology value; NDVI is the NDVI value; FAULT is the distance from fault value; ROAD is the distance from road value; RIVER is the distance from river value; ALTITUDE is the altitude value; Procur is the profile curvature value; and z is the linear combination defined in Eq. (3).

Using Eqs. (2)–(4), the probability of landslide occurrence was calculated, and finally, a susceptibility map was obtained by converting the file into raster format. The p value ranges from 0.42 to 0.83. Five classes (very low, low, moderate, high, very high) were defined based on the standard deviation (Fig. 5a).
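The per-pixel calculation of Eqs. (2) and (4) can be sketched as follows. The coefficients and intercept are those of Eq. (4); the dictionary keys, the function name, and the example factor values are hypothetical, and the text does not state how each factor is encoded (raw value or class code), so the inputs below are placeholders only.

```python
import math

# Coefficients of Eq. (4); the keys are hypothetical names for the nine factor layers.
COEFFICIENTS = {
    "slope": 0.12, "aspect": 0.178, "lithology": 0.058, "ndvi": 0.071,
    "fault": 0.012, "road": 0.193, "river": 0.064, "altitude": 0.275,
    "procur": -0.027,
}
INTERCEPT = -0.594

def landslide_probability(factors):
    """Evaluate z from Eq. (4) and convert it to a probability with Eq. (2)."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * factors[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))

# Placeholder factor values for one pixel (assumed encoding, not from the paper).
pixel = {"slope": 3, "aspect": 2, "lithology": 1, "ndvi": 2, "fault": 1,
         "road": 1, "river": 2, "altitude": 2, "procur": 1}
print(landslide_probability(pixel))
```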

Fig. 5 Landslide susceptibility map using LR model (a); AHP model (b); F-SVM (c)

Analytical hierarchy process (AHP)

The analytical hierarchy process, developed by Saaty (1977), is a semi-qualitative method based on pair-wise comparison of the contribution of different factors to landslide occurrence. It is a multi-objective, multi-criterion, decision-making approach that enables the user to arrive at a scale of preference drawn from a set of alternatives (Saaty 1980). The decision maker can reach the goal using the following steps:

  1. Break down a complex and unstructured problem into component factors;
  2. Arrange these factors in a hierarchical order;
  3. Assign numerical values (weights) according to their subjective relevance to determine the relative importance of each factor;
  4. Synthesize the judgments to determine the priorities of these factors (Saaty and Vargas 2001).

In order to construct the pair-wise comparison matrix, each factor is rated against every other factor by assigning a score between 1 and 9, as given in Table 3.

Table 3 Scale of preference between two parameters in AHP (Saaty 2000)

When the factor on the vertical axis is more important than the factor on the horizontal axis, the score varies between 1 and 9. Conversely, the score varies between the reciprocals 1/2 and 1/9. According to the above principles, the importance of each parameter affecting landslide susceptibility and a calculated consistency ratio (CR) were generated (Table 4). In the AHP method, the CR is used to indicate the probability that the matrix judgments were randomly generated. When CR is less than 0.1, it represents a reasonable level of consistency (Malczewski 1999).

Table 4 The pair-wise comparison matrix, factor weights, and consistency ratio of the data layers

Conversely, the judgments need to be revised when the CR is above 0.1. In this study, the CR is 0.0551, which indicates a reasonable level of consistency in the pair-wise comparison matrix. Geology and lithology, slope degree, distance from faults, and NDVI were found to be important parameters influencing landslide occurrence, whereas distance from rivers is of low importance. Using a weighted linear sum procedure, the acquired weights were used to calculate the landslide susceptibility model:

$$ \text{LSM}_{\text{AHP}} = \sum R_{1i} \cdot W_{1i} $$
(5)

where \( R_{1i} \) is the rating class of each layer (such as slope, aspect, or elevation), and \( W_{1i} \) is the weight of each conditioning factor. Within the GIS, each conditioning factor is converted into raster format and the layers are combined by weighted summation. The pixel values obtained are then classified into five classes based on standard deviation to determine the class intervals in the landslide susceptibility map (Fig. 5b).
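The derivation of factor weights and the consistency ratio from a pair-wise comparison matrix can be sketched as below. The text does not state how the weights in Table 4 were computed; this example assumes Saaty's principal-eigenvector approach, approximated by power iteration, and uses a small hypothetical 3 × 3 matrix rather than the nine-factor matrix of Table 4. The random consistency index values are Saaty's standard ones.

```python
# Saaty's random consistency index (RI) for matrix orders 1 to 9.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45}

def ahp_weights(matrix, iterations=100):
    """Return (weights, consistency ratio) for a pair-wise comparison matrix."""
    n = len(matrix)
    weights = [1.0 / n] * n
    for _ in range(iterations):
        # One power-iteration step: multiply by the matrix, then renormalize.
        product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
        total = sum(product)
        weights = [value / total for value in product]
    # Estimate the principal eigenvalue as the mean of (A w)_i / w_i.
    product = [sum(matrix[i][j] * weights[j] for j in range(n)) for i in range(n)]
    lambda_max = sum(product[i] / weights[i] for i in range(n)) / n
    consistency_index = (lambda_max - n) / (n - 1) if n > 1 else 0.0
    random_index = RANDOM_INDEX[n]
    ratio = consistency_index / random_index if random_index > 0 else 0.0
    return weights, ratio

# Hypothetical comparison of three factors on the 1-9 scale.
example = [[1, 3, 5],
           [1 / 3, 1, 3],
           [1 / 5, 1 / 3, 1]]
weights, cr = ahp_weights(example)
print([round(w, 3) for w in weights], round(cr, 3))
```

A CR below 0.1 would be accepted under the criterion cited above; otherwise the pair-wise scores would need to be revised.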

Combined SVM and fuzzy similarity model

The novel hybrid learning model combines SVM with the fuzzy similarity concept. Fuzzy similarity is attractive because it is straightforward to understand and implement. It is different from data-driven approaches such as logistic regression or weight of confidence (Pradhan 2011a, b). However, the weight of each thematic layer in the fuzzy similarity method is controlled by the expert; in other words, the determination of weights is qualitative, not quantitative. Consequently, combining fuzzy similarity with SVM can integrate the advantages of the two methods and provide objective and stable results. The procedure, shown in the flow diagram in Fig. 6, involves three steps. Firstly, determine the rates of the thematic layers using the fuzzy similarity approach. Secondly, determine the weights of the thematic layers through the SVM approach. Finally, integrate the weights and rates using GIS to generate the landslide susceptibility map.

Fig. 6 Flow diagram showing the combined fuzzy and SVM method for landslide susceptibility mapping

Fuzzy similarity method

To deal with complex problems, Zadeh (1965) first introduced fuzzy set theory, which addresses uncertainty due to imprecision or vagueness. In fuzzy similarity theory, a spatial object is a member of a set. Such a set is characterized by a membership function, which can be assigned any value between 0 and 1, reflecting the degree of certainty of membership (Zadeh 1965). If the object fully belongs to the set, the value is 1; if it does not belong to the set at all, the value is 0.

In this study, the membership degrees of the categories of each conditioning factor are determined based on a frequency ratio model. The frequency ratio of a category is the ratio of the percentage of landslide area occurring in that category to the percentage of the total area covered by that category. If the value is greater than 1, the category has a high correlation with landslide occurrence; if it is lower than 1, the correlation is low; a value equal to 1 means an average value. Then, the frequency ratio is normalized between 0 and 1 to give the fuzzy membership values (Table 5).

Table 5 Spatial probability relationship between each landslide affecting factor and landslide and fuzzy membership value
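The frequency ratio and its conversion to a fuzzy membership value can be sketched as follows. The sketch assumes min-max normalization within each factor, which is consistent with the slope classes discussed for Table 5 (the lowest ratio receives membership 0 and the highest receives 1), although the text does not name the normalization explicitly. The function names and the two intermediate ratios in the example are hypothetical.

```python
def frequency_ratio(landslide_pixels_in_class, landslide_pixels_total,
                    class_pixels, total_pixels):
    """Share of landslide pixels in a class divided by the class's share of the area."""
    share_of_landslides = landslide_pixels_in_class / landslide_pixels_total
    share_of_area = class_pixels / total_pixels
    return share_of_landslides / share_of_area

def fuzzy_membership(ratios):
    """Min-max normalize the frequency ratios of one factor to the range 0-1."""
    low, high = min(ratios), max(ratios)
    if high == low:
        return [0.0 for _ in ratios]
    return [(r - low) / (high - low) for r in ratios]

# 0.3517 and 2.21 are the slope ratios quoted in the text; 0.8 and 1.2 are placeholders.
slope_ratios = [0.3517, 0.8, 1.2, 2.21]
print(fuzzy_membership(slope_ratios))
```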

Support vector machines

The support vector machine was originally developed by Vapnik (1995) as a more recent machine-learning method after artificial neural networks.

Using the training data, SVM implicitly converts the original input space into a higher dimensional feature space based on kernel functions (Brenning 2005). Subsequently, in the feature space, the optimal hyper-plane is determined by maximizing the margins of the class boundaries (Shigeo Abe 2010). SVM training is therefore formulated as a constrained optimization problem that is solved through its dual.

Consider a training dataset of instance-label pairs \( (x_{i}, y_{i}) \), with \( x_{i} \in R^{n} \). The training vectors belong to two classes, denoted by \( y_{i} \in \{1, -1\} \), with i = 1, 2, …, m. If a point \( x_{i} \in R^{n} \) lies above the hyper-plane, it is classified as 1; otherwise, it is classified as −1. The goal of SVM is to find the hyper-plane in this n-dimensional space that separates the two classes with the maximum margin.

Mathematically, it can be denoted as

$$ \min \frac{1}{2}\left\| w \right\|^{2} $$
(6)

Subject to the following constraints

$$ y_{i} ((w \cdot x_{i} ) + b) \ge 1 $$
(7)

where w is the normal vector of the hyper-plane and ‖w‖ is its norm, b is a scalar bias, and \( \cdot \) denotes the scalar product operation.

Introducing the Lagrangian multiplier, the cost function can be defined as

$$ L = \frac{1}{2}\left\| w \right\|^{2} - \sum\limits_{i = 1}^{n} \lambda_{i} \left( y_{i} ((w \cdot x_{i} ) + b) - 1 \right), $$
(8)

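where \( \lambda_{i} \) is the Lagrangian multiplier and the sum runs over the training samples. The solution is obtained from the dual problem, that is, by minimizing Eq. (8) with respect to w and b and maximizing it with respect to \( \lambda_{i} \ge 0 \).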

For the case of data that are not linearly separable, the constraints can be relaxed as

$$ y_{i} ((w \cdot x_{i} ) + b) \ge 1 - \xi_{i} ,\xi_{i} \ge 0, $$
(9)

where ξ i is the slack variable. The above equation will be modified as

$$ L = \frac{1}{2}\left\| w \right\|^{2} - \frac{1}{vn}\sum\limits_{i = 1}^{n} {\xi_{i} }, $$
(10)

where \( v \in (0,1] \) is introduced to account for misclassification (Scholkopf et al. 2000; Hastie et al. 2001). Additionally, a kernel function \( K(x_{i}, x_{j}) \) is introduced to account for the nonlinear decision boundary.

$$ K(x_{i} ,x_{j} ) = e^{ - \gamma \left\| x_{i} - x_{j} \right\|^{2} } $$
(11)

Several kernel types are available, such as the linear kernel, the polynomial kernel, and the RBF (Gaussian) kernel. Because the RBF kernel has proved to be the most powerful kernel in dealing with nonlinear cases (Yao et al. 2008), it was employed in this study. For the RBF kernel, the kernel width (γ) is the primary parameter, which controls the degree of nonlinearity of the SVM model (Damasevicius 2010). Only γ has to be determined for a chosen v. For each pair (γ, v), the dataset is divided into n folds: one fold is used as the verification dataset, and the other n − 1 folds are used as training datasets. By using each fold in turn as the verification dataset and the combination of the other folds as training data, the optimal (γ, v) is determined. For this research, (γ, v) was chosen to be (0.1, 0.65) based on a 60 % subset of the test data used as training data and the remaining 40 % used as verification data. The final weights of the landslide conditioning factors obtained with the SVM model are given in Table 6. Datasets and their classes are given in Table 5. The landslide susceptibility map produced by F-SVM is shown in Fig. 5c.
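The two building blocks of this parameter search, the RBF kernel of Eq. (11) and the rotation of folds, can be sketched in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the fold assignment by stride is one arbitrary choice, and the SVM training step itself (which the study performs for each candidate pair (γ, v)) is not reproduced here.

```python
import math

def rbf_kernel(x_i, x_j, gamma):
    """RBF kernel of Eq. (11): exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x_i, x_j))
    return math.exp(-gamma * squared_distance)

def k_fold_indices(n_samples, n_folds):
    """Yield (training, verification) index lists, using each fold once for verification."""
    folds = [list(range(k, n_samples, n_folds)) for k in range(n_folds)]
    for k in range(n_folds):
        training = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield training, folds[k]

# Kernel value for two hypothetical feature vectors with gamma = 0.1 (the value chosen in the text).
print(rbf_kernel([0.0, 0.0], [1.0, 0.0], gamma=0.1))

# Rotate through five folds of ten samples.
for training, verification in k_fold_indices(10, 5):
    print(verification, training)
```

In a full search, a model would be trained on each training list and scored on the matching verification list, and the pair (γ, v) with the best average score would be retained.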

Table 6 Weights of each landslide affecting factor based on SVM model

It can be observed from Table 5 that when the slope degree is greater than 50°, the frequency ratio value is 2.21. This means a high probability of landslide occurrence, and thus the corresponding fuzzy membership value is 1. For slope degrees between 0° and 10°, the frequency ratio value is 0.3517, which indicates a low probability of landslide occurrence, and the corresponding fuzzy membership value is 0. In terms of slope aspect, landslides were most abundant on the southeast and south slopes. Thus, hill slopes facing southeast or south are more susceptible to landslides. Flat areas and northeast-facing slopes have a lower probability of landslides. With respect to altitude, landslides were most abundant at 1179–1500, 1500–2000, and 2000–2500 m (frequency ratios of 1.88, 2.4837, and 2.094, respectively). In the case of geology and lithology, the frequency ratio (13.47) is highest in the areas composed of plagioclase granite of the Yanshanian period, and few landslides are distributed in C, T3zh, P1, P2, η51b, ζ51b, D 2+3, γ 2b.5. In the case of profile curvature, the frequency ratio values were higher in concave areas and lower in flat areas. In the case of NDVI, the frequency ratio is higher in the classes 0.063–0.108, 0.108–0.28, and 0.28–0.45. In the case of distance from faults, at distances of 0–200, 200–400, 400–600, 600–800, and 800–2000 m, the frequency ratios are 0.59, 0.739, 0.662, 1.154, and 1.396, respectively; the ratios exceed 1, indicating a high probability of landslide occurrence, only at distances greater than 600 m. In the case of distance from rivers, distances of 200–300, 300–400, and 400–500 m have a high probability of landslides. For the distance from roads, landslides mostly occurred at distances of 100–200, 200–300, 300–400, and >400 m.

Results and comparison of multi-models

LSZ maps were generated by three different GIS-based methods. To identify the best approach, LR, AHP, and F-SVM were compared and validated. The percentage distribution of the susceptibility classes in the study area was determined by standard deviation classification, since the histogram of the data values exhibits a normal distribution.
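A standard deviation classification into five susceptibility classes could be sketched as below. The break points (mean ± 0.5σ and mean ± 1.5σ) are a common convention and are an assumption here, since the exact breaks used for the maps are not stated; the function names are illustrative.

```python
from bisect import bisect_right
from statistics import mean, pstdev

LABELS = ["very low", "low", "moderate", "high", "very high"]


def std_dev_breaks(values, offsets=(-1.5, -0.5, 0.5, 1.5)):
    """Class boundaries at mean + k * sigma for each k in offsets (assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [mu + k * sigma for k in offsets]


def classify(value, breaks):
    """Map one susceptibility index value to a class label."""
    return LABELS[bisect_right(breaks, value)]


def class_percentages(values, breaks):
    """Percentage of cells that fall into each susceptibility class."""
    counts = {label: 0 for label in LABELS}
    for v in values:
        counts[classify(v, breaks)] += 1
    return {label: 100.0 * n / len(values) for label, n in counts.items()}
```

Applied to the susceptibility index of every cell, `class_percentages` yields the kind of area distribution that is reported per method.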

According to the LSZ produced by the LR method, 5.34, 20.15, 29.0, 20.17, and 25.34 % of the study area is classified as very high, high, moderate, very low, and low susceptibility, respectively (Fig. 7). For the AHP model, the histogram in Fig. 7 shows that 9.8 % of the total area has a very low probability of landsliding and 5.4 % has a very high probability; the low susceptibility zone covers about 30.9 %, the moderate zone about 27.6 %, and the high zone 26.1 % of the total area. According to the LSZ produced by F-SVM, 5.8 % of the study area falls in the very high susceptibility zone, 17.8 % in the high, 26.7 % in the moderate, 21.1 % in the low, and 28.6 % in the very low zone (Fig. 7).

Fig. 7 A histogram showing the percentage of landslide zones constructed with the LR, AHP, and F-SVM methods

Validation of landslide hazard calculation models rests on two assumptions. One is that landslides are related to spatial information. The other is that future landslides will be triggered by specific factors such as rainfall and earthquakes. In this study, both of these basic assumptions were met. The landslide susceptibility maps can be validated by comparing the susceptibility map obtained with known landslide location data that were not included in the susceptibility analyses. In the present study, 30 % of the total landslides, selected at random, were used for validation. Figure 8 presents a histogram that summarizes the results of the entire process. With the F-SVM method, 41.7, 40.9, 13.6, and 3.6 % of the validation data fall into the very high, high, moderate, and low classes of the landslide susceptibility map, respectively. With the LR method, 38.7, 41.3, 15.8, and 4 % of the landslides that occurred fall into the very high, high, moderate, and low susceptibility classes, respectively. It is worth mentioning that no validation data fall into the very low susceptibility class with either F-SVM or LR. With the AHP method, 10, 51.2, 26.5, 10.7, and 1.4 % of the landslides that occurred fall into the very high, high, moderate, low, and very low susceptibility classes, respectively.
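The validation step described above (random hold-out of 30 % of the landslides, then tabulating the susceptibility class in which each held-out landslide falls) can be sketched as follows. The function names, the fixed random seed, and the `class_of` lookup from landslide identifier to mapped class are illustrative assumptions.

```python
import random
from collections import Counter

CLASSES = ["very low", "low", "moderate", "high", "very high"]


def split_inventory(landslide_ids, validation_share=0.3, seed=0):
    """Randomly split landslide identifiers into (training, validation)."""
    ids = list(landslide_ids)
    random.Random(seed).shuffle(ids)  # seeded only to make the sketch repeatable
    n_val = round(len(ids) * validation_share)
    return ids[n_val:], ids[:n_val]


def validation_distribution(validation_ids, class_of):
    """Percentage of validation landslides per susceptibility class.

    class_of maps each landslide identifier to the class of the map cell
    in which it lies (hypothetical lookup).
    """
    counts = Counter(class_of[i] for i in validation_ids)
    total = len(validation_ids)
    return {label: 100.0 * counts.get(label, 0) / total for label in CLASSES}
```

A map performs well in this check when most held-out landslides fall into the high and very high classes and few or none fall into the very low class.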

Fig. 8 A histogram showing the verification data that fall into the various classes of the LR, AHP, and F-SVM susceptibility maps

Moreover, rate curves were generated; the area under the curve (AUC) is a good indicator of the prediction performance of a model. The closer the AUC is to 1, the better the model (Swets 1988; Yesilnacar and Topal 2005). To obtain the relative ranks for each prediction model, the calculated landslide susceptibility index (LSI) values of all cells in the study area were sorted in descending order. The ordered cell values were then divided into 100 classes with accumulated 1 % intervals. The cumulative percentage of landslide occurrence for each model appears as a line in Fig. 9. It can be observed from Fig. 9 that the three methods show the same tendency, which means that all three can be used for predicting landslide susceptibility. In the case of the AHP method, the 90–100 % class (10 % of the study area), where the landslide hazard index had a high rank, could explain 40.22 % of all the landslides, and the 80–100 % class (20 %) could explain 63.09 %. In the case of the LR approach, the 90–100 % class (10 %) could explain 52.713 % of all the landslides, and the 80–100 % class (20 %) could explain 71.98 %. In the case of the fuzzy-SVM method, the 90–100 % class (10 %) could explain 53.13 % of all the landslides, and the 80–100 % class (20 %) could explain 73.06 %. To compare the prediction accuracy of the methods quantitatively, the area under the curve was calculated. In the case of the AHP method, the area ratio was 0.7884; in other words, the prediction accuracy is 78.84 %. In the case of the LR method, the prediction accuracy is 84.55 %. In the case of the fuzzy-SVM method, the prediction accuracy is 85.73 %. F-SVM therefore predicts better than AHP, whereas its accuracy is similar to that of LR.
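The construction of the rate curve and its area can be sketched as below: cells are sorted by LSI in descending order, grouped into equal-area classes, and the cumulative share of landslide cells is accumulated. The trapezoidal integration and the function names are assumptions made for illustration; the exact integration rule used for the reported values is not stated.

```python
def rate_curve(lsi, is_landslide, n_classes=100):
    """Cumulative % of landslide cells captured as area is added in
    descending LSI order, sampled at n_classes equal-area steps."""
    order = sorted(range(len(lsi)), key=lambda i: lsi[i], reverse=True)
    total = sum(is_landslide)
    n = len(order)
    curve = [0.0]  # 0 % of the area explains 0 % of the landslides
    captured = 0
    pos = 0
    for k in range(1, n_classes + 1):
        end = round(k * n / n_classes)  # last cell index of the k-th class
        while pos < end:
            captured += is_landslide[order[pos]]
            pos += 1
        curve.append(100.0 * captured / total)
    return curve


def area_under_curve(curve):
    """Trapezoidal area under the rate curve, normalised to the range 0..1."""
    n = len(curve) - 1
    area = sum((curve[k] + curve[k + 1]) / 2.0 for k in range(n)) / n
    return area / 100.0
```

With `n_classes=100`, `curve[10]` is the share of landslides explained by the 10 % of the area with the highest index (the 90–100 % class in the text), and `curve[20]` corresponds to the 80–100 % class.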

Fig. 9 Cumulative frequency diagram showing landslide hazard index rank occurring in cumulative percent of landslide occurrence

Discussion and conclusions

Since landslides are among the most dangerous natural hazards, governments and research institutions worldwide have attempted to assess landslide susceptibility and risk and to show their spatial distribution. Research on hazard susceptibility at cultural or natural heritage sites is relatively scarce (Kyoji Sassa et al. 2009). Wolong Giant Panda Natural Reserve, a World Heritage Site, is located about 30 km southwest of the epicenter of the Wenchuan earthquake. The Wenchuan earthquake triggered a very large number of landslides and enlarged the landslide-susceptible areas. In the present study, a total of 1773 landslide scarps larger than one cell (25 × 25 m) were selected in the landslide inventory mapping; 70 % of them were randomly selected as test data, and the other 30 % were used for validation. Nine landslide conditioning factors were selected: slope degree, aspect, altitude, profile curvature, geology and lithology, distance from faults, distance from rivers, distance from roads, and normalized difference vegetation index (NDVI). Logistic regression, the analytical hierarchy process, and the combined fuzzy and SVM model were applied and compared for landslide susceptibility mapping in Wolong Giant Panda Natural Reserve. The validation showed that the combined fuzzy and SVM hybrid model produced the most accurate LSZ map for this study area.

Many studies have compared neural network models with LR, AHP, and conditional probability. Some authors agree that soft computing models (e.g., ANN, SVM) perform better than conventional conditional probability or LR methods (Yao et al. 2008), while other authors find that soft computing models perform no differently from other prediction methods (Tu 1996; Schumacher et al. 1996; Ottenbacher et al. 2001; Mahiny and Turner 2003).

In this study, our results demonstrate that although all three methods can predict landslide susceptibility, as shown by their common tendency, the combined fuzzy and SVM method (F-SVM) is better than AHP and has accuracy similar to that of LR. The AHP method is a simple tool that is easy to implement on the basis of expert opinion, but its results are limited by uncertainty and subjectivity. The LR method performs relatively well because, as a data-driven model, it reduces subjectivity to some extent. However, the combined fuzzy and SVM hybrid model performed best. This may be because it is an objective approach, in which the weight of each landslide conditioning factor is determined by the SVM model and the rating of each thematic layer is determined by the fuzzy similarity method.

In summary, the landslide susceptibility map generated by the combined fuzzy and SVM hybrid model for Wolong Giant Panda Natural Reserve provides an objective assessment. According to the LSZ based on F-SVM, 5.8 and 17.8 % of the study area are assigned to the very high and high susceptibility classes, respectively, which is meaningful for the government, managers, and decision makers responsible for protecting giant pandas.