Introduction

Landslides are considered among the most damaging natural hazards after earthquakes and continue to cause human and financial losses. In September 2014, the Lifengyuan Hydropower Station (originally called Dalingshan) in China was completely destroyed by landslides. It was the first time a hydropower station had been destroyed by landslides in the Three Gorges Reservoir area. Landslide susceptibility mapping (LSM) is a way to understand and predict such hazards in order to mitigate their consequences (Feizizadeh and Blaschke 2011).

For decades, a variety of LSM methods have been proposed, most of which are GIS-based and follow qualitative, quantitative, or hybrid approaches (Fig. 1). Methodological overviews include Varnes (1984), Van Westen (1994), Leroi (1996), Aleotti and Chowdhury (1999), Guzzetti et al. (1999), Van Westen et al. (2003), Brenning (2005), Glade and Crozier (2005), and Budimir et al. (2015).

Fig. 1 A schematic illustration of commonly used LSM methods (by Sabokbar et al. 2014)

Landslides are complex processes, mainly because many factors are involved, including geology, geomorphology, hydrology, and human activity. When we try to quantify the relationships between influencing factors and landslide occurrence, a question arises: is the influence of these factors on landslide occurrence homogeneous everywhere? In other words, are the relationships between landslide location and influencing factors spatially stationary? The literature on spatial non-stationarity in landslide susceptibility assessment is very limited (Zhou et al. 2002; Erener 2009; Erener and Düzgün 2010; Atkinson and Massari 2011; Chalkias et al. 2011, 2014; Feuillet et al. 2014). Most studies have treated the relationships between predictors and landslide occurrence as fixed effects (Feuillet et al. 2014).

Methods that ignore the spatial dependence or autocorrelation of the data in susceptibility assessment, such as logistic regression (LR), can be called global models (Feuillet et al. 2014). By contrast, methods that account for spatial variability in the effect of influencing parameters at the local scale, such as geographically weighted logistic regression (GWLR), can be called local models. The global model (LR) keeps the regression coefficients of the landslide predisposing factors constant throughout the study area, whereas the local model (GWLR) allows the regression coefficients to vary from place to place. The concept of geographical weighting was proposed in 1996 and first implemented as geographically weighted regression (GWR) to capture spatial non-stationarity (Brunsdon et al. 1996, 1998; Fotheringham et al. 2002). GWLR was then developed to explore the relations between riverbank erosion and geomorphological controls (Atkinson et al. 2003). Erener (2009) first used GWLR for LSM in her PhD thesis, for the Bartin Kumluca watershed in Turkey, and then compared LR, spatial regression (SR), and GWLR for More and Romsdal, Norway (Erener and Düzgün 2010). Chalkias et al. (2011, 2014) showed the differences in landslide susceptibility estimates between GWLR and LR in southern Greece. Feuillet et al. (2014) investigated the spatial non-stationarity in the relationships between paraglacial variables and landslide locations in northern Iceland.

In this paper, LR and GWLR models are implemented for LSM in the Qinggan River basin, a small part of the Three Gorges Reservoir area in China, and six evaluation criteria are used to compare the two approaches.

Study region

The Three Gorges lie in the mountains separating the Sichuan Basin and the Jianghan Plain, along the middle reaches of the Yangtze River. The study area, located ~50 km west of the Three Gorges Dam, covers 46.8 km² and lies between latitudes 30°56′13″N and 31°0′37″N and longitudes 110°33′46″E and 110°39′36″E (Fig. 2). The minimum elevation in the area is 140 m, and the maximum elevation is 1210 m. The terrain consists of a series of limestone ridges and gorges, with intergorge valleys underlain primarily by interbedded mudstones, shale, and thinly bedded limestone. Landslides tend to occur where failure-prone rock units are exposed in the intergorge valleys (Fourniadis et al. 2007a, 2007b). Although the slope gradients range from 0° to 78°, the majority are between 20° and 30°. Steep slopes have developed in areas underlain by easily erodible, soft materials, and landslides are common in these areas. The average annual precipitation is 1100 mm. Rainfall is generally concentrated in spring and summer, and the summer average can be as high as 200–300 mm per month (Peng et al. 2014, 2015).

Fig. 2 Study area in Three Gorges Reservoir area with landslide events

The study area is located at the southern tip of the Zigui syncline axis. The strata exposed from northwest to southeast are, from oldest to youngest: Triassic Badong Formation (T2b), Triassic Jiuligang Formation (T2j), Jurassic Tonglinyuan Formation (J1t), Jurassic Qianfoyan Formation (J2q), Jurassic lower Shaximiao Formation (J2x), and Jurassic upper Shaximiao Formation (J2s).

A landslide inventory map surveyed by the Three Gorges Headquarters was used to obtain landslide locations (Fig. 2). The mapped landslides cover a total area of 5.46 km², representing 11.1 % of the study region. Landslides are mostly distributed along water bodies. The global Moran’s I is 0.617 with a P value of 0.001, which indicates clear spatial clustering. Examples of large and highly destructive landslides include the Shuping, Qianjiangping, and Yanguoshaba landslides (Fig. 2).
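To make the global Moran’s I statistic concrete, the following is a minimal sketch of how it can be computed from a vector of values and a spatial weights matrix. The function name `morans_i`, the toy weights matrix, and the toy values are assumptions for illustration only; the weighting scheme behind the reported value of 0.617 is not stated here, and the sketch does not reproduce the significance test.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for a 1-D array of values and an n x n weights matrix.

    The weights matrix is assumed to have a zero diagonal (no self-neighbours).
    """
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()                 # deviations from the mean
    n = x.size
    s0 = w.sum()                     # sum of all weights
    return (n / s0) * (z @ w @ z) / (z @ z)

# Hypothetical example: four units in a row, each neighbouring the next one.
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

clustered = morans_i([1, 1, 0, 0], w)     # similar values sit side by side
alternating = morans_i([1, 0, 1, 0], w)   # neighbours always differ
```

In this toy case the clustered pattern gives a positive index and the alternating pattern a negative one, which is the sense in which a clearly positive value signals spatial clustering.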

Data

Influencing factors of landslides

Nine variables were selected for LSM in the study area: elevation, slope, modified normalized difference water index (MNDWI, Pelletier et al. 1997), normalized difference vegetation index (NDVI), distance-to-stream (dis_stream), distance-to-fault (dis_fault), distance-to-road (dis_road), lithology, and bedding structure. These variables were chosen on the basis of previous studies of landslide susceptibility in the Three Gorges Reservoir area (Liu et al. 2004, 2009; Fourniadis et al. 2007a, 2007b; Bai et al. 2010; Peng et al. 2014; Bi et al. 2014; Talaei 2014).

Elevation appears to have no direct relation with landslides. However, water development and human activities are closely associated with elevation, so as elevation decreases, the probability of surface material disturbance increases. Histogram analysis shows that no landslides occur where elevation exceeds 700 m. Therefore, a negative correlation may exist between elevation and landslides.

Slope is considered an important factor affecting the stability of slopes. In the study area, most landslides occur on slopes between 20° and 30°, and none occur on slopes steeper than 50°.

It is essential to group lithology properly. We divided lithology into three groups: mudstone, shale and Quaternary deposits; sandstones and thinly bedded limestones; and limestones and massive sandstones.

The bedding structure is a continuous raster layer representing the angular relationship between topography and strata attitude. The relationship can be characterized by the product of the bed dip direction and angle, slope angle, and aspect (Meentemeyer and Moody 2000; Peng et al. 2014). The classification for bedding structure is shown in Fig. 3. Then, these data were used to generate a bedding structure map (Fig. 4).

Fig. 3 Classification of the bedding structure. α: slope aspect; β: bed dip direction; γ: bed dip angle; and δ: slope angle (by Peng et al. 2014)

Fig. 4 Bedding structure map for the whole region

Elevation, slope, dis_stream, dis_fault, and dis_road were derived from 1:10,000-scale digital topographic maps and 1:10,000-scale digital geological maps surveyed by the Three Gorges Headquarters. MNDWI and NDVI were calculated from Landsat imagery. Lithology was obtained from a 1:50,000 geological map. Bedding structure was derived from the 1:10,000 topographic map and the 1:50,000 geological map.
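As an illustration of how the two spectral indices can be derived from band reflectances, here is a minimal sketch. It assumes the commonly used normalized-difference definitions, NDVI = (NIR − Red)/(NIR + Red) and MNDWI = (Green − SWIR)/(Green + SWIR); the exact bands and preprocessing used in this study are not stated, and the function name and reflectance values below are hypothetical.

```python
import numpy as np

def normalized_difference(band_a, band_b):
    """Return (a - b) / (a + b) per pixel, with 0 where the denominator is 0."""
    a = np.asarray(band_a, dtype=float)
    b = np.asarray(band_b, dtype=float)
    total = a + b
    out = np.zeros_like(total)
    np.divide(a - b, total, out=out, where=total != 0)
    return out

# Hypothetical reflectances for two pixels: vegetated land, then open water.
green = np.array([0.1, 0.3])
red = np.array([0.1, 0.1])
nir = np.array([0.5, 0.1])
swir = np.array([0.3, 0.1])

ndvi = normalized_difference(nir, red)      # high over vegetation
mndwi = normalized_difference(green, swir)  # high over water
```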

Multicollinearity analysis showed that dis_stream was highly correlated with elevation and that MNDWI was highly correlated with NDVI. As a result, dis_stream and NDVI were excluded from further calculations. In the study area, the main triggering factor for landsliding is heavy precipitation (mainly rainfall). However, the regression analysis does not include precipitation because rainfall is relatively uniform throughout the study area.
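A simple way to screen for this kind of redundancy is to inspect pairwise correlations between predictors. The sketch below is only one possible approach: the diagnostic actually used in this study is not specified, and the Pearson threshold of 0.7, the function name, and the synthetic data are all assumptions.

```python
import numpy as np

def highly_correlated_pairs(data, names, threshold=0.7):
    """List (name_a, name_b, r) for column pairs whose |Pearson r| exceeds threshold."""
    corr = np.corrcoef(np.asarray(data, dtype=float), rowvar=False)
    pairs = []
    for a in range(len(names)):
        for b in range(a + 1, len(names)):
            if abs(corr[a, b]) > threshold:
                pairs.append((names[a], names[b], float(corr[a, b])))
    return pairs

# Synthetic data: dis_stream is constructed to track elevation, slope is independent.
rng = np.random.default_rng(0)
elevation = rng.uniform(140, 1210, size=200)
dis_stream = 2.0 * elevation + rng.normal(0, 50, size=200)
slope = rng.uniform(0, 78, size=200)

table = np.column_stack([elevation, dis_stream, slope])
flagged = highly_correlated_pairs(table, ["elevation", "dis_stream", "slope"])
```

One variable from each flagged pair would then be dropped before fitting, as was done here for dis_stream and NDVI.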

Slope unit

In this study, the slope unit, a partition of the landscape based on surface hydrologic analysis, was chosen as the LSM mapping unit. A region is partitioned into subbasins or slope units using high-quality DEMs, each unit being the hydrological region between drainage and divide lines (Carrara 1988; Carrara et al. 1991). Depending on the type of instability to be investigated (deep-seated vs. shallow slides or complex slides vs. debris flows), the mapping unit may correspond either to the subbasin or to the main slope unit (right/left side of the subbasin) (Guzzetti et al. 1999). In nature, there is a clear physical relationship between landsliding and the fundamental morphological elements of a hilly or mountainous region, namely drainage and divide lines. Therefore, a mapping unit based on slope units is more representative of landslide phenomena (Erener 2009).

Methods

Two logistic regression models were developed for quantitative mapping of landslide susceptibility: LR was used to establish the relationship between landslide influencing factors and landslide occurrence at a global scale, while GWLR was used to investigate the relationship at a local scale. Figure 5 is a schematic representation of the methodology.

Fig. 5 Schematic representation of the methodology

Global logistic model (LR)

LR is a multivariate analysis model for predicting the presence (or absence) of a phenomenon based on the values of predictor variables (Lee 2005). It has at least two advantages over traditional multivariate linear regression (Atkinson et al. 2003). First, LR allows the independent variables to be categorical, continuous, or any combination of both types. Second, residuals do not need to be normally distributed about their mean, and no assumptions are made about their error distributions. Thus, most of the limitations of traditional regression are removed (Atkinson et al. 2003). Using an LR model, the relationship between landslide occurrence (Y) and landslide influencing factors (X1, X2, …, Xp) is established as:

$$Y = \operatorname{logit}(p_{i}) = \ln\left(\frac{p_{i}}{1 - p_{i}}\right) = \beta_{0} + \sum_{k=1}^{p} \beta_{k} X_{k}$$
(1)

where p_i is the probability of Y occurring at location i, p_i/(1 − p_i) is the odds (referred to as the “odds ratio” or likelihood ratio), β_0 is the intercept, and β_1, β_2, …, β_p are the regression coefficients. If a coefficient is positive, its exponential, exp(β_k), is greater than one, meaning the odds of the event increase as the corresponding factor increases. If a coefficient is negative, exp(β_k) is less than one and the odds of the event decrease (Ayalew and Yamagishi 2005).
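The link between coefficients, odds, and probability can be illustrated with a short sketch. The two helper functions and the example inputs are hypothetical; only the slope coefficient of −0.401 is taken from the LR results reported in this paper, and the intercept used below is made up for illustration.

```python
import math

def landslide_probability(intercept, coefficients, x):
    """Invert the logit of Eq. 1: p = 1 / (1 + exp(-(b0 + sum_k b_k * x_k)))."""
    logit = intercept + sum(b * v for b, v in zip(coefficients, x))
    return 1.0 / (1.0 + math.exp(-logit))

def odds_multiplier(coefficient):
    """Factor by which the odds p / (1 - p) change per unit increase in a predictor."""
    return math.exp(coefficient)

slope_coefficient = -0.401          # LR slope coefficient reported in the paper
hypothetical_intercept = 0.5        # assumption, not a fitted value

multiplier = odds_multiplier(slope_coefficient)   # below 1: odds decrease
p_flat = landslide_probability(hypothetical_intercept, [slope_coefficient], [0.0])
p_steep = landslide_probability(hypothetical_intercept, [slope_coefficient], [1.0])
```

Because the continuous predictors are standardized to the range 0 to 1, a one-unit increase corresponds to moving across the full observed range of that predictor.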

In applying the LR model to LSM, some researchers have created layers of binary values for each class of an influencing parameter (Guzzetti et al. 1999; Lee and Min 2001; Dai et al. 2001; Dai and Lee 2002; Ohlmacher and Davis 2003). This can lead to a large number of independent variables; the regression equation then becomes very long and may even introduce numerical problems. One solution is to arrange the classes of all parameters according to their corresponding landslide densities (Ayalew and Yamagishi 2005). An alternative is to transform the categorical variables into numeric variables using landslide densities (Yesilnacar and Topal 2005; Zhu and Huang 2006; Bai et al. 2010).

In this study, we implemented the two logistic models using continuous data standardized to the range 0 to 1 and using landslide density to transform the categorical variables into numeric variables. Landslide density is calculated as (Carrara 1992):

$$\text{Landslide density} = \frac{B_{i}/A_{i}}{\sum\limits_{i} (B_{i}/A_{i})}$$
(2)

where A_i is the area of the ith type of the influencing variable and B_i is the landslide area in the ith type of the influencing variable.
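The two preprocessing steps described above, landslide density for categorical variables (Eq. 2) and rescaling of continuous variables to the range 0 to 1, can be sketched as follows. The function names and the class areas are hypothetical; only the elevation limits of 140 m and 1210 m come from the study area description, and min-max scaling is assumed as the standardization method.

```python
import numpy as np

def landslide_density(landslide_area, class_area):
    """Eq. 2: (B_i / A_i) divided by the sum of (B_i / A_i) over all classes."""
    b = np.asarray(landslide_area, dtype=float)
    a = np.asarray(class_area, dtype=float)
    ratio = b / a
    return ratio / ratio.sum()

def min_max(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    v = np.asarray(values, dtype=float)
    return (v - v.min()) / (v.max() - v.min())

# Hypothetical areas (km^2) for three lithology groups.
class_area = [20.0, 15.0, 10.0]       # A_i
landslide_area = [4.0, 1.5, 0.5]      # B_i
density = landslide_density(landslide_area, class_area)

# Elevation limits of the study area plus their midpoint.
scaled_elevation = min_max([140.0, 675.0, 1210.0])
```

The densities sum to one, so each class is replaced by a numeric value that reflects its relative share of landsliding.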

Local logistic model (GWLR)

GWLR is a geographically weighted version of the traditional LR model described above. It was first developed to explore the relations between riverbank erosion and geomorphological controls (Atkinson et al. 2003). It has since been applied to the spatial simulation of regional land use patterns (Liao et al. 2010), the exploration of spatial non-stationarity in fisheries survey data (Windle et al. 2010), other studies in the geosciences (Erener and Düzgün 2010; Chalkias et al. 2011, 2014; Cossart 2013; Wu and Zhang 2013; Feuillet et al. 2014; Zini et al. 2015), and human geography (Windle et al. 2010; Saefuddin et al. 2012; Yang and Matthews 2012).

A key step in the development of GWLR is the choice of a spatial weighting function for estimating local parameters. Specifically, GWLR uses a distance-based weighting scheme, since it is assumed that observations near point i have more influence on the estimation of the parameters than observations located farther from i (Feuillet et al. 2014). The weighted least squares estimates of β_i are then:

$$\hat{\beta }_{i} = (x^{T} w_{i} x)^{ - 1} x^{T} w_{i} y$$
(3)

where w_i is an n × n weighting matrix whose off-diagonal elements are zero and whose diagonal elements are the geographical weights:

$$w_{i} = \begin{pmatrix} w_{i1} & & & \\ & w_{i2} & & \\ & & \ddots & \\ & & & w_{in} \end{pmatrix}$$
(4)

The choice of w_i depends on the kernel function, which may be fixed (i.e., a fixed bandwidth) or adaptive (i.e., varying bandwidths). We chose the bi-square function (Fotheringham et al. 2002):

$$w_{ij} = \begin{cases} \left[ 1 - (d_{ij}/b)^{2} \right]^{2}, & d_{ij} \le b \\ 0, & d_{ij} > b \end{cases}$$
(5)

where b is the bandwidth and d_ij is the spatial distance between locations i and j. Points close to location i are weighted heavily, and the weight decreases as d_ij increases. The choice of bandwidth is influential; it is determined by minimizing the corrected Akaike information criterion (AICc), which is based on the log likelihood of the model (Johnson and Omland 2004). In this study, the bandwidth that minimizes the AICc is 805 m for the GWLR model with 7 variables. However, we chose a fixed bandwidth of 1600 m so that this model could be compared with the other GWLR models.
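The weighting scheme and the local estimator of Eq. 3 can be illustrated with a minimal sketch. It assumes the standard bi-square form of Fotheringham et al. (2002), in which the ratio of distance to bandwidth is squared inside the bracket, and it implements only the weighted least squares expression of Eq. 3 for a single location i. A logistic model is usually fitted iteratively rather than in one closed-form step, and that is not shown. The function names, distances, and data are hypothetical; the 1600 m bandwidth is the fixed value chosen in this study.

```python
import numpy as np

def bisquare_weights(distances, bandwidth):
    """Bi-square kernel: [1 - (d/b)^2]^2 for d <= b, and 0 beyond the bandwidth."""
    d = np.asarray(distances, dtype=float)
    w = (1.0 - (d / bandwidth) ** 2) ** 2
    return np.where(d <= bandwidth, w, 0.0)

def local_wls(x, y, weights):
    """Eq. 3: beta_i = (X^T W_i X)^{-1} X^T W_i y, with W_i = diag(weights)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xtw = x.T * weights              # equivalent to X^T @ diag(weights)
    return np.linalg.solve(xtw @ x, xtw @ y)

# Hypothetical distances (m) from location i to five observations.
distances = np.array([0.0, 400.0, 800.0, 1200.0, 2000.0])
weights = bisquare_weights(distances, bandwidth=1600.0)

# Design matrix with an intercept column and one predictor; y is exactly linear.
x = np.column_stack([np.ones(5), [0.1, 0.3, 0.5, 0.7, 0.9]])
y = 2.0 + 3.0 * x[:, 1]
beta_i = local_wls(x, y, weights)
```

The observation 2000 m away lies beyond the bandwidth and receives zero weight, so it has no influence on the local estimate at i.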

All regression calculations were performed with the GWR4.0 freeware, interpolation and mapping were done in ArcGIS 10.2, and spatial statistics were computed in GeoDa.

Model comparison criteria

Six evaluation criteria, covering both the fit and the complexity of the models, are used to compare LR and GWLR. BIC/MDL is used to assess the degree of model complexity. A lower AICc value indicates a more efficient model. The model error is given by the deviance; the lower the value, the better the fit. Local percent deviance explained (pdev) is another goodness-of-fit measure, also known as a type of pseudo-R²; the higher the value, the better the model fits the data. Receiver operating characteristic (ROC) curves are also given, and the area under the ROC curve (AUC) is used to evaluate model accuracy. The AUC normally lies above 0.5 (random discrimination) and cannot exceed 1 (perfect separation of the classes) (Swets 1988). The global Moran’s I is used to measure the spatial distribution of the residual error: the closer the index is to 0, the more random the spatial distribution.
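Two of these criteria are simple enough to sketch directly. The AUC below is computed through its rank interpretation (the probability that a randomly chosen landslide unit receives a higher score than a randomly chosen non-landslide unit), and pdev is taken as one minus the ratio of model deviance to null deviance. Both definitions are standard but are assumptions about how the software used here computes them; the function names and the toy data are hypothetical.

```python
import numpy as np

def auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count one half."""
    labels = np.asarray(labels, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos = scores[labels]
    neg = scores[~labels]
    diff = pos[:, None] - neg[None, :]       # every positive against every negative
    correct = (diff > 0).sum() + 0.5 * (diff == 0).sum()
    return float(correct / (pos.size * neg.size))

def percent_deviance_explained(model_deviance, null_deviance):
    """Pseudo-R-squared: 1 - model deviance / null deviance."""
    return 1.0 - model_deviance / null_deviance

# Hypothetical slope units: 1 = landslide present, 0 = absent.
labels = [1, 1, 0, 0, 0]
scores = [0.8, 0.4, 0.5, 0.2, 0.1]
toy_auc = auc(labels, scores)                # 5 of 6 pairs ranked correctly

toy_pdev = percent_deviance_explained(60.0, 100.0)
```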

Results and discussion

Model comparison results

The statistics of the model parameters are summarized in Table 1. In the LR model, the regression coefficients are constant over the whole region, whereas in the GWLR model they differ for each slope unit (Table 1 and Fig. 6).

Table 1 Summary statistics for LR and GWLR regression coefficients
Fig. 6 Spatial variation of coefficients values from GWLR calculation

According to Table 1, bedding structure and lithology do not appear to be significant, owing to their small Z scores, yet they are very important factors in the study area. Should these two factors be removed? To answer this question, two further experiments were run, one with all factors except lithology and the other with all factors except bedding structure and lithology, and compared to see whether the factors could be removed. The comparison results are given in Table 2, and the ROC curves are shown in Fig. 7.

Table 2 Comparison diagnostics between LR and GWLR models with different number of factors
Fig. 7 ROC of LR and GWLR models with different number of factors

Table 2 shows that, as factors are removed, model complexity decreases slightly (BIC/MDL decreases gradually) for both the LR and GWLR models. For LR, however, almost nothing changes except BIC/MDL, neither the goodness of fit nor the spatial autocorrelation of the residuals. The GWLR model behaves differently: when the lithology factor is removed, its goodness of fit drops markedly and its residuals become less randomly distributed. The six evaluation criteria are described in the preceding section on model comparison criteria.

The comparison tells us two things. On the one hand, significance tests may weed out important factors; on the other hand, the GWLR model better reflects the importance of the lithology factor and helps protect important factors from being removed. Therefore, we decided not to remove these two factors.

Next, we compare the LR and GWLR models. Table 1 shows that in LR, lithology appears to have a slight positive influence on landslide occurrence, with a small positive regression coefficient (0.005), while all other factors have negative coefficients and thus negative effects on landslide formation. The situation changes in the GWLR model: lithology does not have a positive influence everywhere at the local scale, as its coefficients range from −0.533 to 0.695, negative in the northern and middle parts of the study region and positive in the eastern and western parts (Fig. 6). Likewise, the other factors, except elevation and MNDWI, do not always have negative effects on landslide formation. The hypothesis that elevation is negatively related to landslides is confirmed by both LR and GWLR. However, the GWLR results indicate a degree of spatial variation in the relationship between landslide susceptibility and the influencing factors in the study area, which seems more reasonable than the LR result. For example, the slope histogram examined before the regression analysis shows that slope cannot be negatively related to landslides everywhere, yet in LR slope has a negative effect (−0.401) over the whole study area.

Nevertheless, GWLR preserves the general trend of LR, as the medians of the GWLR regression coefficients are comparable with the corresponding LR coefficients (Table 1). In both models, elevation shows the strongest association with landslides, as its coefficient has the largest absolute value, followed by MNDWI and slope. By contrast, bedding structure and lithology have little influence on landslide occurrence (Table 1).

Furthermore, Table 2 shows that the GWLR model is less complex than LR, as it has smaller BIC/MDL values. GWLR also fits the data better, with lower AICc and deviance values and higher pdev and AUC values. Moreover, the GWLR residuals are more randomly distributed in space, as indicated by a lower residual Moran’s I.

Landslide susceptibility mapping

Landslide susceptibility maps were created from the two regression models (Fig. 8). The probability of landslide occurrence obtained from the regression calculation serves as a proxy for the landslide susceptibility index. For LR, the probability ranges from 0.0003 to 0.826, while for GWLR it ranges from 0 to 0.834. The natural breaks algorithm in ArcGIS was used to divide the probability maps into four susceptibility zones: very low, low, medium, and high. The breakpoints are 0.114, 0.273, and 0.470 for LR and 0.119, 0.291, and 0.477 for GWLR.
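Once the breakpoints are known, assigning each slope unit to a zone is a simple binning step, sketched below. The sketch applies the breakpoints reported above and does not reproduce the natural breaks algorithm itself. Treating each breakpoint as the inclusive upper bound of its zone is an assumption, and the function name and example probabilities are hypothetical.

```python
import numpy as np

ZONES = np.array(["very low", "low", "medium", "high"])
LR_BREAKS = [0.114, 0.273, 0.470]      # breakpoints reported for LR
GWLR_BREAKS = [0.119, 0.291, 0.477]    # breakpoints reported for GWLR

def classify(probabilities, breaks):
    """Map probabilities to zones; a value equal to a break falls in the lower zone."""
    p = np.asarray(probabilities, dtype=float)
    idx = np.digitize(p, breaks, right=True)
    return ZONES[idx]

# Hypothetical probabilities for four slope units.
example = [0.05, 0.20, 0.28, 0.80]
lr_zones = classify(example, LR_BREAKS).tolist()
gwlr_zones = classify(example, GWLR_BREAKS).tolist()
```

The third unit shows why the breakpoints matter: a probability of 0.28 falls in the medium zone under the LR breaks but in the low zone under the GWLR breaks.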

Fig. 8 Landslide susceptibility maps produced by LR and GWLR with 7 variables

Comparing the two maps (Fig. 8), the differences are subtle in some places and marked in others. In both prediction models, the eastern part of the region is less susceptible to landslides, whereas the middle and southwestern parts are highly susceptible. It should be taken into consideration that the greater the distance from actual landslide events, the greater the intrinsic uncertainty of the interpolated points. Figure 8 shows that the eastern area is depicted more uniformly by GWLR than by LR, since no landslides have been recorded there. In contrast, the northern part is more susceptible in the GWLR model, which seems more plausible because it is adjacent to the Yangtze River.

The percentage statistics in Table 3 show the differences between the two susceptibility maps quantitatively. Of all past landslides in the study region, 70 % lie in the high and medium susceptibility zones for LR, compared with 75 % for GWLR, and 5 % lie in the very low susceptibility zone for LR, compared with 2 % for GWLR. These findings suggest that the GWLR model is relatively more reliable.

Table 3 Area statistics for each susceptibility zone

Conclusion

This paper implemented two logistic regression models (LR and GWLR) for LSM in the Three Gorges Reservoir area, China. The summary statistics for the regression coefficients (Table 1) and the comparative results (Tables 2 and 3) show that GWLR has lower complexity and a better goodness of fit than LR. It also better reflects the importance of the lithology factor and helps protect important factors from being removed. Moreover, the GWLR residuals are more randomly distributed in space, as shown by the lower Moran’s I of the residual error. The results indicate that GWLR offers potential advantages in LSM and sheds new light on the spatial non-stationarity of the relationship between landslide susceptibility and its influencing factors. It is worth noting that the GWLR parameters are specific to the study area, but the methodology itself is general, since it can be reproduced in any geographical context.