Introduction

The Cordillera Blanca in Peru (Fig. 1) has always been prone to disastrous landslides and related events. Several disasters with thousands of victims, such as the two landslides at Nevado Huascarán (January 10, 1962 and May 31, 1970; Plafker et al. 1971; Vilímek et al. 2000; Evans et al. 2009) or the glacial lake outburst flood of Palcacocha (1941) (Carey 2010), prompted intensive hazard- and risk-related research in this area (Lliboutry 1975; Lliboutry et al. 1977; Vilímek et al. 2000, 2005; Hubbard et al. 2005; Klimeš and Vilímek 2011; Somos-Valenzuela and Mckinney 2011; Huggel et al. 2012; Klimeš 2012; Schneider et al. 2014; Klimeš et al. 2016). This research usually concentrates on specific slopes or regions. An exception is the work of Villacorta et al. (2012), who established a landslide susceptibility map covering the whole of Peru. Owing to the 100-m spatial resolution of their work, most of the Cordillera Blanca is simply classified as having high or very high landslide susceptibility. Such spatially coarse information is of little use to the local population and administration; thus, more detailed studies, possibly distinguishing different landslide types, are needed. The present work investigates the susceptibility of the Cordillera Blanca to shallow landslides using, amongst others, a physically based landslide model represented by an infinite slope stability calculation coupled with a less complex infiltration model (Pack et al. 1998). Attempts with more complex, three-dimensional models pointed out the necessity of detailed information on the spatial distribution of geotechnical parameters and local infiltration processes (Mergili et al. 2014).

Fig. 1 Study area and landslide inventories. Overview map from (Earth G 2015). The term SLI refers to Shallow Landslide Inventory, MLI to Marcará Landslide Inventory

Physically based models like SINMAP, SHALSTAB, or Transient Rainfall Infiltration and Grid-Based Regional Slope-Stability (TRIGRS) (Crosta and Frattini 2003; Meisina and Scarabelli 2007; Terhorst and Kreja 2009; Zizioli et al. 2013; Michel et al. 2014; Pradhan and Kim 2015; Sarkar et al. 2016; Thiebes et al. 2016) or empirical statistical models (Van Den Eeckhaut et al. 2006; Bai et al. 2011; Felicísimo et al. 2012; He et al. 2012; Park et al. 2013) have been applied in many different regions around the world. All the mentioned studies used additional information such as geological, land-use, or soil maps to better describe the occurrence conditions of the studied landslides. Data availability largely constrains the extent to which the models can be applied, as they would otherwise require extensive field work. Applications in mountainous regions are often even more challenging, as the landslide preparatory factors change abruptly in space (e.g. slope dip or soil characteristics due to the different altitudinal belts (de Castro Portes et al. 2016)) and available maps may lack important details of their spatial distribution pattern. Therefore, model parametrisation on a regional scale always introduces uncertainties that are very difficult to assess, let alone quantify, which limits the applicability of the final susceptibility maps (Guzzetti et al. 2006; Levermore et al. 2012).

To overcome this problem, this paper aims to elaborate how landslide susceptibility models perform in areas without any information beyond that provided by a digital elevation model (DEM), as elevation information is increasingly reliable and available even in high mountains due to the variety of remotely sensed data (Lacroix et al. 2015). To assess the influence of DEMs and to possibly improve the models, three different DEMs are used for this study. Two of them have a spatial resolution of 30 m (ASTER GDEM and SRTM), and the third one, TanDEM-X, has a spatial resolution of 12 m.

Comparisons of the effect of DEMs and models on landslide assessment and mapping have been made using different physically based and statistical models (Havenith et al. 2006; Yilmaz 2009; Zizioli et al. 2013; Pradhan and Kim 2015; Sarma et al. 2015) and DEMs. The comparison of the DEMs, though, mainly focused on the effect of the DEM’s spatial resolution (Claessens et al. 2005; Legorreta Paulin et al. 2010; Fuchs et al. 2014; Arnone et al. 2016; Schlögel et al. 2018). Comparisons of different DEMs of the same or similar resolution are scarce. Existing studies showed that comparing different models can significantly improve the results. No general tendency was found as to whether statistical or physically based models lead to better results. Concerning the DEMs, these studies indicated that a higher resolution does not automatically lead to better results, although such a finding may not be generally valid (Mergili et al. 2014). Most studies obtained the best model performance for a spatial resolution of around 10 or 20 m. A systematic comparison of different DEMs of similar resolution using statistical and physically based models, however, is still missing.

The main part of this paper is the comparison of the physically based model SINMAP with a statistically based model using logistic regression (LRM). As a third and a fourth model, two different slope models were established in order to evaluate the added value of the parameters considered additionally by SINMAP and LRM: one uses logistic regression with the slope angle as the only independent parameter, and the other uses the landslide density per slope class. These four models are applied to the three DEMs. The model runs are evaluated using two different landslide inventories. One of the inventories includes shallow landslides distributed over the whole study area; the other is restricted to a much smaller region (see Fig. 1). These two inventories are used to avoid model uncertainties caused by the highly variable conditions of the large study area. These efforts should answer the following questions:

  • How do regional-scale landslide susceptibility models perform in areas with highly variable morphology and soil characteristics that are typical for high mountain regions?

  • How much of the performance of a model can be explained by considering only the slope angle?

  • What is the influence of the different DEMs used on the performance of the models?

Study area

The Cordillera Blanca is situated in Áncash, Peru. Large parts of it are considered in this study (see Fig. 1). The mountain range is bounded by the Rio Santa in the west. This river flows through Huaraz at around 3000 m above sea level (m a.s.l.) and descends to 1400 m a.s.l. in the north of the study area. There are several peaks above 6000 m a.s.l., including Nevado Huascarán, the highest mountain of Peru at 6768 m a.s.l. (ÖAV 2006). Therefore, large parts of the study area are glacierised or consist of bare rock (1381 km²). The remaining area (2861 km²) consists of many steep slopes with an average inclination of around 23°. The large elevation range leads to the above-mentioned variability of the soils. Some soils reach thicknesses of more than 2 m, whereas at the higher elevations, there are no soils at all. Besides the soil thickness, the soil type varies considerably as well, ranging from Folic or Haplic Umbrisols to Haplic Leptosols, Haplic Regosols, or Endogleyic Fluvisols (de Castro Portes et al. 2016).

The climate is dominated by a dry and a wet season. During the wet season from October to March, about 400 to 800 mm of precipitation is recorded, generally increasing with elevation, while during the dry season from April to September, only 100 to 200 mm of precipitation is observed. This combination of steep topography, extreme precipitation, and other factors, such as earthquakes, led to several landslides in the past (see Fig. 1). Furthermore, this region is densely populated. The major towns of Huaraz, Yungay, Caraz, and Carhuaz alone, all located in the Santa valley at the western foot of the Cordillera Blanca, together have around 300,000 inhabitants (Instituto Nacional de Estadística e Informática 2015). These conditions make it imperative to address landslide susceptibility zoning as a first step toward increasing the landslide resilience of the local population.

Data and methods

Landslide inventories

High-resolution optical data like aerial photography or satellite images have proven to be a useful tool for establishing landslide inventories, in particular because such data are increasingly becoming freely available in a georeferenced format (He et al. 2012; van Westen et al. 2012; Zizioli et al. 2013; Kritikos and Davies 2014; Steger et al. 2015; Alejandrino et al. 2016; Pradhan and Mezaal 2017). Google Earth, for example, has been widely used for this purpose (England 2011; Guzzetti et al. 2012; Corominas et al. 2014; Fuchs et al. 2014; Posner and Georgakakos 2015; Sarkar et al. 2016). Therefore, the Shallow Landslide Inventory (SLI) was based on different Google Earth images recorded in August 2013, July 2014, April 2016, May 2016, and July 2017.

This landslide inventory is restricted to shallow landslides (less than 2 m deep) as defined by Sidle and Ochiai (2006), which corresponds to the kind of landslides modelled by SINMAP (Pack et al. 1998). The inventory covers the entire study area. We looked for one or several of the following recognition features proposed by Rowbotham and Dudycha (1998): (i) disrupted vegetation patterns, (ii) scars, or (iii) obviously displaced blocks of unconsolidated material. Each landslide was represented by a point located in its uppermost part. The resulting inventory was then split into a calibration set for the LRM (about 75% of the landslides) and a validation set (25% of the landslides) for all models. This is a slightly more balanced ratio than the 80/20% used by Bai et al. (2011) and Van Den Eeckhaut et al. (2006). By doing so, we ensured that enough points remained for the validation. Hence, of the 254 landslides which were found (see Fig. 1), 196 were used to calibrate the statistical models and 58 were used as a validation set. The calibration set was completed by 798 non-landslide points. For data in which non-events are much more frequent than events, as is the case for landslide occurrence, it is recommended to reflect this in the calibration set as well, with up to five times more non-event points (King and Zeng 2001; Van Den Eeckhaut et al. 2006; Bai et al. 2011). To generate the non-landslide points, each of the mapped landslides of the inventory was subtracted from the study area using a 5-m buffer zone around its origin point. Within the resulting polygon, a random point pattern of 798 points was generated. The landslides of the Marcará Landslide Inventory (see later) are not included in this inventory, so that the two inventories are mutually exclusive.
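The split of the inventory and the generation of non-landslide points described above can be illustrated with a minimal Python sketch. It makes several simplifying assumptions that are not part of the study: the study area is treated as a rectangle in projected metre coordinates (the real area is an irregular polygon), the function names `split_inventory` and `sample_non_landslide_points` are hypothetical, and the coordinates are invented for illustration only.

```python
import math
import random


def split_inventory(landslides, calibration_share=0.75, seed=0):
    """Randomly split landslide points into a calibration and a validation set."""
    rng = random.Random(seed)
    shuffled = list(landslides)
    rng.shuffle(shuffled)
    n_cal = round(calibration_share * len(shuffled))
    return shuffled[:n_cal], shuffled[n_cal:]


def sample_non_landslide_points(landslides, bounds, n_points, buffer_m=5.0, seed=0):
    """Draw random points in a rectangular extent (assumption: rectangle instead
    of the real study-area polygon), rejecting every candidate that lies within
    buffer_m of a mapped landslide origin point."""
    rng = random.Random(seed)
    xmin, ymin, xmax, ymax = bounds
    points = []
    while len(points) < n_points:
        x = rng.uniform(xmin, xmax)
        y = rng.uniform(ymin, ymax)
        if all(math.hypot(x - lx, y - ly) > buffer_m for lx, ly in landslides):
            points.append((x, y))
    return points


# Invented coordinates (metres), for illustration only.
landslides = [(100.0, 100.0), (250.0, 400.0), (800.0, 650.0), (900.0, 120.0)]
calibration, validation = split_inventory(landslides)
non_events = sample_non_landslide_points(landslides, (0.0, 0.0, 1000.0, 1000.0), 16)
```

In a GIS workflow, the rectangle would be replaced by the study-area polygon minus the buffered landslide points; the rejection step plays the role of that subtraction here.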

The second inventory used for the evaluation was the Marcará Landslide Inventory (MLI). This is an existing landslide inventory established for an ongoing study in the region around Carhuaz and Marcará. It was prepared using the high-resolution optical data available on Google Earth, in particular the images from July 2016, and was then validated with extensive fieldwork. It includes all landslide types which could be identified in the field. They were classified according to their activity (judged only on visual characteristics like the landforms and vegetation patterns), shape, and depth, the latter divided into shallow (less than 2 m according to Sidle and Ochiai (2006), as also used for the SLI), medium (2 to 10 m), and deep (more than 10 m). Of all landslides included in this inventory, only the shallow ones were selected for this work. Owing to the field check of this inventory, we consider it to be more complete than the Shallow Landslide Inventory. The purpose of this second inventory is to provide a smaller but more homogeneous evaluation area. None of the 77 landslides of the MLI (see Fig. 1) were used for the calibration of the models, but they were used for their evaluation.

Digital elevation models

The four landslide susceptibility models were applied to large parts of the Cordillera Blanca (see Fig. 1) using three different DEMs: the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global DEM (ASTER GDEM) (NASA LP DAAC 2011), the Shuttle Radar Topography Mission (SRTM) DEM (version 4) (USGS 2015), and the TanDEM-X DEM (TDX). The last DEM was produced in-house from TanDEM-X acquisitions performed along ascending (24.01.2013) and descending (01.10.2013) orbits with a posting of 0.0001 decimal degrees, corresponding to about 10 m. TanDEM-X DEMs produced with the same methodology in the past over Mount Etna in Italy (Wegmüller et al. 2014) and the Chomolhari region in Bhutan (Ambrosi et al. 2018) were compared to ground control points measured by GPS; the mean elevation differences were 0.6 m and 3.6 m, respectively, with standard deviations of 4.3 m and 2.8 m, respectively. The first two DEMs have a spatial resolution of 30 m (Farr et al. 2007; Tachikawa et al. 2011), and the third one was resampled to the 12-m resolution provided by TDX (Deutsches Zentrum für Luft- und Raumfahrt e.V. 2009).

To obtain comparable results, the DEMs were co-registered using the method developed by Nuth and Kääb (2011). It was applied with the SRTM DEM as master and the ASTER GDEM as slave, as the SRTM and TDX DEMs used were already co-registered. The study area was extracted from these DEMs for the modelling, and the glacier and rock mask was applied only after the model run, so as not to remove parts of the flow accumulation areas in advance.

SINMAP model

Stability Index Mapping (SINMAP) is a physically based slope stability model developed and described by Pack et al. (1998). It calculates a stability index (SI) for each grid cell of the map. The SI is based on a dimensionless form of the infinite slope stability model’s factor of safety (FS). For its calculation, some parameters can be derived from the DEM. These are the slope angle, flow direction, and the specific catchment area. The remaining parameters (e.g. cohesion) need to be set manually using available geotechnical data. As no further information on the soil parameters was available, and the variable soil characteristics can hardly be captured anyway, the default values proposed by the authors (Pack et al. 1998) were used here. Hence, the parameters were set as follows: transmissivity/effective recharge = 2000 to 3000 m, dimensionless cohesion = 0 to 0.25, internal friction angle = 30 to 45°, and soil density = 2000 kg/m³. The resulting map was then classified into six categories from “defended slope zone” (SI < 0) to “stable slope zone” (SI > 1.5, Pack et al. 1998). The application of these arbitrary parameters strongly limits the use of the model results, which should be considered only as a general susceptibility indicator and shall not be viewed as an actual FS.
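The role of the parameter ranges listed above can be illustrated with a minimal Python sketch of the dimensionless infinite slope factor of safety. It assumes the formulation commonly given for SINMAP, FS = [C + cos θ (1 − w r) tan φ] / sin θ, with relative wetness w = min(a / ((T/R) sin θ), 1) and density ratio r = ρw/ρs; the function name, the water density of 1000 kg/m³, and the example cell (30° slope, specific catchment area of 10 m) are assumptions for illustration. The sketch evaluates only the two bounding parameter combinations and does not reproduce the probabilistic definition of the SI.

```python
import math


def factor_of_safety(slope_deg, catchment_area_m, t_over_r_m, cohesion, phi_deg,
                     soil_density=2000.0, water_density=1000.0):
    """Dimensionless infinite slope factor of safety (assumed SINMAP form).

    slope_deg must be greater than 0; catchment_area_m is the specific
    catchment area (m) and t_over_r_m the transmissivity/recharge ratio (m).
    """
    theta = math.radians(slope_deg)
    # Relative wetness, capped at 1 (fully saturated soil column).
    wetness = min(catchment_area_m / (t_over_r_m * math.sin(theta)), 1.0)
    density_ratio = water_density / soil_density
    friction = math.cos(theta) * (1.0 - wetness * density_ratio) * math.tan(math.radians(phi_deg))
    return (cohesion + friction) / math.sin(theta)


# Hypothetical cell: 30 degree slope, specific catchment area of 10 m.
# Least favourable end of the default ranges (lowest T/R, no cohesion, phi = 30 degrees).
fs_min = factor_of_safety(30.0, 10.0, 2000.0, 0.0, 30.0)
# Most favourable end (highest T/R, cohesion 0.25, phi = 45 degrees).
fs_max = factor_of_safety(30.0, 10.0, 3000.0, 0.25, 45.0)
```

For this hypothetical cell, the two bounds fall on either side of FS = 1, which is the situation in which the outcome depends on where the true parameters lie within the assumed ranges.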

Multiple logistic regression model

Regressions are used to model a dependent variable with one or more independent or explanatory variables. A linear regression model assumes a straight-line relationship between the independent and the dependent variable, apart from an error term (Ross 2010). For many problems in Earth sciences, this is a reasonable model (Yilmaz 2009).

For binary response data, though, where only the presence or absence of a phenomenon is of interest, logistic regressions are more often used (Lee and Sambath 2006). In landslide susceptibility modelling in particular, this is a frequently used approach (Lee 2004; Ayalew and Yamagishi 2005; Lee and Sambath 2006; Van Den Eeckhaut et al. 2006; Yilmaz 2009; Bai et al. 2011; Felicísimo et al. 2012; Devkota et al. 2013; Park et al. 2013; Kavzoglu et al. 2014). The aim of a logistic regression is to model the probability of the occurrence of an event based on some independent variables (Hilbe 2011). Popular independent variables for landslide modelling are listed in Table 1.

Table 1 Common independent parameters for logistic regression modelling of landslide susceptibilities

For this study, only information which can be derived from a DEM is used: elevation, slope, aspect, curvature, flow accumulation, and distance to rivers (derived from flow accumulation).

To perform a logistic regression with these variables, a calibration dataset is required. This dataset consists of event (1) as well as non-event points (0). The calibration set was analysed using the glm function of the stats package in R. The considered independent parameters were combined in different ways to find the lowest AIC (Akaike information criterion) and, hence, the best model (Hilbe 2011). To obtain the lowest AIC, a regression model using all parameters was compared to different regression models with fewer explanatory parameters. The resulting parameters, which have a significant influence on the model and lead to a lower AIC, were then used for the logistic regression using Eqs. 1 and 2:

$$ P\left(Y=1\right)={\mu}_{\mathrm{i}}=\frac{1}{1+{e}^{-z}} $$
(1)
$$ z={x}_{\mathrm{i}}\beta ={\beta}_0+{x}_1{\beta}_1+\dots +{x}_n{\beta}_n $$
(2)

where μi = probability that a landslide occurs, β = calculated weights for the explanatory variables, xi = explanatory variables, and n = number of explanatory parameters (Hilbe 2011).
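Equations 1 and 2, together with the AIC used for model selection, can be illustrated with a minimal Python sketch. The weights and slope values below are invented for illustration and are not the fitted values of this study; the function names are hypothetical, and the AIC is written in its usual form 2k − 2 ln L for a Bernoulli likelihood, where k is the number of estimated weights. In the study itself, the weights were estimated with the glm function in R; the sketch only evaluates given weights.

```python
import math


def landslide_probability(x, beta0, beta):
    """Eqs. 1 and 2: logistic transform of the linear predictor z."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return 1.0 / (1.0 + math.exp(-z))


def aic(points, labels, beta0, beta):
    """AIC = 2k - 2 ln L for given weights (Bernoulli log-likelihood)."""
    log_lik = 0.0
    for x, y in zip(points, labels):
        p = landslide_probability(x, beta0, beta)
        log_lik += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    k = 1 + len(beta)  # intercept plus one weight per explanatory variable
    return 2 * k - 2 * log_lik


# Invented example: slope angle (degrees) as the only explanatory variable.
points = [(10.0,), (20.0,), (35.0,), (42.0,)]
labels = [0, 0, 1, 1]
aic_null_weights = aic(points, labels, 0.0, (0.0,))      # all probabilities 0.5
aic_slope_weights = aic(points, labels, -5.0, (0.18,))   # hypothetical weights
```

With the hypothetical slope weight, the probabilities separate the two groups better than the constant 0.5, so the AIC is lower; the same comparison between candidate parameter sets underlies the model selection described above.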

The final map of resulting landslide susceptibilities was classified into five classes using the natural breaks method.

Slope model

In many slope stability models, including the ones used in this study, the slope angle plays a crucial role in determining the stability of slopes (Wu and Sidle 1995; Dietrich and Montgomery 1998; Pack et al. 1998; Baum et al. 2002; Lee et al. 2002; van Beek et al. 2002; Haneberg 2004; GEO-SLOPE International Ltd. 2012; Kavzoglu et al. 2014). Hence, as it is one of the main factors used, a second logistic regression model was established using the slope angle as the only explanatory variable. It was trained using the same training set as the LRM above. This simplest model was compared to SINMAP and LRM to see how much the increasing number of explanatory parameters improved the model performance. The classification process was identical to that of the LRM. Since this logistic regression slope model is very similar to the LRM, a further simplification was made using bivariate statistics to define the relationship between slope angle and landslide occurrence. This second slope model is based on the failure rate method described in Jäger and Wieczorek (1994), but uses only the landslide densities. The slope maps were classified into classes of 5° (0–5°, 5–10°, etc.) up to the final class, which includes all areas with slope angles > 50°. The landslide density was calculated by dividing the number of landslides of the calibration set of the Shallow Landslide Inventory within each slope class by the total area of that slope class.
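The landslide density per slope class can be illustrated with a minimal Python sketch. The handling of class boundaries (a slope of exactly 5° falls into the 5–10° class, and 50° into the final class), the function names, and the example values are assumptions made for illustration; a cell area of 0.0009 km² corresponds to a 30-m grid cell.

```python
def slope_class(slope_deg, width=5.0, last_lower=50.0):
    """Index of the slope class: 0 for 0-5 degrees, 1 for 5-10 degrees, ...,
    with one final class for everything at or above last_lower (assumed boundary rule)."""
    if slope_deg >= last_lower:
        return int(last_lower // width)
    return int(slope_deg // width)


def landslide_density(landslide_slopes, cell_slopes, cell_area_km2):
    """Number of landslides per km2 for every slope class."""
    n_classes = slope_class(90.0) + 1
    counts = [0] * n_classes
    areas = [0.0] * n_classes
    for s in landslide_slopes:
        counts[slope_class(s)] += 1
    for s in cell_slopes:
        areas[slope_class(s)] += cell_area_km2
    # Classes without any area get a density of 0 instead of a division by zero.
    return [c / a if a > 0 else 0.0 for c, a in zip(counts, areas)]


# Invented values: slope at landslide points and slope of all grid cells (degrees).
landslide_slopes = [33.0, 34.0, 55.0]
cell_slopes = [2.0, 7.0, 7.0, 12.0, 33.0, 33.0, 33.0, 55.0]
densities = landslide_density(landslide_slopes, cell_slopes, 0.0009)
```

The resulting list can be used directly as a lookup table: each grid cell receives the density value of its slope class as its susceptibility value.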

Comparison

The model runs use the DEMs and their derivatives as input parameters. Potential differences in the performance of the model runs could, therefore, relate to the variability of these characteristics. Hence, these variabilities were compared between the DEMs and between the two study areas, MLI and SLI, using a t test for point clouds within the two areas. For all the points, the deviation from the mean was tested to check the variabilities of the considered parameters. A 1% significance level was used.

Since the SINMAP model considers only the slope stability of soils and weathered slope material, areas without any sediment cover, like glaciers and rocks, were excluded from the analysis. This was done by establishing a glacier and rock mask. First, areas were selected for slope angles steeper than 50° and elevations higher than 5200 m a.s.l. The resulting polygons were then manually completed and adjusted using Google Earth imagery.

In total, there are 12 different realisations: four models, each one established on the three DEMs. All these model runs were evaluated with the validation sets extracted from the two landslide inventories. The validation was done using the Receiver Operating Characteristics (ROC) and the related area under curve (AUC). These methods are useful tools for comparing models which do not have the same scales. A point dataset with a similar number of non-landslide cells and landslide cells distributed within the same area is required. The points of the non-landslide cells were again generated randomly within the area of the inventory. The value of the model at the given points was then evaluated by calculating the true positive rate and the false positive rate for different thresholds of the model. The resulting values are then plotted against each other, with the true positive rate on the y-axis and the false positive rate on the x-axis. The points on the plot form a curve, and the bigger the area below this curve, the better the model performance. The AUC expresses the ratio of the area under the curve to the total area of the plot (Fawcett 2006).
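The ROC/AUC evaluation described above can be illustrated with a minimal Python sketch. It sweeps the threshold over the observed model values and integrates the curve with the trapezoidal rule; the function names and the example values are invented for illustration. The sketch assumes that higher values mean higher susceptibility, so values such as the SI, where low values indicate instability, would have to be negated first.

```python
def roc_points(scores, labels):
    """ROC curve as a list of (false positive rate, true positive rate) points.

    labels: 1 for landslide points, 0 for non-landslide points.
    Points with identical scores are added to the curve in one step.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def area_under_curve(points):
    """Trapezoidal area below the ROC curve (total plot area is 1)."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented model values at three landslide and three non-landslide points.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc_value = area_under_curve(roc_points(scores, labels))
```

An AUC of 0.5 corresponds to a random prediction and an AUC of 1 to a perfect separation of landslide and non-landslide points.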

Furthermore, we compared the susceptibility classes. This is similar to the method used by Michel et al. (2014) to evaluate their results. The percentage of landslides which occurred in the least susceptible classes was compared to the percentage of landslides occurring in the most susceptible classes. Additionally, the classes were compared spatially to see whether the same regions were modelled as most/least susceptible.
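The class comparison can be illustrated with a minimal Python sketch that computes the percentage of landslides and the percentage of the area falling into each susceptibility class. The class numbering (1 = least susceptible, 5 = most susceptible), the function name, and the example values are assumptions for illustration only.

```python
from collections import Counter


def class_shares(class_ids):
    """Percentage of items (landslide points or grid cells) per class."""
    counts = Counter(class_ids)
    total = len(class_ids)
    return {c: 100.0 * n / total for c, n in counts.items()}


# Invented classes: 1 = least susceptible, 5 = most susceptible.
landslide_classes = [5, 5, 4, 4, 4, 3, 1, 5]
cell_classes = [1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 1, 2, 3, 3, 2]

landslide_share = class_shares(landslide_classes)
area_share = class_shares(cell_classes)

# Share of landslides and of the area in the two most susceptible classes.
top_two_landslides = landslide_share.get(4, 0.0) + landslide_share.get(5, 0.0)
top_two_area = area_share.get(4, 0.0) + area_share.get(5, 0.0)
```

Reading the landslide share together with the area share matters because a class that covers a large part of the area will capture many landslides regardless of the quality of the model.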

Results

First, the different DEMs and the considered study areas were compared. A total of 385 points distributed within the small area (MLI) were compared to the same number of points distributed over the large area (SLI). A t test revealed no significant difference between the DEMs when considering the whole study area. The only significant difference was found between the ASTER GDEM and the TDX DEM within the MLI area for the slope angle (p value = 0.00083).

The comparison of the variabilities between the two study areas, MLI and SLI, showed statistically significant results for the elevation within all DEMs (see Table 2). Furthermore, the ASTER GDEM had statistically significant results for the variability of the slope angle and the curvature between the two study areas. For all these statistically significant results, the one-sided t test showed that the assumption that the variabilities within the MLI are bigger can be rejected.

Table 2 p values of the t tests comparing the morphologic variability of the MLI and the SLI. Both-sided just indicates differences of the means, whereas MLI less and MLI greater test the assumption that the mean of the MLI is lower or higher. The significant results (< 1%) are italicized

The results of the SINMAP model using the different DEMs contained no SI values below 0. Hence, the SINMAP class “defended slope zone” was not used. For the LRM, the lowest AIC was achieved with different parameters for each DEM. The explanatory parameters used and their weights for the LRM and the slope model are summarised in Table 3. The maps were classified using the threshold values shown in Table 4, except for the landslide density model, which did not need further classification. The values used for the classes of this last model are provided in Fig. 2. The resulting susceptibility maps for the SRTM DEM are shown in Fig. 3.

Table 3 Summary of the weights used for the logistic regression models. The first columns are the parameters of the LRM, the last two are the ones of the slope model. “–” means that this parameter was not used for the considered DEM. The parameters aspect and flow accumulation had no significant impact on the model for any DEM and are, therefore, not in the table
Table 4 List of the used threshold values for the classification. The class names are the ones proposed for the SINMAP model (Pack et al. 1998), starting with the least susceptible class
Fig. 2 Used values for the classes of the landslide density model

Fig. 3 Results for three different landslide susceptibility models using SRTM data

In a next step, the evaluation using the AUC/ROC method was performed for all realisations (see Table 5, Fig. 4). The physically based SINMAP approach seems to be problematic for modelling slope stabilities over the entire study area, as it does not perform well using the SLI (see Table 5). In particular, the result obtained with the ASTER GDEM is close to a random prediction of landslides. This could be due to the default geotechnical parameters taken from the literature. Therefore, the SINMAP susceptibility maps can only be used as a general reference and do not indicate the real stability conditions of the study area. Within the smaller area (MLI), on the other hand, it performs much better. The statistical models (LRM, slope model, and landslide density model) also obtain higher AUC values for the smaller area and have their lowest AUC values over the entire study area using the ASTER GDEM. However, all statistical models obtained better results than SINMAP considering the entire study area. The LRM generally has the best performance. It obtains AUC values between 0.684 and 0.759 over the large study area. Within the smaller study area, its AUC values even vary from 0.768 to 0.799. Similar values were obtained for the slope model, with AUC values between 0.672 and 0.742 for the SLI and 0.767 to 0.783 for the MLI. The two slope models (the logistic regression and the density model) performed very similarly, except for the TDX, where the landslide density model performed worse over the SLI area but better over the MLI area. The landslide density model also reached the highest AUC of all applied models for the MLI area (TDX). The further analysis focuses on the first slope model using logistic regression, as it can be compared more easily to the other models, since it is established similarly to the LRM. For all models, it is noticeable that the AUC values of the different DEMs are much closer to each other for the evaluation with the MLI.

Table 5 Results of the AUC calculations for all the model runs
Fig. 4 ROC plot for the results of the regression model. The evaluation using the Shallow Landslide Inventory (SLI) is displayed as a solid line, the one of the Marcará Landslide Inventory (MLI) as dotted line. Cf. Table 5

The portion of the landslides used for the model validation (58 and 77 cases for the SLI and MLI, respectively) within each susceptibility class can be considered as a measure of the success of the model to predict the distribution of landslides unknown during its preparation. The portion of landslides captured by the most susceptible classes shows correct spatial prediction, while landslides which fall into the least susceptible class may be considered as an error of the models. The most susceptible class of all models has a rather small extent (e.g. up to 7% of the pixels of the entire study area). The landslides occurring in this class are less than a quarter for all the model runs. A much higher ratio of correctly modelled landslides can be obtained by considering the two most susceptible classes together. These two classes together extend, depending on the used model and DEM, over 18.29 to 42.76% of the whole area. The percentage of landslides included in these most susceptible classes varies from 43.1 to 77.6%.

Discussion

A possible explanation for the different model performances using the different DEMs and landslide inventories could be the variability of the DEM-derived characteristics (see Table 2). These results show that the variability of the elevation is significantly different between the MLI and SLI for all DEMs. The lower variability of the DEMs within the area of the MLI might improve the results of the models. The ASTER GDEM also has higher variabilities within the whole study area (SLI) for two more characteristics, namely the slope angle and the curvature. This might be a reason why the models using the ASTER GDEM perform even worse over the whole study area than those using the other two DEMs. It can be doubted, though, that the variability of the DEM-derived characteristics is the main explanation for the model performance. There are no significant differences between the DEMs considering the area of the SLI. Still, there are differences in the AUC of around 0.07 between the different DEMs. The study area of the MLI, on the other hand, does have significant differences between the variability of the TDX and the ASTER GDEM, but there, the AUC values of these two DEMs are within a range of 0.02 for all models. Hence, there must be factors other than the variability of the DEM-derived characteristics that influence the model performance. One of these factors could be the completeness of the landslide inventories used, as the MLI is considered to be more complete due to the field check of the satellite image interpretation results.

The distribution of landslides within the susceptibility classes shows at first sight slightly better results for SINMAP considering the SLI (see Table 6) than when using the ROC. It includes by far the highest percentage of landslides occurring in the least stable classes for the whole region. It also has, though, the highest number of landslides occurring in the most stable class. Looking at the extents of these classes, this is not surprising. SINMAP considers around 40% of the area as most susceptible and 30% as least susceptible. The LRM and the slope model consider no more than 28.6% as most susceptible and no more than 21% as least susceptible, respectively. A similar pattern is visible within the MLI. SINMAP classifies regions where up to 66.2% of the landslides occurred as unstable. On the other hand, depending on the DEM, 9.1–16.9% of the landslides occurred in the most stable class. Both these values are higher than the ones of the LRM and the slope model. A spatial comparison of the consistency of these classes is provided in Fig. 5. It shows that similar slopes are considered as most or least susceptible by the models. The class defined as contradiction (see Fig. 5), which includes regions which are considered as least susceptible by one and most susceptible by another model, has a negligible extent (just one pixel of the whole study area). This high spatial agreement is in contrast with findings of other researchers (Sterlacchini et al. 2011) who noted low spatial agreement of susceptibility maps prepared using different combinations of input factors, while maintaining very similar prediction rate. The high ROC and spatial consistency of our results may be attributed to the very similar and limited number of input parameters used for the modelling.

Table 6 Summary of the comparison of the different susceptibility classes. The classes were obtained using natural breaks for the LRM and the slope model. The SINMAP classes are the ones proposed by the authors (Pack et al. 1998). The two most susceptible classes refer to upper and lower thresholds instable, the least susceptible to stable slope zone
Fig. 5 The study area showing the spatial agreement between the two most and least susceptible classes calculated by the different models using the SRTM DEM. The class “contradiction” refers to areas which are considered as stable by one and unstable by another model

The poor performance of the SINMAP model may also be explained by the morphological characteristics of the study area. Previous works (Klimeš 2008; Thiebes et al. 2016) suggest that the model performs better in regions with contrasting slopes, where the distribution of landslide source areas does not follow the slope distribution within the study area, for instance, where the highest landslide occurrence is related to a less frequent slope class. The high variability of the landslide occurrence conditions considered by the SINMAP model is illustrated in Fig. 6. The studied landslides occurred over a wide range of slopes and flow accumulation values. This contrasts with a different study area in the Czech Republic (represented by the blue line in Fig. 6), where the slope/flow accumulation variability is much lower and where the SINMAP model also performed better. Nevertheless, the AUC values obtained for the SINMAP model are comparable with those of other studies, where AUC values between 0.647 and 0.703 were reported (Mergili et al. 2014).

Fig. 6 Comparison of the morphologic characteristics of ASTER GDEM and TDX over the smaller area of the MLI for random points. The blue line indicates the maximal extent of the morphologic variability for a study area in the Czech Republic where SINMAP was applied as well (all points are below this line)

The two evaluation techniques are generally in agreement with each other. The combinations with high AUC values using the SLI (LRM/SRTM, LRM/TDX, and slope model/SRTM) have high percentages of landslides in the most susceptible class (56.9–63.8%) and low percentages in the least susceptible class (0–1.7%). For the LRM/ASTER GDEM combination, though, the two evaluation techniques seem to contradict each other for the MLI. It received the highest AUC value (0.799) but classifies just 13% of the landslides correctly (in the most susceptible classes), while wrongly placing 5.2% of the landslides in regions considered as least susceptible. An adjustment of the class limits could possibly improve its performance there, as just a small area is considered as most susceptible (3.6%) and a large area is considered as least susceptible (38.8%).
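
The apparent contradiction between a high AUC and a low share of landslides in the most susceptible class can be illustrated with a small sketch. The AUC depends only on how landslide and non-landslide cells are ranked, whereas the class share depends on where the class limit is placed. The scores and thresholds below are made up, and the function names auc and share_above are assumptions for this example.

```python
def auc(scores_landslide, scores_stable):
    """Area under the ROC curve, computed as the probability that a
    landslide cell scores higher than a non-landslide cell
    (ties count one half)."""
    wins = 0.0
    for s_pos in scores_landslide:
        for s_neg in scores_stable:
            if s_pos > s_neg:
                wins += 1.0
            elif s_pos == s_neg:
                wins += 0.5
    return wins / (len(scores_landslide) * len(scores_stable))


def share_above(scores, threshold):
    """Percentage of scores at or above a class limit."""
    return 100.0 * sum(s >= threshold for s in scores) / len(scores)


# Made-up susceptibility scores for landslide and non-landslide cells.
landslide = [0.55, 0.60, 0.62, 0.70, 0.95]
stable = [0.10, 0.20, 0.30, 0.58, 0.65]

auc_value = auc(landslide, stable)
share_strict = share_above(landslide, 0.9)  # narrow "most susceptible" class
share_loose = share_above(landslide, 0.6)   # wider "most susceptible" class

print(f"AUC = {auc_value:.2f}")
print(f"landslides in top class, limit 0.9: {share_strict:.0f}%")
print(f"landslides in top class, limit 0.6: {share_loose:.0f}%")
```

In this toy case the ranking, and therefore the AUC, is identical for both class limits, while the share of landslides in the top class changes from 20% to 80%.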

The performance of the slope models comes quite close to that of the LRM. For the combination of TDX and MLI, they even performed slightly better. For all the other combinations, they had only slightly lower AUC values. These results show that the slope angle is the most important of the considered variables for explaining the occurrence of shallow landslides. According to the landslide density slope model, the susceptibility increases with slope angle until it reaches a peak somewhere between 40 and 45° (see Fig. 2). Additional DEM-derived parameters, used as explanatory variables only for the LRM, improved the model performance only slightly. This confirms previous findings by Glade and Crozier (2005) that simply increasing the number of preparatory factors used for susceptibility modelling does not necessarily improve the model performance. These results also provide an argument for applying simple susceptibility models in regions where representative and reliable input parameters are insufficiently available and where in-depth studies are missing so far. This would avoid introducing additional uncertainties into the model while providing acceptable susceptibility zoning results.
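
A landslide density per slope class, as used conceptually by the slope model, can be sketched as the ratio of landslide cells to all cells within each slope class. The 5° class width, the toy slope values, and the function name landslide_density_by_slope are assumptions for this illustration and are not taken from the study.

```python
def landslide_density_by_slope(slopes, is_landslide, bin_width=5.0):
    """Return {lower edge of slope class: landslide cells / all cells}.

    slopes       -- slope angle in degrees for each cell
    is_landslide -- 1 (or True) if the cell belongs to a mapped landslide
    bin_width    -- width of the slope classes in degrees (assumed value)
    """
    total = {}
    slides = {}
    for slope, flag in zip(slopes, is_landslide):
        lower = int(slope // bin_width) * bin_width
        total[lower] = total.get(lower, 0) + 1
        slides[lower] = slides.get(lower, 0) + (1 if flag else 0)
    return {lower: slides[lower] / total[lower] for lower in sorted(total)}


# Made-up cells: slope angle in degrees and landslide flag.
slopes = [12, 14, 27, 28, 29, 41, 42, 43, 44, 52, 53]
flags = [0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1]

density = landslide_density_by_slope(slopes, flags)
peak_class = max(density, key=density.get)

for lower, value in density.items():
    print(f"{lower:4.0f}-{lower + 5:.0f} deg: density {value:.2f}")
print(f"peak density in class starting at {peak_class:.0f} deg")
```

The density of each class can then be assigned to all cells in that class as a susceptibility score, which requires nothing beyond a DEM-derived slope raster and a landslide inventory.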

The spatial resolution of the DEMs does not seem to strongly affect the modelling results. The performance of the SRTM with its spatial resolution of 30 m is similar to or even slightly better than the performance of the TDX, which has a spatial resolution of 12 m. This may be explained by the fact that none of the DEM resolutions is capable of capturing important local topographic variations; thus, no qualitative improvement of the results was detected. This rather low resolution of the main model input parameter restricts the applicability of the results to the regional scale. Slope-scale use should be very carefully considered and combined with expert knowledge of the field stability conditions, while site-specific applications have to be avoided completely.

Conclusion

Landslide susceptibility has been modelled for a high-mountain study area within the Cordillera Blanca, Peru. Three different model approaches and three different DEMs were evaluated, using a regional and a local landslide inventory. An AUC value of 0.759 was obtained over the whole area using a logistic regression applied to the SRTM DEM. This was just slightly better than the AUC of a similar model considering only slope angle as model parameter (0.742) or a model based on landslide densities of slope angle classes (0.74). This is a remarkable performance of the models considering that the study area shows high topographic variability, with elevations between 1400 and more than 5000 m a.s.l. (excluding glaciers and bare rock slopes), and that the information used is restricted to remotely sensed data for both DEM construction and landslide inventory preparation. The topographic variability is much lower within the local study area of the MLI, and the modelling results suggest that the topographic variability affects the performance of the model runs. The ASTER GDEM, for example, shows the largest variability of the DEM-derived characteristics between the two study areas (SLI and MLI) as well as the largest difference between the respective AUCs. The differences in model performance based on the different DEMs, however, could not be explained with this approach. The visual validation of the models, looking at the most and least susceptible classes, indicates reasonable results as well. The most successful model (LRM on SRTM DEM) assigned 63.8% of the mapped landslides to the most susceptible class. This is a satisfactory performance when considering the prepared models as a refinement of the national-scale landslide susceptibility models and databases already available in Peru. Still, the models are surely not precise enough for assessing the susceptibility of single slopes or, in the case of SINMAP, their stability.
Despite that, we consider the presented approach useful and effective for landslide susceptibility assessment in data-scarce regions, as it requires input parameters (DEM and landslide inventories) derived only from remotely sensed data (e.g. SRTM, high-resolution optical satellite images). With the same data, the required glacier and rock mask can be established to further improve the results. The presented analyses are restricted to shallow landslides triggered by water infiltration and saturation. Hence, for complex landslide hazard maps, further modelling, possibly combining different types of models (e.g. statistical and physically based), would be required in order to include all kinds of landslides (e.g. slides, rock-falls, debris flows) which may affect the study region.

For further studies, it would be interesting to analyse whether physically based models could achieve better results at this scale using spatially distributed estimators for soil characteristics and water infiltration. Additionally, it would be interesting to compare the DEMs further to evaluate whether the ASTER GDEM is generally less suitable for landslide susceptibility modelling on regional scales. This was the case here, but for other DEMs, reasonable results could be obtained. Hence, a comparison of different approaches to landslide susceptibility modelling can be recommended to get better results.