Introduction

Landslides are common geomorphic processes in mountain areas and are responsible for mass movements involving rock materials, regolith and/or soil debris. They usually occur under specific conditions of soil moisture content, with gravity as the main driving force. However, the degree to which an area is susceptible to mass movements and the spatial distribution of such events are difficult to establish, largely because the relationships between the triggers and predisposing factors are complex (Guzzetti et al. 2005; Can et al. 2005; Federici et al. 2006). According to Varnes (1984), the keys to predicting future landslides lie in the geological, geomorphological and hydrological processes that led to instability in the past and at present. Consequently, the objective of landslide susceptibility assessment is to identify the probability of failure of the land surface by mass movement (Varnes 1978; Guzzetti et al. 2005).

To date, research on susceptibility to mass movements has generally focused either on individual events or on their spatial distribution. The assessment of individual cases includes the study of each type of mass movement in a specific area, while the spatial assessment is usually based on the geographical distribution of environmental factors related to a landslide inventory map (Guzzetti et al. 2003; Chacón et al. 2006). This requires the use of statistical tools in combination with geographic information systems (GIS) and remote sensing imagery (Carrara et al. 1999; Guzzetti et al. 2005; Alexander 2008; Budetta et al. 2008).

Different techniques have been used to evaluate terrain susceptibility to landslides, ranging from qualitative assessments based on expert judgement (Gupta and Sah 2008; Budetta et al. 2008; Magliulo et al. 2009) to quantitative assessments based on advanced statistical techniques or mathematical models. Qualitative assessments are intrinsically subjective (Van Westen 2000; Magliulo et al. 2009), while quantitative evaluations may be more helpful in detailed investigations where the required data are available, but they are usually less effective at regional scale (Carrara et al. 1992; Zhou et al. 2002; Gorsevski et al. 2006).

Among the statistical analyses used to assess terrain susceptibility to landslides, the most commonly applied include bivariate statistical analysis (Budetta et al. 2008; Magliulo et al. 2009), discriminant analysis (Greco et al. 2007; Carrara et al. 2008; Frattini et al. 2008), logistic regression (Dai and Lee 2002; Can et al. 2005; Carrara et al. 2008; Park et al. 2013), multiple regression (Ohlmacher and Davis 2003; Yilmaz 2009) and factorial analysis (Duman 2005). More recent techniques include artificial neural networks (Ermini et al. 2005; Chacón et al. 2006), fuzzy logic (Remondo et al. 2003; Gorsevski et al. 2003; Viloria-Botello et al. 2012), uncertainty analysis, reliability analysis and fractal analysis (Chacón et al. 2006).

The relationship between landslides and causative factors is not linear, unique or constant in either space or time (Dai and Lee 2002; Zhou et al. 2002; Ohlmacher and Davis 2003; Duman 2005; Guzzetti et al. 2005; Can et al. 2005; Federici et al. 2006). Although, as mentioned above, statistical analysis techniques have been used to study several aspects of landslide susceptibility, the multitemporal aspect has been little studied (Guzzetti et al. 2003; Chau and Chan 2005; Chacón et al. 2006; Felicísimo et al. 2012). Recent interest has focused on comparing landslide prediction methods to select those with the greatest accuracy (Choi et al. 2012; Devkota et al. 2012; He et al. 2012; Calvello et al. 2013; Kavzoglu et al. 2014).

Temporal evaluations of landslide susceptibility are included only when there is a perceived need to predict the changes that might occur in a given landscape (Parise 2001). To evaluate the time factor, some authors have performed inventories of landslides, using remote sensing images from different dates. The purpose has commonly been to predict when landslides may occur based solely on the location of past landslide scars, without taking into account changes in environmental conditions. However, spatial and temporal responses of mass movements may vary considerably either within or between basins (Carrara et al. 1992) and over time (Guzzetti et al. 2003). As a result, factors that were initially not decisive may later become triggers or predisposing factors.

Landslide inventories have been related to different environmental variables by logistic regression analysis, so as to identify those factors with the greatest influence on landslide occurrence (e.g. Dai and Lee 2002; Douglas et al. 2013). This kind of analysis, applied to landslide inventories from different dates in a given area, may reveal whether the influence of environmental factors has been consistent or has changed over time.

The present research aimed to determine environmental factors associated with susceptibility to landslides in a mountainous area of the Caribbean region, as well as changes that occur to the relationships between these factors and landslides over space and time. The study was carried out in the Caramacate River Basin, located in the southern part of the central coastal range of Venezuela. This area is part of a strategic basin as a source of water and is frequently affected by landslides.

Study area

The study area covered 6760 ha in the Caramacate River Basin, located in the mountains of north-central Venezuela from 9° 55′ to 10° 09′ North, and from −67° 12′ to −67° 03′ East (SIRGAS-REGVEN datum) (Fig. 1). Elevation ranges from 334 to 1405 masl, with an average slope of 40 %. The average annual rainfall is 1100 mm, and the average annual temperature is 22 °C (Pineda et al. 2011b). The geology of the study area is dominated by two major formations (Pineda et al. 2011b): ‘El Chino-El Caño metatobas’ (VCcn) and ‘El Carmen metalavas’ (VCca), which belong to the Villa de Cura group (metavolcanic and metasedimentary rocks) (Urbani and Rodríguez 2003). The VCcn consists of basalts and metavolcanic rocks associated with sedimentary rocks, while the VCca comprises mafic metalavas interbedded with metasedimentary and metavolcanic rocks (Shagam 1960). The prevailing land use is extensive grazing, which has caused severe erosion and land degradation. Grasslands alternate with small plots of subsistence farming, forest corridors following waterways and patches of evergreen forest in the highest areas.

Fig. 1

Location of the Caramacate River Basin and the study area

Materials and methods

We correlated, by logistic regression analysis, landslide scars with selected variables derived from a digital elevation model (DEM), remote sensing data and geologic and geomorphologic maps, so as to identify relationships between landslides and environmental factors in the study area. This analysis was performed with landslide inventories from different dates, in two areas with different geological substrata, to establish whether such relationships have been consistent or have changed through time and space.

Cartographic information

A DEM with a 20-m cell size was generated from 1:25,000 topographic charts, by means of the ANUDEM algorithm (Hutchinson 1989) included in the TOPOGRID command of ArcGIS (version 9.2, Environmental Systems Research Institute, Redlands, CA, USA). Although Palamakumbure et al. (2015) indicate that a pixel resolution of 10 m is optimal for regional studies, we adopted a 20-m resolution, considering the available source of topographic information and the low intensity of land use prevailing in the study area.

Various types of images were used to identify landslide scars from different dates (Table 1). The aerial photographs from 1941 and 1971 were scanned in greyscale (8 bits) at a resolution of 600 dots per inch. They were georeferenced with the MiraMon® Software v.7.0b, using 52 control points taken from an orthorectified SPOT 5 image from 2008. Later, they were adjusted to the DEM through collinearity equations. It should be noted that image orthorectification is sensitive to the DEM resolution.

Table 1 Number of points with and without landslide scars and validation points at each combination of geological formations and the date of the landslide inventory

Evaluation of susceptibility to landslides

Landslide mapping

Multitemporal inventories of landslides from 1941, 1971, 1992 and 2008 were carried out by visual interpretation of aerial photographs at 1:25,000 from 1941 and 1971, orthophotomaps at 1:25,000 from 1992 and a multispectral SPOT 5 image with 10-m pixels from 2008. This is a conventional method that provides general information on the distribution, abundance and type of mass movements, as a preliminary step to assess landslide susceptibility. The identified landslides were of the rotational type (Hutchinson 1968; Varnes 1978; EPOCH 1993), which produces scars of concave form (spoon shape); the slipped material in this case is soil. Because of the resolution of the images used in this study, only scars wider than 5 m were identified and these could not be separated from their corresponding deposits. Points with no evidence of landslides, on the other hand, were identified at locations outside a 50-m radius buffer drawn around each scar (Dai and Lee 2002). Landslide points were coded as 1, whereas points without evidence of landslides were coded as 0. The visual interpretation of landslide scars was validated by examining in the field 30 of the landslide points identified on the image corresponding to 2008 (Fig. 2).
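The 50-m buffer rule for absence points can be illustrated with a minimal Python sketch. It assumes projected coordinates in metres and treats each scar as a single point; the function name `outside_buffer` and the sample coordinates are hypothetical and are not part of the original workflow, which was carried out in a GIS.

```python
import math

def outside_buffer(candidates, scars, radius_m=50.0):
    """Keep candidate points lying farther than radius_m from every scar.

    candidates, scars: lists of (x, y) tuples in projected metres.
    """
    kept = []
    for cx, cy in candidates:
        # A candidate qualifies as a "no landslide" point (code 0) only if
        # it is outside the buffer drawn around each mapped scar.
        if all(math.hypot(cx - sx, cy - sy) > radius_m for sx, sy in scars):
            kept.append((cx, cy))
    return kept

# Hypothetical coordinates, for illustration only
scars = [(0.0, 0.0), (200.0, 0.0)]
candidates = [(30.0, 39.0), (100.0, 0.0), (210.0, 10.0)]
absence_points = outside_buffer(candidates, scars)
```

Only the candidate at (100, 0) survives in this example, because the other two fall within 50 m of a scar.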

Fig. 2

Left (a) landslide scars close to the drainage network; this situation particularly occurs on the slopes of the VCca geomorphic unit. Right (b) landslide scars observed near the top of hill slopes in the VCcn geomorphic unit

Six different combinations of geological unit and year were compared. Each combination (named setting) is described in Table 1, while Fig. 3 shows the location of points with and without landslide scars in each setting. Note that the area corresponding to the VCcn formation was not covered by the 1941 and 1971 aerial photographs.

Table 2 Geomorphometric parameters related to the occurrence of landslides

Fig. 3

Points with and without landslide scars in each combination of geological formations and the date of the landslide inventory

Estimation of terrain parameters

The present research related the landslide inventories to the following environmental variables:

  A. Geomorphometric parameters computed from the DEM by means of the System for Automated Geoscientific Analyses (SAGA) software v2.0.4 (Table 2). Such parameters were chosen because previous works have related them to the occurrence of landslides (e.g. Adediran et al. 2004; Bolongaro-Crevenna et al. 2005; Ardiansyah Prima et al. 2006; Pineda et al. 2011a). Table 2 indicates how each parameter was calculated.

  B. Distance to the drainage network, obtained through contour curves plotted every 50 m from the drainage lines (Dai and Lee 2002), by means of the methodological approach proposed by D’Amato Avanzi et al. (2004) and Shrestha and Zinck (1999). This variable was considered because previous observations (Montgomery and Dietrich 1989; Hovius et al. 1998; Ng 2006) suggest that landslides are more frequent near the drainage network.

  C. The Normalized Difference Vegetation Index (NDVI) (Rouse et al. 1973) calculated from a LANDSAT TM image from 1992 and a SPOT 5 image from 2008.

  D. Landform classes identified by geomorphic criteria at scales 1:50,000 (types of landscape) and 1:25,000 (types of relief) (Pineda et al. 2011b).
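For reference, the NDVI in item C is conventionally defined as (NIR − Red) / (NIR + Red), where NIR and Red are the near-infrared and red band values of a pixel. The following Python sketch shows that per-pixel calculation; the function name and the sample reflectance values are illustrative assumptions, not values from the images used in this study.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel.

    Returns None when both bands are zero, where the index is undefined.
    """
    total = nir + red
    if total == 0:
        return None
    return (nir - red) / total

# Hypothetical reflectance values: dense vegetation reflects strongly
# in the near infrared and weakly in the red band.
forest_pixel = ndvi(nir=0.50, red=0.10)
bare_soil_pixel = ndvi(nir=0.30, red=0.25)
```

The index ranges from −1 to 1, and higher values generally indicate denser green vegetation.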

Logistic regression analysis

The environmental variables indicated above, converted to ASCII files, were related to the presence or absence of landslides by zonal statistics in ArcMap 9.2 (Environmental Systems Research Institute, Redlands, CA, USA). A multiple logistic regression analysis based on the maximum likelihood model (Chung 2006; Lee et al. 2007; Dewitte et al. 2010) was performed with SPSS v 12 (SPSS Inc., Chicago, IL, USA). This analysis used 90 % of the sampling points available at each setting of geologic unit and year. The remaining sampling points were used for the validation of results. The logistic regression analysis explored the relationship between the environmental factors, as independent variables, and the presence (landslide scars in Table 1) or absence (no scars in Table 1) of landslides, as the dependent variable. The values predicted by the logistic regression results range from 0 to 1 and are defined by the following equations:

$$ \begin{array}{l} P = \dfrac{e^{z}}{1 + e^{z}} \\ z = b_0 + b_1 x_1 + b_2 x_2 + \cdots + b_n x_n \end{array} $$

where P is the probability of landslide occurrence (landslide susceptibility index), z is the linear logistic model, b0 is the intercept of the model, n is the number of landslide-conditioning factors, b1 to bn are the weights of the factors and x1 to xn are the environmental variables affecting the occurrence of landslides.
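As a minimal sketch of how these two equations combine, the following Python function computes P from an intercept, a list of weights and a list of variable values. The coefficients in the example are hypothetical and are not taken from Table 3.

```python
import math

def landslide_probability(intercept, weights, values):
    """Return P = e^z / (1 + e^z), with z = b0 + b1*x1 + ... + bn*xn."""
    if len(weights) != len(values):
        raise ValueError("weights and values must have the same length")
    z = intercept + sum(b * x for b, x in zip(weights, values))
    # Algebraically equivalent forms of e^z / (1 + e^z), chosen so that
    # math.exp never receives a large positive argument.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Hypothetical model: b0 = -2.0, slope weight 0.14, elevation weight -0.003
p = landslide_probability(-2.0, [0.14, -0.003], [30.0, 800.0])
```

With these assumed values z = −0.2, so P falls slightly below 0.5.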

The probability of a landslide occurrence was interpreted as landslide susceptibility (e.g. Dai and Lee 2002; Can et al. 2005; Carrara et al. 2008; Pradhan and Lee 2010; Bai et al. 2010), since P determines the odds of landslides, i.e. the possibility that landslide events will happen, calculated as P/(1 − P). According to Can et al. (2005), P must be interpreted as landslide susceptibility and not as the probability of occurrence, because it does not take into account the time factor.

The obtained equations and their respective probabilities were applied to the whole study area using ArcMap 9.2. Finally, quantitative landslide susceptibility maps (one map for each setting) were generated and classified into four levels of susceptibility to landslides according to the values of P, following the criteria proposed by Tangestani (2003) (<0.25 = low; 0.25–0.5 = moderate; 0.5–0.75 = high; >0.75 = very high).
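The four-level classification can be sketched in Python as follows. The class limits come from the criteria quoted above; how the shared boundary value 0.5 is assigned is not stated in the text, so this sketch assumes that upper limits are inclusive (0.5 counts as moderate, 0.75 as high).

```python
def susceptibility_class(p):
    """Map a probability P in [0, 1] to one of four susceptibility levels."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must lie between 0 and 1")
    if p < 0.25:
        return "low"
    if p <= 0.5:       # assumption: 0.5 is assigned to "moderate"
        return "moderate"
    if p <= 0.75:      # 0.75 is "high" because "very high" requires P > 0.75
        return "high"
    return "very high"

# Hypothetical cell values from a susceptibility map
classes = [susceptibility_class(p) for p in (0.10, 0.25, 0.60, 0.90)]
```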

Validation of the landslide susceptibility models

A confusion matrix was produced for each model comparing predicted and observed landslides at the validation points. The model for VCca-1941 was not assessed because the number of validation points in that setting was too small. An analysis based on the receiver operating characteristic (ROC) curve (Fawcett 2006) was performed from each confusion matrix. The ROC curves plotted the fraction of positive outcomes that were correctly identified (true positive rate) versus the fraction of negative outcomes that were incorrectly identified as positive (false positive rate). The area under the curve (AUC) was used as a metric to appraise the overall performance of each model, so that the larger the AUC, the better the performance of the model.
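The validation quantities described above can be sketched in plain Python. The sketch makes two assumptions that the text does not confirm: a cut-off of 0.5 is used to turn P into a predicted class for the confusion matrix, and the AUC is computed with the pairwise ranking formulation (the share of landslide/non-landslide pairs in which the landslide point receives the higher P), which equals the area under the ROC curve. The labels and scores are hypothetical.

```python
def confusion_counts(labels, scores, threshold=0.5):
    """Return (tp, fp, tn, fn) for observed labels (1/0) and predicted P."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted = 1 if s >= threshold else 0
        if y == 1 and predicted == 1:
            tp += 1
        elif y == 0 and predicted == 1:
            fp += 1
        elif y == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are needed to compute the AUC")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5   # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))

# Hypothetical validation points: observed class and predicted P
labels = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]
tp, fp, tn, fn = confusion_counts(labels, scores)
true_positive_rate = tp / (tp + fn)
false_positive_rate = fp / (fp + tn)
auc = roc_auc(labels, scores)
```

An AUC of 0.5 corresponds to a model that ranks landslide and non-landslide points no better than chance.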

Results and discussion

A relatively small number of landslide scars were identified on the aerial photographs from 1941. All of these scars corresponded to the VCca geological unit, since this set of aerial photographs did not cover the VCcn area. The density of landslide scars increased almost six times from 1941 to 1971 (Table 1), which seems to be related to the change of land cover from forest to grassland observed on the aerial photographs between those dates. The density of scars decreased from 1971 to 1992, probably because local vegetation covered some landslides that were no longer active or because landslide scars were more easily identified on aerial photographs than on orthophotos. However, the density of identified scars increased again from 1992 to 2008, almost 1.5 times in both the VCca and VCcn geological units, revealing that landslides are a growing threat in the study area.

The logistic regression models varied between geological units. This is in agreement with other authors who found that lithology was, among other factors, a determining factor leading to landslide occurrence (Ozdemir and Altural 2013; Kavzoglu et al. 2014). The regression models also varied among dates within the same geological unit, which demonstrates that the explanatory factors of landslides changed with geological conditions and through time in the study area. However, the predictive variables maintained the sign of their coefficients through different settings of geological unit and time, except for the convergence index (CI), whose coefficient changed from negative in VCca-1971 to positive in VCcn-1992. Variables with a positive coefficient are regarded as landslide promoters, while those with a negative coefficient are considered protective factors.

This research considered several categorical variables as potential predictive factors of landslides. Three of them corresponded to geomorphologic units from a hierarchical landscape classification (geological unit, type of landscape and type of relief). The others comprised classified landscape attributes (flow direction and distance to the drainage network). Of those variables, only flow direction (FWD) was included as a predictive variable in some of the evaluated settings: VCca-1941 and VCcn-1992 (Table 3). This demonstrates that, with the exception of the geological substratum, the attributes of the geomorphologic units obtained by landscape classification did not determine the susceptibility to landslides in the study area.

Table 3 Variables included in each model of binomial multiple logistic regression analysis and predictive power

In the VCca-1941 setting, the logistic regression did not recognize any explanatory variable with a positive coefficient (Table 3). The negative coefficient of flow direction (FWD) in this setting indicates that the odds of landslides decreased with each unit increase in the flow direction code, provided the other variables remained unchanged. According to the codification of this variable, such a result implies that landslides were more frequent where water flowed towards S, SE and SW. In VCca-1971, landslide occurrence was explained by slope, aspect, elevation and the convergence index. This partially coincides with Ozdemir and Altural’s (2013) results in SW Turkey, which showed that slope, aspect and elevation played major roles in landslide distribution. In this setting, the odds of landslides increased by 15 % for each unit increase in slope steepness, keeping the other predictor variables constant. Conversely, landslides decreased as altitude rose and the convergence index increased, which implies that landslides were less likely to occur in upper lands and in convex areas with diverging flow.
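The percentage statements about odds follow directly from the logistic model: a one-unit increase in a variable with coefficient B multiplies the odds P/(1 − P) by Exp(B), whatever the starting value of P. The Python sketch below illustrates this relationship. The coefficient ln(1.15) is back-calculated from the 15 % figure quoted for slope purely for illustration and is not read from Table 3; the intercept and slope values are likewise hypothetical.

```python
import math

def odds(p):
    """Odds corresponding to a probability p, i.e. p / (1 - p)."""
    return p / (1.0 - p)

def percent_change_in_odds(coefficient, delta=1.0):
    """Percentage change in the odds when a variable increases by delta."""
    return (math.exp(coefficient * delta) - 1.0) * 100.0

def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))

b_slope = math.log(1.15)                  # illustrative: Exp(B) = 1.15
change = percent_change_in_odds(b_slope)  # percentage change per unit of slope

# The odds ratio is the same at any starting point (hypothetical intercept -3)
p_before = logistic(-3.0 + b_slope * 20.0)
p_after = logistic(-3.0 + b_slope * 21.0)
odds_ratio = odds(p_after) / odds(p_before)
```

A negative coefficient gives Exp(B) below 1 and therefore a negative percentage change, which is why such variables are read as protective factors.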

The most striking result of the logistic regression equations obtained from the 1992 and 2008 landslide inventories, in both VCca and VCcn, was the dominant influence of the vegetation index (NDVI) on landslides. This index has been used in this research as an estimate of vegetation cover. The values of Exp(B) for the NDVI, i.e. e raised to the regression coefficient B (Table 3), were by far larger than those of any other variable. These values of Exp(B) indicate the expected increase in the odds of landslides for a unit increase in the NDVI, holding the other predictor variables constant. Such an increase in the odds of landslides depends not on the original value of the NDVI, but on the increase in these values. In fact, the largest values of the NDVI in the study area correspond to the evergreen forest in the highest areas, where landslides are uncommon. Conversely, the largest increases in the NDVI values correspond to two different conditions which are associated with landslide events. The first condition involves secondary vegetation invading pasturelands that had been abandoned owing to the loss of forage productivity caused by soil erosion. The second condition corresponds to riparian forests present along the drainage network. Previous results obtained in the study area revealed that landslide susceptibility increases on concave slopes located less than 50 m from the drainage network (Pineda et al. 2012). In agreement with these results, He et al. (2012) found that areas around rivers were highly susceptible to the occurrence of landslides in the Qinggan River delta.

The other predictive variables included in the 1992 and 2008 models, in addition to the NDVI, varied from one setting to the other. Thus, in 1992 the odds of landslides grew 3 % in VCca and 4 % in VCcn in response to each unit increase in the sediment transport index (STI), provided the other predictor variables were constant. However, this variable had no significant effect on the landslide odds in 2008. In addition, in 1992, landslides were boosted by the convergence index (CI) and diminished by the flow direction (FWD) in VCcn, but not in VCca. In the 2008 settings, the variable with the greatest influence on landslides, after the NDVI, was the slope gradient (SLO). For constant values of the other predictive variables, a unit increase in slope gradient increased the odds of landslides by 22 % in VCca and 29 % in VCcn that year. The remaining environmental variables had little effect on the odds of landslides on that date. These results agree with those of Corominas et al. (2014), who state that the type and weighting of each factor depend on the environmental setting and may also differ substantially within a given area due to differences in terrain conditions (e.g. soil properties and local relief). Figure 4 shows the landslide susceptibility predicted by the logistic regression model in each evaluated setting. In both geological units, VCca and VCcn, the area covered by high susceptibility to landslides has increased through time.

Fig. 4

Logistic multiple regression models relating environmental variables to scars associated with landslides in the VCcn, VCca or both formations, for each evaluated year

Both the type and number of the variables included in the models obtained for the different settings revealed that the variables related to the occurrence of landslides varied not only with the geological substratum but also with time. This confirms that environmental conditions affecting landslides are so dynamic that results of statistical analysis applied to a given location and date cannot be extrapolated through space or time.

Figure 5 shows the validation results of the logistic regression models produced for each setting of geological substratum and year. Predictions of landslides were more accurate in the geological unit VCca than in VCcn. In the former, the models for 1971 and 1992 produced fairly accurate predictions, with AUC values above 0.7, but the AUC dropped below 0.7 for the 2008 model. Conversely, the models obtained in this research failed to predict landslides on both of the dates evaluated for the VCcn geological unit. In particular, the VCcn-2008 model substantially underestimated the occurrence of landslides, leading to values of AUC below 0.5. These results suggest that the processes involved in the occurrence of landslides are more complex in the VCcn area than in VCca. Consequently, landslide occurrence in VCcn could not be properly modelled by means of the variables and statistical methods used in this investigation. The validation results also indicate that the complexity of landslide processes increased with time in the whole study area.

Fig. 5

ROC curves and values of area under the curve (AUC) representing the prediction accuracy of the models: a VCca-1971, b VCca-1992, c VCca-2008, d VCcn-1992 and e VCcn-2008

Conclusions

The density of landslide scars increased notably during the study period in the two geological units considered in this research, particularly between 1992 and 2008, confirming that landslides are a growing threat in the study area. The models of landslide susceptibility obtained in this study varied between geological units and among dates within the same geological unit. So did the prediction accuracy of these models. This suggests that in the study area, the complexity of the landslide processes, as well as their explanatory factors, changed with geological conditions and through time. Consequently, these findings call into question the use of a single model to assess the susceptibility to landslides over large regions. The variables selected for the regression models were suitable only for the region and time considered. Although the applied methodology can be used in other regions, the models will likely change to include other factors according to the intrinsic characteristics of the new localities.