Introduction

Landslides are one of the major natural hazards, having large and spread impact all over the world and are responsible for human and socio-economic losses (Froude and Petley 2018). Identification and mitigation of landslide risk is still a challenging task for local administrations and policymakers, being the spatial-temporal prediction of such phenomena still affected by large uncertainties. To this aim, landslide susceptibility assessment is fundamental to improve the understanding of the occurrence of such phenomena (Brabb 1984; Gigli et al. 2014; Lagomarsino et al. 2017). Many methodologies aim at assessing Landslide Susceptibility Mapping (LSM) and in recent years, a significant boost has been gained thanks to robust scientific advances, both among numerical deterministic modelling (Jelínek and Wagner 2007; Godt et al. 2008; Cervi et al. 2010; Formetta et al. 2016; Schilirò et al. 2016; Ciurleo et al. 2017; Park et al. 2019) and statistical approaches (Van Westen et al. 2003; Yalcin et al. 2011; Goetz et al. 2015; Reichenbach et al. 2018; Sepe et al. 2019; Huang et al. 2019; Kavoura and Sabatakakis 2019; Xiao et al. 2019). Deterministic models need information about the physical processes leading to the trigger and hence involve different approaches according to the different types of mass movements. On the other hand, the statistical modelling assumes that the factors that led to slope failure in the past are to be expected to recurrently trigger landslides. Thus, inventories of past landslides coupled with environmental factors can be used to train statistical models (Ermini et al. 2005; Pradhan and Lee 2010; Vorpahl et al. 2012; Corominas et al. 2014; Segoni et al. 2015; Carotenuto et al. 2017). No standard susceptibility procedure has ever been established, hence making possible the exploitation of numerous different algorithms and methodologies. Statistical LSM methods have been extensively applied in different settings and spatial scales by performing different models: bivariate statistical analysis (Constantin et al. 2011; Chen et al. 2018), logistic regression (Chauhan et al. 2010; Mousavi et al. 2011; Wang et al. 2015; Trigila et al. 2015; Tsangaratos and Ilia 2016a, b), multivariate regression (Guzzetti et al. 2006; Akgün and Türk 2011; Felicísimo et al. 2013; Althuwaynee et al. 2014), weight of evidence (Neuhäuser et al. 2012; Ilia and Tsangaratos 2016; Teerarungsigul et al. 2016; Tsangaratos et al. 2017), etc. Spatial distribution of landslides has been also investigated by mathematical models such as fractals (Liu et al. 2019). Among the new statistical approaches, machine learning algorithms (MLAs) are more and more recurrent, gaining new consideration. Many algorithms have been applied to LSM, such as artificial neural networks (ANN—Lee et al. 2004; Ermini et al. 2005; Choi et al. 2012; Gorsevski et al. 2016; Meng et al. 2016; Chen et al. 2018), decision trees (DT—Yeon et al. 2010; Tsangaratos and Ilia 2016b), support vector machine (SVM—Yao et al. 2008; Kavzoglu et al. 2014; Pham et al. 2018, Dou et al. 2019), random forest (RF—Catani et al. 2013; Chen et al. 2017a, b, Krkač et al. 2017) or treebagger RF (Segoni et al. 2020). Many authors have proposed methods to minimize uncertainty, which relies on combining the predictions yielded by multiple algorithms, as testified by the works of Umar et al. (2014), Youssef et al. (2015), Chen et al. (2017a, b), Pham et al. (2017), Kim et al. (2018) and Bueechi et al. (2019). The investigation of new ensemble methods (EMs) for landslide susceptibility mapping is therefore highly necessary, as also highlighted by Chen et al. (2018). Anyhow, EM provides a solid contribution to minimize uncertainty and to refine and improve the prediction accuracy, which is always the key parameter to take into account when working with LSM. In this work, three different MLAs were ensembled to assess landslide susceptibility within some small coastal basins located in the Monterosso al Mare area (Cinque Terre National Park, La Spezia Province, Italy). Specifically, ANN, generalized boosting (GBM) and maximum entropy (MaxEnt) were employed. The study area is highly vulnerable to rapid shallow landslides, as testified by the rainfall-induced ground effects produced by the severe rainfalls occurred on October 25, 2011, which affected a large area across Northern Tuscany and Eastern Liguria reaching intensities of up to 350 mm in 6 h at Monterosso al Mare Rain Gauge (A.R.P.A.L.-C.F.M.I.-P.C. 2012; Cevasco et al. 2015). Such rainfall, which caused extensive damage across the Tyrrhenian basins between Bonassola and Manarola and the Magra river basin (Cevasco et al. 2012; D’Amato Avanzi et al. 2013) triggered more than 500 shallow landslides in the Vernazza catchment (Cevasco et al. 2013, 2014) and about 260 shallow landslides in the area of Monterosso al Mare (Schilirò et al. 2018). Thanks to the availability of high-resolution aerial photographs and field surveys, these landslides were inventoried at great detail. Thus, the database of Monterosso al Mare landslides, represented as points, has been used for susceptibility modelling. Thirteen predisposing factors (PFs) were also considered, based on the main characteristics of the territory and of shallow rapid landslide mechanisms: slope angle, aspect angle, geo-lithology, planform curvature, profile curvature, distance to roads, distance to streams, land use, agricultural terraces state of activity, soil thickness, Topographic Wetness Index (TWI), Topographic Position Index (TPI) and Stream Power Index (SPI). Specifically, each single susceptibility model (GBM, ANN and MaxEnt) was implemented with 100 different landslide data combinations, every time splitting the landslide database in 80% as data training and 20% as data testing. The dataset of selected landslides was performed randomly and, in the end, about 300 different models have been obtained starting from slightly different datasets. The final result is therefore represented by a susceptibility map computed by an average of the values of probability of occurrence previously obtained. With this case study, we aim to promote the use of MLA to assess landslide susceptibility and to provide an example showing how Ensemble modelling can support land use planning activities and landslide hazard management.

Study area

Geological and geomorphological settings

The study area (Fig. 1), approximately 5.5 km2 wide, is located in the Cinque Terre National Park, within the territory of Monterosso al Mare municipality, along the easternmost Ligurian coast (Northern Italy, La Spezia Province). It includes six small catchments: Rio delle Rocche, Rio Pastanelli, Fosso Serra, Rio Gaggia, Corone and Fosso Molinaro. The extent of these water basins ranges from 3.2 km2 (Rio Pastanelli) to 0.1 km2 (Fosso Serra). The Cinque Terre area is a UNESCO World Heritage Site, since 1997 (https://whc.unesco.org/en/list/826/) and a National Park since 1999. This area is worldwide known as a typical example of man-made landscape, characterized by widespread century-old agricultural terraces retained by dry-stone walls. The Cinque Terre National Park, and in particular its western sector, is geologically belonging to an NW–SE oriented segment of the Northern Apennine, an orogenic chain formed during the Tertiary period (Abbate et al. 1969; Terranova et al. 2006). This sector of the belt is made up of a nappe sequence that includes five overlapping tectonic units, from the top to the bottom: Gottero Unit, Ottone Unit, Marra Unit, Canetolo Unit and Tuscan Nappe (Raso et al. 2019a) (Fig. 2). The Gottero Unit crops out in the westernmost sector of Cinque Terre National Park, along the Punta Mesco promontory, and mainly consists of ophiolite rocks (Jurassic), followed by a turbidite sequence (Late Cretaceous). The Ottone, Canetolo and Marra units are localized in the western side of the Monterosso al Mare village and in the central sector of the national park. The first one is prevalently composed of pelitic rocks (Monte Veri Monte Veri Complex Fm.); Canetolo and Marra occupy a narrow NW–SE-oriented stretch of land encompassing the coastal villages of Manarola and Corniglia and they include claystone with limestones and silty arenaceous turbiditic rocks. The Tuscan Nappe occupies most of the Cinque Terre National Park as it crops out on the eastern, central and western sectors; in the Monterosso al Mare area, this unit is represented by thick sandstone-claystone turbidites (Macigno Fm.), largely cropping out along the coast and along the Rio Pastanelli catchment. The geomorphological setting of the area is strictly linked to the tectonic history of the Apennine orogeny, which experienced an earlier compressive phase first, and a ductile extensive phase then (Carmignani et al. 1994). The Cinque Terre National Park territory is bounded by the sea at SW and S, with a 15-km-long coastal configuration and highest elevations reaching 800 m a.s.l. along the NW–SE-orientated main mountain ridge, located at a very short distance from the coastline. The geo-structural and lithological setting strongly influences the local morphology, including slope orientation: most of the slopes are SE to SW-oriented, showing high slope gradient values (over 60°) for more than 74% of the total surface. The Monterosso al Mare territory is nevertheless characterized by slightly different slope orientation and steepness, being N–NE oriented and showing more gentle slopes, mainly due to the weathering affecting gabbros and serpentinites belonging to the Liguridi tectonics units. The tectonic activity influenced also the stream network configuration: the main streams, Fegina (which gives the name to the homonymous hamlet) and Pastanelli, have a very steep profile and are short, with anti-Apennine direction, imposed on approximately N–S faulting systems. These creeks mostly show a torrential behaviour, characterized by high rates of solid transport mainly due to the erosional processes affecting the surrounding slopes: especially in the occurrence of extraordinary rainfall events, very frequent in the area, these high amounts of material strongly contribute to feed the beach deposits that interrupt the continuity of the rocky sea cliffs. Among the natural factors, slope steepness, tectonic setting and the role of exogenous agents (i.e. rainfall and anthropic activity) made the area of Monterosso al Mare prone to landslides over time. Many landslides are reported, besides those of the 2011 event, such as the ones affecting the Mesco promontory along with the western sector of the coastline. A fundamental role to slope stability is also played by agricultural terraces, which strongly reshape the surrounding landscape and therefore contribute to providing a worldwide knowledge of the Cinque Terre National Park (D’Onofrio and Trusiani 2018). As noted before, Cinque Terre is characterized by an average annual precipitation rate of 900 mm; however, extraordinary rainfall event is very frequent due to its geomorphological configuration. The ridge, running parallel to the coast, represents, indeed, a barrier for cumulonimbus coming from S to SW. Furthermore, in the ongoing climate change climatic context, a general increase of precipitation extremes is recognizable, leading thus to an intensification of the frequency of flash floods and landslides.

Fig. 1
figure 1

Setting of the study area. In the top-right inset, location of the Cinque Terre National Park in northern Italy is represented. In the main map, the Cinque Terre National Park boundary is shown in red and the study area outlined in yellow. In the bottom left, the elevation map of the study area along with urban areas and drainage network is represented

Fig. 2
figure 2

Geological map of the study area. The associated legend is as follows: ACC: Canetolo limestones and argillites; ARB: Ponte Bratica sandstones; ARBa: Ponte Bratica sandstones and conglomerates; CGV: Groppo del Vescovo limestones; MAC: Macigno sandstones; MACa: “Arenarie zonate” lithofacies; MACb: Pelite-siltite lithofacies; MACc: marl-siltite lithofacies; BST: basalts; GBB: gabbros; GOT: Gottero sandstones; SRN: serpentinites; LVG: Val Lavagna argillites; MVE: Monte Veri Complex; DSA: Cherts; Faults—1fd: direct fault; 1fi: reverse fault; 1s: thrust (CARG (CARtografia Geologica, in Italian, National Mapping Program – Sheet 248, La Spezia) (Abbate 1969)

Land use setting

The landscape of the Cinque Terre area has been almost totally modified by human activity during the last centuries through agricultural terraces building (Terranova 1984; Brandolini 2017). The agricultural terraced slopes at Cinque Terre area (Fig. 3) extend from the shoreline, just above the cliff edge or from the toes of coastal landslides to an average altitude of 400–450 a.s.l., and in some cases up to 500 m (Terranova 1984; Terranova et al. 2006). The necessity of claiming new spaces to allocate agricultural activities, mainly for vineyards and olive groves, led the local farmers to rework and retain the shallow eluvial–colluvial soil covers by constructing dry-stone walls, over the most suitable bedrock (Brandolini 2017). Terraced areas are almost entirely distributed over some specific geological formations: most of the cultivations can be found on terraces built on the Macigno sandstones belonging to the Tuscan Unit or over the limestones and silty sandstones belonging to the Canetolo Unit or to the Ottone Unit, while few terraced areas can be counted on the sandstones, gabbros and serpentinites that crop out in the Monterosso al Mare area and are part of the Gottero Unit. According to Terranova (1984), about 60% of the territory of the Cinque Terre area is covered by terraced areas, and the sum of the linear extent of dry-stone walls can be estimated to approximately 6000 km. Considering an average dry-stone wall height of 2 m (however varying, according to slope steepness and wall width), the total volume of reworked stones is calculated to be approximately 8 × 106 m3 (Terranova 1984).

Fig. 3
figure 3

Example of the agricultural terraced area in the Cinque Terre area, both active, in the central sector of the slope, and inactive, in the lower and upper part of the slope. On the background, Manarola hamlet

The lack of maintenance of the dry-stone walls and the clogging of drainage channels due to farmland abandonment are considered some of the main causes of the increase in landslide susceptibility of terraced slopes within the Cinque Terre National Park (Canuti et al. 2004; Tarolli et al. 2014; Cevasco et al. 2014; Brandolini et al. 2018). In this area, agricultural terrace abandonment started between the two World Wars and accelerated after the 1950s, leading to widespread and intense erosion and land degradation issues. According to a recent inventory, more than 80% of the whole terraced areas are nowadays abandoned (Brandolini 2017) and have been replaced by pine trees and by Mediterranean scrub, peculiar land-use features in the Liguria region. Although the slopes were almost entirely terraced during the past centuries for vineyards and olive yards, only 19% of the total terraced areas are still cultivated. This issue denotes how the social changes that took place in the second half of the 1950s affected the land use at Cinque Terre area (Schilirò et al. 2018). On abandoned terraced areas, vegetation spreads quite rapidly, as well as scrubs grow up sparsely. After a time span of about 25–30 years, dense vegetation constituted by forest tree species (mainly pine, mixed woods, chestnut woods, holm oak woods) or by Mediterranean scrub covers abandoned terraces (Brandolini et al. 2018). Such land-use modifications concurred to increase the propension of the Cinque Terre territory to be affected by landslides, as confirmed by previous studies which investigated the relationship between agricultural terrace abandonment and the magnitude of rainfall-induced shallow landslides (Cevasco et al. 2013, 2014; Galve et al. 2015; Pepe et al. 2019).

Data and methods

Landslide inventory

The Cinque Terre area has been historically affected by recurring landslide events, mainly in response to extreme rainfall events related to the peculiar rainfall regime (Cevasco et al. 2015). A complete landslide inventory map of the Cinque Terre National Park was redacted by Raso et al. (2019a, b), in which a total of 459 landslides characterized by an extension higher than 100 m2 were identified and classified according to Cruden and Varnes (1996) and Hungr et al. (2014). A large number of debris slides (22.9%) can be related to the vulnerability of dry-stone terraces, while rockfalls (17.6%) are concentrated along the coast. Debris flows (11.1%) are also very spread, especially in the western sector of the Cinque Terre area (Monterosso al Mare and Vernazza municipalities); most of them were triggered by the October 25, 2011 rainfall event, between 9 and 15 UTC. The Monterosso al Mare rain gauge recorded very high hourly rainfall peaks (i.e. 92 mm/h), for a total of 382 mm of rain falling during the entire event (A.R.P.A.L.-C.F.M.-P.C. 2012). As a consequence of such an extreme event, numerous shallow landslides and dry stone wall collapses were triggered, more than 250 in the Monterosso al Mare catchment alone (Fig. 4), where the mobilization of enormous amounts of material caused severe damage to Monterosso al Mare village (Cevasco et al. 2015).

Fig. 4
figure 4

Landslide inventory map of the study area; in yellow are the urban patterns of Monterosso al Mare and Fegina

The landslide inventory used in this study was prepared in the aftermath of the October 25, 2011 event (Fig. 4) through aerial ortophotogrammetry and field survey (Schilirò et al. 2018). Two hundred and sixty shallow landslides and dry-stone wall collapses were triggered following the abundant precipitations and therefore mapped. Most of these landslides were classified as debris avalanches, as defined by Hungr et al. (2001), namely shallow and very fast flow-like movements (velocity between 3 m/min to 5 m/s, Cruden and Varnes 1996) of partially or completely saturated debris on very steep slopes. Furthermore, having available the surveyed landslides as polygons, only one the highest point, characterized by the highest altitude in the detachment area, has been selected. This operation has been conducted in order to make possible the run of the different models requiring points as input data.

Predisposing factors

According to the most representative local morphological and spatial features, 13 PFs were selected (Fig. 5). PFs include slope angle; aspect angle; planform curvature; profile curvature; distance to roads; distance to streams; Topographic Wetness Index, Topographic Position Index and Stream Power Index; agricultural terraces state of activity; land use; geo-lithology and soil thickness. The analysis is focused on environmental variables since it does not take into consideration the triggering factors (e.g. intense rainfall or earthquakes). Since most of the variables were derived from the digital terrain model (DTM) of 5 × 5 m cells, all the elements were set to equal DTM resolution.

Fig. 5
figure 5

Maps of the PFs used for the susceptibility models. a Slope angle. b Slope aspect. c Planar curvature. d Profile curvature. e Distance to roads. f Distance to rivers. g TWI. h TPI. i SPI. j Terrace state of activity. k Land use. l Geo-lithological map. m Soil thickness

A brief description of the PFs is available as Supplementary Information (Supplementary 1).

Modelling methods

Three different MLAs, included in the package “biomod2” (Thuiller et al. 2016), developed in R environment (R Development Core Team 2019) were used. Specifically, artificial neural network (ANN), generalized boosting model (GBM) and maximum entropy (MaxEnt) algorithms, well-known for their good performance in species distribution modelling (Elith et al. 2006), were adopted. A brief description of the algorithms is provided as follows, whereas extended descriptions are available as Supplementary Information (Supplementary 2).

To implement and evaluate models obtained with the different MLAs, the k-fold cross-validation approach was used. Cross-validation is one of the most widely accepted approaches for testing the predictive accuracy of species distribution models (SDMs). A random part of the data is kept for calibration (i.e., training data) while the remainder is used to test (i.e., testing data) the prediction of the model; the whole approach is then repeated several times for a single model and the average predictive accuracy is finally reported (Araujo et al. 2005; Thuiller et al. 2009). Therefore, each SDM was carried out 100 times splitting the population in 80% for training and the remaining part for testing.

Artificial neural network

ANN simulates the structure and/or functional aspects of biological neural networks of the brain, in order to process information (Zurada 1992). Similarly, ANN models are composed by a large number of nodes and connections, arranged in various layers, with the input layer containing the data being processed, many hidden layers and the output layer that represents the final result of the model. Input data, in our case, are the environmental variables, each represented by a node. Connections between nodes and hidden layers may be characterized by specific weights that are randomly assigned at the beginning of the process and later updated for algorithm optimization by means of back-propagation processes (Pijanowski et al. 2002).

Generalized boosting model

The boosting principle was originally developed, among the MLAs, to improve classification procedures (Schapire 1990) and consists of an ensemble of several logistic regression or decision trees. In such an algorithm, a starting model is built by relying on some randomly chosen trees and its performance is then improved by iteratively adding new randomly chosen decision trees that improve the accuracy of the previous iteration’s model.

Maximum entropy

Conceptually, MaxEnt relies on the distribution of an event in the space, related to the distribution on the landscape of the factors characterizing the events (Philips and Dudik 2008). MaxEnt method compares the probability density functions (PDFs) of environmental variables at event’s occurrence places with the PDFs of the same variables on the remaining landscape; the difference is regarded as the “relative entropy” (Elith et al. 2011, Convertino et al. 2013). MaxEnt algorithm finds the PDFs of environmental variables characterizing the events that approximate the PDFs on the landscape, i.e. by maximizing the entropy of information. Hence, the aim of MaxEnt is to recognize the PDFs of variables that may induce patterns to happen (Kornejady et al. 2017). The maximized entropy provides information to forecast those patters with high precision. Since MaxEnt does not specifically model the presence data but rather the density of the environmental conditions used, raw outputs are then transformed into logistic values to be interpreted directly as the probability of an event to occur.

Ensemble modelling

The approach of an Ensemble modelling (EM) is relatively recent and was first introduced by Burnham and Anderson (2002), who made the average of different regression models. The basic principle of multi-model inference is to avoid selecting the best model and to rely on more models indicated; in such a way, uncertainty and bias of both variables and models are minimized. The procedure of modelling is visible in Fig. 6 and can be subdivided into different steps. The first one allows recognizing landslide occurrence data as the response variable and the predisposing factors as predictors. In the second step, to reduce the collinearity among predictors, the variance inflation factor (VIF) was measured and a value of 0.7 was set as threshold. VIF is based on the square of the multiple correlations coefficients resulting from regressing a predictor variable against all other predictor variables. It, therefore, detects multicollinearities that cannot always be easily detected through a simple correlation. In detail, two different strategies to exclude highly collinear variables using a gradual procedure were chosen. Vifcor (cor stands for correlation) was used; first, it finds a pair of variables that has the maximum linear correlation and excludes the one having the largest VIF. The procedure is repeated until there is no variable with a higher correlation coefficient (greater than the threshold) between another variable. Vifstep calculates the VIF for all variables, excludes the one with the highest VIF (greater than the threshold) and repeats the procedure until there are variables with VIF greater than the remainder (Guisan et al. 2017). In the next step, the third, the above mentioned SDMs were performed to model susceptibility over the study area. Furthermore, during this phase, the importance of each variable related to the probability of event occurrence was acquired. Once the models are trained, the variables’ importance is obtained, by making a correlation between the individual variables (Pearson’s correlation, by default). A good correlation score between two predictors means that one of the two variables has low influence, and it is considered not important for the model. On the contrary, a low correlation means that both variables have a high influence on the model. In the ‘VarImportance’ output, the values given correspond to 1 minus the correlation score. The higher the value, the more important the prediction variable is in the model. To assess the reliability of the results of each SDMs, ROC (receiver operating characteristic) and TSS (true skill statistic) methods were chosen. TSS takes into account both omission and commission errors and ranges from − 1 to 1, not being affected by prevalence differently from Kappa (Allouche et al. 2006). Values ranging from 0.2 to 0.5 were considered poor, from 0.5 to 0.8 useful and values larger than 0.8 were considered good to excellent (as in Coetzee et al. 2009). Moreover, ROC/AUC was used as an alternative measure of accuracy. ROC use is very spread in literature: Cantarino et al. (2019) even applied a classification scheme of susceptibility levels based on ROC analysis. Prediction accuracy is considered to be similar to random for ROC/AUC values lower than 0.5; poor, for values in the range 0.5–0.7; fair in the range 0.7–0.9 and excellent for values greater than 0.9 (Swets 1988; Fressard et al. 2014). Since ROC/AUC curves show higher performance values and it is more consolidated in literature, the ensemble modelling has been used. The outputs of the models were combined by implementing four ensemble techniques and the variability of the modelling outputs was calculated by executing the coefficient of variation.

Fig. 6
figure 6

Scheme of the proposed approach for landslide susceptibility modelling

In detail, four kinds of representative ensemble models were chosen, namely (Thuiller et al. 2016):

  • Mean of probabilities (PM)—this ensemble method corresponds to the mean probabilities over each selected model

  • Median of probabilities (PME)—this ensemble model corresponds to the median probability over the selected models. The median is less sensitive to outliers than the mean

  • Committee averaging (CA)—to do this model, the probabilities from the selected models are first transformed into binary data. It is built on the analogy of a simple vote. Each model votes for the landslides being either present or absent. The interesting feature of this measure is that it gives both a prediction and a measure of uncertainty. When the prediction is close to 0 or 1, it means that all the models agree to predict 0 and 1 respectively; when the prediction is around 0.5, it means that half the models predict 1 and the other half 0

  • Weighted mean of probabilities (PMW)—this algorithm returns the mean weighted (or, more precisely, this is the weighted sum) by the selected evaluation method scores

To measure the role of the chosen EM on the prediction output, the spatial variability between mean, weighted mean and median-based final predictions by means of ANOVA test (Analysis of Variance test, Fisher 2006) was tested. ANOVA provides a statistical test of whether two or more population means are equal, and it is based on the law of total variance, where the observed variance in a particular variable is partitioned into components attributable to different sources of variation. Furthermore, pairwise differences between methods were verified by means of the Tukey honestly significant difference test (Tukey 1953), one of the most recommended and used procedures for controlling Type I error rate when making multiple pairwise comparisons.

Results

Ensemble forecasting

Each single ML model (ANN, GBM and MaxEnt) was applied with 100 different test and training combinations (Fig. 7). Every susceptibility map generated in this phase is characterized by small variations and with different values of errors and evaluation scores, subsequently reduced in the EM phase. The average AUC score of the 100 iterations for each model is resumed in Table 1, where it is evident that the best result is obtained by GBM and MaxEnt (0.84 and 0.83, respectively), while ANN result is slightly lower (0.74). Therefore, the three models showed fair results, according to Swets (1988) charts.

Fig. 7
figure 7

Examples of stand-alone susceptibility maps. a ANN. b GBM. c MaxEnt

Table 1 AUC and TSS scores for the stand-alone ML algorithm used, on the top, and for EM on the bottom

Thus, EM was performed by using four different Ensemble techniques: mean of probabilities (PM), median of probabilities (PME), committee averaging (CA) and weighted mean of probabilities (PMW).

The first landslide susceptibility map, shown in Fig. 8a, has been generated through the mean of probabilities (PM). Such map shows a probability of occurrence with a minimum and maximum of 5 and 93%, respectively. It is worth to underline that in all maps, susceptibility was showed using continuous data, without any division in classes, as often adopted in Italy in the framework of landslide risk management. The landslide distribution within the range of susceptibility value is reported in Table 2: the highest concentration of landslide can be found between the values 0.8 and 0.9. Moreover, the landslide distribution is characterized by an increasing trend when going from the lowest to the highest value of susceptibility. Conversely, the areal distribution of the susceptibility values shows a decreasing trend, with large parts of the territory characterized by low values of susceptibility and a limited extent for those characterized by higher values. As for the validation metrics, ROC/AUC test gave a value of 0.910, while the TSS test value was 0.658.

Fig. 8
figure 8

EM susceptibility maps. a Mean ensemble susceptibility map. b Median ensemble susceptibility map. c Committee averaging susceptibility map. d Weighted mean susceptibility map

Table 2 Distribution of landslides according to the probability of occurrence value for each Ensemble model

The second susceptibility map has been produced with the median of probability (PME) EM (Fig. 8b). In this case, the values are ranging between a minimum and maximum value of 0.02 and 0.97, respectively. As previously observed, the main concentration of landslides is located within the areas with a higher level of susceptibility, with an increasing trend (from the lowest to the highest value of susceptibility) (Table 2). The areal distribution is also characterized by a decreasing trend, with a larger extent of lower-susceptibility areas. ROC/AUC and TSS for the selected zone are 0.901 and 0.637, respectively.

By means of the committee averaging (CA) method, values ranging between 0 and 1 were obtained (Fig. 8c). Moreover, the average landslide distribution trend is slightly different, with a quasi-exponential trend, being more than 60% of landslides located in areas with susceptibility values between 0.9 and 1 (Table 2). The areal extent of susceptibility pixels is different as well: larger areas are both distributed in the lower and higher levels of susceptibility. Both aforementioned aspects are nevertheless conditioned by the CA method, which is based on the calculation of binary average values, identified by a cut-off value, to establish the susceptibility. It is indeed a method which tends to overestimate areas with higher susceptibility values, but which can provide an excellent instrument for local administrators and decision-makers since it could represent an application of a “precautionary approach”. ROC/AUC and TSS scores with the CA method are 0.89 and 0.66, respectively.

Finally, the weighted mean of probabilities method (PMW) has been implemented (Fig. 8d). In this case, minimum and maximum values are 0.05 and 0.93, respectively. As in the previous cases, most of the landslides are distributed over the more susceptible areas, as well as the areal distribution presents a decreasing trend, with lower susceptibility values distributed over larger areas (Table 2). ROC/AUC and TSS values are similar to the previous scores, in this case, 0.91 and 0.658, respectively.

To summarize, the use of EM has improved the evaluation scores and therefore the model reliability. The obtained results are also comparable to each other, being all characterized by excellent values of AUC and a good score of TSS (Table 1).

Another significant outcome was obtained, by applying biomod2 library, which allows the assessment of the variable importance for each model. Through this peculiar function, landslide dynamics in the study area and the role of representative PFs can be assessed. Tables 3 and 4 show every variable’s importance for each applied SDM. The first table represents the average importance value of all the environmental variables for each stand-alone model. The sum of all predictors, to any model, is not equal to 1; this is because, as explained in the “Modelling methods” section, every variable must be considered individually. Aspect and slope variables exhibited a higher score respect to the others in all models. The other PFs showed moderate levels of importance, such as land use and terraces state of activity. Table 4 exhibits the important value for each variable in all ensemble models. In this case, five PFs (i.e. “slope angle”, “slope aspect”, “planar curvature”, “terraces” and “land use”) show significant values in all the ensemble models.

Table 3 Variable score for the stand-alone ML models
Table 4 Variable score for the Ensemble models

Uncertainties and similarities of the models

To compare the results obtained from all the performed EMs and to assess their uncertainty, the coefficient of variation (CV) has been computed. CV of probabilities (i.e. standard deviation/mean of each pixel) is a measure of uncertainty of the ensemble models, and it assumes relevant importance when there is a great availability of data. If CV exhibits a high evaluation score, the uncertainty of a given data is high, while the lower the score, the better are the models’ outputs (Fig. 9). As can be seen from Fig. 9, the central and NE areas of the Monterosso al Mare catchment showed lower CV values (light areas), while the other portions of the basin showed high CV values (dark areas). This is an expected pattern, taking into account that the ensemble models highlighted analogous values of susceptibility in the central and NE study areas. However, areas with high CV values can be identified where mainly low susceptibility values are located. In general, the CV map gives a more powerful basis to estimate landslide-susceptible areas of Monterosso al Mare basin along with ensemble maps. However, the CV map could not take into account uncertainty coming from predisposing factors; thus, uncertainties from variables were reduced by using the same input variables for each SDM.

Fig. 9
figure 9

Matrix resulting from the all possible interaction between the coefficient of variation and landslide susceptibility

In this study, a matrix-based approach was used to analyse the relationship between the ensemble models and the uncertainty map (i.e. CV), to generate a “confidence” map (Kim et al. 2018); in this way, the reliability of ensemble models has been assessed. The susceptibility values derived from EM were subdivided into five groups (1–5), while the CV map was categorized into six classes (1–6); in both maps, the Natural Breaks method was applied (Jenks 1967). The matrix was generated across the table to recognize the efficacy of the ensemble map in terms of reducing the uncertainty from the various SDMs. The value of 51 in the matrix table highlights pixels characterized by high landslide likelihood with low variability. Conversely, a value of 16 means that a pixel has a high uncertainty with a low proneness to collapse. The maps in Fig. 10 presents areas with two different typologies, namely, zones with a high probability of landslide occurrence and low uncertainty (i.e. 41, 51) and areas with pixels with low susceptibility and low uncertainty (i.e. 12, 22). Areas with high landslide susceptibility and high uncertainty (56, 55) were hard to recognize.

Fig. 10
figure 10

“Confidence map” resulting from the intersection between coefficient of variation and landslide susceptibility. a Mean (PM). b Median (PME). c Committee averaging (CA). d Weighted mean (PMW). For the colour legend, refer to the matrix of Fig. 9

However, small areas with low probabilities and high uncertainty (i.e. 15, 16) can also be identified. The graphs represented in Fig. 11 confirmed the statistical analyses performed on these maps.

Fig. 11
figure 11

Graph resuming the results of the cross-comparison between landslides and each class of the confidence map (a). Graph resuming the results of the cross-comparison between the number of pixels and each class of the confidence map (b)

As can be observed from the graph in Fig. 11a, most of the landslides (about 75%), in all ensemble models, fall into classes 51 and 41 (i.e. high susceptibility and low uncertainty), confirming how EM correctly identified potential triggering areas, with a low uncertainty value. In addition, landslides falling into low-susceptibility classes are about 3%. The second graph (Fig. 11b) confirms that the largest areas are those with low susceptibility, although the degree of uncertainty varies from low to moderate (classes 13, 14, 15). Nevertheless, pixels with higher susceptibility values are mainly distributed (about 90%) in correspondence of the classes 41 and 51. Therefore, highly susceptible areas are not particularly extended, contrary to those with low susceptibility, and assume low values of uncertainty.

A further comparison between the ensemble susceptibility maps was produced in terms of the spatial distribution of the susceptibility values. Indeed, the spatial similarity between the different models was analysed (Fig. 12). As previously reported, the ensemble maps have been divided into five classes (1–5). Subsequently, an intersection operation between the different maps has been performed. It should be stated that the CA map was not considered in the study since it could have a strong influence on the result, because it tends to overestimate the highest and the lowest susceptibility values. It should be noted that the CA first transforms the probabilities of all ensemble maps into binary values and then it calculates their average. Each pixel of the intersection map represents the different combinations that the individual models can assume. Thus, values labelled as 111 represented pixels in which all models have low susceptibility. Similarly, the pixels labelled as 555 represented areas where all models have high susceptibility. During this operation, intermediate combinations have been also obtained as output (e.g. 121), in which the first and third models have a low susceptibility while the second one has a higher susceptibility.

Fig. 12
figure 12

Spatial distribution of susceptibility classes for each coherent susceptibility level

As in the previous analyses, an intersection operation between the landslides and the output map was performed, to acquire further information. In Table 5, the number of landslides in each specific class and the relative percentage are represented. About 84% of the total landslides fall within “coherent” susceptibility values (e.g. 555, 444, etc.), while the remaining 16% in “non-coherent” classes (Fig. 13a; Table 5). Furthermore, 55% of the total landslides fall into the 555 class, in which all the models are characterized by very high susceptibility. Lastly, about 80% of the landslides fall into the classes with moderate to very high susceptibility. Figure 13b represents the obtained areal extension for each classes. Coherent classes represent 83% of the total study area, where the main part (42%) is characterized by the lowest susceptibility level (class 111), while about 15% is represented by the non-coherent classes (Table 5).

Table 5 Distribution of landslides for each susceptibility level, “coherent” (bold) and “incoherent” (italics)
Fig. 13
figure 13

a Distribution of pixels for each coherent susceptibility level; b Distribution of landslides for each coherent susceptibility level

Discussion and conclusions

Landslide susceptibility mapping represents an essential tool in terms of landslide risk mitigation, especially if carried out in a rigorous way and supported by an accurate dataset. To this, statistic-probabilistic methods represent an ideal synthesis, especially if applied to large areas and when the interaction between geological, geomorphological, hydrological and anthropic PFs is extremely high. However, one of the constraints of statistical methods is represented by the uncertainty associated with every process. Even if many methodologies with the aim of assessing the reliability and evaluating the goodness of statistical methods (such as ROC/AUC, etc.) could be implemented, a percentage of unpredictability is always strictly linked to statistical modelling. One of the main objectives of this work is, therefore, the minimization of such effects, by using different ML methods.

The rationale behind using and merging several models is that two or more models may have very similar predictive performance even when they contain different environmental predictors and/or they yield vastly different spatial predictions, making difficult to know which of the equivalent candidate models to use (Guisan et al. 2017). Furthermore, the “best” model may not be automatically the best one for predictions in a different area or under new conditions (Randin et al. 2006). Combination of several models demonstrates that they produce robust and more stable outputs than single models, which is called “ensemble forecasting” (Araújo and New 2007). The ensemble modelling draws both the main trend (i.e. mean, median, weighted mean) and the general variation (and thus uncertainty) through all models.

In this work, the EM multiple modelling has been carried out 100 times, every time with a different combination of training and test data, for each of the three stand-alone selected methods; later on, a first data calibration has been achieved. The different EMs, obtained by averaging the result of every model, was useful to obtain a summary of all data and therefore to get available information with a strong uncertainty minimization. Ensemble results seem therefore to be encouraging and improved with respect to stand-alone methods, showing ROC/AUC values higher than 0.9, hence excellent according to the various classifications available in the literature (Fressard et al. 2014). Additionally, also TSS values are good (higher than 0.6). As to confirm, landslide distribution in the Monterosso al Mare catchments is mainly concentrated in the areas characterized by high susceptibility. A further refinement of EM reliability has been achieved comparing the ensemble susceptibility maps with the uncertainty value, computed through the CV. Susceptible areas with low uncertainty value provide high-quality outputs, which may be especially useful for decision-makers, willing to obtain trustworthy information. Likewise, non-susceptible areas with low uncertainty value may represent areas where new social and economic activities may potentially be addressed. Moreover, the spatial comparison done through the intersection of the different outcomes of each EM shows a general consistency, with a very high coincidence (about 84%) of susceptible areas computed in different ways, and a general presence of inventoried landslides almost exclusively in high susceptible coherent areas. By using the ANOVA test, no significant differences between the three different ensembling methods have been found (F = 0.244, p = 0.783) and confirmed by the general similarity of AUC of each different ensemble model applied (ranging between 0.9 and 0.91).

The environmental variables chosen as PFs have been selected according to the main characteristics of the territory. Such an assumption is fundamental because many aspects presented here are strictly related to peculiar slope processes occurring within the Cinque Terre area. Indeed, the selection of PFs influencing the landslide triggering is not based on universally recognized guidelines, but it was determined according to the type of landslides and the slope dynamics of Cinque Terre area. By analysing the score derived from the stand-alone and ensemble susceptibility computation, it is possible to observe that geomorphological variables, such as slope and aspect, are very significant and influences the debris cover deposition, the hydrological circulation, evapotranspiration processes and thus erosion, land use and slope dynamics. Among the other variables, an important role is played by those strongly influenced by human activities, such as the presence of terraces and their state of activity and land use. The influence of these factors has been already highlighted in previous works (Cevasco et al. 2013, 2014; Brandolini et al. 2018; Pepe et al. 2019), revealing that terraces abandoned for a short time showed the highest landslide susceptibility. Moreover, slope failures affecting cultivated zones were characterized by a lower magnitude than those occurring on abandoned terraced slopes. Furthermore, the role of vegetation may also be positive, both increasing the evapotranspiration potential and controlling the erosional processes. Besides, the constant maintenance of dry-stone walls, and thus terraces conservation, is a key aspect to preserve the function of dissipating the pore water pressure excess generated behind the wall and subsequently protect and conserve a unique man-made landscape (Camera et al. 2012, 2014). However, a further refinement of input data, such as constant update and new-data implementation, is advisable to obtain susceptibility models which may consider all potential factors leading to instability.

As a final consideration on the methodology applied in this work, further improvements and latest knowledge acquisition about ML algorithms for landslide susceptibility mapping are necessary, to obtain more accurate and less uncertain estimations. A further improvement toward a more refined and precise forecasting of landslide occurrence, both in time and space, can be achieved with the integration of landslide triggering factors such as rainfalls (Segoni et al. 2018), which can support the national civil protection agency in setting-up early warning systems. Moreover, a field cross-validation of LSM is always recommendable to have as most reliable as possible maps, especially when such products are dedicated to public administrations and stakeholders.

Indeed, such tools may be applied in a more suitable way once further and more complete predisposing factors will be identified on the whole Cinque Terre area. These tools will represent potentially a fundamental tool for the safety of people living in the area and for many tourists that every year visit the Cinque Terre National Park. It is worth to underline that in an extremely dynamic and variable environment, such an advanced instrument will be suitable for the management of agricultural terraces, since many cooperatives and wine-makers have started a land recovery during these last years. For all these reasons, an update of environmental variables and therefore of the susceptibility assessment is necessary to guarantee informed land management policies.