Introduction

Landslide hazard mapping at regional scale requires the combination of spatial and temporal probabilities (Remondo et al. 2005; Guzzetti et al. 2006; Corominas et al. 2014; Thiebes et al. 2017). Spatial probability is generally assessed using a variety of statistically based landslide susceptibility models that analyse the relationships between the spatial distribution of past landslide events and a number of geo-environmental factors. These methods have recently been thoroughly reviewed by Reichenbach et al. (2018), who suggested classifying them into (i) classical statistics (e.g. logistic regression, discriminant analysis and linear regression), (ii) index-based methods (e.g. weight of evidence and heuristic analysis), (iii) machine learning (e.g. fuzzy logic systems, support vector machines and random forests), (iv) neural networks, (v) multi-criteria decision analysis and (vi) other statistics. While classical statistics and index-based methods have been widely used since the 1990s (Atkinson and Massari 1998; Carrara et al. 1999), more computationally demanding methods such as machine learning and neural networks have grown in usage in the last decade and are nowadays popular for spatial probability mapping (Catani et al. 2013; Conforti et al. 2014; Youssef et al. 2016; Kalantar et al. 2017; Arabameri et al. 2021). Statistical methods have pros and cons that depend upon scales, datasets, methods and purposes (Fell et al. 2008; Hearn and Hart 2019), and they have also been quite widely used to map the susceptibility to both debris flow initiation and debris flow propagation at regional scale (Mark and Ellen 1995; Delmonaco et al. 2003; Carrara et al. 2008; Chevalier et al. 2013; Heckmann et al. 2014; Bertrand et al. 2013, 2017). On the other hand, the temporal probability of landslide and debris flow occurrence in a given area can be assessed with probabilistic methods based on historical data (Coe et al. 2004), with magnitude-frequency relationships (Hungr et al. 2008) or, ultimately, with rainfall thresholds and their associated exceedance probability in a given period of time (Crosta 1998; Peruccacci et al. 2017). In this respect, it should be pointed out that, while the spatial distribution of (cumulated or averaged) rainfall is sometimes considered and analysed as another geo-environmental factor during landslide susceptibility assessment (Youssef et al. 2016; Ali et al. 2021), the exceedance probability of rainfall thresholds in a given period of time is an independent temporal probability value that can be combined with the spatial probability related to terrain susceptibility in order to assess the hazard of an area in quantitative terms (Guzzetti et al. 2005; Corominas and Moya 2008; Jaiswal et al. 2010; Wu and Chen 2013; Thiebes et al. 2017).

In a similar manner, in this work we aim to map debris flow initiation hazard zones over the Emilia-Romagna Apennines (Italy) by combining regional scale susceptibility modelling (to assess spatial probability) with the regionalization of the exceedance probability of the debris flow rainfall thresholds recently proposed by Ciccarese et al. (2020) (to assess temporal probability). Debris flow accumulation fans account for only 0.2% of the total landslide deposit area in the Emilia-Romagna Apennines (Emilia-Romagna Region 2018a), and known debris flow events are a few hundred out of more than 14 thousand landslide records (Piacentini et al. 2018), making them significantly less frequent in space and time than other types of landslides such as earth slides and earth flows (Ronchetti et al. 2007; Bertolini et al. 2017; Mulas et al. 2018). Nevertheless, debris flows must be considered a significant potential threat to human activities in this region, mostly because their rapidity can cause widespread damage and, potentially, casualties. This was made evident by the multiple-occurrence debris flow events of October 2014 and September 2015 in the provinces of Parma and Piacenza, respectively. On those occasions, clusters of rainstorms associated with large mesoscale convective systems triggered more than a hundred debris flows altogether within a few hours, causing severe damage to many roads and some houses and having a remarkable geomorphological impact on slopes and streams (Corsini et al. 2015, 2017; Ciccarese et al. 2016, 2017; Scorpio et al. 2018). The relevance of debris flows in the Emilia-Romagna Apennines is also evidenced by the remarkable effects and damage reported by various authors for tens of other debris flows that occurred in past decades (Moratti and Pellegrini 1977; Papani and Sgavetti 1977; Rossetti and Tagliavini 1977; Tagliavini 1989; Pasquali 2003).
Therefore, since the existing landslide deposit inventory map (Emilia-Romagna Region 2018a) only outlines large debris flow accumulation fans and sporadic debris flow deposits along the slopes, and gives essentially no indication of the slopes along which these phenomena are more likely to occur in the future, mapping debris flow initiation hazard zones by combining spatial and temporal probability, which is the aim of this research, can fill this information gap and, in perspective, support the improvement of land use and emergency planning in the Emilia-Romagna Apennines.

Methods

Study area and outline of work steps

The study area extends for approximately 15 × 10³ km² between 43°44′23″N and 44°56′13″N and between 9°11′59″E and 12°44′28″E (Fig. 1). It covers the Emilia-Romagna Apennines, i.e. the sector of the northern Apennines located inside the administrative boundaries of the Emilia-Romagna Region (Italy). Elevation ranges from a maximum of 2165 m a.s.l. along the SE-NW watershed of the mountain range to less than 100 m a.s.l. at the Po plain margin. Mean annual rainfall ranges from approximately 1800 mm/year at the higher elevations (with significant snowfall in winter) to only 800 mm/year at the transition to the Po plain. Large convective cells and systems develop in late summer and early autumn, especially in the north-westernmost portions of the study area, producing severe rainstorms that can pour down as much as roughly one-fourth of the yearly precipitation within a few hours. The hydrographic network is dominated by SW-NE directed rivers, most of which are tributaries of the Po river, flowing down the main structurally controlled valleys. Land cover is largely dominated by forests and pastures, although a fairly dense network of villages and roads covers this mountain region. Geologically, the area is dominated by weak and highly fractured sedimentary rocks (Abbate et al. 1970; Bettelli and De Nardo 2001).

Fig. 1

Study area and spatial distribution of the debris flow initiation points used in this research

The main work steps are outlined in Fig. 2. It should first be noted that our analysis is limited to the susceptibility to debris flow initiation and that, as such, it is based on an inventory of initiation points. As Corominas et al. (2014) correctly argued, different landslide types are controlled by different combinations of environmental and triggering factors, and this should be reflected in the susceptibility analysis. For this reason, they also suggested that a landslide inventory should be subdivided (when possible) into several subsets, each related to a particular failure mechanism and linked to a specific combination of causal factors. Consequently, in our case, it would have made no sense to analyse at the same time the susceptibility to debris flow initiation (which is basically related to the triggering of shallow slides) and that to flow propagation and accumulation, which are governed by different topographic, hydrographic and rheological factors and, moreover, are better accounted for by deterministic runout models than by statistical susceptibility modelling. Therefore, a dataset of debris flow initiation points mapped after the 2014 and 2015 events in Parma and Piacenza has been used to train different spatial statistical models (Frequency Ratio [FR], Weight of Evidence [WOE] and Logistic Regression [LR]), which were run with various combinations of geo-environmental causal factors. As specified in the next paragraphs, initiation points have been buffered by conversion to 5 × 5 m grid cells, so that the average values of topographic and other causal factors around the points are considered. The remaining initiation points of the 2014 and 2015 events have been used as independent validation dataset 1 to compute success rate curves (SRC; Chung and Fabbri 2003) for all the spatial models.
The model with the highest area under the curve (AUC) has been further tested using validation dataset 2, which includes debris flow initiation points from other areas of the Emilia-Romagna Apennines that occurred between 1972 and 2016. On the basis of the SRC, the model outputs have been reclassified into spatial probability classes. Furthermore, the annual exceedance probability of the debris flow triggering thresholds for 3 h cumulated rainfall recently published by Ciccarese et al. (2020) has been calculated and regionalized by spatial interpolation. Finally, the multiplication of the debris flow spatial and temporal probability maps returned the debris flow initiation hazard map, which was classified on the basis of the return periods associated with different yearly probability values.

Fig. 2

Workflow adopted in this study for probabilistic debris flow initiation hazard mapping

Spatial probability modelling

Spatial probability models have been trained and validated using a dataset of debris flow initiation points referring to the Parma 2014 and Piacenza 2015 events as well as to other debris flow events that occurred elsewhere in the Emilia-Romagna Apennines in the period 1972–2016. The debris flow initiation points correspond to the centroids of small shallow translational failures at the head of slope incisions and creeks or along their side banks, from which the remobilized deposits mixed with runoff water and evolved into ‘debris flows’ and ‘debris floods’ (Hungr et al. 2001). Consequently, the spatial probability analysis considers the possible initiation of both these types of phenomena. Debris flow initiation points have been mapped on the basis of remote sensing and field surveys for the Parma 2014 and Piacenza 2015 events, and on the basis of existing landslide occurrence databases (Piacentini et al. 2018), publications (Moratti and Pellegrini 1977; Papani and Sgavetti 1977; Rossetti and Tagliavini 1977; Tagliavini 1989; Pasquali 2003) and direct field surveys for other events in the 1972–2016 period. The extent of the shallow translational slope failures during the Parma 2014 and Piacenza 2015 events ranged approximately from 100 to 400 m², and the involved material was coarse granular debris and blocks of sandstone and limestone derived from the weathering of heterogeneous weak rocks such as flysch and block-in-matrix clayey shales (Corsini et al. 2015, 2017, 2019; Ciccarese et al. 2016). Specifically, the dataset includes 136 debris flow initiation points referring to the Parma 2014 and Piacenza 2015 events and 24 debris flow initiation points referring to debris flow phenomena that occurred in the period 1972–2016 elsewhere in the Emilia-Romagna Apennines (Fig. 1 and Table 1).

Table 1 Composition of the debris flow initiation point datasets

A spatial training dataset has been generated by randomly selecting 60% of the Parma 2014 and Piacenza 2015 debris flows (i.e. 82 points), while validation dataset 1 has been created using the remaining 40% of points (i.e. 54 points) and has been used to assess the predictive performance of the spatial models inside the same areas in which the models were trained. A 70/30% partition between training and validation is relatively more common (see for instance Youssef et al. 2016; Kalantar et al. 2017; Arabameri et al. 2021), with some notable exceptions that also use a 50/50% partition (Conforti et al. 2014), and some studies have demonstrated that changing the partition can affect the outputs of spatial models (Shirzadi et al. 2018); nevertheless, our choice of a 60/40% partition has been mostly driven by the need to train the model while also guaranteeing an adequate number of validation points in this relatively data-scarce application. For the same reason, we have also created validation dataset 2, which includes 100% of the debris flow initiation points of the 1972–2016 period (i.e. 24 points) located in other areas of the Emilia-Romagna Apennines and which has served to assess the performance of the selected model outside the training areas. Prior to spatial analysis with FR, WOE and LR, debris flow initiation points have been rasterized with 5 × 5 m grid cells, resulting in one single ‘representative’ grid cell for each debris flow initiation point.
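A minimal sketch of the 60/40 random partition is given below, assuming the initiation points are identified by integer IDs. The helper name `split_points`, the seed and the IDs are hypothetical, since the text does not specify how the random selection was implemented.

```python
import random

def split_points(point_ids, train_fraction=0.6, seed=42):
    """Randomly partition initiation-point IDs into training and validation subsets."""
    rng = random.Random(seed)          # fixed seed makes the partition reproducible
    shuffled = list(point_ids)
    rng.shuffle(shuffled)
    n_train = round(train_fraction * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]

# Hypothetical IDs for the 136 Parma 2014 / Piacenza 2015 initiation points
events_2014_2015 = list(range(136))
training, validation_1 = split_points(events_2014_2015)

# The 24 archive points (1972-2016) form validation dataset 2 in full
validation_2 = list(range(136, 160))

print(len(training), len(validation_1), len(validation_2))  # 82 54 24
```

Changing the seed produces a different, equally valid partition, which is one way to explore the sensitivity to the partition noted by Shirzadi et al. (2018).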

The dataset of geo-environmental spatial causal factors includes slope angle, slope curvature, slope aspect, flow accumulation, distance to streams, land cover and bedrock lithology (Fig. 3a–g). These parameters have been chosen because they are the most influential for the possible triggering of shallow translational slides that can evolve into debris flows. Slope angle is influential for any type of landslide, as landslides are gravity-driven phenomena. Curvature allows concave and convex morphological patterns to be discriminated. Aspect, generally a minor factor, might condition the antecedent soil moisture or the production of regolith and slope debris on differently exposed slopes. Flow accumulation reflects the possibility that subsurface flow parallel to the slope might be a factor in triggering shallow slides, as is assumed in some deterministic methods. Distance to streams can be positively correlated with slides that actually had the possibility to transition into debris flows because they occurred close to streams; it also indirectly accounts for the fact that discharge during rainstorms can erode the slope toe and thus favour shallow slides. Land cover is another rather influential factor for many types of slope instability, especially for shallow slides, where the presence or absence of tree roots might play a role. Bedrock lithology, in this case, has been considered as a proxy of the texture of surface deposits (for which no regional-scale map is available across the Emilia-Romagna Apennines), which obviously controls whether the mobilized debris is coarse enough to originate a debris flow.

Fig. 3

Maps of the geo-environmental causal factors and 3 h rainfall thresholds for 185 rain gauges. For the explanation of land cover types and bedrock lithology codes please refer to Table 2

All the topography-related variables have been derived from a Digital Elevation Model (DEM) of the Emilia-Romagna Apennines at 5 × 5 m grid cells. Slope angle has been classified into 6 classes, five in 10° ranges (in order to use a finer subdivision for the most common slope values) and one class for all values higher than 50° (Table 2). Aspect has been classified into 9 classes corresponding to the eight conventional cardinal and intercardinal directions plus flat. Slope curvature has been ranked into the 9 classes resulting from all the possible combinations of negative, null and positive planar and profile curvature values (a positive planar curvature indicates convexity in the across-slope direction, while a positive profile curvature indicates concavity along the slope dip). Flow accumulation has been calculated in terms of contributing area using a D8 flow algorithm, after pre-processing the DEM with a fill function to eliminate morphological depressions and make it hydrologically consistent. Its subdivision into 7 classes has also taken into account the relative number of initiation points in the classes. Distance to stream (i.e. the Euclidean distance) has been calculated after extracting all stream network elements with a stream definition function. In this case too, the subdivision into 6 distance classes has taken into account the relative number of initiation points at various distances. Land cover has been derived from the 2018 edition of the official land cover map of the region (Emilia-Romagna Region 2018b). This dataset represents Corine classes in the first 3 levels of detail and other additional classes at higher levels of detail. Altogether, 17 classes have been used: classes 1–12 derive from the 2nd level, 13–16 from the 4th level and class 17 from the 5th level of detail (Table 2).
Bedrock lithology has been grouped into 10 classes by aggregating, on a lithological basis, the formations of the geological map of the Emilia-Romagna Region at 1:10,000 scale (Emilia-Romagna Region 2006). Classes correspond to lithotypes ranging from massive rocks to flysch (with different lithic to pelite ratios), marls, olistostrome shales and shales with variable degrees of consolidation and tectonic disturbance. Prior to spatial modelling, all the geo-environmental variables have been rasterized at 5 × 5 m grid cells.
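The class definitions for slope, aspect and curvature can be sketched as raster reclassification functions. This sketch assumes slope and aspect in degrees with flat cells coded as -1 (a common GIS convention, assumed here), and a hypothetical tolerance `tol` below which a curvature value is treated as null, so that the three sign states of planar and profile curvature combine into 9 classes. The class numbering is illustrative only; the codes actually used are those listed in Table 2.

```python
import numpy as np

def classify_slope(slope_deg):
    """Six slope classes: 1 = 0-10°, 2 = 10-20°, ..., 5 = 40-50°, 6 = 50° and above."""
    edges = [10, 20, 30, 40, 50]
    return np.digitize(slope_deg, edges) + 1

def classify_aspect(aspect_deg, flat_value=-1.0):
    """Nine aspect classes: 0 = flat, 1 = N, 2 = NE, 3 = E, ..., 8 = NW (45° sectors)."""
    aspect = np.asarray(aspect_deg, dtype=float)
    # shift by 22.5° so that the N sector (337.5-22.5°) maps to index 0
    sector = np.floor(((aspect + 22.5) % 360) / 45).astype(int) + 1
    return np.where(aspect == flat_value, 0, sector)

def classify_curvature(plan, profile, tol=0.01):
    """Nine classes from the sign (negative, null, positive) of planar and profile
    curvature: code = 3 * planar_index + profile_index + 1, giving 1..9."""
    def sign_index(c):
        c = np.asarray(c, dtype=float)
        return np.where(c < -tol, 0, np.where(c > tol, 2, 1))
    return 3 * sign_index(plan) + sign_index(profile) + 1

print(classify_slope(np.array([5, 15, 49.9, 50, 75])))      # [1 2 5 6 6]
print(classify_aspect(np.array([0, 90, 180, 315, -1])))     # [1 3 5 8 0]
print(classify_curvature(-0.5, 0.5))   # 3: negative planar, positive profile
```

With this illustrative numbering, class 3 corresponds to the negative planar / positive profile combination that is typical of the head zone of slope incisions.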

Table 2 Classes of the geo-environmental causal factors considered in the spatial analysis, relative frequency and WOE contrast

Table 3 presents the Pearson’s correlation coefficient matrix between pairs of raster maps of the selected geo-environmental variables. The correlation coefficient is lower than 0.3 for all pairs, with the only exception of slope and land cover (0.39). Values in these ranges typically indicate a substantial lack of correlation or a very low level of correlation. Consequently, all the geo-environmental variables have been considered suitable for spatial probability modelling and no further multicollinearity tests have been carried out. Furthermore, it is important to underline that, in the areas used for training the spatial models (i.e. the portions of the Parma and Piacenza provinces covered by the training dataset of initiation points), all the classes of the geo-environmental causal factors considered (including all the lithological classes) are represented, which allows the results of the statistical analyses to be extrapolated to the entire study area.
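The pairwise correlation check can be sketched with NumPy, assuming the factor rasters are co-registered 2-D arrays of identical shape and that categorical factors (land cover, lithology) are stored as numeric class codes. The function names and the synthetic rasters below are hypothetical and only illustrate the mechanics.

```python
import numpy as np

def raster_correlation_matrix(rasters, nodata=None):
    """Pearson correlation between every pair of co-registered rasters.
    Cells that are NaN (or equal to `nodata`) in any layer are excluded."""
    stack = np.stack([np.asarray(r, dtype=float) for r in rasters])
    valid = np.all(np.isfinite(stack), axis=0)
    if nodata is not None:
        valid &= np.all(stack != nodata, axis=0)
    samples = stack[:, valid]          # shape: (n_factors, n_valid_cells)
    return np.corrcoef(samples)        # rows/columns follow the input order

def flag_pairs(corr, names, threshold=0.3):
    """List factor pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(corr[i, j]) >= threshold:
                flagged.append((names[i], names[j], round(float(corr[i, j]), 2)))
    return flagged

# Synthetic rasters: the second layer is built to be partly correlated with slope
rng = np.random.default_rng(0)
slope = rng.normal(30, 10, size=(100, 100))
land_cover_code = 0.4 * slope + rng.normal(0, 10, size=(100, 100))
lithology_code = rng.integers(1, 11, size=(100, 100)).astype(float)

corr = raster_correlation_matrix([slope, land_cover_code, lithology_code])
print(flag_pairs(corr, ["slope", "land cover", "lithology"]))
```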

Table 3 Pearson’s correlation coefficient matrix between pairs of the raster maps of selected geo-environmental variables

Spatial probability modelling has been carried out using the FR, WOE and LR statistical methods. The FR method is a very simple bivariate method: the proportion of event cells (i.e. debris flow initiation cells) falling in a given class of a causal factor is divided by the proportion of the study area occupied by that class. A ratio higher than one indicates a probability higher than average, and vice versa. The overall ‘susceptibility’ score is obtained by the pixel-by-pixel sum of the FR values of all the causal factor classes occurring at that specific location. The WOE is a slightly more complex bivariate statistical method based on Bayes’ theorem (Lee 1989; Agterberg et al. 1993; Bonham-Carter 1994; Denison et al. 2002). It computes the prior (unconditional) and the posterior (conditional) probability of an event (i.e. a debris flow initiation); for each class of the causal factors considered, a positive and a negative weight (W+ and W−) are calculated, and the difference between the two weights (W+ − W−) is the so-called contrast (C). A positive (or negative) contrast indicates a positive (or negative) statistical correlation between the class of the causal factor and the event. The overall ‘susceptibility’ score is obtained by a linear combination of C values. The LR is a multivariate method (Cox 1958; Agterberg et al. 1993) based on maximum likelihood estimates obtained by transforming the dependent variable into a logit variable (i.e. the natural log of the odds of the event occurring or not). LR can be applied even if the variables show conditional dependence, and it can be used with categorical or continuous variables, even if they are not normally distributed (Hosmer et al. 2013). To compute the probability of occurrence of an event (i.e. debris flow initiation) for a given combination of classes of causal factors, a linear combination of the predictors, ‘y’, ranging between −∞ and +∞, is transformed through an s-shaped (logistic) curve into a probability (p) between 0 (as ‘y’ approaches −∞) and 1 (as ‘y’ approaches +∞). This linear combination, or Z-value, sums the regression parameters (estimated through the maximum likelihood criterion) associated with each independent variable (i.e. class of causal factor); each parameter expresses the relative contribution of that class to the occurrence of the event (a positive coefficient for a positive correlation and vice versa).
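The FR and WOE computations described above, together with the logistic link used by LR, can be sketched for a single causal factor as follows. This is a minimal illustration on toy data, assuming every class contains at least one event cell and one non-event cell (otherwise the WOE weights become infinite). Fitting the LR coefficients by maximum likelihood is not shown and would normally rely on a statistics library; the function names are hypothetical.

```python
import numpy as np

def frequency_ratio(factor_classes, events):
    """FR per class: (share of event cells in the class) / (share of area in the class)."""
    classes = np.asarray(factor_classes).ravel()
    ev = np.asarray(events, dtype=bool).ravel()
    fr = {}
    for c in np.unique(classes):
        in_class = classes == c
        event_share = ev[in_class].sum() / ev.sum()
        area_share = in_class.sum() / classes.size
        fr[int(c)] = event_share / area_share
    return fr

def woe_contrast(factor_classes, events):
    """(W+, W-, C) per class, with C = W+ - W- (Bonham-Carter 1994 form)."""
    classes = np.asarray(factor_classes).ravel()
    ev = np.asarray(events, dtype=bool).ravel()
    n_ev, n_non = ev.sum(), (~ev).sum()
    out = {}
    for c in np.unique(classes):
        b = classes == c
        p_b_ev = (b & ev).sum() / n_ev        # P(B | D): class share among event cells
        p_b_non = (b & ~ev).sum() / n_non     # P(B | not D): class share among other cells
        w_plus = np.log(p_b_ev / p_b_non)
        w_minus = np.log((1 - p_b_ev) / (1 - p_b_non))
        out[int(c)] = (w_plus, w_minus, w_plus - w_minus)
    return out

def logistic_probability(z):
    """Map the LR linear predictor Z (-inf..+inf) to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Toy example: 10 cells, two classes, 3 training initiation cells (hypothetical data)
classes = np.array([1, 1, 1, 1, 2, 2, 2, 2, 2, 2])
events = np.array([1, 1, 0, 0, 1, 0, 0, 0, 0, 0], dtype=bool)
fr = frequency_ratio(classes, events)   # class 1 ≈ 1.67, class 2 ≈ 0.56
w = woe_contrast(classes, events)       # contrast: class 1 = ln 5 ≈ 1.61, class 2 ≈ -1.61
```

In a full model, the per-class FR values (or contrasts) of each factor would be looked up for every grid cell and summed across factors to obtain the susceptibility score.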

The WOE, LR and FR models have been run in parallel using the training dataset of debris flow initiation points and testing various combinations of spatial causal factors (see Table 4). The total number of combinations has been arbitrarily limited to the 13 potentially most significant ones, each including from 3 to 6 causal factor maps. This resulted in a total of 39 predictive models (13 combinations × 3 methods), which have been compared by means of the SRC (Chung and Fabbri 2003) obtained using validation dataset 1. The model with the highest AUC has been further tested against validation dataset 2, in order to assess its performance also outside the training area. On the basis of the SRC obtained with validation dataset 1, the model outputs have been partitioned into 4 classes, including the susceptibility values associated with 0–40% (high), 40–70% (medium), 70–90% (low) and 90–100% (negligible) of the cumulative predicted debris flow initiation areas.
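A sketch of how an SRC, its AUC and the 40/70/90% class breaks might be computed is shown below. It assumes the susceptibility map and the validation cells are given as flat arrays, and it interprets the percentages as the cumulative fraction of validation initiation cells captured when cells are ranked by decreasing susceptibility; this interpretation and the function names are assumptions for illustration.

```python
import numpy as np

def success_rate_curve(susceptibility, validation_mask):
    """Cumulative fraction of area (x) vs fraction of validation cells captured (y),
    ranking cells from the highest to the lowest susceptibility."""
    s = np.asarray(susceptibility, dtype=float).ravel()
    v = np.asarray(validation_mask, dtype=bool).ravel()
    order = np.argsort(-s, kind="stable")
    captured = np.cumsum(v[order]) / v.sum()
    area = np.arange(1, s.size + 1) / s.size
    # prepend the origin so the curve starts at (0, 0)
    return np.concatenate([[0.0], area]), np.concatenate([[0.0], captured]), s[order]

def auc(x, y):
    """Area under the curve by the trapezoidal rule."""
    return float(np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0))

def class_breaks(sorted_scores, captured, fractions=(0.4, 0.7, 0.9)):
    """Susceptibility value at which the curve first reaches each captured fraction."""
    breaks = []
    for f in fractions:
        idx = int(np.argmax(captured[1:] >= f))   # skip the prepended origin
        breaks.append(float(sorted_scores[idx]))
    return breaks

# Toy example: 4 cells, 2 of which host validation initiation points (hypothetical)
x, y, ranked = success_rate_curve([0.9, 0.8, 0.7, 0.1], [True, False, True, False])
print(auc(x, y))                # 0.625
print(class_breaks(ranked, y))  # [0.9, 0.7, 0.7]
```

On a real 5 × 5 m map the breaks separate the high, medium, low and negligible classes; on this tiny example two breaks coincide simply because there are only two validation cells.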

Table 4 Different combinations of geo-environmental causal factors considered in the spatial analysis

In order to convert susceptibility into spatial probability, the Bayesian posterior probability associated with each susceptibility class has been calculated and normalized to a 0–1 (min–max) range. Normalization was necessary because using a few tens of validation ‘landslide’ pixels against the far larger number of pixels making up the validation study area inevitably returns Bayesian posterior probability values strongly shifted toward extremely low values, which would be substantially masked by the (much higher) temporal probability component when the two probabilities are multiplied to assess the combined spatio-temporal probability. Specifically, the Bayesian posterior probability has been normalized to the 0–1 range by taking as 0 (minimum) the probability associated with the susceptibility value below which no validation landslide pixel is found, and as 1 (maximum) the probability associated with the highest susceptibility value corresponding to a landslide pixel in the validation dataset. Furthermore, as the number of pixels with susceptibility values higher than that associated with the maximum probability was limited, they were all assigned to the high probability class. At the same time, since the lowest susceptibility class returned an almost null scaled probability, all the pixels with susceptibility values lower than that associated with null probability have been included in the so-called negligible probability class. In our case, this resulted in a normalized spatial probability of 0.8 for the high susceptibility class, 0.4 for the medium class, 0.2 for the low class and 0 for the negligible class, allowing a spatial probability map at 5 × 5 m grid cells to be obtained.
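The posterior probability per susceptibility class and its min–max normalization might be sketched as below. The anchors `vmin` and `vmax` are passed in explicitly because, in the text, they are set from the susceptibility values below which no validation landslide is found and at which the highest-susceptibility validation landslide occurs; the toy map, the counts and the function names are hypothetical.

```python
import numpy as np

def class_posterior(class_map, validation_mask, classes):
    """P(initiation | class): validation initiation cells in a class / all cells in it."""
    cm = np.asarray(class_map).ravel()
    v = np.asarray(validation_mask, dtype=bool).ravel()
    return np.array([v[cm == c].sum() / max(int((cm == c).sum()), 1) for c in classes])

def minmax_normalize(values, vmin, vmax):
    """Rescale to 0-1 between two anchor probabilities, clipping values outside them."""
    scaled = (np.asarray(values, dtype=float) - vmin) / (vmax - vmin)
    return np.clip(scaled, 0.0, 1.0)

# Toy map of 100 cells in 4 susceptibility classes (1 = highest), hypothetical data
class_map = np.repeat([1, 2, 3, 4], [10, 20, 30, 40])
validation = np.zeros(100, dtype=bool)
validation[[0, 1, 2, 10, 11, 30]] = True          # 3, 2, 1 and 0 hits per class
post = class_posterior(class_map, validation, [1, 2, 3, 4])       # [0.3, 0.1, 0.033, 0.0]
norm = minmax_normalize(post, vmin=post.min(), vmax=post.max())   # [1.0, 0.333, 0.111, 0.0]
```

On a real regional map the raw posteriors would be orders of magnitude smaller than in this toy case, which is precisely why the rescaling is applied before multiplication by the temporal probability.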

Temporal probability assessment by regionalization of triggering thresholds

The debris flow rainfall triggering thresholds by Ciccarese et al. (2020) have been used to support temporal probability assessment. This dataset includes thresholds for 30 min, 1 h, 2 h, 3 h and 6 h cumulated rainfall at 185 rain gauges distributed across the Emilia-Romagna Apennines. In this work, the 3 h rainfall thresholds have been used, ranging from 2 to 151 mm/3 h (Fig. 3h). For the aim of this research, the 3 h rainfall thresholds were preferred to thresholds associated with other durations since (i) a duration of 3 h is reached only during large-scale rainstorms that last for a significant period of time, being related to convective cells or convective systems, and the majority of the debris flows inventoried in the Emilia-Romagna Apennines actually occurred during this type of event; (ii) the 3 h thresholds have a high predictive capability for the multiple-occurrence events of 2014 and 2015 (as evidenced by AUC values from 0.88 to 0.97 in Ciccarese et al. 2020) and are thus quite reliable predictors of debris flow triggering; (iii) the 3 h threshold values are less scattered in space across the Emilia-Romagna Apennines than shorter-duration thresholds, and they better mimic the spatial distribution of rainfall regimes in the various altimetric zones of the study area. The annual exceedance probability (AEP) of the 3 h rainfall threshold at each of the 185 rain gauges has been calculated using the Gumbel probability distribution coefficients ‘α’ and ‘u’ based on long-term precipitation records. The AEP values at the rain gauges have been regionalized by interpolation onto 500 × 500 m grid cells using inverse distance weighting (IDW) rather than geographically weighted regression. Finally, a resampling was performed in order to obtain a map of yearly probability (in the 0–1 range) at 5 × 5 m grid cell size.
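A sketch of the AEP calculation and its IDW regionalization follows. It assumes the Gumbel coefficients describe the annual maxima of 3 h rainfall, with α as the inverse of the scale parameter and u as the location parameter, so that AEP = 1 − exp(−exp(−α(x − u))) for a threshold x. It also assumes an IDW power of 2 (the power actually used is not stated); gauge coordinates, coefficient values and function names are hypothetical.

```python
import numpy as np

def gumbel_aep(threshold_mm, alpha, u):
    """Annual exceedance probability of a rainfall depth under a Gumbel (EV1)
    distribution of annual maxima: AEP = 1 - exp(-exp(-alpha * (x - u)))."""
    x = np.asarray(threshold_mm, dtype=float)
    a = np.asarray(alpha, dtype=float)
    m = np.asarray(u, dtype=float)
    return 1.0 - np.exp(-np.exp(-a * (x - m)))

def idw(xy_gauges, values, xy_targets, power=2.0):
    """Inverse distance weighted interpolation of gauge values onto target points."""
    g = np.asarray(xy_gauges, dtype=float)          # (n_gauges, 2)
    t = np.asarray(xy_targets, dtype=float)         # (n_targets, 2)
    vals = np.asarray(values, dtype=float)
    d = np.linalg.norm(t[:, None, :] - g[None, :, :], axis=2)   # (n_targets, n_gauges)
    out = np.empty(len(t))
    for i, row in enumerate(d):
        hit = row == 0.0
        if hit.any():                      # target coincides with a gauge
            out[i] = vals[hit][0]
        else:
            w = 1.0 / row ** power
            out[i] = np.sum(w * vals) / np.sum(w)
    return out

# Two hypothetical gauges (coordinates in metres) with their own Gumbel coefficients
gauges = [(0.0, 0.0), (1000.0, 0.0)]
aep = gumbel_aep([40.0, 25.0], alpha=[0.08, 0.10], u=[20.0, 15.0])
grid_centres = [(0.0, 0.0), (500.0, 0.0), (750.0, 0.0)]
print(idw(gauges, aep, grid_centres))
```

In practice the interpolation would be evaluated at the centres of the 500 × 500 m cells and then resampled to 5 × 5 m, as described above.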

Hazard mapping

Finally, a debris flow initiation hazard map has been created by multiplying the spatial probability and temporal probability maps at 5 × 5 m grid cell size. The resulting yearly spatio-temporal probability values, i.e. the annual probability of a debris flow initiating in a given grid cell, have been ranked into classes by computing the corresponding return period (the reciprocal of the yearly probability) and grouping the results into the following classes: 11–30 years for high, 30–100 years for medium, 100–300 years for low and > 300 years for negligible hazard.
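The multiplication and return-period classification can be sketched as follows, assuming the two maps are already co-registered arrays at the same 5 × 5 m resolution and that a return period falling exactly on a class limit goes to the higher hazard class (the text does not specify boundary handling). The function name and input values are hypothetical.

```python
import numpy as np

def hazard_classes(spatial_p, temporal_aep):
    """Multiply spatial probability and AEP, convert to return period T = 1/H and
    classify: 3 = high (T <= 30 yr), 2 = medium (30-100), 1 = low (100-300),
    0 = negligible (T > 300 yr, including cells with H = 0)."""
    h = np.asarray(spatial_p, dtype=float) * np.asarray(temporal_aep, dtype=float)
    with np.errstate(divide="ignore"):
        t = np.where(h > 0, 1.0 / h, np.inf)   # null spatial probability -> infinite T
    cls = np.zeros(h.shape, dtype=int)
    cls[t <= 300] = 1
    cls[t <= 100] = 2
    cls[t <= 30] = 3
    return h, t, cls

# Hypothetical cells: normalized spatial probability and regionalized AEP (year^-1)
h, t, cls = hazard_classes([0.8, 0.4, 0.2, 0.0], [0.1, 0.05, 0.02, 0.1])
print(cls)   # [3 2 1 0]
```

Because the spatial probability is zero in the negligible susceptibility class, those cells always end up in the negligible hazard class regardless of the AEP.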

Results

Spatial probability

The success rate curves of the 39 spatial models tested against validation dataset 1 and their AUC are shown in Fig. 4a and Table 5. The SRC are generally quite good and the AUC values rather high (between 0.8909 and 0.9694), pointing to a significant predictive capability of all the models. In general, the LR models perform better than FR and WOE with any combination of causal factors considered. For each modelling method, the models run with causal factor combination 11 (i.e. slope, lithology, curvature, flow accumulation and distance to stream) return the relative maximum AUC (i.e. FR Model 11, WOE Model 24 and LR Model 37; see Fig. 4b).

Fig. 4

Success rate curves (SRC) of models run with different combinations of geo-environmental causal factors and classification of susceptibility values into spatial probability values. Legend: a SRC of all spatial models; b SRC of models with factors combination no. 11; c SRC of LR model 37; d classification of model 37 outputs into spatial probability values

Table 5 Results of the success rate curves for Frequency Ratio (FR), Weight of Evidence (WOE) and Logistic Regression (LR) models run with different combinations of geo-environmental causal factors, expressed in terms of area under curve (AUC)

In absolute terms, with respect to validation dataset 1, the best performance is that of LR Model 37, with an AUC of 0.9694. This model has been further analysed against validation dataset 2, in order to assess its capability to correctly discriminate the location of debris flow initiation points in other areas of the Emilia-Romagna Apennines. The SRC with validation dataset 2 (Fig. 4c) also has a quite high AUC (0.9168), indicating a good predictive capability of the model even outside the training area. The SRC with validation dataset 1 has also been used to classify the outputs of LR Model 37 (i.e. values from 0 to 0.11328) into 4 susceptibility classes. The partition into 4 susceptibility classes and their association with spatial probability classes, according to the approach illustrated in ‘Spatial probability modelling’, resulted in 0.73% of the study area being classified at high spatial probability (normalized probability = 0.8), 1.45% at medium spatial probability (normalized probability = 0.4), 2.01% at low spatial probability (normalized probability = 0.2) and 95.80% at negligible spatial probability (normalized probability = 0) (Fig. 5a).

Fig. 5

Results of the combination of spatial modelling and regionalization of rainfall thresholds for debris flows hazard mapping. Legend: a Spatial probability of debris flows initiation; b temporal probability of rainfall thresholds (annual exceedance probability); c spatio-temporal probability of debris flows initiation; d classified hazard of debris flows initiation

Spatio-temporal probability and hazard map

The regionalization of the annual exceedance probability (AEP) of the 3 h triggering rainfall thresholds resulted in a map of continuous values ranging from 0.0114 to 0.1145 year⁻¹ (Fig. 5b). The spatio-temporal probability map displays values ranging from 0.0 to 0.0892 year⁻¹ (Fig. 5c). Since this map is obtained by multiplying the spatial probability map by the regionalized AEP map, a value of 0 is obtained for each grid cell with null spatial probability. These values have been reclassified into 4 hazard classes corresponding to different return period ranges, obtaining 0.87% of the study area classified as high hazard (i.e. return period 11–30 years), 2.83% as medium hazard (i.e. return period 30–100 years), 0.5% as low hazard (i.e. return period 100–300 years) and the remaining 95.81% as negligible hazard (i.e. return period > 300 years) (Fig. 5d). Substantially, the spatial distribution of hazard values over the study area is governed by the spatial distribution of susceptibility and, consequently, of the associated spatial probability. In practice, only grid cells with a non-null spatial probability of debris flow initiation are considered hazardous at some level. On the other hand, the spatial distribution of AEP values associated with rainfall thresholds acts as a scaling factor of spatial probability, so that areas with similar spatial probability but a different probability of occurrence of triggering rainfall conditions may be classified at different hazard levels.
The influence of temporal probability values on the spatial distribution of hazard is most evident at regional scale, where two mountain sectors similarly characterized by a relatively large number of grid cells at high spatial probability may have a different number of grid cells classified at high or medium hazard, because the temporal probability of occurrence of triggering rainfall is significantly different in the two sectors.
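As a worked illustration of this scaling effect, consider two grid cells A and B in the high spatial probability class (normalized probability 0.8) located in sectors with different AEP. Here P_s and P_t denote spatial and temporal probability (notation introduced only for this example), and the AEP values 0.03 and 0.10 year⁻¹ are hypothetical, chosen within the mapped range of 0.0114 to 0.1145 year⁻¹; the hazard classes follow the return period ranges defined in ‘Hazard mapping’.

```latex
\begin{aligned}
H_A &= P_s \, P_{t,A} = 0.8 \times 0.03 = 0.024~\text{year}^{-1}, &\quad T_A &= 1/H_A \approx 41.7~\text{years} \;(\text{medium hazard}),\\
H_B &= P_s \, P_{t,B} = 0.8 \times 0.10 = 0.080~\text{year}^{-1}, &\quad T_B &= 1/H_B = 12.5~\text{years} \;(\text{high hazard}).
\end{aligned}
```

The same susceptibility class therefore falls into two different hazard classes purely because of the regional difference in the exceedance probability of the triggering rainfall.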

Discussion

Limitations of the hazard map

One intentional limitation of this research is that it only analyses the hazardousness of slopes with respect to the initiation of debris flows. Thus, it does not consider the whole process of runout and deposition which are certainly relevant for hazard assessment. However, taking runout and deposition areas into account at regional scale would have required using training ad validation points referring to runout and deposition zones since, as already mentioned, these processes are governed by different topographic, hydrographic and rheological conditions. Moreover, potential runout and deposition areas are actually better accounted for by using dynamic runout models (Hungr 1995; Hurlimann et al. 2008; Berti and Simoni 2014; Liu et al. 2021). Thus, it was decided that mapping the susceptibility to runout and deposition was beyond the scopes and possibilities of this research. Furthermore, debris flows initiations areas are represented, for spatial analysis purposes, by a single ‘representative’ initiation grid cell 5 × 5 m located at the centroid of the shallow roto-translational slides that triggered debris flows during the 2014 and 2015 events. This is certainly a limitation, as the causal factors considered might actually have a certain variability outside the representative pixel but still inside the real extent of the shallow slide area. Nevertheless, as the extent of such phenomena was indeed quite limited (100 to 400 m2 as already reported in ‘Methods’ and references therein), this variability should be also limited. However, it is undoubtful that we might have introduced some under sampling of the variability of geo-environmental conditions at debris flows initiation areas. At the same time, the identification of initiation point of debris flows from archive data was rather tentative, leading to a quite large uncertainty on the actual conditions at the initiation points. 
For this reason, these events were not used to train the models but only for a second-level validation of model outputs.
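The reduction of each slide to one representative cell can be sketched as follows. This is an illustration under stated assumptions, not the procedure actually used in the study. The slide outline is taken as a simple polygon in projected metric coordinates, its area centroid is computed with the shoelace formula, and the cell is indexed in a hypothetical north-up 5 m grid whose upper-left corner is `(x_origin, y_top)`. Both function names and the grid parameters are illustrative.

```python
import math


def polygon_centroid(vertices):
    """Area centroid of a simple (non-self-intersecting) polygon, shoelace formula.

    vertices: [(x, y), ...] in projected metric coordinates, without repeating
    the first vertex at the end. Works for either vertex orientation.
    """
    twice_area = cx = cy = 0.0
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if twice_area == 0.0:
        raise ValueError("degenerate polygon")
    # Centroid = sum / (6 * A) = sum / (3 * twice_area)
    return cx / (3.0 * twice_area), cy / (3.0 * twice_area)


def grid_cell(x, y, x_origin, y_top, cell_size=5.0):
    """(row, col) of the cell containing (x, y) in a north-up grid whose
    upper-left corner is (x_origin, y_top); hypothetical grid parameters."""
    col = math.floor((x - x_origin) / cell_size)
    row = math.floor((y_top - y) / cell_size)
    return row, col


if __name__ == "__main__":
    # Hypothetical 20 m x 10 m slide scar (200 m²) in a grid anchored at (0, 100).
    scar = [(0.0, 0.0), (20.0, 0.0), (20.0, 10.0), (0.0, 10.0)]
    cx, cy = polygon_centroid(scar)
    print(cx, cy)                         # 10.0 5.0
    print(grid_cell(cx, cy, 0.0, 100.0))  # (19, 2)
```

For a slide of 100 to 400 m², the polygon covers only about 4 to 16 cells of 5 × 5 m, which is why sampling a single cell is expected to miss only a limited part of the local variability.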

The significance of the selected geo-environmental variables at the scale adopted for their analysis also deserves discussion. In this respect, we tested 13 different combinations of geo-environmental variables mapped at 5 m grid cells, derived from original data surveyed at a similar nominal spatial accuracy. The combination that performed best in terms of discriminant capacity included slope angle, standard slope curvature, flow accumulation, distance to stream and bedrock lithology. These are therefore the factors of highest importance in determining debris flow initiation in our area. The role of the morphometric variables in triggering shallow slides is quite straightforward, as they encompass a number of physical and hydro-mechanical factors. The use of a DEM originally at 5 m resolution guarantees that the calculated morphometric variables are representative at this scale of detail. The contrast values obtained by the WOE method (reported in Table 2), which are intuitive indicators of the importance of each class of the parameters used, show a positive correlation for slope angles above 20°, with the maximum correlation for slope angles higher than 50°. For slope curvature, the most influential class (highest contrast) is the one combining a negative planar curvature with a positive profile curvature, which is typical of the head zone of slope incisions and creeks. Flow accumulation is likewise positively correlated with debris flow initiation over many classes. This indicates that, although the factor is influential in general, the extent of the contributing area is not a particularly significant factor. Bedrock lithology plays a role essentially as a proxy for the characteristics of the slope deposits that are mobilized by shallow slides and that initiate debris flows.
An ideal dataset for the analysis would have been a map of slope deposit types, but since such a map was not available for the study area, bedrock lithology was used as a surrogate. Here too, the geological maps used to derive the lithological map were originally surveyed at 1:10,000, so the original information is detailed enough to be resampled at 5 m grid cells. Nevertheless, using a limited number of lithological classes has certainly simplified the real variability of lithological conditions along the slopes, also because the formations involved are in many cases lithologically and structurally complex. Contrast values (see Table 2) indicate a positive correlation with massive rocks, with flysch having a high lithic-to-pelite ratio and with limestones. As expected, the correlation is null or negative with shales and marls and with flysch having a large pelitic component. The absence of land cover from the best-performing combination of factors also requires comment. Since the large majority of debris flows occur in forests (a class with a largely positive contrast value), one would expect this class to be discriminant. However, this land cover class is also by far the most widespread in the study area, which limits the statistical significance of the factor. In this case too, the scale and detail of the original data are sufficient for resampling at 5 m grid cells without altering the information.
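For readers less familiar with the WOE contrast used above, the following sketch computes it for a single factor class from cell counts. It uses the standard weights-of-evidence definitions W⁺ = ln[P(B|S) / P(B|¬S)] and W⁻ = ln[P(¬B|S) / P(¬B|¬S)], where B is the class and S marks initiation cells, with contrast C = W⁺ − W⁻. The counts in the usage lines are hypothetical. How the study handled classes with empty counts is not stated, so this sketch simply refuses them rather than applying any smoothing.

```python
import math


def woe_contrast(ls_in_class: int, cells_in_class: int,
                 ls_total: int, cells_total: int) -> float:
    """Weights-of-evidence contrast C = W+ - W- for one class of a factor.

    ls_in_class:    initiation cells falling in the class
    cells_in_class: all cells in the class
    ls_total:       all initiation cells in the area
    cells_total:    all cells in the area
    """
    stable_total = cells_total - ls_total
    stable_in = cells_in_class - ls_in_class   # non-initiation cells in the class
    ls_out = ls_total - ls_in_class            # initiation cells outside the class
    stable_out = stable_total - stable_in      # non-initiation cells outside the class
    if min(ls_in_class, stable_in, ls_out, stable_out) <= 0:
        # Logarithms are undefined for empty counts; no smoothing is applied here.
        raise ValueError("a count is zero: weights undefined")
    w_plus = math.log((ls_in_class / ls_total) / (stable_in / stable_total))
    w_minus = math.log((ls_out / ls_total) / (stable_out / stable_total))
    return w_plus - w_minus


if __name__ == "__main__":
    # Hypothetical counts for a 100,000-cell area with 100 initiation cells.
    print(woe_contrast(80, 10_000, 100, 100_000))  # over-represented class: positive
    print(woe_contrast(5, 50_000, 100, 100_000))   # under-represented class: negative
```

The second example mirrors the land cover case discussed above. A class that covers a large share of the area needs a correspondingly large share of initiation cells before its contrast becomes strongly positive.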

The approach adopted to assess the temporal probability of debris flows involves a number of assumptions. Some are inherent to the methodology adopted by Ciccarese et al. (2020) to assess triggering rainfall thresholds and are thoroughly discussed in their paper. In this study, for the reasons explained in ‘Temporal probability assessment by regionalization of triggering thresholds’, we considered the 3 h cumulated rainfall thresholds to be the most significant for hazard mapping purposes. The main reason is that thresholds at 3 h cumulated duration are reached only when large convective cells or even mesoscale convective systems occur, which was the condition that led to all the known multi-occurrence debris flow events. This makes the 3 h duration more suitable for our mapping purposes than the 30′ or 1 h thresholds, which, by contrast, can be reached even during single rainstorm events. A positive computational consequence of using 3 h thresholds is that they refer to rainfall events with multi-year return periods. Their annual exceedance probability is therefore lower than one, which is ideal for computing annual probability in the 0–1 year⁻¹ range. The regional spatialization of the annual exceedance probability of thresholds referring to single rain gauges is another source of uncertainty in our analysis. We selected an inverse distance weighted interpolator over alternatives such as kriging because we wanted to preserve the values at the data nodes (i.e. the rain gauges). We also chose not to apply geographically weighted regression. Debris flows are associated with convective rainstorm events, whose spatial intensity upon occurrence cannot be univocally related to ground elevation. These assumptions are nonetheless undoubtedly reflected in the computed hazard values.
Finally, because it is based on past rainfall data, the expected annual probability that triggering rainfall thresholds are exceeded across the study area does not account for the possibility that, under changing climatic trends, high-intensity rainfall events may in future have a different spatial distribution and a higher frequency. In some parts of the study area, the actual probability of future debris flows might therefore be higher than computed in our hazard assessment.
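The sketch below illustrates the two steps discussed above: converting a gauge's threshold return period into an annual exceedance probability, and spatializing it by inverse distance weighting. The Poisson conversion p = 1 − exp(−1/T) and the IDW power of 2 are assumptions, because neither is stated here. The gauge coordinates and return periods in the usage lines are hypothetical. The key property shown is that IDW returns the gauge value unchanged at a gauge location, which is the reason given for preferring it.

```python
import math


def annual_exceedance(return_period_years: float) -> float:
    """Annual exceedance probability of a threshold with return period T (years).

    Assumes Poisson occurrences, p = 1 - exp(-1/T): below 1 for any finite
    positive T and close to 1/T for multi-year return periods.
    """
    if return_period_years <= 0:
        raise ValueError("return period must be positive")
    return 1.0 - math.exp(-1.0 / return_period_years)


def idw(x: float, y: float, gauges, power: float = 2.0) -> float:
    """Inverse distance weighted estimate at (x, y) from (gx, gy, value) tuples.

    At a gauge location the gauge value is returned unchanged, so the
    interpolated surface honours the data nodes.
    """
    if not gauges:
        raise ValueError("no gauges given")
    num = den = 0.0
    for gx, gy, value in gauges:
        d = math.hypot(x - gx, y - gy)
        if d == 0.0:
            return value  # exact at the data node
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den


if __name__ == "__main__":
    # Two hypothetical gauges 10 km apart, with 5- and 20-year return periods.
    gauges = [(0.0, 0.0, annual_exceedance(5.0)),
              (10_000.0, 0.0, annual_exceedance(20.0))]
    print(idw(0.0, 0.0, gauges) == gauges[0][2])  # True: node value preserved
    print(round(idw(2_500.0, 0.0, gauges), 3))    # closer to the first gauge's value
```

No elevation term appears in the weights, consistent with the choice not to apply geographically weighted regression.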

The use of grid cells for landslide susceptibility mapping also has some well-known drawbacks. One is that the number of non-landslide (‘stable’) cells is often much larger than the number of landslide (‘unstable’) cells, resulting in a sampling bias that can affect the classification (Reichenbach et al. 2018). In our case, the consequence was that the outputs of the spatial models were numerically biased toward very low values. This bias was circumvented by reclassifying the model outputs in terms of spatial probability on the basis of the SRC outcomes, although the reclassification itself is a source of uncertainty. Another drawback of grid cell-based analysis is that the resulting maps are often difficult to interpret, as single grid cells with high values are typically surrounded by grid cells with lower values and vice versa. Our results in terms of combined spatial and temporal probability were no exception. Nevertheless, the problem was addressed by post-processing the modelling results, i.e. by grouping spatial–temporal probability values into hazard classes related to key return periods. This significantly limited the pixel-to-pixel variability of the results, making the map much easier to interpret.
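One way an SRC can be used to sidestep the numerical skew of raw model outputs is to classify cells by their rank along the SRC area axis rather than by score value. The following is a minimal sketch of that idea, not the reclassification actually applied in the study. The breakpoints in `breaks` (top 5% of the area, next 10%, remainder) are hypothetical, and converting each class into a spatial probability value would be a further step not shown here.

```python
def reclassify_by_area_rank(scores, breaks=((0.05, "high"), (0.15, "medium"), (1.0, "low"))):
    """Assign classes from each cell's position along the SRC area axis.

    Cells are ranked by decreasing model score; a cell whose cumulative area
    fraction (rank / number of cells) is within the first upper bound gets the
    first label, and so on. Only ranks matter, so a numerical skew of the raw
    scores toward very low values does not change the classes. Ties keep the
    input order, because Python's sort is stable even with reverse=True.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    classes = [None] * n
    for rank, idx in enumerate(order, start=1):
        area_fraction = rank / n
        for upper, label in breaks:
            if area_fraction <= upper:
                classes[idx] = label
                break
    return classes


if __name__ == "__main__":
    # Hypothetical, strongly skewed scores: every value is close to zero.
    scores = [1e-6 * i for i in range(100)]
    labels = reclassify_by_area_rank(scores)
    print(labels.count("high"), labels.count("medium"), labels.count("low"))  # 5 10 85
```

Because only the ordering of cells is used, multiplying all scores by any positive constant leaves the classes unchanged. This is the property that makes a rank-based scheme insensitive to the bias toward very low values.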

Reliability and usability of the hazard map

One way to assess the reliability of a map is to check that known debris flow initiation points substantially correspond to areas characterized by medium to high hazard. As previously mentioned, the spatial distribution of hazard mimics that of spatial probability, scaled according to the temporal probability of occurrence of triggering rainfall events. Consequently, the success rate curves used to compare different statistical modelling approaches and combinations of geo-environmental variables also provide an indirect quantitative indication of the reliability of the hazard map. The significant predictive capability of the spatial model finally used to obtain susceptibility values is evidenced by the SRCs obtained with two different independent validation datasets. Both SRCs show that 90% of the debris flow initiation points can be correctly predicted by mapping only 5% of the study area as hazardous. Furthermore, the AUC of both SRCs is higher than 0.9. Particularly significant is the performance with validation dataset 2, which indicates that the model can discriminate highly susceptible zones even outside the training areas. This is possible because the training areas (i.e. the portions of Parma and Piacenza provinces covered by the training dataset of initiation points) include all the classes of geo-environmental factors found in the regional study area. As a result, the statistical results are significant over the entire study area. On a more qualitative basis, the good agreement between debris flow initiation points and grid cells characterized by medium to high hazard can be seen in the maps presented in Fig. 6, which are examples from different locations inside the study area (Fig. 6a). The first two examples (Fig. 6b, c) refer to areas in the Piacenza and Parma provinces, where a significant number of debris flow initiation points from the 2014 and 2015 multi-occurrence events do fall into grid cells classified at medium to high hazard. The third example (Fig. 6d) represents an area in Reggio Emilia province in which a number of initiation points referring to other past debris flow events correspond well with grid cells classified at medium to high hazard.
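The two metrics quoted above, the fraction of initiation points captured within a given share of the area and the AUC, can be computed from a success rate curve as in the sketch below. The function names are hypothetical, the AUC uses the trapezoidal rule (an assumption, since the integration method is not stated), and the 100-cell input is a toy example rather than study data. Tied scores are ordered by input position, which can slightly change the curve when many cells share a score.

```python
def success_rate_curve(scores, is_initiation):
    """Points (area fraction, fraction of initiation cells captured) of an SRC.

    Cells are sorted by decreasing score; x is the cumulative fraction of the
    area and y the cumulative fraction of initiation cells, both in [0, 1].
    """
    n = len(scores)
    n_pos = sum(1 for flag in is_initiation if flag)
    if n == 0 or n_pos == 0 or len(is_initiation) != n:
        raise ValueError("need matching, non-empty inputs with at least one initiation cell")
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    xs, ys = [0.0], [0.0]
    captured = 0
    for rank, idx in enumerate(order, start=1):
        if is_initiation[idx]:
            captured += 1
        xs.append(rank / n)
        ys.append(captured / n_pos)
    return xs, ys


def auc(xs, ys):
    """Area under the curve by the trapezoidal rule."""
    return sum((xs[i] - xs[i - 1]) * (ys[i] + ys[i - 1]) / 2.0 for i in range(1, len(xs)))


def captured_within(xs, ys, area_fraction):
    """Fraction of initiation cells captured within the top `area_fraction` of the area."""
    best = 0.0
    for x, y in zip(xs, ys):
        if x <= area_fraction:
            best = y
    return best


if __name__ == "__main__":
    # Hypothetical example: 100 cells, the 10 highest-scored are initiation cells.
    scores = list(range(100))
    flags = [i >= 90 for i in range(100)]
    xs, ys = success_rate_curve(scores, flags)
    print(round(auc(xs, ys), 3))          # 0.95
    print(captured_within(xs, ys, 0.05))  # 0.5
```

Applied to the study's validation datasets, `captured_within(xs, ys, 0.05)` would correspond to the "90% of points in 5% of the area" statement, and `auc(xs, ys)` to the AUC values above 0.9.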

Fig. 6

Examples of the hazard map for different provinces in the study area. Legend: a Location of the examples; b Aveto valley (Province of Piacenza); c Corniglio (Province of Parma); d Ventasso (Province of Reggio Emilia); e Casalfiumanese (Province of Bologna); f Montefiore Conca-Saludecio (Province of Rimini)

Another way to assess the reliability and usability of a map from a practical perspective is to check it against ‘geomorphological common sense’ (Hearn and Hart 2019) and against ground truth from debris flows that occur after the map is produced. Grid cells classified as hazardous tend to be concentrated around and along the uppermost branches of creeks and streams. This pattern clearly results from the statistical influence of factors such as distance to stream and flow accumulation, but it is also geomorphologically reasonable. Debris flow initiation points represent shallow slides that convey debris into the drainage network, from which it is then mobilized by solid–liquid discharge. Another feature of the spatial distribution of hazardous grid cells that makes sense from a geological perspective is that they are much more frequent on slopes whose bedrock includes significant amounts of hard rocks such as limestones and sandstones. Indeed, these are the lithologies predominantly involved in debris flows, and they are common in the areas represented in Fig. 6b–d. Conversely, such lithologies are rare in the areas represented in Fig. 6e, f. These areas consequently have very few cells classified as hazardous and, in fact, no record of known debris flow occurrences. Finally, an event that took place in 2020 at Fosso Riaccio (Province of Modena) offered the opportunity to evaluate the reliability of the hazard map for practical purposes. Field surveys showed that the upper initiation point recognized along the slope, as well as some lateral failures that contributed to the event, correspond to areas where grid cells are classified at high to medium hazard (Fig. 7), while the accumulation zone lies in a low initiation hazard zone.
This is consistent with the aim of the adopted method and indicates that the map correctly discriminates the parts of the slopes where debris flow initiation is more probable.

Fig. 7

Debris flow event of 2020 at Fosso Riaccio (Province of Modena). Legend: a Hazard map and location of pictures; b main initiation zone characterized by shallow slope failures; c lateral scouring; d secondary slope failures along the debris flow track; e debris flow deposits along the track; f main debris flow accumulation fan

Conclusions

In this study, we used debris flow initiation information obtained after major multi-occurrence events in 2014 and 2015 in the Emilia-Romagna Apennines, together with data from archives and literature, to assess the spatial probability of debris flow initiation at regional scale. In doing so, relevant thematic information on debris flows and geo-environmental causal factors was collected and processed using grid cell mapping units. We tested different statistical models and combinations of causal factors and quantified their performance using metrics associated with the success rate curves. The outputs of the model with the best spatial predictive performance were reclassified in terms of spatial probability of debris flow initiation. At the same time, we regionalized recently published rainfall thresholds to map the annual probability of debris flow events. Combining these products yielded spatio-temporal probabilities that were ranked into hazard classes using ranges of return periods. The resulting hazard map is consistent both with the spatial distribution of past debris flow initiation points and with geomorphological common sense. Furthermore, after its preparation, it proved quite accurate in predicting the location of a debris flow that occurred in 2020 inside the study area. On this basis, and despite its limitations, we consider the debris flow hazard map sufficiently reliable to complement existing inventory maps in land-use regulation and emergency planning.
More generally, like previous studies that treated the spatial and temporal probabilities of phenomena as independent values to be combined in order to assess hazard in quantitative terms, the approach adopted in this research can be replicated wherever sufficient spatial information on debris flow initiation zones and their possible causal factors, as well as specific rainfall thresholds, is available for the analysis.