Introduction

Landslides are one of the most frequent and widespread natural hazards worldwide (McGuire et al. 2002), causing considerable human and material losses each year (Papathoma-Köhle et al. 2015). Moreover, those effects are increasing globally due to the growth in population, infrastructure, and economic activities, which have expanded into landslide-prone areas, altering the spatio-temporal pattern of exposure (Baum and Godt 2010; Fuchs et al. 2013). Presently, efforts are being made to advance the temporal forecasting of landslides (Melillo et al. 2018; Segoni et al. 2018) and the implementation of early warning systems (Naidu et al. 2017; Piciullo et al. 2018). Achieving this objective involves studying the landslide-triggering factors (floods, earthquakes, volcanic activity), with rainfall being widely recognized as one of the most important landslide triggers (e.g., Crozier 1986; Wieczorek 1996; Zêzere et al. 2015).

The temporal occurrence of rainfall-induced landslides is commonly modeled through empirical thresholds that define the precipitation conditions that must be reached or exceeded for the triggering of landslides (Guzzetti et al. 2007). The calculation of these thresholds is based on the statistical analysis of individual or multiple rainfall events that resulted in landslides in the past (e.g., Caine 1980; Dahal and Hasegawa 2008). When information on rainfall events that triggered and did not trigger landslides is available, thresholds are defined as the best separator of both conditions (e.g., Jakob and Weatherly 2003; Giannecchini et al. 2012; Vaz et al. 2018). The critical rainfall-triggering conditions are described by a single parameter or a combination of several, with the intensity-duration (I-D) (e.g., Jan and Chen 2005; Guzzetti et al. 2008; Brunetti et al. 2010) and the accumulated rainfall-duration (E-D) (e.g., Zêzere et al. 2008; Vennari et al. 2014; Peruccacci et al. 2017) relationships among the most commonly considered.

The usual input data for these models are records from landslide databases (Van Den Eeckhaut et al. 2012) and precipitation data series from national or regional weather station networks, although presently rainfall satellite remote sensing data are becoming an alternative (Brunetti et al. 2018). Thus, the completeness and temporal scale (hourly, daily, monthly) of these data series will control the definition of the landslide events, understood as the number of individual landslides triggered in a wide area by a single triggering event (Crozier and Glade 1999), and therefore the temporal resolution at which the triggering conditions may be forecasted (Caracciolo et al. 2017). Another critical issue is the delineation of the rainfall interval responsible for the landslide initiation. Thereby, some authors establish thresholds only considering the rainfall-triggering event: interval between the moment of initiation of the failure or failures and the time when the rainfall event started (Brunetti et al. 2010; Vessia et al. 2014; Rossi et al. 2017). Other authors consider the antecedent rainfall conditions over a long interval before the time of occurrence of the landslide: 10 days (Aleotti 2004), 15 days (Mathew et al. 2014), 30 days (Giannecchini et al. 2012), or a range of durations from 1 to 90 days (Zêzere et al. 2015), among others.

According to the European Landslide Susceptibility Map, updated in 2018, the Cantabrian and Pyrenean Ranges (N of the Iberian Peninsula) are some of the most landslide-prone areas in Europe. However, few recent works have addressed the temporal occurrence of landslides in these zones. Before 2012, several studies focused on the Eastern Pyrenees set critical rainfall conditions for the occurrence of shallow landslides associated with different precipitation patterns (Gallart and Clotet 1988; Corominas and Moya 1999) and landslides occurring in artificial slopes (Moya and Corominas 1997; Moya 2002). Furthermore, Corominas and Moya (1999), Corominas et al. (2002), and Domènech et al. (2012) calculated thresholds for the reactivation of earthflows and slides in the same area. Sometimes, those critical conditions have been interpreted as regional thresholds, despite the reduced extent of the areas for which they were defined or the limited size of the analyzed landslide datasets. In contrast, only one study (Domínguez-Cuesta et al. 1999; Domínguez-Cuesta 2003) established rainfall-triggering conditions in the Cantabrian Range before 2012. Although new thresholds have been computed by Valenzuela et al. (2018a) for different precipitation patterns and by San Millán Revuelta (2015) for Cantabria province, new work is needed for a better understanding of the rainfall-landslide relationship in this area. Apart from the northern area, other studies have been carried out in the Iberian Peninsula, such as the thresholds calculated by Mateos et al. (2012) for the Tramuntana Range, Palenzuela et al. (2016) for the Betic Range, and the large number of studies conducted in Portugal (e.g., Marques et al. 2008; Zêzere and Trigo 2011; Zêzere et al. 2015; Vaz et al. 2018).

Goals

The main goal of the present research is to calculate empirical antecedent rainfall thresholds for the triggering of landslides in different areas of Asturias for the period 2008–2016 as an approach to understand the rainfall-landslides relationship and to forecast the landslide occurrence in the study area.

Study area

Geography

This research focuses on the Principality of Asturias, a 10,603-km2 Autonomous Community located in the NW of Spain (Fig. 1A). The region constitutes an E-W trending strip of territory bounded by (i) the Cantabrian Sea to the north, along a cliff-fronted coastline of almost 400 km, and (ii) the Cantabrian Range to the south, whose watershed is over 1200 m and reaches its highest elevation in Torrecerredo Peak (2684 m asl). The short distance between the axis of the Cantabrian Range and the coast, ranging from 75 km in the west to less than 30 km in the east, implies a large altimetric difference and a rugged orography. The center of the region and a 1–5-km-wide strip along the coastline (23% of the study area) show a low relief, while most of the hinterland (77% of the region) shows a mountainous landscape.

Fig. 1 Study area and characteristics of the landslide dataset. A Location of the Principality of Asturias and main geographical features. B Monthly distribution of the whole BAPA dataset and the selected landslide records throughout the 8-hydrological-year study period; comparison with monthly precipitation records. C Monthly landslide distribution during the hydrological year

Asturias has a population of 1,051,229 inhabitants (INE 2015), very unevenly distributed across the territory. The center is the most densely populated area due to the location of (i) the three major cities and industrial centers: Oviedo, Gijón, and Avilés (more than 1000 inhabitants km−2) and (ii) the Central Coalfield: a strongly human-altered area due to the intense coal mining activity. In contrast, the remaining territory, characterized by a rural economy, has population densities lower than 100 inhabitants km−2 (SADEI 2016). Moreover, the whole region is characterized by a dispersed population pattern, generating a very dense infrastructure network (e.g., roads, railway lines, water supply, electricity network). All these factors have resulted in a highly human-altered landscape throughout the Principality of Asturias.

Geology

The bedrock is made up of (i) a Precambrian and Paleozoic basement that constitutes the majority of the Asturian territory and (ii) a discordant upper Permian, Mesozoic, and Paleogene cover that overlies the basement in the central and north-eastern areas. The Paleozoic basement, strongly folded and fractured, is divided into two zones: the West-Asturian Leonese Zone (Marcos 2004) to the west, with a predominance of siliceous materials affected by metamorphism, and the Cantabrian Zone (Alonso et al. 2009) to the east. Within the Cantabrian Zone, the most notable units are the Central Coal Basin, with an alternation of different materials and coal beds, and the Picos de Europa Unit, with a predominance of limestones. On the other hand, the discordant cover is composed of slightly deformed siliciclastic and calcareous materials (Alonso et al. 1996). The characteristics of the Asturian bedrock, together with the uplift of the Cantabrian Range during the Alpine Orogeny and the climatic variability during the Cenozoic, have conditioned the development of the relief through glacial activity (Rodríguez-Rodríguez et al. 2015), fluvial incision (Jiménez-Sánchez et al. 2014), and karst processes (Ballesteros et al. 2015).

The current geomorphological regime is dominated by coastal, fluvial, and slope instability processes, with landslides being the most widespread cause of natural hazards (Ferrer Gijón 1995). Landslides cause considerable economic costs, estimated at 66 million euros every year, and sometimes human victims (González Moradas and Lima de Montes 2001). During the Cenozoic, huge mass movements linked to past climatic and tectonic conditions were common in Asturias, as shown by the abundant relict deposits. In contrast, most of the landslides occurring nowadays show a reduced size (metric to decametric) and depth (typically less than 3 m), usually affecting the Quaternary deposits or the surficial weathered layer of the bedrock. Previous studies have pointed to the Central Coalfield and the Permian-Mesozoic cover as the most susceptible areas for the occurrence of landslides (Marquínez et al. 2003; Domínguez-Cuesta et al. 2007). Rainfall has been widely recognized as the main cause of slope instabilities (Domínguez-Cuesta et al. 1999; Domínguez-Cuesta et al. 2017), without forgetting the preparatory role played by the soil moisture (Valenzuela et al. 2018a). Moreover, other factors are also present, such as: (i) the temperature in the case of rockfalls (Menéndez-Duarte and Marquínez 2002), (ii) the influence of natural fluvial or coastal processes (Domínguez-Cuesta et al. 2018), and (iii) the human activity (Martínez et al. 2005).

Climate

The Asturian region shows a characteristic Oceanic climate. According to the Köppen-Geiger classification (Peel et al. 2007), the majority of the area presents a Temperate climate Cfb, without a dry season and with a temperate summer, while the SW area shows a Temperate climate Csb, with a dry and temperate summer; moreover, the highest mountainous areas present a Cold climate Df (García Couto 2011). Average annual precipitation and temperature recorded in meteorological stations range from 960 mm and 13.3 °C in the lowest areas near the coast to 1688 mm and 11.4 °C in the mountainous areas (AEMET 2012). Despite the absence of a well-defined dry season in the Asturian climate, the maximum monthly precipitation values are concentrated in the period October–May, while the minimum precipitation values are recorded between June and September (hereinafter referred to as the wet and dry periods, respectively). The average annual number of rainy days is 123 (> 1 mm in 24 h), with 30 moderate rainy days (> 10 mm in 24 h) and 4 heavy rainy days (> 30 mm in 24 h) (Botey et al. 2013).

Due to (i) the orientation of the major geographical features, (ii) the proximity between the sea and the Cantabrian Range, and (iii) the prevailing winds, the altitude and the orographic effect play a predominant role in the rainfall distribution over the Asturian region (Arasti et al. 2002). Thus, the central area shows the lowest amount of precipitation (< 1100 mm yr−1), while the mountainous areas show mean annual precipitation values higher than 1500 mm yr−1. Two precipitation patterns are the most frequent: (i) prolonged episodes of moderate-intensity frontal rainfall enhanced by orographic forcing and (ii) brief episodes of heavy rainfall due to strong instability of air masses (Valenzuela et al. 2018a). The first pattern, common between October and May, is due to the succession of fronts associated with extratropical cyclones over weeks or even months. In contrast, the rainfall between June and September is primarily due to convection associated with mid-level troughs or cut-off lows lasting hours or days (Valenzuela et al. 2018b).

Data and methodology

Input data

The empirical data used in the proposed analysis come from the BAPA: Base de datos de Argayos del Principado de Asturias-Principality of Asturias Landslide Database (www.geol.uniovi.es/BAPA), a landslide inventory developed by the University of Oviedo for the Asturian territory (NW Spain). Argayo is the term used in Asturleonés, a traditional language from the NW of Spain, to refer to landslides. Although Valenzuela et al. (2017) published and analyzed the contents of the database up to the year 2015, the BAPA continues to be updated, so the present research considers a dataset covering the period 1980–2016.

The database gathers a wide range of information about each recorded landslide, including the following: (i) date of occurrence, (ii) location coordinates, (iii) size and type of landslide, (iv) triggering factor, and (v) damage and costs caused. Data collection has mainly been performed by searching regional newspaper archives (86% of the database entries), although additional reports from citizens and institutions have also been considered (14% of the entries). On this basis, 2245 individual landslides (Ls) have been recorded for the period 1980–2016, giving an average annual value of 60.7 Ls yr−1 and a density of 0.2 Ls km−2. The dataset shows relevant variations in its spatio-temporal distribution over this time span and the study area. Thus, the recorded landslides are more concentrated in the central area of Asturias. Moreover, the number of records increases over time due to the improved availability of data sources, so that the highest completeness of the BAPA database has been reached since 2008. From 2008 to 2016, the greater availability of data also implies a more homogeneous distribution of the records across the study area.

A substantial effort has been made to assess the reliability of the spatio-temporal information about the landslides. For the temporal assessment, three categories have been defined: Level 1: exact date, Level 2: date with an uncertainty of 2–5 days, and Level 3: date with an uncertainty greater than a week. In the case of the spatial data, the combined use of photos and descriptions from the press archives and reports, free cartographic servers (Google Maps-Google Street View and Iberpix), and field work has allowed us to geo-locate the landslides and assess the accuracy of their location, also defining three categories: Level 1: exact coordinates (1:1000 scale), Level 2: location with an uncertainty of 100–500 m (1:10,000 scale), and Level 3: location with an uncertainty higher than 500 m (1:25,000 scale or greater). The assessment process, described in detail in Valenzuela et al. (2017), has revealed that the spatio-temporal accuracy of the records increases from 1980 to 2016, reaching the highest levels from the year 2008; the exact coordinates are known in 60% of the recorded landslides, while the exact date is known in 51% of the cases; conversely, hourly data are only available for 285 landslides (22%).

Selection of the study period and the landslide dataset

The first step of the analysis was to select a study period within the BAPA database. The period was delimited considering the definition of the hydrological year in Spain (from October to September) and the highest level of completeness. In addition, it was considered essential that the records showed the highest level of spatio-temporal accuracy. A preliminary analysis of the climatic conditions within the selected period was performed considering precipitation records from the AEMET (Agencia Estatal de Meteorología-Spanish Meteorological Agency). Within this period, the landslide dataset selected for the analysis includes only those records from the BAPA database with an exact date. The available information about the triggering factors of the selected landslides was reviewed in order to remove those with a known origin different from rainfall, such as fluvial erosion, coastal erosion, or human causes (slope cut works in progress, blasts); the records with unknown origin were retained in the analysis.

Selection and processing of the precipitation data

The daily precipitation data series used for the analysis, covering the period 1979–2016, come from the AEMET. Precipitation data correspond to rain, snow, sleet, and hail. Although the Asturian region has a relatively dense network of 246 rain gauges (1 gauge every 43.1 km2) managed by a government agency, the majority of the rainfall series only include daily measurements with significant data gaps; e.g., for the present research, only 19 precipitation data series with daily records and a completeness higher than 90% were available for the period 1979–2016. For this reason, 6 weather stations were selected as representative of different regions within the study area, taking into account their geographic location, the spatial distribution of the annual precipitation (Arasti et al. 2002), and the availability of consistent daily data series, showing a completeness of almost 90% during the period 1979–2016. Linear regression between pairs of data series from the nearest rain gauges was used to fill the gaps in the series on the basis of the equation y = mx + b (Sattari et al. 2016). The requirement of data series with a completeness higher than 90% attempts to minimize the limitations of the linear regression method. In order to preserve the days without precipitation recorded in the original data series (daily records x = 0) in the new completed series, b was set to 0 in all cases. The best correlation was determined separately for the wet and dry periods (October–May and June–September, respectively) in each data series on the basis of the coefficient of determination R2. All the records with values lower than 0.1 mm were deemed to be 0. Also, those daily records that resulted from the accumulation of precipitation during two or more days were divided among the corresponding number of days. Finally, the precipitation data series were normalized by dividing each daily record by the mean annual precipitation (MAP) of the corresponding rain gauge.
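The gap-filling and normalization steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the use of `None` for missing days, and the plain least-squares estimate of m with b fixed at 0 are assumptions made for illustration, and the seasonal split (one slope for the wet period, one for the dry period) is left to the caller.

```python
def fit_slope_through_origin(reference, target):
    """Least-squares slope m for target = m * reference, with intercept b = 0.

    Only days where both series have a record (not None) are used.
    """
    pairs = [(x, y) for x, y in zip(reference, target)
             if x is not None and y is not None]
    sxx = sum(x * x for x, _ in pairs)
    if sxx == 0:
        raise ValueError("reference series has no rainfall on shared days")
    return sum(x * y for x, y in pairs) / sxx


def fill_gaps(target, reference, m):
    """Fill missing target days with m * reference; set values < 0.1 mm to 0."""
    filled = []
    for t, r in zip(target, reference):
        value = t if t is not None else (m * r if r is not None else None)
        if value is not None and value < 0.1:
            value = 0.0
        filled.append(value)
    return filled


def normalize_by_map(series, map_mm):
    """Divide each daily record by the mean annual precipitation (MAP, in mm)."""
    return [None if v is None else v / map_mm for v in series]


# Hypothetical four-day example (values in mm); the third target day is missing.
reference = [0.0, 10.0, 20.0, 5.0]
target = [0.0, 12.0, None, 6.0]
m = fit_slope_through_origin(reference, target)
completed = fill_gaps(target, reference, m)
```

Because b is fixed at 0, a dry day in the reference series always yields a dry day in the completed series, which is the property the text asks for.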

Definition of the landslide events

The shortest-distance criterion was adopted to determine the most representative rain gauge for each landslide and was applied using ArcGIS. Thus, the spatial distribution of the landslides assigned to each rain gauge by proximity allowed the influence area of each rain gauge to be outlined, with specific limits defined using the Thiessen polygon method (Fiedler 2003). Moreover, one or more landslides reported on the same day and related to the same representative rain gauge were deemed a landslide event. A single landslide reported on a given date was enough to define a landslide event, taking into account that the analyzed landslide dataset comes mainly from the review of press archives (Zêzere et al. 2015). The discrepancy between daily precipitation records, measured from 07:00 UTC to 07:00 UTC of the following day, and landslide records, dated by calendar day, made it necessary to review them carefully, wherever data about the time of occurrence or the moment of the day were available, to avoid time lags between both data series.
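As a rough illustration of the event definition, the sketch below assigns each landslide to its nearest gauge and groups records by date and gauge. The study performed this step in ArcGIS; the planar coordinates, gauge identifiers, and function names here are hypothetical, and the 07:00 UTC rain-day adjustment is not modeled.

```python
import math
from collections import defaultdict


def nearest_gauge(point, gauges):
    """Return the id of the gauge closest to `point` (planar x, y coordinates)."""
    return min(gauges, key=lambda gauge_id: math.dist(point, gauges[gauge_id]))


def define_events(landslides, gauges):
    """Group landslides into events keyed by (date, nearest gauge).

    `landslides` is an iterable of (date, x, y); the value stored for each
    event is the number of landslides it contains.
    """
    events = defaultdict(int)
    for date, x, y in landslides:
        events[(date, nearest_gauge((x, y), gauges))] += 1
    return dict(events)


# Hypothetical gauges and landslide records (coordinates in km).
gauges = {"G1": (0.0, 0.0), "G2": (50.0, 0.0)}
landslides = [
    ("2010-06-10", 5.0, 5.0),
    ("2010-06-10", 10.0, 0.0),
    ("2010-06-10", 45.0, 3.0),
    ("2010-06-11", 2.0, 1.0),
]
events = define_events(landslides, gauges)
```

Assigning by nearest gauge is equivalent to locating each landslide inside a Thiessen polygon, so the two descriptions in the text refer to the same partition of the study area.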

Definition of critical antecedent rainfall periods

The critical antecedent rainfall periods for the triggering of landslides were calculated following the methodology developed by Trigo et al. (2005) and Zêzere et al. (2005). The precipitation data considered for each rain gauge cover a period of 37 hydrological years (October 1979–September 2016). For each day of this period, antecedent rainfall conditions were computed as the accumulated absolute precipitation for different fixed intervals: 1, 2, 3, 4, 5, 10, 15, 30, 40, 60, 75, and 90 previous days. From these series, the maximum antecedent rainfall value reached in each of the study years was determined for each fixed interval. In the case of the days with recorded landslide events, the antecedent rainfall conditions with the highest return period, also interpreted as the lowest exceedance probability, were assumed to be the critical combination of accumulated rainfall (E)-duration (D) for the occurrence of instabilities. However, those events with return periods lower than 3 years were removed as a way of discriminating between landslide events clearly related to extraordinary rainfall and more frequent landslide events in which this relation is not so clear. The return period was calculated on the basis of the Gumbel distribution (Gumbel 1958). The methodology described above was applied to the events defined for each rain gauge. After the selection of the critical E-D conditions with a return period higher than 3 years, the remaining conditions were also analyzed taking into account the precipitation data series of the nearest rain gauges to search for other possible correlations.
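A minimal sketch of this procedure follows. Several choices are assumptions rather than details given in the text: the accumulation window is taken to end on the event day itself, the Gumbel parameters are estimated by the method of moments, and all function names are illustrative.

```python
import math
from statistics import mean, stdev

DURATIONS = (1, 2, 3, 4, 5, 10, 15, 30, 40, 60, 75, 90)
EULER_GAMMA = 0.5772156649


def antecedent_rainfall(daily, day_index, duration):
    """Rainfall accumulated over `duration` days ending at `day_index` (inclusive)."""
    start = max(0, day_index - duration + 1)
    return sum(daily[start:day_index + 1])


def gumbel_return_period(annual_maxima, value):
    """Return period (years) of `value` under a Gumbel fit to the annual maxima.

    Method-of-moments estimates are an assumption of this sketch; the annual
    maxima must contain at least two distinct values.
    """
    beta = stdev(annual_maxima) * math.sqrt(6) / math.pi
    mu = mean(annual_maxima) - EULER_GAMMA * beta
    non_exceedance = math.exp(-math.exp(-(value - mu) / beta))
    if non_exceedance >= 1.0:
        return math.inf
    return 1.0 / (1.0 - non_exceedance)


def critical_condition(daily, day_index, annual_maxima_by_duration):
    """Return (E, D, T) for the duration with the highest return period."""
    best = None
    for duration in DURATIONS:
        e = antecedent_rainfall(daily, day_index, duration)
        t = gumbel_return_period(annual_maxima_by_duration[duration], e)
        if best is None or t > best[2]:
            best = (e, duration, t)
    return best
```

An event would then be kept only if the T returned by `critical_condition` exceeds 3 years, mirroring the filter described in the text.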

Definition of the antecedent rainfall thresholds

The methodology used (Trigo et al. 2005; Zêzere et al. 2005) describes a threshold as a curve in the E-D space that separates the rainfall conditions that have triggered landslides from those conditions that have not triggered landslides. Three empirical antecedent rainfall thresholds were implemented for each rain gauge: (i) a threshold coincident with the best-fit line (Thbest-fit); (ii) a minimum threshold below which no landslide events were recorded (Thmin); and (iii) a maximum threshold over which all the conditions resulted in landslides (Thmax). For each rain gauge, the best-fit line threshold was calculated through linear regression considering all the E-D conditions that triggered landslides, while the minimum threshold was calculated only taking into account the two lowest E-D conditions that triggered landslides. The remaining E-D conditions that did not trigger known landslides were also represented, except the following: (i) all the E-D pairs (from 1 to 90 days) of the days with known landslide events, both in the influence area of the rain gauge analyzed and in the surrounding ones, and (ii) all the E-D pairs (from 1 to 90 days) of the days following a landslide event with accumulated rainfall values over the minimum thresholds defined for the rain gauge. Critical E-D conditions defined for the day of the landslide event were used to determine whether the threshold is exceeded in the following days. Thus, the maximum thresholds were represented considering the two highest E-D conditions that did not trigger any known landslide for each rain gauge. The landslide occurrence probability represented by each threshold was expressed as the percentage of the conditions that triggered events over the total conditions (those that triggered and those that did not trigger events) above the corresponding line.
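The best-fit threshold and the occurrence probability can be sketched as follows, assuming a straight line E = slope · D + intercept fitted by ordinary least squares in linear E-D space and treating points exactly on the line as exceeding it. The function names and the sample points are hypothetical.

```python
def best_fit_line(points):
    """Ordinary least-squares line through (D, E) points: returns (slope, intercept)."""
    n = len(points)
    mean_d = sum(d for d, _ in points) / n
    mean_e = sum(e for _, e in points) / n
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxy = sum((d - mean_d) * (e - mean_e) for d, e in points)
    slope = sxy / sxx
    return slope, mean_e - slope * mean_d


def line_through(p, q):
    """Line through two (D, E) points with different D, e.g. for Thmin or Thmax."""
    slope = (q[1] - p[1]) / (q[0] - p[0])
    return slope, p[1] - slope * p[0]


def occurrence_probability(line, triggering, non_triggering):
    """Fraction of triggering conditions among all conditions on or above the line."""
    slope, intercept = line
    hits = sum(1 for d, e in triggering if e >= slope * d + intercept)
    misses = sum(1 for d, e in non_triggering if e >= slope * d + intercept)
    total = hits + misses
    return hits / total if total else float("nan")


# Hypothetical (D in days, E in mm) conditions.
triggering = [(1, 10.0), (2, 20.0), (3, 30.0)]
non_triggering = [(2, 25.0), (2, 5.0)]
line = best_fit_line(triggering)
probability = occurrence_probability(line, triggering, non_triggering)
```

Multiplying the returned fraction by 100 gives the percentage form used in the text.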

Evaluation of the thresholds

Contingency tables and Receiver Operating Characteristics (ROC) metrics, previously implemented by many authors (e.g., Staley et al. 2012; Gariano et al. 2015; Zêzere et al. 2015), were adopted to quantitatively characterize and assess the performance of the thresholds. Contingency tables use the thresholds as a binary classifier of the E-D conditions that are likely or not likely to result in landslides; these conditions are classified as True (it occurs) or False (it does not occur), and the predictions are classified as Positive (successful) or Negative (failed). Thus, the E-D points are categorized into four possible contingencies: (i) True Positives (TP), points above the threshold that resulted in a landslide event; (ii) True Negatives (TN), points below the threshold that did not result in a landslide event; (iii) False Positives (FP), points above the threshold that did not result in a landslide event; and (iv) False Negatives (FN), points below the threshold that resulted in a landslide event. On this basis, four skill scores were calculated for each threshold following the terminology proposed by Staley et al. (2012): (i) True Positive rate (TPR) shows the proportion of events correctly predicted; (ii) False Positive rate (FPR) shows the proportion of positive predictions when the event did not occur, also called false alarms; (iii) False Alarm rate (FAR) is the ratio between the number of false alarms and the number of true forecasts; and (iv) Threat Score (TS) is a measure of the overall performance of the threshold where a perfect model score would equal one, and each incorrect prediction (FP or FN) reduces the value of TS.
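The four contingencies and skill scores can be computed as in the sketch below. The formulas are the commonly used definitions (TPR = TP/(TP+FN), FPR = FP/(FP+TN), FAR = FP/(TP+FP), TS = TP/(TP+FN+FP)); reading FAR as false alarms over all positive forecasts is an assumption about the wording in the text, and the function names and sample points are hypothetical.

```python
def contingency(line, triggering, non_triggering):
    """Count (TP, TN, FP, FN) for a linear threshold E = slope * D + intercept.

    Points are (D, E) pairs; a point on or above the line counts as a
    positive prediction.
    """
    slope, intercept = line
    tp = sum(1 for d, e in triggering if e >= slope * d + intercept)
    fn = len(triggering) - tp
    fp = sum(1 for d, e in non_triggering if e >= slope * d + intercept)
    tn = len(non_triggering) - fp
    return tp, tn, fp, fn


def skill_scores(tp, tn, fp, fn):
    """Skill scores under the standard definitions assumed in this sketch."""
    return {
        "TPR": tp / (tp + fn),
        "FPR": fp / (fp + tn),
        "FAR": fp / (tp + fp),
        "TS": tp / (tp + fn + fp),
    }


# Hypothetical threshold and (D, E) conditions.
line = (10.0, 0.0)
triggering = [(1, 15.0), (2, 10.0)]
non_triggering = [(1, 5.0), (1, 20.0), (3, 10.0)]
tp, tn, fp, fn = contingency(line, triggering, non_triggering)
scores = skill_scores(tp, tn, fp, fn)
```

With tens of thousands of non-triggering conditions per gauge, FPR can stay small even when FAR is high, which is why both scores are reported.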

Comparison of the thresholds

Finally, the thresholds from each rain gauge were newly calculated using the same E values divided by the mean annual precipitation (MAP) of the corresponding gauge. This MAP normalization was considered the most suitable to facilitate the comparison between E-D thresholds from different rain gauges and to verify the exceptionality of each threshold with respect to the typical rainfall registered in each area. The normalization by the number of rainy days, more appropriate for the I-D thresholds, was dismissed in the present work.
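For a linear threshold, dividing E by the MAP rescales both the slope and the intercept by the same factor, so the classification of any E-D point is unchanged while thresholds from different gauges become directly comparable. A minimal sketch, assuming a threshold of the form E = slope · D + intercept and a hypothetical MAP value:

```python
def normalize_threshold(slope, intercept, map_mm):
    """Express a linear E-D threshold as a fraction of the gauge MAP (mm)."""
    return slope / map_mm, intercept / map_mm


def exceeds(e_mm, d_days, line):
    """True when the accumulated rainfall lies on or above the threshold line."""
    slope, intercept = line
    return e_mm >= slope * d_days + intercept


# Hypothetical threshold E = 5 D + 100 (mm) at a gauge with MAP = 1000 mm.
raw_line = (5.0, 100.0)
map_mm = 1000.0
normalized_line = normalize_threshold(*raw_line, map_mm)

# A 30-day accumulation of 300 mm exceeds the threshold in both forms.
raw_result = exceeds(300.0, 30, raw_line)
normalized_result = exceeds(300.0 / map_mm, 30, normalized_line)
```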

Results

The study period (October 2008–September 2016) includes 589 individual landslides located at least with a daily precision. However, 30 of these records, with a known trigger different from rainfall, were dismissed. The remaining 559 were analyzed in the present research (Fig. 1B), including those triggered by rainfall and those without a known trigger. Based on the Cruden and Varnes’ (1996) classification, 90 were classified as slides, 46 were considered flows, and 172 were classified as rockfalls. The lack of information hampered the classification in the remaining cases (251). A total of 205 landslides affected artificial slopes, while 99 took place in natural unchanged slopes; no data was available in the rest of the records (255). Regarding data availability about the spatial location, 408 landslides (73%) show the highest level of spatial accuracy (level 1), 87 landslides (16%) show a level 2, and the remaining 64 records (11%) show a level 3.

The number of individual landslides per hydrological year shows remarkable variations, ranging from 105 records in 2009/2010 to 24 records in 2010/2011. Six out of the 8 hydrological years show a number of records higher than the average annual value of 60.7 Ls yr−1 calculated for the whole BAPA dataset. No clear relation between the number of landslide records and the wet or dry character of the corresponding year was observed for the study period. However, a seasonal variation of the landslide records in correlation with precipitation records was observed at a monthly scale. February is the month with the highest number of records (106 landslides), and July and August are the months with the fewest records (6 landslides each) (Fig. 1C). The selected dataset shows a high proportion of instabilities that occurred during the wet period, October–May (444 landslides, 79%), while a lower number (115 landslides, 21%) took place during the dry period, June–September.

Concerning the six rain gauges selected as representative of the study area, 2 are located in the western sector of Asturias, at altitudes between 400 and 1170 m asl; the eastern sector is also represented by 2 rain gauges, located at altitudes ranging from 280 to 730 m asl; finally, the remaining selected rain gauges are located within the central sector, at altitudes between 127 and 336 m asl. The distance between each rain gauge and the selected landslides ranges from 1.5 to 48.4 km, with an average distance of 18.4 km. In 20% of the cases (114 records), the rain gauge-landslide distance is ≤ 10 km, and in 37% of the cases (206 records), it is ≤ 15 km. The location of these rain gauges is also representative of the areas with different annual precipitation in Asturias defined by Arasti et al. (2002) (Fig. 2A, B).

Fig. 2 Location of the representative rain gauges selected for the study: A in relation to the spatial distribution of the annual precipitation and altitude (Arasti et al. 2002); B in relation to the landslide influence areas

On this basis, 392 landslide events were selected for the study period. The number of landslides per event ranges from 1 to 15, with an average value of 1.4. Furthermore, the number of events defined for each rain gauge ranges from 55 to 76, with quite similar average values of 1.2–1.8 landslides per event (Table 1). In 215 of these events, with return periods (T) lower than 3 years, the rainfall-landslide relationship was considered doubtful and, for this reason, they were removed from the analysis. In the remaining 177 events (T > 3 years), the E-D values corresponding to the highest return period were selected as the critical conditions for the triggering of landslides. Events initially dismissed (T < 3 years) were analyzed together with the precipitation data series from the other nearby rain gauges. As a result, return periods higher than 3 years were found for 66 events previously removed from the analysis. Thirteen out of these 66 events were analyzed together with the events of the new rain gauges due to their geographical proximity. In 4 of these cases, an event had already been defined in the new rain gauge for the same date. For this reason, the number of landslides of the new event was added to those of the pre-existing event. In the remaining 9 cases, new events were considered for the analysis in the corresponding rain gauges, as noted in the “Other gauges” row of Table 1.

Table 1 Summary statistics for the landslide events defined for each rain gauge

As a result of this selection process, 186 events including 304 individual landslides were analyzed. In this way, only between 31 and 61% of the original 392 events were considered for the analysis, depending on each rain gauge. The low percentage of events with T > 3 years described for the rain gauges 2, 5, and 6 (31–38%) is notable, while values over 50% were reached in the rest of the cases. The number of events incorporated from another of the nearby rain gauges in each case is very low, between 1 and 4. This situation happened in 4 of the 6 rain gauges (1, 2, 4, and 5).
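The event bookkeeping reported above can be checked with a few lines of arithmetic. The variable names are illustrative; all counts are taken from the text.

```python
initial_events = 392
removed_low_return_period = 215      # events with T < 3 years at their own gauge
retained = initial_events - removed_low_return_period

reassigned = 13                      # recovered events analyzed with a nearby gauge
merged_into_existing = 4             # same-date event already defined at that gauge
new_events = reassigned - merged_into_existing

analyzed_events = retained + new_events
analyzed_share = analyzed_events / initial_events   # overall share, all gauges pooled
```

The pooled share of analyzed events is about 47%, which lies inside the per-gauge range of 31–61% quoted in the text.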

Table 2 shows the characteristics of the landslides included in the selected events for the influence area of each rain gauge. In 5 out of 6 influence areas, the events that occurred during the wet period are more abundant than those recorded during the dry period. The extreme case is rain gauge 2, without any event from the dry period. At the opposite extreme, rain gauge 3 shows more events from the dry period, although the wet-dry ratio is almost equal. Due to the lack of data, the type of landslide and slope could not be established for between 38 and 58% of the records in each influence area. Among those that could be classified, rockfalls are the most common in the majority of the gauges, followed by slides and flows. Regarding the type of slope, the landslides affecting artificial slopes are, in general, more abundant than those affecting natural slopes. Moreover, the distribution of the landslides triggered by the selected events is not homogeneous, showing the highest concentration in the center and east of the study area, while a large number of landslides were dismissed in the western area.

Table 2 Characteristics of the landslides analyzed in the influence areas of each rain gauge

The considered events show average durations of 48 days. D values in the range 60–90 days are the most frequent (98 events, 53%), while events between 1 and 10 days are less abundant (39 events, 21%). The remaining 26% (49 events) corresponds to D values in the range 15–40 days. Moreover, the events with D values of 60–90 days are the most abundant in all the rain gauges. Accumulated precipitation during all the studied events is in the range 54 ≤ E ≤ 986 mm, with average values of 486 mm. Mean E values were also calculated considering the different durations of the studied events: 198 mm for the events 1 ≤ D ≤ 10 days, 376 mm for the events 15 ≤ D ≤ 40 days, and 647 mm for the events 60 ≤ D ≤ 90 days; these values are coincident with those calculated for the events defined for each individual rain gauge.

Focusing on the events recorded during the dry (June–September) and wet (October–May) periods, some differences were observed. The majority of the events defined for the dry period, which constitute 12% of the total events considered, show durations in the range 1 ≤ D ≤ 10 days, with an average D value of 6 days, and accumulated precipitation values in the range 54 ≤ E ≤ 380 mm. Only one event shows a duration of 30 days, with an E value of 283 mm. It is worth mentioning that 22 out of the 23 events defined for the dry period occurred during an extraordinary rainfall episode that took place in June 2010. In contrast, a significant part of the events defined for the wet period, which constitute 88% of the dataset, show longer durations, of 15 days or more, with an average D value of 48 days, and higher accumulated precipitation values, between 176 and 986 mm. Some differences were also observed between the E-D conditions of the events defined for each rain gauge. A larger number of events shorter than 30 days can be seen at rain gauges 3, 4, and 5 (Fig. 3A). Likewise, accumulated precipitation shows notable differences between the rain gauges located in the center of Asturias (3 and 4), which record lower accumulated values, and those located in the mountainous areas (2 and 6) (Fig. 3B). Moreover, data from rain gauge 6 show lower dispersion in both D and E values than data from the other rain gauges (Fig. 3).

Fig. 3 E-D conditions calculated for the events defined for the different rain gauges (1, 2, 3, 4, 5, and 6). A Rainfall duration D (days) of the events. B Accumulated precipitation (mm) for the events

Critical E-D landslide-triggering conditions selected for each rain gauge were plotted in individual scatter plots. All the best-fit lines calculated through linear regression show good fits, with R² values between 0.90 and 0.96. After the selection process previously described, the E-D conditions that did not cause landslides were also plotted in the corresponding plot for each rain gauge. The number of those non-landslide conditions ranges from 28,452 to 32,492, with D values between 1 and 90 days and E values in the range 0 ≤ E ≤ 1017 mm.
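As a minimal sketch of how a best-fit line and its R² value can be obtained for a set of E-D pairs, the following Python snippet performs an ordinary least-squares fit. It assumes that the regression is applied to untransformed E and D values; the function name fit_line and the sample E-D pairs are invented for illustration and do not correspond to any rain gauge of this study.

```python
def fit_line(durations, rainfall):
    """Ordinary least-squares fit of E = intercept + slope * D.

    Returns (slope, intercept, r2), where r2 is the coefficient
    of determination of the fitted line.
    """
    n = len(durations)
    mean_d = sum(durations) / n
    mean_e = sum(rainfall) / n
    sxx = sum((d - mean_d) ** 2 for d in durations)
    sxy = sum((d - mean_d) * (e - mean_e)
              for d, e in zip(durations, rainfall))
    slope = sxy / sxx
    intercept = mean_e - slope * mean_d
    ss_res = sum((e - (intercept + slope * d)) ** 2
                 for d, e in zip(durations, rainfall))
    ss_tot = sum((e - mean_e) ** 2 for e in rainfall)
    r2 = 1.0 - ss_res / ss_tot
    return slope, intercept, r2


# Hypothetical E-D pairs (D in days, E in mm), for illustration only.
events_d = [1, 5, 10, 30, 60, 90]
events_e = [60, 110, 150, 290, 480, 700]

slope, intercept, r2 = fit_line(events_d, events_e)
print(f"E = {intercept:.1f} + {slope:.2f} * D, R2 = {r2:.3f}")
```

The slope is expressed in mm per day of accumulation, so a steeper line means that each additional day of duration demands more accumulated rainfall before the threshold is reached.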

Figure 4 shows the (i) best-fit, (ii) maximum, and (iii) minimum thresholds calculated for each rain gauge; the percentages correspond to the probability of occurrence of a landslide represented by each threshold. The maximum thresholds always represent a probability of occurrence of 100%, since all the E-D conditions located above them triggered landslides; in contrast, the minimum thresholds represent probabilities of occurrence between 2 and 7%. The probability of occurrence represented by the best-fit thresholds shows a higher variability: rain gauges 3, 4, and 6 present probability values of 50–52%, gauge 1 presents a probability of 28%, and the lowest probability values, between 5 and 9%, were computed for gauges 2 and 5. Table 3 summarizes all the thresholds and their corresponding probabilities of landslide occurrence, calculated for the 6 rain gauges.

Fig. 4 Rainfall thresholds for the triggering of landslides, calculated on the basis of the landslide events defined for each rain gauge. Maximum or upper limit, best-fit line, and minimum or lower limit thresholds for the rain gauge 1 (A), rain gauge 2 (B), rain gauge 3 (C), rain gauge 4 (D), rain gauge 5 (E), and rain gauge 6 (F)

Table 3 Rainfall thresholds and probabilities of landslide occurrence for each rain gauge

The skill scores calculated for each threshold are included in Table 4. The minimum thresholds show the highest values of the True Positive rate (TPR = 1), since all the E-D conditions that caused landslides are located above them and are, in consequence, properly predicted. However, their False Positive rate is higher than that of the other thresholds because 91–98% of the E-D conditions located above them did not trigger landslides. The opposite situation can be seen in the case of the maximum thresholds: the computed TPR values are lower than those of the other thresholds because the majority of the E-D conditions that produced landslides, between 79 and 96%, plot below them; in addition, FPR is equal to 0 because no E-D conditions that did not trigger landslides are located above them. The thresholds corresponding to the best-fit lines show TPR values between 0.43 and 0.57 and FPR values of approximately 0 in all cases. Moreover, these curves represent the most balanced ratio between the number of false alarms and true forecasts, which is supported by the computed values of the False Alarm rate, in the range 0.48 ≤ FAR ≤ 0.93. In contrast, the lower and upper limit thresholds show the highest and the lowest FAR values for each rain gauge, respectively. Finally, the best-fit thresholds represent the lowest number of incorrect predictions, both False Positives and False Negatives, reaching Threat Score values (0.06 ≤ TS ≤ 0.36) higher than those calculated for the other thresholds.

Table 4 Skill scores associated with the rainfall thresholds. Abbreviations: True Positive (TP), True Negative (TN), False Positive (FP), False Negative (FN), True Positive rate (TPR), False Positive rate (FPR), False Alarm rate (FAR), and Threat Score (TS)
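The skill scores listed in Table 4 can be illustrated with a short Python sketch. It assumes the standard contingency-table definitions (TPR = TP/(TP + FN), FPR = FP/(FP + TN), FAR = FP/(TP + FP), TS = TP/(TP + FN + FP)), which may differ in detail from the formulas applied in this work; the function name skill_scores and the counts are hypothetical.

```python
def skill_scores(tp, fn, fp, tn):
    """Skill scores from a 2x2 contingency table.

    tp: landslide conditions above the threshold
    fn: landslide conditions below the threshold
    fp: non-landslide conditions above the threshold
    tn: non-landslide conditions below the threshold
    """
    return {
        "TPR": tp / (tp + fn),                       # true positive rate
        "FPR": fp / (fp + tn),                       # false positive rate
        "FAR": fp / (tp + fp) if (tp + fp) else 0.0,  # false alarm rate
        "TS": tp / (tp + fn + fp),                   # threat score
    }


# Hypothetical counts, chosen only to mimic the order of magnitude of
# roughly 30,000 non-landslide conditions per rain gauge.
scores = skill_scores(tp=15, fn=15, fp=15, tn=30000)
for name, value in scores.items():
    print(f"{name} = {value:.4f}")
```

With these illustrative counts, the very large number of True Negatives keeps FPR close to zero even though half of the conditions above the threshold are false alarms, which is why FAR and TS are more informative than FPR for strongly unbalanced datasets.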

Figure 5A allows the comparison of the 6 best-fit thresholds calculated for the rain gauges. The slopes of the curves range between 4.7 and 8.6. The most restrictive E-D conditions for the occurrence of landslides are defined by the threshold of rain gauge 6, followed by those of rain gauges 1, 2, and 5, while the least restrictive conditions are defined by the thresholds of rain gauges 3 and 4. Furthermore, the rainfall thresholds normalized by the mean annual precipitation (MAP) of each rain gauge were plotted in Fig. 5B. In this case, all the curves define quite similar critical conditions of occurrence, except for rain gauge 4, whose threshold is significantly less restrictive than the rest.

Fig. 5 Comparison between the best-fit rainfall thresholds computed for each rain gauge, considering: A absolute accumulated precipitation (mm); B accumulated precipitation normalized by the mean annual precipitation (MAP) of each rain gauge

Discussion

Characteristics of the input dataset

The size and representativeness of the empirical input dataset are two of the critical issues in modeling landslide-triggering rainfall conditions, since the uncertainties associated with the modeling are reduced as the number of reliable empirical data points in the database increases (Vennari et al. 2014). One of the first decisions adopted in this work was whether the landslide records gathered in the BAPA database were to be used in whole or in part. The use of the whole dataset would have provided a much larger amount of data, covering a period of 37 years. However, an analysis previously performed by Valenzuela et al. (2017) highlighted the low completeness of the database between 1980 and 2007, which suggests that a considerable number of landslide events went unreported. Considering that the modeling is based on the analysis of rainfall events which triggered landslides in the past, the use of the records from the 1980–2007 period would introduce a strong bias in the input dataset. For this reason, only the records within the period 2008–2016 were considered. Due to the strong seasonal variation registered in the occurrence of landslides from 2008 to the present, with a significant increase in frequency observed between October and May (Fig. 1C), the hydrological year was selected instead of the calendar year for the definition of the study period.

Because precipitation and landslide data series must be correlated, the temporal factor played a major role in the selection and processing of the landslide records. Knowledge of the date of occurrence was considered the most important requirement for the selection of the landslides, and records with low temporal accuracy (levels 2 and 3) were dismissed. Unlike the temporal accuracy, the reliability of the spatial information was not considered a limiting factor for the use of the landslides. The records with spatial accuracy levels 2 (uncertainty of 100–500 m, equivalent to 1–5 cm on a 1:10,000 scale) and 3 (uncertainty > 500 m, equivalent to more than 2 cm on a 1:25,000 scale) were also considered for the analysis. Although low spatial accuracy records introduce uncertainty in the selection of the nearest rain gauge to each landslide, this effect was considered negligible for the present research. Otherwise, the input dataset would not have been large enough for the analysis. On the other hand, the limited accuracy of the data about triggers gathered within the BAPA database discouraged their use for the selection or rejection of landslide records. Only those cases with an obviously anthropogenic origin were removed from the analysis, while the remaining cases were all considered for the modeling.

Another significant constraining factor for the modeling of rainfall-triggered landslide occurrence is the limited availability of precipitation data. As a result, in many cases the selected rain gauges are several kilometers away from the location of the analyzed landslides. Considering the rugged orography of Asturias, where the climate shows a significant spatial variability between neighboring areas, this distance implies a considerable level of uncertainty. However, the calculation of the return period requires the use of the entire daily data series for the period 1979–2016, discouraging the use of partially complete series from closer or more representative gauges. The large number of factors which affect the climate in mountainous areas also hampers rainfall modeling through simple interpolation or more complex geostatistical methods. In this respect, the use of satellite rainfall data and meteorological radar data could represent a solution for the spatial uncertainty linked to the use of data from the nearest rain gauge. Moreover, satellite and radar data may eventually allow a more detailed analysis to quantify the connection between rainfall and landslides at the hourly time scale. However, the time of occurrence or the moment of the day is known for only 22% of the landslide records from the BAPA database, making it difficult to address this detailed analysis at the present stage.

Despite its relatively short duration, the selected 8-hydrological-year interval (October 2008–September 2016) includes a number of landslide records (559) large enough to carry out the proposed statistical analysis. Moreover, this period offers other advantages that make it suitable for the analysis: it shows an almost homogeneous spatial distribution of the records across the study area and the highest levels of spatial and temporal accuracy reached within the BAPA time span.

It is worth noting the large amount of empirical data necessary to perform a significant statistical analysis, considering that a relevant percentage of the original data is dismissed during the selection process due to the level of uncertainty of the records or the unproven relation between rainfall and landslide event. Only 559 out of the 1261 individual landslides recorded for the 8-hydrological-year study period were selected for the analysis due to their high temporal accuracy (level 1). On this basis, 392 landslide events were considered, but 206 of them were dismissed because the relation between rainfall and landslide event was not fully proved. Finally, only 24% of the original dataset (304 individual landslides distributed in 186 landslide events) was used for the analysis. Thus, the selection of the analyzed records, considering both the completeness of the landslide database and the precipitation data series and the classification of the recorded spatial and temporal information in accuracy levels, provided the necessary criteria to select a reasonably representative and relatively unbiased input dataset for the modeling.
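The figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below only restates the counts given in the text; the variable names are chosen for illustration.

```python
# Counts taken from the text for the 8-hydrological-year study period.
recorded_landslides = 1261       # all individual landslides recorded
high_temporal_accuracy = 559     # records with temporal accuracy level 1
events_defined = 392             # landslide events initially considered
events_dismissed = 206           # events without a proven rainfall relation
landslides_used = 304            # individual landslides finally analyzed

events_used = events_defined - events_dismissed
share_level_1 = high_temporal_accuracy / recorded_landslides
share_used = landslides_used / recorded_landslides

print(f"events used: {events_used}")
print(f"level-1 records: {share_level_1:.1%} of the recorded landslides")
print(f"landslides used: {share_used:.1%} of the recorded landslides")
```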

The definition of the landslide events considered for the present research was conditioned by two factors: (i) the origin of the information in the BAPA database, mainly extracted from press archives, and (ii) the representative rain gauges selected. The experience gained during the review of newspapers showed that, in many cases, the occurrence of a landslide event is not fully reflected in the press; only those landslides that cause adverse social or economic consequences appear in the newspapers, while the remaining landslides are reported without any spatial or temporal reference or are simply unreported. This fact supported the adoption of the criterion established by Zêzere et al. (2015), which considers a single landslide enough for the definition of a landslide event in the case of inventories based on press archives, assuming that more unreported landslides without adverse consequences have occurred on the same date.

Apart from this minimum requirement, the number of landslides per event depends on the number of rain gauges used for the analysis. In the present case, the records were divided between only six gauges, resulting in quite large landslide events including up to 15 instabilities. At the same time, the small number of reference gauges resulted in long landslide-gauge distances (in the range 1.5 to 48.4 km; > 10 km in 63% of the cases) within the same event. Considering the temporal comparison between landslide records and precipitation data series, the larger the landslide event size, the more robust the correlation with the corresponding rainfall event. However, spatial criteria must also be taken into consideration. From a strict point of view, modeled rainfall-triggering conditions are only applicable to the location of the rain gauge which provided the precipitation data series. For this reason, the remarkable distance between the landslides included in a particular event and the representative gauge strongly influences its significance. Although a minimum distance criterion has been considered suitable to link landslides and rain gauges at this stage of the research, future work must include new correlation criteria which consider the geographical-geomorphological and climatic conditions of the study area.
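The minimum distance criterion can be sketched as a nearest-neighbor search. The Python snippet below assumes great-circle (haversine) distances computed from geographic coordinates, which is not necessarily the distance measure used in this work; the gauge identifiers, the coordinates, and the function names are hypothetical.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_gauge(landslide, gauges):
    """Return (gauge_id, distance_km) of the gauge closest to a landslide.

    landslide: (lat, lon); gauges: {gauge_id: (lat, lon)}.
    """
    lat, lon = landslide
    return min(
        ((gid, haversine_km(lat, lon, glat, glon))
         for gid, (glat, glon) in gauges.items()),
        key=lambda item: item[1],
    )


# Hypothetical coordinates, not the real positions of the six gauges.
example_gauges = {"A": (43.50, -6.00), "B": (43.20, -5.00)}
gauge_id, distance = nearest_gauge((43.45, -5.90), example_gauges)
print(gauge_id, round(distance, 1), "km")
```

A purely geometric criterion of this kind ignores relief and climatic boundaries, which is the limitation noted above for a region with rugged orography.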

One of the main limitations of the present work is that at least 37% of the analyzed landslides took place on artificial slopes, although the percentage may be higher considering all the BAPA records without any information about the type of affected slope. This fact strongly affects the results of the modeling, since slopes altered by human activity are more prone to landslide occurrence. This would imply an increase in the probability of occurrence and, in consequence, a decrease in the required critical E-D conditions. Other authors (Zêzere and Trigo 2011) only considered landslides that occurred on natural slopes to avoid distortion of the results. However, in the case of Asturias, the high population density and the infrastructure network spread throughout the territory make landslides on artificial slopes the most relevant and frequent instabilities. In view of this, the removal of those landslides was ruled out in order to avoid potential bias in the modeling and to better reflect reality, since these instabilities cause the majority of the problems. Another limitation is the joint analysis of different landslide types. However, there are not enough records of each landslide type to address a more detailed analysis at the present stage of the research.

Performance of the antecedent rainfall thresholds

Focusing on the selection of the parameters used to define the rainfall-triggering conditions, the accumulated rainfall (E) has been preferred to the rainfall intensity (I). In the case of the study area, no rainfall intensity data directly measured at the gauges are available. Indeed, the I values used in previous works are mean values calculated by dividing E by D. The I-D parameters obtained in this way do not comply with the independence conditions needed to search for a statistical relationship between two variables, as was previously pointed out by Peruccacci et al. (2012). Therefore, the direct use of the E parameter simplifies the methodology and guarantees the independence of the analyzed terms.
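The lack of independence can be made explicit with a short derivation. Assuming, purely for illustration, a linear E-D relation with hypothetical coefficients a and b, the mean intensity becomes an explicit function of D, so any I-D relationship is partly imposed by construction:

```latex
I = \frac{E}{D}, \qquad E = a + bD \;\Longrightarrow\; I = \frac{a}{D} + b
```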

The identification of the critical rainfall duration D is always a challenge, particularly for long-lasting rainfall periods. Working at the regional scale, there is no physical basis for any accumulation period, but the use of fixed periods of time to calculate the duration of the rainfall-triggering events, combined with the return period, constitutes a widely accepted statistically based methodology in the literature (Zêzere et al. 2015). The selected fixed periods, from 1 to 90 days, include the average duration of the most common precipitation patterns identified in the study area: (a) prolonged episodes of moderate-intensity frontal rainfall with durations between weeks and months and (b) brief episodes of heavy rainfall with durations of a few days.

In this way, assuming the E-D pairs with the highest return period to be the critical landslide-triggering conditions is a useful way of selecting the landslide events clearly triggered by precipitation. The removal of those conditions with return periods lower than 3 years introduces a bias, since it dismisses a significant number of non-extraordinary triggering E-D conditions, which are very common in the wet Asturian climate. Even so, for the development of a forecasting model, the removal of the most common triggering conditions is useful for reducing the probability of false alarms. Although the criterion T > 3 years has been successfully applied in other works (Trigo et al. 2005; Zêzere et al. 2005, 2008), it would be interesting to relax this requirement in future research in order to check the performance of the models considering a higher proportion of the most common triggering conditions. However, a preliminary analysis considering T > 2-year E-D conditions showed similar results.
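The selection rule described above can be sketched as follows. The snippet assumes that, for each landslide event, the return period T of every candidate E-D pair has already been computed elsewhere; the function name critical_condition and all the numerical values are hypothetical.

```python
def critical_condition(candidates, min_return_period=3.0):
    """Pick the E-D pair with the highest return period for one event.

    candidates: list of (duration_days, accumulated_mm, return_period_yr).
    Returns the selected tuple, or None when even the best candidate
    does not exceed min_return_period (the event is then dismissed).
    """
    best = max(candidates, key=lambda c: c[2])
    return best if best[2] > min_return_period else None


# Hypothetical candidates for two landslide events.
event_a = [(1, 20.0, 1.1), (30, 250.0, 2.4), (60, 520.0, 6.8), (90, 640.0, 5.2)]
event_b = [(1, 12.0, 1.0), (10, 80.0, 1.6), (30, 190.0, 2.2)]

print(critical_condition(event_a))  # candidate with the highest T
print(critical_condition(event_b))  # no candidate exceeds T > 3 years
```

Lowering min_return_period to 2.0 reproduces the relaxed criterion mentioned above, retaining more of the common triggering conditions at the cost of a higher expected number of false alarms.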

As a result of the use of the highest return period criterion, 53% of the defined events show D values between 60 and 90 days, while 21% show D values of 10 days or less. Thus, the methodology followed favors the definition of long events. Moreover, all the events with D ≥ 15 days took place during the wet period (October–May), while all but one of the events that occurred in the dry period (June–September) showed D values ≤ 10 days. This fact is consistent with the two seasonal rainfall patterns previously described in the study area as the main landslide-triggering conditions and with the observed influence of the soil moisture in each case (Valenzuela et al. 2018a, 2018b). During the period October–May, the succession of moderate-intensity frontal precipitation episodes implies a gradual increase in soil moisture to levels approaching saturation and its maintenance over periods ranging from a week to two or three months. Once these preparatory conditions have been reached, a moderately intense rainfall event is able to trigger the landslide. Thus, not only the final rainfall-triggering event but also the antecedent rainfall plays a relevant role in the occurrence of landslides and must be considered for the calculation of the thresholds. For this reason, the events show a long average duration of 48 days. In contrast, the rainfall events defined between June and September show an average duration of 6 days. This is consistent with the short-lasting heavy precipitation events characteristic of the dry period, whose rainfall intensity allows the preparatory soil moisture conditions to be reached in 1 or 2 days or even in a few hours, although these conditions disappear when the rainfall ends. In general, those rainfall events are isolated episodes during the dry period, so the antecedent rainfall is often non-existent and the triggering event plays the main role.

It must be pointed out that all but one of the “dry period” landslide events took place during an extraordinary convective episode that occurred in June 2010 and triggered at least 220 landslides. These events show E values between 96 and 380 mm and D values in the range 1–10 days. In contrast, the other, non-extraordinary convective event analyzed shows a duration of 1 day and an accumulated precipitation of 54 mm, and triggered 3 landslides. This fact suggests that a larger number of non-extraordinary events are needed to reach more realistic conclusions about the triggering of landslides during the dry period.

Figure 6 allows the comparison between the E-D conditions calculated for the dry and wet periods. In the case of the dry period, the best-fit line has been calculated without considering the event with a 30-day duration, interpreted as an outlier. Between June and September, E values under 380 mm are required to trigger landslides, being reached in a short period of time. In contrast, the majority of the events recorded between October and May show E values over 300 mm, reached after a period of rainfall longer than a month. The best-fit lines calculated for each data set show relevant differences in the slope, emphasizing the variability of the rainfall-triggering conditions throughout the year. The observed differences suggest the desirability of using specific rainfall thresholds for the forecasting of landslides depending on the seasonal period. In the case of the present work, the scarcity of data about events recorded between June and September hampered the computation of robust seasonal rainfall thresholds for the study area.

Fig. 6 Comparison between the E-D conditions of the events analyzed during the dry (June–September) and wet (October–May) periods and their corresponding best-fit lines

The E-D conditions defined for the rain gauges were used to calculate six local antecedent rainfall thresholds. The accumulated precipitation during the rainfall events shows a significant variability between rain gauges (Fig. 3B). The most significant difference was observed between the gauges located in the center of Asturias (3 and 4) and those located in the highest mountainous areas (2 and 6). In contrast, the location of the rain gauges does not seem to significantly influence the duration of the events, although a high number of short events, which occurred during the dry period, were recorded at rain gauges 3, 4, and 5 (Fig. 3A).

Mean annual precipitation across Asturias is positively correlated with altitude, as was previously noted by Arasti et al. (2002). This fact explains the differences in E values observed between the six rain gauges and suggests a significant climatic control over the performance of the thresholds. Furthermore, these observations are consistent with the assertion of Pedrozzi (2004), according to which the slope instability system of a region tends to naturally reach an equilibrium state that is in accordance with the amount of rain that normally falls in the region. Thus, the thresholds defined for the mountainous zones (2 and 6) are more restrictive because the equilibrium state of the slopes is adjusted to a higher MAP (1431 and 1713 mm yr−1, respectively). In contrast, the lower MAP of the rain gauges located in the center of the region (3 and 4, with 1086 and 985 mm yr−1, respectively) implies less restrictive equilibrium conditions; consequently, a lower intensity rainfall event may cause slope instabilities (Fig. 4).

As a result of the analysis of contingency tables and skill scores, the best-fit curves were selected as the most suitable thresholds for the forecasting of landslides within the study area. Contrary to expectations, the calculated probability of occurrence for the best-fit curves was lower than 50% for the rain gauges farthest from the center of Asturias (thresholds 1, 2, and 5). It is worth recalling that the occurrence probability is calculated as the number of E-D conditions above the threshold that resulted in landslides divided by the total number of conditions above the threshold. Taking into account the previously mentioned lack of data in those areas, a relevant number of rainfall-triggered events have probably gone unreported, increasing the number of False Positives above the thresholds and, consequently, reducing the probability of occurrence. The remaining thresholds represent probabilities of occurrence between 50 and 52%, which may be related to the higher completeness of the original dataset for their influence areas. This observation agrees with the location of the corresponding rain gauges in the central area of Asturias, where more landslide data are available.
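The occurrence probability recalled above can be sketched in a few lines of Python. The snippet assumes a linear threshold E = intercept + slope · D and treats a condition as lying above the threshold when it reaches or exceeds it; the function name and all data values are hypothetical.

```python
def occurrence_probability(slope, intercept, landslide_pairs, non_landslide_pairs):
    """Share of E-D conditions above a linear threshold that caused landslides.

    Each pair is (duration_days, accumulated_mm). Returns None when no
    condition lies on or above the threshold.
    """
    def above(pair):
        d, e = pair
        return e >= intercept + slope * d

    true_positives = sum(1 for p in landslide_pairs if above(p))
    false_positives = sum(1 for p in non_landslide_pairs if above(p))
    total = true_positives + false_positives
    return true_positives / total if total else None


# Hypothetical threshold and E-D conditions, for illustration only.
triggering = [(10, 200.0), (60, 450.0), (5, 90.0)]
non_triggering = [(10, 160.0), (30, 100.0), (90, 600.0), (1, 50.0)]

probability = occurrence_probability(5.0, 100.0, triggering, non_triggering)
print(probability)
```

Under this definition, every unreported landslide event that exceeds the threshold is counted as a False Positive, which lowers the computed probability in areas with incomplete records.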

A direct comparison between the six thresholds was discouraged due to the differences in climate observed between areas. For this reason, the analyzed E-D conditions were normalized, dividing the E parameter by the MAP, to allow the comparison in relative terms. With one exception, the normalized thresholds set quite similar triggering conditions (Fig. 5B). Normalization removes the climatic factor and suggests a geological, environmental, or anthropic influence to explain the significantly low threshold obtained for rain gauge 4.
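The normalization step can be illustrated with a brief Python sketch. The MAP values are those quoted in the text for rain gauges 2, 3, 4, and 6; the function names, the linear form of the threshold, and the example threshold coefficients are assumptions made for illustration.

```python
# Mean annual precipitation (mm per year) quoted in the text.
MAP_MM = {2: 1431.0, 3: 1086.0, 4: 985.0, 6: 1713.0}


def normalize_rainfall(e_mm, gauge):
    """Express accumulated rainfall as a fraction of the gauge MAP."""
    return e_mm / MAP_MM[gauge]


def normalize_threshold(slope, intercept, gauge):
    """Normalize a linear threshold E = intercept + slope * D by the MAP."""
    return slope / MAP_MM[gauge], intercept / MAP_MM[gauge]


# The same hypothetical 400 mm accumulation weighs differently per gauge.
for gauge in sorted(MAP_MM):
    print(gauge, round(normalize_rainfall(400.0, gauge), 3))

# Hypothetical threshold coefficients, not those reported in Table 3.
norm_slope, norm_intercept = normalize_threshold(6.0, 150.0, 6)
```

After normalization, the thresholds are expressed as fractions of the MAP, so any remaining difference between gauges cannot be attributed to the mean annual rainfall alone.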

From a geological point of view, influence area 4 shows relevant differences with respect to the neighboring ones, as previously described by Domínguez-Cuesta (2003). This influence area coincides with the Central Coalfield, characterized by a rugged orography and an alternation of materials whose cohesiveness and weathering conditions favor the triggering of landslides. Another possible explanation for the different performance of the threshold may be the higher anthropic activity in the corresponding influence area. At least 40% of the landslides used for the calculation of threshold 4 occurred on artificial slopes, although the figure could be even higher considering those records without data about the type of slope affected. However, this percentage is similar to that seen in influence areas 1, 2, and 5, which show more restrictive thresholds (Table 2). Nor are there significant differences in the type of landslides recorded in each influence area that could explain the different behavior of threshold 4. In any case, the center of Asturias is a heavily human-altered area characterized by a high population density and the dispersion of settlements across the territory. This increases both the probability of occurrence of landslides, lowering the E-D conditions required for triggering with respect to natural areas, and the probability that less striking instability events are reported by newspapers or institutions. This is probably the reason why the number of events with D < 40 days used for the calculation of threshold 4 is higher than that used for the rest of the thresholds. Moreover, the accumulated precipitation values for those events with D < 40 days are lower than at the remaining rain gauges, which may affect the final result.

Conclusions

The landslide-triggering conditions are strongly linked to the spatial variability of the climate and the seasonal variability of the precipitation patterns. As has been seen in Asturias, the triggering conditions defined by the local thresholds are in balance with the mean annual precipitation of the areas they were calculated for. Thus, the thresholds from the mountains define stricter rainfall conditions than those calculated for the lowest areas, which is consistent with the rainfall-altitude gradient previously observed in Asturias. Once the climatic factor is removed through normalization, all but one of the thresholds set quite similar triggering conditions. The geologic characteristics and the human impact are the most likely causes of the lower performance of the threshold from the center of Asturias, interpreted as a minimum regional threshold for the study area. Moreover, the analysis has confirmed the variability of the landslide-triggering E-D conditions throughout the year due to the strong influence of different seasonal rainfall patterns. During the wet period, the antecedent rainfall plays a significant role, since the preparatory soil moisture conditions are reached and sustained through the succession of moderate-intensity rainfall events during a month or more. In contrast, the landslides triggered during the dry period are generally linked to isolated, short-lasting, and heavy convective precipitation events. In this case, the antecedent rainfall is usually non-existent, and the D and E factors lose relevance in favor of the rainfall intensity of the triggering event. Such marked differences suggest the desirability of implementing different seasonal rainfall thresholds for the forecasting of landslides.

Despite the previously described limitations, the BAPA database and the selected rainfall data series have provided a reasonably representative input dataset for the modeling. In any case, the improvement and updating of the landslide databases must be a central issue in order to reduce bias and to successfully model landslide occurrence. Moreover, the methodology followed to calculate empirical antecedent rainfall thresholds has proved suitable for the study area, considering that 80% of the analyzed landslides occurred during prolonged precipitation events in which the antecedent rainfall plays a key role in the triggering of instabilities. Thus, the present work represents a valuable effort to characterize the rainfall-triggering conditions for the occurrence of landslides in Asturias and constitutes a step towards the temporal forecasting of landslides in the NW of Spain.