Introduction

Landslides occur extensively in mountainous regions worldwide and have caused numerous casualties and property losses. In landslide-prone areas, landslide early warning systems are a cost-efficient tool for hazard mitigation (Glade and Nadim 2014). Rainfall thresholds have commonly been employed in existing warning systems (Keefer et al. 1987; Aleotti 2004; Baum and Godt 2010; Segoni et al. 2015), mainly because rainfall plays a significant role in promoting slope instability: it increases soil moisture, reduces matric suction, and raises pore water pressure (Chung et al. 2017). Additionally, the availability of weather forecasts makes it possible to predict potential landslide activity (Vallet et al. 2016).

Two types of rainfall threshold models have been used for landslide warning: physically based models and empirical models (Guzzetti et al. 2007). Physically based models simulate the hydrological processes within the soil mass and analyze slope stability (Wu et al. 2015; Kanjanakul et al. 2016). They can provide the specific time and location of potential landslides. However, the hydrological and geotechnical parameters required to run these models are difficult to obtain on a regional scale, so physically based models tend to be applied to smaller areas (Posner and Georgakakos 2015; Bezak et al. 2016). Empirical models define the rainfall conditions for landslide initiation in a region by analyzing historical landslide events and the associated rainfall data. Because they do not consider underlying conditions such as geomorphology, land use, and lithology, these models cannot give the exact locations of possible landslides. Nonetheless, they can be combined with landslide susceptibility zonation techniques to help the relevant agencies assess landslide risks (Segoni et al. 2015).

To define rainfall thresholds, the following parameters have commonly been employed to characterize rainfall conditions: (i) rainfall intensity (I) (Guzzetti et al. 2008; Mathew et al. 2014; Segoni et al. 2014; Ma et al. 2015; Giannecchini et al. 2016; Rosi et al. 2016); (ii) rainfall duration (D), which is usually linked to I; (iii) accumulated event rainfall (E) (Vennari et al. 2014; Gariano et al. 2015; Segoni et al. 2015; Lainas et al. 2016; Melillo et al. 2016; Rossi et al. 2017); and (iv) antecedent rainfall calculated over various time periods (Sepúlveda and Padilla 2008; Kanungo and Sharma 2014; Ma et al. 2014; Elias 2016). The I and E parameters can be normalized by the mean annual precipitation or the rainy-day normal to account for the effect of climate. The choice of rainfall parameters depends on the available landslide and rainfall data; for instance, using I and D requires rainfall measurements at hourly or finer time steps. Moreover, researchers may simply prefer certain parameters: some adopt ID thresholds, while others employ ED thresholds, although the two types are interconvertible.
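As a worked illustration of this interconvertibility, assume that I is the mean rainfall intensity over the duration D, so that E = I·D, and that the ID threshold takes the power-law form commonly used in the literature, with generic constants a and b (these symbols are illustrative only and unrelated to the coefficients used later in this study). Multiplying both sides by D gives the equivalent ED threshold:

$$ I=a{D}^{-b}\kern1em \Rightarrow \kern1em E=I\cdot D=a{D}^{1-b} $$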

In general, rainfall conditions cannot be divided into a field in which landslides always occur and a field in which they never occur. Two methods have therefore been proposed to define rainfall thresholds (Lagomarsino et al. 2015). In the first method, the threshold is taken as the lower boundary of the rainfall conditions that have triggered landslides (Bui et al. 2013; Lainas et al. 2016). It thus represents the minimum rainfall needed to initiate a landslide and is conservative in operational use, producing a large number of false alarms (Peres and Cancelliere 2014). The second method identifies the threshold as a trade-off between maximizing correct forecasts and minimizing incorrect forecasts (Staley et al. 2013; Giannecchini et al. 2016). Statistical techniques, such as the Bayesian approach and logistic regression analysis, are usually employed to derive thresholds for different exceedance probability levels or different probabilities of landslide occurrence. The best-performing threshold is then selected using contingency tables, receiver operating characteristic (ROC) analysis, and the related skill scores.

This study aims to define the rainfall threshold for landslide activity in Dazhou, a landslide-prone area in southwest China for which little attention has been paid to regional rainfall thresholds. Given the role of antecedent rainfall in landslide triggering, we use this parameter in combination with the intraday rainfall to represent rainfall conditions. First, the rainfall and landslide data are systematically analyzed with logistic regression. Then, the optimal threshold is obtained by ROC analysis. Furthermore, thresholds for different warning levels are derived following a procedure for evaluating the performance of models with multiple warning levels. Finally, the thresholds are validated with additional data from the study region.

Study area and materials

Environmental setting of the study area

Dazhou is located in the northeast of Sichuan Province, southwest China (Fig. 1), and covers an area of 16,591 km². It consists of 7 county-level units (Wanyuan, Xuanhan, Dachuan, Tongchuan, Kaijiang, Qu, and Dazhu) and has a total population of 5.6 million.

Fig. 1

The geographical location and terrain elevation of Dazhou. Names of the 7 county-level units are also shown

Lying where the Daba Mountains meet the Sichuan Basin, Dazhou descends from an elevation of 2399 m in the north to 250 m in the south (Fig. 1). The northern part of the region is characterized by mountainous landforms, where the relative relief usually exceeds 1000 m. The central and southern parts are dominated by low mountains and hills, with relative relief of 700 to 1000 m and flatlands scattered in the valleys. Overall, mountains are the primary landform, accounting for 70.7% of the study area, while hills and flatlands account for 28.1% and 1.2%, respectively (Tang et al. 2010). Dazhou lies in the composite zone of the Daba Mountain and East Sichuan arc-like fold belts, where fractures and folds are well developed. Mudstone, shale, sandstone, and limestone of Triassic, Jurassic, or Cretaceous age are distributed across the study area. On some slopes, weathering of mudstone and shale has produced Quaternary overburden with a thickness of 3 to 9 m (Xiang 2015).

The climate of the area is humid subtropical and monsoon-affected. The mean annual temperature ranges from 14.7 to 17.6 °C, with the south warmer than the north. The mean annual precipitation ranges from 1076 to 1270 mm, with the north slightly wetter than the south. Annual precipitation fluctuates greatly from one year to the next and is concentrated in the monsoon season (April–October). Owing to the steep terrain, fractured rock mass, and intense rainfall during the monsoon season, geological disasters are widespread in this region. A total of 1814 potential geohazard sites, which threaten 102,828 inhabitants, have been identified (Liu et al. 2014).

Landslide and rainfall data

The landslide data used in this study were collected by the Sichuan Provincial Environmental Monitoring Center from 2003 to 2015. The database contains the main information on a total of 1843 landslides: occurrence date, geographical location, landslide scale, and damage caused. Altogether, these landslides caused 28 deaths, 5 missing persons, 66 injuries, and a direct economic loss of 409 million RMB. Four landslides were excluded from our research because their exact dates of occurrence were unknown. In recent years, the study area has undergone rapid industrialization and urbanization, and human activities such as road building and mining have increased the risk of landslide hazards. However, among the 372 landslides recorded with trigger information, only 3 were induced by human activities, and the remaining 369 were triggered by rainfall. Given this low proportion of human-induced landslides, all slope failures without recorded triggering factors were assumed to have been initiated by rainfall. After excluding the 3 human-induced landslides, a total of 1836 landslides are available, with the locations shown in Fig. 2. In terms of landslide scale, the database records the size category of each landslide rather than its exact volume. Small-sized (< 10⁵ m³), medium-sized (10⁵–10⁶ m³), large-sized (10⁶–10⁷ m³), and super-large-sized (> 10⁷ m³) landslides account for 79.0%, 16.0%, 4.5%, and 0.5% of the total, respectively. The distribution of each size class across the administrative units is listed in Table 1. Landslide typology (e.g., shallow or deep-seated landslides) is not recorded in the database and was therefore not considered in the present study.

Fig. 2

Distribution of the landslides triggered during 2003–2015 in Dazhou and locations of the rain gauges used in this study. Gray lines are the boundaries of the 6 alert zones (AZ1–AZ6), and the bold black line is the line 250 km from the main fault of the 12 May 2008 Wenchuan earthquake

Table 1 Landslide distribution in terms of their sizes in different administrative units

Daily rainfall data measured by 6 rain gauges were used for the threshold analysis; Fig. 2 shows the gauge locations. The mean annual precipitation over 2003–2015 was 1294 mm, with the temporal distribution summarized in Fig. 3. Because 88.9% of the annual rainfall falls in the monsoon season, almost all landslides were triggered in this period, with only six exceptions. A total of 1053 landslides (57.4%) were triggered in July, far more than in any other month. This is primarily attributed to the large amount of rainfall in July (17.6% of the annual total). Moreover, extreme rainstorms can cause a surge in slope failures. For example, in mid-July 2010, accumulated event rainfall of 449 mm in Wanyuan and 194 mm in Xuanhan triggered a total of 480 landslides in these two regions. In operational regional early warning systems, alerts are commonly issued by administrative unit. Given the distribution of rain gauges (Fig. 2), Tongchuan and Dachuan were combined into one alert zone (AZ), while each of the other 5 county-level units was treated as an independent alert zone. For simplicity, these alert zones are labeled AZ1 through AZ6, as illustrated in Fig. 2. The rain gauge located in each alert zone was taken as its reference gauge.

Fig. 3

Distribution of precipitation and landslide occurrence during the year (a) and during the 2003–2015 period (b) in Dazhou

The dataset was divided into two subsets: a calibration set of 1426 landslides triggered during 2003–2010 and a validation set of 410 landslides initiated during 2011–2015. The possible effects of the 12 May 2008 Wenchuan earthquake must also be considered, because the earthquake greatly affected the rainfall thresholds for geohazards in the stricken areas (Yu et al. 2014). To evaluate its influence on the rainfall threshold in Dazhou, the spatial and temporal distributions of landslides before and after the earthquake were analyzed. First, the study area was divided into two parts by the line 250 km from the main fault of the earthquake (Fig. 2). Of the 978 landslides reported during 2008–2015, 51.6% were located in the western part, close to the proportion for 2003–2007 (50.8%). Second, the inter-annual variation in precipitation and landslide occurrence was investigated (Fig. 3b). Since the numerous landslides in 2010 were mainly triggered by the extraordinary storm in Wanyuan and Xuanhan described above, there is no evidence that the study area has become more susceptible to landslides since the earthquake. In addition, no co-seismic landslides were reported in Dazhou, owing to the low seismic intensity (VI). The rainfall threshold for landslide occurrence in the study area was therefore considered unchanged after the earthquake.

Methodology

Rainfall parameters

Antecedent rainfall can affect slope stability by changing the soil moisture content and, in turn, the geotechnical parameters. It was therefore taken as one parameter for threshold definition, denoted by Ra. The other parameter is the intraday rainfall (R0), which is usually the direct triggering factor for slope failure. Because moisture retained in the soil mass declines through evapotranspiration and drainage, the effective antecedent rainfall, rather than the accumulated antecedent rainfall, was employed. Following Bruce and Clark (1966), it is expressed as:

$$ {R}_a=\sum \limits_{i=1}^n{k}^i{R}_i $$
(1)

where Ri is the daily rainfall on the ith day before landslide occurrence, n is the number of days considered, and k is the decay factor.

The value of the decay factor depends on multiple factors, such as evaporation capacity, vegetation fraction, and the hydrological characteristics of the regolith. Values ranging from 0.72 to 0.86 have been proposed, with 0.84 being the most widely used (Zhuang et al. 2014). Although this value was derived from hydrological data measured in Ottawa (Bruce and Clark 1966), it performed satisfactorily in rainfall threshold analysis for landslide warning in New Zealand (Crozier and Eyles 1980). We therefore also used k = 0.84 in the present study.

Values proposed in the literature for the number of days used to compute Ra range from 3 to 120, with values smaller than 30 being more common (Kanungo and Sharma 2014; Elias 2016). In some studies, Ra values calculated for different time spans are plotted against the intraday rainfall or the accumulated event rainfall, and the time span that best discriminates the rainfall conditions associated with landslide occurrence is adopted (Bui et al. 2013; Mathew et al. 2014). We computed Ra for time periods of 3 to 30 days and found significant linear relationships among the various Ra values. Figure 4 shows the slope and the coefficient of determination when the Ra values computed for different periods are linearly correlated with the 30-day value. The slope reaches 0.90 when n = 7, indicating that the 7-day value is representative of the 30-day effective antecedent rainfall. This is consistent with the finding of Zhuang et al. (2014) in Xi'an, China, that the influence of a rainfall event on soil moisture becomes negligible after 7 days. Consequently, n = 7 was used in this research.
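A minimal Python sketch of Eq. (1), and of the kind of comparison summarized in Fig. 4, is given below. It assumes a daily rainfall series stored oldest-first and excludes the landslide day itself from Ra (consistent with i starting at 1). Regressing Ra(n) on Ra(30) by ordinary least squares with an intercept is our assumption, since the fitting method is not stated, and the rainfall record is synthetic.

```python
import numpy as np


def effective_antecedent_rainfall(daily_rain, day, n=7, k=0.84):
    """Eq. (1): Ra = sum_{i=1..n} k**i * R_i.

    daily_rain: daily totals (mm), oldest first.
    day: index of the day of interest; R_i is daily_rain[day - i], so the
         intraday rainfall R0 (daily_rain[day]) is not included.
    Days before the start of the record are skipped (an assumption).
    """
    ra = 0.0
    for i in range(1, n + 1):
        if day - i < 0:
            break
        ra += k ** i * daily_rain[day - i]
    return ra


def slope_and_r2_vs_30_day(daily_rain, days, n, k=0.84):
    """Regress Ra(n days) on Ra(30 days) by OLS with intercept (assumed)
    and return the slope and the coefficient of determination."""
    ra_n = np.array([effective_antecedent_rainfall(daily_rain, d, n, k) for d in days])
    ra_30 = np.array([effective_antecedent_rainfall(daily_rain, d, 30, k) for d in days])
    slope, _intercept = np.polyfit(ra_30, ra_n, 1)
    r2 = np.corrcoef(ra_30, ra_n)[0, 1] ** 2
    return slope, r2


# Usage with a synthetic (hypothetical) rainfall record:
rng = np.random.default_rng(0)
rain = rng.gamma(shape=0.5, scale=10.0, size=400)  # mm/day, illustrative only
days = range(30, 400)                              # full 30-day window available
for n in (3, 7, 15):
    s, r2 = slope_and_r2_vs_30_day(rain, days, n)
    print(f"n = {n:2d}: slope = {s:.2f}, R^2 = {r2:.2f}")
```

Because the weights k^i decay geometrically, Ra(n) approaches Ra(30) as n grows, so the slope tends toward 1; with real data the choice of n is a judgment on where the slope becomes "close enough".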

Fig. 4

The slope and determination coefficient when the effective antecedent rainfall computed for different periods is linearly correlated with that derived from a 30-day period

Logistic regression

Logistic regression has been widely used to discriminate rainfall conditions with landslide occurrence from those without slope failures (Glade et al. 2000; Chang et al. 2008; Frattini et al. 2009; Mathew et al. 2014; Giannecchini et al. 2016). In this approach, the probability of landslide occurrence (p) is expressed as a logistic (sigmoid) function of a linear combination of the rainfall parameters:

$$ \ln \left(\frac{p}{1-p}\right)={\alpha}_1{R}_0+{\alpha}_2{R}_a+{\alpha}_0 $$
(2)

where αi (i = 0, 1, 2) are the regression coefficients.

For each alert zone, rainy days (R0 ≥ 0.1 mm) during April–October of 2003–2010 were classified into two groups: landslide days (LDs) and non-landslide days (NLDs). Logistic regression was performed on the rainfall parameters of the two groups of days, yielding the coefficients in Eq. (2).
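The study derived the coefficients with SPSS; the sketch below is not that implementation but a minimal NumPy stand-in that fits Eq. (2) by maximum likelihood using Newton-Raphson iterations. It also shows the rainy-day selection described above. The synthetic data are generated from hypothetical coefficients purely to check that the fit recovers them.

```python
import numpy as np


def select_rainy_days(month, r0, ra, landslide, min_rain=0.1):
    """Keep rainy days (R0 >= 0.1 mm) in April-October, as in the text."""
    month, r0, ra, landslide = map(np.asarray, (month, r0, ra, landslide))
    mask = (r0 >= min_rain) & (month >= 4) & (month <= 10)
    return r0[mask], ra[mask], landslide[mask]


def fit_logistic(r0, ra, landslide, max_iter=100, tol=1e-10):
    """Maximum-likelihood fit of Eq. (2), ln(p/(1-p)) = a1*R0 + a2*Ra + a0,
    by Newton-Raphson. landslide: 1 for LDs, 0 for NLDs."""
    X = np.column_stack([r0, ra, np.ones(len(r0))])  # columns: R0, Ra, constant
    y = np.asarray(landslide, dtype=float)
    w = np.zeros(3)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (y - p)                          # gradient of log-likelihood
        hess = X.T @ (X * (p * (1.0 - p))[:, None])   # information matrix
        step = np.linalg.solve(hess, grad)
        w = w + step
        if np.max(np.abs(step)) < tol:
            break
    a1, a2, a0 = w
    return a1, a2, a0


# Usage with synthetic data generated from hypothetical coefficients:
rng = np.random.default_rng(1)
r0 = rng.gamma(1.0, 15.0, 20000)
ra = rng.gamma(1.0, 20.0, 20000)
p_true = 1.0 / (1.0 + np.exp(-(0.05 * r0 + 0.03 * ra - 4.0)))
ld = (rng.random(20000) < p_true).astype(int)
print(fit_logistic(r0, ra, ld))  # should be close to (0.05, 0.03, -4.0)
```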

ROC analysis

ROC analysis is an effective tool for evaluating the performance of classification models. It is based on a contingency table, which expresses the four possible outcomes when a threshold model is applied. A true positive (TP) means that the rainfall condition exceeded the threshold and at least one landslide occurred. A false positive (FP) means that the rainfall condition exceeded the threshold but no landslides were recorded. An (R0, Ra) data point below the threshold is counted as a true negative (TN) if no landslides were observed; otherwise, it is a false negative (FN). In an early warning system, false positives and false negatives correspond to false alarms and missed alarms, respectively.

From the counts of the four outcomes, the following skill scores can be computed:

  • Probability of detection (POD) measures the proportion of landslide days that are correctly forecast: POD = TP/(TP + FN);

  • Probability of false detection (POFD) measures the proportion of non-landslide days that are erroneously predicted as landslide days: POFD = FP/(FP + TN);

  • Probability of false alarm (POFA) is the ratio of the number of false alarms to the total number of positive forecasts (warnings): POFA = FP/(TP + FP);

  • Hanssen and Kuipers (1965) skill score (HK) evaluates accuracy for instances both with and without landslides and is calculated as the difference between POD and POFD: HK = POD − POFD.
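The contingency counts and the four skill scores can be sketched in a few lines of Python. The function names and the counts in the usage line are hypothetical, not taken from this study; POFA is returned as NaN when no warnings are issued, an assumption since the ratio is then undefined.

```python
def contingency(exceeds, landslide_day):
    """Count TP, FP, TN, FN from paired booleans:
    exceeds[i]       -> rainfall on day i is at or above the threshold
    landslide_day[i] -> at least one landslide occurred on day i."""
    tp = fp = tn = fn = 0
    for e, l in zip(exceeds, landslide_day):
        if e and l:
            tp += 1
        elif e:
            fp += 1
        elif l:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def skill_scores(tp, fp, tn, fn):
    """POD, POFD, POFA and HK as defined in the list above."""
    pod = tp / (tp + fn)
    pofd = fp / (fp + tn)
    pofa = fp / (tp + fp) if tp + fp else float("nan")  # undefined without warnings
    return {"POD": pod, "POFD": pofd, "POFA": pofa, "HK": pod - pofd}


# Hypothetical counts (not from this study):
print(skill_scores(tp=30, fp=60, tn=900, fn=10))
```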

In the ROC space (Fig. 5), each point represents the prediction capability of a rainfall threshold. Thresholds for different probabilities of landslide occurrence together yield an ROC curve, which represents the forecasting ability of the logistic regression model; its performance can be evaluated using the area under the curve (AUC). The diagonal line from the lower left to the upper right corner of the ROC graph represents random guessing, for which the AUC equals 0.5. The upper left corner of the ROC graph (the black dot in Fig. 5) represents perfect classification, i.e., POD = 1 and POFD = 0. The shorter the distance between a point and the perfect classification, the better the performance of the corresponding threshold. To identify the best-performing threshold, the index Λ defined by Gariano et al. (2015) was employed, which is expressed as a linear combination of HK, POFA, and the Euclidean distance δ from the perfect classification:

$$ \Lambda ={\lambda}_1\cdot HK-{\lambda}_2\cdot POFA-{\lambda}_3\cdot \delta $$
(3)

where λ1, λ2, and λ3 are the weight coefficients of the individual skill scores, with λ1 + λ2 + λ3 = 1. In our study, the three skill scores were weighted equally, i.e., λ1 = λ2 = λ3 = 1/3. The threshold that maximizes Λ represents the best compromise between maximizing correct predictions and minimizing false predictions.
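The threshold selection described above can be sketched as follows: each candidate probability threshold gives one (POFD, POD) point, the AUC is approximated by the trapezoidal rule through (0, 0) and (1, 1), and Λ from Eq. (3) is maximized with equal weights. Treating p ≥ threshold as a positive forecast and skipping thresholds that issue no warnings (POFA undefined) are our assumptions; the toy data are hypothetical.

```python
import math


def roc_point(prob, landslide_day, p_thr):
    """Return (POFD, POD, POFA) for the rule "warn if p >= p_thr".
    POFA is None when no day reaches the threshold (no warnings)."""
    tp = fp = tn = fn = 0
    for p, is_ld in zip(prob, landslide_day):
        above = p >= p_thr
        if above and is_ld:
            tp += 1
        elif above:
            fp += 1
        elif is_ld:
            fn += 1
        else:
            tn += 1
    pod = tp / (tp + fn)
    pofd = fp / (fp + tn)
    pofa = fp / (tp + fp) if tp + fp else None
    return pofd, pod, pofa


def lambda_index(pofd, pod, pofa, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Eq. (3): Lambda = l1*HK - l2*POFA - l3*delta."""
    hk = pod - pofd
    delta = math.hypot(pofd, 1.0 - pod)  # distance to perfect point (0, 1)
    l1, l2, l3 = weights
    return l1 * hk - l2 * pofa - l3 * delta


def auc(points):
    """Trapezoidal area under the ROC curve through (0, 0) and (1, 1)."""
    pts = sorted([(0.0, 0.0), (1.0, 1.0)] + list(points))
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(pts, pts[1:]))


def best_threshold(prob, landslide_day, thresholds):
    roc = {t: roc_point(prob, landslide_day, t) for t in thresholds}
    scored = {t: lambda_index(*r) for t, r in roc.items() if r[2] is not None}
    best = max(scored, key=scored.get)
    return best, auc([(r[0], r[1]) for r in roc.values()])


# Toy (hypothetical) example: probabilities from Eq. (2) and LD flags.
print(best_threshold([0.9, 0.8, 0.3, 0.2], [True, True, False, False],
                     [0.1, 0.25, 0.5, 0.85, 0.95]))
```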

Fig. 5

ROC space with a hypothetical curve. The black dot represents perfect classification, while the diagonal line (in green) represents random guessing

Performance evaluation of the model with multiple warning levels

For a given region, not only the probability of landslide occurrence but also the magnitude of the possible hazard (in terms of the number, scar area, volume, or density of landslides) increases as rainfall becomes heavier (Gao et al. 2018). A model with multiple warning levels that considers landslide magnitude may therefore be of greater value for landslide warning. In this study, both the number of landslides and their size categories were used to evaluate hazard magnitude, and four classes of landslide events were specified following Piciullo et al. (2017): no landslides (no); small event (1–2 small-sized landslides, S); intermediate event (3–9 small-sized landslides or ≥ 1 medium-sized landslide, I); and large event (≥ 10 small- and medium-sized landslides or ≥ 1 large- or super-large-sized landslide, L). Correspondingly, four warning levels may be issued: no warning (WL0), moderate warning (WL1), high warning (WL2), and very high warning (WL3), which require three rainfall thresholds, denoted RT1, RT2, and RT3.
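The event classes and warning levels above can be expressed as two small functions. This is a sketch under stated assumptions: large- and super-large-sized landslides are counted together, the rules are checked from the largest class downward, and the three thresholds are treated as probability levels from Eq. (2) with RT1 < RT2 < RT3.

```python
def event_class(n_small, n_medium, n_large):
    """Classify one day's landslides into "no", "S", "I" or "L".
    n_large counts large- and super-large-sized landslides together."""
    if n_large >= 1 or n_small + n_medium >= 10:
        return "L"
    if n_medium >= 1 or n_small >= 3:
        return "I"
    if n_small >= 1:
        return "S"
    return "no"


def warning_level(p, rt1, rt2, rt3):
    """Map the probability p from Eq. (2) to WL0..WL3 (returned as 0..3),
    assuming the thresholds are probability levels with rt1 < rt2 < rt3."""
    if p >= rt3:
        return 3
    if p >= rt2:
        return 2
    if p >= rt1:
        return 1
    return 0
```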

Calvello and Piciullo (2016) proposed a method for evaluating the performance of models that provide multiple warning levels; it was further modified by Piciullo et al. (2017), applied in Campania, Italy, and is also employed in the present research. The method uses two criteria to evaluate the performance of a warning model. The first is the "alert classification" criterion, which uses a classification scheme developed from a two-by-two confusion matrix (Fig. 6a). Correct predictions (CP) are assigned when one of the two lowest warning levels was issued and one of the two smallest event classes occurred (except for no warning with no landslides), or when one of the two highest warning levels was issued and one of the two largest event classes occurred. True negatives (TN) represent no warning and no landslide occurrence. False alerts (FA) and missed alerts (MA) are both erroneous forecasts. The second is the "grade of accuracy" criterion, which uses four color codes to represent the accuracy of the predictions: green (no error, Gre), yellow (minor error, Yel), red (significant error, Red), and purple (severe error, Pur) (Fig. 6b).

Fig. 6

Alert classification (a) and grade of accuracy (b) criteria to assess the performance of a warning model with four classes of alert (Piciullo et al. 2017)
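A sketch of the alert classification criterion follows. The two-by-two blocks are read from the description above ({WL0, WL1} with {no, S}; {WL2, WL3} with {I, L}); assigning FA to a high warning with no or a small event, and MA to a low warning with an intermediate or large event, is our assumption about Fig. 6a. The grade-of-accuracy colors are not reproduced, since their cell-by-cell layout is only given in the figure.

```python
EVENTS = {"no": 0, "S": 1, "I": 2, "L": 3}


def alert_class(wl, event):
    """Alert classification of one day (sketch).

    wl: warning level 0..3 (WL0..WL3); event: "no", "S", "I" or "L".
    Assumed: FA = WL2/WL3 with no or small event,
             MA = WL0/WL1 with intermediate or large event.
    """
    ev = EVENTS[event]
    if wl == 0 and ev == 0:
        return "TN"
    low_warning, small_event = wl <= 1, ev <= 1
    if low_warning == small_event:
        return "CP"
    return "MA" if low_warning else "FA"
```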

Using these criteria, Calvello and Piciullo (2016) defined 14 performance indicators, eight of which were adopted by Piciullo et al. (2017). Because of the functional relationships among the indicators, only four of those used by Piciullo et al. (2017) were employed in our study (Table 2). Rainfall thresholds RT1, RT2, and RT3 can then be defined for the warning model based on these indicators.

Table 2 Indicators used for evaluating the performance of a model with multiple levels of alert (Piciullo et al. 2017)

Results and discussion

Results of logistic regression

Table 3 summarizes the numbers of landslide days (LDs) and non-landslide days (NLDs) for each alert zone (AZ) during 2003–2010. Landslide incidence is highest in AZ3 (10%) and lowest in AZ4–AZ6 (< 5%). Rainfall conditions associated with the LDs and NLDs are plotted in the (R0, Ra) space in Fig. 7 as red and gray points, respectively. R0 values mostly lie in the range 0 ≤ R0 ≤ 150 mm, and Ra values mostly in the range 0 ≤ Ra ≤ 200 mm. The red and gray points overlap over a large part of the graph: no landslides were recorded on some days with relatively large R0 or Ra values, while some landslides were triggered when R0 and Ra were relatively small. This is probably caused by the spatial heterogeneity of rainfall. Only one rain gauge, typically located in a river valley, is available in each AZ, and rainfall in the mountainous and hilly areas may differ significantly from that near the gauge. In other words, the precipitation data used are occasionally unable to represent rainfall conditions at the landslide locations. Nevertheless, Fig. 7 shows that landslide incidence tends to increase with increasing R0 and Ra, suggesting that these data can be used for rainfall threshold determination.

Table 3 Regression coefficients of the logistic regression model
Fig. 7

The intraday rainfall versus effective antecedent rainfall conditions for landslide days (red dots) and non-landslide days (gray dots) and the 10% (solid line), 50% (dashed line), and 90% (dotted line) probability thresholds derived from logistic regression for each alert zone using the calibration dataset

Table 3 also lists the regression coefficients in Eq. (2), which were derived using the Statistical Product and Service Solutions (SPSS) software. Eq. (2) can be rewritten as:

$$ {\displaystyle \begin{array}{c}{R}_0+\beta {R}_a=R(p)\\ {}\ \beta =\frac{\alpha_2}{\alpha_1},R(p)=\frac{1}{\alpha_1}\left(\ln \left(\frac{p}{1-p}\right)-{\alpha}_0\right)\end{array}} $$
(4)

which means that an iso-probability rainfall condition is a straight line with a slope of −β and a y-intercept of R(p) in the R0–Ra space (R0 on the y-axis). Figure 7 depicts the R0–Ra relationships for the 10%, 50%, and 90% probability levels. The iso-probability lines have slopes of −0.83 ≤ −β ≤ −0.51; that is, 1 mm of effective antecedent rainfall contributes only 0.51 to 0.83 times as much as 1 mm of intraday rainfall, indicating that intraday rainfall dominates over effective antecedent rainfall in landslide initiation. The value of β thus reflects the relative significance of antecedent rainfall. In cooler zones or in areas with less permeable soils, soil water from antecedent rainfall can be retained for a longer period, and β would be expected to be higher.
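Eq. (4) and the inverse of Eq. (2) can be sketched as two short Python functions. The coefficients in the usage loop are hypothetical placeholders, not the values in Table 3; they are chosen only so that the 50% line is easy to check by hand (β = 0.6, R(0.5) = 80 mm).

```python
import math


def iso_probability_line(a1, a2, a0, p):
    """Eq. (4): return (beta, R(p)) for the line R0 + beta*Ra = R(p)."""
    beta = a2 / a1
    r_p = (math.log(p / (1.0 - p)) - a0) / a1
    return beta, r_p


def landslide_probability(r0, ra, a1, a2, a0):
    """Invert Eq. (2): p = 1 / (1 + exp(-(a1*R0 + a2*Ra + a0)))."""
    return 1.0 / (1.0 + math.exp(-(a1 * r0 + a2 * ra + a0)))


# Hypothetical coefficients (not those of Table 3):
a1, a2, a0 = 0.05, 0.03, -4.0
for p in (0.10, 0.50, 0.90):
    beta, r_p = iso_probability_line(a1, a2, a0, p)
    print(f"p = {p:.0%}: R0 = {r_p:.1f} - {beta:.2f} * Ra")
```

Any (R0, Ra) point on a printed line returns the corresponding p from `landslide_probability`, which is a quick consistency check between the two forms.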

Using Eq. (4), the probability of landslide occurrence was computed for each data point in Fig. 7. Theoretically, the probability corresponding to the lower limit of the rainfall conditions associated with LDs (pL) should equal 0%, and that corresponding to the upper limit of the rainfall conditions associated with NLDs (pH) should equal 100%. However, the calculated pH differs significantly from the theoretical value in all alert zones except AZ3 (Table 3), whereas the calculated pL is close to its theoretical value. To investigate the reason for this difference, we compared the probability derived from logistic regression with the landslide incidence computed by simple statistics. First, we defined the combined daily rainfall (Rc) as a linear combination of R0 and Ra:

$$ {R}_{\mathrm{c}}={R}_0+\beta {R}_a $$
(5)

To facilitate the analysis, Rc was then normalized by the maximum value calculated for the corresponding AZ (Rc,max):

$$ {NR}_{\mathrm{c}}={R}_{\mathrm{c}}/{R}_{\mathrm{c},\max } $$
(6)

where NRc is the normalized combined daily rainfall.

For each AZ, landslide incidence under different rainfall conditions was computed as follows: (i) NRc was computed for each rainy day; (ii) a series of overlapping intervals was defined for NRc, with the lower limit starting at 0 and increasing in steps of 0.01, and the interval length set to 0.01, 0.02, 0.05, and 0.1 when the lower limit lies in the ranges 0–0.1, 0.1–0.2, 0.2–0.4, and 0.4–1.0, respectively; only intervals whose upper limit is below the maximum NRc associated with NLDs were retained; (iii) for each interval, landslide incidence was computed as the ratio of the number of LDs to the number of rainy days, and the corresponding NRc was represented by the average NRc of those rainy days. Variable interval lengths were used because the number of rainy days decreases as NRc increases.
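Steps (i) to (iii) can be sketched in plain Python. Treating each interval as half-open [lower, upper), stopping once the upper limit reaches the maximum NRc of the NLDs, and rounding the limits to two decimals to avoid floating-point drift are our assumptions; the function names and the tiny data set are hypothetical.

```python
def normalized_combined_rainfall(r0, ra, beta):
    """Eqs. (5)-(6): Rc = R0 + beta*Ra, normalized by its maximum."""
    rc = [x + beta * y for x, y in zip(r0, ra)]
    rc_max = max(rc)
    return [v / rc_max for v in rc]


def interval_length(lower):
    """Interval length as a function of the lower limit, step (ii)."""
    if lower < 0.1:
        return 0.01
    if lower < 0.2:
        return 0.02
    if lower < 0.4:
        return 0.05
    return 0.1


def landslide_incidence(nrc, is_ld, nrc_max_nld):
    """Sliding-interval incidence, steps (ii)-(iii).
    Returns a list of (mean NRc, LDs / rainy days) per non-empty interval."""
    results = []
    k = 0
    while True:
        lower = round(k * 0.01, 2)                   # lower limit moves by 0.01
        upper = round(lower + interval_length(lower), 2)
        if upper >= nrc_max_nld:                     # keep upper limit below max NRc of NLDs
            break
        members = [(x, d) for x, d in zip(nrc, is_ld) if lower <= x < upper]
        if members:
            mean_nrc = sum(x for x, _ in members) / len(members)
            incidence = sum(1 for _, d in members if d) / len(members)
            results.append((mean_nrc, incidence))
        k += 1
    return results
```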

The landslide incidence and NRc data obtained from this procedure were finally compared with the logistic regression model, which, by combining Eqs. (4)–(6), can be written as:

$$ {NR}_{\mathrm{c}}\cdot {R}_{\mathrm{c},\max }=\frac{1}{\alpha_1}\left(\ln \left(\frac{p}{1-p}\right)-{\alpha}_0\right) $$
(7)

The comparison is shown in Fig. 8, together with the maximum NRc value associated with NLDs and the corresponding theoretical probability (100%). The relationship expressed by Eq. (7) is generally consistent with the statistical data, although the agreement is much better for smaller NRc (or p) values than for larger ones. This probably results from two factors. First, very few rainy days are available at larger NRc values, leading to significant fluctuation in the corresponding statistical landslide incidence. Second, the number of NLDs is one order of magnitude larger than that of LDs, so the rainfall data associated with NLDs exerted a greater influence on the regression coefficients summarized in Table 3. This might also explain why the computed pH is closest to the theoretical value for the alert zone with the greatest number of LDs (AZ3).

Fig. 8

The relationship between probability of landslide occurrence (p) and normalized combined daily rainfall (NRc) calculated with the calibration dataset. The gray points represent data obtained by simple statistics, and the black curve was derived from Eq. (7). The maximum NRc value associated with the non-landslide days and the corresponding theoretical probability (100%) is also shown (the black square)

Determination of the optimal rainfall threshold

Despite the uncertainty associated with logistic regression discussed above, the method can still be used to define the rainfall threshold, because in this research the threshold was determined by ROC analysis rather than by adopting a specific probability level. The four contingencies (TP, FP, TN, and FN) and the related skill scores (POD, POFD, POFA, HK, and δ) were computed for different probability thresholds. The ROC curves are shown in Fig. 9. The AUC values range from 0.839 to 0.907, well above the value of 0.5 for random guessing, and lie within the 95% confidence intervals, indicating that the prediction capability of the logistic regression model is acceptable.

Fig. 9

The ROC curve for each alert zone

Taking AZ1 as an example, the skill scores for different probability thresholds are listed in Table 4. The number of true positives (TP), and hence POD, is maximized at the 1% probability threshold. However, this threshold produces a large number of false alarms (FP), leading to high POFD and POFA values (0.549 and 0.920, respectively). As the threshold increases, both POFD and POFA decrease because false alarms are reduced, reaching their minimum (0) at the 90% probability threshold. In this case, however, POD is dramatically low (0.118) owing to numerous missed alarms (FN). The HK skill score measures the difference between POD and POFD and is mathematically \( \sqrt{2} \) times the perpendicular distance between the (POFD, POD) point and the diagonal line representing random guessing (the green line in Fig. 5). Therefore, the point on the ROC curve with the maximum HK is the one farthest from the diagonal line. Given the convex shape of the ROC curve (Fig. 9), this point is usually also the closest to perfect classification (the upper left corner of the ROC graph). Consequently, HK and the Euclidean distance δ reach their optimal values at approximately the same threshold (p = 5%). As a linear combination of HK, δ, and POFA, the Λ index is maximized at a slightly higher threshold (p = 7%), which was finally selected as the optimal threshold.
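The \( \sqrt{2} \) factor follows from the point-to-line distance formula: the perpendicular distance d from a point (POFD, POD) to the diagonal POD = POFD is |POD − POFD|/√2. For any point above the diagonal, and with δ measured to the perfect classification (0, 1):

$$ d=\frac{POD- POFD}{\sqrt{2}}=\frac{HK}{\sqrt{2}}\kern1em \Rightarrow \kern1em HK=\sqrt{2}\,d,\kern2em \delta =\sqrt{{POFD}^2+{\left(1- POD\right)}^2} $$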

Table 4 Skill scores corresponding to different probability thresholds for AZ1

Table 5 summarizes the optimal rainfall threshold and the corresponding skill scores for each alert zone. The best performance is obtained at probability levels of 4% ≤ p ≤ 10%. The corresponding POD ranges from 0.65 to 0.79, which is close to the values reported in the literature (Staley et al. 2013; Giannecchini et al. 2016). Because more than one landslide was recorded on some landslide days, the proportion of landslides triggered by rainfall exceeding the optimal threshold is higher than POD, ranging from 0.71 to 0.95. Although POFA was taken into account in determining the optimal threshold, it remains relatively high (0.67 ≤ POFA ≤ 0.84), especially compared with POFD (0.07 ≤ POFD ≤ 0.18). This most likely results from NLDs far outnumbering LDs: even a small fraction of NLDs exceeding the threshold produces more false alarms than there are true positives.
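A short worked example makes the POFA/POFD gap concrete. Since TP = POD × (number of LDs) and FP = POFD × (number of NLDs), POFA depends on the ratio of NLDs to LDs. The counts below are hypothetical and only illustrate the arithmetic; they are not taken from Table 5.

```python
def pofa_from_rates(pod, pofd, n_ld, n_nld):
    """POFA = FP / (TP + FP), with TP = POD * n_LD and FP = POFD * n_NLD."""
    tp = pod * n_ld
    fp = pofd * n_nld
    return fp / (tp + fp)


# Hypothetical alert zone where NLDs outnumber LDs 20 to 1:
print(round(pofa_from_rates(pod=0.7, pofd=0.1, n_ld=50, n_nld=1000), 3))
```

Here TP = 35 and FP = 100, so POFA is about 0.74 even though only 10% of NLDs trigger a warning; with equal numbers of LDs and NLDs (500 each), the same POD and POFD would give a POFA of only 0.125.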

Table 5 The optimal rainfall threshold and the corresponding skill scores

Rainfall thresholds for different warning levels

Table 6 lists the number of landslide events in each of the three magnitude classes for each AZ during the 2003–2010 period. About half of the landslide events are small, occurring on average 1 to 5 times a year, while intermediate and large events account for 32.4% and 18.7%, respectively. The distribution of the triggering rainfall for each class of events is shown in Fig. 10. Despite the considerable spread in most groups of rainfall data, the median and third quartile of the triggering rainfall generally increase with landslide magnitude. There are, however, two exceptions. First, in AZ4 the median triggering rainfall for intermediate events is lower than that for small events. Second, the triggering rainfall for the only large landslide event is smaller than that for most small and intermediate events. Inspection of the data indicates that these exceptions are probably attributable to two factors: (i) the relatively poor representativeness of the rainfall data in AZ4 and AZ6, and (ii) the incompleteness of the landslide records. These problems undoubtedly also affect the data for the other AZs, but they are less severe there than in AZ4 and AZ6 and do not mask the positive relationship between landslide magnitude and triggering rainfall.
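The comparison in Fig. 10 rests on two summary statistics per class. The sketch below computes the median and third quartile of the triggering rainfall for small (S), intermediate (I), and large (L) events and checks whether both rise with magnitude. The rainfall values are synthetic, not data from Fig. 10, and the helper names are hypothetical. Note that `statistics.quantiles` needs at least two values per class on most Python versions, so a class with a single event, such as the lone large event mentioned above, would have to be handled separately.

```python
from statistics import median, quantiles


def summarize(rain_by_class):
    """Median and third quartile (Q3) of triggering rainfall for each event class."""
    summary = {}
    for cls, values in rain_by_class.items():
        _, _, q3 = quantiles(values, n=4)  # quartile cut points; needs >= 2 values
        summary[cls] = (median(values), q3)
    return summary


def _strictly_rising(xs):
    return all(a < b for a, b in zip(xs, xs[1:]))


def increases_with_magnitude(summary, order=("S", "I", "L")):
    """True if both the median and Q3 rise strictly from small to large events."""
    meds = [summary[c][0] for c in order]
    q3s = [summary[c][1] for c in order]
    return _strictly_rising(meds) and _strictly_rising(q3s)


# Synthetic triggering rainfall (mm) for one hypothetical zone
rain = {"S": [20, 25, 30, 35, 40], "I": [30, 38, 45, 50, 60], "L": [50, 62, 70, 85, 90]}
print(increases_with_magnitude(summarize(rain)))
```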

Table 6 The number of the three classes of landslide events with different magnitudes during the 2003–2010 period
Fig. 10 Comparison of the triggering rainfall for each class of landslide events during the 2003–2010 period (S, small event; I, intermediate event; L, large event)

To define the three thresholds (RT1, RT2, and RT3) separating the four warning levels, the sensitivity of the four performance indicators (Ieff, HRL, PPW, and PSM) to each threshold was analyzed. Taking AZ1 as an example, the indicators were computed for different probability combinations, and the results are shown in Table 7. The number of correct predictions (CP) increases as the probability level for RT1 decreases, whereas the number of true negatives (TN) shows the opposite trend. Changing RT1 has no influence on the number of false alarms (FA) or missed alarms (MA). Accordingly, Ieff, HRL, and PPW reach their maximum at the smallest RT1 (p = 1%), and the PSM indicator is also optimized at the 1% probability threshold for RT1. It therefore seems that the smaller the probability for RT1, the better the model performs. However, further analysis reveals that, with RT2 and RT3 fixed, the variation in all four indicators is driven solely by the variation in the number of CP. Comparing the two classes of criteria illustrated in Fig. 6 shows that CP events come from two sources: events with no prediction error and with landslide activity (Gre – TN), and events with minor error (Yel). In other words, the numbers of CP, Yel, Gre, and TN are related by CP = Yel + (Gre – TN). When the probability for RT1 decreases from 10 to 1%, Gre – TN rises from 10 to 16, while Yel grows from 44 to 377, indicating that the increase in CP is primarily driven by events with minor prediction error. That is, if the 1% probability threshold were used for RT1 in an operational early warning system, a large number of WL1 warnings would be issued on days without landslides, undermining the reliability of the system.
For this reason, the optimal threshold determined by ROC analysis, i.e., p = 7%, was ultimately chosen as RT1. Compared with the 1% probability threshold, this reduces the proportion of CP events classified as Yel from 0.96 to 0.81.
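The decomposition CP = Yel + (Gre – TN) makes the argument against a 1% RT1 easy to quantify. The sketch below plugs in the AZ1 counts quoted above for the two ends of the RT1 range; the function name is hypothetical, and the counts for p = 7% are not reproduced here.

```python
def cp_breakdown(yel, gre_minus_tn):
    """Correct predictions CP = Yel + (Gre - TN) and the share of CP that are minor-error (Yel) events."""
    cp = yel + gre_minus_tn
    return cp, yel / cp


# Counts quoted in the text for AZ1, with RT2 and RT3 held fixed
for p, yel, gre_minus_tn in [(1, 377, 16), (10, 44, 10)]:
    cp, share = cp_breakdown(yel, gre_minus_tn)
    print(f"RT1 at p = {p}%: CP = {cp}, Yel share = {share:.2f}")
```

At p = 1% this gives CP = 393 with a Yel share of 0.96, matching the value quoted above; at p = 10% it gives CP = 54 with a Yel share of 0.81.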

Table 7 Performance of the warning model with different combinations of probability thresholds for AZ1 during the period 2003–2010

As RT2 increases, fewer events are classified as FA while more are classified as MA. Since the number of TN stays constant, the number of CP depends only on the total number of incorrect predictions (FA + MA). The number of Pur increases marginally with RT2. Consequently, the Ieff, HRL, PPW, and PSM indicators respond differently to changes in RT2 and reach their optimal values at different probability thresholds: p = 30%, 40%, and 80% for Ieff; p = 30% for HRL; p = 80% for PPW; and p ≤ 40% for PSM. As for RT3, the numbers of CP, FA, and MA are insensitive to this threshold, while the number of Pur decreases drastically as RT3 increases. A high threshold (p = 90%) is therefore required to minimize PSM. Selecting RT2 is less straightforward. Table 7 shows that the number of Pur stays at its minimum over a relatively wide range of RT2 (p = 10–40%), within which the optimal value of FA + MA is 18; this value remains the smallest when more thresholds are considered. The optimal RT2 threshold can therefore be identified as the one that first minimizes the occurrences with severe error (Pur) and then yields the fewest incorrect predictions (FA + MA). If more than one threshold satisfies this criterion, the lowest, i.e., the most conservative, is used. For instance, although the 30% and 40% probability thresholds give identical values of Pur and of FA + MA (Table 7), the lower threshold generates fewer missed alarms and more false alarms and was ultimately selected for RT2.
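The RT2 selection rule described above is a lexicographic minimization: fewest Pur first, then fewest FA + MA, then the lowest probability level as a tie-breaker. A minimal sketch, assuming each candidate threshold has already been scored; the `Candidate` type and the rows are hypothetical and are not the values in Table 7.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    p: int    # probability level (%) defining RT2
    pur: int  # events with severe error (Pur)
    fa: int   # false alarms
    ma: int   # missed alarms


def select_rt2(candidates):
    """Fewest Pur, then fewest FA + MA, then the lowest (most conservative) p."""
    return min(candidates, key=lambda c: (c.pur, c.fa + c.ma, c.p))


# Illustrative rows only
rows = [
    Candidate(p=10, pur=2, fa=14, ma=6),
    Candidate(p=20, pur=2, fa=12, ma=7),
    Candidate(p=30, pur=2, fa=11, ma=7),
    Candidate(p=40, pur=2, fa=9, ma=9),
    Candidate(p=50, pur=3, fa=8, ma=9),
]
print(select_rt2(rows).p)
```

The p = 50% row has the smallest FA + MA but is rejected because it has more Pur events, and the tie between 30% and 40% is resolved in favor of the lower threshold.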

To obtain the optimal RT2 and RT3 thresholds for each alert zone, more probability combinations were examined. With the probabilities for RT2 and RT3 varying in the ranges 20% ≤ p ≤ 90% and 35% ≤ p ≤ 95%, respectively, at 5% intervals, a total of 117 combinations were considered. The optimal combination was identified using the criteria discussed above and is listed in Table 8. For AZ1 and AZ5, both Pur and FA + MA are minimized at the selected combination, whereas for the other AZs the value of FA + MA is the second smallest among the 117 combinations. Table 8 shows that the optimal probabilities for RT1, RT2, and RT3 lie within the ranges 4% ≤ p ≤ 10%, 30% ≤ p ≤ 55%, and 75% ≤ p ≤ 95%, respectively, which happen to be close to the probability thresholds used for four warning levels in the literature (p = 10%, 50%, and 90%, respectively) (Zhuang et al. 2014; Huang et al. 2015). The thresholds are also plotted in the (R0, Ra) space in Fig. 11.
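The figure of 117 combinations is consistent with a 5% grid on which RT2 must lie strictly below RT3; the full grid would contain 15 × 13 = 195 pairs. The ordering constraint is an assumption inferred from the count, not a rule stated in the text, and the function name is hypothetical.

```python
def probability_combinations(rt2_range=(20, 90), rt3_range=(35, 95), step=5):
    """(RT2, RT3) probability pairs on a regular grid, keeping only RT2 < RT3."""
    rt2_values = range(rt2_range[0], rt2_range[1] + 1, step)
    rt3_values = range(rt3_range[0], rt3_range[1] + 1, step)
    # Assumption: warning levels are ordered, so RT2 must be strictly below RT3
    return [(p2, p3) for p2 in rt2_values for p3 in rt3_values if p2 < p3]


combos = probability_combinations()
print(len(combos))  # 117
```

Each pair would then be scored with the Pur and FA + MA criteria, with RT1 fixed at its ROC-optimal value.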

Table 8 The RT1, RT2, and RT3 thresholds and the corresponding performance
Fig. 11 The intraday rainfall versus effective antecedent rainfall conditions for the four classes of landslide events during the 2003–2010 period and the three thresholds for the warning model. The gray, yellow, orange, and red dots correspond to no landslides, small events, intermediate events, and large events, respectively. The solid, dashed, and dotted lines represent the RT1, RT2, and RT3 thresholds, respectively.

Validation of the rainfall thresholds

To validate the thresholds, landslide and rainfall data for the monsoon seasons of 2011–2015 were analyzed. Table 9 lists the number of rainy days and LDs in each zone. Using the thresholds summarized in Table 5, the four contingency counts and the related skill scores were computed (Table 9). Compared with the calibration dataset, POD is generally higher for the validation dataset, while POFD differs only slightly; consequently, the HK skill score is higher. However, POFA is also higher for most zones, especially AZ4 and AZ6, where landslides were recorded on only 2 days during the 5-year period. The rainfall data show that the average daily precipitation on rainy days in the monsoon seasons of 2011–2015 (11.9 mm) was lower than in 2003–2010 (12.6 mm), which is consistent with the lower proportion of LDs. However, the decrease is much more drastic in AZ2, AZ4, and AZ6 than in the other zones, suggesting that the landslide records in these zones are severely incomplete: as local governments have put more effort into geohazard mitigation, many landslides that caused no damage were probably not recorded, which inflates POFA. Evaluating the thresholds with the skill scores from the remaining zones (AZ1, AZ3, and AZ5) is therefore more reliable. For these zones, the skill scores derived from the validation dataset are comparable to those derived from the calibration dataset, with POD, POFD, and POFA in the ranges 0.71–0.77, 0.08–0.16, and 0.67–0.86, respectively.
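The link between incomplete records and inflated POFA can be made explicit: a landslide day that exceeded the threshold but went unrecorded is counted as a non-landslide day, turning a true positive into a false positive. A minimal sketch with hypothetical counts and a hypothetical function name:

```python
def pofa_with_unrecorded(tp, fp, unrecorded):
    """POFA after `unrecorded` landslide days above the threshold are logged as NLDs."""
    # Each unrecorded landslide day that exceeded the threshold turns a TP into an FP.
    tp_obs = tp - unrecorded
    fp_obs = fp + unrecorded
    return fp_obs / (tp_obs + fp_obs)


# Hypothetical zone: 10 TP and 30 FP if every landslide had been recorded
print(pofa_with_unrecorded(10, 30, 0), pofa_with_unrecorded(10, 30, 5))
```

Losing half of the recorded hits raises POFA from 0.75 to 0.875 even though the threshold itself is unchanged.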

Table 9 Performance of the rainfall thresholds for landslide occurrence for the period 2011–2015

The validation data from AZ1, AZ3, and AZ5 were further used to evaluate the RT1, RT2, and RT3 thresholds, as illustrated in Fig. 12. Table 10 summarizes the classification results and the corresponding performance indicators. Comparing Table 8 and Table 10 shows that, although the Ieff and PPW indicators decrease slightly for AZ1 in 2011–2015, the prediction performance for each zone differs little between the two periods, indicating that the thresholds are robust and have potential for practical use.

Fig. 12 The intraday rainfall versus effective antecedent rainfall conditions for the four classes of landslide events during the period 2011–2015 and the three thresholds for the warning model. The gray, yellow, orange, and red dots correspond to no landslides, small events, intermediate events, and large events, respectively. The solid, dashed, and dotted lines represent the RT1, RT2, and RT3 thresholds, respectively.

Table 10 Performance of the rainfall thresholds for different warning levels for the period 2011–2015

Conclusions

In this study, daily rainfall data from 6 gauges and 1836 landslide records in Dazhou for the period 2003–2015 were systematically analyzed. The study region was divided into six alert zones. For each zone, the rainfall threshold for landslide activity was defined with the 2003–2010 dataset using logistic regression, ROC analysis, and a procedure for evaluating the performance of models with multiple warning levels, and was then validated with the 2011–2015 dataset. The following conclusions can be drawn:

  1. The probability of landslide occurrence computed by logistic regression was compared with the empirical frequency obtained by simple counting for various rainfall conditions. The difference between the two values can be relatively large when the probability is high. Therefore, it is strongly recommended that the rainfall threshold be defined using a rigorous methodology such as ROC analysis rather than a predetermined probability level.

  2. Diverse threshold combinations were used to evaluate the performance of the model with four warning levels, and the best-performing combination was identified. The performance of the thresholds on the validation dataset is comparable to that on the calibration dataset, with the efficiency index in the range 0.74 ≤ Ieff ≤ 0.95. Given the large geographical area of each alert zone, rainfall measured by a single gauge is insufficient to represent rainfall conditions across the whole zone. Nonetheless, the performance of the warning model is acceptable. This is probably because the rainfall parameters used in this study are based on a longer period (8 days) of rainfall data, over which a single gauge is likely to be more representative than when event rainfall data are used. Hence, the threshold determination procedure employed in this research is expected to be applicable to other poorly gauged areas.

Many other factors, such as geology, topography, and vegetation, also play important roles in slope stability, so landslide activity cannot be predicted from rainfall data alone. It should therefore be emphasized that, in operational regional warning systems, rainfall thresholds should be used in combination with landslide susceptibility zonation techniques that account for these other factors.