1 Introduction

A disaster chain arises when a primary disaster triggers a series of secondary disasters; it is a complex phenomenon involving the propagation and amplification of disasters (Han et al. 2019a). As a common meteorological disaster with high potential for harm (Zeng et al. 2019), rainstorms can cause serious damage to infrastructure and threaten public safety (Gao et al. 2017); they are also a key trigger for geohazards (Abdulwahid and Pradhan 2017), forming rainstorm-geohazard disaster chains. According to the China Meteorological Administration (2005–2019), rainstorm-geohazard disaster chains occurred 161 times in Hubei Province and 135 times in Chongqing Municipality from 2015 to 2019, and a single rainstorm triggered as many as 187 geohazards. These data highlight the importance of rainstorm-geohazard disaster chains.

Because of the chained structure of its evolution, the damage and impact caused by a disaster chain are usually much greater than those caused by the simple superposition of several hazards (Han et al. 2019a). The rainstorm-geohazard disaster chain therefore often causes greater losses and hinders sustainable regional socioeconomic development. As a key step in disaster prevention and the effective reduction of disaster losses, hazard assessment plays an extremely important role in disaster management (Han et al. 2019a). Unfortunately, most previous hazard assessments do not consider the chain reaction between rainstorms and geohazards, so a specific hazard assessment model for rainstorm-geohazard disaster chains is lacking. Although some recent studies treat rainstorms as a trigger for geohazards in hazard assessments (Ng et al. 2021; Rong et al. 2020), their results do not reflect the overall hazard of the rainstorm-geohazard disaster chain and ignore the fact that rainstorms can trigger multi-level secondary geohazards.

In hazard assessment models of rainstorm-geohazard disaster chains, how to represent the traits of rainstorms is a concern. Gao et al. studied the spatial characteristics of rainstorms through a stochastic rain field model and found that the cumulative rainfall depth of the maximum rolling rainfall at different time periods is important for landslide studies (Gao et al. 2017). Describing rainstorm characteristics by the spatial distribution and intensity of rainfall, rainstorms were incorporated into landslide impact factors to outline landslide-susceptible areas in a spatial and temporal context (Ng et al. 2021). The average and maximum annual rainfall has also been used as a condition factor to reflect the impact of rainstorms on geohazards (Abdulwahid and Pradhan 2017; Rong et al. 2020). The affected range of rainstorms is often large, and rainstorm models based on case (Yang et al. 2020) or high-precision data (Chang et al. 2021) are usually not able to reflect the dangerous features of rainstorms on a large scale. At the same time, it is difficult to comprehensively and objectively reflect the destructive property of rainstorms by using annual average precipitation or precipitation anomalies (Abdulwahid and Pradhan 2017; Rong et al. 2020) on a large regional scale.

Hazard assessments of disaster chains cannot simply superimpose rainstorms and geohazards but must consider their causal relationship (Cui et al. 2021). How to reflect the causality of primary hazards on secondary disasters is the key to the hazard assessment of disaster chains (Han et al. 2019a), and the form of disaster chains can be realized through various models (Rong et al. 2020). Bayesian networks (Kwag and Gupta 2017) and complex networks (Han et al. 2019b) can be used for the evaluation models of disaster chain risk. In addition, the susceptibility of the disaster chain can be studied using series and parallel models (Han et al. 2019b). Machine learning models based on conditional probability can also reflect the causality of disaster chains (Bernardo et al. 2007). The essence of ensemble methods is to construct an ensemble of individual classifiers to reduce prediction errors and improve performance. Due to their superior predictive power and high robustness, an increasing number of ensemble methods is being applied and explored in various assessments (Hu et al. 2021). Rong et al. used GBDT combined with Bayesian optimization to investigate the susceptibility of landslides triggered by rainstorms (Rong et al. 2020). However, the existing models cannot adequately describe the actual character of the rainstorm-geohazard disaster chain. In addition, the damage degree of geohazards is usually reflected by disaster intensity, and the occurrence probability of high-intensity geohazards is often lower than that of low-intensity geohazards, so the intensity data of the geohazards show an unbalanced distribution, while the current assessment models fail to consider this feature.

To help fill these gaps, this study develops a rainstorm-geohazard disaster chain model based on the formation process and the characteristics of the actual data. First, the framework of the hazard assessment model is established from the primary disaster (rainstorm) and the secondary disaster (geohazard). Second, rainstorm parameters are used as variables to obtain rainstorm intensity under multiple scenarios using the Pearson type III curve and kriging interpolation, and the secondary geohazards are calculated in multiple steps using DE-LightGBM and SMOTE-Tomek. Third, the three types of rainstorm-geohazard disaster chains are comprehensively evaluated in terms of overall characteristics and level distribution. Finally, the hazard results for different types of rainstorm-geohazard disaster chains, secondary geohazards and rainstorm intensities are compared, and model performance is verified.

2 Methodology

2.1 Formation process of rainstorm-geohazard disaster chains

As a common meteorological hazard, rainstorms are defined as short-term heavy precipitation with daily rainfall greater than or equal to 50 mm (China Meteorological Administration, 2005–2019). Rainwater does not drain quickly in low-lying, enclosed terrain, which leads to waterlogging and oversaturation of the soil. Studies have shown that rainstorms play an important role in inducing geohazards (Rong et al. 2020).

Common secondary geohazards caused by rainstorms mainly include collapses and landslides. Collapse is a geological phenomenon in which a dangerous rock on a steep slope suddenly breaks away from its parent body under the action of gravity (Li et al. 2020a). Under hydrostatic pressure, the infiltration of rainwater not only further promotes fracture expansion but also leads to increased stress and reduced strength of the geotechnical soil that reinforces plastic flow deformation (Chen and Cui 2017). It also increases the slip force of the perilous rock to accelerate the formation of collapse. Combined with the impact of rainstorm and disaster-pregnant environment, the rock spalls to cause collapse.

The sliding of a rock–soil mass along a shear damage surface is called a landslide (Abdulwahid and Pradhan 2017). The infiltration of rainwater increases the water content of the geotechnical body in the landslide area. As the water level rises, the dynamic and hydrostatic pressures increase, which increases the weight of the geotechnical body. As the duration and volume of rainfall increase, the geotechnical body gradually becomes saturated, which reduces the shear strength of the sliding surface and triggers the landslide.

Rainstorm-geohazard disaster chains not only have the type of one-level secondary geohazards (collapses or landslides) but also exist in the form of multi-level secondary geohazards, such as rainstorm-collapses-landslides disaster chain (Yilmaz et al. 2006). Areas with rainstorm–collapse–landslide disaster chains occur where the terrain is sloping and the back edge is a steep cliff. Affected by rainstorms, rainwater flows into the joints and pores of geotechnical bodies. If the fissures of the geotechnical body are filled with rainwater, it will increase the pore water pressure. Under the influence of rainstorms, the disaster started with a collapse (one-level secondary geohazards) and then developed into a landslide (two-level secondary geohazards).

2.2 Hazard assessment model for rainstorm-geohazard disaster chains

In general, the hazard value of a disaster is equal to its susceptibility (the occurrence probability of the disaster) multiplied by its intensity (the damage degree caused by the disaster) (Han et al. 2019b). When only the primary hazard is considered, it is defined as:

$$H_{1} = P_{1} *I_{1}$$
(1)

where H1 is the hazard of the rainstorm, P1 is the rainstorm susceptibility, and I1 is the rainstorm intensity.

The hazard value of a disaster chain is the sum of the hazard values of the primary disaster (rainstorm) and its secondary disasters (Han et al. 2019a). Based on the formation process, the secondary geohazard can be divided into two types: one-level secondary geohazard (collapse or landslide) and multi-level secondary geohazard (collapse-landslide). Therefore, the hazard assessment model of the rainstorm-geohazard disaster chain is constructed as shown in Eqs. 2 and 3:

$$H_{1,2} = P_{1} *I_{1} + P_{1,2} *I_{1,2}$$
(2)
$$H_{1,2,3} = P_{1} *I_{1} + P_{1,2} *I_{1,2} + P_{1,2,3} * \, I_{1,2,3}$$
(3)

where H1,2 and H1,2,3 denote the hazard of the disaster chain that generates one-level and multi-level secondary geohazards, P1 and I1 are the susceptibility and intensity of the rainstorm, P1,2 and I1,2 are the susceptibility and intensity of the one-level secondary geohazard, and P1,2,3 and I1,2,3 are the susceptibility and intensity of the multi-level secondary geohazard, respectively.

From the above equations, rainstorm-geohazard disaster chains can be divided into two parts: rainstorms (P1 and I1) and secondary geohazards (P1,2, I1,2, P1,2,3 and I1,2,3). Therefore, accurately calculating these two parts is important for hazard assessments of rainstorm-geohazard disaster chains.
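The additive structure of Eqs. 1 to 3 can be illustrated with a minimal sketch. The function name `chain_hazard` and all numeric values are hypothetical, and the intensities are assumed to be already normalized to a common scale; the sketch only shows how each chain level adds one susceptibility-intensity product to the total.

```python
def chain_hazard(stages):
    """Sum P * I over the stages of a disaster chain (Eqs. 1-3).

    `stages` is a list of (susceptibility, intensity) pairs ordered as
    rainstorm, one-level geohazard, multi-level geohazard.
    """
    return sum(p * i for p, i in stages)


# Illustrative values only; none of these numbers come from the study.
rainstorm = (1.0, 0.6)    # (P1, I1)
collapse = (0.4, 0.5)     # (P1,2, I1,2)
landslide = (0.2, 0.75)   # (P1,2,3, I1,2,3)

h1 = chain_hazard([rainstorm])                          # Eq. 1
h12 = chain_hazard([rainstorm, collapse])               # Eq. 2
h123 = chain_hazard([rainstorm, collapse, landslide])   # Eq. 3
```

Because every term is non-negative, extending the chain by one level can only keep or raise the hazard value.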

In the rainstorm-geohazard disaster chain, rainstorms are bound to occur; thus, P1 is set to 1. Rainstorms can be described from four perspectives (Jun et al. 2020): (a) precipitation exceeding a fixed threshold, such as daily precipitation exceeding 50 mm; (b) exceeding a variable threshold (return period), such as the 95th or 99th percentile; (c) precipitation over a sustained period of time, such as precipitation lasting one or three days; and (d) total rainfall value, such as perennial rainfall. This study characterizes the rainstorm intensity of a region by rainstorm parameters including duration and return period. According to different rainstorm parameters, rainfall is obtained as rainstorm intensity (I1) by fitting the distribution function with historical rainfall data, which enables a good description of rainstorm intensity (Zeng et al. 2019).

In this study, the chain occurrence of disasters is reflected in the calculation of secondary geohazards. In previous research results, rainstorms and environmental factors were used as input variables to calculate the hazard of geohazards triggered by rainstorms (Ng et al. 2021; Rong et al. 2020), indicating that geohazards occur under the conditions that rainstorms are a trigger factor and that environmental factors represent the disaster-pregnant environment. By analyzing the formation process of secondary geohazards and the research of other types of disaster chains (Han et al. 2019a), it is found that the intensity of one-level secondary disasters is influenced by whether one-level secondary disasters occur, the intensity of one-level secondary disasters is closely related to whether two-level secondary disasters can be triggered, and the intensity of two-level secondary disasters is influenced by whether two-level secondary disasters occur. Therefore, P1,2 is obtained with rainstorm intensity and environmental factors as input variables, I1,2 is obtained with rainstorm intensity, environmental factors and the value of P1,2 as input variables, P1,2,3 is obtained with rainstorm intensity, environmental factors and the value of I1,2 as input variables, and I1,2,3 is obtained with rainstorm intensity, environmental factors and the value of P1,2,3 as input variables.
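The order in which the four quantities feed into one another can be sketched as follows. The function `chain_features`, the stage names, and the stub predictors are hypothetical stand-ins for trained models; only the wiring of inputs follows the description above.

```python
def chain_features(rain_intensity, env_factors, predict):
    """Compute the secondary-geohazard outputs in the order described above.

    `predict` maps a stage name to a callable taking a feature list; the
    callables are hypothetical stand-ins for trained models.
    """
    base = [rain_intensity] + list(env_factors)
    p12 = predict["P12"](base)              # rainstorm intensity + environment
    i12 = predict["I12"](base + [p12])      # ... plus the value of P1,2
    p123 = predict["P123"](base + [i12])    # ... plus the value of I1,2
    i123 = predict["I123"](base + [p123])   # ... plus the value of P1,2,3
    return {"P12": p12, "I12": i12, "P123": p123, "I123": i123}


seen = {}


def make_stub(name):
    """Toy predictor: records its input size and returns the feature mean."""
    def stub(features):
        seen[name] = len(features)
        return sum(features) / len(features)
    return stub


stubs = {name: make_stub(name) for name in ("P12", "I12", "P123", "I123")}
out = chain_features(0.6, [0.2, 0.4], stubs)
```

In this toy run the first stage receives three inputs and each later stage receives four, which mirrors how the output of one stage becomes an extra input variable of the next.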

The susceptibility of secondary geohazards (P1,2 or P1,2,3) indicates whether secondary geohazards occur or not, which is expressed in probability form. The intensity level of secondary geohazards (I1,2 or I1,2,3) is divided into 5 levels according to the relevant research and the actual situation, in which the damage intensity increases step by step. The intensity level data of secondary geohazards are not evenly distributed, and the number of high-level samples is smaller than the number of low-level samples. It is necessary to consider the unbalanced problem in the calculation of the secondary disaster intensity.

Accordingly, the flowchart of hazard assessment of the rainstorm-geohazard disaster chain is shown in Fig. 1. The following section describes the calculation methods for rainstorms and secondary geohazards and the selection of environmental factors for secondary geohazards.

Fig. 1 Flowchart of hazard assessment of the rainstorm-geohazard disaster chain

2.2.1 Rainstorms

In this study, Pearson type III curves (Eq. 4) were used as the fitted distribution function to represent the rainfall intensity of a rainstorm in different scenarios. The Pearson type III curve is one of the commonly used probability distribution functions in hydrology (Juma et al. 2021) and is considered a traditional distribution type in China (Shao and Liu 2018), which can fit rainfall well (Zeng et al. 2019).

$$f\left(x\right)=\frac{{\beta }^{\alpha }}{\Gamma \left(\alpha \right)}{(x-\delta )}^{\alpha -1}\mathrm{exp}\left[-\beta \left(x-\delta \right)\right];\alpha >0, \beta >0,\delta \le x<\infty$$
(4)

where Γ(α) is the gamma function of α, and α, β, and δ denote the shape, scale, and location parameters of the Pearson type III curve, respectively.

The duration and return period were set based on rainstorm research results (Chang et al. 2021) and disaster cases from the China Meteorological Disaster Yearbook (China Meteorological Administration, 2005–2019). The durations were set to 1 and 3 days, which are consistent with the durations of typical rainstorms and are widely used to estimate extreme precipitation for a given return period. The rainfall return periods were set to 50 years and 100 years. Regional rainfall intensity distribution maps were obtained by kriging interpolation.
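As an illustration of how a return period translates into a design rainfall under Eq. 4, the sketch below inverts the Pearson type III distribution numerically. It assumes the curve is fitted to an annual maximum series, so that a T-year event has annual non-exceedance probability 1 − 1/T; the function names and parameter values are hypothetical, and the parameter fitting and kriging steps are not shown.

```python
import math


def reg_lower_gamma(a, x, terms=500):
    """Regularized lower incomplete gamma P(a, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / a
    total = term
    for n in range(1, terms):
        term *= x / (a + n)
        total += term
        if term < total * 1e-15:
            break
    return total * math.exp(-x + a * math.log(x) - math.lgamma(a))


def pearson3_cdf(x, alpha, beta, delta):
    """CDF of the density in Eq. 4 (shape alpha, parameter beta, location delta)."""
    return reg_lower_gamma(alpha, beta * (x - delta))


def design_rainfall(return_period, alpha, beta, delta):
    """Rainfall depth exceeded on average once per `return_period` years."""
    p = 1.0 - 1.0 / return_period           # annual non-exceedance probability
    lo, hi = delta, delta + 1.0 / beta
    while pearson3_cdf(hi, alpha, beta, delta) < p:
        hi = delta + 2.0 * (hi - delta)     # expand the bracket
    for _ in range(200):                    # bisection on the CDF
        mid = 0.5 * (lo + hi)
        if pearson3_cdf(mid, alpha, beta, delta) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical parameters for one station (rainfall in mm).
x50 = design_rainfall(50, alpha=1.0, beta=0.05, delta=50.0)
x100 = design_rainfall(100, alpha=1.0, beta=0.05, delta=50.0)
```

With α = 1 the distribution reduces to a shifted exponential, so the result can be checked against the closed form δ + ln(T)/β.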

2.2.2 Secondary geohazard

Ensemble learning has shown greater effectiveness in the spatial assessment of various environmental hazards (Hu et al. 2021). Compared with individual learners, ensemble learning has higher accuracy and shorter running time and is gradually becoming widely used in classification and regression in the field of machine learning (Li et al. 2021). LightGBM is an efficient GBDT implementation algorithm that improves computational efficiency while ensuring high model accuracy (Ke et al. 2017). Therefore, it is used for hazard assessment model of secondary geohazards in this study. LightGBM is expressed as Eq. 5:

$${F}_{k}\left({x}_{i}^{^{\prime}}\right)=\sum_{t=1}^{{L}_{T}}\sum_{j=1}^{{J}_{tk}}\frac{-{L}_{l}{G}_{tkj\_1}}{{G}_{tkj\_2}+{L}_{\lambda }}$$
(5)

where Fk is called the kth decision tree for obtaining the fitted value of each sample, \({x}_{i}^{^{\prime}}\) is the ith sample of the learned data, Ll is the learning rate, Lλ is the regularization factor, Gtkj_1 is the sum of the first-order derivatives of the samples in the jth leaf node of the kth decision tree, determined by the loss function in t iterations, and Gtkj_2 is the sum of the second-order derivatives of the samples in the jth leaf node of the kth decision tree, determined by the loss function in t iterations. LT is the maximum number of iterations, and Jtk is the number of leaf nodes in the kth decision tree in t iterations.
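The individual terms of Eq. 5 can be made concrete with a short sketch. The function names and numbers are hypothetical; the sketch evaluates only the leaf terms that a single sample falls into, one per iteration, and is not LightGBM's actual implementation.

```python
def leaf_value(grad_sum, hess_sum, learning_rate, reg_lambda):
    """One term of Eq. 5: -L_l * G_1 / (G_2 + L_lambda)."""
    return -learning_rate * grad_sum / (hess_sum + reg_lambda)


def raw_score(leaves, learning_rate, reg_lambda):
    """Add up the leaf terms reached by one sample over all iterations."""
    return sum(leaf_value(g, h, learning_rate, reg_lambda) for g, h in leaves)


# (sum of first-order derivatives, sum of second-order derivatives) per leaf;
# illustrative numbers only.
leaves_hit = [(-4.0, 3.0), (1.0, 1.0)]
score = raw_score(leaves_hit, learning_rate=0.1, reg_lambda=1.0)
```

A larger regularization factor shrinks every leaf term toward zero, which is why it dampens the influence of leaves with few samples.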

LightGBM has a large number of parameters that need to be set, and manual adjustment may prevent the model from achieving optimal performance (Ke et al. 2017), so methods that tune hyperparameters automatically are often used. The differential evolution algorithm (DE) is a stochastic population-based method similar to the genetic algorithm (GA) and the simulated annealing algorithm (SA), and it proceeds through four stages: population initialization, mutation, crossover, and selection. The advantage of DE is that it is a global optimization algorithm and can converge within a small number of iterations (Kien et al. 2021). Therefore, DE was used to select the hyperparameters of LightGBM.
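The four stages of differential evolution can be sketched in a few lines. This is a generic DE/rand/1/bin loop, not the configuration used in the study; the objective `toy_loss` is a hypothetical stand-in for a cross-validated LightGBM loss, and the two tuned quantities and their bounds are assumptions chosen for illustration.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.5, cr=0.9,
                           generations=200, seed=0):
    rng = random.Random(seed)
    dim = len(bounds)
    # Initialization: uniform random population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Mutation: combine three distinct individuals other than i.
            a, b, c = rng.sample([k for k in range(pop_size) if k != i], 3)
            mutant = [pop[a][d] + f * (pop[b][d] - pop[c][d]) for d in range(dim)]
            # Crossover: take each gene from the mutant with probability cr,
            # forcing at least one mutant gene.
            forced = rng.randrange(dim)
            trial = [mutant[d] if (rng.random() < cr or d == forced) else pop[i][d]
                     for d in range(dim)]
            # Keep the trial vector inside the search bounds.
            trial = [min(max(v, lo), hi) for v, (lo, hi) in zip(trial, bounds)]
            # Selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_loss(params):
    """Hypothetical smooth surrogate for a validation loss."""
    learning_rate, num_leaves = params
    return (learning_rate - 0.1) ** 2 + ((num_leaves - 31.0) / 100.0) ** 2


best_params, best_loss = differential_evolution(
    toy_loss, [(0.01, 0.3), (8.0, 128.0)])
```

In practice each objective evaluation would train and validate a model, so the population size and number of generations trade search quality against run time.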

The susceptibility of secondary geohazards (P1,2 or P1,2,3) is calculated by applying the softmax function to the outputs of the LightGBM model. The softmax function compresses a K-dimensional vector \(F_{k} \left( {x_{i}^{^{\prime}} } \right)\) of arbitrary real numbers into another K-dimensional real vector \(P_{k} \left( {x_{i}^{^{\prime}} } \right)\) such that each element lies in (0,1) and the elements sum to 1 (Ke et al. 2017). For each input vector, the category with the largest probability is taken as the final prediction. The function is defined as follows:

$$P_{k} \left( {x_{i}^{^{\prime}} } \right) = e^{{F_{k} \left( {x_{i}^{^{\prime}} } \right)}} /\mathop \sum \limits_{P = 1}^{K} e^{{F_{P} \left( {x_{i}^{^{\prime}} } \right)}}$$
(6)

where \(P_{k} \left( {x_{i}^{^{\prime}} } \right)\) is the predicted probability for each sample with the kth category, and K is the number of categories.
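Eq. 6 can be sketched directly. Subtracting the maximum score before exponentiating is a common numerical-stability step that is added here as an assumption; it does not change the result. The raw scores are illustrative.

```python
import math


def softmax(scores):
    """Map K raw scores to K probabilities that sum to 1 (Eq. 6)."""
    shift = max(scores)                        # stability shift, cancels out
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


raw = [2.0, 1.0, 0.1]                          # illustrative F_k values
probs = softmax(raw)
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The predicted category is the index of the largest probability, which is also the index of the largest raw score.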

To address the class imbalance in this study, the SMOTE-Tomek algorithm is used to convert the imbalanced dataset into a balanced one (Chen et al. 2019). SMOTE balances the minority class by generating additional samples as follows:

$$x_{{{\text{new}}}} = x_{i} + {\text{rand}}\left( {0,1} \right)*\left( {x_{n} - \, x_{i} } \right),i = 1,2, \ldots n$$
(7)

where xnew is a new sample, xi is a sample in the minority class, and xn is one of its nearest neighbors.
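A minimal sketch of the interpolation in Eq. 7 is given below. The function name, the choice of k, and the toy points are hypothetical; neighbors are searched only within the minority class, as in standard SMOTE.

```python
import random


def smote_sample(minority, i, k=2, rng=None):
    """Create one synthetic sample between minority[i] and a near neighbor."""
    if rng is None:
        rng = random.Random(0)
    x_i = minority[i]
    others = [x for j, x in enumerate(minority) if j != i]
    # k nearest minority neighbors by squared Euclidean distance.
    others.sort(key=lambda x: sum((a - b) ** 2 for a, b in zip(x, x_i)))
    x_n = rng.choice(others[:k])
    gap = rng.random()                         # rand(0, 1) in Eq. 7
    return [a + gap * (b - a) for a, b in zip(x_i, x_n)]


minority = [[0, 0], [1, 0], [0, 1], [5, 5]]    # toy minority-class samples
new_sample = smote_sample(minority, 0, k=2, rng=random.Random(1))
```

The synthetic point always lies on the segment joining the chosen sample and its neighbor, so it stays inside the region already occupied by the minority class.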

By applying SMOTE, the classifier is expected to build a larger decision region that contains nearby minority class samples. However, this can lead to overfitting. To overcome this drawback, SMOTE is combined with Tomek Link to form an integrated SMOTE-Tomek Link algorithm. A Tomek link is formed between each new sample and the closest non-new sample. A space is then framed by taking the new sample as the center and the length of the Tomek link as the radius.

If the number of minority or majority class samples in this space is less than the minimum threshold, the new sample is considered an “invalid sample”, and such samples are removed or another round of SMOTE is performed. These steps are repeated until the invalid samples are eliminated, and finally the remaining new samples are added to the dataset.
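For orientation, the sketch below detects Tomek links in their standard form: two samples of different classes that are each other's nearest neighbor. This is the textbook definition rather than the neighborhood-and-threshold variant described above, and the function names and toy data are hypothetical.

```python
def nearest(index, points):
    """Index of the point closest to points[index], excluding itself."""
    best, best_d = None, float("inf")
    for j, p in enumerate(points):
        if j == index:
            continue
        d = sum((a - b) ** 2 for a, b in zip(points[index], p))
        if d < best_d:
            best, best_d = j, d
    return best


def tomek_links(points, labels):
    """Pairs of mutual nearest neighbors that belong to different classes."""
    links = []
    for i in range(len(points)):
        j = nearest(i, points)
        if i < j and labels[i] != labels[j] and nearest(j, points) == i:
            links.append((i, j))
    return links


points = [[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [1.1, 1.0], [3.0, 3.0]]
labels = [0, 1, 0, 0, 1]
links = tomek_links(points, labels)
```

Samples that take part in a link sit on the class boundary or are noise, which is why cleaning steps built on Tomek links remove or re-examine them.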

2.2.3 Secondary geohazard environmental factors

The disaster-pregnant environment includes natural environment and human environment in a broad sense. Based on the formation process of rainstorm-geohazard disaster chain and previous studies (Hakim et al. 2022; Hu et al. 2021; Li et al. 2020a), the environmental factors describing the disaster-pregnant environment are considered to be composed of four aspects: topography, hydrology, vegetation and human activities.

Topography includes slope, elevation, aspect, plan curvature, profile curvature, EI (erodibility index), and TR (topographic relief). The occurrence of both collapses and landslides is closely related to the slope. The slope not only affects the vertical distribution of gravity (Tournadour et al. 2015) but also controls the shear conditions and seepage rate of the slope body (Fernandez and Lutz 2010), so an increase in slope affects slope stability. Compared with landslides, collapses generally occur in steeper areas. The elevation reflects the humidity, temperature, and wind direction of an area (Chen and Cui 2017). Meanwhile, the maximum displacement of the geotechnical body during the formation of geohazards is also related to the elevation (Xia et al. 2013). The light conditions and the weathering degree of slope bodies are influenced by the aspect to some extent. The plan curvature directly affects the convergence and divergence of surface water flow. The profile curvature affects the rate of movement of material on the slope surface, which influences the formation of the overburden material. Weathering changes the physicochemical properties of the rock mass, which can alter its strength and stability, and areas with severe weathering are often prone to geohazards (Roback et al. 2018). The degree of regional weathering can be reflected by the EI. The occurrence of geohazards requires certain geomorphological conditions and is rare in areas with flat terrain or low relief (Sajinkumar and Anbazhagan 2015). TR expresses how flat the terrain is through the difference in elevation between the highest and lowest points in a given area.

In terms of hydrology, the topographic wetness index (TWI), stream power index (SPI) and distance from rivers were selected as environmental factors. TWI and SPI are important factors in hazard assessment models. The TWI can be used to assess soil moisture and runoff volumes, and it is a theoretical model of the interactions between precipitation, stored moisture, and temperature conditions (Rahmati et al. 2016). The SPI measures the erosive force of flowing water based on the assumption that flow is proportional to the area of a given catchment (Pourghasemi et al. 2012). Slopes are more prone to instability in areas where the erosive force of water flow is high. Rivers affect the water content of the surrounding rocks and soils and erode slopes, which affects stability. Therefore, distance from rivers is also a condition factor to be considered for the occurrence of geohazards.
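The two indices are commonly computed per raster cell from the specific catchment area and the local slope, as TWI = ln(a / tan β) and SPI = a · tan β. The sketch below uses these common definitions; whether the study's GIS workflow used exactly this form is an assumption, and the minimum-slope guard for flat cells is a hypothetical choice.

```python
import math


def twi(specific_catchment_area, slope_deg, min_slope_deg=0.1):
    """Topographic wetness index ln(a / tan(beta)) for one cell."""
    beta = math.radians(max(slope_deg, min_slope_deg))  # avoid tan(0) = 0
    return math.log(specific_catchment_area / math.tan(beta))


def spi(specific_catchment_area, slope_deg):
    """Stream power index a * tan(beta) for one cell."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))


# Illustrative cell: 100 m^2 per metre of contour width on a 45 degree slope.
cell_twi = twi(100.0, 45.0)
cell_spi = spi(100.0, 45.0)
```

For the same upslope area, a steeper cell has a lower TWI and a higher SPI, which matches the interpretation of the two indices as wetness and erosive power.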

Vegetation affects slope stability through root reinforcement, water seepage and soil erosion. Areas with low vegetation cover have been found to be more prone to geohazards (Li et al. 2020a), and the normalized difference vegetation index (NDVI) was adopted to express this condition. In terms of human activities, distance from roads is usually considered a condition factor (Gayen and Saha 2018). Road construction intensifies human activity, and areas with dense roads tend to have high levels of human activity. Moreover, road construction can involve excavating slopes, which weakens their stability.

In summary, elevation, slope, aspect, plan curvature, profile curvature, EI, and TR were selected from the topographic conditions. TWI, SPI, and distance from rivers were selected from hydrological conditions, NDVI was selected from vegetation conditions, and distance from roads was selected from human activity conditions. A total of 12 environmental factors were selected.

It is necessary to evaluate the environmental factors before model training, because a factor with noise or negative predictive power can cause the model to learn the data incorrectly (Hu et al. 2021). Therefore, this study uses an effective feature selection method, Relief-F (ReF), which measures the quality of factors by how well they distinguish between instances of different classes. The 12 geohazard environmental factors were screened, and those with significantly lower scores were excluded.
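The idea behind this family of methods can be sketched with the original two-class Relief update, which Relief-F extends to multiple classes and several neighbors. The code below is that simplified variant, not Relief-F itself, and the function name and toy data are hypothetical.

```python
def relief_weights(samples, labels):
    """Two-class Relief: reward features that differ on the nearest miss
    and agree on the nearest hit."""
    n, dim = len(samples), len(samples[0])
    # Feature ranges for normalizing differences (1.0 if a feature is constant).
    spans = [(max(s[d] for s in samples) - min(s[d] for s in samples)) or 1.0
             for d in range(dim)]

    def diff(a, b, d):
        return abs(a[d] - b[d]) / spans[d]

    def dist(a, b):
        return sum(diff(a, b, d) for d in range(dim))

    weights = [0.0] * dim
    for i, x in enumerate(samples):
        hits = [s for j, s in enumerate(samples)
                if j != i and labels[j] == labels[i]]
        misses = [s for j, s in enumerate(samples) if labels[j] != labels[i]]
        hit = min(hits, key=lambda s: dist(x, s))     # nearest same-class sample
        miss = min(misses, key=lambda s: dist(x, s))  # nearest other-class sample
        for d in range(dim):
            weights[d] += (diff(x, miss, d) - diff(x, hit, d)) / n
    return weights


# Feature 0 separates the classes; feature 1 is unrelated to the label.
samples = [(0.0, 0.0), (0.1, 1.0), (0.2, 0.5),
           (0.8, 0.2), (0.9, 0.8), (1.0, 0.4)]
labels = [0, 0, 0, 1, 1, 1]
weights = relief_weights(samples, labels)
```

A factor whose weight is clearly below the others, as with the second feature here, is a candidate for exclusion before training.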

3 Material

3.1 Study area

The Three Gorges Reservoir area is a region with a high incidence of geohazards (Hou et al. 2016), with rainstorms as the main trigger, so it was selected as the study area (Fig. 2). The Three Gorges Reservoir area is located in the upper reaches of the Yangtze River. It is composed of 25 districts and counties, with latitudes of 28°56′N to 31°44′N and longitudes of 106°16′E to 111°28′E. The total area of the Three Gorges Reservoir is approximately 5,800 km², with a population of over 30 million. The study area has a subtropical climate and relatively mild conditions. Its average annual temperature is approximately 17.8 °C, and the average annual precipitation ranges from 723 to 1134 mm, with rainstorms usually occurring from May to September (Qin et al. 2020). The Yangtze River is the main water system in the study area, with mainstream and tributary lengths of approximately 660 km and 1840 km, respectively. The lithology of the Three Gorges Reservoir area belongs to transitional geological rock groups, mostly mudstone, siltstone and marl, which are prone to geohazards; the main geohazard types are collapse and landslide.

Fig. 2 Location of the study area and geohazard points inventory

3.2 Data

For topography, the elevation (Fig. 3a) was obtained from the ALOS-PALSAR digital elevation model (DEM) with a spatial resolution of 12.5 m. Slope (Fig. 3b), aspect (Fig. 3c), plan curvature (Fig. 3d), and profile curvature (Fig. 3e) were derived from the DEM with the surface analysis tool in ArcGIS spatial analysis. The EI (Fig. 3f), a value for assessing lithological weathering capacity, was obtained by processing lithological data from the China Geological Survey (2001) 1:1,000,000 scale digital geological map database. The TR (Fig. 3g) was obtained by subtracting the minimum elevation from the maximum elevation within the area. For hydrology, TWI (Fig. 3h) and SPI (Fig. 3i) were obtained with the hydrological analysis tool in ArcGIS spatial analysis. The distance from rivers (Fig. 3j) was calculated using Euclidean distances in ArcGIS spatial analysis. For vegetation, NDVI (Fig. 3k) was obtained from Landsat 8 (OLI) images. For human activities, the distance from roads (Fig. 3l) was obtained in the same way as the distance from rivers.
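Two of these derived factors are simple enough to sketch per cell. The functions below are illustrative stand-ins for the GIS tools named above: topographic relief as the elevation range within a window, and NDVI as (NIR − Red) / (NIR + Red), where for Landsat 8 OLI the near-infrared and red bands are bands 5 and 4. The window size and reflectance values are hypothetical.

```python
def topographic_relief(window):
    """Elevation range (highest minus lowest cell) within a DEM window."""
    values = [v for row in window for v in row]
    return max(values) - min(values)


def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    return (nir - red) / total if total else 0.0   # guard empty pixels


# Illustrative 2 x 2 elevation window (m) and surface reflectances.
relief = topographic_relief([[310, 325], [298, 340]])
pixel_ndvi = ndvi(nir=0.5, red=0.1)
```

NDVI is bounded between −1 and 1, with dense vegetation toward the upper end, so it can be used directly as a model input without further scaling.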

Fig. 3 Secondary geohazard environmental factors: a Elevation. b Slope. c Aspect. d Plan curvature. e Profile curvature. f EI. g TR. h TWI. i SPI. j Distance from rivers. k NDVI. l Distance from roads

Historical rainfall data were obtained from the National Meteorological Science Data Center. Records from 58 meteorological stations in the Three Gorges Reservoir area were collected, covering the period since each station was established. As rainstorms in the study area mainly occur from May to September, the data for this period were adopted. Large-scale hazard inventories are usually compiled as point data (Li et al. 2020b); therefore, the centroids of the disaster locations were entered into ArcGIS to generate a disaster inventory map of the study area (Fig. 2). The collapse and landslide disaster points were acquired from remote sensing images, and the collapse-landslide hazard points were acquired from field investigation and historical information on the disaster occurrence process. A total of 5817 geohazard samples were collected, including 4258 landslide disaster points, 1173 collapse disaster points and 386 collapse-landslide disaster points. Geohazard nonoccurrence samples were randomly selected from the area outside the 1 km buffer zones established around all disaster points to ensure the accuracy of the nonoccurrence sample selection.
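The buffer-based selection of nonoccurrence samples can be sketched as rejection sampling. The sketch assumes projected coordinates in metres and a rectangular study extent; the function name, extent, and hazard points are hypothetical.

```python
import math
import random


def sample_non_occurrence(hazard_points, bounds, n, buffer_m=1000.0,
                          seed=0, max_tries=100000):
    """Draw up to n random points farther than buffer_m from every hazard point."""
    rng = random.Random(seed)
    (xmin, ymin), (xmax, ymax) = bounds
    samples, tries = [], 0
    while len(samples) < n and tries < max_tries:
        tries += 1
        x, y = rng.uniform(xmin, xmax), rng.uniform(ymin, ymax)
        # Reject candidates that fall inside any hazard buffer.
        if all(math.hypot(x - hx, y - hy) > buffer_m for hx, hy in hazard_points):
            samples.append((x, y))
    return samples


hazards = [(2000.0, 2000.0), (6000.0, 5000.0)]        # toy disaster points
extent = ((0.0, 0.0), (10000.0, 10000.0))             # toy study extent
negatives = sample_non_occurrence(hazards, extent, n=50)
```

With thousands of hazard points a spatial index would be preferable to the brute-force distance check, but the acceptance rule stays the same.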

4 Result

4.1 Selection of geohazard conditioning factors

The importance of environmental factors for the three types of secondary geohazards in the rainstorm-geohazard disaster chain is shown in Table 1. For collapse, the order of importance of the environmental factors is aspect, elevation, distance from rivers, NDVI, TR, slope, distance from roads, EI, TWI, profile curvature, plan curvature, and SPI. For landslide, the environmental factors in order of importance are aspect, elevation, NDVI, slope, distance from rivers, TR, TWI, plan curvature, profile curvature, distance from roads, EI, and SPI. For collapse-landslide, the importance ranking of environmental factors is elevation, aspect, NDVI, EI, slope, TR, distance from rivers, distance from roads, profile curvature, plan curvature, TWI, and SPI. The results showed that all factors contributed positively to the prediction of secondary geohazards, but the importance of the SPI was significantly lower than that of the other environmental factors for all three secondary geohazards, so the SPI was removed from the initial dataset, and 11 environmental factors were used for assessment.

Table 1 Importance of conditional factors

4.2 Rainstorm intensity distribution

The regional rainstorm intensities based on different rainstorm parameters are shown in Fig. 4 for four scenarios: S1 (duration = 1 day, return period = 100 years), S2 (duration = 1 day, return period = 50 years), S3 (duration = 3 days, return period = 100 years), and S4 (duration = 3 days, return period = 50 years).

Fig. 4 Multiple scenario regional rainstorm intensity distributions (S1, S2, S3, S4)

The results show that the spatial distribution of rainstorm intensity in the study area is broadly similar under the various rainstorm parameters. In parts of Wushan County, Badong County, Zigui County, Changshou County, Fuling District and Zhong County, the rainstorm intensity is high under all rainstorm parameters. The rainstorm intensity in Jiangjin District and its neighboring areas is low under all rainstorm design values, and there is a clear gap compared with most of the Three Gorges Reservoir area. The spatial distribution patterns of rainstorm intensity are similar for the same duration and different return periods. Comparing the regional rainstorm intensities of different durations, the rainstorm intensities in Kaizhou District, Wuxi County, Wanzhou District, Yunyang County and Fengjie County increase for a duration of 3 days relative to a duration of 1 day.

4.3 Hazard assessment results of rainstorm-geohazard disaster chains

The hazard assessment results of the rainstorm-geohazard disaster chains are visualized below. The natural breaks method was used to classify the hazard values of the study area into four levels, very low hazard (VLH), low hazard (LH), medium hazard (MH) and high hazard (HH), to obtain the hazard distribution map (Fig. 5) and the proportion of each level (Fig. 6).

Fig. 5 Hazard distribution map of rainstorm-geohazard disaster chains under four rainstorm scenarios: a rainstorm–collapse disaster chain b rainstorm–landslide disaster chain c rainstorm–collapse–landslide disaster chain

Fig. 6 Distribution areas of the rainstorm-geohazard disaster chains under four rainstorm scenarios: a rainstorm–collapse disaster chain b rainstorm–landslide disaster chain c rainstorm–collapse–landslide disaster chain

As shown in Figs. 5a and 6a, under different rainstorm scenarios, the area and distribution of each hazard level for the rainstorm–collapse disaster chain are similar. The area of each hazard level is presented in the order of HH, VLH, MH, LH from small to large. Within the study area, the area of LH accounts for the largest part, with more than 40% of the total area under different rainstorm scenarios. The distribution of the hazard level has spatial aggregation characteristics to a certain extent, where the area of VLH is mainly distributed in Jiangjin District and its neighboring areas, and the areas of MH and HH are mainly concentrated in Fengjie County and Fengdu County and the surrounding areas. The distribution pattern of the hazard level is similar under different rainstorm scenarios, which can also reflect the robustness of the assessment results. When the duration increased from one day to three days, the areas of MH and HH in Wanzhou District, Yunyang County and Fengjie County increased accordingly. In addition, the distribution of hazard levels is also correlated with the distribution of the Yangtze River system. Both banks along the river are in the areas of MH and HH.

As shown in Figs. 5b and 6b, the hazard distribution pattern of the rainstorm–landslide disaster chain is largely consistent across rainstorm scenarios: the area and distribution of each hazard level are similar. The four hazard levels rank HH, VLH, MH and LH in increasing order of area, and LH is the dominant level, accounting for more than 40% of the total area. The distribution shows spatial aggregation: the MH and HH areas are concentrated in Fengjie and Fengdu counties and the surrounding areas, and the VLH area is concentrated in Jiangjin District and its adjacent areas. Moreover, in all rainstorm scenarios, the MH and HH areas are distributed along the Yangtze River, which indicates a close relationship with the Yangtze River system. As the duration increased from one day to three days, the areas of MH and HH in Wanzhou District, Yunyang County and Fengjie County increased accordingly.

As shown in Figs. 5c and 6c, the hazard levels of the rainstorm–collapse–landslide disaster chain in the study area are mainly VLH and LH, which together account for more than 60% of the area. The area of VLH is the largest in the study area for the S1, S2, and S4 scenarios. The area of LH is the largest for the S3 scenario, where it is 7.3% more than the area of VLH. The hazard distribution shows a clear pattern: the areas of MH and HH are concentrated along the Yangtze River system, and the areas of VLH and LH are distributed farther away from it. The results for different rainstorm scenarios indicate that the hazard level distribution of the rainstorm–collapse–landslide disaster chain is stable.

5 Discussion

5.1 Model performance

The model performance of the rainstorm-geohazard disaster chain is presented in two parts: rainstorm and secondary geohazards. In the rainstorm part, the occurrence probability of rainstorms is set to 1, so it does not need to be tested. The rainfall intensity at 58 stations was obtained by fitting a Pearson type III curve, and the regional rainfall intensity was then obtained by kriging interpolation and used as the rainstorm intensity. The goodness of fit of the Pearson type III curve to the historical data is measured by the root mean square error (RMSE). The distribution of RMSE over all stations is summarized in Fig. 7. The maximum RMSE is 0.023, and most of the values are concentrated in the interval 0.0027–0.0195. These RMSE values indicate a good curve fit.

Fig. 7

RMSE distribution of the rainfall intensity fitting curve
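As a rough illustration of this kind of goodness-of-fit check, the sketch below fits a Pearson type III curve to one station's annual maxima and computes an RMSE. It rests on several assumptions not stated in the text: parameters are estimated by the method of moments, the empirical frequency uses the Weibull plotting position m/(n+1), and the RMSE is taken between empirical and fitted non-exceedance frequencies. The rainfall sample is hypothetical, and the power series for the incomplete gamma function is only suitable for moderately skewed data.

```python
import math


def gamma_cdf(x, shape):
    """Regularised lower incomplete gamma function P(shape, x), by power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / shape
    total = term
    n = 1
    while term > 1e-16 * total and n < 100000:
        term *= x / (shape + n)
        total += term
        n += 1
    log_scale = -x + shape * math.log(x) - math.lgamma(shape)
    return total * math.exp(log_scale)


def fit_pearson3(sample):
    """Method-of-moments estimates (mean, std, skew) for positively skewed data."""
    n = len(sample)
    mean = sum(sample) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in sample) / (n - 1))
    skew = n * sum((v - mean) ** 3 for v in sample) / ((n - 1) * (n - 2) * std ** 3)
    if skew <= 0:
        raise ValueError("this sketch only handles positive skew")
    return mean, std, skew


def pearson3_cdf(x, mean, std, skew):
    """Non-exceedance probability of a Pearson type III (shifted gamma) variable."""
    shape = 4.0 / skew ** 2
    scale = std * skew / 2.0
    loc = mean - 2.0 * std / skew
    return gamma_cdf((x - loc) / scale, shape)


def fit_rmse(sample):
    """RMSE between empirical and fitted non-exceedance frequencies."""
    params = fit_pearson3(sample)
    ordered = sorted(sample)
    n = len(ordered)
    squared_error = 0.0
    for rank, value in enumerate(ordered, start=1):
        empirical = rank / (n + 1)  # Weibull plotting position (assumed)
        squared_error += (pearson3_cdf(value, *params) - empirical) ** 2
    return math.sqrt(squared_error / n)


# Hypothetical annual maximum daily rainfall at one station, in mm.
station = [62.0, 75.5, 81.2, 90.4, 98.7, 104.3, 112.9, 125.0, 141.6, 168.2, 203.5]
params = fit_pearson3(station)
rmse = fit_rmse(station)
```

Because both frequencies lie between 0 and 1, an RMSE of a few hundredths means the fitted curve tracks the empirical points closely, which is the sense in which small values are read as a good fit.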

Since the hazard assessment of secondary geohazards uses machine learning methods, common machine learning performance indices (Han et al. 2019b), namely accuracy and area under the curve (AUC), are used to validate the model. The accuracy and AUC of the prediction models for Pi and Hi under different rainstorm scenarios are shown in Table 2. In Table 2, Si indicates the rainstorm scenario. The occurrence probability of the secondary geohazard, Pi, is predicted by a binary model evaluated by AUC, and the intensity of the secondary geohazard, Hi, is predicted by a multiclass classification model evaluated by Micro-AUC and Macro-AUC. In general, an AUC value > 0.7 can be considered an acceptable prediction value (Harrell et al. 1996). The accuracy and AUC of all models exceed this threshold, so the hazard assessment result of the disaster chain is credible.

Table 2 Accuracy and AUC values of the secondary geohazards
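For readers unfamiliar with these indices, the following sketch shows one common way to compute them. It assumes the one-vs-rest convention for the multiclass case, which the text does not specify: Macro-AUC averages the per-class AUCs, while Micro-AUC pools every (sample, class) pair into a single binary problem. All labels and scores are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_auc(y_true, proba, classes):
    """Unweighted mean of the one-vs-rest AUC of each class."""
    aucs = []
    for k, c in enumerate(classes):
        labels = [1 if t == c else 0 for t in y_true]
        aucs.append(binary_auc(labels, [row[k] for row in proba]))
    return sum(aucs) / len(aucs)


def micro_auc(y_true, proba, classes):
    """AUC over all (sample, class) pairs pooled into one binary problem."""
    labels, scores = [], []
    for t, row in zip(y_true, proba):
        for k, c in enumerate(classes):
            labels.append(1 if t == c else 0)
            scores.append(row[k])
    return binary_auc(labels, scores)


# Hypothetical occurrence model (binary): 1 = geohazard occurred.
p_auc = binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])

# Hypothetical intensity model (multiclass) with predicted class probabilities.
classes = ["LH", "MH", "HH"]
y_true = ["LH", "MH", "HH", "LH"]
proba = [
    [0.7, 0.2, 0.1],
    [0.3, 0.5, 0.2],
    [0.2, 0.3, 0.5],
    [0.4, 0.4, 0.2],
]
h_macro = macro_auc(y_true, proba, classes)
h_micro = micro_auc(y_true, proba, classes)
acceptable = min(p_auc, h_macro, h_micro) > 0.7  # threshold cited in the text
```

The pairwise loop is quadratic in the sample size; it is written this way for clarity, and a rank-based computation would be preferable for large datasets.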

5.2 Comparison of different types of disaster chains

5.2.1 From the perspective of hazard assessment results

Different types of rainstorm-geohazard disaster chains were compared in terms of their hazard distribution characteristics. The rainstorm–collapse and rainstorm–landslide disaster chains have several features in common. The MH and HH areas are concentrated in Fengjie and Fengdu counties and their surrounding areas, and the VLH area is concentrated in Jiangjin District and its surrounding areas. For both chains, the four hazard classes rank HH, VLH, MH and LH in increasing order of area. In most rainstorm scenarios, the percentages of HH and LH are higher for the rainstorm–landslide disaster chain than for the rainstorm–collapse disaster chain, the percentage of MH is similar in both, and the percentage of VLH is lower for the rainstorm–landslide disaster chain. This indicates that the hazard of the rainstorm–landslide disaster chain is higher than that of the rainstorm–collapse disaster chain in the study area, which is consistent with the actual situation. The hazard distribution characteristics are broadly similar, but the proportion of each hazard class differs, which reflects the different effects of environmental factors on landslides and collapses under the same rainstorm, as also reflected in the disaster formation mechanism. This indicates that collapses and landslides can occur in the same space to form the collapse-landslide phenomenon. The areas of HH and VLH of the rainstorm–collapse–landslide disaster chain are higher than those of the other two disaster chains, indicating a polarization phenomenon. This suggests that the disaster chain with multi-level secondary hazards is more dangerous, although it has a lower susceptibility. The banks of the Yangtze River belong to the HH area, indicating a close correlation between rainstorm-geohazard disaster chains and water systems, which has also been demonstrated in many studies (Hu et al. 2021).

The rainstorm intensity and the rainstorm-geohazard disaster chain were compared and analyzed. For the rainstorm–collapse disaster chain, Jiangjin District and its adjacent areas, where the rainstorm intensity is relatively low, are basically in the VLH area. In most regions with relatively high rainstorm intensity, such as Fengdu County, Zhong County, Fuling District and their adjacent areas, the hazard classes are basically MH and HH. During the formation of the disaster chain, as rainfall increases, the hazard caused by secondary geohazards also increases (Rong et al. 2020). It is therefore reasonable that the intensity of rainstorms is positively correlated with the hazard of the disaster chain. Additionally, the hazard of the rainstorm–collapse disaster chain is not only influenced by rainstorms but also related to the disaster-pregnant environment. For example, in some areas of Badong County, the rainstorm intensity is high, but the hazard level is LH, which means that the area may not be prone to geohazards. As the same type of disaster chain that triggers one-level secondary disasters, the rainstorm–landslide disaster chain exhibits the same characteristics as the rainstorm–collapse disaster chain. Due to its more complex formation process, the rainstorm–collapse–landslide disaster chain exhibits another characteristic in addition to those of the rainstorm–collapse disaster chain. In a few regions, the areas with relatively high rainstorm intensity are at the LH level, while the areas with relatively low rainstorm intensity are at the MH and HH levels. This feature suggests that the hazard of the rainstorm–collapse–landslide disaster chain is influenced not only by the rainstorm and disaster-pregnant environment but also by the triggering effect between secondary geohazards. The one-level secondary disaster (collapse) easily triggers the two-level secondary disaster (landslide) in some regions but hardly triggers it in others, which affects the regional hazard value assessment.

5.2.2 From the perspective of secondary geohazards

By analyzing and comparing Figs. 5, 6 and 7, it was found that the assessment results of these disaster chains are stable under different rainstorm scenarios. Therefore, when further discussing the secondary geohazard in the disaster chain, only the disaster chain under one rainstorm scenario (duration = 1 day, return period = 100 years) needs to be discussed.

As shown in Figs. 4, 5 and 8, the three types of rainstorm-geohazard disaster chains have a common feature: Rainstorm intensity and hazard distribution of disaster chains and secondary geohazards show a consistent trend. Especially in MH and HH regions, the hazard values of disaster chains and secondary geohazards show a positive correlation with the intensity of rainstorms. Many studies have shown that the formation process of geohazards is closely related to rainstorms (Ng et al. 2021; Rong et al. 2020). Rainwater infiltrates into the geotechnical body to change its physical properties and environment. Increasing rainstorm intensity raises the incidence and scale of geohazards and subsequently causes the hazard of the whole disaster chain to increase. Meanwhile, the hazard value of the disaster chain is the sum of all the disasters (Han et al. 2019b). The increase in rainstorm intensity and the hazard value of secondary geohazards can increase the hazard value of the disaster chain. Therefore, the positive correlation between rainstorm intensity and hazard distribution of disaster chains and secondary geohazards is consistent with the actual situation.

Fig. 8

Hazard distribution map of secondary geohazards in three types of disaster chains under the S1 scenario
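The additive structure of the chain hazard discussed above can be sketched numerically. The example assumes that each disaster contributes its occurrence probability multiplied by its intensity; the text confirms only that the chain hazard is a sum over the disasters and that the rainstorm probability is set to 1, so the product form, the function name, and all numbers below are illustrative assumptions.

```python
def chain_hazard(rainstorm_intensity, secondary):
    """Sum the hazard contributions along one disaster chain.

    secondary: list of (probability, intensity) pairs, one per secondary
    geohazard, with values assumed to be normalised to [0, 1].
    """
    rainstorm_probability = 1.0  # fixed to 1 in the text
    total = rainstorm_probability * rainstorm_intensity
    for probability, intensity in secondary:
        if not 0.0 <= probability <= 1.0:
            raise ValueError("probability must lie in [0, 1]")
        # Assumed form: hazard contribution = probability x intensity.
        total += probability * intensity
    return total


# Same hypothetical rainstorm, with one-level and two-level secondary chains.
collapse_only = chain_hazard(0.6, [(0.5, 0.4)])
collapse_landslide = chain_hazard(0.6, [(0.5, 0.4), (0.25, 0.8)])

# A more intense rainstorm raises the chain hazard even if nothing else changes.
stronger_storm = chain_hazard(0.9, [(0.5, 0.4)])
```

Under this form, raising either the rainstorm intensity or any secondary term raises the total, which matches the positive correlation described in the text.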

In multi-level secondary geohazards, landslide formation is influenced by more than the rainstorm itself. The large amount of loose accumulation generated by rainstorm-induced collapse provides additional material for landslides, which can increase their scale or promote their formation. This phenomenon is known as the cascade effect of the disaster chain, through which the destructive effect of the disaster is amplified (Han et al. 2019a). In Fig. 8b, d, the percentages of hazard levels (VLH, LH, MH, HH) for the two secondary landslides were 18%, 52%, 24%, 4% and 31%, 12%, 38%, 19%, respectively. The percentages of HH and MH for secondary landslides of the rainstorm–collapse–landslide disaster chain were 15 and 14 percentage points higher than those of the rainstorm–landslide disaster chain, respectively. Compared with landslides triggered by rainstorms alone, landslides affected by collapse can pose a greater threat, which is consistent with the formation process of rainstorm-geohazard disaster chains and the ability of disaster chains to amplify disasters.

6 Conclusion

A hazard assessment model is proposed for the rainstorm-geohazard disaster chain. Based on the formation process of the rainstorm-geohazard disaster chain, the hazard value of the disaster chain is calculated as the sum of the hazard values of the rainstorm and the secondary geohazards. Various rainstorm scenarios are constructed by using different durations and return periods as rainstorm parameters, and the regional rainstorm intensity is obtained by fitting historical data with Pearson type III curves. Based on the DE-LightGBM algorithm, the susceptibility and intensity of secondary geohazards are calculated step by step by integrating rainstorm intensity, environmental factors and the occurrence process of secondary geohazards. To correct the effect of class imbalance in the secondary geohazard intensity data, SMOTE-Tomek was used in the data preprocessing stage. The main conclusions are as follows.

  1. The hazard levels of the rainstorm–collapse disaster chain and the rainstorm–landslide disaster chain rank HH, VLH, MH and LH in increasing order of area, with LH being the dominant level, accounting for more than 40%. For the rainstorm–collapse–landslide disaster chain, VLH and LH together account for more than 60% of the area. Except for the S3 scenario, where the area of LH is the largest, the area of VLH is the largest in all scenarios. The hazard distribution of the three disaster chains shows spatial agglomeration. For the rainstorm–collapse disaster chain and the rainstorm–landslide disaster chain, MH and HH are concentrated in Fengjie County and Fengdu County and their surrounding areas, and VLH is concentrated in Jiangjin District and its adjacent area. The hazard distribution of the rainstorm–collapse–landslide disaster chain is concentrated in the Yangtze River and adjacent areas.

  2. The hazard results of the three disaster chains are compared in terms of distribution characteristics, relationship with rainstorm intensity, and secondary geohazards. The rainstorm-geohazard disaster chain is closely related to the distribution of water systems, and the Yangtze River coastal areas all belong to the HH area. The hazards of the disaster chain and secondary geohazards are positively correlated with rainstorm intensity, and environmental factors also play a certain role in the hazard value. In addition, the hazard of the rainstorm–collapse–landslide disaster chain is influenced by the triggering effect between secondary geohazards. The hazard distribution of secondary geohazards in the rainstorm–collapse–landslide disaster chain reflects two characteristics of the disaster chain: landslides can be triggered only when the collapse reaches a certain disaster intensity, and the disaster chain can amplify the damage effect. These findings are consistent with the theory of disaster chain formation, which indicates the usability of the model.

  3. The rainstorm and secondary geohazard models of the hazard assessment model of the rainstorm-geohazard disaster chain show satisfactory test indexes and results. The RMSE reflects that the fitted results of the Pearson type III curve can better represent multiple rainstorm scenarios based on recurrence periods and durations. The accuracy and AUC indicate that the performance of the secondary geohazard model is excellent. The distribution characteristics, including the hazard of the disaster chain, the hazard of secondary geohazards, and rainstorm intensity, are consistent with the formation process theory and the actual situation of the disaster chain. Therefore, the model applies to the hazard assessment of the rainstorm-geohazard disaster chain.

The results of this study provide a reference for local governments in disaster prevention and emergency management. This study nevertheless has limitations. First, the rainfall data used in the rainstorm intensity model have a daily resolution, which describes rainstorm characteristics only at a macroscopic scale. Future studies can use rainfall data with finer temporal resolution or dynamic changes to represent rainstorm characteristics in more detail, which can in turn reveal more detailed characteristics of the rainstorm-geohazard disaster chain. Second, there is still much room to improve the predictive accuracy of machine learning algorithms for secondary geohazards, and more advanced algorithms can be considered for secondary geohazard assessment in the future.