Introduction

Rain-induced landslides pose great threats to the safety and property of people living in mountainous areas around the world (Gong et al. 2021; Cheng et al. 2021). As the intensity and frequency of severe rainfall events tend to increase in most regions, landslide events are expected to become more severe (Haque et al. 2019). Landslide risk should be controlled at a practicable level to meet the needs of public safety and sustainable development. Spatiotemporal prediction of rainfall-induced landslides plays a vital role in landslide risk management and mitigation in mountainous areas (Huang and Zhao 2018). Landslide susceptibility analysis, which can derive the spatial distribution of landslides given conditioning factors, is an effective approach for the spatial forecasting of landslides. Data-driven methods, particularly machine learning techniques, have been extensively adopted to build landslide susceptibility models. For example, logistic regression (Dai and Lee 2001; Sun et al. 2021), support vector machine (Yao et al. 2008; Kavzoglu et al. 2014; Zhang et al. 2023), artificial neural network (Yilmaz 2010; Wu et al. 2013), and random forest (Chen et al. 2018; Zhao et al. 2019) have often been adopted in conventional landslide susceptibility modeling, and the study results indicate that the nonlinear relationships between various environmental factors (e.g., slope angle, slope curvature, and geology) and the landslide occurrence probability in the spatial domain can be effectively captured by these machine learning techniques.

To realize an effective spatiotemporal prediction of rain-induced landslides, increasing attention has been paid to coupling the triggering factor of rainfall with landslide susceptibility analyses. Some researchers (e.g., Segoni et al. 2018; Pradhan et al. 2019) proposed combining the landslide susceptibility map with rainfall thresholds using the matrix ensemble approach. The rainfall thresholds are often determined based on the statistical relationship between the landslide occurrence frequency and the corresponding rainfall records (Ko and Lo 2016; Rosi et al. 2016; Chen et al. 2017; Gao et al. 2018). However, the variation in the sensitivity of landslide occurrence to rainfall within the study area is not considered in these approaches. For example, two regions with different geological and topographical conditions could exhibit different sensitivities of landslide occurrence to rainfall (Jordanova et al. 2020; Wang et al. 2021a). To address this issue, new approaches have been proposed in recent studies (e.g., Wang et al. 2021b; Ng et al. 2021; Xiao et al. 2022), in which the maximum rolling x–h rainfall (e.g., maximum rolling 12-h rainfall) is taken as a conditioning factor for the machine learning-based landslide susceptibility modeling. With the aid of these new approaches, the spatiotemporal probability of rain-induced landslides can be derived based on the fitted nonlinear relationships between landslide spatial susceptibility and maximum rolling x–h rainfall. It should be noted that the maximum rolling rainfall in a fixed time interval may not be able to describe complex rainfall conditions. For example, the maximum rolling 24-h rainfall cannot characterize short-duration heavy rainfall, whereas the maximum rolling 2-h rainfall cannot reflect long-duration, low-intensity rainfall. In other words, multiple rainfall data, rather than the maximum rolling rainfall in a single fixed time interval, should be considered for the spatiotemporal prediction of rain-induced landslides.

This study proposes an ensemble approach for spatiotemporal landslide susceptibility modeling, in which a dynamic rainfall index and the random forest method are combined. To consider complex rainfall conditions, multiple rainfall data (i.e., maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall) are integrated into a novel landslide conditioning factor, termed the maximum rolling rainfall index (MRRI). To further improve the accuracy of spatiotemporal landslide susceptibility modeling, a frequency ratio-based method (Liu et al. 2022; Zhang and Yan 2022) is adopted for selecting non-landslide samples in the study area. To demonstrate the effectiveness and versatility of the proposed approach, landslide susceptibility models are first developed based on the historical landslide data in the central area of Hong Kong. Then, the trained susceptibility models are applied to two historical rainstorms in Hong Kong. Based on the study results, the advantages and limitations of the proposed approach are discussed.

Study area

The study area is situated in the central area of Hong Kong and covers about 256 km² (113° 52′ 53″–114° 5′ 50″ E, 22° 24′ 27″–22° 33′ 54″ N), including Hong Kong Island and part of Kowloon and Lantau Island (Fig. 1a). Hong Kong is located at the mouth of the Pearl River Delta, South China. Because natural mountains occupy about 60% of the land, many urban areas in Hong Kong are densely developed on hillsides. Hong Kong has a subtropical monsoon climate characterized by a large amount of seasonal rainfall, which is mainly concentrated in the rainy season from June to September (AECOM and Lin 2015). Storms with high intensity and short duration are common in Hong Kong, and the hourly rainfall of some severe storms can exceed 200 mm (Gao et al. 2018). Most of the historical landslides reported in Hong Kong are shallow landslides, which are mainly triggered by frequent intense rainfall on steep terrain; the sliding materials are generally composed of saprolite, colluvium, and weathered rock (Lam et al. 2012).

Fig. 1 a Location of the study area. b Spatial distribution of historical landslides and rain gauges

To ensure high-quality rainfall data acquisition, the Hong Kong Observatory (HKO) and the Geotechnical Engineering Office (GEO) have installed a large number of automatic rain gauges in Hong Kong since the early 1980s. Of these, 53 rain gauges are sparsely distributed in the study area, and they provide real-time rainfall data at a 5-min interval. The spatial distribution of these gauges is shown in Fig. 1b. An inventory of historical landslides in Hong Kong, known as the Natural Terrain Landslide Inventory (NTLI), was compiled by GEO in 1984 and has been continuously enhanced based on aerial photographs and field investigations since then (Maunsell-Fugro Joint Venture 2007). A total of 4990 landslides were recorded in the study area from 1984 to 2009. The spatial distribution of historical landslides in the study area is illustrated in Fig. 1b.

Spatiotemporal landslide susceptibility modeling approach

In this section, the formulation of MRRI, non-landslide sampling method, and random forest technique are briefly introduced; and the implementation procedures of the proposed spatiotemporal landslide susceptibility modeling approach are illustrated.

Landslide triggering factor - maximum rolling rainfall index

The rolling x–h rainfall is defined as the rainfall recorded in x consecutive hours at a rain gauge (Chien-Yuan et al. 2008), whereas the maximum rolling x–h rainfall is defined as the maximum value of the rainfall in x consecutive hours at a rain gauge (Ng et al. 2021). To consider complex rainfall conditions, the MRRI, a comprehensive index derived from multiple rainfall data (i.e., maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall), is proposed herein. The MRRI is defined as the maximum cumulative frequency of historical landslides induced by the current maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data; that is, it is calculated from the maximum rolling rainfall recorded over all seven durations. The procedures for calculating the MRRI are summarized as follows (Fig. 2): (1) seven cumulative frequency curves of the historical landslides over the maximum rolling rainfall data are constructed for the durations of 2-, 4-, 6-, 8-, 12-, 18-, and 24-h, respectively; (2) seven cumulative frequency values are derived from these curves, based on the inputs of real-time maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data recorded at each position, respectively; and (3) the MRRI is taken as the maximum of the seven cumulative frequency values derived in the previous step. Note that, of the 4,990 landslides in the historical landslide database of the study area, the precise occurrence time is known for only 523. To ensure that a sufficient number of historical landslides is available for constructing the seven cumulative frequency curves, an assumption is made in this study: each landslide is assumed to be triggered by the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall in the year of landslide occurrence, respectively.
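The three-step MRRI procedure can be illustrated with a minimal sketch. It assumes that the rainfall record at one position is a list of 5-min totals in mm and that the triggering rainfall of each historical landslide is available as a plain list per duration; the function names, the data layout, and the fallback for records shorter than the window are illustrative assumptions, not the implementation used in this study.

```python
from bisect import bisect_right

DURATIONS_H = (2, 4, 6, 8, 12, 18, 24)  # durations considered in the MRRI

def max_rolling_rainfall(rain_5min, hours):
    """Largest rainfall total (mm) over any `hours` consecutive hours."""
    window = hours * 12  # twelve 5-min readings per hour
    if len(rain_5min) < window:
        # Assumption: a record shorter than the window is summed as a whole.
        return sum(rain_5min)
    current = sum(rain_5min[:window])
    best = current
    for i in range(window, len(rain_5min)):
        # Slide the window forward by one reading.
        current += rain_5min[i] - rain_5min[i - window]
        best = max(best, current)
    return best

def cumulative_frequency(sorted_triggers, value):
    """Share of historical landslides with triggering rainfall <= value."""
    return bisect_right(sorted_triggers, value) / len(sorted_triggers)

def mrri(rain_5min, historical_triggers):
    """Maximum of the seven cumulative frequency values (steps 2 and 3).

    historical_triggers maps each duration (h) to the list of maximum
    rolling rainfalls (mm) assigned to the historical landslides.
    """
    values = []
    for hours in DURATIONS_H:
        curve = sorted(historical_triggers[hours])        # step 1
        current_max = max_rolling_rainfall(rain_5min, hours)
        values.append(cumulative_frequency(curve, current_max))
    return max(values)

# Toy record: 2 h dry, 2 h at 5 mm per 5 min (120 mm in total), 2 h dry.
rain = [0.0] * 24 + [5.0] * 24 + [0.0] * 24
triggers = {h: [100.0, 200.0, 300.0, 400.0] for h in DURATIONS_H}
triggers[2] = [30.0, 60.0, 90.0, 150.0]
index = mrri(rain, triggers)
```

In this toy case the 2-h curve gives the largest cumulative frequency, so the short-duration burst controls the index even though the longer durations remain low.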

Fig. 2 Main procedure for calculating the maximum rolling rainfall index

Non-landslide sampling based on frequency ratio-based method

Supervised machine learning techniques are widely adopted in landslide susceptibility analyses (Chang et al. 2020). In general, the training dataset for building a supervised machine learning-based model includes landslide samples (i.e., historical landslides in the study area) and non-landslide samples. Although the selection of non-landslide data is vital for effective landslide susceptibility modeling, no effective criterion or rule has been established for it. In some traditional analyses, non-landslide points are randomly selected in the study area, and the non-landslide points that coincide with the landslide points are then excluded. Such random selection can place non-landslide samples in landslide-prone terrain, which introduces considerable uncertainty or error. To reduce this uncertainty or error, a non-landslide sampling method based on frequency ratio (FR) is applied in this study, which randomly generates non-landslide samples in areas with infrequent landslides. Although this FR-based method is rough, it has been employed in existing landslide susceptibility modeling (Liu et al. 2022; Zhang and Yan 2022) and has been shown to be effective. The FR-based method is commonly employed to calculate the probabilistic relationship between dependent and independent variables (Ozdemir and Altural 2013). The FR is defined as the ratio of the landslide occurrence percentage to the area occupation percentage for each class of every landslide conditioning factor in the study area, the formulation of which is provided below (Lee and Pradhan 2007).

$$F{r}_{i}=\frac{{N}_{i}/N}{{S}_{i}/S}$$
(1)

where Fr_i denotes the frequency ratio of the ith class of a landslide conditioning factor; N_i is the number of landslides within the ith class of this factor; N is the total number of landslides in the study area; S_i is the area of the ith class of this factor; and S is the total study area. A lower Fr_i value means that landslides are less likely to occur in the area.
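As a worked illustration of Eq. (1), the sketch below computes the frequency ratio for each class of a single conditioning factor. The class labels, landslide counts, and areas are invented for illustration, and the function name is hypothetical.

```python
def frequency_ratio(landslide_counts, class_areas):
    """Frequency ratio (N_i / N) / (S_i / S) for every class of one factor."""
    total_landslides = sum(landslide_counts.values())  # N
    total_area = sum(class_areas.values())             # S
    return {
        cls: (landslide_counts.get(cls, 0) / total_landslides)
        / (class_areas[cls] / total_area)
        for cls in class_areas
    }

# Hypothetical slope-angle classes (areas in km2, counts of landslides).
counts = {"0-15": 10, "15-30": 50, ">30": 40}
areas = {"0-15": 100.0, "15-30": 60.0, ">30": 40.0}
fr = frequency_ratio(counts, areas)
```

Here the gentlest class holds half of the area but only 10% of the landslides, giving a ratio of 0.2, whereas the steepest class holds 20% of the area and 40% of the landslides, giving a ratio of 2.0.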

The main procedures for generating non-landslide samples with the FR-based method are summarized in the following steps, as shown in Fig. 3. First, calculate the frequency ratio for each class of landslide conditioning factors. Second, aggregate the Fr values of all factors to obtain the overall Fr value for each position within the study area. Third, divide the overall Fr values of the study area into five intervals (very low, low, medium, high, and very high) by the Jenks optimization method (Jenks 1967). Finally, perform non-landslide random sampling in the areas with medium, low, or very low Fr values.
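The four sampling steps can be sketched as follows. Two simplifications are assumed here and are not taken from this study: the per-factor Fr values are aggregated by summation, and equal-count (quantile) breaks stand in for the Jenks optimization method, which is not reimplemented. All names and the cell-based data layout are hypothetical.

```python
import random

LABELS = ("very low", "low", "medium", "high", "very high")

def overall_fr(cell_classes, fr_tables):
    """Aggregate per-factor Fr values for one cell (summation is assumed)."""
    return sum(fr_tables[factor][cls] for factor, cls in cell_classes.items())

def five_class_breaks(values):
    """Four break points splitting values into five equal-count classes.

    Assumption: quantile breaks replace the Jenks natural breaks.
    """
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(n * k) // 5] for k in range(1, 5)]

def classify(value, breaks):
    """Map an overall Fr value to one of the five zone labels."""
    for label, upper in zip(LABELS, breaks):
        if value < upper:
            return label
    return LABELS[-1]

def sample_non_landslides(overall, n_samples, seed=0):
    """Randomly draw cells from the very low, low, and medium zones.

    overall maps a cell identifier to its overall Fr value.
    """
    breaks = five_class_breaks(list(overall.values()))
    eligible = [c for c, v in overall.items()
                if classify(v, breaks) in LABELS[:3]]
    return random.Random(seed).sample(sorted(eligible), n_samples)

# Toy example: 100 cells whose overall Fr value equals the cell id.
toy_overall = {i: float(i) for i in range(100)}
picked = sample_non_landslides(toy_overall, n_samples=10)
```

A fixed seed keeps the draw reproducible; repeating the sampling with different seeds would be one way to examine how sensitive a trained model is to the non-landslide selection.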

Fig. 3 Main procedure for non-landslide sampling based on the FR-based method

Random forest technique

Random forest (RF), a machine learning technique proposed by Breiman (2001), constructs a multitude of decision trees to reveal the complex relationships between landslide occurrences and landslide conditioning factors (Catani et al. 2013). Building a landslide susceptibility model with the RF technique involves two stages: training and prediction. In the training stage, a large number of uncorrelated decision trees are generated with the bootstrap sampling technique (Hastie et al. 2009). Because each tree is grown on a random subset of the input training samples, each tree is unique. In the prediction stage, each tree yields its assessment of landslide susceptibility independently, and the final output is derived from the unweighted majority of votes among all trees.
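The two stages can be illustrated with a minimal sketch of bootstrap sampling and unweighted voting. The three hand-written rules below are hypothetical stand-ins for trained decision trees, and reading the share of landslide votes as a susceptibility value is an assumption made for illustration.

```python
import random

def bootstrap_sample(samples, rng):
    """Draw len(samples) training samples with replacement."""
    return [rng.choice(samples) for _ in samples]

def vote_fraction(trees, features):
    """Share of trees that classify the sample as landslide (1)."""
    votes = [tree(features) for tree in trees]
    return sum(votes) / len(votes)

def majority_vote(trees, features):
    """Unweighted majority decision: 1 for landslide, 0 for non-landslide."""
    return 1 if vote_fraction(trees, features) > 0.5 else 0

# Hypothetical stand-ins for trained trees (thresholds are invented).
example_trees = [
    lambda f: int(f["slope"] > 30),
    lambda f: int(f["mrri"] > 0.5),
    lambda f: int(f["slope"] > 25 and f["mrri"] > 0.3),
]

cell = {"slope": 35.0, "mrri": 0.4}
share = vote_fraction(example_trees, cell)   # two of three trees vote 1
label = majority_vote(example_trees, cell)

resampled = bootstrap_sample(list(range(10)), random.Random(0))
```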

Implementation procedures of the landslide susceptibility prediction approach

As shown in Fig. 4, the main procedures for implementing the proposed landslide susceptibility modeling approach are summarized in the following steps:

  • Step 1: Collect landslide conditioning factors, which could characterize the favorable environmental conditions for the development of landslides, from multi-source data such as digital elevation models, geological maps, and satellite images, and derive the MRRI indexes (i.e., landslide triggering factor) of historical landslides based on the historical rainfall and landslide information.

  • Step 2: Calculate Fr values in the whole study area from various landslide conditioning factors and historical landslides, and perform non-landslide random sampling in the study area based on the computed Fr values. The number of non-landslide samples should be equal to that of historical landslides.

  • Step 3: Extract the landslide conditioning factors and the landslide triggering factor of the landslide and non-landslide samples, and train the landslide susceptibility model using the RF technique based on the information extracted. Other advanced machine learning techniques could also be adopted for training the landslide susceptibility model.

  • Step 4: Input time-series MRRI indexes, obtained from the real-time maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data, to the trained landslide susceptibility model for predicting the spatial and temporal probabilities of landslide occurrence in the study area.

Fig. 4 Implementation procedures of the proposed spatiotemporal landslide susceptibility modeling approach

Landslide susceptibility modeling in the study area

In this section, the conditioning factors for landslide susceptibility modeling in the study area are collected from multi-source data. Ten landslide susceptibility models are then trained using the proposed landslide susceptibility modeling approach.

Conditioning factors for landslide susceptibility modeling

The occurrence of landslides is closely related to the internal geological and external geomorphological conditions of the slope. Various environmental factors have been employed as landslide conditioning factors for landslide susceptibility modeling in existing studies, but these factors are selected on a case-by-case basis (e.g., Pradhan et al. 2019; Zhao et al. 2019; Gong et al. 2022). Based on previous studies on landslide susceptibility modeling in Hong Kong carried out by Yao et al. (2008) and Wang et al. (2021b), ten key conditioning factors, including elevation, slope angle, aspect, plan curvature, profile curvature, topographic wetness index (TWI), stream power index (SPI), normalized difference vegetation index (NDVI), bedrock, and surficial deposit, are chosen in this study (Fig. 5). The digital elevation model (DEM) of the study area is obtained from the high-precision LiDAR data of the entire Hong Kong acquired in January 2011, with a spatial resolution of 5.0 m/pixel. Most of the rain-induced landslides in Hong Kong are small-scale landslides, with a scar area ranging from 20 to 200 m2 (Gao et al. 2021). Therefore, this DEM is considered to be sufficient to characterize the geomorphological conditions of natural terrain in the study area.

Fig. 5 Landslide conditioning factors of the study area: a elevation; b slope; c aspect; d plan curvature; e profile curvature; f topographic wetness index (TWI); g stream power index (SPI); h normalized difference vegetation index (NDVI); i bedrock; j surface deposit

Five topographic conditioning factors (i.e., elevation, slope angle, aspect, plan curvature, profile curvature) can be directly derived from the DEM (Fig. 5a–e). Elevation is a basic terrain feature closely related to slope stability, which yields a considerable influence on the spatial distribution of conditioning factors such as land cover and rainfall. Slope angle is expected to be the most important geomorphological feature of the slope, as it is closely linked to the stress distribution within the terrain and partly reflects the accumulation of loose materials and rock weathering degree. Aspect is the compass direction that a slope faces and variation of slope aspect can lead to spatial differences in illumination and weathering, which may result in different soil textures, soil moisture, and vegetation development. Plan curvature and profile curvature are the terrain curvatures perpendicular and parallel to the direction of maximum slope, respectively, which could reflect the scene of flow across a surface.

Two hydrogeological conditioning factors (i.e., TWI and SPI) can be estimated from the DEM and the river network (Fig. 5f, g). Stream power index (SPI) measures the erosive power of the concentrated surface runoff, while topographic wetness index (TWI) indicates the percolation saturation of soil water (Moore et al. 1991). The formulations of SPI and TWI are provided below.

$$SPI={A}_{c}\times \tan\varphi$$
(2)
$$TWI=\ln\left(\frac{{A}_{c}}{\tan\varphi }\right)$$
(3)

where A_c is the specific catchment area and φ is the slope angle of the terrain.
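Eqs. (2) and (3) can be evaluated cell by cell as in the sketch below. It assumes that the slope angle is given in degrees and that a small floor is applied to the slope so that TWI stays finite on flat cells; the floor value and the function names are illustrative assumptions.

```python
import math

MIN_SLOPE_DEG = 0.1  # assumed floor that avoids division by zero on flat cells

def spi(specific_catchment_area, slope_deg):
    """Stream power index, Eq. (2): A_c * tan(phi)."""
    return specific_catchment_area * math.tan(math.radians(slope_deg))

def twi(specific_catchment_area, slope_deg):
    """Topographic wetness index, Eq. (3): ln(A_c / tan(phi))."""
    slope = max(slope_deg, MIN_SLOPE_DEG)
    return math.log(specific_catchment_area / math.tan(math.radians(slope)))

# Same catchment area, steep versus gentle cell.
steep = (spi(100.0, 45.0), twi(100.0, 45.0))
gentle = (spi(100.0, 5.0), twi(100.0, 5.0))
```

For a fixed specific catchment area, the steeper cell has the larger SPI and the smaller TWI, because the slope term enters the two indices in opposite ways.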

The normalized difference vegetation index (NDVI), which indicates the degree of vegetation growth, is usually adopted to reflect the soil and hydrological conditions of the slope, which exert a significant influence on landslide occurrence. An NDVI value equal to or smaller than 0 indicates an area with non-vegetated features, such as water, bare ground (rock and soil), and artificial constructions, whereas areas with higher positive NDVI values are typically regions of denser green vegetation (e.g., grass, shrub, and forest). It should be noted that landslides in the study area mainly occur in the rainy season (i.e., June to September), and the vegetation growth status is similar in the rainy season each year. As such, the temporal variation of the NDVI in the study area is not considered in the landslide susceptibility modeling conducted in this study. The NDVI in this study is derived from a Landsat 8 satellite image with a resolution of 30 m/pixel, acquired on September 18, 2016. To be consistent with the other conditioning factors generated from the DEM, the NDVI map is resampled to 5.0 m/pixel, as shown in Fig. 5h.

The maps of both bedrock and surficial deposits in Hong Kong, produced by the Hong Kong Geological Survey of the Civil Engineering and Development Department (CEDD), have been open to the public since 2006. The geological maps of the study area are converted into raster format with a resolution of 5.0 m/pixel. In total, 12 types of bedrock are distributed in the study area (Fig. 5i), which could be divided into three geological periods (i.e., Jurassic, Cretaceous, Quaternary). Granitic rock (i.e., Jurassic granitic and Cretaceous granitic) and volcanic rock (i.e., Jurassic coarse ash, Cretaceous lava, fine ash vitric tuff and crystal tuff) are the dominant bedrocks, which cover 38.9% and 25.5% of the land area, respectively. The surficial deposits in the study area could be divided into alluvium, colluvium, reclaimed land, weathered bedrock, and beach, intertidal, and estuarine deposits (Fig. 5j). Because of the warm and humid environment, the rate of bedrock weathering in the study area is high and weathered mantles are widely distributed on the slopes.

As mentioned above, only 53 rain gauges are sparsely distributed in the study area. The maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data at positions without rain gauges are therefore interpolated from the rainfall data recorded by these gauges, with a spatial unit of 5.0 × 5.0 m. In this study, the ordinary Kriging interpolation method in the Geostatistical Analyst module of ArcGIS software is adopted, and the exponential model (Leung and Law 2002) is adopted for building the semi-variograms, the separation distance of which is about 15.0 km. Meanwhile, at least three adjacent rain gauges are utilized for interpolating the rainfall data at each position. Based on the historical rainfall and landslide information, seven cumulative frequency curves of the historical landslides over the maximum rolling rainfall data are constructed for the durations of 2-, 4-, 6-, 8-, 12-, 18-, and 24-h, respectively, as illustrated in Fig. 6. The MRRI values of historical landslides that are adopted for training the landslide susceptibility model are derived according to these cumulative frequency curves. In contrast, the MRRI values adopted for the landslide susceptibility prediction are estimated based on the real-time rainfall data and the cumulative frequency curves shown in Fig. 6.

Fig. 6 Cumulative frequency curves of the historical landslides over the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data in the study area

Data preprocessing and landslide susceptibility modeling

As mentioned above, a total of 4990 landslides were recorded in the study area from 1984 to 2009. Among these, 4764 historical landslides are selected as the landslide samples for the landslide susceptibility modeling, whereas the 226 landslides induced by the rainfall on August 19–20, 2005, and June 6–7, 2008, are retained to later demonstrate the application of the trained landslide susceptibility model. To generate the non-landslide samples in the study area, the FR-based method is adopted. The Fr values of the various classes of the ten landslide conditioning factors are calculated, as detailed in Table 1. The map of overall Fr values in the study area is then obtained by aggregating the Fr values of all factors and divided into five zones (Fig. 7a): very low, low, medium, high, and very high. Finally, 4764 non-landslide samples are randomly generated in the areas with medium, low, or very low Fr values (Fig. 7b).

Table 1 Fr values of landslide conditioning factors
Fig. 7 Non-landslide sampling in the study area. a The map of overall Fr values for the study area. b Distribution of the non-landslide samples

It is known that unimportant landslide conditioning factors and multicollinearity among the conditioning factors could degrade the accuracy of the trained landslide susceptibility models (Zhou et al. 2018). In this study, the variance inflation factor (VIF) and tolerance are computed to identify multicollinearity among the conditioning factors, and the information gain ratio is adopted to assess the relative importance of each conditioning factor. The VIF measures how much the variance of a regression coefficient is inflated by multicollinearity among the input variables (O'Brien 2007). Tolerance is the reciprocal of VIF. When the estimated VIF is greater than 5.0 or the computed tolerance is less than 0.2, multicollinearity is considered to exist among the input variables. The information gain ratio is a normalized information gain based on the entropy of values in each class, the mathematical formulation of which can be found in Ghasemi et al. (2020). Generally, a higher information gain ratio indicates that a conditioning factor is more important in landslide susceptibility modeling. In addition, a random variable varying from 0 to 1.0 is sampled, and its relative importance in landslide susceptibility modeling is assessed in the same way. The information gain ratio of this random variable can be taken as a benchmark to identify unimportant landslide conditioning factors (Luti et al. 2020; Segoni et al. 2020).
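A minimal sketch of an information gain ratio is given below for a factor that has already been discretized into classes. The formulation used in this study is the one referenced in Ghasemi et al. (2020); the sketch instead assumes the common decision-tree definition (information gain divided by the entropy of the factor classes), and its function names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain_ratio(factor_classes, labels):
    """Information gain of the labels given the factor, divided by the
    entropy of the factor classes (assumed definition)."""
    n = len(labels)
    groups = {}
    for cls, label in zip(factor_classes, labels):
        groups.setdefault(cls, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(factor_classes)
    if split_info == 0:
        return 0.0  # a factor with a single class carries no information
    return (entropy(labels) - conditional) / split_info

labels = [1, 1, 0, 0]  # 1 = landslide, 0 = non-landslide
informative = information_gain_ratio(["steep", "steep", "gentle", "gentle"], labels)
uninformative = information_gain_ratio(["a", "b", "a", "b"], labels)
```

A factor whose classes separate the two sample types perfectly reaches a ratio of 1, whereas a factor unrelated to the labels, like the random benchmark variable described above, stays near 0.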

As tabulated in Table 2, the smallest tolerance and the highest VIF of the landslide conditioning factors are 0.452 and 2.215, respectively; thus, no multicollinearity is found among the conditioning factors in this study. Figure 8 shows the relative importance of the eleven landslide conditioning factors (i.e., the ten conditioning factors and the MRRI) and the random variable. Because the random variable is not related to landslide occurrence, its information gain ratio is the lowest, as expected. The information gain ratio of each conditioning factor is greater than that of the random variable, indicating that all the selected conditioning factors contribute to the development of the landslide susceptibility model. Specifically, the slope angle yields the highest information gain ratio (i.e., 0.131), followed by the profile curvature (i.e., 0.130). The information gain ratio of the MRRI is relatively high (i.e., 0.057), indicating that rainfall has a strong effect on landslide occurrence in the study area. In comparison to the bedrock factor (i.e., 0.037), the surficial deposit factor (i.e., 0.041) appears to be more closely related to the occurrence of landslides, which is consistent with engineering experience (Ko and Lo 2016; Gao et al. 2018). Although the aspect yields a relatively low information gain ratio (i.e., 0.007), excluding this conditioning factor could degrade the effectiveness of the landslide susceptibility modeling; thus, this least informative conditioning factor is retained. In summary, all eleven conditioning factors are retained for the landslide susceptibility modeling in the study area.

Table 2 Variance inflation factors and tolerances of landslide conditioning factors
Fig. 8 Information gain ratios of landslide conditioning factors and random variable

In the landslide susceptibility modeling with the RF technique, the information of landslide and non-landslide samples with all conditioning factors is taken as the input, while the landslide susceptibility index (i.e., landslide 1, non-landslide 0) is taken as the output. To facilitate the training of the landslide susceptibility model, the categorical conditioning factors (i.e., bedrock and surficial deposit) are assigned numerical labels, and the continuous conditioning factors (i.e., elevation, slope angle, aspect, plan curvature, profile curvature, TWI, SPI, NDVI, and MRRI) are normalized using the min–max normalization method (Salehpour et al. 2021). Among the 4764 landslide samples and 4764 non-landslide samples, 80% are randomly selected for the model training, whereas the remaining samples are employed to validate the trained landslide susceptibility model. The number of decision trees and the maximum depth of the tree are 500 and 5, respectively. Two indicators, the accuracy and the receiver operating characteristic (ROC) curve, are adopted to assess the performance of the trained landslide susceptibility model. The accuracy is the proportion of correct predictions among all testing samples. In the ROC curve, the false positive and true positive rates are plotted on the X and Y axes, respectively. The area under the curve (AUC) is adopted to measure the probability of correct classification. A landslide susceptibility model with accuracy and AUC close to 1.0 is considered to perform well. To reduce the influence of randomness in the model training, the landslide susceptibility modeling is conducted ten times with the proposed approach; as such, ten susceptibility models are derived. The average accuracy and AUC of the testing samples are 86.30% and 0.933 (Fig. 9), which demonstrates the effectiveness of the trained landslide susceptibility models. The plots in Fig. 9 also suggest that the variability of the trained landslide susceptibility models induced by training randomness is small. For illustration purposes, only the averaged results of the ten susceptibility models are presented in the following text.
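The two performance indicators can be sketched in a few lines. The AUC is computed here through its rank interpretation, that is, the probability that a randomly chosen landslide sample receives a higher score than a randomly chosen non-landslide sample, with ties counted as one half; this pairwise form is chosen for brevity and is not necessarily how the values in Fig. 9 were obtained. The toy labels and scores are invented.

```python
def accuracy(y_true, y_pred):
    """Proportion of testing samples that are predicted correctly."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def auc(y_true, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy testing set: 1 = landslide, 0 = non-landslide.
y_true = [1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.1]                    # predicted susceptibility
y_pred = [1 if s > 0.5 else 0 for s in scores]   # 0.5 cut-off is assumed

acc = accuracy(y_true, y_pred)
area = auc(y_true, scores)
```

Averaging `acc` and `area` over repeated trainings would mirror the ten-run procedure described above.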

Fig. 9 Receiver operating characteristic (ROC) curves of the landslide susceptibility models trained by the proposed approach

Application of the trained landslide susceptibility models to two rainfall events

Two historical rainfall events are selected to demonstrate the spatiotemporal predictive ability of the established landslide susceptibility models. The first rainfall event occurred on August 19–20, 2005, a moderate rainfall event with a long duration that triggered 93 landslides in the study area. The other rainfall event occurred on June 6–7, 2008, which was one of the most severe rainstorms recorded in Hong Kong, and this rainfall event triggered 133 landslides in the study area. Figure 10 shows the spatial distributions of the MRRI index in the study area during the two rainfall events.

Fig. 10 Spatial distributions of MRRI index in the study area during the two rainfall events: a–h from 21:00 on August 19 to 18:00 on August 20, 2005; i–p from 15:00 on June 6 to 12:00 on June 7, 2008

In the first rainfall event, which occurred in August 2005, the peak center of the MRRI gradually moves from southwestern Kowloon to central Kowloon (Fig. 10a–h). The relatively quick increase in the MRRI from 21:00 on August 19 to 00:00 on August 20, 2005, and from 09:00 to 12:00 on August 20, 2005, reflects an increase in the short-duration rainfall intensity, while the slow increase of the MRRI in the remaining periods indicates weak rainfall intensity. Compared to the first rainfall event, the spatial variation of the MRRI in the second rainfall event, which occurred in June 2008, is much more complicated. For example, in the early stage of the second rainfall event (Fig. 10i), the maximum MRRI is only 0.10, and the peak center is located in the southeastern part of Hong Kong Island, indicating that the rainfall intensity is low. Then, the MRRI increases slowly throughout the study area (Fig. 10j–n). After 6:00 on June 7, 2008, the MRRI increases rapidly across the entire study area, starting from the southeast; it exceeds 0.30 everywhere within 6 h and reaches a maximum value of 0.99, with the highest values in the western region (Fig. 10o, p). As can be seen, the characteristics of these two rainfall events are different: widespread moderate rainfall prevails in the first rainfall event, while more intense short-duration rainfall hits the entire study area in the second rainfall event. The spatial distribution of the MRRI thus captures the rainfall characteristics well.

On the basis of the spatial distributions of the MRRI derived for these two rainfall events, the spatiotemporal landslide susceptibility maps are readily obtained with the trained landslide susceptibility models. For ease of landslide susceptibility mapping, the study area is divided into 10,268,219 grids with a size of 5.0 × 5.0 m. Figure 11 shows the spatial distributions of the landslide susceptibility index (LSI) obtained for these two rainfall events. In the early stage of the first rainfall event (i.e., from 21:00 on August 19 to 00:00 on August 20, 2005), the areas with high LSI values (i.e., LSI > 0.8) are mainly located in southwestern Kowloon and northern Hong Kong Island (Fig. 11a–d); the areas with high LSI values then gradually spread across all the mountainous areas of the study area (Fig. 11e–h). In the early stage of the second rainfall event, the LSI value in the whole study area is not high (Fig. 11i); in the middle stage (i.e., from 21:00 on June 6 to 3:00 on June 7, 2008), the areas with high LSI values are mainly located in northern Kowloon and southeastern Hong Kong Island (Fig. 11j–n); and finally, the areas with high LSI values expand to the entire mountainous areas (Fig. 11o, p), due to the rapid increase of the overall MRRI. In addition, the locations of all real rain-induced landslides in the two rainfall events are marked in Fig. 11. Note that only the occurrence dates of these landslides are available, whereas their precise occurrence times are not. Thus, maps showing the dynamic evolution of real rain-induced landslides versus time are not provided in this study. The time-series landslide susceptibility maps shown in Fig. 11 indicate that the expansion trend of the high LSI area corresponds well to the spatial distribution of the real landslides. Thus, the effectiveness and versatility of the trained landslide susceptibility models are demonstrated.

Fig. 11

Spatial distributions of the LSI in the study area during the two rainfall events: a–h from 21:00 on August 19 to 18:00 on August 20, 2005; i–p from 15:00 on June 6 to 12:00 on June 7, 2008

To further illustrate the effectiveness of the trained landslide susceptibility models, the percentage of the areas with high LSI values (i.e., LSI > 0.8), the number of real landslides located in the high LSI areas, and the average hourly rainfall in the study area during these two rainfall events are derived, and the results are plotted in Fig. 12. As can be seen, the variation trend of the percentage of the areas with high LSI values matches that of the number of real landslides. Furthermore, the areas with high LSI values increase rapidly from 6:00 to 11:00 on June 7, 2008, due to the short-duration heavy rainfall in the second rainfall event. In contrast, due to the moderate rainfall pattern, the areas with high LSI values increase relatively slowly in the first rainfall event. The plots in Fig. 12 also demonstrate that the variation of the percentage of the areas with high LSI values corresponds well with the rainfall condition. Thus, complex rainfall conditions could be effectively captured by the trained landslide susceptibility models.
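
The two map-derived quantities discussed above (the percentage of the area with LSI > 0.8 and the number of recorded landslides falling inside that area) can be computed per time step along the following lines. This is only a minimal sketch: the function name `high_lsi_summary`, the nested-list grid layout, the use of `None` for cells outside the study area, and the toy values are all illustrative assumptions, not the authors' implementation.

```python
def high_lsi_summary(lsi_grid, landslide_cells, threshold=0.8):
    """Summarise one LSI map (hypothetical helper, not from the paper).

    lsi_grid        : 2-D list of LSI values in [0, 1]; None marks cells
                      outside the study area (assumed convention).
    landslide_cells : list of (row, col) indices of recorded landslides.
    Returns (percentage of valid cells with LSI > threshold,
             number of landslides located in such cells).
    """
    valid = 0
    high = 0
    for row in lsi_grid:
        for value in row:
            if value is None:
                continue  # skip cells outside the study area
            valid += 1
            if value > threshold:
                high += 1
    pct = 100.0 * high / valid if valid else 0.0

    hits = sum(
        1
        for r, c in landslide_cells
        if lsi_grid[r][c] is not None and lsi_grid[r][c] > threshold
    )
    return pct, hits


# Toy 2 x 3 map for a single time step (values are made up).
lsi_map = [
    [0.95, 0.40, None],
    [0.85, 0.10, 0.81],
]
landslides = [(0, 0), (1, 1), (1, 2)]
pct, hits = high_lsi_summary(lsi_map, landslides)
print(f"high-LSI area: {pct:.1f}% of cells, landslides inside: {hits}")
```

Applying such a function to each map in the time series would give the two curves that are compared against the average hourly rainfall.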

Fig. 12

Percentage of the areas with high LSI values (i.e., > 0.8), the number of real landslides located in the high LSI areas, and the average hourly rainfall in the study area during the two rainfall events: a from 21:00 on August 19 to 18:00 on August 20, 2005; b from 15:00 on June 6 to 12:00 on June 7, 2008

Discussion

In this section, comparative analyses are first conducted to demonstrate the advantages of the proposed landslide susceptibility modeling approach over some existing methods; then, the limitations of the proposed approach are discussed.

Comparisons between the proposed approach and the existing methods

One of the important components of the proposed approach is the use of the MRRI as a landslide triggering factor to capture the rainfall conditions. In contrast, the existing landslide susceptibility studies frequently employ the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data (e.g., Wang et al. 2021b; Ng et al. 2021; Xiao et al. 2022). The relative importance of each of these seven maximum rolling rainfall factors as the triggering factor for landslide susceptibility modeling in the study area is quantitatively assessed with the information gain ratio (Fig. 13). Note that the information gain ratios of all seven factors are smaller than that of the MRRI (i.e., 0.057), showing that the MRRI contributes more to susceptibility modeling. As the maximum rolling 24-h rainfall factor yields the highest information gain ratio among the seven conventional factors, it is selected to implement the conventional landslide susceptibility modeling in the further comparative analysis. The same training and testing samples as those adopted in the "Landslide susceptibility modeling in the study area" section are used in this conventional modeling. The susceptibility modeling is again conducted ten times using the maximum rolling 24-h rainfall, and the average accuracy and AUC of the trained landslide susceptibility models are 81.44% and 0.898, respectively (Fig. 14a). Both values are lower than those of the models trained with the MRRI (i.e., average accuracy, 86.30%; average AUC, 0.933). Thus, the MRRI is shown to be more effective for landslide susceptibility modeling than the maximum rolling 24-h rainfall factor adopted in the existing landslide susceptibility modeling.
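
For readers unfamiliar with the information gain ratio used in this comparison, a minimal sketch of its standard definition (information gain divided by the split information of the factor) is given here. How the rainfall factors were discretised in the study is not stated in this section, so the equal-width binning, the function names, and the toy data are illustrative assumptions only.

```python
import math
from collections import Counter


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * math.log2(c / n) for c in Counter(items).values())


def equal_width_bins(values, n_bins):
    """Discretise a continuous factor into n_bins equal-width classes
    (assumed scheme; the study may have used a different one)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins or 1.0  # avoid zero width for constant input
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def information_gain_ratio(factor_bins, labels):
    """Information gain of `labels` given `factor_bins`, divided by the
    split information (entropy) of `factor_bins`."""
    n = len(labels)
    groups = {}
    for b, y in zip(factor_bins, labels):
        groups.setdefault(b, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    gain = entropy(labels) - conditional
    split_info = entropy(factor_bins)
    return gain / split_info if split_info > 0 else 0.0


# Toy data: 1 = landslide sample, 0 = non-landslide sample (made up).
labels = [0, 0, 1, 1]
rainfall_24h = [5.0, 12.0, 80.0, 95.0]  # hypothetical rainfall values, mm
igr = information_gain_ratio(equal_width_bins(rainfall_24h, 2), labels)
print(f"information gain ratio: {igr:.3f}")
```

A factor with a larger ratio separates landslide from non-landslide samples better per unit of split information, which is the sense in which the MRRI is said to contribute more than the seven fixed-duration factors.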

Fig. 13

Information gain ratios of maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall and MRRI

Fig. 14

Receiver operating characteristic (ROC) curves of the landslide susceptibility models: a the models trained by the maximum 24-h rolling rainfall; b the models trained by the randomly selected non-landslide samples

The other vital component of the proposed landslide susceptibility modeling approach is the non-landslide sampling based on the FR-based method. In contrast, the non-landslide samples in the conventional landslide susceptibility modeling are randomly selected in the study area (e.g., Chen et al. 2018; Zhao et al. 2019). Thus, a comparative analysis is further conducted to illustrate the effectiveness of the FR-based method. In this comparative analysis, 4764 non-landslide samples are randomly selected in the study area. The landslide susceptibility modeling is then conducted ten times using these randomly selected non-landslide samples, and the average accuracy and AUC of the trained landslide susceptibility models are 82.25% and 0.903, respectively (Fig. 14b). Both values are lower than those of the models trained with the non-landslide samples obtained with the FR-based method (i.e., average accuracy, 86.30%; average AUC, 0.933). Thus, the non-landslide samples obtained with the FR-based method are more effective for landslide susceptibility modeling than the randomly selected non-landslide samples. However, although the FR-based non-landslide sampling strategy is shown to be effective in this study, the non-landslide samples obtained with this method are generated from the landslide samples through data analyses (not from field surveys), and such data analyses might be tricky. Thus, the method for obtaining non-landslide samples merits further investigation and improvement.
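
The averaged accuracy and AUC values quoted in these comparisons can be reproduced from per-run model outputs in the following way. This is a hedged sketch rather than the authors' code: the 0.5 classification cutoff, the function names, and the two toy runs are assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation, which is simple but quadratic in the number of samples.

```python
def accuracy(y_true, scores, cutoff=0.5):
    """Fraction of samples classified correctly when scores >= cutoff
    are labelled as landslides (the cutoff value is an assumption)."""
    correct = sum(1 for y, s in zip(y_true, scores) if (s >= cutoff) == bool(y))
    return correct / len(y_true)


def auc(y_true, scores):
    """Area under the ROC curve: probability that a random landslide
    sample scores higher than a random non-landslide sample
    (ties count as one half)."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def average_metrics(runs):
    """Mean accuracy and mean AUC over repeated runs.

    runs: list of (y_true, scores) pairs, one per trained model."""
    accs = [accuracy(y, s) for y, s in runs]
    aucs = [auc(y, s) for y, s in runs]
    return sum(accs) / len(accs), sum(aucs) / len(aucs)


# Two toy runs on four test samples (made-up scores); the study uses ten runs.
runs = [
    ([1, 1, 0, 0], [0.9, 0.6, 0.4, 0.2]),
    ([1, 1, 0, 0], [0.9, 0.3, 0.7, 0.1]),
]
mean_acc, mean_auc = average_metrics(runs)
print(f"mean accuracy: {mean_acc:.2%}, mean AUC: {mean_auc:.3f}")
```

Evaluating both sampling strategies on the same test samples, as done in the study, keeps the averaged metrics directly comparable.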

Limitations of the proposed landslide susceptibility modeling approach

Although the proposed landslide susceptibility modeling approach is shown to be effective in the study area and its advantages over the existing susceptibility modeling methods are demonstrated, the approach is not perfect, and the following aspects warrant further investigation: (1) although the complex and dynamic rainfall conditions can be considered in the proposed method, the real-time spatiotemporal prediction of landslides under a rainfall event cannot be achieved, as the temporal delays between rainfall and landslide occurrence are not included; (2) the output of the landslide susceptibility prediction is the landslide susceptibility index (LSI), which ranges from 0 to 1.0; this index is more like a landslide occurrence probability than a deterministic value, implying that the landslide susceptibility modeling results can only be interpreted in a probabilistic manner; and (3) the proposed approach is a data-driven statistical approach, and the modeling results are strongly affected by the data quality and quantity; as the physical mechanism of landslides is not included, the proposed approach cannot take advantage of the accumulated knowledge of landslide mechanisms.

Conclusions

This study proposed an ensemble approach for spatiotemporal landslide susceptibility modeling, in which the dynamic rainfall index and the random forest method are combined. Within the context of the proposed approach, the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall data are integrated into the maximum rolling rainfall index (MRRI) to consider complex rainfall conditions, and the frequency ratio (FR)-based method is adopted to select non-landslide samples. To illustrate the effectiveness and versatility of the proposed approach, ten landslide susceptibility models were first trained based on the historical landslide data collected in the central area of Hong Kong from 1984 to 2009; the trained susceptibility models were then applied to generate the spatiotemporal prediction of landslides under two historical rainfall events. The following conclusions are reached based upon the results presented.

  1.

    The proposed spatiotemporal landslide susceptibility modeling approach was shown to be effective in the studies conducted. The time-series landslide susceptibility maps derived from the landslide susceptibility models trained by the proposed approach were in good agreement with the spatial distribution of real landslides under two rainfall events.

  2.

    Compared to the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall adopted in the conventional landslide susceptibility modeling, the MRRI index proposed in this study was more effective. The information gain ratios of all the maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall were smaller than that of the MRRI; thus, the MRRI contributes more to landslide susceptibility modeling. The results of comparative analyses suggested that the landslide susceptibility models trained by the MRRI could perform better than those trained by the traditional maximum rolling 2-, 4-, 6-, 8-, 12-, 18-, and 24-h rainfall. Furthermore, complex rainfall conditions could be well captured by the MRRI index formulated.

  3.

    Comparative analyses conducted also confirmed the effectiveness of the FR-based method for non-landslide sampling in the landslide susceptibility modeling. The non-landslide samples obtained with the FR-based method were more effective in the landslide susceptibility modeling, in comparison to the non-landslide samples that were randomly selected.