Introduction

Landslides are a common geological hazard that causes heavy casualties and property losses (Chae et al. 2017). In particular, the Three Gorges area of the Yangtze River in China has experienced a large number of landslides since the impoundment of the Three Gorges Reservoir in 2003 (Tang et al. 2019; Huang et al. 2016). Monitoring and early warning systems are a vital defense against this threat, and landslide deformation prediction plays a central role in making such systems function (Kirschbaum et al. 2010; Zhou et al. 2016).

Considerable progress in landslide deformation prediction has been made since Saito (1965) proposed an empirical model based on the creep behavior of soils. To date, four main types of prediction models for landslide deformation have been developed; the first three are empirical, statistical, and nonlinear models (Miao et al. 2018). Empirical models are based on creep experiments and landslide monitoring data (Saito 1965; Fukuzono 1985; Crosta and Agliardi 2012); however, their strict application conditions often restrict their practical use. Statistical models are based on mathematical statistics and its derivatives (Deng 1988; Yin and Yan 1996; Wang et al. 2019). They are used to determine landslide trends when the physical mechanism of a landslide is too complicated to be captured by empirical models. However, statistical models remain insufficient for predicting landslide trends governed by multiple influencing factors (He et al. 2008; Ren et al. 2015). Nonlinear models are based on traditional nonlinear theories (Dong et al. 2011) and mainly include intelligent models, such as neural networks (Yang et al. 2019), support vector machines (Han et al. 2020), and extreme learning machines (Cao et al. 2016). Nevertheless, for complex landslides influenced by multiple factors, nonlinear models cannot always forecast the deformation trend satisfactorily (Li et al. 2012). The fourth type is the synthetic model, which combines several of the prediction models mentioned above and can integrate the effects of multiple factors into one model. Thus, more satisfactory deformation predictions can be obtained even when the landslide mechanism is not fully understood (Ma et al. 2017; Cai et al. 2016). In particular, prediction models that combine time series analysis with machine learning have achieved high accuracy for complex landslides (Zhou et al. 2016; Zhu et al. 2018b; Zhang et al. 2021).

Notably, the cumulative displacement of a landslide is controlled by both internal geological conditions and external environmental factors (Wen et al. 2017). The former include lithology, geological structure, and progressive weathering; the latter include rainfall and variations in the reservoir water level. Accordingly, landslide displacement can be decomposed mainly into a trend component and a periodic component (Zhang et al. 2021). The trend component can be represented by a monotonic function of time, which is generally approximated by a polynomial. The periodic component can be predicted with machine learning models such as support vector machines (SVM), artificial neural networks (ANN), and extreme learning machines (ELM) (Zhu et al. 2018a; Guo et al. 2020; Zhou et al. 2018). However, overfitting often occurs when training machine learning models and may substantially lower prediction reliability (Liang et al. 2020).

Ensemble methods combine multiple learning algorithms to obtain better predictive performance than any individual constituent algorithm could achieve; as a result, they reduce the likelihood of overfitting. Hyperparameter tuning is also essential for improving performance when training machine learning models. As one of the most representative ensemble learning models, the random forest algorithm is simple, accurate, robust to noise, and resistant to overfitting, and it has been used to predict landslide velocities (Krkač et al. 2020). Bayesian optimization is a simple but efficient algorithm for hyperparameter tuning (Nguyen et al. 2018). Because it searches efficiently for a global optimum, it may be expected to further improve the accuracy of the random forest model. However, few studies have predicted landslide deformation using a random forest model optimized in this way.

In this paper, time series analysis was used to decompose the cumulative displacement into a trend displacement and a periodic displacement. A Bayesian optimized random forest (BORF) model was developed to predict the periodic displacement, and the Kalman filter, a simple but efficient statistical algorithm, was used to predict the trend displacement. Combining the two, the BORF and Kalman filter (BORF-KF) model predicts the cumulative deformation of the landslide. The applicability, efficiency, and accuracy of this synthetic model were verified against monitoring data from the Outang landslide in Chongqing City, China.

Methodology

Deformation decomposition

Time series analysis methods are used to decompose cumulative landslide displacement into trend and periodic displacements; among them, the moving average method is the most widely used. When a time series is affected by periodic variations and irregular fluctuations and its development trend is not obvious, the moving average method smooths out these effects, which makes it possible to analyze and predict the long-term trend of the series.

The time series of cumulative landslide displacement \({y}_{t}\) can be expressed as follows:

$$y_{t} = S_{t} + C_{t} ,$$
(1)

where \({S}_{t}\) and \({C}_{t}\) are the trend and periodic terms, respectively.

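At moment t, the trend displacement \({S}_{t}\) is estimated as the moving average of the k most recent values:

$$S_{t} = \frac{{D_{t} + D_{t - 1} + \cdots + D_{t - k + 1} }}{k}\quad \left( {t = k,k + 1, \ldots ,n} \right),$$
(2)

where \({D}_{t}\) is the cumulative displacement at moment t (i.e., \({y}_{t}\) in Eq. (1)), k is the number of monitoring intervals in the moving-average window, and n is the length of the series. The periodic displacement then follows from Eq. (1) as \({C}_{t} = {y}_{t} - {S}_{t}\).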
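Eqs. (1) and (2) can be sketched in a few lines of Python, assuming the cumulative displacement has already been resampled to equal time intervals. The helper name `decompose_displacement` is hypothetical, and returning NaN for the first k − 1 trend values (where Eq. (2) is undefined) is a design choice of this sketch, not something the method prescribes.

```python
import numpy as np

def decompose_displacement(cumulative, k):
    """Split cumulative displacement y_t into trend S_t and periodic C_t.

    S_t is the trailing k-point moving average of Eq. (2); it is only
    defined from the k-th sample onward, so earlier entries are NaN.
    """
    y = np.asarray(cumulative, dtype=float)
    if not 1 <= k <= len(y):
        raise ValueError("window k must be between 1 and len(cumulative)")
    trend = np.full_like(y, np.nan)
    # Prefix sums give every window sum D_t + ... + D_{t-k+1} at once.
    c = np.concatenate(([0.0], np.cumsum(y)))
    trend[k - 1:] = (c[k:] - c[:-k]) / k
    periodic = y - trend  # Eq. (1) rearranged: C_t = y_t - S_t
    return trend, periodic
```

With the 15-day sampling used for the Outang data, a hydrological year spans 24 records, so a yearly window would correspond to `decompose_displacement(series, 24)`.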

Kalman filter

Essentially, the Kalman filter is an optimal recursive estimator: it estimates the state of a dynamic system from a series of noisy observations (Brown and Hwang 1997). It offers high accuracy and robustness (Kalman 1960). The basic principle is shown in Fig. 1. As shown in the figure, the Kalman filter estimates the state of a process at a given time and then obtains feedback in the form of noisy measurements, in a manner similar to feedback control. The Kalman filter equations fall into two groups: the time update equations and the measurement update equations. The time update equations project the current state and error covariance forward in time to obtain the a priori estimate for the next time step. The measurement update equations provide the feedback, incorporating a new measurement into the a priori estimate to obtain an improved a posteriori estimate. The estimation algorithm of the Kalman filter therefore resembles the predictor-corrector algorithms used to solve numerical problems (Maklouf et al. 2009).
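The two groups of equations can be sketched with NumPy as follows. This is the textbook linear Kalman filter; the matrix names F (state transition), H (observation), Q and R (process and measurement noise covariances) are generic conventions assumed here, not symbols defined in this paper.

```python
import numpy as np

def kf_predict(x, P, F, Q):
    """Time update: project the state and error covariance one step ahead."""
    x_prior = F @ x                # a priori state estimate
    P_prior = F @ P @ F.T + Q      # a priori error covariance
    return x_prior, P_prior

def kf_update(x_prior, P_prior, z, H, R):
    """Measurement update: blend the a priori estimate with measurement z."""
    S = H @ P_prior @ H.T + R                 # innovation covariance
    K = P_prior @ H.T @ np.linalg.inv(S)      # Kalman gain (feedback weight)
    x_post = x_prior + K @ (z - H @ x_prior)  # a posteriori state estimate
    P_post = (np.eye(len(x_prior)) - K @ H) @ P_prior
    return x_post, P_post
```

The gain K acts as the dynamic weight: a small measurement noise R pulls the estimate toward the new measurement, whereas a small a priori covariance keeps it close to the prediction.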

Fig. 1

Schematic diagram of Kalman filter

The trend displacement of the cumulative landslide deformation represents its long-term trend and can be considered a function of time (Lacroix et al. 2020). If the observation interval is small, the deformation within each interval is assumed to be relatively small; thus, the trend displacement can be expanded in a Taylor series about moment \({t}_{k-1}\) and evaluated at moment \({t}_{k}\), which yields the following equation:

$$x_{k} = x_{k - 1} + v_{k - 1} \left( {t_{k} - t_{k - 1} } \right) + \frac{1}{2}a_{k - 1} \left( {t_{k} - t_{k - 1} } \right)^{2} + \frac{1}{6}s_{k - 1} \left( {t_{k} - t_{k - 1} } \right)^{3} + g_{k - 1} ,$$
(3)

where \({x}_{k}\) is the trend displacement at moment \({t}_{k}\); \({v}_{k-1}\) and \({a}_{k-1}\) are the velocity and acceleration, respectively, at moment \({t}_{k-1}\); \({s}_{k-1}\) is the third time derivative of the trend displacement (the rate of change of acceleration), whose effect scales with the third power of the time step; and \({g}_{k-1}\) is the remainder of the Taylor series.
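One way to cast Eq. (3) into a Kalman filter is to take the state as the vector [x, v, a, s] and treat the remainder g as process noise. In the sketch below, the first row of the matrix reproduces Eq. (3); the remaining rows are an assumption (the third derivative is held constant over each interval), since the text gives only the displacement equation. The name `taylor_transition` is hypothetical.

```python
import numpy as np

def taylor_transition(dt):
    """State-transition matrix for the assumed state [x, v, a, s].

    Row 0 is Eq. (3) without the remainder g_{k-1}; rows 1-3 assume the
    third derivative s stays constant over the interval dt = t_k - t_{k-1}.
    """
    return np.array([
        [1.0, dt, dt**2 / 2.0, dt**3 / 6.0],
        [0.0, 1.0, dt, dt**2 / 2.0],
        [0.0, 0.0, 1.0, dt],
        [0.0, 0.0, 0.0, 1.0],
    ])

# Propagate one step; g_{k-1} would enter the filter as process noise.
state = np.array([10.0, 0.5, 0.1, 0.01])  # x, v, a, s at t_{k-1} (made-up values)
next_state = taylor_transition(2.0) @ state
```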

Random forest model

Ensemble learning accomplishes a task by building and integrating multiple base learners. When combined by a suitable strategy, these base learners can achieve better generalization performance than any single learner. The decision tree, which follows a divide-and-conquer strategy, is a common base learner for ensemble learning. Bagging is the most prominent parallel ensemble learning method and is based on bootstrap sampling: it generates multiple training sets by bootstrap sampling and trains one base learner on each training set.

A random forest is an ensemble learning model that builds on bagging and uses decision trees as base learners (Breiman 2001). In addition to bootstrap sampling, random forest training introduces randomness into feature selection by choosing among randomly selected attributes. Feature selection based on relevant feature subsets helps produce a robust learning method (Blum and Langley 1997). Each node of a base decision tree is split on the best attribute within a random subset of the attributes (Norouzi and Shahmohammadi-Kalalagh 2019). By perturbing both the samples and the attributes, the random forest increases the diversity of the base learners, which reduces instability, improves generalization performance, and enhances the robustness of the model (Belgiu and Drăguţ 2016; Prasad et al. 2006). In addition, bagging mitigates overfitting, so the random forest exhibits high accuracy and strong robustness when handling data with missing values, outliers, and noise.

The flowchart of the random forest is shown in Fig. 2. As shown in the figure, n samples are drawn by bootstrap sampling to form a sample subset, and m attributes are randomly selected as candidate features to construct a single decision tree. This process is repeated to construct the specified number of decision trees, and finally the outputs of all decision trees are combined (averaged, for regression) to give the final output of the model.
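The bootstrap-and-random-attribute procedure can be illustrated with a deliberately small, NumPy-only random forest regressor. This is a teaching sketch, not the implementation used in this study: the helper names are hypothetical, candidate attributes are redrawn at every split (as in Breiman's formulation), and the forest output is the mean of the tree outputs. In practice a library class such as scikit-learn's `RandomForestRegressor` would normally be used. The arguments `n_trees`, `min_split`, `m`, and `max_depth` mirror the four hyperparameters tuned later.

```python
import numpy as np

def fit_tree(X, y, rng, m, max_depth, min_split):
    """Grow a regression tree; every split considers only m random attributes."""
    if max_depth == 0 or len(y) < min_split or np.all(y == y[0]):
        return float(y.mean())  # leaf: mean target of the node
    best = None
    for j in rng.choice(X.shape[1], size=m, replace=False):
        for thr in np.unique(X[:, j])[:-1]:  # skip max so both sides are non-empty
            left = X[:, j] <= thr
            sse = (((y[left] - y[left].mean()) ** 2).sum()
                   + ((y[~left] - y[~left].mean()) ** 2).sum())
            if best is None or sse < best[0]:
                best = (sse, j, thr, left)
    if best is None:  # all sampled attributes are constant in this node
        return float(y.mean())
    _, j, thr, left = best
    return (j, thr,
            fit_tree(X[left], y[left], rng, m, max_depth - 1, min_split),
            fit_tree(X[~left], y[~left], rng, m, max_depth - 1, min_split))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (attribute, threshold, left, right)
        j, thr, left, right = node
        node = left if x[j] <= thr else right
    return node

def fit_forest(X, y, n_trees=50, m=2, max_depth=6, min_split=2, seed=0):
    rng = np.random.default_rng(seed)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, len(y), size=len(y))  # bootstrap sample of n rows
        trees.append(fit_tree(X[idx], y[idx], rng, m, max_depth, min_split))
    return trees

def predict_forest(trees, X):
    # Regression output: average of the individual tree predictions.
    return np.array([np.mean([predict_tree(t, x) for t in trees]) for x in X])

# Toy usage: three candidate factors, only the first one drives the target.
data_rng = np.random.default_rng(1)
X = data_rng.uniform(size=(100, 3))
y = 3.0 * X[:, 0]
pred = predict_forest(fit_forest(X, y, n_trees=20), X)
```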

Fig. 2

Schematic diagram of random forest algorithm

Bayesian optimization

Bayesian optimization (BO) (Ghahramani 2015) is an approximation-based method that is effective for global optimization. Using Bayes' theorem and the results of previous evaluations of the unknown objective function, BO estimates the posterior distribution of the objective function and constructs a surrogate function from it. Thus, the next hyperparameter combination most likely to improve on the current optimum can be found quickly (Donald et al. 1998). BO makes full use of the information from previously sampled points to find the parameter combinations that maximize the improvement of the objective function; by learning the shape of the objective function in this way, it improves search efficiency. The optimization problem solved by BO can be written as follows:

$$x^{*} = \mathop {\arg \min }\limits_{x \in X} f\left( x \right),$$
(4)

where \(x\) denotes the parameter combination to be optimized; \(X\) denotes the search space of candidate parameter combinations; \(f\left(x\right)\) denotes the objective function (e.g., a prediction error to be minimized); and \(\arg\min\) returns the value of \(x\) at which \(f\left(x\right)\) attains its minimum.

The two core components of BO are the probabilistic surrogate model and the acquisition function. The probabilistic surrogate model stands in for the unknown objective function; its accuracy improves as the prior is continually updated with each new evaluation. The acquisition function, guided by the posterior distribution, selects the next sample both from regions likely to contain the global optimum and from regions that have not yet been sampled, thereby searching the candidate set for the solution that minimizes the loss function. The probabilistic model of BO uses a Gaussian process to approximate the complex black-box function (Sano et al. 2020) and introduces prior knowledge of the target function into the probabilistic model, which reduces sampling redundancy. In addition, local neighborhood information can be used for effective inference, so that the candidate point most likely to be optimal is selected more accurately (Cui and Yang 2018). Compared with other approaches, such as maximum likelihood estimation, Bayesian optimization is less likely to become trapped in a local optimum.
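A compact sketch of this loop, minimizing a one-dimensional function over a finite candidate set as in Eq. (4), is given below. The assumptions are explicit: a Gaussian-process surrogate with a unit-variance squared-exponential kernel and a fixed length scale, expected improvement as the acquisition function, and hypothetical helper names. Real hyperparameter tuning would use a multi-dimensional search space and would usually also fit the kernel hyperparameters.

```python
import math
import numpy as np

def gp_posterior(X_obs, y_obs, X_new, length=0.2, noise=1e-6):
    """GP posterior mean and std with a unit-variance squared-exponential kernel."""
    def kernel(a, b):
        return np.exp(-0.5 * ((a[:, None] - b[None, :]) / length) ** 2)
    K = kernel(X_obs, X_obs) + noise * np.eye(len(X_obs))
    K_s = kernel(X_obs, X_new)
    mu = K_s.T @ np.linalg.solve(K, y_obs)
    var = 1.0 - np.sum(K_s * np.linalg.solve(K, K_s), axis=0)
    return mu, np.sqrt(np.clip(var, 1e-12, None))

def expected_improvement(mu, sigma, best):
    """Expected amount by which f falls below the best value seen (minimization)."""
    z = (best - mu) / sigma
    cdf = np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (best - mu) * cdf + sigma * pdf

def bayes_opt(f, candidates, n_init=3, n_iter=10, seed=0):
    """Approximate x* = argmin over candidates of f(x), as in Eq. (4)."""
    rng = np.random.default_rng(seed)
    X = rng.choice(candidates, size=n_init, replace=False)
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        y_mean = y.mean()  # centre observations: the GP prior mean is zero
        mu, sigma = gp_posterior(X, y - y_mean, candidates)
        ei = expected_improvement(mu + y_mean, sigma, y.min())
        x_next = candidates[np.argmax(ei)]  # acquisition picks the next sample
        X = np.append(X, x_next)
        y = np.append(y, f(x_next))
    i = np.argmin(y)
    return X[i], y[i]
```

Points far from existing samples have a large posterior standard deviation and therefore a large expected improvement, which is how the acquisition function balances exploring unsampled regions against refining the current best.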

Modeling procedure

The whole modeling process of the BORF-KF model for landslide deformation prediction is shown in Fig. 3. As seen from the figure, the landslide cumulative displacement is decomposed into the trend and periodic displacements. The displacement value of the trend term is predicted using the Kalman filter, while that of the periodic term is predicted using the random forest model. In particular, the hyperparameters of random forest are tuned by Bayesian optimization. Finally, the predicted values of the trend and periodic displacements are superposed to obtain the predicted values of the final cumulative displacement.

Fig. 3

Flow chart of the prediction model: a data preparation; b model prediction; c Bayesian optimization

Case study

Geological conditions

The Outang landslide is an ancient landslide located in Fengjie County, Chongqing City, China, within 30° 56′ 57″–30° 58′ 1″ N and 109° 20′ 41″–109° 21′ 14″ E. It is approximately 12 km upstream of Fengjie County and 177 km downstream from the Three Gorges Reservoir, as shown in Fig. 4a and b. Fig. 4c shows a panoramic view of the frontal landform of the Outang landslide. As seen from the figure, the leading edge of the landslide has been thrust into the Yangtze River, as shown in detail in Fig. 5. The landslide area has a subtropical monsoon climate, with an average annual temperature of 16.3 °C and an average annual rainfall of 1147.9 mm. Rainfall is abundant and concentrated in the flood season, from May to September each year.

Fig. 4

Geographic position of the Outang landslide: a the geographical location of the study area; b the geographical location of the Outang landslide; and c the panoramic view of the frontal landform of the Outang landslide

Fig. 5

Leading edge of the Outang landslide thrust into the Yangtze River

The landslide lies in a shallow- to medium-incised monoclinal valley in low mountains, and the dip direction of the rock layers is nearly consistent with the slope direction. On the north side, the Yangtze River flows from the west toward the northeast, at an angle of 10°–15° to the strike of the rock layers. As shown in Fig. 4c, the main sliding direction of the landslide is at an azimuth of 345°. Overall, the ancient landslide has the shape of an inverted bell, wide at the front and narrow at the back, with an area of 171 × 10⁴ m² and a total volume of 7510 × 10⁴ m³. The elevation of the front edge ranges from 90 to 102 m, and that of the back edge is approximately 705 m; the north–south length of the slide body ranges from 1640 to 2230 m, and the east–west width ranges from 550 to 1300 m. The average thickness of the slide mass is approximately 44 m, with a maximum of 114 m.

The Outang landslide is located in a secondary tectonic unit of the Yangzi para-tectonic platform, at the intersection of the Sichuan platform syncline and the Upper Yangzi platform fold. Locally, the landslide area lies on the southeast limb at the end of the syncline near Gulling town, and no regional faults are present. The Outang landslide is an exceptionally large, deep-seated consequent rock landslide; its topography is high in the south and low in the north, steep in the upper part and gentle in the lower part.

The landslide area is on the south bank of the Yangtze River, where a seasonal gully with a cutting depth of 3–6 m has developed. The groundwater at the front edge of the landslide is well connected with the Yangtze River and the Three Gorges Reservoir, so the groundwater level there is mainly controlled by the regulation of the reservoir water level.

In the middle and back parts of the landslide, groundwater recharge is insufficient and storage conditions are poor. Because groundwater flows from the southeast to the northwest, the groundwater level is generally higher on the west side than on the east side. Blocked by the rock ridge in the northeast, the groundwater flow turns toward the Yangtze River on the north for discharge and finally converges in the western part of the area.

Data acquisition

The Outang landslide comprises three blocks, as shown in Fig. 6. Block 1, at the bottom, has an inverted ball shape, with an elevation of 90–370 m, an area of 92.2 × 10⁴ m², and a volume of 6480 × 10⁴ m³. Block 1 contains two zones of intensive deformation, located in its western and eastern parts. The leading edge of Block 1 is seasonally submerged and re-exposed as the reservoir water level fluctuates. Block 2, which is irregularly shaped and extends its front onto the top of Block 1, is located in the middle, with an elevation of 250–530 m, an area of 31.6 × 10⁴ m², and a volume of 1020 × 10⁴ m³. Block 3, shaped like an inverted ancient bell, is located at the top, with an elevation of 400–705 m, an area of 54.3 × 10⁴ m², and a volume of 1450 × 10⁴ m³.

Fig. 6

Landform and monitoring point layout of the Outang landslide

The ancient Outang landslide was reactivated after the experimental impoundment of the Three Gorges Reservoir in June 2008. Deformation has continued since then without stopping, causing building damage and ground fissures over a wide area of 171 × 10⁴ m². Professional monitoring was initiated by the government of Fengjie County in December 2010, mainly using GNSS for ground movement, borehole inclinometers for deep displacement, sensors for rainfall and reservoir water level, and visual inspection. Typical GNSS monitoring points were selected to verify the prediction performance of the proposed BORF-KF model using the measured time series of surface deformation. As shown in Fig. 6, the three selected monitoring points, MJ01, MJ08, and MJ20, are located in Blocks 1, 2, and 3, respectively; MJ01 lies within an intensive deformation zone. The engineering geological profile of section A–A′ in Fig. 6 is illustrated in Fig. 7.

Fig. 7

Engineering geology profile of A–A′ section

Landslide deformation characteristics

The measured data were fitted by least squares interpolation and then resampled into an equal-interval time series with an interval of 15 days. Figure 8 shows the time series of the monitoring data of the Outang landslide from December 6, 2010, to April 8, 2016. The measurements comprise three types: rainfall, reservoir water level, and the cumulative ground deformation at the monitoring points in Fig. 6. As shown in the figure, the general trend of ground deformation is strongly correlated with rainfall and reservoir water level. The cumulative ground deformation increases during the flood season and slows during the rest of the year, and it did not show an obvious convergence trend during the monitoring period.

Fig. 8

Measurements of the Outang landslide

Rainfall is an external factor that triggers reservoir landslides (Tomas et al. 2014; Li et al. 2010). Rainfall infiltration not only increases the hydrostatic pressure but also generates hydrodynamic pressure, which reduces the effective stress in the soil and leads to slope instability. Previous studies of the relationship between landslides and rainfall show that the cumulative rainfall in the 30 days before a landslide has a marked effect on its deformation (Du et al. 2013; Roering et al. 2015). In addition, the periodic variation in the reservoir water level is the main factor inducing step-like deformation of the landslide (Jiao et al. 2014; Guo et al. 2017), and the reservoir water level is itself influenced by rainfall. Herein, the rainfall over the past 15 and 30 days and the variation in the reservoir water level over the past 15 and 30 days are depicted in Fig. 8. As seen from the figure, the cumulative landslide displacement is positively correlated with rainfall, and drawdown of the reservoir water level is positively correlated with the step-like deformation events of the landslide.
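The rainfall and reservoir factors can be derived from the 15-day series as sketched below. The sketch assumes that `rain[t]` is the rainfall accumulated over the 15-day interval ending at step t and that `level[t]` is the reservoir water level at step t; the function name and this exact definition of the 15- and 30-day windows are assumptions for illustration.

```python
import numpy as np

def external_factors(rain, level):
    """Rows: [rain_15d, rain_30d, dlevel_15d, dlevel_30d] for steps t = 2..n-1.

    Assumes 15-day sampling: one step back is 15 days, two steps back is 30 days.
    """
    rain = np.asarray(rain, dtype=float)
    level = np.asarray(level, dtype=float)
    rain15 = rain[2:]                   # rainfall in the last 15-day interval
    rain30 = rain[2:] + rain[1:-1]      # rainfall over the last two intervals
    dlevel15 = level[2:] - level[1:-1]  # reservoir level change over 15 days
    dlevel30 = level[2:] - level[:-2]   # reservoir level change over 30 days
    return np.column_stack([rain15, rain30, dlevel15, dlevel30])
```

A negative level change marks drawdown, the condition the text associates with step-like deformation.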

The deformation characteristics of landslides induced by external factors vary at different stages of landslide evolution (Glade et al. 2005). Considering the evolution stage of the Outang landslide, the displacement variations over the 15 and 30 days before the current moment are taken as state factors in the prediction model of periodic displacement (Zhou and Yin 2014). Gray relational analysis, with the distinguishing (identification) coefficient set to 0.5 (Zhang et al. 2019), is used to quantify the correlation between each factor and the periodic displacement. A factor whose gray correlation coefficient exceeds 0.6 is considered strongly correlated with the periodic displacement of the Outang landslide. Table 1 lists the gray correlation coefficients for the three monitoring points. All of them exceed 0.6, indicating that all six factors, i.e., the rainfall over the past 15 and 30 days, the variation in the reservoir water level over the past 15 and 30 days, and the displacement variation over the past 15 and 30 days, are strongly correlated with the periodic displacement of the Outang landslide.
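For reference, a common form of gray relational analysis with distinguishing coefficient ρ = 0.5 is sketched below. The min-max normalization is an assumption (initial-value or mean normalization is also used), the helper name is hypothetical, and the coefficients are averaged over time to give one value per factor, which is the quantity compared with the 0.6 threshold in this sketch.

```python
import numpy as np

def gray_relational_grades(reference, factors, rho=0.5):
    """Gray relational grade of each factor series (rows) to the reference series."""
    def normalize(a):
        a = np.asarray(a, dtype=float)
        return (a - a.min(axis=-1, keepdims=True)) / np.ptp(a, axis=-1, keepdims=True)
    x0 = normalize(reference)
    xi = normalize(np.atleast_2d(factors))
    delta = np.abs(xi - x0)                  # absolute differences Delta_i(k)
    d_min, d_max = delta.min(), delta.max()  # two-level min and max over i and k
    coeff = (d_min + rho * d_max) / (delta + rho * d_max)  # relational coefficients
    return coeff.mean(axis=1)                # one grade per factor
```

A factor that moves exactly like the reference scores 1, while a factor moving in the opposite direction scores much lower.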

Table 1 Gray correlation coefficients

Results

Cumulative displacement decomposition

Based on the measurements at the three monitoring points, a prediction model of the ground deformation of the Outang landslide is developed. For each monitoring point, the dataset is divided into a training set and a test set.

A hydrological year in Fig. 8, which contains 24 monitoring records at the 15-day interval, is taken as the period of the moving average (i.e., \(k = 24\) in Eq. (2)). The moving average method is used to extract the trend term from the cumulative displacement, and the trend displacement is then subtracted from the cumulative displacement to obtain the periodic displacement.

Trend displacement prediction

Based on the extracted trend displacement data, the Kalman filter was used for one-step-ahead prediction, and the results were compared with those of a polynomial function, as shown in Fig. 9. As seen from the figure, at all three monitoring points the trend predicted by the Kalman filter agrees very well with the measured trend displacement, whereas the trend predicted by the polynomial function deviates slightly from it. The prediction errors of both models, measured by the mean absolute percentage error (MAPE), are compared in Table 2. The results show that the Kalman filter is more accurate than the polynomial function, reducing the MAPE by at least 81% relative to the polynomial function.
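The one-step-ahead procedure and the MAPE can be sketched as follows. The filter uses the standard predict/update equations with a transition matrix whose first row is Eq. (3) and whose remaining rows assume a constant third derivative; the noise levels `q` and `r`, the initial state, and all function names are illustrative assumptions, not values from this study. `relative_improvement` shows one plausible way to express a percentage reduction in MAPE such as the 81% reported above.

```python
import numpy as np

def taylor_F(dt):
    # Row 0 is Eq. (3) without g; rows 1-3 assume a constant third derivative.
    return np.array([[1.0, dt, dt**2 / 2, dt**3 / 6],
                     [0.0, 1.0, dt, dt**2 / 2],
                     [0.0, 0.0, 1.0, dt],
                     [0.0, 0.0, 0.0, 1.0]])

def one_step_kf(z, dt=1.0, q=1e-4, r=1e-2):
    """Forecast z[k] from z[0..k-1] for k = 1..len(z)-1 (one-step-ahead)."""
    F, H = taylor_F(dt), np.array([[1.0, 0.0, 0.0, 0.0]])
    Q, R = q * np.eye(4), np.array([[r]])
    x, P = np.array([z[0], 0.0, 0.0, 0.0]), np.eye(4)
    preds = []
    for zk in z[1:]:
        x, P = F @ x, F @ P @ F.T + Q         # time update (a priori)
        preds.append(x[0])                    # forecast before seeing zk
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)        # Kalman gain
        x = x + K @ (np.array([zk]) - H @ x)  # measurement update
        P = (np.eye(4) - K @ H) @ P
    return np.array(preds)

def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return 100.0 * np.mean(np.abs((a - p) / a))

def relative_improvement(mape_ref, mape_new):
    """Relative MAPE reduction of a model versus a reference model, in percent."""
    return 100.0 * (mape_ref - mape_new) / mape_ref
```

For example, `mape(trend[1:], one_step_kf(trend))` scores the forecasts against the values they were made for, since forecast k is produced before measurement k is assimilated.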

Fig. 9

Comparison between the measured and predicted values of trend displacement: a MJ01; b MJ08; c MJ20

Table 2 Error analysis

Periodic displacement prediction

Bayesian optimization (BO) and random search optimization (RSO) were used to tune four main hyperparameters of the random forest model: the number of decision trees, the minimum number of samples required to split a node, the maximum number of features considered at each split, and the maximum depth of a single decision tree. The optimized hyperparameter values obtained by both algorithms are listed in Table 3. Both hyperparameter combinations were then used in the random forest model to predict the periodic displacement of the Outang landslide.
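A random search over these four hyperparameters can be sketched with the standard library alone. The parameter names follow scikit-learn's `RandomForestRegressor` (`n_estimators`, `min_samples_split`, `max_features`, `max_depth`), but the bounds are hypothetical, and the objective is a placeholder for something like a validation MAPE of the forest. Bayesian optimization would replace the uniform sampling step with a surrogate-guided choice of the next combination.

```python
import random

# Hypothetical integer bounds; the upper bound on max_features reflects
# the six input factors of Table 1.
SPACE = {
    "n_estimators": (10, 500),
    "min_samples_split": (2, 20),
    "max_features": (1, 6),
    "max_depth": (2, 30),
}

def random_search(objective, n_trials=50, seed=0):
    """Sample combinations uniformly at random and keep the lowest-scoring one."""
    rng = random.Random(seed)
    best_params, best_score = None, float("inf")
    for _ in range(n_trials):
        params = {name: rng.randint(lo, hi) for name, (lo, hi) in SPACE.items()}
        score = objective(params)  # e.g. validation MAPE of a random forest
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score
```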

Table 3 Values of optimized hyperparameters

Figure 10 shows the periodic displacement predictions at the three monitoring points obtained using the BORF model and the random search-optimized random forest (RSORF) model. As seen from the figure, the curves of both models generally follow the measured periodic displacement. Table 4 compares the accuracy of both models in detail. All MAPE values of the BORF model are below 15% and thus within the acceptable error range, whereas some MAPE values of the RSORF model exceed 15% and fall outside it. The BORF model is notably more accurate than the RSORF model, reducing the MAPE by at least 21%, which suggests that the BORF model predicts the periodic displacement with satisfactory accuracy.

Fig. 10

Prediction results of periodic displacement: a MJ01; b MJ08; c MJ20

Table 4 Periodic displacement prediction

Cumulative displacement prediction

The predicted trend and periodic displacements were superposed at each time step to obtain the predicted cumulative displacement of the Outang landslide, as shown in Fig. 11. As shown in the figure, the predicted cumulative displacements at the three monitoring points agree well with the measured values. Table 5 gives the MAPE values of the predicted cumulative displacement; the maximum is 1.89%, suggesting that the proposed BORF-KF model has high prediction accuracy.

Fig. 11

Prediction results of the cumulative displacement: a MJ01; b MJ08; c MJ20

Table 5 Cumulative displacement error

Discussion

Although the Three Gorges Dam is a great engineering achievement, the construction and operation of the reservoir have inevitably had potential negative impacts on the hydrological environment, aquatic and terrestrial ecosystems, regional climate, and soil conservation in the middle and lower reaches of the Yangtze River (An et al. 2009; Gao et al. 2010; Yu et al. 2014). Reservoir impoundment also altered the regional engineering geological and hydrogeological conditions, leading to frequent reservoir landslides, of which the Outang landslide is a typical example.

Since the experimental impoundment of the Three Gorges Reservoir in 2008, the periodic rise and fall of the reservoir water level has created a water-level fluctuation zone within which the seepage pressure varies (Jiang et al. 2011). In addition, the leading edge of the sliding mass is partially immersed in the reservoir, which reduces the sliding resistance and local stability and leads to local collapse. More importantly, rainfall infiltration increased the sliding force and weakened the sliding resistance. Together, these factors reactivated the Outang landslide. Ground deformation has continued since the reactivation, causing building damage and fissures within the landslide area of 171 × 10⁴ m². In June 2013, the government of Fengjie County implemented emergency reinforcement measures, including backfilling, toe loading, and the placement of lattice revetments on the east and west sides of Block 1, which significantly slowed the ground deformation of the landslide. Nevertheless, the measurements in Fig. 8 show that the ground deformation did not exhibit a significant convergence trend during the monitoring period, which is attributed mainly to soil creep.

The Kalman filter is a one-step predictive model that can track continuously changing systems. Its distinctive advantage is that it corrects its estimates using the latest measurements through a dynamically weighted correction. The random forest algorithm, a typical ensemble learning method, effectively mitigates overfitting and thus achieves high prediction accuracy. In addition, compared with conventional hyperparameter tuning methods, Bayesian optimization searches for a global optimum, so the main hyperparameters of the random forest can be optimized to further improve prediction accuracy. The synthetic BORF-KF model, built on these three techniques, predicts the ground deformation of reservoir landslides. In this study, the predictions of the BORF-KF model agree well with the measured data, and the accuracy remains satisfactory even during step-like deformation, highlighting the application potential of the model. However, the proposed method is not applicable where soils may exhibit considerable strain softening, which can trigger sudden, rapid landslide movement such as that which occurred in the Vaiont landslide (Ciabati 1964; Tika and Hutchinson 1999; Stamatopoulos and Di 2015).

Conclusion

The Outang landslide is an ancient, giant rock landslide that was reactivated after the experimental impoundment of the Three Gorges Reservoir in 2008. In this paper, a synthetic model based on a Kalman filter and a Bayesian optimized random forest is proposed to predict the displacement of the Outang landslide. The following conclusions are drawn from this study:

  1. The impoundment of the Three Gorges Reservoir altered the regional engineering geological and hydrogeological conditions, leading to frequent reservoir landslides, of which the Outang landslide is a typical example. The reactivation of the Outang landslide was jointly induced by rainfall, variation of the reservoir water level, and soil creep.

  2. The ground deformation of the Outang landslide exhibits notable step-like characteristics, which are positively correlated with concentrated rainfall and drawdown of the reservoir water level. The ground deformation did not converge during the monitoring period from December 2010 to April 2016.

  3. The comparison between the predicted and monitored time series indicates that the BORF-KF model attains high prediction accuracy (MAPE of the cumulative displacement below 2%) from the initial to the step-like stages of the landslide, suggesting that this synthetic model holds considerable potential for predicting the ground deformation of reservoir landslides.