Introduction

The traditional factor of safety (FS) used in slope stability analysis cannot account for the uncertainty of the soil properties. Probabilistic approaches, which try to quantify those uncertainties, are therefore increasingly being used by engineers and researchers for probabilistic slope stability analysis (Cho 2009, 2013; Hamedifar et al. 2014; Jha and Ching 2013; Li et al. 2014; Low 2007; Zhang et al. 2011a). Generally, computing the probability of a slope failure involves a multiple integral of the joint probability density function of all the random variables (e.g., c and φ) over the failure region, which is very challenging to evaluate analytically. Currently, the Monte Carlo simulation (MCS) is used to obtain an unbiased estimate of the failure probability by evaluating the performance function on a large number of randomly generated samples (Jiang et al. 2015; Wang 2012, 2013; Wang et al. 2010). However, the MCS suffers from a prohibitively high computational cost, and a full MCS may take more than 1 day to complete using the limit equilibrium method (LEM). For slope stability analysis based on the finite element method (FEM), the required computer time is practically unacceptable for most routine engineering design work. To enhance the computational efficiency of the MCS, the quasi MCS and variance reduction methods have been proposed in the literature. For example, the quasi MCS is a promising alternative for slope stability analysis (Cheng et al. 2015), although it may perform unsatisfactorily in some cases. In addition, the Subset Simulation (SS), which is considered an enhancement of the MCS, can significantly improve the computational efficiency, but it still requires hundreds of evaluations of the LEM or FEM slope stability model (Au and Wang 2014; Wang et al. 2011). Therefore, more efficient and accurate reliability analysis approaches are still under active research.

As an alternative, the surrogate model, which uses an explicit performance function to approximate the implicit relationship between the original model output (i.e., FS) and the inputs (e.g., c and φ), has been recognized as an efficient tool for slope reliability analysis over the past few decades. The most popular surrogate models include the support vector machine (Kang and Li 2015; Zhao 2008), the neural network (Cho 2009), the Kriging (Luo et al. 2012a, b; Yi et al. 2015; Zhang et al. 2011a, 2013) and the response surface method (RSM) (Jiang et al. 2014; Li et al. 2015; Li and Chu 2015; Zhang et al. 2011b). Among them, the RSM based on the second order polynomial without cross terms is the most popular approach at present, and more about the RSM and its variants can be found in a recent review paper (Li et al. 2016). Usually, the RSM is adopted in combination with the first order reliability method (FORM) or sampling-based approaches (e.g., MCS) to assess the reliability of a slope, which can substantially reduce the evaluation time of the reliability analysis. However, it has also been shown that a traditional RSM may be unable to approximate the performance function accurately, and adjusting the RSM to reach an expected precision is non-trivial when it is only locally accurate (Zhang et al. 2013).

Compared with the RSM, a Kriging-based surrogate model presents several advantages. Firstly, it is an exact interpolation method, that is, the predictions at the known points that belong to the design of experiments (DOEs) reproduce the observed responses exactly. The DOEs in this study are also known as the training samples used to calibrate the Kriging model. Secondly, it provides a Kriging variance indicating the prediction error at an unknown point, which the RSM cannot provide (Zhang et al. 2011a; Zhao et al. 2015). Owing to these merits, the Kriging model has gained much attention in structural reliability analysis (Busby 2009; Kaymaz 2005; Yuan et al. 2013; Zhang et al. 2015; Zhao et al. 2015; Zhao and Wang 2011; Zhao et al. 2010). By contrast, the application of the Kriging model in the geotechnical reliability field is still limited at present. Zhang et al. (2011a) were probably the first to use the Kriging in the numerical model or the RSM to best fit a response surface for the reliability analysis of the stability of some typical geotechnical structures. Later on, Zhang et al. (2013) compared the Kriging with the quadratic RSM in the system reliability analysis of soil slopes and found it superior to the commonly used quadratic RSM. Luo et al. (2012b) and Yi et al. (2015) used the Kriging method optimized by the artificial bee colony (ABC) algorithm and by the particle swarm optimization (PSO), respectively, for slope reliability analysis. Recently, Kang et al. (2015) used a Gaussian process regression with the Latin hypercube sampling (LHS) to perform the system probabilistic stability analysis of soil slopes, the underlying principle of which is very similar to the Kriging method. It should be noted that all these researchers used the ordinary Kriging (OK) based on the MATLAB toolbox DACE (design and analysis of computer experiments) developed by Lophaven et al. (2002). The problem is that the trend function of the OK is a nonzero constant, which might not be able to capture the non-constant mean trend of the soil strength parameters. Although the universal Kriging (UK, i.e., the trend function is a polynomial) is available in the DACE, it is limited to the second order, while the selection of the basis function forms still requires exploration. Given the high nonlinearity of the soil properties and the slope stability models, applying a higher order trend function to improve performance should be of great importance. Further comparison of the performance of the Kriging with different trends in slope reliability analysis should be conducted, but such work appears to be lacking at present, to the best knowledge of the authors.

Inspired by Liu et al. (2010), the objective of this paper is to combine the Taylor expansion and the Kriging to form a unified Kriging approach. It is termed as the Taylor Kriging (TK) in this paper, and the Taylor expansion order can be as high as the fourth. To implement the TK model, firstly, the genetic algorithm (GA) is used to optimize the correlation parameters in the Kriging, and an adapted toolbox based on the DACE for the purpose of consistent comparison with other Kriging models, named as the GATK (genetic algorithm optimized TK), is built for the ongoing research. In the GATK, the best Kriging model can be identified easily from the different Taylor expansion orders. Both the analytical and the practical slope examples are used to validate the proposed approach. With the existence of the GATK, the comparisons can be made between the OK, UK and TK models as well as the other reliability approaches. To achieve these aims, this paper starts with a review of the classical Kriging theory and the proposed TK forms, followed by the introduction of the GATK model which is validated by two analytical examples. The system reliability analysis of the slope stability and its implementation procedure are then described. Finally, the proposed approach is illustrated by two practical soil slopes, followed by the conclusions of this study.

Kriging methodology

Classical Kriging theory

Unlike the other models, the Kriging method is a semi-parametric interpolation technique consisting of two parts: a linear regression part (trend part) and a stochastic part (Cressie 1993; Krige 1994). Let x denote a vector of input variables and y(x) the dependent response; the Kriging model is then expressed as

$$ \boldsymbol{y}\left(\boldsymbol{x}\right)=\boldsymbol{f}{\left(\boldsymbol{x}\right)}^T\boldsymbol{\beta} +\mathbf{z}\left(\boldsymbol{x}\right) $$
(1)

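where \( \boldsymbol{f}(\boldsymbol{x})={\left[{f}_1(\boldsymbol{x}),{f}_2(\boldsymbol{x}),\cdots, {f}_n(\boldsymbol{x})\right]}^T \) is the vector of basis functions, \( \boldsymbol{\beta} ={\left[{\beta}_1,{\beta}_2,\cdots, {\beta}_n\right]}^T \) is the vector of regression coefficients to be determined, and n denotes the number of basis functions. z(x) is used to model the fluctuation around the regression part \( \boldsymbol{f}{(\boldsymbol{x})}^T\boldsymbol{\beta} \), and it is assumed to be a Gaussian process with the following statistical properties: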

$$ \left\{\begin{array}{l}E\left(\mathbf{z}\left(\boldsymbol{x}\right)\right)=0\\ {}Var\left(\mathbf{z}\left(\boldsymbol{x}\right)\right)={\sigma}^2\\ {}Cov\left(\mathbf{z}\left({\boldsymbol{x}}_i\right),\mathbf{z}\left({\boldsymbol{x}}_j\right)\right)={\sigma}^2\boldsymbol{R}\left({\boldsymbol{x}}_i,{\boldsymbol{x}}_j\right)\end{array}\right. $$
(2)

where \( {\sigma}^2 \) is the process variance, \( Cov\left(\mathbf{z}({\boldsymbol{x}}_i),\mathbf{z}({\boldsymbol{x}}_j)\right) \) defines the covariance between two arbitrary points \( {\boldsymbol{x}}_i \) and \( {\boldsymbol{x}}_j \), and \( \boldsymbol{R}({\boldsymbol{x}}_i,{\boldsymbol{x}}_j) \) is the spatial correlation function, which can be chosen from several available models. The Gaussian correlation function is adopted as the spatial correlation function in this study, and it is written as

$$ \boldsymbol{R}\left({\boldsymbol{x}}_i,{\boldsymbol{x}}_j\right)= \exp \left(-{\displaystyle \sum_{l=1}^{n_v}{\theta}_l}{\left|{\boldsymbol{x}}_{il}-{\boldsymbol{x}}_{jl}\right|}^2\right) $$
(3)

where \( {n}_v \) is the number of variables, and \( {\theta}_l \) is the lth correlation parameter, which ensures a high flexibility of the model and is usually determined by an optimization method.
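To make Eq. (3) concrete, the following minimal Python sketch evaluates the Gaussian correlation between two points and collects the pairwise correlations for a small set of points. The function names (gauss_corr, corr_matrix), the three points, and the θ values are illustrative assumptions only; they are not taken from the DACE or GATK toolboxes.

```python
import math

def gauss_corr(xi, xj, theta):
    """Gaussian correlation of Eq. (3) between two points xi and xj."""
    return math.exp(-sum(t * (a - b) ** 2 for t, a, b in zip(theta, xi, xj)))

def corr_matrix(S, theta):
    """Pairwise correlations between all points in S (a list of points)."""
    return [[gauss_corr(a, b, theta) for b in S] for a in S]

# Hypothetical points in a two-variable space and hypothetical theta values
S = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
theta = [0.5, 0.25]
R = corr_matrix(S, theta)
```

The diagonal entries are always one, and a larger value of a correlation parameter makes the correlation decay faster along the corresponding variable.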

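Suppose there is a set of DOEs \( \boldsymbol{S}={\left[{\boldsymbol{x}}_1,{\boldsymbol{x}}_2,\cdots, {\boldsymbol{x}}_m\right]}^T \) with the responses \( \boldsymbol{Y}={\left[{y}_1,{y}_2,\cdots, {y}_m\right]}^T \); then the unknown parameters \( {\sigma}^2 \) and β can be estimated using the least square method or the maximum likelihood estimation (MLE) as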

$$ \widehat{\boldsymbol{\beta}}={\left({\boldsymbol{F}}^T{\boldsymbol{R}}^{-1}\boldsymbol{F}\right)}^{-1}{\boldsymbol{F}}^T{\boldsymbol{R}}^{-1}\boldsymbol{Y} $$
(4)
$$ {\sigma}^2=\frac{1}{m}{\left(\boldsymbol{Y}-\boldsymbol{F}\widehat{\boldsymbol{\beta}}\right)}^T{\boldsymbol{R}}^{-1}\left(\boldsymbol{Y}-\boldsymbol{F}\widehat{\boldsymbol{\beta}}\right) $$
(5)

where F is the m × n matrix whose ith row is \( \boldsymbol{f}{({\boldsymbol{x}}_i)}^T \), m is the number of DOEs, and R is the correlation matrix, which can be written as

$$ \boldsymbol{R}=\left(\begin{array}{ccc}\hfill R\left({\boldsymbol{x}}_1,{\boldsymbol{x}}_1\right)\hfill & \hfill \dots \hfill & \hfill R\left({\boldsymbol{x}}_1,{\boldsymbol{x}}_m\right)\hfill \\ {}\hfill \vdots \hfill & \hfill \ddots \hfill & \hfill \vdots \hfill \\ {}\hfill R\left({\boldsymbol{x}}_m,{\boldsymbol{x}}_1\right)\hfill & \hfill \cdots \hfill & \hfill R\left({\boldsymbol{x}}_m,{\boldsymbol{x}}_m\right)\hfill \end{array}\right) $$
(6)

However, it can be seen from Eq. (4) and Eq. (5) that the parameters \( {\theta}_l \) must be estimated before \( \widehat{\boldsymbol{\beta}} \) and \( {\sigma}^2 \). Based on the MLE, this reduces to the following optimization problem:

$$ \min :\varPsi \left({\theta}_l\right)=\frac{1}{2}\left(m \ln {\sigma}^2+ \ln \left|\boldsymbol{R}\right|\right),\kern1em l=1,2,\cdots, {n}_v $$
(7)

In the DACE, the minimum of Eq. (7) is sought by the pattern search method (PSM), which is a local optimization method; this is the reason why the GA is adopted in this study. The difference between the effects of the GA and the PSM on the minimum of Eq. (7) will be illustrated by the analytical example #1 in the following section.

After the unknown parameters above are calibrated using the DOEs, we can predict the responses at other unknown points. It should be noted that the Kriging can predict the value at any unknown point as well as providing an estimation of the prediction variance, which gives an uncertainty indication of the established Kriging model. The predicted value and the variance at a new point x new are estimated as

$$ \widehat{y}\left({\boldsymbol{x}}_{new}\right)={\boldsymbol{f}}^T\left({\boldsymbol{x}}_{new}\right)\widehat{\boldsymbol{\beta}}+{\boldsymbol{r}}^T\left({\boldsymbol{x}}_{new}\right){\boldsymbol{R}}^{-1}\left(\boldsymbol{Y}-\boldsymbol{F}\widehat{\boldsymbol{\beta}}\right) $$
(8)
$$ {\sigma}_{\widehat{y}}^2\left({\boldsymbol{x}}_{new}\right)={\sigma}^2\left(1+{\boldsymbol{u}}^T{\left({\boldsymbol{F}}^T{\boldsymbol{R}}^{-1}\boldsymbol{F}\right)}^{-1}\boldsymbol{u}-{\boldsymbol{r}}^T\left({\boldsymbol{x}}_{new}\right){\boldsymbol{R}}^{-1}\boldsymbol{r}\left({\boldsymbol{x}}_{new}\right)\right) $$
(9)

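where \( \boldsymbol{r}({\boldsymbol{x}}_{new})={\left[R({\boldsymbol{x}}_{new},{\boldsymbol{x}}_1),R({\boldsymbol{x}}_{new},{\boldsymbol{x}}_2),\cdots, R({\boldsymbol{x}}_{new},{\boldsymbol{x}}_m)\right]}^T \) is the correlation vector between an unknown point \( {\boldsymbol{x}}_{new} \) and all the known DOEs \( ({\boldsymbol{x}}_1,{\boldsymbol{x}}_2,\cdots, {\boldsymbol{x}}_m) \), and \( \boldsymbol{u}={\boldsymbol{F}}^T{\boldsymbol{R}}^{-1}\boldsymbol{r}({\boldsymbol{x}}_{new})-\boldsymbol{f}({\boldsymbol{x}}_{new}) \).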
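As an illustration of Eqs. (4), (5), (8), and (9), the sketch below fits a one-dimensional Kriging model for a fixed correlation parameter and checks the exact-interpolation property at a DOE. It uses the standard Kriging predictor \( {\boldsymbol{f}}^T\widehat{\boldsymbol{\beta}}+{\boldsymbol{r}}^T{\boldsymbol{R}}^{-1}(\boldsymbol{Y}-\boldsymbol{F}\widehat{\boldsymbol{\beta}}) \) and \( \boldsymbol{u}={\boldsymbol{F}}^T{\boldsymbol{R}}^{-1}\boldsymbol{r}-\boldsymbol{f} \). Several choices are assumptions made for illustration: θ is fixed at 1.0 rather than optimized through Eq. (7), the polynomial trend is expanded about the mean of the DOEs, and the helper names (solve, fit_kriging) are hypothetical. The response function is borrowed from Eq. (12).

```python
import math

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x

def fit_kriging(S, Y, theta, order):
    """Fit a 1-D Kriging model for a fixed theta (Eq. (7) is not optimized here)."""
    m, n = len(S), order + 1
    x0 = sum(S) / m                                    # assumed expansion point
    basis = lambda x: [(x - x0) ** k for k in range(n)]
    corr = lambda a, b: math.exp(-theta * (a - b) ** 2)
    R = [[corr(a, b) for b in S] for a in S]
    F = [basis(x) for x in S]
    RiF = [solve(R, [F[i][j] for i in range(m)]) for j in range(n)]  # columns of R^-1 F
    RiY = solve(R, Y)
    A = [[sum(F[i][a] * RiF[b][i] for i in range(m)) for b in range(n)]
         for a in range(n)]                            # F^T R^-1 F
    beta = solve(A, [sum(F[i][a] * RiY[i] for i in range(m)) for a in range(n)])  # Eq. (4)
    resid = [Y[i] - sum(F[i][j] * beta[j] for j in range(n)) for i in range(m)]
    gamma = solve(R, resid)                            # R^-1 (Y - F beta)
    sigma2 = sum(r * g for r, g in zip(resid, gamma)) / m   # Eq. (5)

    def predict(x):
        r = [corr(x, s) for s in S]
        f = basis(x)
        y_hat = (sum(fj * bj for fj, bj in zip(f, beta))
                 + sum(ri * gi for ri, gi in zip(r, gamma)))        # Eq. (8)
        Rir = solve(R, r)
        u = [sum(F[i][a] * Rir[i] for i in range(m)) - f[a] for a in range(n)]
        Au = solve(A, u)
        var = sigma2 * (1 + sum(ui * ai for ui, ai in zip(u, Au))
                        - sum(ri * qi for ri, qi in zip(r, Rir)))   # Eq. (9)
        return y_hat, var

    return predict

g = lambda x: x ** 3 + 2 * x ** 2 - 3       # response function of Eq. (12)
S = [-2.0, -1.0, 0.0, 1.0, 2.0]             # hypothetical DOEs
Y = [g(x) for x in S]
predict = fit_kriging(S, Y, theta=1.0, order=1)
y_at_doe, var_at_doe = predict(S[1])        # prediction at a known DOE
y_new, var_new = predict(0.5)               # prediction at an unseen point
```

At a DOE the predictor returns the observed response and a prediction variance of zero up to round-off, whereas the variance is positive at an unseen point.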

Theory of TK

The main idea of the TK is to integrate the Taylor expansion with the trend function to enhance the nonlinear function approximation capability of the Kriging. Suppose the first part (trend function) in Eq. (1) is denoted as μ(x), which has continuous derivatives up to the (n + 1)th order at a point \( {\boldsymbol{x}}_0 \); then the Taylor expansion of μ(x) at the point \( {\boldsymbol{x}}_0 \) is:

$$ \begin{array}{c}\hfill \mu \left(\boldsymbol{x}\right)=\mu \left({\boldsymbol{x}}_0\right)+\mu^{\prime}\left({\boldsymbol{x}}_0\right)\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)+\frac{\mu^{\prime\prime}\left({\boldsymbol{x}}_0\right)}{2!}{\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)}^2+\cdots \hfill \\ {}\hfill +\frac{\mu^{(n)}\left({\boldsymbol{x}}_0\right)}{n!}{\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)}^n+\frac{\mu^{\left(n+1\right)}\left(\boldsymbol{\xi} \right)}{\left(n+1\right)!}{\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)}^{n+1}\hfill \end{array} $$
(10)

where ξ is a point between x and \( {\boldsymbol{x}}_0 \). Clearly, after truncating the remainder term, Eq. (10) can be abbreviated in matrix form as

$$ \mu \left(\boldsymbol{x}\right)\approx \left[1\kern0.5em \left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)\kern0.5em {\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)}^2\kern0.5em \begin{array}{cc}\hfill \cdots \hfill & \hfill {\left(\boldsymbol{x}-{\boldsymbol{x}}_0\right)}^n\hfill \end{array}\right]\left[\begin{array}{c}\hfill \mu \left({\boldsymbol{x}}_0\right)\hfill \\ {}\hfill \mu^{\prime}\left({\boldsymbol{x}}_0\right)\hfill \\ {}\hfill \frac{\mu^{\prime\prime}\left({\boldsymbol{x}}_0\right)}{2!}\hfill \\ {}\hfill \begin{array}{c}\hfill \vdots \hfill \\ {}\hfill \frac{\mu^{(n)}\left({\boldsymbol{x}}_0\right)}{n!}\hfill \end{array}\hfill \end{array}\right]=\boldsymbol{f}{\left(\boldsymbol{x}\right)}^T\boldsymbol{\beta} $$
(11)

which is similar in form to the trend part in Eq. (1). Hence, the major difference between the TK and the OK or UK is the selection of the basis function, and this does not affect the form of the analytical solution for the other parameters (i.e., \( \widehat{\boldsymbol{\beta}} \) and \( {\sigma}^2 \)). With this in mind, a unified Kriging model can be built by choosing a suitable Taylor expansion order and Taylor expansion point. The highest order in this study is set to the fourth, considering the over-fitting problem and the computational complexity, following Liu (2009). In addition, the fourth order TK model can deal with quite a few slope stability problems, as can be concluded from former works (Li et al. 2015; Li et al. 2016; Li and Chu 2015; Zhang et al. 2011b) in which the quadratic response surface or the second to fourth order Hermite polynomial chaos expansion functions well. However, it is worth noting that, for highly nonlinear problems, a proper order should be identified by more rigorous methods, such as the Bayesian model class selection method (Cao and Wang 2012; Wang et al. 2016), instead of truncating directly at the fourth order.
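The basis vector f(x) implied by Eq. (11) can be sketched as follows. How the powers of \( (\boldsymbol{x}-{\boldsymbol{x}}_0) \) are formed for several variables is not spelled out here, so this sketch assumes the full multivariate Taylor expansion, i.e., all monomials of the shifted variables (cross terms included) up to the chosen order. The function name taylor_basis and the numerical values are hypothetical and do not reproduce the regTaylor routine of the GATK.

```python
from itertools import combinations_with_replacement
from math import prod

def taylor_basis(x, x0, n_order):
    """Monomials of (x - x0) up to total degree n_order, cross terms included."""
    d = [a - b for a, b in zip(x, x0)]
    row = []
    for k in range(n_order + 1):
        # every multiset of k variable indices defines one monomial of degree k
        for idx in combinations_with_replacement(range(len(d)), k):
            row.append(prod(d[i] for i in idx))
    return row

# Two variables, expansion point (1, 1), second order:
# terms are 1, dx1, dx2, dx1^2, dx1*dx2, dx2^2
row = taylor_basis([2.0, 3.0], [1.0, 1.0], 2)
```

Stacking such rows for all the DOEs gives the matrix F used in Eq. (4) and Eq. (5). Under this assumption, two variables yield 15 basis terms at the fourth order, which must stay below the number of DOEs for the regression to be determined.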

GATK surrogate model

Genetic algorithm

GA is a stochastic global search method that mimics natural biological evolution (Homayouni et al. 2014). The underlying principle is natural selection, by which individuals in nature become fitter through continuous breeding. Specifically, in the GA, a group of potential solutions is randomly created and coded as chromosomes, and each component of a solution is considered a gene in the corresponding chromosome. As in natural selection, the fittest individuals (solutions), which are identified by evaluating a fitness function (i.e., the objective function of the optimization problem), get more opportunities to breed the next generation. The selected individuals can cross over with each other and mutate, which increases the diversity of the next generation. This process leads to the evolution of solutions that are better suited to the studied problem than the initially created ones. Iterations stop when the predefined maximum breeding generation is reached. Hence, five parameters affect the efficiency and accuracy of the GA: the number of the initial population (NIP), the maximum breeding generation (MBD), the generation gap (GGAP), the crossover probability (CP), and the mutation probability (MP). In this study, NIP = 50, MBD = 100, GGAP = 0.95, CP = 0.7, and MP = 0.01. These values were determined by trial and error, balancing accuracy against computational effort. To enhance the GA, the evolution reversal, multiple populations, and the elitist strategy are suggested. With its global optimization capability, the GA replaces the pattern search in the DACE for finding the optimal correlation parameters \( {\theta}_l \) in Eq. (7).
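The procedure described above can be sketched as a small real-coded GA. This is only an illustrative sketch under stated assumptions, not the implementation embedded in the GATK: binary tournament selection, arithmetic crossover, uniform-reset mutation, and elitist reinsertion are choices made here for brevity, and the function name ga_minimize is hypothetical. The defaults follow the parameter values quoted in the text, and the multimodal test function merely stands in for the objective of Eq. (7).

```python
import math
import random

def ga_minimize(obj, lob, upb, nip=50, mbd=100, ggap=0.95, cp=0.7, mp=0.01, seed=0):
    """Minimize obj over the box [lob, upb]; returns (best, best-so-far history)."""
    rng = random.Random(seed)
    dim = len(lob)
    pop = [[rng.uniform(lob[d], upb[d]) for d in range(dim)] for _ in range(nip)]
    n_off = min(nip - 1, int(round(ggap * nip)))   # offspring count; keep >= 1 elite
    history = []
    for _ in range(mbd):
        pop.sort(key=obj)                          # best individual first
        history.append(obj(pop[0]))
        # selection: binary tournament (an assumption of this sketch)
        parents = [min(rng.sample(pop, 2), key=obj) for _ in range(n_off)]
        offspring = []
        for i in range(0, n_off, 2):
            a, b = parents[i][:], parents[(i + 1) % n_off][:]
            if rng.random() < cp:                  # arithmetic crossover
                w = rng.random()
                a, b = ([w * p + (1 - w) * q for p, q in zip(a, b)],
                        [w * q + (1 - w) * p for p, q in zip(a, b)])
            offspring += [a, b]
        offspring = offspring[:n_off]
        for child in offspring:                    # mutation: uniform reset of a gene
            for d in range(dim):
                if rng.random() < mp:
                    child[d] = rng.uniform(lob[d], upb[d])
        pop = pop[:nip - n_off] + offspring        # elitist reinsertion
    best = min(pop, key=obj)
    history.append(obj(best))
    return best, history

# Stand-in multimodal objective with several local minima (not Eq. (7) itself)
psi = lambda t: t[0] ** 2 + 10.0 * (1.0 - math.cos(2.0 * math.pi * t[0]))
best, history = ga_minimize(psi, lob=[-5.0], upb=[5.0])
```

Because the best individuals are carried over unchanged, the best objective value never increases from one generation to the next. Unlike a single-point pattern search, the population samples the whole search interval from the start, so no initial guess is required.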

GATK model

The GATK model is an adaptation of the Kriging toolbox DACE (Lophaven et al. 2002), in which the function [dmodel, perf] = dacefit(S, Y, regr, corr, theta0, lob, upb) builds the basic Kriging model with an embedded pattern search method for determining \( {\theta}_l \) in Eq. (7). In the DACE, the outputs dmodel and perf provide information on the parameters needed in the prediction model and on the optimization of the objective function in Eq. (7), respectively. The inputs S and Y are the same as those defined in the second section of this study; regr, corr, and theta0 denote the basis function, the correlation function, and the initial \( {\theta}_l \) adopted in the model, respectively, and lob and upb are the lower and upper bounds of \( {\theta}_l \), respectively. A new function [dmodel, perf] = dacetkgafit(S, Y, @regTaylor, corr, n_order) is developed to establish the proposed GATK model. Its interface follows that of dacefit, except that two arguments are specific to the GATK. Here, @regTaylor is a new sub-function which returns the Taylor expansion basis function values (i.e., F in Eq. (4) and Eq. (5)) at the DOEs S, in the form [F] = regTaylor(S, n_order), where n_order is the selected Taylor expansion order. As described in the section “Theory of TK,” n_order can range from one to four. As for the GA, it is embedded in the GATK, and no prior \( {\theta}_l \) is needed. With the GA, the disadvantages of the pattern search, namely its single-point search and its heavy dependency on the initial choice of \( {\theta}_l \), can be overcome and the optimal correlation parameters can be found, which will be verified by an analytical example in the following section. Similar to the DACE, the GATK can also be easily adopted by other researchers and engineers.

Analytical validation of GATK—example #1

An explicit one-dimensional analytical example is firstly adopted to verify the proposed GATK model. The limit state function is defined as

$$ g(x)={x}^3+2{x}^2-3 $$
(12)

where x is assumed to follow the standard normal distribution. To verify the proposed GATK model, its capability for finding the minimum of Eq. (7) is first compared with that of DACE. The variation of the minimum \( \varPsi ({\theta}_l) \) is observed for different numbers of DOEs, which are used to calibrate the unknown parameters \( \widehat{\boldsymbol{\beta}} \) and \( {\sigma}^2 \). As can be seen in Fig. 1, the GATK generally converges to a smaller \( \varPsi ({\theta}_l) \) than DACE, which may lead to different estimates of the θ values. The difference in \( \varPsi ({\theta}_l) \) between GATK and DACE becomes negligible when the number of DOEs reaches about 20, because the Kriging model is highly accurate with such a large number of DOEs (compared with the number of random variables here). The difference between the two models is not very evident (0.71468 vs. 0.7149), but GATK still gives a smaller value than DACE, which means that \( \varPsi ({\theta}_l) \) depends less on θ when the number of DOEs is very small (say 1, when the logarithm of the determinant of R is 0). Hence, from the perspective of robustness, GATK is better than DACE in general. Therefore, the following analysis is based on GATK.

Fig. 1 Variation of \( \varPsi ({\theta}_l) \) and \( {\theta}_l \) with number of DOEs between GATK and DACE

To validate the accuracy of the proposed TK method, a TK model is initially established with 5 DOEs. Then, it is used to predict a set of samples randomly produced by LHS, and the results predicted by the TK, UK, and OK models are compared with the actual results calculated from Eq. (12), as shown in Fig. 2. Note that the UK1 and UK2 models denote the linear and quadratic regression models in DACE, respectively, while the TK1, TK2, TK3, and TK4 models denote the first, second, third, and fourth order TK models in GATK. Predictions from the TK3 and TK4 models agree quite well with the actual y values, while the OK model performs worst. Interestingly, the scatters of the UK1 and TK1 models coincide with each other, as do those of the UK2 and TK2 models. This is simply because the Taylor expansion point in this study is selected at the mean value. Hence, the UK model can be incorporated in the TK model. It is also found that when the number of DOEs increases to 10, all the Kriging models are highly accurate (see Fig. 3), which is not surprising because Kriging is an exact interpolation method. This also indicates that the OK and UK can produce accurate results only at the expense of extra evaluation time; hence, it can be concluded that the efficiency of the TK is better in general.
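For reference, a plain Latin hypercube sampler for a standard normal variable can be sketched as follows. The LHS implementation used in this study is not described, so this is only an assumed, textbook-style version: the unit interval is split into equal-probability strata, one point is drawn in each, and the points are mapped through the inverse normal CDF. The function name lhs_standard_normal is hypothetical; g is the limit state function of Eq. (12).

```python
import random
from statistics import NormalDist

def g(x):
    """Limit state function of Eq. (12)."""
    return x ** 3 + 2 * x ** 2 - 3

def lhs_standard_normal(n, seed=0):
    """Draw n Latin hypercube samples of a standard normal variable."""
    rng = random.Random(seed)
    # one uniform draw inside each of the n equal-probability strata
    u = [(i + rng.random()) / n for i in range(n)]
    rng.shuffle(u)
    eps = 1e-12  # guard against probabilities of exactly 0 or 1
    return [NormalDist().inv_cdf(min(max(p, eps), 1 - eps)) for p in u]

doe = lhs_standard_normal(5)          # five DOEs, as in the LHS = 5 case
responses = [g(x) for x in doe]       # training responses for the Kriging model
```

Each stratum of the distribution contributes exactly one sample, which is why a handful of LHS points already covers the range of x more evenly than the same number of purely random draws.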

Fig. 2 Comparison between predicted and actual values under different Kriging models (LHS = 5). a The first order TK. b The second order TK. c The third order TK. d The fourth order TK

Fig. 3 Comparison between predicted and actual values under different Kriging models (LHS = 10). a The first order TK. b The second order TK. c The third order TK. d The fourth order TK

Analytical validation of GATK—example #2

In this section, a more complicated example is presented to gain more insight into the proposed TK model. The Himmelblau function with a minor adaptation is used for this purpose (Liu 2009), and is given as

$$ g\left(x,y\right)={\left({x}^2+y-11\right)}^2+{\left(x+{y}^2-7\right)}^2+0.1\left({\left(x-3\right)}^2+{\left(x-2\right)}^2\right)-5 $$
(13)

where x and y are independent standard normal random variables. The contour plot of the performance function (i.e., Eq. (13)) is shown in Fig. 4 with a global minimum and several local minima. It is also postulated that the failure event happens when g(x, y) is less than or equal to zero.
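A direct MCS estimate of the failure probability for Eq. (13), of the kind used as the baseline, can be sketched as below. The sample size, the seed, and the function names are assumptions for illustration; no particular numerical result is implied.

```python
import random

def g(x, y):
    """Performance function of Eq. (13), coded as printed."""
    return ((x ** 2 + y - 11) ** 2 + (x + y ** 2 - 7) ** 2
            + 0.1 * ((x - 3) ** 2 + (x - 2) ** 2) - 5)

def mcs_pf(n, seed=0):
    """Fraction of n standard normal samples (x, y) for which g(x, y) <= 0."""
    rng = random.Random(seed)
    fails = 0
    for _ in range(n):
        x, y = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        if g(x, y) <= 0:
            fails += 1
    return fails / n

pf = mcs_pf(100000)   # hypothetical sample size
```

Since the function is large and positive at the origin (about 166.3) and only becomes negative near its minima, which lie a few standard deviations away from the mean point, failure is a rare event and a large sample is needed for a stable estimate.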

Fig. 4 Contour plot of Eq. (13)

Figure 5 shows the variation of the probability of failure with the number of DOEs. It is clear that the fourth order TK model converges quickly to the baseline (i.e., the probability of failure evaluated from the explicit performance function by MCS), while the others need more DOEs to achieve a comparable accuracy. Also, the results from the TK1 and TK2 models are consistent with those from the UK1 and UK2 models. Again, this indicates that the TK model has good potential to cope efficiently with more complicated problems.

Fig. 5 Variation of probability of failure with the number of DOEs

System reliability analysis using GATK surrogate model

The functional state of a slope is often characterized by a limit state function G(X), which is expressed as

$$ G\left(\boldsymbol{X}\right)=F{S}_{\min}\left(\boldsymbol{X}\right)-1 $$
(14)

where \( F{S}_{\min}(\boldsymbol{X}) \) denotes the minimum factor of safety of the slope for a vector of input variables X. Slope failure happens when the value of G(X) is less than or equal to zero (i.e., \( F{S}_{\min}(\boldsymbol{X})\le 1 \)), and the failure probability \( {P}_f \) is expressed as

$$ {P}_f={\displaystyle \int \cdots {\displaystyle \iint I\left[F{S}_{\min}\left(\boldsymbol{X}\right)\le 1\right]}}f\left(\boldsymbol{X}\right)\mathrm{d}\boldsymbol{X} $$
(15)

where f(X) is the joint probability density function (PDF) of X, and I[·] is an indicator function which is equal to one when \( F{S}_{\min}(\boldsymbol{X})\le 1 \) and zero otherwise. In view of the law of large numbers, the MCS can be used to estimate the failure probability as

$$ {P}_f=\frac{1}{N}{\displaystyle \sum_{i=1}^NI\left[F{S}_{\min}\left({\boldsymbol{X}}_i\right)\le 1\right]} $$
(16)

where N is the number of MCS samples and \( {\boldsymbol{X}}_i \) is the ith sample. The estimation accuracy of \( {P}_f \) is highly dependent on the number of samples, and it is assessed by the coefficient of variation of \( {P}_f \) as

$$ CO{V}_{P_f}=\sqrt{\frac{1-{P}_f}{N{P}_f}} $$
(17)
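As a quick numerical reading of Eq. (17), the following sketch evaluates the coefficient of variation for a given failure probability and sample size, and inverts the formula to obtain the sample size needed for a target coefficient of variation. The function names are hypothetical, and the estimate assumes independent random samples.

```python
import math

def cov_pf(pf, n):
    """Coefficient of variation of the MCS estimate, Eq. (17)."""
    return math.sqrt((1.0 - pf) / (n * pf))

def samples_for_cov(pf, target_cov):
    """Smallest N for which Eq. (17) gives a value not exceeding target_cov."""
    return math.ceil((1.0 - pf) / (pf * target_cov ** 2))

cov_example = cov_pf(0.048, 100000)       # about 0.014
n_required = samples_for_cov(0.01, 0.1)   # about 9900 samples
```

For instance, a failure probability of 0.048 estimated from 100,000 samples has a coefficient of variation of roughly 1.4 %, while reaching 10 % for a failure probability of 0.01 already takes on the order of \( {10}^4 \) evaluations of the stability model.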

Generally, to reach a reasonable accuracy of the MCS, the value of N could be extremely large, say on the order of \( {10}^4 \) or larger. However, directly running the deterministic stability model (particularly an FEM model) for such a large number of samples would be prohibitively time-consuming. Hence, the proposed TK surrogate model will replace the original deterministic stability model in the following analysis, which is expected to significantly reduce the computation time. A flow chart for the following system slope reliability analysis is suggested in Fig. 6 for reference.

Fig. 6 Flow chart for system reliability analysis of slope

Illustrative examples

In this section, the proposed TK model is illustrated by a homogeneous and a heterogeneous soil slope. The reliability analysis results are also compared with those from other methods in the literature to further verify the accuracy of the proposed TK surrogate model.

A homogeneous c-φ slope

In this example, a single-layered c-φ slope shown in Fig. 7 is studied, which has also been analyzed by Jiang et al. (2015), Cho (2010), and Li et al. (2016). The slope has a height of 10 m, a slope angle of 45°, and a total unit weight of 20 kN/m³. The statistical properties of the soil strength parameters are given in Table 1. Based on the mean values of the soil parameters, the FS of this slope is evaluated as 1.206 using Bishop’s simplified method, which is consistent with the values of 1.206, 1.204, and 1.206 reported by Jiang et al. (2015), Cho (2010), and Li et al. (2016), respectively. The critical slip surface with the minimum FS and a total of 6069 potential slip surfaces are schematically shown in Fig. 7.

Fig. 7 The geometry of the c-φ slope with 6069 potential slip surfaces

Table 1 Statistics of soil strength parameters

As the number of DOEs is of great importance for calibrating the Kriging model, a sensitivity analysis is carried out to identify the optimal number. The accuracy of the Kriging model is measured by the coefficient of determination (\( {R}^2 \)) based on the predicted and actual FS values of 100 randomly generated testing samples. The optimal number of DOEs is reached when \( {R}^2 \) is relatively large (e.g., larger than 0.95) and relatively invariant with the number of DOEs. Figure 8 shows the variation of \( {R}^2 \) with the number of DOEs. It is clear that the higher order TK models converge very quickly to the optimal number of DOEs, while the OK model seems to be unstable as the number of DOEs increases. Specifically, both the TK3 and TK4 models begin to level off at about 200 DOEs with \( {R}^2 \) equal to 0.9999 and 1.000, respectively, which is 100 DOEs earlier than the TK2 and UK2 models, for which \( {R}^2 \) is equal to 0.9980 in both cases. This also indicates that the TK3 and TK4 models are better than the TK2 (UK2) model at a comparable accuracy. Although the TK1 and UK1 models seem to be invariant with the number of DOEs when the number is greater than 350, they are less accurate than the TK3 and TK4 models. The optimal number for each model is schematically shown in Fig. 9. Hence, it can be concluded that the TK3 and TK4 models perform better than the OK and UK models in capturing the nonlinear properties of this slope stability problem.
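The accuracy measure used in this sensitivity analysis can be computed as in the sketch below. The exact formula is not stated in the text, so the usual definition of the coefficient of determination (one minus the ratio of the residual sum of squares to the total sum of squares) is assumed; the function name and the FS values in the example are hypothetical.

```python
def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted FS values."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in actual)                  # total sum of squares
    return 1.0 - ss_res / ss_tot

# Hypothetical FS values of three testing samples and their surrogate predictions
fs_actual = [1.10, 1.25, 0.95]
fs_predicted = [1.12, 1.24, 0.96]
score = r_squared(fs_actual, fs_predicted)
```

A value of one means that the surrogate reproduces the testing samples exactly, while a value of zero means that it does no better than predicting their mean.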

Fig. 8 Variation of coefficient of determination with the number of DOEs

Fig. 9 Histogram of the optimal number of DOEs

Based on the optimal number of DOEs, different Kriging models are established and used as replacements of the original deterministic stability model to perform the MCS for slope reliability analysis. For the convenience of comparison, a thousand LHS samples are first evaluated directly on the original deterministic stability model, and the probability of failure is estimated as 0.048, which is considered as the baseline of this example. Then, a hundred thousand samples generated by the MCS, shown in Fig. 10, are evaluated on the established Kriging models to calculate the probability of failure. Figure 11 shows the variation of the probability of failure with respect to the number of DOEs for different Kriging models. As in Fig. 8, the results estimated by the TK3 and TK4 models (0.0477 and 0.0478, respectively) are very close to the baseline (i.e., 0.0480). The TK2 and UK2 models can also produce comparable results (both 0.0477), but at the expense of about 150 more DOEs, that is, 150 additional evaluations of the original deterministic stability model. This means more computational effort is required. Clearly, the TK1 and UK1 models overestimate the probability of failure by about 5%. In addition, it is worth noting that the probability of failure from the OK model is still very sensitive to the number of DOEs, as stated above, and the corresponding result is 23.75% below the baseline. Table 2 summarizes the probabilities of failure estimated by the proposed TK models, with results from the literature included for comparison. The results obtained by the different methods are comparable with each other, which supports the feasibility of the proposed TK models.

Fig. 10 Correlated lognormal samples generated by MCS

Fig. 11 Variation of probability of failure with the number of DOEs

Table 2 Results of the probability of failure by different methods

A heterogeneous two-layered soil slope

This example is a two-layered soil slope with the cross-section given in Fig. 12, which was originally studied by Hassan and Wolff (1999). The upper layer is clay and the lower layer is a c-φ soil. The strength parameters are assumed to be mutually independent normal random variables, and their statistical properties are given in Table 3. There is no water table or external water in the slope, and the unit weight of the soil is taken as 19 kN/m³. Using the mean values of these strength parameters, the minimum FS of this slope is calculated as 1.621 according to Spencer’s non-circular slip surface method. This value is slightly less than the 1.650 reported by Yi et al. (2015) and the 1.634 reported by Xu and Low (2006).

Fig. 12 Cross-section of a two-layered soil slope

Table 3 Statistics of soil strength parameters

The Kriging models should be established before performing the MCS. As for any surrogate model, the number of DOEs is critical for establishing a suitable Kriging model. In this study, the optimal number of DOEs is identified by balancing model accuracy against efficiency, based on a sensitivity analysis with respect to the number of DOEs, as shown in Fig. 13. Similar to the first example, \( {R}^2 \), calculated from the predicted and actual FSs of 100 randomly generated testing samples, is selected to measure the accuracy of the Kriging model. As can be seen from Fig. 13, the \( {R}^2 \) of the OK model fluctuates significantly with the number of DOEs and tends to converge to a lower value than those of the UK and TK models. This indicates that the constant trend function in the ordinary Kriging cannot capture the nonlinear properties of this slope stability problem well. Again, the TK2 and UK2 models coincide with each other, as do the TK1 and UK1 models, which is expected as the Taylor expansion point is selected at the mean point. However, the TK1 (UK1) model seems to be less accurate than the TK2 (UK2) model. Additionally, the TK3 model appears to be the most accurate for this example and is as efficient as the TK2 (UK2) model (both converge at nearly DOEs = 220), while the TK4 model is also capable of achieving comparable accuracy, but at the expense of nearly 20 more DOEs (i.e., approximately 240) than the TK3 model. The optimal number of DOEs for each Kriging model is shown in Fig. 14. Overall, it can be concluded that the TK model is more efficient and accurate.

Fig. 13 Variation of coefficient of determination with the number of DOEs

Fig. 14 Histogram of the optimal number of DOEs

Estimating the failure probability of this slope by a direct MCS on the original deterministic stability model would be prohibitively expensive. Therefore, the Kriging-based MCS is used as a surrogate to estimate the failure probability efficiently and with reasonable accuracy. For comparison, one thousand random samples produced by LHS are first evaluated on the original deterministic slope stability model to obtain the probability of failure, which is estimated as 0.0100 and is taken as the baseline for this example. According to the literature (Hassan and Wolff 1999; Xu and Low 2006; Yi et al. 2015), an MCS with one hundred thousand random samples, shown in Fig. 15, is sufficient to estimate the failure probability accurately for this example. Similar to Fig. 13, the variation of the probability of failure with the number of DOEs is shown in Fig. 16. The probability of failure estimated from the OK model is clearly less accurate than those from the UK and TK models because of the highly nonlinear behavior of the problem, while the high-order TK models differ only marginally from one another.
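The sample-based estimate discussed above is simply the fraction of samples whose FS falls below 1, and the sample size controls its sampling error. The sketch below assumes the standard result for independent samples, COV ≈ sqrt((1 − Pf)/(N·Pf)); LHS samples are not independent, so the value for the 1,000-sample baseline is only indicative. The function names and the toy FS values are hypothetical.

```python
import math


def failure_probability(fs_values, threshold=1.0):
    """Fraction of samples with a factor of safety below the threshold."""
    if not fs_values:
        raise ValueError("fs_values must not be empty")
    n_failed = sum(1 for fs in fs_values if fs < threshold)
    return n_failed / len(fs_values)


def estimator_cov(pf, n_samples):
    """COV of the crude MCS estimator of pf, assuming independent samples."""
    if not 0.0 < pf < 1.0:
        raise ValueError("pf must lie strictly between 0 and 1")
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    return math.sqrt((1.0 - pf) / (n_samples * pf))


# Toy check of the counting estimator (illustrative FS values only).
toy_fs = [1.4, 0.9, 1.2, 1.1, 0.95, 1.6, 1.3, 1.8, 1.05, 1.25]
pf_toy = failure_probability(toy_fs)

# Indicative sampling error at Pf = 0.01 for the two sample sizes in the text.
cov_1k = estimator_cov(0.01, 1_000)
cov_100k = estimator_cov(0.01, 100_000)
print(f"toy Pf = {pf_toy:.2f}")
print(f"COV with 1,000 samples:   {cov_1k:.3f}")
print(f"COV with 100,000 samples: {cov_100k:.3f}")
```

Under the independence assumption, the estimator COV at Pf = 0.01 is roughly 31% with 1,000 samples and roughly 3% with 100,000 samples, which is one way to see why the larger sample size is used for the Kriging-based MCS.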

Fig. 15 Independent normal samples generated by MCS

Fig. 16 Variation of probability of failure with the number of DOEs

To gain more insight into the proposed TK model, reliability results obtained by other methods in the literature are listed in Table 4 for comparison. The probabilities of failure estimated from the different Kriging models at their corresponding optimal numbers of DOEs are also shown in this table. In general, there are minor differences among the methods because both the deterministic and the stochastic slope stability models differ. However, the results from Yi et al. (2015) agree well with the baseline, as do the results from the proposed TK models. Additionally, the result obtained from the TK3 model (0.0103) is the closest to the baseline (0.0100) and appears to be better than the other published results. On the other hand, as a conservative estimate, the result from the TK4 model is closer to the results obtained by Xu and Low (2006), with less variation. This further indicates the effectiveness and accuracy of the proposed TK model.

Table 4 Results of probability of failure by different methods

Discussions

The reliability results above have verified the effectiveness of the proposed GATK model, and its efficiency has been analyzed qualitatively. In this section, the computational cost of the proposed method is quantified to illustrate its efficiency relative to the direct MCS. The work presented in this study was completed on a desktop computer with an Intel Core i7-4790K 4.00 GHz processor and 16 GB of RAM, and several observations on the computational expense of the different models are discussed below.

Figure 17a, b shows stacked columns of the time components of the Kriging-based MCS with 100,000 samples for the homogeneous c-φ slope and the heterogeneous two-layered soil slope, respectively. The figures show that the computation time required by the Kriging-based MCS consists of four parts: model training, in which the DOEs are prepared; model calibration, which includes the GA optimization for determining the unknown coefficients; model validation based on the testing samples; and the MCS for estimating the failure probability. Compared with the time for model training and validation, the time for model calibration and the MCS is negligible, as both can be finished within 1 min (see Fig. 17). This indicates that the efficiency of the proposed method depends mainly on model training and validation. Even so, the total computational time required by each Kriging-based MCS is less than 8000 s (about 2.2 h) for the homogeneous c-φ slope example, whereas a direct MCS takes about 416.7 h, or 17.4 days, which is hardly acceptable to engineers. Similarly, for the heterogeneous two-layered soil slope, the total computational time of a Kriging-based MCS is about 5000 s (about 1.4 h), which is significantly less than the 277.8 h, or 11.6 days, required by the direct MCS. These results indicate the high efficiency of the proposed method in comparison with the direct MCS.
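The time figures quoted above can be cross-checked with simple arithmetic. The sketch below only converts the reported totals; the per-evaluation times it derives are implied values obtained by dividing the reported direct-MCS hours by 100,000 samples, not measurements reported in this study, and the variable names are hypothetical.

```python
N_SAMPLES = 100_000
SECONDS_PER_HOUR = 3600.0
HOURS_PER_DAY = 24.0

# Reported totals: (direct MCS in hours, Kriging-based MCS in seconds).
# The 8000 s figure is an upper bound ("less than 8000 s") in the text.
reported = {
    "homogeneous c-phi slope": (416.7, 8000.0),
    "two-layered soil slope": (277.8, 5000.0),
}

summary = {}
for name, (direct_hours, kriging_seconds) in reported.items():
    direct_seconds = direct_hours * SECONDS_PER_HOUR
    summary[name] = {
        "direct_days": direct_hours / HOURS_PER_DAY,
        "kriging_hours": kriging_seconds / SECONDS_PER_HOUR,
        # Implied cost of one deterministic stability analysis (derived).
        "implied_seconds_per_evaluation": direct_seconds / N_SAMPLES,
        # Ratio of direct to Kriging-based wall-clock time.
        "speedup": direct_seconds / kriging_seconds,
    }

for name, row in summary.items():
    print(name, {key: round(value, 2) for key, value in row.items()})
```

Read this way, the reported totals correspond to roughly 15 s and 10 s per deterministic evaluation for the two examples, and to a speed-up of the order of 200 in both cases (a lower bound for the homogeneous slope, since its Kriging-based time is quoted as less than 8000 s).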

Fig. 17 Stack column of the time sources for the Kriging-based MCS with 100,000 samples. a The homogeneous c-φ slope. b The heterogeneous two-layered soil slope

Furthermore, Fig. 18a compares the computational time of the Kriging-based MCS and the direct MCS for the homogeneous c-φ slope as the number of MCS samples changes. The computation time of the direct MCS increases sharply with the number of MCS samples, whereas the time spent by the Kriging-based MCS is almost invariant. This is because the MCS is performed on the established Kriging model and can be completed within 1 min, as illustrated in Fig. 17. This highlights the efficiency advantage of the proposed method when a highly accurate MCS estimate is required. Similar results are observed for the heterogeneous two-layered soil slope, as illustrated in Fig. 18b.
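The scaling behavior described above can be illustrated with a simple cost model. This is a sketch under stated assumptions rather than the timing model used in this study: the direct MCS is assumed to cost one deterministic analysis per sample, while the surrogate-based MCS is assumed to pay a fixed setup cost plus a very small cost per prediction. All numerical inputs and function names are hypothetical.

```python
def direct_mcs_seconds(n_samples, seconds_per_evaluation):
    """Direct MCS: every sample needs one deterministic stability analysis."""
    return n_samples * seconds_per_evaluation


def surrogate_mcs_seconds(n_samples, n_model_runs, seconds_per_evaluation,
                          seconds_per_prediction, fixed_overhead_seconds):
    """Surrogate MCS: deterministic runs are needed only to build and test the model."""
    setup = n_model_runs * seconds_per_evaluation + fixed_overhead_seconds
    return setup + n_samples * seconds_per_prediction


# Hypothetical inputs chosen only to show the trend (not values from the study).
SECONDS_PER_EVALUATION = 10.0   # one deterministic stability analysis
SECONDS_PER_PREDICTION = 1e-4   # one surrogate prediction
N_MODEL_RUNS = 320              # deterministic runs for DOEs plus testing samples
FIXED_OVERHEAD_SECONDS = 60.0   # calibration of the surrogate

timings = []
for n in (1_000, 10_000, 100_000):
    direct = direct_mcs_seconds(n, SECONDS_PER_EVALUATION)
    surrogate = surrogate_mcs_seconds(
        n, N_MODEL_RUNS, SECONDS_PER_EVALUATION,
        SECONDS_PER_PREDICTION, FIXED_OVERHEAD_SECONDS,
    )
    timings.append((n, direct, surrogate))

for n, direct, surrogate in timings:
    print(f"N = {n:>7}: direct = {direct:>10.0f} s, surrogate = {surrogate:>7.1f} s")
```

With these assumed inputs, the direct cost grows in proportion to the number of samples, while the surrogate cost is dominated by the fixed setup term and changes by well under 1% across the range, which mirrors the flat curve described for the Kriging-based MCS.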

Fig. 18 Comparison of the computational time between the Kriging-based MCS and the direct MCS. a The homogeneous c-φ slope. b The heterogeneous two-layered soil slope

Finally, for the different Kriging-based MCS methods, Figs. 17 and 18 both show that the TK models are more efficient than the UK and OK models despite the introduction of the GA optimization. Specifically, for the homogeneous c-φ slope, the TK3 and TK4 models run much faster than the OK and UK models and give more accurate results (see Table 2), reducing the computation time by about half an hour. For the heterogeneous two-layered soil slope, the TK models (TK2, TK3, and TK4) take nearly 0.6 h less than the OK and UK1 models while maintaining the accuracy of the results (see Table 4).

Conclusions

A unified Kriging surrogate model based on the Taylor expansion, referred to as the TK model, has been proposed in this paper for the system reliability analysis of soil slopes. The proposed TK model is implemented in the GATK toolbox, which is adapted from the commonly used DACE toolbox. Unlike DACE, GATK uses the GA, a global optimization method, to search for the optimal correlation parameters. The effectiveness of the GA is verified by an analytical example, in which the GA clearly outperforms the pattern search approach used in DACE. Two analytical examples are then evaluated to demonstrate the feasibility of the proposed GATK toolbox. Finally, the proposed model is applied to a homogeneous c-φ slope and a heterogeneous two-layered soil slope to evaluate their system reliability, and the results are compared with those obtained by the OK and UK models and by other methods from the literature. The main conclusions of this study are as follows:

Firstly, the UK model can be incorporated into the TK model if the Taylor expansion point is selected at the mean point. Secondly, the OK model is shown to be unable to capture the nonlinearity present in the slope stability model, in contrast to the high-order TK models. Thirdly, the UK model can obtain comparable results, but at the expense of more evaluation time on the original deterministic stability model. Fourthly, the TK model appears to show higher accuracy and efficiency, particularly when the slope stability problem is highly nonlinear. Finally, the present work has been developed with a view to balancing computational effort against accuracy, so that routine reliability analysis becomes feasible for most engineering design work.