
1 Introduction

We focus on rare-event simulation for addressing reliability problems corresponding to dynamic systems. To compute the rare-event (failure) probability for a dynamic system, both input (excitation) and modeling uncertainties should be quantified and propagated. Therefore, a probability model must be chosen to describe the uncertainty in the future input for the system, and then a chosen deterministic or stochastic system model is used, preferably in conjunction with a probability model describing the associated modeling uncertainties, to propagate these uncertainties. These input and system models define a probabilistic description of the system output (response). For example, the problem of interest might be to compute the small failure probability for a highly reliable dynamic system such as a bridge or building under uncertain future earthquake excitation, or for an aircraft under uncertain excitation by turbulence, using a finite-element structural model to approximate the dynamics of the system. This model will usually be subject to both parametric uncertainty (what values of the model parameters best represent the behavior of the system?) and nonparametric modeling uncertainty (what are the effects of the aspects of the system behavior not captured by the dynamic model?). The treatment of input uncertainty has a long history in dynamic reliability theory and random vibrations, now more commonly called stochastic dynamics, but the treatment of modeling uncertainty is more recent.

Usually the dynamic model of the system is represented by a time-dependent BVP (boundary-value problem) involving PDEs (partial differential equations) or by a set of coupled ODEs (ordinary differential equations). Typically the failure event is defined as any one of a set of performance quantities of interest exceeding its specified threshold over some time interval. This is the so-called first-passage problem. This challenging problem is characterized by a lack of analytical solutions, even for the simplest case of a single-degree-of-freedom linear oscillator subject to excitation that is modeled as a Gaussian process. Approximate analytical methods exist, but they are usually limited in scope, and their accuracy is difficult to assess in a given application [43, 51]. Semi-analytical methods from structural reliability theory such as FORM and SORM (first- and second-order reliability methods) [20, 43] cannot be applied directly to the first-passage problem and, in any case, are inapplicable because of the high-dimensional nature of the discrete-time input history [32, 53]. Standard Monte Carlo simulation is generally applicable, but it is computationally very inefficient because of the low failure probabilities. As a consequence, advanced stochastic simulation schemes are needed.

1.1 Mathematical Formulation of Problem

We assume that initially there is a continuous-time deterministic model of the real dynamic system that consists of a state-space model with a finite-dimensional state \(X(t) \in \mathbb{R}^{n}\) at time t, and this is converted to a discrete-time state-space model using a numerical time-stepping method to give

$$\displaystyle{ X(t + 1) = f(X(t),U(t),t),\quad X(t) \in \mathbb{R}^{n},\quad U(t) \in \mathbb{R}^{m},\quad t = 0,\ldots,T }$$
(30.1)

where \(U(t) \in \mathbb{R}^{m}\) is the input at discrete time t.
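As a concrete illustration of the form (30.1), consider a linear single-degree-of-freedom oscillator discretized by the forward Euler method. The model and all parameter values below are illustrative assumptions, not taken from any specific application; in practice a higher-order time-stepping scheme would typically be used.

```python
# Illustrative example of (30.1): a linear single-degree-of-freedom
# oscillator  m x'' + c x' + k x = u(t),  discretized by forward Euler
# with step dt, with state X(t) = [displacement, velocity].
# All parameter values are assumptions chosen for illustration only;
# c = 2*sqrt(k*m) = 4 makes the oscillator critically damped.
m, c, k, dt = 1.0, 4.0, 4.0, 0.01

def f(X, u, t):
    """One step of (30.1): X(t+1) = f(X(t), U(t), t)."""
    x, v = X
    a = (u - c * v - k * x) / m          # acceleration from Newton's law
    return [x + dt * v, v + dt * a]      # forward-Euler update

# Propagate from rest under a constant unit input; the response
# settles at the static solution x = u/k = 0.25.
X = [0.0, 0.0]
for t in range(1000):
    X = f(X, 1.0, t)
```

In a realistic setting f would instead wrap a call to a finite-element or other numerical solver, but the interface — current state and input in, next state out — is the same.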

If the original model consists of a BVP with PDEs describing a response u(x, t), where \(x \in \mathbb{R}^{d}\), then we assume that a finite set of basis functions \(\{\phi _{1}(x),\ldots,\phi _{n}(x)\}\) is chosen (e.g., global bases such as Fourier and Hermite polynomials or localized ones such as finite-element interpolation functions) so that the solution is well approximated by

$$\displaystyle{ u(x,t) \approx \sum _{i=1}^{n}X_{ i}(t)\phi _{i}(x) }$$
(30.2)

Then a numerical method is applied to the BVP PDEs to establish time-dependent equations for the vector of coefficients \(X(t) = [X_{1}(t),\ldots,X_{n}(t)]\) so that the standard state-space equation in (30.1) still applies. For example, for a finite-element model of a structural system, \(\{\phi _{1}(x),\ldots,\phi _{n}(x)\}\) would be local interpolation functions over the elements. Then, expressing the BVP in weak form, a weighted residual or Galerkin method could be applied to give a state-space equation for the vector of coefficients X(t) [27].

Suppose that a positive scalar performance function g(X(t)) is a quantity of interest and that the rare-event \(\mathcal{E}\) of concern is that g(X(t)) exceeds a threshold b over some discrete-time interval \(t = 0,\ldots,T\):

$$\displaystyle{ \mathcal{E} = \left \{U = (U(0),\ldots,U(T)):\max \limits _{t=0,\ldots,T}g(X(t))> b\right \} \subset \mathbb{R}^{m\times (T+1)} }$$
(30.3)

where X(t) satisfies (30.1). The performance function g(X(t)) may involve exceedance of multiple performance quantities of interest \(\{g_{k}(X(t)): k = 1,\ldots,K\}\) above their corresponding thresholds \(\{a_{k}\}\). This can be accomplished by aggregating them using the max and min operators in an appropriate combination on the set of \(g_{k}\)'s. For example, for a pure series failure criterion, where the threshold exceedance of any \(a_{k}\) represents failure, one takes the aggregate performance failure criterion as \(g(X(t)) =\max \{g_{k}(X(t))/a_{k}: k = 1,\ldots,K\}> 1\), while for a pure parallel failure criterion, where all of the \(g_{k}\) must exceed their thresholds before failure is considered to have occurred, one takes the aggregate performance failure criterion as \(g(X(t)) =\min \{g_{k}(X(t))/a_{k}: k = 1,\ldots,K\}> 1\).
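The series/parallel aggregation rules can be written out directly in code. The two performance functions and thresholds below are hypothetical, chosen only to illustrate the max/min constructions:

```python
# Aggregating multiple performance quantities into a single scalar g,
# as described above. The two performance functions and thresholds
# here are hypothetical, chosen only to illustrate the max/min rules.
g1 = lambda X: abs(X[0])      # e.g., a displacement-type quantity
g2 = lambda X: abs(X[1])      # e.g., a velocity-type quantity
a1, a2 = 0.02, 1.5            # corresponding thresholds (assumed)

def g_series(X):
    # Series criterion: failure if ANY quantity exceeds its threshold.
    return max(g1(X) / a1, g2(X) / a2)

def g_parallel(X):
    # Parallel criterion: failure only if ALL quantities exceed theirs.
    return min(g1(X) / a1, g2(X) / a2)

X = [0.03, 0.5]               # exceeds a1 but not a2
failed_series = g_series(X) > 1      # True: the series system fails
failed_parallel = g_parallel(X) > 1  # False: the parallel one does not
```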

If the uncertainty in the input time history vector \(U = [U(0),\ldots,U(T)] \in \mathbb{R}^{D}\) (D = m × (T + 1)) is quantified by a probability distribution for U that has a PDF (probability density function) p(u) with respect to Lebesgue integration over \(\mathbb{R}^{D}\), then the rare-event probability is given by

$$\displaystyle{ p_{\mathcal{E}} = \mathbb{P}(U \in \mathcal{E}) =\int _{\mathcal{E}}p(u)du }$$
(30.4)

The PDF p(u) is assumed to be readily sampled. Although direct sampling from a high-dimensional PDF is not possible in most cases, multidimensional Gaussians are an exception because the Gaussian vector can be readily transformed so that the components are independent and the PDF is a product of one-dimensional Gaussian PDFs. In many applications, the discrete-time stochastic input history is modeled by running discrete-time Gaussian white noise through a digital filter to shape its spectrum in the frequency domain and then multiplying the filtered sequence by an envelope function to shape it in the time domain, if it is nonstationary.
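The filtered-and-enveloped white-noise construction just described can be sketched in a few lines. The one-pole filter coefficient and the envelope shape below are illustrative assumptions, not a specific code-prescribed ground-motion model:

```python
import math, random

# A common discrete-time input model, as described above: Gaussian
# white noise shaped by a digital filter (here a simple one-pole
# recursive low-pass filter) and then by a time-domain envelope.
# The filter coefficient and envelope are illustrative assumptions.
random.seed(0)
T = 500
white = [random.gauss(0.0, 1.0) for _ in range(T + 1)]

# One-pole filter y(t) = a*y(t-1) + w(t) shapes the spectrum.
a, y = 0.9, 0.0
filtered = []
for w in white:
    y = a * y + w
    filtered.append(y)

# Envelope e(t) = t*exp(-t/100) shapes the sequence in time,
# making the excitation nonstationary (build-up, then decay).
U = [t * math.exp(-t / 100.0) * filtered[t] for t in range(T + 1)]
```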

The model in (30.1) may also depend on uncertain parameters \(\theta \in \varTheta \subset \mathbb{R}^{p}\) which includes the initial values X(0) if they are uncertain. Then a prior PDF p(θ) may be chosen to quantify the uncertainty in the value of vector θ. Some of the parameters may characterize the PDF for input U which can then be denoted p(u | θ). It is convenient to redefine vector U to also include θ; then the new PDF p(u) is p(u | θ)p(θ) in terms of the previous PDFs. We assume that model parameter uncertainty is incorporated in this way, so the basic equations remain the same as (30.1), (30.3), and (30.4). When model uncertainty is incorporated, the calculated \(p_{\mathcal{E}}\) has been referred to as the robust rare-event probability [10, 40], meaning robust to model uncertainty, as in robust control theory.

2 Standard Monte Carlo Simulation

The standard Monte Carlo simulation method (MCS) is one of the most robust and straightforward ways to simulate rare events and estimate their probabilities. The method was originally developed in [37] for solving problems in mathematical physics. Since then, MCS has been used in many applications in physics, statistics, computer science, and engineering, and it now lies at the heart of all random sampling-based techniques [35, 44].

The basic idea behind MCS is to observe that the probability in (30.4) can be written as an expectation:

$$\displaystyle{ p_{\mathcal{E}} =\int _{\mathbb{R}^{D}}I_{\mathcal{E}}(u)p(u)du = \mathbb{E}_{p}[I_{\mathcal{E}}] }$$
(30.5)

where \(I_{\mathcal{E}}\) is the indicator function of \(\mathcal{E}\), that is, \(I_{\mathcal{E}}(u) = 1\) if \(u \in \mathcal{E}\) and \(I_{\mathcal{E}}(u) = 0\) otherwise, and D = m × (T + 1) is the dimension of the integral. Recall that the strong law of large numbers [45] states that if \(U_{1},\ldots,U_{N}\) are independent and identically distributed (i.i.d.) samples of vector U drawn from the distribution p(u), then for any function h(u) with finite mean \(\mathbb{E}_{p}[h(u)]\), the sample average \(\frac{1} {N}\sum _{i=1}^{N}h(U_{ i})\) converges to the true value \(\mathbb{E}_{p}[h(u)]\) as \(N \rightarrow \infty \) almost surely (i.e., with probability 1). Therefore, setting \(h(u) = I_{\mathcal{E}}(u)\), the probability in (30.5) can be estimated as follows:

$$\displaystyle{ p_{\mathcal{E}}\approx p_{\mathcal{E}}^{MCS} = \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(U_{i}) }$$
(30.6)

It is straightforward to show that \(p_{\mathcal{E}}^{MCS}\) is an unbiased estimator of \(p_{\mathcal{E}}\) with mean and variance:

$$\displaystyle{ \begin{array}{rcl} \mathbb{E}_{p}[p_{\mathcal{E}}^{MCS}]& =&\mathbb{E}_{p}\left [ \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(U_{i})\right ] = \frac{1} {N}\sum _{i=1}^{N}\mathbb{E}_{ p}[I_{\mathcal{E}}] = p_{\mathcal{E}} \\ \mathrm{Var}_{p}[p_{\mathcal{E}}^{MCS}]& =&\mathrm{Var}_{p}\left [ \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(U_{i})\right ] = \frac{1} {N^{2}} \sum _{i=1}^{N}\mathrm{Var}_{p}[I_{\mathcal{E}}] = \frac{p_{\mathcal{E}}(1-p_{\mathcal{E}})} {N} \end{array} }$$
(30.7)

Furthermore, by the central limit theorem [45], as \(N \rightarrow \infty \), \(p_{\mathcal{E}}^{MCS}\) is asymptotically Gaussian with this mean and variance.

Frequentist interpretation of MCS: The frequentist interpretation of MCS focuses on the forward problem, arguing that if N is large so that the variance of \(p_{\mathcal{E}}^{MCS}\) is relatively small, then the value \(\hat{p}_{\mathcal{E}}^{MCS}\) based on (30.6) for a specific set of N samples \(\{\hat{U}_{1},\ldots, \widehat{U } _{N}\}\) drawn from p(u) should be close to the mean \(p_{\mathcal{E}}\) of \(p_{\mathcal{E}}^{MCS}\). The sample mean estimate \(\hat{p}_{\mathcal{E}}^{MCS}\) is very intuitive and, in fact, simply reflects the frequentist definition of probability: \(\hat{p}_{\mathcal{E}}^{MCS}\) is the ratio between the number of trials where the event \(\mathcal{E}\) occurred, \(\widehat{N }_{\mathcal{E}} =\sum _{ i=1}^{N}I_{\mathcal{E}}(\widehat{U } _{i})\), and the total number of trials N.

Bayesian interpretation of MCS: The same MCS estimate \(\hat{p}_{\mathcal{E}}^{MCS}\) has a simple Bayesian interpretation (e.g., [56]), which focuses on the inverse problem for the specific set of N samples \(\{\widehat{U } _{1},\ldots, \widehat{U } _{N}\}\) drawn from p(u). Following the Bayesian approach [26], the unknown probability \(p_{\mathcal{E}}\) is considered as a stochastic variable whose value in [0, 1] is uncertain. The Principle of Maximum Entropy [25] leads to the uniform prior distribution for \(p_{\mathcal{E}}\), \(p(p_{\mathcal{E}}) = 1\), \(0 \leq p_{\mathcal{E}}\leq 1\), which implies that all values are taken as equally plausible a priori. Since the samples \(U_{1},\ldots,U_{N}\) are i.i.d., the binary sequence \(I_{\mathcal{E}}(U_{1}),\ldots,I_{\mathcal{E}}(U_{N})\) is a sequence of Bernoulli trials, and so for the forward problem, \(N_{\mathcal{E}}\) is distributed according to the binomial distribution with parameters N and \(p_{\mathcal{E}}\), \(N_{\mathcal{E}}\sim \mathrm{Bin}(N,p_{\mathcal{E}})\). Therefore, for the set of N samples, the likelihood function is \(p(\widehat{N } _{\mathcal{E}}\vert p_{\mathcal{E}},N) ={ N\choose \widehat{N } _{\mathcal{E}}}p_{\mathcal{E}}^{\widehat{N } _{\mathcal{E}}}(1 - p_{\mathcal{E}})^{N-\widehat{N } _{\mathcal{E}}}\). Using Bayes’ theorem, the posterior distribution for \(p_{\mathcal{E}}\), \(p(p_{\mathcal{E}}\vert \widehat{N } _{\mathcal{E}},N) \propto p(p_{\mathcal{E}})p(\widehat{N } _{\mathcal{E}}\vert p_{\mathcal{E}},N)\), is therefore the beta distribution \(\mathrm{Beta}(\widehat{N } _{\mathcal{E}} + 1,N -\widehat{N } _{\mathcal{E}} + 1)\), i.e.,

$$\displaystyle{ p(p_{\mathcal{E}}\vert \widehat{N } _{\mathcal{E}},N) = \frac{p_{\mathcal{E}}^{\widehat{N } _{\mathcal{E}}}(1 - p_{\mathcal{E}})^{N-\widehat{N } _{\mathcal{E}}}} {B(\widehat{N } _{\mathcal{E}} + 1,N -\widehat{N } _{\mathcal{E}} + 1)} }$$
(30.8)

where the beta function B is the normalizing constant, which here equals \(\widehat{N } _{\mathcal{E}}!(N -\widehat{N } _{\mathcal{E}})!/(N + 1)!\). The MCS estimate is the maximum a posteriori (MAP) estimate, which is the mode of the posterior distribution (30.8) and therefore the most probable value of \(p_{\mathcal{E}}\) a posteriori:

$$\displaystyle{ \hat{p}_{\mathcal{E}}^{MCS} = \frac{\widehat{N } _{\mathcal{E}}} {N} }$$
(30.9)

Notice that the posterior PDF in (30.8) gives a complete description of the uncertainty in the value of \(p_{\mathcal{E}}\) based on the specific set of N samples of U drawn from p(u). The posterior distribution in (30.8) is in fact the original Bayes’ result [9], although Bayes’ theorem was developed in full generality by Laplace [34].

The standard MCS method for estimating the probability in (30.4) is summarized in the following pseudo-code.
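The procedure amounts to only a few lines of code. The following Python sketch implements the estimator (30.6) and the approximate c.o.v. (30.11) for a hypothetical toy event (a sum of i.i.d. standard Gaussians exceeding a threshold), chosen purely so that the exact probability is known in closed form:

```python
import random

# Minimal sketch of standard MCS for estimating p_E via (30.6).
# The "system analysis" here is a hypothetical toy: U is a D-vector of
# i.i.d. standard Gaussians and the rare event is {sum(U) > b}, so the
# exact probability P(N(0, D) > b) is known in closed form.
random.seed(1)
D, b, N = 10, 8.0, 100_000

def in_E(u):                      # indicator I_E(u)
    return sum(u) > b

N_E = 0
for _ in range(N):
    u = [random.gauss(0.0, 1.0) for _ in range(D)]
    N_E += in_E(u)

p_mcs = N_E / N                                     # estimate (30.6)
cov_hat = ((1 - p_mcs) / (N * p_mcs)) ** 0.5        # c.o.v., eq. (30.11)
# Exact value: P(N(0,10) > 8) = 1 - Phi(8/sqrt(10)), about 5.7e-3.
```

In a real application the indicator evaluation would involve propagating each sampled excitation through the dynamic model (30.1), which is where the computational cost lies.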

Assessment of accuracy of MCS estimate: For the frequentist interpretation, the coefficient of variation (c.o.v.) for the estimator \(p_{\mathcal{E}}^{MCS}\) given by (30.6), conditional on \(p_{\mathcal{E}}\) and N, is given by (30.7):

$$\displaystyle{ \delta (p_{\mathcal{E}}^{MCS}\vert p_{ \mathcal{E}},N) = \frac{\sqrt{\mathrm{Var }_{p } [p_{\mathcal{E} }^{MCS }]}} {\mathbb{E}_{p}[p_{\mathcal{E}}^{MCS}]} = \sqrt{\frac{1 - p_{\mathcal{E} } } {Np_{\mathcal{E}}}} }$$
(30.10)

This can be approximated by replacing \(p_{\mathcal{E}}\) by the estimate \(\hat{p}_{\mathcal{E}}^{MCS} = \widehat{N } _{\mathcal{E}}/N\) for a given set of N samples \(\{\widehat{U } _{1},\ldots, \widehat{U } _{N}\}\):

$$\displaystyle{ \delta (p_{\mathcal{E}}^{MCS}\vert p_{ \mathcal{E}},N) \approx \sqrt{\frac{1 -\hat{ p}_{\mathcal{E} }^{MCS }} {N\hat{p}_{\mathcal{E}}^{MCS}}} \stackrel{\bigtriangleup }{=}\hat{\delta }_{N}^{MCS} }$$
(30.11)

For the Bayesian interpretation, the posterior c.o.v. for the stochastic variable \(p_{\mathcal{E}}\), conditional on the set of N samples, follows from (30.8):

$$\displaystyle{ \delta (p_{\mathcal{E}}\vert \widehat{N } _{\mathcal{E}},N) = \frac{\sqrt{\mathrm{Var }[p_{\mathcal{E} } \vert \widehat{N } _{\mathcal{E} }, N]}} {\mathbb{E}[p_{\mathcal{E}}\vert \widehat{N } _{\mathcal{E}},N]} = \frac{\sqrt{1 - \frac{\widehat{N } _{\mathcal{E} } +1} {N+2}} } {\sqrt{(N + 3)\left (\frac{\widehat{N } _{\mathcal{E} } +1} {N+2} \right )}}\longrightarrow \sqrt{\frac{1 -\hat{ p}_{\mathcal{E} }^{MCS }} {N\hat{p}_{\mathcal{E}}^{MCS}}} =\hat{\delta }_{ N}^{MCS} }$$
(30.12)

as \(N \rightarrow \infty \). Therefore, the same expression \(\hat{\delta }_{N}^{MCS}\) can be used to assess the accuracy of the MCS estimate, even though the two c.o.v.s have distinct interpretations.
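Both c.o.v. expressions are elementary to evaluate from the counts. The following sketch compares them for a hypothetical outcome (the numbers are assumptions made only for illustration):

```python
import math

# Comparing the frequentist approximation (30.11) with the exact
# Bayesian posterior c.o.v. (30.12) for a hypothetical outcome:
# N_E = 50 rare-event samples observed out of N = 100000 trials.
N, N_E = 100_000, 50

# Frequentist: delta_hat = sqrt((1 - p_hat) / (N * p_hat)), eq. (30.11).
p_hat = N_E / N
delta_freq = math.sqrt((1 - p_hat) / (N * p_hat))

# Bayesian: the posterior is Beta(N_E + 1, N - N_E + 1); its c.o.v.
# is the middle expression in (30.12), with posterior mean
# (N_E + 1) / (N + 2).
mean_post = (N_E + 1) / (N + 2)
delta_bayes = math.sqrt((1 - mean_post) / ((N + 3) * mean_post))

# For large N the two values agree closely (both about 0.14 here).
```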

The approximation \(\hat{\delta }_{N}^{MCS}\) for the two c.o.v.s reveals both the main advantage of the standard MCS method and its main drawback. The main strength of MCS, which makes it very robust, is that its accuracy does not depend on the geometry of the domain \(\mathcal{E}\subset \mathbb{R}^{D}\) and its dimension D. As long as an algorithm for generating i.i.d. samples from p(u) is available, MCS, unlike many other methods (e.g., numerical integration), does not suffer from the “curse of dimensionality.” Moreover, an irregular, or even fractal-like, shape of \(\mathcal{E}\) will not affect the accuracy of MCS.

On the other hand, the serious drawback of MCS is that this method is not computationally efficient in estimating the small probabilities \(p_{\mathcal{E}}\) corresponding to rare events, where from (30.10),

$$\displaystyle{ \delta (p_{\mathcal{E}}^{MCS}\vert p_{ \mathcal{E}},N) \approx \frac{1} {\sqrt{Np_{\mathcal{E}}}} }$$
(30.13)

Therefore, to achieve a prescribed level of accuracy δ < 1, the required total number of samples is \(N = (p_{\mathcal{E}}\delta ^{2})^{-1} \gg 1\). For each sampled excitation U i , a system analysis – usually computationally very intensive – is required to compute the corresponding system trajectory X i and to check whether U i belongs to \(\mathcal{E}\). This makes MCS excessively costly and inapplicable for generating rare events and estimating their small probabilities. Nevertheless, essentially all sampling-based methods for estimation of rare-event probability are either based on MCS (e.g., importance sampling) or have it as a part of the algorithm (e.g., subset simulation).

3 Importance Sampling

The importance sampling (IS) method belongs to the class of variance reduction techniques that aim to increase the accuracy of the estimates by constructing (sometimes biased) estimators with a smaller variance [1, 22]. It seems it was first proposed in [29], soon after the standard MCS method appeared.

The inefficiency of MCS for rare-event estimation stems from the fact that most of the generated samples U i p(u) do not belong to \(\mathcal{E}\) so that the vast majority of the terms in the sum (30.6) are zero and only very few (if any) are equal to one. The basic idea of IS is to make use of the information available about the rare-event \(\mathcal{E}\) to generate samples that lie more frequently in \(\mathcal{E}\) or in the important region \(\tilde{\mathcal{E}}\subset \mathcal{E}\) that accounts for most of the probability content in (30.4). Rather than estimating \(p_{\mathcal{E}}\) as an average of many 0’s and very few 1’s as in \(\hat{p}_{\mathcal{E}}^{MCS}\), IS seeks to reduce the variance by constructing an estimator of the form \(p_{\mathcal{E}}^{IS} = \frac{1} {N}\sum _{i=1}^{N^{{\prime}} }w_{i}\), where \(N^{{\prime}}\) is an appreciable fraction of N and the \(w_{i}\) are small but not zero, ideally of the same order as the target probability, \(w_{i} \approx p_{\mathcal{E}}\).

Specifically, for an appropriate PDF q(u) on the excitation space \(\mathbb{R}^{D}\), the integral in (30.5) can be rewritten as follows:

$$\displaystyle{ p_{\mathcal{E}} =\int _{\mathbb{R}^{D}}I_{\mathcal{E}}(u)p(u)du =\int _{\mathbb{R}^{D}}\frac{I_{\mathcal{E}}(u)p(u)} {q(u)} q(u)du = \mathbb{E}_{q}\left [\frac{I_{\mathcal{E}}p} {q} \right ] }$$
(30.14)

The IS estimator is now constructed similarly to (30.6) by utilizing the law of large numbers:

$$\displaystyle{ p_{\mathcal{E}}\approx p_{\mathcal{E}}^{IS} = \frac{1} {N}\sum _{i=1}^{N}\frac{I_{\mathcal{E}}(U_{i})p(U_{i})} {q(U_{i})} = \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(U_{i})w(U_{i}) }$$
(30.15)

where \(U_{1},\ldots,U_{N}\) are i.i.d. samples from q(u), called the importance sampling density (ISD), and \(w(U_{i}) = \frac{p(U_{i})} {q(U_{i})}\) is the importance weight of sample \(U_{i}\).

The IS estimator \(p_{\mathcal{E}}^{IS}\) converges almost surely to \(p_{\mathcal{E}}\) as \(N \rightarrow \infty \) by the strong law of large numbers, provided that the support of q(u), i.e., the domain in \(\mathbb{R}^{D}\) where q(u) > 0, contains the support of \(I_{\mathcal{E}}(u)p(u)\). Intuitively, the latter condition guarantees that all points of \(\mathcal{E}\) that can be generated by sampling from the original PDF p(u) can also be generated by sampling from the ISD q(u). Note that if q(u) = p(u), then \(w(U_{i}) = 1\) and IS simply reduces to MCS, \(p_{\mathcal{E}}^{MCS} = p_{\mathcal{E}}^{IS}\). By choosing the ISD q(u) appropriately, IS aims to obtain an estimator with a smaller variance.

The IS estimator \(p_{\mathcal{E}}^{IS}\) is also unbiased with mean and variance:

$$\displaystyle{ \begin{array}{rcl} \mathbb{E}_{q}[p_{\mathcal{E}}^{IS}]& =&\mathbb{E}_{q}\left [ \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(U_{i})w(U_{i})\right ] = \frac{1} {N}\sum _{i=1}^{N}\mathbb{E}_{ q}\left [\frac{I_{\mathcal{E}}p} {q} \right ] = p_{\mathcal{E}} \\ \mathrm{Var}_{q}[p_{\mathcal{E}}^{IS}]& =& \frac{1} {N^{2}} \sum _{i=1}^{N}\mathrm{Var}_{q}\left [\frac{I_{\mathcal{E}}p} {q} \right ] = \frac{1} {N}\left (\mathbb{E}_{q}\left [\frac{I_{\mathcal{E}}p^{2}} {q^{2}} \right ] - p_{\mathcal{E}}^{2}\right )\end{array} }$$
(30.16)

The IS method is summarized in the following pseudo-code.
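A minimal sketch of the estimator (30.15) follows. The toy problem (a single standard Gaussian exceeding a threshold) and the mean-shifted ISD are illustrative assumptions, chosen so that the weight p(u)/q(u) has a simple closed form and the exact answer is known:

```python
import math, random

# Minimal sketch of IS, eq. (30.15), on a hypothetical toy problem:
# U ~ N(0,1) and E = {U > b}. The ISD q(u) = N(b,1) is p(u) shifted
# to the boundary of E ("mean shifting"); this particular choice is
# an illustrative assumption, not a general-purpose recipe.
random.seed(2)
b, N = 4.0, 10_000

def weight(u):
    # w(u) = p(u)/q(u) for p = N(0,1), q = N(b,1), in closed form:
    # exp(-u^2/2) / exp(-(u-b)^2/2) = exp(b^2/2 - b*u).
    return math.exp(b * b / 2.0 - b * u)

total = 0.0
for _ in range(N):
    u = random.gauss(b, 1.0)          # sample from the ISD q(u)
    if u > b:                         # indicator I_E(u)
        total += weight(u)

p_is = total / N                      # estimate (30.15)
# Exact value: 1 - Phi(4), about 3.17e-5. Plain MCS with N = 10^4
# samples would typically see no rare-event samples at all.
```

About half of the ISD samples land in \(\mathcal{E}\), each carrying a small weight of the order of \(p_{\mathcal{E}}\), which is exactly the behavior described above.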

The most important task in applying IS for estimating small probabilities of rare events is the construction of the ISD, since the accuracy of \(\hat{p}_{\mathcal{E}}^{IS}\) depends critically on q(u). If the ISD is “good,” then one can get great improvement in efficiency over standard MCS. If, however, the ISD is chosen inappropriately so that, for instance, \(N_{\mathcal{E}} = 0\) or the importance weights have a large variation, then IS will yield a very poor estimate. Both scenarios are demonstrated below in Sect. 6.

It is straightforward to show that the optimal ISD, which minimizes the variance in (30.16), is simply the original PDF p(u) conditional on the domain \(\mathcal{E}\):

$$\displaystyle{ q_{0}(u) = p(u\vert \mathcal{E}) = \frac{I_{\mathcal{E}}(u)p(u)} {p_{\mathcal{E}}} }$$
(30.17)

Indeed, in this case, all generated sample excitations satisfy \(U_{i} \in \mathcal{E}\), so their importance weights \(w(U_{i}) = p_{\mathcal{E}}\), and the IS estimate \(\hat{p}_{\mathcal{E}}^{IS} = p_{\mathcal{E}}\). Moreover, just one sample (N = 1) generated from q 0(u) is enough to find the probability \(p_{\mathcal{E}}\) exactly. Note, however, that this is a purely theoretical result since in practice sampling from the conditional distribution \(p(u\vert \mathcal{E})\) is challenging, and, most importantly, it is impossible to compute q 0(u): this would require the knowledge of \(p_{\mathcal{E}}\), which is unknown. Nevertheless, this result indicates that the ISD q(u) should be chosen as close to q 0(u) as possible. In particular, most of the probability mass of q(u) should be concentrated on \(\mathcal{E}\). Based on these considerations, several ad hoc techniques for constructing ISDs have been developed, e.g., variance scaling and mean shifting [15].

In the special case of linear dynamics and Gaussian excitation, an extremely efficient algorithm for estimating the rare-event probability \(p_{\mathcal{E}}\) in (30.4), referred to as ISEE (importance sampling using elementary events), has been presented [3]. The choice of the ISD exploits known information about each elementary event, defined as an outcrossing of the performance threshold b in (30.3) at a specific time \(t \in \{ 0,\ldots,T\}\). The c.o.v. of the ISEE estimator for N samples of U from p(u) is given by

$$\displaystyle{ \delta _{N}^{ISEE} = \frac{\alpha } {\sqrt{N}} }$$
(30.18)

where the proportionality constant α is close to 1, regardless of how small the value of \(p_{\mathcal{E}}\) is. In fact, α decreases slightly as \(p_{\mathcal{E}}\) decreases, exhibiting the opposite behavior to MCS.

In general, it is known that in many practical cases of rare-event estimation, it is difficult to construct a good ISD that leads to a low-variance IS estimator, especially if the dimension of the uncertain excitation space \(\mathbb{R}^{D}\) is large, as it is in dynamic reliability problems [5]. A geometric explanation as to why IS is often inefficient in high dimensions is given in [32]. Au [2] has presented an efficient IS method for estimating \(p_{\mathcal{E}}\) in (30.4) for elastoplastic systems subject to Gaussian excitation. In recent years, substantial progress has been made by tailoring the sequential importance sampling (SIS) methods [35], where the ISD is iteratively refined, to rare-event problems. SIS and its modifications have been successfully used for estimating rare events in dynamic portfolio credit risk [19], structural reliability [33], and other areas.

4 Subset Simulation

The subset simulation (SS) method [4] is an advanced stochastic simulation method for estimating rare events which is based on Markov chain Monte Carlo (MCMC) [35, 44]. The basic idea behind SS is to represent a very small probability \(p_{\mathcal{E}}\) of the rare-event \(\mathcal{E}\) as a product of larger probabilities of “more-frequent” events and then estimate these larger probabilities separately. To implement this idea, let

$$\displaystyle{ \mathbb{R}^{D} \equiv \mathcal{E}_{ 0} \supset \mathcal{E}_{1} \supset \ldots \supset \mathcal{E}_{L} \equiv \mathcal{E} }$$
(30.19)

be a sequence of nested subsets of the uncertain excitation space starting from the entire space \(\mathcal{E}_{0} = \mathbb{R}^{D}\) and shrinking to the target rare-event \(\mathcal{E}_{L} = \mathcal{E}\). By analogy with (30.3), subsets \(\mathcal{E}_{i}\) can be defined by relaxing the value of the critical threshold b:

$$\displaystyle{ \mathcal{E}_{i} = \left \{U \in \mathbb{R}^{D}:\max \limits _{ t=0,\ldots,T}g(X(t))> b_{i}\right \} }$$
(30.20)

where \(b_{1} <\ldots < b_{L} = b\). In the actual implementation of SS, the number of subsets L and the values of the intermediate thresholds \(\{b_{i}\}\) are chosen adaptively.

Using the notion of conditional probability and exploiting the nesting of the subsets, the target probability \(p_{\mathcal{E}}\) can be factorized as follows:

$$\displaystyle{ p_{\mathcal{E}} =\prod _{ i=1}^{L}\mathbb{P}(\mathcal{E}_{ i}\vert \mathcal{E}_{i-1}) }$$
(30.21)

An important observation is that by choosing the intermediate thresholds {b i } appropriately, the conditional events \(\{\mathcal{E}_{i}\vert \mathcal{E}_{i-1}\}\) can be made more frequent, and their probabilities can be made large enough to be amenable to efficient estimation by MCS-like methods.

The first probability \(\mathbb{P}(\mathcal{E}_{1}\vert \mathcal{E}_{0}) = \mathbb{P}(\mathcal{E}_{1})\) can be readily estimated by standard MCS:

$$\displaystyle{ \mathbb{P}(\mathcal{E}_{1}) \approx \frac{1} {n}\sum _{j=1}^{n}I_{ \mathcal{E}_{1}}(U_{j}), }$$
(30.22)

where \(U_{1},\ldots,U_{n}\) are i.i.d. samples from p(u). Estimating the remaining probabilities \(\mathbb{P}(\mathcal{E}_{i}\vert \mathcal{E}_{i-1})\), i ≥ 2, is more challenging since one needs to generate samples from the conditional distribution \(p(u\vert \mathcal{E}_{i-1}) = \frac{I_{\mathcal{E}_{i-1}}(u)p(u)} {\mathbb{P}(\mathcal{E}_{i-1})}\), which, in general, is not a trivial task. Notice that a sample U from \(p(u\vert \mathcal{E}_{i-1})\) is one drawn from p(u) that lies in \(\mathcal{E}_{i-1}\). However, it is not efficient to use MCS for generating samples from \(p(u\vert \mathcal{E}_{i-1})\): sampling from p(u) and accepting only those samples that belong to \(\mathcal{E}_{i-1}\) is computationally very expensive, especially at higher levels i.

In standard SS, samples from the conditional distribution \(p(u\vert \mathcal{E}_{i-1})\) are generated by the modified Metropolis algorithm (MMA) [4] which belongs to the family of MCMC methods for sampling from complex probability distributions that are difficult to sample directly from [35, 44]. An alternative strategy – splitting – is described in the next section.

The MMA algorithm is a component-wise version of the original Metropolis algorithm [38]. It is specifically tailored for sampling from high-dimensional conditional distributions and works as follows. First, without loss of generality, assume that \(p(u) =\prod _{k=1}^{D}p_{k}(u_{k})\), i.e., the components of U are independent. This assumption is indeed not a limitation, since in simulation one always starts from independent variables to generate correlated excitation histories U. Suppose further that some vector \(U_{1} \in \mathbb{R}^{D}\) is already distributed according to the target conditional distribution, \(U_{1} \sim p(u\vert \mathcal{E}_{i-1})\). MMA prescribes how to generate another vector \(U_{2} \sim p(u\vert \mathcal{E}_{i-1})\), and it consists of two steps:

  1.

    Generate a “candidate” state V as follows: first, for each component \(k = 1,\ldots,D\) of V, sample ν(k) from the symmetric univariate proposal distribution \(q_{k,i}(\nu \vert U_{1}(k))\) centered on the kth component of \(U_{1}\), where symmetry means that \(q_{k,i}(\nu \vert u) = q_{k,i}(u\vert \nu )\); then, compute the acceptance ratio \(r_{k} = \frac{p_{k}(\nu (k))} {p_{k}(U_{1}(k))}\); and finally, set

    $$\displaystyle{ V (k) = \left \{\begin{array}{ll} \nu (k), &\mbox{ with probability }\min \{1,r_{k}\} \\ U_{1}(k),&\mbox{ with probability }1 -\min \{ 1,r_{k}\}\end{array} \right. }$$
    (30.23)
  2.

    Accept or reject the candidate state V:

    $$\displaystyle{ U_{2} = \left \{\begin{array}{ll} V, &\mbox{ if }V \in \mathcal{E}_{i-1} \\ U_{1},&\mbox{ if }V \notin \mathcal{E}_{i-1}\end{array} \right. }$$
    (30.24)

It can be shown that U 2 generated by MMA is indeed distributed according to the target conditional distribution \(p(u\vert \mathcal{E}_{i-1})\) when U 1 is [4]. For a detailed discussion of MMA, the reader is referred to [56].
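The two MMA steps, (30.23) and (30.24), can be written out directly. The sketch below targets a hypothetical conditional distribution (i.i.d. standard Gaussian components conditioned on an illustrative event), with an assumed uniform proposal; both choices are made only for illustration:

```python
import math, random

# Sketch of one MMA transition, eqs. (30.23)-(30.24), for a
# hypothetical target: p(u) with i.i.d. standard Gaussian components
# and conditioning event E_{i-1} = {sum(u) > b}. The uniform proposal
# and its width are illustrative assumptions.
random.seed(3)
D, b, width = 10, 2.0, 1.0

def p_k(x):                        # 1-D standard Gaussian density,
    return math.exp(-x * x / 2.0)  # unnormalized (constant cancels in r_k)

def in_E(u):
    return sum(u) > b

def mma_step(u1):
    """Generate U2 ~ p(u|E) from U1 ~ p(u|E)."""
    # Step 1: component-wise Metropolis move to a candidate state V.
    v = []
    for k in range(D):
        nu = u1[k] + random.uniform(-width, width)   # symmetric proposal
        r = p_k(nu) / p_k(u1[k])                     # acceptance ratio r_k
        v.append(nu if random.random() < min(1.0, r) else u1[k])
    # Step 2: accept V only if it stays inside the conditioning event.
    return v if in_E(v) else u1

# Start from a seed lying in E and run a short chain; by construction
# the chain never leaves E.
u = [b / D + 0.5] * D
for _ in range(100):
    u = mma_step(u)
```

Note that step 2 never requires evaluating the normalizing constant \(\mathbb{P}(\mathcal{E}_{i-1})\) of the conditional distribution, which is precisely what makes MCMC applicable here.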

The procedure for generating conditional samples at level i is as follows. Starting from a “seed” \(U_{1} \sim p(u\vert \mathcal{E}_{i-1})\), one can now use MMA to generate a sequence of random vectors \(U_{1},\ldots,U_{n}\), called a Markov chain, distributed according to \(p(u\vert \mathcal{E}_{i-1})\). At each step, \(U_{j}\) is used to generate the next state \(U_{j+1}\). Note that although these MCMC samples are identically distributed, they are clearly not independent: the correlation between successive samples is due to the proposal PDFs \(\{q_{k,i}\}\) at level i that govern the generation of \(U_{j+1}\) from \(U_{j}\). Nevertheless, \(U_{1},\ldots,U_{n}\) can still be used for statistical averaging as if they were i.i.d., although with a certain reduction in efficiency [4]. In particular, similar to (30.22), the conditional probability \(\mathbb{P}(\mathcal{E}_{i}\vert \mathcal{E}_{i-1})\) can be estimated as follows:

$$\displaystyle{ \mathbb{P}(\mathcal{E}_{i}\vert \mathcal{E}_{i-1}) \approx \frac{1} {n}\sum _{j=1}^{n}I_{ \mathcal{E}_{i}}(U_{j}) }$$
(30.25)

To obtain an estimator for the target probability \(p_{\mathcal{E}}\), it remains to multiply the MCS (30.22) and MCMC (30.25) estimators of all factors in (30.21). In real applications, however, it is often difficult to rationally define the subsets \(\{\mathcal{E}_{i}\}\) in advance, since it is not clear how to specify the values of the intermediate thresholds \(\{b_{i}\}\). In SS, this is done adaptively. Specifically, let \(U_{1}^{(0)},\ldots,U_{n}^{(0)}\) be the MCS samples from p(u), \(X_{1}^{(0)},\ldots,X_{n}^{(0)}\) be the corresponding trajectories from (30.1), and \(G_{j}^{(0)} =\max _{t=0,\ldots,T}g(X_{j}^{(0)}(t))\) be the resulting performance values. Assume that the sequence \(\{G_{j}^{(0)}\}\) is ordered in nonincreasing order, i.e., \(G_{1}^{(0)} \geq \ldots \geq G_{n}^{(0)}\), renumbering the samples where necessary. Define the first intermediate threshold \(b_{1}\) as follows:

$$\displaystyle{ b_{1} = \frac{G_{np_{0}}^{(0)} + G_{np_{0}+1}^{(0)}} {2} }$$
(30.26)

where \(p_{0}\) is a chosen probability satisfying \(0 < p_{0} < 1\). This choice of \(b_{1}\) has two immediate consequences: first, the MCS estimate of \(\mathbb{P}(\mathcal{E}_{1})\) in (30.22) is exactly \(p_{0}\), and, second, \(U_{1}^{(0)},\ldots,U_{np_{0}}^{(0)}\) not only belong to \(\mathcal{E}_{1}\) but also are distributed according to the conditional distribution \(p(u\vert \mathcal{E}_{1})\). Each of these \(np_{0}\) samples can now be used as a mother seed in MMA to generate \(( \frac{1} {p_{0}} - 1)\) offspring, giving a total of n samples \(U_{1}^{(1)},\ldots,U_{n}^{(1)} \sim p(u\vert \mathcal{E}_{1})\). Since these seeds start in the stationary state \(p(u\vert \mathcal{E}_{1})\) of the Markov chain, this MCMC method gives perfect sampling, i.e., no wasteful burn-in period is needed. Similarly, \(b_{2}\) is defined as

$$\displaystyle{ b_{2} = \frac{G_{np_{0}}^{(1)} + G_{np_{0}+1}^{(1)}} {2} }$$
(30.27)

where \(\{G_{j}^{(1)}\}\) are the (ordered) performance values corresponding to the excitations \(\{U_{j}^{(1)}\}\). Again by construction, the estimate (30.25) gives \(\mathbb{P}(\mathcal{E}_{2}\vert \mathcal{E}_{1}) \approx p_{0}\), and \(U_{1}^{(1)},\ldots,U_{np_{0}}^{(1)} \sim p(u\vert \mathcal{E}_{2})\). The SS method proceeds in this manner until the target rare-event \(\mathcal{E}\) is reached and is sufficiently sampled. All but the last factor in (30.21) are approximated by \(p_{0}\), and the last factor \(\mathbb{P}(\mathcal{E}\vert \mathcal{E}_{L-1}) \approx \frac{n_{\mathcal{E}}} {n} \geq p_{0}\), where \(n_{\mathcal{E}}\) is the number of samples in \(\mathcal{E}\) among \(U_{1}^{(L-1)},\ldots,U_{n}^{(L-1)} \sim p(u\vert \mathcal{E}_{L-1})\). The method is more formally summarized in the following pseudo-code.
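The adaptive level construction can also be sketched end to end in Python. In the sketch below the whole dynamic analysis is replaced by a hypothetical closed-form performance value (so that the exact answer is known); the proposal width and problem sizes are likewise illustrative assumptions:

```python
import math, random

# Minimal sketch of SS with p0 = 0.1 and an MMA sampler, on a
# hypothetical toy problem where the dynamic analysis is replaced by
# G(u) = sum(u)/sqrt(D), so that G ~ N(0,1) under p(u) and the exact
# rare-event probability P(G > b) is known. Proposal width and all
# problem sizes are illustrative assumptions.
random.seed(4)
D, b, n, p0 = 100, 3.5, 1000, 0.1

def G(u):                                  # stand-in performance value
    return sum(u) / math.sqrt(D)

def mma_step(u1, b_i, width=1.0):
    """One modified-Metropolis transition targeting p(u | G(u) > b_i)."""
    v = []
    for x in u1:
        nu = x + random.uniform(-width, width)      # symmetric proposal
        r = math.exp((x * x - nu * nu) / 2.0)       # Gaussian ratio r_k
        v.append(nu if random.random() < min(1.0, r) else x)
    return v if G(v) > b_i else u1                  # stay inside E_i

# Level 0: standard MCS.
samples = [[random.gauss(0.0, 1.0) for _ in range(D)] for _ in range(n)]
p_ss, nc = 1.0, int(n * p0)
for _ in range(50):                        # cap on the number of levels
    samples.sort(key=G, reverse=True)
    vals = [G(u) for u in samples]
    if vals[nc - 1] > b:                   # enough samples beyond b: done
        p_ss *= sum(v > b for v in vals) / n   # last factor n_E / n
        break
    b_i = (vals[nc - 1] + vals[nc]) / 2.0  # adaptive threshold, cf. (30.26)
    p_ss *= p0                             # P(E_i | E_{i-1}) ~ p0 by design
    # Seeds: the nc samples already in E_i; grow each chain to n total.
    chains = samples[:nc]
    samples = list(chains)
    while len(samples) < n:
        chains = [mma_step(u, b_i) for u in chains]
        samples.extend(chains)
    samples = samples[:n]

# Exact value: P(N(0,1) > 3.5), about 2.33e-4, reached with only a few
# thousand performance evaluations in total.
```

Each level reuses its \(np_{0}\) best samples as chain seeds, exactly as described above, so no burn-in is needed.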

Implementation details of SS, in particular the choice of the level probability \(p_0\) and the proposal distributions \(\{q_{k,i}\}\), are thoroughly discussed in [56]. It has been confirmed that \(p_0 = 0.1\), as proposed in the original paper [4], is a nearly optimal value. The choice of \(\{q_{k,i}\}\) is more delicate, since the efficiency of MMA depends strongly on the proposal PDF variances in a nontrivial way: proposal PDFs with either too small or too large a variance tend to increase the correlation between successive samples, making the statistical averaging in (30.25) less efficient. In general, finding the optimal variance of the proposal distributions is a challenging task, not only for MMA but for almost all MCMC algorithms. Nevertheless, it has been found in many applications that using \(q_{k,i}(\nu \vert u) = \mathcal{N}(\nu \vert u,\sigma _{k,i}^{2})\), the Gaussian distribution with mean \(u\) and variance \(\sigma_{k,i}^2\), yields good efficiency if \(\sigma_{k,i}^2 = \sigma_0^2\) and \(p(u)\) is a multidimensional Gaussian with all variances equal to \(\sigma_0^2\). For an adaptive strategy for choosing \(\{q_{k,i}\}\), the reader is referred to [56]; for example, \(\sigma_{k,i}^2 = \sigma_i^2\) can be chosen so that the observed average acceptance rate in MMA, based on a subset of samples at level \(i\), lies in the interval \([0.3, 0.5]\).
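The conditional sampling performed by MMA can be illustrated on a toy target: a two-dimensional standard Gaussian conditioned on the event \(u_1 > b\). This is a minimal sketch under that assumption, not the chapter's algorithm listing; the threshold, step size, and chain length are illustrative, and the function name `mma_chain` is hypothetical.

```python
import numpy as np

def mma_chain(seed_state, n_steps, in_E, sigma, rng):
    """Modified Metropolis: componentwise Gaussian random-walk proposals
    for a standard Gaussian prior p(u), followed by a final accept/reject
    on membership in the conditioning event E.  Because the seed already
    lies in E, every state of the chain is a (correlated) sample from
    p(u|E) -- the 'perfect sampling' property noted in the text."""
    u = np.array(seed_state, dtype=float)
    chain = [u.copy()]
    for _ in range(n_steps):
        v = u.copy()
        for i in range(len(u)):
            cand = u[i] + sigma * rng.standard_normal()
            # Metropolis ratio for a standard Gaussian component prior
            if rng.random() < min(1.0, np.exp(0.5 * (u[i]**2 - cand**2))):
                v[i] = cand
        if in_E(v):        # accept the pre-candidate only if it stays in E
            u = v
        chain.append(u.copy())
    return np.array(chain)

b = 1.5
rng = np.random.default_rng(1)
seed = np.array([2.0, 0.3])        # a seed state already inside E
chain = mma_chain(seed, 500, lambda u: u[0] > b, sigma=1.0, rng=rng)
print(chain.shape)                 # (501, 2); every state satisfies u1 > b
```

Note that rejected candidates repeat the current state, so the chain never leaves the conditioning event; this is exactly why no burn-in is needed when the seed is distributed according to \(p(u\vert\mathcal{E}_i)\).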

It can be shown [4, 7] that, given \(p_{\mathcal{E}}\), \(p_0\), and the total number of samples \(N\), the c.o.v. of the SS estimator \(p_{\mathcal{E}}^{SS}\) is given by

$$\displaystyle{ \delta ^{2}(p_{ \mathcal{E}}^{SS}\vert p_{ \mathcal{E}},p_{0},N) = \frac{(1+\gamma )(1 - p_{0})} {Np_{0}(\ln p_{0}^{-1})^{r}} (\ln p_{\mathcal{E}}^{-1})^{r} }$$
(30.28)

where \(2 \leq r \leq 3\) and \(\gamma\) is approximately a constant that depends on the state correlation of the Markov chain at each level. Numerical experiments show that \(r = 2\) gives a good approximation to the c.o.v. and that \(\gamma \approx 3\) if the proposal variance \(\sigma_i^2\) for each level is appropriately chosen [4, 7, 56]. It follows from (30.13) that \(\delta _{MCS}^{2} \propto p_{\mathcal{E}}^{-1}\) for MCS, while for SS, \(\delta _{SS}^{2} \propto (\ln p_{\mathcal{E}}^{-1})^{r}\). This drastically different scaling of the c.o.v.'s for small \(p_{\mathcal{E}}\) directly exhibits the improvement in efficiency.
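The contrast between the two scalings can be checked numerically. The sketch below evaluates (30.28) with the values \(r = 2\), \(\gamma = 3\), \(p_0 = 0.1\) quoted above; the budget \(N = 10^4\) and the MCS formula \(\delta^2 = (1-p_{\mathcal{E}})/(Np_{\mathcal{E}})\) are standard but inserted here for illustration.

```python
import math

def cov_ss(p_E, p0=0.1, N=10_000, r=2, gamma=3.0):
    """c.o.v. of the SS estimator according to (30.28)."""
    return math.sqrt((1 + gamma) * (1 - p0) * math.log(1 / p_E) ** r
                     / (N * p0 * math.log(1 / p0) ** r))

def cov_mcs(p_E, N=10_000):
    """c.o.v. of the standard MCS estimator: delta^2 = (1-p)/(N p)."""
    return math.sqrt((1 - p_E) / (N * p_E))

# For moderate probabilities MCS is fine; for rare events SS wins decisively.
for p_E in (1e-2, 1e-4, 1e-6):
    print(f"p_E = {p_E:.0e}:  MCS c.o.v. {cov_mcs(p_E):7.2f}   SS c.o.v. {cov_ss(p_E):.2f}")
```

For \(p_{\mathcal{E}} = 10^{-6}\) and \(N = 10^4\), the MCS c.o.v. is about 10 (useless), while the SS c.o.v. is about 0.36.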

To compare an advanced stochastic simulation algorithm directly with MCS, which is always applicable (but not efficient) for rare-event estimation, [11] introduced the relative computational efficiency of an algorithm, \(\eta_A\), defined as the ratio of the number of samples \(N_{MCS}\) required by MCS to the number of samples \(N_A\) required by the algorithm for the same c.o.v. \(\delta\). The relative efficiency of SS is then

$$\displaystyle{ \eta _{SS} = \frac{N_{MCS}} {N_{SS}} = \frac{p_{0}(\ln p_{0}^{-1})^{r}} {(1+\gamma )(1 - p_{0})p_{\mathcal{E}}(\ln p_{\mathcal{E}}^{-1})^{r}} \approx \frac{0.03p_{\mathcal{E}}^{-1}} {(\log _{10}p_{\mathcal{E}}^{-1})^{2}} }$$
(30.29)

for \(r = 2\), \(\gamma = 3\), and \(p_0 = 0.1\). For rare events, \(p_{\mathcal{E}}^{-1}\) is very large and, as expected, SS outperforms MCS; for example, if \(p_{\mathcal{E}} = 10^{-6}\), then \(\eta_{SS} \approx 800\).
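The exact expression in (30.29) is easy to evaluate directly; the sketch below uses the same parameter values as the text (the rounded constant 0.03 in the approximation accounts for the difference between the exact value and the quoted \(\approx 800\)).

```python
import math

def eta_ss(p_E, p0=0.1, r=2, gamma=3.0):
    """Relative efficiency of SS over MCS, the exact middle expression
    in Eq. (30.29)."""
    return (p0 * math.log(1 / p0) ** r
            / ((1 + gamma) * (1 - p0) * p_E * math.log(1 / p_E) ** r))

print(f"{eta_ss(1e-6):.0f}")   # roughly 800, consistent with the text
```

The efficiency gain grows rapidly as the event becomes rarer, e.g., \(\eta_{SS}(10^{-8})\) is larger than \(\eta_{SS}(10^{-6})\) by more than an order of magnitude.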

In recent years, a number of modifications of SS have been proposed, including SS with splitting [17] (described in the next section), hybrid SS [18], two-stage SS [30], spherical SS [31], and SS with delayed rejection [57]. A Bayesian post-processor for SS, which generalizes the Bayesian interpretation of MCS described above, was developed in [56]. In the original paper [4], SS was developed for estimating the reliability of complex civil engineering structures, such as tall buildings and bridges at risk from earthquakes, and it was applied for this purpose in [5] and [24]. SS and its modifications have also been successfully applied to rare-event simulation in fire risk analysis [8], aerospace [39, 52], nuclear [16], wind [49], and geotechnical engineering [46], among other fields. A detailed exposition of SS at an introductory level and a MATLAB code implementing the above pseudo-code are given in [55]. For more advanced and complete reading, the fundamental monograph on SS [7] is strongly recommended.

5 Splitting

In the previously presented stochastic simulation methods, samples of the input and output discrete-time histories, \(\{U(t): t = 0,\ldots,T\} \subset \mathbb{R}^{m}\) and \(\{X(t): t = 0,\ldots,T\} \subset \mathbb{R}^{n}\), are viewed geometrically as vectors U and X that define points in the vector spaces \(\mathbb{R}^{(T+1)m}\) and \(\mathbb{R}^{(T+1)n}\), respectively. In the splitting method, however, samples of the input and output histories are viewed as trajectories defining paths of length (T + 1) in \(\mathbb{R}^{m}\) and \(\mathbb{R}^{n}\), respectively. Samples that reach a certain designated subset in the input or output spaces at some time are treated as “mothers” and are then split into multiple offspring trajectories by separate sampling of the input histories subsequent to the splitting time. These multiple trajectories can themselves subsequently be treated as mothers if they reach another designated subset nested inside the first subset at some later time and so be split into multiple offspring trajectories. This is continued until a certain number of the trajectories reach the smallest nested subset corresponding to the rare event of interest.

Splitting methods were originally introduced by Kahn and Harris [28], and they have been extensively studied (e.g., [12, 17, 42, 54]). We describe splitting here by using the framework of subset simulation where the only change is that the conditional sampling in the nested subsets is done by splitting the trajectories that reach each subset, rather than using them as seeds to generate more samples from Markov chains in their stationary state. As a result, only standard Monte Carlo simulation is needed, instead of MCMC simulation.

The procedure in [17] is followed here to generate offspring trajectories at the \(i\)th level (\(i = 1,\ldots,L\)) of subset simulation from each of the mother trajectories in \(\mathcal{E}_{i}\) constructed from samples from the previous level, except that we present it from the viewpoint of trajectories in the input space rather than the output space. Therefore, at the \(i\)th level, each of the \(np_0\) sampled input histories \(U_j\), \(j = 1,\ldots,np_0\), from the previous level that satisfy \(U_{j} \in \mathcal{E}_{i}\), as defined in (30.20) (so the corresponding output history \(X_j\) satisfies \(\max \limits _{t=0,\ldots,T}g(X_{j}(t))> b_{i}\)), is split at its first-passage time

$$\displaystyle{ t_{j} =\min \{ t = 0,\ldots,T: g(X_{j}(t))> b_{i}\} }$$
(30.30)

This means that the mother trajectory \(U_j\) is partitioned as \([U_j^-, U_j^+]\), where \(U_j^- = [U_j(0),\ldots,U_j(t_j)]\) and \(U_j^+ = [U_j(t_j+1),\ldots,U_j(T)]\); then a subtrajectory sample \(\widetilde{U}_{j}^{+} = [\widetilde{U}_{j}(t_{j} + 1),\ldots,\widetilde{U}_{j}(T)]\) is drawn from

$$\displaystyle{ p(u_{j}^{+}\vert U_{ j}^{-},\mathcal{E}_{ i}) = \frac{\mathbb{P}(\mathcal{E}_{i}\vert u_{j}^{+},U_{j}^{-})} {\mathbb{P}(\mathcal{E}_{i}\vert U_{j}^{-})} p(u_{j}^{+}\vert U_{ j}^{-}) = p(u_{ j}^{+}\vert U_{ j}^{-}) = p(u_{ j}^{+}) }$$
(30.31)

where the last equality follows if one assumes independence of the \(U_j(t)\), \(t = 0,\ldots,T\) (although this is not necessary). Also, \(\mathbb{P}(\mathcal{E}_{i}\vert u_{j}^{+},U_{j}^{-}) = 1 = \mathbb{P}(\mathcal{E}_{i}\vert U_{j}^{-})\). Note that the new input sample \(\widetilde{U}_{j} = [U_{j}^{-},\widetilde{U}_{j}^{+}]\) also lies in \(\mathcal{E}_{i}\), since it has the subtrajectory \(U_j^-\) in common with \(U_j\), which implies that the corresponding outputs at the first-passage time \(t_j\) are equal: \(\widetilde{X}_{j}(t_{j}) = X_{j}(t_{j})\), so \(g(\widetilde{X}_{j}(t_{j})) = g(X_{j}(t_{j})) > b_{i}\). The offspring trajectory \(\widetilde{U}_{j}\) is a sample from \(p(u)\) lying in \(\mathcal{E}_{i}\), and so, like its mother \(U_j\), it is a sample from \(p(u\vert \mathcal{E}_{i})\). This process is repeated to generate \((\frac{1}{p_{0}} - 1)\) such offspring trajectories from each mother trajectory, giving a total of \(np_{0}(\frac{1}{p_{0}} - 1) + np_{0} = n\) input histories that are samples from \(p(u\vert \mathcal{E}_{i})\) at the \(i\)th level.
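The splitting construction can be sketched for a toy causal system in which the response at time \(t\) is the running sum of the i.i.d. Gaussian inputs up to \(t\); the threshold \(b_i\) and trajectory length are illustrative, and `response` is a hypothetical stand-in for solving (30.1). The key property, that the offspring inherits the mother's prefix and hence remains in \(\mathcal{E}_i\), holds by causality.

```python
import numpy as np

rng = np.random.default_rng(2)
T = 50                       # number of time steps (illustrative)
b_i = 4.0                    # current intermediate threshold (illustrative)

def response(u):
    """Toy causal system: the performance at time t depends only on the
    inputs up to t (here, g(X(t)) is simply the running sum)."""
    return np.cumsum(u)

# Find a mother trajectory that reaches E_i = {max_t g(X(t)) > b_i}.
while True:
    mother = rng.standard_normal(T)
    g_mother = response(mother)
    if g_mother.max() > b_i:
        break

t_j = int(np.argmax(g_mother > b_i))     # first-passage time, Eq. (30.30)

# Split: keep the prefix U^- up to t_j, redraw the tail U^+ from p(u),
# which is valid by (30.31) when the inputs are independent.
offspring = mother.copy()
offspring[t_j + 1:] = rng.standard_normal(T - t_j - 1)

# By causality, the offspring's response agrees with the mother's up to
# t_j, so the offspring also lies in E_i.
g_offspring = response(offspring)
print(g_offspring[t_j] > b_i)   # True
```

Repeating the tail-resampling step \((\frac{1}{p_0} - 1)\) times per mother reproduces the sample count bookkeeping described above.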

The pseudo-code for the splitting version of subset simulation is the same as the previously presented pseudo-code for the MCMC version except that the part describing the generation of conditional samples at level i using the MMA algorithm is replaced by

To generate the same number of samples \(n\) at a level, the splitting version of subset simulation is slightly more efficient than the MCMC version using MMA: when an offspring input trajectory \(\widetilde{U} = [\widetilde{U}^{-},\widetilde{U}^{+}]\) is generated, the first part \(\widetilde{X}^{-}\) of the corresponding output trajectory \(\widetilde{X} = [\widetilde{X}^{-},\widetilde{X}^{+}]\) is already available, so (30.1) need only be solved for \(\widetilde{X}^{+}\), starting from the final value of \(\widetilde{X}^{-}\) (which corresponds to the first-passage time of the trajectory). A disadvantage of the splitting version is that it cannot handle parameter uncertainty in the model in (30.1), since the offspring trajectories must use (30.1) with the same parameter values as their mothers. Furthermore, the splitting version applies only to dynamic problems, as considered here. The MCMC version of subset simulation can handle parameter uncertainty and is applicable to both static and dynamic uncertainty quantification problems.

Ching, Au, and Beck [17] discuss the statistical properties of the estimators corresponding to (30.22) and (30.25) when the sampling at each level is done by the trajectory-splitting method. They show that as long as the conditional probability in subset simulation satisfies \(p_0 \geq 0.1\), the coefficient of variation of the estimate of \(p_{\mathcal{E}}\) obtained from (30.21) and (30.25) is insensitive to \(p_0\).

Ching, Beck, and Au [18] also introduce a hybrid version of subset simulation that combines some advantages of the splitting and MCMC versions when generating the conditional samples \(U_j\), \(j = 1,\ldots,n\), at each level. It is limited to dynamic problems because of the splitting, but it can handle parameter uncertainty through MCMC. All three variants of subset simulation are applied to a series of benchmark reliability problems in [6]; the results imply that, for the same computational effort on the dynamic benchmark problems, the hybrid version gives slightly better accuracy for the rare-event probability than the MCMC version. For a comparison between these results and those of other stochastic simulation methods applied to some of the same benchmark problems (e.g., spherical subset simulation, the auxiliary domain method, and line sampling), the reader may wish to check [47].

6 Illustrative Example

To illustrate MCS, IS, and SS with MCMC and splitting for rare-event estimation, consider the following forced Lorenz system of ordinary differential equations:

$$\displaystyle\begin{array}{rcl} \dot{X_{1}} =\sigma (X_{2} - X_{1}) + U(t)& &{}\end{array}$$
(30.32)
$$\displaystyle\begin{array}{rcl} \dot{X_{2}} = rX_{1} - X_{2} - X_{1}X_{3}& &{}\end{array}$$
(30.33)
$$\displaystyle\begin{array}{rcl} \dot{X_{3}} = X_{1}X_{2} - bX_{3}& &{}\end{array}$$
(30.34)

where \(X(t) = (X_1(t), X_2(t), X_3(t))\) defines the system state at time \(t\) and \(U(t)\) is the external excitation of the system. If \(U(t) \equiv 0\), these are the original equations that E. N. Lorenz derived from a model of fluid convection [36]. In this example, the three parameters are set to \(\sigma = 3\), \(r = 26\), and \(b = 1\). It is well known (e.g., [50]) that in this case the Lorenz system has three unstable equilibrium points, one of which is

$$\displaystyle{ X^{{\ast}} = \left (\sqrt{b(r - 1)},\sqrt{b(r - 1)},r - 1\right ) = (5,5,25) }$$
(30.35)

that lies on one “wing” of the “butterfly” attractor. Let

$$\displaystyle{ X(0) = X^{{\ast}} + (1/2,1/2,1/2) = (5.5,5.5,25.5) }$$
(30.36)

be the initial condition and \(X(t)\) be the corresponding solution. Lorenz showed [36] that the solution of (30.32), (30.33), and (30.34) with \(U(t) \equiv 0\) always (i.e., for any \(t\)) stays inside the bounding ellipsoid \(\mathbb{E}\):

$$\displaystyle{ \frac{X_{1}(t)^{2}} {R^{2}\frac{b} {\sigma } } + \frac{X_{2}(t)^{2}} {bR^{2}} + \frac{(X_{3}(t) - R)^{2}} {R^{2}} \leq 1,\quad R = r+\sigma }$$
(30.37)

Suppose that the system is now excited by U(t) = αB(t), where B(t) is the standard Brownian process (Gaussian white noise) and α is some scaling constant. The uncertain stochastic excitation U(t) makes the corresponding system trajectory X(t) also stochastic. Let us say that the event \(\mathcal{E}\) occurs if X(t) leaves the bounding ellipsoid \(\mathbb{E}\) during the time interval of interest [0, T].

The discretization of the excitation U is obtained by the standard discretization of the Brownian process:

$$\displaystyle{ U(0) = 0,\quad U(k) =\alpha B(k\varDelta t) = U(k - 1) +\alpha \sqrt{\varDelta t}Z_{k} =\alpha \sqrt{\varDelta t}\sum _{i=1}^{k}Z_{ i}, }$$
(30.38)

where \(\varDelta t = 0.1\,\mathrm{s}\) is the sampling interval, \(k = 1,\ldots,D = T/\varDelta t\), and \(Z_1,\ldots,Z_D\) are i.i.d. standard Gaussian random variables. The target domain \(\mathcal{E}\subset \mathbb{R}^{D}\) is then

$$\displaystyle{ \mathcal{E} =\{ (Z_{1},\ldots,Z_{D}):\max _{0\leq k\leq D}g(k)> 1\}, }$$
(30.39)

where the system response g(k) at time t = kΔt is

$$\displaystyle{ g(k) = \frac{X_{1}(k\varDelta t)^{2}} {R^{2}\frac{b} {\sigma } } + \frac{X_{2}(k\varDelta t)^{2}} {bR^{2}} + \frac{(X_{3}(k\varDelta t) - R)^{2}} {R^{2}} }$$
(30.40)

Figure 30.1 shows the solution of the unforced Lorenz system (with α = 0 so U(t) = 0), and an example of the solution of the forced system (with α = 3) that corresponds to excitation \(U \in \mathcal{E}\) (slightly abusing notation, \(U = U(Z_{1},\ldots,Z_{D}) \in \mathcal{E}\) means that the corresponding Gaussian vector \((Z_{1},\ldots,Z_{D}) \in \mathcal{E}\)).

Fig. 30.1
figure 1

The left column shows the solution of the unexcited Lorenz system (\(\alpha = 0\)) enclosed in the bounding ellipsoid \(\mathbb{E}\) (top) and the corresponding response function \(g(t)\) (bottom), where \(t \in [0, T]\), \(T = 100\). The top right panel shows the solution of the forced Lorenz system (\(\alpha = 3\)) corresponding to an excitation \(U \in \mathcal{E}\); as can be clearly seen, this solution leaves the ellipsoid \(\mathbb{E}\). According to the response function \(g(t)\) shown in the bottom right panel, this first-passage event happens around \(t = 90\)
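A minimal sketch of the unforced system (30.32), (30.33), and (30.34), using a classical fourth-order Runge-Kutta integrator, can be used to check the equilibrium (30.35), the initial condition (30.36), and the bounding-ellipsoid property (30.37). The step size, horizon, and helper names (`rhs`, `g`, `rk4_step`) are illustrative choices, not part of the chapter.

```python
import numpy as np

sigma, r, b = 3.0, 26.0, 1.0
R = r + sigma                                   # = 29

def rhs(x, u=0.0):
    """Right-hand side of the forced Lorenz system (30.32)-(30.34)."""
    x1, x2, x3 = x
    return np.array([sigma * (x2 - x1) + u,
                     r * x1 - x2 - x1 * x3,
                     x1 * x2 - b * x3])

def g(x):
    """Response function (30.40): > 1 means outside the ellipsoid (30.37)."""
    x1, x2, x3 = x
    return x1**2 / (R**2 * b / sigma) + x2**2 / (b * R**2) + (x3 - R)**2 / R**2

# Unstable equilibrium (30.35) and perturbed initial condition (30.36)
x_star = np.array([np.sqrt(b * (r - 1)), np.sqrt(b * (r - 1)), r - 1.0])
x0 = x_star + 0.5

def rk4_step(x, dt, u=0.0):
    """One classical RK4 step (the forcing u is held fixed over the step)."""
    k1 = rhs(x, u); k2 = rhs(x + 0.5 * dt * k1, u)
    k3 = rhs(x + 0.5 * dt * k2, u); k4 = rhs(x + dt * k3, u)
    return x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)

# Integrate the unforced system (U = 0); dt and T are illustrative.
dt, T = 0.01, 10.0
x, g_max = x0.copy(), g(x0)
for _ in range(int(T / dt)):
    x = rk4_step(x, dt)
    g_max = max(g_max, g(x))
print(g_max < 1.0)   # the unforced trajectory stays inside the ellipsoid
```

With these parameter values, \(X^* = (5, 5, 25)\) is indeed an equilibrium (the right-hand side vanishes there), and \(g(X(0)) \approx 0.16\), well inside the ellipsoid.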

Monte Carlo Simulation: For α = 3, Fig. 30.2 shows the probability \(p_{\mathcal{E}}\) of event \(\mathcal{E}\) as a function of T estimated using standard MCS:

$$\displaystyle{ \hat{p}_{\mathcal{E}}^{MCS} = \frac{1} {N}\sum _{i=1}^{N}I_{ \mathcal{E}}(Z^{(i)}) }$$
(30.41)

where \(Z^{(i)} = (Z_1^{(i)},\ldots,Z_D^{(i)}) \sim \phi(z)\) are i.i.d. samples from the standard \(D\)-dimensional Gaussian PDF \(\phi(z)\). For each value of \(T\), \(N = 10^4\) samples were used. When \(T < 25\), the accuracy of the MCS estimate (30.41) begins to degenerate, since the total number of samples \(N\) becomes too small for the corresponding target probability. Moreover, for \(T < 15\), none of the \(N\) generated MCS samples belong to the target domain \(\mathcal{E}\), making the MCS estimate zero. Figure 30.2 shows, as expected, that \(p_{\mathcal{E}}\) is an increasing function of \(T\): the more time the system has, the more likely it is that its trajectory eventually penetrates the boundary of the ellipsoid \(\mathbb{E}\).

Fig. 30.2
figure 2

The top panel shows the estimate of the probability \(p_{\mathcal{E}}\) of event \(\mathcal{E}\) for \(\alpha = 3\) as a function of the duration time \(T\). For each value of \(T \in [5, 100]\), \(N = 10^4\) samples were used in MCS, and \(n = 2 \times 10^3\) samples per conditional level were used in the two versions of SS. The MCS and SS/splitting estimates of \(p_{\mathcal{E}}\) are zero for \(T < 15\) and \(T < 12\), respectively. The bottom panel shows the total computational effort automatically chosen by both SS algorithms

Importance Sampling: IS is a variance reduction technique and, as discussed in previous sections, its efficiency critically depends on the choice of the ISD \(q\). Usually some geometric information about the target domain \(\mathcal{E}\) is needed to construct a good ISD. To get some intuition, Fig. 30.3 shows the domain \(\mathcal{E}\) for two lower-dimensional cases: \(T = 1\), \(\varDelta t = 0.5\) (\(D = 2\)) and \(T = 1.5\), \(\varDelta t = 0.5\) (\(D = 3\)). Notice that in both cases \(\mathcal{E}\) consists of two well-separated subsets, \(\mathcal{E} = \mathcal{E}_{-}\cup \mathcal{E}_{+}\), which are approximately symmetric about the origin. This suggests that a good ISD must be a mixture of two distributions \(q_-\) and \(q_+\) that effectively sample \(\mathcal{E}_{-}\) and \(\mathcal{E}_{+}\),

$$\displaystyle{ q(z) = \frac{q_{-}(z) + q_{+}(z)} {2} }$$
(30.42)
Fig. 30.3
figure 3

Left panel: visualization of the domain \(\mathcal{E}\) in the two-dimensional case \(D = 2\), where \(T = 1\), \(\varDelta t = 0.5\), and \(\alpha = 20\). \(N = 10^4\) samples were generated and marked by red circles (respectively, green dots) if they do (respectively, do not) belong to \(\mathcal{E}\). Right panel: the same as the left panel but with \(D = 3\) and \(T = 1.5\)

In this example, three different ISDs, denoted \(q_1\), \(q_2\), and \(q_3\), are considered:

Case 1:

\(q_{\pm }(z) =\phi (z\vert \pm z_{\mathcal{E}})\), where \(z_{\mathcal{E}}\sim \phi (z\vert \mathcal{E})\). That is, we first generate a sample \(z_{\mathcal{E}}\in \mathcal{E}\) and then take the ISD \(q_1\) to be the mixture of Gaussian PDFs centered at \(z_{\mathcal{E}}\) and \(-z_{\mathcal{E}}\).

Case 2:

\(q_{\pm }(z) =\phi (z\vert \pm z_{\mathcal{E}}^{{\ast}})\), where \(z_{\mathcal{E}}^{{\ast}}\) is obtained as follows. First we generate \(n = 1000\) samples from \(\phi(z)\) and define \(z_{\mathcal{E}}^{{\ast}}\) to be the sample in \(\mathcal{E}\) with the smallest norm. The sample \(z_{\mathcal{E}}^{{\ast}}\) can be interpreted as the "best representative" of \(\mathcal{E}_{-}\) (or \(\mathcal{E}_{+}\)), since \(\phi (z_{\mathcal{E}}^{{\ast}})\) has the largest value among the generated samples in \(\mathcal{E}\). We then take the ISD \(q_2\) to be the mixture of Gaussian PDFs centered at \(z_{\mathcal{E}}^{{\ast}}\) and \(-z_{\mathcal{E}}^{{\ast}}\).

Case 3:

To illustrate what happens if one ignores the geometric information about the two components of \(\mathcal{E}\), we choose \(q_{3}(z) =\phi (z\vert z_{\mathcal{E}}^{{\ast}})\), with \(z_{\mathcal{E}}^{{\ast}}\) as defined in Case 2.

Let \(T = 1\) and \(\alpha = 20\). The dimension of the uncertain excitation space is then \(D = 10\). Table 30.1 shows the simulation results for the three cases above as well as for standard MCS. The IS method with \(q_1\), on average, correctly estimates \(p_{\mathcal{E}}\). However, the c.o.v. of the estimate is very large, which results in large fluctuations of the estimate across independent runs. IS with \(q_2\) works very well and outperforms MCS: the c.o.v. is reduced by half. Finally, IS with \(q_3\) completely misses one component of the target domain \(\mathcal{E}\), and the resulting estimate is about half of the correct value. Note that the c.o.v. in this case is very small, which is very misleading.
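The two-component mixture ISD (30.42) can be illustrated on a simplified analog of this geometry: a \(D\)-dimensional standard Gaussian with rare event \(\{|z_1| > a\}\), whose two components are symmetric about the origin as in Fig. 30.3. The centers \(\pm a e_1\) play the role of \(\pm z_{\mathcal{E}}^{\ast}\); the dimensions, threshold, and sample size are illustrative, not the Lorenz example's values.

```python
import math
import numpy as np

rng = np.random.default_rng(3)
D, a, N = 10, 3.0, 20_000

def log_phi(z):
    """Log-density of the standard D-dimensional Gaussian phi(z)."""
    return -0.5 * np.sum(z**2, axis=1) - 0.5 * D * math.log(2 * math.pi)

center = np.zeros(D)
center[0] = a                                 # analog of z*_E for each component

# Sample from the two-component mixture q = (q_- + q_+)/2, Eq. (30.42):
# pick a component at random, then draw a unit-covariance Gaussian around it.
signs = rng.choice([-1.0, 1.0], size=N)
Z = rng.standard_normal((N, D)) + signs[:, None] * center

def log_q(z):
    """Log-density of the mixture 0.5*N(z|+c, I) + 0.5*N(z|-c, I),
    computed stably via log-sum-exp."""
    d_plus = np.sum((z - center)**2, axis=1)
    d_minus = np.sum((z + center)**2, axis=1)
    m = -0.5 * np.minimum(d_plus, d_minus)
    return (m + np.log(0.5 * (np.exp(-0.5 * d_plus - m)
                              + np.exp(-0.5 * d_minus - m)))
            - 0.5 * D * math.log(2 * math.pi))

in_E = np.abs(Z[:, 0]) > a                    # toy two-component rare event
w = np.exp(log_phi(Z) - log_q(Z))             # importance weights phi/q
p_hat = np.mean(in_E * w)
p_true = math.erfc(a / math.sqrt(2))          # exact value: 2*Phi(-a)
print(f"p_hat = {p_hat:.5f}, exact = {p_true:.5f}")
```

Because the mixture covers both components, the weighted estimator is accurate with a small c.o.v.; dropping one component (the analog of \(q_3\)) would instead converge to roughly half the true value, as in Table 30.1.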

Table 30.1 Simulation results for IS and MCS. For each method, the mean value \(\langle \hat{p}_{\mathcal{E}}\rangle\) of the estimates and their coefficient of variation \(\delta (\hat{p}_{\mathcal{E}})\) are based on 100 independent runs

It was mentioned in previous sections that IS is often not efficient in high dimensions because it becomes more difficult to construct a good ISD [5, 32]. To illustrate this effect, IS with q 2 was used to estimate \(p_{\mathcal{E}}\) for a sequence of problems where the total duration time gradually grows from T = 1 to T = 10. This results in an increase of the dimension D of the underlying uncertain excitation space from 10 to 100. Figure 30.4 shows how the IS estimate degenerates as the dimension D of the problem increases. While IS is accurate when D = 10 (T = 1), it strongly underestimates the true value of \(p_{\mathcal{E}}\) as D approaches 100 (T = 10).

Fig. 30.4
figure 4

Estimation of the target probability \(p_{\mathcal{E}}\) as a function of the duration time \(T\). The solid red and dashed blue curves correspond to MCS and IS with \(q_2\), respectively. In this example, \(\alpha = 20\), and \(N = 10^4\) samples are used for each value of \(T\). It is clearly visible how the IS estimate degenerates as the dimension \(D\) grows from 10 (\(T = 1\)) to 100 (\(T = 10\))

Subset Simulation: SS is a more advanced simulation method and, unlike IS, it does not suffer from the curse of dimensionality. For \(\alpha = 3\), Fig. 30.2 shows the estimate of the target probability \(p_{\mathcal{E}}\) as a function of \(T\) using SS with MCMC and with splitting. For each value of \(T\), \(n = 2 \times 10^3\) samples were used at each conditional level of SS. Unlike MCS, SS is capable of efficiently simulating very rare events and estimating their small probabilities. The total computational effort, i.e., the total number \(N\) of samples automatically chosen by SS, is shown in the bottom panel of Fig. 30.2. Note that the larger the value of \(p_{\mathcal{E}}\), the smaller the number of conditional levels in SS and, therefore, the smaller the total number of samples \(N\). The total computational effort in SS is thus a decreasing function of \(T\). In this example, the original MCMC strategy [4] for generating conditional samples outperforms the splitting strategy [17] that exploits the causality of the system: while the SS/MCMC method works even in the most extreme case (\(T = 5\)), the SS/splitting estimate of \(p_{\mathcal{E}}\) becomes zero for \(T < 12\).

7 Conclusion

This chapter examines computational methods for rare-event simulation in the context of uncertainty quantification for dynamic systems that are subject to future uncertain excitation modeled as a stochastic process. The rare events are assumed to correspond to some time-varying performance quantity exceeding a specified threshold over a specified time duration, which usually means that the system performance fails to meet some design or operation specifications.

To analyze the reliability of the system against this performance failure, a computational model for the input-output behavior of the system is used to predict the performance of interest as a function of the input stochastic process discretized in time. This dynamic model may involve explicit treatment of parametric and nonparametric uncertainties that arise because the model only approximately describes the real system behavior, implying that there are usually no true values of the model parameters and the accuracy of its predictions is uncertain. In the engineering literature, the mathematical problem to be solved numerically for the probability of performance failure, commonly called the failure probability, is referred to as the first-passage reliability problem. It does not have an analytical solution, and numerical solutions must face two challenging aspects:

  1. The vector representing the time-discretized stochastic process that models the future system excitation lies in an input space of high dimension;

  2. The dynamic systems of interest are assumed to be highly reliable, so that their performance failure is a rare event; that is, the probability of its occurrence, \(p_{\mathcal{E}}\), is very small.

As a result, standard Monte Carlo simulation and importance sampling methods are not computationally efficient for first-passage reliability problems. On the other hand, subset simulation has proved to be a general and powerful method for the numerical solution of these problems. Like MCS, it is not affected by the dimension of the input space, and in a single run it produces a plot of \(p_{\mathcal{E}}\) vs. threshold \(b\) covering \(p_{\mathcal{E}}\in [p_{0}^{L},1]\), where \(L\) is the number of levels used. For a critical appraisal of methods for first-passage reliability problems in high dimensions, the reader may wish to check Schuëller et al. [48].

Several variants of subset simulation have been developed, motivated by the goal of further improving the computational efficiency of the original version, although the efficiency gains, if any, are modest. All of them have an accuracy described by a coefficient of variation of the estimate of the rare-event probability that depends on \(\ln (1/p_{\mathcal{E}})\) rather than \(\sqrt{1/p_{\mathcal{E}}}\) as in standard Monte Carlo simulation. For all methods covered in this section, the dependence of this coefficient of variation on the number of samples \(N\) is proportional to \(N^{-1/2}\). Therefore, for very low probabilities \(p_{\mathcal{E}}\), achieving acceptable accuracy still requires thousands of simulations (large \(N\)) of the response time history based on a dynamic model as in (30.1). For complex models, this computational effort may be prohibitive.

One approach to reduce the computational effort when estimating very low rare-event probabilities is to utilize additional information about the nature of the problem for specific classes of reliability problems (e.g., [2, 3]). Another more general approach is to construct surrogate models (meta-models) based on using a relatively small number of complex-model simulations as training data. The idea is to use a trained surrogate model to rapidly calculate an approximation of the response of the complex computational model as a substitute when drawing new samples. Various methods for constructing surrogate models have been applied in reliability engineering, including response surfaces [14], support vector machines [13, 23], neural networks [41], and Gaussian process modeling (Kriging) [21]. The latter method is a particularly powerful one because it also provides a probabilistic assessment of the approximation error. It deserves further exploration, especially with regard to the optimal balance between the accuracy of the surrogate model as a function of the number of training samples from the complex model and the accuracy of the estimate of the rare-event probability as a function of the total number of samples from both the complex model and the surrogate model.