An alternative to importance sampling in estimating rare events and related functionals is multilevel splitting. In the context of estimating probabilities of a set \(\mathscr {C}\) in path space, the multilevel splitting philosophy is to simulate particles that evolve according to the law of \(\left\{ X_{i}\right\} \), and at certain times split those particles considered more likely to lead to a trajectory that belongs to the set \(\mathscr {C}\). For example, \(\mathscr {C}\) might be the trajectories that reach some unlikely set B before hitting a likely set A, after starting in neither A nor B. In this case, the splitting will favor migration toward B. Splitting can also be used to enhance the sampling of regions that are important for a given integral. In all cases, particles which are split are given an appropriate weighting to ensure that the algorithm remains unbiased.

Broadly speaking, there are two types of multilevel splitting algorithms, those with killing and those without, where stopping is distinguished from killing. In the example just mentioned, particles are stopped upon entry into either A or B. Killing involves abandoning a particle prior to entry into either A or B, presumably because continuation of the trajectory is not worth the computational effort. Care must be taken that any killing will not introduce bias.

To the authors’ knowledge, there is only one type of multilevel splitting algorithm without killing—the splitting algorithm (see [148] for further references). The standard implementation of this algorithm requires a sequence of sets \(C_{J}\supset C_{J-1}\supset \cdots \supset C_{0} \), called splitting thresholds, and a sequence of positive integers \(R_{J-1},\ldots , R_{0}\), called splitting rates. A single particle is started at the initial position \(x_{0}\in C_{J}\backslash C_{J-1}\) and evolves according to the law of \(\left\{ X_{i}\right\} \). When a particle enters a set \(C_{j}\) for the first time, it produces \(R_{j}-1\) offspring. After splitting has occurred, all particles evolve independently of each other. Each particle is stopped according to whatever stopping rule is associated with \(\left\{ X_{i}\right\} \), and the algorithm terminates when all the particles generated have been stopped. The probability of interest is approximated by \(N/\prod _{i=0}^{J-1}R_{i}\), where N is the number of particles simulated whose trajectories belong to \(\mathscr {C}\). A more general version of this algorithm lets the splitting rates \(R_{i}\) take nonnegative real values, in which case the number of offspring is randomized.
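To make the mechanics concrete, here is a minimal sketch of ordinary splitting for a toy example of our own (not from the text): a nearest-neighbor random walk started at 1, with \(A=\{0\}\), \(B=\{b\}\), splitting thresholds at the integer levels \(2,\ldots , b-1\), and a common splitting rate R. The gambler's-ruin formula gives the exact value for comparison.

```python
import random

def ordinary_splitting(p, b, R, rng):
    """One run of ordinary multilevel splitting for a nearest-neighbor
    walk (up with probability p) started at 1; A = {0}, B = {b}.  Each
    particle produces R - 1 offspring on its first entry into each of
    the levels 2, ..., b - 1.  Returns the estimate N / R**(b - 2)."""
    hits = 0
    # stack of (position, next level at which this particle splits)
    stack = [(1, 2)]
    while stack:
        x, nxt = stack.pop()
        while 0 < x < nxt:          # simulate until absorbed or at nxt
            x += 1 if rng.random() < p else -1
        if x == 0:
            continue                # stopped in A: no contribution
        if nxt == b:
            hits += 1               # reached B
        else:
            stack.extend([(x, nxt + 1)] * R)   # split R-fold at level nxt
    return hits / R ** (b - 2)

rng = random.Random(0)
p, b, R = 0.3, 5, 2
est = sum(ordinary_splitting(p, b, R, rng) for _ in range(20000)) / 20000
r = (1 - p) / p                     # gambler's-ruin comparison value:
exact = (r - 1) / (r ** b - 1)      # P(hit b before 0 starting from 1)
```

With p = 0.3, b = 5, and R = 2, the averaged estimates agree with the exact probability (about 0.0196) to within Monte Carlo error.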

A large deviation analysis of ordinary splitting is given in [76], which shows that it performs quite well when the thresholds are chosen properly. Although the splitting algorithm can be very effective, there is one clear source of inefficiency in dealing with rare events. The vast majority of the particles generated will not have trajectories that belong to the set \(\mathscr {C}\), and so much of the computational effort is devoted to generating trajectories that do not make any direct contribution. Multilevel splitting algorithms with killing were introduced as a way to mitigate this problem. One of the first such algorithms was the RESTART (Repetitive Simulation Trials After Reaching Threshold) algorithm, introduced in [241, 242] (for others, see [208] and the references therein). Its implementation is identical to that of the standard splitting algorithm except that particles are split every time they enter a splitting threshold (and not just at the first entrance time, as with the standard splitting scheme), and particles are killed when they exit the splitting threshold in which they were born. The initial particle is assumed to be born in the set \(C_{J}\), which by convention is equal to the state space of the process \(\left\{ X_{i}\right\} \), and so this particle is never killed.

The standard version of the RESTART algorithm requires that the splitting rates be integer-valued and that the process not cross more than one splitting threshold at each time step. However, its definition implicitly allows one to design an algorithm in which the process can cross more than one threshold in each time step. The issue of allowing the process to cross more than one threshold in any given time step was first addressed explicitly in the context of the DPR (Direct Probability Redistribution) algorithm, introduced in [152, 153].

In this chapter we will develop a theory for multilevel splitting, and in particular the RESTART/DPR algorithm, which parallels the theory for importance sampling that was developed in the last two chapters. Since the algorithm is notationally much more complicated than importance sampling, to simplify the presentation we consider only the case of estimating small probabilities, and refer to [77] for expected values.

Although the statements of performance analysis for splitting are often similar to those for importance sampling, it should be noted that there is an important distinction between the types of subsolution required for the two methods. For importance sampling, one needs functions that are classical-sense subsolutions. In contrast, splitting-type schemes require only a subsolution in a weak sense (see Definition 16.12). Indeed, this is in some sense expected, since importance sampling uses the gradient of the subsolution to construct the algorithm, while splitting uses only the function itself. For some problems it is easier to construct weak-sense subsolutions, in which case splitting-type schemes can be easier to apply. These and related issues will be discussed and illustrated by examples in Chap. 17.

When comparing importance sampling and splitting, one must recognize that the work used to produce samples need not be the same, and in fact, depending on circumstances, one method can be strongly favored over the other. However, when one uses subsolutions to design splitting schemes, the comparison simplifies somewhat, especially if the performance measure for importance sampling is the exponential decay rate. Suppose one were to consider, say, a work-normalized relative error [see (16.21)]. We will show that the computing cost of splitting grows subexponentially when a subsolution is used. Thus the main issue in comparing splitting to importance sampling is to compare decay rates. This assumes that the implementation of importance sampling is relatively straightforward, i.e., given a subsolution, it is easy to compute the needed alternative sampling distribution. Since splitting simulates using only the original dynamics, it may be preferred when this is not the case, as in some multiscale models.

In the rest of this chapter, unless explicitly stated otherwise, by a splitting algorithm we mean the RESTART/DPR algorithm. We focus on this version of splitting, since in our experience it is usually preferable to ordinary splitting; we note that analogous versions of all the statements presented here apply to ordinary splitting [76], with much easier proofs. In Sect. 16.3, we derive formulas for the computational cost and second moment of the algorithm. These are used in Sect. 16.4, which considers the asymptotic problem and develops a method for designing RESTART/DPR algorithms based on the subsolution framework. Expressions for the asymptotic work-normalized error of such algorithms are derived using the formulas of the previous section, and subsolutions that lead to asymptotically optimal performance are identified. The formulation of splitting appropriate for finite-time problems of the type considered in the context of importance sampling in Chaps. 14 and 15 is presented in Sect. 16.5.

1 Notation and Terminology

Let \(\left\{ X_{i}\right\} _{i\in \mathbb {N}_{0}}\) be a Markov chain with state space \(\mathbb {R}^{d}\) for some \(d\in \mathbb {N}\). Although we will later consider processes \(\left\{ X_{i}\right\} _{i\in \mathbb {N}_{0}}\) as elements of a sequence that satisfies a large deviation property, for notational simplicity the large deviation index is initially suppressed. Until Sect. 16.5, we focus on estimating

$$\begin{aligned} P_{x_{0}}\{X_{M}\in B\}, \end{aligned}$$
(16.1)

where \(M\doteq \inf \{i:X_{i}\in A\cup B\}\), and as in Sect. 14.1, A is open, B is closed, and \(A\cap B=\emptyset \). Although not necessary, to simplify some arguments we will assume, as in Remark 15.8, that \((A\cup B)^{c}\) is bounded, and we denote its closure by D.
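Before any variance reduction, (16.1) can be estimated by standard Monte Carlo; the sketch below (the kernel, the sets A and B, and all names are our own toy choices) simply runs the chain to the stopping time M and counts hits of B. It is this baseline that splitting aims to improve.

```python
import random

def crude_mc(kernel, in_A, in_B, x0, n, rng):
    """Standard Monte Carlo estimate of P_{x0}(X_M in B), where
    M = inf{i : X_i in A union B}."""
    hits = 0
    for _ in range(n):
        x = x0
        while not (in_A(x) or in_B(x)):
            x = kernel(x, rng)       # one step of the Markov chain
        if in_B(x):
            hits += 1
    return hits / n

# toy chain: nearest-neighbor walk, up with probability p
rng = random.Random(1)
p, b = 0.3, 5
kernel = lambda x, rng: x + (1 if rng.random() < p else -1)
est = crude_mc(kernel, lambda x: x <= 0, lambda x: x >= b, 1, 20000, rng)
r = (1 - p) / p
exact = (r - 1) / (r ** b - 1)       # gambler's-ruin value of (16.1)
```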

The following notation will be used. Branching processes, which take values in \(\cup _{n=1}^{\infty }(\mathbb {R}^{d})^{n}\), are denoted by \(\left\{ Z_{i}\right\} _{i\in \mathbb {N}_{0}}\). Each branching process has an \(\mathbb {N}\)-valued process \(\{N_{i}\}_{i\in \mathbb {N}_{0}}\) associated with it, where \(N_{i}\) equals the number of particles present in the branching process at time i. As will be explained later in the section, particles are born through branching of existing particles when they reach certain thresholds, while particles die when they exit certain regions.

For each \(i\in \mathbb {N}_{0}\) and \(j=1,\ldots , N_{i}\), \(Z_{i, j}\) denotes the state of the jth particle at time i. We also define a random measure on \(\mathbb {R}^{d}\) associated with such a branching process by

$$ \bar{\delta }_{Z_{i}}\doteq \sum _{j=1}^{N_{i}}\delta _{Z_{i, j}}. $$

Note that this is typically not a probability measure, and it is referred to as the unnormalized empirical measure. If \(Z_{i, j}\) is in either A or B, then it is killed at the next time step, and so will be counted only once in this measure. Note that this killing is distinct from the killing introduced for algorithmic efficiency.
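In code, integrating a function against the unnormalized empirical measure \(\bar{\delta }_{Z_{i}}\) amounts to summing over the particle positions; a minimal sketch (names are ours):

```python
def integrate(h, particles):
    """Integral of h against the unnormalized empirical measure
    sum_j delta_{Z_{i,j}}, where `particles` lists the positions
    Z_{i,1}, ..., Z_{i,N_i} of the particles alive at time i."""
    return sum(h(z) for z in particles)

# total mass is N_i, in general not 1, so this is not a probability measure
total_mass = integrate(lambda z: 1.0, [0.5, 0.5, 2.0])
```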

Splitting schemes are often defined in terms of “importance functions.” Later on, these importance functions will be identified with subsolutions translated by a constant, and we use U to denote such an object. To be precise, an importance function is a continuous mapping \(U:\mathbb {R}^{d} \rightarrow \mathbb {R}\) that is bounded from below. As we will see, it is only the relative values of U(x) at different points that matter, and so we assume for simplicity of notation that \(U(x)\ge 0\) for all \(x \in \mathbb {R}^d\). There is also a parameter \(\varDelta \in (0,\infty )\) such that \(R\doteq e^{\varDelta } \in \{2,3,\ldots \}\), and we define closed sets \(C_{j}\) by

$$ C_{j}\doteq \{x\in D:U(x)\le j\varDelta \} $$

for \(0\le j\le J-1\doteq \left\lceil U(x_{0})/\varDelta \right\rceil -1\) and \(C_{J}\doteq D\). Note that \(x_{0}\notin C_{J-1}\). We also define a piecewise constant function \(\bar{U}\) by

$$ \bar{U}(x)=j\varDelta \text { for }x\in C_{j}\backslash C_{j-1},\; j=0,1,\ldots , J, $$

where we follow the convention \(C_{-1}=\emptyset \), so that in particular \(\bar{U}(x)=0\) for \(x\in C_{0}\). After we introduce the large deviation scaling, it will be possible to obtain, in a convenient manner, a collection of importance functions corresponding to a collection of values of the large deviation index from a single “generating” function. While it would be possible to allow the splitting rate R or the spacing \(\varDelta \) between levels to depend on j, we will not do so once this scaling is used, and so to simplify notation we do not do so here either.
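The definitions above translate directly into code. The helper below (a sketch with our own names, for a scalar importance function) computes J, R, \(\sigma \), and \(\bar{U}\), and illustrates the identity \(e^{-\bar{U}(x)}=R^{-\sigma (x)}\), recorded later as (16.2).

```python
import math

def make_levels(U, x0, Delta):
    """Given an importance function U >= 0 and a spacing Delta with
    R = e**Delta an integer, return J, R and the functions sigma, Ubar,
    where C_j = {x : U(x) <= j*Delta} for j < J and C_J = D."""
    J = math.ceil(U(x0) / Delta)     # so J - 1 = ceil(U(x0)/Delta) - 1
    R = round(math.exp(Delta))
    def sigma(x):
        # the index j with x in C_j \ C_{j-1}; everything else sits in C_J
        return min(math.ceil(U(x) / Delta), J)
    def Ubar(x):
        return sigma(x) * Delta
    return J, R, sigma, Ubar

# hypothetical example: U(x) = |x|, Delta = log 2, x0 = 3
J, R, sigma, Ubar = make_levels(abs, 3.0, math.log(2))
# sigma(x0) = J confirms x0 is not in C_{J-1}
```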

Later on, U will be derived from a subsolution \(\bar{V}\) that satisfies a boundary condition (i.e., \(\bar{V}(x)\le 0\) for \(x\in B\)). One possibility will be to let \(c\doteq \min [\bar{V}(x):x\in B]\), with \(c\le 0\) due to the boundary condition, and then let U(x) be equal to \([\bar{V}(x)-c]\vee 0\). In this case, as illustrated in Fig. 16.1, \(C_{0}\cap B\) may be smaller than B. Although the process stops when B is entered, if it crosses into \(C_{1}\) or \(C_{2}\) without entering B, the branching will continue. Thus the number of thresholds crossed before entering B depends on where B is entered. An alternative is to take U(x) equal to \(\bar{V}(x)\vee 0\), in which case \(\bar{V}(x)\le 0\) for \(x\in B\) implies \(B\subset C_{0}\), as in Fig. 16.2. The rate of decay of the second moment will be the same for both schemes, though one might expect slightly better performance from the first scheme.

Fig. 16.1 Splitting based on \([\bar{V}(x)-c]\vee 0\)

Fig. 16.2 Splitting based on \(\bar{V}(x)\vee 0\)

Given an importance function U and \(x\in D\), let \(\sigma (x)\) be the unique integer j such that \(x\in C_{j}\backslash C_{j-1}\). We let \(\bar{U}_{k}\) denote the common value of \(\bar{U}(x)\) for all x with \(\sigma (x)=k\) (i.e., \(x\in C_{k}\backslash C_{k-1}\)), and note that with this notation,

$$\begin{aligned} e^{-\bar{U}(x)}=e^{-\bar{U}_{\sigma (x)}}=e^{-\bar{U}_{k}}=R^{-k}\text { for all }x\in C_{k}\backslash C_{k-1}\text {, }k=J, J-1,\ldots , 0. \end{aligned}$$
(16.2)

With ordinary splitting, weights are assigned to particles so that a unit mass associated with a single particle starting at any location is partitioned evenly among the descendants at each splitting. Thus (16.2) gives natural weightings for the given splitting rates, in that the fraction of the mass associated with each such descendant after k thresholds have been crossed is \(R^{-k}\). The issue is more subtle with RESTART, since particles reaching B can be the result of more splitting events than the number of intervening thresholds (due to multiple reentries of a particle into a splitting threshold). Nonetheless, owing to the killing, \(R^{-k}\) is still the correct weight to apply, as will be shown in the proof of unbiasedness.

In the standard version of the RESTART algorithm, splitting is fairly simple. Every time a particle enters a splitting threshold \(C_{j}\), a deterministic number \(R-1\) of offspring is generated, so that including the parent, R particles result. Also, particles are destroyed when they exit the splitting threshold in which they were born, and thus each particle has an integer attached to it that records this splitting threshold. These are referred to as the support thresholds of the particles. While in general one could allow the number of offspring to be random and thereby accommodate arbitrary \(R\in (0,\infty )\), there seems to be little practical benefit in doing so, and we restrict to the case in which R is an integer. In this chapter we will consider multilevel splitting that accounts for the fact that particles can jump more than one level in a single step. In such a case, different support thresholds must be assigned to the offspring. In order to analyze this mathematically, it is convenient to use the following notation. Let \(\mathbb {S}\) be the set of elements \(q\in \mathbb {N}_{0}^{\infty }\) such that \(q_{j}=0\) for all sufficiently large j. Vectors \(q\in \mathbb {S}\) will be referred to as splitting vectors.

Consider a particle in a multilevel splitting algorithm that moves from \(C_{j}\backslash C_{j-1}\) to \(C_{k}\backslash C_{k-1}\), \(k<j\), in a given time step. Then splitting will occur, and all offspring as well as the original parent particle will be located in \(C_{k}\backslash C_{k-1}\). The support threshold of each new particle will be an element of \(\left\{ k,\ldots , j-1\right\} \), and the numbers of offspring and their support thresholds will be independent of all past data except through the values of j and k. It follows that the splitting of a particle is equivalent to assigning to each particle that splits a vector \(q\in \mathbb {S} \). The number of new particles will be equal to \(\sum _{l=0}^{\infty }q_{l}\), and precisely \(q_{l}\) of the new particles will be given support threshold l. Given that each particle generates \(R-1\) descendants upon moving from \(C_{j+1}\backslash C_{j}\) to \(C_{j}\backslash C_{j-1}\), it is clear that when moving from \(C_{j}\backslash C_{j-1}\) to \(C_{k}\backslash C_{k-1}\), we should use the splitting vector q(j, k) defined by \(q_{l}(j, k)\doteq 0\) if either \(l\ge j\) or \(l<k\), and

$$\begin{aligned} q_{l}(j, k)\doteq (R-1)R^{j-l-1}\;\text {if }k\le j-1\text { and }k\le l\le j-1. \end{aligned}$$
(16.3)

Note that \(\sum _{l\in \mathbb {N}_{0}}q_{l}(j, k)=R^{j-k}-1\), and so including the original particle, exactly \(R^{j-k}\) particles are produced. We take \(q_{l}(j, k)=0\) for all l if \(j\le k\).
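As a sketch in code (names are ours), the splitting vectors of (16.3) and the particle count just noted can be checked directly:

```python
def splitting_vector(j, k, R):
    """q(j, k) from (16.3): q_l(j, k) offspring receive support threshold
    l when a particle moves from C_j \\ C_{j-1} down into C_k \\ C_{k-1};
    the vector is zero if j <= k."""
    if j <= k:
        return {}
    return {l: (R - 1) * R ** (j - l - 1) for l in range(k, j)}

q = splitting_vector(4, 1, 2)   # example: j = 4, k = 1, R = 2
# total offspring is R**(j-k) - 1, so R**(j-k) particles counting the parent
```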

2 Formulation of the Algorithm

To define the algorithm, we assume the following condition. Conditions that imply the finiteness of M are given in Proposition 15.19.

Condition 16.1

M is almost surely finite.

As with accelerated Monte Carlo methods generally, the hope is that with a well-chosen importance function U, the variance of the estimator is made lower than that of standard Monte Carlo by building in information regarding the underlying process and the event of interest.

In order to analyze the algorithm, we will need some recursive formulas. Observe that if we examine a generic particle at some time after the algorithm has started, then it will be in a set of the form \(C_{j}\backslash C_{j-1}\) and have a support threshold in \(\left\{ j,\ldots , J\right\} \). For a Markov property to hold, if we imagine starting the process with an initial condition in \(C_{j}\backslash C_{j-1}\), \(j<J\), then the support threshold is part of the state variable, and so we must also assign a distribution to the support threshold that is consistent with the dynamics prior to entering \(C_{j}\backslash C_{j-1}\). The distribution of the support threshold of the initial particle will be denoted by \(\mathscr {I}\), and will be referred to as the initializing distribution. The correct form for such initializing distributions will be identified later on.

The estimator of (16.1), rewritten for a general initial condition \(x_{0}\) (i.e., one not necessarily in \(C_{J}\backslash C_{J-1}\)), is

$$ \sum _{i=0}^{\infty }\int _{\mathbb {R}^{d}}1_{B}(x)e^{\bar{U}(x)-\bar{U}(x_{0} )}\bar{\delta }_{Z_{i}}(dx). $$

We recall that particles are killed the step after entering A or B, and so contribute to the sum for at most one time index. The weighting term \(e^{\bar{U}(x)-\bar{U}(x_{0})}\) is important when the number of thresholds crossed before reaching B depends on where the particle is located in B, as in Fig. 16.1.

The splitting thresholds, splitting rates, and splitting vectors of the algorithm will be defined using an importance function U and initializing distribution \(\mathscr {I}\) as described previously. In Theorem 16.3 it will be shown that the algorithm is unbiased when the initializing distribution has a prescribed form that will be identified below. The algorithm, with the dependence on these quantities suppressed in the notation, can be written in pseudocode as follows.

[Pseudocode for the RESTART/DPR splitting algorithm, given as a figure in the original]
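The pseudocode itself appears as a figure in the original. As a substitute, the following is a hedged sketch of a sequential implementation for a toy nearest-neighbor walk of our own devising (all names and parameter choices are ours): on our reading of the splitting convention, a particle splits according to q(j, k) on every downward threshold crossing, is killed upon exiting its support threshold, and each particle stopped in B contributes \(e^{\bar{U}(y)-\bar{U}(x_{0})}\) to the estimator (16.4).

```python
import random

def restart_run(p, b, R, rng):
    """One RESTART/DPR run for a walk (up with probability p) from
    x0 = 1; A = {0}, B = {b}.  Toy importance levels: sigma(x) =
    min(max(b - x, 0), J) with J = b - 1, so C_j = {x >= b - j} and
    Ubar(x) = sigma(x) * log R.  Returns the estimator gamma of (16.4)."""
    J = b - 1
    sigma = lambda x: min(max(b - x, 0), J)
    acc = 0.0
    stack = [(1, J)]   # (position, support threshold); initial particle
    while stack:       # is born in C_J and hence is never killed
        x, L = stack.pop()
        alive = True
        while alive:
            j = sigma(x)
            y = x + (1 if rng.random() < p else -1)
            k = sigma(y)
            if k > L:                # exited support threshold: killed
                alive = False
            elif k < j:              # downward crossing: split by q(j, k)
                born = [(y, l) for l in range(k, j)
                        for _ in range((R - 1) * R ** (j - l - 1))]
                if y >= b:           # everyone at y is stopped in B
                    acc += (1 + len(born)) * float(R) ** (sigma(y) - J)
                    alive = False
                else:
                    stack.extend(born)
                    x = y            # parent keeps its support threshold
            elif y <= 0:             # stopped in A: no contribution
                alive = False
            else:
                x = y
    return acc                       # gamma = e^{-Ubar(x0)} * weighted count

rng = random.Random(2)
p, b, R = 0.3, 5, 2
est = sum(restart_run(p, b, R, rng) for _ in range(20000)) / 20000
r = (1 - p) / p
exact = (r - 1) / (r ** b - 1)       # gambler's-ruin value of (16.1)
```

By Theorem 16.3 the estimator is unbiased, so the averaged output can be compared against the gambler's-ruin value.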

Remark 16.2

The pseudocode just given presents a “parallel” version of the algorithm, in that all particles for a given threshold are split and then simulated until they reach the next threshold, satisfy the stopping criterion, or are killed. Alternatively, one can implement a “sequential” version in which a particle is simulated until it either satisfies the stopping criterion or is killed, recording where appropriate the number of additional particles that remain to be simulated for each threshold. After the current particle has been simulated to termination, the algorithm reverts to the highest threshold for which particles remain to be simulated, and starts a new particle.

Note that the output of the algorithm is indeed equal to the desired quantity

$$\begin{aligned} \gamma =e^{-\bar{U}(x_{0})}\sum _{i=0}^{\infty }\int _{\mathbb {R}^{d}} 1_{B}(y)e^{\bar{U}(y)}\bar{\delta }_{Z_{i}}(dy). \end{aligned}$$
(16.4)

An algorithm resulting from an importance function U, the collection of splitting vectors q(j, k), and an initializing distribution \(\mathscr {I}\) will be said to be unbiased if

$$ E_{x_0}\left[ \gamma \right] =P_{x_{0}}\{X_{M}\in B\}. $$

Recall that the splitting rate R is defined in terms of the level spacing \(\varDelta >0\) that was used to partition U, through \(R=e^{\varDelta }\). Define the vector

$$\begin{aligned} q_{l}\doteq \left\{ \begin{array} [c]{ll} 1, &{} l=J,\\ (R-1)R^{J-l-1}, &{} 0\le l\le J-1. \end{array} \right. \end{aligned}$$
(16.5)

In the setting of ordinary splitting, where particles are branched only when they enter a threshold for the first time, \(q_{l}\) is the number of descendants that would be born in threshold l if all particles descending from a single particle in threshold J were to make it to l. We then define probability distributions \(\lambda _{k}\) on \(\{k,\ldots , J\}\) by

$$\begin{aligned} \lambda _{k}(l)\doteq q_{l}/R^{J-k}=\left\{ \begin{array} [c]{ll} R^{k-J}, &{} l=J,\\ (R-1)R^{k-l-1}, &{} k\le l\le J-1. \end{array} \right. \end{aligned}$$
(16.6)

We extend the definition of \(\lambda _{k}\) to \(\{0,\ldots , J\}\) by setting \(\lambda _{k}(l)=0\) for \(l=0,\ldots , k-1\).
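A sketch of (16.5) and (16.6) in code (names are ours), checking that each \(\lambda _{k}\) is a probability distribution and that \(\lambda _{j}(k)/\lambda _{j'}(k)=R^{j-j'}\), the ratio identity used later in the proof of Lemma 16.4:

```python
def init_dist(k, J, R):
    """lambda_k from (16.6): the initializing distribution on support
    thresholds {k, ..., J} for a particle started in C_k \\ C_{k-1}."""
    lam = {J: float(R) ** (k - J)}
    for l in range(k, J):
        lam[l] = (R - 1) * float(R) ** (k - l - 1)
    return lam

lam = init_dist(0, 3, 2)
# lambda_0 = {3: 1/8, 2: 1/8, 1: 1/4, 0: 1/2}; the masses sum to one
```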

Theorem 16.3

Fix \(x_{0}\in (A\cup B)^{c}\) and suppose that \(X_{0}=x_{0}\). Let U be an importance function, and assume Condition 16.1. If the initializing distribution is \(\mathscr {I} =\lambda _{\sigma (x_{0})}\) and the splitting vectors q(j, k) are as in (16.3), then the resulting splitting algorithm is unbiased.

The proof of Theorem 16.3 relies on the following lemma. Recall that as in the pseudocode,

$$ L_{i, m}\doteq \text { support threshold of particle }m\text { at time }i, $$

and that \(\bar{\delta }_{Z_i}\doteq \sum _{m=1}^{N_i}\delta _{Z_{i, m}}\).

Lemma 16.4

Assume Condition 16.1 and let h be a nonnegative function on \(\mathbb {R}^{d}\). Let \(\bar{h}(x)=h(x)e^{\bar{U}(x)}\) and let \(i\in \mathbb {N}_{0}\) be given. For a splitting scheme with \(\mathscr {I}=\lambda _{\sigma (x_{0})}\) and q(j, k) as in (16.3),

$$ e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{m=1}^{N_{i}}\bar{h}(Z_{i, m} )1_{\left\{ L_{i, m}=l\right\} }\right] =E_{x_{0}}\left[ h(X_{i} )\lambda _{\sigma (X_{i})}(l)1_{\left\{ M\ge i\right\} }\right] ,\;l=0,1,\ldots , J, $$

and

$$ e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \int _{\mathbb {R}^{d}}\bar{h}(y)\bar{\delta }_{Z_{i}}(dy)\right] =E_{x_{0}}\left[ h(X_{i})1_{\left\{ M\ge i\right\} }\right] . $$

Proof

We assume that \(M>0\), since otherwise, the lemma is trivial. The second result is obtained by summing the first one over l. Recall that \(Z_{i, j}\) records the location of particle number j at time i. We will prove the first display by induction on i. The result holds for \(i=0\), since in this case there is only a single particle with support threshold distribution \(\lambda _{\sigma (x_{0})}(l)\), and thus

$$\begin{aligned} e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \bar{h}(Z_{0,1})1_{\left\{ L_{0,1} =l\right\} }\right]&=e^{-\bar{U}(x_{0})}\bar{h}(x_{0})\lambda _{\sigma (x_{0})}(l)\\&=E_{x_{0}}\left[ h(X_{0})\lambda _{\sigma (X_{0})}(l)1_{\{M\ge 0\}}\right] . \end{aligned}$$

Suppose the result has been proved up to some time \(i^{*}\). We then claim that

$$\begin{aligned}&e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=1}^{N_{i^{*}+1}} \bar{h}(Z_{i^{*}+1,j})1_{\left\{ L_{i^{*}+1,j}=l\right\} }\right] \nonumber \\&\quad =e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ e^{\bar{U}_{\sigma (Z_{0,1})} -\bar{U}_{\sigma (Z_{1,1})}}E_{Z_{1,1}}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] \right] . \end{aligned}$$
(16.7)

At time 0 there is one particle, located at \(x_0\), whose support threshold lies in \(\{\sigma (x_0),\ldots , J\}\). To prove the claim, we will compute the above expectation by conditioning on \(Y_{0,1}\) as it appears in the pseudocode, which, we recall, has the distribution \(P\{Y_{0,1}\in dy\}=P\{X_{1}\in dy|X_{0}=Z_{0,1}\}\). It suffices to show that for all \(y\in \mathbb {R}^{d}\),

$$\begin{aligned}&E_{x_{0}}\left[ \left. \sum _{j=1}^{N_{i^{*}+1}}\bar{h}(Z_{i^{*}+1,j})1_{\left\{ L_{i^{*}+1,j}=l\right\} }\right| Y_{0,1}=y\right] \nonumber \\&\quad =e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)}}E_{y}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*} , m}=l\right\} }\right] . \end{aligned}$$
(16.8)

Decomposing according to the support threshold, which has initializing distribution \(\lambda _{\sigma (x_{0})}(\cdot )\), we have

$$\begin{aligned}&E_{x_{0}}\left[ \left. \sum _{j=1}^{N_{i^{*}+1}}\bar{h}(Z_{i^{*}+1,j})1_{\left\{ L_{i^{*}+1,j}=l\right\} }\right| Y_{0,1}=y\right] \nonumber \\&\quad =\sum _{k=\sigma (x_{0})}^{J}\lambda _{\sigma (x_{0})}(k)E_{x_{0} ,y, k}\left[ \sum _{m=1}^{N_{i^{*}+1}}\bar{h}(Z_{i^{*}+1,m})1_{\left\{ L_{i^{*}+1,m}=l\right\} }\right] , \end{aligned}$$
(16.9)

where \(E_{x_{0},y, k}\) denotes expected value given \(Z_{0,1}=x_{0}\), \({Y} _{0,1}=y\), and \(L_{0,1}=k\). Similarly, \(E_{x, k}\) will denote the expected value given \(Z_{0,1}=x\) and \(L_{0,1}=k\). Note that by the Markov property,

$$\begin{aligned}&E_{x_{0},y, k}\left[ \sum _{m=1}^{N_{i^{*}+1}}\bar{h}(Z_{i^{*} +1,m})1_{\left\{ L_{i^{*}+1,m}=l\right\} }\right] \\&\quad =E_{x_{0},y, k}\left[ \sum _{r=1}^{N_{1}}E_{Z_{1,r}, L_{1,r}}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*} , m}=l\right\} }\right] \right] . \end{aligned}$$

Thus the expression in (16.9) can be written as

$$\begin{aligned} \sum _{k=\sigma (x_{0})}^{J}\lambda _{\sigma (x_{0})}(k)E_{x_{0},y, k}\left[ \sum _{r=1}^{N_{1}}E_{Z_{1,r}, L_{1,r}}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] \right] . \end{aligned}$$
(16.10)

We now consider the expression (16.10) for the three cases \(\sigma (y)=\sigma (x_{0})\), \(\sigma (y)>\sigma (x_{0})\), and \(\sigma (y)<\sigma (x_{0})\), and show that in each of these cases, the expression equals the right side of (16.8).

Consider the first case \(\sigma (y)=\sigma (x_{0})\). In this case, neither killing nor branching occurs, and so we have \(N_{1}=1\), \(Z_{1,1}=y\), and \(L_{1,1}=L_{0,1}=k\). Thus (16.10) can be written as

$$ \sum _{k=\sigma (y)}^{J}\lambda _{\sigma (y)}(k)E_{y, k}\left[ \sum _{m=1} ^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] =E_{y}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*} , m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] , $$

which equals the right side of (16.8), since \(e^{\bar{U} _{\sigma (x_{0})}-\bar{U}_{\sigma (y)}}=1\).

Consider next the second case \(\sigma (y)>\sigma (x_{0})\). In this case, the particle has moved to a threshold with higher index, and branching does not occur. Recall that the particle is killed if and only if \(k<\sigma (y)\), since this means that the particle exited its support threshold. Thus for \(\sigma (x_{0})\le k<\sigma (y)\), since \(N_{1}=0\),

$$ E_{x_{0},y, k}\left[ \sum _{r=1}^{N_{1}}E_{Z_{1,r}, L_{1,r}}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*} , m}=l\right\} }\right] \right] =0. $$

Also, if \(k\ge \sigma (y)\), then \(N_{1}=1\) and \(L_{1,1}=L_{0,1}=k\). Since \(\lambda _{\sigma (x_{0})}(k)/\lambda _{\sigma (y)}(k)=R^{\sigma (x_{0})-\sigma (y)}=e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)}}\), (16.10) in this case can be written as

$$\begin{aligned}&\sum _{k=\sigma (y)}^{J}\lambda _{\sigma (x_{0})}(k)E_{y, k}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*} , m}=l\right\} }\right] \\&\quad =\sum _{k=\sigma (y)}^{J}\frac{\lambda _{\sigma (x_{0})}(k)}{\lambda _{\sigma (y)}(k)}\lambda _{\sigma (y)}(k)E_{y, k}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*} , m}=l\right\} }\right] \\&\quad =e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)}}\sum _{k=\sigma (y)} ^{J}\lambda _{\sigma (y)}(k)E_{y, k}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] , \end{aligned}$$

which once more equals the right side of (16.8).

Finally, consider the case \(\sigma (y)<\sigma (x_{0})\). Here there is the possibility that new particles are created (i.e., \(N_{1}>1\)), though in all cases we have \(Z_{1,r}=y\). When new particles are created, the associated thresholds are determined according to the measure \(q_{l}(j, k)\), and so using (16.3) and the definition (16.6), (16.10) takes the form

$$\begin{aligned}&\sum _{k=\sigma (x_{0})}^{J}\lambda _{\sigma (x_{0})}(k)\left[ \sum _{j=\sigma (y)}^{\sigma (x_{0})-1}q_{j}(\sigma (x_{0}),\sigma (y))E_{y, j} \left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*},m}=l\right\} }\right] \right. \\&\quad \quad \left. +E_{y, k}\left[ \sum _{m=1}^{N_{i^{*}} }\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] \right] . \end{aligned}$$

Since the sum \(\sum _{j=\sigma (y)}^{\sigma (x_{0})-1}\) has no k dependence, using \(\sum _{k=\sigma (x_0)}^J\lambda _{\sigma (x_{0})}(k)=1\), the fact that for \(j\in \{\sigma (y),\ldots ,\sigma (x_{0})-1\}\),

$$ q_{j}(\sigma (x_{0}),\sigma (y))=e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)} }\lambda _{\sigma (y)}(j), $$

and that for \(k\ge \sigma (x_{0})\),

$$ \lambda _{\sigma (x_{0})}(k)=e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)} }\lambda _{\sigma (y)}(k), $$

this quantity can be written as

$$ e^{\bar{U}_{\sigma (x_{0})}-\bar{U}_{\sigma (y)}}\sum _{k=\sigma (y)}^{J}\lambda _{\sigma (y)}(k)E_{y, k}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] . $$

Thus in this case as well, (16.10) equals the right side of (16.8). This completes the proof of (16.8), and hence we have proved the claim in (16.7).

Thus from the induction hypothesis and (16.7), we have that

$$\begin{aligned}&e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=1}^{N_{i^{*}+1}} \bar{h}(Z_{i^{*}+1,j})1_{\left\{ L_{i^{*}+1,j}=l\right\} }\right] \\&\quad = e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ e^{\bar{U}_{\sigma (Z_{0,1})} -\bar{U}_{\sigma (Z_{1,1})}}E_{Z_{1,1}}\left[ \sum _{m=1}^{N_{i^{*}}}\bar{h}(Z_{i^{*}, m})1_{\left\{ L_{i^{*}, m}=l\right\} }\right] \right] \\&\quad =e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ e^{\bar{U}_{\sigma (Z_{0,1})} }E_{Z_{1,1}}\left[ h(X_{i^{*}})\lambda _{\sigma (X_{i^{*}})}(l)1_{\{M\ge i^{*}\}}\right] \right] \\&\quad =E_{x_{0}}\left[ E_{Z_{1,1}}\left[ h(X_{i^{*}})\lambda _{\sigma (X_{i^{*}})}(l)1_{\{M\ge i^{*}\}}\right] \right] \\&\quad =E_{x_{0}}\left[ E_{X_{1}}\left[ h(X_{i^{*}})\lambda _{\sigma (X_{i^{*}})}(l)1_{\{M\ge i^{*}\}}\right] \right] \\&\quad =E_{x_{0}}\left[ h(X_{i^{*}+1})\lambda _{\sigma (X_{i^{*}+1} )}(l)1_{\{M\ge i^{*}+1\}}\right] , \end{aligned}$$

where the third equality uses the fact that \(Z_{1,1}\) and \(X_{1}\) have the same distribution, and the last equality uses the Markov property of \(\{X_{i}\}\). This completes the induction step, and thus the lemma follows.    \(\square \)

Proof

(of Theorem 16.3) Since by Condition 16.1, \(M<\infty \) a.s., we have

$$\begin{aligned} E_{x_{0}}\left[ 1_{B}(X_{M})\right]&=E_{x_{0}}\left[ \sum _{i=0} ^{\infty }1_{B}(X_{i})1_{\{M=i\}}\right] \\&=\sum _{i=0}^{\infty }E_{x_{0}}\left[ 1_{B}(X_{i})1_{\{M=i\}}\right] \\&=\sum _{i=0}^{\infty }e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \int _{\mathbb {R} ^{d}}1_{B}(x)e^{\bar{U}(x)}\bar{\delta }_{Z_{i}}(dx)\right] \\&=E_{x_{0}}\left[ \sum _{i=0}^{\infty }e^{-\bar{U}(x_{0})}\int _{\mathbb {R} ^{d}}1_{B}(x)e^{\bar{U}(x)}\bar{\delta }_{Z_{i}}(dx)\right] , \end{aligned}$$

where the second and fourth equalities use Tonelli’s theorem, and the third uses Lemma 16.4 applied to \(h=1_{B}\) and the observation that \(1_B(X_i)1_{\{M=i\}}= 1_B(X_i)1_{\{M\ge i\}}\). The result now follows on observing that the term on the last line equals \(E_{x_{0}}\left[ \gamma \right] \).    \(\square \)

3 Performance Measures

Recall that to derive a recurrence equation, we had to consider initializing distributions of the form \(\mathscr {I}=\lambda _{\sigma (x_{0})}\). In actual numerical implementation, it is always the case that \(x_{0}\in C_{J}\backslash C_{J-1}\), which implies that all mass will be placed on \(l=J\).

The performance of the algorithm depends on two factors: the second moment (and hence variance) of the estimator and the computational cost of each simulation. To avoid discussion of any issues relating to the specific way the algorithm is implemented in practice, the computational cost is defined to be

$$\begin{aligned} w=\sum _{i=0}^{\infty }N_{i}, \end{aligned}$$
(16.11)

where \(N_{i}=\int _{\mathbb {R}^{d}}\bar{\delta }_{Z_{i}}(dx)\). Thus w is the sum of the lifetimes of all the particles simulated in the algorithm. In this section, formulas for both the second moment and computational cost are derived in terms of only the importance function and the underlying process. Throughout it is assumed that U is an importance function and Condition 16.1 is satisfied.
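To make the estimator and the cost w concrete, the following sketch runs the fixed-rate splitting algorithm (without killing) described in the introduction on a toy example. The gambler's-ruin dynamics, the placement of a threshold at every integer level between \(x_0\) and B, and all names here are our illustrative assumptions, not part of the text; the function returns both the unbiased estimate \(N/\prod R_i\) and the realized cost w, i.e., the sum of the lifetimes of all simulated particles.

```python
import random

def splitting_estimate(p_up, a, b, x0, R, n_runs, seed=0):
    """Fixed-rate splitting estimate of P(hit b before a) for a birth-death
    walk on the integers, with a threshold at each level x0+1, ..., b-1.
    Returns (estimate, total cost w), w being the total particle lifetimes."""
    rng = random.Random(seed)
    J = b - x0 - 1                      # number of splitting thresholds
    total, work = 0.0, 0
    for _ in range(n_runs):
        hits = 0
        stack = [(x0, x0)]              # (position, highest threshold reached)
        while stack:
            x, hi = stack.pop()
            while a < x < b:
                x += 1 if rng.random() < p_up else -1
                work += 1               # one unit of lifetime per step
                if hi < x < b:          # first entry into a new threshold set
                    stack.extend([(x, x)] * (R - 1))  # R - 1 offspring
                    hi = x
            if x == b:                  # trajectory belongs to the rare set
                hits += 1
        total += hits / R**J            # reweight so the estimate is unbiased
    return total / n_runs, work
```

Each B-hitting particle has crossed all J thresholds, so it carries weight \(R^{-J}\); this is the weighting that keeps the algorithm unbiased.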

We begin by characterizing the mean of w.

Theorem 16.5

Assume Condition 16.1. Then

$$ E_{x_{0}}\left[ w\right] =e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{i=0} ^{M}e^{-\bar{U}(X_{i})}\right] . $$

Proof

With the third equality due to Lemma 16.4 applied to \(h(x)=e^{\bar{U}(x_{0})-\bar{U}(x)}\), an application of Tonelli’s theorem gives

$$\begin{aligned} E_{x_{0}}\left[ w\right]&=E_{x_{0}}\left[ \sum _{i=0}^{\infty } \int _{\mathbb {R}^{d}}\bar{\delta }_{Z_{i}}(dx)\right] \\&=\sum _{i=0}^{\infty }E_{x_{0}}\left[ \int _{\mathbb {R}^{d}}\bar{\delta }_{Z_{i}}(dx)\right] \\&=\sum _{i=0}^{\infty }e^{\bar{U}(x_{0})}E_{x_{0}}\left[ e^{-\bar{U}(X_{i} )}1_{\{M\ge i\}}\right] \\&=e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{i=0}^{\infty }e^{-\bar{U}(X_{i} )}1_{\{M\ge i\}}\right] \\&=e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{i=0}^{M}e^{-\bar{U}(X_{i} )}\right] . \end{aligned}$$

   \(\square \)

Next note that the following bounds hold for all U and all \(0\le k\le l<j\le J\), \(0\le k\le m<j\le J\). Since \(q_{l}(j, k)\) as defined in (16.3) equals \([R^{j-l}-R^{j-l-1}]\), it follows that

$$\begin{aligned} \begin{array}{c} (q_{l}(j, k))^{2}-q_{l}(j, k)\le [R^{j-l}-R^{j-l-1}]^{2},\\ q_{l}(j,k)q_{m}(j, k)=[R^{j-l}-R^{j-l-1}][R^{j-m}-R^{j-m-1}]. \end{array} \end{aligned}$$
(16.12)

We can now give bounds for the second moment of the splitting estimator. Let \(\mathfrak {S}(\bar{U})\) denote \(E_{x_{0}}[\gamma ^{2}]\) when \(\bar{U}\) is used to design the splitting scheme.

Theorem 16.6

Assume Condition 16.1. Then for all \(x_{0}\in (A\cup B)^{c}\),

$$\begin{aligned}&\mathfrak {S}(\bar{U})\le e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{i=1}^{M}e^{\bar{U}(X_{i-1})}\left[ P_{X_{i}}\{X_{M}\in B\}\right] ^{2}\right] \nonumber \\&\qquad \qquad \qquad \qquad +e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ e^{\bar{U}(X_{M} )}1_{B}(X_{M})\right] . \end{aligned}$$
(16.13)

Proof

Recall that M is the first entry time of the set \(A\cup B\subset D\). First consider the case in which there is \(T<\infty \) such that \(M\le T\) \(P_{x_{0}} \)-a.s. Let \(W(x)\doteq e^{\bar{U}(x)}E_{x}[\gamma ^{2}]\) with \(\gamma \) as in (16.4), and let \(s(x, j;k)\), \(k=0,1,\ldots \), denote iid sequences of random variables with the same distribution as \(\gamma \), conditioned on \(Z_{0,1}=x\) and \(L_{0,1}=j\). Note that since the maximum possible number of particles is bounded, these random variables are bounded.

The proof is based on finding a recurrence equation for W. If \(x_{0}\notin A\cup B\), then there are two contributions to \(\gamma \) depending on the killing and/or splitting that takes place over the next time step. The first is due to future contributions if the particle stays within the current threshold, and the second occurs if new particles are generated [\(\sigma (X_{1})<\sigma (X_{0})\)]. To account for thresholds of both the existing particles and those that might be generated, let \(Q_{l}(j, k)\) be random variables, defined in terms of \(q_{l}(j, k)\), such that \(Q_{l}(j,k)=q_{l}(j, k)\) for \(j>l\) (i.e., these components are deterministic), and such that \(Q_{l}(j, k)\) equals 1 for exactly one value of \(j\le l\le J\) and 0 for the remaining values, with the index chosen according to the initializing distribution \(\lambda _{j}\). Recall that \(q_{l}(j, k)=0\) for all l if \(k\ge j\). To abbreviate notation temporarily, let \(\sigma _{i} =\sigma (Z_{i, 1}), i=0,1\). Then

$$\begin{aligned} W(x_{0})&= e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \left( 1_{\left\{ L_{0,1}\ge \sigma _{1}\right\} }e^{\bar{U}(Z_{1,1})-\bar{U}(Z_{0,1})}s(Z_{1,1}, L_{0,1};0)\right. \right. \\&\qquad \quad \left. \left. +1_{\left\{ \sigma _{0}>\sigma _{1}\right\} }\left( \sum _{j=\sigma _{1}}^{\sigma _{0}-1}\,\sum _{m=1}^{q_{j}(\sigma _{0},\sigma _{1})}e^{\bar{U}(Z_{1,1})-\bar{U}(Z_{0,1})}s(Z_{1,1}, j;m)\right) \right) ^{2}\right] \\&= e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \left( \sum _{j=0}^{J}\sum _{m=1}^{Q_{j}(\sigma _{0},\sigma _{1})}e^{\bar{U}(Z_{1,1})-\bar{U}(Z_{0,1} )}s(Z_{1,1}, j;m)\right) ^{2}\right] . \end{aligned}$$

We now use the following facts: \(L_{0,1}\) has distribution \(\lambda _{\sigma (X_{0})}\); \(Z_{1,1}\) has the same distribution (conditioned on \(Z_{0,1}=X_{0}=x_{0}\)) as \(X_{1}\); by the definition of \(Q_{l}(j, k)\), for all \(j, k, l\) [see also (16.6) and (16.5)],

$$\begin{aligned} E_{x_0}Q_{l}(j, k)e^{\bar{U}_{k}-\bar{U}_{j}}=\lambda _{k}(l); \end{aligned}$$
(16.14)

and that the future evolution of the algorithm is independent of the \(Q_{l}(j, k)\). Now let \(\sigma _{i}\) denote \(\sigma (X_{i}), i=0,1\). Together with the expression just given for \(W(x_{0})\), these give

$$\begin{aligned} W(x_{0})&=e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j, k=0}^{J}e^{2\bar{U}(X_{1})}Q_{j}(\sigma _{0},\sigma _{1})Q_{k}(\sigma _{0},\sigma _{1})E_{X_{1} , j}[\gamma ]E_{X_{1}, k}[\gamma ]\right] \\&\qquad \quad +e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=0}^{J}e^{2\bar{U}(X_{1})}Q_{j}(\sigma _{0},\sigma _{1})\left( E_{X_{1}, j}\left[ \gamma ^{2}\right] -\left( E_{X_{1}, j}[\gamma ]\right) ^{2}\right) \right] .\nonumber \end{aligned}$$
(16.15)

We examine the various terms separately. Using (16.14) and \(W(x)\doteq e^{\bar{U}(x)}E_x[\gamma ^2]\),

$$\begin{aligned}&e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=0}^{J}e^{2\bar{U}(X_{1} )}Q_{j}(\sigma _{0},\sigma _{1})E_{X_{1}, j}\left[ \gamma ^{2}\right] \right] \nonumber \\&\quad =E_{x_{0}}\left[ e^{\bar{U}(X_{1})}\sum _{j=0}^{J}e^{\bar{U} (X_{1})-\bar{U}(X_{0})}Q_{j}(\sigma _{0},\sigma _{1})E_{X_{1}, j}\left[ \gamma ^{2}\right] \right] \nonumber \\&\quad =E_{x_{0}}\left[ e^{\bar{U}(X_{1})}\sum _{j=0}^{J}\lambda _{\sigma _{1}}(j)E_{X_{1}, j}\left[ \gamma ^{2}\right] \right] \nonumber \\&\quad =E_{x_{0}}\left[ W\left( X_{1}\right) \right] . \end{aligned}$$
(16.16)

If (16.16) is subtracted from the right side of (16.15), the remaining quantity is

$$\begin{aligned}&e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=0}^{J}\sum _{l=0}^{J}e^{2\bar{U}(X_{1})-2\bar{U}(X_{0})}Q_{j}(\sigma _{0},\sigma _{1})Q_{l}(\sigma _{0} ,\sigma _{1})E_{X_{1}, j}\left[ \gamma \right] E_{X_{1}, l}\left[ \gamma \right] \right] \nonumber \\&-e^{\bar{U}(x_{0})}E_{x_{0}}\left[ \sum _{j=0}^{J}e^{2\bar{U}(X_{1} )-2\bar{U}(X_{0})}Q_{j}(\sigma _{0},\sigma _{1})\left( E_{X_{1}, j}\left[ \gamma \right] \right) ^{2}\right] . \end{aligned}$$
(16.17)

The terms with both l and j at or above \(\sigma (X_{0})\) contribute nothing to this expression. Indeed, \(Q_{j}(\sigma _{0},\sigma _{1})\) is equal to 1 for exactly one j and to 0 for all remaining j greater than \(\sigma (X_{0})-1\). Hence the corresponding terms in the double and single sums cancel. Also, this is the only possibility when \(\sigma (X_{1})\ge \sigma (X_{0})\), and so we restrict to \(\sigma (X_{1})<\sigma (X_{0})\) and use that \(Q_{j}(\sigma _{0},\sigma _{1})=0\) for \(j<\sigma (X_{1})\) in this case. Dropping terms that contribute nothing, we decompose the double sum as

$$ \sum _{j=\sigma (X_{1})}^{\sigma (X_{0})-1}\sum _{l=\sigma (X_{1})}^{\sigma (X_{0})-1}+2\sum _{j=\sigma (X_{0})}^{J}\sum _{l=\sigma (X_{1})}^{\sigma (X_{0} )-1}. $$

Using (16.14), we get the following upper bound for expression (16.17):

$$\begin{aligned}&e^{\bar{U}(x_{0})}E_{x_{0}}\left[ 1_{\left\{ \sigma (X_{1} )<\sigma (X_{0})\right\} }\left( \sum _{j=\sigma (X_{1})}^{\sigma (X_{0} )-1}\lambda _{\sigma (X_{1})}(j)E_{X_{1}, j}\left[ \gamma \right] \right) ^{2}\right] \nonumber \\&+2e^{\bar{U}(x_{0})}E_{x_{0}}\left[ 1_{\left\{ \sigma (X_{1})<\sigma (X_{0})\right\} }\left( \sum _{j=\sigma (X_{0})}^{J}\lambda _{\sigma (X_{1} )}(j)E_{X_{1}, j}\left[ \gamma \right] \right) \right. \nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \left. \times \left( \sum _{l=\sigma (X_{1})}^{\sigma (X_{0})-1}\lambda _{\sigma (X_{1})}(l)E_{X_{1} , l}\left[ \gamma \right] \right) \right] \nonumber \\&\quad \le e^{\bar{U}(x_{0})}E_{x_{0}}\left[ 1_{\left\{ \sigma (X_{1})<\sigma (X_{0})\right\} }\left( \sum _{j=\sigma (X_{1})}^{J} \lambda _{\sigma (X_{1})}(j)E_{X_{1}, j}\left[ \gamma \right] \right) ^{2}\right] . \end{aligned}$$
(16.18)

We now combine (16.15), (16.16), (16.18), and Theorem 16.3 to get that for \(x_{0}\notin A\cup B\),

$$ {W(x_{0})\le e^{\bar{U}(x_{0})}E_{x_{0}}\left[ 1_{\left\{ \sigma (X_{1})<\sigma (X_{0})\right\} }\left( E_{X_{1}}\left[ 1_{B}(X_{M})\right] \right) ^{2}\right] +E_{x_{0}}\left[ W(X_{1})\right] .} $$

Since all functions involved are bounded and nonnegative, it follows that the sequence

$$ \varSigma _{i}\doteq W(X_{i\wedge M})+\sum _{j=1}^{i\wedge M}\left\{ e^{\bar{U}(X_{j-1})}\left( E_{X_{j}}\left[ 1_{B} (X_{M})\right] \right) ^{2}\right\} $$

defined for \(i\in \left\{ 0,\ldots , T\right\} \) is a submartingale. Thus, using \(W(X_{T\wedge M})=e^{\bar{U}(X_{M})}1_{B}(X_{M})\) and that \(1_{B}(X_{k})=0\) if \(k<M\), we have

$$\begin{aligned} e^{\bar{U}(x_{0})}\mathfrak {S}(\bar{U})&=W(x_{0})\\&=\varSigma _{0}\\&\le E_{x_{0}}\left[ \varSigma _{T}\right] \\&=E_{x_{0}}\left[ \sum _{i=1}^{M}e^{\bar{U}(X_{i-1})}\left( E_{X_{i} }\left[ 1_{B}(X_{M})\right] \right) ^{2}\right] +E_{x_{0}}\left[ e^{\bar{U}(X_{M})}1_{B}(X_{M})\right] , \end{aligned}$$

which is the same as (16.13).

We next remove the restriction that \(M\le T\) for some constant \(T<\infty \). We add time as a state variable [i.e., work with the process \((X_{i}, i)\)], and consider the analogous estimation problem in which the stopping set is \((A\cup B)\times \left\{ T\right\} \) (i.e., we stop if either \(X_{i}\) enters \(A\cup B\) or \(i=T\)). Then \(\gamma _{T}\) defined in an analogous manner is an unbiased estimator of \(E_{x_{0}}\left[ 1_{B}(X_{M})1_{\{M\le T\}}\right] \), and by the previous result for bounded stopping times,

$$\begin{aligned}&e^{\bar{U}(x_{0})}E_{x_{0}}[(\gamma _{T})^{2}] \\&\quad \le E_{x_{0}}\left[ \sum _{i=1}^{M\wedge T}e^{\bar{U}(X_{i-1} )}\left( E_{X_{i}}\left[ 1_{B}(X_{M})1_{\{M\le T\}}\right] \right) ^{2}\right] +E_{x_{0}}\left[ e^{\bar{U}(X_{M})}1_{B}(X_{M})1_{\{M\le T\}}\right] .\nonumber \end{aligned}$$
(16.19)

Also note that

$$ \gamma _{T}=e^{-\bar{U}(x_{0})}\sum _{i=0}^{T}\int _{\mathbb {R}^{d}}e^{\bar{U}(y)}1_{B}(y)\bar{\delta }_{Z_{i}}(dy) $$

and \(\gamma _{T}\uparrow \gamma \) a.s. By the monotone convergence theorem,

$$ E_{x_{0}}\left[ (\gamma _{T})^{2}\right] \rightarrow \mathfrak {S}(\bar{U}). $$

Using this in (16.19), the nonnegativity of \(1_{B}\), and the monotone convergence theorem a second time gives (16.13) without the restriction on M.    \(\square \)

The following result gives a lower bound on the second moment of the estimator, complementing the upper bound in Theorem 16.6.

Theorem 16.7

Assume Condition 16.1. Then

$$ \mathfrak {S}(\bar{U})\ge e^{-\bar{U}(x_{0})}E_{x_{0}}\left[ e^{\bar{U} (X_{M})}1_{B}(X_{M})\right] . $$

Proof

From the nonnegativity of \(1_{B}\), (16.15), and (16.16), it follows that \(W(x_{0})\ge E_{x_{0}}\left[ W(X_{1})\right] \). From the Markov property of \(\{X_{i}\}\), it follows that \(\varSigma _{i}\doteq W(X_{i\wedge M})\) is a supermartingale, and in particular,

$$ E_{x_{0}}\left[ W(X_{M\wedge i})\right] \le W(x_{0}). $$

The definition \(W(x)\doteq e^{\bar{U}(x)}E_{x}[\gamma ^2]\) and the nonnegativity of W then give

$$ E_{x_{0}}\left[ W(X_{M})1_{\{M\le i\}}\right] \le E_{x_{0}}\left[ W(X_{M\wedge i})\right] \le e^{\bar{U}(x_{0})}\mathfrak {S}(\bar{U}). $$

Since \(W(x)=e^{\bar{U}(x)}1_{B}(x)\) for \(x\in A\cup B\), the last display gives

$$ E_{x_{0}}\left[ e^{\bar{U}(X_{M})}1_{B}(X_{M})1_{\{M\le i\}}\right] \le e^{\bar{U}(x_{0})}\mathfrak {S}(\bar{U}). $$

The result now follows on sending \(i\rightarrow \infty \) and using the monotone convergence theorem.    \(\square \)

4 Design and Asymptotic Analysis of Splitting Schemes

Thus far we have considered only the problem of estimating a single probability of the form (16.1). Now we shall turn to the problem of estimating a sequence of such probabilities

$$\begin{aligned} P_{x_{0}}\{X_{M^{n}}^{n}\in B\}, \end{aligned}$$
(16.20)

\(n\in \mathbb {N}\), where \(M^{n}\doteq \inf \{i:X_{i}^{n}\in A\cup B\}\) and \(\{X_{i}^{n}\}_{i\in \mathbb {N}_{0}}\) is a Markov chain for each n that satisfies a large deviation principle as \(n\rightarrow \infty \) (see Condition 16.9). We recall that by assumption, A is open, B is closed, and \((A\cup B)^{c}\) is bounded. With \(\mathfrak {S}^{n}(\bar{U})\) denoting the second moment \(E_{x_{0}}\left[ (\gamma ^{n})^{2}\right] \), the asymptotic performance will be evaluated using the following measure of work-normalized error:

$$\begin{aligned} \lim _{n\rightarrow \infty }\frac{1}{n}\log \frac{\mathfrak {S}^{n}(\bar{U})E_{x_{0}}\left[ w^{n}\right] }{\left[ P_{x_{0}}\{X_{M^{n}}^{n}\in B\}\right] ^{2}}, \end{aligned}$$
(16.21)

where \(\gamma ^{n}\) is the splitting-based estimator for (16.20), and \(w^{n}\) is its computational cost, which was defined in the nonasymptotic setting in (16.11). (Such a weighted performance measure is not needed for importance sampling, since the cost per sample is essentially independent of the subsolution.)

Suppose that \(-(1/n)\log P_{x_{0}}\{X_{M^{n}}^{n}\in B\}\rightarrow V(x_{0})\) as \(n\rightarrow \infty \). Jensen’s inequality, as discussed in Chap. 14, shows that the best possible value of (16.21) is zero, and this occurs only when the work grows subexponentially and the second moment \(\mathfrak {S}^{n}(\bar{U})\) decays at rate \(2V(x_{0})\). Bounds on the asymptotic behavior of the work-normalized error will be obtained using Theorems 16.5–16.7 and are stated in Theorem 16.15, Corollary 16.16, and Theorem 16.18.
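As a small numerical companion to (16.21), one can evaluate the finite-n analogue of the work-normalized error rate from estimated quantities; the function name is ours, and the inputs would in practice be Monte Carlo estimates of the second moment, the mean work, and the probability.

```python
import math

def work_normalized_rate(n, second_moment, mean_work, prob):
    """Finite-n analogue of (16.21): (1/n) log( S^n * E[w^n] / p_n^2 ).
    Zero is the best achievable value, attained when the second moment
    decays at rate 2V(x0) while the work grows subexponentially."""
    return math.log(second_moment * mean_work / prob**2) / n
```

For an ideal scheme with `second_moment == prob**2` and order-one work, the rate is zero, matching the discussion above.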

Remark 16.8

As in Chap. 15, the theoretical bounds on performance are given for the case of a fixed initial condition \(x_{0}\). However, all results are easily generalized to the case of varying initial conditions \(x_{n}\) that converge to \(x_{0}\) as \(n\rightarrow \infty \). This generalization is useful for systems with discrete state spaces, such as queueing networks.

The theory presented in this section will require some fairly standard assumptions on the stability and large deviation behavior of \(\left\{ X_{i}^{n}\right\} \), and also some regularity properties on A and B that are qualitatively similar to assumptions made in Chap. 15 (e.g., Condition 15.9). For example, we will want to know that \(\tau ^{n}\doteq M^{n}/n\) can essentially be taken as bounded, in the sense that there is some \(T<\infty \) such that the event \(\tau ^{n}>T\) is unimportant as far as the large deviation asymptotics are concerned. This is an important qualitative assumption, and it is related to stability properties of the law of large numbers limit processes obtained when \(n\rightarrow \infty \).

We define continuous time stochastic processes as usual by setting \(X^{n}(t)=X_{i}^{n}\) for \(t=i/n\) and by piecewise linear interpolation for \(t\in \left[ i/n,(i+1)/n\right) \). Throughout, we assume that \(x_{0}\in (\bar{A}\cup B)^{c}\). The following condition will be needed to establish a limit for the second moment; it is not needed if one wants to give just upper bounds on the second moment.

Condition 16.9

For every \(T\in (0,\infty )\), the sequence \(\left\{ X^{n}\right\} _{n\in \mathbb {N}}\) satisfies a large deviation principle on \(\mathscr {C}\left( [0,T]:\mathbb {R}^{d}\right) \) that is uniform with respect to the initial condition in compact sets. The rate function is of the form

$$ I_{T}(\phi )\doteq \int _{0}^{T}L(\phi (s),\dot{\phi }(s))ds $$

if \(\phi \in \mathscr {C}\left( [0,T]:\mathbb {R}^{d}\right) \) is absolutely continuous with \(\phi (0)=x_{0}\) and \(\infty \) otherwise, where L is a nonnegative measurable function.

As remarked above, the conditions we use beyond the LDP can be partitioned into “stability” and “controllability” type conditions. We give two conditions that will be sufficient (but not necessary) for what follows. Moreover, the sufficient conditions we give will by themselves cover many interesting problems. The stability condition (Condition 16.10) will imply that the algorithm is practical in that the tails of the hitting times are controlled, and also that the escape time problem can be approximated using estimates over finite time intervals. The condition we refer to as “controllability” (Condition 16.11) is needed to establish limits rather than just bounds, and is analogous to the additional conditions that would have been required in Chap. 15 as noted in Remark 15.16.

We will assume the following condition, which is the same as Condition 15.9 in Chap. 15.

Condition 16.10

There exist \(c>0,T_{0}\in (0,\infty )\) and \(n_{0} \in \mathbb {N}\) such that for all \(n\ge n_{0}\), \(T<\infty \), and \(x\in D\),

$$ P_{x}\{\tau ^{n}>T\}\le \exp \{-cn(T-T_{0})\}. $$

Note that Condition 16.10 implies that

$$\begin{aligned} \limsup _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\sup _{x\in D}\frac{1}{n}\log P_{x}\{\tau ^{n}>T\}=-\infty . \end{aligned}$$
(16.22)

Condition 16.10 would not hold if there were two attractors for the zero-cost trajectories, \(A\cup B\) contained one of the attractors but not the other, and the process started in the domain of attraction of the stable point that is not in \(A\cup B\).

Condition 16.11

Suppose we are given absolutely continuous \(\phi \) satisfying \(\phi (0)=x_{0}\notin \bar{A}\cup B\), \(\phi (t)\notin A\cup B^{\circ }\) for \(t\in [0,T)\), and \(\phi (T)\in B\) for some \(T<\infty \). Then given \(\gamma >0\), there exist absolutely continuous \(\phi ^{*}\), \(T^{*}<\infty \), and \(\tau ^{*}<T^{*}\) such that \(\phi ^{*}(0)=x_{0}\), \(\phi ^{*}(t)\notin \bar{A}\cup B\) for \(t\in [0,\tau ^{*})\), \(\phi ^{*}(t)\in B^{\circ }\) for \(t\in (\tau ^{*}, T^{*}]\), and such that

$$ \int _{0}^{T^{*}}L(\phi ^{*}(r),\dot{\phi }^{*}(r))dr\le \int _{0} ^{T}L(\phi (r),\dot{\phi }(r))dr+\gamma ,\;\left\| \phi (T)-\phi ^{*} (\tau ^{*})\right\| \le \gamma . $$

One can consider this a controllability-type condition. It says that given a trajectory \(\phi \) that may touch \(\bar{A}\cup B\) but not \(A\cup B^{\circ }\) before finally entering B at time T, one can find a trajectory with almost the same cost that avoids \(\bar{A}\cup B\) till \(\tau ^{*}\), at which time it enters \(B^{\circ }\) near \(\phi (T)\). One can establish more concrete conditions that imply this condition, such as assuming that \(L(x,\beta )\) is continuous and bounded on each compact subset of \(\mathbb {R}^{d}\times \mathbb {R}^{d}\), and assuming regularity properties for the boundaries of A and B.

We next give a definition of subsolution appropriate to this problem, but phrased directly in terms of the calculus of variations problem. The definition via calculus of variations is somewhat more to the point of what is required and is used in the proofs. For \(y\in (A\cup B^{\circ })^{c}\) and \(T\in (0,\infty )\), define

$$\begin{aligned}&K_{y, T} \\&\quad \doteq \left\{ \phi \in \mathscr {AC}([0,T]:\mathbb {R}^{d}):\phi (0)=y;\phi (s)\notin A\cup B^{\circ }, s\in (0,T),\phi (T)\in B\right\} . \nonumber \end{aligned}$$
(16.23)

Definition 16.12

A continuous function \(\bar{V}:\mathbb {R}^{d}\rightarrow \mathbb {R}\) is a subsolution if it is bounded from below,

$$\begin{aligned} \bar{V}(y)\le \inf _{\phi \in K_{y,T}, T<\infty }\left[ \int _{0}^{T}L(\phi (s),\dot{\phi }(s))ds+\bar{V}(\phi (T))\right] \end{aligned}$$
(16.24)

for all \(y\in (A\cup B^{\circ })^{c}\), and \(\bar{V}(z)\le 0\) for \(z\in B\).

Remark 16.13

(Relations between notions of subsolution I) Suppose that \(\bar{V}\) is a subsolution in the sense of Definition 14.4 or 14.5 that is bounded from below, and to simplify the discussion, assume also that \(\mathbb {H}(x,\alpha )\) is continuous. Then we claim that \(\bar{V}\) is a subsolution in the sense of Definition 16.12. Consider the case of Definition 14.4, and for \(y \in (A\cup B^{\circ })^{c}\), suppose \(\phi \in K_{y, T}\). Since \(\bar{V}\) is continuously differentiable, the definition \(\mathbb {H}(x,\alpha )\doteq \inf _{\beta \in \mathbb {R}^{d}}\left[ \left\langle \alpha ,\beta \right\rangle +L(x,\beta )\right] \) implies

$$ \left\langle D\bar{V}(\phi (s)),\dot{\phi }(s)\right\rangle +L(\phi (s),\dot{\phi }(s))\ge 0 $$

for a.e. \(s\in [0,T]\). Integrating gives

$$ \bar{V}(y)\le \left[ \int _{0}^{T}L(\phi (s),\dot{\phi }(s))ds+\bar{V} (\phi (T))\right] . $$

Since \(\phi \in K_{y, T}\) is arbitrary, and \(\bar{V}(z)\le 0\) for \(z\in B\) is part of Definition 14.4, the claim follows. For the case of piecewise classical subsolutions it is enough to note that the mollification (14.16) produces a classical-sense subsolution \(\bar{V}^{\delta }\), and the claim follows by taking the limit \(\delta \downarrow 0\). Note that one does not need to use the mollified subsolution for the design of the splitting scheme, but can instead use the potentially nonsmooth limit. It is also worth noting that the continuity of \(\mathbb {H}\) is not used in any essential way, so that the analogous claim holds for problems involving discontinuous statistics, such as queueing networks.

Remark 16.14

(Relations between notions of subsolution II) The set of trajectories \(K_{y, T}\) in (16.23) differs from \(C_{y, T}\) introduced in Chap. 14 in that trajectories are excluded from B for \(s\in (0,T)\) for \(C_{y, T}\), but only from \(B^{\circ }\) for \(K_{y, T}\). The reason for the difference is the slightly different way in which the subsolution property is used in the cases of importance sampling and splitting. However, under the conditions we assume, to obtain a limit as in Theorem 16.15 one could use \(C_{y, T}\) in Definition 16.12 instead. Indeed, since \(C_{y,T}\subset K_{y, T}\), one has only to check that the defining inequality (16.24) holds for \(\phi \in K_{y, T}\) for all T if it holds for all \(\phi \in C_{y, S}\), \(y\in (A\cup B^{\circ })^{c}\), and \(S\in (0,\infty )\). But this is easy to check under Condition 16.11. Note also that the two definitions are equivalent without reference to Condition 16.11 if \(\bar{V}\) is constant on B, simply because \(L\ge 0\).

Note that \(\bar{V}\) is never greater than the solution to the calculus of variations problem, which is defined by \(V(z)=0\) for \(z\in B\), \(V(z)=\infty \) for \(z\in A\), and

$$\begin{aligned} V(x)=\inf _{\phi \in K_{x, T};T<\infty }\left[ \int _{0}^{T}L(\phi (s),\dot{\phi }(s))ds\right] \end{aligned}$$
(16.25)

for \(x\in (A\cup B)^{c}\). Given a subsolution, the corresponding splitting scheme is defined as follows. Thresholds are defined in terms of the levels

$$ \bar{V}(x_{0}),\bar{V}(x_{0})-(\log R)/n,\bar{V}(x_{0})-(2\log R)/n,\ldots . $$

Let \(J^{n}\) be the smallest number such that \(\bar{V}(x)\ge \bar{V} (x_{0})-(J^{n}\log R)/n\) for all \(x\in D\), so that there are no more than \(J^{n}\) thresholds. Then we define \(C_{J^{n}}^{n}\doteq D, C_{-1}^{n} \doteq \emptyset ,\) and

$$\begin{aligned} C_{J^{n}-j}^{n}\doteq \left\{ x:\bar{V}(x)\le \bar{V}(x_{0})-(j\log R)/n\right\} ,\;j=1,\ldots , J^{n}. \end{aligned}$$
(16.26)

Recall \(R=e^{\varDelta }\) and define \(\varDelta ^{n}\doteq \varDelta /n\). Also define a sequence \(\{\bar{U}^{n}\}\) according to

$$ \bar{U}^{n}(x)\doteq (J^{n}\log R)/n-(j\log R)/n\text { for }x\in C_{J^{n} -j}^{n}\backslash C_{J^{n}-j-1}^{n},\;j=0,1,\ldots , J^{n}. $$

Note that whenever \(y_{n}\rightarrow y\), \(\bar{U}^{n}(y_{n})-\bar{U}^{n} (x_{0})\rightarrow \bar{V}(y)\wedge \bar{V}(x_{0})-\bar{V}(x_{0})\). Note also that if \(\bar{V}\) is a subsolution in the sense of Definition 16.12, then so is \(\bar{V}(\cdot )\wedge \bar{V}(x_{0})\). To simplify notation, we assume without loss that \(\bar{V}(x)\le \bar{V}(x_{0})\) for all \(x\in D\), and therefore for \(x\in D\),

$$\begin{aligned} \left| (\bar{U}^{n}(x_{0})-\bar{U}^{n}(x))-(\bar{V}(x_{0})-\bar{V}(x))\right| \le \frac{\log R}{n}. \end{aligned}$$
(16.27)

In particular, \(\bar{U}^{n}(x_{0})-\bar{U}^{n}(x)\rightarrow \bar{V} (x_{0})-\bar{V}(x)\) for all \(x\in D\). We now apply Theorems 16.6 and 16.7, with \(\varDelta \) replaced by \(\varDelta ^{n}\) and \(\bar{U}\) replaced by \(n\bar{U}^{n}\), to the Markov chain \(\{X_{i}^{n}\}\). Following the same notation as in Chaps. 14 and 15, we denote the second moment of the estimator \(E_{x_{0}}[\left( \gamma ^{n}\right) ^{2}]\) by \(\mathfrak {S}^{n}(\bar{V})\). The corresponding initializing distribution \(\lambda _{k}^{n}\) is defined as \(\lambda _{k} ^{n}(l)=q_{l}^{n}/R^{J^{n}-k}\), with \(q_{l}^{n}\) defined by (16.5) but with J replaced by \(J^{n}\). Theorems 16.6 and 16.7 then say that

$$\begin{aligned}&e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=1}^{M^{n}} e^{n\bar{U}^{n}(X_{i-1}^{n})}\left( E_{X_{i}^{n}}\left[ 1_{B}(X_{M^{n}} ^{n})\right] \right) ^{2}\right] \nonumber \\&\qquad \qquad \quad +e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ e^{n\bar{U} ^{n}(X_{M^{n}}^{n})}1_{B}(X_{M^{n}}^{n})\right] \ge \mathfrak {S}^{n}(\bar{V}) \end{aligned}$$
(16.28)

and

$$ \mathfrak {S}^{n}(\bar{V})\ge e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ e^{n\bar{U}^{n}(X_{M^{n}}^{n})}1_{B}(X_{M^{n}}^{n})\right] , $$

where \(\gamma ^{n}=e^{-n\bar{U}^{n}(x_{0})}\sum _{i=0}^{\infty }\int _{\mathbb {R}^{d}}1_{B}(y)e^{n\bar{U}^{n}(y)}\bar{\delta }_{Z_{i}^{n}}(dy)\) is the estimator of (16.20) based on the importance function \(\bar{U}^{n}\).

Theorem 16.15 below describes the asymptotic performance of the splitting scheme based on importance functions \(\{\bar{U}^{n}\}\). As a consequence of this result, in Corollary 16.16, we will see that the decay rate

$$ \lim _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V}) $$

is bounded below by \(V(x_{0})+\bar{V}(x_{0})\), where \(V(x_{0})\) is defined in (16.25), which is the same as the decay rate for an importance sampling scheme based on the same subsolution if it is sufficiently regular (see Theorem 15.10). We recall from (14.5) that when the large deviation limit holds, the decay rate of any unbiased splitting scheme is bounded above by \(2V(x_{0})\). In particular, if \(\bar{V} (x_{0})=V(x_{0})\), we get the best possible decay rate \(2V(x_{0})\). Finally, in Theorem 16.18 below, we will show that

$$ \lim _{n\rightarrow \infty }\frac{1}{n}\log E_{x_{0}}\left[ w^{n}\right] =0. $$

Thus the work associated with such a scheme grows subexponentially, and consequently, the decay rate of the work-normalized error is zero, which is the best possible rate.

It is easily checked that if \(\bar{V}\) is not a subsolution, then at points where the subsolution property fails, the branching is supercritical, and hence in this case there exists \(y\in D\) such that if \(y_{n}\rightarrow y\), then

$$ \liminf _{n\rightarrow \infty }\frac{1}{n}\log E_{y_{n}}\left[ w^{n}\right] >0. $$

It follows that importance functions that are not obtained from subsolutions should not be used to design schemes, since it is possible that the computational costs of such schemes will grow exponentially.

Recall that \(x_{0}\not \in \bar{A}\cup B\), \(\bar{V}\) is a subsolution as in Definition 16.12, and that as noted previously, we can assume \(\bar{V}(x)\le \bar{V}(x_{0})\) for all \(x\in D\). Recall also the definition of \(K_{y, T}\) in (16.23).

Theorem 16.15

Assume Conditions 16.916.11. Then for \(x_{0} \not \in \bar{A}\cup B\),

$$\begin{aligned}&\lim _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})\nonumber \\&\quad =\inf _{\phi \in K_{x_{0},T}, T<\infty }\left[ \int _{0}^{T}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (0))-\bar{V}(\phi (T))\right] . \end{aligned}$$
(16.29)

Proof

We first consider the lower bound

$$\begin{aligned}&\liminf _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})\nonumber \\&\quad \ge \inf _{\phi \in K_{x_{0},T}, T<\infty }\left[ \int _{0}^{T} L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (0))-\bar{V}(\phi (T))\right] , \end{aligned}$$
(16.30)

which is based on (16.28). While there are two terms in (16.28), the second can be treated in a manner similar to the first, and so we focus on the first. This term is

$$\begin{aligned} e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=1}^{M^{n}}e^{n\bar{U} ^{n}(X_{i-1}^{n})}1_{B}\left( X_{M^{1,i, n}}^{1,i, n}\right) 1_{B}\left( X_{M^{2,i, n}}^{2,i, n}\right) \right] , \end{aligned}$$
(16.31)

where \(\{X_{j}^{k,i, n}\}_{j\ge i}\), \(k=1,2\), are (conditionally) independent copies of \(\{X_{j}^{n}\}_{j\ge i}\) that start at \(X_{i}^{n}\) at \(j=i\), and \(M^{k,i, n}\) are the corresponding escape times.

We claim that instead of the large deviation asymptotics of (16.31), it suffices to consider the large deviation asymptotics of

$$\begin{aligned} e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=1}^{\left\lfloor nT\right\rfloor }1_{\{M^{n}\ge i\}}e^{n\bar{U}^{n}(X_{i-1}^{n})}1_{B}\left( X_{M^{1,i, n}\wedge \left\lfloor nT\right\rfloor }^{1,i, n}\right) 1_{B}\left( X_{M^{2,i, n}\wedge \left\lfloor nT\right\rfloor }^{2,i, n}\right) \right] \end{aligned}$$
(16.32)

for some fixed and finite T. Assuming the claim, observe that there are no more than order-n terms in the expected value, and it suffices to obtain the desired bound on each of these terms. Let \(i_{n}\) index such a term. In obtaining a bound, we can assume without loss that \(i_{n}/n\) converges to some limit \(t\in [0,T]\), and to simplify notation, we write i for \(i_{n}\). We first show that

$$\begin{aligned}&\liminf _{n\rightarrow \infty }-\frac{1}{n}\log e^{-n\bar{U}^{n}(x_{0} )}E_{x_{0}}\left[ e^{n\bar{U}^{n}(X_{i-1}^{n})}1_{\{M^{n}\ge i\}} 1_{B}\left( X_{M^{1,i, n}\wedge \left\lfloor nT\right\rfloor }^{1,i, n}\right) 1_{B}\left( X_{M^{2,i, n}\wedge \left\lfloor nT\right\rfloor }^{2,i, n}\right) \right] \nonumber \\&\ge \inf \left[ \int _{0}^{s}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(x_{0} )-\bar{V}(\phi (s))\right] , \end{aligned}$$
(16.33)

where the infimum is over all absolutely continuous \(\phi \) such that \(\phi (0)=x_{0}\) and \(\phi (r)\notin A\cup B^{\circ }\) for \(r\in (0,s)\) and \(\phi (s)\in B\) for some \(s\ge t\). Let \(\hat{X}^{n}(t)\) be the continuous time trajectory that interpolates \(X_{j}^{n}\) up till i, and thereafter is a two-component process that interpolates \(X_{j}^{1,i, n}\) and \(X_{j}^{2,i, n}\) up until \(\left\lfloor nT\right\rfloor \). It is straightforward, using the Markov property and the uniformity of the large deviation estimates with respect to initial conditions that is assumed in Condition 16.9, to check that \(\{\hat{X}^{n}\}\) satisfies a large deviation property, and that the rate function (with obvious notation for a trajectory \(\eta \) that branches at time t into \(\eta ^{1}\) and \(\eta ^{2}\)) is

$$ \int _{0}^{t}L(\eta (r),\dot{\eta }(r))dr+\int _{t}^{T}L(\eta ^{1}(r),\dot{\eta }^{1}(r))dr+\int _{t}^{T}L(\eta ^{2}(r),\dot{\eta }^{2}(r))dr. $$

Since \(\bar{V}\) is continuous and B is closed, we obtain the lower bound

$$\begin{aligned}&\inf \left[ \int _{0}^{t}L(\eta (r),\dot{\eta }(r))dr+\int _{t}^{T}L(\eta ^{1}(r),\dot{\eta }^{1}(r))dr\right. \\&\quad \quad \quad \left. +\int _{t}^{T}L(\eta ^{2}(r),\dot{\eta }^{2} (r))dr+\bar{V}(x_{0})-\bar{V}(\eta (t))\right] \end{aligned}$$

for the left side of (16.33), where the infimum is over all \(\eta \) such that \(\eta (0)=x_{0}\) and \(\eta (r)\notin A\cup B^{\circ }\) for \(r\in (0,t]\) and \(\eta ^{k}, k=1,2\) such that \(\eta ^{k}(t)=\eta (t)\) and \(\eta ^{k}(r)\notin A\cup B^{\circ }\) for \(r\in [t, s^{k}]\), \(s^{k}\in [t, T]\), \(\eta ^{k}(s^{k})\in B\). Without loss we can assume that the cost is zero after \(s^{k}\) and that \(\eta ^{1}=\eta ^{2}\) (which we relabel as \(\eta \), and \(s^{k}\) as s). By the subsolution property,

$$ \bar{V}(\eta (t))\le \int _{t}^{s}L(\eta (r),\dot{\eta }(r))dr+\bar{V}(\eta (s)), $$

which gives (16.33).

We now prove the claim. It remains to show that (16.31) has the same large deviation asymptotics as (16.32). To justify bounding the other random times by \(\left\lfloor nT\right\rfloor \), we need to show that

$$\begin{aligned}&\limsup _{T\rightarrow \infty }\limsup _{n\rightarrow \infty }\frac{1}{n}\log e^{-n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=1}^{M^{n}\wedge \left\lfloor nT\right\rfloor }e^{n\bar{U}^{n}(X_{i-1}^{n})}\left( 1_{\left\{ \tau ^{1,i, n}\ge nT\right\} }+1_{\left\{ \tau ^{2,i, n}\ge nT\right\} }\right) \right] \nonumber \\&\quad \quad \quad =-\infty . \end{aligned}$$
(16.34)

However, using (16.27), the expected value is bounded above by

$$ 2Re^{n2\left\| \bar{V}\right\| _{\infty }}\sum _{i=1}^{\lfloor nT\rfloor }P_{x_{0}}\left\{ \tau ^{1,i, n}\ge nT\right\} , $$

and thus (16.34), and therefore the claim, follows from Condition 16.10 [see (16.22)].

We now prove the upper bound

$$\begin{aligned}&\limsup _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})\nonumber \\&\quad \le \inf _{\phi \in K_{x_{0},T}, T<\infty }\left[ \int _{0}^{T} L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (0))-\bar{V}(\phi (T))\right] . \end{aligned}$$
(16.35)

Fix \(\varepsilon \in (0,1)\), let \(\varGamma \) denote the right-hand side of (16.35), and using Condition 16.11 choose an absolutely continuous \(\phi \), a time \(T\), and \(\tau \in (0,T)\) such that \(\phi (0)=x_{0}\), \(\phi (\tau )\in B\), \(\phi (t)\not \in \bar{A}\cup B\) for all \(t\in (0,\tau )\), \(\phi (t)\in B^{\circ }\) for all \(t\in (\tau , T]\), and

$$\begin{aligned} \int _{0}^{T}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (0))-\bar{V}(\phi (\tau ))\le \varGamma +\varepsilon . \end{aligned}$$
(16.36)

Let \(\delta >0\) satisfy \(\left\| \phi (r)-x\right\| \ge \delta \) if \(r\in [0,\tau -\delta )\) and \(x\in \bar{A}\cup B\) and \(\Vert \phi (r)- x\Vert \ge \delta \) if \(r\in [\tau +\delta , T]\) and \(x\notin B\). Let also \(\varXi ^{n}\doteq \{\sup _{0\le t\le T}\left\| X^{n}(t)-\phi (t)\right\| <\delta \}\). Then using the large deviation lower bound for the third inequality and (16.36) for the fourth, we obtain

$$\begin{aligned}&\liminf _{n\rightarrow \infty }\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V} )\\&\quad \ge \liminf _{n\rightarrow \infty }\frac{1}{n}\log e^{-n\bar{U}^{n} (x_{0})}E_{x_{0}}\left[ e^{n\bar{U}^{n}(X_{M^{n}}^{n})}1_{B}(X_{M^{n}} ^{n})\right] \\&\quad \ge \liminf _{n\rightarrow \infty }\frac{1}{n}\log e^{-n\bar{U}^{n} (x_{0})}E_{x_{0}}\left[ e^{n\bar{U}^{n}(X_{M^{n}}^{n})}1_{B}(X_{M^{n}} ^{n})1_{\varXi ^{n}}\right] \\&\quad \ge \inf _{y\in C_{\delta }}\bar{V}(y)-\bar{V}(x_{0})+\liminf _{n\rightarrow \infty }\frac{1}{n}\log P_{x_{0}}\{\varXi ^{n}\}\\&\quad \ge \inf _{y\in C_{\delta }}\bar{V}(y)-\bar{V}(\phi (\tau ))-\varGamma -\varepsilon , \end{aligned}$$

where \(C_{\delta }\doteq \{y:\left\| y-\phi (t)\right\| <\delta \) for some \(t\in [\tau -\delta ,\tau +\delta ]\}\). Since \(\bar{V}\) and \(\phi \) are continuous, we have \(\inf _{y\in C_{\delta }}\bar{V}(y)-\bar{V}(\phi (\tau ))\rightarrow 0\) as \(\delta \rightarrow 0\). Since \(\varepsilon >0\) and \(\delta >0\) are arbitrary, this proves the upper bound in (16.35).    \(\square \)

Corollary 16.16

Under the same conditions and with the same notation as in Theorem 16.15,

$$ \liminf _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})\ge V(x_{0})+\bar{V}(x_{0}). $$

In particular, if \(\bar{V}(x_{0})=V(x_{0})\), then

$$ \lim _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})=2V(x_{0}). $$

Proof

Recall the set \(K_{x_{0}, T}\) introduced in (16.23) and consider any \(\phi \in K_{x_{0}, T}\) with \(T\in (0,\infty )\). Since \(\bar{V}\) is a subsolution, it follows that \(\bar{V}(z)\le 0\) for \(z\in B\), and so

$$ \int _{0}^{T}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (0))-\bar{V}(\phi (T))\ge \int _{0}^{T}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(x_{0}). $$

Taking the infimum over all \(\phi \in K_{x_{0}, T}\) and \(T\in (0,\infty )\), we have from (16.25) and (16.29) that

$$ \liminf _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V})\ge V(x_{0})+\bar{V}(x_{0}), $$

proving the first statement in the corollary. On the other hand, as was argued previously,

$$ \limsup _{n\rightarrow \infty }-\frac{1}{n}\log \mathfrak {S}^{n}(\bar{V} )\le 2V(x_{0}). $$

The second statement in the corollary follows.    \(\square \)

Remark 16.17

An examination of the proof shows that the greatest contribution to the second moment of the estimator is from the correlation of particles that make it to B and whose last common ancestor is located in one of the thresholds close to B.

Finally, we show that the work associated with the splitting scheme based on the sequence \(\{\bar{U}^{n}\}\) grows subexponentially.

Theorem 16.18

Under the same conditions and with the same notation as in Theorem 16.15,

$$ \lim _{n\rightarrow \infty }\frac{1}{n}\log E_{x_{0}}\left[ w^{n}\right] =0, $$

where \(w^{n}\) is defined by the right side of (16.11) with \(N_{i}\) replaced by \(N_{i}^{n}\doteq \int _{D}\bar{\delta }_{Z_{i}^{n}}(dx)\).

Proof

We know from Theorem 16.5 that

$$ E_{x_{0}}\left[ w^{n}\right] =e^{n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=0}^{M^{n}}e^{-n\bar{U}^{n}(X_{i}^{n})}\right] . $$

Exactly as in the proof of (16.30), it follows that the large deviation asymptotics of \(E_{x_{0}}\left[ w^{n}\right] \) are the same as those of

$$ e^{n\bar{U}^{n}(x_{0})}E_{x_{0}}\left[ \sum _{i=0}^{M^{n}\wedge \left\lfloor nT\right\rfloor }e^{-n\bar{U}^{n}(X_{i}^{n})}\right] $$

for some sufficiently large but finite T. The convergence \(\bar{U} ^{n}(y)-\bar{U}^{n}(x_{0})\rightarrow \bar{V}(y)-\bar{V}(x_{0})\) and the same line of argument as in the proof of (16.30) show that

$$\begin{aligned}&\limsup _{n\rightarrow \infty }\frac{1}{n}\log e^{n\bar{U}^{n}(x_{0})} E_{x_{0}}\left[ \sum _{i=0}^{M^{n}\wedge \left\lfloor nT\right\rfloor }e^{-n\bar{U}^{n}(X_{i}^{n})}\right] \\&\quad \le -\inf _{\phi \in K_{x_{0},T}, T<\infty }\left[ \int _{0}^{T} L(\phi (s),\dot{\phi }(s))ds-(\bar{V}(x_{0})-\bar{V}(\phi (T)))\right] . \end{aligned}$$

By the subsolution property (Definition 16.12), the quantity to be minimized is always nonnegative, and so the upper bound follows. Since \(E_{x_{0}}\left[ w^{n}\right] \ge 1\) for all n, the lower bound is automatic, which completes the proof.    \(\square \)

Remark 16.19

Although the subsolution property implies a type of stability as asserted in Theorem 16.18, it could allow for polynomial growth of the number of particles. If in practice one observes that a large number of particles make it to B in the course of simulating a single sample, then one can consider the use of a strict subsolution, i.e., a function \(\bar{V}\) that satisfies the boundary conditions and

$$ \bar{V}(y)\le \inf _{\phi \in K_{y,T}, T<\infty }\left[ \int _{0}^{T} [L(\phi (s),\dot{\phi }(s))-\varepsilon ]ds+\bar{V}(\phi (T))\right] $$

for some \(\varepsilon >0\). Because the value of \(\bar{V}(x_{0})\) is lowered slightly, there will be a slight increase in the second moment of the estimator. However, the strict inequality provides stronger control, and indeed, the expected number of particles and moments of the number of particles are bounded uniformly in n. See [76] for further discussion and examples. If phrased in terms of \(\mathbb {H}\) as in Remark 16.14, the strict subsolution property means that \(\mathbb {H}(x, D\bar{V}(x))\ge \varepsilon \) for \(x\in (A\cup B^\circ )^c\).
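Although the remark is phrased in the time-independent setting, the effect of strictness is easy to see in a finite-time illustration of our own (not from the text), using the Gaussian model with \(H(\alpha )=\alpha ^{2}/2\), \(L(\beta )=\beta ^{2}/2\), \(B=[1,\infty )\), and \(T=1\) that appears in Sect. 5. For small \(\delta >0\), consider the affine function

```latex
\[
\bar V_{\delta}(x,t)
  = -(1-\delta)x
  + \Bigl(\tfrac{(1-\delta)^{2}}{2}+\varepsilon\Bigr)t
  + (1-\delta) - \tfrac{(1-\delta)^{2}}{2} - \varepsilon .
\]
```

Along any path, \(\tfrac{1}{2}\dot{\phi }^{2}-(1-\delta )\dot{\phi }+\tfrac{(1-\delta )^{2}}{2}=\tfrac{1}{2}(\dot{\phi }-(1-\delta ))^{2}\ge 0\), which is exactly the strict subsolution inequality with slack \(\varepsilon \); the boundary condition holds since \(\bar V_{\delta }(z,1)=(1-\delta )(1-z)\le 0\) for \(z\ge 1\). The starting value \(\bar V_{\delta }(0,0)=(1-\delta )-(1-\delta )^{2}/2-\varepsilon \) sits slightly below the optimal \(1/2\), as the remark indicates.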

5 Splitting for Finite-Time Problems

By adding time as a state variable, finite-time problems such as those discussed in the context of importance sampling in Sect. 15.2 can also be put into the RESTART framework. Thus the process \(\{X_{i}^{n}\}\) is replaced by \(\{(X_{i}^{n}, t_{i}^{n})\}\), where

$$ X_{i+1}^{n}=X_{i}^{n}+\frac{1}{n}v_{i}(X_{i}^{n}),\quad X_{0}^{n}=x_{0},\quad t_{i+1}^{n}=t_{i}^{n}+\frac{1}{n},\quad t_{0}^{n}=0. $$

Consider, for example, the estimation of \(P_{x_{0}}\left\{ X^{n}(T)\in B\right\} \). For this problem, it is assumed that the rare set \(B\subset \mathbb {R}^{d}\) does not contain the terminal values \(\phi (T)\) of zero-cost trajectories that start at \(x_{0}\), and the typical behavior is \(\left\{ X^{n}(T)\in A\right\} \) with \(A=B^{c}\). (Note that if we wish to continue the reduction to the time-independent case, then with the state space \(\mathbb {R}^{d+1}\), we would call the rare set \(B\times \{T\}\subset \mathbb {R}^{d+1}\) and the typical set \(A\times \{T\}\).) The definition of subsolution becomes the following, with \(\bar{K}_{y,t, T}\) the set of absolutely continuous trajectories \(\phi \) with \(\phi (t)=y\) and \(\phi (T)\in B\).

Definition 16.20

A continuous function \(\bar{V}:\mathbb {R}^{d} \times [0,T]\rightarrow \mathbb {R}\) is a subsolution if it is bounded from below and

$$ \bar{V}(y,t)\le \inf _{\phi \in \bar{K}_{y,t,T}}\left[ \int _{t}^{T} L(\phi (s),\dot{\phi }(s))ds+\bar{V}(\phi (T), T)\right] $$

for all \((y, t)\in \mathbb {R}^{d}\times [0,T)\), and \(\bar{V}(z, T)\le 0\) for \(z\in B\).

One can add a “bounding” set as in Remark 15.4, which does not change the requirement for \(\bar{V}\) to be a subsolution, except that \(A\times \{T\}\) now also includes the points \(D^{c}\times [0,T)\), and we restrict in the definition to \((y, t)\in D\times [0,T)\). As in Remark 16.13, a classical or piecewise classical subsolution in the sense of Definitions 14.1 and 14.2 is a subsolution in the sense of Definition 16.20.

A related problem of interest is to estimate the probability of escaping any time during the time interval, i.e., \(P_{x_{0}}\{X^{n}(t)\in B\) for some \(t\in [0,T]\}\). In this case, one should replace B in the time-independent setting by \(B\times [0,T]\), and A by \(A\times \{T\}\). The definition of subsolution is then the following.

Definition 16.21

A continuous function \(\bar{V}:\mathbb {R} ^{d}\times [0,T]\rightarrow \mathbb {R}\) is a subsolution if it is bounded from below and

$$ \bar{V}(y,t)\le \inf _{\phi \in \bar{K}_{y,t,s},\,t\le s\le T}\left[ \int _{t}^{s}L(\phi (r),\dot{\phi }(r))dr+\bar{V}(\phi (s), s)\right] $$

for all \((y, t)\in \mathbb {R}^{d}\times [0,T)\) and \(\bar{V}(z, t)\le 0\) for \(z\in B\) and \(t\in [0,T]\).

As an elementary time-dependent example, we consider the case in which the \(\{v_{i}(x)\}\) are N(0, 1) (and thus independent of x), so that \(H(\alpha )=\alpha ^{2}/2\) and \(L(\beta )=\beta ^{2}/2\). With \(B=[1,\infty )\) and \(T=1\),

$$ \bar{V}(x, t)=-x+\frac{1}{2}t+\frac{1}{2} $$

is a subsolution with the optimal value at (0, 0). Splitting thresholds as well as the start of a simulation with splitting rate \(R=3\) are depicted in Fig. 16.3.

Fig. 16.3 Splitting thresholds for the time-dependent problem
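The simulation depicted in Fig. 16.3 can be sketched in code. The following is a minimal illustration of our own (not an algorithm from the text): thresholds are taken as level sets of \(\bar{V}\) spaced \(\log R/n\) apart, each first crossing triggers an \(R\)-fold split, and a trajectory ending in \(B=[1,\infty )\) after crossing \(k\) thresholds contributes weight \(R^{-k}\). The function names, the threshold spacing, and the choice \(n=20\) are all our own.

```python
import math
import random

random.seed(0)

n = 20      # large deviation scaling parameter (small here, for illustration)
R = 3       # splitting rate, as in Fig. 16.3
x0 = 0.0

def vbar(x, t):
    # the subsolution from the text: Vbar(x, t) = -x + t/2 + 1/2
    return -x + 0.5 * t + 0.5

# Thresholds are level sets of vbar spaced log(R)/n apart, so that each
# crossing corresponds to roughly a factor 1/R in probability.
delta = math.log(R) / n
v0 = vbar(x0, 0.0)

def level(x, t):
    # number of splitting thresholds crossed so far (0 at the start)
    return max(0, int((v0 - vbar(x, t)) // delta))

def one_run():
    # stack of particles: (step index i, position x, thresholds crossed k)
    stack = [(0, x0, 0)]
    total = 0.0
    while stack:
        i, x, k = stack.pop()
        while i < n:
            x += random.gauss(0.0, 1.0) / n   # X_{i+1} = X_i + v_i / n
            i += 1
            while level(x, i / n) > k:
                k += 1                        # first crossing of threshold k:
                for _ in range(R - 1):        # spawn R - 1 offspring here
                    stack.append((i, x, k))
        if x >= 1.0:                          # trajectory ended in B = [1, inf)
            total += R ** (-k)                # unbiasing weight R^{-k}
    return total

K = 200
estimate = sum(one_run() for _ in range(K)) / K
print(estimate)   # rough estimate of P{X^n(1) >= 1}, which is tiny for n = 20
```

With the \(\log R/n\) spacing, each threshold crossing has conditional probability of roughly \(1/R\), so the expected particle population per run stays of moderate size, consistent with Theorem 16.18.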

5.1 Subsolutions for Analysis of Metastability

 Suppose that \(x^{*}\) has the property that all zero-cost trajectories are attracted to \(x^{*}\) in the sense that for all \(x_{0}\in \mathbb {R}^{d}\), the properties \(I_{S}(\phi )=0\) for all S and \(\phi (0)=x_{0}\) imply that \(\phi (S)\rightarrow x^{*}\) as \(S\rightarrow \infty \). Consider the issue of estimating \(P_{x^{*}}\left\{ X^{n}(T)\in B\right\} \). Assume for simplicity that \(B^{c}\) is bounded and define

$$ W(x, y)=\inf \left[ I_{S}(\phi ):\phi (0)=x,\phi (S)=y, S<\infty \right] . $$

Then \(W(x,y)\) is the Freidlin–Wentzell quasipotential [140] relative to the starting point x. Suppose again for simplicity of presentation that \(W(x^{*} ,\cdot )\) is continuous. In this context, a particularly convenient subsolution is one of the form

$$ \bar{V}(y,t)=\bar{V}(y)=-W(x^{*}, y)+c. $$

Here \(c\in \mathbb {R}\) is the largest value such that the boundary condition \(-W(x^{*}, y)+c\le 0\) holds for all \(y\in B\).

To see that \(\bar{V}(y)\) is a subsolution, we note that \(W(x,z)\) satisfies the dynamic programming equation

$$ W(x,z)=\inf _{y\in \mathbb {R}^{d}}\left[ W(x,y)+W(y, z)\right] , $$

from which it follows that for all \(y\in \mathbb {R}^{d}\) and \(c\in \mathbb {R}\),

$$ -W(x^{*},y)+c\le -W(x^{*},z)+c+W(y, z). $$

The definition of \(W(y,z)\) then gives [for all \(\phi \in \bar{K}_{y,t, T}\) and with \(z=\phi (T)\)] that

$$ \bar{V}(y)\le \bar{V}(\phi (T))+\int _{t}^{T}L(\phi (s),\dot{\phi }(s))ds, $$

and therefore \(\bar{V}\) is a subsolution. One can show that under appropriate conditions, as \(T\rightarrow \infty \),

$$ \inf _{\phi \in \bar{K}_{x^{*},0, T}}\left[ \int _{0}^{T} L(\phi (s),\dot{\phi }(s))ds\right] $$

converges to \(\bar{V}(x^{*})\), and hence \(\bar{V}(y)\) is a potentially useful subsolution for studying escape to B from a neighborhood of the attractor \(x^{*}\) at the end of a long time interval.
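As a concrete one-dimensional illustration of our own (not from the text), consider Ornstein–Uhlenbeck-type zero-cost dynamics \(\dot{\phi }=-\phi \) with local rate function \(L(x,\beta )=(\beta +x)^{2}/2\), so that every zero-cost trajectory is attracted to \(x^{*}=0\). Minimizing the cost of reaching \(y\) over the time-reversed paths \(\dot{\phi }=\phi \) gives

```latex
\[
W(0,y)=\int_{0}^{|y|} 2u\,du = y^{2},
\qquad
\bar V(y) = -W(0,y) + c = b^{2}-y^{2}
\quad\text{for } B=\{y:|y|\ge b\},\; c=b^{2}.
\]
```

Here \(\bar V\le 0\) on \(B\), and \(\bar V(0)=b^{2}=\inf _{y\in B}W(0,y)\), which matches the large-\(T\) limit identified above.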

With regard to the problem of estimating \(P_{x^{*}}\{X^{n}(t)\in B\) for some \(t\in [0,T]\}\), \(\bar{V}(y)\) again provides a (time-independent) subsolution with a nearly optimal value at \(x^{*}\) when T is large. The argument is similar to the case of \(P_{x^{*}}\left\{ X^{n}(T)\in B\right\} \) and is hence omitted.

6 Notes

Particle splitting methods originate with [166], and are further developed in [30]. A review of their application to rare-event problems appears in [145], as well as [223]. The RESTART algorithm, which is the focus of this chapter, was first presented in [241].

The main source for this chapter is [77], which uses a more general formulation and also phrases the assumptions to explicitly include queueing networks and expected values. Just as with importance sampling, some incorrect uses of large deviation asymptotics for the design of splitting schemes have been proposed, and a discussion on these issues can be found in [149].