1 Introduction

Small random perturbations often have a lasting effect on the long-time evolution of dynamical systems. For example, they give rise to transitions between otherwise stable equilibria, a phenomenon referred to as metastability that is observed in a wide variety of contexts, e.g. phase separation, population dynamics, chemical reactions, climate regimes, neuroscience, or fluid dynamics. Since the time-scale over which these transition events occur is typically exponentially large in some control parameter (for example the noise amplitude), a brute-force simulation approach to compute these events quickly becomes infeasible. Fortunately, it is possible to exploit the fact that the mechanism of these transitions is often predictable when the random perturbations have small amplitude: with high probability the transitions occur by their path of maximum likelihood (PML), and knowledge of this PML also makes it possible to estimate their rate. This is the essence of large deviation theory (LDT) [20], which applies in a wide variety of contexts. For example, systems whose evolution is governed by a stochastic (ordinary or partial) differential equation driven by a small noise, by a Markov jump process in which jumps occur often but lead to small changes of the system state, or by slow/fast dynamics in which the fast variables are randomly driven and the slow ones feel these perturbations through the effect of the fast variables only, all fit within the framework of LDT. Note that, typically, the dynamics of these systems fail to exhibit microscopic reversibility (detailed balance) and the transitions therefore occur out-of-equilibrium. Nevertheless, LDT still applies.

LDT also indicates that the PML is computable as the minimizer of a specific objective function (action): the large deviation rate function of the problem at hand. This is a non-trivial numerical optimization problem which calls for tailor-made techniques for its solution. Here we will focus on one such technique, the geometric minimum action method (gMAM) [26, 39], which is based on the minimum action method and its variants [17, 41, 44], and was designed to perform the action minimization over both the transition path location and its duration. This computation gives the so-called quasipotential, whose role is key to understanding the long time effect of the random perturbations on the system, including the mechanism of transition events induced by these perturbations. Our purpose here is twofold. First, we would like to briefly review the theoretical aspects behind LDT that lead to the rate function minimization problem and, in particular, to the geometric variant of it that is central to gMAM. Second, we would like to discuss in some detail the computational issues this minimization entails, and remedy a drawback of gMAM, namely its somewhat complicated descent step that requires higher order derivatives of the large deviation Hamiltonian. Here, we propose a simpler algorithm, minimizing the geometric action functional, but requiring only first order derivatives of the Hamiltonian. The power of this algorithm is then illustrated via applications to a selection of problems:

  1. the Maier-Stein model, which is a toy non-gradient stochastic ordinary differential equation that breaks detailed balance;

  2. a stochastic Allen-Cahn/Cahn-Hilliard partial differential equation motivated by population dynamics;

  3. the stochastic Burgers-Huxley PDE, related to fluid dynamics and neuroscience;

  4. Egger's and Charney-DeVore equations, introduced as climate models displaying noise-induced transitions between metastable regimes;

  5. a generalized voter/Ising model with multiplicative noise;

  6. metastable networks of chemical reaction equations and reaction-diffusion equations;

  7. a fast/slow system displaying transitions of the slow variables induced by the effects of the fast ones.

The remainder of this paper is organized as follows. In Sect. 2 we briefly review the key concepts of LDT that we will use (Sect. 2.1) and give a geometrical point of view of the theory that led to the action used in gMAM (Sect. 2.2). In Sect. 3 we discuss the numerical aspects related to the minimization of the geometric action, propose a simplified algorithm to perform this calculation, and compare it to existing algorithms. We also discuss further simplifications of the algorithm that apply in regularly occurring special cases, such as additive or multiplicative Gaussian noise. Finally, in Sect. 4 we present the applications listed above.

2 Freidlin-Wentzell Large Deviation Theory (LDT)

Here we first give a brief overview of LDT [20], focusing mainly on stochastic differential equations (SDEs) for simplicity, but indicating also how the theory can be extended to other models, such as Markov jump processes or fast/slow systems. Then we discuss the geometric reformulation of the action minimization problem that is used in gMAM.

2.1 Some Key Concepts in LDT

Consider the following SDE for \(X \in \mathbb{R}^{n}\)

$$\displaystyle{ dX = b(X)dt + \sqrt{\epsilon }\sigma (X)dW, }$$
(1)

where \(b: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n}\) denotes the drift term, W is a standard Wiener process in \(\mathbb{R}^{n}\), \(\sigma: \mathbb{R}^{n} \rightarrow \mathbb{R}^{n\times n}\) is related to the diffusion tensor via \(a(x) = (\sigma \sigma ^{T})(x)\), and ε > 0 is a parameter measuring the noise amplitude. Suppose that we want to estimate the probability of an event, such as finding the solution in a set \(B \subset \mathbb{R}^{n}\) at time T given that it started at X(0) = x at time t = 0. LDT indicates that, in the limit as ε → 0, this probability can be estimated via a minimization problem:

$$\displaystyle{ \mathbb{P}^{x}\left (X(T) \in B\right ) \asymp \exp \left (-\epsilon ^{-1}\min _{ \phi \in {C}}S_{T}(\phi )\right ). }$$
(2)

Here ≍ denotes log-asymptotic equivalence (i.e. the ratio of the logarithms of both sides tends to 1 as ε → 0), the minimum is taken over the set \({C} =\{\phi \in C([0,T], {R}^{n}):\phi (0) = x,\phi (T) \in B\}\), and we defined the action functional

$$\displaystyle{ S_{T}(\phi ) = \left \{\begin{array}{@{}l@{\quad }l@{}} \int _{0}^{T}L(\phi,\dot{\phi })\,dt\quad &\text{if the integral converges} \\ \infty \quad &\text{otherwise.} \end{array} \right. }$$
(3)

Here

$$\displaystyle{ L(\phi,\dot{\phi }) = \tfrac{1} {2}\langle \dot{\phi }-b(\phi ),\left (a(\phi )\right )^{-1}(\dot{\phi }-b(\phi ))\rangle, }$$
(4)

where we assumed for simplicity that a(ϕ) is invertible and 〈⋅ , ⋅ 〉 denotes the Euclidean inner product in \(\mathbb{R}^{n}\). LDT also indicates that, as ε → 0, when the event occurs, it does so with X being arbitrarily close to the minimizer

$$\displaystyle{ \phi _{{\ast}} =\mathop{ \text{argmin }}_{\phi \in {C}}S_{T}(\phi ) }$$
(5)

in the sense that

$$\displaystyle{ \forall \delta > 0: \qquad \lim _{\epsilon \rightarrow 0}\mathbb{P}^{x}{\Bigl (\,\sup _{ 0\leq t\leq T}\vert X(t) -\phi _{{\ast}}(t)\,\vert \, <\delta \Big \vert X(T) \in B\Bigr )} = 1 }$$

Thus, from a computational viewpoint, the main question becomes how to perform the minimization in (5). Note that, if we define the Hamiltonian associated with the Lagrangian  (4)

$$\displaystyle{ H(\phi,\theta ) =\langle b(\phi ),\theta \rangle +\tfrac{1} {2}\langle \theta,a(\phi )\theta \rangle }$$
(6)

such that

$$\displaystyle{ L(\phi,\dot{\phi }) =\sup _{\theta }\left (\langle \dot{\phi },\theta \rangle -H(\phi,\theta )\right ), }$$
(7)

this minimization reduces to the solution of Hamilton's equations of motion,

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} \dot{\phi }= H_{\theta }(\phi,\theta ) = b(\phi ) + a(\phi )\theta \quad \\ \dot{\theta }= -H_{\phi }(\phi,\theta ) = -(b_{\phi }(\phi ))^{T}\theta -\tfrac{1} {2}\langle \theta,a_{\phi }(\phi )\theta \rangle,\quad \end{array} \right. }$$
(8)

where subscripts denote differentiation and we use the convention \((b_{\phi })_{ij} = \partial b_{i}/\partial \phi _{j}\). What makes the problem nonstandard, however, is the fact that these equations must be solved as a boundary value problem, with ϕ(0) = x and \(\phi (T) = y \in B\). We will come back to this issue below.

If the minimum of the action in (2) is nonzero, this equation indicates that the probability of finding the solution in B at time T is exponentially small in ε, i.e. it is a rare event. This is typically the case if one considers events that occur on a finite time interval, T < ∞ fixed. LDT, however, also makes it possible to analyze the effects of the perturbations over an infinite time span, in which case they become ubiquitous. In this context, the central object in LDT is the quasipotential defined as

$$\displaystyle{ V (x,y) =\inf _{T>0}\min _{\phi \in {C}_{x,y}}S_{T}(\phi ), }$$
(9)

where \({C}_{x,y}\,=\,\{\phi \in C([0,T], {R}^{n}):\phi (0)\,=\,x,\phi (T)\,=\,y\}\). The quasipotential makes it possible to answer several questions about the long time behavior of the system. For example, if we assume that the deterministic equation associated with (1), \(\dot{X}\,=\,b(X)\), possesses a single stable fixed point, \(x_{a}\), as unique stable structure, and that (1) admits a unique invariant distribution, the density associated with this distribution can be estimated as ε → 0 as

$$\displaystyle{ \rho (x) \asymp \exp \left (-\epsilon ^{-1}V (x_{ a},x)\right ). }$$
(10)

Similarly, if \(\dot{X} = b(X)\) possesses two stable fixed points, \(x_{a}\) and \(x_{b}\), whose basins of attraction have a common boundary, we can estimate the mean first passage time the system takes to travel from one fixed point to the other as

$$\displaystyle{ \mathbb{E}\tau _{a\rightarrow b} \asymp \exp \left (\epsilon ^{-1}V (x_{ a},x_{b})\right ), }$$
(11)

where

$$\displaystyle{ \tau _{a\rightarrow b} =\inf \{ t: X(t) \in B_{\delta }(x_{b}),X(0) = x_{a}\}, }$$
(12)

in which B δ (x b ) denotes the ball of radius δ around x b , with δ small enough so that this ball is contained in the basin of attraction of x b . In this setup, we can also estimate the ratio of the stationary probabilities to find the system in the basins of attraction of x a or x b . Denoting these probabilities by p a and p b , respectively, we have

$$\displaystyle{ \frac{p_{a}} {p_{b}} \asymp \frac{\mathbb{E}\tau _{a\rightarrow b}} {\mathbb{E}\tau _{b\rightarrow a}} \asymp \exp \left (\epsilon ^{-1}(V (x_{ a},x_{b}) - V (x_{b},x_{a}))\right ). }$$
(13)

These statements can be generalized to many other situations, e.g. if \(\dot{X} = b(X)\) possesses more than two stable fixed points, or attracting structures that are more complicated than points, such as limit cycles. They can also be generalized to dynamical systems other than (1), e.g. if this equation is replaced by a stochastic partial differential equation (SPDE), or for Markov jump processes in which the jump rates are fast but lead to small changes of the system's state [20, 35], or in slow/fast systems where the slow variables feel random perturbations through the effect the fast variables have on them [6, 19, 28, 29, 40]. In all cases, LDT provides us with an action functional like (3), but in which the Lagrangian is different from (4) if the system's dynamics is not governed by an S(P)DE. Typically, the theory yields an expression for the Hamiltonian (6), which may be non-quadratic in the momenta, or even such that the Legendre transform in (7) is not available analytically. This per se is not an issue, since we can in principle minimize the action by solving Hamilton's equations (8). However, these calculations face two difficulties. The first, already mentioned above, is that (8) must be solved as a boundary value problem. The second, which is specific to the calculation of the quasipotential in (9), is that the time span over which (8) are solved must be varied as well since (9) involves a minimization over T, and typically the minimum is reached as T → ∞ (i.e. there is a minimizing sequence but no minimizer), which complicates matters even more. These issues motivate a geometric reformulation of the problem, which was first proposed in [25] and which we recall next.

2.2 Geometric Action Functional

As detailed in [25] (see Proposition 2.1 in that paper), the quasipotential defined in (9) can also be expressed as

$$\displaystyle{ V (x,y) =\min _{\varphi \in \hat{{C}}_{x,y}}\hat{S}(\varphi ), }$$
(14)

where \(\hat{{C}}_{x,y} =\{\varphi \in C([0,1], {R}^{n}):\varphi (0) = x,\varphi (1) = y\}\) and \(\hat{S}(\varphi )\) is the geometric action that can be defined in the following equivalent ways:

$$\displaystyle{ \hat{S}(\varphi ) =\sup _{\vartheta:H(\varphi,\vartheta )=0}\int _{0}^{1}\langle \varphi ',\vartheta \rangle ds }$$
(15a)
$$\displaystyle{ \hat{S}(\varphi ) =\int _{ 0}^{1}\langle \varphi ',\vartheta _{ {\ast}}(\varphi,\varphi ')\rangle ds }$$
(15b)
$$\displaystyle{ \hat{S}(\varphi ) =\int _{ 0}^{1} \frac{1} {\lambda (\varphi,\varphi ')}L(\varphi,\lambda \varphi ')ds, }$$
(15c)

where \(\vartheta _{{\ast}}(\varphi,\varphi ')\) and λ(φ, φ′) are the solutions to

$$\displaystyle{ H(\varphi,\vartheta _{{\ast}}(\varphi,\varphi ')) = 0,\qquad H_{\vartheta }(\varphi,\vartheta _{{\ast}}(\varphi,\varphi ')) =\lambda (\varphi,\varphi ')\varphi '\quad \text{with }\lambda \geq 0. }$$
(16)

The action \(\hat{S}(\varphi )\) has the property that its value is left invariant by reparametrization of the path φ, i.e. it is an action on the space of continuous curves. In particular, one is free to choose the arclength parametrization for φ, i.e. |φ′| = const = L, where \(L =\int _{0}^{1}\vert \varphi '\vert \,ds\) is the length of the path. This also means that the minimizer of (14) exists in more general cases (namely as long as the minimizing path has finite length), which makes the minimization problem easier to handle numerically, as shown next.

3 Numerical Minimization of the Geometric Action

From (14), we see that the calculation of the quasipotential reduces to a minimization problem, whose Euler-Lagrange equation is simply

$$\displaystyle{ D_{\varphi }\hat{S}(\varphi ) = 0, }$$
(17)

where \(D_{\varphi }\) denotes the functional gradient with respect to φ. The main issue then becomes how to find the solution φ of (17) that minimizes the action \(\hat{S}(\varphi )\). In this section, we first briefly review how gMAM achieves this task. We will then introduce a simplified variant of the gMAM algorithm that in its simplest form relies solely on first order derivatives of the Hamiltonian. Subsequently, we also analyze several special cases where the numerical treatment can be simplified even further.

3.1 Geometric Minimum Action Method

The starting point of gMAM is the following expression involving \(D_{\phi }\hat{S}(\varphi )\) that can be calculated directly from formula (15b) for the action functional:

$$\displaystyle{ -\lambda H_{\vartheta \vartheta }D_{\varphi }\hat{S}(\varphi ) =\lambda ^{2}\varphi '' -\lambda H_{\vartheta \varphi }\varphi ' + H_{\vartheta \vartheta }H_{\varphi } +\lambda \lambda '\varphi '. }$$
(18)

This is derived as Proposition 3.1 in Appendix E of [25], and we will show below how this expression can be intuitively understood. Since H ϑϑ is assumed to be positive definite and λ ≥ 0, we can use (18) directly to compute the solution of (17) that minimizes \(\hat{S}(\varphi )\) via a relaxation method in virtual time τ, that is, using the equation:

$$\displaystyle\begin{array}{rcl} \frac{\partial \varphi } {\partial \tau }& =& -\lambda H_{\vartheta \vartheta }D_{\varphi }\hat{S}(\varphi ) \\ & =& \lambda ^{2}\varphi '' -\lambda H_{\vartheta \varphi }\varphi ' + H_{\vartheta \vartheta }H_{\varphi } +\lambda \lambda '\varphi '.{}\end{array}$$
(19)

This equation is the main equation used in the original gMAM. Note that the computation of the right-hand side of this equation requires the computation of \(H_{\varphi }\), \(H_{\vartheta \varphi }\) and \(H_{\vartheta \vartheta }\), where the second derivatives of the Hamiltonian can become unwieldy for the more complicated systems that arise naturally when trying to use gMAM in practical applications. In Sect. 3.2 we propose a simplification of this algorithm that reduces the necessary terms to only first order derivatives of the Hamiltonian, \(H_{\vartheta }\) and \(H_{\varphi }\).

Coming back to (18), it can be intuitively understood by using the associated Hamiltonian system. Consider a reparametrization of the original minimizer, \(\varphi _{{\ast}}(s(t)) =\phi _{{\ast}}(t)\). In the following we use a dot to denote a partial derivative with respect to time and a prime to denote a partial derivative with respect to the parametrization s, hence \(\dot{v} \equiv \partial v/\partial t\) and \(v' \equiv \partial v/\partial s\). With this notation, we find for \(\lambda ^{-1} = t'(s)\) that \(\dot{\phi }_{{\ast}} =\lambda \varphi _{{\ast}}'\) as well as \(\dot{\phi }_{{\ast}} = H_{\theta },\dot{\theta }_{{\ast}} = -H_{\phi }\), and therefore

$$\displaystyle\begin{array}{rcl} \ddot{\phi }_{{\ast}}& =& H_{\theta \phi }\dot{\phi }_{{\ast}} + H_{\theta \theta }\dot{\theta }_{{\ast}} {}\\ & =& \lambda H_{\theta \phi }\varphi _{{\ast}}'- H_{\theta \theta }H_{\phi } {}\\ \end{array}$$

but also, since \(\partial /\partial t =\lambda \,\partial /\partial s\),

$$\displaystyle\begin{array}{rcl} \ddot{\phi }_{{\ast}}& =& \partial (\lambda \varphi _{{\ast}}')/\partial t {}\\ & =& \lambda \lambda '\varphi _{{\ast}}' +\lambda ^{2}\varphi _{ {\ast}}'' {}\\ \end{array}$$

so in total

$$\displaystyle{ -\lambda \lambda '\varphi _{{\ast}}' +\lambda H_{\theta \phi }\varphi _{{\ast}}'- H_{\theta \theta }H_{\phi } -\lambda ^{2}\varphi _{ {\ast}}'' = 0 =\lambda H_{\theta \theta }D_{\varphi }\hat{S}(\varphi ), }$$

i.e. indeed the gradient vanishes at the minimizer.

3.2 A Simplified gMAM

In contrast to the previous section, we start from the form (15a) of the geometric action. We want to solve the mixed optimization problem, i.e. find a trajectory φ such that

$$\displaystyle{ \varphi _{{\ast}} =\mathop{ \mathop{argmin}}\limits _{\varphi \in \hat{{C}}_{x,y}}\sup _{\vartheta:H(\varphi,\vartheta )=0}E(\varphi,\vartheta ), }$$
(20)

where

$$\displaystyle{ E(\varphi,\vartheta ) =\int _{ 0}^{1}\langle \varphi ',\vartheta \rangle \,ds. }$$
(21)

Let

$$\displaystyle{ E_{{\ast}}(\varphi ) =\sup _{\vartheta:H(\varphi,\vartheta )=0}E(\varphi,\vartheta ) }$$
(22)

and \(\vartheta _{{\ast}}(\varphi )\) such that \(E_{{\ast}}(\varphi ) = E(\varphi,\vartheta _{{\ast}}(\varphi ))\). This implies that \(\vartheta _{{\ast}}\) fulfills the Euler-Lagrange equation associated with the constrained optimization problem in (22), that is,

$$\displaystyle{ D_{\vartheta }E(\varphi,\vartheta _{{\ast}}) =\mu H_{\vartheta }(\varphi,\vartheta _{{\ast}}), }$$
(23)

where on the right-hand side μ(s) is the Lagrange multiplier added to enforce the constraint \(H(\varphi,\vartheta ) = 0\). In particular, at \(\vartheta =\vartheta _{{\ast}}\), we have

$$\displaystyle{ \mu = \frac{\|D_{\vartheta }E\|^{2}} {\langle \!\langle D_{\vartheta }E,H_{\vartheta }\rangle \!\rangle } = \frac{\|\varphi '\|^{2}} {\langle \!\langle \varphi ',H_{\vartheta }\rangle \!\rangle }, }$$
(24)

where the inner product \(\langle \!\langle \cdot,\cdot \rangle \!\rangle \) and its induced norm ∥⋅ ∥ can be chosen appropriately, for example as \(\langle \cdot,\cdot \rangle \) or \(\langle \cdot,H_{\vartheta \vartheta }^{-1}\cdot \rangle \).

At the minimizer \(\varphi _{{\ast}}\), the variation of \(E_{{\ast}}\) with respect to φ vanishes. Using (23) we conclude

$$\displaystyle\begin{array}{rcl} 0 = D_{\varphi }E_{{\ast}}(\varphi _{{\ast}})& =& D_{\varphi }E(\varphi _{{\ast}},\vartheta _{{\ast}}) + \left [D_{\vartheta }ED_{\varphi }\vartheta \right ]_{(\varphi,\vartheta )=(\varphi _{{\ast}},\vartheta _{{\ast}})} \\ & =& -\vartheta _{{\ast}}' +\mu \left [H_{\vartheta }D_{\varphi }\vartheta \right ]_{(\varphi,\vartheta )=(\varphi _{{\ast}},\vartheta _{{\ast}})} \\ & =& -\vartheta _{{\ast}}'-\mu H_{\varphi }(\varphi _{{\ast}},\vartheta _{{\ast}}), {}\end{array}$$
(25)

where in the last step we used \(H(\varphi,\vartheta _{{\ast}}) = 0\) and therefore

$$\displaystyle{H_{\varphi }(\varphi,\vartheta _{{\ast}}) = -H_{\vartheta }(\varphi,\vartheta _{{\ast}})D_{\varphi }\vartheta.}$$

Multiplying the gradient (25) with any positive definite matrix as pre-conditioner yields a descent direction. It is necessary to choose \(\mu ^{-1}\) as pre-conditioner to ensure convergence around critical points, where φ′ = 0.

Summarizing, we have reduced the minimization of the geometric action to two separate tasks (a schematic numerical sketch is given after this list):

  1. For a given φ, find \(\vartheta _{{\ast}}(\varphi )\) by solving the constrained optimization problem

    $$\displaystyle{ \vartheta _{{\ast}}(\varphi ) =\mathop{ \mathop{argmax}}\limits _{\vartheta,H(\varphi,\vartheta )=0}E(\varphi,\vartheta ), }$$
    (26)

    which is equivalent to solving

    $$\displaystyle{ D_{\vartheta }E(\varphi,\vartheta _{{\ast}}) =\varphi ' =\mu H_{\vartheta }(\varphi,\vartheta _{{\ast}}) }$$
    (27)

    for (μ, \(\vartheta _{{\ast}}\)) under the constraint \(H(\varphi,\vartheta _{{\ast}}) = 0\). This can be done via

    • gradient descent;

    • a second order algorithm for faster convergence (e.g. Newton-Raphson, as employed in [25]);

    • in many cases, analytically (see below).

  2. Find \(\varphi _{{\ast}}\) by solving the optimization problem

    $$\displaystyle{ \varphi _{{\ast}} =\mathop{ \mathop{argmin}}\limits _{\varphi \in \hat{{C}}_{x,y}}E_{{\ast}}(\varphi ), }$$
    (28)

    for example by pre-conditioned gradient descent, using as direction

    $$\displaystyle{ -\mu ^{-1}D_{\varphi }E_{ {\ast}} =\mu ^{-1}\vartheta _{ {\ast}}'(\varphi ) + H_{\varphi }(\varphi,\vartheta _{{\ast}}(\varphi )), }$$
    (29)

    with \(\mu ^{-1}\) as pre-conditioner. The constraint on the parametrization, e.g. | φ′ | = const, must be fulfilled during this descent (see below).
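To illustrate the structure of this two-step procedure, the following is a minimal Python sketch of task 1, using a generic root finder in place of the Newton-Raphson iteration mentioned above; the callables H and H_theta, as well as the warm-starting logic, are illustrative placeholders rather than part of the method as published, and for SDE Hamiltonians the analytic solutions of Sects. 3.4 and 3.5 should be preferred.

    import numpy as np
    from scipy.optimize import fsolve

    def inner_solve(phi, dphi, H, H_theta, theta_init=None):
        """Task 1: for each image phi[i] of the discretized path, solve (27) together with
        the constraint H = 0, i.e. find (theta, mu) such that
        mu * H_theta(phi[i], theta) = dphi[i] and H(phi[i], theta) = 0."""
        N, n = phi.shape
        theta = np.zeros((N, n)) if theta_init is None else theta_init.copy()
        mu = np.ones(N)
        for i in range(N):
            def F(z, i=i):
                th, m = z[:n], z[n]
                return np.append(m * H_theta(phi[i], th) - dphi[i], H(phi[i], th))
            # warm-start from the neighboring image (or a previous outer iteration)
            # to help the root finder avoid the trivial branch theta = 0
            guess = theta[i] if theta_init is not None else (theta[i - 1] if i > 0 else theta[i])
            sol = fsolve(F, np.append(guess, mu[i - 1] if i > 0 else 1.0))
            theta[i], mu[i] = sol[:n], sol[n]
        return theta, mu

Task 2 then updates φ with the pre-conditioned descent direction (29) and re-enforces the parametrization constraint (see Sect. 3.6); a fully worked version for the additive-noise case is sketched in Sect. 3.4.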

3.3 Connection to gMAM

The problem of finding \(\vartheta _{{\ast}}(\varphi )\) is equivalent to (16) from gMAM and the same methods are applicable. In particular, note that the Lagrange multiplier μ which enforces \(H(\varphi _{{\ast}},\vartheta _{{\ast}}) = 0\) is identical to \(\lambda ^{-1}\).

It is also easy to see that, at \((\varphi _{{\ast}},\vartheta _{{\ast}})\), the combined optimization problem \(\{D_{\vartheta }E =\mu H_{\vartheta },\ D_{\varphi }E_{{\ast}} = 0\}\) is identical to the geometric equations of motion,

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} D_{\vartheta }E =\varphi ' =\mu H_{\vartheta } \quad \\ D_{\varphi }E_{ {\ast}} = -\vartheta '-\mu H_{\varphi } = 0.\quad \end{array} \right. }$$
(30)

On the other hand, none of the formulas in the above section use higher derivatives of the Hamiltonian: Only H φ and H ϑ are needed, which is a significant simplification. This is obviously also true for the equations of motion (8) and their geometric variant (30), which is the basis for the efficiency of algorithms like [11, 21, 22].

3.4 Simplifications for SDEs with Additive Noise

For an SDE of the form

$$\displaystyle{ dX = b(X)dt + \sqrt{\epsilon }\,dW, }$$
(31)

where σ = Id, the equations of gMAM become significantly simpler. In the following, we derive explicit expressions for this case, as it arises in numerous applications.

The corresponding Hamiltonian is given by

$$\displaystyle{ H(\varphi,\vartheta ) =\langle b,\vartheta \rangle +\frac{1} {2}\langle \vartheta,\vartheta \rangle }$$
(32)

and we find directly

$$\displaystyle{ H_{\varphi } = (b_{\varphi })^{T}\vartheta,\qquad H_{\vartheta } = b +\vartheta. }$$

In many cases we consider exits from stable fixed points of the deterministic system, for which we have H = 0 along the minimizer; using this together with \(D_{\vartheta }E =\mu H_{\vartheta }\), we conclude that

$$\displaystyle{ \vert H_{\vartheta }\vert ^{2} = \vert b +\vartheta \vert ^{2} = \vert b\vert ^{2} + 2\langle b,\vartheta \rangle +\langle \vartheta,\vartheta \rangle = \vert b\vert ^{2} + 2H = \vert b\vert ^{2}. }$$
(33)

As a result

$$\displaystyle{ \mu = \frac{\vert D_{\vartheta }E\vert } {\vert H_{\vartheta }\vert } = \frac{\vert \varphi '\vert } {\vert b +\vartheta \vert } = \frac{\vert \varphi '\vert } {\vert b\vert }, }$$
(34)

i.e. we can compute μ without the knowledge of ϑ. On the other hand (27) implies

$$\displaystyle{ \varphi ' =\mu H_{\vartheta } =\mu (b+\vartheta )\quad \Rightarrow \quad \vartheta =\mu ^{-1}\varphi ' - b. }$$
(35)

The whole algorithm therefore reduces to the gradient descent

$$\displaystyle{ \frac{\partial \varphi } {\partial \tau } =\mu ^{-1}\vartheta _{ {\ast}}' + (b_{\varphi })^{T}\vartheta _{ {\ast}}, }$$
(36)

with μ, ϑ given by (34) and (35). Examples in this class will be treated in Sects. 4.1, 4.2, and 4.4 below.
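As an illustration, here is a minimal numpy sketch of one relaxation step of (36); the callables b and bgradT are placeholders for user-supplied routines returning \(b(\varphi )\) and \((b_{\varphi })^{T}\vartheta \) for all images at once, the path is stored as an array of shape (N, n) with its endpoints kept fixed, and the parametrization constraint is enforced by interpolation as described in Sect. 3.6. Note also that for additive noise the geometric action itself reduces to \(\hat{S}(\varphi ) =\int _{0}^{1}(\vert b\vert \,\vert \varphi '\vert -\langle b,\varphi '\rangle )\,ds\), which can be evaluated along the iteration to monitor convergence.

    import numpy as np

    def reparametrize(phi):
        """Redistribute the images so that |phi'| = const along the path, by linear
        interpolation of each component against normalized arc length."""
        seg = np.linalg.norm(np.diff(phi, axis=0), axis=1)
        alpha = np.concatenate(([0.0], np.cumsum(seg)))
        alpha /= alpha[-1]
        s = np.linspace(0.0, 1.0, phi.shape[0])
        return np.stack([np.interp(s, alpha, phi[:, d]) for d in range(phi.shape[1])], axis=1)

    def step_additive(phi, b, bgradT, h):
        """One descent step of (36): mu and theta* from (34)-(35), then
        phi_tau = mu^{-1} theta*' + (b_phi)^T theta*, keeping the endpoints fixed."""
        N = phi.shape[0]
        ds = 1.0 / (N - 1)
        dphi = np.gradient(phi, ds, axis=0)                    # phi'
        drift = b(phi)
        nb = np.maximum(np.linalg.norm(drift, axis=1), 1e-12)  # guard: |b| = 0 at the fixed points
        mu = np.linalg.norm(dphi, axis=1) / nb                 # eq. (34)
        theta = dphi / mu[:, None] - drift                     # eq. (35)
        dtheta = np.gradient(theta, ds, axis=0)                # theta*'
        force = dtheta / mu[:, None] + bgradT(phi, theta)      # right-hand side of (36)
        phi_new = phi.copy()
        phi_new[1:-1] += h * force[1:-1]
        return reparametrize(phi_new)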

3.5 Simplifications for General SDEs (Multiplicative Noise)

As a slightly more complicated case, consider the following SDE with multiplicative noise:

$$\displaystyle{ dX = b(X)\,dt + \sqrt{\epsilon }\sigma (X)\,dW, }$$
(37)

where \(a(\varphi ) =\sigma (\varphi )\sigma ^{T}(\varphi )\). Then the Hamiltonian reads

$$\displaystyle{ H(\varphi,\vartheta ) =\langle b,\vartheta \rangle +\tfrac{1} {2}\langle \vartheta,a\vartheta \rangle }$$
(38)

and

$$\displaystyle{ H_{\varphi } = (b_{\varphi })^{T}\vartheta + \tfrac{1} {2}\langle \vartheta,(a_{\varphi })\vartheta \rangle,\qquad H_{\vartheta } = b + a\vartheta. }$$
(39)

Defining the inner product and norm induced by the noise covariance, \(\langle u,v\rangle _{a} =\langle u,a^{-1}v\rangle \) and \(\vert u\vert _{a} =\langle u,u\rangle _{a}^{1/2}\), yields, as before,

$$\displaystyle{ \vert H_{\vartheta }\vert _{a} = \vert b\vert _{a}\qquad \Rightarrow \qquad \mu = \frac{\vert \varphi '\vert _{a}} {\vert b\vert _{a}} }$$
(40)

and

$$\displaystyle{ \vartheta = a^{-1}(\mu ^{-1}\varphi ' - b). }$$
(41)

In the case of multiplicative noise, the algorithm therefore reads

$$\displaystyle{ \frac{\partial \varphi } {\partial \tau } =\mu ^{-1}\vartheta _{ {\ast}}' + \left ((b_{\varphi })^{T}\vartheta _{ {\ast}} + \tfrac{1} {2}\langle \vartheta _{{\ast}},(a_{\varphi })\vartheta _{{\ast}}\rangle \right ), }$$
(42)

with μ, \(\vartheta _{{\ast}}\) given by (40) and (41). An example in this class will be treated in Sect. 4.5.
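For completeness, here is the corresponding pointwise computation of (40)-(41) as a small numpy sketch (one path image at a time), assuming that a(φ) is invertible; the additional term \(\tfrac{1} {2}\langle \vartheta _{{\ast}},(a_{\varphi })\vartheta _{{\ast}}\rangle \) entering the descent (42) has to be supplied separately for the model at hand.

    import numpy as np

    def theta_mu_multiplicative(dphi_pt, b_pt, a_pt):
        """Pointwise (40)-(41): mu = |phi'|_a / |b|_a and theta* = a^{-1}(mu^{-1} phi' - b),
        with the weighted norm |v|_a = <v, a^{-1} v>^{1/2}."""
        a_inv = np.linalg.inv(a_pt)
        norm_a = lambda v: np.sqrt(v @ a_inv @ v)
        mu = norm_a(dphi_pt) / norm_a(b_pt)
        theta = a_inv @ (dphi_pt / mu - b_pt)
        return theta, mu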

It is also worth pointing out that we encounter difficulties as soon as the noise correlation a is not invertible. This is equivalent to stating that some degrees of freedom are not subject to noise and thus behave deterministically. The adjoint field ϑ has to be equal to zero on these modes, and they fulfill the deterministic equation φ′ = b exactly. This translates into additional constraints for the minimization procedure, which have to be enforced numerically.

3.6 Comments on Improving the Numerical Efficiency

To increase the numerical efficiency of the algorithm, some alterations are possible:

  • Arc-length parametrization, | φ′ | = const, can be enforced trivially and without introducing a stiff Lagrange multiplier term by interpolation along the trajectory every (or every few) iterations. As an additional benefit of this method, all terms of the relaxation dynamics which are proportional to φ′ can be discarded, as they are canceled by the reparametrization. This is of particular use in applications that involve PDEs (see Sect. 3.7), as shown in examples below.

  • Stability in the relaxation parameter can be greatly increased if one treats the stiffest term of the relaxation equation implicitly. In ODE systems, the stiffest term usually is \(H_{\vartheta \vartheta }^{-1}\varphi ''\), which is contained in \(\vartheta _{{\ast}}'\). For simplicity of implementation, it is sufficient to compute \(\vartheta _{{\ast}}\) in the usual way and apply \(\vartheta _{{\ast}}'\) in the descent step, but to subtract \(H_{\vartheta \vartheta }^{-1}\varphi _{n}''\) and add \(H_{\vartheta \vartheta }^{-1}\varphi _{n+1}''\) here (a small sketch of this semi-implicit solve is given after this list). This approach also extends to the case of general Hamiltonians, where the dependence of \(\vartheta _{{\ast}}\) on φ′ is less obvious.

    In our implementation, the relaxation step is conducted by computing

    $$\displaystyle{ \varphi _{n+1} = \left (1 - h\mu ^{-2}H_{\vartheta \vartheta }^{-1}\partial _{ s}^{2}\right )^{-1}R_{ n}, }$$
    (43)

    where

    $$\displaystyle{R_{n} = \left (\varphi _{n} + h(\mu ^{-1}\vartheta _{ {\ast}}'(\varphi _{n}) + H_{\varphi }(\varphi _{n},\vartheta _{{\ast}}(\varphi _{n})) -\mu ^{-2}H_{\vartheta \vartheta }^{-1}\varphi _{ n}'')\right ).}$$

    This division into an implicit treatment of the stiffest term and explicit treatment of the rest is the simplest case of Strang splitting [36] and the implementation of (43) is only first order accurate. The splitting can be taken to arbitrary order [43] under additional computational cost.

    Note that the above modification, while increasing efficiency, at the same time increases complexity, as the computation of the second derivative \(H_{\vartheta \vartheta }\) becomes necessary. In practice, if the Hamiltonian is not too complex, we find that the benefits outweigh the implementation costs, and some problems, especially PDE systems, are not tractable at all with the inefficient but simpler choice of explicit relaxation. If the PDE system contains higher-order spatial derivatives, even more terms should possibly be treated with a stable integrator, as discussed in the next section.

  • Depending on the problem, it might be beneficial to choose a different scalar product in the descent. In the case of traditional gMAM, the descent is done using \(\langle \cdot,(\mu ^{2}H_{\vartheta \vartheta })^{-1}\cdot \rangle \), but other choices are also feasible. Note that it is possible to choose the metric such that at least one term on the right-hand side disappears, as it becomes parallel to the trajectory and is canceled by reparametrization, as outlined above.

  • Some insight into the nature of the transition can be obtained by first finding the heteroclinic orbits, defined geometrically as

    $$\displaystyle{ \varphi ' \parallel b(\varphi ). }$$
    (44)

    This calculation can be done very efficiently even for complicated problems via the string method [16]. Even though the heteroclinic orbit differs from the transition path for systems that violate detailed balance, it does correctly predict the transition from the saddle point onward (the "downhill" portion, which happens deterministically). The method put forward here can then be used to find the transition path up to the saddle (the "uphill" portion) only. If there are several saddles to be taken into account, it is not known a priori which one will be visited by the transition pathway. In this case, the strategy has to be modified accordingly, for example by computing one heteroclinic orbit per saddle. To highlight the relation between the string and the minimizer, we compute and compare the two in many of the applications below. We denote by "string" the heteroclinic orbits, found via the string method, connecting the fixed points to the saddle point of relevance.
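As a small illustration of the semi-implicit treatment referenced in the second bullet point, the following sketch applies the operator \((1 - h\mu ^{-2}H_{\vartheta \vartheta }^{-1}\partial _{s}^{2})^{-1}\) of (43) in the additive-noise case, where \(H_{\vartheta \vartheta } =\mathrm{ Id}\); the finite-difference discretization and the Dirichlet rows for the endpoints are illustrative choices.

    import numpy as np

    def implicit_smoothing_step(R, mu, h, ds):
        """Solve (1 - h mu^{-2} d_s^2) phi_{n+1} = R, cf. (43), with H_thetatheta = Id.
        R has shape (N, n) (one row per image), mu has shape (N,); the second derivative
        along the path is discretized with central differences and the first and last
        rows are identities, so the endpoints stay fixed."""
        N = R.shape[0]
        c = h / (mu**2 * ds**2)
        A = np.eye(N)
        for i in range(1, N - 1):
            A[i, i - 1] = -c[i]
            A[i, i] = 1.0 + 2.0 * c[i]
            A[i, i + 1] = -c[i]
        return np.linalg.solve(A, R)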

3.7 SPDEs with Additive Noise

In this section, we discuss the application to SPDE systems. For simplicity, we focus on the case of SPDEs with additive noise that can be written formally as

$$\displaystyle{ U_{t} = B(U) + \sqrt{\epsilon }\,\eta (x,t), }$$
(45)

where the drift term is given by the operator B(U) and η denotes spatio-temporal white-noise. It is a non-trivial task to make mathematical sense of such SPDEs under spatially irregular noise due to the possible ill-posedness of non-linear terms, especially if the spatial dimension is higher than one. This may require renormalizing the equation, which can be done rigorously in certain cases using the theory of regularity structures [23]. The renormalization procedure typically involves mollifying the noise term on a scale δ, and adding terms in the equation that counterbalance divergences that may occur as one lets δ → 0. In the context of LDT, the main issue is whether these renormalizing terms persist if we also let ε → 0. In [24], it was shown in the context of the stochastic Allen-Cahn equation in 2 or 3 spatial dimensions that the action of the mollified equation converges towards the action associated with the (possibly formal) equation in (45) in which the noise is white-in-space, provided that ε is sent to zero fast enough as δ → 0. This action reads

$$\displaystyle{ S_{T}(\phi ) = \frac{1} {2}\int _{0}^{T}\|\phi _{ t} - B(\phi )\|_{L^{2}}^{2}dt, }$$
(46)

where \(\|\cdot \|_{L^{2}}\) denotes the \(L^{2}\)-norm. This leads to expressions for the geometric action that are similar to those in (15) but with the Euclidean inner product replaced by the \(L^{2}\)-inner product. In the sequel we will not dwell further on these mathematical issues and always assume that (46) and the associated geometric action are the relevant ones to study.

The gradient descent for the minimizer of this geometric action is similar to the one in (36), but with the term \((b_{\varphi })^{T}\) replaced by the functional derivative of the operator B with respect to φ:

$$\displaystyle{ \frac{\partial \varphi } {\partial \tau } =\mu ^{-1}\vartheta _{ {\ast}}' + \left (D_{\varphi }B\right )^{T}\vartheta _{ {\ast}}\,. }$$
(47)

In practice, however, this equation needs to be rewritten in order to ensure numerical stability. This is due to the fact that the scheme will contain derivatives of high order, and their corresponding stability condition (CFL condition) will limit the rate of convergence of the scheme. We therefore want to treat the most restrictive terms either implicitly or with exponential integrators. To this end, let us focus on the following class of problems where the drift B can be written as

$$\displaystyle{ B = L\varphi + R(\varphi ), }$$
(48)

where L is a linear self-adjoint operator containing higher-order derivatives that does not depend on time explicitly, and R(φ) is the rest, possibly nonlinear. Recall that \(\vartheta _{{\ast}}\) can be computed from φ′ via

$$\displaystyle{ \vartheta _{{\ast}} =\mu ^{-1}\varphi ' - B =\mu ^{-1}\varphi ' - L\varphi - R(\varphi ). }$$
(49)

On the other hand, we also have a term proportional to L in

$$\displaystyle{ D_{\varphi }B = D_{\varphi }R + L }$$
(50)

and, therefore, the relaxation formula (47) for φ actually contains a term \(L^{2}\varphi \). If L contains higher-order derivatives, this term will likely be the most restrictive in terms of numerical stability. It is therefore advantageous to treat it separately. Introducing an auxiliary variable \(\tilde{\vartheta }_{{\ast}}\) defined by

$$\displaystyle{ \tilde{\vartheta }_{{\ast}} =\mu ^{-1}\varphi ' - R(\varphi ) =\vartheta _{ {\ast}} + L\varphi }$$
(51)

we can rewrite the relaxation formula as

$$\displaystyle\begin{array}{rcl} \frac{\partial \varphi } {\partial \tau }& =& \mu ^{-1}\vartheta _{ {\ast}}' + \left (D_{\varphi }B\right )^{T}\vartheta _{ {\ast}} {}\\ & =& \mu ^{-1}\tilde{\vartheta }_{ {\ast}}'-\mu ^{-1}L\varphi ' + \left (D_{\varphi }R\right )^{T}\vartheta _{ {\ast}} + L\vartheta _{{\ast}} {}\\ & =& \mu ^{-1}\tilde{\vartheta }_{ {\ast}}'-\mu ^{-1}L\varphi ' + \left (D_{\varphi }R\right )^{T}\vartheta _{ {\ast}} + L(\tilde{\vartheta }_{{\ast}}- L\varphi ) {}\\ & =& \mu ^{-1}\tilde{\vartheta }_{ {\ast}}'-\mu ^{-1}L\varphi ' + \left (D_{\varphi }R\right )^{T}\vartheta _{ {\ast}} + L\tilde{\vartheta }_{{\ast}}- L^{2}\varphi {}\\ & =& \mu ^{-1}\tilde{\vartheta }_{ {\ast}}' + \left (D_{\varphi }R\right )^{T}\vartheta _{ {\ast}}- LR(\varphi ) - L^{2}\varphi. {}\\ \end{array}$$

The term \(L^{2}\varphi \) is now separated and can be treated independently. Since it is linear by definition, it can be treated very efficiently with an integrating factor by employing exponential time differencing (ETD) [5]. For an equation with a deterministic term of the form (48), multiplying by the integrating factor \(e^{-L\tau }\) and integrating from \(t_{n}\) to \(t_{n+1} = t_{n} + h\), one obtains the exact formula

$$\displaystyle{ \varphi _{n+1} = e^{Lh}\varphi _{ n} + e^{Lh}\int _{ 0}^{h}e^{-L\tau }R(\varphi (t_{ n}+\tau ))\,d\tau, }$$
(52)

which can be approximated by

$$\displaystyle{ \varphi _{n+1} = e^{Lh}\varphi _{ n} + (e^{Lh} -\mathrm{ Id})L^{-1}R(\varphi _{ n}), }$$
(53)

when treating the linear part of the equation exactly and approximating the integral to first order. This scheme can be taken to higher order [12] and its stability improved [27], but a first order scheme proved to be sufficient for the examples given below. For the descent (47) we want to treat the stiffest part \(-L^{2}\varphi \) with ETD, so the integrating factor here becomes \(e^{-L^{2}\tau }\).

A complete relaxation step then consists of

  1. compute \(\vartheta _{{\ast}}\) and \(\tilde{\vartheta }_{{\ast}}\) using the explicit formulas

    $$\displaystyle{ \tilde{\vartheta }_{{\ast}} =\mu ^{-1}\varphi ' - R(\varphi ),\qquad \vartheta _{ {\ast}} =\tilde{\vartheta } _{{\ast}}- L\varphi \,; }$$
  2. compute the explicit step

    $$\displaystyle{ \xi =\mu ^{-1}\tilde{\vartheta }_{ {\ast}}' + \left (D_{\varphi }R\right )^{T}\vartheta _{ {\ast}}- LR(\varphi ) -\mu ^{-2}H_{\vartheta \vartheta }^{-1}\varphi _{ n}''\,, }$$

    where, as in the SDE case, if needed, we can subtract the term \(\mu ^{-2}H_{\vartheta \vartheta }^{-1}\varphi _{n}''\) to treat it implicitly later;

  3. perform an ETD step

    $$\displaystyle{ \bar{\varphi }= e^{-L^{2}h }\varphi _{n} - (e^{-L^{2}h } -\mathrm{ Id})(L^{2})^{-1}\xi \,; }$$
  4. apply the second derivative in arc-length direction implicitly,

    $$\displaystyle{ \varphi _{n+1} = (1 - h\mu ^{-2}H_{\vartheta \vartheta }^{-1}\partial _{ s}^{2})^{-1}\bar{\varphi }. }$$

Note that the integrating factors \(e^{-L^{2}h }\) and \((e^{-L^{2}h } -\mathrm{ Id})(L^{2})^{-1}\) are possibly costly to compute, as they contain matrix exponentials and inversions. However, the computation can be done once before starting the iteration, so that the associated computational cost becomes negligible. In contrast, this is not true in general for the implicit step 4, since \(\mu ^{-2}H_{\vartheta \vartheta }^{-1}\) might depend on the fields in a complicated way and has to be recomputed at every iteration.
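As an illustration of this precomputation, here is a minimal sketch using dense matrices; it assumes that \(L^{2}\) is available as an invertible matrix, which is an illustrative simplification (for large spatial grids one would rather diagonalize L, e.g. in Fourier space).

    import numpy as np
    from scipy.linalg import expm, solve

    def etd_factors(L2, h):
        """Precompute E = exp(-L^2 h) and F = (Id - E)(L^2)^{-1} once, so that the ETD
        step 3 above reads phi_bar = E phi_n + F xi.  L2 is the matrix of L^2."""
        E = expm(-h * L2)
        F = solve(L2, np.eye(L2.shape[0]) - E)   # (L^2)^{-1}(Id - E) = (Id - E)(L^2)^{-1}
        return E, F

    def etd_step(E, F, phi_n, xi):
        """One ETD relaxation step; phi_n and xi have shape (N_x, N_s), one column per image."""
        return E @ phi_n + F @ xi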

4 Illustrative Applications

In what follows we apply our simplified gMAM to the series of examples listed in the introduction. These examples illustrate specific questions encountered in practical applications arising in a variety of fields, in which the computation of the rate and mechanism of transitions is of interest. Note that all these examples involve non-equilibrium systems whose dynamics break detailed balance, so that simpler methods of computation are not readily available.

In the following, we will break our notation convention and instead use the notation of the respective fields to minimize confusion.

4.1 Maier-Stein Model

Maier and Stein's model [31] is a simple system often used as a benchmark in LDT calculations. It reads

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} du = (u - u^{3} -\beta uv^{2})dt + \sqrt{\epsilon }dW_{u}\quad \\ dv = -(1 + u^{2})vdt + \sqrt{\epsilon }dW_{v}, \quad \end{array} \right. }$$
(54)

where β is a parameter. For all values of β, the deterministic system has two stable fixed points, \(\varphi _{-} = (-1,0)\) and \(\varphi _{+} = (1,0)\), and a unique unstable critical point \(\varphi _{s} = (0,0)\). However, it satisfies detailed balance only for β = 1. In this case, we can write the drift in gradient form, \(b(\varphi ) =\nabla _{\varphi }U(\varphi )\), and the minimizers of the geometric action connecting \(\varphi _{-}\) to \(\varphi _{+}\) and vice-versa are the time-reverse of each other and lie on the location of the heteroclinic orbit where φ′ ∥∇U. Here, we use β = 10, in which case detailed balance is broken and the forward and backward transition pathways are no longer identical. Since the noise is additive, the system (54) falls into the category discussed in Sect. 3.4 and can be solved with the simplest variant of the algorithm. The minimizer of the action connecting \(\varphi _{-}\) to \(\varphi _{+}\) and the value of the action along it are shown in Fig. 1. Since the system is invariant under the transformation v → −v, there is also a minimizer with identical action in the v < 0 half-plane. Similarly, the paths from \(\varphi _{+}\) to \(\varphi _{-}\) can be obtained via the transformation u → −u. The numerical parameters used in these calculations were \(h = 10^{-1}\), \(N_{s} = 2^{10}\), where \(N_{s}\) denotes the number of configurations along the transition trajectory, i.e. the number of images.

Fig. 1  Maier-Stein model, β = 10. Left: PML and heteroclinic orbit. The arrows denote the direction of the deterministic flow, the shading its magnitude. The solid line depicts the minimizer, the dashed line the heteroclinic orbit. Dots are located at the fixed points (circle: stable; square: saddle). Right: Action density along the minimizer and the heteroclinic orbit
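To connect this example with the algorithm of Sect. 3.4, the following sketch assembles the model-specific ingredients, i.e. the drift of (54) and the action of its transposed Jacobian; the function names are illustrative, and these routines can be passed to a descent step such as the one sketched in Sect. 3.4, together with an initial path from \(\varphi _{-}\) to \(\varphi _{+}\) (e.g. an arc through the upper half-plane).

    import numpy as np

    def b_ms(phi, beta=10.0):
        """Maier-Stein drift (54); phi[:, 0] = u, phi[:, 1] = v, one row per image."""
        u, v = phi[:, 0], phi[:, 1]
        return np.stack([u - u**3 - beta * u * v**2, -(1.0 + u**2) * v], axis=1)

    def bgradT_ms(phi, theta, beta=10.0):
        """(b_phi)^T theta for the Maier-Stein drift, as needed in the descent step (36)."""
        u, v = phi[:, 0], phi[:, 1]
        t1, t2 = theta[:, 0], theta[:, 1]
        J11 = 1.0 - 3.0 * u**2 - beta * v**2   # d b_1 / du
        J12 = -2.0 * beta * u * v              # d b_1 / dv
        J21 = -2.0 * u * v                     # d b_2 / du
        J22 = -(1.0 + u**2)                    # d b_2 / dv
        return np.stack([J11 * t1 + J21 * t2, J12 * t1 + J22 * t2], axis=1)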

4.2 Allen-Cahn/Cahn-Hilliard System

Pattern formation in motile micro-organisms is often driven by non-equilibrium forces, leading to visible patterns in cellular colonies [8, 34]. For example, E. coli in a uniform suspension separates into a bacteria-rich and a bacteria-poor phase if the swim speed decreases sufficiently rapidly with density [37]. Here we study a model inspired by these phenomena. We note that this model does not permit the thermodynamic mapping used in [37], so that understanding the non-equilibrium transitions in the model requires minimization of the geometric action of LDT.

4.2.1 Reduced Allen-Cahn/Cahn-Hilliard Model

Consider the SDE system

$$\displaystyle{ d\phi = \left (\frac{1} {\alpha } Q(\phi -\phi ^{3}) -\phi \right )dt + \sqrt{\epsilon }\,dW }$$
(55)

with \(\phi = (\phi _{1},\phi _{2})\) and the matrix Q = ((1, −1), (−1, 1)). This system does not satisfy detailed balance, as its drift is made of two gradient terms with incompatible mobility operators (namely Q and Id). Model (55) can be seen as a 2-dimensional reduction of a discretized version of the continuous Allen-Cahn/Cahn-Hilliard model discussed later in Sect. 4.2.2.

The deterministic flowlines of (55) are depicted in Fig. 2. The deterministic dynamics has two stable fixed points, \(\phi _{A} = (-1,1)\) and \(\phi _{B} = (1,-1)\), and an unstable critical point, \(\phi _{S} = (0,0)\), lying on the separatrix where \(\phi _{1} =\phi _{2}\) between the basins of \(\phi _{A}\) and \(\phi _{B}\). The location of the heteroclinic orbits connecting \(\phi _{S}\) to \(\phi _{A}\) and \(\phi _{B}\) is a straight line between these points. When α is small in (55), there exists a “slow manifold”, consisting of all points where \(Q(\phi -\phi ^{3}) = 0\), which is shown as a white dashed line in Fig. 2. On this manifold, the deterministic dynamics are of order O(1), which is small in comparison to the dynamics of the Q-term, which are of order O(1∕α). This suggests that for small enough α the transition trajectory will follow this slow manifold, on which the drift is small, rather than the heteroclinic orbit, to escape the basin of the stable fixed points. This is confirmed in Fig. 2, where we show the action minimizer connecting \(\phi _{B}\) to \(\phi _{A}\). As can be seen, the minimizer first tracks the slow manifold, and it approaches the separatrix at a point far from \(\phi _{S}\). It then closely follows the separatrix towards \(\phi _{S}\) (which has to be part of the transition) to cross into the other basin and then relax (deterministically) towards \(\phi _{A}\).

Fig. 2  Allen-Cahn/Cahn-Hilliard toy ODE model, α = 0.01. The arrows denote the direction of the deterministic flow, the color its magnitude. The white dashed line corresponds to the slow manifold. The solid line depicts the minimizer, the dashed line the heteroclinic orbit. Markers are located at the fixed points (circle: stable; square: saddle)

The action along the minimizer and the paths made of the heteroclinic orbits are depicted in Fig. 3 (left). Notably, due to its movement along the slow manifold, the action along the minimizer is smaller by a factor of order α. Minimizers for different values of α are shown in Fig. 3 (right). Note that in the opposite limit of large α the switch to a straight-line minimizer happens at a finite value α ≈ 1.12.

Fig. 3  Left: Action density along the path for the 2-dimensional reduced model. Path parameter is normalized to s ∈ (0, 1). For the second half of the transition, the action density is zero. Right: Minimizers of the action functional for different values of α. For α → 0, the minimizer approaches the slow manifold. Note that the switch to a straight-line minimizer happens at a finite value α ≈ 1.12

In these computations, we used \(N_{s} = 2^{14}\), \(h = 10^{-2}\).

4.2.2 Full Allen-Cahn/Cahn-Hilliard Model

Consider next the SPDE

$$\displaystyle{ \phi _{t} = \frac{1} {\alpha } P(\kappa \phi _{xx} +\phi -\phi ^{3}) -\phi +\sqrt{\epsilon }\eta (x,t), }$$
(56)

where P is an operator with zero spatial mean and η(x, t) a spatio-temporal white-noise. This model is again of the form of two competing gradient flows with different mobilities:

$$\displaystyle{ \phi _{t} = -M_{1}D_{\phi }V _{1}(\phi ) - M_{2}D_{\phi }V _{2}(\phi ) + \sqrt{\epsilon }M_{2}^{1/2}\eta (x,t), }$$
(57)

with

$$\displaystyle{ V _{1}(\phi ) = \frac{1} {2}\kappa \vert \phi _{x}\vert ^{2} -\frac{1} {2}\vert \phi \vert ^{2} + \frac{1} {4}\vert \phi \vert ^{4},\quad M_{ 1} = \frac{1} {\alpha } P }$$
(58a)
$$\displaystyle{ V _{2}(\phi ) = \frac{1} {2}\vert \phi \vert ^{2},\qquad \qquad \qquad \qquad M_{ 2} =\mathrm{ Id}. }$$
(58b)

For \(P = -\partial _{x}^{2}\) the system is a mixture of a stochastic Allen-Cahn [2] and Cahn-Hilliard [7] equation. Here we will consider a different choice of P, which is similar in most aspects discussed below but simpler to handle numerically. We are again interested in situations where α is small, and the time scales associated with \(V _{1}\) and \(V _{2}\) differ significantly. In this case it will turn out that transition pathways are very different from the heteroclinic orbits, in that the separatrix between the basins of attraction is approached far from the unstable critical point of the deterministic system. This behavior is reminiscent of the 2-dimensional example discussed above, but in an SPDE setting.

The fixed points of the deterministic (ε = 0) dynamics of system (56) are the solutions of

$$\displaystyle{ P(\kappa \phi _{xx} +\phi -\phi ^{3})-\alpha \phi = 0. }$$
(59)

The only constant solution of this equation is the trivial fixed point ϕ(x) = 0, whose stability depends on α and κ. In the following, we choose \(\alpha = 10^{-2}\) and \(\kappa = 2\cdot 10^{-2}\), in which case ϕ(x) = 0 is unstable. The two stable fixed points obtained by solving (59) for these values of α and κ are depicted in Fig. 4 as \(\phi _{A}\) and \(\phi _{B}\), with \(\phi _{A} = -\phi _{B}\). An unstable fixed point configuration on the separatrix between \(\phi _{A}\) and \(\phi _{B}\) is also shown as \(\phi _{S}\).

Fig. 4  The configurations A, B, S, X in space: \(\phi _{A}\) and \(\phi _{B}\) are the two stable fixed points, \(\phi _{S}\) is the unstable fixed point on the separatrix in between. At point \(\phi _{X}\), the slow manifold intersects the separatrix

For finite but small α, the deterministic part of (56) has a “slow manifold” made of the solutions of

$$\displaystyle{ P(\kappa \phi _{xx} +\phi -\phi ^{3}) = 0. }$$
(60)

On this manifold the motion is driven solely by changing the mean via the slow terms, \(-\phi + \sqrt{\epsilon }\,\eta (x,t)\), on a time-scale of order O(1) in α. After two integrations in space, (60) can be written as

$$\displaystyle{ \kappa \phi _{xx} +\phi -\phi ^{3} =\lambda, }$$
(61)

where λ is a parameter. As a result the slow manifold can be described as one-parameter families of solutions parametrized by \(\lambda \in \mathbb{R}\); in general there is more than one family because the manifold can have different branches, corresponding to solutions of (61) with a different number of domain walls. The configuration labeled as \(\phi _{X}\) in Fig. 4 shows the field at the intersection of one of these branches with the separatrix. Since the deterministic drift along the slow manifold is small compared to the O(1∕α) drift induced by the Cahn-Hilliard term, one expects that the most probable transition pathway will use this manifold as a channel to escape the basin of attraction of the stable fixed points \(\phi _{A}\) or \(\phi _{B}\). This intuition is confirmed by the numerics, as shown next.

Figure 5 (left) shows the heteroclinic orbit connecting the two stable fixed points ϕ A and ϕ B to the unstable configuration ϕ S . The mean is preserved along this orbit, which involves a nucleation event at the boundaries followed by domain wall motion through the domain. The unstable fixed point ϕ s , denoted by S, which also demarcates the position at which the separatrix is crossed, is the spatially symmetric configuration with a positive central region and two negative regions at the boundary. Locations A and B label the two stable fixed points ϕ A and ϕ B .

Fig. 5  Transition pathways between two stable fixed points of equation (56) in the limit ε → 0. Left: heteroclinic orbit, defining the deterministic relaxation dynamics from the unstable point S down to either A or B. Right: Minimizer of the geometric action, defining the most probable transition pathway from A to B, following the slow manifold up to X, where it starts to nearly deterministically travel close to the separatrix into S

In contrast, Fig. 5 (right) shows the minimizer of the geometric action, which is the most probable transition path as ε → 0. It was computed via the algorithm outlined in Sect. 3.7, with \(L = \frac{1} {\alpha } P\kappa \partial _{x}^{2} -\mathrm{ Id}\) and \(R(u) = \frac{1} {\alpha } P(u - u^{3})\). Starting at the fixed point A the minimizer takes a very different path than the heteroclinic orbit. It first moves the domain wall, at vanishing cost for α → 0, without nucleation. At the point X the motion changes, tracking closely the separatrix towards the unstable point S. From this point onward, S→B, the transition path follows the heteroclinic orbit, which is the deterministic relaxation path. In this respect, the SPDE model (56) closely resembles the 2-dimensional model (55).

To further illustrate this resemblance, we choose to project the minimizer and the heteroclinic orbit onto two coordinates,

  (i) its mean ∫ϕ(x) dx, which resembles the direction \(\phi _{1} +\phi _{2}\) of the 2-dimensional model, and

  (ii) its component in the direction of the initial (or final) state, \(\int \phi (x)\phi _{A}(x)\,dx\), which corresponds to the direction \(\phi _{1} -\phi _{2}\) of the 2-dimensional model.

The transition path and the heteroclinic orbit projected in these reduced coordinates are depicted in Fig. 6. Note that this figure is not a schematic, but the actual projection of the heteroclinic orbit and the minimizer of Fig. 5 according to (i) and (ii) above. The separatrix is the straight line \(\int \phi (x)\phi _{A}(x)\,dx = 0\). The movement of the minimizer (dark) closely along the slow manifold (dashed), A→X, and then along the separatrix, X→S (which is also part of the slow manifold), into S highlights its difference with the heteroclinic orbit (light). The configurations at the points A, B, S and X are depicted in Fig. 4, while Fig. 7 shows the action density dS along the transition path. Note that this quantity becomes close to zero already at X, because the minimizer follows closely the separatrix from X to S, and this motion is therefore quasi-deterministic.

Fig. 6  Projection of the heteroclinic orbit and the minimizer of the action functional into a 2-dimensional plane. The x-direction is proportional to its component in the direction of the initial condition \(\phi _{A}\) while the y-direction corresponds to its spatial mean. The stable fixed points are located at A and B, the unstable fixed point at S. The separatrix is the straight line \(\int \phi (x)\phi _{A}(x)\,dx = 0\). The heteroclinic orbit (light) travels A→S→B in a horizontal line with vanishing mean, while the minimizer (dark) travels first along the slow manifold (dashed) A→X and then tracks the separatrix from X to S

Fig. 7  Action along the minimizer. Note that the action is non-zero climbing up the slow manifold, but diminishes to zero already at X when it approaches the separatrix, before it reaches S

The numerical parameters we used in these computations are \(h = 10^{-1}\), \(N_{s} = 100\), \(N_{x} = 2^{6}\), where \(N_{x}\) denotes the number of spatial discretization points.

4.3 Burgers-Huxley Model

As a second example involving an SPDE, we consider

$$\displaystyle{ u_{t} +\alpha uu_{x} -\kappa u_{xx} = f(u,x,t) + \sqrt{\epsilon }\eta (x,t). }$$
(62)

where α > 0 and κ > 0 are parameters, and we impose periodic boundary conditions on x ∈ [0, 1]. Without the term f(u, x, t), this is the stochastic Burgers equation, which arises in a variety of fields, in particular in the context of compressible gas dynamics, traffic flow, and fluid dynamics. With the reaction term f(u, x, t) added, this equation is referred to as the (stochastic) Burgers-Huxley equation [42], which has been used e.g. to describe the dynamics of neurons. The addition of a reaction term makes it possible to obtain multiple stable fixed points. As a particular case, we will consider (62) with

$$\displaystyle{ f(u,x,t) = -u(1 - u)(1 + u) }$$
(63)

so that \(u_{+} = 1\) and \(u_{-} = -1\) are the two stable fixed points of the deterministic dynamics. We are interested in the mechanism of the noise-induced transitions between these points.

When α = 0, the system is in detailed balance and therefore the forward and backward reaction follow the same path. The potential associated with the reaction term (63) is symmetric under u → −u, and both states are equally probable. In contrast, when α ≠ 0 it is not obvious a priori whether \(u_{+}\) and \(u_{-}\) are equally probable, since the non-linearity breaks the spatial symmetry, leading to a steepening of negative gradients into shocks while flattening positive gradients. A computation of the minimizer of the geometric action in both directions, for κ = 0.01 and \(\alpha = \tfrac{1} {4}\), reveals that indeed forward and backward reactions are equally probable, even though the transition paths do not coincide with the heteroclinic orbits. The transition from \(u_{-}\) to \(u_{+}\) is depicted in Fig. 8 (left). An intuitive explanation for the equal probability of \(u_{+}\) and \(u_{-}\) is given by the fact that the backward reaction pathway is identical to the forward path under the transformation u → −u, x → −x. The action along this minimizer is depicted in Fig. 8 (right). The minimizer is computed via the algorithm outlined in Sect. 3.7, with \(L =\kappa \partial _{x}^{2}\) and \(R(u) = -\alpha uu_{x} - u(1 - u)(1 + u)\).

Fig. 8  Burgers-Huxley equation: Minimizer switching from \(u_{-} = -1\) to \(u_{+} = 1\). Left: u-field. The saddle-point is marked with a dashed line. There is a noticeable kink in the dynamics switching from uphill (\(s < s_{\mathrm{saddle}}\)) to downhill (\(s > s_{\mathrm{saddle}}\)) dynamics. Right: Action density along the minimizer

The numerical parameters were chosen as \(N_{s} = 100\), \(N_{x} = 2^{8}\), \(h = 5\cdot 10^{-3}\).
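For illustration, the splitting (48) used for this example can be assembled as follows; this is a minimal sketch that represents the periodic derivatives on x ∈ [0, 1] spectrally and builds L as a dense matrix (so that the ETD factors of Sect. 3.7 can be precomputed from it). The construction is an illustrative choice, not the authors' implementation.

    import numpy as np

    def bh_operators(Nx, kappa=0.01, alpha=0.25):
        """Burgers-Huxley splitting B = L u + R(u), cf. (48) and (62)-(63):
        L = kappa d_x^2 (dense, periodic) and R(u) = -alpha u u_x - u(1 - u)(1 + u)."""
        k = 2.0 * np.pi * np.fft.fftfreq(Nx, d=1.0 / Nx)       # wavenumbers on [0, 1]
        F = np.fft.fft(np.eye(Nx), axis=0)                      # DFT matrix
        Finv = np.fft.ifft(np.eye(Nx), axis=0)                  # inverse DFT matrix
        L = np.real(Finv @ np.diag(-kappa * k**2) @ F)          # kappa d_x^2
        def R(u):
            ux = np.real(np.fft.ifft(1j * k * np.fft.fft(u)))   # u_x via spectral derivative
            return -alpha * u * ux - u * (1.0 - u) * (1.0 + u)
        return L, R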

4.4 Noise-Induced Transitions Between Climate Regimes

Many climate systems exhibit metastability. Examples include the Kuroshio oceanic current off the coast of Japan, which can be in either a small or a large meander state and rarely switches between the two [9, 33], or the atmospheric mid-latitude circulation over the North Atlantic, which makes rare transitions between a strongly zonal and a weakly zonal ("blocked") flow, characterized as "Grosswetterlagen" in [4]. In these and similar examples, the climate system stays trapped in the vicinity of the stable regimes most of the time. Random noise, originating either from physical stresses or from unresolved modes in truncated models, induces rare regime transitions, which can be captured by large deviation minimizers. The transition trajectories and their corresponding action allow one to make statements not only about the relative probability of the different regimes and the transition rates, but also about the exact transition pathway taken to switch between regimes.

We want to illustrate the feasibility of our numerical scheme for this particular field of application by investigating metastability in two simple climate models: a three-dimensional model for Grosswetterlagen proposed by Egger [18] and the six-dimensional Charney-DeVore model [10]. Due to their highly truncated nature, both models have very limited predictive power, but they exemplify the phenomenon of metastability in climate patterns or regimes.

4.4.1 Metastable Climate Regimes in Egger’s Model

Egger [18] introduces the following SDE system as a crude model to describe weather regimes in central Europe:

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} da = kb(U -\beta /k^{2})\,dt -\gamma a\,dt + \sqrt{\epsilon }dW_{a}, \quad \\ db = -ka(U -\beta /k^{2})\,dt + UH/k\,dt -\gamma b\,dt + \sqrt{\epsilon }dW_{b},\quad \\ dU = -bHk/2\,dt -\gamma (U - U_{0})\,dt + \sqrt{\epsilon }dW_{U}. \quad \end{array} \right. }$$
(64)

When ε is small, these equations exhibit metastability between a "blocked state" and a "zonal state", shown in Fig. 9. We use our gMAM algorithm to compute the transition paths between these states. The system (64) falls into the category discussed in Sect. 3.4 and can be solved with the simplest variant of the algorithm. For H = 12, β = 1.25, γ = 2, k = 2 and \(U_{0} = 10.5\), the fixed points are approximately (a, b, U) = (0.465, 1.65, 0.593) for the blocked state, (3.07, 0.392, 8.15) for the zonal state and (2.80, 1.35, 2.38) for the unstable fixed point (saddle). The minimizers of the action are shown in Fig. 9 (left), where they are compared to the heteroclinic orbits that connect the unstable critical point to the stable ones. The action density along the transition trajectories and the heteroclinic orbits is depicted in Fig. 9 (right).

Fig. 9  Egger's model with H = 12, β = 1.25, γ = 2, k = 2, \(U_{0} = 10.5\). Left: Minimizers and deterministic relaxation paths. Right: Comparison of the action density

The numerical parameters we used in these computations are \(N_{s} = 2^{8}\), \(h = 10^{-3}\).
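Since (64) has additive noise, the only model-specific ingredients needed by a descent step as in Sect. 3.4 are the drift and its transposed Jacobian; a minimal sketch with the parameter values quoted above is given below (function names are illustrative).

    import numpy as np

    def b_egger(x, k=2.0, H=12.0, beta=1.25, gamma=2.0, U0=10.5):
        """Deterministic drift of Egger's model (64); x = (a, b, U), one row per image."""
        a, b, U = x[:, 0], x[:, 1], x[:, 2]
        c = U - beta / k**2
        return np.stack([k * b * c - gamma * a,
                         -k * a * c + U * H / k - gamma * b,
                         -b * H * k / 2.0 - gamma * (U - U0)], axis=1)

    def bgradT_egger(x, theta, k=2.0, H=12.0, beta=1.25, gamma=2.0, U0=10.5):
        """(b_x)^T theta for Egger's model, for use in the additive-noise descent (36)."""
        a, b, U = x[:, 0], x[:, 1], x[:, 2]
        c = U - beta / k**2
        one = np.ones_like(a)
        J = np.zeros(x.shape[:1] + (3, 3))
        J[:, 0] = np.stack([-gamma * one, k * c, k * b], axis=1)            # gradient of the a-drift
        J[:, 1] = np.stack([-k * c, -gamma * one, -k * a + H / k], axis=1)  # gradient of the b-drift
        J[:, 2] = np.stack([0.0 * one, -H * k / 2.0 * one, -gamma * one], axis=1)  # gradient of the U-drift
        return np.einsum('nij,ni->nj', J, theta)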

4.4.2 Metastable Climate Regimes in the Charney-DeVore Model

Egger’s model retains no nonlinear interaction between different fluid modes, which is believed to be insufficient to explain the transitions between zonal and blocked states. A more sophisticated model, truncating the barotropic vorticity equation (BVE) with full nonlinear terms, was introduced by Charney and DeVore [10]. Their starting point is the two-dimensional BVE on the β-plane,

$$\displaystyle{ \frac{\partial } {\partial t}\omega = u \cdot \nabla \omega - C(\omega -\omega ^{{\ast}}). }$$
(65)

Here ω = ζ + βy + γh is the total vorticity, where γh is the topography in the β-plane, with β = 2Ωcos(θ)∕R for planetary angular velocity Ω, radius R and latitude θ, and ζ = Δψ is the relative vorticity for the stream-function ψ. The term −C(ω − ω*) accounts for Ekman damping with coefficient C > 0.

Charney and DeVore considered the vorticity equation (65) in the box [0, 2π] × [0, πb] with periodic boundary conditions in the x-direction and no-slip boundary conditions in the y-direction. They then projected this equation onto 6 Fourier modes in total, using the following representation for the stream-function ψ(x, y, t):

$$\displaystyle{ \psi (x,y,t) =\sum _{n,m}\psi _{nm}(t)\phi _{nm}(x,y), }$$
(66)

where the sums run over n ∈ {−1, 0, 1} and m ∈ {1, 2}, and

$$\displaystyle{ \phi _{0m}(y) = \sqrt{2}\cos (my/b),\qquad \qquad \phi _{nm}(x,y) = \sqrt{2}e^{inx}\sin (my/b). }$$
(67)

Letting x_i, i ∈ {1, …, 6}, be defined as

$$\displaystyle{ \begin{array}{l} x_{1} = \frac{1} {b}\psi _{01},\quad x_{2} = \frac{1} {\sqrt{2}b}\left (\psi _{11} +\psi _{-11}\right ),\quad x_{3} = \frac{i} {\sqrt{2}b}\left (\psi _{11} -\psi _{-11}\right ), \\ x_{4} = \frac{1} {b}\psi _{02},\quad x_{5} = \frac{1} {\sqrt{2}b}\left (\psi _{12} +\psi _{-12}\right ),\quad x_{6} = \frac{i} {\sqrt{2}b}\left (\psi _{12} -\psi _{-12}\right ), \end{array} }$$
(68)

taking the following form for the topography

$$\displaystyle{ h(x,y) =\cos (x)\sin (y/b), }$$
(69)

and choosing ω* such that only the two parameters x_1* and x_4* are free while the others are set to zero, they arrived at the following six-dimensional model

$$\displaystyle{ \begin{array}{rl} dx_{1} & = \left (\tilde{\gamma }_{1}x_{3} - C(x_{1} - x_{1}^{{\ast}})\right )dt + \sqrt{2\epsilon }dW_{1}, \\ dx_{2} & = \left (-(\alpha _{1}x_{1} -\beta _{1})x_{3} - Cx_{2} -\delta _{1}x_{4}x_{6}\right )dt + \sqrt{2\epsilon }dW_{2}, \\ dx_{3} & = \left ((\alpha _{1}x_{1} -\beta _{1})x_{2} -\gamma _{1}x_{1} - Cx_{3} +\delta _{1}x_{4}x_{5}\right )dt + \sqrt{2\epsilon }dW_{3}, \\ dx_{4} & = \left (\tilde{\gamma }_{2}x_{6} - C(x_{4} - x_{4}^{{\ast}}) +\eta (x_{2}x_{6} - x_{3}x_{5})\right )dt + \sqrt{2\epsilon }dW_{4}, \\ dx_{5} & = \left (-(\alpha _{2}x_{1} -\beta _{2})x_{6} - Cx_{5} -\delta _{2}x_{3}x_{4}\right )dt + \sqrt{2\epsilon }dW_{5}, \\ dx_{6} & = \left ((\alpha _{2}x_{1} -\beta _{2})x_{5} -\gamma _{2}x_{4} - Cx_{6} +\delta _{2}x_{2}x_{4}\right )dt + \sqrt{2\epsilon }dW_{6},\end{array} }$$
(70)

where, for m ∈ {1, 2},

$$\displaystyle{ \begin{array}{rl} \alpha _{m}& = \frac{8\sqrt{2}} {\pi } \frac{m^{2}} {4m^{2}-1} \frac{b^{2}+m^{2}-1} {b^{2}+m^{2}}, \\ \beta _{m}& = \frac{\beta b^{2}} {b^{2}+m^{2}}, \\ \gamma _{m}& =\gamma \frac{\sqrt{2}b} {\pi } \frac{4m^{3}} {(4m^{2}-1)(b^{2}+m^{2})}, \\ \tilde{\gamma }_{m}& =\gamma \frac{\sqrt{2}b} {\pi } \frac{4m} {4m^{2}-1}, \\ \delta _{m}& = \frac{64\sqrt{2}} {15\pi } \frac{b^{2}-m^{2}+1} {b^{2}+m^{2}}, \\ \eta & = \frac{16\sqrt{2}} {5\pi }.\end{array} }$$
(71)

The original Charney-DeVore equations did not contain random forcing terms; here we added to each equation an independent white noise dW_i with amplitude \(\sqrt{2\epsilon }\).

Choosing \(b = \tfrac{1} {2}\), \(C = \tfrac{1} {10}\), \(\beta = \tfrac{5} {4}\), γ = 1, \(x_{1}^{{\ast}} = \tfrac{9} {2}\), and \(x_{4}^{{\ast}} = -\tfrac{9} {5}\), the 6-dimensional stochastic model above possesses two metastable states, shown in Fig. 10: a zonal state (left) and a blocked state (right). The transition paths from zonal to blocked and from blocked to zonal are different. They are shown in Figs. 11 and 12, respectively, and they were both calculated by minimizing the geometric action using our simplified gMAM algorithm. The actions along both paths are depicted in Fig. 13.
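
The coefficients (71) and the deterministic drift of (70) translate directly into code. The following Python sketch, using the parameter values quoted above and omitting the noise terms, is meant purely as a starting point for reproducing the computation.

```python
import numpy as np

# Coefficients (71) and deterministic drift of the six-mode CDV model (70).
b, C, beta, gamma = 0.5, 0.1, 1.25, 1.0
x1s, x4s = 4.5, -1.8   # x_1^* = 9/2, x_4^* = -9/5

def coeffs(m):
    alpha_m = 8 * np.sqrt(2) / np.pi * m**2 / (4 * m**2 - 1) * (b**2 + m**2 - 1) / (b**2 + m**2)
    beta_m  = beta * b**2 / (b**2 + m**2)
    gamma_m = gamma * np.sqrt(2) * b / np.pi * 4 * m**3 / ((4 * m**2 - 1) * (b**2 + m**2))
    gamma_t = gamma * np.sqrt(2) * b / np.pi * 4 * m / (4 * m**2 - 1)
    delta_m = 64 * np.sqrt(2) / (15 * np.pi) * (b**2 - m**2 + 1) / (b**2 + m**2)
    return alpha_m, beta_m, gamma_m, gamma_t, delta_m

(a1, b1, g1, gt1, d1), (a2, b2, g2, gt2, d2) = coeffs(1), coeffs(2)
eta = 16 * np.sqrt(2) / (5 * np.pi)

def drift(x):
    x1, x2, x3, x4, x5, x6 = x
    return np.array([
        gt1 * x3 - C * (x1 - x1s),
        -(a1 * x1 - b1) * x3 - C * x2 - d1 * x4 * x6,
        (a1 * x1 - b1) * x2 - g1 * x1 - C * x3 + d1 * x4 * x5,
        gt2 * x6 - C * (x4 - x4s) + eta * (x2 * x6 - x3 * x5),
        -(a2 * x1 - b2) * x6 - C * x5 - d2 * x3 * x4,
        (a2 * x1 - b2) * x5 - g2 * x4 - C * x6 + d2 * x2 * x4,
    ])
```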

Fig. 10

Contours of the stream-function ψ(x, y) of the two meta-stable configurations of the 6-dimensional CDV model. Left: Zonal state; Right: Blocked state

Fig. 11

Contours of the stream-function ψ(x, y) along the transition trajectory from the zonal to the blocked meta-stable configuration for the CDV model. The arclength parameter increases in lexicographic order, with the top left plot being the initial state and the bottom right plot being the final state. The saddle point configuration is depicted in the center. The colormap is identical to Fig. 10

Fig. 12

Contours of the stream-function ψ(x, y) along the transition trajectory from the blocked to the zonal meta-stable configuration for the CDV model. The arclength parameter increases in lexicographic order, with the top left plot being the initial state and the bottom right plot being the final state. The saddle point configuration is depicted in the center. The colormap is identical to Fig. 10

Fig. 13

Action density dS along the transition pathways from zonal to blocked (forward) and from blocked to zonal (backward). In both directions, after passing the saddle point, the action becomes zero since the motion is deterministic

The numerical parameters in these computations were N_s = 100, h = 10⁻³.

4.5 Generalized Voter/Ising Model

To analyze phase transitions in out-of-equilibrium systems, a Langevin equation was proposed in [1] that models critical phenomena with two absorbing states. This equation was constructed by requiring that it be symmetric under the transformation ϕ → −ϕ and have two absorbing states, arbitrarily chosen to be at ±1. The presence of these absorbing states makes the noise multiplicative, with a scaling involving the square root of the distance to the absorbing boundaries, as suggested by the voter model [13, 15]. In order to account for Ising-like spontaneous symmetry breaking, the authors of [1] also added a bi-stable “potential” term with −V′(ϕ) = aϕ − bϕ³ to the equation, which finally led them to:

$$\displaystyle{ \phi _{t} = (1 -\phi ^{2})(a\phi - b\phi ^{3}) + D\phi _{xx} +\sigma \sqrt{1 -\phi ^{2}}\,\eta (x,t). }$$
(72)

In the absence of noise (σ = 0) and for a > 0, the ϕ = 0 state is locally unstable, but b > 0 ensures stable fixed points at \(\phi = \pm \sqrt{a/b}\). In the limit a∕b → 1, these fixed points approach the absorbing boundaries, and we are interested in the noise-induced transitions between these states.

We stress that making mathematical sense of (72) is non-trivial (see the discussion in Sect. 3.7). In the present application, we are going to consider a finite truncation of this SPDE, for which the question of spatial regularity disappears. Specifically, we transform (72) into a two-dimensional stochastic ODE model by discretizing the spatial direction via the standard 3-point Laplace stencil and taking only N_x = 2 discretization points. This yields the stochastic ODE system

$$\displaystyle{ \left \{\begin{array}{ll} d\phi _{1} & = \left ((1 -\phi _{1}^{2})(a\phi _{1} - b\phi _{1}^{3}) - D(\phi _{1} -\phi _{2})\right )\,dt +\sigma \sqrt{1 -\phi _{1}^{2}}\,dW_{x} \\ d\phi _{2} & = \left ((1 -\phi _{2}^{2})(a\phi _{2} - b\phi _{2}^{3}) + D(\phi _{1} -\phi _{2})\right )\,dt +\sigma \sqrt{1 -\phi _{2}^{2}}\,dW_{y},\end{array} \right. }$$
(73)

where the constant D couples the two degrees of freedom. This SDE poses an interesting test-case for our numerical scheme, since not only is the noise multiplicative, but the computational domain must also be restricted. The square defined by max(|ϕ_1|, |ϕ_2|) = 1 marks the boundary of the region in which the noise amplitude is defined (real), and the noise decreases towards zero as it approaches this absorbing barrier. Analogous to the discussion in [1], the choice of the parameters (a, b) determines the dynamics; in particular, if a > 0, b > 0 the model exhibits bi-stability: There is an unstable fixed point at ϕ = (0, 0) and stable fixed points at \(\phi = \pm (\sqrt{a/b},\sqrt{a/b})\). As long as a < b, these fixed points are inside the allowed region. For a∕b → 1 the two stable fixed points approach the absorbing boundary. Here, we take b = 1, a = 1 − 10⁻⁴, D = 0.4, so that \(\sqrt{ a/b} \approx 0.99995\) is located close to the barrier at 1. The minimizer and corresponding action are shown in Fig. 14.
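
For concreteness, a minimal sketch of the drift and the diagonal noise amplitudes of (73) is given below; the value of σ is a placeholder of our choosing, since only a, b and D are fixed in the text.

```python
import numpy as np

# Drift and multiplicative noise amplitude of the two-site model (73).
# a, b, D follow the text; sigma is an illustrative placeholder value.
a, b, D, sigma = 1.0 - 1e-4, 1.0, 0.4, 0.1

def local(p):
    return (1.0 - p**2) * (a * p - b * p**3)

def drift(phi):
    phi1, phi2 = phi
    return np.array([local(phi1) - D * (phi1 - phi2),
                     local(phi2) + D * (phi1 - phi2)])

def noise_amplitude(phi):
    # diagonal amplitudes sigma*sqrt(1 - phi_i^2); real only inside |phi_i| <= 1
    return sigma * np.sqrt(1.0 - np.asarray(phi, dtype=float)**2)

phi_star = np.sqrt(a / b)            # ~0.99995, just inside the absorbing square
print(drift([phi_star, phi_star]))   # ~[0, 0]: a stable fixed point
```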

Fig. 14

Generalized voter/Ising model. Left: The arrows denote the direction of the deterministic flow, the shading its magnitude. The solid line depicts the minimizer, the dashed line the heteroclinic orbit. Markers are located at the fixed points (circle: stable; square: saddle). Right: Action density along the minimizers for the two trajectories, with normalized path parameter s ∈ (0, 1)

The numerical parameters were chosen as N_s = 2⁸, h = 10⁻³.

4.6 Bi-Stable Reaction-Diffusion Model

In the context of chemical reactions and birth-death processes, one considers networks of several reactants in a container of volume V that is assumed to be well-stirred. As an example case, we consider the bi-stable chemical reaction network

$$\displaystyle{ A\mathop{\stackrel{k_{0}}{\rightleftharpoons }}\limits_{k_{1}}X,\qquad 2X + B\mathop{\stackrel{k_{2}}{\rightleftharpoons }}\limits_{k_{3}}3X }$$

with rates k_i > 0, and where the concentrations of A and B are held constant. This system was introduced in [32] as a prototypical model for a bi-stable reaction network. Its dynamics can be modeled as a Markov jump process (MJP) with generator

$$\displaystyle{ (L^{R}f)(n) = A_{ +}(n)\left (f(n + 1) - f(n)\right ) + A_{-}(n)\left (f(n - 1) - f(n)\right ) }$$
(74)

with the propensity functions

$$\displaystyle{ \left \{\begin{array}{ll} A_{+}(n)& = k_{0}V + (k_{2}/V )n(n - 1) \\ A_{-}(n)& = k_{1}n + (k_{3}/V ^{2})n(n - 1)(n - 2). \end{array} \right. }$$
(75)

The model above satisfies a large deviation principle in the following scaling limit: Denote by c = n∕V the concentration of X, and normalize it by a typical concentration, ρ = c∕c_0. Now, in the limit of a large number of particles per cell, Ω = c_0 V, and simultaneously rescaling time by 1∕Ω, we obtain

$$\displaystyle{ (L_{\epsilon }^{R}f)(\rho ) = \frac{1} {\epsilon } \Big(a_{+}(\rho )\left (f(\rho +\epsilon ) - f(\rho )\right ) + a_{-}(\rho )\left (f(\rho -\epsilon ) - f(\rho )\right )\Big)\,, }$$
(76)

where ε = 1∕Ω is a small parameter. Here, we defined k_i = λ_i c_0^{1−i}, and

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} a_{+}(\rho )\quad &=\lambda _{0} +\lambda _{2}\rho ^{2} \\ a_{-}(\rho )\quad &=\lambda _{1}\rho +\lambda _{3}\rho ^{3}. \end{array} \right. }$$
(77)

The large deviation principle for (76) can be formally obtained via WKB analysis, that is, by setting \(f(\rho ) = e^{\epsilon ^{-1}G(\rho ) }\) in (76) and expanding in ε [14]. To leading order in ε, this gives a Hamilton-Jacobi operator associated with a Hamiltonian that is also the one rigorously derived in LDT [35]. It reads

$$\displaystyle{ H(\rho,\vartheta ) = a_{+}(\rho )(e^{\vartheta } - 1) + a_{-}(\rho )(e^{-\vartheta }- 1). }$$
(78)

This is an example of a system whose Hamiltonian is not quadratic in the conjugate momentum ϑ. Therefore the computation of ϑ by (26) cannot be performed explicitly in general. For the parameters λ_0 = 0.8, λ_1 = 2.9, λ_2 = 3.1, λ_3 = 1, the system has two stable fixed points ρ_± and a saddle ρ_s at \(\rho _{+} = \tfrac{8} {5},\rho _{-} = \tfrac{1} {2},\rho _{s} = 1\).
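
Since the deterministic drift of this one-compartment model is ∂H∕∂ϑ evaluated at ϑ = 0, i.e. a₊(ρ) − a₋(ρ), the quoted fixed points can be verified directly; the short sketch below is illustrative only.

```python
import numpy as np

# Deterministic drift of the one-compartment reaction model: dH/dtheta at
# theta = 0 equals a_+(rho) - a_-(rho), cf. (77)-(78).
lam0, lam1, lam2, lam3 = 0.8, 2.9, 3.1, 1.0

def a_plus(rho):
    return lam0 + lam2 * rho**2

def a_minus(rho):
    return lam1 * rho + lam3 * rho**3

# Fixed points are the roots of a_+ - a_- = -lam3*rho^3 + lam2*rho^2 - lam1*rho + lam0.
print(np.roots([-lam3, lam2, -lam1, lam0]))   # -> 1.6, 1.0, 0.5
```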

Since transitions in 1D are fairly trivial, we want to consider the case of N neighboring reaction compartments, each well-stirred, but with random jumps possible between neighboring compartments. This situation was analyzed in [38] via direct sampling, but here we are interested in the computation of the transition trajectory. Denote by ρ_i the concentration in the i-th compartment and refer to the vector ρ as the complete state, \(\rho =\sum _{ i=0}^{N}\rho _{i}\hat{e}_{i}\). In this case, we obtain a diffusive part of the generator, L^D, coupling neighboring compartments. For a diffusivity D, it is

$$\displaystyle{ (L^{D}f)(\rho ) = \frac{D} {\epsilon } \sum _{i=1}^{N}\rho _{ i}\left (f(\rho -\epsilon \hat{ e}_{i} +\epsilon \hat{ e}_{i-1}) + f(\rho -\epsilon \hat{ e}_{i} +\epsilon \hat{ e}_{i+1}) - 2f(\rho )\right ). }$$
(79)

The process associated with this generator also admits a large deviation principle with Hamiltonian

$$\displaystyle{ H^{D}(\rho,\vartheta ) = D\sum _{ i=1}^{N}\rho _{ i}\left (e^{\vartheta _{i-1}-\vartheta _{i} } + e^{\vartheta _{i+1}-\vartheta _{i}} - 2\right ). }$$
(80)

Therefore, the full Hamiltonian becomes \(H(\rho,\vartheta ) = H^{D}(\rho,\vartheta ) +\sum _{i=1}^{N}H^{R}(\rho _{i},\vartheta _{i})\), where H^R(ρ_i, ϑ_i) is the reactive Hamiltonian in (78), summed over all compartments.
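
A sketch of this assembled Hamiltonian is given below. The treatment of the boundary indices ϑ_0 and ϑ_{N+1} appearing in (79)–(80) is not specified here, so the periodic indexing used in the sketch is an assumption made purely for illustration, as is the value of D.

```python
import numpy as np

# Full Hamiltonian H = H^D + sum_i H^R for N compartments, cf. (78)-(80).
lam0, lam1, lam2, lam3 = 0.8, 2.9, 3.1, 1.0

def H_reaction(rho, theta):
    a_plus = lam0 + lam2 * rho**2
    a_minus = lam1 * rho + lam3 * rho**3
    return a_plus * (np.exp(theta) - 1.0) + a_minus * (np.exp(-theta) - 1.0)

def H_total(rho, theta, D=1.0):  # D = 1.0 is a placeholder value
    rho, theta = np.asarray(rho, dtype=float), np.asarray(theta, dtype=float)
    # diffusive part (80); periodic neighbours are an assumption for this sketch
    th_left, th_right = np.roll(theta, 1), np.roll(theta, -1)
    HD = D * np.sum(rho * (np.exp(th_left - theta) + np.exp(th_right - theta) - 2.0))
    return HD + np.sum(H_reaction(rho, theta))
```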

We used our new gMAM algorithm to minimize the geometric action and compute the transition paths between the stable fixed points for the simplest non-trivial case of N = 2 compartments. Shown in Fig. 15 are the forward and backward trajectories. Note that the backward transition, (ρ_+, ρ_+) → (ρ_−, ρ_−), takes a special form: It climbs against the deterministic dynamics up to the maximum, then relaxes along the separatrix down to the saddle. Additionally, we compare these trajectories with the heteroclinic orbit obtained by the string method. The action along these trajectories is depicted in Fig. 16. Note how for the backward minimizer the action is zero already before it hits the saddle, as the movement from the maximum to the saddle happens deterministically.

Fig. 15

Bi-stable reaction-diffusion model with N = 2 reaction cells. Shown are the forward (red) and backward (green) transitions between the two stable fixed points, in comparison to the heteroclinic orbit (dashed). The flow lines depict the deterministic dynamics; their magnitude is indicated by the background shading

Fig. 16

Action densities for the bi-stable reaction-diffusion model. Depicted are the actions corresponding to the forward (solid) and backward (dashed) minimizer (dark) and heteroclinic (light) orbit

The numerical parameters were chosen as N_s = 2⁹, h = 10.

4.7 Slow-Fast Systems

In contrast to a large deviation principle arising in the limit of small noise or large number of particles, a different class of Hamiltonians arises for systems with a slow variable X evolving on a timescale O(1) and a fast variable Y on a time scale O(α):

$$\displaystyle{ \dot{X} = f(X,Y ) }$$
(81a)
$$\displaystyle{ dY = \frac{1} {\alpha } b(X,Y )dt + \frac{1} {\sqrt{\alpha }}\sigma (X,Y )dW. }$$
(81b)

Examples of systems with large timescale separation α ≪ 1 are ubiquitous in nature, and usually one is interested mostly in the long-time behavior of the slow variables. In particular, we are concerned with situations where the slow dynamics exhibits metastability. We want to use our algorithm to compute transition pathways in this setup for the limit of infinite time scale separation.

In the limit as α → 0, the fast variables reach statistical equilibrium before any motion of the slow variables, and these slow variables only experience the average effect of the fast ones. This behavior can be captured by the following deterministic limiting equation which is akin to a law of large numbers (LLN) in the present context and reads

$$\displaystyle{ \dot{\bar{X}} = F(\bar{X})\quad \text{where}\quad F(x) =\lim _{T\rightarrow \infty }\frac{1} {T}\int _{0}^{T}f(x,Y _{ x}(\tau ))\,d\tau. }$$
(82)

Here Y_x(t) is the solution of (81b) for X(t) = x fixed [3, 6, 19, 30]. For small but finite α, the slow variables also experience fluctuations through the fast variables. In particular, the statistics of \(\xi = (X -\bar{ X})/\sqrt{\alpha }\) on O(1) time scales can be described by a central limit theorem (CLT) as small Gaussian noise on top of the slow mean \(\bar{X}\). The CLT scaling, however, is inappropriate to describe the fluctuations of the slow variables that are induced by the effect of the fast variables on longer time scales and may, for example, lead to transitions between stable fixed points of the limiting equation in (82). In particular, the naive procedure of constructing an SDE out of the LLN and the CLT and then applying LDT to it fails. Instead, the transitions in the limit α → 0 are captured by an LDP with the Hamiltonian

$$\displaystyle{ H(x,\vartheta ) =\lim _{T\rightarrow \infty }\frac{1} {T}\log \mathbb{E}\exp \left (\vartheta \int _{0}^{T}f(x,Y _{ x}(t))\,dt\right ). }$$
(83)

Except for the special case f(x, y) = r(x) + s(x)y (linear dependence on the fast variable), the Hamiltonian (83) is non-quadratic in ϑ. As a consequence, no S(P)DE with Gaussian noise exists for the slow variable whose LDP would describe the transitions correctly.
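
In principle, (83) can be estimated by brute force: freeze the slow variable, simulate the fast process, accumulate the time integral of f, and average the exponential over many realizations. The sketch below does this for the quadratic example treated next (cf. (84)–(85)); all numerical values are illustrative, and the noisy, expensive nature of such an estimator is precisely what makes the implicit form of (83) awkward for computing minimizers.

```python
import numpy as np

# Crude Monte Carlo estimate of the Hamiltonian (83) for a scalar slow variable,
# here with f(x, y) = y^2 - beta*x and the fast process taken as an OU process
# dY = -gamma*Y dt + sigma dW on its own (unit) time scale, cf. (84).
rng = np.random.default_rng(0)

def H_estimate(x, theta, gamma, beta, sigma, T=20.0, dt=1e-3, samples=5000):
    steps = int(T / dt)
    y = np.zeros(samples)
    integral = np.zeros(samples)          # accumulates int_0^T f(x, Y_x(t)) dt
    for _ in range(steps):
        integral += (y**2 - beta * x) * dt
        y += -gamma * y * dt + sigma * np.sqrt(dt) * rng.standard_normal(samples)
    # finite-T, finite-sample approximation of (1/T) log E exp(theta * integral)
    return np.log(np.mean(np.exp(theta * integral))) / T
```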

The implicit nature of the Hamiltonian (83), which in particular involves an expectation, complicates numerical procedures to compute its associated minimizers. Yet, in the non-trivial case where the slow dynamics depend quadratically on the fast variables, for example,

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} \dot{X} = Y ^{2} -\beta X \quad \\ dY = -\frac{1} {\alpha } \gamma (X)Y \,dt + \frac{\sigma } {\sqrt{\alpha }}dW,\quad \end{array} \right. }$$
(84)

one indeed does obtain an explicit formula for the Hamiltonian (83) (as derived in [6])

$$\displaystyle{ h(x,\vartheta ) = -\beta x\vartheta + \tfrac{1} {2}\left (\gamma (x) -\sqrt{\gamma ^{2 } (x) - 2\sigma ^{2}\vartheta }\right ). }$$
(85)

This example is interesting for our purpose not only because the Hamiltonian is non-quadratic, but also because of the existence of a forbidden region ϑ > γ²(x)∕(2σ²) where the Hamiltonian is not defined.
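
The sketch below evaluates the closed-form Hamiltonian (85) with an explicit guard for this forbidden region, and records that ∂h∕∂ϑ at ϑ = 0 equals −βx + σ²∕(2γ(x)), the averaged drift one expects from (82) for this example; the inputs are generic placeholders.

```python
import numpy as np

# Closed-form slow-fast Hamiltonian (85), with a guard for the forbidden region
# 2*sigma^2*theta > gamma(x)^2; gamma_of_x, beta and sigma2 are generic inputs.
def h(x, theta, gamma_of_x, beta, sigma2):
    disc = gamma_of_x(x)**2 - 2.0 * sigma2 * theta
    if disc < 0.0:
        raise ValueError("theta > gamma(x)^2/(2 sigma^2): Hamiltonian undefined")
    return -beta * x * theta + 0.5 * (gamma_of_x(x) - np.sqrt(disc))

def averaged_drift(x, gamma_of_x, beta, sigma2):
    # dh/dtheta at theta = 0: the averaged slow drift -beta*x + sigma^2/(2*gamma(x)),
    # consistent with the law of large numbers (82) for this example
    return -beta * x + sigma2 / (2.0 * gamma_of_x(x))
```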

To further increase the number of degrees of freedom, we combine two independent multi-stable slow-fast systems and couple them by a spring with spring constant D, so that the full system reads

$$\displaystyle{ \left \{\begin{array}{@{}l@{\quad }l@{}} \dot{X}_{1} = Y _{1}^{2} -\beta _{1}X_{1} - D(X_{1} - X_{2})\quad \\ \dot{X}_{2} = Y _{2}^{2} -\beta _{2}X_{2} - D(X_{2} - X_{1})\quad \\ dY _{1} = -\frac{1} {\alpha } \gamma (X_{1})Y _{1}dt + \frac{\sigma } {\sqrt{\alpha }}dW_{1} \quad \\ dY _{2} = -\frac{1} {\alpha } \gamma (X_{2})Y _{2}dt + \frac{\sigma } {\sqrt{\alpha }}dW_{2}. \quad \end{array} \right. }$$
(86)

The Hamiltonian for the LDT for this system is

$$\displaystyle{ H(x_{1},x_{2},\vartheta _{1},\vartheta _{2}) = h(x_{1},\vartheta _{1}) + h(x_{2},\vartheta _{2}) +\langle -\nabla U(x_{1},x_{2}),\vartheta \rangle, }$$
(87)

for \(U(x,y) = \frac{1} {2}D(x - y)^{2}\) and h(x, ϑ) defined as in equation (85). The choice γ(X) = (X − 5)² + 1 ensures two stable fixed points. The deterministic dynamics of this system (i.e. the evolution of the averaged slow variables) are depicted as white arrows in Fig. 17 (left). To stress the important portion of the transition trajectory, the plot focuses only on the region from the initial state up to the saddle. Compared are the minimizer and the heteroclinic orbits connecting the stable fixed points to the saddle point. The corresponding actions are shown in Fig. 17 (right). The specific choice of model parameters for this computation is β_1 = 0.6, β_2 = 0.3, D = 1.0 and σ² = 10.
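
For orientation, the averaged slow dynamics of the coupled system follow from ∂H∕∂ϑ at ϑ = 0. The sketch below evaluates this averaged drift with the parameters above and searches for its fixed points from a few generic starting guesses; the guesses, and hence the printed roots, are illustrative only.

```python
from scipy.optimize import fsolve

# Averaged slow drift of the coupled system (86)-(87): component i of dH/dtheta
# at theta = 0 reads -beta_i*x_i + sigma^2/(2*gamma(x_i)) - D*(x_i - x_j).
beta1, beta2, D, sigma2 = 0.6, 0.3, 1.0, 10.0

def gamma(x):
    return (x - 5.0)**2 + 1.0

def averaged_drift(x):
    x1, x2 = x
    return [-beta1 * x1 + sigma2 / (2.0 * gamma(x1)) - D * (x1 - x2),
            -beta2 * x2 + sigma2 / (2.0 * gamma(x2)) - D * (x2 - x1)]

# Search for fixed points of the averaged dynamics from illustrative guesses.
for guess in ([0.5, 0.5], [5.5, 5.5], [3.0, 3.0]):
    print(fsolve(averaged_drift, guess))
```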

Fig. 17

Coupled slow-fast system ODE model for D = 1.0. Left: The arrows denote the direction of the deterministic flow, the shading its magnitude. The solid line depicts the minimizer, the dashed line the relaxation paths from the saddle. Markers are located at the fixed points (circle: stable; square: saddle). Right: Action density along the minimizers for the two trajectories up to the saddle, with normalized path parameter s ∈ (0, 1)

The numerical parameters were chosen as N_s = 2¹⁰, h = 10⁻².

5 Concluding Remarks

We have discussed numerical schemes to compute minimizers of large deviation action functionals, which are based on the geometric minimum action method. The basis of these schemes is the minimization of a geometric action on the space of arc-length parametrized curves, which makes it possible to perform the double minimization over the transition time T and the action S_T that is required to compute the LDT quasipotential. In particular, transitions between metastable fixed points of a system, which generally involve T → ∞ and are therefore not tractable with non-geometric minimum action methods, can be naturally analyzed in this setup.

A simplified gMAM algorithm was proposed here which is based on a particular formulation of the geometric action leading to a mixed optimization problem. This new formulation of the gMAM algorithm is easier to implement than the original method: In its simplest form, only first order derivatives of the Hamiltonian H(φ, ϑ) are needed. The algorithm is applicable to a large class of systems, and does not rely on an explicit formula of the large deviation rate function—only the Hamiltonian of the theory is needed. We derived specific reductions that are possible in regularly occurring special cases, such as SDEs with additive or multiplicative noise. Furthermore, we discussed optimizations for SPDEs with additive noise and commented on how to improve numerical efficiency.

The performance of the new gMAM algorithm was illustrated in a series of applications arising from different fields and involving different types of models, such as S(P)DEs with additive and multiplicative Gaussian noise, Markov jump processes, and slow-fast systems.