1 The Purpose of GNI

Geometric numerical integration (GNI) emerged as a major thread in numerical mathematics some 25 years ago. Although it had antecedents, in particular the concerted effort of the late Feng Kang and his group in Beijing to design structure-preserving methods, the importance of GNI was recognised and its scope delineated only in the 1990s.

But we are racing ahead of ourselves. At the beginning, as always in mathematics, come the definition and the rationale of GNI. The rationale is that all too often mathematicians concerned with differential equations split into three groups that have little in common. Firstly, there are the applied mathematicians, the model builders, who formulate differential equations to describe physical reality. Secondly, there are the pure mathematicians investigating differential equations and unravelling their qualitative features. Finally, there are the numerical analysts who flesh out the numbers and the graphics on the bones of mathematical formulation. These groups have tended to operate in mostly separate spheres and, in particular, this has been true with regard to computation. Discretisation methods were designed (with huge creativity and insight) to produce numerical solutions rapidly and robustly, solutions that can be relied upon to carry an overall small error. Yet such methods have often carried no guarantee whatsoever to respect the qualitative features of the underlying system, the very same features that had been obtained with such effort by pure and applied mathematicians.

Qualitative features come basically in two flavours, the dynamical and the geometric. Dynamical features—sensitivity with respect to initial conditions and other parameters, as well as the asymptotic behaviour—have been recognised as important by numerical analysts for a long time, not least because they tend to impinge directly on accuracy. Thus, sensitivity with respect to initial conditions and perturbations comes under ‘conditioning’ and the recovery of correct asymptotics under ‘stability’, both subject to many decades of successful enquiry. Geometric attributes are invariants, constants of the flow. They are often formulated in the language of differential geometry (hence the name!) and mostly come in three varieties: conservation laws (e.g. Hamiltonian energy or angular momentum), which geometrically mean that the solution, rather than evolving in some large space \(\mathbb {R}^d\), is restricted to a lower-dimensional manifold \(\mathcal {M}\); Lie point symmetries (e.g. scaling invariance), which restrict the solution to the tangent bundle of some manifold; and quantities like symplecticity and volume, which are conservation laws for the derivative of the flow. The design and implementation of numerical methods that respect geometric invariants is the business of GNI.

Since its emergence, GNI has become the new paradigm in the numerical solution of ODEs, while making significant inroads into numerical PDEs. As so often happens, yesterday’s revolutionaries have become the new establishment. This is an excellent moment to pause and take stock. Have all the major challenges been met, all peaks scaled, leaving just a tidying-up operation? Is there still any point to GNI as a separate activity, or should it be considered a victim of its own success, its practitioners departing for fields anew—including new areas of activity that have been fostered or enabled by GNI?

These are difficult questions and we claim no special authority to answer them in an emphatic fashion. Yet, these are questions which, we believe, must be addressed. This short article is an attempt to foster a discussion. We commence with a brief survey of the main themes of GNI circa 2015. This is followed by a review of recent and ongoing developments, as well as of some new research directions that have emerged from GNI but have acquired a life of their own.

2 The Story So Far

2.1 Symplectic Integration

The early story of GNI is mostly the story of symplectic methods. A Hamiltonian system

$$\begin{aligned} \dot{\varvec{p}}=-\frac{\partial H(\varvec{p},\varvec{q})}{\partial \varvec{q}},\qquad \dot{\varvec{q}}=\frac{\partial H(\varvec{p},\varvec{q})}{\partial \varvec{p}}, \end{aligned}$$
(2.1)

where \(H:\mathbb {R}^{2d}\rightarrow \mathbb {R}\) is a Hamiltonian energy, plays a fundamental role in mechanics and is known to possess a long list of structural invariants, e.g. the conservation of the Hamiltonian energy. Yet, arguably its most important feature is the conservation of the symplectic form \(\sum _{k=1}^d \,\mathrm {d}\varvec{p}_k\wedge \,\mathrm {d}\varvec{q}_k\) because symplecticity is equivalent to Hamiltonicity—in other words, every solution of a Hamiltonian system is a symplectic flow and every symplectic flow is locally Hamiltonian with respect to an appropriate Hamiltonian energy [33].

The solution of Hamiltonian problems using symplectic methods has a long history, beautifully reviewed in [32], but modern efforts can be traced to the work of Feng and his collaborators at the Chinese Academy of Sciences, who have used generating-function methods to solve Hamiltonian systems [21]. And then, virtually simultaneously, [46, 77, 83] proved that certain Runge–Kutta methods, including the well-known Gauss–Legendre methods, preserve symplecticity and they presented an easy criterion for the symplecticity of Runge–Kutta methods. GNI came of age!

Symplectic methods readily found numerous uses, from particle accelerator physics [23] and celestial mechanics [47] to molecular dynamics [49] and beyond.

Subsequent research into symplectic Runge–Kutta methods branched out into a number of directions, each with its own important ramifications outside the Hamiltonian world:

  • Backward error analysis. The idea of backward error analysis (BEA) can be traced to Wilkinson’s research into linear algebra algorithms in the 1950s. Instead of asking “what is the numerical error for our problem?”, Wilkinson asked “which nearby problem is solved exactly by our method?”. The difference between the original and the nearby problem can tell us a great deal about the nature of the error in a numerical algorithm.

    A generalisation of BEA to the field of differential equations is fraught with difficulties. Perhaps the first successful attempt to analyse Hamiltonian ODEs in this setting was [70] and it was followed by many others, too numerous to list: an excellent exposition (as for many things GNI) is the monograph of [33]. A major technical tool is the B-series, an expansion of composite functions in terms of forests of rooted trees, pioneered by [7]. (We mention in passing that the Hopf algebra structure of this Butcher group has been recently exploited by mathematical physicists to understand the renormalisation group [15]—as the authors write, “We regard Butcher’s work on the classification of numerical integration methods as an impressive example that concrete problem-oriented work can lead to far-reaching conceptual results”.) It is possible to prove that, subject to very generous conditions, the solution of a Hamiltonian problem by a symplectic method, implemented with constant step size, is exponentially near to the exact solution of a nearby Hamiltonian problem for an exponentially long time. This leads to considerably greater numerical precision, as well as to the conservation on average (in a strict ergodic sense) of Hamiltonian energy.

    B-series fall short in a highly oscillatory and multiscale setting, encountered frequently in practical Hamiltonian systems. The alternative in the BEA context is an expansion into modulated Fourier series [29], as well as expansions into word series [68], to which we return in Sect. 4.1.

  • Composition and splitting.

    Many Hamiltonians of interest can be partitioned into a sum of kinetic and potential energy, \(H(\varvec{p},\varvec{q})=\frac{1}{2}\varvec{p}^\top M^{-1}\varvec{p}+V(\varvec{q})\). It is often useful to take advantage of this in the design of symplectic methods. While conventional symplectic Runge–Kutta methods are implicit, hence expensive, partitioned Runge–Kutta methods, advancing separately in \(\varvec{p}\) and \(\varvec{q}\), can be explicit and are in general much cheaper. While perhaps the most important such method, the Störmer–Verlet scheme, has been known for many years, modern theory has led to an entire menagerie of composite and partitioned methods [79].
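
    As an illustration, here is a minimal Python sketch of the Störmer–Verlet scheme for a separable Hamiltonian of this form; the pendulum potential, step size and initial data are our own illustrative choices.

```python
import numpy as np

def stormer_verlet(grad_V, p, q, h, M_inv=1.0):
    """One Stormer-Verlet (leapfrog) step for H(p,q) = p^T M^{-1} p/2 + V(q):
    half kick in p, full drift in q, half kick in p -- explicit, symplectic."""
    p = p - 0.5 * h * grad_V(q)
    q = q + h * M_inv * p
    p = p - 0.5 * h * grad_V(q)
    return p, q

# Planar pendulum: H = p^2/2 - cos(q)  (illustrative test problem).
grad_V = lambda q: np.sin(q)
H = lambda p, q: 0.5 * p**2 - np.cos(q)

p, q, h = 0.0, 2.0, 0.1
E0 = H(p, q)
for _ in range(10**4):
    p, q = stormer_verlet(grad_V, p, q, h)
print(abs(H(p, q) - E0))   # energy error stays O(h^2), without drift
```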

    Splitting methodsFootnote 1 have been used in the numerical solution of PDEs since the 1950s. Thus, given the equation \(u_t=\mathcal {L}_1(u)+\mathcal {L}_2(u)\), where the \(\mathcal {L}_k\)s are (perhaps nonlinear) operators, the idea is to approximate the solution in the form

    $$\begin{aligned} u(t+h)\approx {\mathrm e}^{\alpha _1 h\mathcal {L}_1} {\mathrm e}^{\beta _1 h\mathcal {L}_2} {\mathrm e}^{\alpha _2 h\mathcal {L}_1} \cdots {\mathrm e}^{\alpha _s h\mathcal {L}_1} {\mathrm e}^{\beta _s h\mathcal {L}_2}u(t), \end{aligned}$$
    (2.2)

    where \(v(t_0+h)=:{\mathrm e}^{h \mathcal {L}_1}v(t_0)\) and \(w(t_0+h)=:{\mathrm e}^{h \mathcal {L}_2}w(t_0)\) are, formally, the solutions of \(\dot{v}=\mathcal {L}_1(v)\) and \(\dot{w}=\mathcal {L}_2(w)\) respectively, with suitable boundary conditions. The underlying assumption is that the solutions of the latter two equations are either available explicitly or are easy to approximate, while the original equation is more difficult.

    Pride of place belongs to palindromic compositions of the form

    $$\begin{aligned} {\mathrm e}^{\alpha _1 h\mathcal {L}_1} {\mathrm e}^{\beta _1 h\mathcal {L}_2} {\mathrm e}^{\alpha _2 h\mathcal {L}_1} \cdots {\mathrm e}^{\alpha _q h\mathcal {L}_1}{\mathrm e}^{\beta _q h\mathcal {L}_2}{\mathrm e}^{\alpha _q h\mathcal {L}_1} \cdots {\mathrm e}^{\alpha _2 h\mathcal {L}_1} {\mathrm e}^{\beta _1 h\mathcal {L}_2} {\mathrm e}^{\alpha _1 h\mathcal {L}_1}, \end{aligned}$$
    (2.3)

    invariant with respect to a reversal of the terms. They constitute a time-symmetric map, and this has a number of auspicious consequences. Firstly, they are always of even order. Secondly—and this is crucial in the GNI context—they respect both structural invariants whose integrators are closed under composition, i.e. form a group (for example, integrators preserving volume, symmetries or first integrals), and invariants whose integrators are closed under symmetric composition, i.e. form a symmetric space (for example, integrators that are self-adjoint, or preserve reversing symmetries). A basic example of (2.3) is the second-order Strang composition

    $$\begin{aligned} {\mathrm e}^{\frac{1}{2} h\mathcal {L}_1} {\mathrm e}^{h\mathcal {L}_2}{\mathrm e}^{\frac{1}{2} h \mathcal {L}_1} ={\mathrm e}^{h(\mathcal {L}_1+\mathcal {L}_2)} +\mathcal{O}\left( h^3\right) . \end{aligned}$$

    Its order—and, for that matter, the order of any time-symmetric method—can be boosted by the Yoshida device [88]. (Cf. also [85].) Let \(\Phi \) be a time-symmetric approximation to \({\mathrm e}^{t\mathcal {L}}\) of order 2P, say. Then

    $$\begin{aligned} \Phi ((1+\alpha )h)\Phi (-(1+2\alpha )h)\Phi ((1+\alpha )h),\qquad \text{ where }\qquad \alpha =\frac{2^{1/(2P+1)}-1}{2-2^{1/(2P+1)}} \end{aligned}$$

    is also time-symmetric and of order \(2P+2\). Successive applications of the Yoshida device allow us to increase the order of the Strang composition arbitrarily, while retaining its structure-preserving features. This is but a single example of the huge world of splitting and composition methods, reviewed in [4, 57].
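
    To make the composition concrete, here is a minimal sketch of the Strang step and of the Yoshida device; the harmonic-oscillator splitting, step size and initial data are our own illustrative choices.

```python
import numpy as np

def strang(state, h):
    """Strang step e^{h L1/2} e^{h L2} e^{h L1/2} for H = (p^2 + q^2)/2,
    split into a position drift (L1) and a momentum kick (L2)."""
    p, q = state
    q = q + 0.5 * h * p
    p = p - h * q
    q = q + 0.5 * h * p
    return p, q

def yoshida(step, P=1):
    """Boost a time-symmetric method of order 2P to order 2P + 2 via the
    palindromic triple step((1+a)h) step(-(1+2a)h) step((1+a)h)."""
    a = (2**(1/(2*P + 1)) - 1) / (2 - 2**(1/(2*P + 1)))
    def boosted(state, h):
        state = step(state, (1 + a) * h)
        state = step(state, -(1 + 2*a) * h)
        return step(state, (1 + a) * h)
    return boosted

step4 = yoshida(strang)        # order 4
step6 = yoshida(step4, P=2)    # order 6: apply the device again

state = (0.0, 1.0)
for _ in range(1000):
    state = step6(state, 0.01)
print(0.5 * (state[0]**2 + state[1]**2))   # stays near the initial 0.5
```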

  • Exponential integrators.

    Many ‘difficult’ ODEs can be written in the form \(\dot{\varvec{y}}=A\varvec{y}+\varvec{b}(\varvec{y})\) where the matrix A is ‘larger’ (in some sense) than \(\varvec{b}(\varvec{y})\)—for example, A may be the Jacobian of an ODE (which may vary from step to step). Thus, it is to be expected that the ‘nastiness’ of the ODE under scrutiny—be it stiffness, Hamiltonicity or high oscillation—is somehow ‘hardwired’ into the matrix A. The exact solution of the ODE can be written in terms of the variation-of-constants formula,

    $$\begin{aligned} \varvec{y}(t+h)={\mathrm e}^{hA}\varvec{y}(t)+\int _0^h {\mathrm e}^{(h-\xi )A}\varvec{b}(\varvec{y}(t+\xi ))\,\mathrm {d}\xi , \end{aligned}$$
    (2.4)

    except that, of course, the right-hand side includes the unknown function \(\varvec{y}\). Given the availability of very effective methods to compute the matrix exponential, we can exploit this to construct exponential integrators, explicit methods that often exhibit favourable stability and structure-preservation features. The simplest example, the exponential Euler method, freezes \(\varvec{y}\) within the integral in (2.4) at its known value at t, the outcome being the first-order method

    $$\begin{aligned} \varvec{y}_{n+1}={\mathrm e}^{hA}\varvec{y}_n+A^{-1}({\mathrm e}^{hA}-I)\varvec{b}(\varvec{y}_n). \end{aligned}$$

    The order can be boosted by observing that (in a loose sense, which can be made much more precise) the integral above is discretised by the Euler method, a one-stage explicit Runge–Kutta scheme; discretising it instead by multistage schemes of this kind leads to higher-order methods [35].
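
    For instance, a minimal sketch of the exponential Euler method; the stiff matrix A and the nonlinearity b below are illustrative choices, and A is assumed invertible, as in the formula above.

```python
import numpy as np
from scipy.linalg import expm

def exponential_euler_step(A, b, y, h):
    """y_{n+1} = e^{hA} y_n + A^{-1}(e^{hA} - I) b(y_n)."""
    E = expm(h * A)
    phi1 = np.linalg.solve(A, E - np.eye(len(y)))   # A^{-1}(e^{hA} - I)
    return E @ y + phi1 @ b(y)

A = np.array([[-100.0, 1.0], [0.0, -2.0]])          # stiff linear part
b = lambda y: np.array([np.sin(y[1]), y[0]**2])     # mild nonlinearity
y = np.array([1.0, 1.0])
for _ in range(100):
    y = exponential_euler_step(A, b, y, h=0.1)      # stable although h|A| >> 1
```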

    Many Hamiltonian systems of interest can be formulated as second-order systems of the form \(\ddot{\varvec{y}}+\Omega ^2\varvec{y}=\varvec{g}(\varvec{y})\). Such systems feature prominently in the case of highly oscillatory mechanical systems, where \(\Omega \) is positive definite and has some large eigenvalues. The variation of constants formula (2.4) now reads

    $$\begin{aligned} \left[ \! \begin{array}{c} \varvec{y}(t+h)\\ \dot{\varvec{y}}(t+h) \end{array} \!\right]&= \left[ \begin{array}{cc} \cos (h\Omega ) &{} \Omega ^{-1}\sin (h\Omega )\\ -\Omega \sin (h\Omega ) &{} \cos (h\Omega ) \end{array} \right] \left[ \! \begin{array}{c} \varvec{y}(t)\\ \dot{\varvec{y}}(t) \end{array} \!\right] \\&~~~~\,\,+\int _0^{h} \left[ \begin{array}{cc} \cos ((h-\xi )\Omega ) &{} \Omega ^{-1}\sin ((h-\xi )\Omega )\\ -\Omega \sin ((h-\xi )\Omega ) &{} \cos ((h-\xi )\Omega ) \end{array} \right] \! \left[ \begin{array}{c} \varvec{0}\\ \varvec{g}(\varvec{y}(t+\xi )) \end{array} \right] \!\,\mathrm {d}\xi \end{aligned}$$

    and we can use either standard exponential integrators or exponential integrators designed directly for second-order systems, which apply Runge–Kutta–Nyström methods to the nonlinear part [87].

    An important family of exponential integrators for second-order systems are Gautschi-type methods

    $$\begin{aligned} \varvec{y}_{n+1}-2\varvec{y}_n+\varvec{y}_{n-1}=h^2\Psi (h\Omega ) (\varvec{g}_n-\Omega ^2\varvec{y}_n), \end{aligned}$$
    (2.5)

    which are of second order. Here \(\Psi (x)=2(1-\cos x)/x^2\) while, in Gautschi’s original method, \(\varvec{g}_n=\varvec{g}(\varvec{y}_n)\) [35]. Unfortunately, this choice results in resonances and a better one is \(\varvec{g}_n=\varvec{g}(\Phi (h\Omega )\varvec{y}_n)\), where the filter \(\Phi \) eliminates resonances: \(\Phi (0)=I\) and \(\Phi (k\pi )=0\) for \(k\in \mathbb {N}\). We refer to [35] for further discussion of such methods in the context of symplectic integration.
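
    A minimal sketch of one step of (2.5), for a diagonal \(\Omega \) stored as a vector of frequencies; the sinc filter below is one admissible choice of \(\Phi \), and the second starting value must be supplied separately (here a crude copy, for illustration only).

```python
import numpy as np

def psi(x):
    # Psi(x) = 2(1 - cos x)/x^2 = sinc^2(x/2); np.sinc(t) = sin(pi t)/(pi t).
    return np.sinc(x / (2 * np.pi))**2

def filt(x):
    # A filter with Phi(0) = 1 and Phi(k pi) = 0: here Phi(x) = sin(x)/x.
    return np.sinc(x / np.pi)

def gautschi_step(y_prev, y, h, omega, g):
    """One step of the two-step recurrence (2.5), with the filtered
    nonlinearity g_n = g(Phi(h Omega) y_n)."""
    x = h * omega
    gn = g(filt(x) * y)
    return y, 2*y - y_prev + h**2 * psi(x) * (gn - omega**2 * y)

omega = np.array([0.0, 50.0])        # one slow and one fast frequency
g = lambda q: -q**3                  # illustrative nonlinearity
y_prev = np.array([1.0, 0.01])
y = y_prev.copy()                    # crude second starting value
for _ in range(1000):
    y_prev, y = gautschi_step(y_prev, y, h=0.02, omega=omega, g=g)
```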

  • Variational integrators. Lagrangian formulation recasts a large number of differential equations as the extremisation of nonlinear functionals. Thus, for example, instead of the Hamiltonian problem \(M\ddot{\varvec{q}}+\varvec{\nabla } V(\varvec{q})=\varvec{0}\), where the matrix M is positive definite, we may consider the equivalent variational formulation of extremising the action functional with the Lagrangian \(L(\varvec{q},\dot{\varvec{q}})=\frac{1}{2} \dot{\varvec{q}}^\top M\dot{\varvec{q}}-V(\varvec{q})\). With greater generality, Hamiltonian and Lagrangian formulations are connected via the familiar Euler–Lagrange equations and, given the Lagrangian L, the corresponding second-order system is

    $$\begin{aligned} \frac{\partial L(\varvec{q},\dot{\varvec{q}})}{\partial \varvec{q}}-\frac{\,\mathrm {d}}{\,\mathrm {d}t} \left[ \frac{\partial L(\varvec{q},\dot{\varvec{q}})}{\partial \dot{\varvec{q}}}\right] =\varvec{0}. \end{aligned}$$

    The rationale of variational integrators parallels that of the Ritz method in the theory of finite elements. We first reformulate the Hamiltonian problem as a Lagrangian one, project it to a finite-dimensional space, solve it there and transform back. The original symplectic structure is replaced by a finite-dimensional symplectic structure, hence the approach is by design symplectic [64]. Marsden and West [54] review the implementation of variational integrators.

2.2 Lie-Group Methods

Let \(\mathcal {G}\) be a Lie group and \(\mathcal {M}\) a differentiable manifold. We say that \(\Lambda :\mathcal {G}\times \mathcal {M}\rightarrow \mathcal {M}\) is a group action if

  a. \(\Lambda (\iota ,y)=y\) for all \(y\in \mathcal {M}\) (where \(\iota \) is the identity of \(\mathcal {G}\)) and

  b. \(\Lambda (p,\Lambda (q,y))=\Lambda (p\cdot q,y)\) for all \(p,q\in \mathcal {G}\) and \(y\in \mathcal {M}\).

If, in addition, for every \(x,y\in \mathcal {M}\) there exists \(p\in \mathcal {G}\) such that \(y=\Lambda (p,x)\), the action is said to be transitive and \(\mathcal {M}\) is a homogeneous space, acted upon by \(\mathcal {G}\).

Every Lie group acts upon itself, while the orthogonal group \(\mathrm {O}(n)\) acts on the \((n-1)\)-sphere by multiplication, \(\Lambda (p,y)=py\). The orthogonal group also acts on the isospectral manifold of all symmetric matrices similar to a specific symmetric matrix by similarity, \(\Lambda (p,y)=pyp^\top \). Given \(1\le m\le n\), the Grassmann manifold \(\mathbb {G}(n,m)\) of all m-dimensional subspaces of \(\mathbb {R}^n\) is a homogeneous space acted upon by the special orthogonal group \(\mathrm {SO}(n)\)—more precisely, \(\mathbb {G}(n,m)=\mathrm {SO}(n)/(\mathrm {SO}(m)\times \mathrm {SO}(n-m))\).

Faced with a differential equation evolving in a homogeneous space, we can identify its flow with a group action: given an initial condition \(y_0\in \mathcal {M}\), instead of asking “what is the value of y at time \(t>0\)?” we might pose the equivalent question “what is the group action that takes the solution from \(y_0\) to y(t)?”. This is often a considerably more helpful formulation, because a group action can be further related to an algebra action. Let \(\mathfrak {g}\) be the Lie algebra corresponding to the matrix group \(\mathcal {G}\), i.e. the tangent space at \(\iota \in \mathcal {G}\), and denote by \(\mathfrak {X}(\mathcal {M})\) the set of all Lipschitz vector fields over \(\mathcal {M}\). Let \(\lambda :\mathfrak {g}\rightarrow \mathfrak {X}(\mathcal {M})\) and \(a:\mathbb {R}_+\times \mathcal {M}\rightarrow \mathfrak {g}\) both be Lipschitz. In particular, we might consider \(\lambda (a(t,y),y)=\frac{\,\mathrm {d}}{\,\mathrm {d}s}\Lambda (\rho (s,y(s)),y)\big |_{s=0}\), where \(\Lambda \) is a group action and \(\rho :\mathbb {R}_+\times \mathcal {M}\rightarrow \mathcal {G}\), \(\rho (s,y(s))=\iota +a(s,y(s))s+\mathcal{O}\left( s^2\right) \) for small |s|. The equation \(\dot{y}=\lambda (a(t,y),y)\), \(y(0)=y_0\in \mathcal {M}\) represents an algebra action and its solution evolves in \(\mathcal {M}\). Moreover,

$$\begin{aligned} y(t)=\Lambda (v(t),y_0)\qquad \text{ where }\qquad \dot{v}=a(t,\Lambda (v,y_0))v,\quad v(0)=\iota \in \mathcal {G} \end{aligned}$$
(2.6)

is a Lie-group equation. Instead of solving the original ODE on \(\mathcal {M}\), it is possible to solve (2.6) and use the group action \(\Lambda \) to advance the solution to the next step: this is the organising principle of most Lie-group methods [42]. It works because a Lie-group equation can be solved in the underlying Lie algebra, which is a linear space. Consider an ODEFootnote 2 \(\dot{y}=f(y)\), \(y(0)\in \mathcal {M}\), such that \(f\in \mathfrak {X}(\mathcal {M})\)—the solution y(t) evolves on the manifold. While conventional numerical methods are highly unlikely to stay in \(\mathcal {M}\), this is not the case for Lie-group methods. We can travel safely between \(\mathcal {M}\) and \(\mathcal {G}\) using a group action. The traffic between \(\mathcal {G}\) and \(\mathfrak {g}\) is slightly more complicated and we need to define a trivialisation, i.e. an invertible map taking smoothly a neighbourhood of \(0\in \mathfrak {g}\) to a neighbourhood of \(\iota \in \mathcal {G}\) and taking zero to the identity. The most ubiquitous example of a trivialisation is the exponential map, which represents the solution of (2.6) as \(v(t)={\mathrm e}^{\omega (t)}\), where \(\omega \) is the solution of the dexpinv equation

$$\begin{aligned} \dot{\omega }=\sum _{m=0}^\infty \frac{\mathrm {B}_m}{m!} \mathrm {ad}^m_{\omega }\, a(t,{\mathrm e}^\omega ),\qquad \omega (0)=0\in \mathfrak {g} \end{aligned}$$
(2.7)

[42]. Here the \(\mathrm {B}_m\)s are Bernoulli numbers, while \(\mathrm {ad}^m_b\) is the adjoint operator in \(\mathfrak {g}\),

$$\begin{aligned} \mathrm {ad}_b^0 c=c,\qquad \mathrm {ad}_b^m c=[b,\mathrm {ad}_b^{m-1}c],\quad m\in \mathbb {N},\qquad b,c\in \mathfrak {g}. \end{aligned}$$

Because \(\mathfrak {g}\) is closed under linear operations and commutation, solving (2.7) while respecting the Lie-algebraic structure is straightforward. Mapping back, first to \(\mathcal {G}\) and finally to \(\mathcal {M}\), we keep the numerical solution of \(\dot{y}=f(y)\) on the manifold.

Particularly effective is the use of explicit Runge–Kutta methods for (2.7), the so-called Runge–Kutta–Munthe-Kaas (RKMK) methods [65]. To help us distinguish between conventional Runge–Kutta methods and RKMK, consider the three-stage, third-order method with the Butcher tableauFootnote 3

$$\begin{aligned} \begin{array}{c|ccc} 0 &{} &{} &{} \\ \frac{1}{2} &{} \frac{1}{2} &{} &{} \\ 1 &{} -1 &{} 2 &{} \\ \hline &{} \frac{1}{6} &{} \frac{2}{3} &{} \frac{1}{6} \end{array} \end{aligned}$$
(2.8)

Applied to the ODE \(\dot{y}=f(t,y)\), \(y(t_n)=y_n\in \mathcal {M}\), evolving on the manifold \(\mathcal {M}\subset \mathbb {R}^d\), it becomes

$$\begin{aligned} k_1= & {} f(t_n,y_n),\\ k_2= & {} f(t_{n+\frac{1}{2}},y_n+{\textstyle \frac{1}{2}} hk_1),\\ k_3= & {} f(t_{n+1},y_n-hk_1+2hk_2),\\ \Delta= & {} h({\textstyle \frac{1}{6}} k_1+{\textstyle \frac{2}{3}} k_2+{\textstyle \frac{1}{6}} k_3),\\ y_{n+1}= & {} y_n+\Delta . \end{aligned}$$

Since we operate in \(\mathbb {R}^d\), there is absolutely no reason for \(y_{n+1}\) to live in \(\mathcal {M}\). However, once we implement (2.8) at an algebra level (truncating first the dexpinv equation (2.7)),

$$\begin{aligned} k_1= & {} a(t_n,\iota ),\\ k_2= & {} a(t_{n+\frac{1}{2}},{\mathrm e}^{hk_1/2}),\\ k_3= & {} a(t_{n+1},{\mathrm e}^{-hk_1+2hk_2}),\\ \Delta= & {} h({\textstyle \frac{1}{6}} k_1+{\textstyle \frac{2}{3}} k_2+{\textstyle \frac{1}{6}} k_3),\\ \omega _{n+1}= & {} \Delta +{\textstyle \frac{1}{6}} h[\Delta ,k_1],\\ y_{n+1}= & {} \Lambda ({\mathrm e}^{\omega _{n+1}},y_n), \end{aligned}$$

the solution is guaranteed to stay in \(\mathcal {M}\).
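
A minimal sketch of this RKMK scheme for the rotation group \(\mathrm {SO}(3)\) acting on the unit sphere, with \(\Lambda (Q,y)=Qy\) and with the shorthand \(a(t,\cdot )\) evaluated directly at points of the sphere; the time-dependent rotation field below is an illustrative choice.

```python
import numpy as np
from scipy.linalg import expm

def hat(w):
    """Map a vector in R^3 to the corresponding skew-symmetric matrix in so(3)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rkmk3_step(a, t, y, h):
    """Three-stage RKMK step for dy/dt = a(t, y) y with a(t, y) in so(3);
    all corrections live in the algebra, so y stays on the sphere."""
    comm = lambda X, Y: X @ Y - Y @ X
    k1 = a(t, y)
    k2 = a(t + h/2, expm(h/2 * k1) @ y)
    k3 = a(t + h, expm(-h*k1 + 2*h*k2) @ y)
    Delta = h * (k1/6 + 2*k2/3 + k3/6)
    omega = Delta + h/6 * comm(Delta, k1)
    return expm(omega) @ y

a = lambda t, y: hat(np.array([np.sin(t), 0.3, np.cos(t)]))  # illustrative
y, t, h = np.array([1.0, 0.0, 0.0]), 0.0, 0.01
for _ in range(1000):
    y = rkmk3_step(a, t, y, h); t += h
print(np.linalg.norm(y))   # stays equal to 1 up to round-off
```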

An important special case of a Lie-group equation is the linear ODE \(\dot{v}=a(t)v\), where \(a:\mathbb {R}_+\rightarrow \mathfrak {g}\). Although RKMK works perfectly well in a linear case, special methods do even better. Perhaps the most important is the Magnus expansion [53], \(v(t)={\mathrm e}^{\omega (t)}v(0)\), where

$$\begin{aligned} \nonumber \omega (t)= & {} \int _0^t a(\xi )\,\mathrm {d}\xi -\frac{1}{2} \int _0^t\!\int _0^{\xi _1} [a(\xi _2),a(\xi _1)]\,\mathrm {d}\xi _2\,\mathrm {d}\xi _1 \\&+\frac{1}{4} \int _0^t \! \int _0^{\xi _1} \! \!\int _0^{\xi _2} [[a(\xi _3),a(\xi _2)],a(\xi _1)]\,\mathrm {d}\xi _3\,\mathrm {d}\xi _2\,\mathrm {d}\xi _1\\ \nonumber&+\frac{1}{12} \int _0^t\!\int _0^{\xi _1}\!\!\int _0^{\xi _2}[a(\xi _3),[a(\xi _2),a(\xi _1)]]\,\mathrm {d}\xi _3\,\mathrm {d}\xi _2\,\mathrm {d}\xi _1+\cdots . \end{aligned}$$
(2.9)

We refer to [6, 40, 42] for explicit means to derive expansion terms, efficient computation of multivariate integrals that arise in this context and many other implementation details. Magnus expansions are important in a number of settings when preservation of structure is not an issue, not least in the solution of linear stochastic ODEs [51].
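
As an illustration, a sketch of the classical fourth-order Magnus integrator, obtained by truncating (2.9) and discretising the integrals by two-point Gauss–Legendre quadrature; the skew-symmetric a(t) below is an illustrative choice, so that the solution evolves in \(\mathrm {SO}(3)\).

```python
import numpy as np
from scipy.linalg import expm

def magnus4_step(a, t, v, h):
    """Fourth-order Magnus step for dv/dt = a(t) v: two Gauss-Legendre
    samples of a(t) and a single commutator."""
    c = np.sqrt(3) / 6
    A1, A2 = a(t + (0.5 - c) * h), a(t + (0.5 + c) * h)
    omega = 0.5 * h * (A1 + A2) + np.sqrt(3) / 12 * h**2 * (A2 @ A1 - A1 @ A2)
    return expm(omega) @ v

a = lambda t: np.array([[0.0, t, 1.0],
                        [-t, 0.0, -t**2],
                        [-1.0, t**2, 0.0]])   # skew-symmetric for every t
v, t, h = np.eye(3), 0.0, 0.05
for _ in range(200):
    v = magnus4_step(a, t, v, h); t += h
print(np.linalg.norm(v.T @ v - np.eye(3)))    # orthogonality up to round-off
```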

There are alternative means to expand the solution of (2.7) in the linear case, not least the Fer expansion [22, 38], which has recently found an important application in the computation of Sturm–Liouville spectra [76].

Another approach to Lie-group equations uses canonical coordinates of the second kind [72].

2.3 Conservation of Volume

An ODE \(\dot{\varvec{x}}=\varvec{f}(\varvec{x})\) is divergence-free if \(\varvec{\nabla } \cdot \varvec{f}(\varvec{x})=0\). The flows of divergence-free ODEs are volume-preserving (VP). Volume is important to preserve: it underlies KAM tori and incompressibility and, most importantly, it is a crucial ingredient for ergodicity. Unlike symplecticity, however, phase-space volume generically cannot be preserved by Runge–Kutta methods, or even by their generalisations, B-series methods. This was proved independently in [13] and in [43]. Since B-series methods cannot preserve volume, we need to look to other methods.

There are essentially two known classes of numerical integration methods that preserve phase-space volume. The first is based on splitting [20]. As an example, consider a 3D volume-preserving vector field:

$$\begin{aligned} \dot{x}= & {} u(x,y,z) \nonumber \\ \dot{y}= & {} v(x,y,z) \\ \dot{z}= & {} w(x,y,z) \nonumber \end{aligned}$$
(2.10)

with

$$\begin{aligned} u_x + v_y + w_z = 0. \end{aligned}$$

We split this 3D VP vector field into two 2D VP vector fields as follows

$$\begin{aligned} \begin{array}{lcl} \displaystyle \dot{x} = u(x,y,z), &{}\qquad \quad &{} \displaystyle \dot{x} = 0,\\ \displaystyle \dot{y} = -\int \! u_x(x,y,z)\,\mathrm {d}y, &{}&{} \displaystyle \dot{y} = v(x,y,z) + \int \! u_x(x,y,z) \,\mathrm {d}y,\\ \displaystyle \dot{z} = 0; &{}&{} \displaystyle \dot{z} = w(x,y,z). \end{array} \end{aligned}$$
(2.11)

The vector field on the left is divergence-free by construction and, since the two vector fields add up to (2.10), it follows that the vector field on the right is also divergence-free.

Having split the original vector field into 2D VP vector fields, we need to find VP integrators for each of these 2D VP vector fields. But that is easy, because in 2D volume-preserving and symplectic vector fields are the same—and, correspondingly, a symplectic Runge–Kutta method preserves 2D volume.

The above splitting method is easily generalised to n dimensions, where one splits into \(n-1\) 2D VP vector fields, and integrates each using a symplectic Runge–Kutta method.

An alternative VP integration method was discovered independently by Shang and by Quispel [74, 80]. We again illustrate this method in 3D.

We will look for an integrator of the form

$$\begin{aligned} x_1= & {} g_1(x_1',x_2,x_3) \nonumber \\ x_2'= & {} g_2(x_1',x_2,x_3) \\ x_3'= & {} g_3(x_1',x_2',x_3) \nonumber \end{aligned}$$
(2.12)

where (here and below) \(x_i= x_i(nh)\) and \(x_i'=x_i((n+1)h)\). The reason the form (2.12) is convenient is that any such map is VP iff

$$\begin{aligned} \frac{\partial x_1}{\partial x_1'} = \frac{\partial x_2'}{\partial x_2}\frac{\partial x_3'}{\partial x_3}. \end{aligned}$$
(2.13)

To see how to construct a VP integrator of the form (2.12), consider as an example the ODE

$$\begin{aligned} \dot{x}_1= & {} x_2 + x_1^2 + x_3^3 \nonumber \\ \dot{x}_2= & {} x_3 + x_1x_2 + x_1^4 \\ \dot{x}_3= & {} x_1 - 3x_1x_3 + x_2^5 \nonumber \end{aligned}$$
(2.14)

It is easy to check that it is divergence-free.

Now consistency requires that any integrator for (2.14) should satisfy

$$\begin{aligned} x_1'= & {} x_1 + h( x_2 + x_1^2 + x_3^3) + \mathcal{O}\left( h^2\right) \nonumber \\ x_2'= & {} x_2 + h(x_3 + x_1x_2 + x_1^4) + \mathcal{O}\left( h^2\right) \\ x_3'= & {} x_3 + h(x_1 - 3x_1x_3 + x_2^5) + \mathcal{O}\left( h^2\right) \nonumber \end{aligned}$$
(2.15)

and therefore

$$\begin{aligned} x_1&=x_1' - h( x_2 + x_1'^2 + x_3^3) + \mathcal{O}\left( h^2\right) \end{aligned}$$
(2.16)
$$\begin{aligned} x_2'&= x_2 + h(x_3 + x_1'x_2 + x_1'^4) + \mathcal{O}\left( h^2\right) \end{aligned}$$
(2.17)
$$\begin{aligned} x_3'&= x_3 + h(x_1' - 3x_1'x_3 + x_2'^5) + \mathcal{O}\left( h^2\right) \end{aligned}$$
(2.18)

Since we are free to choose any consistent \(g_2\) and \(g_3\) in (2.12), provided \(g_1\) satisfies (2.13), we choose the terms designated by \(\mathcal{O}\left( h^2\right) \) in (2.17) and (2.18) to be identically zero. Equation (2.13) then yields

$$\begin{aligned} \frac{\partial x_1}{\partial x_1'} = (1+hx_1')(1-3hx_1'). \end{aligned}$$
(2.19)

This can easily be integrated to give

$$\begin{aligned} x_1 = x_1' - hx_1'^2 - h^2x_1'^3 + k(x_2,x_3;h). \end{aligned}$$
(2.20)

where the function k denotes an integration constant that we can choose appropriately. The simplest VP integrator satisfying both (2.16) and (2.20) is therefore:

$$\begin{aligned} x_1= & {} x_1' - h( x_2 + x_1'^2 + x_3^3) -h^2x_1'^3 \nonumber \\ x_2'= & {} x_2 + h(x_3 + x_1'x_2 + x_1'^4) \\ x_3'= & {} x_3 + h(x_1' - 3x_1'x_3 + x_2'^5) \nonumber \end{aligned}$$
(2.21)

A nice aspect of the integrator (2.21) (and (2.12)) is that it is essentially only implicit in one variable. Once \(x_1'\) is computed from the first (implicit) equation, the other two equations are essentially explicit.

Of course the method just described also generalises to any divergence-free ODE in any dimension.
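
A minimal sketch of the integrator (2.21), together with a finite-difference check that the Jacobian determinant of one step equals 1; bracketing the root around an explicit Euler predictor is a simplification that works for small h.

```python
import numpy as np
from scipy.optimize import brentq

def vp_step(x, h):
    """One step of (2.21): scalar-implicit in x1', explicit afterwards."""
    x1, x2, x3 = x
    f = lambda z: z - h * (x2 + z**2 + x3**3) - h**2 * z**3 - x1
    z0 = x1 + h * (x2 + x1**2 + x3**3)       # explicit Euler predictor
    x1p = brentq(f, z0 - 1.0, z0 + 1.0)      # solve the implicit equation
    x2p = x2 + h * (x3 + x1p * x2 + x1p**4)
    x3p = x3 + h * (x1p - 3 * x1p * x3 + x2p**5)
    return np.array([x1p, x2p, x3p])

x, h = np.array([0.1, 0.2, 0.3]), 0.01
J = np.empty((3, 3))
for j in range(3):                            # Jacobian by central differences
    e = np.zeros(3); e[j] = 1e-6
    J[:, j] = (vp_step(x + e, h) - vp_step(x - e, h)) / 2e-6
print(np.linalg.det(J))   # equals 1 up to finite-difference error
```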

2.4 Preserving Energy and Other First Integrals

As mentioned, Hamiltonian systems exhibit two important geometric properties simultaneously: they conserve both the symplectic form and the energy. A famous no-go theorem by Ge and Marsden [25] shows that it is generically impossible to construct a geometric integrator that preserves both properties at once. One must therefore choose which of the two to preserve in any given application. Particularly in low dimensions, and if the energy surface is compact, there are often advantages in preserving the energy.

An energy-preserving B-series method was discovered in [75], cf. also [59].

For any ODE \(\dot{\varvec{x}} = \varvec{f}(\varvec{x})\), this so-called average vector field method is given by

$$\begin{aligned} \frac{\varvec{x}'-\varvec{x}}{h} = \int _{0}^{1}\varvec{f}(\xi \varvec{x}' + (1-\xi )\varvec{x})\,\mathrm {d}\xi . \end{aligned}$$
(2.22)

If the vector field \(\varvec{f}\) is Hamiltonian, i.e. if there exists a Hamiltonian function \(H(\varvec{x})\) and a constant skew-symmetric matrix S such that \(\varvec{f}(\varvec{x}) = S\nabla H(\varvec{x})\), then it follows from (2.22) that energy is preserved, i.e. \(H(\varvec{x}')=H(\varvec{x})\).

While the B-series method (2.22) can generically preserve all types of Hamiltonians H, it can be shown that no Runge–Kutta method is energy-preserving for all H. (In other words, this can only be done using B-series methods that are not RK methods.) For a given polynomial H however, Runge–Kutta methods preserving that H do exist [37]. This can be seen as follows.

Note that the integral in (2.22) is one-dimensional. This means that e.g. for cubic vector fields (and hence for quartic Hamiltonians) an equivalent method is obtained by replacing the integral in (2.22) using Simpson’s rule:

$$\begin{aligned} \int _{0}^{1} g(\xi )\,\mathrm {d}\xi \approx \frac{1}{6}\left[ g(0) + 4g({\textstyle \frac{1}{2}}) + g(1)\right] \!, \end{aligned}$$
(2.23)

yielding the Runge–Kutta method

$$\begin{aligned} \frac{\varvec{x}'-\varvec{x}}{h} = \frac{1}{6}\left[ \varvec{f}(\varvec{x}) + 4\varvec{f}\left( \frac{\varvec{x}+\varvec{x}'}{2}\right) + \varvec{f}(\varvec{x}')\right] \!, \end{aligned}$$
(2.24)

preserving all quartic Hamiltonians.
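
A minimal sketch of (2.24) for the illustrative quartic Hamiltonian \(H(\varvec{x})=(x_1^4+x_2^4)/4\) (so that \(\varvec{f}\) is cubic and the method coincides with the average vector field method); the implicit step is solved by a generic nonlinear solver.

```python
import numpy as np
from scipy.optimize import fsolve

H = lambda x: (x[0]**4 + x[1]**4) / 4
f = lambda x: np.array([x[1]**3, -x[0]**3])   # f = S grad H, S = [[0,1],[-1,0]]

def avf_simpson_step(x, h):
    """The Runge-Kutta method (2.24); energy-preserving for quartic H."""
    res = lambda xp: xp - x - (h/6) * (f(x) + 4 * f((x + xp)/2) + f(xp))
    return fsolve(res, x + h * f(x))

x, h = np.array([1.0, 0.0]), 0.1
E0 = H(x)
for _ in range(1000):
    x = avf_simpson_step(x, h)
print(abs(H(x) - E0))   # zero up to the nonlinear-solver tolerance
```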

We note that (2.22) has second-order accuracy; higher-order generalisations have been given in [27]. The average vector field method has also been applied to a slew of semi-discretised PDEs in [9].

While energy is one of the most important constants of the motion in applications, many other types of first integrals do occur. We note here that all B-series methods preserve all linear first integrals, and that all symplectic B-series methods preserve all quadratic first integrals. So, for example, the implicit midpoint rule

$$\begin{aligned} \frac{\varvec{x}'-\varvec{x}}{h} = \varvec{f}\!\left( \frac{\varvec{x}+\varvec{x}'}{2} \right) \end{aligned}$$

(which is symplectic) preserves all linear and quadratic first integrals. There are however many cases not covered by any of the above.

How does one preserve a cubic first integral that is not an energy? And what about Hamiltonian systems whose symplectic structure is not constant? It turns out that, generically, any ODE \(\dot{\varvec{x}} = \varvec{f}(\varvec{x})\) that preserves an integral \(I(\varvec{x})\) can be written in the form

$$\begin{aligned} \dot{\varvec{x}} = S(\varvec{x})\varvec{\nabla } I(\varvec{x}), \end{aligned}$$
(2.25)

where \(S(\varvec{x})\) is a skew-symmetric matrix.Footnote 4

An integral-preserving discretisation of (2.25) is given by

$$\begin{aligned} \frac{\varvec{x}'-\varvec{x}}{h} = \bar{S}(\varvec{x},\varvec{x}') \bar{\nabla }I(\varvec{x},\varvec{x}'), \end{aligned}$$
(2.26)

where \(\bar{S}(\varvec{x},\varvec{x}')\) is any consistent approximation to \(S(\varvec{x})\) (e.g. \(\bar{S}(\varvec{x},\varvec{x}')=S(\varvec{x})\)), and the discrete gradient \( \bar{\varvec{\nabla }}I\) is defined by

$$\begin{aligned} (\varvec{x}'-\varvec{x}) \cdot \bar{\varvec{\nabla }}I(\varvec{x},\varvec{x}') = I(\varvec{x}') - I(\varvec{x}) \end{aligned}$$
(2.27)

and

$$\begin{aligned} \lim _{\varvec{x}' \rightarrow \varvec{x}} \bar{\varvec{\nabla }}I(\varvec{x},\varvec{x}') = \varvec{\nabla } I(\varvec{x}). \end{aligned}$$
(2.28)

There are many different discrete gradients that satisfy (2.27) and (2.28). A particularly simple one is given by the Itoh–Abe discrete gradient, which for example in 3D reads

$$\begin{aligned} \bar{\nabla }I(\varvec{x},\varvec{x}') = \left[ \begin{array}{c} \displaystyle \frac{I(x_1',x_2,x_3) - I(x_1,x_2,x_3)}{x_1'-x_1} \\ \displaystyle \frac{I(x_1',x_2',x_3) - I(x_1',x_2,x_3)}{x_2'-x_2} \\ \displaystyle \frac{I(x_1',x_2',x_3') - I(x_1',x_2',x_3)}{x_3'-x_3} \end{array} \right] \!. \end{aligned}$$
(2.29)

Other examples of discrete gradients, as well as constructions of the skew-symmetric matrix \(S(\varvec{x})\) for a given vector field \(\varvec{f}\) and integral I may be found in [59].
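
A minimal sketch of the discrete gradient method (2.26) with the Itoh–Abe discrete gradient (2.29), for the illustrative cubic integral \(I(\varvec{x})=x_1x_2x_3\) and a constant skew-symmetric S; the denominators are assumed nonzero (in the limit \(x_i'\rightarrow x_i\) each quotient tends to the corresponding partial derivative).

```python
import numpy as np
from scipy.optimize import fsolve

I = lambda x: x[0] * x[1] * x[2]
gradI = lambda x: np.array([x[1]*x[2], x[0]*x[2], x[0]*x[1]])
S = np.array([[0.0, 1.0, 0.0], [-1.0, 0.0, 1.0], [0.0, -1.0, 0.0]])

def itoh_abe(x, xp):
    """The Itoh-Abe discrete gradient (2.29) in 3D."""
    return np.array([
        (I([xp[0], x[1], x[2]]) - I(x)) / (xp[0] - x[0]),
        (I([xp[0], xp[1], x[2]]) - I([xp[0], x[1], x[2]])) / (xp[1] - x[1]),
        (I(xp) - I([xp[0], xp[1], x[2]])) / (xp[2] - x[2])])

def dg_step(x, h):
    """Discrete gradient step (2.26) with S-bar = S: skew-symmetry of S
    together with property (2.27) gives I(x') = I(x) exactly."""
    res = lambda xp: xp - x - h * S @ itoh_abe(x, xp)
    return fsolve(res, x + h * S @ gradI(x))

x, h = np.array([1.0, 2.0, 0.5]), 0.1
for _ in range(100):
    x = dg_step(x, h)
print(I(x))   # stays at I(x_0) = 1.0 up to solver tolerance
```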

We note that the discrete gradient method can also be used for systems with any number of integrals. For example an ODE \(\dot{\varvec{x}}=\varvec{f}(\varvec{x})\) possessing two integrals \(I(\varvec{x})\) and \(J(\varvec{x})\) can be written

$$\begin{aligned} \dot{x}_i = S_{ijk}(\varvec{x}) \frac{\partial I(\varvec{x})}{\partial x_j} \frac{\partial J(\varvec{x})}{\partial x_k}, \end{aligned}$$
(2.30)

where the summation convention is assumed over repeated indices and \(S(\varvec{x})\) is a completely antisymmetric tensor. A discretisation of (2.30) which preserves both I and J is given by

$$\begin{aligned} \frac{x_i'-x_i}{h} = \bar{S}_{ijk}(\varvec{x},\varvec{x}')\, \bar{\nabla }I(\varvec{x},\varvec{x}')_j\, \bar{\nabla }J(\varvec{x},\varvec{x}')_k, \end{aligned}$$
(2.31)

with \(\bar{S}\) any completely skew approximation of S and \(\bar{\nabla }I\) and \(\bar{\nabla }J\) discrete gradients as defined above. Discrete gradient methods have recently found an intriguing application in variational image regularisation [26].

3 Four Recent Stories of GNI

The purpose of this section is not to present the totality of recent research into GNI, a subject that would call for a substantially longer paper. Instead, we wish to highlight a small number of developments with which the authors are familiar and which provide a flavour of the very wide range of issues on the current GNI agenda.

3.1 Highly Oscillatory Hamiltonian Systems

High oscillation occurs in many Hamiltonian systems. Sometimes, e.g. in the integration of equations of celestial mechanics, the source of the problem is that we wish to compute the solution across a very large number of periods and the oscillation is an artefact of the time scale in which the solution has physical relevance. In other cases oscillation is implicit in the multiscale structure of the underlying problem. A case in point is the (modified) Fermi–Pasta–Ulam (FPU) equations, describing a mechanical system consisting of alternating stiff harmonic and soft nonlinear springs. The stiff springs impart fast oscillation, while the soft springs generate slow transfer of energy across the system: good numerical integration must capture both!

A good point to start (which includes modified FPU as a special case) is the second-order ODE

$$\begin{aligned} \ddot{\varvec{q}}+\Omega ^2\varvec{q}=\varvec{g}(\varvec{q}),\qquad t\ge 0,\qquad \varvec{q}(0)=\varvec{u}_0,\quad \dot{\varvec{q}}(0)=\varvec{v}_0, \end{aligned}$$
(3.1)

where \(\varvec{g}(\varvec{q})=-\varvec{\nabla }U(\varvec{q})\) and

$$\begin{aligned} \Omega = \left[ \begin{array}{cc} O &{} O\\ O &{} \omega I \end{array} \right] \!,\quad \omega \gg 1,\qquad \varvec{q}= \left[ \begin{array}{c} \varvec{q}_0\\ \varvec{q}_1 \end{array} \right] \!,\qquad \varvec{q}_0\in \mathbb {R}^{n_0},\;\;\varvec{q}_1\in \mathbb {R}^{n_1}. \end{aligned}$$

An important aspect of systems of the form (3.1) is that the exact solution, in addition to preserving the total Hamiltonian energy

$$\begin{aligned} H(\varvec{p},\varvec{q})=\frac{1}{2} (\Vert \varvec{p}_1\Vert ^2+\omega ^2 \Vert \varvec{q}_1\Vert ^2)+\frac{1}{2} \Vert \varvec{p}_0\Vert ^2 +U(\varvec{q}_0,\varvec{q}_1), \end{aligned}$$
(3.2)

where \(\dot{\varvec{q}}=\varvec{p}\), also preserves the oscillatory energy

$$\begin{aligned} I(\varvec{p},\varvec{q})=\frac{1}{2} \Vert \varvec{p}_1\Vert ^2+\frac{\omega ^2}{2} \Vert \varvec{q}_1\Vert ^2 \end{aligned}$$
(3.3)

for intervals of length \(\mathcal{O}\left( \omega ^N\right) \) for any \(N\ge 1\). This has been proved using the modulated Fourier expansions

$$\begin{aligned} \varvec{q}(t)=\sum _{m=-\infty }^\infty {\mathrm e}^{{\mathrm i}m\omega t} \varvec{z}_m(t). \end{aligned}$$

The solution of (3.1) exhibits oscillations at frequency \(\mathcal{O}\left( \omega \right) \) and this inhibits the efficiency of many symplectic methods, requiring a step size of \(\mathcal{O}\left( \omega ^{-1}\right) \), a situation akin to stiffness in more conventional ODEs. However, by their very structure, exponential integrators (and in particular Gautschi-type methods (2.5)) are particularly effective in integrating the linear part, which gives rise to the high oscillation. The problem with Gautschi-type methods, though, might be the occurrence of resonances, and we need to be careful to avoid them, both in the choice of the right filter (cf. the discussion in Sect. 2.1) and of the step size h.

Of course, one would like geometric numerical integrators applied to (3.1) to exhibit favourable preservation properties with respect to both total energy (3.2) and oscillatory energy (3.3). Applying modulated Fourier expansions to trigonometric and modified trigonometric integrators, this is indeed the case provided that the step size obeys the non-resonance condition with respect to the frequency \(\omega \),

$$\begin{aligned} |\sin ({\textstyle \frac{1}{2}} mh\omega )|\ge c h^{1/2},\qquad m=1,\ldots ,N,\quad N\ge 2, \end{aligned}$$

cf. Hairer and Lubich [30].

All this has been generalised to systems with multiple frequencies, with the Hamiltonian function

$$\begin{aligned} H(\varvec{p},\varvec{q})=\overbrace{\frac{1}{2} \sum _{j=1}^s \left( \Vert \varvec{p}_j\Vert ^2+\omega _j^2\Vert \varvec{q}_j\Vert ^2\right) }^{\mathrm {oscillatory}}+\overbrace{\frac{1}{2}\Vert \varvec{p}_0\Vert ^2+U(\varvec{q})}^{\mathrm {slow}}, \end{aligned}$$

where

$$\begin{aligned} \varvec{p}= \left[ \begin{array}{c} \varvec{p}_0\\ \varvec{p}_1\\ \vdots \\ \varvec{p}_s \end{array} \right] \!,\quad \varvec{q}=\left[ \begin{array}{c} \varvec{q}_0\\ \varvec{q}_1\\ \vdots \\ \varvec{q}_s \end{array} \right] \!,\qquad 0<\min _{j=1,\ldots ,s}\omega _j,\quad 1\ll \max _{j=1,\ldots ,s}\omega _j \end{aligned}$$

for both the exact solution [24] and discretisations obtained using trigonometric and modified trigonometric integrators [14].

Further achievements and open problems in the challenging area of marrying symplectic integration and high oscillation are beautifully described in [28] and [31].

3.2 Kahan’s ‘Unconventional’ Method

A novel discretisation method for quadratic ODEs was introduced and studied in [44, 45] and first analysed from the GNI standpoint in [78]. This method discretises the vector field

$$\begin{aligned} \dot{x}_i = \sum _{j,k}^{}a_{ijk}x_jx_k + \sum _{j}^{}b_{ij}x_j + c_i \end{aligned}$$
(3.4)

as follows,

$$\begin{aligned} \frac{x_i'-x_i}{h} = \sum _{j,k}^{}a_{ijk} \left( \frac{x_jx_k' + x_j'x_k}{2}\right) + \sum _{j}^{}b_{ij} \left( \frac{x_j + x_j'}{2} \right) + c_i. \end{aligned}$$
(3.5)

Kahan called the method (3.5) ‘unconventional’, because it treats the quadratic terms differently from the linear terms. He also noted some nice features of (3.5), e.g. that it often seemed to be able to integrate through singularities.

Properties of Kahan’s method:

  1. Kahan’s method is (the reduction of) a Runge–Kutta method.

    Celledoni et al. [12] showed that (3.5) is the reduction to quadratic vector fields of the Runge–Kutta method

    $$\begin{aligned} \frac{\varvec{x}'-\varvec{x}}{h} = 2 \varvec{f}\left( \frac{\varvec{x} + \varvec{x}'}{2}\right) - \frac{1}{2} \varvec{f}(\varvec{x}) - \frac{1}{2} \varvec{f}(\varvec{x}') \end{aligned}$$
    (3.6)

    The RK method (3.6) (which is defined for all vector fields f), once applied to quadratic vector fields, coincides with Kahan’s method (which is only defined in the quadratic case).

    This explains inter alia why Kahan’s method preserves all linear first integrals.

  2. Kahan’s method preserves a modified energy and measure. For any Hamiltonian vector field of the form

    $$\begin{aligned} \dot{\varvec{x}} = \varvec{f}(\varvec{x}) = S\varvec{\nabla } H(\varvec{x}), \end{aligned}$$
    (3.7)

    with cubic Hamiltonian \(H(\varvec{x})\) and constant symplectic (or Poisson) structure S, Kahan’s method preserves a modified energy as well as a modified measure exactly [12].

    The modified volume is

    $$\begin{aligned} \frac{\,\mathrm {d}x_1 \wedge \dots \wedge \,\mathrm {d}x_n}{\det \!\left( I - \frac{1}{2}hf'(\varvec{x}) \right) }, \end{aligned}$$
    (3.8)

    while the modified energy is

    $$\begin{aligned} \tilde{H}(\varvec{x}) := H(\varvec{x}) + \frac{1}{3}h \varvec{\nabla } H(\varvec{x})^\top \!\left( I - \frac{1}{2}hf'(\varvec{x}) \right) ^{-1} \!\varvec{f}(\varvec{x}). \end{aligned}$$
    (3.9)
  3. Kahan’s method preserves the integrability of many integrable systems of quadratic ODEs.

    Beginning with the work of Hirota and Kimura, subsequently extended by Suris and collaborators [73], and by Quispel and collaborators [10, 12, 86], it was shown that Kahan’s method preserves the complete integrability of a surprisingly large number of quadratic ODEs. In most cases this means that, in n dimensions, Kahan’s method preserves a (modified) volume form, as well as \(n-1\) (modified) first integrals.

Here we list some 2D vector fields whose integrability is preserved by Kahan’s method:

  • Quadratic Hamiltonian systems in 2D:

    The 9-parameter family

    $$\begin{aligned} \left[ \begin{array}{c} \dot{x} \\ \dot{y} \end{array} \right] = \left[ \begin{array}{c} bx^2 + 2cxy +dy^2 +fx + gy + i \\ -ax^2 - 2bxy - cy^2 - ex -fy -h \end{array} \right] \!; \end{aligned}$$
    (3.10)
  • Suslov systems in 2D:

    The 9-parameter family

    $$\begin{aligned} \left[ \begin{array}{c} \dot{x} \\ \dot{y} \end{array} \right] = l(x,y) \left[ \begin{array}{cc} 0 &{} 1 \\ -1 &{} 0 \end{array} \right] \nabla H(x,y), \end{aligned}$$
    (3.11)

    where \(l(x,y) = ax+by+c\); \(H(x,y) = dx^2 + exy +fy^2 + gx + hy + i\);

  • Reduced Nahm equations in 2D:

    Octahedral symmetry:

    $$\begin{aligned} \left[ \begin{array}{c} \dot{x} \\ \dot{y} \end{array} \right] = \left[ \begin{array}{c} 2x^2 - 12y^2 \\ -6x^2 - 4y^2 \end{array} \right] \!; \end{aligned}$$
    (3.12)

    Icosahedral symmetry:

    $$\begin{aligned} \left[ \begin{array}{c} \dot{x} \\ \dot{y} \end{array} \right] = \left[ \begin{array}{c} 2x^2 - y^2 \\ -10xy + y^2 \end{array} \right] \!. \end{aligned}$$
    (3.13)

The modified energy and measure for the Kahan discretisations of these 2D systems, as well as of many other (higher-dimensional) integrable quadratic vector fields are given in [10, 12, 73].

Generalisations to higher degree polynomial equations using polarisation are presented in [11].
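
To illustrate how cheap the scheme is, here is a minimal sketch of Kahan's method for the octahedral reduced Nahm system (3.12); for quadratic vector fields the method (3.5) is linearly implicit, so each step amounts to a single 2×2 linear solve (initial data and step size are illustrative choices).

```python
import numpy as np

def kahan_step(x, y, h):
    """Kahan's method (3.5) for (3.12): x' - x = h(2 x x' - 12 y y'),
    y' - y = h(-6 x x' - 4 y y'), which is linear in (x', y')."""
    M = np.array([[1 - 2*h*x, 12*h*y],
                  [6*h*x, 1 + 4*h*y]])
    return np.linalg.solve(M, np.array([x, y]))

x, y, h = 0.5, -0.2, 0.01
for _ in range(50):
    x, y = kahan_step(x, y, h)
```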

3.3 Applications to Celestial Mechanics

GNI methods particularly come into their own when the integration time is large compared to typical periods of the system. Thus long-term integrations of e.g. solar-type systems and of particle accelerators typically need symplectic methods. In this subsection we focus on the former.Footnote 5

In those studies where dissipation can be neglected, a common approach to solar system type dynamics is to split the N-body Hamiltonian H in the form

$$\begin{aligned} H=H_1+\varepsilon H_2, \end{aligned}$$
(3.14)

where \(H_1\), representing the Keplerian motion of the \(N-1\) planets, is integrable, \(H_2\) represents the interaction between the planets and \(\varepsilon >0\) is a small parameter. In this manner, special methods for near-integrable Hamiltonian dynamics can and have been used, cf. e.g. [56].

One of the first symplectic integrations of the solar system was carried out in [84], where it was confirmed that the solar system has a positive Lyapunov exponent and hence exhibits chaotic behaviour, cf. [47].

More recently these methods have been improved and extended [5, 17, 48]. Several symplectic integrators of high order were tested in [19], in order to determine the best splitting scheme for long-term studies of the solar system.

These various methods have rendered numerical algorithms for solar-system dynamics so accurate that they can be used to define the geologic time scales in terms of the initial conditions and parameters of solar-system models (or vice versa). For related work cf. [82].

3.4 Symmetric Zassenhaus Splitting and the Equations of Quantum Mechanics

Equations of quantum mechanics in the semiclassical regime represent a double challenge of structure conservation and high oscillation. A good starting point is the linear Schrödinger equation

$$\begin{aligned} \frac{\partial u}{\partial t}={\mathrm i}\varepsilon \frac{\partial ^2 u}{\partial x^2}-{\mathrm i}\varepsilon ^{-1} V(x)u \end{aligned}$$
(3.15)

(for simplicity we restrict our discourse to a single space dimension), given in \([-1,1]\) with periodic boundary conditions. Here V is the potential energy of a quantum system, \(|u(x,t)|^2\) is the position density of a particle and \(0<\varepsilon \ll 1\) represents the difference in mass between an electron and nuclei. It is imperative to preserve the unitarity of the solution operator (otherwise \(|u(\,\cdot \,,t)|^2\) is no longer a probability density), but also to deal with oscillation at a frequency of \(\mathcal{O}\left( \varepsilon ^{-1}\right) \). A conventional approach advances the solution using a palindromic splitting (2.3), but this is suboptimal for a number of reasons. Firstly, the number of splittings increases exponentially with order. Secondly, error constants are exceedingly large. Thirdly, quantifying the quality of approximation in terms of the step size h is misleading, because there are three small quantities at play: the step size h, \(N^{-1}\), where N is the number of degrees of freedom in the space discretisation (typically either a spectral method or spectral collocation) and, finally, \(\varepsilon >0\) which, originating in physics rather than being a numerical artefact, is the most important. We henceforth let \(N=\mathcal{O}\left( \varepsilon ^{-1}\right) \) (to resolve the high-frequency oscillations) and \(h=\mathcal{O}\left( \varepsilon ^\sigma \right) \) for some \(\sigma >0\)—obviously, the smaller \(\sigma \), the larger the time step.
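
For reference, a minimal sketch of such a palindromic (Strang) splitting for (3.15): the potential ‘kick’ is a pointwise multiplication, the kinetic step is solved exactly in Fourier space, and each factor is unitary, so the discrete probability is conserved to machine accuracy. The grid, potential and wave packet are illustrative choices.

```python
import numpy as np

def strang_schrodinger(u, V, h, eps):
    """One Strang step for u_t = i eps u_xx - (i/eps) V(x) u on [-1, 1),
    periodic: half potential kick, exact kinetic step, half potential kick."""
    N = len(u)
    k = 2 * np.pi * np.fft.fftfreq(N, d=2.0/N)     # wavenumbers for period 2
    u = np.exp(-0.5j * h * V / eps) * u
    u = np.fft.ifft(np.exp(-1j * h * eps * k**2) * np.fft.fft(u))
    return np.exp(-0.5j * h * V / eps) * u

N, eps, h = 1024, 1e-2, 1e-4
x = -1 + 2.0 * np.arange(N) / N
V = 1 - np.cos(np.pi * x)
u = np.exp(-50 * (x + 0.3)**2 + 1j * x / eps)      # oscillatory wave packet
u /= np.sqrt(2.0/N * np.sum(np.abs(u)**2))
for _ in range(100):
    u = strang_schrodinger(u, V, h, eps)
print(2.0/N * np.sum(np.abs(u)**2))                # unitarity: prints 1.0
```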

Bader et al. [1] have recently proposed an alternative approach to the splitting of (3.15), of the form

$$\begin{aligned} {\mathrm e}^{{\mathrm i}h (\varepsilon \partial _x^2-\varepsilon ^{-1} V)}\approx {\mathrm e}^{\mathcal {R}_0}{\mathrm e}^{\mathcal {R}_1} \cdots {\mathrm e}^{\mathcal {R}_s}{\mathrm e}^{\mathcal {T}_{s+1}}{\mathrm e}^{\mathcal {R}_s} \cdots {\mathrm e}^{\mathcal {R}_1} {\mathrm e}^{\mathcal {R}_0} \end{aligned}$$
(3.16)

such that \(\mathcal {R}_k=\mathcal{O}\left( \varepsilon ^{\alpha _k}\right) \), \(\mathcal {T}_{s+1}=\mathcal{O}\left( \varepsilon ^{\alpha _{s+1}}\right) \), where \(\alpha _0\le \alpha _1<\alpha _2<\alpha _3<\cdots \)—the symmetric Zassenhaus splitting. Here \(\partial _x=\partial / \partial x\).

The splitting (3.16) is derived at the level of differential operators (i.e., prior to space discretisation), applying the symmetric Baker–Campbell–Hausdorff formula to elements in the free Lie algebra spanned by \(\partial _x^2\) and V. For \(\sigma =1\), for example, this yields

$$\begin{aligned} \mathcal {R}_0= & {} -{\textstyle \frac{1}{2}}\tau \varepsilon ^{-1}V=\mathcal{O}\left( 1\right) ,\\ \mathcal {R}_1= & {} {\textstyle \frac{1}{2}}\tau \varepsilon \partial _x^2=\mathcal{O}\left( 1\right) ,\\ \mathcal {R}_2= & {} {\textstyle \frac{1}{24}}\tau ^3\varepsilon ^{-1}(\partial _xV)^2+{\textstyle \frac{1}{12}} \tau ^3\varepsilon \{(\partial _x^2V)\partial _x^2+\partial _x^2[(\partial _x^2V)\,\cdot \,]\}=\mathcal{O}\left( \varepsilon ^2\right) ,\\ \mathcal {R}_3= & {} -{\textstyle \frac{1}{120}}\tau ^5\varepsilon ^{-1}(\partial _x^2 V)(\partial _xV)^2 -{\textstyle \frac{1}{24}}\tau ^3\varepsilon (\partial _x^4V) +{\textstyle \frac{1}{240}}\tau ^5\varepsilon \left( 7\{(\partial _x^2V)^2\partial _x^2 \right. \\&+\partial _x^2[(\partial _x^2V)^2\,\cdot \,]\} +\{(\partial _x^3V)(\partial _xV)\partial _x^2\left. +\partial _x^2[(\partial _x^3V)(\partial _xV)\,\cdot \,]\}\right) \\&+{\textstyle \frac{1}{120}}\tau ^5\varepsilon ^{3} \{(\partial _x^4V)\partial _x^4+\partial _x^4[(\partial _x^4V)\,\cdot \,]\}=\mathcal{O}\left( \varepsilon ^4\right) , \end{aligned}$$

where \(\tau ={\mathrm i}h\). Note that all the commutators, ubiquitous in the BCH formula, have disappeared: in general, the commutators in this free Lie algebra can be replaced by linear combinations of derivatives, with the remarkable property of height reduction: each commutator ‘kills’ one derivative, e.g.

$$\begin{aligned}{}[V,\partial _x^2]=-(\partial ^2_x V)-2(\partial _xV)\partial _x,\qquad [[V,\partial _x^2],\partial _x^2]=(\partial _x^4V)+4(\partial _x^3V)\partial _x+4(\partial _x^2V)\partial _x^2. \end{aligned}$$

Once we discretise with spectral collocation, \(\mathcal {R}_0\) becomes a diagonal matrix and its exponential is trivial, while \({\mathrm e}^{\mathcal {R}_1}\varvec{v}\) can be computed in two FFTs for any vector \(\varvec{v}\) because \(\mathcal {R}_1\) is a Toeplitz circulant matrix. Neither \(\mathcal {R}_2\) nor \(\mathcal {R}_3\) possess useful structure, except that they are small! Therefore we can approximate \({\mathrm e}^{\mathcal {R}_k}\varvec{v}\) using the Krylov–Arnoldi process in just 3 and 2 iterations for \(k=2\) and \(k=3\), respectively, to attain an error of \(\mathcal{O}\left( \varepsilon ^6\right) \) [1].

All this has been generalised to time-dependent potentials and is applicable to a wider range of quantum mechanics equations in the semiclassical regime [2].

4 Beyond GNI

Ideas in one area of mathematical endeavour often inspire work in another area. This is true not just because new mathematical research equips us with a range of innovative tools but because it provides insight that casts new light not just on the subject in question but elsewhere in the mathematical universe. GNI has thus contributed not just toward its own goal, better understanding of structure-preserving discretisation methods for differential equations, but in other, often unexpected, directions.

4.1 GNI Meets Abstract Algebra

The traditional treatment of discretisation methods for differential equations was wholly analytic, using tools of functional analysis and approximation theory. (Lately, also tools from algebraic topology.) GNI has added an emphasis on geometry and this leads in a natural manner into concepts and tools from abstract algebra. As often in such mathematical dialogues, while GNI borrowed much of its conceptual background from abstract algebra, it has also contributed to the latter, not just with new applications but also new ideas.

  • B-series and beyond. Consider numerical integration methods that associate to each vector field \(\varvec{f}\) a map \(\varvec{\psi }_h(\varvec{f})\). A method \(\varvec{\psi }_h\) is called g-covariantFootnote 6 if discretisation commutes with the change of variables g, i.e. if \(\varvec{\psi }_h(g_*\varvec{f})=g\circ \varvec{\psi }_h(\varvec{f})\circ g^{-1}\), where \(g_*\) denotes the push-forward of the vector field.

    It follows that if g is a symmetry of the vector field f and \(\psi \) is g-covariant, then \(\psi \) preserves the symmetry g. It seems that this concept of covariance for integration methods was first introduced in [55, 60].

    It is not hard to check that all B-series methods are covariant with respect to the group of affine transformations. A natural question to ask, then, was “are B-series methods the only numerical integration methods that preserve the affine group?”. This question was open for many years, until it was answered in the negative by [66], who introduced a more general class of integration methods dubbed “aromatic Butcher series” and showed that (under mild assumptions) this is the most general class of methods preserving affine covariance. Expansions of methods in this new class contain rooted trees (as in B-series), as well as products of rooted trees and so-called k-loops [43].

    Whereas it may be said that, to date, the importance of aromatic B-series has been at the formal rather than the constructive level, these methods may hold the promise of the construction of affine-covariant volume-preserving integrators, cf. also [58].

  • Word expansions. Classical B-series can be significantly generalised by expanding in word series [69]. This introduced an overarching framework for Taylor expansions, Fourier expansions, modulated Fourier expansions and splitting methods. We consider an ODE of the form

    $$\begin{aligned} \dot{\varvec{x}}=\sum _{a\in \mathcal {A}} \lambda _a(t) \varvec{f}_a(\varvec{x}),\qquad \varvec{x}(0)=\varvec{x}_0, \end{aligned}$$
    (4.1)

    where \(\mathcal {A}\) is a given alphabet. The solution of (4.1) can be formally expanded in the form

    $$\begin{aligned} \varvec{x}(t)=\sum _{n=0}^\infty \sum _{\varvec{w}\in \mathcal {W}_n} \alpha _{\varvec{w}}(t) f_{\varvec{w}}(\varvec{x}_0), \end{aligned}$$

    where \(\mathcal {W}_n\) is the set of all words with n letters from \(\mathcal {A}\). The coefficients \(\alpha _{\varvec{w}}\) and functions \(\varvec{f}_{\varvec{w}}\) can be obtained recursively from the \(\lambda _a\)s and \(\varvec{f}_a\)s in a manner similar to B-series. Needless to say, exactly like with B-series, word series can be interpreted using an algebra over rooted trees.

    The concept of word series is fairly new in numerical mathematics but it exhibits an early promise to provide a powerful algebraic tool for the analysis of dynamical systems and their discretisation.

  • Extension of Magnus expansions. Let \(\mathcal {W}\) be a Rota–Baxter algebra, a commutative unital algebra equipped with a linear map R such that

    $$\begin{aligned} R(x)R(y)=R(R(x)y+xR(y)+\theta xy),\qquad x,y\in \mathcal {W}, \end{aligned}$$

    where \(\theta \) is a parameter. The inverse \(\partial \) of R obeys

    $$\begin{aligned} \partial (xy)=\partial (x)y+x\partial (y)+\theta \partial (x)\partial (y) \end{aligned}$$

    and is hence a generalisation of a derivation operator: a neat example, with clear numerical implications, is the backward difference \(\partial (x)=[x(t)-x(t-\theta )]/\theta \). The authors of [18] generalised Magnus expansions to this and similar settings, e.g. dendriform algebras. Their work uses the approach in [40], representing individual ‘Magnus terms’ as rooted trees, but generalises it a great deal.

  • The algebra of the Zassenhaus splitting. The success of the Zassenhaus splitting (3.16) rests upon two features. Firstly, the replacement of commutators by simpler, more tractable expressions and, secondly, height reduction of derivatives under commutation. Singh [81] has derived an algebraic structure \(\mathfrak {J}\) which, encoding these two features, allows for a far-reaching generalisation of the Zassenhaus framework. The elements of \(\mathfrak {J}\) are operators of the form \(\langle f\rangle _k =f\circ \,\partial _x^k+\partial _x^k\circ f\), where \(k\in \mathbb {Z}_+\) and f resides in a suitable function space. \(\mathfrak {J}\) can be endowed with a Lie-algebraic structure and, while bearing similarities with the Weyl algebra and the Heisenberg group, is a new and intriguing algebraic concept.

4.2 Highly Oscillatory Quadrature

Magnus expansions (2.9) are particularly effective when the matrix A(t) oscillates rapidly. This might seem paradoxical—we are all conditioned to expect high oscillation to be ‘difficult’—but actually makes a great deal of sense. Standard numerical methods are based on Taylor expansions, hence on differentiation, and their error typically scales as a high derivative of the solution. Once a function oscillates rapidly, differentiation roughly corresponds to multiplying the amplitude by the frequency: high derivatives become large and so does the error. However, the Magnus expansion does not differentiate, it integrates! This has the opposite effect: the more we integrate, the smaller the amplitude, and the series (2.9) converges more rapidly. Indeed, it often pays to render a linear system highly oscillatory by a change of variables, in a manner described in [39], and then solve it considerably faster and more cheaply. Yet, once we contemplate the discretisation of (2.9) for a highly oscillatory matrix function A(t), we soon come up against another problem, usually considered difficult, if not insurmountable: computing multivariate integrals of highly oscillatory functions.

In a long list of methods for highly oscillatory quadrature (HOQ) circa 2002, ranging from the useless to the dubious, one method stood out: [50] proposed to calculate univariate integrals by converting the problem to an ODE and using collocation. This was the only effective method around, yet it was incompletely understood.

The demands of GNI gave the initial spur to the emergence, in the last ten years, of a broad swathe of new methods for HOQ: Filon-type methods, which replace the non-oscillatory portion of the integrand by an interpolating polynomial [41], improved Levin-type methods [71] and the method of numerical stationary phase of [36]. The common characteristic of all these methods is that they are based on asymptotic expansions. This means that high oscillation is no longer the enemy—indeed, the faster the oscillation, the smaller the error!
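
To convey the flavour, a sketch of the simplest Filon-type rule on \([-1,1]\): the non-oscillatory factor f is replaced by its linear interpolant at the endpoints and the oscillatory moments are integrated exactly; the test integrand is an illustrative choice with a closed-form integral.

```python
import numpy as np

def filon_linear(f, omega):
    """Filon-type rule for int_{-1}^{1} f(x) exp(i omega x) dx: interpolate
    f linearly at the endpoints and integrate the moments exactly; the
    error decays like O(omega^{-2}) as omega grows."""
    c0 = (f(1.0) + f(-1.0)) / 2        # interpolant c0 + c1 x
    c1 = (f(1.0) - f(-1.0)) / 2
    m0 = 2 * np.sin(omega) / omega                                # int e^{iwx}
    m1 = 2j * (np.sin(omega) / omega**2 - np.cos(omega) / omega)  # int x e^{iwx}
    return c0 * m0 + c1 * m1

# f(x) = e^x, whose oscillatory integral is known in closed form.
for omega in [10.0, 100.0, 1000.0]:
    exact = (np.exp(1 + 1j*omega) - np.exp(-1 - 1j*omega)) / (1 + 1j*omega)
    print(omega, abs(filon_linear(np.exp, omega) - exact))   # error shrinks
```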

Highly oscillatory integrals occur in numerous applications, from electromagnetic and acoustic scattering to fluid dynamics, quantum mechanics and beyond. Their role in GNI is minor. However, their modern numerical theory was originally motivated by a problem in GNI [16]. This is typical of how scholarship progresses and it is only natural that HOQ has severed its GNI moorings and become an independent area of its own.

4.3 Structured Linear Algebra

GNI computations often lead to specialised problems in numerical linear algebra. However, structure preservation has wider impact in linear algebraic computations. Often a matrix in an algebraic problem belongs to an algebraic structure, e.g. a specific Lie algebra or a symmetric space, and it is important to retain this in computation—the sobriquet “Geometric Numerical Algebra” might be appropriate! Moreover, as in GNI so in GNA, respecting structure often leads to better, more accurate and cheaper numerical methods. Finally, structured algebraic computation is often critical to GNI computations.

  • Matrix factorisation is the lifeblood of numerical algebra, the basis of the most effective algorithms for the solution of linear systems, computation of eigenvalues and solution of least-squares problems. A major question in GNA is “Suppose that \(A\in \mathcal {A}\), where \(\mathcal {A}\) is a set of matrices of given structure. Given a factorisation \(A=BC\) according to some set of rules, what can we say about the structure of B or C?”. [52] addressed three such ‘factorisation rules’: the matrix square root, \(B=C\), the matrix sign, where the eigenvalues of B are \(\pm 1\), and the polar decomposition, with unitary B and positive semidefinite C. They focussed on sets \(\mathcal {A}\) generated by a sesquilinear form \(\langle \,\cdot \,,\,\cdot \,\rangle \). Such sets conveniently fit into three classes:

    (a) Automorphisms G, such that \(\langle G\varvec{x},G\varvec{y}\rangle =\langle \varvec{x},\varvec{y}\rangle \), generate a Lie group;

    (b) Self-adjoint matrices S, such that \(\langle S\varvec{x},\varvec{y}\rangle =\langle \varvec{x},S\varvec{y}\rangle \), generate a Jordan algebra; and

    (c) Skew-adjoint matrices H, such that \(\langle H\varvec{x},\varvec{y}\rangle =-\langle \varvec{x},H\varvec{y}\rangle \), generate a Lie algebra.

    It is natural to expect that conservation of structure under factorization would depend on the nature of the underlying inner product. The surprising outcome of [52] is that, for current purposes, it is sufficient to split sesquilinear forms into just two classes, unitary and orthosymmetric, each exhibiting similar behaviour.

  • Many algebraic eigenvalue problems are structured, the simplest example being that the eigenvalues of a symmetric matrix are real and those of a skew-symmetric matrix are purely imaginary: all standard methods for the computation of eigenvalues respect this. However, many other problems might have a more elaborate structure, and this is the case in particular for nonlinear eigenvalue problems. An important example, with significant applications in mechanics, is

    $$\begin{aligned} (\lambda ^2 M+\lambda G+K)\varvec{x}=\varvec{0}, \end{aligned}$$
    (4.2)

    where both M and K are symmetric, while G is skew-symmetric. The eigenvalues \(\lambda \) of (4.2) exhibit a Hamiltonian pattern: if \(\lambda \) is in the spectrum then so are \(-\lambda ,\bar{\lambda }\) and \(-\bar{\lambda }\).Footnote 7 As often in numerical algebra, (4.2) is particularly relevant when the underlying matrices are large and sparse.

    Numerical experiments demonstrate that standard methods for the computation of a quadratic eigenvalue problem may fail to retain the Hamiltonian structure of the spectrum; it can, however, be retained by bespoke algorithms, using a symplectic version of the familiar Lanczos algorithm, cf. [3]. (A structure-oblivious computation illustrating the spectral symmetry is sketched at the end of this section.)

    This is just one example of the growing field of structured eigenvalue and inverse eigenvalue problems.

  • The exponential from an algebra to a group: Recall Lie-group methods from Sect. 2.2: a critical step, e.g. in the RKMK methods, is the exponential map from a Lie algebra to a Lie group. Numerical analysis knows numerous effective ways to approximate the matrix exponential [61], yet most of them fail to map a matrix from a Lie algebra to a Lie group! There is little point in expending intellectual and computational effort to preserve structure, only to abandon it in the ultimate step, and this explains the interest in computations of the matrix exponential that are assured to map a matrix in a Lie algebra to an element of the corresponding Lie group. While early methods used structure constants and, for maximal sparsity, Lie-algebraic bases given by root-space decomposition [8], the latest generation of algorithms is based upon generalised polar decomposition [67].
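
As promised above, a minimal sketch illustrating the Hamiltonian spectral symmetry of the quadratic eigenvalue problem (4.2), using a structure-oblivious companion linearisation; the random matrices are illustrative, and bespoke structure-preserving algorithms are the subject of [3].

```python
import numpy as np
from scipy.linalg import eig

def qep_eigenvalues(M, G, K):
    """Eigenvalues of (lambda^2 M + lambda G + K) x = 0 via the companion
    linearisation A z = lambda B z (no structure preservation attempted)."""
    n = len(M)
    Z, I = np.zeros((n, n)), np.eye(n)
    A = np.block([[Z, I], [-K, -G]])
    B = np.block([[I, Z], [Z, M]])
    return eig(A, B)[0]

rng = np.random.default_rng(1)
X = rng.standard_normal((3, 3)); M = X @ X.T + 3 * np.eye(3)   # symmetric
Y = rng.standard_normal((3, 3)); K = Y + Y.T                   # symmetric
W = rng.standard_normal((3, 3)); G = W - W.T                   # skew-symmetric
lam = qep_eigenvalues(M, G, K)
print(np.sort_complex(lam))   # quadruples {l, -l, conj(l), -conj(l)}
```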