This chapter focuses on discrete gradient integrators designed to preserve the first integral or the Lyapunov function of the original continuous system. Combining discrete gradients with exponential integrators, we discuss a novel exponential integrator for the conservative or dissipative system \(\dot{y}=Q(My+\nabla U(y))\), where Q is a \(d\times d\) real matrix, M is a \(d\times d\) symmetric real matrix and \(U : \mathbb {R}^{d}\rightarrow \mathbb {R}\) is a differentiable function. For conservative systems the exponential integrator preserves the energy, while for dissipative systems it preserves the decaying property of the Lyapunov function. Two important properties of the new scheme are presented. Numerical experiments demonstrate the remarkable superiority of the new scheme in comparison with other structure-preserving schemes in the recent literature.

2.1 Introduction

In this chapter we are interested in the numerical solution of the IVP

$$\begin{aligned} \begin{aligned} \dot{y}(t)=Q(My(t)+\nabla U(y(t))),\quad {y(t_{0})=y^{0},} \end{aligned} \end{aligned}$$
(2.1)

where \(\dot{y}\) denotes the derivative of y with respect to time, Q is a \(d\times d\) real matrix, M is a \(d\times d\) symmetric real matrix and \(U : \mathbb {R}^{d}\rightarrow \mathbb {R}\) is a differentiable function. Since M is symmetric, \(My(t)+\nabla U(y(t))\) is the gradient of the function

$$H(y(t))=\frac{1}{2}y(t)^{\intercal }My(t)+U(y(t)).$$

In physical applications, the quantity H is often referred to as “energy”. Two special categories are important in applications:

(i) If Q is skew-symmetric, then (2.1) is a conservative system with the first integral H, i.e. H(y(t)) is constant.

(ii) If Q is negative semi-definite (denoted by \(Q\le 0\)), then (2.1) is a dissipative system with the Lyapunov function H, i.e. H(y(t)) is monotonically decreasing along the solution y(t).

A particularly important special case of (2.1) arises when Q is the identity matrix. The system then becomes

$$\begin{aligned} \begin{aligned} \dot{y}(t)&=My(t)+\nabla U(y(t)),\quad {y(t_{0})=y^{0}.} \end{aligned} \end{aligned}$$
(2.2)

An algorithm for (2.2) is an exponential integrator if it involves the computation of matrix exponentials (or related matrix functions) and exactly integrates the following system

$$\begin{aligned} \dot{y}(t)-My(t)=0. \end{aligned}$$

In general, exponential integrators permit larger stepsizes and achieve higher accuracy than non-exponential ones when (2.2) is a very stiff differential equation such as a highly oscillatory ODE or a semi-discrete time-dependent PDE. Therefore, numerous exponential algorithms have been proposed for first-order (see, e.g. [1, 10, 20, 22,23,24,25,26, 31]) and second-order (see e.g. [11, 12, 14, 18, 34]) ODEs.

On the other hand, (2.2) often possesses many important geometrical/physical structures. For example, the canonical Hamiltonian system

$$\begin{aligned} \begin{aligned} \dot{y}(t)&=J^{-1}\nabla H(y(t)),\quad {y(t_{0})=y^{0},} \end{aligned} \end{aligned}$$
(2.3)

is a special case of (2.2), with

$$\begin{aligned} J=\left( \begin{array}{cc}O_{d\times d}&{}I_{d\times d}\\ -I_{d\times d}&{}O_{d\times d}\end{array}\right) . \end{aligned}$$

The flow of (2.3) preserves the symplectic 2-form \(\mathrm{d}y\wedge J\mathrm{d}y\) and the function H(y). In the sense of geometric integration, it is natural to design numerical schemes that preserve these two structures. As far as we know, most research on exponential integrators up to now has focused on the development of high-order explicit schemes that are not structure-preserving, apart from the symmetric/symplectic/energy-preserving methods for first-order ODEs in [5, 7] and for oscillatory second-order ODEs (see, e.g. [18, 32, 33]).

It should be noted that the choice for M in (2.1) or in (2.2) is not unique. In order to take advantage of exponential integrators, the matrix M in (2.1) should be chosen such that \(||QM||\gg ||Q Hess(U)||\), where Hess(U) is the Hessian matrix of U. For example, highly oscillatory Hamiltonian systems can be characterized by a dominant linear part My, where M implicitly contains the large frequency component. Up to now, many energy-preserving or energy-decaying methods have been proposed in the case of \(M=0\) (see, e.g. [3, 4, 15, 17, 19, 29]). However, these general-purpose methods are not suitable for dealing with (2.1) when ||QM|| is very large. On the one hand, numerical solutions generated by these methods are far from accurate. On the other hand, they are generally implicit, and iterative solutions are required at each step. But the fixed-point iterations for them are not convergent unless the stepsize is taken very small. As mentioned at the beginning, these two obstacles can hopefully be overcome by introducing exponential integrators. In [32], the authors proposed an energy-preserving AAVF integrator (a trigonometric method) for solving the second-order Hamiltonian system

$$\begin{aligned} \left\{ \begin{aligned}&\ddot{q}(t)+\tilde{M}q(t)=\nabla \tilde{U}(q(t)),\quad \tilde{M} \text { is a symmetric matrix},\\&q(t_{0})=q_{0},\quad \dot{q}(t_{0})=\dot{q}_{0},\\ \end{aligned}\right. \end{aligned}$$

which falls into the class of (2.1) by introducing

$$y=(\dot{q}^{\intercal },q^{\intercal })^{\intercal },\quad Q=J^{-1},\quad M=\left( \begin{array}{cc}I_{d\times d} &{}0_{d\times d}\\ 0_{d\times d}&{}\tilde{M}\end{array}\right) ,\quad U(y)=-\tilde{U}(q).$$

In this chapter, we present and analyse a new exponential integrator for (2.1) which can preserve the first integral or the Lyapunov function.

This chapter is organized as follows. Section 2.2 presents the discrete gradient integrators. In Sect. 2.3, we construct a general structure-preserving scheme for (2.1)–an exponential discrete gradient integrator. Two important properties of the scheme are proven. Symmetry and convergence of the EAVF integrator are investigated in Sect. 2.4. We then present a list of problems which can be solved by this scheme in Sect. 2.5. Numerical results, including the comparison between our new scheme and other structure-preserving schemes in the literature, are shown in Sect. 2.6. Section 2.7 is devoted to concluding remarks.

2.2 Discrete Gradient Integrators

Let r(z) be a holomorphic function in the neighborhood of zero (\(r(0):=\lim \limits _{z\rightarrow 0} r(z)\) if 0 is a removable singularity)

$$\begin{aligned} r(z)=\sum _{i=0}^{\infty }\frac{r^{(i)}(0)}{i!}z^{i}. \end{aligned}$$
(2.4)

The series (2.4) is assumed to be absolutely convergent. For a matrix A, the matrix-valued function r(A) is defined by

$$\begin{aligned} r(A)=\sum _{i=0}^{\infty }\frac{r^{(i)}(0)}{i!}A^{i}. \end{aligned}$$

I and O always denote identity and zero matrices of appropriate dimensions respectively. \(A^{\frac{1}{2}}\) is a square root (not necessarily principal) of a symmetric matrix A. If \(r^{(i)}(0)=0\) for all odd i, then \(r(A^{\frac{1}{2}})\) is well defined for every symmetric A (independent of the choice of \(A^{\frac{1}{2}}\)). For functions of matrices, the reader is referred to [21].

The discrete gradient method is an effective approach to constructing energy-preserving integrators. A discrete gradient (DG) of a differentiable function g is a bi-variate mapping \({\nabla }^\mathrm{D}g:\mathbb R^{d}\times \mathbb {R}^{d}\rightarrow \mathbb R^{d}\) satisfying

$$\begin{aligned} \left\{ \begin{aligned}&{\nabla }^\mathrm{D}g(y,\hat{y})^{\intercal }(y-\hat{y})=g(y)-g(\hat{y}),\\&{\nabla }^\mathrm{D}g(y,y)=\nabla g(y). \end{aligned}\right. \end{aligned}$$
(2.5)

Accordingly, a DG integrator for the system (2.3) is defined by

$$\begin{aligned} y^{1}=y^{0}+hJ^{-1}{\nabla }^\mathrm{D}H(y^{1},y^{0}). \end{aligned}$$
(2.6)

Multiplying both sides of (2.6) by \({\nabla }^\mathrm{D}H(y^{1},y^{0})^{\intercal }\), using the skew-symmetry of \(J^{-1}\) and the first identity of (2.5), we obtain \(H(y^{1})=H(y^{0})\), i.e., the scheme (2.6) is energy-preserving. For more details on the DG method, readers are referred to [15, 30]. A typical discrete gradient is the average-vector-field (AVF) discrete gradient, which is defined by

$$\begin{aligned} {\nabla }^\mathrm{D}g(y,\hat{y})=\int _{0}^{1}{\nabla }g((1-\tau )\hat{y}+\tau {y})\mathrm{d}\tau . \end{aligned}$$
(2.7)

Then the AVF integrator for the system (2.3) is given by

$$\begin{aligned} y^{1}=y^{0}+hJ^{-1}\int _{0}^{1}{\nabla }H((1-\tau )y^{0}+\tau y^{1})\mathrm{d}\tau . \end{aligned}$$
(2.8)
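To make the construction concrete, the following is a minimal Python sketch of the AVF integrator (2.8), applied to the pendulum Hamiltonian \(H(p,q)=\frac{1}{2}p^{2}-\cos q\) with \(y=(p,q)^{\intercal }\); the test problem, the fixed-point iteration and its tolerance are illustrative choices, not part of the original text. For this H the AVF integral (2.7) can be evaluated in closed form, so the energy is preserved up to the iteration tolerance and round-off.

```python
import numpy as np

# Illustrative example: pendulum Hamiltonian H(p, q) = p**2/2 - cos(q), with y = (p, q)
def H(y):
    p, q = y
    return 0.5 * p**2 - np.cos(q)

def avf_gradient(y1, y0):
    """AVF discrete gradient (2.7) of H, evaluated exactly for this H."""
    p1, q1 = y1
    p0, q0 = y0
    gp = 0.5 * (p0 + p1)                                  # integral of the p-component
    if abs(q1 - q0) > 1e-12:
        gq = (np.cos(q0) - np.cos(q1)) / (q1 - q0)        # integral of sin along the segment
    else:
        gq = np.sin(0.5 * (q0 + q1))                      # limit as q1 -> q0
    return np.array([gp, gq])

Jinv = np.array([[0.0, -1.0],
                 [1.0,  0.0]])                            # J^{-1} for the canonical system (2.3)

def avf_step(y0, h, tol=1e-14, max_iter=100):
    """One step of the AVF integrator (2.8), solved by fixed-point iteration."""
    y1 = y0.copy()
    for _ in range(max_iter):
        y1_new = y0 + h * (Jinv @ avf_gradient(y1, y0))
        if np.linalg.norm(y1_new - y1) < tol:
            break
        y1 = y1_new
    return y1_new

y = np.array([0.0, 2.0])
e0 = H(y)
for _ in range(1000):
    y = avf_step(y, h=0.1)
print(H(y) - e0)        # energy error remains at round-off level
```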

2.3 Exponential Discrete Gradient Integrators

We next derive the exponential discrete gradient method for the problem (2.1). The starting point is the following variation-of-constants formula for the problem (2.1):

$$\begin{aligned} y(t_{0}+h)=\exp (hQM)y(t_{0})+h\int _{0}^{1}\exp ((1-\xi )hQM)Q\nabla U(y(t_{0}+\xi h))\mathrm{d}\xi . \end{aligned}$$
(2.9)

Approximating \(\nabla U(y(t_{0}+\xi h))\) in (2.9) by \({\nabla }^\mathrm{D}U(y^{1},y^{0})\), we obtain the following exponential discrete gradient (EDG) integrator:

$$\begin{aligned} y^{1}=\exp (V)y^{0}+h\varphi (V)Q{\nabla }^\mathrm{D}U(y^{1},y^{0}), \end{aligned}$$
(2.10)

where \(V=hQM\),

$$\varphi (V)=(\exp (V)-I)V^{-1},$$

and \(y^{1}\) is an approximation of \(y(t_{0}+h)\).
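In an implementation, \(\varphi (V)\) can be evaluated without inverting V (which may be singular): the upper-right block of the exponential of the augmented matrix \(\left( \begin{array}{cc}V&{}I\\ O&{}O\end{array}\right) \) equals \(\varphi (V)\). A short Python sketch (an illustration added here, not prescribed by the chapter) is:

```python
import numpy as np
from scipy.linalg import expm

def phi(V):
    """phi(V) = (exp(V) - I) V^{-1}, evaluated via an augmented matrix,
    so that it is well defined even when V is singular."""
    d = V.shape[0]
    W = np.zeros((2 * d, 2 * d))
    W[:d, :d] = V
    W[:d, d:] = np.eye(d)
    return expm(W)[:d, d:]

# consistency check on a (generically nonsingular) random V
rng = np.random.default_rng(0)
V = rng.standard_normal((5, 5))
print(np.linalg.norm(phi(V) - (expm(V) - np.eye(5)) @ np.linalg.inv(V)))  # ~ 1e-15
```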

Due to the energy-preserving property of the DG method, we expect (2.10) to preserve the first integral when Q is skew-symmetric. For brevity, we sometimes abbreviate \({\nabla }^\mathrm{D}U(y^{1},y^{0})\) as \({\nabla }^\mathrm{D}U\). To begin with, we give the following preliminary lemma.

Lemma 2.1

For any real symmetric matrix M and scalar \(h>0\), the matrix

$$B=\exp (hQM)^{\intercal }M\exp (hQM)-M$$

satisfies:

$$\begin{aligned} B\left\{ \begin{aligned}&=0 , \quad \text {if }Q \,\text {is skew-symmetric,}\\&\le 0, \quad \text {if }Q \le 0. \end{aligned}\right. \end{aligned}$$

Proof

Consider the linear ODE:

$$\begin{aligned} \dot{y}(t)=QMy(t). \end{aligned}$$
(2.11)

When Q is skew symmetric, (2.11) is a conservative equation with the first integral \(\frac{1}{2}y^{\intercal }My\), and its exact solution starting from the initial value \(y(0)=y^{0}\) is \(y(t)=\exp (tQM)y^{0}\). It then follows immediately from

$$\frac{1}{2}y(h)^{\intercal }My(h)=\frac{1}{2}y^{0\intercal }My^{0}$$

that

$$\begin{aligned} \ \frac{1}{2}y^{0\intercal }\exp (hQM)^{\intercal }M\exp (hQM)y^{0}=\frac{1}{2}y^{0\intercal }My^{0} \end{aligned}$$

for any vector \(y^{0}\). Hence the symmetric matrix

$$B=\exp (hQM)^{\intercal }M\exp (hQM)-M$$

satisfies \(y^{0\intercal }By^{0}=0\) for every \(y^{0}\), and therefore \(B=0\).

When \(Q\le 0\), \(\frac{1}{2}y^{\intercal }My\) is a Lyapunov function of (2.11), since \(\frac{\mathrm{d}}{\mathrm{d}t}\,\frac{1}{2}y^{\intercal }My=(My)^{\intercal }Q(My)\le 0\); hence \(y^{0\intercal }By^{0}\le 0\) for every \(y^{0}\), and the symmetric matrix B is negative semi-definite.    \(\square \)
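The lemma is easy to check numerically; the following short Python sketch (with randomly generated matrices, added purely as an illustration) confirms both cases.

```python
import numpy as np
from scipy.linalg import expm

rng = np.random.default_rng(1)
d, h = 6, 0.3
M = rng.standard_normal((d, d)); M = M + M.T                # symmetric M

# Case 1: Q skew-symmetric  ->  B = 0
Q = rng.standard_normal((d, d)); Q = Q - Q.T
E = expm(h * Q @ M)
print(np.linalg.norm(E.T @ M @ E - M))                      # ~ round-off

# Case 2: Q negative semi-definite (not necessarily symmetric)  ->  B <= 0
A = rng.standard_normal((d, d)); S = rng.standard_normal((d, d))
Q = -(A @ A.T) + (S - S.T)                                  # x^T Q x = -|A^T x|^2 <= 0
E = expm(h * Q @ M)
B = E.T @ M @ E - M                                         # symmetric
print(np.linalg.eigvalsh(B).max())                          # <= 0 up to round-off
```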

Theorem 2.1

If Q is skew-symmetric, then the integrator (2.10) preserves the first integral H in (2.1):

$$\begin{aligned} H(y^{1})=H(y^{0}), \end{aligned}$$

where \(H(y)=\frac{1}{2}y^{\intercal }My+U(y)\).

Proof

We first assume that the matrix M is nonsingular and calculate \(\frac{1}{2}y^{1\intercal }My^{1}\). Denote \(\widetilde{\nabla }U=M^{-1}{\nabla }^\mathrm{D}U\). Substituting \(y^{1}=\exp (V)y^{0}+h\varphi (V)Q{\nabla }^\mathrm{D}U(y^{1},y^{0})\) leads to

$$\begin{aligned} \begin{aligned}&\frac{1}{2}y^{1\intercal }My^{1}\\&=\frac{1}{2}(y^{0\intercal }\exp (V)^{\intercal }+h{\nabla }^\mathrm{D}U^{\intercal }Q^{\intercal }\varphi (V)^{\intercal })M(\exp (V)y^{0}+h\varphi (V)Q{\nabla }^\mathrm{D}U)\\&=\frac{1}{2}y^{0\intercal }\exp (V)^{\intercal }M\exp (V)y^{0}+hy^{0\intercal }\exp (V)^{\intercal }M\varphi (V)Q{\nabla }^\mathrm{D}U\\&\quad +\frac{h^{2}}{2}{\nabla }^\mathrm{D}U^{\intercal }Q^{\intercal }\varphi (V)^{\intercal }M\varphi (V)Q{\nabla }^\mathrm{D}U\\&=\frac{1}{2}y^{0\intercal }\exp (V)^{\intercal }M\exp (V)y^{0}+y^{0\intercal }\exp (V)^{\intercal }M\varphi (V)V\widetilde{\nabla }U\\&\quad + \frac{1}{2}\widetilde{\nabla }U^{\intercal }V^{\intercal }\varphi (V)^{\intercal }M\varphi (V)V\widetilde{\nabla }U\quad (\text {using }V=hQM)\\&=\frac{1}{2}y^{0\intercal }\exp (V)^{\intercal }M\exp (V)y^{0}+y^{0\intercal }\exp (V)^{\intercal }M(\exp (V)-I)\widetilde{\nabla }U\\&\quad +\frac{1}{2}\widetilde{\nabla }U^{\intercal }(\exp (V)^{\intercal }-I)M(\exp (V)-I)\widetilde{\nabla }U\quad (\text {using }\varphi (V)V=\exp (V)-I)\\&=\frac{1}{2}y^{0\intercal }\exp (V)^{\intercal }M\exp (V)y^{0}+y^{0\intercal }(\exp (V)^{\intercal }M\exp (V)-\exp (V)^{\intercal }M)\widetilde{\nabla }U\\&\quad +\frac{1}{2}\widetilde{\nabla }U^{\intercal }(\exp (V)^{\intercal }M\exp (V)-\exp (V)^{\intercal }M-M\exp (V)+M)\widetilde{\nabla }U.\\ \end{aligned} \end{aligned}$$
(2.12)

On the other hand, it follows from the property of the discrete gradient (2.5) that

$$\begin{aligned} \begin{aligned}&U(y^{1})-U(y^{0})\\&=(y^{1\intercal }-y^{0\intercal }){\nabla }^\mathrm{D}U(y^{1},y^{0})\\&=y^{0\intercal }(\exp (V)^{\intercal }-I){\nabla }^\mathrm{D}U +h{\nabla }^\mathrm{D}U^{\intercal }Q^{\intercal }\varphi (V)^{\intercal }{\nabla }^\mathrm{D}U\\&=y^{0\intercal }(\exp (V)^{\intercal }M-M)\widetilde{\nabla }U +\widetilde{\nabla }U^{\intercal }V^{\intercal }\varphi (V)^{\intercal }M\widetilde{\nabla }U\\&=y^{0\intercal }(\exp (V)^{\intercal }M-M)\widetilde{\nabla }U +\widetilde{\nabla }U^{\intercal }(\exp (V)^{\intercal }M-M)\widetilde{\nabla }U.\\ \end{aligned} \end{aligned}$$
(2.13)

Combining (2.12), (2.13) and collecting terms by types ‘\(y^{0\intercal }*y^{0}\)’, ‘\(y^{0\intercal }*\widetilde{\nabla }U\)’, ‘\(\widetilde{\nabla }U^{\intercal }*\widetilde{\nabla }U\)’ lead to

$$\begin{aligned} \begin{aligned}&H(y^{1})-H(y^{0})\\&=\frac{1}{2}y^{1\intercal }My^{1}-\frac{1}{2}y^{0\intercal }My^{0}+U(y^{1})-U(y^{0})\\&=\frac{1}{2}y^{0\intercal }(\exp (V)^{\intercal }M\exp (V)-M)y^{0}+y^{0\intercal }(\exp (V)^{\intercal }M\exp (V)-M)\widetilde{\nabla }U\\&\quad +\frac{1}{2}\widetilde{\nabla }U^{\intercal }(\exp (V)^{\intercal }M\exp (V)-M)\widetilde{\nabla }U +\frac{1}{2}\widetilde{\nabla }U^{\intercal }(\exp (V)^{\intercal }M-M\exp (V))\widetilde{\nabla }U\\&=\frac{1}{2}(y^{0}+\widetilde{\nabla }U)^{\intercal }B(y^{0}+\widetilde{\nabla }U)+\frac{1}{2}\widetilde{\nabla }U^{\intercal }C\widetilde{\nabla }U=0,\\ \end{aligned} \end{aligned}$$
(2.14)

where \(B=\exp (V)^{\intercal }M\exp (V)-M\) and \(C=\exp (V)^{\intercal }M-M\exp (V)\). The last equality follows from \(B=0\) (by Lemma 2.1) and the skew-symmetry of C.

If M is singular, it is easy to find a sequence of symmetric nonsingular matrices \(\{M_{\varepsilon }\}\) which converges to M as \(\varepsilon \rightarrow 0.\) Thus, according to the result stated above, it still holds that

$$\begin{aligned} H_{\varepsilon }(y_{\varepsilon }^{1})=H_{\varepsilon }(y^{0}) \end{aligned}$$
(2.15)

for all \(\varepsilon ,\) where \(H_{\varepsilon }(y)=\frac{1}{2}y^{\intercal }M_{\varepsilon }y+U(y)\) is the first integral of the perturbed problem

$$\dot{y}=Q(M_{\varepsilon }y+\nabla U(y)),\quad {y(t_{0})=y^{0}},$$

and

$$\begin{aligned} y_{\varepsilon }^{1}=\exp (V_{\varepsilon })y^{0}+h\varphi (V_{\varepsilon })Q{\nabla }^\mathrm{D}U(y_{\varepsilon }^{1},y^{0}),\quad V_{\varepsilon }=hQM_{\varepsilon }. \end{aligned}$$

Therefore, letting \(\varepsilon \rightarrow 0\) and noting that \(y^{1}_{\varepsilon }\rightarrow y^{1}\), (2.15) leads to

$$H(y^{1})=H(y^{0}).$$

This completes the proof.    \(\square \)

Moreover, the scheme (2.10) also respects the decay of the Lyapunov function H when \(Q\le 0\) in (2.1). The next theorem shows this point.

Theorem 2.2

If Q is negative semi-definite (not necessarily symmetric), then the scheme (2.10) preserves the decaying property of the Lyapunov function H in (2.1):

$$\begin{aligned} H(y^{1})\le H(y^{0}), \end{aligned}$$

where \(H(y)=\frac{1}{2}y^{\intercal }My+U(y)\).

Proof

If M is nonsingular, the equation in (2.14)

$$\begin{aligned} H(y^{1})-H(y^{0})=\frac{1}{2}(y^{0}+\widetilde{\nabla }U)^{\intercal }B(y^{0}+\widetilde{\nabla }U) \end{aligned}$$

still holds, since the derivation of (2.14) does not use the skew-symmetry of Q and the term involving C vanishes because C is skew-symmetric for any symmetric M. By Lemma 2.1, B is negative semi-definite. Thus \(H(y^{1})\le H(y^{0}).\) In the case that M is singular, this theorem can be easily proved by replacing the equalities

$$H_{\varepsilon }(y_{\varepsilon }^{1})=H_{\varepsilon }(y^{0}),\quad H(y^{1})=H(y^{0})$$

in the proof of Theorem 2.1 with the inequalities

$$H_{\varepsilon }(y_{\varepsilon }^{1})\le H_{\varepsilon }(y^{0}),\quad H(y^{1})\le H(y^{0}).$$

We omit the details.   \(\square \)

2.4 Symmetry and Convergence of the EAVF Integrator

In the rest of this chapter, we consider a special choice of the discrete gradient in (2.10), the average vector field (2.7):

$$ \begin{aligned} {\nabla }^\mathrm{D}U(y,\hat{y})=\int _{0}^{1}\nabla U((1-\tau )\hat{y}+\tau y)d\tau .\end{aligned}$$

The corresponding integrator becomes

$$\begin{aligned} \begin{aligned} y^{1}&=\exp (V)y^{0}+h\varphi (V)Q\int _{0}^{1}\nabla U((1-\tau )y^{0}+\tau y^{1})d\tau ,\end{aligned} \end{aligned}$$
(2.16)

where \(V=hQM\) and \(y^{1}\approx y(t_{0}+h)\). The scheme (2.16) is called an exponential AVF integrator and denoted by EAVF.
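For concreteness, here is a minimal Python sketch of one EAVF step (2.16). The small conservative test problem, the use of SciPy's matrix exponential, the augmented-matrix evaluation of \(\varphi \) and the 2-point Gauss–Legendre quadrature (exact here, since \(\nabla U\) is cubic along the segment) are illustrative assumptions rather than prescriptions of the chapter.

```python
import numpy as np
from scipy.linalg import expm

# Illustrative conservative test problem of the form (2.1)
Q = np.array([[0.0, -1.0],
              [1.0,  0.0]])                 # skew-symmetric
M = np.diag([2.0, 50.0])                    # symmetric
grad_U = lambda y: y**3                     # U(y) = (y_1^4 + y_2^4)/4
H = lambda y: 0.5 * y @ M @ y + 0.25 * np.sum(y**4)

def phi(V):
    """phi(V) = (exp(V) - I) V^{-1}, via the augmented-matrix identity (works for singular V)."""
    d = V.shape[0]
    W = np.zeros((2 * d, 2 * d))
    W[:d, :d] = V
    W[:d, d:] = np.eye(d)
    return expm(W)[:d, d:]

# 2-point Gauss-Legendre rule on [0, 1]: exact here, since grad U is cubic on the segment
c = np.array([0.5 - np.sqrt(3) / 6, 0.5 + np.sqrt(3) / 6])
b = np.array([0.5, 0.5])

def eavf_step(y0, h, tol=1e-14, max_iter=200):
    """One step of the EAVF integrator (2.16), solved by fixed-point iteration."""
    V = h * Q @ M
    eV, phiV = expm(V), phi(V)
    y1 = y0.copy()
    for _ in range(max_iter):
        avg = sum(bi * grad_U((1 - ci) * y0 + ci * y1) for bi, ci in zip(b, c))
        y1_new = eV @ y0 + h * phiV @ (Q @ avg)
        if np.linalg.norm(y1_new - y1) < tol:
            break
        y1 = y1_new
    return y1_new

y = np.array([1.0, 0.2])
e0 = H(y)
for _ in range(2000):
    y = eavf_step(y, h=0.05)
print(abs(H(y) - e0))                       # energy error stays near round-off level
```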

In the sequel we present and prove two properties of EAVF—symmetry and convergence.

Theorem 2.3

The EAVF integrator (2.16) is symmetric.

Proof

Exchanging \(y^{0}\leftrightarrow y^{1}\) and replacing h by \(-h\) in (2.16), we obtain

$$\begin{aligned} y^{0}=\exp (-V)y^{1}-h\varphi (-V)Q\int _{0}^{1}\nabla U((1-\tau )y^{1}+\tau y^{0})d\tau . \end{aligned}$$
(2.17)

We rewrite (2.17) as:

$$\begin{aligned} y^{1}=\exp (V)y^{0}+h\exp (V)\varphi (-V)Q\int _{0}^{1}\nabla U((1-\tau )y^{0}+\tau y^{1})d\tau . \end{aligned}$$
(2.18)

Since \(\exp (V)\varphi (-V)=\varphi (V)\), (2.18) is exactly (2.16). This means that EAVF is symmetric.    \(\square \)

It should be noted that the scheme (2.16) is implicit in general, and thus iterative solutions are required. We next discuss the convergence of the fixed-point iteration for the EAVF integrator.

Theorem 2.4

Suppose that \(||\varphi (V)||_{2}\le C\), and that \(\nabla U(u)\) satisfies a Lipschitz condition; i.e., there exists a constant L such that

$$||\nabla U(v)-\nabla U(w)||_{2}\le L||v-w||_{2},$$

for all arguments v and w \(\in \mathbb {R}^{d}\). If

$$\begin{aligned} 0<h\le \hat{h}<\frac{2}{CL||Q||_{2}}, \end{aligned}$$
(2.19)

then the mapping

$$ \varPsi :z\mapsto \exp (V)y^{0}+h\varphi (V)Q\int _{0}^{1}\nabla U((1-\tau )y^{0}+\tau z)d\tau $$

has a unique fixed point and the iteration for the EAVF integrator (2.16) is convergent.

Proof

Since

$$\begin{aligned} \begin{aligned}&||\varPsi (z_{1})-\varPsi (z_{2})||_{2}\\&=||h\varphi (V)Q\int _{0}^{1}(\nabla U((1-\tau )y^{0}+\tau z_{1})-\nabla U((1-\tau )y^{0}+\tau z_{2}))d\tau ||_{2}\\&\le h\Vert \varphi (V)\Vert _2\Vert Q\Vert _2\int _{0}^{1}\Vert \nabla U((1-\tau )y^{0}+\tau z_{1})-\nabla U((1-\tau )y^{0}+\tau z_{2})\Vert _2d\tau \\&\le hCL||Q||_{2}\int _{0}^{1}\tau ||z_{1}-z_{2}||_{2}d\tau \\&= \frac{h}{2}CL||Q||_{2}||z_{1}-z_{2}||_{2}\\&\le \rho ||z_{1}-z_{2}||_{2},\\ \end{aligned} \end{aligned}$$

where \(\rho =\dfrac{{\hat{h}}}{2}CL||Q||_{2}<1\), by the Contraction Mapping Theorem, the mapping \(\varPsi \) has a unique fixed point and the iteration solving the Eq. (2.16) is convergent.    \(\square \)

Remark 2.1

We note two special and important cases in practical applications. If QM is skew-symmetric or symmetric negative semi-definite, then QM is normal and its spectrum lies in the closed left half-plane. Since \(|\varphi (z)|\le 1\) for any z satisfying \(\mathrm{Re}(z)\le 0\), we have \(||\varphi (V)||_{2}\le 1\).

In many cases, the matrix M has an extremely large norm (e.g., M incorporates high frequency components in oscillatory problems or M is the differential matrix in semi-discrete PDEs), and hence Theorem 2.4 ensures the possibility of choosing relatively large stepsize regardless of M.

In practice, the integral in (2.16) usually cannot be calculated exactly. We therefore evaluate it using the s-point Gauss–Legendre (GLs) quadrature formula \((b_{i},c_{i})_{i=1}^{s}\):

$$\begin{aligned} \int _{0}^{1}\nabla U((1-\tau )y^{0}+\tau y^{1})d\tau \approx \sum _{i=1}^{s}b_{i}\nabla U((1-c_{i})y^{0}+c_{i}y^{1}). \end{aligned}$$

The corresponding scheme is denoted by EAVFGLs. Since the s-point GL quadrature formula is symmetric, EAVFGLs is also symmetric. Due to the fact that \(\sum _{i=1}^{s}b_{i}c_{i}=1/2\), the corresponding iteration for EAVFGLs is convergent provided (2.19) holds.
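The nodes \(c_{i}\) and weights \(b_{i}\) on [0, 1] are obtained from the standard Gauss–Legendre rule on \([-1,1]\); the following illustrative snippet also confirms the property \(\sum _{i=1}^{s}b_{i}c_{i}=1/2\) used above.

```python
import numpy as np

def gauss_legendre_01(s):
    """s-point Gauss-Legendre nodes c_i and weights b_i on [0, 1]."""
    x, w = np.polynomial.legendre.leggauss(s)     # rule on [-1, 1]
    return 0.5 * (x + 1.0), 0.5 * w               # map nodes and weights to [0, 1]

for s in range(1, 6):
    c, b = gauss_legendre_01(s)
    print(s, np.dot(b, c))                        # prints 0.5 for every s
```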

2.5 Problems Suitable for EAVF

2.5.1 Highly Oscillatory Nonseparable Hamiltonian Systems

Consider the Hamiltonian

$$H(p,q)=\frac{1}{2}p_{1}^{\intercal }M_{1}^{-1}p_{1}+\frac{1}{2\varepsilon ^{2}}q_{1}^{\intercal }A_{1}q_{1}+S(p,q),$$

where p and q are both d-length vectors, partitioned as

$$p=\left( \begin{array}{c}p_{0}\\ p_{1}\end{array}\right) ,\quad q=\left( \begin{array}{c}q_{0}\\ q_{1}\end{array}\right) ,$$

\(M_{1}, A_{1}\) are symmetric positive definite matrices, and \(0<\varepsilon \ll 1\). This Hamiltonian governs oscillatory mechanical systems in 2 or 3 spatial dimensions such as the stiff spring pendulum and the dynamics of the multi-atomic molecule (see, e.g. [8, 9]). With an appropriate canonical transformation (see, e.g. [18]), the Hamiltonian becomes

$$\begin{aligned} H(p,q)=\frac{1}{2}\sum _{j=1}^{l}\left( p_{1,j}^{2}+\frac{\lambda _{j}^{2}}{\varepsilon ^{2}}q_{1,j}^{2}\right) +S(p,q), \end{aligned}$$
(2.20)

where \(p_{1}=(p_{1,1},\ldots ,p_{1,l})^{\intercal }, q_{1}=(q_{1,1},\ldots ,q_{1,l})^{\intercal }\). The corresponding differential equations are given by

$$\begin{aligned} \left\{ \begin{aligned}&\dot{p_{0}}=-\nabla _{q_{0}}S(p,q),\\&\dot{p_{1}}=-\omega ^{2}q_{1}-\nabla _{q_{1}}S(p,q),\\&\dot{q_{0}}=p_{0}+(\nabla _{p_{0}}S(p,q)-p_{0}),\\&\dot{q_{1}}=p_{1}+\nabla _{p_{1}}S(p,q),\\ \end{aligned}\right. \end{aligned}$$
(2.21)

where \(\omega ^2=\mathrm{diag}(\omega _{1}^2,\ldots ,\omega _{l}^2), \omega _{j}=\lambda _{j}/\varepsilon \) for \(j=1,\ldots ,l\). Equation (2.21) is of the form (2.1) with

$$\begin{aligned} y=\left( \begin{array}{c}p\\ q\end{array}\right) ,\quad Q=\left( \begin{array}{cc}O&{}-I_{d\times d}\\ I_{d\times d}&{}O\end{array}\right) ,\quad M=\left( \begin{array}{cc}I_{d\times d}&{}O\\ O&{}\varOmega _{d\times d}\end{array}\right) , \end{aligned}$$

and

$$\begin{aligned} U(p,q)=S(p,q)-\frac{1}{2}p_{0}^{\intercal }p_{0},\quad \varOmega =\mathrm{diag}(0,\ldots ,0,\omega _{1}^{2},\ldots ,\omega _{l}^{2}). \end{aligned}$$

Since \(q_{1,1},\ldots ,q_{1,l}\) and \(p_{1,1},\ldots ,p_{1,l}\) are fast variables, it is favorable to integrate their linear part exactly by the scheme (2.16). Note that

$$\begin{aligned} \varphi (V)=\left( \begin{array}{cc}\mathrm{sinc}(h\varOmega ^{\frac{1}{2}})&{}h^{-1}g_{2}(h\varOmega ^{\frac{1}{2}})\\ hg_{1}(h\varOmega ^{\frac{1}{2}})&{}\mathrm{sinc}(h\varOmega ^{\frac{1}{2}})\end{array}\right) , \end{aligned}$$

where \(\mathrm{sinc}(z)=\sin (z)/z, g_{1}(z)=(1-\cos (z))/z^{2}, g_{2}(z)=\cos (z)-1\). Unfortunately, the block \(h^{-1}g_{2}(h\varOmega ^{\frac{1}{2}})\) is not uniformly bounded. In the first experiment, the iteration still works well, perhaps due to the small Lipschitz constant of \(\nabla S\).
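The block structure of \(\varphi (V)\) stated above is easy to verify numerically. The sketch below (illustrative; it uses nonzero frequencies only, since the zero-frequency entries require the limit values \(\mathrm{sinc}(0)=1, g_{1}(0)=1/2, g_{2}(0)=0\)) compares the block formula with a direct evaluation of \(\varphi (hQM)\).

```python
import numpy as np
from scipy.linalg import expm

h = 0.02
omega = np.array([1.0, 50.0, 200.0])            # nonzero frequencies (illustrative)
d = len(omega)
Om = np.diag(omega**2)
I = np.eye(d)
QM = np.block([[np.zeros((d, d)), -Om],
               [I, np.zeros((d, d))]])          # QM for y = (p, q)

# phi(h*QM) via the augmented-matrix identity
W = np.zeros((4 * d, 4 * d))
W[:2 * d, :2 * d] = h * QM
W[:2 * d, 2 * d:] = np.eye(2 * d)
phiV = expm(W)[:2 * d, 2 * d:]

# block formula with sinc(z) = sin z / z, g1(z) = (1 - cos z)/z^2, g2(z) = cos z - 1
z = h * omega
sinc = np.diag(np.sin(z) / z)
g1 = np.diag((1 - np.cos(z)) / z**2)
g2 = np.diag(np.cos(z) - 1)
phi_blocks = np.block([[sinc, g2 / h],
                       [h * g1, sinc]])
print(np.linalg.norm(phiV - phi_blocks))        # small (round-off level)
```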

2.5.2 Second-Order (Damped) Highly Oscillatory System

Consider

$$\begin{aligned} \ddot{q}-N\dot{q}+\varOmega q=-\nabla U_{1}(q), \end{aligned}$$
(2.22)

where q is a d-dimensional vector variable, \(U_{1}: \mathbb {R}^{d}\rightarrow \mathbb {R}\) is a differentiable function, N is a symmetric negative semi-definite matrix, \(\varOmega \) is a symmetric positive semi-definite matrix, and \(||\varOmega ||\) or \(||N||\gg 1\). Equation (2.22) covers highly oscillatory problems such as dissipative molecular dynamics, the (damped) Duffing equation and semi-discrete nonlinear wave equations. By introducing \(p=\dot{q},\) we write (2.22) as a first-order system of ODEs:

$$\begin{aligned} \left( \begin{array}{c}\dot{p}\\ \dot{q}\end{array}\right) =\left( \begin{array}{cc}N&{}-\varOmega \\ I&{}O\end{array}\right) \left( \begin{array}{c}p\\ q\end{array}\right) +\left( \begin{array}{c}-\nabla U_{1}(q)\\ O\end{array}\right) , \end{aligned}$$
(2.23)

which falls into the class of (2.1), where

$$\begin{aligned} y=\left( \begin{array}{c}p\\ q\end{array}\right) , Q=\left( \begin{array}{cc}N&{}-I\\ I&{}O\end{array}\right) , M=\left( \begin{array}{cc}I&{}O\\ O&{}\varOmega \end{array}\right) , U(y)=U_{1}(q). \end{aligned}$$

Clearly, \(Q\le 0\) and (2.23) is a dissipative system with the Lyapunov function \(H=\frac{1}{2}p^{\intercal }p+\frac{1}{2}q^{\intercal }\varOmega q+U_{1}(q)\). Applying the EAVF integrator (2.16) to the Eq. (2.23) yields the scheme:

$$\begin{aligned} \left\{ \begin{aligned}&p^{1}=\exp _{11}p^{0}+\exp _{12}q^{0}-h\varphi _{11}\int _{0}^{1}\nabla U_{1}((1-\tau )q^{0}+\tau q^{1})d\tau ,\\&q^{1}=\exp _{21}p^{0}+\exp _{22}q^{0}-h\varphi _{21}\int _{0}^{1}\nabla U_{1}((1-\tau )q^{0}+\tau q^{1})d\tau ,\\ \end{aligned}\right. \end{aligned}$$
(2.24)

where \(\exp (hQM)\) and \(\varphi (hQM)\) are partitioned into

$$\begin{aligned} \left( \begin{array}{cc}\exp _{11}&{}\exp _{12}\\ \exp _{21}&{}\exp _{22}\end{array}\right) \text { and } \left( \begin{array}{cc}\varphi _{11}&{}\varphi _{12}\\ \varphi _{21}&{}\varphi _{22}\end{array}\right) , \end{aligned}$$

respectively.

It should be noted that only the second equation in the scheme (2.24) needs to be solved by iteration; the first is then evaluated explicitly. From the proof of Theorem 2.4, one can see that the convergence of the fixed-point iteration for the second equation in (2.24) is independent of ||QM|| provided that \(\varphi _{21}\) is uniformly bounded.
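A minimal Python sketch of this reduction is given below; the test data (a two-dimensional damped oscillator with a quartic potential), SciPy's matrix exponential and the 2-point Gauss–Legendre quadrature are illustrative assumptions. Only the q-equation is iterated, the p-equation is then evaluated explicitly, and the Lyapunov function is monitored.

```python
import numpy as np
from scipy.linalg import expm

d = 2
N  = -np.diag([0.1, 0.2])                 # symmetric negative semi-definite (damping)
Om = np.diag([1.0, 100.0])                # symmetric positive semi-definite (stiffness)
grad_U1 = lambda q: q**3                  # U_1(q) = (q_1^4 + q_2^4)/4  (illustrative)

def exp_and_phi(A):
    """exp(A) and phi(A) via one exponential of an augmented matrix."""
    m = A.shape[0]
    W = np.zeros((2 * m, 2 * m)); W[:m, :m] = A; W[:m, m:] = np.eye(m)
    E = expm(W)
    return E[:m, :m], E[:m, m:]

def eavf_step_2nd_order(p0, q0, h, tol=1e-14, max_iter=200):
    QM = np.block([[N, -Om], [np.eye(d), np.zeros((d, d))]])
    E, P = exp_and_phi(h * QM)
    e11, e12, e21, e22 = E[:d, :d], E[:d, d:], E[d:, :d], E[d:, d:]
    p11, p21 = P[:d, :d], P[d:, :d]
    c = np.array([0.5 - np.sqrt(3) / 6, 0.5 + np.sqrt(3) / 6]); b = np.array([0.5, 0.5])
    avg = lambda q1: sum(bi * grad_U1((1 - ci) * q0 + ci * q1) for bi, ci in zip(b, c))
    # only the q-equation of (2.24) is solved by fixed-point iteration
    q1 = q0.copy()
    for _ in range(max_iter):
        q1_new = e21 @ p0 + e22 @ q0 - h * p21 @ avg(q1)
        if np.linalg.norm(q1_new - q1) < tol:
            break
        q1 = q1_new
    q1 = q1_new
    # the p-equation is then explicit
    p1 = e11 @ p0 + e12 @ q0 - h * p11 @ avg(q1)
    return p1, q1

p, q = np.array([0.0, 1.0]), np.array([1.0, 0.1])
Lyap = lambda p, q: 0.5 * p @ p + 0.5 * q @ Om @ q + 0.25 * np.sum(q**4)
vals = [Lyap(p, q)]
for _ in range(200):
    p, q = eavf_step_2nd_order(p, q, h=0.1)
    vals.append(Lyap(p, q))
print(np.all(np.diff(vals) <= 1e-12))     # Lyapunov function is non-increasing
```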

Theorem 2.5

Suppose that \(\varOmega \) and N commute and \(||\nabla U_{1}(v)-\nabla U_{1}(w)||_{2}\le L||v-w||_{2}\). Then the iteration

$$\varPhi : z\mapsto \exp _{21}p^{0}+\exp _{22}q^{0}-h\varphi _{21}\int _{0}^{1}\nabla U_{1}((1-\tau )q^{0}+\tau z)d\tau $$

for the scheme (2.24) is convergent provided

$$0<h\le \hat{h}<\frac{2}{L^{\frac{1}{2}}}.$$

Proof

It is crucial here to find a uniform bound of \(\Vert \varphi _{21}\Vert \). Since \(\varOmega \) and N commute, they can be simultaneously diagonalized:

$$\varOmega =F^{\intercal }\varLambda F,\quad N=F^{\intercal }\varSigma F,$$

where F is an orthogonal matrix, \(\varLambda =\mathrm{diag}(\lambda _{1},\ldots ,\lambda _{d}), \varSigma =\mathrm{diag}(\sigma _{1},\ldots ,\sigma _{d})\) and \(\lambda _{i}\ge 0, \sigma _{i}\le 0\) for \(i=1,2,\ldots ,d\). It now follows from

$$\begin{aligned} \begin{aligned}&QM=\left( \begin{array}{cc}F^{\intercal }&{}O\\ O&{}F^{\intercal }\end{array}\right) \left( \begin{array}{cc}\varSigma &{}-\varLambda \\ I&{}O\end{array}\right) \left( \begin{array}{cc}F&{}O\\ O&{}F\end{array}\right) \end{aligned}\end{aligned}$$

that

$$\begin{aligned} \begin{aligned}&\exp (hQM)=\left( \begin{array}{cc}F^{\intercal }&{}O\\ O&{}F^{\intercal }\end{array}\right) \exp \left\{ \left( \begin{array}{cc}h\varSigma &{}-h\varLambda \\ hI&{}O\end{array}\right) \right\} \left( \begin{array}{cc}F&{}O\\ O&{}F\end{array}\right) .\\ \end{aligned} \end{aligned}$$

To indicate the dependence of \(\exp _{21}\) and \(\varphi _{21}\) on h, we denote them by \(\exp _{21}^{h}\) and \(\varphi _{21}^{h}\), respectively. After some calculations, we have

$$\begin{aligned} \exp _{21}^{h}=F^{\intercal }(\varSigma ^{2}-4\varLambda )^{-\frac{1}{2}}\cdot 2\sinh (h(\varSigma ^{2}-4\varLambda )^{\frac{1}{2}}/2)\exp \left( \frac{h\varSigma }{2}\right) F. \end{aligned}$$

We then have

$$\begin{aligned} \begin{aligned} ||\exp _{21}^{h}||_{2}&=||2(\varSigma ^{2}-4\varLambda )^{-\frac{1}{2}}\sinh (h(\varSigma ^{2}-4\varLambda )^{\frac{1}{2}}/2)\exp \left( \frac{h\varSigma }{2}\right) ||_{2}\\&=h\max _{i}\left| \frac{\sinh ((h^{2}\sigma _{i}^{2}/4-h^{2}\lambda _{i})^{\frac{1}{2}})}{(h^{2}\sigma _{i}^{2}/4-h^{2}\lambda _{i})^{\frac{1}{2}}}\exp \left( \frac{h\sigma _{i}}{2}\right) \right| . \end{aligned}\end{aligned}$$
(2.25)

In order to estimate \(||\exp _{21}^{h}||_{2}\), the bound of the function

$$\begin{aligned} g(\lambda ,\sigma )=\frac{\sinh ((\sigma ^{2}-4\lambda )^{\frac{1}{2}})}{(\sigma ^{2}-4\lambda )^{\frac{1}{2}}}\exp \left( \sigma \right) , \end{aligned}$$

should be considered for \(\sigma \le 0,\lambda \ge 0\). If \(\sigma ^{2}-4\lambda <0\), we set \((\sigma ^{2}-4\lambda )^{\frac{1}{2}}=\mathrm{i}a,\) where \(\mathrm{i}\) is the imaginary unit and a is a real number. Then we have

$$\begin{aligned} |g|=|\frac{\sin (a)}{a}\exp \left( \sigma \right) |\le |\frac{\sin (a)}{a}|\le 1. \end{aligned}$$

If \(\sigma ^{2}-4\lambda \ge 0\), then \(a=(\sigma ^{2}-4\lambda )^{\frac{1}{2}}\le -\sigma \),

$$\begin{aligned} |g|=|\frac{\sinh (a)}{a}\exp \left( \sigma \right) |\le |\frac{\sinh (a)}{a}\exp (-a)|=|\frac{1-\exp (-2a)}{2a}|\le 1. \end{aligned}$$

Thus,

$$\begin{aligned} |g(\lambda ,\sigma )|\le 1\quad \text {for }\sigma \le 0,\lambda \ge 0. \end{aligned}$$
(2.26)

It follows from (2.25) and (2.26) that

$$\begin{aligned} ||\exp _{21}^{h}||_{2}=h\max _{i}\left| g\left( \frac{h^{2}\lambda _{i}}{4},\frac{h\sigma _{i}}{2}\right) \right| \le h. \end{aligned}$$
(2.27)

Therefore, using \(\varphi (hQM)=\int _{0}^{1}\exp ((1-\xi )hQM)\mathrm{d}\xi \) and (2.27), we obtain

$$||\varphi _{21}||_{2}=||\int _{0}^{1}\exp _{21}^{(1-\xi )h}\mathrm{d}\xi ||_{2}\le \int _{0}^{1}||\exp _{21}^{(1-\xi )h}||_{2}\mathrm{d}\xi \le \int _{0}^{1}(1-\xi )h\mathrm{d}\xi =\frac{1}{2}h.$$

The rest of the proof is similar to that of Theorem 2.4 which we omit here.    \(\square \)

It can be observed that in the particular case that \(N=0\), the scheme (2.24) reduces to the AAVF integrator in [32].

2.5.3 Semi-discrete Conservative or Dissipative PDEs

Many time-dependent PDEs are in the form:

$$\begin{aligned} \dfrac{\partial }{\partial {t}}y(x,t)=\mathscr {Q}\frac{\delta \mathscr {H}}{\delta y}, \end{aligned}$$
(2.28)

where \(y(\cdot ,t)\in X\) for every \(t\ge 0\), X is a Hilbert space such as \(\mathbf {L}^{2}(\mathscr {D})\) or \(\mathbf {L}^{2}(\mathscr {D})\times \mathbf {L}^{2}(\mathscr {D})\), \(\mathscr {D}\) is a domain in \(\mathbb {R}^{d}\), \(\mathscr {Q}\) is a linear operator on X, and the functional \(\mathscr {H}[y]=\int _{\mathscr {D}}f(y,\partial _{\alpha }y)\mathrm{d}x\), where f is smooth, \(x=(x_{1},\ldots ,x_{d}), \mathrm{d}x=\mathrm{d}x_{1}\ldots \mathrm{d}x_{d}\) and \(\partial _{\alpha }y\) denotes the partial derivatives of y with respect to the spatial variables \(x_{i}, 1\le i\le d\). Under a suitable boundary condition (BC), the variational derivative \(\dfrac{\delta \mathscr {H}}{\delta y}\) is defined by:

$$ \langle \frac{\delta \mathscr {H}}{\delta y},z\rangle =\frac{\mathrm{d}}{\mathrm{d}\varepsilon }\big |_{\varepsilon =0}\mathscr {H}[y+\varepsilon z] $$

for any smooth \(z\in X\) vanishing on the boundary of \(\mathscr {D}\), where \(\langle \cdot ,\cdot \rangle \) is the inner product of X. If \(\mathscr {Q}\) is a skew or negative semi-definite operator with respect to \(\langle \cdot ,\cdot \rangle \), then the Eq. (2.28) is conservative (e.g., the nonlinear wave, nonlinear Schrödinger, Korteweg–de Vries and Maxwell equations) or dissipative (e.g., the Allen–Cahn, Cahn–Hilliard, Ginzburg–Landau and heat equations), i.e., \(\mathscr {H}[y]\) is constant or monotonically decreasing (see, e.g. [6, 13]). In general, after the spatial discretisation, (2.28) becomes a conservative or dissipative system of ODEs in the form (2.1).

A typical example of a conservative system is the nonlinear Schrödinger (NLS) equation:

$$\begin{aligned} \mathrm{i}\dfrac{\partial }{\partial {t}}y+\dfrac{\partial ^2}{\partial {x}^2}y+V^{'}(|y|^{2})y=0 \end{aligned}$$
(2.29)

subject to the periodic BC \(y(0,t)=y(L,t)\). Denoting \(y=p+\mathrm{i}q\) (\(\mathrm{i}^2=-1\)), where p and q are the real and imaginary parts of y, the Eq. (2.29) can be written in the form of (2.28):

$$\begin{aligned} \dfrac{\partial }{\partial {t}}\left( \begin{array}{c}p\\ q\end{array}\right) =\left( \begin{array}{cc}0&{}-1\\ 1&{}0\end{array}\right) \left( \begin{array}{c}\dfrac{\partial ^2}{\partial {x}^2}p+V^{'}(p^{2}+q^{2})p\\ \dfrac{\partial ^2}{\partial {x}^2}q+V^{'}(p^{2}+q^{2})q\end{array}\right) , \end{aligned}$$
(2.30)

where \(X=\mathbf {L}^{2}([0,L])\times \mathbf {L}^{2}([0,L]),\)

$$ \mathscr {H}[y]=\frac{1}{2}\int _{0}^{L}\left( V(p^{2}+q^{2})-\left( \frac{\partial }{\partial {x}}p\right) ^{2}-\left( \frac{\partial }{\partial {x}}q\right) ^{2}\right) \mathrm{d}x. $$

We consider the spatial discretisation of (2.30). It is supposed that the spatial domain is equally partitioned into N intervals: \(0=x_{0}<x_{1}<\ldots <x_{N}=L\). Discretizing the spatial derivatives of (2.30) by central differences gives

$$\begin{aligned} \left( \begin{array}{c}\dot{\tilde{p}}\\ \dot{\tilde{q}}\end{array}\right) =\left( \begin{array}{cc}O&{}-I\\ I&{}O\end{array}\right) \left( \begin{array}{c}D\tilde{p}+V^{'}(\tilde{p}^{2}+\tilde{q}^{2})\tilde{p}\\ D\tilde{q}+V^{'}(\tilde{p}^{2}+\tilde{q}^{2})\tilde{q}\end{array}\right) , \end{aligned}$$
(2.31)

where

$$\begin{aligned} D=\left( \begin{array}{ccccc}-2&{}1&{} &{} &{} \\ 1&{}-2&{}1&{} &{}\\ &{}\ddots &{}\ddots &{}\ddots &{} \\ &{} &{}1&{}-2&{}1\\ &{} &{} &{}1&{}-2\end{array}\right) , \end{aligned}$$

is an \(N\times N\) symmetric differential matrix, \(\tilde{p}=(p_{0},\ldots ,p_{N-1})^{\intercal }, \tilde{q}=(q_{0},\ldots ,q_{N-1})^{\intercal }, p_{i}(t)\approx p(x_{i},t)\) and \(q_{i}(t)\approx q(x_{i},t)\) for \(i=0,\ldots ,N-1\).

As an example of dissipative PDEs we consider the Allen–Cahn (AC) equation

$$\begin{aligned} \dfrac{\partial y}{\partial {t}}=\beta \frac{\partial ^2y}{\partial {x}^2}+y-y^{3},\quad \beta \ge 0, \end{aligned}$$
(2.32)

subject to the homogeneous Neumann BC \(\frac{\partial }{\partial {x}}y(0,t)=\frac{\partial }{\partial {x}}y(L,t)=0\). Here \(X=\mathbf {L}^{2}([0,L]), \mathscr {Q}=-1, \mathscr {H}[y]=\int _{0}^{L}(\frac{1}{2}\beta (\frac{\partial }{\partial {x}}y)^{2}-\frac{1}{2}y^{2}+\frac{1}{4}y^{4})\mathrm{d}x\). The spatial grid is chosen in the same way as for the NLS equation. Discretizing the spatial derivative with the central difference, we obtain

$$\begin{aligned} \dot{\tilde{y}}=\beta \hat{D}\tilde{y}+\tilde{y}-\tilde{y}^{3}, \end{aligned}$$
(2.33)

where

$$\begin{aligned} \hat{D}=\left( \begin{array}{ccccc}-1&{}1&{} &{} &{} \\ 1&{}-2&{}1&{} &{}\\ &{}\ddots &{}\ddots &{}\ddots &{} \\ &{} &{}1&{}-2&{}1\\ &{} &{} &{}1&{}-1\end{array}\right) , \end{aligned}$$

is the \((N-1)\times (N-1)\) symmetric differential matrix, \(\tilde{y}=(y_{1},\ldots ,y_{N-1})^{\intercal }, y_{i}(t)\approx y(x_{i},t)\).

Both the semi-discrete NLS equation (2.31) and AC equation (2.33) are of the form (2.1). For the NLS equation, we have

$$\begin{aligned} Q=\left( \begin{array}{cc}O&{}-I\\ I&{}O\end{array}\right) ,\quad M=\left( \begin{array}{cc}D&{}O\\ O&{}D\end{array}\right) ,\quad U=\frac{1}{2}\sum _{i=0}^{N-1}V(p_{i}^{2}+q_{i}^{2}), \end{aligned}$$

while for the AC equation, we have

$$\begin{aligned} Q=-I,\quad M=-\beta \hat{D},\quad U=\sum _{i=1}^{N-1}\left( -\frac{1}{2}y_{i}^{2}+\frac{1}{4}y_{i}^{4}\right) . \end{aligned}$$

Therefore, the scheme (2.16) can be applied to solve them. Since the matrix QM is skew or symmetric negative semi-definite in these two cases, according to Remark 2.1, the convergence of fixed-point iterations for them is independent of the differential matrix.
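As an illustration (not part of the chapter), the Allen–Cahn data can be assembled in the form (2.1) as follows; the check at the end confirms that QM is symmetric negative semi-definite, so Remark 2.1 applies.

```python
import numpy as np

n, beta = 63, 0.01                                # illustrative sizes

# Neumann-type differentiation matrix \hat D, as displayed above
Dhat = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)
Dhat[0, 0] = Dhat[-1, -1] = -1.0

Q = -np.eye(n)                                    # discrete counterpart of Q = -1
M = -beta * Dhat
grad_U = lambda y: -y + y**3                      # gradient of U = sum(-y_i^2/2 + y_i^4/4)

# QM = beta * Dhat is symmetric negative semi-definite, so by Remark 2.1 the
# fixed-point iteration of EAVF converges with a stepsize restriction that is
# independent of the mesh (i.e. of ||M||)
print(np.linalg.eigvalsh(Q @ M).max() <= 1e-12)   # True
```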

2.6 Numerical Experiments

In this section, we compare the EAVF method (2.16) with the well-known implicit midpoint method which is denoted by MID:

$$\begin{aligned} y^{1}=y^{0}+hQ\nabla \widetilde{U}\left( \frac{y^{0}+y^{1}}{2}\right) , \end{aligned}$$
(2.34)

and the traditional AVF method for (2.1) given by

$$\begin{aligned} y^{1}=y^{0}+hQ\int _{0}^{1}\nabla \widetilde{U}((1-\tau )y^{0}+\tau y^{1})d\tau , \end{aligned}$$
(2.35)

where \(\widetilde{U}(y)=U(y)+\frac{1}{2}y^{\intercal }My\). The authors in [30] showed that (2.35) preserves the first integral or the Lyapunov function \(\widetilde{U}\). Our comparison also includes another energy-preserving method of order four for (2.1):

$$\begin{aligned} \left\{ \begin{aligned}&y^{\frac{1}{2}}=y^{0}+hQ\int _{0}^{1}\left( \frac{5}{4}-\frac{3}{2}\tau \right) \nabla \widetilde{U}(y_{\tau })d\tau ,\\&y^{1}=y^{0}+hQ\int _{0}^{1}\nabla \widetilde{U}(y_{\tau })d\tau ,\\ \end{aligned}\right. \end{aligned}$$
(2.36)

where

$$y_{\tau }=(2\tau -1)(\tau -1)y^{0}-4\tau (\tau -1)y^{\frac{1}{2}}+(2\tau -1)\tau y^{1}.$$

This method is denoted by CRK since it can be written as a continuous Runge–Kutta method. For details, readers are referred to [17].

Throughout the experiments, the ‘reference solution’ is computed by a high-order method with a sufficiently small stepsize. We always start from \(t_{0}=0\), and \(y^{n}\approx y(t_{n})\) is obtained in a time-stepping manner \(y^{0}\rightarrow y^{1} \rightarrow \cdots \rightarrow y^{n}\rightarrow \cdots \) for \(n=1,2,\ldots \) with \(t_{n}=nh\). The error tolerance for the iterative solutions of the four methods is set to \(10^{-14}\). The maximum global error (GE) over the whole time interval is defined by:

$$\begin{aligned} GE=\max _{n\ge 0}||y^{n}-y(t_{n})||_{\infty }. \end{aligned}$$

The maximum global error of H (EH) on the interval is:

$$\begin{aligned} EH=\max _{n\ge 0}|H^{n}-H(y(t_{n}))|. \end{aligned}$$

In our numerical experiments, the computational cost of each method is measured by the number of function evaluations (FE).

Example 2.1

The motion of a triatomic molecule can be modelled by a Hamiltonian system with the Hamiltonian of the form (2.20) (see, e.g. [8]):

$$\begin{aligned} H(p,q)=S(p,q)+\frac{1}{2}(p_{1,1}^{2}+p_{1,2}^{2}+p_{1,3}^{2})+\frac{\omega ^{2}}{2}(q_{1,1}^{2}+q_{1,2}^{2}+q_{1,3}^{2}), \end{aligned}$$
(2.37)

where

$$S(p,q)=\frac{1}{2}p_{0}^{2}+\frac{1}{4}(q_{0}-q_{1,3})^{2}-\frac{1}{4}\frac{2q_{1,2}+q_{1,2}^{2}}{(1+q_{1,2})^{2}}(p_{0}-p_{1,3})^{2} -\frac{1}{4}\frac{2q_{1,1}+q_{1,1}^{2}}{(1+q_{1,1})^{2}}(p_{0}+p_{1,3})^{2}.$$

The initial values are given by:

$$\begin{aligned} \left\{ \begin{aligned}&p_{0}(0)=p_{1,1}(0)=p_{1,2}(0)=p_{1,3}(0)=1,\\&q_{0}(0)=0.4, q_{1,1}(0)=q_{1,2}(0)=\frac{1}{\omega }, q_{1,3}(0)=\frac{1}{2^{\frac{1}{2}}\omega }.\\ \end{aligned}\right. \end{aligned}$$

Setting \(h=1/2^{i}\) for \(i=6,\ldots ,10\) with \( \omega =50\), and \(h=1/100\times 1/2^{i}\) for \(i=0,\ldots ,4\) with \(\omega =100\), we integrate the problem (2.21) with the Hamiltonian (2.37) over the interval [0, 50]. Since the nonlinear term \(\nabla S(p,q)\) is too complicated to integrate exactly, we evaluate the integrals in EAVF, AVF and CRK by the 3-point Gauss–Legendre (GL) quadrature formula \((b_{i},c_{i})_{i=1}^{3}\):

$$b_{1}=\frac{5}{18}, b_{2}=\frac{4}{9}, b_{3}=\frac{5}{18}; \quad c_{1}=\frac{1}{2}-\frac{15^{\frac{1}{2}}}{10}, c_{2}=\frac{1}{2}, c_{3}=\frac{1}{2}+\frac{15^{\frac{1}{2}}}{10}.$$

The corresponding schemes are denoted by EAVFGL3, AVFGL3 and CRKGL3 respectively. Numerical results are presented in Fig. 2.1.

Figure 2.1a, c show that MID and AVFGL3 fail to reach an acceptable accuracy. It can be observed from Fig. 2.1b, d that AVFGL3, EAVFGL3 and CRKGL3 are much more efficient in preserving the energy than MID. In terms of both energy preservation and accuracy, EAVFGL3 is the most efficient among the four methods.

Fig. 2.1 Efficiency curves. Copyright 2016 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved

Example 2.2

The equation

$$\begin{aligned} \begin{aligned}&\dot{x}_{1}=-\zeta x_{1}-\lambda x_{2}+x_{1}x_{2},\\&\dot{x}_{2}=\lambda x_{1}-\zeta x_{2}+\frac{1}{2}(x_{1}^{2}-x_{2}^{2}),\\ \end{aligned} \end{aligned}$$
(2.38)

is an averaged system in wind-induced oscillation, where \(\zeta \ge 0\) is a damping factor and \(\lambda \) is a detuning parameter (see, e.g. [16]). For convenience, setting \(\zeta =r\mathrm{cos}(\theta ), \lambda =r\mathrm{sin}(\theta ), r\ge 0, 0\le \theta \le \pi /2\), (see [29]) we write (2.38) as

$$\begin{aligned} \begin{aligned} \left( \begin{array}{c}\dot{x}_{1}\\ \dot{x}_{2}\end{array}\right) =\left( \begin{array}{cc}-\cos (\theta )&{}-\sin (\theta )\\ \sin (\theta )&{}-\cos (\theta )\end{array}\right) \left( \begin{array}{c}rx_{1}-\frac{1}{2}\sin (\theta )(x_{2}^{2}-x_{1}^{2})-\cos (\theta )x_{1}x_{2}\\ rx_{2}-\sin (\theta )x_{1}x_{2}+\frac{1}{2}\cos (\theta )(x_{2}^{2}-x_{1}^{2})\end{array}\right) , \end{aligned} \end{aligned}$$
(2.39)

which is of the form (2.1), where

$$\begin{aligned} \begin{aligned}&Q=\left( \begin{array}{cc}-\cos (\theta )&{}-\sin (\theta )\\ \sin (\theta )&{}-\cos (\theta )\end{array}\right) ,\quad M=\left( \begin{array}{cc}r&{}0\\ 0&{}r\end{array}\right) ,\\&U=-\frac{1}{2}\sin (\theta )\left( x_{1}x_{2}^{2}-\frac{1}{3}x_{1}^{3}\right) +\frac{1}{2}\cos (\theta )\left( \frac{1}{3}x_{2}^{3}-x_{1}^{2}x_{2}\right) . \end{aligned}\end{aligned}$$
(2.40)

Its Lyapunov function (dissipative case, when \(\theta <\pi /2\)) or the first integral (conservative case, when \(\theta =\pi /2\)) is:

$$H=\frac{1}{2}r(x_{1}^{2}+x_{2}^{2})-\frac{1}{2}\mathrm{sin}(\theta )\left( x_{1}x_{2}^{2}-\frac{1}{3}x_{1}^{3}\right) +\frac{1}{2}\mathrm{cos}(\theta )\left( \frac{1}{3}x_{2}^{3}-x_{1}^{2}x_{2}\right) .$$

The matrix exponential of the EAVF scheme (2.16) for (2.39) is calculated by:

$$\begin{aligned} \exp (V)=\left( \begin{array}{cc}\exp (-hcr)\mathrm{cos}(hsr)&{}-\exp (-hcr)\mathrm{sin}(hsr)\\ \exp (-hcr)\mathrm{sin}(hsr)&{}\exp (-hcr)\mathrm{cos}(hsr)\end{array}\right) , \end{aligned}$$

where \(c=\cos (\theta ), s=\sin (\theta )\), and \(\varphi (V)\) can be obtained by \((\exp (V)-I)V^{-1}\). Given the initial values:

$$\begin{aligned} x_{1}(0)=0,x_{2}(0)=1, \end{aligned}$$

we first integrate the conservative system (2.39) with the parameters \(\theta =\pi /2, r=20\) and stepsizes \(h=1/20\times 1/2^{i}\) for \(i=-1,\ldots ,4\) over the interval [0, 200]. Setting \(\theta =\pi /2-10^{-4}, r=20,\) we then integrate the dissipative system (2.39) with the stepsizes \(h=1/20\times 1/2^{i}\) for \(i=-1,\ldots ,4\) over the interval [0, 100]. Numerical errors are presented in Figs. 2.2 and 2.3. It is noted that the integrands appearing in AVF and EAVF are polynomials of degree two in \(\tau \), while the integrands in CRK are polynomials of degree five. We evaluate the integrals in AVF, EAVF by the 2-point GL quadrature:

$$b_{1}=\frac{1}{2}, b_{2}=\frac{1}{2},\quad c_{1}=\frac{1}{2}-\frac{3^{\frac{1}{2}}}{6}, c_{2}=\frac{1}{2}+\frac{3^{\frac{1}{2}}}{6},$$

and the integrals appearing in CRK by the 3-point GL quadrature. Then there is no quadrature error.
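The closed-form expression for \(\exp (V)\) above can be checked against a generic matrix exponential; the following short snippet (illustrative) does so for the dissipative parameter values used below.

```python
import numpy as np
from scipy.linalg import expm

theta, r, h = np.pi / 2 - 1e-4, 20.0, 0.05
c, s = np.cos(theta), np.sin(theta)

Q = np.array([[-c, -s],
              [ s, -c]])
M = r * np.eye(2)
V = h * Q @ M

expV_closed = np.exp(-h * c * r) * np.array([[np.cos(h * s * r), -np.sin(h * s * r)],
                                             [np.sin(h * s * r),  np.cos(h * s * r)]])
print(np.linalg.norm(expm(V) - expV_closed))          # ~ round-off
phiV = (expm(V) - np.eye(2)) @ np.linalg.inv(V)       # phi(V) as stated in the text
```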

The efficiency curves of AVF and MID consist of only five points in Figs. 2.2a, b, and 2.3a (two points overlap in Figs. 2.2a and 2.3a), since the fixed-point iterations of MID and AVF are not convergent when \(h=1/10\). Since QM is skew-symmetric or negative semi-definite (and normal), the convergence of the iteration for the EAVF method is independent of r by Theorem 2.4 and Remark 2.1. Thus larger stepsizes are allowed for EAVF. The experiment shows that the iterations for EAVF converge uniformly for \(h=1/20\times 1/2^{i}\), \(i=-1,\ldots ,4\). Moreover, it can be observed from Fig. 2.3b that MID cannot strictly preserve the decay of the Lyapunov function.

Fig. 2.2 Efficiency curves. Copyright 2016 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved

Example 2.3

The PDE:

$$\begin{aligned} \frac{\partial ^{2}u}{\partial t^{2}}=\beta \frac{\partial ^{3}u}{\partial t\partial x^{2}}+\frac{\partial ^{2}u}{\partial x^{2}} \left( 1+\varepsilon \left( \frac{\partial u}{\partial x}\right) ^{p}\right) -\gamma \frac{\partial u}{\partial t}-m^{2}u, \end{aligned}$$
(2.41)

where \(\varepsilon >0, \beta , \gamma \ge 0\), is a continuous generalization of \(\alpha \)-FPU (Fermi–Pasta–Ulam) system (see, e.g. [28]). Taking \(\partial _{t}u=v\) and the homogeneous Dirichlet BC \(u(0,t)=u(L,t)=0\), the Eq. (2.41) is of the type (2.28), where \(X=\mathbf {L}^{2}([0,L])\times \mathbf {L}^{2}([0,L])\) and

$$\begin{aligned}\begin{aligned} y&=\left( \begin{array}{c}u\\ v\end{array}\right) ,\quad \mathscr {Q}=\left( \begin{array}{cc}0&{}1\\ -1&{}\beta \partial _{x}^{2}-\gamma \end{array}\right) ,\\ \mathscr {H}[y]&=\int _{0}^{L}\left( \frac{1}{2}u_{x}^{2}+\frac{m^{2}}{2}u^{2}+\frac{v^{2}}{2}+\frac{\varepsilon u_{x}^{p+2}}{(p+2)(p+1)}\right) \mathrm{d}x. \end{aligned} \end{aligned}$$
Fig. 2.3 a Efficiency curves. b The Lyapunov function against time t. Copyright 2016 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved

It is easy to verify that \(\mathscr {Q}\) is a negative semi-definite operator, and thus (2.41) is dissipative. The spatial discretization yields a dissipative system of ODEs:

$$\begin{aligned}\begin{aligned}&\ddot{u}_{j}(t)-c^{2}(u_{j-1}-2u_{j}+u_{j+1})+m^{2}u_{j}-\beta ^{'}(\dot{u}_{j-1}-2\dot{u}_{j}+\dot{u}_{j+1})+\gamma \dot{u}_{j}(t)\\&=\varepsilon ^{'}(V^{'}(u_{j+1}-u_{j})-V^{'}(u_{j}-u_{j-1})), \end{aligned}\end{aligned}$$

where \(c=1/\varDelta x, \beta ^{'}=c^{2}\beta , \varepsilon ^{'}=c^{p+2}\varepsilon , V(u)=u^{p+2}/[(p+2)(p+1)], u_{j}(t)\approx u(x_{j},t), x_{j}=j\varDelta x\) for \(j=1,\ldots ,N-1\) and \(u_{0}(t)=u_{N}(t)=0\). Note that the nonlinear term \(u_{xx}u_{x}^{p}\) is approximated by:

$$\begin{aligned}\begin{aligned}&\frac{\partial ^{2}u}{\partial x^{2}}\left( \frac{\partial u}{\partial x}\right) ^{p}|_{x=x_{j}}=\frac{1}{p+1}\partial _{x}\left( \frac{\partial u}{\partial x}\right) ^{p+1}|_{x=x_{j}}\\&\approx \frac{1}{p+1}\left( \left( \frac{u_{j+1}-u_{j}}{\varDelta x}\right) ^{p+1}-\left( \frac{u_{j}-u_{j-1}}{\varDelta x}\right) ^{p+1}\right) /\varDelta x.\end{aligned} \end{aligned}$$

We now write it in the compact form (2.22):

$$\begin{aligned} \ddot{q}-N\dot{q}+\varOmega q=-\nabla U_{1}(q), \end{aligned}$$

where \(q=(u_{1},\ldots ,u_{N-1})^{\intercal }, N=\beta ^{'}D-\gamma I, \varOmega =-c^{2}D+m^{2}I, U_{1}(q)=\varepsilon ^{'}\sum _{j=0}^{N-1}V(u_{j+1}-u_{j})\) and

$$\begin{aligned} D=\left( \begin{array}{ccccc}-2&{}1&{} &{} &{} \\ 1&{}-2&{}1&{} &{}\\ &{}\ddots &{}\ddots &{}\ddots &{} \\ &{} &{}1&{}-2&{}1\\ &{} &{} &{}1&{}-2\end{array}\right) . \end{aligned}$$
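For illustration (an assumption of this presentation, not part of the chapter), the matrices N, \(\varOmega \) and the gradient of \(U_{1}\) for this semi-discrete problem can be assembled as follows, with parameter values matching those used in the experiment below and \(\beta =2\).

```python
import numpy as np

# Illustrative assembly of N, Omega and grad U_1 for the semi-discrete problem above
Nx, dx, beta, gamma, m, eps, p = 128, 1.0, 2.0, 0.005, 0.0, 0.75, 1
c = 1.0 / dx
n = Nx - 1
D = -2.0 * np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1)

Nmat  = beta * c**2 * D - gamma * np.eye(n)       # N = beta' D - gamma I  (negative semi-definite)
Omega = -c**2 * D + m**2 * np.eye(n)              # Omega = -c^2 D + m^2 I (positive semi-definite)

def grad_U1(q):
    """Gradient of U_1(q) = eps' * sum_j V(u_{j+1} - u_j) with u_0 = u_N = 0."""
    u = np.concatenate(([0.0], q, [0.0]))
    dV = np.diff(u)**(p + 1) / (p + 1)            # V'(x) = x^{p+1}/(p+1)
    return (c**(p + 2)) * eps * (dV[:-1] - dV[1:])

print(Nmat.shape, Omega.shape, grad_U1(np.ones(n)).shape)
```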

In this experiment, we set \(p=1, m=0, c=1, \varepsilon =\frac{3}{4},\) and \(\gamma =0.005.\) Consider the initial conditions in [28]:

$$\begin{aligned} \phi _{j}(t)=B\ln \left\{ \left( \frac{1+\exp [2(\kappa (j-97)+t\sinh (\kappa ))]}{1+\exp [2(\kappa (j-96)+t\sinh (\kappa ))]}\right) \left( \frac{1+\exp [2(\kappa (j-32)+t\sinh (\kappa ))]}{1+\exp [2(\kappa (j-33)+t\sinh (\kappa ))]}\right) \right\} \end{aligned}$$

with \(B=5, \kappa =0.1\), that is,

$$\begin{aligned} \left\{ \begin{aligned}&u_{j}(0)=\phi _{j}(0),\\&v_{j}(0)=\dot{\phi }_{j}(0).\\ \end{aligned}\right. \end{aligned}$$

for \(j=1,\ldots ,N-1\). Let \(N=128\) and \(\beta =0\) or 2. We compute the numerical solution by MID, AVF and EAVF with the stepsizes \(h=1/2^{i}\) for \(i=1,\ldots ,5\) over the time interval [0, 100]. Similarly to EAVF (2.24), the nonlinear systems resulting from MID (2.34) and AVF (2.35) can be reduced to:

$$\begin{aligned} q^{1}=q^{0}+hp^{0}+\frac{h}{2}N(q^{1}-q^{0})-\frac{h^{2}}{4}\varOmega (q^{1}+q^{0})-\frac{h^{2}}{2}\nabla U_{1}\left( \frac{q^{0}+q^{1}}{2}\right) , \end{aligned}$$

and

$$\begin{aligned} q^{1}=q^{0}+hp^{0}+\frac{h}{2}N(q^{1}-q^{0})-\frac{h^{2}}{4}\varOmega (q^{1}+q^{0})-\frac{h^{2}}{2}\int _{0}^{1}\nabla U_{1}((1-\tau )q^{0}+\tau q^{1})d\tau \end{aligned}$$

respectively. For both MID and AVF, the velocity \(p^{1}\) can then be recovered from

$$\begin{aligned} \frac{q^{1}-q^{0}}{h}=\frac{p^{1}+p^{0}}{2}. \end{aligned}$$

The integrals in AVF and EAVF are evaluated exactly by the 2-point GL quadrature. Since \(\exp (hQM)\) and \(\varphi (hQM)\) in (2.24) have no explicit expressions here, they are calculated by the Matlab package in [2], whose basic idea is to evaluate the matrix exponential and the \(\varphi \)-function by their Padé approximations. Numerical results are plotted in Fig. 2.4. Alternatively, there are other popular algorithms such as the contour integral method and the Krylov subspace method for matrix exponentials and \(\varphi \)-functions. Readers are referred to [23] for a summary of algorithms and well-established mathematical software.

According to Theorem 2.5, the convergence of the iterations in the EAVF scheme is independent of \(\varOmega \) and N. The iterations of MID and AVF are not convergent when \(\beta =2, h=1/2\); thus the efficiency curves of MID and AVF in Fig. 2.4b consist of only four points. From Fig. 2.4c, it can be observed that the EAVF method preserves the dissipation even with the relatively large stepsize \(h=1/2\).

Fig. 2.4 a, b Efficiency curves. c The decay of the Lyapunov function obtained by EAVF. Copyright 2016 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved

2.7 Conclusions and Discussions

Exponential integrators date back to the original work of Hersch [20]. The term “exponential integrators” was coined in the seminal paper by Hochbruck, Lubich and Selhofer [22]. Exponential integrators have since become an important class of effective methods for the numerical solution of differential equations in applied sciences and engineering. In this chapter, combining the ideas of exponential integrators with the average vector field, a new exponential scheme EAVF was proposed and analysed. The EAVF method preserves the first integral or the Lyapunov function of the conservative or dissipative system (2.1). The symmetry of EAVF is responsible for its good long-term numerical behavior. Since EAVF is implicit, its solution must be computed iteratively. We have analysed the convergence of the fixed-point iteration and shown that it is not affected by a large class of coefficient matrices M. In the dynamics of the triatomic molecule, the wind-induced oscillation and the damped FPU problem, we compared the new EAVF method with the MID, AVF and CRK methods. These three problems are modelled by systems of the form (2.1) with a dominant linear part and a comparatively small nonlinear term. In terms of efficiency as well as the preservation of energy and dissipation, EAVF is superior to the other three methods. In general, energy-preserving and energy-decaying methods are implicit, and iterative solutions are required. With relatively large stepsizes, the iterations of EAVF converge while those of AVF and MID do not. We conclude that EAVF is a promising method for solving the system (2.1) with \(||QM||\gg ||Q\,\mathrm{Hess}(U)||\).

In conclusion, exponential integrators are an important class of structure-preserving numerical methods for differential equations. Therefore, we will further discuss and analyse exponential Fourier collocation methods in the next chapter, and symplectic exponential Runge–Kutta methods for solving nonlinear Hamiltonian systems in Chap. 4.

This chapter is based on the work of Li and Wu [27].