1 Introduction

In this paper, we are concerned with the construction and implementation of new efficient exponential Runge–Kutta integrators for solving stiff parabolic PDEs. These PDEs, upon their spatial discretizations, can be cast in the form of semilinear problems

$$\begin{aligned} u'(t)= A u(t) + g(t, u(t))=F(t, u(t)), \qquad u(t_0)=u_0, \end{aligned}$$
(1.1)

where the linear part Au usually causes stiffness. The nonlinearity \(g(t,u)\) is assumed to satisfy a local Lipschitz condition in a strip along the exact solution.

Exponential Runge–Kutta methods are a popular class of exponential integrators [9], which have shown great promise in recent years as an alternative to standard time integration solvers for stiff systems and applications, see e.g. [8, 10,11,12,13,14,15,16,17,18,19,20, 22]. The main idea behind these methods is to solve the linear portion of (1.1) exactly and to integrate the remaining nonlinear portion explicitly, based on a representation of the exact solution using the variation-of-constants formula.

An s-stage explicit exponential Runge–Kutta (expRK) method [8] applied to (1.1) can be reformulated (see [15, 17]) as

$$\begin{aligned} U_{ni}&= u_n + c_i h \varphi _{1} ( c_i h A)F(t_n, u_n) + h \sum _{j=2}^{i-1}a_{ij}(h A) D_{nj}, \ 2\le i\le s, \end{aligned}$$
(1.2a)
$$\begin{aligned} u_{n+1}&= u_n + h \varphi _{1} ( h A)F(t_n, u_n) + h \sum _{i=2}^{s}b_{i}(h A) D_{ni} , \end{aligned}$$
(1.2b)

where

$$\begin{aligned} D_{ni}= g (t_n+c_i h, U_{ni})- g(t_n, u_n ), \qquad 2\le i\le s. \end{aligned}$$
(1.3)

Here, \(U_{ni}\) denote the internal stages that approximate \(u(t_n+c_i h)\) using the time step size \(h = t_{n+1}-t_n >0\) and nodes \(c_i \). By construction, the coefficients \(a_{ij}(z)\) and \(b_i (z)\) are usually linear combinations of the entire functions

$$\begin{aligned} \varphi _{k}(z)=\int _{0}^{1} \mathrm{e}^{(1-\theta )z} \frac{\theta ^{k-1}}{(k-1)!}\,\text {d}\theta , \quad k\ge 1 \end{aligned}$$
(1.4)

and their scaled versions \(\varphi _{k}(c_i z)\).
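For scalar arguments, the \(\varphi _k\) can be evaluated directly from (1.4) via the well-known recurrence \(\varphi _{k+1}(z)=\bigl (\varphi _k(z)-\varphi _k(0)\bigr )/z\) with \(\varphi _k(0)=1/k!\) and \(\varphi _0(z)=\mathrm{e}^z\). The following sketch is purely illustrative (it is not part of the methods above and is numerically naive for matrix arguments):

```python
import math

def phi(k, z):
    """phi_k(z) for scalar z via phi_{j+1}(z) = (phi_j(z) - 1/j!) / z, phi_0 = exp.
    Near z = 0 the recurrence cancels badly, so fall back to the Taylor series
    phi_k(z) = sum_{j>=0} z^j / (j + k)!."""
    if abs(z) < 1e-6:
        return sum(z**j / math.factorial(j + k) for j in range(10))
    val = math.exp(z)                       # phi_0(z)
    for j in range(k):
        val = (val - 1.0 / math.factorial(j)) / z
    return val
```

For example, \(\varphi _1(z)=(\mathrm{e}^z-1)/z\) and \(\varphi _2(z)=(\mathrm{e}^z-1-z)/z^2\), which the recurrence reproduces.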

A common approach to determining the unknown matrix functions \(a_{ij}(h A)\) and \(b_{i}(h A)\) is to expand them as \(a_{ij}(h A)=\sum _{k\ge 0} \alpha ^{(k)}_{ij} (hA)^k \), \(b_{i}(h A)=\sum _{k\ge 0} \beta ^{(k)}_{i} (hA)^k \) (e.g. using classical Taylor series expansions) to obtain order conditions. Clearly, the boundedness of the remainder terms of these expansions (and thus of the error terms) depends on \(\Vert A\Vert \). For stability reasons, the resulting methods might not be suitable for integrating stiff PDEs, for which A typically has a large norm or is even an unbounded operator. These methods are thus usually referred to as classical (non-stiffly accurate) expRK methods. In contrast, in a seminal contribution [8], Hochbruck and Ostermann derived a new error expansion whose remainder terms are bounded independently of the stiffness (i.e. do not involve powers of A), leading to stiff order conditions, which in turn enabled the construction of stiffly accurate expRK methods of orders up to four. Following this, in [14] Luan and Ostermann developed a systematic theory for deriving stiff order conditions for expRK methods of arbitrary order, thereby allowing the construction of a fifth-order method in [15].

In view of the existing stiffly accurate expRK methods in the literature, we observe that they were derived from a convergence result that requires weakening many of the stiff order conditions (in order to minimize the number of required stages s and the number of matrix functions used in each internal stage \(U_{ni}\)). As a result, their structures contain internal stages \(U_{ni}\) that depend on the preceding stages, implying that such methods must be implemented by computing each of these stages sequentially. Also, the very last stages usually involve several different linear combinations of \(\varphi _{k}(c_i hA)\)-functions (with different nodes \(c_i\) in their arguments) acting on different sets of vectors, which introduces additional computational effort for these stages. For more details, we refer to Sects. 2 and 5.

Motivated by the observations above, in this work we prove a stronger convergence result for expRK methods up to order five which requires weakening only one order condition (which could improve stability and accuracy) and offers more degrees of freedom in solving the order conditions. Using this result, and inspired by our recent algorithm \({\mathtt {phipm\_simul\_iom}}\) proposed in [19] (which allows one to simultaneously compute multiple linear combinations of \(\varphi \)-functions acting on the same set of vectors), we construct new methods of orders 4 and 5 which involve only one linear combination of \(\varphi \)-functions per stage and have multiple internal stages \(U_{ni}\) that are independent of one another, thereby allowing them to be computed in parallel. Furthermore, one can derive these independent stages in such a way that they share the same form of linear combination of \(\varphi _{k}(c_i hA)\)-functions acting on the same set of vectors, allowing them to be implemented simultaneously (by one evaluation). While these independent stages can be computed in parallel (as mentioned above) by any algorithm that approximates the action of (linear combinations of) \(\varphi \)-functions, we note that the possibility of computing them simultaneously is a new feature that can be exploited with our algorithm \({\mathtt {phipm\_simul\_iom}}\) (other algorithms, e.g. those that do not construct Krylov subspaces, might not support computing these stages simultaneously). Overall, this makes the new methods behave like methods with far fewer stages (even when compared to the existing methods of the same orders), meaning that they require far fewer evaluations of linear combinations of \(\varphi \)-functions and are thus more efficient.

The paper is organized as follows. In Sect. 2, we describe our motivation, propose new ideas, and review the existing expRK methods in the literature with respect to these ideas. Following this, in Sect. 3 we prove a stronger convergence result (Theorem 3.1) for expRK methods, which requires relaxing only one order condition. This allows us to construct more efficient methods in Sect. 4. In particular, we derive two new families of fourth- and fifth-order stiffly accurate expRK methods called \({\mathtt {expRK4s6}}\) (fourth-order, 6 stages, but requiring only 4 independent stage evaluations) and \({\mathtt {expRK5s10}}\) (fifth-order, 10 stages, but requiring only 5 independent stage evaluations), respectively. In Sect. 5, we present implementation details for these two new methods, as well as for the existing stiffly accurate expRK schemes of the same orders (for comparison). In the last section, numerical examples including one- and two-dimensional stiff PDEs are presented to demonstrate the accuracy and efficiency of the two newly constructed expRK integrators.

2 Motivation and existing methods

In this section, we motivate our approach by taking a closer look at an efficient way of implementing expRK methods (1.2). We then propose some ideas for deriving more efficient methods with respect to this implementation, along with a review of the existing methods.

2.1 An efficient way of implementation

Clearly, each stage (\(U_{ni}\) or \(u_{n+1}\)) of (1.2) requires computing matrix functions of the form \(\varphi _k(c_i h A)v_k\) (\(0 < c_i \le 1\)), where \(v_k\) is some vector (it could be \(F(t_n, u_n)\), \(D_{ni}\), or a linear combination of these). Thanks to recent developments [1, 4, 6, 21], one can efficiently compute a linear combination of \(\varphi \)-functions acting on a set of input vectors \(V_0,\ldots ,V_q\)

$$\begin{aligned} \varphi _0 (M)V_0 + \varphi _1 (M)V_1 +\varphi _2 (M)V_2+\cdots +\varphi _{q}(M)V_{q}, \end{aligned}$$
(2.1)

where M is some square matrix. This is crucial when implementing exponential integrators. Very recently, in [19], we were able to improve the implementations presented in [6, 21], resulting in the routine \({\mathtt {phipm\_simul\_iom}}\). The underlying method in this algorithm combines an adaptive time-stepping technique with Krylov subspace methods, which allows us to simultaneously compute multiple linear combinations of type (2.1) using different scaling factors \(\rho _1, \ldots , \rho _r\) of M, i.e.,

$$\begin{aligned} \begin{aligned} \varphi _0 (\rho _1 M)V_0 + \varphi _1 (\rho _1 M)V_1&+\varphi _2 (\rho _1 M)V_2+\cdots +\varphi _{q}(\rho _1 M)V_{q},\\&\vdots \\ \varphi _0 (\rho _r M)V_0 + \varphi _1 (\rho _r M)V_1&+\varphi _2 (\rho _r M)V_2+\cdots +\varphi _{q}(\rho _r M)V_{q}. \end{aligned} \end{aligned}$$
(2.2)

Now taking \(M=hA\) and considering the \(\rho _k\) (\(1 \le k \le r\)) as the nodes \(c_i\) used in expRK methods immediately suggests that one can compute the following (\(s-1\)) linear combinations

$$\begin{aligned} \varphi _1(c_i h A)V_1 + \varphi _2(c_i h A)V_2+ \cdots + \varphi _q(c_i h A) V_q, \quad 2 \le i \le s \end{aligned}$$
(2.3)

simultaneously by using only one evaluation (i.e., one call to \({\mathtt {phipm\_simul\_iom}}\)). Note that this requires the use of the same set of vectors \([V_1,\ldots ,V_q ]\) for all the linear combinations in (2.3).
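For illustration only, the sketch below evaluates the combinations (2.3) for several nodes \(c_i\) with one shared set of vectors, in the special case where \(hA\) is diagonal so that the \(\varphi \)-functions act entrywise (the actual routine \({\mathtt {phipm\_simul\_iom}}\) uses adaptive Krylov time-stepping and never forms these functions explicitly; all data here are hypothetical):

```python
import math

def phi(k, z):
    # scalar phi_k(z) = sum_{j>=0} z^j / (j + k)!  (truncated Taylor series)
    return sum(z**j / math.factorial(j + k) for j in range(25))

def combo(hA_diag, c, V):
    """phi_1(c*hA)V_1 + ... + phi_q(c*hA)V_q for a diagonal hA,
    where V[k-1] is the vector multiplied by phi_k."""
    q, n = len(V), len(hA_diag)
    return [sum(phi(k + 1, c * hA_diag[m]) * V[k][m] for k in range(q))
            for m in range(n)]

hA = [-1.0, -2.0]                 # diagonal of h*A (hypothetical data)
V = [[1.0, 0.0], [0.0, 1.0]]      # shared vectors V_1, V_2
results = {c: combo(hA, c, V) for c in (0.5, 1.0)}   # all nodes in "one call"
```

All nodes reuse the same vectors \(V_1,\ldots ,V_q\), which is exactly the requirement stated above.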

Motivated by this, we see that if an s-stage expRK scheme (1.2) is constructed in such a way that each internal stage \(U_{ni}\) has the form

$$\begin{aligned} U_{ni}=u_n+ \varphi _1(c_i h A)V_{1i} + \varphi _2(c_i h A)V_{2i}+ \cdots + \varphi _q(c_i h A) V_{qi}, \end{aligned}$$
(2.4)

which includes only one linear combination of \(\varphi \)-functions, with node \(c_i\) as the argument of all the \(\varphi _k\) functions, then the scheme contains a total of s such linear combinations (\(s-1\) for the \(U_{ni}\) and 1 for \(u_{n+1}\), since (1.2b) can always be written in the form (2.4) with \(c_i=1\)), thereby requiring only s evaluations. Furthermore, since the sets of vectors \([V_{1i}, V_{2i}, \ldots , V_{qi}]\) in (2.4) are usually different for each \(U_{ni}\), (2.3) also suggests that the efficiency will be significantly increased if one could build such stages (or a group of stages) \(U_{ni}\) of the form (2.4) that share the same format (i.e., have the same set of acting vectors \(V_{1i} \equiv V_1,\ldots ,V_{qi} \equiv V_q\)) or that are independent of one another. As this allows such stages to be computed simultaneously by one evaluation, or to be implemented in parallel (similarly to our construction of parallel exponential Rosenbrock methods [18]), it certainly reduces the total number of required evaluations and thus speeds up the computing time.

With respect to these observations, we now review the existing expRK schemes in the literature. Since our focus is on stiff problems, we discuss only stiffly accurate expRK methods, i.e., methods that satisfy the stiff order conditions (see Sect. 3 below).

2.2 Existing schemes and remarks

In [8], expRK methods of orders up to four were derived. For later reference, we name the second-, third-, and fourth-order methods in that work \({\mathtt {expRK2s2}}\), \({\mathtt {expRK3s3}}\), and \({\mathtt {expRK4s5}}\), respectively. In [15], we constructed an expRK method of order five called \({\mathtt {expRK5s8}}\). To discuss all of these schemes in terms of their implementation, we rewrite their internal stages \(U_{ni}\) and \(u_{n+1}\) as linear combinations of \(\varphi \)-functions like (2.4) and display them as follows. (Note that since the first-order method, the exponential Euler scheme \( u_{n+1}= u_n + \varphi _1 ( h A) hF(t_n,u_n), \) has no internal stage, we do not consider it here.)

\({\mathtt {expRK2s2}}\):

$$\begin{aligned} \begin{aligned} U_{n2}&=u_n +\varphi _1 (c_2 h A) c_2 hF(t_n,u_n), \\ u_{n+1}&= u_n + \varphi _1 ( h A) hF(t_n,u_n) + \varphi _2 (h A)\tfrac{1}{c_2} h D_{n2}. \end{aligned} \end{aligned}$$
(2.5)
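As an illustrative sketch (ours, not from the paper), one step of (2.5) for a scalar problem \(u'=\lambda u+g(t,u)\) can be coded directly; for a constant forcing g the difference \(D_{n2}\) vanishes and the scheme reduces to exponential Euler, which reproduces the exact solution of the linear problem up to roundoff:

```python
import math

def phi1(z): return (math.exp(z) - 1.0) / z
def phi2(z): return (math.exp(z) - 1.0 - z) / z**2

def exprk2s2_step(lam, g, t, u, h, c2=0.5):
    """One step of scheme (2.5) for the scalar problem u' = lam*u + g(t, u)."""
    F = lam * u + g(t, u)
    U2 = u + c2 * h * phi1(c2 * h * lam) * F          # internal stage U_{n2}
    D2 = g(t + c2 * h, U2) - g(t, u)                  # D_{n2}, cf. (1.3)
    return u + h * phi1(h * lam) * F + (h / c2) * phi2(h * lam) * D2

# stiff linear example with constant forcing (exact solution known):
lam, b, u0 = -50.0, 2.0, 1.0
g = lambda t, u: b
u, h, n = u0, 0.1, 10
for k in range(n):
    u = exprk2s2_step(lam, g, k * h, u, h)
exact = math.exp(lam * n * h) * (u0 + b / lam) - b / lam
```

For time-dependent g the method is second-order accurate; the constant-forcing case above merely isolates the exponential Euler part of the step.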

\({\mathtt {expRK3s3}}\) (a representative with \(c_2 \ne \tfrac{2}{3}\)):

$$\begin{aligned} \begin{aligned} U_{n2}&=u_n +\varphi _1 (c_2 h A) c_2 hF(t_n,u_n), \\ U_{n3}&=u_n + \varphi _1 \left( \tfrac{2}{3} h A\right) \tfrac{2}{3} hF(t_n, u_n)+ \varphi _2 \left( \tfrac{2}{3} h A\right) \tfrac{4}{9 c_2}h D_{n2}, \\ u_{n+1}&= u_n + \varphi _1 ( h A) hF(t_n,u_n) + \varphi _2 (h A)\tfrac{3}{2} h D_{n2}. \end{aligned} \end{aligned}$$
(2.6)

\({\mathtt {expRK4s5}}\) (the only existing fourth-order stiffly accurate expRK method [8]):

$$\begin{aligned} \begin{aligned} U_{n2}&=u_n +\varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n,u_n), \\ U_{n3}&=u_n + \varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n, u_n)+ \varphi _2 \left( \tfrac{1}{2} h A\right) h D_{n2}, \\ U_{n4}&=u_n + \varphi _1 ( h A) hF(t_n,u_n)+ \varphi _2 (h A) h(D_{n2} + D_{n3}),\\ U_{n5}&=u_n + \left[ \varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n, u_n)+ \varphi _2 \left( \tfrac{1}{2} h A\right) \tfrac{1}{4} h (2 D_{n2} + 2 D_{n3} - D_{n4})\right. \\&\quad \left. + \varphi _3 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} h (-D_{n2} - D_{n3} + D_{n4})\right] + \left[ \varphi _2 (h A) \tfrac{1}{4} h (D_{n2} + D_{n3} - D_{n4})\right. \\&\quad \left. + \varphi _3 (h A) h (-D_{n2} - D_{n3} + D_{n4})\right] ,\\ u_{n+1}&= u_n + \varphi _1 ( h A) hF(t_n,u_n) + \varphi _2 (h A) h (-D_{n4}+4 D_{n5})+ \varphi _3 (h A) h (4D_{n4} -8 D_{n5}). \end{aligned}\nonumber \\ \end{aligned}$$
(2.7)
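As a quick scalar sanity check of our own (not from [8]), the weights of (2.7) read off from \(u_{n+1}\), namely \(b_4=-\varphi _2+4\varphi _3\) and \(b_5=4\varphi _2-8\varphi _3\) with \(c_4=1\), \(c_5=\tfrac{1}{2}\) (and \(b_2=b_3=0\)), satisfy the first two stiff order conditions \(\psi _2=\psi _3=0\) of Sect. 3 identically:

```python
import math

def phi(k, z):
    # scalar phi_k(z) = sum_{j>=0} z^j / (j + k)!  (truncated Taylor series)
    return sum(z**j / math.factorial(j + k) for j in range(30))

def psi2_psi3(z):
    """psi_2(z), psi_3(z) for scheme (2.7): b_2 = b_3 = 0, b_4 = -phi_2 + 4*phi_3,
    b_5 = 4*phi_2 - 8*phi_3, with nodes c_4 = 1, c_5 = 1/2."""
    p2, p3 = phi(2, z), phi(3, z)
    b4, b5 = -p2 + 4.0 * p3, 4.0 * p2 - 8.0 * p3
    c4, c5 = 1.0, 0.5
    psi2 = b4 * c4 + b5 * c5 - p2                    # sum_i b_i c_i       = phi_2
    psi3 = b4 * c4**2 / 2 + b5 * c5**2 / 2 - p3      # sum_i b_i c_i^2/2!  = phi_3
    return psi2, psi3

residuals = [abs(r) for z in (-4.0, -1.0, 0.0, 2.0) for r in psi2_psi3(z)]
```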

\({\mathtt {expRK5s8}}\) (the only existing fifth-order stiffly accurate expRK method [15]):

$$\begin{aligned} U_{n2}= & {} u_n +\varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n,u_n), \\ U_{n3}= & {} u_n + \varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n, u_n)+ \varphi _2 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} h D_{n2}, \\ U_{n4}= & {} u_n + \varphi _1 \left( \tfrac{1}{4} h A\right) \tfrac{1}{4} hF(t_n,u_n)+ \varphi _2 \left( \tfrac{1}{4} h A\right) \tfrac{1}{8} h D_{n3},\\ U_{n5}= & {} u_n + \varphi _1 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} hF(t_n, u_n) + \varphi _2 \left( \tfrac{1}{2} h A\right) \tfrac{1}{2} h (- D_{n3} + 4D_{n4})\\&\quad + \varphi _3 \left( \tfrac{1}{2} h A\right) h (2D_{n3} - 4D_{n4})\\ U_{n6}= & {} u_n + \varphi _1 \left( \tfrac{1}{5} h A\right) \tfrac{1}{5} hF(t_n, u_n) + \varphi _2 \left( \tfrac{1}{5} h A\right) \tfrac{1}{25} h (8D_{n4} - 2D_{n5})\\&\quad + \varphi _3 \left( \tfrac{1}{5} h A\right) \tfrac{1}{125}h (-32 D_{n4} + 16 D_{n5}),\\ U_{n7}= & {} u_n +\left[ \varphi _1 \left( \tfrac{2}{3} h A\right) \tfrac{2}{3} hF(t_n, u_n) + \varphi _2 \left( \tfrac{2}{3} h A\right) h \left( \tfrac{-16}{27} D_{n5} +\tfrac{100}{27} D_{n6}\right) \right. \\&\quad \left. + \varphi _3 \left( \tfrac{2}{3} h A\right) h \left( \tfrac{320}{81} D_{n5} -\tfrac{800}{81} D_{n6}\right) \right] \\&\quad +\left[ \varphi _2 \left( \tfrac{1}{5} h A\right) h \left( \tfrac{-20}{81} D_{n4} +\tfrac{5}{243} D_{n5}+\tfrac{125}{486} D_{n6}\right) \right. \\&\quad \left. +\varphi _3 \left( \tfrac{1}{5} h A\right) h \left( \tfrac{16}{81} D_{n4} -\tfrac{4}{243} D_{n5}-\tfrac{50}{243} D_{n6}\right) \right] , \\ U_{n8}= & {} u_n +\left[ \varphi _1 ( h A) hF(t_n, u_n) + \varphi _2 ( h A) h \left( \tfrac{-16}{3} D_{n5} +\tfrac{250}{21} D_{n6} +\tfrac{27}{14} D_{n7}\right) \right. \\&\quad + \varphi _3 ( h A) h \left( \tfrac{208}{3} D_{n5} -\tfrac{250}{3} D_{n6}- 27 D_{n7}\right) \\&\quad \left. 
+ \varphi _4 ( h A) h \left( -240D_{n5} + \tfrac{1500}{7} D_{n6} + \tfrac{810}{7} D_{n7}\right) \right] \\&\quad + \left[ \varphi _2 \left( \tfrac{1}{5} h A\right) h \left( \tfrac{-4}{7} D_{n5} + \tfrac{25}{49} D_{n6} +\tfrac{27}{98} D_{n7}\right) \right. \\&\quad + \varphi _3 \left( \tfrac{1}{5} h A\right) h \left( \tfrac{8}{5} D_{n5} - \tfrac{10}{7} D_{n6} - \tfrac{27}{35} D_{n7}\right) \\&\quad \left. +\varphi _4 \left( \tfrac{1}{5} h A\right) h \left( \tfrac{-48}{35} D_{n5} + \tfrac{60}{49} D_{n6} + \tfrac{162}{245} D_{n7}\right) \right] \\&\quad + \left[ \varphi _2 \left( \tfrac{2}{3} h A\right) h \left( \tfrac{-288}{35} D_{n5} + \tfrac{360}{49} D_{n6} +\tfrac{972}{245} D_{n7}\right) \right. \\&\quad + \varphi _3 \left( \tfrac{2}{3} h A\right) h \left( \tfrac{384}{5} D_{n5} - \tfrac{480}{7} D_{n6} - \tfrac{1296}{35} D_{n7}\right) \\&\quad \left. +\varphi _4 \left( \tfrac{2}{3} h A\right) h \left( \tfrac{-1536}{7} D_{n5} + \tfrac{9600}{49} D_{n6} + \tfrac{5184}{49} D_{n7}\right) \right] ,\\ u_{n+1}= & {} u_n + \varphi _1 ( h A) hF(t_n,u_n) + \varphi _2 (h A) h \left( \tfrac{125}{14} D_{n6} -\tfrac{27}{14} D_{n7} + \tfrac{1}{2} D_{n8}\right) \\&\quad + \varphi _3 (h A) h \left( \tfrac{-625}{14} D_{n6} + \tfrac{162}{7} D_{n7} - \tfrac{13}{2} D_{n8}\right) \\&\quad + \varphi _4 (h A) h \left( \tfrac{1125}{14} D_{n6} - \tfrac{405}{7} D_{n7} + \tfrac{45}{2} D_{n8}\right) . \end{aligned}$$

Remark 2.1

In view of the structures of these schemes, one can see that only the second- and third-order schemes (\({\mathtt {expRK2s2}}\), \({\mathtt {expRK3s3}}\)) have all \(U_{ni}\) in the form (2.4). While \({\mathtt {expRK2s2}}\) requires one internal stage \(U_{n2}\), \({\mathtt {expRK3s3}}\) needs two internal stages, with \(U_{n3}\) depending on \(U_{n2}\), so these stages cannot be computed simultaneously. As for \({\mathtt {expRK4s5}}\), to the best of our knowledge, this 5-stage scheme is the only existing fourth-order stiffly accurate expRK method. Among its internal stages, the three stages \(U_{n2}\), \(U_{n3}\), and \(U_{n4}\) are of the form (2.4), but their corresponding sets of vectors \([V_{ki}]\) are not the same (\([V_{k2}]=[\tfrac{1}{2} hF(t_n,u_n)], [V_{k3}]=[\tfrac{1}{2} hF(t_n,u_n), hD_{n2} ], [V_{k4}]=[hF(t_n,u_n), h(D_{n2}+D_{n3})]\)), and, because of (1.3), they are not independent of one another (\(U_{n4}\) and \(U_{n3}\) depend on their preceding stages). Therefore one needs 3 sequential evaluations to compute these three stages. Also, we note that the last internal stage \(U_{n5}\) depends on all of its preceding stages and involves two different linear combinations of \(\varphi _k\)-functions with different scaling factors \(c_5=\tfrac{1}{2}\) and \(c_4=1\), namely \(\sum _{k} \varphi _k (\tfrac{1}{2} hA)V_k\) and \(\sum _{k}\varphi _k (hA)W_k\) (grouped in two different brackets \([ \ ]\)), which have to be implemented by 2 separate evaluations. The final stage \(u_{n+1}\) depends on \(U_{n4}\) and \(U_{n5}\). As a result, this scheme must be implemented sequentially, which requires a total of 6 evaluations of 6 different linear combinations. Similarly, to the best of our knowledge, \({\mathtt {expRK5s8}}\) is the only existing fifth-order stiffly accurate expRK method. From the construction of this scheme [15], one needs 8 stages. Among them, the first five internal stages are of the form (2.4).
We note, however, that the last two internal stages \(U_{n7}\) and \(U_{n8}\) involve 2 and 3 different linear combinations, respectively (grouped in different brackets \([ \ ]\)), of \(\varphi _k\)-functions (with different scaling factors) acting on different sets of vectors. Moreover, each stage (\(U_{ni}\) or \(u_{n+1}\)) depends on all the preceding stages (except for the first stage \(U_{n2}\)). Thus, this scheme must also be implemented sequentially (it does not have any group of internal stages that can be computed simultaneously). Clearly, it requires a total of 11 evaluations (11 different linear combinations of \(\varphi \)-functions).

Remark 2.2

The structures of the expRK schemes discussed in Remark 2.1 can be explained by taking a closer look at their constructions presented in [8, 15]. Namely, these methods were analyzed and derived using a weakened convergence result, i.e., by weakening many order conditions in order to minimize the number of required stages s and the number of matrix functions in each internal stage \(U_{ni}\). Specifically, for fourth-order methods (e.g., \({\mathtt {expRK4s5}}\)) 4 out of 9 order conditions have to be relaxed, and for fifth-order methods (e.g., \({\mathtt {expRK5s8}}\)) 9 out of 16 order conditions have to be relaxed. As a trade-off, each stage of these methods depends on the preceding stages (thus the resulting schemes must be implemented by computing each stage sequentially), and the very last stages usually involve different linear combinations of \(\varphi _{k}\)-functions (with several different nodes \(c_i\) as scaling factors) acting on different sets of vectors, which then require additional sequential evaluations. For more details, see Sect. 4 below.

3 Stiff order conditions and convergence analysis

Inspired by the motivation and remarks in Sect. 2, we next present a stronger convergence result which will later allow the construction of new efficient methods of high order. For this, we first recall the stiff order conditions for expRK methods up to order 5 (see [15, 17]).

3.1 Stiff order conditions for methods up to order 5

Let \({\tilde{e}}_{n+1}={\hat{u}}_{n+1}- u(t_{n+1})\) denote the local error of (1.2), i.e., the difference between the numerical solution \({\hat{u}}_{n+1}\) obtained by (1.2) after one step starting from the ‘initial condition’ \(u(t_n)\) and the corresponding exact solution \(u(t_{n+1})\) of (1.1) at \(t_{n+1}\).

To simplify the notation in this section we set \(f(t)=g(t, u(t))\) as in [8], and additionally denote by \(G_{k,n}=D^{k} g(t_n, u(t_n))\) the k-th partial Fréchet derivative of g (with respect to u) evaluated at \(u(t_n)\). Our results in [17] (Sect. 4.2) or [14] (Sect. 5.1) showed that

$$\begin{aligned} \begin{aligned} {\tilde{e}}_{n+1}&=h^2\psi _{2} ( h A)\,f'(t_n)+ h^3\psi _{3} ( h A) \,f''(t_n) + h^4 \psi _{4} ( h A) \,f'''(t_n) + h^5 \psi _{5} ( h A) \,f^{(4)}(t_n) \\&\quad +{\mathbf {R}}_n+ {\mathscr {O}}(h^6) \end{aligned} \end{aligned}$$
(3.1)

with the remaining terms

$$\begin{aligned} {\mathbf {R}}_n&= h^3 \sum _{i=2}^{s}b_{i} G_{1,n} \psi _{2,i} \,f'(t_n) + h^4 \sum _{i=2}^{s}b_{i} G_{1,n} \psi _{3,i} \,f''(t_n)\nonumber \\&\quad + h^4 \sum _{i=2}^{s}b_{i} G_{1,n} \sum _{j=2}^{i-1}a_{ij} G_{1,n} \psi _{2,j} \,f'(t_n) \nonumber \\&\quad + h^4 \sum _{i=2}^{s}b_{i} c_i G_{2,n}\big (u'(t_n), \psi _{2,i} \,f'(t_n) \big ) + h^5 \sum _{i=2}^{s}b_{i} G_{1,n}\psi _{4,i}\,f'''(t_n) \nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} G_{1,n} \sum _{j=2}^{i-1}a_{ij} G_{1,n} \psi _{3,j}\,f''(t_n)\nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} G_{1,n} \sum _{j=2}^{i-1}a_{ij} G_{1,n}\sum _{k=2}^{j-1}a_{jk} G_{1,n} \psi _{2,k} \,f'(t_n) \nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} G_{1,n} \sum _{j=2}^{i-1}a_{ij}c_j G_{2,n}\big (u'(t_n), \psi _{2,j} \,f'(t_n)\big ) \nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} c_i G_{2,n}\big (u'(t_n), \psi _{3,i} \,f''(t_n) \big ) \nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} c_i G_{2,n}\big (u'(t_n),\sum _{j=2}^{i-1}a_{ij} G_{1,n} \psi _{2,j} \,f'(t_n) \big ) \nonumber \\&\quad + h^5 \sum _{i=2}^{s} \frac{b_{i}}{2!} G_{2,n}\big (\psi _{2,i} \,f'(t_n), \psi _{2,i} \,f'(t_n) \big ) \nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} \frac{c^2_i}{2!} G_{2,n}\big (u''(t_n),\psi _{2,i} \,f'(t_n)\big )\nonumber \\&\quad + h^5 \sum _{i=2}^{s}b_{i} \frac{c^2_i}{2!} G_{3,n}\big (u'(t_n), u'(t_n),\psi _{2,i} \,f'(t_n) \big ). \end{aligned}$$
(3.2)

Here (and from now on) we use the abbreviations \(a_{ij}=a_{ij}(hA)\), \(b_{i}=b_{i}(hA)\), \(\varphi _{j,i}=\varphi _{j} (c_i hA)\) and

$$\begin{aligned} \psi _{j} (hA)&= \sum _{i=2}^{s}b_{i}\frac{c^{j-1}_{i}}{(j-1)!}- \varphi _{j} (hA), \quad j\ge 2 \end{aligned}$$
(3.3a)
$$\begin{aligned} \psi _{j,i}&=\psi _{j,i} (hA)= \sum _{k=2}^{i-1}a_{ik}\frac{c^{j-1}_{k}}{(j-1)!}- c^{j}_{i}\varphi _{j,i}. \end{aligned}$$
(3.3b)

Requiring the local error to satisfy \( {\tilde{e}}_{n+1}={\mathscr {O}}(h^6)\) yields the stiff order conditions for methods of order up to 5, which are displayed in Table 1 below.

Table 1 Stiff order conditions for explicit exponential Runge–Kutta methods up to order 5. The variables Z, J, K, L denote arbitrary square matrices, and B an arbitrary bilinear mapping of appropriate dimensions. The functions \(\psi _l\) and \(\psi _{k,l}\) are defined in (3.3)
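For instance (a scalar check of our own, using the notation of (3.3)), the internal stage \(U_{n4}\) of \({\mathtt {expRK4s5}}\) in (2.7) has \(a_{42}=a_{43}=\varphi _2(hA)\), \(c_2=c_3=\tfrac{1}{2}\), \(c_4=1\), and one verifies \(\psi _{2,4}=0\) directly:

```python
import math

def phi(k, z):
    # scalar phi_k(z) = sum_{j>=0} z^j / (j + k)!  (truncated Taylor series)
    return sum(z**j / math.factorial(j + k) for j in range(30))

def psi_24(z):
    """psi_{2,4}(z) of (3.3b) for scheme (2.7): a_42 = a_43 = phi_2(z),
    c_2 = c_3 = 1/2, c_4 = 1 (coefficients read off from U_{n4} in (2.7))."""
    a42 = a43 = phi(2, z)
    c2, c3, c4 = 0.5, 0.5, 1.0
    return a42 * c2 + a43 * c3 - c4**2 * phi(2, c4 * z)

residuals = [abs(psi_24(z)) for z in (-2.0, -0.5, 0.0, 1.0)]
```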

3.2 A stronger convergence result

The convergence analysis of exponential Runge–Kutta methods is usually performed in the framework of analytic semigroups on a Banach space X with the following assumptions (see e.g. [8, 15]):

Assumption 1

The linear operator A is the infinitesimal generator of an analytic semigroup \(\mathrm{e}^{tA}\) on X. This implies that

$$\begin{aligned} \Vert \mathrm{e}^{tA}\Vert _{X\leftarrow X}\le C, \quad t\ge 0 \end{aligned}$$
(3.4)

and consequently \(\varphi _k(h A)\), the coefficients \(a_{ij}(h A)\) and \(b_{i}(h A)\) of the method are bounded operators. Furthermore, the following stability bound (see [8, Lemma 1])

$$\begin{aligned} \left\| h A \sum _{j=1}^{n}\mathrm{e}^{j h A} \right\| _{X\leftarrow X} \le C \end{aligned}$$
(3.5)

holds uniformly for all \(n\ge 1\) and \(h>0\) with \(0<nh\le T-t_0\).

Assumption 2

(for high-order methods). The solution \(u:[t_0, T]\rightarrow X\) of (1.1) is sufficiently smooth with derivatives in X, and \(g: [t_0, T]\times X \rightarrow X\) is sufficiently often Fréchet differentiable in a strip along the exact solution. All occurring derivatives are assumed to be uniformly bounded.

Let \( e_{n+1} = u_{n+1} - u(t_{n+1})\) denote the global error at time \(t_{n+1}\). In [15], we have shown that \(e_n\) satisfies the recursion

$$\begin{aligned} e_{n}=h\sum _{j=0}^{n-1} \mathrm{e}^{(n-j)hA} {\mathscr {K}}_j (e_j)e_j + \sum _{j=0}^{n-1} \mathrm{e}^{jhA}{\tilde{e}}_{n-j}, \end{aligned}$$
(3.6)

where \({\mathscr {K}}_{j} (e_j)\) are bounded operators on X.

Motivated by Remark 2.2, we now give a stronger convergence result (compared to the results given in [8, 15]) in the sense that it requires relaxing only one order condition.

Theorem 3.1

(Convergence) Let the initial value problem (1.1) satisfy Assumptions 1–2. Consider for its numerical solution an explicit exponential Runge–Kutta method (1.2) that fulfills the order conditions of Table 1 up to order p (\(2 \le p \le 5\)) in the strong form, with the exception that only the condition \(\psi _{p}(Z)=0\) holds in a weakened form, i.e., \(\psi _{p}(0)=0\). Then the method is convergent of order p. In particular, the numerical solution \(u_n\) satisfies the error bound

$$\begin{aligned} \Vert u_n -u(t_n)\Vert \le C h^p \end{aligned}$$
(3.7)

uniformly on compact time intervals \(t_0 \le t_n =t_0+nh \le T\) with a constant C that depends on \(T-t_0\), but is independent of n and h.

Proof

The proof can be carried out in a way very similar to that of [15, Theorem 4.2]. In view of (3.1) and (3.2), and employing the assumptions of Theorem 3.1 on the order conditions, we have \({\mathbf {R}}_n=0\) and thus

$$\begin{aligned} {\tilde{e}}_{n+1}=h^p \big ( \psi _{p} ( h A)-\psi _{p} (0)\big ){G_{p-1,n}}+h^{p+1} {\mathbf {S}}_n, \end{aligned}$$
(3.8)

where \(G_{p-1,n}\) is defined in Sect. 3.1 and \({\mathbf {S}}_n\) collects the terms multiplying \(h^{p+1}\) and higher powers of h in (3.1) (clearly, \(\Vert {\mathbf {S}}_n\Vert \le C\)). Inserting (3.8) (with index \(n-j-1\) in place of n) into (3.6) and using the fact that there exists a bounded operator \({\tilde{\psi }}_{p}(hA)\) such that \( \psi _{p} ( h A)-\psi _{p} (0)={\tilde{\psi }}_{p}(hA)hA\) yields

$$\begin{aligned} e_{n}=h\sum _{j=0}^{n-1} \mathrm{e}^{(n-j)hA} {\mathscr {K}}_j (e_j)e_j + h^p\sum _{j=0}^{n-1} \Big ( hA\,\mathrm{e}^{jhA}{\tilde{\psi }}_{p}(hA) G_{p-1,n-j-1} + h\, \mathrm{e}^{jhA}{\mathbf {S}}_{n-j-1}\Big ). \end{aligned}$$
(3.9)

Using (3.4), (3.5) and an application of a discrete Gronwall lemma shows (3.7). \(\square \)

With the result of Theorem 3.1 in hand, we are now ready to derive more efficient methods. In particular, we will solve the system of stiff order conditions of Table 1 in the context of Theorem 3.1. It turns out that for methods of high order this requires an increase in the number of stages s. However, we gain more degrees of freedom for constructing our desired methods, as seen in Sect. 4 below. In addition, by relaxing only one order condition, we expect methods resulting from Theorem 3.1 to have better stability (and thus possibly better accuracy) when integrating stiff systems (see Sect. 6).

4 Derivation of new efficient exponential Runge–Kutta methods

In this section, we derive methods with the following features: (i) they contain multiple internal stages \(U_{ni}\) that are independent of each other (henceforth called parallel stages) and share the same format (thereby allowing them to be implemented in parallel); (ii) they involve fewer evaluations of the form (2.4) than the existing methods of the same orders (thus behaving like methods that use fewer stages s).

We first consider methods of order \(p \le 3\). When solving the order conditions for these methods (requiring at least \(s=2\) and \(s=3\) stages for second- and third-order methods, respectively), one can easily show that it is not possible to fulfill the desired feature (ii), particularly when comparing with \({\mathtt {expRK2s2}}\) (order 2, 2 stages) and \({\mathtt {expRK3s3}}\) (order 3, 3 stages) mentioned in Sect. 2. We omit the details. Therefore, we focus on the derivation of new methods of higher orders, namely, orders 4 and 5.

4.1 A family of fourth-order methods with parallel stages

Deriving methods of order 4 requires solving the set of 7 stiff order conditions 1–7 in Table 1. First, we discuss the required number of stages s. It is shown in [8, Sect. 5.3] that \(s=5\) is the minimal number of stages required to construct a family of fourth-order methods which satisfies conditions 1–3 in the strong sense and conditions 4–7 in the weakened form (relaxing \(b_i(Z)\) to \(b_i(0)\)). In other words, with \(s=5\) it is not possible to fulfill the order conditions in the context of Theorem 3.1, which requires that only condition 4 hold in the weakened form \(\psi _{4}(0)=0\), or equivalently \(\sum _{i=2}^{s}b_{i}(0)\frac{c_i^3}{3!} = \varphi _4(0)=1/24\). Therefore, we consider \(s=6\). In this case, conditions 1, 2, and the weakened condition 4 are

$$\begin{aligned} b_2 c_2+ b_3 c_3+ b_4 c_4+b_5 c_5+b_6 c_6&=\varphi _2, \end{aligned}$$
(4.1a)
$$\begin{aligned} b_2 c^2_2+ b_3 c^2_3+ b_4 c^2_4+b_5 c^2_5+b_6 c^2_6&=2\varphi _3,\end{aligned}$$
(4.1b)
$$\begin{aligned} b_2 (0) c^3_2+ b_3 (0)c^3_3+ b_4 (0) c^3_4+b_5 (0) c^3_5+b_6 (0) c^3_6&=6\varphi _4 (0)=1/4, \end{aligned}$$
(4.1c)

and conditions 3, 5, 7 and 6 are

$$\begin{aligned}&b_2 J \psi _{2,2}+ b_3 J \psi _{2,3}+ b_4 J \psi _{2,4}+ b_5 J \psi _{2,5}+ b_6 J \psi _{2,6}= 0, \end{aligned}$$
(4.2a)
$$\begin{aligned}&b_2 J \psi _{3,2}+ b_3 J \psi _{3,3}+ b_4 J \psi _{3,4}+ b_5 J \psi _{3,5}+ b_6 J \psi _{3,6}= 0, \end{aligned}$$
(4.2b)
$$\begin{aligned}&b_2 c_2 K \psi _{2,2}+ b_3 c_3 K \psi _{2,3}+ b_4 c_4 K \psi _{2,4}+ b_5 c_5 K \psi _{2,5}+ b_6 c_6 K \psi _{2,6}=0 ,\end{aligned}$$
(4.2c)
$$\begin{aligned}&b_3 J a_{32}J\psi _{2,2} + b_4 J (a_{42}J\psi _{2,2}+a_{43}J\psi _{2,3}) +b_5 J(a_{52}J\psi _{2,2}+a_{53}J\psi _{2,3}+a_{54}J\psi _{2,4}) \\&\quad +b_6J(a_{62}J\psi _{2,2}+a_{63}J\psi _{2,3}+a_{64}J\psi _{2,4}+ a_{65}J\psi _{2,5})=0. \nonumber \end{aligned}$$
(4.2d)

We now solve these order conditions. We note from (3.3b) that

$$\begin{aligned} \psi _{2,i} = \sum _{j=2}^{i-1}a_{ij}c_j-c^{2}_i \varphi _{2,i}, \quad \psi _{3,i} = \sum _{j=2}^{i-1}a_{ij}\frac{c^{2}_j}{2!}-c^{3}_i \varphi _{3,i} \end{aligned}$$
(4.3)

and thus \( \psi _{2,2}=-c^{2}_{2}\varphi _{2,2} \ne 0, \quad \psi _{3,2}=-c^{3}_{2}\varphi _{3,2} \ne 0 \) (since \(c_2\ne 0\)). Using (4.3), one can infer that either \( \psi _{2,3}\) or \( \psi _{3,3}\) must be nonzero as well (if both were zero, then \(a_{32}=\dfrac{c^2_3}{c_2}\varphi _{2,3}=\dfrac{2c^3_3}{c^2_2}\varphi _{3,3}\), which is impossible since \(c_3>0\) and \(\{\varphi _2, \varphi _3 \}\) are linearly independent). This strongly suggests taking \(b_2=b_3=0\) in order to later fulfill (4.2) in the strong sense with arbitrary square matrices J and K. Next, we further observe that if \(b_4 \ne 0\) one would need both \( \psi _{2,4}= \psi _{3,4}=0\) (which determines \(a_{42} \ne 0\), \(a_{43} \ne 0\)). However, this makes the second term in (4.2d) nonzero, which makes it very difficult to satisfy (4.2d) in the strong form. Putting this together, we require \(b_2=b_3=b_4=0\). Using this sufficient condition we can easily solve (4.1) to get

$$\begin{aligned} b_5 =\dfrac{-c_6\varphi _2 + 2\varphi _3}{c_5(c_5-c_6)}, \quad b_6 =\dfrac{-c_5\varphi _2 + 2\varphi _3}{c_6(c_6-c_5)} \end{aligned}$$

for any choice of distinct nodes \(c_5, c_6>0\) satisfying the condition

$$\begin{aligned} c_5=\dfrac{4c_6-3}{6c_6-4}. \end{aligned}$$
(4.4)
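As a quick sanity check of our own (not part of the paper), one can verify in exact rational arithmetic that, at \(z=0\) (where \(\varphi_k(0)=1/k!\)), the weights above satisfy (4.1a), (4.1b), and, thanks to (4.4), also the weakened condition (4.1c); we use the node value \(c_6=\tfrac13\) adopted later in this section:

```python
from fractions import Fraction as Fr

phi2_0, phi3_0 = Fr(1, 2), Fr(1, 6)   # phi_k(0) = 1/k!

c6 = Fr(1, 3)                          # node choice used later in the paper
c5 = (4 * c6 - 3) / (6 * c6 - 4)       # condition (4.4): gives c5 = 5/6

# weights b5(0), b6(0) from the displayed formulas, evaluated at z = 0
b5 = (-c6 * phi2_0 + 2 * phi3_0) / (c5 * (c5 - c6))
b6 = (-c5 * phi2_0 + 2 * phi3_0) / (c6 * (c6 - c5))

print(b5 * c5 + b6 * c6)        # 1/2 = phi_2(0), condition (4.1a)
print(b5 * c5**2 + b6 * c6**2)  # 1/3 = 2*phi_3(0), condition (4.1b)
print(b5 * c5**3 + b6 * c6**3)  # 1/4 = 6*phi_4(0), condition (4.1c)
```

The first two identities hold by construction of \(b_5, b_6\); the third holds only because \(c_5\) and \(c_6\) are coupled through (4.4).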

Since \(b_5, b_6 \ne 0\), we must enforce \(\psi _{2,5}=\psi _{3,5}=0\) and \(\psi _{2,6}=\psi _{3,6}=0\) to satisfy conditions (4.2a)–(4.2c). Using (4.3), this leads to the following two systems of linear equations

$$\begin{aligned} a_{52} c_2+ a_{53} c_3+ a_{54}c_4&=c^2_5\varphi _{2,5}, \end{aligned}$$
(4.5a)
$$\begin{aligned} a_{52} c^2_2+ a_{53} c^2_3+ a_{54}c^2_4&=2c^3_5\varphi _{3,5}, \end{aligned}$$
(4.5b)

and

$$\begin{aligned} a_{62} c_2+ a_{63} c_3+ a_{64}c_4+ a_{65}c_5&=c^2_6\varphi _{2,6}, \end{aligned}$$
(4.6a)
$$\begin{aligned} a_{62} c^2_2+ a_{63} c^2_3+ a_{64}c^2_4+ a_{65}c^2_5&=2c^3_6\varphi _{3,6}. \end{aligned}$$
(4.6b)

To satisfy condition (4.2d), we further enforce \(a_{52}=a_{62}=0\) (since \(\psi _{2,2} \ne 0\)), which allows (4.5) to be solved immediately for the coefficients (with \(c_3 \ne c_4\))

$$\begin{aligned} a_{53} =\dfrac{-c_4 c^2_5\varphi _{2,5}+ 2 c^3_5\varphi _{3,5}}{c_3(c_3-c_4)} \ne 0, \quad a_{54}=\dfrac{-c_3 c^2_5\varphi _{2,5} + 2 c^3_5\varphi _{3,5}}{c_4(c_4-c_3)} \ne 0, \end{aligned}$$
(4.7)

and thus, since \(a_{53}, a_{54} \ne 0\), we also need \(\psi _{2,3}=\psi _{2,4}=0\) to fulfill (4.2d), which gives

$$\begin{aligned} a_{32}&=\dfrac{c^2_3}{c_2}\varphi _{2,3}, \end{aligned}$$
(4.8a)
$$\begin{aligned} a_{42} c_2+ a_{43} c_3&=c^2_4\varphi _{2,4}. \end{aligned}$$
(4.8b)

After fulfilling all the required order conditions in (4.1)–(4.2), we see from (4.6) and (4.8b) that either \(a_{42}\) or \(a_{43}\), as well as one of the coefficients \(a_{63},\ a_{64},\ a_{65}\), can be taken as free parameters. We now use them to construct parallel stages. Guided by (4.7) and (4.8a), we choose \(a_{43}=0\) to make \(U_{n4}\) independent of \(U_{n3}\), so that both of these stages depend only on \(U_{n2}\), and choose \(a_{65}=0\) to make \(U_{n6}\) independent of \(U_{n5}\), so that both of these stages depend only on the two preceding stages \(U_{n3}, U_{n4}\) (since \(a_{52}=a_{62}=0\)). From this we determine the remaining coefficients

$$\begin{aligned} a_{42} =\dfrac{c^2_4}{c_2}\varphi _{2,4}, \quad a_{63} =\dfrac{-c_4 c^2_6\varphi _{2,6}+ 2 c^3_6\varphi _{3,6}}{c_3(c_3-c_4)}, \quad a_{64}=\dfrac{-c_3 c^2_6\varphi _{2,6}+ 2 c^3_6\varphi _{3,6}}{c_4(c_4-c_3)}.\nonumber \\ \end{aligned}$$
(4.9)

Putting everything together and rearranging the terms in \(U_{ni}\) and \(u_{n+1}\) as linear combinations of \(\varphi \) functions, we obtain the following family of fourth-order 6-stage methods (with the pairs of parallel stages \(\{U_{n3}, U_{n4}\}\) and \(\{U_{n5}, U_{n6}\}\)), which we call \({\mathtt {expRK4s6}}\):

$$\begin{aligned} U_{n2}&=u_n +\varphi _1 (c_2 h A) c_2 hF(t_n, u_n), \end{aligned}$$
(4.10a)
$$\begin{aligned} U_{n,k}&=u_n + \varphi _1 (c_k h A) c_k hF(t_n, u_n)+ \varphi _2 (c_k h A) \tfrac{c^2_k}{c_2} h D_{n2}, \quad k=3, 4 \end{aligned}$$
(4.10b)
$$\begin{aligned} U_{n,j}&=u_n + \varphi _1 (c_j h A)c_j h F(t_n, u_n)+ \varphi _{2} (c_j hA) \tfrac{c^2_j}{c_3-c_4} h \big (\tfrac{-c_4}{c_3}D_{n3} +\tfrac{c_3}{c_4}D_{n4}\big )\nonumber \\&\quad + \varphi _{3} (c_j hA) \tfrac{2c^3_j}{c_3-c_4} h \big (\tfrac{1}{c_3}D_{n3} -\tfrac{1}{c_4}D_{n4}\big ), \quad j=5,6 \end{aligned}$$
(4.10c)
$$\begin{aligned} u_{n+1}&=u_n + \varphi _1 (h A) h F(t_n, u_n)+ \varphi _{2} (hA) \tfrac{1}{c_5-c_6} h \big (\tfrac{-c_6}{c_5}D_{n5} +\tfrac{c_5}{c_6}D_{n6}\big ) \nonumber \\&\quad +\varphi _{3} (hA) \tfrac{2}{c_5-c_6} h \big (\tfrac{1}{c_5}D_{n5} -\tfrac{1}{c_6}D_{n6}\big ). \end{aligned}$$
(4.10d)

For the numerical experiments given in Sect. 6, we choose \(c_2=c_3=\frac{1}{2}\) and \(c_4=c_6=\frac{1}{3}\), which gives \(c_5=\frac{5}{6}\) due to (4.4).
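To illustrate the scheme, here is a minimal scalar sketch of one \({\mathtt {expRK4s6}}\) step (4.10) in Python; it is our own illustration, not the paper's MATLAB implementation, and the mildly stiff test problem (a manufactured variant of Example 6.1 with exact solution \(u(t)=\mathrm{e}^{-t}\)) and all helper names are illustrative choices. Halving the step size should reduce the error by a factor of roughly \(2^4=16\):

```python
import math

def phi(k, z):
    """Scalar phi_k(z) from (1.4): Taylor series near 0, recurrence otherwise."""
    if abs(z) < 1e-6:
        return sum(z**j / math.factorial(j + k) for j in range(8))
    if k == 0:
        return math.exp(z)
    return (phi(k - 1, z) - 1.0 / math.factorial(k - 1)) / z

def exprk4s6_step(lam, g, t, u, h):
    """One step of expRK4s6, Eq. (4.10), for the scalar problem u' = lam*u + g(t, u)."""
    c2, c3, c4, c5, c6 = 0.5, 0.5, 1/3, 5/6, 1/3
    F = lam * u + g(t, u)
    U2 = u + c2 * h * phi(1, c2 * h * lam) * F                     # (4.10a)
    d2 = g(t + c2 * h, U2) - g(t, u)
    U3 = u + c3 * h * phi(1, c3 * h * lam) * F + phi(2, c3 * h * lam) * (c3**2 / c2) * h * d2
    U4 = u + c4 * h * phi(1, c4 * h * lam) * F + phi(2, c4 * h * lam) * (c4**2 / c2) * h * d2
    d3 = g(t + c3 * h, U3) - g(t, u)
    d4 = g(t + c4 * h, U4) - g(t, u)
    w2 = (-c4 / c3 * d3 + c3 / c4 * d4) / (c3 - c4)   # phi_2 weight in (4.10c)
    w3 = 2 * (d3 / c3 - d4 / c4) / (c3 - c4)          # phi_3 weight in (4.10c)
    U5 = u + c5 * h * phi(1, c5 * h * lam) * F \
         + h * (phi(2, c5 * h * lam) * c5**2 * w2 + phi(3, c5 * h * lam) * c5**3 * w3)
    U6 = u + c6 * h * phi(1, c6 * h * lam) * F \
         + h * (phi(2, c6 * h * lam) * c6**2 * w2 + phi(3, c6 * h * lam) * c6**3 * w3)
    d5 = g(t + c5 * h, U5) - g(t, u)
    d6 = g(t + c6 * h, U6) - g(t, u)
    v2 = (-c6 / c5 * d5 + c5 / c6 * d6) / (c5 - c6)   # phi_2 weight in (4.10d)
    v3 = 2 * (d5 / c5 - d6 / c6) / (c5 - c6)          # phi_3 weight in (4.10d)
    return u + h * (phi(1, h * lam) * F + phi(2, h * lam) * v2 + phi(3, h * lam) * v3)

# Manufactured test problem (ours): exact solution u(t) = exp(-t)
lam = -2.0
def g(t, u):
    ue = math.exp(-t)
    return 1.0 / (1.0 + u**2) + (-ue - lam * ue - 1.0 / (1.0 + ue**2))

def error(N):
    t, u, h = 0.0, 1.0, 1.0 / N
    for _ in range(N):
        u = exprk4s6_step(lam, g, t, u, h)
        t += h
    return abs(u - math.exp(-1.0))

e1, e2 = error(20), error(40)
print(e1 / e2)  # roughly 2**4 = 16, as expected for a 4th-order method
```

In the scalar case the \(\varphi_k(c_i h\lambda)\) are plain numbers; for the matrix case the paper instead evaluates the linear combinations via phipm_simul_iom, as described in Sect. 5.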

Remark 4.1

(A comparison with \({\mathtt {expRK4s5}}\)). As seen, \({\mathtt {expRK4s6}}\) results from weakening only condition 4 of Table 1, instead of weakening the four conditions 4–7 as in the derivation of \({\mathtt {expRK4s5}}\). While the 5-stage method \({\mathtt {expRK4s5}}\) requires 6 sequential evaluations in each step (as mentioned in Section 2), the new fourth-order 6-stage method \({\mathtt {expRK4s6}}\) requires only 4 sequential evaluations, making it behave like a 4-stage method. This is due to the fact that \({\mathtt {expRK4s6}}\) has the pairs of parallel stages \(\{U_{n3}, U_{n4}\}\) and \(\{U_{n5}, U_{n6}\}\), and all \(U_{ni}\) within these pairs have the same format, i.e., the same (single) linear combination of \(\varphi _k (c_i h A)v_k\), allowing them to be computed in parallel or simultaneously (see Sect. 5).

4.2 A family of fifth-order methods with parallel stages

Constructing fifth-order exponential Runge–Kutta methods requires much more effort, as one has to solve 16 order conditions in Table 1. As mentioned in Section 2, the only existing method of order 5 in the literature is \({\mathtt {expRK5s8}}\) (see [15]), which requires \(s=8\) stages. Like \({\mathtt {expRK4s5}}\), this method does not have any parallel stages and must be implemented in a sequential way. It also does not satisfy the assumption on the order conditions stated in Theorem 3.1. Indeed, it was constructed by fulfilling conditions 1–7 in the strong form and weakening conditions 8–16 (9 out of 16 order conditions) with \(b_i(0)\) in place of \(b_i (Z)\). As a result, the last two internal stages \(U_{n7}\) and \(U_{n8}\) involve several different linear combinations of \(\varphi _k (c_i hA)v_k\) (with different scalings \(c_6, c_7, c_8\) of hA), so additional computational effort is required to compute those stages (as shown in Section 2).

Therefore, to derive a method based on Theorem 3.1, which in turn allows us to construct schemes with parallel stages in which each \(U_{ni}\) involves only one linear combination of \(\varphi _k (c_i hA)v_k\), we have to take \(s\geqslant 9\). To make it easier for readers to follow, we consider \(s=10\) first, and later employ a similar procedure to show that it is not possible to fulfill condition 11 of Table 1 in the strong form (and thus to satisfy Theorem 3.1) with \(s=9\).

(a) The case \(s=10\): As in the derivation presented in Sect. 4.1, (4.3) strongly suggests \(b_2=b_3=b_4=b_5=b_6=b_7=0\) in order to solve conditions 3, 5, 9, 7, 16, 13, and 15 in their strong form. Using this, these conditions now read as

$$\begin{aligned}&b_8 J \psi _{2,8}+ b_9 J \psi _{2,9}+ b_{10} J \psi _{2,10}= 0, \end{aligned}$$
(4.11a)
$$\begin{aligned}&b_8 J \psi _{3,8}+ b_9 J \psi _{3,9}+ b_{10} J \psi _{3,10}= 0, \end{aligned}$$
(4.11b)
$$\begin{aligned}&b_8 J \psi _{4,8}+ b_9 J \psi _{4,9}+ b_{10} J \psi _{4,10}= 0, \end{aligned}$$
(4.11c)
$$\begin{aligned}&b_8 c_8 K \psi _{2,8}+ b_9 c_9 K \psi _{2,9}+ b_{10} c_{10} K \psi _{2,10}=0 ,\end{aligned}$$
(4.11d)
$$\begin{aligned}&b_8 c^2_8 L \psi _{2,8}+ b_9 c^2_9 L \psi _{2,9}+ b_{10} c^2_{10} L \psi _{2,10}=0 ,\end{aligned}$$
(4.11e)
$$\begin{aligned}&b_8 c_8 K \psi _{3,8}+ b_9 c_9 K \psi _{3,9}+ b_{10} c_{10} K \psi _{3,10}=0 ,\end{aligned}$$
(4.11f)
$$\begin{aligned}&b_8 B(\psi _{2,8},\psi _{2,8})+ b_9 B(\psi _{2,9},\psi _{2,9})+ b_{10} B(\psi _{2,10},\psi _{2,10})=0, \end{aligned}$$
(4.11g)

respectively. Conditions 1, 2, 4, and 8 (the last in weakened form) become

$$\begin{aligned} b_8 c_8+b_9 c_9+b_{10} c_{10}&=\varphi _2, \end{aligned}$$
(4.12a)
$$\begin{aligned} b_8 c^2_8+b_9 c^2_9+b_{10} c^2_{10}&=2\varphi _3,\end{aligned}$$
(4.12b)
$$\begin{aligned} b_8 c^3_8+b_9 c^3_9+b_{10} c^3_{10}&=6\varphi _4,\end{aligned}$$
(4.12c)
$$\begin{aligned} b_8 (0) c^4_8+b_9 (0) c^4_9+b_{10}(0) c^4_{10}&=24\varphi _5(0)=1/5. \end{aligned}$$
(4.12d)

Solving (4.12) gives

$$\begin{aligned} b_8&=\dfrac{c_9 c_{10}\varphi _2 - 2(c_9+c_{10})\varphi _3+6\varphi _4}{c_8(c_8-c_9)(c_8-c_{10})},\end{aligned}$$
(4.13a)
$$\begin{aligned} b_9&=\dfrac{c_8 c_{10}\varphi _2 - 2(c_8+c_{10})\varphi _3+6\varphi _4}{c_9(c_9-c_8)(c_9-c_{10})},\end{aligned}$$
(4.13b)
$$\begin{aligned} b_{10}&=\dfrac{c_8 c_9\varphi _2 - 2(c_8+c_9)\varphi _3+6\varphi _4}{c_{10}(c_{10}-c_8)(c_{10}-c_9)} \end{aligned}$$
(4.13c)

where \(c_8, c_9\), and \(c_{10}\) are distinct and positive nodes satisfying the algebraic equation

$$\begin{aligned} \dfrac{c_8+c_9+c_{10}}{4}-\dfrac{c_8 c_9 + c_8 c_{10} + c_9 c_{10}}{3}+\dfrac{c_8 c_9 c_{10}}{2}=\dfrac{1}{5}. \end{aligned}$$
(4.14)
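As a sanity check in exact arithmetic (our own, using the nodes \(c_8=\tfrac{3}{10}\), \(c_9=\tfrac{3}{4}\), \(c_{10}=1\) chosen later in this section), the weights (4.13) evaluated at \(z=0\) reproduce (4.12a)–(4.12c), and these nodes satisfy (4.14), so the weakened condition (4.12d) holds as well:

```python
from fractions import Fraction as Fr

phi2, phi3, phi4 = Fr(1, 2), Fr(1, 6), Fr(1, 24)   # phi_k(0) = 1/k!
c8, c9, c10 = Fr(3, 10), Fr(3, 4), Fr(1)           # nodes chosen later in this section

# weights (4.13) evaluated at z = 0
b8 = (c9*c10*phi2 - 2*(c9 + c10)*phi3 + 6*phi4) / (c8*(c8 - c9)*(c8 - c10))
b9 = (c8*c10*phi2 - 2*(c8 + c10)*phi3 + 6*phi4) / (c9*(c9 - c8)*(c9 - c10))
b10 = (c8*c9*phi2 - 2*(c8 + c9)*phi3 + 6*phi4) / (c10*(c10 - c8)*(c10 - c9))

print(b8*c8 + b9*c9 + b10*c10)            # 1/2, i.e. (4.12a) at z = 0
print(b8*c8**2 + b9*c9**2 + b10*c10**2)   # 1/3, i.e. (4.12b) at z = 0
print(b8*c8**3 + b9*c9**3 + b10*c10**3)   # 1/4, i.e. (4.12c) at z = 0
print(b8*c8**4 + b9*c9**4 + b10*c10**4)   # 1/5, i.e. the weakened (4.12d)

# node condition (4.14)
lhs = (c8 + c9 + c10)/4 - (c8*c9 + c8*c10 + c9*c10)/3 + c8*c9*c10/2
print(lhs)                                # 1/5
```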

Clearly, \(b_8, b_9, b_{10} \ne 0\) so one has to enforce

$$\begin{aligned} \psi _{2,j}=\psi _{3,j}=\psi _{4,j}=0 \ (j=8, 9,10) \end{aligned}$$
(4.15)

to satisfy (4.11) in the strong sense with arbitrary square matrices J, K, L and B. Next, we consider conditions 6 and 10, taking into account that \(b_i=0 \) (\(i=2,\cdots ,7\)) and (4.15); these conditions can now be simplified as

$$\begin{aligned} \sum _{j=2}^{7} (b_8 J a_{8j}+ b_9 J a_{9j}+ b_{10} J a_{10j}) J \psi _{m,j}=0 \ \ (m=2, 3), \end{aligned}$$
(4.16)

respectively. In order to satisfy the strong form of (4.16) one needs

$$\begin{aligned} a_{8j}=a_{9j}=a_{10j}=0 \ (j=2, 3, 4) \end{aligned}$$
(4.17)

(this is again due to (4.3)) and

$$\begin{aligned} \psi _{2,j}=\psi _{3,j}=0 \ (j=5, 6,7). \end{aligned}$$
(4.18)

With (4.17), we note that \(U_{n8}, U_{n9}, U_{n10}\) are independent of the internal stages \(U_{n2}, U_{n3}, U_{n4}\). Taking into account all the requirements above, one can easily see that conditions 12 and 14 are now automatically fulfilled. Therefore, the only remaining condition to satisfy is condition 11.

Before working with condition 11, we first solve (4.15) using (4.17). For this, we observe that several coefficients \(a_{ij}\) can be considered as free parameters. To make \(U_{n8}, U_{n9}, U_{n10}\) independent of each other, we choose

$$\begin{aligned} a_{98}=a_{10,8}=a_{10,9}=0. \end{aligned}$$
(4.19)

The resulting systems of linear equations from (4.15) are then solved, yielding the unique solution

$$\begin{aligned}&a_{ij} =\dfrac{c^2_i c_k c_l \varphi _{2,i} -2 c^3_i (c_k +c_l)\varphi _{3,i}+ 6 c^4_i \varphi _{4,i}}{c_j(c_j-c_k)(c_j - c_l)},\nonumber \\&\quad i=8, 9,10; \ j, k, l \in \{5, 6, 7\}, \ j \ne k \ne l \end{aligned}$$
(4.20)

(i.e., \(c_5, c_6, c_7>0\) are distinct nodes).

We now use \(b_i=0 \) (\(i=2,\cdots ,7\)), (4.15), (4.17), (4.18), and (4.19) to simplify condition 11 as

$$\begin{aligned} \sum _{i=8}^{10} b_i J \sum _{j=5}^{7} a_{ij} J \big (a_{j2}J\psi _{2,2}+ a_{j3}J\psi _{2,3}+a_{j4}J\psi _{2,4}\big )=0. \end{aligned}$$
(4.21)

Since \(b_8, b_9, b_{10} \ne 0\), the coefficients \(a_{ij}\) in (4.20) (\(i \in \{8, 9, 10\}, j \in \{5, 6, 7\}\)) are also nonzero, and \(\psi _{2,2} \ne 0\), we must enforce

$$\begin{aligned} a_{j2}=0 \ (j=5, 6, 7), \ \text {i.e.,} \ a_{52}=a_{62}=a_{72}=0 \end{aligned}$$
(4.22)

and require that

$$\begin{aligned} \psi _{2,3}=\psi _{2,4}=0 \end{aligned}$$
(4.23)

in order to satisfy (4.21) in the strong sense. Note that, because of (4.22), one could not require \(a_{53}=0\) or \(a_{54}=0\) (the \(j=5\) terms in (4.21)), or both, as this would conflict with the requirement \(\psi _{2,5}=\psi _{3,5}=0\) in (4.18) (in other words, the linear system displayed in (4.5) representing this requirement would have no solution). This justifies the requirement (4.23).

Finally, we solve (4.23) and (4.18) for the remaining coefficients \(a_{ij}\). When solving (4.23) (see (4.8)), we choose \(a_{43}=0\) to make \(U_{n4}\) independent of \(U_{n3}\). This gives

$$\begin{aligned} a_{32} =\dfrac{c^2_3}{c_2}\varphi _{2,3}, \quad a_{42} =\dfrac{c^2_4}{c_2}\varphi _{2,4}. \end{aligned}$$
(4.24)

When solving (4.18) (using (4.22)), we choose \(a_{65}=a_{75}=a_{76}=0\) to make \(U_{n5}, U_{n6}, U_{n7}\) independent of each other. This results in the following 6 coefficients:

$$\begin{aligned} a_{ij} =\dfrac{ -c^2_i c_k \varphi _{2,i} + 2 c^3_i \varphi _{3,i}}{c_j (c_j-c_k)}, \quad i=5, 6,7; \ j, k \in \{3, 4\}, \ j \ne k \end{aligned}$$
(4.25)

(i.e., \(c_3, c_4 >0\) are distinct nodes).
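A quick rational-arithmetic check of our own that, at \(z=0\) (so \(\varphi_{2,i}(0)=1/2\), \(\varphi_{3,i}(0)=1/6\)), the coefficients (4.25) indeed give \(\psi_{2,i}(0)=\psi_{3,i}(0)=0\) in (4.3), using the nodes \(c_3=c_5=\tfrac12\), \(c_4=c_6=\tfrac13\), \(c_7=\tfrac14\) chosen below:

```python
from fractions import Fraction as Fr

phi2, phi3 = Fr(1, 2), Fr(1, 6)   # phi_k(0) = 1/k!
c3, c4 = Fr(1, 2), Fr(1, 3)

def a(ci, cj, ck):
    # coefficient (4.25) evaluated at z = 0
    return (-ci**2 * ck * phi2 + 2 * ci**3 * phi3) / (cj * (cj - ck))

for ci in (Fr(1, 2), Fr(1, 3), Fr(1, 4)):              # c5, c6, c7 chosen below
    ai3, ai4 = a(ci, c3, c4), a(ci, c4, c3)
    psi2 = ai3*c3 + ai4*c4 - ci**2 * phi2              # psi_{2,i}(0), see (4.3)
    psi3 = ai3*c3**2/2 + ai4*c4**2/2 - ci**3 * phi3    # psi_{3,i}(0), see (4.3)
    print(psi2, psi3)  # 0 0
```

The same cancellation happens for any \(c_i\), since (4.25) is precisely the solution of the corresponding two-by-two linear system.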

Inserting all the obtained coefficients \(a_{ij}\) and \(b_i\) into \(U_{ni}\) and \(u_{n+1}\), and rewriting these stages as linear combinations of \(\varphi \) functions, we obtain the following family of fifth-order 10-stage methods (with the groups of parallel stages \(\{U_{n3}, U_{n4}\}\), \(\{U_{n5}, U_{n6}, U_{n7}\}\), and \(\{U_{n8}, U_{n9}, U_{n10}\}\)), which we call \({\mathtt {expRK5s10}}\):

$$\begin{aligned} \begin{aligned} U_{n2}&= u_n + \varphi _1 (c_2 hA) c_2 hF(t_n,u_n), \\ U_{n\ell }&= u_n + \varphi _1 (c_{\ell } hA) c_{\ell } hF(t_n, u_n) + \varphi _2 (c_{\ell } hA) \tfrac{c^2_{\ell }}{c_2} h D_{n2}, \quad \ell =3,4 \\ U_{nm}&= u_n + \varphi _1 (c_m hA)c_m hF(t_n, u_n)+ \varphi _{2} (c_m hA) c^2_m h \big (\tfrac{c_4 }{c_3 (c_4-c_3)}D_{n3} +\tfrac{c_3 }{c_4 (c_3-c_4)} D_{n4}\big )\\&\quad + \varphi _{3} (c_m hA) c^3_m h \big (\tfrac{2}{c_3 (c_3-c_4)} D_{n3} -\tfrac{2}{c_4 (c_3-c_4)} D_{n4}\big ), \quad m=5,6,7 \\ U_{nq}&= u_n + \varphi _1 (c_q hA)c_q hF(t_n, u_n) + \varphi _{2} (c_q hA) c^2_q h \big (\alpha _5 D_{n5} +\alpha _6 D_{n6}+\alpha _7 D_{n7} \big )\\&\quad - \varphi _{3} (c_q hA) c^3_q h \big (\beta _5 D_{n5} +\beta _6 D_{n6}+\beta _7 D_{n7}\big ) \\&\quad + \varphi _{4} (c_q hA) c^4_q h \big (\gamma _5 D_{n5} +\gamma _6 D_{n6}+\gamma _7 D_{n7}\big ), \quad q=8,9,10 \\ u_{n+1}&= u_n + \varphi _1 (hA) h F(t_n, u_n)+ \varphi _{2} (hA) h \big (\alpha _8 D_{n8} + \alpha _9 D_{n9} +\alpha _{10} D_{n10} \big ) \\&\quad -\varphi _{3} (hA) h \big (\beta _8 D_{n8} + \beta _9 D_{n9} +\beta _{10} D_{n10} \big ) +\varphi _{4} (hA) h \big (\gamma _8 D_{n8} + \gamma _9 D_{n9} +\gamma _{10} D_{n10} \big ), \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \alpha _i = \dfrac{c_k c_l}{c_i (c_i-c_k)(c_i - c_l)},\quad \beta _i = \dfrac{2(c_k+ c_l)}{c_i (c_i-c_k)(c_i - c_l)}, \quad \gamma _i = \dfrac{6}{c_i (c_i-c_k)(c_i - c_l)}\nonumber \\ \end{aligned}$$
(4.26)

with \(i, k, l \in \{5, 6, 7\}\) or \(i, k, l \in \{8, 9, 10\}\), where in each case \(i, k, l\) are distinct indices and \(c_i, c_k, c_l\) are distinct (positive) nodes.

For our numerical experiments, we choose \(c_2=c_3=c_5=\tfrac{1}{2}\), \(c_4=c_6=\tfrac{1}{3}\), \(c_7=\tfrac{1}{4}\), \(c_8=\tfrac{3}{10}\), \(c_9=\tfrac{3}{4}\), and \(c_{10}=1\) (satisfying (4.14)).

Remark 4.2

(A comparison with \({\mathtt {expRK5s8}}\)). Although the new fifth-order method \({\mathtt {expRK5s10}}\) has 10 stages (compared to the 8 stages of \({\mathtt {expRK5s8}}\) displayed in Section 2), its special structure makes it much more efficient to implement. In particular, every \(U_{ni}\) in this scheme involves only one linear combination of \(\varphi _k (c_i h A)v_k\), which can be computed with a single evaluation; and, more importantly, due to the same format of the stages within each of the three groups \(\{U_{n3}, U_{n4}\}\), \(\{U_{n5}, U_{n6}, U_{n7}\}\), and \(\{U_{n8}, U_{n9}, U_{n10}\}\) (the same linear combination with different inputs \(c_i\)), they can be computed simultaneously or implemented in parallel (see Sect. 5). This makes \({\mathtt {expRK5s10}}\) behave like a 5-stage method, thereby requiring only 5 sequential evaluations in each step. Moreover, while \({\mathtt {expRK5s8}}\) requires weakening 9 out of the 16 order conditions of Table 1, \({\mathtt {expRK5s10}}\) requires only one condition (number 8) to hold in the weakened form. Note that by following a derivation similar to that of \({\mathtt {expRK5s10}}\), one can obtain a scheme that satisfies all the stiff order conditions in Table 1 in the strong sense with \(s=11\). Such a scheme, however, still behaves like a 5-stage method, so we do not discuss this case further.

(b) The case \(s=9\) (which does not work): Clearly, in this case we have fewer degrees of freedom than in the case \(s=10\) when solving the order conditions in Table 1. Nonetheless, one can still proceed in a similar way as for \(s=10\). Again, the analysis strongly suggests \(b_2=b_3=b_4=b_5=b_6=0\) (which yields \(b_7, b_8, b_{9} \ne 0\) from conditions 1, 2, 4) and

$$\begin{aligned} \psi _{2,j}=\psi _{3,j}=\psi _{4,j}=0 \ (j=7, 8, 9) \end{aligned}$$
(4.27)

in order to satisfy conditions 1, 2, 3, 4, 5, 7, 9, 13, 15, 16 in the strong form. With this, conditions 6 and 10 now become

$$\begin{aligned} \sum _{j=2}^{6} (b_7 J a_{7j}+ b_8 J a_{8j}+ b_{9} J a_{9j}) J \psi _{m,j}=0 \ \ (m=2, 3). \end{aligned}$$
(4.28)

Again, due to the fact that \(\psi _{2,2}, \psi _{3,2} \ne 0\) and either \( \psi _{2,3}\) or \( \psi _{3,3}\) must be nonzero, one needs to enforce \( a_{7j}=a_{8j}=a_{9j}=0 \ (j=2, 3) \) in (4.28). Using this to solve (4.27) for \(j=7\) (\(\psi _{2,7}=\psi _{3,7}=\psi _{4,7}=0\)) gives a unique solution (with \(c_4, c_5, c_6 >0\) distinct) with \(a_{74}, a_{75}, a_{76} \ne 0\), which then determines \(U_{n7}\). Next, one can solve (4.27) for \(j=8, 9\) to obtain \( U_{n8}, U_{n9}\) that are independent of \(U_{n7}\), as well as of each other, by setting the three free parameters \(a_{87}=a_{97}=a_{98}=0\). As a result, one gets \(a_{7j}, a_{8j}, a_{9j} \ne 0 \ (j=5, 6)\). This immediately suggests \( \psi _{2,j}=\psi _{3,j}=0 \ (j=4, 5, 6) \) in order to completely fulfill (4.28) with an arbitrary square matrix J. With all of this in place, conditions 12 and 14 are automatically fulfilled, and condition 11 reduces to

$$\begin{aligned} \sum _{i=7}^{9} b_i J \sum _{j=4}^{6} a_{ij} J \big (a_{j2}J\psi _{2,2}+ a_{j3}J\psi _{2,3}\big )=0. \end{aligned}$$
(4.29)

Clearly, since \(b_7, b_8, b_{9} \ne 0\), \(a_{7j}, a_{8j}, a_{9j} \ne 0 \ (j=4, 5, 6)\), and \(\psi _{2,2} \ne 0\), (4.29) can be satisfied in the strong sense only if one of the following holds: \(a_{j2}=a_{j3}=0 \) or \(a_{j2}=\psi _{2,3}=0 \ (j=4, 5, 6)\). Unfortunately, either of these requirements contradicts \( \psi _{2,j}=\psi _{3,j}=0 \ (j=4, 5, 6)\), which is needed for conditions 6 and 10 as mentioned above. For example, solving \(\psi _{2,4}=\psi _{3,4}=0\) results in \(a_{42}, a_{43} \ne 0\).

5 Implementation details of the fourth- and fifth-order schemes

In this section, we present the implementation details of the old and new fourth- and fifth-order expRK schemes (\({\mathtt {expRK4s5}}\), \({\mathtt {expRK5s8}}\), \({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s10}}\)) mentioned above.

As mentioned in Sect. 2.1, we will use the MATLAB routine phipm_simul_iom (described in detail in [19]) to implement the expRK methods. In particular, given the following inputs: an array of scaling factors \({\mathtt {t}}=[\rho _1, \ldots , \rho _{r}]\) with \(0<\rho _1<\rho _2<\cdots <\rho _r \le 1\) (\({\mathtt {t}}\) may also be a positive scalar), an n-by-n matrix M, a set of column vectors \({\mathtt {V}}=[v_0,\ldots ,v_q]\) (each \(v_i\) is an n-by-1 vector), a tolerance \({\mathtt {tol}}\), an initial value m for the dimension of the Krylov subspace, and an incomplete orthogonalization length \({\mathtt {iom}}\), a call to this function

$$\begin{aligned} {\mathtt {phipm\_simul\_iom(t, M, V, tol, m, iom)}} \end{aligned}$$
(5.1)

simultaneously computes the following r linear combinations

$$\begin{aligned} L_{\rho _i ,{\mathtt {V}}}= & {} \varphi _0 (\rho _i M)v_0 + \varphi _1 (\rho _i M) \rho _i v_1 +\varphi _2 (\rho _i M) \rho _{i}^2 v_2+\cdots +\varphi _{q}(\rho _i M) \rho _{i} ^q v_{q}, \nonumber \\&\quad 1 \le i \le r. \end{aligned}$$
(5.2)

Note that, by setting \(V_{j}=\rho _{i}^{j} v_j \) (\(j=0,\cdots ,q\)), (5.2) becomes (2.2). In other words, all the linear combinations in (2.2) (if the \(V_j\) are given instead of the \(v_j\)) can then be computed at the same time with one call (5.1) by using the scaled vectors \(v_j= V_{j}/ \rho _{i}^{j}\) for the input \({\mathtt {V}}\).
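This rescaling can be illustrated with a small scalar example of our own (in Python rather than MATLAB, since phipm_simul_iom itself is a MATLAB routine): evaluating (5.2) with the scaled inputs \(v_j = V_j/\rho^j\) reproduces the unscaled combination \(\sum_k \varphi_k(\rho M) V_k\):

```python
import math

def phi(k, z):
    """Scalar phi_k(z) from (1.4), evaluated by its Taylor series."""
    return sum(z**j / math.factorial(j + k) for j in range(25))

M, rho = -1.3, 0.5            # scalar stand-ins for hA and a node c_i
V = [0.0, 2.0, -0.7, 0.4]     # given vectors V_j in the desired combination

# the combination sum_k phi_k(rho*M) V_k that we actually want
target = sum(phi(k, rho * M) * Vk for k, Vk in enumerate(V))

# (5.2) with the scaled inputs v_j = V_j / rho**j recovers the same value
v = [Vj / rho**j for j, Vj in enumerate(V)]
L = sum(phi(k, rho * M) * rho**k * vk for k, vk in enumerate(v))

print(abs(L - target) < 1e-12)  # True
```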

In the following, we set

$$\begin{aligned} M=hA, \quad \rho _i= c_i, \quad v=h F (t_n, u_n), \quad d_{i}=hD_{ni}. \end{aligned}$$
(5.3)

to simplify the notation in presenting the implementation details of the fourth- and fifth-order methods mentioned above. When calling (5.1), we use \({\mathtt {tol}}=10^{-12}\), \({\mathtt {m}}=1\) (the default value), and \({\mathtt {iom}}=2\) (as in [19]).

Implementation of \({\mathtt {expRK4s5}}\) (\(c_2=c_3=c_5=\tfrac{1}{2}, c_4=1 \)): As discussed in Remark 2.1, \({\mathtt {expRK4s5}}\) requires a sequential implementation of the following 6 different linear combinations of the form (5.2), corresponding to 6 calls to phipm_simul_iom:

  1. (i)

    Evaluate \(L_{c_2,{\mathtt {V}}}\) with \({\mathtt {t}}=c_2, {\mathtt {V}}=[0, v]\) to get \(U_{n2}=u_n + L_{c_2,{\mathtt {V}}}\).

  2. (ii)

    Evaluate \(L_{c_3,{\mathtt {V}}}\) with \({\mathtt {t}}=c_3, {\mathtt {V}}=[0, v, d_2/c^2_3]\) to get \(U_{n3}=u_n + L_{c_3,{\mathtt {V}}}\).

  3. (iii)

    Evaluate \(L_{c_4,{\mathtt {V}}}\) with \({\mathtt {t}}=c_4, {\mathtt {V}}=[0, v, d_2 +d_3]\) to get \(U_{n4}=u_n + L_{c_4,{\mathtt {V}}}\).

  4. (iv)

    Evaluate \(L_{c_5,{\mathtt {V_1}}}\) with \({\mathtt {t}}=c_5, {\mathtt {V}}_1=[0, v, 2 d_{2} + 2 d_{3} - d_{4},(-d_{2} - d_{3} + d_{4})/c^2_5] \) and

  5. (v)

    Evaluate \(L_{c_4,{\mathtt {V_2}}}\) with \({\mathtt {t}}=c_4, {\mathtt {V}}_2=[0, 0, ( d_{2} + d_{3} - d_{4})/4,(-d_{2} - d_{3} + d_{4})] \) to get \(U_{n5}=u_n + L_{c_5,{\mathtt {V}}_1}+L_{c_4,{\mathtt {V_2}}} \).

  6. (vi)

    Evaluate \(L_{1,{\mathtt {V}}}\) with \({\mathtt {t}}=1, {\mathtt {V}}=[0,v, -d_4 +5d_5, 4d_4-8d_5 ]\) to get \(u_{n+1}=u_n + L_{1,{\mathtt {V}}}\).

Since each \(d_i=hD_{ni}\) depends on \(U_{ni}\), these 6 evaluations must be carried out sequentially.

Implementation of \({\mathtt {expRK4s6}}\) (\(c_2=c_3=\frac{1}{2}, c_4=c_6=\frac{1}{3}, c_5=\frac{5}{6}\)): As discussed in Remark 4.1, \({\mathtt {expRK4s6}}\) can be implemented like a 4-stage method by performing the following 4 sequential evaluations, corresponding to 4 calls to phipm_simul_iom:

  1. (i)

    Evaluate \(L_{c_2,{\mathtt {V}}}\) with \({\mathtt {t}}=c_2, {\mathtt {V}}=[0, v]\) to get \(U_{n2}=u_n + L_{c_2,{\mathtt {V}}}\).

  2. (ii)

    Evaluate \(L_{c_4,{\mathtt {V}}}\) and \(L_{c_3,{\mathtt {V}}}\) simultaneously using \({\mathtt {t}}=[c_4, c_3], {\mathtt {V}}=[0, v, d_2/c_2]\) to get both \(U_{n3}=u_n + L_{c_3,{\mathtt {V}}}\) and \(U_{n4}=u_n + L_{c_4,{\mathtt {V}}}\).

  3. (iii)

    Evaluate \(L_{c_5,{\mathtt {V}}}\) and \(L_{c_6,{\mathtt {V}}}\) simultaneously with \({\mathtt {t}}=[c_6, c_5], {\mathtt {V}}=[0, v, \tfrac{-c_4}{(c_3-c_4)c_3}d_3 +\tfrac{c_3}{(c_3-c_4)c_4}d_4, \tfrac{1}{(c_3-c_4)c_3}d_3 -\tfrac{1}{(c_3-c_4)c_4}d_4]\) to get both \(U_{n5}=u_n + L_{c_5,{\mathtt {V}}}\) and \(U_{n6}=u_n + L_{c_6,{\mathtt {V}}}\).

  4. (iv)

    Evaluate \(L_{1,{\mathtt {V}}}\) with \({\mathtt {t}}=1, {\mathtt {V}}=[0,v, \tfrac{1}{c_5-c_6}(\tfrac{-c_6}{c_5}d_5 + \tfrac{c_5}{c_6}d_6), \tfrac{2}{c_5-c_6}(\tfrac{1}{c_5}d_5 - \tfrac{1}{c_6}d_6) ]\) to get \(u_{n+1}=u_n + L_{1,{\mathtt {V}}}\).

Implementation of \({\mathtt {expRK5s8}}\) (\(c_2=c_3=c_5=\frac{1}{2}, c_4=\frac{1}{4}, c_6=\frac{1}{5}, c_7=\frac{2}{3}, c_8=1\)): As discussed in Remark 2.1, \({\mathtt {expRK5s8}}\) requires a sequential implementation of 11 different linear combinations of the form (5.2), corresponding to the following 11 calls to phipm_simul_iom:

  1. (i)

    Evaluate \(L_{c_2,{\mathtt {V}}}\) with \({\mathtt {t}}=c_2, {\mathtt {V}}=[0, v]\) to get \(U_{n2}=u_n + L_{c_2,{\mathtt {V}}}\).

  2. (ii)

    Evaluate \(L_{c_3,{\mathtt {V}}}\) with \({\mathtt {t}}=c_3, {\mathtt {V}}=[0, v, d_2/c^2_3]\) to get \(U_{n3}=u_n + L_{c_3,{\mathtt {V}}}\).

  3. (iii)

    Evaluate \(L_{c_4,{\mathtt {V}}}\) with \({\mathtt {t}}=c_4, {\mathtt {V}}=[0, v, d_3/c^2_4]\) to get \(U_{n4}=u_n + L_{c_4,{\mathtt {V}}}\).

  4. (iv)

    Evaluate \(L_{c_5,{\mathtt {V}}}\) with \({\mathtt {t}}=c_5, {\mathtt {V}}=[0, v, (- d_{3} + 4d_{4})/c^2_5, (2d_{3} - 4d_{4})/c^3_5 ]\) to get \(U_{n5}=u_n + L_{c_5,{\mathtt {V}}}\).

  5. (v)

    Evaluate \(L_{c_6,{\mathtt {V}}}\) with \({\mathtt {t}}=c_6, {\mathtt {V}}=[0, v, (8d_{4} - 2d_{5})/c^2_6, (-32d_{4} +16d_{5})/c^3_6 ]\) to get \(U_{n6}=u_n + L_{c_6,{\mathtt {V}}}\).

  6. (vi)

    Evaluate \(L_{c_7,{\mathtt {V_1}}}\) with \({\mathtt {t}}=c_7, {\mathtt {V}}_1=[0, v, (\tfrac{-16}{27} d_{5} +\tfrac{100}{27} d_{6})/c^2_7, (\tfrac{320}{81} d_{5} -\tfrac{800}{81} d_{6})/c^3_7] \) and

  7. (vii)

    Evaluate \(L_{c_6,{\mathtt {V_2}}}\) with \({\mathtt {t}}=c_6, {\mathtt {V}}_2=[0, 0, (\tfrac{-20}{81} d_{4} +\tfrac{5}{243} d_{5}+\tfrac{125}{486} d_{6})/c^2_6, (\tfrac{16}{81} d_{4} -\tfrac{4}{243} d_{5}-\tfrac{50}{243} d_{6})/c^3_6] \) to get \(U_{n7}=u_n + L_{c_7,{\mathtt {V}}_1}+L_{c_6,{\mathtt {V_2}}} \).

  8. (viii)

    Evaluate \(L_{c_8,{\mathtt {V_1}}}\) with \({\mathtt {t}}=c_8, {\mathtt {V}}_1=[0, v, (\tfrac{-16}{3} d_{5} +\tfrac{250}{21} d_{6}+\tfrac{27}{14} d_{7})/c^2_8, (\tfrac{208}{3} d_{5} -\tfrac{250}{3} d_{6}- 27 d_{7})/c^3_8, (-240d_{5} + \tfrac{1500}{7} d_{6}+ \tfrac{810}{7} d_{7})/c^4_8 ] \) and

  9. (ix)

    Evaluate \(L_{c_6,{\mathtt {V_2}}}\) with \({\mathtt {t}}=c_6, {\mathtt {V}}_2=[0, 0, (\tfrac{-4}{7} d_{5} + \tfrac{25}{49} d_{6} +\tfrac{27}{98} d_{7}) /c^2_6, (\tfrac{8}{5} d_{5} - \tfrac{10}{7} d_{6} - \tfrac{27}{35} d_{7})/c^3_6, (\tfrac{-48}{35} d_{5} + \tfrac{60}{49} d_{6} + \tfrac{162}{245} d_{7})/c^4_6]\) and

  10. (x)

    Evaluate \(L_{c_7,{\mathtt {V_3}}}\) with \({\mathtt {t}}=c_7, {\mathtt {V}}_3=[0, 0, (\tfrac{-288}{35} d_{5} + \tfrac{360}{49} d_{6} +\tfrac{972}{245} d_{7}) /c^2_7, (\tfrac{384}{5} d_{5} - \tfrac{480}{7} d_{6} - \tfrac{1296}{35} d_{7})/c^3_7, (\tfrac{-1536}{7} d_{5} + \tfrac{9600}{49} d_{6} + \tfrac{5184}{49} d_{7})/c^4_7] \) to get \(U_{n8}=u_n + L_{c_8,{\mathtt {V}}_1}+ L_{c_6,{\mathtt {V}}_2}+L_{c_7,{\mathtt {V_3}}} \).

  11. (xi)

    Evaluate \(L_{1,{\mathtt {V}}}\) with \({\mathtt {t}}=1, {\mathtt {V}}=[0,v, \tfrac{125}{14} d_{6} -\tfrac{27}{14} d_{7} + \tfrac{1}{2} d_{8}, \tfrac{-625}{14} d_{6} + \tfrac{162}{7} d_{7} - \tfrac{13}{2} d_{8},\tfrac{1125}{14} d_{6} - \tfrac{405}{7} d_{7} + \tfrac{45}{2} d_{8} ]\) to get \(u_{n+1}=u_n + L_{1,{\mathtt {V}}}\).

Implementation of \({\mathtt {expRK5s10}}\) (\(c_2=c_3=c_5=\tfrac{1}{2}\), \(c_4=c_6=\tfrac{1}{3}\), \(c_7=\tfrac{1}{4}\), \(c_8=\tfrac{3}{10}\), \(c_9=\tfrac{3}{4}\), and \(c_{10}=1\)): As discussed in Remark 4.2, \({\mathtt {expRK5s10}}\) can be implemented like a 5-stage method by performing the following 5 sequential evaluations, corresponding to 5 calls to phipm_simul_iom:

  1. (i)

    Evaluate \(L_{c_2,{\mathtt {V}}}\) with \({\mathtt {t}}=c_2, {\mathtt {V}}=[0, v]\) to get \(U_{n2}=u_n + L_{c_2,{\mathtt {V}}}\).

  2. (ii)

    Evaluate \(L_{c_4,{\mathtt {V}}}\) and \(L_{c_3,{\mathtt {V}}}\) simultaneously using \({\mathtt {t}}=[c_4, c_3], {\mathtt {V}}=[0, v, d_2/c_2]\) to get both \(U_{n3}=u_n + L_{c_3,{\mathtt {V}}}\) and \(U_{n4}=u_n + L_{c_4,{\mathtt {V}}}\).

  3. (iii)

    Evaluate \(L_{c_5,{\mathtt {V}}}\), \(L_{c_6,{\mathtt {V}}}\), and \(L_{c_7,{\mathtt {V}}}\) simultaneously using \({\mathtt {t}}=[c_7, c_6, c_5]\), \({\mathtt {V}}=[0, v, \tfrac{c_4 }{c_3 (c_4-c_3)}d_{3} +\tfrac{c_3 }{c_4 (c_3-c_4)} d_{4}, \tfrac{2}{c_3 (c_3-c_4)} d_{3} -\tfrac{2}{c_4 (c_3-c_4)} d_{4}]\) to get \(U_{n5}=u_n + L_{c_5,{\mathtt {V}}}\), \(U_{n6}=u_n + L_{c_6,{\mathtt {V}}}\), \(U_{n7}=u_n + L_{c_7,{\mathtt {V}}}\).

  4. (iv)

    Evaluate \(L_{c_8,{\mathtt {V}}}\), \(L_{c_9,{\mathtt {V}}}\), and \(L_{c_{10},{\mathtt {V}}}\) simultaneously using \({\mathtt {t}}=[c_8, c_9, c_{10}]\), \({\mathtt {V}}=[0, v, \alpha _5 d_{5} +\alpha _6 d_{6}+\alpha _7 d_{7}, -\beta _5 d_{5} -\beta _6 d_{6}-\beta _7 d_{7}, \gamma _5 d_{5} +\gamma _6 d_{6}+\gamma _7 d_{7} ]\) to get \(U_{n8}=u_n + L_{c_8,{\mathtt {V}}}\), \(U_{n9}=u_n + L_{c_9,{\mathtt {V}}}\), \(U_{n10}=u_n + L_{c_{10},{\mathtt {V}}}\).

  5. (v)

    Evaluate \(L_{1,{\mathtt {V}}}\) with \({\mathtt {t}}=1, {\mathtt {V}}=[0,v, \alpha _8 d_{8} + \alpha _9 d_{9} +\alpha _{10} d_{10}, -\beta _8 d_{8} - \beta _9 d_{9} -\beta _{10} d_{10}, \gamma _8 d_{8} + \gamma _9 d_{9} +\gamma _{10} d_{10} ]\) to get \(u_{n+1}=u_n + L_{1,{\mathtt {V}}}\) (coefficients \(\alpha _i, \beta _i, \gamma _i\) are given in (4.26)).

6 Numerical experiments

In this section, we demonstrate the efficiency of our newly derived fourth- and fifth-order expRK time integration methods (\({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s10}}\)). Specifically, we compare their performance against the existing methods of the same orders (\({\mathtt {expRK4s5}}\), \({\mathtt {expRK5s8}}\)) on several examples of stiff PDEs. All the numerical simulations are performed in MATLAB on a single workstation (an iMac with a 3.6 GHz Intel Core i7 and 32 GB of 2400 MHz DDR4 memory).

Example 6.1

(A one-dimensional semilinear parabolic problem [8]): We first verify the order of convergence of the newly derived fourth- and fifth-order expRK schemes (\({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s10}}\)) by considering the following PDE for \(u(x,t)\), \(x\in [0,1], t\in [0,1]\), subject to homogeneous Dirichlet boundary conditions,

$$\begin{aligned} \frac{\partial u(x,t)}{\partial t}- \frac{\partial ^2 u(x,t)}{\partial x^2} =\frac{1}{1+u^{2}(x,t)}+\varPhi (x,t), \end{aligned}$$
(6.1)

whose exact solution is known to be \(u(x,t)=x(1-x)\mathrm{e}^t\) for a suitable choice of the source function \(\varPhi (x,t)\).

Spatial discretization: For this example, we use standard second-order finite differences with 200 grid points, leading to a very stiff system of the form (1.1) (with \(\Vert A\Vert _{\infty } \approx 1.6 \times 10^5\)).
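The quoted magnitude of \(\Vert A\Vert _{\infty }\) is easy to reproduce. The following short check is our own, assuming 200 interior grid points on \([0,1]\) with mesh width \(\Delta x = 1/201\), so that the infinity norm of the scaled tridiagonal Laplacian is its maximal absolute row sum, \(4/\Delta x^2\):

```python
N = 200                          # interior grid points on [0, 1]
inv_dx = N + 1                   # 1/dx with Dirichlet boundaries at x = 0 and x = 1

# A = (1/dx^2) * tridiag(1, -2, 1); its infinity norm is the maximal
# absolute row sum, attained at any interior row: (1 + 2 + 1)/dx^2
norm_inf = (1.0 + abs(-2.0) + 1.0) * inv_dx**2

print(norm_inf)                  # 161604.0, i.e. about 1.6e5 as stated
```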

The resulting system is then integrated on the time interval [0, 1] using constant step sizes, corresponding to the number of time steps \(N=4, 8, 16, 32, 64\). The time integration errors at the final time \(t=1\) are measured in the maximum norm.

In Fig. 1, we plot the orders of convergence for all the employed integrators in the left diagram and the total CPU time versus the global errors in the right diagram. The left diagram clearly shows perfect agreement with our convergence result in Theorem 3.1, meaning that the two new integrators \({\mathtt {expRK4s6}}\) and \({\mathtt {expRK5s10}}\) fully achieve orders 4 and 5, respectively. When compared to the old integrators of the same orders, \({\mathtt {expRK4s5}}\) and \({\mathtt {expRK5s8}}\), we note that, for the same number of time steps, \({\mathtt {expRK4s6}}\) is slightly more accurate and much faster than \({\mathtt {expRK4s5}}\) (see the right diagram). In a similar manner, \({\mathtt {expRK5s10}}\) gives almost identical global errors but is much faster than \({\mathtt {expRK5s8}}\). Finally, we observe that, for this example, for global errors larger than \(10^{-6}\) the new fourth-order method \({\mathtt {expRK4s6}}\) is the fastest, and for more stringent errors \({\mathtt {expRK5s10}}\) is the fastest integrator.

Fig. 1

Order plots (left) and total CPU times (right) of \({\mathtt {expRK4s5}}\), \({\mathtt {expRK4s6}}\) , \({\mathtt {expRK5s8}}\), and \({\mathtt {expRK5s10}}\) when applied to (6.1). The global errors at time \(t=1\) are plotted as functions of the number of time steps (left) and the total CPU time in second (right). For comparison, straight lines with slopes 4 and 5 are added

Example 6.2

(A nonlinear Schrödinger equation [2, 5]): We consider the following one-dimensional nonlinear Schrödinger (NLS) equation with periodic boundary conditions

$$\begin{aligned} \begin{aligned} {\mathtt {i}} \frac{\partial \varPsi (x,t)}{\partial t}&= - \frac{\partial ^2 \varPsi (x,t)}{\partial x^2} + \big (V(x)+ \lambda |\varPsi (x,t)|^2\big )\varPsi (x,t), \\ \varPsi (-\pi ,t)&= \varPsi (\pi ,t), \quad t \ge 0\\ \varPsi (x,0)&=\varPsi _0 (x), \quad x \in [-\pi , \pi ] \end{aligned} \end{aligned}$$
(6.2)

where the potential function is \(V(x)=\dfrac{1}{1+\sin ^2 (x)}\), the initial condition is \(\varPsi _0 (x)=\mathrm{e}^{\sin (2x)}\), and the constant is \(\lambda =1\) (see [2]).

Spatial discretization: For this example, we use a discrete Fourier transform \({\mathscr {F}}\) with \(ND=128\) modes, leading to a mildly stiff system of the form (1.1) with

$$\begin{aligned} \begin{aligned} A&= \text {diag}(-{\mathtt {i}} k^2),\quad k= -\frac{ND}{2}+1, \ldots , \frac{ND}{2}= -63, \ldots , 64, \\ g(t,u)&=-{\mathtt {i}}\, {\mathscr {F}}\big ((V(x)+ \lambda |{\mathscr {F}}^{-1}(u)|^2){\mathscr {F}}^{-1}(u)\big ). \end{aligned} \end{aligned}$$
(6.3)
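For concreteness, the discretization (6.3) can be sketched with NumPy's FFT routines. This is a minimal illustration, not the authors' code; the names `g` and `u0` mirror the notation of (1.1), and the standard FFT ordering of the wavenumbers is assumed (since \(k^2\) is even in \(k\), having mode \(-64\) in place of \(+64\) does not change \(A\)):

```python
import numpy as np

ND = 128                                        # number of Fourier modes
x = -np.pi + 2 * np.pi * np.arange(ND) / ND     # uniform grid on [-pi, pi)
k = np.fft.fftfreq(ND, d=1.0 / ND)              # integer wavenumbers, FFT ordering
V = 1.0 / (1.0 + np.sin(x) ** 2)                # potential V(x)
lam = 1.0

# diagonal linear part A = diag(-i k^2) acting on the Fourier coefficients
A = -1j * k ** 2

def g(t, u):
    """Nonlinearity of (6.3): -i F((V + lam |F^{-1}u|^2) F^{-1}u)."""
    psi = np.fft.ifft(u)                        # back to physical space
    return -1j * np.fft.fft((V + lam * np.abs(psi) ** 2) * psi)

# initial condition Psi_0(x) = exp(sin(2x)), transformed to Fourier space
u0 = np.fft.fft(np.exp(np.sin(2 * x)))
```

Because \(A\) is diagonal, the matrix functions \(\varphi _k(hA)\) in (1.2) reduce to scalar evaluations of \(\varphi _k\) on the diagonal entries, which is what makes this test problem convenient.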

Next, we integrate this system on the time interval [0, 3] with constant step sizes, corresponding to the numbers of time steps \(N= 64, 128\), 256, 512, 1024. Since the exact solution \( \varPsi (x,t)\) of (6.2) is unknown, a reliable reference solution is computed by the stiff solver \({\mathtt {ode15s}}\) with \(ATOL=RTOL=10^{-14}\). Again, the time integration errors are measured in a discrete maximum norm at the final time \(t= 3\).
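The observed orders reported throughout this section are read off from the decay of these maximum-norm errors as the number of steps doubles. A small helper makes this concrete (a sketch with hypothetical helper names and an artificial fourth-order error model, not the actual data of this example):

```python
import numpy as np

def max_norm_error(u_num, u_ref):
    """Discrete maximum norm of the difference at the final time."""
    return np.max(np.abs(u_num - u_ref))

def observed_orders(errors):
    """Observed order between successive runs in which the step size is
    halved (N doubles): p = log2(err_i / err_{i+1})."""
    e = np.asarray(errors, dtype=float)
    return np.log2(e[:-1] / e[1:])

# sanity check with an artificial error model err = C * h^4 (order 4):
h = 3.0 / np.array([64, 128, 256, 512, 1024])
print(observed_orders(2.5 * h ** 4))            # -> [4. 4. 4. 4.]
```

In a double-logarithmic order plot, such errors lie on a straight line whose slope is the order, which is why reference lines with slopes 4 and 5 are drawn in the figures.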

In the two double-logarithmic diagrams of Fig. 2, we plot the accuracy of the four employed integrators (\({\mathtt {expRK4s5}}\), \({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s8}}\), and \({\mathtt {expRK5s10}}\)) as functions of the number of time steps (left) and the total CPU time (right). The left diagram clearly indicates that the two new integrators \({\mathtt {expRK4s6}}\) and \({\mathtt {expRK5s10}}\) achieve their expected orders 4 and 5. While \({\mathtt {expRK5s10}}\) is a little more accurate than \({\mathtt {expRK5s8}}\), \({\mathtt {expRK4s6}}\) is much more accurate than \({\mathtt {expRK4s5}}\) for the same number of time steps, meaning that it can take much larger time steps while achieving the same accuracy. Moreover, the right diagram, the efficiency plot, indicates that both \({\mathtt {expRK4s6}}\) and \({\mathtt {expRK5s10}}\) are much faster than their counterparts \({\mathtt {expRK4s5}}\) and \({\mathtt {expRK5s8}}\), respectively. More specifically, a similar picture as before emerges: for lower accuracy requirements, say errors around \(10^{-7}\), the new fourth-order method \({\mathtt {expRK4s6}}\) is the most efficient, whereas for errors around \(10^{-8}\) or tighter the new fifth-order method \({\mathtt {expRK5s10}}\) is the most efficient.

Fig. 2

Order plots (left) and total CPU times (right) of \({\mathtt {expRK4s5}}\), \({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s8}}\), and \({\mathtt {expRK5s10}}\) when applied to Example 6.2. The errors at time \(t=3\) are plotted as functions of the number of time steps (left) and the total CPU time in seconds (right). For comparison, straight lines with slopes 4 and 5 are added

Example 6.3

(A 2D Gray–Scott model [3, 7]): Consider the following two-dimensional reaction–diffusion system, the Gray–Scott model, for \(u = u(x, y, t ), \ v= v(x, y, t)\) on the square \(\varOmega =[0, L]^2\) (here, we choose \(L=1.5\)), subject to periodic boundary conditions

$$\begin{aligned} \begin{aligned} \frac{\partial u}{\partial t}&=d_u \varDelta u - uv^2+ \alpha (1-u),\\ \frac{\partial v}{\partial t}&= d_v \varDelta v +uv^2 -(\alpha +\beta )v, \end{aligned} \end{aligned}$$
(6.4)

where \(\varDelta \) is the Laplacian operator, the diffusion coefficients \(d_u=0.02, \ d_v=0.01\), and the bifurcation parameters \(\alpha =0.065,\ \beta =0.035\). The initial conditions are Gaussian pulses

$$\begin{aligned} u(x,y,0)=1-e^{-150\big ((x-L)^2+ (y-L)^2\big )}, \ v(x,y,0)=e^{-150\big ((x-L)^2+ 2(y-L)^2\big )}. \end{aligned}$$

Spatial discretization: For this example, we use standard second-order finite differences with 150 grid points in each direction and mesh width \(\varDelta x = \varDelta y = L/150\). This gives a stiff system of the form (1.1).
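A minimal sketch of this discretization in Python (illustrative helper names, not the authors' code; `np.roll` implements the periodic wrap-around, and in the splitting (1.1) the diffusion terms would form the linear part \(Au\) while the reaction terms form the nonlinearity \(g\)):

```python
import numpy as np

L_dom = 1.5                                     # side length of the square
n = 150                                         # grid points per direction
dx = L_dom / n                                  # mesh width (periodic grid)
du, dv = 0.02, 0.01                             # diffusion coefficients
alpha, beta = 0.065, 0.035                      # bifurcation parameters

def lap(w):
    """Second-order 5-point Laplacian with periodic boundary conditions."""
    return (np.roll(w, 1, 0) + np.roll(w, -1, 0)
            + np.roll(w, 1, 1) + np.roll(w, -1, 1) - 4 * w) / dx ** 2

def rhs(t, u, v):
    """Right-hand side of the Gray-Scott system (6.4)."""
    uv2 = u * v ** 2
    return (du * lap(u) - uv2 + alpha * (1 - u),
            dv * lap(v) + uv2 - (alpha + beta) * v)
```

The stiffness comes from the discrete diffusion operators, whose eigenvalues grow like \(1/\varDelta x^2\), which is precisely the regime exponential integrators are designed for.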

The system is then solved on the time interval [0, 2] using constant step sizes. In the absence of an analytical solution of (6.4), a high-accuracy reference solution is computed using the \({\mathtt {expRK4s6}}\) method with a sufficiently small time step. Errors are measured in a discrete maximum norm at the final time \(t= 2\).

In Fig. 3, using the numbers of time steps \(N=32, 64, 128\), 256, 512, 1024, we again display the order plots of the employed integrators. One can see that \({\mathtt {expRK4s6}}\) is much more accurate than \({\mathtt {expRK4s5}}\) and \({\mathtt {expRK5s10}}\) is slightly more accurate than \({\mathtt {expRK5s8}}\).

Fig. 3

Order plots of \({\mathtt {expRK4s5}}\), \({\mathtt {expRK4s6}}\) , \({\mathtt {expRK5s8}}\), and \({\mathtt {expRK5s10}}\) when applied to Example 6.3. The errors at time \(t=2\) are plotted as functions of the number of time steps. For comparison, straight lines with slopes 4 and 5 are added

In Fig. 4, we display the efficiency plot, for which the time step sizes were chosen so that each integrator attains approximately the same error thresholds \(10^{-i}, \ i= 5,\ldots , 11\). The corresponding numbers of time steps are displayed in Table 2; as seen there, to reach about the same level of accuracy, the new methods need fewer steps than the old ones of the same order, meaning that they can take larger step sizes. Again, \({\mathtt {expRK4s6}}\) is much faster than \({\mathtt {expRK4s5}}\), and it is interesting that this new fourth-order method turns out to be the most efficient overall (although for error thresholds tighter than \(10^{-11}\) the new fifth-order method \({\mathtt {expRK5s10}}\) seems to become the most efficient).

Fig. 4

Total CPU times of \({\mathtt {expRK4s5}}\), \({\mathtt {expRK4s6}}\), \({\mathtt {expRK5s8}}\), and \({\mathtt {expRK5s10}}\) when applied to Example 6.3. The time step sizes were chosen in such a way that each integrator achieves approximately the same error thresholds \(10^{-i}, \ i= 5,\ldots , 11\). The errors at time \(t=2\) are plotted as functions of the total CPU time (in seconds)

Table 2 The number of time steps taken to achieve about the same error thresholds \(10^{-i}, \ i= 5,\ldots , 11\)

The numerical results for the three examples above clearly confirm the advantage of constructing expRK methods with parallel stages based on Theorem 3.1, which leads to the more accurate and more efficient methods \({\mathtt {expRK4s6}}\) and \({\mathtt {expRK5s10}}\).