1 Introduction

When solving an ordinary differential equation (ODE) of the form

$$\begin{aligned}&u_t = F(t,u) \;,\;\;\;\;\; t \ge 0 \nonumber \\&u(t=0) =u_0 \; \end{aligned}$$
(1)

one can evolve the solution forward in time using the first-order forward Euler method

$$\begin{aligned} v_{n+1} = v_n + \Delta t F(t_n, v_n) \; . \end{aligned}$$
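For a concrete sketch (ours, not from the text), the forward Euler update can be implemented in a few lines; the test problem \(u_t = -u\) with exact solution \(e^{-t}\) is chosen only for illustration:

```python
import numpy as np

def forward_euler(F, u0, t_final, dt):
    """Advance u_t = F(t, u) from u(0) = u0 to t_final with forward Euler."""
    v, t = u0, 0.0
    for _ in range(int(round(t_final / dt))):
        v = v + dt * F(t, v)   # v_{n+1} = v_n + dt * F(t_n, v_n)
        t += dt
    return v

# test problem u_t = -u, u(0) = 1, exact solution u(t) = exp(-t)
approx = forward_euler(lambda t, u: -u, 1.0, 1.0, 1e-4)
```

With \({\Delta t}= 10^{-4}\) the error at \(t=1\) is on the order of \({\Delta t}/2\), consistent with a first-order method.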

To obtain a more accurate solution, one can use methods with multiple steps:

$$\begin{aligned} v_{n+1} \,= \sum _{j=1}^s a_j\, v_{n+1-j} + \, \Delta t \sum _{j=0}^s b_j F(t_{n+1-j}, v_{n+1-j} ), \end{aligned}$$
(2)

known as linear multistep methods [3]. Alternatively, one can use multiple stages, such as Runge–Kutta methods [3]:

$$\begin{aligned} y_i&= F\left( t_{n} + \Delta t \sum _{j=1}^m a_{ij},\; v_n+ \Delta t \sum _{j=1}^m a_{ij}\, y_j\right) \; \; \; \text{ for } \; \; i=1,\ldots,m\\ v_{n+1}&= v_n + \Delta t \sum _{j=1}^m b_j y_j .\end{aligned}$$
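As an illustrative two-stage instance of this form (our own sketch, not taken from the text), Heun's method uses \(a_{21}=1\) and \(b_1=b_2=1/2\):

```python
import numpy as np

def heun(F, u0, t_final, dt):
    """Two-stage explicit Runge-Kutta (Heun) method: a_21 = 1, b = (1/2, 1/2)."""
    v, t = u0, 0.0
    for _ in range(int(round(t_final / dt))):
        y1 = F(t, v)                    # first stage slope
        y2 = F(t + dt, v + dt * y1)     # second stage slope
        v = v + dt * (0.5 * y1 + 0.5 * y2)
        t += dt
    return v

# second-order accurate on the test problem u_t = -u, u(0) = 1
approx = heun(lambda t, u: -u, 1.0, 1.0, 1e-3)
```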

The class of general linear methods described in [2, 9] combines the use of multiple steps and stages, constructing methods of the form:

$$\begin{aligned} y_i&= \sum _{j=1}^s \tilde{U}_{ij}\, v^j_{n} + \Delta t \sum _{j=1}^m \tilde{A}_{ij}\, f(y_j) \nonumber \\ v^i_{n+1}&= \sum _{j=1}^s \tilde{V}_{ij}\, v^j_{n} + \Delta t \sum _{j=1}^m \tilde{B}_{ij}\, f(y_j) \; . \end{aligned}$$
(3)

The inclusion of multiple derivatives, such as Taylor series methods [3],

$$\begin{aligned} v_{n+1}= v_n + \Delta t F(t_n, v_n) + \frac{\Delta t^2 }{2} F'(t_n,v_n) + \frac{\Delta t^3 }{3!} F''(t_n,v_n), \end{aligned}$$

is another possibility, and multiple stages and derivatives have also been developed and used successfully [4, 10, 11, 17, 18].
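For the linear test problem \(u_t = a u\), the total time derivatives \(F' = a^2 u\) and \(F'' = a^3 u\) are available in closed form, so a third-order Taylor step is easy to sketch (our own illustration):

```python
import numpy as np

def taylor3(a, u0, t_final, dt):
    """Third-order Taylor method for u_t = a*u, where F' = a^2 u, F'' = a^3 u."""
    v = u0
    for _ in range(int(round(t_final / dt))):
        # one step: v * (1 + dt*a + dt^2 a^2/2 + dt^3 a^3/6)
        v = v * (1.0 + dt * a + dt**2 * a**2 / 2.0 + dt**3 * a**3 / 6.0)
    return v

approx = taylor3(-1.0, 1.0, 1.0, 1e-2)   # vs. exact solution exp(-1)
```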

For time-dependent problems the global error, which is the difference between the numerical and exact solutions at any given time \(t_n= n \Delta t\):

$$\begin{aligned} E_n = v_n - u(t_n) , \end{aligned}$$

depends on the local truncation error which, roughly speaking, is the accuracy over one time step. In our case we define the local truncation error as the error of the method over one time-step, normalized by \(\Delta t\). For example, the local truncation error for Euler’s method is (following [1, 6, 8, 13, 15])

$$\begin{aligned} \tau = \frac{u(t_{n+1}) - u(t_{n}) - \Delta t F(t_n, u(t_{n}) )}{ \Delta t} = O(\Delta t). \end{aligned}$$

(To avoid confusion, it is important to note that the truncation error is sometimes defined slightly differently than above, without the normalization by \({\Delta t}\).)
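The first-order decay of this normalized truncation error can be verified numerically; the check below (ours) uses the exact solution \(u(t)=e^t\) of \(u_t = u\):

```python
import numpy as np

def euler_tau(dt, t=0.5):
    """Normalized one-step truncation error of Euler for u_t = u, u(t) = exp(t)."""
    return (np.exp(t + dt) - np.exp(t) - dt * np.exp(t)) / dt

# halving dt should roughly halve tau, since tau = O(dt)
ratio = euler_tau(0.01) / euler_tau(0.005)
```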

A well known theoretical result, the Lax–Richtmyer equivalence theorem (see e.g. [6, 12, 13]), states that if the numerical scheme is stable then the global error is at least of the same order as the local truncation error. In all the schemes for numerically solving ordinary differential equations (ODEs) that we are familiar with from the literature, the global errors are indeed of the same order as their local truncation errors. This is common to other fields in numerical mathematics as well, such as finite difference schemes for partial differential equations (PDEs), see e.g. [6, 13]. It was recently demonstrated, however, that finite difference schemes for PDEs can be constructed such that their convergence rates, or the orders of their global errors, are higher than the orders of their truncation errors [5]. In this work we adopt and adapt the ideas presented in [5] to show that it is possible to construct numerical methods for ODEs where the global error is one order higher than the local truncation error. As we discuss below, these schemes achieve this higher order by inhibiting the lowest order term in the local error from accumulating over time, and so we name them Error Inhibiting Schemes.

Building on an idea in [14], an interesting 1969 paper by Shampine and Watts [16] describes a class of implicit one-step methods that obtain a block of s new step values at each step: the methods take s step values and generate the next s step values, all in one step. These are in fact block one-step methods, and can be written as general linear methods of the form (3) above. Inspired by this form, we construct explicit block one-step methods of the form (3) in which the matrix \(\tilde{U}\) is an identity matrix and the matrix \(\tilde{A}\) is all zeros; these are known as Type 3 methods in [2]. The major feature of our methods is that, in addition to satisfying the appropriate order conditions listed in [2], they have a special structure that mitigates the accumulation of the truncation error, so we obtain a global error that is one order higher than predicted by the order conditions in [2], which describe the local truncation error.

In Sect. 2 we motivate our approach by describing how typical multistep methods can be written and analyzed as block one-step methods that obtain a block of s new step values at each step. We show how this form allows us to precisely describe the growth of the error over the time-evolution. In Sect. 3 we then exploit this understanding to develop explicit error inhibiting block one-step methods that produce higher order global errors than is possible for typical multistep methods. In Sect. 4 we present some methods developed according to the theory in Sect. 3 and test them on several numerical examples to demonstrate that the order of convergence is indeed one order higher than that of the local truncation error. We also show that, in contrast to our error inhibiting Type 3 method, a typical Type 3 method developed by Butcher in [2] does not satisfy the critical condition for a method to be error inhibiting and therefore produces a global error of the same order as the local truncation error. Finally, we present our conclusions in Sect. 5 and suggest that further investigation of error inhibiting methods should include the analysis of their linear stability properties, storage implications, and computational efficiency.

2 Motivation

In this section we present the analysis of explicit multistep methods in a block one-step form for a simple linear problem. In this familiar setting we define the local truncation error, the global error, and the solution operator that connects them. We also discuss the stability of a method of this form. We limit our analysis to the linear case so that we can clearly observe the process by which the solution operator interacts with the local truncation error, and results in a global error that is of the same order as the local truncation error. Although we are dealing for the moment with standard multistep methods, this will set the stage for the construction and analysis of error inhibiting block one-step methods.

In order to illustrate the main idea we start with a linear ordinary differential equation (ODE)

$$\begin{aligned}&u_t = f(t) \; u \;,\;\;\;\;\; t \ge 0 \nonumber \\&u(t=0) =u_0 \; \end{aligned}$$
(4)

where \(f(t)<M \, , \; \forall t \ge 0\) and f(t) is analytic.

An s-step explicit multistep method applied to (4) takes the form

$$\begin{aligned} v_{n+s} = \sum _{j=0}^{s-1} a_j\, v_{n+j} \, + \, {\Delta t}\sum _{j=0}^{s-1} b_j F(t_{n+j}, \, v_{n+j} ) = \sum _{j=0}^{s-1} a_j\, v_{n+j} \, + \, {\Delta t}\sum _{j=0}^{s-1} b_j f(t_{n+j}) \, v_{n+j} \end{aligned}$$
(5)

where the time domain is discretized by the sequence \(t_n = n \, {\Delta t}\), and \(v_n\) denotes the numerical approximation of \(u(t_n)\). The method (5) is defined by its coefficients \( \{ a_j \}_{j=0}^{s-1}\) and \(\{b_j \}_{j=0}^{s-1}\), which are constant values.

Following [6] we rewrite the method (5) in its block form. To do this, we first introduce the exact solution vector

$$\begin{aligned} U_n = \left( u(t_{n+s-1}), \ldots , u(t_n) \right) ^T \end{aligned}$$
(6)

and similarly, the numerical solution vector is

$$\begin{aligned} V_n = \left( v_{n+s-1}, \ldots , v_n\right) ^T . \end{aligned}$$
(7)

Now (5) can be written in block form so that it looks like a one step scheme

$$\begin{aligned} V_{n+1} = Q_n V_n \end{aligned}$$
(8)

where

$$\begin{aligned} Q_n = \left( \begin{array}{cccc} a_{s-1}+{\Delta t}b_{s-1} f(t_{n+s-1}) \; \; &{} \; \; a_{s-2}+{\Delta t}b_{s-2}f(t_{n+s-2}) \; \; &{} \dots &{} \; \; a_{0}+{\Delta t}b_{0} f(t_{n})\\ I \\ &{} \ddots \\ &{} &{} I &{}0 \end{array} \right) , \end{aligned}$$
(9)

This matrix (or its transpose) is often called the companion matrix.
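As a concrete instance (our own illustration, not part of the text): the two-step Adams–Bashforth method has \(a_1 = 1,\ a_0 = 0,\ b_1 = 3/2,\ b_0 = -1/2\) in the notation of (5), and applying its companion matrix (9) reproduces one multistep step:

```python
import numpy as np

dt, f = 0.1, -1.0                 # constant-coefficient test case f(t) = -1

# AB2 in the notation of (5): a_1 = 1, a_0 = 0, b_1 = 3/2, b_0 = -1/2
Q = np.array([[1.0 + dt * 1.5 * f, 0.0 + dt * (-0.5) * f],
              [1.0,                0.0                  ]])

V0 = np.array([0.9, 1.0])         # V_0 = (v_1, v_0)^T
V1 = Q @ V0                       # V_1 = (v_2, v_1)^T via the block form (8)

# the first entry is the multistep update; the second entry just shifts v_1 down
v2 = V0[0] + dt * (1.5 * f * V0[0] - 0.5 * f * V0[1])
```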

From repeated applications of Eq. (8) we observe that the numerical solution vector \(V_n\) at any time \(t_n\) can be related to \(V_{\nu }\) for any previous time \(t_\nu \)

$$\begin{aligned} V_n = S_{\Delta t}\left( t_n, t_{\nu }\right) V_{\nu }\;,\;\;\;{\nu } \le n \end{aligned}$$
(10)

where \(S_{\Delta t}\) is the discrete solution operator. This operator can be expressed explicitly by

$$\begin{aligned} S_{\Delta t}\left( t_n, t_{\nu }\right) = Q_{n-1} \ldots Q_{\nu +1} Q_{\nu }\;,\;\;\;S_{\Delta t}\left( t_n, t_n\right) = I. \end{aligned}$$
(11)

For simplicity we can define this by

$$\begin{aligned} \prod _{\mu =\nu }^{n-1} Q_{\mu }\equiv Q_{n-1} \ldots Q_{\nu +1} Q_{\nu }\;,\;\;\;\prod _{\mu =n}^{n-1}Q_{\mu } \equiv I. \end{aligned}$$
(12)

Note that if each matrix \(Q_\mu \) is independent of \(\mu \) (in other words, in the constant coefficient case where f is independent of t), we simply have a product of matrices Q, and the discrete solution operator becomes

$$\begin{aligned} S_{\Delta t}\left( t_n, t_{\nu }\right) = Q^{n-\nu }. \end{aligned}$$
(13)
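In code, (13) simply says that repeated application of the block recursion (8) coincides with a matrix power (a trivial but reassuring check; the 2×2 matrix below is an arbitrary illustrative choice):

```python
import numpy as np

# any fixed Q (constant-coefficient case); the entries here are arbitrary
Q = np.array([[0.85, 0.05],
              [1.0,  0.0 ]])
V0 = np.array([0.9, 1.0])

V3_stepped = Q @ (Q @ (Q @ V0))                  # three applications of (8)
V3_power = np.linalg.matrix_power(Q, 3) @ V0     # S(t_3, t_0) V_0 via (13)
```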

The behavior of a method depends in large part on the accuracy of its solution operator. We begin by defining the local truncation error as the error of the method over one time-step, normalized by \({\Delta t}\):

Definition 1

The local truncation error \({\varvec{\tau }}_n\) is given by [1, 6, 8, 13, 15]

$$\begin{aligned} {\Delta t}\, {\varvec{\tau }}_n \, = \, U_{n+1} - Q_n U_n \end{aligned}$$
(14)

Note that in the case of the standard multistep method, where \(Q_n\) is given by the matrix (9), the truncation error has only one non-zero element:

$$\begin{aligned} {\varvec{\tau }}_n \, = \, \left( \tau _n, 0, \ldots , 0\right) ^T . \end{aligned}$$
(15)

The error that we are most interested in is the difference between the exact solution vector and the numerical solution vector at time \(t_n\),

$$\begin{aligned} E_n = U_n-V_n \; , \end{aligned}$$
(16)

known as the global error. At the initial time, we have the error \(E_0\) which is based on the starting values a method of this sort requires: the values \(v_j\), \(j=0, \ldots , s-1\) that are prescribed or somehow computed. Typically, \(v_0\) is the initial condition defined in (1) and \(v_j\), \(j=1, \ldots , s-1\) are computed to sufficient accuracy using some other numerical scheme. Thus, the value of \(E_0\) is assumed to be as small as needed.

The evolution of the global error (16) depends on the local truncation error defined by (14) and the discrete solution operator given in (8):

$$\begin{aligned} E_{n+1} \, = \, Q_n E_n \,+\, {\Delta t}\, {\varvec{\tau }}_n \;. \end{aligned}$$
(17)

Unraveling this equality all the way back to \(E_0\) gives

$$\begin{aligned} E_{n} \, = \, S_{\Delta t}\left( t_n, 0\right) E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} S_{\Delta t}\left( t_n, t_{\nu +1}\right) {\varvec{\tau }}_{\nu } \;, \end{aligned}$$
(18)

or, equivalently

$$\begin{aligned} E_{n} \, = \, \prod _{\mu =0}^{n-1} Q_{\mu } E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} Q_{\mu } \right) {\varvec{\tau }}_{\nu } \;. \end{aligned}$$
(19)

(This formula is obtained from the discrete version of Duhamel’s principle, see Lemma 5.1.1 in [6]).

It is clear from (18) that the behavior of the discrete solution operator \(S_{\Delta t}( t_n, t_{\nu +1}) \) must be controlled for this error to converge. This property defines the stability of the method. Here too we use the stability definition presented in [6], namely:

Definition 2

The scheme (8) is called stable if there are constants \(\alpha _s\) and \(K_s\), independent of \({\Delta t}\), such that for all \(0 <{\Delta t}\le {\Delta t}_0\)

$$\begin{aligned} \left\| S_{\Delta t}\left( t_n, t_{\nu }\right) \right\| \le K_s e^{\alpha _s\left( t_n - t_{\nu }\right) } \end{aligned}$$
(20)

If the scheme is stable, we can use (20) and (18) to bound the growth of the error:

$$\begin{aligned} \left\| E_{n} \right\| \, \le \, K_s \left[ e^{\alpha _s t_n } \left\| E_{0} \right\| \,+\, \max _{0 \le \nu \le n-1} \left\| {\varvec{\tau }}_{\nu } \right\| \phi _{{\Delta t}}^*(\alpha _s, \, t_n) \right] \end{aligned}$$
(21)

where

$$\begin{aligned} \phi _{{\Delta t}}^*(\alpha _s, \, t_n) \,=\, {{\Delta t}} \,\sum _{\nu =0}^{n-1} e^{\alpha _s\left( t_n - t_{\nu +1}\right) } \, \approx \, \int _0^{t_n} e^{\alpha _s\left( t_n - \zeta \right) }d\zeta \,=\, \left\{ \begin{array}{lll} \frac{e^{\alpha _s\, t_n } -1}{\alpha _s} &{}&{} \alpha _s \ne 0 \\ t_n &{}&{} \alpha _s = 0 \end{array} \right. \;. \end{aligned}$$
(22)
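The closed form in (22) can be checked against the discrete sum directly (a small numerical sanity check of ours):

```python
import numpy as np

def phi_star(alpha, dt, t_n):
    """Discrete sum from (22): dt * sum_nu exp(alpha*(t_n - t_{nu+1}))."""
    n = int(round(t_n / dt))
    nu = np.arange(n)
    return dt * np.exp(alpha * (t_n - (nu + 1) * dt)).sum()

alpha, t_n = 0.7, 2.0
closed_form = (np.exp(alpha * t_n) - 1.0) / alpha   # the alpha != 0 branch of (22)
approx = phi_star(alpha, 1e-3, t_n)
```

As \({\Delta t}\rightarrow 0\) the sum approaches the integral, so the two values agree to \(O({\Delta t})\).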

Equation (21) means that stability implies convergence: if the scheme is stable then the global error is controlled by the local truncation error for any given final time. In the formula above it is clear that the global error must have order at least as high as the local truncation error, but the possibility of having a higher order global error is left open.

The first Dahlquist barrier [3, 7] states that an explicit s-step linear multistep method can have order p no higher than s. It is the common experience that methods have global error of the same order as the local truncation error. Together, these two facts greatly limit the accuracy of the methods we can derive.

Remark 1

In an Adams–Bashforth scheme the entry in the first row and first column of the term \(S_{\Delta t}\left( t_n, t_{\nu }\right) = \prod _{\mu =\nu }^{n-1} Q_{\mu }\) is equal to \(1+O({\Delta t})\). Therefore the error, due to the accumulation of the contributions from the truncation errors, becomes:

$$\begin{aligned} \mathbf{e}_{n+s} \, = \, {\Delta t}\, \sum _{\nu =0}^{n-1} \left( 1+O({\Delta t}) \right) \tau _{\nu } \; \end{aligned}$$
(23)

which is approximately \(t_n\) times the average value of \(\tau _{\nu }\) over \({\nu =0,\ldots,n-1} \). This suggests that we may need to look outside the family of linear multistep methods to attain a higher order global error.

The analysis in this section suggests that if the operator \(Q_n\) is properly constructed, the growth of the global error described in Eq. (19) may be controlled through the properties of the operator \(Q_n\) and its relationship with the local truncation error \({\varvec{\tau }}_{n}\). However, as implied by the example of the Adams-Bashforth scheme above, we need to construct methods where the operator \(Q_n\) is not limited by the structure in this section. In the next section we present the construction of block one-step methods that are error inhibiting. The class of methods described by this block one-step structure is very broad: while all classical multistep methods can be written in this block form, not every such block one-step method can be written as a classical multistep method. Thus, we rely on the discussion in this section with one main change: the structure of the operator \(Q_n\).

3 An Error Inhibiting Methodology

In Sect. 2 we rewrote explicit linear multistep methods in a block one-step form, and expressed the relationship between their local and global errors. We observed that the growth of the local errors is driven by the behavior of the discrete solution operator \(Q_n\), and in particular its interaction with the local truncation error. Using this insight, we show in this section that it is possible to construct explicit block one-step methods (also known as Type 3 DIMSIM methods in [2]) that inhibit the growth of the truncation error, so that the global error (16) gains an order of accuracy over the local truncation error (14).

We begin in Sect. 3.1 by describing the construction and analysis of error inhibiting block one-step schemes for the case of linear constant coefficient equations. We then show that this approach yields methods that are also error inhibiting for variable coefficient linear equations in Sect. 3.2 and nonlinear equations in Sect. 3.3.

3.1 Error Inhibiting Schemes for Linear Constant Coefficient Equations

Consider a linear ordinary differential equation with constant coefficients:

$$\begin{aligned}&u_t= f \cdot u \;,\;\;\; \text{ for } \; \; \; \; t \ge 0, \nonumber \\&u(t=0) = u_0 \end{aligned}$$
(24)

where \(f \in \mathbbm {R}\). We define a vector of length s that contains the exact solution of (24) at times \(\left( t_n+ j \Delta t/s \right) \) for \(j=0,\ldots ,s-1\)

$$\begin{aligned} U_n = \left( u(t_{n+(s-1)/s}), \ldots , u(t_{n+1/s}), u(t_n) \right) ^T , \end{aligned}$$
(25)

and the corresponding vector of numerical approximations

$$\begin{aligned} V_n = \left( v_{n+(s-1)/s}, \ldots , v_{n+1/s}, v_n\right) ^T. \end{aligned}$$
(26)

Note that although we are assuming that the solution u at any given time is a scalar, this entire discussion generalizes easily to the case where u is a vector, at the cost of somewhat more cumbersome notation. Thus, without loss of generality, we continue the discussion with scalar notation.

Remark 2

The notation above emphasizes that this scheme uses s terms to generate the next s terms, unlike the explicit linear multistep methods in the section above, which use s terms to generate one term. To match the notation in Sect. 2 above, we can set \(\Delta t' = s \Delta t \), thus defining this scheme on integer grid points.

We define the block one-step method for the linear constant coefficient problem (24)

$$\begin{aligned} V_{n+1} = Q V_n \end{aligned}$$
(27)

where

$$\begin{aligned} Q = A + \Delta t B f \end{aligned}$$
(28)

and \(A, B \in \mathbbm {R}^{s \times s}\). Unlike in the case of classical multistep methods, here we do not restrict the structure of the matrices A and B. Thus, any multistep method of the form (5) can be written in this form (as we saw above), but not every method of the form (28) can be written as a multistep method. In fact, this method is a general linear method of the DIMSIM form (3) in which \(\tilde{A}\) is all zeros, \(\tilde{U} \) is the identity matrix, \(\tilde{V}=A\), and \(\tilde{B}= B \). This particular formulation is, as we mentioned above, called a Type 3 DIMSIM in Butcher’s 1993 paper [2].

At any time \(t_n\), we know that \( \; u(t_{n}+{\Delta t}) \,= \, u(t_n)+O({\Delta t})\), so that for the numerical solution \(V_n\) to converge to the analytic solution \(U_n\) one of the eigenvalues of Q must be equal to \(1+O({\Delta t})\), and its eigenvector must have the form:

$$\begin{aligned} \left( 1+O({\Delta t}), \ldots , 1+O({\Delta t})\right) ^T \; . \end{aligned}$$
(29)

The structure of the eigensystem of A, which is the leading part of Q, is critical to the stability of the scheme and the dynamics of the error.

Suppose A is constructed such that:

(1) \(\mathbf{rank} (A)=1\).

(2) Its non-zero eigenvalue is equal to one, and its corresponding eigenvector is \(\left( 1, \ldots , 1\right) ^T \).

(3) A can be diagonalized.

Property (2) ensures that the method produces the exact solution for the case \(f=0\). Now, since the term \( {\Delta t}B f \) is only an \(O({\Delta t})\) perturbation to A, the matrix Q will have one eigenvalue \(z_1=1+O({\Delta t})\), whose eigenvector has the form

$$\begin{aligned} \psi _1 = \left( 1+O({\Delta t}), \ldots , 1+O({\Delta t})\right) ^T\; \end{aligned}$$
(30)

and the rest of the eigenvalues satisfy \(z_j=O({\Delta t})\) for \(j=2,\ldots ,s\).

Since \(\Vert Q\Vert = 1+O({\Delta t})\), we can conclude that there exist constants \(K_s\) and \(\alpha _s\) such that

$$\begin{aligned} \left\| S_{\Delta t}\left( t_n, t_{\nu }\right) \right\| = \left\| Q^{n-\nu } \right\| \le K_s e^{\alpha _s\left( t_n - t_{\nu }\right) } \end{aligned}$$
(31)

where \(\alpha _s = \Vert B\Vert \, |f|\). Therefore, according to Definition 2, the scheme (27) is stable. By the same argument used above, we can show that the global error will have order that is no less than the order of the local truncation error.
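The spectral structure just described is easy to check numerically. Here A and B are our own illustrative choices (A is rank one and diagonalizable, with eigenvalue one and eigenvector \((1,1)^T\); B is arbitrary); they are not the coefficients of any scheme from this paper:

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])     # rank 1, eigenvalues {1, 0}, A @ (1,1)^T = (1,1)^T
B = np.array([[0.7, 0.1],
              [0.2, 0.4]])     # arbitrary illustrative perturbation
dt, f = 1e-3, -1.0

Q = A + dt * B * f             # the block operator (28)
z = np.sort(np.abs(np.linalg.eigvals(Q)))[::-1]
# z[0] = 1 + O(dt) (dominant eigenvalue), z[1] = O(dt)
```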

We now turn to the task of investigating the truncation error, \({\varvec{\tau }}_n\). The definition of the local truncation error in this case is still

$$\begin{aligned} {\Delta t}\, {\varvec{\tau }}_n \, = \, U_{n+1} - Q_n U_n \end{aligned}$$

as defined in the previous section in Eq. (14).

Remark 3

Since \(Q = A + \Delta t B f\) and \(u_t=fu\) the local truncation error can be written as

$$ {\Delta t}\, {\varvec{\tau }}_n \, = \, U_{n+1} - \left( A U_n +{\Delta t}B \frac{d\, U_n}{dt}\right) \;. $$

Therefore \({\varvec{\tau }}_n\) does not explicitly depend on f. This observation is valid for the variable coefficient and nonlinear cases as well.

The definition of the error is

$$\begin{aligned} E_n = U_n-V_n \; , \end{aligned}$$

as in Eq. (16). The evolution of the error is still described by Eq. (19)

$$ E_{n} \, = \, \prod _{\mu =0}^{n-1} Q_{\mu } E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} Q_{\mu } \right) {\varvec{\tau }}_{\nu } \;, $$

which in the linear constant coefficient case becomes

$$\begin{aligned} E_{n} = Q^{n} E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} Q^{n-\nu -1} {\varvec{\tau }}_{\nu } \;. \end{aligned}$$
(32)

The main difference between this case and the linear multistep method in Sect. 2 is that the structure of Q is different, and that unlike (15), in this case all the entries in \({\varvec{\tau }}_n\) are typically non-zero.

Equation (32) indicates that there are several sources for the error at the time \(t_n\):

(1) The initial error \(E_0\), which is the error in the initial condition \(V_0\): this error is caused primarily by the numerical scheme used to compute the first \(s-1\) elements in \(V_0\). We assume these errors can be made arbitrarily small. The initial value, which is the final element of \(V_0\), is taken from the analytic initial condition and is considered to be accurate to machine precision.

(2) The term \({\Delta t}\,{\varvec{\tau }}_{n-1}\), which is the last term in the sum on the right hand side of (32): this term is clearly, by definition, of size \(O({\Delta t}) \Vert {\varvec{\tau }}_{n-1}\Vert .\)

(3) The summation

$$\begin{aligned} {\Delta t}\, \sum _{\nu =0}^{n-2} Q^{n - \nu -1} {\varvec{\tau }}_{\nu }, \end{aligned}$$
(33)

which comprises all the remaining terms in the sum on the right hand side of (32): this is the term we need to bound to control the growth of the truncation error.

The terms in the sum (33) are all comprised of the discrete solution operator Q multiplying the local truncation error. This leads us to the major observation that is the key to constructing error inhibiting methods: if the local truncation error lives in the subspace of eigenvectors that correspond to the eigenvalues of \(O({\Delta t})\), then the growth of the truncation error will be inhibited, and the global error will be one order higher than the local truncation error.

Recall that Q has one dominant eigenvalue that has the form \( 1+O({\Delta t})\) and all the others are \(O({\Delta t})\). Correspondingly, two subspaces can be defined

$$\begin{aligned} \Psi _1 = \mathrm{span} \left\{ \psi _1\right\} \; \; \; \; \text{ and } \; \; \; \; \Psi _1^c = \mathrm{span} \left\{ \psi _2, ..., \psi _s \right\} \end{aligned}$$

where \(\psi _j\) is the eigenvector associated with each eigenvalue \(z_j\). As \(\psi _j\) can be normalized, we assume that \(\Vert \psi _j \Vert =O(1)\). It should be noted that while \(\Psi _1\) and \(\Psi _1^c\) are linearly independent, they are not orthogonal subspaces. Furthermore, since the matrix A is diagonalizable by construction, its eigenvectors span \(\mathbbm {R}^s\). Since \( {\varvec{\tau }}_{\nu } \in \mathbbm {R}^s\), it can be written as

$$\begin{aligned} {\varvec{\tau }}_{\nu } \, = \, \gamma _1 \psi _1 + \sum _{j=2}^s \gamma _j \psi _j \; \end{aligned}$$
(34)

where \(\gamma _1 \psi _1 \in \Psi _1 \) and \(\sum _{j=2}^s \gamma _j \psi _j \in \Psi _1^c\).

Of course, the truncation error \({\varvec{\tau }}_{\nu } \) is determined by the entries of Q. To ensure that the local truncation error lies mostly in the space \(\Psi _1^c \) of eigenvectors corresponding to the eigenvalues of size \(O({\Delta t})\), we choose the entries of Q (i.e. the entries of A and B) such that \(\gamma _1 = O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| \), which means that

$$\begin{aligned} \left\| \gamma _1 \psi _1 \right\| \, = \, O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| \;. \end{aligned}$$
(35)

Using this, we can bound the product of the discrete solution operator and the truncation error,

$$\begin{aligned} \left\| Q {\varvec{\tau }}_{\nu } \right\| &= \left\| \gamma _1 Q \psi _1 + \sum _{j=2}^s \gamma _j Q \psi _j \right\| \le \left\| \gamma _1 Q \psi _1 \right\| +\left\| \sum _{j=2}^s \gamma _j Q \psi _j \right\| \\ &= \left\| \gamma _1 z_1 \psi _1 \right\| +\left\| \sum _{j=2}^s \gamma _j z_j \psi _j \right\| \le |z_1| \left\| \gamma _1 \psi _1 \right\| + \max _{j=2,\ldots,s} |z_j| \left\| \sum _{j=2}^s \gamma _j \psi _j \right\| \\ &\le |z_1| \left\| \gamma _1 \psi _1 \right\| + \max _{j=2,\ldots,s} |z_j| \left\| {\varvec{\tau }}_\nu - \gamma _1 \psi _1 \right\| \\ &\le \left( 1+ O({\Delta t}) \right) O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| + O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| = O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| \end{aligned}$$

where \(z_j\) are the eigenvalues of Q. Therefore we have

$$\begin{aligned} \left\| Q {\varvec{\tau }}_{\nu } \right\| \le O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| \ . \end{aligned}$$
(36)

Whenever the condition (36) is satisfied, we can show that the sum (33) above is bounded:

$$\begin{aligned} \left\| {\Delta t}\, \sum _{\nu =0}^{n-2} Q^{n-\nu -1} {\varvec{\tau }}_{\nu } \right\| &\le {\Delta t}\sum _{\nu =0}^{n-2} \left\| Q^{n-\nu -2} \right\| \left\| Q {\varvec{\tau }}_{\nu } \right\| \\ &\le {\Delta t}\sum _{\nu =0}^{n-2} \left\| Q \right\| ^{n-\nu -2} O({\Delta t}) \Vert {\varvec{\tau }}_\nu \Vert \\ &\le {\Delta t}\left( \max _{\nu =0,\ldots,n-2} \left\| {\varvec{\tau }}_{\nu } \right\| \right) \sum _{\nu =0}^{n-2} (1+c {\Delta t})^{n-\nu -2} O({\Delta t}) \\ &\le {\Delta t}\left( \max _{\nu =0,\ldots,n-2} \left\| {\varvec{\tau }}_{\nu } \right\| \right) \sum _{\nu =0}^{n-2} e^{c (t_{n-2}-t_{\nu })} \left( 1+O({\Delta t}) \right) O({\Delta t}) \\ &\le O({\Delta t}) \left( \max _{\nu =0,\ldots,n-2} \left\| {\varvec{\tau }}_{\nu } \right\| \right) \phi _{{\Delta t}}^*(c, \, T) . \end{aligned}$$
(37)

(Recall (22) for the definition of \(\phi _{{\Delta t}}^*(c, \, T)\).)

In the final equation, T is the final time, and the term \(\phi _{{\Delta t}}^*(c, \, T)\) is therefore a constant. Thus we have the bound

$$\begin{aligned} \left\| {\Delta t}\, \sum _{\nu =0}^{n-2} Q^{n-\nu -1} {\varvec{\tau }}_{\nu } \right\| \le O({\Delta t}) \max _{\nu =0,\ldots,n-2} \left\| {\varvec{\tau }}_{\nu } \right\| . \end{aligned}$$
(38)

Putting this all together into (32), we obtain

$$\begin{aligned} \left\| E_n \right\| \, = \, O({\Delta t}) \max _{\nu =0,...,n-1} \left\| {\varvec{\tau }}_{\nu } \right\| \;. \end{aligned}$$
(39)

Thus, if the coefficients of A and B are chosen so that we can control the size of \(\left\| Q {\varvec{\tau }}_{\nu } \right\| \) in (36), we can obtain a scheme that inhibits the growth of the local truncation error, so that the global error is one order more accurate than its truncation error.
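The contraction property (36) can be observed in miniature. The matrices below are our own illustrative choices satisfying conditions (1)–(3) on A (they are not the coefficients of an actual error inhibiting scheme), with a stand-in truncation vector placed entirely in \(\Psi _1^c\):

```python
import numpy as np

A = np.array([[0.5, 0.5],
              [0.5, 0.5]])          # eigenvalues {1, 0}; (1, -1)^T spans the null space
B = np.array([[0.7, 0.1],
              [0.2, 0.4]])          # arbitrary illustrative choice
f = -1.0
tau = np.array([1.0, -1.0])         # stand-in truncation error, entirely in Psi_1^c

ratios = []
for dt in (1e-2, 1e-3, 1e-4):
    Q = A + dt * B * f              # the block operator (28)
    ratios.append(np.linalg.norm(Q @ tau) / np.linalg.norm(tau))
# each ratio is O(dt): ||Q tau|| shrinks by ~10x every time dt does
```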

3.2 Linear Variable-Coefficient Equations

In the previous section we showed how to construct an error inhibiting method by choosing the coefficients in A and B so that the local truncation error lives (mostly) in the space that is spanned by the eigenvectors corresponding to eigenvalues that are of \(O({\Delta t})\). In this section we show that under the same criteria as above, these methods are also error inhibiting when applied to a variable coefficient linear ordinary differential equation:

$$\begin{aligned}&u_t= f(t) u \;\;,\;\;\;\;\; t \ge 0 \nonumber \\&u(t=0) = u_0 \; \end{aligned}$$
(40)

where f(t) is assumed to be analytic, or as smooth as needed, and bounded. In this case the scheme is given by a time-dependent evolution operator \(Q_n\) which may change each time-step:

$$\begin{aligned} V_{n+1} = Q_n V_n \end{aligned}$$
(41)

where

$$\begin{aligned} Q _n= A + {\Delta t}B \, \left( \begin{array}{cccccc} f \left( t_{n+(s-1)/s} \right) \\ &{} f \left( t_{n+(s-2)/s} \right) \\ &{}&{} \ddots \\ &{}&{}&{} \;\;\;\; f \left( t_{n} \right) \end{array} \right) \end{aligned}$$
(42)

and the matrices A and B are the same as described above for the constant coefficient scheme.

Since f(t) is an analytic function, \(Q_n\) can be written as

$$\begin{aligned} Q _n= A + {\Delta t}B f(t_n) + {\Delta t}^2 B f'(t_n) \left( \begin{array}{cccccc} \left( {(s-1)/s} \right) \\ &{} \left( {(s-2)/s} \right) \\ &{}&{} \ddots \\ &{}&{}&{} \;\;\;\;\; 0 \end{array} \right) + O({\Delta t}^3)\quad \end{aligned}$$
(43)

We can also say then that

$$\begin{aligned} Q_{n} = A + {\Delta t}B f(t_{n}) + O({\Delta t}^2) B f'(t_{n}) = \tilde{Q}_{n} + O( {\Delta t}^2) . \end{aligned}$$
(44)

Each \(\tilde{Q}_{n}\) has the same structure as Q in the constant coefficient case. In particular

$$\begin{aligned} \Vert \tilde{Q}_{n} \Vert = \left( 1+O({\Delta t})\right) \le 1+c {\Delta t},\; \;\;\; \forall n\;. \end{aligned}$$
(45)

Furthermore, as was pointed out in Remark 3, since the local truncation error \({\varvec{\tau }}_n\) does not depend explicitly on f(t) at any time \(t_n\), we can write \({\varvec{\tau }}_n\) as a linear combination of the eigenvectors of A that correspond to the zero eigenvalues. Thus, \({\varvec{\tau }}_n\) lives (mostly) in the space that is spanned by the eigenvectors of any matrix \(\tilde{Q}_{n}\) corresponding to eigenvalues that are of \(O({\Delta t})\). We can then follow the same analysis as in (35)–(36), to obtain the bound

$$\begin{aligned} \Vert \tilde{Q}_{n+1} {\varvec{\tau }}_n\Vert = O({\Delta t})\Vert {\varvec{\tau }}_n\Vert ,\; \;\;\; \forall n \;. \end{aligned}$$
(46)

In this case, Eq. (19) takes the modified form (for \(n \ge 1\))

$$\begin{aligned} E_{n}= & {} \prod _{\mu =0}^{n-1} Q_{\mu } E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} Q_{\mu } \right) {\varvec{\tau }}_{\nu } \nonumber \\= & {} \prod _{\mu =0}^{n-1} Q_{\mu } E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-2} \prod _{\mu =\nu +1}^{n-1} \left( \tilde{Q}_{\mu } + O({\Delta t}^2) \right) {\varvec{\tau }}_{\nu } + {\Delta t}{\varvec{\tau }}_{n-1} \nonumber \end{aligned}$$

The first term is negligible because we assume that the initial error can be made arbitrarily small, and the final term is clearly of order \({\Delta t}{\varvec{\tau }}_{n-1}\). Using (45), (46) and the same analysis as in (35)–(38) we have

$$\begin{aligned} \left\| {\Delta t}\, \sum _{\nu =0}^{n-2} \left( \prod _{\mu =\nu +1}^{n-1} \tilde{Q}_{\mu } \right) {\varvec{\tau }}_{\nu } \right\| &= \left\| {\Delta t}\, \sum _{\nu =0}^{n-2} \left( \prod _{\mu =\nu +2}^{n-1} \tilde{Q}_{\mu } \right) \left( \tilde{Q}_{\nu +1} {\varvec{\tau }}_{\nu } \right) \right\| \\ &\le {\Delta t}\, \sum _{\nu =0}^{n-2} \left\| \prod _{\mu =\nu +2}^{n-1} \tilde{Q}_{\mu } \right\| \left\| \tilde{Q}_{\nu +1} {\varvec{\tau }}_{\nu } \right\| \\ &\le {\Delta t}\, \sum _{\nu =0}^{n-2} \left( 1+O({\Delta t}) \right) ^{n- \nu -2} O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| \\ &\le O({\Delta t}) \max _{\nu =0,\ldots, n-2} \left\| {\varvec{\tau }}_{\nu } \right\| . \end{aligned}$$

Putting these all together we have

$$\begin{aligned} \left\| E_n\right\| = O({\Delta t}) \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| . \end{aligned}$$
(47)

This simple proof shows that even for the variable coefficient case, the schemes constructed as described above have a higher order error than would be expected from the truncation error. In the next subsection we extend this analysis to the general nonlinear case.

3.3 Nonlinear Equations

Finally, we analyze the behavior of methods satisfying the assumptions in Sect. 3.1 when applied to nonlinear problems. Consider the nonlinear equation

$$\begin{aligned}&u_t = f(u(t), t) \;\;,\;\;\;\;\; t \ge 0 \nonumber \\&u(t=0) = u_0 \; \end{aligned}$$
(48)

where \(f(u, t)\) is assumed to be analytic in u and t. We now use the scheme

$$\begin{aligned} V_{n+1} = A V_n + {\Delta t}B \, \left( \begin{array}{cccccc} f \left( v_{n+(s-1)/s}, t_{n+(s-1)/s} \right) \\ \vdots \\ f \left( v_n, t_{n} \right) \end{array} \right) \end{aligned}$$
(49)

where the matrices A and B are as constructed above for the constant coefficient problem.

As defined in (14), the exact solution to (48) and the truncation error are related by

$$\begin{aligned} U_{n+1} = A U_n + {\Delta t}B \, \left( \begin{array}{cccccc} f \left( u_{n+(s-1)/s}, t_{n+(s-1)/s} \right) \\ \vdots \\ f \left( u_n, t_{n} \right) \end{array} \right) + {\Delta t}{\varvec{\tau }}_n. \end{aligned}$$
(50)

Note that by Taylor expansion

$$\begin{aligned} f \left( v_{\nu }, t_{\nu } \right) =f \left( u_{\nu }, t_{\nu } \right) +f _u\left( u_{\nu }, t_{\nu } \right) (v_{\nu } -u_{\nu })+ r(v_{\nu } -u_{\nu }) \;, \end{aligned}$$

where \(f_u (u,t)= \partial f(u,t)/ \partial u\) and \(|r(v_{\nu } -u_{\nu }) | \le c_1|v_{\nu } -u_{\nu }|^2\). Subtracting (49) from (50) and assuming that \(E_n = U_n-V_n \ll 1\) gives

$$\begin{aligned} E_{n+1} {=} A E_n - {\Delta t}B \, \left( \!\begin{array}{cccccc} f_u \left( u_{n+(s-1)/s}, t_{n+(s-1)/s} \right) \\ &{} \ddots \\ &{}&{} \;\;\;\; f_u \left( u_n, t_{n} \right) \end{array} \right) E_n + {\Delta t}{\varvec{\tau }}_n +{\Delta t}R (E_n) \end{aligned}$$
(51)

where \(\Vert R (E_n) \Vert \le c_1 \Vert E_n\Vert ^2\). Equation (51) means that as long as \(O(E_n^2 ) \ll O({\varvec{\tau }}_n)\), the equation for the error \(E_n\) can be analyzed in essentially the same way as for the linear variable coefficient case, and the same estimates hold.

In order to evaluate the time interval in which \(O(E_n^2 ) \ll O({\varvec{\tau }}_n)\), we note that although the term \(R (E_n)\) in (51) is a function of \(E_n\) rather than an inhomogeneous term, we can still use the approach of [6, Theorem 5.1.2] to prove stability for a perturbed solution operator. As in [6, Theorem 5.1.2], we use the discrete version of Duhamel's principle to obtain

$$\begin{aligned} E_{n}= & {} \prod _{\mu =0}^{n-1} \hat{Q}_{\mu } E_0 \,+\, {\Delta t}\, \sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} \hat{Q}_{\mu } \right) {\varvec{\tau }}_{\nu } + {\Delta t}\sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} \hat{Q}_{\mu } \right) R (E_{\nu }) \nonumber \\ \end{aligned}$$
(52)

where

$$\begin{aligned} \hat{Q}_{n} = A - {\Delta t}B \, \left( \begin{array}{cccccc} f_u \left( u_{n+(s-1)/s}, t_{n+(s-1)/s} \right) \\ &{} \ddots \\ &{}&{} \;\;\;\; f_u \left( u_n, t_{n} \right) \end{array} \right) \;. \end{aligned}$$
(53)

Taking the norm of (52) and using the triangle inequality we obtain

$$\begin{aligned} \left\| E_{n} \right\|\le & {} \left\| \prod _{\mu =0}^{n-1} \hat{Q}_{\mu } E_0 \right\| \,+\, \left\| {\Delta t}\, \sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} \hat{Q}_{\mu } \right) {\varvec{\tau }}_{\nu } \right\| + \left\| {\Delta t}\sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} \hat{Q}_{\mu } \right) R (E_{\nu }) \right\| .\nonumber \\ \end{aligned}$$
(54)

As in the linear case, we assume that the initial error \(E_0\) is arbitrarily small, so the first term is negligible. If \(\hat{Q}_{\nu +1}\) is constructed such that \(\Vert \hat{Q}_{\nu +1} {\varvec{\tau }}_{\nu } \Vert = {\Delta t}\, O({\varvec{\tau }}_{\nu })\), then by the same analysis as in the variable coefficient case the second term in (54) is bounded by \({\Delta t}\, c_0 \phi _h^*(c, \, t_n) \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| \). As for the third term, the same arguments show that it is bounded by

$$\begin{aligned} \left\| {\Delta t}\sum _{\nu =0}^{n-1} \left( \prod _{\mu =\nu +1}^{n-1} \hat{Q}_{\mu } \right) R (E_{\nu }) \right\| \le c_1 \phi _h^*(c, \, t_n) \left\| E_{n} \right\| ^2 \;, \end{aligned}$$
(55)

so that (54) (with the substitution of (55) for the final term) can be re-arranged to obtain

$$\begin{aligned} \left\| E_{n} \right\| \left( 1- c_1 \phi _h^*(c, \, t_n) \left\| E_{n} \right\| \right)\le & {} {\Delta t}c_0 \phi _h^*(c, \, t_n) \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| . \end{aligned}$$
(56)

If \( c_1 \phi _h^*(c, \, t_n) \left\| E_{n} \right\| <1/2\), we obtain

$$\begin{aligned} \left\| E_{n} \right\|\le & {} 2 {\Delta t}c_0 \phi _h^*(c, \, t_n) \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| \; . \end{aligned}$$
(57)

This estimate holds as long as

$$\begin{aligned} c_1 \phi _h^*(c, \, t_n) \left\| E_{n} \right\|\le & {} 2 {\Delta t}c_0 c_1\left( \phi _h^*(c, \, t_n) \right) ^2 \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| \le \frac{1}{2}, \end{aligned}$$
(58)

which is satisfied for all times \(t_n\) such that \({\Delta t}\phi _h^*(c, \, t_n)= O(1)\).

Therefore

$$\begin{aligned} \left\| E_n\right\| = O({\Delta t}) \max _{\nu =0, \ldots , n-1} \left\| {\varvec{\tau }}_{\nu }\right\| \end{aligned}$$

for the nonlinear case as well.

4 Some Error Inhibiting Explicit Schemes

In the previous section we defined sufficient conditions for methods of the form

$$\begin{aligned} V_{n+1} = Q V_n \end{aligned}$$
(59)

where

$$\begin{aligned} Q = A + {\Delta t}B f \end{aligned}$$

to be error inhibiting. These are

C1. \(\mathbf{rank} (A)=1\).

C2. Its non-zero eigenvalue is equal to 1 and its corresponding eigenvector is

$$\begin{aligned} \left( 1, \ldots , 1\right) ^T. \end{aligned}$$

C3. A can be diagonalized.

C4. The matrices A and B are constructed such that when the local truncation error is multiplied by the discrete solution operator we have the bound

$$\begin{aligned} \left\| Q {\varvec{\tau }}_{\nu } \right\| \le O({\Delta t}) \left\| {\varvec{\tau }}_{\nu } \right\| . \end{aligned}$$

This is accomplished by requiring the local truncation error to live in the space spanned by the eigenvectors of A that correspond to the zero eigenvalues.

In this section we present several schemes which were constructed using the approach presented in the previous section. To construct these schemes, we first select the coefficients in A (an \(s \times s\) matrix). Conditions C1–C3 imply that all the rows of A are identical and depend on \(s-1\) independent variables, e.g. \(a_{1,1}, \ldots , a_{1,s-1}\); condition C2 ensures consistency by requiring the row sums to equal one. Next, we select the elements of the \(s\times s\) matrix B together with \(a_{1,1}, \ldots , a_{1,s-1}\) by demanding that the order conditions are satisfied to order s, i.e. \(\Vert {\varvec{\tau }}\Vert = O({\Delta t}^s)\), and that \(\Vert Q{\varvec{\tau }}\Vert = O({\Delta t}^{s+1})\), so that condition C4 is satisfied. These calculations were done symbolically using Mathematica©.
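The selection procedure just described can be sketched symbolically. The following is a minimal sympy version for \(s=2\) (an assumption on our part; the authors' computations were done in Mathematica, and the variable names are ours): the common row of A and the entries of B are the unknowns, the order conditions up to order \(s=2\) are imposed exactly, and condition C4 is imposed by requiring the leading truncation error to lie in the kernel of A. The procedure recovers the scheme (60) of Sect. 4.1 and the truncation error coefficients of (61).

```python
import sympy as sp

a = sp.symbols('a')                       # free entry of the identical rows of A
B = sp.Matrix(2, 2, sp.symbols('b11 b12 b21 b22'))
row = sp.Matrix([[a, 1 - a]])             # C1-C3: rank one, row sum one

s_in  = [sp.Rational(1, 2), 0]            # input times  (v_{n+1/2}, v_n), in units of dt
t_out = [sp.Rational(3, 2), 1]            # output times (v_{n+3/2}, v_{n+1})

# order conditions up to order 2: the step must be exact for u = t^k, k = 1, 2
lin_eqs = [sp.Eq(sum(row[j] * s_in[j]**k for j in range(2))
                 + k * sum(B[i, j] * s_in[j]**(k - 1) for j in range(2)),
                 t_out[i]**k)
           for i in range(2) for k in (1, 2)]
Bsol = sp.solve(lin_eqs, list(B))         # entries of B in terms of a

# residual of the k = 3 condition: 6 * (leading truncation error) per component
rho = sp.Matrix([t_out[i]**3
                 - sum(row[j] * s_in[j]**3 for j in range(2))
                 - 3 * sum(B[i, j] * s_in[j]**2 for j in range(2))
                 for i in range(2)]).subs(Bsol)

# C4: the leading truncation error must lie in the kernel of A, i.e. row . rho = 0
a_val = sp.solve(sp.Eq((row * rho)[0], 0), a)[0]

print(a_val)                              # -1/6 -> the row of A is (-1/6, 7/6), as in (60)
print(B.subs(Bsol).subs(a, a_val) * 24)   # [[55, -17], [25, 1]], as in (60)
print((rho / 6).subs(a, a_val) * 576)     # [161, 23] = 23*(7, 1), as in (61)
```

The C4 equation is the single extra condition that singles out the error inhibiting member of this family; dropping it leaves a one-parameter family of merely second order schemes.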

In Sect. 4.1, we present a block one-step method that evolves two steps (\(v_n\) and \(v_{n+\frac{1}{2}}\)) to obtain the next two steps (\(v_{n+1}\) and \(v_{n+\frac{3}{2}}\)). This method has a truncation error (14) that is second order, while its global error (16) is third order. We demonstrate that the expected convergence rate is attained on several sample nonlinear problems. In this section we also show that a typical Type 3 DIMSIM method (derived in [2]) that satisfies the first three conditions above but not the fourth has a truncation error of order two and a global error of the same order. This demonstrates the importance of condition C4.

Next, in Sect. 4.2 we present a block one-step method that evolves three steps \(v_n\), \(v_{n+\frac{1}{3}}\), and \(v_{n+\frac{2}{3}}\) to obtain \(v_{n+1}\), \(v_{n+\frac{4}{3}}\), and \(v_{n+\frac{5}{3}}\). This method has a truncation error (14) that is third order, while its global error (16) is fourth order, as we demonstrate on several sample problems. Finally, to show that the methods in each class are not unique, we present two other methods of this type and show that their global error is one order higher than the local truncation error on a sample nonlinear system.

4.1 A Third Order Error Inhibiting Method with \(s=2\)

In this subsection we define an explicit block one-step method with \(s=2\) that satisfies the conditions C1 – C4 above. This method takes the values of the solution at the times \(t_n\) and \(t_{n+\frac{1}{2}}\) and obtains the solution at the time levels \(t_{n+1}\) and \(t_{n+\frac{3}{2}}\). The exact solution vector for this problem is

$$\begin{aligned} U_n = \left( u(t_{n+1/2}), u(t_n) \right) ^T \end{aligned}$$

and, similarly, the corresponding vector of numerical approximations is

$$\begin{aligned} V_n = \left( v_{n+1/2}, v_n\right) ^T. \end{aligned}$$

The scheme is given by:

$$\begin{aligned} V_{n+1} \,= \, \frac{1}{6}\left( \begin{array}{ccc} -1&{} \;\; &{} 7 \\ -1&{} &{} 7 \end{array} \right) V_n + \frac{{\Delta t}}{24}\left( \begin{array}{ccc} 55 &{} \;&{}-17 \\ 25 &{} &{} 1 \end{array} \right) \, \left( \begin{array}{cccccc} f \left( v_{n+1/2}, t_{n+1/2} \right) \\ f \left( v_n, t_{n} \right) \end{array} \right) , \end{aligned}$$
(60)

and has truncation error

$$\begin{aligned} {\varvec{\tau }}_{n} \,= \, \frac{23}{576}\left( \begin{array}{ccc} 7 \\ 1 \end{array} \right) \frac{d^3}{dt^3} u(t_n) \,\Delta t^2 \, + \, O(\Delta t^3) \;. \end{aligned}$$
(61)

The matrix A can be diagonalized as follows:

$$\begin{aligned} A \,= \, \frac{1}{6}\left( \begin{array}{ccc} -1&{} \;\; &{} 7 \\ -1&{} &{} 7 \end{array} \right) \,= \, \frac{1}{6} \left( \begin{array}{ccc} 1&{} \;\; &{} 7 \\ 1&{} &{} 1 \end{array} \right) \left( \begin{array}{ccc} 1&{} \;\; &{} \\ &{} &{} 0 \end{array} \right) \left( \begin{array}{ccc} -1&{} \;\; &{} 7 \\ 1&{} &{} -1 \end{array} \right) \;. \end{aligned}$$
(62)

Observe that the leading order of the truncation error (61) is in the space of the second eigenvector of A, the one that corresponds to the zero eigenvalue. Also, as was pointed out in Remark 3, \({\varvec{\tau }}_n\) depends only on this eigenvector of A and a multiple that is not directly dependent on f but only on the third derivative of the solution u. This underscores the analysis in Sects. 3.2 and 3.3 that demonstrates that the error inhibiting property carries through for variable coefficient and nonlinear problems.
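These structural facts are easy to confirm numerically. The following numpy sketch (our own check, not part of the original presentation) verifies conditions C1 and C2, the diagonalization (62), and that the leading truncation error direction \((7,1)^T\) of (61) is annihilated by A.

```python
import numpy as np

A = np.array([[-1.0, 7.0], [-1.0, 7.0]]) / 6.0
tau_dir = np.array([7.0, 1.0])            # leading direction of the LTE (61)

assert np.linalg.matrix_rank(A) == 1                 # condition C1
assert np.allclose(A @ np.ones(2), np.ones(2))       # condition C2: eigenvalue 1, eigenvector (1,1)^T
assert np.allclose(A @ tau_dir, 0.0)                 # tau lies in ker(A): condition C4 mechanism

# the diagonalization (62): A = P diag(1, 0) P^{-1} with P = [[1, 7], [1, 1]]
P = np.array([[1.0, 7.0], [1.0, 1.0]])
D = np.diag([1.0, 0.0])
assert np.allclose(P @ D @ np.linalg.inv(P), A)
print("structure of (60)-(62) verified")
```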

To study the behavior of the global error we use the fact shown in Sect. 3.3 that even for a nonlinear equation it is sufficient to analyze the matrix

$$\begin{aligned} Q \,= \, A + {\Delta t}\,B\,f \end{aligned}$$
(63)

where f is a constant. In this case:

$$\begin{aligned} Q= & {} \frac{1}{6} \left( \begin{array}{ccc} 1 + \frac{{f} {\Delta t}}{2} + \frac{{f}^2 {\Delta t}^2}{8}+O({\Delta t}^3) &{} \;\; &{} 7 + 36 {f} {\Delta t}+ 228 {f}^2 {\Delta t}^2 +O({\Delta t}^3) \\ 1&{} &{} 1 \end{array} \right) \nonumber \\&\left( \begin{array}{ccc} 1 + {f} {\Delta t}+ \frac{{f}^2 {\Delta t}^2}{2} + \frac{{f}^3 {\Delta t}^3}{6}+O({\Delta t}^4)&{} \;\; &{} \\ &{} &{} \frac{4 {f} {\Delta t}}{3} - \frac{{f}^2 {\Delta t}^2}{2} - \frac{{f}^3 {\Delta t}^3}{6}+O({\Delta t}^4) \end{array} \right) \nonumber \\&\left( \begin{array}{ccc} -1 + \frac{71 {f} {\Delta t}}{12} + \frac{107 {f}^2 {\Delta t}^2}{36} +O({\Delta t}^3)&{} \;\; &{} 7 - \frac{65 {f} {\Delta t}}{12} - \frac{209 {f}^2 {\Delta t}^2}{36}+O({\Delta t}^3) \\ 1 - \frac{71 {f} {\Delta t}}{12} - \frac{107 {f}^2 {\Delta t}^2}{36}+O({\Delta t}^3)&{} &{} -1 + \frac{65 {f} {\Delta t}}{12} + \frac{209 {f}^2 {\Delta t}^2}{36}+O({\Delta t}^3) \end{array} \right) \nonumber \\ \end{aligned}$$
(64)

Recall that, neglecting the initial error \(E_0\), we can say that the global error is (16)

$$\begin{aligned} E_n = {\Delta t}\sum _{\nu =0}^{n-1} Q^{n-\nu -1} \tau _\nu \end{aligned}$$

Putting together Eqs. (61) and (64) we see that each term \( Q \,{\varvec{\tau }}_\nu \) contributes to the error in two ways:

  • The first contribution is due to the fact that \( {\varvec{\tau }}_\nu \) is almost co-linear with the second eigenvector \(\psi _2\). The order of this contribution is

    $$\begin{aligned} |z_2| \Vert {\varvec{\tau }}_\nu \Vert = O({\Delta t}) \cdot O(\Vert {\varvec{\tau }}_\nu \Vert ) = O({\Delta t}^3) \end{aligned}$$

    where the term \(|z_2|\) is the second eigenvalue, which is of order \(O({\Delta t})\).

  • The second contribution to the error comes from the component of \( {\varvec{\tau }}_\nu \) that is a multiple \(\gamma _1\) of the first eigenvector \(\psi _1\),

    $$\begin{aligned} |z_1| \Vert \gamma _1 \psi _1 \Vert = O({\Delta t}) \cdot O(\Vert {\varvec{\tau }}_\nu \Vert ) = O({\Delta t}^3) \end{aligned}$$

    where \(\gamma _1 = O({\Delta t}) \Vert {\varvec{\tau }}_\nu \Vert \) because \( {\varvec{\tau }}_\nu \) lives mostly in the space of \(\psi _2\).

While each of the terms in \( {\Delta t}Q \,{\varvec{\tau }}_\nu \) is of order \(O({\Delta t}^2) \cdot O(\Vert {\varvec{\tau }}_\nu \Vert ) = O({\Delta t}^4)\), as the method is evolved forward these errors accumulate over time, and the sum of the contributions from all \(n = O(1/{\Delta t})\) time steps gives a global error of order \(O({\Delta t}) \cdot O(\Vert {\varvec{\tau }}_n \Vert ) = O({\Delta t}^3)\).
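A small numerical experiment makes the role of condition C4 concrete for the scheme (60). The following sketch uses an arbitrarily chosen constant coefficient \(f=-2\) (our assumption): since A annihilates the leading direction \((7,1)^T\) of (61), we have \(Q{\varvec{\tau }} = {\Delta t}\, f B {\varvec{\tau }}\), so the ratio \(\Vert Q {\varvec{\tau }}\Vert / \Vert {\varvec{\tau }}\Vert \) shrinks linearly with \({\Delta t}\).

```python
import numpy as np

A = np.array([[-1.0, 7.0], [-1.0, 7.0]]) / 6.0
B = np.array([[55.0, -17.0], [25.0, 1.0]]) / 24.0
tau = np.array([7.0, 1.0])     # leading truncation error direction from (61)
f = -2.0                       # an arbitrary constant coefficient (our choice)

for dt in (0.1, 0.05, 0.025):
    Q = A + dt * f * B         # discrete solution operator (63) for u_t = f u
    ratio = np.linalg.norm(Q @ tau) / np.linalg.norm(tau)
    # ratio/dt stays constant: ||Q tau|| = O(dt) ||tau||, which is condition C4
    print(f"dt = {dt:6.3f}   ||Q tau||/||tau|| = {ratio:.3e}   ratio/dt = {ratio/dt:.3f}")
```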

Example 1a

To demonstrate that this method indeed performs as designed we study its behavior on a nonlinear scalar equation of the form:

$$\begin{aligned}&u_t= -u^2 \; = \; f(u) \;\;,\;\;\;\;\; t \ge 0 \nonumber \\&u(t=0) = 1 \;. \end{aligned}$$
(65)

We evolve the solution of this equation to time \(T=1\) using the scheme (60). The initial steps are computed exactly. The plots of the errors and the truncation errors are presented in Fig. 1a. Both errors are shown for the first component, \(v_{n+1/2}\) (denoted v(1) in the legend) and the second component, \(v_{n}\) (denoted v(2) in the legend). Clearly, although the truncation error is only second order (denoted tr err v(1) and tr err v(2) in the legend), the global error is third order, as predicted by the theory.
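The experiment can be reproduced with a few lines of code. The following is a minimal Python sketch (our reconstruction, not the authors' code) of Example 1a: the scheme (60) is applied to (65), whose exact solution is \(u(t)=1/(1+t)\), with exact starting values, and the observed convergence order of the global error is estimated by successive halving of \(\Delta t\).

```python
import numpy as np

A = np.array([[-1.0, 7.0], [-1.0, 7.0]]) / 6.0
B = np.array([[55.0, -17.0], [25.0, 1.0]]) / 24.0
f = lambda v: -v**2                    # right hand side of (65)
exact = lambda t: 1.0 / (1.0 + t)      # exact solution of (65)

def solve(dt, T=1.0):
    # V_n = (v_{n+1/2}, v_n), initialized with exact values
    V = np.array([exact(dt / 2.0), exact(0.0)])
    for n in range(round(T / dt)):
        V = A @ V + dt * (B @ f(V))    # one application of the scheme (60)
    return abs(V[1] - exact(T))        # error in v_n at t = T

errs = [solve(1.0 / N) for N in (20, 40, 80, 160)]
orders = [np.log2(errs[i] / errs[i + 1]) for i in range(3)]
print(orders)   # should approach 3: third order global error from a second order LTE
```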

Fig. 1

Convergence plots using the scheme (60). a The errors and truncation errors versus \(\Delta t\), for several values of \(\Delta t\), for the numerical solution of (65). b The errors versus \( {\Delta t}\) for each component of the solution, computed for several values of \({\Delta t}\), for the numerical solution of the van der Pol equation (66)

Example 1b

It is important that the method perform as designed on a nonlinear system as well. To demonstrate this, we solve the van der Pol system

$$\begin{aligned} u^{(1)}_t= & {} u^{(2)} \nonumber \\ u^{(2)}_t= & {} 0.1 [1- (u^{(1)})^2 ] u^{(2)}- u^{(1)} \end{aligned}$$
(66)

using the same scheme (60). As this is a system, it is important that both components are examined. Thus, the vector of the numerical solution has two components for the time level \(t_n\), denoted by v(2), and two components for the time level \(t_{n+\frac{1}{2}}\), denoted by v(1). In Fig. 1b the convergence plots of the components \(u^{(1)}\) and \(u^{(2)}\) are presented. Once again, we see that the convergence rate is indeed third order.

Remark 4

It is important to note that not all Type 3 DIMSIM methods are error inhibiting! The property that the local truncation error lives in the space spanned by the eigenvectors of A that correspond to the zero eigenvalues is needed for the error inhibiting behavior to occur, and this property is not generally satisfied. To observe this, we study a Type 3 DIMSIM scheme presented by Butcher in [2].

Consider the scheme

$$\begin{aligned} \left( \begin{array}{ccc} v_{n+2}\\ v_{n+1} \end{array} \right) \,= \, \frac{1}{4}\left( \begin{array}{ccc} 7&{} \;\; &{} -3 \\ 7&{} &{} -3 \end{array} \right) \left( \begin{array}{ccc} v_{n+1}\\ v_{n} \end{array} \right) + \frac{{\Delta t}}{8}\left( \begin{array}{ccc} 9 &{} \;&{}-7 \\ -3 &{} &{} -3 \end{array} \right) \, \left( \begin{array}{cccccc} f \left( v_{n+1}, t_{n+1} \right) \\ f \left( v_n, t_{n} \right) \end{array} \right) \end{aligned}$$
(67)

given in [2]. This scheme has truncation error

$$\begin{aligned} {\varvec{\tau }}_{n} \,= \, \frac{1}{48}\left( \begin{array}{ccc} 23 \\ 3 \end{array} \right) \frac{d^3}{dt^3} u(t_n) \,\Delta t^2 \, + \, O(\Delta t^3) \;. \end{aligned}$$
(68)

The matrix A can be diagonalized as follows:

$$\begin{aligned} A \,= \, \frac{1}{4}\left( \begin{array}{ccc} 7&{} \;\; &{} -3 \\ 7&{} &{} -3 \end{array} \right) \,= \, \left( \begin{array}{ccc} 1&{} \;\; &{} 3/7 \\ 1&{} &{} 1 \end{array} \right) \left( \begin{array}{ccc} 1&{} \;\; &{} \\ &{} &{} 0 \end{array} \right) \frac{1}{4}\left( \begin{array}{ccc} 7&{} \;\; &{} -3 \\ -7&{} &{} 7 \end{array} \right) \;. \end{aligned}$$
(69)

The truncation error \({\varvec{\tau }}_{n}\) can be written as a linear combination of the two eigenvectors of A as follows:

$$\begin{aligned} {\varvec{\tau }}_{n} \,= \, \left[ \frac{19}{24}\left( \begin{array}{ccc} 1 \\ 1 \end{array} \right) - \frac{35}{48}\left( \begin{array}{ccc} 3/7 \\ 1 \end{array} \right) \right] \frac{d^3}{dt^3} u(t_n) \,\Delta t^2 \, + \, O(\Delta t^3) \;. \end{aligned}$$
(70)

Unlike the EIS scheme (60), here the first term in this expansion is of the order of \(O({\varvec{\tau }}_{n})= O(\Delta t^2)\). Therefore a term of the order of \(\Delta t O({\varvec{\tau }}_{n})= O(\Delta t^3)\) is accumulated at each time step, so that the global error is second order.
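This decomposition can be checked directly. The following numpy sketch (our own check) decomposes the leading truncation error direction of (68) in the eigenbasis (69) and confirms that its component along the eigenvalue-1 eigenvector \((1,1)^T\) is \(O(1)\), so A does not annihilate \({\varvec{\tau }}_n\) and condition C4 fails.

```python
import numpy as np

A = np.array([[7.0, -3.0], [7.0, -3.0]]) / 4.0   # matrix A of Butcher's scheme (67)
tau = np.array([23.0, 3.0]) / 48.0               # leading coefficient of the LTE (68)

# decompose tau in the eigenbasis {(1,1)^T, (3/7,1)^T} of A, cf. (69)-(70)
P = np.array([[1.0, 3.0 / 7.0], [1.0, 1.0]])
coeffs = np.linalg.solve(P, tau)
print(coeffs)          # (19/24, -35/48), matching the expansion (70)

# A tau = coeffs[0] * (1,1)^T, an O(tau) quantity rather than O(dt * tau)
print(A @ tau)
```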

We note that both this method (67) and our error inhibiting method (60) satisfy the order conditions in Theorem 3.1 of [2] only up to second order (\(p=2\)). However, as we see in Fig. 2, when the method (67) is used to simulate the solution of the problems (65) and (66) we obtain second order accuracy, whereas the error inhibiting method (60) gives third order accuracy (Fig. 1).

Fig. 2

Convergence plots using Butcher’s scheme (67). a The errors and truncation errors versus \(\Delta t\), for several values of \(\Delta t\), for the numerical solution of (65). Note that the errors for v(1) and v(2) are virtually identical so these error lines coincide. b The errors versus \( {\Delta t}\) for each component of the solution, computed for several values of \({\Delta t}\), for the numerical solution of the van der Pol equation (66). Note that for this problem as well the behavior of this method on both components is virtually identical, so the error lines for each component of the solution coincide. Both the local truncation errors and the global errors are second order: this is not an error inhibiting scheme

4.2 A Fourth Order Error Inhibiting Method with \(s=3\)

In this subsection we present an error inhibiting method with \(s=3\) that takes the values of the solution at the times \(t_n\), \(t_{n+\frac{1}{3}}\), and \(t_{n+\frac{2}{3}}\) and uses these three values to obtain the solution at the time-level \(t_{n+1}\), \(t_{n+\frac{4}{3}}\), and \(t_{n+\frac{5}{3}}\). The exact solution vector is given by

$$\begin{aligned} U_n = \left( u(t_{n+2/3}), u(t_{n+1/3}), u(t_n) \right) ^T, \end{aligned}$$

and the corresponding vector of numerical approximations is

$$\begin{aligned} V_n = \left( v_{n+2/3}, v_{n+1/3}, v_n\right) ^T. \end{aligned}$$

Consider the error inhibiting scheme

$$\begin{aligned} V_{n+1}= & {} \frac{1}{768}\left( \begin{array}{ccccc} 467 &{} -1996 &{} 2297 \\ 467 &{} -1996 &{} 2297 \\ 467 &{} -1996 &{} 2297 \\ \end{array} \right) V_n \nonumber \\&+\,\frac{{\Delta t}}{1152}\left( \begin{array}{ccc} 5439 &{} -6046 &{} 3058 \\ 2399 &{} -1694 &{} 1362 \\ 703 &{} 354 &{} 626 \\ \end{array} \right) \, \left( \begin{array}{cccccc} f \left( v_{n+2/3}, t_{n+2/3} \right) \\ f \left( v_{n+1/3}, t_{n+1/3} \right) \\ f \left( v_n, t_{n} \right) \end{array} \right) , \end{aligned}$$
(71)

which has a local truncation error of third order,

$$\begin{aligned} {\varvec{\tau }}_{n}= & {} \frac{1}{373248}\left( \begin{array}{ccc} 43699 \\ 12787 \\ 2227 \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \nonumber \\ \nonumber \\\approx & {} \; \left( \begin{array}{ccc} 0.117078\\ 0.0342587\\ 0.00596654 \\ \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \;. \end{aligned}$$
(72)

However, it can be verified that for the linear case, the product

$$\begin{aligned} Q_n {\varvec{\tau }}_{n} = O({\Delta t}{\varvec{\tau }}_n) = O({\Delta t}^4) \; . \end{aligned}$$

Given the analysis in Sect. 3.3 above, this result will carry over to the nonlinear case, and thus this method will have a fourth order global error, despite the third order truncation error.

To demonstrate this result we revisit the two examples (65) and (66) in the previous subsection and use the scheme (71) to evolve them forward in time. The results, shown in Fig. 3, are exactly as we expect: although the truncation errors (seen for the problem (65) in Fig. 3a) are only third order, the errors are fourth order for both problems (65) and the van der Pol problem (66).
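As for the \(s=2\) case, the convergence claim can be reproduced with a short Python sketch (our reconstruction, not the authors' code): the scheme (71) is applied to (65) with exact starting values, and the observed order of the global error is estimated.

```python
import numpy as np

# matrices of the s=3 scheme (71); A has three identical rows (conditions C1-C3)
A = np.tile(np.array([467.0, -1996.0, 2297.0]) / 768.0, (3, 1))
B = np.array([[5439.0, -6046.0, 3058.0],
              [2399.0, -1694.0, 1362.0],
              [ 703.0,   354.0,  626.0]]) / 1152.0
f = lambda v: -v**2                    # right hand side of (65)
exact = lambda t: 1.0 / (1.0 + t)      # exact solution of (65)

def solve(dt, T=1.0):
    # V_n = (v_{n+2/3}, v_{n+1/3}, v_n), initialized with exact values
    V = exact(np.array([2.0 * dt / 3.0, dt / 3.0, 0.0]))
    for n in range(round(T / dt)):
        V = A @ V + dt * (B @ f(V))    # one application of the scheme (71)
    return abs(V[2] - exact(T))        # error in v_n at t = T

errs = [solve(1.0 / N) for N in (20, 40, 80, 160)]
orders = [np.log2(errs[i] / errs[i + 1]) for i in range(3)]
print(orders)   # should approach 4: fourth order global error from a third order LTE
```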

Fig. 3

Convergence plots using the scheme (71). a The errors and truncation errors versus \(\Delta t\), for several values of \(\Delta t\), for the numerical solution of (65). b The errors versus \( {\Delta t}\) for each component of the solution, computed for several values of \({\Delta t}\), for the numerical solution of the van der Pol equation (66). As expected, we observe fourth order accuracy for the errors, although the truncation errors are third order

4.2.1 Other Fourth Order Error Inhibiting Methods with \(s=3\)

The methods above are not unique; in fact, other methods can be derived using this approach. In this section we present two additional error inhibiting methods with \(s=3\) whose local truncation error is third order but whose global error is fourth order on a nonlinear system.

The first method is

$$\begin{aligned} V_{n+1}= & {} \frac{1}{1020}\left( \begin{array}{ccccc} 449 &{} -1966 &{}\;&{} 2537 \\ 449 &{} -1966 &{}\;&{} 2537 \\ 449 &{} -1966 &{}\;&{} 2537 \\ \end{array} \right) V_n \nonumber \\&+\,\frac{{\Delta t}}{6120}\left( \begin{array}{ccc} 29123 &{} -32576 &{} 15789 \\ 12973 &{} -9456 &{} 6779 \\ 3963 &{} 1424 &{} 2869 \\ \end{array} \right) \, \left( \begin{array}{cccccc} f \left( v_{n+2/3}, t_{n+2/3} \right) \\ f \left( v_{n+1/3}, t_{n+1/3} \right) \\ f \left( v_n, t_{n} \right) \end{array} \right) , \end{aligned}$$
(73)

and has a local truncation error of third order,

$$\begin{aligned} {\varvec{\tau }}_{n}= & {} \frac{1}{991440}\left( \begin{array}{ccc} 115733 \\ 33623\\ 5573 \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \nonumber \\ \nonumber \\\approx & {} \; \left( \begin{array}{ccc} 0.116732 \\ 0.0339133\\ 0.00562112 \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \;. \end{aligned}$$
(74)

The second method is

$$\begin{aligned} V_{n+1}= & {} \left( \begin{array}{ccccc} - \frac{101}{96} &{} \frac{97}{24} &{} - \frac{191}{96}\\ - \frac{101}{96} &{} \frac{97}{24} &{} - \frac{191}{96}\\ - \frac{101}{96} &{} \frac{97}{24} &{} - \frac{191}{96}\\ \end{array} \right) V_n + {\Delta t}\left( \begin{array}{ccc} \frac{733}{144} &{} -\frac{431}{72} &{} \frac{23}{12} \\ \frac{353}{144} &{} - \frac{53}{24} &{} \frac{4}{9}\\ \frac{47}{48} &{} - \frac{31}{72} &{} - \frac{7}{36} \\ \end{array} \right) \, \left( \begin{array}{cccccc} f \left( v_{n+2/3}, t_{n+2/3} \right) \\ f \left( v_{n+1/3}, t_{n+1/3} \right) \\ f \left( v_n, t_{n} \right) \end{array} \right) . \quad \quad \end{aligned}$$
(75)

The truncation error is also third order:

$$\begin{aligned} {\varvec{\tau }}_{n}= & {} \left( \begin{array}{ccc} \frac{5303}{46656} \\ \frac{1439}{46656}\\ \frac{119}{46656}\\ \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \\= & {} \left( \begin{array}{ccc} {0.113662} \\ {0.0308428} \\ {0.00255058} \\ \end{array} \right) \frac{d^4}{dt^4} u(t_n) \,{\Delta t}^3 \, + \, O({\Delta t}^4) \nonumber \end{aligned}$$
(76)

Both these methods satisfy

$$\begin{aligned} Q_n {\varvec{\tau }}_{n} = O({\Delta t}{\varvec{\tau }}_n) = O({\Delta t}^4) \; \end{aligned}$$

as well. As above, this property results in an error inhibiting mechanism that produces a global error of order four. This can be seen once again in Fig. 4, using the nonlinear problem (66) above. The results for method (73) are shown on the left and for (75) on the right.

Fig. 4

Convergence plots for the van der Pol equation (66). The plots show the errors versus \( {\Delta t}\) for each component of the solution, computed for several values of \({\Delta t}\), for a the scheme (73) and b the scheme (75). As expected, we observe fourth order accuracy for the errors, although the truncation errors computed above are third order

Note: The time-step \(\Delta t\) in the numerical tests was chosen to demonstrate the order of convergence of the methods. In practice, much larger values of \(\Delta t\) are still within the stability regions of these methods. In [2], Butcher notes that the method given in (67) has a real-axis stability region of \([-\frac{4}{3},0]\). The popular Adams–Bashforth methods have real-axis stability intervals of \([-1,0]\) for \(p=2\), \([-0.54,0]\) for \(p=3\), and \([-0.3,0]\) for \(p=4\). Our third order (\(s=2\)) method given by (60) has real-axis stability interval \([-0.635,0]\). The fourth order (\(s=3\)) methods have real-axis stability intervals of \([-0.46,0]\) for the method given by Eq. (71), \([-0.472,0]\) for the method given by Eq. (73), and \([-0.54,0]\) for the method given by Eq. (75), which is comparable to that of the third order Adams–Bashforth method.
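The real-axis stability intervals quoted above can be estimated numerically. The following Python sketch (our own methodology, not taken from the paper, using the spectral radius of \(Q(z) = A + zB\) with \(z = f{\Delta t}\) as a stability proxy) scans the negative real axis for the scheme (60); the boundary it reports can be compared with the quoted interval \([-0.635,0]\).

```python
import numpy as np

# matrices of the s=2 scheme (60)
A = np.array([[-1.0, 7.0], [-1.0, 7.0]]) / 6.0
B = np.array([[55.0, -17.0], [25.0, 1.0]]) / 24.0

def spectral_radius(z):
    # spectral radius of the discrete solution operator Q(z) = A + z B
    return np.max(np.abs(np.linalg.eigvals(A + z * B)))

# scan z = f * dt along the negative real axis; call z "stable" if rho(Q(z)) <= 1
zs = np.linspace(0.0, -1.0, 2001)
stable = [z for z in zs if spectral_radius(z) <= 1.0 + 1e-12]
print(f"real-axis stability boundary near z = {min(stable):.3f}")
```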

5 Conclusions

While it is generally assumed that the global error will be of the order of the local truncation error, in this work we presented an approach to creating methods whose global error is of higher order than predicted by the local truncation error. To accomplish this, we used the block formulation of a method, \( V_{n+1} = Q_n V_n \), where the discrete solution operator \(Q_n = A+{\Delta t}B F_n \) comprises the coefficient matrices A and B and the matrix operator \(F_n\).

We show that if A is a diagonalizable rank-one matrix whose only nonzero eigenvalue, \(z_1=1\), corresponds to the eigenvector of all ones, then the error inhibiting property occurs if the leading part of the local truncation error for the linear constant coefficient case (\(F_n=F=\) a constant) is spanned, to leading order, by the eigenvectors corresponding to the zero eigenvalues of A. We show that a method with these properties has a global error of higher order than the local truncation error, even on nonlinear problems.

After presenting the concept behind these methods, we use the theoretical properties above to develop block one-step methods in the family of Type 3 DIMSIM methods presented in [2]. We demonstrate in numerical examples on nonlinear problems (including a nonlinear system) that these methods have a global error that is one order higher than the local truncation error. In contrast, another Type 3 DIMSIM method, whose matrix A satisfies the first three properties C1–C3 but does not satisfy the error inhibiting property C4 (that the local truncation error lies in the space spanned by the eigenvectors of A corresponding to the zero eigenvalues), does not attain a global error higher than the local truncation error on nonlinear test problems.

The major development in this work is the concept of an error inhibiting method and the new approach for developing methods that are constructed to control the growth of the local truncation error. While the newly developed methods presented in this work can be used in place of currently standard methods (particularly in place of Type 3 DIMSIM methods) to obtain higher order accuracy, it is not yet known how they compare to other methods in terms of other important properties. In future work we intend to study the computational efficiency and storage requirements of these methods and to analyze their linear stability regions. We expect that this will also lead to further development of error inhibiting methods that have other favorable properties.