Abstract
The scalar, one-dimensional advection equation and heat equation are considered. These equations are discretized in space, using a finite difference method satisfying summation-by-parts (SBP) properties. To impose the boundary conditions, we use a penalty method called simultaneous approximation term (SAT). Together, this gives rise to two semi-discrete schemes where the discretization matrices approximate the first and the second derivative operators, respectively. The discretization matrices depend on free parameters from the SAT treatment. We derive the inverses of the discretization matrices, interpreting them as discrete Green’s functions. In this direct way, we also determine precisely which choices of SAT parameters make the discretization matrices singular. In the second derivative case, it is shown that if the penalty parameters are chosen such that the semi-discrete scheme is dual consistent, the discretization matrix can become singular even when the scheme is energy stable. The inverse formulas hold for SBP-SAT operators of arbitrary order of accuracy. For second and fourth order accurate operators, the inverses are provided explicitly.
1 Introduction
Consider the time-dependent partial differential equation (1a) below, where \({\mathcal {L}}\) represents a linear differential operator and \(f(x)\) is a forcing function. We assume that some suitable initial condition and—for the moment homogeneous—boundary conditions are given such that we have a well-posed problem. Applying the method of lines, that is discretizing first in space while keeping time continuous, yields a system of ordinary differential equations (1b), where we refer to \(L\) as the discretization matrix.
We first look at the scalar advection equation and thereafter at the heat equation, both in one spatial dimension. Thus \(L\) approximates either the first or the second derivative operator, including boundary treatments.
In this paper, \(L\) is obtained using the SBP-SAT finite difference method. This class of finite difference methods is based on difference operators fulfilling summation-by-parts (SBP) properties, and is modified by the penalty technique simultaneous approximation term (SAT) for treating the boundary conditions. The SBP operators were first developed for first derivatives [21, 29] and later for second derivatives [7, 25], and are designed to facilitate the derivation of energy estimates. A means to impose boundary conditions without destroying these properties is to use SAT [6]. The SATs included in \(L\) contain free parameters. We follow the common practice of determining these parameters using the energy method, such that (1b) is guaranteed to be time-stable. Thereafter, any remaining degrees of freedom in the SATs can be used to make the scheme dual consistent. Dual consistency is advantageous when computing functionals of the solution, since the order of accuracy of functionals from dual consistent schemes can be higher compared to those from non-dual consistent schemes [18]. For more details about SBP-SAT, see [12, 31].
Thanks to the SBP-SAT properties, the discretization matrix can be factorized as \(L=H^{-1}K\), where \(H\) is a symmetric, positive definite matrix that has the role of a quadrature rule, see [19]. Now consider the steady version of (1a), \({\mathcal {L}}u=f\). Its solution \( u(x)\) may be represented as in (2a) below, where \({\mathcal {G}}\) is the Green’s function. The steady version of (1b) is \(L{\mathbf {v}}=\mathbf {f}\). Solving for \({\mathbf {v}}\) yields (2b).
With \(H\)’s role as a quadrature rule in mind, we can see a clear similarity between (2a) and (2b): Since \(\mathbf {f}\) approximates \(f\) and the multiplication by \(H\) approximates the integration, we realize that \(K^{-1}\) resembles the Green’s function \({\mathcal {G}}\). It makes sense to refer to \(K^{-1}\) as a discrete Green’s function.
A finite difference analogue of the Green’s function was introduced already in the fundamental article [9]. Thereafter, discrete Green’s functions appear sporadically in the literature, see for example [8, 10] and references therein. E.g. in [4] (and correspondingly in [9] for two-dimensional problems) the discrete formula approximating (2a) is scaled with the spatial mesh size h, which then corresponds closely to (2b). However, since traditional finite difference stencils usually do not have an assigned quadrature rule in the same sense as the SBP operators, the term “discrete Green’s functions” often refers to \(L^{-1}\) rather than to \(K^{-1}\), for example in [5, 8, 28].
In the above-mentioned articles, the standard way of enforcing boundary conditions, injection, has been used instead of SAT (for descriptions of these two boundary methods, see for example [31]). In [14], the first and second derivatives were approximated using an SBP-SAT finite volume method, and the inverses analogous to \(K^{-1}\) were derived and used for analysing errors. Here, we derive formulas for \(K^{-1}\) corresponding to the first and second derivatives as well; however, as an extension of the results in [14], our formulas hold for arbitrary orders of accuracy, and in the second derivative case we consider general Robin boundary conditions instead of only Dirichlet boundary conditions.
The inverses are full matrices and are therefore probably not competitive for solving systems \(L{\mathbf {v}}=\mathbf {f}\) directly, compared to fast solvers for banded matrices. It is however often advisable to use preconditioning to improve the convergence of iterative methods [16]. A preconditioning matrix \(P\) should ideally approximate the inverse of \(L\) in some sense, and knowledge about the structure of the inverses could—speculatively—be used when designing preconditioning matrices. If \(P\) is a sparse approximate inverse, the computations are cheap, but preconditioners \(P\) may also be essentially dense matrices, as for example the fundamental solution preconditioners considered in [5].
The paper is organized as follows: in Sect. 2, we look at the semi-discrete scheme approximating the advection equation. The matrix \(K\) associated with \(\frac{\partial }{\partial x}\) is denoted \({\widetilde{Q}}\), and its inverse is presented in Theorem 2.1. In Sect. 3, we consider the heat equation, thus approximating \(\frac{\partial ^2}{\partial x^2}\). The related matrix \(K\), denoted \({\widetilde{A}}\), is inverted in Theorem 3.1. The SAT parameters are chosen to give stability and dual consistency, and additionally it is of interest to know if some choices of SAT parameters result in a singular discretization matrix \(L\). In the second derivative case, it turns out that an energy stable scheme can actually have a singular \(L\) if the scheme is also dual consistent. Some relations between stability, dual consistency and a singular discretization matrix are discussed in Sect. 3.3. We also discuss the relations between two different ways of showing energy stability, in Sect. 3.4. The paper is summarized in Sect. 4.
2 The First Derivative
Consider the scalar advection equation with a Dirichlet boundary condition at the inflow boundary, that is
valid for \(t\ge 0\), with initial condition \(u(x,0)=u_0(x)\). The forcing function \(f(x,t)\), the initial data \(u_0(x)\) and the boundary data \(g_\mathrm{L}(t)\) are known functions.
We call (3) well-posed if it has a unique solution and is stable (can be bounded by data). Techniques for showing existence and uniqueness can be found in for example [17, 20]. We focus on showing stability, since we will derive a corresponding stable discrete problem later. We use the energy method: we multiply the partial differential equation in (3) by \(u\) and integrate over the spatial domain. Thereafter, we use integration by parts and apply the boundary condition. For simplicity, we consider the homogeneous case, that is with the data \(f=0\) and \(g_\mathrm{L}=0\). This yields
where \(\Vert u\Vert ^2=\int _0^{\ell }u^2\,\mathrm {d} x\) and where we have used that \((u^2)_t=2uu_t\). In the homogeneous case, the growth rate thus becomes \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert u\Vert ^2\le 0\). Integrating this in time yields the energy estimate \(\Vert u\Vert ^2\le \Vert u_0\Vert ^2\) and the solution is thus bounded. Since (3) is a one-dimensional hyperbolic problem it is also possible to show strong well-posedness, i.e., that \(\Vert u\Vert \) is bounded by the data \(f\), \(g_\mathrm{L}\) and \(u_0\). See [17, 20] for different definitions of well-posedness.
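The omitted calculation can be sketched as follows, under the assumption (equation (3) is not reproduced in this excerpt) that the advection equation reads \(u_t+u_x=f\) on \(0<x<\ell \) with inflow boundary \(x=0\). In the homogeneous case,

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\Vert u\Vert^2
= 2\int_0^{\ell} u\,u_t\,\mathrm{d}x
= -2\int_0^{\ell} u\,u_x\,\mathrm{d}x
= -\big[u^2\big]_0^{\ell}
= u(0,t)^2 - u(\ell,t)^2
= -u(\ell,t)^2 \le 0,
```

where the last equality uses the boundary condition \(u(0,t)=g_\mathrm{L}=0\).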
2.1 The Semi-discrete Scheme
We first discretize in space, on the interval \(x\in [0, \ell ]\), using \(n+1\) equidistant grid points \(x_i=ih\), where \(h= \ell /n\) and \(i=0,1,\ldots ,n\). Using the SBP-SAT finite difference method, we obtain a semi-discrete scheme approximating (3) as
where \({\mathbf {v}}(t)=[v_0, v_1, \ldots , v_n]^{\mathsf {T}}\) is the approximation of the continuous solution \(u(x,t)\), and where \(\mathbf {f}=[f(x_0,t), f(x_1,t), \ldots , f(x_n,t)]^{\mathsf {T}}\) is the restriction of \(f(x,t) \) to the grid. In the same way, we let the initial data be \({\mathbf {v}}(0)=[u_0(x_0), u_0(x_1), \ldots , u_0(x_n)]^{\mathsf {T}}\). The matrix \(D_1\) approximates the first derivative operator \(\partial /\partial x\), and fulfills the SBP-properties [21, 29]
where \({\mathbf {e}}_\mathrm{L}=[1, 0, \ldots , 0]^{\mathsf {T}}\) and \({\mathbf {e}}_\mathrm{R}=[ 0, \ldots , 0, 1]^{\mathsf {T}}\). By the notation >, we mean that the matrix \(H\) is positive definite. As mentioned in the introduction, \(H\) has the role of a quadrature rule and \(\Vert {\mathbf {v}}\Vert _{H}^2\equiv {\mathbf {v}}^{\mathsf {T}}H{\mathbf {v}}\) approximates the \(L^2\)-norm of \(u(x,t)\), see [19]. The scalar \(\sigma _\mathrm{L}\) determines the strength of the SAT, and will be chosen below such that the scheme (4) is energy stable and dual consistent.
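As a concrete illustration, the (2,1) order accurate diagonal-norm operator from [29] can be assembled and its SBP properties (5) checked numerically. The sketch below (in Python with NumPy, not part of the original presentation) assumes the standard trapezoidal-rule norm \(H\) and the usual nearly skew-symmetric \(Q\):

```python
import numpy as np

n = 8                        # number of grid intervals; x_i = i*h on [0, ell]
ell = 1.0
h = ell / n

# (2,1) order accurate diagonal-norm SBP operator (standard form, cf. [29]):
H = h * np.diag([0.5] + [1.0] * (n - 1) + [0.5])          # trapezoidal-rule quadrature
Q = 0.5 * (np.diag(np.ones(n), 1) - np.diag(np.ones(n), -1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5
D1 = np.linalg.solve(H, Q)                                 # D1 = H^{-1} Q

eL = np.zeros(n + 1); eL[0] = 1.0
eR = np.zeros(n + 1); eR[-1] = 1.0

# SBP property: Q + Q^T = e_R e_R^T - e_L e_L^T
assert np.allclose(Q + Q.T, np.outer(eR, eR) - np.outer(eL, eL))

# Consistency: D1 differentiates constants and linear functions exactly
x = np.linspace(0.0, ell, n + 1)
assert np.allclose(D1 @ np.ones(n + 1), 0.0)
assert np.allclose(D1 @ x, np.ones(n + 1))

# H acts as a quadrature rule (here: exactly the trapezoidal rule)
f = x**2
trap = h * (0.5 * f[0] + f[1:-1].sum() + 0.5 * f[-1])
assert np.isclose(np.ones(n + 1) @ H @ f, trap)
```

Note how \(D_1\) reduces to the first order one-sided stencil \((v_1-v_0)/h\) at the boundary row, consistent with the "(2,1)" accuracy notation used later in the text.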
2.1.1 Stability and Dual Consistency
To show energy stability, we multiply (4) by \({\mathbf {v}}^{\mathsf {T}}H\) from the left and use the relations (5). We thereafter add the transpose, and we consider \(\mathbf {f}={\mathbf {0}}\) and \(g_\mathrm{L}=0\), just as in the continuous case. This yields
where \(v_0={\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\) and \(v_n={\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}\). We need \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H\le 0\), which is guaranteed if \( \sigma _\mathrm{L}\le -1/2\). For a dual consistent scheme, we need \(\sigma _\mathrm{L}=-1\), see [3, 18].
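The bound on \(\sigma _\mathrm{L}\) can be made explicit. Assuming the SAT in (4) has the standard form \({\mathbf {v}}_t=-D_1{\mathbf {v}}+\sigma _\mathrm{L}H^{-1}{\mathbf {e}}_\mathrm{L}(v_0-g_\mathrm{L})+\mathbf {f}\) (the scheme itself is not reproduced in this excerpt), the homogeneous case gives, using \(Q+Q^{\mathsf {T}}={\mathbf {e}}_\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\) from (5),

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\Vert \mathbf{v}\Vert_H^2
= -\mathbf{v}^{\mathsf{T}}(Q+Q^{\mathsf{T}})\mathbf{v} + 2\sigma_\mathrm{L} v_0^2
= v_0^2 - v_n^2 + 2\sigma_\mathrm{L} v_0^2
= (1+2\sigma_\mathrm{L})\,v_0^2 - v_n^2,
```

which is non-positive for all \({\mathbf {v}}\) precisely when \(1+2\sigma _\mathrm{L}\le 0\), i.e. \(\sigma _\mathrm{L}\le -1/2\).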
2.2 The Inverse of the Discretization Matrix
We first rewrite (4) as
where
We identify \({\widetilde{Q}}\) as the first derivative version of \(K\) discussed in the introduction. The second order accurate version of \({\widetilde{Q}}\) was inverted in [14] and inspired by those results, we make a similar ansatz and derive \({\widetilde{Q}}^{-1}\) of arbitrary order of accuracy. The result is given in Theorem 2.1.
Theorem 2.1
Consider the \((n+1) \times (n+1) \)-matrices Q from (5) and \({\widetilde{Q}}\) found in (7). The structures of Q and \({\widetilde{Q}}\) are
where \(\vec {q}\) is an \(n\times 1\)-vector and \({\bar{Q}}\) is an \(n\times n\)-matrix. It is assumed that \({\bar{Q}}\) is invertible. The inverse of \({\widetilde{Q}}\) is
where
Proof of Theorem 2.1
We aim to show that \({\widetilde{Q}}{\widetilde{Q}}^{-1}=I\), where I is the \((n+1)\times (n+1)\) identity matrix. Using \({\widetilde{Q}}\) from (7) and \({\widetilde{Q}}^{-1}\) from (9), we compute
Note that \(D_1{\mathbf {1}}=0\), since \(D_1\) in (5) is a consistent difference operator. Hence, \(Q{\mathbf {1}}={\mathbf {0}}\). Furthermore, \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}G_1={\mathbf {0}}^{\mathsf {T}}\) since the first row of \(G_1\) consists of zeros. These relations, the fact that \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {1}}=1\) and the structures of the components in (8) and (10) yield
where \({\bar{I}}\) is the \(n\times n\) identity matrix. \(\square \)
Corollary 2.2
The structure of \({\widetilde{Q}}^{-1}\) in (9) implies that \({\widetilde{Q}}\) is singular only if \(\sigma _\mathrm{L}=0\).
The existence of \(G_1\) and \({\mathbf {b}}\) in (10), and consequently the validity of Theorem 2.1 and Corollary 2.2, rely on the assumption that \({\bar{Q}}\) is invertible. In the (2,1) order accurate case—where we by the notation “(2,1) order accurate” refer to a matrix \(D_1\) which has second order of accuracy in the interior finite difference stencil and first order of accuracy at the boundaries—the inverse of \({\bar{Q}}\) is derived and presented in “Appendix A.1”, which directly proves its existence. The same is done for the inverse of \({\bar{Q}}\) corresponding to the (4,2) order accurate operator, which is presented in “Section A.2” of Appendix. Higher order operators, on the other hand, have free parameters. For example, for the diagonal norm (6,3) order accurate version of \(D_1\) described in [29], \(x_1\) is a free parameter. In this case, \({\widetilde{Q}}\) is invertible for commonly used parameter values \(x_1\), see [27]. The invertibility of \({\widetilde{Q}}\) is also addressed for general SBP operators in [22], where it is shown that \({\widetilde{Q}}\) (with \(\sigma _\mathrm{L}=-1\)) is invertible if and only if \({\mathbf {1}}\) spans the nullspace of \(D_1\).
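Corollary 2.2 is easy to sanity-check numerically for the (2,1) order accurate operator. The sketch below (illustrative, not from the paper) builds the standard \(Q\) and confirms that \({\widetilde{Q}}=Q-\sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\) is singular for \(\sigma _\mathrm{L}=0\) but nonsingular for several other values:

```python
import numpy as np

n = 8
# (2,1) order accurate SBP matrix Q (standard trapezoidal-norm operator, cf. [29])
Q = 0.5 * (np.diag(np.ones(n), 1) - np.diag(np.ones(n), -1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5
eL = np.zeros(n + 1); eL[0] = 1.0

# sigma_L = 0 gives Q_tilde = Q, which is singular since Q @ 1 = 0
assert np.isclose(np.linalg.det(Q), 0.0)

# Q_tilde = Q - sigma_L e_L e_L^T is nonsingular for every tested sigma_L != 0
for sigma_L in (-1.0, -0.5, 0.3, 5.0):
    Qt = Q - sigma_L * np.outer(eL, eL)
    assert abs(np.linalg.det(Qt)) > 1e-12
```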
The discussion above is focused on “classical FD-SBP operators”, constructed around centred finite difference approximations with diagonal matrices \(H\). However, Theorem 2.1 only requires consistency (such that \(Q{\mathbf {1}}=0\)) and that the SAT makes \({\widetilde{Q}}=Q- \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\). Thus it holds for a more general class of SBP operators where the boundary nodes are included in the operator, compare Definition 1 in [11]—as long as the corresponding block \({\bar{Q}}\) is invertible. Moreover, in Theorem 2.1 it is implied that \(Q+Q^{\mathsf {T}}={\mathbf {e}}_\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\), but this is not crucial for the proof and the result applies also for e.g. upwind operators.
Remark 2.3
For the steady version of (3), that is \(u_{x}=f\) with \(u(0)=g_\mathrm{L}\), we have
where \({\mathcal {G}}\) is a Green’s function. Starting from \({\mathbf {v}}={\widetilde{Q}}^{-1}H\widetilde{\mathbf{f}}\), using (7) and (9) as well as the relations \({\mathbf {b}}^{\mathsf {T}}{\mathbf {e}}_\mathrm{L}=1\) and \(G_1{\mathbf {e}}_\mathrm{L}={\mathbf {0}}\) deduced from (10), we obtain
Recall from the introduction that \(K^{-1}={\widetilde{Q}}^{-1}\) resembles \({\mathcal {G}}\). E.g. the version of \({\widetilde{Q}}^{-1}\) found in (34) in “Section A.1” of Appendix (which corresponds to the second order accurate operator) is
The dual consistent choice \(\sigma _\mathrm{L}=-1\) is optimal in the sense that it cancels the oscillations such that \(({\widetilde{Q}}^{-1})_{i,j}=1\) for \(j\le i\), however \(({\widetilde{Q}}^{-1})_{i,j}=(-1)^{i+j}\ne 0\) for \(i\le j\). Letting \(\sigma _\mathrm{L}\rightarrow -\infty \) instead, which can be interpreted as mimicking the injection treatment, results in \({\widetilde{Q}}^{-1}=G_1\). By writing the numerical solution as \({\mathbf {v}}={\mathbf {1}}\left( g_\mathrm{L}-\frac{1}{\sigma _\mathrm{L}}{\mathbf {b}}^{\mathsf {T}}H\mathbf {f}\right) +G_1H\mathbf {f}\), we see that the constant level of the solution varies when \(\sigma _\mathrm{L}\) is tuned. In particular, \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}\rightarrow g_\mathrm{L}\) as \(\sigma _\mathrm{L}\rightarrow -\infty \).
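The stated structure of \({\widetilde{Q}}^{-1}\) for \(\sigma _\mathrm{L}=-1\) can be verified numerically. The sketch below assumes the standard (2,1) order accurate \(Q\) from [29] and checks the entrywise pattern against a direct matrix inversion:

```python
import numpy as np

n = 8
# (2,1) order accurate SBP matrix Q (standard form, cf. [29])
Q = 0.5 * (np.diag(np.ones(n), 1) - np.diag(np.ones(n), -1))
Q[0, 0], Q[-1, -1] = -0.5, 0.5
eL = np.zeros(n + 1); eL[0] = 1.0

sigma_L = -1.0                                # the dual consistent choice
Qt = Q - sigma_L * np.outer(eL, eL)
Qt_inv = np.linalg.inv(Qt)

# claimed pattern: (Qt^{-1})_{ij} = 1 for j <= i, and (-1)^{i+j} for i < j
i, j = np.indices((n + 1, n + 1))
expected = np.where(j <= i, 1.0, (-1.0) ** (i + j))
assert np.allclose(Qt_inv, expected)
```

The lower-triangular part of ones mirrors the continuous Green's function of \(u_x=f\), while the alternating upper-triangular entries are the unavoidable grid oscillations mentioned above.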
2.2.1 Interface SATs
The SBP-SAT methodology is well suited for dividing the computational domain into subdomains, coupled by interfaces [7]. As an example, we discretize (3) again, using two subdomains with the unknowns \({\mathbf {v}}_{\mathrm{A},\mathrm{B}}\), coupled such that \({\mathbf {e}}_\mathrm{R}^{\mathsf {T}}{\mathbf {v}}_\mathrm{A}\approx {\mathbf {e}}_\mathrm{L}^{\mathsf {T}}{\mathbf {v}}_\mathrm{B}\) at the interface. Modifying (4) to this two-subdomain system yields
with
where all quantities with subindex A belong to the left subdomain and the ones marked with B to the right subdomain. The same vectors \({\mathbf {e}}_\mathrm{L,R}\) are used in both domains, implying that they have the same number of grid points, but that is merely for ease of presentation. In particular, \({\widetilde{Q}}_\mathrm{A}=Q_\mathrm{A}- \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}\) and \(\widetilde{\mathbf{f}}_\mathrm{A}=\mathbf {f}_\mathrm{A}-H_\mathrm{A}^{-1} \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}g_\mathrm{L}\) are modified to impose the boundary condition, and \(\mu _{\mathrm{A},\mathrm{B}}\) are the penalty parameters at the interface. For \(\mu _{\mathrm{A}}-\mu _{\mathrm{B}}=1\) with \(\mu _{\mathrm{A}}+\mu _{\mathrm{B}}\le 0\), the scheme is conservative, dual consistent and stable.
Assume \(Q_{\mathrm{A},\mathrm{B}}{\mathbf {1}}={\mathbf {0}}\), and let \({\widetilde{Q}}_\mathrm{A}^{-1}=G_\mathrm{A}-\frac{1}{\sigma _\mathrm{L}}{\mathbf {1}}{\mathbf {b}}_\mathrm{A}^{\mathsf {T}}\), and \((Q_\mathrm{B}-\mu _\mathrm{B}{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}})^{-1}=G_\mathrm{B}-\frac{1}{\mu _\mathrm{B}}{\mathbf {1}}{\mathbf {b}}_\mathrm{B}^{\mathsf {T}}\). Then \({\mathbb {Q}}\mathbb {1}=0\), where \({\mathbb {Q}}=\widetilde{{\mathbb {Q}}}(\sigma _\mathrm{L}=0)\) and where \(\mathbb {1}\) is given below. In this case Theorem 2.1 applies and the inverse of \(\widetilde{{\mathbb {Q}}}\) has the form \(\widetilde{{\mathbb {Q}}}^{-1}={\mathbb {G}}-\frac{1}{\sigma _\mathrm{L}}\mathbb {1}\mathbb {b}^{\mathsf {T}}\), where
are obtained using the formula for the inverse of block matrices together with the relations \({\mathbf {e}}_\mathrm{L}^{\mathsf {T}}G_\mathrm{B}={\mathbf {0}}^{\mathsf {T}}\), \(G_\mathrm{B}{\mathbf {e}}_\mathrm{L}={\mathbf {0}}\), \({\mathbf {b}}_\mathrm{B}^{\mathsf {T}}{\mathbf {e}}_\mathrm{L}=1\) and \({\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {1}}=1\).
As in the single domain case, \(\widetilde{{\mathbb {Q}}}^{-1}\) can be interpreted as a discrete Green’s function. In particular, we note an interesting behaviour when \(\mu _{\mathrm{A}}=0\) and \(\mu _{\mathrm{B}}=-1\), i.e. a fully up-wind coupling. Then \(\widetilde{{\mathbb {Q}}}\) is block-triangular, which leads to
We see that with the upwind coupling, the continuous feature of having \({\mathcal {G}}(x,y)=0\) for \(x\le y\) from Remark 2.3 is mimicked at least on the block-matrix level.
3 The Second Derivative
Now consider the scalar heat equation with Robin boundary conditions, that is
valid for \(t\ge 0\), with initial condition \(u(x,0)=u_0(x)\). The forcing function \(f(x,t)\), the initial data \(u_0(x)\) and the boundary data \(g_\mathrm{L,R}(t)\) are known functions.
We multiply the partial differential equation in (11) by \(u\) and integrate the result over the spatial domain, with the data put to \(f=0\) and \(g_\mathrm{L,R}=0\). Thereafter, using integration by parts and the boundary conditions yields
For a decaying growth rate, we need \(\alpha _\mathrm{L,R}\beta _\mathrm{L,R}\ge 0\).
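The omitted calculation can be sketched as follows, assuming (the exact form of (11) is not reproduced here) that the equation is \(u_t=u_{xx}+f\) with Robin conditions \(\alpha _\mathrm{L}u-\beta _\mathrm{L}u_x=g_\mathrm{L}\) at \(x=0\) and \(\alpha _\mathrm{R}u+\beta _\mathrm{R}u_x=g_\mathrm{R}\) at \(x=\ell \). With homogeneous data,

```latex
\frac{\mathrm{d}}{\mathrm{d}t}\Vert u\Vert^2
= 2\int_0^{\ell} u\,u_{xx}\,\mathrm{d}x
= 2\big[u\,u_x\big]_0^{\ell} - 2\Vert u_x\Vert^2
= -2\frac{\alpha_\mathrm{R}}{\beta_\mathrm{R}}u(\ell,t)^2
  -2\frac{\alpha_\mathrm{L}}{\beta_\mathrm{L}}u(0,t)^2
  -2\Vert u_x\Vert^2,
```

where the boundary conditions \(u_x(0)=(\alpha _\mathrm{L}/\beta _\mathrm{L})u(0)\) and \(u_x(\ell )=-(\alpha _\mathrm{R}/\beta _\mathrm{R})u(\ell )\) have been inserted (for \(\beta _\mathrm{L,R}\ne 0\); in the Dirichlet limit \(\beta =0\) the corresponding boundary term vanishes). The right-hand side is non-positive exactly when \(\alpha _\mathrm{L,R}\beta _\mathrm{L,R}\ge 0\).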
3.1 The Semi-discrete Scheme
Using the SBP-SAT finite difference method, we obtain a scheme approximating (11) as
where \({\mathbf {v}}\), \(\mathbf {f}\), \(H\) and \({\mathbf {e}}_\mathrm{L,R}\) are described as in Sect. 2.1. The matrix \(D_2\) approximates the second derivative operator, and fulfills the SBP-properties
The vectors \({\mathbf {d}}_\mathrm{L}\) and \({\mathbf {d}}_\mathrm{R}\) are consistent finite difference stencils approximating the first derivative, see [7]. Two common categories of \(D_2\) operators are wide-stencil and narrow-stencil operators. Wide-stencil operators can be factorized as \(D_2= D_1^2\), and the term “narrow” describes finite difference schemes with a minimal stencil width [26].
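As an illustration, the second order accurate narrow-stencil operator can be assembled and the decomposition (13) checked numerically. Since (13) is not reproduced in this excerpt, the sketch below assumes the conventional form \(HD_2=-A-{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+{\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}\) with \(A\) symmetric positive semi-definite:

```python
import numpy as np

n = 8
ell = 1.0
h = ell / n

H = h * np.diag([0.5] + [1.0] * (n - 1) + [0.5])

# second order accurate narrow-stencil D2: [1, -2, 1]/h^2 rows, also at the boundaries
D2 = (np.diag(np.ones(n), 1) - 2 * np.eye(n + 1) + np.diag(np.ones(n), -1)) / h**2
D2[0, :3] = np.array([1.0, -2.0, 1.0]) / h**2
D2[-1, -3:] = np.array([1.0, -2.0, 1.0]) / h**2

eL = np.zeros(n + 1); eL[0] = 1.0
eR = np.zeros(n + 1); eR[-1] = 1.0
# one-sided, second order accurate boundary derivative stencils d_L, d_R
dL = np.zeros(n + 1); dL[:3] = np.array([-1.5, 2.0, -0.5]) / h
dR = np.zeros(n + 1); dR[-3:] = np.array([0.5, -2.0, 1.5]) / h

# assumed SBP decomposition: A = -H D2 - e_L d_L^T + e_R d_R^T
A = -(H @ D2) - np.outer(eL, dL) + np.outer(eR, dR)
assert np.allclose(A, A.T)                                 # A is symmetric
assert np.min(np.linalg.eigvalsh(A)) > -1e-10              # A is positive semi-definite
assert np.allclose(A @ np.ones(n + 1), 0.0)                # A annihilates constants
```

For this operator, \(A\) turns out to be the familiar stiffness-like matrix with rows \([-1,\,2,\,-1]\) in the interior, mirroring the continuous identity \(\int v v_{xx}=[v v_x]-\int v_x^2\).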
The penalty parameters \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) in (12) are scalars that will be further specified and discussed in the next sections. Now, we use (13) to rewrite (12) as
where
and where \(\widetilde{\mathbf{f}}=\mathbf {f}-H^{-1}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L})g_\mathrm{L}-H^{-1}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) g_\mathrm{R}\). We identify \({\widetilde{A}}\) as the second derivative version of the matrix \(K\) from the introduction.
3.1.1 Stability
To show energy stability, we multiply (12) by \({\mathbf {v}}^{\mathsf {T}}H\) from the left and use the relations (13). We thereafter add the transpose, and let \(\mathbf {f}={\mathbf {0}}\) and \(g_\mathrm{L,R}=0\). This yields
where we need to show that \(\frac{\mathrm {d} }{\mathrm {d} t}\Vert {\mathbf {v}}\Vert ^2_H\le 0\). We will determine the stability limits of \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) using a procedure sometimes called the borrowing technique [1, 2, 7, 15, 24, 30, 32]. The idea is to “borrow” a maximum amount \(\gamma \) of “positivity” from A, more precisely as
Inserting the relation in (17) into (16), we obtain
For stability, we need both the matrices in the two quadratic forms above to be negative semi-definite. This is fulfilled if
3.1.2 Dual Consistency
To make the scheme (12) dual consistent we first note that the operator \(\partial ^2/\partial x^2\) (including boundary conditions) is a symmetric operator and that the matrix \({\widetilde{A}}\) must be symmetric to mimic this. From (15) it is clear that \({\widetilde{A}}\) is symmetric if \(1+\sigma _\mathrm{L,R}\beta _\mathrm{L,R}=\tau _\mathrm{L,R}\alpha _\mathrm{L,R}\). Let
where \(\delta _\mathrm{L,R}=0\) for dual consistent choices of penalty parameters. The relations in (19), with \(\delta _\mathrm{L,R}=0\), can also be derived from the penalty parameters of the scalar problem in [13]. For a background and more thorough descriptions of dual consistency, see [18].
Note that now, using the dual consistency parameters \(\delta _\mathrm{L,R}\) defined in (19), the three stability requirements in (18) can be reformulated as
3.2 The Inverse of the Discretization Matrix
We consider the steady version of (14), that is \(H^{-1}{\widetilde{A}}{\mathbf {v}}=\widetilde{\mathbf{f}}\), which has a unique solution \({\mathbf {v}}={\widetilde{A}}^{-1}H\widetilde{\mathbf{f}}\), if \({\widetilde{A}}^{-1}\) exists. We derive this inverse and present the result in Theorem 3.1.
Theorem 3.1
Consider \({\widetilde{A}}\) in (15), which depends on A and \({\mathbf {d}}_\mathrm{L,R}\) in (13) and on the boundary related scalars \(\sigma _\mathrm{L,R}\), \(\tau _\mathrm{L,R}\), \(\alpha _\mathrm{L,R}\) and \(\beta _\mathrm{L,R}\). Let the parts of A be denoted as follows,
where \(a_\mathrm{L}\), \(a_\mathrm{R}\) and \(a_\mathrm{C}\) are scalars, \(\vec {a}_\mathrm{L,R}\) are \((n-1)\times 1\)-vectors and \({\bar{A}}\) is an \((n-1)\times (n-1)\)-matrix. The inverse of \({\widetilde{A}}\) is
where \({\mathbf {1}}=[1\ 1\ 1\ \ldots \ 1]^{{\mathsf {T}}}\) and \({\mathbf {x}}=h[0\ 1\ 2\ \ldots \ n]^{{\mathsf {T}}}\), and where
Furthermore, \( \Sigma \) in (22) is a \(4\times 4\)-matrix
that depends on \(\alpha _\mathrm{L,R}\) and \(\beta _\mathrm{L,R}\), that is on the choices of boundary conditions in (11), on the choices of penalty parameters \(\sigma _\mathrm{L,R}\) and \(\tau _\mathrm{L,R}\) in (12) and on the duality parameters \(\delta _\mathrm{L,R}\) in (19), as well as on the scalars
Proof of Theorem 3.1
The proof is given in “Appendix B”. \(\square \)
Note that the quantities in (23), and thus the validity of Theorem 3.1, rely on the existence of \({\bar{A}}^{-1}\). In “Appendix D”, the explicit values of \({\bar{A}}^{-1}\), as well as of \(G_2\), \({\mathbf {b}}_\mathrm{L,R}\), \(\xi _\mathrm{L,R}\) and \(\xi _\mathrm{C}\), are provided for the (2,0), (2,1) and (4,2) order accurate narrow-stencil operators and the (2,0) order accurate wide-stencil operator. This directly proves the existence of \({\bar{A}}^{-1}\) for these operators. Higher order accurate operators have free parameters, but empirically we can draw the conclusion that \({\bar{A}}^{-1}\) must exist at least for the parameter choices in [25], since the operators therein have been applied successfully for many years.
Given the existence of \({\bar{A}}^{-1}\), we note that \({\widetilde{A}}\) in (22) is singular if and only if \( \Sigma \) in (24) is singular. The matrix \( \Sigma \) is in turn singular if either of the two relations
holds. The first condition is related to the continuous boundary conditions, and makes the matrix singular if Neumann boundary conditions are imposed on both boundaries, i.e. if \(\alpha _\mathrm{L}=\alpha _\mathrm{R}=0\). The second condition has to do with the choice of penalty parameters, and leads us to the following corollary of Theorem 3.1:
Corollary 3.2
The matrix \({\widetilde{A}}\), described in (15), is singular when the penalty parameters simultaneously fulfill \(\sigma _\mathrm{L}=-\left( \xi _\mathrm{L}+\zeta |\xi _\mathrm{C}|\right) \tau _\mathrm{L}\) and \(\sigma _\mathrm{R}=-\left( \xi _\mathrm{R}+|\xi _\mathrm{C}|/\zeta \right) \tau _\mathrm{R}\), where \(\zeta \ne 0\). If \(\xi _\mathrm{C}\), \(\tau _\mathrm{L}\) or \(\tau _\mathrm{R}\) is zero, the matrix \({\widetilde{A}}\) is singular if either \(\sigma _\mathrm{L}=-\tau _\mathrm{L}\xi _\mathrm{L}\) or if \(\sigma _\mathrm{R}=-\tau _\mathrm{R}\xi _\mathrm{R}\).
Proof of Corollary 3.2
We make the ansatz \(\sigma _\mathrm{L,R}=-\tau _\mathrm{L,R}\xi _\mathrm{L,R}-\varepsilon _\mathrm{L,R}\) with some unknown scalars \(\varepsilon _\mathrm{L,R}\). Inserting this into (27) above gives \(\varepsilon _\mathrm{L}\varepsilon _\mathrm{R}=\tau _\mathrm{L}\tau _\mathrm{R}\xi _\mathrm{C}^2\) which is fulfilled for all pairs \(\varepsilon _\mathrm{L}=\tau _\mathrm{L}|\xi _\mathrm{C}|\zeta \) and \(\varepsilon _\mathrm{R}=\tau _\mathrm{R}|\xi _\mathrm{C}|/\zeta \) with arbitrary choices of \(\zeta \ne 0\). If \(\xi _\mathrm{C}\), \(\tau _\mathrm{L}\) or \(\tau _\mathrm{R}\) is equal to zero, it is enough if either \(\varepsilon _\mathrm{L}=0\) or \(\varepsilon _\mathrm{R}=0\). \(\square \)
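The algebra in the proof can be spot-checked numerically. The sketch below assumes that condition (27), which is not reproduced in this excerpt, reads \((\sigma _\mathrm{L}+\tau _\mathrm{L}\xi _\mathrm{L})(\sigma _\mathrm{R}+\tau _\mathrm{R}\xi _\mathrm{R})=\tau _\mathrm{L}\tau _\mathrm{R}\xi _\mathrm{C}^2\), as deduced from the ansatz \(\sigma _\mathrm{L,R}=-\tau _\mathrm{L,R}\xi _\mathrm{L,R}-\varepsilon _\mathrm{L,R}\):

```python
import numpy as np

rng = np.random.default_rng(0)
for _ in range(100):
    tau_L, tau_R, xi_L, xi_R, xi_C, zeta = rng.uniform(-2.0, 2.0, 6)
    if abs(zeta) < 1e-3:
        continue
    # the parameterization from Corollary 3.2
    sigma_L = -(xi_L + zeta * abs(xi_C)) * tau_L
    sigma_R = -(xi_R + abs(xi_C) / zeta) * tau_R
    # assumed singularity condition (27)
    lhs = (sigma_L + tau_L * xi_L) * (sigma_R + tau_R * xi_R)
    assert np.isclose(lhs, tau_L * tau_R * xi_C**2)
```

Indeed, \(\varepsilon _\mathrm{L}\varepsilon _\mathrm{R}=(\tau _\mathrm{L}|\xi _\mathrm{C}|\zeta )(\tau _\mathrm{R}|\xi _\mathrm{C}|/\zeta )=\tau _\mathrm{L}\tau _\mathrm{R}\xi _\mathrm{C}^2\) for every \(\zeta \ne 0\).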
The requirements on A and \({\mathbf {d}}_\mathrm{L,R}\) in Theorem 3.1 are only that A is symmetric, that \({\bar{A}}^{-1}\) exists (as discussed above) and that \(D_2\) and \({\mathbf {d}}_\mathrm{L,R}\) in (13) are consistent such that the relations (43) and (44) in “Appendix B” hold. In addition we will assume that \(D_2\) is constructed such that the left and right boundary closures are equivalent. This implies that A is a centrosymmetric matrix, that is \(A_{i,j} = A_{n-i,n-j}\) for all \(0 \le i,j \le n\), and that \(({\mathbf {d}}_\mathrm{L})_i = -({\mathbf {d}}_\mathrm{R})_{n-i}\) for \(0 \le i \le n\). This additional assumption leads to \(\xi _\mathrm{L}=\xi _\mathrm{R}\) (this is easiest seen by expressing the quantities in (25) as \(\xi _\mathrm{L,R}= 1/\ell +{\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}G_2{\mathbf {d}}_\mathrm{L,R}\) and \(\xi _\mathrm{C}=1/\ell +{\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}G_2{\mathbf {d}}_\mathrm{R,L}\) and thereafter using the fact that the inverse of a centrosymmetric matrix is also centrosymmetric). For later reference we define
and assume that the penalty is chosen to be equally strong on both boundaries:
Assumption 3.3
Choosing an equal penalty strength on both boundaries corresponds to having \(\zeta =1\) in Corollary 3.2. If in addition equivalent boundary closures are assumed, such that \(\xi _\mathrm{L}=\xi _\mathrm{R}\), we can use \(\xi _\mathrm{T}\equiv \xi _\mathrm{L,R}+|\xi _\mathrm{C}|\) from (28). This simplifies the condition of singularity in Corollary 3.2 to \(\sigma _\mathrm{L,R}=-\xi _\mathrm{T}\tau _\mathrm{L,R}\).
Remark 3.4
The inverse of \({\widetilde{A}}\) mimics a fundamental solution. For example, the Green’s function \({\mathcal {G}}\) of Poisson’s equation, \(-u_{xx}=f\) with \(u(0)=u( \ell )=0\), is
Recalling that the matrix \(H\) has the role of a quadrature rule, we see the clear similarity to the time-independent, homogeneous version of (14), \({\mathbf {v}}={\widetilde{A}}^{-1}H\mathbf {f}\). The resemblance is more obvious if the penalty dependent part in (22) is ignored, since then \({\mathbf {v}}=G_2H\mathbf {f}\). For the second order accurate approximation given in (64), \(G_2\) is exact in the grid points, as
This is identical with the result noted for the classical finite difference method using injection instead of SAT, compare [4, 28]. With Robin boundary conditions we have
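The exactness can be illustrated with the classical injection-based operator mentioned above: for the standard tridiagonal approximation of \(-\partial ^2/\partial x^2\) with Dirichlet conditions, the inverse equals \(h\,{\mathcal {G}}(x_i,x_j)\) exactly at the interior grid points. A minimal sketch (illustrative, not from the paper):

```python
import numpy as np

n = 8
ell = 1.0
h = ell / n
x = np.linspace(0.0, ell, n + 1)[1:-1]   # interior grid points x_1, ..., x_{n-1}

# classical second order operator for -u_xx with injected Dirichlet conditions
T = (2 * np.eye(n - 1)
     - np.diag(np.ones(n - 2), 1)
     - np.diag(np.ones(n - 2), -1)) / h**2
Tinv = np.linalg.inv(T)

# Green's function of -u_xx = f, u(0) = u(ell) = 0
def G(xi, yj):
    return xi * (ell - yj) / ell if xi <= yj else yj * (ell - xi) / ell

# the discrete inverse reproduces h * G(x_i, x_j) exactly, not just approximately
expected = h * np.array([[G(xi, yj) for yj in x] for xi in x])
assert np.allclose(Tinv, expected)
```

Here the factor \(h\) plays the role of the quadrature weights \(H\), in line with the representation \({\mathbf {v}}=G_2H\mathbf {f}\) above.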
where \(c_\mathrm{L,R}\) depends on the type and data of the boundary conditions from (11), as
The discrete counterpart is still \({\mathbf {v}}={\widetilde{A}}^{-1}H\widetilde{\mathbf{f}}\), which, using relations in Theorem 3.1 and “Section B.1” of Appendix and with \(\widetilde{\mathbf{f}}=\mathbf {f}-H^{-1}( \sigma _\mathrm{L}{\mathbf {e}}_\mathrm{L}-\tau _\mathrm{L}{\mathbf {d}}_\mathrm{L})g_\mathrm{L}-H^{-1}(\sigma _\mathrm{R}{\mathbf {e}}_\mathrm{R}+\tau _\mathrm{R}{\mathbf {d}}_\mathrm{R}) g_\mathrm{R}\), can be written
where
Unless \(\mathbf {f}=0\), such that \(\eta _\mathrm{L,R}=0\), the numerical solution \({\mathbf {v}}\) differs depending on the choice of penalty parameters, where the vectors \({\mathbf {1}}\), \({\mathbf {x}}\) and \({\mathbf {b}}_\mathrm{L,R}\) span the possible perturbations. As long as choices resulting in \(\sigma _\mathrm{L,R}+\xi _\mathrm{T}\tau _\mathrm{L,R}\approx 0\) are avoided, this perturbation is slight.
3.3 Relations Between Stability, Singularity and Dual Consistency
We take a look at the relation between the stability requirements on the scheme (12) and the conditions that make its discretization matrix singular. First, we note that:
Theorem 3.5
Consider \(\gamma \) in (17) and \(\xi _\mathrm{T}\) in (28). It holds that \(h\gamma =1/\xi _\mathrm{T}\).
Proof
Theorem 3.5 is proven in “Section C.1” of Appendix. \(\square \)
A consequence of Theorem 3.5 is that the stability demands in (20) can be written
with \(\delta _\mathrm{L,R}\) from (19). We will see that the penalty can be chosen such that we have energy stability and a singular discretization matrix at the same time: from Assumption 3.3 we know that the matrix \({\widetilde{A}}\) is singular when \(\sigma _\mathrm{L,R}=-\tau _\mathrm{L,R}\xi _\mathrm{T}\). Inserting this into (29), the third stability demand becomes \(\delta _\mathrm{L,R}^2\le 0\), which is only fulfilled if the penalty parameters are chosen in a dual consistent way. This means that if (12) is an energy stable scheme, it must also be dual consistent to risk having a singular discretization matrix. Note though that even if the scheme is dual consistent, a singular discretization matrix is avoided by choosing \(\sigma _\mathrm{L,R}\ne -\tau _\mathrm{L,R}\xi _\mathrm{T}\). To be precise, simultaneously having \(\sigma _\mathrm{L,R}=-\xi _\mathrm{T}/(\beta _\mathrm{L,R}\xi _\mathrm{T}+\alpha _\mathrm{L,R})\) and \(\tau _\mathrm{L,R}=1 /(\beta _\mathrm{L,R}\xi _\mathrm{T}+\alpha _\mathrm{L,R})\) should be avoided, since this particular choice makes \(\delta _\mathrm{L,R}=0\), fulfills the stability demands but at the same time makes \({\widetilde{A}}\) singular.
In Assumption 3.3, one can argue that \(\zeta =-1\) yields just as equal a penalty strength as \(\zeta =1\), simplifying Corollary 3.2 to \(\sigma _\mathrm{L,R}=-\left( \xi _\mathrm{L,R}-|\xi _\mathrm{C}|\right) \tau _\mathrm{L,R}\). However, these choices do not give energy stability and are therefore not interesting for our further discussions. Besides, \(|\xi _\mathrm{C}|\) tends to be very small, so in practice it does not make much of a difference.
3.4 Relations to the Stability Demands in [13]
In Sect. 3.1.1 the “borrowing technique” is used for deriving the stability restrictions on the penalty parameters. In [13], a different approach (inspired by [3, 18] where wide-stencil discretizations are rewritten as first order systems) is used for showing stability, and here we are going to comment on some connections between the two methods.
In [13], it is assumed that A can be decomposed as in [7], that is as
and the strategy for showing stability is to modify the approximation of \(u_x\) from \(S{\mathbf {v}}\) to the auxiliary variable \({\mathbf {w}}=S{\mathbf {v}}+M^{-1}{\mathbf {e}}_\mathrm{L}\rho _\mathrm{L}+M^{-1}{\mathbf {e}}_\mathrm{R}\rho _\mathrm{R}\). In [13], \(\rho _\mathrm{L,R}\) are penalty-like terms proportional to the solution deviations from boundary data, but other options are possible. Computing \({\mathbf {w}}^{\mathsf {T}}M{\mathbf {w}}\) makes the terms
available to the boundary terms in (16), where \(q_\mathrm{L,R}\), \(q_\mathrm{C}\) and \(q_\mathrm{T}\) are defined as
The “borrowing technique” on the other hand, makes the terms \( -h\gamma {\mathbf {v}}^{\mathsf {T}}({\mathbf {d}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+{\mathbf {d}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}){\mathbf {v}}\) available for the boundary terms in (16).
Although these two approaches to showing stability are different, they are closely related. In Lemma 3.6 we formalize this relation and show that \(q_\mathrm{T}=1/(h\gamma )\).
Lemma 3.6
Assume that A in (13) can be factorized as in (30) with \(M>0\), and define \(q_\mathrm{T}\) as stated in (31). Next, consider (17), where the parameter \(\gamma \) is defined as the maximum value such that \({\tilde{A}}_\gamma \ge 0\) still holds. Then it holds that \(h\gamma =1/q_\mathrm{T}\).
Proof
Lemma 3.6 is proven in “Section C.2” of Appendix. \(\square \)
For wide-stencil operators, \(S=D_1\) and \(M=H\) in (30), and the parameters \(q_\mathrm{L,R}\) and \(q_\mathrm{C}\) in (31) are easily obtained since M is known. For narrow-stencil operators on the other hand, M and the interior of S are not uniquely defined. In [13], the strategy was (under the contrary assumption that S is non-singular and M is singular) to compute
instead, where \({\widetilde{M}}\equiv S^{-{\mathsf {T}}}(A+p{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}})S^{-1}\) with \(p\ne 0\) being a perturbation parameter. For wide-stencil operators though, it can easily be checked numerically that \(q_\mathrm{L,R}\ne {\widetilde{q}}_\mathrm{L,R}\) and \(q_\mathrm{C}\ne {\widetilde{q}}_\mathrm{C}\). This is somewhat alarming, but it can as easily be checked that it still holds that \(q_\mathrm{T}={\widetilde{q}}_\mathrm{T}\). We confirm this analytically in Theorem 3.8 below, and the use of \({\widetilde{q}}_\mathrm{T}\) in [13] is thus justified. First though, we note the following:
Lemma 3.7
The quantities \({\widetilde{q}}_\mathrm{L,R}\) and \({\widetilde{q}}_\mathrm{C}\) defined in (32) are identical to the quantities \(\xi _\mathrm{L,R}\) and \(\xi _\mathrm{C}\) in (25).
Proof
Lemma 3.7 is proven in “Section C.3” of Appendix. \(\square \)
Thus, in summary, we have that:
Theorem 3.8
Assume that A in (13) can be factorized as in (30) with \(M>0\), and define \(q_\mathrm{T}\) as stated in (31). Next, assume that M is singular instead, with \(M\ge 0\), and define \({\widetilde{q}}_\mathrm{T}\) as stated in (32). Then it holds that \(q_\mathrm{T}={\widetilde{q}}_\mathrm{T}\).
Proof
From Lemma 3.6 we have that \(q_\mathrm{T}=1/(h\gamma )\) and from Theorem 3.5 we have that \(1/(h\gamma )=\xi _\mathrm{T}\). Combining Lemma 3.7 with the definitions in (32) and (28) we deduce that \(\xi _\mathrm{T}={\widetilde{q}}_\mathrm{T}\). All in all, this gives \(q_\mathrm{T}=1/(h\gamma )=\xi _\mathrm{T}={\widetilde{q}}_\mathrm{T}\) concluding the proof. \(\square \)
For an example, see the derived values of \({\widetilde{q}}_{\mathrm{L,R},\mathrm{C}}\) and \(q_{\mathrm{L,R},\mathrm{C}}\) for the wide-stencil (2,0) order operator in “Section D.4” of Appendix. As a numerical confirmation, in Table 1 we compare the values of \(h{\widetilde{q}}_\mathrm{T}\) from [13] to the values of \(\gamma \) computed in [24, 32]. In Table 1 though, it appears that \(h{\widetilde{q}}_\mathrm{T}\ge 1/\gamma \). This is because the listed \(\gamma \) are computed for \(n\rightarrow \infty \), and are as such slightly too large for very coarse meshes.
4 Conclusions
We discretize the scalar advection equation and the heat equation in one spatial dimension, using the SBP-SAT finite difference method. This gives rise to two semi-discrete schemes of the form \({\mathbf {v}}_t+L{\mathbf {v}}=\widetilde{\mathbf{f}}\), where the discretization matrix \(L\) approximates either the first derivative or the second derivative, including treatment of the boundary conditions. The matrix \(L\) is, due to properties of the SBP-SAT method, associated with a positive definite matrix \(H\) such that \(L=H^{-1}K\), where the inverse of \(K\) is interpreted as a discrete Green’s function. We derive the general forms of these inverses, and provide explicit examples of \(K^{-1}\) for some operators \(L\) of second and fourth order accuracy.
The boundary treatment SAT induces free parameters in \(L\). We first determine these parameters such that the semi-discrete schemes are energy stable. Any remaining degrees of freedom can be used to make the schemes dual consistent. Another important question is whether the discretization matrices \(L\) are invertible. Conveniently, the formula for \(K^{-1}\) reveals precisely which combinations of SAT parameters make \(L\) singular.
In the second derivative case, it turns out that for one very particular choice of SAT parameters, \(L\) can become singular even when the scheme is energy stable. Here, we can avoid this and instead choose the parameters such that the scheme is energy stable, dual consistent and guaranteed to have an invertible discretization matrix (and consequently a unique solution). However, for more complex problems it might not be feasible to prove that the discretization matrix is invertible, not even for energy stable schemes.
Last, we take a look at two supposedly different approaches to proving energy stability. Curiously, they are closely related, leading to the same demands on the SAT parameters.
References
Almquist, M., Wang, S., Werpers, J.: Order-preserving interpolation for summation-by-parts operators at nonconforming grid interfaces. SIAM J. Sci. Comput. 41(2), 1201–1227 (2019)
Appelö, D., Kreiss, G.: Application of a perfectly matched layer to the nonlinear wave equation. Wave Motion 44(7), 531–548 (2007)
Berg, J., Nordström, J.: Superconvergent functional output for time-dependent problems using finite differences on summation-by-parts form. J. Comput. Phys. 231(20), 6846–6860 (2012)
Beyn, W.-J.: Discrete Green’s functions and strong stability properties of the finite difference method. Appl. Anal. 14(2), 73–98 (1982)
Brandén, H., Holmgren, S., Sundqvist, P.: Discrete fundamental solution preconditioning for hyperbolic systems of PDE. J. Sci. Comput. 30(1), 35–60 (2007)
Carpenter, M.H., Gottlieb, D., Abarbanel, S.: Time-stable boundary conditions for finite-difference schemes solving hyperbolic systems: methodology and application to high-order compact schemes. J. Comput. Phys. 111(2), 220–236 (1994)
Carpenter, M.H., Nordström, J., Gottlieb, D.: A stable and conservative interface treatment of arbitrary spatial accuracy. J. Comput. Phys. 148(2), 341–365 (1999)
Chung, F., Yau, S.-T.: Discrete Green’s functions. J. Comb. Theory Ser. A 91(1), 191–214 (2000)
Courant, R., Friedrichs, K., Lewy, H.: Über die partiellen differenzengleichungen der mathematischen physik. Math. Ann. 100(1), 32–74 (1928)
Deeter, C.R., Springer, G.: Discrete harmonic kernels. J. Math. Mech. 14(3), 413–438 (1965)
Del Rey Fernández, D.C., Boom, P.D., Zingg, D.W.: A generalized framework for nodal first derivative summation-by-parts operators. J. Comput. Phys. 266, 214–239 (2014)
Del Rey Fernández, D.C., Hicken, J.E., Zingg, D.W.: Review of summation-by-parts operators with simultaneous approximation terms for the numerical solution of partial differential equations. Comput. Fluids 95, 171–196 (2014)
Eriksson, S.: A dual consistent finite difference method with narrow stencil second derivative operators. J. Sci. Comput. 75(2), 906–940 (2018)
Eriksson, S., Nordström, J.: Analysis of the order of accuracy for node-centered finite volume schemes. Appl. Numer. Math. 59(10), 2659–2676 (2009)
Gong, J., Nordström, J.: Stable, Accurate and Efficient Interface Procedures for Viscous Problems. Technical Report 2006-19, Department of Information Technology, Uppsala University, Uppsala, Sweden (2006)
Grote, M., Huckle, T.: Parallel preconditioning with sparse approximate inverses. SIAM J. Sci. Comput. 18(3), 838–853 (1997)
Gustafsson, B., Kreiss, H.-O., Oliger, J.: Time-Dependent Problems and Difference Methods. Wiley, New York (2013)
Hicken, J.E., Zingg, D.W.: Superconvergent functional estimates from summation-by-parts finite-difference discretizations. SIAM J. Sci. Comput. 33(2), 893–922 (2011)
Hicken, J.E., Zingg, D.W.: Summation-by-parts operators and high-order quadrature. J. Comput. Appl. Math. 237(1), 111–125 (2013)
Kreiss, H.-O., Lorenz, J.: Initial-Boundary Value Problems and the Navier–Stokes Equations. Academic Press, Boston (1989)
Kreiss, H.-O., Scherer, G.: Finite element and finite difference methods for hyperbolic partial differential equations. In: De Boor, C. (ed.) Mathematical Aspects of Finite Elements in Partial Differential Equations. Academic Press, New York (1974)
Linders, V., Nordström, J., Frankel, S.H.: Properties of Runge–Kutta-summation-by-parts methods. J. Comput. Phys. 419, 109684 (2020)
Mattsson, K., Almquist, M.: A solution to the stability issues with block norm summation by parts operators. J. Comput. Phys. 253, 418–442 (2013)
Mattsson, K., Ham, F., Iaccarino, G.: Stable and accurate wave-propagation in discontinuous media. J. Comput. Phys. 227(19), 8753–8767 (2008)
Mattsson, K., Nordström, J.: Summation by parts operators for finite difference approximations of second derivatives. J. Comput. Phys. 199(2), 503–540 (2004)
Mattsson, K., Svärd, M., Shoeybi, M.: Stable and accurate schemes for the compressible Navier–Stokes equations. J. Comput. Phys. 227(4), 2293–2316 (2008)
Ruggiu, A.A., Nordström, J.: Eigenvalue analysis for summation-by-parts finite difference time discretizations. SIAM J. Numer. Anal. 58(2), 907–928 (2020)
Stetter, H.J.: Instability and non-monotonicity phenomena in discretizations to boundary-value problems. Numer. Math. 12(2), 139–145 (1968)
Strand, B.: Summation by parts for finite difference approximations for d/dx. J. Comput. Phys. 110(1), 47–67 (1994)
Svärd, M., Nordström, J.: A stable high-order finite difference scheme for the compressible Navier–Stokes equations: no-slip wall boundary conditions. J. Comput. Phys. 227(10), 4805–4824 (2008)
Svärd, M., Nordström, J.: Review of summation-by-parts schemes for initial-boundary-value problems. J. Comput. Phys. 268, 17–38 (2014)
Wang, S., Kreiss, G.: Convergence of summation-by-parts finite difference methods for the wave equation. J. Sci. Comput. 71(1), 219–245 (2017)
Acknowledgements
I would like to thank Jan Nordström and Anna Nissen for inspiration and encouragement in early discussions about this work.
Funding
Open access funding provided by Linnaeus University.
Appendices
A Explicit Inverses of the First Derivative Operator
A.1 The (2,1) Order Accurate Operator
In the second order case, we have
with the associated norm-matrix \(H=h\,\mathrm{diag}\left( \frac{1}{2},\, 1,\, 1,\, \ldots ,\, 1,\, 1,\, \frac{1}{2}\right) \). In (33), we identify \(\vec {q}^{\mathsf {T}}=\left[ \frac{1}{2},\; 0,\; \ldots ,\; 0,\; 0\right] \) and (given below) according to (8). Using Gauss–Jordan elimination we find the inverse of , as
We compute as well. Inserting these results into (9) and (10) yields
The formula (34) holds for both even and odd numbers of grid points \(n\). If \(n\) is even (as assumed in the derivation in [14]), the bottom right element of \({\widetilde{Q}}^{-1}\) is \(-1/\sigma _\mathrm{L}\); if \(n\) is odd, the bottom right element is \(2+1/\sigma _\mathrm{L}\).
A.2 The (4,2) Order Accurate Operator
In [29], we find \(D_1\) with fourth order interior accuracy and the associated \(H\). Together with (5) and (7), this gives us
We identify and \( \vec {q}\) as indicated in (8). We are now looking for a matrix such that . Let be composed as
For to hold, must be fulfilled for all \(j=1,2,\ldots ,n\), where the \(n\times 1\) vector \(\vec {e}_j=[0, \ldots , 0, 1, 0, \ldots , 0]^{\mathsf {T}}\) is non-zero only in its \(j\hbox {th}\) element. For to be fulfilled, the interior rows lead to \(g_{i-2,j}-8g_{i-1,j}+8g_{i+1,j}- g_{i+2,j} =12\delta _{i,j}\), where \(\delta _{i,j}\) is the Kronecker delta. Hence, the fourth order linear homogeneous recurrence relation \(g_{i-2,j} -8g_{i-1,j}+8g_{i+1,j}-g_{i+2,j}=0\) has to be fulfilled by almost all \(g_{i,j}\). The general, explicit solution to this recurrence relation has the form \(g_{i,j}=c_{1}+c_{2}(-1)^i+c_{3}\phi ^i+ c_{4}\phi ^{-i}\), where \(\phi =4+\sqrt{15}\approx 7.873\) and where \( c_{1,2,3,4}\) are j-dependent constants.
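As a sanity check of this ansatz (a sketch, not part of the derivation): substituting \(g_i=r^i\) into the recurrence gives the characteristic polynomial \(r^4-8r^3+8r-1=(r^2-1)(r^2-8r+1)\), whose four roots \(1\), \(-1\), \(\phi =4+\sqrt{15}\) and \(\phi ^{-1}=4-\sqrt{15}\) generate exactly the four basis sequences of the general solution:

```python
import math

phi = 4 + math.sqrt(15)

# Basis sequences of the general solution g_i = c1 + c2(-1)^i + c3 phi^i + c4 phi^(-i).
basis = [lambda i: 1.0, lambda i: (-1.0) ** i,
         lambda i: phi ** i, lambda i: phi ** (-i)]

# Each must satisfy the recurrence g_{i-2} - 8 g_{i-1} + 8 g_{i+1} - g_{i+2} = 0.
for g in basis:
    for i in range(2, 10):
        residual = g(i - 2) - 8 * g(i - 1) + 8 * g(i + 1) - g(i + 2)
        assert abs(residual) < 1e-6 * max(1.0, abs(g(i + 2)))

# The relation phi + 1/phi = 8, used later in the text, follows from phi^2 - 8 phi + 1 = 0.
assert math.isclose(phi + 1.0 / phi, 8.0)
```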
The requirement takes slightly different forms depending on j. For \(j=1\), we have , which is expressed explicitly as
The ansatz \(g_{i,1}=c_{1}+c_{2}(-1)^i+c_{3}\phi ^i+ c_{4}\phi ^{-i}\) holds for \(2\le i\le n-2\), where \( c_{1,2,3,4}\) are unknowns to be determined. In addition, we have the three unknowns \(g_{1,1}\), \(g_{n-1,1}\) and \(g_{n,1}\). The first three and the last four rows in (36) give us seven conditions. Inserting the above expressions for \(g_{i,1}\) into (36) gives a linear system with seven unknowns and seven conditions, as
with the unknowns sorted as \(g_{1,1}\), \(c_{1}\), \(c_{2}\), \(c_{3}\), \( c_{4}\), \(g_{n-1,1}\) and \(g_{n,1}\), and where we have used the relation \(\phi +\phi ^{-1} =8\) to simplify the expressions.
A.2.1 The Inverse with an Even Number of Grid Points \(n\)
To make the expressions manageable, we simplify by assuming that \(n\) is an even number. In this particular case, when solving the \(7\times 7\) system above, we obtain
where \({\mathcal {C}}_{n}\) and \({\mathcal {D}}_{n}\) are integers given in (37) below. Note that \({\mathcal {D}}_{n}\ge 1\) for even \(n\), so there is no risk of division by zero. Moreover, we obtain
which inserted into the ansatz \(g_{i,1}=c_{1}+c_{2}(-1)^i+c_{3}\phi ^i+ c_{4}\phi ^{-i}\) leads to
The quantities \({\mathcal {B}}_j\) are integers for integers j, and are specified below
where \(\nu _j=\phi ^{j}+\phi ^{-j}\). For convenience, all the \(g_{i,1}\) presented above will be restated in (38) and (39), wherein we will also make use of \({\mathcal {A}}_j\) defined above.
We use the same strategy for the other columns \(j>1\). For \(2\le j\le n-2\), we need two different versions of the constants \( c_{1,2,3,4}\), depending on whether we consider \(g_{i,j}\) for \(i\le j\) or for \(i\ge j\). We let \(g_{i,j}= c^u_{1}+ c^u_{2} (-1)^i+ c^u_{3}\phi ^i+ c^u_{4}\phi ^{-i}\) for \(2\le i\le j\le n-2\) and \(g_{i,j}= c^l_{1}+ c^l_{2} (-1)^i+ c^l_{3}\phi ^i+ c^l_{4}\phi ^{-i}\) for \(2\le j\le i\le n-2\). Thus for every \(2\le j\le n-2\), we have eight unknown constants, as well as the three remaining unknowns \( g_{1,j} \), \( g_{n-1,j}\) and \( g_{n,j}\). The first three and the last four rows in the system above give us seven conditions. From the rows \(i=j-1,j,j+1\), we get three more conditions, and in addition, we demand that the two versions of \(g_{j,j}\) are identical. All in all, this gives a linear system with eleven unknowns \( g_{1,j} \), \( c^u_{1} \), \( c^u_{2} \), \( c^u_{3} \), \( c^u_{4} \), \( c^l_{1} \), \( c^l_{2} \), \( c^l_{3} \), \( c^l_{4} \), \( g_{n-1,j} \) and \( g_{n,j} \) and eleven conditions.
We still consider even numbers of \(n\). Solving for the unknowns and inserting \({c^u_{1,2,3,4}}\) and \({c^l_{1,2,3,4}}\) into their respective ansatz, we eventually end up with \(g_{i,j}\) for the inner columns, presented below in (40) and (41). Furthermore, repeating the procedure for the last two columns, we obtain \(g_{i,j}\) for \(j=n-1\) and \(j=n\), given in (38) and (39). To simplify the expressions in (39)–(41), we have used \({\mathcal {A}}_j\) in (37).
In summary, when \(n\) is even, the inverse of is given by with \(g_{i,j}\) as described in (38)–(41) below. First, the corner elements are
For \(2\le i\le n-2\), we obtain
while we for \(2\le j\le n-2\) have
Finally, the interior elements are
In the expressions above we have used \({\mathcal {D}}_{n}\), \({\mathcal {C}}_{n}\), \({\mathcal {B}}_j\) and \({\mathcal {A}}_j\) defined in (37). Next, we recall the structure in (8), and identify \(\vec {q}\) in (35) as
and compute . This gives
where we have used the structures of \(g_{i,j}\) in (38)–(41), together with the following relations:
As an example, we write out from (38)–(41) explicitly, for \(n=8\), as
where e.g. \({\mathcal {D}}_{8}=55\). Correspondingly, we have
Inserting from (38)–(41), and (42) into (9) and (10) yields the inverse of \({\widetilde{Q}}\) in the (4,2) order accurate case (for \(n\) even). In the example with \(n=8\), we have
for \(\sigma _\mathrm{L}=-1\), which mimics the Green’s function, as discussed in Remark 2.3.
Recall that we assumed that \(n\) was even. Repeating the derivation for odd \(n\), the resulting inverse \({\widetilde{Q}}^{-1}\) has a similar behaviour, but with other coefficients. For example, the denominators will instead be \({\widetilde{{\mathcal {D}}}}_n=(\phi ^{(n-3)/2}-\phi ^{(3-n)/2})/\sqrt{60}\), which are positive integers for odd \(n\ge 5\).
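As a quick check of this claim (a numerical sketch): since \(\phi ^{-1}=4-\sqrt{15}\), we have \(\phi -\phi ^{-1}=2\sqrt{15}=\sqrt{60}\), and the first few denominators \({\widetilde{{\mathcal {D}}}}_n\) evaluate to small positive integers satisfying the recursion \(a_{k+1}=8a_k-a_{k-1}\) induced by \(\phi +\phi ^{-1}=8\):

```python
import math

phi = 4 + math.sqrt(15)

def D_tilde(n):
    """Denominator (phi^((n-3)/2) - phi^((3-n)/2)) / sqrt(60) for odd n >= 5."""
    k = (n - 3) // 2
    return (phi ** k - phi ** (-k)) / math.sqrt(60)

values = []
for n in (5, 7, 9, 11):
    v = D_tilde(n)
    assert v > 0 and abs(v - round(v)) < 1e-9 * max(1.0, v)   # positive integer
    values.append(round(v))

assert values == [1, 8, 63, 496]
# Consecutive denominators obey a_{k+1} = 8 a_k - a_{k-1}.
for a, b, c in zip(values, values[1:], values[2:]):
    assert c == 8 * b - a
```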
B Proof of Theorem 3.1
Theorem 3.1 states that the inverse of \({\widetilde{A}}\) from (15) is equal to the expression (22). This is shown in “Section B.2” of Appendix; first, however, we present some useful relations.
B.1 Preliminaries
Note that \(D_2{\mathbf {1}}={\mathbf {0}}\) and \(D_2{\mathbf {x}}={\mathbf {0}}\), since \(D_2\) approximates the second derivative operator (these two relations actually hold also for the inconsistent (2,0) order accurate operators in “Sections D.1” and “D.4” of Appendices). Furthermore, \({\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}\) consistently approximate the first derivative, so that \({\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {1}}=0\) and \({\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {x}}=1\). Hence
Combining the above relations with \(A=-HD_2+{\mathbf {e}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}-{\mathbf {e}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}\) from (13), gives
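These consistency relations are easy to verify numerically. The sketch below uses an assumed, standard form of the second order operator: the interior stencil \((1,-2,1)/h^2\) (also in the boundary rows, as one possible closure) and second order one-sided boundary derivative approximations \({\mathbf {d}}_\mathrm{L,R}\):

```python
import numpy as np

n = 10                              # grid: x_i = i*h, i = 0..n
h = 1.0 / n
x = h * np.arange(n + 1)
one = np.ones(n + 1)

# Second derivative operator with central stencil (1,-2,1)/h^2 everywhere
# (the first and last rows shown here are one choice of boundary closure).
D2 = np.zeros((n + 1, n + 1))
for i in range(1, n):
    D2[i, i - 1:i + 2] = np.array([1.0, -2.0, 1.0]) / h ** 2
D2[0, :3] = np.array([1.0, -2.0, 1.0]) / h ** 2
D2[-1, -3:] = np.array([1.0, -2.0, 1.0]) / h ** 2

# Second order one-sided first derivative approximations at the boundaries.
d_L = np.zeros(n + 1); d_L[:3] = np.array([-1.5, 2.0, -0.5]) / h
d_R = np.zeros(n + 1); d_R[-3:] = np.array([0.5, -2.0, 1.5]) / h

assert np.allclose(D2 @ one, 0)     # D2 1 = 0
assert np.allclose(D2 @ x, 0)       # D2 x = 0
for d in (d_L, d_R):
    assert np.isclose(d @ one, 0)   # d^T 1 = 0
    assert np.isclose(d @ x, 1)     # d^T x = 1
```

Both \(D_2\) and \({\mathbf {d}}_\mathrm{L,R}\) thus annihilate constants and reproduce the slope of \({\mathbf {x}}\) exactly, which is what the relations above build on.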
Now, we define the additional \((n-1)\times 1\)-vectors \(\vec {1}=[1\ 1\ \ldots \ 1]^{\mathsf {T}}\) and \(\vec {x}=h[1\ 2\ \ldots \ n-1]^{\mathsf {T}}\) (they are shorter versions of \({\mathbf {1}}\) and \({\mathbf {x}}\) in Theorem 3.1). With these new variables and with the notation from (21), the relations (44) can be expressed as
Given that A is correctly constructed, such that \({\bar{A}}\) is invertible, this leads to the relations
and
Now, multiplying A from (21) by \(G_2\) from (23) and using the relations (45), we get
where \({\bar{I}}\) is the \((n-1)\times (n-1)\) identity matrix. From (23) we have \( {\mathbf {b}}_\mathrm{L}={\mathbf {1}}-{\mathbf {x}}/\ell -G_2{\mathbf {d}}_\mathrm{L}\) and \({\mathbf {b}}_\mathrm{R}={\mathbf {x}}/\ell +G_2{\mathbf {d}}_\mathrm{R}\), and using the relations (44), (47) and (43), we arrive at
The vectors \({\mathbf {e}}_\mathrm{L,R}\) pick out the first and last elements of the vectors they are multiplied by, such that
Finally, from (23) we have
We are now ready to prove Theorem 3.1.
B.2 Confirmation of Eq. (22) with (23)–(25)
We multiply \({\widetilde{A}}\) in (15) by the expression for \({\widetilde{A}}^{-1}\) in (22), with the aim of showing that \({\widetilde{A}}{\widetilde{A}}^{-1}=I\) indeed holds. In the first step, (22) yields
We start by looking at the first term in (51). First using (15), followed by the relations in (47) and (50), and thereafter just rearranging the terms, we arrive at
Next, we look at the part \(\Gamma \) in (51). After rewriting \({\widetilde{A}}\) using (15), we use the relations in (48), (44), (49), (43) and (25). Thereafter, the resulting terms are rearranged. These steps are shown below in (53).
We note that the last \(4\times 4\)-matrix is nothing but \(\Sigma \) from (24). Inserting the results from (52) and (53) into (51) gives us
concluding the proof.
C Proofs of the Relations Between \(\xi _\mathrm{T}\), \(\gamma \), \(q_\mathrm{T}\) and \({\widetilde{q}}_\mathrm{T}\)
Below we present the proofs of Theorem 3.5 and the Lemmas 3.6 and 3.7.
C.1 Proof of Theorem 3.5
We aim to relate \(\gamma \) in (17) to \(\xi _\mathrm{T}\) in (28). Note that the latter quantity relies on the fact that \(\xi _\mathrm{L}=\xi _\mathrm{R}\) in (25). To emphasize this, we introduce \(\xi _\mathrm{D}=\xi _\mathrm{L,R}\).
We start by defining \({\widetilde{{\mathbf {v}}}}={\mathbf {v}}-{\mathbf {b}}_\mathrm{L}\rho _\mathrm{L}+{\mathbf {b}}_\mathrm{R}\rho _\mathrm{R}\) with \({\mathbf {b}}_\mathrm{L,R}\) from (23), and compute
using (48) and (25). The \((n+1)\times 1\)-vector \({\mathbf {v}}\) is arbitrary and for the scalars \(\rho _\mathrm{L,R}\) we make the ansatz \(\rho _\mathrm{L}=(s_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+t_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}){\mathbf {v}}\) and \(\rho _\mathrm{R}=(s_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}}+t_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}){\mathbf {v}}\) where \(t_\mathrm{L,R}\) and \(s_\mathrm{L,R}\) are scalars yet to be determined. Inserted into (54), this yields
where we have defined
Using the “borrowing technique”, \(\gamma \) is the maximum value such that \({\tilde{A}}_\gamma \ge 0\) still holds, referring to \(\gamma \) and \({\tilde{A}}_\gamma \) from (17). For (55) to correspond to (17), we need \(z_\mathrm{L}=z_\mathrm{R}\) and \(z_\mathrm{C}=0\), and under these constraints we must minimize \(z_\mathrm{L,R}\). To get there, we first define \(x_\mathrm{L}=s_\mathrm{L}+t_\mathrm{L}\), \(y_\mathrm{L}=s_\mathrm{L}-t_\mathrm{L}\), \(x_\mathrm{R}=s_\mathrm{R}+t_\mathrm{R}\) and \(y_\mathrm{R}=s_\mathrm{R}-t_\mathrm{R}\). Now
Inserted into \(z_\mathrm{L}\) and \(z_\mathrm{R}\) in (56), these relations give us
where we have used that \(\xi _\mathrm{D}=\xi _\mathrm{L}=\xi _\mathrm{R}\). Note that for fixed values of \(z_\mathrm{L}\) and \(z_\mathrm{R}\), the pairs \((x_\mathrm{L}, y_\mathrm{L})\) and \((x_\mathrm{R}, y_\mathrm{R})\) describe ellipses. Reformulated in a parametric form, they are
where \(r_\mathrm{L}^2=z_\mathrm{L}+\xi _\mathrm{D}/(\xi _\mathrm{D}^2-\xi _\mathrm{C}^2)\) and \(r_\mathrm{R}^2=z_\mathrm{R}+\xi _\mathrm{D}/(\xi _\mathrm{D}^2-\xi _\mathrm{C}^2)\). To enforce \(z_\mathrm{L}=z_\mathrm{R}\), we simply let \(r_\mathrm{L}=r_\mathrm{R}=r\). This gives us
Next, we need to fulfill the requirement \(z_\mathrm{C}=0\). Inserting the relations
into \(z_\mathrm{C}\) in (56), and thereafter using (57) with \(r_\mathrm{L,R}=r\), leads to
Now, we want \(z_\mathrm{C}=0\) while keeping \(r^2\) to a minimum (in order to in turn minimize \(z_\mathrm{L,R}\)). We achieve this by putting
It can be shown that \(\xi _\mathrm{D}^2-\xi _\mathrm{C}^2\ge 0\) (by inserting (48) into (25) and using that \(A^{\mathsf {T}}=A\ge 0\)), therefore the absolute value is only needed for \(\xi _\mathrm{C}\). Inserting the above choice of \(r^2\) into \(z_\mathrm{L,R}\) in (58) and thereafter using (28) with \(\xi _\mathrm{L,R}=\xi _\mathrm{D}\), we obtain
We have thereby shown that, with \(z_\mathrm{C}=0\) and \(z_\mathrm{L}=z_\mathrm{R}\) in (55), \(1/\xi _\mathrm{T}\) is the maximum amount of “positivity” in the form of \(( {\mathbf {d}}_\mathrm{L}{\mathbf {d}}_\mathrm{L}^{\mathsf {T}}+ {\mathbf {d}}_\mathrm{R}{\mathbf {d}}_\mathrm{R}^{\mathsf {T}})\) that we can extract from A. Inserting \(z_\mathrm{C}=0\) and \(z_\mathrm{L,R}=-1/\xi _\mathrm{T}\) into (55) and noting that \({\widetilde{{\mathbf {v}}}}^{\mathsf {T}}A {\widetilde{{\mathbf {v}}}}\ge 0\), we get
Comparing with (17), we deduce that \(h\gamma =1/\xi _\mathrm{T}\).
C.2 Proof of Lemma 3.6
We define \({\mathbf {w}}=S{\mathbf {v}}+M^{-1}{\mathbf {e}}_\mathrm{L}\rho _\mathrm{L}+M^{-1}{\mathbf {e}}_\mathrm{R}\rho _\mathrm{R}\) and use the relations in (30) to compute
where \(q_{\mathrm{L,R},\mathrm{C}}\) are defined in (31) and where \(\rho _\mathrm{L,R}\) are any scalars. It is assumed that \(M>0\), and thus \({\mathbf {w}}^{\mathsf {T}}M{\mathbf {w}}\ge 0\). Note that the right-hand side of (60) has the same form as (54), but with \(\xi _{\mathrm{L,R},\mathrm{C}}\) replaced by \(q_{\mathrm{L,R},\mathrm{C}}\). Thus, by following the same procedure, we obtain the relation corresponding to (59), namely
with \(q_\mathrm{T}\) defined in (31). Comparing with (17) we see that \(h\gamma =1/q_\mathrm{T}\).
C.3 Proof of Lemma 3.7
In [13], it was shown that \({\widetilde{q}}_\mathrm{L,R}\) and \({\widetilde{q}}_\mathrm{C}\) in (32) can be computed as
with \(K_0\) defined (using our notation from (21)) as
Now, we want to show that the quantities in (61) are equal to the ones in (25). Applying the formula for inverses of block matrices to the above definition of \(K_0\), and thereafter using the relation for \(a_\mathrm{R}\) in (46), we obtain
Comparing (62) with (23) and (45), we note that \(K_0=G_2+{\mathbf {x}}{\mathbf {x}}^{\mathsf {T}}/\ell \). Inserting this into (61), and thereafter using (50) and that \({\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {1}}=0\) and \({\mathbf {d}}_\mathrm{L,R}^{\mathsf {T}}{\mathbf {x}}=1\), yields
which are exactly the same relations as in (25).
D Explicit Inverses of the Second Derivative Operator
We provide the explicit expressions of \({\bar{A}}^{-1}\), \({\mathbf {b}}_\mathrm{L,R}\), \(\xi _\mathrm{L,R}\) and \(\xi _\mathrm{C}\) for the (2,0), (2,1) and (4,2) order accurate narrow-stencil operators and the (2,0) order accurate wide-stencil operator. By the notation “(2,0) order accurate operator”, we refer to a matrix \(D_2\) which has order 2 in the interior finite difference stencil and order 0 at the boundaries.
D.1 The Narrow-Stencil (2,0) Order Operator
The simplest possible example of a second derivative operator \(D_2\) fulfilling the SBP-properties in (13) is the narrow-stencil (2,0) order operator, and its corresponding matrix \({\widetilde{A}}\) was inverted already in [14] for the special case \(\alpha _\mathrm{L,R}=1\), \(\beta _\mathrm{L,R}=0\) and \(\tau _\mathrm{L,R}=0\). It is given below, together with its associated \({\mathbf {d}}_\mathrm{L,R}\) vectors.
The operator \(D_2\) is also associated with \(H=h\ \mathrm{diag}\left( \frac{1}{2}, 1, 1, \ldots , 1, 1, \frac{1}{2}\right) \), and using (13) we obtain the \((n+1)\times (n+1)\) matrix A given below. The \((n-1)\times (n-1)\) matrix \({\bar{A}}\) is identified using (21). Gauss–Jordan elimination then leads to \({\bar{A}}^{-1}\) as
Inserting \({\bar{A}}^{-1}\) from above into (23), and using that \(x_i=ih\), yields
Note the striking similarity to the continuous Green’s function in Remark 3.4. Next, by noticing the structure of \({\mathbf {d}}_\mathrm{L,R}\) in (63) and identifying the first and last columns of \({\bar{A}}^{-1}\) as \(h(\vec {1}-\vec {x}/\ell )\) and \(h\vec {x}/\ell \) we can compute \(G_2{\mathbf {d}}_\mathrm{L,R}\) and consequently \({\mathbf {b}}_\mathrm{L,R}\) in (23) as
Furthermore, inserting these \({\mathbf {b}}_\mathrm{L,R}\) and \({\mathbf {d}}_\mathrm{L,R}\) from (63) into (25), we obtain
D.2 The Narrow-Stencil (2,1) Order Operator
The narrow-stencil (2,1) order operator (see Section C.1 in [25]) has the same matrices \(H\) and A as the (2,0) order operator, and hence its \(G_2\) is given by (64). However, the vectors \({\mathbf {d}}_\mathrm{L,R}\) differ; for the (2,1) order operator they are
We can compute \(G_2{\mathbf {d}}_\mathrm{L}\) as
and repeating the procedure for \(G_2{\mathbf {d}}_\mathrm{R}\) and thereafter using (23), we arrive at
Finally, we use (25) to compute
where \(\xi _\mathrm{C}=0\) holds for \(n\ge 4\).
D.3 The Narrow-Stencil (4,2) Order Operator
The operator \(D_2\) with fourth order interior accuracy and diagonal norm \(H\), see Section C.2 in [25], is associated with the difference operators
Using (13) and identifying the interior of A according to (21), we obtain
We are now looking for a matrix \({\bar{G}}\) such that \({\bar{G}}={\bar{A}}^{-1}\), and follow the same procedure as in “Section A.2” of Appendix. We make the ansatz
For \({\bar{A}}{\bar{G}}={\bar{I}}\) to hold, \({\bar{A}}\vec {g}_j=\vec {e}_j\) must be fulfilled for all \(j=1,2,\ldots ,n-1\), where the vector \(\vec {e}_j=[0\ \ldots \ 0\ 1\ 0\ \ldots \ 0]^{\mathsf {T}}\) is non-zero only in its jth element. From the mid rows of \({\bar{A}}\vec {g}_j\), given the inner structure of \({\bar{A}}\), we thus need
where \(\delta _{i,j}\) is the Kronecker delta. Hence, the fourth order linear homogeneous recurrence relation \(g_{i-2,j} -16g_{i-1,j}+ 30g_{i,j} -16g_{i+1,j}+g_{i+2,j}=0\) has to be fulfilled by almost all \(g_{i,j}\). The explicit solution to this recurrence relation has the form \(g_{i,j}=c_{1}+c_{2}i+c_{3}\psi ^i+ c_{4}\psi ^{-i}\), where \(\psi =7+\sqrt{48}\approx 13.9\) and where \( c_{1,2,3,4}\) are j-dependent constants. To be precise, \(g_{i,j}\) has this form for \(2\le i\le n-2\), and we need two versions of the j-dependent constants, that is \(g_{i,j}= c^u_{1}+ c^u_{2} i+ c^u_{3}\psi ^i+ c^u_{4}\psi ^{-i}\) for \(2\le i\le j\) and \(g_{i,j}= c^l_{1}+ c^l_{2} i+ c^l_{3}\psi ^i+ c^l_{4}\psi ^{-i}\) for \(j\le i\le n-2\). For each \(j=2,3,\ldots ,n-2\), we thus have eight unknown constants \({c^u_{1,2,3,4}}\) and \({c^l_{1,2,3,4}}\), as well as the two remaining unknowns \( g_{1,j} \) and \( g_{n-1,j}\). These are determined by the first three and the last three rows in the requirement \({\bar{A}}\vec {g}_j=\vec {e}_j\), which give us six conditions. From the rows \(i=j-1,j,j+1\), we get three more conditions, and in addition, we demand that the two versions of \(g_{j,j}\) are identical. Altogether, this leads to a \(10\times 10\) system of equations which we solve using Gauss–Jordan elimination. The boundary columns \(j=1\) and \(j=n-1\) must be treated separately, in a similar manner. All in all, these steps lead to the elements of the inverse \(({\bar{A}}^{-1})_{i,j}=g_{i,j}\) as
which is thus similar to the second order version of \({\bar{A}}^{-1}\), plus an additional term \(\kappa _{i,j}\). This additional correction term is, for \(2\le i,j\le n-2\), given by
where
Note that \({\mathcal {Q}}_n\ne 0\) (unless \(n\approx 3.7\)), so there is no risk of division by zero. Moreover, for \(i,j=1\) or \(i,j=n-1\) we have
and
From (23) we have that the interior of \(G_2\) is given by \({\bar{A}}^{-1}\) described above. Next, we use \({\mathbf {d}}_\mathrm{L}\) from (65) to compute \(G_2{\mathbf {d}}_\mathrm{L}\) and thereafter (23) again, to compute \({\mathbf {b}}_\mathrm{L}\) as
where we have used that \({\mathcal {Q}}_n+2{\mathcal {P}}_{n-3}=51{\mathcal {P}}_{n-2}\). Then, \({\mathbf {b}}_\mathrm{R}\) is given by \(({\mathbf {b}}_\mathrm{R})_i=({\mathbf {b}}_\mathrm{L})_{n-i}\). We also compute the scalars from (25), as
Evaluating \(h\xi _\mathrm{L,R}\) and \(h\xi _\mathrm{C}\) explicitly for some values of \(n\), see Table 2, we see that these numbers correspond exactly (to machine precision) to \({\widetilde{q}}_\mathrm{L}h\) and \({\widetilde{q}}_\mathrm{C}h\) tabulated in [13]. This serves as a numerical verification of Lemma 3.7 and indirectly of Theorem 3.1.
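The recurrence structure used in this subsection can also be verified in isolation (a sketch): the characteristic polynomial of the interior recurrence factors as \(r^4-16r^3+30r^2-16r+1=(r-1)^2(r^2-14r+1)\), so the double root \(r=1\) contributes the constant and linear solutions, while \(r=\psi =7+\sqrt{48}\) and \(r=\psi ^{-1}=7-\sqrt{48}\) contribute the exponential ones:

```python
import math

psi = 7 + math.sqrt(48)

# Basis sequences of the ansatz g_i = c1 + c2*i + c3*psi^i + c4*psi^(-i).
basis = [lambda i: 1.0, lambda i: float(i),
         lambda i: psi ** i, lambda i: psi ** (-i)]

# Each must satisfy g_{i-2} - 16 g_{i-1} + 30 g_i - 16 g_{i+1} + g_{i+2} = 0;
# the double root r = 1 produces the constant and the linear solutions.
for g in basis:
    for i in range(2, 8):
        res = g(i - 2) - 16 * g(i - 1) + 30 * g(i) - 16 * g(i + 1) + g(i + 2)
        assert abs(res) < 1e-6 * max(1.0, abs(g(i + 2)))

assert math.isclose(psi * (7 - math.sqrt(48)), 1.0)   # psi^{-1} = 7 - sqrt(48)
```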
D.4 The Wide-Stencil (2,0) Order Operator
The wide-stencil (2,0) order accurate operator \(D_2\), which is obtained by squaring the (2,1) order accurate operator \(D_1\) from (33), is given below together with \({\mathbf {d}}_\mathrm{L,R}=D_1^{\mathsf {T}}{\mathbf {e}}_\mathrm{L,R}\)
The operator is also associated with the same \(H=h \mathrm{diag}\left( \begin{array}{ccccccc}\frac{1}{2},&\quad 1,&\quad 1,&\quad \ldots ,&\quad 1,&\quad 1,&\quad \frac{1}{2}\end{array}\right) \) as the other operators with second order accuracy, and from this we can compute the \((n+1)\times (n+1)\) matrix \(A\). Identifying the parts of \(A\) according to (21) gives us the \((n-1)\times (n-1)\) matrix \({\bar{A}}\). The inverse of this matrix \({\bar{A}}\) is
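The construction of the wide operator can be reproduced numerically. The sketch below assumes that \(D_1\) in (33) is the standard (2,1) order SBP first derivative operator (one-sided differences at the boundaries, central differences in the interior); the exact form of (33) is not restated here, so this form is an assumption. The check confirms the SBP property \(HD_1+(HD_1)^{\mathsf {T}}=-{\mathbf {e}}_\mathrm{L}{\mathbf {e}}_\mathrm{L}^{\mathsf {T}}+{\mathbf {e}}_\mathrm{R}{\mathbf {e}}_\mathrm{R}^{\mathsf {T}}\) and forms \(D_2=D_1^2\):

```python
import numpy as np

n = 8                      # number of intervals; the grid has n+1 points
h = 1.0 / n

# Assumed standard (2,1)-order SBP first-derivative operator:
# one-sided differences at the boundaries, central differences inside.
D1 = np.zeros((n + 1, n + 1))
D1[0, :2] = [-1.0, 1.0]
D1[-1, -2:] = [-1.0, 1.0]
for i in range(1, n):
    D1[i, i - 1], D1[i, i + 1] = -0.5, 0.5
D1 /= h

# The norm matrix H = h*diag(1/2, 1, ..., 1, 1/2)
H = h * np.diag([0.5] + [1.0] * (n - 1) + [0.5])

# SBP property: H D1 + (H D1)^T = -e_L e_L^T + e_R e_R^T
B = np.zeros((n + 1, n + 1))
B[0, 0], B[-1, -1] = -1.0, 1.0
assert np.allclose(H @ D1 + (H @ D1).T, B)

# Wide-stencil second-derivative operator obtained by squaring D1;
# in the interior rows (away from the boundary closures) it is exact on x^2.
D2 = D1 @ D1
x = np.linspace(0.0, 1.0, n + 1)
assert np.allclose((D2 @ x**2)[2:-2], 2.0)
```

Note the interior stencil of \(D_2\) is \((g_{i-2}-2g_i+g_{i+2})/(4h^2)\), which couples only every other grid point; this odd–even decoupling is the source of the oscillation discussed next.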
that is the discrete Green’s function in (23) becomes
Thus the discrete Green’s function produced by the wide operator oscillates, jumping between 0 and 2 times the exact value. Next, using (23), we obtain the vectors
Last, we compute the (2,0) order wide-stencil version of (25), as
In the wide-stencil case, \(q_\mathrm{L,R}={\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}H^{-1}{\mathbf {e}}_\mathrm{L,R}=2/h\) and \(q_\mathrm{C}={\mathbf {e}}_\mathrm{L,R}^{\mathsf {T}}H^{-1}{\mathbf {e}}_\mathrm{R,L}=0\) can be computed directly. We recall that \({\widetilde{q}}_{\mathrm{L,R},\mathrm{C}}=\xi _{\mathrm{L,R},\mathrm{C}}\) and note that \({\widetilde{q}}_\mathrm{L,R}\ne q_\mathrm{L,R}\) and \({\widetilde{q}}_\mathrm{C}\ne q_\mathrm{C}\), but still \({\widetilde{q}}_\mathrm{T}=q_\mathrm{T}=2/h\). Compare with the discussion in Sect. 3.4.
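The values \(q_\mathrm{L,R}=2/h\) and \(q_\mathrm{C}=0\) follow immediately from the diagonal structure of \(H\) with boundary weight \(h/2\); a quick numerical check (illustration only):

```python
import numpy as np

n = 10
h = 1.0 / n

# H = h*diag(1/2, 1, ..., 1, 1/2); e_L and e_R pick out the boundary points
H = h * np.diag([0.5] + [1.0] * (n - 1) + [0.5])
eL = np.zeros(n + 1); eL[0] = 1.0
eR = np.zeros(n + 1); eR[-1] = 1.0

Hinv = np.diag(1.0 / np.diag(H))
qL = eL @ Hinv @ eL    # = 2/h, since the boundary weight of H is h/2
qC = eL @ Hinv @ eR    # = 0, since H is diagonal: the endpoints do not couple
assert np.isclose(qL, 2 / h)
assert abs(qC) < 1e-14
```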
Eriksson, S. Inverses of SBP-SAT Finite Difference Operators Approximating the First and Second Derivative. J Sci Comput 89, 30 (2021). https://doi.org/10.1007/s10915-021-01606-9