A Dual Consistent Finite Difference Method with Narrow Stencil Second Derivative Operators

Eriksson, Sofia

doi:10.1007/s10915-017-0569-6

A Dual Consistent Finite Difference Method with Narrow Stencil Second Derivative Operators

Published: 16 October 2017

Volume 75, pages 906–940, (2018)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Scientific Computing Aims and scope Submit manuscript

A Dual Consistent Finite Difference Method with Narrow Stencil Second Derivative Operators

Download PDF

Sofia Eriksson ORCID: orcid.org/0000-0003-1216-1672¹

391 Accesses
12 Citations
1 Altmetric
Explore all metrics

Abstract

We study the numerical solutions of time-dependent systems of partial differential equations, focusing on the implementation of boundary conditions. The numerical method considered is a finite difference scheme constructed by high order summation by parts operators, combined with a boundary procedure using penalties (SBP–SAT). Recently it was shown that SBP–SAT finite difference methods can yield superconvergent functional output if the boundary conditions are imposed such that the discretization is dual consistent. We generalize these results so that they include a broader range of boundary conditions and penalty parameters. The results are also generalized to hold for narrow-stencil second derivative operators. The derivations are supported by numerical experiments.

Inverses of SBP-SAT Finite Difference Operators Approximating the First and Second Derivative

Article Open access 21 September 2021

Adaptive Finite Difference Methods for Nonlinear Elliptic and Parabolic Partial Differential Equations with Free Boundaries

Article 21 November 2015

Simultaneous Approximation Terms for Multi-dimensional Summation-by-Parts Operators

Article 31 August 2017

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

In this paper we consider a summation by parts (SBP) finite difference method, which is combined with a penalty technique denoted simultaneous approximation term (SAT) for the boundary conditions. The main advantages of the SBP–SAT finite difference methods are high accuracy, computational efficiency and provable stability. For a background on the history and the newer developments of SBP–SAT, see [6, 19].

A discrete differential operator $D_1$ is said to be a SBP-operator if it can be factorized by the inverse of a positive definite matrix $H$ and a difference operator Q, as specified later in Eq. (12). When $H$ is diagonal, $D_1$ consists of a 2p-order accurate central difference approximation in the interior, but at the boundaries, the accuracy is limited to pth order. The global accuracy of the numerical solution can then be shown to be $p+1$, see [18, 19].

In many applications functionals are of interest, sometimes they are even more important than the primary solution itself (one example is lift or drag coefficients in computational fluid dynamics). It could be expected that functionals computed from the numerical solution would have the same order of accuracy as the solution itself. However, recently Hicken and Zingg [9] showed that when computing the numerical solution in a dual consistent way, the order of accuracy of the output functional is higher than that of the solution, in fact, the full 2p accuracy can be recovered. Related papers are [8, 10] which includes interesting work on SBP operators as quadrature rules and error estimators for functional errors. Note that this kind of superconvergent behavior was already known for example for finite element and discontinuous Galerkin methods, but it had not been proven for finite difference schemes before, see [9]. Later Berg and Nordström [1,2,3] showed that the results hold also for time-dependent problems.

In [8, 9] and [1] boundary conditions of Dirichlet type are considered (in [9] Neumann boundary conditions are included but are rewritten on first order form), and in [2, 3] boundary conditions of far-field type are derived. In this paper, we generalize these results by deriving penalty parameters that yield dual consistency for all energy stable boundary conditions of Robin type (including the special cases Dirichlet and Neumann). In contrast to [2, 3], where the boundary conditions were adapted to get the penalty in a certain form, we adapt the penalty after the boundary conditions instead. Furthermore, we extend the results such that they hold also for narrow-stencil second derivative operators (sometimes also denoted compact second derivative operators), where the term narrow is used to define explicit finite difference schemes with a minimal stencil width. In fact, the results even carry over to narrow-stencil second derivatives operators for variable coefficients (of the type considered for example in [12]).

To keep things simple we consider linear problems in one spatial dimension, however, note that this is not due to a limitation of the method. In [8, 9] the extension to higher dimensions, curvilinear grids and non-linear problems are discussed and implemented for stationary problems and in [3] the theory is applied to the time-dependent Navier–Stokes and Euler equations in two dimensions.

The paper is organized as follows: In Sect. 2 we consider hyperbolic systems of partial differential equations and derive a family of SAT parameters which guarantees a stable and dual consistent discretization. Since higher order differential equations can always be rewritten as first order systems, this result directly leads to penalty parameters for parabolic problems, when using wide-stencil second derivative operators. Next, these parameters are generalized such that they hold also for narrow-stencil second derivative operators. This is all done in Sect. 3. In Sect. 4 a special aspect of the stability for the narrow operators is discussed. The derivations are then followed by examples and numerical simulations in Sect. 5 and a summary is given in Sect. 6.

1.1 Preliminaries

We consider time-dependent partial differential equations (PDE) as

$$\begin{aligned} \begin{aligned} \mathcal {U}_t+\mathcal {L}(\mathcal {U})&=\mathcal {F}, \quad t\in [0,T], \quad x\in {\varOmega }, \end{aligned} \end{aligned}$$

(1)

where $\mathcal {L}$ represents a linear, spatial differential operator and $\mathcal {F}(x,t)$ is a forcing function. For simplicity, we will assume that the sought solution $\mathcal {U}(x,t)$ satisfies homogeneous initial and boundary conditions. To derive the dual equations we follow [1, 2, 9] and pose the problem in a variational framework: Given a functional $\mathcal {J}(\mathcal {U})=\langle { \mathcal {G},\mathcal {U}}\rangle $, where $\mathcal {G}(x,t)$ is a smooth weight function and where $\langle { \mathcal {G},\mathcal {U}}\rangle =\int _{\varOmega }\mathcal {G}^T\mathcal {U}\,\mathrm {d} x$ refers to the standard $L^2$ inner product, we seek a function $\mathcal {V}(x,t)$ such that $\mathcal {J}(\mathcal {U})=\mathcal {J}^*(\mathcal {V})=\langle { \mathcal {V},\mathcal {F}}\rangle $. This defines the dual problem as

$$\begin{aligned} \begin{aligned} \mathcal {V}_\tau +\mathcal {L}^*(\mathcal {V})&=\mathcal {G}, \quad \tau \in [0, T], \quad x\in {\varOmega }, \end{aligned} \end{aligned}$$

(2)

where $\mathcal {L}^*$ is the adjoint operator, given by $\langle {\mathcal {V},\mathcal {L}\mathcal {U}}\rangle =\langle { \mathcal {L}^*\mathcal {V},\mathcal {U}}\rangle $, and where $\mathcal {V}$ also satisfies homogeneous initial and boundary conditions. Note that the dual problem actually goes “backward” in time; the expression in (2) is obtained using the transformation $\tau =T-t$.

Let $U$ and $V$ be discrete vectors approximating $\mathcal {U}$ and $\mathcal {V}$, respectively, and let $F$ and $G$ be projections of $\mathcal {F}$ and $\mathcal {G}$ onto a spatial grid. We discretize (1) using a stable and consistent SBP–SAT scheme, leading to

$$\begin{aligned} \begin{aligned} U_t+LU&=F, \quad t\in [0,T]. \end{aligned} \end{aligned}$$

(3)

The SBP–SAT scheme has an associated matrix $H$ which defines a discrete inner product, as $\langle { G,U}\rangle _{{}_H}= G^THU$ (when $\mathcal {U}$ is vector-valued, $H$ must be replaced by $\bar{H}$, which is defined later in the paper). Now the discrete adjoint operator is given by $L^*= H^{-1}L^TH$, since this leads to $\langle {V,LU}\rangle _{{}_H}=\langle { L^*V,U}\rangle _{{}_H}$ which mimics the continuous relation above.

If $L^*$ happens to be a consistent approximation of $\mathcal {L}^*$, then the discretization (3) is said to be dual consistent (if considering the stationary case) or spatially dual consistent, see [1, 9] respectively. When (3) is a stable and dual consistent discretization of (1), then the linear functional $J(U)=\langle { G,U}\rangle _{{}_H}$ is a 2p-order accurate approximation of $\mathcal {J}(\mathcal {U})$, that is $J(U)=\mathcal {J}(\mathcal {U})+\mathcal {O}(h^{2p})$, and we thus have superconvergent functional output. To obtain such high accuracy it is necessary to have compatible and sufficiently smooth data, see [9] for more details.

2 Hyperbolic Systems

We start by considering a hyperbolic system of PDEs of reaction-advection type, namely

$$\begin{aligned} \begin{aligned} \mathcal {U}_t+\mathcal {R}\mathcal {U}+ \mathcal {A}\mathcal {U}_x=&\mathcal {F}, \qquad x\in [x_{L},x_{R}], \\ \mathcal {B}_{{L}}\mathcal {U}=&g_{{L}}, \qquad x=x_{L},\\ \mathcal {B}_{{R}}\mathcal {U}=&g_{{R}}, \qquad x=x_{R}, \end{aligned} \end{aligned}$$

(4)

valid for $t\ge 0$ and augmented with initial data $\mathcal {U}(x,0)=\mathcal {U}_0(x)$. We let $\mathcal {R}$ and $\mathcal {A}$ be real-valued, symmetric $n\times n $ matrices with constant coefficients. Further, $\mathcal {R}$ is positive semi-definite, that is $\mathcal {R}\ge 0$. The operators $\mathcal {B}_{{L}}$ and $\mathcal {B}_{{R}}$ define the form of the boundary conditions and their properties are specified in (10) below. The forcing function $\mathcal {F}(x,t)$, the initial data $\mathcal {U}_0(x)$ and the boundary data $g_{{L}}(t)$ and $g_{{R}}(t)$ are assumed to be compatible and sufficiently smooth such that the solution $\mathcal {U}(x,t)$ exists. We will refer to (4) as our primal problem.

2.1 Well-Posedness Using the Energy Method

We call (4) well-posed if it has a unique solution and is stable. Existence is guaranteed by using the right number of boundary conditions, and uniqueness then follows from the stability, [7, 15]. Next we show stability, using the energy method.

The PDE in the first row of (4) is multiplied by $\mathcal {U}^T$ from the left and integrated over the domain ${\varOmega }=[x_{L},x_{R}]$. Using integration by parts we obtain

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert \mathcal {U}\Vert ^2+2\langle { \mathcal {U},\mathcal {R}\mathcal {U}}\rangle&=2\langle { \mathcal {U},\mathcal {F}}\rangle + {\text {BT}}_L+{\text {BT}}_R \end{aligned}$$

(5)

where $\Vert \mathcal {U}\Vert ^2=\langle { \mathcal {U}, \mathcal {U}}\rangle =\int _{x_{L}}^{x_{R}}\mathcal {U}^T \mathcal {U}\,\mathrm {d} x$ and where

$$\begin{aligned} {\text {BT}}_L=\left. \mathcal {U}^T\mathcal {A}\mathcal {U}^{}\right| _{x_{L}},&{\text {BT}}_R=-\left. \mathcal {U}^T\mathcal {A}\mathcal {U}^{}\right| _{x_{R}}. \end{aligned}$$

To bound the growth of the solution, we must ensure that the boundary conditions make ${\text {BT}}_L$ and ${\text {BT}}_R$ non-positive for zero data. We consider the matrix $\mathcal {A}$ above and assume that we have found a factorization such that

$$\begin{aligned} \mathcal {A}=Z{\varDelta }Z^T,&{\varDelta }=\left[ \begin{array}{ccc}{\varDelta }^{}_+\\ &{}{\varDelta }^{ }_0\\ &{}&{}{\varDelta }^{ }_-\end{array}\right] ,&Z=\left[ Z_+\ \, Z_0\ \, Z_-\right] , \end{aligned}$$

(6)

where $Z$ is non-singular. The parts of ${\varDelta }$ are arranged such that ${\varDelta }_+>0$, ${\varDelta }_0=0$ and ${\varDelta }_-<0$. According to Sylvester’s law of inertia, the matrices $\mathcal {A}$ and ${\varDelta }$ have the same number of positive ($n_+$), negative ($n_-$) and zero ($n_0$) eigenvalues (for a non-singular $Z$), where $n=n_++n_0+n_-$. To bound the terms ${\text {BT}}_L$ and ${\text {BT}}_R$, we have to give $n_+$ boundary conditions at $x=x_{L}$ and $n_-$ boundary conditions at $x=x_{R}$. We note that

$$\begin{aligned} \mathcal {A}=Z^{ }_+{\varDelta }^{ }_+Z_+^T+Z_-^{ }{\varDelta }^{ }_-Z_-^T, \end{aligned}$$

(7)

which gives

$$\begin{aligned} {\text {BT}}_L&= \left. \mathcal {U}^T\left( Z^{ }_+{\varDelta }^{}_+Z_+^T+Z_-^{}{\varDelta }^{ }_-Z_-^T\right) \mathcal {U}\right| _{x_{L}},\\ \text {BT}_R&=-\left. \mathcal {U}^T\left( Z^{ }_+{\varDelta }^{ }_+Z_+^T+Z_-^{ }{\varDelta }^{ }_-Z_-^T\right) \mathcal {U}\right| _{x_{R}} \end{aligned}$$

where $Z_+^T\mathcal {U}$ represents the right-going variables (ingoing at the left boundary), and $Z_-^T\mathcal {U}$ represents the left-going variables (ingoing at the right boundary). The ingoing variables are given data in terms of known functions and outgoing variables, as

$$\begin{aligned} \left. Z_+^T\mathcal {U}\right| _{x_{L}} =\left. \widetilde{g}_L-R_LZ_-^T\mathcal {U}\right| _{x_{L}},&\left. Z_-^T\mathcal {U}\right| _{x_{R}} =\left. \widetilde{g}_R-R_RZ_+^T\mathcal {U}\right| _{x_{R}}, \end{aligned}$$

(8)

where $\widetilde{g}_L$, $\widetilde{g}_R$ are the known data and where the matrices $R_L$ and $R_R$ must be sufficiently small, see below. Using the boundary conditions in (8), the boundary terms $\text {BT}_L$ and $\text {BT}_R$ become

$$\begin{aligned} \begin{aligned} \text {BT}_L&=\left. \mathcal {U}^TZ_-^{ }\mathcal {C}_LZ_-^T\mathcal {U}\right| _{x_{L}}-\left. 2\widetilde{g}_L^T{\varDelta }^{ }_+R_LZ_-^T\mathcal {U}\right| _{x_{L}}+\ \widetilde{g}_L^T{\varDelta }^{ }_+\widetilde{g}_L\\ \text {BT}_R&=\left. \mathcal {U}^TZ_+^{ }\mathcal {C}_RZ_+^T\mathcal {U}\right| _{x_{R}}+\left. 2\widetilde{g}_R^T{\varDelta }^{ }_-R_RZ_+^T\mathcal {U}\right| _{x_{R}}-\, \widetilde{g}_R^T{\varDelta }^{ }_-\widetilde{g}_R, \end{aligned} \end{aligned}$$

(9)

where we have defined

$$\begin{aligned} \mathcal {C}_L={\varDelta }^{ }_-+R_L^T{\varDelta }^{ }_+R_L,&\mathcal {C}_R=-{\varDelta }^{ }_+-R_R^T{\varDelta }^{}_-R_R. \end{aligned}$$

We note that if $\mathcal {C}_L\le 0$ and $\mathcal {C}_R\le 0$, the boundary terms in (9) will be non-positive for zero data. By integrating (5) in time we can now obtain a bound on $\Vert \mathcal {U}\Vert ^2$. Since the boundary conditions have the form (8), we also know that the correct number of boundary conditions are specified at each boundary, which yields existence. Our problem is thus well-posed.

To relate the original boundary conditions in (4) to the ones in (8), we let

$$\begin{aligned} \mathcal {B}_{{L}}=P_L\left( Z_+^T+R_LZ_-^T\right) , \quad \mathcal {B}_{{R}}=P_R\left( Z_-^T+R_RZ_+^T\right) , \end{aligned}$$

(10)

where $P_L$ and $P_R$ are invertible scaling and/or permutation matrices given by the chosen $\mathcal {B}_{{L,R}}$ and $Z_\pm $. The data in (8) is identified as $\widetilde{g}_L=P_L^{-1}g_{{L}}$ and $\widetilde{g}_R=P_R^{-1}g_{{R}}$. We assume that the boundary conditions in (4) are properly chosen such that $R_L$ and $R_R$ are sufficiently small to make $\mathcal {C}_L$, $\mathcal {C}_R\le 0$.

Remark 1

Note that the energy method is a sufficient but not necessary condition for stability and that it is rather restrictive with respect to the admissible boundary conditions. By rescaling the problem we could allow $R_L$ and $R_R$ to be larger, see [7, 11]. We will not consider this complication but simply require that $\mathcal {C}_{L,R}\le 0$.

Remark 2

In the homogeneous case, with boundary conditions such that $\mathcal {C}_{L,R}\le 0$, the growth rate in (5) becomes $\frac{\mathrm {d} }{\mathrm {d} t}\Vert \mathcal {U}\Vert ^2\le 0$. Integrating this in time we obtain the energy estimate $\Vert \mathcal {U}\Vert ^2\le \Vert \mathcal {U}_0\Vert ^2$ and (4) is well-posed. Since (4) is an one-dimensional hyperbolic problem it is also possible to show strong well-posedness, i.e., that $\Vert \mathcal {U}\Vert $ is bounded by the data $g_{{L}}$, $g_{{R}}$, $\mathcal {F}$ and $\mathcal {U}_0$. See [7, 11] for different definitions of well-posedness.

2.2 The Semi-discrete Problem

We discretize in space using $N+1$ equidistant grid points $x_i=x_{L}+hi$, where $h=(x_{R}-x_{L})/N$ and $i=0,1,\ldots ,N$. The semi-discrete scheme approximating (4) is written

$$\begin{aligned} \begin{aligned} U_t+(I_N\otimes \mathcal {R})U+(D_1\otimes \mathcal {A})U=&\,F+\left( H^{-1}e_{0}\otimes {\varSigma }_0\right) (\mathcal {B}_{{L}}{U}_{0}-g_{{L}})\\&+\left( H^{-1}e_N \otimes {\varSigma }_N\right) (\mathcal {B}_{{R}}{U}_N -g_{{R}}), \end{aligned} \end{aligned}$$

(11)

where $U=[U_0^T,U_1^T,\ldots ,U_N^T]^T$ is a vector of length $n(N+1)$, such that $U_i(t)\approx \mathcal {U}(x_i,t)$, and where $F_i(t)=\mathcal {F}(x_i,t)$. The symbol $\otimes $ refers to the Kronecker product. The finite difference operator $D_1$ approximates $\partial /\partial x$ and satisfies the SBP-properties

$$\begin{aligned} D_1=H^{-1}Q,\quad H=H^T>0,\quad Q+Q^T=E_N-E_0 \end{aligned}$$

(12)

where $E_0=e_0e_0^T$, $E_N=e_Ne_N^T$, $e_0=[1, 0, \ldots , 0]^T$ and $e_N=[ 0, \ldots , 0, 1]^T$. Note that $U_0=(e_0^T\otimes I_n)U$ and $U_N=(e_N^T\otimes I_n)U$. By $I_N$ and $I_n$ we refer to identity matrices of size $N+1$ and n, respectively. The boundary conditions are imposed using the SAT technique which is a penalty method. The penalty parameters ${\varSigma }_0$ and ${\varSigma }_N$ in (11) are at this point unknown, but are derived in the next subsections and presented in Theorem 1.

In this paper, we require that $H$ is diagonal and in this case $D_1$ consists of a 2p-order accurate central difference approximation in the interior and one-sided, p-order accurate approximations at the boundaries. Examples of SBP operators can be found in [13, 17]. For more details about SBP–SAT, see [18] and references therein.

2.3 Numerical Stability Using the Energy Method

Just as in the continuous case we use the energy method to show stability. We multiply (11) by $U^T\bar{H}$ from the left, where $\bar{H}=H\otimes I_n$, and then add the transpose of the result. Thereafter using the SBP-properties in (12) we obtain

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert U\Vert ^2_{{}_H}+2{U}^T(H\otimes \mathcal {R}){U}=2\langle {U, F}\rangle _{{}_H}+\text {BT}_L^{\text {disc.}}+\text {BT}_R^{\text {disc.}}, \end{aligned}$$

where $\Vert U\Vert ^2_{{}_H}=\langle { U,U}\rangle _{{}_H}=U^T\bar{H}U$ is the discrete $L^2$-norm and where

$$\begin{aligned} \begin{aligned} \text {BT}_L^{\text {disc.}}&=U_{0}^T\left( \mathcal {A}+{\varSigma }_0\mathcal {B}_{{L}}+ \mathcal {B}_{{L}}^T{\varSigma }_0^T \right) {U}_{0}-U_{0}^T {\varSigma }_0g_{{L}}- g_{{L}}^T{\varSigma }_0^T U_{0},\\ \text {BT}_R^{\text {disc.}}&=U_N^T\left( - \mathcal {A}+{\varSigma }_N\mathcal {B}_{{R}}+\mathcal {B}_{{R}}^T{\varSigma }_N^T\right) U_N-U_N^T{\varSigma }_Ng_{{R}}- g_{{R}}^T{\varSigma }_N^TU_N. \end{aligned} \end{aligned}$$

(13)

We define $C_0= \mathcal {A}+{\varSigma }_0\mathcal {B}_{{L}}+\mathcal {B}_{{L}}^T{\varSigma }_0^T$ and $ C_N= - \mathcal {A}+{\varSigma }_N\mathcal {B}_{{R}}+\mathcal {B}_{{R}}^T{\varSigma }_N^T$. For stability $\text {BT}_L^{\text {disc.}}$ and $\text {BT}_R^{\text {disc.}}$ must be non-positive for zero boundary data, i.e., $C_0\le 0$ and $ C_N\le 0$. We make the following ansatz for the penalty parameters:

$$\begin{aligned} {\varSigma }_0&=(Z_+{\varPi }_0+Z_-{\varGamma }_0)P_L^{-1},&{\varSigma }_N&=(Z_+{\varGamma }_N+Z_-{\varPi }_N)P_R^{-1}, \end{aligned}$$

(14)

where the matrices ${\varPi }_0$, ${\varGamma }_0$, ${\varGamma }_N$ and ${\varPi }_N$ will be determined over the next few pages. Taking the left boundary as example and using (7), (10) and (14) we obtain

$$\begin{aligned} C_0&=\left[ \begin{array}{c}Z_+^T \\ Z_-^T\end{array}\right] ^T\left[ \begin{array}{cc}{\varDelta }_++{\varPi }_0+{\varPi }_0^T\ &{}\quad {\varPi }_0R_L+{\varGamma }_0^T \\ {\varGamma }_0+ R_L^T{\varPi }_0^T&{}\quad {\varDelta }_-+{\varGamma }_0R_L+R_L^T{\varGamma }_0^T\end{array}\right] \left[ \begin{array}{c}Z_+^T \\ Z_-^T\end{array}\right] . \end{aligned}$$

(15)

2.4 The Dual Problem

Given the functional $\mathcal {J}(\mathcal {U})=\langle { \mathcal {G},\mathcal {U}}\rangle $, the dual problem of (4) is

$$\begin{aligned} \begin{aligned} \mathcal {V}_\tau +\mathcal {R}\mathcal {V}- \mathcal {A}\mathcal {V}_x=&\,\mathcal {G}, \qquad x\in [x_{L},x_{R}],\\ \widetilde{\mathcal {B}_{{L}}} \mathcal {V}=&\,\widetilde{g_{{L}}}, \qquad x= x_{L},\\ \widetilde{\mathcal {B}_{{R}}} \mathcal {V}=&\,\widetilde{g_{{R}}}, \qquad x= x_{R}, \end{aligned} \end{aligned}$$

(16)

which holds for $\tau \ge 0$ and is complemented with the initial data $\mathcal {V}(x,0)=\mathcal {V}_0(x)$. Note that we have used the transformation $\tau =T-t$ mentioned in Sect. 1.1, such that $\mathcal {V}=\mathcal {V}(x,\tau )$. The boundary operators in (16) have the form

$$\begin{aligned} \widetilde{\mathcal {B}_{{L}}}=\widetilde{P_L}\left( Z_-^T+\widetilde{R_L}Z_+^T\right) ,&\widetilde{\mathcal {B}_{{R}}}=\widetilde{P_R}\left( Z_+^T+\widetilde{R_R}Z_-^T\right) , \end{aligned}$$

(17)

where $\widetilde{P_L}$ and $\widetilde{P_R}$ are arbitrary invertible matrices and where $\widetilde{R_L}$ and $\widetilde{R_R}$ depend on the primal boundary conditions as

$$\begin{aligned} \widetilde{R_L}=-{\varDelta }_-^{-1}R_L^T{\varDelta }_+,&\widetilde{R_R}=-{\varDelta }_+^{-1}R_R^T{\varDelta }_-. \end{aligned}$$

(18)

The claim that (16), (17) and (18) describes the dual problem is motivated below: Using the notation in (1) and (2) we identify the spatial operators of (4) and (16) as

$$\begin{aligned} \mathcal {L}=\mathcal {R}+\mathcal {A}\frac{\partial }{\partial x},&\mathcal {L}^*=\mathcal {R}-\mathcal {A}\frac{\partial }{\partial x}, \end{aligned}$$

(19)

respectively. For (16) to be the dual problem of (4), $\mathcal {L}$ and $\mathcal {L}^*$ must fulfill the relation $\langle {\mathcal {V},\mathcal {L}\mathcal {U}}\rangle =\langle { \mathcal {L}^*\mathcal {V},\mathcal {U}}\rangle $. Using integration by parts we obtain

$$\begin{aligned} \langle {\mathcal {V},\mathcal {L}\mathcal {U}}\rangle&=\langle { \mathcal {L}^*\mathcal {V},\mathcal {U}}\rangle +[\mathcal {V}^T\mathcal {A}\mathcal {U}]_{x_{L}}^{x_{R}} \end{aligned}$$

and we see that $\mathcal {V}^T\mathcal {A}\mathcal {U}^{}$ must be zero at both boundaries (the boundary conditions for the dual problem are defined as the minimal set of homogeneous conditions such that all boundary terms vanish after that the homogeneous boundary conditions for the primal problem have been applied, see [1]). Using the boundary conditions of the primal problem, (8), followed by the dual boundary conditions, (16), (17), yields (for zero data)

$$\begin{aligned} \left. \mathcal {V}^T\mathcal {A}\mathcal {U}^{}\right| _{x_{L}}&=-\left. \mathcal {V}^TZ^{ }_+\left( {\varDelta }^{ }_+R_L+\widetilde{R_L}^T{\varDelta }^{ }_-\right) Z_-^T\mathcal {U}\right| _{x_{L}}\\ \left. \mathcal {V}^T\mathcal {A}\mathcal {U}^{}\right| _{x_{R}}&=-\left. \mathcal {V}^TZ^{ }_-\left( \widetilde{R_R}^T{\varDelta }^{ }_++{\varDelta }^{ }_-R_R\right) Z_+^T\mathcal {U}\right| _{x_{R}} \end{aligned}$$

and if (18) holds, then $\mathcal {V}^T\mathcal {A}\mathcal {U}^{}=0$ at both boundaries and the above claim is confirmed.

Remark 3

The functional of interest can also include outgoing solution terms from the boundary, as $\mathcal {J}(\mathcal {U})=\langle { \mathcal {G},\mathcal {U}}\rangle +{\varPhi }\mathcal {U}|_{x_{L}} +{\varPsi }\mathcal {U}|_{x_{R}}$, where ${\varPhi }$ and $ {\varPsi }$ have the form ${\varPhi }={\varPhi }_+^{ }Z_+^T+{\varPhi }_-^{ }Z_-^T$ and $ {\varPsi }={\varPsi }_+^{ }Z_+^T+{\varPsi }_-^{ }Z_-^T$. This specifies the boundary data in (16) to $ \widetilde{g_{{L}}}=-\widetilde{P_L}{\varDelta }^{ -1}_-({\varPhi }_-^{ }-{\varPhi }_+^{ }R_L)^T$ and $\widetilde{g_{{R}}}= \widetilde{P_R}{\varDelta }^{ -1}_+ ({\varPsi }_+-{\varPsi }_-R_R)^T$, compare with [9]. When $\mathcal {J}(\mathcal {U})=\langle { \mathcal {G},\mathcal {U}}\rangle $ then the boundary data in (16) is actually zero.

2.4.1 Well-Posedness of the Dual Problem

The growth rate for the dual problem is given by

$$\begin{aligned} {\frac{\mathrm {d} }{\mathrm {d} \tau }} \Vert \mathcal {V}\Vert ^2+2\langle {\mathcal {V}, \mathcal {R}\mathcal {V}}\rangle =\text {BT}_L^{\text {dual}}+\text {BT}_R^{\text {dual}} \end{aligned}$$

where the boundary terms (after that the homogeneous boundary conditions have been applied) are

$$\begin{aligned} \text {BT}_L^{\text {dual}}=\mathcal {V}^TZ^{ }_+\widetilde{\mathcal {C}_L}\left. Z_+^T \mathcal {V}\right| _{x_{L}}{,}\qquad \qquad \text {BT}_R^{\text {dual}}=\mathcal {V}^TZ_-^{ }\widetilde{\mathcal {C}_R}\left. Z_-^T \mathcal {V}\right| _{x_{R}} \end{aligned}$$

and where $\widetilde{\mathcal {C}_L}=-{\varDelta }^{ }_+-{\varDelta }_+R_L{\varDelta }_-^{-1}R_L^T{\varDelta }_+$ and $\widetilde{\mathcal {C}_R}={\varDelta }^{ }_- +{\varDelta }_-R_R{\varDelta }_+^{-1}R_R^T{\varDelta }_-$. For well-posedness of the dual problem $\widetilde{\mathcal {C}_L}\le 0$ and $\widetilde{\mathcal {C}_R}\le 0$ are necessary.

Recall that the primal problem is well-posed if $\mathcal {C}_L$, $\mathcal {C}_R\le 0$. The dual demand $\widetilde{\mathcal {C}_L}\le 0$ is directly fulfilled if $\mathcal {C}_L\le 0$ and $\widetilde{\mathcal {C}_R}\le 0$ follows from $\mathcal {C}_R\le 0$. When $R_L,R_R$ are square, invertible matrices, this is trivial. For general $R_L$, $R_R$ it can be shown with the help of the determinant relation in Lemma 1 in “Appendix B”. We conclude that the dual problem (16) with (17), (18) is well-posed if the primal problem (4) with (10) is well-posed.

Remark 4

In [2, 3] the dual consistent schemes are constructed by first designing the boundary conditions (for incompletely parabolic problems) such that both the primal and the dual problem are well-posed. Their different approach can partly be explained by their wish to have the boundary conditions in the special form $H_{L,R}U \mp BU_x=G_{L,R}$. Looking e.g. at Eq. (30) in [2], we note that after applying the boundary conditions, $U ^TM_LU \ge 0$ is needed for stability. However, if B is singular, replacing $BU _x$ by $\pm H_{L,R}U $ does not guarantee that all conditions have been completely used, and u and p in $U =[p,u]^T$ in $U ^TM_LU$ can be linearly dependent. Therefore the demand $M_L\ge 0$ after Eq. (31) in [2] is unnecessarily strong and gives some extra restrictions on the boundary conditions.

2.4.2 Discretization of the Dual Problem

The semi-discrete scheme approximating the dual problem (16) is written

$$\begin{aligned} \begin{aligned} V_\tau +(I_N\otimes \mathcal {R})V-(D_1\otimes \mathcal {A})V=&\, G+\left( H^{-1}e_{0}\otimes \widetilde{{\varSigma }_0}\ \right) \left( \widetilde{\mathcal {B}_{{L}}}{V}_{0}-\widetilde{g_{{L}}}\right) \\&+\left( H^{-1}e_N \otimes \widetilde{{\varSigma }_N} \right) \left( \widetilde{\mathcal {B}_{{R}}}V_N -\widetilde{g_{{R}}}\right) , \end{aligned} \end{aligned}$$

(20)

where $V_i(\tau )$ represents $\mathcal {V}(x_i,\tau )$. The SAT parameters $\widetilde{{\varSigma }_0}$ and $\widetilde{{\varSigma }_N}$ are yet unknown.

2.5 Dual Consistency

The semi-discrete scheme (11) is rewritten as $U_t+LU=\text {RHS}$, where

$$\begin{aligned} L&=(I_N\otimes \mathcal {R})+(D_1\otimes \mathcal {A})-\left( H^{-1}E_0 \otimes {\varSigma }_0\mathcal {B}_{{L}}\right) -\left( H^{-1}E_N \otimes {\varSigma }_N\mathcal {B}_{{R}}\right) \end{aligned}$$

and where $\text {RHS}$ only depends on known data. In contrast to its continuous counterpart $\mathcal {L}$, $L$ includes the boundary conditions explicitly. According to [2], the discrete adjoint operator is given by $L^*= \bar{H}^{-1}L^T\bar{H}$, which, using (12), leads to

$$\begin{aligned} \begin{aligned} L^* =&\, (I_N\otimes \mathcal {R})-(D_1\otimes \mathcal {A})-\left( H^{-1}E_{0}\otimes \mathcal {B}_{{L}}^T{\varSigma }_0^T+\mathcal {A}\right) \\&-\left( H^{-1}E_N \otimes \mathcal {B}_{{R}}^T{\varSigma }_N^T-\mathcal {A}\right) . \end{aligned} \end{aligned}$$

(21)

If $L^*$ is a consistent approximation of $\mathcal {L}^*$ in (19), then the scheme (11) is dual consistent. Looking at (20), we see that $L^*$ must have the form

$$\begin{aligned} \begin{aligned} L^*_{\text {goal}} =&\,(I_N\otimes \mathcal {R})-(D_1\otimes \mathcal {A}) -\left( H^{-1}E_{0}\otimes \widetilde{{\varSigma }_0} \widetilde{ \mathcal {B}_{{L}}} \right) \\&-\left( H^{-1}E_N \otimes \widetilde{{\varSigma }_N}\widetilde{ \mathcal {B}_{{R}}}\right) . \end{aligned} \end{aligned}$$

(22)

Thus we have dual consistency if the expressions in (21) and (22) are equal. This gives us the following requirements:

$$\begin{aligned} \mathcal {B}_{{L}}^T{\varSigma }_0^T+\mathcal {A}-\widetilde{{\varSigma }_0} \widetilde{ \mathcal {B}_{{L}}}=0,&\mathcal {B}_{{R}}^T{\varSigma }_N^T-\mathcal {A}-\widetilde{{\varSigma }_N}\widetilde{ \mathcal {B}_{{R}}}=0. \end{aligned}$$

Similarly to the penalty parameters (14) for the primal problem, we make the ansatz

$$\begin{aligned} \widetilde{{\varSigma }_0}&=\left( Z_+\widetilde{{\varGamma }_0}+Z_-\widetilde{{\varPi }_0}\right) \widetilde{P_L}^{-1},&\widetilde{{\varSigma }_N}&=\left( Z_+\widetilde{{\varPi }_N}+Z_-\widetilde{{\varGamma }_N}\right) \widetilde{P_R}^{-1} \end{aligned}$$

(23)

for the penalty parameters of the dual problem. The matrices $\widetilde{{\varGamma }_0}$, $\widetilde{{\varPi }_0}$, $\widetilde{{\varPi }_N}$ and $\widetilde{{\varGamma }_N}$ are at this stage unknown. We consider the left boundary and use (14) and (23), together with (7), (10) and (17), to write

$$\begin{aligned} \mathcal {B}_{{L}}^T{\varSigma }_0^T+\mathcal {A}-\widetilde{{\varSigma }_0} \widetilde{ \mathcal {B}_{{L}}}&=\left[ \begin{array}{c}Z_+^T \\ Z_-^T\end{array}\right] ^T\left[ \begin{array}{cc}{\varDelta }^{ }_++{\varPi }_0^T-\widetilde{{\varGamma }_0}\widetilde{R_L}&{}\quad {\varGamma }_0^T-\widetilde{{\varGamma }_0} \\ R_L^T{\varPi }_0^T-\widetilde{{\varPi }_0}\widetilde{R_L}&{}\quad \ {\varDelta }^{ }_-+R_L^T{\varGamma }_0^T-\widetilde{{\varPi }_0}\end{array}\right] \left[ \begin{array}{c}Z_+^T \\ Z_-^T\end{array}\right] \end{aligned}$$

which is zero if and only if the four entries of the matrix are zero. These four demands are rearranged to the more convenient form

$$\begin{aligned} {\varPi }_0&=-{\varDelta }^{ }_+-{\varDelta }^{ }_+R_L{\varDelta }^{-1 }_-{\varGamma }_0\end{aligned}$$

(24a)

$$\begin{aligned} \widetilde{R_L}&=-{\varDelta }^{ -1}_-R_L^T{\varDelta }^{ }_+\end{aligned}$$

(24b)

$$\begin{aligned} \widetilde{{\varGamma }_0}&={\varGamma }_0^T \end{aligned}$$

(24c)

$$\begin{aligned} \widetilde{{\varPi }_0}&={\varDelta }^{ }_--{\varDelta }^{ }_-\widetilde{R_L}{\varDelta }^{-1 }_+\widetilde{{\varGamma }_0}. \end{aligned}$$

(24d)

Note that (24a) only depends on parameters from the primal problem, while (24d) only depends on parameters from the dual problem. Interestingly enough, (24b) is nothing but the duality demand (18) for the continuous problem. The demand (24c) relates the penalty of the dual problem to the primal penalty.

Unless we actually want to solve the dual problem, it is enough to consider the first demand, (24a). We repeat the above derivation also for the right boundary and get the following result: The penalty parameters ${\varSigma }_0$ and ${\varSigma }_N$ in (14) with

$$\begin{aligned} {\varPi }_0=-{\varDelta }^{ }_+-{\varDelta }^{ }_+R_L{\varDelta }^{-1 }_-{\varGamma }_0,&{\varPi }_N={\varDelta }^{ }_--{\varDelta }_-R_R{\varDelta }_+^{-1}{\varGamma }_N, \end{aligned}$$

(25)

makes the discretization (11) dual consistent.

Remark 5

If the discrete primal problem (11) is dual consistent there is no need to check if the discrete dual problem (20) is stable—in [8] it is stated that stability of the primal problem implies stability of the dual problem, because the system matrix for the dual problem is the transpose of the system matrix for the primal problem—that is the primal and dual discrete problems have exactly the same growth rates for zero data.

2.6 Penalty Parameters for the Hyperbolic Problem

Consider the ansatz for the left penalty parameter, ${\varSigma }_0=(Z_+{\varPi }_0+Z_-{\varGamma }_0)P_L^{-1}$, which is given in (14). From a stability point of view, we must choose ${\varPi }_0$ and ${\varGamma }_0$ such that $C_0$ in (15) becomes non-positive. In addition, for dual consistency the constraint in (25) must be fulfilled. By inserting ${\varPi }_0=-{\varDelta }^{ }_+-{\varDelta }^{ }_+R_L{\varDelta }^{-1 }_-{\varGamma }_0$ from (25) into $C_0$ we obtain, after some rearrangements, the expression

$$\begin{aligned} C_0&= \left[ \begin{array}{c}P_L^{-1}\mathcal {B}_{{L}}\\ Z_-^T\end{array} \right] ^T \left[ \begin{array}{cc}-{\varDelta }_+-{\varDelta }^{ }_+R_L{\varDelta }^{-1 }_-{\varGamma }_0-({\varDelta }^{ }_+R_L{\varDelta }^{-1 }_-{\varGamma }_0)^T &{}\quad {\varGamma }_0^T{\varDelta }^{-1 }_-\mathcal {C}_L\\ \mathcal {C}_L{\varDelta }^{-1 }_-{\varGamma }_0&{}\quad \mathcal {C}_L\end{array}\right] \left[ \begin{array}{c}P_L^{-1}\mathcal {B}_{{L}}\\ Z_-^T\end{array} \right] . \end{aligned}$$

The most obvious strategy to make $C_0\le 0$ is to cancel the off-diagonal entries by putting ${\varGamma }_0=0$, but note that other choices exist. To single out the optimal (in a certain sense) candidate, we use another approach. With (7), (10) and $\widetilde{g}_L=P_L^{-1}g_{{L}}$, the left boundary term in (13) can be rearranged as

$$\begin{aligned} \begin{aligned} \text {BT}_L^{\text {disc.}}=&\,U_0^TZ_-^{ }\mathcal {C}_LZ_-^T{U}_0-2\widetilde{g}_L^T{\varDelta }^{ }_+R_LZ_-^TU_0+\widetilde{g}_L^T{\varDelta }^{}_+\widetilde{g}_L\\&-\left( \mathcal {B}_{{L}}U_0-g_{{L}}\right) ^TP_L^{-T}{\varDelta }^{}_+P_L^{-1}\left( \mathcal {B}_{{L}}U_0-g_{{L}}\right) \\&+2\left( \mathcal {B}_{{L}}U_0- g_{{L}}\right) ^T\left( {\varSigma }_0+Z_+{\varDelta }^{}_+P_L^{-1}\right) ^TU_0, \end{aligned} \end{aligned}$$

(26)

where we see that the first row corresponds exactly to the continuous boundary term $\text {BT}_L$ in (9). The second row is a damping term that is quadratically proportional to the solution’s deviation from data at the boundary, $\mathcal {B}_{{L}}U_0-g_{{L}}$. The term in the last row is only linearly proportional to this deviation, so we would prefer it to be zero. This is possible if the penalty parameter is chosen exactly as ${\varSigma }_0=-Z_+{\varDelta }^{ }_+P_L^{-1}$. Luckily this choice fulfills both the stability requirement and the duality constraint. We repeat the above derivation also for the right boundary and summarize our findings in Theorem 1.

Theorem 1

Consider the problem (4) with an associated factorization (6) where $Z$ is non-singular. With the particular choice of penalty parameters

$$\begin{aligned} {\varSigma }_0&=-Z_+{\varDelta }^{ }_+P_L^{-1} ,&{\varSigma }_N&=Z_-{\varDelta }^{ }_-P_R^{-1}, \end{aligned}$$

(27)

the scheme (11) is a stable and dual consistent discretization of (4). The matrices $P_L$ and $P_R$ are specified through (10).

Proof

Comparing with (14), we note that ${\varSigma }_0$ in (27) is obtained using ${\varPi }_0=-{\varDelta }_+$ and ${\varGamma }_0=0$. These values fulfill the left duality constraint in (25). Inserting ${\varGamma }_0=0$ into $ C_0$ above, we obtain $C_0=Z_-\mathcal {C}_LZ_-^T-\mathcal {B}_{{L}}^TP_L^{-T}{\varDelta }_+P_L^{-1}\mathcal {B}_{{L}}$, which is negative semi-definite if the continuous problem is well-posed (in the $\mathcal {C}_L\le 0$ sense). Thus the stability demand $C_0\le 0$ is fulfilled. The same is done for the right boundary, completing the proof. $\square $

Remark 6

The seemingly very specific choice of penalty parameters in Theorem 1 is, in fact, a family of penalty parameters, depending on the factorization used. Note that it is not necessary to use the same factorization for the left and the right boundary.

Remark 7

If characteristic boundary conditions (in the sense $R_L, R_R=0$) are used, the scheme (11) together with the SATs from Theorem 1 simplifies to

$$\begin{aligned} \begin{aligned} U_t+(I_N\otimes \mathcal {R})U+(D_1\otimes \mathcal {A})U&=F+(H^{-1}E_0 \otimes -\mathcal {A}_+ )U+(H^{-1}E_N \otimes \mathcal {A}_- )U\end{aligned} \end{aligned}$$

in the homogeneous case, where $\mathcal {A}_+=Z^{ }_+{\varDelta }^{ }_+Z_+^T$ and $\mathcal {A}_-=Z_-^{ }{\varDelta }^{ }_-Z_-^T$. When the factorization refers to the eigendecomposition, this corresponds to the SAT used for the characteristic boundary conditions of the nonlinear Euler equations in [9].

Remark 8

Assume that we are interested in the functional mentioned in Remark 3,

$$\begin{aligned} \mathcal {J}(\mathcal {U})&=\langle { \mathcal {G},\mathcal {U}}\rangle +{\varPhi }\mathcal {U}|_{x_{L}} +{\varPsi }\mathcal {U}|_{x_{R}}, \end{aligned}$$

(28)

which includes boundary terms. We approximate it with the discrete functional

$$\begin{aligned} J(U)&= G^T\bar{H}U+{\varPhi }U_{0}+{\varPsi }U_{N}-{\varPhi }_+^{ }P_L^{-1}(\mathcal {B}_{{L}}{U}_{0}-g_{{L}})-{\varPsi }_-^{ }P_R^{-1}(\mathcal {B}_{{R}}{U}_N -g_{{R}}). \end{aligned}$$

(29)

Note that we have added correction terms which are proportional to the boundary condition deviations, where ${\varPhi }_+^{ }$ and ${\varPsi }_-^{ }$ are specified through ${\varPhi }={\varPhi }_+^{ }Z_+^T+{\varPhi }_-^{ }Z_-^T$ and ${\varPsi }={\varPsi }_+^{ }Z_+^T+{\varPsi }_-^{ }Z_-^T$. These penalty-like correction terms, which are derived following techniques from [9], make the discrete functional superconvergent.

3 Parabolic Systems

Consider the parabolic (or incompletely parabolic) system of partial differential equations

$$\begin{aligned} \begin{aligned} \mathcal {U}_t+ \mathcal {A}\mathcal {U}_x-\mathcal {E}\mathcal {U}_{xx}=&\,\mathcal {F}, \qquad x\in [x_{L},x_{R}], \\ \mathcal {H}_L\mathcal {U}+\mathcal {G}_L\mathcal {U}_x=&\,g_{{L}}, \qquad x= x_{L},\\ \mathcal {H}_R\mathcal {U}+\mathcal {G}_R\mathcal {U}_x=&\,g_{{R}}, \qquad x= x_{R}, \end{aligned} \end{aligned}$$

(30)

for $t\ge 0$, augmented with the initial condition $\mathcal {U}(x,0)=\mathcal {U}_0(x)$. The matrices $\mathcal {A}$ and $\mathcal {E}\ge 0$ are symmetric $n\times n$ matrices, and we assume that $\mathcal {G}_L$ and $\mathcal {G}_R$ scales as $\mathcal {G}_L=\mathcal {K}_L\mathcal {E}$ and $\mathcal {G}_R=\mathcal {K}_R\mathcal {E}$, respectively. Treating $\mathcal {U}_x$ as a separate variable, we can rewrite (30) as a first order system (as was also done in [1, 9]), arriving at

$$\begin{aligned} \begin{aligned} \bar{\mathcal {I}}\bar{\mathcal {U}}_t+\bar{\mathcal {R}}\bar{\mathcal {U}}+ \bar{\mathcal {A}} \bar{\mathcal {U}}_x=&\,\bar{\mathcal {F}}, \quad x\in [x_{L},x_{R}], \\ \bar{\mathcal {B}}_L\bar{\mathcal {U}}=&\,g_{{L}}, \quad x= x_{L},\\ \bar{\mathcal {B}}_R\bar{\mathcal {U}}=&\,g_{{R}}, \quad x= x_{R}, \end{aligned} \end{aligned}$$

(31)

where

$$\begin{aligned} \bar{\mathcal {I}}=\left[ \begin{array}{cc}I_n&{}0\\ 0 &{} 0\end{array}\right] ,&\bar{\mathcal {R}}=\left[ \begin{array}{cc}0 &{}0\\ 0 &{}\mathcal {E}\end{array}\right] ,&\bar{\mathcal {U}}=\left[ \begin{array}{c}\mathcal {U}\\ \mathcal {U}_x\end{array}\right] ,&\bar{\mathcal {F}}=\left[ \begin{array}{c}\mathcal {F}\\ 0\end{array}\right] \end{aligned}$$

and

$$\begin{aligned} \bar{\mathcal {A}}=\left[ \begin{array}{cc} \mathcal {A}&{}-\mathcal {E}\\ -\mathcal {E}&{}\ 0 \end{array}\right] ,&\bar{\mathcal {B}}_L=\left[ \begin{array}{cc}\mathcal {H}_L&\mathcal {G}_L\end{array}\right] ,&\bar{\mathcal {B}}_R=\left[ \begin{array}{cc}\mathcal {H}_R&\mathcal {G}_R\end{array}\right] . \end{aligned}$$

(32)

The system (31) has almost the same form as (4) since $\bar{\mathcal {R}}\ge 0$ and $\bar{\mathcal {A}}$ are symmetric $ m\times m$ matrices, where $m= 2n$. Thus we can use the results from the hyperbolic case.

Remark 9

In [2, 3] the operators corresponding to $\mathcal {H}_L$, $\mathcal {G}_L$, $\mathcal {H}_R$ and $\mathcal {G}_R$ are square $n\times n$ matrices and their ranks are changed to suit the number of boundary conditions. We adapt the matrix dimensions instead. Both approaches have their respective advantages.

3.1 Discretization Using Wide-Stencil Second Derivative Operators

To discretize the parabolic problem, we first consider the reformulated problem (31), and use the results from the hyperbolic section. Then we rearrange the terms such that we get an equivalent scheme but in a form corresponding to (30). These steps, which are done in “Appendix A”, lead to

$$\begin{aligned} \begin{aligned} U_t +(D_1\otimes \mathcal {A})U-(D_1^2\otimes \mathcal {E})U=&\, F+\bar{H}^{-1}(e_{0}\otimes \widehat{\mu }_{0}+D_1^Te_{0}\otimes \widehat{\nu }_{0})\widehat{\xi }_{0}\\&+\bar{H}^{-1}(e_N \otimes \widehat{\mu }_N+D_1^Te_N \otimes \widehat{\nu }_N) \widehat{\xi }_N \end{aligned} \end{aligned}$$

(33)

where

$$\begin{aligned} \widehat{\xi }_{0}&=\mathcal {H}_L{U}_{0}+\mathcal {G}_L(\bar{D}U)_{0}-g_{{L}},&\widehat{\xi }_N&= \mathcal {H}_R{U}_N+\mathcal {G}_R(\bar{D}U)_N -g_{{R}}, \end{aligned}$$

(34)

and $\bar{H}=(H\otimes I_n)$ and $\bar{D}=(D_1\otimes I_n)$. The penalty parameters in (33) are

$$\begin{aligned} \begin{aligned} \widehat{\mu }_{0}&=(-\bar{Z}_1+\widehat{q}\bar{Z}_2)\bar{{\varDelta }}^{ }_+\widehat{{\varXi }}_L^{-1},\quad \quad&\widehat{\nu }_{0}&= \bar{Z}_2\bar{{\varDelta }}^{ }_+\widehat{{\varXi }}_L^{-1}, \\ \widehat{\mu }_N&= (\bar{Z}_3+\widehat{q}\bar{Z}_4)\bar{{\varDelta }}^{ }_-\widehat{{\varXi }}_R^{-1},&\widehat{\nu }_N&=-\bar{Z}_4\bar{{\varDelta }}^{ }_-\widehat{{\varXi }}_R^{-1},\end{aligned} \end{aligned}$$

(35)

where $\widehat{{\varXi }}_L=\bar{P}_{L}+\widehat{q}\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+$ and $\widehat{{\varXi }}_R=\bar{P}_{R}-\widehat{q}\mathcal {K}_R\bar{Z}_4\bar{{\varDelta }}^{ }_-$ and where the matrices $\bar{Z}_{1,2,3,4}$ are defined through

$$\begin{aligned} \bar{Z}_+=\left[ \begin{array}{c}\bar{Z}_1\\ \bar{Z}_2\end{array}\right] ,&\bar{Z}_-=\left[ \begin{array}{c}\bar{Z}_3\\ \bar{Z}_4\end{array}\right] . \end{aligned}$$

(36)

As before, $\bar{{\varDelta }}^{ }_\pm $, $\bar{Z}_\pm $ and $\bar{P}_{L}$, $\bar{P}_{R}$ are described in (6) and (10), respectively, but are now obtained using $\bar{\mathcal {A}}$ and $\bar{\mathcal {B}}_L$, $\bar{\mathcal {B}}_R$ from (32). Finally, the quantity $\widehat{q}$ in (35) is given by

$$\begin{aligned} \widehat{q}=e_0^TH^{-1}e_0=e_N^TH^{-1}e_N. \end{aligned}$$

(37)

The matrix $H$ is positive definite and proportional to the grid size h, and thus $\widehat{q}$ is a positive scalar proportional to 1 / h.

Remark 10

Given well-posedness (i.e., in the sense $\bar{\mathcal {C}}_L=\bar{{\varDelta }}^{ }_-+\bar{R}_L^T\bar{{\varDelta }}^{ }_+\bar{R}_L\le 0$ and $\bar{\mathcal {C}}_R=-\bar{{\varDelta }}^{ }_+-\bar{R}_R^T\bar{{\varDelta }}^{ }_-\bar{R}_R\le 0$) and that $\widehat{q}\ge 0$, one can show that $\widehat{{\varXi }}_L$ and $\widehat{{\varXi }}_R$ are non-singular. This is done in “Appendix B”.

3.2 Discretization Using Narrow-Stencil Second Derivative Operators

In [1], it was suggested that dual consistency might require wide-stencil second derivative operators, but next we will show that this is not necessary. The semi-discrete scheme approximating (30) is now written, analogously to (33), as

$$\begin{aligned} \begin{aligned} U_t+(D_1\otimes \mathcal {A})U-(D_2\otimes \mathcal {E})U=&\, F+\bar{H}^{-1}\left( e_{0}\otimes \mu _0+S^Te_{0}\otimes \nu _{0}\right) \xi _{0}\\&+\bar{H}^{-1}\left( e_N \otimes \mu _N+S^Te_N \otimes \nu _N\right) \xi _N. \end{aligned} \end{aligned}$$

(38)

The operator $D_2$, which approximates the second derivative operator, is no longer limited to the previous form $D_1^2$, where the first derivative is used twice. However, $D_2$ must still fulfill the SBP relations

$$\begin{aligned} D_2=H^{-1}(-A_{{S}}+(E_N-E_0)S),&\qquad A_{{S}}=A_{{S}}^T=S^TMS\ge 0, \end{aligned}$$

(39)

where the first and last row of the matrix S are consistent difference stencils, see e.g., [13]. For dual consistency, $A_{{S}}$ must be symmetric. Note that when having narrow-stencil operators the interior of S is not uniquely defined and neither is the matrix M above (this is discussed in Sect. 4). Furthermore, in (38) we have

$$\begin{aligned} \xi _{0}&=\mathcal {H}_L{U}_0 +\mathcal {G}_L(\bar{S}U)_0 -g_{{L}},&\xi _N&= \mathcal {H}_R{U}_N+\mathcal {G}_R(\bar{S}U)_N -g_{{R}}, \end{aligned}$$

(40)

where

$$\begin{aligned} \bar{S}=S\otimes I_n,&(\bar{S}U)_0=(e_0^TS\otimes I_n)U,&(\bar{S}U)_N=(e_N^TS\otimes I_n)U. \end{aligned}$$

We also define

$$\begin{aligned} q\equiv q_0+|q_c|=q_N+|q_c| \end{aligned}$$

(41)

where

$$\begin{aligned} q_0=e_0^TM^{-1}e_0,&q_N=e_N^TM^{-1}e_N,&q_c=e_0^TM^{-1}e_N=e_N^TM^{-1}e_0, \end{aligned}$$

(42)

where M is a part of $D_2$ as stated in (39). In Sect. 4 we provide $q$ for various $D_2$ matrices. The penalty parameters $\mu _0$, $\nu _{0}$, $\mu _N$ and $\nu _N$ in (38) are now given by:

Theorem 2

Consider the problem (30) with $\mathcal {G}_L=\mathcal {K}_L\mathcal {E}$ and $\mathcal {G}_R=\mathcal {K}_R\mathcal {E}$. Furthermore, let $\bar{\mathcal {A}}$, which is specified in (32), be factorized as $\bar{\mathcal {A}}=\bar{Z}\bar{{\varDelta }}\bar{Z}^T$ as described in (6). Then the particular choice of penalty parameters

$$\begin{aligned} \begin{aligned} \mu _0&=\left( -\bar{Z}_1+q\bar{Z}_2\right) \bar{{\varDelta }}^{ }_+{\varXi }_L^{-1},\quad \quad&\nu _{0}&= \bar{Z}_2\bar{{\varDelta }}^{ }_+{\varXi }_L^{-1}, \\ \mu _N&= \left( \bar{Z}_3+q\bar{Z}_4\right) \bar{{\varDelta }}^{ }_-{\varXi }_R^{-1},&\nu _N&=-\bar{Z}_4\bar{{\varDelta }}^{ }_-{\varXi }_R^{-1},\end{aligned} \end{aligned}$$

(43)

where ${\varXi }_L=\bar{P}_{L}+q\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+$ and ${\varXi }_R=\bar{P}_{R}-q\mathcal {K}_R\bar{Z}_4\bar{{\varDelta }}^{ }_-$, makes the scheme in (38) stable and dual consistent. The matrices $\bar{Z}_{1,2,3,4}$ are given in (36), $\bar{P}_{L}$, $\bar{P}_{R}$ are obtained from (10) (using $\bar{\mathcal {B}}_L$, $\bar{\mathcal {B}}_R$ in (32)) and $q$ is defined in (41). The matrices ${\varXi }_L$ and ${\varXi }_R$ are non-singular for well-posed problems, see “Appendix B”.

Note that $q$ in (41) is a generalization of $\widehat{q}$ in (37), and that the penalty parameters in (43) and (35) are identical if $q=\widehat{q}$. Hence the narrow-stencil scheme (38) is a generalization of the wide-stencil scheme in (33), since the schemes are identical if we choose $D_2=D_1^2$, $S=D_1$ and $M=H$. In the rest of this section we will justify these generalizations and prove Theorem 2 by showing that the penalties given in (43) indeed make the scheme (38) stable and dual consistent.

3.3 Stability When Using Narrow-Stencil Second Derivative Operators

We multiply the scheme (38) by $U^T\bar{H}$ from the left and add the transpose of the result. Thereafter using the SBP-properties in (12) and (39) yields

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert U\Vert ^2_{{}_H}+2{U}^T(S^TMS \otimes \mathcal {E}){U}&=2\langle {U, F}\rangle _{{}_H}+\text {BT}_L^{\text {disc.}}+\text {BT}_R^{\text {disc.}}, \end{aligned}$$

(44)

where

$$\begin{aligned} \begin{aligned} \text {BT}_L^{\text {disc.}}&= U_{0}^T\mathcal {A}{U}_{0}-2{U}_{0}^T\mathcal {E}(\bar{S}{U})_{0}+2\left( U_{0}^T \mu _0+(\bar{S}U)^T_{0}\nu _{0}\right) \xi _{0}\\ \text {BT}_R^{\text {disc.}}&=-{U}_N^T\mathcal {A}{U}_N+2{U}_N^T\mathcal {E}(\bar{S}{U})_N+2\left( U_N^T\mu _N+(\bar{S}U)_N^T\nu _N\right) \xi _N \end{aligned} \end{aligned}$$

(45)

where $\xi _{0,N}$ are given in (40). If $\text {BT}_L^{\text {disc.}}$ and $\text {BT}_R^{\text {disc.}}$ are non-positive for zero data the scheme is stable. This can be achieved if $\mu _0$, $\nu _{0}$, $\mu _N$ and $\nu _N$ are chosen freely, but the scheme should also be dual consistent. It turns out that in some cases these requirements are impossible to combine, for example when having Dirichlet boundary conditions. We therefore need an alternative way to show stability.

First, we assume that the penalty parameters $\nu _{0}$ and $\nu _N$ scales with $\mathcal {E}$. Let

$$\begin{aligned} \nu _{0}=-\mathcal {E}\kappa _0,&\nu _N=-\mathcal {E}\kappa _N. \end{aligned}$$

(46)

Next, we take a look at the wide case (which is partly presented in “Appendix A”). Using a wide counterpart to (46), $\widehat{\nu }_0=-\mathcal {E}\widehat{\kappa }_0$ and $\widehat{\nu }_N=-\mathcal {E}\widehat{\kappa }_N$, and the later relations in (71) and (72), we can rewrite (67b) as

$$\begin{aligned} \widehat{W}&=\bar{D}U+(H^{-1}e_0 \otimes \widehat{\kappa }_0)\widehat{\xi }_{0}+(H^{-1}e_N \otimes \widehat{\kappa }_N )\widehat{\xi }_N. \end{aligned}$$

We return to the narrow-stencil scheme (38). Inspired by the wide case, we define

$$\begin{aligned} W&\equiv \bar{S}U+( M^{-1} e_0\otimes \kappa _0)\xi _{0}+(M^{-1} e_N\otimes \kappa _N)\xi _N. \end{aligned}$$

(47)

From (47) we compute

$$\begin{aligned} W^T(M\otimes \mathcal {E})W =&\,U^T(S^TMS\otimes \mathcal {E})U+\left( 2(\bar{S}U)_0 +q_0 \kappa _0 \xi _{0}+q_c\kappa _N\xi _N\right) ^T\mathcal {E}\kappa _0 \xi _{0}\\&+\left( 2(\bar{S}U)_N +q_N\kappa _N\xi _N+q_c\kappa _0 \xi _{0}\right) ^T\mathcal {E}\kappa _N\xi _N \end{aligned}$$

where $q_0$, $q_N$ and $q_c$ are given in (42). In the general case, $q_c$ can be non-zero. Since we want to treat the two boundaries separately, we use Young’s inequality, $q_c(\xi _N^T\kappa _N^T\mathcal {E}\kappa _0\xi _{0}+\xi _{0}^T\kappa _0^T\mathcal {E}\kappa _N\xi _N)\le |q_c|\left( \xi _{0}^T\kappa _0^T\mathcal {E}\kappa _0\xi _{0}+\xi _N^T\kappa _N^T\mathcal {E}\kappa _N\xi _N\right) $, which yields

$$\begin{aligned} \begin{aligned} W^T(M\otimes \mathcal {E})W \le&U^T(S^TMS\otimes \mathcal {E})U+\left( 2(\bar{S}U)_{0}+q\kappa _{0}\xi _{0}\right) ^T \mathcal {E}\kappa _{0}\xi _{0}\\&+\left( 2(\bar{S}U)_N +q\kappa _N\xi _N\right) ^T\mathcal {E}\kappa _N\xi _N \end{aligned} \end{aligned}$$

(48)

where $q=q_0+|q_c|=q_N+|q_c|$, as stated in (41). Further, we note that multiplying (47) by $(e_0^T\otimes I_n)$ and $(e_N^T\otimes I_n)$, respectively, yields the relations $W_0= (\bar{S}U)_0+ q_0 \kappa _0\xi _{0}+q_c \kappa _N\xi _N$ and $W_N=(\bar{S}U)_N+q_c\kappa _0\xi _{0}+q_N \kappa _N\xi _N$. Instead of using those, which contain unwanted terms from the other boundary, we define

$$\begin{aligned} \widetilde{W}_0&\equiv (\bar{S}U)_0+q\kappa _0\xi _{0}&\widetilde{W}_N&\equiv (\bar{S}U)_N+q\kappa _N\xi _N. \end{aligned}$$

(49)

Inserting the relation (48) into (44), we obtain

$$\begin{aligned} \frac{\mathrm {d} }{\mathrm {d} t}\Vert U\Vert ^2_{{}_H}+2W^T(M\otimes \mathcal {E})W&\le 2\langle {U, F}\rangle _{{}_H}+ \widetilde{\text {BT}}{}^{\text {disc.}}_L+ \widetilde{\text {BT}}{}^{\text {disc.}}_R \end{aligned}$$

(50)

where (45) and (49) together with (46) yields

$$\begin{aligned} \begin{aligned} \widetilde{\text {BT}}{}^{\text {disc.}}_L&= U_{0}^T\mathcal {A}{U}_{0}-2{U}_{0}^T\mathcal {E}\widetilde{W}_{0}+2(U_{0}^T (\mu _0-q\nu _{0})-\widetilde{W}_{0}^T \nu _{0})\xi _{0}\\ \widetilde{\text {BT}}{}^{\text {disc.}}_R&=-{U}_N^T\mathcal {A}{U}_N+2{U}_N^T\mathcal {E}\widetilde{W}_N +2(U_N^T(\mu _N+q\nu _N)-\widetilde{W}_N^T\nu _N)\xi _N. \end{aligned} \end{aligned}$$

(51)

If the penalty parameters make $\widetilde{\text {BT}}{}^{\text {disc.}}_L\le 0$ and $\widetilde{\text {BT}}{}^{\text {disc.}}_R\le 0$ for zero data, (38) is stable.

Again taking the left boundary as an example, we define $\widetilde{U}_0=[U_0^T,\widetilde{W}_0^T]^T$ and write the first part of $ \widetilde{\text {BT}}{}^{\text {disc.}}_L$ in (51) as

$$\begin{aligned} U_0^T\mathcal {A}{U}_0 -2{U}_0^T\mathcal {E}\widetilde{W}_0=\widetilde{U}_0^T\bar{\mathcal {A}}\widetilde{U}_0. \end{aligned}$$

(52)

Next, using the relations (32), (49) and (40), recalling the assumptions $\mathcal {G}_L=\mathcal {K}_L\mathcal {E}$ and $\nu _{0}=-\mathcal {E}\kappa _0$, and thereafter using (43) from Theorem 2, we obtain

$$\begin{aligned} \begin{aligned} \bar{\mathcal {B}}_L\widetilde{U}_0-g_{{L}}&=\bar{P}_{L}(\bar{P}_{L}+q\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+)^{-1}\xi _{0}. \end{aligned} \end{aligned}$$

(53)

From (43) we also get

$$\begin{aligned} \mu _0-q\nu _{0}=-\bar{Z}_1\bar{{\varDelta }}^{ }_+ (\bar{P}_{L}+q\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+ )^{-1},&-\nu _{0}=-\bar{Z}_2\bar{{\varDelta }}^{ }_+(\bar{P}_{L}+q\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+)^{-1} \end{aligned}$$

such that the second part of $\widetilde{\text {BT}}{}^{\text {disc.}}_L$ in (51) becomes

$$\begin{aligned} \begin{aligned} 2\left( U_0^T (\mu _0-q\nu _{0})-\widetilde{W}_0 ^T \nu _{0}\right) \xi _{0}&=2\widetilde{U}_0^T\bar{{\varSigma }}_{0}(\bar{\mathcal {B}}_L\widetilde{U}_0-g_{{L}}) \end{aligned} \end{aligned}$$

(54)

where the relations (36) and (53) have been used, and where $\bar{{\varSigma }}_{0}=-\bar{Z}_+\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1}$. Now we can, by inserting (52) and (54) into (51), write

$$\begin{aligned} \begin{aligned} \widetilde{\text {BT}}{}^{\text {disc.}}_L&=\widetilde{U}_0^T\bar{\mathcal {A}}\widetilde{U}_0+2\widetilde{U}_0^T\bar{{\varSigma }}_{0}\left( \bar{\mathcal {B}}_L\widetilde{U}_0-g_{{L}}\right) \end{aligned} \end{aligned}$$

which has exactly the same form as $\text {BT}_L^{\text {disc.}}$ in (13). We thus know that $\widetilde{\text {BT}}{}^{\text {disc.}}_L\le 0$ for zero data, since $\bar{{\varSigma }}_{0}$ is computed just as in the hyperbolic case. The same procedure can, of course, be repeated for the right boundary. We conclude that the scheme (38) with the penalty parameters (43) is stable.

3.4 Dual Consistency for Narrow-Stencil Second Derivative Operators

The dual problem of (30) is

$$\begin{aligned} \begin{aligned} \mathcal {V}_\tau - \mathcal {A}\mathcal {V}_x-\mathcal {E}\mathcal {V}_{xx}=&\,\mathcal {G}, \qquad x\in [x_{L},x_{R}],\\ \widetilde{\mathcal {H}_L} \mathcal {V}+\widetilde{\mathcal {G}_L} \mathcal {V}_x=&\,\widetilde{g_{{L}}}, \qquad x= x_{L},\\ \widetilde{\mathcal {H}_R}\mathcal {V}+\widetilde{\mathcal {G}_R} \mathcal {V}_x=&\,\widetilde{g_{{R}}}, \qquad x= x_{R}, \end{aligned} \end{aligned}$$

(55)

for $\tau \ge 0$ and with $\mathcal {V}(x,0)=\mathcal {V}_0(x)$. The spatial operator in (30) and its dual are thus

$$\begin{aligned} \mathcal {L}=\mathcal {A}\frac{\partial }{\partial x}-\mathcal {E}\frac{\partial ^2}{\partial x^2},&\mathcal {L}^*=-\mathcal {A}\frac{\partial }{\partial x}-\mathcal {E}\frac{\partial ^2}{\partial x^2}. \end{aligned}$$

(56)

The semi-discrete approximation of (55) is

$$\begin{aligned} \begin{aligned} V_\tau -(D_1\otimes \mathcal {A})V-(D_2\otimes \mathcal {E})V=&\, G+\bar{H}^{-1}\left( e_0 \otimes \widetilde{\mu _0} +S^Te_0 \otimes \widetilde{\nu _{0}} \right) \widetilde{\xi _{0}}\\&+\bar{H}^{-1}\left( e_N \otimes \widetilde{\mu _N}+S^Te_N \otimes \widetilde{\nu _N}\right) \widetilde{\xi _N}, \end{aligned} \end{aligned}$$

(57)

where

$$\begin{aligned} \widetilde{\xi _{0}}&=\widetilde{\mathcal {H}_L}V_0 +\widetilde{\mathcal {G}_L} (\bar{S}V)_0 -\widetilde{g_{{L}}},&\widetilde{\xi _N}&= \widetilde{\mathcal {H}_R}V_N+\widetilde{\mathcal {G}_R}(\bar{S}V)_N -\widetilde{g_{{R}}}. \end{aligned}$$

From (38) we see that the discrete operator, corresponding to $\mathcal {L}$ in (56), is

$$\begin{aligned} \begin{aligned} L=&\,(D_1\otimes \mathcal {A})-(D_2\otimes \mathcal {E})\\&-\bar{H}^{-1}\left( e_{0}\otimes \mu _0+S^Te_{0}\otimes \nu _{0}\right) \left( e_{0}^T\otimes \mathcal {H}_L+ e_{0}^TS\otimes \mathcal {G}_L\right) \\&-\bar{H}^{-1}\left( e_N \otimes \mu _N+S^Te_N \otimes \nu _N\right) \left( e_N^T\otimes \mathcal {H}_R+e_N^TS\otimes \mathcal {G}_R\right) . \end{aligned} \end{aligned}$$

(58)

Using the relations in (12) and (39), we obtain

$$\begin{aligned} L^* =\, \bar{H}^{-1}L^T\bar{H}=&-(D_1\otimes \mathcal {A})-(D_2\otimes \mathcal {E}) \\&-\bar{H}^{-1}\left( e_0e_0^T\otimes \mathcal {A}\right) +\bar{H}^{-1}\left( \left( S^Te_0e_0^T-e_0e_0^TS\right) \otimes \mathcal {E}\right) \\&+\bar{H}^{-1}\left( e_Ne_N^T\otimes \mathcal {A}\right) -\bar{H}^{-1} \left( \left( S^Te_Ne_N^T-e_Ne_N^TS\right) \otimes \mathcal {E}\right) \\&-\bar{H}^{-1}\left( e_0\otimes \mathcal {H}_L^T+ S^Te_0\otimes \mathcal {G}_L^T\right) \left( e_0^T \otimes \mu _0^T +e_0^TS \otimes \nu _{0}^T \right) \\&-\bar{H}^{-1}\left( e_N\otimes \mathcal {H}_R^T+S^Te_N\otimes \mathcal {G}_R^T \right) \left( e_N^T \otimes \mu _N^T+e_N^TS \otimes \nu _N^T\right) . \end{aligned}$$

However, from (57) we see that for dual consistency $L^*$ must have the form

$$\begin{aligned} L^*_{\text {goal}} =&\, -(D_1\otimes \mathcal {A})-(D_2\otimes \mathcal {E}) \\&-\bar{H}^{-1}\left( e_0 \otimes \widetilde{\mu _0} +S^Te_0 \otimes \widetilde{\nu _{0}} \right) \left( e_0^T\otimes \widetilde{\mathcal {H}_L} + e_0^TS\otimes \widetilde{\mathcal {G}_L}\right) \\&-\bar{H}^{-1}\left( e_N \otimes \widetilde{\mu _N}+S^Te_N \otimes \widetilde{\nu _N}\right) \left( e_N^T\otimes \widetilde{\mathcal {H}_R}+e_N^TS\otimes \widetilde{\mathcal {G}_R}\right) . \end{aligned}$$

Demanding that $L^*=L^*_{\text {goal}}$, gives us the duality constraints

$$\begin{aligned} \begin{aligned} \left[ \begin{array}{cc}\mathcal {H}_L^T\mu _0^T+\mathcal {A}&{} \mathcal {H}_L^T\nu _{0}^T+\mathcal {E}\\ \mathcal {G}_L^T\mu _0^T-\mathcal {E}&{}\mathcal {G}_L^T\nu _{0}^T\end{array}\right]&=\left[ \begin{array}{cc} \widetilde{\mu _0}\widetilde{\mathcal {H}_L}&{} \widetilde{\mu _0}\widetilde{\mathcal {G}_L}\\ \widetilde{\nu _{0}}\widetilde{\mathcal {H}_L}&{} \widetilde{\nu _{0}}\widetilde{\mathcal {G}_L}\end{array}\right] \\ \left[ \begin{array}{cc}\mathcal {H}_R^T\mu _N^T-\mathcal {A}&{} \mathcal {H}_R^T\nu _N^T-\mathcal {E}\\ \mathcal {G}_R^T\mu _N^T+\mathcal {E}&{}\mathcal {G}_R^T\nu _N^T\end{array}\right]&= \left[ \begin{array}{cc} \widetilde{\mu _N}\widetilde{\mathcal {H}_R}&{} \widetilde{\mu _N}\widetilde{\mathcal {G}_R}\\ \widetilde{\nu _N}\widetilde{\mathcal {H}_R}&{} \widetilde{\nu _N}\widetilde{\mathcal {G}_R}\end{array}\right] . \end{aligned} \end{aligned}$$

(59)

The duality constraints in (59) do not depend explicitly on the grid size h. Moreover, we already know that for the wide case, the penalty parameters in (35)—even though they contain the h-dependent constant $\widehat{q}$—gives dual consistency. Since the generalized penalty parameters in (43) have exactly the same form (the only difference is that they depend on another h-dependent constant, $q$) they will also yield dual consistency. We have thus shown that the penalty parameters in Theorem 2 indeed makes the scheme (38) stable and dual consistent.

Remark 11

The SAT parameters in Theorem 2 are probably a subset of all parameters giving stability and dual consistency since the duality constraint (59) could be used in combination with some other stability proof than the one presented here.

4 Computing $q$

We want to compute $q=q_0+|q_c|=q_N+|q_c|$ as stated in (41) and are thus looking for $q_0$, $q_N$ and $q_c$ specified in (42). For wide second derivative operators, M is equal to $H$, and is thus well-defined. When using narrow second derivative operators, M is defined in (39) through $A_{{S}}=S^TMS$. However, only the first and last row of S are clearly specified. In for example [4, 5, 13], the interior of S is chosen to be the identity matrix, and S is then invertible. $A_{{S}}$ is singular (since $A_{{S}}=(E_N-E_0)S-HD_2$, where $D_2$ and the first and last row of S are consistent difference operators) and thus an invertible S implies that M is singular.

If M and S are defined such that M is singular and S not (which is often the case), we use the following strategy to find $q$: The relation $A_{{S}}=S^TMS$ leads to $M^{-1}=SA_{{S}}^{-1}S^T$, but since $A_{{S}}$ is singular we define the perturbed matrix $\widetilde{A}_{{S}}\equiv A_{{S}}+\delta E_0$ and compute $\widetilde{M}^{-1}=S\widetilde{A}_{{S}}^{-1}S^T$ instead. This is motivated by the following proposition:

Proposition 1

Define $\widetilde{A}_{{S}}\equiv A_{{S}}+\delta E_0$, where $A_{{S}}$ is a part of a consistent second derivative operator $D_2$ fulfilling the relations in (39), $\delta \ne 0$ is a scalar parameter and $E_0$ is an all-zero matrix except for the element $(E_0)_{0,0}=1$. The inverse of $\widetilde{A}_{{S}}$ is $\widetilde{A}_{{S}}^{-1}=J/\delta +K_0$, where J is an all-ones matrix and $K_0$ is a matrix that does not depend on $\delta $. A consequence of this structure is that the corners of $\widetilde{M}^{-1}=S\widetilde{A}_{{S}}^{-1}S^T$ are independent of $\delta $, such that

$$\begin{aligned} q_0=e_0^T\widetilde{M}^{-1}e_0,&q_N=e_N^T\widetilde{M}^{-1}e_N,&q_c=e_0^T\widetilde{M}^{-1}e_N=e_N^T\widetilde{M}^{-1}e_0. \end{aligned}$$

(60)

Proposition 1 is proved in “Appendix C”. In Table 1 we provide the value of $q$ for all second derivative operators considered in this paper. The wide-stencil operators are given by $D_2=D_1^2$, where $D_1$ has the order of accuracy (2, 1), (4, 2), (6, 3) or (8, 4), paired as (interior order, boundary order). For these operators, the $q$ values are obtained directly from the matrix $H$. For the narrow-stencil operators, the $q$ values are computed according to Proposition 1. All examples in Table 1, except the narrow (2, 0) order operator, refers to operators given in [13]. Note that for some of the narrow-stencil operators $q$ varies slightly with N, see “Appendix C”.

Table 1 The $q$-values (scaled with h) for various second derivative operators

Full size table

Remark 12

The SBP operators with interior order 6 or 8 have free parameters, and if those parameters are chosen differently than in [13], that will affect $q$.

Remark 13

The quantity $q$ has nothing to do with dual consistency, but indicates how the penalty should be chosen to give energy stability. As an example, consider solving the scalar problem presented below in (62) with Dirichlet boundary conditions, using the scheme (64). Using the same technique as in Sect. 3.3, we find that the stability demands for the (left) penalty parameter $\mu _0$, in three special cases of $\nu _{0}$, are

$$\begin{aligned}&\text {Dual consistent} \, (\text {see Eq. } (65))&\nu _{0}=-\varepsilon&\mu _0\le -a/2-\varepsilon q\\&\text {Method 1 (dual inconsistent)}&\nu _{0}=0&\mu _0\le -a/2-\varepsilon q/4\\&\text {Method 2 (dual inconsistent)}&\nu _{0}=\varepsilon&\mu _0\le -a/2. \end{aligned}$$

The two latter approaches are frequently used but they do not yield dual consistency.

5 Examples and Numerical Experiments

In this section, we give a few concrete examples of the derived penalty parameters and perform some numerical simulations. We demonstrate that these penalty parameters give superconvergent functional output not only for the wide second derivative operators but also for the narrow ones. The following procedure is used:

(i)
Consider a continuous problem formulated as (30), where $\mathcal {G}_L=\mathcal {K}_L\mathcal {E}$ and $\mathcal {G}_R=\mathcal {K}_R\mathcal {E}$ are required. Identify $\bar{\mathcal {A}}$ and $\bar{\mathcal {B}}_L,\bar{\mathcal {B}}_R$ according to (32).
(ii)
Factorize $\bar{\mathcal {A}}$ as $\bar{\mathcal {A}}=\bar{Z}\bar{{\varDelta }}\bar{Z}^T,$ according to (6), where $\bar{Z}$ must be non-singular.
(iii)
Compute $\bar{P}_{L}$ and $\bar{P}_{R}$. From (10) we see that $\bar{P}_{L}$ is the first $m_+\times m_+$ part of $\bar{\mathcal {B}}_L\bar{Z}^{-T}$ and that $\bar{P}_{R}$ is the last $m_-\times m_-$ part of $\bar{\mathcal {B}}_R\bar{Z}^{-T}$, as
$$\begin{aligned} \bar{\mathcal {B}}_L\bar{Z}^{-T} = \left[ \begin{array}{ccc}\bar{P}_{L}&0_{m_+,m_0}&\bar{P}_{L}\bar{R}_L\end{array}\right] ,&\bar{\mathcal {B}}_R\bar{Z}^{-T} = \left[ \begin{array}{ccc}\bar{P}_{R}\bar{R}_R&0_{m_-,m_0}&\bar{P}_{R}\end{array}\right] . \end{aligned}$$
(61)
(iv)
The problem (30) is discretized in space using the scheme (38). Rearranging the terms in the scheme yields $U_t+LU=\text {RHS}$, where $L$ is given in (58), and where
$$\begin{aligned} \text {RHS}=F-\bar{H}^{-1}(e_0 \otimes \mu _0+S^Te_0 \otimes \nu _{0})g_{{L}}-\bar{H}^{-1}(e_N \otimes \mu _N+S^Te_N \otimes \nu _N)g_{{R}}. \end{aligned}$$
The penalty parameters $\mu _0$, $\nu _{0}$, $\mu _N$ and $\nu _N$ are specified in Theorem 2.
(v)
If $\mathcal {U}_t=0$, we have a stationary problem and the linear system $LU=\text {RHS}$ must be solved. For the time-dependent cases, we use the method of lines and discretize $U_t+LU=\text {RHS}$ in time using a suitable solver for ordinary differential equations.

Remark 14

When we have a hyperbolic problem, step (i) is omitted and step (iv) is modified such that the scheme (11) is used with penalty parameters given in Theorem 1.

In the simulations, we are interested in the functional error $\texttt {E}=J(U)-\mathcal {J}(\mathcal {U})$, where $\mathcal {J}(\mathcal {U})=\langle {\mathcal {G},\mathcal {U}}\rangle $, $J(U)=\langle { G,U}\rangle _{{}_H}$ and $G_i(t)=\mathcal {G}(x_i,t)$, but of course also in the solution error $\texttt {e}$, where $\texttt {e}_i(t)=U_i(t)-\mathcal {U}(x_i,t)$. We also investigate the spectra of $L$, that is the eigenvalues $\lambda _j$ of $L,$ with $j=1,2,\ldots ,n(N+1)$. Here we are in particular interested in the spectral radius $\rho =\max _j(|\lambda _j|)$ and in $\eta =\min _j(\mathfrak {R}(\lambda _j))$. For time-dependent problems $\rho {\varDelta }t\lesssim C$ is a crude estimate of the stability regions of explicit Runge–Kutta schemes, and thus $\rho $ can be seen as a measure of stiffness. The eigenvalue with the smallest real part, $\eta $, determines how fast a time-dependent solution converges to a steady-state solution, see [14]. Ideally, the penalties are chosen such that the errors and $\rho $ are kept small while $\eta $ is maximized. For steady problems or when using implicit time solvers, other properties (e.g., the condition number) might be of greater interest.

We start by investigating the scalar case in some detail, then give an example of a system with a solid wall type of boundary condition.

5.1 The Scalar Case

Consider the scalar advection-diffusion equation,

$$\begin{aligned} \begin{aligned} \mathcal {U}_t+a\mathcal {U}_x-\varepsilon \mathcal {U}_{xx}=&\,\mathcal {F}, \quad x\in [0,1], \\ \alpha _{{}_L}\mathcal {U}+\beta _{{}_L}\mathcal {U}_x=&\,g_{{L}}, \quad x= 0,\\ \alpha _{{}_R}\mathcal {U}+\beta _{{}_R}\mathcal {U}_x=&\,g_{{R}}, \quad x= 1, \end{aligned} \end{aligned}$$

(62)

valid for $t\ge 0$, with initial condition $\mathcal {U}(x,0)=\mathcal {U}_0(x)$ and where $\varepsilon >0$. Using (32) yields

$$\begin{aligned} \bar{\mathcal {A}}=\left[ \begin{array}{cc} a &{}-\varepsilon \\ -\varepsilon &{} 0 \end{array}\right] ,&\bar{\mathcal {B}}_L=\left[ \begin{array}{cc}\alpha _{{}_L}&\beta _{{}_L}\end{array}\right] ,&\bar{\mathcal {B}}_R=\left[ \begin{array}{cc}\alpha _{{}_R}&\beta _{{}_R}\end{array}\right] . \end{aligned}$$

In this case, the factorization of the matrix $\bar{\mathcal {A}}$ can be parameterized as

$$\begin{aligned} \bar{\mathcal {A}}=\bar{Z}\bar{{\varDelta }} \bar{Z}^T=\left[ \begin{array}{cc}\frac{a+\omega }{2s_1} &{}\frac{a-\omega }{2s_2} \\ \frac{-\varepsilon }{s_1}&{}\frac{-\varepsilon }{s_2}\end{array}\right] \left[ \begin{array}{cc}\frac{s_1^2}{\omega } &{}0\\ 0 &{}-\frac{s_2^2}{\omega }\end{array}\right] \left[ \begin{array}{cc}\frac{a+\omega }{2s_1} &{}\frac{a-\omega }{2s_2} \\ \frac{-\varepsilon }{s_1}&{}\frac{-\varepsilon }{s_2}\end{array}\right] ^T, \end{aligned}$$

(63)

with $\omega >0$. In particular, if $\omega =\sqrt{a^2+4\varepsilon ^2}$ and if $s_{1,2}^2=\omega (\omega \pm a)/2$, then the above factorization is the eigendecomposition of $\bar{\mathcal {A}}$. The discrete scheme mimicking (62) is

$$\begin{aligned} \begin{aligned} U_t+aD_1U-\varepsilon D_2U=&\,F+H^{-1}( \mu _0e_{0}+\nu _{0}S^Te_{0}) \left( \alpha _{{}_L}U_{0}+\beta _{{}_L}(S U)_{0}-g_{{L}}\right) \\&+H^{-1}(\mu _Ne_N +\nu _NS^Te_N) \left( \alpha _{{}_R}U_N+\beta _{{}_R}(S U)_N-g_{{R}}\right) . \end{aligned} \end{aligned}$$

(64)

To compute the penalty parameters, we need $\bar{P}_{L}$ and $\bar{P}_{R}$. They are given by (61), as $\bar{P}_{L}=\frac{s_1}{\omega }\left( \alpha _{{}_L}+\beta _{{}_L}\frac{a-\omega }{2\varepsilon }\right) $ and $\bar{P}_{R}=-\frac{s_2}{\omega }\left( \alpha _{{}_R}+\beta _{{}_R}\frac{a+\omega }{2\varepsilon }\right) $. Theorem 2 now yields

$$\begin{aligned} \begin{aligned} \mu _0&=\frac{-\frac{a+\omega }{2}-q\varepsilon }{ \alpha _{{}_L}+\beta _{{}_L}\frac{a-\omega }{2\varepsilon }-q\beta _{{}_L}},\nu _{0}=\frac{-\varepsilon }{ \alpha _{{}_L}+\beta _{{}_L}\frac{a-\omega }{2\varepsilon }-q\beta _{{}_L}},\\ \mu _N&=\frac{\frac{ a-\omega }{2}-q\varepsilon }{\alpha _{{}_R}+ \beta _{{}_R}\frac{ a+\omega }{2\varepsilon }+q\beta _{{}_R}},\nu _N=\frac{\varepsilon }{\alpha _{{}_R}+ \beta _{{}_R}\frac{ a+\omega }{2\varepsilon }+q\beta _{{}_R}}. \end{aligned} \end{aligned}$$

(65)

Formally $0<\omega <\infty $ is necessary (since in the limits $\bar{Z}$ becomes singular), but as long as the number of imposed boundary condition does not change or the penalty parameters go to infinity, we can allow $0\le \omega \le \infty $.

Remark 15

For Dirichlet boundary conditions we have $\alpha _{{}_{L,R}}=1$ and $\beta _{{}_{L,R}}=0$. Translating the penalty parameters for the advection-diffusion case in [1] to the form used here, it can be seen that they are exactly the same. If we instead have $\alpha _{{}_L}=\frac{|a|+a}{2}$, $\beta _{{}_L}=-\varepsilon $ at the left boundary and $\alpha _{{}_R}=\frac{|a|-a}{2}$, $\beta _{{}_R}=\varepsilon $ at the right boundary, we have boundary conditions of a low-reflecting far-field type. In the limit $\omega \rightarrow \infty $, we obtain $\mu _0=-1$, $\nu _{0}=0$, $\mu _N=-1$ and $\nu _N=0$. This particular choice corresponds to the penalty ${\varSigma }=-I$ used in [2, 3] for systems with boundary conditions of far-field type.

Remark 16

If $\varepsilon =0$ in (62) we get the transport equation, and then only one boundary condition should be given instead of two. That means that the derivation of the penalty parameters must be redone accordingly. See [1], where this case is covered.

Remark 17

The results can be extended to the case of varying coefficients. Consider the scalar diffusion problem $\mathcal {U}_t-(\varepsilon \mathcal {U}_{x})_x=\mathcal {F}$, where $\varepsilon (x)>0$, with Dirichlet boundary conditions at $x=0$ and $x=1$. Following [12], we define a narrow-stencil operator mimicking $\partial /\partial x(\varepsilon \partial /\partial x)$ as

$$\begin{aligned} D_2^{(\varepsilon )}=H^{-1}\left( -A_{{S}}^{(\varepsilon )}+(\varepsilon (1)E_N-\varepsilon (0)E_0)S\right) \end{aligned}$$

where $A_{{S}}^{(\varepsilon )}$ is symmetric and positive semi-definite. It is assumed that $D_2^{(\varepsilon )}=\varepsilon D_2$ holds when $\varepsilon $ is constant. The discrete problem becomes

$$\begin{aligned} U_t-D_2^{(\varepsilon )}U=&\, F+H^{-1}\left( \mu _0e_0 +\nu _{0}S^Te_0\right) \left( U_0-g_{{L}}\right) \\&+H^{-1}\left( \mu _Ne_N +\nu _NS^Te_N\right) \left( U_N-g_{{R}}\right) . \end{aligned}$$

The continuous problem is self-adjoint, so for dual consistency $L^*= H^{-1}L^TH=L$ is needed, which is fulfilled if $\nu _{0}=-\varepsilon (0)$ and $\nu _N=\varepsilon (1)$. Moreover, using $A_{{S}}^{(\varepsilon )}\ge \varepsilon _{\min } A_{{S}}$, where $\varepsilon _{\min }=\min _{x\in [0,1]}\varepsilon (x)$, it can be shown that the discretization will be stable if we choose $\mu _0\le -\frac{q}{\varepsilon _{\min }}\varepsilon (0)^2$ and $\mu _N\le -\frac{q}{\varepsilon _{\min }}\varepsilon (1)^2$. The superconvergence for functionals has been confirmed numerically and the resulting “best” choices of $\mu _0$ and $\mu _N$ are similar to what we obtain in the constant case considered below.

5.1.1 The Heat Equation with Dirichlet Boundary Conditions

We consider the heat equation with Dirichlet boundary conditions, i.e., problem (62) with $a=0$, $\alpha _{{}_{L,R}}=1$ and $\beta _{{}_{L,R}}=0$, which we solve using the scheme (64), with the penalty parameters given by (65). To isolate the errors originating from the spatial discretization, we first look at the steady problem. Thus we let $\mathcal {U}_t=0$ and solve $-\mathcal {U}_{xx}=\mathcal {F}(x)$ numerically. The resulting quantities $\rho $ and $\eta $, the solution error $\Vert \texttt {e}\Vert _{{}_H}$ and the functional error $|\texttt {E}|$ are given (as functions of the factorization parameter $\omega $) in Fig. 1. The spectral radius $\rho $ grows with $\omega $, so we do not want $\omega \rightarrow \infty $. On the other hand, the decay rate $\eta $ shrinks with $\omega $ so $\omega \rightarrow 0$ should also be avoided. The errors tend to decrease with increasing $\omega $ (the errors naturally varies slightly depending on the choice of $\mathcal {F}$ and $\mathcal {G}$, but the example in Fig. 1 shows a typical behavior). Thus the demand for accuracy is conflicting with the demand of keeping $\rho $ small (the aim to maximize $\eta $ is met before the aim to minimize the errors and is therefore not a limiting factor in this case). Empirically we have found that a good compromise, which gives small errors without increasing the spectral radius dramatically, is obtained using $\omega \approx q\varepsilon $.

From this example, we make an observation. If we use the eigenfactorization, $\omega =\sqrt{a^2+4\varepsilon ^2}=2$. However, in Fig. 1 we see that that choice is not especially good, since the errors then become much larger than if using $\omega =q\varepsilon $ (i.e., $\omega \approx 200$ and $\omega \approx 340$, respectively). In some cases, the difference in accuracy is so severe that the choice of factorization parameter $\omega $ affects the convergence rate. For the narrow operator with the order (2, 0), the errors behave as $\Vert \texttt {e}\Vert _{{}_H}\sim h^{3/2}$ when using $\omega \sim 1$, whereas we obtain the expected $\Vert \texttt {e}\Vert _{{}_H}\sim h^{2}$ when using $\omega \sim 1/h$. Similar behaviors are observed also for narrow operators of higher order, see below.

In Fig. 2a the errors $\Vert \texttt {e}\Vert _{{}_H}$ for the operators with interior order 6 are shown. For the narrow scheme, the convergence rate is 4.5 when using $\omega =2\varepsilon $ and 5.5 when using $\omega =q\varepsilon $. For the wide scheme, the order is 4 in both cases, but the error constant changes. In the 8th order case, Fig. 2b, the convergence rates are not affected, but in the narrow case the errors are around 2500 times smaller when using $\omega =q\varepsilon $. In this example, the functional errors are not as sensitive to $\omega $ as the solution errors. In the 6th order case, the convergence rates are slightly better than the predicted $2p=6$, both for the wide and the narrow schemes, see Fig. 3a. For the 8th order case, see Fig. 3b, the convergence rates are in all cases higher than $2p=8$. Thus the derived SAT parameters actually produce superconvergent functionals, also for the narrow operators.

Remark 18

For the time-dependent case, i.e., the actual heat equation, the superconvergence is confirmed both when using Dirichlet and additionally when using Neumann boundary conditions. This is presented in “Appendix D”.

5.1.2 The Advection–Diffusion Equation with Dirichlet Boundary Conditions

For simplicity we consider steady problems again, this time $a\mathcal {U}_x=\varepsilon \mathcal {U}_{xx}+\mathcal {F}$. That is, we solve (62) using the scheme (64), both with omitted time derivatives. The penalty parameters for Dirichlet boundary conditions are given in (65) with $\alpha _{{}_{L,R}}=1$ and $\beta _{{}_{L,R}}=0$.

First, we take a look at an interesting special case, namely when $\mathcal {F}=0$. Then the exact solution is $\mathcal {U}(x)=c_1+c_2\exp {(ax/\varepsilon )}$, where the constants $c_1$ and $c_2$ are determined by the boundary conditions. For $\varepsilon \ll |a|$ the exact solution forms a thin boundary layer at the outflow boundary, which for insufficient resolution usually leads to oscillations in the numerical solution. This can be handled by upwinding or artificial diffusion (see e.g., [16]). Here we will instead use the free parameter $\omega $ in the penalty to minimize the oscillating modes (the so-called $\pi $-modes).

We start with the wide second derivative operators. The ansatz $U_i=k^i$, inserted into the interior of the scheme (64), gives (for the second order case) a numerical solution

$$\begin{aligned} U_i=\widetilde{c}_1+\widetilde{c}_2(-1)^i+\widetilde{c}_3k_3^i+\widetilde{c}_4k_4^i,&\ k_{3,4}=\frac{ha}{\varepsilon }\pm \sqrt{\frac{h^2a^2}{\varepsilon ^2}+1}. \end{aligned}$$

Thus there exist two modes with alternating signs, $\widetilde{c}_2(-1)^i$ and $\widetilde{c}_4k_4^i$. However, one can show that the choice $\omega =|a|$ leads to $\widetilde{c}_2=0$ and to $\widetilde{c}_4$ being small enough compared to $\widetilde{c}_3$ such that $U_i$ is monotone. Empirically we have seen that this nice behavior holds also for the wide schemes with higher order of accuracy. In Fig. 4 the result using the scheme with interior order 8 is shown. The solution obtained using $\omega =|a|$ shows no oscillations in the interior, even though the grid is very coarse (the small over-shoot that can be observed close to the boundary layer when $\omega =|a|$ has nothing to do with the $\pi $-mode since it is in the zone where the stencil is modified due to the boundary closure). Moreover, this particular choice of factorization gives functional errors almost at machine precision (although it should be noted that this is a special case since $\mathcal {F}(x)=0$ and $\mathcal {G}(x)=1$).

For the narrow-stencil schemes, the existence of spurious oscillating modes depends on the resolution. In the second order case, the interior solution is

$$\begin{aligned} U_i=\widetilde{\widetilde{c}}_1+\widetilde{\widetilde{c}}_2\left( \frac{1+ah/(2\varepsilon )}{1-ah/(2\varepsilon )}\right) ^i, \end{aligned}$$

which has an oscillating component if $|a|h/(2\varepsilon )>1$. With very particular choices of the penalty parameters this component can be canceled (for the operators with order (2, 0) and (2, 1) it is achieved using $\omega =|a|/(1-\frac{2\varepsilon }{|a|h})$ and $\omega =|a|(1-\frac{\varepsilon }{|a|h})/(1-\frac{2\varepsilon }{|a|h})^2$, respectively) such that the numerical solution becomes constant. As soon as $|a|h/(2\varepsilon )<1$, this mode should not be canceled anymore, but how to do the transition between the unresolved case and the resolved case is not obvious. For the higher order schemes the $\omega $ which cancels the oscillating modes are even more complicated and in some cases negative (i.e., useless). In short, these particular, canceling choices of $\omega $ are not worth the effort. Instead, we recommend to use $\omega \approx |a|+q\varepsilon $ for the narrow-stencil operators, see below.

The above results were obtained under the assumption $\mathcal {F}=0$. Next, we use a forcing function $\mathcal {F}$ such that the exact solution is $\mathcal {U}(x)=\cos (30x)$. The resulting errors, together with $\rho $ and $\eta $, are shown in Fig. 5 for $a=1$ and $\varepsilon =10^{-6}$. Clearly, $\omega \approx |a|$ is still a good choice since the errors are small, $\rho $ is not too large and $\eta $ is maximal. For $\varepsilon \gg |a|h$ the curves are more similar to those in Fig. 1, and $\omega \approx |a|+q\varepsilon $ will be a better choice. In the transition region $\varepsilon \sim |a|h$ we sometimes observe order reduction. This can be seen in Figs. 6 and 7 for the schemes with an interior order of accuracy 6. Figure 6 shows the convergence rates when $\varepsilon =0.1$, which is large enough for the numerical solution to be well resolved. For the narrow scheme, we see an improved convergence rate for the solution error if $\omega =|a|+q\varepsilon $ is used. The functional output converges with $2p=6$ for all schemes. Figure 7 shows the convergence rates when $\varepsilon $ is decreased to $10^{-4}$, such that the numerical solution is badly resolved. For all schemes, except the wide scheme with the particular choice $\omega =|a|$, we see a pre-asymptotic order reduction of the functional.

We conclude that the penalties in Theorem 2 yields superconvergent functionals for the advection-diffusion equation with Dirichlet boundary conditions—in the asymptotic limit. In the special case when having the wide scheme with $\omega =|a|$ we even get superconvergent functionals in the troublesome transition region.

Remark 19

In this case the adjoint problem is $-a\mathcal {V}_x=\varepsilon \mathcal {V}_{xx}+\mathcal {G}$ with (homogeneous) Dirichlet boundary conditions. For a general $\mathcal {G}$, the solution has a thin boundary layer when $\varepsilon \ll |a|$ and the discrete solution will be non-smooth for coarse grids (except for the wide scheme with $\omega =|a|$). If the functional coincidentally is chosen such that $\mathcal {V}$ is “nice”, then the dual problem can be resolved even for coarse meshes. When we instead of $\mathcal {G}(x) =\cos (30x)$ use for example $\mathcal {G}(x)=-a\ell \pi \cos (\ell \pi x)+\varepsilon \ell ^2\pi ^2\sin (\ell \pi x)$, which has the solution $\mathcal {V}=\sin (\ell \pi x)$ if $\ell $ is an integer, then the functional in Fig. 7 converges with a clear 6th order of accuracy for all four schemes.

5.1.3 Functionals Including Boundary Terms

Consider solving the heat equation with one Dirichlet boundary condition and one Neumann condition, i.e., (62) with $a=0$, $\alpha _{{}_L}=1$, $\beta _{{}_L}=0$, $\alpha _{{}_R}=0$ and $\beta _{{}_R}=1$. If we are using the wide scheme, we can discretize the above problem as a system, by approximating $\mathcal {U}$ by $U$ and $\mathcal {U}_x$ by $\widehat{W}$ as in (67). We use the factorization suggested in (63) to compute the penalty parameters $\sigma _0$, $\tau _0$, $\sigma _N$ and $\tau _N$ and obtain the wide semi-discrete scheme

$$\begin{aligned} U_t -\varepsilon D_1\widehat{W}&=F-H^{-1}e_0 \frac{\omega }{2} ({U}_0 -g_{{L}})-H^{-1}e_N\varepsilon ( \widehat{W}_N -g_{{R}}),\\ \widehat{W}&= D_1 U+H^{-1}e_0 ({U}_0 -g_{{L}})-H^{-1}e_N \frac{2\varepsilon }{\omega }( \widehat{W}_N -g_{{R}}). \end{aligned}$$

Now assume that the functional of interest is

$$\begin{aligned} \mathcal {J}(\mathcal {U})=\int _0^1\mathcal {G}_1\mathcal {U}\,\mathrm {d} x+\int _0^1\mathcal {G}_2\mathcal {U}_x\,\mathrm {d} x+\alpha \mathcal {U}|_{x=1}+\beta \varepsilon \mathcal {U}_x|_{x=0}. \end{aligned}$$

With $\bar{\mathcal {U}}=\left[ \begin{array}{cc}\mathcal {U}^T,&\mathcal {U}_x^T\end{array}\right] ^T$ we identify ${\varPhi }=\left[ \begin{array}{cc}0,&\beta \varepsilon \end{array}\right] $ and ${\varPsi }=\left[ \begin{array}{cc}\alpha ,&0\end{array}\right] $ in (28). Next, using (29) with $\bar{P}_{L}=s_1/\omega $, $\bar{P}_{R}=-s_2/(2\varepsilon )$ and $Z_{\pm }$ given in (63), we obtain

$$\begin{aligned} J(U,\widehat{W})=G_1^THU+G_2^TH\widehat{W}+\alpha U_N+\beta \varepsilon \widehat{W}_0-\frac{2\varepsilon \alpha }{\omega }(\widehat{W}_N-g_{{R}})+\frac{\beta \omega }{2}(U_0-g_{{L}}). \end{aligned}$$

This is the time-dependent and constant coefficient version of (3.11)-(3.14) in [9] (their penalty corresponds to using $\omega =2$ at the left boundary and $\omega =\infty $ at the right boundary).

We choose $\mathcal {G}_1=\mathcal {G}_2=0$ and $\beta =\alpha =1$ such that $\mathcal {J}(\mathcal {U})=\mathcal {U}|_{x=1}+\varepsilon \mathcal {U}_x|_{x=0} $ (we consider the steady case, solving $-\mathcal {U}_{xx}=\mathcal {F}(x)$). Although the penalty-like correction terms are derived for the wide scheme, they work for the narrow-stencil scheme as well, with $\widehat{W}_{0,N}$ replaced by $\widetilde{W}_{0,N}$ from (49). This can be seen in Fig. 8 where we show the result when using the schemes with 8th order inner accuracy and where we by $\omega =\{2,\infty \}$ refer to the penalty choice used in [9].

Remark 20

When using the wide scheme, the $\int _0^1\mathcal {G}_2\mathcal {U}_x\,\mathrm {d} x$-part of the functional is approximated by $G_2^TH\widehat{W}$. However, in the narrow case the corresponding W in (47) is not uniquely defined (since M and S are not unique), only $\widetilde{W}_{0,N}$ are. How this kind of functionals best should be approximated is not investigated here, but one workaround is to simply consider $\left[ \mathcal {G}_2\mathcal {U}\right] _0^1-\int _0^1(\mathcal {G}_2)_x\mathcal {U}\,\mathrm {d} x$ instead of $\int _0^1\mathcal {G}_2\mathcal {U}_x\,\mathrm {d} x$.

5.1.4 Reflections From the Scalar Case

From what we have seen from the numerical experiments so far, the best choice of the factorization parameter $\omega $ is not only dependent on the continuous problem at hand (i.e., the parameters a and $\varepsilon $ and the type of boundary conditions), but also on numerical quantities, such as the grid resolution and if the stencils are wide or narrow. In some cases the factorization has almost no impact, sometimes it makes the system at hand extremely ill-conditioned or even changes the order of accuracy of the scheme.

In the scalar case it is rather straightforward to optimize with respect to the single factorization parameter $\omega $, but for systems this task becomes non-trivial and one might have to settle for the factorizations at hand. Nevertheless, we note that the eigendecomposition is not necessarily the best factorization and that it could be worth searching for other options. With that being said, next we consider a system and use nothing but the eigendecomposition for constructing the penalty parameters.

5.2 A Fluid Dynamics System with Solid Wall Boundary Conditions

The symmetrized, compressible Navier–Stokes equations in one dimension (with ${\varOmega }=[0,1]$) with frozen coefficients is given by (30), with

$$\begin{aligned} \mathcal {A}=\left[ \begin{array}{ccc}\bar{u} &{}a&{}0\\ a&{}\bar{u} &{}b\\ 0 &{}b&{}\bar{u}\end{array}\right] ,&\mathcal {E}=\varepsilon \left[ \begin{array}{ccc}0 &{}0 &{}0\\ 0 &{}\varphi &{}0\\ 0 &{}0 &{}\psi \end{array}\right] ,&\mathcal {U}=\left[ \begin{array}{c}\varrho \\ u\\ T\end{array}\right] , \end{aligned}$$

where the constants $\bar{u}$, $a$, $b$, $\varepsilon $, $\varphi $ and $\psi $ denote suitable physical quantities and where $\varrho $, u and T are scaled perturbations in density, velocity and temperature. Let $\bar{u}<0$ and $\varepsilon , \varphi , \psi >0$. In this case, two boundary conditions should be given at the left boundary and three at the right boundary. We impose solid wall boundary conditions (a perfectly insulated wall) at the left boundary, that is $u(0,t)= T_x(0,t)=0$. At the right boundary, we impose free stream boundary conditions of Dirichlet type, as $\mathcal {U}(1,t)=\mathcal {U}_\infty $. These boundary conditions give a well-posed problem. The boundary operators are

$$\begin{aligned} \mathcal {H}_L&=\left[ \begin{array}{ccc}0 &{}1 &{}0\\ 0 &{}0\ &{}0\end{array}\right] ,&\mathcal {G}_L&=\left[ \begin{array}{ccc}0 &{}0 &{}0\\ 0 &{}0\ &{}1\end{array}\right] ,&\mathcal {H}_R&=\left[ \begin{array}{ccc}1 &{}0 &{}0\\ 0\ &{}1 &{}0\\ 0 &{}0 &{}1\end{array}\right] ,&\mathcal {G}_R&=\left[ \begin{array}{ccc}0\ &{}0 &{}0\\ 0 &{}0 &{}0\\ 0 &{}0 &{}0\end{array}\right] . \end{aligned}$$

These boundary conditions can not be rearranged to the far-field form and therefore the penalty used in [2, 3] can not be applied. We identify $\bar{\mathcal {A}}$, $\bar{\mathcal {B}}_L$ and $\bar{\mathcal {B}}_R$ according to (32), and factorize $\bar{\mathcal {A}}$ using the eigendecomposition. The dual consistent penalty parameters are now described in (43), with

$$\begin{aligned} \mathcal {K}_L&=\left[ \begin{array}{ccc}0 &{}0 &{}0\\ 0 &{}0\ &{}1/(\varepsilon \psi )\end{array}\right] ,&\mathcal {K}_R&=\left[ \begin{array}{ccc}0 &{}0 &{}0\\ 0 &{}0 &{}0\\ 0 &{}0\ &{}0\end{array}\right] . \end{aligned}$$

As a comparison, we use the alternative penalty parameters (cf. Method 2 in Remark 13)

$$\begin{aligned} \tilde{\mu }_0=\left[ \begin{array}{rc}-a&{} 0\\ 0 &{}0\\ -b&{}\varepsilon \psi \end{array}\right] ,&\tilde{\nu }_0=\left[ \begin{array}{cc}0 &{}0\\ \varepsilon \varphi \ &{} 0\\ 0 &{} 0\end{array}\right] ,&\tilde{\mu }_N=\left[ \begin{array}{ccc}\bar{u} &{}a&{}0\\ 0\ &{} \bar{u} &{} 0\\ 0 &{}b&{} \bar{u}\end{array}\right] ,&\tilde{\nu }_N=\left[ \begin{array}{ccc}0 &{} 0 &{} 0\\ 0 &{} -\varepsilon \varphi &{} 0\\ 0 &{} 0 &{} -\varepsilon \psi \end{array}\right] \end{aligned}$$

which give stability (they are chosen such that the boundary terms in (45) are non-positive for zero data) but they do not fulfill the demands for dual consistency.

In the numerical simulations we use the exact solution $\varrho =\cos (7x)$, $u=\sin (13x)$ and $T=\cos (30x)$ and as weight functions we use $\mathcal {G}(x)=[1,\ 0,\ 0]^T$, $\mathcal {G}(x)=[0,\ 1,\ 0]^T$ and $\mathcal {G}(x)=[0,\ 0,\ 1]^T$ (such that one functional output is obtained for each variable). Figure 9 shows the resulting errors when using the schemes with interior order 6. In the wide case, the solutions do not differ much. In the narrow case, the dual consistent solution converges one half order slower than the dual inconsistent one (order 4 for $\varrho $ and 4.5 for u, T compared to 4.5 for $\varrho $ and 5 for u, T), but the result is still as good as in the wide case. Moreover, recall that in the scalar case the order could be improved by choosing another factorization than the eigendecomposition, see Fig. 6a. In Fig. 10 we see that the functionals convergence with the expected 6th order for both the dual consistent schemes, whereas the dual inconsistent schemes yield 5th order.

The diffusion parameter is decreased from $\varepsilon =0.01$ to $\varepsilon =10^{-6}$ and the resulting errors are shown in Figs. 11 and 12. Now the solution errors obtained using the dual consistent schemes are slightly better than the ones obtained using the dual inconsistent schemes, but the difference is small, see Fig. 11.

For the functional errors the difference is more pronounced, see Fig. 12. In the wide case, the dual consistent scheme produces a perfect convergence rate of almost 7. This behavior was observed already in the scalar case, when the factorization parameter was chosen exactly as $\omega =|a|$ (which for small amounts of diffusion is very close to the eigendecomposition). For the narrow-stencil schemes the dual consistent scheme still produces smaller errors than the dual inconsistent scheme, but the order is reduced to 3 (a pre-asymptotic low-order tendency seen already in Fig. 7 in the scalar case).

Extrapolating from the scalar case, we assume that it could be worth searching for better penalty parameters for the narrow-stencil schemes when having diffusion dominated problems. However, for convection dominated problems the wide scheme with a factorization close to the eigendecomposition is hard to beat.

6 Concluding Remarks

We use a finite difference method based on summation by parts operators, combined with a penalty method for the boundary conditions (SBP–SAT). Diagonal-norm SBP operators have 2p-order accurate interior stencils and p-order accurate boundary closures, which limits the global accuracy of the solution to $p+1$ (or $p + 2$ for parabolic problems under certain conditions). Recently, it has been shown that SBP–SAT schemes can give functional estimates that are $\mathcal {O}(h^{2p})$. To achieve this superconvergence, the SAT parameters must be carefully chosen to ensure that the discretization is dual consistent.

We first look at hyperbolic systems and derive stability requirements and duality constraints for the SATs. Then we present a recipe to choose these SAT parameters such that both these (independent) demands are fulfilled. When wide-stencil second derivative operators are used, the results automatically extend to parabolic problems. We generalize the recipe such that it holds also for narrow-stencil second derivative operators.

The 2p order convergence of SBP–SAT functional estimates is confirmed numerically for a variety of scalar examples, as well as for an incompletely parabolic system. For low-diffusion advection-diffusion problems, the superconvergence is sometimes seen first asymptotically. Generally speaking, the narrow-stencil schemes are better for diffusion dominated problems whereas the wide schemes are preferable for advection dominated problems.

In most cases the derived dual consistent SAT parameters have some remaining degree of freedom. The free parameters can be used to improve the accuracy of the primary solution or to tune numerical quantities such as spectral radius, decay rate or condition numbers. Optimal choices of the SAT parameters are suggested for the scalar problems, however, to do the same for systems is considered a task for the future.

References

Berg, J., Nordström, J.: Superconvergent functional output for time-dependent problems using finite differences on summation-by-parts form. J. Comput. Phys. 231(20), 6846–6860 (2012)
Article MathSciNet MATH Google Scholar
Berg, J., Nordström, J.: On the impact of boundary conditions on dual consistent finite difference discretizations. J. Comput. Phys. 236, 41–55 (2013)
Article MathSciNet MATH Google Scholar
Berg, J., Nordström, J.: Duality based boundary conditions and dual consistent finite difference discretizations of the Navier–Stokes and Euler equations. J. Comput. Phys. 259, 135–153 (2014)
Article MathSciNet MATH Google Scholar
Carpenter, M.H., Nordström, J., Gottlieb, D.: A stable and conservative interface treatment of arbitrary spatial accuracy. J. Comput. Phys. 148(2), 341–365 (1999)
Article MathSciNet MATH Google Scholar
Eriksson, S., Nordström, J.: Analysis of the order of accuracy for node-centered finite volume schemes. Appl. Numer. Math. 59(10), 2659–2676 (2009)
Article MathSciNet MATH Google Scholar
Fernández, D.C.D.R., Hicken, J.E., Zingg, D.W.: Review of summation-by-parts operators with simultaneous approximation terms for the numerical solution of partial differential equations. Comput. Fluids 95, 171–196 (2014)
Article MathSciNet Google Scholar
Gustafsson, B., Kreiss, H.O., Oliger, J.: Time-Dependent Problems and Difference Methods. Wiley, New York (2013)
Book MATH Google Scholar
Hicken, J.E.: Output error estimation for summation-by-parts finite-difference schemes. J. Comput. Phys. 231(9), 3828–3848 (2012)
Article MathSciNet MATH Google Scholar
Hicken, J.E., Zingg, D.W.: Superconvergent functional estimates from summation-by-parts finite-difference discretizations. SIAM J. Sci. Comput. 33(2), 893–922 (2011)
Article MathSciNet MATH Google Scholar
Hicken, J.E., Zingg, D.W.: Summation-by-parts operators and high-order quadrature. J. Comput. Appl. Math. 237(1), 111–125 (2013)
Article MathSciNet MATH Google Scholar
Kreiss, H.O., Lorenz, J.: Initial-Boundary Value Problems and the Navier–Stokes Equations. Academic Press, New York (1989)
MATH Google Scholar
Mattsson, K.: Summation by parts operators for finite difference approximations of second-derivatives with variable coefficients. J. Sci. Comput. 51(3), 650–682 (2012)
Article MathSciNet MATH Google Scholar
Mattsson, K., Nordström, J.: Summation by parts operators for finite difference approximations of second derivatives. J. Comput. Phys. 199(2), 503–540 (2004)
Article MathSciNet MATH Google Scholar
Nordström, J., Eriksson, S., Eliasson, P.: Weak and strong wall boundary procedures and convergence to steady-state of the Navier–Stokes equations. J. Comput. Phys. 231(14), 4867–4884 (2012)
Article MathSciNet MATH Google Scholar
Nordström, J., Svärd, M.: Well-posed boundary conditions for the Navier–Stokes equations. SIAM J. Numer. Anal. 43(3), 1231–1255 (2005)
Article MathSciNet MATH Google Scholar
Quarteroni, A., Sacco, R., Saleri, F.: Numerical Mathematics. Springer, Berlin (2000)
MATH Google Scholar
Strand, B.: Summation by parts for finite difference approximation for d/dx. J. Comput. Phys. 110(1), 47–67 (1994)
Article MathSciNet MATH Google Scholar
Svärd, M., Nordström, J.: On the order of accuracy for difference approximations of initial-boundary value problems. J. Comput. Phys. 218(1), 333–352 (2006)
Article MathSciNet MATH Google Scholar
Svärd, M., Nordström, J.: Review of summation-by-parts schemes for initial-boundary-value problems. J. Comput. Phys. 268, 17–38 (2014)
Article MathSciNet MATH Google Scholar

Download references

Acknowledgements

The author would like to sincerely thank the anonymous referees for their valuable comments and suggestions.

Author information

Authors and Affiliations

Department of Mathematics, Technische Universität Darmstadt, Dolivostr. 15, 64293, Darmstadt, Germany
Sofia Eriksson

Authors

Sofia Eriksson
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Sofia Eriksson.

Appendices

Appendix A: Reformulation of the First Order Form Discretization

We derive the scheme (33) with penalty parameters (35), using the hyperbolic results.

Step 1: Consider the problem (31), which is a first order system. We represent the solution $\bar{\mathcal {U}}$ by a discrete solution vector $\bar{U}=[\bar{U}_0^T,\bar{U}_1^T,\ldots ,\bar{U}_N^T]^T$, where $\bar{U}_i(t)\approx \bar{\mathcal {U}}(x_i,t)$ and discretize (31) exactly as was done in (11) for the hyperbolic case, that is as

$$\begin{aligned} \begin{aligned} (I_N\otimes \bar{\mathcal {I}})\bar{U}_t+(I_N\otimes \bar{\mathcal {R}})\bar{U}+(D_1\otimes \bar{\mathcal {A}})\bar{U}=&\,\bar{F}+(H^{-1}e_{0}\otimes \bar{{\varSigma }}_{0}) (\bar{\mathcal {B}}_L{\bar{U}}_{0}-g_{{L}})\\&+(H^{-1}e_N \otimes \bar{{\varSigma }}_N)(\bar{\mathcal {B}}_R{\bar{U}}_N -g_{{R}}). \end{aligned} \end{aligned}$$

(66)

As proposed in Theorem 1, we let $\bar{{\varSigma }}_{0}=-\bar{Z}_+\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1}$ and $\bar{{\varSigma }}_N=\bar{Z}_-\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1}$.

Step 2: We discretize (30) directly by approximating $\mathcal {U}$ by $U$ and $\mathcal {U}_x$ by $\widehat{W}$. We obtain

$$\begin{aligned} \begin{aligned} U_t +(D_1\otimes \mathcal {A})U-(D_1\otimes \mathcal {E})\widehat{W}=&\,F+\left( H^{-1}e_{0}\otimes \sigma _0\right) \left( \mathcal {H}_L{U}_{0}+\mathcal {G}_L\widehat{W}_{0}-g_{{L}}\right) \\&+\left( H^{-1}e_N \otimes \sigma _N\right) \left( \mathcal {H}_R{U}_N+\mathcal {G}_R\widehat{W}_N -g_{{R}}\right) , \end{aligned} \end{aligned}$$

(67a)

$$\begin{aligned} \begin{aligned} (I_N\otimes \mathcal {E})\widehat{W} -(D_1\otimes \mathcal {E})U=&\,\left( H^{-1}e_{0}\otimes \tau _0\right) \left( \mathcal {H}_L{U}_{0}+\mathcal {G}_L\widehat{W}_{0}-g_{{L}}\right) \\&+\left( H^{-1}e_N \otimes \tau _N\right) \left( \mathcal {H}_R{U}_N+\mathcal {G}_R\widehat{W}_N -g_{{R}}\right) . \end{aligned} \end{aligned}$$

(67b)

If $ \bar{{\varSigma }}_{0}=[\sigma _0^T,\tau _0^T]^T$ and $\bar{{\varSigma }}_N=[\sigma _N^T,\tau _N^T]^T$, then (67) is a permutation of (66).

Step 3: The scheme in (67) is a system of differential algebraic equations, so we would like to cancel the variable $\widehat{W}$ and get a system of ordinary differential equations instead. Multiplying (67b) by $\bar{D}=(D_1\otimes I_n)$ and adding the result to (67a), yields

$$\begin{aligned} \begin{aligned} U_t +(D_1\otimes \mathcal {A})U-(D_1^2\otimes \mathcal {E})U=&\,F+\left( H^{-1}e_{0}\otimes \sigma _0+D_1H^{-1}e_{0}\otimes \tau _0\right) \widehat{\chi }_{0}\\&+\left( H^{-1}e_N \otimes \sigma _N+D_1H^{-1}e_N \otimes \tau _N\right) \widehat{\chi }_N, \end{aligned} \end{aligned}$$

where

$$\begin{aligned} \widehat{\chi }_{0}=\mathcal {H}_L{U}_{0}+\mathcal {G}_L\widehat{W}_{0}-g_{{L}},\qquad \quad \widehat{\chi }_N=\mathcal {H}_R{U}_N+\mathcal {G}_R\widehat{W}_N -g_{{R}}. \end{aligned}$$

(68)

Next, using the properties in (12), together with the fact that $H$ is diagonal, we compute

$$\begin{aligned} D_1H^{-1}e_0 =H^{-1}\left( -\widehat{q}I_N-D_1^T\right) e_0,&D_1H^{-1}e_N=H^{-1}\left( \widehat{q}I_N-D_1^T\right) e_N, \end{aligned}$$

where $\widehat{q}$ is the scalar $\widehat{q}=e_0^TH^{-1}e_0=e_N^TH^{-1}e_N$ given in (37). This yields

$$\begin{aligned} \begin{aligned} U_t +(D_1\otimes \mathcal {A})U-\left( D_1 ^2\otimes \mathcal {E}\right) U=&\,F+\bar{H}^{-1}\left( e_{0}\otimes (\sigma _0-\widehat{q}\tau _0)-D_1^Te_{0}\otimes \tau _0\right) \widehat{\chi }_{0}\\&+\bar{H}^{-1}\left( e_N \otimes (\sigma _N+\widehat{q}\tau _N) -D_1^Te_N \otimes \tau _N\right) \widehat{\chi }_N, \end{aligned} \end{aligned}$$

(69)

where $\bar{H}=(H\otimes I_n)$. However, the boundary condition deviations $\widehat{\chi }_0$ and $\widehat{\chi }_N$ still contain $\widehat{W}$, so we multiply (67b) by $(e_0^T\otimes I_n)$ and $(e_N^T\otimes I_n)$, respectively, to get

$$\begin{aligned} \mathcal {E}\widehat{W}_0 - \mathcal {E}(\bar{D}U)_0&=\widehat{q}\tau _0\widehat{\chi }_0,&\mathcal {E}\widehat{W}_N- \mathcal {E}(\bar{D}U)_N&=\widehat{q}\tau _N\widehat{\chi }_N. \end{aligned}$$

(70)

Next, we need boundary condition deviations without $\widehat{W}$, and define

$$\begin{aligned} \widehat{\xi }_{0}&=\mathcal {H}_L{U}_0 +\mathcal {G}_L(\bar{D}U)_0 -g_{{L}},&\widehat{\xi }_N&= \mathcal {H}_R{U}_N+\mathcal {G}_R(\bar{D}U)_N -g_{{R}}. \end{aligned}$$

Recall that $\mathcal {G}_{L,R}=\mathcal {K}_{L,R}\mathcal {E}$. Using (70), we can now relate $\widehat{\xi }_{0,N}$ above to $\widehat{\chi }_{0,N}$ in (68) as

$$\begin{aligned} \widehat{\xi }_{0}&=(I_{m_+}-\widehat{q}\mathcal {K}_L\tau _0)\widehat{\chi }_0 ,&\widehat{\xi }_N&= (I_{m_-}-\widehat{q}\mathcal {K}_R\tau _N)\widehat{\chi }_N, \end{aligned}$$

(71)

where $I_{m_+}$ and $I_{m_-}$ are identity matrices of sizes corresponding to the number of positive (${m_+}$) and negative ($m_-$) eigenvalues of $\bar{\mathcal {A}}$, respectively. Inserting $\widehat{\chi }_{0,N}$ from (71) into (69) allows us to finally write the scheme without any $\widehat{W}$ terms and we obtain (33), with

$$\begin{aligned} \begin{aligned} \widehat{\mu }_{0}&=(\sigma _0-\widehat{q}\tau _0) (I_{m_+}-\widehat{q}\mathcal {K}_L\tau _0)^{-1},\widehat{\nu }_{0}=-\tau _0(I_{m_+}-\widehat{q}\mathcal {K}_L\tau _0)^{-1}, \\ \widehat{\mu }_N&=(\sigma _N+\widehat{q}\tau _N) (I_{m_-}-\widehat{q}\mathcal {K}_R\tau _N)^{-1},\widehat{\nu }_N=-\tau _N(I_{m_-}-\widehat{q}\mathcal {K}_R\tau _N)^{-1}.\qquad \end{aligned} \end{aligned}$$

(72)

From Step 1 and 2 we know that

$$\begin{aligned} \left[ \begin{array}{c}\sigma _0\\ \tau _0\end{array}\right] =-\left[ \begin{array}{c}\bar{Z}_1\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1}\\ \bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1}\end{array}\right] ,&\left[ \begin{array}{c}\sigma _N\\ \tau _N\end{array}\right] =\left[ \begin{array}{c}\bar{Z}_3\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1}\\ \bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1}\end{array}\right] , \end{aligned}$$

where $\bar{Z}_{1,2,3,4}$ are given in (36). Inserting the above relation into (72), we obtain the penalty parameters presented in (35).

Appendix B: Validity of the Derivations and Penalty Parameters in Appendix A

The following Lemma will prove useful:

Lemma 1

(Determinant theorem) For any matrices A and B of size $m \times n$ and $n \times m$, respectively, $\det (I_{ m }+AB)=\det (I_{ n }+BA)$ holds. This lemma is a generalization of the “matrix determinant lemma” and sometimes referred to as “Sylvester’s determinant theorem”.

Proof

Consider the product of block matrices below:

$$\begin{aligned} {\begin{pmatrix} I_m&{}0\\ B&{}I_n \end{pmatrix}} {\begin{pmatrix} I_m+AB&{}A\\ 0&{}I_n \end{pmatrix}} {\begin{pmatrix} I_m&{}0\\ -B&{}I_n \end{pmatrix}}= {\begin{pmatrix} I_m&{}A\\ 0&{}I_n+BA \end{pmatrix}}. \end{aligned}$$

Using the multiplicativity of determinants, the determinant rule for block triangular matrices and the fact that $\det (I_m)=\det (I_n)=1$, we see that the determinant of the left hand side is $ \det (I_m+AB)$ and that the determinant of the right hand side is $ \det (I_n + BA)$. $\square $

When (67) in “Appendix A” is rewritten such that all dependence of $\widehat{W}$ is removed, we rely on the assumption that we can extract $(I_N\otimes \mathcal {E})\widehat{W}$ from (67b) and insert it into (67a). Intuitively we expect this to be possible, since (67) is in fact (although indirectly) a consistent approximation of (30). To investigate this more carefully, we multiply (67b) by $(H\otimes I_n)$ and move all $\widehat{W}$ dependent parts to the left hand side (recall that $\mathcal {G}_{L,R}=\mathcal {K}_{L,R}\mathcal {E}$). This yields

$$\begin{aligned} \left( (H\otimes I_n)-(E_{0}\otimes \tau _0\mathcal {K}_L)-(E_N \otimes \tau _N\mathcal {K}_R) \right) (I_N\otimes \mathcal {E})\widehat{W}=&\,(Q\otimes \mathcal {E})U\\&+(e_{0}\otimes \tau _0)(\mathcal {H}_L{U}_{0}-g_{{L}})\\&+(e_N \otimes \tau _N)(\mathcal {H}_R{U}_N -g_{{R}}). \end{aligned}$$

We see that we can solve for $(I_N\otimes \mathcal {E})\widehat{W}$ if the matrices $H_{0,0}I_n-\tau _0\mathcal {K}_L$ and $H_{N,N}I_n-\tau _N\mathcal {K}_R$ are non-singular. From “Appendix A” we know that

$$\begin{aligned} \tau _0=-\bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1},&\tau _N=\bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1},\end{aligned}$$

that is, we need $I_n+\widehat{q}\bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1} \mathcal {K}_L$ and $I_n-\widehat{q}\bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1} \mathcal {K}_R$ to be non-singular (note that $\widehat{q}=1/H_{0,0}=1/H_{N,N}$ for diagonal matrices $H$). According to Lemma 1 above, we have

$$\begin{aligned} \det \left( I_n+\widehat{q}\bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{P}_{L}^{-1} \mathcal {K}_L\right)&=\det \left( I_{m_+}+\widehat{q}\bar{P}_{L}^{-1} \mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{}_+\right) =\det \left( \bar{P}_{L}^{-1}\right) \det \left( \widehat{{\varXi }}_{L}\right) \\ \det \left( I_n-\widehat{q}\bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{P}_{R}^{-1} \mathcal {K}_R\right)&=\det \left( I_{m_-}-\widehat{q}\bar{P}_{R}^{-1} \mathcal {K}_R\bar{Z}_4\bar{{\varDelta }}^{ }_-\right) =\det \left( \bar{P}_{R}^{-1} \right) \det \left( \widehat{{\varXi }}_{R}\right) . \end{aligned}$$

That is, we can solve for $(I_N\otimes \mathcal {E})\widehat{W}$ in (67b) if the matrices $\widehat{{\varXi }}_{L,R}$ are non-singular, where $\widehat{{\varXi }}_{L,R}$ are nothing else than the matrices that shows up in the penalty parameters in (35). So, we are thus interested in the regularity of the matrices

$$\begin{aligned} \widehat{{\varXi }}_L=\bar{P}_{L}+\widehat{q}\mathcal {K}_L\bar{Z}_2\bar{{\varDelta }}^{ }_+ ,&\widehat{{\varXi }}_R=\bar{P}_{R}-\widehat{q}\mathcal {K}_R\bar{Z}_4\bar{{\varDelta }}^{ }_-, \end{aligned}$$

which are inverted in (35)—or if $\widehat{q}$ is replaced by $q$ we consider ${\varXi }_{L,R}$ from (43). Below we show that $\widehat{{\varXi }}_L$ is non-singular for well-posed problems. First, using (6), (32) and (36) we obtain

$$\begin{aligned} \left[ \begin{array}{cc}-\mathcal {E}&0_{n,n} \end{array}\right] \bar{Z}^{-T}=\left[ \begin{array}{ccc}\bar{Z}_2\bar{{\varDelta }}_+&0_{n,m_0}&\bar{Z}_4\bar{{\varDelta }}_-\end{array}\right] \end{aligned}$$

and realize that the matrices $\bar{Z}_2$ and $\bar{Z}_4$ scales with $\mathcal {E}$ as $\bar{Z}_2=\mathcal {E}\widetilde{Z_2}$ and $\bar{Z}_4=\mathcal {E}\widetilde{Z_4}$, where

$$\begin{aligned} \widetilde{Z_2}=\left[ \begin{array}{cc}I_{n}&0_{n,n} \end{array}\right] \bar{Z}^{-T}\left[ \begin{array}{c}-I_{m_+}\\ 0_{(m-m_+),m_+} \end{array}\right] \bar{{\varDelta }}_+^{-1},&\widetilde{Z_4}=\left[ \begin{array}{cc}I_n&0_{n,n} \end{array}\right] \bar{Z}^{-T}\left[ \begin{array}{c}0_{(m-m_-),m_-}I_{m_-} \end{array}\right] \bar{{\varDelta }}_-^{-1}. \end{aligned}$$

Secondly, using (10), (32) and (36) leads to $\mathcal {G}_L=\bar{P}_{L}(\bar{Z}_2^T+\bar{R}_L\bar{Z}_4^T)$ and we can rewrite $\widehat{{\varXi }}_L$ as

$$\begin{aligned} \widehat{{\varXi }}_L&=\bar{P}_{L}\left( I_{m_+}+\widehat{q}\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \mathcal {E}\widetilde{Z_2}\bar{{\varDelta }}^{ }_+\right) , \end{aligned}$$

where we have used that $\mathcal {G}_L=\mathcal {K}_L\mathcal {E}$. Now Lemma 1 yields $\det (\widehat{{\varXi }}_L) =\det (\bar{P}_{L}) \det \left( {\varUpsilon }\right) $, where ${\varUpsilon }\equiv I_{n}+\widehat{q}\mathcal {E}^{1/2}\widetilde{Z_2}\bar{{\varDelta }}^{}_+\big (\widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\big )\mathcal {E}^{1/2}$ and where we by $\mathcal {E}^{1/2}$ refer to the principal square root of $\mathcal {E}$. The permutation matrix $\bar{P}_{L}$ is invertible but ${\varUpsilon }$ must be checked. Thus we compute

$$\begin{aligned} {\varUpsilon }{\varUpsilon }^T =&\, I_{n}+\widehat{q}^2 \mathcal {E}^{1/2}\widetilde{Z_2}\bar{{\varDelta }}^{}_+\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \mathcal {E}\left( \widetilde{Z_2}+\widetilde{Z_4}\bar{R}_L^T\right) \bar{{\varDelta }}^{}_+ \widetilde{Z_2}^T \mathcal {E}^{1/2}\\&+\widehat{q}\mathcal {E}^{1/2}\left( \left( \widetilde{Z_2}+\widetilde{Z_4}\bar{R}_L^T\right) \bar{{\varDelta }}^{ }_+ \widetilde{Z_2}^T +\widetilde{Z_2}\bar{{\varDelta }}^{ }_+\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \right) \mathcal {E}^{1/2}. \end{aligned}$$

Next, thanks to the condition for well-posedness, $\bar{\mathcal {C}}_L=\bar{{\varDelta }}^{ }_-+\bar{R}_L^T\bar{{\varDelta }}^{ }_+\bar{R}_L\le 0$, we obtain

$$\begin{aligned} 0&\le \left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) ^T\bar{{\varDelta }}_+^{}\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \\&=\widetilde{Z_2}\bar{{\varDelta }}_+^{}\widetilde{Z_2}^T+\widetilde{Z_4}\bar{R}_L^T\bar{{\varDelta }}_+^{}\widetilde{Z_2}^T+\widetilde{Z_2}\bar{{\varDelta }}_+^{}\bar{R}_L\widetilde{Z_4}^T+\widetilde{Z_4}\bar{R}_L^T\bar{{\varDelta }}_+^{}\bar{R}_L\widetilde{Z_4}^T\\&\le \widetilde{Z_2}\bar{{\varDelta }}_+^{}\widetilde{Z_2}^T+\widetilde{Z_4}\bar{R}_L^T\bar{{\varDelta }}_+^{}\widetilde{Z_2}^T+\widetilde{Z_2}\bar{{\varDelta }}_+^{}\bar{R}_L\widetilde{Z_4}^T-\widetilde{Z_4}\bar{{\varDelta }}_-^{}\widetilde{Z_4}^T. \end{aligned}$$

Inserting this into ${\varUpsilon }{\varUpsilon }^T$ above, gives (recall that $\widehat{q}$ is positive)

$$\begin{aligned} {\varUpsilon }{\varUpsilon }^T \ge&\, I_{n}+\widehat{q}^2 \mathcal {E}^{1/2}\widetilde{Z_2}\bar{{\varDelta }}^{}_+\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \mathcal {E}\left( \widetilde{Z_2}+\widetilde{Z_4}\bar{R}_L^T\right) \bar{{\varDelta }}^{}_+ \widetilde{Z_2}^T \mathcal {E}^{1/2}\\&+\widehat{q}\mathcal {E}^{1/2}\left( \widetilde{Z_2}\bar{{\varDelta }}^{ }_+\widetilde{Z_2}^T+\widetilde{Z_4}\bar{{\varDelta }}_-^{}\widetilde{Z_4}^T\right) \mathcal {E}^{1/2}. \end{aligned}$$

We now let $\mathcal {E}=X{\varLambda }X^T$ be the eigendecomposition of $\mathcal {E}$, with—for simplicity—the eigenvalues sorted as ${\varLambda }=\text {diag}({\varLambda }_+, {\varLambda }_0)$, where ${\varLambda }_+>0$ and ${\varLambda }_0=0$. Furthermore, we denote

$$\begin{aligned} X^T\left( \widetilde{Z_2}\bar{{\varDelta }}^{ }_+\widetilde{Z_2}^T+\widetilde{Z_4}\bar{{\varDelta }}_-^{}\widetilde{Z_4}^T\right) X=\left[ \begin{array}{cc}{\varTheta }_{1}&{}{\varTheta }_{3}\\ {\varTheta }_{2}&{}{\varTheta }_{4}\end{array}\right] , \end{aligned}$$

where ${\varTheta }_{1}$ and ${\varTheta }_{4}$ have the same sizes as ${\varLambda }_+$ and ${\varLambda }_0$, respectively. Using (32), (36) and (7) leads to $\bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{Z}_2^T+\bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{Z}_4^T =0$, that is

$$\begin{aligned} 0&=\bar{Z}_2\bar{{\varDelta }}^{ }_+\bar{Z}_2^T+\bar{Z}_4\bar{{\varDelta }}^{ }_-\bar{Z}_4^T =\mathcal {E}\left( \widetilde{Z_2}\bar{{\varDelta }}^{ }_+\widetilde{Z_2}^T+\widetilde{Z_4}\bar{{\varDelta }}_-^{}\widetilde{Z_4}^T\right) \mathcal {E}=X\left[ \begin{array}{cc}{\varLambda }_+{\varTheta }_{1}{\varLambda }_+&{}0\\ 0&{}0\end{array}\right] X^T, \end{aligned}$$

which means that ${\varTheta }_{1}=0$ must hold. This in turn leads to

$$\begin{aligned} \mathcal {E}^{1/2}\left( \widetilde{Z_2}\bar{{\varDelta }}^{ }_+\widetilde{Z_2}^T+\widetilde{Z_4}\bar{{\varDelta }}_-^{}\widetilde{Z_4}^T\right) \mathcal {E}^{1/2}&=X\left[ \begin{array}{cc}{\varLambda }_+^{1/2}{\varTheta }_{1}{\varLambda }_+^{1/2}&{}0\\ 0&{}0\end{array}\right] X^T=0, \end{aligned}$$

where we have used that $\mathcal {E}^{1/2}=X{\varLambda }^{1/2}X^T$. Thus ${\varUpsilon }{\varUpsilon }^T$ is non-singular, since

$$\begin{aligned} {\varUpsilon }{\varUpsilon }^T&\ge I_{n}+\widehat{q}^2 \mathcal {E}^{1/2}\widetilde{Z_2}\bar{{\varDelta }}^{}_+\left( \widetilde{Z_2}^T+\bar{R}_L\widetilde{Z_4}^T\right) \mathcal {E}\left( \widetilde{Z_2}+\widetilde{Z_4}\bar{R}_L^T\right) \bar{{\varDelta }}^{}_+ \widetilde{Z_2}^T \mathcal {E}^{1/2}\ge I_{n}>0. \end{aligned}$$

It follows that ${\varUpsilon }$ is non-singular, since ${\text {rank}}({\varUpsilon }{\varUpsilon }^T) = {\text {rank}}({\varUpsilon }) $, and consequently so is $\widehat{{\varXi }}_L$, since $\det (\widehat{{\varXi }}_L) =\det (\bar{P}_{L}) \det \left( {\varUpsilon }\right) $. The same derivations can be repeated for the right boundary.

Appendix C: Proof of Proposition 1 and Examples of $q$

In Proposition 1 we claim that the inverse of $\widetilde{A}_{{S}}= A_{{S}}+\delta E_0$ has the structure $\widetilde{A}_{{S}}^{-1}=J/\delta +K_0$ and that the corners of $\widetilde{M}^{-1}=S\widetilde{A}_{{S}}^{-1}S^T$ are independent of $\delta $. We prove this below.

Proof

First we make sure that $\widetilde{A}_{{S}}$ is non-singular. By numerically investigating the eigenvalue of $\widetilde{A}_{{S}}$ which is closest to zero, we see that (for all operators in this paper) it scales almost as $c(\delta )/N$, where $c(\delta )$ is nearly independent of the operator and where $c(\delta )=0$ only if $\delta =0$. We now show that the inverse of $\widetilde{A}_{{S}}$ is $J/\delta +K_0$. We denote the parts of $ A_{{S}}$, $ \widetilde{A}_{{S}}$, $K_0$ and J

$$\begin{aligned} A_{{S}}=\left[ \begin{array}{cc}a&{}\mathbf {a}^T\\ \mathbf {a}&{}\bar{A}\end{array}\right] ,&\widetilde{A}_{{S}}=\left[ \begin{array}{cc}a+\delta &{}\mathbf {a}^T\\ \mathbf {a}&{}\bar{A}\end{array}\right] ,&K_0=\left[ \begin{array}{cc}{0}&{}\mathbf {0}^T\\ \mathbf {0}&{}\bar{A}^{-1}\end{array}\right] ,&J=\left[ \begin{array}{cc}1 &{}\mathbf {1}^T\\ \mathbf {1}&{}\bar{J}\end{array}\right] , \end{aligned}$$

where $\mathbf {a}$ and $\mathbf {1}=[1, 1, \ldots , 1]^T$ are vectors and $\bar{A}$ and $\bar{J}$ are matrices of size $N\times N$. Since $A_{{S}}$ consists of consistent difference operators, it yields zero when operating on constants. Therefore, $A_{{S}}J=0$ (because J is an all-ones matrix) and $\mathbf {a}+\bar{A}\mathbf {1}=\mathbf {0}$. Note that the relation $\mathbf {a}+\bar{A}\mathbf {1}=\mathbf {0}$ leads to $\left[ \begin{array}{cc} \mathbf {a}&\bar{A}\end{array}\right] =\bar{A}B$, where $B=\left[ \begin{array}{cc} -\mathbf {1}&\bar{I}\end{array}\right] $ is an $N\times (N+1)$ matrix of rank N, in which case it holds that $ {\text {rank}}(\bar{A}B) = {\text {rank}}(\bar{A})$. Moreover, since $\widetilde{A}_{{S}}$ is non-singular, $\left[ \begin{array}{cc} \mathbf {a}&\bar{A}\end{array}\right] $ must have full rank N. Hence $ {\text {rank}}(\bar{A})= {\text {rank}}(\bar{A}B) =N$, i.e., $\bar{A}$ has full rank N and is invertible. Next, due to the structure of $K_0$, we know that $E_0K_0=0$. Thus we have

$$\begin{aligned} (A_{{S}}+\delta E_0)(J/\delta +K_0) =A_{{S}}K_0+ E_0J =\left[ \begin{array}{cc}1 &{}\mathbf {a}^T\bar{A}^{-1}+\mathbf {1}^T\\ \mathbf {0}&{}\bar{I}\end{array}\right] =I. \end{aligned}$$

In the last step we have used that $\mathbf {a}^T+\mathbf {1}^T\bar{A}^T=\mathbf {0}^T$ and that $\bar{A}$ is symmetric.

The first and the last row of the matrix S are consistent difference stencils. We can thus write $S=[\mathbf {s}_0, \times , \mathbf {s}_N]^T$, where the vectors $\mathbf {s}_{0, N}$ have the property $\mathbf {s}_{0, N}^TJ=0$. The interior rows of S are marked by a $\times $ because they are not uniquely defined. We compute

$$\begin{aligned} \widetilde{M}^{-1}&=S\widetilde{A}_{{S}}^{-1}S^T =S\left( J/\delta +K_0\right) S^T =\left[ \begin{array}{ccc}\mathbf {s}_0^TK_0\mathbf {s}_0 &{}\times &{} \mathbf {s}_0^TK_0\mathbf {s}_N\\ \times &{} \times &{} \times \\ \mathbf {s}_N^TK_0\mathbf {s}_0 &{}\times &{}\mathbf {s}_N^T K_0\mathbf {s}_N \end{array}\right] . \end{aligned}$$

We see that the corner elements of $\widetilde{M}^{-1}$ are independent of $\delta $. We conclude that if S is defined such that it is non-singular, the constants in (42) can be computed using (60). $\square $

As an example, consider the narrow (2, 0) order operator in Table 1, specified by $D_2$ below and associated with the following matrices $H$ and S

$$\begin{aligned} \begin{aligned} D_2&=\frac{1}{h^2}\left[ \begin{array}{ccccc} 0&{}0\\ 1&{}-2&{}1\\ &{}\ddots &{}\ddots &{}\ddots \\ &{}&{}1&{}-2&{}1\\ &{}&{}&{}0&{}0 \end{array}\right] ,&H=h\left[ \begin{array}{ccccc} 1/2\\ &{}1\\ &{}&{}\ddots \\ &{}&{}&{}1\\ &{}&{}&{}&{}1/2\end{array}\right] ,\\{} S&=\frac{1}{h}\left[ \begin{array}{ccccc} -1&{}1\\ \times &{}\times &{}\times &{}\times &{}\times \\ \vdots &{}\vdots &{}\vdots &{}\vdots &{}\vdots \\ \times &{}\times &{}\times &{}\times &{}\times \\ &{}&{}&{}-1&{}1\end{array}\right] . \end{aligned} \end{aligned}$$

(73)

Using (73) and (39) we obtain $A_{{S}}$ such that we can compute $K_0$ and $\widetilde{M}^{-1}$ as

$$\begin{aligned} K_0=h\left[ \begin{array}{ccccc}0 &{}0 &{}\ldots &{}0&{}0 \\ 0 &{}1 &{}\ldots &{}1&{}1\\ \vdots &{}\vdots &{}\ddots &{}\vdots &{}\vdots \\ 0 &{}1 &{}\ldots &{}N-1&{}N-1 \\ 0 &{}1 &{}\ldots &{}N-1&{}N\end{array}\right] ,&\widetilde{M}^{-1} =\frac{1}{h}\left[ \begin{array}{ccccc}1 &{}\times &{}\ldots &{}\times &{}0 \\ \times &{}\times &{}\ldots &{}\times &{}\times \\ \vdots &{}\vdots &{} &{}\vdots &{}\vdots \\ \times &{}\times &{}\ldots &{}\times &{}\times \\ 0 &{}\times &{}\ldots &{}\times &{}1\end{array}\right] \end{aligned}$$

In this case we get $q_0=q_N=1/h$ and $q_c=0$, such that $q=1/h$.

In addition to the operator discussed above, we use the diagonal-norm operators in [13]. For the higher order accurate operators found in [13], $q$ varies with N. For example, for the narrow (4, 2) order accurate operator, we have

$$\begin{aligned} \begin{array}{c@{\quad }c@{\quad }c@{\quad }c} N&{}q_0h&{}q_ch&{}qh\\ \hline 8&{} 3.986350339808304 &{} 0.000041141179445 &{} 3.986391480987749\\ 9&{} 3.986350339313381 &{} 0.000002953803786 &{} 3.986353293117168\\ 10&{} 3.986350339310830 &{} 0.000000212073570 &{} 3.986350551384400\\ 11&{} 3.986350339310817 &{} 0.000000015226197 &{} 3.986350354537014\\ 12&{} 3.986350339310817 &{}0.000000001093192 &{} 3.986350340404008\\ \end{array} \end{aligned}$$

Since the values do not differ so much, it is practical to use the largest value, the one for $N=8$, regardless of the number of grid points.

Appendix D: Time-Dependent Numerical Examples

For simplicity we mainly consider stationary numerical examples. Below we give a couple of examples confirming the superconvergence also for time-dependent problems.

1.1 The Heat Equation with Dirichlet Boundary Conditions

We consider the heat equation. We solve $\mathcal {U}_t=\varepsilon \mathcal {U}_{xx}+\mathcal {F}(x,t)$ with $\varepsilon =0.01$ and the exact solution $\mathcal {U}(x,t)=\cos (30x)+\sin (20x)\cos (10t)+\sin (35t)$. For the time propagation the classical 4th order accurate Runge–Kutta scheme is used, with sufficiently small time steps, ${\varDelta }t=10^{-4}$, such that the spatial errors dominate. In Fig. 13 the errors obtained using the narrow (6, 3) order scheme are shown as a function of time.

The corresponding spatial order of convergence (at time $t=1$) is shown in Table 2. The simulations confirm the steady results, namely that both $\omega =2\varepsilon $ and $\omega =q\varepsilon $ give superconvergent functionals but that choosing the factorization parameter as $\omega \sim \varepsilon /h$ improves the solution significantly compared to when using the eigendecomposition.

1.2 The Heat Equation with Neumann Boundary Conditions

We solve $\mathcal {U}_t=\varepsilon \mathcal {U}_{xx}+\mathcal {F}(x,t)$ again, but this time with Neumann boundary conditions, and the penalty parameters are now given by (65) with $a=0$, $\varepsilon =0.01$, $\alpha _{{}_{L,R}}=0$ and $\beta _{{}_{L,R}}=1$. In contrast to when having Dirichlet boundary conditions, the spectral radius $\rho $ does not depend so strongly on $\omega $ and therefore we can let $\omega \rightarrow \infty $ (we can use $\omega =q\varepsilon $ here too, it gives the same convergence rates as $\omega =\infty $). In Table 3 we show the errors and convergence orders (at time $t=1$) for the same setup as in the previous section, that is when solving using the 4th order Runge–Kutta scheme with ${\varDelta }t=10^{-4}$ and having the exact solution $\mathcal {U}(x,t)=\cos (30x)+\sin (20x)\cos (10t)+\sin (35t)$ and the weight function $\mathcal {G}=1$. We note that the convergence rates behaves similarly to the Dirichlet case.

Table 2 The errors and convergence rates at $t=1$ for the narrow (6,3) order scheme, when using Dirichlet boundary conditions

Full size table

Table 3 The errors and convergence rates at $t=1$ for the narrow (6,3) order scheme, when using Neumann boundary conditions

Full size table

Rights and permissions

Reprints and permissions

About this article

Cite this article

Eriksson, S. A Dual Consistent Finite Difference Method with Narrow Stencil Second Derivative Operators. J Sci Comput 75, 906–940 (2018). https://doi.org/10.1007/s10915-017-0569-6

Download citation

Received: 18 November 2016
Revised: 24 August 2017
Accepted: 25 September 2017
Published: 16 October 2017
Issue Date: May 2018
DOI: https://doi.org/10.1007/s10915-017-0569-6

Keywords

Mathematics Subject Classification

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

A Dual Consistent Finite Difference Method with Narrow Stencil Second Derivative Operators

Abstract

Similar content being viewed by others

Inverses of SBP-SAT Finite Difference Operators Approximating the First and Second Derivative

Adaptive Finite Difference Methods for Nonlinear Elliptic and Parabolic Partial Differential Equations with Free Boundaries

Simultaneous Approximation Terms for Multi-dimensional Summation-by-Parts Operators

1 Introduction

1.1 Preliminaries

2 Hyperbolic Systems

2.1 Well-Posedness Using the Energy Method

Remark 1

Remark 2

2.2 The Semi-discrete Problem

2.3 Numerical Stability Using the Energy Method

2.4 The Dual Problem

Remark 3

2.4.1 Well-Posedness of the Dual Problem

Remark 4

2.4.2 Discretization of the Dual Problem

2.5 Dual Consistency

Remark 5

2.6 Penalty Parameters for the Hyperbolic Problem

Theorem 1

Proof

Remark 6

Remark 7

Remark 8

3 Parabolic Systems

Remark 9

3.1 Discretization Using Wide-Stencil Second Derivative Operators

Remark 10

3.2 Discretization Using Narrow-Stencil Second Derivative Operators

Theorem 2

3.3 Stability When Using Narrow-Stencil Second Derivative Operators

3.4 Dual Consistency for Narrow-Stencil Second Derivative Operators

Remark 11

4 Computing \(q\)

Proposition 1

Remark 12

Remark 13

5 Examples and Numerical Experiments

Remark 14

5.1 The Scalar Case

Remark 15

Remark 16

Remark 17

5.1.1 The Heat Equation with Dirichlet Boundary Conditions

Remark 18

5.1.2 The Advection–Diffusion Equation with Dirichlet Boundary Conditions

Remark 19

5.1.3 Functionals Including Boundary Terms

Remark 20

5.1.4 Reflections From the Scalar Case

5.2 A Fluid Dynamics System with Solid Wall Boundary Conditions

6 Concluding Remarks

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Appendices

Appendix A: Reformulation of the First Order Form Discretization

Appendix B: Validity of the Derivations and Penalty Parameters in Appendix A

Lemma 1

Proof

Appendix C: Proof of Proposition 1 and Examples of \(q\)

Proof

Appendix D: Time-Dependent Numerical Examples

1.1 The Heat Equation with Dirichlet Boundary Conditions

1.2 The Heat Equation with Neumann Boundary Conditions

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification

Search

Navigation