1 Introduction

The numerical solution of singularly perturbed differential equations (SPDEs) is of great interest to numerical analysts, given the importance of these equations in computational modelling, and the challenges they present for classical numerical schemes and the mathematical methods used to analyse them; see [15] for a survey. In this work, we focus on linear second-order reaction-diffusion problems of the form

$$\begin{aligned} -\varepsilon \varDelta u + b u = f \text { on } \varOmega :=(0,1)^d \qquad u|_{\partial \varOmega }=0, \end{aligned}$$
(1)

for \(d=1,2,3\), where we assume there exist constants \(0<b_0<b(\varvec{x})<b_1\) for every \(\varvec{x}\in \varOmega \). Like all SPDEs, (1) is characterised by a small positive parameter that multiplies the highest derivative. It is “singular” in the sense that the problem is ill-posed if one formally sets \(\varepsilon =0\). As \(\varepsilon \) approaches this limit, the solution typically exhibits layers: regions of rapid change, whose width is determined by \(\varepsilon \). The over-arching goal is to devise methods that resolve these layers, and for which the error (measured in a suitable norm) is independent of \(\varepsilon \). Many classical techniques make the tacit assumption that derivatives of \(u\) are bounded, which does not hold, uniformly in \(\varepsilon \), for solutions to (1). Numerous specialised methods, usually based around layer-adapted meshes, have been developed with the goal of resolving these layers and the attendant mathematical conundrums. The celebrated piecewise uniform meshes of Shishkin have been particularly successful in this regard, and the analysis of finite-difference methods for (1) and its many variants is largely complete [13].

Finite-element methods (FEMs) on layer-adapted meshes have also been applied successfully to (1), but their analysis is more problematic. This is highlighted to great effect by Lin and Stynes, who demonstrated that the usual energy norm associated with (1) is too weak to adequately express the layers present in the solution [10]. They proposed a first-order FEM (see Sect. 2) for which the associated norm is sufficiently strong to capture layers; they coined the term “balanced norm” to describe this.

A flurry of activity on balanced norms was prompted by [10], including the first-order system Petrov-Galerkin (FOSPeG) approach proposed by the authors [1], and we refer to its introduction for a survey of the progress up to 2015. Since then, developments have continued apace. Broadly speaking, studies can be classified as one of two types.

  1. Those that give analyses of standard FEMs, but in norms that are not induced by the associated bilinear forms; see, e.g., [16] on sparse grid FEMs, and [12] on hp-FEMs.

  2. Those that propose new formulations for which the associated norm is naturally “balanced”; see, e.g., the discontinuous Petrov-Galerkin method of Heuer and Karkulik [7].

The present study belongs to the second of these classes: we propose a new FEM for which the induced norm is balanced. This method is related to our earlier work [1], but instead uses a weighted least-squares FEM to obtain a symmetric discrete system. In this first-order system least-squares (FOSLS) approach [4, 5], care is taken in choosing the weight, so that the resulting norms are indeed balanced.

The remainder of the paper is organised as follows. Section 2 gives a brief discussion of balanced norms, in which the Lin and Stynes and FOSPeG methods are summarized. In Sect. 3, we discuss the weighted least-squares approach and provide the necessary analysis, which applies in one, two and three dimensions. In Sect. 4, we focus on the particular case of \(d=2\); we present a suitable Shishkin mesh for the problem, along with numerical results that support our findings. Some concluding remarks are given in Sect. 5.

2 Balanced Norms

In [10], Lin and Stynes propose a first-order system reformulation of (1), writing the equivalent system as

$$\begin{aligned} L_{\text {div}}\,\mathcal {U} := \begin{pmatrix} \varepsilon ^{1/4}\big (\varvec{w} - \nabla u \big ) \\ -\varepsilon \nabla \cdot \varvec{w} + bu \end{pmatrix} = \begin{pmatrix} 0 \\ f \end{pmatrix} =: \mathcal {F}_{\text {div}}, \end{aligned}$$
(2)

for \(\mathcal {U}=(u, \varvec{w})^T\). Rather than forming a least-squares finite-element discretization as in [4, 5], they choose to close the system in a nonsymmetric manner, defining \(\mathcal {V} = (v,\varvec{z})^T\) and

$$ M_{\text {div}}\mathcal {V} := \begin{pmatrix} \varepsilon ^{1/4}\big (\varvec{z} - \nabla v \big )\\ -\varepsilon ^{1/2}b^{-1} \nabla \cdot \varvec{z} + v \end{pmatrix}, $$

then writing the solution of (1) as that of the weak form

$$\begin{aligned} a_{\text {div}}(\mathcal {U},\mathcal {V}) := \langle L_{\text {div}}\,\mathcal {U},M_{\text {div}}\mathcal {V}\rangle = \langle \mathcal {F}_{\text {div}},M_{\text {div}}\mathcal {V}\rangle \quad \forall \mathcal {V} \in H^1(\varOmega )\times H(\text {div}). \end{aligned}$$
(3)

In [10], it is shown that \(a_{\text {div}}\) is coercive and continuous with respect to the norm,

$$\begin{aligned} |||\mathcal {U}|||_{\text {div}}^2:= b_0 \Vert u\Vert _0^2 +\frac{\varepsilon ^{1/2}}{2} \Vert \nabla u\Vert _0^2 + \frac{\varepsilon ^{1/2}}{2} \Vert \varvec{w}\Vert _0^2 + \varepsilon ^{3/2}\Vert \nabla \cdot \varvec{w}\Vert _0^2, \end{aligned}$$
(4)

which is shown to be a balanced norm for the problem, in the sense that all the components in (4) have the same order of magnitude with respect to the perturbation parameter, \(\varepsilon \).
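To illustrate why balance matters, the following minimal sketch compares the standard energy norm with the two \(u\)-terms of (4) for a single one-dimensional layer function, \(u(x) = e^{-x/\sqrt{\varepsilon }}\) on \([0,1]\). The layer function, the choice \(b = b_0 = 1\), and all function names are assumptions made purely for illustration; the \(\varvec{w}\)-terms of (4) are omitted.

```python
import math

def layer_l2_sq(eps):
    """Closed form of the integral of exp(-x/sqrt(eps))**2 over [0, 1]."""
    s = math.sqrt(eps)
    return 0.5 * s * (1.0 - math.exp(-2.0 / s))

def energy_norm_sq(eps):
    """eps*||u'||^2 + ||u||^2 for u = exp(-x/sqrt(eps)), assuming b = 1."""
    l2 = layer_l2_sq(eps)
    grad = l2 / eps  # u' = -u/sqrt(eps), so ||u'||^2 = ||u||^2 / eps
    return eps * grad + l2

def balanced_u_terms_sq(eps, b0=1.0):
    """The two u-terms of (4): b0*||u||^2 + (sqrt(eps)/2)*||u'||^2."""
    l2 = layer_l2_sq(eps)
    grad = l2 / eps
    return b0 * l2 + 0.5 * math.sqrt(eps) * grad

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(f"eps={eps:.0e}  energy^2={energy_norm_sq(eps):.3e}  "
          f"balanced^2={balanced_u_terms_sq(eps):.3e}")
```

In this sketch the squared energy norm of the layer decays like \(\sqrt{\varepsilon }\), so the layer is "invisible" to it as \(\varepsilon \rightarrow 0\), whereas the balanced terms tend to the \(\varepsilon \)-independent value \(1/4\).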

In [1], the authors augmented the first-order system approach proposed by Lin and Stynes to include a curl constraint, in the same style as [5], leading to the first-order system reformulation of (1) as

$$\begin{aligned} L\,\mathcal {U}:= \begin{pmatrix} {\varepsilon }^{1/4}\big (\varvec{w} - \nabla u \big ) \\ -\varepsilon \nabla \cdot \varvec{w} + bu \\ \varepsilon \nabla \times \varvec{w} \end{pmatrix} = \begin{pmatrix} \varvec{0}\\ f \\ \varvec{0} \end{pmatrix} =: \hat{\mathcal {F}}. \end{aligned}$$
(5)

Then, writing

$$\begin{aligned} M_k\mathcal {V} := \begin{pmatrix} {\varepsilon }^{1/4}\big (\varvec{z} - \nabla v \big )\\ -\varepsilon ^{1/2} b^{-1}\nabla \cdot \varvec{z} + v\\ \varepsilon ^{k/2} \nabla \times \varvec{z} \end{pmatrix}, \end{aligned}$$
(6)

leads to the weak form

$$\begin{aligned} a_k^{}(\mathcal {U},\mathcal {V}) := \langle L\mathcal {U},M_k\mathcal {V}\rangle = \langle \hat{\mathcal {F}},M_k\mathcal {V}\rangle \quad \forall \mathcal {V} \in \left( H^1(\varOmega )\right) ^{1+d}. \end{aligned}$$
(7)

Building on the theory of [10], this form is shown to be coercive and continuous with respect to the balanced norm

$$\begin{aligned} |||\mathcal {U}|||_{k}^2 = b_0 \Vert u\Vert _0^2 +\frac{\varepsilon ^{1/2}}{2} \Vert \nabla u\Vert _0^2 + \frac{\varepsilon ^{1/2}}{2} \Vert \varvec{w}\Vert _0^2 + \varepsilon ^{3/2}\Vert \nabla \cdot \varvec{w}\Vert _0^2 + \varepsilon ^{1+k/2}\Vert \nabla \times \varvec{w}\Vert _0^2. \end{aligned}$$
(8)

Furthermore, in [1], the authors show that, when discretized using piecewise bilinear finite elements on a tensor-product Shishkin mesh, this weak form leads to a parameter-robust discretization, with an error estimate independent of the perturbation parameter \(\varepsilon \).

3 First-Order System Least Squares Finite-Element Methods

While theoretical and numerical results in [1] show the effectiveness of the first-order system Petrov-Galerkin approach proposed therein, the non-symmetric nature of the weak form also has disadvantages. Primary among these is that the weak form no longer can be used as an accurate and reliable error indicator, contrary to the common practice for FOSLS finite-element approaches [2,3,4,5,6]. Standard techniques to symmetrize the weak form in (7) fail, however, either sacrificing the balanced nature of the norm (and, thus, any guarantee of parameter robustness of the resulting discretization) or coercivity or continuity of the weak form (destroying standard error estimates). Here, we propose a FOSLS approach for the problem in (1), made possible by considering a weighted norm with spatially varying weight function. Weighted least-squares formulations have been used for a wide variety of problems including those with singularities due to the domain [8, 9].

To this end, we define the weighted inner product on both scalar and vector \(H^1(\varOmega )\) spaces, writing

$$ \langle u,v \rangle _\beta = \int _\varOmega \beta (\varvec{x}) u(\varvec{x}) v(\varvec{x})\, d\varvec{x}, $$

with the associated norm written as \(\Vert u\Vert _\beta \). Slightly reweighting the first-order system from (5), we have

$$\begin{aligned} \mathcal {L}\,\mathcal {U} := \begin{pmatrix} {\varepsilon }^{1/2}\big (\varvec{w} - \nabla u \big )\\ -\varepsilon b^{-1/2}\nabla \cdot \varvec{w} + b^{1/2}u \\ \varepsilon ^{k/2} \nabla \times \varvec{w} \end{pmatrix} = \begin{pmatrix} \varvec{0}\\ b^{-1/2}f \\ \varvec{0} \end{pmatrix} =: \mathcal {F}, \end{aligned}$$
(9)

and pose the weighted FOSLS weak form as

$$ a(\mathcal {U},\mathcal {V}) = \langle \mathcal {L}\mathcal {U},\mathcal {L}\mathcal {V} \rangle _\beta = \langle \mathcal {F},\mathcal {L}\mathcal {V}\rangle _\beta \quad \forall \mathcal {V} \in \left( H^1(\varOmega )\right) ^{1+d}. $$

This form leads to a natural weighted product norm given by

$$ |||\mathcal {U}|||_{\beta ,k}^2 = \Vert u\Vert _\beta ^2 + \varepsilon \Vert \nabla u\Vert _\beta ^2 + \varepsilon \Vert \varvec{w}\Vert _\beta ^2 + \varepsilon ^2\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2. $$

As shown below, under a reasonable assumption on the weight function, \(\beta \), the FOSLS weak form is coercive and continuous with respect to this norm.

Theorem 1

Let \(\beta (\varvec{x})\) be given such that there exists \(C>0\) for which

$$ \nabla \beta \cdot \nabla \beta < \frac{b_0\beta ^2(\varvec{x})}{\varepsilon (1+C)^2}, $$

for every \(\varvec{x}\in \varOmega \), and let \(k\in \mathbb {R}\) be given. Then,

$$\begin{aligned} |a(\mathcal {U},\mathcal {V})| \le&\left( 3+2\max (b_0^{-1},b_1)\right) |||\mathcal {U}|||_{\beta ,k}|||\mathcal {V}|||_{\beta ,k}\\ \min \left( \frac{C\min (1,b_0)}{1+C},b_1^{-1},1\right) |||\mathcal {U}|||_{\beta ,k}^2 \le&\, a(\mathcal {U},\mathcal {U}) \end{aligned}$$

for all \(\mathcal {U},\mathcal {V}\in \left( H^1(\varOmega )\right) ^{1+d}\).

Proof

For the continuity bound, we note that

$$\begin{aligned} a(\mathcal {U},\mathcal {V}) = \varepsilon&\langle \varvec{w}-\nabla u,\varvec{z}-\nabla v \rangle _\beta \\ {}&+\, \langle -\varepsilon b^{-1/2}\nabla \cdot \varvec{w}+b^{1/2}u,-\varepsilon b^{-1/2}\nabla \cdot \varvec{z} + b^{1/2}v\rangle _\beta \\ {}&+\, \varepsilon ^k\langle \nabla \times \varvec{w},\nabla \times \varvec{z}\rangle _\beta . \end{aligned}$$

Thus, by the Cauchy-Schwarz and triangle inequalities, we have

$$\begin{aligned} |a(\mathcal {U},\mathcal {V})| \le&\, \varepsilon \left( \Vert \varvec{w}\Vert _\beta + \Vert \nabla u\Vert _\beta \right) \left( \Vert \varvec{z}\Vert _\beta + \Vert \nabla v\Vert _\beta \right) \\ {}&+\, \left( \varepsilon b_0^{-1/2}\Vert \nabla \cdot \varvec{w}\Vert _\beta + b_1^{1/2}\Vert u\Vert _\beta \right) \left( \varepsilon b_0^{-1/2}\Vert \nabla \cdot \varvec{z}\Vert _\beta + b_1^{1/2}\Vert v\Vert _\beta \right) \\ {}&+\, \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta \Vert \nabla \times \varvec{z}\Vert _\beta \\ \le&\left( 3+2\max (b_0^{-1},b_1)\right) |||\mathcal {U}|||_{\beta ,k}|||\mathcal {V}|||_{\beta ,k}. \end{aligned}$$

For the coercivity bound, we note

$$\begin{aligned} a(\mathcal {U},\mathcal {U}) = \varepsilon \Vert \varvec{w}-\nabla u\Vert _\beta ^2&+ \varepsilon ^2\Vert b^{-1/2}\nabla \cdot \varvec{w}\Vert _\beta ^2 +\Vert b^{1/2}u\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2 \\ {}&- \,2\varepsilon \langle \nabla \cdot \varvec{w},u\rangle _\beta \\ \ge \varepsilon \Vert \varvec{w}-\nabla u\Vert _\beta ^2&+\, \varepsilon ^2b_1^{-1}\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2 +b_0\Vert u\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2 \\ {}&-\, 2\varepsilon \langle \nabla \cdot \varvec{w},u\rangle _\beta . \end{aligned}$$

Now consider

$$\begin{aligned} - 2\varepsilon \langle \nabla \cdot \varvec{w},u\rangle _\beta&= -2\varepsilon \int _\varOmega \left( \nabla \cdot \varvec{w}\right) u\beta d\varvec{x} \\&= 2\varepsilon \int _\varOmega \varvec{w}\cdot \nabla (u\beta )d\varvec{x} \\&= 2\varepsilon \int _\varOmega \left( \varvec{w}\cdot \nabla u\right) \beta d\varvec{x} + 2\varepsilon \int _\varOmega \left( \nabla \beta \cdot \varvec{w}\right) u d\varvec{x}\\&= 2\varepsilon \langle \varvec{w},\nabla u\rangle _\beta + 2\varepsilon \int _\varOmega \left( \nabla \beta \cdot \varvec{w}\right) u d\varvec{x}, \end{aligned}$$

where we use the fact that \(u=0\) on the boundary in the integration by parts step. Note that

$$ \langle \varvec{w},\nabla u\rangle _\beta = \frac{1}{4}\Vert \varvec{w}+\nabla u\Vert _\beta ^2 - \frac{1}{4}\Vert \varvec{w}-\nabla u\Vert _\beta ^2, $$

and, consequently, that

$$ \varepsilon \Vert \varvec{w}-\nabla u\Vert _\beta ^2 + 2\varepsilon \langle \varvec{w},\nabla u\rangle _\beta = \frac{\varepsilon }{2}\Vert \varvec{w}+\nabla u\Vert _\beta ^2 + \frac{\varepsilon }{2}\Vert \varvec{w}-\nabla u\Vert _\beta ^2 = \varepsilon \Vert \varvec{w}\Vert _\beta ^2 + \varepsilon \Vert \nabla u\Vert _\beta ^2. $$

Thus,

$$\begin{aligned} a(\mathcal {U},\mathcal {U}) \ge b_0\Vert u\Vert _\beta ^2&+\, \varepsilon \Vert \varvec{w}\Vert _\beta ^2 + \varepsilon \Vert \nabla u\Vert _\beta ^2+ \varepsilon ^2b_1^{-1}\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2 \\ {}&+ \,2\varepsilon \int _\varOmega \left( \nabla \beta \cdot \varvec{w}\right) u d\varvec{x}. \end{aligned}$$

Finally, consider

$$ 2\varepsilon \left| \int _\varOmega \left( \nabla \beta \cdot \varvec{w}\right) u d\varvec{x}\right| = 2\varepsilon \left| \left\langle \varvec{w},\frac{u}{\beta }\nabla \beta \right\rangle _\beta \right| \le 2\varepsilon \Vert \varvec{w}\Vert _\beta \left\| \frac{u}{\beta }\nabla \beta \right\| _\beta . $$

By our assumption on \(\beta \),

$$ \left\| \frac{u}{\beta }\nabla \beta \right\| _\beta ^2 \le \frac{b_0}{\varepsilon (1+C)^2}\Vert u\Vert _\beta ^2, $$

and, so,

$$ 2\varepsilon \left| \int _\varOmega \left( \nabla \beta \cdot \varvec{w}\right) u d\varvec{x}\right| \le 2\frac{\varepsilon ^{1/2}b_0^{1/2}}{1+C}\Vert \varvec{w}\Vert _\beta \Vert u\Vert _\beta . $$

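Applying Young's inequality, \(2ab \le a^2+b^2\), with \(a = \varepsilon ^{1/2}\Vert \varvec{w}\Vert _\beta \) and \(b = b_0^{1/2}\Vert u\Vert _\beta \), this gives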

$$\begin{aligned} a(\mathcal {U},\mathcal {U}) \ge&b_0\Vert u\Vert _\beta ^2 + \varepsilon \Vert \varvec{w}\Vert _\beta ^2 + \varepsilon \Vert \nabla u\Vert _\beta ^2+ \varepsilon ^2b_1^{-1}\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2\\ {}&-\, 2\frac{\varepsilon ^{1/2}b_0^{1/2}}{1+C}\Vert \varvec{w}\Vert _\beta \Vert u\Vert _\beta \\ \ge&\,b_0\left( 1-\frac{1}{(1+C)}\right) \Vert u\Vert _\beta ^2 + \varepsilon \left( 1-\frac{1}{(1+C)}\right) \Vert \varvec{w}\Vert _\beta ^2 \\ {}&+\, \varepsilon \Vert \nabla u\Vert _\beta ^2+ \varepsilon ^2b_1^{-1}\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2 + \varepsilon ^k\Vert \nabla \times \varvec{w}\Vert _\beta ^2\\ \ge&\min \left( \frac{C\min (1,b_0)}{1+C},b_1^{-1},1\right) |||\mathcal {U}|||_{\beta ,k}^2. \end{aligned}$$

A natural question, in light of this result, is whether a suitable choice of \(\beta (\varvec{x})\) exists. We now give a concrete construction of one such family of functions, \(\beta (\varvec{x})\), for which the assumption above is satisfied. This family is constructed for the case of \(\varOmega = [0,1]^d\) with boundary layers along each boundary adjacent to the origin (i.e., where \(x_i = 0\) for some i). The extension to boundary layers along all 2d boundary faces is straightforward from the construction.

Theorem 2

Let \(C>0\) be given, and define \(\gamma = \frac{b_0^{1/2}}{(1+C)\sqrt{d}}\). Take

$$\begin{aligned} \beta (\varvec{x}) = \left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_1/\sqrt{\varepsilon }}\right) \cdots \left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_d/\sqrt{\varepsilon }}\right) \end{aligned}$$
(10)

Then,

$$ \nabla \beta \cdot \nabla \beta < \frac{b_0\beta ^2(\varvec{x})}{\varepsilon (1+C)^2}, $$

for every \(\varvec{x}\in \varOmega \).

Proof

A direct calculation shows that

$$ \frac{\partial \beta }{\partial x_i} = \frac{\frac{-\gamma }{\varepsilon }e^{-\gamma x_i/\sqrt{\varepsilon }}}{\left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_i/\sqrt{\varepsilon }}\right) }\beta (\varvec{x}). $$

Consequently,

$$ \nabla \beta \cdot \nabla \beta = \sum _{i=1}^d \left( \frac{\frac{-\gamma }{\varepsilon }e^{-\gamma x_i/\sqrt{\varepsilon }}}{\left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_i/\sqrt{\varepsilon }}\right) }\right) ^2\beta ^2(\varvec{x}). $$

Note, however, that

$$ \left( \frac{\frac{-\gamma }{\varepsilon }e^{-\gamma x_i/\sqrt{\varepsilon }}}{\left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_i/\sqrt{\varepsilon }}\right) }\right) ^2 = \frac{\gamma ^2}{\varepsilon }\left( \frac{\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_i/\sqrt{\varepsilon }}}{\left( 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x_i/\sqrt{\varepsilon }}\right) }\right) ^2 < \frac{\gamma ^2}{\varepsilon }, $$

since the ratio in parentheses lies strictly between 0 and 1. This gives

$$ \nabla \beta \cdot \nabla \beta < \frac{d\gamma ^2}{\varepsilon } \beta ^2(\varvec{x}). $$

Substituting in the chosen value for \(\gamma \), for which \(d\gamma ^2 = b_0/(1+C)^2\), gives the stated result.
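The bound of Theorem 2 can also be checked numerically. The sketch below evaluates \(\nabla \beta \cdot \nabla \beta / \beta ^2\) for \(\beta \) from (10) on a tensor grid of sample points and compares it with \(b_0/(\varepsilon (1+C)^2)\). The sample grid, the default values \(b_0 = 1\) and \(C = 0.5\), and the function names are illustrative assumptions, and a check at finitely many points is of course not a substitute for the proof.

```python
import itertools
import math

def grad_ratio(x, eps, gamma):
    """Return |grad beta|^2 / beta^2 at the point x for beta from (10)."""
    s = math.sqrt(eps)
    total = 0.0
    for xi in x:
        e = math.exp(-gamma * xi / s)
        # (d beta / d x_i) / beta = (-gamma/eps) * e / (1 + e/sqrt(eps))
        total += ((-gamma / eps) * e / (1.0 + e / s)) ** 2
    return total

def condition_holds(eps, b0=1.0, C=0.5, d=2, n=11):
    """Check the assumption of Theorem 1 on an n^d grid covering [0,1]^d."""
    gamma = math.sqrt(b0) / ((1.0 + C) * math.sqrt(d))
    bound = b0 / (eps * (1.0 + C) ** 2)
    pts = [i / (n - 1) for i in range(n)]
    return all(grad_ratio(x, eps, gamma) < bound
               for x in itertools.product(pts, repeat=d))

for d in (1, 2, 3):
    for eps in (1.0, 1e-2, 1e-4, 1e-8):
        print(f"d={d}  eps={eps:.0e}  condition holds: {condition_holds(eps, d=d)}")
```

Dividing through by \(\beta ^2\) keeps the computed quantities moderate in size even when \(\beta \) itself is of order \(\varepsilon ^{-d/2}\) near the origin.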

The final question to be resolved is whether \(\beta (\varvec{x})\) as given in (10) is a “good” choice, in the sense of whether quasi-optimal approximation in the resulting norm is expected to give a good approximation to the layer structure in a typical solution. We consider the case of \(d=2\), the unit square. Following Lemmas 1.1 and 1.2 of [11], we require that the problem data satisfy the assumptions of [11, §2.1], specifically that \(f,b \in C^{4,\alpha }(\bar{\varOmega })\) and that f vanishes at the corners of the domain. Denoting the four edges of the domain by \(\varGamma _i\), \(1\le i \le 4\), numbered clockwise with the edge \(y=0\) as \(\varGamma _1\), and the four corners of the domain by \(c_i\), \(1\le i \le 4\), numbered clockwise with the origin as \(c_1\), we have the following result.

Lemma 1

([11, Lemmas 1.1 and 1.2]). The solution u of (1) can be decomposed as

$$\begin{aligned} u = V+W+Z = V + \displaystyle \sum \limits _{i=1}^4 W_i + \displaystyle \sum \limits _{i=1}^4 Z_i, \end{aligned}$$
(11a)

where each \(W_i\) is a layer associated with the edge \(\varGamma _i\) and each \(Z_i\) is a layer associated with the corner \(c_i\). There exists a constant C such that

$$\begin{aligned} \left| \frac{\partial ^{m+n}V}{\partial x^m \partial y^n} (x,y)\right|&\le C(1+\varepsilon ^{1-m/2-n/2}),&0\le m+n \le 4,&\end{aligned}$$
(11b)
$$\begin{aligned} \left| \frac{\partial ^{m+n}W_1 }{\partial x^m \partial y^n} (x,y)\right|&\le C(1+\varepsilon ^{1-m/2})\varepsilon ^{-n/2} e^{-y\sqrt{b_0/(2\varepsilon )}},&0\le m+n \le 3,&\end{aligned}$$
(11c)
$$\begin{aligned} \left| \frac{\partial ^{m+n}W_2 }{\partial x^m \partial y^n} (x,y)\right|&\le C\varepsilon ^{-m/2}(1+\varepsilon ^{1-n/2}) e^{-x\sqrt{b_0/(2\varepsilon )}},&0\le m+n \le 3,&\end{aligned}$$
(11d)
$$\begin{aligned} \left| \frac{\partial ^{m+n} Z_1}{\partial x^m \partial y^n } (x,y)\right|&\le C \varepsilon ^{-m/2-n/2}e^{-(x+y)\sqrt{b_0/(2\varepsilon )}},&0\le m+n \le 3,&\end{aligned}$$
(11e)

with analogous bounds for \(W_3\), \(W_4\), \(Z_2\), \(Z_3\) and \(Z_4\).

Thus, as a “stereotypical” solution of (1) in the case where boundary layers only form along the edges \(x=0\) and \(y=0\) of \([0,1]^2\), we can consider

$$ u(x,y) = u_0(x,y) + c_1e^{-x\sqrt{b_0/(2\varepsilon )}} + c_2e^{-y\sqrt{b_0/(2\varepsilon )}} + c_3e^{-(x+y)\sqrt{b_0/(2\varepsilon )}}. $$

Next, we check if \(|||\mathcal {U}|||_{\beta ,k}\) is “balanced”, not only in the sense of all terms having the same order, but in addition that each component in the stereotypical solution above is well-represented in the norm. By this we mean that the norm of each component can be bounded from above and below by \(\varepsilon \)-independent values, so that the component is neither seen as being well-approximated by zero in the norm (unless truly vanishingly small), nor has a norm that blows up as \(\varepsilon \rightarrow 0\). For this case, (10) simplifies as

$$\begin{aligned} \beta (x,y) = \beta _1(x)\beta _1(y) \quad \text { where } \quad \beta _1(x) = 1+\frac{1}{\sqrt{\varepsilon }}e^{-\gamma x/\sqrt{\varepsilon }}, \end{aligned}$$

and the checks rely on two direct calculations:

$$\begin{aligned} \int _0^1 \beta _1(x)dx&= 1 + \frac{1}{\gamma }\left( 1-e^{-\gamma /\sqrt{\varepsilon }}\right) \approx 1 + \frac{1}{\gamma }, \\ \int _0^1 \beta _1(x) \left( e^{-x\sqrt{b_0/(2\varepsilon )}}\right) ^2dx&= \frac{1}{\gamma +\sqrt{2b_0}}\left( 1-e^{-\gamma /\sqrt{\varepsilon } - \sqrt{2b_0/\varepsilon }}\right) \\&\qquad \qquad +\, \sqrt{\frac{\varepsilon }{2b_0}}\left( 1-e^{-2\sqrt{b_0/(2\varepsilon )}}\right) \\&\approx \frac{1}{\gamma +\sqrt{2b_0}}. \end{aligned}$$

With this, assuming that \(u_0(x,y)\) is \({\mathcal {O}}(1)\) over a nontrivial fraction of the domain, we conclude that

$$ |||(u_0,\nabla u_0)^T|||_{\beta ,k} \approx 1+\frac{1}{\gamma }, $$

because of the separable nature of the calculation. Thus, the regular part of the solution is well-represented in the norm.

For the \(W_2\) layer term, we write \(w_2(x,y) = e^{-x\sqrt{b_0/(2\varepsilon )}}\) and calculate from the above that

$$ \left\| w_2\right\| _\beta ^2 \approx \left( 1+\frac{1}{\gamma }\right) \frac{1}{\gamma +\sqrt{2b_0}}. $$

Noting that all derivatives of this term with respect to y are zero and that \(\partial _x^\ell w_2 = (-\sqrt{b_0/(2\varepsilon )})^\ell w_2\), we compute

$$\begin{aligned} |||(w_2,\nabla w_2)^T|||_{\beta ,k}^2&= \Vert w_2\Vert _\beta ^2 + \varepsilon \Vert \nabla w_2\Vert _\beta ^2 + \varepsilon \Vert \nabla w_2\Vert _\beta ^2 + \varepsilon ^2\Vert \nabla \cdot \nabla w_2\Vert _\beta ^2 \\&\qquad \quad +\, \varepsilon ^k\Vert \nabla \times \nabla w_2\Vert _\beta ^2 \\&= \Vert w_2\Vert _\beta ^2 + \frac{b_0}{2}\Vert w_2\Vert _\beta ^2 + \frac{b_0}{2}\Vert w_2\Vert _\beta ^2 + \left( \frac{b_0}{2}\right) ^2\Vert w_2\Vert _\beta ^2+0 \\&\approx \left( 1+b_0 + \left( \frac{b_0}{2}\right) ^2\right) \left( 1+\frac{1}{\gamma }\right) \frac{1}{\gamma +\sqrt{2b_0}}. \end{aligned}$$

Again, this shows that the \(W_2\) layer term is well-represented in the norm. Similar calculations show the same to be true for the \(W_1\) layer and \(Z_1\) corner terms in the stereotypical solution.
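The two one-dimensional integrals used above, and the resulting estimate for the \(W_2\) term, can be sanity-checked with a simple quadrature. The sketch below uses a composite Simpson rule; the parameter values \(\varepsilon = 10^{-4}\), \(b_0 = 1\), \(\gamma = 0.5\), the number of subintervals, and the helper names are assumptions chosen only for illustration.

```python
import math

def beta1(x, eps, gamma):
    """One-dimensional weight factor beta_1 from the text."""
    return 1.0 + math.exp(-gamma * x / math.sqrt(eps)) / math.sqrt(eps)

def simpson(f, a, b, n=20000):
    """Composite Simpson rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

eps, b0, gamma = 1e-4, 1.0, 0.5
rate = math.sqrt(b0 / (2.0 * eps))  # decay rate of the layer function

# Quadrature values of the two integrals.
int_beta = simpson(lambda x: beta1(x, eps, gamma), 0.0, 1.0)
int_layer = simpson(lambda x: beta1(x, eps, gamma) * math.exp(-2.0 * rate * x),
                    0.0, 1.0)

# eps-independent approximations quoted in the text.
approx_beta = 1.0 + 1.0 / gamma
approx_layer = 1.0 / (gamma + math.sqrt(2.0 * b0))

# Squared norm of (w_2, grad w_2), using separability in x and y.
factor = 1.0 + b0 + (b0 / 2.0) ** 2
norm_sq_quadrature = factor * int_beta * int_layer
norm_sq_estimate = factor * approx_beta * approx_layer

print(f"int beta_1          : {int_beta:.6f}  (approx {approx_beta:.6f})")
print(f"int beta_1 * layer^2: {int_layer:.6f}  (approx {approx_layer:.6f})")
print(f"norm^2 of W_2 term  : {norm_sq_quadrature:.6f}  (approx {norm_sq_estimate:.6f})")
```

The gap between the quadrature value and the approximation of the second integral is dominated by the \(\sqrt{\varepsilon /(2b_0)}\) term dropped in the text, so it shrinks as \(\varepsilon \) decreases.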

4 Numerical Results

To test the above approach, we consider a two-dimensional problem with constant \(b=1\) posed on the unit square. We construct a problem whose solution mimics the stereotypical solution discussed above, with two edge layers and one corner layer. Specifically, we choose f so that the solution is

$$ u(x,y) = \left( \cos \left( \frac{\pi x}{2}\right) - \frac{e^{-x/\sqrt{\varepsilon }} -e^{-1/\sqrt{\varepsilon }}}{1-e^{-1/\sqrt{\varepsilon }}}\right) \left( 1-y- \frac{e^{-y/\sqrt{\varepsilon }}-e^{-1/\sqrt{\varepsilon }}}{1-e^{-1/\sqrt{\varepsilon }}}\right) . $$

We note that this has somewhat more complex layer behaviour than the stereotypical solution, but still obeys the bounds of Lemma 1. Also, the solution is constructed so as to obey the homogeneous Dirichlet boundary conditions. For numerical stability, we rescale the equations by defining \(\varvec{w} = \sqrt{\varepsilon }\nabla u\) and making corresponding changes in weights to preserve the balanced nature of the norm. With this, we pick k to match the powers of \(\varepsilon \) in the weighting terms of both \(\Vert \nabla \cdot \varvec{w}\Vert _\beta ^2\) and \(\Vert \nabla \times \varvec{w}\Vert _\beta ^2\) in \(|||\mathcal {U}|||_{\beta ,k}\), equivalent to taking \(k=2\) above.
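For readers who wish to reproduce the test problem, the following sketch evaluates the manufactured solution and the corresponding right-hand side, \(f = -\varepsilon \varDelta u + u\), obtained by differentiating the product form of \(u\) by hand with \(b=1\). The function names are hypothetical, and this is not the code used to generate the reported results; it uses the unscaled form (1) rather than the rescaled system described above.

```python
import math

def layer_1d(t, eps):
    """Return (exp(-t/sqrt(eps)) - E)/(1 - E), with E = exp(-1/sqrt(eps)),
    together with its second derivative in t."""
    s = math.sqrt(eps)
    E = math.exp(-1.0 / s)
    val = (math.exp(-t / s) - E) / (1.0 - E)
    d2 = math.exp(-t / s) / (eps * (1.0 - E))
    return val, d2

def u_exact(x, y, eps):
    """Manufactured solution u(x, y) = X(x) * Y(y)."""
    lx, _ = layer_1d(x, eps)
    ly, _ = layer_1d(y, eps)
    return (math.cos(0.5 * math.pi * x) - lx) * (1.0 - y - ly)

def f_rhs(x, y, eps):
    """f = -eps * Laplace(u) + u, assuming b = 1."""
    lx, lx2 = layer_1d(x, eps)
    ly, ly2 = layer_1d(y, eps)
    X = math.cos(0.5 * math.pi * x) - lx
    Y = 1.0 - y - ly
    X2 = -(0.5 * math.pi) ** 2 * math.cos(0.5 * math.pi * x) - lx2  # X''
    Y2 = -ly2                                                       # Y''
    return -eps * (X2 * Y + X * Y2) + X * Y

eps = 1e-6
for t in (0.0, 0.5, 1.0):
    # u should vanish (up to rounding) on all four edges of the unit square.
    print(t, u_exact(0.0, t, eps), u_exact(1.0, t, eps),
          u_exact(t, 0.0, eps), u_exact(t, 1.0, eps))
```

Evaluating \(f\) in closed form, rather than differentiating \(u\) numerically, avoids the loss of accuracy that finite differences would suffer inside the layers for small \(\varepsilon \).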

We discretize the test problem on a tensor-product Shishkin mesh (see, e.g., [1, §3] for more details). To do this, we select a transition point, \(\tau > 0\), and construct a one-dimensional mesh with N/2 equal-sized elements on each of the intervals \([0,\tau ]\) and \([\tau ,1]\). The two-dimensional mesh is created as a tensor-product of this mesh with itself, with rectangular (quadrilateral) elements. For the choice of \(\tau \), we slightly modify the standard choice from the literature (see, for example, [1, 10, 11]) to account for both the layer functions present in the solution decomposition and in the definition of \(\beta (\varvec{x})\) in (10). As such, we take

$$ \tau = \min \left\{ \frac{1}{2}, (p+1)\sqrt{\frac{2\varepsilon }{b_0}}\gamma ^{-1}\ln N\right\} $$

where \(p\) is the degree of the polynomial space (\(p=1\) for bilinear elements, \(p=2\) for biquadratic, and \(p=3\) for bicubic), so that the factor \(p+1\) matches the expected \(L^2\) rate of convergence of the approximation, while the factor \(\sqrt{{2\varepsilon }/{b_0}}\gamma ^{-1}\) decreases appropriately as \(\varepsilon \) does, but increases (corresponding to increasing layer width) as \(b_0\) or \(\gamma \) decreases. In the results that follow, we take \(\gamma = 0.5\), implying \(C = \sqrt{2}-1\) (for \(b_0=1\) and \(d=2\)). All numerical results were computed using Firedrake [14], a software package for automating the finite-element method, for the discretization, with a direct solver for the resulting linear systems.
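The mesh construction just described can be sketched in a few lines. The code below computes the transition point \(\tau \) and the nodes of the one-dimensional piecewise-uniform mesh, and forms the tensor-product node set. It is a plain-Python illustration with hypothetical function names and default parameter values (\(b_0 = 1\), \(\gamma = 0.5\)), not the Firedrake code used for the experiments.

```python
import math

def shishkin_transition(N, eps, p=1, b0=1.0, gamma=0.5):
    """Transition point tau = min(1/2, (p+1)*sqrt(2*eps/b0)/gamma * ln N)."""
    return min(0.5, (p + 1) * math.sqrt(2.0 * eps / b0) / gamma * math.log(N))

def shishkin_mesh_1d(N, eps, p=1, b0=1.0, gamma=0.5):
    """Nodes of a mesh with N/2 equal elements on [0, tau] and on [tau, 1]."""
    if N % 2 != 0:
        raise ValueError("N must be even")
    tau = shishkin_transition(N, eps, p, b0, gamma)
    half = N // 2
    fine = [tau * i / half for i in range(half)]                   # [0, tau)
    coarse = [tau + (1.0 - tau) * i / half for i in range(half)]   # [tau, 1)
    return fine + coarse + [1.0]

def shishkin_mesh_2d(N, eps, p=1, b0=1.0, gamma=0.5):
    """Tensor product of the one-dimensional mesh with itself."""
    nodes = shishkin_mesh_1d(N, eps, p, b0, gamma)
    return [(x, y) for y in nodes for x in nodes]

N, eps = 16, 1e-6
print("tau =", shishkin_transition(N, eps))
print("1D nodes:", shishkin_mesh_1d(N, eps))
print("number of 2D nodes:", len(shishkin_mesh_2d(N, eps)))
```

When \(\varepsilon \) is large enough that the second argument of the minimum exceeds \(1/2\), the construction reduces to a uniform mesh with \(N\) elements.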

Table 1. Expected error reduction rates on a Shishkin mesh.
Table 2. \(\beta \)-weighted norm and discrete max norm errors for model problem with bilinear discretization.
Table 3. \(\beta \)-weighted norm and discrete max norm errors for model problem with biquadratic discretization.
Table 4. \(\beta \)-weighted norm and discrete max norm errors for model problem with bicubic discretization.

Table 1 shows the expected reduction rates in errors with respect to the mesh parameter, N, if we were to have standard estimates of approximation error in the \(\beta \)-norm on the Shishkin meshes considered here. Tables 2, 3 and 4 show the measured errors (relative to the manufactured solution) for the bilinear, biquadratic, and bicubic discretizations, respectively. Expected behaviour for the bilinear case is a reduction like \(N^{-1}\ln N\) for \(|||\mathcal {U}^*-\mathcal {U}^N|||_{\beta ,2}\) (where \(\mathcal {U}^*\) represents the manufactured solution, \(u^*\) and its gradient) and like \((N^{-1}\ln N)^2\) for the discrete maximum norm of the error, \(\Vert u^*-u^N\Vert _{\ell _{\infty }}\), which is measured at the nodes of the mesh corresponding to the finite-element degrees of freedom. These are both expected to be raised by one power in the biquadratic case, and a further one power for bicubics. In Tables 2, 3, and 4, we see convergence behaviour comparable to these rates, with the exception of the results for the discrete maximum norm in Table 3. These seem to show a superconvergence-type phenomenon, although we have no explanation for this observation at present.
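As a rough guide to reading such tables, the factor by which an error of size \((N^{-1}\ln N)^q\) is expected to shrink when \(N\) is doubled can be computed directly. The sketch below does this for a few values of \(N\) and \(q\); the choice of doubling \(N\), the values of \(N\), and the function name are assumptions for illustration, and the output is not a reproduction of Table 1.

```python
import math

def expected_reduction(N, q=1):
    """Ratio of (ln(N)/N)**q to (ln(2N)/(2N))**q, i.e. the factor by which
    an error behaving like (N^-1 ln N)^q shrinks when N is doubled."""
    return (2.0 * math.log(N) / math.log(2 * N)) ** q

for N in (32, 64, 128, 256):
    factors = ", ".join(f"q={q}: {expected_reduction(N, q):.3f}" for q in (1, 2, 3, 4))
    print(f"N={N:4d} -> {2 * N:4d}   {factors}")
```

Because of the logarithmic factor, these ratios are always below \(2^q\) and approach it only slowly as \(N\) grows, which should be kept in mind when comparing measured and expected rates.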

5 Conclusions

In this paper, we propose and analyse a new weighted-norm first-order system least-squares methodology tuned for singularly perturbed reaction-diffusion equations that lead to boundary layers. The analysis includes a standard ellipticity result for the FOSLS formulation in a weighted norm, and shows that this norm is suitably weighted to be considered a “balanced norm” for the problem. Numerical results confirm the effectiveness of the method. Future work includes completing the error analysis by proving the necessary interpolation error estimates with respect to \(||| \cdot |||_{\beta ,2}\), investigating the observed superconvergence properties, generalizing the theory to convection-diffusion equations, and investigating efficient linear solvers for the resulting discretizations.