1 Introduction

Singularly perturbed differential equations of reaction-diffusion type have been extensively studied, as described in [16, 21, 23]. When a standard Galerkin finite element method (FEM) is used to solve these problems, it is straightforward to carry out the usual “energy norm” analysis, but a serious weakness of this measure of the error is that when the singular perturbation parameter (i.e., the diffusion coefficient in the PDE) is very small, the energy norm of the error in the computed solution is essentially no stronger than the \(L^2\) norm of the error; that is, the \(H^1\) component of the energy norm is typically much smaller than its \(L^2\) component. This drawback is discussed at length in [14], where the authors proposed the replacement of the energy norm by a stronger balanced norm whose \(H^1\) component is scaled to the correct size, so that both the \(H^1\) and \(L^2\) components of the solution are O(1) when the singular perturbation parameter is small.

[14] also proposed a new bilinear form and FEM that were designed to facilitate analysis in their balanced norm. Subsequently, other authors derived new FEMs and new analyses that yielded convergence in various balanced norms; see in particular [1, 2, 5, 12, 18, 20, 22].

In the present paper, we consider a singularly perturbed reaction-diffusion problem that is posed on an arbitrary bounded domain \(\varOmega \subset \mathbb {R}^k\) for \(k\ge 1\) and present a new and simple way of constructing a FEM that is convergent in a weighted balanced norm. This new norm is stronger than the standard energy norm because of the weighting that it includes.

Our key idea is to modify the standard Galerkin FEM by introducing a weight function that was (essentially) already used in [2], but unlike [2] we do not rewrite the reaction-diffusion problem as a system of equations. Furthermore, the analysis of [2] is for a problem posed on the unit square in \(\mathbb {R}^2\), but the analysis in the current paper permits a far more general class of domains.

Our method is simpler than any other FEM that is designed to yield convergence in a balanced norm. It stands on a solid theoretical foundation: Theorem 1 shows that on an arbitrary bounded domain, for arbitrary meshes and an arbitrary conforming FEM space, the computed solution is quasioptimal with respect to our weighted norm. This norm can be shown to be balanced for problems where one has a typical decomposition of the unknown solution into smooth and layer components; see Remarks 3 and 4.

The structure of the paper is as follows. In Sect. 2 we state our reaction-diffusion problem and construct a weighted norm and associated bilinear form for which we derive various fundamental properties. These results are used to construct our weighted FEM in Sect. 3, and quasioptimality of the FEM solution is proved. Then in Sect. 4 we specialise this general theory to the particular case where the domain is the unit square in 2D and the FEM uses bilinears on a Shishkin mesh. Here a detailed error analysis leads to the optimal-order convergence result of Theorem 2, where we show that (up to a log factor) the weighted FEM attains first-order convergence in our weighted norm. Finally, Sect. 5 presents numerical results to show the performance of the weighted FEM.

2 The singularly perturbed reaction–diffusion problem

Let \(\varOmega\) be a bounded domain in \(\mathbb {R}^k\), where \(k\ge 1\). Write \({\bar{\varOmega }}\) for the closure of \(\varOmega\) and \(\partial \varOmega\) for its boundary. We shall discuss the elliptic boundary value problem

$$\begin{aligned} Lu:=-\varepsilon ^2\varDelta u + b u&= f\ \text { in } \varOmega , \end{aligned}$$
(1a)
$$\begin{aligned} u&= 0 \ \text { on } \partial \varOmega , \end{aligned}$$
(1b)

where the diffusion parameter satisfies \(0< \varepsilon \le 1\), but the challenging and interesting case is when \(\varepsilon \ll 1\); then (1) is a singularly perturbed reaction-diffusion problem. Assume that the reaction term satisfies \(b\in L_\infty (\varOmega )\) with \(b_0^2 < b(x) \le b_1^2\) for almost all \(x\in \varOmega\), where \(b_0, b_1\) are positive constants. This assumption is usual in singularly perturbed problems of this type—see [16, 21, 23].

It is well known that for each \(f\in L^2(\varOmega )\) the problem (1) has a unique solution \(u\in H^2(\varOmega )\cap H_0^1(\varOmega )\) if \(\varOmega\) is convex or if \(\partial \varOmega\) is smooth, but throughout Sects. 2 and 3 we assume only that \(\varOmega\) is bounded.

Notation. Let \(\omega\) be any measurable subset of \({\bar{\varOmega }}\) and let \(\partial \omega\) be its boundary. We use standard notation for the Sobolev spaces \(W^{m,p}(\omega )\) and their associated norms \(\Vert \cdot \Vert _{m,p,\omega }\) and seminorms \(|\cdot |_{m,p,\omega }\). If \(m=0\), we simply write \(\Vert \cdot \Vert _{p,\omega }\); while if \(p=2\) we set \(H^m(\omega ) =W^{m,2}(\omega )\) and \(\Vert \cdot \Vert _{m,\omega }= \Vert \cdot \Vert _{m,2,\omega }\). As usual, \(H^1_0(\omega )=\{v\in H^1(\omega ): v |_{\partial \omega }=0\}\), where \(v|_{\partial \omega }\) is the trace of v on \(\partial \omega\), and \(L^2(\omega ) = H^0(\omega )\).

Throughout this paper, the letter C (with or without subscripts) will denote a generic positive constant that may stand for different values in different places but is independent of the parameter \(\varepsilon\) and of the mesh diameter.

2.1 Weighted norm

To solve (1), we shall apply a FEM that uses standard trial and test functions on general meshes; its distinguishing feature is that it incorporates a special weight in the bilinear form and its associated norm, where the weight is chosen in such a way that the norm is balanced (i.e., the \(H^1(\varOmega )\) and \(L^2(\varOmega )\) weighted components of our norm are commensurable for typical solutions of (1)); see Remarks 3 and 4.

For each \(x\in \varOmega\), set

$$\begin{aligned} d(x) = \min \{|x-z|: z\in \partial \varOmega \}, \end{aligned}$$

where \(|x-z|\) denotes the Euclidean distance from x to z. That is, d(x) is the Euclidean distance from x to the boundary of \(\varOmega\).

For any \(y\in \varOmega\), choose \(z\in \partial \varOmega\) such that \(d(y) = |y-z|\). Then \(d(x) \le |x-z| \le |x-y| + d(y)\), by a triangle inequality. Similarly \(d(y) \le |y-x| + d(x)\). Hence

$$\begin{aligned} |d(x) - d(y) | \le |x-y| \ \text { for all } x,y\in \varOmega . \end{aligned}$$
(2)

(This proof of (2) comes from [10, p.354].) The inequality (2) says that \(d(\cdot )\) is uniformly Lipschitz continuous on \({\bar{\varOmega }}\). Then Rademacher’s Theorem [7, p.296] guarantees that the function \(d(\cdot )\) is differentiable almost everywhere in \(\varOmega\), and a fortiori (2) clearly implies that

$$\begin{aligned} |\nabla d(x)| \le 1 \ \text { almost everywhere}. \end{aligned}$$
(3)

Define our weight function \(\beta\) by

$$\begin{aligned} \beta (x) = 1 + \frac{1}{\varepsilon }\exp \left( -\frac{\gamma d(x)}{\varepsilon }\right) \text { for } x\in \varOmega , \end{aligned}$$
(4)

where the positive constant \(\gamma\) is (for the moment) arbitrary. Then

$$\begin{aligned} \nabla \beta (x) = -\frac{\gamma \nabla d(x)}{\varepsilon ^2}\exp \left( -\frac{\gamma d(x)}{\varepsilon }\right) \ \text { almost everywhere in } \varOmega . \end{aligned}$$

Recalling (3), we get

$$\begin{aligned} |\nabla \beta (x)| \le \frac{\gamma }{\varepsilon ^2}\exp \left( -\frac{\gamma d(x)}{\varepsilon }\right) \le \frac{\gamma }{\varepsilon }\beta (x) \ \text { almost everywhere in } \varOmega . \end{aligned}$$
(5)
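The bounds (2) and (5) can be checked numerically. The following is a minimal sketch, assuming purely for illustration that \(\varOmega = (0,1)^2\), so that d(x) is the minimum distance to the four sides; the function names and the sample values of \(\varepsilon\) and \(\gamma\) are hypothetical and are not taken from the paper.

```python
import math

def dist_to_boundary(x, y):
    """Distance d from (x, y) in the unit square to its boundary (assumed domain)."""
    return min(x, 1.0 - x, y, 1.0 - y)

def beta(x, y, eps, gamma):
    """Weight function (4): 1 + exp(-gamma*d/eps)/eps."""
    return 1.0 + math.exp(-gamma * dist_to_boundary(x, y) / eps) / eps

def grad_beta_norm(x, y, eps, gamma):
    """|grad beta| at points where d is differentiable (there |grad d| = 1)."""
    return gamma / eps**2 * math.exp(-gamma * dist_to_boundary(x, y) / eps)

def check_bounds(eps=1e-2, gamma=1.0, n=50):
    """Return True if (2) and (5) hold at the sample points of an n-by-n grid."""
    pts = [((i + 0.5) / n, (j + 0.5) / n) for i in range(n) for j in range(n)]
    ok = True
    for x, y in pts:
        # (5): |grad beta| <= (gamma/eps) * beta
        bound = gamma / eps * beta(x, y, eps, gamma)
        ok = ok and grad_beta_norm(x, y, eps, gamma) <= bound * (1.0 + 1e-12)
    for p, q in zip(pts, pts[1:]):
        # (2): |d(p) - d(q)| <= |p - q|, up to rounding
        gap = abs(dist_to_boundary(*p) - dist_to_boundary(*q))
        ok = ok and gap <= math.hypot(p[0] - q[0], p[1] - q[1]) + 1e-12
    return ok
```

On the boundary the weight equals \(1 + 1/\varepsilon\), while at the centre of the square it is indistinguishable from 1 for small \(\varepsilon\).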

Let \((\cdot , \cdot )\) denote the scalar and vector inner products in \(L^2(\varOmega )\). Write \(\Vert \cdot \Vert\) for the norm associated with this inner product. For each \(v\in L^2(\varOmega )\) and \(w\in H^1(\varOmega )\), define

$$\begin{aligned} \Vert v\Vert _\beta := (\beta v, v)^{1/2}\ \text { and } |||w|||_\beta := \left[ \varepsilon ^2 \Vert \nabla w\Vert _\beta ^2 + \Vert w\Vert _\beta ^2 \right] ^{1/2}. \end{aligned}$$

This norm is balanced (i.e., its components \(\varepsilon ^2 (\beta \nabla u, \nabla u)\) and \(\Vert u\Vert _\beta ^2\) are both O(1)) for a typical solution u of (1); see Remarks 3 and 4.

Remark 1

(Comparison with other balanced norms used to solve (1)) In [2] the balanced norm used comprises the above norm \(|||w|||_\beta\) with some extra terms added. The balanced norm used in [5, 18, 20, 22] is \(\Vert w\Vert _\varepsilon := \left[ \varepsilon \Vert \nabla w\Vert ^2 + \Vert w\Vert ^2 \right] ^{1/2}\). In [1, 12, 14] extra terms are added to \(\Vert w\Vert _\varepsilon\) to form a balanced norm. Of course, all these norms are stronger than the standard “energy norm” \(\left[ \varepsilon ^2 \Vert \nabla w\Vert ^2 + \Vert w\Vert ^2 \right] ^{1/2}\) that is often associated with (1).

Remark 2

(Motivation and applicability) The weight function \(\beta\) defined in (4) is a generalisation of the weight function used in [2] for a problem posed on the unit square in \(\mathbb {R}^2\). It is motivated by the desire to have the simplest weight with the following two properties: that \(\beta \approx 1\) away from layers so the method is like the standard Galerkin method, and that \(\beta \approx 1/\varepsilon\) inside layers so that the component \(\varepsilon ^2 \Vert \nabla w\Vert ^2\) in the standard energy norm is essentially replaced by \(\varepsilon \Vert \nabla w\Vert ^2\) in regions where we want this term to be O(1).

Our weight function \(\beta\) should work satisfactorily for any layer that has the same exponential decay nature as the boundary layers described in Lemma 3 below, provided that \(\beta\) is suitably modified: one replaces “distance to \(\partial \varOmega\)” by “distance to the layer location”. For example, the layers appearing in certain reaction-diffusion problems with interior layers are of this type; see [4, 8].

Remark 3

(Heuristic justification that \(|||\cdot |||_\beta\) is balanced) Given sufficient information about the data of (1), a standard technique due to Shishkin enables us to decompose its solution as \(u=v+w\), where v satisfies \(Lv=f\) on \(\varOmega\) with all low-order derivatives of v bounded independently of \(\varepsilon\), while \(Lw=0\) on \(\varOmega\) with \(w=-v\) on \(\partial \varOmega\), so w is the boundary layer component of u. This is done for example in [6], where \(\varOmega \subset \mathbb {R}^2\) is the unit square. Using the properties of v and w, one finds that typically \(|||v|||_\beta \sim \Vert v\Vert _\beta = O(1)\) and \(|||w|||_\beta \sim \varepsilon \Vert \nabla w\Vert _\beta = O(1)\), so each component of \(|||u|||_\beta\) is O(1), demonstrating that \(|||\cdot |||_\beta\) is balanced.
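As a rough numerical illustration of this heuristic, consider the one-dimensional model layer \(w(x) = e^{-x/\varepsilon }\) on (0, 1), for which \(\varepsilon ^2 (w')^2 = w^2\). The sketch below is an assumption-laden toy (one space dimension, \(d(x)=\min \{x,1-x\}\), composite midpoint quadrature, hypothetical function names); it compares the unweighted component \(\varepsilon ^2\Vert w'\Vert ^2\) with the weighted component \(\varepsilon ^2\Vert w'\Vert _\beta ^2\).

```python
import math

def midpoint(fn, a, b, n):
    """Composite midpoint rule for the integral of fn over [a, b]."""
    h = (b - a) / n
    return h * sum(fn(a + (i + 0.5) * h) for i in range(n))

def layer_norm_components(eps, gamma=1.0, n=20000):
    """Squared norm components of w(x) = exp(-x/eps) on (0, 1).

    Returns (energy, weighted), where
      energy   = eps^2 ||w'||^2      = ||w||^2,
      weighted = eps^2 ||w'||_beta^2 = ||w||_beta^2.
    """
    def beta(x):
        return 1.0 + math.exp(-gamma * min(x, 1.0 - x) / eps) / eps

    def w_squared(x):
        return math.exp(-2.0 * x / eps)

    cut = min(0.5, 40.0 * eps)  # integrate the layer region on its own fine grid

    def integrate(g):
        return midpoint(g, 0.0, cut, n) + midpoint(g, cut, 1.0, n)

    energy = integrate(w_squared)
    weighted = integrate(lambda x: beta(x) * w_squared(x))
    return energy, weighted
```

A direct calculation gives \(\varepsilon /2\) for the unweighted component and approximately \(\varepsilon /2 + 1/(2+\gamma )\) for the weighted one, so the former vanishes as \(\varepsilon \rightarrow 0\) while the latter remains O(1).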

2.2 Weighted bilinear form

For all \(v,w \in H_0^1(\varOmega )\), define the bilinear form

$$\begin{aligned} B_\beta (v,w) := (\varepsilon ^2 \nabla v, \nabla (\beta w)) + (bv, \beta w). \end{aligned}$$

We prove two fundamental properties of \(B_\beta (\cdot ,\cdot ~)\).

Lemma 1

(coercivity of \(B_\beta (\cdot ,\cdot )\)) Assume that \(0 < \gamma \le b_0\). Then for the weight function \(\beta\) specified in (4), one has

$$\begin{aligned} B_\beta (v,v) \ge C_1\, |||v|||_\beta ^2 \text { for all }\ v\in H_0^1(\varOmega ), \end{aligned}$$

where \(C_1 := \min \{1/2, b_0^2/2 \}\).

Proof

Let \(v\in H_0^1(\varOmega )\). Then

$$\begin{aligned} B_\beta (v,v)&= (\varepsilon ^2 \nabla v, \nabla (\beta v)) + (bv, \beta v) \nonumber \\&\ge \varepsilon ^2 (\nabla v, \beta \nabla v + v \nabla \beta ) + (b_0^2 v, \beta v) \nonumber \\&= \varepsilon ^2 \Vert \nabla v\Vert _\beta ^2 + (\varepsilon ^2 \nabla v, v\nabla \beta ) + b_0^2 \Vert v\Vert _\beta ^2. \end{aligned}$$
(6)

Recalling (5), we have

$$\begin{aligned} \left| (\varepsilon ^2 \nabla v, v\nabla \beta )\right|&\le \left( \varepsilon |\nabla v|, |v|\gamma \beta \right) \\&\le \varepsilon \Vert \nabla v\Vert _\beta \, \gamma \Vert v\Vert _\beta \\&\le \frac{\varepsilon ^2}{2} \Vert \nabla v\Vert _\beta ^2 + \frac{\gamma ^2}{2} \Vert v\Vert _\beta ^2, \end{aligned}$$

by the Cauchy-Schwarz and Young inequalities. Inserting this bound into (6), one obtains

$$\begin{aligned} B_\beta (v,v) \ge \frac{\varepsilon ^2}{2} \Vert \nabla v\Vert _\beta ^2 + \left( b_0^2 - \frac{\gamma ^2}{2}\right) \Vert v\Vert _\beta ^2. \end{aligned}$$

The lemma follows, since \(\gamma \le b_0\) implies \(b_0^2 - \gamma ^2/2 \ge b_0^2/2\). \(\square\)

Lemma 2

(boundedness of \(B_\beta (\cdot ,\cdot )\)) For the weight function \(\beta\) specified in (4), one has

$$\begin{aligned} |B_\beta (v,w)| \le C_2 |||v|||_\beta \, |||w|||_\beta \text { for all }\ v,w\in H_0^1(\varOmega ), \end{aligned}$$

where \(C_2 := \max \{1, \gamma , b_1^2 \}\).

Proof

Let \(v,w\in H_0^1(\varOmega )\). Then

$$\begin{aligned} B_\beta (v,w) = (\varepsilon ^2 \nabla v, \nabla (\beta w)) + (bv, \beta w) = (\varepsilon ^2 \beta \nabla v, \nabla w) + (\varepsilon ^2 \nabla v, w\nabla \beta ) + (bv, \beta w). \end{aligned}$$
(7)

Now (5) yields

$$\begin{aligned} |(\varepsilon ^2 \nabla v, w\nabla \beta )| \le \gamma \varepsilon ( | \nabla v|, |w|\beta ) \le \gamma \varepsilon \Vert \nabla v\Vert _\beta \Vert w\Vert _\beta \end{aligned}$$

by a Cauchy-Schwarz inequality. Applying two more Cauchy-Schwarz inequalities to (7), we get

$$\begin{aligned} \left| B_\beta (v,w) \right| \le \varepsilon ^2 \Vert \nabla v\Vert _\beta \Vert \nabla w\Vert _\beta + \gamma \varepsilon \Vert \nabla v\Vert _\beta \Vert w\Vert _\beta + b_1^2 \Vert v\Vert _\beta \Vert w\Vert _\beta . \end{aligned}$$

The desired result follows. \(\square\)

From now on, we assume that \(f\in H^{-1}(\varOmega ) := \left( H_0^1(\varOmega )\right) '\). One can then define a weak solution \(u\in H_0^1(\varOmega )\) of (1) by requiring it to satisfy

$$\begin{aligned} B_\beta (u,v) = (f, \beta v) \text { for all }\ v\in H_0^1(\varOmega ), \end{aligned}$$
(8)

where \(\beta\) is defined in (4) and the constant \(\gamma\) satisfies \(0 < \gamma \le b_0\). By the Lax-Milgram theorem, invoking Lemmas 1 and 2, the problem (8) has a unique solution \(u\in H_0^1(\varOmega )\). Any strong solution \(u\in H^{2}(\varOmega )\cap H_0^1(\varOmega )\) of the original problem (1) is also a weak solution of (8), as can be seen by multiplying (1a) by \(\beta v\) then integrating by parts.

3 The weighted finite element method

Let \(\varOmega _h\) be an arbitrary mesh on \({\bar{\varOmega }}\). Let \(V_h \subset H_0^1(\varOmega )\) be a conforming finite element space defined on this mesh. Suppose that \(0 < \gamma \le b_0\). From the Lax-Milgram theorem and Lemmas 1 and 2, there is a unique \(u_h\in V_h\) such that

$$\begin{aligned} B_\beta (u_h,v_h) = (f, \beta v_h) \text { for all }\ v_h\in V_h. \end{aligned}$$
(9)

Combining (8) and (9) gives the Galerkin orthogonality property

$$\begin{aligned} B_\beta (u-u_h,v_h) = 0\ \text { for all }\ v_h\in V_h. \end{aligned}$$
(10)

Theorem 1

(quasioptimal FEM error bound) Let \(\varOmega\) be a bounded subset of \(\mathbb {R}^k\) for some \(k\ge 1\). Let \(\varOmega _h\) be an arbitrary mesh on \({\bar{\varOmega }}\), and \(V_h \subset H_0^1(\varOmega )\) a finite element space defined on this mesh. Choose \(\gamma \in \mathbb {R}\) to satisfy \(0 < \gamma \le b_0\). Let u be the weak solution of (8) and \(u_h\) the solution of (9). Then one has

$$\begin{aligned} |||u-u_h|||_\beta \le \frac{C_2}{C_1}\, \inf _{w_h\in V_h} |||u-w_h|||_\beta . \end{aligned}$$

Proof

Let \(w_h\in V_h\) be arbitrary. Invoking Lemma 1, then equation (10), then Lemma 2, we get

$$\begin{aligned} C_1 |||u-u_h|||_\beta ^2&\le B_\beta (u-u_h, u-u_h) \\&= B_\beta (u-u_h, u-w_h) \\&\le C_2 |||u-u_h|||_\beta |||u-w_h|||_\beta . \end{aligned}$$

Hence \(|||u-u_h|||_\beta \le (C_2/C_1)|||u-w_h|||_\beta\). As \(w_h\in V_h\) was arbitrary, we are done. \(\square\)

The analysis up to this point is for a general bounded domain, a general mesh, and a general conforming finite element space \(V_h\). This generality cannot however yield a bound in Theorem 1 that will ensure that \(|||u-u_h|||_\beta\) is small; to get this desirable outcome, one must tailor \(V_h\) or the mesh to the singularly perturbed nature of the problem. In Sect. 4 we show how this is done for a specific domain by a suitable choice of mesh.
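To make the discrete problem (9) concrete, here is a minimal one-dimensional sketch. It is not the implementation used in this paper: it assumes the 1D analogue \(-\varepsilon ^2u''+bu=f\) on (0, 1) with \(d(x)=\min \{x,1-x\}\), piecewise linear elements on a user-supplied node list, two-point Gauss quadrature on each element, and the Thomas algorithm for the resulting nonsymmetric tridiagonal system. The name `weighted_fem_1d` and its parameters are hypothetical.

```python
import math

def weighted_fem_1d(nodes, eps, gamma=1.0, b=lambda x: 1.0, f=lambda x: 1.0):
    """Nodal values of u_h with B_beta(u_h, v_h) = (f, beta v_h), P1 elements on (0, 1)."""
    def d(x):
        return min(x, 1.0 - x)

    def beta(x):
        return 1.0 + math.exp(-gamma * d(x) / eps) / eps

    def dbeta(x):
        sign = 1.0 if x < 0.5 else -1.0  # d'(x) away from the kink at x = 1/2
        return -gamma * sign / eps**2 * math.exp(-gamma * d(x) / eps)

    n = len(nodes) - 1
    lower = [0.0] * (n + 1)  # coefficient of u_{i-1} in row i
    diag = [0.0] * (n + 1)
    upper = [0.0] * (n + 1)  # coefficient of u_{i+1} in row i
    rhs = [0.0] * (n + 1)
    g = 0.5 / math.sqrt(3.0)  # offset of the two Gauss points, in units of h
    for e in range(n):
        xl, xr = nodes[e], nodes[e + 1]
        h = xr - xl
        mid = 0.5 * (xl + xr)
        for xq in (mid - g * h, mid + g * h):
            wq = 0.5 * h
            phi = ((xr - xq) / h, (xq - xl) / h)
            dphi = (-1.0 / h, 1.0 / h)
            for i in range(2):  # test function v = phi_i
                row = e + i
                dtest = beta(xq) * dphi[i] + dbeta(xq) * phi[i]  # (beta * phi_i)'
                rhs[row] += wq * f(xq) * beta(xq) * phi[i]
                for j in range(2):  # trial function u = phi_j
                    val = wq * (eps**2 * dphi[j] * dtest
                                + b(xq) * phi[j] * beta(xq) * phi[i])
                    if j == i:
                        diag[row] += val
                    elif j < i:
                        lower[row] += val
                    else:
                        upper[row] += val
    # Thomas algorithm on the interior rows 1..n-1; boundary values are zero.
    for i in range(2, n):
        m = lower[i] / diag[i - 1]
        diag[i] -= m * upper[i - 1]
        rhs[i] -= m * rhs[i - 1]
    u = [0.0] * (n + 1)
    u[n - 1] = rhs[n - 1] / diag[n - 1]
    for i in range(n - 2, 0, -1):
        u[i] = (rhs[i] - upper[i] * u[i + 1]) / diag[i]
    return u
```

For \(b=f=1\) the exact solution of the 1D problem is \(u(x) = 1 - (e^{-x/\varepsilon } + e^{-(1-x)/\varepsilon })/(1+e^{-1/\varepsilon })\), which gives a simple way to sanity-check the sketch on a uniform mesh with moderate \(\varepsilon\). Placing a node at \(x=1/2\) keeps the kink of d on an element boundary.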

4 Reaction-diffusion problem on the unit square in \(\mathbb {R}^2\)

We now specialise the theory of Sect. 2 to a 2D reaction-diffusion problem that has been considered in many balanced-norm papers, including [1, 12, 14].

During Sect. 4 we take \(\varOmega = (0,1)^2\), the unit square in \(\mathbb {R}^2\). Assume that \(f,~b \in C^{0,\alpha }(\bar{\varOmega })\), where we use the standard notation for Hölder spaces. Assume the corner compatibility conditions

$$\begin{aligned} f(0,0)= f(1,0) = f(0,1)=f(1,1) = 0. \end{aligned}$$

Then (1) has a unique solution \(u\in C^{2,\alpha }(\bar{\varOmega })\); see, e.g., [11]. Furthermore, this solution typically has an exponential boundary layer in a neighbourhood of \(\partial \varOmega\) of width \(O(\varepsilon |\ln \varepsilon |)\); see Lemma 3 below for more details.

The 4 sides of \(\varOmega\) will be denoted by

$$\begin{aligned}&\varGamma _1:=\{(x,0)|0\le x \le 1\},\quad \varGamma _2:=\{(0,y)|0\le y\le 1\},\\&\varGamma _3:=\{(x,1)|0\le x \le 1\},\quad \varGamma _4:=\{(1,y)|0\le y\le 1\}. \end{aligned}$$

The corners of this domain are \(c_1:= (0,0), c_2:= (1,0), c_3:= (1,1), c_4:= (0,1)\).

In [6, Theorem 2.2] and [17, Lemma 1.2] the solution u of (1) with \(\varOmega = (0,1)^2\) is decomposed into smooth and layer components, and bounds are derived on certain derivatives of these components. The analysis in these papers makes stronger assumptions than ours, but an inspection of their arguments shows that, under our assumptions on the data, one obtains the following result, where for brevity the derivative \(\partial ^{m+n}g/\partial x^m \partial y^n\) of any function g is written as \(D_x^mD_y^n g\).

Lemma 3

The solution u of (1) can be decomposed as

$$\begin{aligned} u = v + \sum _{k=1}^4 w_k + \sum _{k=1}^4 z_k, \end{aligned}$$
(11)

where each \(w_k\) is a layer associated with the edge \(\varGamma _k\) and each \(z_k\) is a layer associated with the corner \(c_k\). There exists a constant C such that for all \((x,y)\in {\bar{\varOmega }}\) and \(0 \le m+n \le 2\) one has

$$\begin{aligned} \Vert D_x^m D_y^n v\Vert _{0,\infty ,{\bar{\varOmega }}}&\le C, \end{aligned}$$
(12a)
$$\begin{aligned} \left| D_x^mD_y^n w_1(x,y)\right|&\le C\varepsilon ^{-n} e^{-b_0 y/\varepsilon }, \end{aligned}$$
(12b)
$$\begin{aligned} \left| D_x^mD_y^n z_1(x,y) \right|&\le C \varepsilon ^{-m-n} e^{-b_0(x+y)/\varepsilon }. \end{aligned}$$
(12c)

Bounds for \(w_2, w_3\) and \(w_4\) that are analogous to (12b) and bounds for \(z_2, z_3\) and \(z_4\) that are analogous to (12c) also hold.

Remark 4

For a solution u that enjoys the properties described in Lemma 3, one can verify that for any constant \(\gamma >0\), one has \(\varepsilon \Vert \nabla u\Vert _\beta = O(1)\) and \(\Vert u\Vert _\beta = O(1)\). That is, the norm \(|||\cdot |||_\beta\) is balanced for this problem.

4.1 The Shishkin mesh

To solve the problem numerically, we shall use a piecewise-uniform Shishkin mesh. These meshes are a popular tool in the numerical solution of problems such as (1). For an introduction to their properties and usage, see [16, 21, 23].

Let N be a positive integer that is divisible by 4. The mesh will use N mesh intervals in each coordinate direction. The mesh transition parameter \(\uplambda\) will specify where the mesh changes from coarse to fine; we define it by

$$\begin{aligned} \uplambda = \min \left\{ \frac{1}{4},\ \frac{\sigma \varepsilon \ln N}{b_0} \right\} . \end{aligned}$$
(13)

In this formula, the constant \(\sigma\) will be chosen later to facilitate our numerical analysis and is then used in the implementation of the finite element method.

Without loss of generality, one can assume that \(\varepsilon\) is so small that (13) simplifies to

$$\begin{aligned} \uplambda = \sigma \varepsilon b_0^{-1} \ln N. \end{aligned}$$
(14)

Partition \(\varOmega\) as follows (see Fig. 1): \({\bar{\varOmega }}=\varOmega _{11}\cup \varOmega _{21}\cup \varOmega _{12}\cup \varOmega _{22}\), where

$$\begin{aligned} \varOmega _{11}&= [\uplambda ,1-\uplambda ] \times [\uplambda , 1-\uplambda ], \qquad \varOmega _{21}= ([0,\uplambda ]\cup [1-\uplambda ,1]) \times [\uplambda ,1-\uplambda ],\\ \varOmega _{12}&= [\uplambda ,1-\uplambda ] \times ([0,\uplambda ]\cup [1-\uplambda ,1]),\\ \varOmega _{22}&= ([0,\uplambda ]\times ([0,\uplambda ]\cup [1-\uplambda ,1])) \cup ([1-\uplambda ,1] \times ([0,\uplambda ]\cup [1-\uplambda ,1])). \end{aligned}$$

Divide each of the x-intervals \([0,\uplambda ]\) and \([1-\uplambda , 1]\) into N/4 equidistant subintervals and divide \([\uplambda , 1-\uplambda ]\) into N/2 equidistant subintervals. This gives a coarse mesh on \([\uplambda ,1-\uplambda ]\) and a fine mesh on \([0,\uplambda ]\cup [1-\uplambda , 1]\). Divide the y-interval [0,1] in the same way. Then the final 2-dimensional mesh is a tensor product of these 1-dimensional Shishkin meshes; see Fig. 1, where \(N=8\).

Fig. 1. Shishkin mesh for reaction-diffusion

An explicit description of the mesh follows: one has \(0=x_0< x_1< \dots < x_N =1\) and \(0=y_0< y_1< \dots < y_N =1\), with mesh sizes \(h_i := x_i - x_{i-1}\) and \(k_j := y_j - y_{j-1}\) that are defined by

$$\begin{aligned} h_i = \left\{ \begin{array}{ll} h := \displaystyle {4\uplambda N^{-1}} &{} \text{ for } \ i=1,\dots ,N/4 \text { and }\ i= 3N/4+1,\dots ,N, \\ H := \displaystyle {2(1 -2\uplambda )N^{-1}} &{} \text{ for } i=N/4+1,\dots ,3N/4, \end{array} \right. \end{aligned}$$
(15a)

and

$$\begin{aligned} k_j = \left\{ \begin{array}{ll} h &{} \text{ for } \ j=1,\dots ,N/4 \text { and }\ j= 3N/4+1,\dots ,N, \\ H &{} \text{ for } \ j=N/4+1,\dots ,3N/4. \end{array} \right. \end{aligned}$$
(15b)

The mesh divides \(\varOmega\) into a set \(T^{N,N}\) of mesh rectangles R whose sides are parallel to the axes—see Fig. 1. The mesh is coarse on \(\varOmega _{11}\), coarse/fine on \(\varOmega _{21}\cup \varOmega _{12}\), and fine on \(\varOmega _{22}\). The mesh is quasiuniform on \(\varOmega _{11}\) and its diameter d there satisfies \(\sqrt{2}/N \le d \le 2\sqrt{2}/N\); on \(\varOmega _{12}\cup \varOmega _{21}\), each mesh rectangle has dimensions \(O(N^{-1})\) by \(O(\varepsilon N^{-1} \ln N)\); and on \(\varOmega _{22}\) each rectangle is \(O(\varepsilon N^{-1} \ln N)\) by \(O(\varepsilon N^{-1} \ln N)\). These dimensions will be used in the error analysis of our finite element method.
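The one-dimensional mesh described by (13) and (15) can be generated as in the following sketch. The function name `shishkin_nodes` and its default arguments are hypothetical; the two-dimensional mesh is then the tensor product of two such node lists.

```python
import math

def shishkin_nodes(N, eps, sigma=1.0, b0=1.0):
    """Return (nodes, lam): the nodes x_0 < ... < x_N of the 1D Shishkin mesh
    and the transition parameter lam from (13)."""
    if N < 4 or N % 4 != 0:
        raise ValueError("N must be a positive multiple of 4")
    lam = min(0.25, sigma * eps * math.log(N) / b0)  # transition point (13)
    h = 4.0 * lam / N                                 # fine mesh size in (15a)
    H = 2.0 * (1.0 - 2.0 * lam) / N                   # coarse mesh size in (15a)
    q = N // 4
    left = [i * h for i in range(q + 1)]                   # x_0, ..., x_{N/4}
    middle = [lam + i * H for i in range(1, 2 * q + 1)]    # x_{N/4+1}, ..., x_{3N/4}
    right = [1.0 - (q - i) * h for i in range(1, q + 1)]   # x_{3N/4+1}, ..., x_N
    return left + middle + right, lam
```

For example, `shishkin_nodes(8, 1e-3)` produces the nine nodes of the mesh in each coordinate direction for \(N=8\), with two fine intervals next to each endpoint and four coarse intervals in between.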

4.2 Bilinear FEM on Shishkin mesh

We assume henceforth that the user-chosen constant \(\gamma\) satisfies \(0 < \gamma \le b_0\), so Theorem 1 is valid.

For our finite element method, choose the finite element space \(V_h\subset C({\bar{\varOmega }})\cap H_0^1(\varOmega )\) to comprise piecewise bilinears on the Shishkin mesh of Sect. 4.1. Given any function \(g\in C({\bar{\varOmega }})\cap H_0^1(\varOmega )\), we write \(g^I\) for the nodal interpolant of g from \(V_h\).

The following interpolation error bound is suitable for the highly anisotropic Shishkin mesh.

Lemma 4

[3, Theorem 2.7] Let R be any Shishkin mesh rectangle with dimensions \(h_x \times k_y\). Let \(\phi \in H^2(R)\). Then its bilinear nodal interpolant \(\phi ^I\) satisfies the bounds

$$\begin{aligned} \Vert \phi -\phi ^I\Vert _{\infty ,R}&\le C \left( h_x^2\Vert \phi _{xx}\Vert _{\infty ,R} +k_y^2\Vert \phi _{yy}\Vert _{\infty ,R} \right) ,\\ \Vert (\phi -\phi ^I)_x\Vert _{\infty ,R}&\le C \left( h_x\Vert \phi _{xx}\Vert _{\infty ,R} +k_y\Vert \phi _{xy}\Vert _{\infty ,R} \right) ,\\ \Vert (\phi -\phi ^I)_y\Vert _{\infty ,R}&\le C \left( h_x\Vert \phi _{xy}\Vert _{\infty ,R} +k_y\Vert \phi _{yy}\Vert _{\infty ,R} \right) , \end{aligned}$$

where the constant C is independent of \(\phi , h_x\) and \(k_y\).

Theorem 1 gives us

$$\begin{aligned} |||u-u_h|||_\beta \le \frac{C_2}{C_1} |||u-u^I |||_\beta . \end{aligned}$$
(16)

We shall bound the right-hand side of (16) by using the decomposition of u from Lemma 3, comparing each component there with its interpolant from \(V_h\).

Lemma 5

For the component v in Lemma 3, one has

$$\begin{aligned} ||| v - v^I |||_\beta \le C\left( \varepsilon N^{-1} + N^{-2}\right) . \end{aligned}$$

Proof

Lemmas 3 and 4 yield \(\Vert v-v^I \Vert _{\infty ,\varOmega } \le CN^{-2} |v|_{2,\infty ,\varOmega } \le CN^{-2}\). Hence

$$\begin{aligned} \Vert v-v^I \Vert _\beta = \left[ \int _\varOmega \beta (v-v^I)^2 \right] ^{1/2} \le CN^{-2} \left[ \int _\varOmega \beta \right] ^{1/2} \le CN^{-2}. \end{aligned}$$

By a similar argument, again using Lemmas 3 and 4, one has \(\Vert \nabla (v-v^I) \Vert _{\infty ,\varOmega } \le CN^{-1} |v|_{2,\infty ,\varOmega } \le CN^{-1}\) and hence \(\Vert \nabla (v-v^I) \Vert _\beta \le CN^{-1}\). To finish the proof, recall the definition of \(||| \cdot |||_\beta\). \(\square\)

For any measurable \(\omega \subset \varOmega\), and each \(v\in L^2(\omega )\) and \(w\in H^1(\omega )\), define

$$\begin{aligned} \Vert v\Vert _{\beta ,\omega } := \left[ \int _\omega \beta v^2 \right] ^{1/2}\ \text { and } \ |||w|||_{\beta ,\omega } := \left[ \left( \varepsilon ^2 \int _\omega \beta |\nabla w|^2\right) + \Vert w\Vert _{\beta ,\omega }^2 \right] ^{1/2}. \end{aligned}$$

Lemma 6

For the components \(w_1, w_2, w_3, w_4\) of Lemma 3, one has

$$\begin{aligned} ||| w_j - w_j^I |||_\beta \le C\left( N^{-1}\ln N + N^{-\sigma }\right) \text { for } \ j=1,2,3,4. \end{aligned}$$

Proof

We give the proof only for \(w_1\), as the other \(w_j\) are similar.

Set \(\varOmega _y = \{ (x,y)\in \varOmega : y\ge \uplambda \}\). By Lemma 3, one has

$$\begin{aligned} \left| D_x^mD_y^n w_1(x,y)\right| \le C \varepsilon ^{-n} N^{-\sigma } \quad \text {for } 0\le m+n \le 2 \text { and } (x,y)\in \varOmega _y. \end{aligned}$$

Hence,

$$\begin{aligned} \Vert w_1-w_1^I \Vert _{\infty ,\varOmega _y}&\le \Vert w_1 \Vert _{\infty ,\varOmega _y} + \Vert w_1^I \Vert _{\infty ,\varOmega _y} \le 2\Vert w_1 \Vert _{\infty ,\varOmega _y} \le C N^{-\sigma } \\&\text {and} \\ \Vert \nabla (w_1-w_1^I) \Vert _{\infty ,\varOmega _y}&\le \Vert \nabla w_1 \Vert _{\infty ,\varOmega _y} + \Vert \nabla w_1^I \Vert _{\infty ,\varOmega _y} \le 2\Vert \nabla w_1 \Vert _{\infty ,\varOmega _y} \le C \varepsilon ^{-1} N^{-\sigma }. \end{aligned}$$

It now follows, as in the proof of Lemma 5, that

$$\begin{aligned} ||| w_1 - w_1^I |||_{\beta , \varOmega _y} \le CN^{-\sigma }. \end{aligned}$$
(17)

On \(\varOmega \setminus \varOmega _y\) invoke Lemma 4 and (15) to get

$$\begin{aligned} \Vert w_1-w_1^I \Vert _{\infty ,\varOmega \setminus \varOmega _y}&\le C \left[ H^2 \Vert (w_1)_{xx}\Vert _{\infty , \varOmega \setminus \varOmega _y} + h^2 \Vert (w_1)_{yy}\Vert _{\infty , \varOmega \setminus \varOmega _y} \right] \\&\le C \left[ N^{-2} + (\varepsilon N^{-1}\ln N)^2 \varepsilon ^{-2} \right] \\&= C(N^{-1}\ln N)^2\\&\text {and}\\ \Vert \nabla (w_1-w_1^I) \Vert _{\infty ,\varOmega \setminus \varOmega _y}&\le C \left[ H \Vert (w_1)_{xx}\Vert _{\infty , \varOmega \setminus \varOmega _y} + (h+H) \Vert (w_1)_{xy}\Vert _{\infty , \varOmega \setminus \varOmega _y} \right. \\&\quad \left. + h \Vert (w_1)_{yy}\Vert _{\infty , \varOmega \setminus \varOmega _y} \right] \\&\le C \left[ N^{-1} (1 + \varepsilon ^{-1}) + (\varepsilon N^{-1}\ln N) (\varepsilon ^{-1}+\varepsilon ^{-2} )\right] \\&= C \varepsilon ^{-1} (N^{-1}\ln N), \end{aligned}$$

where we used Lemma 3 to bound the derivatives of \(w_1\). From these estimates it follows, as in the proof of Lemma 5, that

$$\begin{aligned} ||| w_1 - w_1^I |||_{\beta , \varOmega \setminus \varOmega _y} \le CN^{-1}\ln N. \end{aligned}$$
(18)

Putting together (17) and (18) completes the proof. \(\square\)
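The \(\varepsilon\)-independence of the \((N^{-1}\ln N)^2\) interpolation bound on the fine mesh can be observed in a one-dimensional analogue. The sketch below is illustrative only: it assumes the model layer \(e^{-b_0 y/\varepsilon }\), piecewise linear interpolation on the fine part \([0,\uplambda ]\) of the mesh with \(\uplambda\) given by (14), and sampling to estimate the maximum error; the function names are hypothetical.

```python
import math

def max_interp_error(fn, a, b, samples=200):
    """Estimate max |fn - linear interpolant of fn| on [a, b] by sampling."""
    fa, fb = fn(a), fn(b)
    worst = 0.0
    for k in range(1, samples):
        t = k / samples
        x = a + t * (b - a)
        worst = max(worst, abs(fn(x) - ((1.0 - t) * fa + t * fb)))
    return worst

def layer_interp_check(N, eps, sigma=1.0, b0=1.0):
    """Return (err, scale): the interpolation error of exp(-b0*y/eps) on the
    fine mesh in [0, lam], and the reference quantity (N^{-1} ln N)^2."""
    lam = sigma * eps * math.log(N) / b0  # (14); assumes this is at most 1/4
    h = 4.0 * lam / N                      # fine mesh size from (15)

    def layer(y):
        return math.exp(-b0 * y / eps)

    err = max(max_interp_error(layer, j * h, (j + 1) * h) for j in range(N // 4))
    return err, (math.log(N) / N) ** 2
```

By the classical bound \(\Vert \phi -\phi ^I\Vert _\infty \le (h^2/8)\Vert \phi ''\Vert _\infty\) for linear interpolation, the returned error is at most \(2\sigma ^2\) times the reference quantity, and it depends on \(\varepsilon\) only through rounding because \(h/\varepsilon\) is independent of \(\varepsilon\).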

Next we prove the corresponding result for the components \(z_j\) of u; the proof resembles the proof of Lemma 6 but there are some differences.

Lemma 7

For the components \(z_1, z_2, z_3, z_4\) in Lemma 3, one has

$$\begin{aligned} ||| z_j - z_j^I |||_\beta \le C\left( N^{-1}\ln N + N^{-\sigma }\right) \text { for }\ j=1,2,3,4. \end{aligned}$$

Proof

We give the proof only for \(z_1\), as the other \(z_j\) are similar.

Set \(\varOmega _{xy} = \{ (x,y)\in \varOmega : x\ge \uplambda \text { or } y\ge \uplambda \}\). By Lemma 3, one has

$$\begin{aligned} \left| D_x^mD_y^n z_1(x,y)\right| \le C \varepsilon ^{-m-n} N^{-\sigma } \quad \text {for } 0\le m+n \le 2 \text { and } (x,y)\in \varOmega _{xy}. \end{aligned}$$

Hence,

$$\begin{aligned} \Vert z_1-z_1^I \Vert _{\infty ,\varOmega _{xy}}&\le \Vert z_1 \Vert _{\infty ,\varOmega _{xy}} + \Vert z_1^I \Vert _{\infty ,\varOmega _{xy}} \le 2\Vert z_1 \Vert _{\infty ,\varOmega _{xy}} \le C N^{-\sigma } \\&\text {and} \\ \Vert \nabla (z_1-z_1^I) \Vert _{\infty ,\varOmega _{xy}}&\le \Vert \nabla z_1 \Vert _{\infty ,\varOmega _{xy}} + \Vert \nabla z_1^I \Vert _{\infty ,\varOmega _{xy}} \le 2\Vert \nabla z_1 \Vert _{\infty ,\varOmega _{xy}} \le C \varepsilon ^{-1} N^{-\sigma }. \end{aligned}$$

It now follows, as in the proof of Lemma 5, that

$$\begin{aligned} ||| z_1 - z_1^I |||_{\beta , \varOmega _{xy}} \le CN^{-\sigma }. \end{aligned}$$
(19)

On \(\varOmega \setminus \varOmega _{xy}\) invoke Lemma 4 and (15) to get

$$\begin{aligned} \Vert z_1-z_1^I \Vert _{\infty ,\varOmega \setminus \varOmega _{xy}}&\le C \left[ h^2 \Vert (z_1)_{xx}\Vert _{\infty , \varOmega \setminus \varOmega _{xy}} + h^2 \Vert (z_1)_{yy}\Vert _{\infty , \varOmega \setminus \varOmega _{xy}} \right] \\&\le C (\varepsilon N^{-1}\ln N)^2 \varepsilon ^{-2} \\&= C (N^{-1}\ln N)^2\\&\text {and}\\ \Vert \nabla (z_1-z_1^I) \Vert _{\infty ,\varOmega \setminus \varOmega _{xy}}&\le C \left[ h \Vert (z_1)_{xx}\Vert _{\infty , \varOmega \setminus \varOmega _{xy}} + h \Vert (z_1)_{xy}\Vert _{\infty , \varOmega \setminus \varOmega _{xy}} \right. \\&\quad \left. + h \Vert (z_1)_{yy}\Vert _{\infty , \varOmega \setminus \varOmega _{xy}} \right] \\&\le C (\varepsilon N^{-1}\ln N) (1+ \varepsilon ^{-1}+\varepsilon ^{-2}) \\&= C \varepsilon ^{-1} (N^{-1}\ln N), \end{aligned}$$

where we used Lemma 3 to bound the derivatives of \(z_1\). From these estimates it follows, as in the proof of Lemma 5, that

$$\begin{aligned} ||| z_1 - z_1^I |||_{\beta , \varOmega \setminus \varOmega _{xy}} \le CN^{-1}\ln N. \end{aligned}$$
(20)

Combine (19) and (20) to finish the argument. \(\square\)

We can now give a precise error estimate for our finite element method.

Theorem 2

Let u be the solution of (1) and \(u_h\) the solution of (9), where the finite element method uses piecewise bilinears on the Shishkin mesh. Then there exists a constant C such that

$$\begin{aligned} ||| u-u_h |||_\beta \le C\left( N^{-1}\ln N + N^{-\sigma }\right) . \end{aligned}$$

With the choice \(\sigma =1\) in the definition (14) of the Shishkin mesh transition parameter, this bound becomes

$$\begin{aligned} ||| u-u_h |||_\beta \le C N^{-1}\ln N. \end{aligned}$$

Proof

Lemmas 5, 6 and 7 and the decomposition of Lemma 3 yield

$$\begin{aligned} ||| u-u^I |||_\beta \le C\left( N^{-1}\ln N + N^{-\sigma }\right) . \end{aligned}$$

The result then follows from (16). \(\square\)
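When computed errors are compared with the bound of Theorem 2, the logarithmic factor lowers the convergence rate that one should expect to observe between N and 2N. The following sketch (hypothetical helper names, constant C set to 1, \(\sigma =1\)) computes the rate that an error behaving exactly like \(N^{-1}\ln N\) would exhibit; it uses no computed data.

```python
import math

def bound(N):
    """The quantity N^{-1} ln N from Theorem 2 (with C = 1 and sigma = 1)."""
    return math.log(N) / N

def predicted_rate(N):
    """log2(bound(N) / bound(2N)): the rate that an error exactly
    proportional to N^{-1} ln N would show when N is doubled."""
    return math.log2(bound(N) / bound(2 * N))

# Rates predicted by the bound for a typical sequence of mesh sizes.
rates = [(N, predicted_rate(N)) for N in (16, 32, 64, 128, 256, 512)]
```

Since `predicted_rate(N)` equals \(1 - \log _2(1 + \ln 2/\ln N)\), these rates lie below 1 and increase slowly towards 1 as N grows.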

Remark 5

One could instead use a FEM space that comprises polynomials of higher degree and carry out a similar error analysis to obtain a higher order of convergence in Theorem 2, but this necessitates imposing more conditions on the data of (1) so that more derivatives of u are bounded in Lemma 3, as in [6, 17].

Remark 6

[9] extend the reaction-diffusion analysis of [20] to the convection-diffusion problem

$$\begin{aligned} -\varepsilon \varDelta v + av_x + bv = f \ \text { on } \varOmega = (0,1)^2, \quad v=0 \ \text { on } \partial \varOmega , \end{aligned}$$
(21)

where the functions a and b are positive. (Here, as is customary, we have replaced the \(\varepsilon ^2\) coefficient of (1a) by \(\varepsilon\) since this is a convection–diffusion problem.) Typical solutions of this problem exhibit an exponential layer along the side \(x=1\) of \(\varOmega\), and parabolic layers along the sides \(y=0\) and \(y=1\). The associated energy norm \(\left[ \varepsilon |\nabla v|^2 + \Vert v\Vert _0^2 \right] ^{1/2}\) is correctly balanced for the exponential layer, but is unbalanced for the weaker parabolic layers, for which it reduces essentially to the \(L^2(\varOmega )\) norm when \(\varepsilon \ll 1\). In [9] an FEM comprising streamline-diffusion (SDFEM) in the x-direction and standard Galerkin in the y-direction is used and is shown to converge in the balanced norm \(\left[ \varepsilon |v_x|^2 + \varepsilon ^{1/2} |v_y|^2 + \Vert v\Vert _0^2 \right] ^{1/2}\) on a Shishkin mesh that is appropriate for this problem.

One can carry out an analogous construction and analysis in our setting: working with piecewise bilinears on a Shishkin mesh like that of [9], construct a FEM that uses SDFEM in the x-direction (here the standard SDFEM suffices, unlike the special variant of SDFEM that is used in [9]) and replaces (4) by the weight function

$$\begin{aligned} {\hat{\beta }}(x,y) := 1 + \frac{1}{\varepsilon ^{1/2}}\exp \left( -\frac{\gamma [1/2-|y-1/2|]}{\varepsilon ^{1/2}}\right) \ \text { for } (x,y)\in \varOmega . \end{aligned}$$

This weight is a function of y only and is large inside the parabolic layers. One can prove convergence on the Shishkin mesh in the weighted balanced norm

$$\begin{aligned} \left[ \varepsilon |v_x|^2 + \varepsilon {\hat{\beta }} |v_y|^2 + \Vert v\Vert _0^2 \right] ^{1/2}. \end{aligned}$$
(22)

We do not give details here as they are lengthy and require no new idea beyond a synthesis of our earlier analysis and that of [9].
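To make the behaviour of the weight \({\hat{\beta }}\) concrete, the following minimal Python sketch evaluates the formula displayed above. The function name `beta_hat` and the sample values of \(\varepsilon\) and \(\gamma\) in the usage note are our own illustrative choices and are not prescribed by the analysis.

```python
import math


def beta_hat(y, eps, gamma):
    """Evaluate 1 + eps^{-1/2} * exp(-gamma * d(y) / eps^{1/2}).

    Here d(y) = 1/2 - |y - 1/2| is the distance from y in [0, 1] to the
    nearer of the two sides y = 0 and y = 1 (the parabolic layers).
    The weight does not depend on x.
    """
    dist = 0.5 - abs(y - 0.5)
    root_eps = math.sqrt(eps)
    return 1.0 + math.exp(-gamma * dist / root_eps) / root_eps
```

For example, with the illustrative values `eps = 1e-4` and `gamma = 0.98`, `beta_hat(0.0, 1e-4, 0.98)` is approximately \(1+\varepsilon ^{-1/2} = 101\) on the boundary, while `beta_hat(0.5, 1e-4, 0.98)` is indistinguishable from 1 in double precision, so the weight acts only inside the parabolic layers.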

5 Numerical results

Our test problem is

$$\begin{aligned} -\varepsilon ^2\varDelta u + u = f\ \text { in } \varOmega = (0,1)^2, \qquad u = 0 \ \text { on } \partial \varOmega , \end{aligned}$$
(23)

where f is chosen so that

$$\begin{aligned} u(x,y) = \left( \cos \left( \frac{\pi }{2} x\right) - \frac{e^{-x/\varepsilon } - e^{-1/\varepsilon }}{1-e^{-1/\varepsilon }}\right) \bigg (1 - y - \frac{e^{-y/\varepsilon } - e^{-1/\varepsilon }}{1-e^{-1/\varepsilon }}\bigg ). \end{aligned}$$
(24)

This problem is taken from [13] and is widely used in the literature (see, e.g., [1]). Its solution exhibits only two layers, which are near \(x=0\) and \(y=0\). Nonetheless, the bounds presented in Lemma 3 hold and are sharp for v, \(w_1\), \(w_2\), and \(z_1\). Thus, the example is sufficiently typical to verify the results of Theorem 2.
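For readers who wish to reproduce the test problem, the sketch below evaluates the exact solution (24) and the corresponding right-hand side \(f = -\varepsilon ^2\varDelta u + u\) of (23), obtained by differentiating (24) by hand. It is an illustration only: the helper names (`layer`, `layer_dd`, `u_exact`, `f_rhs`) are hypothetical and this is not the code used to produce our results.

```python
import math


def layer(t, eps):
    """E(t) = (exp(-t/eps) - exp(-1/eps)) / (1 - exp(-1/eps)); E(0)=1, E(1)=0."""
    e1 = math.exp(-1.0 / eps)
    return (math.exp(-t / eps) - e1) / (1.0 - e1)


def layer_dd(t, eps):
    """Second derivative E''(t) = exp(-t/eps) / (eps^2 * (1 - exp(-1/eps)))."""
    return math.exp(-t / eps) / (eps**2 * (1.0 - math.exp(-1.0 / eps)))


def u_exact(x, y, eps):
    """Exact solution (24), written as the product X(x) * Y(y)."""
    X = math.cos(0.5 * math.pi * x) - layer(x, eps)
    Y = 1.0 - y - layer(y, eps)
    return X * Y


def f_rhs(x, y, eps):
    """Right-hand side f = -eps^2 * (X'' Y + X Y'') + X Y of (23)."""
    c = math.cos(0.5 * math.pi * x)
    X = c - layer(x, eps)
    Y = 1.0 - y - layer(y, eps)
    Xdd = -(0.5 * math.pi) ** 2 * c - layer_dd(x, eps)
    Ydd = -layer_dd(y, eps)
    return -eps**2 * (Xdd * Y + X * Ydd) + X * Y
```

A quick sanity check is to confirm that `u_exact` vanishes (up to rounding) on all four sides of the unit square, and that for a moderate value such as `eps = 0.5` a centred finite-difference approximation of \(-\varepsilon ^2\varDelta u + u\) agrees with `f_rhs`.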

Since this problem has layers along only two edges, we modify the Shishkin mesh of Sect. 4.1 so that it is a tensor product of two one-dimensional meshes with N/2 equidistant subintervals on each of \([0,\uplambda ]\) and \([\uplambda , 1]\). In our experiments we have taken \(\gamma =0.98\) in (4), and \(b_0=0.99\) and \(\sigma =1\) in (13).
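The one-dimensional mesh just described can be sketched as follows. Note the assumption: the transition point is taken here to be the standard Shishkin choice \(\min \{1/2,\ \sigma \varepsilon b_0^{-1}\ln N\}\), which may differ in detail from the definition given in Sect. 4.1; the function names are likewise hypothetical.

```python
import math


def transition_point(eps, N, sigma=1.0, b0=0.99):
    # ASSUMPTION: standard Shishkin transition point for a single layer;
    # substitute the definition from Sect. 4.1 if it differs.
    return min(0.5, sigma * eps * math.log(N) / b0)


def shishkin_1d(eps, N, sigma=1.0, b0=0.99):
    """Return N+1 nodes on [0, 1]: N/2 equal subintervals on [0, lam]
    followed by N/2 equal subintervals on [lam, 1]."""
    if N < 2 or N % 2 != 0:
        raise ValueError("N must be a positive even integer")
    lam = transition_point(eps, N, sigma, b0)
    half = N // 2
    fine = [lam * i / half for i in range(half)]
    coarse = [lam + (1.0 - lam) * i / half for i in range(half)]
    return fine + coarse + [1.0]  # append the end point exactly
```

The two-dimensional mesh is then the tensor product of `shishkin_1d(eps, N)` with itself, so that the fine regions lie along \(x=0\) and \(y=0\). When \(\varepsilon\) is so large that the minimum is attained by 1/2, the mesh reduces to a uniform one.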

Our results are computed using Firedrake; see [19]. All results presented here are for bilinear elements; consistent results were obtained using biquadratic elements (see Remark 5).

In Table 1 we present the errors in the solutions computed using our proposed method. Observe that, for sufficiently small \(\varepsilon\), the errors are independent of \(\varepsilon\), and converge at a rate that is \(\mathcal {O}(N^{-1}\ln N)\), verifying Theorem 2. This contrasts with the observed results for the classical Galerkin method (i.e., \(\beta (\cdot ) \equiv 1\)) for this problem on this Shishkin mesh, where the computed errors in the standard energy norm scale like \(\varepsilon ^{1/2}N^{-1}\ln N\); see, e.g., [17, Table 1].
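One way to compare tabulated errors with the predicted \(\mathcal {O}(N^{-1}\ln N)\) behaviour is to compute observed rates from successive values of N and set them beside the rates that a quantity exactly proportional to \(N^{-1}\ln N\) would produce. The sketch below shows this; the helper names are hypothetical, and the errors in the usage note are synthetic values generated from the formula, not data from our experiments.

```python
import math


def observed_rates(Ns, errors):
    """Rates log(e_i / e_{i+1}) / log(N_{i+1} / N_i) for consecutive entries."""
    return [
        math.log(errors[i] / errors[i + 1]) / math.log(Ns[i + 1] / Ns[i])
        for i in range(len(Ns) - 1)
    ]


def reference_rates(Ns):
    """Rates that an error exactly equal to C * ln(N) / N would exhibit."""
    return observed_rates(Ns, [math.log(N) / N for N in Ns])


# Synthetic illustration only: errors manufactured as 3 * ln(N) / N.
Ns = [16, 32, 64, 128]
synthetic = [3.0 * math.log(N) / N for N in Ns]
rates = observed_rates(Ns, synthetic)
reference = reference_rates(Ns)
```

Because of the logarithmic factor, the reference rates are noticeably below 1 on coarse meshes (for N = 16 and 32 the reference rate is \(\log _2 1.6 \approx 0.68\)) and approach 1 only slowly as N grows, which should be kept in mind when judging computed rates.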

Table 1 Errors \(|||u-u_h|||_\beta\), where u is as in (24) and \(u_h\) is the solution of (9) on a Shishkin mesh

Although a discrete maximum norm error analysis is beyond the scope of this paper, in Table 2 we show the maximum error observed at the mesh points. One sees again that, for sufficiently small \(\varepsilon\), the error is independent of \(\varepsilon\) and converges at a rate that is \(\mathcal {O}(N^{-1})\). We remark that choosing \(\sigma =2\) in (13) improves the experimental convergence rate to \(\mathcal {O}(N^{-2}\ln ^2 N)\), though we do not show these results here.

Table 2 \(|||u-u_h|||_{\infty ,T^{N,N}}\) for the weighted method on a Shishkin mesh

Finally, we mention that when a solution \({\tilde{u}}_h\) is computed using the classical Galerkin FEM, our numerical results reveal that \(|||u-{\tilde{u}}_h|||_\beta\) closely matches the values of \(|||u-u_h|||_\beta\) stated in Table 1. This is curious, and worthy of further investigation, particularly since our experiments also suggest that when errors are measured in the discrete maximum norm, then our weighted method is more accurate than the classical Galerkin FEM by more than a factor of two.

Remark 7

For the problem of Remark 6, we have verified experimentally the accuracy of the numerical method described in that remark, using the standard SDFEM with its stabilisation parameter chosen as in [15]. Here we chose f in (21) such that the solution has parabolic layers along \(y=0\) and \(y=1\), but no exponential layer at the outflow boundary \(x=1\), so that the error in the parabolic layer dominates. Our results in the weighted norm (22) show almost first-order convergence, independently of \(\varepsilon\).