1 Introduction

1.1 Motivation

In this paper we consider solving a two-dimensional elliptic equation with smooth coefficients on a rectangular domain by high order finite difference schemes, which are constructed by using suitable quadrature in the classical continuous finite element method on a rectangular mesh. Consider the following model problem as an example: a variable coefficient Poisson equation \(-\nabla \cdot (a({\mathbf {x}})\nabla u)=f, a({\mathbf {x}})>0\) on a square domain \(\Omega =(0,1)\times (0,1)\) with homogeneous Dirichlet boundary conditions. The variational form is to find \(u\in H_0^1(\Omega )=\{v\in H^1(\Omega ): v|_{\partial \Omega }=0\}\) satisfying

$$\begin{aligned} A(u,v)=(f,v),\quad \forall v\in H_0^1(\Omega ), \end{aligned}$$

where \(A(u,v)=\iint _{\Omega } a\nabla u \cdot \nabla v dx dy\), \( (f,v)=\iint _{\Omega }fv dxdy.\) Let h be the mesh size of a uniform rectangular mesh and \(V_0^h\subseteq H^1_0(\Omega )\) be the continuous finite element space consisting of piecewise \(Q^k\) polynomials (i.e., tensor products of piecewise polynomials of degree k), then the \(C^0\)-\(Q^k\) finite element solution is defined as \(u_h\in V_0^h\) satisfying

$$\begin{aligned} A(u_h,v_h)=(f,v_h),\quad \forall v_h\in V_0^h. \end{aligned}$$
(1.1)

Standard error estimates of (1.1) are \(\Vert u-u_h\Vert _{1}\le C h^{k}\Vert u\Vert _{k+1}\) and \(\Vert u-u_h\Vert _{0}\le C h^{k+1}\Vert u\Vert _{k+1}\), where \(\Vert \cdot \Vert _k\) denotes the \(H^k(\Omega )\)-norm, see [5]. For \(k\ge 2\), \({\mathcal {O}} (h^{k+1})\) superconvergence for the gradient at Gauss quadrature points and \({\mathcal {O}} (h^{k+2})\) superconvergence for function values at Gauss–Lobatto quadrature points were proven for the one-dimensional case in [1, 2, 11] and for the two-dimensional case in [4, 8, 14, 17].

When implementing the scheme (1.1), integrals are usually approximated by quadrature. The most convenient implementation is to use the \((k+1)\times (k+1)\) Gauss–Lobatto quadrature because its points not only are superconvergence points but also define all the degrees of freedom of the Lagrangian \(Q^k\) basis. See Fig. 1 for the case \(k=2\). Such a quadrature scheme can be denoted as finding \(u_h\in V_0^h\) satisfying

$$\begin{aligned} A_h(u_h,v_h)=\langle f,v_h\rangle _h,\quad \forall v_h\in V_0^h, \end{aligned}$$
(1.2)

where \(A_h(u_h,v_h)\) and \(\langle f,v_h\rangle _h\) denote using tensor product of \((k+1)\)-point Gauss–Lobatto quadrature for integrals \(A(u_h,v_h)\) and \((f,v_h)\) respectively.
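As a concrete illustration of this quadrature, for \(k=2\) the rule on one cell \([x_e-h,x_e+h]\times [y_e-h,y_e+h]\) is the tensor product of the 3-point Gauss–Lobatto rule with nodes \(-1,0,1\) and weights \(1/3,4/3,1/3\) on the reference interval. The sketch below (Python with NumPy; the helper name `cell_quadrature` is ours, introduced only for illustration) evaluates such cell integrals:

```python
import numpy as np

# 3-point Gauss-Lobatto rule on [-1, 1]: nodes -1, 0, 1 with weights 1/3, 4/3, 1/3
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([1.0, 4.0, 1.0]) / 3.0

def cell_quadrature(g, xc, yc, h):
    """Approximate the integral of g over the cell [xc-h, xc+h] x [yc-h, yc+h]
    by the tensor product 3x3 Gauss-Lobatto rule."""
    total = 0.0
    for wi, si in zip(weights, nodes):
        for wj, tj in zip(weights, nodes):
            total += wi * wj * g(xc + si * h, yc + tj * h)
    return total * h * h  # Jacobian of the affine map (s, t) -> (x, y)
```

The rule is exact for \(Q^3\) polynomials on the cell but not for degree 4 monomials, which is the degree of exactness \(2k-1\) used repeatedly later in the paper.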

Fig. 1
figure 1

An illustration of Lagrangian \(Q^2\) element and the \(3\times 3\) Gauss–Lobatto quadrature

It is well known that many classical finite difference schemes are exactly finite element methods with specific quadrature schemes, see [5]. We will write scheme (1.2) as an exact finite difference type scheme in Sect. 7 for \(k=2\). Such a finite difference scheme not only provides an efficient and convenient way of assembling the stiffness matrix, especially for a variable coefficient problem, but also has advantages inherited from the variational formulation, such as the symmetry of the stiffness matrix and the ease of handling boundary conditions in high order schemes. This is the variational approach to constructing a high order accurate finite difference scheme.

Classical quadrature error estimates imply that standard finite element error estimates still hold for (1.2), see [5, 7]. The focus of this paper is to prove that the superconvergence of function values at Gauss–Lobatto points still holds. To be more specific, for Dirichlet type boundary conditions, we will show that (1.2) with \(k\ge 2\) is a \((k+2)\)th order accurate finite difference scheme in the discrete 2-norm under suitable smoothness assumptions on the exact solution and the coefficients.

In this paper, the main motivation to study superconvergence is to use it for constructing \((k+2)\)th order accurate finite difference schemes. For such a task, superconvergence points should define all degrees of freedom over the whole computational domain including boundary points. For high order finite element methods, this seems possible only on quite structured meshes such as rectangular meshes for a rectangular domain and equilateral triangles for a hexagonal domain, even though there are numerous superconvergence results for interior cells in unstructured meshes.

1.2 Related Work and Difficulty in Using Standard Tools

To illustrate our perspectives and difficulties, we focus on the case \(k=2\) in the following. For computing the bilinear form in the scheme (1.1), another convenient implementation is to replace the smooth coefficient \(a(x,y)\) by the piecewise \(Q^2\) polynomial \(a_I(x,y)\) obtained by interpolating \(a(x,y)\) at the quadrature points in each cell shown in Fig. 1. Then one can compute the integrals in the bilinear form exactly since the integrand is a polynomial. Superconvergence of function values for such an approximated coefficient scheme was proven in [13], and the proof can be easily extended to higher order polynomials and three-dimensional cases. This result might seem surprising since the interpolation error \(a(x,y)-a_I(x,y)\) is only of third order. On the other hand, all the tools used in [13] are standard in the literature.

From a practical point of view, (1.2) is more interesting since it gives a genuine finite difference scheme. It is straightforward to use standard tools in the literature to show that superconvergence still holds for sufficiently accurate quadrature. However, even though the \(3\times 3\) Gauss–Lobatto quadrature is fourth order accurate, the standard quadrature error estimates cannot be used directly to establish the fourth order accuracy of (1.2), as will be explained in detail in Remark 3.8 in Sect. 3.2.
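The fourth order accuracy of the quadrature itself is elementary to confirm: in one dimension the 3-point Gauss–Lobatto rule coincides with Simpson's rule. A self-contained check (Python; the smooth test integrand \(e^x\) is an arbitrary choice of ours):

```python
import numpy as np

def composite_lobatto3(f, n):
    """Composite 3-point Gauss-Lobatto (= Simpson) quadrature of f on [0,1]
    with n uniform cells."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        a = i * h
        # nodes a, a + h/2, a + h with weights h/6, 4h/6, h/6
        total += h / 6.0 * (f(a) + 4.0 * f(a + h / 2.0) + f(a + h))
    return total

exact = np.e - 1.0  # integral of exp on [0, 1]
e1 = abs(composite_lobatto3(np.exp, 8) - exact)
e2 = abs(composite_lobatto3(np.exp, 16) - exact)
rate = np.log2(e1 / e2)  # observed convergence order, approximately 4
```

The observed order is four, yet, as discussed above, this per-rule accuracy alone does not yield the fourth order accuracy of the scheme (1.2).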

We can also rewrite (1.2) for \(k=2\) as a finite difference scheme, but its local truncation error is only second order, as will be shown in Sect. 7.4. The phenomenon that truncation errors have lower orders was named supraconvergence in the literature. The second order truncation error makes it difficult to establish the fourth order accuracy following any traditional finite difference analysis approach.

To construct high order finite difference schemes from variational formulation, we can also consider finite element method with \(P^2\) basis on a regular triangular mesh in which two adjacent triangles form a rectangle [18]. Superconvergence of function values in \(C^0\)-\(P^2\) finite element method at the three vertices and three edge centers can be proven [4, 17]. See also [10]. Even though the quadrature using only three edge centers is third order accurate, error cancellations happen on two adjacent triangles forming a rectangle, thus fourth order accuracy of the corresponding finite difference scheme is still possible. However, extensions to construct higher order finite difference schemes are much more difficult.

1.3 Contributions and Organization of the Paper

The main contribution is to give the proof of the \((k+2)\)th order accuracy of (1.2) with \(k\ge 2\), which gives an easy construction of high order finite difference schemes for variable coefficient problems. An important step is to obtain the desired sharp quadrature estimate for the bilinear form, for which it is necessary to account for quadrature error cancellations between neighboring cells. Conventional quadrature estimation tools such as the Bramble–Hilbert Lemma only give the sharp estimate on each cell and thus cannot be used directly. A key technique in this paper is to apply the Bramble–Hilbert Lemma after integration by parts on proper interpolation polynomials to allow error cancellations.

The paper is organized as follows. In Sect. 2, we introduce our notations and assumptions. In Sect. 3, standard quadrature estimates are reviewed. Superconvergence of bilinear forms with quadrature is shown in Sect. 4. Then we prove the main result for homogeneous Dirichlet boundary conditions in Sect. 5 and for nonhomogeneous Dirichlet boundary conditions in Sect. 6. Section 7 provides a simple finite difference implementation of (1.2). Section 8 contains numerical tests. Concluding remarks are given in Sect. 9.

2 Notations and Assumptions

2.1 Notations and Basic Tools

We will use the same notations as in [13]:

  • We only consider a rectangular domain \(\Omega =(0,1)\times (0,1)\) with its boundary denoted as \(\partial \Omega \).

  • Only for convenience, we assume \(\Omega _h\) is a uniform rectangular mesh for \({{\bar{\Omega }}}\) and \(e=[x_e-h,x_e+h]\times [y_e-h,y_e+h]\) denotes any cell in \(\Omega _h\) with cell center \((x_e,y_e)\). The assumption of a uniform mesh is not essential to the discussion of superconvergence. All superconvergence results in this paper can be easily extended to the continuous finite element method with \(Q^k\) element on a quasi-uniform rectangular mesh, but not to a generic quadrilateral mesh or any curved mesh.

  • \(Q^k(e)=\left\{ p(x,y)=\sum \limits _{i=0}^k\sum \limits _{j=0}^k p_{ij} x^iy^j, (x,y)\in e\right\} \) is the set of tensor product of polynomials of degree k on a cell e.

  • \(V^h=\{p(x,y)\in C^0(\Omega _h): p|_e \in Q^{k}(e),\quad \forall e\in \Omega _h\}\) denotes the continuous piecewise \(Q^{k}\) finite element space on \(\Omega _h\).

  • \(V^h_0=\{v_h\in V^h: v_h=0 \quad \text{ on }\quad \partial \Omega \}.\)

  • The norm and seminorms for \(W^{k,p}(\Omega )\) and \(1\le p<+\infty \), with standard modification for \(p=+\infty \):

    $$\begin{aligned}&\Vert u\Vert _{k,p,\Omega }=\left( \sum \limits _{i+j\le k}\iint _{\Omega }|\partial _x^i\partial _y^ju(x,y)|^pdxdy\right) ^{1/p},\\&|u|_{k,p,\Omega }=\left( \sum \limits _{i+j= k}\iint _{\Omega }|\partial _x^i\partial _y^ju(x,y)|^pdxdy\right) ^{1/p},\\&[u]_{k,p,\Omega }=\left( \iint _{\Omega }|\partial _x^k u(x,y)|^pdxdy+\iint _{\Omega }|\partial _y^k u(x,y)|^p dxdy\right) ^{1/p}. \end{aligned}$$

    Notice that \([u]_{k+1,p,\Omega }=0\) if u is a \(Q^k\) polynomial.

  • For simplicity, sometimes we may use \(\Vert u\Vert _{k,\Omega }\), \(|u|_{k,\Omega }\) and \([u]_{k,\Omega }\) to denote the norm and seminorms for \(H^k(\Omega )=W^{k,2}(\Omega )\).

  • When there is no confusion, \(\Omega \) may be dropped in the norm and seminorms, e.g., \(\Vert u\Vert _k=\Vert u\Vert _{k,2,\Omega }\).

  • For any \(v_h\in V^h\), \(1\le p<+\infty \) and \(k\ge 1\), we will abuse the notation to denote the broken Sobolev norm and seminorms by the following symbols

    $$\begin{aligned} \Vert v_h\Vert _{k,p,\Omega }:= & {} \left( \sum _e\Vert v_h\Vert _{k,p,e}^p\right) ^{\frac{1}{p}}, \quad |v_h|_{k,p,\Omega }:= \left( \sum _e|v_h|_{k,p,e}^p\right) ^{\frac{1}{p}},\\ {[}v_h]_{k,p,\Omega }:= & {} \left( \sum _e[v_h]_{k,p,e}^p\right) ^{\frac{1}{p}}. \end{aligned}$$
  • Let \(Z_{0,e}\) denote the set of \((k+1)\times (k+1)\) Gauss–Lobatto points on a cell e.

  • \(Z_0=\bigcup _e Z_{0,e}\) denotes all Gauss–Lobatto points in the mesh \(\Omega _h\).

  • Let \(\Vert u\Vert _{2,Z_0}\) and \(\Vert u\Vert _{\infty ,Z_0}\) denote the discrete 2-norm and the maximum norm over \(Z_0\) respectively:

    $$\begin{aligned} \Vert u\Vert _{2,Z_0}=\left[ h^2\sum _{(x,y)\in Z_0} |u(x,y)|^2\right] ^{\frac{1}{2}},\quad \Vert u\Vert _{\infty ,Z_0}=\max _{(x,y)\in Z_0} |u(x,y)|. \end{aligned}$$
  • For a continuous function \(f(x,y)\), let \(f_I(x,y)\) denote its piecewise \(Q^k\) Lagrange interpolant at \(Z_{0,e}\) on each cell e, i.e., \(f_I\in V^h\) satisfies:

    $$\begin{aligned} f(x,y)=f_I(x,y), \quad \forall (x,y)\in Z_0. \end{aligned}$$
  • \(P^k(t)\) denotes the set of polynomials of degree \(k\) in the variable t.

  • \((f,v)_e\) denotes the inner product in \(L^2(e)\) and \((f,v)\) denotes the inner product in \(L^2(\Omega )\):

    $$\begin{aligned} (f,v)_e=\iint _{e} fv\, dxdy,\quad (f,v)=\iint _{\Omega } fv\, dxdy=\sum _e (f,v)_e. \end{aligned}$$
  • \(\langle f,v\rangle _{e,h}\) denotes the approximation to \((f,v)_e\) by using the \((k+1)\times (k+1)\)-point Gauss–Lobatto quadrature with \(k \ge 2\) for integration over the cell e.

  • \(\langle f,v\rangle _h\) denotes the approximation to \((f,v)\) by using the \((k+1)\times (k+1)\)-point Gauss–Lobatto quadrature with \(k \ge 2\) for integration over each cell e.

  • \({\hat{K}}=[-1,1]\times [-1,1]\) denotes a reference cell.

  • For \(f(x,y)\) defined on e, consider \({\hat{f}}(s, t)=f(sh+ x_e,t h+ y_e)\) defined on \({\hat{K}}\). Let \({\hat{f}}_I\) denote the \(Q^k\) Lagrange interpolation of \({\hat{f}}\) at the \((k+1) \times (k+1)\) Gauss–Lobatto quadrature points on \({\hat{K}}\).

  • \(({\hat{f}},{\hat{v}})_{{\hat{K}}}=\iint _{{\hat{K}}} {\hat{f}}{\hat{v}}\, dsdt.\)

  • \(\langle {\hat{f}},{\hat{v}}\rangle _{{\hat{K}}}\) denotes the approximation to \(({\hat{f}},{\hat{v}})_{{\hat{K}}}\) by using \((k+1) \times (k+1)\)-point Gauss–Lobatto quadrature.

  • On the reference cell \({\hat{K}}\), for convenience we use the superscript h over the ds or dt to denote we use \((k+1)\)-point Gauss–Lobatto quadrature on the corresponding variable. For example,

    $$\begin{aligned} \iint _{{\hat{K}}} {\hat{f}} d^hsdt=\int _{-1}^{1}\left[ w_1{\hat{f}}(-1,t)+w_{k+1}{\hat{f}}(1,t)+ \sum _{i=2}^k w_i{\hat{f}}(x_i,t)\right] dt. \end{aligned}$$

    Since \(({\hat{f}}{\hat{v}})_I\) coincides with \({\hat{f}}{\hat{v}}\) at the quadrature points, we have

    $$\begin{aligned} \iint _{{\hat{K}}} ({\hat{f}}{\hat{v}})_I dsdt=\iint _{{\hat{K}}} ({\hat{f}}{\hat{v}})_I d^hsd^ht=\iint _{{\hat{K}}} {\hat{f}}{\hat{v}} d^hsd^ht=\langle {\hat{f}},{\hat{v}}\rangle _{{\hat{K}}}. \end{aligned}$$
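The discrete norms \(\Vert \cdot \Vert _{2,Z_0}\) and \(\Vert \cdot \Vert _{\infty ,Z_0}\) defined above reduce to elementary array operations once the values of \(u\) at the points of \(Z_0\) are collected; a minimal sketch (Python; the flat array layout of the point values is an assumption for illustration):

```python
import numpy as np

def discrete_norms(u_vals, h):
    """Discrete 2-norm and maximum norm over the Gauss-Lobatto point set Z_0:
    ||u||_{2,Z_0} = (h^2 * sum |u|^2)^(1/2),  ||u||_{inf,Z_0} = max |u|.
    u_vals holds the values of u at all points of Z_0; h is the mesh size."""
    l2 = np.sqrt(h**2 * np.sum(np.abs(u_vals) ** 2))
    linf = np.max(np.abs(u_vals))
    return l2, linf
```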

The following are commonly used tools and facts:

  • For two-dimensional problems,

    $$\begin{aligned} h^{k-2/p}|v|_{k,p,e}=|{\hat{v}}|_{k,p,{\hat{K}}},\quad h^{k-2/p}[v]_{k,p,e}=[{\hat{v}}]_{k,p,{\hat{K}}}, \quad 1\le p\le \infty . \end{aligned}$$
  • Inverse estimates for polynomials:

    $$\begin{aligned} \Vert v_h\Vert _{k+1, e}\le C h^{-1} \Vert v_h\Vert _{k, e},\quad \forall v_h \in V^h, k\ge 0. \end{aligned}$$
    (2.1)
  • Sobolev’s embedding in two and three dimensions: \(H^{2}({\hat{K}})\hookrightarrow C^0({\hat{K}})\).

  • The embedding implies

    $$\begin{aligned}&\Vert {\hat{f}}\Vert _{0,\infty ,{\hat{K}}}\le C \Vert {\hat{f}}\Vert _{k,2, {\hat{K}}},\quad \forall {\hat{f}}\in H^{k}({\hat{K}}), k\ge 2,\\&\Vert {\hat{f}}\Vert _{1,\infty ,{\hat{K}}}\le C \Vert {\hat{f}}\Vert _{k+1,2, {\hat{K}}}, \quad \forall {\hat{f}}\in H^{k+1}({\hat{K}}), k\ge 2. \end{aligned}$$
  • Cauchy–Schwarz inequalities in two dimensions:

    $$\begin{aligned} \sum _e \Vert u\Vert _{k,e}\Vert v\Vert _{k,e}\le \left( \sum _e \Vert u\Vert ^2_{k,e}\right) ^{\frac{1}{2}}\left( \sum _e \Vert v\Vert ^2_{k,e}\right) ^{\frac{1}{2}}, \quad \Vert u\Vert _{k,1,e}={\mathcal {O}}(h) \Vert u\Vert _{k,2,e}. \end{aligned}$$
  • Poincaré inequality: let \({\bar{u}}\) be the average of \(u\in H^1(\Omega )\) on \(\Omega \), then

    $$\begin{aligned} |u-{\bar{u}}|_{0,p,\Omega }\le C |\nabla u|_{0,p,\Omega }, \quad p\ge 1. \end{aligned}$$

    If \({\bar{u}}\) is the average of \(u\in H^1(e)\) on a cell e, we have

    $$\begin{aligned} |u-{\bar{u}}|_{0,p,e}\le C h |\nabla u|_{0,p,e}, \quad p\ge 1. \end{aligned}$$
  • For \(k\ge 2\), the \((k+1)\times (k+1)\) Gauss–Lobatto quadrature is exact for integration of polynomials of degree \(2k-1\ge k+1\) on \({\hat{K}}\).

  • Define the projection operator \({\hat{\Pi }}_1: {\hat{u}} \in L^1({\hat{K}})\rightarrow {{\hat{\Pi }}}_1{\hat{u}}\in Q^1({\hat{K}})\) by

    $$\begin{aligned} \iint _{{\hat{K}}} ({\hat{\Pi }}_1 \hat{u} ) w dsdt= \iint _{{\hat{K}}} \hat{u} w dsdt,\quad \forall w\in Q^1({\hat{K}}). \end{aligned}$$
    (2.2)

    Notice that all degrees of freedom of \({\hat{\Pi }}_1 \hat{u}\) can be represented as linear combinations of \(\iint _{{\hat{K}}} {\hat{u}}(s,t) p(s,t)dsdt\) for \(p(s,t)=1,s,t,st\), thus the \(H^1({\hat{K}})\) (or \(H^2({\hat{K}})\)) norm of \({\hat{\Pi }}_1 \hat{u}\) is determined by \(\iint _{{\hat{K}}} {\hat{u}}(s,t) p(s,t)dsdt\). By the Cauchy–Schwarz inequality \(|\iint _{{\hat{K}}} {\hat{u}}(s,t) {\hat{p}}(s,t)dsdt|\le \Vert {\hat{u}}\Vert _{0,2,{\hat{K}}}\Vert {\hat{p}}\Vert _{0,2,{\hat{K}}}\le C \Vert {\hat{u}}\Vert _{0,2,{\hat{K}}}\), we have \(\Vert {\hat{\Pi }}_1 \hat{u}\Vert _{1,2,{\hat{K}}} \le C \Vert {\hat{u}}\Vert _{0,2,{\hat{K}}}\), which means \({\hat{\Pi }}_1\) is a continuous linear mapping from \(L^2({\hat{K}})\) to \(H^1({\hat{K}})\). By a similar argument, one can show \({\hat{\Pi }}_1\) is a continuous linear mapping from \(L^2({\hat{K}})\) to \(H^2({\hat{K}})\).
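Since \(\{1,s,t,st\}\) is an \(L^2({\hat{K}})\)-orthogonal basis of \(Q^1({\hat{K}})\), the projection \({\hat{\Pi }}_1\) in (2.2) can be computed explicitly from the four moments. A sketch (Python; a 6-point tensor Gauss–Legendre rule, our arbitrary choice, is used only to evaluate the moment integrals accurately):

```python
import numpy as np

# Tensor Gauss-Legendre rule on K_hat = [-1,1]^2, exact for the tests below
gx, gw = np.polynomial.legendre.leggauss(6)
S, T = np.meshgrid(gx, gx)
W = np.outer(gw, gw)

def project_Q1(u):
    """L^2(K_hat) projection onto Q^1 in the orthogonal basis 1, s, t, st.
    Returns the four coefficients (c_1, c_s, c_t, c_st)."""
    basis = [np.ones_like(S), S, T, S * T]
    norms2 = [4.0, 4.0 / 3.0, 4.0 / 3.0, 4.0 / 9.0]  # ||p||_{L^2}^2 of each basis p
    U = u(S, T)
    return [np.sum(W * U * p) / n2 for p, n2 in zip(basis, norms2)]
```

For example, \({\hat{\Pi }}_1(s^2)=1/3\) (the mean of \(s^2\)), and any \(Q^1\) polynomial is reproduced exactly, consistent with \({\hat{\Pi }}_1\) being a projection.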

2.2 Coercivity and Elliptic Regularity

We consider the elliptic variational problem of finding \(u\in H_0^1(\Omega )\) to satisfy

$$\begin{aligned} A(u,v): =\iint _\Omega (\nabla v^T {\mathbf {a}} \nabla u +({\mathbf {b}}\cdot \nabla u) v + c u v) \,dx dy =(f,v),\quad \forall v\in H^1_0(\Omega ), \end{aligned}$$
(2.3)

where \({\mathbf {a}}=\begin{pmatrix} a^{11} &{}\quad a^{12}\\ a^{21} &{}\quad a^{22} \end{pmatrix}\) is real symmetric positive definite and \(\mathbf{b}=[b^1 \quad b^2]\). Assume the coefficients \({\mathbf {a}}\), \({\mathbf {b}}\) and c are smooth with uniform upper bounds, thus \(A(u,v)\le C\Vert u\Vert _1\Vert v\Vert _1\) for any \(u, v\in H^1_0(\Omega )\). We denote by \(\lambda _{{\mathbf {a}}}\) the smallest eigenvalue of \({\mathbf {a}}\). Assume \(\lambda _{{\mathbf {a}}}\) has a positive lower bound and \( \nabla \cdot {\mathbf {b}}\le 2c \), so that coercivity of the bilinear form can be easily achieved. Since

$$\begin{aligned} ({\mathbf {b}}\cdot \nabla u,v)=\int _{\partial \Omega } uv {\mathbf {b}}\cdot {\mathbf {n}} ds-(\nabla \cdot (v{\mathbf {b}}), u)=\int _{\partial \Omega } uv {\mathbf {b}}\cdot {\mathbf {n}} ds-({\mathbf {b}}\cdot \nabla v, u)-(v\nabla \cdot {\mathbf {b}}, u), \end{aligned}$$

we have

$$\begin{aligned} 2({\mathbf {b}}\cdot \nabla v,v)+2(c v, v)=\int _{\partial \Omega } v^2 {\mathbf {b}}\cdot {\mathbf {n}} ds+((2c-\nabla \cdot {\mathbf {b}}) v, v)\ge 0,\quad \forall v\in H^1_0(\Omega ). \qquad \end{aligned}$$
(2.4)

By the equivalence of the two norms \(|\cdot |_{1}\) and \(\Vert \cdot \Vert _{1}\) on the space \(H^1_0(\Omega )\) (see [5]), we conclude that the bilinear form \(A(u,v)=({\mathbf {a}} \nabla u, \nabla v)+({\mathbf {b}}\cdot \nabla u,v)+(c u, v)\) satisfies the coercivity \(A(v,v)\ge C \Vert v\Vert _1^2\) for any \(v\in H^1_0(\Omega )\).

The coercivity can also be achieved if we assume \(|{\mathbf {b}}|^2< 4\lambda _{{\mathbf {a}}}c \). By Young's inequality

$$\begin{aligned} |( {\mathbf {b}}\cdot \nabla v, v )| \le \iint _{\Omega } \frac{|\mathbf{b} \cdot \nabla v|^2}{4c} + c |v|^2dxdy \le \left( \frac{|\mathbf{b}|^2}{4c} \nabla v, \nabla v\right) + ( c v, v), \end{aligned}$$

we have

$$\begin{aligned}&A(v,v) \ge ({\mathbf {a}} \nabla v, \nabla v) +(c v, v) -|( \mathbf{b}\cdot \nabla v, v )| \ge \left( \left( \lambda _{\mathbf{a}}-\frac{|{\mathbf {b}}|^2}{4c}\right) \nabla v, \nabla v\right) >0,\nonumber \\&\quad \forall v\in H^1_0(\Omega ). \end{aligned}$$
(2.5)

Let \(A^*\) be the adjoint bilinear form of A, i.e., \(A^*(u,v)=A(v,u)\). We need to assume that elliptic regularity holds for the dual problem of (2.3):

$$\begin{aligned} w\in H^1_0(\Omega ), A^*(w,v)=(f,v),\quad \forall v\in H_0^1(\Omega ) \Longrightarrow \Vert w\Vert _2\le C\Vert f\Vert _0, \end{aligned}$$
(2.6)

where C is independent of w and f. See [9, 16] for the elliptic regularity with Lipschitz continuous coefficients on a Lipschitz domain.

3 Quadrature Error Estimates

In the following, we will put a hat \(\hat{\ }\) on a function to emphasize that the function is defined on, or transformed to, the reference cell \({\hat{K}} =[-1,1]\times [-1,1]\) from a mesh cell.

3.1 Standard Estimates

The Bramble–Hilbert Lemma for \(Q^k\) polynomials can be stated as follows, see Exercise 3.1.1 and Theorem 4.1.3 in [6]:

Theorem 3.1

If a continuous linear mapping \({{\hat{\Pi }}}: H^{k+1}({\hat{K}})\rightarrow H^{k+1}({\hat{K}})\) satisfies \({{\hat{\Pi }}} {\hat{v}}={\hat{v}}\) for any \({\hat{v}}\in Q^k({\hat{K}})\), then

$$\begin{aligned} \Vert {\hat{u}}-{{\hat{\Pi }}}{\hat{u}}\Vert _{k+1,{\hat{K}}}\le C [{\hat{u}}]_{k+1, {\hat{K}}}, \quad \forall {\hat{u}}\in H^{k+1}({\hat{K}}). \end{aligned}$$
(3.1)

Thus if \(l(\cdot )\) is a continuous linear form on the space \(H^{k+1}({\hat{K}})\) satisfying \(l({\hat{v}})=0,\forall {\hat{v}}\in Q^k({\hat{K}}),\) then

$$\begin{aligned} |l({\hat{u}})|\le C \Vert l\Vert '_{k+1, {\hat{K}}} [{\hat{u}}]_{k+1,{\hat{K}}},\quad \forall {\hat{u}}\in H^{k+1}({\hat{K}}), \end{aligned}$$

where \( \Vert l\Vert '_{k+1, {\hat{K}}}\) is the norm in the dual space of \(H^{k+1}({\hat{K}})\).

By applying Bramble–Hilbert Lemma, we have the following standard quadrature estimates. See Theorems 2.3 and 2.4 in [13] for the detailed proof.

Theorem 3.2

For a sufficiently smooth function \(a(x,y) \in H^{2k}(e)\) with \(k \ge 2\), let m be an integer satisfying \(k\le m\le 2k\); then we have

$$\begin{aligned} \iint _e a(x,y)dxdy- \iint _e a_I(x,y)dxdy={\mathcal {O}}(h^{m+1})[a]_{m,e}={\mathcal {O}} (h^{m+2})[a]_{m,\infty ,e}. \end{aligned}$$
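Taking \(m=2k\) in Theorem 3.2 predicts a per-cell error of order \(h^{2k+2}\). For \(k=2\) the resulting \({\mathcal {O}}(h^6)\) rate on a single cell is easy to observe numerically, since \(\iint _e a_I\) equals the \(3\times 3\) Gauss–Lobatto quadrature of \(a\) (a sketch; the smooth coefficient \(e^{x+y/2}\), chosen separable so the exact integral is available in closed form, is our arbitrary choice):

```python
import numpy as np

# 3-point Gauss-Lobatto rule on [-1, 1]
nodes = np.array([-1.0, 0.0, 1.0])
weights = np.array([1.0, 4.0, 1.0]) / 3.0

def lobatto_cell_error(h):
    """Error of the 3x3 Gauss-Lobatto rule for a(x,y) = exp(x + y/2)
    on the single cell [-h, h] x [-h, h]."""
    a = lambda x, y: np.exp(x + 0.5 * y)
    quad = h * h * sum(wi * wj * a(si * h, tj * h)
                       for wi, si in zip(weights, nodes)
                       for wj, tj in zip(weights, nodes))
    # exact integral factors: int exp(x) dx * int exp(y/2) dy over [-h, h]
    exact = (np.exp(h) - np.exp(-h)) * 2.0 * (np.exp(h / 2) - np.exp(-h / 2))
    return abs(quad - exact)

# observed per-cell order, approximately m + 2 = 6 with m = 2k = 4
rate = np.log2(lobatto_cell_error(0.2) / lobatto_cell_error(0.1))
```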

Theorem 3.3

If \(f\in H^{k+2}(\Omega )\) with \(k \ge 2\), then

$$\begin{aligned} (f,v_h)-\langle f,v_h\rangle _h ={\mathcal {O}}(h^{k+2}) \Vert f\Vert _{k+2} \Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{aligned}$$

Remark 3.4

By Theorem 3.1, on the reference cell \({\hat{K}}\), for \(a(x,y) \in H^{k+2}(e)\) and \(k \ge 2\), we have

$$\begin{aligned} \left| \iint _{{\hat{K}}} \left( {\hat{a}}(s,t)-{\hat{a}}_I(s,t)\right) dsdt\right| \le C [{\hat{a}}]_{k+2,{\hat{K}}}\le C[{\hat{a}}]_{k+2,\infty ,{\hat{K}}}, \end{aligned}$$
(3.2)

and

$$\begin{aligned} \Vert {\hat{a}}-{\hat{a}}_I\Vert _{k+1,{\hat{K}}}\le C [{\hat{a}}]_{k+1,{\hat{K}}}. \end{aligned}$$
(3.3)

The following two results are also standard estimates obtained by applying the Bramble–Hilbert Lemma.

Lemma 3.5

If \(f\in H^{2}(\Omega )\) or \(f\in V^h\), we have \((f,v_h)-\langle f,v_h\rangle _h ={\mathcal {O}}(h^2) |f|_{2} \Vert v_h\Vert _0,\quad \forall v_h\in V^h.\)

Proof

For simplicity, we ignore the subscript in \(v_h\). Let E(f) denote the quadrature error for integrating \(f(x,y)\) on e. Let \({\hat{E}}({\hat{f}})\) denote the quadrature error for integrating \({\hat{f}}(s,t)=f(x_e+sh,y_e+th)\) on the reference cell \({\hat{K}}\). Due to the embedding \(H^{2}({\hat{K}})\hookrightarrow C^0({\hat{K}})\), we have

$$\begin{aligned} |{\hat{E}}({\hat{f}} {\hat{v}})|\le C |{\hat{f}} {\hat{v}}|_{0,\infty ,{\hat{K}}}\le C |{\hat{f}} |_{0,\infty ,{\hat{K}}}|{\hat{v}}|_{0,\infty ,{\hat{K}}}\le C\Vert {\hat{f}}\Vert _{2,{\hat{K}}} \Vert {\hat{v}}\Vert _{0,{\hat{K}}}. \end{aligned}$$

Thus the mapping \({\hat{f}}\rightarrow {\hat{E}}({\hat{f}} {\hat{v}})\) is a continuous linear form on \(H^2({\hat{K}})\) and its norm is bounded by \(C\Vert {\hat{v}}\Vert _{0,{\hat{K}}}\). If \({\hat{f}} \in Q^1({\hat{K}})\), then we have \({\hat{E}}({\hat{f}} {\hat{v}})=0\). By the Bramble–Hilbert Lemma (Theorem 3.1) applied to this continuous linear form, we get

$$\begin{aligned} |{\hat{E}}({\hat{f}} {\hat{v}})| \le C [{\hat{f}}]_{2,{\hat{K}}} \Vert {\hat{v}}\Vert _{0,{\hat{K}}} . \end{aligned}$$

So on a cell e, we get

$$\begin{aligned} E(fv)=h^2{\hat{E}} ({\hat{f}}{\hat{v}})\le Ch^2[{\hat{f}}]_{2,{\hat{K}}} \Vert {\hat{v}}\Vert _{0,{\hat{K}}}\le C h^2|f|_{2, e} \Vert v\Vert _{0, e}. \end{aligned}$$
(3.4)

Summing over all elements and using the Cauchy–Schwarz inequality, we get the desired result. \(\square \)

Theorem 3.6

Assume all coefficients of (2.3) are in \(W^{2,\infty }(\Omega )\). We have

$$\begin{aligned} A(z_h,v_h)-A_h(z_h,v_h)={\mathcal {O}}(h) \Vert v_h\Vert _2 \Vert z_h\Vert _1, \quad \forall v_h,z_h\in V^h. \end{aligned}$$

Proof

Following the same arguments as in the proof of Lemma 3.5, we have

$$\begin{aligned} E(fv)\le C h^2|f|_{2, e} \Vert v\Vert _{0, e}, \quad \forall f\in H^2(e),\ v\in V^h. \end{aligned}$$

Let \(f=a^{11} (v_h)_x\) and \(v = (z_h)_x \) in the estimate above, we get

$$\begin{aligned}&|(a^{11} (z_h)_x, (v_h)_x)-\langle a^{11} (z_h)_x, (v_h)_x\rangle _h|\le C h^2 \Vert a^{11} (v_h)_x\Vert _2 \Vert (z_h)_x\Vert _0\\&\quad \le C h^2 \Vert a^{11}\Vert _{2,\infty } \Vert v_h\Vert _3 |z_h|_1 \le C h \Vert a^{11}\Vert _{2,\infty } \Vert v_h\Vert _2 |z_h|_1, \end{aligned}$$

where the inverse estimate (2.1) is used in the last inequality. Similarly, we have

$$\begin{aligned}&(a^{12} (z_h)_x, (v_h)_y)-\langle a^{12} (z_h)_x, (v_h)_y\rangle _h={\mathcal {O}}(h) \Vert a^{12}\Vert _{2,\infty }\Vert v_h\Vert _2 |z_h|_1,\\&(a^{22} (z_h)_y, (v_h)_y)-\langle a^{22} (z_h)_y, (v_h)_y\rangle _h={\mathcal {O}}(h) \Vert a^{22}\Vert _{2,\infty }\Vert v_h\Vert _2 |z_h|_1,\\&(b^1 (z_h)_x, v_h)-\langle b^1 (z_h)_x, v_h\rangle _h={\mathcal {O}}(h) \Vert b^1\Vert _{2,\infty } \Vert v_h\Vert _2 |z_h|_0,\\&(b^2 (z_h)_y, v_h)-\langle b^2 (z_h)_y, v_h\rangle _h={\mathcal {O}}(h) \Vert b^2\Vert _{2,\infty } \Vert v_h\Vert _2 |z_h|_0,\\&(c z_h, v_h)-\langle c z_h, v_h\rangle _h={\mathcal {O}}(h) \Vert c\Vert _{2,\infty } \Vert v_h\Vert _1 |z_h|_0, \end{aligned}$$

which implies

$$\begin{aligned} A(z_h,v_h)-A_h(z_h,v_h)={\mathcal {O}}(h) \Vert v_h\Vert _2 \Vert z_h\Vert _1. \end{aligned}$$

\(\square \)

3.2 A Refined Consistency Error

In this subsection, we will show how to establish the desired consistency error estimate for smooth enough coefficients:

$$\begin{aligned} A(u, v_h)-A_h(u,v_h)={\left\{ \begin{array}{ll} {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0\\ {\mathcal {O}}(h^{k+\frac{3}{2}})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h \end{array}\right. }. \end{aligned}$$

Theorem 3.7

Assume \(a(x,y)\in W^{k+2,\infty }(\Omega )\), \(u\in H^{k+3}(\Omega )\), \(k \ge 2\), then

$$\begin{aligned}&(a\partial _x u,\partial _x v_h)-\langle a\partial _x u, \partial _x v_h\rangle _h={\left\{ \begin{array}{ll} {\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,&{}\quad \forall v_h\in V^h_0\\ {\mathcal {O}}(h^{k+\frac{3}{2}})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,&{}\quad \forall v_h\in V^h \end{array}\right. }, \end{aligned}$$
(3.5)
$$\begin{aligned}&(a\partial _x u,\partial _y v_h)-\langle a\partial _x u, \partial _y v_h\rangle _h={\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0, \end{aligned}$$
(3.6a)
$$\begin{aligned}&(a\partial _x u,\partial _y v_h)-\langle a\partial _x u, \partial _y v_h\rangle _h={\mathcal {O}}(h^{k+\frac{3}{2}})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h, \end{aligned}$$
(3.6b)
$$\begin{aligned}&(a\partial _x u,v_h)-\langle a\partial _x u, v_h\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0, \end{aligned}$$
(3.7)
$$\begin{aligned}&(au,v_h)-\langle au, v_h\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+2}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0. \end{aligned}$$
(3.8)

Remark 3.8

We emphasize that Theorem 3.7 cannot be proven by applying the Bramble–Hilbert Lemma directly. Consider the constant coefficient case \(a(x,y)\equiv 1\) and \(k=2\) as an example,

$$\begin{aligned} (\partial _x u, \partial _x v_h)-\langle \partial _x u, \partial _x v_h\rangle _h=\sum _e \left( \iint _e u_x (v_h)_x dxdy -\iint _e u_x(v_h)_x d^hxd^hy\right) . \end{aligned}$$

Since the \(3 \times 3\) Gauss–Lobatto quadrature is exact for integrating \(Q^3\) polynomials, by Theorem 3.1 we have

$$\begin{aligned} \left| \iint _e u_x (v_h)_x dxdy -\iint _e u_x(v_h)_x d^hxd^hy\right|= & {} \left| \iint _{{\hat{K}}} {\hat{u}}_s ({\hat{v}}_h)_s dsdt -\iint _{{\hat{K}}} {\hat{u}}_s({\hat{v}}_h)_s d^hsd^ht\right| \\\le & {} C [{\hat{u}}_s ({\hat{v}}_h)_s]_{4,{\hat{K}}}. \end{aligned}$$

Notice that \({\hat{v}}_h\) is \(Q^2\), thus \(({\hat{v}}_h)_{stt}\) does not vanish and the seminorm \([{\hat{u}}_s ({\hat{v}}_h)_s]_{4,{\hat{K}}}\) involves third order derivatives of \({\hat{v}}_h\), so it can only be bounded by \(C\Vert {\hat{u}}\Vert _{5,{\hat{K}}}\Vert {\hat{v}}_h\Vert _{3,{\hat{K}}}\). So by the Bramble–Hilbert Lemma for \(Q^k\) polynomials, we can only get

$$\begin{aligned} \iint _e u_x (v_h)_x dxdy -\iint _e u_x(v_h)_x d^hxd^hy={\mathcal {O}}(h^4) \Vert u\Vert _{5,e} \Vert v_h\Vert _{3,e}. \end{aligned}$$

Thus by Cauchy–Schwarz inequality after summing over e, we only have

$$\begin{aligned} (\partial _x u, \partial _x v_h)-\langle \partial _x u, \partial _x v_h\rangle _h={\mathcal {O}}(h^4) \Vert u\Vert _5\Vert v_h\Vert _3. \end{aligned}$$

In order to get the desired estimate involving only the broken \(H^2\)-norm of \(v_h\), we will take advantage of error cancellations between neighboring cells through integration by parts.

Proof

For simplicity, we ignore the subscript \(_h\) of \(v_h\) in this proof and all the following v are in \(V^h\) which are \(Q^k\) polynomials in each cell. First, by Theorem 3.3, we easily obtain (3.7) and (3.8):

$$\begin{aligned} (au_x,v)-\langle au_x,v\rangle _h= & {} {\mathcal {O}}(h^{k+2}) \Vert au_x\Vert _{k+2}\Vert v\Vert _{2}={\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v\Vert _{2},\\ (au,v)-\langle au, v\rangle _h= & {} \mathcal {O}(h^{k+2})\Vert au\Vert _{k+2}\Vert v\Vert _{2}= {\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+2}\Vert v\Vert _{2}. \end{aligned}$$

We will only discuss \((au_x, v_x)-\langle au_x, v_x\rangle _h\) and the same discussion also applies to derive (3.6a) and (3.6b).

Since we have

$$\begin{aligned} (au_x, v_x)-\langle au_x, v_x\rangle _h&=\sum _e \left( \iint _e au_x v_x dxdy -\iint _e au_xv_x d^hxd^hy\right) \\&= \sum _e \left( \iint _{{\hat{K}}} {\hat{a}} {\hat{u}}_s {\hat{v}}_s dsdt -\iint _{{\hat{K}}} {\hat{a}} {\hat{u}}_s {\hat{v}}_s d^hsd^ht \right) \\&=\sum _e \left( \iint _{{\hat{K}}} {\hat{a}}{\hat{u}}_s {\hat{v}}_s dsdt -\iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s d^hsd^ht \right) , \end{aligned}$$

where we use the fact \({\hat{a}}{\hat{u}}_s {\hat{v}}_s = ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s\) on the Gauss–Lobatto quadrature points. For fixed t, \(({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s\) is a polynomial of degree \(2k-1\) w.r.t. variable s, thus the \((k+1)\)-point Gauss–Lobatto quadrature is exact for its s-integration, i.e.,

$$\begin{aligned} \iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s d^hsd^ht=\iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s dsd^ht . \end{aligned}$$

To estimate the quadrature error, we introduce some intermediate terms and then integrate by parts:

$$\begin{aligned}&\iint _{{\hat{K}}} {\hat{a}}{\hat{u}}_s {\hat{v}}_s dsdt -\iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s d^hsd^ht \end{aligned}$$
(3.9)
$$\begin{aligned}&\quad =\iint _{{\hat{K}}} {\hat{a}}{\hat{u}}_s {\hat{v}}_s dsdt -\iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s dsdt + \iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s dsdt-\iint _{{\hat{K}}} ({\hat{a}}{\hat{u}}_s)_I{\hat{v}}_s dsd^ht \end{aligned}$$
(3.10)
$$\begin{aligned}&\quad = \iint _{{\hat{K}}} \left[ {\hat{a}}{\hat{u}}_s - ({\hat{a}}{\hat{u}}_s)_I \right] {\hat{v}}_s dsdt + \left( \iint _{{\hat{K}}} \left[ ({\hat{a}}{\hat{u}}_s)_I\right] _s{\hat{v}} dsd^ht -\iint _{{\hat{K}}} \left[ ({\hat{a}}{\hat{u}}_s)_I\right] _s{\hat{v}} dsdt \right) \end{aligned}$$
(3.11)
$$\begin{aligned}&\qquad +\left( \left. \int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I{\hat{v}} dt \right| ^{s=1}_{s=-1} - \left. \int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I{\hat{v}} d^ht \right| ^{s=1}_{s=-1} \right) = I + II + III. \end{aligned}$$
(3.12)

For the first term in (3.12), let \(\overline{{\hat{v}}_s}\) be the cell average of \({\hat{v}}_s\) on \({\hat{K}}\), then

$$\begin{aligned} I = \iint _{{\hat{K}}} \left( {\hat{a}}{\hat{u}}_s -({\hat{a}}{\hat{u}}_s)_I\right) \overline{{\hat{v}}_s} dsdt+\iint _{{\hat{K}}} \left( {\hat{a}}{\hat{u}}_s -({\hat{a}}{\hat{u}}_s)_I\right) ({\hat{v}}_s-\overline{{\hat{v}}_s})dsdt. \end{aligned}$$

By (3.2) we have

$$\begin{aligned} \left| \iint _{{\hat{K}}} \left( {\hat{a}}{\hat{u}}_s -({\hat{a}}{\hat{u}}_s)_I\right) \overline{{\hat{v}}_s} dsdt \right| \le C [{\hat{a}}{\hat{u}}_s]_{k+2,{\hat{K}}}\left| \overline{{\hat{v}}_s}\right| =\mathcal O(h^{k+2})\Vert a\Vert _{k+2,\infty ,e}\Vert u\Vert _{k+3,e} \Vert v\Vert _{1,e}. \end{aligned}$$

By Cauchy–Schwarz inequality, the Bramble–Hilbert Lemma on interpolation error and Poincaré inequality, we have

$$\begin{aligned}&\left| \iint _{{\hat{K}}} \left( {\hat{a}}{\hat{u}}_s -({\hat{a}}{\hat{u}}_s)_I\right) ({\hat{v}}_s-\overline{{\hat{v}}_s})dsdt \right| \le |{\hat{a}}{\hat{u}}_s -({\hat{a}}{\hat{u}}_s)_I|_{0,{\hat{K}}}|{\hat{v}}_s -\overline{{\hat{v}}_s}|_{0,{\hat{K}}} \\&\quad \le C[{\hat{a}}{\hat{u}}_s ]_{k+1,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}} ={\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+1,\infty , e}\Vert u\Vert _{k+2, e}\Vert v\Vert _{2,e}. \end{aligned}$$

Thus we have

$$\begin{aligned} I ={\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+2,\infty ,e}\Vert u\Vert _{k+3,e}\Vert v\Vert _{2,e}. \end{aligned}$$

The second term in (3.12) can be estimated in the same way as in the proof of Theorem 2.4 in [13]. For each \({\hat{v}} \in Q^k({\hat{K}})\) we can define a linear form on \(H^k({\hat{K}})\) as

$$\begin{aligned} {\hat{E}}_{{\hat{v}}}({\hat{f}}) = \iint _{{\hat{K}}} ({\hat{F}}_I)_s {\hat{v}} dsdt -\iint _{{\hat{K}}} ({\hat{F}}_I)_s{\hat{v}} dsd^ht, \end{aligned}$$

where \({\hat{F}}\) is an antiderivative of \({\hat{f}}\) w.r.t. the variable s. Due to the linearity of the interpolation operator and of differentiation, \({\hat{E}}_{{\hat{v}}}\) is well defined: changing \({\hat{F}}\) by a function of t alone does not affect \(({\hat{F}}_I)_s\), so the value is independent of the choice of antiderivative. By the embedding \(H^{2}({\hat{K}})\hookrightarrow C^0({\hat{K}})\), we have

$$\begin{aligned}&{\hat{E}}_{{\hat{v}}}({\hat{f}}) \le C \Vert {\hat{F}}\Vert _{0,\infty ,{\hat{K}}}\Vert {\hat{v}}\Vert _{0,\infty ,{\hat{K}}} \le C \Vert {\hat{f}}\Vert _{0,\infty ,{\hat{K}}}\Vert {\hat{v}}\Vert _{0,\infty ,{\hat{K}}} \\&\quad \le C \Vert {\hat{f}}\Vert _{2,{\hat{K}}} \Vert {\hat{v}}\Vert _{0,{\hat{K}}} \le C \Vert {\hat{f}}\Vert _{k,{\hat{K}}} \Vert {\hat{v}}\Vert _{0,{\hat{K}}}, \end{aligned}$$

where we use the fact that all the norms on \(Q^k({\hat{K}})\) are equivalent. The above inequalities imply that the mapping \({\hat{E}}_{{\hat{v}}}\) is a continuous linear form on \(H^k({\hat{K}})\). With the projection \(\Pi _1\) defined in (2.2), we have

$$\begin{aligned} {\hat{E}}_{{\hat{v}}}({\hat{f}}) = {\hat{E}}_{{\hat{v}} - \Pi _1{\hat{v}}} ({\hat{f}})+ {\hat{E}}_{ \Pi _1{\hat{v}}}({\hat{f}}),\quad \forall {\hat{v}} \in Q^k({\hat{K}}). \end{aligned}$$

Notice that \({\hat{F}}\) by definition is an antiderivative of \({\hat{f}}\) w.r.t. only the variable s. If \({\hat{f}} \in Q^{k-1}({\hat{K}})\), then \({\hat{F}}_I\) is a polynomial of degree only \(k-1\) w.r.t. the variable t, thus \(({\hat{F}}_I)_s\in Q^{k-1}({\hat{K}})\). The quadrature is exact for polynomials of degree \(2k-1\), thus \( Q^{k-1}({\hat{K}}) \subset \ker {\hat{E}}_{{\hat{v}} - \Pi _1{\hat{v}}}\). So by the Bramble–Hilbert Lemma, we get

$$\begin{aligned} {\hat{E}}_{{\hat{v}} - \Pi _1{\hat{v}}} ({\hat{f}}) \le C[{\hat{f}}]_{k,{\hat{K}}}\Vert {\hat{v}}- \Pi _1{\hat{v}}\Vert _{0,{\hat{K}}}\le C[{\hat{f}}]_{k,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}}, \end{aligned}$$

and we also have

$$\begin{aligned} {\hat{E}}_{\Pi _1{\hat{v}}} ({\hat{f}}) =\iint _{{\hat{K}}}({\hat{F}}_I)_s \Pi _1{\hat{v}} dsdt -\iint _{{\hat{K}}} ({\hat{F}}_I)_s\Pi _1{\hat{v}} dsd^ht=0. \end{aligned}$$

Thus we have

$$\begin{aligned}&\iint _{{\hat{K}}} \left[ ({\hat{a}}{\hat{u}}_s)_I\right] _s{\hat{v}} dsd^ht -\iint _{{\hat{K}}} \left[ ({\hat{a}}{\hat{u}}_s)_I\right] _s{\hat{v}} dsdt = -{\hat{E}}_{{\hat{v}}}(({\hat{a}}{\hat{u}}_s)_s) = -{\hat{E}}_{{\hat{v}} - \Pi _1{\hat{v}}} (({\hat{a}}{\hat{u}}_s)_s)\\&\quad \le C [({\hat{a}}{\hat{u}}_s)_s]_{k,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}} \le C |{\hat{a}}{\hat{u}}_s|_{k+1,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}} = {\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+1,\infty , e}\Vert u\Vert _{k+2,e}|v|_{2,e}. \end{aligned}$$

Now we only need to discuss the line integral term. Let \(L_2\) and \(L_4\) denote the left and right boundaries of \(\Omega \), and let \(l^e_2\) and \(l^e_4\) denote the left and right edges of an element e (\(l^{{\hat{K}}}_2\) and \(l^{{\hat{K}}}_4\) for \({\hat{K}}\)). Since \(({\hat{a}}{\hat{u}}_s)_I{\hat{v}}\) mapped back to e becomes \(\frac{1}{h}(au_x)_Iv\), which is continuous across \(l^e_2\) and \(l^e_4\), after summing over all elements e the line integrals along the interior edges cancel out and only the line integrals on \(L_2\) and \(L_4\) remain.

For a cell e adjacent to \(L_2\), consider its reference cell \({\hat{K}}\), and define a linear form \({\hat{E}}({\hat{f}} ) = \int _{-1}^1 {\hat{f}}(-1,t) dt - \int _{-1}^1 {\hat{f}}(-1,t)d^ht\), then we have

$$\begin{aligned} {\hat{E}}({\hat{f}} {\hat{v}}) \le C |{\hat{f}}|_{0,\infty , l^{{\hat{K}}}_2} |{\hat{v}}|_{0,\infty ,l^{{\hat{K}}}_2} \le C\Vert {\hat{f}}\Vert _{2,l^{{\hat{K}}}_2}\Vert {\hat{v}}\Vert _{0,l^{{\hat{K}}}_2}, \end{aligned}$$

which means that the mapping \({\hat{f}} \rightarrow {\hat{E}}({\hat{f}} {\hat{v}})\) is continuous with operator norm at most \(C\Vert {\hat{v}}\Vert _{0,l^{{\hat{K}}}_2}\) for some constant C. Clearly we have

$$\begin{aligned} {\hat{E}}({\hat{f}} {\hat{v}}) = {\hat{E}}({\hat{f}} \Pi _1 {\hat{v}}) + {\hat{E}}({\hat{f}}( {\hat{v}}-\Pi _1 {\hat{v}})). \end{aligned}$$

By Theorem 3.1 we get

$$\begin{aligned}&{\hat{E}}(({\hat{a}}{\hat{u}}_s)_I({\hat{v}}-\Pi _1 {\hat{v}})) \le C[({\hat{a}}{\hat{u}}_s)_I]_{k,l^{{\hat{K}}}_2}[{\hat{v}}]_{2, l^{{\hat{K}}}_2} \le C(|{\hat{a}}{\hat{u}}_s-({\hat{a}}{\hat{u}}_s)_I|_{k,l^{{\hat{K}}}_2}+|{\hat{a}}{\hat{u}}_s|_{k,l^{{\hat{K}}}_2}) [{\hat{v}}]_{2,l^{{\hat{K}}}_2}\\&\quad \le C(|{\hat{a}}{\hat{u}}_s|_{k+1,l^{{\hat{K}}}_2}+|{\hat{a}} {\hat{u}}_s|_{k,l^{{\hat{K}}}_2})[{\hat{v}}]_{2,l^{{\hat{K}}}_2} = \mathcal O(h^{k+2})\Vert a\Vert _{k+1,\infty , l^e_2}\Vert u\Vert _{k+2,l^e_2}[v]_{2,l^{e}_2}, \end{aligned}$$

where the first inequality comes from the accuracy of the \((k+1)\)-point Gauss–Lobatto quadrature rule, i.e. \(\hat{E}({\hat{f}})=0,\, \forall {\hat{f}} \in Q^{2k-1}({\hat{K}})\). The \((k+1)\)-point Gauss–Lobatto quadrature rule also gives

$$\begin{aligned} {\hat{E}}(({\hat{a}}{\hat{u}}_s)_I\Pi _1 {\hat{v}}) = 0. \end{aligned}$$
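The exactness property invoked here can be checked numerically. The following sketch (in Python with numpy; `gauss_lobatto` is our own helper, and the degrees tested are our choice) confirms that the \((k+1)\)-point Gauss–Lobatto rule on \([-1,1]\) is exact for polynomials of degree up to \(2k-1\) but not for degree \(2k\).

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def gauss_lobatto(n):
    """n-point Gauss-Lobatto rule on [-1,1]: nodes are +-1 and the roots of
    P'_{n-1}; weights w_i = 2/(n(n-1)P_{n-1}(x_i)^2). Exact for degree 2n-3."""
    P = Legendre.basis(n - 1)
    x = np.concatenate(([-1.0], np.sort(P.deriv().roots().real), [1.0]))
    w = 2.0 / (n * (n - 1) * P(x) ** 2)
    return x, w

results = {}
for k in (2, 3, 4):
    x, w = gauss_lobatto(k + 1)
    exact = lambda m: 0.0 if m % 2 else 2.0 / (m + 1)   # int_{-1}^1 t^m dt
    err = lambda m: abs(np.dot(w, x ** m) - exact(m))
    results[k] = (max(err(m) for m in range(2 * k)),    # degrees <= 2k-1: exact
                  err(2 * k))                           # degree 2k: not exact
```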

For the third term in (3.12), we sum it over all the elements. Then for the line integral along \(L_2\),

$$\begin{aligned}&\sum _{e \cap L_2\ne \emptyset }\int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I(-1,t){\hat{v}}(-1,t) dt - \sum _{e \cap L_2\ne \emptyset } \int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I(-1,t){\hat{v}}(-1,t) d^ht \\&\quad = \sum _{e \cap L_2\ne \emptyset }{\hat{E}}(({\hat{a}}{\hat{u}}_s)_I{\hat{v}} ) = \sum _{e \cap L_2\ne \emptyset } {\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+1,\infty , l^e_2}\Vert u\Vert _{k+2,l^e_2}|v|_{2,l^{e}_2}. \end{aligned}$$

Let \(s_\alpha \) and \(\omega _\alpha \) (\(\alpha =1,2,\ldots , k+2\)) denote the quadrature points and weights of the \((k+2)\)-point Gauss–Lobatto quadrature rule for \(s\in [-1,1]\). Since \({\hat{v}}_{tt}^2(s, t)\in Q^{2k}({\hat{K}})\), the \((k+2)\)-point Gauss–Lobatto quadrature is exact for the s-integration, thus

$$\begin{aligned} \int _{-1}^1\int _{-1}^1 {\hat{v}}_{tt}^2(s, t)dsdt=\sum _{\alpha =1}^{k+2}\omega _\alpha \int _{-1}^1 {\hat{v}}_{tt}^2(s_\alpha , t)dt, \end{aligned}$$

which, since the weights \(\omega _\alpha \) are positive and \(s=\pm 1\) are quadrature points, implies

$$\begin{aligned} \int _{-1}^1 {\hat{v}}_{tt}^2(\pm 1, t)dt \le C\int _{-1}^1\int _{-1}^1 {\hat{v}}_{tt}^2(s, t)dsdt, \end{aligned}$$
(3.13)

thus, after mapping back to the element \(e\),

$$\begin{aligned} h^{\frac{1}{2}}|v|_{2,l^{e}_2}\le C [v]_{2,e}. \end{aligned}$$
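The step leading to (3.13) relies on the positivity of the Gauss–Lobatto weights: each boundary term of the exact quadrature sum is bounded by the whole sum. A numerical sketch under our own choices (\(k=3\), a random \({\hat{v}}\in Q^k({\hat{K}})\); `gauss_lobatto` is our helper):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def gauss_lobatto(n):
    """n-point Gauss-Lobatto nodes/weights on [-1,1], exact for degree 2n-3."""
    P = Legendre.basis(n - 1)
    x = np.concatenate(([-1.0], np.sort(P.deriv().roots().real), [1.0]))
    return x, 2.0 / (n * (n - 1) * P(x) ** 2)

k = 3
rng = np.random.default_rng(0)
c = rng.standard_normal((k + 1, k + 1))  # v(s,t) = sum_ij c[i,j] s^i t^j in Q^k

def vtt(s, t):
    """second derivative of v with respect to t"""
    return sum(c[i, j] * j * (j - 1) * s ** i * t ** (j - 2)
               for i in range(k + 1) for j in range(2, k + 1))

xg, wg = leggauss(12)                          # dense Gauss rule (exact here)
line = lambda s: np.dot(wg, vtt(s, xg) ** 2)   # int_{-1}^1 v_tt(s,t)^2 dt
lhs = np.dot(wg, [line(s) for s in xg])        # double integral of v_tt^2
xl, wl = gauss_lobatto(k + 2)
rhs = np.dot(wl, [line(s) for s in xl])        # (k+2)-point rule in s: exact
# positive weights at the boundary nodes s = +-1 give the trace bound (3.13)
trace = max(line(-1.0), line(1.0))
```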

By the Cauchy–Schwarz inequality and the trace inequality, we have

$$\begin{aligned}&\sum _{e \cap L_2\ne \emptyset } \left( \left. \int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I{\hat{v}} dt \right| ^{s=1}_{s=-1} - \left. \int _{-1}^1 ({\hat{a}}{\hat{u}}_s)_I{\hat{v}} d^ht \right| ^{s=1}_{s=-1} \right) \\&\quad = \sum _{e \cap L_2\ne \emptyset } {\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+1,\infty , l^e_2}\Vert u\Vert _{k+2,l^e_2}|v|_{2,l^{e}_2}\\&\quad = \sum _{e \cap L_2\ne \emptyset } \mathcal O\left( h^{k+\frac{3}{2}}\right) \Vert a\Vert _{k+1,\infty ,l^e_2}\Vert u\Vert _{k+2,l^e_2}|v|_{2,e} \\&\quad = \mathcal O\left( h^{k+\frac{3}{2}}\right) \Vert a\Vert _{k+1,\infty ,\Omega }\Vert u\Vert _{k+2,L_2}|v|_{2,\Omega } \\&\quad = {\mathcal {O}}\left( h^{k+\frac{3}{2}}\right) \Vert a\Vert _{k+1,\infty ,\Omega }\Vert u\Vert _{k+3,\Omega }|v|_{2,\Omega }. \end{aligned}$$

Combining all the estimates above, we get (3.5b). The \(\frac{1}{2}\) order loss is only due to the line integral along the boundary \(\partial \Omega \). If \(v\in V_0^h\), then \(v_{yy}=0\) on \(L_2\) and \(L_4\), so we have (3.5a). \(\square \)

4 Superconvergence of Bilinear Forms

The M-type projection in [3, 4] is a very convenient tool for discussing the superconvergence of function values. Let \(u_p\) be the M-type \(Q^k\) projection of the smooth exact solution u; its definition will be given in the following subsection. To establish the superconvergence of the original finite element method (1.1) for a generic elliptic problem (2.3) with smooth coefficients, one can show the following superconvergence of bilinear forms, see [4, 14] (see also [13] for a detailed proof):

$$\begin{aligned} A(u-u_p,v_h)= {\left\{ \begin{array}{ll} {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0,\\ {\mathcal {O}}(h^{k+\frac{3}{2}})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{array}\right. } \end{aligned}$$

In this section we will show the superconvergence of the bilinear form \(A_h\):

$$\begin{aligned} A_h(u-u_p,v_h)= {\left\{ \begin{array}{ll} {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h_0,\\ {\mathcal {O}}(h^{k+\frac{3}{2}})\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{array}\right. } \end{aligned}$$
(4.1)

4.1 Definition of M-Type Projection

We first recall the definition of the M-type projection; a more detailed account can be found in [13]. Legendre polynomials on the reference interval \([-1,1]\) are given as

$$\begin{aligned} l_k(t)=\frac{1}{2^k k!}\frac{d^k}{dt^k} (t^2-1)^k: l_0(t)=1, l_1(t)=t, l_2(t)=\frac{1}{2}(3t^2-1), \ldots , \end{aligned}$$

which are \(L^2\)-orthogonal to one another. Define their antiderivatives as M-type polynomials:

$$\begin{aligned}&M_{k+1}(t)=\frac{1}{2^k k!}\frac{d^{k-1}}{dt^{k-1}} (t^2-1)^k: M_0(t)=1, M_1(t)=t, M_2(t)=\frac{1}{2}(t^2-1),\\&M_3(t)=\frac{1}{2}(t^3-t),\ldots . \end{aligned}$$

These M-type polynomials satisfy the following properties:

  • If \(j-i\ne 0, \pm 2\), then \(M_i(t)\perp M_j(t)\), i.e., \(\int _{-1}^1 M_i(t)M_j(t) dt=0.\)

  • Roots of \(M_k(t)\) are the k-point Gauss–Lobatto quadrature points for \([-1,1]\).
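Both properties can be verified numerically. The sketch below builds \(M_j\) as the antiderivative of \(l_{j-1}\), normalized so that \(M_j(\pm 1)=0\) for \(j\ge 2\) (by the standard identity this equals \(\frac{l_j-l_{j-2}}{2j-1}\)); the index ranges tested are our choice.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def M(j):
    """M-type polynomial: M_0 = 1, M_1 = t, and for j >= 2 the antiderivative
    of the Legendre polynomial l_{j-1} normalized so that M_j(+-1) = 0."""
    if j == 0:
        return Legendre([1.0])
    F = Legendre.basis(j - 1).integ()
    return F - F(1.0) if j >= 2 else F

def integral(p):
    """definite integral of a Legendre-series polynomial over [-1,1]"""
    q = p.integ()
    return q(1.0) - q(-1.0)

# property 1: <M_i, M_j> = 0 whenever j - i is not in {0, +-2}
orth_ok = all(abs(integral(M(i) * M(j))) < 1e-12
              for i in range(6) for j in range(6) if abs(i - j) not in (0, 2))

# property 2: roots of M_k are the k-point Gauss-Lobatto points
def gauss_lobatto_nodes(n):
    return np.concatenate(([-1.0],
                           np.sort(Legendre.basis(n - 1).deriv().roots().real),
                           [1.0]))

roots_ok = all(np.allclose(np.sort(M(k).roots().real), gauss_lobatto_nodes(k),
                           atol=1e-10) for k in range(2, 6))
```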

Since Legendre polynomials form a complete orthogonal basis of \(L^{2}([-1,1])\), for any \({\hat{f}}(t)\in H^1([-1,1])\), its derivative \({\hat{f}}'(t)\) can be expressed as a Fourier–Legendre series

$$\begin{aligned} {\hat{f}}'(t)=\sum _{j=0}^{\infty }{\hat{b}}_{j+1}l_j(t), \quad {\hat{b}}_{j+1}=\left( j+\frac{1}{2}\right) \int _{-1}^1 {\hat{f}}'(t)l_j(t)dt. \end{aligned}$$

The one-dimensional M-type projection is defined as \( {\hat{f}}_k(t)=\sum _{j=0}^{k}{\hat{b}}_{j}M_j(t), \) where \({\hat{b}}_1=\frac{{\hat{f}}(1)-{\hat{f}}(-1)}{2}\) comes from the series above and \({\hat{b}}_0=\frac{{\hat{f}}(1)+{\hat{f}}(-1)}{2}\) is chosen so that \({\hat{f}}_k(\pm 1)={\hat{f}}(\pm 1)\). We have \( {\hat{f}}(t)=\lim \limits _{k\rightarrow \infty }{\hat{f}}_k(t)=\sum \limits _{j=0}^{\infty }{\hat{b}}_{j}M_j(t). \) The remainder \({\hat{R}}[{\hat{f}}]_k(t)\) of the one-dimensional M-type projection is

$$\begin{aligned} {\hat{R}}[{\hat{f}}]_k(t)={\hat{f}}(t)-{\hat{f}}_k(t)=\sum _{j=k+1}^{\infty }{\hat{b}}_{j}M_j(t). \end{aligned}$$
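As a concrete illustration of the one-dimensional construction, the sketch below computes the coefficients \({\hat{b}}_j\) by quadrature for the test function \({\hat{f}}(t)=e^t\) (our choice, not from the paper) and checks the endpoint interpolation \({\hat{f}}_k(\pm 1)={\hat{f}}(\pm 1)\) together with the decay of the remainder as k grows.

```python
import numpy as np
from numpy.polynomial.legendre import Legendre, leggauss

def M(j):
    """M_0 = 1, M_1 = t, and M_j = antiderivative of l_{j-1} with M_j(+-1)=0."""
    if j == 0:
        return Legendre([1.0])
    F = Legendre.basis(j - 1).integ()
    return F - F(1.0) if j >= 2 else F

f = fp = np.exp                    # test function f(t) = e^t, so f' = e^t
xg, wg = leggauss(60)              # dense rule for the coefficient integrals

def b(j):
    if j == 0:
        return (f(1.0) + f(-1.0)) / 2.0
    # b_j = (j - 1/2) int_{-1}^1 f'(t) l_{j-1}(t) dt
    return (j - 0.5) * np.dot(wg, fp(xg) * Legendre.basis(j - 1)(xg))

def proj(k, t):
    """1D M-type projection f_k(t) = sum_{j=0}^k b_j M_j(t)"""
    return sum(b(j) * M(j)(t) for j in range(k + 1))

tt = np.linspace(-1.0, 1.0, 201)
endpoint_err = max(abs(proj(4, 1.0) - f(1.0)), abs(proj(4, -1.0) - f(-1.0)))
errors = [np.max(np.abs(proj(k, tt) - f(tt))) for k in (2, 3, 4, 5)]
```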

For a function \({\hat{f}}(s,t)\in H^2({\hat{K}})\) on the reference cell \({\hat{K}}=[-1,1]\times [-1,1]\), its two-dimensional M-type expansion is given as

$$\begin{aligned} {\hat{f}}(s,t)=\sum _{i=0}^\infty \sum _{j=0}^\infty {\hat{b}}_{i,j} M_i(s)M_j(t), \end{aligned}$$

where

$$\begin{aligned} {\hat{b}}_{0,0}&=\frac{1}{4}[{\hat{f}}(-1,-1)+{\hat{f}}(-1,1)+{\hat{f}}(1,-1)+{\hat{f}}(1,1)],\\ {\hat{b}}_{0,j}, {\hat{b}}_{1,j}&=\frac{2j-1}{4}\int _{-1}^1 [{\hat{f}}_t(1,t)\pm {\hat{f}}_t(-1,t)]l_{j-1}(t)dt, \quad j\ge 1,\\ {\hat{b}}_{i,0}, {\hat{b}}_{i,1}&=\frac{2i-1}{4}\int _{-1}^1 [{\hat{f}}_s(s,1)\pm {\hat{f}}_s(s,-1)]l_{i-1}(s)ds, \quad i\ge 1,\\ {\hat{b}}_{i,j}&=\frac{(2i-1)(2j-1)}{4}\iint _{{\hat{K}}}{\hat{f}}_{st}(s,t)l_{i-1}(s)l_{j-1}(t)dsdt, \quad i,j\ge 1. \end{aligned}$$

The M-type \(Q^k\) projection of \({\hat{f}}\) on \({\hat{K}}\) and its remainder are defined as

$$\begin{aligned} {\hat{f}}_{k,k}(s,t)=\sum _{i=0}^k\sum _{j=0}^k {\hat{b}}_{i,j} M_i(s)M_j(t), \quad {\hat{R}}[{\hat{f}}]_{k,k}(s,t)={\hat{f}}(s,t)-{\hat{f}}_{k,k}(s,t). \end{aligned}$$

The M-type \(Q^k\) projection is equivalent to the point-line-plane interpolation used in [14, 15]. See Theorem 3.1 in [13] for the proof of the following fact:

Theorem 4.1

For \(k\ge 2\), the M-type \(Q^k\) projection is equivalent to the \(Q^k\) point-line-plane projection \(\Pi \) defined as follows:

  1. \(\Pi {\hat{u}}={\hat{u}}\) at the four corners of \({\hat{K}}=[-1,1]\times [-1,1]\).

  2. \(\Pi {\hat{u}}-{\hat{u}}\) is orthogonal to polynomials of degree \(k-2\) on each edge of \({\hat{K}}\).

  3. \(\Pi {\hat{u}}-{\hat{u}}\) is orthogonal to any \({\hat{v}}\in Q^{k-2}({\hat{K}})\) on \({\hat{K}}\).

For f(x,y) on \(e=[x_e-h, x_e+h]\times [y_e-h, y_e+h]\), let \({\hat{f}}(s,t)= f( sh+x_e, t h+y_e)\); then the M-type \(Q^k\) projection of f on e and its remainder are defined as

$$\begin{aligned} f_{k,k}(x,y)={\hat{f}}_{k,k}\left( \frac{x-x_e}{h},\frac{y-y_e}{h}\right) , \quad R[ f]_{k,k}(x,y)= f(x,y)- f_{k,k}(x,y). \end{aligned}$$

Now consider a function \(u(x, y)\in H^{k+2} (\Omega )\) and let \(u_p (x, y)\) denote its piecewise M-type \(Q^k\) projection on each element e in the mesh \(\Omega _h\). The first two properties in Theorem 4.1 imply that \(u_p (x, y)\) on each edge of e is uniquely determined by u(x,y) along that edge. So \(u_p (x, y)\) is a continuous piecewise \(Q^k\) polynomial on \(\Omega _h\).

The M-type projection has the following properties; see Theorem 3.2 and Lemmas 3.1 and 3.2 in [13] for the proofs.

Theorem 4.2

For \(k \ge 2\),

$$\begin{aligned}&\Vert u-u_p\Vert _{2,Z_0}={\mathcal {O}}(h^{k+2}) \Vert u\Vert _{k+2},\quad \forall u\in H^{k+2}(\Omega ).\\&\Vert u-u_p\Vert _{\infty ,Z_0}={\mathcal {O}}(h^{k+2}) \Vert u\Vert _{k+2,\infty }, \quad \forall u\in W^{k+2,\infty }(\Omega ). \end{aligned}$$

Lemma 4.3

For \({\hat{f}} \in H^{k+1}({\hat{K}})\), \(k \ge 2\),

  1. \(|{\hat{R}}[{\hat{f}}]_{k,k}|_{0,\infty ,{\hat{K}}}\le C [{\hat{f}}]_{k+1,{\hat{K}}},\quad |\partial _s {\hat{R}}[{\hat{f}}]_{k,k}|_{0,\infty , {\hat{K}}}\le C[{\hat{f}}]_{k+1,{\hat{K}}}.\)

  2. \({\hat{R}}[{\hat{f}}]_{k+1,k+1}-{\hat{R}}[{\hat{f}}]_{k,k}=M_{k+1}(t)\sum _{i=0}^k {\hat{b}}_{i,k+1}M_{i}(s)+M_{k+1}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t).\)

  3. \(|{\hat{b}}_{i,k+1}|\le C_k |{\hat{f}}|_{k+1,2,{\hat{K}}},|{\hat{b}}_{k+1,i}|\le C_k |{\hat{f}}|_{k+1,2,{\hat{K}}},\quad 0\le i\le k+1.\)

  4. If \({\hat{f}}\in H^{k+2}({\hat{K}})\), then \(|{\hat{b}}_{i,k+1}|\le C_k |{\hat{f}}|_{k+2,2,{\hat{K}}},\quad 1\le i\le k+1.\)

4.2 Estimates of M-Type Projection with Quadrature

Lemma 4.4

Assume \({\hat{f}}(s,t) \in H^{k+3}({\hat{K}})\) and \(k \ge 2\). Then

$$\begin{aligned} \langle {\hat{R}}[{\hat{f}}]_{k+1,k+1}- {\hat{R}}[{\hat{f}}]_{k,k}, 1 \rangle _{{\hat{K}}}=0,\quad |\langle \partial _s {\hat{R}}[{\hat{f}}]_{k+1,k+1} , 1 \rangle _{{\hat{K}}}| \le C |{\hat{f}}|_{k+3,{\hat{K}}}. \end{aligned}$$

Proof

First, we have

$$\begin{aligned}&\langle {\hat{R}}[{\hat{f}}]_{k+1,k+1}- {\hat{R}}[{\hat{f}}]_{k,k}, 1 \rangle _{{\hat{K}}}=\left\langle M_{k+1}(t)\sum _{i=0}^k {\hat{b}}_{i,k+1}M_{i}(s)\right. \\&\quad \left. +M_{k+1}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t) , 1 \right\rangle _{{\hat{K}}}=0 \end{aligned}$$

due to the fact that roots of \(M_{k+1}(t)\) are the \((k+1)\)-point Gauss–Lobatto quadrature points for \([-1,1]\).

We have

$$\begin{aligned}&\langle \partial _s{\hat{R}}[{\hat{f}}]_{k+1,k+1} , 1 \rangle _{{\hat{K}}} \\&\quad = \langle \partial _s{\hat{R}}[{\hat{f}}]_{k+2,k+2} , 1 \rangle _{{\hat{K}}} - \langle \partial _s({\hat{R}}[{\hat{f}}]_{k+2,k+2}-{\hat{R}}[{\hat{f}}]_{k+1,k+1}) , 1 \rangle _{{\hat{K}}}\\&\quad = \langle \partial _s{\hat{R}}[{\hat{f}}]_{k+2,k+2} , 1 \rangle _{{\hat{K}}} - \left\langle M_{k+2}(t)\sum _{i=0}^{k+1} {\hat{b}}_{i,k+2}M_{i}'(s)\right. \\&\qquad \left. +M_{k+2}'(s)\sum _{j=0}^{k+2}{\hat{b}}_{k+2,j}M_j(t),1 \right\rangle _{{\hat{K}}}\\&\quad = \langle \partial _s{\hat{R}}[{\hat{f}}]_{k+2,k+2} , 1 \rangle _{{\hat{K}}} - \left\langle M_{k+2}(t)\sum _{i=0}^k {\hat{b}}_{i+1,k+2}l_{i}(s),1 \right\rangle _{{\hat{K}}}\\&\qquad - \left\langle l_{k+1}(s)\sum _{j=0}^{k+2}{\hat{b}}_{k+2,j}M_j(t),1 \right\rangle _{{\hat{K}}}. \end{aligned}$$

Then by Lemma 4.3,

$$\begin{aligned} |\langle \partial _s{\hat{R}}[{\hat{f}}]_{k+2,k+2} , 1 \rangle _{{\hat{K}}}|\le C |{\hat{f}}|_{k+3,{\hat{K}}}. \end{aligned}$$

Notice that we have \(\langle l_{k+1}(s)\sum _{j=0}^{k+2}{\hat{b}}_{k+2,j}M_j(t),1 \rangle _{{\hat{K}}}=0\) since the \((k+1)\)-point Gauss–Lobatto quadrature for s-integration is exact and \(l_{k+1}(s)\) is orthogonal to 1. Lemma 4.3 implies \(|{\hat{b}}_{i+1,k+2}|\le C[{\hat{f}}]_{k+3,{\hat{K}}}\) for \(i\ge 0\), thus we have

$$\begin{aligned} \left| \left\langle M_{k+2}(t)\sum _{i=0}^k {\hat{b}}_{i+1,k+2}l_{i}(s),1 \right\rangle _{{\hat{K}}}\right| \le C[{\hat{f}}]_{k+3,{\hat{K}}}. \end{aligned}$$

\(\square \)

Lemma 4.5

Assume \(a(x,y)\in W^{k,\infty }(\Omega )\), \(u(x,y)\in H^{k+3}(\Omega )\) and \(k \ge 2\). Then

$$\begin{aligned} \langle a (u-u_p)_x, (v_h)_x\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{aligned}$$

Proof

As before, we ignore the subscript of \(v_h\) for simplicity. We have

$$\begin{aligned} \langle a (u-u_p)_x, v_x\rangle _h=\sum _e \langle a (u-u_p)_x, v_x\rangle _{e,h}, \end{aligned}$$

and on each cell e,

$$\begin{aligned}&\langle a(u-u_p)_x,v_x \rangle _{e,h} =\langle (R[u]_{k,k})_x, av_x\rangle _{e,h} = \langle ({\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}_s\rangle _{{\hat{K}}}\nonumber \\&\quad =\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_s\rangle _{{\hat{K}}}+\langle ({\hat{R}}[{\hat{u}}]_{k,k}-{\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_s\rangle _{{\hat{K}}}. \end{aligned}$$
(4.2)

For the first term in (4.2), we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s , {\hat{a}} {\hat{v}}_s\rangle _{{\hat{K}}} =\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}} \overline{{\hat{v}}_s}\rangle _{{\hat{K}}}+\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}( {\hat{v}}_s-\overline{{\hat{v}}_s})\rangle _{{\hat{K}}}. \end{aligned}$$

By Lemma 4.4,

$$\begin{aligned} \langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, \overline{{\hat{a}}}\, \overline{{\hat{v}}_s}\rangle _{{\hat{K}}}\le C|{\hat{a}}|_{0,\infty }|{\hat{u}}|_{k+3,{\hat{K}}}|{\hat{v}}|_{1,\hat{K}}. \end{aligned}$$

By Lemma 4.3,

$$\begin{aligned} |({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s|_{0,\infty ,{\hat{K}}}\le C[{\hat{u}}]_{k+2,{\hat{K}}}. \end{aligned}$$

By the Bramble–Hilbert Lemma (Theorem 3.1) we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}} \overline{{\hat{v}}_s}\rangle _{{\hat{K}}} =\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, \overline{{\hat{a}}}\, \overline{{\hat{v}}_s}\rangle _{{\hat{K}}} +\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, ({\hat{a}}-\overline{{\hat{a}}})\overline{{\hat{v}}_s}\rangle _{{\hat{K}}}\\&\quad \le C(|{\hat{a}}|_{0,\infty }|{\hat{u}}|_{k+3,{\hat{K}}}|{\hat{v}}|_{1,\hat{K}}+|{\hat{a}}-\overline{{\hat{a}}}|_{0,\infty }|{\hat{u}}|_{k+2,{\hat{K}}}|{\hat{v}}|_{1,{\hat{K}}})\\&\quad \le C(|{\hat{a}}|_{0,\infty }|{\hat{u}}|_{k+3,{\hat{K}}} |{\hat{v}}|_{1,\hat{K}}+|{\hat{a}}|_{1,\infty }|{\hat{u}}|_{k+2, {\hat{K}}}|{\hat{v}}|_{1,{\hat{K}}})\\&\quad ={\mathcal {O}}(h^{k+2}) \Vert a\Vert _{1,\infty ,e}\Vert u\Vert _{k+3,e}\Vert v\Vert _{1,e}, \end{aligned}$$

and

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s , {\hat{a}}( {\hat{v}}_s -\overline{{\hat{v}}_s})\rangle _{{\hat{K}}} \le C[{\hat{u}}]_{k+2,2,{\hat{K}}}|{\hat{a}}|_{0,\infty ,{\hat{K}}}|{\hat{v}}_s -\overline{{\hat{v}}_s}|_{0,\infty ,{\hat{K}}}\\&\quad \le C[{\hat{u}}]_{k+2,2,{\hat{K}}}|{\hat{a}}|_{0,\infty ,{\hat{K}}}|{\hat{v}}_s -\overline{{\hat{v}}_s}|_{0,2,{\hat{K}}} \\&\quad = {\mathcal {O}}(h^{k+2})[ u]_{k+2,2,e}|a|_{0,\infty ,e}|v|_{2,2,e}. \end{aligned}$$

Thus,

$$\begin{aligned} \langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s , {\hat{a}} {\hat{v}}_s \rangle _{{\hat{K}}} = {\mathcal {O}}(h^{k+2}) \Vert a\Vert _{1,\infty ,e}| u|_{k+3,2,e} \Vert v\Vert _{2,e}. \end{aligned}$$
(4.3)

For the second term in (4.2), we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k,k}-{\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_s\rangle _{{\hat{K}}}\nonumber \\&\quad =-\left\langle (M_{k+1}(t)\sum _{i=0}^k {\hat{b}}_{i,k+1}M_{i}(s)+M_{k+1}(s)\sum _{j=0}^{k+1} {\hat{b}}_{k+1,j}M_j(t))_s, {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}}\nonumber \\&\quad =-\left\langle M_{k+1}(t)\sum _{i=0}^{k-1} {\hat{b}}_{i+1,k+1}l_{i}(s)+l_{k}(s)\sum _{j=0}^{k+1} {\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}}\nonumber \\&\quad =-\left\langle M_{k+1}(t)\sum _{i=0}^{k-1} {\hat{b}}_{i+1,k+1}l_{i}(s), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}} - \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}}. \end{aligned}$$
(4.4)

Since \(M_{k+1}(t)\) vanishes at the \((k+1)\) Gauss–Lobatto points, we have

$$\begin{aligned} \left\langle M_{k+1}(t)\sum _{i=0}^{k-1} {\hat{b}}_{i+1,k+1}l_{i}(s), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}}=0. \end{aligned}$$

For the second term in (4.4),

$$\begin{aligned}&\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}}= \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}\overline{{\hat{v}}_s}\right\rangle _{{\hat{K}}}\\&\qquad +\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}({\hat{v}}_s- \overline{{\hat{v}}_s})\right\rangle _{{\hat{K}}}\\&\quad = \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), ({\hat{a}}-{{\hat{\Pi }}}_1{\hat{a}})\overline{{\hat{v}}_s} \right\rangle _{{\hat{K}}}\\&\qquad +\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), ({{\hat{\Pi }}}_1{\hat{a}})\overline{{\hat{v}}_s}\right\rangle _{{\hat{K}}}\\&\qquad + \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t),({\hat{a}} -\overline{{\hat{a}}})({\hat{v}}_s- \overline{{\hat{v}}_s})\right\rangle _{{\hat{K}}}\\&\qquad +\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), \overline{{\hat{a}}}({\hat{v}}_s- \overline{{\hat{v}}}_s)\right\rangle _{{\hat{K}}}\\&\quad = \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), ({\hat{a}}-{{\hat{\Pi }}}_1{\hat{a}})\overline{{\hat{v}}}_s \right\rangle _{{\hat{K}}}\\&\qquad +\left\langle l_{k}(s)\sum _{j=0}^{k+1} {\hat{b}}_{k+1,j}M_j(t),({\hat{a}} -\overline{{\hat{a}}})({\hat{v}}_s- \overline{{\hat{v}}}_s)\right\rangle _{{\hat{K}}}, \end{aligned}$$

where the last step is due to the facts that \(({\hat{\Pi }}_1 {\hat{a}}) \overline{{\hat{v}}_s}\) and \(\overline{{\hat{a}}}({\hat{v}}_s- \overline{{\hat{v}}}_s)\) are polynomials of degree at most \(k-1\) with respect to the variable s, the \((k+1)\)-point Gauss–Lobatto quadrature for the s-integration is exact for polynomials of degree \(2k-1\), and \(l_k(s)\) is orthogonal to polynomials of lower degree. With Lemma 4.3, we have

$$\begin{aligned}&\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}_s\right\rangle _{{\hat{K}}} \le C|{\hat{u}}|_{k+1,2,{\hat{K}}}(|{\hat{a}}|_{2,\infty }|{\hat{v}}|_{1,{\hat{K}}}+ |{\hat{a}}|_{1,\infty }|{\hat{v}}|_{2,{\hat{K}}})\nonumber \\&\quad ={\mathcal {O}}(h^{k+2}) \Vert a\Vert _{2,\infty }\Vert u\Vert _{k+1, e}\Vert v\Vert _{2,e}. \end{aligned}$$
(4.5)

Combined with (4.3), this proves the estimate. \(\square \)
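The mechanism used repeatedly in this proof — exactness of the \((k+1)\)-point Gauss–Lobatto rule for degree \(2k-1\) together with the orthogonality of \(l_k\) to lower-degree polynomials — can be isolated in a short numerical sketch (the choice \(k=3\) and the helper `gauss_lobatto` are ours):

```python
import numpy as np
from numpy.polynomial import Polynomial
from numpy.polynomial.legendre import Legendre

def gauss_lobatto(n):
    """n-point Gauss-Lobatto nodes/weights on [-1,1], exact for degree 2n-3."""
    P = Legendre.basis(n - 1)
    x = np.concatenate(([-1.0], np.sort(P.deriv().roots().real), [1.0]))
    return x, 2.0 / (n * (n - 1) * P(x) ** 2)

k = 3
x, w = gauss_lobatto(k + 1)          # exact for degree 2k - 1
lk = Legendre.basis(k)(x)            # l_k at the quadrature nodes
rng = np.random.default_rng(2)
p = Polynomial(rng.standard_normal(k))   # random polynomial of degree k - 1
# l_k * p has degree 2k - 1: the quadrature equals the exact integral, which
# vanishes because l_k is orthogonal to polynomials of degree < k
quad = np.dot(w, lk * p(x))
# for l_k * l_k (degree 2k) the rule is no longer exact; the value is positive
quad_lk2 = np.dot(w, lk ** 2)
```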

Lemma 4.6

Assume \(a(x,y)\in W^{2,\infty }(\Omega )\), \(u(x,y)\in H^{k+2}(\Omega )\) and \(k \ge 2\). Then

$$\begin{aligned} \langle a (u-u_p), v_h\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{2,\infty }\Vert u\Vert _{k+2}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{aligned}$$

Proof

As before, we ignore the subscript of \(v_h\) for simplicity and

$$\begin{aligned} \langle a(u-u_p), v\rangle _h=\sum _e \langle a(u-u_p), v\rangle _{e,h}. \end{aligned}$$

On each cell e we have

$$\begin{aligned}&\langle a(u-u_p), v\rangle _{e,h} =\langle R[u]_{k,k}, av\rangle _{e,h} = h^2\langle {\hat{R}}[{\hat{u}}]_{k,k}, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}\nonumber \\&\quad =h^2\langle {\hat{R}}[{\hat{u}}]_{k,k}, {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}+h^2\langle {\hat{R}}[{\hat{u}}]_{k,k}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}. \end{aligned}$$
(4.6)

For the first term in (4.6), due to the embedding \(H^{2}({\hat{K}})\hookrightarrow C^0({\hat{K}})\), the Bramble–Hilbert Lemma (Theorem 3.1) and Lemma 4.3, we have

$$\begin{aligned}&h^2\langle {\hat{R}}[{\hat{u}}]_{k,k}, {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}} \le C h^2 |{\hat{R}}[{\hat{u}}]_{k,k}|_{\infty }|{\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}|_{\infty } \le C h^2|{\hat{u}}|_{k+1,{\hat{K}}}\Vert {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}\Vert _{2,{\hat{K}}}\\&\quad \le C h^2|{\hat{u}}|_{k+1,{\hat{K}}}(\Vert {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}\Vert _{L^2({\hat{K}})}+|{\hat{a}}{\hat{v}}|_{1,{\hat{K}}}+|{\hat{a}}{\hat{v}}|_{2,{\hat{K}}})\\&\quad \le C h^2|{\hat{u}}|_{k+1,{\hat{K}}}(|{\hat{a}}{\hat{v}}|_{1,{\hat{K}}}+|{\hat{a}}{\hat{v}}|_{2,{\hat{K}}}) ={\mathcal {O}}(h^{k+2})\Vert a\Vert _{2,\infty , e}\Vert u\Vert _{k+1,e}\Vert v\Vert _{2,e}. \end{aligned}$$

For the second term in (4.6), we have

$$\begin{aligned} h^2\langle {\hat{R}}[{\hat{u}}]_{k,k}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}=h^2\langle {\hat{R}}[{\hat{u}}]_{k+1,k+1}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}-h^2\langle {\hat{R}}[{\hat{u}}]_{k+1,k+1}- {\hat{R}}[{\hat{u}}]_{k,k}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}. \end{aligned}$$

By Lemmas 4.3 and 4.4 we have

$$\begin{aligned} h^2\langle {\hat{R}}[{\hat{u}}]_{k+1,k+1}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}\le Ch^2 |{\hat{u}}|_{k+2,{\hat{K}}}|{\hat{a}} {\hat{v}}|_{0,{\hat{K}}}={\mathcal {O}}(h^{k+2})\Vert a\Vert _{0,\infty , e}\Vert u\Vert _{k+2,e}\Vert v\Vert _{0,e}, \end{aligned}$$

and

$$\begin{aligned}h^2\langle {\hat{R}}[{\hat{u}}]_{k+1,k+1}- {\hat{R}}[{\hat{u}}]_{k,k}, \overline{{\hat{a}}{\hat{v}}} \rangle _{{\hat{K}}}=0.\end{aligned}$$

Thus, we have \(\langle a (u-u_p), v_h\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{2,\infty }\Vert u\Vert _{k+2}\Vert v_h\Vert _2\). \(\square \)

Lemma 4.7

Assume \(a\in W^{2,\infty }(\Omega )\), \(u\in H^{k+3}(\Omega )\) and \(k\ge 2\). Then

$$\begin{aligned} \langle a (u-u_p)_x, v_h\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{2,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{aligned}$$

Proof

As before, we ignore the subscript in \(v_h\) and we have

$$\begin{aligned} \langle a(u-u_p)_x, v\rangle _h=\sum _e \langle a(u-u_p)_x, v\rangle _{e,h}. \end{aligned}$$

On each cell e, we have

$$\begin{aligned}&\langle a(u-u_p)_x, v\rangle _{e,h} =\langle (R[u]_{k,k})_x, av\rangle _{e,h} = h\langle ({\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}\nonumber \\&\quad =h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}-h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1}-{\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}. \end{aligned}$$
(4.7)

For the first term in (4.7), we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s , {\hat{a}} {\hat{v}}\rangle _{{\hat{K}}} = \langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, \overline{{\hat{a}} {\hat{v}}}\rangle _{{\hat{K}}}+\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}} {\hat{v}}-\overline{{\hat{a}} {\hat{v}}}\rangle _{{\hat{K}}}. \end{aligned}$$

Due to Lemma 4.4,

$$\begin{aligned} h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, \overline{{\hat{a}} {\hat{v}}}\rangle _{{\hat{K}}} \le Ch\Vert {\hat{a}}\Vert _{0,\infty }|{\hat{u}}|_{k+3,{\hat{K}}}\Vert {\hat{v}}\Vert _{0,{\hat{K}}}={\mathcal {O}}(h^{k+2})\Vert a\Vert _{0,\infty } \Vert u\Vert _{k+3,e}\Vert v\Vert _{0,e}, \end{aligned}$$

and by the same arguments as in the proof of Lemma 4.6 we have

$$\begin{aligned}&h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}} {\hat{v}}-\overline{{\hat{a}} {\hat{v}}}\rangle _{{\hat{K}}} \le C h |({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s|_{\infty }|{\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}|_{\infty }\\&\quad \le C h|{\hat{u}}|_{k+2,{\hat{K}}}\Vert {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}\Vert _{2,{\hat{K}}}\\&\quad \le C h|{\hat{u}}|_{k+2,{\hat{K}}}(\Vert {\hat{a}}{\hat{v}}- \overline{{\hat{a}}{\hat{v}}}\Vert _{L^2({\hat{K}})}+|{\hat{a}} {\hat{v}}|_{1,{\hat{K}}}+|{\hat{a}}{\hat{v}}|_{2,{\hat{K}}})\\&\quad \le C h|{\hat{u}}|_{k+2,{\hat{K}}}(|{\hat{a}}{\hat{v}}|_{1,{\hat{K}}}+|{\hat{a}}{\hat{v}}|_{2,{\hat{K}}}) ={\mathcal {O}}(h^{k+2})\Vert a\Vert _{2,\infty } \Vert u\Vert _{k+2,e}\Vert v\Vert _{2,e}. \end{aligned}$$

Thus

$$\begin{aligned} h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}={\mathcal {O}}(h^{k+2})\Vert a\Vert _{2,\infty } \Vert u\Vert _{k+3,e}\Vert v\Vert _{2,e}. \end{aligned}$$
(4.8)

For the second term in (4.7), we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1}-{\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}\\&\quad =\left\langle (M_{k+1}(t)\sum _{i=0}^k {\hat{b}}_{i,k+1}M_{i}(s)+M_{k+1}(s)\sum _{j=0}^{k+1} {\hat{b}}_{k+1,j}M_j(t))_s, {\hat{a}}{\hat{v}}\right\rangle _{{\hat{K}}}\\&\quad =\left\langle M_{k+1}(t)\sum _{i=0}^{k-1} {\hat{b}}_{i+1,k+1} l_{i}(s)+l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}\right\rangle _{{\hat{K}}}\\&\quad =\left\langle M_{k+1}(t)\sum _{i=0}^{k-1} {\hat{b}}_{i+1,k+1}l_{i}(s), {\hat{a}}{\hat{v}}\right\rangle _{{\hat{K}}} + \left\langle l_{k}(s) \sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}} \right\rangle _{{\hat{K}}}\\&\quad = \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}\right\rangle _{{\hat{K}}}, \end{aligned}$$

where the last step is due to the fact that \(M_{k+1}(t)\) vanishes at the \((k+1)\) Gauss–Lobatto points. Then

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1}-{\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}=\left\langle l_{k}(s) \sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}\right\rangle _{{\hat{K}}}\\&\quad = \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}-{\hat{\Pi }}_1( {\hat{a}}{\hat{v}})\right\rangle _{{\hat{K}}}+\left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{\Pi }}_1( {\hat{a}}{\hat{v}}) \right\rangle _{{\hat{K}}}\\&\quad = \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}-{\hat{\Pi }}_1( {\hat{a}}{\hat{v}})\right\rangle _{{\hat{K}}}, \end{aligned}$$

where the last step is due to the facts that \({\hat{\Pi }}_1( {\hat{a}}{\hat{v}})\) is a linear function in s thus the \((k+1)\)-point Gauss–Lobatto quadrature on s-variable is exact, and \(l_k(s)\) is orthogonal to linear functions.

By Lemma 4.3 and Theorem 3.1, we have

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1}-{\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}=\left\langle l_{k}(s) \sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}-{\hat{\Pi }}_1( {\hat{a}}{\hat{v}})\right\rangle _{{\hat{K}}}\\&\quad \le C|{\hat{u}}|_{k+1,{\hat{K}}}|{\hat{a}}{\hat{v}}|_{2,{\hat{K}}}\le C|{\hat{u}}|_{k+1,{\hat{K}}}(|{\hat{a}}|_{2,\infty , {\hat{K}}}|{\hat{v}}|_{0,{\hat{K}}}+|{\hat{a}}|_{1,\infty , {\hat{K}}}|{\hat{v}}|_{1,{\hat{K}}}+|{\hat{a}}|_{0,\infty }| {\hat{v}}|_{2,{\hat{K}}}). \end{aligned}$$

Thus

$$\begin{aligned} h\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1}-{\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}\rangle _{{\hat{K}}}=\mathcal O(h^{k+2})\Vert a\Vert _{2,\infty }\Vert u\Vert _{k+1,e}\Vert v\Vert _{2,e}. \end{aligned}$$
(4.9)

Combining (4.8) and (4.9) and summing over all the cells, we get the desired estimate. \(\square \)

Lemma 4.8

Assume \(a(x,y)\in W^{4,\infty }(\Omega )\), \(u(x,y)\in H^{k+3}(\Omega )\) and \(k \ge 2\). Then

$$\begin{aligned} \langle a (u-u_p)_x, (v_h)_y\rangle _h=\mathcal O(h^{k+2})\Vert a\Vert _{4,\infty }\Vert u\Vert _{k+3}\Vert v_h\Vert _2,\quad \forall v_h\in V^h. \end{aligned}$$
(4.10)

Proof

We ignore the subscript in \(v_h\) and we have

$$\begin{aligned} \langle a (u-u_p)_x, v_y\rangle _h=\sum _e \langle a (u-u_p)_x, v_y\rangle _{e,h}, \end{aligned}$$

and on each cell e

$$\begin{aligned}&\langle a(u-u_p)_x,v_y \rangle _{e,h} =\langle (R[u]_{k,k})_x, av_y\rangle _{e,h} = \langle ({\hat{R}}[{\hat{u}}]_{k,k})_s, {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}}\nonumber \\&\quad =\langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}} +\langle ({\hat{R}}[{\hat{u}}]_{k,k}-{\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}}. \end{aligned}$$
(4.11)

By the same arguments as in the proof of Lemma 4.5, we have

$$\begin{aligned} \langle ({\hat{R}}[{\hat{u}}]_{k+1,k+1})_s , {\hat{a}} {\hat{v}}_t \rangle _{{\hat{K}}} = {\mathcal {O}}(h^{k+2}) \Vert a\Vert _{1,\infty } |u|_{k+3,2,e} \Vert v\Vert _{2,e}, \end{aligned}$$
(4.12)

and

$$\begin{aligned} \langle ({\hat{R}}[{\hat{u}}]_{k,k}-{\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}} = - \left\langle l_{k}(s)\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t), {\hat{a}}{\hat{v}}_t\right\rangle _{{\hat{K}}}. \end{aligned}$$

For simplicity, we define

$$\begin{aligned} {\hat{b}}_{k+1}(t):=\sum _{j=0}^{k+1}{\hat{b}}_{k+1,j}M_j(t). \end{aligned}$$

Then, by the third and fourth estimates in Lemma 4.3, we have

$$\begin{aligned}&|{\hat{b}}_{k+1}(t)|\le C\sum _{j=0}^{k+1} |{\hat{b}}_{k+1,j}| \le C |{\hat{u}}|_{k+1,{\hat{K}}},\\&|{\hat{b}}_{k+1}^{(m)}(t)|\le C\sum _{j=m}^{k+1} |{\hat{b}}_{k+1,j}| \le C |{\hat{u}}|_{k+2,{\hat{K}}}, \quad 1\le m, \end{aligned}$$

where \({\hat{b}}_{k+1}^{(m)}(t)\) is the mth derivative of \({\hat{b}}_{k+1}(t)\). We use the same technique as in the proof of Theorem 3.7, and we write \(l_k = l_k(s)\), \({\hat{b}}_{k+1} = {\hat{b}}_{k+1}(t)\) in the following:

$$\begin{aligned}&\langle ({\hat{R}}[{\hat{u}}]_{k,k}- {\hat{R}}[{\hat{u}}]_{k+1,k+1})_s, {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}} = -\langle l_{k}(s){\hat{b}}_{k+1}(t), {\hat{a}}{\hat{v}}_t\rangle _{{\hat{K}}}\\&\quad = -\iint _{{\hat{K}}}l_{k}(s){\hat{b}}_{k+1}(t){\hat{a}} {\hat{v}}_td^hsd^ht =- \iint _{{\hat{K}}}(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}_td^hsd^ht\\&\quad =-\iint _{{\hat{K}}}(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}_td^hsd^ht +\iint _{{\hat{K}}}l_{k}{\hat{b}}_{k+1}{\hat{a}}{\hat{v}}_tdsdt - \iint _{{\hat{K}}}l_{k}{\hat{b}}_{k+1}{\hat{a}}{\hat{v}}_tdsdt, \end{aligned}$$

and

$$\begin{aligned}&- \iint _{{\hat{K}}}(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}_td^hsd^ht + \iint _{{\hat{K}}}l_{k}{\hat{b}}_{k+1}{\hat{a}}{\hat{v}}_tdsdt \\&\quad = \iint _{{\hat{K}}}\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}-(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I\right] {\hat{v}}_tdsdt + \iint _{{\hat{K}}}(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}_tdsdt \\&\qquad - \iint _{{\hat{K}}}(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}_td^hsdt\\&\quad = \iint _{{\hat{K}}}\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}-(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I\right] {\hat{v}}_tdsdt + \iint _{{\hat{K}}}\partial _t(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}d^hsdt \\&\qquad - \iint _{{\hat{K}}}\partial _t(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}}dsdt\\&\qquad +\left( \left. \int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} ds \right| ^{t=1}_{t=-1} - \left. \int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} d^hs \right| ^{t=1}_{t=-1} \right) = I + II + III. \end{aligned}$$

After integration by parts with respect to the variable s, we have

$$\begin{aligned}&\iint _{{\hat{K}}}l_{k}(s){\hat{b}}_{k+1}(t){\hat{a}}{\hat{v}}_t dsdt= -\iint _{{\hat{K}}}M_{k+1}(s){\hat{b}}_{k+1}(t)({\hat{a}}_s{\hat{v}}_t + {\hat{a}}{\hat{v}}_{st}) ds dt, \end{aligned}$$

which is exactly the integral estimated in the proof of Lemma 3.7 in [13]. Repeating that proof and summing over all elements, we obtain the estimate for the term \(\iint _{{\hat{K}}}l_{k}(s){\hat{b}}_{k+1}(t){\hat{a}}{\hat{v}}_t dsdt\):

$$\begin{aligned} \sum _e \iint _{{\hat{K}}}l_{k}(s){\hat{b}}_{k+1}(t){\hat{a}}{\hat{v}}_t dsdt= {\left\{ \begin{array}{ll} {\mathcal {O}}(h^{k+\frac{3}{2}})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v\Vert _2,&{} \forall v\in V^h,\\ {\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3}\Vert v\Vert _2,&{} \forall v\in V_0^h. \end{array}\right. } \end{aligned}$$

Then we can estimate the terms I, II and III separately, as in Theorem 3.7.

For term I, by Theorem 3.1 and the estimate (3.2), we have

$$\begin{aligned}&\iint _{{\hat{K}}}\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}-(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I\right] {\hat{v}}_tdsdt\\&\quad =\iint _{{\hat{K}}}\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}-(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I\right] \overline{{\hat{v}}_t}dsdt + \iint _{{\hat{K}}}\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}-(l_{k}{\hat{b}}_{k+1}{\hat{a}})_I\right] ({\hat{v}}_t -\overline{{\hat{v}}_t})dsdt\\&\quad \le C\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}\right] _{k+2,{\hat{K}}}|{\hat{v}}|_{1,{\hat{K}}} + C\left[ l_{k}{\hat{b}}_{k+1}{\hat{a}}\right] _{k+1,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}}\\&\quad \le C \left( \sum _{m=2}^{k+2}|{\hat{a}}|_{m,\infty ,{\hat{K}}} \max _{t\in [-1,1]}|{\hat{b}}_{k+1}(t)| \right) |{\hat{v}}|_{1,{\hat{K}}} \\&\qquad + C \left( \sum _{m=0}^{k+2}|{\hat{a}}|_{m,\infty ,{\hat{K}}} \max _{t\in [-1,1]}|{\hat{b}}_{k+1}^{(k+2-m)}(t)| \right) |{\hat{v}}|_{1,{\hat{K}}} \\&\qquad + C\left( \sum _{m=1}^{k+1}|{\hat{a}}|_{m,\infty ,{\hat{K}}} \max _{t\in [-1,1]}|{\hat{b}}_{k+1}(t)|\right) |{\hat{v}}|_{2,{\hat{K}}}\\&\qquad + C\left( \sum _{m=0}^{k+1}|{\hat{a}}|_{m,\infty ,{\hat{K}}} \max _{t\in [-1,1]}|{\hat{b}}_{k+1}^{(k+1-m)}(t)|\right) |{\hat{v}}|_{2,{\hat{K}}}\\&\quad = \mathcal O(h^{k+2})\Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+2,e}\Vert v\Vert _{2,e}. \end{aligned}$$

For term II, as in the proof of Theorem 3.7, we define the linear form as

$$\begin{aligned} {\hat{E}}_{{\hat{v}}}({\hat{f}}) = \iint _{{\hat{K}}}({\hat{F}}_I)_t {\hat{v}} dsdt -\iint _{{\hat{K}}} ({\hat{F}}_I)_t{\hat{v}} d^hsdt, \end{aligned}$$

for each \({\hat{v}} \in Q^k({\hat{K}})\), where \({\hat{F}}\) is an antiderivative of \({\hat{f}}\) with respect to the variable t. It is easy to see that \({\hat{E}}_{{\hat{v}}}\) is well defined and is a continuous linear form on \(H^k({\hat{K}})\). With the projection \({\hat{\Pi }}_1\) defined in (2.2), we have

$$\begin{aligned} {\hat{E}}_{{\hat{v}}}({\hat{f}}) = {\hat{E}}_{{\hat{v}} - {{\hat{\Pi }}}_1{\hat{v}}} ({\hat{f}})+ {\hat{E}}_{{{\hat{\Pi }}}_1{\hat{v}}}({\hat{f}}),\quad \forall {\hat{v}} \in Q^k({\hat{K}}). \end{aligned}$$

Since \( Q^{k-1}({\hat{K}}) \subset \ker {\hat{E}}_{{\hat{v}} - {\hat{\Pi }}_1{\hat{v}}}\), we have

$$\begin{aligned} {\hat{E}}_{{\hat{v}} - {{\hat{\Pi }}}_1{\hat{v}}} ({\hat{f}}) \le C[f]_{k,{\hat{K}}}\Vert {\hat{v}}- {{\hat{\Pi }}}_1{\hat{v}}\Vert _{0,{\hat{K}}}\le C[f]_{k,{\hat{K}}}|{\hat{v}}|_{2,{\hat{K}}} \end{aligned}$$

and

$$\begin{aligned} {\hat{E}}_{{{\hat{\Pi }}}_1{\hat{v}}} ({\hat{f}}) =\iint _{{\hat{K}}}({\hat{F}}_I)_t {\hat{\Pi }}_1{\hat{v}} dsdt -\iint _{{\hat{K}}} ({\hat{F}}_I)_t{{\hat{\Pi }}}_1{\hat{v}} d^hsdt=0. \end{aligned}$$

Thus we have

$$\begin{aligned}&\iint _{{\hat{K}}} \partial _t(l_k{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} d^hsdt -\iint _{{\hat{K}}}\partial _t(l_k{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} dsdt = -{\hat{E}}_{{\hat{v}}}((l_k{\hat{b}}_{k+1}{\hat{a}})_t) \\&\quad = -{\hat{E}}_{{\hat{v}} - \Pi _1{\hat{v}}} ((l_k{\hat{b}}_{k+1}{\hat{a}})_t) \le C [(l_k{\hat{b}}_{k+1}{\hat{a}})_t]_{k,{\hat{K}}}|{\hat{v}}_h|_{2,{\hat{K}}}\\&\quad = {\mathcal {O}}(h^{k+2})\Vert a\Vert _{k+1,\infty , e}\Vert u\Vert _{k+2,e}|v|_{2,e}. \end{aligned}$$

Now we only need to discuss term III. Let \(L_1\) and \(L_3\) denote the top and bottom boundaries of \(\Omega \) and let \(l^e_1\), \(l^e_3\) denote the top and bottom edges of element e (and \(l^{{\hat{K}}}_1\) and \(l^{{\hat{K}}}_3\) for \({\hat{K}}\)). Notice that after mapping back to the cell e we have

$$\begin{aligned} b_{k+1}(y_e+h)&={\hat{b}}_{k+1}(1)=\sum _{j=0}^{k+1} {\hat{b}}_{k+1,j} M_j(1)={\hat{b}}_{k+1,0}+ {\hat{b}}_{k+1,1}=\left( k+\frac{1}{2}\right) \\&\int _{-1}^1 \partial _s {\hat{u}}(s,1) l_k(s)ds =\left( k+\frac{1}{2}\right) \int _{x_e-h}^{x_e+h} \partial _x u(x,y_e+h) l_k\left( \frac{x-x_e}{h}\right) dx, \end{aligned}$$

and similarly we get \(b_{k+1}(y_e-h)={\hat{b}}_{k+1}(-1)=(k+\frac{1}{2})\int _{x_e-h}^{x_e+h} \partial _x u(x,y_e-h) l_k(\frac{x-x_e}{h})dx\). Thus the term \(l_k(\frac{x-x_e}{h})b_{k+1}(y)av\) is continuous across the top and bottom edges of cells. Therefore, when summing over all elements e, the line integrals on interior edges cancel, and the line integral reduces to two line integrals along \(L_1\) and \(L_3\). We only need to discuss one of them. For a cell e adjacent to \(L_1\), consider its reference cell \({\hat{K}}\) and define the linear form \({\hat{E}}({\hat{f}} ) = \int _{-1}^1 {\hat{f}}(s,1) ds - \int _{-1}^1 {\hat{f}}(s,1)d^hs\); then we have

$$\begin{aligned} {\hat{E}}({\hat{f}} {\hat{v}}) \le C |{\hat{f}}|_{0,\infty , l^{{\hat{K}}}_1} |{\hat{v}}|_{0,\infty ,l^{{\hat{K}}}_1} \le C\Vert {\hat{f}}\Vert _{2,l^{{\hat{K}}}_1}\Vert {\hat{v}}\Vert _{0,l^{{\hat{K}}}_1}, \end{aligned}$$

thus the mapping \({\hat{f}} \rightarrow {\hat{E}}({\hat{f}} {\hat{v}})\) is continuous with operator norm at most \(C\Vert {\hat{v}}\Vert _{0,l^{{\hat{K}}}_1}\) for some constant C. Since \({\hat{E}}((l_k{\hat{b}}_{k+1}{\hat{a}})_I{{\hat{\Pi }}}_1 {\hat{v}})=0\) we have

$$\begin{aligned}&\sum _{e\cap L_1\ne \emptyset }\int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} ds- \int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} d^hs\\&\quad = \sum _{e\cap L_1\ne \emptyset }{\hat{E}}((l_k{\hat{b}}_{k+1}{\hat{a}})_I {\hat{v}}) = \sum _{e\cap L_1\ne \emptyset }{\hat{E}}((l_k{\hat{b}}_{k+1}{\hat{a}})_I({\hat{v}}-{{\hat{\Pi }}}_1 {\hat{v}})) \\&\quad \le \sum _{e\cap L_1\ne \emptyset }C[(l_k{\hat{b}}_{k+1}{\hat{a}})_I]_{k,l^{{\hat{K}}}_1} [{\hat{v}}]_{2,l^{{\hat{K}}}_1}\\&\quad \le \sum _{e\cap L_1\ne \emptyset } C(|l_k{\hat{b}}_{k+1}{\hat{a}}-(l_k{\hat{b}}_{k+1}{\hat{a}})_I|_{k,l^{{\hat{K}}}_1} +|l_k{\hat{b}}_{k+1}{\hat{a}}|_{k,l^{{\hat{K}}}_1})[{\hat{v}}]_{2,l^{{\hat{K}}}_1}\\&\quad \le \sum _{e\cap L_1\ne \emptyset } (|l_k{\hat{b}}_{k+1}{\hat{a}}|_{k+1,l^{{\hat{K}}}_1}+|l_k{\hat{b}}_{k+1}{\hat{a}}|_{k,l^{{\hat{K}}}_1}) [{\hat{v}}]_{2,l^{{\hat{K}}}_1}\\&\quad \le \sum _{e\cap L_1\ne \emptyset }C \Vert {\hat{a}}\Vert _{k,\infty ,{\hat{K}}} |{\hat{b}}_{k+1}(1)|[{\hat{v}}]_{2,l^{{\hat{K}}}_1}, \end{aligned}$$

where the first inequality is derived from \({\hat{E}}({\hat{f}}({\hat{v}}-{{\hat{\Pi }}}_1{\hat{v}}))=0, \forall {\hat{f}} \in Q^{k-1}({\hat{K}})\) and Theorem 3.1.

Since \(l_k(t)=\frac{1}{2^k k!}\frac{d^k}{dt^k} (t^2-1)^k\), after integration by parts k times,

$$\begin{aligned} {\hat{b}}_{k+1}(1)=\left( k+\frac{1}{2}\right) \int _{-1}^{1} \partial _s {\hat{u}}(s,1) l_k(s)ds =(-1)^k \left( k+\frac{1}{2}\right) \int _{-1}^{1} \partial ^{k+1}_s {\hat{u}}(s,1) L(s)ds, \end{aligned}$$

where L(s) is a polynomial of degree 2k obtained by taking antiderivatives of \(l_k(s)\) k times. Then by the Cauchy–Schwarz inequality we have

$$\begin{aligned} {\hat{b}}_{k+1}(1) \le C \left( \int _{-1}^1|\partial ^{k+1}_s {\hat{u}}(s,1)|^2ds\right) ^{\frac{1}{2}} \le Ch^{k+\frac{1}{2}} |u|_{k+1,l^e_1}. \end{aligned}$$
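As an aside, the Rodrigues formula used above is easy to verify symbolically. The following sketch (ours, purely illustrative; polynomials are stored as coefficient lists) confirms that it reproduces the Legendre polynomials for \(k=2,3\).

```python
# Symbolic check of the Rodrigues formula for k = 2, 3 (illustrative only).
# Polynomials are stored as coefficient lists [c0, c1, ...] (c0 + c1*t + ...).
from fractions import Fraction as F
from math import factorial

def polymul(p, q):
    """Product of two coefficient lists."""
    r = [F(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

def polyder(p):
    """Derivative of a coefficient list."""
    return [F(n) * c for n, c in enumerate(p)][1:]

def rodrigues(k):
    """l_k(t) = (1/(2^k k!)) d^k/dt^k (t^2 - 1)^k as a coefficient list."""
    p = [F(1)]
    for _ in range(k):
        p = polymul(p, [F(-1), F(0), F(1)])  # multiply by (t^2 - 1)
    for _ in range(k):
        p = polyder(p)
    return [c / (2 ** k * factorial(k)) for c in p]

assert rodrigues(2) == [F(-1, 2), F(0), F(3, 2)]        # (3 t^2 - 1)/2
assert rodrigues(3) == [F(0), F(-3, 2), F(0), F(5, 2)]  # (5 t^3 - 3 t)/2
```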

By (3.13), we get \(|{\hat{v}}|_{2,l^{{\hat{K}}}_1} =h^{\frac{3}{2}}|v|_{2,l^{e}_1} \le C h |v|_{2,e}. \) Thus we have

$$\begin{aligned}&\sum _{e\cap L_1\ne \emptyset }\int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} ds- \int _{-1}^1 (l_{k}{\hat{b}}_{k+1}{\hat{a}})_I{\hat{v}} d^hs \le \sum _{e\cap L_1\ne \emptyset }C \Vert {\hat{a}}\Vert _{k,\infty ,{\hat{K}}} |{\hat{b}}_{k+1}(1)| |{\hat{v}}|_{2,l^{{\hat{K}}}_1}\\&\quad = {\mathcal {O}}\left( h^{k+\frac{3}{2}}\right) \sum _{e\cap L_1\ne \emptyset }\Vert a\Vert _{k,\infty }|u|_{k+1, l^e_1} |v|_{2, e} =\mathcal O\left( h^{k+\frac{3}{2}}\right) \Vert a\Vert _{k,\infty } |u|_{k+1, L_1} \Vert v\Vert _{2, \Omega }\\&\quad ={\mathcal {O}}\left( h^{k+\frac{3}{2}}\right) \Vert a\Vert _{k,\infty } \Vert u\Vert _{k+2, \Omega } \Vert v\Vert _{2, \Omega }, \end{aligned}$$

where the trace inequality \( \Vert u\Vert _{k+1,\partial \Omega } \le C \Vert u\Vert _{k+2, \Omega }\) is used.

Combining all the estimates above, we get (4.10a). Since the \(\frac{1}{2}\) order loss is only due to the line integrals along \(L_1\) and \(L_3\), on which \(v_{xx}=0\) if \(v\in V^h_0\), we get (4.10b). \(\square \)

By all the discussions in this subsection, we have proven (4.1a) and (4.1b).

5 Homogeneous Dirichlet Boundary Conditions

5.1 \(V^h\)-Ellipticity

In order to discuss the scheme (1.2), we need to show that \(A_h\) satisfies the \(V^h\)-ellipticity condition

$$\begin{aligned} \forall v_h\in V^h_0,\quad C\Vert v_h\Vert ^2_{1}\le A_h(v_h,v_h). \end{aligned}$$
(5.1)

We first consider \(V^h\)-ellipticity for the case \(\mathbf{b}\equiv 0\).

Lemma 5.1

Assume the coefficients in (2.3) satisfy \({\mathbf {b}}\equiv 0\), and that both \(c(x,y)\) and the eigenvalues of \(\mathbf{a}(x,y)\) have a uniform upper bound and a uniform positive lower bound. Then there exist two constants \(C_1, C_2>0\) independent of the mesh size h such that

$$\begin{aligned} \forall v_h\in V_0^h,\quad C_1\Vert v_h\Vert ^2_{1}\le A_h(v_h,v_h)\le C_2 \Vert v_h\Vert ^2_{1}. \end{aligned}$$

Proof

Let \(Z_{0,{\hat{K}}}\) denote the set of \((k+1)\times (k+1)\) Gauss–Lobatto points on the reference cell \({\hat{K}}\). First we notice that the set \(Z_{0,{\hat{K}}}\) is a \(Q^k({\hat{K}})\)-unisolvent subset. Since the Gauss–Lobatto quadrature weights are strictly positive, we have

$$\begin{aligned} \forall {\hat{p}}\in Q^k({\hat{K}}),\, \sum _{i=1}^{2} \langle \partial _i {\hat{p}},\partial _i {\hat{p}}\rangle _{{\hat{K}}}=0 \Longrightarrow \partial _i {\hat{p}}=0 \text { at quadrature points}, \end{aligned}$$

where \(i=1,2\) represents the spatial derivative with respect to the variable \(x_i\). Since \(\partial _i {\hat{p}}\in Q^k({\hat{K}})\) and it vanishes on a \(Q^k({\hat{K}})\)-unisolvent subset, we have \(\partial _i {\hat{p}}\equiv 0\). As a consequence, \(\sqrt{\sum _{i=1}^{n}\langle \partial _i {\hat{p}},\partial _i {\hat{p}}\rangle _h}\) defines a norm over the quotient space \(Q^k({\hat{K}})/Q^0({\hat{K}})\). Since \(|\cdot |_{1,{\hat{K}}}\) is also a norm over the same quotient space, by the equivalence of norms over a finite-dimensional space, we have

$$\begin{aligned} \forall {\hat{p}}\in Q^k({\hat{K}}),\quad C_1 |{\hat{p}}|^2_{1,{\hat{K}}}\le \sum _{i=1}^{n}\langle \partial _i {\hat{p}},\partial _i {\hat{p}}\rangle _{{\hat{K}}} \le C_2 |{\hat{p}}|^2_{1,{\hat{K}}}. \end{aligned}$$

On the reference cell \({\hat{K}}\), by the assumption on the coefficients, we have

$$\begin{aligned} C_1 |{\hat{v}}_h|^2_{1,{\hat{K}}} \le C_1 \sum _{i=1}^{n} \langle \partial _i {\hat{v}}_h, \partial _i {\hat{v}}_h \rangle _{{\hat{K}}} \le \sum _{i,j=1}^{n} \langle {\hat{a}}_{ij} \partial _i {\hat{v}}_h, \partial _j {\hat{v}}_h\rangle _{{\hat{K}}}+\langle {\hat{c}} {\hat{v}}_h, {\hat{v}}_h\rangle _{{\hat{K}}} \le C_2 \Vert {\hat{v}}_h\Vert ^2_{1,{\hat{K}}}. \end{aligned}$$

Mapping these back to the original cell e and summing over all elements, by the equivalence of two norms \(|\cdot |_{1}\) and \(\Vert \cdot \Vert _{1}\) for the space \(H^1_0(\Omega )\supset V^h_0\) [5], we get \(C_1\Vert v_h\Vert ^2_1 \le A_h(v_h,v_h)\le C_2\Vert v_h\Vert ^2_1\). \(\square \)
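The strict positivity of the Gauss–Lobatto weights is the key quantitative fact behind this proof. The following quick check (ours, illustrative; the nodes and weights are standard tabulated values, in floating point) verifies positivity and exactness for the rules with \(n=3,4,5\) points, i.e. \(k=2,3,4\):

```python
# Illustrative check: Gauss-Lobatto weights are strictly positive and the
# n-point rule on [-1,1] is exact for polynomials of degree up to 2n - 3.
from math import sqrt

rules = {
    3: ([-1.0, 0.0, 1.0], [1/3, 4/3, 1/3]),
    4: ([-1.0, -1/sqrt(5), 1/sqrt(5), 1.0], [1/6, 5/6, 5/6, 1/6]),
    5: ([-1.0, -sqrt(3/7), 0.0, sqrt(3/7), 1.0],
        [1/10, 49/90, 32/45, 49/90, 1/10]),
}

for n, (nodes, weights) in rules.items():
    assert all(w > 0 for w in weights)          # strict positivity
    for m in range(2 * n - 2):                  # degrees 0, ..., 2n - 3
        exact = 2 / (m + 1) if m % 2 == 0 else 0.0
        approx = sum(w * z ** m for w, z in zip(weights, nodes))
        assert abs(approx - exact) < 1e-12
```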

To discuss \(V^h\)-ellipticity when \({\mathbf {b}}\) is nonzero, by Young's inequality we have

$$\begin{aligned} |\langle {\mathbf {b}}\cdot \nabla v_h, v_h\rangle _h| \le \sum _e\iint _e \frac{({\mathbf {b}} \cdot \nabla v_h)^2}{4c} + c |v_h|^2d^hxd^hy \le \left\langle \frac{|{\mathbf {b}}|^2}{4c} \nabla v_h, \nabla v_h\right\rangle _h + \langle c v_h, v_h\rangle _h. \end{aligned}$$

Thus we have

$$\begin{aligned} \langle {\mathbf {a}} \nabla v_h, \nabla v_h\rangle _h +\langle \mathbf{b}\cdot \nabla v_h, v_h\rangle _h + \langle c v_h, v_h\rangle _h \ge \langle \lambda _{{\mathbf {a}}} \nabla v_h, \nabla v_h\rangle _h - \left\langle \frac{|{\mathbf {b}}|^2}{4c} \nabla v_h, \nabla v_h\right\rangle _h, \end{aligned}$$

where \(\lambda _{{\mathbf {a}}}\) is the smallest eigenvalue of \({\mathbf {a}}\). Then we have the following lemma.

Lemma 5.2

Assume \(4\lambda _{{\mathbf {a}}}c > |{\mathbf {b}}|^2\), then there exists a constant \(C>0\) independent of mesh size h such that

$$\begin{aligned} \forall v_h\in V_0^h, \quad A_h(v_h,v_h) \ge C \Vert v_h\Vert ^2_{1}. \end{aligned}$$
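The Young-inequality step behind this lemma rests on the elementary bound \(xy\le \frac{x^2}{4c}+cy^2\) for \(c>0\), which follows from \((\frac{x}{2\sqrt{c}}-\sqrt{c}\,y)^2\ge 0\); a brute-force check (ours, purely illustrative):

```python
# Brute-force check (illustrative) of x*y <= x^2/(4c) + c*y^2 for c > 0.
from itertools import product

for x, y in product([-3.0, -0.5, 0.0, 1.0, 2.5], repeat=2):
    for c in (0.1, 1.0, 7.0):
        assert x * y <= x * x / (4 * c) + c * y * y + 1e-12

# Equality holds exactly when x = 2*c*y:
x, y, c = 4.0, 1.0, 2.0
assert abs(x * y - (x * x / (4 * c) + c * y * y)) < 1e-12
```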

5.2 Standard Estimates for the Dual Problem

In order to apply the Aubin–Nitsche duality argument for establishing superconvergence of function values, we need certain estimates on a proper dual problem. Define \(\theta _h:=u_h-u_p\). Then we consider the dual problem: find \(w\in H_0^1(\Omega )\) satisfying

$$\begin{aligned} A^*(w,v)=(\theta _h,v),\quad \forall v\in H_0^1(\Omega ), \end{aligned}$$
(5.2)

where \(A^*(\cdot ,\cdot )\) is the adjoint bilinear form of \(A(\cdot ,\cdot )\) such that

$$\begin{aligned} A^*(u,v) = A(v,u) = ({\mathbf {a}} \nabla v, \nabla u)+({\mathbf {b}}\cdot \nabla v,u)+(c v, u). \end{aligned}$$

Let \(w_h\in V_0^h\) be the solution to

$$\begin{aligned} A^*_h(w_h,v_h)=(\theta _h,v_h),\quad \forall v_h\in V_0^h. \end{aligned}$$
(5.3)

Notice that the right hand side of (5.3) is different from the right hand side of the scheme (1.2).

We need the following standard estimates on \(w_h\) for the dual problem.

Theorem 5.3

Assume all coefficients in (2.3) are in \(W^{2,\infty }(\Omega )\). Let w be defined in (5.2), \(w_h\) be defined in (5.3), and \(\theta _h = u_h-u_p\). Assume elliptic regularity (2.6) and \(V^h\)-ellipticity hold. Then we have

$$\begin{aligned}&\Vert w-w_h\Vert _1\le C h \Vert w\Vert _2,\\&\Vert w_h\Vert _2\le C\Vert \theta _h\Vert _0. \end{aligned}$$

Proof

By \(V^h\) ellipticity, we have \(C_1\Vert w_h-v_h\Vert _1^2\le A^*_h(w_h-v_h, w_h-v_h)\). By the definition of the dual problem, we have

$$\begin{aligned} A^*_h(w_h, w_h-v_h)=(\theta _h, w_h-v_h)=A^*(w, w_h-v_h),\quad \forall v_h\in V_0^h. \end{aligned}$$

Thus for any \(v_h\in V_0^h\), by Theorem 3.6, we have

$$\begin{aligned}&C_1\Vert w_h-v_h\Vert _1^2 \le A^*_h(w_h-v_h, w_h-v_h)\\&\quad =A^*(w-v_h, w_h-v_h)+[A_h^*(w_h, w_h-v_h)-A^*(w, w_h-v_h)]\\&\qquad +[A^*(v_h, w_h-v_h)-A^*_h(v_h, w_h-v_h)]\\&\quad =A^*(w-v_h, w_h-v_h)+[A(w_h-v_h, v_h)-A_h(w_h-v_h,v_h)]\\&\quad \le C\Vert w-v_h\Vert _1 \Vert w_h-v_h\Vert _1+Ch \Vert v_h \Vert _2 \Vert w_h-v_h\Vert _1. \end{aligned}$$

Thus

$$\begin{aligned} \Vert w-w_h\Vert _1\le \Vert w-v_h\Vert _1 +\Vert w_h-v_h\Vert _1 \le C \Vert w-v_h\Vert _1+Ch\Vert v_h \Vert _2. \end{aligned}$$
(5.4)

Now consider \(\Pi _1 w \in V^h_0\), where \(\Pi _1\) is the piecewise \(Q^1\) projection defined on each cell through (2.2) on the reference cell. By the Bramble–Hilbert lemma (Theorem 3.1) applied to the projection error, we have

$$\begin{aligned} \Vert w-\Pi _1 w\Vert _1\le C h \Vert w\Vert _2,\quad \Vert w-\Pi _1 w\Vert _2\le C \Vert w\Vert _2, \end{aligned}$$
(5.5)

thus \(\Vert \Pi _1 w\Vert _2\le \Vert w\Vert _2+\Vert w-\Pi _1 w\Vert _2\le C \Vert w\Vert _2\). By setting \(v_h=\Pi _1 w\), from (5.4) we have

$$\begin{aligned} \Vert w-w_h\Vert _1\le C \Vert w-\Pi _1 w\Vert _1+Ch \Vert \Pi _1 w \Vert _2\le C h \Vert w\Vert _2. \end{aligned}$$
(5.6)

By the inverse estimate on the piecewise polynomial \(w_h-\Pi _1 w\), we get

$$\begin{aligned} \Vert w_h\Vert _2 \le \Vert w_h-\Pi _1 w\Vert _2+\Vert \Pi _1 w-w\Vert _2+\Vert w\Vert _2\le Ch^{-1}\Vert w_h-\Pi _1 w\Vert _1+C\Vert w\Vert _2. \end{aligned}$$
(5.7)

By (5.5) and (5.6), we also have

$$\begin{aligned}&\Vert w_h-\Pi _1 w\Vert _1\le \Vert w-\Pi _1 w\Vert _1+\Vert w-w_h\Vert _1\le C h \Vert w\Vert _2. \end{aligned}$$
(5.8)

With (5.7), (5.8) and the elliptic regularity \(\Vert w\Vert _2\le C \Vert \theta _h\Vert _0\), we get

$$\begin{aligned} \Vert w_h\Vert _2\le C\Vert w\Vert _2\le C \Vert \theta _h\Vert _0. \end{aligned}$$

\(\square \)

5.3 Superconvergence of Function Values

Theorem 5.4

Assume \(a_{ij}, b_i, c \in W^{k+2,\infty }(\Omega )\), \(u(x,y)\in H^{k+3}(\Omega )\) and \(f(x,y)\in H^{k+2}(\Omega )\) with \(k \ge 2\). Assume elliptic regularity (2.6) and \(V^h\)-ellipticity hold. Then \(u_h\), the numerical solution from scheme (1.2), is a \((k+2)\)th order accurate approximation to the exact solution u in the discrete 2-norm over all the \((k+1)\times (k+1)\) Gauss–Lobatto points:

$$\begin{aligned} \Vert u_h-u\Vert _{2, Z_0}= {\mathcal {O}} (h^{k+2})(\Vert u\Vert _{k+3,\Omega }+\Vert f\Vert _{k+2,\Omega }). \end{aligned}$$

Proof

By Theorems 3.7 and 3.3, for any \(v_h \in V^h_0\),

$$\begin{aligned}&A_h(u-u_h, v_h)=[A(u, v_h)-A_h(u_h, v_h)]+ [A_h(u, v_h) - A(u, v_h)]\\&\quad = A(u, v_h)-A_h(u_h, v_h)+{\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3} \Vert v_h\Vert _2 \\&\quad = [(f,v_h)-\langle f,v_h\rangle _h] +{\mathcal {O}}(h^{k+2}) \Vert u\Vert _{k+3} \Vert v_h\Vert _2 ={\mathcal {O}}(h^{k+2}) (\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert v_h\Vert _2. \end{aligned}$$

Let \(\theta _h=u_h-u_p\); then \(\theta _h\in V_0^h\) by the properties of the M-type projection. So by (4.1a) and Theorem 5.3, we get

$$\begin{aligned}&\Vert \theta _h\Vert _0^2=(\theta _h,\theta _h)=A_h(\theta _h,w_h)= A_h(u_h-u, w_h)+A_h(u-u_p, w_h)\\&\quad = A_h(u-u_p, w_h)+{\mathcal {O}}(h^{k+2})(\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert w_h\Vert _2\\&\quad ={\mathcal {O}}(h^{k+2})(\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert w_h\Vert _2 ={\mathcal {O}}(h^{k+2})( \Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert \theta _h\Vert _0, \end{aligned}$$

thus

$$\begin{aligned} \Vert u_h-u_p\Vert _0=\Vert \theta _h\Vert _0={\mathcal {O}}(h^{k+2})( \Vert u\Vert _{k+3} +\Vert f\Vert _{k+2}). \end{aligned}$$

Finally, by the equivalence of the discrete 2-norm on \(Z_0\) and the \(L^2(\Omega )\) norm in finite-dimensional space \(V^h\) and Theorem 4.2, we obtain

$$\begin{aligned} \Vert u_h-u\Vert _{2,Z_0}&\le \Vert u_h-u_p\Vert _{2,Z_0}+\Vert u_p-u\Vert _{2,Z_0} \le C \Vert u_h-u_p\Vert _{0}+\Vert u_p-u\Vert _{2,Z_0} \\&= {\mathcal {O}} (h^{k+2})(\Vert u\Vert _{k+3}+\Vert f\Vert _{k+2}). \end{aligned}$$

\(\square \)

Remark 5.5

To extend the discussions to Neumann type boundary conditions, due to (4.1b) and Theorem 3.7, one can only prove \((k+\frac{3}{2})\)th order accuracy:

$$\begin{aligned} \Vert u_h-u\Vert _{2, Z_0}= {\mathcal {O}} (h^{k+\frac{3}{2}})(\Vert u\Vert _{k+3}+\Vert f\Vert _{k+2}). \end{aligned}$$

On the other hand, for solving a general elliptic equation, only \({\mathcal {O}}(h^{k+\frac{3}{2}})\) superconvergence at all Gauss–Lobatto points can be proven for Neumann boundary conditions, even for the full finite element scheme (1.1); see [4].

Remark 5.6

All key discussions can be extended to three-dimensional cases. For instance, M-type expansion has been used for discussing superconvergence for the three-dimensional case [4]. The most useful technique in Sect. 3.2 to obtain desired consistency error estimate is to derive error cancellations between neighboring cells through integration by parts on suitable interpolation polynomials, which still seems possible on rectangular meshes in three dimensions.

6 Nonhomogeneous Dirichlet Boundary Conditions

We consider a two-dimensional elliptic problem on \(\Omega =(0,1)^2\) with nonhomogeneous Dirichlet boundary condition,

$$\begin{aligned}&-\nabla \cdot ( {\mathbf {a}} \nabla u) +{\mathbf {b}}\cdot \nabla u+c u = f \text { on } \Omega \nonumber \\&u = g \text { on } \partial \Omega . \end{aligned}$$
(6.1)

Assume there is a function \({\bar{g}} \in H^1(\Omega )\) as a smooth extension of g so that \({\bar{g}}|_{\partial \Omega } = g\). The variational form is to find \({{\tilde{u}}} = u - {\bar{g}} \in H_0^1(\Omega )\) satisfying

$$\begin{aligned} A({{\tilde{u}}}, v)=(f,v) - A({\bar{g}},v) ,\quad \forall v\in H_0^1(\Omega ). \end{aligned}$$
(6.2)

In practice, \({\bar{g}}\) is not used explicitly. Abusing notation, the most convenient implementation is to consider

$$\begin{aligned} g(x,y)={\left\{ \begin{array}{ll} 0,&{}\quad \text{ if }\quad (x,y)\in (0,1)\times (0,1),\\ g(x,y),&{}\quad \text{ if }\quad (x,y)\in \partial \Omega ,\\ \end{array}\right. } \end{aligned}$$

and \(g_I\in V^h\), defined as the \(Q^k\) Lagrange interpolation of g(x,y) at the \((k+1)\times (k+1)\) Gauss–Lobatto points of each cell of \( \Omega \). Namely, \(g_I\in V^h\) is the piecewise \(P^k\) interpolation of g at the boundary grid points and \(g_I=0\) at the interior grid points. The numerical scheme is to find \({{\tilde{u}}}_h \in V_0^h\), s.t.

$$\begin{aligned} A_h( {{\tilde{u}}}_h, v_h)=\langle f,v_h \rangle _h - A_h( g_I,v_h) ,\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.3)

Then \(u_h = {{\tilde{u}}}_h + g_I\) will be our numerical solution for (6.1). Notice that (6.3) is not a straightforward approximation to (6.2) since \({\bar{g}}\) is never used. Assuming elliptic regularity and \(V^h\) ellipticity hold, we will show that \(u_h-u\) is of \((k+2)\)th order in the discrete 2-norm over all \((k+1)\times (k+1)\) Gauss–Lobatto points.

6.1 An Auxiliary Scheme

In order to discuss the superconvergence of (6.3), we need to prove the superconvergence of an auxiliary scheme. Notice that we discuss the auxiliary scheme only for proving the accuracy of (6.3). In practice one should not implement the auxiliary scheme since (6.3) is a much more convenient implementation with the same accuracy.

Let \({\bar{g}}_p\in V^h\) be the piecewise M-type \(Q^k\) projection of the smooth extension \({\bar{g}}\), and define \(g_p\in V^h\) by \(g_p={\bar{g}}_p\) on \(\partial \Omega \) and \(g_p = 0\) at all interior grid points. The auxiliary scheme is to find \({{\tilde{u}}}^{*}_h \in V_0^h\) satisfying

$$\begin{aligned} A_h({{\tilde{u}}}^{*}_h, v_h)=\langle f,v_h \rangle _h - A_h(g_p,v_h) ,\quad \forall v_h\in V_0^h, \end{aligned}$$
(6.4)

Then \(u^{*}_h = {{\tilde{u}}}^{*}_h + g_p\) is the numerical solution for problem (6.2). Define \(\theta _h=u^{*}_h-u_p\), then by Theorem 4.1 we have \(\theta _h \in V_0^h\). Following Sect. 5.2, define the following dual problem: find \(w\in H_0^1(\Omega )\) satisfying

$$\begin{aligned} A^*(w,v)=(\theta _h,v),\quad \forall v\in H_0^1(\Omega ). \end{aligned}$$
(6.5)

Let \(w_h\in V_0^h\) be the solution to

$$\begin{aligned} A^*_h(w_h,v_h)=(\theta _h,v_h),\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.6)

Notice that the dual problem has homogeneous Dirichlet boundary conditions. By Theorems 3.3 and 3.7, for any \(v_h \in V^h_0\),

$$\begin{aligned}&A_h(u-u^{*}_h, v_h)= [A(u, v_h)-A_h(u^{*}_h, v_h)]+ [A_h(u, v_h) - A(u, v_h)]\\&\quad = A(u, v_h)-A_h(u^{*}_h, v_h)+{\mathcal {O}}(h^{k+2}) \Vert a\Vert _{k+2,\infty }\Vert u\Vert _{k+3} \Vert v_h\Vert _2 \\&\quad = [(f,v_h)-\langle f,v_h\rangle _h] +{\mathcal {O}}(h^{k+2}) \Vert u\Vert _{k+3} \Vert v_h\Vert _2 ={\mathcal {O}}(h^{k+2}) (\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert v_h\Vert _2. \end{aligned}$$

By (4.1a) and Theorem 5.3, we get

$$\begin{aligned}&\Vert \theta _h\Vert _0^2=(\theta _h,\theta _h)=A_h(\theta _h,w_h) = A_h(u^{*}_h-u, w_h)+A_h(u-u_p, w_h)\\&\quad = A_h(u-u_p, w_h)+{\mathcal {O}}(h^{k+2})(\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert w_h\Vert _2\\&\quad ={\mathcal {O}}(h^{k+2})(\Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert w_h\Vert _2 ={\mathcal {O}}(h^{k+2})( \Vert u\Vert _{k+3} +\Vert f\Vert _{k+2})\Vert \theta _h\Vert _0, \end{aligned}$$

thus \( \Vert u_h^{*}-u_p\Vert _0=\Vert \theta _h\Vert _0={\mathcal {O}}(h^{k+2})( \Vert u\Vert _{k+3} +\Vert f\Vert _{k+2}).\) So Theorem 5.4 still holds for the auxiliary scheme (6.4):

$$\begin{aligned} \Vert u^*_h-u\Vert _{2,Z_0}={\mathcal {O}}(h^{k+2})(\Vert u\Vert _{k+3}+\Vert f\Vert _{k+2}). \end{aligned}$$
(6.7)

6.2 The Main Result

In order to extend Theorem 5.4 to (6.3), we only need to prove

$$\begin{aligned} \Vert u_h-u^*_h\Vert _0={\mathcal {O}}(h^{k+2}). \end{aligned}$$

The difference between (6.4) and (6.3) is

$$\begin{aligned} A_h({{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h, v_h)= A_h( g_I-g_p,v_h) ,\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.8)

We need the following Lemma.

Lemma 6.1

Assume \(u\in H^{k+4}(\Omega )\) for \(k \ge 2\), and let \(g_I\) and \(g_p\) be defined as in this section. Then we have

$$\begin{aligned} A_h( g_I-g_p,v_h) = \mathcal O(h^{k+2})\Vert u\Vert _{k+4,\Omega }\Vert v_h\Vert _{2,\Omega },\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.9)

Proof

For simplicity, we drop the subscript \(_h\) of \(v_h\) in this proof; all functions v below are in \(V^h\).

Notice that \(g_I-g_p\equiv 0\) in interior cells. Thus we only consider cells adjacent to \(\partial \Omega \). Let \(L_1, L_2, L_3\) and \(L_4\) denote the top, left, bottom and right boundary edges of \({{\bar{\Omega }}}=[0,1]\times [0,1]\) respectively. Without loss of generality, we consider cell \(e=[x_e-h,x_e+h]\times [y_e-h,y_e+h]\) adjacent to the left boundary \(L_2\), i.e., \(x_e-h=0\). Let \(l^e_1, l^e_2, l^e_3\) and \(l^e_4\) denote the top, left, bottom and right boundary edges of e respectively.

On \(l^e_2\subset L_2\), let \(\phi _{ij}(x,y),i,j=0,1,\ldots ,k,\) be the Lagrange basis functions for the \((k+1)\times (k+1)\) Gauss–Lobatto points in cell e. Then \(g_I-g_p = \sum _{i,j=0}^{k}\lambda _{ij} \phi _{ij}(x,y)\) with \(|\lambda _{ij}|\le \Vert g_I-g_p\Vert _{\infty ,Z_0}\). By Sobolev embedding, we have \(u\in W^{k+2,\infty }(\Omega )\). By Theorem 4.2, we have

$$\begin{aligned} \Vert g_I-g_p\Vert _{\infty ,Z_0} \le \Vert u-u_p\Vert _{\infty ,Z_0}= {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+2,\infty ,\Omega } = {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+4,\Omega }. \end{aligned}$$

Thus, for all \(v \in V^h_0\), we get

$$\begin{aligned}&\langle a (g_I-g_p)_x, v_x\rangle _e\\&\quad = \left\langle a \sum _{i,j=0}^k\lambda _{ij} \phi _{ij}(x,y)_x, v_x\right\rangle _e \le C\Vert {a}\Vert _{\infty ,\Omega }\max _{i,j}|\lambda _{ij}|\left| \left\langle \sum _{i,j=0}^k\phi _{ij}(x,y)_x, v_x\right\rangle _e \right| . \end{aligned}$$

Since all norms are equivalent for polynomials on \({\hat{K}}\), we have

$$\begin{aligned} \left| \left\langle \sum _{i,j=0}^k\phi _{ij}(x,y)_x, v_x\right\rangle _e \right| = \left| \left\langle \sum _{i,j=0}^k{{\hat{\phi }}}_{ij}(s,t)_s, {\hat{v}}_s\right\rangle _{{\hat{K}}} \right| \le C |{\hat{v}}_s|_{\infty ,{\hat{K}}} \le C|{\hat{v}}|_{1,{\hat{K}}} = C |v|_{1,e}, \end{aligned}$$

which implies

$$\begin{aligned} \langle a (g_I-g_p)_x, v_x\rangle _h\le C\Vert {a}\Vert _{\infty ,\Omega }\sum _e \max _{i,j}|\lambda _{ij}||v|_{1,e}= {\mathcal {O}}(h^{k+2})\Vert {a}\Vert _{\infty ,\Omega } \Vert u\Vert _{k+4,\Omega } \Vert v\Vert _{2,\Omega } \end{aligned}$$

Similarly, for any \(v \in V^h_0\), we have

$$\begin{aligned} \langle a (g_I-g_p)_y, v_y\rangle _h= & {} \mathcal O(h^{k+2})\Vert {a}\Vert _{\infty } \Vert u\Vert _{k+4} \Vert v\Vert _{2},\\ \langle a (g_I-g_p)_x, v_y\rangle _h= & {} \mathcal O(h^{k+2})\Vert {a}\Vert _{\infty } \Vert u\Vert _{k+4} \Vert v\Vert _{2},\\ \langle \mathbf{b } \cdot \nabla (g_I-g_p), v\rangle _h= & {} \mathcal O(h^{k+2})\Vert {\mathbf {b}}\Vert _{\infty } \Vert u\Vert _{k+4} \Vert v\Vert _{2},\\ \langle c(g_I-g_p), v\rangle _h= & {} \mathcal O(h^{k+2})\Vert {c}\Vert _{\infty } \Vert u\Vert _{k+4} \Vert v\Vert _{2}. \end{aligned}$$

Thus we conclude that

$$\begin{aligned} A_h( g_I-g_p,v_h) = {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+4}\Vert v_h\Vert _{2},\quad \forall v_h\in V_0^h. \end{aligned}$$

\(\square \)

By (6.8) and Lemma 6.1, we have

$$\begin{aligned} A_h({{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h, v_h) = \mathcal O(h^{k+2})\Vert u\Vert _{k+4}\Vert v_h\Vert _{2},\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.10)

Let \(\theta _h={{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h \in V_0^h\). Following Sect. 5.2, define the following dual problem: find \(w\in H_0^1(\Omega )\) satisfying

$$\begin{aligned} A^*(w,v)=(\theta _h,v),\quad \forall v\in H_0^1(\Omega ). \end{aligned}$$
(6.11)

Let \(w_h\in V_0^h\) be the solution to

$$\begin{aligned} A^*_h(w_h,v_h)=(\theta _h,v_h),\quad \forall v_h\in V_0^h. \end{aligned}$$
(6.12)

By (6.10) and Theorem 5.3, we get

$$\begin{aligned}&\Vert \theta _h\Vert _0^2=(\theta _h,\theta _h)=A_h^*(w_h,\theta _h) = A_h({{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h, w_h)\\&\quad =\mathcal O(h^{k+2})\Vert u\Vert _{k+4}\Vert w_h\Vert _{2} ={\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+4}\Vert \theta _h\Vert _{0}, \end{aligned}$$

thus \( \Vert {{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h\Vert _0=\Vert \theta _h\Vert _0=\mathcal O(h^{k+2}) \Vert u\Vert _{k+4}.\) By equivalence of norms for polynomials, we have

$$\begin{aligned} \Vert {{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h\Vert _{2, Z_0}\le C \Vert {{\tilde{u}}}^{*}_h- {{\tilde{u}}}_h\Vert _{0} = {\mathcal {O}}(h^{k+2})\Vert u\Vert _{k+4,\Omega }. \end{aligned}$$
(6.13)

Notice that both \({{\tilde{u}}}_h\) and \({{\tilde{u}}}^*_h\) are constant zero along \(\partial \Omega \), and \(u_h|_{\partial \Omega }=g_I\) is the Lagrangian interpolation of g along \(\partial \Omega \). With (6.7), we have proven the following main result.

Theorem 6.2

Assume elliptic regularity (2.6) and \(V^h\) ellipticity hold. For the nonhomogeneous Dirichlet boundary value problem (6.1) with \(k \ge 2\), assume \(a_{ij}, b_i, c \in W^{k+2,\infty }(\Omega )\), the exact solution of (6.2) satisfies \(u(x,y) = {{\tilde{u}}} + {\bar{g}} \in H^{k+4}(\Omega )\), and \(f(x,y)\in H^{k+2}(\Omega )\). Then the numerical solution \(u_h\) of scheme (6.3) is a \((k+2)\)th order accurate approximation to u in the discrete 2-norm over all the \((k+1)\times (k+1)\) Gauss–Lobatto points:

$$\begin{aligned} \Vert u_h-u\Vert _{2, Z_0}= {\mathcal {O}} (h^{k+2})(\Vert u\Vert _{k+4}+\Vert f\Vert _{k+2}). \end{aligned}$$

7 Finite Difference Implementation

In this section we present the finite difference implementation of the scheme (6.3) for the case \(k=2\) on a uniform mesh. The finite difference implementation of the nonhomogeneous Dirichlet boundary value problem is based on a homogeneous Neumann boundary value problem, which will be discussed first. We demonstrate how it is derived for the one-dimensional case, then give the two-dimensional implementation. This formulation allows efficient assembly of the stiffness matrix and is easy to implement in MATLAB. Implementations for higher order elements or quasi-uniform meshes can be derived similarly, even though the result will no longer be a conventional finite difference scheme on a uniform grid.

7.1 One-Dimensional Case

Consider a homogeneous Neumann boundary value problem \( -(au')' = f \text { on } [0,1], u'(0) = 0, u'(1) = 0, \) and its variational form is to seek \(u\in H^1([0,1])\) satisfying

$$\begin{aligned} (au',v')=(f,v), \quad \forall v\in H^1([0,1]). \end{aligned}$$
(7.1)

Consider a uniform mesh \(x_i = ih\), \(i = 0,1,\ldots , n+1 \), \(h=\frac{1}{n+1}\). Assume n is odd and let \(N=\frac{n+1}{2}\). Define intervals \(I_k =[x_{2k},x_{2k+2}]\) for \(k=0,\ldots ,N-1\) as a finite element mesh for \(P^2\) basis. Define

$$\begin{aligned} V^h=\{v\in C^0([0,1]): v|_{I_k}\in P^2(I_k), k = 0,\ldots , N-1\}. \end{aligned}$$

Let \(\{v_i\}_{i=0}^{n+1} \subset V^h \) be a basis of \(V^h\) such that \(v_i(x_j)= \delta _{ij}, \,i,j=0,1,\ldots ,n+1\). With 3-point Gauss–Lobatto quadrature, the \(C^0\)-\(P^2\) finite element method for (7.1) is to seek \(u_h\in V^h\) satisfying

$$\begin{aligned} \langle au_h',v_i'\rangle _h=\langle f,v_i\rangle _h, \quad i=0,1,\ldots ,n+1. \end{aligned}$$
(7.2)

Let \(u_j=u_h(x_j)\), \(a_j=a(x_j)\) and \(f_j=f(x_j)\) then \(u_h(x)=\sum \limits _{j=0}^{n+1} u_jv_j(x)\). We have

$$\begin{aligned} \sum _{j=0}^{n+1} u_j \langle a v_j',v_i'\rangle _h =\langle au_h',v_i'\rangle _h = \langle f,v_i\rangle _h=\sum _{j=0}^{n+1} f_j \langle v_j,v_i\rangle _h , \quad i=0,1,\ldots ,n+1. \end{aligned}$$

The matrix form of this scheme is \({\bar{S}}\bar{ {\mathbf {u}}}={\bar{M}} \bar{{\mathbf {f}}}\), where

$$\begin{aligned} \bar{\mathbf{u }}=\begin{bmatrix} u_0,u_1,\ldots ,u_{n},u_{n+1} \end{bmatrix}^T,\quad \bar{\mathbf{f }}=\begin{bmatrix} f_0,f_1,\ldots ,f_{n},f_{n+1} \end{bmatrix}^T, \end{aligned}$$

the stiffness matrix \({\bar{S}}\) has size \((n+2)\times (n+2)\) with (i,j)th entry \(\langle a v_i',v_j'\rangle _h\), and the lumped mass matrix \({\bar{M}}\) is an \((n+2)\times (n+2)\) diagonal matrix with diagonal entries \(h \begin{pmatrix} \frac{1}{3},\frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3},\frac{1}{3} \end{pmatrix}\).

Next we derive an explicit representation of the matrix \({\bar{S}}\). Since basis functions \(v_i\in V^h\) and \(u_h(x)\) are not \(C^1\) at the knots \(x_{2k}\) (\(k=1,2,\ldots ,N-1\)), their derivatives at the knots are double valued. We will use superscripts \(+\) and − to denote derivatives obtained from the right and from the left respectively, e.g., \(v'^+_{2k}\) and \(v'^-_{2k+2}\) denote the derivatives of \(v_{2k}\) and \(v_{2k+2}\) respectively in the interval \(I_{k}=[x_{2k}, x_{2k+2}]\). Then in the interval \(I_{k}=[x_{2k}, x_{2k+2}]\) we have the following representation of derivatives

$$\begin{aligned} \begin{bmatrix} v'^+_{2k}(x)\\ v'_{2k+1}(x)\\ v'^-_{2k+2}(x) \end{bmatrix} = \frac{1}{2h}\begin{bmatrix} -3 &{}\quad 4 &{}\quad -1\\ -1 &{}\quad 0 &{}\quad 1\\ 1 &{}\quad -\,4 &{}\quad 3 \end{bmatrix} \begin{bmatrix} v_{2k}(x)\\ v_{2k+1}(x)\\ v_{2k+2}(x) \end{bmatrix}. \end{aligned}$$
(7.3)

By abusing notation, we use \((v_i')_{2k}\) to denote the average of the two derivatives of \(v_i\) at the knot \(x_{2k}\):

$$\begin{aligned} (v_i')_{2k} = \frac{1}{2}[(v_i')_{2k}^-+(v_i')^+_{2k}]. \end{aligned}$$

Let \([v_i']\) denote the difference between the right derivative and the left derivative:

$$\begin{aligned} {[}v_i']_0=[v_i']_{n+1}=0, \quad [v_i']_{2k}: = (v_i')^+_{2k}-(v_i')^-_{2k}, \quad k=1,2,\ldots ,N-1. \end{aligned}$$

Then at the knots, we have

$$\begin{aligned} (v_i')^-_{2k} (v_j')^-_{2k}+(v_i')^+_{2k} (v_j')^+_{2k} =2(v_i')_{2k} (v_j')_{2k}+ \frac{1}{2}[v_i']_{2k} [v_j']_{2k}. \end{aligned}$$
(7.4)

We also have

$$\begin{aligned}&\langle a v_j',v_i'\rangle _{I_{k}}\nonumber \\&\quad =h\left[ \frac{1}{3}a_{2k}(v_j')^+_{2k}(v_i')^+_{2k}+\frac{4}{3}a_{2k+1} (v_j')_{2k+1}(v_i')_{2k+1}+\frac{1}{3} a_{2k+2} (v_j')^-_{2k+2}(v_i')^-_{2k+2}\right] . \nonumber \\ \end{aligned}$$
(7.5)

Let \({\mathbf {v}}_i\) denote a column vector of size \(n+2\) consisting of grid point values of \(v_i(x)\). Plugging (7.4) into (7.5), with (7.3), we get

$$\begin{aligned} \langle a v_j',v_i'\rangle _h =\sum _{k=0}^{N-1} \langle a v_j',v_i'\rangle _{I_{k}}= \frac{1}{h} \mathbf{v }_i^T ( D^T WA D + E^T WA E)\mathbf{v }_j, \end{aligned}$$

where A is a diagonal matrix with diagonal entries \(a_0,a_1,\ldots ,a_{n},a_{n+1}\), and

$$\begin{aligned} W&=diag\begin{pmatrix} \frac{1}{3},\frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3},\frac{1}{3} \end{pmatrix}_{(n+2)\times (n+2)},\\ D=&\frac{1}{2}\left( {\begin{matrix} -3 &{}\quad 4 &{}\quad -1 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ -1 &{}\quad 0 &{}\quad 1 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2}\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 1 &{}\quad -4 &{}\quad 3 \end{matrix}}\right) _{(n+2)\times (n+2)},\\ E&=\frac{1}{2} \left( {\begin{matrix} 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 0 
&{}\quad 0 &{}\quad 0 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2}\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 \end{matrix}}\right) _{(n+2)\times (n+2)}. \end{aligned}$$

Since \(\{v_i\}_{i=0}^{n+1}\) is the Lagrangian basis for \(V^h\), we have

$$\begin{aligned} {\bar{S}} = \frac{1}{h}(D^T W A D + E^T W A E). \end{aligned}$$
(7.6)
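The assembly (7.6) is short enough to sketch in code. The following NumPy snippet (function names are ours, not from the paper) builds \(D\), \(E\), \(W\) and \({\bar{S}}\) from grid point values of a; with \(a\equiv 1\), \({\bar{S}}\) is symmetric and annihilates constants, as a Neumann stiffness matrix must.

```python
import numpy as np

def p2_fd_matrices(m):
    """Build the m x m matrices D (derivatives/averages) and E (jumps) of
    (7.3)-(7.6) and the Gauss-Lobatto weight vector w, for m = n + 2 grid
    points (m odd, knots at even indices)."""
    D = np.zeros((m, m)); E = np.zeros((m, m))
    D[0, :3] = [-3, 4, -1]            # one-sided derivative at the left boundary
    D[-1, -3:] = [1, -4, 3]           # one-sided derivative at the right boundary
    for i in range(1, m - 1):
        if i % 2 == 1:                # element midpoint: central difference
            D[i, i-1:i+2] = [-1, 0, 1]
        else:                         # interior knot: averaged derivative and jump
            D[i, i-2:i+3] = [0.5, -2, 0, 2, -0.5]
            E[i, i-2:i+3] = [-0.5, 2, -3, 2, -0.5]
    D /= 2; E /= 2
    w = np.full(m, 2/3); w[1::2] = 4/3; w[0] = w[-1] = 1/3
    return D, E, w

def assemble_stiffness(a_vals, h):
    """S_bar = (1/h)(D^T W A D + E^T W A E) as in (7.6)."""
    D, E, w = p2_fd_matrices(len(a_vals))
    WA = np.diag(w * a_vals)          # W A: both diagonal, so multiply entrywise
    return (D.T @ WA @ D + E.T @ WA @ E) / h
```

For constant \(a\), scaling the interior rows of \({\bar{S}}\) by the lumped mass weights reproduces the finite difference stencils listed in Sect. 7.4.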

Now consider the one-dimensional Dirichlet boundary value problem:

$$\begin{aligned} -(au')' =&f \text { on } [0,1], \\ u(0) = \sigma _0, \quad&u(1) = \sigma _1. \end{aligned}$$

Consider the same mesh as above and define

$$\begin{aligned} V^h_0=\{v\in C^0([0,1]): v|_{I_k}\in P^2(I_k), k = 0,\ldots , N-1; v(0)=v(1)=0\}. \end{aligned}$$

Then \(\{v_i\}_{i=1}^{n}\) is a basis of \(V^h_0\), where \(\{v_i\}_{i=0}^{n+1}\subset V^h\) are the basis functions defined above. The one-dimensional version of (6.3) is to seek \(u_h\in V^h_0\) satisfying

$$\begin{aligned} \begin{aligned} \langle au_h',v_i'\rangle _h&=\langle f,v_i\rangle _h-\langle a g_I',v_i'\rangle _h, \quad i=1,2,\ldots ,n,\\ g_I(x)&=\sigma _0 v_0(x)+\sigma _1 v_{n+1}(x). \end{aligned} \end{aligned}$$
(7.7)

Notice that we can obtain (7.7) by simply setting \(u_h(0)=\sigma _0\) and \(u_h(1)=\sigma _1\) in (7.2). So the finite difference implementation of (7.7) is given as follows:

  1. 1.

    Assemble the \((n+2)\times (n+2)\) stiffness matrix \({\bar{S}}\) for homogeneous Neumann problem as in (7.6).

  2. 2.

    Let S denote the \(n\times n\) submatrix \({\bar{S}}(2:n+1, 2:n+1)\), i.e., \([{\bar{S}}_{ij}]\) for \(i,j=2,\cdots , n+1\).

  3. 3.

    Let \({\mathbf {l}}\) denote the \(n\times 1\) subvector \(\bar{S}(2:n+1, 1)\) and \({\mathbf {r}}\) denote the \(n\times 1\) subvector \(\bar{S}(2:n+1, n+2)\), which correspond to \(v_0(x)\) and \(v_{n+1}(x)\) respectively.

  4. 4.

    Let \({\mathbf {u}}=\begin{bmatrix} u_1&u_2&\cdots&u_n \end{bmatrix}^T\) and \({\mathbf {f}}=\begin{bmatrix} f_1&f_2&\cdots&f_n \end{bmatrix}^T\). Define \({\mathbf {w}}=\begin{bmatrix} \frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3} \end{bmatrix} \) as a column vector of size n. The scheme (7.7) can be implemented as

    $$\begin{aligned} S {\mathbf {u}}= h\, diag({\mathbf {w}}){\mathbf {f}} -\sigma _0\mathbf{l}-\sigma _1{\mathbf {r}}. \end{aligned}$$

7.2 Notations and Tools for the Two-Dimensional Case

We will need two operators:

  • Kronecker product of two matrices: if A is \(m \times n\) and B is \(p\times q\), then \(A\otimes B\) is the \(mp\times nq\) matrix given by

    $$\begin{aligned} A\otimes B=\begin{pmatrix} a_{11}B &{}\quad \cdots &{}\quad a_{1n} B\\ \vdots &{}\quad \ddots &{}\quad \vdots \\ a_{m1}B &{}\quad \cdots &{}\quad a_{mn} B \end{pmatrix}. \end{aligned}$$
  • For a \(m\times n\) matrix X, vec(X) denotes the vectorization of the matrix X by rearranging X into a vector column by column.

The following properties will be used:

  1. 1.

    \((A \otimes B)(C \otimes D)=AC \otimes BD\).

  2. 2.

    \((A \otimes B)^{-1}=A^{-1}\otimes B^{-1}\).

  3. 3.

    \((B^T\otimes A)vec(X)=vec(AXB)\).

  4. 4.

    \((A \otimes B)^T=A^T \otimes B^T.\)
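These identities are easy to sanity-check numerically. The snippet below (our own illustration, not from the paper) verifies properties 3 and 4; property 3 is the workhorse behind the matrix-free implementation later in this section.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((3, 3)); B = rng.random((4, 4)); X = rng.random((3, 4))
# vec() stacks the columns of a matrix (column-major / Fortran order)
vec = lambda M: M.reshape(-1, order="F")
# property 3: (B^T kron A) vec(X) = vec(A X B)
assert np.allclose(np.kron(B.T, A) @ vec(X), vec(A @ X @ B))
# property 4: (A kron B)^T = A^T kron B^T
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
```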

Consider a uniform grid \((x_i,y_j)\) for a rectangular domain \(\bar{\Omega }=[0,1]\times [0,1]\) where \(x_i = ih_x\), \(i = 0,1,\ldots , n_x+1\), \(h_x=\frac{1}{n_x+1}\) and \(y_j = jh_y\), \(j = 0,1,\ldots , n_y+1\), \(h_y=\frac{1}{n_y+1}\).

Assume \(n_x\) and \(n_y\) are odd and let \(N_x=\frac{n_x+1}{2}\) and \(N_y=\frac{n_y+1}{2}\). We consider rectangular cells \(e_{kl} =[x_{2k},x_{2k+2}]\times [y_{2l},y_{2l+2}]\) for \(k=0,\ldots ,N_x-1\) and \(l=0,\ldots ,N_y-1\) as a finite element mesh for \(Q^2\) basis. Define

$$\begin{aligned} V^h= & {} \{v\in C^0(\Omega ): v|_{e_{kl}}\in Q^2(e_{kl}), k = 0,\ldots , N_x-1, l = 0,\ldots , N_y-1 \},\\ V^h_0= & {} \{v\in C^0(\Omega ): v|_{e_{kl}}\in Q^2(e_{kl}), k = 0,\ldots , N_x-1, l = 0,\ldots , N_y-1; v|_{\partial \Omega }\equiv 0 \}. \end{aligned}$$

For the coefficients \({\mathbf {a}}(x,y)=\begin{pmatrix} a^{11} &{}\quad a^{12}\\ a^{21} &{}\quad a^{22} \end{pmatrix} \), \({\mathbf {b}}=[b^1 \quad b^2]\) and c in the elliptic operator (2.3), consider their grid point values in the following form:

$$\begin{aligned} A^{kl}&=\begin{pmatrix} a_{00} &{}\quad a_{01} &{}\quad \dots &{}\quad a_{0,n_x+1}\\ a_{10} &{}\quad a_{11} &{}\quad \dots &{}\quad a_{1,n_x+1}\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ a_{n_y+1,0} &{}\quad a_{n_y+1,1} &{}\quad \dots &{}\quad a_{n_y+1,n_x+1} \end{pmatrix}_{(n_y+2)\times (n_x+2)},\quad a_{ij}=a^{kl}(x_j,y_i), \quad k,l =1,2,\\ B^{m}&=\begin{pmatrix} b_{00} &{}\quad b_{01} &{}\quad \dots &{}\quad b_{0,n_x+1}\\ b_{10} &{}\quad b_{11} &{}\quad \dots &{}\quad b_{1,n_x+1}\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ b_{n_y+1,0} &{}\quad b_{n_y+1,1} &{}\quad \dots &{}\quad b_{n_y+1,n_x+1} \end{pmatrix}_{(n_y+2)\times (n_x+2)},\quad b_{ij}=b^{m}(x_j,y_i), \quad m =1,2,\\ C&=\begin{pmatrix} c_{00} &{}\quad c_{01} &{}\quad \dots &{}\quad c_{0,n_x+1}\\ c_{10} &{}\quad c_{11} &{}\quad \dots &{}\quad c_{1,n_x+1}\\ \vdots &{}\quad \vdots &{}\quad &{}\quad \vdots \\ c_{n_y+1,0} &{}\quad c_{n_y+1,1} &{}\quad \dots &{}\quad c_{n_y+1,n_x+1} \end{pmatrix}_{(n_y+2)\times (n_x+2)},\quad c_{ij}=c(x_j,y_i). \end{aligned}$$

Let \(diag({\mathbf {x}})\) denote a diagonal matrix with the vector \({\mathbf {x}}\) as diagonal entries and define

$$\begin{aligned} \bar{W}_x= & {} diag\begin{pmatrix} \frac{1}{3},\frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3},\frac{1}{3} \end{pmatrix}_{(n_x+2)\times (n_x+2)},\\ \bar{W}_y= & {} diag\begin{pmatrix} \frac{1}{3},\frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3},\frac{1}{3} \end{pmatrix}_{(n_y+2)\times (n_y+2)},\\ {W}_x= & {} diag\begin{pmatrix} \frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3} \end{pmatrix}_{n_x \times n_x}, {W}_y=diag\begin{pmatrix} \frac{4}{3},\frac{2}{3},\frac{4}{3},\frac{2}{3},\ldots ,\frac{2}{3},\frac{4}{3} \end{pmatrix}_{n_y \times n_y }. \end{aligned}$$

For \(s=x\) or \(y\), we define the D and E matrices with dimension \({(n_s+2)\times (n_s+2)}\) for each variable:

$$\begin{aligned} D_s&=\frac{1}{2}\left( {\begin{matrix} -3 &{}\quad 4 &{}\quad -1 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ -1 &{}\quad 0 &{}\quad 1 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \frac{1}{2} &{}\quad -2 &{}\quad 0 &{}\quad 2 &{}\quad -\frac{1}{2}\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 0 &{}\quad 1\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 1 &{}\quad -4 &{}\quad 3 \end{matrix}}\right) , \\ E_s&=\frac{1}{2}\left( {\begin{matrix} 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad &{}\quad &{}\quad \\ &{}\quad &{}\quad -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2} &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \ddots &{}\quad &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -\frac{1}{2} &{}\quad 2 &{}\quad -3 &{}\quad 2 &{}\quad -\frac{1}{2}\\ &{}\quad &{}\quad &{}\quad 
&{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad 0 &{}\quad 0 &{}\quad 0 \end{matrix}}\right) . \end{aligned}$$

Define an inflation operator \(Infl: \mathbb {R}^{n_y\times n_x}\longrightarrow \mathbb {R}^{(n_y+2)\times (n_x+2)}\) by adding zeros:

$$\begin{aligned} Infl(U)=\begin{pmatrix} 0 &{}\quad \cdots &{}\quad 0\\ \vdots &{}\quad U &{}\quad \vdots \\ 0 &{}\quad \cdots &{}\quad 0\\ \end{pmatrix}_{(n_y+2)\times (n_x+2)} \end{aligned}$$

and its matrix representation is given as \({{\tilde{I}}}_x \otimes \tilde{I}_y\) where

$$\begin{aligned} {{\tilde{I}}}_x=\begin{pmatrix} {\mathbf {0}} \\ I_{n_x\times n_x} \\ {\mathbf {0}} \end{pmatrix}_{(n_x+2)\times n_x}, {{\tilde{I}}}_y=\begin{pmatrix} {\mathbf {0}} \\ I_{n_y\times n_y} \\ {\mathbf {0}} \end{pmatrix}_{(n_y+2)\times n_y}. \end{aligned}$$

Its adjoint is a restriction operator \(Res: \mathbb {R}^{(n_y+2)\times (n_x+2)} \longrightarrow \mathbb {R}^{n_y\times n_x} \) as

$$\begin{aligned} Res(X)=X(2:n_y+1, 2:n_x+1),\quad \forall X \in \mathbb {R}^{(n_y+2)\times (n_x+2)}, \end{aligned}$$

and its matrix representation is \({{\tilde{I}}}_x^T \otimes \tilde{I}_y^T.\)
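A quick NumPy check (our own; the helper name tilde_I is ours) confirms that \(Infl\) and Res are represented by \({{\tilde{I}}}_x \otimes {{\tilde{I}}}_y\) and its transpose.

```python
import numpy as np

def tilde_I(n):
    # (n+2) x n: the identity padded with a zero row on top and bottom
    return np.vstack([np.zeros((1, n)), np.eye(n), np.zeros((1, n))])

nx, ny = 3, 4
U = np.arange(float(nx * ny)).reshape(ny, nx)
vec = lambda M: M.reshape(-1, order="F")
Ix, Iy = tilde_I(nx), tilde_I(ny)
inflU = Iy @ U @ Ix.T                 # Infl(U): U framed by a border of zeros
assert np.allclose(np.kron(Ix, Iy) @ vec(U), vec(inflU))
assert np.allclose(np.kron(Ix.T, Iy.T) @ vec(inflU), vec(U))  # Res undoes Infl
```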

7.3 Two-Dimensional Case

For \({{\bar{\Omega }}}=[0,1]^2\) we first consider an elliptic equation with homogeneous Neumann boundary condition:

$$\begin{aligned}&-\nabla \cdot ({\mathbf {a}} \nabla u ) +{\mathbf {b}}\cdot \nabla u + c u = f \text { on } \Omega , \end{aligned}$$
(7.8)
$$\begin{aligned}&{\mathbf {a}} \nabla u \cdot {\mathbf {n}} = 0 \text { on } \partial \Omega . \end{aligned}$$
(7.9)

The variational form is to find \(u\in H^1(\Omega )\) satisfying

$$\begin{aligned} A( u, v)=(f,v),\quad \forall v\in H^1(\Omega ). \end{aligned}$$
(7.10)

The \(C^0\)-\(Q^2\) finite element method with \(3\times 3\) Gauss–Lobatto quadrature is to find \(u_h\in V^h\) satisfying

$$\begin{aligned} \langle {\mathbf {a}} \nabla u_h ,\nabla v_h \rangle _h +\langle \mathbf{b}\cdot \nabla u_h ,v_h\rangle _h+ \langle c u_h,v_h\rangle _h=\langle f,v_h\rangle _h,\quad \forall v_h\in V^h. \end{aligned}$$
(7.11)

Let \({\bar{U}}\) be a \((n_y+2)\times (n_x+2)\) matrix such that its (j,i)th entry is \({\bar{U}}(j,i)=u_h(x_{i-1},y_{j-1})\), \(i=1,\ldots ,n_x+2\), \(j=1,\ldots ,n_y+2\). Let \({\bar{F}}\) be a \((n_y+2)\times (n_x+2)\) matrix such that its (j,i)th entry is \({\bar{F}}(j,i)=f(x_{i-1},y_{j-1})\). Then the matrix form of (7.11) is

$$\begin{aligned} {\bar{S}}vec({\bar{U}}) = {\bar{M}} vec({\bar{F}}), \quad {\bar{M}}=h_xh_y \bar{W}_x\otimes {\bar{W}}_y, \quad {\bar{S}}= \sum _{k,l=1}^2 S_a^{kl}+ \sum _{m=1}^2 S_b^m +S_c, \end{aligned}$$
(7.12)

where

$$\begin{aligned} S_a^{11}&=\frac{h_y}{h_x}(D_x^T\otimes I_y)diag(vec({\bar{W}}_y A^{11}{\bar{W}}_x))(D_x\otimes I_y)\\&\quad + \frac{h_y}{h_x}(E_x^T \otimes I_y)diag(vec({\bar{W}}_y A^{11}{\bar{W}}_x))( E_x\otimes I_y), \\ S_a^{12}&=(D_x^T\otimes I_y)diag(vec({\bar{W}}_y A^{12}\bar{W}_x))(I_x\otimes D_y)\\&\quad + (E_x^T \otimes I_y)diag(vec({\bar{W}}_y A^{12}{\bar{W}}_x))(I_x\otimes E_y), \\ S_a^{21}&=(I_x\otimes D_y^T)diag(vec({\bar{W}}_y A^{21}\bar{W}_x))(D_x\otimes I_y)\\&\quad + (I_x\otimes E_y^T)diag(vec({\bar{W}}_y A^{21}{\bar{W}}_x))( E_x\otimes I_y),\\ S_a^{22}&=\frac{h_x}{h_y}(I_x\otimes D_y^T)diag(vec({\bar{W}}_y A^{22}{\bar{W}}_x))(I_x\otimes D_y)\\&\quad + \frac{h_x}{h_y}(I_x\otimes E_y^T)diag(vec({\bar{W}}_y A^{22}{\bar{W}}_x))(I_x\otimes E_y),\\ S_b^1&= h_y diag(vec({\bar{W}}_y B^1{\bar{W}}_x))(D_x\otimes I_y),\\ S_b^2&= h_x diag(vec({\bar{W}}_y B^2{\bar{W}}_x))(I_x\otimes D_y),\\ S_c&= h_x h_y diag(vec({\bar{W}}_y C{\bar{W}}_x)). \end{aligned}$$

Now consider the scheme (6.3) for nonhomogeneous Dirichlet boundary conditions. Its numerical solution can be represented as a matrix U of size \(n_y\times n_x\) with (j,i)th entry \(U(j,i)=u_h(x_i, y_j)\) for \(i=1,\ldots , n_x\); \(j=1,\ldots , n_y\). Similar to the one-dimensional case, its stiffness matrix can be obtained as a submatrix of \({\bar{S}}\) in (7.12). Let \({\bar{G}}\) be a \((n_y+2)\times (n_x+2)\) matrix with (j,i)th entry \(\bar{G}(j,i)=g(x_{i-1},y_{j-1})\), where, abusing notation, g denotes the boundary data extended by zero into the interior:

$$\begin{aligned} g(x,y)={\left\{ \begin{array}{ll} 0,&{}\quad \text{ if }\quad (x,y)\in (0,1)\times (0,1),\\ g(x,y),&{}\quad \text{ if }\quad (x,y)\in \partial \Omega .\\ \end{array}\right. } \end{aligned}$$

In particular, \({\bar{G}}(j+1,i+1) = 0\) for \(j=1,\ldots ,n_y\), \(i=1,\ldots ,n_x\). Let F be a matrix of size \(n_y\times n_x\) with (j,i)th entry \(F(j,i)=f(x_i, y_j)\) for \(i=1,\ldots , n_x\); \(j=1,\ldots , n_y\). Then the scheme (6.3) becomes

$$\begin{aligned} ({{\tilde{I}}}_x^T\otimes {{\tilde{I}}}_y^T){\bar{S}}({{\tilde{I}}}_x\otimes \tilde{I}_y)vec(U) = h_xh_y(W_x\otimes W_y)vec(F)-({{\tilde{I}}}_x^T\otimes \tilde{I}_y^T){\bar{S}} vec({\bar{G}}). \end{aligned}$$
(7.13)

Even though the stiffness matrix is given as \(S=({{\tilde{I}}}_x^T\otimes {{\tilde{I}}}_y^T){\bar{S}}({{\tilde{I}}}_x\otimes {{\tilde{I}}}_y)\), S should be implemented as a linear operator in iterative linear system solvers. For example, the matrix-vector multiplication \((\tilde{I}_x^T\otimes {{\tilde{I}}}_y^T)S^{11}_a({{\tilde{I}}}_x\otimes \tilde{I}_y)vec(U)\) is equivalent to the following linear operator from \(\mathbb {R}^{n_y\times n_x}\) to \(\mathbb {R}^{n_y\times n_x}\):

$$\begin{aligned}&\frac{h_y}{h_x}{{\tilde{I}}}_y^T\left\{ I_y\left( [{\bar{W}}_y A^{11}\bar{W}_x]\circ [I_y({{\tilde{I}}}_y U \tilde{I}^T_x)D_x^T]\right) D_x\right. \\&\quad \left. +I_y\left( [{\bar{W}}_y A^{11}\bar{W}_x]\circ [I_y({{\tilde{I}}}_y U \tilde{I}^T_x)E_x^T]\right) E_x\right\} {{\tilde{I}}}_x, \end{aligned}$$

where \(\circ \) is the Hadamard product (i.e., entrywise multiplication).
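The equivalence between the Kronecker form of \(S_a^{11}\) and a matrix-free operator can be verified on a tiny grid. In the sketch below (our own code, helper names ours), the Kronecker factors become right-multiplications by \(D_x\) or \(E_x\), and the Hadamard product applies the quadrature weights.

```python
import numpy as np

def DE(m):
    # 1D derivative (D) and jump (E) matrices from Sect. 7.1
    D = np.zeros((m, m)); E = np.zeros((m, m))
    D[0, :3] = [-3, 4, -1]; D[-1, -3:] = [1, -4, 3]
    for i in range(1, m - 1):
        if i % 2 == 1:
            D[i, i-1:i+2] = [-1, 0, 1]
        else:
            D[i, i-2:i+3] = [0.5, -2, 0, 2, -0.5]
            E[i, i-2:i+3] = [-0.5, 2, -3, 2, -0.5]
    return D / 2, E / 2

def gl_weights(m):
    w = np.full(m, 2/3); w[1::2] = 4/3; w[0] = w[-1] = 1/3
    return w

rng = np.random.default_rng(1)
nx, ny = 3, 5
mx, my = nx + 2, ny + 2
hx, hy = 1 / (nx + 1), 1 / (ny + 1)
Dx, Ex = DE(mx)
A11 = rng.random((my, mx))                           # grid values of a^{11}
WA = np.outer(gl_weights(my), gl_weights(mx)) * A11  # entries of W_y A^{11} W_x
vec = lambda M: M.reshape(-1, order="F")
Iy = np.eye(my)
# explicit Kronecker assembly of S_a^{11}
K = (hy / hx) * (np.kron(Dx.T, Iy) @ np.diag(vec(WA)) @ np.kron(Dx, Iy)
               + np.kron(Ex.T, Iy) @ np.diag(vec(WA)) @ np.kron(Ex, Iy))
Ubar = rng.random((my, mx))
# matrix-free application: Kronecker factors become right-multiplications
matfree = (hy / hx) * ((WA * (Ubar @ Dx.T)) @ Dx + (WA * (Ubar @ Ex.T)) @ Ex)
assert np.allclose(K @ vec(Ubar), vec(matfree))
```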

7.4 The Laplacian Case

For the one-dimensional constant coefficient case with homogeneous Dirichlet boundary conditions, the scheme can be written as a classical finite difference scheme \(H {\mathbf {u}}={\mathbf {f}}\) with

$$\begin{aligned} H=M^{-1}S=\frac{1}{h^2}\left( {\begin{matrix} 2&{}\quad -1 &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad \\ -2&{}\quad \frac{7}{2} &{}\quad -2 &{}\quad \frac{1}{4} &{}\quad &{}\quad &{}\quad \\ &{}\quad -1 &{}\quad 2&{}\quad -1 &{}\quad &{}\quad &{}\quad \\ &{}\quad \frac{1}{4} &{}\quad -2&{}\quad \frac{7}{2} &{}\quad -2 &{}\quad \frac{1}{4} &{}\quad \\ &{}\quad &{}\quad &{}\quad -1 &{}\quad 2&{}\quad -1 &{}\quad \\ &{}\quad &{}\quad &{}\quad &{}\quad \ddots &{}\quad \ddots &{}\quad \\ &{}\quad &{}\quad &{}\quad \frac{1}{4} &{}\quad -2&{}\quad \frac{7}{2} &{}\quad -2\\ &{}\quad &{}\quad &{}\quad &{}\quad &{}\quad -1 &{}\quad 2\\ \end{matrix}}\right) \end{aligned}$$

In other words, if \(x_i\) is a cell center, the scheme is

$$\begin{aligned} \frac{-u_{i-1}+2u_i-u_{i+1}}{h^2}=f_i, \end{aligned}$$

and if \(x_i\) is a knot away from the boundary, the scheme is

$$\begin{aligned} \frac{u_{i-2}-8u_{i-1}+14u_i-8u_{i+1}+u_{i+2}}{4h^2}=f_i. \end{aligned}$$

It is straightforward to verify that the local truncation error is only second order.
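This can also be confirmed numerically; the minimal check below (our own) applies the knot stencil to a smooth function and watches the residual shrink at second order.

```python
import numpy as np

u = np.sin          # test function; for -u'' the exact value is sin(x0)
x0 = 0.7
errs = []
for h in (0.1, 0.05, 0.025):
    xs = x0 + h * np.arange(-2, 3)
    # knot stencil: (u_{i-2} - 8u_{i-1} + 14u_i - 8u_{i+1} + u_{i+2})/(4h^2) ~ -u''
    approx = (u(xs[0]) - 8*u(xs[1]) + 14*u(xs[2]) - 8*u(xs[3]) + u(xs[4])) / (4*h**2)
    errs.append(abs(approx - np.sin(x0)))
# each halving of h cuts the error by roughly 4: second order truncation error
assert errs[1] < errs[0] / 3 and errs[2] < errs[1] / 3
```

A Taylor expansion shows the leading truncation term is \(\frac{h^2}{6}u''''(x_0)\), so second order is exactly what the experiment should show, yet the solution itself is fourth order accurate.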

For the two-dimensional Laplacian with homogeneous Dirichlet boundary conditions, the scheme can be rewritten as

$$\begin{aligned} {[}(H_x\otimes I_y)+(I_x\otimes H_y)]vec(U)=vec(F), \end{aligned}$$

where \(H_x\) and \(H_y\) are the same H matrix above with size \(n_x\times n_x\) and \(n_y\times n_y\) respectively. The inverse of \((H_x\otimes I_y)+(I_x\otimes H_y)\) can be efficiently constructed via the eigen-decomposition of small matrices \(H_x\) and \(H_y\):

  1. 1.

    Compute eigen-decomposition of \(H_x=T_x \Lambda _x T_x^{-1}\) and \(H_y=T_y \Lambda _y T_y^{-1}\).

  2. 2.

    The properties of Kronecker product imply that

    $$\begin{aligned} (H_x\otimes I_y)+(I_x\otimes H_y)=(T_x\otimes T_y)(\Lambda _x\otimes I_y+I_x\otimes \Lambda _y)(T_x^{-1}\otimes T_y^{-1}), \end{aligned}$$

    thus

    $$\begin{aligned} {[}(H_x\otimes I_y)+(I_x\otimes H_y)]^{-1}=(T_x\otimes T_y)(\Lambda _x\otimes I_y+I_x\otimes \Lambda _y)^{-1}(T_x^{-1}\otimes T_y^{-1}). \end{aligned}$$
  3. 3.

    It is nontrivial to determine whether H is diagonalizable. In all our numerical tests, H has no repeated eigenvalues. Thus, assuming \(\Lambda _x\) and \(\Lambda _y\) are diagonal matrices, the matrix vector multiplication \([(H_x\otimes I_y)+(I_x\otimes H_y)]^{-1}vec(F)\) can be implemented as a linear operator on F:

    $$\begin{aligned} T_y([T_y^{-1} F(T_x^{-1})^T]./\Lambda )T_x^T, \end{aligned}$$
    (7.14)

    where \(\Lambda \) is a \(n_y\times n_x\) matrix with (ij)th entry as \(\Lambda (i,j)=\Lambda _y(i,i)+\Lambda _x(j,j)\) and ./ denotes entry-wise division for two matrices of the same size.
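The three steps above amount to a fast diagonalization solve. The sketch below (our own code, with H hard-coded from the stencils in this section) checks (7.14) against a direct Kronecker solve on a small grid; in our runs the computed eigenvalues come out real, so any round-off imaginary parts are discarded.

```python
import numpy as np

def H_matrix(n, h):
    """1D Dirichlet matrix H for -u'' from Sect. 7.4 (n odd): cell-center rows
    (-1, 2, -1)/h^2, interior-knot rows (1/4, -2, 7/2, -2, 1/4)/h^2, with
    stencil entries hitting the boundary dropped."""
    H = np.zeros((n, n))
    for i in range(n):
        if i % 2 == 0:
            cs, js = [-1, 2, -1], range(i - 1, i + 2)
        else:
            cs, js = [0.25, -2, 3.5, -2, 0.25], range(i - 2, i + 3)
        for c, j in zip(cs, js):
            if 0 <= j < n:
                H[i, j] = c
    return H / h**2

nx, ny = 7, 9
hx, hy = 1 / (nx + 1), 1 / (ny + 1)
Hx, Hy = H_matrix(nx, hx), H_matrix(ny, hy)
lx, Tx = np.linalg.eig(Hx); lx, Tx = lx.real, Tx.real
ly, Ty = np.linalg.eig(Hy); ly, Ty = ly.real, Ty.real
F = np.random.default_rng(2).random((ny, nx))
Lam = ly[:, None] + lx[None, :]   # Lam(i,j) = Lambda_y(i,i) + Lambda_x(j,j)
# formula (7.14): U = T_y([T_y^{-1} F (T_x^{-1})^T] ./ Lam) T_x^T
U = Ty @ ((np.linalg.solve(Ty, F) @ np.linalg.inv(Tx).T) / Lam) @ Tx.T
# check against a direct solve of [(Hx kron Iy) + (Ix kron Hy)] vec(U) = vec(F)
vec = lambda M: M.reshape(-1, order="F")
big = np.kron(Hx, np.eye(ny)) + np.kron(np.eye(nx), Hy)
assert np.allclose(vec(U), np.linalg.solve(big, vec(F)))
```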

For the 3D Laplacian, the matrix can be represented as \(H_x\otimes I_y\otimes I_z+I_x\otimes H_y\otimes I_z+I_x\otimes I_y\otimes H_z\) thus can be efficiently inverted through eigen-decomposition of small matrices \(H_x, H_y\) and \(H_z\) as well.

Since the eigen-decomposition of the small matrices \(H_x\) and \(H_y\) can be precomputed, and (7.14) costs only \(\mathcal O(n^3)\) operations for a 2D problem on an \(n\times n\) mesh, in practice (7.14) can be used as a simple preconditioner in conjugate gradient solvers for the following linear system equivalent to (7.13):

$$\begin{aligned} \frac{1}{h_xh_y}(W_x^{-1}\otimes W_y^{-1})({{\tilde{I}}}_x^T\otimes {{\tilde{I}}}_y^T)\bar{S}({{\tilde{I}}}_x\otimes {{\tilde{I}}}_y)vec(U) = vec(F)-\frac{1}{h_xh_y}(W_x^{-1}\otimes W_y^{-1})({{\tilde{I}}}_x^T\otimes {{\tilde{I}}}_y^T){\bar{S}} vec({\bar{G}}), \end{aligned}$$

even though the multigrid method as reviewed in [19] is the optimal solver in terms of computational complexity.

8 Numerical Results

In this section we show a few numerical tests verifying the accuracy of the scheme (6.3) for \(k=2\), implemented as a finite difference scheme on a uniform grid. We first consider the following two-dimensional elliptic equation:

$$\begin{aligned} - \nabla \cdot ({\mathbf {a}}\nabla u)+{\mathbf {b}}\cdot \nabla u+c u=f\quad \text {on } [0,1]\times [0,2] \end{aligned}$$
(8.1)

where \({\mathbf {a}}=\left( {\begin{array}{cc} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \\ \end{array} } \right) \), \(a_{11}=10+30y^5+x\cos {y}+y\), \(a_{12}=a_{21}=2+0.5(\sin (\pi x)+x^3)(\sin (\pi y)+y^3)+\cos (x^4+y^3)\), \(a_{22}=10+x^5\), \({\mathbf {b}}={\mathbf {0}}\), \(c=1+x^4y^3\), with an exact solution

$$\begin{aligned} u(x,y)=0.1(\sin (\pi x)+x^3)(\sin (\pi y)+y^3)+\cos (x^4+y^3). \end{aligned}$$

The errors at grid points are listed in Table 1 for purely Dirichlet boundary conditions and in Table 2 for purely Neumann boundary conditions. We observe fourth order accuracy in the discrete 2-norm for both tests, even though only \({\mathcal {O}}(h^{3.5})\) can be proven for Neumann boundary conditions, as discussed in Remark 5.5. Regarding superconvergence of function values at Gauss–Lobatto points in the maximum norm, only \({\mathcal {O}}(h^3\log h)\) can be proven even for the full finite element scheme (1.1), since the discrete Green's function is used; see [4].

Table 1 A 2D elliptic equation with Dirichlet boundary conditions
Table 2 A 2D elliptic equation with Neumann boundary conditions

Next we consider a three-dimensional problem \(-\Delta u=f\) with homogeneous Dirichlet boundary conditions on a cube \([0,1]^3\) with the following exact solution

$$\begin{aligned} u(x,y,z)=\sin (\pi x)\sin (2\pi y)\sin (3\pi z)+(x-x^3)(y^2-y^4)(z-z^2). \end{aligned}$$

See Table 3 for the performance of the finite difference scheme. There is no essential difficulty in extending the proof to three dimensions, even though the extension is not entirely straightforward. Nonetheless we observe that the scheme is indeed fourth order accurate. The linear system is solved by the eigenvector method shown in Sect. 7.4. The discrete 2-norm over the set of all grid points \(Z_0\) is defined as \(\Vert u\Vert _{2,Z_0}=\left[ h^3\sum _{(x,y,z)\in Z_0} |u(x,y,z)|^2\right] ^{\frac{1}{2}}\).

Table 3 \(-\Delta u=f\) in 3D with homogeneous Dirichlet boundary condition

Last we consider (8.1) with a convection term whose coefficient \({\mathbf {b}}\) is incompressible, \(\nabla \cdot \mathbf{b}=0\): \({\mathbf {a}}=\left( {\begin{array}{cc} a_{11} &{} a_{12} \\ a_{21} &{} a_{22} \\ \end{array} } \right) \), \(a_{11}=100+30y^5+x\cos {y}+y\), \(a_{12}=a_{21}=2+0.5(\sin (\pi x)+x^3)(\sin (\pi y)+y^3)+\cos (x^4+y^3)\), \(a_{22}=100+x^5\), \({\mathbf {b}}=\left( {\begin{array}{c} b_{1} \\ b_{2} \\ \end{array} } \right) \), \(b_1=\psi _y\), \(b_2=-\psi _x\), \(\psi = x\exp (x^2+y)\), \(c=1+x^4y^3\), with an exact solution

$$\begin{aligned} u(x,y)=0.1(\sin (\pi x)+x^3)(\sin (\pi y)+y^3)+\cos (x^4+y^3). \end{aligned}$$

The errors at grid points are listed in Table 4 for Dirichlet boundary conditions.

Table 4 A 2D elliptic equation with convection term and Dirichlet boundary conditions

9 Concluding Remarks

In this paper we have proven the superconvergence of function values in the simplest finite difference implementation of the \(C^0\)-\(Q^k\) finite element method for elliptic equations. In particular, for the case \(k=2\) the scheme (6.3) can be easily implemented as a fourth order accurate finite difference scheme, as shown in Sect. 7. It provides not only a convenient approach for constructing fourth order accurate finite difference schemes but also the most efficient implementation of the \(C^0\)-\(Q^k\) finite element method without losing superconvergence of function values. In a follow-up paper [12], we will show that a discrete maximum principle can be proven for the scheme (6.3) in the case \(k=2\) when solving a variable coefficient Poisson equation.