1 Introduction

Since the early, highly recognized, and independent works [1, 10], the Helmholtz decomposition has been the dominant tool in a posteriori error estimates for mixed finite element methods. It splits the flux error vector field, which has components in \(L^2\) without any further structure, into the sum of a gradient and a curl term, each of which is then successfully recast with piecewise integration by parts. The Helmholtz decomposition requires simply-connected domains, a restriction that is sometimes left unmentioned in the literature. An alternative formulation which does not require the Helmholtz decomposition explicitly has been provided as a unifying theory in [11, 14].

In the underlying dual formulation of mixed finite element methods, the primal variable acts as a Lagrange multiplier in \(L^2\). The error of that variable is not mentioned in [1] and analyzed with a duality argument for a convex polygonal domain in [10] for full elliptic regularity. From the arguments of that paper, it appears that no efficient error control is feasible for non-convex domains.

However, the present paper replaces the Helmholtz decomposition by a regular split [18, 19] and so establishes efficient error control for non-convex and possibly multiply-connected domains. The functional analytical framework differs from that of [11, 14] and employs the continuous spaces of the mixed finite element method like \(H({{\mathrm{div}}})\) and \(L^2\). This raises the question of how to compute the norm of some given residual in the dual \(H({{\mathrm{div}}})^*\) of \(H({{\mathrm{div}}})\).

Given some functions \(p_h\in L^2(\Omega ;\mathbb {R}^{m\times n})\) and \(u_h\in L^2(\Omega ;\mathbb {R}^m)\) on the bounded Lipschitz domain \(\Omega \) with polyhedral boundary \(\partial \Omega \) with values in the matrix space \(\mathbb {R}^{m\times n}\) (with scalar product  : ) and in the vector space \(\mathbb {R}^m\) (with scalar product \(\cdot \)), consider the functional

$$\begin{aligned} \ell (q):=\int _{\Omega }(p_h:q+u_h\cdot {{\mathrm{div}}}q)\,dx \end{aligned}$$
(1)

for any test function \(q\in \mathcal {H}:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\equiv H({{\mathrm{div}}},\Omega )^m\). This functional \(\ell \) is a typical residual of approximations \(p_h\) and \(u_h\) to the flux variable p and the displacement variable u with \(p=Du\) for some functional matrix D and follows via an integration by parts. In a typical mixed formulation for the discretization of a gradient, this residual \(\ell \) vanishes for the discrete test functions in some finite element subspace \(M_h\) of \( H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\). Under some natural conditions on some Fortin interpolation operator, the main result of this paper establishes that the a posteriori error estimator

$$\begin{aligned} \eta := \min _{ v\in H^1_0(\Omega ;\mathbb {R}^m)} \Vert p_h-Dv\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})}+\min _{ q\in Q} \Vert h(p_h-q)\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})} \end{aligned}$$
(2)

is reliable and efficient in the sense of the equivalence

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\approx \eta . \end{aligned}$$
(3)

The weight function h in (2) is the local mesh-size with respect to some underlying triangulation. The point is that the power of the weight h in (2) is one also for non-convex domains in contrast to the suggestion of the immediate modification of the analysis in [10] to reduced elliptic regularity. It seems surprising that the variable \(u_h\) does not enter \(\eta \) in (2) explicitly; there is an implicit dependence via some orthogonality condition which leads to the space Q. The analysis relies on some Fortin interpolation operator \(I_F\) which maps gradients in the kernel of \(\ell \) and which specifies the space Q as an orthogonal complement, see Sect. 3. Whatever this space Q is, in all examples of this paper (and no counter-example is known to the authors at all) it holds

$$\begin{aligned} D_h u_h\in Q \end{aligned}$$

(where \(D_hu_h\) is the piecewise application of the functional matrix D to the piecewise smooth \(u_h\)). Hence,

$$\begin{aligned} \eta _h:= \min _{ v\in H^1_0(\Omega ;\mathbb {R}^m)} \Vert p_h-Dv\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})} + \Vert h(p_h-D_h u_h)\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})}\ge \eta \end{aligned}$$

defines a computable reliable estimator (where the minimization in the first term may be further estimated by the techniques introduced in [14]). The efficiency analysis covers this estimator \(\eta _h\) as well and hence provides the equivalence \( \Vert \ell \Vert _{\mathcal {H}^*}\approx \eta _h\approx \eta \).

The minimization in the first term on the right-hand side of (2) is standard and can be performed via a post-processing of \(u_h\) that provides continuity. This leads to some \(v\in H^1_0(\Omega ;\mathbb {R}^m)\) which bounds the minimum from above and leads to a guaranteed upper bound.

The abstract results apply immediately to the dual formulation of the Poisson equation and the pseudo-stress formulation of the Stokes equations. Further applications require a modification. In linear elasticity, an extended bilinear form is required for PEERS while the Arnold–Winther FEM requires symmetric strain tensors (rather than the functional matrices). All those applications generalize known results to non-convex domains and improve the existing a posteriori error estimates.

The remaining parts of the paper are organized as follows. Section 2 characterizes the norm of \(\ell \) in \(\mathcal {H}^*\). Section 3 presents the abstract conditions (F1)–(F3) for reliability and its proof. Under a further abstract condition (H), local shape functions in shape-regular triangulations allow for a proof of efficiency in Sect. 4. The impact of those results on mixed methods is outlined in Sect. 5 together with a motivation for the abstract conditions (F1)–(F3) and (H). While Sect. 5 merely aims at a paradigm, the subsequent sections discuss precise examples and quote details from the literature to ensure the reliable and efficient a posteriori error control. The list of applications includes the Poisson problem, the Stokes equations, and the Navier–Lamé equation in Sects. 6, 7, and 8.

Throughout this paper, an inequality \(a\lesssim b\) replaces \(a\le C\,b\) with a multiplicative mesh-size independent constant C that depends only on the domain and the shape (e.g. through the aspect ratio) of finite elements; \(a \approx b\) abbreviates \(a\lesssim b\lesssim a\). The Lebesgue and Sobolev spaces \(L^2(\Omega ;\mathbb {R}^m)\), \(H^{1}(\Omega ;\mathbb {R}^m)\) and \(H^1_0(\Omega ;\mathbb {R}^m)\) and their norms are defined as usual for \(m=1,2,\ldots ,n\) and \(n=2,3\). In (1), \(\cdot \) (resp. \(:\)) denotes the scalar product in \(\mathbb {R}^m\) (resp. \(\mathbb {R}^{m\times n}\)) and

$$\begin{aligned} \mathcal {H}:= H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n}):=\{ q\in L^2(\Omega ;\mathbb {R}^{m\times n}):\, {{\mathrm{div}}}q\in L^2(\Omega ;\mathbb {R}^m)\}. \end{aligned}$$
(4)

Given symmetric and positive definite linear operators \(A:\mathbb {R}^{m\times n}\rightarrow \mathbb {R}^{m\times n}\) and \(B:\mathbb {R}^{m}\rightarrow \mathbb {R}^{m}\), the Hilbert space \(\mathcal {H}\) is endowed with the norm

$$\begin{aligned} \Vert q\Vert _{\mathcal {H}}:=(\Vert q\Vert _{A^{-1}}^2+ \Vert {{\mathrm{div}}}q\Vert _{B^{-1}}^2 )^{1/2} \end{aligned}$$
(5)

for \(q\in \mathcal {H}\) with respect to the weighted \(L^2\)-norms

$$\begin{aligned} \Vert q\Vert _A:=\left( \int _\Omega q:A q\,dx\right) ^{1/2}\quad \text {and}\quad \Vert v\Vert _B:=\left( \int _\Omega v\cdot B v\,dx\right) ^{1/2} \end{aligned}$$
(6)

for all \(q\in L^2(\Omega ;\mathbb {R}^{m\times n})\) and \(v\in L^2(\Omega ;\mathbb {R}^m)\). Recall \(\mathcal {H}:= H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\) and set

$$\begin{aligned} \mathcal {V}:=H^1_0(\Omega ;\mathbb {R}^m), \mathcal {W}:=H^{2}(\Omega ;\mathbb {R}^m), \text { and }\,\, \mathcal {M}:=AD\mathcal {W}. \end{aligned}$$

The unit matrix is written \(\mathbf{1}\) and the symmetric (resp. skew-symmetric) matrices read \(\mathbb {R}_{{{\mathrm{sym}}}}^{n\times n}\) (resp. \(\mathbb {R}_{{{\mathrm{skew}}}}^{n\times n}\)) with scalar product  : , trace \({{\mathrm{tr}}}{}\), and deviatoric part \({{\mathrm{dev}}}\).

In principle, the linear operators A and B may be chosen more generally as varying coefficients. Here, the constant coefficient tensor A allows for \(\mathcal {M}\subset \mathcal {H}\) which is required to guarantee some regularity properties of the Helmholtz decomposition as introduced in Sect. 3.

2 Characterization of the norm of \(\ell \) in \(\mathcal {H}^*\)

Recall the definition (1) of \(\ell \) with given \(p_h\in L^2(\Omega ;\mathbb {R}^{m\times n})\) and \(u_h\in L^2(\Omega ;\mathbb {R}^m)\). In view of possible weights, suppose that \(\sigma _h\in \mathcal {H}:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\) satisfies \(p_h:= A^{-1} \sigma _h\).

Theorem 1

(Characterization of \(\Vert \ell \Vert _{\mathcal {H}^*}\)) Given \(\sigma _h\in \mathcal {H}\) and \(u_h\in L^2(\Omega ;\mathbb {R}^m)\), the dual norm of \(\ell \in \mathcal {H}^*\) defined by

$$\begin{aligned} \ell (q):=\int _{\Omega }(q : A^{-1} \sigma _h+u_h\cdot {{\mathrm{div}}}q )\,dx\quad \text {for all }\,\, q \in \mathcal {H}\end{aligned}$$
(7)

satisfies

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}= \min _{ v\in \mathcal {V}} (\Vert A^{-1} \sigma _h-Dv\Vert ^2_A + \Vert u_h-v\Vert _{B}^2)^{1/2}. \end{aligned}$$
(8)

The unique minimizer \(v\in \mathcal {V}:=H^1_0(\Omega ;\mathbb {R}^m)\) of the right-hand side is characterized as the unique solution to

$$\begin{aligned} -{{\mathrm{div}}}ADv+Bv=-{{\mathrm{div}}}\sigma _h+Bu_h\quad \text {in }\,\,H^{-1}(\Omega ;\mathbb {R}^m). \end{aligned}$$
(9)

Proof

Given \(q\in \mathcal {H}\) and \(v\in \mathcal {V}\), an integration by parts leads to

$$\begin{aligned} \ell (q)=\int _{\Omega }(p_h-Dv):q\,dx+\int _{\Omega }(u_h-v)\cdot {{\mathrm{div}}}q\,dx. \end{aligned}$$
(10)

Cauchy inequalities and the weighted \(L^2\)-norms from (6) therefore yield

$$\begin{aligned} \ell (q)\le (\Vert p_h-Dv\Vert _A^2+\Vert u_h-v\Vert _B^2)^{1/2}\Vert q\Vert _{\mathcal {H}}. \end{aligned}$$

Since \(q\in \mathcal {H}\) and \(v\in \mathcal {V}\) are arbitrary, this implies

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\le \inf _{v\in \mathcal {V}}(\Vert p_h-Dv\Vert _A^2+\Vert u_h-v\Vert _B^2)^{1/2}. \end{aligned}$$

Existence and uniqueness of a minimizer \(v\in \mathcal {V}\) and its characterization by its Euler–Lagrange equation (9) follow from standard arguments in the calculus of variations.

To prove the equality (8), let \(v\in \mathcal {V}\) solve the elliptic PDE (9) and set \(q:=A(p_h-Dv)\in \mathcal {H}\). This yields \({{\mathrm{div}}}q=B(u_h-v)\in L^2(\Omega ;\mathbb {R}^m)\) and, together with (10), reveals that

$$\begin{aligned} \ell (q)=\Vert p_h-Dv\Vert _A^2+\Vert u_h-v\Vert _B^2=\Vert q\Vert _{\mathcal {H}}^2. \end{aligned}$$

Consequently, for the particular v,

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\ge (\Vert p_h-Dv\Vert _A^2+\Vert u_h-v\Vert _B^2)^{1/2}. \end{aligned}$$

\(\square \)
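The standard argument behind (9) may be sketched as follows; the functional \(J\) and the test function \(w\) are auxiliary notation introduced only for this illustration. Let \(J(v):=\Vert p_h-Dv\Vert _A^2+\Vert u_h-v\Vert _B^2\) with \(p_h=A^{-1}\sigma _h\). The symmetry of A and B shows that a minimizer \(v\in \mathcal {V}\) satisfies, for all \(w\in \mathcal {V}\),

$$\begin{aligned} 0=\tfrac{1}{2}\,\tfrac{d}{dt}J(v+tw)\Big \vert _{t=0} =-\int _\Omega (\sigma _h-ADv):Dw\,dx-\int _\Omega B(u_h-v)\cdot w\,dx. \end{aligned}$$

Read in the sense of distributions, the first integral equals the duality pairing of \({{\mathrm{div}}}(\sigma _h-ADv)\) with \(w\), so that

$$\begin{aligned} {{\mathrm{div}}}(\sigma _h-ADv)=B(u_h-v)\quad \text {in }\,\,H^{-1}(\Omega ;\mathbb {R}^m), \end{aligned}$$

which is (9) after rearranging. Since the right-hand side belongs to \(L^2(\Omega ;\mathbb {R}^m)\), this also explains why \(q=A(p_h-Dv)=\sigma _h-ADv\) belongs to \(\mathcal {H}\) in the second part of the proof.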

Remark 1

(Reliable error control) Theorem 1 asserts that any conforming approximation \(\tilde{u}_h\in \mathcal {V}\) to \(u_h\) (e.g., some Clément quasi interpolation of \(u_h\)) leads to some reliable residual estimate

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\le (\Vert p_h-D\tilde{u}_h\Vert ^2_A + \Vert u_h-\tilde{u}_h\Vert _{B}^2)^{1/2}. \end{aligned}$$
(11)

It cannot be overemphasized that (11) holds without extra conditions like smoothness or convexity of the domain or \(H^2\) regularity of the solution and involves the explicit constant factor one on the right-hand side.

Remark 2

(Conformity error) The contribution

$$\begin{aligned} \delta :=\min _{v\in \mathcal {V}}\Vert p_h-Dv\Vert _A \end{aligned}$$
(12)

denotes the conformity error of \(p_h\) which measures the weighted \(L^2\) distance of \(p_h\) to the set of admissible gradients. Observe from Theorem 1 that \(\delta \le \Vert \ell \Vert _{\mathcal {H}^*}\) is efficient.
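The efficiency of \(\delta \) stated in Remark 2 is a one-line consequence of (8). In the following sketch, \(v^*\in \mathcal {V}\) (a symbol not used elsewhere in the text) denotes the minimizer from Theorem 1:

$$\begin{aligned} \delta \le \Vert p_h-Dv^*\Vert _A\le (\Vert p_h-Dv^*\Vert _A^2+\Vert u_h-v^*\Vert _B^2)^{1/2}=\Vert \ell \Vert _{\mathcal {H}^*}. \end{aligned}$$

In particular, the constant in this lower bound is one.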

3 Reliable a posteriori estimate for \(\Vert \ell \Vert _{\mathcal {H}^*}\)

Given \(\sigma _h\in \mathcal {H}\) and \(u_h\in L^2(\Omega ;\mathbb {R}^m)\), let the bounded linear functional \(\ell \in \mathcal {H}^*\) be defined by (7) and let \({{\mathrm{ker}}}\ell :=\{q\in \mathcal {H}:\ell (q)=0\}\) denote its kernel.

The a posteriori error control for the norm of \(\ell \) in \(\mathcal {H}^*\) requires the following hypotheses (F1)–(F3) on the data and the space \(\mathcal {M}=AD(H^2(\Omega ;\mathbb {R}^m))\subset H^1(\Omega ; \mathbb {R}^{m\times n})\subset \mathcal {H}\).

There exists some linear and bounded Fortin interpolation operator

$$\begin{aligned} I_F: \mathcal {M}\rightarrow {{\mathrm{ker}}}\ell , \end{aligned}$$
(F1)

which satisfies the orthogonality condition

$$\begin{aligned} \int _\Omega u_h\cdot {{\mathrm{div}}}(q-I_F q)\,dx =0\quad \text {for all }\,\,q\in \mathcal {M}, \end{aligned}$$
(F2)

and the approximation property

$$\begin{aligned} \Vert h^{-1}(q-I_F q)\Vert _{A^{-1}}\lesssim \Vert q\Vert _{H^1(\Omega ;\mathbb {R}^{m\times n})}\quad \text {for all }\,\,q\in \mathcal {M}\end{aligned}$$
(F3)

with some weight function \(h\in L^{\infty }(\Omega )\) and its reciprocal \(h^{-1}\in L^{\infty }(\Omega )\).

Define some subspace Q as the \(L^2\)-orthogonal complement (of the topological closure) of \((1-I_F)(\mathcal {M})\), namely

$$\begin{aligned} Q:=\left\{ q\in L^2(\Omega ;\mathbb {R}^{m\times n}):\;\forall \tau \in \mathcal {M}, \;\int _\Omega q:(\tau -I_F\tau )\,dx = 0\right\} . \end{aligned}$$
(13)

Theorem 2

\((\Vert \ell \Vert _{\mathcal {H}^*}\lesssim \eta )\) Recall \(\delta \) from (12) and set

$$\begin{aligned} \mu :=\min _{q\in Q}\Vert h(p_h-q)\Vert _B. \end{aligned}$$
(14)

The hypotheses (F1)–(F3) imply

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\lesssim \eta :=\delta +\mu . \end{aligned}$$

Proof of Theorem 2

In order to estimate \(\ell \) we study \(\ell (q)\) for an arbitrary fixed element \(q\in \mathcal {H}\) and a regular decomposition. Recall the definition of the \({{\mathrm{Curl}}}\) operator; the notation for \(n=2\) and \(n=3\) is very different, so let \(\tilde{n}:=1\) if \(n=2\) and \(\tilde{n}:=3\) if \(n=3\), and define the \({{\mathrm{Curl}}}\) of a function \(\psi \in H^1(\Omega ;\mathbb {R}^{\tilde{n}})\) by

$$\begin{aligned} {{\mathrm{Curl}}}\psi := {\left\{ \begin{array}{ll} \left( -\tfrac{\partial \psi }{\partial x_2},\tfrac{\partial \psi }{\partial x_1}\right) ^\top &{} \text {if } n=2,\\ \nabla \times \psi &{}\text {if }n=3. \end{array}\right. } \end{aligned}$$

(Here, \(v\times w\) denotes the vector product of two vectors \(v,w\) in \(\mathbb {R}^3\).) For \(\psi =(\psi _1,\ldots ,\psi _m)^\top \in H^1(\Omega ;\mathbb {R}^{m\times \tilde{n}})\) with rows \(\psi _1,\ldots ,\psi _m \in H^1(\Omega ;\mathbb {R}^{\tilde{n}})\) define

$$\begin{aligned} {{\mathrm{Curl}}}\psi := \left( \begin{array}{c}({{\mathrm{Curl}}}\psi _1)^\top \\ \vdots \\ ({{\mathrm{Curl}}}\psi _m)^\top \end{array} \right) \in L^2(\Omega ,\mathbb {R}^{m\times n}). \end{aligned}$$

Some minor generalization of [18, Lemma 3.3] shows that \(q\in \mathcal {H}\) can be decomposed as

$$\begin{aligned} q=AD\alpha +{{\mathrm{Curl}}}\beta , \end{aligned}$$
(15)

where \(\alpha \in \mathcal {W}=H^2(\Omega ;\mathbb {R}^m)\) (that is \(AD\alpha \in \mathcal {M}\)), \(\beta \in H^1(\Omega ;\mathbb {R}^{m\times \tilde{n}})\) and

$$\begin{aligned} \Vert AD\alpha \Vert _{H^{1}(\Omega )}+\Vert \beta \Vert _{H^1(\Omega )} \lesssim \Vert q\Vert _{\mathcal {H}}. \end{aligned}$$
(16)

The proof considers a large ball \(\hat{\Omega }\) which includes \(\Omega \) and an extension of q from \(\Omega \) to some \(\hat{q}\in H({{\mathrm{div}}},\hat{\Omega };\mathbb {R}^{m\times n})\). Some standard elliptic regularity estimate of the equation

$$\begin{aligned} -{{\mathrm{div}}}AD\hat{\alpha }=-{{\mathrm{div}}}\hat{q}\quad \text {with solution }\hat{\alpha }\in H^1_0(\hat{\Omega };\mathbb {R}^m)\cap H^2(\hat{\Omega };\mathbb {R}^m) \end{aligned}$$

and some divergence-free remainder \({{\mathrm{Curl}}}\hat{\beta }=\hat{q}-AD\hat{\alpha }\in L^2(\hat{\Omega };\mathbb {R}^{m\times n})\) lead to the required \(\alpha \) and \(\beta \) as the restrictions of \(\hat{\alpha }\) and \(\hat{\beta }\) to \(\Omega \).

The mapping property (F1) of \(I_F\) yields \(\ell (I_FAD\alpha )=0\). Hence, the approximation error

$$\begin{aligned} E:=AD\alpha -I_FAD\alpha \in H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n}) \end{aligned}$$

allows some error control with (F3) and satisfies \(\ell (AD\alpha )=\ell (E)\). This, the definition (1), and (F2) lead to

$$\begin{aligned} \ell (AD\alpha )=\int _{\Omega }(p_h-q):E\,dx \end{aligned}$$

for an arbitrary \(q\in Q\). A weighted Cauchy inequality proves

$$\begin{aligned} \ell (AD\alpha )\le \Vert h(p_h-q)\Vert _{A}\Vert h^{-1}E\Vert _{A^{-1}}. \end{aligned}$$

Since \(q\in Q\) is arbitrary, this and the definition (14) imply

$$\begin{aligned} \ell (AD\alpha )\le \mu \,\Vert h^{-1}E\Vert _{A^{-1}}. \end{aligned}$$

The approximation property of \(I_F\) in (F3) proves

$$\begin{aligned} \ell (AD\alpha )\lesssim \mu \, || AD\alpha ||_{H^1(\Omega ;\mathbb {R}^{m\times n})}. \end{aligned}$$
(17)

On the other hand, given any \(v\in \mathcal {V}\), a weighted Cauchy inequality yields

$$\begin{aligned} \ell ({{\mathrm{Curl}}}\beta )&=\int _\Omega p_h:{{\mathrm{Curl}}}\beta \,dx =\int _\Omega (p_h-Dv):{{\mathrm{Curl}}}\beta \,dx\\&\le \Vert p_h-Dv\Vert _A\Vert {{\mathrm{Curl}}}\beta \Vert _{A^{-1}}. \end{aligned}$$

Since v is arbitrary, the definition (12) leads to

$$\begin{aligned} \ell ({{\mathrm{Curl}}}\beta )\le \delta \, \Vert {{\mathrm{Curl}}}\beta \Vert _{A^{-1}}. \end{aligned}$$
(18)

The combination of (17) and (18) with (15) and (16) concludes the proof. \(\square \)
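The final combination step may be spelled out as follows. This is a sketch under the assumption that \(\Vert {{\mathrm{Curl}}}\beta \Vert _{A^{-1}}\lesssim \Vert \beta \Vert _{H^1(\Omega )}\), which holds because \(A^{-1}\) is bounded and \({{\mathrm{Curl}}}\) involves first derivatives only. The linearity of \(\ell \), the split (15), and the bounds (16)–(18) give

$$\begin{aligned} \ell (q)&=\ell (AD\alpha )+\ell ({{\mathrm{Curl}}}\beta ) \lesssim \mu \,\Vert AD\alpha \Vert _{H^1(\Omega )}+\delta \,\Vert \beta \Vert _{H^1(\Omega )}\\&\le (\delta +\mu )\,(\Vert AD\alpha \Vert _{H^1(\Omega )}+\Vert \beta \Vert _{H^1(\Omega )}) \lesssim \eta \,\Vert q\Vert _{\mathcal {H}}. \end{aligned}$$

The supremum over all \(q\in \mathcal {H}\) with \(\Vert q\Vert _{\mathcal {H}}=1\) then yields \(\Vert \ell \Vert _{\mathcal {H}^*}\lesssim \eta \).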

Remark 3

(Regular split) The proof of the decomposition (15) is the same as that in the proof of [18, Lemma 3.3] with \(\nabla (\cdot )\) replaced by \(AD(\cdot )\) and \(\Delta (\cdot )\) replaced by \({{\mathrm{div}}}AD(\cdot )\). Note that a similar decomposition could be derived from [19].

Remark 4

(Helmholtz decomposition) The decomposition (15) exploits the regularity of the input \(q\in H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\) to choose the first component \(AD\alpha \) as the gradient (weighted by A) of some function \(\alpha \in H^2(\Omega ;\mathbb {R}^m)\). In contrast to the classical (\(L^2\) orthogonal) Helmholtz decomposition

$$\begin{aligned} L^2(\Omega ;\mathbb {R}^{m\times n})=D(H^1_0(\Omega ;\mathbb {R}^m))\oplus {{\mathrm{Curl}}}(H^1(\Omega ;\mathbb {R}^m)/\mathbb {R}), \end{aligned}$$
(19)

\(\alpha \) is not enforced to match Dirichlet boundary condition on \(\partial \Omega \).

Remark 5

(Suboptimal analysis) It is possible to perform our analysis with (15) replaced by the classical Helmholtz decomposition (19). However, the regularity of \(AD\alpha \) is then limited by the elliptic regularity on the domain \(\Omega \) under consideration, i.e., \(\alpha \in H^{1+s}(\Omega ;\mathbb {R}^{m})\cap H^1_0(\Omega ;\mathbb {R}^{m})\) and \(AD\alpha \in H^s(\Omega ;\mathbb {R}^{m\times n})\) for some \(0< s\le 1\) with \(s<1\) for non-convex \(\Omega \). The arguments then lead to suboptimal upper bounds

$$\begin{aligned} \Vert \ell \Vert _{\mathcal {H}^*}\lesssim \delta +\min _{q\in Q}\Vert h^s(p_h-q)\Vert _B. \end{aligned}$$

4 Efficient a posteriori error estimate for \(\Vert \ell \Vert _{\mathcal {H}^*}\)

The efficiency of the proposed estimator is based on a local inverse estimate technique described in terms of a triangulation. For this purpose, let \(\partial \Omega \) be piecewise affine such that \(\overline{\Omega }\) is the union of a shape regular triangulation \(\mathcal {T}\) into triangles or parallelograms for \(n=2\) and into tetrahedra or parallelepipeds for \(n=3\) (without hanging nodes etc.). Let the weight function h from (F3) be piecewise constant on \(\Omega \) defined by \(h\vert _T:=h_T:={{\mathrm{diam}}}(T)\) for \(T\in \mathcal {T}\). Moreover, let \(D_h\) denote the piecewise action of the differential operator D to piecewise smooth functions (piecewise with respect to the triangulation \(\mathcal {T}\)) and set

$$\begin{aligned} \mathcal {P}_k(\mathcal {T}):=\{v_h\in L^2(\Omega ):\;\forall T\in \mathcal {T},\;v_h\vert _T\in \mathcal {P}_k(T)\} \end{aligned}$$

for the set of algebraic polynomials \(\mathcal {P}_k(T)\) of total degree less than or equal to \(k\in \mathbb {N}_0\) regarded as functions on T.

Recall the definition (13) of Q and suppose, for some polynomial degree \(k\in \mathbb {N}_0\) and the convention that \(\mathcal {P}_{-1}\) denotes the zero polynomial, that \(u_h\) satisfies

$$\begin{aligned} D_hu_h\in Q\cap \mathcal {P}_{k-1}(\mathcal {T};\mathbb {R}^{m\times n}) . \end{aligned}$$
(H)

Recall \(\delta \) from (12) and \(\mu \) from (14).

Theorem 3

\((\eta \lesssim \Vert \ell \Vert _{\mathcal {H}^*})\) The hypothesis (H) implies

$$\begin{aligned} \eta :=\delta +\mu \lesssim \Vert \ell \Vert _{\mathcal {H}^*}. \end{aligned}$$

The proof of the theorem is based on Proposition 1 below on local efficiency, which generalizes [10, Lemma 6.3].

Proposition 1

Any \(u_h\in \mathcal {P}_k(T;\mathbb {R}^m)\) and \(p_h\in \mathcal {P}_k(T;\mathbb {R}^{m\times n})\) on \(T\in \mathcal {T}\) satisfy

$$\begin{aligned}&h_T\Vert p_h-D_hu_h\Vert _{L^2(T;\mathbb {R}^{m\times n})}\\&\quad \lesssim \min _{v\in H^1(T;\mathbb {R}^m)}\left( \Vert u_h-v\Vert _{L^2(T;\mathbb {R}^m)}+h_T\Vert p_h-Dv\Vert _{L^2(T;\mathbb {R}^{m\times n})}\right) . \end{aligned}$$

Proof

Let \(b_T\in H^1_0(T)\) be the bubble-function defined as the product of all first-order nodal basis functions with respect to all vertices in T. Then, \(0\le b_T\le 1\) and

$$\begin{aligned} \Vert p_h-D_hu_h\Vert _{L^2(T;\mathbb {R}^{m\times n})}\lesssim \Vert b_T^{1/2}(p_h-D_hu_h)\Vert _{L^2(T;\mathbb {R}^{m\times n})}. \end{aligned}$$

Note that \(D_hu_h=Du_h\) on T. Hence, for each \(v\in H^1(T;\mathbb {R}^m)\),

$$\begin{aligned} \Vert p_h-D_hu_h\Vert _{L^2(T;\mathbb {R}^{m\times n})}^2\lesssim \int _T b_T(p_h-D_hu_h):(p_h-Dv+Dv-D_hu_h)\,dx. \end{aligned}$$

The product with \(Dv-D_hu_h\) is recast with an integration by parts. This and Cauchy inequalities lead to

$$\begin{aligned} \Vert p_h-D_hu_h\Vert _{L^2(T;\mathbb {R}^{m\times n})}^2&\lesssim \Vert p_h-Dv\Vert _{L^2(T;\mathbb {R}^{m\times n})}\Vert b_T(p_h-D_hu_h)\Vert _{L^2(T;\mathbb {R}^{m\times n})}\nonumber \\&\quad +\Vert v-u_h\Vert _{L^2(T;\mathbb {R}^m)}\Vert {{\mathrm{div}}}((p_h-D_hu_h)b_T)\Vert _{L^2(T;\mathbb {R}^m)}. \end{aligned}$$
(20)

Since \((p_h-D_hu_h)b_T\) is a polynomial on T, an inverse estimate reads

$$\begin{aligned} h_T\Vert {{\mathrm{div}}}((p_h-D_hu_h)b_T)\Vert _{L^2(T;\mathbb {R}^m)}\lesssim \Vert (p_h-D_hu_h)b_T\Vert _{L^2(T;\mathbb {R}^{m\times n})}. \end{aligned}$$
(21)

The combination of (20)–(21) plus a division by

$$\begin{aligned} \Vert (p_h-D_hu_h)b_T\Vert _{L^2(T;\mathbb {R}^{m\times n})}\le \Vert p_h-D_hu_h\Vert _{L^2(T;\mathbb {R}^{m\times n})} \end{aligned}$$

proves the assertion. \(\square \)
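The two steps that the proof of Proposition 1 leaves implicit can be written out as follows. This is a sketch; the abbreviation \(r:=p_h-D_hu_h\) and the shorthand \(\Vert \cdot \Vert _T\) for the \(L^2(T)\) norm are introduced only here. Since \(b_T\) vanishes on \(\partial T\), the integration by parts produces no boundary term,

$$\begin{aligned} \int _T b_T\, r:(Dv-D_hu_h)\,dx=-\int _T (v-u_h)\cdot {{\mathrm{div}}}(b_T\, r)\,dx. \end{aligned}$$

With the inverse estimate (21) and \(\Vert b_T\, r\Vert _T\le \Vert r\Vert _T\), the estimate (20) becomes

$$\begin{aligned} \Vert r\Vert _T^2\lesssim \left( \Vert p_h-Dv\Vert _T+h_T^{-1}\Vert v-u_h\Vert _T\right) \Vert r\Vert _T. \end{aligned}$$

For \(r\ne 0\), a division by \(\Vert r\Vert _T\) and a multiplication with \(h_T\) give the asserted bound for every \(v\in H^1(T;\mathbb {R}^m)\); the case \(r=0\) is trivial.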

Proof of Theorem 3

Under the condition (H), Proposition 1 implies the efficiency of the estimator \(\eta \) in the sense of

$$\begin{aligned} \min _{q\in Q}\Vert h(p_h-q)\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})}\lesssim \min _{v\in \mathcal {V}}(\Vert u_h-v\Vert _{L^2(\Omega ;\mathbb {R}^m)}+\Vert h(p_h-Dv)\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})}). \end{aligned}$$

Let \(\rho (B)\) (resp. \(\rho (A^{-1})\) and \(\rho (B^{-1})\)) denote the largest eigenvalue of B (resp. \(A^{-1}\) and \(B^{-1}\)). Then,

$$\begin{aligned} \mu \lesssim \rho (B)^{1/2}\min _{v\in \mathcal {V}}(\Vert u_h-v\Vert _{L^2(\Omega ;\mathbb {R}^m)}+\Vert h(p_h-Dv)\Vert _{L^2(\Omega ;\mathbb {R}^{m\times n})}). \end{aligned}$$

Theorem 1 yields

$$\begin{aligned} \mu \lesssim \rho (B)^{1/2}(\Vert h\Vert _{L^\infty (\Omega )}\rho (A^{-1})+\rho (B^{-1}))^{1/2}\Vert \ell \Vert _{\mathcal {H}^*}. \end{aligned}$$

The remaining estimate \(\delta \le \Vert \ell \Vert _{\mathcal {H}^*}\) follows from Theorem 1. \(\square \)

Remark 6

(Violation of the orthogonality condition (F2)) All the examples of MFEM in Table 1 below allow a generalization in case that the data \(u_h\) does not satisfy (F2). It suffices to replace \(u_h\) by its \(L^2(\Omega )\)-orthogonal projection \(\tilde{u}_h:=\Pi _k u_h\) onto \(\mathcal {P}_k(\mathcal {T};\mathbb {R}^m)\). Since (13) still holds with this substitution, \(\tilde{\ell }(q):= \int _\Omega (p_h:q+\tilde{u}_h\cdot {{\mathrm{div}}}q)\,dx\) can be estimated via Theorems 2 and 3. The difference \(\ell -\tilde{\ell }\) satisfies

$$\begin{aligned} \Vert \ell - \tilde{\ell }\Vert _{\mathcal {H}^*}\le \Vert (1-\Pi _k) u_h \Vert _{L^2(\Omega ;\mathbb {R}^m)}\lesssim \Vert \ell - \tilde{\ell }\Vert _{\mathcal {H}^*}. \end{aligned}$$
(22)

The first inequality in (22) is obvious whereas the proof of the second employs the Ladyzhenskaya lemma as follows. Since \( (1-\Pi _k) u_h\in L^2_0(\Omega ;\mathbb {R}^m)\) there exists some \(q\in H^1_0(\Omega ;\mathbb {R}^{m\times n})\) with \((1-\Pi _k) u_h={{\mathrm{div}}}q\) and

$$\begin{aligned} \Vert q\Vert _\mathcal {H}\lesssim \Vert q \Vert _{H^1(\Omega )}\lesssim \Vert (1-\Pi _k) u_h \Vert _{L^2(\Omega ;\mathbb {R}^m)} . \end{aligned}$$

The combination of this with \( u_h-\tilde{u}_h ={{\mathrm{div}}}q\) concludes the proof, namely

$$\begin{aligned} \Vert (1-\Pi _k) u_h \Vert _{L^2(\Omega ;\mathbb {R}^m)}^2=\int _\Omega (u_h-\tilde{u}_h)\cdot {{\mathrm{div}}}q\, dx= (\ell -\tilde{\ell })(q)\le \Vert \ell - \tilde{\ell }\Vert _{\mathcal {H}^*} \Vert q\Vert _\mathcal {H}. \end{aligned}$$

Table 1 Standard 2D mixed FEMs with polynomials \(\mathcal {P}_k(T)\) of total degree \(\le k\) and edge-wise polynomials \(\mathcal {P}_k(\partial T)\) of degree \(\le k\); \(M_k(T)\) and \(D_k(T)\) define the mixed finite element spaces \(M_h\) and \(L_h\) via (30)

5 Application to mixed finite element methods in abstract setting

The mixed finite element system of the Laplace equation or the Navier–Lamé equations (amongst many others) seeks \(\sigma \in H\) and \(u\in L\) with

$$\begin{aligned} a(\sigma ,\tau )+b(\tau ,u)&=f(\tau )\quad \text{ for } \text{ all } \,\, \tau \in H,\nonumber \\ b(\sigma ,v)&=g(v)\quad \text{ for } \text{ all } \,\,v\in L. \end{aligned}$$
(23)

Therein, H and L are Hilbert spaces for the fluxes (or stresses) and displacements and a and b are bilinear forms; throughout this paper,

$$\begin{aligned} H\subset \mathcal {H}:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{m\times n})\quad \text{ and }\quad L\subset \mathcal {L}:=L^2(\Omega ;\mathbb {R}^m) \end{aligned}$$

and the given right-hand sides f and g belong to the duals \(H^*\) and \(L^*\).

Given some mixed finite element approximations \(\sigma _h\in M_h\subset H\) and \(u_h\in L_h\subset L\), the residuals \(\mathcal {R}\text {es}_L\) and \(\mathcal {R}\text {es}_H\) of (23) read

$$\begin{aligned} \mathcal {R}\text {es}_L(v)&:= g(v)-b(\sigma _h,v)\quad \text{ for } \,\,v\in L,\nonumber \\ \mathcal {R}\text {es}_H(\tau )&:= f(\tau )-a(\sigma _h,\tau )-b(\tau ,{u}_h)\quad \text{ for } \,\, \tau \in H. \end{aligned}$$
(24)

The well-established mapping properties of the operators of (23) and the inf-sup condition on the continuous level immediately imply the well-known equivalence [11]

$$\begin{aligned} \Vert \sigma -\sigma _h \Vert _H +\Vert u-u_h \Vert _L\approx \Vert \mathcal {R}\text {es}_H\Vert _{H^*}+ \Vert \mathcal {R}\text {es}_L\Vert _{L^*} \end{aligned}$$
(25)

and justify residual-based error control.

The analysis of [11, 14] concerned the primal mixed formulation whereas, here, (23) represents the standard weak formulation of mixed finite element methods. Therefore, the residuals are utterly different. The norm of the first residual

$$\begin{aligned} \Vert \mathcal {R}\text {es}_L\Vert _{L^*}=\Vert g-g_h\Vert _{L^2(\Omega )} \end{aligned}$$
(26)

equals the \(L^2\) norm of a known right-hand side \(g\in L^2(\Omega ;\mathbb {R}^{m})\) minus its (computable) piecewise approximation \(g_h\). This term (26) can be computed (up to quadrature errors) and hence deserves no further investigations in this paper.

In all applications of this paper, the second residual \(\ell =\mathcal {R}\text {es}_H \) from (24) has the format of (1), i.e.,

$$\begin{aligned} \ell (q)=\int _{\Omega }(p_h:q+u_h\cdot {{\mathrm{div}}}q)\,dx\quad \text {for }\,\,q\in H \end{aligned}$$

and for given \(p_h\in L^2(\Omega ;\mathbb {R}^{m\times n})\) and given \(u_h\in L^2(\Omega ;\mathbb {R}^{m})\).

The subsequent sections study a series of applications and comment on the conditions (F1)–(F3) with Q from (13) and (H) to deduce

$$\begin{aligned} \Vert \mathcal {R}\text {es}_H\Vert _{H^*}\approx \delta +\mu \approx \delta + || h(p_h-D_h u_h ) ||_{L^2(\Omega ;\mathbb {R}^{m\times n})} \end{aligned}$$
(27)

for \(\delta \) from (12) and \(\mu \) from (14). Recall that h is the local mesh-size of the underlying triangulation \(\mathcal {T}\) for the piecewise polynomial finite element functions.

The remaining parts of this section give an overview why the conditions (F1)–(F3) and (H) are satisfied in the examples below.

In all examples for an MFEM of this paper, the Fortin interpolation operator \(I_F\) is defined on the bigger space \(H^1(\Omega ;\mathbb {R}^{m\times n})\) and maps onto \(M_h\). The discrete solution \((\sigma _h,u_h)\) for the mixed formulation (23) leads to the kernel property \( M_h\subset {{\mathrm{ker}}}\ell \). This implies (F1).

With the \(L^2\)-orthogonal projection \(\Pi \) onto \(L_h\), the Fortin interpolation operator \(I_F\) satisfies the commuting diagram property

$$\begin{aligned} {{\mathrm{div}}}I_F = \Pi {{\mathrm{div}}}\end{aligned}$$

for all arguments in \(H^1(\Omega ;\mathbb {R}^{m\times n})\). This and \(u_h\in L_h\) imply (F2).
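In formulas, the step from the commuting diagram property to (F2) reads as follows. This is a sketch that uses only that \(1-\Pi \) maps into the \(L^2\)-orthogonal complement of \(L_h\): for any \(q\in \mathcal {M}\subset H^1(\Omega ;\mathbb {R}^{m\times n})\),

$$\begin{aligned} \int _\Omega u_h\cdot {{\mathrm{div}}}(q-I_Fq)\,dx=\int _\Omega u_h\cdot (1-\Pi )\,{{\mathrm{div}}}q\,dx=0\quad \text {because }\,\,u_h\in L_h. \end{aligned}$$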

The approximation property (F3) is the heart of the operator defined by the degrees of freedom for the mixed finite element space at hand and can be quoted from the existing a priori error estimates for mixed FEM.

In the examples of this paper, the aforementioned degrees of freedom of the mixed finite element act on the normal components of the fluxes such that, on each side F with unit normal \(\nu _F\), the residual \(\tau -I_F\tau \) satisfies an \(L^2\) orthogonality along F onto polynomials of a degree at most k,

$$\begin{aligned} (\tau -I_F\tau )\nu _F\perp \mathcal {P}_k(F;\mathbb {R}^m)\quad \text {in }\,\,L^2(F;\mathbb {R}^m)\quad \text { for all sides }\,\, F. \end{aligned}$$
(28)

For \(u_h\in L_h\cap \mathcal {P}_k(\mathcal {T};\mathbb {R}^m)\), an integration by parts of \(\int _T (\tau -I_F\tau ):D u_h\,dx\) leads to the sum over the sides F of the element domain \(T\in \mathcal {T}\) with the integral of \(u_h\cdot (\tau -I_F\tau )\nu _F\) over F. According to the orthogonality (28), this integral

$$\begin{aligned} \int _F u_h\cdot (\tau -I_F\tau )\nu _Fds =0 \end{aligned}$$

vanishes. The conclusion is that

$$\begin{aligned} \int _\Omega (\tau -I_F\tau ):D_h u_h\,dx=0\quad \text{ for } \text{ all } \,\,\tau \in H^1(\Omega ;\mathbb {R}^{m\times n}). \end{aligned}$$

This and \(\mathcal {M}\subset H^1(\Omega ;\mathbb {R}^{m\times n})\) guarantee (H) for \(u_h\in L_h\cap \mathcal {P}_k(\mathcal {T};\mathbb {R}^m)\).
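The integration by parts behind (H) can be made explicit. In the following sketch, \(\nu _T\) denotes the outer unit normal along \(\partial T\) (notation introduced only here), which agrees with \(\pm \nu _F\) on each side F of T. For \(\tau \in H^1(\Omega ;\mathbb {R}^{m\times n})\) and \(T\in \mathcal {T}\),

$$\begin{aligned} \int _T (\tau -I_F\tau ):Du_h\,dx=-\int _T u_h\cdot {{\mathrm{div}}}(\tau -I_F\tau )\,dx+\sum _{F\subset \partial T}\int _F u_h\cdot (\tau -I_F\tau )\nu _T\,ds. \end{aligned}$$

Each side contribution vanishes by (28) because the trace of \(u_h\vert _T\) on F belongs to \(\mathcal {P}_k(F;\mathbb {R}^m)\) and a change of sign of the normal does not affect the orthogonality. After summation over all \(T\in \mathcal {T}\), the volume terms add up to \(-\int _\Omega u_h\cdot (1-\Pi ){{\mathrm{div}}}\tau \,dx\), which vanishes by the commuting diagram property and \(u_h\in L_h\).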

6 Poisson problem

This section concerns the Laplace equation with \(m=1\) and \(n\in \{2,3\}\) and identities A and B: Given \(g\in L^2(\Omega )\) seek \(u\in V:= H^1_0(\Omega )\) such that

$$\begin{aligned} -\Delta u=g\,\,\text { in }\,\,\Omega . \end{aligned}$$
(29)

6.1 Mixed FEM

The flux \(\sigma :=\nabla u\in \mathcal {H}:=H({{\mathrm{div}}},\Omega )\) and the displacement \(u\in \mathcal {L}:=L^2(\Omega )\) solve the problem (23) with \(f=0\) and g(v) substituted by the \(L^2\)-scalar product of g and v,

$$\begin{aligned} a(\sigma ,\tau ):=\int _\Omega \sigma \cdot \tau \, dx\quad \text {and} \quad b(\tau ,v):=\int _\Omega v\,{{\mathrm{div}}}\tau \, dx. \end{aligned}$$

The mapping properties of the operator defined in (23) as well as discrete spaces \(M_h\subset \mathcal {H}\) and \(L_h\subset \mathcal {L}\) are well established [68] and a few examples are depicted in Table 1. With the sets \(D_k(T)\) and \(M_k(T)\) from Table 1 for \(n=2\) set

$$\begin{aligned} M_h&:=M_k(\mathcal {T}):=\{q_h\in \mathcal {H}:\,\forall T\in \mathcal {T},\,q_h|_T\in M_k(T)\},\nonumber \\ L_h&:=D_k(\mathcal {T}):=\{v_h\in L^{\infty }(\Omega ):\,\forall T\in \mathcal {T},\,v_h|_T\in D_k(T)\}. \end{aligned}$$
(30)

The finite element approximations \(\sigma _h\in M_h\) and \(u_h\in L_h\), their unique existence, stability and a priori convergence properties are well established and further details are not recalled here. The analysis of this paper generalizes the main result from [10] to non-convex domains.

Theorem 4

The discrete solution \((\sigma _h,u_h)\) of any mixed FEM from Table 1 and the exact solution \((\sigma ,u)\) satisfy

$$\begin{aligned}&||\sigma -\sigma _h||_{H({{\mathrm{div}}}; \Omega )}+||u-u_h||_{L^2(\Omega )}\\&\quad \approx \min _{v\in H^1_0(\Omega )} ||\sigma _h-Dv ||_{L^2(\Omega ;\mathbb {R}^n)} + ||h(\sigma _h- D_h u_h)||_{L^2(\Omega ;\mathbb {R}^n)} +||g-g_h||_{L^2(\Omega )}. \end{aligned}$$

Proof

The Fortin interpolation \(I_F:H^1(\Omega ;\mathbb {R}^n)\rightarrow M_h\) is well established for RT, BDM, and BDFM mixed finite element methods (MFEM), see, e.g., [8, Section III.3.3], and allows for (F1)–(F3) and (H). The proofs follow verbatim the arguments at the end of the previous section and further details are omitted. \(\square \)

Remark 7

A posteriori estimates without restrictions on the domain topology, which are very similar to those of Theorem 4, are presented in [16], which appeared while this work was under review. The estimates in [16] concern mixed finite element methods for general Hodge–de Rham–Laplace problems and are also based on Helmholtz or Hodge decomposition techniques. The main difference between our approach and that in [16] is that we use a refined version of the Helmholtz decomposition as used in [10], which enables us to fully neglect boundary conditions on \(\partial \Omega \), see Remark 4. In fact, our work justifies that additional regularity assumptions can be avoided with classical tools and a regular split and no longer employs further restrictions on the domain.

Example 1

(Characterization of Q) The Fortin interpolation \(I_F\) for the lowest-order Raviart–Thomas space in 2D reads \(I_F q:=\sum _{E\in \mathcal {E}} \int _E q\cdot \nu _E\, ds\, \Psi _E\) for the set of edges \(\mathcal {E}\) in the triangulation \(\mathcal {T}\) and for the edge-oriented basis functions \(\Psi _E\) (e.g. from [12]). Then, the space Q from (13) is characterized by

$$\begin{aligned} Q&=\left\{ q\in H({{\mathrm{div}}}=0,\Omega ): \, \forall E\in \mathcal {E}, \, \int _\Omega q\cdot \Psi _E\, dx=0\right\} ,\\ H({{\mathrm{div}}}=0,\Omega )&:=\{ q\in H({{\mathrm{div}}},\Omega ): \, {{\mathrm{div}}}q=0\text { a.e. in }\Omega ,\, q\cdot \nu =0\text { along }\,\,\partial \Omega \}. \end{aligned}$$

To prove this characterization, consider \(q\in H({{\mathrm{div}}}=0,\Omega )\) and \(\phi \in H^2(\Omega )\) and integrate by parts to show that \( \int _\Omega q\cdot D \phi \, dx \) vanishes. If, in addition, \(q\perp RT_0(\mathcal {T})\), then \(q\perp D\phi - I_FD\phi \). In conclusion \(q\in Q\). The converse assertion is trickier and only sketched for brevity. Given any \(q\in Q\) and any triangle \(T\in \mathcal {T}\), let \(\phi \) be a \(C^\infty \) function with compact support in the interior of T. Then \(I_F D\phi =0\) and the orthogonality in (13) leads to

$$\begin{aligned} 0=\int _T q\cdot D\phi \, dx=-\int _T \phi \, {{\mathrm{div}}}q\, dx. \end{aligned}$$

Hence \({{\mathrm{div}}}q\) vanishes in the interior of any triangle. Consider a point \(\xi \) in the relative interior of an edge E and some standard mollifier \(\eta _\epsilon \) with compact support in the interior of the edge-patch \(\omega _E=\text {int}(T_+\cup T_-)\), where the interior edge \(E=\partial T_+\cap \partial T_-\) is shared by the two triangles \(T_\pm \). By symmetry of this standard mollifier, \(\phi (x):=\eta _\epsilon (x-\xi )\) satisfies \(\partial \eta _\epsilon (x-\xi )/\partial \nu _E=0\) for \(x\in E\). Consequently, \(I_F D\phi =0\) and

$$\begin{aligned} 0=\int _{\omega _E} q\cdot D\phi \, dx=\int _E \phi \,[q]\cdot \nu _E\, ds \end{aligned}$$

for the jump [q] of the function q along the skeleton of element boundaries. It follows that the jump \([q]\cdot \nu \) vanishes in the distributional sense along each edge. For boundary edges, this implies \(q\cdot \nu =0\). Altogether, one deduces \(q\in H({{\mathrm{div}}}=0,\Omega )\).

The function \(\phi (x):= x\cdot \nu _E \, \eta _\epsilon (x-\xi )\) has a normal derivative \(\eta _\epsilon (x-\xi )\) for \(x\in E\) as above. Hence \(I_F D\phi \) is a multiple of \(\Psi _E\) and the orthogonality reduces to \(q\perp \Psi _E\). Altogether, q belongs to \(H({{\mathrm{div}}}=0,\Omega )\) and is \(L^2\)-orthogonal onto \(RT_0(\mathcal {T})\). \(\square \)
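The lowest-order Fortin interpolation from Example 1 can be sketched on one triangle as follows. This is an illustration under stated assumptions, not the construction of [12]: the local basis functions are taken as \(\psi _i(x)=(x-P_i)/(2|T|)\), normalized to unit flux through the edge opposite to the vertex \(P_i\) (other references scale \(\Psi _E\) differently), the triangle is given counterclockwise, and the edge integrals are approximated with a three-point Gauss rule. The function name `rt0_fortin` is hypothetical.

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1], exact for polynomials up to degree 5.
GAUSS = [(-math.sqrt(0.6), 5 / 9), (0.0, 8 / 9), (math.sqrt(0.6), 5 / 9)]

def rt0_fortin(q, P):
    """Edge fluxes and RT0 interpolant of q on one counterclockwise triangle P."""
    area = 0.5 * ((P[1][0] - P[0][0]) * (P[2][1] - P[0][1])
                  - (P[1][1] - P[0][1]) * (P[2][0] - P[0][0]))
    fluxes = []
    for i in range(3):
        a, b = P[(i + 1) % 3], P[(i + 2) % 3]   # edge opposite to vertex P[i]
        dx, dy = b[0] - a[0], b[1] - a[1]
        length = math.hypot(dx, dy)
        nu = (dy / length, -dx / length)         # outer unit normal
        flux = 0.0
        for s, w in GAUSS:
            x = (0.5 * (a[0] + b[0]) + 0.5 * s * dx,
                 0.5 * (a[1] + b[1]) + 0.5 * s * dy)
            qx = q(x)
            flux += 0.5 * length * w * (qx[0] * nu[0] + qx[1] * nu[1])
        fluxes.append(flux)

    def interpolant(x):
        # psi_i has unit flux through edge i and zero flux through the other edges.
        return tuple(sum(fluxes[i] * (x[c] - P[i][c]) for i in range(3)) / (2 * area)
                     for c in range(2))

    return fluxes, interpolant

tri = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
fluxes, Iq = rt0_fortin(lambda x: (1 + 3 * x[0], 2 + 3 * x[1]), tri)
print(Iq((0.3, 0.2)))
```

Under these assumptions, a field of the form \(a+b\,x\) is reproduced by the interpolant. For a general smooth q the three fluxes sum up to \(\int _T{{\mathrm{div}}}q\,dx\) by the divergence theorem, so the divergence of the interpolant equals the integral mean of \({{\mathrm{div}}}q\) on T.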

6.2 Application to least-squares FEM

The div least-squares formulation of (29) seeks the minimizer \((\sigma ,u)\) of the functional

$$\begin{aligned} {\text {LS}}(q,v):=\Vert g+{\text {div}}q\Vert _{L^2(\Omega )}^2+\Vert q -\nabla v\Vert _{L^2(\Omega ;\mathbb {R}^n)}^2 \end{aligned}$$

amongst \((q,v)\in \mathcal {H}\times \mathcal {V}=H({{\mathrm{div}}},\Omega )\times H^1_0(\Omega )\). In particular, it holds

$$\begin{aligned} d((\sigma ,u),(q,v))&:=\int _{\Omega }{{\mathrm{div}}}\sigma {{\mathrm{div}}}q\,dx+\int _{\Omega }(\sigma -\nabla u)\cdot (q-\nabla v)\,dx\\ {}&=-\int _{\Omega }g {{\mathrm{div}}}q\,dx\quad \text {for all }(q,v)\in \mathcal {H}\times \mathcal {V}. \end{aligned}$$
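The last identity is the Euler–Lagrange equation of the least-squares functional. As a short worked step (standard calculus of variations, spelled out here for convenience): for the minimizer \((\sigma ,u)\) and any direction \((q,v)\in \mathcal {H}\times \mathcal {V}\), the derivative of \(t\mapsto {\text {LS}}(\sigma +tq,u+tv)\) vanishes at \(t=0\),

$$\begin{aligned} 0=\frac{1}{2}\frac{d}{dt}{\text {LS}}(\sigma +tq,u+tv)\Big |_{t=0} =\int _\Omega (g+{{\mathrm{div}}}\sigma )\,{{\mathrm{div}}}q\,dx+\int _\Omega (\sigma -\nabla u)\cdot (q-\nabla v)\,dx. \end{aligned}$$

Moving the term with g to the right-hand side gives the stated equation.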

The least-squares finite element method (LSFEM) seeks the minimizer \((\sigma _{{\text {LS}}}\), \(u_{{\text {LS}}})\) of \({\text {LS}}\) in the subspace \(M_h\times V_h\), where \(M_h\) is one of the mixed finite element spaces of order k from Table 1 in the previous subsection and \(V_h\) is the \(H^1\)-conforming space of piecewise polynomials of total degree at most \(k+1\). Given \((q,v)\in \mathcal {H}\times \mathcal {V}\), the residual reads

$$\begin{aligned} \mathcal {R}\text {es}(q,v)&:=\int _\Omega (\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}})\cdot q\, dx+ \int _\Omega ({{\mathrm{div}}}\sigma _{{\text {LS}}}+ g){{\mathrm{div}}}q\, dx \nonumber \\&\quad \,\, + \int _{\Omega }(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}})\cdot \nabla v\, dx. \end{aligned}$$
(31)

It is a well-established equivalence result (see for instance [5]) that the norm induced by the scalar product d is equivalent to the norm in \(\mathcal {H}\times \mathcal {V}\). Therefore,

$$\begin{aligned} \Vert \sigma -\sigma _{{\text {LS}}}\Vert _{\mathcal {H}} +\Vert u-u_{{\text {LS}}}\Vert _{\mathcal {V}}\approx \Vert \mathcal {R}\text {es}\Vert _{(\mathcal {H}\times \mathcal {V})^*} \end{aligned}$$
(32)

and the analysis of this paper leads to the following error estimate.

Theorem 5

The exact (resp. discrete) solution \((\sigma ,u)\) (resp. \((\sigma _{{\text {LS}}},u_{{\text {LS}}}))\) satisfies

$$\begin{aligned} \Vert \sigma -\sigma _{{\text {LS}}}\Vert _{\mathcal {H}} +\Vert u-u_{{\text {LS}}}\Vert _{H^1(\Omega )}\approx \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)} +\Vert g-g_h\Vert _{L^2(\Omega )}. \end{aligned}$$
(33)

The point in this theorem beyond (32) is that the estimator differs from the natural a posteriori error estimate [4] for the LSFEM which reads

$$\begin{aligned}&\Vert \sigma -\sigma _{{\text {LS}}}\Vert _{\mathcal {H}} +\Vert u-u_{{\text {LS}}}\Vert _{H^1(\Omega )}\nonumber \\&\qquad \qquad \approx \Vert g+{{\mathrm{div}}}\sigma _{{\text {LS}}}\Vert _{L^2(\Omega )}+\Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)}. \end{aligned}$$
(34)

Theorem 5 reveals the equivalence of the a posteriori estimators (33) and (34). In particular, it implies that

$$\begin{aligned} \Vert g+{{\mathrm{div}}}\sigma _{{\text {LS}}}\Vert _{L^2(\Omega )}\lesssim \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)}+\Vert g-g_h\Vert _{L^2(\Omega )}. \end{aligned}$$
(35)

Proof of Theorem 5

For \(u_h:=g+{{\mathrm{div}}}\sigma _{{\text {LS}}}\) and \(p_h:=\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\), the sum of the first two terms on the right-hand side of (31) coincides with the functional \(\ell \) from (1) for any test function \(q\in H({{\mathrm{div}}},\Omega )\). The conditions on the Fortin interpolation operator follow as in the previous subsection. Therefore, Theorem 2, Remark 6, and Theorem 3 establish the equivalence

$$\begin{aligned}&\Vert (\sigma -\sigma _{{\text {LS}}},u-u_{{\text {LS}}})\Vert _{\mathcal {H}\times \mathcal {V}}\approx \min _{q\in Q} \Vert h(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}-q)\Vert _{L^2(\Omega ;\mathbb {R}^n)}\\&\quad +\min _{v\in \mathcal {V}}\Vert \sigma _{{\text {LS}}}-\nabla v\Vert _{L^2(\Omega ;\mathbb {R}^n)}+ \Vert {{\mathrm{div}}}(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}})\Vert _{\mathcal {V}^*}+\Vert g-g_h\Vert _{L^2(\Omega )}. \end{aligned}$$

Note that \(0\in Q\) and \(h\lesssim 1\) and so \(\min _{q\in Q} \Vert h(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}-q)\Vert _{L^2(\Omega ;\mathbb {R}^n)}\lesssim \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)}\). Since \(u_{{\text {LS}}}\in \mathcal {V}\), \( \min _{v\in \mathcal {V}}\Vert \sigma _{{\text {LS}}}-\nabla v\Vert _{L^2(\Omega ;\mathbb {R}^n)}\lesssim \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)}\). The distributional definition of \({{\mathrm{div}}}(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}})\) shows \(\Vert {{\mathrm{div}}}(\sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}})\Vert _{\mathcal {V}^*} \lesssim \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)}\). This proves

$$\begin{aligned} \Vert (\sigma -\sigma _{{\text {LS}}},u-u_{{\text {LS}}})\Vert _{\mathcal {H}\times \mathcal {V}}\lesssim \Vert \sigma _{{\text {LS}}}-\nabla u_{{\text {LS}}}\Vert _{L^2(\Omega ;\mathbb {R}^n)} +\Vert g-g_h\Vert _{L^2(\Omega )}. \end{aligned}$$

The upper bound is smaller than or equal to the right-hand side in (34). Indeed, \(g_h\) is the \(L^2\) projection of g onto the space of piecewise polynomials of degree \(\le k\), which includes the discrete function \({{\mathrm{div}}}\sigma _{{\text {LS}}}\), and so \(\Vert g-g_h \Vert _{L^2(\Omega )}\le \Vert g+{{\mathrm{div}}}\sigma _{{\text {LS}}}\Vert _{L^2(\Omega )}\). This concludes the proof.
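The best-approximation property of the \(L^2\) projection used in the last step can be checked numerically in a toy setting. The sketch below is an assumption-laden illustration, not part of the analysis: it works in one space dimension on a uniform mesh of (0, 1) with piecewise constants (\(k=0\)), uses a three-point Gauss rule in place of exact integration, and lets an arbitrary piecewise constant `other` stand in for \(-{{\mathrm{div}}}\sigma _{{\text {LS}}}\). All function names are hypothetical.

```python
import math

# Three-point Gauss-Legendre rule on [-1, 1].
GAUSS = [(-math.sqrt(0.6), 5 / 9), (0.0, 8 / 9), (math.sqrt(0.6), 5 / 9)]

def cells(n):
    """Uniform mesh of (0, 1) with n cells."""
    return [(i / n, (i + 1) / n) for i in range(n)]

def cell_means(g, n):
    """Piecewise constant L2 projection (k = 0) of g, computed by quadrature."""
    return [0.5 * sum(w * g(0.5 * (a + b) + 0.5 * (b - a) * s) for s, w in GAUSS)
            for a, b in cells(n)]

def l2_distance(g, values):
    """Quadrature approximation of the L2 norm of g minus a piecewise constant."""
    total = 0.0
    for (a, b), c in zip(cells(len(values)), values):
        for s, w in GAUSS:
            x = 0.5 * (a + b) + 0.5 * (b - a) * s
            total += 0.5 * (b - a) * w * (g(x) - c) ** 2
    return math.sqrt(total)

g = lambda x: math.sin(3 * x)
g_h = cell_means(g, 4)
other = [v + 0.1 for v in g_h]   # any other piecewise constant on the same mesh
print(l2_distance(g, g_h) <= l2_distance(g, other))
```

Any competitor on the same mesh yields a distance at least as large as that of the cell means, which is the inequality \(\Vert g-g_h \Vert \le \Vert g+{{\mathrm{div}}}\sigma _{{\text {LS}}}\Vert \) in this simplified setting.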

7 Pseudostress-velocity formulation of the Stokes equations

The Stokes equations with the standard no-slip boundary conditions read with \(n=m=2\): Given some force density \(f\in L^2(\Omega ;\mathbb {R}^2)\) find a velocity field \(u\in \mathcal {V}:= H^1_0(\Omega ;\mathbb {R}^2)\) and a pressure distribution \(p\in L^2_0(\Omega ):=\{q\in L^2(\Omega )\;\vert \;\int _\Omega q\;dx=0\}\) such that

$$\begin{aligned} -\Delta u +\nabla p= f\quad \text {and}\quad {{\mathrm{div}}}u= 0\quad \text {in }\Omega . \end{aligned}$$
(36)

Let \({{\mathrm{dev}}}:\mathbb {R}^{2\times 2}\rightarrow \mathbb {R}^{2\times 2}\) be the deviatoric operator

$$\begin{aligned} {{\mathrm{dev}}}\tau :=\tau -1/2\, {{\mathrm{tr}}}(\tau ) \mathbf{1}\quad \text {for }\tau \in \mathbb {R}^{2\times 2}. \end{aligned}$$
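Two elementary properties of the deviatoric operator are used implicitly: it removes the trace, and it is an orthogonal projection with respect to the scalar product of matrices. A minimal sketch for \(2\times 2\) matrices stored as nested lists (the helper names `trace`, `dev`, and `inner` are hypothetical):

```python
def trace(tau):
    """Trace of a 2x2 matrix given as nested lists."""
    return tau[0][0] + tau[1][1]

def dev(tau):
    """Deviatoric part tau - 1/2 tr(tau) 1 of a 2x2 matrix."""
    t = trace(tau) / 2
    return [[tau[0][0] - t, tau[0][1]],
            [tau[1][0], tau[1][1] - t]]

def inner(sigma, tau):
    """Scalar product sigma : tau of two 2x2 matrices."""
    return sum(sigma[i][j] * tau[i][j] for i in range(2) for j in range(2))

sigma = [[1.0, 2.0], [3.0, 4.0]]
tau = [[5.0, 6.0], [7.0, 9.0]]
print(trace(dev(sigma)))                                  # trace-free
print(dev(dev(sigma)) == dev(sigma))                      # idempotent
print(inner(dev(sigma), tau), inner(dev(sigma), dev(tau)))  # self-adjoint projection
```

The last line illustrates \({{\mathrm{dev}}}\sigma :\tau ={{\mathrm{dev}}}\sigma :{{\mathrm{dev}}}\tau \), since the difference \(\tau -{{\mathrm{dev}}}\tau \) is a multiple of the unit matrix and \({{\mathrm{dev}}}\sigma \) is trace-free.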

The framework of [9] establishes the pseudostress-velocity formulation

$$\begin{aligned} {{\mathrm{dev}}}\sigma -\nabla u= 0\quad \text {and}\quad {{\mathrm{div}}}\sigma = -g\quad \text {in }\,\,\Omega \end{aligned}$$
(37)

of the Stokes equations (36). The weak formulation of (37) is (23) with

$$\begin{aligned} a(\sigma ,\tau )&:=\int _{\Omega }{{\mathrm{dev}}}\sigma :{{\mathrm{dev}}}\tau \,dx\quad \text { and }\quad b(\tau ,v):=\int _\Omega v\cdot {{\mathrm{div}}}\tau \,dx\quad \text { for }\\ \sigma ,\tau \in H&:= \mathcal {H}({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})/\mathbb {R}\quad \text {and} \quad v\in L:=L^2(\Omega ;\mathbb {R}^2). \end{aligned}$$

With the spaces \(D_k(\mathcal {T})\) and \(M_k(\mathcal {T})\) from (30) and Table 1 set

$$\begin{aligned} M_h:=M_k(\mathcal {T})\times M_k(\mathcal {T}) \quad \text {and}\quad L_h:=D_k(\mathcal {T})\times D_k(\mathcal {T}). \end{aligned}$$
(38)

The unique existence, stability, and a priori convergence of the finite element approximations \(\sigma _h\in M_h\) and \({u}_h\in L_h\) to (23) are established in [15, 17] with further details. The following theorem complements the a posteriori error analysis of [15] for \(L^2\) error control with error estimates in the natural \(H({{\mathrm{div}}})\times L^2\) norms of the mixed FEM.

Theorem 6

The discrete solution \((\sigma _h,u_h)\) of any mixed FEM from (38) and Table 1 to the exact solution \((\sigma ,u)\) to (37) satisfies

Proof

With the identities A and B and \(\mathcal {M}:=D^2(H^2(\Omega ;\mathbb {R}^{2}))\subset H^1(\Omega ;\mathbb {R}^{2\times 2})\), the modified Fortin interpolation \(\tilde{I}_F:H^1(\Omega ;\mathbb {R}^{2\times 2})\rightarrow M_h\) reads

$$\begin{aligned} \tilde{I}_F\tau = I_F\tau - \left( {\textstyle \tfrac{1}{2|\Omega |}\int _\Omega {{\mathrm{tr}}}(I_F\tau )\,dx}\right) \mathbf{1} \end{aligned}$$

for all \(\tau \in \mathcal {H}\) with the standard Fortin interpolation \(I_F\) (applied component-wise) for mixed FEMs as in Sect. 6 (cf., e.g., [17, Sect. 3.2], [15, Sect. 3], and [8, Sect. III. 3.3] for details); the area \(|\Omega |\) of \(\Omega \) normalizes the correction such that \(\tilde{I}_F\tau \in \mathcal {H}\). The modified interpolation \(\tilde{I}_F\) allows for (F1)–(F3) as well as \(D_hu_h\in \mathcal {P}_{k-1}(\mathcal {T})^2\subset Q\). In fact, \(\tilde{I}_F(\mathcal {M})\subset M_h\subset {{\mathrm{ker}}}\ell \) and \( {{\mathrm{div}}}(1-\tilde{I}_F)(\mathcal {M}) \perp L_h = {{\mathrm{div}}}M_h. \) Since

$$\begin{aligned} \int _{\Omega }{{\mathrm{tr}}}(Dv)\,dx=\int _{\Omega }{{\mathrm{div}}}v\,dx =\int _{\partial \Omega } v\cdot \nu \,ds=0 \end{aligned}$$

(with the outer normal \(\nu \) of \(\partial \Omega \)), \(Dv\in \mathcal {H}\) for \(v\in \mathcal {V}\cap H^2(\Omega ;\mathbb {R}^2)\). Altogether, Theorems 2 and 3 imply the assertion. \(\square \)

8 Mixed FEM in elasticity

This section is devoted to the Navier–Lamé equation for \(m=n=2\) and a symmetric variant of the theory and its application to PEERS and to the symmetric Arnold–Winther MFEM.

8.1 Linear elasticity

Linear elasticity is modeled via the linear Green strain \(\varepsilon (u):=\frac{1}{2}(Du+Du^\top )\) and a linear stress–strain relation

$$\begin{aligned} \mathbb {C}\tau :=\lambda \,{{\mathrm{tr}}}(\tau )\,\mathbf{1}+2\mu \,\tau \quad \text {for all }\tau \in \mathbb {R}^{2\times 2} \end{aligned}$$

with the Lamé parameters \(\lambda ,\mu >0\) and with the inverse relation

$$\begin{aligned} \mathbb {C}^{-1}\tau =1/(2\mu )\,\tau -\lambda /(2\mu (n\lambda +2\mu ))\,{{\mathrm{tr}}}(\tau )\mathbf{1}. \end{aligned}$$
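The coefficient of the trace term in the inverse relation follows from taking the trace in the definition of \(\mathbb {C}\): \({{\mathrm{tr}}}(\mathbb {C}\tau )=(n\lambda +2\mu )\,{{\mathrm{tr}}}(\tau )\), so that \(\tau =(\mathbb {C}\tau -\lambda \,{{\mathrm{tr}}}(\tau )\mathbf{1})/(2\mu )\) with \({{\mathrm{tr}}}(\tau )={{\mathrm{tr}}}(\mathbb {C}\tau )/(n\lambda +2\mu )\). A minimal numerical round-trip check for \(n=2\), with hypothetical helper names `lame` and `lame_inverse` and arbitrarily chosen parameter values:

```python
def lame(tau, lam, mu):
    """Apply C tau = lam tr(tau) 1 + 2 mu tau to a 2x2 matrix."""
    t = tau[0][0] + tau[1][1]
    return [[lam * t * (i == j) + 2 * mu * tau[i][j] for j in range(2)]
            for i in range(2)]

def lame_inverse(tau, lam, mu, n=2):
    """Apply the inverse relation with the factor lam / (2 mu (n lam + 2 mu))."""
    t = tau[0][0] + tau[1][1]
    c = lam / (2 * mu * (n * lam + 2 * mu))
    return [[tau[i][j] / (2 * mu) - c * t * (i == j) for j in range(2)]
            for i in range(2)]

tau = [[1.0, 2.0], [3.0, 4.0]]
print(lame_inverse(lame(tau, 3.0, 0.5), 3.0, 0.5))
```

Up to rounding, the round trip returns the input matrix for any admissible \(\lambda ,\mu >0\).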

The Navier–Lamé equation reads: Given \(g\in L^2(\Omega ;\mathbb {R}^2)\) seek \(u\in \mathcal {V}\) with

$$\begin{aligned} -{{\mathrm{div}}}\mathbb {C}\,\varepsilon (u)=g\quad \text {in }\,\, \Omega . \end{aligned}$$
(39)

By Korn’s inequality, (39) has a unique weak solution \(u\in \mathcal {V}\equiv H^1_0(\Omega ;\mathbb {R}^2)\) and the elliptic regularity theory applies. The stress \(\sigma :=\mathbb {C}\,\varepsilon (u)\) belongs to

$$\begin{aligned} H:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})/\mathbb {R}:=\left\{ \tau \in H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2}): \, \int _\Omega {{\mathrm{tr}}}(\tau )\, dx=0\right\} . \end{aligned}$$

It is an essential detail throughout this section that all the multiplicative generic constants may depend on the Lamé parameter \(\mu \) but shall not depend on the other parameter \(\lambda \), which may be arbitrarily large in the incompressible limit.

8.2 PEERS

The symmetry condition of the stress variables in linear elasticity is weakened in PEERS a priori and replaced by a Lagrange multiplier. This leads to the weak formulation for \( H:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})/\mathbb {R}\) as above and \(L:=L^2(\Omega ;\mathbb {R}^2\times \mathbb {R}_{{{\mathrm{skew}}}}^{2\times 2})\) with the skew-symmetric matrices \(\mathbb {R}_{{{\mathrm{skew}}}}^{2\times 2}:=\{F\in \mathbb {R}^{2\times 2}:\;F^T=-F\}\) and the bilinear forms a and b defined for \(\sigma ,\tau \in H\) and \((v,\gamma )\in L\) by

$$\begin{aligned} a(\sigma ,\tau ):=\int _\Omega \sigma :\mathbb {C}^{-1}\tau \,dx \quad \text {and}\quad b(\tau ,(v,\gamma )):=\int _\Omega (v\cdot {{\mathrm{div}}}\tau +\tau :\gamma )\,dx. \end{aligned}$$

Details on the finite element subspaces \(M_h\subset H\) and \(L_h=V_h\times W_h\subset L\) of piecewise polynomials can be found in [2, 20], and [8, Section III. 3.3]. The skew-symmetric part \({{\mathrm{skew}}}\tau :=(\tau -\tau ^T)/2\) of a matrix \(\tau \in \mathbb {R}^{2\times 2}\) leads to \(\gamma := {{\mathrm{skew}}}Du\) for the displacement \(u\in \mathcal {V}\) with \(\sigma = \mathbb {C}\varepsilon (u)\) as part of the exact solution of the formulation (23).

The abstract results of this paper extend the reliable and efficient a posteriori error control of [13] to nonconvex domains (without any \(H^2\) regularity assumption).

Theorem 7

The exact (resp. discrete) solution \((\sigma ,u,\gamma )\) (resp. \((\sigma _h,u_h,\gamma _h)\)) satisfies

$$\begin{aligned}&\Vert \sigma -\sigma _h\Vert _{H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})} + \Vert u-u_h\Vert _{L^2(\Omega ;\mathbb {R}^2)} +\Vert \gamma -\gamma _h\Vert _{L^2(\Omega ;\mathbb {R}^{2\times 2})} \\&\quad \approx \min _{v\in \mathcal {V}} \Vert \mathbb {C}^{-1} \sigma _h-\gamma _h -Dv\Vert _{L^2(\Omega ;\mathbb {R}^{2\times 2})} +\Vert {{\mathrm{skew}}}\sigma _h \Vert _{L^2(\Omega ;\mathbb {R}^{2\times 2})} \\&\qquad +\, \Vert h( \mathbb {C}^{-1} \sigma _h-\gamma _h -D_h u_h) \Vert _{L^2(\Omega ;\mathbb {R}^{2\times 2})} +\Vert g-g_h\Vert _{L^2(\Omega ;\mathbb {R}^2)}. \end{aligned}$$

Proof

The Fortin interpolation operator \(I_F:H^1(\Omega ;\mathbb {R}^{2\times 2})\rightarrow M_h\) is defined in [2, 20] and satisfies the orthogonality \({{\mathrm{div}}}(1-I_F)(\mathcal {M})\perp V_h\) as well. The enlarged space \(L:=L^2(\Omega ;\mathbb {R}^2\times \mathbb {R}_{{{\mathrm{skew}}}}^{2\times 2})\) concerns the extra \(L^2\) residual \(\Vert {{\mathrm{skew}}}\sigma _h \Vert _{L^2(\Omega ;\mathbb {R}^{2\times 2})}\) as in [13]. The conditions (F1)–(F3) and (H) follow with the arguments at the end of Sect. 5 and further details are omitted. \(\square \)

8.3 Symmetric variant

For the symmetric stress formulation, the bilinear forms read

$$\begin{aligned} a(\sigma ,\tau )&:=\int _\Omega \sigma :\mathbb {C}^{-1}\tau \,dx \quad \text {and}\quad b(\tau ,v):=\int _\Omega v\cdot {{\mathrm{div}}}\tau \,dx\quad \text {for }\end{aligned}$$
(40)
$$\begin{aligned} \tau \in H&:=H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})/\mathbb {R}\cap L^2(\Omega ;\mathbb {R}_{{{\mathrm{sym}}}}^{2\times 2})\quad \text {and}\quad v\in L:=L^2(\Omega ;\mathbb {R}^2)\quad \end{aligned}$$
(41)

with the set \(\mathbb {R}_{{{\mathrm{sym}}}}^{2\times 2}\) of symmetric \(2\times 2\) matrices. Since \(H\subset H({{\mathrm{div}}},\Omega ;\mathbb {R}^{2\times 2})\), all arguments of Sects. 3 and 4 can be transferred to \(\mathcal {M}:=\mathbb {C}\varepsilon H^2(\Omega ;\mathbb {R}^n)\) with \(A=\mathbb {C}^{-1}\) and the \(2\times 2\) unit matrix B. The linear functional \(\ell \in H^*\) is defined by

$$\begin{aligned} \ell (\tau ):=\int _\Omega (\tau :\mathbb {C}^{-1}\sigma _h+u_h\cdot {{\mathrm{div}}}\tau )dx \end{aligned}$$

for some piecewise polynomials \(\sigma _h\in H\) and \(u_h\in L\). The only difference to the previous sections is that the arguments \(\sigma _h\) and test functions \(\tau \) are a.e. pointwise symmetric.

Theorem 8

(Symmetric variant) Suppose (F1)–(F3) and (H) for \( Q:=\{q\in L^2(\Omega ;\mathbb {R}^{2\times 2}_{{{\mathrm{sym}}}}):\;\forall \tau \in \mathcal {M}, \;\int _\Omega q:(\tau -I_F\tau )\,dx = 0\}\). Then

$$\begin{aligned} \delta&:= \min _{v\in H^1_0(\Omega ;\mathbb {R}^2)}\Vert \mathbb {C}^{-1}\sigma _h-\varepsilon (v)\Vert _{\mathbb {C}} \quad \text {and}\\ \mu&:= \min _{q\in Q}\Vert h(\mathbb {C}^{-1}\sigma _h-q)\Vert _{\mathbb {C}}\le \mu _h:= \Vert h(\mathbb {C}^{-1}\sigma _h-\varepsilon _h(u_h))\Vert _{\mathbb {C}} \end{aligned}$$

satisfy

$$\begin{aligned} \eta :=\delta +\mu \approx \delta +\mu _h \approx \Vert \ell \Vert _{\mathcal {H}^*}. \end{aligned}$$

Proof

The proof of reliability follows the lines of that of Theorem 2 with the substitution of the decomposition (15) by

$$\begin{aligned} q=\mathbb {C}\varepsilon (\alpha )+{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \end{aligned}$$

with \(\alpha \in H^2(\Omega ;\mathbb {R}^{2})\) and \(\gamma \in H^2(\Omega )\). This decomposition follows from [18, Lemma 3.3] (with \(\nabla (\cdot )\) replaced by \(\mathbb {C}\varepsilon (\cdot )\) and \(\Delta (\cdot )\) replaced by \({{\mathrm{div}}}\mathbb {C}\varepsilon (\cdot )\)) and the proof of [13, Lemma 3.2]; symmetry of the remainder \(q-\mathbb {C}\varepsilon (\alpha )\) allows the recast into \({{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \).

The remaining parts on the reliability in Theorem 8 follow the lines of Sect. 3 with \(p_h:=\mathbb {C}^{-1}\sigma _h\) and the substitution of \(D\alpha \) by \(\varepsilon (\alpha )\). This yields

$$\begin{aligned} \ell (A\varepsilon (\alpha ))\le \mu (\Vert \varepsilon (\alpha )\Vert _A^2 +\Vert {{\mathrm{div}}}A\varepsilon (\alpha )\Vert _{L^2(\Omega ;\mathbb {R}^2)}^2)^{1/2} \end{aligned}$$

instead of (17). The substitute of (18) reads, for all \(v\in V\),

$$\begin{aligned} \ell ({{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma )&=\int _\Omega (p_h-\varepsilon (v)):{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \,dx\\ {}&\le \Vert p_h-\varepsilon (v)\Vert _{\mathbb {C}}\Vert {{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \Vert _{\mathbb {C}^{-1}}. \end{aligned}$$
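The two steps in the last display can be made explicit. The following is a sketch, assuming the common convention that \({{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \) for a scalar \(\gamma \in H^2(\Omega )\) is the Airy stress matrix (sign and ordering conventions vary between references):

$$\begin{aligned} {{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma&=\begin{pmatrix} \partial _{22}\gamma &{} -\partial _{12}\gamma \\ -\partial _{12}\gamma &{} \partial _{11}\gamma \end{pmatrix},\\ {{\mathrm{div}}}{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma&=\begin{pmatrix} \partial _1\partial _{22}\gamma -\partial _2\partial _{12}\gamma \\ -\partial _1\partial _{12}\gamma +\partial _2\partial _{11}\gamma \end{pmatrix}=0. \end{aligned}$$

This matrix is symmetric and row-wise divergence-free, so the term with \(u_h\) in \(\ell \) drops out. For \(v\in H^1_0(\Omega ;\mathbb {R}^2)\), the symmetry and an integration by parts show

$$\begin{aligned} \int _\Omega \varepsilon (v):{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \,dx=\int _\Omega Dv:{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \,dx=-\int _\Omega v\cdot {{\mathrm{div}}}{{\mathrm{Curl}}}{{\mathrm{Curl}}}\gamma \,dx=0, \end{aligned}$$

which is why \(\varepsilon (v)\) may be subtracted from \(p_h=\mathbb {C}^{-1}\sigma _h\) before the Cauchy inequality in the weighted norms is applied.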

The efficiency follows the lines of Theorem 3; the remaining details are omitted. \(\square \)

8.4 Arnold–Winther mixed FEM in elasticity

Arnold and Winther design \(M_h\subset \mathcal {H}\) and \(L_h\subset \mathcal {L}\) of order \(k=1,2\) in [3] with the \(L^2\) projection \(g_h\) of the right-hand side g onto \(\mathcal {P}_k(\mathcal {T};\mathbb {R}^2)\).

Theorem 9

The exact (resp. discrete) solution \((\sigma ,u)\) (resp. \((\sigma _h,u_h))\) to the mixed formulation (23) with (40) in (41) (resp. \(M_h\subset \mathcal {H}\) and \(L_h\subset \mathcal {L})\) satisfy

Proof

The operator \(I_F\) is denoted as \(\Pi _h\) in [3, Eq. (4.2)] and satisfies (F1)–(F3) and (H) with \(\mathcal {P}_k(\mathcal {T};\mathbb {R}^2)\subset Q\) for \(k=1,2\), cf. also [3, Eq. (A1’)–(A2’) and (4.7)]. The proof follows the arguments for mixed FEM at the end of Sect. 5 and [3]. Consequently, Theorem 8 applies to the unique discrete solution \((\sigma _h,u_h)\) in (1) and implies the assertion. \(\square \)