1 Introduction

The \(L_2\)-projection onto a finite element space is a valuable tool in many areas of finite element analysis. It plays, for example, often an important role in the convergence analysis of multigrid methods. In [2], a simple proof of \(H^1\)-stability of the \(L_2\)-projection was given in the case of quasiuniform meshes with this application in mind. With the widespread use of adaptive and more general classes of nonuniform meshes, there is interest in generalizing this result to the nonuniform mesh case. At first glance, it seems that this should not be a difficult problem. The mass matrix, while no longer comparable to the identity matrix independent of the (now local) mesh size, does remain comparable to its own diagonal. One expects that the exponential decay of matrix elements away from the diagonal in the inverse of the mass matrix, shown by Douglas et al. in [14], should also remain valid even in the nonuniform mesh case. However, the central difficulty is that this exponential decay might potentially be offset by exponential growth due to grading of the finite element mesh. The works of Crouzeix and Thomée [10], Bramble, Pasciak and Steinbach [6], and Carstensen [7, 8] all address this issue by imposing certain local or global growth constraints on the mesh. In this work, we must also impose some constraints, but ours are weaker, and allow the inclusion of high order elements and meshes generated by many commonly used adaptive meshing strategies.

Although our technique easily transfers to more general situations and can be applied to a large variety of different finite element spaces, we restrict ourselves in this article for the ease of presentation to the classical case of piecewise polynomial conforming elements. Starting point is a conforming triangulation \({\fancyscript{T}}\) of a polygonal domain \(\varOmega \) in two or three space dimensions, built up from triangles in two space dimensions and tetrahedrons in the three-dimensional case. Associated with \({\fancyscript{T}}\) is a conforming finite element space \({\fancyscript{S}}\) of the usual kind, consisting of continuous, piecewise polynomial functions of at first fixed degree, determined by their nodal values. Our object of study is the \(L_2\)-orthogonal projection

$$\begin{aligned} Q:L_2(\varOmega )\rightarrow {\fancyscript{S}}\end{aligned}$$
(1.1)

from \(L_2(\varOmega )\) onto the finite element space \({\fancyscript{S}}\). We want to estimate the \(H^1\)-seminorm of the projection \(Qu\), the \(L_2\)-norm of its first order derivatives, of a function \(u\) in the Sobolev space \(H^1\) by the \(H^1\)-seminorm of \(u\) itself.

To each finite element \(T\in {\fancyscript{T}}\) we assign a nonnegative integer \(k=k(T)\), the level of the element, such that \(2^{-k(T)}\) is roughly proportional to the diameter \(h(T)\) of \(T\), in the sense that there are constants \(\alpha >0\) and \(\beta >0\) with

$$\begin{aligned} \alpha 2^{-k(T)}\le h(T) \le \beta 2^{-k(T)}. \end{aligned}$$
(1.2)

The actual size of these constants is of no significance; only their ratio \(\beta /\alpha \) will enter into our estimates. The triangulation \({\fancyscript{T}}\) can be highly nonuniform and can contain finite elements from a very wide range of levels. We require, however, that the level of two neighboring elements differs at most by one. By neighbor, we refer to all elements that share a vertex with a given element.

Originally we had grids in mind that are generated by common adaptive meshing schemes based on bisection, for example the red-green refinement scheme in two [4] or three space dimensions [5]. In this case, the level of an element \(T\) counts the number of refinement steps that are needed to generate \(T\) from its ancestor in the initial triangulation. This is consistent with our current definition of level. The ratio of the constants \(\alpha \) and \(\beta \) reflects in such cases the degree of non-uniformity of the initial triangulation but remains independent of the degree of refinement. Consider, for example, red-green refinement in two space dimensions. Red refinement of a triangle means subdivision into four congruent subtriangles of half the diameter. Since a green refinement into two subtriangles takes place only in the final step and is reversed in case of an additional refinement, the condition (1.2) holds with \(\beta /\alpha =2\) if one restricts the attention to a single triangle of the initial triangulation. In three space dimensions, the situation is basically the same. Red refinement of a tetrahedron means subdivision into eight tetrahedrons of equal volume that fall in at most three congruence classes, independent of the number of refinement steps, and green refinement takes again only place in the final step.

In red-green refinement, it is typical to impose a difference of at most one level between elements sharing a common edge in two or face in three space dimension. This is not sufficient to guarantee the same kind of constraint for the vertices, that is, that the levels of neighboring elements in the sense introduced above differs only by one. This condition can, however, be explicitly imposed on red-green meshes at little additional cost, even if such meshes often satisfy such a condition simply due to the refinement pattern generated by the local error indicators used in the refinement process. Experience shows that the local element sizes change even more moderately in case of mesh generators for nonuniform meshes that employ vertex-based mesh smoothing to locally optimize some measure of shape regularity [1], at least up to very few elements in small exceptional regions around strong singularities.

We will show that subject to the given conditions on the mesh the estimate

$$\begin{aligned} |Qu|_1\le c |\,u\,|_1 \end{aligned}$$
(1.3)

for the \(H^1\)-seminorm of the projection \(Qu\) of a function \(u\) in \(H^1\) holds for elements up to order twelve in two and up to order seven in three space dimensions. The constant \(c\) depends only on the order of the elements, on the ratio \(\beta /\alpha \) of the constants in (1.2), and on the shape regularity of the triangles or tetrahedrons, but neither on the quasiuniformity of the mesh nor on any other such global property. For a sufficiently smooth grading of the mesh our argumentation works without any restriction to the polynomial order of the finite elements or any further condition to the triangulation up to shape regularity. It is even possible to extend our proof to \(hp\)-like adaptive finite element methods as long as the maximum order of the elements remains limited.

2 An iterative method

The main ingredient of our proof for the \(H^1\)-stability of the \(L_2\)-projection, as well as other proofs in the literature, are its strong localization properties, or roughly speaking, the fact that the inverse of the mass matrix decays very rapidly away from the diagonal. See Douglas et al. [14] and the related work [11, 12] of Demko and coauthors for band matrices. The results of this section will serve to quantify this effect in a fashion needed for our analysis.

Let \({\fancyscript{S}}_0\) be the subspace of the finite element space \({\fancyscript{S}}\) under consideration that consists of the functions in \({\fancyscript{S}}\) that are \(L_2\)-orthogonal to the functions in \({\fancyscript{S}}\) that vanish on the boundaries of the single elements. The functions in \({\fancyscript{S}}_0\) are determined by their values at the nodes on the boundaries of the finite elements. For finite element spaces of polynomial order \(d\) or less, \(d=2\) or \(d=3\) the space dimension, no interior nodes, inside the elements, exist and \({\fancyscript{S}}_0\) is the original space \({\fancyscript{S}}\). In all other cases, we denote by \({\fancyscript{S}}_1\) the space of the functions in \({\fancyscript{S}}\) that vanish on the boundaries of the finite elements. The \(L_2\)-orthogonal projection \(Q\) onto \({\fancyscript{S}}\) splits then into the sum

$$\begin{aligned} Q = Q_0 + Q_1 \end{aligned}$$
(2.1)

of the \(L_2\)-orthogonal projection \(Q_0\) onto the subspace \({\fancyscript{S}}_0\) and the \(L_2\)-orthogonal projection \(Q_1\) onto \({\fancyscript{S}}_1={\fancyscript{S}}_0^\bot \). As the contributions from the single elements to \(Q_1\) do not interact, the localization properties of the projection \(Q\) are completely determined by those of \(Q_0\), the operator that will be studied in the rest of this section.

We label the vertices of the finite elements by the integers \(i=1,2,\ldots \,n\). The vertex \(i\) is surrounded by the patch \(U_i\), the union of the finite elements that share this vertex. Let \({\fancyscript{V}}_i\) be the space that consists of the functions in the space \({\fancyscript{S}}_0\) that vanish outside \(U_i\). Let \(P_i\) be the \(L_2\)-orthogonal projection onto \({\fancyscript{V}}_i\) and let

$$\begin{aligned} C=P_1+P_2+\ldots +P_n. \end{aligned}$$
(2.2)

We will construct with help of this operator approximations of the projection \(Q_0u\) of a given square integrable function \(u\) onto \({\fancyscript{S}}_0\). For that purpose, we first define finite element functions \(u^{(\nu )}\in {\fancyscript{S}}_0\) recursively by

$$\begin{aligned} u^{(\nu +1)}=u^{(\nu )}+C\,(u-u^{(\nu )}), \end{aligned}$$
(2.3)

with \(u^{(0)}\in {\fancyscript{S}}_0\) an arbitrary starting value, and combine them to weighted averages

$$\begin{aligned} w^{(\ell )}=\sum _{\nu =0}^\ell \alpha _{\ell \nu }u^{(\nu )}, \quad \sum _{\nu =0}^\ell \alpha _{\ell \nu }=1. \end{aligned}$$
(2.4)

This procedure can be seen as a polynomially accelerated additive subspace correction method in the sense of [16]; see also the survey article [18].

The analysis of this iterative method starts from two abstract assumptions [16, 18] that still need to be verified. First, we assume that every finite element function

$$\begin{aligned} v=v_1+v_2+\ldots \,+v_n \end{aligned}$$
(2.5)

in \({\fancyscript{S}}_0\) can be decomposed into functions \(v_i\) in the subspaces \({\fancyscript{V}}_i\) such that

$$\begin{aligned} \sum _i\Vert v_i\Vert _0^2 \le K_1\Vert v\Vert _0^2. \end{aligned}$$
(2.6)

Secondly, we assume that for every such linear combination (2.5) the estimate

$$\begin{aligned} \Vert v\Vert _0^2 \le K_2 \sum _i\Vert v_i\Vert _0^2 \end{aligned}$$
(2.7)

holds. Our main tool is the following lemma that is an adaption of the general result for subspace decomposition methods [16, 18] to the present situation.

Lemma 2.1

The restriction of the operator \(C\) to the subspace \({\fancyscript{S}}_0\) is self-adjoint with respect to the \(L_2\)-inner product. Its eigenvalues \(\lambda \) range in the interval

$$\begin{aligned} 1/K_1 \le \lambda \le K_2. \end{aligned}$$
(2.8)

Proof

The self-adjointness of the operator \(C\) immediately results from its definition, that is, the self-adjointness of the projections \(P_i\). Let \(v=v_1+v_2+\,\ldots \,+v_n\) be a decomposition of the finite element function \(v\in {\fancyscript{S}}_0\) as in the assumption (2.6). Then it follows from the Cauchy-Schwarz inequality

$$\begin{aligned} (v,v)=\sum _i(v_i,v)=\sum _i(v_i,P_iv) \le \Big (\sum _i\Vert v_i\Vert _0^2\Big )^{1/2} \Big (\sum _i\Vert P_iv\Vert _0^2\Big )^{1/2} \end{aligned}$$

and therefore, by assumption (2.6),

$$\begin{aligned} (v,v) \le K_1\sum _i\Vert P_iv\Vert _0^2=K_1\sum _i(v,P_iv)=K_1(v,Cv). \end{aligned}$$

The operator \(C\) is therefore positive definite and its eigenvalues are bounded from below by the constant \(1/K_1\). By assumption (2.7) conversely

$$\begin{aligned} \Vert Cv\Vert _0^2=\Big \Vert \sum _i P_iv\Big \Vert _0^2\,\le \,K_2\sum _i\Vert P_iv\Vert _0^2 \end{aligned}$$

and, using once more that \(P_i\) is an \(L_2\)-orthogonal projection,

$$\begin{aligned} \Vert Cv\Vert _0^2\,\le \,K_2\sum _i(v,P_iv)=K_2(v,Cv). \end{aligned}$$

The constant \(K_2\) is therefore an upper bound for the eigenvalues of \(C\). \(\square \)

The error between the projection \(Q_0u\) and its approximation (2.4) can be bounded in terms of the ratio \(\lambda _n/\lambda _1\le K_1K_2\) of the maximum and the minimum eigenvalue of the operator \(C\) restricted to the subspace \({\fancyscript{S}}_0\). This results from:

Lemma 2.2

Let \(\lambda _1\le \lambda _2\le \ldots \le \lambda _n\) be the eigenvalues of the operator \(C\) restricted to the space \({\fancyscript{S}}_0\) and introduce the polynomial

$$\begin{aligned} P(\lambda )=\sum _{\nu =0}^\ell \alpha _{\ell \nu }(1-\lambda )^\nu \end{aligned}$$
(2.9)

of degree \(\ell \). Then the error estimate

$$\begin{aligned} \Vert Q_0u-w^{(\ell )}\Vert _0 \le \max _{k=1,\ldots ,n}|P(\lambda _k)| \Vert Q_0u-w^{(0)}\Vert _0. \end{aligned}$$
(2.10)

holds for the approximation (2.4) of the projection \(Q_0u\) of \(u\in L_2\) onto \({\fancyscript{S}}_0\).

Proof

Since \(\sum _{\nu =0}^\ell \alpha _{\,\ell \nu }=1, w^{(0)}=u^{(0)}\), and \(Cu=CQ_0u\),

$$\begin{aligned} Q_0u-w^{(\ell )}= \sum _{\nu =0}^\ell \alpha _{\,\ell \nu }\,(I-C)^\nu (Q_0u-w^{(0)}). \end{aligned}$$

Expanding the initial error into \(L_2\)-orthogonal eigenfunctions of the self-adjoint operator \(C\) from the finite element space into itself, the proposition follows. \(\square \)

The polynomial \(P\) of degree \(\ell \) that satisfies the normalization condition \(P(0)=1\) and attains the minimum maximal value on the interval \(\lambda _1\le \lambda \le \lambda _n\) is, up to a linear transformation of the variable and the multiplication by a constant factor, the Chebyshev polynomial of degree \(\ell \), a fact that is widely used in the analysis of semi-iterative and Krylov-space methods; see [13], for example. Choosing the averaging coefficients \(\alpha _{\,\ell \nu }\) accordingly, one obtains from (2.10) and the estimate (2.8) for the eigenvalues of \(C\) the final estimate for the convergence rate of our semiiterative method:

Lemma 2.3

If the coefficients \(\alpha _{\,\ell \nu }\) are optimally chosen,

$$\begin{aligned} \Vert Q_0u-w^{(\ell )}\Vert _0\le \frac{2\,q^{\,\ell }}{1+q^{\,2\ell }}\Vert Q_0u-w^{(0)}\Vert _0, \end{aligned}$$
(2.11)

where the convergence rate

$$\begin{aligned} q=\frac{\sqrt{\kappa }-1}{\sqrt{\kappa }+1} \end{aligned}$$
(2.12)

is determined by the condition number \(\kappa =\lambda _n/\lambda _1\le K_1K_2\) of the operator \(C\).

It remains to bound the constants \(K_1\) and \(K_2\). The optimum decomposition (2.5) of a given finite element function in \({\fancyscript{S}}_0\) can be found solving a global saddle point problem, but this observation is not truly helpful. Our estimate for \(K_1\) is based on a local, element-wise construction and utilizes the nodal basis representation of the finite element functions. Let \(\varphi _i\) be the linear nodal basis function assigned to the vertex \(i\). The \(\varphi _i\) form a partition of unity. We decompose the finite element functions \(v\in {\fancyscript{S}}_0\) therefore into the finite element functions \(v_i\in {\fancyscript{V}}_i\) given by their values

$$\begin{aligned} v_i(x_k)=\varphi _i(x_k)v(x_k) \end{aligned}$$
(2.13)

at the boundary nodes \(x_k\) of the finite elements. The estimate (2.6) on the overall region then follows from the local estimates

$$\begin{aligned} \sum _i\Vert v_i\Vert _{0,T}^2 \le K_1\Vert v\Vert _{0,T}^2 \end{aligned}$$
(2.14)

on the single elements \(T\) in the triangulation \({\fancyscript{T}}\) by summation over all \(T\). As the local mass matrices for the single elements differ only by a scalar area respectively volume factor and \(L_2\)-orthogonality is not affected by the size or shape of the elements, thus one can restrict oneself in the calculation of \(K_1\) to the examination of a single reference element. This leads to a small eigenvalue problem

$$\begin{aligned} Bx=\zeta Mx, \end{aligned}$$
(2.15)

where \(M\) is the element mass matrix and \(B\) the sum of the three or four matrices that arise from the multiplication of the values of the finite element functions at the boundary nodes by the values of the corresponding functions \(\varphi _i\). The maximum eigenvalue \(\zeta \) represents then an upper bound for the best possible constant \(K_1\).

Upper bounds for the constants \(K_2\) can be derived in a similar way. As above, one can restrict oneself again to the consideration of a reference element \(T\). If

$$\begin{aligned} \Vert v_1+v_2+\ldots +v_{d+1}\Vert _{0,T}^2\le K_2\sum _{i=1}^{d+1}\Vert v_i\Vert _{0,T}^2 \end{aligned}$$
(2.16)

holds for every set of finite element functions \(v_i\) that vanish on the edge respectively the face opposite to the vertex \(i\) and that are orthogonal to all polynomials of given order that vanish on the boundary of \(T\), (2.7) holds with the same constant. Therefore

$$\begin{aligned} K_2 \le d+1 \end{aligned}$$
(2.17)

independent of the order of the elements, as follows from the triangle and the Cauchy-Schwarz inequality. The calculation of of the minimum possible constant \(K_2\) in (2.16) leads again to a matrix eigenvalue problem.

The corresponding eigenvalues have been calculated with help of the computer algebra program Maple, where exact arithmetic over the rational numbers has been used to set up the element matrices and to factor the resulting characteristic polynomials as far as possible. The zeroes of the single factors have then been calculated numerically with very high precision. All these zeroes turned out to be simple. To guarantee the accuracy of the zeroes up to the given number of digits, it has been shown afterward using again rational arithmetic that the factors change their sign in correspondingly small neighborhoods of the computed floating point numbers.

Table 1 shows the results of these computations, rounded to sixteen decimal digits, for triangular elements of order \(p=1\) to \(p=13\) together with the resulting upper bound for the convergence rate (2.12) of the polynomially accelerated iteration. For elements up to order twelve, the convergence rate remains below the bound

$$\begin{aligned} q<\frac{1}{2} \end{aligned}$$
(2.18)

that will turn out to be critical and will be needed to counterbalance the mesh grading.

Table 1 The constants \(K_1\) and \(K_2\) and the convergence rate \(q\) for triangular elements of order \(p=1\) to \(13\)

The situation for tetrahedral elements in three space dimensions is somewhat less favorable. The results of the corresponding computations are shown in Table 2. The convergence rate (2.12) remains below the critical bound (2.18) for elements of order one to seven. The condition (2.18) is violated for elements of order eight.

Table 2 The constants \(K_1\) and \(K_2\) and the convergence rate \(q\) for tetrahedral elements of order \(p=1\) to \(8\)

3 A decay property of the \(\mathbf{L_2}\)-projection

The convergence rate \(q\) determines the localization properties of the \(L_2\)-projection. The faster the iteration converges, that is, the smaller \(q\) is, the faster local influences decay with distance from the source. The aim of this section is to quantify this effect. We group the finite elements by their levels and denote by \(\varOmega _i\) the union of the elements of level \(i\). The next lemma describes how fast the projections of functions that vanish outside these sets decay on the exterior of these sets.

Lemma 3.1

Let \(v_k\) be a square integrable function vanishing outside \(\varOmega _k\). Then

$$\begin{aligned} \Vert Qv_k\Vert _{0,\varOmega _i}\le \gamma {(|i-k|)}\Vert v_k\Vert _0, \end{aligned}$$
(3.1)

where the prefactors decay like \(q^{|i-k|}\) with the distance from \(i\) to \(k\) and are given by

$$\begin{aligned} \gamma (\ell )=\min (1,2\,q^{\,\ell -1}). \end{aligned}$$
(3.2)

Proof

Let the \(w^{(\ell )}\in {\fancyscript{S}}_0\) be the iteratively generated approximations of \(Q_0v_k\) as in (2.4), with starting value \(w^{(0)}=0\). By (2.11) then

$$\begin{aligned} \Vert Q_0v_k-w^{(\ell )}\Vert _0 \le 2\,q^{\ell }\Vert Q_0v_k\Vert _0 \le 2 q^{\ell } \Vert v_k\Vert _0. \end{aligned}$$

As the operator (2.2) is composed of the local projections onto the spaces of functions in \({\fancyscript{S}}_0\) that vanish outside the patches consisting of the elements surrounding a given vertex, information propagates in the transition from step \(\ell \) to step \(\ell +1\) from a given set of elements not further than to the union of the patches that cover at least one of these elements. As the level of neighboring elements, where neighbors were in our definition elements that share a vertex, differs by assumption at most by one, the approximation \(w^{(\ell )}\) of \(Q_0v_k\) thus vanishes on the union \(\varOmega _i\) of the elements of level \(i\) as long as the number \(\ell \) of iterations remains less than \(|i-k|\). For these \(\ell \),

$$\begin{aligned} \Vert Q_0v_k\Vert _{0,\varOmega _i}=\Vert Q_0v_k-w^{(\ell )}\Vert _{0,\varOmega _i} \le \,2\,q^{\,\ell }\,\Vert v_k\Vert _0. \end{aligned}$$

As \(Qv_k=Q_0v_k\) outside \(\varOmega _k\), the proposition follows choosing \(\ell =|i-k|-1\). \(\square \)

4 The \(\mathbf{H^1}\)-stability

If the convergence rate \(q\) given by (2.12) is less than \(1/2\), then the decay behavior expressed by (3.1) counterbalances the possible decrease of the element diameter from a given element to its neighbors. This is the key to the proof of the following theorem, that forms the basis of our proof for the \(H^1\)-stability of the \(L_2\)-projection.

Theorem 4.1

If the convergence rate \(q\) given by (2.12) is less than \(1/2\), there is a constant \(c\) such that for all square integrable functions \(v\)

$$\begin{aligned} \Vert h^{-1}Qv\Vert _0 \le c\,\Vert h^{-1}v\Vert _0, \end{aligned}$$
(4.1)

where the function \(h\) denotes the local meshwidth on the single elements. This constant exclusively depends on the convergence rate \(q\) and the ratio \(\beta /\alpha \) of the constants in the assumption (1.2) that links the levels of the elements to their diameters.

Proof

We first introduce the weighted \(L_2\)-norm given by the expression

$$\begin{aligned} |\!|\!|v|\!|\!|^2 =\sum _i\,4^i\,\Vert v\Vert _{0,\varOmega _i}^2, \end{aligned}$$
(4.2)

where \(\varOmega _i\) is as above the union of the elements of level \(i\). By assumption (1.2),

$$\begin{aligned} \alpha \Vert h^{-1}v\Vert _0 \le |\!|\!|v|\!|\!|\le \beta \Vert h^{-1}v\Vert _0 \end{aligned}$$

holds for all square integrable functions \(v\). Thus it suffices to prove an estimate

$$\begin{aligned} |\!|\!|Qv|\!|\!| \le c_0\,|\!|\!|v|\!|\!|. \end{aligned}$$
(4.3)

Let \(v_k\) attain the same values as \(v\) on \(\varOmega _k\) and the value \(0\) elsewhere. Then

$$\begin{aligned} |\!|\!|Qv|\!|\!|^2 ={\Big |\!\Big |\!\Big |\sum _k Qv_k\Big |\!\Big |\!\Big |}^2= \sum _i 4^i \sum _{k,l} (Qv_k,Qv_\ell )_{\varOmega _i}, \end{aligned}$$

where all sums extend over the finite set of nonempty levels \(i,k\), and \(\ell \) in the given triangulation. The estimate from Lemma 3.1 leads therefore to the estimate

$$\begin{aligned} |\!|\!|Qv|\!|\!|^2\le \sum _{k,l}\sum _i 4^i \gamma (|i-k|)\gamma (|i-\ell |) \Vert v_k\Vert _0\Vert v_\ell \Vert _0 \end{aligned}$$

or, going back to the definition of \(v_k\) and \(v_\ell \) and of the weighted \(L_2\)-norm (4.2),

$$\begin{aligned} |\!|\!|Qv|\!|\!|^2 \le \sum _{k,l}\,a_{k\ell } |\!|\!|v_k|\!|\!| |\!|\!|v_\ell |\!|\!|, \end{aligned}$$

where the coefficients \(a_{k\ell }\) are given by

$$\begin{aligned} a_{k\ell }= \sum _i\,2^{i-k}\gamma (|i-k|)\,2^{i-\ell }\gamma (|i-\ell |). \end{aligned}$$

If \(\varLambda \) denotes the maximum eigenvalue of the symmetric matrix with these entries,

$$\begin{aligned} |\!|\!|Qv|\!|\!|^2\le \varLambda \sum _k |\!|\!|v_k|\!|\!|^2= \varLambda \sum _k\,4^k \Vert v\Vert _{0,\varOmega _k}^2= \varLambda |\!|\!|v|\!|\!|^2 \end{aligned}$$

follows. The eigenvalue \(\varLambda \) is bounded by the maximum row sum

$$\begin{aligned} \sum _\ell |a_{k\ell }|= \sum _i 2^{i-k}\gamma (|i-k|) \sum _\ell \,2^{i-\ell }\gamma (|i-\ell |) \end{aligned}$$

of this matrix, that is, in terms of the maximum level \(j\) of the finite elements in the given triangulation, by the square of the expression

$$\begin{aligned} \sum _{\ell =0}^j 2^{-\ell }\gamma (\ell )+ \sum _{\ell =1}^j 2^\ell \gamma (\ell ). \end{aligned}$$

If \(q<1/2\), one can let tend \(j\) to infinity and obtains the desired upper bound

$$\begin{aligned} \sum _{\ell =0}^\infty 2^{-\ell }\gamma (\ell )+ \sum _{\ell =1}^\infty 2^\ell \gamma (\ell )= \frac{4}{1-2q}-\frac{1}{2}+\frac{q}{2-q} \end{aligned}$$

for the constant \(c_0\) in the estimate (4.3). In the limit case \(q=1/2\),

$$\begin{aligned} \sum _{\ell =0}^j 2^{-\ell }\gamma (\ell )+ \sum _{\ell =1}^j 2^\ell \gamma (\ell )= 4j-\frac{1}{6}-\frac{4}{3}\,4^{-j}. \end{aligned}$$

In this case, it is no longer possible to limit the weighted \(L_2\)-norm of the projection operator independent of \(j\). It increases, however, at most like the logarithm of the quotient of the maximum and the minimum diameter of the finite elements. \(\square \)

The stability (4.1) of \(Q\) with respect to the given weighted \(L_2\)-norm implies the \(H^1\)-stability of \(Q\). In fact, to finish our proof we need only a quasi-interpolation operator \(\varPi :L_2\rightarrow {\fancyscript{S}}\) with the following properties. First, we assume that the estimate

$$\begin{aligned} \Vert h^{-1}(u-\varPi u)\Vert _0\,\lesssim |\,u\,|_1 \end{aligned}$$
(4.4)

holds for all functions \(u\in H^1\), which is a kind of error estimate, and secondly, that

$$\begin{aligned} |\varPi u|_1\,\lesssim \,|\,u\,|_1 \end{aligned}$$
(4.5)

for these \(u\). Since its introduction by Clément [9], the use of such quasi-interpolation operators is standard in finite element theory. One possible example is the operator

$$\begin{aligned} \varPi u= \sum _i\frac{(u,\varphi _i)}{(1,\varphi _i)}\,\varphi _i \end{aligned}$$
(4.6)

that reproduces locally constant functions; the \(\varphi _i\) are the piecewise linear hat functions assigned to the vertices of the finite elements. It is analyzed in more detail in [17]. The proof of (4.4) and (4.5) is based on a local version of the Poincaré inequality and the local quasiuniformity of the triangulation.

Theorem 4.2

If the convergence rate \(q\) given by (2.12) is less than \(1/2\), and the \(L_2\)-orthogonal projection \(Q\) onto the finite element space \({\fancyscript{S}}\) is thus stable with respect to the weighted \(L_2\)-norm from (4.1), there is a constant \(c\) such that for all \(u\in H^1\)

$$\begin{aligned} |Q u|_1 \le c\,|\,u\,|_1. \end{aligned}$$
(4.7)

Proof

Since \(\varPi u\) is reproduced by the projection \(Q\),

$$\begin{aligned} |Qu|_1 \le |Q(u-\varPi u)|_1 +|\varPi u|_1. \end{aligned}$$

The second term on the right hand side can, by means of (4.5), be estimated as desired. To estimate the first term, we first apply the inverse inequality

$$\begin{aligned} |Q(u-\varPi u)|_1 \lesssim \Vert h^{-1}Q(u-\varPi u)\Vert _0 \end{aligned}$$

and utilize Theorem 4.1, the stability of \(Q\) in the weighted \(L_2\)-norm:

$$\begin{aligned} \Vert h^{-1}Q(u-\varPi u)\Vert _0 \lesssim \Vert h^{-1}(u-\varPi u)\Vert _0. \end{aligned}$$

Applying the error estimate (4.4), one obtains the estimate

$$\begin{aligned} |Q(u-\varPi u)|_1 \lesssim |\,u\,|_1 \end{aligned}$$

for the first term, which proves the theorem. \(\square \)

The assumptions of Theorem 4.2 are in particular fulfilled for conforming triangular elements up to order twelve in two space dimensions and conforming tetrahedral elements up to order seven in three space dimensions. We conclude:

Theorem 4.3

The \(L_2\)-projection onto the given finite element spaces is \(H^1\)-stable in the case of elements up to order twelve in two and up to order seven in three space dimensions, regardless the quasiuniformity of the triangulation.

5 An analysis of a simple hp-method

Finite elements of a fixed order greater than three or four are rarely used in practice, in contrast to \(hp\)-methods that became meanwhile very popular. The recent version [1] of the finite element code pltmg is based on such an approach. More details on such kinds of finite element methods can be found in [3]. Our theory applies to the \(hp\)-case, at least if the maximum order of the elements is limited.

We restrict ourselves here to the two-dimensional case and start from the same type of conforming triangulations \({\fancyscript{T}}\) as described in the introduction, to which a conforming finite element space \({\fancyscript{S}}\) is associated. The functions in \({\fancyscript{S}}\) are globally continuous but their polynomial degree can now differ from triangle to triangle. The triangulations are composed of regular elements and a special kind of transition elements. Regular elements are elements as considered before. The restrictions of the finite element functions to regular elements of order \(p\) are the polynomials of degree \(p\). These are determined by their values at the nodes of order \(p\) in the given triangle. The transition elements serve to connect elements of a given order \(p\) to a single element of order \(p+1\). The finite element functions are determined on such triangles by their values at the nodes of order \(p\), except for those on the transition edge that are there replaced by the nodes of order \(p+1\). The restriction of the finite element space to such a triangle is spanned by the polynomials of degree \(p\) and a further polynomial of degree \(p+1\) that enables the continuous transition across the transition edge and vanishes on the other two edges. The transition elements play in the given context the same role as the “green” triangles in red-green \(h\)-refinement [4].

The analysis of the preceding sections completely covers this construction. The corresponding constants \(K_1\) and \(K_2\) from (2.14) and (2.16) for the transition elements, calculated and checked in the same way as in Sect. 2, can be found in Table 3. As there, the interior nodes of the transition elements have been eliminated to speed up the convergence of the iteration underlying our proof. Since in the \(hp\)-case elements of different kind are combined, the global constants

$$\begin{aligned} K_1=\max _{T\in {\fancyscript{T}}}\,K_1(T), \quad K_2=\max _{T\in {\fancyscript{T}}}\,K_2(T) \end{aligned}$$
(5.1)

are the maxima of the corresponding constants for the single elements respectively reference elements. Remarkably, the global constants \(K_1\) and \(K_2\) are dominated by those of low-order transition elements, \(K_1\) by that of the transition elements from order \(1\) to \(2\) and \(K_2\) by that of the transition elements from order \(2\) to \(3\). Nevertheless we can conclude that the convergence rate of the iteration from Sect. 2 is at least

$$\begin{aligned} q=0.4799547\ldots \end{aligned}$$
(5.2)

if we limit the maximum order of the elements to eleven and remains therefore below the critical bound \(q=1/2\). The \(L_2\)-projection thus remains also for this case \(H^1\)-stable without further conditions to the mesh or any other assumption.

Table 3 The constants \(K_1\) and \(K_2\) for the transition elements connecting triangles of order \(p\) and \(p+1\)

6 The influence of the mesh grading

The crucial estimate that links the structure of the mesh and the type of the finite element functions is (3.1). It represents a worst case estimate that allows the size of the elements to change rather quickly. If one assumes a smoother grading of the finite element mesh, one can relax our conditions considerably. To quantify this effect, we attribute a graph to the triangulation. The single elements are the nodes of this graph. Two nodes are connected by an edge if the corresponding two elements are neighbors. The distance of two elements is then the distance of the corresponding nodes in the graph. The proof of (3.1) uses that the distance of two elements of levels \(i\) and \(k\) is at least \(|i-k|\). If one assumes that the distance of two such elements is at least \(m|i-k|\) with \(m\) a real parameter greater than one for levels that differ sufficiently much, that is, that the meshsize changes at least in some average less abruptly, one obtains instead of (3.1) a potentially much stronger estimate

$$\begin{aligned} \Vert Qv_k\Vert _{0,\varOmega _i} \le \widetilde{\gamma } (|i-k|)\Vert v_k\Vert _0, \end{aligned}$$
(6.1)

where the prefactor is, for \(\ell \) greater than a fixed \(\ell _0\), now given by

$$\begin{aligned} \widetilde{\gamma }\,(\ell )=2\,q^{m\ell -1}. \end{aligned}$$
(6.2)

It depends now on \(m\) and decays like \(q^{m|i-k|}\) with the distance from \(i\) to \(k\). The proof of Theorem 4.1 works then as before, with the only difference that the condition that the convergence rate \(q\) is less than \(1/2\) can be replaced by the less restrictive condition

$$\begin{aligned} q^{m}<\frac{1}{2}. \end{aligned}$$
(6.3)

One can even go a step further modifying the concept of level. Assume that the diameter \(h(T)\) of two elements in \({\fancyscript{T}}\) that have at least one point in common differs at most by the factor \(\mu \ge 1\); the existence of such a factor follows from the shape regularity of the elements. Then to each finite element \(T\) in our triangulation \({\fancyscript{T}}\) we assign a nonnegative integer \(k=k(T)\), the new kind of level of the element, such that

$$\begin{aligned} h_0 \mu ^{-k(T)}\le h(T) <h_0\,\mu ^{-k(T)+1}, \end{aligned}$$
(6.4)

where \(h_0\) is the maximum element diameter. The level of two neighboring elements differs then by construction at most by one. One can therefore show along the given lines that the \(L_2\)-projection remains \(H^1\)-stable as long as the condition

$$\begin{aligned} q^{\,m}<\,\frac{1}{\mu } \end{aligned}$$
(6.5)

is satisfied. The condition on the convergence rate \(q\) of the iterative method from Sect. 2 thus becomes less and less restrictive the more \(\mu \) approaches the value one or the larger constant \(m\) introduced as above is, that is, the smoother the grading of the mesh becomes. Shape regularity of the elements and a sufficiently smooth grading of the mesh provided, our argumentation thus works without any restriction to the type or order of the finite elements. There is therefore less doubt that the \(L_2\)-projection remains \(H^1\)-stable in almost all cases of practical interest.

7 A counterexample

The following simple example strongly suggests that the \(L_2\)-projection onto finite element spaces cannot be \(H^1\)-stable without any condition to the grid and that assumptions on the grading of the mesh cannot be avoided. We start from the sequence

$$\begin{aligned} x_0=0,\quad x_{k+1}= x_k+h_k,\quad k=1,2,3,\ldots \,, \end{aligned}$$
(7.1)

of gridpoints in the interval \([0,1)\), where the local gridwidths

$$\begin{aligned} h_k=(1-\rho )\rho ^k,\quad 0<\rho <1, \end{aligned}$$
(7.2)

tend exponentially to zero and are chosen such that the \(x_k\) tend to \(1\) as \(k\) goes to infinity. We consider the infinite-dimensional space \({\fancyscript{S}}\) that consists of the on \([0,1)\) continuous and square integrable functions \(u\) whose restrictions to the intervals \([x_k,x_{k+1}]\) are linear. As the spaces of the restrictions of the functions in \({\fancyscript{S}}\) to all closed subintervals of the half-open interval \([0,1)\) are finite-dimensional, \({\fancyscript{S}}\) is a closed subspace of \(L_2(0,1)\) and the \(L_2\)-orthogonal projection \(Q\) from \(L_2(0,1)\) to \({\fancyscript{S}}\) exists. The nodal values \(u_k\) of the projection \(Qu\) of a functions \(u\in L_2\) to \({\fancyscript{S}}\) satisfy the equations

$$\begin{aligned} \frac{h_{k-1}}{6} u_{k-1}+\frac{h_{k-1}+h_k}{3}\,u_k +\frac{h_k}{6}\,u_{k+1} =(u,\varphi _k) \end{aligned}$$
(7.3)

assigned to the gridpoints \(x_k\) for \(k=1,2,\ldots \) and the additional equation

$$\begin{aligned} \frac{h_0}{3} u_0 +\frac{h_0}{6}\,u_1=(u,\varphi _0), \end{aligned}$$
(7.4)

assigned to \(x_0\); the \(\varphi _k\) are the hat functions assigned to the \(x_k\), the piecewise linear functions that take the value \(1\) at \(x_k\) and the value \(0\) at all other gridpoints.

In the following, we are only interested in the projections \(Qu\) of functions \(u\) that vanish for \(x\ge x_1\). To represent these in closed form, we introduce the constants

$$\begin{aligned} \lambda _1=-\frac{1+\rho -\sqrt{1+\rho +\rho ^2}}{\rho },\quad \lambda _2=-\frac{1+\rho +\sqrt{1+\rho +\rho ^2}}{\rho }. \end{aligned}$$
(7.5)

Lemma 7.1

The nodal values \(u_k\) of the projection \(Qu\) of a square integrable function \(u\) that takes the value zero for \(x\ge x_1\) are, for \(k\ge 1\), given by

$$\begin{aligned} u_k= \frac{6\,(u,\varphi _0)-12\,(u,\varphi _1)}{(1-\rho )(2+\lambda _1)}\; \lambda _1^k. \end{aligned}$$
(7.6)

Proof

The right hand sides of the difference equation (7.3) vanish in this case for all indices \(k\ge 2\). The \(u_k\) satisfy therefore, for \(k\ge 2\), the difference equation

$$\begin{aligned} u_{k-1}+2\,(1+\rho ) u_k+\rho \,u_{k+1} =0. \end{aligned}$$

Since \(\lambda _1\) and \(\lambda _2\) are the roots \(\lambda \) of the characteristic polynomial

$$\begin{aligned} 1+2\,(1+\rho ) \lambda +\rho \lambda ^2 \end{aligned}$$

of this equation, the general solution of this equation is

$$\begin{aligned} u_k=a\,\lambda _1^{k-1}+b \lambda _2^{k-1}, \quad k=1,2,\ldots \,. \end{aligned}$$

Since \(\rho \lambda _1^2<1\) and \(\rho \lambda _2^2>1\) for all \(\rho \) in the interval \((0,1]\) and because of

$$\begin{aligned} \int _{x_k}^{x_{k+1}}(\lambda ^k\varphi _k+\lambda ^{k+1}\varphi _{k+1})^2 \,\mathrm{d}x= \frac{(1-\rho )(1+\lambda +\lambda ^2)}{3}\,(\rho \lambda ^2)^k, \end{aligned}$$

the \(u_k\) are nodal values of a square integrable function in \({\fancyscript{S}}\) if and only if \(b=0\). This observation fixes the coefficient \(b\). To determine \(a\), we use that

$$\begin{aligned} (3h_0+4h_1)u_1+2h_1u_2=12\,(u,\varphi _1)-6\,(u,\varphi _0), \end{aligned}$$

as follows from (7.3) and (7.4). Since \(\lambda _1\) is a root of the characteristic polynomial,

$$\begin{aligned} (3h_0+4h_1+2h_1\lambda _1)\,\lambda _1=-\,(1-\rho )(2+\lambda _1). \end{aligned}$$

Inserting the representation \(u_1=a, u_2=a\,\lambda _1\), the proposition follows. The value

$$\begin{aligned} u_0= \frac{6\,(u,\varphi _0)+6\,\lambda _1(u,\varphi _1)}{(1-\rho )(2+\lambda _1)} \end{aligned}$$

of \(u_0\) can then be derived from (7.4). \(\square \)

Not much surprisingly, \(u_k=0\) for \(k\ge 1\) if \(u=\varphi _0\). As

$$\begin{aligned} \int _{x_k}^{x_{k+1}}(\lambda ^k\varphi _k^{\prime }+\lambda ^{k+1}\varphi _{k+1}^{\prime })^2\mathrm{d}x= \frac{(1-\lambda )^2}{1-\rho }\;\Big (\frac{\lambda ^2}{\rho }\Big )^k, \end{aligned}$$
(7.7)

the \(H^1\)-seminorms over the subintervals of a function represented by the values (7.6) sum up, for \((u,\varphi _0)\ne 2\,(u,\varphi _1)\), to a finite value if and only if

$$\begin{aligned} \frac{\lambda _1^2}{\rho }< 1, \end{aligned}$$
(7.8)

regardless the smoothness of \(u\). Since \(\lambda _1\) behaves asymptotically like

$$\begin{aligned} \lambda _1=-\frac{1}{2}+\frac{3}{8} \rho +\fancyscript{O}(\rho ^2), \quad \rho \rightarrow 0+, \end{aligned}$$
(7.9)

this condition is violated if \(\rho \) approaches zero. That means, the \(L_2\)-projection can only be \(H^1\)-stable if the length of the subintervals does not decrease too fast in comparison to the speed of decay of the projection from gridpoint to gridpoint, the property on which the proof of Theorem 4.1 and with that our entire theory is based.