1 Introduction

The adaptive finite element method plays an important role in the numerical solution of partial differential equations [1, 2, 42]. The convergence and optimality of the adaptive method have been studied extensively in recent years. For the Poisson equation and its variants, the theory is well developed [9, 15, 19, 20, 26, 35–38, 40, 41]. However, for many other important problems this is not the case. Among these understudied problems is the Stokes problem, the main subject of this paper.

The convergence analysis of the adaptive finite element method for the Poisson equation is based on the orthogonality property [19, 26, 35, 36]; this orthogonality can be weakened to a quasi-orthogonality for the nonconforming and mixed methods [4, 6, 14, 15, 17, 20, 21, 29, 31, 34, 39]. The Stokes problem, as a saddle point problem with two variables (velocity and pressure), lacks the usual orthogonality or quasi-orthogonality that holds for positive definite problems. As a result, it is not obvious how the techniques for nonconforming and mixed methods for the Poisson equation can be carried over to the Stokes problem. Although the mixed formulation of the Poisson equation is also a saddle point problem, the analyses of its convergence and optimality [4, 17, 20] are not so different from those for the primal formulation of the Poisson equation. The reason is that only the stress variable, which can be decoupled from the primal variable, needs to be involved in the analysis. This is not, however, the case for the Stokes problem under consideration here, because the two variables, velocity and pressure, are coupled and cannot be separated in the analysis of convergence and optimality. To circumvent this difficulty, Bänsch, Morin, and Nochetto developed a modified adaptive procedure in which the Uzawa algorithm on the continuous level is used as the outer iteration [3, 32, 33]. See also [24] for adaptive wavelet methods.

The optimality of the adaptive finite element method for the Poisson equation is analyzed based on discrete reliability (see [19, 40, 41] and the references therein). Basically, we need one restriction operator and one prolongation operator in order to analyze the discrete reliability. For the conforming method, a natural candidate for the prolongation operator is the usual inclusion operator, and for the restriction operator a Scott–Zhang-type operator can be used, as it has both the local projection property and the global and uniform boundedness property. For the nonconforming method under consideration here, however, it is a challenge to come up with a prolongation operator that has both the local projection property and the global and uniform boundedness property. For the nonconforming linear element method for the Poisson equation, such a difficulty can be circumvented using the discrete Helmholtz decomposition [6, 39]. However, the Helmholtz decomposition does not seem applicable to the problem under consideration because the existence of such a decomposition is unclear in the general case.

The first convergence and optimality analysis of a standard adaptive finite element method for the Stokes problem was presented in a technical report [30] in 2007 by the authors of this paper. The analysis was based on a special relation between the nonconforming \(P_{1}\) element and the lowest-order Raviart–Thomas element for the Stokes problem and on one prolongation operator between the discrete spaces. However, we later found a gap in our discrete reliability analysis caused by the prolongation operator used therein. A convergence and optimality analysis was published in [5] in 2011; however, we also found a gap in that analysis similar to the one in our earlier report [30] (see Appendix for more details).

The present paper is an improved version of [30] with simplified and corrected proofs. Its purpose is to provide a rigorous analysis of the convergence and optimality of the adaptive nonconforming linear element method for the Stokes problem. The main idea is to establish the orthogonality or quasi-orthogonality of both the velocity variable and the pressure variable. The nonconformity of the discrete velocity space is the main difficulty in establishing the desired quasi-orthogonality property and the discrete reliability estimate. To overcome this difficulty we take two steps: (1) we establish the quasi-orthogonality for both the velocity and pressure variables by using a special conservative property of the nonconforming linear element, and (2) we introduce a new prolongation operator that has both the projection property and the uniform boundedness property for the discrete reliability analysis. To analyze optimality within the standard nonlinear approximation class [19], we define a new interpolation operator to bound the consistency error and prove that the consistency error can be bounded by the approximation error up to oscillation. This in fact implies that the nonlinear approximation class used in [30] is equivalent to the standard nonlinear approximation class [7, 19]. Finally, by introducing a new parameter-dependent error estimator, we prove convergence and optimality estimates for the Stokes problem.

The rest of the paper is organized as follows. In Sect. 2 we present the Stokes problem and its nonconforming linear finite element method, and recall the a posteriori error estimates of [12, 13, 16, 25]. We prove the quasi-orthogonality in Sect. 3 and then show the reduction of some total error in Sect. 4 in terms of a new parameter-dependent estimator. We introduce a new prolongation operator to establish discrete reliability in Sect. 5. Finally, we show the optimality of the adaptive nonconforming linear element method in Sect. 6.

2 The Adaptive Nonconforming Linear Element

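Let us first introduce some notation. We use the standard gradient and divergence operators \(\nabla r:=(\partial r/\partial x, \partial r/\partial y)\) for a scalar function r, and \(\operatorname {div}\boldsymbol {\psi }:={\partial \psi_{1}}/{\partial x}+{\partial \psi_{2}}/{\partial y}\) for a vector function \(\boldsymbol{\psi}=(\psi_{1},\psi_{2})\). Given a polygonal domain \(\varOmega\subset\mathbb{R}^{2}\) with the boundary \(\partial\varOmega\), we use the standard notation for Sobolev spaces, such as \(H^{1}(\varOmega)\) and \(L^{2}(\varOmega)\). We define

$$ L_0^2(\varOmega):=\biggl\{q\in L^2(\varOmega): \int_{\varOmega} q\,dx=0\biggr\}. $$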

In addition, we denote by \((\cdot, \cdot)_{L^{2}(\varOmega )}\) the usual \(L^{2}\) inner product of functions in the space \(L^{2}(\varOmega)\), and by \(\|\cdot\|_{L^{2}(\varOmega )}\) the \(L^{2}\) norm.

Suppose that \(\overline{\varOmega}\) is covered exactly by a sequence of shape-regular triangulations \(\mathcal {T}_{k}\) (k≥0) consisting of triangles in 2D (see [11, 22]), and that this sequence is produced by some adaptive algorithm where \(\mathcal {T}_{k}\) is some nested refinement of \(\mathcal {T}_{k-1}\) by the newest vertex bisection [40, 41]. Let \(\mathcal {E}_{k}\) be the set of all edges in \(\mathcal {T}_{k}\); \(\mathcal {E}_{k}(\varOmega )\) the set of interior edges; \(\mathcal {E}(K)\) the set of edges of any given element K in \(\mathcal {T}_{k}\); and \(h_{K}=|K|^{1/2}\) the size of the element \(K\in \mathcal {T}_{k}\), where |K| is the area of element K. We denote by \(\omega_{K}\) the union of elements \(K'\in \mathcal {T}_{k}\) that share an edge with K, and by \(\omega_{E}\) the union of elements that share a common edge E. Given any edge \(E\in \mathcal {E}_{k}(\varOmega )\) with the length \(h_{E}\), we assign one fixed unit normal \(\nu_{E}:=(\nu_{1}, \nu_{2})\) and tangential vector \(\tau_{E}:=(-\nu_{2}, \nu_{1})\). For E on the boundary, we choose \(\nu_{E}:=\nu\), the unit outward normal to Ω. Once \(\nu_{E}\) and \(\tau_{E}\) are fixed on E, in relation to \(\nu_{E}\) we define the elements \(K_{-}\in \mathcal {T}_{k}\) and \(K_{+}\in \mathcal {T}_{k}\), with \(E=K_{+}\cap K_{-}\). Given \(E\in \mathcal {E}_{k}(\varOmega)\) and some \(\mathbb{R}^{d}\)-valued function v defined in Ω, with d=1,2, we denote by \([v]:=(v|_{K_{+}})|_{E}-(v|_{K_{-}})|_{E}\) the jump of v across E, where \(v|_{K}\) is the restriction of v to K and \(v|_{E}\) is the restriction of v to E.
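The geometric quantities just introduced are straightforward to compute. The following Python sketch is an illustration only and not part of the analysis; the function names `element_size` and `edge_frame` are hypothetical, and the orientation of the normal is one arbitrary but fixed choice, as permitted above.

```python
import math

def element_size(verts):
    """h_K = |K|^{1/2} for a triangle with vertices [(x0, y0), (x1, y1), (x2, y2)]."""
    (x0, y0), (x1, y1), (x2, y2) = verts
    area = 0.5 * abs((x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0))
    return math.sqrt(area)

def edge_frame(p, q):
    """Length h_E, a unit normal nu_E = (nu1, nu2), and tau_E = (-nu2, nu1) for the edge pq."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    h_e = math.hypot(dx, dy)
    nu = (dy / h_e, -dx / h_e)   # one of the two possible unit normals
    tau = (-nu[1], nu[0])        # tangent obtained by rotating nu, as in the text
    return h_e, nu, tau
```

With this choice of sign, the tangent \(\tau_{E}\) points from p to q.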

2.1 The Stokes Problem and Its Nonconforming Linear Element

The Stokes problem is defined as follows: Given \(g\in L^{2}(\varOmega)^{2}\), find \((u, p)\in V\times Q:=(H_{0}^{1}(\varOmega ))^{2}\times L_{0}^{2}(\varOmega )\) such that

$$ a(u,v)+b(v,p)+b(u,q)=(g,v)_{L^2(\varOmega )}\quad \text{for any}\ (v,q)\in V\times Q, $$
(2.1)

where u and p are the velocity and pressure of the flow, respectively, and

$$ a(u,v):=\mu(\nabla u,\nabla v)_{L^2(\varOmega )} \quad \text{and}\quad b(v,q):=(\operatorname {div}v, q)_{L^2(\varOmega )}, $$
(2.2)

where μ>0 is the viscosity coefficient of the flow.

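Given \(\omega\subset\mathbb{R}^{2}\) and some integer \(\ell\), denote by \(P_{\ell}(\omega)\) the space of polynomials of degree \(\leq\ell\) over ω. We define

$$ V_k:=\biggl\{v\in L^2(\varOmega)^2: v|_K\in P_1(K)^2 \text{ for any }K\in \mathcal {T}_k,\ \int_E[v]\,ds=0 \text{ for any }E\in \mathcal {E}_k(\varOmega),\ \int_E v\,ds=0 \text{ for any }E\in \mathcal {E}_k\backslash \mathcal {E}_k(\varOmega)\biggr\}, $$

$$ Q_k:=\bigl\{q\in L_0^2(\varOmega): q|_K\in P_0(K) \text{ for any }K\in \mathcal {T}_k\bigr\}. $$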

Since \(V_{k}\) is not a subspace of \(H^{1}(\varOmega)^{2}\), the gradient and divergence operators are defined element by element with respect to \(\mathcal {T}_{k}\), and denoted by \(\nabla_{k}\) and \(\operatorname {div}_{k}\). Define the piecewise smooth space

$$ H^1(\mathcal {T}_k):=\bigl\{v\in L^2(\varOmega ), v|_K\in H^1(K) \text{ for any }K\in \mathcal {T}_k \bigr\}. $$
(2.3)

The discrete bilinear forms read

$$ a_k(u,v):=\mu(\nabla _ku,\nabla _kv)_{L^2(\varOmega )} \quad \text{and}\quad b_k(v, q):=(\operatorname {div}_kv, q)_{L^2(\varOmega )} $$
(2.4)

for any \(u, v\in(H^{1}(\mathcal {T}_{k}))^{2}, \text{ and } q\in Q\).

The nonconforming \(P_{1}\) element, proposed in [23], for the Stokes problem is as follows: Given \(g\in L^{2}(\varOmega)^{2}\), find \((u_{k},p_{k})\in V_{k}\times Q_{k}\) such that

$$ a_k(u_k,v)+b_k(v,p_k)+b_k(u_k,q)=(g,v)_{L^2(\varOmega )} \quad \text{for any }(v,q) \in V_k\times Q_k. $$
(2.5)

Let \(\operatorname {id}\in \mathbb {R}^{2\times 2}\) be the identity matrix. Define

$$\sigma_{k}:=\mu \nabla _ku_k+p_k \operatorname {id}. $$

Then, we have

$$ (\sigma_k, \nabla _kv_k)_{L^2(\varOmega )}=(g, v_k)_{L^2(\varOmega )} \quad \text{for any }v_k\in V_k. $$
(2.6)
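For the reader's convenience, the identity (2.6) can be verified in one line; the following expansion is a sketch added for illustration, under the convention that the inner product of two tensors is the Frobenius product, so that \(\operatorname {id}:\nabla _kv_k=\operatorname {div}_kv_k\) on each element. Taking q=0 in (2.5) gives

$$ (\sigma_k, \nabla _kv_k)_{L^2(\varOmega )}=\mu(\nabla _ku_k,\nabla _kv_k)_{L^2(\varOmega )}+(p_k, \operatorname {div}_kv_k)_{L^2(\varOmega )}=a_k(u_k,v_k)+b_k(v_k,p_k)=(g,v_k)_{L^2(\varOmega )}. $$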

2.2 The a Posteriori Error Estimate

To recall the a posteriori error estimator of the nonconforming \(P_{1}\) element, we define the residual \(\operatorname {R}_{k-1}(\cdot)\) by

$$ \operatorname {R}_{k-1}(v):=(g,v)_{L^2(\varOmega )}-a_k(u_{k-1},v)-b_k(v,p_{k-1}) \quad \text{for any }v\in H^1(\mathcal {T}_k)^2, $$
(2.7)

with the solution \((u_{k-1},p_{k-1})\) of (2.5) on the mesh \(\mathcal {T}_{k-1}\), which is a coarser mesh nested in \(\mathcal {T}_{k}\). It follows from the definition of \((u_{k-1},p_{k-1})\) that

$$\operatorname {R}_{k-1}(v_{k-1})=0 \quad \text{for any }v_{k-1}\in V_{k-1}. $$

Given \(K\in \mathcal {T}_{k}\), we define the element estimator

$$ \eta_{K}(u_k, p_k):=h_K\|g \|_{L^2(K)}+\biggl(\sum _{E\subset \partial K}h_K \bigl\|[\nabla _ku_k \tau_E]\bigr\|_{L^2(E)}^2 \biggr)^{1/2}. $$
(2.8)

Given \(S_{k}\subset \mathcal {T}_{k}\), we define the estimator over it by

$$ \eta^2(u_k, p_k, S_k):= \sum _{K\in S_k} \eta_{K}^2(u_k, p_k). $$
(2.9)

Given any \(K\in \mathcal {T}_{k}\), denote by \(g_{K}\) the \(L^{2}\) projection of g onto \(P_{0}(K)\). We define the oscillation

$$ \operatorname {osc}^2(g,\mathcal {T}_k):=\sum _{K\in \mathcal {T}_k}h_K^2\|g-g_K \|_{L^2(K)}^2. $$
(2.10)

The reliability and efficiency of the estimator \(\eta(u_{k}, p_{k}, \mathcal {T}_{k})\) can be found in [12, 13, 16, 25], as stated in the following lemma.

Lemma 2.1

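Let \((u,p)\) and \((u_{k},p_{k})\) be the solutions of the Stokes problem (2.1) and the discrete problem (2.5), respectively. Then,

$$ \bigl\|\nabla _k(u-u_k)\bigr\|_{L^2(\varOmega )}+\|p-p_k\|_{L^2(\varOmega )}\lesssim \eta(u_k, p_k, \mathcal {T}_k), $$
(2.11)

$$ \eta(u_k, p_k, \mathcal {T}_k)\lesssim \bigl\|\nabla _k(u-u_k)\bigr\|_{L^2(\varOmega )}+\|p-p_k\|_{L^2(\varOmega )}+\operatorname {osc}(g,\mathcal {T}_k). $$
(2.12)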

Remark 2.2

For the Stokes problem, the estimator usually involves the pressure approximation. For the nonconforming \(P_{1}\) element, as shown in the above lemma, we can decouple the pressure from the velocity [25].

Here and throughout the paper, we use the notations ≲ and ≊. When we write

$$A_1 \lesssim B_1, \quad \text{and}\quad A_2\approxeq B_2, $$

we mean that there exist positive constants \(C_{1}\), \(c_{2}\), and \(C_{2}\), independent of the mesh size, such that

$$A_1 \leq C_1 B_1, \quad \text{and}\quad c_2B_2\leq A_2\leq C_2 B_2. $$

2.3 The Adaptive Nonconforming Finite Element Method

The adaptive algorithm is defined as follows: Let \(\mathcal{T}_{0}\) be an initial shape-regular triangulation, \(g\in L^{2}(\varOmega)^{2}\) a right-hand side, ϵ a tolerance, and 0<θ<1 a marking parameter.

Algorithm 2.1

\([\mathcal{T}_{N}, u_{N}, p_{N}]= \textbf{ANFEM}(\mathcal{T}_{0}, g, \epsilon, \theta)\)

η=ϵ, k=0

WHILE η≥ϵ, DO

  (1) Solve (2.5) on \(\mathcal{T}_{k}\) to get the solution \((u_{k},p_{k})\).

  (2) Compute the error estimator \(\eta=\eta(u_{k}, p_{k}, \mathcal {T}_{k})\).

  (3) Mark the minimal element set \(\mathcal{M}_{k}\) such that

    $$ \eta^2(u_{k}, p_{k}, \mathcal{M}_{k})\geq\theta\, \eta^2(u_{k}, p_{k}, \mathcal {T}_{k}). $$
    (2.13)

  (4) Refine each triangle \(K \in\mathcal{M}_{k}\) by the newest vertex bisection to get \(\mathcal {T}_{k+1}\) and set k:=k+1.

END WHILE

\(\mathcal {T}_{N}=\mathcal {T}_{k}\).

END ANFEM
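The marking step (3) can be realized by sorting the squared element indicators and collecting the largest ones until the bulk criterion (2.13) is met. The following Python sketch illustrates this greedy strategy; it is one possible realization under our own assumptions, not the implementation used for this paper, and the function name `mark_minimal_set` as well as the list-based data layout are hypothetical.

```python
def mark_minimal_set(eta_sq, theta):
    """Return indices of a minimal-cardinality set M with
    sum_{K in M} eta_K^2 >= theta * sum_K eta_K^2, cf. (2.13).

    eta_sq: list of squared element indicators eta_K^2, one per element.
    theta:  marking parameter with 0 < theta < 1.
    """
    total = sum(eta_sq)
    # Visit elements in order of decreasing indicator.
    order = sorted(range(len(eta_sq)), key=lambda i: eta_sq[i], reverse=True)
    marked, acc = [], 0.0
    for i in order:
        if acc >= theta * total:
            break
        marked.append(i)
        acc += eta_sq[i]
    return marked
```

Taking the largest indicators first guarantees that no set with fewer elements can satisfy (2.13). The sort costs O(N log N) operations for N elements; binning strategies that avoid the full sort exist but are not sketched here.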

3 Quasi-orthogonality

The quasi-orthogonality property is the main ingredient in the convergence analysis of the adaptive nonconforming method under consideration. In this section we establish such a property by exploiting the conservative property of the nonconforming linear element and the fact that the stress is piecewise constant. To this end, we define a canonical interpolation operator \(\varPi_{k}\) for the nonconforming space \(V_{k}\) and a restriction operator \(I_{k-1}\) from \(V_{k}\) to the coarser space \(V_{k-1}\). Given \(v\in V\), we define the interpolation \(\varPi_{k}v\in V_{k}\) by

$$ \int_E \varPi_kvds:= \int_E vds\quad \text{for any }E\in \mathcal {E}_k. $$
(3.1)

In this paper, the above property is referred to as the conservative property. This property is crucial for the analysis herein. A similar conservative property was first explored in [29] to analyze the quasi-orthogonality property of the Morley element.

The interpolation admits the following estimate:

$$ \|v-\varPi_kv\|_{L^2(K)}\lesssim h_K\|\nabla v\|_{L^2(K)}\quad \text{for any }K\in \mathcal {T}_k \text{ and }v\in V. $$
(3.2)
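A useful consequence of the conservative property (3.1) is that, by the divergence theorem, the gradient of \(\varPi_{k}v\) on each element K equals the mean of ∇v over K. The following Python sketch checks this on a single triangle for a scalar quadratic function; it is an illustration only, and the function name `cr_interpolant_gradient` and the use of Simpson's rule for the edge means are our own choices.

```python
def cr_interpolant_gradient(verts, v):
    """Gradient of the linear function on a triangle whose three edge means
    equal those of v (edge means computed by Simpson's rule, which is exact
    for polynomials of degree <= 3)."""
    (ax, ay), (bx, by), (cx, cy) = verts
    edges = [((ax, ay), (bx, by)), ((bx, by), (cx, cy)), ((cx, cy), (ax, ay))]
    mids, means = [], []
    for (px, py), (qx, qy) in edges:
        mx, my = 0.5 * (px + qx), 0.5 * (py + qy)
        means.append((v(px, py) + 4.0 * v(mx, my) + v(qx, qy)) / 6.0)
        mids.append((mx, my))
    # A linear function attains its edge mean at the edge midpoint, so the
    # gradient solves grad . (m_i - m_0) = mean_i - mean_0 for i = 1, 2.
    d1x, d1y = mids[1][0] - mids[0][0], mids[1][1] - mids[0][1]
    d2x, d2y = mids[2][0] - mids[0][0], mids[2][1] - mids[0][1]
    r1, r2 = means[1] - means[0], means[2] - means[0]
    det = d1x * d2y - d1y * d2x
    # Cramer's rule for the 2x2 system.
    return ((r1 * d2y - r2 * d1y) / det, (d1x * r2 - d2x * r1) / det)
```

For v(x,y)=x²+xy on the triangle with vertices (0,0), (1,0), (0,1), the gradient ∇v=(2x+y, x) is linear, so its mean over K is its value at the centroid (1/3,1/3), namely (1, 1/3); the sketch reproduces this value up to rounding.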

Given \(v_{k}\in V_{k}\), we define the restriction interpolation \(I_{k-1}v_{k}\in V_{k-1}\) by

$$ \int_E I_{k-1}v_k ds:=\sum _{l=1}^{\ell}\int _{E_l} v_k ds,\quad E\in \mathcal {E}_{k-1} \text{ with } E=E_1\cup E_2\cup\cdots\cup E_{\ell}\text{ and }E_i\in \mathcal {E}_k. $$
(3.3)

The properties of the restriction operator \(I_{k-1}\) are summarized in the following lemma.

Lemma 3.1

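Let the restriction operator \(I_{k-1}\) be defined as in (3.3). Then,

$$ I_{k-1}v_k|_K=v_k|_K \quad \text{for any }K\in \mathcal {T}_k\cap \mathcal {T}_{k-1}\text{ and }v_k\in V_k, $$
(3.4)

$$ \|v_k-I_{k-1}v_k\|_{L^2(K)}\lesssim h_K\|\nabla _kv_k\|_{L^2(K)} \quad \text{for any }K\in \mathcal {T}_{k-1}\text{ and }v_k\in V_k. $$
(3.5)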

Proof

The property (3.4) follows directly from the definition of the restriction interpolation, so only the estimate (3.5) needs to be proved. Both sides of (3.5) are semi-norms on the restriction \((V_{k})_{K}\) of \(V_{k}\) to K. If the right-hand side vanishes for some \(v_{k}\in(V_{k})_{K}\), then \(v_{k}\) is a piecewise constant vector over K with respect to \(\mathcal {T}_{k}\). Given the average continuity of \(v_{k}\) across the internal edges of \(\mathcal {T}_{k}\), it follows that \(v_{k}\) is a constant vector on K. Therefore, the left-hand side also vanishes for the same \(v_{k}\). The desired result then follows from a scaling argument. □

Remark 3.2

An alternative proof of the inequality (3.5) follows from the discrete Poincaré inequality established in [10] for scalar functions, which is further investigated in [39]. Notice that the positive constant in (3.5) is independent of the ratio

$$ \gamma:=\max _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_{k}}\max _{\mathcal {T}_{k}\ni T\subset K} \frac{h_K}{h_T}, $$
(3.6)

see [39, Lemma 4.1] for more details.

Lemma 3.3

Let \((u_{k-1},p_{k-1})\) be the solution of the discrete problem (2.5) on the mesh \(\mathcal {T}_{k-1}\). Then it holds that

$$ \bigl|\operatorname {R}_{k-1}(v_k)\bigr|\lesssim \biggl( \sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2 \|g\|_{L^2(K)}^2 \biggr)^{1/2}\|\nabla _kv_k \|_{L^2(\varOmega )}\quad \text{for any }v_k\in V_k. $$
(3.7)

Proof

For the reader’s convenience, we recall the definition of the residual as follows:

$$ \operatorname {R}_{k-1}(v_k)=(g,v_k)_{L^2(\varOmega )}-( \sigma_{k-1},\nabla _kv_k)_{L^2(\varOmega )}. $$
(3.8)

To analyze the right-hand side of the above equation, we set \(v_{k-1}=I_{k-1}v_{k}\). As \(\sigma_{k-1}\) is a piecewise constant tensor with respect to the mesh \(\mathcal {T}_{k-1}\), the definition of the interpolation operator \(I_{k-1}\) in (3.3) leads to

$$ \int_E (v_k-v_{k-1}) \cdot\sigma_{k-1}\nu_Eds=0\quad \text{for any }E\in \mathcal {E}_{k-1}. $$
(3.9)

For any \(E\in \mathcal {E}_{k}\) that lies in the interior of some \(K\in \mathcal {T}_{k-1}\), the integral average of \(v_{k}\) over E is continuous and \(\sigma_{k-1}\) is constant on K. Then,

$$ \int_E [v_k-v_{k-1}] \cdot\sigma_{k-1}\nu_Eds=0. $$
(3.10)

Integrating by parts on the fine mesh \(\mathcal {T}_{k}\) and using (3.9) and (3.10), we get

$$ \bigl(\nabla _k(v_k-v_{k-1}), \sigma_{k-1}\bigr)_{L^2(\varOmega )}=0. $$
(3.11)

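Inserting this identity into (3.8) and using the discrete problem (2.5) on \(\mathcal {T}_{k-1}\), we employ properties (3.4) and (3.5) of the interpolation operator \(I_{k-1}\) to derive

$$ \operatorname {R}_{k-1}(v_k)=(g,v_k-v_{k-1})_{L^2(\varOmega )}=\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}(g,v_k-I_{k-1}v_k)_{L^2(K)}\lesssim \biggl( \sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2 \|g\|_{L^2(K)}^2 \biggr)^{1/2}\|\nabla _kv_k \|_{L^2(\varOmega )}, $$
(3.12)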

which completes the proof. □

Lemma 3.4

(Quasi-orthogonality of the velocity)

Let \((u_{k},p_{k})\) and \((u_{k-1},p_{k-1})\) be the discrete solutions of (2.5) on \(\mathcal {T}_{k}\) and \(\mathcal {T}_{k-1}\), respectively. Then,

$$ \bigl|a_k(u-u_k,u_k-u_{k-1})\bigr| \lesssim\bigl\|\nabla _k(u-u_k)\bigr\|_{L^2(\varOmega )} \biggl(\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2\|g \|_{L^2(K)}^2 \biggr)^{1/2}. $$

Proof

The Stokes problem (2.1) and the discrete problem (2.5) give

(3.13)

Note that \((\operatorname {div}_{k}(u-u_{k}), p_{k}-p_{k-1})_{L^{2}(\varOmega )}=0\). Let \(v_{k}=\varPi_{k}(u-u_{k})\). Since \(\sigma_{k}-\sigma_{k-1}\) is a piecewise constant tensor with respect to the fine mesh \(\mathcal {T}_{k}\), by the definition of the interpolation operator \(\varPi_{k}\) in (3.1), we integrate by parts on \(\mathcal {T}_{k}\) to obtain

$$ \bigl(\nabla _k\bigl((u-u_k)-v_k\bigr), \sigma_k-\sigma_{k-1}\bigr)_{L^2(\varOmega )}=0. $$
(3.14)

From the discrete problem (2.5), we have

$$ a_k(u-u_k,u_k-u_{k-1})=(g,v_k)_{L^2(\varOmega )}-( \nabla _kv_k,\sigma_{k-1})_{L^2(\varOmega )}= \operatorname {R}_{k-1}(v_k). $$
(3.15)

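The term on the right-hand side of (3.15) can be estimated by the inequality (3.7) and the elementwise stability \(\|\nabla _{k}v_{k}\|_{L^{2}(\varOmega )}\leq\|\nabla _{k}(u-u_{k})\|_{L^{2}(\varOmega )}\) of the interpolation operator \(\varPi_{k}\) as follows:

$$ \bigl|\operatorname {R}_{k-1}(v_k)\bigr|\lesssim \biggl( \sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2 \|g\|_{L^2(K)}^2 \biggr)^{1/2}\|\nabla _kv_k \|_{L^2(\varOmega )}\leq \biggl( \sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2 \|g\|_{L^2(K)}^2 \biggr)^{1/2}\bigl\|\nabla _k(u-u_k)\bigr\|_{L^2(\varOmega )}, $$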

which completes the proof. □
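For the reader's convenience, the chain of identities behind (3.15) can be spelled out as follows. This is our own expansion of the argument, written under the convention that the inner product of two tensors is the Frobenius product. Since \(\mu \nabla _k(u_k-u_{k-1})=\sigma_k-\sigma_{k-1}-(p_k-p_{k-1})\operatorname {id}\), the vanishing of the divergence term, the identity (3.14), and the identity (2.6) on \(\mathcal {T}_{k}\) give, with \(v_{k}=\varPi_{k}(u-u_{k})\),

$$ a_k(u-u_k,u_k-u_{k-1})=\bigl(\nabla _k(u-u_k),\sigma_k-\sigma_{k-1}\bigr)_{L^2(\varOmega )}-\bigl(\operatorname {div}_k(u-u_k),p_k-p_{k-1}\bigr)_{L^2(\varOmega )}=(\nabla _kv_k,\sigma_k-\sigma_{k-1})_{L^2(\varOmega )}=(g,v_k)_{L^2(\varOmega )}-(\nabla _kv_k,\sigma_{k-1})_{L^2(\varOmega )}. $$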

Lemma 3.5

(Quasi-orthogonality of the pressure)

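Let \((u_{k},p_{k})\) and \((u_{k-1},p_{k-1})\) be the discrete solutions of (2.5) on \(\mathcal {T}_{k}\) and \(\mathcal {T}_{k-1}\), respectively. Then,

$$ \bigl|(p-p_k, p_k-p_{k-1})_{L^2(\varOmega )}\bigr|\lesssim \biggl( \biggl(\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2\|g\|_{L^2(K)}^2 \biggr)^{1/2}+\bigl\|\nabla _k(u_k-u_{k-1})\bigr\|_{L^2(\varOmega )} \biggr)\|p-p_k\|_{L^2(\varOmega )}. $$
(3.16)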

Remark 3.6

The quasi-orthogonality of the pressure herein differs from those for the nonstandard methods for the Poisson equation [14, 15, 20] in that both \(\|\nabla _{k}(u_{k}-u_{k-1})\|_{L^{2}(\varOmega )}\) and \(\|p-p_{k}\|_{L^{2}(\varOmega )}\) appear on the right-hand side of (3.16).

Proof

Let \(\varPi_{0,k}\) be the \(L^{2}\) projection operator from \(L^{2}_{0}(\varOmega)\) onto \(Q_{k}\). It follows from the discrete inf-sup condition that there exists \(v_{k}\in V_{k}\) with

$$ \operatorname {div}_k v_k=\varPi_{0, k}p-p_k, \quad \text{and}\quad \|\nabla _kv_k\|_{L^2(\varOmega )}\lesssim\| \varPi_{0, k}p-p_k\|_{L^2(\varOmega )}. $$
(3.17)

Since \(p_{k}-p_{k-1}\in Q_{k}\), it follows from the continuous problem (2.1), the discrete problem (2.5), and the definition of the residual (2.7) that

$$ (p-p_k, p_k-p_{k-1})_{L^2(\varOmega )} =(\operatorname {div}_k v_k, p_k-p_{k-1})_{L^2(\varOmega )} =\operatorname {R}_{k-1}(v_k)+a_k(u_{k-1}-u_k,v_k). $$

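We use the estimates in (3.7) and (3.17), together with \(\|\varPi_{0,k}p-p_{k}\|_{L^{2}(\varOmega )}\leq\|p-p_{k}\|_{L^{2}(\varOmega )}\), to get

$$ \bigl|(p-p_k, p_k-p_{k-1})_{L^2(\varOmega )}\bigr|\lesssim \biggl( \biggl(\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2\|g\|_{L^2(K)}^2 \biggr)^{1/2}+\bigl\|\nabla _k(u_k-u_{k-1})\bigr\|_{L^2(\varOmega )} \biggr)\|\nabla _kv_k\|_{L^2(\varOmega )}\lesssim \biggl( \biggl(\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^2\|g\|_{L^2(K)}^2 \biggr)^{1/2}+\bigl\|\nabla _k(u_k-u_{k-1})\bigr\|_{L^2(\varOmega )} \biggr)\|p-p_k\|_{L^2(\varOmega )}, $$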

which completes the proof. □

4 The Convergence of the ANFEM

To prove the convergence of the adaptive algorithm, we first prove the reduction of the error between the two nested meshes, \(\mathcal {T}_{k}\) and \(\mathcal {T}_{k-1}\), where \(\mathcal {T}_{k}\) is the refinement of the coarser mesh \(\mathcal {T}_{k-1}\) with (2.13) by the newest vertex bisection. In order to control the volume part \(\sum_{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_{k}}h_{K}^{2}\|g\|_{L^{2}(K)}^{2}\) appearing in Lemmas 3.4 and 3.5, we introduce the following modified estimator:

$$ \tilde{\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}):=\sum _{K\in \mathcal {T}_{k-1}} \bigl(\beta_1 h_K^2 \|g\|_{L^2(K)}^2+\eta_K^2(u_{k-1}, p_{k-1}) \bigr) $$
(4.1)

with the positive constant \(\beta_{1}>0\) to be determined later. Note that this modified estimator is introduced only for the convergence analysis and that the final convergence and optimal complexity will be proved for Algorithm 2.1.

Note that the volume residual \(\sum_{K\in \mathcal {T}_{k-1}} h_{K}^{2}\|g\|_{L^{2}(K)}^{2}\) does not contain the unknowns. Hence, we add it to compensate for the lack of Galerkin orthogonality or quasi-orthogonality. We stress that Galerkin orthogonality or quasi-orthogonality is an essential ingredient in the convergence analysis of the adaptive conforming, nonconforming, and mixed methods for Poisson-like problems [14, 15, 19, 20, 26, 35, 36]. This is another reason that we need a modified estimator as in (4.1).

We list three standard components for the convergence analysis of the adaptive method, which can be proved by following the arguments, for instance, in [15, 19, 26].

Lemma 4.1

Let \(\mathcal {T}_{k}\) be some refinement of \(\mathcal {T}_{k-1}\) from Algorithm 2.1. Then there exist ρ>0 and a positive constant β∈(1−ρθ,1) such that

$$ {\eta}^2(u_{k-1},p_{k-1}, \mathcal {T}_k) \leq\beta{\eta}^2(u_{k-1},p_{k-1}, \mathcal {T}_{k-1})+(1-\rho\theta-\beta ){\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}). $$
(4.2)

Proof

The result can be proved by following the idea in [15, 19, 26]. The details are only given for the readers’ convenience. In fact, we have

$$ {\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_k)={ \eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}\cap \mathcal {T}_k)+{\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_k\backslash \mathcal {T}_{k-1}). $$
(4.3)

For any \(K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_{k}\), we only need to consider the case where K is subdivided into \(K_{1},K_{2}\in \mathcal {T}_{k}\) with \(|K_{1}|=|K_{2}|=\frac{1}{2}|K|\). As \([\nabla_{k-1}u_{k-1}\tau_{E}]=0\) over the interior edge \(E=K_{1}\cap K_{2}\in \mathcal {E}_{k}\), we have

(4.4)

Consequently,

(4.5)

With \(\rho:=1-\frac{1}{2^{1/2}}\), we therefore obtain

$$ {\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_k) \leq{\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1})-\rho{\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}\backslash \mathcal {T}_k). $$
(4.6)

Choosing the positive parameter β with 1−ρθ<β<1, we combine the above inequality and the bulk criterion (2.13) to achieve the desired result. □

Lemma 4.2

Let \(\mathcal {T}_{k}\) be some refinement of \(\mathcal {T}_{k-1}\) produced in Algorithm 2.1. Then there exists ρ>0 such that

$$ \sum _{K\in \mathcal {T}_k}h_K^{2} \|g\|_{L^2(K)}^2\leq \sum _{K\in \mathcal {T}_{k-1}}h_K^{2}\|g\|_{L^2(K)}^2- \rho\sum _{K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_k}h_K^{2} \|g\|_{L^2(K)}^2. $$
(4.7)

Proof

This can be proved by an argument similar to that used in the previous lemma. □
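To make the argument concrete, consider, as a sketch added for illustration, an element \(K\in \mathcal {T}_{k-1}\backslash \mathcal {T}_{k}\) that is bisected once into \(K_{1}\) and \(K_{2}\). Since \(h_{K}^{2}=|K|\) and \(|K_{1}|=|K_{2}|=\frac{1}{2}|K|\), we have

$$ \sum _{i=1}^{2}h_{K_i}^{2}\|g\|_{L^2(K_i)}^2=\frac{1}{2}h_K^{2}\bigl(\|g\|_{L^2(K_1)}^2+\|g\|_{L^2(K_2)}^2\bigr)=\frac{1}{2}h_K^{2}\|g\|_{L^2(K)}^2. $$

Elements that are bisected more than once only decrease the left-hand side further, because then \(h_{T}^{2}\leq\frac{1}{2}h_{K}^{2}\) for every \(T\in \mathcal {T}_{k}\) with T⊂K. Summing over all elements shows that (4.7) holds, for instance, with ρ=1/2; any smaller positive value, such as the one used in Lemma 4.1, is also admissible.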

Lemma 4.3

(Continuity of the estimator)

Let \(u_{k}\) and \(u_{k-1}\) be the solutions to the discrete problem (2.5) on the meshes \(\mathcal {T}_{k}\) and \(\mathcal {T}_{k-1}\) obtained from Algorithm 2.1. Given any positive constant ϵ, there exists a positive constant \(\beta_{2}(\epsilon)\) dependent on ϵ such that

$$ {\eta}^2(u_k, p_k, \mathcal {T}_k)\leq(1+\epsilon) {\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_k) +\frac{1}{\beta_2(\epsilon)}\bigl\|\nabla _k(u_k-u_{k-1}) \bigr\|_{L^2(\varOmega )}^2. $$
(4.8)

Proof

Given any \(K\in \mathcal {T}_{k}\), it follows from the definitions of \(\eta_{K}(u_{k},p_{k})\) and \(\eta_{K}(u_{k-1},p_{k-1})\) in (4.4) that

Given \(E\in \mathcal {E}_{k}\), let \(K_{1}, K_{2}\in \mathcal {T}_{k}\) be the two elements that take E as one edge. Then, we use the trace theorem and the fact that \(\nabla_{k}(u_{k}-u_{k-1})\) is a piecewise constant tensor to get

(4.9)

which gives

$$ {\eta}_K(u_k, p_k)\leq{ \eta}_K(u_{k-1}, p_{k-1})+ C_{Con} \bigl\| \nabla_k (u_k-u_{k-1})\bigr\|_{L^2(\omega_K)}, $$
(4.10)

for some positive constant \(C_{Con}\). Given any positive constant ϵ, we apply the Young inequality to get

$$ {\eta}^2_K(u_k, p_k)\leq(1+ \epsilon) {\eta}_K^2(u_{k-1}, p_{k-1})+ \frac{C_{Con}^2(1+\epsilon)}{\epsilon} \bigl\|\nabla_k (u_k-u_{k-1}) \bigr\|_{L^2(\omega_K)}^2. $$
(4.11)

A summation over all elements in \(\mathcal {T}_{k}\) completes the proof with \(\beta_{2}(\epsilon)=\frac{M\epsilon}{C_{Con}^{2}(1+\epsilon)}\), where the positive constant M depends on the finite overlapping of the patches ω K . □

In the following theorem, we prove the convergence of the adaptive nonconforming finite element method for the Stokes problem. The main ingredients are the quasi-orthogonality of both the velocity and the pressure in Lemmas 3.4 and 3.5, and the relations between the estimators on the two meshes \(\mathcal {T}_{k}\) and \(\mathcal {T}_{k-1}\) presented in Lemmas 4.1–4.3.

Theorem 4.4

Let \((u,p)\) and \((u_{k},p_{k})\) be the solutions of (2.1) and (2.5). Then there exist \(\gamma_{1},\gamma_{2},\beta_{1}>0\) and 0<α<1 such that

(4.12)

Proof

First, we use the quasi-orthogonality of both the velocity and the pressure. Denote the multiplicative constant in Lemma 3.4 by \(C_{QOV}\). As

(4.13)

it follows from the quasi-orthogonality of the velocity in Lemma 3.4 and the Young inequality that

(4.14)

where \(C_{1}(\delta_{1})=\frac{C_{QOV}^{2}}{\delta_{1}}\) for any positive constant \(0<\delta_{1}<1\). Denote the multiplicative constant in Lemma 3.5 by \(C_{QOP}\). From the quasi-orthogonality of the pressure proved in Lemma 3.5 and the Young inequality, we have

(4.15)

where \(\beta_{3}(\delta_{3})=\frac{\delta_{3}}{C_{QOP}^{2}}\) and \(C_{2}(\delta_{2})=\frac{C_{QOP}^{2}}{\delta_{2}}\) for any constants \(0<\delta_{2},\delta_{3}<1\). Then we multiply the inequality (4.14) by \(\gamma_{1}>0\) and the inequality (4.15) by \(\gamma_{2}>0\) to obtain

(4.16)

To simplify the presentation, we introduce some shorthand notation for any positive constants \(\gamma_{3},\gamma_{4}>0\):

(4.17)

Second, we use the continuity of the estimators from Lemmas 4.1–4.3 to cancel both the term \(\|\nabla _{k}(u_{k}-u_{k-1})\|_{L^{2}(\varOmega )}\) and the volume estimator. In fact, from (4.2) and (4.8), we have

(4.18)

Then we combine the above inequality with the inequalities (4.16) and (4.7) to obtain

It remains to prove that positive constants \(\delta_{i}\), i=1,2,3, \(\gamma_{i}\), i=1,2,3,4, ϵ, β, and \(\beta_{1}\) exist such that the contraction (4.12) holds for some constant 0<α<1. Moreover, the constant α may depend on the choices of the aforementioned parameters but is independent of the mesh size h and the level k. This will be achieved in the following three steps.

Step 1

For the second, fourth, and fifth terms on the right-hand side of the above inequality to vanish, we set

(4.19)

Note that \(\gamma_{2}\), \(\gamma_{4}\), and β will be determined after \(\delta_{i}\), i=1,2,3, \(\gamma_{1}\), \(\gamma_{3}\), and ϵ have been specified. In the following, we assume that ϵ is fixed in such a way that 0<β<1. Also, we let \(\gamma_{1}\) and \(\gamma_{3}\) be fixed such that \(\gamma_{1}>\frac{\gamma_{3}}{\beta_{2}(\epsilon)}\) and \(\gamma_{2}>0\). Hence, we have

$$ \mathfrak{G}_k(u_k,p_k) \leq\overline{ \mathfrak{G}}_{k-1}(u_{k-1}, p_{k-1}). $$

Let the positive constant α with β<α<1 be determined later. We define

Then we perform the decomposition \(\overline{\mathfrak {G}}_{k-1}(u_{k-1}, p_{k-1})=\alpha \mathfrak {G}_{k-1}(u_{k-1},p_{k-1})+ \mathfrak{R}_{k-1}(u_{k-1}, p_{k-1})\) to get

$$ \mathfrak{G}_k(u_k,p_k) \leq \alpha \mathfrak{G}_{k-1}(u_{k-1},p_{k-1})+ \mathfrak{R}_{k-1}(u_{k-1}, p_{k-1}). $$

Step 2

Now we only need to show that it is possible to choose α<1 such that \(\mathfrak{R}_{k-1}(u_{k-1},p_{k-1})\leq 0\). This can be achieved by selecting the parameters \(\delta_{i}\), i=1,2,3. To this end, we recall the reliability of \(\eta(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1})\) in Lemma 2.1 with the multiplicative constant \(C_{Rel}\):

$$ \bigl\|\nabla _{k-1}(u-u_{k-1})\bigr\|_{L^2(\varOmega )}^2+ \|p-p_{k-1}\|_{L^2(\varOmega )}^2\leq C_{Rel}{ \eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}). $$
(4.20)

Further, we take \(\delta_{1}=\delta_{2}+\delta_{3}\) with \(0<\delta_{1}<\min( \frac{\gamma_{3}(1-\beta)}{C_{Rel}(\gamma_{1}+\gamma_{2})},1)\). Then, we take

$$\alpha:=\frac{(\gamma_1+\gamma_2)C_{Rel}+\gamma_3\beta+\gamma_4}{ (1-\delta_1)(\gamma_1+\gamma_2)C_{Rel}+\gamma_3+\gamma_4}. $$

It is straightforward to see that β<α<1. As

$$ \sum _{K\in \mathcal {T}_{k-1}}h_K^{2} \|g\|_{L^2(K)}^2 \leq{\eta}^2(u_{k-1}, p_{k-1}, \mathcal {T}_{k-1}), $$
(4.21)

we obtain

This proves that

$$ \mathfrak{G}_k(u_k,p_k) \leq \alpha \mathfrak{G}_{k-1}(u_{k-1},p_{k-1}). $$

Step 3

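Finally, we take \(\beta_{1}:=\gamma_{4}/\gamma_{3}\) and rescale \(\gamma_{2}:=\gamma_{2}(1-\delta_{2}-\delta_{3})/((1-\delta_{1})\gamma_{1})\), \(\gamma_{3}:=\gamma_{3}/((1-\delta_{1})\gamma_{1})\), which completes the proof. □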

5 The Discrete Reliability

In this section, we prove the discrete reliability. The analysis needs some prolongation operator from \(V_{k}\) to \(V_{k+\ell}\) with some integer \(\ell\geq 1\). Some further notation is needed. Given \(E\in \mathcal {E}_{k+\ell}\), the edge patch \(\omega_{E,k}\) of E with respect to the mesh \(\mathcal {T}_{k}\) is defined as

$$ \omega_{E, k}:=\{K\in \mathcal {T}_{k}, E\subset\partial K \text{ or } E \text{ lies in the interior of }K\}. $$
(5.1)

Let \(\xi_{E}=\operatorname {card}( \omega_{E, k})\). We define the prolongation interpolation \(I_{k+\ell}^{\prime}v_{k}\in V_{k+\ell}\) for any \(v_{k}\in V_{k}\) as

$$ \int_E I_{k+\ell}^{\prime}v_k ds:= \frac{1}{\xi_{E}}\sum _{K\in\omega_{E, k}} \int _E (v_k|_{K}) ds\quad \text{for any } E\in \mathcal {E}_{k+\ell}. $$
(5.2)

For the interpolation operator \(I_{k+\ell}^{\prime}\), we have

$$ I_{k+\ell}^{\prime}v_k|_K=v_k|_K \quad \text{for any }K\in \mathcal {T}_k\cap \mathcal {T}_{k+\ell}\text { and }v_k\in V_{k}. $$
(5.3)

As we will see in Remark 5.3 below, we cannot directly use the prolongation operator \(I^{\prime}_{k+\ell}\) in the analysis of the discrete reliability. An averaging operator is needed. Denote by \(\mathcal{N}_{k}\) the set of internal vertices of the mesh \(\mathcal{T}_{k}\), and by \(S_{k}\subset H_{0}^{1}(\varOmega)\) the conforming linear element space over \(\mathcal{T}_{k}\). Given \(Z\in\mathcal{N}_{k}\), the nodal patch \(\omega_{Z,k}\) is defined by

$$ \omega_{Z, k}:=\{K\in \mathcal {T}_k, Z\in K\}. $$
(5.4)

Denote by \(\phi_{Z}\in S_{k}\) the canonical basis function associated with Z, which satisfies \(\phi_{Z}(Z)=1\) and \(\phi_{Z}(Z^{\prime})=0\) for any vertex Z′ of \(\mathcal {T}_{k}\) other than Z. We define

$$ \mathcal {E}_Z:=\{E\in \mathcal {E}_k, Z\in\mathcal{N}_k \text{ is one end point of }E\}. $$
(5.5)

The idea of [10] leads to the definition of the following averaging operator \(\varPi: V_{k}\rightarrow(S_{k})^{2}\):

$$ \varPi v_k:=\sum _{Z\in\mathcal{N}_k}v_Z \phi_Z \quad \text{for any }v_k\in V_k, $$
(5.6)

where

$$ v_Z=\frac{1}{\xi_Z}\sum _{K\in\omega_{Z, k}}(v_k|_K) (Z) \quad \text{ with } \xi_{Z}=\operatorname {card}( \omega_{Z, k}). $$
(5.7)
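The nodal averaging in (5.7) is simple to realize in practice. The following Python sketch is an illustration only; the function name `nodal_average` and the mesh data layout (triangles as tuples of vertex indices, one tuple of vertex values per triangle) are our own assumptions. It treats one scalar component; for the velocity, it would be applied to each component separately.

```python
from collections import defaultdict

def nodal_average(triangles, elem_vertex_values, interior_vertices):
    """For each interior vertex Z, average the values (v_k|_K)(Z) over all
    elements K that contain Z, cf. (5.7).

    triangles:          list of 3-tuples of vertex indices.
    elem_vertex_values: list of 3-tuples; entry j holds the values of v_k|_K
                        at the three vertices of triangle j, in the same order.
    interior_vertices:  set of interior vertex indices; boundary vertices are
                        skipped, which corresponds to the value 0 in S_k.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for tri, vals in zip(triangles, elem_vertex_values):
        for z, val in zip(tri, vals):
            if z in interior_vertices:
                sums[z] += val
                counts[z] += 1   # counts[z] ends up equal to xi_Z
    return {z: sums[z] / counts[z] for z in sums}
```

For example, on a square split into four triangles around the centre vertex 4, the element-wise values 1, 2, 3, 2 at that vertex average to \(v_{Z}=2\).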

Given any \(K\in \mathcal {T}_{k}\), we have

(5.8)

for any \(v_{k}\in V_{k}\); see [10] for the proof. Define

$$ \varOmega_{\mathcal{R}}:=\operatorname{interior}\Bigl(\bigcup\bigl\{K: K\in \mathcal {T}_k\backslash\mathcal{T}_{k+\ell},\ \partial K\cap \partial (\mathcal {T}_{k}\cap \mathcal {T}_{k+\ell})=\emptyset\bigr\}\Bigr). $$

The main idea herein is to take a mixture of the prolongation operators \(I_{k+\ell}^{\prime}\) and Π. More precisely, we use Π in the region \(\varOmega_{\mathcal {R}}\) where the elements of \(\mathcal {T}_{k}\) are refined and take \(I_{k+\ell }^{\prime}\) in the region \(\mathcal {T}_{k+\ell}\cap \mathcal {T}_{k}\), and we define some mixture in the layers between them. This leads to the prolongation operator \(J_{k+\ell}: V_{k}\rightarrow V_{k+\ell}\) as follows:

$${J}_{k+\ell}v_k:= \begin{cases} \varPi_{k+\ell}\varPi v_k &\text{on }\varOmega_{\mathcal {R}},\\ I_{k+\ell}^{\prime}v_k &\text{on } \mathcal {T}_k\cap \mathcal {T}_{k+\ell },\\ v_{k+\ell, tr}&\text{on }\varOmega\backslash(\varOmega_{\mathcal {R}}\cup (\mathcal {T}_k\cap \mathcal {T}_{k+\ell})), \end{cases} $$

where \(v_{k+\ell, tr}\) is defined as

$$\int_Ev_{k+\ell, tr}ds:= \begin{cases} \int_E\varPi v_kds & \text{if } E\subset \partial \varOmega_{\mathcal{R}}\\ \int_E I_{k+\ell}^{\prime}v_k ds &\text{otherwise } \end{cases} \quad \text{for any } E\in \mathcal {E}_{k+\ell}. $$

Lemma 5.1

For any \(v_{k}\in V_{k}\), it holds that

$$ \bigl\|\nabla_{k+\ell}(J_{k+\ell}v_k-v_k) \bigr\|_{L^2(\varOmega )}^2 \lesssim \sum _{K\in \mathcal {T}_k\backslash \mathcal {T}_{k+\ell}}\ \sum _{E\subset \partial K,\ E\nsubseteq \partial (\mathcal {T}_k\cap \mathcal {T}_{k+\ell})}h_K \bigl\|[\nabla_k v_k\tau_E]\bigr\|_{L^2(E)}^2. $$
(5.9)

Proof

As \(J_{k+\ell}v_{k}=\varPi v_{k}\) on \(\varOmega_{\mathcal{R}}\) and \(J_{k+\ell}v_{k}=v_{k}\) on \(\mathcal {T}_{k}\cap \mathcal {T}_{k+\ell}\), by (5.3) and (5.8) we only need to estimate \(\|\nabla (J_{k+\ell}v_{k}-v_{k})\|_{L^{2}(K)}=\|\nabla (v_{k+\ell, tr}-v_{k})\|_{L^{2}(K)}\) for \(\mathcal {T}_{k+\ell}\ni K\subset \varOmega \backslash(\varOmega_{\mathcal{R}}\cup(\mathcal {T}_{k}\cap \mathcal {T}_{k+\ell}))\). Given \(E\in \mathcal {E}_{k+\ell}\), let \(\varphi_{E}\) be the canonical basis function of the nonconforming \(P_{1}\) element on \(\mathcal {T}_{k+\ell}\), which satisfies \(\int_{E}\varphi_{E}ds=|E|\) and \(\int_{E^{\prime}}\varphi_{E}ds=0\) for any \(E^{\prime}\in \mathcal {E}_{k+\ell}\) other than E. A direct calculation yields

$$ \|\varphi_E\|_{L^2(\varOmega )}+h_E\|\nabla _{k+\ell} \varphi_E\|_{L^2(\varOmega )}\lesssim h_E. $$

Let \(v_{E}^{\prime}:=\int_{E} v_{k+\ell, tr}|_{K}ds\) and \(v_{E}:=\int_{E} v_{k}|_{K}ds\); thus we have

$$ \bigl\|\nabla (v_{k+\ell, tr}-v_k)\bigr\|_{L^2(K)} \lesssim\sum _{E\subset \partial K}\bigl|v_{E}^{\prime}-v_E\bigr|/h_E. $$
(5.10)

Next we bound the terms \(|v_{E}^{\prime}-v_{E}|\) for \(E\in \mathcal {E}_{k+\ell}\).

Case 1

\(E\subset \partial \varOmega_{\mathcal{R}}\). Let \(F\in \mathcal {E}_{k}\) be the mother of the edge E in the sense that \(E\subset F\), and let \(T\in \mathcal {T}_{k}\) be the mother of K in the sense that \(K\subset T\). Denote the vertices of T by \(Z_{i}\), \(i=1,2,3\). Without loss of generality, we assume that \(Z_{1}\) and \(Z_{2}\) are the two endpoints of F. Then, the trace of \(v_{k}|_{T}\) on F can be expressed as

$$ v_k|_F=(v_k|_T) (Z_1)\phi_{Z_1}+(v_k|_T) (Z_2)\phi_{Z_2}. $$
(5.11)

Note that

$$ \varPi v_k|_F=v_{Z_1}\phi_{Z_1}+v_{Z_2} \phi_{Z_2}. $$
(5.12)

We recall that \(v_{Z_{i}}\) are defined in (5.7) and that \(\phi_{Z_{i}}\) are the canonical basis functions associated with the vertices \(Z_{i}\) for the conforming linear element. Therefore

(5.13)

Case 2

\(E\nsubseteq \partial \varOmega_{\mathcal{R}}\). Again, let \(F\in \mathcal {E}_{k}\) be the mother of E in the sense that \(E\subset F\). Then, we simply have

$$ \bigl|v_{E}^{\prime}-v_E\bigr|\lesssim h_F^{3/2}\bigl\|[\nabla_kv_k \tau_F]\bigr\|_{L^2(F)}. $$
(5.14)

By inserting the estimates of \(|v_{E}^{\prime}-v_{E}|\) from (5.13) and (5.14) into (5.10), we complete the proof. □
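The edge-integral property of the nonconforming \(P_{1}\) basis functions used in the proof above can be checked numerically. The following is a minimal sketch and not part of the analysis: it assumes a single triangle, numbers edge i as the edge opposite to vertex i, and uses the standard representation \(\varphi_{E_i}=1-2\lambda_i\) with the barycentric coordinates \(\lambda_i\). All function names are illustrative.

```python
import numpy as np


def barycentric(vertices, point):
    """Barycentric coordinates of `point` in the triangle `vertices` (3x2 array)."""
    # Solve sum_i lam_i * Z_i = point together with sum_i lam_i = 1.
    system = np.vstack([vertices.T, np.ones(3)])
    rhs = np.append(point, 1.0)
    return np.linalg.solve(system, rhs)


def cr_basis(vertices, i, point):
    """Nonconforming P1 basis function attached to the edge opposite vertex i."""
    return 1.0 - 2.0 * barycentric(vertices, point)[i]


def edge_integrals(vertices):
    """Matrix M with M[i, j] = integral over edge j of the basis function of edge i."""
    M = np.zeros((3, 3))
    for j in range(3):
        a, b = vertices[(j + 1) % 3], vertices[(j + 2) % 3]
        length = np.linalg.norm(b - a)
        midpoint = 0.5 * (a + b)
        for i in range(3):
            # The basis function is affine, so the midpoint rule is exact on an edge.
            M[i, j] = length * cr_basis(vertices, i, midpoint)
    return M


# Assumed test triangle; any non-degenerate triangle would do.
triangle = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 1.0]])
M = edge_integrals(triangle)
```

Under these assumptions `M` is diagonal with the edge lengths on the diagonal, which is the discrete counterpart of \(\int_{E}\varphi_{E}ds=|E|\) and \(\int_{E^{\prime}}\varphi_{E}ds=0\).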

We define the ratio γ as follows:

$$ \gamma:=\max _{K\in \mathcal {T}_k\backslash \mathcal {T}_{k+\ell}}\max _{\mathcal {T}_{k+\ell}\ni T\subset K} \frac{h_K}{h_T}. $$
(5.15)

One observation herein is that γ is bounded for those elements \(K\in \mathcal {T}_{k}\) that lie in the layer \(\varOmega\backslash(\varOmega_{\mathcal{R}}\cup(\mathcal {T}_{k}\cap \mathcal {T}_{k+\ell}))\).
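To make the definition (5.15) concrete, the ratio can be evaluated directly once the element sizes and the parent-child relation between the two meshes are known. The sketch below is only an illustration: the dictionary-based mesh description, the function name, and the sample sizes (which assume \(h=|K|^{1/2}\), so that each bisection divides the size by \(\sqrt{2}\)) are hypothetical choices.

```python
def refinement_ratio(coarse_size, fine_size, descendants):
    """Evaluate the ratio gamma from (5.15).

    coarse_size: dict mapping each refined coarse element K to h_K
    fine_size:   dict mapping each fine element T to h_T
    descendants: dict mapping each refined K to the fine elements T inside K
    """
    return max(
        coarse_size[K] / fine_size[T]
        for K, fine_elements in descendants.items()
        for T in fine_elements
    )


# Hypothetical data: one coarse element bisected once, one child bisected again.
# Sizes assume h = |element|^(1/2), so each bisection divides h by sqrt(2).
coarse_size = {"K": 1.0}
fine_size = {"T1": 2 ** -0.5, "T2": 0.5, "T3": 0.5}
descendants = {"K": ["T1", "T2", "T3"]}
gamma = refinement_ratio(coarse_size, fine_size, descendants)
```

In this toy configuration the deepest descendants have undergone two bisections, which gives `gamma == 2.0`; a deeper local refinement inside a coarse element makes the ratio grow accordingly.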

Lemma 5.2

The following discrete reliability holds:

$$ \bigl\|\nabla _{k+\ell}( u_{k+\ell}-u_k) \bigr\|_{L^2(\varOmega )}+\|p_{k+\ell}-p_k\|_{L^2(\varOmega )} \lesssim { \eta}(u_k, p_k, \mathcal {T}_k\backslash \mathcal {T}_{k+\ell}). $$
(5.16)

Remark 5.3

If we directly take the prolongation operator \(I_{k+\ell}^{\prime}\) to analyze this discrete reliability, the constant in the resulting discrete reliability estimate will depend on the ratio γ (see the Appendix for an example).

Proof

For any \(v_{k+\ell}\in V_{k+\ell}\), we have the following decomposition:

(5.17)

We first estimate the first term on the right-hand side of the above equation. It follows from the discrete problem (2.5) that

$$ a_{k+\ell}(u_{k+\ell}-u_k, u_{k+\ell}-v_{k+\ell}) =\operatorname {R}_k(u_{k+\ell}-v_{k+\ell})-b_{k+\ell}(u_{k+\ell}-v_{k+\ell },p_{k+\ell}-p_k) . $$
(5.18)

The first term on the right-hand side of (5.18) can be bounded as in (3.7):

$$ \bigl|\operatorname {R}_k(u_{k+\ell}-v_{k+\ell})\bigr| \lesssim \biggl(\sum _{K\in \mathcal {T}_k\backslash \mathcal {T}_{k+\ell}} h_K^2\|g\|_{L^2(K)}^2 \biggr)^{1/2}\bigl\|\nabla _{k+\ell}(u_{k+\ell}-v_{k+\ell}) \bigr\|_{L^2(\varOmega )}. $$
(5.19)

Now we turn to the second term on the right-hand side of (5.18). Thanks to the discrete inf-sup condition, we use the discrete problem (2.5) to get

(5.20)

An application of the Cauchy–Schwarz inequality leads to

$$ \bigl|b_{k+\ell}(u_{k+\ell}-v_{k+\ell},p_{k+\ell}-p_k)\bigr| \leq\|p_{k+\ell}-p_k\|_{L^2(\varOmega )} \bigl\|\nabla _{k+\ell}(u_{k+\ell}-v_{k+\ell}) \bigr\|_{L^2(\varOmega )}. $$
(5.21)

After inserting (5.18), (5.19), (5.20), and (5.21) into (5.17), we use the triangle and Young inequalities to derive

(5.22)

An application of (5.9) bounds the second term on the right-hand side of (5.22). This completes the proof. □

With \(\gamma_{1}\) from Theorem 4.4, we define the following energy norm:

$$ |\hskip -0.8pt|\hskip -0.8pt|v,q|\hskip -0.8pt|\hskip -0.8pt|^2:=\|\nabla v\|_{L^2(\varOmega )}^2+ \gamma_1\|q\|_{L^2(\varOmega )}^2, \quad \text{for any } (v,q)\in V\times Q. $$
(5.23)

We denote its piecewise version by \(|\hskip -0.8pt|\hskip -0.8pt|\cdot |\hskip -0.8pt|\hskip -0.8pt|_{k+\ell}\).

The following lemma links the error reduction to the bulk criterion.

Lemma 5.4

Let \(\mathcal {T}_{k+\ell}\) be a refinement of \(\mathcal {T}_{k}\) with the following reduction:

(5.24)

with \(0<\alpha^{\prime}<1\) and the positive constant \(\gamma_{2}\) from Theorem 4.4. Then there exists \(0<\theta_{\ast}<1\) with

$$ \theta_{\ast}\eta^2(u_k, p_k,\mathcal {T}_k)\leq\eta^2(u_k,p_k, {\mathcal {T}_k\backslash \mathcal {T}_{k+\ell}}). $$
(5.25)

Proof

It follows from (5.24) and the definitions of the norms \(|\hskip -0.8pt|\hskip -0.8pt|\cdot |\hskip -0.8pt|\hskip -0.8pt|_{k}\) and \(|\hskip -0.8pt|\hskip -0.8pt|\cdot |\hskip -0.8pt|\hskip -0.8pt|_{k+\ell}\) that

The first two terms, \(I_{1}\) and \(I_{2}\), are estimated by the discrete reliability in Lemma 5.2,

(5.26)

where the coefficient \(C_{Drel}\) is from Lemma 5.2. The third term \(I_{3}\) can be estimated by the quasi-orthogonality of the velocity in Lemma 3.4. In fact, let the multiplicative constant therein be \(C_{QOV}\), so that we have

(5.27)

Next, we use the quasi-orthogonality of the pressure in Lemma 3.5 to analyze the fourth term, \(I_{4}\). Denoting the constant of Lemma 3.5 by \(C_{QOP}\), we obtain

Hence it follows from (5.26) that

(5.28)

A direct calculation leads to

$$ \gamma_2\bigl|\operatorname {osc}^2(f,\mathcal {T}_k)- \operatorname {osc}^2(f, \mathcal {T}_{k+\ell})\bigr|\leq\gamma_2 \eta^2(u_k, p_k, \mathcal {T}_k \backslash \mathcal {T}_{k+\ell}), $$
(5.29)

and combining (5.26)–(5.29) and (5.24) with the efficiency of the estimator proves the desired result with the parameter

$$\theta_{\ast}=\frac{(1-\alpha ^{\prime })^2C_{Eff}}{2(2(C_{QOV})^2+2\gamma_1(C_{QOP})^2(1+C_{Drel}^{1/2})^2 +(1-\alpha ^{\prime})(C_{Drel}+\gamma_2))}, $$

with the efficiency constant \(C_{Eff}\) of the estimator \(\eta(u_{k}, p_{k}, \mathcal {T}_{k})\) from Lemma 2.1. □
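The parameter \(\theta_{\ast}\) and the bulk criterion (5.25) can be evaluated mechanically once the constants are known. The following sketch only illustrates the two formulas: the constants \(C_{Eff}\), \(C_{QOV}\), \(C_{QOP}\), \(C_{Drel}\), \(\gamma_1\), \(\gamma_2\) are not computable from the analysis in practice, so the numerical values used here are placeholders, and the function names are hypothetical.

```python
def theta_star(alpha, c_eff, c_qov, c_qop, c_drel, gamma1, gamma2):
    """Parameter theta_* from the end of the proof of Lemma 5.4."""
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha' must lie in (0, 1)")
    t = 1.0 - alpha
    # Part of the denominator that does not depend on alpha'.
    fixed = 2.0 * c_qov**2 + 2.0 * gamma1 * c_qop**2 * (1.0 + c_drel**0.5) ** 2
    return t**2 * c_eff / (2.0 * (fixed + t * (c_drel + gamma2)))


def bulk_criterion_holds(theta, eta_sq, refined):
    """Check theta * eta^2(T_k) <= eta^2(refined elements), as in (5.25).

    eta_sq:  dict mapping each element of T_k to its squared indicator
    refined: elements of T_k that are not in T_{k+l}
    """
    total = sum(eta_sq.values())
    on_refined = sum(eta_sq[K] for K in refined)
    return theta * total <= on_refined


# Placeholder constants (all set to 1) and made-up indicators, for illustration only.
theta = theta_star(0.5, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0)
indicators = {"K1": 4.0, "K2": 1.0, "K3": 1.0}
ok = bulk_criterion_holds(theta, indicators, {"K1"})
```

Since the expression is increasing in \(1-\alpha^{\prime}\), \(\theta_{\ast}\) grows as \(\alpha^{\prime}\) decreases and stays below the value obtained by formally setting \(\alpha^{\prime}=0\).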

6 The Optimality of the ANFEM

In this section, we address the optimality of the adaptive nonconforming linear element method under consideration. We need to control the consistency error \(\kappa (\sigma ,\mathcal {T})\) defined by

$$ \kappa (\sigma,\mathcal {T})=\sup _{v_{\mathcal {T}}\in V_{\mathcal {T}}} \frac{(g, v_{\mathcal {T}})_{L^2(\varOmega )}-(\sigma, \nabla _{\mathcal {T}} v_{\mathcal {T}})_{L^2(\varOmega )}}{\|\nabla _{\mathcal {T}}v_{\mathcal {T}}\|_{L^2(\varOmega )}}\quad\text{with } \sigma=\mu \nabla u+p\operatorname {id}, $$
(6.1)

where \(\mathcal {T}\) is some refinement of the initial mesh \(\mathcal {T}_{0}\) by the newest vertex bisection. The following conforming finite element space is needed:

$$ P_3(\mathcal {T}):=\bigl\{v\in\bigl(H_0^1(\varOmega ) \bigr)^2, v|_K\in\bigl(P_3(K) \bigr)^2,\ \text{for any }K\in \mathcal {T}\bigr\}. $$
(6.2)

Then, there exists an interpolation operator \(\varPi_{\mathcal {T}}: V_{\mathcal {T}}\rightarrow P_{3}(\mathcal {T})\) with the following properties [28, Lemma A.3]:

(6.3)

for any edge E and element K of \(\mathcal {T}\). In addition, we have

$$ \|v_{\mathcal {T}}-\varPi_{\mathcal {T}}v_{\mathcal {T}} \|_{L^2(K)}+h_K\|\nabla \varPi_{\mathcal {T}}v_{\mathcal {T}} \|_{L^2(K)}\lesssim h_K \|\nabla _{\mathcal {T}}v_{\mathcal {T}} \|_{L^2(\omega_K)}. $$
(6.4)

For any \(s_{\mathcal {T}}\in V_{\mathcal {T}}\) and \(q_{\mathcal {T}}\in Q_{\mathcal {T}}\), we define \(\sigma_{\mathcal {T}}=\mu s_{\mathcal {T}}+q_{\mathcal {T}}\). The idea of [27, Lemma 2.1] leads to the following decomposition:

(6.5)

for any \(v_{\mathcal {T}}\in V_{\mathcal {T}}\). By the properties (6.3) and (6.4), we obtain

$$ \kappa (\sigma,\mathcal {T})\lesssim\inf _{(v_{\mathcal {T}},q_{\mathcal {T}})\in V_{\mathcal {T}}\times Q_{\mathcal {T}}}|\hskip -0.8pt|\hskip -0.8pt|u-v_{\mathcal {T}}, p-q_{\mathcal {T}}|\hskip -0.8pt|\hskip -0.8pt|_{\mathcal {T}}+\operatorname {osc}(g, \mathcal {T}). $$
(6.6)

This implies that the nonlinear approximate class used in [30] is equivalent to the standard nonlinear approximate class [7, 19]. Hence, we can introduce the following semi-norm:

$$ \mathfrak{E}^2(N;u,p, g):=\inf _{\mathcal {T}\in\mathbb{T}_N} \Bigl(\inf _{(v_{\mathcal {T}},q_{\mathcal {T}})\in V_{\mathcal {T}}\times Q_{\mathcal {T}}}|\hskip -0.8pt|\hskip -0.8pt|u-v_{\mathcal {T}}, p-q_{\mathcal {T}}|\hskip -0.8pt|\hskip -0.8pt|^2_{\mathcal {T}}+ \gamma_2\operatorname {osc}^2(g, \mathcal {T}) \Bigr). $$
(6.7)

Then the nonlinear approximate class \(\mathbb{A}_{s}\) can be defined by

$$ \mathbb{A}_s:=\Bigl\{(u,p, g), |u,p, g|_s:=\sup _{N>N_0}N^{s} \mathfrak{E}(N;u,p, g)<+\infty\Bigr\}. $$
(6.8)
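The quantity \(|u,p,g|_{s}\) in (6.8) is a supremum over all \(N>N_0\) and therefore cannot be computed from finitely many meshes. As a rough illustration of the definition, the sketch below evaluates the supremum over a finite sample of values \(\mathfrak{E}(N)\), which only yields a lower bound of the semi-norm. The sample values are synthetic (an exact power law), and the function name is hypothetical.

```python
def sampled_seminorm(errors, s, n0=0):
    """Maximum of N**s * E(N) over the sampled N > n0.

    errors: dict mapping N to a value of E(N) (not squared)
    This is a lower bound of |u, p, g|_s, because only finitely many N are sampled.
    """
    values = [N**s * e for N, e in errors.items() if N > n0]
    if not values:
        raise ValueError("no sample with N > n0")
    return max(values)


# Synthetic data following E(N) = 3 * N^(-1/2); not taken from any computation.
samples = {N: 3.0 * N**-0.5 for N in (10, 100, 1000)}
lower_bound = sampled_seminorm(samples, s=0.5)
```

For this synthetic power law the weighted values \(N^{s}\mathfrak{E}(N)\) are constant in N, so the sampled maximum agrees with the prefactor 3 up to rounding.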

We must stress that this is the first time the standard nonlinear approximate class [19] has been used to analyze the adaptive nonconforming finite element method. In the relevant literature, the discrete solution of the discrete problem has been used to define the nonlinear approximate class [5, 6, 34, 39]. Let \((u_{\mathcal {T}}, p_{\mathcal {T}})\) be the approximate solution of (2.5) on the mesh \(\mathcal {T}\). It follows from the Strang Lemma [22]

$$|\hskip -0.8pt|\hskip -0.8pt|u-u_{\mathcal {T}}, p-p_{\mathcal {T}}|\hskip -0.8pt|\hskip -0.8pt|_{\mathcal {T}}\lesssim \inf _{(v_{\mathcal {T}},q_{\mathcal {T}})\in V_{\mathcal {T}}\times Q_{\mathcal {T}}}|\hskip -0.8pt|\hskip -0.8pt|u-v_{\mathcal {T}}, p-q_{\mathcal {T}} |\hskip -0.8pt|\hskip -0.8pt|_{\mathcal {T}}+\kappa (\sigma,\mathcal {T}), $$

and the following fact

$$\inf _{(v_{\mathcal {T}},q_{\mathcal {T}})\in V_{\mathcal {T}}\times Q_{\mathcal {T}}}|\hskip -0.8pt|\hskip -0.8pt|u-v_{\mathcal {T}}, p-q_{\mathcal {T}} |\hskip -0.8pt|\hskip -0.8pt|_{\mathcal {T}}+\kappa (\sigma,\mathcal {T})\lesssim |\hskip -0.8pt|\hskip -0.8pt|u-u_{\mathcal {T}}, p-p_{\mathcal {T}}|\hskip -0.8pt|\hskip -0.8pt|_{\mathcal {T}}, $$

that the nonlinear approximate class of [5] is equivalent to \(\mathbb{A}_{s}\) of (6.8). A similar method herein proves that the nonlinear approximate class of [6, 34, 39] is equivalent to the standard nonlinear approximate class [19].

Remark 6.1

After we submitted the revised version to the journal, we learned that a different argument in [18] shows that the nonlinear approximate class of [6, 34, 39] is equivalent to the standard nonlinear approximate class [19].

Thanks to (6.6), we have

(6.9)

A straightforward investigation shows that if \(\mathcal {T}_{k}\) is any refinement of \(\mathcal {T}_{k-1}\), then it holds that

(6.10)

With these preparations, and following [29], we obtain the following optimality result:

Theorem 6.2

Let \((u,p)\) be the solution of Problem (2.1), and let \((\mathcal {T}_{k}, V_{k}\times Q_{k}, (u_{k},p_{k}))\) be the sequence of meshes, finite element spaces, and discrete solutions produced by the adaptive finite element method. If \(( u, p, g)\in\mathbb{A}_{s}\) and

$$\theta\leq\frac{C_{Eff}}{2(2(C_{QOV})^2+2\gamma_1(C_{QOP})^2(1+C_{Drel}^{1/2})^2 +C_{Drel}+\gamma_2)}, $$

then it holds that

$$ |\hskip -0.8pt|\hskip -0.8pt|u-u_N, p-p_N|\hskip -0.8pt|\hskip -0.8pt|_N^2+ \gamma_2\operatorname {osc}^2(g, \mathcal {T}_N)\lesssim| u, p, g|^2_s(\#\mathcal {T}_N-\#\mathcal {T}_0)^{-2s}. $$
(6.11)
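In numerical experiments, an estimate such as (6.11) is usually compared with the observed decay of the total error against \(\#\mathcal {T}_N-\#\mathcal {T}_0\) in a log-log plot. The following sketch shows one way to extract the observed slope by a least-squares fit. It is an assumption-laden illustration: the function name is hypothetical, and the data are synthetic values generated from an exact power law with \(s=1/2\), not results of any computation.

```python
import numpy as np


def observed_slope(num_elements, total_error_sq, num_initial):
    """Least-squares slope of log(total error squared) against log(#T_N - #T_0)."""
    added = np.asarray(num_elements, dtype=float) - num_initial
    errors = np.asarray(total_error_sq, dtype=float)
    slope, _intercept = np.polyfit(np.log(added), np.log(errors), 1)
    return slope


# Synthetic data: total error squared equal to 2 * (#T_N - #T_0)^(-2s) with s = 1/2.
num_initial = 8
num_elements = [16, 40, 136, 520]
total_error_sq = [2.0 / (n - num_initial) for n in num_elements]
slope = observed_slope(num_elements, total_error_sq, num_initial)
```

For these synthetic data the fitted slope is \(-2s=-1\). Since (6.11) is an upper bound, measured data are consistent with it whenever the observed slope is at most \(-2s\) asymptotically.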