1 Introduction

Let \(\varOmega \) be a bounded domain in \({\mathbb {R}}^d, d=2,3\), with a Lipschitz boundary \(\partial \varOmega \). Let \({\mathbb {M}}, {\mathbb {S}}, {\mathbb {K}}\) be the sets of all, symmetric, and skew-symmetric \(d \times d\) matrices, respectively, and let \({\mathbb {V}}\) be the set of (column) \({\mathbb {R}}^d\)-vectors. In (static) linear elasticity problems, we seek mechanical quantities of a linearly elastic body occupying \(\varOmega \) for a given body force and a given boundary condition. The problem can be written as a system of equations for two unknowns \(\varvec{\sigma }\) and \(u\), the stress tensor and the displacement field, such that

$$\begin{aligned} A \varvec{\sigma } = \epsilon (u), \quad -{\text {div}}\, \varvec{\sigma } = f \qquad \text {in } \varOmega . \end{aligned}$$
(1)

Here \(\varvec{\sigma }\) and \(u\) are \({\mathbb {S}}\)-valued and \({\mathbb {V}}\)-valued functions on \(\varOmega \), respectively, \(A\) is the compliance tensor determined by material parameters of the elastic medium, \(\epsilon (u)\) is the symmetric part of the gradient of \(u\), \(f: \varOmega \rightarrow {{\mathbb {V}}}\) is a given body force, and \({\text {div}}\) is the row-wise divergence. If the elastic body is clamped on the boundary of \(\varOmega \), then \(u=0\) on \(\partial \varOmega \) is imposed as a boundary condition.

For the compliance tensor \(A\), we assume that \(A(x)\) for each \(x \in \varOmega \) is a linear symmetric positive definite map from \({\mathbb S}\) to \({\mathbb S}\), and there exist uniform upper and lower bounds of \(A(x)\) which are independent of \(x\). For \(\varvec{\tau } : \varOmega \rightarrow {\mathbb {S}}, A \varvec{\tau }\) naturally gives an \({\mathbb {S}}\)-valued function on \(\varOmega \). In a homogeneous isotropic elastic medium, \(A\) has the form

$$\begin{aligned} A \varvec{\tau } = \frac{1}{2\mu } \left( \varvec{\tau } - \frac{\lambda }{2\mu + d\lambda } {\text {tr }}(\varvec{\tau }) \varvec{I} \right) , \end{aligned}$$
(2)

in which \(\mu , \lambda \) are positive constants, called the Lamé parameters, \({\text {tr }}(\varvec{\tau })\) is the trace of function \(\varvec{\tau } :\varOmega \rightarrow {\mathbb S}\), and \(\varvec{I}\) is the \(d \times d\) identity matrix. We refer to [20] for more details on properties of compliance tensors.
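As a quick sanity check of (2), the following sketch applies \(A\) to a stress produced by Hooke's law \(\varvec{\sigma } = 2\mu \varvec{\epsilon } + \lambda {\text {tr }}(\varvec{\epsilon }) \varvec{I}\) and recovers the strain. The stiffness form of Hooke's law is an assumption here (it is the standard inverse of (2) but is not written out in the text), and the helper names `compliance` and `stiffness` are illustrative only.

```python
from fractions import Fraction

def trace(m):
    return sum(m[i][i] for i in range(len(m)))

def compliance(tau, mu, lam):
    """A tau = (tau - lam / (2 mu + d lam) * tr(tau) * I) / (2 mu), as in (2)."""
    d = len(tau)
    shift = lam / (2 * mu + d * lam) * trace(tau)
    return [[(tau[i][j] - (shift if i == j else 0)) / (2 * mu)
             for j in range(d)] for i in range(d)]

def stiffness(eps, mu, lam):
    """Assumed Hooke's law: sigma = 2 mu eps + lam tr(eps) I."""
    d = len(eps)
    t = trace(eps)
    return [[2 * mu * eps[i][j] + (lam * t if i == j else 0)
             for j in range(d)] for i in range(d)]

mu, lam = Fraction(3), Fraction(7)
eps = [[Fraction(1), Fraction(2)], [Fraction(2), Fraction(-5)]]

# A inverts Hooke's law exactly (rational arithmetic, no rounding).
sigma = stiffness(eps, mu, lam)
recovered = compliance(sigma, mu, lam)

# On the identity, A acts as multiplication by 1 / (2 mu + d lam).
identity = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(1)]]
a_identity = compliance(identity, mu, lam)
```

The second check shows that \(A \varvec{I} = \varvec{I}/(2\mu + d\lambda )\), so \(A\) degenerates on multiples of the identity as \(\lambda \) grows, which is the nearly incompressible regime.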

A natural finite element approach for (1) is the mixed method [12]. However, this mixed method turns out to be highly nontrivial due to the symmetry constraint on the stress tensor, which originates from the law of conservation of angular momentum [20]. There are known mixed finite elements for the problem with symmetric stress tensor, but those elements usually need sophisticated high-order polynomials for the shape functions of the stress tensor [2, 5, 7, 22, 23]. An alternative, the weak symmetry approach, is to impose the symmetry constraint weakly [36]. Various mixed finite elements have been developed based on it [1, 4, 6, 10, 14, 15, 17, 19, 25, 26, 28, 32, 33], but the numerical stress tensor in these methods is only weakly symmetric. Mixed discontinuous Galerkin (DG) methods have also been considered, but there seem to be only a few such results [15, 30].

In this paper, for the linear elasticity equation (1), we develop a staggered discontinuous Galerkin (SDG) method, a type of method which has been successfully developed and analyzed for the acoustic wave equation and the Stokes equations [13, 24]. In SDG methods, the flux condition across the inter-element boundary is naturally obtained from the staggered continuity property of the finite element functions, whereas an artificial numerical flux needs to be introduced in standard DG methods and other mixed DG methods [15, 30].

In order to deal with the symmetry constraint on the stress tensor, we establish our SDG method using a variational formulation with the weak symmetry approach [36]. Nevertheless, the numerical stress tensor obtained by our SDG method is strongly symmetric. Polynomials of the same degree are used for all the finite element spaces, and optimal-order \(L^2\)-error estimates are proved for the given polynomial degree. In contrast, most other mixed and DG methods that use the variational formulation with weakly symmetric stress give only a weakly symmetric numerical stress tensor and require more sophisticated shape functions to achieve stability and optimal-order \(L^2\)-error estimates. We also prove that our SDG method does not suffer from volumetric locking [8]. More precisely, for \(A\) of the form in (2), we show that the \(L^2\)-errors of all the unknowns in our method do not grow unboundedly as \(\lambda \) becomes arbitrarily large. We also point out that SDG methods are relevant for time-dependent problems [13], so it is reasonable to expect applications of our method to time-dependent elasticity problems.

We remark that another SDG method with symmetric stress tensor can also be developed by imposing the symmetry constraint a priori on all shape functions of the stress tensor. This is a reduced method in the sense that only two unknowns, the stress tensor and the displacement, are used, and the total number of degrees of freedom (DOFs) is smaller than that of our SDG method. However, there is a price to pay. In this reduced method, symmetric functions with local \(H({\text {div}})\)-continuity are used as shape functions for the stress tensor. For the error analysis, these symmetric shape functions require the construction of a new interpolation operator whose range consists of symmetric shape functions, because the range of a standard (locally) \(H({\text {div}})\) interpolation operator is not contained in the symmetric functions. The construction of such an operator is nontrivial and makes the error analysis more complicated. In contrast, no new interpolation operator is needed in our SDG method: we only use the interpolation operators of standard finite elements [11] and of the SDG method for the mixed Poisson problem [13]. In addition to this drawback in the error analysis, there is a difficulty in implementation. To implement the reduced method, we need local DOFs for the symmetric locally \(H({\text {div}})\) shape functions, but finding such local DOFs is by no means straightforward. These drawbacks seem to outweigh the benefit of having fewer DOFs, so we do not pursue the reduced method further. In our case, the finite element functions for the stress tensor and the additional Lagrange multipliers for the weak symmetry are discontinuous across the element boundaries, and thus in a practical implementation those unknowns can be eliminated element-wise to obtain a reduced system for the displacement.

This work is organized as follows. In Sect. 2, a variational formulation with weakly symmetric stress is introduced. In Sect. 3, we introduce an SDG method for the problem and prove its stability and a priori error estimates. In particular, robustness of the error estimates for nearly incompressible materials is proved. In Sect. 4, we present numerical results for test problems on structured/unstructured meshes and for nearly incompressible materials. Finally, auxiliary results for the proof of stability are provided in the appendix.

2 Variational Form with Weak Symmetry

Suppose that there are two open subsets \(\varLambda _D, \varLambda _N\) of \(\partial \varOmega \) such that \(\varLambda _D \cap \varLambda _N = \emptyset , \overline{\varLambda }_D \cup \overline{\varLambda }_N = \partial \varOmega \). Suppose that \(\varLambda _D \not = \emptyset \) and a boundary condition of (1) is given by

$$\begin{aligned} u = g_D \quad \text { on } \varLambda _D, \qquad \varvec{\sigma }n = g_N \quad \text { on } \varLambda _N, \end{aligned}$$
(3)

with \(g_D : \varLambda _D \rightarrow {\mathbb {V}}, g_N : \varLambda _N \rightarrow {\mathbb {V}}\), in which \(n: \partial \varOmega \rightarrow {\mathbb {V}}\) is the unit outward normal vector field on \(\partial \varOmega \) and \(\varvec{\sigma }n\) stands for the matrix-vector multiplication. The problem (1) with this boundary condition is well-posed under suitable regularity assumptions on \(g_D\) and \(g_N\).

Here we introduce notation and some definitions. For a bounded open subset \(G \subset {\mathbb {R}}^d\) of Lipschitz boundary, we use \(L^2(G)\) to denote the Hilbert space of real-valued functions on \(G\) whose inner product and the corresponding norm are

$$\begin{aligned} (p, q)_G := \int _G p q\, dx, \quad \Vert p \Vert _{0,G} := (p, p)_G^{\frac{1}{2}}, \qquad p, q \in L^2(G). \end{aligned}$$

When \(G = \varOmega \) or \(G\) is obvious in context, we will use \((p,q)\) and \(\Vert p \Vert _0\) for simplicity. For \(r >0\), \(H^r(G)\) and \(\Vert \cdot \Vert _{r,G}\) stand for the standard Sobolev space based on the \(L^2(G)\)-norm with differentiability of order \(r\) [16] and the corresponding norm. For a nonnegative integer \(k\), \({\mathcal {P}}_k(G)\) is the space of polynomials on \(G\) of degree \(\le k\). For a finite dimensional inner product space \(\mathbb {X}\), all these function spaces are naturally extended to \(\mathbb {X}\)-valued functions with counterparts \(L^2(G; \mathbb {X}), H^r(G; \mathbb {X})\), and \({\mathcal {P}}_k(G; \mathbb {X})\). We will use the same notation for inner products and norms in these extended function spaces, and the meaning will be clear in context. Let

$$\begin{aligned} H({\text {div}}, G) := \left\{ v \in L^2\left( G; {\mathbb {R}}^d\right) \,:\, {\text {div}}\, v \in L^2(G) \right\} , \end{aligned}$$

in which \({\text {div}}\, v\) is the divergence of \(v\) in the sense of distributions. Recall that \({\text {div}}\) is regarded as row-wise divergence for \({\mathbb {M}}\)-valued functions. Now we define

$$\begin{aligned} H({\text {div}}, G; {\mathbb {M}}) := \left\{ \varvec{\tau } \in L^2(G; {\mathbb {M}}) \,:\, {\text {div}}\, \varvec{\tau } \in L^2(G; {\mathbb {V}}) \right\} . \end{aligned}$$

For \(v \in H^1(G; {\mathbb {V}})\), the \({\text {grad}}\) operator stands for the row-wise gradient and thus \({\text {grad}}\, v \in L^2(G; {\mathbb {M}})\). For an \({\mathbb {M}}\)-valued function \(\varvec{\tau }\), the symmetric and skew-symmetric parts of \(\varvec{\tau }\) are

$$\begin{aligned} {\text {sym }}\varvec{\tau } = \frac{\varvec{\tau } + \varvec{\tau }^T}{2}, \qquad {\text {skw }}\varvec{\tau } = \frac{\varvec{\tau } - \varvec{\tau }^T}{2}, \end{aligned}$$

where \(\varvec{\tau }^T\) is the transpose of \(\varvec{\tau }\). Note that \(\epsilon (u) = {\text {sym }}{\text {grad}}\, u\) by these definitions. Finally, for a bounded subset \(G \subset {\mathbb {R}}^d\) of codimension 1, \(L^2(G)\) is the Hilbert space of square-integrable functions on \(G\) with inner product

$$\begin{aligned} \langle p, q \rangle _G := \int _G p q \, ds, \qquad p, q \in L^2(G), \end{aligned}$$

and other spaces \(H^r(G), L^2(G; \mathbb {X}), H^r(G; \mathbb {X}), {\mathcal {P}}_k(G), {\mathcal {P}}_k(G; \mathbb {X})\) and norms \(\Vert \cdot \Vert _{0,G}, \Vert \cdot \Vert _{r,G}\) are similarly defined.
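The decomposition into \({\text {sym }}\varvec{\tau }\) and \({\text {skw }}\varvec{\tau }\) defined above is orthogonal with respect to the Frobenius inner product, so pairing a matrix with a skew-symmetric matrix only sees its skew-symmetric part. The following minimal sketch checks this pointwise for one sample matrix; the function names and the sample values are illustrative and not taken from the paper.

```python
from fractions import Fraction

def sym(m):
    """Symmetric part (m + m^T) / 2."""
    d = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(d)] for i in range(d)]

def skw(m):
    """Skew-symmetric part (m - m^T) / 2."""
    d = len(m)
    return [[(m[i][j] - m[j][i]) / 2 for j in range(d)] for i in range(d)]

def frob(a, b):
    """Frobenius inner product a : b."""
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

tau = [[Fraction(1), Fraction(4)], [Fraction(-2), Fraction(3)]]
eta = [[Fraction(0), Fraction(5)], [Fraction(-5), Fraction(0)]]  # skew-symmetric

sym_tau, skw_tau = sym(tau), skw(tau)
pair_full = frob(tau, eta)       # tau : eta
pair_skw = frob(skw_tau, eta)    # skw(tau) : eta, same value
pair_sym = frob(sym_tau, eta)    # sym(tau) : eta, always zero
```

In particular, a matrix field whose pairing with every skew-symmetric field vanishes has zero skew-symmetric part, which is the mechanism behind imposing symmetry weakly.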

In order to derive a variational form of (1) with weakly imposed symmetry of the stress, we first extend the compliance tensor \(A(x)\) for \(x \in \varOmega \) to be a map from \({\mathbb M}\) into \({\mathbb M}\) by extending \(A(x)\) to be a positive multiple of the identity map on \({\mathbb K}\). For simplicity we again denote this extended compliance tensor by \(A\) because we will not use the original compliance tensor. Note that \(A \varvec{\sigma } = \epsilon (u)\) still holds because \(\varvec{\sigma }\) is symmetric. We then introduce a new unknown \(\varvec{\gamma } \in L^2(\varOmega ; {\mathbb K})\) instead of \({\text {skw }}{\text {grad}}\, u\), and rewrite (1) as

$$\begin{aligned} A \varvec{\sigma } = {\text {grad}}\, u - \varvec{\gamma }, \qquad -{\text {div}}\, \varvec{\sigma } = f, \end{aligned}$$
(4)

with symmetry constraint \((\varvec{\sigma }, \varvec{\eta }) = 0, \forall \varvec{\eta } \in L^2(\varOmega ; {\mathbb K})\). If the triple \((\varvec{\sigma }, u, \varvec{\gamma })\) satisfies (4) with the symmetry constraint, then one can check that the pair \((\varvec{\sigma }, u)\) satisfies (1). To write this system with (3) as a variational form, set

$$\begin{aligned} \Sigma&= L^2(\varOmega ; {\mathbb M}), \qquad \varGamma = L^2(\varOmega ; {\mathbb K}),\\ U_{g_D}&= \{ v \in H^1(\varOmega ; {\mathbb {V}}) \,:\, v|_{\varLambda _D} = g_D \}, \\ U_0&= \{ v \in H^1(\varOmega ; {\mathbb {V}}) \,:\, v|_{\varLambda _D} = 0 \}. \end{aligned}$$

The first equation in (4) is simply \((A \varvec{\sigma }, \varvec{\tau }) = ({\text {grad}}\, u - \varvec{\gamma }, \varvec{\tau })\) for all \(\varvec{\tau } \in \Sigma \). For \(v \in U_0\), by the integration by parts, we have

$$\begin{aligned} (f, v) = -({\text {div}}\, \varvec{\sigma }, v) = -\langle \varvec{\sigma } n, v \rangle _{\partial \varOmega } + (\varvec{\sigma }, {\text {grad}}\, v) = - \langle g_N, v \rangle _{\varLambda _N} + (\varvec{\sigma }, {\text {grad}}\, v), \end{aligned}$$

where \(n\) is the outward unit normal vector field on \(\partial \varOmega \). Combining it with the symmetry constraint \((\varvec{\sigma }, \varvec{\eta }) = 0\) for all \(\varvec{\eta } \in \varGamma \), we have

$$\begin{aligned} (\varvec{\eta } - {\text {grad}}\, v, \varvec{\sigma }) = -(f,v) - \langle g_N, v \rangle _{\varLambda _N}, \qquad \forall (v, \varvec{\eta }) \in U_0 \times \varGamma . \end{aligned}$$

Rewriting \(u\) by \(u = u_0 + u_g\) with a fixed \(u_g \in U_{g_D}\), a variational form is to find \((\varvec{\sigma }, u_0, \varvec{\gamma }) \in \Sigma \times U_0 \times \varGamma \) such that

$$\begin{aligned} (A \varvec{\sigma } , \varvec{\tau }) + (\varvec{\gamma } - {\text {grad}}\, u_0, \varvec{\tau })&= ({\text {grad}}\, u_g, \varvec{\tau }),&\quad \forall \varvec{\tau } \in \Sigma , \end{aligned}$$
(5)
$$\begin{aligned} (\varvec{\eta } - {\text {grad}}\, v, \varvec{\sigma })&= -(f, v) - \langle g_N, v \rangle _{\varLambda _N},&\quad \forall (v, \varvec{\eta }) \in U_0 \times \varGamma . \end{aligned}$$
(6)

For simplicity of presentation we will assume that \(g_D = g_N = 0\) with \(u_g = 0\) in the rest of the paper because it is straightforward to generalize our method, which will be described in the next section, to more general boundary conditions.
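The extension of the compliance tensor from \({\mathbb S}\) to \({\mathbb M}\) used at the start of this derivation can be sketched pointwise as follows for the isotropic form (2). The multiple `kappa` on \({\mathbb K}\) is a hypothetical choice (the text only requires it to be positive), and all function names are illustrative.

```python
from fractions import Fraction

def sym(m):
    d = len(m)
    return [[(m[i][j] + m[j][i]) / 2 for j in range(d)] for i in range(d)]

def skw(m):
    d = len(m)
    return [[(m[i][j] - m[j][i]) / 2 for j in range(d)] for i in range(d)]

def frob(a, b):
    return sum(x * y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))

def compliance(tau, mu, lam):
    """Isotropic compliance tensor (2) acting on a symmetric matrix."""
    d = len(tau)
    shift = lam / (2 * mu + d * lam) * sum(tau[i][i] for i in range(d))
    return [[(tau[i][j] - (shift if i == j else 0)) / (2 * mu)
             for j in range(d)] for i in range(d)]

def extended_compliance(tau, mu, lam, kappa):
    """A on the symmetric part plus kappa times the skew part (kappa > 0 assumed)."""
    d = len(tau)
    s = compliance(sym(tau), mu, lam)
    k = skw(tau)
    return [[s[i][j] + kappa * k[i][j] for j in range(d)] for i in range(d)]

mu, lam, kappa = Fraction(1), Fraction(2), Fraction(1, 2)
sigma = [[Fraction(3), Fraction(1)], [Fraction(1), Fraction(-1)]]  # symmetric
tau = [[Fraction(1), Fraction(4)], [Fraction(-2), Fraction(3)]]    # general

ext_sigma = extended_compliance(sigma, mu, lam, kappa)
ext_tau = extended_compliance(tau, mu, lam, kappa)
energy = frob(ext_tau, tau)  # positive for this sample
```

On symmetric input the extension agrees with the original tensor, which is why \(A \varvec{\sigma } = \epsilon (u)\) is unaffected by it.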

3 A Staggered DG Method, Stability, and Error Analysis

In this section we develop a staggered DG method for (5)–(6), prove its stability, and show a priori error estimates.

3.1 A Staggered DG Method and Stability

Let \({\mathcal {M}}_h\) be a shape-regular triangulation of \(\varOmega \) with parameter \(h\), the largest diameter of simplices in \({\mathcal {M}}_h\), and assume that \({\mathcal {M}}_h\) is conforming to boundary subdomains \(\varLambda _D, \varLambda _N\). Throughout this work, \(c\) will denote a positive constant independent of \(h\) but it can be different in each formula in order to avoid proliferation of constants in inequalities. Let \({\mathcal {T}}_h\) be the simplicial mesh obtained from \({\mathcal {M}}_h\) by refining \({\mathcal {M}}_h\) with barycentric subdivision, and \({\mathcal {F}}_h\) be the set of subsimplices of \({\mathcal {T}}_h\) with codimension one. The sets of subsimplices \({\mathcal {F}}_u, {\mathcal {F}}_{\sigma }\), and \({\mathcal {F}}_h^{\circ }\) are defined by

$$\begin{aligned} {\mathcal {F}}_u = \{ e \in {\mathcal {F}}_h \,:\, e \subset \partial M, M \in {\mathcal {M}}_h \}, \quad {\mathcal {F}}_{\sigma } = {\mathcal {F}}_h \setminus {\mathcal {F}}_u, \quad {\mathcal {F}}_h^{\circ } = \{ e \in {\mathcal {F}}_h \,:\, e \not \subset \partial \varOmega \}. \end{aligned}$$

For an illustration of a staggered mesh and its subsimplices we refer to Fig. 1. Suppose that \(\varvec{\tau } \in L^2(\varOmega ; {\mathbb {M}})\) and \(v \in L^2(\varOmega ; {\mathbb {V}})\), whose restrictions on each \(T \in {\mathcal {T}}_h\) are in \(H^r(T; {\mathbb {M}}), H^r(T; {\mathbb {V}})\) with \(r > 1/2\). We now define the jumps \([\![\varvec{\tau } n ]\!]\) and \([\![v ]\!]\) on each \(e \in {\mathcal {F}}_h\). For \(e \in {\mathcal {F}}_h\) with \(e \subset \partial \varOmega \), we define

$$\begin{aligned}{}[\![\varvec{\tau } n ]\!]|_e := \varvec{\tau }|_e n, \qquad [\![v ]\!]|_e := v|_e \end{aligned}$$

where \(n\) is the unit outward normal vector on \(\partial \varOmega \), and \(\varvec{\tau }|_e, v|_e\) are the traces of \(\varvec{\tau }\) and \(v\) on \(e\), respectively. To define the jumps on a given \(e \in {\mathcal {F}}_h^{\circ }\), let \(T^+, T^- \in {\mathcal {T}}_h\) be the unique two simplices such that \(e \subset \partial T^+ \cap \partial T^-\), and denote the unit normal vector on \(e\) pointing outward from \(T^+\) (\(T^-\), respectively) by \(n^+\) (\(n^-\), respectively). We also denote by \(\varvec{\tau }^+\) the trace of \(\varvec{\tau }\) on \(e\) given by \(\varvec{\tau }|_{T^+}\). The notations \(\varvec{\tau }^-, v^+, v^-\) are similarly defined. Then the jumps are defined by

$$\begin{aligned}{}[\![\varvec{\tau } n ]\!]|_e := \varvec{\tau }^+ n^+ + \varvec{\tau }^- n^-, \qquad [\![v ]\!]|_e := v^+ - v^-. \end{aligned}$$
(7)

Since the definition of \([\![v ]\!]|_e\) depends on the choice of \(T^+\) and \(T^-\), additional care is needed in the error analysis.
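To make this dependence concrete, the following sketch evaluates the jumps in (7) at a single point of an interior face, using made-up trace values. It illustrates that \([\![\varvec{\tau } n ]\!]\) is unchanged when the labels \(T^+\) and \(T^-\) are swapped, that \([\![v ]\!]\) changes sign, and that the product of \([\![v ]\!]\) with \(\varvec{\tau } n^+\) is again label-independent when \(\varvec{\tau } n\) is single-valued. All names and numbers are illustrative assumptions.

```python
def matvec(m, x):
    return [sum(m[i][j] * x[j] for j in range(len(x))) for i in range(len(m))]

def neg(x):
    return [-c for c in x]

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def jump_normal(tau_p, tau_m, n_p):
    """[[tau n]] = tau^+ n^+ + tau^- n^-, with n^- = -n^+."""
    a = matvec(tau_p, n_p)
    b = matvec(tau_m, neg(n_p))
    return [x + y for x, y in zip(a, b)]

def jump(v_p, v_m):
    """[[v]] = v^+ - v^-."""
    return [x - y for x, y in zip(v_p, v_m)]

# Sample traces at one point of a face with n^+ = (1, 0).
n_plus = [1, 0]
tau_plus = [[2, 5], [3, 7]]
tau_minus = [[1, -1], [0, 4]]
v_plus, v_minus = [4, 1], [1, 0]

j_tau = jump_normal(tau_plus, tau_minus, n_plus)
j_tau_swapped = jump_normal(tau_minus, tau_plus, neg(n_plus))  # labels swapped
j_v = jump(v_plus, v_minus)
j_v_swapped = jump(v_minus, v_plus)

# With a single-valued normal trace t = tau n^+, the product [[v]] . (tau n^+)
# does not depend on the labelling: both factors change sign together.
t = matvec(tau_plus, n_plus)
pairing = dot(j_v, t)
pairing_swapped = dot(j_v_swapped, neg(t))
```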

Fig. 1

An illustration of a staggered mesh in two dimensions: \(M(\nu _i)\) are triangles in \({\mathcal {M}}_h, \nu _i\) is the barycenter of \(M(\nu _i)\), solid lines are edges of \({\mathcal {F}}_u\), dotted lines are edges of \({\mathcal {F}}_\sigma \), and \({\mathcal {R}}(e)\) is the union of triangles sharing an edge \(e\) in \({\mathcal {F}}_u\)

We are ready to define finite element spaces for the problem. The space \(\Sigma _h\) is the set of functions

$$\begin{aligned} \Sigma _h&= \{ \varvec{\tau }\in \Sigma \,:\, \varvec{\tau }|_T \in \mathcal {P}_{k}(T; {\mathbb M}) \;\; \forall T \in {\mathcal {T}}_h \;\text {and} \; [\![\varvec{\tau }n ]\!]|_e = 0 \;\; \forall e \in {\mathcal {F}}_{\sigma } \}, \quad k \ge 1. \end{aligned}$$

The vanishing jump condition of \(\Sigma _h\) implies that the normal components of each row of \(\varvec{\tau } \in \Sigma _h\) are single-valued on every \(e \in {\mathcal {F}}_{\sigma }\). The degrees of freedom of \(\Sigma _h\) are

$$\begin{aligned}&\int _e \varvec{\tau }n_e \cdot p \,ds, \qquad \forall p \in \mathcal {P}_{k}(e; {\mathbb {V}}), \quad \forall e \in {\mathcal {F}}_{\sigma }, \end{aligned}$$
(8)
$$\begin{aligned}&\int _T \varvec{\tau }: \varvec{\phi } \,dx, \qquad \forall \varvec{\phi } \in \mathcal {P}_{k-1}(T; {\mathbb M}), \quad \forall T \in {\mathcal {T}}_h, \end{aligned}$$
(9)

for \(\varvec{\tau }\in \Sigma _h\), where the dot and colon are inner products on \({\mathbb {V}}\) and \({\mathbb {M}}\), and \(n_e\) is a (chosen) unit normal vector on \(e\). The well-definedness of \(\Sigma _h\) with respect to these DOFs is a consequence of [13, Lemma 2.3]. Let

$$\begin{aligned} \widetilde{U}_h&= \{ v \in L^2(\varOmega ; {\mathbb {V}}) \,: \, v|_T \in \mathcal {P}_{k}(T; {\mathbb {V}}) \;\; \forall T \in {\mathcal {T}}_h \;\text {and} \; [\![v ]\!]|_e = 0 \;\; \forall e \in {\mathcal {F}}_u \}, \end{aligned}$$

and the jump condition of \(\widetilde{U}_h\) implies that \(v \in \widetilde{U}_h\) is continuous on every \(e \in {\mathcal {F}}_u\). The degrees of freedom of \(\widetilde{U}_h\) are

$$\begin{aligned}&\int _e v \cdot p \, ds, \qquad \forall p \in {\mathcal {P}}_k(e;{\mathbb {V}}),\quad \forall e \in {\mathcal {F}}_u,\end{aligned}$$
(10)
$$\begin{aligned}&\int _T v \cdot \phi \, dx ,\qquad \forall \phi \in {\mathcal {P}}_{k-1}(T;{\mathbb {V}}),\quad \forall T \in {\mathcal {T}}_h, \end{aligned}$$
(11)

for \(v \in \widetilde{U}_h\), and we refer to [13, Lemma 2.2] for its well-definedness. We define \(U_h\) as the subspace of \(\widetilde{U}_h\) such that all the DOFs associated to \(e \in {\mathcal {F}}_u\) with \(e \subset \varLambda _D\) vanish.
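A simple dimension count is consistent with the DOFs (8)–(9) and (10)–(11). The sketch below assumes that every \(T \in {\mathcal {T}}_h\) has exactly one face in \({\mathcal {F}}_u\) (the one lying on \(\partial M\)) and \(d\) faces in \({\mathcal {F}}_{\sigma }\); this is read off from the construction of \({\mathcal {T}}_h\) and Fig. 1 rather than stated in the text. It only checks that the number of DOFs attached to one simplex matches the dimension of the local polynomial space, and is not a proof of unisolvence, for which the text cites [13]. Function names are illustrative.

```python
from math import comb

def dim_pk(k, m):
    """Dimension of scalar polynomials of degree <= k in m variables."""
    return comb(k + m, m) if k >= 0 else 0

def sigma_local_dofs(k, d):
    """DOFs (8) and (9) seen from one simplex T (assumes d faces in F_sigma)."""
    face_dofs = d * (d * dim_pk(k, d - 1))  # V-valued P_k on each of d faces
    cell_dofs = d * d * dim_pk(k - 1, d)    # M-valued P_{k-1} on T
    return face_dofs + cell_dofs

def u_local_dofs(k, d):
    """DOFs (10) and (11) seen from one simplex T (assumes one face in F_u)."""
    face_dofs = d * dim_pk(k, d - 1)        # V-valued P_k on the single face
    cell_dofs = d * dim_pk(k - 1, d)        # V-valued P_{k-1} on T
    return face_dofs + cell_dofs

# Compare with dim P_k(T; M) = d^2 * dim P_k(T) and dim P_k(T; V) = d * dim P_k(T).
for d in (2, 3):
    for k in (1, 2, 3):
        assert sigma_local_dofs(k, d) == d * d * dim_pk(k, d)
        assert u_local_dofs(k, d) == d * dim_pk(k, d)
```

The match in both cases reduces to Pascal's identity \(\binom{k+d-1}{d-1} + \binom{k+d-1}{d} = \binom{k+d}{d}\).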

By construction, \(\varvec{\tau }\) in \(\Sigma _h\) is locally \(H({\text {div}})\)-conforming in \(M(\nu _i)\) and \(v\) in \(U_h\) is locally \(H^1\)-conforming in \({\mathcal {R}}(e)\) for \(e\) in \({\mathcal {F}}_u\), see Fig. 1. We observe that at least one of \([\![\varvec{\tau }n ]\!]\) and \([\![v ]\!]\) vanishes on \(e\) for any \(e \in {\mathcal {F}}_h^{\circ }\). In other words, continuity of functions in the pair \((\Sigma _h,U_h)\) is staggered on subsimplices in \({\mathcal {F}}_h^{\circ }\). This property gives us the inter-element flux condition in our Galerkin formulation without the carefully designed numerical fluxes needed in other DG methods [3, 15, 30], while still allowing partial discontinuity in the finite element spaces. The space \(\varGamma _h\) is defined by

$$\begin{aligned} \varGamma _h&= \{ \varvec{\eta } \in \varGamma \,:\, \varvec{\eta }|_T \in \mathcal {P}_{k}(T; {\mathbb K}), \;\; \forall T \in {\mathcal {T}}_h \}. \end{aligned}$$

We now define norms for \(\Sigma _h\) and \(U_h \times \varGamma _h\) by

$$\begin{aligned} \Vert \varvec{\tau }\Vert _{0,h}^2&= \Vert \varvec{\tau }\Vert _0^2 + \sum _{e \in {\mathcal {F}}_{\sigma }} h_e \Vert \varvec{\tau }n \Vert _{0,e}^2, \end{aligned}$$
(12)
$$\begin{aligned} \Vert (v, \varvec{\eta }) \Vert _{h}^2&= \Vert \varvec{\eta } - {\text {grad}}_h\, v \Vert _0^2 + \sum _{e \in {\mathcal {F}}_{\sigma }} h_e^{-1} \Vert [\![v ]\!]\Vert _{0,e}^2, \end{aligned}$$
(13)

where \({\text {grad}}_h\) is the element-wise \({\text {grad}}\) operator with respect to \({\mathcal {T}}_h\), and \(h_e\) is the diameter of the simplex \(e\). To see that \(\Vert (\cdot , \cdot ) \Vert _h\) is a norm, suppose that \(\Vert (v, \varvec{\eta }) \Vert _h = 0\) for \(v \in U_h, \varvec{\eta } \in \varGamma _h\). This is equivalent to \(\varvec{\eta } - {\text {grad}}_h\, v = 0\) and \([\![v ]\!]|_e = 0\) for all \(e \in {\mathcal {F}}_{\sigma }\). Considering the symmetric and skew-symmetric parts of \(\varvec{\eta } - {\text {grad}}_h\, v\), we have \({\text {sym }}{\text {grad}}_h\, v = 0\) and \(\varvec{\eta } - {\text {skw }}{\text {grad}}_h\, v = 0\). Then \(v\) is an element-wise rigid body motion because \({\text {sym }}{\text {grad}}_h\, v = 0\). Furthermore, since \([\![v ]\!]|_e = 0\) for all \(e \in {\mathcal {F}}_{\sigma }\) and \(v\) is continuous on all \(e \in {\mathcal {F}}_u\), \(v\) is a rigid body motion on \(\varOmega \). This implies that \(v = 0\) because \(v = 0\) on \(\varLambda _D \not = \emptyset \). Then \(\varvec{\eta } = 0\) as well because \(\varvec{\eta } - {\text {skw }}{\text {grad}}_h\, v = 0\). The remaining norm axioms for \(\Vert (\cdot , \cdot ) \Vert _h\) are easy to check. For pure traction boundary conditions \((\varLambda _D = \emptyset )\), a modified argument is needed and we will discuss it later.

Recall that \([\![v ]\!]|_e\) in (7) and \(\varvec{\tau }n|_e\) on \(e \in {\mathcal {F}}_\sigma \) depend on the choice of \(T^+, T^-\), and the direction of unit normal vector \(n\) on \(e\), respectively. To avoid this ambiguity in our analysis, whenever we deal with a term of the form \(\langle [\![v ]\!], \varvec{\tau }n \rangle _e\) on \(e \in {\mathcal {F}}_{\sigma }\), we assume that \(n\) is chosen to satisfy:

$$\begin{aligned} n = n^+ \text { for the simplices } T^+ \text { and } T^- \text { such that }[\![v ]\!]= v^+ - v^- . \end{aligned}$$

Adopting this convention, let us define bilinear forms

$$\begin{aligned} a( \varvec{\phi }, \varvec{\tau })&= (A \varvec{\phi }, \varvec{\tau }), \nonumber \\ b(\varvec{\tau }; v, \varvec{\eta })&= (\varvec{\eta } - {\text {grad}}_h\, v, \varvec{\tau }) + \sum _{e \in {\mathcal {F}}_{\sigma }} \langle [\![v ]\!], \varvec{\tau }n \rangle _e ,\nonumber \\ b^*(v, \varvec{\eta }; \varvec{\tau })&= (v, {\text {div}}_h\, \varvec{\tau }) - \sum _{e \in {\mathcal {F}}_{u}} \langle v , [\![\varvec{\tau }n ]\!]\rangle _e + (\varvec{\eta }, \varvec{\tau }), \end{aligned}$$
(14)

for \(\varvec{\phi }, \varvec{\tau }\in \Sigma _h, v \in U_h, \varvec{\eta } \in \varGamma _h\), in which \({\text {div}}_h\) is the element-wise \({\text {div}}\) operator with respect to \({\mathcal {T}}_h\). In fact, \(b(\varvec{\tau }; v, \varvec{\eta })\) and \(b^*(v, \varvec{\eta }; \varvec{\tau })\) are well-defined for \(\varvec{\tau }\in L^2(\varOmega ; {\mathbb {M}}), v \in L^2(\varOmega ; {\mathbb {V}}), \varvec{\eta } \in L^2(\varOmega ; {\mathbb {K}})\) when \(\varvec{\tau }\) and \(v\) are regular enough that \({\text {grad}}_h\, v, {\text {div}}_h\, \varvec{\tau }\) are meaningful and the integrations on \(e \in {\mathcal {F}}_h\) in the definitions are well-defined.

Notice that

$$\begin{aligned} \sum _{T \in {\mathcal {T}}_h} \langle v, \varvec{\tau }n_T \rangle _{\partial T} = \sum _{e \in {\mathcal {F}}_{\sigma }} \langle [\![v ]\!], \varvec{\tau }n \rangle _e + \sum _{e \in {\mathcal {F}}_{u}} \langle v , [\![\varvec{\tau }n ]\!]\rangle _e, \quad \forall v \in U_h, \forall \varvec{\tau } \in \Sigma _h, \end{aligned}$$

where \(n_T\) is the outward unit normal vector field on \(\partial T\). This fact and the integration by parts give

$$\begin{aligned} b^*(v, \varvec{\eta }; \varvec{\tau }) = b(\varvec{\tau }; v, \varvec{\eta }), \qquad \forall (\varvec{\tau }, v, \varvec{\eta }) \in \Sigma _h \times U_h \times \varGamma _h . \end{aligned}$$
(15)
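For completeness, the intermediate step behind (15) can be written out as follows. This is a sketch obtained by integrating by parts on each \(T \in {\mathcal {T}}_h\) and then using the face identity above, under the sign convention for \(n\) adopted earlier:

$$\begin{aligned} -({\text {grad}}_h\, v, \varvec{\tau })&= \sum _{T \in {\mathcal {T}}_h} \left[ (v, {\text {div}}\, \varvec{\tau })_T - \langle v, \varvec{\tau }n_T \rangle _{\partial T} \right] \\&= (v, {\text {div}}_h\, \varvec{\tau }) - \sum _{e \in {\mathcal {F}}_{\sigma }} \langle [\![v ]\!], \varvec{\tau }n \rangle _e - \sum _{e \in {\mathcal {F}}_{u}} \langle v , [\![\varvec{\tau }n ]\!]\rangle _e . \end{aligned}$$

Adding \((\varvec{\eta }, \varvec{\tau }) + \sum _{e \in {\mathcal {F}}_{\sigma }} \langle [\![v ]\!], \varvec{\tau }n \rangle _e\) to both sides turns the left-hand side into \(b(\varvec{\tau }; v, \varvec{\eta })\) and the right-hand side into \(b^*(v, \varvec{\eta }; \varvec{\tau })\) as defined in (14).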

We now establish a discrete counterpart of the problem (5)–(6) in our numerical scheme: Find \((\varvec{\sigma }_h, u_h, \varvec{\gamma }_h) \in \Sigma _h \times U_h \times \varGamma _h\) such that

$$\begin{aligned} a(\varvec{\sigma }_h, \varvec{\tau }) + b^*(u_h, \varvec{\gamma }_h;\varvec{\tau })&= 0,&\forall \varvec{\tau } \in \Sigma _h, \end{aligned}$$
(16)
$$\begin{aligned} -b(\varvec{\sigma }_h; v, \varvec{\eta })&= (f, v),&\forall v \in U_h, \forall \varvec{\eta } \in \varGamma _h . \end{aligned}$$
(17)

We emphasize that in the above the flux condition on any \(e\) in \({\mathcal {F}}_h\) is naturally obtained by the staggered continuity property of our finite element function spaces in contrast to other standard DG methods. For the consistency of this discretization, we show that the solution of (5)–(6) satisfies (16)–(17) under a suitable regularity assumption.

Lemma 1

For given \(f \in L^2(\varOmega ; {\mathbb {V}})\), suppose that \((\varvec{\sigma }, u, \varvec{\gamma })\) is a solution of (5)–(6) with

$$\begin{aligned} \varvec{\sigma } \in H^r(\varOmega ; {\mathbb {S}}), \quad u \in H^r(\varOmega ; {\mathbb {V}}), \qquad r > 1/2. \end{aligned}$$
(18)

Then \((\varvec{\sigma }, u, \varvec{\gamma })\) satisfies (16)–(17).

Proof

Under the regularity assumptions on \(\varvec{\sigma }\) and \(u\), the jumps \([\![\varvec{\sigma } n ]\!]|_e\) and \([\![u ]\!]|_e\) are well-defined and vanish on every \(e \in {\mathcal {F}}_h^{\circ }\). Using \(A \varvec{\sigma } = {\text {grad}}\, u - \varvec{\gamma } = {\text {grad}}_h\, u - \varvec{\gamma }\) and integration by parts with the boundary condition \(g_N=g_D=0\), we have

$$\begin{aligned} (A \varvec{\sigma }, \varvec{\tau })&= ({\text {grad}}_h\, u - \varvec{\gamma }, \varvec{\tau }) = -(u, {\text {div}}_h\, \varvec{\tau }) + \sum _{e \in {\mathcal {F}}_u} \langle u, [\![\varvec{\tau }n ]\!]\rangle _e - (\varvec{\gamma }, \varvec{\tau }) , \quad \forall \varvec{\tau }\in \Sigma _h, \end{aligned}$$

so \((\varvec{\sigma }, u, \varvec{\gamma })\) satisfies (16). Similarly, if we use the fact \((\varvec{\sigma }, \varvec{\eta }) = 0\) for all \(\varvec{\eta } \in \varGamma _h\), the equality \(-{\text {div}}\, \varvec{\sigma } = f\), and the integration by parts, then we have

$$\begin{aligned} (f, v) = -({\text {div}}\, \varvec{\sigma }, v) - (\varvec{\sigma }, \varvec{\eta })= (\varvec{\sigma }, {\text {grad}}_h\, v) - \sum _{e \in {\mathcal {F}}_{\sigma }} \langle \varvec{\sigma }n, [\![v ]\!]\rangle _e - (\varvec{\sigma }, \varvec{\eta }), \end{aligned}$$

and it implies that \((\varvec{\sigma }, u, \varvec{\gamma })\) satisfies (17). \(\square \)

The stability of the discretization, i.e., well-posedness of the problem (16)–(17), is obtained by the Babuška–Brezzi stability theory [12]. The following inf-sup condition, which will be proved using the mesh-dependent norm idea in [34], is crucial in the stability theory. An auxiliary result, Lemma 9, which is needed in the proof of the inf-sup condition, is provided in the appendix.

Lemma 2

There exists \(\alpha >0\) independent of \(h\) such that, for any \((0,0) \not = (v, \varvec{\eta }) \in U_h \times \varGamma _h\), one can find \(0 \not = \varvec{\tau } \in \Sigma _h\) satisfying

$$\begin{aligned} \frac{b(\varvec{\tau }; v, \varvec{\eta })}{\Vert (v, \varvec{\eta }) \Vert _h \Vert \varvec{\tau }\Vert _{0,h}} \ge \alpha . \end{aligned}$$

Proof

For given \(v \in U_h\) and \(\varvec{\eta } \in \varGamma _h\), suppose that \(b(\varvec{\tau }; v, \varvec{\eta }) = 0\) for all \(\varvec{\tau }\in \Sigma _h\). For each \(M \in {\mathcal {M}}_h\), we set

$$\begin{aligned} {\mathcal {F}}_{\sigma }^M&= \{ e \in {\mathcal {F}}_{\sigma } \,:\, e \subset M \}, \end{aligned}$$
(19)
$$\begin{aligned} \Sigma _M&= \{ \varvec{\tau } \in \Sigma _h \,:\, \varvec{\tau }|_T = 0 \; \text { for all }T \in {\mathcal {T}}_h \text { such that } T \not \subset M \}, \end{aligned}$$
(20)
$$\begin{aligned} \Sigma _{M,0}&= \{ \varvec{\tau } \in \Sigma _M \,:\, \varvec{\tau }n = 0 \text { on } \partial M \}, \end{aligned}$$
(21)

where \(n\) is the outward unit normal vector field on \(\partial M\). For \(\varvec{\tau } \in \Sigma _{M}\), the assumption \(b(\varvec{\tau }; v, \varvec{\eta }) = 0\) and the integration by parts give

$$\begin{aligned} 0&= b(\varvec{\tau }; v, \varvec{\eta }) \\&= (\varvec{\eta } - {\text {grad}}_h\, v, \varvec{\tau })_M + \sum _{e \in {\mathcal {F}}_{\sigma }^M} \langle [\![v ]\!], \varvec{\tau } n \rangle _e \\&= ({\text {div}}_h\, \varvec{\tau }, v)_M + (\varvec{\tau }, \varvec{\eta })_M - \sum _{e \subset \partial M} \langle v, [\![\varvec{\tau }n ]\!]\rangle _e. \end{aligned}$$

The condition on \(\varvec{\tau }n\) in the definition of \(\Sigma _{M,0}\) implies that \([\![\varvec{\tau }n ]\!]= 0\) on \(\partial M\) for \(\varvec{\tau }\in \Sigma _{M,0}\), so

$$\begin{aligned} ({\text {div}}_h\, \varvec{\tau }, v)_M + (\varvec{\tau }, \varvec{\eta })_M = 0, \qquad \forall \varvec{\tau }\in \Sigma _{M,0}. \end{aligned}$$

Let \(\varvec{\eta }_0\) be the mean-value zero part of \(\varvec{\eta }|_M\), that is, \(\varvec{\eta }|_M\) minus its entrywise average over \(M\), so that each entry of \(\varvec{\eta }_0\) has mean value zero on \(M\). By Lemma 9 in the appendix, one can find \(\varvec{\tau }_0 \in \Sigma _{M,0}\) such that \({\text {div}}_h\, \varvec{\tau }_0 = 0\) and \({\text {skw }}\varvec{\tau }_0|_M = \varvec{\eta }_0\). Note that \((\varvec{\tau }_0, \varvec{\eta })_M = ({\text {skw }}\varvec{\tau }_0, \varvec{\eta })_M = (\varvec{\eta }_0, \varvec{\eta })_M = (\varvec{\eta }_0, \varvec{\eta }_0)_M\), where the last equality holds because \(\varvec{\eta } - \varvec{\eta }_0\) is constant on \(M\) and \(\varvec{\eta }_0\) has mean value zero. The above equation therefore implies \(\varvec{\eta }_0 = 0\), and hence \(\varvec{\eta }|_M \in {\mathcal {P}}_0(M; {\mathbb {K}})\). Recall that general forms of rigid body motions in \({\mathbb {R}}^2\) and \({\mathbb {R}}^3\) are

$$\begin{aligned} \begin{pmatrix} 0 &{} -s \\ s &{} 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} + \begin{pmatrix} a_1 \\ a_2 \end{pmatrix}, \qquad \begin{pmatrix} 0 &{}\quad s_3 &{}\quad -s_2 \\ -s_3 &{}\quad 0 &{}\quad s_1 \\ s_2 &{}\quad -s_1 &{}\quad 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix} + \begin{pmatrix} a_1 \\ a_2 \\ a_3 \end{pmatrix}, \end{aligned}$$
(22)

with constants \(s,s_i,a_i \in {\mathbb {R}}, i=1,2,3\). Since \(\varvec{\eta }|_M \in {\mathcal {P}}_0(M;{\mathbb K})\), there is a rigid body motion \(r_M\) on \(M\) such that \(\varvec{\eta }|_M = {\text {grad}}\, r_M\). Then

$$\begin{aligned} b(\varvec{\tau }; v, \varvec{\eta }) = ({\text {grad}}_h(r_M - v), \varvec{\tau })_M + \sum _{e \in {\mathcal {F}}_{\sigma }^M} \langle [\![v ]\!], \varvec{\tau } n \rangle _e = 0. \end{aligned}$$

Due to the polynomial degree of \(v\), we have \({\text {grad}}_h (r_M-v)|_T \in {\mathcal {P}}_{k-1}(T; {\mathbb M})\) for each subsimplex \(T \subset M\) and \([\![v ]\!]|_e \in {\mathcal {P}}_k(e; {\mathbb {V}})\) for \(e \in {\mathcal {F}}_{\sigma }^M\). By (8) and (9), there is \(\varvec{\tau } \in \Sigma _M\) such that

$$\begin{aligned} \langle \varvec{\tau }n, [\![v ]\!]\rangle _e&= \Vert [\![v ]\!]\Vert _{0,e}^2,&\forall e \in {\mathcal {F}}_\sigma ^M, \\ \qquad \int _T \varvec{\tau } : {\text {grad}}_h(r_M -v) \,dx&= \int _T | {\text {grad}}_h(r_M -v) |^2 \,dx,&\forall T \in {\mathcal {T}}_h \text { with } T \subset M. \end{aligned}$$

Hence \([\![v ]\!]|_e = 0\) for \(e \in {\mathcal {F}}_{\sigma }^M\) and \({\text {grad}}_h(r_M - v) = \varvec{\eta } - {\text {grad}}_h v = 0\) on \(M\).
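As a concrete check of the identity \(\varvec{\eta }|_M = {\text {grad}}\, r_M\) used above, consider \(d = 2\). The following sketch assumes the convention \(({\text {grad}}\, v)_{ij} = \partial v_i / \partial x_j\), which matches the row-wise divergence but is stated here as an assumption. For a constant skew-symmetric \(\varvec{\eta }|_M\) with parameter \(s\), taking \(r_M\) as the first expression in (22) with arbitrary \(a_1, a_2\) gives

$$\begin{aligned} r_M(x) = \begin{pmatrix} -s x_2 + a_1 \\ s x_1 + a_2 \end{pmatrix}, \qquad {\text {grad}}\, r_M = \begin{pmatrix} 0 &{} -s \\ s &{} 0 \end{pmatrix} = \varvec{\eta }|_M, \qquad \epsilon (r_M) = 0. \end{aligned}$$

The three-dimensional case is analogous, with \(s_1, s_2, s_3\) read off from the entries of \(\varvec{\eta }|_M\).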

Repeating the same procedure for all \(M \in {\mathcal {M}}_h\), we conclude that \(b(\varvec{\tau }; v, \varvec{\eta })= 0\) for all \(\varvec{\tau }\in \Sigma _h\) implies \(\Vert (v, \varvec{\eta }) \Vert _h = 0\) by the definition of \(\Vert \cdot \Vert _h\) and the boundary condition \(v = 0\) on \(\varLambda _D\). By equivalence of norms on finite dimensional spaces and a standard scaling argument, there is \(\alpha > 0\) independent of mesh sizes such that, for every \((v, \varvec{\eta }) \in U_h \times \varGamma _h\), some \(\varvec{\tau } \in \Sigma _h\) satisfies

$$\begin{aligned} \Vert \varvec{\tau } \Vert _{0,h} \le \Vert (v, \varvec{\eta }) \Vert _h, \qquad b(\varvec{\tau }; v, \varvec{\eta }) \ge \alpha \Vert (v, \varvec{\eta }) \Vert _h^2. \end{aligned}$$

The assertion easily follows from this. \(\square \)
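For the reader's convenience, the last step can be spelled out as follows. This is a sketch that assumes the inf-sup condition of Lemma 2 has the standard form, and it reads the two inequalities above as holding for a suitable \(\varvec{\tau } \in \Sigma _h\) depending on \((v, \varvec{\eta })\). If \(\Vert (v, \varvec{\eta }) \Vert _h > 0\), then \(b(\varvec{\tau }; v, \varvec{\eta }) > 0\), so \(\varvec{\tau } \not = 0\) and

$$\begin{aligned} \sup _{0 \not = \varvec{\tau }' \in \Sigma _h} \frac{b(\varvec{\tau }'; v, \varvec{\eta })}{\Vert \varvec{\tau }' \Vert _{0,h}} \ge \frac{b(\varvec{\tau }; v, \varvec{\eta })}{\Vert \varvec{\tau } \Vert _{0,h}} \ge \frac{\alpha \Vert (v, \varvec{\eta }) \Vert _h^2}{\Vert (v, \varvec{\eta }) \Vert _h} = \alpha \Vert (v, \varvec{\eta }) \Vert _h. \end{aligned}$$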

Theorem 1

The problem (16)–(17) is well-posed.

Proof

Since \(A(x)\) is positive definite on \({\mathbb {M}}\) uniformly in \(x \in \varOmega \), the bilinear form \(a(\cdot , \cdot )\) is coercive on \(L^2(\varOmega ; {\mathbb {M}}) \times L^2(\varOmega ; {\mathbb {M}})\). By a standard scaling argument, \(\Vert \varvec{\tau }\Vert _0\) and \(\Vert \varvec{\tau }\Vert _{0,h}\) for \(\varvec{\tau }\in \Sigma _h\) are equivalent norms, so \(a(\cdot , \cdot )\) is also coercive on \(\Sigma _h \times \Sigma _h\). By the Cauchy–Schwarz inequality, \(a(\cdot , \cdot )\) is a bounded bilinear form on \(\Sigma _h \times \Sigma _h\). Again by the Cauchy–Schwarz inequality,

$$\begin{aligned} b(\varvec{\tau }; v, \varvec{\eta }) \le \Vert \varvec{\tau }\Vert _{0,h} \Vert (v, \varvec{\eta }) \Vert _h, \qquad \forall (\varvec{\tau }, v, \varvec{\eta }) \in \Sigma _h \times U_h \times \varGamma _h, \end{aligned}$$
(23)

so \(b(\cdot ; \cdot , \cdot )\) is a bounded bilinear form on \(\Sigma _h \times (U_h \times \varGamma _h)\). Then the well-posedness of (16)–(17) follows by the inf-sup condition in Lemma 2, the equality (15), and the Babuška–Brezzi stability theory [12]. \(\square \)

3.2 Error Analysis

Let \((\varvec{\sigma }, u, \varvec{\gamma })\) and \((\varvec{\sigma }_h, u_h, \varvec{\gamma }_h)\) be the solutions of (5)–(6) and (16)–(17), respectively. Under the regularity assumption of \(\varvec{\sigma }\) and \(u\) in (18), the Babuška–Brezzi stability theory immediately gives

$$\begin{aligned}&\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} + \Vert (u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _{h} \nonumber \\&\quad \le c \inf _{(\varvec{\tau }, v, \varvec{\eta }) \in \Sigma _h \times U_h \times \varGamma _h}( \Vert \varvec{\sigma } - \varvec{\tau } \Vert _{0,h} + \Vert (u - v, \varvec{\gamma } - \varvec{\eta }) \Vert _{h}). \end{aligned}$$
(24)

However, all the error terms in this estimate are of order \(O(h^k)\) for sufficiently regular solutions, so the estimate is not optimal: the full approximability of the finite element spaces allows an \(L^2\)-error of \(\varvec{\sigma }\) of order \(O(h^{k+1})\). Therefore, we present an error analysis which gives optimal error bounds.

We first introduce some interpolation operators. For \(\varvec{\tau } \in H^r(\varOmega ;{\mathbb M}), r > 1/2\), let \(J\) be the interpolation operator defined by (8)–(9), i.e.,

$$\begin{aligned} \int _{e} (J \varvec{\tau }) n \cdot p \, ds&=\int _e \varvec{\tau } n \cdot p \, ds,\quad \forall p \in {\mathcal {P}}_k(e;{\mathbb {V}}),\quad \forall e \in {\mathcal {F}}_\sigma ,\\ \int _{T} J \varvec{\tau } : \varvec{\phi } \, dx&=\int _T \varvec{\tau } : \varvec{\phi } \, dx,\quad \forall \varvec{\phi } \in {\mathcal {P}}_{k-1}(T;{{\mathbb {M}}}),\quad \forall T \in {\mathcal {T}}_h. \end{aligned}$$

For \(v \in H^r(\varOmega ;\mathbb {R}^d), r > 1/2\), we define an interpolation \(I v\) by

$$\begin{aligned} \int _e I v \cdot p \, ds&=\int _e v \cdot p \, ds, \quad \forall p \in {\mathcal {P}}_k(e;{\mathbb {V}}),\quad \forall e \in {\mathcal {F}}_u,\end{aligned}$$
(25)
$$\begin{aligned} \int _T I v \cdot \phi \, dx&=\int _T v \cdot \phi \, dx,\quad \forall \phi \in {\mathcal {P}}_{k-1}(T;{\mathbb {V}}),\quad \forall T \in {\mathcal {T}}_h. \end{aligned}$$
(26)

Let \(K\) be the \(L^2\) projection from \(L^2(\varOmega ; {\mathbb K})\) into \(\varGamma _h\), and let \(P_{i}, 0 \le i \le k\), be the \(L^2\) projection from \(L^2(\varOmega ;{\mathbb {V}})\) into

$$\begin{aligned} \left\{ v \in L^2(\varOmega ; {\mathbb {V}}) \,:\, v|_T \in {\mathcal {P}}_i(T; {\mathbb {V}}), \quad \forall T \in {\mathcal {T}}_h \right\} . \end{aligned}$$

For these interpolation operators and functions with suitable regularity, we have

$$\begin{aligned} \Vert \varvec{\tau }-J \varvec{\tau } \Vert _{0,h}&\le c h^{ \min \{k+1,r\} } \Vert \varvec{\tau } \Vert _r,&r > 1/2, \end{aligned}$$
(27)
$$\begin{aligned} \Vert v-Iv \Vert _{0}&\le c h^{ \min \{k+1,r\} } \Vert v \Vert _r,&r > 1/2,\end{aligned}$$
(28)
$$\begin{aligned} \Vert \varvec{\eta } - K \varvec{\eta } \Vert _0&\le c h^{ \min \{k+1,r\} } \Vert \varvec{\eta } \Vert _r ,&r \ge 0, \end{aligned}$$
(29)
$$\begin{aligned} \Vert (v - Iv, 0) \Vert _{h}&\le c h^{ \min \{k,r-1\} } \Vert v \Vert _r ,&r \ge 1, \end{aligned}$$
(30)
$$\begin{aligned} \Vert v - P_{k-1} v \Vert _0&\le c h^{ \min \{k,r\} } \Vert v \Vert _r,&r \ge 0. \end{aligned}$$
(31)

The estimates (29), (31) are obvious, and the estimates (27), (30), and (28) are consequences of Theorem 3.5 and Theorem 3.4 in [13]. Finally, using the triangle inequality, (29), and (30), one can get

$$\begin{aligned} \Vert (v - Iv, \varvec{\eta } - K \varvec{\eta } ) \Vert _{h} \le c \left( h^{ \min \{k,r-1\} } \Vert v \Vert _r + h^{ \min \{k+1, \tilde{r} \} } \Vert \varvec{\eta } \Vert _{\tilde{r}}\right) , \quad r \ge 1, \tilde{r} \ge 0. \end{aligned}$$
(32)

Optimal convergence rates of the \(L^2\)-errors of \(\varvec{\sigma }\) and \(\varvec{\gamma }\) are obtained by (27), (29), and the following theorem.

Theorem 2

Suppose that \((\varvec{\sigma }, u, \varvec{\gamma }), (\varvec{\sigma }_h, u_h, \varvec{\gamma }_h)\) are solutions of (5)–(6) and (16)–(17), respectively. Suppose also that \(\varvec{\sigma }, u\) satisfy (18). Then the following holds:

$$\begin{aligned} \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0} + \Vert P_{k-1} (u - u_h) \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c ( \Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) . \end{aligned}$$
(33)

Error bounds up to regularity of \((\varvec{\sigma },\varvec{\gamma })\) are immediately obtained by (27) and (29).

Proof

By Lemma 1, \((\varvec{\sigma },u,\varvec{\gamma })\) satisfies (16)–(17). The difference of (16)–(17) with \((\varvec{\sigma },u,\varvec{\gamma })\) and \((\varvec{\sigma }_h, u_h, \varvec{\gamma }_h)\) yields

$$\begin{aligned} (A( \varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) + (u - u_h , {\text {div}}_h\, \varvec{\tau }) - \sum _{e \in {\mathcal {F}}_u} \langle u - u_h, [\![\varvec{\tau } n ]\!]\rangle _e + (\varvec{\gamma } - \varvec{\gamma }_h, \varvec{\tau })&= 0, \end{aligned}$$
(34)
$$\begin{aligned} (\varvec{\sigma } - \varvec{\sigma }_h, \varvec{\eta } - {\text {grad}}_h v) + \sum _{e \in {\mathcal {F}}_{\sigma }} \langle (\varvec{\sigma } - \varvec{\sigma }_h)n , [\![v ]\!]\rangle _e&= 0, \end{aligned}$$
(35)

for any \(\varvec{\tau } \in \Sigma _h, v \in U_h, \varvec{\eta } \in \varGamma _h\).

We first prove an auxiliary estimate

$$\begin{aligned} \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c_0(\Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0 + \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0+ \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) , \end{aligned}$$
(36)

with \(c_0 >0\) independent of mesh sizes. To prove it, let \(P_0(K\varvec{\gamma } - \varvec{\gamma }_h)\) be the \(L^2\) projection of \(K\varvec{\gamma } - \varvec{\gamma }_h\) into the space

$$\begin{aligned} {\mathcal {P}}_0({\mathcal {T}}_h; {\mathbb {K}}) := \{ \varvec{\eta } \in \varGamma \,:\, \varvec{\eta }|_T \in {\mathcal {P}}_0(T; {\mathbb {K}}) \; \forall T \in {\mathcal {T}}_h \}. \end{aligned}$$

It is known (e.g., [6, 21]) that there is \(\varvec{\tau }_0 \in H({\text {div}}, \varOmega ; {\mathbb M})\) such that

$$\begin{aligned}&\varvec{\tau }_0|_T \in {\mathcal {P}}_1(T; {\mathbb M}), \quad \forall T \in {\mathcal {T}}_h, \quad&(\varvec{\tau }_0, \varvec{\eta })&= (P_0(K\varvec{\gamma } - \varvec{\gamma }_h), \varvec{\eta }), \quad \forall \varvec{\eta } \in {\mathcal {P}}_0({\mathcal {T}}_h; {\mathbb K}), \\&{\text {div}}\, \varvec{\tau }_0 = 0,&\Vert \varvec{\tau }_0 \Vert _0&\le c \Vert P_0(K \varvec{\gamma } - \varvec{\gamma }_h) \Vert _0. \end{aligned}$$

By Lemma 9 there is \(\varvec{\tau }_1 \in \Sigma _h \cap H({\text {div}}, \varOmega ; {\mathbb M})\) such that \({\text {div}}\, \varvec{\tau }_1 = 0, \Vert \varvec{\tau }_1 \Vert _0 \le c \Vert K \varvec{\gamma } - \varvec{\gamma }_h - {\text {skw }}\varvec{\tau }_0 \Vert _0\) and \({\text {skw }}\varvec{\tau }_1 = K \varvec{\gamma } - \varvec{\gamma }_h - {\text {skw }}\varvec{\tau }_0\). Letting \(\varvec{\tau } = \varvec{\tau }_0 + \varvec{\tau }_1\), one can check that \({\text {div}}\, \varvec{\tau } = 0, [\![\varvec{\tau } n ]\!]|_e = 0\) for \(e \in {\mathcal {F}}_u, {\text {skw }}\varvec{\tau } = K \varvec{\gamma } - \varvec{\gamma }_h\), and

$$\begin{aligned} \Vert \varvec{\tau } \Vert _0 \le \Vert \varvec{\tau }_0 \Vert _0 + \Vert \varvec{\tau }_1 \Vert _0 \le c (\Vert K \varvec{\gamma } - \varvec{\gamma }_h \Vert _0 + \Vert \varvec{\tau }_0 \Vert _0) \le c \Vert K \varvec{\gamma } - \varvec{\gamma }_h \Vert _0. \end{aligned}$$

If we put this \(\varvec{\tau }\) in (34), then

$$\begin{aligned} 0&= (A( \varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) + (\varvec{\gamma } - \varvec{\gamma }_h, \varvec{\tau }) \\ \nonumber&= (A( \varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) + (\varvec{\gamma } - K \varvec{\gamma }, \varvec{\tau })+ \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0^2, \end{aligned}$$
(37)

where the second identity is obtained by

$$\begin{aligned} (\varvec{\gamma }-\varvec{\gamma }_h,\varvec{\tau })=(\varvec{\gamma }-K \varvec{\gamma },\varvec{\tau }) +(K \varvec{\gamma }-\varvec{\gamma }_h,\varvec{\tau }) \end{aligned}$$

combined with \((K \varvec{\gamma }-\varvec{\gamma }_h,{\text {sym }}\varvec{\tau })=0\) and \({\text {skw }}\varvec{\tau } = K \varvec{\gamma } - \varvec{\gamma }_h\). The estimate (36) follows by the above equation combined with the Cauchy–Schwarz inequality, \(\Vert \varvec{\tau } \Vert _0 \le c \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0\), the triangle inequality, and \(\Vert A \varvec{\tau } \Vert _0 \le c \Vert \varvec{\tau } \Vert _0\) for \(\varvec{\tau } \in L^2(\varOmega ; {\mathbb M})\).

We now make two observations. First, the definition of \(J \varvec{\sigma }\) and (35) with \(\varvec{\eta } = 0\) give

$$\begin{aligned} 0&= -(J\varvec{\sigma } - \varvec{\sigma }_h, {\text {grad}}_h v) + \sum _{e \in {\mathcal {F}}_{\sigma }} \langle (J\varvec{\sigma } - \varvec{\sigma }_h)n , [\![v ]\!]\rangle _e, \qquad \forall v \in U_h. \end{aligned}$$
(38)

Second, the definition of \(I u\) and (34) give

$$\begin{aligned} (A( \varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) + (Iu - u_h , {\text {div}}_h\, \varvec{\tau }) - \sum _{e \in {\mathcal {F}}_u} \langle Iu - u_h, [\![\varvec{\tau } n ]\!]\rangle _e + (\varvec{\gamma } - \varvec{\gamma }_h, \varvec{\tau }) = 0. \end{aligned}$$

Taking \(\varvec{\tau } = J \varvec{\sigma } - \varvec{\sigma }_h\) in this equation, and considering (38) with \(v = Iu - u_h\) and (15), we have

$$\begin{aligned} (A( \varvec{\sigma } - \varvec{\sigma }_h), J \varvec{\sigma } - \varvec{\sigma }_h) + (\varvec{\gamma } - \varvec{\gamma }_h, J \varvec{\sigma } - \varvec{\sigma }_h) = 0. \end{aligned}$$

This is equivalent to

$$\begin{aligned} (A( J\varvec{\sigma } - \varvec{\sigma }_h), J \varvec{\sigma } - \varvec{\sigma }_h)&= -(A( \varvec{\sigma } - J\varvec{\sigma }), J \varvec{\sigma } - \varvec{\sigma }_h) \nonumber \\&\quad - (\varvec{\gamma } - K \varvec{\gamma }, J \varvec{\sigma } - \varvec{\sigma }_h) - (K \varvec{\gamma } - \varvec{\gamma }_h, J \varvec{\sigma } - \varvec{\sigma }_h) \nonumber \\&= -(A( \varvec{\sigma } - J\varvec{\sigma }), J \varvec{\sigma } - \varvec{\sigma }_h) \nonumber \\&\quad - (\varvec{\gamma } - K \varvec{\gamma }, J \varvec{\sigma } - \varvec{\sigma }_h) - (K \varvec{\gamma } - \varvec{\gamma }_h, J \varvec{\sigma } - \varvec{\sigma }), \end{aligned}$$
(39)

where the last equality is due to \((K \varvec{\gamma } - \varvec{\gamma }_h, \varvec{\sigma }_h) = (K \varvec{\gamma } - \varvec{\gamma }_h, \varvec{\sigma })\) which is obtained from (35) with \(v=0, \varvec{\eta } = K\varvec{\gamma } - \varvec{\gamma }_h\). With this observation, the coercivity of \(A\), the Cauchy–Schwarz inequality, and Young’s inequality, we have

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2&\le C_{\epsilon } \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0^2\right) + \epsilon \left( \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2 + \Vert K \varvec{\gamma } - \varvec{\gamma }_h \Vert _0^2\right) , \end{aligned}$$

for any \(\epsilon >0\) with an \(\epsilon \)-dependent constant \(C_{\epsilon } >0\). Using (36) and taking \(\epsilon \) sufficiently small, one can get

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2&\le C_{\epsilon }' \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0^2\right) + \frac{1}{2} \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2 , \end{aligned}$$

with another constant \(C_{\epsilon }'\), so \(\Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0 \le c (\Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0)\). The combination of this and the estimate (36) yields

$$\begin{aligned} \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 + \Vert K \varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c ( \Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0 ), \end{aligned}$$

and the assertion (33) for \(\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0\) and \(\Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0\) follows by the triangle inequality.
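The absorption step above can be made explicit as follows. This is only a sketch, and the specific choice of \(\epsilon \) is one of many possible. Squaring (36) and using \((a+b+c)^2 \le 3(a^2+b^2+c^2)\) gives

$$\begin{aligned} \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0^2 \le 3 c_0^2 \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0^2 + \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0^2 \right) . \end{aligned}$$

Inserting this into the estimate obtained from Young's inequality yields

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2 \le (C_{\epsilon } + 3 \epsilon c_0^2) \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0^2\right) + \epsilon (1 + 3 c_0^2) \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _0^2 , \end{aligned}$$

so the choice \(\epsilon = 1/(2(1+3c_0^2))\) gives the stated bound with \(C_{\epsilon }' = C_{\epsilon } + 3 \epsilon c_0^2\).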

To complete the proof, we need to show (33) for \(\Vert P_{k-1} (u - u_h) \Vert _0\). Note that the space of \(d\)-tuples of the Brezzi–Douglas–Marini (BDM) element of order \(k\) is nothing but \(\Sigma _h^{BDM} := \Sigma _h \cap H({\text {div}}, \varOmega ; {\mathbb {M}})\), and it is known that there is \(\varvec{\tau } \in \Sigma _h^{BDM}\) such that \({\text {div}}\, \varvec{\tau } = P_{k-1}(u - u_h)\) and \(\Vert \varvec{\tau } \Vert _0 \le c \Vert P_{k-1}(u - u_h) \Vert _0\) (see [11, 12]). Inserting this \(\varvec{\tau }\) in (34) we have

$$\begin{aligned} (A(\varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) + \Vert P_{k-1}(u - u_h) \Vert _0^2 + (\varvec{\gamma } - \varvec{\gamma }_h, \varvec{\tau }) = 0. \end{aligned}$$

By the Cauchy–Schwarz inequality and \(\Vert \varvec{\tau } \Vert _0 \le c \Vert P_{k-1}(u - u_h) \Vert _0\), one can get

$$\begin{aligned} \Vert P_{k-1}(u - u_h) \Vert _0 \le c(\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0). \end{aligned}$$

Combining this with the previous result of \(\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0\), the assertion follows. \(\square \)
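For illustration, the explicit convergence rate implied by (33) can be written out. The sketch below assumes \(\varvec{\sigma }, \varvec{\gamma } \in H^r\) with \(r > 1/2\) and that \(\Vert \varvec{\tau } \Vert _0 \le \Vert \varvec{\tau } \Vert _{0,h}\), so that (27) controls \(\Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0\). Combining (33) with (27) and (29),

$$\begin{aligned} \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0} + \Vert P_{k-1} (u - u_h) \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0&\le c ( \Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _{0,h} + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) \\&\le c h^{\min \{k+1,r\}} ( \Vert \varvec{\sigma } \Vert _r + \Vert \varvec{\gamma } \Vert _r ). \end{aligned}$$

In particular, for \(r \ge k+1\) the three errors on the left-hand side are of order \(O(h^{k+1})\).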

The estimate (33) does not give an optimal \(L^2\)-error bound of \(u\) with respect to the full approximability of \(U_h\). Now we show that the optimal order of the \(L^2\)-error of \(u\) is obtained by the Aubin–Nitsche trick when the domain \(\varOmega \) satisfies the elliptic regularity assumption, namely, if the triple \(\tilde{\varvec{\sigma }} \in L^2(\varOmega ; {\mathbb M}), \tilde{u} \in H_0^1(\varOmega ; {\mathbb {V}}), \tilde{\varvec{\gamma }} \in L^2(\varOmega ; {\mathbb K})\) is a solution of the problem

$$\begin{aligned} (A \tilde{\varvec{\sigma }}, \varvec{\tau }) + ( \tilde{\varvec{\gamma }}-{\text {grad}}\, \tilde{u}, \varvec{\tau })&= 0,&\varvec{\tau } \in L^2(\varOmega ; {\mathbb M}), \end{aligned}$$
(40)
$$\begin{aligned} -( \varvec{\eta } - {\text {grad}}\, v, \tilde{\varvec{\sigma }})&= (g, v),&(v, \varvec{\eta }) \in H_0^1(\varOmega ; {\mathbb {V}}) \times L^2(\varOmega ; {\mathbb K}), \end{aligned}$$
(41)

for \(g \in L^2(\varOmega ; {\mathbb {V}})\), then

$$\begin{aligned} \Vert \tilde{\varvec{\sigma }} \Vert _1 + \Vert \tilde{u} \Vert _2 + \Vert \tilde{\varvec{\gamma }} \Vert _1 \le c \Vert g \Vert _0 \end{aligned}$$
(42)

holds with \(c>0\) depending only on \(\varOmega \) and \(A\). Here \(H_0^1(\varOmega ;{\mathbb {V}})\) is the subspace of \(H^1(\varOmega ; {\mathbb {V}})\) with vanishing traces [16]. There are several known results on domains satisfying this regularity assumption. For example, it is known that a convex polygonal domain in two dimensions [9] and a domain with \(C^2\) boundary in \({\mathbb {R}}^d, d=2,3\) [35, Theorem 7.1], satisfy the assumption.

Theorem 3

Suppose that \((\varvec{\sigma }, u, \varvec{\gamma })\) and \((\varvec{\sigma }_h, u_h, \varvec{\gamma }_h)\) are given as in Theorem 2 and \(\varOmega \) satisfies the elliptic regularity assumption. Then

$$\begin{aligned} \Vert u - u_h \Vert _0 \le ch (\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} + \Vert (u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _h) \end{aligned}$$

holds with \(c>0\) independent of \(h\). In particular, assuming that the solution \((\varvec{\sigma }, u, \varvec{\gamma })\) is sufficiently regular, this estimate with (24) will give the optimal \(O(h^{k+1})\) error bound for \(\Vert u - u_h \Vert _0\).

Proof

Let \((\tilde{\varvec{\sigma }}, \tilde{u}, \tilde{\varvec{\gamma }})\) be the solution of (40)–(41) for \(g = u - u_h\). Recalling the definition of \(b(\varvec{\tau };v,\varvec{\eta })\) in (14), we have

$$\begin{aligned} ( \tilde{\varvec{\gamma }}-{\text {grad}}_h\, \tilde{u}, \varvec{\tau })=b(\varvec{\tau };\tilde{u},\tilde{\varvec{\gamma }}), \qquad \forall \varvec{\tau }\in \Sigma _h, \end{aligned}$$
(43)

because \(\tilde{u}\) is in \(H^1(\varOmega ;{\mathbb {V}})\). Since the variational equation (41) implies that \(- {\text {div}}\, \tilde{\varvec{\sigma }} = u - u_h\), and \((\tilde{\varvec{\sigma }}, \varvec{\gamma } - \varvec{\gamma }_h)=0\) by choosing \(\varvec{\eta }=\varvec{\gamma }-\varvec{\gamma }_h\) and \(v=0\), integration by parts gives

$$\begin{aligned} \Vert u - u_h \Vert _0^2&= (-{\text {div}}\, \tilde{\varvec{\sigma }}, u-u_h) - (\tilde{\varvec{\sigma }}, \varvec{\gamma } - \varvec{\gamma }_h) \\&= (\tilde{\varvec{\sigma }}, {\text {grad}}_h (u- u_h) - (\varvec{\gamma } - \varvec{\gamma }_h)) - \sum _{e \in {\mathcal {F}}_\sigma } \langle \tilde{\varvec{\sigma }} n, [\![u - u_h ]\!]\rangle _e \\&= - b(\tilde{\varvec{\sigma }}; u - u_h, \varvec{\gamma } - \varvec{\gamma }_h). \end{aligned}$$

If we take \(\varvec{\tau } = \varvec{\sigma } - \varvec{\sigma }_h\) in (40), combine the result with the above equation, and use the identity (43), then we have

$$\begin{aligned} -a(\tilde{\varvec{\sigma }} , \varvec{\sigma } - \varvec{\sigma }_h) - b( \varvec{\sigma } - \varvec{\sigma }_h; \tilde{u} , \tilde{\varvec{\gamma }}) -b(\tilde{\varvec{\sigma }} ; u - u_h, \varvec{\gamma } - \varvec{\gamma }_h)&= \Vert u - u_h \Vert _0^2 . \end{aligned}$$

By the Galerkin orthogonality, see (34) and (35), we have

$$\begin{aligned}&-a(\tilde{\varvec{\sigma }} - J \tilde{\varvec{\sigma }}, \varvec{\sigma } - \varvec{\sigma }_h) - b( \varvec{\sigma } - \varvec{\sigma }_h; \tilde{u} - I \tilde{u}, \tilde{\varvec{\gamma }} - K\tilde{\varvec{\gamma }}) \nonumber \\&\quad -b(\tilde{\varvec{\sigma }} - J \tilde{\varvec{\sigma }}; u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) = \Vert u - u_h \Vert _0^2 . \end{aligned}$$
(44)

By (23) and the Cauchy–Schwarz inequality, we obtain

$$\begin{aligned} | a(\tilde{\varvec{\sigma }} - J \tilde{\varvec{\sigma }}, \varvec{\sigma } - \varvec{\sigma }_h) |&\le ch \Vert \tilde{\varvec{\sigma }} \Vert _1 \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 \le ch \Vert \tilde{\varvec{\sigma }} \Vert _1 \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h}, \\ | b( \varvec{\sigma } - \varvec{\sigma }_h ; \tilde{u} - I \tilde{u}, \tilde{\varvec{\gamma }} - K\tilde{\varvec{\gamma }}) |&\le \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} \Vert ( \tilde{u} - {I} \tilde{u}, \tilde{\varvec{\gamma }} - K\tilde{\varvec{\gamma }} )\Vert _h \\&\le ch \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} (\Vert \tilde{\varvec{\gamma }} \Vert _1 + \Vert \tilde{u} \Vert _2) , \quad (\text {by} (32)) \\ | b(\tilde{\varvec{\sigma }} - J \tilde{\varvec{\sigma }}; u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) |&\le \Vert \tilde{\varvec{\sigma }} - J \tilde{\varvec{\sigma }} \Vert _{0,h} \Vert (u -u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _h \\&\le ch \Vert \tilde{\varvec{\sigma }} \Vert _1 \Vert (u -u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _h . \quad (\text {by }(27)) \end{aligned}$$

The conclusion follows by (44), the above three estimates, and (42) with \(g = u - u_h\). \(\square \)
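The final combination can be written out as follows; this is a sketch of the routine step, with generic constants \(c\). Bounding the left-hand side of (44) by the three estimates above and then using (42) with \(g = u - u_h\),

$$\begin{aligned} \Vert u - u_h \Vert _0^2&\le ch \left( \Vert \tilde{\varvec{\sigma }} \Vert _1 + \Vert \tilde{u} \Vert _2 + \Vert \tilde{\varvec{\gamma }} \Vert _1 \right) \left( \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} + \Vert (u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _h \right) \\&\le ch \Vert u - u_h \Vert _0 \left( \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _{0,h} + \Vert (u - u_h, \varvec{\gamma } - \varvec{\gamma }_h) \Vert _h \right) , \end{aligned}$$

and dividing by \(\Vert u - u_h \Vert _0\) (when it is nonzero) gives the claim. If the right-hand side of (24) is \(O(h^k)\), the resulting bound is \(h \cdot O(h^k) = O(h^{k+1})\).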

Finally, we remark that the numerical solution \(\varvec{\sigma }_h\) is indeed symmetric. This symmetry follows from the orthogonality between \(\varvec{\sigma }_h\) and \(\varGamma _h\), and the fact that \(\Sigma _h\) and \(\varGamma _h\) are spaces of polynomials of the same degree on each element \(T \in {\mathcal {T}}_h\).

3.3 Robustness in Nearly Incompressible Materials and Pure Traction Boundary Conditions

In finite element methods for linear elasticity, it is known that the energy minimization approach with standard continuous finite elements suffers from volumetric locking for nearly incompressible materials [8]. When volumetric locking occurs, the constants in error estimates may grow unboundedly as the incompressibility of the material increases. It is known that mixed methods based on the Hellinger–Reissner formulation of linear elasticity are locking-free [5]. We will prove that our method, which uses a variant of the Hellinger–Reissner formulation, is locking-free as well. For the compliance tensor of the form in (2), nearly incompressible materials have very large \(\lambda \) and the incompressible limit corresponds to \(\lambda = + \infty \). Therefore, in order to show that a numerical scheme is free from locking, we need to obtain error estimates whose constants are uniformly bounded as \(\lambda \) grows unboundedly. Note that the error analysis in the proof of Theorem 2 is not enough because the coercivity constant of \(A\) on \(L^2(\varOmega ; {\mathbb {M}})\) converges to 0 as \(\lambda \rightarrow +\infty \). The key idea of our proof is originally from [5], but there is an additional technical difficulty: the proof in [5] requires that numerical stresses are in \(H({\text {div}}, \varOmega ; {{\mathbb {M}}})\), whereas our numerical stress is nonconforming with respect to \(H({\text {div}}, \varOmega ; {{\mathbb {M}}})\).
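The degeneration of the coercivity constant can be observed numerically. The following minimal Python sketch is not part of the method: all function names are hypothetical and chosen only for illustration, and \(A\) is assumed to act on constant \(d \times d\) matrices by the formula (2). It evaluates the quotient \((A \varvec{\tau } : \varvec{\tau }) / (\varvec{\tau } : \varvec{\tau })\) for \(\varvec{\tau } = \varvec{I}\), which equals \(1/(2\mu + d\lambda )\), and for a trace-free matrix, where it stays equal to \(1/(2\mu )\).

```python
# Illustrative sketch (hypothetical helper names, not from the paper):
# the isotropic compliance map of the form (2),
#   A tau = (1 / (2 mu)) * (tau - lam / (2 mu + d lam) * tr(tau) * I),
# applied to constant d x d matrices stored as nested lists.

def identity(d):
    """Return the d x d identity matrix."""
    return [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

def trace(tau):
    """Sum of the diagonal entries."""
    return sum(tau[i][i] for i in range(len(tau)))

def frob_inner(a, b):
    """Frobenius inner product a : b."""
    d = len(a)
    return sum(a[i][j] * b[i][j] for i in range(d) for j in range(d))

def apply_compliance(tau, mu, lam):
    """Apply the compliance map of the form (2) to tau."""
    d = len(tau)
    coef = lam / (2.0 * mu + d * lam)
    tr = trace(tau)
    return [[(tau[i][j] - (coef * tr if i == j else 0.0)) / (2.0 * mu)
             for j in range(d)] for i in range(d)]

def deviator(tau):
    """Trace-free part tau^D = tau - (1/d) tr(tau) I."""
    d = len(tau)
    tr = trace(tau)
    return [[tau[i][j] - (tr / d if i == j else 0.0) for j in range(d)]
            for i in range(d)]

def rayleigh_quotient(tau, mu, lam):
    """Quotient (A tau : tau) / (tau : tau) for a nonzero matrix tau."""
    return frob_inner(apply_compliance(tau, mu, lam), tau) / frob_inner(tau, tau)

if __name__ == "__main__":
    mu, d = 1.0, 2
    shear = [[0.0, 1.0], [1.0, 0.0]]  # a trace-free test matrix
    for lam in (1.0, 1e2, 1e4, 1e6):
        # The quotient on I decays like 1/(2 mu + d lam); on shear it does not.
        print(lam, rayleigh_quotient(identity(d), mu, lam),
              rayleigh_quotient(shear, mu, lam))
```

The quotient on the identity direction tends to zero as \(\lambda \) grows, which is why the analysis below works with the deviatoric part and the trace of the stress error separately.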

In the rest of this section, we assume that compliance tensors have the form in (2) and \(\varLambda _D = \partial \varOmega \), as in [5]. Recall that we have extended the compliance tensor, originally defined only on symmetric tensors, to general tensors. This extension is not necessarily unique. However, for simplicity, we will assume that the extended compliance tensor is of the form (2) for \(\varvec{\tau }\in L^2(\varOmega ; {\mathbb {M}})\). Before we state the main result, let us introduce preliminary results for the proof. For \(\varvec{\tau } \in L^2(\varOmega ; {\mathbb M})\) let \(\varvec{\tau }^D := \varvec{\tau } - (1/d) {\text {tr }}(\varvec{\tau }) \varvec{I}\), where \(d\) is the dimension of the Euclidean space \({\mathbb {R}}^d\). The following lemma is proved in [5].

Lemma 3

For \(\varvec{\tau } \in L^2(\varOmega ; {\mathbb M})\) and \(A\) of the form in (2), the inequality

$$\begin{aligned} \Vert \varvec{\tau }^D \Vert _0 \le c \Vert \varvec{\tau } \Vert _A \end{aligned}$$
(45)

holds with \(c\) depending only on \(\mu \) and \(d\), where \(\Vert \varvec{\tau } \Vert _A = (A \varvec{\tau }, \varvec{\tau })^{1/2}\).
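A pointwise computation makes the constant in (45) explicit. This is a sketch under the assumption that \(A\) acts by (2) on all of \({\mathbb {M}}\) with \(\lambda \ge 0\), and the argument in [5] may be organized differently. Since \(\varvec{\tau } = \varvec{\tau }^D + (1/d) {\text {tr }}(\varvec{\tau }) \varvec{I}\) with \(\varvec{\tau }^D : \varvec{I} = 0\), we have \(|\varvec{\tau }|^2 = |\varvec{\tau }^D|^2 + (1/d) ({\text {tr }}\varvec{\tau })^2\), and therefore

$$\begin{aligned} A \varvec{\tau } : \varvec{\tau }&= \frac{1}{2\mu } \left( |\varvec{\tau }|^2 - \frac{\lambda }{2\mu + d\lambda } ({\text {tr }}\varvec{\tau })^2 \right) \\&= \frac{1}{2\mu } |\varvec{\tau }^D|^2 + \frac{1}{d (2\mu + d\lambda )} ({\text {tr }}\varvec{\tau })^2 \ge \frac{1}{2\mu } |\varvec{\tau }^D|^2 . \end{aligned}$$

Integrating over \(\varOmega \) gives \(\Vert \varvec{\tau }^D \Vert _0 \le \sqrt{2\mu }\, \Vert \varvec{\tau } \Vert _A\), a bound that does not depend on \(\lambda \).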

We remark that the interpolation \(I:H^1(\varOmega ; {\mathbb {V}}) \rightarrow U_h\) defined by (25) and (26), satisfies

$$\begin{aligned}&-(v - {I} v, {\text {div}}_h\, \varvec{\tau }) + \sum _{e \in {\mathcal {F}}_u} \langle v - {I} v, [\![\varvec{\tau } n ]\!]\rangle _e = 0, \qquad \forall \varvec{\tau } \in \Sigma _h, \end{aligned}$$
(46)
$$\begin{aligned}&\quad \Vert v - {I} v \Vert _0 \le ch \Vert v \Vert _1. \end{aligned}$$
(47)
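A short sketch of why (46) holds may be helpful. It assumes that each \(\varvec{\tau } \in \Sigma _h\) is a piecewise polynomial of degree at most \(k\), so that \({\text {div}}_h\, \varvec{\tau }|_T \in {\mathcal {P}}_{k-1}(T; {\mathbb {V}})\) for \(T \in {\mathcal {T}}_h\) and \([\![\varvec{\tau } n ]\!]|_e \in {\mathcal {P}}_k(e; {\mathbb {V}})\) for \(e \in {\mathcal {F}}_u\). Then the defining conditions (26) and (25) of \(I\) give, respectively,

$$\begin{aligned} (v - {I} v, {\text {div}}_h\, \varvec{\tau })&= \sum _{T \in {\mathcal {T}}_h} \int _T (v - {I} v) \cdot {\text {div}}\, \varvec{\tau } \, dx = 0, \\ \langle v - {I} v, [\![\varvec{\tau } n ]\!]\rangle _e&= \int _e (v - {I} v) \cdot [\![\varvec{\tau } n ]\!]\, ds = 0, \qquad \forall e \in {\mathcal {F}}_u, \end{aligned}$$

so both terms in (46) vanish separately.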

Here is the main result of the subsection.

Theorem 4

Suppose that \(A\) is of the form in (2) with positive constants \(\lambda , \mu \), and suppose also that \((\varvec{\sigma }, u, \varvec{\gamma }), (\varvec{\sigma }_h, u_h, \varvec{\gamma }_h)\) are solutions of (5)–(6) and (16)–(17), satisfying (18). Then

$$\begin{aligned} \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 + \Vert P_{k-1} (u - u_h) \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c h^{\min \{r,k+1 \}} (\Vert \varvec{\sigma } \Vert _r + \Vert \varvec{\gamma } \Vert _r) \end{aligned}$$

for \(r > 1/2\), holds with a constant \(c\) which is uniformly bounded as \(\lambda \rightarrow + \infty \).

The proof of this theorem is lengthy, so we split it into two lemmas.

Lemma 4

Under the same assumptions as in Theorem 4, we have

$$\begin{aligned} \Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0 + \Vert P_{k-1} (u - u_h) \Vert _0 + \Vert \varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c (\Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) , \end{aligned}$$

with \(c\) uniformly bounded as \(\lambda \rightarrow + \infty \).

Proof

We begin with deriving a necessary result for the estimate of \(\Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0\). Clearly, \(\Vert \varvec{\tau }^D \Vert _0 \le c \Vert \varvec{\tau } \Vert _0\) holds. Then by the triangle inequality and Lemma 3,

$$\begin{aligned} \Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0&\le \Vert (\varvec{\sigma } - J \varvec{\sigma })^D \Vert _0 + \Vert (J \varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0 \\&\le c( \Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0 + \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A ). \end{aligned}$$

By (27) there is nothing to do for \(\Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0\) and the desired estimate of \(\Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0\) follows from a suitable estimate of \(\Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A\).

To complete the proof we repeat the proof of Theorem 2 with some changes, and we only explain the necessary changes because most of the steps are the same. To estimate \(\Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A\) we first note that

$$\begin{aligned} (A \varvec{\tau }, \varvec{\eta }) = (2 \mu )^{-1} ( \varvec{\tau }, \varvec{\eta }) \quad \text { and } \quad (\varvec{\tau }, \varvec{\eta }) = (\varvec{\tau }^D, \varvec{\eta }), \end{aligned}$$
(48)

for \(\varvec{\tau } \in L^2(\varOmega ; {\mathbb M})\) and \(\varvec{\eta } \in L^2(\varOmega ; {\mathbb K})\), due to skew-symmetry of \(\varvec{\eta }\). Note also that \(\Vert \cdot \Vert _A \le c \Vert \cdot \Vert _0\) holds with \(c>0\) independent of \(\lambda \). We then claim that

$$\begin{aligned} \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0 \le c_0(\Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0 + \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _A + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) , \end{aligned}$$
(49)

which is an analogue of (36) but with \(c_0\) independent of \(\lambda \). This estimate follows easily from (37) with the same \(\varvec{\tau }\) by

$$\begin{aligned} \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0^2&= - (A( \varvec{\sigma } - \varvec{\sigma }_h), \varvec{\tau }) - (\varvec{\gamma } - K \varvec{\gamma }, \varvec{\tau }) \\&\le \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _A \Vert \varvec{\tau } \Vert _A + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0 \Vert \varvec{\tau } \Vert _0 \\&\le c(\Vert \varvec{\sigma } - J \varvec{\sigma } \Vert _0 + \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0) \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0 , \end{aligned}$$

in which we used the Cauchy–Schwarz inequality with \(\Vert \cdot \Vert _A\) and \(\Vert \cdot \Vert _0\), the triangle inequality, and two facts, \(\Vert \cdot \Vert _A \le c \Vert \cdot \Vert _0\) and \(\Vert \varvec{\tau } \Vert _0 \le c \Vert K\varvec{\gamma } - \varvec{\gamma }_h \Vert _0\). The equalities (39) and (48) give

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _A^2&= -(A( \varvec{\sigma } - J\varvec{\sigma }), J \varvec{\sigma } - \varvec{\sigma }_h) \\&\quad - 2 \mu (\varvec{\gamma } - K \varvec{\gamma }, A(J \varvec{\sigma } - \varvec{\sigma }_h)) - 2 \mu (K \varvec{\gamma } - \varvec{\gamma }_h, A(J \varvec{\sigma } - \varvec{\sigma })) . \end{aligned}$$

Young’s inequality with \(\Vert \cdot \Vert _A\) norm gives

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _A^2 \le C_{\epsilon } \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _A^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _A^2\right) + \epsilon \left( \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A^2 + \Vert K \varvec{\gamma } - \varvec{\gamma }_h \Vert _A^2\right) . \end{aligned}$$

Using \(\Vert \cdot \Vert _A \le c \Vert \cdot \Vert _0\) and the estimate (49), and taking \(\epsilon \) sufficiently small,

$$\begin{aligned} \Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _A^2&\le C_{\epsilon }' \left( \Vert \varvec{\sigma } - J\varvec{\sigma } \Vert _0^2 + \Vert \varvec{\gamma } - K \varvec{\gamma } \Vert _0^2\right) + \frac{1}{2} \Vert J \varvec{\sigma } - \varvec{\sigma }_h \Vert _A^2 . \end{aligned}$$

This leads to the desired estimate of \(\Vert J\varvec{\sigma } - \varvec{\sigma }_h \Vert _A\). The rest of the proof is completely analogous to that of Theorem 2, so we omit details. \(\square \)

Lemma 4 gives the \(L^2\)-error estimates of \(u\) and \(\varvec{\gamma }\) in Theorem 4. We now prove the estimate of \(\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0\) in Theorem 4.

Lemma 5

Under the same assumptions as in Theorem 4, we have

$$\begin{aligned} \Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 \le c h^{\min \{r,k+1 \}} (\Vert \varvec{\sigma } \Vert _r + \Vert \varvec{\gamma } \Vert _r) , \quad r > \frac{1}{2}, \end{aligned}$$

with \(c\) uniformly bounded as \(\lambda \rightarrow + \infty \).

Proof

By the triangle inequality \(\Vert \varvec{\sigma } - \varvec{\sigma }_h \Vert _0 \le \Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0 + \Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h)\varvec{I} \Vert _0\) and by Lemma 4, it is enough to estimate \(\Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \Vert _0\). If we take \(\varvec{\tau } = \varvec{I}\) in (34), then

$$\begin{aligned} (A(\varvec{\sigma } - \varvec{\sigma }_h), \varvec{I}) = \frac{1}{2 \mu + d \lambda } \int _{\varOmega } {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h)\, dx = 0, \end{aligned}$$

so \({\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h)\) is mean-value zero. It is known (see, e.g., [18]) that there is \(w \in H_0^1(\varOmega ; {\mathbb {V}})\) such that

$$\begin{aligned} {\text {div}}^* w = {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h), \qquad \Vert w \Vert _1 \le c \Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \Vert _0, \end{aligned}$$
(50)

where \({\text {div}}^*\) stands for the column-wise divergence for the \({\mathbb {V}}\)-valued function \(w\). Then

$$\begin{aligned} \Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \Vert _0^2&= ( {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h), {\text {div}}^* w ) \nonumber \\&= ( {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \varvec{I} , {\text {grad}}\, w) \nonumber \\&= d ( (\varvec{\sigma } - \varvec{\sigma }_h) - (\varvec{\sigma } - \varvec{\sigma }_h)^D, {\text {grad}}\, w). \end{aligned}$$
(51)

By the Cauchy–Schwarz inequality and the inequality in (50),

$$\begin{aligned} \left( (\varvec{\sigma } - \varvec{\sigma }_h)^D, {\text {grad}}\, w\right) \le c \Vert (\varvec{\sigma } - \varvec{\sigma }_h)^D \Vert _0 \Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \Vert _0 . \end{aligned}$$
(52)

We now claim that

$$\begin{aligned} ( \varvec{\sigma } - \varvec{\sigma }_h, {\text {grad}}\, w) \le c h^{\min \{r,k+1 \}} \Vert \varvec{\sigma } \Vert _r \Vert {\text {tr }}(\varvec{\sigma } - \varvec{\sigma }_h) \Vert _0 , \qquad r > \frac{1}{2}, \end{aligned}$$
(53)

and the proof will be completed by combining this with (51), (52), and Lemma 4.

To prove (53), notice first that we can obtain the following from (35) by integrating by parts and taking \(\varvec{\eta } = \varvec{0}\):

$$\begin{aligned} ({\text {div}}_h(\varvec{\sigma } - \varvec{\sigma }_h), v) - \sum _{e \in {\mathcal {F}}_u} \langle [\![(\varvec{\sigma } - \varvec{\sigma }_h)n ]\!], v \rangle _e = 0, \qquad \forall v \in U_h. \end{aligned}$$
(54)

By integration by parts and (54),

$$\begin{aligned} ( \varvec{\sigma } - \varvec{\sigma }_h, {\text {grad}}\, w)&= -({\text {div}}_h (\varvec{\sigma } - \varvec{\sigma }_h), w) + \sum _{e \in {\mathcal {F}}_u} \langle [\![(\varvec{\sigma } - \varvec{\sigma }_h)n ]\!], w \rangle _e \\&= -({\text {div}}_h (\varvec{\sigma } - \varvec{\sigma }_h), w- I w) + \sum _{e \in {\mathcal {F}}_u} \langle [\![(\varvec{\sigma } - \varvec{\sigma }_h)n ]\!], w - {I} w \rangle _e . \end{aligned}$$

Let \(\varvec{\sigma }_h' \in \Sigma _h^{BDM}\) be the object obtained from \(\varvec{\sigma }\) by the canonical interpolation of the BDM element of order \(k\). Then, in the last form of the above, \(\varvec{\sigma }_h\) can be replaced by \(\varvec{\sigma }_h'\) due to (46), and we get

$$\begin{aligned} ( \varvec{\sigma } - \varvec{\sigma }_h, {\text {grad}}\, w) = ({\text {div}}_h (\varvec{\sigma } - \varvec{\sigma }_h'), w- {I} w) . \end{aligned}$$
(55)

Here the jump terms disappear because \(\varvec{\sigma }_h'\) has no normal jump. By integration by parts,

$$\begin{aligned} ({\text {div}}_h (\varvec{\sigma } - \varvec{\sigma }_h'), w- {I} w)&= -((\varvec{\sigma } - \varvec{\sigma }_h'), {\text {grad}}_h ( w- {I} w)) \\&\quad + \sum _{e \in {\mathcal {F}}_{\sigma }} \langle (\varvec{\sigma } - \varvec{\sigma }_h')n, [\![w - Iw ]\!]\rangle _e \\&\le \Vert \varvec{\sigma } - \varvec{\sigma }_h' \Vert _{0,h} \Vert (w - Iw, 0) \Vert _h \\&\le c h^{\min \{r,k+1 \}} \Vert \varvec{\sigma } \Vert _r \Vert w \Vert _1, \end{aligned}$$

where the last inequality follows by an approximation estimate of BDM interpolation [11] and (30). Now (53) follows by (55), the above inequality, and (50). \(\square \)
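The two algebraic identities used in this proof, \((A\varvec{\tau }, \varvec{I}) = {\text {tr }}(\varvec{\tau })/(2\mu + d\lambda )\) pointwise and \({\text {tr }}(\varvec{\tau })\varvec{I} = d(\varvec{\tau } - \varvec{\tau }^D)\), can be checked numerically. The following is a minimal sketch for the homogeneous isotropic compliance tensor only; the helper names (`compliance`, `deviator`, `check_identities`) are illustrative and not part of the method.

```python
import random

def trace(t):
    return sum(t[i][i] for i in range(len(t)))

def inner(a, b):
    """Frobenius inner product a : b of two square matrices."""
    d = len(a)
    return sum(a[i][j] * b[i][j] for i in range(d) for j in range(d))

def compliance(t, mu, lam):
    """Isotropic compliance: A t = (t - lam / (2 mu + d lam) tr(t) I) / (2 mu)."""
    d = len(t)
    shift = lam / (2.0 * mu + d * lam) * trace(t)
    return [[(t[i][j] - (shift if i == j else 0.0)) / (2.0 * mu)
             for j in range(d)] for i in range(d)]

def deviator(t):
    """Deviatoric part t^D = t - (tr(t) / d) I."""
    d = len(t)
    mean = trace(t) / d
    return [[t[i][j] - (mean if i == j else 0.0) for j in range(d)]
            for i in range(d)]

def check_identities(d, mu, lam, seed=0):
    """Return the residuals of the two identities for random test matrices."""
    rng = random.Random(seed)
    raw = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
    tau = [[0.5 * (raw[i][j] + raw[j][i]) for j in range(d)] for i in range(d)]
    # g plays the role of grad w and need not be symmetric.
    g = [[rng.uniform(-1.0, 1.0) for _ in range(d)] for _ in range(d)]
    eye = [[1.0 if i == j else 0.0 for j in range(d)] for i in range(d)]

    # Identity 1: (A tau) : I = tr(tau) / (2 mu + d lam).
    res1 = abs(inner(compliance(tau, mu, lam), eye)
               - trace(tau) / (2.0 * mu + d * lam))

    # Identity 2: (tr(tau) I) : g = d (tau - tau^D) : g.
    dev = deviator(tau)
    diff = [[tau[i][j] - dev[i][j] for j in range(d)] for i in range(d)]
    tr_eye = [[trace(tau) * eye[i][j] for j in range(d)] for i in range(d)]
    res2 = abs(inner(tr_eye, g) - d * inner(diff, g))
    return res1, res2
```

Both residuals stay at round-off level for \(d = 2, 3\), also for large \(\lambda \), which reflects that the factor \(1/(2\mu + d\lambda )\) does not degenerate the mean-value-zero argument.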

As the final topic in this section, we discuss necessary modifications of our numerical scheme for the problems with pure traction boundary conditions. For these boundary conditions the space of rigid body motions on \(\varOmega \), denoted by \(RM(\varOmega )\), is the kernel of the problem, so it needs to be ruled out in the finite element spaces. More precisely, we define

$$\begin{aligned} \widehat{\Sigma }_h&= \{ \varvec{\tau } \in \Sigma _h \,:\, \varvec{\tau }n = 0 \; \text { on } \partial \varOmega \}, \\ \widehat{U}_h&= \left\{ v \in U_h \,:\, \int _{\varOmega } v \cdot w \,dx = 0 \; \text { for } w \in RM(\varOmega ) \right\} . \end{aligned}$$

It is easy to see that \(\Vert \cdot \Vert _{0,h}\) in (12) is a norm on \(\widehat{\Sigma }_h\). To see that \(\Vert (\cdot , \cdot ) \Vert _h\) in (13) is a norm on \(\widehat{U}_h \times \widehat{\varGamma }_h\), we only need to check that \(\Vert ( v, \varvec{\eta }) \Vert _h = 0\) for \((v, \varvec{\eta }) \in \widehat{U}_h \times \widehat{\varGamma }_h\) implies that \(v = \varvec{\eta } = 0\). If \(\Vert ( v, \varvec{\eta }) \Vert _h = 0\), then \((\varvec{\eta } - {\text {grad}}_h\, v)|_T = 0\) for all \(T \in {\mathcal {T}}_h\), and thus \({\text {sym }}{\text {grad}}_h\, v = 0\) because \(\varvec{\eta }\) is skew-symmetric. In other words, \(v|_T\) is a rigid body motion on each \(T \in {\mathcal {T}}_h\). Since \(\Vert ( v, \varvec{\eta }) \Vert _h = 0\) also gives \([\![v ]\!]|_e = 0\) for all \(e \in {\mathcal {F}}_{\sigma }\), \(v\) is a rigid body motion on \(\varOmega \), and therefore \(v=0\) by the definition of \(\widehat{U}_h\). Then \(\varvec{\eta } = 0\) holds because \(\Vert (0, \varvec{\eta }) \Vert _h = 0\).
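The role of \(RM(\varOmega )\) in this argument can be illustrated in two dimensions, where rigid body motions have the form \(w(x,y) = (a_1 - b y, \; a_2 + b x)\). The sketch below checks by central differences that the symmetric gradient of such a field vanishes, while it does not for a simple stretching field. The function names and the sample coefficients are illustrative assumptions.

```python
def rigid_motion(a1, a2, b):
    """Return the 2D rigid body motion w(x, y) = (a1 - b*y, a2 + b*x)."""
    def w(x, y):
        return (a1 - b * y, a2 + b * x)
    return w

def sym_grad(v, x, y, h=1e-5):
    """Central-difference approximation of sym grad v at the point (x, y)."""
    dvdx = [(p - m) / (2.0 * h) for p, m in zip(v(x + h, y), v(x - h, y))]
    dvdy = [(p - m) / (2.0 * h) for p, m in zip(v(x, y + h), v(x, y - h))]
    grad = [[dvdx[0], dvdy[0]],
            [dvdx[1], dvdy[1]]]
    return [[0.5 * (grad[i][j] + grad[j][i]) for j in range(2)]
            for i in range(2)]

# A rigid body motion has zero symmetric gradient (up to round-off).
eps_rigid = sym_grad(rigid_motion(0.3, -1.2, 2.5), 0.4, 0.7)

# A stretching field v = (x, 0) is not rigid: its (1,1) strain is one.
eps_stretch = sym_grad(lambda x, y: (x, 0.0), 0.4, 0.7)
```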

4 Numerical Results

For our staggered DG method, we test a simple model problem in two dimensions with the exact solution,

$$\begin{aligned} u(x,y)=\begin{pmatrix} \sin (\pi x) \sin (\pi y) \\ x(1-x)\, y(1-y) \end{pmatrix},\quad \forall (x,y) \in \varOmega , \end{aligned}$$
(56)

where the computational domain \(\varOmega \) is the unit square \([0, 1] \times [0, 1] \subset {\mathbb {R}}^2\).
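For a manufactured solution such as (56), the body force follows from (1) with \(\varvec{\sigma } = 2\mu \epsilon (u) + \lambda {\text {tr }}(\epsilon (u)) \varvec{I}\), that is, \(f = -\mu \Delta u - (\mu + \lambda ) {\text {grad}}({\text {div}}\, u)\). The sketch below reads each component of (56) as a product, evaluates this formula by hand-computed derivatives, and cross-checks it against nested central differences. The function names and the step size are illustrative assumptions, not part of the implementation used for the experiments.

```python
import math

def u(x, y):
    """Exact displacement (56); each component is read as a product."""
    return (math.sin(math.pi * x) * math.sin(math.pi * y),
            x * (1.0 - x) * y * (1.0 - y))

def body_force(x, y, mu=1.0, lam=1.0):
    """Analytic f = -mu * Laplace(u) - (mu + lam) * grad(div u)."""
    s = math.sin(math.pi * x) * math.sin(math.pi * y)
    c = math.cos(math.pi * x) * math.cos(math.pi * y)
    lap_u1 = -2.0 * math.pi ** 2 * s
    lap_u2 = -2.0 * (x * (1.0 - x) + y * (1.0 - y))
    # div u = pi cos(pi x) sin(pi y) + x (1 - x) (1 - 2 y)
    ddiv_dx = -math.pi ** 2 * s + (1.0 - 2.0 * x) * (1.0 - 2.0 * y)
    ddiv_dy = math.pi ** 2 * c - 2.0 * x * (1.0 - x)
    return (-mu * lap_u1 - (mu + lam) * ddiv_dx,
            -mu * lap_u2 - (mu + lam) * ddiv_dy)

def stress(x, y, mu, lam, h=1e-4):
    """sigma = 2 mu eps(u) + lam tr(eps(u)) I, with grad u by central differences."""
    ux = [(p - m) / (2.0 * h) for p, m in zip(u(x + h, y), u(x - h, y))]
    uy = [(p - m) / (2.0 * h) for p, m in zip(u(x, y + h), u(x, y - h))]
    div_u = ux[0] + uy[1]
    s12 = mu * (uy[0] + ux[1])
    return ((2.0 * mu * ux[0] + lam * div_u, s12),
            (s12, 2.0 * mu * uy[1] + lam * div_u))

def body_force_fd(x, y, mu=1.0, lam=1.0, h=1e-4):
    """Row-wise -div sigma by central differences of the stress."""
    sxp, sxm = stress(x + h, y, mu, lam), stress(x - h, y, mu, lam)
    syp, sym = stress(x, y + h, mu, lam), stress(x, y - h, mu, lam)
    return tuple(-((sxp[i][0] - sxm[i][0]) + (syp[i][1] - sym[i][1])) / (2.0 * h)
                 for i in range(2))
```

The same check also confirms that \(u\) vanishes on \(\partial \varOmega \), consistent with the homogeneous Dirichlet condition used in the experiments.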

The computational domain is divided into a conforming triangulation \({\mathcal {M}}_h\) and then each triangle \(M\) in \({\mathcal {M}}_h\) is subdivided into three small triangles to form the resulting triangulation \({\mathcal {T}}_h\). All the finite element functions are piecewise polynomials in the triangulation \({\mathcal {T}}_h\) with partial continuity across the edges of the triangulation as described in Sect. 3.1. The order of polynomials of approximation spaces is taken to be \(k=1\) in the following experiments for simplicity.

In our experiments we test the algorithm on both structured and unstructured meshes with decreasing mesh size. We consider the exact solution in (56) with the material parameters \(\mu =1\) and \(\lambda =1\). In the structured mesh, the computational domain is divided into \(N \times N\) uniform squares and then each square is divided into two triangles to obtain the triangulation \({\mathcal {M}}_h\). The triangles \(M\) in \({\mathcal {M}}_h\) are then divided into three subtriangles to form the resulting triangulation \({\mathcal {T}}_h\); see Fig. 2 for an example. In the unstructured mesh, the initial triangulation \({\mathcal {M}}_{h_0}\) is refined into \({\mathcal {M}}_{h_1}\) by connecting the midpoints of the edges of each triangle \(M\) in \({\mathcal {M}}_{h_0}\); see Fig. 3. The refinement is applied successively to obtain finer meshes up to level 5, \({\mathcal {M}}_{h_5}\). As in Fig. 2, each triangle in \({\mathcal {M}}_{h_l}\) is then subdivided into three small triangles to form the resulting triangulation \({\mathcal {T}}_{h_l}\). In the following experiments, the Dirichlet boundary condition is imposed on the whole boundary of the computational domain.
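The mesh construction described above can be sketched as follows. Two choices in this sketch are assumptions that the text does not fix: the diagonal used to split each square, and the use of the centroid as the interior point for the subdivision into three subtriangles. All function names are illustrative.

```python
def structured_mesh(n):
    """Triangulation M_h: n x n uniform squares, each cut into two triangles."""
    h = 1.0 / n
    triangles = []
    for i in range(n):
        for j in range(n):
            p00, p10 = (i * h, j * h), ((i + 1) * h, j * h)
            p01, p11 = (i * h, (j + 1) * h), ((i + 1) * h, (j + 1) * h)
            # Assumed diagonal: from the lower-left to the upper-right corner.
            triangles.append((p00, p10, p11))
            triangles.append((p00, p11, p01))
    return triangles

def subdivide_at_centroid(tri):
    """Split one triangle into three by joining the vertices to the centroid."""
    a, b, c = tri
    g = ((a[0] + b[0] + c[0]) / 3.0, (a[1] + b[1] + c[1]) / 3.0)
    return [(a, b, g), (b, c, g), (c, a, g)]

def staggered_mesh(n):
    """Triangulation T_h obtained from M_h by subdividing every triangle."""
    return [t for m in structured_mesh(n) for t in subdivide_at_centroid(m)]

def refine_midpoints(tri):
    """Split one triangle into four by connecting its edge midpoints."""
    a, b, c = tri
    mid = lambda p, q: ((p[0] + q[0]) / 2.0, (p[1] + q[1]) / 2.0)
    ab, bc, ca = mid(a, b), mid(b, c), mid(c, a)
    return [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]

def area(tri):
    """Unsigned area of a triangle given by three vertices."""
    (x1, y1), (x2, y2), (x3, y3) = tri
    return 0.5 * abs((x2 - x1) * (y3 - y1) - (x3 - x1) * (y2 - y1))
```

With this construction, \({\mathcal {M}}_h\) has \(2N^2\) triangles and \({\mathcal {T}}_h\) has \(6N^2\), and each midpoint refinement multiplies the number of triangles by four.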

Fig. 2

Structured mesh: triangulation \({\mathcal {M}}_h\) (left) and \({\mathcal {T}}_h\) (right) when \(N=2\)

Fig. 3

Unstructured mesh: level 0 mesh, \({\mathcal {M}}_{h_0}\) (left) and level 1 mesh, \({\mathcal {M}}_{h_1}\) (right)

In Table 1, errors are presented for increasing \(N\), i.e., for decreasing mesh size \(h=1/N\). We use the notation \(E(u):=u-u_h, E(\varvec{\sigma }):=\varvec{\sigma }-\varvec{\sigma }_h, E(\varvec{\eta }):=\varvec{\eta }-\varvec{\eta }_h\), and \(E(u,\varvec{\eta }):=(u-u_h,\varvec{\eta }-\varvec{\eta }_h)\). We observe that the \(L^2\)-errors of \(u-u_h, \varvec{\sigma }-\varvec{\sigma }_h\), and \(\varvec{\eta }-\varvec{\eta }_h\) converge with order \(2\) and the error \(\Vert (u-u_h,\varvec{\eta }-\varvec{\eta }_h) \Vert _h\) converges with order \(1\), which is optimal for the given polynomial order \(k=1\).
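Convergence orders of this kind are commonly estimated from two consecutive meshes as \(\log (e_i/e_{i+1}) / \log (h_i/h_{i+1})\). The sketch below shows this computation; the error values are synthetic, generated as \(0.7\,h^2\) purely for illustration, and are not the values of Table 1.

```python
import math

def observed_orders(mesh_sizes, errors):
    """Observed orders log(e_i / e_{i+1}) / log(h_i / h_{i+1}) between meshes."""
    return [math.log(e0 / e1) / math.log(h0 / h1)
            for h0, h1, e0, e1 in zip(mesh_sizes, mesh_sizes[1:],
                                      errors, errors[1:])]

# Synthetic second-order data (NOT measured results): e(h) = 0.7 * h^2.
mesh_sizes = [1.0 / n for n in (4, 8, 16, 32)]
synthetic_errors = [0.7 * h ** 2 for h in mesh_sizes]
orders = observed_orders(mesh_sizes, synthetic_errors)
```

For the synthetic data every entry of `orders` equals two up to round-off; measured errors typically approach the asymptotic order only as \(h\) decreases.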

Table 1 Structured mesh: convergence of errors for the approximation using \(P_1\) elements, \(E(u):=u-u_h, E(\varvec{\sigma }):=\varvec{\sigma }-\varvec{\sigma }_h, E(\varvec{\eta }):=\varvec{\eta }-\varvec{\eta }_h\), and \(E(u,\varvec{\eta }):=(u-u_h, \varvec{\eta }-\varvec{\eta }_h)\)

In Table 2, errors are presented for increasing refinement level \(l\) on the unstructured meshes. Again, the \(L^2\)-errors of \(u-u_{h_l}, \varvec{\sigma }-\varvec{\sigma }_{h_l}\), and \(\varvec{\eta }-\varvec{\eta }_{h_l}\) converge with order \(k+1\), and the errors of \((u-u_{h_l},\varvec{\eta }-\varvec{\eta }_{h_l})\) in the discrete energy norm converge with order \(k\).

Table 2 Unstructured mesh: convergence of errors for the approximation using \(P_1\) elements, \(E(u):=u-u_{h_l}, E(\varvec{\sigma }):=\varvec{\sigma }-\varvec{\sigma }_{h_l}, E(\varvec{\eta }):=\varvec{\eta }-\varvec{\eta }_{h_l}\), and \(E(u,\varvec{\eta }):=(u-u_{h_l}, \varvec{\eta }-\varvec{\eta }_{h_l})\)

To show the locking-free property of our numerical scheme, we consider a model problem with the same exact solution (56), the shear modulus \(\mu =1\), and the material parameter \(\lambda \) determined by the given Poisson ratio \(\nu \) through \(\lambda = 2 \mu \nu / (1-2 \nu )\). We compare the errors for \(\nu =0.3,\, 0.49,\, 0.499\); they are plotted in Fig. 4 against the number of triangles in one direction. For the stress tensor, the relative \(L^2\)-error is computed as \(\Vert \varvec{\sigma }_h - \varvec{\sigma } \Vert _0/\Vert f \Vert _{0}\), where \(f\) is the right-hand side of (1) for the exact solution (56) with the given \(\mu \) and \(\lambda \); see also (2). The uniform meshes of Fig. 2 are used in this experiment. The relative \(L^2\)-errors of the stress tensor decrease at second order in the mesh size, and they seem to approach the same value as \(\nu \) approaches the limit value \(0.5\). For the \(L^2\)-errors of the displacement, the error reduction ratio under mesh refinement becomes larger as \(\nu \) approaches \(0.5\); in other words, the slope of the error curves becomes steeper as \(\nu \) gets closer to \(0.5\). Although the error in the \(\Vert (\cdot , \cdot ) \Vert _h\)-norm for \(({u}_h,{\varvec{\eta }}_h)\) is not covered by our theory, we numerically observe the locking-free property in that case as well (Fig. 4).
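To make the near-incompressible regime concrete, the relation \(\lambda = 2 \mu \nu / (1-2 \nu )\) can be evaluated for the three Poisson ratios used here. The sketch below also evaluates the coefficient \(\lambda /(2\mu + d\lambda )\) appearing in the isotropic compliance tensor, which stays bounded by \(1/d\) as \(\lambda \rightarrow +\infty \). The function names are illustrative.

```python
def lame_lambda(mu, nu):
    """Lame parameter lambda = 2 mu nu / (1 - 2 nu) for Poisson ratio nu < 1/2."""
    if not -1.0 < nu < 0.5:
        raise ValueError("Poisson ratio must satisfy -1 < nu < 0.5")
    return 2.0 * mu * nu / (1.0 - 2.0 * nu)

def trace_coefficient(mu, lam, d=2):
    """Coefficient lam / (2 mu + d lam) in the compliance tensor; below 1/d."""
    return lam / (2.0 * mu + d * lam)

# With mu = 1: nu = 0.3, 0.49, 0.499 give lambda = 1.5, 49, 499 (up to round-off).
lambdas = [lame_lambda(1.0, nu) for nu in (0.3, 0.49, 0.499)]
coefficients = [trace_coefficient(1.0, lam) for lam in lambdas]
```

While \(\lambda \) grows by more than two orders of magnitude over these three cases, the compliance coefficient only moves toward \(1/2\), which is why estimates stated in terms of \(A\) can remain uniform in \(\lambda \).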

Fig. 4

Plot of errors with respect to the number of triangles in one direction as \(\nu \) approaches the limit value \(0.5\)

We now consider an example with a singular solution in a non-convex domain. The computational domain is the L-shaped domain \(\varOmega =([-1, 1] \times [-1, 1]) \setminus ((0, 1) \times (0, 1)) \subset {\mathbb {R}}^2\) (see Fig. 5), on which we consider a model problem with the exact solution

$$\begin{aligned} u(x,y)=\left( x^2+y^2\right) ^{\alpha } \begin{pmatrix} 1\\ 1 \end{pmatrix}, \qquad \alpha \ge 0 . \end{aligned}$$

Here \(\alpha \) is chosen as \(1/3\). With this choice of \(\alpha \), \(u\) is in \(H^r(\varOmega ;{{\mathbb {V}}})\) for any \(0 \le r < 2 \alpha +1 \ (\simeq 1.67)\). Starting from the initial uniform mesh \({\mathcal {M}}_{h_0}\), the refined meshes \({\mathcal {M}}_{h_l}\) are obtained as in Fig. 5. Numerical results are presented in Table 3; here again the Dirichlet boundary condition is enforced on the whole boundary of the L-shaped domain. From the results, we observe that all errors except the \(L^2\)-error of \(u\) converge with order about 0.66, as determined by the regularity of the solution. In addition, the convergence order of the \(L^2\)-error of \(u\) is observed to be about 1.4, which is limited by the reduced regularity on the L-shaped domain.
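The regularity threshold \(2\alpha + 1\) can be motivated by a short computation in polar coordinates, given here as a heuristic sketch rather than a rigorous argument. Writing \(\rho = (x^2+y^2)^{1/2}\), we have \(u \sim \rho ^{2\alpha }\), so derivatives of order \(r\) behave like \(\rho ^{2\alpha - r}\) near the origin, and their square integrability is governed by

$$\begin{aligned} \int _0^1 \rho ^{2(2\alpha - r)} \, \rho \, d\rho < \infty \quad \Longleftrightarrow \quad 2(2\alpha - r) + 2 > 0 \quad \Longleftrightarrow \quad r < 2\alpha + 1 . \end{aligned}$$

For \(\alpha = 1/3\) this gives \(r < 5/3\). Quantities involving one derivative of \(u\), such as the stress, then have regularity just below \(2/3\), which is consistent with the observed order of about 0.66.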

Fig. 5

L-shaped domain: initial mesh \({\mathcal {M}}_{h_0}\) (left) and level 2 mesh \({\mathcal {M}}_{h_2}\) (right)

Table 3 L-shaped domain with a singular solution: convergence of errors for the approximation using \(P_1\) elements, \(E(u):=u-u_{h_l}, E(\varvec{\sigma }):=\varvec{\sigma }-\varvec{\sigma }_{h_l}, E(\varvec{\eta }):=\varvec{\eta }-\varvec{\eta }_{h_l}\), and \(E(u,\varvec{\eta }):=(u-u_{h_l}, \varvec{\eta }-\varvec{\eta }_{h_l})\)

As the last numerical experiment, we consider Cook’s membrane problem with the geometry in Fig. 6. The left part of the boundary is clamped to the wall, a surface load in the vertical direction is applied on the opposite (right) part of the boundary, and zero surface load is imposed on the rest of the boundary. The body force \(f\) is zero, and the Young modulus \(E=2900\) and Poisson ratio \(\nu =0.3\) are given. The deformed mesh for Cook’s membrane problem is plotted in Fig. 6.

Fig. 6

Cook’s membrane problem: staggered mesh on the geometry (left) and deformed mesh (right)