1 Introduction

Many physical problems are of obstacle type, or more generally, described by variational inequalities [25, 29]. In this article we consider, as a model problem, the classical obstacle problem where one seeks the equilibrium position of an elastic membrane constrained to lie over an obstacle. Another important example of an elliptic obstacle problem is the bending of a plate over an obstacle.

There is already a long history of numerical methods, in particular finite element methods; see e.g., the books [16, 17] for an overview of the topic. However, the literature on least-squares methods for obstacle problems is scarce. In fact, at the time of writing only [9] was available for the classical obstacle problem; the idea there goes back to a Nitsche-based method for contact problems introduced and analyzed in [11]. An analysis of first-order least-squares finite element methods for Signorini problems can be found in [1] and, more recently, [26]. Let us also mention the pioneering work [14] for the a priori analysis of a classical finite element scheme. Newer articles include [18,19,20], where mixed and stabilized methods are considered.

Least-squares finite element methods are a widespread class of numerical schemes; their basic idea is to approximate the solution by minimizing the residual in some given norm. Let us recall some important properties of least-squares finite element methods; a detailed list is given in the introduction of the overview article [5], see also the book [6].

  • Unconstrained stability. One feature of least-squares schemes is that they are stable for all pairings of discrete spaces.

  • Adaptivity. Another feature is that a posteriori error bounds are obtained by simply evaluating the least-squares functional. For instance, standard least-squares methods for the Poisson problem [6] are based on minimizing residuals in \(L^2\) norms, which can be localized and then used as error indicators in an adaptive algorithm.

The main purpose of this paper is to close this gap in the literature and to define least-squares based methods for the obstacle problem. In particular, we want to study whether the aforementioned properties transfer to the case of obstacle problems. Let us briefly describe the functional our method is based on. For simplicity assume a zero obstacle (the remainder of the paper deals with general non-zero obstacles). Then, the problem reads

$$\begin{aligned} -\,\Delta u\ge f, \quad u\ge 0, \quad (-\,\Delta u-f)u = 0 \end{aligned}$$

in some domain \(\Omega \) and \(u|_{\partial \Omega }=0\). Introducing the Lagrange multiplier (or reaction force) \(\lambda = -\Delta u-f\) and \(\varvec{\sigma }=\nabla u\), we rewrite the problem as a first-order system, see also [2, 3, 9, 18],

$$\begin{aligned} -\,{{\,\mathrm{div}\,}}\varvec{\sigma }- \lambda = f, \quad \varvec{\sigma }-\nabla u = 0, \quad u\ge 0, \quad \lambda \ge 0, \quad \lambda u = 0. \end{aligned}$$

Note that \(f\in L^2(\Omega )\) does not imply more regularity for u, so that in general \(\lambda \in H^{-1}(\Omega )\) belongs only to the dual space. However, observe that \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda =-f\in L^2(\Omega )\) and therefore the functional

$$\begin{aligned} J((u,\varvec{\sigma },\lambda ); f) := \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda +f\Vert _{}^2 + \Vert \nabla u-\varvec{\sigma }\Vert _{}^2 + \langle \lambda ,u\rangle , \end{aligned}$$

where \(\langle \cdot ,\cdot \rangle \) denotes a duality pairing, is well-defined for \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \in L^2(\Omega )\). We will show that minimizing J over a convex set with the additional constraints \(u\ge 0\), \(\lambda \ge 0\) is equivalent to solving the obstacle problem. We will consider the variational inequality associated to this problem with corresponding bilinear form \(a(\cdot ,\cdot )\). An issue that arises is that \(a(\cdot ,\cdot )\) is not necessarily coercive. However, as it turns out, a simple scaling of the first term in the functional ensures coercivity on the whole space. In view of the aforementioned properties, this means that our method enjoys unconstrained stability. The recent work [18], based on a Lagrange formulation (without reformulation to a first-order system), considers augmenting the trial spaces with bubble functions (mixed method) or adding residual terms (stabilized method) to obtain stability. The authors also extended their work to plate-bending problems, see [20].

Another motivation for the proposed first-order reformulation is that it allows the simultaneous approximation of displacements and stresses. In many problems of structural engineering the stress is the primary quantity of interest. For the present problem of an elastic membrane the stress is directly related to the gradient, and for the problem of bending a plate over an obstacle the physical quantities of interest are the bending moments.

Furthermore, we will see that the functional J evaluated at some discrete approximation \((u_h,\varvec{\sigma }_h,\lambda _h)\) with \(u_h,\lambda _h\ge 0\) is an upper bound for the error. Note that for \(\lambda _h\in L^2(\Omega )\) the duality \(\langle \lambda _h,u_h\rangle \) reduces to the \(L^2\) inner product. Thus, all the terms in the functional can be localized and used as error indicators.

Additionally, we will derive and analyse other variational inequalities that are also based on the first-order reformulation. The resulting methods are quite similar to the least-squares scheme since they share the same residual terms. The only difference is that the complementarity condition \(\lambda u = 0\) is incorporated in a different, non-symmetric, way. We will present a uniform analysis that covers the least-squares formulation and the novel variational inequalities of the obstacle problem.

Finally, we point out that the use of adaptive schemes for obstacle problems is quite natural: first, the solutions may suffer from singularities stemming from the geometry, and second, the free boundary is a priori unknown. There exists plenty of literature on a posteriori estimators and adaptivity for finite element methods for the obstacle problem, see e.g., [4, 7, 10, 27, 28, 31, 32] to name a few. Many of the estimators are based on a discrete Lagrange multiplier which is obtained in a postprocessing step. In contrast, our proposed methods approximate the Lagrange multiplier simultaneously with the other unknowns. This allows for a simple analysis of reliable a posteriori bounds.

1.1 Outline

The remainder of the paper is organized as follows. In Sect. 2 we describe the model problem, introduce the corresponding first-order system and based on that reformulation define our least-squares method. Then, Sect. 3 deals with the definition and analysis of different variational inequalities. In Sect. 4 we provide an a posteriori analysis and numerical studies are presented in Sect. 5. The appendix contains an example, which shows that \(a(\cdot ,\cdot )\) is not coercive in general, and proofs of some auxiliary results.

2 Least-squares method

In Sects. 2.1 and 2.2 we describe the model problem and introduce the reader to our notation. Then, Sect. 2.3 is devoted to the definition and analysis of a least-squares functional.

2.1 Model problem

Let \(\Omega \subset \mathbb {R}^n\), \(n=2,3\), denote a polygonal (for \(n=3\), polyhedral) Lipschitz domain with boundary \(\Gamma =\partial \Omega \). For given \(f\in L^2(\Omega )\) and \(g\in H^1(\Omega )\) with \(g|_{\Gamma }\le 0\) we consider the classical obstacle problem: find a solution u to

$$\begin{aligned} -\Delta u \ge f \quad \text {in}\,\Omega , \end{aligned}$$
(1a)
$$\begin{aligned} u \ge g \quad \text {in}\,\Omega , \end{aligned}$$
(1b)
$$\begin{aligned} (u-g)(-\Delta u-f) = 0 \quad \text {in}\,\Omega , \end{aligned}$$
(1c)
$$\begin{aligned} u = 0 \quad \text {on}\,\Gamma . \end{aligned}$$
(1d)

It is well-known that this problem admits a unique solution \(u\in H_0^1(\Omega )\), and it can be equivalently characterized by the variational inequality: find \(u\in H_0^1(\Omega )\), \(u\ge g\) such that

$$\begin{aligned} \int _{\Omega } \nabla u \cdot \nabla (v-u)\, dx \ge \int _\Omega f(v-u) \,dx \quad \text {for all } v\in H_0^1(\Omega ), v\ge g, \end{aligned}$$
(2)

see [25]. For a more detailed description of the involved function spaces we refer to Sect. 2.2 below.

2.2 Notation and function spaces

We use the common notation for Sobolev spaces \(H_0^1(\Omega )\), \(H^s(\Omega )\) (\(s>0\)). Let \((\cdot ,\cdot )\) denote the \(L^2(\Omega )\) inner product, which induces the norm \(\Vert \cdot \Vert _{}\). The dual of \(H_0^1(\Omega )\) is denoted by \(H^{-1}(\Omega ) := (H_0^1(\Omega ))^*\), where duality \(\langle \cdot ,\cdot \rangle \) is understood with respect to the extended \(L^2(\Omega )\) inner product. We equip \(H^{-1}(\Omega )\) with the dual norm

$$\begin{aligned} \Vert \lambda \Vert _{-1} := \sup _{0\ne v\in H_0^1(\Omega )} \frac{\langle \lambda ,v\rangle }{\Vert \nabla v\Vert _{}}. \end{aligned}$$

Recall Friedrichs’ inequality

$$\begin{aligned} \Vert v\Vert _{} \le C_F\Vert \nabla v\Vert _{} \quad \text {for }v\in H_0^1(\Omega ), \end{aligned}$$

where \(0<C_F=C_F(\Omega )\le {{\,\mathrm{diam}\,}}(\Omega )\). Thus, by definition we have \(\Vert \lambda \Vert _{-1}\le C_F\Vert \lambda \Vert _{}\) for \(\lambda \in L^2(\Omega )\).

Let \({{\,\mathrm{div}\,}}{:}\,\varvec{L}^2(\Omega ):=L^2(\Omega )^n \rightarrow H^{-1}(\Omega )\) denote the generalized divergence operator, i.e., \(\langle {{\,\mathrm{div}\,}}\varvec{\sigma },u\rangle := -(\varvec{\sigma },\nabla u)\) for all \(\varvec{\sigma }\in \varvec{L}^2(\Omega )\), \(u\in H_0^1(\Omega )\). This operator is bounded,

$$\begin{aligned} \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }\Vert _{-1} = \sup _{0\ne v\in H_0^1(\Omega )} \frac{\langle {{\,\mathrm{div}\,}}\varvec{\sigma },v\rangle }{\Vert \nabla v\Vert _{}} = \sup _{0\ne v\in H_0^1(\Omega )} \frac{-(\varvec{\sigma },\nabla v)}{\Vert \nabla v\Vert _{}} \le \Vert \varvec{\sigma }\Vert _{}. \end{aligned}$$

Let \(v\in H^1(\Omega )\). We say \(v\ge 0\) if \(v\ge 0\) a.e. in \(\Omega \). Moreover, \(\lambda \ge 0\) for \(\lambda \in H^{-1}(\Omega )\) means that \(\langle \lambda ,v\rangle \ge 0\) for all \(v\in H_0^1(\Omega )\) with \(v\ge 0\). We also interpret \(v\ge w\) as \(v-w\ge 0\) for \(v,w\in H^1(\Omega )\).

Define the space

$$\begin{aligned} V := H_0^1(\Omega )\times \varvec{L}^2(\Omega ) \times H^{-1}(\Omega ) \end{aligned}$$

with norm

$$\begin{aligned} \Vert \varvec{v}\Vert _{V}^2 := \Vert \nabla v\Vert _{}^2 + \Vert {\varvec{\tau }}\Vert _{}^2 + \Vert \mu \Vert _{-1}^2 \quad \text {for } \varvec{v}= (v,{\varvec{\tau }},\mu ) \in V \end{aligned}$$

and the space

$$\begin{aligned} U := \big \{(u,\varvec{\sigma },\lambda )\in V{:}\,{{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \in L^2(\Omega )\big \} \end{aligned}$$

with norm

$$\begin{aligned} \Vert \varvec{u}\Vert _{U}^2 := \Vert \nabla u\Vert _{}^2 + \Vert \varvec{\sigma }\Vert _{}^2 + \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{}^2 \quad \text {for }\varvec{u}= (u,\varvec{\sigma },\lambda )\in U. \end{aligned}$$

Observe that \(\Vert \cdot \Vert _{U}\) is a stronger norm than \(\Vert \cdot \Vert _{V}\): indeed, the triangle inequality and the bounds \(\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }\Vert _{-1}\le \Vert \varvec{\sigma }\Vert _{}\) and \(\Vert \mu \Vert _{-1}\le C_F\Vert \mu \Vert _{}\) from above yield

$$\begin{aligned} \Vert \nabla u\Vert _{}^2 + \Vert \varvec{\sigma }\Vert _{}^2 + \Vert \lambda \Vert _{-1}^2&\le \Vert \nabla u\Vert _{}^2 + \Vert \varvec{\sigma }\Vert _{}^2 + 2\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{-1}^2 +2\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }\Vert _{-1}^2 \\&\le \Vert \nabla u\Vert _{}^2 + 3\Vert \varvec{\sigma }\Vert _{}^2 + 2C_F^2\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{}^2. \end{aligned}$$

Our first least-squares formulation will be based on the minimization over the non-empty, convex and closed subset

$$\begin{aligned} K^{s} := \big \{(u,\varvec{\sigma },\lambda )\in U{:}\, u-g\ge 0,\, \lambda \ge 0\big \}, \end{aligned}$$

where g is the given obstacle function. We will also derive and analyse variational inequalities based on non-symmetric bilinear forms that utilize the sets

$$\begin{aligned} K^{0}&:= \big \{(u,\varvec{\sigma },\lambda )\in U{:}\, u-g\ge 0\big \}, \\ K^{1}&:= \big \{(u,\varvec{\sigma },\lambda )\in U{:}\, \lambda \ge 0\big \}. \end{aligned}$$

Clearly, \(K^s\subset K^{j}\) for \(j=0,1\).

We write \(A\lesssim B\) if there exists a constant \(C>0\), independent of the quantities of interest, such that \(A\le C B\). Analogously we define \(A\gtrsim B\). If \(A\lesssim B\) and \(B\lesssim A\) hold then we write \(A\simeq B\).

2.3 Least-squares functional

Let \(u\in H_0^1(\Omega )\) denote the unique solution of the obstacle problem (1). Define \(\lambda := -\Delta u - f\in H^{-1}(\Omega )\) and \(\varvec{\sigma }:=\nabla u\). Problem (1) can equivalently be written as the first-order problem

$$\begin{aligned} -{{\,\mathrm{div}\,}}\varvec{\sigma }-\lambda = f \quad \text {in}\,\Omega , \end{aligned}$$
(3a)
$$\begin{aligned} \varvec{\sigma }-\nabla u = 0 \quad \text {in}\,\Omega , \end{aligned}$$
(3b)
$$\begin{aligned} u \ge g \quad \text {in}\,\Omega , \end{aligned}$$
(3c)
$$\begin{aligned} \lambda \ge 0 \quad \text {in}\,\Omega , \end{aligned}$$
(3d)
$$\begin{aligned} (u-g)\lambda = 0 \quad \text {in}\,\Omega , \end{aligned}$$
(3e)
$$\begin{aligned} u = 0 \quad \text {on}\,\Gamma . \end{aligned}$$
(3f)

Observe that \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \in L^2(\Omega )\) and that the unique solution \(\varvec{u}=(u,\varvec{\sigma },\lambda )\in U\) satisfies \(\varvec{u}\in K^s\). We consider the functional

$$\begin{aligned} J(\varvec{u};f,g) := \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda +f\Vert _{}^2 + \Vert \nabla u-\varvec{\sigma }\Vert _{}^2 + \langle \lambda ,u-g\rangle \end{aligned}$$

for \(\varvec{u}=(u,\varvec{\sigma },\lambda )\in U\), \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\) and the minimization problem: find \(\varvec{u}\in K^s\) with

$$\begin{aligned} J(\varvec{u};f,g) = \min _{\varvec{v}\in K^s} J(\varvec{v};f,g). \end{aligned}$$
(4)

Note that the definition of the functional only makes sense if \(g\in H_0^1(\Omega )\), since the duality \(\langle \lambda ,u-g\rangle \) requires \(u-g\in H_0^1(\Omega )\) for \(\lambda \in H^{-1}(\Omega )\).

Theorem 1

If \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\), then problems (3) and (4) are equivalent. In particular, there exists a unique solution \(\varvec{u}\in K^s\) of (4) and it holds that

$$\begin{aligned} J(\varvec{v};f,g) \ge C_J \Vert \varvec{v}-\varvec{u}\Vert _{U}^2 \quad \text {for all }\varvec{v}\in K^s. \end{aligned}$$
(5)

The constant \(C_J>0\) depends only on \(\Omega \).

Proof

Let \(\varvec{u}:= (u,\varvec{\sigma },\lambda ) =(u,\nabla u,-\Delta u - f)\in K^s\) denote the unique solution of (3). Observe that \(J(\varvec{v};f,g)\ge 0\) for all \(\varvec{v}\in K^s\) and \(J(\varvec{u};f,g)=0\); thus, \(\varvec{u}\) minimizes the functional. Moreover, if (5) holds and \(\varvec{u}^*\in K^s\) is another minimizer, then \(J(\varvec{u}^*;f,g)=0\) together with (5) proves \(\varvec{u}=\varvec{u}^*\). It only remains to show (5). Let \(\varvec{v}=(v,{\varvec{\tau }},\mu )\in K^s\). Note that all terms in \(J(\varvec{v};f,g)\) are non-negative. Since \(f=-{{\,\mathrm{div}\,}}\varvec{\sigma }-\lambda \) and \(\nabla u-\varvec{\sigma }= 0\), we have with the Friedrichs constant \(C_F>0\) that

$$\begin{aligned} J(\varvec{v};f,g)&= \Vert {{\,\mathrm{div}\,}}({\varvec{\tau }}-\varvec{\sigma })+(\mu -\lambda )\Vert _{}^2 + \Vert \nabla (v-u)-({\varvec{\tau }}-\varvec{\sigma })\Vert _{}^2 + \langle \mu ,v-g\rangle \\&= \frac{1}{1+C_F^2}\Big ( (1+C_F^2)\Vert {{\,\mathrm{div}\,}}({\varvec{\tau }}-\varvec{\sigma })+(\mu -\lambda )\Vert _{}^2 \\&+ (1+C_F^2)\Vert \nabla (v-u)-({\varvec{\tau }}-\varvec{\sigma })\Vert _{}^2 \\&+\,(1+C_F^2)\langle \mu ,v-g\rangle \Big ) \\&\ge \frac{1}{1+C_F^2}\Big ( (1+C_F^2) \Vert {{\,\mathrm{div}\,}}({\varvec{\tau }}-\varvec{\sigma })+(\mu -\lambda )\Vert _{}^2\\&+\,\Vert \nabla (v-u)-({\varvec{\tau }}-\varvec{\sigma })\Vert _{}^2 + \langle \mu ,v-g\rangle \Big ). \end{aligned}$$

Moreover, \(\langle \lambda ,u-g\rangle =0\) and \(\langle \lambda ,v-g\rangle \ge 0\), \(\langle \mu ,u-g\rangle \ge 0\). We estimate

$$\begin{aligned} \langle \mu ,v-g\rangle&= \langle \mu ,v-u\rangle + \langle \mu ,u-g\rangle + \langle \lambda ,u-g\rangle \\&\ge \langle \mu ,v-u\rangle + \langle \lambda ,u-g\rangle + \langle \lambda ,g-v\rangle \\&= \langle \mu ,v-u\rangle + \langle \lambda ,u-v\rangle = \langle \mu -\lambda ,v-u\rangle . \end{aligned}$$

Define \(\varvec{w}:= (w,{\varvec{\chi }},\nu ) := \varvec{v}-\varvec{u}\). Then, the Cauchy–Schwarz inequality, Young’s inequality and the definition of the divergence operator yield

$$\begin{aligned} J(\varvec{v};f,g)&\gtrsim (1+C_F^2) \Vert {{\,\mathrm{div}\,}}({\varvec{\tau }}-\varvec{\sigma })+(\mu -\lambda )\Vert _{}^2 \\&\quad + \Vert \nabla (v-u)-({\varvec{\tau }}-\varvec{\sigma })\Vert _{}^2 + \langle \mu ,v-g\rangle \\&\ge (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w-{\varvec{\chi }}\Vert _{}^2 + \langle \nu ,w\rangle \\&= (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w\Vert _{}^2 \\&\quad + \Vert {\varvec{\chi }}\Vert _{}^2 -(\nabla w,{\varvec{\chi }}) + \langle {{\,\mathrm{div}\,}}{\varvec{\chi }},w\rangle + \langle \nu ,w\rangle \\&\ge (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \tfrac{1}{2}\Vert \nabla w\Vert _{}^2 + \tfrac{1}{2}\Vert {\varvec{\chi }}\Vert _{}^2 + \langle {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu ,w\rangle . \end{aligned}$$

Application of the Cauchy–Schwarz inequality, Friedrichs’ inequality and Young’s inequality gives us for the last term and \(\delta >0\)

$$\begin{aligned} |\langle {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu ,w\rangle |\le C_F\Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}\Vert \nabla w\Vert _{} \le C_F^2 \frac{\delta ^{-1}}{2} \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \frac{\delta }{2} \Vert \nabla w\Vert _{}^2. \end{aligned}$$

Combining these estimates and choosing \(\delta =\tfrac{1}{2}\) we end up with

$$\begin{aligned} J(\varvec{v};f,g)&\gtrsim (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w-{\varvec{\chi }}\Vert _{}^2 + \langle \mu ,v-g\rangle \\&\ge (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w-{\varvec{\chi }}\Vert _{}^2 + \langle \nu ,w\rangle \\&\ge \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \tfrac{1}{4}\Vert \nabla w\Vert _{}^2 + \tfrac{1}{2}\Vert {\varvec{\chi }}\Vert _{}^2 \simeq \Vert \varvec{w}\Vert _{U}^2 = \Vert \varvec{v}-\varvec{u}\Vert _{U}^2, \end{aligned}$$

which finishes the proof. \(\square \)

Remark 2

Note that (5) bounds the error of any function \(\varvec{v}\in K^s\); in particular, it can be used as an a posteriori error estimator when \(\varvec{v}\in K_h^s\subset K^s\) is a discrete approximation. However, in practice the condition \(K_h^s \subset K^s\) is often hard to realize. Below we introduce a simple scaling of the first term in the least-squares functional that allows us to prove coercivity of the associated bilinear form on the whole space U.

For given \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\), and fixed parameter \(\beta >0\) define the bilinear form \(a_\beta {:}\,U\times U \rightarrow \mathbb {R}\) and the functional \(F_\beta {:}\,U\rightarrow \mathbb {R}\) by

$$\begin{aligned} a_\beta (\varvec{u},\varvec{v})&:= \beta ({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + (\nabla u-\varvec{\sigma },\nabla v-{\varvec{\tau }}) + \tfrac{1}{2}(\langle \mu ,u\rangle + \langle \lambda ,v\rangle ), \end{aligned}$$
(6)
$$\begin{aligned} F_\beta (\varvec{v})&:= -\beta (f,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + \tfrac{1}{2}\langle \mu ,g\rangle \end{aligned}$$
(7)

for all \(\varvec{u}=(u,\varvec{\sigma },\lambda ), \varvec{v}= (v,{\varvec{\tau }},\mu )\in U\). We stress that \(a_1(\cdot ,\cdot )\) and \(F_1(\cdot )\) induce the functional \(J(\cdot ;\cdot )\), i.e.,

$$\begin{aligned} J(\varvec{u};f,g) = a_1(\varvec{u},\varvec{u})-2F_1(\varvec{u})+(f,f). \end{aligned}$$

Since J is differentiable, it is well known that the solution \(\varvec{u}\in K^s\) of (4) satisfies the variational inequality

$$\begin{aligned} a_1(\varvec{u},\varvec{v}-\varvec{u})&\ge F_1(\varvec{v}-\varvec{u}) \quad \text {for all } \varvec{v}\in K^s. \end{aligned}$$
(8)

Conversely, if J is also convex on \(K^s\), then any solution of (8) solves (4). However, J is convex on \(K^s\) iff \(a_1(\varvec{v}-\varvec{w},\varvec{v}-\varvec{w})\ge 0\) for all \(\varvec{v},\varvec{w}\in K^s\), which is not true in general, see the example from “Appendix A”. In Sect. 3 below we will show that for sufficiently large \(\beta >1\) the bilinear form \(a_\beta (\cdot ,\cdot )\) is coercive, even on the whole space U. This has the advantage that we can prove unique solvability of the continuous problem and of its discretization simultaneously. More importantly, in practice this allows the use of non-conforming subsets \(K_h^s\nsubseteq K^s\).

3 Variational inequalities

In this section we introduce and analyse different variational inequalities. The idea of incorporating the complementarity condition in different ways has also been used in [15] to derive DPG methods for contact problems.

We define the bilinear forms \(b_\beta ,c_\beta {:}\,U\times U\rightarrow \mathbb {R}\) and functionals \(G_\beta \), \(H_\beta \) by

$$\begin{aligned} b_\beta (\varvec{u},\varvec{v})&:= \beta ({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + (\nabla u-\varvec{\sigma },\nabla v-{\varvec{\tau }}) + \langle \lambda ,v\rangle , \\ c_\beta (\varvec{u},\varvec{v})&:= \beta ({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + (\nabla u-\varvec{\sigma },\nabla v-{\varvec{\tau }}) + \langle \mu ,u\rangle , \\ G_\beta (\varvec{v})&:= -\beta (f,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ), \\ H_\beta (\varvec{v})&:= -\beta (f,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + \langle \mu ,g\rangle . \end{aligned}$$

Let \(\varvec{u}= (u,\varvec{\sigma },\lambda )\in K^s\subset K^j\) (\(j=0,1\)) denote the unique solution of (3) with \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\). Recall that \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda = -f\). Testing this identity with \({{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \), multiplying by \((\beta -1)\) and adding the result to (8) we see that the solution \(\varvec{u}\in K^s\) satisfies the variational inequality

$$\begin{aligned} a_\beta (\varvec{u},\varvec{v}-\varvec{u}) \ge F_\beta (\varvec{v}-\varvec{u}) \quad \text {for all }\varvec{v}\in K^s. \end{aligned}$$
(VIa)

For the derivation of our second variational inequality let \(\varvec{u}=(u,\varvec{\sigma },\lambda )\in K^{0}\) denote the unique solution of (3) with \(f\in L^2(\Omega )\), \(g\in H^1(\Omega )\), \(g|_\Gamma \le 0\). Recall that \(\lambda = -\Delta u-f\). By (2) we have that

$$\begin{aligned} \langle \lambda ,v-u\rangle = (\nabla u,\nabla (v-u)) - (f,v-u) \ge 0 \end{aligned}$$

for all \(v\in H_0^1(\Omega )\), \(v\ge g\). Thus, \(\varvec{u}\in K^0\) satisfies the variational inequality

$$\begin{aligned} b_\beta (\varvec{u},\varvec{v}-\varvec{u}) \ge G_\beta (\varvec{v}-\varvec{u}) \quad \text {for all }\varvec{v}\in K^0. \end{aligned}$$
(VIb)

Our final method is based on the observation that \(\langle \mu ,u-g\rangle \ge 0\) for \(\mu \ge 0\), since \(u\ge g\) and \(u-g\in H_0^1(\Omega )\). Together with \(\langle \lambda ,u-g\rangle =0\) we conclude \(\langle \mu -\lambda ,u-g\rangle \ge 0\). Thus, \(\varvec{u}\in K^1\) satisfies the variational inequality

$$\begin{aligned} c_\beta (\varvec{u},\varvec{v}-\varvec{u}) \ge H_\beta (\varvec{v}-\varvec{u}) \quad \text {for all }\varvec{v}\in K^1. \end{aligned}$$
(VIc)

Note that \(a_\beta \) is symmetric, whereas \(b_\beta \), \(c_\beta \) are not.

3.1 Solvability

In what follows we analyse the (unique) solvability of the variational inequalities (VIa)–(VIc) in a uniform manner (including discretizations).

Lemma 3

Suppose \(\beta >0\). Let \(A\in \{a_\beta ,b_\beta ,c_\beta \}\). There exists \(C_\beta >0\) depending only on \(\beta >0\) and \(\Omega \) such that

$$\begin{aligned} |A(\varvec{u},\varvec{v})| \le C_\beta \Vert \varvec{u}\Vert _{U}\Vert \varvec{v}\Vert _{U} \quad \text {for all }\varvec{u},\varvec{v}\in U. \end{aligned}$$

If \(\beta \ge 1+C_F^2\), then A is coercive, i.e.,

$$\begin{aligned} C \Vert \varvec{u}\Vert _{U}^2 \le A(\varvec{u},\varvec{u}) \quad \text {for all }\varvec{u}\in U. \end{aligned}$$

The constant \(C>0\) is independent of \(\beta \) and \(\Omega \).

Proof

We prove boundedness of \(A = a_\beta \). Let \(\varvec{u}=(u,\varvec{\sigma },\lambda ),\varvec{v}=(v,{\varvec{\tau }},\mu )\in U\) be given. The Cauchy–Schwarz inequality together with Friedrichs' inequality and the boundedness of the divergence operator yields

$$\begin{aligned} |a_\beta (\varvec{u},\varvec{v})|&\le \beta \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{}\Vert {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \Vert _{} +\Vert \nabla u-\varvec{\sigma }\Vert _{}\Vert \nabla v-{\varvec{\tau }}\Vert _{} \\&\quad +\,\tfrac{1}{2}(\langle {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ,u\rangle -\langle {{\,\mathrm{div}\,}}{\varvec{\tau }},u\rangle + \langle {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ,v\rangle -\langle {{\,\mathrm{div}\,}}\varvec{\sigma },v\rangle ) \\&\le \beta \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{}\Vert {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \Vert _{} +\Vert \nabla u-\varvec{\sigma }\Vert _{}\Vert \nabla v-{\varvec{\tau }}\Vert _{} \\&\quad +\,\tfrac{1}{2}\big ( (C_F\Vert {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \Vert _{}+\Vert {\varvec{\tau }}\Vert _{})\Vert \nabla u\Vert _{} + (C_F\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda \Vert _{}+\Vert \varvec{\sigma }\Vert _{})\Vert \nabla v\Vert _{}\big ). \end{aligned}$$

This shows boundedness of \(a_\beta (\cdot ,\cdot )\). Similarly, one concludes boundedness of \(b_\beta (\cdot ,\cdot )\) and \(c_\beta (\cdot ,\cdot )\).

For the proof of coercivity, observe that \(a_\beta (\varvec{w},\varvec{w}) = b_\beta (\varvec{w},\varvec{w}) = c_\beta (\varvec{w},\varvec{w})\) for all \(\varvec{w}\in U\). We stress that coercivity directly follows from the arguments given in the proof of Theorem 1. Note that the choice of \(\beta \) yields

$$\begin{aligned} A(\varvec{w},\varvec{w})&\ge (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w-{\varvec{\chi }}\Vert _{}^2 + \langle \nu ,w\rangle \end{aligned}$$

for \(\varvec{w}=(w,{\varvec{\chi }},\nu )\in U\). The right-hand side can be further estimated following the argumentation in the proof of Theorem 1, which gives us

$$\begin{aligned} (1+C_F^2) \Vert {{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu \Vert _{}^2 + \Vert \nabla w-{\varvec{\chi }}\Vert _{}^2 + \langle \nu ,w\rangle \gtrsim \Vert \varvec{w}\Vert _{U}^2. \end{aligned}$$

This finishes the proof. \(\square \)

Remark 4

Recall that \(C_F\le {{\,\mathrm{diam}\,}}(\Omega )\). Therefore, we can always choose \(\beta =1+{{\,\mathrm{diam}\,}}(\Omega )^2\) to ensure coercivity of our bilinear forms. We stress that a choice of \(\beta \) of order \({{\,\mathrm{diam}\,}}(\Omega )\) is not only sufficient to ensure coercivity but also necessary in general, as the example from “Appendix A” shows. Another possibility is to rescale \(\Omega \) such that \({{\,\mathrm{diam}\,}}(\Omega )\le 1\), which implies that we can choose \(\beta =2\). Furthermore, observe that a scaling of \(\Omega \) transforms (1) to an equivalent obstacle problem (with appropriately redefined functions \(f,g\)). To be more precise, define \(\widetilde{u}(x) := u(dx)\) with \(d:={{\,\mathrm{diam}\,}}(\Omega )>0\) and \(u\in H_0^1(\Omega )\) the solution of (1). Moreover, set \(\widetilde{f}(x) := d^2 f(dx)\), \(\widetilde{g}(x) := g(dx)\). Then, \(\widetilde{u}\) solves (1) in \(\widetilde{\Omega }:= \big \{x/d{:}\,x\in \Omega \big \}\) with \(f,g\) replaced by \(\widetilde{f},\widetilde{g}\).
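To verify this, note that the chain rule gives \(\nabla \widetilde{u}(x) = d\,(\nabla u)(dx)\) and \(\Delta \widetilde{u}(x) = d^2(\Delta u)(dx)\), hence

$$\begin{aligned} -\Delta \widetilde{u}(x)-\widetilde{f}(x) = d^2\big ({-}\Delta u-f\big )(dx)\ge 0 \quad \text {and}\quad \widetilde{u}(x)-\widetilde{g}(x) = (u-g)(dx)\ge 0 \end{aligned}$$

for a.e. \(x\in \widetilde{\Omega }\); the complementarity condition (1c) and the boundary condition (1d) transfer in the same way.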

The variational inequalities (VIa)–(VIc) are of the first kind and we use a standard framework for the analysis (Lions–Stampacchia theorem), see [16, 17, 25].

Theorem 5

Suppose \(\beta \ge 1+C_F^2\). Let \(A\in \{a_\beta ,b_\beta ,c_\beta \}\) and let \(F: U\rightarrow \mathbb {R}\) denote a bounded linear functional. If \(K\subseteq U\) is a non-empty convex and closed subset, then the variational inequality

$$\begin{aligned} \text {Find }\varvec{u}\in K \text { s.t. } A(\varvec{u},\varvec{v}-\varvec{u}) \ge F(\varvec{v}-\varvec{u}) \quad \text {for all }\varvec{v}\in K \end{aligned}$$
(9)

admits a unique solution.

In particular, for \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\) each of the problems (VIa)–(VIc) has a unique solution and the problems are equivalent to (3).

Proof

With the assumption on \(\beta \), Lemma 3 proves that the bilinear forms are coercive and bounded. Then, unique solvability of (9) follows from the Lions–Stampacchia theorem, see e.g., [16, 17, 25].

Unique solvability of (VIa)–(VIc) follows since the functionals \(F_\beta \), \(G_\beta \), \(H_\beta \) are linear and bounded: For example, boundedness of \(F_\beta \) can be seen from

$$\begin{aligned} |F_\beta (\varvec{v})|&= |-\beta (f,{{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ) + \tfrac{1}{2}({{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu ,g) - \tfrac{1}{2}\langle {{\,\mathrm{div}\,}}{\varvec{\tau }},g\rangle | \\&\le \beta \Vert f\Vert _{}\Vert {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \Vert _{} + \tfrac{1}{2}\Vert {{\,\mathrm{div}\,}}{\varvec{\tau }}+\mu \Vert _{}\Vert g\Vert _{} + \tfrac{1}{2}\Vert {\varvec{\tau }}\Vert _{}\Vert \nabla g\Vert _{}\\&\lesssim (\Vert f\Vert _{} +\Vert \nabla g\Vert _{})\Vert \varvec{v}\Vert _{U}. \end{aligned}$$

The same arguments prove that \(G_\beta \) and \(H_\beta \) are bounded.

Finally, equivalence to (3) follows since all problems admit unique solutions and by construction the solution of (3) also solves each of the problems (VIa)–(VIc). \(\square \)

Remark 6

We stress that the assumption \(g\in H_0^1(\Omega )\) is necessary for (VIa) and (VIc): if only \(g\in H^1(\Omega )\), then the term \(\langle \mu ,g\rangle \) in \(F_\beta \) resp. \(H_\beta \) is not well-defined. However, this term does not appear in \(G_\beta \), and therefore the variational inequality (VIb) admits a unique solution if we only assume \(g\in H^1(\Omega )\) with \(g|_\Gamma \le 0\).

Remark 7

The variational inequality (VIa) corresponds to a least-squares finite element method with convex functional

$$\begin{aligned} J_\beta (\varvec{u};f,g) := a_\beta (\varvec{u},\varvec{u})-2F_\beta (\varvec{u})+\beta (f,f). \end{aligned}$$

Then, Theorem 5 proves that the problem

$$\begin{aligned} J_\beta (\varvec{u};f,g) = \min _{\varvec{v}\in K} J_\beta (\varvec{v};f,g) \end{aligned}$$

admits a unique solution for all non-empty convex and closed sets \(K\subseteq U\). Moreover, \(J_\beta (\varvec{u};f,g)\simeq J(\varvec{u};f,g)\) for \(\varvec{u}\in K^s\), so that this problem is equivalent to (4) for \(K=K^s\).

3.2 A priori analysis

The following three results provide general bounds on the approximation error. The proofs are based on standard arguments, see e.g., [14]. We give details for the proof of the first result, the others follow the same lines of argumentation and are left to the reader.

Theorem 8

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^s\) denote the solution of (VIa), where \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\). Let \(K_h\subset U\) denote a non-empty convex and closed subset and let \(\varvec{u}_h \in K_h\) denote the solution of (9) with \(A=a_\beta \), \(F=F_\beta \) and \(K=K_h\). It holds that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\le C_\mathrm {opt}\Big ( \inf _{\varvec{v}_h\in K_h} \big ( \Vert \varvec{u}-\varvec{v}_h\Vert _{U}^2 + |\langle \lambda ,v_h-u\rangle +\langle \mu _h-\lambda ,u-g\rangle | \big ) \\&\qquad \qquad \quad +\,\inf _{\varvec{v}\in K^s} | \langle \lambda ,v-u_h\rangle + \langle \mu -\lambda _h,u-g\rangle |\Big ). \end{aligned}$$

The constant \(C_\mathrm {opt}>0\) depends only on \(\beta \) and \(\Omega \).

Proof

Throughout let \(\varvec{v}=(v,{\varvec{\tau }},\mu )\in K^s\), \(\varvec{v}_h=(v_h,{\varvec{\tau }}_h,\mu _h)\in K_h\) and let \(\varvec{u}=(u,\varvec{\sigma },\lambda )\in K^s\) denote the exact solution of (VIa). Thus, \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda +f=0\) and \(\nabla u-\varvec{\sigma }= 0\). For arbitrary \(\varvec{w}= (w,{\varvec{\chi }},\nu )\in U\) it holds that

$$\begin{aligned} a_\beta (\varvec{u},\varvec{w})&= \beta ({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ,{{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu ) + (\nabla u-\varvec{\sigma },\nabla w-{\varvec{\chi }}) + \tfrac{1}{2}(\langle \lambda ,w\rangle +\langle \nu ,u\rangle ) \nonumber \\&= -\beta (f,{{\,\mathrm{div}\,}}{\varvec{\chi }}+\nu ) +\tfrac{1}{2}\langle \nu ,g\rangle +\tfrac{1}{2}(\langle \lambda ,w\rangle + \langle \nu ,u-g\rangle ) \nonumber \\&= F_\beta (\varvec{w}) + \tfrac{1}{2}(\langle \lambda ,w\rangle + \langle \nu ,u-g\rangle ). \end{aligned}$$
(10)

Using coercivity of \(a_\beta (\cdot ,\cdot )\), identity (10) and the fact that \(\varvec{u}_h\) solves the discretized variational inequality (on \(K_h\)) shows that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\lesssim a_\beta (\varvec{u}-\varvec{u}_h,\varvec{u}-\varvec{u}_h) \\ {}&= a_\beta (\varvec{u},\varvec{u}-\varvec{u}_h) - a_\beta (\varvec{u}_h,\varvec{u}-\varvec{v}_h) -a_\beta (\varvec{u}_h,\varvec{v}_h-\varvec{u}_h) \\&\le F_\beta (\varvec{u}-\varvec{u}_h) + \tfrac{1}{2}(\langle \lambda ,u-u_h\rangle + \langle \lambda -\lambda _h,u-g\rangle ) \\ {}&\qquad -\,a_\beta (\varvec{u}_h,\varvec{u}-\varvec{v}_h) - F_\beta (\varvec{v}_h-\varvec{u}_h) \\ {}&= F_\beta (\varvec{u}-\varvec{v}_h) + \tfrac{1}{2}(\langle \lambda ,u-u_h\rangle + \langle \lambda -\lambda _h,u-g\rangle ) - a_\beta (\varvec{u}_h,\varvec{u}-\varvec{v}_h). \end{aligned}$$

Note that \(0=\langle \lambda ,u-g\rangle \le \langle \lambda ,v-g\rangle \) and \(\langle \lambda ,u-g\rangle \le \langle \mu ,u-g\rangle \). Hence,

$$\begin{aligned} \langle \lambda ,u-u_h\rangle +\langle \lambda -\lambda _h,u-g\rangle&= \langle \lambda ,u-g+g-u_h\rangle + \langle \lambda -\lambda _h,u-g\rangle \\&\le \langle \lambda ,v-g+g-u_h\rangle + \langle \mu -\lambda _h,u-g\rangle . \end{aligned}$$

This and identity (10) with \(\varvec{w}=\varvec{u}-\varvec{v}_h\) imply that

$$\begin{aligned}&F_\beta (\varvec{u}-\varvec{v}_h)-a_\beta (\varvec{u}_h,\varvec{u}-\varvec{v}_h) + \tfrac{1}{2}(\langle \lambda ,u-u_h\rangle + \langle \lambda -\lambda _h,u-g\rangle ) \\&\quad \le a_\beta (\varvec{u}-\varvec{u}_h,\varvec{u}-\varvec{v}_h) - \tfrac{1}{2}(\langle \lambda ,u-v_h\rangle +\langle \lambda -\mu _h,u-g\rangle ) \\&\qquad +\,\tfrac{1}{2}(\langle \lambda ,v-u_h\rangle + \langle \mu -\lambda _h,u-g\rangle ). \end{aligned}$$

Combining these estimates, the boundedness of \(a_\beta (\cdot ,\cdot )\) and an application of Young's inequality with parameter \(\delta >0\) show that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\lesssim \frac{\delta }{2} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2 + \frac{\delta ^{-1}}{2}\Vert \varvec{u}-\varvec{v}_h\Vert _{U}^2 + |\langle \lambda ,v_h-u\rangle +\langle \mu _h-\lambda ,u-g\rangle | \\&\qquad +\,|\langle \lambda ,v-u_h\rangle + \langle \mu -\lambda _h,u-g\rangle |. \end{aligned}$$

Absorbing the term \(\delta /2\Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2\) for sufficiently small \(\delta >0\) finishes the proof, since \(\varvec{v}\in K^s\), \(\varvec{v}_h\in K_h\) are arbitrary. \(\square \)

Theorem 9

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^0\) denote the solution of (VIb), where \(f\in L^2(\Omega )\), \(g\in H^1(\Omega )\) with \(g|_\Gamma \le 0\). Let \(K_h\subset U\) denote a non-empty convex and closed subset and let \(\varvec{u}_h \in K_h\) denote the solution of (9) with \(A=b_\beta \), \(F=G_\beta \), and \(K=K_h\). It holds that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\le C_\mathrm {opt}\Big ( \inf _{\varvec{v}_h\in K_h}\big (\Vert \varvec{u}-\varvec{v}_h\Vert _{U}^2 + |\langle \lambda ,v_h-u\rangle |\big ) + \inf _{\varvec{v}\in K^0} | \langle \lambda ,v-u_h\rangle |\Big ). \end{aligned}$$

The constant \(C_\mathrm {opt}>0\) depends only on \(\beta \) and \(\Omega \).

Theorem 10

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^1\) denote the solution of (VIc), where \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\). Let \(K_h\subset U\) denote a non-empty convex and closed subset and let \(\varvec{u}_h \in K_h\) denote the solution of (9) with \(A=c_\beta \), \(F=H_\beta \), and \(K=K_h\). It holds that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\le C_\mathrm {opt}\Big ( \inf _{\varvec{v}_h\in K_h}\big (\Vert \varvec{u}-\varvec{v}_h\Vert _{U}^2 {+} |\langle \mu _h\!-\!\lambda ,u\!-\!g\rangle |\big ) {+} \inf _{\varvec{v}\in K^1} | \langle \mu \!-\!\lambda _h,u\!-\!g\rangle |\Big ). \end{aligned}$$

The constant \(C_\mathrm {opt}>0\) depends only on \(\beta \) and \(\Omega \).

3.3 Discretization

Let \(\mathcal {T}\) denote a regular triangulation of \(\Omega \), \(\bigcup _{T\in \mathcal {T}} \overline{T}=\overline{\Omega }\). We assume that \(\mathcal {T}\) is \(\kappa \)-shape regular, i.e.,

$$\begin{aligned} \sup _{T\in \mathcal {T}} \frac{{{\,\mathrm{diam}\,}}(T)^n}{|T|} \le \kappa < \infty . \end{aligned}$$

Moreover, let \(\mathcal {V}\) denote the vertices of the mesh \(\mathcal {T}\) and \(\mathcal {V}_0:=\mathcal {V}{\setminus }\Gamma \). Let \(h_\mathcal {T}\in L^\infty (\Omega )\) denote the mesh-size function, \(h_\mathcal {T}|_T := h_T := {{\,\mathrm{diam}\,}}(T)\) for \(T\in \mathcal {T}\). Set \(h:=\max _{T\in \mathcal {T}}{{\,\mathrm{diam}\,}}(T)\). We use standard finite element spaces for the discretization. Let \(\mathcal {P}^p(\mathcal {T})\) denote the space of \(\mathcal {T}\)-elementwise polynomials of degree at most \(p\in \mathbb {N}_0\). Let \(\mathcal {R}\!\mathcal {T}^p(\mathcal {T})\) denote the Raviart–Thomas space of degree \(p\in \mathbb {N}_0\), \(\mathcal {S}_0^{p+1}(\mathcal {T}) := \mathcal {P}^{p+1}(\mathcal {T})\cap H_0^1(\Omega )\), and

$$\begin{aligned} U_{hp} := \mathcal {S}_0^{p+1}(\mathcal {T}) \times \mathcal {R}\!\mathcal {T}^p(\mathcal {T}) \times \mathcal {P}^p(\mathcal {T}). \end{aligned}$$

Clearly, \(U_{hp} \subset U\). We stress that the polynomial degrees are chosen so that the best approximation error in the norm \(\Vert \cdot \Vert _{U}\) is of order \(h^{p+1}\) for sufficiently smooth functions.
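To illustrate the structure of the discretization, the following minimal sketch assembles \(U_{h0}\) and the Galerkin matrix of \(a_\beta \) in legacy FEniCS/UFL syntax. It is meant purely as an illustration (mesh size, load, and all identifiers are our choices, not the implementation behind the experiments in Sect. 5); note that FEniCS labels the lowest-order Raviart–Thomas space with degree 1.

```python
from dolfin import *

mesh = UnitSquareMesh(32, 32)
beta = Constant(3.0)  # beta = 1 + diam(Omega)^2 with diam(Omega) = sqrt(2)

# U_h0 = S_0^1(T) x RT^0(T) x P^0(T); FEniCS indexes Raviart-Thomas from degree 1
S  = FiniteElement("Lagrange", mesh.ufl_cell(), 1)
RT = FiniteElement("RT", mesh.ufl_cell(), 1)
P0 = FiniteElement("DG", mesh.ufl_cell(), 0)
Uh = FunctionSpace(mesh, MixedElement([S, RT, P0]))

(u, sig, lam) = TrialFunctions(Uh)
(v, tau, mu)  = TestFunctions(Uh)

# bilinear form a_beta from (6); b_beta and c_beta differ only in the last term
a = (beta*(div(sig) + lam)*(div(tau) + mu)
     + dot(grad(u) - sig, grad(v) - tau)
     + 0.5*(mu*u + lam*v))*dx

f = Constant(-1.0)
F = -beta*f*(div(tau) + mu)*dx  # F_beta from (7) with obstacle g = 0

bc = DirichletBC(Uh.sub(0), Constant(0.0), "on_boundary")
A, rhs = assemble_system(a, F, bc)
```

The constraints of \(K_h^s\) act on the nodal values of the first component and the cell values of the third component; the resulting discrete variational inequality is then solved, e.g., with the active set strategy described in Sect. 5.1.1.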

To define admissible convex sets for the discrete variational inequalities we need to put constraints on functions from the space \(\mathcal {S}_0^{p+1}(\mathcal {T})\), from \(\mathcal {P}^p(\mathcal {T})\), or from both. Let us remark that for polynomial degrees \(\ge 2\) such constraints are not straightforward to implement. One possibility would be to impose the constraints pointwise and then analyse the consistency error. We comment on the case \(p=1\) and \(n=2\) below; for hp-FEM methods for elliptic obstacle problems we refer to [2, 3]. In order to avoid such technicalities and to keep the presentation of the basic ideas simple, we consider from now on the lowest-order case only, where the linear constraints can easily be built in. To that end define the non-empty convex subsets

$$\begin{aligned} K_h^s := \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,\mu _h\ge 0, \, v_h(x)\ge g(x) \text { for all } x\in \mathcal {V}_0\big \}, \end{aligned}$$
(11a)
$$\begin{aligned} K_h^0 := \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,v_h(x)\ge g(x) \text { for all } x\in \mathcal {V}_0\big \}, \end{aligned}$$
(11b)
$$\begin{aligned} K_h^1 := \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,\mu _h\ge 0\big \}. \end{aligned}$$
(11c)

In the definition of \(K_h^s\), \(K_h^0\) we assume \(g\in H^1(\Omega )\cap C^0(\overline{\Omega })\) so that the point evaluation is well-defined.

Let us briefly comment on how to incorporate the constraints for the higher-order space \(U_{h1}\) and \(n=2\). Let \(\mathcal {V}_m\) denote the midpoints of the interior edges of the triangulation \(\mathcal {T}\). Then, a possible choice for the discrete convex set is

$$\begin{aligned} K_{h1}^s&:= \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h1}{:}\,\mu _h\ge 0, \, v_h(z) \ge g(z) \quad \text {for all } z\in \mathcal {V}_0\cup \mathcal {V}_m\big \}. \end{aligned}$$

In the same manner one defines \(K_{h1}^0\) and \(K_{h1}^1\).

3.4 Auxiliary results

For the analysis of the convergence rates we use the nodal interpolation operator \(I_h{:}\,H^2(\Omega )\rightarrow \mathcal {S}^1(\mathcal {T}):= \mathcal {P}^1(\mathcal {T})\cap C^0(\overline{\Omega })\), the Raviart–Thomas projector \(\Pi ^{{{\,\mathrm{div}\,}}}_h{:}\,H^1(\Omega )^n \rightarrow \mathcal {R}\!\mathcal {T}^0(\mathcal {T})\), and the \(L^2(\Omega )\) projector \(\Pi _h{:}\,L^2(\Omega )\rightarrow \mathcal {P}^0(\mathcal {T})\). Observe that \(v\ge 0\) resp. \(\mu \ge 0\) imply (given sufficient regularity) \(I_h v\ge 0\) resp. \(\Pi _h\mu \ge 0\). Moreover, recall the commutativity property \({{\,\mathrm{div}\,}}\Pi ^{{{\,\mathrm{div}\,}}}_h = \Pi _h{{\,\mathrm{div}\,}}\), as well as the approximation properties

$$\begin{aligned} \Vert v-I_hv\Vert _{} + h\Vert \nabla (v-I_hv)\Vert _{}&\lesssim h^2\Vert D^2v\Vert _{}, \end{aligned}$$
(12)
$$\begin{aligned} \Vert {\varvec{\tau }}-\Pi ^{{{\,\mathrm{div}\,}}}_h{\varvec{\tau }}\Vert _{}&\lesssim h \Vert \nabla {\varvec{\tau }}\Vert _{}, \end{aligned}$$
(13)
$$\begin{aligned} \Vert \mu -\Pi _h\mu \Vert _{}&\lesssim \Vert h_\mathcal {T}\nabla _\mathcal {T}\mu \Vert _{}. \end{aligned}$$
(14)

Here, \(\nabla {\varvec{\tau }}\) is understood componentwise, and \(\nabla _\mathcal {T}\mu \) denotes the \(\mathcal {T}\)-elementwise gradient of \(\mu \in H^1(\mathcal {T}) := \big \{\nu \in L^2(\Omega ){:}\,\nu |_T \in H^1(T), \, T\in \mathcal {T}\big \}\). Set \(\Vert \nu \Vert _{H^1(\mathcal {T})}^2 := \Vert \nu \Vert _{}^2 + \Vert \nabla _\mathcal {T}\nu \Vert _{}^2\). The involved constants depend only on the \(\kappa \)-shape regularity of \(\mathcal {T}\) but are otherwise independent of \(\mathcal {T}\). Furthermore, for \(\mu \in L^2(\Omega )\) it also holds that

$$\begin{aligned} \Vert \mu -\Pi _h\mu \Vert _{-1} \lesssim \Vert h_\mathcal {T}(\mu -\Pi _h\mu )\Vert _{}, \end{aligned}$$

which follows from the definition of the dual norm, the projection and approximation property of \(\Pi _h\).
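In detail, for \(0\ne v\in H_0^1(\Omega )\) the \(L^2\)-orthogonality \((\mu -\Pi _h\mu ,\Pi _hv)=0\), the Cauchy–Schwarz inequality and the elementwise Poincaré estimate \(\Vert v-\Pi _hv\Vert _{T}\lesssim h_T\Vert \nabla v\Vert _{T}\) show that

$$\begin{aligned} \langle \mu -\Pi _h\mu ,v\rangle = (\mu -\Pi _h\mu ,v-\Pi _hv) \le \sum _{T\in \mathcal {T}} \Vert \mu -\Pi _h\mu \Vert _{T}\Vert v-\Pi _hv\Vert _{T} \lesssim \Vert h_\mathcal {T}(\mu -\Pi _h\mu )\Vert _{}\Vert \nabla v\Vert _{}, \end{aligned}$$

and taking the supremum over v yields the asserted bound.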

The proof of optimal a priori convergence rates will also rely on the following two results. Scaling arguments and the continuous embedding \(H^2(T_\mathrm {ref}) \hookrightarrow C^0(\overline{T_\mathrm {ref}})\) show the next result. Here, \(\overline{T_\mathrm {ref}}\) denotes some reference element.

Lemma 11

There exists a constant \(C>0\) depending only on \(T_\mathrm {ref}\) and \(\kappa \)-shape regularity of the triangulation such that

$$\begin{aligned} \Vert v\Vert _{L^\infty (T)} \le C |T|^{-1/2} \big ( \Vert v\Vert _{T} + h_T \Vert \nabla v\Vert _{T} + h_T^2 \Vert D^2 v\Vert _{T} \big ) \quad \text {for all } v\in H^2(T). \end{aligned}$$
(15)

The next result is proven along the lines of [12, Lemma 7]. For completeness we present the proof of this nontrivial result, adapted to our situation, in “Appendix B”. For each element \(T\in \mathcal {T}\) and \(v\in H^2(T)\) we define the level set

$$\begin{aligned} T_\mathrm {C}(v) := \big \{x\in T{:}\,v(x) = 0\big \} \quad \text {as well as the set}\quad T_\mathrm {NC}(v) := \big \{x\in T{:}\,v(x) \ne 0\big \}. \end{aligned}$$

Note that \(v\in H^2(T)\hookrightarrow C^0(\overline{T})\) implies that these sets are measurable. Moreover, \(|T_\mathrm {C}(v)| + |T_\mathrm {NC}(v)| = |T|\).

Lemma 12

Let \(v\in H^2(T)\). Assume \(|T_\mathrm {C}(v)|>0\). Then,

$$\begin{aligned} \Vert v\Vert _{T} \le C h_T \frac{|T|^{1/2}}{|T_\mathrm {C}(v)|^{1/2}} \Vert \nabla v\Vert _{T} \end{aligned}$$

and in particular

$$\begin{aligned} \Vert \nabla v\Vert _{T} \le C h_T \frac{|T|^{1/2}}{|T_\mathrm {C}(v)|^{1/2}} \Vert D^2 v\Vert _{T}. \end{aligned}$$

Here, \(C = \sqrt{n}\) for \(n=2,3\).

3.5 Optimal a priori convergence rates

Theorem 13

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^s\) denote the solution of (VIa) with data \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\). Let \(K_h^s\) denote the set defined in (11a) and let \(\varvec{u}_h \in K_h^s\) denote the solution of (9) with \(A=a_\beta \), \(F=F_\beta \), and \(K=K_h^s\). If \(u \in H^2(\Omega )\), \(g\in H^2(\Omega )\) and \(f\in H^1(\mathcal {T})\), then

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}&\le C_\mathrm {app}h (\Vert u\Vert _{H^2(\Omega )} + \Vert \nabla _\mathcal {T}f\Vert _{} +\Vert \lambda \Vert _{} + \Vert g\Vert _{H^2(\Omega )}). \end{aligned}$$

The constant \(C_\mathrm {app}>0\) depends only on \(\beta \), \(\Omega \), and \(\kappa \)-shape regularity of \(\mathcal {T}\).

Proof

Choose \(\varvec{v}_h = (I_hu,\Pi ^{{{\,\mathrm{div}\,}}}_h\varvec{\sigma },\Pi _h\lambda )\in K_h^s\). The commutativity property of \(\Pi ^{{{\,\mathrm{div}\,}}}_h\) shows that

$$\begin{aligned} {{\,\mathrm{div}\,}}(\varvec{\sigma }-\Pi ^{{{\,\mathrm{div}\,}}}_h\varvec{\sigma })+\lambda -\Pi _h\lambda = (1-\Pi _h)({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda ) = -(1-\Pi _h)f. \end{aligned}$$

Therefore, using the approximation properties of the involved operators proves

$$\begin{aligned} \Vert \varvec{u}-\varvec{v}_h\Vert _{U}&\le \Vert (1-\Pi _h)f\Vert _{} + \Vert \varvec{\sigma }-\Pi ^{{{\,\mathrm{div}\,}}}_h\varvec{\sigma }\Vert _{}+ \Vert \nabla (u-I_h u)\Vert _{} \\&\lesssim h\Vert \nabla _\mathcal {T}f\Vert _{} + h\Vert u\Vert _{H^2(\Omega )}. \end{aligned}$$

Moreover,

$$\begin{aligned} |\langle \lambda ,I_h u-u\rangle | \le \Vert \lambda \Vert _{}\Vert u-I_hu\Vert _{} \lesssim \Vert \lambda \Vert _{}h^2\Vert D^2u\Vert _{}\lesssim h^2(\Vert u\Vert _{H^2(\Omega )}^2+\Vert \lambda \Vert _{}^2). \end{aligned}$$

We have to estimate the term

$$\begin{aligned} |\langle \Pi _h\lambda -\lambda ,u-g\rangle |. \end{aligned}$$

Define \(T_\mathrm {C} := T_\mathrm {C}(u-g)\) and \(T_\mathrm {NC} := T_\mathrm {NC}(u-g)\). Note that these two sets are measurable and we have that \(|T_\mathrm {C}| + |T_\mathrm {NC}|=|T|\). We consider three cases: First, assume that \(|T_\mathrm {C}| = 0\). This implies that \(u-g>0\) a.e. in T but since \((u-g)\lambda = 0\) we infer that \(\lambda = 0\) a.e. in T. Therefore, \(\Pi _h\lambda |_T = 0\) and we have that \(\langle \Pi _h\lambda -\lambda ,u-g\rangle _T = 0\). Second, assume that \(|T_\mathrm {NC}| = 0\). But then, \(u-g = 0\) a.e. in T and we have again \(\langle \Pi _h\lambda -\lambda ,u-g\rangle _T = 0\). The final case to be considered is \(|T_\mathrm {NC}|>0, |T_\mathrm {C}|>0\): We have that

$$\begin{aligned} |\langle \Pi _h\lambda -\lambda ,u-g\rangle _T|&= |\langle \lambda ,(\Pi _h-1)(u-g)\rangle _T|\\&\le \Vert \lambda \Vert _{L^1(T)} \Vert (1-\Pi _h)(u-g)\Vert _{L^\infty (T)}. \end{aligned}$$

Note that \(\lambda |_{T_\mathrm {NC}} = 0\). Thus, \(\Vert \lambda \Vert _{L^1(T)} = \Vert \lambda \Vert _{L^1(T_\mathrm {C})} \le |T_\mathrm {C}|^{1/2} \Vert \lambda \Vert _{T}\). For the second term we apply Lemma 11 with \(v = (1-\Pi _h)(u-g)\) and together with the approximation property of \(\Pi _h\) we get the estimate

$$\begin{aligned} \Vert (1-\Pi _h)(u-g)\Vert _{L^\infty (T)}&\lesssim |T|^{-1/2}\big ( \Vert (1-\Pi _h)(u-g)\Vert _{T} \\&\quad + h_T\Vert \nabla (u-g)\Vert _{T} + h_T^2 \Vert D^2(u-g)\Vert _{T} \big )\\&\lesssim |T|^{-1/2} \big ( h_T\Vert \nabla (u-g)\Vert _{T} + h_T^2 \Vert D^2(u-g)\Vert _{T} \big ). \end{aligned}$$

We can estimate the gradient term by applying the second inequality of Lemma 12 which gives us

$$\begin{aligned} \Vert \nabla (u-g)\Vert _{T} \lesssim \frac{h_T|T|^{1/2}}{|T_\mathrm {C}|^{1/2}}\Vert D^2 (u-g)\Vert _{T}. \end{aligned}$$

Clearly \(|T_\mathrm {C}|^{1/2}\le |T|^{1/2}\), thus \(|T|^{-1/2}\le |T_\mathrm {C}|^{-1/2}\) and we conclude that

$$\begin{aligned} \Vert (1-\Pi _h)(u-g)\Vert _{L^\infty (T)}&\lesssim \frac{h_T^2}{|T_\mathrm {C}|^{1/2}} \Vert D^2(u-g)\Vert _{T}. \end{aligned}$$

Using \(\Vert \lambda \Vert _{L^1(T)} \le |T_\mathrm {C}|^{1/2} \Vert \lambda \Vert _{T}\) then yields that

$$\begin{aligned} |\langle \Pi _h\lambda -\lambda ,u-g\rangle _T|&\lesssim |T_\mathrm {C}|^{1/2} \Vert \lambda \Vert _{T} \frac{h_T^2}{|T_\mathrm {C}|^{1/2}}\Vert D^2(u-g)\Vert _{T}\\&\le h_T^2\left( \Vert \lambda \Vert _{T}^2 + \Vert u\Vert _{H^2(T)}^2 + \Vert g\Vert _{H^2(T)}^2\right) . \end{aligned}$$

Collecting the above estimates we have that

$$\begin{aligned}&\inf _{\varvec{v}_h\in K_h^s} \big ( \Vert \varvec{u}-\varvec{v}_h\Vert _{U}^2 + |\langle \lambda ,v_h-u\rangle +\langle \mu _h-\lambda ,u-g\rangle | \big ) \\&\qquad \quad \lesssim h^2 \big (\Vert u\Vert _{H^2(\Omega )}^2 + \Vert \nabla _\mathcal {T}f\Vert _{}^2 +\Vert \lambda \Vert _{}^2 + \Vert g\Vert _{H^2(\Omega )}^2\big ). \end{aligned}$$

Therefore, in view of Theorem 8 it only remains to estimate the consistency error

$$\begin{aligned} \inf _{\varvec{v}\in K^s} | \langle \lambda ,v-u_h\rangle + \langle \mu -\lambda _h,u-g\rangle |. \end{aligned}$$

Define \(\varvec{v}:= (v,{\varvec{\chi }},\mu ):=(v,0,\lambda _h)\in U\) with \(v:=\sup \{u_h,g\}\) and observe that \(\varvec{v}\in K^s\). This directly leads to \(\langle \mu -\lambda _h,u-g\rangle = 0\). For the remaining term we follow the seminal work [14] of Falk. The same lines as in the proof of [14, Lemma 4] show that

$$\begin{aligned} |\langle \lambda ,v-u_h\rangle |\le \Vert \lambda \Vert _{} \Vert v-u_h\Vert _{} \le \Vert \lambda \Vert _{} \Vert g-I_hg\Vert _{} \lesssim h^2\Vert g\Vert _{H^2(\Omega )}\Vert \lambda \Vert _{}. \end{aligned}$$

This finishes the proof. \(\square \)

The proof of the following result can be obtained in the same fashion as the previous one and is therefore omitted.

Theorem 14

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^0\) denote the solution of (VIb) with data \(f\in L^2(\Omega )\), \(g\in H^1(\Omega )\), \(g|_\Gamma \le 0\). Let \(\varvec{u}_h \in K_h\) denote the solution of (9) with \(A=b_\beta \), \(F=G_\beta \), and \(K=K_h\), where either \(K_h=K_h^s\) or \(K_h=K_h^0\). If \(u \in H^2(\Omega )\), \(g\in H^2(\Omega )\) and \(f\in H^1(\mathcal {T})\), then

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}&\le C_\mathrm {app}h (\Vert u\Vert _{H^2(\Omega )} + \Vert \nabla _\mathcal {T}f\Vert _{} +\Vert \lambda \Vert _{} + \Vert g\Vert _{H^2(\Omega )}). \end{aligned}$$

The constant \(C_\mathrm {app}>0\) depends only on \(\beta \), \(\Omega \), and \(\kappa \)-shape regularity of \(\mathcal {T}\).

Finally, we show convergence rates for problem (VIc) and its approximation. Note that for the sets \(K_h^1\), \(K_h^s\) defined in (11c), (11a) it holds that \(K_h^s\subset K_h^1\subset K^1\) and thus the consistency error, see Theorem 10, vanishes. The proof is similar to the one of Theorem 13 and is therefore left to the reader.

Theorem 15

Suppose \(\beta \ge 1+C_F^2\). Let \(\varvec{u}\in K^1\) denote the solution of (VIc) with data \(f\in L^2(\Omega )\), \(g\in H_0^1(\Omega )\). Let \(\varvec{u}_h \in K_h\) denote the solution of (9) with \(A=c_\beta \), \(F=H_\beta \), and \(K=K_h\), where either \(K_h=K_h^s\) or \(K_h=K_h^1\). If \(u \in H^2(\Omega )\), \(g\in H^2(\Omega )\) and \(f\in H^1(\mathcal {T})\), then

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}&\le C_\mathrm {app}h \left( \Vert u\Vert _{H^2(\Omega )} + \Vert \nabla _\mathcal {T}f\Vert _{} +\Vert \lambda \Vert _{} + \Vert g\Vert _{H^2(\Omega )}\right) . \end{aligned}$$

The constant \(C_\mathrm {app}>0\) depends only on \(\beta \), \(\Omega \), and \(\kappa \)-shape regularity of \(\mathcal {T}\).

4 A posteriori analysis

In this section we derive reliable error bounds that can be used as an a posteriori estimator. We define

$$\begin{aligned} {{\,\mathrm{osc}\,}}&:= {{\,\mathrm{osc}\,}}(f) := \Vert (1-\Pi _h)f\Vert _{}. \end{aligned}$$

The estimator below includes the residual term

$$\begin{aligned} \eta ^2:= \eta (\varvec{u}_h,f)^2:= \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+\Pi _hf\Vert _{}^2 + \Vert \nabla u_h-\varvec{\sigma }_h\Vert _{}^2, \end{aligned}$$

which can be localized. The derivation of our estimators is quite simple and is based on the following observation. Let \(\varvec{u}\in K^s\subset K^j\) denote the unique solution of (3) and let \(\varvec{u}_h\in U_{h0}\) be arbitrary. Take \(\beta = 1+C_F^2\) and recall that by Lemma 3 it holds that \(a_\beta (\varvec{v},\varvec{v})=b_\beta (\varvec{v},\varvec{v})=c_\beta (\varvec{v},\varvec{v})\gtrsim \Vert \varvec{v}\Vert _{U}^2\) for all \(\varvec{v}\in U\). Then, together with the Pythagoras theorem \(\Vert \mu \Vert _{}^2 = \Vert (1-\Pi _h)\mu \Vert _{}^2 + \Vert \Pi _h\mu \Vert _{}^2\) for \(\mu \in L^2(\Omega )\) and using \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda +f=0\), \(\nabla u = \varvec{\sigma }\), \({{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h\in \mathcal {P}^0(\mathcal {T})\), it follows that

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2&\lesssim \beta \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+f\Vert _{}^2 + \Vert \nabla u_h-\varvec{\sigma }_h\Vert _{}^2 + \langle \lambda _h-\lambda ,u_h-u\rangle \nonumber \\&= \beta \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+\Pi _h f\Vert _{}^2 + \beta {{\,\mathrm{osc}\,}}^2 \nonumber \\&\quad + \Vert \nabla u_h-\varvec{\sigma }_h\Vert _{}^2 + \langle \lambda _h-\lambda ,u_h-u\rangle \nonumber \\&\le \beta ( \eta ^2 + {{\,\mathrm{osc}\,}}^2) + \langle \lambda _h-\lambda ,u_h-u\rangle . \end{aligned}$$
(16)

The remaining results in this section are proved by estimating the duality term \(\langle \lambda _h-\lambda ,u_h-u\rangle \) from (16). In particular, the proof of the next result employs only \(\lambda _h\ge 0\). We will need the positive and negative parts of a function \(v{:}\,\Omega \rightarrow \mathbb {R}\),

$$\begin{aligned} v_{+} := \max \{0,v\}, \quad v_{-} := -\min \{0,v\}. \end{aligned}$$

This definition implies that \(v = v_+-v_-\). The ideas for estimating the duality term are similar to those in [18, 31] and the references therein; see also [15] for a related estimate for Signorini-type problems. Note that we do not need to assume \(g\in H_0^1(\Omega )\).

Theorem 16

Let \(\varvec{u}\in K^s\) denote the solution of (3). Let \(\varvec{u}_h \in K_h\), where \(K_h\in \{K_h^s,K_h^1\}\), be arbitrary. The error satisfies

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2 \le C_\mathrm {rel}\big ( \eta ^2 + \rho ^2 + {{\,\mathrm{osc}\,}}^2 \big ), \end{aligned}$$

where the estimator contribution \(\rho \) is given by

$$\begin{aligned} \rho ^2 := \langle \lambda _h,(u_h-g)_+\rangle + \Vert \nabla (g-u_h)_+\Vert _{}^2. \end{aligned}$$

The constant \(C_\mathrm {rel}>0\) depends only on \(\Omega \).

Proof

In view of estimate (16) we only have to tackle the term \(\langle \lambda _h-\lambda ,u_h-u\rangle \). Define \(v_h := \max \{u_h,g\}\). Clearly, \(v_h\ge g\) and \(v_h\in H_0^1(\Omega )\). Note that \(\lambda = -\Delta u - f \in H^{-1}(\Omega )\). Therefore, \(\langle \lambda ,v\rangle = (\nabla u,\nabla v)-(f,v)\) for all \(v\in H_0^1(\Omega )\) and using the variational inequality for the exact solution (2) yields

$$\begin{aligned} -\,\langle \lambda ,u_h-u\rangle&= -\langle \lambda ,u_h-v_h\rangle -\langle \lambda ,v_h-u\rangle \le -\langle \lambda ,u_h-v_h\rangle \\&= \langle \lambda ,(u_h-g)_-\rangle = \langle \lambda -\lambda _h,(u_h-g)_-\rangle + \langle \lambda _h,(u_h-g)_-\rangle \\&\le \frac{\delta }{2}\Vert \lambda -\lambda _h\Vert _{-1}^2 + \frac{\delta ^{-1}}{2}\Vert \nabla (u_h-g)_-\Vert _{}^2 + \langle \lambda _h,(u_h-g)_-\rangle \end{aligned}$$

for all \(\delta >0\). Employing \(\lambda _h\ge 0\), \(g-u\le 0\), and \(v+v_-=v_+\) we further infer that

$$\begin{aligned} \langle \lambda _h-\lambda ,u_h-u\rangle&\le \langle \lambda _h,u_h-g+(u_h-g)_-\rangle + \langle \lambda _h,g-u\rangle \\&\qquad +\,\frac{\delta }{2}\Vert \lambda -\lambda _h\Vert _{-1}^2 + \frac{\delta ^{-1}}{2}\Vert \nabla (u_h-g)_-\Vert _{}^2 \\&\le \langle \lambda _h,(u_h-g)_+\rangle + \frac{\delta }{2}\Vert \lambda -\lambda _h\Vert _{-1}^2 + \frac{\delta ^{-1}}{2}\Vert \nabla (u_h-g)_-\Vert _{}^2. \end{aligned}$$

Recall that \(\Vert \lambda -\lambda _h\Vert _{-1}\le \Vert \varvec{u}-\varvec{u}_h\Vert _{V}\lesssim \Vert \varvec{u}-\varvec{u}_h\Vert _{U}\), where the involved constant depends only on \(\Omega \). Thus, choosing \(\delta >0\) sufficiently small the proof is concluded with (16). \(\square \)

We could derive a similar estimate for \(\varvec{u}_h\in K_h^0\) by exchanging the roles of \(u_h\) and \(\lambda _h\) resp. of u and \(\lambda \) in the proof. However, this leads to an estimator with a non-local term. To see this, suppose \(g=0\). Then, following the last proof we get

$$\begin{aligned} \langle \lambda _h-\lambda ,u_h-u\rangle \le \langle (\lambda _h)_+,u_h\rangle + \frac{\delta }{2} \Vert \nabla (u-u_h)\Vert _{}^2 + \frac{\delta ^{-1}}{2} \Vert (\lambda _h)_{-}\Vert _{-1}^2 \end{aligned}$$

for \(\delta >0\). For the total error this would yield

$$\begin{aligned} \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2 \lesssim \eta ^2 + {{\,\mathrm{osc}\,}}^2 + \langle (\lambda _h)_+,u_h\rangle + \Vert (\lambda _h)_{-}\Vert _{-1}^2. \end{aligned}$$

The last term is not localizable and therefore it is not feasible to use this estimate as an a posteriori error estimator in an adaptive algorithm.

Remark 17

The derived estimator is efficient up to the term \(\rho \), i.e.,

$$\begin{aligned} \eta ^2 + {{\,\mathrm{osc}\,}}^2 \lesssim \Vert \varvec{u}-\varvec{u}_h\Vert _{U}^2. \end{aligned}$$

To see this, we employ the Pythagoras theorem (note that \({{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+\Pi _h f\) is \(\mathcal {T}\)-elementwise constant and hence \(L^2(\Omega )\)-orthogonal to \((1-\Pi _h)f\)) to obtain

$$\begin{aligned} \eta ^2+{{\,\mathrm{osc}\,}}^2 = \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+f\Vert _{}^2 + \Vert \nabla u_h-\varvec{\sigma }_h\Vert _{}^2. \end{aligned}$$

Then, \({{\,\mathrm{div}\,}}\varvec{\sigma }+\lambda =-f\), \(\nabla u = \varvec{\sigma }\) and the triangle inequality prove the asserted estimate. The proof of the efficiency estimate \(\rho \lesssim \Vert \varvec{u}-\varvec{u}_h\Vert _{U}\) (up to possible data resp. obstacle oscillations) is an open problem, see also the related works [1, 18].

5 Examples

In this section we present numerical studies that demonstrate the performance of our proposed methods in different situations:

  • In Sect. 5.3 we consider a problem on the unit square with smooth obstacle and known smooth solution.

  • In Sect. 5.4 we consider the example from [4, Section 5.2] where the solution is known and exhibits a singularity.

  • In Sect. 5.5 we consider a problem on an L-shaped domain with a pyramid-like obstacle and unknown solution.

Before we come to a detailed discussion of the numerical studies, some remarks are in order. In all examples we choose \(\beta = 1+{{\,\mathrm{diam}\,}}(\Omega )^2\) to ensure coercivity of the bilinear forms (Lemma 3). This also implies that the Galerkin matrices associated to the bilinear forms \(a_\beta \), \(b_\beta \), and \(c_\beta \) are positive definite. Choosing standard basis functions for \(\mathcal {S}_0^1(\mathcal {T})\) (nodal basis), \(\mathcal {R}\!\mathcal {T}^0(\mathcal {T})\) (lowest-order Raviart–Thomas basis) and \(\mathcal {P}^0(\mathcal {T})\) (characteristic functions), the constraints in the discrete convex sets \(K_h^\star \), where \(\star =0\), \(\star =1\) or \(\star =s\), are straightforward to impose. The resulting discrete variational inequalities are then solved using a (primal-dual) active set strategy, see e.g., [21,22,23].

5.1 Active set method and discrete variational inequalities

In this section we first define and collect results on the (primal-dual) active set method. Then, we recall the variational inequalities (VIa)–(VIc) and write down their discrete variants.

5.1.1 Active set method

Let \(\mathcal {N}= \{1,\dots ,N\}\), \(N\in \mathbb {N}\), and let \(\mathcal {N}_\gamma \subseteq \mathcal {N}\) be a non-empty subset. We set \(\mathcal {N}_\omega := \mathcal {N}{\setminus } \mathcal {N}_\gamma \). For a vector \(\varvec{x}\in \mathbb {R}^N\) we write \(\varvec{x}=0\) if all components are equal to 0. Similarly, \(\varvec{x}\ge 0\) means that all components of \(\varvec{x}\) are \(\ge 0\). For a subset \(\mathcal {I}\subseteq \mathcal {N}\), \(\varvec{x}_{\mathcal {I}} = 0\) means \(\varvec{x}_i = 0\) for all \(i\in \mathcal {I}\). We also use the notation \(\varvec{x}_{\mathcal {I}}\ge 0\), which means \(\varvec{x}_i\ge 0\) for all \(i\in \mathcal {I}\) and \(\varvec{x}_\mathcal {I}\ge \varvec{y}_\mathcal {I}\) stands for \(\varvec{x}_\mathcal {I}-\varvec{y}_\mathcal {I}\ge 0\).

For \(\varvec{g}\in \mathbb {R}^N\) we consider the convex set

$$\begin{aligned} K := K_{\varvec{g}} := \big \{\varvec{x}\in \mathbb {R}^N{:}\,\varvec{x}_{\mathcal {N}_\gamma }\ge \varvec{g}_{\mathcal {N}_\gamma }\big \}. \end{aligned}$$

Let \(\varvec{S}\in \mathbb {R}^{N\times N}\) denote a positive definite (but possibly non-symmetric) matrix, and \(\varvec{b}\in \mathbb {R}^N\) some arbitrary vector. We consider the variational inequality: find \(\varvec{x}\in K\), such that

$$\begin{aligned} \langle \varvec{S}\varvec{x},\varvec{y}-\varvec{x}\rangle _2 \ge \langle \varvec{b},\varvec{y}-\varvec{x}\rangle _2 \quad \text {for all } \varvec{y}\in K, \end{aligned}$$
(17)

where \(\langle \cdot ,\cdot \rangle _2\) denotes the Euclidean inner product on \(\mathbb {R}^N\). Since \(\varvec{S}\) is positive definite this problem admits a unique solution. It is well-known that problem (17) can be rewritten as follows: find \((\varvec{x},\varvec{\lambda })\in \mathbb {R}^N\times \mathbb {R}^N\) such that

$$\begin{aligned} \varvec{S}\varvec{x}-\varvec{\lambda }= & {} \varvec{b}, \end{aligned}$$
(18a)
$$\begin{aligned} \varvec{\lambda }_{\mathcal {N}_\omega }= & {} 0, \end{aligned}$$
(18b)
$$\begin{aligned} \varvec{\lambda }_{\mathcal {N}_\gamma }= & {} \max \{0,\varvec{\lambda }_{\mathcal {N}_\gamma }-C(\varvec{x}_{\mathcal {N}_\gamma }-\varvec{g}_{\mathcal {N}_\gamma })\}, \end{aligned}$$
(18c)

where \(\max \{\cdot ,\cdot \}\) denotes the componentwise maximum and \(C>0\) is some constant. Note that the solution is independent of C. Now following the seminal work [21] one defines a (semi-smooth) Newton method for solving (18). The same lines of argumentation as in [21] show that the method can be written as an active set strategy. The algorithm adapted to our situation is given in Algorithm 1.

Algorithm 1: Primal-dual active set strategy for the solution of (18).

The solution of the linear system in Line 8 of Algorithm 1 can be written (with \(\mathcal {I}=\mathcal {I}^k\), \(\mathcal {J}=\mathcal {J}^k\)) as

$$\begin{aligned} \begin{pmatrix} \varvec{S}_{\mathcal {I}\mathcal {I}} &{}\quad \varvec{S}_{\mathcal {I}\mathcal {J}} \\ \varvec{S}_{\mathcal {J}\mathcal {I}} &{}\quad \varvec{S}_{\mathcal {J}\mathcal {J}} \end{pmatrix} \begin{pmatrix} \varvec{x}_{\mathcal {I}} \\ \varvec{x}_{\mathcal {J}} \end{pmatrix} - \begin{pmatrix} \varvec{\lambda }_{\mathcal {I}} \\ \varvec{\lambda }_{\mathcal {J}} \end{pmatrix} = \begin{pmatrix} \varvec{b}_{\mathcal {I}} \\ \varvec{b}_{\mathcal {J}} \end{pmatrix}. \end{aligned}$$

With the constraints \(\varvec{x}_{\mathcal {J}} = \varvec{g}_\mathcal {J}\) and \(\varvec{\lambda }_{\mathcal {I}} = 0\) this reduces to the solution of the system

$$\begin{aligned} \varvec{S}_{\mathcal {I}\mathcal {I}} \varvec{x}_{\mathcal {I}} = \varvec{b}_{\mathcal {I}}-\varvec{S}_{\mathcal {I}\mathcal {J}}\varvec{g}_\mathcal {J}\end{aligned}$$

and the definition

$$\begin{aligned} \varvec{\lambda }_{\mathcal {J}} := \varvec{S}_{\mathcal {J}\mathcal {I}}\varvec{x}_\mathcal {I}+ \varvec{S}_{\mathcal {J}\mathcal {J}}\varvec{g}_\mathcal {J}-\varvec{b}_\mathcal {J}. \end{aligned}$$

Since \(\varvec{S}\) is positive definite the subblock \(\varvec{S}_{\mathcal {I}\mathcal {I}}\) is as well and thus \(\varvec{S}_{\mathcal {I}\mathcal {I}}\) is invertible.
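For concreteness, we sketch the resulting iteration in Python (numpy, dense matrices, 0-based indexing). The function name pdas and the array layout are our choices; the reduced solve and the stopping test correspond to Line 8 and Line 5 of Algorithm 1, respectively:

```python
import numpy as np

def pdas(S, b, g, N_gamma, C=1.0, max_iter=100):
    """Primal-dual active set strategy for (17)/(18).

    S       : (N, N) positive definite Galerkin matrix
    b       : (N,) load vector
    g       : (N,) obstacle vector; only entries in N_gamma are used
    N_gamma : sorted array of constrained indices (0-based)
    """
    N = len(b)
    x, lam = np.zeros(N), np.zeros(N)
    J = np.asarray(N_gamma)                   # initial active set, cf. Sect. 5.1.6
    for k in range(max_iter):
        I = np.setdiff1d(np.arange(N), J)     # inactive indices
        # Line 8: reduced solve with x_J = g_J and lam_I = 0
        x[J] = g[J]
        x[I] = np.linalg.solve(S[np.ix_(I, I)], b[I] - S[np.ix_(I, J)] @ g[J])
        lam[:] = 0.0
        lam[J] = (S @ x - b)[J]               # = S_JI x_I + S_JJ g_J - b_J
        # new active set from the complementarity condition (18c)
        active = lam[N_gamma] - C * (x[N_gamma] - g[N_gamma]) > 0
        J_new = np.asarray(N_gamma)[active]
        if np.array_equal(J_new, J):          # Line 5: J^k = J^{k-1}, cf. Lemma 19
            break
        J = J_new
    return x, lam
```

By Lemma 19 below, once the active set repeats, the returned pair solves the variational inequality (17) exactly (up to the accuracy of the linear solver).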

Some remarks are in order. We can follow the analysis of [21] to see that the basic (local) convergence result holds true in our case as well.

Proposition 18

[21, Theorem 3.1] If the initial guess \((\varvec{x}^0,\varvec{\lambda }^0)\) is sufficiently close to the exact solution \((\varvec{x},\varvec{\lambda })\) of (18), then the iterates \((\varvec{x}^k,\varvec{\lambda }^k)\) of Algorithm 1 converge superlinearly to \((\varvec{x},\varvec{\lambda })\).

The stopping criterion in Line 5 can be replaced by other criteria. Here, we stop as soon as \(\mathcal {J}^k =\mathcal {J}^{k-1}\), because then we know that we have hit the exact solution of (17). The proof of the following result is a slight modification of the proof of [23, Lemma 3.1]; the interested reader can find it in “Appendix C”.

Lemma 19

If the stopping criterion in Line 5 of Algorithm 1 is satisfied, then \(\varvec{x}=\varvec{x}^k\) is the solution of (17).

5.1.2 Discrete variational inequalities

In this section we recall the discrete versions of the variational inequalities (VIa)–(VIc) and present them in matrix-vector form. They fit into the abstract framework given in Sect. 5.1.1.

Let us recall the discrete space from Sect. 3.3,

$$\begin{aligned} U_{h0} = \mathcal {S}_0^1(\mathcal {T}) \times \mathcal {R}\!\mathcal {T}^0(\mathcal {T}) \times \mathcal {P}^0(\mathcal {T}). \end{aligned}$$

Let \(\mathcal {E}\) denote the set of edges (\(n=2\)) resp. faces (\(n=3\)). Then, \(\dim (U_{h0}) = \#\mathcal {V}_0+\#\mathcal {E}+\#\mathcal {T}=:N\). Numbering the nodes \(x_j\) of \(\mathcal {V}_0\), the edges/faces \(E_j\) in \(\mathcal {E}\) and the elements \(T_j\) in \(\mathcal {T}\), we consider the following functions:

  • For \(j=1,\dots ,\#\mathcal {V}_0\) let \(v_j\) denote the nodal basis functions associated to the node \(x_j\in \mathcal {V}_0\).

  • For \(j=1,\dots ,\#\mathcal {E}\) let \({\varvec{\tau }}^{(j)}\) denote the Raviart–Thomas basis functions associated to the edge/face \(E_j\in \mathcal {E}\).

  • For \(j=1,\dots ,\#\mathcal {T}\) let \(\chi _j\) denote the characteristic function of the element \(T_j\in \mathcal {T}\).

We define the basis \(({\varvec{\xi }}^{(j)})_{j=1}^N\) for the space \(U_{h0}\) by

$$\begin{aligned} \begin{array}{lll} {\varvec{\xi }}^{(j)} := (v_j,0,0) &{}\quad \text {for}\,j=1,\dots ,\#\mathcal {V}_0, \\ {\varvec{\xi }}^{(\#\mathcal {V}_0+j)} := (0,{\varvec{\tau }}^{(j)},0) &{}\quad \text {for}\,j=1,\dots ,\#\mathcal {E}, \\ {\varvec{\xi }}^{(\#\mathcal {V}_0+\#\mathcal {E}+j)} := (0,0,\chi _j) &{}\quad \text {for}\,j=1,\dots ,\#\mathcal {T}. \end{array} \end{aligned}$$

Recall from (11) the discrete convex sets

$$\begin{aligned} K_h^s&:= \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,\mu _h\ge 0, \, v_h(x)\ge g(x) \text { for all } x\in \mathcal {V}_0\big \},\\ K_h^0&:= \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,v_h(x)\ge g(x) \text { for all } x\in \mathcal {V}_0\big \}, \\ K_h^1&:= \big \{(v_h,{\varvec{\tau }}_h,\mu _h)\in U_{h0}{:}\,\mu _h\ge 0\big \}. \end{aligned}$$

These convex subsets of \(U_{h0}\) correspond to convex subsets of \(\mathbb {R}^N\) as follows: For given obstacle function \(g\in H_0^1(\Omega )\cap C^0(\overline{\Omega })\) define the vector \(\varvec{g}\in \mathbb {R}^N\) by

$$\begin{aligned} \varvec{g}_j = {\left\{ \begin{array}{ll} g(x_j) &{}\quad \text {for }j=1,\dots ,\#\mathcal {V}_0, \\ 0 &{}\quad \text {else} \end{array}\right. }. \end{aligned}$$

Let \(\mathcal {N}= \{1,\dots ,N\}\) and define \(\mathcal {N}_\gamma ^s\), \(\mathcal {N}_\gamma ^0\), \(\mathcal {N}_\gamma ^1\) by

$$\begin{aligned} \mathcal {N}_\gamma ^s&:= \{1,\dots ,\#\mathcal {V}_0,\#\mathcal {V}_0+\#\mathcal {E}+1,\dots ,N\} \subset \mathcal {N}, \\ \mathcal {N}_\gamma ^0&:= \{1,\dots ,\#\mathcal {V}_0\}\subset \mathcal {N}, \\ \mathcal {N}_\gamma ^1&:= \{\#\mathcal {V}_0+\#\mathcal {E}+1,\dots ,N\} \subset \mathcal {N}. \end{aligned}$$

Then, the three sets \(K_h^s\), \(K_h^0\), \(K_h^1\) correspond to the sets

$$\begin{aligned} K_N^s&:= \big \{\varvec{x}\in \mathbb {R}^N{:}\,\varvec{x}_{\mathcal {N}_\gamma ^s}\ge \varvec{g}_{\mathcal {N}_\gamma ^s}\big \}, \\ K_N^0&:= \big \{\varvec{x}\in \mathbb {R}^N{:}\,\varvec{x}_{\mathcal {N}_\gamma ^0}\ge \varvec{g}_{\mathcal {N}_\gamma ^0}\big \}, \\ K_N^1&:= \big \{\varvec{x}\in \mathbb {R}^N{:}\,\varvec{x}_{\mathcal {N}_\gamma ^1}\ge 0\big \}. \end{aligned}$$
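In an implementation, these index sets and the obstacle vector translate directly into array operations. The following Python fragment is a sketch with hypothetical mesh counts (0-based indexing; all names are ours):

```python
import numpy as np

# hypothetical counts of interior vertices, edges/faces, and elements
nV0, nE, nT = 25, 56, 32
N = nV0 + nE + nT

# index sets N_gamma^s, N_gamma^0, N_gamma^1 (0-based)
N_gamma_s = np.concatenate([np.arange(nV0), np.arange(nV0 + nE, N)])
N_gamma_0 = np.arange(nV0)
N_gamma_1 = np.arange(nV0 + nE, N)

# obstacle vector: nodal values g(x_j) at interior vertices, zero elsewhere
g_fun = lambda x, y: 0.0               # placeholder obstacle function
vertices = np.zeros((nV0, 2))          # placeholder interior vertex coordinates
g_vec = np.zeros(N)
g_vec[:nV0] = [g_fun(x, y) for x, y in vertices]
```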

With these definitions we can now state the algebraic forms of the discrete variational inequalities:

5.1.3 Discrete version of (VIa) with \(K_h^s\)

The discrete version of (VIa) with convex set \(K_h^s\) reads: find \(\varvec{u}_h\in K_h^s\) such that

$$\begin{aligned} a_\beta (\varvec{u}_h,\varvec{v}_h-\varvec{u}_h) \ge F_\beta (\varvec{v}_h-\varvec{u}_h) \quad \text {for all } \varvec{v}_h\in K_h^s. \end{aligned}$$
(19)

Let \(\varvec{S}^{(s)}\in \mathbb {R}^{N\times N}\) denote the Galerkin matrix of the bilinear form \(a_\beta (\cdot ,\cdot )\) and let \(\varvec{b}^{(s)}\in \mathbb {R}^N\) denote the load vector, i.e.,

$$\begin{aligned} \varvec{S}^{(s)}_{jk} = a_\beta ({\varvec{\xi }}^{(k)},{\varvec{\xi }}^{(j)}), \quad \varvec{b}^{(s)}_j = F_\beta ({\varvec{\xi }}^{(j)}) \end{aligned}$$

for all \(j,k=1,\dots ,N\). Note that \(\varvec{S}^{(s)}\) is symmetric and positive definite. Problem (19) then reads in algebraic form as: find \(\varvec{x}\in K_N^s\) such that

$$\begin{aligned} \langle \varvec{S}^{(s)}\varvec{x},\varvec{y}-\varvec{x}\rangle _2 \ge \langle \varvec{b}^{(s)},\varvec{y}-\varvec{x}\rangle _2 \quad \text {for all }\varvec{y}\in K_N^s. \end{aligned}$$
(20)

5.1.4 Discrete version of (VIb) with \(K_h^0\)

The discrete version of (VIb) with convex set \(K_h^0\) reads: find \(\varvec{u}_h\in K_h^0\) such that

$$\begin{aligned} b_\beta (\varvec{u}_h,\varvec{v}_h-\varvec{u}_h) \ge G_\beta (\varvec{v}_h-\varvec{u}_h) \quad \text {for all } \varvec{v}_h\in K_h^0. \end{aligned}$$
(21)

Let \(\varvec{S}^{(0)}\in \mathbb {R}^{N\times N}\) denote the Galerkin matrix of the bilinear form \(b_\beta (\cdot ,\cdot )\) and let \(\varvec{b}^{(0)}\in \mathbb {R}^N\) denote the load vector, i.e.,

$$\begin{aligned} \varvec{S}^{(0)}_{jk} = b_\beta ({\varvec{\xi }}^{(k)},{\varvec{\xi }}^{(j)}), \quad \varvec{b}^{(0)}_j = G_\beta ({\varvec{\xi }}^{(j)}) \end{aligned}$$

for all \(j,k=1,\dots ,N\). Note that \(\varvec{S}^{(0)}\) is non-symmetric and positive definite. Problem (21) then reads in algebraic form as: find \(\varvec{x}\in K_N^0\) such that

$$\begin{aligned} \langle \varvec{S}^{(0)}\varvec{x},\varvec{y}-\varvec{x}\rangle _2 \ge \langle \varvec{b}^{(0)},\varvec{y}-\varvec{x}\rangle _2 \quad \text {for all }\varvec{y}\in K_N^0. \end{aligned}$$
(22)

5.1.5 Discrete version of (VIc) with \(K_h^1\)

The discrete version of (VIc) with convex set \(K_h^1\) reads: find \(\varvec{u}_h\in K_h^1\) such that

$$\begin{aligned} c_\beta (\varvec{u}_h,\varvec{v}_h-\varvec{u}_h) \ge H_\beta (\varvec{v}_h-\varvec{u}_h) \quad \text {for all } \varvec{v}_h\in K_h^1. \end{aligned}$$
(23)

Let \(\varvec{S}^{(1)}\in \mathbb {R}^{N\times N}\) denote the Galerkin matrix of the bilinear form \(c_\beta (\cdot ,\cdot )\) and let \(\varvec{b}^{(1)}\in \mathbb {R}^N\) denote the load vector, i.e.,

$$\begin{aligned} \varvec{S}^{(1)}_{jk} = c_\beta ({\varvec{\xi }}^{(k)},{\varvec{\xi }}^{(j)}), \quad \varvec{b}^{(1)}_j = H_\beta ({\varvec{\xi }}^{(j)}) \end{aligned}$$

for all \(j,k=1,\dots ,N\). Note that \(\varvec{S}^{(1)}\) is non-symmetric and positive definite. Problem (23) then reads in algebraic form as: find \(\varvec{x}\in K_N^1\) such that

$$\begin{aligned} \langle \varvec{S}^{(1)}\varvec{x},\varvec{y}-\varvec{x}\rangle _2 \ge \langle \varvec{b}^{(1)},\varvec{y}-\varvec{x}\rangle _2 \quad \text {for all }\varvec{y}\in K_N^1. \end{aligned}$$
(24)

5.1.6 Solver setup

The algebraic problems (20), (22) and (24) are then solved using Algorithm 1. The initial data \((\varvec{x}^0,\varvec{\lambda }^0)\) is chosen as the solution of

$$\begin{aligned} \varvec{S}^{(\star )} \varvec{x}^0&= \varvec{b}^{(\star )}, \\ \varvec{x}^0_{\mathcal {N}_\gamma ^{\star }}&= \varvec{g}_{\mathcal {N}_\gamma ^{\star }} \end{aligned}$$

and

$$\begin{aligned} \varvec{\lambda }^0 := \max \{0,\varvec{S}^{(\star )}\varvec{x}^0-\varvec{b}^{(\star )}\}, \end{aligned}$$

where \(\star =s\), \(\star =0\) or \(\star =1\). The constant C in Algorithm 1 is chosen as \(C=1\). The linear systems in Line 8 of Algorithm 1 are solved using the MATLAB backslash operator.
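Putting the pieces together, solving, e.g., the algebraic problem (20) with the pdas sketch from Sect. 5.1.1 might read as follows, where the Galerkin matrix S_s and load vector b_s of \(a_\beta \) and \(F_\beta \) are assumed to be assembled elsewhere:

```python
# S_s, b_s assembled from a_beta and F_beta; g_vec, N_gamma_s as above
x, lam = pdas(S_s, b_s, g_vec, N_gamma_s, C=1.0)
u_coeffs = x[:nV0]                  # coefficients of u_h (nodal basis)
sigma_coeffs = x[nV0:nV0 + nE]      # coefficients of sigma_h (Raviart-Thomas)
lambda_coeffs = x[nV0 + nE:]        # coefficients of lambda_h (piecewise constant)
```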

5.2 Error and estimator quantities

We define the error resp. total estimator by

$$\begin{aligned} {{\,\mathrm{err}\,}}_U := \Vert \varvec{u}-\varvec{u}_h\Vert _{U}, \quad {{\,\mathrm{est}\,}}^2 := \eta ^2 + \rho ^2 + {{\,\mathrm{osc}\,}}^2. \end{aligned}$$

Note that the estimator can be decomposed into local contributions,

$$\begin{aligned} {{\,\mathrm{est}\,}}^2 = \sum _{T\in \mathcal {T}} {{\,\mathrm{est}\,}}(T)^2&:= \sum _{T\in \mathcal {T}} \Big ( \Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+\Pi _h f\Vert _{T}^2 + \Vert \nabla u_h-\varvec{\sigma }_h\Vert _{T}^2 \\&\qquad +\,(\lambda _h,(u_h-g)_+)_T + \Vert \nabla (g-u_h)_+\Vert _{T}^2 + \Vert (1-\Pi _h)f\Vert _{T}^2 \Big ), \end{aligned}$$

where \(\Vert \cdot \Vert _{T}\) denotes the \(L^2(T)\) norm and \((\cdot ,\cdot )_T\) the \(L^2(T)\) inner product. Moreover, we will estimate the error in the weaker norm \(\Vert \cdot \Vert _{V}\). To do so we consider an upper bound given by

$$\begin{aligned} {{\,\mathrm{err}\,}}_V^2 := \Vert \nabla (u-u_h)\Vert _{}^2 + \Vert \varvec{\sigma }-\varvec{\sigma }_h\Vert _{}^2 + \Vert \lambda -\lambda _h\Vert _{-1,h}^2, \end{aligned}$$

where the evaluation of \(\Vert \cdot \Vert _{-1,h}\) is based on the discrete \(H^{-1}(\Omega )\) norm discussed in the seminal work [8]: Let \(Q_h{:}\,L^2(\Omega )\rightarrow \mathcal {S}_0^1(\mathcal {T})\) denote the \(L^2(\Omega )\) projector. Let \(\mu \in L^2(\Omega )\). We stress that using the projection and local approximation property of \(Q_h\) yields

$$\begin{aligned} \Vert (1-Q_h)\mu \Vert _{-1} = \sup _{0\ne v\in H_0^1(\Omega )} \frac{\langle (1-Q_h)\mu ,(1-Q_h)v\rangle }{\Vert \nabla v\Vert _{}} \lesssim \Vert h_\mathcal {T}\mu \Vert _{}, \end{aligned}$$

where the involved constant depends on shape regularity of \(\mathcal {T}\). Following [8] it holds that

$$\begin{aligned} \Vert \mu \Vert _{-1} \le \Vert (1-Q_h)\mu \Vert _{-1} + \Vert Q_h\mu \Vert _{-1} \lesssim \Vert h_\mathcal {T}\mu \Vert _{} + \Vert \nabla u_h[\mu ]\Vert _{} \end{aligned}$$

where \(u_h[\mu ]\in \mathcal {S}_0^1(\mathcal {T})\) is the solution of

$$\begin{aligned} (\nabla u_h[\mu ],\nabla v_h) = \langle \mu ,v_h\rangle \quad \text {for all }v_h\in \mathcal {S}_0^1(\mathcal {T}). \end{aligned}$$

Note that \(\Vert \nabla u_h[\mu ]\Vert _{}\le \Vert \mu \Vert _{-1}\). The estimate \(\Vert Q_h\mu \Vert _{-1}\lesssim \Vert \nabla u_h[\mu ]\Vert _{}\) depends on the stability of the projection \(Q_h\) in \(H^1(\Omega )\), \(\Vert \nabla Q_h v\Vert _{} \lesssim \Vert \nabla v\Vert _{}\) for \(v\in H_0^1(\Omega )\), i.e.,

$$\begin{aligned} \Vert Q_h\mu \Vert _{-1}&= \sup _{0\ne v\in H_0^1(\Omega )} \frac{\langle Q_h \mu ,v\rangle }{\Vert \nabla v\Vert _{}} = \sup _{0\ne v\in H_0^1(\Omega )} \frac{\langle \mu ,Q_hv\rangle }{\Vert \nabla v\Vert _{}}\\&= \sup _{0\ne v\in H_0^1(\Omega )} \frac{(\nabla u_h[\mu ],\nabla Q_hv)}{\Vert \nabla v\Vert _{}} \\&\lesssim \sup _{0\ne v\in H_0^1(\Omega )} \frac{(\nabla u_h[\mu ],\nabla Q_h v)}{\Vert \nabla Q_h v\Vert _{}} = \Vert \nabla u_h[\mu ]\Vert _{}. \end{aligned}$$

Here, we use newest-vertex bisection [30] as the refinement strategy, for which the required \(H^1(\Omega )\) stability of the \(L^2(\Omega )\) projection is known [24].
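In code, evaluating the discrete \(H^{-1}(\Omega )\) bound thus amounts to one discrete Poisson solve. A minimal sketch (assuming the stiffness matrix and the moments \(\langle \mu ,v_j\rangle \) of the nodal basis are assembled; the two contributions are combined in an \(\ell ^2\) fashion as in the definition of \({{\,\mathrm{err}\,}}_V\)):

```python
import numpy as np
import scipy.sparse.linalg as spla

def h_minus1_norm(A, mu_moments, h_weighted_mu):
    """Upper bound ||mu||_{-1} <~ ||h_T mu|| + ||grad u_h[mu]|| following [8].

    A             : sparse stiffness matrix (grad v_k, grad v_j) on S^1_0(T)
    mu_moments    : vector of moments <mu, v_j> for the nodal basis
    h_weighted_mu : precomputed L^2 norm ||h_T mu||
    """
    u_mu = spla.spsolve(A, mu_moments)      # discrete solution u_h[mu]
    energy = np.sqrt(u_mu @ (A @ u_mu))     # ||grad u_h[mu]||
    return np.sqrt(h_weighted_mu**2 + energy**2)
```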

Fig. 1: Convergence rates for the problem from Sect. 5.3

We use an adaptive algorithm that basically consists of iterating the four steps solve \(\rightarrow \) estimate \(\rightarrow \) mark \(\rightarrow \) refine, where the marking step is done with the bulk criterion, i.e., we determine a set \(\mathcal {M}\subseteq \mathcal {T}\) of (up to a constant) minimal cardinality with

$$\begin{aligned} \theta {{\,\mathrm{est}\,}}^2 \le \sum _{T\in \mathcal {M}} {{\,\mathrm{est}\,}}(T)^2. \end{aligned}$$

For the experiments the marking parameter \(\theta \) is set to \(\tfrac{1}{4}\).
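The marking step itself is a simple sort-and-accumulate procedure. A sketch, with est2_local holding the local values \({{\,\mathrm{est}\,}}(T)^2\) (the function name is ours):

```python
import numpy as np

def bulk_mark(est2_local, theta=0.25):
    """Return element indices of a minimal set M with
    theta * est^2 <= sum over M of est(T)^2."""
    order = np.argsort(est2_local)[::-1]   # largest indicators first
    csum = np.cumsum(est2_local[order])
    m = np.searchsorted(csum, theta * csum[-1]) + 1
    return order[:m]
```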

Convergence rates in the figures are indicated by triangles, where the number \(\alpha \) next to the triangle denotes the experimental rate \(\mathcal {O}( (\#\mathcal {T})^{-\alpha })\). For uniform refinement we have \(h^{2\alpha } \simeq (\#\mathcal {T})^{-\alpha }\), since \(\#\mathcal {T}\simeq h^{-2}\) for the two-dimensional examples considered here.

Fig. 2: Convergence rates for the problem from Sect. 5.4. The upper plot shows the total errors and estimators for uniform and adaptive refinement. The lower plot compares the error and estimator contributions in the case of adaptive refinement

Fig. 3: Approximation \(\lambda _h\) (left) and distribution of the estimator contribution \(\rho ^2\) (right) for the example from Sect. 5.4

Fig. 4: Experimental convergence rates for the problem from Sect. 5.5

Fig. 5: Adaptively refined meshes and corresponding solution component \(u_h\) for the problem from Sect. 5.5

5.3 Smooth solution

Let \(\Omega = (0,1)^2\), \(u(x,y) = (1-x)x(1-y)y\),

$$\begin{aligned} f(x,y) := {\left\{ \begin{array}{ll} 0 &{}\quad x<\tfrac{1}{2} \\ -\,\Delta u (x,y) &{}\quad x\ge \tfrac{1}{2} \end{array}\right. }. \end{aligned}$$

Then, u solves the obstacle problem (1) with data f and obstacle

$$\begin{aligned} g(x,y) = {\left\{ \begin{array}{ll} (1-x)x(1-y)y &{}\quad x\le \tfrac{1}{2} \\ \widetilde{g}(x) (1-y)y &{}\quad x\in \left( \tfrac{1}{2},\tfrac{3}{4}\right) \\ 0 &{}\quad x\ge \tfrac{3}{4} \end{array}\right. }, \end{aligned}$$

where \(\widetilde{g}\) is the unique polynomial of degree 3 such that g and \(\nabla g\) are continuous at the lines \(x=\tfrac{1}{2},\tfrac{3}{4}\) (a sketch of how to compute its coefficients is given at the end of this subsection). In particular, \(g\in H^2(\Omega )\). Note that \(\lambda = -\Delta u-f \in H^1(\mathcal {T})\). Figure 1 shows that the convergence rates for the solutions of the discrete variational inequalities (VIa)–(VIc) based on the convex sets \(K_h^s\), \(K_h^0\), \(K_h^1\) are optimal. This fits perfectly with our theoretical considerations in Theorems 13–15. Additionally, we plot \({{\,\mathrm{err}\,}}_V\), which is in all cases slightly smaller than \({{\,\mathrm{err}\,}}_U\) but of the same order. Note that since \(\lambda \) is a \(\mathcal {T}\)-elementwise polynomial, an inverse inequality shows that \(h\Vert \lambda -\lambda _h\Vert _{} \lesssim \Vert \lambda -\lambda _h\Vert _{-1}\) and thus \({{\,\mathrm{err}\,}}_V\) is equivalent to \(\Vert \varvec{u}-\varvec{u}_h\Vert _{V}\).
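As announced above, the coefficients of \(\widetilde{g}\) follow from the four Hermite conditions \(\widetilde{g}(\tfrac{1}{2})=\tfrac{1}{4}\), \(\widetilde{g}'(\tfrac{1}{2})=0\), \(\widetilde{g}(\tfrac{3}{4})=\widetilde{g}'(\tfrac{3}{4})=0\), read off from the two neighboring branches of g. A small numpy sketch:

```python
import numpy as np

x0, x1 = 0.5, 0.75
# rows: values at x0, x1, then slopes at x0, x1, for g_tilde(x) = sum_k c_k x^k
A = np.array([[1, x, x**2, x**3] for x in (x0, x1)]
             + [[0, 1, 2*x, 3*x**2] for x in (x0, x1)], dtype=float)
rhs = np.array([0.25, 0.0, 0.0, 0.0])   # (1-x)x = 1/4 and slope 1-2x = 0 at x0
c = np.linalg.solve(A, rhs)             # coefficients c_0, ..., c_3
```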

5.4 Manufactured solution on L-shaped domain

We consider the same problem as given in [4, Section 5.2], where \(g=0\), \(\Omega = (-2,2)^2{\setminus } [0,2]^2\) and

$$\begin{aligned} f(r,\varphi ) := -r^{2/3} \sin (2/3\varphi )(\gamma '(r)/r + \gamma ''(r)) - 4/3 r^{-1/3} \gamma '(r) \sin (2/3\varphi )-\delta (r), \end{aligned}$$

where \((r,\varphi )\) denote polar coordinates. With \(r_* = 2(r-1/4)\), \(\gamma ,\delta \) are given by

$$\begin{aligned} \gamma (r) := {\left\{ \begin{array}{ll} 1 &{}\quad r_*< 0, \\ -\,6r_*^5 + 15r_*^4 -10r_*^3+1 &{}\quad 0\le r_* < 1, \\ 0 &{}\quad 1\le r_*, \end{array}\right. } \quad \delta (r) := {\left\{ \begin{array}{ll} 0 &{}\quad r \le 5/4, \\ 1 &{}\quad r > 5/4. \end{array}\right. } \end{aligned}$$

The exact solution then reads \(u(r,\varphi ) = r^{2/3}\sin (2/3\varphi )\gamma (r)\); in particular, the construction yields \(\lambda = -\Delta u - f = \delta \). Note that u has a generic singularity at the reentrant corner. We consider the discrete version of (VIa), where solutions are sought in the convex set \(K_h^s\). We conducted various tests with \(\beta \) between 1 and 100, and the results were comparable in all cases. For the results displayed here we used \(\beta =3\). Figure 2 displays convergence rates in the case of uniform and adaptive mesh-refinement. We note that in the first plot the lines for \({{\,\mathrm{err}\,}}_U\) and \({{\,\mathrm{est}\,}}\) are almost identical. In the second plot we compare the contributions of the overall error and estimator in the adaptive case. The lines for \({{\,\mathrm{osc}\,}}\) and \(\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+f\Vert _{}\) are almost identical. This means that the estimator contribution \(\Vert {{\,\mathrm{div}\,}}\varvec{\sigma }_h+\lambda _h+\Pi _hf\Vert _{}\) in \(\eta \) is negligible and \({{\,\mathrm{osc}\,}}\) dominates the overall estimator. We observe from the first plot that \({{\,\mathrm{err}\,}}_V\) is much smaller than \({{\,\mathrm{err}\,}}_U\) but has the same rate of convergence. In the uniform case we see that the errors and estimators converge at a rate of approximately 0.45. One would expect a smaller rate due to the singularity. However, in this example the solution has a large gradient, so that the algorithm first refines the regions where the gradient resp. f is large. This preasymptotic behavior was also observed in [4, Section 5.2]. Nevertheless, adaptivity yields a significant error reduction.
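As a sanity check, the identity \(f=-\Delta u\) on the transition region \(0\le r_*<1\) (where \(\delta =0\)), and hence \(\lambda =\delta \), can be verified symbolically. A small sympy sketch (the variable names are ours):

```python
import sympy as sp

r, phi = sp.symbols('r phi', positive=True)
rs = 2*(r - sp.Rational(1, 4))
gamma = -6*rs**5 + 15*rs**4 - 10*rs**3 + 1        # branch 0 <= r_* < 1
w = r**sp.Rational(2, 3) * sp.sin(sp.Rational(2, 3)*phi)
u = w * gamma

# Laplacian in polar coordinates
lap_u = sp.diff(u, r, 2) + sp.diff(u, r)/r + sp.diff(u, phi, 2)/r**2

# displayed formula for f (the -delta term vanishes on this branch)
f = (-w*(sp.diff(gamma, r)/r + sp.diff(gamma, r, 2))
     - sp.Rational(4, 3)*r**sp.Rational(-1, 3)*sp.diff(gamma, r)
       * sp.sin(sp.Rational(2, 3)*phi))

assert sp.simplify(-lap_u - f) == 0               # confirms f = -Laplace(u)
```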

Figure 3 shows the approximation \(\lambda _h\) (left column) and the distribution of the estimator contribution \(\rho ^2\) (right column) on some adaptively refined meshes.

5.5 Unknown solution

For our final experiment, we choose \(\Omega = (-1,1)^2 {\setminus } [-\,1,0]^2\), \(f=1\), and the pyramid-like obstacle \(g(x) = \max \{0,{{\,\mathrm{dist}\,}}(x,\partial \Omega _u)-\tfrac{1}{4}\}\), where \(\Omega _u = (0,1)^2\). The solution in this case is unknown. We solve the discrete version of (VIa) with convex set \(K_h^s\). Since f is constant we have \({{\,\mathrm{osc}\,}}= 0\). Figure 4 shows the overall estimator (left) and its contributions (right). We observe that uniform refinement leads to the reduced rate \(\tfrac{1}{3}\), whereas for adaptive refinement we recover the optimal rate. Heuristically, we expect the solution to have a singularity at the reentrant corner as well as in the contact regions. This would explain the reduced rates. Figure 5 visualizes meshes produced by the adaptive algorithm and corresponding solution components \(u_h\). We observe strong refinements towards the corner (0, 0) and around the point \((\tfrac{1}{2},\tfrac{1}{2})\), which coincides with the tip of the pyramid obstacle.
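For reference, a sketch of the obstacle evaluation, where we read \({{\,\mathrm{dist}\,}}(x,\partial \Omega _u)\) as the signed distance (negative outside \(\Omega _u\)), so that g vanishes away from the pyramid; the function name is ours:

```python
import numpy as np

def g_pyramid(x, y):
    """Pyramid-like obstacle with tip 1/4 at (1/2, 1/2) over Omega_u = (0,1)^2."""
    dist = np.minimum.reduce([x, 1.0 - x, y, 1.0 - y])  # signed distance to the
    return np.maximum(0.0, dist - 0.25)                 # boundary of the unit square
```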