1 Introduction

Shape calculus studies the “differentiation of shape functionals with respect to the variation of a domain they depend upon”. Over the last three decades this notion has been made rigorous, notably by the introduction of the velocity method by Zolesio [10, 28] and the domain perturbation method by Simon [24, 25, 27] and Eppler [11, 12]. Shape calculus has also become a key tool in optimization, where it supplies the so-called shape gradient, that is, the first derivative of a functional with respect to a shape, for use in descent methods. Since this article does not directly discuss methods for shape optimization, we refer the reader to the monographs [2, 8, 15, 18, 19, 26, 28].

Shape optimization entails the approximate numerical computation of shape gradients, and this step is the focus of this article. Of course, many different shape functionals are conceivable, leading to vastly different types of shape gradients. Thus, we have to adopt a “case study approach” and restrict our study to a special, albeit important, class of shape functionals.

The shape functionals under scrutiny are least squares output functionals for solutions of scalar second-order elliptic boundary value problems. They belong to the category of PDE constrained shape functionals and have been widely considered in articles on shape optimization [3, 16].

In [2], for instance, formulas for the associated shape gradients have been derived. They are based on the solutions \(u\) and \(p\) of two boundary value problems, called the state and the adjoint problem. The starting point for our investigations was the insight that these formulas can be stated in two equivalent ways: (i) as expressions involving traces of \(u\) and \(p\) on the boundary of the domain, and (ii) by means of volume integrals over the domain, see [5, Sect. 6].

The situation resembles that encountered for many common output functionals depending on solutions of BVPs for second-order elliptic PDEs. Examples are the total heat flux in heat conduction, lift functionals for potential flow [16], far field functionals [22, 23], and electromagnetic force functionals [21]. All these functionals can be stated as integrals either over boundaries or over parts of the domain, and the same value is obtained when the exact solutions of the BVPs are inserted. Both kinds of formulas can also be used in the context of finite element approximation, but when applied to discrete solutions, they yield different values. More strikingly, the volume integrals often converge much faster and provide superior accuracy compared to their boundary-based counterparts. An explanation is that the expressions featuring volume integrals are continuous in the energy norm, whereas integrals involving traces of normal derivatives are not well-defined on the natural variational spaces. This makes a crucial difference, because we can benefit from superconvergence when evaluating continuous functionals for Galerkin solutions [4, Sect. 2].
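The effect can already be seen in a one-dimensional toy setting. The following Python sketch assumes the model problem \(-u''+u=f\) on \((0,1)\) with \(u(0)=u(1)=0\) and exact solution \(u(x)=x(1-x)\), so that the boundary flux is \(u'(0)=1\); this problem, the uniform mesh, and the helper names (`gauss2`, `solve_p1`, `flux_errors`) are hypothetical choices made for illustration only. It compares the derivative of the piecewise linear Galerkin solution at \(x=0\) (a boundary-based formula) with the value obtained from Green's formula with the test function \(1-x\) (a volume-based formula).

```python
# Hypothetical 1D toy problem (not from the article):
#   -u'' + u = f on (0, 1),  u(0) = u(1) = 0,
# with exact solution u(x) = x(1 - x), hence f = 2 + x - x**2 and u'(0) = 1.

def f(x):
    return 2.0 + x - x * x

def gauss2(a, b, fun):
    """Two-point Gauss-Legendre rule on [a, b] (exact for cubics)."""
    mid, half = 0.5 * (a + b), 0.5 * (b - a)
    d = half / 3.0 ** 0.5
    return half * (fun(mid - d) + fun(mid + d))

def solve_p1(n):
    """Nodal values of the P1 Ritz-Galerkin solution on n uniform elements."""
    h = 1.0 / n
    m = n - 1                                   # number of interior nodes
    diag = [2.0 / h + 2.0 * h / 3.0] * m        # stiffness + mass, diagonal
    off = -1.0 / h + h / 6.0                    # stiffness + mass, off-diagonal
    rhs = []
    for i in range(1, n):                       # load vector: int f * phi_i dx
        xi = i * h
        left = gauss2(xi - h, xi, lambda x: f(x) * (x - xi + h) / h)
        right = gauss2(xi, xi + h, lambda x: f(x) * (xi + h - x) / h)
        rhs.append(left + right)
    for k in range(1, m):                       # Thomas algorithm: forward sweep
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):              # back substitution
        x[k] = (rhs[k] - off * x[k + 1]) / diag[k]
    return [0.0] + x + [0.0]

def flux_errors(n):
    """Errors of the boundary-based and volume-based approximations of u'(0)."""
    h = 1.0 / n
    U = solve_p1(n)
    boundary = (U[1] - U[0]) / h                # derivative of u_h at x = 0
    # Green's formula with phi(x) = 1 - x (phi(0) = 1, phi(1) = 0, phi' = -1):
    #   u'(0) = int f*phi dx - int (u' phi' + u phi) dx
    volume = 0.0
    for k in range(n):
        a, b = k * h, (k + 1) * h
        slope = (U[k + 1] - U[k]) / h
        uh = lambda x: U[k] + slope * (x - a)
        volume += gauss2(a, b, lambda x: (f(x) - uh(x)) * (1.0 - x) + slope)
    return abs(boundary - 1.0), abs(volume - 1.0)

for n in (10, 20, 40):
    print(n, *flux_errors(n))
```

Halving \(h\) should roughly halve the first error and quarter the second, which is the superconvergence effect described above in its simplest form.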

This made us suspect that similar effects could be observed for the different expressions for shape gradients when they are used with finite element solutions. The analysis and numerical experiments of this article largely confirm our expectation that volume-based expressions for the shape gradient often offer better accuracy than formulas involving traces on boundaries. This message is conveyed both by the a priori convergence estimates developed in Sect. 3 (see Theorems 3.1 and 3.2) and by the numerical tests reported in Sect. 4.

What compounds the difficulties of gauging the quality of formulas for shape gradients is the fact that they must be viewed as linear functionals on spaces of infinitesimal deformations. Of course, one can switch back to functions via the Riesz representation theorem, but the choice of the underlying inner product is somewhat arbitrary and might bias the outcome. Thus, we have decided to study the errors of shape gradients directly in the relevant dual norms.

2 Shape gradients

Let \(\varOmega \subset \mathbb {R}^d\), \(d=2,3\), be an open bounded domain with piecewise smooth boundary \(\partial \varOmega \), and let  \({\mathcal {J}}(\varOmega )\in \mathbb {R}\) be a real-valued quantity of interest associated with it. One is often interested in its shape sensitivity, which quantifies the impact of small perturbations of \(\partial \varOmega \) on the value \({\mathcal {J}}(\varOmega )\).

For this purpose, we model perturbations of the domain \(\varOmega \) through maps of the form

$$\begin{aligned} T_\mathcal{V}:= \mathcal{I}+ \mathcal{V}\,, \end{aligned}$$
(2.1)

where \(\mathcal{I}\) is the identity operator and \(\mathcal{V}\) is a vector field in \(C^1(\mathbb {R}^d;\mathbb {R}^d)\). It can easily be proven that the map (2.1) is a diffeomorphism for \(\Vert \mathcal{V}\Vert _{C^1} <1\) [2, Lemma 6.13]. Therefore, it is natural to consider \({\mathcal {J}}(\varOmega )\) as the realization of a shape functional, a real map

$$\begin{aligned} \mathcal{J}: \mathcal{A}\rightarrow \mathbb {R}\end{aligned}$$

defined on the family of admissible domains

$$\begin{aligned} \mathcal{A}:\,= \left\{ T_\mathcal{V}(\varOmega )\,;\mathcal{V}\in C^1(\mathbb {R}^d;\mathbb {R}^d)\,, \Vert \mathcal{V}\Vert _{C^1}<1 \right\} \,. \end{aligned}$$

The sensitivity of \(\mathcal{J}(\varOmega )\) with respect to the perturbation direction \(\mathcal{V}\) can be expressed through the Eulerian derivative of the shape functional \(\mathcal{J}\) in the direction \(\mathcal{V}\), that is,

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V}) :\,=\lim _{s\searrow 0} \frac{\mathcal{J}\left( T_{s\cdot \mathcal{V}} (\varOmega )\right) -\mathcal{J}(\varOmega )}{s}\,. \end{aligned}$$
(2.2)

It goes without saying that it is desirable that (2.2) exists for all possible perturbation directions \(\mathcal{V}\). It is therefore natural to define a shape functional \(\mathcal{J}\) to be shape differentiable at \(\varOmega \) if the mapping

$$\begin{aligned} d\mathcal{J}(\varOmega ;\cdot ) : C^1(\mathbb {R}^d;\mathbb {R}^d) \rightarrow \mathbb {R}, \qquad \mathcal{V}\mapsto d\mathcal{J}(\varOmega ;\mathcal{V}) \end{aligned}$$
(2.3)

defined by (2.2) is linear and bounded on \(C^1(\mathbb {R}^d;\mathbb {R}^d)\). In the literature, the mapping \(d\mathcal{J}(\varOmega ;\cdot )\) is called the shape gradient of \(\mathcal{J}\) at \(\varOmega \), as it is the Gâteaux derivative at \(0\in C^1(\mathbb {R}^d;\mathbb {R}^d)\) of the map

$$\begin{aligned} \mathcal{V}\mapsto \mathcal{J}\left( T_\mathcal{V}(\varOmega )\right) \,, \end{aligned}$$

see [10, Ch. 9, Def. 2.2]. Note that Formula (2.2) is well-defined for any vector field in the Banach space \(C^1(\mathbb {R}^d;\mathbb {R}^d)\), and the shape gradient is an element of its dual space.
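Definition (2.2) can be checked numerically with a one-sided difference quotient. The sketch below assumes the unit square \(\varOmega =(0,1)^2\), the area functional \(\mathcal{J}(\varOmega )=|\varOmega |\), and the field \(\mathcal{V}(x,y)=(x^2,xy)\); these choices and all helper names are hypothetical. For the area functional the Eulerian derivative is \(\int _\varOmega \mathrm{div }\mathcal{V}\,d{\mathbf {x}}=\int _{\partial \varOmega }\mathcal{V}\cdot {\mathbf {n}}\,dS\), which here equals \(3/2\), so the sketch also evaluates a volume and a boundary representation side by side. Only boundary points are mapped, since for small \(s\) the map \(T_{s\mathcal{V}}\) is injective near \(\overline{\varOmega }\) and carries the boundary onto the boundary of the image.

```python
# Illustrative assumptions (not from the article): Omega = (0, 1)^2,
# J(Omega) = |Omega| (the area), and V(x, y) = (x**2, x*y).
# For the area functional dJ(Omega; V) = int_Omega div V dx = int_dOmega V.n dS.

def V(x, y):
    return (x * x, x * y)

def div_V(x, y):
    return 2.0 * x + x                      # d(x^2)/dx + d(xy)/dy

def square_boundary(n):
    """n points per edge, counter-clockwise, starting at the corner (0, 0)."""
    t = [k / n for k in range(n)]
    return ([(s, 0.0) for s in t] + [(1.0, s) for s in t]
            + [(1.0 - s, 1.0) for s in t] + [(0.0, 1.0 - s) for s in t])

def shoelace_area(pts):
    """Area enclosed by a closed polygon whose vertices are counter-clockwise."""
    return 0.5 * sum(x0 * y1 - x1 * y0
                     for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]))

def area_of_perturbed_domain(s, n=2000):
    """|T_{sV}(Omega)| with T_{sV} = I + sV, via mapped boundary samples."""
    mapped = []
    for x, y in square_boundary(n):
        vx, vy = V(x, y)
        mapped.append((x + s * vx, y + s * vy))
    return shoelace_area(mapped)

def volume_formula(m=200):
    """Midpoint rule on an m x m grid for int_Omega div V dx."""
    h = 1.0 / m
    return h * h * sum(div_V((i + 0.5) * h, (k + 0.5) * h)
                       for i in range(m) for k in range(m))

def boundary_formula(n=2000):
    """Midpoint rule on each boundary segment for int_dOmega V.n dS."""
    pts = square_boundary(n)
    total = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        vx, vy = V(0.5 * (x0 + x1), 0.5 * (y0 + y1))
        total += vx * (y1 - y0) - vy * (x1 - x0)   # V . (outward normal * length)
    return total

s = 1e-4
difference_quotient = (area_of_perturbed_domain(s) - area_of_perturbed_domain(0.0)) / s
print(difference_quotient, volume_formula(), boundary_formula())  # all close to 1.5
```

For this field the exact perturbed area is \(1+\tfrac{3}{2}s+\tfrac{2}{3}s^2\), so the difference quotient differs from \(3/2\) by a term of order \(s\).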

Remark 2.1

In the literature, perturbations as in (2.1) are known as perturbations of the identity. From a differential geometry point of view, this approach is less general than the so-called velocity method, which is introduced, for instance, in [10, Ch. 4]. However, both methods lead to the same formula for the shape gradient, which only takes into account first-order perturbations of the shape functional \(\mathcal{J}\) [10, Ch. 9, Thm 3.2].

An interesting property of shape gradients is expressed in the Hadamard structure theorem [10, Ch. 9, Thm 3.6]: If \(\partial \varOmega \) is smooth, \(d\mathcal{J}(\varOmega ;\cdot )\) admits a representative \(\mathfrak {g}(\varOmega )\) in the space of distributions \({\mathcal {D}}^k(\partial \varOmega )\) such that

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V}) = \langle \mathfrak {g}(\varOmega ), \gamma _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\rangle _{{\mathcal {D}}^k(\partial \varOmega )}\,, \end{aligned}$$
(2.4)

where \( \gamma _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\) is the normal component of \(\mathcal{V}\) on the boundary \(\partial \varOmega \). This implies that only normal displacements of the boundary have a first-order impact on the value of \(\mathcal{J}(\varOmega )\). Note, however, that this is no longer true if the boundary \(\partial \varOmega \) is only piecewise smooth (see Remark 2.3).

We are particularly interested in PDE constrained shape functionals of the form

$$\begin{aligned} \mathcal{J}(\varOmega ) = \int _{\varOmega } j(u) \, d{\mathbf {x}}\,, \end{aligned}$$
(2.5)

where \(j:\mathbb {R}\rightarrow \mathbb {R}\) possesses a locally Lipschitz continuous derivative \(j'\) and \(u\) is the solution of the state problem, a scalar elliptic equation with Neumann or Dirichlet boundary conditions

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \mathcal{L}(u) = f &{} \text {in } \varOmega \,,\\ u = g \text { or } \frac{\partial {u}}{\partial {\mathbf {n}}} = g &{} \text {on } \partial \varOmega \,. \end{array}\right. \end{aligned}$$
(2.6)

The functions \(f\) and \(g\) are assumed to belong to \(L^2(\mathbb {R}^d)\) (\(H^1(\mathbb {R}^d)\) in the case of the Neumann BVP) and \(H^2(\mathbb {R}^d)\), respectively, and they are identified with their restrictions onto \(\varOmega \) and \(\partial \varOmega \).

Explicit formulas for \(d\mathcal{J}(\varOmega ;\cdot )\) can easily be derived both for unconstrained and for PDE constrained shape functionals, cf. [10, Ch. 9, Sect. 4.3, and Ch. 10, Sect. 2.5]. In the case of PDE constrained shape functionals, the formulas involve integrals of \(u\), the solution of (2.6), and of \(p\), the solution of the adjoint problem¹

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} \mathcal{L}(p) = j'(u) &{} \text {in } \varOmega \,,\\ p = 0 \text { or } \frac{\partial {p}}{\partial {\mathbf {n}}} = 0 &{} \text {on } \partial \varOmega \,. \end{array}\right. \end{aligned}$$
(2.7)

As different \(\mathcal{L}\) lead to different formulas for the Eulerian derivative, from now on we consider only the model elliptic operator

$$\begin{aligned} \mathcal{L}(u) = -\Delta u + u\,, \end{aligned}$$
(2.8)

which should be regarded as a representative for the class of scalar elliptic differential operators of second order.

As mentioned in the introduction, \(d\mathcal{J}(\varOmega ;\mathcal{V})\) can be formulated as an integral over a volume, as well as an integral on the boundary. For example, the formula for the PDE constrained shape functional (2.5) with elliptic operator (2.8) and Dirichlet boundary conditions \(u=g\) on \(\partial \varOmega \) reads (see the Appendix for the derivation)

$$\begin{aligned} \nonumber d\mathcal{J}(\varOmega ;\mathcal{V})&= \int _{\varOmega } \left( \nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p - f\mathcal{V}\cdot \nabla p \right. \nonumber \\&\quad + \mathrm{div }\mathcal{V}(j(u) - \nabla u\cdot \nabla p - up )\nonumber \\&\quad \left. + (j'(u)-p)(\nabla g\cdot \mathcal{V})-\nabla p \cdot \nabla (\nabla g \cdot \mathcal{V}) \right) \, d{\mathbf {x}}\,, \end{aligned}$$
(2.9)

and can be recast as

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V}) = \int _{\partial \varOmega } \left( \mathcal{V}\cdot {\mathbf {n}}\right) \left( j(u)+\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {(u-g)}}{\partial {\mathbf {n}}} \right) \, dS\,. \end{aligned}$$
(2.10)

The volume integral (2.9) and the boundary integral (2.10) are equivalent representations of the shape gradient \(d\mathcal{J}(\varOmega ;\mathcal{V})\). They can be converted into each other by means of integration by parts on \(\partial \varOmega \) [28, Sect. 3.8] and Gauss’s theorem. However, most of the literature considers (2.10) and pays little attention to (2.9), probably because the former better matches the Hadamard structure theorem (2.4). Only recently has it been realized that the volume representation (2.9) may be better suited for computations, see [5] and [10, Ch. 10, Remark 2.3].
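The equivalence of (2.9) and (2.10) for exact solutions can be checked in one dimension. The sketch below assumes the hypothetical data \(\varOmega =(0,1)\), \(f=1\), \(g=0\), \(j(u)=u\) and \(\mathcal{V}(x)=x\), chosen because the solutions are explicit: then \(u=p=1-\cosh (x-\tfrac{1}{2})/\cosh (\tfrac{1}{2})\). In 1D one has \({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T=2\mathcal{V}'\), \(\mathrm{div }\mathcal{V}=\mathcal{V}'\), and outward normals \(-1\) and \(+1\) at \(x=0\) and \(x=1\). Both formulas should return \(\tanh ^2(\tfrac{1}{2})\).

```python
import math

# Hypothetical 1D data (not from the article): Omega = (0, 1), f = 1, g = 0,
# j(u) = u, so j'(u) = 1, and V(x) = x.  Then u = p = 1 - cosh(x - 1/2)/cosh(1/2).
C = math.cosh(0.5)

def u(x):
    return 1.0 - math.cosh(x - 0.5) / C

def du(x):
    return -math.sinh(x - 0.5) / C

p, dp = u, du               # the adjoint solution coincides with u for these data

def f(x):
    return 1.0

def j(s):
    return s

def V(x):
    return x

def dV(x):
    return 1.0

def volume_integrand(x):
    """1D reduction of the integrand of (2.9) with g = 0."""
    return (2.0 * dV(x) * du(x) * dp(x) - f(x) * V(x) * dp(x)
            + dV(x) * (j(u(x)) - du(x) * dp(x) - u(x) * p(x)))

def simpson(fun, a, b, n=2000):
    """Composite Simpson rule with an even number n of subintervals."""
    h = (b - a) / n
    s = fun(a) + fun(b)
    for k in range(1, n):
        s += (4.0 if k % 2 else 2.0) * fun(a + k * h)
    return s * h / 3.0

vol = simpson(volume_integrand, 0.0, 1.0)
# 1D reduction of (2.10): end points x = 0, 1 with outward normals -1, +1; there
# dp/dn * d(u - g)/dn = p'(x) * u'(x) because the two sign flips cancel.
bdry = sum(n * V(x) * (j(u(x)) + dp(x) * du(x)) for x, n in ((0.0, -1.0), (1.0, 1.0)))
print(vol, bdry, math.tanh(0.5) ** 2)   # the three values agree (about 0.21355)
```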

Remark 2.2

In the case of Neumann boundary conditions on smooth domains, the counterparts of Formulas (2.9) and (2.10) read

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V})&= \int _{\varOmega } \left( (\nabla f \cdot \mathcal{V})p +\nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p \right. \nonumber \\&\quad \left. + \mathrm{div }\mathcal{V}(fp + j(u) - \nabla u\cdot \nabla p - up )\right) \, d{\mathbf {x}}\nonumber \\&\quad + \int _{\partial \varOmega } (\nabla g \cdot \mathcal{V})p + gp \mathrm{div }_\Gamma \mathcal{V}\, dS\,, \end{aligned}$$
(2.11)

where \(\mathrm{div }_\Gamma \) denotes the tangential divergence on \(\partial \varOmega \), and

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V}) = \int _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\left( j(u)-\nabla u \cdot \nabla p - up + fp + \frac{\partial {(gp)}}{\partial {\mathbf {n}}} + \mathrm {K}gp \right) \, dS\,, \end{aligned}$$
(2.12)

where \(\mathrm {K}\) is the mean curvature of \(\partial \varOmega \).

Remark 2.3

In general, the shape gradient does not feature the Hadamard structure (2.4) if the boundary is only piecewise smooth. For instance, in the presence of corners in 2D, Formula (2.12) has to be corrected by adding the term

$$\begin{aligned} \sum _{i} p({\mathbf {a}}_i)g({\mathbf {a}}_i)\mathcal{V}({\mathbf {a}}_i)\cdot [[\tau ({\mathbf {a}}_i)]]\,, \end{aligned}$$
(2.13)

where the \({\mathbf {a}}_i\) denote the corner points and \([[\tau ({\mathbf {a}}_i)]]\) is the jump of the tangential unit vector field at the corner \({\mathbf {a}}_i\) [28, Sect. 3.8]. In contrast, formula (2.10) requires no correction.
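The correction (2.13) only requires the corner points and the jumps of the unit tangent. The sketch below evaluates it for a polygon whose vertices are listed counter-clockwise. The convention \([[\tau ]]=\tau _{\text {out}}-\tau _{\text {in}}\) (tangent of the edge leaving the corner minus that of the edge arriving at it) is an assumption and should be checked against [28, Sect. 3.8] before use; the function names are hypothetical.

```python
import math

def unit(dx, dy):
    """Normalize the vector (dx, dy)."""
    r = math.hypot(dx, dy)
    return (dx / r, dy / r)

def tangent_jumps(vertices):
    """Jump tau_out - tau_in of the unit tangent at each corner of a polygon.

    `vertices` are the corners a_i listed counter-clockwise; tau_in is the unit
    tangent of the edge arriving at a_i, tau_out that of the edge leaving it.
    This sign convention is an assumption made for illustration.
    """
    n = len(vertices)
    jumps = []
    for i in range(n):
        (x0, y0), (x1, y1), (x2, y2) = vertices[i - 1], vertices[i], vertices[(i + 1) % n]
        t_in = unit(x1 - x0, y1 - y0)
        t_out = unit(x2 - x1, y2 - y1)
        jumps.append((t_out[0] - t_in[0], t_out[1] - t_in[1]))
    return jumps

def corner_correction(vertices, p, g, V):
    """Evaluate sum_i p(a_i) g(a_i) V(a_i) . [[tau(a_i)]], cf. (2.13)."""
    total = 0.0
    for (x, y), (jx, jy) in zip(vertices, tangent_jumps(vertices)):
        vx, vy = V(x, y)
        total += p(x, y) * g(x, y) * (vx * jx + vy * jy)
    return total

def one(x, y):
    return 1.0

square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(tangent_jumps(square))                               # each jump has length sqrt(2)
print(corner_correction(square, one, one, lambda x, y: (x, y)))
```

Because the jumps telescope to zero around a closed polygon, the correction vanishes for a constant field \(\mathcal{V}\) whenever \(pg\) is constant, which gives a quick sanity check of any implementation.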

3 Approximation of shape gradients

In this section we investigate the approximation of the shape gradient \(d\mathcal{J}\). For the sake of readability, we perform the analysis for the elliptic operator (2.8) with Dirichlet boundary conditions only. The results can easily be extended to general elliptic operators in divergence form with both Dirichlet and Neumann boundary conditions.

To highlight the dependence of \(d\mathcal{J}\) on the solutions \(u\) and \(p\) of the state and adjoint problems, as well as to distinguish between formulas (2.9) and (2.10), we introduce the notations

$$\begin{aligned} \nonumber d\mathcal{J}(\varOmega ,u,p; \mathcal{V})^{\mathrm {Vol} }&:= \int _{\varOmega } \left( \nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p - f\mathcal{V}\cdot \nabla p\right. \nonumber \\&+ \mathrm{div }\mathcal{V}(j(u) - \nabla u\cdot \nabla p - up )\nonumber \\&\left. + (j'(u)-p)(\nabla g\cdot \mathcal{V})-\nabla p \cdot \nabla (\nabla g \cdot \mathcal{V}) \right) \, d{\mathbf {x}}\,,\end{aligned}$$
(3.1)
$$\begin{aligned} d\mathcal{J}(\varOmega ,u,p;\mathcal{V})^{\mathrm {Bdry} }&:= \int _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\left( j(u)+\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {(u-g)}}{\partial {\mathbf {n}}} \right) \, dS\,. \end{aligned}$$
(3.2)

Note that, provided \(u\) and \(p\) are exact solutions of (2.6) and (2.7),

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V}) = d\mathcal{J}(\varOmega ,u,p; \mathcal{V})^{\mathrm {Vol} }= d\mathcal{J}(\varOmega ,u,p;\mathcal{V})^{\mathrm {Bdry} }\,. \end{aligned}$$
(3.3)

The operator \(d\mathcal{J}(\varOmega ;\cdot )\) can be approximated by replacing the functions \(u\) and \(p\) with Ritz–Galerkin Lagrangian finite element solutions of (2.6) and (2.7), respectively. We focus on finite element discretizations, because they are very popular in shape optimization owing to their flexibility in engineering applications. Approximations based on boundary element methods are also possible, cf. [13, 17, 29].

In general, equality (3.3) breaks down when the functions \(u\) and \(p\) are replaced by finite element approximations [5]. Thus, a natural question is which formula, (3.1) or (3.2), should be preferred for approximating \(d\mathcal{J}(\varOmega ;\cdot )\) in the operator norm. The answer is provided by Theorems 3.1 and 3.2. Next, we state a few assumptions needed for a precise statement of these theorems.
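The different behaviour of (3.1) and (3.2) under discretization can be observed in one dimension. The sketch assumes the hypothetical data \(\varOmega =(0,1)\), \(f=1\), \(g=0\), \(j(u)=u\), \(\mathcal{V}(x)=x\), for which the exact shape derivative is \(\tanh ^2(\tfrac{1}{2})\). It computes piecewise linear Galerkin approximations \(u_h\) and \(p_h\) on uniform meshes, with adjoint right-hand side \(j'(u_h)=1\) as in (2.7), and evaluates the 1D reductions of (3.1), using two-point Gauss quadrature (exact here), and of (3.2). It is an illustration under these assumptions, not the experimental setup of Sect. 4.

```python
import math

# Hypothetical 1D data (not from the article): Omega = (0, 1), f = 1, g = 0,
# j(u) = u, V(x) = x.  The exact shape derivative is tanh(1/2)**2.
EXACT = math.tanh(0.5) ** 2
GAUSS2 = (0.5 * (1.0 - 1.0 / math.sqrt(3.0)), 0.5 * (1.0 + 1.0 / math.sqrt(3.0)))

def fe_solve_const_rhs(n, c):
    """P1 Ritz-Galerkin solution of -w'' + w = c, w(0) = w(1) = 0, on n elements."""
    h, m = 1.0 / n, n - 1
    diag = [2.0 / h + 2.0 * h / 3.0] * m    # stiffness + mass matrix, diagonal
    off = -1.0 / h + h / 6.0                # stiffness + mass matrix, off-diagonal
    rhs = [c * h] * m                       # exact load vector for a constant source
    for k in range(1, m):                   # Thomas algorithm: forward sweep
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):          # back substitution
        x[k] = (rhs[k] - off * x[k + 1]) / diag[k]
    return [0.0] + x + [0.0]

def shape_gradient_errors(n):
    """Errors of the discrete volume formula (3.1) and boundary formula (3.2)."""
    h = 1.0 / n
    U = fe_solve_const_rhs(n, 1.0)          # state:   -u'' + u = f = 1
    P = fe_solve_const_rhs(n, 1.0)          # adjoint: -p'' + p = j'(u_h) = 1
    vol = 0.0
    for k in range(n):
        du = (U[k + 1] - U[k]) / h          # u_h' and p_h' are constant per element
        dp = (P[k + 1] - P[k]) / h
        for t in GAUSS2:                    # two-point Gauss rule, weights h/2
            x = (k + t) * h
            uh = U[k] + t * (U[k + 1] - U[k])
            ph = P[k] + t * (P[k + 1] - P[k])
            # 1D reduction of (3.1): 2 V' u' p' - f V p' + V' (j(u) - u' p' - u p)
            vol += 0.5 * h * (2.0 * du * dp - x * dp + (uh - du * dp - uh * ph))
    # 1D reduction of (3.2): only x = 1 contributes because V(0) = 0; n(1) = +1
    du_end = (U[n] - U[n - 1]) / h
    dp_end = (P[n] - P[n - 1]) / h
    bdry = 1.0 * (U[n] + dp_end * du_end)   # V(1) (j(u_h(1)) + p_h' u_h')
    return abs(vol - EXACT), abs(bdry - EXACT)

for n in (10, 20, 40, 80):
    print(n, *shape_gradient_errors(n))
```

In this example the volume-based error decreases like \(h^2\), in line with Theorem 3.1, whereas the boundary-based error, which relies on the derivatives of \(u_h\) and \(p_h\) in the last element, decreases only like \(h\).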

Assumption 1

The Dirichlet BVP for the Laplacian is \(H^2\)-regular [6, Ch. II, Def. 7.1], that is, if a function \(w\in H^1_0(\varOmega )\) is the (unique) weak solution of the elliptic BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w + w= \rho &{}\text {in }\varOmega \,,\\ w=0 &{}\text {on } \partial \varOmega \,, \end{array}\right. \end{aligned}$$

for a function \(\rho \in L^{2}({\varOmega })\), then \(w\in H^2(\varOmega )\), and there is a constant \(C_r\), depending only on \(\varOmega \), such that

$$\begin{aligned} \Vert w \Vert _{H^2(\varOmega )} \le C_r\Vert \rho \Vert _{L^{2}({\varOmega })}\,. \end{aligned}$$

Remark 3.1

Assumption 1 holds for convex Lipschitz domains and (possibly non-convex) domains with \(C^2\) boundary [6, Ch. II, Thm 7.2].

Assumption 2

The source function \(f\) and the boundary data \(g\) in (2.6) are restrictions of functions in \(H^1(\mathbb {R}^d)\) and \(H^3(\mathbb {R}^d)\) to \(\varOmega \) and \(\partial \varOmega \), respectively.

Next, for an index set \(\mathbb {H}\), we introduce a family \((V_h)_{h\in \mathbb {H}}\) of finite-dimensional subspaces of \(H^1_0(\varOmega )\) and define \(u_h\in g+V_h\), \(p_h\in V_h\) as Ritz–Galerkin solutions² of (2.6) and (2.7), respectively, that is,

$$\begin{aligned} \int _\varOmega \nabla u_h\cdot \nabla v_h + u_h v_h\,d{\mathbf {x}}&= \int _\varOmega f v_h\,d{\mathbf {x}}\qquad \qquad \forall \, v_h \in V_h\,, \end{aligned}$$
(3.4)
$$\begin{aligned} \int _\varOmega \nabla p_h\cdot \nabla v_h + p_h v_h\,d{\mathbf {x}}&= \int _\varOmega j(u_h) v_h\,d{\mathbf {x}}\qquad \forall \, v_h \in V_h\,. \end{aligned}$$
(3.5)

In particular, let \((V_h)_{h\in \mathbb {H}}\) be a family of \(H^{1}\)-conforming piecewise linear Lagrangian finite element spaces built on a shape-regular and quasi-uniform family of simplicial meshes [6, Ch. II, Def. 5.1], and let \(h\) designate the meshwidth. We recall that the associated family of nodal interpolation operators

$$\begin{aligned} \mathcal{I}_h: H^2(\varOmega )\cap H_0^1(\varOmega ) \rightarrow V_h \end{aligned}$$

satisfies³ [6, Ch. II, Thm 6.4]

$$\begin{aligned} \Vert w - \mathcal{I}_h w \Vert _{H^{1}({\varOmega })} \le C h \vert w\vert _{H^2(\varOmega )}\quad \forall \, h \in \mathbb {H}\,. \end{aligned}$$
(3.6)
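The interpolation estimate (3.6) is easy to observe numerically. The sketch below assumes the 1D function \(w(x)=\sin (\pi x)\) on \((0,1)\), uniform meshes, and a three-point Gauss rule per element (all illustrative choices); it prints \(\Vert w-\mathcal{I}_hw\Vert _{H^1}\) together with its ratio to \(h\vert w\vert _{H^2}\), which stays bounded as \(h\) decreases.

```python
import math

def w(x):
    return math.sin(math.pi * x)

def dw(x):
    return math.pi * math.cos(math.pi * x)

# 3-point Gauss-Legendre rule on the reference interval [0, 1]: (node, weight)
GAUSS3 = [(0.5 - 0.5 * math.sqrt(0.6), 5.0 / 18.0),
          (0.5, 8.0 / 18.0),
          (0.5 + 0.5 * math.sqrt(0.6), 5.0 / 18.0)]

def interp_error_h1(n):
    """||w - I_h w||_{H^1(0,1)} for the P1 nodal interpolant on n uniform elements."""
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        a = k * h
        wa, wb = w(a), w(a + h)
        slope = (wb - wa) / h                  # derivative of I_h w on this element
        for t, wt in GAUSS3:
            x = a + t * h
            e = w(x) - (wa + t * (wb - wa))    # interpolation error
            de = dw(x) - slope                 # its derivative
            total += wt * h * (e * e + de * de)
    return math.sqrt(total)

semi_h2 = math.pi ** 2 / math.sqrt(2.0)         # |w|_{H^2(0,1)} = ||w''||_{L^2(0,1)}
for n in (8, 16, 32, 64):
    err = interp_error_h1(n)
    print(n, err, err / (semi_h2 / n))          # error and error / (h |w|_{H^2})
```

In this 1D setting the ratio approaches \(1/\sqrt{12}\approx 0.289\); the constant \(C\) in (3.6) depends on the mesh family and the dimension, so this value is specific to the example.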

Theorem 3.1

Let \(u\) and \(p\) be the solutions of (2.6) and (2.7), and let \(u_h\) and \(p_h\) be their Ritz–Galerkin approximations in the sense of (3.4) and (3.5) by piecewise linear Lagrangian finite elements. Furthermore, let Assumptions 1 and 2 be satisfied. Then⁴

$$\begin{aligned} |d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }|\le C(\varOmega ,u,p,f,g) h^2\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\,, \end{aligned}$$

where the constant \(C(\varOmega , u,p,f,g)\) depends on the domain \(\varOmega \) and its discretization, on \(\Vert u \Vert _{H^2(\varOmega )}\), \(\Vert p \Vert _{H^2(\varOmega )}\), \(\Vert f \Vert _{H^1(\varOmega )}\), and \(\Vert g \Vert _{H^3(\varOmega )}\), and on the function \(j\).

Proof

The proof relies heavily on duality techniques, which are used repeatedly to estimate the various terms in (3.1). The impatient reader may skip the remainder of the proof after (3.14) and will still grasp the main ideas.

From the equality \(d\mathcal{J}(\varOmega ;\mathcal{V}) = d\mathcal{J}(\varOmega ,u,p; \mathcal{V})^{\mathrm {Vol} }\), we immediately get by the triangle inequality

$$\begin{aligned}&|d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }|\nonumber \\&\quad \le \left( \left|\int _\varOmega \nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p - \nabla u_h \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p_h \,d{\mathbf {x}}\right|\right. \nonumber \\&\qquad +\left|\int _\varOmega f\mathcal{V}\cdot \nabla (p-p_h) \,d{\mathbf {x}}\right|\nonumber \\&\qquad +\left|\int _\varOmega \mathrm{div }\mathcal{V}(j(u) - j(u_h) - \nabla u\cdot \nabla p - up + \nabla u_h\cdot \nabla p_h + u_hp_h )\,d{\mathbf {x}}\right|\nonumber \\&\qquad + \left|\int _\varOmega (j'(u)-j'(u_h)-p+p_h)(\nabla g\cdot \mathcal{V})\,d{\mathbf {x}}\right|\nonumber \\&\qquad \left. +\left|\int _\varOmega \nabla (p-p_h) \cdot \nabla (\nabla g \cdot \mathcal{V})\,d{\mathbf {x}}\right|\right) \,. \end{aligned}$$
(3.7)

The proof boils down to bounding each integral in the previous inequality and applying standard finite element convergence and interpolation estimates. To begin with, we split the first integral into

$$\begin{aligned}&\int _\varOmega \nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p - \nabla u_h \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p_h \,d{\mathbf {x}}\nonumber \\&\quad = \int _\varOmega \nabla (u-u_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p \,d{\mathbf {x}}\nonumber \\&\qquad + \int _\varOmega \nabla u \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla (p-p_h) \,d{\mathbf {x}}\nonumber \\&\qquad - \int _\varOmega \nabla (u-u_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla (p-p_h) \,d{\mathbf {x}}\,. \end{aligned}$$
(3.8)
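The splitting (3.8) rests on a pointwise identity. Writing, only for this display, \({\mathbf {A}}:={\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T\), \({\mathbf {e}}_u:=\nabla (u-u_h)\) and \({\mathbf {e}}_p:=\nabla (p-p_h)\) (abbreviations introduced here, not part of the original notation), one has

$$\begin{aligned} \nabla u_h\cdot {\mathbf {A}}\nabla p_h&= (\nabla u-{\mathbf {e}}_u)\cdot {\mathbf {A}}(\nabla p-{\mathbf {e}}_p)\\&= \nabla u\cdot {\mathbf {A}}\nabla p-{\mathbf {e}}_u\cdot {\mathbf {A}}\nabla p-\nabla u\cdot {\mathbf {A}}{\mathbf {e}}_p+{\mathbf {e}}_u\cdot {\mathbf {A}}{\mathbf {e}}_p\,, \end{aligned}$$

so subtracting this from \(\nabla u\cdot {\mathbf {A}}\nabla p\) and integrating over \(\varOmega \) gives (3.8).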

To bound the first and the second integrals on the right-hand side of (3.8) we make use of standard duality techniques. For the first one we introduce the function \(w\) as weak solution of the adjoint BVP⁵

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\mathrm{div }\left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p\right) &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.9)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v +wv \,d{\mathbf {x}}= \int _\varOmega \left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p\right) \cdot \nabla v \,d{\mathbf {x}}\quad \forall \, v\in H^1_0(\varOmega )\,. \end{aligned}$$
(3.10)

We recall that for two generic functions \(q_1\), \(q_2\in L^4(\varOmega )\) the Cauchy–Schwarz inequality implies

$$\begin{aligned} \Vert q_1 q_2 \Vert _{L^{2}({\varOmega })} \le \Vert q_1 \Vert _{L^4(\varOmega )} \Vert q_2 \Vert _{L^4(\varOmega )}\,. \end{aligned}$$
(3.11)
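For completeness, (3.11) follows by applying the Cauchy–Schwarz inequality to the functions \(q_1^2\) and \(q_2^2\):

$$\begin{aligned} \Vert q_1 q_2 \Vert _{L^{2}({\varOmega })}^2 = \int _\varOmega q_1^2\, q_2^2 \,d{\mathbf {x}}\le \Big (\int _\varOmega q_1^4 \,d{\mathbf {x}}\Big )^{1/2} \Big (\int _\varOmega q_2^4 \,d{\mathbf {x}}\Big )^{1/2} = \Vert q_1 \Vert _{L^4(\varOmega )}^2 \Vert q_2 \Vert _{L^4(\varOmega )}^2\,, \end{aligned}$$

and taking square roots yields (3.11).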

By the triangle inequality, (3.11) and the Sobolev Imbedding Theorem [1, Thm 4.12], we bound the source function in (3.9) by

$$\begin{aligned}&\Vert \mathrm{div }\left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p\right) \Vert _{L^{2}({\varOmega })}\\&\quad \le C \left( \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert p \Vert _{W^{1,4}(\varOmega )} +\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert p \Vert _{H^2(\varOmega )}\right) \,,\\&\quad \le C \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert p \Vert _{H^{2}(\varOmega )}\,. \end{aligned}$$

By Assumption 1, \(w\) is in \(H^2(\varOmega )\) and it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert p \Vert _{H^{2}(\varOmega )}\,. \end{aligned}$$
(3.12)

By choosing \(v=u-u_h\in H^1_0(\varOmega )\) in (3.10) and exploiting the Galerkin orthogonality of \(u-u_h\) to the finite-dimensional trial space \(V_h\subset H^1_0(\varOmega )\), we derive the bound

$$\begin{aligned}&\left|\int _\varOmega \nabla (u-u_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p \,d{\mathbf {x}}\right|\nonumber \\&\quad = \left|\int _\varOmega \nabla (u-u_h) \cdot \nabla w+(u-u_h)w\,d{\mathbf {x}}\right|, \nonumber \\&\quad = \left|\int _\varOmega \nabla (u-u_h) \cdot \nabla (w-\mathcal{I}_hw)+(u-u_h)(w- \mathcal{I}_hw) \,d{\mathbf {x}}\right|, \nonumber \\&\quad \le \Vert u-u_h \Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })}\,. \end{aligned}$$
(3.13)

Then by (3.6) and the standard finite element convergence estimate [6, Ch. II, Sect. 7]

$$\begin{aligned} \Vert u-u_h \Vert _{H^{1}({\varOmega })} \le Ch\Vert u \Vert _{H^2(\varOmega )}\,, \end{aligned}$$
(3.14)

we conclude from (3.12) and (3.13)

$$\begin{aligned} \left|\int _\varOmega \nabla (u-u_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla p \,d{\mathbf {x}}\right|\le Ch^2\Vert u \Vert _{H^2(\varOmega )}\Vert p \Vert _{H^2(\varOmega )} \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\,. \end{aligned}$$

Similarly, for the second integral on the right-hand side of (3.8) we introduce the function \(w\) as weak solution of the adjoint BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\mathrm{div }\left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla u\right) &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.15)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v +wv \,d{\mathbf {x}}= \int _\varOmega \left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla u\right) \cdot \nabla v \,d{\mathbf {x}}\quad \forall \, v\in H^1_0(\varOmega )\,. \end{aligned}$$
(3.16)

Assumption 1 and the bound

$$\begin{aligned} \Vert \mathrm{div }\left( ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla u\right) \Vert _{L^{2}({\varOmega })} \le C \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^{2}(\varOmega )} \end{aligned}$$

imply that \(w\in H^2(\varOmega )\) and that it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^{2}(\varOmega )}\,. \end{aligned}$$
(3.17)

Next, we note that for every \(v_h\in V_h\)

$$\begin{aligned} \int _\varOmega \nabla (p-p_h)\cdot \nabla v_h + (p-p_h)v_h \,d{\mathbf {x}}= \int _\varOmega (j(u)-j(u_h))v_h\,d{\mathbf {x}}\,, \end{aligned}$$
(3.18)

which implies

$$\begin{aligned}&\left|\int _\varOmega \nabla (p-p_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla u \,d{\mathbf {x}}\right|= \left|\int _\varOmega \nabla (p-p_h) \cdot \nabla w+(p-p_h)w\,d{\mathbf {x}}\right|, \nonumber \\&\quad \le \left|\int _\varOmega \nabla (p-p_h) \cdot \nabla (w-\mathcal{I}_hw)+(p-p_h)(w- \mathcal{I}_hw) \,d{\mathbf {x}}\right|\nonumber \\&\qquad +\left| \int _\varOmega \left( j(u)-j(u_h)\right) \mathcal{I}_hw\,d{\mathbf {x}}\right| ,\nonumber \\&\quad \le \Vert p-p_h\Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })} +\Vert \mathcal{I}_hw\Vert _{L^{2}({\varOmega })} \Vert j(u)-j(u_h) \Vert _{L^{2}({\varOmega })}\,.\nonumber \\ \end{aligned}$$
(3.19)

For the concrete BVP considered, Assumptions 1 and 2 ensure \(u\in H^2(\varOmega )\), and hence \(u\in C^{0}(\overline{\varOmega })\) by the Sobolev Imbedding Theorem [1, Thm 4.12], since \(d\le 3\). Further, \(L^{\infty }(\varOmega )\)-estimates for finite element solutions [6, Ch. II, Sect. 7] ensure that \(\left\| {u-u_{h}}\right\| _{L^{\infty }(\varOmega )}\rightarrow 0\) as \(h\rightarrow 0\). Hence, we may assume that there are \(h\)-independent bounds \(\underline{u}\) and \(\overline{u}\) such that

$$\begin{aligned} -\infty < \underline{u} \le u({\mathbf {x}}),u_{h}({\mathbf {x}}) \le \overline{u} < \infty \quad \forall {\mathbf {x}}\in \varOmega \,. \end{aligned}$$
(3.20)

We write \(I :\,= [\underline{u},\overline{u}]\) and point out that \(j'\), being continuous, is bounded on the compact interval \(I\). Thus, by the mean value theorem, the standard finite element convergence estimate [6, Ch. II, Sect. 7]

$$\begin{aligned} \Vert u-u_h \Vert _{L^{2}({\varOmega })} \le Ch^2\Vert u \Vert _{H^2(\varOmega )}\,, \end{aligned}$$
(3.21)

gives

$$\begin{aligned} \nonumber \Vert j(u) -j(u_h) \Vert _{L^{2}({\varOmega })}&\le {\Vert j' \Vert _{C^{0}(I)}} \Vert u-u_h\Vert _{L^{2}({\varOmega })}\,,\\&\le Ch^2\Vert j' \Vert _{C^{0}(I)}\Vert u \Vert _{H^2(\varOmega )}\,. \end{aligned}$$
(3.22)
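The two rates (3.14) and (3.21), which drive the whole proof, can be observed for a hypothetical 1D test problem: \(-u''+u=f\) on \((0,1)\) with \(u(0)=u(1)=0\) and exact solution \(u(x)=\sin (\pi x)\). The sketch assembles the piecewise linear Galerkin system on uniform meshes and measures the errors with a three-point Gauss rule per element; it only exhibits the rates and does not reproduce the duality argument behind (3.21).

```python
import math

# Hypothetical 1D test problem (not from the article):
#   -u'' + u = f on (0, 1), u(0) = u(1) = 0, exact solution u(x) = sin(pi x).
def u_exact(x):
    return math.sin(math.pi * x)

def du_exact(x):
    return math.pi * math.cos(math.pi * x)

def f(x):
    return (math.pi ** 2 + 1.0) * math.sin(math.pi * x)

# 3-point Gauss-Legendre rule on the reference interval [0, 1]: (node, weight)
GAUSS3 = [(0.5 - 0.5 * math.sqrt(0.6), 5.0 / 18.0), (0.5, 8.0 / 18.0),
          (0.5 + 0.5 * math.sqrt(0.6), 5.0 / 18.0)]

def solve_p1(n):
    """Nodal values of the P1 Ritz-Galerkin solution on n uniform elements."""
    h, m = 1.0 / n, n - 1
    diag = [2.0 / h + 2.0 * h / 3.0] * m
    off = -1.0 / h + h / 6.0
    rhs = [0.0] * m
    for k in range(n):                        # element k = [k h, (k + 1) h]
        for t, wt in GAUSS3:
            fw = f((k + t) * h) * wt * h
            if k >= 1:
                rhs[k - 1] += fw * (1.0 - t)  # hat function of node k
            if k < m:
                rhs[k] += fw * t              # hat function of node k + 1
    for k in range(1, m):                     # Thomas algorithm: forward sweep
        w = off / diag[k - 1]
        diag[k] -= w * off
        rhs[k] -= w * rhs[k - 1]
    x = [0.0] * m
    x[-1] = rhs[-1] / diag[-1]
    for k in range(m - 2, -1, -1):            # back substitution
        x[k] = (rhs[k] - off * x[k + 1]) / diag[k]
    return [0.0] + x + [0.0]

def fe_errors(n):
    """Return (||u - u_h||_{H^1}, ||u - u_h||_{L^2}) via elementwise quadrature."""
    h, U = 1.0 / n, solve_p1(n)
    l2 = semi = 0.0
    for k in range(n):
        slope = (U[k + 1] - U[k]) / h
        for t, wt in GAUSS3:
            x = (k + t) * h
            e = u_exact(x) - (U[k] + t * (U[k + 1] - U[k]))
            de = du_exact(x) - slope
            l2 += wt * h * e * e
            semi += wt * h * de * de
    return math.sqrt(l2 + semi), math.sqrt(l2)

for n in (8, 16, 32, 64):
    print(n, *fe_errors(n))
```

Halving \(h\) should roughly halve the \(H^1\)-error and quarter the \(L^2\)-error.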

In order to establish a bound for (3.19), we follow the arguments in the proof of Strang’s first lemma [6, Ch. III, Thm. 1.1]. We note that for every \(v_h\in V_h\)

$$\begin{aligned} \nonumber \Vert p_h - v_h \Vert _{H^{1}({\varOmega })}^2&= \int _\varOmega \nabla (p_h - p) \cdot \nabla (p_h - v_h) + (p_h-p)(p_h-v_h)\,d{\mathbf {x}}\nonumber \\&\quad + \int _\varOmega \nabla (p - v_h) \cdot \nabla (p_h - v_h) + (p-v_h)(p_h-v_h)\,d{\mathbf {x}}\nonumber \\&\le \left( \Vert j(u_h)-j(u)\Vert _{L^{2}({\varOmega })}+\Vert p-v_h \Vert _{H^{1}({\varOmega })}\right) \Vert p_h - v_h \Vert _{H^{1}({\varOmega })}\,, \end{aligned}$$
(3.23)

where in the last step we used (3.18) and the Cauchy–Schwarz inequality. Then, by the triangle inequality, (3.23), (3.6), and (3.22),

$$\begin{aligned} \Vert p - p_h \Vert _{H^{1}({\varOmega })}&\le \Vert p - \mathcal{I}_hp\Vert _{H^{1}({\varOmega })} + \Vert \mathcal{I}_hp -p_h\Vert _{H^{1}({\varOmega })}\,,\nonumber \\&\le 2\Vert p - \mathcal{I}_h p \Vert _{H^{1}({\varOmega })} +\Vert j(u_h)-j(u)\Vert _{L^{2}({\varOmega })}\,,\nonumber \\&\le Ch\Vert p\Vert _{H^2(\varOmega )} + Ch^2 {\Vert j' \Vert _{C^{0}(I)}}\Vert u \Vert _{H^2(\varOmega )}\,, \end{aligned}$$
(3.24)

which, together with (3.6), (3.17), (3.19), and (3.22), implies

$$\begin{aligned}&\left|\int _\varOmega \nabla (p-p_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla u \,d{\mathbf {x}}\right|\\&\quad \le Ch^2\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^2(\varOmega )} (\Vert p \Vert _{H^2(\varOmega )}+\Vert u \Vert _{H^2(\varOmega )}\Vert j' \Vert _{C^{0}(I)})\,. \end{aligned}$$

Finally, by the Cauchy–Schwarz inequality, (3.14), and (3.24), the third integral on the right-hand side of (3.8) satisfies

$$\begin{aligned}&\left|\int _\varOmega \nabla (u-u_h) \cdot ({\mathbf {D}}\mathcal{V}+{\mathbf {D}}\mathcal{V}^T) \nabla (p-p_h) \,d{\mathbf {x}}\right|\\&\quad \le C\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert u-u_h\Vert _{H^{1}({\varOmega })} \Vert p-p_h \Vert _{H^{1}({\varOmega })}\,,\\&\quad \le Ch^2\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )} \Vert u \Vert _{H^2(\varOmega )} \left( \Vert p \Vert _{H^2(\varOmega )}+\Vert u \Vert _{H^2(\varOmega )}\Vert j' \Vert _{C^{0}(I)}\right) \,. \end{aligned}$$

To bound the second integral on the right-hand side of (3.7) we introduce the function \(w\) as weak solution of the adjoint BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\mathrm{div }\left( f \mathcal{V}\right) &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.25)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v+wv \,d{\mathbf {x}}= \int _\varOmega f \mathcal{V}\cdot \nabla v \,d{\mathbf {x}}\quad \forall \, v\in H^1_0(\varOmega )\,. \end{aligned}$$
(3.26)

Note that

$$\begin{aligned} \Vert \mathrm{div }\left( f \mathcal{V}\right) \Vert _{L^{2}({\varOmega })} \le C\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )} \Vert f\Vert _{H^{1}({\varOmega })}\,, \end{aligned}$$

which, by Assumption 1, implies that \(w\) is in \(H^2(\varOmega )\) and that it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C \Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert f \Vert _{H^{1}(\varOmega )}\,. \end{aligned}$$
(3.27)

Then, by (3.26), (3.18), (3.24), (3.6), (3.21), and (3.27),

$$\begin{aligned}&\left|\int _{\varOmega }f\mathcal{V}\cdot \nabla (p-p_h) \,d{\mathbf {x}}\right|= \Bigg \vert \int _\varOmega \nabla (p-p_h) \cdot \nabla w+(p-p_h)w\,d{\mathbf {x}}\Bigg \vert \,, \\&\quad \le \Vert p-p_h\Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })} +\Vert \mathcal{I}_hw\Vert _{L^{2}({\varOmega })} \Vert j' \Vert _{C^{0}(I)}\Vert u-u_h \Vert _{L^{2}({\varOmega })}\,,\\&\quad \le Ch^2 \Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert f \Vert _{H^{1}({\varOmega })} \left( \Vert p \Vert _{H^2(\varOmega )}+\Vert u \Vert _{H^2(\varOmega )}\Vert j' \Vert _{C^{0}(I)} \right) \,. \end{aligned}$$

To bound the third integral on the right-hand side of (3.7), we first apply the triangle inequality

$$\begin{aligned}&\left|\int _\varOmega \mathrm{div }\mathcal{V}(j(u) - j(u_h) - \nabla u\cdot \nabla p - up + \nabla u_h\cdot \nabla p_h + u_hp_h )\,d{\mathbf {x}}\right|\nonumber \\&\quad \le \left|\int _\varOmega \mathrm{div }\mathcal{V}(j(u) - j(u_h))\,d{\mathbf {x}}\right|\nonumber \\&\qquad +\left|\int _\varOmega \mathrm{div }\mathcal{V}(\nabla u\cdot \nabla p + up - \nabla u_h\cdot \nabla p_h - u_hp_h )\,d{\mathbf {x}}\right|\,. \end{aligned}$$
(3.28)

The first integral on the right-hand side of (3.28) can be bounded by

$$\begin{aligned} \left|\int _\varOmega \mathrm{div }\mathcal{V}(j(u) - j(u_h))\,d{\mathbf {x}}\right|&\le C\Vert \mathcal{V}\Vert _{W^{1,\infty }}{\Vert j' \Vert _{C^{0}({I})}} \Vert u-u_h \Vert _{L^{2}({\varOmega })}\,,\nonumber \\&\le Ch^2\Vert \mathcal{V}\Vert _{W^{1,\infty }}{\Vert j' \Vert _{C^{0}({I})}} \Vert u \Vert _{H^2(\varOmega )}\,, \end{aligned}$$
(3.29)

whereas the second one can conveniently be rewritten as

$$\begin{aligned}&\left|\int _\varOmega \mathrm{div }\mathcal{V}(\nabla u\cdot \nabla p + up - \nabla u_h\cdot \nabla p_h - u_hp_h )\,d{\mathbf {x}}\right|\nonumber \\&\quad =\left| \int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla (u-u_h)\cdot \nabla p + (u-u_h)p\right) \,d{\mathbf {x}}\right. \nonumber \\&\left. \qquad +\int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla u\cdot \nabla (p-p_h) + u(p-p_h)\right) \,d{\mathbf {x}}\right. \nonumber \\&\left. \qquad -\int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla (u-u_h)\cdot \nabla (p-p_h) + (u-u_h)(p-p_h)\right) \,d{\mathbf {x}}\right| \,. \end{aligned}$$
(3.30)

Again, the first two integrals on the right-hand side of (3.30) can be bounded with standard duality techniques. For the first one we introduce the function \(w\) as weak solution of the adjoint BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\mathrm{div }\left( \mathrm{div }(\mathcal{V}) \nabla p\right) +\mathrm{div }(\mathcal{V})p &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.31)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v+wv\,d{\mathbf {x}}= \int _\varOmega \mathrm{div }(\mathcal{V}) \left( \nabla p\cdot \nabla v+pv\right) \,d{\mathbf {x}}\quad \forall \, v \in H^1_0(\varOmega )\,. \end{aligned}$$
(3.32)

Assumption 1 and the bound

$$\begin{aligned} \Vert -\mathrm{div }\left( \mathrm{div }(\mathcal{V}) \nabla p\right) +\mathrm{div }(\mathcal{V})p\Vert _{L^{2}({\varOmega })} \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert p \Vert _{H^2(\varOmega )} \end{aligned}$$

imply that \(w\) is in \(H^2(\varOmega )\) and that it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert p \Vert _{H^2(\varOmega )}\,. \end{aligned}$$
(3.33)

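Then by (3.32), Galerkin orthogonality of \(u-u_h\) to \(V_h\), the Cauchy–Schwarz inequality, (3.14), and (3.6),

$$\begin{aligned}&\left|\int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla (u-u_h)\cdot \nabla p + (u-u_h)p\right) \,d{\mathbf {x}}\right| = \left|\int _\varOmega \nabla w\cdot \nabla (u-u_h)+w(u-u_h)\,d{\mathbf {x}}\right|\\&\quad = \left|\int _\varOmega \nabla (w-\mathcal{I}_hw)\cdot \nabla (u-u_h)+(w-\mathcal{I}_hw)(u-u_h)\,d{\mathbf {x}}\right|\\&\quad \le \Vert u-u_h\Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })}\\&\quad \le Ch^2 \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^2(\varOmega )}\Vert p \Vert _{H^2(\varOmega )}\,. \end{aligned}$$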

For the second integral on the right-hand side of (3.30) we introduce the function \(w\) as weak solution of the adjoint BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\mathrm{div }\left( \mathrm{div }(\mathcal{V}) \nabla u\right) +\mathrm{div }(\mathcal{V})u &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.34)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v+wv\,d{\mathbf {x}}= \int _\varOmega \mathrm{div }(\mathcal{V}) \left( \nabla u\cdot \nabla v+uv\right) \,d{\mathbf {x}}\quad \forall \, v \in H^1_0(\varOmega )\,. \end{aligned}$$
(3.35)

Assumption 1 and the bound

$$\begin{aligned} \Vert -\mathrm{div }\left( \mathrm{div }(\mathcal{V}) \nabla u\right) +\mathrm{div }(\mathcal{V})u\Vert _{L^{2}({\varOmega })} \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^2(\varOmega )} \end{aligned}$$

imply that \(w\) is in \(H^2(\varOmega )\) and that it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^2(\varOmega )}\,. \end{aligned}$$
(3.36)

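Then, by (3.35), (3.18), the Cauchy–Schwarz inequality, (3.24), (3.6), and (3.21),

$$\begin{aligned}&\left|\int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla u\cdot \nabla (p-p_h) + u(p-p_h)\right) \,d{\mathbf {x}}\right| = \left|\int _\varOmega \nabla w\cdot \nabla (p-p_h)+w(p-p_h)\,d{\mathbf {x}}\right|\,,\\&\quad \le \Vert p-p_h\Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })} +\Vert \mathcal{I}_hw\Vert _{L^{2}({\varOmega })} \Vert j' \Vert _{C^{0}(I)}\Vert u-u_h \Vert _{L^{2}({\varOmega })}\,,\\&\quad \le Ch^2 \Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert u \Vert _{H^2(\varOmega )} \left( \Vert p \Vert _{H^2(\varOmega )}+\Vert u \Vert _{H^2(\varOmega )}\Vert j' \Vert _{C^{0}(I)} \right) \,. \end{aligned}$$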

By the Cauchy–Schwarz inequality, (3.14), and (3.24), we obtain the following bound for the third integral on the right-hand side of (3.30):

$$\begin{aligned}&\left|\int _\varOmega \mathrm{div }\mathcal{V}\left( \nabla (u-u_h)\cdot \nabla (p-p_h) + (u-u_h)(p-p_h)\right) \,d{\mathbf {x}}\right|\\&\quad \le \Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert u-u_h\Vert _{H^{1}({\varOmega })} \Vert p-p_h\Vert _{H^{1}({\varOmega })}\,,\\&\quad \le C h^2\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert u\Vert _{H^2(\varOmega )} \Vert p\Vert _{H^2(\varOmega )}\,. \end{aligned}$$

The fourth integral on the right-hand side of (3.7) can be bounded similarly to (3.29), relying on \(L^\infty (\varOmega )\)-estimates for finite element solutions. This time, we make use of the uniform Lipschitz continuity of \(j'\) on the compact interval \(I\), which yields

$$\begin{aligned}&\left|\int _\varOmega (j'(u)-j'(u_h)-p+p_h)(\nabla g\cdot \mathcal{V})\,d{\mathbf {x}}\right|\\&\quad \le \Vert \mathcal{V}\Vert _{L^{\infty }(\varOmega )}\Vert g \Vert _{H^{1}({\varOmega })} \left( {\Vert j' \Vert _{C^{0,1}(I)}}\Vert u - u_h\Vert _{L^{2}({\varOmega })} +\Vert p-p_h\Vert _{L^{2}({\varOmega })}\right) \,, \end{aligned}$$

and since (3.18), (3.22), and (3.24) imply [6, Ch. III, Sect. 1]

$$\begin{aligned} \Vert p - p_h \Vert _{L^{2}({\varOmega })} \le Ch^2 {\Vert j' \Vert _{C^{0}(I)}} \Vert p \Vert _{H^2(\varOmega )}\,, \end{aligned}$$
(3.37)

we conclude

$$\begin{aligned}&\left|\int _\varOmega (j'(u)-j'(u_h)-p+p_h)(\nabla g\cdot \mathcal{V})\,d{\mathbf {x}}\right|\\&\quad \le Ch^2 {\Vert j' \Vert _{C^{0,1}(I)}}\Vert \mathcal{V}\Vert _{L^{\infty }(\varOmega )} \Vert g \Vert _{H^{1}({\varOmega })} \left( \Vert u \Vert _{H^2(\varOmega )}+\Vert p\Vert _{H^2(\varOmega )}\right) \,. \end{aligned}$$

Finally, the fifth integral on the right-hand side of (3.7) can be bounded with standard duality techniques by introducing the function \(w\) as weak solution of the adjoint BVP

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta w+ w= -\Delta \left( \nabla g \cdot \mathcal{V}\right) &{}\text {in }\varOmega \,,\\ w= 0 &{}\text {on }\partial \varOmega \,, \end{array} \right. \end{aligned}$$
(3.38)

that is,

$$\begin{aligned} \int _\varOmega \nabla w\cdot \nabla v+wv \,d{\mathbf {x}}= \int _\varOmega \nabla \left( \nabla g \cdot \mathcal{V}\right) \cdot \nabla v \,d{\mathbf {x}}\quad \forall \, v\in H^1_0(\varOmega )\,. \end{aligned}$$
(3.39)

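Assumption 1 and the bound, which follows from the identity \(\Delta (\nabla g\cdot \mathcal{V}) = \nabla (\Delta g)\cdot \mathcal{V} + 2\sum _{i,j=1}^{d}\partial _i\partial _j g\,\partial _j\mathcal{V}_i + \nabla g\cdot \Delta \mathcal{V}\) and the Sobolev embeddings \(W^{2,4}(\varOmega )\subset W^{1,\infty }(\varOmega )\) and \(H^3(\varOmega )\subset W^{1,\infty }(\varOmega )\) (valid for \(d\le 3\)),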

$$\begin{aligned}&\Vert \Delta \left( \nabla g \cdot \mathcal{V}\right) \Vert _{L^{2}({\varOmega })}\\&\quad \le C\left( \Vert \mathcal{V}\Vert _{L^{\infty }(\varOmega )}\Vert g \Vert _{H^3(\varOmega )} +\Vert \mathcal{V}\Vert _{W^{1,\infty }(\varOmega )}\Vert g \Vert _{H^2(\varOmega )} +\Vert \mathcal{V}\Vert _{H^{2}(\varOmega )}\Vert g \Vert _{W^{1,\infty }(\varOmega )} \right) \\&\quad \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert g \Vert _{H^3(\varOmega )} \end{aligned}$$

imply that \(w\) is in \(H^2(\varOmega )\) and that it satisfies

$$\begin{aligned} \Vert w\Vert _{H^2(\varOmega )} \le C\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert g \Vert _{H^3(\varOmega )}\,. \end{aligned}$$
(3.40)

Then by (3.39), (3.18), (3.24), (3.6), and (3.21),

$$\begin{aligned}&\left|\int _\varOmega \nabla (\nabla g \cdot \mathcal{V})\cdot \nabla (p-p_h)\,d{\mathbf {x}}\right|\\&\quad =\left|\int _\varOmega \nabla w\cdot \nabla (p-p_h)+w(p-p_h)\,d{\mathbf {x}}\right|\\&\quad \le \Vert p-p_h\Vert _{H^{1}({\varOmega })} \Vert w-\mathcal{I}_hw\Vert _{H^{1}({\varOmega })} +\Vert \mathcal{I}_hw\Vert _{L^{2}({\varOmega })} \Vert j \Vert _{C^{0,1}(I)}\Vert u-u_h \Vert _{L^{2}({\varOmega })}\,,\\&\quad \le C h^2\Vert \mathcal{V}\Vert _{W^{2,4}(\varOmega )}\Vert g \Vert _{H^3(\varOmega )} \left( \Vert p \Vert _{H^2(\varOmega )}+\Vert u \Vert _{H^2(\varOmega )}\Vert j \Vert _{C^{0,1}(I)} \right) \,. \end{aligned}$$

\(\square \)

Remark 3.2

The shape gradient formula (2.9) clearly represents a linear continuous operator on \(W^{1,\infty }(\mathbb {R}^d)\). Nevertheless, to exploit finite element superconvergence as in Theorem 3.1, we have to restrict ourselves to vector fields in \(W^{2,\infty }(\mathbb {R}^d)\). If this condition is violated, only first order convergence of \(d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }\) to \(d\mathcal{J}(\varOmega ;\mathcal{V})\) as \(h\rightarrow 0\) can be shown, because two key duality estimates in the proof of Theorem 3.1 are no longer available.

Remark 3.3

The quadratic rate of convergence in Theorem 3.1 depends on the regularity of the functions \(u\) and \(p\). If the assumption on the \(H^2\)-regularity of (2.6) is not fulfilled, the provable rate of convergence deteriorates to \(O(h^{\alpha })\) with some fractional \(\alpha <2\), but formula (3.1) remains meaningful as long as weak solutions in \(H^{1}({\varOmega })\) exist. On the other hand, if the functions \(u\) and \(p\) enjoy higher smoothness, the convergence may be improved by increasing the polynomial degree of the finite element space.

Remark 3.4

Theorem 3.1 is stated for Dirichlet boundary conditions only. However, a similar result can be obtained for Neumann boundary conditions. The proof follows the same lines as in the Dirichlet case and relies on \(H^2\)-regularity of the state problem and on regularity assumptions on the source function \(f\) and the boundary data \(g\). In particular, convergence of the boundary term in (2.11) can be concluded either via duality techniques or from the continuity of the Dirichlet trace operator on \(H^{1}({\varOmega })\).

For Formula (3.2), the following holds:

Theorem 3.2

Let \(u_h\) and \(p_h\) be Ritz–Galerkin linear Lagrangian finite element approximations of the solutions \(u\) and \(p\) of (2.6) and (2.7). In addition to the hypotheses of Theorem 3.1, assume that

$$\begin{aligned} \Vert u \Vert _{W^{2,p}(\varOmega )} \le C \Vert f \Vert _{L^p(\varOmega )} \end{aligned}$$
(3.41)

for some \(p>d\), where \(d\) is the space dimension. Then

$$\begin{aligned} |d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }|\le C h \Vert \mathcal{V}\cdot {\mathbf {n}}\Vert _{L^{\infty }(\partial \varOmega )}\,, \end{aligned}$$

where \(h\) stands for the meshwidth, and \(C>0\) does not depend on \(h\).

Proof

By the equality \(d\mathcal{J}(\varOmega ;\mathcal{V}) = d\mathcal{J}(\varOmega ,u,p;\mathcal{V})^{\mathrm {Bdry} }\), we immediately deduce from (3.2)

$$\begin{aligned}&\vert d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }\vert \\&\quad \le \Vert \mathcal{V}\cdot {\mathbf {n}}\Vert _{L^{\infty }(\partial \varOmega )} \int _{\partial \varOmega } \left| j(u)-j(u_h)+\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {(u-g)}}{\partial {\mathbf {n}}}-\frac{\partial {p_h}}{\partial {\mathbf {n}}}\frac{\partial {(u_h-g)}}{\partial {\mathbf {n}}} \right| \, dS\,. \end{aligned}$$
(3.42)

Adding and subtracting terms, similarly to (3.8), we find

$$\begin{aligned}&\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {(u-g)}}{\partial {\mathbf {n}}}-\frac{\partial {p_h}}{\partial {\mathbf {n}}}\frac{\partial {(u_h-g)}}{\partial {\mathbf {n}}} \\&\quad =\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {u}}{\partial {\mathbf {n}}}-\frac{\partial {p_h}}{\partial {\mathbf {n}}}\frac{\partial {u_h}}{\partial {\mathbf {n}}}+\frac{\partial {p_h}}{\partial {\mathbf {n}}}\frac{\partial {g}}{\partial {\mathbf {n}}}-\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {g}}{\partial {\mathbf {n}}}\\&\quad =\frac{\partial {p}}{\partial {\mathbf {n}}}\frac{\partial {(u-u_h)}}{\partial {\mathbf {n}}}+\frac{\partial {(p-p_h)}}{\partial {\mathbf {n}}}\frac{\partial {u}}{\partial {\mathbf {n}}}-\frac{\partial {(p-p_h)}}{\partial {\mathbf {n}}}\frac{\partial {(u-u_h)}}{\partial {\mathbf {n}}} +\frac{\partial {(p_h-p)}}{\partial {\mathbf {n}}}\frac{\partial {g}}{\partial {\mathbf {n}}}\,. \end{aligned}$$

Therefore, applying the triangle inequality to the right-hand side of (3.42), we obtain the estimate of the theorem directly from the finite element error estimates in \(W^{1,\infty }(\varOmega )\):

$$\begin{aligned} \Vert u-u_h \Vert _{W^{1,\infty }(\varOmega )}\le Ch\quad \text {and}\quad \Vert p-p_h \Vert _{W^{1,\infty }(\varOmega )}\le Ch\,, \end{aligned}$$

cf. [7, Corollary 8.1.12], which requires the assumption (3.41). \(\square \)
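The contrast between Theorems 3.1 and 3.2 has a classical one-dimensional analogue: the boundary flux \(u'(1)\) for \(-u''+u=f\) on \((0,1)\) with \(u(0)=u(1)=0\). The following sketch is our own illustration, not the code used for the experiments in Sect. 4. It compares the boundary-based value \(u_h'(1)\), read off the last element, with the volume-based value \(a(u_h,\varphi _n)-(f,\varphi _n)\), which follows from Green's formula tested with the hat function \(\varphi _n\) attached to \(x=1\). The manufactured solution \(u(x)=x\sin (\pi x)\) is a hypothetical choice with \(u''(1)\ne 0\), so that the one-sided difference quotient does not accidentally gain an order.

```python
import numpy as np

# Hypothetical manufactured solution u(x) = x sin(pi x), so u(0) = u(1) = 0.
def source(x):
    # f = -u'' + u for u(x) = x sin(pi x)
    return (np.pi**2 + 1.0) * x * np.sin(np.pi * x) - 2.0 * np.pi * np.cos(np.pi * x)

EXACT_FLUX = -np.pi  # u'(1) = sin(pi) + pi cos(pi)

def flux_errors(n):
    """Errors of the boundary- and volume-based approximations of u'(1) on n elements."""
    h = 1.0 / n
    nodes = np.linspace(0.0, 1.0, n + 1)
    A = np.zeros((n + 1, n + 1))  # full matrix of a(u, v) = int u'v' + uv
    b = np.zeros(n + 1)           # load vector (f, phi_i)
    gp, gw = np.polynomial.legendre.leggauss(3)  # 3-point Gauss rule on [-1, 1]
    local = (np.array([[1.0, -1.0], [-1.0, 1.0]]) / h         # element stiffness
             + np.array([[2.0, 1.0], [1.0, 2.0]]) * h / 6.0)  # element mass
    for e in range(n):
        idx = [e, e + 1]
        A[np.ix_(idx, idx)] += local
        x = nodes[e] + 0.5 * h * (gp + 1.0)                    # mapped Gauss points
        phi = np.vstack([(nodes[e + 1] - x) / h, (x - nodes[e]) / h])
        b[idx] += phi @ (source(x) * gw) * 0.5 * h
    U = np.zeros(n + 1)                                        # homogeneous Dirichlet data
    U[1:n] = np.linalg.solve(A[1:n, 1:n], b[1:n])
    flux_bdry = (U[n] - U[n - 1]) / h                          # trace of u_h' at x = 1
    flux_vol = A[n] @ U - b[n]                                 # a(u_h, phi_n) - (f, phi_n)
    return abs(flux_bdry - EXACT_FLUX), abs(flux_vol - EXACT_FLUX)

ns = [16, 32, 64, 128]
errs = np.array([flux_errors(n) for n in ns])  # columns: boundary, volume
rates = np.log2(errs[:-1] / errs[1:])          # observed orders between successive meshes
print(rates)
```

The observed orders should approach 1 for the boundary-based flux and 2 for the volume-based one, mirroring the linear versus quadratic behaviour of Formulas (3.2) and (3.1). The volume-based value is independent of which discrete test function with unit value at \(x=1\) is used, because \(a(u_h,v_h)=(f,v_h)\) for all \(v_h\) vanishing at both endpoints; the last hat function is simply the cheapest choice.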

Remark 3.5

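For \(d\mathcal{J}(\varOmega ,u,p;\mathcal{V})^{\mathrm {Bdry} }\) to be well-defined, the functions \(u\) and \(p\) must be smoother than merely belonging to \(H^{1}(\varOmega )\), since the normal derivatives of \(u\) and \(p\) entering Formula (3.2) have no well-defined trace on \(\partial \varOmega \) for general functions in \(H^{1}(\varOmega )\).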

4 Numerical experiments

We numerically study the approximation of the shape gradient for the quadratic shape functional

$$\begin{aligned} \mathcal{J}(\varOmega ) = \int _\varOmega u^2 \,d{\mathbf {x}}\,, \end{aligned}$$

for \(\varOmega \subset \mathbb {R}^2\), under the scalar PDE constraint

$$\begin{aligned} \left\{ \begin{array}{l@{\quad }l} -\Delta u + u = f &{} \text {in } \varOmega \,,\\ u = g &{} \text {on } \partial \varOmega \,. \end{array}\right. \end{aligned}$$
(4.1)

Investigating convergence rates in the dual norm of \(C^1(\mathbb {R}^d;\mathbb {R}^d)\) numerically is challenging. Therefore, we consider only an operator norm over the finite-dimensional space \(\mathcal {P}_{3,3}(\mathbb {R}^{2})\) of vector fields whose components are multivariate product polynomials of degree three. Moreover, the \(C^1(\mathbb {R}^d;\mathbb {R}^d)\)-norm is replaced with the \(H^{1}({\varOmega })\)-norm, which is computationally more tractable. The convergence studies are performed by monitoring the approximate dual norms

$$\begin{aligned} \mathrm {err}^{\mathrm {Vol}} :\,= \left( \max _{\mathcal{V}\in \mathcal {P}_{3,3}} \frac{1}{\Vert \mathcal{V}\Vert _{H^{1}({\varOmega })}^{2}} \vert d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }\vert ^2 \right) ^{1/2} \end{aligned}$$

and

$$\begin{aligned} \mathrm {err}^{\mathrm {Bdry}} :\,=\left( \max _{\mathcal{V}\in \mathcal {P}_{3,3}} \frac{1}{\Vert \mathcal{V}\Vert _{H^{1}({\varOmega })}^{2}} \vert d\mathcal{J}(\varOmega ;\mathcal{V}) - d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }\vert ^2 \right) ^{1/2} \end{aligned}$$

on different meshes generated through uniform refinement.

To compute the values \(\mathrm {err}^{\mathrm {Vol}}\) and \(\mathrm {err}^{\mathrm {Bdry}}\), we introduce a basis \(\{\mathcal{V}_i\}_{i=1}^{m}\), \(m=20\), of \(\mathcal {P}_{3,3}(\mathbb {R}^{2})\), and define the column vectors

$$\begin{aligned} {\mathbf {z}}^{\mathrm {Vol}}&:= \left( d\mathcal{J}(\varOmega ;\mathcal{V}_i) - {d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V}_i)^{\mathrm {Vol} }}\right) _{i=1}^{m}\,,\\ {\mathbf {z}}^{\mathrm {Bdry}}&:= \left( d\mathcal{J}(\varOmega ;\mathcal{V}_i) - {d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V}_i)^{\mathrm {Bdry} }}\right) _{i=1}^{m}\,. \end{aligned}$$

Let \({\mathbf {M}}\) be the Gramian matrix of \(\{\mathcal{V}_i\}_{i=1}^{20}\) with respect to the \(H^1(\varOmega )\) inner product, and consider the matrices \({\mathbf {A}}^{\mathrm {Vol}}\) and \({\mathbf {A}}^{\mathrm {Bdry}}\) defined by

$$\begin{aligned} {\mathbf {A}}^{\mathrm {Vol}} = {\mathbf {z}}^{\mathrm {Vol}}({\mathbf {z}}^{\mathrm {Vol}})^{T}\in \mathbb {R}^{20\times 20} \quad \text { and } \quad {\mathbf {A}}^{\mathrm {Bdry}} = {\mathbf {z}}^{\mathrm {Bdry}} ({\mathbf {z}}^{\mathrm {Bdry}})^{T}\in \mathbb {R}^{20\times 20}\,, \end{aligned}$$

respectively. Then \(\mathrm {err}^{\mathrm {Vol}}\) and \(\mathrm {err}^{\mathrm {Bdry}}\) are the square roots of the maximal eigenvalues of \({\mathbf {M}}^{-1}{\mathbf {A}}^{\mathrm {Vol}}\) and \({\mathbf {M}}^{-1}{\mathbf {A}}^{\mathrm {Bdry}}\). Since both matrices have rank one, their only nonzero eigenvalues are

$$\begin{aligned} ({\mathbf {z}}^{\mathrm {Vol}})^T{\mathbf {M}}^{-1}{\mathbf {z}}^{\mathrm {Vol}} \quad \text { and }\quad ({\mathbf {z}}^{\mathrm {Bdry}})^T{\mathbf {M}}^{-1}{\mathbf {z}}^{\mathrm {Bdry}}\,, \end{aligned}$$

respectively.
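The reduction of the discrete dual norm to a single linear solve can be sketched as follows. This is our own illustration with random stand-ins: the names `dual_norm_error`, `M`, and `z` are hypothetical, with `M` playing the role of the \(H^1(\varOmega )\) Gramian \({\mathbf {M}}\) and `z` that of \({\mathbf {z}}^{\mathrm {Vol}}\) or \({\mathbf {z}}^{\mathrm {Bdry}}\). Writing \(\mathcal{V}=\sum _i c_i\mathcal{V}_i\), the quantity to maximize is \((c^Tz)^2/(c^T{\mathbf {M}}c)\), whose maximum \(z^T{\mathbf {M}}^{-1}z\) is attained at \(c={\mathbf {M}}^{-1}z\).

```python
import numpy as np

def dual_norm_error(z, M):
    """Return sqrt(z^T M^{-1} z) = max over c != 0 of |c.z| / sqrt(c^T M c)."""
    y = np.linalg.solve(M, z)  # y = M^{-1} z, without forming the inverse
    return float(np.sqrt(z @ y))

# Hypothetical data standing in for the Gramian M and the error vector z (m = 20).
rng = np.random.default_rng(0)
m = 20
B = rng.standard_normal((m, m))
M = B @ B.T + m * np.eye(m)  # symmetric positive definite
z = rng.standard_normal(m)

# Cross-check against the largest eigenvalue of M^{-1} A with the rank-one A = z z^T.
A = np.outer(z, z)
lam_max = np.max(np.linalg.eigvals(np.linalg.solve(M, A)).real)
print(dual_norm_error(z, M), np.sqrt(lam_max))
```

Solving with \({\mathbf {M}}\) instead of computing eigenvalues avoids a nonsymmetric eigenproblem whose remaining nineteen eigenvalues are zero only up to rounding.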

Although analytical values are computable in some cases, the reference values \(d\mathcal{J}(\varOmega ;\mathcal{V})\) are approximated by evaluating \(d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }\) on a mesh with one extra level of refinement. This gives us much flexibility in the selection of test cases (the same code can be used for different geometries \(\varOmega \), source functions \(f\), boundary data \(g\), and vector fields \(\mathcal{V}\)). Agreement with the theoretical predictions of Theorem 3.1 and a numerical study in the third numerical experiment confirm the viability of this approach.

In the implementation, we opt for linear Lagrangian finite elements on quasi-uniform triangular meshes. Integrals in the domain are computed by a 7-point quadrature rule in each triangle and line integrals with a 6-point Gauss quadrature on each segment. The boundary of the computational domains is approximated by a polygon, which is generally believed not to affect the convergence of linear finite elements [7, Sect. 10.2].
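The text does not specify which 7-point rule is used. A common choice, assumed here purely for illustration, is Radon's rule, which is exact for polynomials of degree five; the 6-point Gauss–Legendre rule on a segment is exact up to degree eleven. The helper names `integrate_triangle` and `integrate_segment` are hypothetical.

```python
import math
import numpy as np

# Assumed choice: Radon's degree-5, 7-point rule in barycentric coordinates.
s15 = math.sqrt(15.0)
a1, a2 = (6 - s15) / 21, (6 + s15) / 21
w1, w2 = (155 - s15) / 1200, (155 + s15) / 1200
BARY = np.array([
    [1 / 3, 1 / 3, 1 / 3],
    [a1, a1, 1 - 2 * a1], [a1, 1 - 2 * a1, a1], [1 - 2 * a1, a1, a1],
    [a2, a2, 1 - 2 * a2], [a2, 1 - 2 * a2, a2], [1 - 2 * a2, a2, a2],
])
WEIGHTS = np.array([9 / 40, w1, w1, w1, w2, w2, w2])  # relative to the area; sum to 1

def integrate_triangle(f, verts):
    """Integrate f(x, y) over the triangle with the given (3, 2) vertex array."""
    verts = np.asarray(verts, dtype=float)
    pts = BARY @ verts                                 # physical quadrature points
    e1, e2 = verts[1] - verts[0], verts[2] - verts[0]
    area = 0.5 * abs(e1[0] * e2[1] - e1[1] * e2[0])
    return area * np.dot(WEIGHTS, f(pts[:, 0], pts[:, 1]))

def integrate_segment(f, p, q, npts=6):
    """Integrate f(x, y) along the straight segment from p to q (Gauss-Legendre)."""
    t, w = np.polynomial.legendre.leggauss(npts)      # nodes/weights on [-1, 1]
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    pts = 0.5 * (1 - t)[:, None] * p + 0.5 * (1 + t)[:, None] * q
    length = np.linalg.norm(q - p)
    return 0.5 * length * np.dot(w, f(pts[:, 0], pts[:, 1]))
```

For instance, `integrate_triangle(lambda x, y: x**2 * y**3, [(0, 0), (1, 0), (0, 1)])` reproduces the exact moment \(2!\,3!/7! = 1/420\) up to rounding.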

The first numerical experiment is constructed starting from the solution

$$\begin{aligned} u(x,y)=\cos (x)\cos (y) \end{aligned}$$

and setting \(f\) and \(g\) accordingly. The computational domain is a disc with radius \(\sqrt{\pi }\) (see Fig. 1, left). The quadratic convergence of Formula (3.1) and the linear convergence of Formula (3.2) with respect to the meshwidth \(h\), as predicted by Theorems 3.1 and 3.2, are evident in Fig. 2 (left).
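For reference, the data implied by this choice of \(u\) follow from a short computation of our own based on (4.1): since \(\Delta (\cos x\cos y) = -2\cos x\cos y\),

$$\begin{aligned} f(x,y) = -\Delta u + u = 3\cos (x)\cos (y)\,,\qquad g = u|_{\partial \varOmega }\,. \end{aligned}$$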

The second experiment is performed on a triangle with corners located at \((-\pi ,-\pi )\), \((\pi ,-\pi )\), and \((0,\pi )\) (see Fig. 1, right). The source function and the boundary data are chosen as follows:

$$\begin{aligned} f(x,y)= x^2-y^2\,, \quad g(x,y)=x+y\,. \end{aligned}$$

Again, the rates of convergence predicted in Theorems 3.1 and 3.2 are confirmed by the experiment, see Fig. 2 (right).

Fig. 1

Plot of the solution \(u\) of the state problem in the computational domain \(\varOmega \) for the first (left) and the second (right) numerical experiment

Fig. 2

Convergence study for the first (left) and the second (right) numerical experiment. Obviously, Formula (3.1) is better suited for a finite element approximation of the Eulerian derivative \(d\mathcal{J}(\varOmega ;\mathcal{V})\) than Formula (3.2)

The third numerical experiment is conducted on a domain which does not guarantee \(H^2\)-regularity of the state problem (2.6), see Fig. 3 (left). The source and the boundary functions are, in polar coordinates, \(f({\mathbf {x}})=r^{2/3}\cos (2\varphi /3)\) and \(g({\mathbf {x}})=0\) respectively. As expected, the convergence rates deteriorate to fractional values due to the presence of a reentrant corner which, with an interior angle of size \(2\pi \cdot 60/61\), affects the regularity of the functions \(u\) and \(p\).
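The size of the regularity loss can be estimated with a back-of-the-envelope computation of our own, assuming the classical result that, near a corner with interior angle \(\omega \), solutions of such second-order elliptic problems generically contain a singular term behaving like \(r^{\pi /\omega }\):

$$\begin{aligned} \omega = 2\pi \cdot \frac{60}{61}\,,\qquad \frac{\pi }{\omega } = \frac{61}{120}\approx 0.508\,. \end{aligned}$$

Hence, in general, \(u\) and \(p\) belong to \(H^{1+s}(\varOmega )\) only for \(s<61/120\), well short of \(H^2(\varOmega )\).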

Fig. 3

Plot of the solution \(u\) of the state problem in the computational domain \(\varOmega \) (left) for the third numerical experiment, and corresponding convergence study (right). Due to the poor regularity of the functions \(u\) and \(p\), the convergence rates of \(d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }\) and \(d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }\) deteriorate

In the fourth numerical experiment, we investigate the Neumann problem and the accuracy of Formulas (2.11) and (2.12), for which we expect results similar to the Dirichlet case. We consider the solution

$$\begin{aligned} u(x,y)=\cos (x-1)\cos (y+1) \end{aligned}$$

and we choose \(f\) and \(g\) accordingly. The computational domain is a disc with radius \(\sqrt{\pi }\) (see Fig. 4, left). Surprisingly, we observe that Formula (2.12) performs as well as Formula (2.11), showing quadratic convergence in the meshwidth \(h\), too (see Fig. 5, left).
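Analogously, under the assumption (not restated in this section) that the Neumann problem reads \(-\Delta u+u=f\) in \(\varOmega \) with \(\partial u/\partial {\mathbf {n}}=g\) on \(\partial \varOmega \), a direct computation of our own gives, with \({\mathbf {n}}=(n_1,n_2)\),

$$\begin{aligned} f(x,y) = 3\cos (x-1)\cos (y+1)\,,\qquad g = \nabla u\cdot {\mathbf {n}} = -\sin (x-1)\cos (y+1)\,n_1 - \cos (x-1)\sin (y+1)\,n_2\,. \end{aligned}$$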

This surprising observation is not confined to smooth domains, as will be demonstrated by our fifth numerical experiment. It investigates the convergence for the Neumann case on a triangle with corners located at \((-\pi ,-\pi )\), \((\pi ,-\pi )\), and \((0,\pi )\) (see Fig. 4, right). The source function and the boundary data are set as follows:

$$\begin{aligned} f(x,y)= \cos (x+1)\cos (y-1)\!, \quad g(x,y)=\cos (x-1)\cos (y+1)\,. \end{aligned}$$

Again, we observe that Formula (2.12), corrected according to Remark 2.3, converges quadratically in the meshwidth \(h\) (see Fig. 5, right).

Fig. 4

Plot of the solution \(u\) of the state problem in the computational domain \(\varOmega \) for the fourth (left) and the fifth (right) numerical experiment

Fig. 5

Convergence study for the fourth (left) and fifth (right) numerical experiment. The quadratic convergence of \(d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }\) is unexpected

Nevertheless, the sixth numerical experiment, which again studies the Neumann boundary value problem, shows that Formula (2.11) is superior to (2.12) in terms of accuracy and convergence on domains that do not guarantee \(H^2\)-regularity, see Fig. 6. The source and the boundary functions are chosen as in the third numerical experiment.

Fig. 6

Plot of the solution \(u\) of the state problem in the computational domain \(\varOmega \) (left) for the sixth numerical experiment, and corresponding convergence study (right). Due to the poor regularity of the functions \(u\) and \(p\), the convergence rates of \(d\mathcal{J}(\varOmega ,u_h,p_h; \mathcal{V})^{\mathrm {Vol} }\) and \(d\mathcal{J}(\varOmega ,u_h,p_h;\mathcal{V})^{\mathrm {Bdry} }\) deteriorate

Remark 4.1

The superconvergence observed in the fourth and in the fifth numerical experiments may be of interest for practical applications. For instance, in shape optimization it is common to arbitrarily restrict the choice of descent directions to vector fields which vanish on subregions of the computational domain, so that the optimization task is limited to the complement of these subregions [2, 18, 26]. At the same time, the formation of reentrant corners during the optimization routine is prevented by the use of regularization techniques such as filtering [14, 20].

A closer look at Formula (2.12) reveals a cancellation of the normal derivatives of \(u\) and \(p\), so that the formula is equivalent to

$$\begin{aligned} \nonumber d\mathcal{J}(\varOmega ;\mathcal{V}) =&\int _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\left( j(u)-\nabla _\Gamma u \nabla _\Gamma p - up + fp + \mathrm {K}gp \right) \, dS\\&+ \sum _{i=1}^3 p({\mathbf {a}}_i)g({\mathbf {a}}_i)\mathcal{V}({\mathbf {a}}_i)\cdot [[\tau ({\mathbf {a}}_i)]]\,, \end{aligned}$$
(4.2)

where \(\nabla _\Gamma \) stands for the tangential derivative. To elucidate the behavior of different contributions, we split Formula (4.2) according to

$$\begin{aligned} d\mathcal{J}(\varOmega ;\mathcal{V})&= \int _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\left( j(u) - up + fp + \mathrm {K}gp \right) \, dS \end{aligned}$$
(4.3a)
$$\begin{aligned}&\qquad \qquad + \sum _{i=1}^3 p({\mathbf {a}}_i)g({\mathbf {a}}_i)\mathcal{V}({\mathbf {a}}_i)\cdot [[\tau ({\mathbf {a}}_i)]] \end{aligned}$$
(4.3b)
$$\begin{aligned}&-\int _{\partial \varOmega } \mathcal{V}\cdot {\mathbf {n}}\left( \nabla _\Gamma u \nabla _\Gamma p \right) \, dS\,. \end{aligned}$$
(4.3c)

The finite element approximation of the first integral (4.3a) converges quadratically in \(h\). This can be shown as in the proof of Theorem 3.1, since the Dirichlet trace operator is bounded on \(H^{1}({\varOmega })\). Quadratic convergence is also expected for the approximation of (4.3b), owing to the convergence properties of finite element solutions in \(L^{\infty }\) [7, Ch. 8]. On the other hand, the good approximation of the tangential derivatives of \(u\) and \(p\) in (4.3c) still lacks a theoretical explanation.

Finally, all experiments are repeated using the operator norm on the subspace of multivariate polynomials of degree two instead of three. The measured errors agree well with those reported above, see Fig. 7. Thus, the arbitrary choice of computing the operator norm on the finite-dimensional subspace of multivariate polynomial vector fields of degree three does not seem to compromise our observations.

Fig. 7

Convergence study for the first (left, up), the second (left, middle), the third (left, down), the fourth (right, up), the fifth (right, middle) and the sixth (right, down) numerical experiment, when considering the operator norm on the subspace of multivariate polynomials of degree two. The results agree with those obtained with cubic polynomials

5 Conclusion

The shape gradient of a shape-differentiable PDE-constrained shape functional is an element of the dual space of \(C^1(\mathbb {R}^d;\mathbb {R}^d)\), and it can be expressed either as a volume integral or as a boundary integral. The theorems in Sect. 3 and the numerical experiments in Sect. 4 confirm that, when the finite element method is used, it is advisable to evaluate the shape gradient through volume integrals.

This observation might be relevant for shape optimization because, in the words of M. Berggren, “the sensitivity information - directional derivatives of objective functions and constraints - needs to be very accurately computed in order for the optimization algorithms to fully converge” [5]. However, shape optimization techniques usually rely on representatives of the shape gradient as functions on the boundary. If volume-based formulas are used, obtaining such representatives requires an extension of boundary deformations into the interior of the domain. It remains to be seen whether the superiority of volume-based formulas persists under these conditions.