Abstract
The rigorous justification of a numerical scheme for approximating a partial differential equation depends critically on the analytical properties of the problem and its solutions. If regularity and stability results are available, then error estimates can often be proved. Otherwise, weaker concepts, such as the weak accumulation of approximations at exact solutions, have to be employed. Discretizations of differential equations typically lead to nonlinear systems of equations; the convergence of iterative algorithms for their practical solution can only be guaranteed if the method respects the particular features of the underlying problem. In this chapter various techniques to prove convergence of discretizations and iterative algorithms are introduced, analyzed, and applied to model problems.
1 Convergence of Minimizers
We consider an abstract finite-dimensional minimization problem that seeks a minimizing function \(u_h\in \fancyscript{A}_h\) for a functional
where the indices \(h\) in \(\fancyscript{A}_h\) and \(W_h\) refer to discretized versions of given counterparts in the infinite-dimensional variational problem for minimizing
in the set of functions \(u\in \fancyscript{A}\). We will often refer to the infinite-dimensional problem as the continuous problem, but this does not imply a continuity property of the functional or its integrand. The finite-dimensional problems will also be referred to as discretized problems. We recall that it is sufficient for the existence of discrete solutions to have coercivity and lower semicontinuity of \(I_h\), while in the continuous situation, coercivity and the strictly stronger notion of weak lower semicontinuity of \(I\) are required. We discuss in this section the variational convergence of minimization problems and adopt concepts described in the textbook [5].
1.1 Failure of Convergence
A natural question to address is whether a family of discrete solutions \((u_h)_{h>0}\) converges to a minimizer \(u\in \fancyscript{A}\) for \(I\) with respect to some topology. Obviously, this requires the existence of a minimizer \(u\in \fancyscript{A}\) for \(I\), and convergence of the entire sequence of approximations requires uniqueness of the continuous solution or a certain selection principle contained in the discrete problems. Surprisingly, convergence of discrete solutions may fail entirely even if a solution of the continuous problem exists, the discretization is conforming in the sense that \(\fancyscript{A}_h \subset \fancyscript{A}\) and \(W_h= W\), and the family \((\fancyscript{A}_h)_{h>0}\) is dense in \(\fancyscript{A}\).
Example 4.1
(Lavrentiev phenomenon [9]) Let \(\fancyscript{A}\) be the set of all functions \(v \in W^{1,1}(0,1)\) satisfying \(v(0)=0\) and \(v(1)=1\) and consider
For \(h>0\) let \(\fancyscript{T}_h\) be a triangulation of \((0,1)\), and define \(\fancyscript{A}_h = \fancyscript{A}\cap \fancyscript{S}^1(\fancyscript{T}_h)\). Then the function \(u(x)=x^{1/3}\) is a minimizer for \(I\) in \(\fancyscript{A}\), but for every \(h>0\), we have
In particular, the discrete minimal energies cannot converge to the right value. The reason for this discrepancy is the incompatibility of the growth of the integrand of \(I\) and the exponent of the employed Sobolev space in the definition of \(\fancyscript{A}\).
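The failure can be observed numerically. The following sketch assumes the classical Manià integrand \(I(v) = \int _0^1 (v^3-x)^2 (v')^6\,{\mathrm d}x\), for which \(u(x)=x^{1/3}\) is a minimizer with \(I(u)=0\); since the display formula of the example is not reproduced in this excerpt, this particular integrand is an assumption. The energies of the nodal interpolants of \(u\) do not approach \(I(u)=0\) but blow up as \(h\rightarrow 0\):

```python
import numpy as np

# Mania's example (assumed integrand, consistent with the minimizer x^(1/3)):
# I(v) = int_0^1 (v(x)^3 - x)^2 v'(x)^6 dx, with I(x^(1/3)) = 0.
def mania_energy(nodes, vals, nq=50):
    """Energy of the piecewise affine function with the given nodal values."""
    E = 0.0
    for a, b, va, vb in zip(nodes[:-1], nodes[1:], vals[:-1], vals[1:]):
        s = (vb - va) / (b - a)                        # constant slope on the element
        xm = a + (np.arange(nq) + 0.5) * (b - a) / nq  # midpoint quadrature points
        v = va + s * (xm - a)
        E += np.sum((v**3 - xm)**2 * s**6) * (b - a) / nq
    return E

# Energies of the nodal interpolants of u(x) = x^(1/3) grow as h = 1/n -> 0,
# although the interpolants converge to u in W^{1,1}(0,1).
for n in (10, 100, 1000):
    x = np.linspace(0.0, 1.0, n + 1)
    print(n, mania_energy(x, x**(1 / 3)))
```

The first element \((0,h)\) alone contributes an energy proportional to \(h^{-1}\), which explains the observed growth.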
The example shows that even the seemingly simple notion of convergence
for \(h\rightarrow 0\) requires stronger arguments than just the density of the approximation spaces. Once convergence is understood, a natural question to investigate is whether a rate of convergence can be proved, i.e., whether there exists \(\alpha >0\) with
Even if this is the case, it is not guaranteed that discrete solutions \(u_h\in \fancyscript{A}_h\) converge to a minimizer \(u\in \fancyscript{A}\) of \(I\).
Example 4.2
(Lack of weak lower semicontinuity) Set \(\fancyscript{A}= W^{1,4}(0,1)\) and let
For \(h>0\) let \(\fancyscript{T}_h\) be a triangulation of \((0,1)\) of maximal mesh-size \(h\) and define \(\fancyscript{A}_h = \fancyscript{A}\cap \fancyscript{S}^1(\fancyscript{T}_h)\). Then \(\inf _{u\in \fancyscript{A}}I(u) = 0\) and
and any weakly convergent sequence of discrete minimizers \((u_h)_{h>0}\) satisfies \(u_h \rightharpoonup 0\) in \(W^{1,4}(\varOmega )\) as \(h\rightarrow 0\). Due to the nonconvexity of the integrand, the weak limit \(u = 0\) is not a minimizer for \(I\), i.e., \(0 = \inf _{v\in \fancyscript{A}}I(v) < 1 = I(0)\).
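This behavior can be reproduced with a short computation. The integrand is assumed here to be of Bolza type, \(I(u)=\int _0^1 ((u')^2-1)^2 + u^2 \,{\mathrm d}x\), which is consistent with the stated values \(\inf I = 0\) and \(I(0)=1\) but is not reproduced in the excerpt. Sawtooth functions with slopes \(\pm 1\) make the gradient term vanish exactly, so their energies tend to zero while the energy of their weak limit \(u=0\) equals one:

```python
import numpy as np

# Assumed Bolza-type integrand: I(u) = int_0^1 ((u')^2 - 1)^2 + u^2 dx.
def sawtooth_energy(n, nq=200):
    """I(u_n) for the sawtooth with n teeth, slopes +-1, amplitude 1/(2n)."""
    h = 1.0 / n
    xm = (np.arange(n * nq) + 0.5) / (n * nq)  # midpoint quadrature on (0,1)
    t = xm % h                                 # local coordinate on each tooth
    u = np.minimum(t, h - t)                   # slopes are +-1 a.e., so the
    return np.mean(u**2)                       # term ((u')^2 - 1)^2 vanishes

for n in (1, 4, 16, 64):
    print(n, sawtooth_energy(n))  # ~ 1/(12 n^2) -> 0, while I(0) = 1
```

The minimizing sequence oscillates with ever finer teeth; its weak limit does not inherit the small energies, which is exactly the lack of weak lower semicontinuity.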
1.2 \(\varGamma \)-Convergence of Discretizations
The concept of \(\varGamma \)-convergence provides a concise framework to analyze convergence of a sequence of energy functionals and its minimizers. In an abstract form we consider a sequence of discrete minimization problems:
Here, every space \(X_h\) is assumed to be a subspace of a Banach space \(X\) and \(I_h\) is allowed to attain the value \(+\infty \), so that constraints contained in \(\fancyscript{A}_h\subset X_h\) can be incorporated in \(I_h\). We formally extend the discrete problems to \(X\) by setting
In the following, \(h>0\) stands for a sequence of positive real numbers that accumulate at zero.
Definition 4.1
Let \(X\) be a Banach space, \(I:X\rightarrow \mathbb {R}\cup \{+\infty \}\), and let \((I_h)_{h>0}\) be a sequence of functionals \(I_h:X \rightarrow \mathbb {R}\cup \{+\infty \}\). We say that the sequence \((I_h)_{h>0}\) \(\varGamma \)-converges to \(I\) as \(h\rightarrow 0\), denoted by \(I_h \rightarrow ^{\varGamma } I\), with respect to a given topology \(\omega \) on \(X\) if the following conditions hold:
- (a) For every sequence \((u_h)_{h>0} \subset X\) with \(u_h \rightarrow ^{\omega } u\) for some \(u\in X\), we have that \(\liminf _{h\rightarrow 0} I_h(u_h) \ge I(u)\).
- (b) For every \(u\in X\) there exists a sequence \((u_h)_{h>0}\subset X\) with \(u_h \rightarrow ^\omega u\) and \(I_h(u_h) \rightarrow I(u)\) as \(h\rightarrow 0\).
Remark 4.1
The first condition is called liminf-inequality and implies that \(I\) is a lower bound for the sequence \((I_h)_{h>0}\) in the limit \(h\rightarrow 0\). The second condition guarantees that the lower bound is attained, and the involved sequence is called a recovery sequence.
Unless otherwise stated, we consider the weak topology \(\omega \) on \(X\). For conforming discretizations of well-posed minimization problems, i.e., if \(I_h(u_h)=I(u_h)\) for all \(u_h\in X_h\), a \(\varGamma \)-convergence result can be proved under moderate conditions.
Theorem 4.1
(Conforming discretizations) Assume that \(I_h(u_h)=I(u_h)\) for \(u_h\in X_h\) and \(h>0\) and that the spaces \((X_h)_{h>0}\) are dense in \(X\) with respect to the strong topology of \(X\). If \(I\) is weakly lower semicontinuous and strongly continuous, then we have \(I_h \rightarrow ^\varGamma I\) as \(h\rightarrow 0\) with respect to weak convergence in \(X\).
Proof
Let \((u_h)_{h>0}\subset X\) and \(u\in X\) be such that \(u_h \rightharpoonup u\) as \(h\rightarrow 0\). To prove the liminf-inequality, we note that \(I_h(u_h)\ge I(u_h)\) and thus the weak lower semicontinuity of \(I\) implies \(\liminf _{h\rightarrow 0} I_h(u_h) \ge \liminf _{h\rightarrow 0} I(u_h) \ge I(u)\). To prove that \(I(u)\) is attained for every \(u\in X\), let \((u_h)_{h>0}\) be a sequence with \(u_h\in X_h\) for every \(h>0\) and \(u_h\rightarrow u\) in \(X\). The strong continuity of \(I\) and \(I_h(u_h)=I(u_h)\) imply that \(I(u) = \lim _{h\rightarrow 0}I_h(u_h)\). \(\square \)
The definition of \(\varGamma \)-convergence has remarkable consequences.
Proposition 4.1
(\(\varGamma \)-Convergence)
(i) If \(I_h \rightarrow ^\varGamma I\) as \(h\rightarrow 0\), then \(I\) is weakly lower semicontinuous on \(X\).
(ii) If \(I_h \rightarrow ^\varGamma I\) as \(h\rightarrow 0\) and for every \(h>0\) there exists \(u_h\in X\) such that \(I_h(u_h) \le \inf _{v_h\in X} I_h(v_h) + \varepsilon _h\) with \(\varepsilon _h \rightarrow 0\) as \(h\rightarrow 0\) and \(u_h \rightarrow ^\omega u\) for some \(u\in X\), then \(I_h(u_h) \rightarrow I(u)\) and \(u\) is a minimizer for \(I\).
(iii) If \(I_h \rightarrow ^\varGamma I\) and \(G\) is \(\omega \)-continuous on \(X\), then \(I_h+G\rightarrow ^\varGamma I+G\).
Proof
(i) Let \((u_j)_{j\in \mathbb {N}}\subset X\) be a sequence with \(u_j \rightarrow ^\omega u\) in \(X\) as \(j\rightarrow \infty \). For every \(j\in \mathbb {N}\) there exists a sequence \((u_j^h)_{h>0}\) such that \(u_j^h \rightarrow ^\omega u_j\) as \(h\rightarrow 0\) and \(I_h(u_j^h) \rightarrow I(u_j)\). For every \(j\in \mathbb {N}\) we may thus choose \(h_j>0\), such that \(|I(u_j)-I_{h_j}(u_j^{h_j})|\le 1/j\) and \(u_j^{h_j} \rightarrow ^\omega u\) as \(j\rightarrow \infty \). It follows that
This proves the first statement.
(ii) If \(u_h \rightarrow ^\omega u\), then by condition (a) we have \(I(u) \le \liminf _{h\rightarrow 0} I_h(u_h)\). Moreover, due to (b) for every \(v\in X\), there exists \((v_h)_{h>0} \subset X\) with \(v_h \rightarrow ^\omega v\) and \(I_h(v_h) \rightarrow I(v)\) as \(h\rightarrow 0\). Therefore, \(I_h(u_h) \le I_h(v_h) + \varepsilon _h\) and
i.e., \(u\) is a minimizer for \(I\).
(iii) If \(G\) is \(\omega \)-continuous, then \(G(u_h)\rightarrow G(u)\) whenever \(u_h\rightarrow ^\omega u\) in \(X\) and the \(\varGamma \)-convergence of \(I_h+G\) to \(I+G\) follows directly from \(I_h \rightarrow ^\varGamma I\). \(\square \)
1.3 Examples of \(\varGamma \)-Convergent Discretizations
We discuss some examples of \(\varGamma \)-convergence. As above, we always extend a functional \(I_h\) defined on a subspace \(X_h\subset X\) by the value \(+\infty \) to the whole space \(X\).
Example 4.3
(Poisson problem) Let \(X = H^1_\mathrm{D}(\varOmega )\) and \(X_h = \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) for a regular family of triangulations \((\fancyscript{T}_h)_{h>0}\) of \(\varOmega \). For \(f\in L^2(\varOmega )\) and \(g\in L^2({\varGamma _\mathrm{N}})\), let
and let \(I_h:H^1_\mathrm{D}(\varOmega )\rightarrow \mathbb {R}\cup \{+\infty \}\) coincide with \(I\) on \(\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\). Since the Dirichlet energy is weakly lower semicontinuous and strongly continuous, the linear lower-order terms are weakly continuous on \(H^1_\mathrm{D}(\varOmega )\), and since the finite element spaces are dense in \(H^1_\mathrm{D}(\varOmega )\), we verify that \(I_h \rightarrow ^\varGamma I\) as \(h\rightarrow 0\). Nonhomogeneous Dirichlet conditions can be included by considering the decomposition \(u = \widetilde{u}+ \widetilde{u}_\mathrm{D}\) with \(\widetilde{u}\in H^1_\mathrm{D}(\varOmega )\). For minimizers \(u\in H^2(\varOmega )\cap H^1_\mathrm{D}(\varOmega )\) of \(I\) and \(u_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) of \(I_h\), we have
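The convergence of the discrete minimal energies can be illustrated in one space dimension. The sketch below uses P1 elements on \((0,1)\) with homogeneous Dirichlet conditions and a mass-lumped load vector (a simplifying assumption, not the scheme prescribed above); for \(f(x)=\pi ^2\sin (\pi x)\), the exact minimizer is \(u(x)=\sin (\pi x)\) with minimal energy \(I(u)=-\pi ^2/4\):

```python
import numpy as np

def p1_minimizer_energy(n, nq=2000):
    """Minimize I(v) = int_0^1 (1/2)|v'|^2 - f v dx over S^1_0 on a uniform mesh."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    f = lambda t: np.pi**2 * np.sin(np.pi * t)
    # tridiagonal P1 stiffness matrix for the interior nodes
    A = (np.diag(2 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h
    b = h * f(x[1:-1])                    # mass-lumped load vector (assumption)
    u = np.zeros(n + 1)
    u[1:-1] = np.linalg.solve(A, b)
    # evaluate the energy of the discrete minimizer by fine midpoint quadrature
    xq = (np.arange(nq) + 0.5) / nq
    du = np.diff(u) / h
    duq = du[np.minimum((xq * n).astype(int), n - 1)]
    vq = np.interp(xq, x, u)
    return np.mean(0.5 * duq**2 - f(xq) * vq)

for n in (4, 8, 16, 32):
    print(n, p1_minimizer_energy(n), -np.pi**2 / 4)  # energies approach I(u)
```

The discrete energies decrease toward \(-\pi ^2/4\) as the mesh is refined, in line with the \(\varGamma \)-convergence statement.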
A constant sequence of functionals can have a different \(\varGamma \)-limit.
Example 4.4
(Relaxation) For the sequence of functionals defined through \(X=W^{1,4}(0,1)\),
subspaces \(X_h = \fancyscript{S}^1(\fancyscript{T}_h)\), and \(I_h=I\) on \(X_h\), we have that \(I_h \rightarrow ^\varGamma I^{**}\) in \(W^{1,4}(0,1)\) with the convexified functional
where \(s_+ = \max \{s,0\}\) for \(s\in \mathbb {R}\). Since the integrand of \(I^{**}\) is convex, the functional is weakly lower semicontinuous. Using that \(I_h(u_h) =I(u_h)\ge I^{**}(u_h)\) for all \(h>0\), we deduce that \(\liminf _{h\rightarrow 0} I_h(u_h) \ge I^{**}(u)\) whenever \(u_h \rightharpoonup u\) in \(W^{1,4}(0,1)\). To prove that the lower bound is attained, we first consider the case that \(u\in W^{1,4}(\varOmega )\) is piecewise affine, i.e., \(u= u_H\in \fancyscript{S}^1(\fancyscript{T}_H)\) for some \(H>0\). For \(0<h<H\) we then construct a function \(u_h\) that nearly coincides with \(u_H\) on elements \(T_H\in \fancyscript{T}_H\) for which \(|u_H'|_{T_H}|\ge 1\). For elements with \(|u_H'|_{T_H}|\le 1\) we use gradients \(u_h' \in \{\pm 1\}\) on \(T_H\) in such a way that \(u_h\) and \(u_H\) nearly coincide at the endpoints of \(T_H\) and differ by at most \(h\) in the interior. Then \(I(u_h) \approx I^{**}(u_H)\) and \(I(u_h)\rightarrow I^{**}(u_H)\) as \(h\rightarrow 0\). The construction is depicted in Fig. 4.1. The assertion for general \(u\in W^{1,4}(\varOmega )\) follows from an approximation result and the strong continuity of \(I\).
A typical application of conforming discretizations of well-posed minimization problems occurs in simulating hyperelastic materials.
Example 4.5
(Hyperelasticity) Let \(\fancyscript{A}= \{y\in W^{1,p}(\varOmega ;\mathbb {R}^d): y|_{\varGamma _\mathrm{D}}= \widetilde{y}_\mathrm{D}|_{\varGamma _\mathrm{D}}\}\) for \(1\le p<\infty \) and \(\widetilde{y}_\mathrm{D}\in W^{1,p}(\varOmega ;\mathbb {R}^d)\). Assume that \(W:\mathbb {R}^{d\times d}\rightarrow \mathbb {R}\) is continuous and quasiconvex with
Then for \(f\in L^{p'}(\varOmega ;\mathbb {R}^d)\) and \(g\in L^{p'}({\varGamma _\mathrm{N}};\mathbb {R}^d)\), the functional
is weakly lower semicontinuous and coercive on \(W^{1,p}(\varOmega ;\mathbb {R}^d)\). Moreover, if the sequence \((y_j)_{j\in \mathbb {N}}\subset W^{1,p}(\varOmega ;\mathbb {R}^d)\) converges strongly to \(y\in W^{1,p}(\varOmega ;\mathbb {R}^d)\), then we have \(\nabla y_{j_k} (x) \rightarrow \nabla y(x)\) for almost every \(x\in \varOmega \) for a subsequence \((y_{j_k})_{k\in \mathbb {N}}\), and the generalized dominated convergence theorem implies
i.e., up to subsequences \(I\) is strongly continuous and this is sufficient to establish \(\varGamma \)-convergence. For piecewise affine boundary data \(y_\mathrm{D}\), we have that \(\fancyscript{A}_h = \fancyscript{A}\cap \fancyscript{S}^1(\fancyscript{T}_h)^d\) is nonempty and the density of finite element spaces implies \(I_h \rightarrow ^\varGamma I\) for conforming discretizations. More generally, it suffices to consider convergent approximations \(\widetilde{y}_{\mathrm{D},h}\) of \(\widetilde{y}_\mathrm{D}\).
The abstract convergence theory allows us to include nonlinear constraints.
Example 4.6
(Harmonic maps) Assume that \(u_\mathrm{D}\in C({\varGamma _\mathrm{D}};\mathbb {R}^m)\) is such that
is nonempty and for a triangulation \(\fancyscript{T}_h\) of \(\varOmega \) with nodes \(\fancyscript{N}_h\), set
i.e., \(\fancyscript{A}_h\not \subset \fancyscript{A}\). We then consider the minimization of the Dirichlet energy \(I\) on \(\fancyscript{A}_h\) and \(\fancyscript{A}\), which defines minimization problems with functionals \(I_h\) and \(I\) on \(H^1(\varOmega ;\mathbb {R}^m)\), respectively. To show that \(I_h\rightarrow ^\varGamma I\) in \(H^1(\varOmega ;\mathbb {R}^m)\), we note that the liminf-inequality follows from the weak lower semicontinuity of \(I\), together with the fact that if \(u_h \rightharpoonup u\) in \(W^{1,2}(\varOmega ;\mathbb {R}^m)\) with \(u_h \in \fancyscript{A}_h\) for every \(h>0\), then \(u\in \fancyscript{A}\). The latter implication follows from a nodal interpolation result, together with elementwise inverse estimates, i.e.,
Therefore, \(|u_{h'}(x)| \rightarrow 1\) for almost every \(x\in \varOmega \) and a subsequence \(h'>0\) so that \(|u(x)|=1\) for almost every \(x\in \varOmega \). We assume that \(u_\mathrm{D}\) is sufficiently regular, so that a similar argument shows \(u|_{\varGamma _\mathrm{D}}= u_\mathrm{D}\). To prove the attainment of \(I\), we note that due to the density of smooth unit-length vector fields in \(\fancyscript{A}\), we may assume \(u \in \fancyscript{A}\cap H^2(\varOmega ;\mathbb {R}^m)\) and define \(u_h = \fancyscript{I}_h u \in \fancyscript{A}_h\). Then \(u_h \rightarrow u\) in \(H^1(\varOmega ;\mathbb {R}^m)\) and \(I_h(u_h)\rightarrow I(u)\) as \(h\rightarrow 0\).
Remark 4.2
In general, smooth constrained vector fields are not dense in sets of weakly differentiable constrained vector fields, cf., e.g., [18].
For practical purposes it is often desirable to modify a given functional.
Example 4.7
(Total variation minimization) For \(X=W^{1,1}(\varOmega )\) we consider
and given a family of triangulations \((\fancyscript{T}_h)_{h>0}\) of \(\varOmega \) and \(u_h\in \fancyscript{S}^1(\fancyscript{T}_h)\), we define for \(\beta >0\) the regularized functionals
If \(u_h\rightharpoonup u\) in \(W^{1,1}(\varOmega )\), then the liminf-inequality follows from the weak lower semicontinuity of \(I\) on \(W^{1,1}(\varOmega )\) and the fact that \(I_h(u_h)\ge I(u_h)\) for every \(h>0\). To verify that \(I(u)\) is attained for every \(u\in W^{1,1}(\varOmega )\) in the limit \(h\rightarrow 0\), we note that the density of finite element spaces in \(W^{1,1}(\varOmega )\) allows us to consider a sequence \((u_h)_{h>0}\subset W^{1,1}(\varOmega )\) with \(u_h\in \fancyscript{S}^1(\fancyscript{T}_h)\) for every \(h>0\) and \(u_h\rightarrow u \in W^{1,1}(\varOmega )\) as \(h\rightarrow 0\). The estimate \((a^2+b^2)^{1/2} \le |a|+|b|\) implies that
and for a subsequence we have \(((h')^\alpha +|\nabla u_{h'}|^2)^{1/2} \rightarrow |\nabla u|\) almost everywhere in \(\varOmega \). The generalized dominated convergence theorem implies that \(I_{h'}(u_{h'})\rightarrow I(u)\) as \(h'\rightarrow 0\). With Proposition 4.1, this also implies the \(\varGamma \)-convergence of discretizations of
for \(g\in L^2(\varOmega )\). Due to the lack of reflexivity of \(W^{1,1}(\varOmega )\), this is not sufficient to deduce the existence of minimizers for \(I\), i.e., we cannot deduce the existence of weak limits of subsequences of a bounded sequence. For this, the larger space \(BV(\varOmega )\cap L^2(\varOmega )\) has to be considered. A corresponding \(\varGamma \)-convergence result follows analogously with the density of \(W^{1,1}(\varOmega )\) in \(BV(\varOmega )\) with respect to an appropriate notion of convergence.
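The effect of the regularization can be quantified in a simple one-dimensional computation. The exponent \(\alpha =1\) below is an arbitrary choice of the regularization parameter; by the estimate \((h^\alpha + |\nabla u|^2)^{1/2} \le h^{\alpha /2} + |\nabla u|\), the regularized energy exceeds the total variation by at most \(h^{\alpha /2}|\varOmega |\):

```python
import numpy as np

def tv_and_regularized(n, alpha=1.0):
    """Total variation and regularized energy of the P1 interpolant of |x - 1/2|."""
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    u = np.abs(x - 0.5)
    s = np.diff(u) / h                            # elementwise gradients (+-1)
    tv = np.sum(np.abs(s)) * h                    # I(u_h) = int |u'| dx
    reg = np.sum(np.sqrt(h**alpha + s**2)) * h    # regularized energy I_h(u_h)
    return tv, reg

for n in (4, 16, 64, 256):
    tv, reg = tv_and_regularized(n)
    print(n, tv, reg, reg - tv)   # gap bounded by h^(alpha/2), vanishes as h -> 0
```

The gap between the two energies vanishes as \(h\rightarrow 0\), which is the quantitative content of the recovery-sequence construction above.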
1.4 Error Control for Strongly Convex Problems
For Banach spaces \(X\) and \(Y\), a bounded linear operator \(\varLambda : X\rightarrow Y\), and convex, lower-semicontinuous, proper functionals \(F: X\rightarrow \mathbb {R}\cup \{+\infty \}\) and \(G:Y\rightarrow \mathbb {R}\cup \{+\infty \}\), we consider the problem of finding \(u\in X\) with
The Fenchel conjugates \(F^*:X'\rightarrow \mathbb {R}\cup \{+\infty \}\) and \(G^*:Y'\rightarrow \mathbb {R}\cup \{+\infty \}\) are the convex, lower-semicontinuous, proper functionals defined by
for \(w\in X'\) and \(q\in Y'\), respectively. We assume that \(Y\) is reflexive, so that \(G=G^{**}\). Then, the property of the formal adjoint operator \(\varLambda ':Y'\rightarrow X'\), that \(\langle \varLambda v,q\rangle = \langle v,\varLambda ' q\rangle \), and the general relation \(\inf _v \sup _q H(v,q) \ge \sup _q \inf _v H(v,q)\) for an arbitrary function \(H:X\times Y'\rightarrow \mathbb {R}\cup \{+\infty \}\) yield
This motivates considering the dual problem which consists in finding \(p\in Y'\) with
We assume that \(F\) or \(G\) is strongly convex, i.e., that there exist \(\alpha _F, \alpha _G \ge 0\) with \(\max \{\alpha _F,\alpha _G\}>0\) such that for all \(q_1,q_2\in Y\) and \(v_1,v_2\in X\), we have
Note that by convexity, the estimates always hold with \(\alpha _G=\alpha _F= 0\). The primal and dual optimization problems are related by the weak complementarity principle
We say that strong duality applies if equality holds. Our final ingredient for the error estimate is a characterization of the optimality of the solution of the primal problem. For some \(\alpha _I\ge 0\) and all \(w\in \partial I(u)\), we have that
and \(u\) is optimal if and only if \(0\in \partial I(u)\). We assume in the following that \(\alpha _F>0\) or \(\alpha _I>0\), so that \(I\) has a unique minimizer \(u\in X\).
Theorem 4.2
(Error control [16]) Assume that \(\max \{\alpha _F,\alpha _G,\alpha _I\}>0\) and let \(u\in X\) be the unique minimizer for \(I\).
(i) For a minimizer \(u_h \in X_h\) for \(I\) restricted to a subspace \(X_h\subset X\), we have the a priori error estimate
(ii) For an arbitrary approximation \(\widetilde{u}_h \in X\) of \(u\), we have the a posteriori error estimate
Proof
The convexity estimates imply that
The optimality of \(u\) shows that we have
It follows that
If \(u_h\in X_h\) is minimal in \(X_h\), then the identity \(I(u_h) = \inf _{w_h\in X_h} I(w_h)\) implies the a priori estimate. The weak complementarity principle \(I(u) \ge D(q)\) yields the a posteriori estimate. \(\square \)
Remarks 4.3
(i) If strong duality holds, i.e., if \(I(u)=D(p)\), then the estimate of the theorem is sharp in the sense that the right-hand side vanishes if \(v=u\) and \(q\) solves the dual problem.
(ii) Sufficient conditions for strong duality are provided by von Neumann’s minimax theorem, e.g., that \(F\) and \(G^*\) are convex, lower semicontinuous, and coercive.
Example 4.8
For the Poisson problem \(-\Delta u = f\) in \(\varOmega \), \(u|_{\partial \varOmega }= 0\), we have \(X= H^1_0(\varOmega )\), \(Y=L^2(\varOmega ;\mathbb {R}^d)\), \(\varLambda = \nabla \), \(G(\varLambda v) = (1/2) \int _\varOmega |\nabla v|^2 \,{\mathrm d}x\), and \(F(v) = -\int _\varOmega f v\,{\mathrm d}x\). It follows that \(F^*(w) = I_{\{-f\}}(w)\), \(G^*(q) = (1/2) \int _\varOmega |q|^2 \,{\mathrm d}x\),
We thus have
so that \(\alpha _G= 1/8\) and
i.e., \(\alpha _I= 1/2\). Moreover, we have \(\alpha _F = 0\).
(i) Incorporating the definition of the exact weak solution, the abstract a priori estimate of Theorem 4.2 provides the bound
which implies the best-approximation property
(ii) Letting \(\eta ^2(v,q)\) denote the right-hand side of the a posteriori error estimate of Theorem 4.2, we have
provided that \(-{{\mathrm{div}}}\,q = f\). The theorem thus implies
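For this example, the relation between the primal-dual gap and the error can be checked directly. In one dimension with \(f=1\), the exact solution is \(u(x)=x(1-x)/2\) with dual solution \(p=u'\), and strong duality yields the identity \(I(v)-D(p) = \frac{1}{2}\Vert (v-u)'\Vert ^2\) for any trial function \(v\); the trial function below is a deliberately inaccurate guess (an illustrative choice, not from the text):

```python
import numpy as np

# -u'' = 1 on (0,1), u(0) = u(1) = 0: u(x) = x(1-x)/2, dual p = u' = 1/2 - x,
# which satisfies the constraint -p' = 1 of the a posteriori estimate.
nq = 10000
x = (np.arange(nq) + 0.5) / nq            # midpoint quadrature on (0,1)
v = x * (1 - x)                           # trial function (wrong by a factor 2)
dv = 1 - 2 * x                            # v'
p = 0.5 - x                               # exact dual variable
I_v = np.mean(0.5 * dv**2 - v)            # primal energy I(v), f = 1
D_p = -np.mean(0.5 * p**2)                # dual energy D(p)
gap = I_v - D_p                           # primal-dual gap
half_err = 0.5 * np.mean((dv - p)**2)     # (1/2)||(v - u)'||^2
print(gap, half_err)                      # both are ~ 1/24
```

The computed gap coincides with half the squared \(H^1\) seminorm error, confirming that the duality-based bound is sharp for this model problem.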
2 Approximation of Equilibrium Points
The Euler–Lagrange equations related to a minimization problem typically seek a function \(u\in X\) such that
for all \(v\in X\) with a possibly nonlinear operator \(F: X\rightarrow X'\) and a linear functional \(\ell \in X'\). Various other mathematical problems that may not be related to a minimization problem can also be formulated in this abstract form. A natural discretization employs subspaces \(X_h\subset X\) and seeks \(u_h\in X_h\) with
for all \(v_h\in X_h\). Here, \(F_h:X_h \rightarrow X_h'\) and \(\ell _h\in X_h'\) are approximations of \(F\) and \(\ell \) that result from a discretization, e.g., via numerical integration. The important question to address is whether numerical solutions \((u_h)_{h>0}\) for a sequence of finite-dimensional subspaces \(X_h\) converge in an appropriate sense to a solution of the infinite-dimensional problem. We assume that the finite-dimensional space \(X_h\) is equipped with the norm of \(X\). The corresponding dual spaces \(X_h'\) and \(X'\) are related by the inclusion \(X'|_{X_h}\subset X_h'\). Topics related to the contents of this section can be found in the textbooks [3, 11].
2.1 Failure of Convergence
The following examples, taken from [6], show that unjustified regularity assumptions can lead to the failure of convergence to the correct object.
Example 4.9
(Maxwell’s equations) For \(\varOmega \subset \mathbb {R}^2\) set \(X=H_0({{\mathrm{curl}}};\varOmega )\cap H({{\mathrm{div}}};\varOmega )\), where
with \({{\mathrm{curl}}}\,v = \partial _1 v_2 - \partial _2 v_1\) for \(v=(v_1,v_2)\) and \(t:\partial \varOmega \rightarrow \mathbb {R}^2\) a unit tangent. For \(f\in L^2(\varOmega ;\mathbb {R}^2)\), consider the problem of finding \(u\in X\) such that
for all \(v\in X\). The existence and uniqueness of a solution follows from the Lax–Milgram lemma. A discretization of this problem is obtained by choosing \(X_h = \fancyscript{S}^1(\fancyscript{T}_h)^2 \cap X\) and computing \(u_h\in X_h\) such that
for all \(v_h\in X_h\). This defines a convergent numerical scheme if \(\varOmega \) is convex. If \(\varOmega \) is nonconvex, then \(H^1(\varOmega ;\mathbb {R}^2)\cap X\) is a closed proper subspace of \(X\), cf. [8] for details, and convergence \(u_h\rightarrow u\) as \(h\rightarrow 0\) fails in general.
A similar effect occurs for higher-order problems.
Example 4.10
(Biharmonic equation) The biharmonic equation
formally corresponds to the weak formulation that seeks \(u\in H^2(\varOmega )\cap H^1_0(\varOmega )\) with
for all \(v\in H^2(\varOmega )\cap H^1_0(\varOmega )\). We denote the unique weak solution of the variational formulation by \(u = (\Delta ^2)^{-1} f\). A natural discretization of the problem is based on an operator splitting which is obtained by introducing \(z= -\Delta u\) and solving the Poisson problems
We have \(z= (-\Delta )^{-1} f \) and \(u = (-\Delta )^{-1}z = (-\Delta )^{-2}f\). Unless \(\varOmega \) is convex, so that \(\Delta u\in H^1_0(\varOmega )\), we do not have \((\Delta ^2)^{-1} f = (-\Delta )^{-2} f\), and convergence of related numerical methods will fail in general.
Failure of convergence may also be related to the lack of uniqueness of a solution as in the case of degenerately monotone problems.
Example 4.11
(Degenerate monotonicity) Let \(\sigma (F) = DW^{**}(F)\) for \(F\in \mathbb {R}^d\) with \(W^{**}(F) = (|F|^2-1)_+^2\). Then there are infinitely many functions \(u\in W^{1,4}_0(\varOmega )\) satisfying \(F(u)[v] = \int _\varOmega \sigma (\nabla u) \cdot \nabla v\,{\mathrm d}x =0\) for all \(v\in W^{1,4}_0(\varOmega )\).
2.2 Abstract Error Estimates
We sketch below the classical concept that consistency and stability imply the convergence of numerical approximations, provided that appropriate regularity results are available. Dual to this is an approach that leads to computable upper bounds for the approximation error and which avoids regularity assumptions entirely.
Theorem 4.3
(Abstract a priori error estimate) Let \(u\in X\) satisfy \(F(u)=\ell \) and assume that for an interpolant \(i_h u \in X_h\) and a consistency functional \(\fancyscript{C}_h(u) \in X_h'\), we have
for all \(v_h\in X_h\). Assume that we have discrete stability in the sense that for all \(z_h\in X_h\) and \(b_h\in X_h'\), the implication
holds. Then, if \(F_h :X_h\rightarrow X_h'\) is linear, there exists a unique solution \(u_h\in X_h\) with
Proof
Discrete stability implies that \(F_h:X_h\rightarrow X_h'\) is a bijection and hence there exists a unique \(u_h\in X_h\) with \(F_h(u_h)=\ell _h\). Since \(F_h(i_hu-u_h) = F_h(i_h u)-F_h(u_h) = F_h(i_h u)-\ell _h = \fancyscript{C}_h(u)\) we deduce the estimate. \(\square \)
Remark 4.4
We say that a discretization is consistent of order \(\beta \ge 0\), given the regularity \(u\in Z\subset X\), if \(\Vert \fancyscript{C}_h(u)\Vert _{X_h'}\le c h^\beta \). Together with discrete stability, this implies convergence of approximations with rate \(\beta \).
A similar abstract concept leads to a posteriori error estimates for many linear problems.
Theorem 4.4
(Abstract a posteriori error estimate) Let \(u_h\in X_h\) and define the residual \(\fancyscript{R}_h(u_h)\in X'\) through
for all \(v\in X\). Assume that we have the continuous stability result that for all \(z\in X\) and \(b\in X'\), the implication
holds. If \(u\in X\) satisfies \(F(u)=\ell \) and if \(F\) is linear, then \(u\) is unique with
Proof
The difference \(u-u_h\) satisfies \(F(u-u_h)[v] = \fancyscript{R}_h(u_h;v)\) for all \(v\in X\), and the stability result implies the error estimate and the uniqueness property. \(\square \)
Example 4.12
(Poisson problem) Let \(u\in H^1_\mathrm{D}(\varOmega )\) be the weak solution of \(-\Delta u = f\) in \(\varOmega \), \(u|_{\varGamma _\mathrm{D}}=0\), and \(\partial _\nu u|_{\varGamma _\mathrm{N}}= g\), i.e., we have \(F(u)=\ell \) with
The lowest-order finite element method seeks \(u_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) with \(F(u_h)[v_h]=\ell (v_h)\) for all \(v_h \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\).
(i) Inserting an interpolant \(i_h u \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) in the discrete formulation leads to
for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\). We have \(\Vert \fancyscript{C}_h(u)\Vert _{\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)'} \le c h\Vert D^2u \Vert \) if \(u\in H^2(\varOmega )\cap H^1_\mathrm{D}(\varOmega )\) and \(i_h u = \fancyscript{I}_h u\) is the nodal interpolant of \(u\). If \(z_h \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) and \(b_h \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)'\) are such that
for all \(v_h \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\), then the choice of \(v_h=z_h\) shows the discrete stability estimate \(\Vert \nabla z_h\Vert \le \Vert b_h\Vert _{\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)'}\). Therefore, Theorem 4.3 implies the error estimate
(ii) Let \(u_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) and define
for all \(v\in H^1_\mathrm{D}(\varOmega )\). Noting the stability estimate \(\Vert \nabla z\Vert \le \Vert b\Vert _{X'}\) for \(z\in H^1_\mathrm{D}(\varOmega )\) and \(b\in H^1_\mathrm{D}(\varOmega )'\) with
for all \(v\in H^1_\mathrm{D}(\varOmega )\), Theorem 4.4 implies the error estimate
If \(u_h\) satisfies \(F(u_h)[v_h]=0\) for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\), we have the Galerkin orthogonality \(F(u-u_h)[v_h]=0\) for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) and \(\Vert \fancyscript{R}_h (u_h)\Vert _{X'} \le c \eta (u_h)\) with a computable quantity \(\eta (u_h)\), cf. Theorem 3.6.
The concepts can be generalized to the class of strongly monotone operators.
Definition 4.2
The operator \(F:X\rightarrow X'\) is called strongly monotone if there exists an increasing bijection \(\chi :[0,\infty )\rightarrow [0,\infty )\) with
for all \(u,v\in X\).
We consider a conforming discretization of a strongly monotone problem in the following theorem.
Theorem 4.5
(Monotone problems) Assume that \(u\in X\) and \(u_h\in X_h\) satisfy
for all \(v\in X\) and \(v_h\in X_h\), respectively, and let \(\fancyscript{C}_h(u)\) and \(\fancyscript{R}_h(u_h)\) for an interpolation operator \(i_h\) be defined by
for all \(v_h\in X_h\) and \(v\in X\), respectively. Then we have the a priori and a posteriori error estimates
Proof
We have
and
Dividing by \(\Vert i_hu-u_h\Vert _X\) and \(\Vert u-u_h\Vert _X\), respectively, yields the estimates. \(\square \)
Example 4.13
(\(p\) -Laplacian) The \(p\)-Laplacian \(-{{\mathrm{div}}}(|\nabla u|^{p-2}\nabla u)\) is identified with the functional \(F:W^{1,p}_\mathrm{D}(\varOmega )\rightarrow W^{1,p}_\mathrm{D}(\varOmega )'\) defined by
for \(u,v\in W^{1,p}_\mathrm{D}(\varOmega )\). The functional \(F\) is the Fréchet derivative \(F=DI\) of
If \(p\ge 2\), then \(F\) is monotone with \(\chi (s) = \alpha s^{p-1}\) for all \(s\ge 0\) and some \(\alpha >0\). The functional is locally Lipschitz continuous in the sense that
for a constant \(M\in \mathbb {R}\) and \(u,v \in W^{1,p}_\mathrm{D}(\varOmega )\). This estimate implies the consistency of conforming discretizations, e.g., with \(\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\), and we obtain the error estimate
thus \(\Vert \nabla (u-u_h)\Vert _{L^p(\varOmega )} \le c h^{1/(p-1)}\) if \(u\in W^{2,p}(\varOmega )\cap W^{1,p}_\mathrm{D}(\varOmega )\).
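The strong monotonicity of the \(p\)-Laplacian nonlinearity \(\sigma (F)=|F|^{p-2}F\) can be tested empirically. The sketch samples random pairs of gradients for \(p=4\) and evaluates the monotonicity quotient; the lower bound \(2^{2-p}=1/4\), attained for \(b=-a\), is a standard estimate quoted here as an assumption:

```python
import numpy as np

# Check (sigma(a) - sigma(b)) . (a - b) >= alpha |a - b|^p for p = 4 on
# random pairs of gradient vectors a, b in R^2.
p = 4

def sigma(F):
    """The p-Laplacian flux |F|^(p-2) F, applied row-wise."""
    return (np.linalg.norm(F, axis=-1, keepdims=True) ** (p - 2)) * F

rng = np.random.default_rng(0)
a = rng.standard_normal((10000, 2))
b = rng.standard_normal((10000, 2))
lhs = np.sum((sigma(a) - sigma(b)) * (a - b), axis=1)
rhs = np.linalg.norm(a - b, axis=1) ** p
ratio = lhs / rhs
print(ratio.min())   # empirical lower bound for alpha, stays above 2^(2-p) = 1/4
```

The minimal observed quotient stays bounded away from zero, which is the discrete counterpart of \(\chi (s)=\alpha s^{p-1}\) in the error estimate above.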
If the operator \(F\) fails to be monotone but has a regular Fréchet derivative in the neighborhood of a solution, then a local error estimate follows from the implicit function theorem. For ease of presentation and without loss of generality, we consider the homogeneous problem \(F(u)=0\).
Theorem 4.6
(Local error estimate [10]) Suppose that \(F:X\rightarrow X'\) is continuous and \(u\in X\) satisfies \(F(u)=0\). Assume that there exist constants \(c_1,c_2,c_3,\varepsilon >0\) with \(c_2<c_1\) such that
for all \(v,w\in B_\varepsilon (u)\). Let \(i_h u\in X_h\) be an interpolant of \(u\) such that \(c_0 \Vert i_hu- u\Vert _X \le (c_1-c_2) \varepsilon \). Then there exists a unique \(u_h\in X_h \) with \(F(u_h)=0\) and \(\Vert u-u_h\Vert _X \le \varepsilon \).
Proof
The assumptions of the theorem imply that
A quantitative version of the implicit function theorem, cf. [2], implies the existence of a unique \(u_h\in X_h\) with the asserted properties. \(\square \)
Example 4.14
(Semilinear diffusion) The theorem implies error estimates for the approximation of the semilinear equation
provided that \(f'\) and a solution \(u\in H^1_0(\varOmega )\) are such that the operator \(-\Delta + f'(v) \mathrm{id}\) is invertible for all \(v\in B_\varepsilon (u)\) for some \(\varepsilon >0\). It is sufficient for this that \(f'> - c_P^{-2}\), where \(c_P>0\) is the smallest constant such that \(\Vert w\Vert \le c_P \Vert \nabla w\Vert \) for all \(w\in H^1_0(\varOmega )\).
The following proposition generalizes the Lax–Milgram and the Céa lemma to bilinear forms that are not elliptic.
Proposition 4.2
(Generalized Lax–Milgram and Céa lemma [1, 13]) Let \(X,Y\) be Hilbert spaces, \(a:X\times Y\rightarrow \mathbb {R}\) a continuous bilinear form with continuity constant \(M\), and \(\ell \in Y'\). Assume that there exists \(\alpha >0\) such that
for all \(u\in X\) and that for all \(v\in Y{\setminus }\{0\}\), there exists \(u \in X\) with \(a(u,v)\ne 0\). Then there exists a unique \(u\in X\) with
for all \(v\in Y\) and \(\Vert u\Vert _X\le \alpha ^{-1} \Vert \ell \Vert _{Y'}\). If \(X_h\subset X\) and \(Y_h\subset Y\) are such that the above conditions are satisfied with \(X\) and \(Y\) replaced by \(X_h\) and \(Y_h\), respectively, then there exists a unique \(u_h\in X_h\) with
for all \(v_h\in Y_h\), and we have
Proof
Identifying the bilinear form \(a\) with the operator \(A:X\rightarrow Y'\), we see that \(A\) is injective, i.e., \(Au=0\) for \(u\in X\) implies \(u=0\). Noting that
proves that the range of \(A\) is closed. If \(v\in Y\) is such that \(\langle Au,v\rangle =0\) for all \(u\in X\), then the assumptions imply \(v=0\). Hence, the closed range theorem yields that the range of \(A\) is \(Y'\) and it follows that \(A\) is bijective, i.e., there exists a unique \(u\in X\) with \(Au=\ell \). The estimate for \(\Vert u\Vert _X\) is an immediate consequence of the assumptions. The same arguments show that the operator \(A_h: X_h \rightarrow Y_h'\) is an isomorphism and hence there exists a unique \(u_h \in X_h\) with the asserted properties. Let \(w_h \in X_h\), and for every \(v_h\in X_h\) define
Then there exists a unique \(z_h\in X_h\) with \(a(z_h,v_h) = \widetilde{\ell }(v_h)\) and \(\Vert z_h\Vert _X \le \alpha ^{-1} \Vert \widetilde{\ell }\Vert _{Y_h'}\). Since \(a(u_h,v_h)=a(u,v_h)\) it follows that \(z_h = u_h-w_h\), and hence
The triangle inequality implies the asserted estimate. \(\square \)
Example 4.15
(Helmholtz equation) Let \(\omega \in \mathbb {R}\) and let \(a:H^1_0(\varOmega )\times H^1_0(\varOmega )\rightarrow \mathbb {R}\) be defined for \(u,v\in H^1_0(\varOmega )\) by
which corresponds to the partial differential equation \(-\Delta u - \omega ^2 u = f\) in \(\varOmega \) with boundary condition \(u|_{\partial \varOmega }=0\). If \(\omega ^2\) is not an eigenvalue of \(-\Delta \), then \(a\) satisfies the conditions of the proposition. To prove this, note that \((-\Delta )^{-1}:L^2(\varOmega ) \rightarrow H^1_0(\varOmega )\subset L^2(\varOmega )\) is selfadjoint and compact with trivial kernel, so that there exists a complete orthonormal system \((u_j)_{j\in \mathbb {N}} \subset L^2(\varOmega )\) of eigenfunctions of \((-\Delta )^{-1}\), i.e., for every \(j\in \mathbb {N}\) we have \(-\Delta u_j = \lambda _j u_j\) with positive eigenvalues \((\lambda _j)_{j\in \mathbb {N}}\) that do not accumulate at zero. We have \(\lambda _j^{-1} (\nabla u_j,\nabla u_k) =(u_j,u_k)=\delta _{jk}\) for all \(j,k\in \mathbb {N}\). Given \(u=\sum _{j\in \mathbb {N}} \alpha _j u_j \in H^1_0(\varOmega )\), define \(v=\sum _{j\in \mathbb {N}} \sigma _j \alpha _j u_j\) with \(\sigma _j = {{\mathrm{sign}}}(\Vert \nabla u_j\Vert ^2 - \omega ^2 \Vert u_j\Vert ^2)\). Then
and with \(\Vert \nabla u\Vert =\Vert \nabla v\Vert \), we deduce that
The second condition of the proposition is a direct consequence of the requirement that \(\omega ^2\) is not an eigenvalue of \(-\Delta \).
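The indefiniteness of the Helmholtz form and the solvability away from eigenvalues can be illustrated with a finite-difference sketch in Python; the choice \(\varOmega =(0,1)\), the grid size, and \(\omega ^2=20\) (which lies between the first two eigenvalues \(\pi ^2\) and \(4\pi ^2\) of \(-\Delta \)) are assumptions made for this example:

```python
import numpy as np

# finite differences on Omega = (0,1); eigenvalues of -u'' are (j*pi)^2
n = 99
h = 1.0 / (n + 1)
K = (2.0*np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
omega2 = 20.0                 # between pi^2 and 4*pi^2, so not an eigenvalue
A = K - omega2 * np.eye(n)    # Helmholtz operator: indefinite but invertible

x = np.linspace(h, 1.0 - h, n)
u1 = np.sin(np.pi * x)        # discrete eigenvector, eigenvalue approx pi^2
u2 = np.sin(2.0*np.pi * x)    # discrete eigenvector, eigenvalue approx 4 pi^2
q1 = u1 @ (A @ u1)            # a(u1,u1) < 0: the form is not elliptic
q2 = u2 @ (A @ u2)            # a(u2,u2) > 0

f = np.ones(n)
u = np.linalg.solve(A, f)     # uniquely solvable since omega^2 is no eigenvalue
```

The quadratic form changes sign on the first two discrete eigenvectors, so no coercivity is available, yet the linear system has a unique solution, in accordance with Proposition 4.2.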
Remark 4.5
Proposition 4.2 is important for the analysis of saddle-point problems; the seminal paper [7] provides conditions that imply the assumptions of the proposition.
2.3 Abstract Subdifferential Flow
The subdifferential flow of a convex and lower semicontinuous functional \(I:H\rightarrow \mathbb {R}\cup \{+\infty \}\) arises as an evolutionary model in applications and can be used as a basis for numerical schemes to minimize \(I\). The corresponding differential equation seeks \(u:{[0,T]}\rightarrow H\), such that \(u(0)=u_0\) and
i.e., \(u(0)=u_0\) and
for almost every \(t\in {[0,T]}\) and every \(v\in H\). An implicit discretization of this nonlinear evolution equation is equivalent to a sequence of minimization problems involving a quadratic term. We recall that \(d_t u^k = (u^k-u^{k-1}){/}\tau \) denotes the backward difference quotient.
Theorem 4.7
(Semidiscrete scheme [15, 17]) Assume that \(I\ge 0\) and for \(u^0\in H\) let \((u^k)_{k=1,\ldots ,K}\subset H\) be minimizers for
for \(k=1,2,\ldots ,K\). For \(L=1,2,\ldots ,K\), we have
With the computable quantities
and the affine interpolant \(\widehat{u}_\tau : {[0,T]}\rightarrow H\) of the sequence \((u^k)_{k=0,\ldots ,K}\) we have the a posteriori error estimate
We have the a priori error estimate
and under the condition \(\partial I(u^0)\ne \emptyset \), the improved variant
where \(\partial ^o I(u^0)\in H\) denotes the element of minimal norm in \(\partial I(u^0)\).
Proof
The direct method in the calculus of variations yields that for \(k=1,2,\ldots ,K\), there exists a unique minimizer \(u^k\in H\) for \(I_\tau ^k\), and we have \(d_t u^k \in -\partial I(u^k)\), i.e.,
for all \(v\in H\); the choice of \(v=u^{k-1}\) implies that
with \(0\le \fancyscript{E}_k \le -\tau d_t I(u^k)\). A summation over \(k=1,2,\ldots ,L\) yields the asserted stability estimate. If \(\widehat{u}_\tau \) is the piecewise affine interpolant of \((u^k)_{k=0,\ldots ,K}\) associated to the time steps \(t_k = k\tau \), \(k=0,1,\ldots ,K\), and \(u_\tau ^+\) is such that \(u_\tau ^+|_{(t_{k-1},t_k)}=u^k\) for \(k=1,2,\ldots \) and \(t_k = k\tau \), then we have
for almost every \(t\in {[0,T]}\) and all \(v\in H\). Introducing
we have
The choice of \(v=u\) in this inequality and \(v=\widehat{u}_\tau \) in the continuous evolution equation yield
Noting \(\widehat{u}_\tau - u_\tau ^+ = (t-t_k)\partial _t \widehat{u}_\tau \) for \(t\in (t_{k-1},t_k)\) and using the convexity of \(I\), i.e.,
we verify for \(t\in (t_{k-1},t_k)\) using \(u_\tau ^+=u^k\) that
With \(\fancyscript{E}_k \le - \tau d_t I(u^k)\) and \(I\ge 0\) we deduce that
which implies the a posteriori and the first a priori error estimate. Assume that \(\partial I(u^0)\ne \emptyset \) and define \(u^{-1} \in H\) so that \(d_t u^0=(u^0-u^{-1})/\tau = -\partial ^o I(u^0)\), i.e., the discrete evolution equation also holds for \(k=0\),
for all \(v\in H\). Choosing \(v=u^k\) in the equation for \(d_t u^{k-1}\), \(k=1,2,\ldots ,K\), we observe that
i.e., \(-\tau d_t I(u^k) \le \tau (d_t u^k,d_t u^{k-1})_H\), and it follows that
This implies that
which proves the improved a priori error estimate. \(\square \)
Remarks 4.6
(i) The condition \(\partial I(u^0) \ne \emptyset \) is restrictive in many applications.
(ii) Subdifferential flows \(\partial _t u \in - \partial I(u)\), i.e., \(Lu \ni 0\) for \(Lu = \partial _t u +v\) with \(v\in \partial I (u)\), and with a convex functional \(I: H \rightarrow \mathbb {R}\cup \{+\infty \}\) define monotone problems in the sense that
for \(u_1,u_2\) and \(v_1,v_2\) with \(v_i\in \partial I(u_i)\), \(i=1,2\).
(iii) If \(I:H \rightarrow \mathbb {R}\cup \{+\infty \}\) is strongly monotone in the sense that \((u_1-u_2,v_1-v_2)_H \ge \alpha \Vert u_1-u_2\Vert _H^2\) whenever \(v_\ell \in \partial I(u_\ell )\), \(\ell =1,2\), and if there exists a solution \(\overline{u}\in H\) of the stationary inclusion \(\overline{v}=0\in \partial I (\overline{u})\), then we have \(u(t)\rightarrow \overline{u}\) as \(t\rightarrow \infty \). A proof follows from the estimate
where \(v=-\partial _t u\in \partial I(u)\), and an application of Gronwall’s lemma.
2.4 Weak Continuity Methods
Let \((u_h)_{h>0}\subset X\) be a bounded sequence in the reflexive, separable Banach space \(X\) such that there exists a weak limit \(u\in X\) of a subsequence that is not relabeled, i.e., we have \(u_h \rightharpoonup u\) as \(h\rightarrow 0\). For an operator \(F:X\rightarrow X'\), we define the sequence \((\xi _h)_{h>0}\subset X'\) through \(\xi _h = F(u_h)\), and if the sequence is bounded in \(X'\), then there exists \(\xi \in X'\), such that for a further subsequence \((\xi _h)_{h>0}\) which again is not relabeled, we have \(\xi _h\rightharpoonup ^* \xi \). The important question is now whether we have weak continuity in the sense that
Notice that weak continuity is a strictly stronger notion of continuity than strong continuity. For partial differential equations, this property is called weak precompactness of the solution set of the homogeneous equation, i.e., if \((u_j)_{j\in \mathbb {N}}\) is a sequence with \(F(u_j) = 0\) for all \(j\in \mathbb {N}\) and \(u_j \rightharpoonup u\) as \(j\rightarrow \infty \) then we may deduce that \(F(u)=0\). Such implications may also be regarded as properties of weak stability since they imply that if \(F(u_j)=r_j\) with \(\Vert r_j\Vert _{X'}\le \varepsilon _j\) and \(\varepsilon _j\rightarrow 0\) as \(j\rightarrow \infty \), then we have \(F(u)=0\) for every accumulation point of the sequence \((u_j)_{j\in \mathbb {N}}\).
Theorem 4.8
(Discrete compactness) For every \(h>0\) let \(u_h\in X_h\) solve \(F_h(u_h)=0\). Assume that \(F_h(u_h) \in X'\) with \(\Vert F_h(u_h)\Vert _{X'}\le c\) for all \(h>0\) and that \(F\) is weakly continuous on \(X\), i.e., \(F(u_j)[v]\rightarrow F(u)[v]\) for all \(v\in X\) whenever \(u_j\rightharpoonup u\) in \(X\). Suppose that for every bounded sequence \((w_h)_{h>0} \subset X\) with \(w_h\in X_h\) for all \(h>0\), we have
as \(h \rightarrow 0\) and \((X_h)_{h>0}\) is dense in \(X\) with respect to strong convergence. If \((u_h)_{h>0}\subset X\) is bounded, then there exists a subsequence \((u_{h'})_{h'>0}\) and \(u\in X\) such that \(u_{h'}\rightharpoonup u\) in \(X\) and \(F(u)=0\).
Proof
After extraction of a subsequence, we may assume that \(u_h\rightharpoonup u\) in \(X\) as \(h\rightarrow 0\) for some \(u\in X\). Fixing \(v\in X\) and using \(F_h(u_h)[v_h]=0\) for every \(v_h\in X_h\), we have
For a sequence \((v_h)_{h>0}\subset X\) with \(v_h\in X_h\) for every \(h>0\) and \(v_h\rightarrow v\) in \(X\), we find that
as \(h\rightarrow 0\). The sequences \((u_h)_{h>0}\) and \((v_h)_{h>0}\) are bounded in \(X\) and thus
as \(h\rightarrow 0\). Together with the weak continuity of \(F\) we find that
Since \(v\in X\) was arbitrary this proves the theorem. \(\square \)
The crucial part in the theorem is the weak continuity of the operator \(F\). We include an example of an operator related to a constrained nonlinear partial differential equation that fulfills this requirement.
Example 4.16
(Harmonic maps) Let \((u_j)_{j\in \mathbb {N}}\subset H^1(\varOmega ;\mathbb {R}^3)\) be a bounded sequence such that \(|u_j(x)|=1\) for all \(j\in \mathbb {N}\) and almost every \(x\in \varOmega \). Assume that for every \(j\in \mathbb {N}\) and all \(v\in H^1(\varOmega ;\mathbb {R}^3)\cap L^\infty (\varOmega ;\mathbb {R}^3)\), we have
The choice of \(v=u_j\times w\) shows that we have
for all \(w\in H^1(\varOmega ;\mathbb {R}^3)\cap L^\infty (\varOmega ;\mathbb {R}^3)\). Using \(\partial _k u_j \cdot \partial _k (u_j\times w) = \partial _k u_j \cdot (u_j \times \partial _k w)\) for \(k=1,2,\ldots ,d\), we find that
If \(u_j \rightharpoonup u\) in \(H^1_\mathrm{D}(\varOmega ;\mathbb {R}^3)\), then \(u_j \rightarrow u\) in \(L^2(\varOmega ;\mathbb {R}^3)\) and thus, for every fixed \(w\in C^\infty (\overline{\varOmega };\mathbb {R}^3)\), we can pass to the limit and find that
Since up to a subsequence we have \(u_j(x)\rightarrow u(x)\) for almost every \(x\in \varOmega \), we verify that \(|u(x)|=1\) for almost every \(x\in \varOmega \). A density result shows that this holds for all \(w\in H^1(\varOmega ;\mathbb {R}^3)\cap L^\infty (\varOmega ;\mathbb {R}^3)\). Reversing the above argument by choosing \(w=u \times v\) and employing the identity \(a\times (b\times c) = (b\cdot a)c -(c\cdot a)b\) shows that \(F(u)[v]=0\) for all \(v\in H^1(\varOmega ;\mathbb {R}^3)\cap L^\infty (\varOmega ;\mathbb {R}^3)\).
A general concept for weak continuity is based on the notion of pseudomonotonicity.
Example 4.17
(Pseudomonotone operators) The operator \(F:X\rightarrow X'\) is a pseudomonotone operator if it is bounded, i.e., \(\Vert F(u)\Vert _{X'} \le c (1+\Vert u\Vert _X^s)\) for some \(s\ge 0\), and whenever \(u_j \rightharpoonup u\) in \(X\), we have the implication that
For such an operator we have that if \(F(u_h)[v_h]=\ell (v_h)\) for all \(v_h\in X_h\) with a strongly dense family of subspaces \((X_h)_{h>0}\) and \(u_h \rightharpoonup u\) as \(h\rightarrow 0\), then \(F(u)=\ell \). To verify this, let \(v\in X\) and \((v_h)_{h>0}\) with \(v_h\in X_h\) such that \(v_h\rightarrow v\) and note that
Pseudomonotonicity yields for every \(v_{h'} \in \cup _{h>0} X_h\) that
With the density of \((X_h)_{h>0}\) in \(X\), we conclude that \(F(u)[u-v] \le \ell (u-v)\) for all \(v\in X\) and with \(v=u \pm w\), we find that \(F(u)[w] = \ell (w)\) for all \(w\in X\).
Remarks 4.7
(i) Radially continuous bounded operators are pseudomonotone. Here, radial continuity means that \(t\mapsto F(u+tv)[v]\) is continuous for \(t\in \mathbb {R}\) and all \(u,v\in X\). These operators allow us to apply Minty’s trick to deduce from the inequality \(\ell (u-v)-F(v)[u-v]\ge 0\) for all \(v\in X\) that \(F(u)=\ell \). To prove this implication, note that with \(v=u+\varepsilon w\), we find that \(\ell ( w) -F(u+\varepsilon w)[w] \le 0\) and by radial continuity for \(\varepsilon \rightarrow 0\), it follows that \(\ell (w)-F(u)[w] \le 0\) and hence \(F(u)=\ell \).
(ii) Pseudomonotone operators are often of the form \(F=F_1+F_2\) with a monotone operator \(F_1\) and a weakly continuous operator \(F_2\), e.g., a lower-order term described by \(F_2\).
Example 4.18
(Quasilinear diffusion) The concept of pseudomonotonicity applies to the quasilinear elliptic equation
with \(g\in C(\mathbb {R})\) such that \(|g(s)|\le c (1+|s|^{r-1})\) and \(1<p<d\), \(r<dp/(d-p)\).
3 Solution of Discrete Problems
We discuss in this section the practical solution of discretized minimization problems of the form
In particular, we investigate four model situations with smooth and nonsmooth integrands and smooth and nonsmooth constraints included in \(\fancyscript{A}\). The iterative algorithms are based on an approximate solution of the discrete Euler–Lagrange equations. More general results can be found in the textbooks [4, 12].
3.1 Smooth, Unconstrained Minimization
Suppose that
and \(I_h\) is defined as above with functions \(W\in C^1(\mathbb {R}^{m\times d})\) and \(g\in C^1(\mathbb {R}^m)\). The case \({\varGamma _\mathrm{D}}=\emptyset \) is not generally excluded in the following. A necessary condition for a minimizer \(u_h\in \fancyscript{A}_h\) is that for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\), we have
Steepest descent methods successively lower the energy by minimizing in descent directions defined through an appropriate gradient.
Algorithm 4.1
(Descent method) Let \((\cdot ,\cdot )_H\) be a scalar product on \(\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) and \(\mu \in (0,1/2)\). Given \(u_h^0 \in \fancyscript{A}_h\), compute the sequence \((u_h^j)_{j=0,1,\ldots }\) via \(u_h^{j+1} = u_h^j + \alpha _j d_h^j\) with \(d_h^j\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) such that
for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) and either the fixed step-size
or the line-search minimum, which seeks the maximal \(\alpha _j \in \{2^{-\ell },\, \ell \in \mathbb {N}_0 \}\) such that
Stop the iteration if \(\Vert \alpha _j d_h^j\Vert _H \le \varepsilon _\mathrm{stop}\).
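A minimal Python sketch of Algorithm 4.1 follows; it assumes the Euclidean scalar product for \((\cdot ,\cdot )_H\), so that \(d = -\nabla I(u)\), and uses a common form of the Armijo–Goldstein condition, \(I(u+\alpha d)\le I(u) - \mu \alpha \Vert d\Vert ^2\); the test energy is a hypothetical smooth convex example:

```python
import numpy as np

def armijo_descent(I, dI, u0, mu=0.25, eps_stop=1e-8, max_iter=1000):
    """Steepest descent u^{j+1} = u^j + alpha_j d^j with d^j = -dI(u^j); the
    line search picks the maximal alpha in {2^-l} with
    I(u + alpha d) <= I(u) - mu * alpha * |d|^2."""
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        d = -dI(u)
        alpha = 1.0
        while alpha > 1e-12 and I(u + alpha*d) > I(u) - mu*alpha*(d @ d):
            alpha *= 0.5                    # try the next alpha in {2^-l}
        u = u + alpha*d
        if np.linalg.norm(alpha*d) <= eps_stop:
            break                           # stopping criterion of Alg. 4.1
    return u

# hypothetical smooth convex test energy and its gradient
b = np.array([1.0, 2.0])
I = lambda u: 0.25*(u @ u)**2 + 0.5*(u @ u) - b @ u
dI = lambda u: ((u @ u) + 1.0)*u - b
u = armijo_descent(I, dI, np.zeros(2))
```

In accordance with Theorem 4.9, the iteration decreases the energy monotonically and terminates once the correction becomes small.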
Remarks 4.8
(i) Since \(I_h\) is continuously differentiable, the descent method decreases the energy in every step. This follows from
i.e., the continuous function \(\varphi (\alpha ) = I_h(u_h^j+\alpha d_h^j)\) is strictly decreasing for \(\alpha \in [0,\delta ]\). The existence of \(\alpha _j>0\) that satisfies the Armijo–Goldstein condition of Algorithm 4.1 follows from expanding
provided that \(W\) and \(g\) are sufficiently smooth so that \(I_h\in C^2(X_h)\).
(ii) The scalar product \((\cdot ,\cdot )_H\) acts like a preconditioner for \(F_h\), i.e., we have \(u_h^{j+1} = u_h^j - \tau X_H^{-1} F_h (u_h^j)\) with respect to an appropriate basis. In particular, the descent method may be regarded as a fixed-point iteration.
(iii) Larger step sizes are typically possible for implicit or semi-implicit versions of the descent method, i.e., by considering a fixed step-size and the modified equation
for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) and with a function \(\widetilde{F}_h\) such that \(\widetilde{F}_h(u_h,u_h)=F_h(u_h)\). If \(F_h(u_h) = G_h(u_h) + T_h(u_h)\) with a linear or monotone operator \(G_h\), then a natural choice is \(\widetilde{F}_h(u_h,\widetilde{u}_h) = G_h(u_h)+ T_h(\widetilde{u}_h)\). Generally, large time steps are possible when monotone terms are treated implicitly and antimonotone terms explicitly.
(iv) If \(X_h= V_h \times W_h\) and \(I_h(u_h)=J_h(\phi _h,\psi _h)\) is separately convex, i.e., the mappings \(v_h \mapsto J_h(v_h,\psi _h)\) and \(w_h \mapsto J_h(\phi _h,w_h)\) are convex for all \((\phi _h,\psi _h)\in V_h\times W_h\), a decoupled, semi-implicit gradient flow discretization is unconditionally stable. Given the initial \((\phi _h^0,\psi _h^0)\in V_h\times W_h\), consider the iteration
where \(\delta _1 J_h\) and \(\delta _2J_h\) denote the Fréchet derivatives of \(J_h\) with respect to the first and second argument, respectively. The choices \(v_h = d_t \phi _h^{j+1}\), \(w_h = d_t \psi _h^{j+1}\) and the separate convexity of \(J\) lead to
which implies the unconditional stability of the scheme.
Theorem 4.9
(Convex functionals) Assume that \(I_h\) is convex and bounded from below and \(F_h\) is Lipschitz continuous, i.e., there exists \(c_F\ge 0\) such that
for all \(w_h,v_h\in X_h\). Let \(c_h>0\) be such that \(\Vert v_h\Vert _X \le c_h \Vert v_h\Vert _H\) for all \(v_h\in X_h\). Then the steepest descent method with fixed step-size \(\tau >0\) such that \(\tau c_F c_h\le 1/2\) terminates within a finite number of iterations, and for all \(J\ge 0\), we have
Proof
The convexity of \(I_h\) implies that
Using that \(\tau d_h^j = u_h^{j+1}-u_h^j\) and choosing \(v_h = \tau d_h^j\) in the discrete scheme leads to
Therefore, if \(\tau c_F c_h\le 1/2\) we deduce the estimate from a summation over \(j=0,1,\ldots ,J\). The estimate implies that \(d_h^j\rightarrow 0\) as \(j\rightarrow \infty \) so that \(\Vert \tau d_h^j\Vert _H \le \varepsilon _\mathrm{stop}\) for \(j\) sufficiently large. \(\square \)
Remarks 4.9
(i) The arguments of the proof of the theorem show that the implicit version of the descent method, defined by \((d_h^j,v_h)_H + F_h(u_h^j+\tau d_h^j)[v_h]=0\) for every \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\), is unconditionally convergent, but requires the solution of nonlinear systems of equations in every time step.
(ii) For nonconvex functionals, the iteration typically converges to a local minimum of \(I_h\). Theoretically, the iteration may stop at a saddle point or local maximum.
To formulate the Newton method for solving the equation \(F_h(u_h)=0\) in \(X_h'\) we assume that \(W\in C^2(\mathbb {R}^{m\times d})\) and \(g\in C^2(\mathbb {R}^m)\). The Newton scheme may be regarded as an explicit descent method with a variable metric defined by the second variation of the energy functional \(I_h\), i.e.,
for \(u_h,v_h,w_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\).
Algorithm 4.2
(Newton method) Given \(u_h^0 \in \fancyscript{A}_h\), compute the sequence \((u_h^j)_{j=0,1,\ldots }\) via \(u_h^{j+1} = u_h^j + \alpha _j d_h^j\) with \(d_h^j\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) such that
for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\) and \(\alpha _j>0\) with either the full Newton step \(\alpha _j =1\), a fixed damping parameter \(\alpha _j = \tau <1\), or a line search minimum \(\alpha _j\) as in Algorithm 4.1. Stop the iteration if \(\Vert \alpha _j d_h^j\Vert _H \le \varepsilon _\mathrm{stop}\) for a norm \(\Vert \cdot \Vert _H\) on \(\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\).
The convergence of the Newton iteration will be discussed in a more general context below in Sect. 4.3.3.
Remark 4.10
As opposed to the descent method, the Newton iteration can in general only be expected to converge locally. Under certain conditions the Newton scheme converges quadratically in a neighborhood of a solution. Optimal results can be obtained by combining the globally but slowly convergent descent method with the locally but rapidly convergent Newton method. Since the convergence of the Newton method is often difficult to establish and requires \(W\) and \(g\) to be sufficiently regular, developing globally convergent schemes is important to construct reliable numerical methods.
Example 4.19
For the approximation of minimal surfaces that are represented by graphs of functions over \(\varOmega \), we consider
and note that for \(u_h\in \fancyscript{A}_h= \{v_h\in \fancyscript{S}^1(\fancyscript{T}_h): v_h|_{\varGamma _\mathrm{D}}= u_{\mathrm{D},h}\}\) and \(v_h,w_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) we have
and
Figure 4.2 displays a combined Matlab implementation of the Newton iteration and the descent method with line search. The Newton method fails to provide meaningful approximations when a moderately perturbed nodal interpolant of the exact solution is used as the starting value.
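As a complement to the Matlab listing of Fig. 4.2, the following Python sketch illustrates the Newton iteration (Algorithm 4.2 with \(\alpha _j=1\)) for a one-dimensional analogue of the minimal surface problem, where the exact minimizer is the affine interpolant of the boundary data; the discretization choices and the size of the perturbation of the starting value are assumptions:

```python
import numpy as np

def newton_minimal_surface(u0, a, b, h, iters=30):
    """Newton iteration with full steps for the 1-D minimal surface problem
    F(u)[v] = int u'v'/sqrt(1+u'^2) dx = 0 with u(0)=a, u(1)=b, discretized
    with P1 elements; u0 holds the interior nodal values."""
    u = u0.copy()
    for _ in range(iters):
        uu = np.concatenate(([a], u, [b]))
        du = np.diff(uu) / h                         # elementwise slopes
        flux = du / np.sqrt(1.0 + du**2)
        F = flux[:-1] - flux[1:]                     # residual (hat functions)
        c = (1.0 + du**2)**(-1.5) / h                # element weights of DF(u)
        J = (np.diag(c[:-1] + c[1:])
             - np.diag(c[1:-1], 1) - np.diag(c[1:-1], -1))
        d = np.linalg.solve(J, -F)                   # Newton correction
        u = u + d
        if np.linalg.norm(d) < 1e-12:
            break
    return u

# start from a mildly perturbed interpolant of the exact (affine) solution
n = 19
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
u0 = x + 0.05 * np.sin(2.0 * np.pi * x)
u = newton_minimal_surface(u0, 0.0, 1.0, h)
```

Starting close to the solution, the iteration converges rapidly to the nodal values of the affine function; large perturbations of the starting value can make the undamped iteration fail, consistent with the observation above.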
3.2 Smooth Constrained Minimization
We next consider the case that the set of admissible functions includes a pointwise constraint, which is imposed at the nodes of a triangulation, i.e., for \(G\in C(\mathbb {R}^m)\), we have
The identity \(G\big (u_h(z)\big )=0\) for all \(z\in \fancyscript{N}_h\) is equivalent to the condition \(\fancyscript{I}_h G(u_h)=0\). We always assume in the following that \(\fancyscript{A}_h \ne \emptyset \), i.e., that the function \(u_{\mathrm{D},h}\) is compatible with the constraint. Moreover, we assume \(G\in C^1(\mathbb {R}^m)\) with \(DG(s)\ne 0\) for every \(s\in M=G^{-1}(\{0\})\) so that \(M\subset \mathbb {R}^m\) is an \((m-1)\)-dimensional \(C^1\)-submanifold. The Euler–Lagrange equations of the discrete minimization problem
in the set of all functions \(u_h\in \fancyscript{A}_h\) can then be formulated as follows.
Proposition 4.3
(Optimality conditions) The function \(u_h\in \fancyscript{A}_h\) is stationary for \(I_h\) in \(\fancyscript{A}_h\) if and only if
for all \(w_h\in T_{u_h} \fancyscript{A}_h\), where the discrete tangent space \(T_{u_h} \fancyscript{A}_h\) of \(\fancyscript{A}_h\) at \(u_h\) is defined by
Proof
We let \(\varphi _h:(-\varepsilon ,\varepsilon )\rightarrow \fancyscript{A}_h\) be a continuously differentiable function with \(\varphi _h(0)=u_h\). We then have that \(w_h = \varphi _h'(0)\in T_{u_h}\fancyscript{A}_h\) and
Conversely, for every \(w_h\in T_{u_h}\fancyscript{A}_h\) there exists a function \(\varphi _h(t)\) as above. \(\square \)
Remark 4.11
An equivalent characterization of stationary points is the existence of a Lagrange multiplier \(\lambda _h \in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)\) such that for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\), we have
We propose the following descent scheme for the iterative solution of the constrained problem. It may be regarded as a semi-implicit discretization of an \(H\)-gradient flow. In particular, the problems that have to be solved at every step of the iteration are linear if \(F_h\) is linear.
Algorithm 4.3
(Constrained descent method) Let \((\cdot ,\cdot )_H\) be a scalar product on \(\fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\). Given \(u_h^0 \in \fancyscript{A}_h\), compute the sequence \((u_h^j)_{j=0,1,\ldots }\) via \(u_h^{j+1} = u_h^j + \tau d_h^j\) with \(d_h^j\in T_{u_h^j}\fancyscript{A}_h\) such that
for all \(v_h\in T_{u_h^j}\fancyscript{A}_h\). Stop the iteration if \(\Vert d_h^j\Vert _H \le \varepsilon _\mathrm{stop}\).
Remark 4.12
If \(F_h\) is linear, then the solution of an iteration is equivalent to the solution of a linear system of equations of the form
where \(D_h^j\), \(U_h^j\), and \(\varLambda _h^j\) are vectors that contain the nodal values of the functions \(d_h^j\), \(u_h^j\), and \(\lambda _h^j\), respectively, and \(X_H\), \(S\), and \(dG_h\) are matrices that represent the scalar product \((\cdot ,\cdot )_H\), the bilinear form \(F_h(u_h)[v_h]\), and the linearized constraint defined by \(DG\).
The iterates \((u_h^j)_{j=0,1,\ldots }\) will in general not satisfy the constraint \(\fancyscript{I}_h G(u_h^j)=0\) but under moderate conditions, the violation of the constraint is small. We recall the notation \(\Vert v\Vert _h^2 = \int _\varOmega \fancyscript{I}_h [v^2] \,{\mathrm d}x\) for \(v\in C(\overline{\varOmega })\).
Theorem 4.10
(Constrained convex minimization) Assume that \(G\in C^2(\mathbb {R}^m)\) with \(\Vert D^2G\Vert _{L^\infty (\mathbb {R}^m)}\le c\), \(I_h\) is convex, \(u_h^0\in \fancyscript{A}_h\), and \(\Vert v_h\Vert _h \le c \Vert v_h\Vert _H\) for all \(v_h\in \fancyscript{S}^1_\mathrm{D}(\fancyscript{T}_h)^m\). For all \(J\ge 0\) we have
and for every \(j=1,2,\ldots \), the bound
The algorithm terminates after a finite number of iterations.
Proof
The convexity of \(I_h\) implies that
With the choice of \(v_h=\tau d_h^j\) in the algorithm and the relation \(u_h^{j+1}=u_h^j+\tau d_h^j\), this leads to
A summation over \(j=0,1,\ldots ,J\) proves the energy law. A Taylor expansion shows that for every \(z\in \fancyscript{N}_h{\setminus }{\varGamma _\mathrm{D}}\), we have for some \(\xi _z^j\in \mathbb {R}^m\) that
Noting \(DG(u_h^j(z))\cdot d_h^j(z)=0\) and \(G(u_h^0(z))=0\), we deduce by induction that
Since \(D^2G\) is uniformly bounded we have with \(\beta _z = \int _\varOmega \varphi _z\,{\mathrm d}x\) that
A combination with the energy law implies the bound for \(\Vert \fancyscript{I}_h G(u_h^{j+1})\Vert _{L^1(\varOmega )}\). The convergence of the iteration follows from the convergence of the sum of norms of the correction vectors \(d_h^j\). \(\square \)
Remark 4.13
In order to satisfy the constraint exactly, the algorithm can be augmented by defining the new iterates through the projection
where \(\pi _M : U_\delta (M)\rightarrow M\) is the nearest neighbor projection onto \(M=G^{-1}(\{0\})\), which is well defined in a tubular neighborhood \(U_\delta (M)\) of \(M\) for some \(\delta >0\) provided that \(M\) is a \(C^2\)-manifold. The step-size \(\tau >0\) has to be sufficiently small in order to guarantee the well-posedness of the iteration.
Example 4.20
(Harmonic maps) Minimizing the Dirichlet energy in the set
corresponds to the situation of Theorem 4.10 with \(G(s)=|s|^2-1\) and \(M=S^{m-1}=\{s\in \mathbb {R}^m: |s|=1\}\). In particular, we have \(DG(s)=2s\) and \(\Vert D^2G\Vert _{L^\infty (\mathbb {R}^m)}=2m^{1/2}\). The discrete tangent spaces are given by
The nearest neighbor projection \(\pi _{S^{m-1}}\) is for \(s\in \mathbb {R}^m{\setminus }\{0\}\) defined by \(\pi _{S^{m-1}}(s) = s/|s|\).
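The constrained descent method combined with the projection step of Remark 4.13 can be sketched in Python for \(S^2\)-valued maps of one variable; the uniform one-dimensional grid, the boundary data, and the starting value are assumptions made for illustration:

```python
import numpy as np

def energy(u, h):
    """Discrete Dirichlet energy of the nodal vectors u in R^{(n+1) x 3}."""
    return 0.5 * np.sum(np.diff(u, axis=0)**2) / h

def harmonic_map_descent(u, tau, steps, h):
    """Projected descent: tangential correction d = -(I - u u^T) dE(u) at each
    node, fixed boundary nodes, then nodal renormalization u <- u/|u| (the
    nearest neighbor projection onto the sphere), cf. Remark 4.13."""
    for _ in range(steps):
        g = np.zeros_like(u)
        g[1:-1] = (2.0*u[1:-1] - u[:-2] - u[2:]) / h        # dE at interior nodes
        d = -(g - np.sum(g*u, axis=1, keepdims=True) * u)   # tangential part
        d[0] = d[-1] = 0.0                                  # boundary fixed
        u = u + tau * d
        u = u / np.linalg.norm(u, axis=1, keepdims=True)    # renormalize nodes
    return u

# hypothetical boundary data and a perturbed great-circle starting value
n = 20
h = 1.0 / n
t = np.linspace(0.0, 1.0, n + 1)[:, None]
u0 = ((1.0 - t)*np.array([1.0, 0.0, 0.0]) + t*np.array([0.0, 1.0, 0.0])
      + 0.3 * t*(1.0 - t)*np.array([0.0, 0.0, 1.0]))
u0 = u0 / np.linalg.norm(u0, axis=1, keepdims=True)
u = harmonic_map_descent(u0, tau=5e-4, steps=400, h=h)
```

The iterates satisfy the unit-length constraint exactly at the nodes after each projection, and the discrete Dirichlet energy decreases, in line with Theorem 4.10.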
3.3 Nonsmooth Equations
We consider an abstract equation of the form
for all \(v_h\in X_h\) with a continuous operator \(F_h:X_h \rightarrow Y_h\) that may not be continuously differentiable. The goal is to formulate conditions that allow us to prove convergence of an appropriate generalization of the Newton method. We let \(X_h\) and \(Y_h\) be Banach spaces in the following, and assume that \(X_h\) is equipped with the norm of a Banach space \(X\). We let \(\mathrm{L}(X_h,Y_h)\) denote the space of continuous linear operators \(A_h:X_h\rightarrow Y_h\) and let \(\Vert A_h\Vert _{\mathrm{L}(X_h,Y_h)}\) be the corresponding operator norm.
Definition 4.3
We say that \(F_h:X_h\rightarrow Y_h\) is Newton differentiable at \(v_h \in X_h\) if there exists \(\varepsilon >0\) and a function \(G_h:B_\varepsilon (v_h)\rightarrow \mathrm{L}(X_h,Y_h)\) such that
The function \(G_h\) is called the Newton derivative of \(F_h\) at \(v_h\).
Remark 4.14
Notice that in contrast to the definition of the classical derivative, here the derivative is evaluated at the perturbed point \(v_h+w_h\). This is precisely the expression that arises in the convergence analysis of the classical Newton iteration.
Examples 4.21
(i) If \(F_h:X_h \rightarrow Y_h\) is continuously differentiable in a neighborhood of \(v_h\in X_h\), then \(F_h\) is Newton differentiable at \(v_h\) with Newton derivative \(G_h = DF_h\), i.e., we have
and the right-hand side converges faster to 0 than \(\Vert w_h\Vert _X\) as \(w_h \rightarrow 0\).
(ii) If \(X_h\) is a Hilbert space, then the function \(F_h(v)=\Vert v\Vert _X\), \(v\in X_h\), is Newton differentiable with
where \(\xi \in X_h\) with \(\Vert \xi \Vert _X\le 1\) is arbitrary.
(iii) The function \(F_h:\mathbb {R}\rightarrow \mathbb {R}\), \(s\mapsto \max \{0,s\}\), is Newton differentiable with Newton derivative \(G_h(s) = 0\) for \(s<0\), \(G_h(0)=\delta \) for arbitrary \(\delta \in [0,1]\), and \(G_h(s)=1\) for \(s>0\).
(iv) If \(1\le p<q\le \infty \), the mapping
is Newton differentiable with the Newton derivative \(G_h(v_h)\) for \(G_h\) as above. For \(p=q\) this is false.
The semismooth Newton method is similar to the classical Newton iteration but employs the Newton derivative instead of the classical derivative.
Algorithm 4.4
(Semismooth Newton method) Given \(u_h^0 \in X_h\), compute the sequence \((u_h^j)_{j=0,1,\ldots }\) via \(u_h^{j+1} = u_h^j + d_h^j\) with \(d_h^j\in X_h\) such that
for all \(v_h\in X_h\). Stop the iteration if \(\Vert d_h^j\Vert _H \le \varepsilon _\mathrm{stop}\) for a norm \(\Vert \cdot \Vert _H\) on \(X_h\).
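A scalar model illustrates the scheme: for the componentwise equation \(F(u) = u + \max \{0,u\} - b = 0\), the Newton derivative of \(\max \{0,\cdot \}\) from Example 4.21(iii) yields a diagonal matrix \(G(u)=I+\mathrm{diag}(\chi _{\{u>0\}})\), and the iteration terminates after finitely many steps. A minimal Python sketch (problem and names hypothetical):

```python
import numpy as np

def semismooth_newton(b, u0, tol=1e-12, max_iter=50):
    """Semismooth Newton for F(u) = u + max(0, u) - b = 0 (componentwise);
    the Newton derivative of max(0,.) is the indicator of {u > 0}."""
    u = u0.astype(float).copy()
    for _ in range(max_iter):
        F = u + np.maximum(0.0, u) - b
        if np.linalg.norm(F) < tol:
            break
        G = 1.0 + (u > 0).astype(float)   # diagonal Newton derivative
        u = u - F / G                     # correction d = -G^{-1} F
    return u

b = np.array([2.0, -1.0, 4.0, -3.0])
u = semismooth_newton(b, np.zeros(4))     # solution: b/2 where b > 0, else b
```

Starting from zero, the iteration identifies the active set \(\{u>0\}\) after one step and then solves the resulting linear equation exactly, which reflects the superlinear convergence asserted in Theorem 4.11.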
Theorem 4.11
(Superlinear convergence) Suppose that \(F_h(u_h)=0\) and \(F_h:X_h\rightarrow Y_h\) is Newton differentiable at \(u_h\), such that the linear mapping \(G_h(\widetilde{u}_h):X_h\rightarrow Y_h\) is invertible with \(\Vert G_h^{-1}(\widetilde{u}_h)\Vert _{\mathrm{L}(Y_h,X_h)}\le M\) for every \(\widetilde{u}_h\in B_\varepsilon (u_h)\) with some \(\varepsilon >0\). Then the semismooth Newton method converges superlinearly to \(u_h\) if \(u_h^0\) is sufficiently close to \(u_h\), i.e., for every \(\eta >0\), there exists \(J\ge 0\) such that for all \(j\ge J\), we have
Proof
Noting \(d_h^j = -G_h(u_h^j)^{-1} F_h(u_h^j)\), we have
Writing \(u_h^{j+1} = u_h+w_h^{j+1}\), we have
with a function \(\varphi (s)\) satisfying \(\varphi (s)/s\rightarrow 0\) as \(s\rightarrow 0\). If \(\Vert w_h^0\Vert _X\) is sufficiently small, e.g., \(\Vert w_h^0\Vert _X\le \varepsilon /(M \theta )\) with \(\theta = \max _{s\in [0,1]} \varphi (s)\), then we inductively find \(u_h^j\in B_\varepsilon (u_h)\) for all \(j\ge 0\) and \(\Vert w_h^j\Vert _X \rightarrow 0\) as \(j\rightarrow \infty \). For \(J\ge 0\) such that \(\varphi (\Vert w_h^j\Vert _X)\le (\eta /M) \Vert w_h^j\Vert _X\) for all \(j\ge J\), we verify the estimate of the theorem. \(\square \)
Remark 4.15
If \(F_h\) is twice continuously differentiable so that \(G_h = DF_h\) is locally Lipschitz continuous and \(\Vert DF_h^{-1}(\widetilde{u}_h)\Vert _{\mathrm{L}(Y_h,X_h)} \le M\), then Algorithm 4.4 coincides with the classical Newton iteration which is locally and quadratically convergent.
3.4 Nonsmooth, Strongly Convex Minimization
For Banach spaces \(X\) and \(Y\), proper, convex, and lower semicontinuous functionals \(G:X\rightarrow \mathbb {R}\cup \{+\infty \}\), \(F:Y\rightarrow \mathbb {R}\cup \{+\infty \}\), and a bounded, linear operator \(\varLambda :X\rightarrow Y\), we consider the saddle-point problem
The pair \((u,p)\) is a saddle point for \(L\) if and only if
where \(\varLambda ':Y'\rightarrow X'\) denotes the formal adjoint of \(\varLambda \). The related primal and dual problems consist in the minimization of the functionals
We have \(I(u)-D(p)\ge 0\) for all \((u,p)\in X\times Y'\) with equality if and only if \((u,p)\) is a saddle point for \(L\). We assume in the following that \(X\) and \(Y\) are Hilbert spaces and identify them with their duals. The descent and ascent flows \(\partial _t u = -\partial _u L(u,p)\) and \(\partial _t p = \partial _p L(u,p)\), respectively, motivate the following algorithm. Further details about related nonsmooth minimization problems can be found in [14].
Algorithm 4.5
(Primal-dual iteration) Let \((u^0,p^0)\in X\times Y\) and set \(d_tu^0 =0\). Compute the sequences \((u^j)_{j=0,1,\ldots }\) and \((p^j)_{j=0,1,\ldots }\) by iteratively solving the equations
Stop the iteration if \(\Vert u^{j+1}-u^j\Vert _X\le \varepsilon _\mathrm{stop}\).
Remark 4.16
The equations in Algorithm 4.5 are equivalent to the variational inequalities
for all \((v,q)\in X\times Y\). Here, \(\alpha > 0\) if \(G\) is uniformly convex.
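A minimal sketch of such a primal-dual iteration, in an extrapolated form, can be given for the one-dimensional model problem \(\min _u (\alpha /2)\Vert u-g\Vert ^2 + \Vert \varLambda u\Vert _1\) with \(\varLambda \) a forward difference operator; the concrete functionals, the step sizes, and all names below are illustrative assumptions rather than the setting of the text.

```python
import numpy as np

def primal_dual(g, alpha=8.0, tau=0.2, sigma=0.2, iters=1000):
    """Extrapolated primal-dual iteration for the 1D model problem
    min_u (alpha/2)||u - g||^2 + ||D u||_1 with D the forward difference;
    the step sizes must satisfy tau*sigma*||D||^2 <= 1 (here ||D||^2 <= 4)."""
    n = len(g)
    D = np.zeros((n - 1, n))
    for i in range(n - 1):
        D[i, i], D[i, i + 1] = -1.0, 1.0
    u = g.copy(); ubar = u.copy(); p = np.zeros(n - 1)
    for _ in range(iters):
        # ascent step with the resolvent of F*: projection onto [-1, 1]^(n-1)
        p = np.clip(p + sigma * (D @ ubar), -1.0, 1.0)
        # descent step with the resolvent of tau*G for G(u) = (alpha/2)||u - g||^2
        u_new = (u - tau * (D.T @ p) + tau * alpha * g) / (1.0 + tau * alpha)
        ubar = 2.0 * u_new - u   # extrapolation
        u = u_new
    return u

g = np.concatenate([np.zeros(8), np.ones(8)])   # clean step signal
u = primal_dual(g)
```

For this strongly convex \(G\) the iterates approach the unique minimizer, which for a clean step signal stays close to the data.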
We prove convergence of Algorithm 4.5 assuming that \(\alpha >0\). We abbreviate by \(\Vert \varLambda \Vert \) the operator norm \(\Vert \varLambda \Vert _{\mathrm{L}(X,Y)}\).
Theorem 4.12
(Convergence) Let \((u,p)\) be a saddle point for \(L\). If \(\tau \Vert \varLambda \Vert \le 1\), we have for every \(J\ge 0\) that
In particular, the iteration of Algorithm 4.5 terminates.
Proof
We denote \(\delta _u^{j+1} = u-u^{j+1}\) and \(\delta _p^{j+1} = p-p^{j+1}\) in the following. Using that \(d_t \delta _u^{j+1} = -d_t u^{j+1}\) and \(d_t \delta _p^{j+1} = -d_t p^{j+1}\), we find that
The equations for \(d_t p^{j+1}\) and \(d_t u^{j+1}\) of Algorithm 4.5 and their equivalent characterization in Remark 4.16 lead to
The identity \(F^{**}=F\) and the definition of \(G^*\) imply that
These estimates and the identity \(u^{j+1}-\widetilde{u}^{j+1}=\tau ^2 d_t^2 u^{j+1}= - \tau ^2 d_t^2 \delta _u^{j+1}\) allow us to deduce that
We use \(I(u)-D(p)=0\) to derive the estimate
A summation of the estimate over \(j=0,1,\ldots ,J\) and multiplication by \(\tau \) lead to
A summation by parts, \(-d_t \delta _u^0 = d_tu^0=0\), and Young’s inequality show that
A combination of the estimates proves the theorem. \(\square \)
Remarks 4.17
(i) The assumption that a saddle point exists implies that the primal and dual problems are related by a strong duality principle.
(ii) If \(F\) is strongly convex and \(G\) is only convex, then the roles of \(u\) and \(p\) have to be exchanged to ensure convergence.
(iii) The algorithm may be regarded as an inexact Uzawa algorithm. The classical Uzawa method corresponds to omitting \(d_t u^{j+1}\), i.e., solving the inclusion \(-\varLambda 'p^{j+1}\in \partial G(u^{j+1})\) for \(u^{j+1}\) at every step of the algorithm.
(iv) Algorithm 4.5 is practical if the proximity operators \(r=(1+\tau \partial F^*)^{-1}q\) and \(w=(1+\tau \partial G)^{-1}v\) can be easily evaluated, i.e., if the unique minimizers of
are directly accessible. This is the case for quadratic functionals and indicator functionals.
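For a quadratic functional \(G(u)=(\alpha /2)\Vert u-g\Vert ^2\) and the indicator functional of a box, the resolvents admit the closed forms sketched below; the function names are ours.

```python
import numpy as np

def prox_quadratic(v, tau, alpha, g):
    # (1 + tau dG)^{-1} v for G(u) = (alpha/2)||u - g||^2: minimizing
    # (1/2)||w - v||^2 + (tau*alpha/2)||w - g||^2 gives a closed form
    return (v + tau*alpha*g) / (1.0 + tau*alpha)

def prox_box(v, a, b):
    # (1 + tau dG)^{-1} v for the indicator of [a, b]^n is the projection,
    # independent of tau: componentwise clipping
    return np.clip(v, a, b)
```

Both operators cost one vector operation per evaluation, which is what makes the primal-dual iteration practical in these cases.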
Example 4.22
In the case of the discretized Poisson problem with \(X = \fancyscript{S}^1_0(\fancyscript{T}_h)\), we may choose \(Y = \fancyscript{L}^0(\fancyscript{T}_h)^d\), \(\varLambda =\nabla \),
and exchange the roles of \(u_h\) and \(p_h\). Letting \(P_{h,0}f\) denote the \(L^2\) projection of \(f\) onto \(\fancyscript{S}^1_0(\fancyscript{T}_h)\), the iteration reads
The discrete divergence operator \({{\mathrm{div}}}_h : \fancyscript{L}^0(\fancyscript{T}_h)^d\rightarrow \fancyscript{S}^1_0(\fancyscript{T}_h)\) is defined for every elementwise constant vector field \(q_h\in \fancyscript{L}^0(\fancyscript{T}_h)^d\) by \(({{\mathrm{div}}}_h q_h, v_h) = -(q_h, \nabla v_h)\) for all \(v_h\in \fancyscript{S}^1_0(\fancyscript{T}_h)\). Convergence holds if \(\tau \Vert \nabla \Vert \le 1\), where \(\Vert \nabla \Vert \le c h^{-1}\).
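In one space dimension the defining relation \(({{\mathrm{div}}}_h q_h, v_h) = -(q_h, \nabla v_h)\) reduces to a small matrix computation; the uniform mesh of \([0,1]\) and all names below are our illustrative choices.

```python
import numpy as np

# 1D sketch: S^1_0 hat functions on a uniform mesh of [0,1] with
# n interior nodes and n+1 elements of size h
n = 9; h = 1.0 / (n + 1)
# B[T, z] = integral over element T of the (constant) slope of hat phi_z:
# slope +1/h on the left element and -1/h on the right one, each times h
B = np.zeros((n + 1, n))
for z in range(n):
    B[z, z] = 1.0
    B[z + 1, z] = -1.0
# P1 mass matrix of the interior hat functions: (h/6)*tridiag(1, 4, 1)
M = (h / 6.0) * (4*np.eye(n) + np.eye(n, k=1) + np.eye(n, k=-1))

def div_h(q):
    # defined by (div_h q, v_h) = -(q, grad v_h) for all v_h in S^1_0
    return np.linalg.solve(M, -B.T @ q)
```

For a constant field \(q_h\) we have \((q_h,\nabla v_h)=0\) for every \(v_h\in \fancyscript{S}^1_0\), so the discrete divergence vanishes, which provides a simple consistency check.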
3.5 Nested Iteration
The semismooth and classical Newton methods can only be expected to converge if the starting value \(u_h^0\) is sufficiently close to the discrete solution \(u_h\). The radius of the ball around \(u_h\) containing admissible starting values may depend critically on the mesh-size in the sense that it becomes smaller when the mesh is refined. Such behavior reflects the fact that the Newton scheme may not be well defined for the underlying continuous formulation. When a sequence of refined triangulations is used, the corresponding finite element spaces are nested, and one may use an approximate solution computed on a coarse grid to define a starting value for the iteration process on the finer grid. Besides providing a method to construct feasible starting values, this approach can also significantly reduce the computational effort.
Algorithm 4.6
(Nested iteration) Let \((\fancyscript{T}_\ell )_{\ell = 0,\ldots ,L}\) be a sequence of triangulations with \(\fancyscript{S}^1(\fancyscript{T}_{\ell -1}) \subset \fancyscript{S}^1(\fancyscript{T}_\ell )\) for \(\ell =1,2,\ldots ,L\). Set \(\ell = 0\) and choose \(u_\ell ^0 \in \fancyscript{S}^1 (\fancyscript{T}_\ell )\).
(i) Iteratively approximate a solution \(u_\ell \in \fancyscript{S}^1(\fancyscript{T}_\ell )\) of \(F_\ell (u_\ell )=0\) using the starting value \(u_\ell ^0\) to obtain an approximate solution \(u_\ell ^* \in \fancyscript{S}^1(\fancyscript{T}_\ell )\).
(ii) Stop if \(\ell = L\). Otherwise set \(u_{\ell +1}^0 = u_\ell ^*\), \(\ell \rightarrow \ell +1\), and continue with (i).
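The two steps of Algorithm 4.6 can be sketched as a short driver routine; the solver and prolongation interfaces and the toy usage (nodewise Newton iteration for \(x^2=2\), a hypothetical stand-in for \(F_\ell (u_\ell )=0\)) are our assumptions.

```python
import numpy as np

def nested_iteration(solvers, prolongations, u0):
    """Sketch of Algorithm 4.6: solvers[l](u0) iteratively approximates the
    solution of F_l(u_l) = 0 on level l from the starting value u0, and
    prolongations[l] maps level l to level l+1 (hypothetical interfaces)."""
    u = solvers[0](u0)
    for l in range(1, len(solvers)):
        u = solvers[l](prolongations[l - 1] @ u)
    return u

# toy levels: "solve" x^2 = 2 nodewise by a few Newton steps
def newton_sqrt2(u0, steps=5):
    u = u0.copy()
    for _ in range(steps):
        u = 0.5 * (u + 2.0 / u)   # Newton step for F(x) = x^2 - 2
    return u

# P1 prolongation from a 3-node to a 5-node uniform grid: keep coarse
# nodal values, average the two neighbours at new midpoints
P = np.array([[1, 0, 0], [.5, .5, 0], [0, 1, 0], [0, .5, .5], [0, 0, 1.0]])
u = nested_iteration([newton_sqrt2, newton_sqrt2], [P], np.ones(3))
```

The coarse result serves as the starting value on the fine level, so only a few additional iterations are needed there.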
We make the ideas more precise for a red-green-blue refinement method. The definition is easily generalized to other refinement methods such as newest-vertex bisection.
Definition 4.4
We say that \(\fancyscript{T}_h\) is a refinement of the triangulation \(\fancyscript{T}_H\) if \(\fancyscript{S}^1(\fancyscript{T}_H) \subset \fancyscript{S}^1(\fancyscript{T}_h)\) and for every node \(z^h\in \fancyscript{N}_h\) we either have \(z^h\in \fancyscript{N}_H\) or there exist nodes \(z^H_1,z^H_2\in \fancyscript{N}_H\) with \(z^h = (z^H_1+z^H_2)/2\), cf. Fig. 4.3.
Lemma 4.1
(Prolongation) Let \(\fancyscript{T}_h\) be a refinement of the triangulation \(\fancyscript{T}_H\). Given \(u_H\in \fancyscript{S}^1(\fancyscript{T}_H)\), we have \(u_h=u_H\in \fancyscript{S}^1(\fancyscript{T}_h)\) with nodal values \(u_h(z^h)=u_H(z^h)\) for every \(z^h\in \fancyscript{N}_H\subset \fancyscript{N}_h\) and \(u_h(z^h) = (u_H(z^H_1)+u_H(z^H_2))/2\) for every \(z^h\in \fancyscript{N}_h{\setminus }\fancyscript{N}_H\) and \(z^H_1,z^H_2\in \fancyscript{N}_H\) with \(z^h=(z^H_1+z^H_2)/2\). In particular, there exists a linear prolongation operator
for every \(u_H\in \fancyscript{S}^1(\fancyscript{T}_H)\).
Proof
The assertion of the lemma follows from the fact that the function \(u_h\) is affine on every one-dimensional subsimplex in the triangulation. \(\square \)
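The nodal description of the lemma translates directly into a prolongation matrix. The sketch below assembles it for a uniform 1D grid whose refinement inserts midpoints; the function name and the grid are our illustrative assumptions.

```python
import numpy as np

def prolongation_1d(nH):
    """P1 prolongation matrix from a uniform 1D grid with nH nodes to its
    uniform refinement with 2*nH - 1 nodes (midpoints inserted)."""
    nh = 2 * nH - 1
    P = np.zeros((nh, nH))
    for z in range(nH):
        P[2 * z, z] = 1.0          # coarse node kept: copy the nodal value
    for z in range(nH - 1):
        P[2 * z + 1, z] = 0.5      # new midpoint: average of the
        P[2 * z + 1, z + 1] = 0.5  # two coarse neighbours
    return P

# a coarse P1 function is represented exactly on the fine grid
xH = np.linspace(0.0, 1.0, 5)
uH = 2.0 * xH + 1.0               # affine, hence in every S^1 space
uh = prolongation_1d(5) @ uH
```

Since the coarse function is affine on every fine element, the prolongated coefficient vector reproduces it exactly, illustrating \(\fancyscript{S}^1(\fancyscript{T}_H)\subset \fancyscript{S}^1(\fancyscript{T}_h)\).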
Remarks 4.18
(i) The superscript \(1\) in \(Pr^1_{H\rightarrow h}\) corresponds to affine functions. Analogously, there exists a linear operator \(Pr^0_{H\rightarrow h}\) that maps the values of an elementwise constant function on \(\fancyscript{T}_H\) to the values of the function represented on \(\fancyscript{T}_h\).
(ii) Matrices that realize the linear mappings of the nodal or elementwise values are provided by the routine red_refine.m.
(iii) Nested iterations are the simplest version of a multigrid scheme. In more general versions, a grid transfer from a fine to a coarse grid, called restriction, is required. This is often realized with the adjoint operators, i.e., with the transposed matrices.
(iv) For nonnested finite element spaces the grid transfer can be realized with interpolation or projection operators.
References
Babuška, I.: Error-bounds for finite element method. Numer. Math. 16, 322–333 (1970/1971)
Berger, M.S.: Nonlinearity and Functional Analysis. Academic Press, New York (1977)
Boffi, D., Brezzi, F., Fortin, M.: Mixed Finite Element Methods and Applications. Springer Series in Computational Mathematics, vol. 44. Springer, Heidelberg (2013)
Bonnans, J.F., Gilbert, J.C., Lemaréchal, C., Sagastizábal, C.A.: Numerical Optimization, 2nd edn. Universitext. Springer, Berlin (2006)
Braides, A.: Approximation of Free-Discontinuity Problems. Lecture Notes in Mathematics, vol. 1694. Springer, Berlin (1998)
Brenner, S.C.: A Cautionary Tale in Numerical PDEs. In: Sonia Kovalevskii Lecture, ICIAM 2011. Vancouver, Canada (2011)
Brezzi, F.: On the existence, uniqueness and approximation of saddle-point problems arising from Lagrangian multipliers. Rev. Française Automat. Informat. Recherche Opérationnelle Sér. Rouge 8(R-2), 129–151 (1974)
Costabel, M.: A coercive bilinear form for Maxwell’s equations. J. Math. Anal. Appl. 157(2), 527–541 (1991). http://dx.doi.org/10.1016/0022-247X(91)90104-8
Dacorogna, B.: Direct Methods in the Calculus of Variations. Applied Mathematical Sciences, vol. 78, 2nd edn. Springer, New York (2008)
Dziuk, G., Hutchinson, J.E.: Finite element approximations to surfaces of prescribed variable mean curvature. Numer. Math. 102(4), 611–648 (2006). http://dx.doi.org/10.1007/s00211-005-0649-7
Evans, L.C.: Weak Convergence Methods for Nonlinear Partial Differential Equations. CBMS Regional Conference Series in Mathematics, vol. 74. Published for the Conference Board of the Mathematical Sciences, Washington (1990)
Ito, K., Kunisch, K.: Lagrange Multiplier Approach to Variational Problems and Applications. Advances in Design and Control, vol. 15. Society for Industrial and Applied Mathematics (SIAM), Philadelphia (2008)
Nečas, J.: Sur une méthode pour résoudre les équations aux dérivées partielles du type elliptique, voisine de la variationnelle. Ann. Scuola Norm. Sup. Pisa 16(3), 305–326 (1962)
Nesterov, Y., Nemirovski, A.: On first-order algorithms for \(\ell _1\)/nuclear norm minimization. Acta Numer. 22, 509–575 (2013)
Nochetto, R.H., Savaré, G., Verdi, C.: A posteriori error estimates for variable time-step discretizations of nonlinear evolution equations. Commun. Pure Appl. Math. 53(5), 525–589 (2000)
Repin, S.: A Posteriori Estimates for Partial Differential Equations. Radon Series on Computational and Applied Mathematics, vol. 4. Walter de Gruyter GmbH & Co. KG, Berlin (2008)
Rulla, J.: Error analysis for implicit approximations to solutions to Cauchy problems. SIAM J. Numer. Anal. 33(1), 68–87 (1996). http://dx.doi.org/10.1137/0733005
Struwe, M.: Variational Methods, 4th edn. Springer, Berlin (2008)
© 2015 Springer International Publishing Switzerland
Bartels, S. (2015). Concepts for Discretized Problems. In: Numerical Methods for Nonlinear Partial Differential Equations. Springer Series in Computational Mathematics, vol. 47. Springer, Cham. https://doi.org/10.1007/978-3-319-13797-1_4. Print ISBN: 978-3-319-13796-4. Online ISBN: 978-3-319-13797-1.