1 Introduction

This article investigates the numerical discretization of partial differential equations (PDEs) in variational form for functions whose codomain is a nonlinear Riemannian manifold \(M\). Such problems arise, for example, in Cosserat-type material models [44, 45, 47, 48, 59, 68], liquid crystal physics [2, 32], and image processing [64, 65]. Further, we mention variational splines in manifolds [34], multi-body dynamics [35], and the investigation of harmonic maps into manifolds [18]. In signal processing of manifold-valued signals (see, e.g., [51]), any generalization of a linear variational method leads to a variational problem with values in a manifold.

The numerical approximation of solutions to such PDEs is difficult, because the relevant function spaces do not possess a linear structure. Therefore, standard discretization methods such as finite elements cannot be used directly. Instead, various ad hoc methods have been proposed in the literature to discretize individual PDEs with particular codomains \(M\). For example, to compute harmonic maps into the unit sphere \(S^2\), Bartels and Prohl [7, 9] embedded \(S^2\) into \(\mathbb {R}^3\) and used first-order Lagrangian finite elements, constraining only the vertex values to be in \(S^2\). In [8], this method has been generalized to compact subsurfaces of Euclidean space, which excludes, for instance, the important case of the projective space \(\mathbb {P}^2\). This latter case has been treated in [10]. Other references on the numerical computation of harmonic maps, less related to the present paper, include [2, 40]. In the literature on geometrically exact shells, the direction of the shell surface normal is frequently expressed as a set of angles, and the angles are discretized separately using finite elements [70]. For Cosserat continua (with values in \(\mathbb {R}^3 \times \hbox {SO}(3)\)), an alternative approach, used by Münch [43], Münch et al. [44, 45], and Müller [42], interpolates rotation vectors in \(\mathfrak {so}(3)\) instead of in the group of rotations \(\text {SO(3)}\). Finally, Simo et al. [60, 61] did not interpolate rotations at all. Rather, they kept the orientation at each quadrature point as a history variable and updated it with linear interpolants of the corrections coming from a Newton method.

All these approaches have their shortcomings. Bartels and Prohl rely on an isometric embedding with a corresponding projection. This is only an esthetic problem for spaces such as the unit spheres. However, for others such as the symmetric positive definite matrices (used, for example, in [22, 66]) or the projective space \(\mathbb {P}^2\) (used to model liquid crystals [10]), such a projection is not easily available. Also, it is unclear whether their method can achieve convergence of order higher than one. The approach used by Münch and Müller requires certain ad hoc reparametrizations to properly handle large rotations [43, Sec. 2.5]. Also, the dependence on a fixed tangent space of the codomain breaks objectivity. For the approach by Simo and coworkers [61], Crisfield and Jelenić [15] showed that it introduces a spurious dependence of the solution on the initial iterate and the parameters of the path-following mechanism.

With the notable exception of the work of Bartels and Prohl and of Bartels, who proved weak convergence of their discrete solutions to weakly harmonic maps (see also Remark 7.3 below), no analytical investigations of any of the above discretization methods appear in the literature. Hence, it is generally unknown whether these methods converge and whether the nominal convergence rate of the approximation spaces is actually achieved. For the numerical approximation of explicitly given functions with values in a manifold, several theoretical results have been achieved in recent years [17, 24, 25, 29, 51, 69, 71]. These methods are based on subdivision schemes, and it is unclear how they can be used for solving PDEs.

Recently, geodesic finite elements (GFE) have been introduced for partial differential equations with nonlinear codomains [54–56]. Based on the Karcher mean (or Riemannian center of mass), they form a natural generalization of Lagrangian finite elements of arbitrary order to the case where the codomain \(M\) is a nonlinear Riemannian manifold. Geodesic finite elements do not rely on an embedding of \(M\) into a linear space and form a conforming discretization in the sense that geodesic finite element functions are \(H^1\)-functions [56, Thm. 5.1]. Also, they are equivariant under isometries of \(M\). In mechanics, this leads to the desirable property that discretizations of objective problems are again objective. Note that for interpolation of values on a nonlinear manifold, the Karcher mean has already been used in [12, 41, 50].

In [54–56], numerical studies of the discretization error were performed. These studies involved geodesic finite elements of order up to three for functions mapping into the unit sphere \(S^2\) and the special orthogonal group \(\text {SO}(3)\). In all cases, optimal convergence orders in the \(L^2\)- and \(H^1\)-norms were observed. However, no analytical investigation of the discretization error was given. We make up for this with the present article, providing a complete, intrinsic convergence theory for geodesic finite elements for problems of variational type.

By “variational type,” we mean the following setting. For a domain \(\Omega \subset \mathbb {R}^d\) and \(M\) a Riemannian manifold, we look at minimization problems

$$\begin{aligned} u : \Omega \rightarrow M, \qquad \qquad u ={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in H}}\mathfrak {J}(w), \end{aligned}$$
(1)

with \(\mathfrak {J}: H \rightarrow \mathbb {R}\) a nonlinear functional. The domain \(H\) of \(\mathfrak {J}\) is a set of functions \(\Omega \rightarrow M\) of \(H^1\) smoothness, which we discuss in detail in Sect. 2 (see Definition 2.1 for a definition of \(H^1(\Omega ,M)\)). By construction, GFE functions are \(H^1\) functions, and the set \(V^h\) of GFE functions for a given grid is a subset of \(H\). We can therefore formulate a discrete problem by restricting \(\mathfrak {J}\) to \(V^h\). The discrete solution is

$$\begin{aligned} u^h ={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w^h\in V^h}}\mathfrak {J}(w^h), \end{aligned}$$
(2)

i.e., we minimize the original energy functional over a finite-dimensional subset (in the sense that every element can be described by a finite list of real numbers) of the original set \(H\).
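To make the discrete problem (2) concrete, here is a minimal sketch in the simplest possible setting, all of which is our assumption rather than a setting taken from this article: \(d=1\), \(\Omega =(0,1)\), \(M=S^2\subset \mathbb {R}^3\), \(\mathfrak {J}(w)=\int _\Omega |w'|^2\,dx\) (the Dirichlet energy), and \(V^h\) the continuous functions that are geodesic on each cell of a uniform grid (standing in here for first-order geodesic finite elements; the general construction follows in Chapter 4). On each cell \(|w'|\) is constant, so \(\mathfrak {J}(w^h)=\sum _i \mathrm{dist }(p_i,p_{i+1})^2/h\), and since \(-\mathrm{grad}_p\,\tfrac{1}{2}\mathrm{dist }(p,q)^2=\log (p,q)\) away from the cut locus, a plain Riemannian gradient descent on the interior nodes reads as follows. The solver, the step size `tau`, and all helper names are hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_exp(p, w):
    """exp(p, w) on the unit sphere S^2 in R^3 (w tangent at p)."""
    theta = math.sqrt(dot(w, w))
    if theta < 1e-15:
        return tuple(p)
    q = [math.cos(theta) * pi + math.sin(theta) / theta * wi for pi, wi in zip(p, w)]
    n = math.sqrt(dot(q, q))  # renormalize against round-off drift
    return tuple(qi / n for qi in q)

def sphere_log(p, q):
    """log(p, q) on S^2; assumes q is not antipodal to p."""
    c = dot(p, q)
    v = [qi - c * pi for pi, qi in zip(p, q)]
    nv = math.sqrt(dot(v, v))
    if nv < 1e-15:
        return (0.0, 0.0, 0.0)
    return tuple(math.atan2(nv, c) / nv * vi for vi in v)

def discrete_energy(nodes, h):
    """Dirichlet energy of the piecewise-geodesic function through `nodes`."""
    return sum(dot(w, w) for w in (sphere_log(a, b) for a, b in zip(nodes, nodes[1:]))) / h

def minimize_energy(nodes, steps=2000, tau=0.25):
    """Riemannian gradient descent on interior nodes; boundary nodes stay fixed."""
    nodes = [tuple(p) for p in nodes]
    for _ in range(steps):
        new = list(nodes)
        for i in range(1, len(nodes) - 1):
            # descent direction log(p_i, p_{i-1}) + log(p_i, p_{i+1}): the negative
            # gradient of the energy up to the factor 2/h, which tau absorbs
            a = sphere_log(nodes[i], nodes[i - 1])
            b = sphere_log(nodes[i], nodes[i + 1])
            new[i] = sphere_exp(nodes[i], tuple(tau * (ai + bi) for ai, bi in zip(a, b)))
        nodes = new
    return nodes

# Hypothetical test problem: 6 nodes on (0, 1), boundary values on the equator,
# interior nodes initially lifted off the equator.
n_cells = 5
h = 1.0 / n_cells
angles = [0.5 * math.pi * i / n_cells for i in range(n_cells + 1)]
start = []
for i, a in enumerate(angles):
    z = 0.0 if i in (0, n_cells) else 0.3
    p = (math.cos(a), math.sin(a), z)
    norm_p = math.sqrt(dot(p, p))
    start.append(tuple(pi / norm_p for pi in p))
result = minimize_energy(start)
```

In this example the interior nodes move to equally spaced points on the great circle through the two boundary values, which is the discrete minimizer, and the discrete energy approaches \(\pi ^2/4\).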

As in the linear case, the error of this numerical procedure is assessed in two steps. First, under an ellipticity assumption on the energy \(\mathfrak {J}\), we show that \(u^h\) is a quasioptimal solution in the approximation space \(V^h\), that is, the error between \(u^h\) and \(u\) is comparable to the approximation power of the space \(V^h\) (inspired by the linear theory, we call such a result a Céa lemma). As it turns out, such a result can be proved easily in general metric spaces, using only certain convexity properties of the energy along geodesics (see Theorem 3.1). However, for the crucial \(H^1\)-type distance, this convexity is difficult to verify in practice. We therefore also give a more elaborate result (Theorem 3.3), which allows us to bound the \(H^1\) distance using variations of the energy along geodesic homotopies. The results are independent of the construction of geodesic finite elements and also cover other discretization methods.

Then, in a second step, the approximation power of the GFE spaces is assessed. In Theorem 5.4, we find that, provided that the solution \(u\) has a certain smoothness, the best approximation error of \(u\) in \(V^h\) decays like a power of the mesh size \(h\). We obtain the same orders as in the corresponding linear cases. All our arguments are completely intrinsic, and the dependence of the approximation quality on the geometry of \(M\) is given via iterated covariant derivatives of the logarithm mapping of \(M\).

Combining these two results yields optimal convergence orders for the discretization error of geodesic finite element discretizations of general nonlinear elliptic variational problems (1) in Theorems 6.1 and 6.2. Compared with known results in the linear setting, the only important additional restriction of our results is that we require the solution \(u\) to lie in a Sobolev space that is embedded in the space of continuous functions, which is a common minimal assumption for manifold-valued problems. As an application, we give optimal a priori error estimates for GFE discretizations of harmonic maps in Theorem 7.1 under certain assumptions on the sectional curvature of \(M\).

We would like to emphasize that the two aforementioned results, viz. the nonlinear Céa lemmas and the interpolation error estimate, are highly interesting in their own right. For instance, the Céa lemmas apply to approximation spaces other than GFE spaces, for example, the interpolation method used in [42, 43] or projection-based approximation spaces as in [7, 28]. The interpolation error estimates are also useful in the general context of approximating manifold-valued functions (see, e.g., [4, 51]).

A delicate issue is the proper choice of error measures in a nonlinear function space. In the classical theory of a priori bounds in linear spaces, a Sobolev-type half-norm \({|u|}\) of the solution \(u\) bounds the error \(||u-u^h||\). Since there is no subtraction defined on the set \(H\), we need to replace \(||u-u^h||\) by a suitable distance metric in the function space \(H\). We present two such metrics in Sect. 2.2, which reduce to \(||u-u^h||_{H^1}\) if \(M\) is a linear space.

To generalize the term \({|u|}\), the covariant Sobolev half-norm is an obvious choice. However, in our expression for the interpolation error, terms appear that cannot be controlled by a Sobolev half-norm alone. In Sect. 2.4, we therefore introduce a slightly stronger concept, which we call the smoothness descriptor. We show that it provides information that is comparable to the actual Sobolev (half-)norms, but it does differ from them even in linear spaces. The question of whether our bounds also hold for covariant Sobolev norms is open.

We have structured the article as follows. In Chapter 2, we discuss the nonlinear spaces made up of functions \(\Omega \rightarrow M\) of Sobolev smoothness. We propose two distance notions and introduce the smoothness descriptor. In Chapter 3, we prove different forms of a nonlinear Céa lemma. Only then are geodesic finite elements introduced, in Chapter 4. The second important part of the proof, the interpolation error bound, is shown in Chapter 5. This allows us to state a priori bounds for the discretization error for the discrete problem (2) in Chapter 6. Finally, in Chapter 7, we apply our results to harmonic maps and some of their generalizations. Under some regularity and curvature assumptions, we obtain optimal error bounds for discrete harmonic maps of all approximation orders.

2 Nonlinear Function Spaces

Describing regularity of functions with a nonlinear codomain is a much less unified field than the corresponding linear theory. We introduce the notions that will be used in this article.

2.1 Sobolev Spaces

The content of this subsection follows the standard definition of manifold-valued Sobolev spaces, see, for instance, [11, 39, 49]. Let \(\Omega \subset \mathbb {R}^{d}\) be open and bounded with Lipschitz boundary. On \(\Omega \), we use canonical coordinates \(x^1,\dots ,x^d\). We use the notation \(\partial ^{\vec {k}}\) for the (weak) partial derivative of a \(d\)-variate function with respect to the multi-index \(\vec {k} = (k_1,\dots , k_d)\in \mathbb {N}_0^d\), i.e.,

$$\begin{aligned} \partial ^{\vec {k}} = \frac{\partial ^{|\vec {k}|}}{\left( \partial x^{d}\right) ^{k_d}\dots \left( \partial x^{1}\right) ^{k_1}}, \end{aligned}$$

where we have written \({|\vec {k}|} :=k_1 + \cdots + k_d\). For a function \(v : \Omega \rightarrow \mathbb {R}\) and an integrability parameter \(p\in [1,\infty )\), we define the usual Sobolev half norms and norms

$$\begin{aligned} {|v|}_{W^{k,p}}^p :=\int \limits _{\Omega } \sum _{{|\vec {k}|}=k}|\partial ^{\vec {k}} v(x)|^p\,dx, \qquad \qquad \Vert v\Vert _{W^{k,p}}^p :=\sum _{j=0}^k {|v|}_{W^{j,p}}^p. \end{aligned}$$
(3)

We denote by \(W^{k,p}(\Omega ,\mathbb {R}^N)\) the set of measurable functions \(\Omega \rightarrow \mathbb {R}^N\) for which this quantity is finite componentwise. This set of functions forms a linear space. As an extension, the space \(W^{k,\infty }(\Omega ,\mathbb {R}^N)\) is defined as the set of all measurable functions \(\Omega \rightarrow \mathbb {R}^N\) for which

$$\begin{aligned} ||v||_{W^{k,\infty }} :=\sum _{{|\vec {k}|}\le k} \mathop {\mathrm {ess\,sup}}\limits _{x \in \Omega } {|\partial ^{\vec {k}} v(x)|} \end{aligned}$$

is finite. For a simpler notation, we will sometimes write \(H^k(\Omega ,\mathbb {R}^N)\) for \(W^{k,2}(\Omega ,\mathbb {R}^N)\).

Let now \((M,g)\) be an \(n\)-dimensional Riemannian manifold with scalar product \(\langle \cdot ,\cdot \rangle _g\) and induced distance \(\mathrm{dist }: M\times M\rightarrow \mathbb {R}^+\). The following definition of a Sobolev space for functions with values in \(M\) is standard (see, e.g., [57]).

Definition 2.1

Let \(i:M\rightarrow \mathbb {R}^N\) be an isometric embedding (which always exists by [46]), \(k \in \mathbb {N}_0\) and \(p \in \mathbb {N} \cup \{\infty \}\). Define

$$\begin{aligned} W^{k,p}(\Omega ,M):=\left\{ v\in W^{k,p}(\Omega ,\mathbb {R}^N):\, v(x)\in i(M),\ \hbox {a.e.}\right\} . \end{aligned}$$

Again we will write \(H^k(\Omega ,M)\) for \(W^{k,2}(\Omega ,M)\). We shall also use the notation \(C(\Omega ,M)\) to denote continuous functions from \(\Omega \) to \(M\).

For nonlinear \(M\), these spaces obviously do not form vector spaces. However, under certain smoothness conditions, the manifold structure of \(M\) is inherited. The following result is proved in [49].

Lemma 2.1

If \(k>d/p\), the spaces \(W^{k,p}(\Omega ,M)\) are Banach manifolds.

Unfortunately, this lemma excludes the important case of \(W^{1,2}(\Omega ,M)\) with \(d \ge 2\). However, even when \(W^{k,p}(\Omega ,M)\) is not a manifold, we can still consider vector fields that are attached to a general continuous \(M\)-valued function.

Definition 2.2

Let \(u \in C(\Omega ,M)\). We say that \(W : \Omega \rightarrow TM\) is a vector field along \(u\) if \(W(x) \in T_{u(x)}M\) for all \(x\in \Omega \). The set of all vector fields along \(u\) is denoted by \(u^{-1}TM\).

For each continuous \(u :\Omega \rightarrow M\), the set \(u^{-1}TM\) forms a linear space which we now equip with two norms. The first is of \(L^p\)-type.

Definition 2.3

Let \(u\in C(\Omega ,M)\). For a vector field \(W \in u^{-1}TM\), and \(p \in [1,\infty ]\), we set

$$\begin{aligned} |W|_{L^p}^p :=\int \limits _{\Omega }{| W(x) |}^p_{g(u(x))} dx, \end{aligned}$$

with the obvious modifications for \(p=\infty \).

The second one is a \(W^{1,2}\)-type norm, involving derivatives with respect to \(x\). With \(\frac{D}{dx^\alpha }\), we denote the covariant partial derivative along \(u\) with respect to \(x^\alpha \). In coordinates on \(\Omega \) and \(M\), it reads

$$\begin{aligned} \frac{D}{dx^\alpha }W^{l}(x):=\frac{dW^l}{dx^\alpha }(x)+{\Gamma }_{ij}^{l}(u(x))\frac{du^{i}}{dx^\alpha }W^{j}(x), \end{aligned}$$

where we sum over repeated indices and denote with \({\Gamma }_{ij}^{l}\) the Christoffel symbols associated with the metric of \(M\).
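When \(M\subset \mathbb {R}^N\) is a submanifold carrying the induced metric, the covariant derivative along \(u\) can be computed without Christoffel symbols: it is the tangential projection of the ordinary derivative of the ambient vector field (Gauss formula). A minimal sketch, assuming \(d=1\), \(M=S^2\subset \mathbb {R}^3\) (so the projection onto \(T_pS^2\) is \(w\mapsto w-\langle w,p\rangle p\)), and a central difference in \(x\); the helper `covariant_derivative` and the example fields are hypothetical.

```python
import math

def covariant_derivative(u, W, x, h=1e-5):
    """Approximate D/dx W(x) for a vector field W along u: (0, 1) -> S^2 in R^3.

    The ambient derivative dW/dx is taken by a central difference and then
    projected onto the tangent plane T_{u(x)} S^2 = {w : <w, u(x)> = 0}.
    """
    dW = [(a - b) / (2 * h) for a, b in zip(W(x + h), W(x - h))]
    p = u(x)
    c = sum(a * b for a, b in zip(dW, p))
    return tuple(a - c * pi for a, pi in zip(dW, p))

def u(x):
    """Hypothetical map: unit-speed parametrization of the equator."""
    return (math.cos(x), math.sin(x), 0.0)

def normal_field(x):
    """The constant field e_3, tangent to S^2 at every point of the equator."""
    return (0.0, 0.0, 1.0)

def velocity(x):
    """u'(x), the velocity field of the equator."""
    return (-math.sin(x), math.cos(x), 0.0)

def scaled_velocity(x):
    """x * u'(x); by the product rule, D/dx (x u') = u' + x D/dx u' = u'."""
    return tuple(x * c for c in velocity(x))

def norm(v):
    return math.sqrt(sum(c * c for c in v))
```

Both `normal_field` and `velocity` are parallel along the equator (the latter because the equator is a geodesic), so their covariant derivatives vanish, although the ordinary derivative of `velocity` is \(-u(x)\neq 0\).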

Definition 2.4

Let \(u\in W^{1,q}(\Omega ,M)\) with \(q>\max (2,d)\) and assume that the coordinate functions associated with the vector field \(W \in u^{-1}TM\) are in \(H^1(\Omega ,\mathbb {R})\). We set

$$\begin{aligned} {|W|}^2_{H^1} :={|W|}^2_{L^2} + \int \limits _\Omega \big |\nabla _xW(x) \big |^2_g dx :={|W|}^2_{L^2} + \sum _{\alpha =1}^{d} \int \limits _\Omega \bigg |\frac{D}{dx^\alpha }W(x) \bigg |^2_{g(u(x))} dx. \end{aligned}$$
(4)

Observe that by the smoothness assumption on \(u\) and the Sobolev embedding theorem the \(H^1\)-norm is indeed well defined by (4). For this norm, we can show the following version of the Poincaré inequality.

Lemma 2.2

Let \(u\in W^{1,q}(\Omega ,M)\) for \(q>\max (2,d)\), and assume that \(W\in u^{-1}TM\) with \(W\big |_{\partial \Omega } = 0\). Then, we have

$$\begin{aligned} {|W|}_{L^2}^2 \le C_1(\Omega ) \sum _{\alpha =1}^{d}\int \limits _{\Omega }\left| \frac{D}{dx^\alpha } W(x)\right| _{g(u(x))}^{2} dx, \end{aligned}$$

with \(C_1(\Omega )\) the Poincaré constant of the domain \(\Omega \).

Proof

By the Poincaré inequality for \(f:x\mapsto |W(x)|_{g(u(x))}\in \mathbb {R}\), we get

$$\begin{aligned} {|W|}_{L^2}^2 = \int \limits _{\Omega }\left| W(x)\right| _{g(u(x))}^{2} dx=\Vert f\Vert _{L^{2}}^2\le C_1 \sum _{\alpha =1}^{d}\left\| \frac{df}{dx^\alpha }\right\| _{L^{2}}^2\;. \end{aligned}$$

Using the compatibility of the covariant derivative with the metric and the Cauchy–Schwarz inequality for \(g\), we may then calculate, at every \(x\) with \(W(x)\ne 0\),

$$\begin{aligned} \left| \frac{df}{dx^{\alpha }}(x)\right| =\frac{\big |\langle W(x), \frac{D}{dx^{\alpha }}W(x)\rangle _{g(u(x))}\big |}{|W(x)|_{g(u(x))}} \le \left| \frac{D}{dx^{\alpha }}W(x)\right| _{g(u(x))}, \end{aligned}$$

and the assertion follows. \(\square \)

We will frequently work with functions whose \(W^{1,q}\)-norms are bounded by a fixed constant \(K>0\). We therefore introduce for \(q\in (d,\infty )\) the notation

$$\begin{aligned} W^{1,q}_K = W^{1,q}_K(\Omega ,M) :=\left\{ v\in W^{1,q}(\Omega ,M): \max _{\alpha = 1, \dots , d}\left( \int \limits _\Omega \left| \frac{d}{dx^\alpha }v(x)\right| ^q_{g(v(x))} dx\right) ^{1/q} \le K \right\} , \end{aligned}$$
(5)

with obvious modifications for \(q=\infty \). The sets \(W^{1,q}_K\) are a manifold-valued analog of \(K\)-balls in Sobolev spaces. Note that functions in \(W^{1,q}_K\) are necessarily continuous for \(q>\max (2,d)\).

2.2 Distance Measures in Nonlinear Function Spaces

To quantify the error between a function \(u \in W^{k,p}(\Omega ,M)\) and an approximation \(v\) of \(u\) in the same space, we need a distance measure on the nonlinear function space \(W^{k,p}(\Omega ,M)\). This subsection discusses different distance measures in manifold-valued Sobolev spaces and their relation to each other. We suspect that these results are not new but were unable to find a reference for them.

There are several ways to construct such a distance. The simplest one uses the embedding \(i\) from Definition 2.1, by which the space \(W^{k,p}(\Omega ,M)\) was defined.

Definition 2.5

For all \(u,v \in W^{k,p}(\Omega ,M)\) define

$$\begin{aligned} \mathrm{dist }_\mathrm{{emb},W^{k,p}} (u,v) :=||i(u)-i(v)||_{W^{k,p}}. \end{aligned}$$
(6)

Since \(i\) is an isometry, the definition yields a metric. Also, it equals the standard Sobolev distance if \(M\) is a linear space.
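The embedded distance (6) is straightforward to evaluate numerically. A minimal sketch for \(k=1\), \(p=2\), assuming \(d=1\), \(\Omega =(0,1)\), \(M=S^2\) with \(i\) the inclusion into \(\mathbb {R}^3\), the midpoint rule, and central differences; the helper names and the pair of test maps (a constant map to the north pole and a map onto the circle of polar angle \(\theta _0=1\)) are hypothetical. For this pair, \(|i(u)-i(v)|^2 = 2-2\cos \theta _0\) and \(|v'|=\sin \theta _0\) pointwise, which gives a closed form to check against.

```python
import math

THETA0 = 1.0  # polar angle of the hypothetical circle

def north(x):
    """Constant map to the north pole."""
    return (0.0, 0.0, 1.0)

def ring(x):
    """Map onto the circle of polar angle THETA0, parametrized by azimuth x."""
    s = math.sin(THETA0)
    return (s * math.cos(x), s * math.sin(x), math.cos(THETA0))

def dist_emb_H1(u, v, n=200, h=1e-5):
    """||i(u) - i(v)||_{H^1(0,1)} with i the inclusion S^2 -> R^3."""
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n  # midpoint rule
        e = [a - b for a, b in zip(u(x), v(x))]
        ep = [a - b for a, b in zip(u(x + h), v(x + h))]
        em = [a - b for a, b in zip(u(x - h), v(x - h))]
        de = [(a - b) / (2 * h) for a, b in zip(ep, em)]  # central difference
        total += (sum(c * c for c in e) + sum(c * c for c in de)) / n
    return math.sqrt(total)
```

Being a metric, this distance is symmetric in \(u\) and \(v\), in contrast to the quantity \(D_{1,2}\) of Definition 2.8.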

This distance is convenient to evaluate and is defined even for functions \(u,v\) of low smoothness. However, it is esthetically somewhat unsatisfactory, because it depends on the embedding \(i\). A purely intrinsic distance can be defined using minimizing paths.

Definition 2.6

Let \(H\) be a set of functions \(\Omega \rightarrow M\), and \(u,v \in H\). Suppose there is at least one continuously differentiable path \(\gamma \) from \(u\) to \(v\) in \(H\). We denote by \(\dot{\gamma } : \Omega \times [0,1] \rightarrow TM\) the push-forward of \(\frac{d}{dt}\) along \(\gamma \), i.e., the vector field defined by

$$\begin{aligned} \dot{\gamma }(x,t):=\frac{d}{dt}\gamma (x,t)\in T_{\gamma (x,t)}M. \end{aligned}$$

For each \(\gamma (t) \in H\), let there be a norm \({|\cdot |}_G\) on the space of vector fields along \(\gamma (t)\) and define

$$\begin{aligned} \mathrm{dist }_G(u,v) :=\inf _{\gamma \text { path from } u \text { to } v} \int \limits _0^1{|\dot{\gamma }(t)|}_G\, dt. \end{aligned}$$

For each norm \({|\cdot |}_G\), we obtain a corresponding distance.

Definition 2.7

For each \(u,v \in C(\Omega ,M)\) and \(p\in [1,\infty )\) define

$$\begin{aligned} \mathrm{dist }_{L^p}(u,v) :=\inf _{\gamma \text { path from } u \text { to } v} \int \limits _0^1{|\dot{\gamma }(t)|}_{L^p}\,dt = \left( \int \limits _{\Omega }\mathrm{dist }(u(x),v(x))^p\, dx\right) ^{1/p} \end{aligned}$$

and

$$\begin{aligned} \mathrm{dist }_{L^\infty }(u,v) :=\inf _{\gamma \text { path from } u \text { to } v} \int \limits _0^1{|\dot{\gamma }(t)|}_{L^\infty }\,dt=\sup _{x\in \Omega }\mathrm {dist}(u(x),v(x)). \end{aligned}$$

Finally, for each \(u,v \in W^{1,q}(\Omega ,M)\), \(q>\max (2,d)\), define

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,v) :=\inf _{\gamma \text { path from } u \text { to } v} \int \limits _0^1{|\dot{\gamma }(t)|}_{H^1}\, dt. \end{aligned}$$
(7)
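By the pointwise formulas in Definition 2.7, \(\mathrm{dist }_{L^p}\) and \(\mathrm{dist }_{L^\infty }\) only require pointwise geodesic distances on \(M\). A minimal sketch, assuming \(d=1\), \(\Omega =(0,1)\), \(M=S^2\) (where \(\mathrm{dist }(p,q)=\operatorname{atan2}(|p\times q|,\langle p,q\rangle )\)), and the midpoint rule; the supremum is approximated by a maximum over quadrature points. The maps and helper names are hypothetical.

```python
import math

def sphere_dist(p, q):
    """Geodesic distance on S^2 in R^3: atan2(|p x q|, <p, q>)."""
    cross = (p[1] * q[2] - p[2] * q[1], p[2] * q[0] - p[0] * q[2], p[0] * q[1] - p[1] * q[0])
    return math.atan2(math.sqrt(sum(c * c for c in cross)), sum(a * b for a, b in zip(p, q)))

def dist_Lp(u, v, p, n=1000):
    """(int_0^1 dist(u(x), v(x))^p dx)^(1/p) via the midpoint rule with n cells."""
    total = sum(sphere_dist(u((k + 0.5) / n), v((k + 0.5) / n)) ** p for k in range(n)) / n
    return total ** (1.0 / p)

def dist_Linf(u, v, n=1000):
    """sup_x dist(u(x), v(x)), approximated by the maximum over the midpoints."""
    return max(sphere_dist(u((k + 0.5) / n), v((k + 0.5) / n)) for k in range(n))

# Hypothetical example maps into the equator of S^2.
def u(x):
    return (math.cos(x), math.sin(x), 0.0)

def v_shift(x):
    return (math.cos(x + 0.3), math.sin(x + 0.3), 0.0)  # dist(u(x), v_shift(x)) = 0.3

def v_stretch(x):
    return (math.cos(1.3 * x), math.sin(1.3 * x), 0.0)  # dist(u(x), v_stretch(x)) = 0.3 x
```

For `v_stretch`, for instance, \(\mathrm{dist }_{L^1}=0.15\) and \(\mathrm{dist }_{L^2}=0.3/\sqrt{3}\).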

The minimizing curves with respect to \(\mathrm{dist }_{L^2}\) are called geodesic homotopies. They have the following useful property.

Remark 2.1

Let \(\gamma : [0,1] \rightarrow C(\Omega ,M)\) be a geodesic homotopy. Then, for each \(x \in \Omega \), the curve \(\gamma (x,\cdot )\) is a geodesic on \(M\).

Two functions that can be connected by a geodesic homotopy are called geodesically homotopic.
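As in the proof of Lemma 2.5 below, a geodesic homotopy can be written as \(\Gamma (x,t)=\exp (u(x),t\log (u(x),v(x)))\); each curve \(t\mapsto \Gamma (x,t)\) is then a geodesic with constant speed \(\mathrm{dist }(u(x),v(x))\). A minimal sketch checking this numerically, assuming \(M=S^2\) with the standard closed-form exponential and logarithm maps, \(d=1\), and a central difference in \(t\); the maps `u` and `v` are hypothetical examples with equal azimuth, so that \(\mathrm{dist }(u(x),v(x))=\pi /2-(0.5+0.3x)\).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_exp(p, w):
    """exp(p, w) on S^2 (w tangent at p)."""
    theta = math.sqrt(dot(w, w))
    if theta < 1e-15:
        return tuple(p)
    return tuple(math.cos(theta) * pi + math.sin(theta) / theta * wi for pi, wi in zip(p, w))

def sphere_log(p, q):
    """log(p, q) on S^2; assumes q is not antipodal to p."""
    c = dot(p, q)
    v = [qi - c * pi for pi, qi in zip(p, q)]
    nv = math.sqrt(dot(v, v))
    if nv < 1e-15:
        return (0.0, 0.0, 0.0)
    return tuple(math.atan2(nv, c) / nv * vi for vi in v)

def u(x):
    """Hypothetical map onto the equator (polar angle pi/2, azimuth x)."""
    return (math.cos(x), math.sin(x), 0.0)

def v(x):
    """Hypothetical map with the same azimuth and polar angle 0.5 + 0.3 x."""
    a = 0.5 + 0.3 * x
    return (math.sin(a) * math.cos(x), math.sin(a) * math.sin(x), math.cos(a))

def homotopy(x, t):
    """Gamma(x, t) = exp(u(x), t log(u(x), v(x)))."""
    w = sphere_log(u(x), v(x))
    return sphere_exp(u(x), tuple(t * wi for wi in w))

def speed(x, t, delta=1e-5):
    """|d/dt Gamma(x, t)| via a central difference in t (ambient R^3 norm)."""
    a, b = homotopy(x, t + delta), homotopy(x, t - delta)
    return math.sqrt(sum(((ai - bi) / (2 * delta)) ** 2 for ai, bi in zip(a, b)))
```

The speed is independent of \(t\), and \(\Gamma (x,1)=v(x)\), as expected for a geodesic homotopy from \(u\) to \(v\).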

Defining a distance via minimizing paths is very elegant, but the resulting distance can be difficult to work with. Inside our proofs, we will therefore frequently use a third error measure. It has much less mathematical structure than the two distance notions introduced above. However, we show below that it bounds both the embedded and the path-induced distance from above.

For the definition, we need the exponential map \(\exp (\cdot ,\cdot )\) of \(M\), as well as its inverse \(\log (\cdot ,\cdot )\). For both maps, the first argument denotes the base point \(p \in M\). That is, \(\exp (p,\cdot ) : T_pM \rightarrow M\) and \(\log (p,\cdot ) : M \supset U \rightarrow T_pM\).
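For \(M=S^2\subset \mathbb {R}^3\), both maps have well-known closed forms: \(\exp (p,w)=\cos |w|\,p+\sin |w|\,w/|w|\), and \(\log (p,q)=\theta \,(q-\langle p,q\rangle p)/|q-\langle p,q\rangle p|\) with \(\theta =\mathrm{dist }(p,q)\), where \(\log (p,\cdot )\) is defined on \(U=S^2\setminus \{-p\}\). A minimal sketch (helper names are ours); `atan2` is used instead of `acos` for accuracy when \(q\) is close to \(p\).

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_exp(p, w):
    """exp(p, w) = cos|w| p + sin|w| w/|w| on S^2; w must be tangent at p."""
    theta = math.sqrt(dot(w, w))
    if theta < 1e-15:
        return tuple(p)
    q = [math.cos(theta) * pi + math.sin(theta) / theta * wi for pi, wi in zip(p, w)]
    n = math.sqrt(dot(q, q))  # renormalize to counter round-off drift off the sphere
    return tuple(qi / n for qi in q)

def sphere_log(p, q):
    """log(p, q) on S^2: the tangent vector at p of length dist(p, q) pointing to q."""
    c = dot(p, q)
    v = [qi - c * pi for pi, qi in zip(p, q)]  # part of q orthogonal to p
    nv = math.sqrt(dot(v, v))
    if nv < 1e-15:
        if c < 0:
            raise ValueError("q is antipodal to p: log(p, .) is undefined there")
        return (0.0, 0.0, 0.0)
    theta = math.atan2(nv, c)  # = dist(p, q)
    return tuple(theta / nv * vi for vi in v)
```

A round trip \(\exp (p,\log (p,q))\) recovers \(q\), and \(\log (p,q)\) is orthogonal to \(p\), i.e., tangent to \(S^2\) at \(p\).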

Definition 2.8

Let \(u,v\in W^{1,q}(\Omega ,M)\) for \(q>\max (2,d)\). Define the quantity

$$\begin{aligned} D_{1,2}(u,v)^2\!:=\!\int \limits _{\Omega }\left| \log (u(x),v(x))\right| ^2_{g(u(x))} dx + \sum _{\alpha =1}^d\int \limits _{\Omega } \left| \frac{D}{dx^\alpha }\log (u(x),v(x)) \right| ^2_{g(u(x))} dx. \end{aligned}$$
(8)

In the linear case, this definition coincides with the usual \(H^1\) error. It is, however, not a metric, since it is neither symmetric nor does it fulfill the triangle inequality.
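The asymmetry can be seen in a small example. A minimal sketch, assuming \(d=1\), \(\Omega =(0,1)\), \(M=S^2\), the midpoint rule, and a central difference followed by tangential projection for \(\frac{D}{dx}\); the maps (a constant map to the north pole and a map onto the circle of polar angle \(\theta _0=1\)) and helper names are hypothetical. The \(L^2\) parts agree because \(|\log (p,q)|=|\log (q,p)|=\mathrm{dist }(p,q)\), but the derivative parts differ: one finds \(D_{1,2}(\text {north},\text {ring})^2=2\theta _0^2\) and \(D_{1,2}(\text {ring},\text {north})^2=\theta _0^2(1+\cos ^2\theta _0)\).

```python
import math

THETA0 = 1.0  # polar angle of the hypothetical ring map

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_log(p, q):
    """log(p, q) on S^2 in R^3; assumes q is not antipodal to p."""
    c = dot(p, q)
    w = [qi - c * pi for pi, qi in zip(p, q)]
    nw = math.sqrt(dot(w, w))
    if nw < 1e-15:
        return (0.0, 0.0, 0.0)
    return tuple(math.atan2(nw, c) / nw * wi for wi in w)

def north(x):
    """Constant map to the north pole."""
    return (0.0, 0.0, 1.0)

def ring(x):
    """Map onto the circle of polar angle THETA0, parametrized by azimuth x."""
    s = math.sin(THETA0)
    return (s * math.cos(x), s * math.sin(x), math.cos(THETA0))

def D12(u, v, n=200, h=1e-5):
    """D_{1,2}(u, v) from (8) for maps (0, 1) -> S^2 (d = 1), midpoint rule."""
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        p = u(x)
        w = sphere_log(p, v(x))
        # ambient derivative of x -> log(u(x), v(x)) by a central difference ...
        wp = sphere_log(u(x + h), v(x + h))
        wm = sphere_log(u(x - h), v(x - h))
        dw = [(a - b) / (2 * h) for a, b in zip(wp, wm)]
        # ... projected onto T_{u(x)} S^2 gives the covariant derivative along u
        c = dot(dw, p)
        cov = [a - c * pi for a, pi in zip(dw, p)]
        total += (dot(w, w) + dot(cov, cov)) / n
    return math.sqrt(total)
```

With \(\theta _0=1\), this gives approximately \(1.414\) in one direction and \(1.137\) in the other.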

The following lemma states that \(D_{1,2}(u,v)\) provides an upper bound for \(\Vert i(u)-i(v)\Vert _{H^1}\) for \(u,v\in W^{1,q}_K\) as defined in (5). In the following, we will write \(A \lesssim B\) to say that a quantity \(A\) is bounded by a quantity \(B\) times a constant. If also the converse estimate holds, we will sometimes write \(A\sim B\).

Lemma 2.3

For \(u,v\in W^{1,q}_K\) with \(q>\max (2,d)\) and \(M\) isometrically embedded into Euclidean space, we have the estimate

$$\begin{aligned} \Vert i(u)-i(v)\Vert _{H^1}\lesssim D_{1,2}(u,v), \end{aligned}$$

with the implicit constant only depending on \(K\), the embedding \(i\), and the geometry of \(M\).

Proof

For simplicity, we abuse notation and write \(i(u) = u\), \(i(v) = v\). Clearly, we have

$$\begin{aligned} |u(x) - v(x)|\le \mathrm{dist }(u(x),v(x)) = \left| \log (u(x),v(x))\right| _{g(u(x))} \qquad \text {for almost all } x\; \in \Omega , \end{aligned}$$

which takes care of the first term in the definition of \(\Vert \cdot \Vert _{H^1}\). For the term associated with the derivative, we write \(v(x) = \exp (u(x),\log (u(x),v(x)))\) and compute, using the notation \(\partial _1\exp (p,w) = \frac{d}{dp}\exp (p,w)\), \(\partial _2\exp (p,w) = \frac{d}{dw}\exp (p,w)\), that

$$\begin{aligned} \frac{d}{dx^\alpha }v(x)&= \partial _1\exp \big (u(x),\log (u(x),v(x))\big )\frac{d}{dx^\alpha }u(x)\\&+\, \partial _2\exp \big (u(x),\log (u(x),v(x))\big )\frac{D}{dx^\alpha }\log (u(x),v(x)). \end{aligned}$$

Then, since \(\partial _1\exp (p,0)w = w\) for all \(p\in M,\ w\in T_{p}M\), we have

$$\begin{aligned} \frac{d}{dx^\alpha }u(x) =\partial _1\exp (u(x),0)\frac{d}{dx^\alpha }u(x). \end{aligned}$$

Hence, we can write the difference \(\frac{d}{dx^\alpha }u(x) - \frac{d}{dx^\alpha }v(x)\) as \(\mathrm {I} - \mathrm {II}\), with

$$\begin{aligned} \mathrm {I}:=\Big [\partial _1\exp (u(x),0) -\partial _1\exp \big (u(x),\log (u(x),v(x))\big )\Big ]\frac{d}{dx^\alpha }u(x) \end{aligned}$$

and

$$\begin{aligned} \mathrm {II} :=\partial _2\exp \big (u(x),\log (u(x),v(x))\big )\frac{D}{dx^\alpha }\log (u(x),v(x)). \end{aligned}$$

The quantity \(\mathrm {II}\) can be bounded in modulus by \(\big |\frac{D}{dx^\alpha }\log (u(x),v(x))\big |_{g(u(x))}\), up to a constant. By the Lipschitz continuity of \(\partial _1\exp \) in its second argument and the fact that \(u\in W^{1,q}_K\) with \(q>\max (2,d)\) by assumption, we can use the Hölder inequality to bound the \(L^2\)-norm of \(\mathrm {I}\) up to a constant by \(||\log (u(\cdot ),v(\cdot ))||_{L^{r}}\) for some \(r<2d/(d-2)\) (any finite \(r\) for \(d=2\); for \(d=1\) we can put \(r=\infty \)), which, by the Sobolev embedding theorem, is bounded by \(D_{1,2}(u,v)\). This proves the statement. \(\square \)

Using a uniformity property of geodesic homotopies (Lemma 2.5, which we prove in the following subsection), we can show that \(D_{1,2}\) also bounds the distance \(\mathrm{dist }_{W^{1,2}}\) introduced in Definition 2.7.

Lemma 2.4

For each \(u,v \in W^{1,q}_K\) with \(q>\max (2,d)\), we have

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,v)\le C_2 D_{1,2}(u,v), \end{aligned}$$

where \(C_2\) is the constant defined in (9).

Proof

Let \(\Gamma \) be a geodesic homotopy (\(L^2\)-geodesic) from \(u\) to \(v\). Then,

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,v) \le \int \limits _0^1 {|\dot{\Gamma }(t)|}_{H^1}\, dt \le \sup _{t \in [0,1]} {|\dot{\Gamma }(t)|}_{H^1} \le C_2 \inf _{t \in [0,1]} {|\dot{\Gamma }(t)|}_{H^1}, \end{aligned}$$

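where the last inequality is proved in Lemma 2.5. Since \(\dot{\Gamma }(x,0) = \log (u(x),v(x))\) for almost all \(x\in \Omega \), comparing (4) with (8) shows that \(|\dot{\Gamma }(0)|_{H^1} = D_{1,2}(u,v)\). From this, we can conclude that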

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,v) \le C_2 \inf _{t \in [0,1]} {|\dot{\Gamma }(t)|}_{H^1} \le C_2 |\dot{\Gamma }(0)|_{H^1} = C_2 D_{1,2}(u,v). \end{aligned}$$

\(\square \)

2.3 \(H^1\)-Uniformity of Geodesic Homotopies

The curves that induce the \(\mathrm{dist }_{W^{1,2}}\)-distance are difficult to work with. The following result shows that geodesic homotopies are in some sense similar to these curves, provided the derivatives are bounded by a constant \(K\) in the \(W^{1,q}\)-sense. This will allow us to work with geodesic homotopies, and still obtain bounds in the \(\mathrm{dist }_{W^{1,2}}\)-distance.
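The uniformity claim of Lemma 2.5 below can be observed numerically. A minimal sketch (not part of the argument), assuming \(d=1\), \(\Omega =(0,1)\), \(M=S^2\), and the hypothetical pair of a constant north-pole map \(u\) and a map \(v\) onto the circle of polar angle \(\theta _0=1\). On the sphere, the velocity of \(\Gamma (x,\cdot )\) has the closed form \(\dot{\Gamma }(x,t)=\theta (-\sin (t\theta )u(x)+\cos (t\theta )e)\) with \(\theta =|\log (u(x),v(x))|\) and \(e=\log (u(x),v(x))/\theta \); the covariant \(x\)-derivative is a central difference followed by projection onto \(T_{\Gamma (x,t)}S^2\). For this pair, \(|\dot{\Gamma }(\cdot ,t)|_{H^1}^2=\theta _0^2(1+\cos ^2(t\theta _0))\), so the ratio of supremum to infimum over \(t\) is \(\sqrt{2/(1+\cos ^2 1)}\approx 1.244\), which in this example stays below \(\sqrt{2}\le C_2\). Helper names are ours.

```python
import math

THETA0 = 1.0  # polar angle of the hypothetical ring map

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def sphere_log(p, q):
    """log(p, q) on S^2; assumes q is not antipodal to p."""
    c = dot(p, q)
    w = [qi - c * pi for pi, qi in zip(p, q)]
    nw = math.sqrt(dot(w, w))
    if nw < 1e-15:
        return (0.0, 0.0, 0.0)
    return tuple(math.atan2(nw, c) / nw * wi for wi in w)

def north(x):
    return (0.0, 0.0, 1.0)

def ring(x):
    s = math.sin(THETA0)
    return (s * math.cos(x), s * math.sin(x), math.cos(THETA0))

def point_and_velocity(u, v, x, t):
    """Gamma(x, t) = exp(u(x), t log(u(x), v(x))) on S^2 and its t-derivative."""
    p = u(x)
    w = sphere_log(p, v(x))
    theta = math.sqrt(dot(w, w))
    if theta < 1e-15:
        return p, (0.0, 0.0, 0.0)
    e = [wi / theta for wi in w]
    s, c = math.sin(t * theta), math.cos(t * theta)
    point = tuple(c * pi + s * ei for pi, ei in zip(p, e))
    velocity = tuple(theta * (-s * pi + c * ei) for pi, ei in zip(p, e))
    return point, velocity

def velocity_h1_norm(u, v, t, n=200, h=1e-5):
    """|Gamma_dot(., t)|_{H^1} as in (4) with d = 1, midpoint rule on (0, 1)."""
    total = 0.0
    for k in range(n):
        x = (k + 0.5) / n
        point, vel = point_and_velocity(u, v, x, t)
        _, vel_p = point_and_velocity(u, v, x + h, t)
        _, vel_m = point_and_velocity(u, v, x - h, t)
        dvel = [(a - b) / (2 * h) for a, b in zip(vel_p, vel_m)]
        c = dot(dvel, point)  # remove the normal part: D/dx = tangential projection
        cov = [a - c * pi for a, pi in zip(dvel, point)]
        total += (dot(vel, vel) + dot(cov, cov)) / n
    return math.sqrt(total)

ts = [i / 10 for i in range(11)]
norms = [velocity_h1_norm(north, ring, t) for t in ts]
ratio = max(norms) / min(norms)
```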

Lemma 2.5

Assume that \(u,v \in W^{1,q}_K(\Omega ,M)\) for \(q>\max (2,d)\) and that \(\Gamma \) is a geodesic homotopy from \(u\) to \(v\). Then, for all \(s>\frac{qd}{q-d}\), we have

$$\begin{aligned} \sup _{t\in [0,1]}\big | \dot{\Gamma } (\cdot , t)\big |_{H^1} \le C_2 \inf _{t\in [0,1]}\big | \dot{\Gamma } (\cdot , t)\big |_{H^1} \end{aligned}$$

with

$$\begin{aligned} C_2 = \sqrt{2} + 2^{d/2+1} C_3 \Vert \mathrm {Rm}\Vert _{g} K \mathrm{dist }_{L^{s}}(u,v), \end{aligned}$$
(9)

where \(\Vert \mathrm {Rm}\Vert _g\) is the maximum norm of the Riemann curvature tensor \(\mathrm {Rm}\) [16], and \(C_3\) only depends on the geometry of \(M\).

Proof

Since \(t\mapsto \Gamma (x,t)\) is a geodesic, we have that

$$\begin{aligned} \big |\dot{\Gamma }(x,t)\big |^2 = \mathrm{dist }(u(x),v(x))^2, \qquad \text {for almost all } x \in \Omega , \end{aligned}$$

independent of \(t\). Hence,

$$\begin{aligned} \big |\dot{\Gamma }(\cdot , t)\big |_{H^1}^2 =\mathrm{dist }_{L^2}(u,v)^2 + U^2(t), \end{aligned}$$
(10)

where we have defined

$$\begin{aligned} U^2(t):=\sum _{\alpha =1}^d\int \limits _{\Omega } \left| \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right| _{g({\Gamma }(x,t))}^{2} dx. \end{aligned}$$

We note that

$$\begin{aligned} \frac{D}{dx^\alpha }\frac{d}{dt} \Gamma (x,t) = \frac{D}{dt}\frac{d}{dx^\alpha }\Gamma (x,t), \end{aligned}$$

as well as the fact that

$$\begin{aligned} J^\alpha (x,t) :=\frac{d}{dx^\alpha }\Gamma (x,t) \end{aligned}$$

satisfies the Jacobi differential equation

$$\begin{aligned} \frac{D^2}{dt^2}J^\alpha (x,t) = \mathrm {Rm}\big (J^\alpha (x,t),\dot{\Gamma }(x,t)\big )\dot{\Gamma }(x,t). \end{aligned}$$

Using this, we can write for every \(\alpha = 1,\dots , d\)

$$\begin{aligned} \frac{d}{dt}\left\langle \frac{D}{dx^\alpha }\dot{\Gamma }(x,t), \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right\rangle _{g(\Gamma (x,t))}&= 2\left\langle \frac{D}{dt}\frac{D}{dx^\alpha }\dot{\Gamma }(x,t), \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right\rangle _{g(\Gamma (x,t))}\\&= 2\left\langle \frac{D^2}{dt^2}J^\alpha (x,t), \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right\rangle _{g(\Gamma (x,t))}\\&= 2\left\langle \mathrm {Rm}\Big (J^\alpha (x,t),\dot{\Gamma }(x,t)\Big )\dot{\Gamma }(x,t), \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right\rangle _{g(\Gamma (x,t))}\\&\le 2 \Vert \mathrm {Rm}\Vert _g \big |J^\alpha (x,t)\big |_{g(\Gamma (x,t))}\\&\qquad \Big |\frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\Big |_{g(\Gamma (x,t))}\big |\dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2. \end{aligned}$$

For simplicity, we shall omit the subscript \({g(\Gamma (x,t))}\) from now on.

Since we can write \(J^\alpha \) as

$$\begin{aligned} J^\alpha (x,t) = \frac{d}{dx^\alpha }\exp \big (u(x), t\log (u(x),v(x))\big ), \end{aligned}$$

we see that there exists a uniform constant \(C_3\), only depending on the geometry of \(M\), such that

$$\begin{aligned} |J^\alpha (x,t)|\le C_3\max \Big (\Big |\frac{d}{dx^\alpha }u(x)\Big |,\Big |\frac{d}{dx^\alpha }v(x)\Big |\Big ). \end{aligned}$$

We can use the previous considerations to bound the time derivative of \(U^2\). For \(\frac{1}{d} - \frac{1}{q}>\varepsilon >0\) arbitrary, define

$$\begin{aligned} \frac{1}{r}:=\frac{1}{2} - \frac{1}{d} +\varepsilon ,\quad \frac{1}{s}:=\frac{1}{d}-\frac{1}{q} -\varepsilon , \end{aligned}$$

and observe that we have

$$\begin{aligned} \frac{1}{q} + \frac{1}{2} + \frac{1}{r} + \frac{1}{s} = 1. \end{aligned}$$

Therefore, the Hölder inequality implies that

$$\begin{aligned} \left| \frac{d}{dt}U^2(t)\right|&\le 2 C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v) \sum _{\alpha =1}^d \left( \int \limits _{\Omega }\left| \frac{D}{dx^\alpha }\dot{\Gamma }(x,t)\right| ^2 \, dx\right) ^{1/2}\mathrm{dist }_{L^{r}}(u,v) \\&\le 2^{(d+1)/2} C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v) \mathrm{dist }_{L^{r}}(u,v) U(t). \end{aligned}$$

We divide by \(2U(t)\) to get

$$\begin{aligned} \Big | \frac{dU(t)}{dt} \Big | = \Big |\frac{\frac{d}{dt}U^2(t)}{2U(t)}\Big |\le 2^{(d-1)/2} C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v) \mathrm{dist }_{L^{r}}(u,v). \end{aligned}$$

The results above imply that

$$\begin{aligned} | U(t_2) - U(t_1)| \le 2^{(d-1)/2} C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v) \mathrm{dist }_{L^{r}}(u,v) \end{aligned}$$
(11)

for any \(t_1,t_2\in [0,1]\).

In the other direction, we note that for all \(t\in [0,1]\) we have by the Sobolev embedding theorem that

$$\begin{aligned} |\dot{\Gamma }(\cdot , t)|_{H^1}\ge \mathrm{dist }_{L^{r}}(u,v), \end{aligned}$$

and therefore, since (10) also gives \(|\dot{\Gamma }(\cdot , t)|_{H^1}\ge \frac{1}{\sqrt{2}}\big (\mathrm{dist }_{L^2}(u,v) + U(t)\big )\),

$$\begin{aligned} |\dot{\Gamma }(\cdot , t)|_{H^1}\ge \frac{1}{2\sqrt{2}}\left( \mathrm{dist }_{L^{r}}(u,v)+ U(t)\right) \quad \text {for all}\ t\in [0,1]. \end{aligned}$$
(12)

Now we can use (10) together with (11) and (12) to see that for \(t_1,t_2\in [0,1]\) we have

$$\begin{aligned} \frac{|\dot{\Gamma }(\cdot , t_1)|_{H^1}}{|\dot{\Gamma }(\cdot , t_2)|_{H^1}}&\le \sqrt{2} \; \frac{\mathrm{dist }_{L^2}(u,v) + U(t_1)}{\mathrm{dist }_{L^2}(u,v) + U(t_2)} \\&\le \sqrt{2} \; \bigg [1+ \frac{ | U(t_2)- U(t_1)|}{\mathrm{dist }_{L^2}(u,v) + U(t_2)} \bigg ] \\&\le \sqrt{2} \; \bigg [1 + \frac{2^{(d+1)/2} C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v) \mathrm{dist }_{L^{r}}(u,v)}{\mathrm{dist }_{L^{r}}(u,v) + U(t_2)} \bigg ] \\&\le \sqrt{2} + 2^{d/2+1} C_3 \Vert \mathrm {Rm}\Vert _g K \mathrm{dist }_{L^{s}}(u,v), \end{aligned}$$

which finally proves the desired estimate. \(\square \)
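The exponent bookkeeping in the proof above can be checked in exact rational arithmetic. A minimal sketch (the helper name is ours) that computes \(r\) and \(s\) from \(d\), \(q\), \(\varepsilon \), verifies \(\frac{1}{q}+\frac{1}{2}+\frac{1}{r}+\frac{1}{s}=1\), and checks the condition \(s>\frac{qd}{q-d}\) from Lemma 2.5; the sample values \(d=3\), \(q=4\), \(\varepsilon =1/24\) are hypothetical.

```python
from fractions import Fraction

def holder_exponents(d, q, eps):
    """Return (r, s) with 1/r = 1/2 - 1/d + eps and 1/s = 1/d - 1/q - eps."""
    d, q, eps = Fraction(d), Fraction(q), Fraction(eps)
    if not (q > max(2, d) and 0 < eps < 1 / d - 1 / q):
        raise ValueError("need q > max(2, d) and 0 < eps < 1/d - 1/q")
    inv_r = Fraction(1, 2) - 1 / d + eps
    inv_s = 1 / d - 1 / q - eps
    if inv_r <= 0:
        raise ValueError("1/r must be positive; increase eps (relevant for d = 1)")
    return 1 / inv_r, 1 / inv_s
```

For these values, \(r=24/5\) and \(s=24\). Letting \(\varepsilon \rightarrow 0\) pushes \(s\) down to \(qd/(q-d)\) and, for \(d>2\), \(r\) up to the Sobolev exponent \(2d/(d-2)\), which is why the lemma is stated for all \(s>\frac{qd}{q-d}\).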

2.4 The Smoothness Descriptor

We have given one definition of Sobolev regularity of functions \(u : \Omega \rightarrow M\) in Sect. 2.1. A natural alternative is the covariant Sobolev norm

$$\begin{aligned} |u|_{H^k_{\text {cov}}}:=\sum _{\dim \vec {\beta }=k}\left( \int \limits _\Omega \big |\mathcal {D}^{\vec {\beta }} u(x)\big |^2_{g(u(x))}\,dx\right) ^{1/2}, \qquad \qquad \Vert u\Vert _{H^k_{\text {cov}}} = \sum _{i=1}^k |u|_{H^i_{\text {cov}}}. \end{aligned}$$

Here, the symbol \(\mathcal {D}^{{\vec {\beta }}}u\) means covariant partial differentiation along \(u\) with respect to the multi-index \(\vec {\beta }\) in the sense that

$$\begin{aligned} \mathcal {D}^{\vec {\beta }} u:=\frac{D}{dx^{\beta _k}} \dots \frac{D}{dx^{\beta _2}}\frac{d}{dx^{\beta _1}}u,\qquad \vec {\beta }\in \{1,\dots ,d\}^k,\ k\in \mathbb {N}_0. \end{aligned}$$
(13)

Additionally, we define \(\mathcal {D}^{\vec {\beta }} u:=1\) (a constant function \(\Omega \rightarrow \mathbb {R}\)) if \(\dim \vec {\beta } = 0\). For brevity, we introduce the notation

$$\begin{aligned}{}[d] :=\{1,\dots , d\}. \end{aligned}$$

Note that (13) differs from the usual multi-index notation, which cannot be used because covariant partial derivatives do not commute.
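To make the covariant derivative in (13) concrete, the following minimal Python sketch assumes the standard isometric embedding \(S^2\subset \mathbb {R}^3\). For an embedded submanifold, the Levi-Civita covariant derivative along a curve is the tangential projection \(P_u = I - uu^T\) of the ambient derivative. The helper names `covariant_acceleration` and `latitude_circle` are hypothetical. For a latitude circle \(u(x) = (r\cos x, r\sin x, z)\) with \(r=\sqrt{1-z^2}\), the sketch evaluates \(\frac{D}{dx}\frac{d}{dx}u\), whose norm is \(r|z|\). This norm vanishes on the equator, which is a geodesic, even though the ambient second derivative there has norm \(r = 1\).

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def covariant_acceleration(u, d2u):
    """D/dx (du/dx) for a curve in S^2 subset R^3: the tangential projection
    P_u = I - u u^T applied to the ambient second derivative d2u."""
    c = dot(u, d2u)
    return [a - c * b for a, b in zip(d2u, u)]


def latitude_circle(z, x):
    """Point u(x) = (r cos x, r sin x, z), r = sqrt(1 - z^2), and its ambient u''(x)."""
    r = math.sqrt(1.0 - z * z)
    u = [r * math.cos(x), r * math.sin(x), z]
    d2u = [-r * math.cos(x), -r * math.sin(x), 0.0]
    return u, d2u


# Example: latitude circle at height z = 0.6 (so r = 0.8)
u, d2u = latitude_circle(0.6, 1.3)
cov = covariant_acceleration(u, d2u)
print(math.sqrt(dot(cov, cov)))  # approximately r*|z| = 0.48
```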

Clearly, for linear \(M\), these definitions coincide with the usual Sobolev seminorms and norms (3). However, they cannot control all terms appearing in the nonlinear Bramble–Hilbert lemma in Sect. 5 (details are given in Remark 5.1). Therefore, we define the following alternative.

Definition 2.9

(Smoothness Descriptor) For a function \(u: \Omega \rightarrow M\), \(k\ge 1\) and \(p\in [1,\infty ]\), define the homogeneous \(k\)-th-order smoothness descriptor

$$\begin{aligned} \dot{\Theta }_{p,k,\Omega }(u):=\sum _{\genfrac{}{}{0.0pt}{}{\vec {\beta }_j \in [d]^{m_j},\ j=1,\dots , k}{\sum _{j=1}^k m_j = k} } \left( \,\int \limits _\Omega \prod _{j=1}^k \bigg |\mathcal {D}^{{\vec {\beta }}_j}u(x)\bigg |_{g(u(x))}^p dx\right) ^{1/p}, \end{aligned}$$

with the usual modifications for \(p=\infty \). Further, we define the \(L^p\) part

$$\begin{aligned} \dot{\Theta }_{p,0,\Omega }(u):=\min _{q\in M}\left( \,\int \limits _{\Omega }\left| \mathrm{dist }(u(x),q)\right| ^p dx\right) ^{1/p}, \end{aligned}$$

and the corresponding inhomogeneous smoothness descriptor

$$\begin{aligned} \Theta _{p,k,\Omega }(u):=\sum _{i=0}^k \dot{\Theta }_{p,i,\Omega }(u). \end{aligned}$$

We will mostly deal with the case \(p=2\), for which we omit the parameter \(p\) from the notation, i.e.,

$$\begin{aligned} \dot{\Theta }_{k,\Omega }:=\dot{\Theta }_{2,k,\Omega }\quad \text{ and }\quad \Theta _{k,\Omega }:=\Theta _{2,k,\Omega }. \end{aligned}$$

Note that we use a superposed dot to denote homogeneous quantities.
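The index sets in Definition 2.9 can be enumerated mechanically. The following sketch uses hypothetical helper names. It lists all tuples \((\vec {\beta }_1,\dots ,\vec {\beta }_k)\) with \(\vec {\beta }_j\in [d]^{m_j}\) and \(\sum _j m_j = k\), and represents an empty tuple \(\vec {\beta }_j\) (that is, \(m_j=0\)) by the constant factor 1. Counting weak compositions shows that there are \(d^k\binom{2k-1}{k}\) such terms.

```python
import math
from itertools import product


def compositions(k, parts):
    """All tuples (m_1, ..., m_parts) of nonnegative integers with sum k."""
    if parts == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, parts - 1):
            yield (first,) + rest


def descriptor_index_sets(k, d):
    """Yield (beta_1, ..., beta_k) with beta_j in [d]^{m_j} and sum_j m_j = k.
    An empty tuple beta_j (m_j = 0) stands for the constant factor 1."""
    dims = range(1, d + 1)
    for ms in compositions(k, k):
        choices = [list(product(dims, repeat=m)) for m in ms]
        yield from product(*choices)


def expected_count(k, d):
    # d^k index words for each of the C(2k-1, k) weak compositions of k into k parts
    return d ** k * math.comb(2 * k - 1, k)


print(list(descriptor_index_sets(2, 1)))
# [((), (1, 1)), ((1,), (1,)), ((1, 1), ())]
```

For \(d=1\) and \(k=2\), the three index sets above give \(\dot{\Theta }_{2,\Omega }(u) = 2\big (\int _\Omega |\frac{D}{dx}\frac{d}{dx}u|^2_{g(u)}\,dx\big )^{1/2} + \big (\int _\Omega |\frac{du}{dx}|^4_{g(u)}\,dx\big )^{1/2}\). The product term in the middle is the one that the covariant Sobolev norm does not contain.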

Remark 2.2

A function \(u\) with \(\Theta _{k,\Omega }(u)<\infty \) must be uniformly continuous if \(k>d/2\). Furthermore, in that case, we have

$$\begin{aligned} \mathrm{diam }(u) :=\sup _{x,y\in \Omega } \mathrm {dist}(u(x),u(y)) \lesssim \Theta _{k,\Omega }(u). \end{aligned}$$

Both these assertions are direct consequences of the Sobolev embedding theorem.

To put the smoothness descriptors \(\Theta \) into context, we now relate them to other measures of regularity. For simplicity, we restrict our analysis to the case \(p=2\). First, it follows directly from the definition that the smoothness descriptor \(\Theta \) is a stronger notion than the covariant Sobolev norm.

Lemma 2.6

\(\Vert u\Vert _{H^k_{\text {cov}}(\Omega ,M)}\le \Theta _{k,\Omega }(u)\).

Proof

The proof follows immediately by noting that all terms that occur in the definition of \(\Vert u\Vert _{H^k_{\text {cov}}(\Omega ,M)}\) also occur in the definition of \(\Theta _{k,\Omega }(u)\). \(\square \)

In the other direction, we show that the Sobolev norm with respect to an embedding also bounds \(\Theta \) from above, if \(k\) is sufficiently large.

Lemma 2.7

Let \(i\) be an isometric embedding of \(M\) into a Euclidean space. For \(k>\frac{d}{2}\), we have \(\Theta _{k,\Omega }(u) \lesssim \Vert i\circ u\Vert _{H^k}^k\).

Note that the smoothness descriptor is bounded by the \(k\)-th power of the corresponding norm.

Proof

Identify \(u\) with \(i \circ u\) for simplicity. We need to estimate terms of the form

$$\begin{aligned} \left( \,\int \limits _\Omega \prod _{j=1}^k\left| \mathcal {D}^{{\vec {\beta }}_j}u(x)\right| _{g(u(x))}^2 dx\right) ^{\frac{1}{2}} \end{aligned}$$
(14)

with \(\vec {\beta }_j\in [d]^{m_j}\), \(j=1,\dots ,k\), and \(\sum _{j=1}^km_j\le k\). It will be no loss of generality to assume the most difficult case \(\sum _{j=1}^km_j = k\). First, we deduce from the definition of the covariant derivative that any term of the form (14) can be estimated by a finite linear combination of terms of the form

$$\begin{aligned} \left( \,\int \limits _\Omega \prod _{j=1}^k\left| \partial ^{{\vec {k}}_j}u(x)\right| ^2_{g(u(x))} dx\right) ^{\frac{1}{2}}, \end{aligned}$$
(15)

with \(\sum _{j=1}^k|{\vec {k}}_j|= k\).

Now, for any exponents \(p_j\), \(j=1,\dots , k\), with \(\sum _{j=1}^k \frac{1}{p_j} \le \frac{1}{2}\), we can bound (15) by Hölder’s inequality:

$$\begin{aligned} \left( \,\int \limits _\Omega \prod _{j=1}^k\left| \partial ^{{\vec {k}}_j}u(x)\right| ^2_{g(u(x))} dx\right) ^{\frac{1}{2}} \le \! \prod _{j=1}^k \left( \,\int \limits _\Omega \left| \partial ^{{\vec {k}}_j}u(x)\right| _{g(u(x))}^{p_j} dx\right) ^{\frac{1}{p_j}} \le \prod _{j=1}^k\Vert u\Vert _{W^{{|\vec {k}}_j|,p_j}}. \end{aligned}$$

We make the specific choice

$$\begin{aligned} \frac{1}{p_j} = \frac{1}{2}-\frac{k-|{\vec {k}}_j|}{d} + \frac{(k-1)(k-d/2)}{kd}. \end{aligned}$$

With this choice, and since \(\sum _{j=1}^k|{\vec {k}}_j|=k\), we have that

$$\begin{aligned} \sum _{j=1}^k \frac{1}{p_j} = \frac{k}{2} - \frac{k(k-1)}{d} + \frac{(k-d/2)(k-1)}{d} = \frac{1}{2}. \end{aligned}$$

We shall now use the Sobolev embedding theorem, which states that

$$\begin{aligned} \Vert u\Vert _{W^{l,p}}\lesssim \Vert u\Vert _{W^{k,2}} , \end{aligned}$$

whenever

$$\begin{aligned} \frac{1}{2} -\frac{k-l}{d} < \frac{1}{p}. \end{aligned}$$

Setting \(l = {|\vec {k}_j|}\) and \(p = p_j\) for each \(j = 1,\dots , k\), we arrive at the desired statement. \(\square \)
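The exponent bookkeeping in the proof of Lemma 2.7 can be checked with exact rational arithmetic. The following sketch uses hypothetical function names. It evaluates the chosen \(1/p_j\), confirms that they sum to \(\frac{1}{2}\), and tests the strict embedding condition \(\frac{1}{2} - \frac{k-|\vec {k}_j|}{d} < \frac{1}{p_j}\). The embedding condition holds exactly when the correction term \(\frac{(k-1)(k-d/2)}{kd}\) is positive, that is, for \(k>\max (1,d/2)\).

```python
from fractions import Fraction


def inverse_exponents(k, d, orders):
    """1/p_j = 1/2 - (k - |k_j|)/d + (k-1)(k - d/2)/(k d), for orders = (|k_1|, ..., |k_k|)."""
    extra = Fraction((k - 1) * (2 * k - d), 2 * k * d)  # equals (k-1)(k-d/2)/(kd)
    return [Fraction(1, 2) - Fraction(k - l, d) + extra for l in orders]


def check_choice(k, d, orders):
    """Return (the 1/p_j sum to 1/2, the strict embedding condition holds for all j)."""
    if len(orders) != k or sum(orders) != k:
        raise ValueError("need k orders |k_j| summing to k")
    inv = inverse_exponents(k, d, orders)
    sums_to_half = sum(inv) == Fraction(1, 2)
    embeds = all(Fraction(1, 2) - Fraction(k - l, d) < ip for l, ip in zip(orders, inv))
    return sums_to_half, embeds


print(check_choice(3, 2, [1, 1, 1]))  # (True, True)
print(check_choice(2, 4, [1, 1]))     # (True, False): k = d/2 is the borderline case
```

In the borderline case \(k = d/2\), the correction term vanishes and the strict inequality fails. This is consistent with the assumption \(k>d/2\) in Lemma 2.7.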

A result similar to Lemma 2.7 can also be established for \(p \ne 2\). In summary, our smoothness descriptor is an appropriate covariant way to measure smoothness of an \(M\)-valued function.

We finally show that the smoothness descriptor has a particular homogeneity property, also enjoyed by conventional Sobolev seminorms in linear spaces.

Definition 2.10

Let \(T_1, T_2\) be two domains in \(\mathbb {R}^d\), and \(\mathcal {F} : T_1 \rightarrow T_2\) a \(C^\infty \)-diffeomorphism. For \(l \in \mathbb {N}_0\), we say that \(\mathcal {F}\) scales with \(h\) of order \(l\) if we have

$$\begin{aligned}&\sup _{x \in T_2} \big |\partial ^{{{\vec {k}}}} \mathcal {F}^{-1}(x)\big | \lesssim h^{|{{\vec {k}}}|} \quad \text{ for } \text{ all } {{\vec {k}}} \in \mathbb {N}_0^d,\ |{{\vec {k}}}| = 0,\dots , l,\end{aligned}$$
(16a)
$$\begin{aligned}&\left| \det \left( \nabla \mathcal {F}(x)\right) \right| \sim h^{-d} \text{ for } \text{ all } x\in T_1\ (\text {where } \nabla \mathcal {F} \text { is the Jacobian of } \mathcal {F}),\end{aligned}$$
(16b)
$$\begin{aligned}&\sup _{x \in T_1} \Big |\frac{d}{dx^\alpha } \mathcal {F}(x)\Big | \lesssim h^{-1} \text{ for } \text{ all } \alpha =1,\dots , d. \end{aligned}$$
(16c)

Such an \(\mathcal {F}\) will be used to move finite element functions to the reference element and back, without losing approximation orders, see Sect. 5 below.

Lemma 2.8

Let \(T_1, T_2\) be two domains in \(\mathbb {R}^d\), and \(\mathcal {F} : T_1 \rightarrow T_2\) a map that scales with \(h\) of order \(l\). Then, for any \(u : T_1 \rightarrow M\), \(k\le l\) and \(p\in [1,\infty ]\), we have

$$\begin{aligned} \dot{\Theta }_{p,k,T_2}(u\circ \mathcal {F}^{-1}) \lesssim h^{-d/p}h^k\Theta _{p,k,T_1}(u). \end{aligned}$$

Note that we bound the homogeneous smoothness descriptor by the inhomogeneous one.

Proof

It follows directly from the chain rule and the product rule that for any \(m\in \mathbb {N}_0\) and \(\vec {\beta }\in [d]^m\) the expression \(\mathcal {D}^{{\vec {\beta }}}\left( u\circ \mathcal {F}^{-1}\right) \) can be written as a linear combination of terms of the form

$$\begin{aligned} \mathcal {D}^{{\vec {\tau }}}u (\mathcal {F}^{-1}(x)) \prod _{i=1}^n \partial ^{{\vec {k}}_i}\left( \mathcal {F}^{-1}\right) ^{j_i}(x) \end{aligned}$$

with \(\vec {\tau }\in [d]^n\),

$$\begin{aligned} \sum _{i=1}^n|{\vec {k}}_i| = m,\quad n\le m, \end{aligned}$$

and \(\left( \mathcal {F}^{-1}\right) ^{j_i}\) denoting the \(j_i\)-th coordinate of \(\mathcal {F}^{-1}\). Since \(|\vec {k}_i|\le m\le k\le l\), the scaling assumption (16a) applies to each factor, and we can therefore estimate the quantity

$$\begin{aligned} \big |\mathcal {D}^{{\vec {\beta }}}\left( u\circ \mathcal {F}^{-1}\right) \big |_{g(u(\mathcal {F}^{-1}(x)))} \end{aligned}$$

by terms of the form

$$\begin{aligned} \Big |\mathcal {D}^{{\vec {\tau }}}u (\mathcal {F}^{-1}(x)) \prod _{i=1}^n \partial ^{{\vec {k}}_i}\left( \mathcal {F}^{-1}\right) ^{j_i}(x) \Big |_{g(u(\mathcal {F}^{-1}(x)))} \lesssim h^{m}\left| \mathcal {D}^{{\vec {\tau }}}u (\mathcal {F}^{-1}(x)) \right| _{g(u(\mathcal {F}^{-1}(x)))}. \end{aligned}$$

Therefore, every integrand

$$\begin{aligned} \prod _{j=1}^k\big |\mathcal {D}^{{\vec {\beta }}_j} \left( u\circ \mathcal {F}^{-1}\right) (x)\big |_{g(u(\mathcal {F}^{-1}(x)))}^p \end{aligned}$$

in the definition of the homogeneous smoothness descriptor \(\dot{\Theta }_{p,k,T_2}(u\circ \mathcal {F}^{-1})\) can be estimated pointwise by terms of the form

$$\begin{aligned} h^{pk} \prod _{j=1}^k\left| \mathcal {D}^{{\vec {\tau }}_j}u\left( \mathcal {F}^{-1}(x)\right) \right| _{g(u(\mathcal {F}^{-1}(x)))}^p \end{aligned}$$
(17)

with

$$\begin{aligned} \vec {\tau }_j\in [d]^{n_j},\quad n_j\in \mathbb {N}_0,\ \sum _{j=1}^k n_j\le k. \end{aligned}$$

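Now, we integrate (17) over \(T_2\) and substitute \(y= \mathcal {F}^{-1}(x)\). By the scaling assumption (16b), the Jacobian of this substitution is of order \(h^{-d}\), which contributes the additional factor \(h^{-d/p}\) after taking the \(p\)-th root. Terms with \(\sum _{j=1}^k n_j < k\) belong to lower-order homogeneous descriptors, so altogether we obtain a bound in terms of the inhomogeneous descriptor \(\Theta _{p,k,T_1}(u)\), which is the desired estimate. \(\square \)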
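The simplest setting illustrates the scaling. Take \(M=\mathbb {R}\), \(d=1\), \(p=2\), \(k=2\), and the affine map \(\mathcal {F}(y) = y/h\) from \(T_1=(0,h)\) to \(T_2=(0,1)\), which scales with \(h\) of every order. In this case, every term of the homogeneous descriptor transforms exactly, so the ratio \(\dot{\Theta }_{2,T_2}(u\circ \mathcal {F}^{-1})/\dot{\Theta }_{2,T_1}(u)\) equals the factor \(h^{-d/p}h^k = h^{3/2}\) from Lemma 2.8. The following sketch checks this with a midpoint rule for the hypothetical test function \(u(y)=\sin y\). It illustrates the scaling only and is not part of the proof.

```python
import math


def theta_dot_22(du, d2u, a, b, n=2000):
    """Homogeneous descriptor for scalar u on (a, b) (M = R, d = 1, p = 2, k = 2), midpoint rule.
    Index sets (2,0) and (0,2) each give (int |u''|^2)^{1/2}; (1,1) gives (int |u'|^4)^{1/2}."""
    step = (b - a) / n
    xs = [a + (i + 0.5) * step for i in range(n)]
    second = math.sqrt(step * sum(d2u(x) ** 2 for x in xs))
    quartic = math.sqrt(step * sum(du(x) ** 4 for x in xs))
    return 2.0 * second + quartic


def scaling_ratio(h, n=2000):
    """Theta-dot of u o F^{-1} on T_2 = (0, 1) divided by Theta-dot of u on T_1 = (0, h),
    for u(y) = sin(y) and F(y) = y / h, i.e. F^{-1}(x) = h x."""
    on_t1 = theta_dot_22(math.cos, lambda y: -math.sin(y), 0.0, h, n)
    on_t2 = theta_dot_22(lambda x: h * math.cos(h * x),
                         lambda x: -h * h * math.sin(h * x), 0.0, 1.0, n)
    return on_t2 / on_t1


for h in (0.5, 0.1, 0.01):
    print(h, scaling_ratio(h) / h ** 1.5)  # close to 1 for every h
```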

Remark 2.3

The attentive reader will have noticed that only properties (16a) and (16b) have been used for the proof of Lemma 2.8. We will require the third assumption (16c) later, when we use scaling to derive local elementwise interpolation error estimates in Theorem 5.3 below.

3 Ellipticity and Céa’s Lemma

Recall that we are trying to approximate the solution \(u\) of the variational problem

$$\begin{aligned} u ={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in H}}\mathfrak {J}(w) \end{aligned}$$
(18)

by a minimizer \(v\) on a set \(V\subset H\)

$$\begin{aligned} v ={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in V}}\mathfrak {J}(w), \end{aligned}$$
(19)

where \(H\) is a suitable set of functions, possibly satisfying Dirichlet boundary conditions. The classical linear Céa lemma assumes that \(H\) is a Hilbert space, and gives an estimate for the error between \(v\) and \(u\) in terms of the optimal approximation error \(\inf _{w\in V} \Vert u-w\Vert _H\) of the approximation space \(V\) [13]. In this section, we show analogous results when \(H\) consists of manifold-valued functions.

We proceed in two steps. Céa-type lemmas can be formulated and proved elegantly in general metric spaces. We show this in Sect. 3.1 and also give a reformulation for the case that \(H\) has a smooth structure with a Finsler norm [6]. These results require certain convexity or ellipticity properties of the energy along distance-realizing curves. They are of independent interest, but they also illustrate some of the ideas of the subsequent section, in which we allow variations along certain nonminimizing curves. The resulting Céa lemma is the basis of the discretization error bounds for geodesic finite elements in Chapter 6.

3.1 Céa’s Lemma Based on Variations Along Curves

We start in an abstract setting. Suppose \(H\) is a metric space with distance function \(\mathrm{dist }(\cdot ,\cdot )\), and \(u\) and \(v\) are solutions of the minimization problems (18) and (19), respectively. We will refer to \(v\) as a quasioptimal solution if

$$\begin{aligned} \mathrm{dist }(u,v) \le C \inf _{w\in V}\mathrm{dist }(u,w) \end{aligned}$$

for a constant \(C>0\). In other words, \(v\) is quasioptimal if its distance to \(u\) can be bounded by a constant \(C\) times the best approximation \(\inf _{w\in V}\mathrm{dist }(u,w)\).

The main assumption leading to quasioptimality is a notion of strong convexity along curves. The following definition is taken from [3].

Definition 3.1

For \(\lambda >0\), a functional \(\mathfrak {J}: H \rightarrow \mathbb {R}\) is called \(\lambda \)-convex along the curve \(\gamma :[0,1]\rightarrow H\) if

$$\begin{aligned} \mathfrak {J}(\gamma (t)) \le (1-t) \,\mathfrak {J}(\gamma (0)) + t \,\mathfrak {J}(\gamma (1)) - \frac{1}{2} \lambda t(1-t) \mathrm{dist }(\gamma (0), \gamma (1))^2 \end{aligned}$$

for all \(t\in [0,1]\).

With this assumption, a metric version of the Céa lemma follows almost immediately.

Theorem 3.1

Assume that \(H\) is a metric space, and let \(\mathfrak {J}: H \rightarrow \mathbb {R}\). Suppose that \(u\in H\) is a minimizer of \(\mathfrak {J}\) and let \(V\) be a subset of \(H\) for which the minimization problem

$$\begin{aligned} v :={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in V}} \mathfrak {J}(w) \end{aligned}$$

has a solution. Assume that there exists a curve \(\gamma \) with \(\gamma (0) = u\) and \(\gamma (1) = v\), along which the energy \(\mathfrak {J}\) is \(\lambda \)-convex. Further, assume that \(\mathfrak {J}\) is quadratically bounded around \(u\) in the sense that there is a constant \(\Lambda >0\) such that

$$\begin{aligned} \mathfrak {J}(w) - \mathfrak {J}(u)\le \frac{\Lambda }{2} \mathrm{dist }(u,w)^2 \end{aligned}$$
(20)

for all \(w\in V\). Then,

$$\begin{aligned} \mathrm{dist }(u,v) \le \sqrt{2} \; \sqrt{\frac{\Lambda }{\lambda }} \inf _{w\in V}\mathrm{dist }(u,w). \end{aligned}$$

Proof

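Inserting \(t=\frac{1}{2}\) into the definition of \(\lambda \)-convexity and using that \(\mathfrak {J}(\gamma (\frac{1}{2})) \ge \mathfrak {J}(u)\), since \(u\) minimizes \(\mathfrak {J}\) on \(H\), yields that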

$$\begin{aligned} \mathrm{dist }(u,v)^2\le \frac{4}{\lambda } \left( \mathfrak {J}(v) - \mathfrak {J}(u)\right) . \end{aligned}$$

Since \(v\) is a minimizer on \(V\), we can write

$$\begin{aligned} \mathrm{dist }(u,v)^2\le \frac{4}{\lambda } \inf _{w\in V} \left( \mathfrak {J}(w) - \mathfrak {J}(u)\right) . \end{aligned}$$

By (20), the right-hand side can be bounded as desired. \(\square \)

A slightly more involved argument allows one to remove the factor \(\sqrt{2}\).
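For a quadratic energy on \(\mathbb {R}^2\), the hypotheses of Theorem 3.1 can be checked by hand. If \(A\) is symmetric positive definite, then \(\mathfrak {J}(w) = \frac{1}{2} w^TAw - b^Tw\) is \(\lambda \)-convex along straight segments, with \(\lambda \) the smallest eigenvalue of \(A\), and it satisfies (20) with \(\Lambda \) the largest eigenvalue. The following sketch uses hypothetical names and takes \(V\) to be a line through the origin. It compares the error of the minimizer on \(V\) with the best-approximation error. In this linear case, one also observes the sharper factor \(\sqrt{\Lambda /\lambda }\), which is consistent with the remark that the factor \(\sqrt{2}\) can be removed.

```python
import math


def cea_check(A, b, e):
    """J(w) = 1/2 w^T A w - b^T w on R^2 (A symmetric positive definite), V = span{e}.
    Returns (|u - v|, inf_{w in V} |u - w|, sqrt(2 Lambda / lambda))."""
    (a11, a12), (_, a22) = A
    det = a11 * a22 - a12 * a12
    u = [(a22 * b[0] - a12 * b[1]) / det, (a11 * b[1] - a12 * b[0]) / det]  # solves A u = b
    Ae = [a11 * e[0] + a12 * e[1], a12 * e[0] + a22 * e[1]]
    ee = e[0] ** 2 + e[1] ** 2
    t_v = (b[0] * e[0] + b[1] * e[1]) / (e[0] * Ae[0] + e[1] * Ae[1])  # minimizer of J on V
    t_best = (u[0] * e[0] + u[1] * e[1]) / ee  # orthogonal projection of u onto V
    err_v = math.hypot(u[0] - t_v * e[0], u[1] - t_v * e[1])
    err_best = math.hypot(u[0] - t_best * e[0], u[1] - t_best * e[1])
    mean, radius = (a11 + a22) / 2, math.hypot((a11 - a22) / 2, a12)
    lam, Lam = mean - radius, mean + radius  # extreme eigenvalues of A
    return err_v, err_best, math.sqrt(2.0 * Lam / lam)


err_v, err_best, factor = cea_check([[4.0, 1.0], [1.0, 1.0]], [1.0, 2.0], [1.0, 0.0])
print(err_v, err_best, factor)  # err_best <= err_v <= factor * err_best
```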

We now consider the case that \(H\) has a differentiable structure, which implies that we can have curves \(\gamma : [0,1] \rightarrow H\) with well-defined tangent vectors \(\dot{\gamma }\). We also assume that there is a norm \({|\cdot |}\) defined on these tangent vectors. The following alternative condition on \(\mathfrak {J}\) is frequently convenient.

Definition 3.2

We say that \(\mathfrak {J}\) is elliptic along a differentiable curve \(\gamma :[0,1]\rightarrow H\) if it is twice continuously differentiable along \(\gamma \), and if there exist positive constants \(\lambda , \Lambda \) such that

$$\begin{aligned} \lambda |\dot{\gamma }(t)|^2 \le \frac{d^2}{dt^2}\mathfrak {J}({\gamma }(t))\le \Lambda |\dot{\gamma }(t) |^2 \end{aligned}$$
(21)

for all \(t \in [0,1]\).

This concept is related to convexity in the following way. Assume that for each pair \(w_1, w_2 \in H\) there is a differentiable path \(\gamma :[0,1]\rightarrow H\) from \(w_1\) to \(w_2\), parametrized proportionally to arc length, that realizes the distance \(\mathrm{dist }(w_1,w_2)\). We call such paths (constant speed) geodesics.

Lemma 3.1

Let \(\mathfrak {J}\) be elliptic in the sense of Definition 3.2 along a given constant speed geodesic \(\gamma :[0,1] \rightarrow H\). Then, \(\mathfrak {J}\) is \(\lambda \)-convex along that curve. If additionally \(\gamma (0)\) is a minimizer of \(\mathfrak {J}\), then \(\mathfrak {J}\) is quadratically bounded in the sense of (20) along \(\gamma \), with constant \(\Lambda \).

In particular, we see that the requirements of Theorem 3.1 are strictly weaker, because they are implied by ellipticity, but require no smoothness.

Proof

Set \(f :=\mathfrak {J}(\gamma ) : [0,1] \rightarrow \mathbb {R}\). By the ellipticity assumption, \(f\) is twice continuously differentiable. We first show that the lower bound on \(f''\) implies that \(\mathfrak {J}\) is \(\lambda \)-convex along \(\gamma \). Pick \(0 \le t \le 1\) and apply Taylor’s formula to \(f\) at \(t\). This gives

$$\begin{aligned} f(0)&\ge f(t) + f'(t)(0-t) + \frac{1}{2}\lambda {|\dot{\gamma }(s_1)|}^2(0-t)^2 \end{aligned}$$

and

$$\begin{aligned} f(1)&\ge f(t) + f'(t)(1-t) + \frac{1}{2}\lambda {|\dot{\gamma }(s_2)|}^2(1-t)^2, \end{aligned}$$

where \(0 \le s_1 \le t\) and \(t \le s_2 \le 1\). Since \(\gamma \) is a constant speed geodesic, we have \({|\dot{\gamma }(s_1)|}^2 = {|\dot{\gamma }(s_2)|}^2 = \mathrm{dist }(\gamma (0),\gamma (1))^2\). Multiply the first inequality by \(1-t\) and the second one by \(t\), and add them. The terms involving \(f'(t)\) cancel, and since \((1-t)t^2 + t(1-t)^2 = t(1-t)\), we obtain the assertion.

Next, we show that \(f''(t) \le \Lambda {|\dot{\gamma }(t)|}^2\) implies \(\mathfrak {J}(\gamma (1)) - \mathfrak {J}(\gamma (0)) \le \frac{\Lambda }{2} \mathrm{dist }(\gamma (0),\gamma (1))^2\). Using that \(f'(0) = 0\) by assumption, we can directly compute

$$\begin{aligned} \mathfrak {J}(\gamma (1)) - \mathfrak {J}(\gamma (0))&= \int \limits _{0}^1 f'(t)\,dt - \int \limits _0^1 f'(0)\,dt= \int \limits _0^1 \int \limits _0^t f''(s)\,ds\,dt\\ {}&= \int \limits _0^1(1-t)f''(t)\,dt \le \int \limits _0^1(1-t) \Lambda |\dot{\gamma }(t)|^2 \,dt\\&= \frac{\Lambda }{2} \mathrm{dist }(\gamma (0),\gamma (1))^2. \end{aligned}$$

\(\square \)
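A scalar example illustrates Lemma 3.1. Take \(H=[0,1]\subset \mathbb {R}\), the geodesic \(\gamma (t)=t\), so that \(\mathrm{dist }(\gamma (0),\gamma (1))=1\), and the hypothetical energy \(\mathfrak {J}(w) = w^2 + 0.1(1-\cos 3w)\). Then \(f'(0)=0\) and \(f''(t) = 2+0.9\cos 3t\in [1.1,2.9]\) on \([0,1]\), so (21) holds with \(\lambda = 1.1\) and \(\Lambda = 2.9\). The following sketch checks the \(\lambda \)-convexity inequality on a grid, together with the quadratic bound (20).

```python
import math

LOWER, UPPER = 1.1, 2.9  # bounds for f'' on [0, 1]; here dist(gamma(0), gamma(1)) = 1


def f(t):
    """Hypothetical f(t) = J(gamma(t)) with gamma(t) = t and J(w) = w^2 + 0.1 (1 - cos 3w).
    f'(t) = 2t + 0.3 sin 3t, so f'(0) = 0, and f''(t) = 2 + 0.9 cos 3t lies in [1.1, 2.9]."""
    return t * t + 0.1 * (1.0 - math.cos(3.0 * t))


def convexity_gap(t):
    """(1-t) f(0) + t f(1) - 1/2 LOWER t (1-t) - f(t); nonnegative by Lemma 3.1."""
    return (1.0 - t) * f(0.0) + t * f(1.0) - 0.5 * LOWER * t * (1.0 - t) - f(t)


print(min(convexity_gap(i / 100) for i in range(101)))  # nonnegative up to rounding
print(f(1.0) - f(0.0), UPPER / 2)  # quadratic bound (20): the first value is below the second
```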

As an immediate consequence of Theorem 3.1 together with Lemma 3.1, we get the following result.

Theorem 3.2

Let \(H\) be a Banach manifold with norm \({|\cdot |}\), and \(\mathfrak {J}: H \rightarrow \mathbb {R}\) a functional. Assume that \(u\in H\) is a minimizer of \(\mathfrak {J}\), and that \(\mathfrak {J}\) is elliptic along constant speed geodesics, with constants \(\lambda ,\Lambda \). For a \(V\subset H\) set

$$\begin{aligned} v :={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in V}}\mathfrak {J}(w), \end{aligned}$$

assuming that this is well defined. Then, we have that

$$\begin{aligned} \mathrm{dist }(u,v) \le \sqrt{2} \; \sqrt{\frac{\Lambda }{\lambda }} \inf _{w\in V}\mathrm{dist }(u,w). \end{aligned}$$

Using this theorem for the space \(W^{1,2}(\Omega ,M)\) together with the norm \(|\cdot |_{H^1}\) defined in Definition 2.4 yields the following corollary, provided that \(W^{1,2}(\Omega ,M)\) is a Banach manifold. Unfortunately, in general this is only the case for \(d = 1\).

Corollary 3.1

Assume that \(W^{1,2}(\Omega ,M)\) is a Banach manifold, and let \(\mathfrak {J}: W^{1,2}(\Omega ,M) \rightarrow \mathbb {R}\). Assume that \(u\in W^{1,2}(\Omega ,M)\) is a minimizer of \(\mathfrak {J}\), and that \(\mathfrak {J}\) is elliptic along constant speed geodesics in \(W^{1,2}(\Omega ,M)\), starting in \(u\). Let \(V\subset W^{1,2}(\Omega ,M)\) and

$$\begin{aligned} v = {\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in V}}\mathfrak {J}(w) \qquad \text {(the ``discrete'' solution)}. \end{aligned}$$

Then, we have that

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,v) \le \sqrt{2} \; \sqrt{\frac{\Lambda }{\lambda }} \inf _{w\in V}\mathrm{dist }_{W^{1,2}}(u,w). \end{aligned}$$

This corollary is the natural extension of the standard Céa lemma to nonlinear function spaces.

3.2 Céa’s Lemma Using Geodesic Homotopies

When trying to apply the results of the previous section, we encounter two problems. First, for the energies \(\mathfrak {J}\) and domains \(\Omega \) of our interest, we consider variational problem formulations in \(W^{1,2}(\Omega ,M)\), and in general, this space does not possess the structure of a Banach manifold [20, 30, 31]. Hence, the results based on Banach manifolds cannot be used. Secondly, even if the space \(W^{1,2}(\Omega ,M)\) turns out to be a Banach manifold, it is difficult to work with constant speed geodesics in these spaces. In particular, it is not easy to verify ellipticity properties along these curves for important energies, such as the harmonic energy.

To overcome these issues, we generalize the approach somewhat. Instead of geodesics in \(W^{1,2}\), we now consider geodesic homotopies. However, we still obtain bounds in terms of a \(W^{1,2}\)-like measure, namely the quantity \(D_{1,2}\) introduced in (8). While this quantity is of little interest in itself, the result will allow us to bound the discretization error of geodesic finite elements in terms of the embedded distance (6) and the geodesic distance (7). The proof is based on the \(H^1\)-uniformity of geodesic homotopies shown in Sect. 2.3. The price we pay is that we additionally have to assume the existence of a constant \(K>0\) such that \(u \in W^{1,q}_K\) and \(V \subset W^{1,q}_K\) for some \(q>\max (2,d)\). Since \(q>d\), such functions are continuous by the Sobolev embedding theorem, which is a natural restriction for manifold-valued functions.

Theorem 3.3

Let \(H\subset W^{1,2}(\Omega ,M)\), \(\mathfrak {J}: H\rightarrow \mathbb {R}\) and \(K>0\). Assume that \(u\in H\cap W^{1,q}_K(\Omega ,M)\), \(q>\max (2,d)\) is a stationary point of \(\mathfrak {J}\) w.r.t. variations along geodesic homotopies in \(H\cap W^{1,q}_{K}(\Omega ,M)\) starting in \(u\).

For a second constant \(L>0\) and an arbitrary \(s>\frac{qd}{q-d}\), define

$$\begin{aligned} H_L^u :=\big \{ v \, : \, \mathrm{dist }_{L^{s}}(u,v) \le L\big \}, \end{aligned}$$

and assume that \(\mathfrak {J}: W^{1,q}_K \cap H_L^u\cap H\rightarrow \mathbb {R}\) is elliptic along geodesic homotopies that start in \(u\). Let \(V\subset W^{1,q}_K \cap H_L^{u}\cap H\) and

$$\begin{aligned} v :={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{w\in V}}\mathfrak {J}(w). \end{aligned}$$

Then, we have that

$$\begin{aligned} D_{1,2}(u,v) \le C_2^2\sqrt{\frac{\Lambda }{\lambda }} \inf _{w\in V}D_{1,2}(u,w), \end{aligned}$$

with \(C_2\) the uniformity constant (9), only depending on \(d\), the product \(KL\) and the curvature of \(M\).

Proof

For \(w\in V\) define

$$\begin{aligned} f_w(t):=\mathfrak {J}(\Gamma (t)) \end{aligned}$$

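with \(\Gamma : [0,1] \rightarrow W^{1,q}_K \cap H^u_{L} \cap H\) a geodesic homotopy from \(u\) to \(w\). Since \(u\) is a stationary point with respect to variations along geodesic homotopies, we have \(f_w'(0) = 0\), and hence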

$$\begin{aligned} \mathfrak {J}(w) - \mathfrak {J}(u) = \int \limits _{0}^1 f_w'(t)\,dt - \int \limits _0^1 f_w'(0)\,dt = \int \limits _0^1 \int \limits _0^t f_w''(s)\,ds\,dt = \int \limits _0^1(1-t)f_w''(t)\,dt. \end{aligned}$$

By the ellipticity assumption (21), we have

$$\begin{aligned} \lambda \int \limits _0^1 (1-t)|\dot{\Gamma }(t)|^2_{H^1}\,dt \le \mathfrak {J}(w) - \mathfrak {J}(u) \le \Lambda \int \limits _0^1 (1-t)|\dot{\Gamma }(t)|^2_{H^1}\,dt. \end{aligned}$$

Now, Lemma 2.5 shows that \(C_2^{-1}|\dot{\Gamma }(0)|_{H^1} \le |\dot{\Gamma }(t)|_{H^1} \le C_2 |\dot{\Gamma }(0)|_{H^1}\) for all \(t\in [0,1]\). Since \(\int _0^1 (1-t)\,dt = \frac{1}{2}\), this yields

$$\begin{aligned} \frac{\lambda }{2C_2^2}|\dot{\Gamma }(0)|^2_{H^1} \le \mathfrak {J}(w) - \mathfrak {J}(u) \le \frac{\Lambda C_2^2}{2} |\dot{\Gamma }(0)|^2_{H^1}, \end{aligned}$$

where the constant \(C_2\) depends only on \(d\), \(K\), \(L\), and the curvature of \(M\). Noting further that

$$\begin{aligned} D_{1,2}(u,w)^2 = |\dot{\Gamma }(0)|^2_{H^1} \end{aligned}$$

immediately yields that

$$\begin{aligned} D_{1,2}(u,v)^2 \le \frac{2C_2^2}{\lambda } \big [ \mathfrak {J}(v) - \mathfrak {J}(u)\big ] \le \frac{2C_2^2}{\lambda } \big [\mathfrak {J}(w) - \mathfrak {J}(u)\big ] \end{aligned}$$

for all \(w\in V\). Furthermore, we have

$$\begin{aligned} \mathfrak {J}(w) - \mathfrak {J}(u) \le \frac{C_2^2 \Lambda }{2} D_{1,2}^2(u,w). \end{aligned}$$

Together, we obtain that

$$\begin{aligned} D_{1,2}(u,v) \le C_2^2 \sqrt{\frac{\Lambda }{\lambda }}\inf _{w\in V}D_{1,2}(u,w). \end{aligned}$$

\(\square \)

Replacing \(|\dot{\Gamma }(0)|^2_{H^1}\) by \(|\dot{\Gamma }(1)|^2_{H^1}\) in the arguments above shows the following corollary.

Corollary 3.2

With the notation of the previous theorem, we also have the estimate

$$\begin{aligned} D_{1,2}(v,u) \le C_2^2\sqrt{\frac{\Lambda }{\lambda }} \inf _{w\in V}D_{1,2}(w,u). \end{aligned}$$

The restriction that \(u \in W^{1,q}_K\) does not appear in the linear theory. Whether Theorem 3.3 holds without it is an open question.

Remark 3.1

Requiring that the approximation space \(V\) consists only of functions with derivatives bounded by \(K\) in the \(W^{1,q}\)-sense may lead to a restriction when considering families of approximation spaces \(V^h\) associated with a mesh width \(h > 0\). If \(V^h\) are chosen as GFE spaces as introduced in Sect. 4, the quantity \(K\) could in general grow like \(h^{-1}\) (as is the case for standard Lagrangian finite elements with values in \(\mathbb {R}\)). As a remedy, we show in Theorem 6.2 that the condition \(V^h \subset W^{1,q}_K\) can be dispensed with, provided that \(u\) is sufficiently regular, more precisely if \(u\in H^{d/2+1}(\Omega ,M)\).

4 Geodesic Finite Elements

In Chapter 3, very little has been required of the approximation spaces \(V\). For the theory based on distance-realizing curves in Sect. 3.1, only the existence of a minimizer of \(\mathfrak {J}\) in \(V\) was required. In Sect. 3.2, we additionally needed that the approximating functions that make up \(V\) have their derivatives bounded by a constant \(K\).

In this section, we present geodesic finite elements (GFE) as one particular example of a suitable space \(V\). They were originally introduced in [55, 56]; for completeness, we give a brief review. The definition consists of two parts. First, nonlinear interpolation functions are constructed that interpolate values given at the Lagrange nodes of a reference element. Then, for a given grid, these interpolation functions are pieced together to form global finite element functions.

4.1 Geodesic Interpolation

Let \({T_\text {ref}}\) be an open bounded subset of \(\mathbb {R}^d\), which we will call the reference element. Particular instances are the reference simplex \(\big \{ x \in \mathbb {R}^d \; | \; x^\alpha \ge 0, \; \alpha = 1,\dots ,d, \, \sum _{\alpha =1}^d x^\alpha \le 1 \big \}\), and the reference cube \([0,1]^d\). On \({T_\text {ref}}\), we assume the existence of a set of Lagrangian interpolation polynomials, i.e., a set of Lagrange nodes \(a_i \in {T_\text {ref}}\), \(i=1,\dots ,m\), and corresponding polynomial functions \(\lambda _i : {T_\text {ref}} \rightarrow \mathbb {R}\) of order \(p\) such that

$$\begin{aligned} \lambda _i(a_j) = \delta _{ij} \qquad \text {for all } 1 \le i,j \le m, \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^m \lambda _i \equiv 1. \end{aligned}$$
(22)
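For \(p=2\) on the reference triangle, a standard choice uses the three vertices and the three edge midpoints as Lagrange nodes. This choice is assumed here and is not prescribed by the text. The corresponding shape functions are \(\ell _i(2\ell _i-1)\) and \(4\ell _i\ell _j\), written in terms of the barycentric coordinates \(\ell _0 = 1-x-y\), \(\ell _1 = x\), \(\ell _2 = y\). The following sketch verifies \(\lambda _i(a_j)=\delta _{ij}\) and the partition of unity (22) in exact arithmetic. It also shows that, unlike for \(p=1\), these shape functions can become negative: the vertex functions equal \(-1/9\) at the centroid.

```python
from fractions import Fraction as Fr

# Reference triangle: three vertices, then the midpoints of edges (0,1), (1,2), (2,0)
NODES = [(0, 0), (1, 0), (0, 1), (Fr(1, 2), 0), (Fr(1, 2), Fr(1, 2)), (0, Fr(1, 2))]


def p2_shape_functions(x, y):
    """Quadratic Lagrange shape functions on the reference triangle, ordered like NODES."""
    l0, l1, l2 = 1 - x - y, x, y  # barycentric coordinates
    return [l0 * (2 * l0 - 1), l1 * (2 * l1 - 1), l2 * (2 * l2 - 1),
            4 * l0 * l1, 4 * l1 * l2, 4 * l2 * l0]


# Kronecker property lambda_i(a_j) = delta_ij
for j, node in enumerate(NODES):
    assert p2_shape_functions(*node) == [1 if i == j else 0 for i in range(6)]

print(sum(p2_shape_functions(Fr(1, 5), Fr(2, 7))))  # 1, the partition of unity (22)
print(p2_shape_functions(Fr(1, 3), Fr(1, 3)))       # the vertex functions equal -1/9
```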

We now generalize Lagrange interpolation to values in a manifold. Let \(v_i \in M\), \(i=1,\dots ,m\) be given values at the Lagrange nodes \(a_i \in {T_\text {ref}}\). We want to construct a function \(\Upsilon _v : {T_\text {ref}} \rightarrow M\) such that \(\Upsilon _v(a_i) = v_i\) for all \(i = 1,\dots ,m\). The following definition was given and motivated in [56].

Definition 4.1

Let \(\{\lambda _i, i=1,\dots ,m \}\) be a set of \(p\)-th-order scalar Lagrangian shape functions, and let \(v_i \in M\), \(i=1,\dots ,m\) be values at the corresponding Lagrange nodes. We call

$$\begin{aligned} \nonumber \Upsilon&\; : \; M^m \times {T_\text {ref}} \rightarrow M \\ \Upsilon (v_1,\dots ,v_m;x)&= {\mathop {{{\mathrm{arg\,min\;}}}}\limits _{q \in M}} \sum _{i=1}^m \lambda _i(x) \mathrm{dist }(v_i,q)^2 \end{aligned}$$
(23)

\(p\)-th-order geodesic interpolation on \(M\).

For fixed coefficients \(v_1,\dots ,v_m\), we set \(\Upsilon _v(\cdot ) :=\Upsilon (v_1,\dots ,v_m;\cdot )\) and obtain the desired function.

Remark 4.1

Formulas similar to (23) have been used in the literature to interpolate manifold-valued data [12, 41, 50]. The idea to use them to construct finite element spaces was first proposed in [27, 55, 56].

It is easy to verify that this definition reduces to \(p\)-th-order Lagrangian interpolation if \(M\) is a linear space and \(\mathrm{dist }(\cdot ,\cdot )\) the standard distance. For the nonlinear case and \(p=1\), well-posedness of the definition under certain restrictions on the \(v_i\) is a classical result [38]. For \(p \ge 2\), where the \(\lambda _i\) can become negative, well-posedness has been proved in [56]. The interpolation function \(\Upsilon \) is infinitely differentiable both as a function of the \(v_i\) and of the local coordinates \(x\). This and several other features are discussed in [55, 56].

Since the values of \(\Upsilon _v\) are defined as solutions of a minimization problem, we can also characterize them by the corresponding first-order optimality condition (see, for instance, [38]). We will make use of this representation in the interpolation error bound in Chapter 5.

Lemma 4.1

For any \(q \in M\) denote by \(\log (q,\cdot ) : M \supset U \rightarrow T_q M\) the inverse of the exponential map of \(M\) at \(q\). Then, \(q^* :=\Upsilon (v_1,\dots ,v_m;x)\) is (locally uniquely) characterized by the first-order condition

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x)\log (q^*,v_i) = 0 \; \in \; T_{q^*}M. \end{aligned}$$
(24)
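The characterization (24) suggests a simple way to evaluate \(\Upsilon \) numerically. The following sketch assumes \(M = S^2\subset \mathbb {R}^3\), which has closed-form exponential and logarithm maps. It iterates \(q\leftarrow \exp _q\big (\sum _i\lambda _i(x)\log (q,v_i)\big )\), a Riemannian gradient step for \(\frac{1}{2}\sum _i\lambda _i(x)\mathrm{dist }(v_i,q)^2\), until the residual of (24) is small. This is one common approach and not necessarily the solver used by the authors. For nonnegative weights and data in a small ball it typically converges, but no convergence guarantee is claimed here, in particular not for the negative weights that occur when \(p\ge 2\). All function names are hypothetical.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def normalize(a):
    n = norm(a)
    return [x / n for x in a]


def sphere_log(q, v):
    """log(q, v) on the unit sphere S^2 in R^3 (v must not be antipodal to q)."""
    c = dot(q, v)
    w = [vi - c * qi for vi, qi in zip(v, q)]  # component of v orthogonal to q
    nw = norm(w)
    if nw == 0.0:
        return [0.0, 0.0, 0.0]
    theta = math.atan2(nw, c)  # geodesic distance dist(q, v), computed stably
    return [theta * wi / nw for wi in w]


def sphere_exp(q, xi):
    t = norm(xi)
    if t == 0.0:
        return list(q)
    return [math.cos(t) * qi + math.sin(t) * x / t for qi, x in zip(q, xi)]


def geodesic_interpolation(values, weights, tol=1e-13, max_iter=200):
    """Approximate argmin_q sum_i w_i dist(v_i, q)^2 via q <- exp_q(sum_i w_i log(q, v_i)),
    stopping once the residual of the first-order condition (24) is below tol."""
    q = normalize([sum(w * v[k] for w, v in zip(weights, values)) for k in range(3)])
    for _ in range(max_iter):
        residual = [0.0, 0.0, 0.0]
        for w, v in zip(weights, values):
            residual = [r + w * l for r, l in zip(residual, sphere_log(q, v))]
        if norm(residual) < tol:
            break
        q = sphere_exp(q, residual)
    return q


def p1_weights(x, y):
    """First-order Lagrange shape functions on the reference triangle."""
    return [1.0 - x - y, x, y]


values = [normalize([0.1, 0.0, 1.0]), normalize([0.0, 0.1, 1.0]), normalize([-0.1, -0.1, 1.0])]
print(geodesic_interpolation(values, p1_weights(1 / 3, 1 / 3)))  # value at the barycenter
```

At a vertex of the reference triangle the weights are \((1,0,0)\), and the iteration stops immediately at the corresponding value, which reproduces \(\Upsilon _v(a_i)=v_i\).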

Interpolation error bounds for geodesic finite elements are based on the fact that the shape functions \(\lambda _i\) are exact on polynomials of degree no greater than \(p\), meaning that

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x) q(a_i) = q(x) \end{aligned}$$
(25)

for all polynomials \(q : T_\text {ref} \rightarrow \mathbb {R}\) of degree less than or equal to \(p\). Using this, we can prove the following technical property.

Lemma 4.2

For all multi-indices \(\vec {l}\) with \(1\le |{\vec {l}}|\le p\) and all functions \(f : T_\text {ref} \rightarrow \mathbb {R}\), we have

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x) \left( a_i - x\right) ^{\vec {l}} f(x) = 0. \end{aligned}$$

Proof

We start by fixing some arbitrary \(x_*\in T_\text {ref}\). Then, we can write

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x) \left( a_i - x_*\right) ^{\vec {l}} f(x_*) = \sum _{i=1}^m\lambda _i(x)p_{x_*}(a_i) \end{aligned}$$

where

$$\begin{aligned} p_{x_*}(y):=(y - x_*)^{\vec {l}} f(x_*) \end{aligned}$$

is a polynomial of degree \(|{\vec {l}}|\). By (25), we get

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x) \left( a_i - x_*\right) ^{\vec {l}} f(x_*) = p_{x_*}(x). \end{aligned}$$

Choosing \(x = x_*\) and using that \(p_{x_*}(x_*) = 0\) by definition, we obtain

$$\begin{aligned} \sum _{i=1}^m\lambda _i(x_*) \left( a_i - x_*\right) ^{\vec {l}} f(x_*) = 0, \end{aligned}$$

which, by the arbitrariness of \(x_*\), implies the statement. \(\square \)
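Lemma 4.2 can be checked directly for the quadratic Lagrange basis on \([0,1]\) with nodes \(0\), \(\frac{1}{2}\), \(1\). This one-dimensional example with \(f\equiv 1\) is assumed for illustration. The sums vanish for \(l = 1, 2\). For \(l = 0\) they equal one by (22), which is why the case \(|\vec {l}| = 0\) must be excluded. For \(l = 3 > p\) they equal \(-x(x-\frac{1}{2})(x-1)\), which is \(-0.042\) at \(x = 0.3\), because cubics are not reproduced.

```python
NODES = [0.0, 0.5, 1.0]


def p2_basis_1d(x):
    """Quadratic Lagrange basis on [0, 1] for the nodes 0, 1/2, 1."""
    return [2.0 * (x - 0.5) * (x - 1.0), -4.0 * x * (x - 1.0), 2.0 * x * (x - 0.5)]


def moment(x, l):
    """sum_i lambda_i(x) (a_i - x)^l, i.e. the sum in Lemma 4.2 with f = 1."""
    return sum(lam * (a - x) ** l for lam, a in zip(p2_basis_1d(x), NODES))


for x in (0.0, 0.3, 0.77, 1.0):
    print(x, [round(moment(x, l), 12) for l in range(4)])
# l = 1, 2 give 0; l = 0 gives 1; l = 3 gives -x(x - 1/2)(x - 1)
```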

4.2 Geodesic Finite Element Functions

Let now \(\Omega \) be a domain in \(\mathbb {R}^d\). Suppose we have a conforming grid \(\mathcal {G}\) for \(\Omega \) with elements not necessarily restricted to simplices. Let \(n_i \in \Omega \), \(i=1,\dots ,{|n|}\) be a set of Lagrange nodes such that for each element \(T\) of \(\mathcal {G}\) there are \(m\) nodes \(a_{T,i}\) contained in \(T\) and such that the \(p\)-th-order interpolation problem on \(T\) is well posed.

Definition 4.2

(Geodesic Finite Elements) Let \(\mathcal {G}\) be a conforming grid on \(\Omega \), and let \(M\) be a Riemannian manifold. We call \(v^h : \Omega \rightarrow M\) a geodesic finite element function for \(M\) if it is continuous, and for each element \(T \in \mathcal {G}\), the restriction \(v^h|_T\) is a geodesic interpolation in the sense that

$$\begin{aligned} v^h|_T(x) = \Upsilon \big (v_{T,1}, \dots , v_{T,m}; \mathcal {F}_T(x) \big ), \end{aligned}$$

where the \(\mathcal {F}_T : T \rightarrow {T_\text {ref}}\) are element mappings (typically affine or multilinear), and the \(v_{T,i}\) are values in \(M\) corresponding to the Lagrange nodes \(a_{T,i}\). The space of all such functions \(v^h\) will be denoted by \(V_{p,\mathcal {G}}^M\).

This definition reduces to standard (vector-valued) Lagrangian finite elements if \(M\) is a linear space with the usual Euclidean distance.

The following property is crucial for our analysis, because we always assume that the approximation space \(V\) is a subset of the solution space. The proof is given in [55] and [56].

Theorem 4.1

\(V_{p,\mathcal {G}}^M(\Omega ) \subset H^1(\Omega ,M)\) for all \(p \ge 1\).

While this holds for all grids \(\mathcal {G}\) and all polynomial orders \(p\), we note that geodesic finite element spaces are generally not nested. This means that in general \(V_{p,\mathcal {G}}^M \not \subset V_{p+1,\mathcal {G}}^M\) and \(V_{p,\mathcal {G}}^M \not \subset V_{p,\mathcal {G}'}^M\) if \(\mathcal {G}'\) is a uniform refinement of \(\mathcal {G}\). See [56, Chap. 4] for a brief discussion.

Remark 4.2

In numerical algorithms, one uses the algebraic representation of \(V_{p,\mathcal {G}}^M\), that is, a function \(v^h \in V_{p,\mathcal {G}}^M\) is identified with a set of nodal coefficients \(\bar{v} \in M^{|n|}\). However, note that \(V_{p,\mathcal {G}}^M\) is not globally homeomorphic to \(M^{|n|}\); in fact, it is not even globally a manifold. This is so because for certain sets of coefficients there is more than one interpolation function (see [54] for a simple example). On the other hand, it is shown in [56] that for many \(\bar{v} \in M^{|n|}\) there is only a single interpolating function \(v^h\), and then, there is a diffeomorphism mapping a neighborhood of \(\bar{v}\) in \(M^{|n|}\) to a neighborhood of \(v^h\) in \(V_{p,\mathcal {G}}^M\). In this sense, the space \(V_{p,\mathcal {G}}^M\) contains many small “manifold patches.” Its global structure, however, remains unclear.

To prove quasioptimality in Theorem 3.3, we had to assume that the discrete space \(V\) contains only functions with first derivatives bounded by a global constant \(K\). While it is obvious that each GFE function has bounded first derivatives, a global bound for all functions of a space \(V_{p,\mathcal {G}}^M\) exists only if \(M\) has finite diameter. This global bound depends on the grid size \(h\). The specific nature of this dependence will allow us in Theorem 6.2 to circumvent the restriction \(V \subset W^{1,q}_K\) and obtain discretization error bounds without constraints on the ansatz space. For later use, we therefore state the following simple result, which holds for all orders \(p\) and for \(M\) with bounded or unbounded diameter.

Lemma 4.3

Let \(\mathcal {G}\) be such that \(\mathcal {F}_T\) scales with \(h\) of order \(p\) for each element \(T\) of \(\mathcal {G}\). Then, for each function \(v^h \in V_{p,\mathcal {G}}^M\), we have

$$\begin{aligned} \Theta _{\infty ,1,\Omega }(v^h) \lesssim h^{-1}, \end{aligned}$$

where the constant depends on the values of \(v^h\) at the Lagrange nodes.
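Consider \(d = 1\), \(p = 1\), and \(M = S^1\). On a uniform grid of width \(h\), a geodesic finite element function is a piecewise constant-speed arc. On each element its speed is the geodesic distance between the two nodal values divided by \(h\). The sketch below writes nodal values as angles and uses a hypothetical function name. It illustrates the \(h^{-1}\) behaviour in Lemma 4.3. Because \(S^1\) has diameter \(\pi \), it also shows the bound \(h\,|dv^h/dx| \le \pi \), which holds uniformly over all nodal values.

```python
import numpy as np


def p1_speeds_on_circle(angles, h):
    """Speed |d v^h / dx| on each element of the P1 geodesic interpolant into S^1.

    angles[i] is the nodal value at x_i = i * h, written as an angle in radians.
    """
    diff = np.diff(angles)
    # length of the shortest arc between consecutive nodal values, in [0, pi]
    dist = np.abs((diff + np.pi) % (2 * np.pi) - np.pi)
    return dist / h


rng = np.random.default_rng(0)
for n in (10, 100, 1000):
    h = 1.0 / n
    speeds = p1_speeds_on_circle(rng.uniform(0.0, 2 * np.pi, size=n + 1), h)
    print(n, speeds.max(), h * speeds.max())  # speeds grow like 1/h, h * speed <= pi
```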

In order to assess the approximation properties of the spaces \(V_{p,\mathcal {G}}^M\), we finally construct the pointwise interpolation operator mapping continuous functions with values in \(M\) to elements in \(V_{p,\mathcal {G}}^M\). As in the classical linear case, we first define the interpolant on a reference element. We start by fixing the reference element \({T_\text {ref}}\) with Lagrangian interpolation nodes \(a_i\) and corresponding local basis functions \(\lambda _i\), \(i = 1,\dots ,m\). Given a function \(u: {T_\text {ref}} \rightarrow M\), its local Lagrangian interpolant on \({T_\text {ref}}\) is defined by

$$\begin{aligned} \mathbb {I}_{{T_\text {ref}}}u(x):=\Upsilon \left( u(a_1),\dots , u(a_m); x\right) . \end{aligned}$$

Likewise, for a general element \(T\) with associated mapping \(\mathcal {F}_T:T \rightarrow {T_\text {ref}}\), the local Lagrangian interpolant is given by

$$\begin{aligned} \mathbb {I}_{T}u(x) = \mathbb {I}_{{T_\text {ref}}}\left( u\circ \mathcal {F}_T^{-1}\right) \left( \mathcal {F}_T(x)\right) ,\quad x\in T. \end{aligned}$$
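Take \(p = 1\) and \(d = 1\), so each element has two nodes. Assume, as in the construction of \(\Upsilon \), that geodesic interpolation minimizes the weighted sum of squared distances. Then the minimizer of \((1-t)\,\mathrm{dist}(v_1,q)^2 + t\,\mathrm{dist}(v_2,q)^2\) is the point at fraction \(t\) along the shortest geodesic from \(v_1\) to \(v_2\), provided \(v_1\) and \(v_2\) are not antipodal. On \(M = S^2\) this is spherical linear interpolation. The local interpolant \(\mathbb {I}_T u\) can therefore be sketched as below, with affine \(\mathcal {F}_T(x) = (x - x_j)/h\) on the element \([x_j, x_j + h]\) of a uniform grid. The test curve `u` and the helper names are illustrative assumptions.

```python
import numpy as np


def slerp(a, b, t):
    """Point at fraction t along the shortest great-circle arc from a to b."""
    theta = np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))
    if theta < 1e-15:
        return a.copy()
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)


def u(x):
    """A smooth test curve on the unit sphere S^2 (illustrative choice)."""
    return np.array([np.cos(x) * np.cos(2 * x), np.cos(x) * np.sin(2 * x), np.sin(x)])


def interpolant(u, n, x):
    """P1 geodesic interpolant of u on the uniform grid of [0, 1] with n elements, at x."""
    h = 1.0 / n
    j = min(int(x / h), n - 1)   # element T_j = [j h, (j + 1) h] containing x
    t = (x - j * h) / h          # reference coordinate F_T(x) in [0, 1]
    return slerp(u(j * h), u((j + 1) * h), t)


print(interpolant(u, 8, 0.3), u(0.3))   # close, and exact at the grid nodes
```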

With these notions at hand, we can define the geodesic Lagrange interpolant of a continuous function \(u:\Omega \rightarrow M\).

Definition 4.3

For each continuous function \(u : \Omega \rightarrow M\), define the geodesic Lagrange interpolant \(\mathbb {I}_{\mathcal {G}}u\in V_{p,\mathcal {G}}^M\) by

$$\begin{aligned} \mathbb {I}_{\mathcal {G}}u(x) = \mathbb {I}_{T}u(x),\quad x\in T, \ T\in \mathcal {G}. \end{aligned}$$

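Note that, unlike in the linear case, this interpolant is not always unique, because the geodesic interpolation \(\Upsilon \) on an element need not be unique (cf. Remark 4.2).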
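Here is a simple instance of this non-uniqueness on \(M = S^2\). Place two nodal values at the north and south poles with equal weights \(1/2\). Then every point \(q\) of the equator satisfies the balance law (24), because \(\log (q, \text {north})\) and \(\log (q, \text {south})\) both have length \(\pi /2\) and point in opposite directions. The sketch checks this numerically. `sphere_log` is a hypothetical helper implementing the usual inverse exponential map of the sphere.

```python
import numpy as np


def sphere_log(p, q):
    """Inverse exponential map of S^2 at p (q not antipodal to p)."""
    w = q - np.dot(p, q) * p
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros(3)
    return np.arctan2(nw, np.dot(p, q)) * w / nw


north, south = np.array([0.0, 0.0, 1.0]), np.array([0.0, 0.0, -1.0])
for phi in np.linspace(0.0, 2 * np.pi, 7):
    p = np.array([np.cos(phi), np.sin(phi), 0.0])       # a point on the equator
    residual = 0.5 * sphere_log(p, north) + 0.5 * sphere_log(p, south)
    print(round(phi, 3), np.linalg.norm(residual))      # zero for every phi
```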

5 Interpolation Error Estimates

The goal of this section is to derive estimates of optimal order for the interpolation error between a function \(u : \Omega \rightarrow M\) and its interpolant \(\mathbb {I}_{\mathcal {G}}u\). To motivate our proof, we briefly review how interpolation error estimates can be obtained in the linear case \(M = \mathbb {R}\). There, we start with an error bound on the reference element.

Theorem 5.1

Let \(u:{T_\text {ref}} \rightarrow \mathbb {R}\) satisfy \(u \in H^k({T_\text {ref}})\) with \(k>d/2\), and let \(\mathbb {I}_{{T_\text {ref}}}\) be Lagrange interpolation of order \(p\ge k-1\). Then, we have

$$\begin{aligned} \Vert u - \mathbb {I}_{{T_\text {ref}}} u \Vert _{H^1({T_\text {ref}})} \lesssim |u|_{H^k({T_\text {ref}})}. \end{aligned}$$

In order to turn Theorem 5.1 into an estimate for a small element \(T\), say, \(T = h{T_\text {ref}}\) with \(\mathcal {F}_T (x) :=h^{-1}x\) and a function \(u : T \rightarrow \mathbb {R}\), we use the fact that the Sobolev seminorm satisfies the subhomogeneity property

$$\begin{aligned} |u\circ \mathcal {F}_T^{-1}|_{H^k({T_\text {ref}})} \lesssim h^{k}h^{-d/2}|u|_{H^k(T)}. \end{aligned}$$
(26)

We obtain the factor \(h^k\) from the \(k\)-fold application of the chain rule to \(u\circ \mathcal {F}_T^{-1}(x) =u(hx)\) and the factor \(h^{-d/2}\) from the integral transformation formula.
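In one space dimension, (26) even holds with equality: \(|u\circ \mathcal {F}_T^{-1}|_{H^k({T_\text {ref}})} = h^{k-1/2}|u|_{H^k(T)}\). The following sketch confirms this with Gauss-Legendre quadrature for the illustrative choices \(k = 2\), \(T = [0,h]\), \(\mathcal {F}_T(x) = x/h\), and \(u(x) = \sin (3x)\). The helper names are hypothetical.

```python
import numpy as np

GX, GW = np.polynomial.legendre.leggauss(20)   # Gauss-Legendre rule on [-1, 1]


def integrate(f, a, b):
    """Approximate the integral of f over [a, b]."""
    x = 0.5 * (b - a) * GX + 0.5 * (a + b)
    return 0.5 * (b - a) * np.sum(GW * f(x))


def d2u(x):
    """Second derivative of the test function u(x) = sin(3x)."""
    return -9.0 * np.sin(3.0 * x)


def seminorm_ratio(h):
    """|u o F_T^{-1}|_{H^2(T_ref)} / |u|_{H^2(T)} for T = [0, h] and F_T(x) = x / h."""
    # v(y) = u(h y) has v''(y) = h^2 u''(h y) (chain rule applied k = 2 times)
    ref = integrate(lambda y: (h ** 2 * d2u(h * y)) ** 2, 0.0, 1.0)
    phys = integrate(lambda x: d2u(x) ** 2, 0.0, h)
    return np.sqrt(ref / phys)


for h in (0.5, 0.1, 0.01):
    print(h, seminorm_ratio(h), h ** 1.5)   # equal up to rounding (k - d/2 = 1.5)
```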

Then, denoting \(v:=u\circ \mathcal {F}_T^{-1}: {T_\text {ref}}\rightarrow \mathbb {R}\) and using \(\frac{d}{dx^\alpha }\mathcal {F}_{T} = h^{-1}\) for all \(\alpha = 1,\dots , d\), we get

$$\begin{aligned} |u - \mathbb {I}_{T} u |_{H^1(T)}^2&= h^{-2}\sum _{\alpha = 1}^d\int \limits _T \left| \left( \frac{d}{dx^\alpha }v-\frac{d}{dx^\alpha } \mathbb {I}_{{T_\text {ref}}} v\right) \right| ^2\circ \mathcal {F}_T(x)\,dx\\&= h^{-2}h^{d}|v-\mathbb {I}_{{T_\text {ref}}} v|_{H^1({T_\text {ref}})}^2. \end{aligned}$$

We can now invoke Theorem 5.1 and get the estimate

$$\begin{aligned} |u - \mathbb {I}_{T} u |_{H^1(T)}^2\lesssim h^{-2}h^{d}|u\circ \mathcal {F}_T^{-1}|_{H^k({T_\text {ref}})}^2 \end{aligned}$$

which, together with (26), yields the classical estimate

$$\begin{aligned} \Vert u - \mathbb {I}_{T} u \Vert _{H^1(T)} \lesssim h^{k-1}|u|_{H^k(T)}. \end{aligned}$$
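The rate \(h^{k-1}\) can be observed numerically. The sketch below makes the illustrative choices \(d = 1\), \(p = 1\), and \(u = \sin \) on \([0,1]\), so that \(k = 2\). It computes the \(H^1\) seminorm of the interpolation error elementwise with Gauss-Legendre quadrature and reports the observed orders \(\log _2(e_h / e_{h/2})\), which should be close to \(k - 1 = 1\).

```python
import numpy as np

GX, GW = np.polynomial.legendre.leggauss(8)


def h1_seminorm_error(n):
    """|u - I u|_{H^1(0,1)} for u = sin and its P1 Lagrange interpolant on n elements."""
    h = 1.0 / n
    total = 0.0
    for j in range(n):
        a, b = j * h, (j + 1) * h
        slope = (np.sin(b) - np.sin(a)) / h          # derivative of the interpolant on T_j
        x = 0.5 * h * GX + 0.5 * (a + b)             # quadrature points in T_j
        total += 0.5 * h * np.sum(GW * (np.cos(x) - slope) ** 2)
    return np.sqrt(total)


errors = [h1_seminorm_error(n) for n in (8, 16, 32, 64)]
rates = [np.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])]
print(errors)
print(rates)   # observed orders, close to 1
```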

To obtain a similar result for nonlinear codomains \(M\), we first need a generalization of Theorem 5.1. We prove such a result in Sect. 5.1, where the norm on the left becomes the quantity \(D_{1,2}\) and the norm on the right becomes the smoothness descriptor \(\Theta \). Then, in Sect. 5.2, we assemble these local estimates to establish optimal approximation rates for the geodesic finite element spaces \(V_{p,\mathcal {G}}^M\). This works because the smoothness descriptor \(\Theta \) also has the subhomogeneity property (26) (Lemma 2.8).

5.1 Nonlinear Elementwise Estimates

In this section, we prove a nonlinear generalization of the linear elementwise approximation result of Theorem 5.1. Note that the definition (23) is implicit, which complicates the analysis. We cope with this difficulty by a clever use of the equilibrium condition (24).

Let \(\log (p,\cdot ) : M \rightarrow T_pM\) be the inverse of the exponential map at \(p\). Denote by \(\nabla _1\) and \(\nabla _2\) the covariant derivatives of a bivariate function with respect to its first and second argument, respectively. In particular, for \(l\in \mathbb {N}\), we will require the derivatives

$$\begin{aligned} \nabla _2^l\log (p,q):(T_qM)^l \rightarrow T_pM \end{aligned}$$

and

$$\begin{aligned} \nabla _2^l\nabla _1\log (p,q): T_pM\otimes (T_qM)^l \rightarrow T_pM; \end{aligned}$$

more precisely their norms

$$\begin{aligned} \Vert \nabla _2^l\log (p,q)\Vert =\sup _{v_1,\dots , v_l\in T_qM} \frac{\left| \nabla _2^l\log (p,q)\left( v_1,\dots , v_l\right) \right| _{g(p)}}{\prod _{i=1}^l|v_i|_{g(q)}} \end{aligned}$$

and

$$\begin{aligned} \Vert \nabla _2^l\nabla _1\log (p,q)\Vert =\sup _{\begin{array}{c} v_1,\dots , v_l\in T_qM\\ w\in T_pM \end{array}} \frac{\left| \nabla _2^l\nabla _1\log (p,q)\left( w,v_1,\dots , v_l\right) \right| _{g(p)}}{|w|_{g(p)}\prod _{i=1}^l|v_i|_{g(q)}}. \end{aligned}$$

Now, we can state and prove a nonlinear elementwise approximation result.

Theorem 5.2

Let \(u \in W^{k,2}(T_{\text {ref}},M)\), and \(\mathbb {I}_{T_\text {ref}} u\) its \(p\)-th-order geodesic interpolation. For \(k > d/2\) and \(p\ge k-1\), we have

$$\begin{aligned} \int \limits _{T_\text {ref}} \left| \log (\mathbb {I}_{T_\text {ref}} u(x) , u(x)) \right| _{g(\mathbb {I}_{T_\text {ref}} u(x))}^2 dx \lesssim \mathcal {C}^2_{1,u}({T_\text {ref}}) \dot{\Theta }_{k,{T_\text {ref}}}(u)^2, \end{aligned}$$
(27)

and for any \(\alpha =1,\dots ,d\)

$$\begin{aligned} \int \limits _{T_\text {ref}} \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} u(x) , u(x)) \right| _{g(\mathbb {I}_{T_\text {ref}} u(x))}^2 dx \lesssim \left( \mathcal {C}^2_{1,u}({T_\text {ref}}) + \mathcal {C}^2_{2,u}({T_\text {ref}}) \right) \dot{\Theta }_{k,{T_\text {ref}}}( u)^2, \end{aligned}$$
(28)

where

$$\begin{aligned} \mathcal {C}_{1,u}({T_\text {ref}})&:=\sup _{1\le l\le k}\sup _{\begin{array}{c} p\in \mathbb {I}_{T_\text {ref}} u({T_\text {ref}})\\ q \in u({T_\text {ref}}) \end{array}} \left\| \nabla _2^l\log \left( p,q\right) \right\| \\ \end{aligned}$$

and

$$\begin{aligned} \mathcal {C}_{2,u}({T_\text {ref}})&:=\sup _{1\le l\le k}\sup _{\begin{array}{c} p \in \mathbb {I}_{T_\text {ref}} u({T_\text {ref}})\\ q \in u({T_\text {ref}}) \end{array}} \left\| \nabla _2^{l}\nabla _1\log \left( p,q\right) \right\| . \end{aligned}$$

The implicit constants are independent of \(u\) and \(M\) and only depend on the basis functions \(\lambda _i\).

Note that the left-hand sides of (27) and (28) make up the quantity \(D_{1,2}(\mathbb {I}_{T_\text {ref}}u, u)\).

Proof

We split the proof into eight steps.

Step 1 We first prove (27). Using the balance law (24), we obtain for any \(x \in {T_\text {ref}}\) that

$$\begin{aligned} \sum _{i=1}^m \lambda _i(x)\log \left( \mathbb {I}_{{T_\text {ref}}}u(x),u(a_i)\right) = 0. \end{aligned}$$

Adding a zero, we rewrite this as

$$\begin{aligned} \log (\mathbb {I}_{T_\text {ref}} u(x) , u(x)) =\log (\mathbb {I}_{T_\text {ref}} u(x) , u(x)) - \sum _{i=1}^m \lambda _i(x)\log \left( \mathbb {I}_{{T_\text {ref}}}u(x),u(a_i)\right) , \end{aligned}$$

and call the right-hand side \(\varepsilon (x) \in T_{\mathbb {I}_{T_\text {ref}} u(x)}M\). To obtain (27), we need to control the \(L^2\)-norm of the function \(\varepsilon \).

Step 2 Next, we define the auxiliary function

$$\begin{aligned} G(x,y):=\log \left( \mathbb {I}_{T_\text {ref}}u(x),u(y)\right) , \end{aligned}$$

and perform a Taylor expansion of \(G\) in its second argument around \(y=x\) (note that for fixed \(x\), the function \(G\) takes its values in a vector space).

In what follows, we shall use the notation \(\partial _y^{\vec {k}} G(x,y)\) for the partial derivatives of \(G\) with respect to its second argument and the multi-index \(\vec {k}\). The Taylor expansion then reads

$$\begin{aligned} G(x,y) = \sum _{|\vec {l}| < k}\frac{(y-x)^{\vec {l}}}{\vec {l} !} \partial _y^{\vec {l}} G(x,x) + \sum _{|{\vec {k}}|=k}R_{\vec {k}}(x,y)(y-x)^{\vec {k}}, \end{aligned}$$
(29)

where

$$\begin{aligned} R_{\vec {k}}(x,y) = \frac{|{\vec {k}}|}{{\vec {k}}!}\int \limits _0^1(1-t)^{|{\vec {k}}|-1} \partial _y^{\vec {k}} G(x,x+t(y-x))\,dt. \end{aligned}$$
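The integral form of the remainder in (29) can be checked in the scalar case \(d = 1\), where the sum over \(|\vec {k}| = k\) has a single term. The sketch uses \(y \mapsto \exp (y)\) in place of \(G(x,\cdot )\) as an illustrative test function, since all of its derivatives are again \(\exp \). The integral in \(R_k\) is evaluated with Gauss-Legendre quadrature on \([0,1]\).

```python
import math
import numpy as np

TX, TW = np.polynomial.legendre.leggauss(30)
T, W = 0.5 * (TX + 1.0), 0.5 * TW   # Gauss-Legendre rule on [0, 1]


def taylor_with_remainder(x, y, k):
    """Right-hand side of (29) in d = 1 for the test function exp (all derivatives equal exp)."""
    poly = sum(math.exp(x) * (y - x) ** l / math.factorial(l) for l in range(k))
    # R_k(x, y) = (k / k!) * int_0^1 (1 - t)^(k-1) exp(x + t (y - x)) dt
    r_k = k / math.factorial(k) * np.sum(W * (1.0 - T) ** (k - 1) * np.exp(x + T * (y - x)))
    return poly + r_k * (y - x) ** k


for k in (1, 2, 4):
    print(k, taylor_with_remainder(0.3, 1.1, k), math.exp(1.1))   # the columns agree
```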

We can express the terms \(\log \left( \mathbb {I}_{{T_\text {ref}}}u(x),u(a_i)\right) \) occurring in the definition of \(\varepsilon \) in the form (29) and get

$$\begin{aligned} \varepsilon (x) \!=\! G(x,x)\!-\! \sum _{i=1}^m\lambda _i(x) \left( \sum _{|{\vec {l}}| < k}\frac{(a_i-x)^{\vec {l}}}{{\vec {l}} !} \partial _y^{\vec {l}} G(x,x) \,+\, \sum _{|{\vec {k}}|=k}R_{\vec {k}}(x,a_i)(a_i\!-\!x)^{\vec {k}}\right) \!. \end{aligned}$$

Using that the weight functions \(\lambda _i\) form a partition of unity on \(T_\text {ref}\) (22), we get

$$\begin{aligned} -\varepsilon (x) \!=\! \sum _{0<|{\vec {l}}| \!<\! k}\sum _{i=1}^m\lambda _i(x) \frac{(a_i-x)^{\vec {l}}}{{\vec {l}} !} \partial _y^{\vec {l}} G(x,x) \,+\, \sum _{i=1}^m\lambda _i(x)\sum _{|{\vec {k}}|=k} R_{\vec {k}}(x,a_i)(a_i-x)^{\vec {k}}, \end{aligned}$$
(30)

where the zeroth-order derivative cancels with \(G(x,x)\).

Step 3 By the assumption \(p\ge k-1\), we can apply Lemma 4.2 with \(f(x) = \partial _y^{\vec {l}} G(x,x)\) to each sum \(\sum _{i=1}^m\lambda _i(x) \frac{(a_i-x)^{\vec {l}}}{{\vec {l}} !} \partial _y^{\vec {l}} G(x,x)\) in (30) and see that the first addend in (30) is zero. Hence, we can write \(\varepsilon (x)\) as the sum

$$\begin{aligned} \varepsilon (x) = \sum _{i=1}^m\varepsilon _i(x) \qquad \text {with}\qquad \varepsilon _i(x) :=-\lambda _i(x) \sum _{|{\vec {k}}|=k} R_{\vec {k}}(x,a_i)(a_i-x)^{\vec {k}}. \end{aligned}$$

We now treat each term \(\varepsilon _i\) separately. For simplicity, we may assume, after a suitable translation (depending on the index \(i\)), that \(a_i = 0\), and thus, we arrive at the pointwise estimate

$$\begin{aligned} \left| \varepsilon _i(x)\right| _{g(\mathbb {I}_{T_\text {ref}} u(x))}&\lesssim \sum _{|{\vec {k}}| =k}\bigg |\frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\partial _y^{\vec {k}} G(x,tx) x^{\vec {k}} \,dt \bigg |_{g(\mathbb {I}_{T_\text {ref}} u(x))}\nonumber \\&\le \sum _{|{\vec {k}}| =k}\frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\left| \partial _y^{\vec {k}} G(x,tx)\right| _{g(\mathbb {I}_{T_\text {ref}} u(x))} x^{\vec {k}} \,dt, \end{aligned}$$
(31)

where we have used that the \(\lambda _i\) are bounded on \(T_\text {ref}\).
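In the linear case \(M = \mathbb {R}\) we have \(G(x,y) = u(y) - \mathbb {I}_{T_\text {ref}}u(x)\). Hence \(\varepsilon (x) = u(x) - \mathbb {I}_{T_\text {ref}}u(x)\) is the ordinary interpolation error, and Step 3 states that it equals \(-\sum _i \lambda _i(x) \sum _{|\vec {k}|=k} R_{\vec {k}}(x,a_i)(a_i - x)^{\vec {k}}\). The sketch compares both sides for the illustrative case \(d = 1\), \(p = 1\), \(k = 2\), nodes \(a = (0, 1)\), and \(u = \exp \).

```python
import numpy as np

TX, TW = np.polynomial.legendre.leggauss(30)
T, W = 0.5 * (TX + 1.0), 0.5 * TW   # Gauss-Legendre rule on [0, 1]
NODES = np.array([0.0, 1.0])        # P1 Lagrange nodes a_i on T_ref = [0, 1]


def basis(x):
    """P1 Lagrange basis functions (lambda_1, lambda_2) evaluated at x."""
    return np.array([1.0 - x, x])


def remainder_r2(x, y):
    """R_2(x, y) = (2 / 2!) * int_0^1 (1 - t) u''(x + t (y - x)) dt for u = exp."""
    return np.sum(W * (1.0 - T) * np.exp(x + T * (y - x)))


def eps_direct(x):
    """epsilon(x) = u(x) - I u(x) in the linear case, for u = exp."""
    return np.exp(x) - np.dot(basis(x), np.exp(NODES))


def eps_from_remainders(x):
    """Step 3 representation: -sum_i lambda_i(x) R_2(x, a_i) (a_i - x)^2."""
    return -sum(lam * remainder_r2(x, a) * (a - x) ** 2 for lam, a in zip(basis(x), NODES))


for x in (0.1, 0.5, 0.9):
    print(x, eps_direct(x), eps_from_remainders(x))   # the two columns agree
```

The first-order terms \(\sum _i \lambda _i(x)(a_i - x)u'(x)\) do not appear because they vanish for P1 basis functions, which is exactly the role of Lemma 4.2.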

Step 4 In order to untangle the derivatives of \(\log \) and \(u\) in the expression \(\partial _y^{\vec {k}} G(x,y)\), we use the chain rule which yields

$$\begin{aligned} \left| \partial _y^{\vec {k}} G(x,y)\right| _{g(\mathbb {I}_{T_\text {ref}} u(x))}&\lesssim \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j}\\ \sum _{j=1}^{l}m_j = k \end{array}} \nabla _2^{l}\log \left( \mathbb {I}_{T_\text {ref}} u(x),u(y)\right) \nonumber \\&\quad \times \, \Big (\mathcal {D}^{{\vec {\beta }}_1}u(y),\dots , \mathcal {D}^{{\vec {\beta }}_l}u(y)\Big ). \end{aligned}$$
(32)

We repeat that we use the notation

$$\begin{aligned} \nabla _2^l\log (p,q) : \left( T_qM\right) ^l \rightarrow T_pM \end{aligned}$$

to denote the \(l\)-th-order covariant derivative of the function \(q\mapsto \log (p,q)\), which is an \(l\)-multilinear form.

Remark 5.1

We record here that this is the point where the smoothness descriptor \(\dot{\Theta }\) (defined in Sect. 2.4) becomes necessary. Indeed, (32) already indicates that control over products of covariant derivatives of lower order is required whenever \(\nabla _2^{l}\log \ne 0\). Note also that in the linear case we have \(\nabla _2^{l}\log = 0\) for all \(l>1\) and therefore the usual Sobolev seminorm \(|\cdot |_{H^k}\) is sufficient to obtain the desired control over terms of the form (32). Keeping this in mind, it is easy to see that in the linear case our proof yields exactly the expected bounds for the interpolation error in the Sobolev seminorm.

  • By (31), we get

    $$\begin{aligned}&\int \limits _{{T_\text {ref}}}\left| \varepsilon _i(x) \right| ^2_{g(\mathbb {I}_{T_\text {ref}} u(x))} \,dx\\&\quad \lesssim \int \limits _{T_\text {ref}} \left( \int \limits _0^1t^{|{\vec {k}} | - 1}\left| \partial _y^{\vec {k}} G(x,tx)\right| _{g(\mathbb {I}_{T_\text {ref}} u(x))} x^{\vec {k}} \,dt\right) ^2 dx \\&\quad \lesssim \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j}\\ \sum _{j=1}^{l}m_j = k \end{array}} \int \limits _{T_\text {ref}} \left( \int \limits _0^1t^{|{\vec {k}} | - 1}\left| \nabla _2^{l}\log \left( \mathbb {I}_{T_\text {ref}} u(x),u(tx)\right) \right. \right. \\&\qquad \times \left. \left. \left( \mathcal {D}^{{\vec {\beta }}_1}u(tx),\dots , \mathcal {D}^{{\vec {\beta }}_l}u(tx)\right) \right| _{g(\mathbb {I}_{T_\text {ref}} u(x))} x^{\vec {k}} \,dt\right) ^2 dx\\&\quad \le \! \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j}\\ \sum _{j=1}^{l}m_j = k \end{array}} \int \limits _{T_\text {ref}} \left( \int \limits _0^1t^{|{\vec {k}} | \!-\! 1}H^{{\vec {\beta }}_1,\dots ,{\vec {\beta }}_{l}}(x,tx) x^{\vec {\beta }} \,dt\right) ^2 dx, \end{aligned}$$

    where we have put

    $$\begin{aligned} H^{{\vec {\beta }}_1,\dots ,{\vec {\beta }}_{l}}(x,y):=\left\| \nabla _2^{l}\log \left( \mathbb {I}_{T_\text {ref}} u(x),u(y)\right) \right\| \prod _{j=1}^{l}\left| \mathcal {D}^{{\vec {\beta }}_j}u(y)\right| _{g(u(y))}. \end{aligned}$$
  • Step 5 In the appendix, we have collected a few estimates for remainder terms of Taylor series. We can use Lemma 8.1 for the functions \(H^{{\vec {\beta }}_1,\dots ,{\vec {\beta }}_{l}}\), which gives us the bound

    $$\begin{aligned} \int \limits _{{T_\text {ref}}}|\varepsilon _i(x)|_{g(\mathbb {I}_{T_\text {ref}} u(x))}^2\, dx&\lesssim \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j}\\ \sum _{j=1}^{l}m_j = k \end{array}} \int \limits _{T_\text {ref}} \sup _{x\in {T_\text {ref}}}|H^{{\vec {\beta }}_1,\dots , {\vec {\beta }}_{l}}(x,y)|^2_{g(u(y))}\,dy\\&\le \sup _{1\le l\le k}\sup _{\begin{array}{c} p\in \mathbb {I}_{T_\text {ref}} u({T_\text {ref}})\\ q \in u({T_\text {ref}}) \end{array}} \left\| \nabla _2^{l}\log \left( p,q\right) \right\| ^2\\&\quad \,\times \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j}\\ \sum _{j=1}^{l}m_j = k \end{array}} \int \limits _{T_\text {ref}} \prod _{j=1}^{l}\left| \mathcal {D}^{{\vec {\beta }}_j}u(x)\right| _{g(u(x))}^2dx\\&= \mathcal {C}^2_{1,u}({T_\text {ref}}) \dot{\Theta }_{k,{T_\text {ref}}}(u)^2. \end{aligned}$$

    This concludes the first part of the proof.

  • Step 6 We now turn to the estimate for the first derivatives. To that end, we need to bound the \(L^2\)-norm of \(\frac{D}{dx^\alpha } \varepsilon (x)\). Since the functions \(\lambda _i\) have uniformly bounded first derivatives on \({T_\text {ref}}\), by the product rule, we can further reduce the problem to bounding the \(L^2\)-norm of

    $$\begin{aligned} \frac{D}{dx^\alpha } \sum _{|{\vec {k}}| = k} \frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\partial _y^{\vec {k}} G(x,tx) x^{\vec {k}} \,dt \end{aligned}$$

    for \(\alpha = 1,\dots , d\). Using the chain rule, we get

    $$\begin{aligned} \frac{D}{dx^\alpha } \sum _{|{\vec {k}}| = k} \frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\partial _y^{\vec {k}} G(x,tx) x^{\vec {k}}\, dt = \mathrm {I}(x,x) + \mathrm {II}(x), \end{aligned}$$

    with

    $$\begin{aligned} \mathrm {I}(z,x)&:=\sum _{|{\vec {k}}| = k} \frac{d}{dx^\alpha }\frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\partial _y^{\vec {k}} G(z,tx) x^{\vec {k}} \, dt,\qquad z\in {T_\text {ref}} \end{aligned}$$

    and

    $$\begin{aligned} \mathrm {II}(x)&:=\sum _{|{\vec {k}}| = k} \frac{|{\vec {k}}|}{{\vec {k}}!} \int \limits _0^1t^{|{\vec {k}} | - 1}\partial _y^{\vec {k}} \frac{D}{dx^\alpha } G(x,z) \bigg |_{z = tx} x^{\vec {k}} \, dt. \end{aligned}$$
  • Step 7 To bound the \(L^2\)-norm of \(\mathrm {II}\), we may again use Lemma 8.1 and proceed exactly in the same fashion as for the proof of (27) in Step 5 above. More precisely, by the chain rule, we can bound

    $$\begin{aligned}&\Big |\partial _y^{\vec {k}}\frac{D}{dx^\alpha } G(x,z) \big |_{z = y}\Big |_{g(\mathbb {I}_{T_\text {ref}} u(x))} \\&\quad \lesssim \sum _{\begin{array}{c} 1\le l\le k,\ \vec {\beta }_j\in [d]^{m_j} \\ \sum _{j=1}^{l}m_j = k \end{array}}\left| \nabla _2^{l}\nabla _1\log \left( \mathbb {I}_{T_\text {ref}} u(x),u(y)\right) \right. \\&\qquad \times \left. \left( \frac{d}{dx^\alpha }\mathbb {I}_{T_\text {ref}} u(x),\mathcal {D}^{{\vec {\beta }}_1}u(y),\dots , \mathcal {D}^{{\vec {\beta }}_l}u(y)\right) \right| _{g(\mathbb {I}_{T_\text {ref}} u(x))}, \end{aligned}$$

    where we recall that \(\nabla _1\) denotes the covariant derivative of the vector field \(p\mapsto \log (p,q)\). Using that \(\mathbb {I}_{T_\text {ref}} u\) has uniformly bounded derivatives, and arguing exactly as in Step 4 and Step 5, we obtain

    $$\begin{aligned} \int \limits _{{T_\text {ref}}}| \mathrm {II}(x)|_{g(\mathbb {I}_{T_\text {ref}} u(x))}^2\,dx&\lesssim \sup _{1\le l\le k}\sup _{\begin{array}{c} p\in \mathbb {I}_{T_\text {ref}} u({T_\text {ref}})\\ q \in u({T_\text {ref}}) \end{array}} \left\| \nabla _2^{l}\nabla _1\log \left( p,q\right) \right\| ^2 \dot{\Theta }_{k,{T_\text {ref}}}(u)^2\\&= \mathcal {C}^2_{2,u}({T_\text {ref}}) \dot{\Theta }_{k,{T_\text {ref}}}(u)^2. \end{aligned}$$
  • Step 8 The bound for \(\mathrm {I}(z,x)\) is more subtle. At first sight, it looks as if a bound for \(\mathrm {I}(z,x)\) would require derivatives of order \(k+1\) of \(u\), which may not be available. However, by Lemma 8.2 applied to the function \(U(\cdot ) :=G(z,\cdot ):T_{\text {ref}}\rightarrow T_{\mathbb {I}_{T_{\text {ref}}}u(z)}M\) for every fixed \(z\in T_{\text {ref}}\), we can write

    $$\begin{aligned} \mathrm {I}(z,x) = \sum _{|{\vec {l}} | = k-1}\frac{(-1)^{k-1}}{{\vec {l}} !} x^{\vec {l}} \partial _y^{{\vec {l}} + \vec {e}_\alpha }G(z,x), \end{aligned}$$

    which contains only derivatives of the desired order \(k\). Here, for any \(\alpha \in \{ 1,\dots ,d\}\), we use the notation \(\vec {e}_\alpha \in \mathbb {N}_0^d\) for the unit multi-index whose \(\alpha \)-th component is \(1\) and whose other components are \(0\).

Now, we can proceed as in Steps 4 and 5 above (using Lemma 8.1) to show that

$$\begin{aligned} \int \limits _{{T_\text {ref}}}|\mathrm {I}(x,x)|_{g(\mathbb {I}_{T_\text {ref}} u(x))}^2\,dx \lesssim \sup _{1\le l\le k}\sup _{\begin{array}{c} p\in \mathbb {I}_{T_\text {ref}} u({T_\text {ref}})\\ q \in u({T_\text {ref}}) \end{array}} \left\| \nabla _2^{l}\log \left( p,q\right) \right\| ^2 \dot{\Theta }_{k,{T_\text {ref}}}(u)^2 \end{aligned}$$

which, together with the bound for \(\mathrm {II}\) from Step 7, proves (28).
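The mechanism of Step 8 can be seen in \(d = 1\). Let \(E(x) = \frac{k}{k!}\int _0^1 t^{k-1} U^{(k)}(tx)\,x^k\,dt\). Telescoping the Taylor expansion of \(U(0)\) around \(x\) shows that \(|E'(x)| = |x^{k-1}U^{(k)}(x)|/(k-1)!\), so differentiating produces no derivative of order \(k+1\). The sketch compares a central finite difference of \(E\) with this expression for the illustrative choice \(U = \sin \). Only magnitudes are compared, since only they enter the norm bounds.

```python
import math
import numpy as np

TX, TW = np.polynomial.legendre.leggauss(30)
T, W = 0.5 * (TX + 1.0), 0.5 * TW   # Gauss-Legendre rule on [0, 1]


def E(x, k):
    """(k/k!) * int_0^1 t^(k-1) U^(k)(t x) x^k dt for the test function U = sin."""
    uk = np.sin(T * x + k * np.pi / 2)   # U^(k)(t x), since sin^(k)(s) = sin(s + k pi / 2)
    return k / math.factorial(k) * np.sum(W * T ** (k - 1) * uk) * x ** k


def derivative_magnitude(x, k):
    """|x^(k-1) U^(k)(x)| / (k-1)!, which involves only the k-th derivative of U."""
    return abs(x ** (k - 1) * np.sin(x + k * np.pi / 2)) / math.factorial(k - 1)


step = 1e-5
for k in (2, 3):
    for x in (0.3, 0.8):
        fd = (E(x + step, k) - E(x - step, k)) / (2 * step)   # central difference for E'(x)
        print(k, x, abs(fd), derivative_magnitude(x, k))     # the magnitudes agree
```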

The previous theorem has bounded the interpolation error on the reference element. We now derive an estimate on a general element \(T\).

Theorem 5.3

Let \(T\) be a domain in \(\mathbb {R}^{d}\), \(u \in W^{k,2}(T,M)\), \(\mathbb {I}_{T} u\) its \(p\)-th-order geodesic interpolation, and \(\mathcal {F}_T:T\rightarrow {T_\text {ref}}\) a map that scales with \(h\) of order \(l\ge k\). For \(k > d/2\) and \(p\ge k-1\), we have the estimate

$$\begin{aligned} D_{1,2}(\mathbb {I}_Tu,u) \lesssim h^{k-1} C_{M,T}(u)\Theta _{k,T}(u), \end{aligned}$$

with

$$\begin{aligned} C_{M,T}(u):=\mathcal {C}_{1,u}(T) + \mathcal {C}_{2,u}(T). \end{aligned}$$
(33)

The implicit constant is independent of \(M\) and only depends on the basis functions \(\lambda _i\).

Proof

We use the representation

$$\begin{aligned} \mathbb {I}_Tu= \mathbb {I}_{T_\text {ref}} \left( u\circ \mathcal {F}_T^{-1}\right) \circ \mathcal {F}_T. \end{aligned}$$

As a first step, we prove the desired estimate for the \(L^2\)-part. Putting \(v :=u\circ \mathcal {F}_T^{-1}:{T_\text {ref}} \rightarrow M\), \(y = \mathcal {F}_T(x)\), and using (16b), we get

$$\begin{aligned} \int \limits _T \left| \log (\mathbb {I}_T u(x) , u(x)) \right| _{g(\mathbb {I}_T u(x))}^2 dx&= \int \limits _{{T_\text {ref}}} \left| \log (\mathbb {I}_{T_\text {ref}} v(y) , v(y)) \right| _{g(\mathbb {I}_{T_\text {ref}} v(y))}^2 \\&\qquad \times \left| \det \left( \nabla \mathcal {F}_T \right) \right| ^{-1} \circ \mathcal {F}_T^{-1}(y)\, dy\\&\le h^{d}\int \limits _{{T_\text {ref}}} \left| \log (\mathbb {I}_{T_\text {ref}} v(y) , v(y)) \right| _{g(\mathbb {I}_{T_\text {ref}} v(y))}^2 dy. \end{aligned}$$

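By Theorem 5.2, and since \(v({T_\text {ref}}) = u(T)\) and \(\mathbb {I}_{T_\text {ref}} v({T_\text {ref}}) = \mathbb {I}_T u(T)\), so that \(\mathcal {C}_{1,v}({T_\text {ref}}) = \mathcal {C}_{1,u}(T)\), we can further estimate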

$$\begin{aligned} h^{d}\int \limits _{{T_\text {ref}}} \left| \log (\mathbb {I}_{T_\text {ref}} v(y) , v(y)) \right| _{g(\mathbb {I}_{T_\text {ref}} v(y))}^2 dy \lesssim h^{d}\mathcal {C}_{1,u}(T)^2\dot{\Theta }_{k,{T_\text {ref}}}(v)^2. \end{aligned}$$

Finally, we use Lemma 2.8 and the definition of \(v\) to arrive at

$$\begin{aligned} h^{d}\mathcal {C}_{1,u}(T)^2\dot{\Theta }_{k,{T_\text {ref}}}(v)^2 \lesssim h^{2k}\mathcal {C}_{1,u}(T)^2\Theta _{k,T}(u)^2. \end{aligned}$$

Hence, we have shown the \(L^2\)-part of the assertion.

We go on to estimate the quantity

$$\begin{aligned} \int \limits _T \Big |\frac{D}{dx^\alpha }\log (\mathbb {I}_T u(x) , u(x)) \Big |_{g(\mathbb {I}_T u(x))}^2 dx \end{aligned}$$

for an \(\alpha \in \{1,\dots ,d\}\). The chain rule yields that

$$\begin{aligned}&\int \limits _T \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_T u(x) , u(x)) \right| _{g(\mathbb {I}_T u(x))}^2 dx\\&\quad \le \int \limits _T \Bigg [\left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} v(\cdot ) , v(\cdot )) \right| _{g(\mathbb {I}_{T_\text {ref}} v(\cdot ))}^2\circ \mathcal {F}_T(x) \Bigg ] \cdot \left| \frac{d}{dx^\alpha }\mathcal {F}_T(x)\right| ^2 dx . \end{aligned}$$

Now, we use the scaling assumption (16c) for the term \(\left| \frac{d}{dx^\alpha }\mathcal {F}_T(x)\right| \) to bound the previous quantity by

$$\begin{aligned} h^{-2} \int \limits _T \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} v(\cdot ) , v(\cdot )) \right| _{g(\mathbb {I}_{T_\text {ref}} v(\cdot ))}^2\circ \mathcal {F}_T(x) \,dx. \end{aligned}$$

We can now again use the substitution \(y=\mathcal {F}_T(x)\) and, using (16b), get the bound

$$\begin{aligned}&h^{-2} \int \limits _T \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} v(\cdot ) , v(\cdot )) \right| _{g(\mathbb {I}_{T_\text {ref}} v(\cdot ))}^2\circ \mathcal {F}_T(x)\, dx\\&\quad \le h^{d-2}\int \limits _{T_\text {ref}} \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} v(y) , v(y)) \right| _{g(\mathbb {I}_{T_\text {ref}} v(y))}^2dy. \end{aligned}$$

Now, we can again invoke Theorem 5.2 to deduce the estimate

$$\begin{aligned} h^{d-2}\int \limits _{T_\text {ref}} \left| \frac{D}{dx^\alpha }\log (\mathbb {I}_{T_\text {ref}} v(y) , v(y)) \right| _{g(\mathbb {I}_{T_\text {ref}} v(y))}^2dy \lesssim h^{d-2}\left( \mathcal {C}_{1,u}(T)^2 + \mathcal {C}_{2,u}(T)^2\right) \dot{\Theta }_{k,{T_\text {ref}}}(v)^2. \end{aligned}$$

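Finally, applying Lemma 2.8 to \(\dot{\Theta }_{k,{T_\text {ref}}}(v)\) gives \(h^{d-2}\dot{\Theta }_{k,{T_\text {ref}}}(v)^2 \lesssim h^{2(k-1)}\Theta _{k,T}(u)^2\), which yields the desired bound. \(\square \)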

5.2 Global Interpolation Error Bounds

We now use Theorem 5.3 to obtain a global approximation result. The necessary grid regularity is formalized in the following definition.

Definition 5.1

We say that a grid \(\mathcal {G}\) is of width \(h\) if for each element \(T\) of \(\mathcal {G}\) the map \(\mathcal {F}_T\) from \(T\) to its reference element scales with \(h\) (of order \(p\), where \(p\) is the order of the Lagrange shape functions used in the construction of the GFE spaces).

Particular instances of such grids are shape-regular triangulations with element diameters of order \(h\). However, the definition also covers more general cases, such as grids where the \(\mathcal {F}_T\) are polynomials.

Theorem 5.4

Let \(\Omega \) be a domain with a conforming grid \(\mathcal {G}\) of width \(h\), and let \(\mathbb {I}_\mathcal {G}\) be the pointwise interpolation operator onto the space of \(p\)-th-order geodesic finite elements on \(\mathcal {G}\). If \(k > d/2\) and \(p\ge k-1\), we have the estimate

$$\begin{aligned} D_{1,2}(\mathbb {I}_{\mathcal {G}}u,u) \lesssim h^{k-1}C_{M,\mathcal {G}}(u) \Theta _{k,\Omega }( u) \end{aligned}$$
(34)

with

$$\begin{aligned} C_{M,\mathcal {G}}(u) :=\sup _{T\in \mathcal {G}}C_{M,T}(u), \end{aligned}$$

and \(C_{M,T}(u)\) as defined in (33). The implicit constants are independent of \(M\) and only depend on the shape functions \(\lambda _i\). For \(h\rightarrow 0\), the constant \(C_{M,\mathcal {G}}(u)\) approaches the limit

$$\begin{aligned} \lim _{h\rightarrow 0}C_{M,\mathcal {G}}(u)= \sup _{1\le l\le k}\sup _{q \in u(\Omega )} \left\| \nabla _2^{l}\log \left( q,q\right) \right\| + \sup _{1\le l\le k} \sup _{q\in u(\Omega )}\left\| \nabla _2^{l}\nabla _1\log (q,q)\right\| . \end{aligned}$$
(35)

Proof

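The bound (34) follows by applying Theorem 5.3 on each element \(T\in \mathcal {G}\), bounding \(C_{M,T}(u)\) by \(C_{M,\mathcal {G}}(u)\), and summing the squared elementwise bounds over all elements. To show (35), note that \(u\) is uniformly continuous by the Sobolev embedding, because \(k>d/2\). Therefore, the sets \(u(T)\) and \(\mathbb {I}_Tu(T)\) converge to single points as \(h\) goes to zero. \(\square \)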

The error estimates of Theorem 5.4 bound the error between a function and its Lagrange interpolant whenever the function has smoothness \(k>d/2\). In particular, in three dimensions, our results require that \(u\in W^{k,2}(\Omega ,M)\) with \(k>3/2\). The same requirement appears in the linear theory, because the Lagrange interpolant is only defined if the function space embeds into the continuous functions.

Nevertheless, the numerical experiments in Fig. 1 indicate optimal approximation properties of both linear and geodesic finite element spaces even for \(k\le d/2\).

Fig. 1

GFE approximation rates for the functions \(u^\alpha \) which map \((x_1,x_2,x_3)\in [0,1]^3\) to \(\left( .5\cdot x_1^\alpha ,.5\cdot x_2^\alpha ,.5\cdot x_3^\alpha ,\left( 1-.25\cdot x_1^{2\alpha }-.25\cdot x_2^{2\alpha }-.25\cdot x_3^{2\alpha }\right) ^{1/2}\right) \in \mathbb {S}^3\). It is classical that \(u^\alpha \in H^{\beta }\) for all \(\beta <\alpha + .5\), see, for instance, [53]. Even though for \(\alpha <1\) the functions \(u^\alpha \) are not in \(H^{d/2}\), the figure suggests that the best GFE approximation with order \(p=1\) nevertheless converges with the optimal rate.

In the linear setting, this stronger result is proved using the Clément interpolation operator [14]. A generalization of this technique to nonlinear finite element spaces would be interesting.
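One can check directly that the test maps of Fig. 1 take values in \(\mathbb {S}^3 \subset \mathbb {R}^4\): the squared first three components \(0.25\,x_j^{2\alpha }\) cancel against the square of the last one. The following short sketch uses a function name and random sample points of our own choosing.

```python
import numpy as np


def u_alpha(x, alpha):
    """The test map [0,1]^3 -> S^3 from Fig. 1; x has shape (..., 3)."""
    first = 0.5 * x ** alpha
    last = np.sqrt(1.0 - np.sum(0.25 * x ** (2 * alpha), axis=-1, keepdims=True))
    return np.concatenate([first, last], axis=-1)


rng = np.random.default_rng(1)
points = rng.uniform(0.0, 1.0, size=(1000, 3))
for alpha in (0.25, 0.5, 1.0):
    norms = np.linalg.norm(u_alpha(points, alpha), axis=-1)
    print(alpha, np.max(np.abs(norms - 1.0)))   # values lie on S^3 up to rounding
```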

Remark 5.2

For the linear case \(M = \mathbb {R}\), we have \(\log (q_1,q_2) = q_2 - q_1\),

$$\begin{aligned} \nabla _2^l\log (q_1,q_2) = {\left\{ \begin{array}{ll} 1 &{} \text {if } l = 1\\ 0 &{} \text {if } l>1, \end{array}\right. } \qquad \text {and} \qquad \nabla _2^l\nabla _1\log (q_1,q_2) = 0 \qquad \text {for } l\ge 1. \end{aligned}$$

Therefore,

$$\begin{aligned} C_{M,\mathcal {G}} = 1 \end{aligned}$$

for any grid \(\mathcal {G}\) of width \(h\).

We also remark that the argument used for Theorem 5.4 yields error estimates in terms of

$$\begin{aligned} D_{1,q}(v,w) :=\left( \mathrm{dist }_{L^q} (v,w )^q +\sum _{\alpha =1}^d \int \limits _\Omega \left| \frac{D}{dx^\alpha }\log (v(x),w(x))\right| _{g(v(x))}^q dx\right) ^{1/q}, \end{aligned}$$

and \(\Theta _{q,k,\Omega }\) for \(q\in [1,\infty )\), with obvious modifications for \(q=\infty \). In this case, we require \(k>d/q\) so that pointwise interpolation is defined in \(W^{k,q}(\Omega ,M)\). The proof proceeds as for Theorem 5.4, except that the remainder terms occurring in the proof of Theorem 5.2 (e.g., in Step 2) have to be estimated in the \(q\)-norm, which is done similarly to the \(L^2\)-norm bounds.
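In the linear case \(M = \mathbb {R}\) (Remark 5.2), \(\log (v(x), w(x)) = w(x) - v(x)\) and the covariant derivative is the ordinary one. Reading the first term as \(\mathrm{dist}_{L^q}(v,w)^q\), \(D_{1,q}(v,w)\) is then the \(W^{1,q}\) norm of \(w - v\). The sketch evaluates it for \(d = 1\) and \(\Omega = (0,1)\) with Gauss-Legendre quadrature. Passing \(w - v\) and its derivative as callables is an illustrative interface, not part of the text.

```python
import numpy as np

GX, GW = np.polynomial.legendre.leggauss(20)
X, W = 0.5 * (GX + 1.0), 0.5 * GW            # Gauss-Legendre rule on (0, 1)


def d1q_linear(diff, ddiff, q):
    """D_{1,q}(v, w) for M = R, d = 1, given diff(x) = w(x) - v(x) and its derivative ddiff."""
    lq_part = np.sum(W * np.abs(diff(X)) ** q)       # dist_{L^q}(v, w)^q
    deriv_part = np.sum(W * np.abs(ddiff(X)) ** q)   # int |d/dx (w - v)|^q dx
    return (lq_part + deriv_part) ** (1.0 / q)


# v = 0 and w(x) = x: the exact value is (1/(q+1) + 1)^(1/q)
for q in (1, 2, 4):
    print(q, d1q_linear(lambda x: x, lambda x: np.ones_like(x), q), (1 / (q + 1) + 1) ** (1 / q))
```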

Theorem 5.5

Let \(\Omega \) be a domain with a conforming grid \(\mathcal {G}\) of width \(h\), let \(q\in [1,\infty ]\), and let \(k>d/q\) and \(p\ge k-1\). Then we have the estimates

$$\begin{aligned} D_{1,q}(\mathbb {I}_{\mathcal {G}}u,u)&\lesssim h^{k-1}C_{M,\mathcal {G}}(u) \Theta _{q,k,\Omega }( u) \end{aligned}$$

and

$$\begin{aligned} \mathrm{dist }_{L^q}(\mathbb {I}_{\mathcal {G}}u,u)&\lesssim h^{k}C_{M,\mathcal {G}}(u) \Theta _{q,k,\Omega }( u). \end{aligned}$$

The implicit constants are independent of \(M\) and only depend on the shape functions \(\lambda _i\).
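Take \(q = \infty \), \(p = 1\), and a smooth \(u\), so that \(k = 2\). The second estimate above then predicts \(\mathrm{dist}_{L^\infty }(\mathbb {I}_{\mathcal {G}}u,u) \lesssim h^2\). The sketch below checks this for \(d = 1\) and \(M = S^2\). As for the two-node reference element, it assumes that the P1 geodesic interpolant on each element is spherical linear interpolation between the nodal values. The test curve, grid sizes, and sampling are illustrative choices, and the maximum is taken over sample points only.

```python
import numpy as np


def slerp(a, b, t):
    """Point at fraction t along the shortest great-circle arc from a to b."""
    theta = np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))
    if theta < 1e-15:
        return a.copy()
    return (np.sin((1 - t) * theta) * a + np.sin(t * theta) * b) / np.sin(theta)


def u(x):
    """Smooth test curve on S^2 (illustrative choice)."""
    return np.array([np.cos(x) * np.cos(2 * x), np.cos(x) * np.sin(2 * x), np.sin(x)])


def sphere_dist(a, b):
    """Geodesic distance on the unit sphere."""
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))


def max_error(n, samples_per_element=20):
    """Sampled dist_{L^inf}(I_G u, u) on a uniform grid of [0, 1] with n elements."""
    h, err = 1.0 / n, 0.0
    for j in range(n):
        a, b = u(j * h), u((j + 1) * h)
        for t in np.linspace(0.0, 1.0, samples_per_element):
            err = max(err, sphere_dist(slerp(a, b, t), u((j + t) * h)))
    return err


errors = [max_error(n) for n in (16, 32, 64)]
print([np.log2(e0 / e1) for e0, e1 in zip(errors, errors[1:])])   # observed orders, close to 2
```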

Additionally, we obtain the following stability result for the pointwise interpolation operator.

Corollary 5.1

There exists a constant \(C_4\) which only depends on \(M\) and the shape functions \(\lambda _i\) (but not on \(h\)) such that for \(q>\max (2,d)\) we have

$$\begin{aligned} \Theta _{q,1,\Omega }\left( \mathbb {I}_{\mathcal {G}}u\right) \le C_4 \Theta _{q,1,\Omega }\left( u\right) . \end{aligned}$$

Proof

For simplicity, we only show the case \(q=\infty \), the general case being similar. We assume that our manifold \(M\) is smoothly embedded into \(\mathbb {R}^N\). With the ansatz and notation of the proof of Lemma 2.3, we obtain

$$\begin{aligned} \frac{d}{dx^\alpha }u(x)&= \partial _1\exp \left( \mathbb {I}_\mathcal {G} u(x),\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right) \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x) \\&+\, \partial _2\exp \left( \mathbb {I}_\mathcal {G} u(x),\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right) \frac{D}{dx^\alpha }\log (\mathbb {I}_\mathcal {G} u(x),u(x)) \end{aligned}$$

for all \(\alpha \in \{1,\dots ,d\}\), which implies that

$$\begin{aligned}&\Big |\partial _1\exp \left( \mathbb {I}_\mathcal {G} u(x),\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right) \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\Big |\nonumber \\&\qquad \lesssim \Big |\frac{d}{dx^\alpha } u(x)\Big |+\Big |\frac{D}{dx^\alpha }\log (\mathbb {I}_\mathcal {G} u(x),u(x))\Big | \end{aligned}$$
(36)

for all \(\alpha \in \{1,\dots , d\}\). Now, we use the fact that

$$\begin{aligned} \partial _1\exp \left( \mathbb {I}_\mathcal {G} u(x),0\right) \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x) = \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x), \end{aligned}$$

together with the Lipschitz continuity of \(\partial _1\exp (p,w)\) in \(w\) to get that, up to a constant \(C\) independent of \(h\),

$$\begin{aligned}&\Big |\partial _1\exp \left( \mathbb {I}_\mathcal {G} u(x),\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right) \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)-\frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\Big |\\&\quad \le C \left| \log (\mathbb {I}_\mathcal {G} u(x),u(x))\right| \Big |\frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\Big |. \end{aligned}$$

By Theorem 5.5, we can further bound

$$\begin{aligned} \left| \log (\mathbb {I}_\mathcal {G} u(x),u(x))\right| \le \mathrm{dist }_{L^\infty }(u,\mathbb {I}_\mathcal {G} u) \le D h\dot{\Theta }_{\infty ,1,\Omega }(u) \end{aligned}$$

with another constant \(D>0\) independent of \(h\). Putting these estimates together, we obtain that

$$\begin{aligned} \left| \partial _1\exp \left( \mathbb {I}_\mathcal {G} u(x),\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right) \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\!-\!\frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\right| \!\lesssim \! hCD\left| \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\right| \end{aligned}$$
(37)

for all \(\alpha \in \{1,\dots , d\}\). Putting together (36) and (37), we obtain

$$\begin{aligned} \left( 1-hCD\right) \left| \frac{d}{dx^\alpha }\mathbb {I}_\mathcal {G} u(x)\right| \lesssim \left| \frac{d}{dx^\alpha } u(x)\right| +\left| \frac{D}{dx^\alpha }\log (\mathbb {I}_\mathcal {G} u(x),u(x))\right| , \end{aligned}$$

which, for \(h\) small enough that \(hCD\le 1/2\), together with Theorem 5.5 implies the desired result. \(\square \)
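In the simplest setting \(d=1\), \(p=1\), \(M=S^2\), and reading \(\Theta _{\infty ,1,\Omega }\) as \(\sup _x|u'(x)|\) (an assumption made only for this illustration), Corollary 5.1 holds with \(C_4=1\): on each element, the interpolant has constant speed \(\mathrm{dist }(u(x_i),u(x_{i+1}))/h\), which by the triangle inequality is at most the average speed of \(u\) on that element. The sketch below checks this numerically on a hypothetical test curve, estimating \(\sup |u'|\) from below by chord distances on a finer nested grid.

```python
import numpy as np

def curve(x):
    """Hypothetical smooth test map u: [0, 1] -> S^2."""
    p = np.stack([np.cos(3.0 * x), np.sin(3.0 * x), x + 0.5], axis=-1)
    return p / np.linalg.norm(p, axis=-1, keepdims=True)

def sphere_dist(a, b):
    """Geodesic distance on S^2 (row-wise for arrays of points)."""
    cross = np.linalg.norm(np.cross(a, b), axis=-1)
    return np.arctan2(cross, np.sum(a * b, axis=-1))

def interpolant_max_speed(n_elements):
    """sup |(I_G u)'| for first-order geodesic interpolation on a uniform grid.

    On each element the interpolant is a constant-speed geodesic with speed
    dist(u(x_i), u(x_{i+1})) / h.
    """
    values = curve(np.linspace(0.0, 1.0, n_elements + 1))
    h = 1.0 / n_elements
    return float(np.max(sphere_dist(values[:-1], values[1:]))) / h

# A nested fine grid gives a lower estimate of sup |u'| (chord length <= arc length).
sup_u_prime_estimate = interpolant_max_speed(1024)
for n in (4, 8, 16, 32):
    print(n, interpolant_max_speed(n) / sup_u_prime_estimate)   # ratios stay <= 1
```

For \(p>1\) or \(d>1\) no such elementary argument is available, and the constant \(C_4\) from the proof above is needed.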

Remark 5.3

The results above extend to other shape functions used in the construction of the GFE spaces. Indeed, all approximation error estimates in this section only use the property (25) that the Lagrange shape functions \(\lambda _i\) reproduce polynomials exactly. Therefore, the same proofs apply to any set of shape functions with this property.

5.3 Retraction Pairs

In certain cases, the exponential map or the logarithm of a given manifold is expensive to compute. Cheaper substitutes can then sometimes be used instead. This idea is formalized by the concept of retraction pairs.

Definition 5.2

([26], see also [1, 23]) A pair \((P,Q)\) of smooth functions

$$\begin{aligned} P: TM \rightarrow M,\quad Q: M\times M\rightarrow TM \end{aligned}$$

is called a retraction pair if

$$\begin{aligned} P\left( x,Q\left( x,y\right) \right) = y,\qquad&\text{ for } \text{ all } x,\ y\in M, \end{aligned}$$

and

$$\begin{aligned} P\left( x,0\right) =x, \quad \frac{d}{dv} P(x,v)\Big |_{v=0} = \mathrm {Id} \qquad&\text{ for } \text{ all } x\in M. \end{aligned}$$

In general, \(P\) may only be defined in a neighborhood of the zero section of \(TM\), and \(Q\) in a neighborhood of the diagonal of \(M\times M\). In particular, the pair \((\exp ,\log )\) satisfies the above assumptions [16] and therefore forms a retraction pair. We refer to [1] for examples of retraction pairs for several manifolds of practical interest. To illustrate the concept, Fig. 2 shows different retraction pairs for the circle \(S^1\).

Given a retraction pair \((P,Q)\), we can construct generalized geodesic finite elements by using interpolants \(\Upsilon ^{(P,Q)}\) based on the first-order condition (24)

$$\begin{aligned} \sum _{i=1}^m\lambda ^p_i(x) \; Q\left( \Upsilon ^{(P,Q)}(v_1,\dots ,v_m;x),v_i\right) = 0 \; \in \; T_{\Upsilon ^{(P,Q)}(v_1,\dots ,v_m;x)}M. \end{aligned}$$
(38)

The results in [26] show that this expression is locally well defined.
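The interpolant defined by (38) can be computed numerically. The sketch below, assuming \(M=S^2\subset \mathbb {R}^3\), uses the fixed-point iteration \(y\leftarrow P\big (y,\sum _i w_i\,Q(y,v_i)\big )\), which for \((P,Q)=(\exp ,\log )\) is the classical gradient iteration for the weighted Riemannian center of mass. This iteration, the starting guess, the tolerances, and all function names are illustrative assumptions, not the solver of [26]; convergence should only be expected for nearby values \(v_i\). The second pair is our closest-point projection pair on \(S^2\), an analogue of Fig. 2b.

```python
import numpy as np

def exp_map(x, v):
    """Exponential map of S^2 at x applied to a tangent vector v."""
    n = np.linalg.norm(v)
    if n < 1e-15:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * v / n

def log_map(x, y):
    """Inverse of exp_map (y not antipodal to x)."""
    w = y - np.dot(x, y) * x                 # tangential part of y at x
    nw = np.linalg.norm(w)
    if nw < 1e-15:
        return np.zeros_like(x)
    return np.arctan2(nw, np.dot(x, y)) * w / nw

def proj_P(x, v):
    """Closest-point projection retraction: normalize x + v."""
    p = x + v
    return p / np.linalg.norm(p)

def proj_Q(x, y):
    """Inverse of proj_P: tangent v at x with x + v parallel to y (needs x . y > 0)."""
    return y / np.dot(x, y) - x

def gfe_interpolate(values, weights, P, Q, tol=1e-13, max_iter=200):
    """Approximate the solution y of sum_i w_i Q(y, v_i) = 0, cf. (38)."""
    y = values[int(np.argmax(weights))].copy()   # start at the node with the largest weight
    for _ in range(max_iter):
        step = sum(w * Q(y, v) for w, v in zip(weights, values))
        if np.linalg.norm(step) < tol:
            break
        y = P(y, step)
    return y

vals = [np.array(p) / np.linalg.norm(p) for p in ([0.1, 0.0, 1.0], [0.0, 0.1, 1.0], [-0.1, -0.05, 1.0])]
lam = [0.2, 0.5, 0.3]       # values of the shape functions at some point x
y_exp = gfe_interpolate(vals, lam, exp_map, log_map)
y_proj = gfe_interpolate(vals, lam, proj_P, proj_Q)
print(y_exp, y_proj)
```

The two results are close but in general not equal, since different retraction pairs define different interpolants.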

Fig. 2 Different retraction pairs for the circle. a Retraction pair based on the exponential map. b Retraction pair based on closest-point projection. c Retraction pair based on vertical projection
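The three pairs of Fig. 2 can be written down explicitly. The sketch below is one concrete reading of the figure (in particular, we take "vertical projection" to mean moving along the normal direction \(x\) back onto the circle), with \(S^1\subset \mathbb {R}^2\) and tangent vectors represented in \(\mathbb {R}^2\). It checks the conditions of Definition 5.2 numerically, using a central difference for the derivative condition. All function names are hypothetical.

```python
import numpy as np

def tangent(x):
    """Counterclockwise unit tangent of S^1 at x."""
    return np.array([-x[1], x[0]])

# (a) exponential map and logarithm
def exp_P(x, v):
    a = np.dot(v, tangent(x))                    # signed arc length
    return np.cos(a) * x + np.sin(a) * tangent(x)

def log_Q(x, y):
    a = np.arctan2(np.dot(y, tangent(x)), np.dot(y, x))
    return a * tangent(x)

# (b) closest-point projection
def cpp_P(x, v):
    p = x + v
    return p / np.linalg.norm(p)

def cpp_Q(x, y):
    return y / np.dot(x, y) - x                  # requires x . y > 0

# (c) vertical projection along the normal x (our reading of Fig. 2c)
def vert_P(x, v):
    return np.sqrt(1.0 - np.dot(v, v)) * x + v   # requires |v| < 1

def vert_Q(x, y):
    return y - np.dot(x, y) * x                  # requires x . y > 0

PAIRS = {"exp/log": (exp_P, log_Q), "closest point": (cpp_P, cpp_Q), "vertical": (vert_P, vert_Q)}

def check_pair(P, Q, x, y, eps=1e-6):
    """Residuals of P(x, Q(x, y)) = y, P(x, 0) = x, and dP(x, .)(0) = Id on T_x S^1."""
    r1 = np.linalg.norm(P(x, Q(x, y)) - y)
    r2 = np.linalg.norm(P(x, np.zeros(2)) - x)
    t = tangent(x)
    dP = (P(x, eps * t) - P(x, -eps * t)) / (2.0 * eps)   # central difference along t
    r3 = np.linalg.norm(dP - t)
    return r1, r2, r3

x = np.array([1.0, 0.0])
y = np.array([np.cos(0.7), np.sin(0.7)])
for name, (P, Q) in PAIRS.items():
    print(name, max(check_pair(P, Q, x, y)))
```

Note that pairs (b) and (c) are only defined for \(x\cdot y>0\), which illustrates that \(Q\) is in general only defined near the diagonal of \(M\times M\).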

We state the following theorem, whose correctness can be verified by going through the proofs of the results in Sect. 5.

Theorem 5.6

All approximation results shown in Sect. 5 remain valid if we replace the definition of the interpolant \(\Upsilon \) by (38) with \((P,Q)\) an arbitrary retraction pair, provided that the function \(u\) to be approximated is in \(W^{1,q}(\Omega ,M)\) for some \(q>\max (2,d)\).

The details are left to the reader.

6 A Priori Error Estimates for Geodesic Finite Elements

We are now in a position to combine the nonlinear Céa lemma (Theorem 3.3) with the approximation result (Theorem 5.4) to arrive at an a priori error bound for variational problems. For later use, we include Dirichlet boundary conditions and define

$$\begin{aligned} H^\Phi :=\left\{ v\in W^{1,2}(\Omega ,M):\ v|_{\partial \Omega } = \Phi \right\} \end{aligned}$$

for a given function \(\Phi : \partial \Omega \rightarrow M\). Note, however, that all results in this section also hold without Dirichlet conditions if the functional \(\mathfrak {J}\) has the appropriate ellipticity properties.

For a fixed \(q>\max (2,d)\), we also set

$$\begin{aligned} H_{K,L,q}^u(\Omega ,M) :=W^{1,q}_K(\Omega ,M)\cap H^u_L(\Omega ,M), \end{aligned}$$

where \(W^{1,q}_K,\ H^u_L\) are defined as in Theorem 3.3 for \(u\in C(\Omega ,M)\), and \(K, L > 0\).

We first show a direct consequence of Theorems 3.3 and 5.4. Then, we give an alternative proof showing the same optimal error bounds under weaker assumptions on the approximation space.

Theorem 6.1

Let \(u\) be a stationary point of the energy \(\mathfrak {J}:H^{\Phi } \rightarrow \mathbb {R}\) w.r.t. variations along geodesic homotopies starting in \(u\), and assume that

$$\begin{aligned} u \in H^{k}(\Omega ,M) \end{aligned}$$

for some \(k>d/2\).

Let \(q>\max (2,d)\) be such that \(H^k(\Omega ,M)\) embeds into \(W^{1,q}(\Omega ,M)\). Let \(C_4\) be the constant from Corollary 5.1, and pick a second constant

$$\begin{aligned} K \ge C_4\Theta _{q,1,\Omega }(u). \end{aligned}$$
(39)

With this constant \(K\), and \(L>0\) arbitrary, assume that \(\mathfrak {J}\) is elliptic on \(H_{K,L,q}^u\cap H^\Phi \) along geodesic homotopies that start in \(u\).

Let \(\mathcal {G}\) be a grid for \(\Omega \) of width \(h\) and order \(p\), \(V_{p,\mathcal {G}}^M\) a \(p\)-th-order GFE space as defined in Sect. 4, and set

$$\begin{aligned} V^h :=V_{p,\mathcal {G}}^M \cap H_{K,L,q}^u \cap H^\Phi . \end{aligned}$$

Assume that \(\Phi \) is such that this space is not empty. Finally, denote

$$\begin{aligned} u^h :={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{v^h\in V^h}} \mathfrak {J}(v^h). \end{aligned}$$

Then, whenever \(p\ge k-1\), we have the a priori estimates

$$\begin{aligned} ||u -u^h||_{W^{1,2}(\Omega ,M)}&\lesssim h^{k-1} C_{M,\mathcal {G}}(u) \Theta _{k,\Omega }(u) \end{aligned}$$

(with respect to some embedding) and

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,u^h)&\lesssim h^{k-1} C_{M,\mathcal {G}}(u) \Theta _{k,\Omega }(u). \end{aligned}$$

In these estimates, the implicit constants only depend on \(d\), the ellipticity constants of \(\mathfrak {J}\) on \(H^u_{K,L,q} \cap H^\Phi \), the interpolation functions \(\lambda _i\), \(i=1,\dots , m\), and the geometry of \(M\).

Proof

Consider the \(p\)-th-order interpolant \(\mathbb {I}_\mathcal {G} u \in V_{p,\mathcal {G}}^M\) of \(u\). By the choice (39), Corollary 5.1, and the assumption on the boundary data, we obtain that \(\mathbb {I}_\mathcal {G}u \in H_{K,L,q}^u \cap H^\Phi \). We can therefore apply the Céa lemma (Theorem 3.3) to get

$$\begin{aligned} D_{1,2}(u,u^h) \le C_2^2 \sqrt{\frac{\Lambda }{\lambda }} D_{1,2}(u,\mathbb {I}_\mathcal {G}u), \end{aligned}$$

with \(\lambda , \Lambda \) the ellipticity constants, and \(C_2\) depending only on \(d\), the product \(KL\) and the curvature of \(M\). By Theorem 5.4, the term \(D_{1,2}(u,\mathbb {I}_\mathcal {G}u)\) is bounded by \(h^{k-1}C_{M,\mathcal {G}} \Theta _{k,\Omega }(u)\) times a constant depending only on the \(\lambda _i\). On the other hand, Lemmas 2.3 and 2.4 bound \(D_{1,2}(u,u^h)\) from below by \(||u - u^h||_{W^{1,2}(\Omega ,M)}\) and \(\mathrm{dist }_{W^{1,2}}(u,u^h)\), respectively. Combining these bounds proves the assertion. \(\square \)

Theorem 6.1 requires that the discrete solution \(u^h\) is obtained by minimizing the energy \(\mathfrak {J}\) over the approximation space \(V_{p,\mathcal {G}}^M\cap H_{K,L,q}^u \cap H^\Phi \). The restriction to \(H_{K,L,q}^u\) (i.e., the requirement that the \(L^q\)-norms of the first derivatives of all functions in the approximation space be uniformly bounded by \(K\)) does not usually appear in the geometrically linear theory. It is problematic because the available bounds on the first derivatives of GFE functions deteriorate with decreasing mesh size: Lemma 4.3 only bounds them by a multiple of \(h^{-1}\). In the next theorem, we show that we can dispense with \(K\) provided that \(u \in H^k\) with \(k\) sufficiently large, more precisely large enough that \(u\) has bounded first derivatives. We conjecture that this result also holds without the additional restriction on \(k\).

Theorem 6.2

Let \(u\) be a stationary point of the energy \(\mathfrak {J}:H^{\Phi } \rightarrow \mathbb {R}\) w.r.t. variations along geodesic homotopies starting in \(u\), and assume that

$$\begin{aligned} u \in H^{k}(\Omega ,M), \qquad \qquad k>\max \Big (2,\frac{d}{2}+1\Big ). \end{aligned}$$

Suppose that \(\mathfrak {J}\) is elliptic along geodesic homotopies starting in \(u\), with ellipticity constants \(\lambda ,\Lambda \), where, for a geodesic homotopy from \(u\) to \(v\), the upper bound \(\Lambda \) may depend on \(\max \left( \Theta _{\infty ,1,\Omega }(u),\Theta _{\infty ,1,\Omega }(v)\right) \).

Let \(\mathcal {G}\) be a grid of width \(h\), and \(V_{p,\mathcal {G}}^M\) a \(p\)-th-order GFE space. Denote \(V^h:=V_{p,\mathcal {G}}^M \cap H^u_L \cap H^\Phi \) (assuming again that \(\Phi \) is such that \(V^h\) is not empty) with \(L>0\) arbitrary (for \(M\) compact set \(L=\infty \)). Define the discrete minimizer

$$\begin{aligned} u^h:={\mathop {{{\mathrm{arg\,min\;}}}}\limits _{v^h\in V^h}}\mathfrak {J}(v^h) . \end{aligned}$$

Then, whenever \(p \ge k-1\), we have the a priori estimates

$$\begin{aligned} \Vert u - u^h\Vert _{W^{1,2}(\Omega ,M)}&\lesssim h^{k-1} C_{M,\mathcal {G}}(u) \Theta _{k,\Omega }(u) \end{aligned}$$

(with respect to some embedding) and

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,u^h)&\lesssim h^{k-1} C_{M,\mathcal {G}}(u) \Theta _{k,\Omega }(u). \end{aligned}$$

Proof

For simplicity, we assume that the manifold \(M\) is embedded into \(\mathbb {R}^N\). We proceed in several steps.

  • Step 1 Using the argument from the proof of Theorem 3.3, we can show that

    $$\begin{aligned} D_{1,2}(u,u^h)^2\le \frac{C_2^2}{\lambda }\left( \mathfrak {J}(u^h) - \mathfrak {J}(u)\right)&\le \frac{C_2^2}{\lambda }\left( \mathfrak {J}(\mathbb {I}_\mathcal {G}u) - \mathfrak {J}(u)\right) \\&\le \frac{C_2^2\Lambda (u,\mathbb {I}_\mathcal {G}u)}{\lambda } D_{1,2}(u,\mathbb {I}_\mathcal {G}u)^2. \end{aligned}$$

    The constant \(C_2\) is the one given in (9), and the \(K\) appearing there has to be interpreted as an upper bound on \(\Theta _{\infty ,1,\Omega }\) along the geodesic homotopy from \(u\) to \(u^h\). By Lemma 4.3, we can pick \(K\) such that \(\Theta _{\infty ,1,\Omega }(w^h)\lesssim K \lesssim h^{-1}\) for any \(w^h \in V^h\), where the implicit constant depends on the nodal values of \(w^h\). We therefore obtain

    $$\begin{aligned} D_{1,2}(u,u^h)^2 \lesssim h^{-2} \frac{\Lambda (u,\mathbb {I}_\mathcal {G}u)}{\lambda } D_{1,2}(u,\mathbb {I}_\mathcal {G}u)^2 \end{aligned}$$

    for \(h\) small. Now, we use that \(\Theta _{\infty ,1,\Omega }(u)\) is finite, because \(k>\frac{d}{2}+1\) implies that \(H^k\) embeds into \(W^{1,\infty }\). Then, by Corollary 5.1, we get

    $$\begin{aligned} \Theta _{\infty ,1,\Omega }(\mathbb {I}_\mathcal {G}u) \lesssim \Theta _{\infty ,1,\Omega }(u), \end{aligned}$$

    and the constant is independent of \(h\). Therefore, \(\Lambda (u,\mathbb {I}_\mathcal {G}u)\) is also bounded independently of \(h\). Together with Theorem 5.4, this gives

    $$\begin{aligned} D_{1,2}(u,u^h)\lesssim h^{-1}D_{1,2}(u,\mathbb {I}_\mathcal {G}u) \lesssim h^{k-2} C_{M,\mathcal {G}} \Theta _{k,\Omega }(u), \end{aligned}$$
    (40)

    where we have omitted the dependence on the ellipticity constants. Using Lemma 2.3, we see that (40) implies that in our embedding we have

    $$\begin{aligned} \Vert u-u^h\Vert _{H^1} \lesssim h^{k-2} C_{M,\mathcal {G}} \Theta _{k,\Omega }(u). \end{aligned}$$
    (41)

    We need to improve this estimate to the desired order \(k-1\).

  • Step 2 We will improve the suboptimal order of \(h^{k-2}\) to the desired order in the remainder of this proof. First, we assume \(d\ge 3\) and let \(d_*:=\frac{2d}{d-2}\) so that we have an embedding of \(H^1(\Omega ,\mathbb {R}^N)\) into \(L^{d_*}(\Omega ,\mathbb {R}^N)\). By (41), we get that

    $$\begin{aligned} \Vert u-u^h\Vert _{L^{d_*}} \lesssim h^{k-2} C_{M,\mathcal {G}} \Theta _{k,\Omega }(u). \end{aligned}$$
    (42)
  • Step 3 Now, we assume that \(d_*\le d\) and use a standard argument to obtain an upper bound for the error \(u-u^h\) measured in the \(L^s\)-norm for arbitrary \(s>d\):

    $$\begin{aligned} \Vert u-u^h\Vert _{L^{s}}^s&= \int \limits _\Omega |u(x)-u^h(x)|^s dx \\&= \int \limits _\Omega |u(x)-u^h(x)|^{d_*} |u(x)-u^h(x)|^{s-d_*}dx \\&\le \max _{x\in \Omega }|u(x)-u^h(x)|^{s-d_*} \Vert u - u^h\Vert _{L^{d_*}}^{d_*} \\&\lesssim h^{d_{*}(k-2)} C_{M,\mathcal {G}}^{d_*} \Theta _{k,\Omega }(u)^{d_*}, \end{aligned}$$

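    the last inequality following from (42), where the bounded factor \(\max _{x}|u(x)-u^h(x)|^{s-d_*}\) has been absorbed into the implicit constant. Hence, \(\Vert u-u^h\Vert _{L^s}\lesssim h^{\frac{d_*}{s}(k-2)}\). Since \(k>\frac{d}{2}+1\) is equivalent to \(d_*(k-2)>d\), we can choose \(s\) with \(d<s\le d_*(k-2)\), so that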

    $$\begin{aligned} \frac{d_*}{s}(k-2) \ge 1, \end{aligned}$$

    and consequently

    $$\begin{aligned} \Vert u-u^h\Vert _{L^s} \lesssim h\quad \text {for some } s>d. \end{aligned}$$
    (43)
  • Step 4 Put \(q = \infty \). Then, Lemma 3.3 states that with

    $$\begin{aligned} C_2 = \sqrt{2} + 2^{d/2+1} C_3 \Vert \mathrm {Rm}\Vert _{g} K \mathrm{dist }_{L^{s}}(u,u^h) \end{aligned}$$

    and

    $$\begin{aligned} K=\Theta _{q,1,\Omega }(u^h)\lesssim h^{-1} \end{aligned}$$

    we have the inequality

    $$\begin{aligned} D_{1,2}(u,u^h)^2 \lesssim C_2^2 D_{1,2}(u,\mathbb {I}_\mathcal {G}u)^2 \lesssim C_2^2 h^{2(k-1)}C_{M,\mathcal {G}}^2 \Theta _{k,\Omega }(u)^2. \end{aligned}$$

    Note that, due to (43) and \(K\lesssim h^{-1}\), the product \(K\,\mathrm{dist }_{L^{s}}(u,u^h)\) and hence \(C_2\) are now bounded independently of \(h\), which yields the desired bound for the error \(D_{1,2}(u,u^h)\). Finally, Lemmas 2.3 and 2.4 bound \(D_{1,2}(u,u^h)\) from below by \(||u - u^h||_{W^{1,2}(\Omega ,M)}\) and \(\mathrm{dist }_{W^{1,2}}(u,u^h)\), respectively, which proves the desired result for \(d_*\le d\).

  • Step 5 The condition \(d_*\le d\) always holds if \(d \ge 4\). For \(d=3\), we can directly use the fact that

    $$\begin{aligned} \Vert u - u^h\Vert _{L^d} \lesssim \Vert u - u^h\Vert _{L^{d_*}} \lesssim h^{k-2} C_{M,\mathcal {G}} \Theta _{k,\Omega }(u) \end{aligned}$$

    by Hölder's inequality and (42), which for \(k\ge 3\) yields the bound

    $$\begin{aligned} \Vert u - u^h\Vert _{L^d} \lesssim h. \end{aligned}$$

    Proceeding as in Step 4, we obtain the optimal order \(h^{k-1}\) for \(d=3\) whenever \(k\ge 3\). For integer \(k\), this is exactly the condition \(k>\frac{d}{2}+1=\frac{5}{2}\).

  • Step 6 In the case \(d=2\), we note that \(H^1\) embeds into \(L^s\) for every \(s<\infty \), and therefore we have

    $$\begin{aligned} \Vert u - u^h\Vert _{L^s}\lesssim h^{k-2}\quad \text {for any} \quad s<\infty . \end{aligned}$$

    It follows that for \(k\ge 3\) and \(s<\infty \) arbitrary, we have

    $$\begin{aligned} \mathrm{dist }_{L^s}(u,u^h)\lesssim h, \end{aligned}$$

    which allows us to use the same arguments as in Step 4 to deduce the desired approximation rate. The case \(d=1\) follows by the same argument, since \(H^1\) embeds into \(L^\infty \) for \(d=1\).\(\square \)
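The exponent bookkeeping in Steps 3 and 5 can be checked in exact rational arithmetic. The following Python sketch (function names hypothetical) verifies, for a range of dimensions \(d\ge 4\), that an exponent \(s>d\) with \(\frac{d_*}{s}(k-2)\ge 1\) exists precisely when \(k>\frac{d}{2}+1\), and that Step 3 does not apply for \(d=3\), where \(d_*=6>d\).

```python
from fractions import Fraction

def step3_exponent_window(d, k):
    """Return (d_star, low, high): Step 3 needs an s with low < s <= high."""
    if d < 3:
        raise ValueError("Step 3 assumes d >= 3")
    d_star = Fraction(2 * d, d - 2)      # Sobolev exponent of H^1 in dimension d
    # ||u - u^h||_{L^s} <~ h^{d_star (k - 2) / s}; the exponent is >= 1 iff s <= d_star (k - 2)
    return d_star, Fraction(d), d_star * (k - 2)

def step3_applies(d, k):
    """True when d_star <= d and some s > d gives an L^s error of order at least h."""
    d_star, low, high = step3_exponent_window(d, k)
    return d_star <= d and high > low

for d in range(3, 9):
    ks = [k for k in range(2, 12) if step3_applies(d, k)]
    print(d, ks[0] if ks else None)   # smallest admissible integer k per dimension
```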

To summarize, Theorems 6.1 and 6.2 both extend the a priori error estimates known for linear finite elements. Theorem 6.1 requires that the approximation spaces and the solution \(u\) possess uniformly bounded derivatives. In contrast, Theorem 6.2 does not restrict the approximation spaces, but imposes stronger smoothness assumptions on \(u\) instead.

7 Examples

To illustrate our results, we apply them to a few specific examples. We focus on the harmonic energy and related functionals and leave the study of more general energies to future work.

Let \(\Omega \) be a domain and \((M,g)\) a Riemannian manifold. As before, we consider Dirichlet problems only. Boundary values are given in the form of a function \(\Phi : \partial \Omega \rightarrow M\) of sufficient regularity. For such a \(\Phi \), we write \(H^\Phi \) for the set of all functions \(v : \Omega \rightarrow M\) for which \(v|_{\partial \Omega } = \Phi \) holds.

Recall from Theorems 6.1 and 6.2 that we can give optimal a priori discretization error bounds for discrete minimizers of an energy functional \(\mathfrak {J}\) if \(\mathfrak {J}\) is elliptic and if the minimizer of \(\mathfrak {J}\) is sufficiently smooth.

7.1 Harmonic Maps

The prototypical elliptic functional is the harmonic energy

$$\begin{aligned} \mathfrak {J}^{\mathrm {harm}}(v) = \int \limits _{\Omega }\left| \nabla v(x)\right| _{g(v(x))}^2 dx. \end{aligned}$$

The stationary points of this functional are called harmonic maps and have been widely studied in the literature (see, e.g., [19]).
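As a simple illustration (our own, not taken from the text), take \(d=1\), \(\Omega =(0,1)\), \(M=S^2\), and first-order GFE on a uniform grid. On each element the interpolant is a constant-speed geodesic, so the harmonic energy reduces exactly to \(\sum _i \mathrm{dist }(v_i,v_{i+1})^2/h\). For nodes placed uniformly on a quarter great circle this equals \((\pi /2)^2\), the energy of the exact harmonic map (the constant-speed geodesic), and moving an interior node off the geodesic increases it. All function names are hypothetical.

```python
import numpy as np

def sphere_dist(a, b):
    """Geodesic distance on S^2, stable for nearby points."""
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))

def discrete_harmonic_energy(nodal_values, h):
    """Harmonic energy of a first-order GFE function on a uniform 1D grid.

    On each element the interpolant is the constant-speed geodesic between
    neighbouring nodal values, so |u'|^2 = (dist / h)^2 there and the element
    contributes h * (dist / h)^2 = dist^2 / h.
    """
    return sum(sphere_dist(a, b) ** 2 / h for a, b in zip(nodal_values[:-1], nodal_values[1:]))

def great_circle_nodes(n_elements, angle):
    """Nodes spaced uniformly along the equator from angle 0 to `angle`."""
    s = np.linspace(0.0, angle, n_elements + 1)
    return np.stack([np.cos(s), np.sin(s), np.zeros_like(s)], axis=1)

nodes = great_circle_nodes(10, np.pi / 2)
print(discrete_harmonic_energy(nodes, 0.1), (np.pi / 2) ** 2)
```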

There are different approaches to showing ellipticity of the harmonic energy. We first use bounds on the second derivatives along geodesic homotopies. Let \(K\) be a positive constant, and \(H_K :=W^{1,q}_K\) as defined in (5) for some \(q>\max (2,d)\).

Lemma 7.1

The energy \(\mathfrak {J}^{\mathrm {harm}}\) is elliptic along geodesic homotopies in \(H_K \cap H^\Phi \) in the sense of Definition 3.2 if either

  1. \(M\) has nonpositive sectional curvature, or

  2. we have \(1-K^2\Vert \mathrm {Rm}\Vert _g C_1(\Omega )>0\),

where \(C_1(\Omega )\) is the Poincaré constant of \(\Omega \) from Lemma 2.2.

Proof

Let \(\Gamma \) be a geodesic homotopy in \(H_K \cap H^\Phi \), and set \(f(t) :=\mathfrak {J}^{\mathrm {harm}}(\Gamma (t))\). Lemma X.3.2(ii) in [58] tells us that

$$\begin{aligned} \frac{d^2}{dt^2}f(t)&= 2\int \limits _{\Omega } \left\langle \nabla \dot{\Gamma }(x,t),\nabla \dot{\Gamma }(x,t) \right\rangle _{g(\Gamma (x,t))} dx\\&\qquad - 2\int \limits _\Omega \left\langle \mathrm {Rm}\left( \nabla \Gamma (x,t),\dot{\Gamma }(x,t)\right) \dot{\Gamma }(x,t),\nabla \Gamma (x,t) \right\rangle _{g(\Gamma (x,t))} dx. \end{aligned}$$

Now, the assertion follows as a direct consequence of the Poincaré inequality in Lemma 2.2. \(\square \)

Remark 7.1

For positive curvature, this ellipticity result is fairly weak. The results in [36] may allow improvements.

Alternatively, one can also directly show the \(\lambda \)-convexity of the harmonic energy functional along geodesic homotopies.

Lemma 7.2

Let \(M\) be simply connected with nonpositive sectional curvature. Then, the harmonic energy is \(\lambda \)-convex along geodesic homotopies in \(H^\Phi \), with \(\lambda \) equal to \(1/2\) times the Poincaré constant of \(\Omega \).

Proof

Let \(u,v\) be functions in \(H^\Phi \), and let \(\Gamma \) be a geodesic homotopy from \(u\) to \(v\). Since \(M\) is simply connected and has nonpositive curvature, it is an NPC space in the sense of [58, Sec. X.2.1]. For this setting, it is shown in the proof of [58, Thm. X.2.2] that

$$\begin{aligned} \mathfrak {J}^\text {harm}(\Gamma (t)) \le (1-t) \,\mathfrak {J}^\text {harm}(u) + t\, \mathfrak {J}^\text {harm}(v) - t(1-t) \int \limits _\Omega {|\nabla \mathrm{dist }(u(x),v(x))|}^2\,dx. \end{aligned}$$

Since \(u\) and \(v\) fulfill the same Dirichlet boundary conditions, we have \(\mathrm{dist }(u(x),v(x)) = 0\) on \(\partial \Omega \). The assertion then follows with the standard Poincaré inequality. \(\square \)

Regularity of harmonic maps is a well-studied subject. The following results are derived in [19, 33, 36, 37].

Lemma 7.3

A harmonic map \(u: \Omega \rightarrow M\) with continuous boundary data is in \(C^\infty \) if one of the following conditions is satisfied:

  1. \(M\) has nonpositive sectional curvature,

  2. \(d\in \{1,2\}\), or

  3. the image of \(u\) is contained in a convex geodesic ball.

We remark that in other cases singularities may develop [52].

Using the preliminaries above and Theorem 6.2, we are able to prove the following convergence theorem for harmonic maps.

Theorem 7.1

Let \(u\) be a local minimizer of \(\mathfrak {J}^\mathrm {harm}\) on \(H_K \cap H^\Phi \) for a constant \(K > 0\) and continuous boundary data \(\Phi \). Also, let \(u^h\) be the corresponding minimizer in a \(p\)-th-order GFE space that is generated by a grid of width \(h\) and order \(p\) and resolves the boundary conditions. If \(M\) has positive sectional curvature, suppose that \(1-K^2\Vert \mathrm {Rm}\Vert _g C_1(\Omega )>0\) and that either \(d \in \{1,2\}\) or the image of \(u\) is contained in a convex geodesic ball of \(M\) (for \(M\) with nonpositive sectional curvature, no assumptions are needed). Then

$$\begin{aligned} ||u-u^h||_{W^{1,2}(\Omega ,M)}&\lesssim h^{p}\Vert u\Vert _{H^{p+1}}^{p+1} \end{aligned}$$

(in an embedding), and

$$\begin{aligned} \mathrm{dist }_{W^{1,2}}(u,u^h)&\lesssim h^{p}\Vert u\Vert _{H^{p+1}}^{p+1}. \end{aligned}$$

Proof

By Lemma 7.3, \(u\) is smooth enough for the smoothness descriptor \(\Theta _{p+1,\Omega }(u)\) to be finite, and by Lemma 7.1, the harmonic energy \(\mathfrak {J}^\text {harm}\) is elliptic. Hence, Theorem 6.2 yields bounds in terms of the smoothness descriptor \(\Theta _{p+1,\Omega }(u)\), which we bound in turn with Lemma 2.7. \(\square \)

Remark 7.2

In Theorem 7.1, we have assumed that the boundary data can be represented exactly in the GFE approximation space. This may not always be the case, but a simple approximation argument shows that the same result holds if \(u^h\) interpolates smooth boundary data.

Theorem 7.1 is confirmed by numerical studies in [56] for \(M=S^2\). In [55], the same optimal orders were observed for \(p=1\), even though the assumptions of Lemma 7.3 did not hold there.

Remark 7.3

In [7], harmonic maps into the sphere \(S^2 \subset \mathbb {R}^3\) are approximated by minimizing the harmonic energy over piecewise affine finite elements with nodal values on the sphere. It is shown that for \(h\rightarrow 0\) there exists a subsequence of discrete solutions (more precisely, stationary points of the discrete optimization problems) which converges weakly to a harmonic map. This holds even for nonregular solutions and without any ellipticity assumption, in contrast to our own results. The latter always assume a certain smoothness of the solution but, on the other hand, yield not just weak convergence of a subsequence but strong convergence with optimal rates. We consider it an interesting question whether the approach of [7] can be used to prove weak convergence of sequences of GFE approximations when the solution is not smooth and/or the harmonic energy is not elliptic.

7.2 Generalizations

We can generalize the discretization error bounds for harmonic maps in a few simple ways. We show only the ellipticity of the different functionals. Together with regularity results available from the literature, optimal discretization error bounds then follow by Theorem 6.2.

7.2.1 \(F\)-Harmonic Maps

\(F\)-harmonic maps are stationary points of the energy

$$\begin{aligned} \mathfrak {J}^F(v):=\int \limits _{\Omega }F\Big (x,\left| \nabla v(x)\right| _{g(v(x))}^2\Big ) dx, \end{aligned}$$

with a function \(F:\Omega \times \mathbb {R}^+ \rightarrow \mathbb {R}\). Such energies generalize the harmonic energy and include, e.g., \(p\)-harmonic maps and exponentially harmonic maps [5]. For notational simplicity, we suppress the dependence of \(F\) on \(x\) in the following results. The proofs carry over easily to the \(x\)-dependent case.

The following result follows from direct calculations.

Lemma 7.4

Denote

$$\begin{aligned} f(t):=\mathfrak {J}^F(\Gamma (t)), \end{aligned}$$

where \(\Gamma \) is a geodesic homotopy. Then, we have

$$\begin{aligned} \frac{d^2}{dt^2}f(t)&= 2\int \limits _{\Omega } F'\left( \left| \nabla \Gamma (x,t)\right| _{g(\Gamma (x,t))}^2\right) \left\langle \nabla \dot{\Gamma }(x,t),\nabla \dot{\Gamma }(x,t) \right\rangle _{g(\Gamma (x,t))} dx\\&\quad - 2\int \limits _\Omega F'\left( \left| \nabla \Gamma (x,t)\right| _{g(\Gamma (x,t))}^2\right) \left\langle \mathrm {Rm}\left( \nabla \Gamma (x,t),\dot{\Gamma }(x,t)\right) \dot{\Gamma }(x,t),\nabla \Gamma (x,t) \right\rangle _{g(\Gamma (x,t))} dx\\&\quad + 4\int \limits _{\Omega } F''\left( \left| \nabla \Gamma (x,t)\right| _{g(\Gamma (x,t))}^2\right) \left\langle \nabla \dot{\Gamma }(x,t),\nabla \Gamma (x,t) \right\rangle _{g(\Gamma (x,t))}^2 dx. \end{aligned}$$

Based on this, we can prove the following ellipticity result.

Lemma 7.5

Assume that there are constants \(w_2\), \(w'_2\), \(w'_3\) such that

$$\begin{aligned} w_2'\ge F'(y)\ge w_2>0,\quad w_3'\ge F''(y)\ge 0 \quad \forall y\in \mathbb {R}^+, \end{aligned}$$

and either

  1. \(M\) has nonpositive sectional curvature, or

  2. \(w_2-w_2'K^2\Vert \mathrm {Rm}\Vert _g C_1(\Omega )>0\),

where \(C_1(\Omega )\) is the Poincaré constant of \(\Omega \) from Lemma 2.2. Then, the energy \(\mathfrak {J}^F\) is elliptic along geodesic homotopies in \(H_K \cap H^\Phi \) in the sense of Definition 3.2.

Proof

Let \(M\) have nonpositive sectional curvature. Then, using Lemma 7.4, we have

$$\begin{aligned} \frac{d^2}{dt^2}f(t)&\ge 2\int \limits _{\Omega } F'\left( \left| \nabla \Gamma (x,t)\right| _{g(\Gamma (x,t))}^2\right) \left\langle \nabla \dot{\Gamma }(x,t),\nabla \dot{\Gamma }(x,t) \right\rangle _{g(\Gamma (x,t))} dx\\&\ge 2w_2\int \limits _{\Omega } \left\langle \nabla \dot{\Gamma }(x,t),\nabla \dot{\Gamma }(x,t) \right\rangle _{g(\Gamma (x,t))} dx\\&\ge \frac{2w_2}{1+C_1(\Omega )} |\dot{\Gamma }(t)|_{H^1}^2, \end{aligned}$$

where \(C_1(\Omega )\) is the Poincaré constant of \(\Omega \) from Lemma 2.2. On the other hand, again by Lemma 7.4, we have

$$\begin{aligned} \frac{d^2}{dt^2}f(t)&\le 2w_2'\int \limits _{\Omega }\left( \big | \nabla \dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2 + K^2\Vert \mathrm {Rm}\Vert _g \big |\dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2 \right) dx\\&\quad + \, 4w_3'K^2\int \limits _\Omega \big |\nabla \dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2 dx. \end{aligned}$$

In summary, we have ellipticity with \(\lambda = \frac{2w_2}{1+C_1(\Omega )}\) and \(\Lambda = \max \big (2w_2'+4w_3'K^2,2w_2' K^2\Vert \mathrm {Rm}\Vert _g\big )\). This proves the assertion under Assumption 1. For the proof under Assumption 2, we estimate

$$\begin{aligned} \frac{d^2}{dt^2}f(t)&\ge 2\int \limits _{\Omega }\left( w_2\big | \nabla \dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2 - w_2'K^2\Vert \mathrm {Rm}\Vert _g \big |\dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2 \right) dx\\&\ge 2(w_2-w_2'K^2\Vert \mathrm {Rm}\Vert _gC_1(\Omega )) \int \limits _{\Omega }\big | \nabla \dot{\Gamma }(x,t)\big |_{g(\Gamma (x,t))}^2dx\\&\ge 2\frac{w_2-w_2'K^2\Vert \mathrm {Rm}\Vert _gC_1(\Omega )}{1+C_1(\Omega )} |\dot{\Gamma }(t)|_{H^1}^2. \end{aligned}$$

We get ellipticity with \(\lambda = 2\frac{w_2-w_2'K^2\Vert \mathrm {Rm}\Vert _gC_1(\Omega )}{1+C_1(\Omega )}\) and \(\Lambda = \max \big (2w_2'+4w_3'K^2,2w_2' K^2\Vert \mathrm {Rm}\Vert _g\big )\). \(\square \)
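The constants of Lemma 7.5 are easy to evaluate. The hypothetical helper below does so. As an example integrand (our own choice) one may take \(F(y)=2y-\log (1+y)\), for which \(F'(y)=2-\frac{1}{1+y}\in [1,2)\) and \(F''(y)=(1+y)^{-2}\in (0,1]\), so that \(w_2=1\), \(w_2'=2\), \(w_3'=1\). The values of \(K\), \(\Vert \mathrm {Rm}\Vert _g\), and \(C_1(\Omega )\) used below are placeholders, not properties of a specific manifold or domain.

```python
def ellipticity_constants(w2, w2p, w3p, K, rm_norm, C1, nonpositive_curvature):
    """Return (lambda, Lambda) as in Lemma 7.5, or None if its assumptions fail.

    w2, w2p, w3p: bounds w_2 <= F' <= w_2' and 0 <= F'' <= w_3';
    K: derivative bound defining H_K; rm_norm: ||Rm||_g; C1: Poincare constant C_1(Omega).
    """
    if w2 <= 0 or w2p < w2 or w3p < 0:
        return None
    if nonpositive_curvature:                       # Assumption 1
        lam = 2.0 * w2 / (1.0 + C1)
    else:                                           # Assumption 2
        margin = w2 - w2p * K**2 * rm_norm * C1
        if margin <= 0:
            return None
        lam = 2.0 * margin / (1.0 + C1)
    Lam = max(2.0 * w2p + 4.0 * w3p * K**2, 2.0 * w2p * K**2 * rm_norm)
    return lam, Lam

# F(y) = 2y - log(1 + y): w_2 = 1, w_2' = 2, w_3' = 1 (placeholder K, ||Rm||, C_1)
print(ellipticity_constants(1.0, 2.0, 1.0, K=0.5, rm_norm=1.0, C1=0.5, nonpositive_curvature=False))
# (1.0, 5.0)
```

Increasing \(K\) to \(2\) in this example makes Assumption 2 fail, and the helper returns `None`.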

7.2.2 Harmonic Maps with Potential

We can also generalize the harmonic energy by adding a source term with potential \(G : \Omega \times M \rightarrow \mathbb {R}\). We arrive at

$$\begin{aligned} \mathfrak {J}^{\mathrm {harm},G}(v) = \mathfrak {J}^{\mathrm {harm}}(v) + \int \limits _{\Omega }G(x,v(x)) \,dx, \end{aligned}$$

see [21].

Again, for simplicity, in the following we suppress the dependence of \(G\) on its first variable \(x\) and assume that \(G:M\rightarrow \mathbb {R}\). The second derivative of \(\mathfrak {J}^{\mathrm {harm},G}\) along a geodesic homotopy \(\Gamma \) splits into the same terms as above for the harmonic energy, plus the Hessian of \(G\). Recall that for a point \(q \in M\), the Hessian \(\mathrm{Hess } G: T_q M \times T_q M \rightarrow \mathbb {R}\) is

$$\begin{aligned} \mathrm{Hess }(G)(v,w) :=\left\langle \frac{D}{ds}\mathrm{grad } G(\gamma (s))\Big |_{s=0}, w \right\rangle _{g(q)}\quad \forall v, w\in T_q M, \end{aligned}$$

where \(\gamma : (-\epsilon ,\epsilon ) \rightarrow M\) is a differentiable curve such that \(\gamma (0) = q\) and \(\dot{\gamma }(0) = v\).

Lemma 7.6

With \(\Gamma \) a geodesic homotopy and \(f(t):=\mathfrak {J}^{\mathrm {harm},G}(\Gamma (t))\) with \(G:M\rightarrow \mathbb {R}\), we have

$$\begin{aligned} \frac{d^2}{dt^2}f(t) = 2\int \limits _{\Omega }\big \langle \nabla \dot{\Gamma }, \nabla \dot{\Gamma }\big \rangle \, dx - 2\int \limits _{\Omega } \big \langle \mathrm {Rm}(\nabla \Gamma ,\dot{\Gamma })\dot{\Gamma }, \nabla \Gamma \big \rangle \, dx + \int \limits _\Omega \mathrm{Hess }(G)(\dot{\Gamma },\dot{\Gamma }) \,dx. \end{aligned}$$

The potential \(G\) influences the ellipticity of \(\mathfrak {J}^{\text {harm},G}\) in the following way.

Corollary 7.1

The energy \(\mathfrak {J}^{\mathrm {harm},G}: H_K \cap H^\Phi \rightarrow \mathbb {R}\) is elliptic along geodesic homotopies if either

  1. \(M\) has nonpositive sectional curvature, and \(\mathrm{Hess }G\) is positive semidefinite, or

  2. we have

    $$\begin{aligned} 1-K^2\Vert \mathrm {Rm}\Vert _gC_1(\Omega ) + \inf _{v\in TM} \frac{\mathrm{Hess }(G)(v,v)}{|v|_g^2}>0. \end{aligned}$$

Hence, \(\mathfrak {J}^{\mathrm {harm},G}\) can be elliptic even if \(\mathfrak {J}^{\mathrm {harm}}\) by itself is not, provided that \(\mathrm{Hess } G\) is sufficiently positive definite.

For various results related to the smoothness of harmonic maps with potential, we refer to [21].

7.2.3 Tikhonov Regularization

As a special case of the above, we can choose the source term to be the squared distance to a given function \(w: \Omega \rightarrow M\):

$$\begin{aligned} \mathfrak {J}^w(v):=\mathfrak {J}^{\mathrm {harm}}(v) + \int \limits _\Omega \mathrm{dist }(v(x),w(x))^2 \mu (dx). \end{aligned}$$

It is useful for applications to allow the source term to be integrated with respect to a general positive measure \(\mu \), which may be discrete. Minimizing such an energy \(\mathfrak {J}^w\) can be useful in smoothing, denoising, or motion planning [67]. For \(d=1\), point evaluations are bounded on \(H^1\), so \(\mu \) may be chosen as a discrete measure; in this way, the framework includes the point-fitting energy

$$\begin{aligned} \mathfrak {J}^{\mathrm {harm}}(v) + \sum _{i=1}^N \mathrm {dist}(v(x_i),p_i)^2 \end{aligned}$$

for interpolation points \(x_i\in \Omega \) and point values \(p_i\in M\).
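To make the point-fitting energy concrete, the sketch below evaluates it for \(d=1\), \(\Omega =(0,1)\), and \(M=S^2\subset \mathbb {R}^3\), with \(v\) a first-order geodesic finite element function, which in one dimension is the piecewise geodesic interpolant of its nodal values. It assumes \(\mathfrak {J}^{\mathrm {harm}}(v)=\int _\Omega |v'|^2\,dx\), that the interpolation points \(x_i\) are grid nodes, and that adjacent nodal values are never antipodal; the function name `point_fitting_energy` is hypothetical. On each element the geodesic has constant speed \(\mathrm {dist}(v_k,v_{k+1})/h_k\), so the harmonic part is exactly \(\sum _k \mathrm {dist}(v_k,v_{k+1})^2/h_k\).

```python
import numpy as np


def sphere_dist(a, b):
    """Great-circle distance on the unit sphere S^2 (atan2 form, accurate for all angles)."""
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))


def point_fitting_energy(values, grid, fit_idx, targets):
    """Harmonic part and fitting part of the discrete point-fitting energy.

    values  : (n+1, 3) array, nodal values of a piecewise geodesic curve in S^2
    grid    : (n+1,) array, increasing nodes of Omega
    fit_idx : node indices i at which x_i = grid[i] is an interpolation point
    targets : array of the prescribed values p_i in S^2, one row per index
    """
    h = np.diff(grid)
    # Each element is traversed by a constant-speed geodesic of speed dist / h,
    # so the integral of |v'|^2 over the element is dist^2 / h.
    harm = sum(sphere_dist(values[k], values[k + 1]) ** 2 / h[k] for k in range(len(h)))
    # Point evaluations at nodes are just nodal values.
    fit = sum(sphere_dist(values[i], p) ** 2 for i, p in zip(fit_idx, targets))
    return harm, fit


# Example: 8 equal elements along a quarter of the equator, first node fitted to the north pole.
n = 8
grid = np.linspace(0.0, 1.0, n + 1)
theta = 0.5 * np.pi * grid
curve = np.stack([np.cos(theta), np.sin(theta), np.zeros_like(theta)], axis=1)
north = np.array([[0.0, 0.0, 1.0]])
harm, fit = point_fitting_energy(curve, grid, [0], north)
print(harm, fit, harm + fit)
```

For nodes equally spaced along a quarter great circle, the harmonic part equals \((\pi /2)^2\) independently of the number of elements, and fitting the first node to the north pole contributes another \((\pi /2)^2\).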

In the case of nonpositive curvature, ellipticity can be established easily.

Lemma 7.7

Assume that \(M\) has nonpositive sectional curvature. Then, the energy \(\mathfrak {J}^{w}: H_K \cap H^\Phi \rightarrow \mathbb {R}\) is elliptic along geodesic homotopies.

Proof

This is a simple consequence of the ellipticity of \(\mathfrak {J}^{\mathrm {harm}}\), together with the fact that for a geodesic \(\gamma (t)\) in \(M\), we have that

$$\begin{aligned} \frac{d^2}{dt^2}\mathrm {dist}(\gamma (t),p)^2 \ge 0 \end{aligned}$$

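for all points \(p\in M\) if \(M\) has nonpositive curvature [63]. Integrating this inequality with respect to \(\mu \) shows that the second derivative of \(\mathfrak {J}^w\) along any geodesic homotopy is bounded below by that of \(\mathfrak {J}^{\mathrm {harm}}\). Therefore, \(\mathfrak {J}^w\) is elliptic along geodesic homotopies for any choice of \(w\). \(\square \)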
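The convexity fact used in this proof can be observed numerically on the hyperbolic plane, which is simply connected and has constant curvature \(-1\). The sketch below assumes the hyperboloid model \(\{x\in \mathbb {R}^3: -x_0^2+x_1^2+x_2^2=-1,\ x_0>0\}\), in which \(\mathrm {dist}(x,y)=\mathrm {arccosh}(-\langle x,y\rangle _L)\) with the Minkowski form \(\langle \cdot ,\cdot \rangle _L\), and unit-speed geodesics are \(\gamma (t)=\cosh (t)\,a+\sinh (t)\,u\) for a unit tangent vector \(u\) at \(a\). It samples random geodesics and points \(p\) and records the smallest central second difference of \(t\mapsto \mathrm {dist}(\gamma (t),p)^2\); all names are illustrative.

```python
import numpy as np


def minkowski(x, y):
    """Lorentzian inner product -x0*y0 + x1*y1 + x2*y2 on R^3."""
    return -x[0] * y[0] + x[1] * y[1] + x[2] * y[2]


def hyp_dist(x, y):
    """Distance on the hyperboloid model of the hyperbolic plane."""
    # max(..., 1.0) guards against rounding pushing the argument below 1.
    return np.arccosh(max(-minkowski(x, y), 1.0))


def lift(z):
    """Point of the upper hyperboloid sheet lying above z in R^2."""
    return np.array([np.sqrt(1.0 + z[0] ** 2 + z[1] ** 2), z[0], z[1]])


def unit_tangent(a, v):
    """Project v onto T_a H^2 = {w : <w, a>_L = 0} and normalize it."""
    w = v + minkowski(v, a) * a        # uses <a, a>_L = -1
    return w / np.sqrt(minkowski(w, w))


def second_diff_dist2(a, u, p, t, dt=1e-3):
    """Central second difference of t -> dist(gamma(t), p)^2 along the geodesic through a with velocity u."""
    def phi(s):
        gamma = np.cosh(s) * a + np.sinh(s) * u
        return hyp_dist(gamma, p) ** 2
    return (phi(t + dt) - 2.0 * phi(t) + phi(t - dt)) / dt ** 2


rng = np.random.default_rng(0)
worst = np.inf
for _ in range(200):
    a, p = lift(rng.normal(size=2)), lift(rng.normal(size=2))
    u = unit_tangent(a, rng.normal(size=3))
    worst = min(worst, second_diff_dist2(a, u, p, rng.uniform(-1.0, 1.0)))
print(worst)
```

On this space the Hessian of \(\tfrac12\mathrm {dist}(\cdot ,p)^2\) is bounded below by the metric, so along unit-speed geodesics the printed minimum should not fall below 2, up to finite-difference error.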

Observe that the ellipticity of \(\mathfrak {J}^w\) holds even without Dirichlet boundary conditions. If \(M\) has positive curvature, additional restrictions on the diameter of the image \(u(\Omega )\) apply.
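On a positively curved target, the inequality used in the proof of Lemma 7.7 can fail. The following sketch, with illustrative names, uses the unit sphere \(S^2\subset \mathbb {R}^3\) and the unit-speed great circle \(\gamma (t)=(\cos t,\sin t,0)\), and computes central second differences of \(t\mapsto \mathrm {dist}(\gamma (t),p)^2\) at \(t=0\) for a point \(p\) close to \(\gamma (0)\) and for a point close to its antipode. For \(p=(\pm \cos \varepsilon ,0,\sin \varepsilon )\), a short computation gives the exact values \(2\varepsilon \cot \varepsilon \) and \(-2(\pi -\varepsilon )\cot \varepsilon \), respectively, so convexity is lost when \(p\) is far from the geodesic, which is one reason for the diameter restrictions mentioned above.

```python
import numpy as np


def sphere_dist(a, b):
    """Great-circle distance on the unit sphere S^2 (atan2 form, accurate for all angles)."""
    return np.arctan2(np.linalg.norm(np.cross(a, b)), np.dot(a, b))


def gamma(t):
    """Unit-speed great circle (the equator) on S^2."""
    return np.array([np.cos(t), np.sin(t), 0.0])


def second_diff_dist2(p, t=0.0, dt=1e-4):
    """Central second difference of s -> dist(gamma(s), p)^2 at s = t."""
    def phi(s):
        return sphere_dist(gamma(s), p) ** 2
    return (phi(t + dt) - 2.0 * phi(t) + phi(t - dt)) / dt ** 2


eps = 0.3
p_near = np.array([np.cos(eps), 0.0, np.sin(eps)])   # close to gamma(0)
p_far = np.array([-np.cos(eps), 0.0, np.sin(eps)])   # close to the antipode of gamma(0)

near = second_diff_dist2(p_near)
far = second_diff_dist2(p_far)
print(near, far)   # positive for p_near, negative for p_far
```

With \(\varepsilon =0.3\), the two exact values are approximately \(1.94\) and \(-18.37\).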

8 Conclusion

We have established optimal a priori error bounds for the discretization of manifold-valued problems by geodesic finite elements (GFE). This was achieved by proving manifold-valued generalizations of the classical Céa lemma and of interpolation error bounds for geodesic finite element spaces. Along the way, we have introduced several new technical tools for the analysis of manifold-valued functions, which we expect to be useful beyond this paper. One application of our theory is to high-order numerical schemes for the computation of harmonic maps into manifolds.

Many issues remain for future work. Besides natural questions such as the effect of variational crimes in the spirit of [62], these include a more thorough study of the ellipticity properties of several geometric energies of interest, for example a finer analysis of the harmonic energy with positively curved target spaces, or of the Cosserat energies studied in [48, 54]. Additionally, convexity properties of the energies on the approximation spaces are of interest, because they influence the convergence speed of numerical solvers. Further, it will be interesting to study weak convergence properties of GFE discretizations for nonelliptic energies and/or nonsmooth solutions, generalizing the results of [7]. Finally, we mention further extensions of methods based on linear finite elements, e.g., nonconforming variants of geodesic finite elements and temporal discretizations for nonstationary problems.