1 Introduction and results

Let \(M\) be an \(n\)-dimensional paracompact connected \(C^{2,1}\) manifold and let \(x^\mu :U\rightarrow \mathbb {R}^n\) be a local chart where \(U\) is an open subset. Every chart induces a chart \((x^\mu , v^\mu ):\pi ^{-1}(U)\rightarrow \mathbb {R}^n\times \mathbb {R}^n\), \(\mu =0,1,\ldots ,n-1\), on the tangent bundle \(\pi : TU\rightarrow U\).

For the moment let us consider a second order ODE defined just over \(U\) by

$$\begin{aligned} \frac{\mathrm{d}x^{\mu }}{ \mathrm{d}t}&=v^\mu , \end{aligned}$$
(1)
$$\begin{aligned} \frac{\mathrm{d}v^\mu }{\mathrm{d}t}&=H^\mu (x,v) , \end{aligned}$$
(2)

where \(H^\mu \) is locally Lipschitz. Under a coordinate change \(\tilde{x}^\mu =\tilde{x}^\mu (x^\alpha )\) the system becomes \(\dot{\tilde{x}}^\mu =\tilde{v}^\mu \), \(\dot{\tilde{v}}^\mu = \tilde{H}^\mu (\tilde{x},\tilde{v})\), where

$$\begin{aligned} \tilde{H}^\mu (\tilde{x},\tilde{v})=\frac{\partial ^2 \tilde{x}^\mu }{\partial x^\alpha \partial x^\beta } \frac{\partial x^\alpha }{\partial \tilde{x}^\gamma } \frac{\partial x^\beta }{\partial \tilde{x}^\delta }\, {\tilde{v}}^\gamma {\tilde{v}}^\delta + \frac{\partial \tilde{x}^\mu }{\partial x^\nu }\, H^\nu (x(\tilde{x}), v(\tilde{x},{\tilde{v}})), \end{aligned}$$
(3)

and \(v^\mu (\tilde{x},{\tilde{v}})=\frac{\partial x^\mu }{\partial \tilde{x}^\nu } \,\tilde{v}^\nu \). The transformation shows that the following notions are well-defined as independent of the coordinate chart:

  1. (a)

    \(H^\mu \) is positively homogeneous of second degree in the velocities, that is, for every \(s>0\), \(H^\mu (x,sv)=s^2H^\mu (x,v)\),

  2. (b)

    \(H^\mu \) is a homogeneous quadratic form in the velocities.

In the former case we say that the second order ODE defines a spray over \(M\) [2, 39] while in the latter more restrictive case we say that it defines a (torsionless) connection. In the latter case we set \(H^\mu (x,v)=-\varGamma ^\mu _{\alpha \beta }(x) v^\alpha v^\beta \) and we recognize in Eq. (3) the transformation rule for the Christoffel symbols \(\varGamma ^\mu _{\alpha \beta }\). The transformation rule (3) clarifies that \(M\) must be \(C^{2,1}\) to make sense of the Lipschitz condition on sprays. The non-stationary solutions to Eqs. (1, 2) will be called geodesics.
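As a consistency check on this identification, one can substitute \(H^\mu =-\varGamma ^\mu _{\alpha \beta } v^\alpha v^\beta \) and \(\tilde{H}^\mu =-\tilde{\varGamma }^\mu _{\gamma \delta } \tilde{v}^\gamma \tilde{v}^\delta \) into Eq. (3), whose first term carries the second derivatives \(\partial ^2\tilde{x}^\mu /\partial x^\alpha \partial x^\beta \), and use \(v^\alpha =\frac{\partial x^\alpha }{\partial \tilde{x}^\gamma }\tilde{v}^\gamma \). Both sides are then quadratic forms in \(\tilde{v}\) with coefficients symmetric in \(\gamma ,\delta \), so comparing coefficients gives (a short sketch, assuming the symmetric, torsionless convention used above)

$$\begin{aligned} \tilde{\varGamma }^\mu _{\gamma \delta }=\frac{\partial \tilde{x}^\mu }{\partial x^\nu } \frac{\partial x^\alpha }{\partial \tilde{x}^\gamma } \frac{\partial x^\beta }{\partial \tilde{x}^\delta }\, \varGamma ^\nu _{\alpha \beta }-\frac{\partial ^2 \tilde{x}^\mu }{\partial x^\alpha \partial x^\beta } \frac{\partial x^\alpha }{\partial \tilde{x}^\gamma } \frac{\partial x^\beta }{\partial \tilde{x}^\delta }, \end{aligned}$$

which is the familiar inhomogeneous transformation law of the Christoffel symbols.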

Remark 1

Actually, the above notion of spray is somewhat more general than that introduced in [2, 39] since these authors drop the condition \(s>0\) in (a). Our definition, consistent with current usage [4], allows us to include non-reversible pseudo-Finsler manifolds in our analysis of sprays. Clearly, if \(H^\mu (x,v)\) defines a spray then \(\tilde{H}^\mu (x,v):=H^\mu (x,-v)\) also defines a spray, called the reverse spray. If \(H^\mu (x,v)=H^\mu (x,-v)\) the spray is called reversible. If \(x(t)\) is a geodesic then \(x(-t)\) is a geodesic for the reverse spray but not necessarily for the original spray. Thus the direction of the parametrization of geodesics is important. If we say that two points are connected by a unique geodesic we tacitly assume that the spray under consideration is \(H^\mu (x,v)\); that claim does not exclude the possible presence of a connecting geodesic, with different image, for the reverse spray.

From now on we shall consider just locally Lipschitz sprays and we shall speak of a connection whenever \(H^\mu \) is a quadratic form. If \(H^\mu \) is twice continuously differentiable with respect to the velocities on the zero section of \(TM\), then differentiating twice \(H^\mu (x,s v)=s^2H^\mu (x,v)\) with respect to \(s\) and letting \(s\rightarrow 0\) we obtain \(H^\mu (x,v)=\frac{1}{2}(\partial ^2 H^\mu /\partial v^\alpha \partial v^\beta (x,0)) v^\alpha v^\beta \), that is, the spray is a connection [39]. Whenever the connection comes from a pseudo-Riemannian metric \(g\) we shall assume \(g\) to be \(C_{loc}^{1,1}\). For notational convenience, we shall write just \(C^{k,1}\) for \(C^{k,1}_{loc}\). It is customary [9] to call non-regular the geometrical theory for which the differentiability condition on \(g\) is weaker than \(C^2\) (or that on the connection or spray is weaker than \(C^1\)). As an example of a manifold with a \(C^{1,1}\) metric \(g\), consider a cylinder closed by two hemispherical caps, regarded as a submanifold of Euclidean space and endowed with the induced metric.

1.1 Example: Pseudo-Finsler geometry

Pseudo-Finsler geometry [1, 6, 7] is a generalization of Finsler geometry [5, 42] in which the fundamental tensor \(g\) is required to be non-degenerate rather than positive definite. Geodesics in pseudo-Finsler geometry are described by sprays.

Since the definitions of pseudo-Finsler manifold which can be found in the literature impose too strong differentiability conditions we provide a different definition.

Definition 1

A pseudo-Finsler manifold \((M,g)\) is a paracompact connected \(C^{2,1}\) manifold endowed with a \(C^{1,1}\) symmetric tensor

$$\begin{aligned} g: TM\backslash 0 \rightarrow T^*M\otimes _{M} T^*M, \quad (x,v) \mapsto g_{(x,v)}, \end{aligned}$$

defined on the non-vanishing vectors, which is non-singular and satisfies (in one and hence every chart induced from a chart on \(M\))

$$\begin{aligned} \frac{\partial g_{(x,v)\, \mu \nu }}{\partial v^\alpha } \, v^\nu =0, \qquad \frac{\partial g_{(x,v)\, \mu \nu }}{\partial v^\alpha } \, v^\alpha =0. \end{aligned}$$
(4)

It is called reversible if \(g_{(x,v)}=g_{(x,- v)}\).

Sometimes we shall write \(g\) or \(g_v\) for \(g_{(x,v)}\) either in order to shorten the notation or because we regard \(v\) as an element of \(TM\backslash 0\). In the pseudo-Riemannian case \(g\) is independent of \(v\) and written without the \(v\) index.

Remark 2

The latter equation in (4) is equivalent to the homogeneity condition: for every \(s>0\), \(g_{(x,sv)}=g_{(x,v)}\). Thus it implies that \(L:\textit{TM}\rightarrow \mathbb {R}\) defined by

$$\begin{aligned} L(x,v)&\,{:=}\,\frac{1}{2}\, g_{(x,v)}(v,v), \ \mathrm{for} \ v\ne 0, \end{aligned}$$
(5)
$$\begin{aligned} L(x,0)&\,{:=}\,0 \end{aligned}$$
(6)

is positively homogeneous of second degree, namely for every \(s>0\), and \(v\ne 0\), \(L(x,sv)=s^2 L(x,v)\). The former equation implies

$$\begin{aligned} \frac{\partial L}{\partial v^\mu }(x,v)&=g_{(x,v)\, \mu \nu } v^\nu , \end{aligned}$$
(7)
$$\begin{aligned} \frac{\partial ^2 L}{\partial v^\mu \partial v^\nu }&=g_{(x,v)\, \mu \nu }. \end{aligned}$$
(8)

We could have defined the pseudo-Finsler manifold as a pair \((M,L)\) in which \(L\) is positively homogeneous of second degree, and where \(g\) is defined through Eq. (8). This is the definition adopted by most authors. Indeed, differentiating \(L(x,sv)=s^2L(x,v)\) twice with respect to \(s\) and setting \(s=1\) gives Eq. (5). Equation (7) is obtained from Eq. (8) by observing that \(\frac{\partial L}{\partial v^\mu }\) is positively homogeneous of first degree.
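For the reader's convenience, the passage from (4) to Eqs. (7) and (8) can be spelled out as follows (a short sketch for \(v\ne 0\); the relabelling of indices is ours). Differentiating Eq. (5), and then differentiating the result once more, gives

$$\begin{aligned} \frac{\partial L}{\partial v^\mu }&=\frac{1}{2}\frac{\partial g_{(x,v)\, \alpha \beta }}{\partial v^\mu }\, v^\alpha v^\beta + g_{(x,v)\, \mu \nu } v^\nu = g_{(x,v)\, \mu \nu } v^\nu , \\ \frac{\partial ^2 L}{\partial v^\mu \partial v^\nu }&=\frac{\partial g_{(x,v)\, \mu \alpha }}{\partial v^\nu }\, v^\alpha + g_{(x,v)\, \mu \nu }= g_{(x,v)\, \mu \nu }. \end{aligned}$$

In both lines the term containing a velocity derivative of \(g\) vanishes because, by the first equation in (4), the contraction of \(\partial g_{(x,v)\, \mu \nu }/\partial v^\alpha \) with \(v\) on one of the tensor indices is zero.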

Our definition dispenses with the additional differentiability conditions that would have to be imposed on \(L\) in order to define \(g\). Furthermore, it has the advantage of making clear the connection with pseudo-Riemannian geometry. It also clarifies that not every tensor \(g_{(x,v)}\), positively homogeneous of zero degree in \(v\), is a pseudo-Finsler metric, as the first equation in (4) has to be satisfied.

A geodesic is a stationary point of the functional (with a prime we denote differentiation, typically with respect to a parameter \(s\), if the parameter is \(t\) we often use a dot)

$$\begin{aligned} S[x]=\int _{s_0}^{s_1} L(x,x')\, \mathrm{d}s , \qquad x:[s_0,s_1]\rightarrow M,\ x(s_0)=x_0, \ x(s_1)=x_1 , \end{aligned}$$

where \(x\in C^1([s_0,s_1])\). By the same argument used above for \(H^\mu \) we cannot demand \(g\) to exist and to be continuous on the zero section unless \(L\) is quadratic in the velocities, which corresponds to the case of pseudo-Riemannian geometry. In our terminology the Finsler (Riemannian) structures are special cases of the pseudo-Finsler (pseudo-Riemannian) ones. In the former case \(\sqrt{2L}\) is often denoted \(F\).

We observe that it is always possible to introduce an auxiliary Riemannian metric \(h\) over \(M\) and to consider the unit sphere subbundle of \(\textit{TM}\). If \(x\) is kept fixed then \(g_{(x,v)\, \mu \nu }\) depends only on the direction and orientation of \(v\), namely on \(\hat{v}\), and since the unit sphere bundle is compact over compact subsets, \(g_{(x,v)\, \mu \nu }\) is bounded in relatively compact neighborhoods of points of \(\textit{TM}\) belonging to the zero section. The same observation holds for the partial derivatives with respect to \(x\) of \(g\), and hence combinations such as \(g_{v\, \alpha \beta } v^\delta \) or \(\frac{\partial g_{v\, \nu \beta }}{\partial x^\alpha } v^\delta \) are locally Lipschitz also at the zero section once they are defined to vanish there. In particular, Eqs. (5) and (7) make sense also for \(v=0\), and for fixed \(x\), \(L\) is \(C^{1,1}\) on the zero section and \(C^{3,1}\) outside it.

The Lagrangian \(L\) is constant over the geodesics because, using the Euler–Lagrange equations (we cannot invoke the Hamiltonian to obtain this result since we have not proved the convexity of \(L\) in the velocities)

$$\begin{aligned} \frac{\mathrm{d}L}{\mathrm{d}t}=\frac{\partial L}{\partial x^\mu }\, v^\mu + \frac{\partial L}{\partial v^\mu } \frac{\mathrm{d}v^\mu }{\mathrm{d}t}=\left( \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial L}{\partial v^\mu }\right) v^\mu + \frac{\partial L}{\partial v^\mu } \frac{\mathrm{d}v^\mu }{\mathrm{d}t}=\frac{\mathrm{d}}{\mathrm{d}t} \left( \frac{\partial L}{\partial v^\mu } v^\mu \right) =2\frac{\mathrm{d}L}{\mathrm{d}t} \end{aligned}$$

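where in the last step we used the homogeneity of \(L\), that is, Euler's relation \(\frac{\partial L}{\partial v^\mu } v^\mu =2L\); hence \(\mathrm{d}L/\mathrm{d}t=0\). The spray reads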

$$\begin{aligned} H^\mu (x,v)=-\frac{1}{2}g_v^{\mu \nu }\,\left( \frac{\partial g_{v\, \nu \alpha }}{\partial x^\beta }+\frac{\partial g_{v\, \nu \beta }}{\partial x^\alpha }-\frac{\partial g_{v\, \alpha \beta }}{\partial x^\nu }\right) \, v^\alpha v^\beta , \end{aligned}$$

where it is understood that \(H^\mu (x,0)=0\). It is Lipschitz as required because \(g^{-1}_{(x,v)}=g^{-1}_{(x,\hat{v})}\) depends continuously on the unit sphere bundle which is compact over compact subsets of \(M\), thus the inverse \(g^{\mu \nu }_{(x,v)}\) stays bounded in relatively compact neighborhoods of points belonging to the zero section, and combinations of the form \( g^{\mu \nu }_{(x,v)} v^\beta \) are locally Lipschitz everywhere once they are defined to vanish on the zero section.

We shall return to the geometry of pseudo-Finsler spaces when we discuss Gauss’ Lemma. Any mention of the various connections that can be introduced in this theory will be avoided in both results and proofs.

1.2 The exponential map for sprays

As we mentioned, the non-stationary solutions to Eqs. (1, 2) will be called geodesics. As this is a system of first order ODEs over \(TM\), according to the Picard-Lindelöf theorem, the existence and uniqueness of its solutions are guaranteed by the locally Lipschitz condition on \(H^\mu \).

Let \(\gamma _v(t)\) be the unique geodesic which starts from \(\pi (v)\) with velocity \(v\). The set \(\varOmega \) is given by those \(v\) for which the geodesic exists at least for \(t\in [0,1]\). The exponential map \(\exp :\varOmega \rightarrow M\times M\) is given by

$$\begin{aligned} v \mapsto (\pi (v),\gamma _v(1)), \end{aligned}$$

while the pointed exponential map at \(p \in M\), is \(\exp _p:\varOmega _p \rightarrow M\), \(\varOmega _p=\varOmega \cap \pi ^{-1}(p)\), \(\exp _p v:=\gamma _v(1) =\pi _2(\exp v) \). By the homogeneity of \(H^\mu \) on velocities we have

$$\begin{aligned} \gamma _{sv}(t)=\gamma _v(st), \end{aligned}$$
(9)

thus the set \(\varOmega \) (and \(\varOmega _p\)) is star-shaped in the sense that if \(v\in \varOmega \) then \(s v\in \varOmega \) for every \(s\in [0,1]\). Equation (9) clarifies that it makes sense to call affine the geodesic parameter, for any affine reparametrization of a geodesic gives a curve which solves the geodesic equation.

Remark 3

The exponential map of the reverse spray, denoted \(\tilde{\exp }\), is

$$\begin{aligned} \tilde{\exp } \,v\,{:=}\, (\pi (v),\gamma _{v}(-1)), \qquad \tilde{\exp }_p\, v\,{:=}\,\gamma _{v}(-1), \end{aligned}$$

and since in general \(\gamma _v(-1)\ne \gamma _{-v}(1)\) this map cannot be simply expressed through the exponential map \(\exp \). Of course, if the spray is reversible it coincides with \(v \mapsto \exp (-v)\).
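Equation (9) and the role of the orientation discussed in this remark can be checked numerically on a toy example. The sketch below is our own illustration and not part of the source: it assumes the hypothetical spray \(H(x,v)=\Vert v\Vert Jv\) on \(\mathbb {R}^2\), with \(J\) the rotation by \(\pi /2\), which is positively homogeneous of second degree, locally Lipschitz, and not reversible since \(H(x,-v)=-H(x,v)\). The names `spray` and `geodesic`, and the choice of a fixed-step Runge-Kutta integrator, are likewise assumptions made only for illustration.

```python
import math


def spray(x, v):
    """Hypothetical spray H(x, v) = |v| * J v on R^2 (J = rotation by 90 degrees).

    It satisfies H(x, s v) = s^2 H(x, v) for s > 0 but H(x, -v) = -H(x, v),
    so it is a non-reversible spray and not a quadratic form in v.
    """
    speed = math.hypot(v[0], v[1])
    return (-speed * v[1], speed * v[0])


def geodesic(x0, v0, t_end, steps=2000):
    """Integrate dx/dt = v, dv/dt = H(x, v) from t = 0 to t_end (classical RK4).

    t_end may be negative, in which case the geodesic is followed backwards.
    Returns the final position and velocity.
    """
    h = t_end / steps
    y = [x0[0], x0[1], v0[0], v0[1]]

    def rhs(state):
        a = spray((state[0], state[1]), (state[2], state[3]))
        return [state[2], state[3], a[0], a[1]]

    for _ in range(steps):
        k1 = rhs(y)
        k2 = rhs([y[i] + 0.5 * h * k1[i] for i in range(4)])
        k3 = rhs([y[i] + 0.5 * h * k2[i] for i in range(4)])
        k4 = rhs([y[i] + h * k3[i] for i in range(4)])
        y = [y[i] + h / 6.0 * (k1[i] + 2.0 * k2[i] + 2.0 * k3[i] + k4[i])
             for i in range(4)]
    return (y[0], y[1]), (y[2], y[3])


p = (0.0, 0.0)
v = (1.0, 0.0)
s = 0.5

scaled, _ = geodesic(p, (s * v[0], s * v[1]), 1.0)    # gamma_{sv}(1)
reparam, _ = geodesic(p, v, s)                         # gamma_v(s)
backward, _ = geodesic(p, v, -1.0)                     # gamma_v(-1)
reversed_start, _ = geodesic(p, (-v[0], -v[1]), 1.0)   # gamma_{-v}(1)

print("gamma_{sv}(1) =", scaled, " gamma_v(s) =", reparam)
print("gamma_v(-1) =", backward, " gamma_{-v}(1) =", reversed_start)
```

For this particular spray the geodesics are unit circles traversed counterclockwise, so \(\gamma _{sv}(1)\) and \(\gamma _v(s)\) agree up to integration error, whereas \(\gamma _v(-1)\) and \(\gamma _{-v}(1)\) are mirror images of each other with respect to the first coordinate axis and hence differ.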

Hartman [29] proved that for connections the uniqueness of the solutions of the geodesic equation is lost if the Lipschitz condition is weakened to continuity. This result was improved by Hartman and Wintner [33] [31, Exercise 6.2, Chap. 5] who considered the metric

$$\begin{aligned} \mathrm{d}s^2=\left( 1+| v|^{1+\alpha }\right) \left( \mathrm{d}u^2+ \mathrm{d}v^2\right) \end{aligned}$$

for \(0<\alpha <1\). Its connection satisfies a Hölder condition of exponent \(\alpha \), and on any neighborhood of \(p=(0,0)\) one can find infinitely many geodesics which start from \(p\) with velocity \((1,0)\).
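To see heuristically where uniqueness fails, one can write out the geodesic equations of this metric. The computation below is our own sketch, based on the standard formula for the Christoffel symbols of a conformally flat metric, and is not taken from [31, 33]; here \(u,v\) are the coordinates of the example, not velocities. With \(\lambda (v)=1+| v|^{1+\alpha }\), so that \(\lambda '(v)=(1+\alpha )| v|^{\alpha }\,\mathrm{sgn}(v)\), the non-vanishing Christoffel symbols are \(\varGamma ^u_{uv}=\varGamma ^u_{vu}=\varGamma ^v_{vv}=-\varGamma ^v_{uu}=\lambda '/(2\lambda )\), and the geodesic equations read

$$\begin{aligned} \ddot{u}+\frac{\lambda '(v)}{\lambda (v)}\, \dot{u}\dot{v}&=0, \\ \ddot{v}+\frac{\lambda '(v)}{2\lambda (v)}\left( \dot{v}^2-\dot{u}^2\right)&=0. \end{aligned}$$

The coefficient \(\lambda '/(2\lambda )\) is Hölder continuous of exponent \(\alpha \) but not Lipschitz at \(v=0\). The curve \(u=t\), \(v=0\) is a solution, while for small \(v\) and \(\dot{u}\approx 1\) the second equation behaves like \(\ddot{v}\approx \frac{1+\alpha }{2}\,| v|^{\alpha }\,\mathrm{sgn}(v)\). The corresponding model problem with data \(v(0)=\dot{v}(0)=0\) admits, besides \(v\equiv 0\), solutions proportional to \(t^{2/(1-\alpha )}\), possibly starting after an arbitrary delay. This indicates the mechanism behind the non-uniqueness; it is not a proof.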

These examples suggest the Lipschitz condition as the best differentiability condition that can be placed on a spray.

Remark 4

Actually, if the connection is that of a Riemannian \(C^2\) surface of Euclidean 3-space then uniqueness of geodesics is guaranteed, and one can even build \(C^1\) normal coordinates though the connection is just continuous [29]. Moreover, still in the 2-dimensional case under Lipschitzness of the connection one can prove results which are stronger than those considered in this work\(^{1}\) [29, 30, 33]. Nevertheless, we shall work in the general \(n\)-dimensional case since the 2-dimensional one appears too special and less relevant for applications (it suffices to recall that in 2 dimensions any metric is locally conformally flat).

In this work we shall prove that the exponential map of every spray is a local Lipeomorphism (local bi-Lipschitz homeomorphism) and that on \(M\) any point admits arbitrarily small convex neighborhoods (Theor. 4).

In a Riemannian framework, this result can be improved in some directions. For instance, it is well known that Riemannian spaces with sectional curvature bounded from below or above find a remarkable generalization in the notion of Alexandrov space. In this quite general setting there are indeed results on the existence of convex neighborhoods [9, Prop. 5.5] [54, 55].

We were finishing this work when we learned that Kunzinger, Steinbauer and Stojković, in a recent preprint [36], have also provided a proof of the bi-Lipschitzness of the pointed exponential map and of the existence of convex neighborhoods. Like us, they were motivated by a recent work by Chruściel and Grant on causality theory under low differentiability conditions [16]. Their approach is complementary to our own and deserves some comments. They consider a net of smooth Riemannian metrics \(g_\epsilon \) obtained from \(g\) through convolution with a mollifier, and use methods from comparison geometry to obtain sufficiently strong estimates on the exponential maps of the regularized metrics, so as to be able to carry over the bi-Lipschitz property through the limit. In order to perform this last step in the general pseudo-Riemannian case they use some results on comparison geometry for indefinite metrics recently obtained by Chen and LeFloch [14]. They also show that the Riemannian case can be dealt with using the Rauch comparison theorem.

Our approach has several advantages, among which that of being tailored to the results on convexity that we wish to prove. The differences between our strategy and more classical approaches based on the smooth category and the inverse function theorem are minimal; there is no use of comparison geometry, nor is regularization required. No prior knowledge of Riemannian geometry is actually needed, for we never use the concept of (sectional) curvature or Jacobi field. Our results are therefore obtained by improving some local analytical results, without introducing advanced topics in differential geometry or touching the very foundations of the theory under consideration. This is desirable since we are actually obtaining basic results on local convexity which could be placed at the very beginning of treatments on differential geometry under low regularity. Our study may be useful, for instance, to understand the limits of pseudo-Finsler geometry and particularly Lorentzian geometry, for which a theory of the same generality as Alexandrov’s is missing.\(^{2}\)

We have also tried to be as complete as possible. In this way the reader will be able to refer to the results of this work without having to resort to involved arguments in order to extend them. For instance, we prove that the non-pointed exponential map \(\exp \) is a Lipeomorphism from a neighborhood of the zero section to a neighborhood of the diagonal on \(M\times M\). This result is quite useful in applications, for instance in causality theory it is used in the proof that the causal relation over convex normal sets is closed (Theor. 12).

Unless otherwise specified, \(\Vert \, \Vert \) will denote the Euclidean norm on \(\mathbb {R}^n\). Let us recall that a function \(f:O \rightarrow \mathbb {R}^k\) defined on an open set \(O\subset \mathbb {R}^n\) is Lipschitz if for every \(p,q\in O\),

$$\begin{aligned} \Vert f(p)- f(q)\Vert \le K \Vert p-q\Vert \end{aligned}$$

for some \(K>0\). It is locally Lipschitz if this inequality holds over every compact subset of \(O\), with \(K\) dependent on the compact subset. If \(f\) depends on another variable \(z\), then \(f\) is uniformly Lipschitz if the Lipschitz constant does not depend on \(z\). It is locally uniformly Lipschitz if, chosen any compact set on the domain of \(z\), the Lipschitz constant can be chosen to be dependent on just the compact set rather than \(z\). As a consequence, a function of, say, two variables \(f(x,y)\) which is Lipschitz is also Lipschitz in one variable uniformly in the other. A Lipeomorphism \(f\) is a homeomorphism for which both \(f\) and \(f^{-1}\) are locally Lipschitz. In the cases that will interest us \(f\) will be defined in an open subset \(O\subset \mathbb {R}^n\), and \(f(O)\) will also be an open subset of \(\mathbb {R}^n\).

It is well known that the ODE \(\dot{x}=f(x)\) for which \(f\) is Lipschitz admits unique solutions which have a Lipschitz dependence on the initial conditions [31, Ex. 1.2, Chap. 2] [13, Prop. 1.10.1] [38, Cor. 1.6]. As a result, the exponential map \(\exp \) and its pointed version \(\exp _p\) are locally Lipschitz.

We shall improve this result as in the next theorem. This refinement will be used in the proof that in a Riemannian space the geodesics are locally length minimizing in the family of absolutely continuous curves (and to prove an analogous result in the Lorentzian case). To increase readability we postpone most proofs to the next sections.

Theorem 1

Let us consider a Lipschitz spray (Lipschitz \(H^\mu \)) on a \(C^{2,1}\) manifold \(M\), and let \(\varphi :W\rightarrow TM\), \(W\subset [0,+\infty )\times TM\), \(\varphi (t,v):=\gamma _v'(t)\) be the geodesic flow map defined at those \((t,v)\in \mathbb {R} \times TM\) for which the geodesic \(\gamma _v\) extends up to time \(t\) (so that the expression on the right-hand side makes sense). Then \(W\) is an open subset of \([0,+\infty )\times TM\) such that \([0,1]\times \varOmega \subset W\) and such that for every \(s\in [0,1]\) if \((t,v) \in W\) then \((st,v),(t,sv)\in W\). Analogously, \(\varOmega \subset TM\) is open and star-shaped in the sense that for every \(s\in [0,1]\) if \(v\in \varOmega \) then \(sv\in \varOmega \).

Moreover, \(\varphi (\cdot , v)\) is \(C^{1,1}\), \(\varphi \) is locally Lipschitz and there is a star-shaped subset \(\tilde{\varOmega }\subset \varOmega \) such that, \(\varOmega \backslash \tilde{\varOmega }\) has zero Lebesgue measure, and for every \(v\in \tilde{\varOmega }\) and every \(t\in [0,1]\), \(\varphi (t,\cdot )\) is differentiable at \((t,v)\) and the differential (Jacobian) \(\partial _2 \varphi (\cdot , v)\) is locally Lipschitz in \(t\), locally uniformly with respect to those \(v\) belonging to \(\tilde{\varOmega }\) (that is the local Lipschitz constant can be chosen so that it does not vary in a small neighborhood of \(v\) as long as the independent variable stays in \(\tilde{\varOmega }\)). Finally, for any \(v\in \tilde{\varOmega }\) we have that for almost every \(t\in [0,1]\) the following mixed differentials exist, are locally bounded and coincide

$$\begin{aligned} \partial _1 \partial _2 \varphi = \partial _2 \partial _1 \varphi . \end{aligned}$$

These conclusions do not change if we restrict \(\varphi \) to some Lipschitz \(m\)-dimensional submanifold \(N\) of \(TM\). In this case the differential \(\partial _2\) refers to the variables of the Lipschitz chart on \(N\) and the almost everywhere existence of \(\partial _2 \varphi \) must be understood in the \(m\)-dimensional Lebesgue measure of \( N\).

One would like to prove that the (pointed) exponential map is invertible, say a local Lipeomorphism or a local diffeomorphism. Hartman [31, Exercise 6.2, Chap. 5] [32] showed in the connection case that the exponential map is actually \(C^1\) and hence a local diffeomorphism provided the connection admits a continuous exterior derivative. We shall not impose these additional conditions.

The local injectivity of the exponential map for Lipschitz connections was proved by Whitehead in his paper on the existence of convex normal neighborhoods [62]. In [63] he observed that the result could be generalized to sprays. Whitehead uses a theorem by Picard which applies to boundary value problems of second order ODEs [31, Cor. 4.1, Chap. 12]. Using the theorem on the invariance of domain one could infer that the exponential map provides a local homeomorphism, though from here it does not seem easy to obtain that it admits a Lipschitz inverse.

Our approach relies instead on an improved version of the inverse function theorem due to Leach [40] (see also [13]). This theorem depends on Peano’s definition of strong derivative [52] and on the natural corresponding notion of strong differential studied by Severi. Unfortunately, Peano’s contributions in this direction are, together with many other accomplishments by the Italian mathematician, little known [20]. Nijenhuis’ attempt [49] to popularize Peano’s choice and Leach’s inversion theorem passed essentially unnoticed. Peano’s choice provides a better definition of differential, so good, in fact, that having to choose one should probably adopt it in place of the usual differential in analysis textbooks. Indeed, the strong differential leads to stronger and more elegant results, and seems to correspond better with the intuition. We hope that this study, showing the usefulness of Peano’s strong derivative for the exponential map, will serve to motivate its mention in university courses.

Let us recall Peano’s definition of strong differential and its basic properties. We give a general definition for Banach spaces although we shall work on \(\mathbb {R}^k\) for some \(k\ge 1\). We denote with \(B(p,r):=\{q: | q-p|<r \}\) the open ball of radius \(r\) centered at \(p\), and with \(\bar{B}(p,r):=\{q: | q-p|\le r \}\) the closed ball.

Definition 2

Let \(E\) and \(F\) be Banach spaces, and let \(f:O \rightarrow F\) be a function defined on an open set \(O\subset E\). The strong differential of \(f\) at \(p \in O\) is a bounded linear transformation \(L:E \rightarrow F\) which approximates changes of \(f\) in the sense that for every \(\epsilon > 0\), there is a \(\delta >0\) such that if \(| q_1-p| \le \delta \) and \(| q_2-p| \le \delta \), then:

$$\begin{aligned} | f(q_1) -f(q_2) - L(q_1 - q_2) | \le \epsilon | q_1 - q_2| . \end{aligned}$$
(10)

Clearly, if the strong differential at \(p\) exists then it is unique. If \(f\) is strongly differentiable at \(p\) then taking \(q_2=p\) shows that it is also Fréchet differentiable and that the differentials so defined coincide. In the finite dimensional case which will interest us all norms are equivalent thus the notions of strong differentiation and strong differential are independent of the norm used.
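The difference between the ordinary and the strong differential can be seen on the classical function \(f(x)=x^2\sin (1/x)\), \(f(0)=0\), which is differentiable at the origin with \(f'(0)=0\) but whose derivative is not continuous there. The following numerical sketch is our own illustration (the helper names are hypothetical): one-point quotients tend to \(0\), as required by ordinary differentiability, while quotients taken between two points \(q_1,q_2\) which both approach \(0\) stay close to \(-1\), so that inequality (10) cannot hold with \(L=0\) and small \(\epsilon \).

```python
import math


def f(x):
    """f(x) = x^2 sin(1/x) for x != 0, extended by f(0) = 0."""
    return x * x * math.sin(1.0 / x) if x != 0.0 else 0.0


def one_point_quotient(q):
    """(f(q) - f(0)) / (q - 0): the quotient behind the ordinary derivative at 0."""
    return (f(q) - f(0.0)) / q


def two_point_quotient(k, eta=0.01):
    """(f(q1) - f(q2)) / (q1 - q2) for q1, q2 straddling 1 / (2 pi k).

    Both points tend to 0 as k grows, as in the definition of strong differential.
    """
    q1 = 1.0 / (2.0 * math.pi * k - eta)
    q2 = 1.0 / (2.0 * math.pi * k + eta)
    return (f(q1) - f(q2)) / (q1 - q2)


for k in (1, 10, 100, 1000):
    q = 1.0 / (2.0 * math.pi * k)
    print(k, one_point_quotient(q), two_point_quotient(k))
```

The choice of the points \(1/(2\pi k\mp \eta )\) is just one convenient way of sampling near the places where \(f'\) is close to \(-1\); any pair of sufficiently close points near such places would serve the same purpose.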

We list some properties of strong differentiation which are easy to prove [21, 40, 41, 49, 52].

  1. (i)

    If \(f\) is strongly differentiable at \(p\) then it satisfies a Lipschitz condition in a neighborhood of \(p\).\(^{3}\)

  2. (ii)

    If \(f\) is differentiable in a neighborhood of \(p\) and the differential is continuous at \(p\) then it is strongly differentiable at \(p\).

  3. (iii)

    If \(f\) is strongly differentiable over a subset \(A\subset E\) then the strong differential is continuous over \(A\) with respect to the induced topology.

  4. (iv)

    If a continuous function \(f:U_1\times U_2\rightarrow V\) on a product Banach space admits strong partial differentials at \(p\) (obtained keeping constant the other variable) then it is strongly differentiable at \(p\) and the total differential has the usual gradient expression in terms of partial differentials.

  5. (v)

    The composition of strongly differentiable functions is strongly differentiable.

  6. (vi)

    The mixed partial strong derivatives coincide wherever they exist [46].

  7. (vii)

    If \(f:\mathbb {R}\rightarrow \mathbb {R}\) has positive strong differential at \(p\), then \(f\) is continuous and increasing in a neighborhood of \(p\).

Conditions (ii) and (iii) clarify that a function is \(C^1\) over an open set if and only if it is strongly differentiable over it. Some key theorems in analysis that require a \(C^1\) condition on a neighborhood of \(p\) can be proved demanding the weaker condition of strong differentiability at \(p\). An example is Leach’s inversion theorem [13, 35, 40, 49] which generalizes Dini’s and which we state in a form suitable for our purposes:

Theorem 2

(Leach) Let \(f:O \rightarrow \mathbb {R}^n\) be a function defined on an open subset \(O\subset \mathbb {R}^n\), such that \(f\) has strong differential \(L:\mathbb {R}^n\rightarrow \mathbb {R}^n\) at \(p\in O\). If \(L\) is invertible then there are an open neighborhood \(N_1\) of \(p\), an open neighborhood \(N_2\) of \(f(p)\), and a function \(g: N_2\rightarrow \mathbb {R}^n\) such that \(f(N_1)=N_2\), \(g(N_2)=N_1\), \(f|_{N_1}\) and \(g\) are inverse to each other, they are both Lipschitz and \(g\) has strong differential \(L^{-1}\) at \(f(p)\).

Moreover, in this case \(f\) is differentiable at \(q\in N_1\) if and only if \(g\) is differentiable at \(f(q)\), in which case the differentials are invertible. This last statement holds also with differentiable replaced by strongly differentiable.\(^{4}\)

In order to clarify the connection between this inversion theorem and Clarke’s [17] it is convenient to recall the notion of Clarke’s generalized differential for locally Lipschitz functions:

Definition 3

The generalized Jacobian of a locally Lipschitz function \(f:O \rightarrow \mathbb {R}^n\), \(O\subset \mathbb {R}^k\) at \(p\), denoted \(\partial f(p)\), is the convex hull of all matrices \(M\) of the form

$$\begin{aligned} M = \lim _{p_i\rightarrow p} df(p_i) \end{aligned}$$

where \(p_i\) converges to \(p\), \(f\) is differentiable at \(p_i\) for each \(i\) and \(d f\) denotes the usual Jacobian.
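A standard one-dimensional illustration, not taken from the source, may help to fix ideas. For \(f(x)=| x|\) on \(\mathbb {R}\) the derivative equals \(\pm 1\) away from the origin, so the limits entering the definition at \(p=0\) are \(-1\) and \(+1\), and taking the convex hull gives

$$\begin{aligned} \partial f(0)=[-1,1], \qquad \partial f(x)=\{\mathrm{sgn}(x)\} \ \mathrm{for} \ x\ne 0. \end{aligned}$$

In particular \(\partial f(0)\) is not a singleton, which reflects the corner of \(f\) at the origin.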

By Rademacher’s theorem the generalized differential is non-empty at \(p\) and we have (see also [17, 21])

Proposition 1

If \(f:O \rightarrow \mathbb {R}^n\), \(O\subset \mathbb {R}^k\), is strongly differentiable at \(p\) then \(\partial f(p)=\{\mathrm{d}f(p) \}\).

Proof

Indeed, take any \(\epsilon >0\) and let \(\delta >0\) be such that Eq. (10) holds. Let \(J\) be the limit of a sequence \(J_i=d f(p_i)\) for \(p_i\rightarrow p\). We can assume \(\Vert p_i-p\Vert <\delta /2\) for each \(i\). Let \(e\) be any unit vector. For each \(i\) we can find some \(0<\delta _i\le \delta /2\) such that

$$\begin{aligned} \Vert f(q_i)-f(p_i)-d f(p_i)(q_i-p_i)\Vert \le \epsilon \Vert q_i-p_i\Vert \end{aligned}$$

for every \(q_i\) such that \(\Vert q_i-p_i\Vert \le \delta _i\). Let \(q_i=p_i+\delta _i e\), then

$$\begin{aligned} \delta _i \Vert (L-J_i) e\Vert&=\Vert (L-J_i)(q_i-p_i)\Vert \le \Vert f(q_i)-f(p_i)-J_i(q_i-p_i)\Vert \\&\quad + \Vert f(q_i)-f(p_i)-L(q_i-p_i)\Vert \le 2\epsilon \Vert q_i-p_i\Vert =2\epsilon \delta _i. \end{aligned}$$

Simplifying \(\delta _i\), taking the limit \(i\rightarrow \infty \) and using the arbitrariness of \(\epsilon \) and \(e\) proves that \(J=d f(p)\) and hence \(\partial f(p)=\{d f(p)\}\).\(\square \)

Clarke proved that if \(k=n\) and \(\partial f\) admits only invertible elements then \(f\) is a local Lipeomorphism. Leach’s version states something more for \(f\) strongly differentiable at \(p\), for it establishes that the inverse is strongly differentiable at \(f(p)\).

Our strategy is then clear: we are going to prove the strong differentiability of the exponential map in order to deduce the Lipschitzness of the inverse by means of Leach’s (or Clarke’s) inversion theorem. The proof of the strong differentiability of the exponential map will pass through a local analysis based on the Picard-Lindelöf approximation method.

In the end we shall prove:

Theorem 3

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a locally Lipschitz spray.

\((\exp )\) :

The set \(\varOmega \) is open in the topology of \(TM\). The exponential map \(\exp :\varOmega \rightarrow M\times M\), \(\varOmega \subset TM\), is locally Lipschitz. Moreover, \(\exp \) is strongly differentiable over the zero section, namely over the image of \(p\mapsto 0_p\). The map \(\exp \) provides a Lipeomorphism between an open star-shaped neighborhood of the zero section and an open neighborhood of the diagonal of \(M\times M\).

\((\exp _p)\) :

For every \(p\in M\) the set \(\varOmega _p\) is open in the topology of \(T_pM\). The pointed exponential map \(\exp _p:\varOmega _p \rightarrow M\), \(\varOmega _p\subset T_pM\), is locally Lipschitz and strongly differentiable at the origin. The map \(\exp _p\) provides a local Lipeomorphism between a star-shaped open subset of \(\varOmega _p\) and an open neighborhood of \(p\) (for more see Theorems 1 and 4).

In the pointed case this result can be refined. We shall need some definitions.

Definition 4

An open neighborhood \(N\) of \(p\in M\) will be called normal if there is an open star-shaped subset \(N_p\subset \varOmega _p\) such that \(\exp _p:N_p \rightarrow {N}\) is a Lipeomorphism.

Definition 5

An open set \(C\subset M\) will be called convex normal if it is a normal neighborhood of each of its points. We shall say that \(\bar{C}\) is strictly convex normal if \(C\) is convex normal and any two points of \(\bar{C}\) are connected by a unique geodesic contained in \(C\) but for the endpoints. A (strictly) convex normal set is called reversible if it is so also for the reverse spray.

Remark 5

In a reversible convex normal subset \(C\) for any two points \(p, q\in C\) there is a geodesic \(\gamma _{pq}:[0,1]\rightarrow C\) connecting \(p\) to \(q\) and a geodesic \(\gamma _{qp}:[0,1]\rightarrow C\) connecting \(q\) to \(p\). They coincide if the spray is reversible. Furthermore, there are geodesics for the reverse spray \(\tilde{\gamma }_{pq}(t)=\gamma _{qp}(1-t)\) connecting \(p\) to \(q\) and \(\tilde{\gamma }_{qp}(t)=\gamma _{pq}(1-t)\) connecting \(q\) to \(p\). The last identities follow from the uniqueness of the connecting geodesic for the spray. Observe that while the convexity of \(C\) with respect to the spray implies the convexity of \(C\) with respect to the reverse spray, a reversible convex normal set has a stronger property which cannot be deduced from the corresponding property for the spray, that is, in a reversible convex normal set \(C\) we have that \(\tilde{\exp }_p^{-1}|_C\) is a Lipeomorphism for every \(p \in C\).

The following concept will be useful in the next section.

Definition 6

Let \(C\) be a convex normal set, let \(p,q\in C\) and let \(x: [0,1]\rightarrow C\), \(x(0)=p\), \(x(1)=q\), be the unique geodesic connecting them. The vector \(\dot{x}(1)\) is denoted \(P(p,q)\) and called the position vector.

We shall prove:

Theorem 4

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a locally Lipschitz spray. Let \(O\) be an open neighborhood of \(p\in M\). Then there is a reversible strictly convex normal neighborhood \(C\) of \(p\) contained in \(O\), such that \(\exp \) establishes a Lipeomorphism between an open star-shaped subset of \(TC\) and \(C\times C\). Analogously, \(\tilde{\exp }\) establishes a Lipeomorphism between an open star-shaped subset of \(TC\) and \(C\times C\).

Moreover, for every chart \(\{x^\mu \}\) defined in a neighborhood of \(p\), \(C\) can be chosen equal to the open ball \(B(p,\delta )\) for any sufficiently small \(\delta \) (the ball is defined through the Euclidean norm induced by the coordinates).

If the spray is compatible with a pseudo-Finsler structure then this result can be further refined.

Remark 6

The previous theorems can be formulated in more generality for \(C^{k,1}\) sprays over \(C^{k+2,1}\) manifolds with \(k\ge 0\) or for \(C^{k+1, \alpha }\) sprays over \(C^{k+3,\alpha }\), \(\alpha \in [0,1)\), manifolds. The exponential map and its inverse have the degree of differentiability of the spray.

These cases follow more or less straightforwardly from the Lipschitz \(k=0\) case treated here or from what is already known for \(C^1\) sprays. For a direct proof that generalizes that for the Lipschitz case see Remark 15.

1.3 Gauss’ Lemma for pseudo-Finsler sprays

Let us consider again a pseudo-Finsler geometry in which the fundamental tensor \(g_v\) is \(C^{1,1}\) and the spray is Lipschitz. This means that in the pseudo-Riemannian case \(g\) is \(C^{1,1}\) and the connection is Lipschitz.

The local length minimization property of geodesics in Riemannian spaces, or the local Lorentzian length maximization property of causal geodesics in Lorentzian manifolds, are proved passing through Gauss’ Lemma (in the Lorentzian case see [15, 51]). This Lemma is known to hold in Finsler geometry [5]:

(Gauss’ Lemma under sufficient differentiability conditions) Let \(p\in M\), let \(N\) be a normal neighborhood of \(p\) and let \(v\in \exp _p^{-1} N\backslash 0\). Let \(w\in T_p M\sim T_v(T_p M)\). Then

$$\begin{aligned} g_{(d \exp _p)_v v}( (d \exp _p)_v v, (d \exp _p)_v w )=g_v(v,w). \end{aligned}$$
(11)

This lemma is usually expressed as above using the push forward of the exponential map [26, Theor. 3.70], [12]. As the exponential map is \(C^1\) for \(C^2\) metrics, one expects this lemma to be valid for \(C^2\) metrics.

The just mentioned classical proofs of Gauss’ lemma work indeed in the \(C^2\) case. Without entering into too many details, the reader should just keep in mind that, as far as differentiability with respect to the initial conditions is concerned, the exponential map, by Peano’s theorem [31, Theor. 3.1], behaves better in the radial direction than in the transverse directions. As a consequence, the mixed derivative of the expression \(f(t,s)=\exp _p (t v(s))\) is continuous if \(v(s)\) is \(C^1\), cf. [31, Cor. 3.2]. Thus one can use Schwarz’s theorem and fully justify the proof of [12].

Other proofs seem less convincing [59, Cor. 2.2]. It is very easy to forget that one cannot work directly with, say, derivatives of vector or tensor fields expressed in a normal coordinate chart, or with the Christoffel symbols in normal coordinates. Indeed, since this chart is just \(C^1\), vector and tensor fields expressed in it can be at most \(C^0\) and hence are not differentiable; similarly, the Christoffel symbols are not defined [18].

The situation is worse for Lipschitz connections or sprays and \(C^{1,1}\) metrics. Under this differentiability hypothesis the exponential map is just Lipschitz, thus Gauss’ Lemma is not even expected to hold. However, we shall show that Gauss’ Lemma still holds true in a formulation which does not require the differentiability of the exponential map.

We shall need some technical results either on (a) the differentiation under the integral sign, or on (b) Schwarz’s theorem on the equality of mixed partial derivatives. We shall discuss them in Sect. 1.8. In the end we shall prove (in order to obtain a more restrictive pseudo-Riemannian version just remove the vector index from the fundamental tensor \(g\)):

Theorem 5

Let \((M,g)\) be a \(C^{2,1}\) pseudo-Finsler manifold for which \(g\) is \(C^{1,1}\) (Sect. 1.1). Let \(N\) be a normal neighborhood of \(p\in M\). The function \(D^2_p:N \rightarrow \mathbb {R}\) defined by

$$\begin{aligned} D^2_p(q)\,{:=}\, 2L(p, \exp _p^{-1}(q))=\,g_{\exp _p^{-1}(q)}(\exp _p^{-1}(q), \exp _p^{-1}(q)) \end{aligned}$$
(12)

is \(C^{1,1}\) in \(q\) and

$$\begin{aligned} d D^2_p(q)= 2 g_{P(p,q)} (P(p,q),\cdot ), \end{aligned}$$
(13)

where \(P(p,q)= \gamma '_{\exp ^{-1}_p q}(1)\) is the position vector of \(q\) with respect to \(p\). Thus the level sets of \(D^2_p\) are orthogonal to the geodesics issued from \(p\), and for \(t,s>0\) the \((-t)\)-time flow map of \(P(p,\cdot )\) is a Lipeomorphism between \((D^2_p)^{-1}(s)\) and its image on \((D^2_p)^{-1}(s\, e^{-2t})\).

Finally, Eq. (11) holds wherever \(\exp _p:\exp _p^{-1} N \rightarrow N\) is differentiable, hence almost everywhere. Thus the usual Gauss’ Lemma holds under \(C^2\) differentiability of the metric \(g\).

Geometrically Eq. (13) states that the geodesic connecting \(p\) to \(q\) is perpendicular to a level set \(D^2_p=cnst\) (in pseudo-Finsler geometry \(v\) is perpendicular to \(w\) if \(g_v(v,w)=0\)).
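As a sanity check of Eq. (13) in a familiar setting, the following sketch verifies it numerically on the round unit sphere \(S^2\subset \mathbb {R}^3\), where \(D_p(q)\) is the great-circle distance. The sphere is smooth, hence far more regular than the \(C^{1,1}\) metrics considered here, so this only illustrates the identity and proves nothing about the Lipschitz case. The helper names (`squared_distance`, `position_vector`, `directional_derivative`) and the sample points are illustrative assumptions.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def normalize(a):
    n = math.sqrt(dot(a, a))
    return [x / n for x in a]

def squared_distance(p, q):
    """D_p^2(q) on the unit sphere: squared great-circle distance."""
    return math.acos(max(-1.0, min(1.0, dot(p, q)))) ** 2

def position_vector(p, q):
    """P(p,q): velocity at t=1 of the geodesic x:[0,1]->S^2 from p to q.

    Valid for q different from p and from -p.
    """
    theta = math.acos(max(-1.0, min(1.0, dot(p, q))))
    # Unit initial direction of the geodesic at p.
    u = [(qi - math.cos(theta) * pi) / math.sin(theta) for pi, qi in zip(p, q)]
    return [theta * (-math.sin(theta) * pi + math.cos(theta) * ui)
            for pi, ui in zip(p, u)]

def directional_derivative(p, q, w, h=1e-5):
    """Central difference of D_p^2 at q along the tangent vector w."""
    plus = normalize([qi + h * wi for qi, wi in zip(q, w)])
    minus = normalize([qi - h * wi for qi, wi in zip(q, w)])
    return (squared_distance(p, plus) - squared_distance(p, minus)) / (2 * h)

p = [0.0, 0.0, 1.0]
q = [math.sin(0.7), 0.0, math.cos(0.7)]
w0 = [0.3, -0.5, 0.0]
# Project w0 onto the tangent plane of the sphere at q.
w = [wi - dot(w0, q) * qi for wi, qi in zip(w0, q)]

lhs = directional_derivative(p, q, w)        # dD_p^2(q)(w), numerically
rhs = 2.0 * dot(position_vector(p, q), w)    # 2 g(P(p,q), w)
gauss_error = abs(lhs - rhs)
```

Here `gauss_error` measures the discrepancy between the two sides of Eq. (13) for one tangent vector; it should be of the order of the finite-difference error.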

Observe that \(D^2_p\) can be negative if \(g\) is not positive definite. If \(g\) is positive definite, as we shall prove in a moment in Theorem  6, \(D_p(q)\) coincides with the Finsler distance between \(p\) and \(q\) in the space \((N,g|_N)\) and hence coincides with the Finsler distance on \(M\) provided the ball of radius \(D_p(q)\) centered at \(p\) is contained in \(N\).

Remark 7

It is somewhat surprising that \(D^2_p\) is \(C^{1,1}\). One would expect it to be just Lipschitz because from its definition, that is Eq. (12), we see that it is built from the inverse of the exponential map, which is Lipschitz. The additional degree of differentiability comes from the fact that we can check that the differential is given almost everywhere by Eq. (13). This result can then be extended everywhere thanks to the Lipschitzness of \(D^2_p\) and the continuity of \(P(p,q)\) (see Theorem 16).

Remark 8

The proof of this result can be easily generalized to show that for \((p,q)\) belonging to a reversible convex normal set, \(D^2_p(q)\) is \(C^{1,1}\) in \((p,q)\) and its differential is

$$\begin{aligned} d D^2_p(q) (v_p,v_q)= 2 g_{P(p,q)} (P(p,q),v_q)+2 g_{\tilde{P}(q,p)} (\tilde{P}(q,p),v_p), \end{aligned}$$

where \(v_p\in T_pM\), \(v_q\in T_qM\), and \(\tilde{P}\) is the position vector map according to the reverse spray. Furthermore, with Prop. 3 we shall prove that \(P(p,q)\) is strongly differentiable on the diagonal of \(M\times M\) hence \(D^2_p(q)\) has first differential which is strongly differentiable at the origin.

We shall state the next result for pseudo-Finsler manifolds for which the fundamental tensor is either positive definite (Finsler geometry) or of signature \((-,+,\cdots ,+)\) (Lorentzian-Finsler geometry). It is necessary to elaborate the last structure into the notion of Finsler spacetime which extends the usual notion of spacetime met in mathematical relativity.

Let us start from a Lorentzian-Finsler manifold \((M,g)\), and let us keep in mind that if the fundamental tensor \(g_v\) does not depend on the velocity then we are back to a Lorentzian manifold [8]. Non-vanishing vectors are called spacelike, lightlike or timelike depending on the sign of \(g_v(v,v)\), namely positive, zero or negative, and the terminology extends to \(C^1\) curves provided the causality type of the tangent vector is consistent throughout the curve (which is assured for geodesics since in that case \(g_v(v,v)\) is constant over the curve). A vector is causal if it is timelike or lightlike, null if it is lightlike or zero, and non-spacelike if it is causal or zero. At any \(x\in M\) let us denote with \(I_x\subset T_xM\) the subset of timelike vectors, with \(J_x\) the subset of non-spacelike vectors and with \(E_x\) the subset of null vectors.

Beem and Perlick [6, 56] have shown that each component of \(I_x\) is convex, and hence, by continuity, that each component of \(J_x\backslash \{0\}\) is convex. Since \(\partial L(x,v)/\partial v\ne 0\) for \(v\ne 0\), the hypersurfaces \(g_v(v,v)=cnst\) are imbedded submanifolds and hence each component of \(E_x\backslash \{0\}\) plus \(\{0\}\) is the boundary of some component of \(I_x\). Analogously, each component of \(J_x\backslash \{0\}\) plus \(\{0\}\) is the closure of some component of \(I_x\). Furthermore, again because \(\partial L(x,v)/\partial v\ne 0\) for \(v\ne 0\), distinct components of \(J_x\backslash \{0\}\) do not intersect.

For simplicity we shall restrict our analysis to

Definition 7

Finsler spacetimes are pseudo-Finsler manifolds \((M,g)\) for which (a) \(g\) has signature \((-,+,\cdots , +)\), (b) for one (and hence every) \(x\in M\), \(I_x\) has just 2 components, (c) there exists a global continuous timelike vector field which defines a notion of future cone.

Condition (c) can be accomplished by passing to a double covering, while (b) is assured under reversibility if the spacetime has dimension larger than two [47].

Given the time orientation, causal vectors are either future or past, and so the regular \(C^1\) causal curves are either past directed or future directed. The \(C^1\) causal curves which we shall consider will be future directed.

Remark 9

Observe that we do not assume that \(g\) is reversible. Thus we have essentially two different distributions of light cones on spacetime and hence two causality theories. In what follows, by mentioning only future directed timelike curves we restrict ourselves to one of these theories.

Two points \(x,y\in M\) are said to be chronologically related in a set \(S\subset M\), this being denoted \(y\in I^+_S(x)\), \((x,y)\in I^+_S\) or \(x\ll _S y\), if there is a future directed \(C^1\) timelike curve from \(x\) to \(y\) contained in \(S\). Two points are said to be causally related, this being denoted \(y\in J^+_S(x)\), \((x,y)\in J^+_S\) or \(x \le _S y\), if there is a future directed \(C^1\) causal curve from \(x\) to \(y\) contained in \(S\) or \(x=y\in S\). We write \(x<_Sy\) if \(x\le _S y\) but \(x\ne y\). If \(S=M\) then we write simply \(\ll \), \(\le \) and \(<\).

A curve \(\sigma :[a,b]\rightarrow M\) will be called absolutely continuous (an AC-curve for short) if its components in one (and hence every) local chart are locally absolutely continuous. Equivalently, having introduced a complete Riemannian metric on \(M\), and denoting with \(\rho \) the corresponding distance, \(\sigma \) is absolutely continuous if it satisfies locally the usual definition of absolute continuity between (topological) metric spaces. Since every pair of Riemannian metrics over a compact set is Lipschitz equivalent, and \(M\) is locally compact, this definition does not depend on the metric chosen. Analogously, we can define the concept of Lipschitz curve.

We shall say that an AC-curve \(\sigma :[a,b]\rightarrow M\), \(t\mapsto \sigma (t)\), is future directed causal if \(\dot{\sigma }\) is future directed causal almost everywhere. We do not need to define a notion of absolutely continuous timelike curve.

The Lorentzian–Finsler length of a causal AC-curve is

$$\begin{aligned} l[\sigma ]=\int _a^b\, \sqrt{-g_{\dot{\sigma }}(\dot{\sigma },\dot{\sigma })} \,\, \mathrm{d}t , \end{aligned}$$

and it is finite because the integrand belongs to \(L^1([a,b])\) as in coordinates we have \(\dot{\sigma }^\mu \in L^1([a,b])\), \(\sqrt{|\dot{\sigma }^\mu |} \in L^2([a,b])\).

In the Finsler case the concept of Finsler length is defined analogously but with a plus sign inside the square root. The Finsler distance from \(p\) to \(q\) is the infimum of the Finsler lengths of the \(C^1\) curves connecting \(p\) to \(q\). It is symmetric and hence a true distance whenever the Finsler structure is reversible. We can also define a Lorentzian-Finsler distance between \(p\) and \(q\) with \(p\le q\) as the supremum of the Lorentzian-Finsler lengths of the causal AC-curves connecting \(p\) to \(q\). As in Lorentzian geometry, it satisfies a reverse triangle inequality [8].

Theorem 6

Let \((M,g)\) be a \(C^{2,1}\) pseudo-Finsler manifold for which \(g\) is \(C^{1,1}\) (Sect. 1.1). Let \(N\) be a normal neighborhood of \(p\in M\) and suppose that \(g\) is

  • Finsler:

    • Let \(\sigma :[0,1]\rightarrow N\), \(s\mapsto \sigma (s)\), be any AC-curve starting from \(p\), then its length is larger than that of the (unique) geodesic connecting its endpoints, unless its image coincides with that of that geodesic. In this last case the Finsler distance from \(p\) provides an affine parameter \(r(s)\) where the dependence on \(s\) is absolutely continuous and increasing.

  • Lorentzian–Finsler:

    • Let \(\sigma :[0,1]\rightarrow N\) be any future directed causal AC-curve starting from \(p\), then \(\exp ^{-1}(\sigma (s))\) is future directed causal for every \(s>0\), and if \(\exp ^{-1}(\sigma (\hat{s}))\) is lightlike then \(\sigma |_{[0,\hat{s}]}\) coincides with a future directed lightlike geodesic segment up to parametrizations. Finally, the Lorentzian–Finsler length of \(\sigma \) is smaller than that of the (unique) future directed causal geodesic connecting its endpoints, unless its image coincides with that of that geodesic. In this last case the affine parameter of the geodesic is absolutely continuous and increasing with \(s\).
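The Lorentzian–Finsler maximization statement can be illustrated in the simplest possible setting, 1+1 Minkowski spacetime with \(g=-\mathrm{d}t^2+\mathrm{d}x^2\), where geodesics are straight lines and piecewise-linear causal curves are AC-curves. The sketch below is only an illustration under these assumptions; the function name `lorentzian_length` and the sample curves are hypothetical.

```python
import math

def lorentzian_length(points):
    """Lorentzian length of a piecewise-linear future directed causal curve
    in 1+1 Minkowski spacetime, g = -dt^2 + dx^2.

    points is a list of (t, x) vertices.
    """
    total = 0.0
    for (t0, x0), (t1, x1) in zip(points, points[1:]):
        dt, dx = t1 - t0, x1 - x0
        if dt <= 0 or abs(dx) > dt:
            raise ValueError("segment is not future directed causal")
        total += math.sqrt(dt * dt - dx * dx)
    return total

# Three causal curves from (0, 0) to (2, 0).
geodesic = [(0.0, 0.0), (2.0, 0.0)]                  # timelike geodesic
detour = [(0.0, 0.0), (1.0, 0.6), (2.0, 0.0)]        # broken timelike curve
lightlike_zigzag = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]

lengths = [lorentzian_length(c) for c in (geodesic, detour, lightlike_zigzag)]
```

In this example the geodesic has the largest length, and any causal competitor with the same endpoints but a different image is strictly shorter, in agreement with the theorem.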

Remark 10

Physically the Lorentzian–Finsler version proves that a motion which is almost everywhere slower than light is also locally slower than light. This is the main result which allows us to develop causality theory for Lipschitz connections in Finsler spacetimes. In particular, \(I^+_N(p)\) (or \(J^+_N(p)\)) coincides with the exponential map-image in \(N\) of the future directed timelike (resp. causal) cone at \(p\).

Remark 11

It is natural to ask whether locally a spacelike geodesic segment minimizes the functional \(\int \sqrt{g_{\dot{\sigma }}(\dot{\sigma },\dot{\sigma })} \,\mathrm{d}t\) over the \(C^1\) spacelike curves connecting the same endpoints. The answer is negative already in 1+1 Minkowski spacetime: just consider almost lightlike zig-zag curves which approximate the geodesic. Their presence shows that the infimum of the functional vanishes.
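The zig-zag argument can be made quantitative with a short computation. The sketch below evaluates the functional on piecewise-linear zig-zags joining \((t,x)=(0,0)\) to \((0,1)\) in 1+1 Minkowski spacetime; it assumes that the corners can be rounded off to obtain \(C^1\) curves with an arbitrarily small change of the functional. The function names and parameters are illustrative.

```python
import math

def zigzag_spacelike_length(slope, teeth):
    """Value of the functional int sqrt(g(s', s')) for a zig-zag from
    (t, x) = (0, 0) to (0, 1) in 1+1 Minkowski spacetime (g = -dt^2 + dx^2)
    whose segments satisfy dt/dx = +slope or -slope.
    """
    if not 0 <= slope < 1:
        raise ValueError("segments must be spacelike: |dt/dx| < 1")
    dx = 1.0 / (2 * teeth)      # each tooth has one rising and one falling segment
    dt = slope * dx
    return 2 * teeth * math.sqrt(dx * dx - dt * dt)

def zigzag_max_height(slope, teeth):
    """Largest value of |t| along the zig-zag (distance from the geodesic t = 0)."""
    return slope / (2 * teeth)

geodesic_value = zigzag_spacelike_length(0.0, 1)         # the straight segment
almost_lightlike = zigzag_spacelike_length(0.9999, 1000)
height = zigzag_max_height(0.9999, 1000)
```

The value of the functional equals \(\sqrt{1-\text{slope}^2}\) regardless of the number of teeth, so it tends to zero as the slope tends to one, while increasing the number of teeth keeps the curve uniformly close to the geodesic.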

1.4 Some applications to mathematical relativity

We recall that according to Hawking and Ellis [34] a future directed continuous causal curve \(x:[a,b]\rightarrow M\) is a continuous curve such that for every open convex normal set \(C\) intersecting \(x\), whenever \(x([t_1,t_2])\subset C\), \(t_1<t_2\), the points \(x(t_1)\) and \(x(t_2)\) are connected by a future directed causal geodesic contained in \(C\). This definition can be imported word for word to the realm of Finsler spacetimes.

An interesting consequence of Theorem 6 is Theorem 7 which will provide a kind of converse of the well known fact that continuous causal curves are Lipschitz when parametrized with respect to the arc-length of a Riemannian metric [53] (due to the light cones that bound the curve). Known proofs in Lorentzian geometry [11] work under stronger differentiability assumptions which guarantee the validity of the usual Gauss’ Lemma.

For its proof we shall need a lemma (a Lorentzian version is [48, Lemma 2.13]). Given two Lorentzian-Finsler metrics \(g_1\) and \(g_2\) we write \(g_1<g_2\) if the causal vectors for \(g_1\) are timelike for \(g_2\).

Lemma 1

Let \((M,g)\) be a Finsler spacetime and let \(p\in M\). Then we can find in a neighborhood \(O\) of \(p\) a flat Lorentzian metric (hence independent of \(v\)) \(g^+\), such that \(g<g^+\).

Proof

Let \(\{x^\mu \}\) be a coordinate chart in a neighborhood \(O\) of \(p\in M\). Let \(h=(\mathrm{d}x^0)^2+(\mathrm{d}x^1)^2+\cdots +(\mathrm{d}x^{n-1})^2\) be the usual Euclidean metric and let us consider the corresponding unit sphere subbundle of \(TM\). Let \(\hat{J}_x\), \(x\in O\), be the intersection of \(J_x\) with the unit sphere at \(x\). Since the components of \(J_x\backslash \{0\}\) are convex in the linear structure of \(T_xM\), \(\hat{J}_x\) is made of two closed disjoint convex sets of the unit sphere at \(T_xM\). We can always find a great circle separating the two convex sets (note that the sphere has dimension \(n-1\), the circle has dimension \(n-2\), thus the terminology used is not accurate for \(n\ne 3\)) (to prove this use for instance the stereographic projection from a point not belonging to the convex sets, use the Hahn-Banach theorem, and then project back to the sphere) and we can also rotate the coordinate system so that the hyperplane on \(T_pM\) spanned by that great circle is orthogonal to \(\partial _0\). By continuity the Lorentzian metric at \(p\), \(g^+=-N (\mathrm{d}x^0)^2+(\mathrm{d}x^1)^2+\cdots +(\mathrm{d}x^{n-1})^2\), satisfies \(g<g^+\) for sufficiently large \(N\). Still by continuity of \(J_x\) on \(x\) (this continuity can be rigorously expressed in the Hausdorff metric on sets, but the details will not be needed) the relation \(g<g^+\) holds in a neighborhood of \(p\) which we can redefine to be \(O\).\(\square \)

Theorem 7

Let \((M,g)\) be a Finsler spacetime where \(M\) is a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) fundamental field \(g\). Let \(I\) be an interval of the real line. Every future directed causal AC-curve \(x:I\rightarrow M\) is a future directed continuous causal curve. Every future directed continuous causal curve \(x:I\rightarrow M\) once suitably parametrized (e.g. with respect to the arc-length of a Riemannian metric) becomes a future directed causal locally Lipschitz curve.

Remark 12

It is not true that every continuous causal curve is a causal AC-curve. For instance, consider the timelike geodesic of Minkowski spacetime which satisfies \(\varvec{x}=0\) and which is parametrized by \(x^0\). Consider the parametrization \(t=f_s^{-1}(x^0)\) where \(f_s\) is a singular monotone continuous function [57, Ex. 8.20], so that \(\dot{f}_s=0\) almost everywhere.

Proof

It is sufficient to prove it for \(I=[a,b]\). Suppose that \(x\) is a future directed causal AC-curve, let \(C\) be a convex normal set intersecting \(x\), and let \(t_1<t_2\) be such that \(x([t_1,t_2]) \subset C\). The set \(C\) is a normal neighborhood for \(p:=x(t_1)\) thus by Theorem 6 the geodesic connecting \(x(t_1)\) and \(x(t_2)\) is (future directed) causal.

Conversely, suppose that \(x:I\rightarrow M\) is a future directed continuous causal curve and let \(\bar{t}\in I\). By Lemma 1 we can find \(C^{2,1}\) coordinates \(x^\mu \) in a convex neighborhood \(C\) of \(p:=x(\bar{t})\) such that for some \(N>0\), the Lorentzian metric \(g^+=-N (\mathrm{d}x^0)^2+ (\mathrm{d}x^1)^2+\cdots +(\mathrm{d}x^{n-1})^2\) satisfies \(g<g^+\) throughout \(C\).

The function \(x^0(t)\) must be increasing in a neighborhood of \(\bar{t}\). Indeed, for \(t_1,t_2\) belonging to a sufficiently small neighborhood of \(\bar{t}\), \(x(t_1), x(t_2)\in C\). By Theorem 6 there is a future directed causal \(g\)-geodesic connecting \(x(t_1)\) with \(x(t_2)\), which is in particular a future directed \(g^+\)-causal \(C^1\) curve. But \(x^0\) is increasing over this type of curve since \(x^0\) is the usual time coordinate for the subset \((C,g^+)\) of Minkowski spacetime, which proves the claim.

Once parametrized with respect to \(x^0\) the curve becomes Lipschitz because of the condition of \(g\)-causality which implies \(g^+\)-causality which implies \(\Vert \varvec{x}(t_2)-\varvec{x}(t_1)\Vert \le \sqrt{N}\, | x^0(t_2)-x^0(t_1)|\). Clearly, if \(l\) is an arc-length parameter induced by the Euclidean coordinate metric \((\mathrm{d}x^0)^2+(\mathrm{d}\varvec{x})^2\) then \(x^0(l)\) is 1-Lipschitz, and so \(x(l)\) is locally Lipschitz. As all Riemannian metrics are Lipschitz equivalent over compact sets, \(x\) is locally Lipschitz whenever parametrized with respect to Riemannian arc-length.\(\square \)
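The two ingredients of the proof, the wider cones of \(g^+\) and the resulting Lipschitz bound, can be checked on a toy example. The sketch below takes for \(g\) the Minkowski metric in 1+2 dimensions (an assumption made only for illustration) and \(g^+\) with \(N=2\). For a \(g^+\)-causal vector the condition \(-N (v^0)^2+\Vert \varvec{v}\Vert ^2\le 0\) gives \(\Vert \varvec{v}\Vert \le \sqrt{N}\,|v^0|\), which is also bounded by \(N|v^0|\) whenever \(N\ge 1\).

```python
import math

def g_plus(N, v):
    """The flat metric -N (dx^0)^2 + (dx^1)^2 + ... evaluated on (v, v)."""
    return -N * v[0] ** 2 + sum(c * c for c in v[1:])

def minkowski(v):
    return g_plus(1.0, v)

N = 2.0
# Future directed Minkowski-causal vectors in 1+2 dimensions,
# sampled with v^0 / |spatial part| equal to 1 (lightlike), 1.5 and 3.
samples = []
for k in range(12):
    angle = 2 * math.pi * k / 12
    for ratio in (1.0, 1.5, 3.0):
        samples.append([ratio, math.cos(angle), math.sin(angle)])

# g < g^+: every g-causal sample is g^+-timelike.
wider_cone = all(minkowski(v) <= 1e-12 and g_plus(N, v) < 0 for v in samples)
# Pointwise version of the Lipschitz bound used in the proof.
lipschitz_bound = all(
    math.hypot(v[1], v[2]) <= math.sqrt(N) * v[0] for v in samples
)
```

Integrating the pointwise bound along a causal curve parametrized by \(x^0\) gives the Lipschitz estimate on the spatial components.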

Continuous causal curves in Lorentzian geometry enjoy nice properties under various notions of limit [45]. The proofs presented in reference [45] hold as well under our present weaker differentiability assumptions and in the Finsler spacetime case. Analogously, as should be expected from the above equivalence, the family of absolutely continuous curves is closed under uniform convergence, a fact quite well known since the work of Tonelli on one-dimensional variational principles [10].

The theorems so far proved in the Lorentzian–Finsler case are sufficient to establish the validity of most results of mathematical relativity and especially of causality theory for \(C^{1,1}\) metrics, even on Finsler rather than just Lorentzian spacetimes. In particular, the notions of chronological and causal relations do not require modifications from the standard ones [34]. The chronological relation is still open, the boundaries of the causal and chronological futures of a point coincide, the achronal boundaries are still Lipschitz hypersurfaces and so on.

We wish to include a result of this type to show that most proofs can be extended word for word from the Lorentzian \(C^3\) metric case to the Lorentzian–Finslerian \(C^{1,1}\) metric case. Let us recall that a set is achronal if there is no timelike curve starting and ending at the set.

Lemma 2

Let \((M,g)\) be a Finsler spacetime where \(M\) is a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) fundamental field \(g\). Let \(p<q\). Then either \((p,q)\in I^{+}\) or every continuous causal curve connecting \(p\) to \(q\) is an achronal future directed lightlike geodesic (up to parametrizations).

Proof

Assume \((p,q)\notin I^{+}\) and let \(\gamma : [0,1]\rightarrow M\) be a future directed continuous causal curve such that \(\gamma (0)=p\) and \(\gamma (1)=q\). Since the image of \(\gamma \) is compact there is a finite covering with convex normal neighborhoods \(U_i\), \(i=1,\ldots ,n\). We can assume \(U_i\cap U_{i+1}\ne \emptyset \) and that there are \(p_i \in \gamma \cap U_i\cap U_{i+1}\), \(i=1,\ldots ,n-1\), \(p_0\equiv p\in U_1\) and \(p_n\equiv q\in U_n\). Since \(\gamma \) is a continuous causal curve by Theorem 6 \((p_i,p_{i+1}) \in J^{+}_{U_{i+1}}\) thus \(p_i\) and \(p_{i+1}\) are joined by a geodesic \(\eta _i\) in \(U_{i+1}\) and this geodesic coincides with \(\gamma \) between the same points or it is timelike.

Let us show that the presence of one timelike segment \(\eta _i\) implies \((p,q)\in I^{+}\). This is so because from the curve made of geodesic segments \(\eta _i\) one can construct a piecewise geodesic curve made of timelike segments. Indeed, one starts from \(\eta _i\) and translates slightly the final point of \(\eta _{i-1}\) along \(\eta _i\) so that the new connecting \(\eta '_{i-1}\) becomes timelike (as the Lorentzian–Finsler distance between the new endpoints is necessarily positive). Analogously, one translates slightly the starting point of \(\eta _{i+1}\) along \(\eta _i\) so that the new connecting \(\eta '_{i+1}\) becomes timelike. Then one continues in this way by taking as reference the timelike geodesic segments \(\eta '_{i-1}\) or \(\eta '_{i+1}\). The corners of the so obtained piecewise timelike curve can be finally smoothed out.

Also note that if all the segments \(\eta _i\) are lightlike but do not join smoothly then one can, arguing as above, replace one lightlike segment with one timelike segment by moving slightly the starting endpoint along the previous segment.

In conclusion if \((p,q)\notin I^{+}\) the continuous causal curve must be coincident with a lightlike geodesic connecting \(p\) to \(q\). This geodesic must be achronal, otherwise there is a timelike curve \(\sigma \) connecting \(p',q'\in \gamma \). The continuous causal curve connecting \(p\) to \(p'\) following \(\gamma \) and \(p'\) to \(q'\) following \(\sigma \) and \(q'\) to \(q\) following \(\gamma \) is, by the just proved result, a lightlike geodesic which is impossible since \(\sigma \) is timelike. The contradiction proves that \(\gamma \) is achronal.\(\square \)

Under the assumptions of the previous lemma we have:

Corollary 1

If \(p\ll r\) and \(r \le q\) then \(p \ll q\). If \(p\le r\) and \(r\ll q\) then \(p\ll q\).

Proof

It follows from the fact that the composition of a timelike and a causal curve, in whatever order, gives a causal curve which is not a lightlike geodesic as at some points it is timelike.\(\square \)
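In Minkowski spacetime the relations \(\ll \) and \(\le \) reduce to conditions on the difference vector, so the corollary can be checked directly on examples. The following sketch assumes the flat metric \(-\mathrm{d}t^2+\mathrm{d}\varvec{x}^2\) on \(\mathbb {R}^n\); the function names `chronological` and `causal` are illustrative.

```python
def chronological(x, y):
    """x << y in Minkowski spacetime: y - x is future directed timelike."""
    dt = y[0] - x[0]
    ds2 = sum((b - a) ** 2 for a, b in zip(x[1:], y[1:]))
    return dt > 0 and dt * dt > ds2

def causal(x, y):
    """x <= y: y - x is future directed causal, or x == y."""
    dt = y[0] - x[0]
    ds2 = sum((b - a) ** 2 for a, b in zip(x[1:], y[1:]))
    return dt >= 0 and dt * dt >= ds2

p = (0.0, 0.0)
r = (2.0, 1.0)   # p << r: timelike separation
q = (3.0, 2.0)   # r <= q along a lightlike segment, but not r << q

first_claim = chronological(p, r) and causal(r, q) and chronological(p, q)

p2 = (0.0, 0.0)
r2 = (1.0, -1.0)  # p2 <= r2 along a lightlike segment
q2 = (3.0, -0.5)  # r2 << q2

second_claim = causal(p2, r2) and chronological(r2, q2) and chronological(p2, q2)
```

In both cases the concatenation of a timelike and a lightlike segment joins points which are chronologically related, as the corollary predicts.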

The question as to whether causality theory could be mindlessly generalized to \(C^{1,1}\) spacetime metrics was considered by Chruściel and Grant [16] among others. They developed a generalization of causality theory to continuous metrics and found that classical results involving the existence of normal neighborhoods cannot be proved in their framework. They observed:

We note that several statements in (causality theory) concerning geodesics remain true for \(C^{1,1}\) metrics; it is conceivable that all of them remain true, but justifications would be needed.

We provided arguments which prove the correctness of this expectation. Indeed, the existence of convex normal neighborhoods is the central technical tool which allows one to complete many local arguments, such as that on the openness of the chronology relation, over which causality theory is based. Once the existence of convex normal neighborhoods has been established, and the local maximization property of causal geodesics has been obtained, most (if not all) results of causality theory follow without any substantial alteration to their classical proofs. Of course those results that can be expressed only through the use of the second derivative of the metric, for instance because they use the curvature tensor, would require further discussion (especially in the Finsler case).

Working with continuous metrics as in [16] expands very much causality theory though there is a price to be paid. Some desirable results do not hold anymore, for instance lightlike geodesics are not necessarily locally achronal (for Lipschitz connections this result is guaranteed by Theorem 6). For other differences the reader is referred to [16].

1.5 Distance balls are convex

If the spray is the Levi-Civita connection of a Riemannian metric \(g\) it is natural to ask whether the convex neighborhood can be chosen to be a distance ball, that is, if for sufficiently small \(\delta \), \(D_p^{-1}([0,\delta ))\) is convex normal. Whitehead gave a positive answer to this problem through a proof which demands quite strong differentiability properties on \(D_p\) and hence on the metric. He reasons that it is sufficient to introduce normal coordinates \((y^1,\ldots , y^n)\) on \(N\), because in this way the distance balls become coincident with the coordinate balls (we proved the existence of convex normal neighborhoods by constructing them as coordinate balls). Unfortunately, we can apply our argument which shows the convexity of coordinate balls only for charts which are \(C^{2,1}\), thus the exponential map would have to be \(C^{2,1}\) and hence the metric would have to be \(C^{3,1}\) (and the manifold \(C^{4,1}\)). A different approach [58, 61] demands just the twice continuous differentiability of \(D_p^2\), but under our assumptions \(D_p^2\) turns out to be just \(C^{1,1}\). Finally, one could invoke Morse’s Lemma so as to find a coordinate system in which the level sets of \(D_p^2\) are coordinate spheres [43]. Unfortunately, this lemma applies only if \(D_p^2\) is \(C^2\).

Nevertheless, we shall prove that \(D_p^2\) is strongly convex using a Picard–Lindelöf analysis.

Let us recall [23] that a real function defined on an open convex set \(C\) of an affine space \(A\) is strongly convex with constant \(\lambda \) if there is a \(\lambda >0\) such that for every \(x,y \in C\), \(\alpha \in [0,1]\),

$$\begin{aligned} f(\alpha x+(1-\alpha ) y)\le \alpha f(x)+(1-\alpha ) f(y)-\frac{\lambda }{2}\, \alpha (1-\alpha ) \Vert x-y\Vert ^2. \end{aligned}$$

A \(C^1\) real function is strongly convex on \(C\) with constant \(\lambda >0\) if and only if its differential \(\mathrm{d}f\) is strongly monotone with constant \(\lambda \), that is, for every \(x,y \in C\),

$$\begin{aligned}{}[\mathrm{d}f (y)-\mathrm{d}f(x)]\cdot (y-x)\ge \lambda \Vert y-x\Vert ^2. \end{aligned}$$
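The model example is the squared Euclidean norm \(f(x)=\Vert x\Vert ^2\), for which both inequalities hold with \(\lambda =2\) and are in fact equalities. The sketch below checks this numerically at sample points; the helper names `strong_convexity_gap` and `monotonicity_gap` are illustrative and not taken from the references.

```python
def sq_norm(v):
    return sum(c * c for c in v)

def strong_convexity_gap(f, x, y, alpha, lam):
    """Right-hand side minus left-hand side of the strong convexity inequality."""
    z = [alpha * a + (1 - alpha) * b for a, b in zip(x, y)]
    diff = [a - b for a, b in zip(x, y)]
    rhs = (alpha * f(x) + (1 - alpha) * f(y)
           - 0.5 * lam * alpha * (1 - alpha) * sq_norm(diff))
    return rhs - f(z)

def monotonicity_gap(grad, x, y, lam):
    """[df(y) - df(x)](y - x) - lam * |y - x|^2."""
    d = [b - a for a, b in zip(x, y)]
    gx, gy = grad(x), grad(y)
    return sum((b - a) * c for a, b, c in zip(gx, gy, d)) - lam * sq_norm(d)

def grad_sq_norm(v):
    return [2 * c for c in v]

x, y, alpha = [1.0, 2.0], [-0.5, 3.0], 0.3

gap_sharp = strong_convexity_gap(sq_norm, x, y, alpha, 2.0)   # vanishes: equality
gap_slack = strong_convexity_gap(sq_norm, x, y, alpha, 1.9)   # positive
mono_sharp = monotonicity_gap(grad_sq_norm, x, y, 2.0)        # vanishes: equality
```

Both gaps are non-negative exactly when the corresponding inequality holds. For the squared norm they vanish at \(\lambda =2\) and become strictly positive for any smaller constant, which is the Euclidean prototype of the constant \(2-\epsilon \) found for \(D^2_q\) below.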

In a Riemannian space we have analogous definitions and results [60, Prop. 16.2]. A real function defined on an open geodesically convex set \(C\) is geodesically strongly convex with constant \(\lambda \) if there is \(\lambda >0\) such that for every geodesic \(x: J\rightarrow C\), \(a,b\in J\), \(\alpha \in [0,1]\),

$$\begin{aligned} f( x((1-\alpha )a+\alpha b))\le (1-\alpha ) f(x(a))+\alpha f(x(b))-\frac{\lambda }{2} \,\alpha (1-\alpha ) D(x(a),x(b))^2. \end{aligned}$$

A \(C^1\) real function is geodesically strongly convex on \(C\) with constant \(\lambda >0\) if and only if its differential \(\mathrm{d}f\) is geodesically strongly monotone with constant \(\lambda \), that is, for every arc-length parametrized geodesic \(x: J\rightarrow C\),

$$\begin{aligned} \mathrm{d}f(\dot{x}(b))- \mathrm{d}f(\dot{x}(a)) \ge \lambda D(x(a),x(b)). \end{aligned}$$

The proof of the equivalence is obtained applying the result for the affine space case to the composition \(f(x(t))\).

We shall prove

Theorem 8

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) Riemannian metric \(g\) and corresponding locally Lipschitz Levi-Civita connection. Let \(p\in M\) and let \(\epsilon >0\). Let \(x^\mu :U\rightarrow \mathbb {R}^n\) be a chart in a neighborhood of \(p\) such that, \(g_{\alpha \beta }(p)=\delta _{\alpha \beta }\), \(\varGamma ^\mu _{\alpha \beta }(p)=0\). Let \(C\subset U\) be a coordinate ball around \(p\) which we already know to be convex normal for sufficiently small radius. We also have that for sufficiently small radius for every \(q, q_1,q_2\in C\), interpreting the minus sign and scalar product through the Euclidean structure induced by the coordinate system on \(C\)

$$\begin{aligned} | [d D^2_q(q_2)- d D^2_q(q_1)] (q_2-q_1)-2(q_2-q_1)^2 | \le \epsilon (q_2-q_1)^2, \end{aligned}$$
(14)

and for every \(q\in C\) and arc-length parametrized geodesic \(x:J\rightarrow C\)

$$\begin{aligned} | \nabla _{\dot{x}(b)} D^2_q(x(b)) - \nabla _{\dot{x}(a)} D^2_q(x(a)) -2 D(x(a),x(b)) | \le \epsilon D(x(a),x(b)). \end{aligned}$$
(15)

In particular, \(D^2_q:C\rightarrow [0,+\infty )\) is strongly convex with parameter \(\lambda =2-\epsilon \) with respect to both the Euclidean

$$\begin{aligned} D(q, (1-\alpha ) q_1+\alpha q_2)^2&\le (1-\alpha ) D(q,q_1)^2+\alpha D(q,q_2)^2\\&-\left( 1-\frac{\epsilon }{2}\right) \,\alpha (1-\alpha ) \Vert q_2-q_1\Vert ^2, \end{aligned}$$

and the Riemannian structures of \(C\)

$$\begin{aligned} D(q, x((1-\alpha ) a+\alpha b))^2\le&\, (1-\alpha ) D(q,x(a))^2+\alpha D(q,x(b))^2\\&-\left( 1-\frac{\epsilon }{2}\right) \,\alpha (1-\alpha ) D(x(a),x(b))^2. \end{aligned}$$

Thus for any sufficiently small \(r\) the open balls \(D_p^{-1}([0,r))\) are contained in \(C\) and are (strictly) convex normal neighborhoods.

As the balls \(D_p^{-1}([0,r))\) are convex we can infer a number of equivalent convexity properties thanks to the equivalences for metric spaces proved in [25], for instance \(D_p\) is itself convex.

We have the following improvement of our previous formulation of Gauss’ Lemma (Theorem 5) which shows that the direct product sum for the metric can be accomplished on a sphere of a chosen radius \(\bar{r}\) and almost everywhere outside it.

Theorem 9

With the notation of the previous theorem, the level sets \(D_p^{-1}(r)\) are \(C^{1,1}\) hypersurfaces diffeomorphic to \(S^{n-1}\) with an induced Lipschitz metric \(h_r\). Locally on \(C\) we can always find \(C^{2,1}\) functions \(\theta _i\), \(i=1,\ldots , n-1\), such that in the \(C^{1,1}\) chart \((r,\theta _1,\ldots , \theta _{n-1})\) the metric takes the form

$$\begin{aligned} g=d r^2+(h_r)_{ij}( \mathrm{d}\theta _i-A_i \mathrm{d}r) ( \mathrm{d}\theta _j-A_j \mathrm{d}r), \end{aligned}$$
(16)

where \(h_r=(h_r)_{ij} \mathrm{d}\theta _i \mathrm{d}\theta _j\) and all the components \((h_r)_{ij}\), \(A_i\), are Lipschitz in \((r,\theta )\).

For any chosen sufficiently small radius \(\bar{r}\) new Lipschitz coordinates \((r,\alpha _1,\ldots ,\alpha _{n-1})\) can be found such that \(\{\alpha _i\}\) provide a \(C^{1,1}\) chart on \(D_p^{-1}(\bar{r})\) and the metric takes the direct sum form

$$\begin{aligned} g=\mathrm{d}r^2+(h_r(\alpha ))_{ij}\,\mathrm{d}\alpha ^i \mathrm{d}\alpha ^j, \end{aligned}$$

where the components \((h_r(\alpha ))_{ij}\) are defined almost everywhere and are bounded (this holds also for spherical normal coordinates) and for \(r=\bar{r}\) they are defined everywhere and are Lipschitz in \(\alpha \).

1.6 Convexity on Lorentzian manifolds

In this section we study the problem of the existence of convex/concave functions and sets in Lorentzian manifolds. The Lorentzian case is more involved than the Riemannian one but is essential for the understanding of Lorentzian manifolds under low differentiability conditions.

In the next theorem \(\eta _{\alpha \beta }\) is the usual Minkowski metric in diagonal form (\(\eta _{00}=-1\), \(\eta _{ii}=+1\) for \(i\ge 1\)).

Theorem 10

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) Lorentzian metric \(g\) and corresponding Levi-Civita Lipschitz connection. Let \(p\in M\) and let \(\epsilon >0\). Let \(x^\mu :U\rightarrow \mathbb {R}^n\) be a chart in a neighborhood of \(p\) such that, \(g_{\alpha \beta }(p)=\eta _{\alpha \beta }\), \(\varGamma ^\mu _{\alpha \beta }(p)=0\). Let \(C\subset U\) be a coordinate ball around \(p\) which we already know to be convex normal for sufficiently small radius. We also have that for sufficiently small radius, for every geodesic \(x: [0,1]\rightarrow C\), and for every \(q\in C\)

$$\begin{aligned} \left| \frac{\mathrm{d}}{\mathrm{d}t}D^2_q(x(t))\Big |_{t=1}-\frac{\mathrm{d}}{\mathrm{d}t} D^2_q(x(t))\Big |_{t=0}-2D^2(x(0),x(1)) \right| \le \epsilon \Vert x(1)-x(0)\Vert ^2, \end{aligned}$$
(17)

where the Euclidean and affine structure of the coordinate chart is used just on the right-hand side.
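As a sanity check of Eq. (17), consider the flat case, which is an assumption made here only for illustration: \(M=\mathbb {R}^4\) with \(g=\eta \), so that geodesics are straight lines and \(D^2(q,x)=\eta (x-q,x-q)\). In this case the quantity inside the absolute value of (17) vanishes identically. The following Python sketch (all function names are hypothetical) checks this numerically and shows that \(D^2\) can indeed be negative.

```python
import numpy as np

# Minkowski metric in diagonal form, eta_00 = -1, eta_ii = +1 (flat case assumed).
ETA = np.diag([-1.0, 1.0, 1.0, 1.0])

def D2(q, x):
    """Squared 'distance' eta(x - q, x - q); negative for timelike separation."""
    d = np.asarray(x, dtype=float) - np.asarray(q, dtype=float)
    return float(d @ ETA @ d)

def ddt_D2(q, x0, x1, t, h=1e-4):
    """Central-difference derivative of t -> D2(q, x(t)) along the straight
    line x(t) = x0 + t (x1 - x0), which is a geodesic of the flat metric."""
    x0 = np.asarray(x0, dtype=float)
    w = np.asarray(x1, dtype=float) - x0
    return (D2(q, x0 + (t + h) * w) - D2(q, x0 + (t - h) * w)) / (2 * h)

def lhs_eq17(q, x0, x1):
    """Quantity inside the absolute value on the left-hand side of Eq. (17)."""
    return ddt_D2(q, x0, x1, 1.0) - ddt_D2(q, x0, x1, 0.0) - 2 * D2(x0, x1)
```

For instance, with \(x(0)=(0,0,0,0)\) and \(x(1)=(0.5,0.1,0,0)\) the endpoints are timelike separated, `D2` is negative, and `lhs_eq17` returns zero up to rounding for any choice of \(q\). In the curved \(C^{1,1}\) case the theorem replaces this exact identity with the \(\epsilon \)-bound.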

Observe that in the next theorem there is no mention of the Euclidean or affine structures induced by a coordinate chart (we stress once again that \(D^2_p\) can be negative).

Theorem 11

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) Lorentzian metric \(g\) and corresponding locally Lipschitz Levi-Civita connection. Let \(p\in M\) and let \(\epsilon >0\). Let \(\gamma : I\rightarrow M\), \(t\mapsto \gamma (t)\), be a timelike geodesic such that \(p=\gamma (0)\). The convex normal set \(C\ni p\) of Theorem 4 can be chosen so small that once \(I\) is redefined to be the connected component of \(\gamma ^{-1}(C)\) containing \(0\), the following property holds.

For every \(q,r\in \gamma (I)\), \(q\ne r\), there is a strictly convex normal set \(O\ni r\), \(\bar{O}\subset C\), such that all the points of \(\bar{O}\) are either in the chronological future or in the chronological past of \(q\); every geodesic \(x: J\rightarrow O\), connecting two points on the same level set \((D_q^2)^{-1}(c)\), \(c<0\), of the function \(D^2_q:C\times C \rightarrow \mathbb {R}\), satisfies, once reparametrized with respect to \(g\)-arc length (\(x\) is necessarily spacelike by Theorem 6, thus \(D(x(a),x(b))\ge 0\)), for every \(a,b\in J\),

$$\begin{aligned} | \nabla _{\dot{x}(a)} D^2_q(x(a)) - \nabla _{\dot{x}(b)} D^2_q(x(b))-2 D(x(a),x(b))| \le \epsilon D(x(a),x(b)). \end{aligned}$$
(18)

In particular, \(D^2_q(x(t))\) is strongly convex with parameter \(2-\epsilon \)

$$\begin{aligned} D_q^2(x((1-\alpha ) a +\alpha b))&\le (1-\alpha ) D_q^2(x(a))+\alpha D_q^2(x(b))\\&-\left( 1-\frac{\epsilon }{2}\right) \,\alpha (1-\alpha ) D(x(a),x(b))^2, \end{aligned}$$

and the sets of the form \((D^2_q)^{-1}((-\infty ,c))\cap O\) for \(c<0\) are strictly geodesically convex.

Remark 13

Mimicking the proof of [3, Prop. 3.1] it is easy to show that the Lorentzian distance on \(O\) from \(q\), \(D^L_q=\sqrt{-D^2_q}\), is semiconvex and hence almost everywhere once and twice differentiable. However, in order to prove the geodesic convexity of \((D^2_q)^{-1}((-\infty ,c))\cap O\) we need a result on the convexity of \(D^2_q\) rather than on the convexity of \(D^L_q\). Furthermore, we stress that the notion of semiconvexity, due to Rockafellar, is a rather weak notion. For instance the concave function \(-x^2\) on \(\mathbb {R}\) is semiconvex (thus the terminology semiconvexity can be misleading).

We recall that a subset \(A \subset M\) is causally convex in \(B\subset M\), with \(A\subset B\), if every \(C^1\) causal curve contained in \(B\) and joining two points of \(A\) is necessarily contained in \(A\). An open subset \(B\) is strongly causal if every point \(p\in B\) admits arbitrarily small open neighborhoods which are causally convex in \(B\). An open subset \(S\) of \(M\) is called causally simple if it is strongly causal and \(J^+_S\subset S\times S\) is closed in the product topology [34, 48]. A causally simple subset \(S\) is globally hyperbolic if for every \(p,q\in S\), \(J_S^{+}(p)\cap J^{-}_S(q)\) is compact.

The following claim was known under stronger differentiability assumptions. It can be found in a footnote of [48].

Theorem 12

Convex normal subsets are causally simple.

Proof

Let \(T\) be the global timelike vector field which provides the time orientation. Let \(C\) be a convex normal subset, and let \(f_1,f_2:C\times C \rightarrow \mathbb {R}\) be the functions \(f_1(p,q):=g(\exp _p^{-1}q, \exp _p^{-1}q)\), \(f_2(p,q):=g(\exp _p^{-1}q, T(p))\). Since \(\exp ^{-1}\) and \(g\) are continuous, \(f_1\) and \(f_2\) are continuous, and hence \(J^{+}_C=f_1^{-1}((-\infty ,0])\cap f_2^{-1}((-\infty ,0])\) is closed.

A spacetime \(C\) is strongly causal if and only if for every \(p,q \in C\), \((p,q)\in J_C^+\) and \((q,p)\in \overline{J_C^+}\) imply \(p=q\) (see [44]). Suppose that there are \(p,q\in C\), \(p\ne q\), such that \(p\le _C q\) and \(q\le _C p\) (we just proved \(\overline{J_C^+}=J^+_C\)). Let \(\gamma _1\) be the future directed causal geodesic connecting \(p\) to \(q\) and let \(\gamma _2\) be the future directed causal geodesic connecting \(q\) to \(p\). Then the images of \(\gamma _1\) and \(\gamma _2\) differ (otherwise there would be a geodesic which is both future and past directed), and hence there are two geodesics connecting \(p\) to \(q\), a contradiction to the uniqueness of the connecting geodesic in convex normal sets (Theor. 4).\(\square \)

Corollary 2

Let \(M\) be a \(C^{2,1}\)-manifold endowed with a \(C^{1,1}\) Lorentzian metric \(g\) and corresponding locally Lipschitz Levi-Civita connection. Let \(p\in M\). Then there is a local base \(\{C_i\}\) for the topology at \(p\) such that for every \(i\), \(C_i\) is strictly convex normal, globally hyperbolic, \(\bar{C}_{i+1}\subset C_i\), and \(C_{i+1}\) is causally convex in \(C_{i}\) (and hence in \(C_1\)).

Part of the previous result was already known [48]; what was open was the convexity of \(C_i\). Observe that if \(C_i\) is globally hyperbolic then any two causally related events are connected by a causal geodesic contained in \(C_i\) (by the Avez-Seifert theorem [34]). Thus what was really missing was the convexity with respect to spacelike geodesics. These globally hyperbolic convex normal sets are well behaved: the property is preserved under finite intersections. As a consequence, by Lebesgue’s covering lemma, for any compact set \(K\) in spacetime, any metric \(\rho : M\times M \rightarrow \mathbb {R}\) inducing the manifold topology, and any finite covering of \(K\) by globally hyperbolic convex normal sets, there is an \(\epsilon >0\) such that any pair of points \(p,q\in K\) with \(\rho (p,q)<\epsilon \) is contained in one element of the covering, and hence \(p\) and \(q\) are connected by some geodesic.

Remark 14

A spacetime is strongly causal if every point admits arbitrarily small causally convex neighborhoods (in \(M\)). In a strongly causal spacetime, we can find a causally convex (in \(M\)) open neighborhood \(Y\) of \(p\) contained in \(C_1\), and for sufficiently large \(i\), \(C_i\subset Y\). As a consequence, the sets \(C_i\) for sufficiently large \(i\) are also causally convex in \(M\). Thus for strongly causal spacetimes we can include in the previous Corollary the causal convexity of \(C_i\) with respect to \(M\) among the properties of these sets.

1.7 Two variations on the main theme

Our results admit a number of variations obtained by replacing the base point \(p\) of the pointed exponential map with an embedded manifold. For instance, consider a pseudo-Riemannian manifold endowed with a \(C^{1,1}\) metric and a Lipschitz metric compatible connection. Let \(\phi :S\rightarrow M\), \(k\ge 1\), be a \(C^{1,1}\) \(k\)-dimensional embedding, such that the induced metric is pseudo-Riemannian. Its non-degeneracy implies that at each point \(p\in \phi (S)\) the tangent space \(T_pM\) is the direct sum of the tangent space to \(\phi (S)\) and its normal space. Let \(\nu (S)\) be the corresponding \(n\)-dimensional normal bundle with base \(\phi (S)\).

We shall prove the following theorem analogous to Theorem 3:

Theorem 13

(\(\exp _{\nu (S)}\)):

The vector bundle \(\nu (S)\) is Lipschitz. Moreover, the map \(\exp _{\nu (S)}\) is strongly differentiable on the image of the zero section of \(\pi | _{\nu (S)}: \nu (S)\rightarrow \phi (S)\), and establishes a Lipeomorphism between a neighborhood of the image of the zero section and a neighborhood of \(\phi (S)\).

By the Lipschitzness of \(\nu (S)\) we can apply to \(\exp _{\nu (S)}:=\exp |_{\nu (S)}\) the results of Theorem 1. This theorem can also be used to construct tubular neighborhoods. The pointed exponential map can be regarded as a special case of this type of construction for \(k=0\).

We can also consider a function \(H^\mu \) dependent on time provided it is Lipschitz. Let \(\gamma _{(t_0,v)}(t)\) be any solution of

$$\begin{aligned} \frac{\mathrm{d}x^{\mu }}{ \mathrm{d}t}&=v^\mu , \end{aligned}$$
(19)
$$\begin{aligned} \frac{\mathrm{d}v^\mu }{\mathrm{d}t}&=H^\mu (t,x,v) , \end{aligned}$$
(20)

with initial condition \(v\in TM\) at time \(t_0\).

Theorem 14

The solution \(\gamma _{(t_0,v)}(t)\) is locally Lipschitz in \((t,t_0,v)\) and \(\gamma _{(t_0,v)}(\cdot )\) is \(C^{2,1}_{loc}\). Let \((\hat{t},\hat{p})\in \mathbb {R}\times M\); then for every \(\epsilon >0\) there is an open neighborhood \(C\) of \(\hat{p}\) such that for every \(p_1,p_2\in C\), \(t_1,t_2\in (\hat{t}-\epsilon ,\hat{t}+\epsilon )\), \(t_1<t_2\), there is one and only one solution starting from \(p_1\) at time \(t_1\) and reaching \(p_2\) at time \(t_2\) entirely contained in \(C\).

The idea is to rewrite the system of first order ODE as follows

$$\begin{aligned} \frac{\mathrm{d}t}{\mathrm{d}s}&=t', \\ \frac{\mathrm{d}x^{\mu }}{ \mathrm{d}s}&={x'}^\mu , \\ \frac{\mathrm{d}t'}{\mathrm{d}s}&=0, \\ \frac{\mathrm{d}{x'}^\mu }{\mathrm{d}s}&=(t')^2\,H^\mu (t,x,x'/t')=H^\mu (t,x,x') , \end{aligned}$$

where \(x'= v t'\), so as to reduce the problem to the (\(s\)-)time independent case. The previous theorem is then a corollary of Theorem 4 whenever in the proof provided for that theorem in place of the Banach space \((\mathbb {R}^n, \Vert \, \Vert )\) we consider the Banach space \((\mathbb {R}^{n+1}, \max (\Vert \, \Vert , | \, |))\).
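The reduction to the \(s\)-time independent case can be illustrated numerically. The following Python sketch integrates the augmented system above with a classical fourth order Runge-Kutta scheme and recovers the solution of the original time dependent equation. The right-hand side \(H(t,x,v)=\cos t\), the one-dimensional setting, the integrator and all function names are assumptions chosen only for illustration; they are not taken from the argument above.

```python
import math

def H(t, x, v):
    # Hypothetical time-dependent right-hand side, Lipschitz in (t, x, v).
    return math.cos(t)

def augmented_rhs(state):
    """Right-hand side of the autonomous system in the variables (t, x, t', x').
    It is homogeneous of second degree in (t', x'); t' must be nonzero."""
    t, x, tp, xp = state
    return (tp, xp, 0.0, tp ** 2 * H(t, x, xp / tp))

def rk4(rhs, state, s_end, n=1000):
    """Classical Runge-Kutta integration of d(state)/ds = rhs(state) on [0, s_end]."""
    h = s_end / n
    for _ in range(n):
        k1 = rhs(state)
        k2 = rhs(tuple(y + 0.5 * h * k for y, k in zip(state, k1)))
        k3 = rhs(tuple(y + 0.5 * h * k for y, k in zip(state, k2)))
        k4 = rhs(tuple(y + h * k for y, k in zip(state, k3)))
        state = tuple(
            y + h * (a + 2 * b + 2 * c + d) / 6
            for y, a, b, c, d in zip(state, k1, k2, k3, k4)
        )
    return state

def solve_via_augmented(t0, x0, v0, t_end, tp=2.0):
    """Solve x'' = H(t, x, x') from t0 to t_end through the augmented system,
    using the initial data t' = tp and x' = v0 * tp."""
    s_end = (t_end - t0) / tp
    t, x, _, xp = rk4(augmented_rhs, (t0, x0, tp, v0 * tp), s_end)
    return t, x, xp / tp
```

With \(t_0=0\), \(x_0=0\), \(v_0=1/2\) the exact solution is \(x(t)=t/2+1-\cos t\), and `solve_via_augmented(0.0, 0.0, 0.5, 1.0)` reproduces \(x(1)\) and \(\dot{x}(1)\) to integrator accuracy for different nonzero choices of `tp`, reflecting the fact that rescaling \(t'\) only reparametrizes the curve.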

1.8 Some technical preliminary results

Let us first recall that in finite dimensions and for Lipschitz functions the notions of Gâteaux differential and Fréchet differential coincide [19, p. 158].

Proposition 2

Any Lipschitz function \(f:O\rightarrow \mathbb {R}\), \(O\subset \mathbb {R}^n\), which is Gâteaux differentiable at \(p\) is differentiable at \(p\).

Proof

Let \(G\) be the Gâteaux differential at \(p\). By contradiction, suppose that \(G\) is not the differential of \(f\) at \(p\), then there is \(\epsilon >0\) and a sequence \(v_n\in \mathbb {R}^n\), \(v_n \ne 0\), \(v_n\rightarrow 0\), such that

$$\begin{aligned} \Vert f(p+v_n)-f(p)-G(v_n)\Vert > \epsilon \Vert v_n\Vert . \end{aligned}$$
(21)

Let \(e_n=v_n/\Vert v_n\Vert \), by the compactness of \(S^{n-1}\) we can assume without loss of generality that \(e_n \rightarrow e\), with \(\Vert e\Vert =1\). Let us decompose \(v_n\) in components parallel and perpendicular to \(e\)

$$\begin{aligned} v_n=a_n e+b_n \end{aligned}$$

where \(a_n\) is a scalar and \(b_n\) is a vector. We have

$$\begin{aligned} \Vert f(p+v_n)-f(p)-G(v_n)\Vert&\le \Vert f(p+v_n)-f(p+a_n e)\Vert \\ {}&\quad +\Vert f(p+a_n e)-f(p)-G(a_n e)\Vert \\ {}&\quad +\Vert G(a_n e)-G(v_n)\Vert \\&= K \Vert b_n\Vert +o_{e}(a_n)+| G| \, \Vert b_n\Vert , \end{aligned}$$

where \(K\) is the Lipschitz constant. The condition \(e_n\rightarrow e\) reads \(a_n/\Vert v_n\Vert \rightarrow 1\), \(\Vert b_n\Vert / \Vert v_n\Vert \rightarrow 0\). For sufficiently large \(n\) the previous inequality contradicts Eq. (21).\(\square \)

In order to prove Theorem 1 we shall need the following result on differentiation under the integral sign which, we believe, is interesting in its own right.

Theorem 15

Let \(f:[a,b]\times \mathbb {R}^k\rightarrow \mathbb {R}\), \((x,y) \mapsto f(x,y)\), be a continuous function which is locally Lipschitz in \(y\) uniformly in \(x\). Then for almost every \(y\), the differential \(d_2 f\) exists at \((u,{y})\) for almost every \(u\in [a,b]\), it is summable in \([a,b]\), and for every \(x\in [a,b]\):

$$\begin{aligned} \left( \mathrm{d}_2 \int _a^x f(u,y)\, \mathrm{d}u\right) = \int _a^x \mathrm{d}_2 f(u,{y})\, \mathrm{d}u. \end{aligned}$$
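As a numerical illustration of the statement, take \(k=1\), \([a,b]=[0,1]\) and \(f(u,y)=|y-u|\), which is continuous and 1-Lipschitz in \(y\) uniformly in \(u\), but not differentiable in \(y\) at \(u=y\). This example, the midpoint quadrature and the function names in the following Python sketch are assumptions made only for illustration. Here \(\int _0^1 |y-u|\,\mathrm{d}u=\frac{1}{2}[y^2+(1-y)^2]\) for \(y\in [0,1]\), so both sides of the identity should equal \(2y-1\).

```python
import numpy as np

def f(u, y):
    # Continuous, and 1-Lipschitz in y uniformly in u; not differentiable at u = y.
    return np.abs(y - u)

def d2f(u, y):
    # Differential of f with respect to y; it exists for every u != y.
    return np.sign(y - u)

def integrate(g, a, x, n=100_000):
    """Midpoint rule for the integral of g over [a, x]."""
    h = (x - a) / n
    u = a + h * (np.arange(n) + 0.5)
    return float(np.sum(g(u)) * h)

def derivative_of_integral(y, a=0.0, x=1.0, eps=1e-3):
    """Central difference in y of F(y) = int_a^x f(u, y) du."""
    F = lambda yy: integrate(lambda u: f(u, yy), a, x)
    return (F(y + eps) - F(y - eps)) / (2 * eps)

def integral_of_derivative(y, a=0.0, x=1.0):
    """int_a^x d_2 f(u, y) du."""
    return integrate(lambda u: d2f(u, y), a, x)
```

At \(y=0.3\) both `derivative_of_integral(0.3)` and `integral_of_derivative(0.3)` agree with \(2y-1=-0.4\) up to discretization error, even though the integrand has a kink inside the integration interval.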

Proof

Let \(\bar{y}\in \mathbb {R}^k\), there is a relatively compact neighborhood \(O\ni \bar{y}\) which is a product of open intervals. The function \(f(x,\cdot )|_O\) is \(K\)-Lipschitz in \(y\) where \(K\) does not depend on \(x\). In particular, for any \(x\in [a,b]\) the function \(f(x,\cdot )\) is differentiable almost everywhere (Rademacher’s theorem). Let \(E(x)\subset O\) be the subset where \(f(x,\cdot )|_O\) is differentiable. Fubini’s theorem applied to the characteristic function of

$$\begin{aligned} A=\cup _{x\in [a,b]} [\{x\}\times E(x)]\subset [a,b]\times O, \end{aligned}$$

states that for almost every \(y \in O\), the differential \(d_2 f\) exists at \((x,{y})\) for almost every \(x\in [a,b]\). From now on let \(y\) be one of these special values.

As a consequence, for almost every \(y \in O\), and for every vector \(v\in \mathbb {R}^k\), the partial derivative \(\partial _v f(x,y)\) exists for almost every \(x\in [a,b]\). Because of the Lipschitz condition we have \(| \partial _v f(x,y)|\le K\Vert v\Vert \). Let \(\epsilon _n\rightarrow 0\) then

$$\begin{aligned} \partial _v f(x,y)=\lim _{n\rightarrow \infty } f_n^y(x) , \end{aligned}$$

where

$$\begin{aligned} f_n^y(x)=\frac{1}{\epsilon _n} [f(x,y+v \epsilon _n)-f(x,y)],\qquad | f^y_n(x) | \le K\Vert v\Vert . \end{aligned}$$

By the dominated convergence theorem \(\partial _v f(x,y)\) is summable and

$$\begin{aligned} \partial _v \int _a^x f(u,y) \mathrm{d}u= \int _a^x \partial _v f(u,y) \mathrm{d}u. \end{aligned}$$

This equation proves that the linear operator \(G(v)\) on the right-hand side is the Gâteaux differential at \(y\) of \(F(x,y):=\int _a^x f(u,y) \mathrm{d}u\). Moreover, this function is Lipschitz in \(y\), thus \(F(x,\cdot )\) is differentiable and \(G\) coincides with its differential (Prop. 2).\(\square \)

Alternatively, we could use a theorem [46] which improves Schwarz’s theorem on the equality of mixed partial derivatives. The reader is referred to [46] for further details. A result similar to the next one was proved by Federer [24, Lemma 4.7].

Theorem 16

Let \(f:O\rightarrow \mathbb {R}\) be a Lipschitz function defined on an open subset of \(\mathbb {R}^n\). Let \(J:O\rightarrow \mathbb {R}^n\) be a continuous function. If the differential of \(f\) exists and coincides with \(J\) almost everywhere then \(f\) is \(C^{1}\) and its differential is \(J\).

Proof

For \(n=1\) this statement follows from inspection of the function

$$\begin{aligned} f(x)-f(a)-\int ^x_a J(u) \mathrm{d}u, \end{aligned}$$

\(a\in O\). This function is Lipschitz, thus absolutely continuous, and hence has a derivative which vanishes almost everywhere, thus it is a constant. Taking \(x=a\) we find that this constant is zero, thus \(f(x)=f(a)+\int ^x_a J(u) \mathrm{d}u\) which implies that \(f\) is \(C^1\) with derivative \(J\).

We can assume without loss of generality, \(O=\mathbb {R}^n\). Let \(v\in \mathbb {R}^n\), we want to show that at every \(x\in O\), \(\partial _v f=J(v)\). This fact would imply that \(f\) is Gâteaux differentiable at each point with Gâteaux differential \(J\), thus the desired conclusion would follow from Prop. 2.

By Fubini’s and Rademacher’s theorem for almost every hyperplane perpendicular to \(v\), the function \(f\) is almost everywhere differentiable at the points belonging to the hyperplane.

Let us introduce coordinates \(\{y^0,\ldots , y^{n-1}\}\) such that one such hyperplane has equation \(y^0=0\) and \(v=\partial _{y^0}\). Let \(g(y^1,\ldots , y^{n-1}):=f(0,y^1,\ldots ,y^{n-1})\) be the Lipschitz restriction of \(f\) to the hyperplane.

The function \(J(t,y)\) is continuous and hence uniformly continuous over compact subsets. As a consequence, the function

$$\begin{aligned} F(t,y)=g(y)+\int _0^t J(s,y;e_0) \, \mathrm{d}s \end{aligned}$$

is continuous in \((t,y)\) and \(C^{1}\) in \(t\).

Let \((t,y)\in \mathbb {R}^n\) and let \(U\ni (t,y)\) be an open neighborhood. The function \(f\) is almost everywhere differentiable with differential \(J\). By Fubini’s theorem for almost every line parallel to the \(y^0\)-axis (the line is determined by its intersection with \(y^0=0\) and “a.e.” in this statement is meant in the Lebesgue (\(n-1\))-dimensional measure of this hyperplane), \(f\) is almost everywhere differentiable with differential \(J\). Thus there is some \((t',y')\in U\) lying on one such line. The function \(f-F\) over such a line is Lipschitz and differentiable almost everywhere with zero derivative, thus it is a constant. But \(f-F=0\) at \(y^0=0\), thus that constant vanishes and hence \(f(t',y')=F(t',y')\). As both functions are continuous and \(U\) is arbitrary, \(f(t,y)=F(t,y)\), that is, \(f=F\). We conclude that \(\partial _{v}f= \partial _v F=\partial _{y^0} F= J(e_0)\), which is what we wanted to prove.\(\square \)

2 Proofs I: a Picard-Lindelöf analysis

This section is devoted to the proofs of the results stated in the previous section and, in particular, to the proof of Theorem 4. In order to prove the strong differentiability of \(\exp \) we shall make a Picard-Lindelöf analysis of the geodesic equation in the small. We shall also pass through the proof of existence and Lipschitz dependence on initial conditions. Though they are already known they will be useful to fix the notation and introduce the bounds used in the last step of the proof.

Let \(x^\mu :U\rightarrow \mathbb {R}^n\) be a coordinate chart in a neighborhood \(U\) of \(p\in M\). Without loss of generality let us assume that \(x^\mu (p)=0\) and let \(r\) be such that the closed ball \(\bar{B}(p,r)=\{q: \Vert q-p\Vert \le r\}\) is contained in \(U\), where \(\Vert \, \Vert \) is the Euclidean coordinate norm. On the tangent bundle we introduce the local coordinate system \(\{x^\mu ,\dot{x}^\mu \}\). We shall regard the coordinate chart image \(x^\mu (U)\) as an open subset of the normed space \((\mathbb {R}^n, \Vert \,\Vert )\).

Let us consider the system (1, 2) where \(H^\mu \) is homogeneous of second degree in \(v\) and Lipschitz

$$\begin{aligned} \Vert H(x_2,v_2)-H(x_1,v_1)\Vert \le \alpha \Vert x_2-x_1\Vert +\beta \Vert v_2-v_1\Vert \end{aligned}$$

in the domain \(\bar{B}(p,r)\times \{v: \Vert v\Vert \le 1\}\).

Suppose that \(v_1,v_2\) are not bounded as stated. Let \(V\) be any constant such that

$$\begin{aligned} V>\max (\Vert v_1\Vert , \Vert v_2\Vert ). \end{aligned}$$
(22)

The Lipschitz condition is then better rewritten in the following form:

$$\begin{aligned} \Vert H(x_2,v_2)-H(x_1,v_1)\Vert&=V^2 \left\| H(x_2,\frac{v_2}{V})-H(x_1,\frac{v_1}{V})\right\| \\&\le V^2\left\{ \alpha \Vert x_2-x_1\Vert +\beta \Vert \frac{v_2}{V}-\frac{v_1}{V}\Vert \right\} \\&\le \alpha V^2\Vert x_2-x_1\Vert +\beta V\Vert v_2-v_1\Vert . \end{aligned}$$

Let

$$\begin{aligned} M=\sup _{x\in \bar{B}(p,r)} \sup _{ \Vert v\Vert =1} \Vert H(x, v)\Vert . \end{aligned}$$
(23)

We rewrite (1, 2) in integral form

$$\begin{aligned} x^\mu (t, x_0,\dot{x}_0)&=x^\mu _0+\int _0^t \dot{x}^\mu (t,x_0,\dot{x}_0)\, \mathrm{d}t , \end{aligned}$$
(24)
$$\begin{aligned} \dot{x}^\mu (t, x_0,\dot{x}_0)&= \dot{x}^\mu _0+\int _0^t H^\mu \Big (x(t, x_0,\dot{x}_0), \dot{x}(t, x_0,\dot{x}_0)\Big ) \, \mathrm{d}t , \end{aligned}$$
(25)

where \((x_0,\dot{x}_0)\) is the initial condition at time \(t=0\). We have included in the above expression the dependence of \((x,\dot{x})\) on the initial conditions. Unless needed we shall remove it from the expressions below.

Let \((x_0,\dot{x}_0)\) belong to the domain

$$\begin{aligned} \max \{ \Vert x_0\Vert ,\Vert \dot{x}_0\Vert \}&< \delta , \end{aligned}$$
(26)

where \(\delta \) is a positive constant such that (the expression makes sense for \(M=0\))

$$\begin{aligned} \delta < \frac{1}{M}\left( 1-e^{-M r/2}\right) \le \frac{r}{2} , \end{aligned}$$
(27)

and is sufficiently small that

$$\begin{aligned} \frac{\delta }{1-\delta M}\le 1,\qquad \frac{\beta \delta }{2(1-\delta \, M)} \, \left( 1+\sqrt{1+4 \alpha /\beta ^2}\, \right) \le 1. \end{aligned}$$
(28)

Starting from \(k=-1\) with

$$\begin{aligned} x_{-1}(t)=x_0, \quad \dot{x}_{-1}(t)=0, \end{aligned}$$

we define inductively the next two functions defined over \([0,1]\) which, in the induction hypothesis are well defined and \(C^1\) and are such that \((x_k,\dot{x}_k)(t)\in \pi ^{-1}(\bar{B}(p,r))\) for every \(t\in [0,1]\)

$$\begin{aligned} x^\mu _{k+1}(t)&=x^\mu _0+\int _0^t \dot{x}_k^\mu (t)\, \mathrm{d}t , \end{aligned}$$
(29)
$$\begin{aligned} \dot{x}^\mu _{k+1}(t)&= \dot{x}^\mu _0+\int _0^t H^\mu \Big (x_k(t),\dot{x}_k(t)\Big ) \, \mathrm{d}t . \end{aligned}$$
(30)

In particular observe that they imply for \(k=0\)

$$\begin{aligned} x_{0}(t)=x_0, \quad \dot{x}_{0}(t)=\dot{x}_0, \end{aligned}$$

while for \(k\ge 1\) they will generically depend on \(t\). From Eqs. (29, 30) we obtain (we shall repeatedly use the inequality \(\Vert \int v^\mu (t) \mathrm{d}t\Vert \le \int \Vert v^\mu (t)\Vert \mathrm{d}t\) which is well known once interpreted as the fact that the length of a \(C^1\) curve on \(\mathbb {R}^n\) is not less than the distance between its endpoints)

$$\begin{aligned} \Vert x_{k+1}(t)\Vert&\le \Vert x_0\Vert +\int _0^t \Vert \dot{x}_k(t)\Vert \, \mathrm{d}t ,\\ \Vert \dot{x}_{k+1}(t)\Vert&\le \Vert \dot{x}_0\Vert +\int _0^t M \,\Vert \dot{x}_k(t)\Vert ^2\, \mathrm{d}t . \end{aligned}$$

The second inequality implies inductively the following bound

$$\begin{aligned} \Vert \dot{x}_{k+1}(t)\Vert \le \frac{\Vert \dot{x}_0\Vert }{1-\Vert \dot{x}_0\Vert M t} , \end{aligned}$$

which is clearly satisfied for \(k=-1\) and which, replaced into the first inequality gives

$$\begin{aligned} \Vert x_{k+1}(t)\Vert \le \Vert x_0\Vert -\frac{1}{M}\ln (1-\Vert \dot{x}_0\Vert M t )\le \delta - \frac{1}{M}\ln (1-\delta \, M ). \end{aligned}$$

By (27) we have for every \(k\ge 0\),

$$\begin{aligned} \Vert x_{k}(t)\Vert&< r, \\ \Vert \dot{x}_{k}(t)\Vert&< \frac{\delta }{1-\delta \, M} \end{aligned}$$

thus these functions define indeed points belonging to \(\pi ^{-1}(\bar{B}(p,r))\). We observe that for any instants \(t_1,t_2\in [0,1]\) and any pair \(v_1:=\dot{x}_i(t_1), v_2:=\dot{x}_j(t_2)\)

$$\begin{aligned} V{:=}\frac{\delta }{1-\delta \, M} \end{aligned}$$
(31)

satisfies the condition of Eq. (22). From now on \(V\) will be given by this equation.

2.1 Existence of geodesics

We have:

$$\begin{aligned} \Vert x_{k+1}(t)- x_{k}(t)\Vert&\le \int _0^t \Vert \dot{x}_k(t)-\dot{x}_{k-1}(t) \Vert \, \mathrm{d}t ,\\ \Vert \dot{x}_{k+1}(t)-\dot{x}_{k}(t) \Vert&\, \le \,\, \int _0^t \,\, \Vert H(x_k(t),\dot{x}_k(t)) -H(x_{k-1}(t),\dot{x}_{k-1}(t)) \Vert \, \mathrm{d}t , \\&\le \int _0^t\{A \Vert x_k(t)-x_{k-1}(t)\Vert + B\Vert \dot{x}_{k}(t)-\dot{x}_{k-1}(t)\Vert \}\, \mathrm{d}t , \end{aligned}$$

where \(A=[\frac{\delta }{1-\delta \, M}]^2 \alpha \) and \(B= \frac{\beta \delta }{1-\delta \, M}\). There is a positive constant \(D\) such that

$$\begin{aligned} \frac{A}{D}+B=D, \end{aligned}$$
(32)

namely

$$\begin{aligned} D=\frac{1}{2}\,\left( B+\sqrt{B^2+4 A}\,\right) =\frac{\beta \delta }{2(1-\delta \, M)} \, \left( 1+\sqrt{1+4 \alpha /\beta ^2}\, \right) . \end{aligned}$$
(33)

By Eq. (28)

$$\begin{aligned} D\le 1. \end{aligned}$$
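The constants introduced so far are easy to evaluate for given data. The following Python sketch computes \(V\), \(A\), \(B\) and \(D\) from \(\alpha \), \(\beta \), \(\delta \), \(M\) as in Eqs. (31) and (33), and tests whether a candidate \(\delta \) satisfies conditions (27) and (28). The function names and the treatment of the case \(M=0\) through the limit value \(r/2\) are assumptions of this sketch.

```python
import math

def picard_constants(alpha, beta, delta, M):
    """Return (V, A, B, D) with V = delta / (1 - delta M), A = alpha V^2,
    B = beta V and D the positive root of A / D + B = D."""
    V = delta / (1.0 - delta * M)
    A = alpha * V ** 2
    B = beta * V
    D = 0.5 * (B + math.sqrt(B * B + 4.0 * A))
    return V, A, B, D

def is_admissible(delta, r, M, alpha, beta):
    """Check conditions (27) and (28) for a candidate delta."""
    # For M = 0 the bound (1 - exp(-M r / 2)) / M is read as its limit r / 2.
    bound = r / 2.0 if M == 0 else (1.0 - math.exp(-M * r / 2.0)) / M
    if not (0.0 < delta < bound):
        return False
    V, _, _, D = picard_constants(alpha, beta, delta, M)
    # (28): delta / (1 - delta M) <= 1 and D <= 1.
    return V <= 1.0 and D <= 1.0
```

For example, with the hypothetical values \(\alpha =1\), \(\beta =2\), \(M=1/2\), \(r=1\), the choice \(\delta =0.1\) is admissible, while \(\delta =0.6\) violates (27).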

By induction we have the bounds

$$\begin{aligned} \Vert {x}_k(t)-{x}_{k-1}(t) \Vert&\le \frac{1}{D}\frac{(Dt)^k}{k!} \\ \Vert \dot{x}_k(t)-\dot{x}_{k-1}(t) \Vert&\le \frac{(Dt)^k}{k!} \end{aligned}$$

They imply that the series

$$\begin{aligned} x^\mu _{n+1}(t)&= x^\mu _0+ \sum _{k=0}^n \left[ x^\mu _{k+1}(t)-x^\mu _k(t)\right] , \\ \dot{x}^\mu _{n+1}(t)&=\dot{x}^\mu _0+\sum _{k=0}^n \left[ \dot{x}^\mu _{k+1}(t)-\dot{x}^\mu _k(t)\right] , \end{aligned}$$

define a sequence of continuous functions which converge uniformly to (continuous) functions \(x^\mu (t)\) and \(\dot{x}^\mu (t)\) over \([0,1]\). By uniform convergence we can pass to the limit in Eqs. (29, 30). Indeed, observe that \(H^\mu (x,\dot{x})\) is continuous over the compact set \(\bar{B}(p,r)\times \{v: \Vert v\Vert \le V\}\) and hence uniformly continuous. Thus these limits indeed solve the system (24, 25). In particular, Eq. (24) proves that indeed \(\dot{x}^\mu (t)=\frac{\mathrm{d}}{\mathrm{d}t}\, x^\mu (t)\).
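The iteration (29, 30) can be tried numerically. The following Python sketch applies it on a uniform grid of \([0,1]\), starting from \(x_{-1}(t)=x_0\), \(\dot{x}_{-1}(t)=0\) and replacing the integrals by cumulative trapezoidal sums. The example spray (the geodesic equation of the hyperbolic half-plane, \(H=(2v^1v^2/x^2,\ [(v^2)^2-(v^1)^2]/x^2)\)), the discretization and all names are assumptions made for illustration; the chart is not recentered at \(p\), and the smallness conditions (26)-(28) are not verified, only the convergence of the iterates is observed.

```python
import numpy as np

def H(x, v):
    """Hypothetical example: spray of the half-plane metric (dx^2 + dy^2) / y^2,
    homogeneous of second degree in v. Arrays have shape (n_grid, 2)."""
    y = x[:, 1]
    vx, vy = v[:, 0], v[:, 1]
    return np.stack([2 * vx * vy / y, (vy ** 2 - vx ** 2) / y], axis=1)

def cumulative_trapezoid(f, t):
    """Approximation of the integral of f from 0 to t on the grid (trapezoidal rule)."""
    out = np.zeros_like(f)
    dt = np.diff(t)[:, None]
    out[1:] = np.cumsum(0.5 * (f[1:] + f[:-1]) * dt, axis=0)
    return out

def picard(x0, v0, n_iter=30, n_grid=2001):
    """Iterate Eqs. (29, 30); return the grid, the last iterate and the sup-norm
    differences between consecutive iterates."""
    x0 = np.asarray(x0, dtype=float)
    v0 = np.asarray(v0, dtype=float)
    t = np.linspace(0.0, 1.0, n_grid)
    x = np.tile(x0, (n_grid, 1))   # x_{-1}(t) = x_0
    v = np.zeros_like(x)           # xdot_{-1}(t) = 0
    sup_diffs = []
    for _ in range(n_iter):
        x_new = x0 + cumulative_trapezoid(v, t)
        v_new = v0 + cumulative_trapezoid(H(x, v), t)
        sup_diffs.append(max(np.abs(x_new - x).max(), np.abs(v_new - v).max()))
        x, v = x_new, v_new
    return t, x, v, sup_diffs
```

Starting from \(x_0=(0,1)\) with \(\dot{x}_0=(0,0.3)\) the exact geodesic is the vertical line \(t\mapsto (0,e^{0.3t})\). The differences between consecutive iterates decay factorially, in agreement with the bounds above, and the limit reproduces \(e^{0.3}\) at \(t=1\) up to the quadrature error.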

For the proof of the uniqueness of geodesics the reader can consult [31, Theor. 1.1, Chap. 2].

2.2 Lipschitz dependence on initial conditions (Theorem 1)

The Lipschitz dependence on the initial conditions can be proved using Gronwall’s inequality. It is also rather easy to obtain it directly from the Picard–Lindelöf approximation.

Let us consider two solutions \(x^\mu (t)\) and \(y^\mu (t)\) of the geodesic equation with initial conditions \((x^\mu _0,\dot{x}^\mu _0)\), \((y^\mu _0,\dot{y}^\mu _0)\), which belong to the domain given by Eq. (26). Starting from these initial conditions we define inductively functions \(x_k^\mu (t)\), \(y_k^\mu (t)\) as above, and subtract the corresponding Eqs. (29, 30).

$$\begin{aligned} \Vert x_{k+1}(t)- y_{k+1}(t)\Vert&\le \Vert x_0-y_0\Vert +\int _0^t \Vert \dot{x}_k(t)-\dot{y}_{k}(t) \Vert \, \mathrm{d}t ,\\ \Vert \dot{x}_{k+1}(t)-\dot{y}_{k+1}(t) \Vert&\le \Vert \dot{x}_0-\dot{y}_0\Vert +\int _0^t\,\{A \Vert x_k(t)-y_{k}(t)\Vert + B\Vert \dot{x}_{k}(t)-\dot{y}_{k}(t)\Vert \}\, \mathrm{d}t . \end{aligned}$$

We now regard the chart-trivialized tangent bundle of coordinates \(\{x^\mu , \dot{x}^\mu \}\) as the direct sum of vector spaces \(\mathbb {R}^n\oplus \mathbb {R}^n\) endowed with the norm

$$\begin{aligned} \Vert (x, \dot{x}) \Vert := \max \{\Vert x \Vert , \Vert \dot{x} \Vert \}. \end{aligned}$$

By induction we obtain

$$\begin{aligned} \Vert x_{k}(t)- y_{k}(t)\Vert&\le \max \{ D \Vert x_0-y_0\Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \}\, \frac{e^{Dt}}{D} , \\ \Vert \dot{x}_{k}(t)-\dot{y}_{k}(t) \Vert&\le \max \{D\Vert x_0-y_0\Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \} \, e^{Dt}. \end{aligned}$$

Clearly, the induction hypothesis is satisfied for \(k=0\). Taking the limit \(k\rightarrow \infty \)

$$\begin{aligned} \Vert x(t)- y(t)\Vert&\le \max \{ D \Vert x_0-y_0\Vert , \Vert \dot{x}_0-\dot{y}_0\Vert \}\, \frac{e^{Dt}}{D} , \\ \Vert \dot{x}(t)-\dot{y}(t) \Vert&\le \max \{D\Vert x_0-y_0\Vert , \Vert \dot{x}_0-\dot{y}_0\Vert \} \, e^{Dt}. \end{aligned}$$

Recalling that \(D \le 1\) these inequalities imply

$$\begin{aligned} \Vert ( x, \dot{x})(t) - (y,\dot{y})(t)\Vert \le \Vert (x_0,\dot{x}_0) -(y_0, \dot{y}_0)\Vert \, \frac{e^{Dt} }{D} , \end{aligned}$$

which proves the Lipschitz dependence on the initial conditions (the exponential map is obtained for \(t=1\)).

The joint dependence on \(t\) and \((x_0,\dot{x}_0)\) is also locally Lipschitz, indeed since the time dependence of \(( x, \dot{x})(t)\) is \(C^{1,1}\) it is locally Lipschitz, thus for \(t,t'\in [a,b]\)

$$\begin{aligned} \Vert ( x, \dot{x})(t') - (y,\dot{y})(t)\Vert \le \Vert ( x, \dot{x})(t')- ( x, \dot{x})(t)\Vert + \Vert ( x, \dot{x})(t) - (y,\dot{y})(t)\Vert , \end{aligned}$$

from which we infer the local Lipschitzness of the dependence on \((t,x_0,\dot{x}_0)\).

As a consequence, the geodesic flow over \(TM\) is locally Lipschitz because every geodesic segment can be covered with a finite number of coordinate patches. By continuity the domain \(W\) where it is defined is open and satisfies the conditions of Theorem 1. Analogously, \(\varOmega \) is open.

Let us rewrite the system (24, 25) reintroducing the dependence on the initial conditions

$$\begin{aligned} (x,\dot{x})(t,x_0,\dot{x}_0)=(x_0,\dot{x}_0)+\int _0^t (f_1, f_2) (t,x_0,\dot{x}_0) \mathrm{d}t , \end{aligned}$$
(34)

where

$$\begin{aligned} f_1^\mu (t,x_0,\dot{x}_0)&=\dot{x}^\mu (t,x_0,\dot{x}_0), \\ f_2^\mu (t,x_0,\dot{x}_0)&=H^\mu \big (x(t,x_0,\dot{x}_0), \dot{x}(t,x_0,\dot{x}_0)\big ). \end{aligned}$$

Since the dependence \((x,\dot{x})(t,x_0,\dot{x}_0)\) is Lipschitz we can apply Theorem 15 with the replacements \(x\rightarrow t\), \(y\rightarrow (x_0,\dot{x}_0)\).

We conclude that for almost every \((x_0,\dot{x}_0)\) the differential \(\mathrm{d}_{(x_0,\dot{x}_0)} (f_1, f_2) (u)\) exists for almost every \(u\), it is summable and for every \(t\in [0,1]\)

$$\begin{aligned} \mathrm{d}_{(x_0,\dot{x}_0)}(x,\dot{x})=\mathrm{d}_{(x_0,\dot{x}_0)}(x_0,\dot{x}_0)+\int _0^t \mathrm{d}_{(x_0,\dot{x}_0)} (f_1, f_2)(u,x_0,\dot{x}_0) \mathrm{d}u. \end{aligned}$$
(35)

In particular, for almost every \((x_0,\dot{x}_0)\) satisfying Eq. (26) the differential \(\mathrm{d}_{(x_0,\dot{x}_0)}(x,\dot{x})\) exists for every \(t\in [0,1]\) and is given by this equation. Let us call \(U\) this special subset of the initial conditions.

Since the functions \(f_1\) and \(f_2\) are locally Lipschitz the differential on the right-hand side is bounded and so, for every \((x_0,\dot{x}_0)\in U\), the quantity \(\mathrm{d}_{(x_0,\dot{x}_0)}(x,\dot{x})\) has a Lipschitz dependence on \(t\) where the Lipschitz constant does not vary in a small relatively compact neighborhood of \((x_0,\dot{x}_0)\). In other words, \(\mathrm{d}_{(x_0,\dot{x}_0)}(x,\dot{x})\) is a function dependent on \((t,x_0,\dot{x}_0)\) which is Lipschitz in \(t\) uniformly in those \((x_0,\dot{x}_0)\) belonging to \(U\).

Differentiating Eq. (35) with \((x_0,\dot{x}_0)\in U\) we get that for almost every \(t\)

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t}\, \mathrm{d}_{(x_0,\dot{x}_0)}(x,\dot{x})=\mathrm{d}_{(x_0,\dot{x}_0)} (f_1, f_2)= \mathrm{d}_{(x_0,\dot{x}_0)} \frac{\mathrm{d}}{\mathrm{d}t}\, (x,\dot{x}). \end{aligned}$$

Though we performed just a local analysis, the conclusion does not change in the setting of Theorem 1 where \(U\) is replaced by \(\tilde{\varOmega }\) since, as observed above, every geodesic segment can be covered with a finite number of coordinate patches. The fact that \(U\) and \(\tilde{\varOmega }\) are star-shaped is a consequence of Eq. (9).

In order to prove the last statement of Theorem 1 it suffices to observe that we can restart the last argument beginning with Eq. (34) by introducing the Lipschitz dependence \((x_0,\dot{x}_0)(z)\) where \(z\) are local coordinates on \(N\). Since \((x,\dot{x})(t,z)\) is locally Lipschitz the whole argument still works, where it is understood that \(\partial _z \varphi \) exists almost everywhere in the Lebesgue \(m\)-dimensional measure of \(N\) (this measure can be equivalently defined either as done here using a chart of \(N\) or in a more intrinsic way regarding \(N\) as a subset of \(TM\), see [22, Sect. 3.3.3]). Theorem 1 is proved.

Remark 15

We pause for a moment to outline how to generalize this proof to \(C^{1,1}\) sprays over a \(C^{3,1}\) manifold, the further generalization to the \(C^{k,1}\), \(k\ge 0\), spray case being then analogous.

The idea is to introduce variables \({x}^\mu _{k,\beta }\), \({x}^\mu _{k,\dot{\beta }}\), \(\dot{x}^\mu _{k,\beta }\), \(\dot{x}^\mu _{k,\dot{\beta }}\) and add to the system (29, 30) the equations obtained by (formally) differentiating the right-hand sides of (29, 30) with respect to the initial conditions

$$\begin{aligned} x^\mu _{k+1,\beta }(t)&=\delta ^\mu _\beta +\int _0^t \dot{x}_{k,\beta }^\mu (t)\, \mathrm{d}t , \\ x^\mu _{k+1,\dot{\beta }}(t)&=\int _0^t \dot{x}_{k,\dot{\beta }}^\mu (t)\, \mathrm{d}t , \\ \dot{x}^\mu _{k+1,\beta }(t)&= \int _0^t (\partial _{x^\alpha } H^\mu ) x^\alpha _{k,\beta }+ (\partial _{\dot{x}^\alpha } H^\mu ) \dot{x}^{\alpha }_{k,\beta } \, \mathrm{d}t,\\ \dot{x}^\mu _{k+1,\dot{\beta }}(t)&= \delta ^\mu _\beta + \int _0^t (\partial _{x^\alpha } H^\mu ) x^\alpha _{k,\dot{\beta }}+ (\partial _{\dot{x}^\alpha } H^\mu ) \dot{x}^{\alpha }_{k,\dot{\beta }} \, \mathrm{d}t. \end{aligned}$$

The proof of the convergence for \(k\rightarrow \infty \) proceeds as above. From here it follows that \({x}^\mu _{,\beta }\), \({x}^\mu _{,\dot{\beta }}\), \(\dot{x}^\mu _{,\beta }\), \(\dot{x}^\mu _{,\dot{\beta }}\) are Lipschitz as they solve a first order Lipschitz ODE. Since the solution is unique \({x}^\mu _{,\beta }\) coincides with \(\partial _{x_0^\beta } {x}^\mu \) and so on.

2.3 Strong differentiability of exp (Theorems 3 and 13)

Let us consider two solutions \(x^\mu (t)\) and \(y^\mu (t)\) of the geodesic equation with initial conditions \((x^\mu _0,\dot{x}^\mu _0)\), \((y^\mu _0,\dot{y}^\mu _0)\), which belong to the domain given by Eq. (26). Starting from these initial conditions we define inductively functions \(x_k^\mu (t)\), \(y_k^\mu (t)\) as above, and subtract the corresponding Eqs. (29, 30), rearranging them as follows

$$\begin{aligned}&x_{k+1}(t)- y_{k+1}(t)-(x_0-y_0)-(\dot{x}_0 -\dot{y}_0) t =\int _0^t [\dot{x}_k(t)-\dot{y}_{k}(t)-(\dot{x}_0-\dot{y}_0) ]\, \mathrm{d}t ,\\&\dot{x}_{k+1}(t)-\dot{y}_{k+1}(t) -(\dot{x}_0-\dot{y}_0)=\,\int _0^t\,\big [H\left( x_k(t),\dot{x}_k(t)\right) -H\left( y_k(t),\dot{y}_k(t)\right) \big ]\, \mathrm{d}t . \end{aligned}$$

Thus

$$\begin{aligned} \Vert x_{k+1}(t)\!-\! y_{k+1}(t)-(x_0-y_0)\!-\!(\dot{x}_0 -\dot{y}_0) t \Vert&\le \!\int _0^t \Vert \dot{x}_k(t)-\dot{y}_{k}(t)-(\dot{x}_0-\dot{y}_0) \Vert \, \mathrm{d}t , \end{aligned}$$

and

$$\begin{aligned}&\Vert \dot{x}_{k+1}(t)-\dot{y}_{k+1}(t) -(\dot{x}_0-\dot{y}_0) \Vert \le \int _0^t \Vert H(x_k(t),\dot{x}_k(t))-H(y_k(t),\dot{y}_k(t))\Vert \, \mathrm{d}t \\&\le \int _0^t \{A \Vert x_k -y_k\Vert +B \Vert \dot{x}_k-\dot{y}_k \Vert \}\, \mathrm{d}t \\&\le \int _0^t \{A [\Vert x_k -y_k-(x_0-y_0)-(\dot{x}_0 -\dot{y}_0) t \Vert +\Vert x_0-y_0 \Vert +\Vert \dot{x}_0 -\dot{y}_0 \Vert t]\\&\quad \ +B [\Vert \dot{x}_k-\dot{y}_k -(\dot{x}_0 -\dot{y}_0) \Vert +\Vert \dot{x}_0 -\dot{y}_0 \Vert ] \}\, \mathrm{d}t. \end{aligned}$$

By induction it follows that

$$\begin{aligned}&\Vert x_{k}(t)-y_{k}(t)-(x_0-y_0)-(\dot{x}_0 -\dot{y}_0) t \Vert \le \max \{D \Vert x_0-y_0 \Vert , \Vert \dot{x}_0-\dot{y}_0\Vert \}\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \qquad \times \left( \frac{e^{Dt}-1}{D}-t\right) ,\\&\Vert \dot{x}_{k}(t)-\dot{y}_{k}(t) -(\dot{x}_0-\dot{y}_0) \Vert \le \max \{D \Vert x_0-y_0 \Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \} \left( e^{Dt}-1\right) . \end{aligned}$$

The induction hypothesis for the former equation and \(k=1\) is satisfied because of Eq. (29). The induction hypothesis for the latter equation and \(k=1\) is satisfied because using Eq. (32)

$$\begin{aligned} \Vert \dot{x}_{1}(t)-\dot{y}_{1}(t) -(\dot{x}_0-\dot{y}_0) \Vert&\le \int _0^t A \Vert x_0-y_0\Vert +B\Vert \dot{x}_0-\dot{y}_0\Vert \mathrm{d}t\\&\le \max \{D \Vert x_0-y_0 \Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \} Dt. \end{aligned}$$
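
For orientation, the induction step may be sketched as follows. The abbreviations \(m\), \(a_k\), \(b_k\) are introduced here only for illustration, and the relation \(A/D+B\le D\) is an assumption on the constant \(D\) of Eq. (33), which is not restated in this section; it is what the base case just displayed suggests. Let \(m:=\max \{D \Vert x_0-y_0 \Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \}\), so that \(\Vert x_0-y_0\Vert \le m/D\) and \(\Vert \dot{x}_0-\dot{y}_0\Vert \le m\), and let \(a_k(t)\), \(b_k(t)\) denote the left-hand sides of the two claimed inequalities. If the inequalities hold for \(k\), then

$$\begin{aligned} a_{k+1}(t)&\le \int _0^t b_k(u)\, \mathrm{d}u \le m \int _0^t \left( e^{Du}-1\right) \mathrm{d}u = m \left( \frac{e^{Dt}-1}{D}-t\right) ,\\ b_{k+1}(t)&\le \int _0^t \Big \{ A \Big [a_k(u)+\frac{m}{D}+m u\Big ]+B \big [b_k(u)+m\big ]\Big \}\, \mathrm{d}u \\&\le m \left( \frac{A}{D}+B\right) \int _0^t e^{Du}\, \mathrm{d}u \le m \left( e^{Dt}-1\right) . \end{aligned}$$

In the second estimate the terms \(\mp m u\) cancel once the bound on \(a_k\) is inserted, so that the integrand is bounded by \(m\, e^{Du}(A/D+B)\).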

Taking the limit \(k\rightarrow \infty \) we obtain

$$\begin{aligned}&\Vert x(t)-y(t)-(x_0-y_0)-(\dot{x}_0 -\dot{y}_0) t \Vert \le \max \{ D \Vert x_0-y_0 \Vert , \Vert \dot{x}_0-\dot{y}_0\Vert \}\nonumber \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \times \left( \frac{e^{Dt}-1}{D}-t\right) , \end{aligned}$$
(36)
$$\begin{aligned}&\Vert \dot{x}(t)-\dot{y}(t) -(\dot{x}_0-\dot{y}_0) \Vert \le \max \{D \Vert x_0-y_0 \Vert ,\Vert \dot{x}_0-\dot{y}_0\Vert \} \left( e^{Dt}-1\right) , \end{aligned}$$
(37)

Let us disregard for the moment the last inequality. We have the trivial inequality

$$\begin{aligned} \Vert x_0- y_0-(x_0-y_0) \Vert \le \max \{ D \Vert x_0-y_0 \Vert , \Vert \dot{x}_0-\dot{y}_0\Vert \} ({e^{Dt}-1}-Dt) . \end{aligned}$$
(38)

On \(X=\mathbb {R}^{2n}\) let us consider a function \(f: X\rightarrow X\) defined as follows

$$\begin{aligned} f(x_0, \dot{x}_0)=(x_0, x(1)). \end{aligned}$$

Clearly, \(f\) is the coordinate expression of the exponential map. Let \(L: X \rightarrow X\) be the linear map given by the matrix

$$\begin{aligned} L=\begin{pmatrix} I &{}\quad 0\\ I &{} \quad I \end{pmatrix} \end{aligned}$$
(39)

with \(I\) the \(n\times n\) identity matrix. Recalling that \(D\le 1\) the inequalities (38) and (36) can be rewritten for \(t=1\)

$$\begin{aligned}&\Vert f(x_0,\dot{x}_0)-f(y_0,\dot{y}_0)-L((x_0,\dot{x}_0)-(y_0,\dot{y}_0)) \Vert \\&\quad \le \Vert (x_0,\dot{x}_0)- (y_0,\dot{y}_0) \Vert \left( \frac{e^{D}-1}{D}-1\right) \end{aligned}$$

for every \((x_0,\dot{x}_0)\) and \((y_0,\dot{y}_0)\) such that \(\Vert (x_0,\dot{x}_0)\Vert < \delta \) and \(\Vert (y_0,\dot{y}_0)\Vert < \delta \).

But Eq. (33) shows that \(D\) as a function of \(\delta \) satisfies \(\lim _{\delta \rightarrow 0} D(\delta )=0\), thus \(f\) is strongly differentiable at \((0,0)\) with strong differential \(L\). Since any point \(p\in M\) corresponds to zero coordinates for some chart compatible with the atlas, we have that \(\exp \) is strongly differentiable at any point in the image of \(p\mapsto 0_p\).

Let us observe that if a function \(f(x,y)\) is strongly differentiable then keeping \(x\) fixed we obtain a function \(f(x,\cdot )\) which is strongly differentiable. This fact follows easily from the definition of strong differentiability.
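
To make this step explicit, we recall the standard definition in the notation of this section (the symbols \(z_1,z_2,\bar{z}\) are used here only for illustration): \(f\) is strongly differentiable at \(\bar{z}\) with strong differential \(L\) if for every \(\epsilon >0\) there is \(\delta >0\) such that \(\Vert z_1-\bar{z}\Vert <\delta \) and \(\Vert z_2-\bar{z}\Vert <\delta \) imply

$$\begin{aligned} \Vert f(z_1)-f(z_2)-L(z_1-z_2)\Vert \le \epsilon \, \Vert z_1-z_2\Vert . \end{aligned}$$

In the case at hand \(L\) sends \((x_0-y_0,\dot{x}_0-\dot{y}_0)\) to \((x_0-y_0,\, (x_0-y_0)+(\dot{x}_0-\dot{y}_0))\), so the first component of the estimate is Eq. (38) and the second is Eq. (36) at \(t=1\). The factor on the right-hand side can be made arbitrarily small since

$$\begin{aligned} \frac{e^{D}-1}{D}-1=\frac{D}{2}+\frac{D^2}{6}+\cdots \rightarrow 0 \quad \text {for } D\rightarrow 0 . \end{aligned}$$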

Furthermore, the composition of strongly differentiable functions is strongly differentiable. As a consequence, the pointed exponential map \(\exp _p:=\pi _2(f(0, \cdot ))\) is strongly differentiable at the origin. From Eq. (39) we read that the Jacobian is given by the \(n\times n\) identity matrix. Since it is invertible \(\exp _p\) establishes a local Lipeomorphism.

2.3.1 Normal vector bundle case (Theorem 13)

Let us prove Theorem 13. Let \(p\in \phi (S)\), let \(\{s^k; k=1,\ldots , l\}\) be coordinates on \(S\) at \(\phi ^{-1}(p)\) and let \(x_0^\mu (s):=\phi ^\mu (s)\). The tangent vectors \((\partial _k x_0^\mu )\partial _\mu \), \(k=1,\ldots , l\), provide a (Lipschitz) base for the tangent space at any point in a neighborhood of \(p\). Applying the Gram-Schmidt procedure to

$$\begin{aligned} \left( \partial _1 x_0^\mu \right) \partial _\mu , \ldots , \left( \partial _l x_0^\mu \right) \partial _\mu , \partial _1, \ldots , \partial _n, \end{aligned}$$

discarding the \(l\) null vectors found, and keeping the last \(n-l\) non-trivial vectors, we are left with a Lipschitz base of the normal space. Call this base \(e^\mu _k \partial _\mu \), \(k=1,\ldots , n-l\). By construction it is Lipschitz. Thanks to this base we can introduce a chart of coordinates \((s,y) \in \mathbb {R}^l\times \mathbb {R}^{n-l}\) over \(\nu (S)\), so as to represent each \(v\in \nu (S)\), \(v=(x_0,\dot{x}_0)\), as follows

$$\begin{aligned} x_0^\mu (s,y)&=x^\mu _0(s^i), \end{aligned}$$
(40)
$$\begin{aligned} \dot{x}_0^\mu (s,y)&= y^j e^\mu _j(s^i). \end{aligned}$$
(41)

This system of equations gives the map between \(\nu (S)\) and \(TM\) expressed through the respective coordinate charts. A naive calculation would suggest that the Jacobian of this transformation is given by the \((n+n) \times (l+(n-l))\) matrix

$$\begin{aligned} J= \begin{pmatrix} \partial _k x^\mu _0&{} 0\\ y^j \partial _i e^\mu _j &{} e^\mu _j \end{pmatrix}. \end{aligned}$$

However, since \(e^\mu _j\) is just Lipschitz the \(n\times l\) block matrix \(y^j \partial _i e^\mu _j\) is not well-defined. Nevertheless, this expression suggests that \(J\) should be as given for \(y=0\), namely on \(\phi (S)\), for in this case the ill-defined block matrix vanishes.

Let us prove that for \(y=0\), \(J\) is the strong differential of the map \((s,y) \rightarrow (x_0,\dot{x}_0)\). Let \(\epsilon >0\). We need only to show that for \(y_1,y_2\) sufficiently close to zero and for \(s_1,s_2\) sufficiently close to \(s\)

$$\begin{aligned} \Vert x_0(s_2,y_2)- x_0(s_1,y_1)-(\partial _i x_0) (s_2^i-s_1^i)\Vert \ \le \epsilon (\Vert s_2-s_1\Vert +\Vert y_2-y_1\Vert ) , \\ \Vert \dot{x}_0(s_2,y_2)- \dot{x}_0(s_1,y_1)-e_j(s) (y_2^j-y_1^j)\Vert \ \le \epsilon (\Vert s_2-s_1\Vert +\Vert y_2-y_1\Vert ) . \end{aligned}$$

The former inequality is a consequence of the fact that \(x_0(s)\) is \(C^1\) hence strongly differentiable. The latter inequality can be rewritten using Eq. (41)

$$\begin{aligned} \Vert y_2^j e_j(s_2)- y_1^j e_j(s_1) -e_j(s) (y_2^j-y_1^j)\Vert \ \le \epsilon (\Vert s_2-s_1\Vert +\Vert y_2-y_1\Vert ) \end{aligned}$$

which is a consequence of the Lipschitzness of \(e_j^\mu \).
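
The way Lipschitzness enters can be seen from the following decomposition, where \(K\) is an illustrative constant (not in the original text) that accounts for the Lipschitz constants of the \(e_j\) and for the equivalence of the norms involved:

$$\begin{aligned} y_2^j e_j(s_2)- y_1^j e_j(s_1) -e_j(s) (y_2^j-y_1^j)&= (y_2^j-y_1^j)\, [e_j(s_2)-e_j(s)]+ y_1^j\, [e_j(s_2)-e_j(s_1)] ,\\ \Vert y_2^j e_j(s_2)- y_1^j e_j(s_1) -e_j(s) (y_2^j-y_1^j)\Vert&\le K \left( \Vert s_2-s\Vert \, \Vert y_2-y_1\Vert +\Vert y_1\Vert \, \Vert s_2-s_1\Vert \right) . \end{aligned}$$

The right-hand side is bounded by \(\epsilon (\Vert s_2-s_1\Vert +\Vert y_2-y_1\Vert )\) as soon as \(\Vert s_2-s\Vert \le \epsilon /K\) and \(\Vert y_1\Vert \le \epsilon /K\).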

The map

$$\begin{aligned} (s,y) \rightarrow (x_0,\dot{x}_0) \rightarrow \exp (x_0,\dot{x}_0) \rightarrow \pi _2(\exp (x_0,\dot{x}_0) ) \end{aligned}$$

being the composition of (1) a strongly differentiable map at \(y=0\), (2) a strongly differentiable map at \(\dot{x}_0=0\), (3) the strongly differentiable projection map \((x,y)\rightarrow y\), and being such that \(\dot{x}_0(s,0)=0\), is strongly differentiable at \(y=0\), with strong differential given by the \(n \times n\) matrix \(J_{\pi _2} LJ=(\partial _k x_0^\mu \, , \, e_j^\mu )(s)\) which is invertible because its columns are linearly independent vectors. Thus the hypotheses of Leach’s inverse function theorem are satisfied.
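
The matrix product can be checked block by block. In the display below \(J\) is evaluated at \(y=0\), \(L\) is the matrix of Eq. (39), and \(J_{\pi _2}=(0 \ \ I)\) is the \(n\times 2n\) matrix of the projection on the second factor:

$$\begin{aligned} J_{\pi _2} L J= \begin{pmatrix} 0&I \end{pmatrix} \begin{pmatrix} I & 0\\ I & I \end{pmatrix} \begin{pmatrix} \partial _k x^\mu _0 & 0\\ 0 & e^\mu _j \end{pmatrix} = \begin{pmatrix} 0&I \end{pmatrix} \begin{pmatrix} \partial _k x^\mu _0 & 0\\ \partial _k x^\mu _0 & e^\mu _j \end{pmatrix} = \begin{pmatrix} \partial _k x^\mu _0&e^\mu _j \end{pmatrix} . \end{aligned}$$

The \(l\) columns \(\partial _k x^\mu _0\) span the tangent space of \(\phi (S)\) and the \(n-l\) columns \(e^\mu _j\) span the normal space, which is why the \(n\) columns are linearly independent.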

2.4 Convex neighborhoods (Theorem 4)

Let us prove that for sufficiently small \(\delta \), \(\bar{B}(p,\delta )\) is reversible strictly convex normal.

By the strong differentiability of \(\exp \) there is an open neighborhood \(O\ni (p,p)\), \(O\subset M\times M\), which is Lipeomorphic to an open neighborhood \(U\) of \(0_p\in TM\). Let \(\delta >0\) be sufficiently small that \(\bar{B}(p,\delta )^2\subset O\) and \(\delta \) satisfies Eqs. (27, 28). Let \(q\in B(p,\delta )\), and let

$$\begin{aligned} N_q&= (\exp ^{-1} {B}(p,\delta )^2 ) \cap \pi ^{-1}(q)=[\exp ^{-1} (\{q\}\times {B}(p,\delta )) ] \cap \pi ^{-1}(q)\\&= \exp _q^{-1} {B}(p,\delta ) . \end{aligned}$$

The first defining equality shows that this set is open in the topology of \(\pi ^{-1}(q)=T_qM\). Furthermore, \(\exp _q|_{N_q}:N_q\rightarrow {B}(p,\delta )\) is injective because \(\{q\}\times N_q\subset U\) and so is \(\exp |_{U}\). It is surjective, for if \(r\in {B}(p,\delta )\) then \((q,r)\in {B}(p,\delta )^2\subset O\), thus there is some \(v\in T_qM\), \(v\in U\), such that \(\exp _q v=r\). Finally, \(\exp _q|_{N_q}\) is Lipschitz because so is \(\exp |_U\), and \(\exp ^{-1}_q|_{{B}(p,\delta )}\) is Lipschitz because so is \(\exp ^{-1}|_{{B}(p,\delta )^2}\). Thus we have proved that for each \(q\in B(p,\delta )\), there is an open set \(N_q\) such that \(\exp _q: N_q\rightarrow B(p,\delta )\) is a Lipeomorphism. We stress that we have not yet shown that \(N_q\) is star-shaped.

Analogously, for sufficiently small \(\delta \), for each \(q\in B(p,\delta )\), there is an open set \(\tilde{N}_q(=\tilde{\exp }_q^{-1} {B}(p,\delta ))\) such that \(\tilde{\exp }_q: \tilde{N}_q\rightarrow B(p,\delta )\) is a Lipeomorphism.

Our choice of \(\delta \) allows us to prove the strict convexity of \(\bar{B}(p,\delta )\) for both the spray and the reverse spray, and hence that each \(N_q\) and \(\tilde{N}_q\) are star-shaped. The key observation is that the continuous function defined on \(\pi ^{-1}(\bar{B}(p,\delta ))\) (“\(\cdot \)” is the Euclidean scalar product in \(\mathbb {R}^n\) induced by the chart)

$$\begin{aligned} z^\pm (x,v)\,{:=}\,\Vert v\Vert ^2+x \cdot H(x,\pm v) \end{aligned}$$

is positive if restricted to the unit tangent bundle \(\bar{B}(p,\delta )\times S^{n-1}\). Indeed,

$$\begin{aligned} z^\pm (x,e)=1+x \cdot H(x,\pm e)\ge 1-\delta M >0 \end{aligned}$$

where the last inequality is a consequence of Eq. (27). Let us consider a geodesic segment contained in \(\bar{B}(p,\delta )\). Any geodesic \(x(t)\) is \(C^{2,1}\) thus \(\Vert x \Vert ^2(t)\) is \(C^{2,1}\) and

$$\begin{aligned} \frac{\mathrm{d}^2 \Vert x \Vert ^2}{ \mathrm{d}t^2}&= 2 \left( \Vert \dot{x}\Vert ^2+ x \cdot \frac{\mathrm{d}^2 x }{ \mathrm{d}t^2}\right) =2 \left( \Vert \dot{x}\Vert ^2+ x \cdot H(x,\dot{x})\right) \\&= 2z^+\left( x,\frac{\dot{x}}{\Vert \dot{x} \Vert }\right) \Vert \dot{x}\Vert ^2>0 , \end{aligned}$$

where we used the fact that by definition a geodesic is regular, i.e. \(\dot{x}\ne 0\).

Analogously, if we consider a geodesic for the reverse spray

$$\begin{aligned} \frac{\mathrm{d}^2 \Vert x \Vert ^2}{ \mathrm{d}t^2}&= 2 \left( \Vert \dot{x}\Vert ^2+ x \cdot \frac{\mathrm{d}^2 x }{ \mathrm{d}t^2}\right) =2 \left( \Vert \dot{x}\Vert ^2+ x \cdot H(x,-\dot{x})\right) \\&= 2z^-\left( x,\frac{\dot{x}}{\Vert \dot{x} \Vert }\right) \Vert \dot{x}\Vert ^2>0 . \end{aligned}$$

As a consequence \(\Vert x \Vert ^2(t)\) takes its maximum value at the boundary of its interval of definition, that is, at the endpoints of the geodesic segment which, therefore, must be contained in \(\bar{B}(p,\delta )\). Furthermore, if its endpoints are at the boundary of the ball then its interior points stay in \({B}(p,\delta )\) because the inequality is strict. Thus \(C:=\bar{B}(p,\delta )\) is reversible strictly convex normal. The fact that \(\exp \) establishes a Lipeomorphism between an open subset of \(TC\) and \(B(p,\delta )^2\) is immediate from the inclusion \(B(p,\delta )^2\subset O\). Analogously, the same result holds for \(\tilde{\exp }\). The proof of Theorem 4 is complete.
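
The maximum principle used in this last step can be made explicit. In the sketch below \(\varphi \) and \([a,b]\) are illustrative names for the function \(\Vert x\Vert ^2(t)\) and for the parameter interval of the segment, and we assume, as in the argument above, that the chart is centered at \(p\), so that \(\bar{B}(p,\delta )=\{\Vert x\Vert \le \delta \}\). Since \(\varphi ''>0\), the function \(\varphi \) is strictly convex, hence for every \(t\in (a,b)\)

$$\begin{aligned} \varphi (t)< \frac{b-t}{b-a}\, \varphi (a)+\frac{t-a}{b-a}\, \varphi (b)\le \max \{\varphi (a),\varphi (b)\}\le \delta ^2 . \end{aligned}$$

Thus \(\Vert x(t)\Vert <\delta \) at every interior point of the segment whenever the endpoints lie in \(\bar{B}(p,\delta )\).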

2.5 Role of the coordinate affine structure and position vector

We shall need the following result on the behavior of the position vector on a convex neighborhood.

Theorem 17

Under the assumptions of Theorem 4, for every \(\epsilon >0\) we have for any sufficiently small \(\delta \) that \(C=B(p,\delta )\) not only satisfies the conclusions of Theorem 4 but also that for every \(q_1,q_2,q_1',q_2',q\in C\) we have, interpreting the minus sign as that given by the affine structure induced by the coordinate chart

$$\begin{aligned}&\Vert [P(q'_1,q'_2)-(q'_2-q'_1)]-[P(q_1,q_2)-(q_2-q_1)]\Vert \le \epsilon \max \{\Vert q_1'-q_1\Vert , \Vert q_2'-q_2\Vert \}, \nonumber \\&\Vert P(q,q_2)-P(q,q_1)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert , \end{aligned}$$
(42)
$$\begin{aligned}&\Vert P(q_1,q_2)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert , \end{aligned}$$
(43)
$$\begin{aligned}&\Vert P(q_1,q_2)\Vert \le \epsilon . \end{aligned}$$
(44)

In the connection case, if the coordinate chart is chosen so as to make \(\varGamma ^\mu _{\alpha \beta }(p)\) vanish, then \(\delta \) can be also chosen such that

$$\begin{aligned} \Vert P(q_1,q_2)+P(q_2,q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert ^2. \end{aligned}$$
(45)

This section will be devoted to the proof of this result.

Our coordinate system \(\{x^\mu \}\) in a neighborhood \(U\) of \(p\) induces an affine structure which allows us to compare tangent vectors at different points of \(U\). Let us consider the geodesic \(x(t)\) whose initial condition \((x_0,\dot{x}_0)\) satisfies Eq. (26). We ask if, keeping \(x_0\) fixed, the map \(\dot{x}_0\mapsto \dot{x}(1)\) is injective (observe that \(\dot{x}(1)=P(x_0,x(1))=P(x_0,\exp _{x_0} \dot{x}_0)\)).

Lemma 3

Let \(\Vert (x_0,\dot{x}_0)\Vert \le \delta \). For fixed base point \(x_0\), the map \(\dot{x}_0\mapsto \dot{x}(1)\) is strongly differentiable at the origin, the strong differential being the \(n\times n\) identity matrix \(I\). The map \((x_0, \dot{x}_0)\mapsto (x_0,\dot{x}(1))\) is also strongly differentiable wherever \(\dot{x}_0=0\), the strong differential being the matrix

$$\begin{aligned} \begin{pmatrix} I &{}\quad 0 \\ 0&{}\quad I \end{pmatrix}. \end{aligned}$$

Thus for sufficiently small \(\delta \) both maps are injective (and bi-Lipschitz) with inverse strongly differentiable wherever \(\dot{x}(1)=0\).

Proof

Let us consider the exponential map of the vectors \(\dot{x}_0, \dot{y}_0\) with base point \(x_0\) (thus \(y_0=x_0\)). By Eq. (37) setting \(t=1\)

$$\begin{aligned} \Vert \dot{x}(1)-\dot{y}(1) -(\dot{x}_0-\dot{y}_0) \Vert \le \Vert \dot{x}_0-\dot{y}_0\Vert (e^{D}-1) . \end{aligned}$$

Since \(D(\delta )\rightarrow 0\) for \(\delta \rightarrow 0\), the map \(\dot{x}_0\mapsto \dot{x}(1)\) is strongly differentiable at the origin, with strong differential the identity matrix.

As for the map \((x_0, \dot{x}_0)\mapsto (x_0,\dot{x}(1))\) it suffices to include in the previous analysis the trivial inequality

$$\begin{aligned} \Vert x_0-y_0-(x_0-y_0) \Vert \le \Vert x_0-y_0\Vert (e^{D}-1) , \end{aligned}$$

and recall that on \(\mathbb {R}^n\oplus \mathbb {R}^n\) we use the norm \(\max \{\Vert \, \Vert , \Vert \,\Vert \}\). The last claim follows from Leach’s inverse function theorem.\(\square \)

Proposition 3

Let \(p\in M\), and let \(\{x^\mu \}\) be a chart in a neighborhood \(U\) of \(p\). Let \(P(r,q)\) be the position vector of \(q\) with respect to \(r\). There is an open convex neighborhood \(C\ni p\), such that for every \(q_2,q_1\in C\), the map \((q_1,q_2)\mapsto P(q_1,q_2)-(q_2-q_1)\), interpreted with the affine structure induced by \(\{x^\mu \}\), is strongly differentiable on the diagonal \(q_1=q_2\), with zero strong differential.

Proof

Let \(C\) be a convex neighborhood of \(p\) such that \(\exp \) establishes a Lipeomorphism between an open subset of \(TC\) and \(C\times C\). Let \(x^\mu :C\rightarrow \mathbb {R}^n\) be (\(C^{2,1}\)) coordinates on \(C\), such that \(x^\mu (p)=0\).

We proved that \(\exp \) is strongly differentiable on the zero section of \(TC\) with differential \(L\) on a suitable trivialization [Eq. (39)], thus by Leach’s inverse function theorem, \(\exp ^{-1}\) is strongly differentiable on the diagonal of \(C\times C\) with differential \(L^{-1}\). Stated in another way, the map in coordinates given by \((q_1,q_2) \mapsto (q_1,\exp ^{-1}_{q_1} q_2)\) is strongly differentiable on the diagonal with differential \(L^{-1}\).

Lemma 3 proves that the coordinate map \((q_1, v)\mapsto P(q_1,\exp _{q_1} v)\) is strongly differentiable at the origin with strong differential the identity, thus the coordinate map \((q_1,q_2) \mapsto (q_1,P(q_1,q_2))\) is strongly differentiable on the diagonal with strong differential

$$\begin{aligned} L^{-1}=\left( \!\!\begin{array}{c@{\quad }l} I&{} 0\\ -I&{} I \end{array}\!\right) . \end{aligned}$$

This is the same strong differential as that of the map \((q_1,q_2) \mapsto (q_1, q_2-q_1)\) where the difference makes sense using the affine structure induced by the coordinate chart. Thus the map \((q_1,q_2)\mapsto (q_1,P(q_1,q_2)-(q_2-q_1))\) has vanishing strong differential on the diagonal of \(C\times C\) which implies that the map \((q_1,q_2)\mapsto P(q_1,q_2)-(q_2-q_1)\) has vanishing strong differential on the diagonal of \(C\times C\).\(\square \)
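
As a consistency check on the matrices appearing in this proof, one verifies directly that the displayed matrix is the inverse of \(L\) of Eq. (39), and that the linear map \((q_1,q_2)\mapsto (q_1,q_2-q_1)\) is represented by the same matrix:

$$\begin{aligned} \begin{pmatrix} I & 0\\ I & I \end{pmatrix} \begin{pmatrix} I & 0\\ -I & I \end{pmatrix} = \begin{pmatrix} I & 0\\ 0 & I \end{pmatrix} , \qquad \begin{pmatrix} I & 0\\ -I & I \end{pmatrix} \begin{pmatrix} q_1\\ q_2 \end{pmatrix} = \begin{pmatrix} q_1\\ q_2-q_1 \end{pmatrix} . \end{aligned}$$

Since a linear map coincides with its own strong differential, the difference of the two maps has strong differential \(L^{-1}-L^{-1}=0\) on the diagonal.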

In particular the map \((q_1,q_2)\mapsto P(q_1,q_2)-(q_2-q_1)\) is strongly differentiable at \((p,p)\) thus for every \(\epsilon >0\) the constant \(\delta >0\) and hence the convex neighborhood \(C=B(p,\delta )\) can be chosen such that for every \(q_1,q_2, q_1', q_2'\in C\)

$$\begin{aligned} \Vert [P(q'_1,q'_2)-(q'_2-q'_1)]\!-\![P(q_1,q_2)\!-\!(q_2-q_1)]\Vert \le \epsilon \max \{\Vert q_1'\!-\!q_1\Vert , \Vert q_2'-q_2\Vert \}. \end{aligned}$$

Thus for every \(q_1,q_2,q\in C\) (set \(q_1'\rightarrow q\), \(q_2'\rightarrow q_2\), \(q_2\rightarrow q_1\), \(q_1\rightarrow q\))

$$\begin{aligned} \Vert P(q,q_2)-P(q,q_1)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert , \end{aligned}$$

and (set \(q=q_1\))

$$\begin{aligned} \Vert P(q_1,q_2)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert . \end{aligned}$$
(46)

Thus \(\Vert P(q_1,q_2)\Vert \le (1+\epsilon ) \Vert q_2-q_1\Vert \) and the diameter of \(C\) can be chosen sufficiently small that \(\Vert P(q_1,q_2)\Vert \le \epsilon \).
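
An explicit choice of \(\delta \) can be sketched as follows, assuming that \(C=B(p,\delta )\) is a ball for the norm of the chart, so that \(\Vert q_2-q_1\Vert <2\delta \) for \(q_1,q_2\in C\):

$$\begin{aligned} \Vert P(q_1,q_2)\Vert \le \Vert P(q_1,q_2)-(q_2-q_1)\Vert +\Vert q_2-q_1\Vert \le (1+\epsilon )\, \Vert q_2-q_1\Vert < 2\delta (1+\epsilon ) . \end{aligned}$$

Under this assumption any \(\delta \le \epsilon /[2(1+\epsilon )]\) gives \(\Vert P(q_1,q_2)\Vert \le \epsilon \), that is, Eq. (44).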

Suppose now that the spray is a connection (hence reversible) and assume that the coordinate system has been chosen in such a way that for every normalized vector \(e\), \(H^{\mu }(p,e)=0\). This is always possible through an invertible quadratic coordinate change, so that the new coordinate system is still \(C^{2,1}\). By continuity \(C\) can be chosen sufficiently small that on \(C\), \(\Vert H\Vert :=\sup _{x\in C}\sup _{\Vert e\Vert =1} \Vert H(x,e)\Vert \le \epsilon \).

Let \(x:[0,1]\rightarrow C\), \(x(0)=q_1'\), \(x(1)=q_1\), be a geodesic. Let \(q_2=q_2'=x(t)\), \(t\in [0,1]\); then the above four-point inequality gives

$$\begin{aligned} \Vert P(x(0),x(t))-P(x(1),x(t))-(x(1) -x(0))\Vert \le \epsilon \Vert x(1)-x(0)\Vert . \end{aligned}$$

We observe that \(P(x(0),x(t))=t \dot{x}(t)\) and \(P(x(1),x(t))=-(1-t) \dot{x}(t)\) thus for every \(t\in [0,1]\)

$$\begin{aligned} \Vert \dot{x}(t)-(x(1) -x(0))\Vert \le \epsilon \Vert x(1)-x(0)\Vert . \end{aligned}$$

We have \(P(x(0),x(1))=\dot{x}(1)\) and \(P(x(1),x(0))=-\dot{x}(0)\) thus setting \(r(t)=\dot{x}(t)-(x(1) -x(0))\),

$$\begin{aligned}&\Vert P(x(0),x(1))+P(x(1),x(0))\Vert =\Vert \dot{x}(1)-\dot{x}(0)\Vert = \Vert \int _0^1 H(x(s),\dot{x}(s)) \mathrm{d}s\Vert \\&\quad \le \int _0^1 \Vert H(x(s),\hat{\dot{x}}(s)) \Vert \, \Vert \dot{x}(s)\Vert ^2 \mathrm{d}s\le \epsilon \int _0^1 \Vert \dot{x}(s)\Vert ^2 \mathrm{d}s\\&\quad \le \epsilon [ \int _0^1 \Vert r(s)\Vert ^2 \mathrm{d}s+\int _0^1\Vert x(1)-x(0)\Vert ^2 \mathrm{d}s+ 2 [x(1)-x(0)]\cdot \int _0^1 r(s)\mathrm{d}s] \\&\quad \le \epsilon (1+\epsilon ^2) \Vert x(1)-x(0)\Vert ^2, \end{aligned}$$

thus a redefinition of \(\epsilon \) gives the last inequality of Theorem 17.
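
The last two steps of the chain can be spelled out. Writing \(\varDelta :=x(1)-x(0)\) (an abbreviation introduced here only for illustration), we have \(\dot{x}(s)=r(s)+\varDelta \) and, by the fundamental theorem of calculus, the cross term vanishes:

$$\begin{aligned} \int _0^1 r(s)\, \mathrm{d}s&=\int _0^1 \dot{x}(s)\, \mathrm{d}s-\varDelta =x(1)-x(0)-\varDelta =0 ,\\ \int _0^1 \Vert \dot{x}(s)\Vert ^2\, \mathrm{d}s&=\int _0^1 \Vert r(s)\Vert ^2\, \mathrm{d}s+\Vert \varDelta \Vert ^2\le (\epsilon ^2+1)\, \Vert \varDelta \Vert ^2 , \end{aligned}$$

where the bound \(\Vert r(s)\Vert \le \epsilon \Vert \varDelta \Vert \) is the inequality established just before for every \(t\in [0,1]\). The first inequality of the chain uses the homogeneity \(H(x,\dot{x})=\Vert \dot{x}\Vert ^2 H(x,\hat{\dot{x}})\) with \(\hat{\dot{x}}=\dot{x}/\Vert \dot{x}\Vert \).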

2.6 Local lipeomorphisms (Theorems 3 and 13)

Let \(\varphi (v,t)=\gamma '_v(t)\) be the geodesic flow on \(TM\). We have \(\exp v=(\pi (v),\pi (\gamma '_v(1)))\). If \(\varphi (v,1)\) is well defined then, by the continuity of the geodesic flow, so is \(\varphi (w,1)\) for \(w\) near \(v\). Thus \(\varOmega \) is open and, analogously, so is \(\varOmega _p\).

The map \(\exp \) is locally Lipschitz wherever it is defined because \(\pi \) is locally Lipschitz and if \(v\in \varOmega \), \(\varphi (w,1)\) is Lipschitz in \(w\) for \(w\) near \(v\) by the local Lipschitzness of the geodesic flow (that is by the dependence on initial conditions of solutions to the geodesic equation, see Sect. 2.2). Analogously, \(\exp _p\) is locally Lipschitz on \(\varOmega _p\).

We have shown that for each \(p\) there is a convex normal relatively compact open neighborhood \(C_p\), such that \(\exp \) provides a Lipeomorphism between the star-shaped relatively compact open set \(\exp ^{-1} C_p^2\) and \(C_p^2\).

Let \(\{\exp ^{-1} C_{p_i}^2\}\) be a locally finite covering of the image of the zero section \(Z\) of \(TM\), and let \(N=\cup _{p_i} \exp ^{-1} C_{p_i}^2\). Observe that if \(w\in N\) then \(w\in \exp ^{-1} C_{p_i}^2\) for some \(i\) and so it cannot be \(\exp w=\pi (w)\) unless \(w\in Z\) by the injectivity of \(\exp \) on \(C_p\). By construction for every compact set \(K\subset M\), we have that \(\pi ^{-1}(K)\cap \bar{N}\) is compact.

We want to show that there is an open subset \(E\subset N\), containing \(Z\) such that \(\exp |_E\) is injective and hence a Lipeomorphism on its image. Let \(K_i\), \(K_i \subset \mathrm{Int}\, K_{i+1}\subset M\), be a sequence of compact sets such that \(\cup _i K_i=M\), and let \(N_j\), with \(\bar{N}_j\subset N\) and \(\bar{N}_{j+1} \subset N_j\), be open neighborhoods of \(Z\) such that for every compact set \(K\), \(\pi ^{-1}(K)\cap \bar{N}_j\) is compact and \(\cap _j \bar{N}_j=Z\). Clearly \(\exp \) is injective on \(Z\). By induction, suppose that there is an increasing map \(\sigma \) defined on \(\{1, 2, \ldots , k\}\) such that the map \(\exp \) is injective on the set (\(E_0:=Z\))

$$\begin{aligned} E_k{:=} Z\cup \bigcup _{i=1}^k \pi ^{-1}(\mathrm{Int}\, K_i)\cap {N}_{\sigma (i)} \end{aligned}$$

and hence on \(E_j\) for \(j\le k\). Then we can define \(\sigma (k+1)\) such that the same property holds with \(k\) replaced by \(k+1\). For if not we could find sequences \(v_j, w_j \in E_k \cup (\pi ^{-1}(\mathrm{Int}\, K_{k+1}) \cap {N}_{j} )\), \(v_j\ne w_j\), such that for every \(j\), \(\exp v_j=\exp w_j\). On the first component the last equality reads \(\pi (v_j)=\pi (w_j)\). However, by injectivity of \(\exp \) on \(E_k\), \(v_j\), \(w_j\) do not both belong to \(E_k\) thus passing to a subsequence we can assume without loss of generality that \(v_j \in \pi ^{-1}(\mathrm{Int}\, K_{k+1}) \cap {N}_{j}\). Observe that \(v_j\) belongs to the compact set \(\pi ^{-1}(K_{k+1})\cap \bar{N}\), and passing to the limit we obtain that up to subsequences \(v_j\) converges to

$$\begin{aligned} v \in \cap _j[\pi ^{-1}( K_{k+1})\cap \bar{N}_j]\subset \pi ^{-1}(K_{k+1})\cap Z. \end{aligned}$$

In particular, \(\pi (w_j)=\pi (v_j)\) converges to some point of \(K_{k+1}\), thus for sufficiently large \(j\), \(w_j\) is contained in a compact set \(\pi ^{-1}(K_{k+2})\cap \bar{N}_1\subset N\) and so converges up to subsequences to some vector \(w\in N\). Using the continuity of \(\pi \) we obtain \(\pi (v)=\pi (w)\). By the continuity of \(\exp \), \(\exp v=\exp w\), and using \(v\in Z\), \(\pi (v)=\exp w\). Thus \(\pi (w)=\exp w\), and since \(w\in {N}\) we have by the observation above that \(w\in Z\), thus \(w=v=0_{\pi (v)}\). Let \(C\) be a convex normal neighborhood of \(r:=\pi (v)\), (\(v=0_r\)) then \(\exp ^{-1} C^2\) is a neighborhood of \(v\in TC\) and the vectors \(v_j\) and \(w_j\) for sufficiently large \(j\) enter it which contradicts the injectivity of \(\exp \) on \(\exp ^{-1} C^2\). Finally, \(E=\cup _j E_j\) gives the sought open set.

The proof in the normal bundle case is analogous. We have to start from a sequence \(p_i\in \phi (S)\) such that \(C_{p_i}\) is a locally finite covering of \(\phi (S)\), define \(N= \cup _{p_i} \exp ^{-1} C_{p_i}\subset \nu (S)\) and proceed as in [51, Prop. 26, Chap. 7] to prove the injectivity of \(\exp _{\nu (S)}\) on an open subset \(E\subset N\).

3 Proofs II: Pseudo-Finsler sprays and connections

In the next section we prove Gauss’ Lemma for sprays which come from a pseudo-Finsler metric \(L\).

3.1 Gauss’ Lemma (Theorem 5)

Let us prove Eq. (13). Since \(\exp _p^{-1}\) is Lipschitz, \(D^2_p\) is Lipschitz. By Theorem 1 the function \( 2g_{P(p,q)}( P(p,q), \cdot ) \) where \(P(p,q):=\gamma _{\exp _p^{-1} q}'(1)\), is Lipschitz in \(q\). By Theorem 16 we need only to show that the differential of \(D^2_p\) exists and coincides with the previous expression almost everywhere on \(N\).

However, we know that \(\exp \) is differentiable almost everywhere over the star-shaped open set \(\exp ^{-1}N\subset T_pM\), and hence, by Fubini’s theorem, that it is almost everywhere differentiable on almost every radial line passing through the origin (the expression a.e. here refers to the (\(n-1\))-dimensional Lebesgue measure of a Euclidean sphere contained in \(\exp ^{-1}N\)). It suffices to take \(q\) in the image under the exponential map of one of these radial lines (Lipeomorphisms preserve zero measure sets [22, Sect. 2.4]).

Let \(w\in T_qM\) and let \(\sigma : [-a,a]\rightarrow N\) be a (\(C^{2,1}\)) geodesic segment such that \(\sigma '(0)=w\). Let \(x^{(s)}: [0,1] \rightarrow N\), \(t\mapsto x^{(s)}(t)\), be the unique geodesic such that \(x^{(s)}(0)=p\), \(x^{(s)}(1)=\sigma (s)\). Let \(v(s)=\exp _p^{-1} \sigma (s)\), since \(\exp _p^{-1}\) is Lipschitz \(v: [-a,a]\rightarrow T_pM\) is Lipschitz. In particular \(S=\{u\in T_pM: u=tv(s), t\in [0,1], s\in [-a,a]\}\) is a Lipschitz submanifold of \(T_pM\), thus by Theorem 1 \(x(t,s):=x^{(s)}(t)=\exp _p (tv(s))\) is Lipschitz in \((t,s)\). Furthermore, still by Theorem 1

(*): for almost every \(s\), for every \(t\in [0,1]\), \(x(t,\cdot )\) is differentiable at \(s\) and the derivative \(\partial _2 x(t,s)\) is locally Lipschitz in \(t\), locally uniformly with respect to those \(s\) where it is defined. Finally, for any such \(s\) we have that for almost every \(t\in [0,1]\), the mixed partial derivatives \(\partial _1\partial _2 x(t,s)\), \(\partial _2\partial _1 x(t,s)\) are locally bounded and coincide.

The quantity \(L(x^{(s)}(t),x^{(s)}_t(t))\) is independent of \(t\) hence coincident with \(L(p,v(s))\). The function \(v(s)\) being Lipschitz is differentiable almost everywhere. Let \(s\) be such that \(v'(s)\) exists and (*) holds

$$\begin{aligned} \frac{1}{2}\,\partial _{\sigma '(s)} D^2_p&= \frac{\mathrm{d}}{\mathrm{d}s}\,L(p,v(s))= \frac{\partial }{\partial s}\, L(x(t,s),x_t(t,s)) = x^\mu _s \frac{\partial L}{\partial x^\mu }+ \frac{\partial x^\mu _t}{\partial s} \frac{\partial L}{\partial x^\mu _t}\\ {}&= \frac{\partial }{\partial t}\left( x^\mu _s \frac{\partial L}{\partial x^\mu _t}\right) -x^\mu _s\left( \frac{\partial }{\partial t} \frac{\partial L}{\partial x^\mu _t}-\frac{\partial L}{\partial x^\mu }\right) +\left( \frac{\partial x^\mu _t}{\partial s} -\frac{\partial x^\mu _s}{\partial t}\right) \frac{\partial L}{\partial x^\mu _t} \end{aligned}$$

where \(t\) can be chosen arbitrarily. By (*) for almost every \(t\) the last term vanishes. Moreover, the second term of the right-hand side vanishes because \(x^{(s)}(t)\) is a geodesic, hence it solves the Euler-Lagrange equation for the Lagrangian \(L\). Integrating in \(t\) over the interval \([0,1]\) we obtain, taking into account that the left-hand side does not depend on \(t\) and using Eq. (7)

$$\begin{aligned} \frac{1}{2}\, \partial _{\sigma '(s)} D^2_p&= g_{x_t(1,s)}(x_t(1,s),\sigma '(s))-g_{x_t(0,s)}(x_t(0,s),x_s(0,s)) \nonumber \\&=g_{\gamma '_{v(s)}(1)}(\gamma '_{v(s)}(1), \sigma '(s)), \end{aligned}$$
(47)

where we used the fact that since \(x(0,s)=p\), it is \(x_s(0,s)=0\). Evaluated at \(s=0\) the previous expression proves Eq. (13) and hence the first claim of the theorem.
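
The integration step may be spelled out as follows, under the assumption (Eq. (7) is not restated in this section) that Eq. (7) gives \(\frac{\partial L}{\partial v^\mu }(x,v)\, w^\mu =g_v(v,w)\), and taking \(x^\mu _s\, \partial L/\partial x^\mu _t\) to be absolutely continuous in \(t\), as suggested by (*). Since only the first term on the right-hand side survives for almost every \(t\), and since \(x_s(1,s)=\sigma '(s)\) because \(x(1,s)=\sigma (s)\),

$$\begin{aligned} \frac{1}{2}\, \partial _{\sigma '(s)} D^2_p&=\int _0^1 \frac{\partial }{\partial t}\left( x^\mu _s \frac{\partial L}{\partial x^\mu _t}\right) \mathrm{d}t =\left[ x^\mu _s \frac{\partial L}{\partial x^\mu _t}\right] _{t=0}^{t=1}\\&= g_{x_t(1,s)}(x_t(1,s),x_s(1,s))-g_{x_t(0,s)}(x_t(0,s),x_s(0,s)) . \end{aligned}$$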

Let \(\alpha :[0,a)\rightarrow N\), \(t \mapsto \alpha (t)\), be an integral curve of \(P\). Let us differentiate \(D^2_p\) along it

$$\begin{aligned} \frac{\mathrm{d}D^2_p(\alpha (t))}{\mathrm{d}t}=2 g_{P(\alpha (t))}(P(\alpha (t)), \dot{\alpha }(t))\!=\!2g_{P(\alpha (t))}(P(\alpha (t)), P(\alpha (t)))\!=\!2D^2_p(\alpha (t)). \end{aligned}$$

Then \(D^2_p(\alpha (t))=D^2_p(\alpha (0)) \exp (2t)\), and since \(\exp ^{-1} N\) is star-shaped the \(-t\)-time flow maps \((D^2_p)^{-1}(s)\) to \((D^2_p)^{-1}(s e^{-2t})\) for \(t>0\). Since \(P\) is Lipschitz, by the mentioned result on the dependence of solutions to first order ODE on the initial conditions, the flow map is Lipschitz, and since it is injective and can be inverted it is actually a Lipeomorphism between \((D^2_p)^{-1}(s)\) and its image on \((D^2_p)^{-1}(s e^{-2t})\).

Finally, suppose that \(\exp _p\) is differentiable at \(v\in T_p M\backslash 0\). We observe that \(\gamma '_v(1)=(d \exp _p)_v v\). Let \(w\in T_v(T_pM)\sim T_pM\) and define \(v(s):=v+s w\) and \(\sigma (s):=\exp _p(v+s w)\), so that by the differentiability assumption \(\sigma '(s)=(d \exp _p)_v w\). Due to the choice of curve \(\sigma (s)\) we have \(D^2_p(\sigma (s))=2L(p,\exp ^{-1}_p(\sigma (s)))=2L(p,v+sw)\) thus from Eq. (7)

$$\begin{aligned} \frac{1}{2} \frac{\mathrm{d}D^2_p(\sigma (s))}{\mathrm{d}s}|_{s=0}=g_v(v,w) \end{aligned}$$

while from Eq. (47)

$$\begin{aligned} \frac{1}{2} \frac{\mathrm{d}D^2_p(\sigma (s))}{\mathrm{d}s}|_{s=0}=g_{(d \exp _p)_v v}((d \exp _p)_v v,(d \exp _p)_v w). \end{aligned}$$

3.2 Local properties of geodesics in pseudo-Finsler geometry (Theorem 6)

Let \(\sigma :[0,1] \rightarrow N\) be an AC-curve starting from \(p\) and ending at \(q\in N\).

Let us consider the Finsler case. By continuity there is a last value \(\hat{s}\ge 0\), such that \(D_p(\sigma (\hat{s}))=0\) (possibly \(\hat{s}=0\) or \(\hat{s}=1\)).

The length of the geodesic connecting \(p\) to \(q\) is \(D_p(q)\) and is positive if and only if \(q\ne p\). The statement of the theorem is trivial for \(q=p\) thus let us assume \(q\ne p\) (so that \(\hat{s}\ne 1\)). Since \(D_p^2\) is \(C^{1,1}\), \(D_p\) is \(C^{1,1}\) in the region \(N\backslash \{p\}\). Thus \(D_p(\sigma (s))\) being the composition of a locally Lipschitz and an absolutely continuous function is absolutely continuous. We have for \(s\ge \hat{s}\)

$$\begin{aligned} D_p(\sigma (s))&=\int _{\hat{s}}^s \frac{\mathrm{d}D_p(\sigma (s))}{\mathrm{d}s} \,\mathrm{d}s= \int _{\hat{s}}^s \frac{1}{D_p(\sigma (s))}\,g_{P(p,\sigma (s))}(P(p,\sigma (s)),\sigma '(s)) \,\mathrm{d}s\\&= \int _{\hat{s}}^s g_{\hat{P}(p,\sigma (s))}( \hat{P}(p,\sigma (s)),\sigma '(s))\, \mathrm{d}s \\&\le \int _{\hat{s}}^s \, \sqrt{g_{\sigma '(s)}(\sigma '(s),\sigma '(s))} \,\mathrm{d}s\le l[\sigma ], \end{aligned}$$

where \(\hat{P}:=P/\sqrt{g_P(P,P)}\). In the last step we used the analog of the Cauchy-Schwarz inequality for Finsler geometry [5, Theor. 1.2.2]. For \(s=1\) the above inequality proves that the length of \(\sigma \) is no smaller than that of the geodesic connecting its endpoints. If they are equal then for almost every \(s\in [0,1]\), we have the equality \(g_{\hat{P}(p,\sigma (s))}( \hat{P}(p,\sigma (s)),\sigma '(s))= \sqrt{ g_{\sigma '(s)}(\sigma '(s),\sigma '(s))} \), thus, by the equality case in the Finslerian Cauchy-Schwarz inequality, we have that for almost every \(s\in [0,1]\), \(\sigma '\propto P(p,\sigma (s))\). If we introduce spherical normal coordinates \((r,\theta _1,\ldots , \theta _{n-1})\), as this coordinate chart is Lipschitz related to those of \(M\), \(\sigma \) is still absolutely continuous in this chart. Thus since \(\sigma '\propto \partial _r\) almost everywhere, the angular coordinates cannot change over \(\sigma \), for otherwise, since \(\theta _i(\sigma (s))\) is the integral of its own derivative, one would get that \(\sigma '\) is not radial in a set of non-vanishing measure, a contradiction. Thus the image of \(\sigma \) coincides with the image of an integral curve of \(P\) and hence coincides with the image of the geodesic \(\eta (r)\) connecting \(p=\sigma (0)\) with \(q=\sigma (1)\). Since the coordinates of the spherical normal chart are Lipschitz functions, the composition \(r(s)\) is absolutely continuous. By definition \(r\) is an affine parameter over the geodesic which has the same image as \(\sigma \). The map \(r(s)\) is necessarily increasing, for if \(r(s_2)\le r(s_1)\) for \(s_1<s_2\), then we would have \(r'< 0\) (by definition of AC-curve \(r'\ne 0\) almost everywhere) in a subset of non-zero measure of \([s_1,s_2]\), and it would be easy to obtain a shorter curve by cutting out a piece of the domain of \(\sigma \), a contradiction to the length minimization assumption.

Let us consider now the Lorentzian–Finsler case with \(\sigma \) causal and future directed. We recall that the analog to the reverse Cauchy–Schwarz inequality for Finsler spacetimes reads [47]:

Let \(v_1,v_2\) be causal and future directed then

$$\begin{aligned} -g_{v_1}({v}_1,{v}_2)\ge \sqrt{-g_{v_1}({v}_1,{v}_1)}\, \sqrt{-g_{v_2}({v}_2,{v}_2)}, \end{aligned}$$
(48)

with equality if and only if \(v_1\) and \(v_2\) are proportional.

Suppose that for some \(\tilde{s}\), \( D_p^2(\sigma (\tilde{s}))< 0\). The Lorentzian–Finsler length of the geodesic connecting \(p\) to \(q\) is \(D^L_p(q):=(- D^2_p(q))^{1/2}\). Since \(D_p^2\) is \(C^{1,1}\), \(D^L_p\) is \(C^{1,1}\) in the region \(D_p^2< 0\). Thus \(D^L_p(\sigma (s))\), being the composition of a locally Lipschitz and an absolutely continuous function, is absolutely continuous. We know that \(D_p^2(\sigma (\tilde{s}))< 0\) and by continuity the same inequality holds in an interval \([\tilde{s},s]\) provided \(s\) is sufficiently close to \(\tilde{s}\). We have

$$\begin{aligned}&D_p^L(\sigma (s))- D_p^L(\sigma (\tilde{s}))=\int _{\tilde{s}}^s \frac{\mathrm{d}D_p^L(\sigma (s))}{\mathrm{d}s}\, \mathrm{d}s \nonumber \\&=- \int _{\tilde{s}}^s \frac{1}{D_p^L(\sigma (s))}\,g_{P(p,\sigma (s))}(P(p,\sigma (s)),\sigma '(s))\, \mathrm{d}s \nonumber \\&= -\int _{\tilde{s}}^s \,g_{\hat{P}(p,\sigma (s))}(\hat{P}(p,\sigma (s)),\sigma '(s)) \, \mathrm{d}s \ge \int _{\tilde{s}}^s \sqrt{- g_{\sigma '(s)}(\sigma '(s),\sigma '(s))} \,\mathrm{d}s \nonumber \\&= l[\sigma _{[\tilde{s},s]}], \end{aligned}$$
(49)

where \(\hat{P}:=P/\sqrt{-g_P(P,P)}\). In the last inequality we used the above Finslerian reverse Cauchy–Schwarz inequality.

The inequality so obtained proves that once \(\sigma \) enters a region with \(D^2_p<0\) (the chronological future of \(p\)) it remains in that region.

Now let \(\eta :[-\epsilon ,0] \rightarrow N\), \(\eta (0)=p\), be a small future directed timelike geodesic contained in a reversible convex normal neighborhood of \(p\), \(C\subset N\). For sufficiently small \(s\), \(\sigma (s)\in C\), and the curve obtained by concatenating \(\eta \) with \(\sigma \), which connects \(\eta (-\epsilon )\) to \(\sigma (s)\), starts with a timelike geodesic, hence it enters the chronological future of \(\eta (-\epsilon )\), and hence, by the above argument, there is a future directed timelike geodesic \(\nu ^{(\epsilon )}\) connecting \(\eta (-\epsilon )\) with \(\sigma (s)\). Letting \(\epsilon \rightarrow 0\), and using the continuity of the exponential map \(\tilde{\exp }\) for the reverse spray at \(\sigma (s)\), we infer the existence of a geodesic connecting \(p\) to \(\sigma (s)\), which by the continuity of \(g_{v}(v,v)\) on \( T_{\sigma (s)}M\) must be future directed causal. As \(s\) is arbitrary we have shown that in a maximal closed interval \([0,b]\subset [0,1]\), \(b>0\), we have \(D^2_p(\sigma (s))\le 0\).

Let us prove that if for \(a\in (0,b]\), \(D^2_p(\sigma (a))=0\) then \(\sigma |_{[0,a]}\) is a lightlike geodesic up to parametrizations and hence that \(D^2_p=0\) over \([0,a]\).

Observe that \(D^2_p\) is Lipschitz, thus \(D^2_p(\sigma (s))\) is absolutely continuous and

$$\begin{aligned} D^2_p(\sigma (a))=\int _0^a \frac{\mathrm{d}D^2_p(\sigma (s))}{\mathrm{d}s}\, \mathrm{d}s=2 \int _0^a g_{P(p,\sigma (s))}(P(p,\sigma (s)), \sigma '(s))\, \mathrm{d}s. \end{aligned}$$

On the region \(D^2_p\le 0\) we have \(g_{P(p,\sigma (s))}(P(p,\sigma (s)), \sigma '(s))\le 0\) for almost every \(s\) (by the Finslerian reverse Cauchy-Schwarz inequality, since \(\sigma '\) is future directed causal almost everywhere), thus we can have \(D^2_p(\sigma (a))=0\) only if \(\sigma '\propto P\) for almost every \(s\) in \([0,a]\). Introducing a Euclidean scalar product on \(T_pM\), associated spherical normal coordinates over \(N\), and arguing as above for the Finsler case we obtain that \(\sigma |_{[0,a]}\) is an integral curve of \(P\), hence a lightlike geodesic issued from \(p\).

From now on let \(a\) be the maximum value of \(s\) for which \(D^2_p(\sigma (s))=0\).

It remains only to prove that \(b=1\). Suppose not; then \(a=b\), otherwise \(D^2_p(\sigma (b))<0\), which would imply the same inequality also in \((b,1]\), a contradiction to \(b<1\). Set \(p'=\sigma (b)\) and take a reversible convex normal neighborhood \(C'\ni p'\), \(C'\subset N\). Arguing as above proves that for any sufficiently small \(\delta \), \(p'\) is connected to \(\sigma (b+\delta )\) by a future directed causal geodesic \(\eta : [0,1]\rightarrow C'\). This geodesic cannot be the prolongation of the lightlike geodesic \(\sigma |_{[0,b]}\) for we would get \(D_p^2(\sigma (b+\alpha \delta ))\le 0\), \(\alpha \in [0,1]\), a contradiction to the maximality of \(b\). Thus the scalar product \(g_{P(p,\eta (t))}(P(p,\eta (t)), \eta '(t))\) is negative for \(t=0\) and hence in a neighborhood of \(t=0\). Now observe that \(D^2_p\) is Lipschitz, thus \(D^2_p(\eta (t))\) is absolutely continuous, and for sufficiently small \(t\)

$$\begin{aligned} D^2_p(\eta (t))&= D^2_p(\sigma (b))+\int _0^t \frac{\mathrm{d}D^2_p(\eta (t))}{\mathrm{d}t}\, \mathrm{d}t\\&= 2 \int _0^t g_{P(p,\eta (t))}(P(p,\eta (t)), \eta '(t))\, \mathrm{d}t<0. \end{aligned}$$

As the concatenation of \(\sigma |_{[0,b]}\) with \(\eta \) is a causal AC-curve and on it \(D^2_p\) becomes negative at some point, and it remains so, we have at the endpoint \(D^2_p(\sigma (b+\delta ))=D^2_p(\eta (1))<0\). As \(\delta \) is arbitrary we get a contradiction to the maximality of \(b\). The contradiction proves that \(b=1\).

If \(\sigma \) is a lightlike geodesic up to parametrizations, then clearly its Lorentzian-Finsler length vanishes and the inequality \(D_p^L(\sigma (1))\ge l(\sigma )\) is satisfied. Suppose that \(\sigma \) is not a lightlike geodesic up to parametrizations; then \(a<1\), and its Lorentzian-Finsler length is given just by the contribution of \(\sigma _{[a,1]}\). Let \(\tilde{s}\in (a,1]\), so that \(D^2_p(\sigma (\tilde{s}))<0\). By (49)

$$\begin{aligned} D^L_p(\sigma (1))\ge l(\sigma _{[\tilde{s},1]}) \end{aligned}$$

and taking the limit \(\tilde{s}\rightarrow a\) we obtain \(D^L_p(\sigma (1))\ge l(\sigma )\). This proves that \(\sigma \) has a Lorentzian-Finsler length no larger than that of the geodesic connecting its endpoints.

Now, suppose by contradiction that they have the same Lorentzian-Finsler length and that \(\sigma \) is not a causal geodesic up to parametrizations. Then necessarily \(a<1\), for otherwise it would be a lightlike geodesic. But then from (49), for \(\tilde{s}>a\),

$$\begin{aligned} D^L_p(\sigma (1))\ge D^L_p(\sigma (\tilde{s}))+ l(\sigma _{[\tilde{s},1]})\ge l(\sigma _{[0,\tilde{s}]})+l(\sigma _{[\tilde{s},1]})=l(\sigma ). \end{aligned}$$

Thus the equality implies that the first inequality is actually an equality which implies that \(g_{P(p,\sigma (s))}(P(p,\sigma (s)),\sigma '(s))=0\) for almost every \(s\in [\tilde{s}, 1]\), and hence, by the arbitrariness of \(\tilde{s}\), \(\sigma '\propto P\) for almost every \(s\in [a,1]\). Introducing again spherical normal coordinates and arguing as above proves that the image of \(\sigma |_{[a,1]}\) is an integral curve of \(P\) (and hence the prolongation of \(\sigma |_{[0,a]}\) if \(a\ne 0\)) thus it is the image of a geodesic.

Finally, suppose that the image of \(\sigma \) coincides with that of a causal geodesic \(\eta \). Since the coordinates of the spherical normal chart are Lipschitz functions, the composition \(r(s)\) is absolutely continuous. By definition \(r\) is an affine parameter over the geodesic \(\eta \). The map \(r(s)\) is necessarily increasing, for if \(r(s_2)\le r(s_1)\) for \(s_1<s_2\), then we would have \(r'\le 0\) in a subset of non-zero measure of \([s_1,s_2]\), which would imply that \(\frac{\mathrm{d}}{\mathrm{d}s } \sigma =(\frac{\mathrm{d}}{\mathrm{d}r } \eta ) r'\) is not future directed causal in a set of non-zero measure, a contradiction to the definition of future directed causal AC-curve.

3.3 Strong convexity of squared Riemannian distance (Theorem 8)

Let \(C\) be a convex neighborhood of \(p\) as in Theorems 4 and 17 where \(\epsilon \in (0,1)\) and where the coordinate chart is chosen so that \(\varGamma ^\mu _{\alpha \beta }(p)=0\).

Let us consider the Riemannian case. From Eq. (42), using the Cauchy-Schwarz inequality,

$$\begin{aligned} | P(q,q_2)\cdot (q_2-q_1)-P(q,q_1)\cdot (q_2-q_1)-(q_2-q_1)^2|\le \epsilon \Vert q_2-q_1\Vert ^2. \end{aligned}$$
(50)

Let \(g_{q}\) be the matrix of \(g\) at \(q\). Since \(g\) is \(C^{1}\) it is strongly differentiable and by the choice of coordinate system its strong derivative vanishes at \(p\). Thus \(C\) can be chosen sufficiently small that for every \(q_1,q_2\in C\)

$$\begin{aligned} \Vert g_{q_2}-g_{q_1}\Vert \le \epsilon \Vert q_2-q_1\Vert \end{aligned}$$

Moreover, \(C\) can be chosen sufficiently small that once expressed in the coordinate chart \(\Vert g-I\Vert \le \epsilon \) on \(C\).

Thus if \(D^2_{q}\) is the squared distance function from \(q\), using also Eqs. (42), (43) and (44) we are able to prove Eq. (14)

$$\begin{aligned}&\frac{1}{2}\left| \left[ \mathrm{d}D^2_q(q_2)- \mathrm{d}D^2_q(q_1)\right] (q_2-q_1)-2(q_2-q_1)^2 \right| \\&\quad =| g_{q_2}(P(q,q_2), (q_2-q_1))-g_{q_1}(P(q,q_1), (q_2-q_1))-(q_2-q_1)^2|\\&\quad =| g_{q_2}(P(q,q_2)-P(q,q_1), (q_2-q_1))-(g_{q_1}-g_{q_2})(P(q,q_1), (q_2-q_1))\\&\qquad -(q_2-q_1)^2| \\&\quad \le | [P(q,q_2)-P(q,q_1)]\cdot (q_2-q_1)-(q_2-q_1)^2|+\epsilon \Vert P(q,q_2)-P(q,q_1) \Vert \, \Vert q_2-q_1\Vert \\&\qquad + \epsilon \Vert P(q,q_2)\Vert \, \Vert q_2-q_1\Vert ^2\\&\quad \le 2\epsilon (1+ \epsilon ) \Vert q_2-q_1\Vert ^2, \end{aligned}$$

which proves that \(D^2_{q}\) is strongly convex with respect to the affine structure induced by the coordinate chart.

Let us give a geodesic version. This time we shall need to use Eq. (45). Also observe that from Eq. (43) we have

$$\begin{aligned} \Vert P(q_1,q_2)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert \le \epsilon \Vert P(q_1,q_2)-(q_2-q_1)\Vert +\epsilon \Vert P(q_1,q_2)\Vert , \end{aligned}$$

and hence

$$\begin{aligned} \Vert P(q_1,q_2)-(q_2-q_1)\Vert \le \frac{\epsilon }{1-\epsilon } \Vert P(q_1,q_2)\Vert , \end{aligned}$$

and

$$\begin{aligned} \Vert q_2-q_1\Vert \le \frac{1}{1-\epsilon } \Vert P(q_1,q_2)\Vert . \end{aligned}$$
(51)
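For the reader's convenience, the two bounds above can be obtained as follows; the shorthands \(\varDelta :=q_2-q_1\) and \(P:=P(q_1,q_2)\) are introduced here only for this sketch. The chain of inequalities preceding Eq. (51) gives \(\Vert P-\varDelta \Vert \le \epsilon \Vert P-\varDelta \Vert +\epsilon \Vert P\Vert \), and since \(\epsilon <1\),

$$\begin{aligned} (1-\epsilon )\Vert P-\varDelta \Vert&\le \epsilon \Vert P\Vert , \\ \Vert \varDelta \Vert&\le \Vert P-\varDelta \Vert +\Vert P\Vert \le \left( \frac{\epsilon }{1-\epsilon }+1\right) \Vert P\Vert =\frac{1}{1-\epsilon }\, \Vert P\Vert . \end{aligned}$$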

Similarly, let \(g\) be a metric such that at any point of \(C\), \(\Vert g-I\Vert \le \epsilon \), and let \(v\) be any vector. Then

$$\begin{aligned} | (g-I)(v,v)| \le \epsilon v\cdot v = \epsilon |(I-g)(v,v)+g(v,v) |\le \epsilon | (g-I)(v,v)|+\epsilon g(v,v), \end{aligned}$$

from which we obtain

$$\begin{aligned} | (g-I)(v,v)| \le \frac{\epsilon }{1-\epsilon } \,| g(v,v)| , \end{aligned}$$

and

$$\begin{aligned} | v\cdot v|\le | (I-g)(v,v)| +| g(v,v)|\le \frac{1}{1-\epsilon } \,| g(v,v)|. \end{aligned}$$

Let \(x:[0,1]\rightarrow C\) be a geodesic and let \(q_1:=x(0)\), \(q_2:=x(1)\). We are ready to prove Eq. (15).

$$\begin{aligned}&\frac{1}{2} | \frac{\mathrm{d}}{\mathrm{d}t}D(q,x(t))^2|_{t=1}-\frac{\mathrm{d}}{\mathrm{d}t} D(q,x(t))^2|_{t=0}-2D(q_1,q_2)^2 |\\&\quad =| g_{q_2}(P(q,q_2),P(q_1,q_2))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) |\\&\quad \le | g_{q_1}(P(q,q_2),P(q_1,q_2))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) | + \epsilon ^2(1+\epsilon ) \Vert q_2-q_1 \Vert ^2 \\&\quad \le | g_{q_1}(P(q,q_2),-P(q_2,q_1))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) | +2(1+\epsilon ) \epsilon ^2 \Vert q_2-q_1 \Vert ^2 \\&\quad \le | g_{q_1}(P(q,q_1)-P(q,q_2),P(q_2,q_1))- g_{q_1}(q_1-q_2,P(q_2,q_1)) | \\ {}&\qquad +(2(1+\epsilon ) \epsilon ^2+\epsilon (1+\epsilon )^2 ) \Vert q_2-q_1 \Vert ^2 \\&\quad \le | g_{q_1}(P(q,q_1)-P(q,q_2)-(q_1-q_2),P(q_2,q_1))| \\ {}&\qquad +(2(1+\epsilon ) \epsilon ^2+\epsilon (1+\epsilon )^2 ) \Vert q_2-q_1 \Vert ^2 \\&\quad \!\le \! 2 \epsilon (1+\epsilon ) (1+2\epsilon ) \Vert q_2\!-\!q_1 \Vert ^2\!\le \! 2 \epsilon \frac{(1+\epsilon ) (1+2\epsilon ) }{(1-\epsilon )^2}\, \Vert P(q_1,q_2) \Vert ^2\\&\quad \le 2 \epsilon \frac{(1+\epsilon ) (1+2\epsilon ) }{(1-\epsilon )^3} \,g_{q_2}(P(q_1,q_2),P(q_1,q_2)) \le 2 \epsilon \frac{(1+\epsilon ) (1+2\epsilon ) }{(1-\epsilon )^3} \,D(q_1,q_2)^2.\! \end{aligned}$$

A reparametrization of \(x\) with arc-length and a redefinition of \(\epsilon \) gives Eq. (15).
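The origin of the factor \((1-\epsilon )^{-3}\) in the last two steps of the chain can be sketched as follows, assuming as the notation suggests that \(\Vert \cdot \Vert \) is the Euclidean norm of the chart, so that \(\Vert v\Vert ^2=v\cdot v\). Squaring Eq. (51) and then applying the bound \(v\cdot v\le (1-\epsilon )^{-1} g(v,v)\) with \(v=P(q_1,q_2)\) and \(g=g_{q_2}\) gives

$$\begin{aligned} \Vert q_2-q_1\Vert ^2\le \frac{1}{(1-\epsilon )^2}\, \Vert P(q_1,q_2)\Vert ^2\le \frac{1}{(1-\epsilon )^3}\, g_{q_2}(P(q_1,q_2),P(q_1,q_2)). \end{aligned}$$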

The statement of Theorem 8 concerning the strong convexity of \(D^2_q\) is immediate from the triangle inequality and from the equivalences recalled in Sect. 1.5.

3.4 Splitting of the metric at a given radius (Theorem 9)

On the coordinate ball let us introduce radial coordinates \((\rho ,\theta _1,\ldots ,\theta _{n-1})\). Each ball \((D_p)^{-1}([0,r])\) is convex with respect to the affine structure induced by the coordinate chart \((x_1,\ldots ,x_n)\), thus the radial lines issued from \(p\) intersect the boundary of the ball only once. Let \(r=D_p\); there is therefore a function \(\rho (r,\theta )\) establishing the dependence of the radial coordinate on the angular ones. We know that \(r(q(\rho ,\theta ))\) is \(C^{1,1}\), and Eq. (43) and \(\Vert g-I\Vert \le \epsilon \) imply that \(g(P(p,q), (q-p))\ne 0\), namely \(\partial r/\partial \rho \ne 0\). By the usual implicit function theorem \(\rho (r,\theta )\) is \(C^{1,1}\), thus the components of \(g\) in coordinates \((r,\theta _1,\ldots ,\theta _{n-1})\) are locally Lipschitz. In particular, for any given \(r>0\) the map which sends \(S^{n-1}\) (i.e. \(\theta \)) to \(D_p^{-1}(r)\) is \(C^{1,1}\), thus differentiable.

Taking \(r=\mathrm{const.}\) in \(g_{ij}(r,\theta )\) shows that the metric induced on each hypersurface \(D_p^{-1}(r)\) is locally Lipschitz.

The function \(r\) is \(C^{1,1}\) and by Theorem 5 \( \nabla r\) is the normalized geodesic field orthogonal to the level sets of constant \(r\). Thus \(g^{-1}(\mathrm{d}r,\mathrm{d}r)=g(\nabla r, \nabla r)=1\) and Eq. (16)

$$\begin{aligned} g^{-1}=(\partial _r+A_i(r,\theta )\partial _i)^2+(h_r^{-1})_{ij} \partial _i\otimes \partial _j \end{aligned}$$

holds for some Lipschitz components \(A_i\), \((h_r^{-1})_{ij}\). Its inverse is

$$\begin{aligned} g=\mathrm{d}r^2+(h_r)_{ij}( \mathrm{d}\theta _i-A_i \mathrm{d}r) ( \mathrm{d}\theta _j-A_j \mathrm{d}r), \end{aligned}$$

thus \(h_r\) is the metric induced on the level set \(D_p^{-1}(r)\).
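The inversion can be checked through a duality argument. In the following sketch the frame \(e_0,e_i\) and coframe \(\omega ^0,\omega ^i\) are names introduced only for illustration, and \(\partial _i\) is understood as \(\partial /\partial \theta _i\). Setting \(e_0:=\partial _r+A_j\partial _j\), \(e_i:=\partial _i\), \(\omega ^0:=\mathrm{d}r\), \(\omega ^i:=\mathrm{d}\theta _i-A_i \mathrm{d}r\), we have

$$\begin{aligned} \omega ^0(e_0)&=1,&\omega ^0(e_i)&=0, \\ \omega ^i(e_0)&=A_i-A_i=0,&\omega ^i(e_j)&=\delta _{ij}, \end{aligned}$$

so the coframe is dual to the frame. Since \(g^{-1}=e_0\otimes e_0+(h_r^{-1})_{ij}\, e_i\otimes e_j\) is block diagonal in this frame, its inverse is \(g=\omega ^0\otimes \omega ^0+(h_r)_{ij}\, \omega ^i\otimes \omega ^j\), which is the displayed expression.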

Let us fix \(\bar{r}\) so that \(D_p^{-1}(\bar{r}) \subset C\). We know that \(\{\theta _i\}\) provides a \(C^{1,1}\) chart over \(D_p^{-1}(\bar{r})\). Let us rename these coordinates \(\{\alpha _i\}\) and let us extend them in a neighborhood of \(D_p^{-1}(\bar{r})\) solving the differential equation \(\partial \theta _i/\partial r=A_i(r,\theta )\), \(\theta _i(\bar{r},\alpha )=\alpha _i\). Since \(A_i\) is Lipschitz the solution \(\theta (r,\alpha )\) is Lipschitz in \(\alpha \) and \(C^{1,1}\) in \(r\) [31, Ex. 1.2, Chap. 2], [13, Prop. 1.10.1], [38, Cor. 1.6]. Thus \(g\) can be brought to a direct sum form almost everywhere where the Jacobian \(\partial \theta /\partial \alpha \) exists. This Jacobian exists at \(r=\bar{r}\) and equals the identity matrix, thus

$$\begin{aligned} h'_{ij}(\bar{r},\alpha )= h_{ks}(\bar{r},\theta (\bar{r},\alpha )) J^{k}_{i} J^{s}_{j}= h_{ij}(\bar{r},\theta (\bar{r},\alpha ))=h_{ij}(\bar{r},\alpha ). \end{aligned}$$

This equality proves that the components \(h'_{ij}(\bar{r},\cdot )\) exist and are Lipschitz in \(\alpha \).

Of course, since the geodesic flow is Lipschitz we could deduce immediately that normal spherical coordinates are Lipschitz and hence that \(g\) can be brought to a direct sum form almost everywhere, but we wanted to construct a coordinate system for which the direct sum form was valid everywhere at a given radius.

3.5 Some local results on the strong concavity of the squared Lorentzian distance (Theorems 10, 11, Corollary 2)

Proof

(Proof of Theorem 10) Let \(C\) be a convex neighborhood of \(p\) as in Theorems 4 and 17 where \(\epsilon \in (0,1/3)\) and the coordinate system is such that \(g_{\alpha \beta }(p)=\eta _{\alpha \beta }\), \(\varGamma ^\mu _{\alpha \beta }(p)=0\). The metric defined by

$$\begin{aligned} r(v_1,v_2)=g(v_1,v_2)+2 (\partial _0 \cdot v_1) (\partial _0\cdot v_2) \end{aligned}$$
(52)

is positive definite at \(p\) and hence \(C\) can be chosen sufficiently small that it is positive definite everywhere in \(C\). Observe that at \(p\) the metric \(r\) once expressed in components coincides with the identity matrix. Thus \(C\) can be chosen sufficiently small that

$$\begin{aligned} \Vert r-I\Vert \le \epsilon . \end{aligned}$$

From Eqs. (42, 43)

$$\begin{aligned} \Vert P(q,q_1)-P(q,q_2)-P(q_2,q_1)\Vert \le 2\epsilon \Vert q_2-q_1\Vert . \end{aligned}$$
(53)

Let \(x:[0,1]\rightarrow C\) be a geodesic, let \(q_1:=x(0)\) and \(q_2:=x(1)\), and let

$$\begin{aligned} V=P(q,q_1)-P(q,q_2)-P(q_2,q_1). \end{aligned}$$

We have

$$\begin{aligned} | g_{q_1}(V,P(q_2,q_1))|&=| r(V,P(q_2,q_1))- 2(\partial _0\cdot V) (\partial _0\cdot P(q_2,q_1))|\\&\le \Vert V\Vert \, \Vert P(q_2,q_1)\Vert \, [\Vert r\Vert +2]\le \Vert V\Vert \, \Vert P(q_2,q_1)\Vert \, (3+\epsilon )\\&\le 2 \epsilon (1+\epsilon )(3+\epsilon ) \Vert q_2-q_1\Vert ^2. \end{aligned}$$

Since \(g_q\) is \(C^{1,1}\) in \(q\) it is strongly differentiable at \(p\) with zero strong differential, thus \(C\) can be chosen sufficiently small that for every \(q_1,q_2\in C\),

$$\begin{aligned} \Vert g_{q_1}\Vert \le 1+\epsilon , \qquad \Vert g_{q_2}-g_{q_1}\Vert \le \epsilon \Vert q_2-q_1\Vert . \end{aligned}$$
(54)

Thus

$$\begin{aligned}&\frac{1}{2} | \frac{\mathrm{d}}{\mathrm{d}t}D^2_q(x(t))|_{t=1}-\frac{\mathrm{d}}{\mathrm{d}t} D^2_q(x(t))|_{t=0}-2D_{q_1}^2(q_2) |\\&\quad =| g_{q_2}(P(q,q_2),P(q_1,q_2))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) |\\&\quad \le |g_{q_1}(P(q,q_2),P(q_1,q_2))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) | + \epsilon ^2(1+\epsilon ) \Vert q_2-q_1 \Vert ^2 \\&\quad \le | g_{q_1}(P(q,q_2),-P(q_2,q_1))+g_{q_1}(P(q,q_1),P(q_2,q_1))\\&\qquad - g_{q_1}(P(q_2,q_1),P(q_2,q_1)) | +2(1+\epsilon ) \epsilon ^2 \Vert q_2-q_1 \Vert ^2\\&\quad \le | g_{q_1}(V,P(q_2,q_1))| +2(1+\epsilon ) \epsilon ^2 \Vert q_2-q_1 \Vert ^2\\&\quad \le 2 \epsilon (1+\epsilon )(3+2\epsilon ) \Vert q_2-q_1 \Vert ^2 \end{aligned}$$
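The constant in the last line is just the sum of the two contributions: the bound on \(| g_{q_1}(V,P(q_2,q_1))|\) obtained before Eq. (54) and the remainder term. Explicitly, as a short check,

$$\begin{aligned} 2 \epsilon (1+\epsilon )(3+\epsilon )+2(1+\epsilon )\epsilon ^2=2\epsilon (1+\epsilon )\left[ (3+\epsilon )+\epsilon \right] =2 \epsilon (1+\epsilon )(3+2\epsilon ). \end{aligned}$$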

A redefinition of \(\epsilon \) proves Eq. (17).\(\square \)

Proof

(Proof of Theorem 11) It is sufficient to prove Eq. (18) for \(\epsilon < 1/9\). Let us parametrize \(\gamma \) with respect to \(g\)-arc length (i.e. proper time), \(g(\dot{\gamma },\dot{\gamma })=-1\). As a first step let us introduce, through a quadratic locally invertible coordinate transformation, a coordinate system such that \(\dot{\gamma }(0)=\partial _0\), \(g_{\alpha \beta }(p)=\eta _{\alpha \beta }\) and \(\varGamma ^\mu _{\alpha \beta }(p)=0\), so that the hypotheses of Theorem 10 apply.

Let us consider the spacelike subspace of \(T_{\gamma (t)} M\), given by

$$\begin{aligned} S(t)\,{:=}\,\mathrm{Ker} \, g(\dot{\gamma }(t),\cdot ). \end{aligned}$$

Let \(L(t)\) be the subspace of \(T_{\gamma (t)}M\) spanned by \(\{\partial _i, i\ge 1\}\). For \(t=0\), \(S(0)=L(0)\), thus by continuity there is a neighborhood of \(0\) such that for \(t\) belonging to this neighborhood \(S(t)\) makes with \(L(t)\) a (Euclidean) angle smaller than \(1/16\) rad. Thus \(C\) can be taken sufficiently small that this property holds for every \(t\in \gamma ^{-1}(C)\).

Let \(x:[0,1]\rightarrow C\) be a geodesic such that \(x(0)\) and \(x(1)\) belong to the same level set \((D^2_q)^{-1}(c)\), \(c<0\), for some \(q\in C\). By Theorem 6 \(x\) cannot be future directed causal, otherwise \(D^2_q(x(1))>D^2_q(x(0))\), and it cannot be past directed causal, otherwise \(D^2_q(x(1))<D^2_q(x(0))\), thus \(x\) is spacelike. Let \(a,b\in [0,1]\), \(a<b\), then \(y(t)=x((b-a)t+a)\) is such that \(y(0)=x(a)\) and \(y(1)=x(b)\). Let us use Eq. (51) in Eq. (17) for the geodesic \(y\)

$$\begin{aligned} | \frac{\mathrm{d}}{\mathrm{d}t}D^2_q(y(t))|_{t=1}-\frac{\mathrm{d}}{\mathrm{d}t} D^2_q(y(t))|_{t=0}-2D^2(x(a),x(b)) |\le \frac{\epsilon }{(1-\epsilon )^2}\,\Vert P(x(a),x(b)) \Vert ^2. \end{aligned}$$
(55)

Now we have to impose some constraint on \(x(a)\) and \(x(b)\) so as to obtain an inequality of the form

$$\begin{aligned}\Vert P(x(a),x(b)) \Vert ^2\le {\sigma (\epsilon )} g_{x(b)}(P(x(a),x(b)),P(x(a),x(b))) \end{aligned}$$

where \(\epsilon \sigma (\epsilon ) \rightarrow 0\) for \(\epsilon \rightarrow 0\).

We recall that given two Lorentzian metrics \(g_1,g_2\) on a differentiable manifold, \(g_1<g_2\) means that at each point the timelike cone of \(g_2\) contains the causal cone of \(g_1\). Let us consider the metric

$$\begin{aligned} \eta ^+{:=}-\frac{1+2\sqrt{\epsilon }}{1-2 \sqrt{\epsilon }}\, (\mathrm{d}x^0)^2+(\mathrm{d}\varvec{x})^2 \end{aligned}$$

which satisfies \(g<\eta ^+\) at \(p\). By continuity we can choose \(C\) so small that this inequality holds everywhere in \(C\).

Suppose that \(v\in TC\) is a spacelike vector for \(\eta ^+\), that is \(\eta ^+(v,v)\ge 0\), then a little algebra shows that this condition can be rewritten

$$\begin{aligned} v\cdot v\le \frac{1}{\sqrt{\epsilon }}(-(1+\sqrt{\epsilon })( v^0)^2+(1-\sqrt{\epsilon })( \varvec{v})^2). \end{aligned}$$
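The "little algebra" can be spelled out as follows, assuming as the context suggests that \(v\cdot v=(v^0)^2+(\varvec{v})^2\) is the Euclidean product of the chart. Multiplying the displayed inequality by \(\sqrt{\epsilon }>0\) and collecting terms,

$$\begin{aligned}&\sqrt{\epsilon }\left[ (v^0)^2+(\varvec{v})^2\right] \le -(1+\sqrt{\epsilon })( v^0)^2+(1-\sqrt{\epsilon })( \varvec{v})^2 \\&\quad \Leftrightarrow \ 0\le -(1+2\sqrt{\epsilon })( v^0)^2+(1-2\sqrt{\epsilon })( \varvec{v})^2 \ \Leftrightarrow \ \eta ^+(v,v)\ge 0, \end{aligned}$$

where the last equivalence follows dividing by \(1-2\sqrt{\epsilon }\), which is positive since \(\epsilon <1/9\).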

Now observe that at \(p\) for every \(v\in T_p C\), \(\Vert v\Vert =1\),

$$\begin{aligned} -(1+\sqrt{\epsilon })( v^0)^2+(1-\sqrt{\epsilon })( \varvec{v})^2< g_p(v,v)=-(v^0)^2+(\varvec{v})^2, \end{aligned}$$

thus by continuity the same holds at any point in a neighborhood of \(p\), and we can choose \(C\) sufficiently small that for every \(v\in TC\), and \(q\in C\)

$$\begin{aligned} (-(1+\sqrt{\epsilon })( v^0)^2+(1-\sqrt{\epsilon })( \varvec{v})^2)\le g_q(v,v). \end{aligned}$$

As a consequence, for every \(q\in C\) and for every \(v\in TC\) such that \(\eta ^+(v,v)\ge 0\)

$$\begin{aligned} v\cdot v\le \frac{1}{\sqrt{\epsilon }}\, g_q(v,v). \end{aligned}$$
(56)
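Eq. (56) is the inequality sought above with \(\sigma (\epsilon )=1/\sqrt{\epsilon }\): it follows by chaining the rewritten \(\eta ^+\)-spacelike condition with the lower bound on \(g_q(v,v)\), and the required decay \(\epsilon \sigma (\epsilon )\rightarrow 0\) holds. As a brief sketch,

$$\begin{aligned} v\cdot v&\le \frac{1}{\sqrt{\epsilon }}\left( -(1+\sqrt{\epsilon })( v^0)^2+(1-\sqrt{\epsilon })( \varvec{v})^2\right) \le \frac{1}{\sqrt{\epsilon }}\, g_q(v,v), \\ \epsilon \, \sigma (\epsilon )&=\frac{\epsilon }{\sqrt{\epsilon }}=\sqrt{\epsilon }\rightarrow 0 \quad \text {for } \epsilon \rightarrow 0. \end{aligned}$$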

Let \(q=\gamma (t_q)\), \(r=\gamma (t_r)\), \(t_q,t_r\in I\), \(t_q\ne t_r\).

In Sect. 2.4 we proved that \(O:=\bar{B}(r,\delta )\) is strictly convex normal for any sufficiently small \(\delta \), and in Sect. 2.1 through Eq. (31) we proved that the Euclidean velocities \(\dot{x}(t)\), \(t\in [0,1]\), of any geodesic \(x:[0,1]\rightarrow \bar{O}\) are bounded by a constant \(V(\delta )\) which goes to zero for \(\delta \rightarrow 0\). Since \(\varGamma ^\mu _{\alpha \beta }\) is bounded in a neighborhood of \(r\), the geodesic equation implies that there is \(M>0\) such that \(\left| \frac{\mathrm{d}v^\mu }{\mathrm{d}t}\right| \le M \Vert v\Vert ^2\), which is the same as saying that the (Euclidean) radius of curvature is greater than \(1/M\), and hence \(\dot{x}\) can vary over any subinterval of \([0,1]\) by an angle of at most \(V(\delta ) M\). Thus if we take \(\delta \) sufficiently small we can make the variation of angle of the tangent to any geodesic \(x:[0,1]\rightarrow \bar{O}\) bounded by \(\frac{1}{8}\) rad.

We have already shown in Theorem 5 that \(D^2_q\) is \(C^{1,1}\) on \(C\) with a differential at \(r\) given by \(2g(P(q,r),\cdot )\). Since the differential is continuous, \(D^2_q\) is strongly differentiable thus for every \(\beta \) we can write for sufficiently small \(\delta >0\), for \(q_1,q_2\in \bar{O}\)

$$\begin{aligned} | D^2_q(q_2)-D^2_q(q_1)-2g(P(q,r),q_2-q_1)|\le \beta {\sqrt{-g(P(q,r),P(q,r))}} \,\Vert q_2-q_1 \Vert . \end{aligned}$$

If \(q_1\) and \(q_2\) belong to the same level set, recalling that

$$\begin{aligned} \dot{\gamma }(t_r)= P(q,r)/\sqrt{-g(P(q,r),P(q,r))} \end{aligned}$$

we obtain

$$\begin{aligned} | g_r(\dot{\gamma }(t_r),q_2-q_1)|\le \beta \, \Vert q_2-q_1 \Vert . \end{aligned}$$

As \(\beta \rightarrow 0\), the direction of \(q_2-q_1\) is constrained to approach \(S(t_r)\). As the space of directions is a sphere and hence compact, there is a value of \(\beta \) and a corresponding \(O\) such that whenever \(q_1,q_2\in \bar{O}\) belong to the same level set of \(D_q\), \(q_2-q_1\) makes with \(S(t_r)\) a (Euclidean) angle smaller than \(1/16\) rad.

Since \(\Vert P(q_1,q_2)-(q_2-q_1)\Vert \le \epsilon \Vert q_2-q_1\Vert \) and \(\epsilon <1/9\), the vector \(P(q_1,q_2)\) makes with \(q_2-q_1\) a Euclidean angle smaller than \(2 \arcsin (\epsilon /2) <1/8\) rad. Thus \(P(q_1,q_2)\) makes with \(S(t_r)\) a (Euclidean) angle smaller than \(\frac{3}{16}\) rad, and with \(L(t_r)\) a (Euclidean) angle smaller than \(\frac{2}{8}\) rad. If \(x(a)\) and \(x(b)\) are any two points in the geodesic \(x\) joining \(q_1\) to \(q_2\) then \(P(x(a),x(b))\) makes an angle with \(P(q_1,q_2)\) of at most \(\frac{1}{8}\) rad, thus it makes an angle with \(L(t_r)\) smaller than \(\frac{3}{8}\) rad. Moreover,

$$\begin{aligned} \arctan \left( \left[ \frac{1-2\sqrt{\epsilon }}{1+2\sqrt{\epsilon }}\right] ^{1/2}\right) > \arctan \left( \left[ \frac{1-2/3}{1+2/3}\right] ^{1/2}\right) >\frac{3}{8} \ \mathrm{ rad} \end{aligned}$$

thus \(P(x(a),x(b))\) is \(\eta ^+\)-spacelike (we knew already that it was \(g\)-spacelike). Since \(P(x(a),x(b))\) is \(\eta ^+\)-spacelike we have using Eqs. (55) and (56)

$$\begin{aligned}&| \nabla _{(b-a) \dot{x}(b)}D^2_q-\nabla _{(b-a) \dot{x}(a)} D^2_q-2D^2(x(a),x(b)) |\\&\quad \le \frac{\sqrt{\epsilon }}{(1-\epsilon )^2}\, g_{x(b)}(P(x(a),x(b)),P(x(a),x(b))) = \frac{\sqrt{\epsilon }}{(1-\epsilon )^2}\,D^2(x(a),x(b)) , \end{aligned}$$

which a redefinition of \(\epsilon \), and a redefinition of parametrization such that \((b-a)\rightarrow D(x(a),x(b))\) (namely the \(g\)-arc length parametrization) brings to the form of Eq. (18).
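The numerical bounds on angles used in this argument can be verified directly; the decimal values below are approximations computed for this check and are not taken from the source. Since \(\epsilon <1/9\) gives \(2\sqrt{\epsilon }<2/3\) and \(x\mapsto (1-x)/(1+x)\) is decreasing on \([0,1)\),

$$\begin{aligned} \arctan \left( \left[ \frac{1-2/3}{1+2/3}\right] ^{1/2}\right)&=\arctan \left( \frac{1}{\sqrt{5}}\right) \approx 0.4205>\frac{3}{8}=0.375, \\ 2 \arcsin \left( \frac{\epsilon }{2}\right)&<2 \arcsin \left( \frac{1}{18}\right) \approx 0.1112<\frac{1}{8}=0.125, \\ \frac{1}{16}+\frac{1}{8}&=\frac{3}{16}, \qquad \frac{3}{16}+\frac{1}{16}=\frac{2}{8}, \qquad \frac{2}{8}+\frac{1}{8}=\frac{3}{8}. \end{aligned}$$

The last line is the angle budget: \(q_2-q_1\) against \(S(t_r)\), \(P(q_1,q_2)\) against \(q_2-q_1\), \(S(t_r)\) against \(L(t_r)\), and finally \(P(x(a),x(b))\) against \(P(q_1,q_2)\).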

The statement concerning the strong convexity of \(D^2_q\circ x\) is immediate from the equivalences recalled in Sect. 1.5.

Let us prove the strict convexity of \((D^2_q)^{-1}((-\infty ,c))\cap O\). Suppose that \(c<0\) is such that \(G:=(D^2_q)^{-1}((-\infty ,c))\cap O\ne \emptyset \), otherwise there is nothing to prove. Let \(x:[a,b]\rightarrow C\) be a geodesic such that \(x(a),x(b)\in \bar{G}\), then \(x(a),x(b)\in \bar{O}\), and since \(O\) is strictly geodesically convex \(x\) is contained in \(O\) but for the endpoints. Since \(D^2_q:C\times C\rightarrow \mathbb {R}\) is continuous, \(D^2_q(x(a)),D^2_q(x(b))\le c\) and we have to show that for every \(t\in (a,b)\), \(D^2_q(x(t))<c\).

Suppose not, then there is some \(t_{max}\in (a,b)\) such that \(D^2_q(x(t_{max}))\) is the maximum of \(D^2_q(x(\cdot ))\) over \([a,b]\) and \(D^2_q(x(t_{max}))\ge c\). In particular,

$$\begin{aligned} \frac{\mathrm{d}(D^2_q\circ x)}{\mathrm{d}t}|_{t=t_{max}}=\nabla _{\dot{x}(t_{max})} D^2_q(x(t_{max}))=0. \end{aligned}$$

However, by Eq. (18) no two values of \(t\) can attain this maximum, thus in any neighborhood \(E\) of \(t_{max}\) we can find \(t_1,t_2\in E\backslash \{t_{max}\}\), \(t_1<t_{\max }<t_2\) such that \(D^2_q(x(t_1))=D^2_q(x(t_2))<D^2_q(x(t_{max}))\), thus we can apply once again the equation following Eq. (18) using these two values as endpoints of the spacelike geodesic. But that equation implies that \(D^2_q(x(t_{max}))< D^2_q(x(t_2))\), a contradiction.\(\square \)

Lemma 4

Let \(p\in M\) and let \(\gamma : I\rightarrow M\), \(t\mapsto \gamma (t)\), be a timelike geodesic such that \(p=\gamma (0)\). The convex normal set \(C\ni p\) can be taken sufficiently small that once \(I\) is redefined to be the connected component of \(\gamma ^{-1}(C)\) containing \(0\), the following property holds. We can find \(q_1=\gamma (t_1)\), and \(q_2=\gamma (t_2)\) with \(t_1<0<t_2\), and a strictly convex normal set \(O\ni p\), such that, having introduced the constants \(c_1:=D^2_{q_1}(p)\) and \(c_2:=D^2_{q_2}(p)\), we have that, for any \(c_1'>c_1\) sufficiently close to \(c_1\) and for any \(c_2'>c_2\) sufficiently close to \(c_2\),

$$\begin{aligned} S(c_1',c_2')=(D^2_{q_1})^{-1}((-\infty , c_1'))\cap (D^2_{q_2})^{-1}((-\infty , c_2')) \cap O \end{aligned}$$

is strictly convex normal and globally hyperbolic.

Proof

By Theorem 11 we can find a strictly convex relatively compact set \(C\ni p\) with the property of that theorem. In particular, let \(q_1=\gamma (t_1)\), \(q_1\in C\), \(t_1<0\), and let \(q_2=\gamma (t_2)\), \(q_2\in C\), \(t_2>0\). There is a strictly convex normal set \(O_1\ni p\), \(\bar{O}_1\subset I^+_C(q_1)\), such that \(D^2_{q_1}:C\times C\rightarrow \mathbb {R}\) is strongly convex over the geodesic segments in \(O_1\) connecting two points in its level surfaces. Similarly there is a strictly convex set \(O_2\), \(\bar{O}_2\subset I^{-}_C(q_2)\), with an analogous property with respect to \(q_2\). Let \(O=O_1\cap O_2\).

Let us introduce the closed sets

$$\begin{aligned} A(c_1',c_2')= (D^2_{q_1})^{-1}((-\infty , c_1'])\cap (D^2_{q_2})^{-1}((-\infty , c_2'])\cap \bar{O}. \end{aligned}$$

The set \(A(c_1,c_2)\) contains just \(p\) for otherwise there would be a different timelike curve made of two geodesic pieces of total Lorentzian length equal to that of \(\gamma |_{[t_{1},t_2]}\), a contradiction to Theorem 6. Since the intersection of a nested family of non-empty compact sets is non-empty, for sufficiently large \(i\) the set \(A(c_1+1/i,c_2+1/i)\) must be disjoint from \(\partial O\).
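
The compactness step can be spelled out as follows. This is only a sketch: the auxiliary sets \(K_i\) are our notation, and we assume that \(\bar{O}\) is compact, as follows from \(O\subset C\) with \(C\) relatively compact. Let \(K_i:=A(c_1+1/i,c_2+1/i)\cap \partial O\). Each \(K_i\) is compact, \(K_{i+1}\subset K_i\), and

$$\begin{aligned} \bigcap _{i\ge 1} K_i=A(c_1,c_2)\cap \partial O=\{p\}\cap \partial O=\emptyset , \end{aligned}$$

since \(p\) belongs to the open set \(O\). Were every \(K_i\) non-empty, the intersection would be non-empty as well, thus \(K_i=\emptyset \) for sufficiently large \(i\).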

Thus for \(c_1'\le c_1+1/i\) and \(c_2'\le c_2+1/i\), \(S(c_1',c_2')\) is the component of the open set \((D^2_{q_1})^{-1}((-\infty , c_1'))\cap (D^2_{q_2})^{-1}((-\infty , c_2'))\) contained in \(O\) and its closure is contained in \(O\). As \(D^2_{q_1}\) is decreasing over future directed causal curves and \(D^2_{q_2}\) is increasing over future directed causal curves, no causal curve in \(C\) can leave and reenter \(S(c_1',c_2')\). In particular, since \(C\) is causally simple and relatively compact, \(S(c_1',c_2')\) is globally hyperbolic.
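
The monotonicity argument can be made explicit. The following is a sketch in which the curve \(\sigma \) and the parameters \(s_0\le s\le s_1\) are our notation, and the monotonicity of \(D^2_{q_1}\) and \(D^2_{q_2}\) along causal curves in \(C\) is taken for granted. Let \(\sigma \) be a future directed causal curve in \(C\) with \(\sigma (s_0),\sigma (s_1)\in S(c_1',c_2')\), \(s_0<s_1\). For every \(s\in [s_0,s_1]\),

$$\begin{aligned} D^2_{q_1}(\sigma (s))\le D^2_{q_1}(\sigma (s_0))<c_1', \qquad D^2_{q_2}(\sigma (s))\le D^2_{q_2}(\sigma (s_1))<c_2', \end{aligned}$$

so \(\sigma ([s_0,s_1])\) is a connected subset of \((D^2_{q_1})^{-1}((-\infty , c_1'))\cap (D^2_{q_2})^{-1}((-\infty , c_2'))\) that meets \(S(c_1',c_2')\). Since the closure of \(S(c_1',c_2')\) is contained in \(O\), the set \(S(c_1',c_2')\) is both open and closed in the intersection of the two sublevel sets, hence \(\sigma ([s_0,s_1])\subset S(c_1',c_2')\).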

The set \(S(c_1',c_2')\) is strictly convex normal because it is the intersection of the strictly convex normal set \((D^2_{q_1})^{-1}((-\infty , c_1'))\cap O_1\) and the strictly convex normal set \((D^2_{q_2})^{-1}((-\infty , c_2'))\cap O_2\).\(\square \)

Proof

(Proof of Corollary 2) Just take \(C_i=S(c_1+1/(k+i),c_2+1/(k+i))\) for sufficiently large \(k\) and use the results of Lemma 4 (see also its proof).\(\square \)