
1 Introduction

The aim of these notes is to provide a quick overview of the main results contained in [4] and [6] in the simplified case of compact metric spaces (X,d) endowed with a reference probability measure \({\mathfrak{m}}\). The idea is to give the interested reader the possibility to grasp as quickly as possible the key ideas behind the proofs of our recent results, neglecting all the problems that appear in a more general framework (as a matter of fact, no compactness assumption is made in [4, 6], and finiteness of \({\mathfrak{m}}\) is assumed only in [6]). Passing from compact spaces to complete and separable ones (and even to a more general framework which includes the so-called Wiener space) is not just a technical problem: several concepts need to be properly adapted in order to achieve such generality. Hence, in particular, the discussion here is by no means exhaustive, as both the key statements and the auxiliary lemmas are stated in the simplified case of a probability measure on a compact space.

Apart from some very basic concepts about optimal transport, the Wasserstein distance and gradient flows, this paper aims to be self-contained. All the concepts that we need are recalled in the preliminary section; their proofs can be found, for instance, in the first three chapters of [1] (for an overview of the theory of gradient flows, see also [3], and for a much broader discussion of optimal transport, see the monograph by Villani [32]). For completeness, we included in our discussion some results coming from previous contributions which are potentially less known, in particular: the (sketch of the) proof by Lisini [22] of the characterization of absolutely continuous curves w.r.t. the Wasserstein distance (Proposition 4.13), and the proof of uniqueness of the gradient flow of the relative entropy w.r.t. the Wasserstein distance on spaces with Ricci curvature bounded below in the sense of Lott-Sturm-Villani (CD(K,∞) spaces in short) given by the second author in [12] (Theorem 5.9).

In summary, the main arguments and results that we present here are the following.

  (1)

    The Hopf-Lax formula produces subsolutions of the Hamilton-Jacobi equation, and solutions on geodesic spaces (Theorem 3.2 and Theorem 3.3).

  (2)

    A new approach to the theory of Sobolev spaces over metric measure spaces, which leads in particular to the proof that Lipschitz functions are always dense in energy in \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\) (Theorem 4.7).

  (3)

    The uniqueness of the gradient flow w.r.t. the Wasserstein distance W_2 of the relative entropy in CD(K,∞) spaces (Theorem 5.9).

  (4)

    The identification of the L^2-gradient flow of the natural “Dirichlet energy” and the W_2-gradient flow of the relative entropy in CD(K,∞) spaces (see also [15] for the Alexandrov case, a paper to which our paper [4] owes a lot).

  (5)

    A metric version of Brenier’s theorem valid in spaces having Ricci curvature bounded from below in a sense slightly stronger than the one proposed by Lott-Sturm-Villani. If this curvature assumption holds (Definition 7.11) and μ,ν are absolutely continuous w.r.t. \({\mathfrak{m}}\), then “the distance traveled is uniquely determined by the starting point”, i.e. there exists a map D:X→ℝ such that for any optimal plan γ it holds d(x,y)=D(x) for γ-a.e. (x,y). Moreover, the map D is nothing but the weak gradient (according to the theory illustrated in Sect. 4) of any Kantorovich potential. See Theorem 7.11.

  (6)

    A key lemma (Lemma 8.7) concerning “horizontal” and “vertical” differentiation: it allows one to compare the derivative of the squared Wasserstein distance along the heat flow with the derivative of the relative entropy along a geodesic.

  (7)

    A new (stronger) definition of Ricci curvature bound from below for metric measure spaces which is stable w.r.t. measured Gromov-Hausdorff convergence and rules out Finsler geometries (Theorem 9.12 and the discussion thereafter).

2 Preliminary Notions

As a general convention, we will always denote by (X,d) a compact metric space and by \({\mathfrak{m}}\) a Borel probability measure on X; we will always refer to the structure \((X,\mathsf{d},{\mathfrak{m}})\) as a compact and normalized metric measure space. We will use the symbol (Y,d_Y) for metric spaces when compactness is not implicitly assumed.

2.1 Absolutely Continuous Curves and Slopes

Let (Y,d_Y) be a complete and separable metric space, J⊂ℝ an interval with nonempty interior and J∋t↦γ_t∈Y a curve. We say that γ_t is absolutely continuous if

$$\mathsf{d}_{Y}(\gamma_s,\gamma_t)\leq\int _t^sg(r) \mathrm {d} {r},\quad\forall s, t\in J, \ t<s $$

for some g∈L^1(J). It turns out that, if γ_t is absolutely continuous, there is a minimal function g with this property, called metric speed, given for a.e. t∈J by

$$|\dot{\gamma}_t|=\lim_{s\to t}\frac{\mathsf{d}_{Y}(\gamma_s,\gamma_t)}{|s-t|}. $$

See [3, Theorem 1.1.2] for the simple proof. Notice that the absolute continuity property of the integral ensures that absolutely continuous functions can be extended by continuity to the closure of their domain.
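As a concrete check (not part of the text), the limit defining the metric speed can be approximated by difference quotients for explicit curves in the Euclidean plane; the curves and step size below are illustrative assumptions. A constant speed parametrization of a segment has constant metric speed equal to the distance between its endpoints, while a reparametrization of the same segment does not.

```python
import math

def dist(p, q):
    # Euclidean distance in the plane, playing the role of d_Y
    return math.hypot(p[0] - q[0], p[1] - q[1])

def metric_speed(curve, t, h=1e-6):
    # symmetric difference quotient approximating |γ̇_t| = lim_s d(γ_s,γ_t)/|s-t|
    return dist(curve(t + h), curve(t - h)) / (2 * h)

# Constant speed geodesic from (0,0) to (3,4): speed ≡ d((0,0),(3,4)) = 5.
geodesic = lambda t: (3 * t, 4 * t)

# Same segment, reparametrized: γ_t = t²·(3,4) has |γ̇_t| = 10t.
reparam = lambda t: (3 * t ** 2, 4 * t ** 2)

assert abs(metric_speed(geodesic, 0.3) - 5.0) < 1e-6
assert abs(metric_speed(reparam, 0.3) - 3.0) < 1e-4   # 10·0.3 = 3
```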

We will denote by C([0,1],Y) the space of continuous curves on [0,1] with values in Y endowed with the sup norm. The set AC^2([0,1],Y)⊂C([0,1],Y) consists of all absolutely continuous curves γ such that \(\int_{0}^{1}|\dot{\gamma}_{t}|^{2} \mathrm{d} {t}<\infty\): it is easily seen to be equal to the countable union of the closed sets \(\{\gamma:\int_{0}^{1}|\dot{\gamma}_{t}|^{2} \mathrm {d}{t}\leq n\}\), and thus it is a Borel subset of C([0,1],Y). The evaluation maps e_t:C([0,1],Y)→Y are defined by

$$\mathrm{e}_t(\gamma):=\gamma_t, $$

and are clearly 1-Lipschitz.

We say that a subset D of Y is geodesic if for any x,y∈D there exists a curve (γ_t)⊂D on [0,1] such that γ_0=x, γ_1=y and d_Y(γ_t,γ_s)=|t−s| d_Y(x,y) for all s,t∈[0,1]. Such a curve is called a constant speed geodesic, or simply a geodesic. The space of all geodesics in Y endowed with the sup distance will be denoted by \(\operatorname{Geo}(Y)\).

Given f:Y→ℝ∪{±∞} we define the slope (also called local Lipschitz constant) at points x where f(x)∈ℝ by

$$|D f|(x):=\mathop{\overline{\lim}}_{y\to x}\frac {|f(y)-f(x)|}{\mathsf{d}_{Y}(y,x)}. $$

We shall also need the one-sided counterparts of the slope called respectively descending slope and ascending slope:

$$ \big|D^- f\big|(x):=\mathop{\overline{\lim}}_{y\to x} \frac {[f(y)-f(x)]^-}{\mathsf{d}_{Y}(y,x)},\qquad \big|D^+ f\big|(x):=\mathop{\overline{\lim}}_{y\to x} \frac {[f(y)-f(x)]^+}{\mathsf{d}_{Y}(y,x)}, $$
(1)

where [⋅]^+ and [⋅]^− denote respectively the positive and negative part. Notice the change of notation w.r.t. previous works of the authors: the slope and its one-sided counterparts were denoted there by |∇f|, |∇^± f|. Yet, as remarked in [13], these notions, being defined in duality with the distance, are naturally cotangent notions, rather than tangent ones, whence the notation proposed here.

It is not difficult to see that for f Lipschitz the one-sided slopes and the local Lipschitz constant are upper gradients according to [18], namely

$$\biggl \vert \int_{\partial\gamma} f\biggr \vert \leq\int _\gamma\big|D^\pm f\big| $$

for any absolutely continuous curve γ:[0,1]→Y; here and in the following we write \(\int_{\partial\gamma} f\) for f(γ_1)−f(γ_0) and \(\int_\gamma g\) for \(\int_{0}^{1} g(\gamma_{s})|\dot{\gamma}_{s}| \mathrm{d}s\).

Also, for f,g:Y→ℝ Lipschitz and α,β∈ℝ it clearly holds

$$ \big|D (\alpha f+\beta g)\big|\leq|\alpha| |D f|+|\beta| |D g|, $$
(2a)

$$ \big|D (fg)\big|\leq|f| |D g|+|g| |D f|. $$
(2b)

2.2 The Space \((\mathscr{P}(X),W_{2})\)

Let (X,d) be a compact metric space. The set \(\mathscr{P}(X)\) consists of all Borel probability measures on X. As usual, if \(\mu\in\mathscr{P}(X)\) and T:X→Y is a μ-measurable map with values in the topological space Y, the push-forward measure \(T_\sharp\mu\in\mathscr{P}(Y)\) is defined by \(T_\sharp\mu(B):=\mu(T^{-1}(B))\) for every Borel set B⊂Y.

Given \(\mu,\nu\in\mathscr{P}(X)\), we define the Wasserstein distance W_2(μ,ν) between them as

$$ W_2^2(\mu,\nu):=\min\int\mathsf{d}^2(x,y) \mathrm{d}\boldsymbol {\gamma}(x,y), $$
(3)

where the minimum is taken among all Borel probability measures γ on X^2 such that

$$\pi^1_\sharp\boldsymbol{\gamma}=\mu,\qquad \pi^2_\sharp\boldsymbol{\gamma}=\nu;\quad \text{here } \pi^i:X^2\to X,\ \pi^i(x_1,x_2):=x_i. $$

Such measures are called admissible plans or couplings for the couple (μ,ν); a plan γ which realizes the minimum in (3) is called optimal, and we write γOpt(μ,ν). From the linearity of the admissibility condition we get that the squared Wasserstein distance is convex, i.e.:

$$ W_2^2 \bigl((1-\lambda) \mu_1+\lambda \mu_2,(1-\lambda)\nu_1+\lambda \nu_2 \bigr)\leq(1-\lambda) W_2^2( \mu_1,\nu_1)+\lambda W_2^2( \mu_2,\nu_2). $$
(4)

It is also well known (see e.g. Theorem 2.7 in [1]) that the Wasserstein distance metrizes the weak convergence of measures in \(\mathscr{P}(X)\), i.e. the weak convergence with respect to the duality with C(X); in particular \((\mathscr{P}(X),W_2)\) is a compact metric space.
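To build intuition for the minimum in (3), here is a small numerical sketch (the point clouds are made-up examples, not from the text). For uniform measures on n atoms, an optimal coupling can always be found among the n! permutation couplings (a consequence of the Birkhoff-von Neumann theorem), so tiny instances can be solved by brute force; translating a measure by a vector v moves it by exactly |v| in W_2.

```python
import itertools, math

def d2(p, q):
    # squared Euclidean cost d²(x,y) in the plane
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def w2_uniform(xs, ys):
    # W_2 between uniform measures on len(xs) atoms, by brute force over
    # permutation couplings (optimal for uniform marginals)
    n = len(xs)
    best = min(
        sum(d2(xs[i], ys[p[i]]) for i in range(n)) / n
        for p in itertools.permutations(range(n))
    )
    return math.sqrt(best)

mu = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
nu = [(x + 2.0, y) for (x, y) in mu]   # translate every atom by (2, 0)

# Translating a measure by the vector v = (2,0) moves it by W_2 exactly |v| = 2.
assert abs(w2_uniform(mu, nu) - 2.0) < 1e-12
```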

An equivalent definition of W 2 comes from the dual formulation of the transport problem:

$$ \frac{1}{2}W_2^2(\mu,\nu)= \sup_{\psi}\int_X\psi \mathrm{d}\mu +\int _X\psi^c \mathrm{d}\nu, $$
(5)

the supremum being taken among all Lipschitz functions ψ, where the c-transform in this formula is defined by

$$\psi^c(y):=\inf_{x\in X}\frac{\mathsf{d}^2(x,y)}{2}-\psi(x). $$

A function ψ:X→ℝ is said to be c-concave if ψ=ϕ c for some ϕ:X→ℝ. It is possible to prove that the supremum in (5) is always achieved by a c-concave function, and we will call any such function ψ a Kantorovich potential. We shall also use the fact that c-concave functions satisfy

$$ \psi^{cc}=\psi. $$
(6)

The (graph of the) c-superdifferential c ψ of a c-concave function ψ is the subset of X 2 defined by

$$\partial^c\psi:= \biggl\{(x,y)\ :\ \psi(x)+\psi^c(y)= \frac{\mathsf{d}^2(x,y)}{2} \biggr\}, $$

and the c-superdifferential ∂^c ψ(x) at x is the set of y's such that (x,y)∈∂^c ψ. A consequence of the compactness of X is that any c-concave function ψ is Lipschitz and that the set ∂^c ψ(x) is nonempty for any x∈X.

It is not difficult to see that if ψ is a Kantorovich potential for (μ,ν) and γ is a coupling for (μ,ν), then γ is optimal if and only if \(\operatorname{supp}(\boldsymbol{\gamma})\subset\partial^{c}\psi\).

If (X,d) is geodesic, then so is \((\mathscr{P}(X),W_2)\), and in this case a curve (μ_t) is a constant speed geodesic from μ_0 to μ_1 if and only if there exists a measure \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\) concentrated on \(\operatorname{Geo}(X)\) such that (e_t)_♯π=μ_t for all t∈[0,1] and (e_0,e_1)_♯π∈Opt(μ_0,μ_1). We will denote the set of such measures, called optimal geodesic plans, by \(\operatorname{GeoOpt}(\mu_{0},\mu_{1})\).

2.3 Geodesically Convex Functionals and Their Gradient Flows

Given a geodesic space (Y,d_Y) (in the following this will always be the Wasserstein space built over a geodesic space (X,d)), a functional E:Y→ℝ∪{+∞} is said to be K-geodesically convex (or simply K-convex) if for any y_0,y_1∈Y there exists a constant speed geodesic γ:[0,1]→Y such that γ_0=y_0, γ_1=y_1 and

$$E(\gamma_t)\leq(1-t)E(y_0)+tE(y_1)-\frac{K}{2}t(1-t)\mathsf{d}_{Y}^2(y_0,y_1), \quad\forall t\in[0,1]. $$

We will denote by D(E) the domain of E, i.e. D(E):={y:E(y)<∞}; if E is K-geodesically convex, then D(E) is geodesic.

An easy consequence of K-convexity is the fact that the descending slope defined in (1) can be computed as a sup, rather than as a limsup:

$$ |D^-E|(y)=\sup_{z\neq y} \biggl( \frac{E(y)-E(z)}{\mathsf{d}_{Y}(y,z)}+\frac{K}{2}\mathsf{d}_{Y}(y,z) \biggr)^+. $$
(7)
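As a quick illustration of (7) (not taken from the text), consider E(y)=y² on Y=ℝ, which is K-geodesically convex with K=2; in this case every z on the descent side of y attains the supremum, which equals |E′(y)|. The grid of sample competitors below is an arbitrary assumption.

```python
def sup_quotient(E, y, K, zs):
    # right-hand side of (7), evaluated over a finite grid of competitors z
    return max(
        max((E(y) - E(z)) / abs(y - z) + (K / 2) * abs(y - z), 0.0)
        for z in zs if z != y
    )

E = lambda y: y * y          # 2-geodesically convex on the real line
y = 1.0
zs = [y + k / 10 for k in range(-30, 31)]

# |D⁻E|(y) = |E'(y)| = 2|y| = 2; every z < y realizes the sup exactly here.
assert abs(sup_quotient(E, y, 2, zs) - 2.0) < 1e-9
```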

What we want to discuss here is the definition of gradient flow of a K-convex functional. There are essentially two different ways of giving such a notion in a metric setting. The first one, which we call Energy Dissipation Equality (EDE), ensures existence for any K-convex and lower semicontinuous functional (under suitable compactness assumptions); the second one, which we call Evolution Variational Inequality (EVI), ensures uniqueness and K-contractivity of the flow. However, the price we pay for these stronger properties is that existence results for EVI solutions hold under much more restrictive assumptions.

It is important to distinguish the two notions. The EDE one is the “correct one” to be used in a general metric context, because it ensures existence for any initial datum in the domain of the functional. However, gradient flows in the EDE sense are typically not unique: this is the reason for the analysis made in Sect. 5, which shows that for the special case of the entropy functional uniqueness does indeed hold.

EVI gradient flows are in particular gradient flows in the EDE sense (see Proposition 2.2); they ensure uniqueness and K-contractivity, and provide strong a priori regularizing effects. Heuristically speaking, existence of gradient flows in the EVI sense depends also on properties of the distance, rather than on properties of the functional only. A more or less correct way of thinking about this is: gradient flows in the EVI sense exist if and only if the distance is Hilbertian on small scales. For instance, if the underlying metric space is a Hilbert space, then the two notions coincide.

Now recall that one of our goals here is to study the gradient flow of the relative entropy in spaces with Ricci curvature bounded below (Definition 5.9), and recall that Finsler geometries are included in this setting (see page 926 of [32]). Thus, in general we must deal with the EDE notion of gradient flow. The EVI one will come into play in Sect. 9, where we use it to identify those spaces with Ricci curvature bounded below which are more ‘Riemannian like’.

Note: later on we will refer to gradient flows in the EDE sense simply as “gradient flows”, keeping the distinguished notation EVI-gradient flows for those in the EVI sense.

2.3.1 Energy Dissipation Equality

An important property of K-geodesically convex and lower semicontinuous functionals (see Corollary 2.4.10 of [3] or Proposition 3.19 of [1]) is that the descending slope is an upper gradient, that is: for any absolutely continuous curve y t :J⊂ℝ→D(E) it holds

$$ \big|E(y_t)-E(y_s)\big|\leq\int _t^s |\dot{y}_r|\big|D^- E\big|(y_r) \mathrm{d}r,\quad\forall t\leq s. $$
(8)

An application of Young's inequality \(ab\leq\frac{1}{2}a^{2}+\frac{1}{2}b^{2}\) gives that

$$ E(y_t)\leq E(y_s)+\frac{1}{2}\int _t^s|\dot{y}_r|^2 \mathrm{d}r +\frac{1}{2}\int_t^s|D^- E|^2(y_r) \mathrm{d}r,\quad\forall t\leq s. $$
(9)

This inequality motivates the following definition:

Definition 2.1

(Energy Dissipation Equality definition of gradient flow)

Let E be a K-convex and lower semicontinuous functional and let y 0D(E). We say that a continuous curve [0,∞)∋ty t is a gradient flow for E in the EDE sense (or simply a gradient flow) if it is locally absolutely continuous in (0,∞), it takes values in the domain of E and it holds

$$ E(y_t)= E(y_s)+\frac{1}{2}\int _t^s|\dot{y}_r|^2 \mathrm{d}r+\frac{1}{2}\int_t^s|D^- E|^2(y_r) \mathrm{d}r,\quad\forall t\leq s. $$
(10)

Notice that, due to (9), the equality (10) is equivalent to

$$ E(y_0)\geq E(y_s)+\frac{1}{2}\int _0^s|\dot{y}_r|^2 \mathrm{d}r+\frac{1}{2}\int_0^s|D^- E|^2(y_r) \mathrm{d}r,\quad\forall s>0. $$
(11)

Indeed, if (11) holds, then (10) holds with t=0, and then by the additivity of the integral (10) holds in general.

It is not hard to check that if E:ℝ^d→ℝ is a C^1 function, then a curve y_t:J→ℝ^d is a gradient flow according to the previous definition if and only if it satisfies

$$y_t'=-\nabla E(y_t),\quad\forall t\in J, $$

so that the metric definition reduces to the classical one when specialized to Euclidean spaces.
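The Euclidean case also suggests how such flows are constructed in metric spaces: by the implicit Euler ("minimizing movement") scheme y_{k+1} ∈ argmin_y E(y)+d²(y,y_k)/(2τ), which only uses the distance. The following sketch (the quadratic energy E(y)=y²/2 is an illustrative assumption, for which each minimization step is explicit) shows the discrete scheme converging to the solution y(t)=y_0 e^{−t} of y′=−∇E(y).

```python
import math

def minimizing_movement(y0, T, n_steps):
    # implicit Euler for E(y) = y²/2 on the real line:
    # y_{k+1} = argmin_y E(y) + |y - y_k|²/(2τ) has the closed form y_k/(1+τ)
    tau = T / n_steps
    y = y0
    for _ in range(n_steps):
        y = y / (1 + tau)
    return y

y0, T = 1.0, 1.0
exact = y0 * math.exp(-T)                     # solution of y' = -y at time T
assert abs(minimizing_movement(y0, T, 10000) - exact) < 1e-4
```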

The following theorem has been proved in [3] (Corollary 2.4.11):

Theorem 2.1

(Existence of gradient flows in the EDE sense)

Let (Y,d Y ) be a compact metric space and let E:Y→ℝ∪{+∞} be a K-geodesically convex and lower semicontinuous functional. Then every y 0D(E) is the starting point of a gradient flow in the EDE sense of E.

It is important to stress the fact that in general gradient flows in the EDE sense are not unique. A simple example is Y:=ℝ^2 endowed with the L^∞ norm, and E defined by E(x,y):=x. It is immediate to see that E is 0-convex and that for any point (x_0,y_0) there exist uncountably many gradient flows in the EDE sense starting from it, for instance all curves (x_0−t,y(t)) with |y′(t)|≤1 and y(0)=y_0.
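This non-uniqueness can even be checked numerically (a sketch with assumed discretization parameters, not from the text): with respect to the sup norm both curves below have metric speed 1 and |D⁻E|≡1 along them, so the two sides of (10) match for both, although the curves are different.

```python
import math

def sup_dist(p, q):
    # the L∞ distance on ℝ²
    return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

def ede_defect(curve, s, n=20000):
    # E(γ_0) - E(γ_s) - ½∫₀ˢ|γ̇|² - ½∫₀ˢ|D⁻E|², with E(x,y) = x and
    # |D⁻E| ≡ 1 for the sup norm; the integrals are Riemann sums
    E = lambda p: p[0]
    h = s / n
    speed_sq = sum(
        (sup_dist(curve((k + 1) * h), curve(k * h)) / h) ** 2 for k in range(n)
    ) * h
    return E(curve(0.0)) - E(curve(s)) - 0.5 * speed_sq - 0.5 * s

flow1 = lambda t: (-t, 0.0)
flow2 = lambda t: (-t, math.sin(t))   # second component satisfies |y'(t)| ≤ 1

assert abs(ede_defect(flow1, 1.0)) < 1e-3   # both defects vanish: both curves
assert abs(ede_defect(flow2, 1.0)) < 1e-3   # are EDE gradient flows from (0,0)
```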

2.3.2 Evolution Variational Inequality

To see where the EVI notion comes from, notice that for a K-convex and smooth function f on ℝ^d it holds y_t'=−∇f(y_t) for any t≥0 if and only if

$$ \frac{\mathrm{d}}{\mathrm{d}t}\frac {|y_t-z|^2}{2}+\frac{K}{2}|y_t-z|^2+f(y_t) \leq f(z),\quad\forall z\in\mathbb{R}^d,\ \forall t\geq 0. $$
(12)

This equivalence is true because K-convexity ensures that v=−∇f(y) if and only if

$$\langle v,y-z\rangle+\frac{K}{2}|y-z|^2+f(y)\leq f(z),\quad\forall z \in \mathbb{R}^d. $$

Inequality (12) can be written in a metric context in several ways, which we collect in the following statement (we omit the easy proof).

Proposition 2.1

(Evolution Variational Inequality: equivalent statements)

Let (Y,d Y ) be a complete and separable metric space, E:Y→(−∞,∞] a lower semicontinuous functional, and (y t ) a locally absolutely continuous curve in Y. Then the following properties are equivalent:

  (i)

    For any zD(E) it holds

    $$\frac{\mathrm{d}}{\mathrm{d}t}\frac{\mathsf {d}_{Y}^2(y_t,z)}{2}+\frac{K}{2}\mathsf{d}_{Y}^2(y_t,z)+E(y_t) \leq E(z),\quad\text{\textit{for a.e.}}\ t\in (0,\infty). $$
  (ii)

    For any zD(E) it holds ∀0<t<s<∞

    $$\frac{\mathsf{d}_{Y}^2(y_s,z)-\mathsf{d}_{Y}^2(y_t,z)}{2}+\frac{K}{2}\int_t^s \mathsf{d}_{Y}^2(y_r,z) \mathrm{d}r+\int _t^s E(y_r) \mathrm{d}r\leq (s-t)E(z). $$
  (iii)

    There exists a set AD(E) dense in energy (i.e., for any zD(E) there exists (z n )⊂A converging to z such that E(z n )→E(z)) such that for any zA it holds

    $$\mathop{\overline{\lim}}_{h\downarrow0}\frac{\mathsf {d}_{Y}^2(y_{t+h},z)-\mathsf{d}_{Y}^2(y_t,z)}{2h} +\frac{K}{2} \mathsf{d}_{Y}^2(y_t,z)+E(y_t)\leq E(z),\quad\forall t\in (0,\infty). $$

Definition 2.2

(Evolution Variational Inequality definition of gradient flow)

We say that a curve (y_t) is a gradient flow of E in the EVI sense relative to K∈ℝ (in short, an EVI_K-gradient flow) if any of the above equivalent properties holds. We say that y_t starts from y_0 if y_t→y_0 as t↓0.

This definition of gradient flow is stronger than the one discussed in the previous section, because of the following result proved by the third author in [29] (see also Proposition 3.6 of [1]), which we state without proof.

Proposition 2.2

(EVI implies EDE)

Let (Y,d Y ) be a complete and separable metric space, K∈ℝ, E:Y→(−∞,∞] a lower semicontinuous functional and y t :(0,∞)→D(E) a locally absolutely continuous curve. Assume that y t is an EVI K -gradient flow for E. Then (10) holds for any 0<t<s.

Remark 2.1

(Contractivity)

It can be proved that if (y t ) and (z t ) are gradient flows in the EVI K sense of the l.s.c. functional E, then

$$\mathsf{d}_{Y}(y_t,z_t)\leq e^{-Kt} \mathsf{d}_{Y}(y_0,z_0),\quad \forall t \geq0. $$

In particular, gradient flows in the EVI sense are unique. This contractivity property, used in conjunction with (ii) of Proposition 2.1, guarantees that if existence of gradient flows in the EVI sense is known for initial data lying in some subset SY, then it is also known for initial data in the closure \(\overline{S}\) of S.

We also point out the following geometric consequence of the EVI, proven in [10].

Proposition 2.3

Let E:Y→(−∞,∞] be a lower semicontinuous functional on a complete space (Y,d Y ). Assume that every y 0D(E) is the starting point of an EVI K -gradient flow of E. Then E is K-convex along all geodesics contained in \(\overline {D(E)}\).

As we already said, gradient flows in the EVI sense do not necessarily exist, and their existence depends on the properties of the distance d_Y. For instance, it is not hard to see that if we endow ℝ^2 with the L^∞ norm and consider the functional E(x,y):=x, then there is no gradient flow in the EVI_K sense, regardless of the constant K.

3 Hopf-Lax Formula and Hamilton-Jacobi Equation

The aim of this section is to study the properties of the Hopf-Lax formula in a metric setting and its relations with the Hamilton-Jacobi equation. Here we assume that (X,d) is a compact metric space. Notice that there is no reference measure \({\mathfrak{m}}\) in the discussion.

Let f:X→ℝ be a Lipschitz function. For t>0 define

$$F(t,x,y):=f(y)+\frac{\mathsf{d}^2(x,y)}{2t}, $$

and the function Q t f:X→ℝ by

$$Q_tf(x):=\inf_{y\in X}F(t,x,y)=\min_{y\in X}F(t,x,y). $$

Also, we introduce the functions D^+,D^−:X×(0,∞)→ℝ as

$$ D^+(x,t):=\max_{y}\mathsf{d}(x,y),\qquad D^-(x,t):=\min_{y}\mathsf{d}(x,y), $$
(13)

where, in both cases, the y's vary among all minima of F(t,x,⋅). We also set Q_0f=f and D^±(x,0)=0. Thanks to the continuity of F and the compactness of X, it is easy to check that the map [0,∞)×X∋(t,x)↦Q_tf(x) is continuous. Furthermore, the fact that f is Lipschitz easily yields

$$ D^-(x,t)\leq D^+(x,t)\leq2t\operatorname{Lip}(f), $$
(14)

and from the fact that the functions {d^2(⋅,y)}_{y∈X} are uniformly Lipschitz (because (X,d) is bounded) we get that Q_tf is Lipschitz for any t>0.
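These objects are easy to experiment with numerically. In the sketch below (a discretized interval X=[0,1] and the Lipschitz datum f(x)=|x−0.3| are illustrative assumptions, not from the text) we enumerate the minimizers of F(t,x,⋅) on a grid and observe the monotonicity D^+(x,t)≤D^−(x,s) for t<s, which is the content of Proposition 3.4 below.

```python
# Hopf-Lax on a discretized interval: X = [0,1] with the Euclidean distance.
N = 200
X = [i / N for i in range(N + 1)]
f = lambda x: abs(x - 0.3)            # a sample Lipschitz initial datum

def F(t, x, y):
    return f(y) + (x - y) ** 2 / (2 * t)

def minimizers(t, x, tol=1e-12):
    # grid points realizing the minimum of F(t, x, ·)
    m = min(F(t, x, y) for y in X)
    return [y for y in X if F(t, x, y) <= m + tol]

def D_plus(x, t):
    return max(abs(x - y) for y in minimizers(t, x))

def D_minus(x, t):
    return min(abs(x - y) for y in minimizers(t, x))

x = 0.8
# monotonicity (15): D⁺(x,t) ≤ D⁻(x,s) whenever t < s
for t, s in [(0.1, 0.2), (0.2, 0.5), (0.5, 1.0)]:
    assert D_plus(x, t) <= D_minus(x, s) + 1e-9
```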

Proposition 3.4

(Monotonicity of D ±)

For all xX it holds

$$ D^+(x,t)\leq D^-(x,s),\quad0\leq t< s. $$
(15)

As a consequence, D^+(x,⋅) and D^−(x,⋅) are both nondecreasing, and they coincide with at most countably many exceptions in [0,∞).

Proof

Fix x∈X. For t=0 there is nothing to prove. Now pick 0<t<s and choose minimizers x_t and x_s of F(t,x,⋅) and F(s,x,⋅) respectively, such that d(x,x_t)=D^+(x,t) and d(x,x_s)=D^−(x,s). The minimality of x_t,x_s gives

$$ \begin{aligned} f(x_t)+\frac{\mathsf{d}^2(x_t,x)}{2t}&\leq f(x_s)+\frac{\mathsf{d}^2(x_s,x)}{2t}, \\ f(x_s)+\frac{\mathsf{d}^2(x_s,x)}{2s}&\leq f(x_t)+\frac{\mathsf{d}^2(x_t,x)}{2s}. \end{aligned} $$

Adding up and using the fact that \(\frac{1}{t}\geq\frac{1}{s}\) we deduce

$$D^+(x,t)=\mathsf{d}(x_t,x)\leq\mathsf{d}(x_s,x)= D^-(x,s), $$

which is (15).

Combining this with the inequality D^−≤D^+ we immediately obtain that both functions are nondecreasing. At a point of right continuity of D^−(x,⋅) we get

$$D^+(x,t)\leq\inf_{s>t}D^-(x,s)=D^-(x,t). $$

This implies that the two functions coincide out of a countable set. □

Next, we examine the semicontinuity properties of D^±. These properties imply that points (x,t) where the equality D^+(x,t)=D^−(x,t) occurs are continuity points for both D^+ and D^−.

Proposition 3.5

(Semicontinuity of D ±)

The map D + is upper semicontinuous and the map D is lower semicontinuous in X×(0,∞).

Proof

We prove the lower semicontinuity of D^−, the proof of the upper semicontinuity of D^+ being similar. Let (x_i,t_i) be any sequence converging to (x,t) and, for every i, let y_i be a minimizer of F(t_i,x_i,⋅) for which d(y_i,x_i)=D^−(x_i,t_i). For all i we have

$$f(y_i)+\frac{\mathsf{d}^2(y_i,x_i)}{2t_i}=Q_{t_i}f(x_i). $$

Moreover, the continuity of (x,t)↦Q t f(x) gives that \(\lim_{i}Q_{t_{i}}f(x_{i})=Q_{t}f(x)\), thus

$$\lim_{i\to\infty} \biggl(f(y_i)+\frac{\mathsf{d}^2(y_i,x)}{2t} \biggr)=Q_tf(x). $$

This means that (y i ) is a minimizing sequence for F(t,x,⋅). Since (X,d) is compact, possibly passing to a subsequence, not relabeled, we may assume that (y i ) converges to y as i→∞. Therefore

$$D^-(x,t)\leq\mathsf{d}(x,y)=\lim_{i\to\infty}\mathsf {d}(x,y_i)= \lim_{i\to \infty}D^-(x_i,t_i). $$

 □

Proposition 3.6

(Time derivative of Q t f)

The map t↦Q_tf is Lipschitz from [0,∞) to C(X) and, for all x∈X, it satisfies

$$ \frac{\mathrm{d}}{\mathrm{d}t}Q_tf(x)=-\frac{[D^{\pm}(x,t)]^2}{2t^2}, $$
(16)

for any t>0 with at most countably many exceptions.

Proof

Let t<s and let x_t, x_s be minima of F(t,x,⋅) and F(s,x,⋅) respectively. We have

$$ \begin{aligned} Q_sf(x)-Q_tf(x)&\leq F(s,x,x_t)-F(t,x,x_t)=\frac{\mathsf{d}^2(x,x_t)}{2}\frac{t-s}{ts}, \\ Q_sf(x)-Q_tf(x)&\geq F(s,x,x_s)-F(t,x,x_s)=\frac{\mathsf{d}^2(x,x_s)}{2}\frac{t-s}{ts}, \end{aligned} $$

which gives that t↦Q_tf(x) is Lipschitz in (ε,+∞) for any ε>0 and x∈X. Also, dividing by (s−t) and taking Proposition 3.4 into account, we get (16). Now notice that from (14) we get that \(|\frac{\mathrm {d}}{\mathrm{d} t}Q_{t}f(x)|\leq2\operatorname{Lip}^{2}(f)\) for any x and a.e. t, which, together with the pointwise convergence of Q_tf to f as t↓0, yields that t↦Q_tf∈C(X) is Lipschitz in [0,∞). □

Proposition 3.7

(Bound on the local Lipschitz constant of Q t f)

For (x,t)∈X×(0,∞) it holds:

$$ |D Q_tf|(x)\leq\frac{D^+(x,t)}{t}. $$
(17)

Proof

Fix x∈X and t∈(0,∞), pick a sequence (x_i) converging to x and a corresponding sequence (y_i) of minimizers for F(t,x_i,⋅), and similarly a minimizer y of F(t,x,⋅). We start by proving that

$$\mathop{\overline{\lim}}_{i\to\infty}\frac {Q_tf(x)-Q_tf(x_i)}{d(x,x_i)}\leq \frac{D^+(x,t)}{t}. $$

Since it holds

$$ \begin{aligned} Q_tf(x)-Q_tf(x_i)&\leq F(t,x,y_i)-F(t,x_i,y_i)=\frac{\mathsf{d}^2(x,y_i)-\mathsf{d}^2(x_i,y_i)}{2t} \\ &\leq\frac{\mathsf{d}(x,x_i) \bigl(\mathsf{d}(x,y_i)+\mathsf{d}(x_i,y_i) \bigr)}{2t}, \end{aligned} $$

dividing by d(x,x_i), letting i→∞ and using the upper semicontinuity of D^+ we get the claim. To conclude, we need to show that

$$\mathop{\overline{\lim}}_{i\to\infty}\frac {Q_tf(x_i)-Q_tf(x)}{d(x,x_i)}\leq \frac{D^+(x,t)}{t}. $$

This follows along similar lines starting from the inequality

$$Q_tf(x_i)-Q_tf(x)\leq F(t,x_i,y)-F(t,x,y_i). $$

 □

Theorem 3.2

(Subsolution of HJ)

For every xX it holds

$$ \frac{\mathrm{d}}{\mathrm{d}t}Q_tf(x)+\frac{1}{2}|D Q_tf|^2(x)\leq0 $$
(18)

with at most countably many exceptions in (0,∞).

Proof

The claim is a direct consequence of Proposition 3.6 and Proposition 3.7. □

We just proved that in an arbitrary metric space the Hopf-Lax formula produces subsolutions of the Hamilton-Jacobi equation. Our aim now is to prove that if (X,d) is a geodesic space, then the same formula provides also supersolutions.

Theorem 3.3

(Supersolution of HJ)

Assume that (X,d) is a geodesic space. Then equality holds in (17). In particular, for all xX it holds

$$\frac{\mathrm{d}}{\mathrm{d}t}Q_tf(x)+\frac{1}{2} |D Q_tf|^2(x)=0, $$

with at most countably many exceptions in (0,∞).

Proof

Let y be a minimum of F(t,x,⋅) such that d(x,y)=D^+(x,t). Let γ:[0,1]→X be a constant speed geodesic connecting x to y. We have

$$Q_tf(x)-Q_tf(\gamma_s)\geq F(t,x,y)-F(t,\gamma_s,y)=\frac{\mathsf{d}^2(x,y)-\mathsf{d}^2(\gamma_s,y)}{2t} =\frac{1-(1-s)^2}{2t}\bigl[D^+(x,t)\bigr]^2. $$

Therefore we obtain

$$\mathop{\overline{\lim}}_{s\downarrow0}\frac{Q_tf(x)-Q_tf(\gamma_s)}{\mathsf{d}(x,\gamma_s)} =\mathop{\overline{ \lim}}_{s\downarrow0}\frac{Q_tf(x)-Q_tf(\gamma_s)}{sD^+(x,t)}\geq \frac{D^+(x,t)}{t}. $$

Since s↦γ_s is a particular family converging to x as s↓0, we deduce

$$\big|D^-Q_tf\big|(x)\geq\frac{D^+(x,t)}{t}. $$

Taking into account Proposition 3.6 and Proposition 3.7 we conclude. □

4 Weak Definitions of Gradient

In this section we introduce two weak notions of ‘norm of the differential’, one inspired by Cheeger’s seminal paper [9], that we call minimal relaxed slope and denote by |Df|_*, and one inspired by the papers of Koskela-MacManus [20] and of Shanmugalingam [30], that we call minimal weak upper gradient and denote by |Df|_w. Notice that, as for the slopes, the objects that we are going to define are naturally in duality with the distance, and thus are cotangent notions: that is why we use ‘D’ instead of ‘∇’ in the notation. Still, we will continue speaking of upper gradients and their weak counterparts to be aligned with the convention used in the literature (see [13] for a broader discussion of this distinction between tangent and cotangent objects and its effects on calculus).

We compare our concepts with those of the original papers in Sect. 4.4, where we show that all these approaches a posteriori coincide. As usual, we adopt the simplifying assumption that \((X,\mathsf{d},{\mathfrak{m}})\) is a compact and normalized metric measure space, i.e. (X,d) is compact and \({\mathfrak{m}}\in\mathscr{P}(X)\).

4.1 The “Vertical” Approach: Minimal Relaxed Slope

Definition 4.3

(Relaxed slopes)

We say that \(G\in L^{2}(X,{\mathfrak{m}})\) is a relaxed slope of \(f\in L^{2}(X,{\mathfrak{m}})\) if there exist \(\tilde{G}\in L^{2}(X,{\mathfrak{m}})\) and Lipschitz functions f n :X→ℝ such that:

  (a)

    f n f in \(L^{2}(X,{\mathfrak{m}})\) and |Df n | weakly converges to \(\tilde{G}\) in \(L^{2}(X,{\mathfrak{m}})\);

  (b)

    \(\tilde{G}\leq G\) \({\mathfrak{m}}\)-a.e. in X.

We say that G is the minimal relaxed slope of f if its \(L^{2}(X,{\mathfrak{m}})\) norm is minimal among relaxed slopes. We shall denote by |Df|_* the minimal relaxed slope.

Using Mazur’s lemma and (2a) (see Proposition 4.8) it is possible to show that an equivalent characterization of relaxed slopes can be given by modifying (a) as follows: \(\tilde{G}\) is the strong limit in \(L^{2}(X,{\mathfrak{m}})\) of G_n≥|Df_n|. The definition of relaxed slope we gave is useful to show existence of relaxed slopes (as soon as an approximating sequence (f_n) with |Df_n| bounded in \(L^{2}(X,{\mathfrak{m}})\) exists), while the equivalent characterization is useful to perform diagonal arguments and to show that the class of relaxed slopes is a closed convex set. Therefore the definition of |Df|_* is well posed.

Lemma 4.1

(Locality)

Let G 1,G 2 be relaxed slopes of f. Then min{G 1,G 2} is a relaxed slope as well. In particular, for any relaxed slope G it holds

$$|D f|_*\leq G\quad \mathfrak{m}\hbox{-a.e.~in}\ X. $$

Proof

It is sufficient to prove that if BX is a Borel set, then χ B G 1+χ XB G 2 is a relaxed slope of f. By approximation, taking into account the closure of the class of relaxed slopes, we can assume with no loss of generality that B is an open set. We fix r>0 and a Lipschitz function ϕ r :X→[0,1] equal to 0 on XB r and equal to 1 on B 2r , where the open sets B s B are defined by

$$B_s:= \bigl\{x\in X: \operatorname{dist}(x,X\setminus B)> s \bigr\}\subset B. $$

Let now f_{n,i}, i=1,2, be Lipschitz functions converging to f in \(L^{2}(X,{\mathfrak{m}})\) as n→∞, with |Df_{n,i}| weakly convergent to G_i, and set f_n:=ϕ_r f_{n,1}+(1−ϕ_r)f_{n,2}. Then |Df_n|=|Df_{n,1}| on B_{2r} and |Df_n|=|Df_{n,2}| on \(X\setminus\overline{B_{r}}\); in \(\overline{B_{r}}\setminus B_{2r}\), by applying (2a) and (2b), we can estimate

$$|D f_n|\leq|D f_{n,2}|+\operatorname{Lip}(\phi_r)|f_{n,1}-f_{n,2}|+ \phi_r \bigl(|D f_{n,1}|+|D f_{n,2}| \bigr). $$

Since \(\overline{B_{r}}\subset B\), by taking weak limits of a subsequence, it follows that

$$\chi_{B_{2r}}G_1+\chi_{X\setminus\overline{B_r}}G_2+ \chi_{B\setminus B_{2r}}(G_1+2G_2) $$

is a relaxed slope of f. Letting r↓0 gives that χ B G 1+χ XB G 2 is a relaxed slope as well.

For the second part of the statement argue by contradiction: let G be a relaxed slope of f and assume that B={G<|Df|_*} is such that \({\mathfrak{m}}(B)>0\). Consider the relaxed slope \(G\chi_{B}+|Df|_*\chi_{X\setminus B}\): its L^2 norm is strictly less than the L^2 norm of |Df|_*, which is a contradiction. □

A trivial consequence of the definition and of the locality principle we just proved is that if f:X→ℝ is Lipschitz it holds:

$$ |D f|_*\leq|D f|\quad \mathfrak{m}\hbox{-a.e. in}\ X. $$
(19)

We also remark that it is possible to obtain the minimal relaxed slope as a strong limit in L^2 of slopes of Lipschitz functions, and not only as a weak limit, as shown in the next proposition.

Proposition 4.8

(Strong approximation)

If \(f\in L^{2}(X,{\mathfrak{m}})\) has a relaxed slope, there exist Lipschitz functions f n convergent to f in \(L^{2}(X,{\mathfrak{m}})\) with |Df n | convergent to |Df| in \(L^{2}(X,{\mathfrak{m}})\).

Proof

If g_i→f in L^2 and |Dg_i| weakly converges to |Df|_* in L^2, by Mazur’s lemma we can find a sequence of convex combinations

$$G_h=\sum_{i=N_h+1}^{N_{h+1}} \alpha_{h,i}|D g_i|,\quad \text{with}\ \alpha_{h,i}\ge0,\quad \sum_{i=N_h+1}^{N_{h+1}} \alpha_{h,i}=1,\quad N_h\to\infty $$

of |Dg i | strongly convergent to |Df| in L 2; the corresponding convex combinations of g i , that we shall denote by f h , still converge in L 2 to f and |Df h | is dominated by G h . It follows that

$$\mathop{\overline{\lim}}_{h\to\infty}\int_X|D f_h|^2 \mathrm {d} {\mathfrak{m}}\leq \mathop{ \overline{\lim}}_{h\to\infty}\int_X G_h^2 \mathrm {d} {\mathfrak{m}}=\int_X|D f|_*^2 \mathrm{d} {\mathfrak{m}}. $$

This implies at once that |Df_h| weakly converges to |Df|_* (because any limit point in the weak topology is a relaxed slope with minimal norm) and that the convergence is strong. □

Theorem 4.4

The Cheeger energy functional

$$ \mathsf{Ch}(f):=\frac{1}{2}\int_X |D f|_*^2 \mathrm{d} {\mathfrak{m}}, $$
(20)

set to +∞ if f has no relaxed slope, is convex and lower semicontinuous in \(L^{2}(X,{\mathfrak{m}})\).

Proof

A simple byproduct of condition (2a) is that αF+βG is a relaxed slope of αf+βg whenever α,β are nonnegative constants and F,G are relaxed slopes of f,g respectively. Taking F=|Df| and G=|Dg| yields the convexity of Ch, while lower semicontinuity follows by a simple diagonal argument based on the strong approximation property stated in Proposition 4.8. □

Proposition 4.9

(Chain rule)

If \(f\in L^{2}(X,{\mathfrak{m}})\) has a relaxed slope and ϕ:ℝ→ℝ is Lipschitz and C 1, then \(|D (\phi(f))|_{*}=|\phi'(f)||D f|_{*}\) \({\mathfrak{m}}\)-a.e. in X.

Proof

For Lipschitz functions g we trivially have |D(ϕ(g))|≤|ϕ′(g)||Dg|. If we apply this inequality to the “optimal” approximating sequence of Lipschitz functions given by Proposition 4.8 we get that \(|\phi'(f)||D f|_{*}\) is a relaxed slope of ϕ(f), so that \(|D (\phi(f))|_{*}\leq|\phi'(f)||D f|_{*}\) \({\mathfrak{m}}\)-a.e. in X. Applying twice this inequality with ϕ(r):=−r we get \(|D f|_{*}\leq|D (-f)|_{*}\leq|D f|_{*}\) and thus \(|D f|_{*}=|D (-f)|_{*}\) \({\mathfrak{m}}\)-a.e. in X.

Up to a simple rescaling, we can assume |ϕ′|≤1. Let ψ 1(z):=z−ϕ(z); notice that \(\psi_{1}'\geq0\) and thus \({\mathfrak{m}}\)-a.e. on f −1({ϕ′≥0}) it holds

$$|D f|_*\leq\big|D \bigl(\phi(f)\bigr)\big|_*+ \big|D \bigl(\psi_1(f)\bigr)\big|_*\leq \phi'(f)|D f|_*+\psi_1'(f)|D f|_*=|D f|_*, $$

hence all the inequalities must be equalities, which forces \(|D (\phi(f))|_{*}=\phi'(f)|D f|_{*}\) \({\mathfrak{m}}\)-a.e. on f −1({ϕ′≥0}). Similarly, let ψ 2(z):=−z−ϕ(z) and notice that \(\psi_{2}'\leq 0\), so that \({\mathfrak{m}}\)-a.e. on f −1({ϕ′≤0}) it holds

$$ \begin{aligned} |D f|_*&=\big|D (-f)\big|_*\leq \big|D \bigl(\phi(f)\bigr)\big|_*+\big|D \bigl(\psi_2(f)\bigr)\big|_* \\ &\leq -\phi'(f)|D f|_*-\psi_2'(f)|D f|_*=|D f|_*. \end{aligned} $$

As before we can conclude that \(|D (\phi(f))|_{*}=-\phi'(f)|D f|_{*}\) \({\mathfrak{m}}\)-a.e. on f −1({ϕ′≤0}). □

Still by approximation, it is not difficult to show that ϕ(f) has a relaxed slope if ϕ is Lipschitz, and that \(|D (\phi(f))|_{*}=|\phi'(f)||D f|_{*}\) \({\mathfrak{m}}\)-a.e. in X. In this case ϕ′(f) is undefined at points x such that ϕ is not differentiable at f(x); on the other hand the formula still makes sense because \(|D f|_{*}=0\) \({\mathfrak{m}}\)-a.e. on f −1(N) for any Lebesgue negligible set N⊂ℝ. Particularly useful is the case when ϕ is a truncation function, for instance ϕ(z)=min{z,M}. In this case

$$\big|D \min\{f,M\}\big|_*= \begin{cases} |D f|_*&\text{if}\ f(x)<M,\\ 0&\text{if}\ f(x)\geq M. \end{cases} $$

Analogous formulas hold for truncations from below.

4.1.1 Laplacian: Definition and Basic Properties

Since the domain of Ch is dense in \(L^{2}(X,{\mathfrak{m}})\) (it includes Lipschitz functions), the Hilbertian theory of gradient flows (see for instance [3, 8]) can be applied to Cheeger’s functional (20) to provide, for all \(f_{0}\in L^{2}(X,{\mathfrak{m}})\), a locally Lipschitz continuous map t↦f t from (0,∞) to \(L^{2}(X,{\mathfrak{m}})\), with f t →f 0 in \(L^{2}(X,{\mathfrak{m}})\) as t↓0, whose derivative satisfies

$$ \frac{\mathrm{d}}{\mathrm{d}t}f_t\in-\partial\mathsf {Ch}(f_t)\quad\text{for a.e.}\ t. $$
(21)

Here ∂Ch(g) denotes the subdifferential of Ch at g∈D(Ch) in the sense of convex analysis, i.e.

$$\partial\mathsf{Ch}(g):= \biggl\{\xi\in L^2(X,{\mathfrak{m}}): \mathsf{Ch}(f)\geq\mathsf{Ch}(g)+\int_X\xi(f-g) \mathrm {d} \mathfrak{m} \ \forall f\in L^2(X,{\mathfrak{m}}) \biggr\}. $$

Another important regularizing effect of gradient flows of convex l.s.c. functionals lies in the fact that for every t>0 (the opposite of) the right derivative \(-\frac{\mathrm{d}}{{\mathrm{d}t}_{+}} f_{t}=\lim_{h\downarrow0}\frac{1}{h}(f_{t}-f_{t+h})\) exists and it is actually the element with minimal \(L^{2}(X,{\mathfrak{m}})\) norm in ∂Ch(f t ). This motivates the next definition:

Definition 4.4

(Laplacian)

The Laplacian Δf of \(f\in L^{2}(X,{\mathfrak{m}})\) is defined for those f such that ∂Ch(f)≠∅. For those f, −Δf is the element of minimal \(L^{2}(X,{\mathfrak{m}})\) norm in ∂Ch(f). The set of all such f is the domain D(Δ) of Δ.

Remark 4.2

(Potential lack of linearity)

It should be observed that in general the Laplacian—as we just defined it—is not a linear operator: the potential lack of linearity is strictly related to the fact that the space \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\) may fail to be Hilbert, because \(f\mapsto\int|D f|_{*}^{2} \mathrm{d}{\mathfrak{m}}\) need not be quadratic. For instance if X=ℝ2, \({\mathfrak{m}}\) is the Lebesgue measure and d is the distance induced by the L ∞ norm, then it is easily seen that

$$|D f|_*^2= \biggl( \bigg|\frac{\partial f}{\partial x} \bigg|+ \bigg|\frac{\partial f}{\partial y} \bigg| \biggr)^2. $$
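As a quick numerical sanity check of this formula (an illustration, not part of the original argument): for the linear function f(x,y)=ax+by the local Lipschitz constant w.r.t. the L ∞ distance is \(\sup_{\|v\|_\infty\le1}|av_1+bv_2|=|a|+|b|\), the L 1 norm of the gradient, which a brute-force search over the boundary of the unit ball reproduces.

```python
# Slope of the linear f(x, y) = a*x + b*y w.r.t. the l^infinity distance:
# sup_{||v||_inf <= 1} |a*v1 + b*v2|, searched on the boundary square.
a, b = 1.7, -0.4
n = 2000
ts = [-1.0 + 2.0 * k / n for k in range(n + 1)]
boundary = [(s, t) for t in ts for s in (-1.0, 1.0)] \
         + [(t, s) for t in ts for s in (-1.0, 1.0)]
slope = max(abs(a * v1 + b * v2) for v1, v2 in boundary)
print(slope)  # equals |a| + |b|, attained at a corner of the square
```

The supremum is attained at a corner of the square, which is the familiar fact that the dual of the \(\ell^\infty\) norm is the \(\ell^1\) norm.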

Even though the Laplacian is not linear, the trivial implication

$$v\in\partial\mathsf{Ch}(f)\quad\Rightarrow\quad\lambda v\in \partial \mathsf{Ch}(\lambda f),\quad\forall\lambda\in\mathbb{R}, $$

ensures that the Laplacian (and so the gradient flow of Ch) is 1-homogeneous.
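The displayed implication can be checked directly from the 2-homogeneity \(\mathsf{Ch}(\lambda h)=\lambda^{2}\mathsf{Ch}(h)\): for λ≠0 and v∈∂Ch(f), writing any \(g\in L^{2}(X,{\mathfrak{m}})\) as g=λh we get

```latex
\mathsf{Ch}(g)=\lambda^2\,\mathsf{Ch}(h)
\geq\lambda^2\Bigl(\mathsf{Ch}(f)+\int_X v\,(h-f)\,\mathrm{d}\mathfrak{m}\Bigr)
=\mathsf{Ch}(\lambda f)+\int_X \lambda v\,(g-\lambda f)\,\mathrm{d}\mathfrak{m},
```

i.e. λv∈∂Ch(λf); the case λ=0 is trivial because Ch≥0=Ch(0).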

We can now write

$$\frac{\mathrm{d}}{\mathrm{d}t}f_t=\Delta f_t $$

for gradient flows f t of Ch, the derivative being understood in \(L^{2}(X,{\mathfrak{m}})\), in accordance with the classical case. The classical Hilbertian theory of gradient flows also ensures that

$$ \lim_{t\to\infty}\mathsf{Ch}(f_t)=0\quad\text {and}\quad \frac{\mathrm{d}}{\mathrm{d}t}\mathsf{Ch}(f_t)=-\|\Delta f_t\|^2_{L^2(X,{\mathfrak{m}})},\quad \text{for a.e.}\ t\in(0, \infty). $$
(22)

Proposition 4.10

(Integration by parts)

For all fD(Δ), gD(Ch) it holds

$$ \biggl \vert \int_X g\Delta f \mathrm{d}\mathfrak {m}\biggr \vert \leq\int_X |D g|_*|D f|_* \mathrm{d} {\mathfrak{m}}. $$
(23)

Also, let fD(Δ) and ϕC 1(ℝ) with bounded derivative on an interval containing the image of f. Then

$$ \int_X \phi(f)\Delta f \mathrm{d} { \mathfrak{m}}= -\int_X |D f|_*^2 \phi'(f) \mathrm{d} {\mathfrak{m}}. $$
(24)

Proof

Since −Δf∈∂Ch(f) it holds

$$\mathsf{Ch}(f)-\int_X \varepsilon g\Delta f \mathrm{d} \mathfrak {m}\leq\mathsf{Ch}(f+\varepsilon g),\quad \forall g\in L^2(X,{\mathfrak{m}}),\ \varepsilon\in\mathbb{R}. $$

For ε>0, \(|D f|_{*}+\varepsilon|D g|_{*}\) is a relaxed slope of f+εg (possibly not minimal). Thus it holds \(2\mathsf {Ch}(f+\varepsilon g)\leq\int_{X}(|D f|_{*}+\varepsilon|D g|_{*})^{2} \mathrm{d}\mathfrak {m}\) and therefore

$$-\varepsilon\int_X g\Delta f \mathrm{d} {\mathfrak{m}}\leq \varepsilon\int_X |D f|_*|D g|_* \mathrm{d} {\mathfrak{m}}+ \frac{\varepsilon^2}{2}\int_X |D g|_*^2 \mathrm{d} {\mathfrak{m}}. $$

Dividing by ε, letting ε↓0 and then repeating the argument with −g in place of g we get (23).

For the second part we recall that, by the chain rule, \(|D (f+\varepsilon\phi(f))|_{*}=(1+\varepsilon\phi'(f))|D f|_{*}\) for |ε| small enough. Hence

$$ \begin{aligned} \mathsf{Ch}\bigl(f+\varepsilon\phi(f)\bigr)- \mathsf{Ch}(f)&= \frac{1}{2}\int_X|D f|_*^2 \bigl(\bigl(1+\varepsilon \phi'(f) \bigr)^2-1 \bigr) \mathrm{d} {\mathfrak{m}} \\ &=\varepsilon\int_X|D f|_*^2 \phi'(f) \mathrm{d} {\mathfrak{m}}+o(\varepsilon), \end{aligned} $$

which implies that for any v Ch(f) it holds \(\int_{X}v \phi(f) \mathrm{d}{\mathfrak{m}}=\int_{X}|D f|_{*}^{2}\phi'(f) \mathrm {d}{\mathfrak{m}}\), and gives the thesis with v=−Δf. □

Proposition 4.11

(Some properties of the gradient flow of Ch)

Let \(f_{0}\in L^{2}(X,{\mathfrak{m}})\) and let (f t ) be the gradient flow of Ch starting from f 0. Then the following properties hold.

Mass preservation. \(\int f_{t} \mathrm {d}{\mathfrak{m}}=\int f_{0} \mathrm{d}{\mathfrak{m}}\) for any t≥0.

Maximum principle. If f 0≤C (resp. f 0≥c) \({\mathfrak{m}}\)-a.e. in X, then f t ≤C (resp. f t ≥c) \({\mathfrak{m}}\)-a.e. in X for any t≥0.

Entropy dissipation. Suppose 0<c≤f 0≤C<∞ \({\mathfrak{m}}\)-a.e. Then \(t\mapsto\int f_{t}\log f_{t} \mathrm{d}{\mathfrak{m}}\) is absolutely continuous in [0,∞) and it holds

$$\frac{\mathrm{d}}{\mathrm{d}t}\int_Xf_t\log f_t \mathrm{d} {\mathfrak{m}}=-\int_X \frac{|D f_t|_*^2}{f_t} \mathrm {d} {\mathfrak{m}},\quad\text{\textit{for a.e.}}\ t\in(0, \infty). $$

Proof

Mass preservation. Just notice that from (23) we get

$$\biggl \vert \frac{\mathrm{d}}{\mathrm{d}t}\int_Xf_t \mathrm {d} {\mathfrak{m}}\biggr \vert =\biggl \vert \int_X \mathbf{1}\cdot\Delta f_t \mathrm{d} {\mathfrak{m}}\biggr \vert \leq\int_X|D \mathbf{1}|_*|D f_t|_* \mathrm{d} {\mathfrak{m}}=0,\quad\text{for a.e.}\ t\in(0,\infty), $$

where 1 is the function identically equal to 1, which has minimal relaxed gradient equal to 0.

Maximum principle. Fix \(f\in L^{2}(X,{\mathfrak{m}})\), τ>0 and, according to the implicit Euler scheme, let f τ be the unique minimizer of

$$g \mapsto\mathsf{Ch}(g)+\frac{1}{2\tau}\int_X|g-f|^2 \mathrm{d} {\mathfrak{m}}. $$

Assume that f≤C. We claim that in this case f τ ≤C as well. Indeed, if this is not the case we can consider the competitor g:=min{f τ ,C} in the above minimization problem. By Proposition 4.9 (with the choice ϕ(r):=min{r,C}) we get Ch(g)≤Ch(f τ ) and the L 2 distance of f and g is strictly smaller than the one of f and f τ as soon as \({\mathfrak{m}}(\{f^{\tau}>C\})>0\), which is a contradiction.

Starting from f 0, iterating this procedure, and using the fact that the implicit Euler scheme converges as τ↓0 (see [3, 8] for details) to the gradient flow we get the conclusion.

The same argument applies to uniform bounds from below.

Entropy dissipation. The map z↦zlogz is Lipschitz on [c,C] which, together with the maximum principle and the fact that \(t\mapsto f_{t}\in L^{2}(X,{\mathfrak{m}})\) is locally absolutely continuous, yields the claimed absolute continuity statement. Now notice that we have \(\frac{\mathrm{d}}{\mathrm{d}t}\int f_{t}\log f_{t} \mathrm{d}{\mathfrak{m}}=\int (\log f_{t}+1)\Delta f_{t} \mathrm{d}{\mathfrak{m}}\) for a.e. t. Since by the maximum principle f t ≥c \({\mathfrak{m}}\)-a.e., the function logz+1 is Lipschitz and C 1 on the image of f t for any t≥0, thus from (24) we get the conclusion. □
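These properties can also be observed numerically. The sketch below (a finite-difference toy model on the periodic unit interval, where Ch reduces to the Dirichlet energy; the grid size and time step are ad-hoc choices, not from the text) checks mass preservation, the maximum principle and the entropy dissipation identity up to discretization error.

```python
import math

# Explicit heat flow f_{t+dt} = f_t + dt * (discrete Laplacian) on a periodic
# 1D grid: a toy illustration of Proposition 4.11, not the metric-measure
# construction itself.
n = 400
dx = 1.0 / n
f = [1.0 + 0.5 * math.sin(2 * math.pi * k * dx) for k in range(n)]  # 0.5 <= f <= 1.5

def lap(u):
    return [(u[(i + 1) % n] - 2 * u[i] + u[i - 1]) / dx ** 2 for i in range(n)]

def entropy(u):
    return sum(ui * math.log(ui) for ui in u) * dx

dt = 0.2 * dx ** 2   # each Euler step is then a convex average of neighbors
mass0, fmin0, fmax0 = sum(f) * dx, min(f), max(f)

for _ in range(200):
    f = [fi + dt * li for fi, li in zip(f, lap(f))]

# One extra step to compare the entropy rate with -int |Df|^2 / f.
e0 = entropy(f)
f1 = [fi + dt * li for fi, li in zip(f, lap(f))]
rate = (entropy(f1) - e0) / dt
grad = [(f[(i + 1) % n] - f[i - 1]) / (2 * dx) for i in range(n)]
dissipation = sum(gi * gi / fi for gi, fi in zip(grad, f)) * dx

mass_err = abs(sum(f1) * dx - mass0)
print(mass_err)                      # ~0: mass preservation
print(min(f1) - fmin0, fmax0 - max(f1))  # both >= ~0: maximum principle
print(rate, -dissipation)            # nearly equal: entropy dissipation
```

The convex-average structure of each Euler step mirrors the implicit-scheme truncation argument used in the proof of the maximum principle above.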

4.2 The “Horizontal” Approach: Weak Upper Gradients

In this subsection, following the approach of [4, 5], we introduce a different notion of “weak norm of gradient” in a compact and normalized metric measure space \((X,\mathsf{d},{\mathfrak{m}})\). This notion of gradient is Lagrangian in spirit, it does not require a relaxation procedure, it will provide a new estimate of entropy dissipation along the gradient flow of Ch, and it will also be useful in the analysis of the derivative of the entropy along Wasserstein geodesics.

While the definition of minimal relaxed slope was taken from Cheeger’s work [9], the notion we are going to introduce is inspired by the work of Koskela-MacManus [20] and Shanmugalingam [30], the only difference being that we consider a different notion of null set of curves.

4.2.1 Negligible Sets of Curves and Functions Sobolev Along a.e. Curve

Recall that the evaluation maps e t :C([0,1],X)→X are defined by e t (γ):=γ t . We also introduce the restriction maps \(\mathrm{restr}_{t}^{s}: C([0,1],X)\to C([0,1],X)\), 0≤ts≤1, given by

$$ \operatorname{restr}_t^s(\gamma)_r:= \gamma_{((1-r)t+rs)}, $$
(25)

so that \(\mathrm{restr}_{t}^{s}\) restricts the curve γ to the interval [t,s] and then “stretches” it on the whole of [0,1].
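Concretely, (25) is just an affine reparametrization of the time variable; a minimal sketch, with curves modeled as callables for illustration:

```python
def restr(t, s, gamma):
    """Return restr_t^s(gamma): r -> gamma((1 - r) * t + r * s), as in (25)."""
    return lambda r: gamma((1 - r) * t + r * s)

gamma = lambda u: u ** 2          # a toy curve in X = R
eta = restr(0.25, 0.75, gamma)    # gamma restricted to [0.25, 0.75], stretched to [0, 1]
print(eta(0.0), eta(1.0))         # the values gamma(0.25) and gamma(0.75)
```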

Definition 4.5

(Test plans and negligible sets of curves)

We say that a probability measure \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\) is a test plan if it is concentrated on AC 2([0,1];X), \(\iint_{0}^{1}|\dot{\gamma}_{t}|^{2}{\mathrm{d}t} \mathrm{d}\boldsymbol {\pi}<\infty\), and there exists a constant C(π) such that

$$ (\mathrm{e}_t)_\sharp\boldsymbol{\pi}\leq C(\boldsymbol{\pi }){\mathfrak{m}}\quad\text{for every }t\in[0,1]. $$
(26)

A Borel set A⊂AC 2([0,1],X) is said to be negligible if for any test plan π there exists a π-negligible set N such that A⊆N. A property which holds for every γ∈AC 2([0,1],X), except possibly a negligible set, is said to hold for almost every curve.

Remark 4.3

An easy consequence of condition (26) is that if two \({\mathfrak{m}}\)-measurable functions f,g:X→ℝ coincide up to a \({\mathfrak{m}}\)-negligible set and \(\mathcal{T}\) is an at most countable subset of [0,1], then the functions f∘γ and g∘γ coincide on \(\mathcal{T}\) for almost every curve γ.

Moreover, choosing an arbitrary test plan π and applying Fubini’s Theorem to the product measure in (0,1)×C([0,1];X) we also obtain that f∘γ=g∘γ a.e. in (0,1) for π-a.e. curve γ; since π is arbitrary, the same property holds for almost every curve.

Coupled with the definition of negligible set of curves, there are the definitions of weak upper gradient and of functions which are Sobolev along a.e. curve.

Definition 4.6

(Weak upper gradients)

A Borel function g:X→[0,∞] is a weak upper gradient of f:X→ℝ if

$$ \biggl \vert \int_{\partial\gamma}f\biggr \vert \leq \int_\gamma g<\infty\quad\text{for a.e.}\ \gamma. $$
(27)

Definition 4.7

(Sobolev functions along a.e. curve)

A function f:X→ℝ is Sobolev along a.e. curve if for a.e. curve γ the function fγ coincides a.e. in [0,1] and in {0,1} with an absolutely continuous map f γ :[0,1]→ℝ.

By Remark 4.3 applied to \(\mathcal{T}:=\{0,1\}\), (27) does not depend on the particular representative of f in the class of \({\mathfrak{m}}\)-measurable function coinciding with f up to a \({\mathfrak{m}}\)-negligible set. The same Remark also shows that the property of being Sobolev along almost every curve γ is independent of the representative in the class of \({\mathfrak{m}}\)-measurable functions coinciding with f \({\mathfrak{m}}\)-a.e. in X.

In the following remarks we will make use of this basic calculus lemma:

Lemma 4.2

Let f:(0,1)→ℝ be Lebesgue measurable, let q∈[1,∞] and let g∈L q(0,1) be nonnegative and satisfying

$$\big|f(s)-f(t)\big|\leq\biggl \vert \int_t^s g(r) {\mathrm{d}r}\biggr \vert \quad\text{for}\ \mathcal{L}^2\text{-a.e.}\ (s,t)\in(0,1)^2. $$

Then f∈W 1,q(0,1) and |f′|≤g a.e. in (0,1).

Proof

We start by proving the Lemma in the case q=1. It is immediate to check that f∈L ∞(0,1). Let N⊂(0,1)2 be the \(\mathcal{L}^{2}\)-negligible subset where the above inequality fails. By Fubini’s theorem, also the set {(t,h)∈(0,1)2:(t,t+h)∈N} is \(\mathcal{L}^{2}\)-negligible; in particular, for a.e. h we have (t,t+h)∉N for a.e. t∈(0,1). Let h i ↓0 with this property and use the identities

$$\int_0^1f(t)\frac{\phi(t+h)-\phi(t)}{h} { \mathrm{d}t}=-\int_0^1\frac {f(t-h)-f(t)}{-h}\phi(t) {\mathrm{d}t} $$

with \(\phi\in C^{1}_{c}(0,1)\) and h=h i sufficiently small to get

$$\biggl|\,\int_0^1f(t)\phi'(t) { \mathrm{d}t} \biggr|\leq\int_0^1g(t)\big|\phi (t)\big| \mathrm{d}t. $$

It follows that the distributional derivative of f is a signed measure η with finite total variation which satisfies

$$-\int_0^1f \phi' { \mathrm{d}t}=\int_0^1 \phi \mathrm{d}\eta , \qquad \biggl|\,\int_0^1 \phi \mathrm{d}\eta \biggr|\le\int _0^1g |\phi| \mathrm{d} {t}\quad\text{for every }\phi\in C^1_c(0,1); $$

therefore η is absolutely continuous with respect to the Lebesgue measure, with density bounded in modulus by g. This gives the W 1,1(0,1) regularity and, at the same time, the inequality |f′|≤g a.e. in (0,1). The case q>1 immediately follows, since then |f′|≤g∈L q(0,1) yields f∈W 1,q(0,1). □

With the aid of this lemma, we can prove that the existence of a weak upper gradient implies Sobolev regularity along a.e. curve.

Remark 4.4

(Restriction and equivalent formulation)

Notice that if π is a test plan, so is \((\operatorname{restr}_{t}^{s})_{\sharp}\boldsymbol{\pi}\). Hence if g is a weak upper gradient of f then for every t<s in [0,1] it holds

$$\big|f(\gamma_s)-f(\gamma_t)\big|\leq\int_t^s g(\gamma_r)|\dot{\gamma}_r| {\mathrm{d}r} \quad\text{for a.e.}\ \gamma. $$

Let π be a test plan: by Fubini’s theorem applied to the product measure in (0,1)2×C([0,1];X), it follows that for π-a.e. γ the function f satisfies

$$\big|f(\gamma_s)-f(\gamma_t)\big|\leq\biggl \vert \int_t^s g(\gamma_r)|\dot{\gamma}_r| {\mathrm{d}r}\biggr \vert \quad\text{for}\ \mathcal{L}^2\text{-a.e.}\ (t,s)\in(0,1)^2. $$

An analogous argument shows that

$$ \big|f(\gamma_s)-f(\gamma_0)\big|\leq\int_0^s g(\gamma_r)|\dot{\gamma}_r| {\mathrm{d}r}\quad\text{and}\quad \big|f(\gamma_1)-f(\gamma_t)\big|\leq\int_t^1 g(\gamma_r)|\dot{\gamma}_r| {\mathrm{d}r} $$

(28)

for a.e. s∈(0,1) and a.e. t∈(0,1), for π-a.e. γ.

Since \(g\circ\gamma|\dot{\gamma}|\in L^{1}(0,1)\) for π-a.e. γ, by Lemma 4.2 it follows that fγW 1,1(0,1) for π-a.e. γ, and

$$ \bigg|\frac{\mathrm{d}}{{\mathrm{d}t}}(f\circ\gamma) \bigg|\leq g\circ\gamma|\dot{\gamma}|\quad\text{a.e. in }(0,1),\ \mbox{for}\ \boldsymbol{\pi}\hbox{-a.e. }\gamma. $$
(29)

Since π is arbitrary, we conclude that fγW 1,1(0,1) for a.e. γ, and therefore it admits an absolutely continuous representative f γ ; moreover, by (28), it is immediate to check that f(γ(t))=f γ (t) for t∈{0,1} and a.e. γ.

Remark 4.5

(An approach with a non explicit use of negligible set of curves)

The previous remark could be used to introduce the notion of weak upper gradients without any explicit mention of negligible sets of curves. One can simply say that \(g\in L^{2}(X,{\mathfrak{m}})\) is a weak upper gradient of f:X→ℝ provided for every test plan π it holds

$$\int \big|f(\gamma_1)-f(\gamma_0)\big| \mathrm{d}\boldsymbol{ \pi}(\gamma )\leq\iint_0^1g(\gamma_s)|\dot{\gamma}_s| {\mathrm{d}s} \mathrm {d}\boldsymbol{\pi}(\gamma) $$

(this has been the approach followed in [13]).

Proposition 4.12

(Locality)

Let f:X→ℝ be Sobolev along almost all absolutely continuous curves, and let G 1,G 2 be weak upper gradients of f. Then min{G 1,G 2} is a weak upper gradient of f.

Proof

It is a direct consequence of (29). □

Definition 4.8

(Minimal weak upper gradient)

Let f:X→ℝ be Sobolev along almost all curves. The minimal weak upper gradient |Df| w of f is the weak upper gradient characterized, up to \({\mathfrak{m}}\)-negligible sets, by the property

$$ |D f|_w\leq G\quad \mathfrak{m}\hbox{-a.e.~in}\ X,\ \mbox{for every weak upper gradient}\ G\ \mbox{of}\ f. $$
(30)

Uniqueness of the minimal weak upper gradient is obvious. For existence, we take |Df| w :=inf n G n , where G n are weak upper gradients which provide a minimizing sequence in

$$\inf \biggl\{\,\int_X \arctan G \mathrm{d} {\mathfrak{m}}: G\ \mbox{is a weak upper gradient of}\ f \biggr\}. $$

We immediately see, thanks to Proposition 4.12, that we can assume with no loss of generality that G n+1≤G n . Hence, by monotone convergence, the function |Df| w is a weak upper gradient of f and \(\int_{X} \arctan G \mathrm{d}{\mathfrak{m}}\) is minimal at G=|Df| w . This minimality, in conjunction with Proposition 4.12, gives (30).
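To see why this minimality yields (30), take any weak upper gradient G of f: by Proposition 4.12, min{G,|Df| w } is again a weak upper gradient, hence by minimality

```latex
\int_X \arctan\bigl(\min\{G,|D f|_w\}\bigr)\,\mathrm{d}\mathfrak{m}
\;\geq\;\int_X \arctan |D f|_w\,\mathrm{d}\mathfrak{m};
```

since min{G,|Df| w }≤|Df| w and arctan is strictly increasing, the two integrands coincide \({\mathfrak{m}}\)-a.e., that is \(|D f|_{w}\leq G\) \({\mathfrak{m}}\)-a.e. in X.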

Theorem 4.5

(Stability w.r.t. \(\mathfrak {m}\)-a.e. convergence)

Assume that f n are \({\mathfrak{m}}\)-measurable, Sobolev along almost all curves and that G n are weak upper gradients of f n . Assume furthermore that f n (x)→f(x)∈ℝ for \(\mathfrak{m}\)-a.e. x∈X and that (G n ) weakly converges to G in \(L^{2}(X,{\mathfrak{m}})\). Then G is a weak upper gradient of f.

Proof

Fix a test plan π. By Mazur’s theorem we can find convex combinations

$$H_h:=\sum_{i=N_h+1}^{N_{h+1}} \alpha_{h,i} G_i\quad\text{with}\ \alpha_{h,i} \geq0,\quad \sum_{i=N_h+1}^{N_{h+1}} \alpha_{h,i}=1,\quad N_h\to\infty $$

converging strongly to G in \(L^{2}(X,{\mathfrak{m}})\). Denoting by \(\tilde{f}_{h}\) the corresponding convex combinations of the f i , H h are weak upper gradients of \(\tilde{f}_{h}\) and still \(\tilde{f}_{h}\to f\) \({\mathfrak{m}}\)-a.e. in X.

Since for every nonnegative Borel function φ:X→[0,∞] it holds (with C=C(π))

$$ \int \biggl(\int_\gamma\varphi \biggr) \mathrm{d}\boldsymbol{\pi}(\gamma)= \iint_0^1\varphi(\gamma_t)|\dot{\gamma}_t| {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma)\leq \biggl(\iint_0^1\varphi^2(\gamma_t) {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma) \biggr)^{1/2} \biggl(\iint_0^1|\dot{\gamma}_t|^2 {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma) \biggr)^{1/2}\leq \sqrt{C}\,\|\varphi\|_{L^2(X,{\mathfrak{m}})} \biggl(\iint_0^1|\dot{\gamma}_t|^2 {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma) \biggr)^{1/2}, $$
(31)

we obtain, for \(\bar{C}:=\sqrt{C} (\iint_{0}^{1}|\dot{\gamma}_{t}|^{2} {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi} )^{1/2}\),

$$\int \biggl(\int_\gamma \bigl(|H_h-G|+\min\bigl\{|\tilde{f}_h-f|,1\bigr\}\bigr) \biggr) \mathrm{d}\boldsymbol{\pi}(\gamma)\leq \bar{C}\,\big\| |H_h-G|+\min\bigl\{|\tilde{f}_h-f|,1\bigr\}\big\|_{L^2(X,{\mathfrak{m}})}\to0, $$

the L 2 norm tending to 0 by the strong convergence of H h and dominated convergence.
By a diagonal argument we can find a subsequence h(n) such that \(\int_{\gamma}|H_{h(n)}-G|+\min\{|\tilde{f}_{h(n)}-f|,1\}\to0\) as n→∞ for π-a.e. γ. Since \(\tilde{f}_{h}\) converge \({\mathfrak{m}}\)-a.e. to f and the marginals of π are absolutely continuous w.r.t. \({\mathfrak{m}}\) we have also that for π-a.e. γ it holds \(\tilde{f}_{h}(\gamma_{0})\to f(\gamma_{0})\) and \(\tilde{f}_{h}(\gamma_{1})\to f(\gamma_{1})\).

If we fix a curve γ satisfying these convergence properties, since \((\tilde{f}_{h(n)})_{\gamma}\) are equi-absolutely continuous (being their derivatives bounded by \(H_{h(n)}\circ\gamma|\dot{\gamma}|\)) and a further subsequence of \(\tilde{f}_{h(n)}\) converges a.e. in [0,1] and in {0,1} to f(γ s ), we can pass to the limit to obtain an absolutely continuous function f γ equal to f(γ s ) a.e. in [0,1] and in {0,1} with derivative bounded by \(G(\gamma_{s})|\dot{\gamma}_{s}|\). Since π is arbitrary we conclude that f is Sobolev along almost all curves and that G is a weak upper gradient of f. □

Remark 4.6

(\(|D f|_{w}\leq|D f|_{*}\))

An immediate consequence of the previous proposition is that any f∈D(Ch) is Sobolev along a.e. curve and satisfies \(|D f|_{w}\leq|D f|_{*}\) \({\mathfrak{m}}\)-a.e. in X. Indeed, for such f just pick a sequence of Lipschitz functions f n converging to f in \(L^{2}(X,\mathfrak {m})\) such that |Df n |→\(|D f|_{*}\) in \(L^{2}(X,{\mathfrak{m}})\) (as in Proposition 4.8) and recall that for Lipschitz functions the local Lipschitz constant is an upper gradient.

4.2.2 A Bound from Below on Weak Gradients

In this short subsection we show how, using test plans and the very definition of minimal weak gradients, it is possible to use |Df| w to bound from below the increments of the relative entropy. We start with the following result, proved—in a more general setting—by Lisini in [22]: it shows how to associate to a curve a plan concentrated on AC 2([0,1];X) representing the curve itself (see also Theorem 8.2.1 of [3] for the Euclidean case). We will only sketch the proof.

Proposition 4.13

(Superposition principle)

Let (X,d) be a compact space and let \([0,1]\ni t\mapsto\mu_{t}\in\mathscr{P}(X)\) be a W 2-absolutely continuous curve. Then there exists \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\) concentrated on AC 2([0,1];X) such that (e t ) ♯ π=μ t for any t∈[0,1] and \(\int|\dot{\gamma}_{t}|^{2} \mathrm{d}\boldsymbol{\pi}(\gamma)=|\dot{\mu}_{t}|^{2}\) for a.e. t∈[0,1].

Proof

If \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\) is any plan concentrated on AC 2([0,1],X) such that (e t ) ♯ π=μ t for any t∈[0,1], since (e t ,e s ) ♯ π∈Adm(μ t ,μ s ), for any t<s it holds

$$ \begin{aligned} W_2^2(\mu_t, \mu_s)&\leq\int\mathsf{d}^2(\gamma_t, \gamma_s) \mathrm{d}\boldsymbol{\pi} (\gamma) \leq\int \biggl(\int _t^s|\dot{\gamma}_r| {\mathrm{d}r} \biggr)^2 \mathrm{d}\boldsymbol{\pi} (\gamma) \\ &\leq(s-t)\iint_t^s|\dot{\gamma}_r|^2 {\mathrm{d}r} \mathrm {d}\boldsymbol{\pi}(\gamma), \end{aligned} $$

which shows that \(|\dot{\mu}_{t}|^{2}\leq \int|\dot{\gamma}_{t}|^{2} \mathrm{d}\boldsymbol{\pi}(\gamma)\) for a.e. t. Hence, to conclude it is sufficient to find a plan \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\), concentrated on AC 2([0,1],X), with (e t ) ♯ π=μ t for any t∈[0,1] such that \(\int|\dot{\mu}_{t}|^{2} {\mathrm{d}t}\geq\iint_{0}^{1}|\dot{\gamma}_{t}|^{2} {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma)\).

To build such a π we make the simplifying assumption that (X,d) is geodesic (the proof for the general case is similar, but rather than interpolating with piecewise geodesic curves one uses piecewise constant ones; this leads to some technical complications that we want to avoid here—see [22] for the complete argument). Fix n∈ℕ and use a gluing argument to find \(\boldsymbol{\gamma}^{n}\in\mathscr{P}(X^{n+1})\) such that \((\pi^{i},\pi^{i+1})_{\sharp}\boldsymbol{\gamma}^{n}\in\mbox{\textsc {Opt}}(\mu_{\frac{i}{n}},\mu_{\frac{i+1}{n}})\) for i=0,…,n−1, where π i :X n+1→X denotes the projection onto the i-th coordinate. By standard measurable selection arguments, there exists a Borel map T n:X n+1→C([0,1],X) such that γ:=T n(x 0,…,x n ) is a constant speed geodesic on each of the intervals [i/n,(i+1)/n] and γ i/n =x i , i=0,…,n. Define \(\boldsymbol{\pi}^{n}:=T^{n}_{\sharp}\boldsymbol{\gamma}^{n}\). It holds

$$ \begin{aligned}[b] \iint_0^1| \dot{\gamma}_t|^2 {\mathrm{d}t} \mathrm{d}\boldsymbol { \pi}^n(\gamma)&= n\int\sum_{i=0}^{n-1} \mathsf{d}^2 (\gamma_{\frac{i}{n}},\gamma_{\frac{i+1}{n}} ) \mathrm{d}\boldsymbol{\pi}^n(\gamma)= n\sum_{i=0}^{n-1}W_2^2 (\mu_{\frac{i}{n}},\mu_{\frac{i+1}{n}} ) \\ &\leq\int_0^1|\dot{\mu}_t|^2 {\mathrm{d}t}. \end{aligned} $$
(32)

Now notice that the map E:C([0,1],X)→[0,∞] given by \(E(\gamma):=\int_{0}^{1}|\dot{\gamma}_{t}|^{2} {\mathrm{d}t}\) if γ∈AC 2([0,1],X) and +∞ otherwise, is lower semicontinuous and, via a simple equicontinuity argument, with compact sublevels. Therefore by Prokhorov’s theorem we get that (π n ) is a tight sequence, hence for any limit measure π the uniform bound (32) gives the thesis. □

Proposition 4.14

Let \([0,1]\ni t\mapsto\mu_{t}=f_{t}{\mathfrak{m}}\) be a W 2-absolutely continuous curve in \(\mathscr{P}(X)\). Assume that for some 0<c<C<∞ it holds c≤f t ≤C \({\mathfrak{m}}\)-a.e. for any t∈[0,1], and that f 0 is Sobolev along a.e. curve with \(|D f_{0}|_{w}\in L^{2}(X,{\mathfrak{m}})\). Then

$$\int_X f_0\log f_0 \mathrm{d} {\mathfrak{m}}-\int_X f_1\log f_1 \mathrm{d} {\mathfrak{m}}\leq \frac{1}{2}\iint_0^1\frac{|D f_0|_w^2}{f_0^2}f_t {\mathrm{d}t} \mathrm{d} {\mathfrak{m}}+ \frac{1}{2}\int_0^1|\dot{\mu}_t|^2 {\mathrm{d}t}. $$

Proof

Let \(\boldsymbol{\pi}\in\mathscr{P}(C([0,1],X))\) be a plan associated to the curve (μ t ) as in Proposition 4.13. The assumption f t ≤C \({\mathfrak{m}}\)-a.e. and the fact that \(\iint_{0}^{1}|\dot{\gamma}_{t}|^{2} {\mathrm{d}t} \mathrm{d}\boldsymbol {\pi}(\gamma)=\int|\dot{\mu}_{t}|^{2} {\mathrm{d}t}<\infty\) guarantee that π is a test plan. Now notice that it holds |Dlogf 0 | w =|Df 0 | w /f 0 (because z↦logz is C 1 in [c,C]); moreover, \(\int_{X} f_1\log\frac{f_1}{f_0} \mathrm{d}{\mathfrak{m}}\geq0\) by Jensen’s inequality, thus we get

$$ \begin{aligned} \int_X f_0\log f_0 \mathrm{d} {\mathfrak{m}}-\int_X f_1\log f_1 \mathrm{d} {\mathfrak{m}} &\leq\int_X \log f_0 (f_0-f_1) \mathrm{d} {\mathfrak{m}}= \int \bigl(\log f_0(\gamma_0)-\log f_0(\gamma_1) \bigr) \mathrm{d}\boldsymbol{\pi}(\gamma) \\ &\leq\iint_0^1\frac{|D f_0|_w}{f_0}(\gamma_t)|\dot{\gamma}_t| {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma) \\ &\leq\frac{1}{2}\iint_0^1\frac{|D f_0|_w^2}{f_0^2}(\gamma_t) {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma)+ \frac{1}{2}\iint_0^1|\dot{\gamma}_t|^2 {\mathrm{d}t} \mathrm{d}\boldsymbol{\pi}(\gamma) \\ &=\frac{1}{2}\iint_0^1\frac{|D f_0|_w^2}{f_0^2}f_t {\mathrm{d}t} \mathrm{d} {\mathfrak{m}}+ \frac{1}{2}\int_0^1|\dot{\mu}_t|^2 {\mathrm{d}t}, \end{aligned} $$

where in the second inequality we used the fact that |Dlogf 0 | w is a weak upper gradient of logf 0 and π is a test plan. □

4.3 The Two Notions of Gradient Coincide

Here we prove that the two notions of “norm of weak gradient” we introduced coincide. We already noticed in Remark 4.6 that \(|D f|_{w}\leq|D f|_{*}\), so that to conclude we need to show that \(|D f|_{w}\geq|D f|_{*}\).

The key argument to achieve this is the following lemma, which gives a sharp bound on the W 2-speed of the L 2-gradient flow of Ch. This lemma has been introduced in [15] to study the heat flow on Alexandrov spaces, see also Sect. 6.

Lemma 4.3

(Kuwada’s lemma)

Let \(f_{0}\in L^{2}(X,{\mathfrak{m}})\) and let (f t ) be the L 2-gradient flow of Ch starting from f 0. Assume that for some 0<c≤C<∞ it holds c≤f 0≤C \({\mathfrak{m}}\)-a.e. in X, and that \(\int_{X}f_{0} \mathrm{d}{\mathfrak{m}}=1\). Then the curve \(t\mapsto\mu_{t}:=f_{t}{\mathfrak{m}}\) is absolutely continuous w.r.t. W 2 and it holds

$$|\dot{\mu}_t|^2\leq\int_X \frac{|D f_t|_*^2}{f_t} \mathrm {d} {\mathfrak{m}},\quad \text{\textit{for}\ \textit{a.e.}}\ t\in(0, \infty). $$

Proof

We start from the duality formula (5) with φ=−ψ: taking into account the factor 2 and using the identity Q 1(−ψ)=ψ c we get

$$ \frac{W_2^2(\mu,\nu)}{2}=\sup_\varphi\int _X Q_1\varphi d\nu -\int_X \varphi d\mu $$
(33)

where the supremum runs among all Lipschitz functions φ.
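The way (33) is used can be illustrated by a toy computation (the grid, test function and points are ad-hoc assumptions): in ℝ one has \(Q_1\varphi(x)=\inf_y (\varphi(y)+|x-y|^2/2 )\), and for μ=δ a , ν=δ b , where \(W_2^2(\delta_a,\delta_b)=|a-b|^2\), the duality bound forces \(Q_1\varphi(b)-\varphi(a)\le|a-b|^2/2\) for every Lipschitz φ.

```python
import math

# Hopf-Lax transform on a 1D grid: Q1phi(x) = min_y [ phi(y) + |x - y|^2 / 2 ].
# For mu = delta_a, nu = delta_b, duality gives Q1phi(b) - phi(a) <= (a-b)^2/2
# for every Lipschitz phi (equality holds for the optimal Kantorovich potential).
ys = [-2.0 + 4.0 * k / 800 for k in range(801)]
phi = lambda y: math.sin(3.0 * y) + 0.5 * abs(y)   # an arbitrary Lipschitz test function

def Q1(phi, x):
    return min(phi(y) + 0.5 * (x - y) ** 2 for y in ys)

a, b = -0.7, 1.1
gap = 0.5 * (a - b) ** 2 - (Q1(phi, b) - phi(a))
print(gap >= 0.0)   # the duality bound holds
```

Taking the supremum over all φ in (33) closes the gap and recovers \(W_2^2/2\).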

Fix such a φ and recall (Proposition 3.6) that the map tQ t φ is Lipschitz with values in \(L^{\infty}(X,{\mathfrak{m}})\), and a fortiori in \(L^{2}(X,{\mathfrak{m}})\).

Fix also 0≤t<s, set =(st) and recall that since (f t ) is the Gradient Flow of Ch in L 2, the map [0,]∋τf t+τ is absolutely continuous with values in L 2. Therefore the map \([0,\ell]\ni\tau\mapsto Q_{\frac{\tau}{\ell}}\varphi f_{t+\tau}\) is absolutely continuous with values in L 2. The equality

$$\frac{Q_{\frac{\tau+h}{\ell}}\varphi f_{t+\tau+h}-Q_{\frac{\tau }{\ell}} \varphi f_{t+\tau}}{h}= f_{t+\tau} \frac{Q_{\frac{\tau+h}{\ell}}\varphi-Q_{\frac{\tau}{\ell}}\varphi}{h}+Q_{\frac{\tau+h}{\ell}} \varphi \frac{ f_{t+\tau+h}- f_{t+\tau}}{h}, $$

together with the uniform continuity of \((x,\tau)\mapsto Q_{\frac{\tau}{\ell}}\varphi(x)\) shows that the derivative of \(\tau\mapsto Q_{\frac{\tau}{\ell}}\varphi f_{t+\tau}\) can be computed via the Leibniz rule.

We have:

$$ \everymath{\displaystyle} \begin{array}[b]{rcl} \int_X Q_1\varphi \mathrm{d}\mu_s-\int_X \varphi \mathrm{d}\mu_t&=&\int_X Q_1\varphi f_{t+\ell} \mathrm{d} {\mathfrak{m}}-\int _X\varphi f_t \mathrm{d} {\mathfrak{m}}\\ &= & \int _X\int_0^\ell \frac{\mathrm{d}}{\mathrm{d}\tau} (Q_{\frac{\tau}{\ell}}\varphi f_{t+\tau} ) \mathrm{d}\tau \mathrm{d} {\mathfrak{m}} \\[9pt] & \le& \int_X\int_0^\ell \biggl(-\frac{|D Q_{\frac{\tau}{\ell}}\varphi |^2}{2\ell}f_{t+\tau}+Q_{\frac{\tau}{\ell}}\varphi \Delta f_{t+\tau} \biggr) \mathrm{d}\tau \mathrm{d}\mathfrak {m}, \end{array} $$
(34)

having used Theorem 3.2. Observe that by inequalities (23) and (19) we have

$$ \begin{aligned}[b] \int_X Q_{\frac{\tau}{\ell}}\varphi \Delta f_{t+\tau} \mathrm {d} {\mathfrak{m}}& \leq \int_X|D Q_{\frac{\tau}{\ell}}\varphi|_* |D f_{t+\tau}|_* \mathrm{d} {\mathfrak{m}} \leq\int_X|D Q_{\frac{\tau}{\ell}}\varphi| |D f_{t+\tau}|_* \mathrm{d} {\mathfrak{m}} \\ &\leq\frac{1}{2\ell}\int_X|D Q_{\frac{\tau}{\ell}} \varphi|^2f_{t+\tau }\mathrm{d}{\mathfrak{m}}+ \frac{\ell}{2}\int _X\frac{|D f_{t+\tau}|_*^2}{f_{t+\tau}} \mathrm{d} {\mathfrak{m}}. \end{aligned} $$
(35)

Plugging this inequality in (34), we obtain

$$\int_X Q_1\varphi \mathrm{d} \mu_s-\int_X\varphi \mathrm{d} \mu_t\leq \frac{\ell}{2}\int_0^\ell\int _X\frac{|D f_{t+\tau }|_*^2}{f_{t+\tau }} \mathrm{d} {\mathfrak{m}} \mathrm{d}\tau. $$

This latter bound does not depend on φ, so from (33) we deduce

$$W_2^2(\mu_t,\mu_s)\leq\ell\int _0^\ell\int_X \frac{|D f_{t+\tau }|_*^2}{f_{t+\tau}} \mathrm{d} {\mathfrak{m}} \mathrm{d}\tau. $$

Since f r ≥c for any r≥0 and r↦Ch(f r ) is nonincreasing and finite for every r>0, we immediately get that t↦μ t is locally Lipschitz in (0,∞). At Lebesgue points of \(t\mapsto\int_{X}|D f_{t}|_{*}^{2}/f_{t} \mathrm{d}{\mathfrak{m}}\) we obtain the stated pointwise bound on the metric speed. □

Theorem 4.6

Let \(f\in L^{2}(X,{\mathfrak{m}})\). Assume that f is Sobolev along a.e. curve and that \(|D f|_{w}\in L^{2}(X,{\mathfrak{m}})\). Then f∈D(Ch) and \(|D f|_{*}=|D f|_{w}\) \({\mathfrak{m}}\)-a.e. in X.

Proof

Up to a truncation argument and addition of a constant, we can assume that 0<c≤f≤C<∞ \({\mathfrak{m}}\)-a.e. in X for some c,C. Let (f t ) be the L 2-gradient flow of Ch starting from f and recall that from Proposition 4.11 we have

$$\int_Xf\log f \mathrm{d} {\mathfrak{m}}-\int _Xf_t\log f_t \mathrm {d} { \mathfrak{m}}= \int_0^t\int _X\frac{|D f_s|_*^2}{f_s} {\mathrm{d}s} \mathrm {d} { \mathfrak{m}}<\infty \quad\text{for every }t>0. $$

On the other hand, from Proposition 4.14 and Lemma 4.3 we have

$$ \int_Xf\log f \mathrm{d} {\mathfrak{m}}-\int _Xf_t\log f_t \mathrm {d} { \mathfrak{m}}\leq \frac{1}{2}\int_0^t\int _X\frac{|D f|_w^2}{f^2}f_s {\mathrm{d}s} \mathrm{d} {\mathfrak{m}} +\frac{1}{2}\int_0^t\int _X\frac{|D f_s|_*^2}{f_s} {\mathrm{d}s} \mathrm{d} { \mathfrak{m}} . $$
(36)

Hence we deduce

$$\int_0^t 4\mathsf{Ch}(\sqrt{f_s}) {\mathrm{d}s}=\frac{1}{2}\int_0^t\int _X\frac {|D f_s|_*^2}{f_s} {\mathrm{d}s} \mathrm{d} { \mathfrak{m}} \leq\frac{1}{2}\int_0^t\int _X\frac{|D f|_w^2}{f^2}f_s {\mathrm{d}s} \mathrm{d} {\mathfrak{m}}. $$

Letting t↓0, taking into account the L 2-lower semicontinuity of Ch and the fact—easy to check from the maximum principle—that \(\sqrt{f_{s}}\to\sqrt{f}\) as s↓0 in \(L^{2}(X,{\mathfrak{m}})\), we get \(\mathsf{Ch}(\sqrt{f})\leq\varliminf_{t\downarrow 0}\frac{1}{t}\int_{0}^{t}\mathsf{Ch}(\sqrt{f_{s}}) {\mathrm{d}s}\). On the other hand, the bound fc>0 ensures \(\frac{|D f|_{w}^{2}}{f^{2}}\in L^{1}(X,{\mathfrak{m}})\) and the maximum principle again together with the convergence of f s to f in \(L^{2}(X,{\mathfrak{m}})\) when s↓0 grants that the convergence is also weak in \(L^{\infty}(X,{\mathfrak{m}})\), therefore \(\int_{X}\frac{|D f|_{w}^{2}}{f} \mathrm{d}{\mathfrak{m}}= \lim_{t\downarrow 0}\frac{1}{t}\int_{0}^{t}\int_{X}\frac{|D f|_{w}^{2}}{f^{2}}f_{s} \mathrm {d}{\mathfrak{m}} {\mathrm{d}s}\).

In summary, we proved

$$\frac{1}{2}\int_X\frac{|D f|_*^2}{f} \mathrm{d} { \mathfrak{m}}\leq\frac{1}{2}\int_X \frac{|D f|_w^2}{f} \mathrm{d} {\mathfrak{m}}, $$

which, together with the inequality \(|D f|_{w}\leq|D f|_{*}\) \({\mathfrak{m}}\)-a.e. in X, gives the conclusion. □

We are now in the position of defining the Sobolev space \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\). We start with the following simple and general lemma.

Lemma 4.4

Let (B,∥⋅∥) be a Banach space and let E:B→[0,∞] be a 1-homogeneous, convex and lower semicontinuous map. Then the vector space {E<∞} endowed with the norm

$$\|v\|_E:=\sqrt{\|v\|^2+E^2(v)}, $$

is a Banach space.

Proof

It is clear that (D(E),∥⋅∥ E ) is a normed space, so we only need to prove completeness. Pick a sequence (v n )⊂D(E) which is Cauchy w.r.t. ∥⋅∥ E . Then, since ∥⋅∥≤∥⋅∥ E we also get that (v n ) is Cauchy w.r.t. ∥⋅∥, and hence there exists vB such that ∥v n v∥→0. The lower semicontinuity of E grants that \(E(v)\leq\varliminf_{n}E(v_{n})<\infty\) and also that it holds

$$\mathop{\overline{\lim}}_{n\to\infty}\|v_n-v\|_E \leq\mathop {\overline{\lim}}_{n,m\to\infty}\|v_n-v_m \|_E=0, $$

which is the thesis. □

Therefore, if we want to build the space \(W^{1,2}(X,\mathsf {d},{\mathfrak{m}})\subset L^{2}(X,{\mathfrak{m}})\), the only thing that we need is an L 2-lower semicontinuous functional playing the role which on ℝd is played by the L 2-norm of the distributional gradient of Sobolev functions. We certainly have this functional, namely the map \(f\mapsto\||D f|_{*}\|_{L^{2}(X,{\mathfrak{m}})}=\||D f|_{w}\|_{L^{2}(X,{\mathfrak{m}})}\). Hence the lemma above provides the Banach space \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\). Notice that in general \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\) is not Hilbert: this is not surprising, as already the Sobolev space W 1,2 built over \((\mathbb{R}^{d},\|\cdot\|,\mathcal{L}^{d})\) is not Hilbert if the underlying norm ∥⋅∥ does not come from a scalar product.
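In the example of Remark 4.2 the failure of quadraticity is explicit: with \({\mathfrak{m}}\) the Lebesgue measure on the unit square, \(\mathsf{d}\) the L ∞ distance, f(x,y)=x and g(x,y)=y, the formula \(|D h|_*^2=(|\partial_x h|+|\partial_y h|)^2\) gives \(\mathsf{Ch}(f)=\mathsf{Ch}(g)=\tfrac12\) while \(\mathsf{Ch}(f\pm g)=2\), so

```latex
\mathsf{Ch}(f+g)+\mathsf{Ch}(f-g)=2+2=4
\neq 2=2\,\mathsf{Ch}(f)+2\,\mathsf{Ch}(g),
```

and the parallelogram identity fails: \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\) is not Hilbert in this case.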

4.4 Comparison with Previous Approaches

It is now time to underline that the one proposed here is certainly not the first definition of Sobolev space over a metric measure space (we refer to [17] for a much broader overview on the subject). Here we confine the discussion only to weak notions of (modulus of) gradient, and in particular to [9] and [20, 30]. Also, we discuss only the quadratic case, referring to [5] for general exponents p and for the independence (in a suitable sense) of the minimal gradients from p.

In [9] Cheeger proposed a relaxation procedure similar to the one used in Sect. 4.1, but rather than relaxing the local Lipschitz constant of Lipschitz functions, he relaxed upper gradients of arbitrary functions. More precisely, he defined

$$E(f):=\inf\varliminf_{n\to\infty}\|G_n\|_{L^2(X,{\mathfrak{m}})}, $$

where the infimum is taken among all sequences (f_n) converging to f in \(L^{2}(X,{\mathfrak{m}})\) such that G_n is an upper gradient for f_n. Then, with the same computations done in Sect. 4.1 (actually and obviously, the story goes the other way around: we closely followed his arguments) he showed that for f∈D(E) there is an underlying notion of weak gradient |Df|_C, called minimal generalized upper gradient, such that \(E(f)=\||D f|_{C}\|_{L^{2}(X,{\mathfrak{m}})}\) and

$$|D f|_C\leq G\quad \mathfrak{m}\hbox{-a.e.~in}\ X, $$

for any G weak limit of a sequence (G n ) as in the definition of E(f).

Notice that since the local Lipschitz constant is always an upper gradient for Lipschitz functions, one certainly has

$$ |D f|_C\leq|D f|_*\quad \mathfrak {m}\hbox{-a.e.~in}\ X,\ \mbox{for any}\ f\in D(\mathsf{Ch}). $$
(37)

Koskela and MacManus [20] introduced, and Shanmugalingam [30] further studied, a procedure close to ours (again: actually we have been inspired by them) to produce a notion of “norm of weak gradient” which does not require a relaxation procedure. Recall that for Γ⊂AC([0,1],X) the 2-Modulus Mod2(Γ) is defined by

$$ \operatorname{Mod}_2(\varGamma):=\inf \biggl\{\|\rho \|^2_{L^2(X,{\mathfrak{m}})} : \int_\gamma\rho\geq1\ \forall \gamma\in\varGamma \biggr\}\quad\text{for every }\varGamma\subset AC \bigl([0,1],X\bigr). $$
(38)

It is possible to show that the 2-Modulus is an outer measure on AC([0,1],X). Building on this notion, Koskela and MacManus [20] considered the class of functions f which satisfy the upper gradient inequality not necessarily along all curves, but only outside a Mod2-negligible set of curves. In order to compare this concept with Sobolev classes more properly, Shanmugalingam said that G:X→[0,∞] is a weak upper gradient for f if there exists \(\tilde{f}=f\) \({\mathfrak{m}}\)-a.e. such that

$$\big|\tilde{f}(\gamma_0)-\tilde{f}(\gamma_1) \big|\leq\int _\gamma G\quad\text{for every }\gamma\in \mathit{AC}\bigl([0,1],X\bigr) \setminus\mathcal{N},\quad\text{for some }\mathcal{N}\text{ with }\operatorname{Mod}_2(\mathcal{N})=0. $$

Then, she defined the energy \(\tilde{E}:L^{2}(X,{\mathfrak{m}})\to[0,\infty]\) by putting

$$\tilde{E}(f):=\inf\|G\|_{L^2(X,{\mathfrak{m}})}^2, $$

where the infimum is taken among all weak upper gradients G of f according to the previous condition. Thanks to the properties of the 2-modulus (a stability property of weak upper gradients analogous to ours), it is possible to show that \(\tilde{E}\) is indeed L 2-lower semicontinuous, so that it leads to a good definition of the Sobolev space. Also, using a key lemma due to Fuglede, Shanmugalingam proved that \(E=\tilde{E}\) on \(L^{2}(X,{\mathfrak{m}})\), so that the two approaches produce the same definition of Sobolev space \(W^{1,2}(X,\mathsf{d},{\mathfrak{m}})\), and the underlying gradient |Df|_S which gives a pointwise representation of \(\tilde{E}(f)\) is the same |Df|_C behind the energy E.

Observe now that for a Borel set Γ⊂AC 2([0,1],X) and a test plan π, integrating w.r.t. π the inequality ∫ γ ρ≥1 ∀γ∈Γ and then minimizing over ρ, we get

$$\bigl[\boldsymbol{\pi}(\varGamma) \bigr]^2\leq C(\boldsymbol{\pi }) \operatorname{Mod}_2(\varGamma)\iint_0^1|\dot{\gamma}|^2 {\mathrm{d}s} \mathrm {d}\boldsymbol{\pi}(\gamma), $$

which shows that any Mod2-negligible set of curves is also negligible according to Definition 4.5. This fact easily yields that any \(f\in D(\tilde{E})\) is Sobolev along a.e. curve and satisfies

$$ |D f|_w\leq|D f|_C,\quad \mathfrak{m}\hbox{-a.e.~in }X. $$
(39)

Given that we proved in Theorem 4.6 that |Df|_*=|Df|_w, inequalities (37) and (39) also give that |Df|_*=|Df|_w=|Df|_C=|Df|_S (the smallest one among the four notions coincides with the largest one).
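For the reader's convenience, here is the chain of estimates behind the inequality relating π(Γ) and Mod₂(Γ) displayed above: for any ρ admissible in (38) (i.e. ∫_γρ≥1 for every γ∈Γ), using Cauchy-Schwarz and the compression bound (e_s)_♯π≤C(π)𝔪 of the test plan (with C(π) the compression constant, as in the display above),

```latex
\boldsymbol{\pi}(\varGamma)
 \le\int_{\varGamma}\int_0^1\rho(\gamma_s)\,|\dot\gamma_s|\,\mathrm{d}s\,\mathrm{d}\boldsymbol{\pi}(\gamma)
 \le\Bigl(\iint_0^1\rho^2(\gamma_s)\,\mathrm{d}s\,\mathrm{d}\boldsymbol{\pi}\Bigr)^{1/2}
   \Bigl(\iint_0^1|\dot\gamma_s|^2\,\mathrm{d}s\,\mathrm{d}\boldsymbol{\pi}\Bigr)^{1/2}
 \le\sqrt{C(\boldsymbol{\pi})}\,\|\rho\|_{L^2(X,\mathfrak{m})}
   \Bigl(\iint_0^1|\dot\gamma_s|^2\,\mathrm{d}s\,\mathrm{d}\boldsymbol{\pi}\Bigr)^{1/2};
```

squaring and taking the infimum over admissible ρ then yields [π(Γ)]²≤C(π)Mod₂(Γ)∬₀¹|γ̇_s|² ds dπ(γ).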

What we get by the new approach to Sobolev spaces on metric measure spaces is the following result.

Theorem 4.7

(Density in energy of Lipschitz functions)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a compact normalized metric measure space. Then for any \(f\in L^{2}(X,{\mathfrak{m}})\) with weak upper gradient in \(L^{2}(X,{\mathfrak{m}} )\) there exists a sequence (f n ) of Lipschitz functions converging to f in \(L^{2}(X,{\mathfrak{m}})\) such that both |Df n | and |Df n | w converge to |Df| w in \(L^{2}(X,{\mathfrak{m}})\) as n→∞.

Proof

Straightforward consequence of the identity of weak and relaxed gradients and of Proposition 4.8. □

Let us point out a few aspects of the strategy of the proof of Theorem 4.7, which of course strongly relies on Lemma 4.3 and Proposition 4.14. First of all, notice that the stated existence of a sequence of Lipschitz functions f_n converging to f with |Df_n|→|Df|_w in \(L^{2}(X,{\mathfrak{m}})\) is equivalent to showing that

$$ \lim_{n\to\infty} Y_{1/n}(f)\le\int _X|D f|_w^2 \mathrm {d} { \mathfrak{m}}, $$
(40)

where, for τ>0, Y τ denotes the Yosida regularization

$$Y_\tau(f):= \inf_{h\in\operatorname{Lip}(X)} \biggl \{\frac{1}{2}\int_X|D h|^2 \mathrm{d} { \mathfrak{m}} +\frac{1}{2\tau} \int_X|h-f|^2 \mathrm{d} {\mathfrak{m}} \biggr\}. $$

In fact, the sequence f n can be chosen by a simple diagonal argument among the approximate minimizers of Y 1/n (f). On the other hand, it is well known that the relaxation procedure we used to define the Cheeger energy yields

$$ Y_{1/n}(f)=\min_{h\in D(\mathsf{Ch})} \biggl\{\mathsf{Ch}(h)+ \frac{n}{2}\int_X |h-f|^2 \mathrm{d} { \mathfrak{m}} \biggr\}, $$
(41)

and therefore (40) could be achieved by trying to estimate the Cheeger energy of the unique minimizer \(\tilde{f}_{n}\) of (41) in terms of |Df| w .

Instead of using the Yosida regularization Y_{1/n}, in the proof of Theorem 4.6 we obtained a better approximation of f by flowing it (for a small time step, say t_n↓0) through the L²-gradient flow f_t of the Cheeger energy. This flow is strictly related to Y_τ, since it can be obtained as the limit of suitably rescaled iterated minimizers of Y_τ (the so-called Minimizing Movement scheme, see e.g. [3]), but has the great advantage of providing a continuous curve of probability densities f_t, which can be represented as the image of a test plan through Lisini's Theorem. Thanks to this representation and Kuwada's Lemma, we were allowed to use the weak upper gradient |Df|_w instead of |Df| to estimate the entropy dissipation along f_t (see (36)) and to obtain the desired sharp bound on |Df_s| at least for some time s∈(0,t_n). In any case, a posteriori we recovered the validity of (40).

This density result was previously known (via the use of maximal functions and covering arguments) under the assumption that the space is doubling and supports a local Poincaré inequality for weak upper gradients, see [9, Theorem 4.14, Theorem 4.24]. Actually, Cheeger proved more, namely that under these hypotheses Lipschitz functions are dense in the W^{1,2} norm, a result which is still unknown in the general case. Also, notice that another byproduct of our density-in-energy result is the equivalence between the local Poincaré inequality stated with Lipschitz functions on the left-hand side and the slope on the right-hand side, and the local Poincaré inequality stated with general functions on the left-hand side and upper gradients on the right-hand side; this result was previously known [19] under much more restrictive assumptions on the metric measure structure.

5 The Relative Entropy and Its W 2-Gradient Flow

In this section we study the W 2-gradient flow of the relative entropy on spaces with Ricci curvature bounded below (in short: CD(K,∞) spaces). The content is essentially extracted from [12]. As before the space \((X,\mathsf{d},{\mathfrak{m}})\) is compact and normalized (i.e. \({\mathfrak{m}}(X)=1\)).

Recall that the relative entropy functional is defined by

$$\operatorname {Ent}_{\mathfrak {m}}(\mu):= \begin{cases} \int_Xf\log f \mathrm{d}{\mathfrak{m}}&\mbox{if }\mu=f\mathfrak {m},\\ +\infty&\mbox{otherwise}. \end{cases} $$

Definition 5.9

(Weak bound from below on the Ricci curvature)

We say that \((X,\mathsf{d},{\mathfrak{m}})\) has Ricci curvature bounded from below by K for some K∈ℝ if the relative entropy functional \(\operatorname{Ent}_{{\mathfrak{m}}}\) is K-convex along geodesics in \((\mathscr{P}(X),W_{2})\). More precisely, if for any \(\mu_{0}, \mu_{1}\in D(\operatorname{Ent}_{{\mathfrak{m}}})\) there exists a constant speed geodesic between μ 0 and μ 1 satisfying

$$\operatorname{Ent}_{\mathfrak{m}}(\mu_t)\leq(1-t)\operatorname{Ent}_{\mathfrak{m}}( \mu_0)+ t\operatorname{Ent}_{\mathfrak{m}}(\mu_1)- \frac{K}{2}t(1-t)W_2^2(\mu_0, \mu_1)\quad \forall t\in[0,1]. $$

This definition was introduced in [23] and [31]. Its two basic features are: compatibility with the Riemannian case (i.e. a compact Riemannian manifold endowed with the normalized volume measure has Ricci curvature bounded below by K in the classical pointwise sense if and only if \(\operatorname {Ent}_{{\mathfrak{m}}}\) is K-geodesically convex in \((\mathscr{P}(X),W_{2})\)) and stability w.r.t. measured Gromov-Hausdorff convergence.

We also recall that Finsler geometries are included in the class of metric measure spaces with Ricci curvature bounded below. This means that if we have a smooth compact Finsler manifold (that is: a differentiable manifold endowed with a norm—possibly not coming from an inner product—on each tangent space which varies smoothly on the base point) endowed with an arbitrary positive C measure, then this space has Ricci curvature bounded below by some K∈ℝ (see the theorem stated at page 926 of [32] for the flat case and [24] for the general one).

The goal now is to study the W 2-gradient flow of \(\operatorname {Ent}_{{\mathfrak{m}}}\). Notice that the general theory of gradient flows of K-convex functionals ensures the following existence result (see the representation formula for the slope (7) and Theorem 2.1).

Theorem 5.8

(Consequences of the general theory of gradient flows)

Let \((X,\mathsf{d}, {\mathfrak{m}})\) be a CD(K,∞) space. Then the slope \(|D^{-}\operatorname{Ent}_{{\mathfrak{m}}}|\) is lower semicontinuous w.r.t. weak convergence and for any \(\mu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\) there exists a gradient flow (in the EDE sense of Definition 2.1) of \(\operatorname{Ent}_{{\mathfrak{m}}}\) starting from μ.

Thus, existence is granted. The problem is then to show uniqueness of the gradient flow. To this aim, we need to introduce the concept of push forward via a plan.

Definition 5.10

(Push forward via a plan)

Let \(\mu\in\mathscr{P}(X)\) and let \(\boldsymbol{\gamma}\in\mathscr{P}(X\times X)\) be such that \(\mu\ll\pi^{1}_{\sharp}\boldsymbol{\gamma}\). The measures \(\boldsymbol{\gamma}_{\mu}\in\mathscr{P}(X\times X)\) and \({\boldsymbol{\gamma}_{\sharp}\mu}\in\mathscr{P}(X)\) are defined as:

$$\mathrm{d}\boldsymbol{\gamma}_\mu(x,y):=\frac{\mathrm{d}\mu }{\mathrm{d}\pi^1_\sharp \boldsymbol{\gamma}}(x) \mathrm{d}\boldsymbol{\gamma}(x,y),\qquad {\boldsymbol{\gamma}_\sharp \mu}:=\pi^2_\sharp\boldsymbol{\gamma }_\mu. $$

Observe that, since γ μ γ, we have \({\boldsymbol{\gamma}_{\sharp}\mu}\ll\pi^{2}_{\sharp}\boldsymbol {\gamma}\). We will say that γ has bounded deformation if there exist 0<cC<∞ such that \(c{\mathfrak{m}}\leq\pi^{i}_{\sharp}\boldsymbol{\gamma}\leq C{\mathfrak{m}}\), i=1,2. Writing \(\mu=f \pi^{1}_{\sharp}\boldsymbol{\gamma}\), the definition gives that

$$ \boldsymbol{\gamma}_\sharp\mu=\eta \pi^2_\sharp\boldsymbol {\gamma}\quad\text{with $\eta$ given by}\quad \eta(y)=\int f(x) \mathrm{d}\boldsymbol{\gamma}_y(x), $$
(42)

where {γ y } yX is the disintegration of γ w.r.t. its second marginal.
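A simple sanity check, immediate from the definition (not needed in the sequel, but clarifying the construction): when the plan is induced by a Borel map T, i.e. γ=(Id,T)_♯𝔪, one has π¹_♯γ=𝔪 and the push forward via the plan reduces to the usual push forward via the map. Indeed, writing μ=f𝔪,

```latex
\mathrm{d}\boldsymbol{\gamma}_\mu(x,y)=f(x)\,\mathrm{d}\boldsymbol{\gamma}(x,y),
\quad\text{i.e. }\boldsymbol{\gamma}_\mu=(\mathrm{Id},T)_\sharp(f\mathfrak{m}),
\qquad\Longrightarrow\qquad
\boldsymbol{\gamma}_\sharp\mu=\pi^2_\sharp\bigl((\mathrm{Id},T)_\sharp(f\mathfrak{m})\bigr)=T_\sharp\mu.
```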

The operation of push forward via a plan has interesting properties in connection with the relative entropy functional.

Proposition 5.15

The following properties hold:

  1. (i)

    For any \(\mu, \nu\in\mathscr{P}(X)\) and \(\boldsymbol{\gamma}\in\mathscr{P}(X\times X)\) such that \(\mu, \nu\ll\pi^{1}_{\sharp}\boldsymbol{\gamma}\) it holds

    $$\operatorname {Ent}_{\boldsymbol{\gamma}_\sharp \nu}(\boldsymbol{\gamma}_\sharp\mu)\leq \operatorname {Ent}_{\nu}(\mu). $$
  2. (ii)

    For \(\mu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\) and \(\boldsymbol{\gamma}\in\mathscr{P}(X\times X)\) with bounded deformation, it holds \(\boldsymbol{\gamma}_{\sharp}\mu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\).

  3. (iii)

    Given \(\boldsymbol{\gamma}\in\mathscr{P}(X\times X)\) with bounded deformation, the map

    $$D(\operatorname{Ent}_{{\mathfrak{m}}})\ni\mu\quad\mapsto\quad \operatorname {Ent}_{{ \mathfrak{m}}}(\mu)-\operatorname {Ent}_{{\mathfrak{m}}}(\boldsymbol{\gamma}_\sharp\mu), $$

    is convex (w.r.t. linear interpolation of measures).

Proof

(i). We can assume μ≪ν, otherwise there is nothing to prove. Then it is immediate to check from the definition that γ_μ≪γ_ν. Let \(\mu=f\nu\), \(\nu=\theta \pi^{1}_{\sharp}\boldsymbol{\gamma}\), \(\boldsymbol{\gamma}_{\sharp}\mu=\eta \boldsymbol{\gamma}_{\sharp}\nu\), and u(z):=zlogz. By disintegrating γ as in (42), we have that

$$\eta(y)=\int f(x) \mathrm{d}\tilde{\boldsymbol{\gamma}}_y(x),\quad \tilde{\boldsymbol{\gamma}}_y= \biggl(\int\theta(x) \mathrm {d} \boldsymbol{\gamma}_y(x) \biggr)^{-1} \theta \boldsymbol{\gamma}_y. $$

The convexity of u and Jensen’s inequality with the probability measures \(\tilde{\boldsymbol{\gamma}}_{y}\) yield

$$u\bigl(\eta(y)\bigr)\le\int u\bigl(f(x)\bigr) \mathrm{d}\tilde{\boldsymbol{ \gamma}}_y(x). $$

Since \(\{\tilde{\boldsymbol{\gamma}}_{y}\}_{y\in X}\) is the disintegration of \(\tilde{\boldsymbol{\gamma}}=(\theta\circ\pi^{1})\boldsymbol{\gamma }\) with respect to its second marginal \(\boldsymbol{\gamma}_{\sharp}\nu\) and the first marginal of \(\tilde{\boldsymbol{\gamma}}\) is ν, by integration of both sides with respect to \(\boldsymbol{\gamma}_{\sharp}\nu\) we get

$$\operatorname{Ent}_{\boldsymbol{\gamma}_\sharp\nu}(\boldsymbol{\gamma}_\sharp\mu)=\int u\bigl(\eta(y)\bigr) \mathrm{d}\boldsymbol{\gamma}_\sharp\nu(y)\leq\iint u\bigl(f(x)\bigr) \mathrm{d}\tilde{\boldsymbol{\gamma}}_y(x) \mathrm{d}\boldsymbol{\gamma}_\sharp\nu(y)=\int u\bigl(f(x)\bigr) \mathrm{d}\nu(x)=\operatorname{Ent}_{\nu}(\mu). $$
(ii). Taking into account the identity

$$ \operatorname {Ent}_{\nu}(\mu)=\operatorname {Ent}_{\sigma}(\mu)+\int\log \biggl( \frac{\mathrm{d}\sigma}{\mathrm{d} \nu} \biggr) \mathrm{d}\mu, $$
(43)

valid for any \(\sigma\in\mathscr{P}(X)\) with σ having bounded density w.r.t. ν, the fact that \(\boldsymbol{\gamma}_{\sharp}(\pi^{1}_{\sharp}\boldsymbol{\gamma})=\pi^{2}_{\sharp}\boldsymbol{\gamma}\) and the fact that \(c{\mathfrak{m}}\leq\pi^{1}_{\sharp}\boldsymbol{\gamma},\pi^{2}_{\sharp}\boldsymbol{\gamma}\leq C{\mathfrak{m}}\), the conclusion follows from

$$\operatorname{Ent}_{\mathfrak{m}}(\boldsymbol{\gamma}_\sharp\mu) =\operatorname{Ent}_{\pi^2_\sharp\boldsymbol{\gamma}}(\boldsymbol{\gamma}_\sharp\mu) +\int\log\biggl(\frac{\mathrm{d}\pi^2_\sharp\boldsymbol{\gamma}}{\mathrm{d}{\mathfrak{m}}}\biggr) \mathrm{d}\boldsymbol{\gamma}_\sharp\mu \leq\operatorname{Ent}_{\pi^1_\sharp\boldsymbol{\gamma}}(\mu)+\log C \leq\operatorname{Ent}_{\mathfrak{m}}(\mu)+\log\frac{C}{c}<\infty. $$
(iii). Let \(\mu_{0}, \mu_{1}\in D(\operatorname {Ent}_{{\mathfrak{m}}})\) and define μ_t:=(1−t)μ_0+tμ_1 and \(\nu_t:=\boldsymbol{\gamma}_{\sharp}\mu_t\). A direct computation shows that

$$\begin{aligned} \operatorname{Ent}_{\mathfrak{m}}(\mu_t)-\operatorname{Ent}_{\mathfrak{m}}(\nu_t) =&(1-t)\bigl(\operatorname{Ent}_{\mathfrak{m}}(\mu_0)-\operatorname{Ent}_{\mathfrak{m}}(\nu_0)\bigr) +t\bigl(\operatorname{Ent}_{\mathfrak{m}}(\mu_1)-\operatorname{Ent}_{\mathfrak{m}}(\nu_1)\bigr)\\ &-(1-t)\bigl(\operatorname{Ent}_{\mu_t}(\mu_0)-\operatorname{Ent}_{\nu_t}(\nu_0)\bigr) -t\bigl(\operatorname{Ent}_{\mu_t}(\mu_1)-\operatorname{Ent}_{\nu_t}(\nu_1)\bigr), \end{aligned}$$

and from (i) we have that

$$\operatorname {Ent}_{\mu_t}(\mu_i)\geq \operatorname {Ent}_{\boldsymbol{\gamma}_\sharp \mu_t}(\boldsymbol{ \gamma}_\sharp\mu_i) =\operatorname {Ent}_{\nu_t}(\nu_i),\quad\forall t \in[0,1],\ i=0,1, $$

which gives the conclusion. □

In the next lemma and in the sequel we use the short notation

$$C(\boldsymbol{\gamma}):=\int_{X\times X}\mathsf{d}^2(x,y) \mathrm {d}\boldsymbol{\gamma}(x,y). $$

Lemma 5.5

(Approximability in Entropy and distance)

Let \(\mu, \nu\in D(\operatorname{Ent}_{\mathfrak {m}})\). Then there exists a sequence (γ n) of plans with bounded deformation such that \(\operatorname{Ent}_{\mathfrak{m}}(\boldsymbol{\gamma}^{n}_{\sharp}\mu)\to \operatorname{Ent}_{\mathfrak{m}}(\nu)\) and \(C(\boldsymbol{\gamma}^{n}_{\mu})\to W_{2}^{2}(\mu,\nu)\) as n→∞.

Proof

Let f and g respectively be the densities of μ and ν w.r.t. \({\mathfrak{m}}\); pick γOpt(μ,ν) and, for every n∈ℕ, let A n :={(x,y):f(x)+g(y)≤n} and

$$\boldsymbol{\gamma}_n:=c_n \biggl(\boldsymbol{ \gamma}|_{A_n}+\frac{1}{n}(\mathrm{Id},\mathrm{Id})_\sharp { \mathfrak{m}} \biggr), $$

where c n →1 is the normalization constant. It is immediate to check that γ n is of bounded deformation and that this sequence satisfies the thesis (see [12] for further details). □

Proposition 5.16

(Convexity of the squared slope)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a CD(K,∞) space. Then the map

$$D(\operatorname{Ent}_{{\mathfrak{m}}})\ni\mu\quad\mapsto\quad |D^- \operatorname{Ent}_{{\mathfrak{m}}}|^2(\mu) $$

is convex (w.r.t. linear interpolation of measures).

Notice that the only assumption we make is the K-convexity of the entropy w.r.t. W 2, and from this we deduce the convexity of the squared slope w.r.t. the classical linear interpolation of measures.

Proof

Recall that from (7) we know that

$$\big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|(\mu)=\sup_{\nu\neq\mu} \frac{ [\operatorname{Ent}_{\mathfrak{m}}(\mu)-\operatorname{Ent}_{\mathfrak{m}}(\nu) -\frac{K^-}{2}W_2^2(\mu,\nu) ]^+}{W_2(\mu,\nu)}. $$

We claim that it also holds

$$\big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|(\mu)=\sup_{\boldsymbol {\gamma}} \frac{[\operatorname{Ent}_{\mathfrak{m}}(\mu)- \operatorname{Ent}_{\mathfrak{m}}(\boldsymbol{\gamma}_\sharp\mu) - \frac{K^-}{2}C(\boldsymbol{\gamma}_\mu) ]^+}{\sqrt{C(\boldsymbol{\gamma}_\mu)}}, $$

where the supremum is taken among all plans with bounded deformation (and the right hand side is taken to be 0 by definition if C(γ_μ)=0).

Indeed, Lemma 5.5 gives that the first expression is not larger than the second. For the converse inequality we can assume C(γ_μ)>0, \(\nu:=\boldsymbol{\gamma}_{\sharp}\mu\neq\mu\) and K<0. Then it is sufficient to apply the simple inequality

$$a, b, c\in\mathbb{R},\quad 0<b\leq c\quad\Rightarrow\quad\frac{(a-b)^+}{\sqrt{b}} \geq\frac{(a-c)^+}{\sqrt{c}}, $$

with \(a:=\operatorname {Ent}_{\mathfrak {m}}(\mu)-\operatorname {Ent}_{\mathfrak {m}}(\boldsymbol{\gamma}_{\sharp}\mu)\), \(b:=\frac {K^{-}}{2}W^{2}_{2}(\mu,\boldsymbol{\gamma}_{\sharp}\mu)\) and \(c:=\frac{K^{-}}{2}C(\boldsymbol{\gamma}_{\mu})\).
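The elementary inequality just invoked follows by observing that each factor is monotone in the right direction:

```latex
0<b\le c\ \Longrightarrow\ (a-b)^+\ge(a-c)^+\ge0
\quad\text{and}\quad
\frac{1}{\sqrt{b}}\ge\frac{1}{\sqrt{c}}>0,
\qquad\text{hence}\quad
\frac{(a-b)^+}{\sqrt{b}}\ge\frac{(a-c)^+}{\sqrt{c}}.
```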

Thus, to prove the thesis it is enough to show that for every γ with bounded deformation the map

$$D(\operatorname{Ent}_{{\mathfrak{m}}})\ni\mu\quad\mapsto\quad \frac{ [ (\operatorname {Ent}_{\mathfrak {m}}(\mu) -\operatorname {Ent}_{\mathfrak {m}}(\boldsymbol{\gamma}_\sharp\mu) -\frac{K^-}{2}C(\boldsymbol{\gamma}_\mu) )^+]^2}{C(\boldsymbol{\gamma}_\mu)}, $$

is convex w.r.t. linear interpolation of measures.

Clearly the map

$$D(\operatorname{Ent}_{{\mathfrak{m}}})\ni\mu\quad\mapsto\quad C(\boldsymbol{ \gamma}_\mu)=\int \biggl( \int\mathsf{d}^2(x,y) \mathrm{d}\boldsymbol{\gamma}_{x}(y) \biggr) \mathrm{d}\mu(x), $$

where {γ x } is the disintegration of γ w.r.t. its first marginal, is linear. Thus, from (iii) of Proposition 5.15 we know that the map

$$\mu\quad\mapsto\quad \operatorname {Ent}_{\mathfrak {m}}(\mu)-\operatorname {Ent}_{\mathfrak {m}}(\boldsymbol{\gamma}_\sharp \mu)- \frac {K^-}{2}C(\boldsymbol{\gamma}_\mu), $$

is convex w.r.t. linear interpolation of measures. Hence the same is true for its positive part. The conclusion follows from the fact that the function Ψ:[0,∞)2→ℝ∪{+∞} defined by

$$\varPsi(a,b):= \begin{cases} \frac{a^2}{b}&\mbox{if }b>0,\\ +\infty&\mbox{if }b=0,\ a>0,\\ 0&\mbox{if }a=b=0, \end{cases} $$

is convex and nondecreasing w.r.t. a. □
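The convexity and monotonicity of Ψ can be verified, for instance, via the standard dual representation of the quotient a²/b as a supremum of affine functions:

```latex
\frac{a^2}{b}=\sup_{\lambda\in\mathbb{R}}\bigl(2\lambda a-\lambda^2 b\bigr)
\qquad\text{for }a\ge0,\ b>0,
```

the supremum being attained at λ=a/b; the same formula returns +∞ when b=0, a>0, and 0 when a=b=0, so Ψ is a supremum of affine functions of (a,b), hence convex, and restricting to λ≥0 (which suffices when a≥0) shows that Ψ is nondecreasing in a.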

The convexity of the squared slope allows us to prove uniqueness of the gradient flow of the entropy:

Theorem 5.9

(Uniqueness of the gradient flow of \(\operatorname {Ent}_{{\mathfrak{m}}}\))

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a CD(K,∞) space and let \(\mu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\). Then there exists a unique gradient flow of \(\operatorname{Ent}_{{\mathfrak{m}}}\) starting from μ in \((\mathscr{P}(X),W_{2})\).

Proof

We recall (inequality (4)) that the squared Wasserstein distance is convex w.r.t. linear interpolation of measures. Therefore, given two absolutely continuous curves \((\mu^{1}_{t})\) and \((\mu^{2}_{t})\), the curve \(t\mapsto\mu_{t}:=\frac{\mu^{1}_{t}+\mu^{2}_{t}}{2}\) is absolutely continuous as well and its metric speed can be bounded from above by

$$ |\dot{\mu}_t|^2\leq\frac{|\dot{\mu}^1_t|^2+|\dot{\mu}^2_t|^2}{2}, \quad \text{for a.e.}\ t\in(0,\infty). $$
(44)
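Explicitly, (44) follows from the convexity of the squared Wasserstein distance recalled above: for 0≤s<t,

```latex
W_2^2(\mu_s,\mu_t)
 =W_2^2\Bigl(\tfrac{\mu^1_s+\mu^2_s}{2},\tfrac{\mu^1_t+\mu^2_t}{2}\Bigr)
 \le\tfrac{1}{2}W_2^2\bigl(\mu^1_s,\mu^1_t\bigr)+\tfrac{1}{2}W_2^2\bigl(\mu^2_s,\mu^2_t\bigr);
```

dividing by (t−s)² and letting t↓s gives (44) at a.e. s where the metric derivatives of all three curves exist.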

Let \((\mu^{1}_{t})\) and \((\mu^{2}_{t})\) be gradient flows of \(\operatorname {Ent}_{{\mathfrak{m}}}\) starting from \(\mu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\). Then for i=1,2 we have

$$\operatorname{Ent}_{\mathfrak{m}}(\mu)=\operatorname{Ent}_{\mathfrak{m}}\bigl(\mu^i_T\bigr)+\frac{1}{2}\int_0^T\big|\dot{\mu}^i_t\big|^2 \mathrm{d}t+\frac{1}{2}\int_0^T\big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|^2\bigl(\mu^i_t\bigr) \mathrm{d}t,\quad\forall T>0. $$

Adding up these two equalities, using the convexity of the squared slope guaranteed by Proposition 5.16, the convexity of the squared metric speed given by (44) and the strict convexity of the relative entropy, we deduce that for the curve t↦μ_t it holds

$$\operatorname {Ent}_{\mathfrak {m}}(\mu)>\operatorname {Ent}_{\mathfrak {m}}(\mu_T)+\frac{1}{2}\int_0^T| \dot{\mu}_t|^2 \mathrm{d} {t}+\frac{1}{2}\int _0^T\big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|^2( \mu_t) {\mathrm{d}t}, $$

for every T such that \(\mu^{1}_{T}\neq\mu^{2}_{T}\). This contradicts inequality (9). □

6 The Heat Flow as Gradient Flow

It is well known that on ℝd the heat flow can be seen both as gradient flow of the Dirichlet energy in L 2 and as gradient flow of the relative entropy in \((\mathscr{P}(\mathbb{R}^{d}),W_{2})\). It is therefore natural to ask whether this identification between the two a priori different gradient flows persists or not in a general compact and normalized metric measure space \((X,\mathsf {d},{\mathfrak{m}})\).

The strategy consists in considering a gradient flow (f_t) of Ch with nonnegative initial data and in proving that the curve \(t\mapsto\mu_{t}:=f_{t}{\mathfrak{m}}\) is a gradient flow of \(\operatorname {Ent}_{\mathfrak {m}}\) in \((\mathscr{P}(X),W_{2})\): by the uniqueness result of Theorem 5.9 this will be sufficient to conclude.

We already built most of the ingredients needed for the proof to work; the only thing that we should add is the following lemma, where the slope of \(\operatorname{Ent}_{{\mathfrak{m}}}\) is bounded from above in terms of the notions of “norm of weak gradient” that we discussed in Sect. 4. Notice that the bound (47) for Lipschitz functions was already known to Lott-Villani [23], so that our added value here is the use of the density in energy of Lipschitz functions to get the correct, sharp inequality (45) (sharpness will be seen in (48)).

Lemma 6.6

(Fisher bounds slope)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a compact and normalized CD(K,∞) metric-measure space and let f be a probability density which is Sobolev along a.e. curve. Then

$$ \big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|^2(f{\mathfrak{m}})\leq \int _X\frac{|D f|^2_w}{f} \mathrm{d} {\mathfrak{m}}=4\int _X|D \sqrt{f}|_w^2 \mathrm{d} { \mathfrak{m}}. $$
(45)

Proof

Assume at first that f is Lipschitz with 0<c≤f, and let (f_n) be a sequence of probability densities such that \(W_{2}(f_{n}{\mathfrak{m}},f{\mathfrak{m}})\to0\) and along which the slope of \(\operatorname{Ent}_{{\mathfrak{m}}}\) at \(f{\mathfrak{m}}\) is attained. Choose \(\boldsymbol{\gamma}_{n}\in\mbox {\textsc{Opt}}(f{\mathfrak{m}},f_{n}{\mathfrak{m}})\) and notice that

$$ \operatorname{Ent}_{\mathfrak{m}}(f{\mathfrak{m}})-\operatorname{Ent}_{\mathfrak{m}}(f_n{\mathfrak{m}}) \leq\int_X\log f (f-f_n) \mathrm{d}{\mathfrak{m}} =\iint\bigl(\log f(x)-\log f(y)\bigr) \mathrm{d}\boldsymbol{\gamma}_n(x,y) \leq\sqrt{\iint L^2(x,y) \mathrm{d}\boldsymbol{\gamma}_n(x,y)}\;W_2(f{\mathfrak{m}},f_n{\mathfrak{m}}), $$
(46)

where γ n,x is the disintegration of γ n with respect to \(f{\mathfrak{m}}\), and L is the bounded Borel function

$$L(x,y):= \begin{cases} \frac{ |\log f(x)-\log f(y) |}{\mathsf{d}(x,y)}& \mbox{if }x\neq y,\\ |D\log f|(x)=\frac{|Df|(x)}{f(x)}&\mbox{if }x=y. \end{cases} $$

Notice that for every xX the map yL(x,y) is upper-semicontinuous; since \(\int (\int \mathsf{d}^{2}(x,y) \mathrm{d}\boldsymbol{\gamma}_{n,x} )f(x) \mathrm{d}{\mathfrak{m}}\to0\) as n→∞, we can assume without loss of generality that

$$\lim_{n\to\infty}\int\mathsf{d}^2(x,y) \mathrm{d}\boldsymbol { \gamma}_{n,x}(y)=0\quad \mbox{for }f{\mathfrak{m}}\hbox{-a.e. }x\in X. $$

Fatou’s Lemma then yields

$$\mathop{\overline{\lim}}_{n\to\infty}\int L^2(x,y) \mathrm {d} \boldsymbol{\gamma}_n(x,y) \leq\int_XL^2(x,x)f(x) \mathrm{d} {\mathfrak{m}}(x)=\int_X\frac{|Df |^2}{f} \mathrm{d} {\mathfrak{m}}, $$

hence (46) gives

$$ \big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|(f{\mathfrak{m}})= \mathop {\overline{\lim}}_{n\to\infty}\frac{(\operatorname {Ent}_{\mathfrak {m}}(f{\mathfrak{m}} )-\operatorname {Ent}_{\mathfrak {m}}(f_n{\mathfrak{m}}))^+}{ W_2(f{\mathfrak{m}},f_n{\mathfrak{m}})}\leq\sqrt{ \int _X\frac{|Df |^2}{f} \mathrm{d} {\mathfrak{m}}}. $$
(47)

We now turn to the general case. Let f be any probability density Sobolev along a.e. curve such that \(\sqrt{f}\in D(\mathsf{Ch})\) (otherwise there is nothing to prove). We use Theorem 4.7 to find a sequence of Lipschitz functions \((\sqrt{f_{n}})\) converging to \(\sqrt{f}\) in \(L^{2}(X,{\mathfrak{m}})\) and such that \(|D \sqrt{f_{n}}|\to|D \sqrt{f}|_{w}\) in \(L^{2}(X,{\mathfrak{m}})\) and \({\mathfrak{m}}\)-a.e. Up to adding positive vanishing constants and multiplying by suitable normalization factors, we can assume that 0<c_n≤f_n and \(\int_{X}f_{n} \mathrm{d}{\mathfrak{m}}=1\) for any n∈ℕ. The conclusion follows passing to the limit in (47), taking into account the weak lower semicontinuity of \(|D^{-}\operatorname{Ent}_{{\mathfrak{m}}}|\) (formula (7) and discussion thereafter). □
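The equality between the two integrals in (45) reflects the chain rule for minimal weak gradients, applied to the square root on the set {f>0} (on {f=0} both integrands vanish 𝔪-a.e. by locality):

```latex
|D\sqrt{f}|_w=\frac{|Df|_w}{2\sqrt{f}}\quad\mathfrak{m}\text{-a.e. on }\{f>0\}
\qquad\Longrightarrow\qquad
4\int_X|D\sqrt{f}|_w^2\,\mathrm{d}\mathfrak{m}=\int_X\frac{|Df|_w^2}{f}\,\mathrm{d}\mathfrak{m}.
```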

Theorem 6.10

(The heat flow as gradient flow)

Let \(f_{0}\in L^{2}(X,{\mathfrak{m}})\) be such that \(\mu_{0}:=f_{0}{\mathfrak{m}}\in\mathscr{P}(X)\), denote by (f_t) the gradient flow of Ch in \(L^{2}(X,{\mathfrak{m}})\) starting from f 0 and by (μ_t) the gradient flow of \(\operatorname{Ent}_{{\mathfrak{m}}}\) in \((\mathscr{P}(X),W_{2})\) starting from μ 0. Then \(\mu_{t}=f_{t}{\mathfrak{m}}\) for any t≥0.

Proof

Thanks to the uniqueness result of Theorem 5.9, it is sufficient to prove that \((f_{t}{\mathfrak{m}})\) satisfies the Energy Dissipation Equality for \(\operatorname{Ent}_{{\mathfrak{m}}}\) in \((\mathscr{P}(X),W_{2})\). We assume first that 0<c≤f 0≤C<∞ \({\mathfrak{m}}\)-a.e. in X, so that the maximum principle (Proposition 4.11) ensures 0<c≤f t ≤C<∞ for any t>0. By Proposition 4.11 we know that \(t\mapsto \operatorname {Ent}_{{\mathfrak{m}}}(f_{t}{\mathfrak{m}})\) is absolutely continuous with derivative equal to \(-\int_{X}\frac{|D f_{t}|_{w}^{2}}{f_{t}} \mathrm {d}{\mathfrak{m}}\). Lemma 4.3 ensures that \(t\mapsto f_{t}{\mathfrak{m}}\) is absolutely continuous w.r.t. W 2 with squared metric speed bounded by \(\int_{X}\frac{|D f_{t}|_{w}^{2}}{f_{t}} \mathrm{d}{\mathfrak{m}}\), so that taking into account Lemma 6.6 we get

$$\operatorname {Ent}_{{\mathfrak{m}}}(f_0{\mathfrak{m}})\geq \operatorname {Ent}_{{\mathfrak{m}}}(f_t \mathfrak {m})+ \frac{1}{2}\int_0^t| \dot{f_s{\mathfrak{m}}}|^2 {\mathrm{d}s}+\frac{1}{2}\int _0^t\big|D^-\operatorname{Ent}_{{\mathfrak{m}}} \big|^2(f_s{\mathfrak{m}}) {\mathrm{d}s}, $$

which, together with (9), ensures the thesis.

For the general case we argue by approximation, considering

$$f^n_0:=c_n\min\bigl\{n,\max \{f_0,1/n\}\bigr\}, $$

c n being the normalizing constant, and the corresponding gradient flow \((f^{n}_{t})\) of Ch. The fact that \(f^{n}_{0}\to f_{0}\) in \(L^{2}(X,{\mathfrak{m}})\) and the convexity of Ch imply that \(f^{n}_{t}\to f_{t}\) in \(L^{2}(X,{\mathfrak{m}})\) for any t>0. In particular, \(W_{2}(f^{n}_{t}{\mathfrak{m}},f_{t}{\mathfrak{m}})\to0\) as n→∞ for every t (because convergence w.r.t. W 2 is equivalent to weak convergence of measures).

Now notice that we know that

$$\operatorname{Ent}_{\mathfrak{m}}\bigl(f_0^n\mathfrak{m}\bigr) =\operatorname{Ent}_{\mathfrak{m}}\bigl(f^n_t\mathfrak{m}\bigr)+ \frac{1}{2}\int_0^t\big|\dot {f^n_s{ \mathfrak{m}}}\big|^2 {\mathrm{d}s} +\frac{1}{2}\int_0^t\big|D^- \operatorname{Ent}_{{\mathfrak{m}}}\big|^2\bigl(f^n_s\mathfrak{m} \bigr) {\mathrm{d}s},\quad\forall t>0. $$

Furthermore, it is immediate to check that \(\operatorname {Ent}_{\mathfrak {m}}(f^{n}_{0}{\mathfrak{m}})\to \operatorname {Ent}_{\mathfrak {m}}(f_{0}{\mathfrak{m}})\) as n→∞. The pointwise convergence of \(f^{n}_{t}{\mathfrak{m}}\) to \(f_{t}{\mathfrak{m}}\) w.r.t. W 2 easily yields that the terms on the right hand side of the last equation are lower semicontinuous as n→∞ (recall Theorem 5.8 for the slope). Thus it holds

$$\operatorname {Ent}_{\mathfrak {m}}(f_0{\mathfrak{m}})\geq \operatorname {Ent}_{\mathfrak {m}}(f_t{\mathfrak{m}})+\frac{1}{2}\int _0^t|\dot {f_s{ \mathfrak{m}}}|^2 {\mathrm{d}s}+ \frac{1}{2}\int_0^t\big|D^- \operatorname{Ent}_{{\mathfrak{m}}}\big|^2(f_s{\mathfrak{m}}) { \mathrm{d}s},\quad\forall t>0, $$

which, by (11), is the thesis.

We know, by Theorem 5.9, that there is at most one gradient flow starting from μ 0. We also know that a gradient flow \(f_{t}'\) of Ch starting from f 0 exists, and the argument above gives that \(\mu_{t}':=f_{t}'{\mathfrak{m}}\) is a gradient flow of \(\mathrm{Ent}_{\mathfrak{m}}\). The uniqueness of gradient flows gives \(\mu_{t}=\mu_{t}'\) for all t≥0. □

As a consequence of the previous Theorem 6.10 it would not be difficult to prove that the inequality (45) is in fact an identity: if \((X,\mathsf{d},{\mathfrak{m}})\) is a compact and normalized CD(K,∞) space, then \(|D^{-}\operatorname{Ent}_{{\mathfrak{m}}}|(f\mathfrak {m})<\infty\) if and only if the probability density f is Sobolev along a.e. curve and \(\sqrt{f}\in D(\mathsf{Ch})\); in this case

$$ \big|D^-\operatorname{Ent}_{{\mathfrak{m}}}\big|^2(f{ \mathfrak{m}})= \int_X\frac{|D f|^2_w}{f} \mathrm{d} { \mathfrak{m}}=4\int_X|D \sqrt{f}|_w^2 \mathrm{d} {\mathfrak{m}}. $$
(48)

7 A Metric Brenier Theorem

In this section we state and prove the metric Brenier theorem in CD(K,∞) spaces announced in the introduction. It was recently proven in [14] that under an additional non-branching assumption one can actually recover an optimal transport map; see also [7] for related results, obtained under stronger non-branching assumptions and weaker convexity assumptions.

Definition 7.11

(Strict CD(K,∞) spaces)

We say that a compact normalized metric measure space \((X,\mathsf {d},{\mathfrak{m}})\) is a strict CD(K,∞) space if for any \(\mu_{0}, \mu_{1}\in D(\operatorname{Ent}_{{\mathfrak{m}}}) \) there exists \(\boldsymbol{\pi }\in\operatorname{GeoOpt}(\mu_{0},\mu_{1})\) with the following property. For any bounded Borel function \(F:\operatorname{Geo}(X)\to [0,\infty)\) such that ∫Fdπ=1, it holds

$$\operatorname{Ent}_{\mathfrak{m}}\bigl(\mu^F_t\bigr)\leq(1-t) \operatorname{Ent}_{\mathfrak{m}} \bigl(\mu^F_0\bigr)+t \operatorname{Ent}_{\mathfrak{m}} \bigl(\mu^F_1\bigr) -\frac{K}{2}t(1-t)W_2^2\bigl( \mu_0^F,\mu_1^F\bigr), $$

where \(\mu^{F}_{t}:=(\mathrm{e}_{t})_{\sharp}(F\boldsymbol{\pi})\), for any t∈[0,1].

Thus, the difference between strict CD(K,∞) spaces and standard CD(K,∞) ones is that geodesic convexity is required along all geodesics induced by the weighted plans F π, rather than only along the one induced by π. Notice that the necessary and sufficient optimality conditions ensure that \((\mathrm{e}_{0},\mathrm{e}_{1})_{\sharp}\boldsymbol{\pi}\) is concentrated on a c-monotone set, hence \((\mathrm{e}_{0},\mathrm{e}_{1})_{\sharp}(F\boldsymbol{\pi})\) has the same property and it is optimal, relative to its marginals. (We remark that recent results of Rajala [28] suggest that this stronger convexity is not necessary to get the metric Brenier theorem, and hence not even for a treatable notion of spaces with Riemannian Ricci curvature bounded from below; see [2] for progress in this direction.)

It is not clear to us whether the notion of being strict CD(K,∞) is stable w.r.t. measured Gromov-Hausdorff convergence and, as such, it should be handled with care. The importance of strict CD(K,∞) bounds lies in the fact that on these spaces geodesic interpolation between bounded probability densities is made of bounded densities as well, thus granting the existence of many test plans.

Notice that non-branching CD(K,∞) spaces are always strict CD(K,∞) spaces. Indeed, let \(\mu_{0}, \mu_{1}\in D(\operatorname {Ent}_{{\mathfrak{m}}})\) and pick \(\boldsymbol{\pi}\in\operatorname{GeoOpt}(\mu_{0},\mu_{1})\) such that \(\operatorname{Ent}_{{\mathfrak{m}}}\) is K-convex along ((e t ) π). From the non-branching hypothesis it follows that for F as in Definition 7.11 there exists a unique element in \(\operatorname{GeoOpt}(\mu^{F}_{t},\mu^{F}_{1})\) (resp. in \(\operatorname{GeoOpt}(\mu^{F}_{t},\mu^{F}_{0})\)). Also, since F is bounded, from \(\mu_{t}\in D(\operatorname{Ent}_{{\mathfrak{m}}})\) we deduce \(\mu^{F}_{t}\in D(\operatorname{Ent}_{{\mathfrak{m}}})\). Hence the map \(t\mapsto \operatorname {Ent}_{{\mathfrak{m}}}(\mu^{F}_{t})\) is K-convex and bounded on [ε,1] and on [0,1−ε] for all ε∈(0,1), and therefore it is K-convex on [0,1].

Proposition 7.17

(Bound on geodesic interpolant)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a strict CD(K,∞) space and let \(\mu_{0},\mu_{1}\in\mathscr{P}(X)\) be measures with bounded densities w.r.t. \({\mathfrak{m}}\). Then there exists a test plan \(\boldsymbol{\pi}\in\operatorname {GeoOpt}(\mu_{0},\mu_{1})\) such that the induced geodesic μ t =(e t ) π connecting μ 0 to μ 1 is made of measures with uniformly bounded densities.

Proof

Let M be an upper bound on the densities of μ 0,μ 1, \(\boldsymbol{\pi}\in\operatorname{GeoOpt}(\mu_{0},\mu_{1})\) be a plan which satisfies the assumptions of Definition 7.11 and μ t :=(e t ) π. We claim that the measures μ t have uniformly bounded densities. The fact that \(\mu_{t}\ll{\mathfrak{m}}\) is obvious by geodesic convexity, so let f t be the density of μ t and assume by contradiction that for some t 0∈[0,1] it holds

$$ f_{t_0}(x)> Me^{K^-\mathrm{D}^2/8},\quad\forall x\in A, $$
(49)

where \({\mathfrak{m}}(A)>0\) and D is the diameter of X. Define \(\tilde{\boldsymbol{\pi}}:=c\boldsymbol{\pi}|_{\mathrm {e}_{t_{0}}^{-1}(A)}\), where c is the normalizing constant (notice that \(\tilde{\boldsymbol{\pi}}\) is well defined, because \(\boldsymbol{\pi}(\mathrm{e}_{t_{0}}^{-1}(A))=\mu_{t_{0}}(A)>0\)) and observe that the density of \(\tilde{\boldsymbol{\pi}}\) w.r.t. π is bounded. Let \(\tilde{\mu}_{t}:=(\mathrm{e}_{t})_{\sharp}\tilde{\boldsymbol{\pi}}\) and \(\tilde{f}_{t}\) its density w.r.t. \({\mathfrak{m}}\). From (49) we get \(\tilde{f}_{t_{0}}=cf_{t_{0}}\) on A and \(\tilde{f}_{t_{0}}=0\) on \(X\setminus A\), hence

$$ \operatorname {Ent}_{{\mathfrak{m}}}(\tilde{\mu}_{t_0})=\int \log(\tilde{f}_{t_0}\circ \mathrm{e}_{t_0}) \mathrm{d} \tilde{\boldsymbol{\pi}}>\log c+\log M+\frac{K^-}{8}\mathrm{D}^2. $$
(50)

On the other hand, we have \(\tilde{f}_{0}\leq cf_{0}\leq cM\) and \(\tilde{f}_{1}\leq cf_{1}\leq cM\) and thus

$$ \operatorname {Ent}_{{\mathfrak{m}}}(\tilde{\mu}_i)=\int \log(\tilde{f}_{i}\circ\mathrm {e}_i) \mathrm{d} \tilde{ \boldsymbol{\pi}}\leq \log c+\log M,\quad i=0,1. $$
(51)

Finally, it certainly holds \(W_{2}^{2}(\tilde{\mu}_{0},\tilde{\mu}_{1})\leq \mathrm{D}^{2}\), so that (50) and (51) contradict the K-convexity of \(\operatorname{Ent}_{{\mathfrak{m}}}\) along \((\tilde{\mu}_{t})\). Hence (49) is false and the f t ’s are uniformly bounded. □
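For the reader's convenience, the contradiction can be spelled out as a worked chain of inequalities (a completion of the last step, with \(K^{-}:=\max\{0,-K\}\) as above): the K-convexity of \(\operatorname{Ent}_{{\mathfrak{m}}}\) along \((\tilde{\mu}_{t})\), together with (51), \(W_{2}^{2}(\tilde{\mu}_{0},\tilde{\mu}_{1})\leq\mathrm{D}^{2}\) and \(t_{0}(1-t_{0})\leq 1/4\), gives

```latex
% K-convexity along (\tilde\mu_t), then (51), W_2^2 <= D^2 and t_0(1-t_0) <= 1/4
\begin{aligned}
\operatorname{Ent}_{\mathfrak{m}}(\tilde{\mu}_{t_0})
 &\leq (1-t_0)\operatorname{Ent}_{\mathfrak{m}}(\tilde{\mu}_0)
      + t_0\operatorname{Ent}_{\mathfrak{m}}(\tilde{\mu}_1)
      - \frac{K}{2}\,t_0(1-t_0)\,W_2^2(\tilde{\mu}_0,\tilde{\mu}_1)\\
 &\leq \log c + \log M + \frac{K^-}{2}\cdot\frac{1}{4}\,\mathrm{D}^2
  \;=\; \log c + \log M + \frac{K^-}{8}\mathrm{D}^2,
\end{aligned}
```

which is incompatible with the strict inequality (50).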

An important consequence of this uniform bound is the following metric version of Brenier’s theorem.

Theorem 7.11

(A metric Brenier theorem)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a strict CD(K,∞) space, let f 0,f 1 be probability densities and φ any Kantorovich potential for the couple \((f_{0}{\mathfrak{m}},f_{1}\mathfrak {m})\). Then for every \(\boldsymbol{\pi}\in\operatorname{GeoOpt}(f_{0}\mathfrak {m},f_{1}{\mathfrak{m}})\) it holds

$$ \mathsf{d}(\gamma_0,\gamma_1)=|D \varphi|_w( \gamma_0)=\big|D^+\varphi \big|(\gamma_0),\quad\mathit{for}\ \boldsymbol{\pi}\hbox{-}\mathit{a.e.}\ \gamma. $$
(52)

In particular,

$$W_2^2(f_0{\mathfrak{m}},f_1{ \mathfrak{m}})=\int_X|D \varphi|^2_* f_0 \mathrm{d} \mathfrak{m}. $$

If moreover \(f_{0},f_{1}\in L^{\infty}(X,{\mathfrak{m}})\) and π is a test plan (such a plan exists thanks to Proposition 7.17) then

$$ \lim_{t\downarrow0} \frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)}= \big|D^+\varphi\big|( \gamma_0)\quad\text{\textit{in}}\ L^2\bigl(\mathrm {Geo}(X), \boldsymbol{\pi}\bigr). $$
(53)

Proof

φ is Lipschitz, therefore |D + φ| is an upper gradient of φ, and hence \(|D \varphi|_{w}\leq|D^{+}\varphi|\) \({\mathfrak{m}}\)-a.e. Now fix \(x\in X\) and pick any \(y\in\partial^{c}\varphi(x)\). From the c-concavity of φ we get

$$\varphi(x)=\frac{\mathsf{d}^2(x,y)}{2}-\varphi^c(y),\qquad \varphi(z)\leq\frac{\mathsf{d}^2(z,y)}{2}-\varphi^c(y)\quad\forall z\in X. $$

Therefore

$$\varphi(z)-\varphi(x)\leq\frac{\mathsf{d}^2(z,y)}{2}-\frac{\mathsf{d}^2(x,y)}{2}\leq \mathsf{d}(z,x)\frac{\mathsf{d}(z,y)+\mathsf{d}(x,y)}{2}. $$

Dividing by d(x,z) and letting z→x, by the arbitrariness of \(y\in\partial^{c}\varphi(x)\) and the fact that \(\operatorname{supp}((\mathrm{e}_{0},\mathrm{e}_{1})_{\sharp}\boldsymbol {\pi})\subset\partial^{c}\varphi\) we get

$$\big|D^+\varphi\big|(\gamma_0)\leq \min_{y\in\partial^c\varphi(\gamma_0)}\mathsf{d}( \gamma_0,y) \leq\mathsf{d}(\gamma_0, \gamma_1)\quad\text{for}\ \boldsymbol {\pi}\hbox{-a.e. }\gamma. $$

Since

$$\int \mathsf{d}^2(\gamma_0,\gamma_1) \mathrm{d}\boldsymbol{\pi}(\gamma)=W_2^2(f_0{\mathfrak{m}},f_1{\mathfrak{m}}) \quad\text{and}\quad |D\varphi|_w(\gamma_0)\leq\big|D^+\varphi\big|(\gamma_0)\leq\mathsf{d}(\gamma_0,\gamma_1)\ \ \text{for }\boldsymbol{\pi}\text{-a.e. }\gamma, $$

to conclude it is sufficient to prove that

$$ W_2^2(f_0{ \mathfrak{m}},f_1{\mathfrak{m}})\leq\int_X|D \varphi|_w^2f_0 \mathrm{d} {\mathfrak{m}}. $$
(54)

Now assume that f 0 and f 1 are bounded from above and let \(\tilde{\boldsymbol{\pi}}\in\operatorname{GeoOpt}(f_{0}\mathfrak {m},f_{1}{\mathfrak{m}})\) be a test plan (such \(\tilde{\boldsymbol{\pi}}\) exists thanks to Proposition 7.17). Since φ is a Kantorovich potential and \((\mathrm{e}_{0},\mathrm{e}_{1})_{\sharp}\tilde{\boldsymbol{\pi}}\) is optimal, it holds γ 1 c φ(γ 0) for any \(\gamma\in\operatorname {supp}(\tilde{\boldsymbol{\pi}})\). Hence arguing as before we get

$$ \varphi(\gamma_0)-\varphi(\gamma_t)\geq \frac{\mathsf{d}^2(\gamma_0,\gamma_1)}{2}-\frac{\mathsf{d}^2(\gamma_t,\gamma_1)}{2}= \mathsf{d}^2( \gamma_0,\gamma_1) \bigl(t-t^2/2 \bigr). $$
(55)

Dividing by d(γ 0,γ t )=t d(γ 0,γ 1), squaring and integrating w.r.t. \(\tilde{\boldsymbol{\pi}}\) we obtain

$$ \varliminf_{t\downarrow0}\int \biggl(\frac{\varphi(\gamma_0)-\varphi (\gamma_t)}{ \mathsf{d}(\gamma_0,\gamma_t)} \biggr)^2 \mathrm{d}\tilde {\boldsymbol{\pi}}(\gamma)\geq \int \mathsf{d}^2(\gamma_0,\gamma_1) \mathrm{d} \tilde{\boldsymbol {\pi}}(\gamma) =W_2^2(f_0{ \mathfrak{m}},f_1{\mathfrak{m}}). $$
(56)

Using Remark 4.4 and the fact that \(\tilde{\boldsymbol{\pi }}\) is a test plan we have

$$ \begin{aligned}[b] \int \biggl( \frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf{d} (\gamma_0,\gamma_t)} \biggr)^2 \mathrm{d}\tilde{\boldsymbol{\pi}}( \gamma) &\leq\int\frac{1}{t^2} \biggl(\int_0^t|D \varphi|_w(\gamma_s) {\mathrm{d}s} \biggr)^2 \mathrm{d}\tilde{\boldsymbol{\pi}}(\gamma ) \\ &\leq \frac{1}{t}\iint_0^t|D \varphi|_w^2( \gamma_s) {\mathrm{d}s} \mathrm {d}\tilde{\boldsymbol{\pi}}(\gamma) \\ &=\frac{1}{t}\iint_0^t|D \varphi|_w^2 {\mathrm{d}s} \mathrm {d}(\mathrm{e}_t)_\sharp \tilde{ \boldsymbol{\pi}} \\ & =\frac{1}{t}\iint_0^t|D \varphi|_w^2f_s {\mathrm{d}s} \mathrm {d} { \mathfrak{m}}, \end{aligned} $$
(57)

where f s is the density of \((\mathrm{e}_{s})_{\sharp}\tilde{\boldsymbol{\pi}}\). Since \((\mathrm{e}_{t})_{\sharp}\tilde{\boldsymbol{\pi}}\) weakly converges to \((\mathrm{e}_{0})_{\sharp}\tilde{\boldsymbol{\pi}}\) as t↓0 and \(\operatorname {Ent}_{{\mathfrak{m}}}((\mathrm{e}_{t})_{\sharp}\tilde{\boldsymbol {\pi}})\) is uniformly bounded (by the K-geodesic convexity), we conclude that f t f 0 weakly in \(L^{1}(X,{\mathfrak{m}})\) and since \(|D \varphi|_{w}\in L^{\infty}(X,{\mathfrak{m}})\) we have

$$ \lim_{t\downarrow0}\frac{1}{t}\iint_0^t|D \varphi|_w^2f_s \mathrm{d} {s} \mathrm{d} {\mathfrak{m}}=\int_X|D \varphi|_w^2f_0 \mathrm {d} {\mathfrak{m}}. $$
(58)

Equations (56), (57) and (58) yield (54).

In order to prove (54) in the general case of possibly unbounded densities, let us fix a Kantorovich potential φ, \(\boldsymbol{\pi}\in\operatorname{GeoOpt}(f_{0}\mathfrak {m},f_{1}{\mathfrak{m}})\) and for n∈ℕ define \(\boldsymbol{\pi}^{n}:=c_{n}\boldsymbol{\pi}|_{\{\gamma:f_{0}(\gamma _{0})+f_{1}(\gamma _{1})\leq n\}}\), the normalization constants c n satisfying c n →1. Then \(\boldsymbol{\pi}^{n}\in\operatorname{GeoOpt}(f^{n}_{0}\mathfrak {m},f^{n}_{1}{\mathfrak{m}})\), where \(f^{n}_{i}\) is the density of \((\mathrm{e}_{i})_{\sharp}\boldsymbol{\pi}^{n}\), φ is a Kantorovich potential for \((f^{n}_{0}{\mathfrak{m}},f^{n}_{1}{\mathfrak{m}})\) and \(f^{n}_{0},f^{n}_{1}\in L^{\infty}(X,{\mathfrak{m}})\). Thus from what we just proved we know that it holds

$$\mathsf{d}(\gamma_0,\gamma_1)=|D \varphi|_w( \gamma_0)=\big|D^+\varphi \big|(\gamma_0), \quad\text{for}\ \boldsymbol{\pi}^n\mbox{-a.e. }\gamma. $$

Letting n→∞ we conclude.

Concerning (53), we can choose \(\tilde{\boldsymbol{\pi }}=\boldsymbol{\pi}\) and obtain by (55) and (52)

$$\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)}\ge0,\qquad \liminf_{t\downarrow0}\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)}\ge \big|D^+ \varphi\big|(\gamma_0)\quad \mbox{for}\ \boldsymbol{\pi}\hbox{-a.e.}\ \gamma. $$

On the other hand (57) and (58) yield

$$\limsup_{t\downarrow0}\int \biggl(\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf {d}(\gamma_0,\gamma_t)} \biggr)^2 \mathrm{d}\boldsymbol{\pi}(\gamma)\le \int \big|D^+\varphi\big|^2( \gamma_0) \mathrm{d}\boldsymbol{\pi}(\gamma), $$

so that, by expanding the square and applying Fatou’s Lemma, we obtain

$$\limsup_{t\downarrow0}\int \biggl(\frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf {d}(\gamma_0,\gamma_t)}-\big|D^+\varphi\big|( \gamma_0) \biggr)^2 \mathrm{d}\boldsymbol {\pi}(\gamma) \le0. $$

 □

8 More on Calculus on Compact CD(K,∞) Spaces

8.1 On Horizontal and Vertical Derivatives Again

The aim of this subsection is to prove another deep relation between “horizontal” and “vertical” derivation, which will allow us to compare the derivative of the squared Wasserstein distance along the heat flow with the derivative of the relative entropy along a geodesic (see the next subsection). This will be key to understanding the properties of spaces with Riemannian Ricci curvature bounded from below, illustrated in the last section.

In order to understand the geometric point, consider the following simple example.

Example 8.1

Let ∥⋅∥ be a smooth, strictly convex norm on ℝd and let \(\|\cdot\|_{*}\) be the dual norm. Denoting by 〈⋅,⋅〉 the canonical duality from \((\mathbb{R}^{d})^{*}\times\mathbb{R}^{d}\) into ℝ, let \(\mathcal{L}\) be the duality map from \((\mathbb{R}^{d},\|\cdot\|)\) to \(((\mathbb{R}^{d})^{*},\|\cdot\|_{*})\), characterized by

$$\bigl\langle\mathcal{L}(u),u\bigr\rangle= \big\|\mathcal{L}(u)\big\|_*\|u\|\quad\text{and} \quad\big\|\mathcal{L}(u)\big\|_*=\|u\| \quad\forall u\in\mathbb{R}^d, $$

and let \(\mathcal{L}^{*}\) be its inverse, equally characterized by

$$\bigl\langle v,\mathcal{L}^*(v)\bigr\rangle=\|v\|_*\big\|\mathcal{L}^*(v)\big\|\quad \text{and}\quad \big\|\mathcal{L}^*(v)\big\|=\|v\|_*\quad\forall v\in\bigl( \mathbb{R}^d\bigr)^*. $$

Using the fact that \(\varepsilon \mapsto\|u\|\|u+\varepsilon u'\|-\langle \mathcal{L} u,u+\varepsilon u'\rangle\) attains its minimum at ε=0 and the analogous relation for \(\mathcal{L}^{*}\), one obtains the useful relations

$$ \bigl\langle\mathcal{L}(u),u'\bigr\rangle= \frac{1}{2}\mathrm{d}_u\|\cdot\|^2\bigl(u' \bigr),\qquad \bigl\langle v',\mathcal{L}^*(v)\bigr\rangle=\frac{1}{2} \mathrm{d}_v\|\cdot\|_*^2\bigl(v'\bigr). $$
(59)

For a smooth map f:ℝd→ℝ its differential d x f at any point x is intrinsically defined as a cotangent vector, namely as an element of \((\mathbb{R}^{d})^{*}\). To define the gradient ∇f(x)∈ℝd (which is a tangent vector), the norm comes into play via the formula \(\nabla f(x):=\mathcal{L}^{*}(\mathrm{d}_{x}f)\). Now, given two smooth functions f,g, the real number d x f(∇g(x)) is well defined as the application of the cotangent vector d x f to the tangent vector ∇g(x).

What we want to point out is that there are two very different ways of obtaining d x f(∇g(x)) from a derivation. The first one, which is usually taken as the definition of d x f(∇g(x)), is the “horizontal derivative”:

$$ \langle\mathrm{d}_x f,\nabla g\rangle= \mathrm{d}_xf\bigl(\nabla g(x)\bigr)=\lim_{t\to 0} \frac{f(x+t\nabla g(x))-f(x)}{t}. $$
(60)

The second one is the “vertical derivative”:

$$ Df(\nabla g) (x)=\lim_{\varepsilon\to0}\frac{\frac{1}{2}\|\mathrm {d}_x(g+\varepsilon f)\|^2_*-\frac{1}{2}\|\mathrm{d}_x g\|_*^2}{\varepsilon}. $$
(61)

It is not difficult to check that (61) is consistent with (60): indeed (omitting the x dependence), recalling the second identity of (59), we have

$$\|\mathrm{d} {g}+\varepsilon\mathrm{d} {f}\|^2_*=\|\mathrm{d} {g} \|_*^2+ 2\varepsilon\bigl\langle\mathcal{L}^*(\mathrm{d} {g}), \mathrm{d} {f}\bigr\rangle +o(\varepsilon)= \|\nabla g\|^2+2 \varepsilon\langle\nabla g,\mathrm{d} {f}\rangle +o(\varepsilon). $$

The point is that the equality between the right hand sides of formulas (61) and (60) extends to a genuine metric setting. In the following lemma (where the plan π plays the role of −∇g) we prove one inequality, but we remark that “playing with signs” it is possible to obtain an analogous inequality with ≤ in place of ≥.
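As a quick numerical illustration of the consistency of (60) and (61), consider the simplest smooth, strictly convex case: the norm \(\|u\|^{2}=\langle Au,u\rangle\) on ℝ² induced by a symmetric positive definite matrix A, for which \(\mathcal{L}(u)=Au\), \(\mathcal{L}^{*}(v)=A^{-1}v\) and \(\nabla f(x)=A^{-1}\mathrm{d}_{x}f\). The following sketch (the matrix, base point, and test functions are our choices, not taken from the text) compares both derivatives by finite differences:

```python
# Compare the "horizontal" derivative (60) and the "vertical" derivative (61)
# for the norm ||u||^2 = <Au, u> on R^2, where L(u) = Au and L*(v) = A^{-1}v.

A = [[2.0, 0.5], [0.5, 1.0]]                      # SPD matrix defining the norm
detA = A[0][0]*A[1][1] - A[0][1]*A[1][0]
Ainv = [[A[1][1]/detA, -A[0][1]/detA],
        [-A[1][0]/detA, A[0][0]/detA]]

def matvec(M, v):
    return [M[0][0]*v[0] + M[0][1]*v[1], M[1][0]*v[0] + M[1][1]*v[1]]

def dot(u, v):
    return u[0]*v[0] + u[1]*v[1]

def f(x):                                          # two smooth test functions
    return x[0]**3 + x[0]*x[1]

def g(x):
    return x[0]**2 - 2.0*x[1]

def differential(h, x, e=1e-6):                    # central finite differences
    return [(h([x[0]+e, x[1]]) - h([x[0]-e, x[1]]))/(2*e),
            (h([x[0], x[1]+e]) - h([x[0], x[1]-e]))/(2*e)]

def dual_sq(v):                                    # ||v||_*^2 = <A^{-1}v, v>
    return dot(matvec(Ainv, v), v)

x = [0.7, -0.3]
grad_g = matvec(Ainv, differential(g, x))          # tangent vector L*(d_x g)

# horizontal derivative (60): differentiate t -> f(x + t*grad_g(x)) at t = 0
t = 1e-6
horizontal = (f([x[0]+t*grad_g[0], x[1]+t*grad_g[1]])
              - f([x[0]-t*grad_g[0], x[1]-t*grad_g[1]]))/(2*t)

# vertical derivative (61): differentiate eps -> (1/2)||d_x(g+eps f)||_*^2 at 0
e = 1e-6
dg, df = differential(g, x), differential(f, x)
vertical = (dual_sq([dg[0]+e*df[0], dg[1]+e*df[1]])
            - dual_sq([dg[0]-e*df[0], dg[1]-e*df[1]]))/(4*e)

print(horizontal, vertical)
```

Both quantities approximate \(\mathrm{d}_{x}f(A^{-1}\mathrm{d}_{x}g)\), in line with the expansion displayed above.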

Lemma 8.7

(Horizontal and vertical derivatives)

Let f be a Sobolev function along a.e. curve with \(|D f|_{w}\in L^{2}(X,{\mathfrak{m}})\), let g:X→ℝ be Lipschitz and let π be a test plan concentrated on \(\operatorname{Geo}(X)\) such that \(|D g|_{w}(\gamma_{0})=\mathsf{d}(\gamma_{0},\gamma_{1})\) for π-a.e. γ. Assume that

$$ \lim_{t\downarrow0}\frac{g(\gamma_0)-g(\gamma_t)}{ \mathsf{d}(\gamma_0,\gamma_t)}=|D g|_w(\gamma_0) \quad\text{\textit{in} }L^2\bigl( \mathrm{Geo}(X),\boldsymbol{\pi}\bigr). $$
(62)

Then

$$ \varliminf_{t\downarrow0}\int\frac{f(\gamma_t)-f(\gamma_0)}{t} \mathrm{d}\boldsymbol{\pi}(\gamma)\geq \frac{1}{2}\lim_{\varepsilon\downarrow0}\int_X\frac{|D g|_w^2-|D (g+\varepsilon f)|_w^2}{\varepsilon} \mathrm{d}\bigl((\mathrm{e}_0)_\sharp\boldsymbol{\pi}\bigr). $$
(63)

Proof

Define the functions \(F_{t}, G_{t}:\operatorname{Geo}(X)\to\mathbb {R}\cup\{\pm\infty\}\) by

$$F_t(\gamma):=\frac{f(\gamma_0)-f(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)},\qquad G_t(\gamma):=\frac{g(\gamma_0)-g(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)}. $$

By (62) it holds

$$ \int|D g|_w^2\circ \mathrm{e}_0 \mathrm{d}\boldsymbol{\pi}(\gamma )= \lim_{t\downarrow 0}\int G_t^2 \mathrm{d}\boldsymbol{ \pi}. $$
(64)

Since the measures \((\mathrm{e}_{t})_{\sharp}\boldsymbol{\pi}\) converge to \((\mathrm{e}_{0})_{\sharp}\boldsymbol{\pi}\) weakly in duality with C(X) as t↓0 and their densities with respect to \({\mathfrak{m}}\) are uniformly bounded, we obtain that the densities are weakly-∗ convergent in \(L^{\infty}(X,{\mathfrak{m}})\). Therefore, using the fact that \(|D (g+\varepsilon f)|_{w}^{2}\in L^{1}(X,{\mathfrak{m}})\) and taking into account Remark 4.4 we obtain

$$\varlimsup_{t\downarrow0}\int(G_t+\varepsilon F_t)^2 \mathrm{d}\boldsymbol{\pi}\leq \varlimsup_{t\downarrow0}\frac{1}{t}\iint_0^t\big|D (g+\varepsilon f)\big|_w^2(\gamma_s) {\mathrm{d}s} \mathrm{d}\boldsymbol{\pi}(\gamma)= \int\big|D (g+\varepsilon f)\big|_w^2\circ\mathrm{e}_0 \mathrm{d}\boldsymbol{\pi}. $$

Subtracting this inequality from (64) and dividing by 2ε we get

$$\frac{1}{2}\int\frac{|D g|_w^2(\gamma_0)-|D (g+\varepsilon f)|_w^2(\gamma_0)}{\varepsilon}\mathrm{d}\boldsymbol{\pi}(\gamma ) \leq\varliminf_{t\downarrow 0}-\int G_t(\gamma)F_t(\gamma) \mathrm{d}\boldsymbol{\pi}(\gamma). $$

We know that G t →|Dg| w ∘e0 in \(L^{2}(\operatorname {Geo}(X),\boldsymbol{\pi})\) and that |Dg| w (γ 0)=d(γ 0,γ 1) for π-a.e. γ. Also, by Remark 4.4 and the fact that π is a test plan we easily get \(\sup_{t\in[0,1]}\|F_{t}\|_{L^{2}(\boldsymbol{\pi})}<\infty\). Thus it holds

$$\varliminf_{t\downarrow0}-\int G_tF_t \mathrm{d}\boldsymbol{\pi}= \varliminf_{t\downarrow0}-\int \mathsf{d}(\gamma_0,\gamma_1)F_t(\gamma) \mathrm{d}\boldsymbol{\pi}(\gamma)= \varliminf_{t\downarrow0}\int\frac{f(\gamma_t)-f(\gamma_0)}{t} \mathrm{d}\boldsymbol{\pi}(\gamma), $$

which is the thesis. □

8.2 Two Important Formulas

Proposition 8.18

(Derivative of \(\frac{1}{2}W_{2}^{2}\) along the heat flow)

Let \((f_{t})\subset L^{2}(X,{\mathfrak{m}})\) be a heat flow made of probability densities. Then for every \(\sigma\in\mathscr{P}(X)\), for a.e. t∈(0,∞) it holds:

$$ \frac{\mathrm{d}}{\mathrm{d}t}\frac{1}{2}W_2^2(f_t{ \mathfrak{m}} ,\sigma)=\int_X\varphi_t\Delta f_t \mathrm{d} {\mathfrak{m}},\quad \text{\textit{for any Kantorovich potential}}\ \varphi_t\ \mathit{from}\ f_t{\mathfrak{m}}\ \mathit{to}\ \sigma. $$
(65)

Proof

Since \(t\mapsto f_{t}{\mathfrak{m}}\) is an absolutely continuous curve w.r.t. W 2 (recall Theorem 6.10), the derivative at the left hand side of (65) exists for a.e. t∈(0,∞). Also, for a.e. t∈(0,∞) it holds \(\lim_{h\to 0}\frac{1}{h}(f_{t+h}-f_{t})=\Delta f_{t}\), the limit being understood in \(L^{2}(X,{\mathfrak{m}})\).

Fix t 0 such that the derivative of the Wasserstein distance exists and the above limit holds and choose any Kantorovich potential \(\varphi_{t_{0}}\) for \((f_{t_{0}}{\mathfrak{m}},\sigma)\). By the duality formula we have

$$\frac{W_2^2(f_{t_0+h}{\mathfrak{m}},\sigma)}{2}\geq\int_X\varphi_{t_0}f_{t_0+h} \mathrm{d} {\mathfrak{m}}+\int_X\varphi^c_{t_0} \mathrm{d}\sigma, \qquad \frac{W_2^2(f_{t_0}{\mathfrak{m}},\sigma)}{2}=\int_X\varphi_{t_0}f_{t_0} \mathrm{d} {\mathfrak{m}}+\int_X\varphi^c_{t_0} \mathrm{d}\sigma. $$

Therefore, since \(\varphi_{t_{0}}\in L^{\infty}(X,{\mathfrak{m}})\) we get

$$\frac{W_2^2(f_{t_0+h}{\mathfrak{m}},\sigma)}{2}-\frac{W_2^2(f_{t_0}{\mathfrak{m}},\sigma)}{2} \geq\int_X \varphi_{t_0}(f_{t_0+h}-f_{t_0}) \mathrm{d} { \mathfrak{m}}= h\int_X\varphi_{t_0}\Delta f_{t_0}+o(h). $$

Dividing by h<0 and h>0 and letting h→0 we get the thesis. □

Proposition 8.19

(Derivative of the Entropy along a geodesic)

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a strict CD(K,∞) space. Let \(\mu_{0},\mu_{1}\in D(\operatorname{Ent}_{{\mathfrak{m}}})\), \(\boldsymbol{\pi}\in \operatorname{GeoOpt}(\mu_{0},\mu_{1})\) and φ a Kantorovich potential for (μ 0,μ 1). Assume that π is a test plan and that \(\mu_{0}\geq c{\mathfrak{m}}\) for some c>0 and denote by h t the density of \(\mu_{t}:=(\mathrm{e}_{t})_{\sharp}\boldsymbol{\pi}\). Then

$$ \varliminf_{t\downarrow0}\frac{\operatorname {Ent}_{{\mathfrak{m}}}(\mu_t)-\operatorname {Ent}_{{\mathfrak{m}}}(\mu_0)}{t}\geq \lim_{\varepsilon\downarrow0}\frac{\mathsf{Ch}(\varphi)-\mathsf {Ch}(\varphi+\varepsilon h_0)}{\varepsilon}. $$
(66)

Proof

The convexity of Ch ensures that the limit at the right hand side exists. From the fact that φ is Lipschitz, it is not hard to see that \(h_{0}\notin D(\mathsf{Ch})\) implies Ch(φ+εh 0)=+∞ for any ε>0, and in this case there is nothing to prove. Thus, we assume that \(h_{0}\in D(\mathsf{Ch})\).

The convexity of zzlogz gives

$$ \frac{\operatorname {Ent}_{{\mathfrak{m}}}(\mu_t)-\operatorname {Ent}_{{\mathfrak{m}}}(\mu_0)}{t}\geq \int_X\log h_0\frac{h_t-h_0}{t} \mathrm{d} {\mathfrak{m}}=\int \frac{\log(h_0\circ e_t)-\log(h_0\circ e_0)}{t} \mathrm{d} \boldsymbol{\pi}. $$
(67)

Using the trivial inequality given by Taylor’s formula

$$\log b-\log a\geq\frac{b-a}{a}-\frac{|b-a|^2}{2c^2}, $$

valid for any a,b∈[c,∞), we obtain

$$ \begin{aligned}[b] \int\frac{\log(h_0\circ\mathrm{e}_t)-\log(h_0\circ\mathrm{e}_0)}{t} \mathrm{d} \boldsymbol{\pi}\geq{}& \int\frac{h_0\circ e_t-h_0\circ\mathrm{e}_0}{th_0\circ \mathrm{e}_0} \mathrm{d}\boldsymbol{\pi} \\ &{} -\frac{1}{2tc^2}\int|h_0\circ\mathrm{e}_t-h_0 \circ\mathrm{e}_0|^2 \mathrm{d}\boldsymbol{\pi}. \end{aligned} $$
(68)
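As a quick sanity check, the elementary inequality used above can be verified numerically on a grid (our illustration, with an arbitrarily chosen value of c):

```python
import math

# Grid check of  log(b) - log(a) >= (b-a)/a - (b-a)^2/(2 c^2)  for a, b >= c,
# which follows from Taylor's formula since (log)'' = -1/x^2 >= -1/c^2 on [c, oo).

c = 0.3
grid = [c + 0.05*k for k in range(200)]           # sample points in [c, c+10)
worst = min(
    math.log(b) - math.log(a) - ((b - a)/a - (b - a)**2/(2*c*c))
    for a in grid for b in grid
)
print(worst)
```

The minimum defect over the grid is attained at a=b, where the inequality is an equality, so `worst` is (up to rounding) nonnegative.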

Taking into account Remark 4.4 and the fact that \(|\dot{\gamma}_{t}|=\mathsf{d}(\gamma_{0},\gamma_{1})\le\mathrm{diam}(X)\) for a.e. t∈(0,1) and π-a.e. γ, the last term in this expression can be bounded from above by

$$ \frac{1}{2tc^2}\int|h_0\circ\mathrm{e}_t-h_0\circ\mathrm{e}_0|^2 \mathrm{d}\boldsymbol{\pi}\leq \frac{\mathrm{diam}^2(X)}{2c^2}\iint_0^t|D h_0|_w^2(\gamma_s) {\mathrm{d}s} \mathrm{d}\boldsymbol{\pi}(\gamma)= \frac{\mathrm{diam}^2(X)}{2c^2}\int_0^t\!\!\int_X|D h_0|_w^2\,h_s \mathrm{d} {\mathfrak{m}} {\mathrm{d}s}, $$
(69)

which goes to 0 as t→0.

Now let \(S:\operatorname{Geo}(X)\to\mathbb{R}\) be the Borel function defined by S(γ):=h 0γ 0 and define \(\tilde{\boldsymbol{\pi}}:=\frac{1}{S}\boldsymbol{\pi}\). It is easy to check that \((\mathrm{e}_{0})_{\sharp}\tilde{\boldsymbol{\pi}}={\mathfrak{m}}\), so that in particular \(\tilde{\boldsymbol{\pi}} \) is a probability measure. Also, the bound h 0c>0 ensures that \(\tilde{\boldsymbol{\pi}}\) is a test plan. By definition we have

$$\int\frac{h_0\circ e_t-h_0\circ\mathrm{e}_0}{th_0\circ \mathrm{e}_0} \mathrm{d}\boldsymbol{\pi} =\int\frac{h_0\circ e_t-h_0\circ\mathrm{e}_0}{t} \mathrm{d}\tilde{ \boldsymbol{\pi}}. $$

The latter equality and inequalities (67), (68) and (69) ensure that to conclude it is sufficient to show that

$$ \varliminf_{t\downarrow0}\int\frac{h_0\circ e_t-h_0\circ \mathrm{e}_0}{t} \mathrm{d}\tilde{\boldsymbol{\pi}} \geq\lim_{\varepsilon\downarrow0}\frac{\mathsf{Ch}(\varphi )-\mathsf{Ch}(\varphi+\varepsilon h_0)}{\varepsilon}. $$
(70)

Here we apply the key Lemma 8.7. Observe that Theorem 7.11 ensures that

$$|D \varphi|_w(\gamma_0)=\lim_{t\downarrow 0} \frac{\varphi(\gamma_0)-\varphi(\gamma_t)}{\mathsf{d}(\gamma_0,\gamma_t)}=\mathsf{d}(\gamma_0,\gamma_1) $$

where the convergence is understood in L 2(π). Thus the same holds for \(L^{2}(\tilde{\boldsymbol{\pi}})\) and the hypotheses of Lemma 8.7 are satisfied with \(\tilde{\boldsymbol{\pi}}\) as test plan, g:=φ and f:=h 0. Equation (63) then gives

$$\varliminf_{t\downarrow0}\int\frac{h_0\circ \mathrm{e}_t-h_0\circ\mathrm{e}_0}{t} \mathrm{d}\tilde{\boldsymbol{\pi}}\geq \frac{1}{2}\lim_{\varepsilon\downarrow0}\int_X\frac{|D\varphi|_w^2-|D(\varphi+\varepsilon h_0)|_w^2}{\varepsilon} \mathrm{d} {\mathfrak{m}}= \lim_{\varepsilon\downarrow0}\frac{\mathsf{Ch}(\varphi)-\mathsf{Ch}(\varphi+\varepsilon h_0)}{\varepsilon}, $$

that is (70),

which concludes the proof. □

9 Riemannian Ricci Bounds

We say that \((X,\mathsf{d},{\mathfrak{m}})\) has Riemannian Ricci curvature bounded below by K∈ℝ (in short, it is an RCD(K,∞) space) if any of the three equivalent conditions stated in the following theorem holds.

Theorem 9.12

Let \((X,\mathsf{d},{\mathfrak{m}})\) be a compact and normalized metric measure space and K∈ℝ. The following three properties are equivalent.

  1. (i)

    \((X,\mathsf{d},{\mathfrak{m}})\) is a strict CD(K,∞) space (Definition 7.11) and the L 2-gradient flow of Ch is linear.

  2. (ii)

    \((X,\mathsf{d},{\mathfrak{m}})\) is a strict CD(K,∞) space (Definition 7.11) and Cheeger’s energy is quadratic, i.e.

    $$ 2 \bigl(\mathsf{Ch}(f)+\mathsf{Ch}(g) \bigr)=\mathsf{Ch}(f+g)+ \mathsf {Ch}(f-g),\quad \forall f, g\in L^2(X,{\mathfrak{m}}). $$
    (71)
  3. (iii)

    \(\operatorname{supp}({\mathfrak{m}})\) is geodesic and for any there exists an EVI K -gradient flow for \(\operatorname {Ent}_{{\mathfrak{m}}}\) starting from μ.

Proof

(i) ⇒ (ii). Since the heat semigroup P t in \(L^{2}(X,{\mathfrak{m}})\) is linear we obtain that Δ is a linear operator (i.e. its domain D(Δ) is a subspace of \(L^{2}(X,{\mathfrak{m}})\) and \(\Delta:D(\Delta)\to L^{2}(X,{\mathfrak{m}})\) is linear). Since tCh(P t (f)) is locally Lipschitz, tends to 0 as t→∞ and \(\partial_{t}\mathsf{Ch}(P_{t}(f))=-\|\Delta P_{t}(f)\|^{2}_{L^{2}}\) for a.e. t>0 (see (22)), we have

$$\mathsf{Ch}(f)=\int_0^\infty\big\|\Delta P_t(f)\big\|^2_{L^2(X,\mathfrak {m})} {\mathrm{d}t}. $$

Therefore Ch, being an integral of quadratic forms, is a quadratic form. Specifically, by the linearity of P t , for any \(f, g\in L^{2}(X,{\mathfrak{m}})\) it holds

$$\begin{aligned} \mathsf{Ch}(f+g)+\mathsf{Ch}(f-g)&=\int_0^\infty\big\|\Delta P_t(f)+\Delta P_t(g)\big\|^2_{L^2(X,\mathfrak{m})}+\big\|\Delta P_t(f)-\Delta P_t(g)\big\|^2_{L^2(X,\mathfrak{m})} {\mathrm{d}t}\\ &=2\int_0^\infty\big\|\Delta P_t(f)\big\|^2_{L^2(X,\mathfrak{m})}+\big\|\Delta P_t(g)\big\|^2_{L^2(X,\mathfrak{m})} {\mathrm{d}t}=2\bigl(\mathsf{Ch}(f)+\mathsf{Ch}(g)\bigr). \end{aligned} $$
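The integral identity for Ch can be checked in a finite-dimensional model (our illustration, not part of the text): in an orthonormal eigenbasis of the generator, with Δ=−L and L=diag(λ i )≥0, one has \((P_tf)_i=e^{-\lambda_i t}f_i\), \(\mathsf{Ch}(f)=\frac12\sum_i\lambda_i f_i^2\), and \(\int_0^\infty\lambda_i^2e^{-2\lambda_i t}\,\mathrm dt=\lambda_i/2\):

```python
import math

# Finite-dimensional model: Delta = -L with L = diag(lam) in an orthonormal
# eigenbasis, so (P_t f)_i = exp(-lam_i t) f_i and Ch(f) = (1/2) sum lam_i f_i^2.
# We verify numerically that Ch(f) = int_0^infty ||Delta P_t f||^2 dt.

lam = [0.0, 0.7, 1.3, 2.0]       # eigenvalues of L (0 corresponds to constants)
f   = [1.0, -2.0, 0.5, 3.0]      # coordinates of f in the eigenbasis

cheeger = 0.5*sum(l*c*c for l, c in zip(lam, f))

def flow_energy(t):              # ||Delta P_t f||^2 = sum lam_i^2 e^{-2 lam_i t} f_i^2
    return sum((l*math.exp(-l*t)*c)**2 for l, c in zip(lam, f))

# midpoint rule on [0, T]; the tail beyond T = 40 is negligible here
T, n = 40.0, 100000
dt = T/n
integral = sum(flow_energy((k + 0.5)*dt) for k in range(n))*dt

print(cheeger, integral)
```

Both values agree up to the quadrature error, mirroring the identity above mode by mode.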

(ii) ⇒ (iii). By [31, Remark 4.6(iii)] \((\operatorname{supp}({\mathfrak{m}}),\mathsf{d})\) is a length space and therefore it is also geodesic, since X is compact.

Thanks to Remark 2.1 it is sufficient to prove that a gradient flow in the EVI K sense exists for an initial datum \(\mu_{0}\ll{\mathfrak{m}}\) with density bounded away from 0 and infinity. Let f 0 be this density, (f t ) the heat flow starting from it and recall that from the maximum principle 4.11 we know that the f t ’s are bounded away from 0 and infinity as well for any t>0. Fix a reference probability density σ, also bounded away from 0 and infinity. For any t≥0 pick a test plan π t optimal for \((f_{t}{\mathfrak{m}},\sigma{\mathfrak{m}})\). Define \(\sigma_{t}^{s}:=(\mathrm{e}_{s})_{\sharp}\boldsymbol{\pi}_{t}\).

We claim that for a.e. t∈(0,∞) it holds

$$ \frac{\mathrm{d}}{\mathrm{d}t}\frac{1}{2}W_2^2(f_t{ \mathfrak{m}},\sigma {\mathfrak{m}})\leq\varliminf_{s\downarrow 0} \frac{\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^s)-\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^0)}{s}. $$
(72)

Let φ t be a Kantorovich potential for \(f_{t}{\mathfrak{m}},\sigma{\mathfrak{m}}\). By Proposition 8.18 we know that for a.e. t∈(0,∞) it holds

$$\frac{\mathrm{d}}{\mathrm{d}t}\frac{1}{2}W_2^2(f_t{ \mathfrak{m}},\sigma {\mathfrak{m}}) =\int_X\varphi_t\Delta f_t \mathrm{d} {\mathfrak{m}}\leq\lim_{\varepsilon\downarrow 0} \frac{\mathsf{Ch}(f_t-\varepsilon\varphi_t)-\mathsf {Ch}(f_t)}{\varepsilon}, $$

while from Proposition 8.19 we have that for any t>0 it holds

$$\varliminf_{s\downarrow0}\frac{\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^s)-\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^0)}{s} \geq\lim_{\varepsilon\downarrow0} \frac{\mathsf{Ch}(\varphi_t)-\mathsf{Ch} (\varphi_t+\varepsilon f_t)}{\varepsilon}. $$

Here we use the fact that Ch is quadratic. Indeed in this case simple algebraic manipulations show that

$$\frac{\mathsf{Ch}(f_t-\varepsilon\varphi_t)-\mathsf {Ch}(f_t)}{\varepsilon }=\frac{\mathsf{Ch}(\varphi_t)-\mathsf{Ch}(\varphi_t+\varepsilon f_t)}{\varepsilon} +O(\varepsilon ),\quad\forall t>0, $$

and therefore (72) is proved.
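In detail, writing \(B(\cdot,\cdot)\) for the symmetric bilinear form with \(\mathsf{Ch}(u)=B(u,u)\) (notation introduced here only for this computation), the manipulations read

```latex
% bilinear expansion of the quadratic form Ch
\mathsf{Ch}(f_t-\varepsilon\varphi_t)-\mathsf{Ch}(f_t)
   = -2\varepsilon B(f_t,\varphi_t)+\varepsilon^2\,\mathsf{Ch}(\varphi_t),
\qquad
\mathsf{Ch}(\varphi_t)-\mathsf{Ch}(\varphi_t+\varepsilon f_t)
   = -2\varepsilon B(\varphi_t,f_t)-\varepsilon^2\,\mathsf{Ch}(f_t),
```

so that after division by ε the two difference quotients differ by \(\varepsilon(\mathsf{Ch}(\varphi_t)+\mathsf{Ch}(f_t))=O(\varepsilon)\).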

Now notice that the K-convexity of the entropy yields

$$\varliminf_{s\downarrow0}\frac{\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^s)-\operatorname {Ent}_{\mathfrak {m}}(\sigma_t^0)}{s}\leq \operatorname {Ent}_{\mathfrak {m}}(\sigma{\mathfrak{m}})- \operatorname {Ent}_{\mathfrak {m}}(f_t{\mathfrak{m}})-\frac{K}{2}W_2^2(f_t{\mathfrak {m}},\sigma{\mathfrak{m}}), $$

and therefore we have

$$\frac{\mathrm{d}}{\mathrm{d}t}\frac{1}{2}W_2^2(f_t{\mathfrak{m}},\sigma{\mathfrak{m}})+\frac{K}{2}W_2^2(f_t{\mathfrak{m}},\sigma{\mathfrak{m}})+\operatorname{Ent}_{\mathfrak{m}}(f_t{\mathfrak{m}})\leq\operatorname{Ent}_{\mathfrak{m}}(\sigma{\mathfrak{m}}),\quad\text{for a.e. }t\in(0,\infty). $$

By Proposition 2.1 we conclude.

(iii) ⇒ (i). Since \((\operatorname {supp}({\mathfrak{m}}),\mathsf{d})\) is geodesic, so is \((\overline{D(\operatorname{Ent}_{{\mathfrak{m}}})},W_{2})\), which together with existence of EVI K -gradient flows for \(\operatorname{Ent}_{{\mathfrak{m}}}\) yields, via Proposition 2.3, K-geodesic convexity of \(\operatorname{Ent}_{{\mathfrak{m}}}\) along all geodesics in \(\overline{D(\operatorname{Ent}_{{\mathfrak{m}}})}\). In particular, \((X,\mathsf{d},{\mathfrak{m}})\) is a strict CD(K,∞) space.

We turn to the linearity. Let \((\mu_{t}^{0})\), \((\mu^{1}_{t})\) be two EVI K -gradient flows of the relative entropy and, for λ∈(0,1) fixed, define \(\mu^{\lambda}_{t}:=(1-\lambda)\mu^{0}_{t}+\lambda\mu^{1}_{t}\).

We claim that \((\mu^{\lambda}_{t})\) is an EVI K -gradient flow of \(\operatorname{Ent}_{{\mathfrak{m}}}\). To prove this, fix \(\nu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\), t>0 and an optimal plan \(\boldsymbol{\gamma}\in\mbox{\textsc{Opt}}(\mu_{t}^{\lambda},\nu)\). Since \(\mu^{i}_{t}\ll\mu^{\lambda}_{t}=\pi^{1}_{\sharp}\boldsymbol{\gamma}\) for i=0,1 we can define, as in Definition 5.10, the plans \(\boldsymbol{\gamma}_{\mu^{i}_{t}}\) and the measures \(\nu^{i}:=\boldsymbol{\gamma}_{\sharp}\mu^{i}_{t}\), i=0,1. Since \(\operatorname{supp}(\boldsymbol{\gamma}_{\mu^{i}_{t}})\subset\operatorname {supp}(\boldsymbol{\gamma})\), we have that \(\boldsymbol{\gamma}_{\mu^{i}_{t}}\in\mbox{\textsc{Opt}}(\mu^{i}_{t},\nu^{i})\), therefore from \(\boldsymbol{\gamma}=(1-\lambda)\boldsymbol{\gamma}_{{\mu ^{0}_{t}}}+\lambda\boldsymbol{\gamma}_{\mu ^{1}_{t}}\) we deduce

$$ W_2^2\bigl( \mu_t^\lambda,\nu\bigr)=(1-\lambda)W_2^2 \bigl(\mu^0_t,\nu^0\bigr)+\lambda W_2^2\bigl(\mu^1_t, \nu^1\bigr). $$
(73)

On the other hand, from the convexity of the squared Wasserstein distance we immediately get that

$$ W_2^2\bigl(\mu^\lambda_{t+h}, \nu\bigr)\leq(1-\lambda) W_2^2\bigl(\mu^0_{t+h}, \nu^0\bigr) +\lambda W_2^2\bigl( \mu^1_{t+h},\nu^1\bigr),\quad\forall h>0. $$
(74)

Furthermore, recalling (iii) of Proposition 5.15, we get

$$ \begin{aligned} \operatorname{Ent}_{\mathfrak{m}}\bigl(\mu^\lambda_t\bigr)-\operatorname {Ent}_{{\mathfrak{m}}}(\nu)\leq {} &(1-\lambda) \bigl(\operatorname{Ent}_{\mathfrak{m}}\bigl(\mu^0_t\bigr) - \operatorname{Ent}_{\mathfrak{m}}\bigl(\nu^0\bigr) \bigr) \\ &{} +\lambda \bigl( \operatorname{Ent}_{\mathfrak{m}}\bigl(\mu^1_t\bigr) -\operatorname{Ent}_{\mathfrak{m}}\bigl(\nu^1\bigr)\bigr). \end{aligned} $$
(75)

The fact that \((\mu^{0}_{t})\) and \((\mu^{1}_{t})\) are EVI K -gradient flows for \(\operatorname{Ent}_{{\mathfrak{m}}}\) (see in particular the characterization (iii) given in Proposition 2.1) in conjunction with (73), (74) and (75) yield

$$ \mathop{\overline{\lim}}_{h\downarrow0}\frac{W_2^2(\mu^\lambda_{t+h},\nu)-W_2^2(\mu^\lambda_t,\nu)}{2h} + \frac{K}{2}W_2^2\bigl(\mu^\lambda_t,\nu \bigr)+\operatorname{Ent}_{\mathfrak{m}}\big(\mu^\lambda_t\big) \leq \operatorname {Ent}_{\mathfrak{m}}(\nu). $$
(76)
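Explicitly (a spelled-out version of this step, with \(\lambda_{0}:=1-\lambda\), \(\lambda_{1}:=\lambda\)): the EVI K inequalities for \((\mu^{i}_{t})\) at the measures \(\nu^{i}\), combined with (74) and (73), give

```latex
% combine (74), (73) with the EVI_K inequalities for the two flows
\varlimsup_{h\downarrow0}\frac{W_2^2(\mu^\lambda_{t+h},\nu)-W_2^2(\mu^\lambda_t,\nu)}{2h}
\leq \sum_{i=0,1}\lambda_i\,\varlimsup_{h\downarrow0}
      \frac{W_2^2(\mu^i_{t+h},\nu^i)-W_2^2(\mu^i_t,\nu^i)}{2h}
\leq \sum_{i=0,1}\lambda_i\Bigl(\operatorname{Ent}_{\mathfrak{m}}(\nu^i)
      -\operatorname{Ent}_{\mathfrak{m}}(\mu^i_t)
      -\frac{K}{2}W_2^2(\mu^i_t,\nu^i)\Bigr),
```

and by (73) the quadratic terms on the right sum to \(\frac{K}{2}W_{2}^{2}(\mu^{\lambda}_{t},\nu)\), while by (75) the entropy terms are bounded by \(\operatorname{Ent}_{\mathfrak{m}}(\nu)-\operatorname{Ent}_{\mathfrak{m}}(\mu^{\lambda}_{t})\); rearranging yields (76).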

Since t>0 and \(\nu\in D(\operatorname{Ent}_{{\mathfrak{m}}})\) were arbitrary, we proved that \((\mu^{\lambda}_{t})\) is an EVI K -gradient flow of \(\operatorname {Ent}_{{\mathfrak{m}}}\) (see again (iii) of Proposition 2.1).

Thus, recalling the identification of gradient flows, we proved that the L 2-heat flow is additive on \(D(\operatorname{Ent}_{\mathfrak {m}})\). Since the heat flow in \(L^{2}(X,{\mathfrak{m}})\) commutes with additive and multiplicative constants, it is easy to deduce from this linearity on the class of bounded functions. By L 2 contractivity, linearity then extends to the whole of \(L^{2}(X,{\mathfrak{m}})\). □

We conclude by discussing some basic properties of the spaces with Riemannian Ricci curvature bounded from below.

We start by observing that Riemannian manifolds with Ricci curvature bounded below by K are RCD(K,∞) spaces, as they are non-branching CD(K,∞) spaces and the heat flow is linear on them. Also, from the studies made in [25, 27, 33] and [16] we know that finite dimensional Alexandrov spaces with curvature bounded from below are RCD(K,∞) spaces as well. On the other hand, Finsler manifolds are ruled out, as it is known (see for instance [26]) that the heat flow on a Finsler manifold is linear if and only if the manifold is Riemannian.

The stability of the RCD(K,∞) notion can be deduced by the stability of EVI K -gradient flows w.r.t. Γ-convergence of functionals, which is an easy consequence of the integral formulation in (ii) of Proposition 2.1.

Hence RCD(K,∞) spaces have the same basic properties as CD(K,∞) spaces, which entitles this notion to be called a synthetic (or weak) notion of Ricci curvature bound.

The point is then to understand the additional analytic/geometric properties of these spaces, which come mainly from the addition of the linearity condition. A first consequence is that the heat flow contracts, up to an exponential factor, the distance W 2, i.e.

$$W_2(\mu_t,\nu_t)\leq e^{-Kt} W_2(\mu_0,\nu_0),\quad\forall t\geq0, $$

whenever \((\mu_{t})\), \((\nu_{t})\) are gradient flows of the entropy.

By a duality argument (see [6, 15, 21]), this property implies the Bakry-Emery gradient estimate

$$\big|D \mathsf{h}_t(f)\big|_w^2(x)\leq e^{-2Kt}\mathsf{h}_t\bigl(|D f|_w^2 \bigr) (x),\quad\text{for}\ \mathfrak{m}\hbox{-a.e.}\ x\in X, $$

for all t>0, where \(\mathsf{h}_{t}:L^{2}(X,{\mathfrak{m}})\to L^{2}(X,\mathfrak {m})\) is the heat flow seen as gradient flow of Ch. If \((X,\mathsf {d},{\mathfrak{m}})\) is doubling and supports a local Poincaré inequality, then also the Lipschitz regularity of the heat kernel is deduced (following an argument described in [15]).

Also, since in RCD(K,∞) spaces Ch is a quadratic form, if we define

$$\mathcal{E}(f,g):=\mathsf{Ch}(f+g)-\mathsf{Ch}(f)-\mathsf {Ch}(g),\quad\forall f,g\in W^{1,2}(X,\mathsf{d},{\mathfrak{m}}), $$

we get a closed Dirichlet form on \(L^{2}(X,{\mathfrak{m}})\) (closure follows from the L 2-lower semicontinuity of Ch). Hence it is natural to compare the calculus on RCD(K,∞) spaces with the abstract one available for Dirichlet forms (see [11]). The picture here is pretty clear and consistent. Recall that to any \(f\in D(\mathcal{E})\) one can associate the energy measure [f] defined by

$$[f](\varphi):=\mathcal{E}(f,f\varphi)-\mathcal{E}\bigl(f^2/2,\varphi \bigr). $$

Then it is possible to show that the energy measure coincides with \(|D f|_{*}^{2}{\mathfrak{m}}\). Also, the distance d coincides with the intrinsic distance \(\mathsf{d}_{\mathcal{E}}\) induced by the form, defined by

$$\mathsf{d}_{\mathcal{E}}(x,y):=\sup \bigl\{\big| g(x)-g(y)\big| : g\in D(\mathcal{E}) \cap C(X), [g]\leq{\mathfrak{m}} \bigr\}. $$

Taking advantage of these identifications and of the locality of \(\mathcal{E}\) (which is a consequence of the locality of the notion |Df|), one can also see that on RCD(K,∞) spaces a Brownian motion with continuous sample paths associated to \(\mathsf{h}_{t}\) exists and is unique.

Finally, for RCD(K,∞) spaces it is possible to prove tensorization and globalization properties which are in line with those available for CD(K,∞) spaces.