1 Introduction

Kinetic models for granular media were initiated in the work of Benedetto et al. [4, 5] who considered the following PDE

$$\begin{aligned} \partial _t f+ v\cdot \nabla _x f=\mathrm {div}_v(f (\nabla W\star _v f)), \; (t,x,v)\in \mathbb {R}_+\times \mathbb {R}^d\times \mathbb {R}^d, \; f\vert _{t=0}=f_0,\qquad \end{aligned}$$
(1.1)

where \(f_0\) is an integrable nonnegative function on the phase space and W is a certain convex and radially symmetric potential capturing the (inelastic) collision rule between particles, and the convolution is in velocity only \((\nabla W\star _v f_t)(x,v)=\int _{\mathbb {R}^d} \nabla W(v-u)f_t(x,u) \text{ d } u\) (so that there is no regularizing effect in the spatial variable). At least formally, (1.1) captures the limit as the number N of particles tends to \(+\infty \) of the second-order ODE system:

$$\begin{aligned} \dot{X}_i(t)=V_i(t), \; \dot{V}_i=-\frac{1}{N} \sum _{j\ne i} \nabla W(V_i(t)-V_j(t)) \delta _{X_i(t)-X_j(t)}, \quad i=1, \ldots , N, \end{aligned}$$
(1.2)

which describes the motion of N particles of mass \(\frac{1}{N}\) moving freely until collisions occur, and at collision times, there is some velocity exchange with a loss of kinetic energy depending on the form of the potential W.

Surprisingly, there are very few well-posedness results for such equations. This is in contrast with the spatially homogeneous case (i.e. f depending on t and v only) associated with (1.1), which has been extensively studied (see [4, 6, 11–13, 17] and the references therein) and for which existence, uniqueness and long-time behavior are well understood. In fact, the spatially homogeneous version of (1.1) can be seen as the Wasserstein gradient flow of the interaction energy associated with W, and then well-posedness results can be viewed as a consequence of the powerful theory of Wasserstein gradient flows (see [3]). For the full kinetic equation (1.1), local existence and uniqueness of a classical solution was proved in one dimension in [4] for the potential \(W(v)=|v|^3/3\,\) (as observed in [2], the arguments of [4] extend to dimension d and \(W(v)=|v|^p/p\) provided \(p>3-d\)) when the initial datum \(f_0\) is a non-negative \(C^1\cap W^{1,\infty }(\mathbb {R}\times \mathbb {R})\) integrable function with compact support. Under an additional smallness assumption, the authors of [4] also proved a global existence result. In [1], the first author extended the local existence result of [4] to more general interaction potentials W and to any dimension \(d\ge 1\). The proof of [1] is based on a splitting of the kinetic equation (1.1) into a free transport equation in x, and a collision equation in v that is interpreted as the gradient flow of a convex interaction energy with respect to the quadratic Wasserstein distance. In [2], various a priori estimates are obtained, in particular a global entropy bound (which thus rules out concentration in finite time) in dimension 1 when \(W''\) is subquadratic near zero.

Understanding under which conditions one can hope for global existence or on the contrary expect explosion in finite time is mainly an open question. Let us remark that the weak formulation of (1.1) means that for any \(T>0\) and any \(\phi \in C_c^{\infty }([0,T]\times \mathbb {R}^d\times \mathbb {R}^d)\) one has

$$\begin{aligned}&\int _0^T \int _{\mathbb {R}^d\times \mathbb {R}^d} ( \partial _t \phi (t,x,v) f_t(x,v) + \nabla _x \phi (t,x,v)\cdot v f_t(x,v) ) \text{ d } x \text{ d } v \text{ d } t \\&\quad =\int _{\mathbb {R}^d\times \mathbb {R}^d} \phi (T,x,v) f_T(x,v) \text{ d } x \text{ d } v-\int _{\mathbb {R}^d\times \mathbb {R}^d} \phi (0,x,v) f_0(x,v) \text{ d } x \text{ d } v\\&\qquad + \int _0^T \int _{\mathbb {R}^d\times \mathbb {R}^d\times \mathbb {R}^d} \nabla _v \phi (t,x,v)\cdot \nabla W(v-u) f_t(x,v) f_t(x,u) \text{ d } x \text{ d } u \text{ d } v \text{ d } t \end{aligned}$$

and for the right hand side to make sense, it is necessary to have a control on nonlinear quantities like

$$\begin{aligned} \int _0^T \int _{\mathbb {R}^d\times \mathbb {R}^d\times \mathbb {R}^d} f_t(x,v) f_t(x,u) \text{ d } x \text{ d } u \text{ d } v \text{ d } t \end{aligned}$$

which actually makes it difficult to define measure solutions (this also explains why in [4] or [1], the authors look for \(L^1\cap L^\infty \) solutions). Observing that (1.1) can be written in conservative form as

$$\begin{aligned} \partial _t f + \mathrm {div}_{x,v} \big (f F(f)\big )=0, \text{ with } F(f)(x,v)=\big (v, -(\nabla W\star _v f)(x,v)\big ), \end{aligned}$$

we see that, at least for smooth solutions, (1.1) can be integrated using the method of characteristics:

$$\begin{aligned} f_t={S_t}_\# f_0, \end{aligned}$$

where \(S_t\) is the flow of the vector-field F(f) i.e.

$$\begin{aligned} S_0(x,v)=(x,v), \; \frac{\text{ d }}{\text{ d }t} S_t(x,v)=F(f_t)\big (S_t(x,v)\big ), \end{aligned}$$

and \(f_t={S_t}_\# f_0\) means that

$$\begin{aligned} \int _{\mathbb {R}^d\times \mathbb {R}^d}\varphi (x,v) f_t(x,v)\text{ d } x \text{ d } v = \int _{\mathbb {R}^d\times \mathbb {R}^d}\varphi \big (S_t(x,v)\big ) f_0(x,v)\text{ d } x \text{ d } v, \quad \forall \varphi \in C_b(\mathbb {R}^d\times \mathbb {R}^d). \end{aligned}$$

In the present work, we investigate the one-dimensional case with the quadratic kernel \(W(v)=\frac{1}{2} \vert v\vert ^2\) which is neither covered by the analysis of [4] nor by the entropy estimate of [2] (actually the entropy cannot be globally bounded in this case, see [2]). In this case the convolution takes the form

$$\begin{aligned} \int _{\mathbb {R}}(v-u)f_t(x,u) \text{ d } u=\rho _t(x) v -m_t(x), \end{aligned}$$

where

$$\begin{aligned} \rho _t(x):=\int _{\mathbb {R}} f_t(x,v) \text{ d } v, \; m_t(x):=\int _{\mathbb {R}} v f_t(x,v) \text{ d } v, \end{aligned}$$
(1.3)

so that the kinetic equation (1.1) rewrites

$$\begin{aligned} \partial _t f_t(x,v)+v \partial _x f_t(x,v)= \partial _v\Big (f_t(x,v)(\rho _t(x) v -m_t(x)) \Big ), \end{aligned}$$
(1.4)

and we supplement (1.4) with the initial condition

$$\begin{aligned} f\vert _{t=0}=f_0, \end{aligned}$$
(1.5)

where \(f_0\) is a compactly supported probability density:

$$\begin{aligned} f_0\in L^1(\mathbb {R}^d\times \mathbb {R}^d), \quad \int _{\mathbb {R}^d\times \mathbb {R}^d} f_0 \text{ d } x \text{ d } v=1 \end{aligned}$$
(1.6)

and

$$\begin{aligned} \mathrm {Supp}(f_0) \subset B_{R_x} \times B_{R_v} \end{aligned}$$
(1.7)

for some positive constants \(R_x\) and \(R_v\). We shall see later on how to treat more general measures as initial conditions. Our first contribution is the observation that, thanks to a special first integral of motion for the characteristics system associated with (1.4), one may define weak solutions not at the level of measures on the phase space but on a (possibly infinite) product of measures on the physical space. Our second contribution is to show that this reformulation has a gradient flow structure for an energy functional with good properties, which will enable us to prove global well-posedness. To the best of our knowledge, even if the situation we are dealing with is very particular, this is the first global result of this type for kinetic models of granular media. As pointed out to us by Yann Brenier, our analysis has some similarities with (but is different from) some models of sticky particles for pressureless flows (see [8, 9]) and Brenier’s formulation of the Darcy–Boussinesq system [7].

The article is organized as follows. In Sect. 2, we show how a certain first integral of motion can be used to give a reformulation of (1.4) which allows for measure solutions. Section 3 investigates the gradient flow structure of this reformulation. Section 4 proves global existence thanks to the celebrated Jordan–Kinderlehrer–Otto (henceforth JKO) implicit Euler scheme of [16] for a certain energy functional. In Sect. 5, we prove uniqueness and stability and give some concluding remarks.

2 A first integral and measure solutions

2.1 A first integral for classical solutions

Let us consider a \(C^1\) compactly supported initial condition \(f_0\) and a classical solution f, that is a \(C^1\) function which solves (1.4) in a pointwise sense on \(\mathbb {R}_+\times \mathbb {R}^d\times \mathbb {R}^d\). It is then easy to show (see [2]) that f remains compactly supported locally in time; more precisely (1.7) and (1.4) imply that

$$\begin{aligned} \mathrm {Supp}(f_t)\subset B_{R_x+t R_v}\times B_{R_v}, \quad \forall t\ge 0. \end{aligned}$$
(2.1)

The characteristics of (1.4) are given by the flow map of the second-order ODE

$$\begin{aligned} \ddot{X}= -\rho _t(X) \dot{X}+m_t(X) \end{aligned}$$
(2.2)

in the sense that

$$\begin{aligned} f_t =(X_t, V_t)_\# f_0, \end{aligned}$$

where \((X_0(x,v), V_0(x,v))=(x,v)\) and

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t(x,v)=V_t(x,v), \; \frac{\mathrm{d}}{\mathrm{d}t} V_t(x,v)=-\rho _t\big (X_t(x,v)\big ) V_t(x,v)+m_t\big (X_t(x,v)\big ),\; \end{aligned}$$
(2.3)

with \(\rho \) and m being respectively the spatial marginal and momentum associated with f defined by (1.3). Integrating (1.4) with respect to v first gives:

$$\begin{aligned} \partial _t \rho _t(x)+\partial _x m_t(x)=0, \quad t\ge 0, \; x\in \mathbb {R}\end{aligned}$$
(2.4)

so that there is a stream potential G such that

$$\begin{aligned} \rho =\partial _x G, \; m=-\partial _t G , \end{aligned}$$
(2.5)

and since \(\rho \) is a probability measure, it is natural to choose the integration constant in such a way that G is the cumulative distribution function of \(\rho \):

$$\begin{aligned} G_t(x)=\int _{-\infty }^x \rho _t(y) \text{ d } y=\rho _t\big ((-\infty , x]\big ). \end{aligned}$$
(2.6)

Substituting (2.5) into (2.2) then gives

$$\begin{aligned} \ddot{X}= -\partial _xG_t(X) \dot{X}-\partial _t G_t(X)=-\frac{\mathrm{d}}{\mathrm{d}t} G_t(X) \end{aligned}$$

so that \(\dot{X}+G_t(X)\) is constant along the characteristics. Since \(G_0\) can be deduced from the initial condition \(f_0\) by

$$\begin{aligned} G_0(x)=\int _{-\infty }^x \int _\mathbb {R}f_0(y,v) \text{ d }v \; \text{ d } y, \end{aligned}$$

we have the following explicit first integral of motion for (2.3):

$$\begin{aligned} V_t(x,v)+G_t\big ( X_t(x,v)\big )=v+G_0(x). \end{aligned}$$
(2.7)
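The conservation law behind (2.7) only uses the relations \(\rho =\partial _x G\) and \(m=-\partial _t G\) from (2.5), so it can be checked numerically on any smooth stream potential. The following minimal sketch integrates (2.2) with a classical fourth-order Runge–Kutta step and monitors \(\dot{X}+G_t(X)\). The travelling profile `G` below is a purely hypothetical choice made for illustration (it is not claimed to arise from a solution of (1.4)), and all function names are assumptions.

```python
import math

# Hypothetical smooth stream potential (illustration only): a travelling tanh profile.
def G(t, x):
    return 0.5 * (1.0 + math.tanh(x - 0.3 * t))

def rho(t, x):
    # rho = d/dx G
    return 0.5 / math.cosh(x - 0.3 * t) ** 2

def m(t, x):
    # m = -d/dt G
    return 0.15 / math.cosh(x - 0.3 * t) ** 2

def rhs(t, x, v):
    # First-order form of (2.2): X' = V, V' = -rho_t(X) V + m_t(X)
    return v, -rho(t, x) * v + m(t, x)

def rk4_step(t, x, v, dt):
    k1x, k1v = rhs(t, x, v)
    k2x, k2v = rhs(t + dt / 2, x + dt / 2 * k1x, v + dt / 2 * k1v)
    k3x, k3v = rhs(t + dt / 2, x + dt / 2 * k2x, v + dt / 2 * k2v)
    k4x, k4v = rhs(t + dt, x + dt * k3x, v + dt * k3v)
    x_new = x + dt / 6 * (k1x + 2 * k2x + 2 * k3x + k4x)
    v_new = v + dt / 6 * (k1v + 2 * k2v + 2 * k3v + k4v)
    return x_new, v_new

def invariant_drift(x0, v0, t_end=5.0, dt=0.01):
    """Largest deviation of V_t + G_t(X_t) from its initial value v0 + G_0(x0)."""
    steps = round(t_end / dt)
    x, v, t = x0, v0, 0.0
    c0 = v + G(t, x)
    worst = 0.0
    for n in range(steps):
        x, v = rk4_step(t, x, v, dt)
        t = (n + 1) * dt
        worst = max(worst, abs(v + G(t, x) - c0))
    return worst

drift = invariant_drift(-1.0, 0.8)
```

Up to the discretization error of the integrator, `drift` should be negligible, which is the numerical counterpart of the identity \(\ddot{X}=-\frac{\mathrm{d}}{\mathrm{d}t} G_t(X)\).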

2.2 Reformulation and equivalence for classical solutions

In view of the first integral (2.7), it is natural to perform a change of variables on the initial conditions:

$$\begin{aligned} a(x,v):=v+G_0(x), \; \nu _0^a(x):=f_0\big (x,a-G_0(x)\big ) \end{aligned}$$

so that for every \(\phi \in C(\mathbb {R}\times \mathbb {R})\) one has

$$\begin{aligned} \int _{\mathbb {R}\times \mathbb {R}} \phi \big (x, a(x,v)\big ) f_0(x,v) \text{ d } x \text{ d } v= \int _{\mathbb {R}\times \mathbb {R}} \phi (x,a) \nu _0^a(x) \text{ d } x \text{ d } a, \end{aligned}$$

and then to rewrite the characteristics as a family of first-order ODEs parametrized by the label a:

$$\begin{aligned} \frac{\mathrm{d}}{\mathrm{d}t} X_t^a(x)=a-G_t\big (X_t^a(x)\big ),\; X_0^a (x)=x. \end{aligned}$$
(2.8)

The flow (2.3) may then be rewritten as:

$$\begin{aligned} X_t(x,v)=X_t^a (x), \; V_t(x,v)=a-G_t\big (X_t^a(x)\big ) \text{ for } a=a(x,v)=v+G_0(x). \end{aligned}$$

Hence setting

$$\begin{aligned} \nu _t^a:={X_t^a}_\# \nu _0^a, \end{aligned}$$
(2.9)

the relation \(f_t =(X_t, V_t)_\# f_0\) can be re-expressed as:

$$\begin{aligned} \int _{\mathbb {R}^2} \phi (x,v)f_t(x, v) \text{ d }x \text{ d } v=\int _{\mathbb {R}^2} \phi \big (x, a-G_t(x)\big ) \nu _t^a(x) \text{ d }x \text{ d } a \end{aligned}$$
(2.10)

for every \(t\ge 0\) and every test-function \(\phi \in C(\mathbb {R}^2)\). This implies in particular that

$$\begin{aligned} \rho _t(x)=\int _\mathbb {R}\nu _t^a(x) \text{ d }a \end{aligned}$$

and then also

$$\begin{aligned} G_t(x)=\int _\mathbb {R}G_t^a(x) \text{ d }a \; \text{ with } \; G_t^a (x):=\nu _t^a\big ((-\infty ,x]\big ). \end{aligned}$$
(2.11)

On the other hand, using (2.8), we deduce that for each \(a\in \mathbb {R}\), \(\nu ^a\) satisfies the continuity equation:

$$\begin{aligned} \partial _t \nu _t^a +\partial _x \Bigg ( \nu _t ^a \big (a-G_t(x)\big )\Bigg )=0, \; \nu ^a \vert _{t=0}(x)=\nu _0^a(x)=f_0\big (x, a -G_0(x)\big ). \end{aligned}$$
(2.12)

Note that \(\nu _t^a\) is a nonnegative measure but not necessarily a probability measure, its total mass being that of \(\nu _0^a\) i.e. \(h(a):=\int _\mathbb {R}f_0(x, a-G_0(x)) \text{ d } x\).

The previous considerations show that any classical solution of (1.4) is related to a solution of the system of continuity equations (2.11) and (2.12) with initial condition \(f_0\) via the relation (2.10). The converse is also true: if \(\nu ^a\) is a family of classical solutions of (2.12) with \(G^a\) and G given by (2.11), then the time-dependent family of probability measures \(f_t\) on \(\mathbb {R}^2\) defined by (2.10) actually solves (1.4). Indeed, by construction the spatial marginal \(\rho \) of f is \(\partial _x G\); as for the momentum, we have

$$\begin{aligned} m_t(x):=\int _{\mathbb {R}} v f_t(x,v) \text{ d }v =\int _{\mathbb {R}} \big (a-G_t(x)\big ) \nu _t^a (x) \text{ d }a. \end{aligned}$$

Then, thanks to (2.12) and Fubini’s theorem, we have

$$\begin{aligned} \begin{array}{lll} \partial _t G_t(x)=\int _{-\infty }^x \int _\mathbb {R}\partial _t \nu _t^a(y) \text{ d }y \text{ d }a &{}=- \int _{-\infty }^x \int _\mathbb {R}\partial _x\bigg (\nu _t^a(y) \big (a-G_t(y)\big )\bigg ) \text{ d }y \text{ d }a\\ &{}=-\int _{\mathbb {R}} \big (a-G_t(x)\big ) \nu _t^a (x) \text{ d }a=-m_t(x). \end{array} \end{aligned}$$

Now take a test-function \(\phi \in C_c^1(\mathbb {R}^2)\). Differentiating (2.10) with respect to time, then using (2.12), an integration by parts, the identities \(\partial _x G=\rho \) and \(\partial _t G=-m\), and (2.10) once more, we obtain

$$\begin{aligned} \begin{array}{ll} \frac{\mathrm{d}}{\mathrm{d}t} \int _{\mathbb {R}^2} \phi f_t&{}=\int _{\mathbb {R}^2} \Big ( -\phi (x, a-G_t(x))\partial _x(\nu _t^a (a-G_t))+\partial _v \phi (x, a-G_t) m_t \nu _t^a \Big ) \text{ d }x \text{ d }a\\ &{}=\int _{\mathbb {R}^2} \Big ( \partial _x \phi (x, a-G_t(x)) -\partial _v \phi (x, a-G_t(x)) \rho _t(x) \Big ) (a-G_t(x)) \nu _t^a(x) \text{ d }x \text{ d }a\\ &{}\quad +\int _{\mathbb {R}^2} \partial _v \phi (x, v) m_t(x) f_t(x,v) \text{ d }x \text{ d }v\\ &{}= \int _{\mathbb {R}^2}\Big ( \partial _x \phi (x, v) v+ \partial _v \phi (x, v) \big (m_t(x)-\rho _t(x)v\big )\Big ) f_t(x,v)\text{ d }x \text{ d }v. \end{array} \end{aligned}$$

This proves that, for classical solutions, the kinetic equation (1.4) is actually equivalent to the system of PDEs (2.12)–(2.11) indexed by the label a.

2.3 Measure solutions

We now take the system (2.11) and (2.12) as a starting point to define measure solutions. We have to suitably relax the system so as to take into account:

  • The fact that shocks may occur i.e. atoms of \(\rho \) may appear in finite time, then the cumulative distribution G may become discontinuous (in which case it will be convenient to view G, which is monotone, as a set-valued map),

  • The fact that when shocks occur, the velocity may depend on the label a,

  • More general initial conditions.

Let us first treat the case of more general initial conditions. What really matters is to be able to perform the change of variables \((x,v)\mapsto (x,a):=(x, v+G_0(x))\), which can be done as soon as \(\rho _0\) is atomless i.e. does not charge points. We shall therefore assume that \(f_0\) is a probability measure on \(\mathbb {R}^2\) with compact support and having an atomless spatial marginal:

$$\begin{aligned} \mathrm {Supp}(f_0) \subset B_{R_x} \times B_{R_v}, \; \rho _0 \text{ is } \text{ atomless } \text{ i.e. } f_0(\{x\}\times \mathbb {R})=0, \; \forall x\in \mathbb {R}. \end{aligned}$$
(2.13)

Defining the spatial marginal \(\rho _0\) of \(f_0\) by

$$\begin{aligned} \int _\mathbb {R}\phi (x) \text{ d }\rho _0(x)=\int _{\mathbb {R}^2} \phi (x) \text{ d }f_0(x,v), \; \forall \phi \in C(\mathbb {R}) \end{aligned}$$

as well as its cumulative distribution function

$$\begin{aligned} G_0(x):=\rho _0\big ((-\infty ,x]\big )=f_0\big ((-\infty ,x]\times \mathbb {R}\big ), \; \forall x\in \mathbb {R}, \end{aligned}$$

\(G_0\) is continuous and \(\rho _0\) is supported on \([-R_x, R_x]\). Since \(G_0\) takes values in [0, 1], then \(a(x,v):=v+G_0(x)\in [-R_v, R_v+1]\) for \((x,v)\in \mathrm {Supp}(f_0)\). We then define the probability measure \(\eta _0\) as the push-forward of \(f_0\) through \((x,v)\mapsto (x, a(x,v))\) i.e.

$$\begin{aligned} \eta _0(C):=f_0 \Big (\big \{ (x,v) \; : \; (x,v+G_0(x))\in C\big \}\Big ), \text{ for } \text{ every } \text{ Borel } \text{ subset } \text{ C } \text{ of } \mathbb {R}^2.\nonumber \\ \end{aligned}$$
(2.14)

We then fix a \(\sigma \)-finite measure \(\mu \) such that the second marginal of \(\eta _0\) is absolutely continuous with respect to \(\mu \); for instance it could be the second marginal of \(\eta _0\), but we allow \(\mu \) to be a more general measure (not necessarily a probability measure; for instance it was the Lebesgue measure in the previous Sect. 2.2, and in the discrete example of Sect. 2.4 below, \(\mu \) will be a discrete measure). Then we can disintegrate \(\eta _0\) as \(\eta _0=\nu _0^a \otimes \mu \) which means that for every \(\phi \in C(\mathbb {R}^2)\) we have

$$\begin{aligned} \int _{\mathbb {R}^2} \phi \big (x, v+G_0(x)\big ) \text{ d } f_0(x,v)=\int _{\mathbb {R}} \Big ( \int _\mathbb {R}\phi (x,a) \text{ d } \nu _0^a(x)\Big ) \text{ d } \mu (a). \end{aligned}$$

Note that \(\nu _0^a\) is supported on \([-R_x, R_x]\) and it is not necessarily a probability measure. We denote by h(a) its total mass i.e. the Radon–Nikodym density of the second marginal of \(\eta _0\) with respect to \(\mu \):

$$\begin{aligned} \int _{\mathbb {R}^2} \phi \big (v+G_0(x)\big ) \text{ d } f_0(x,v)=\int _{\mathbb {R}} \phi (a) h(a) \text{ d } \mu (a), \quad \forall \phi \in C(\mathbb {R}) \end{aligned}$$
(2.15)

so that \(h\in L^1(\mu )\), \(\int _{\mathbb {R}} h(a) \text{ d } \mu (a)=1\) and \(h=0\) outside of the interval \([-R_v, R_v+1]\).
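To make the change of variables \((x,v)\mapsto (x, v+G_0(x))\) concrete, the following minimal sketch computes labels for sampled points and an empirical version of the mass distribution of the labels. It assumes, purely for illustration, that \(\rho _0\) is the uniform density on [0, 1] (so that \(G_0\) is a clamp) and that velocities are drawn uniformly in \([-R_v, R_v]\); the function names are hypothetical.

```python
import random

def G0(x):
    # CDF of the (assumed) uniform spatial marginal rho_0 on [0, 1]
    return min(max(x, 0.0), 1.0)

def label(x, v):
    # The label a(x, v) = v + G_0(x) attached to the initial point (x, v)
    return v + G0(x)

def sample_labels(n, r_v, seed=0):
    # Draw n points (x, v) with x uniform in [0, 1] and v uniform in [-r_v, r_v]
    rng = random.Random(seed)
    points = [(rng.random(), rng.uniform(-r_v, r_v)) for _ in range(n)]
    return [label(x, v) for x, v in points]

def mass_histogram(labels, lo, hi, bins):
    # Fraction of the total mass carried by each label bin:
    # an empirical stand-in for the integral of h over the bin.
    counts = [0] * bins
    width = (hi - lo) / bins
    for a in labels:
        k = min(int((a - lo) / width), bins - 1)
        counts[k] += 1
    return [c / len(labels) for c in counts]

R_v = 2.0
labels = sample_labels(1000, R_v)
hist = mass_histogram(labels, -R_v, R_v + 1.0, bins=10)
```

In this sketch every label falls in \([-R_v, R_v+1]\) and the bin masses sum to one, in line with the properties of h stated above.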

The rest of the paper will be devoted to studying the structure and well-posedness of the following system, which relaxes the system (2.11) and (2.12) to a measure-valued setting:

$$\begin{aligned} \partial _t \nu _t^a +\partial _x (\nu _t^a v_t^a)=0, \; \nu ^a\big \vert _{t=0} =\nu _0^a, \end{aligned}$$
(2.16)

subject to the constraint that

$$\begin{aligned} v_t^a(x) \in \left[ a-G_t(x), a-G_t^{-}(x)\right] , \end{aligned}$$
(2.17)

where

$$\begin{aligned} \rho _t :=\int _{\mathbb {R}} \nu _t^a \text{ d }\mu (a), \; G_t(x)=\rho _t\big ((-\infty , x]\big ), \; G_t^-(x)=\rho _t\big ((-\infty , x)\big ). \end{aligned}$$
(2.18)

Note that when \(\mu \) is the Lebesgue measure and there are no shocks i.e. when \(G_t\) is continuous, we recover the system (2.11) and (2.12) of Sect. 2.2. Denoting by \(\mathcal{P}_2(\mathbb {R})\) the set of Borel probability measures on \(\mathbb {R}\) with finite second moment, solutions of (2.16)–(2.18) are then formally defined by:

Definition 2.1

Fix a time \(T>0\); a measure solution of the system (2.16)–(2.18) on \([0,T]\times \mathbb {R}\) is a family of measures \((t,a) \in [0,T]\times [-R_v, R_v+1] \mapsto \nu _t^a \in h(a) \mathcal{P}_2(\mathbb {R})\) which

  1. Is measurable in the sense that for every Borel bounded function \(\phi \) on \([0,T]\times \mathbb {R}\times \mathbb {R}\), the map \((t,a)\mapsto \int _\mathbb {R}\phi (t,a,x)\text{ d } \nu _t^a(x)\) is \( \text{ d }t \otimes \mu \) measurable,

  2. Satisfies the continuity equation (2.16) in the sense of distributions for \(h\mu \)-a.e. a, with a \(\nu _t^a\otimes \mu \otimes \text{ d }t\)-measurable velocity field \(v_t^a\) which satisfies (2.17), \(\nu _t^a\otimes \mu \otimes \text{ d }t\)-a.e., and with \(G_t\) and \(G_t^{-}\) defined by (2.18).

Note that since \(v_t^a\), being constrained by (2.17), is bounded, \(t\mapsto \nu _t^a\) is actually continuous for the weak convergence of measures for \(h\mu \)-a.e. a. Note also that the fact that \(t\mapsto \nu _t^a\) satisfies the continuity equation (2.16) in the sense of distributions is equivalent to the condition that for every \(\psi \in C( [-R_v, R_v+1])\) and \(\phi \in C_c^1([0,T]\times \mathbb {R})\) one has:

$$\begin{aligned} \begin{array}{ll} &{}\int _{\mathbb {R}} \psi (a) \Big (\int _0^T \int _{\mathbb {R}} (\partial _t \phi (t,x)+\partial _x \phi (t,x) v_t^a(x) ) \text{ d } \nu _t^a(x) \text{ d } t\Big ) \text{ d } \mu (a)\\ &{}\quad =\int _{\mathbb {R}} \psi (a) \Big ( \int _{\mathbb {R}} \phi (T,x) \text{ d } \nu _T^a(x)- \int _{\mathbb {R}} \phi (0,x) \text{ d } \nu _0^a(x) \Big ) \text{ d } \mu (a). \end{array} \end{aligned}$$

2.4 A discrete example and a system of Burgers equations

The aim of this paragraph, which is somewhat independent of the rest of the paper, is to show on a discrete example that one cannot take for granted that the stream \(G_t\) remains continuous, which justifies the necessity to relax the condition \(v_t^a (x)=a-G_t(x)\) to (2.17). Consider indeed the special case

$$\begin{aligned} f_0=\rho _0 \otimes \frac{1}{N} \sum _{i=1}^N \delta _{a_i-G_0(x)}, \end{aligned}$$

where \(\rho _0\) is a smooth compactly supported probability density and \(a_1<\cdots <a_N\) are the finitely many values that the label a may take. In this case, we take \(\mu \) as the counting measure and then

$$\begin{aligned} \mu = \sum _{i=1}^N \delta _{a_i}, \; h(a_i)=\frac{1}{N}, \; \nu _0^{a_i}=\frac{1}{N} \rho _0. \end{aligned}$$

Even though \(G_0\) is smooth, we have to expect that shocks may appear in finite time. Let us relabel the measures \(\nu ^i:=\nu ^{a_i}\) and the corresponding cumulative distributions \(G^i:=G^{a_i}\), \(G:=\sum _{j=1}^N G^j\). If G were continuous, then all the nondecreasing functions \(G^i\) would also be continuous (no shocks), and the system (2.16)–(2.18) would become

$$\begin{aligned} \partial _ t \nu ^i +\partial _x \Bigg (\nu ^i\Big (a_i-\sum _{j=1}^N G^j\Big )\Bigg )=0, \; \nu ^i\big \vert _{t=0}=\frac{1}{N} \rho _0, \; i=1,\ldots , N. \end{aligned}$$
(2.19)

Integrating with respect to the spatial variable between \(-\infty \) and x would then give a system of Burgers-like equations:

$$\begin{aligned} \partial _t G^i +\partial _x G^i (a_i-\sum _{j=1}^N G^j)=0, \; G^i\vert _{t=0}=\frac{1}{N} G_0, \; i=1,\ldots , N. \end{aligned}$$
(2.20)

We can at least formally rewrite each of these equations in the more familiar form

$$\begin{aligned} \partial _t G^i +\partial _x G^i \psi ^i_t(G^i) =0, \end{aligned}$$

where each function \(\psi ^i\) is implicitly defined in terms of the pseudo inverse \(H^i_t\) of \(G^i_t\):

$$\begin{aligned} \psi ^i_t (\alpha )=a_i-\alpha -\sum _{j\ne i} G^j_t \left( H^i_t(\alpha )\right) . \end{aligned}$$

Note that \(\psi _t^i\) is decreasing for every t and actually \((\psi _t^i)'\le -1\). In the absence of shocks, \(H_t^i\) simply solves \(\partial _t H^i=\psi ^i_t\). Let us then take \(x_1<x_2\) belonging to a certain interval on which \(\rho _0\ge \nu \) with \(\nu >0\) and define \(y_1:=\frac{1}{N} G_0(x_1)\), \(y_2:=\frac{1}{N} G_0(x_2)\), we then have \(y_2-y_1=\frac{1}{N} \int _{x_1}^{x_2} \rho _0\ge \frac{\nu }{N}(x_2-x_1)\). Integrating \(\partial _t H^i=\psi ^i_t\) and using the fact that \((\psi ^i)'\le -1\), we get

$$\begin{aligned} H^i_t( y_2)-H^i_t(y_1)=x_2-x_1 +\int _0^t \left( \psi ^i_s(y_2)-\psi ^i_s(y_1)\right) \text{ d } s\le x_2-x_1-t(y_2-y_1). \end{aligned}$$

This means that \(H^i_t\) becomes noninjective before a time

$$\begin{aligned} \frac{x_2-x_1}{y_2-y_1}\le \frac{N}{\nu }. \end{aligned}$$

In other words, discontinuities of \(G^i\) i.e. shocks appear in finite time of order O(N) for any finite N.
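The bound above is simple arithmetic and can be evaluated on an example. The sketch below assumes, for illustration only, the density \(\rho _0(x)=\frac{3}{4}(1-x^2)\) on \([-1,1]\) (which is bounded below by 0.5625 on \([-0.5, 0.5]\)); the names `G0` and `shock_time_bound` are hypothetical.

```python
def G0(x):
    # CDF of the assumed density rho_0(x) = 3/4 (1 - x^2) on [-1, 1]
    if x <= -1.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    return 0.5 + 0.75 * (x - x ** 3 / 3.0)

def shock_time_bound(x1, x2, n_labels, cdf=G0):
    # y_k = G_0(x_k) / N, and H^i_t must stop being injective
    # before the time (x2 - x1) / (y2 - y1).
    y1 = cdf(x1) / n_labels
    y2 = cdf(x2) / n_labels
    return (x2 - x1) / (y2 - y1)

N = 10
nu_lower = 0.75 * (1.0 - 0.5 ** 2)  # lower bound of rho_0 on [-0.5, 0.5]
bound = shock_time_bound(-0.5, 0.5, N)
```

Here `bound` is the time \((x_2-x_1)/(y_2-y_1)\) and it does not exceed `N / nu_lower`, in agreement with the estimate \(N/\nu \).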

3 A gradient flow structure

In this section, assuming (2.13) we will see how to obtain solutions to the system (2.16)–(2.18) by a gradient flow approach. Existence of such gradient flows using the JKO implicit scheme for Wasserstein gradient flows will be detailed in Sect. 4. We denote by \(\mathcal{M}(\mathbb {R}^d)\) the set of Borel measures on \(\mathbb {R}^d\) and \(\mathcal{P}(\mathbb {R}^d)\) the set of Borel probability measures on \(\mathbb {R}^d\). Given two nonnegative Borel measures on \(\mathbb {R}^d\) with common finite total mass h (not necessarily 1) and finite p-moments, \(\nu \) and \(\theta \), recall that for \(p\in [1,+\infty )\), the p-Wasserstein distance between \(\nu \) and \(\theta \) is by definition:

$$\begin{aligned} W_p(\nu , \theta ):= \inf _{\gamma \in \Pi (\nu , \theta ) } \Bigg \{\int _{\mathbb {R}^d\times \mathbb {R}^d} \vert x-y\vert ^p \text{ d } \gamma (x,y) \Bigg \}^{\frac{1}{p}}, \end{aligned}$$

where \( \Pi (\nu , \theta )\) is the set of transport plans between \(\nu \) and \(\theta \) i.e. the set of Borel measures on \(\mathbb {R}^d\times \mathbb {R}^d\) having \(\nu \) and \(\theta \) as marginals (we refer to the textbooks of Villani [18, 19] for a detailed exposition of optimal transport theory). Wasserstein distances are usually defined between probability measures such as \(h^{-1} \nu \) and \(h^{-1} \theta \) , but of course they extend to measures with the same total mass and \(W_p^p(\nu , \theta )=h W_p^p(h^{-1} \nu , h^{-1}\theta )\). We shall mainly use the 2-Wasserstein distance but the 1-Wasserstein distance will be useful as well in the sequel. We also recall that the 1-Wasserstein distance can also be defined through the Kantorovich duality formula (see for instance [18, 19]):

$$\begin{aligned} W_1(\nu , \theta ):=\sup \Big \{ \int _{\mathbb {R}^d} f \text{ d } (\nu -\theta ) \; : \; f 1\text{-Lipschitz } \Big \}. \end{aligned}$$
(3.1)
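In dimension one, and for two measures made of the same number of equally weighted atoms, the monotone (sorted) pairing is an optimal transport plan for every \(p\ge 1\), which makes \(W_p\) easy to evaluate. The following minimal sketch relies on this standard fact; the function name and the `mass` parameter (the common total mass h) are assumptions introduced for illustration.

```python
def wasserstein_p_1d(xs, ys, p=2, mass=1.0):
    """W_p between two measures on the real line, each made of len(xs)
    atoms of equal weight mass / len(xs), via the sorted pairing."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two nonempty lists of equal length")
    n = len(xs)
    # Monotone rearrangement: pair the i-th smallest atom of one measure
    # with the i-th smallest atom of the other.
    cost = sum(abs(x - y) ** p for x, y in zip(sorted(xs), sorted(ys)))
    return (mass * cost / n) ** (1.0 / p)

xs = [0.0, 1.0, 3.0]
ys = [2.0, 0.0, 5.0]
w1 = wasserstein_p_1d(xs, ys, p=1)
w2 = wasserstein_p_1d(xs, ys, p=2)
w2_scaled = wasserstein_p_1d(xs, ys, p=2, mass=0.25)
```

On this example the sorted differences are 0, 1 and 2, so `w1` equals 1, and `w2_scaled ** 2` equals `0.25 * w2 ** 2`, which illustrates the scaling relation \(W_p^p(\nu , \theta )=h W_p^p(h^{-1} \nu , h^{-1}\theta )\).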

We will see in Sect. 4 that one may obtain solutions to the system (2.16)–(2.18) by a minimizing scheme for an energy defined on an infinite product of spaces of measures parametrized by the label a. Wasserstein gradient flows on finite products have recently been investigated in [10, 15]. To our knowledge the case of an infinite product is new in the literature.

3.1 Functional setting

As in Sect. 2.3, starting from \(f_0\) satisfying (2.13), let us define \(A:=[-R_v, R_v+1]\), fix a \(\sigma \)-finite measure \(\mu \) on \(\mathbb {R}\) and a measurable family of finite Borel measures \(a\in A\mapsto \nu _0^a\) such that, for every \(\phi \in C(\mathbb {R}^2)\):

$$\begin{aligned} \int _{\mathbb {R}^2} \phi \big (x,v+G_0(x)\big ) \text{ d } f_0(x,v)=\int _{\mathbb {R}} \Big ( \int _\mathbb {R}\phi (x,a) \text{ d } \nu _0^a(x)\Big ) \text{ d } \mu (a). \end{aligned}$$

As already pointed out, neither \(\mu \) nor \(\nu _0^a\) needs to be a probability measure; we thus define

$$\begin{aligned} h(a):=\nu _0^a (\mathbb {R}) \end{aligned}$$

so that \(h\in L^1(\mu )\), \(\int _{\mathbb {R}} h(a) \text{ d } \mu (a)=\int _A h(a) \text{ d } \mu (a)=1\). Let us then denote by X the set consisting of all \({\varvec{\nu }}:=(\nu ^a)_{a\in A}\), \(\mu \)-measurable families of measures such that

$$\begin{aligned} \nu ^a(\mathbb {R})=h(a); \; \text{ for } \mu \text{-a.e. }\; a \quad \text{ and } \quad \int _A \int _\mathbb {R}x^2 \text{ d } \nu ^a(x) \text{ d } \mu (a)<\!+\infty . \end{aligned}$$

Given \(R>0\) [the precise choice of R will be made later on, see (4.2) below], let us denote by \(X_R\) the subset of X defined by

$$\begin{aligned} X_R:=\big \{{\varvec{\nu }}\in X \; : \; \mathrm {Supp}(\nu ^a)\subset [-R, R], \text{ for } \mu \text{-a.e. }\; a\in A\big \}. \end{aligned}$$
(3.2)

For \({\varvec{\nu }}\in X_R\), let us define the probability [because \(\int _\mathbb {R}h(a) \text{ d } \mu (a)=1\)] measure

$$\begin{aligned} {\overline{{\varvec{\nu }}}}:=\int _{\mathbb {R}} \nu ^a \text{ d } \mu (a) \end{aligned}$$

and the energy

$$\begin{aligned} J({\varvec{\nu }})=\frac{1}{4} \int _{\mathbb {R}\times \mathbb {R}} \vert x-y\vert \text{ d } {\overline{{\varvec{\nu }}}}(x) \text{ d } {\overline{{\varvec{\nu }}}}(y)+\int _A \int _{\mathbb {R}} \Big (\frac{1}{2}-a\Big ) x \text{ d } \nu ^a(x) \text{ d } \mu (a). \end{aligned}$$
(3.3)

Note that J is unbounded from below on the whole of X but it is bounded on each \(X_R\). Note also that the interaction term can be rewritten as:

$$\begin{aligned} \int _{\mathbb {R}\times \mathbb {R}} \vert x-y\vert \text{ d } {\overline{{\varvec{\nu }}}}(x) \text{ d } {\overline{{\varvec{\nu }}}}(y)= \int _{\mathbb {R}^4} \vert x-y\vert \text{ d } \nu ^a(x)\text{ d } \nu ^b(y) \text{ d } \mu (a) \text{ d } \mu (b). \end{aligned}$$
(3.4)
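When \(\mu \) is a counting measure on finitely many labels and each \(\nu ^a\) is a finite sum of atoms, the energy (3.3) reduces to finite sums through (3.4). The following minimal sketch evaluates it in that setting; the data layout (a dictionary from labels to lists of `(position, weight)` pairs) and the function name are assumptions made for illustration.

```python
def energy_J(family):
    """Energy (3.3) for a finite family: `family` maps each label a to a
    list of (x, w) atoms of nu^a, and mu is the counting measure on the labels."""
    # Atoms of the averaged measure (the sum of the nu^a over all labels)
    atoms = [(x, w) for pts in family.values() for (x, w) in pts]
    # Interaction term: double integral of |x - y| against the averaged measure
    interaction = sum(w1 * w2 * abs(x1 - x2)
                      for x1, w1 in atoms for x2, w2 in atoms)
    # Linear term: integral of (1/2 - a) x against nu^a, summed over labels
    potential = sum((0.5 - a) * x * w
                    for a, pts in family.items() for x, w in pts)
    return 0.25 * interaction + potential

# Two labels, each carrying mass 1/2 concentrated at one point
family = {0.0: [(0.0, 0.5)], 1.0: [(2.0, 0.5)]}
J_value = energy_J(family)
```

For this family the interaction term equals 1 and the linear term equals \(-1/2\), so `J_value` is \(-1/4\).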

We equip \(X_R\) with the distance d given by:

$$\begin{aligned} d^2({\varvec{\nu }}, {\varvec{\theta }}):=\int _A W_2^2(\nu ^a, \theta ^a) \text{ d } \mu (a), \; ({\varvec{\nu }}, {\varvec{\theta }})=\left( (\nu ^a)_{a\in A}, (\theta ^a)_{a\in A}\right) \in X_R\times X_R. \end{aligned}$$
(3.5)

It will also be convenient to work with the weak topology on \(X_R\) that is the one defined by the family of semi-norms

$$\begin{aligned} p_\phi ({\varvec{\nu }}):=\Big \vert \int _{A\times [-R,R]} \phi \text{ d } ({\varvec{\nu }}\otimes \mu ) \ \Big \vert , \quad \phi \in C\left( A\times [-R, R]\right) , \end{aligned}$$

where \({\varvec{\nu }}\otimes \mu \) is the probability measure defined by

$$\begin{aligned} \int _{A\times [-R,R]} \phi \text{ d } ({\varvec{\nu }}\otimes \mu ) := \int _{A} \Big (\int _{[-R,R]} \phi (a,x) \text{ d } \nu ^a (x)\Big ) \text{ d } \mu (a) \end{aligned}$$

and

$$\begin{aligned} K:=A\times [-R, R] \end{aligned}$$

so that convergence for the weak topology is nothing but weak-\(*\) convergence of \({\varvec{\nu }}\otimes \mu \). Since for all \({\varvec{\nu }}\in X_R\), \({\varvec{\nu }}\otimes \mu \) is a probability measure on the compact set \(A\times [-R, R]\), \(X_R\) is compact for the weak topology. Note also that since the weak-\(*\) topology is metrizable by the Wasserstein distance (see [18, 19]) on the set of probability measures on a compact set of \(\mathbb {R}^2\), the weak topology is metrizable by the distance \(d_w\):

$$\begin{aligned} d_w^2({\varvec{\nu }}, {\varvec{\theta }}):= W_2^2({\varvec{\nu }}\otimes \mu , {\varvec{\theta }}\otimes \mu ), \; ({\varvec{\nu }}, {\varvec{\theta }})\in X_R\times X_R, \end{aligned}$$
(3.6)

so that \((X_R, d_w)\) is a compact metric space. We summarize the basic properties of J, d and \(d_w\) in the following.

Lemma 3.1

Let \(X_R\), J, d and \(d_w\) be defined as above then we have:

  1. J is Lipschitz continuous for \(d_w\),

  2. \(d_w\le d\),

  3. d is lower semicontinuous for \(d_w\): if \(({\varvec{\nu }}_n)_n\) is a sequence in \(X_R\), \(({\varvec{\nu }}, {\varvec{\theta }})\in X_R\times X_R\) and \(\lim _n d_w({\varvec{\nu }}_n, {\varvec{\nu }})=0\) then \(\liminf _n d^2({\varvec{\nu }}_n, {\varvec{\theta }})\ge d^2({\varvec{\nu }}, {\varvec{\theta }})\).

Proof

Let us recall that if \(\theta \) and \(\nu \) are (say, compactly supported) probability measures on \(\mathbb {R}^d\) then, by the Cauchy–Schwarz inequality,

$$\begin{aligned} W_1( \nu , \theta )\le W_2(\nu , \theta ) \end{aligned}$$
(3.7)

and it follows from (3.1) that, if f is M-Lipschitz, then

$$\begin{aligned} \int _{\mathbb {R}^d} f\text{ d } (\nu -\theta )\le M W_1(\nu , \theta ). \end{aligned}$$
(3.8)

Moreover,

$$\begin{aligned} W_1(\nu \otimes \nu , \theta \otimes \theta )\le 2 W_1(\nu , \theta ). \end{aligned}$$
(3.9)
  1. Let us rewrite J as

    $$\begin{aligned} J({\varvec{\nu }})=\frac{1}{4} J_0({\varvec{\nu }})+J_1({\varvec{\nu }}), \end{aligned}$$

    with

    $$\begin{aligned} J_0({\varvec{\nu }}):=\int _{K^2} \vert x-y\vert \text{ d } ({\varvec{\nu }}\otimes \mu )(a,x) \text{ d } ({\varvec{\nu }}\otimes \mu )(b,y), \end{aligned}$$
    (3.10)

    and

    $$\begin{aligned} J_1({\varvec{\nu }}):=\int _K \Big (\frac{1}{2}-a\Big )x \text{ d } ({\varvec{\nu }}\otimes \mu )(a,x). \end{aligned}$$
    (3.11)

    The fact that \(J_1\) is Lipschitz for \(d_w\) directly follows from (3.7), (3.8) and the fact that the integrand in \(J_1\) is uniformly Lipschitz in x. As for \(J_0\), using also (3.9) and the fact that the distance is 1-Lipschitz, we have

    $$\begin{aligned} \begin{array}{ll} J_0({\varvec{\nu }})-J_0({\varvec{\theta }})\le W_1(({\varvec{\nu }}\otimes \mu ) \otimes ({\varvec{\nu }}\otimes \mu ), ({\varvec{\theta }}\otimes \mu ) \otimes ({\varvec{\theta }}\otimes \mu ))\\ \quad \le 2 W_2({\varvec{\nu }}\otimes \mu , {\varvec{\theta }}\otimes \mu )=2d_w({\varvec{\nu }}, {\varvec{\theta }}). \end{array} \end{aligned}$$
  2.

    Let \({\varvec{\nu }}=(\nu ^a)_{a\in A}\) and \({\varvec{\theta }}=(\theta ^a)_{a\in A}\) be two elements of \(X_R\) and let \(\gamma ^a\) be an optimal plan between \(\nu ^a\) and \(\theta ^a\) (which can be chosen in a \(\mu \)-measurable way, thanks to standard measurable selection arguments, see [14]). Let us then define the probability measure \(\alpha \) on \(K^2\) by

    $$\begin{aligned}&\int _{K\times K} \phi \big ((a,x), (b, y)\big ) \text{ d } \alpha (a,x, b,y)\\&\quad :=\int _A \Big (\int _{[-R,R]^2} \phi \big ((a,x), (a,y)\big ) \text{ d } \gamma ^a(x,y) \Big ) \text{ d } \mu (a) \end{aligned}$$

    for all \(\phi \in C(K\times K)\). Observing that \(\alpha \in \Pi ({\varvec{\nu }}\otimes \mu , {\varvec{\theta }}\otimes \mu )\), we get

    $$\begin{aligned} d^2_w({\varvec{\nu }}, {\varvec{\theta }})&\le \int _{K\times K} \vert x-y\vert ^2 \text{ d } \alpha (a,x, b,y) =\int _A \Big ( \int _{[-R,R]^2} \vert x-y\vert ^2 \text{ d } \gamma ^a (x,y) \Big )\text{ d } \mu (a)\\&= \int _A W_2^2 (\nu ^a, \theta ^a) \text{ d } \mu (a)=d^2({\varvec{\nu }}, {\varvec{\theta }}). \end{aligned}$$
  3.

    Let \(\gamma _n^a\) be an optimal plan (\(\mu \)-measurable with respect to a) between \(\nu _n^a\) and \(\theta ^a\). Again passing to a subsequence if necessary we may assume that \(\gamma _n^a\otimes \mu \) weakly \(*\) converges to some measure of the form \(\gamma ^a \otimes \mu \). Using test-functions of the form \(\psi (a)(\alpha (x)+\beta (y))\) we deduce easily that for \(\mu \)-almost every a, \(\gamma ^a\in \Pi (\nu ^a, \theta ^a)\) and then

    $$\begin{aligned}&\liminf _n d^2({\varvec{\nu }}_n, {\varvec{\theta }}) =\liminf _n \int _{A} \int _{[-R,R]^2} \vert x-y \vert ^2 \text{ d } \gamma _n^a (x,y) \text{ d }\mu ( a)\\&\quad = \int _A\int _{[-R,R]^2} \vert x-y \vert ^2 \text{ d } \gamma ^a (x,y) \text{ d }\mu ( a) \ge d^2({\varvec{\nu }}, {\varvec{\theta }}). \end{aligned}$$
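For the reader's convenience, here is a brief sketch of why (3.7) and (3.9) hold. The notation is ours: \(\gamma _2\) (resp. \(\gamma _1\)) denotes an optimal plan between \(\nu \) and \(\theta \) for \(W_2\) (resp. \(W_1\)), and we assume that the distance on the product space is bounded by the sum of the distances on each factor, as is the case for the Euclidean distance.

$$\begin{aligned} W_1(\nu , \theta )&\le \int \vert x-y\vert \text{ d } \gamma _2(x,y) \le \Big (\int \vert x-y\vert ^2 \text{ d } \gamma _2(x,y)\Big )^{1/2}=W_2(\nu , \theta ),\\ W_1(\nu \otimes \nu , \theta \otimes \theta )&\le \int \int \big (\vert x-y\vert +\vert x'-y'\vert \big ) \text{ d } \gamma _1(x,y) \text{ d } \gamma _1(x',y')=2W_1(\nu , \theta ). \end{aligned}$$

In the second line we used the fact that the image of \(\gamma _1\otimes \gamma _1\) by the map \(((x,y),(x',y'))\mapsto ((x,x'),(y,y'))\) is a transport plan between \(\nu \otimes \nu \) and \(\theta \otimes \theta \).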

3.2 Subdifferential of the energy and gradient flows as measure solutions

Let us start with some convexity properties of J. Let \({\varvec{\nu }}=(\nu ^a)_{a\in A}\) and \({\varvec{\theta }}\) belong to \(X_R\) and let \({\varvec{\gamma }}:=(\gamma ^a)_{a\in A}\) be a measurable family of transport plans between \(\nu ^a\) and \(\theta ^a\) [which we shall simply denote by \({\varvec{\gamma }}\in \Pi ({\varvec{\nu }}, {\varvec{\theta }})\)]. For \(\varepsilon \in [0,1]\), we then define

$$\begin{aligned} {\varvec{\nu }}_\varepsilon :=(((1-\varepsilon ) \pi _1+\varepsilon \pi _2)_\# \gamma ^a)_{a\in A}, \end{aligned}$$
(3.12)

where \(\pi _1\) and \(\pi _2\) are the canonical projections \(\pi _1(x,y)=x\), \(\pi _2(x,y)=y\). Then \(\varepsilon \in [0,1]\mapsto {\varvec{\nu }}_\varepsilon \) is a curve which interpolates between \({\varvec{\nu }}\) and \({\varvec{\theta }}\). Similarly if we take transport plans \(\gamma ^a\) induced by maps of the form \({{\mathrm{id}}}+\xi ^a\) with \({\varvec{\xi }}=(\xi ^a)_{a\in A} \in L^{\infty } ({\varvec{\nu }}\otimes \mu )\) i.e. \(\theta ^a=({{\mathrm{id}}}+\xi ^a)_\#\nu ^a\) then \(\nu _\varepsilon ^a=({{\mathrm{id}}}+\varepsilon \xi ^a)_\#\nu ^a\) and in this case, we shall simply denote \({\varvec{\xi }}:=(\xi ^a)_{a\in A}\) and \({\varvec{\nu }}_\varepsilon \) as

$$\begin{aligned} {\varvec{\nu }}_\varepsilon =({\varvec{{{\mathrm{id}}}}}+\varepsilon {\varvec{\xi }})_\#{\varvec{\nu }}, \; {\varvec{\theta }}=({\varvec{{{\mathrm{id}}}}}+ {\varvec{\xi }})_\#{\varvec{\nu }}. \end{aligned}$$

Lemma 3.2

Let \({\varvec{\nu }}\) and \({\varvec{\theta }}\) be in \(X_R\), \({\varvec{\gamma }}\in \Pi ({\varvec{\nu }}, {\varvec{\theta }})\) and \({\varvec{\nu }}_\varepsilon \) be given by (3.12). Then

$$\begin{aligned} J({\varvec{\nu }}_\varepsilon )\le (1-\varepsilon ) J({\varvec{\nu }})+\varepsilon J({\varvec{\theta }}), \; \forall \varepsilon \in [0,1]. \end{aligned}$$

In particular, the same inequality holds if \({\varvec{\nu }}_\varepsilon =({\varvec{{{\mathrm{id}}}}}+\varepsilon {\varvec{\xi }})_\#{\varvec{\nu }}\) with \({\varvec{\xi }}\in L^{\infty } ({\varvec{\nu }}\otimes \mu )\).

Proof

This immediately follows from the construction of \({\varvec{\nu }}_\varepsilon \), the convexity of the absolute value in \(J_0\) defined by (3.10) and the linearity in x of the integrand in \(J_1\) defined by (3.11). \(\square \)
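To make the previous argument explicit, one may write the two pieces of J along the interpolation as follows. This is only a sketch in our own notation: \((x,y)\) is distributed according to \(\gamma ^a\), \((x',y')\) according to \(\gamma ^b\), and \({\varvec{\nu }}\otimes \mu \) is understood as \(\text{ d } \nu ^a(x)\text{ d } \mu (a)\).

$$\begin{aligned} J_0({\varvec{\nu }}_\varepsilon )&=\int _{A^2} \int \int \big \vert (1-\varepsilon )(x-x')+\varepsilon (y-y')\big \vert \text{ d } \gamma ^a(x,y) \text{ d } \gamma ^b(x',y') \text{ d } \mu (a) \text{ d } \mu (b)\\&\le (1-\varepsilon ) J_0({\varvec{\nu }})+\varepsilon J_0({\varvec{\theta }}),\\ J_1({\varvec{\nu }}_\varepsilon )&=\int _A \int \Big (\frac{1}{2}-a\Big ) \big ((1-\varepsilon )x+\varepsilon y\big ) \text{ d } \gamma ^a(x,y) \text{ d } \mu (a) =(1-\varepsilon ) J_1({\varvec{\nu }})+\varepsilon J_1({\varvec{\theta }}). \end{aligned}$$

The inequality for \(J_0\) is the triangle inequality for the absolute value, combined with the fact that \(\gamma ^a\) has marginals \(\nu ^a\) and \(\theta ^a\); the claim for J then follows from \(J=\frac{1}{4}J_0+J_1\).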

Remark 3.3

The convexity inequality of Lemma 3.2 holds along the interpolation \({\varvec{\nu }}_\varepsilon \) given by any transport plan \(\gamma ^a\) between \(\nu ^a\) and \(\theta ^a\). It is in particular true when, in addition, \(\gamma ^a\) is required to be an optimal plan. In that case, it is easy to see that the interpolation \(\varepsilon \in [0,1]\mapsto {\varvec{\nu }}_\varepsilon \) is a geodesic between \({\varvec{\nu }}\) and \({\varvec{\theta }}\); in other words, J is convex along geodesics (but does not satisfy any strong convexity property along geodesics).

Definition 3.4

Let \({\varvec{\nu }}\in X_R\). The subdifferential of J at \({\varvec{\nu }}\), denoted \(\partial J({\varvec{\nu }})\), consists of all \({\varvec{w}}:=(w^a)_{a\in A} \in L^1({\varvec{\nu }}\otimes \mu )\) such that for every \(R'>0\), every \({\varvec{\theta }}\in X_{R'}\) and every \({\varvec{\gamma }}=(\gamma ^a)_{a\in A} \in \Pi ({\varvec{\nu }}, {\varvec{\theta }})\), one has

$$\begin{aligned} J({\varvec{\theta }})-J({\varvec{\nu }}) \ge \int _ {[-R,R]\times [-R', R']\times A} w^a (y) (z-y) \text{ d } \gamma ^a(y,z) \text{ d }\mu (a). \end{aligned}$$

Remark 3.5

An equivalent way to define \(\partial J({\varvec{\nu }})\) (which will turn out to be more convenient in the sequel to prove stability properties, see Lemma 4.4) is in terms of transition kernels rather than of transport plans. More precisely, given \({\varvec{\nu }}\in X_R\), we define the set \(T({\varvec{\nu }})\) of \({\varvec{\nu }}\otimes \mu \) measurable maps \({\varvec{\eta }}\): \((a,y)\in K\mapsto \eta ^{a,y}\in \mathcal{P}(\mathbb {R})\) such that there exists an \(R'>0\) such that \(\eta ^{a,y}\) is supported by \([-R', R']\) for \({\varvec{\nu }}\otimes \mu \) almost every \((a,y)\in K\). We then define \({\varvec{\nu }}_{{\varvec{\eta }}}=(\nu ^a_{{\varvec{\eta }}})_{a\in A}\) by

$$\begin{aligned} \int _{\mathbb {R}} \varphi (z) \text{ d } \nu ^a_{{\varvec{\eta }}}(z):=\int _{\mathbb {R}^2} \varphi (z) \text{ d } \eta ^{a,y} (z) \text{ d } \nu ^a (y), \quad \forall \varphi \in C(\mathbb {R}). \end{aligned}$$

By construction, \({\varvec{\gamma }}=(\gamma ^a)_{a\in A}\) with \(\gamma ^a =\nu ^a \otimes \eta ^{a,y}\) defined by

$$\begin{aligned} \int _{\mathbb {R}^2} \varphi (y,z) \text{ d } \gamma ^a(y,z):=\int _{\mathbb {R}^2} \varphi (y,z) \text{ d } \eta ^{a,y} (z) \text{ d } \nu ^a (y), \quad \forall \varphi \in C(\mathbb {R}^2) \end{aligned}$$

belongs to \(\Pi ({\varvec{\nu }}, {\varvec{\nu }}_{{\varvec{\eta }}})\) and, thanks to the disintegration theorem, it is then easy to check that \({\varvec{w}}\in \partial J({\varvec{\nu }})\) if and only if, for every \({\varvec{\eta }}\in T({\varvec{\nu }})\), one has

$$\begin{aligned} J({\varvec{\nu }}_{\varvec{\eta }})-J({\varvec{\nu }})\ge \int _{\mathbb {R}^3} w^a (y)(z-y) \text{ d } \eta ^{a,y} (z) \text{ d } \nu ^a (y) \text{ d } \mu (a). \end{aligned}$$
(3.13)

Remark 3.6

If we restrict ourselves to transport maps [i.e. take \(\eta ^{a,y}=\delta _{y+\xi ^a(y)}\) in (3.13)], we obtain a condition which is weaker than Definition 3.4 but somewhat easier to handle: if \({\varvec{w}}:=(w^a)_{a\in A} \in \partial J({\varvec{\nu }})\), then for every \({\varvec{\xi }}=(\xi ^a)_{a\in A} \in L^{\infty } ({\varvec{\nu }}\otimes \mu )\), one has

$$\begin{aligned} J\big (({\varvec{{{\mathrm{id}}}}}+ {\varvec{\xi }})_\#{\varvec{\nu }}\big )-J({\varvec{\nu }}) \ge \int _K {\varvec{w}}{\varvec{\xi }}\text{ d }({\varvec{\nu }}\otimes \mu )=\int _K w^a (x) \xi ^a (x) \text{ d } \nu ^a(x) \text{ d }\mu (a). \end{aligned}$$
(3.14)

Remark 3.7

The subdifferential \(\partial J\) obviously has the following monotonicity property (which will be crucial for uniqueness, see Sect. 5): if \({\varvec{\nu }}_1\) and \({\varvec{\nu }}_2\) belong to \(X_R\) and \({\varvec{w}}_1\in \partial J({\varvec{\nu }}_1)\) and \({\varvec{w}}_2\in \partial J({\varvec{\nu }}_2)\), then for every \({\varvec{\gamma }}\in \Pi ({\varvec{\nu }}_1, {\varvec{\nu }}_2)\), one has

$$\begin{aligned} \int _{\mathbb {R}^3} (w_1^a(y)-w_2^a(z))(y-z) \text{ d } \gamma ^a(y,z) \text{ d } \mu (a)\ge 0. \end{aligned}$$
(3.15)
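For completeness, here is a short sketch of why (3.15) holds. We apply Definition 3.4 twice, once to \({\varvec{w}}_1\) with the plan \({\varvec{\gamma }}\in \Pi ({\varvec{\nu }}_1, {\varvec{\nu }}_2)\), and once to \({\varvec{w}}_2\) with the transposed plan, i.e. the image of \(\gamma ^a\) by \((y,z)\mapsto (z,y)\), which belongs to \(\Pi ({\varvec{\nu }}_2, {\varvec{\nu }}_1)\):

$$\begin{aligned} J({\varvec{\nu }}_2)-J({\varvec{\nu }}_1)&\ge \int _{\mathbb {R}^3} w_1^a(y)(z-y) \text{ d } \gamma ^a(y,z) \text{ d } \mu (a),\\ J({\varvec{\nu }}_1)-J({\varvec{\nu }}_2)&\ge \int _{\mathbb {R}^3} w_2^a(z)(y-z) \text{ d } \gamma ^a(y,z) \text{ d } \mu (a). \end{aligned}$$

Summing the two inequalities gives \(0\ge \int _{\mathbb {R}^3} (w_1^a(y)-w_2^a(z))(z-y) \text{ d } \gamma ^a(y,z) \text{ d } \mu (a)\), which is exactly (3.15).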

The connection between the subdifferential [in fact the weak condition (3.14)] of the energy J given by (3.3) and the condition (2.17) is clarified by the following:

Proposition 3.8

Let \({\varvec{\nu }}\in X_R\), if \({\varvec{w}}\in \partial J({\varvec{\nu }})\) then, defining the x-marginal of \({\varvec{\nu }}\otimes \mu \) by

$$\begin{aligned} \rho :=\int _A \nu ^a \text{ d } \mu (a) \end{aligned}$$

and its cumulative distribution function by

$$\begin{aligned} G(x):=\rho \big ((-\infty , x]\big ),\; G^{-}(x)=\rho \big ((-\infty , x)\big ), \quad \forall x\in \mathbb {R}, \end{aligned}$$

we have

$$\begin{aligned} w^a(x)\in [G^{-}(x)-a, G(x)-a] \quad \text{ for } {\varvec{\nu }}\otimes \mu \text{ a.e. } (a,x). \end{aligned}$$
(3.16)

In particular \({\varvec{w}}\in L^{\infty } ({\varvec{\nu }}\otimes \mu )\) with

$$\begin{aligned} \Vert {\varvec{w}}\Vert _{L^{\infty } ({\varvec{\nu }}\otimes \mu )} \le R_v+2. \end{aligned}$$
(3.17)

Proof

Let \({\varvec{\xi }}\in L^{\infty } ({\varvec{\nu }}\otimes \mu )\) and define \({\varvec{\nu }}_\varepsilon :=({\varvec{{{\mathrm{id}}}}}+\varepsilon {\varvec{\xi }})_\# {\varvec{\nu }}\) for \(\varepsilon \in [0,1]\). Since \({\varvec{w}}\in \partial J({\varvec{\nu }})\) we have in particular

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+} \frac{1}{\varepsilon } \big (J({\varvec{\nu }}_\varepsilon )-J({\varvec{\nu }})\big ) \ge \int _K {\varvec{w}}{\varvec{\xi }}\text{ d }({\varvec{\nu }}\otimes \mu )= \int _K w^a (x) \xi ^a(x) \text{ d } \nu ^a(x) \text{ d } \mu (a). \end{aligned}$$
(3.18)

Defining \(J_0\) and \(J_1\) as in (3.10) and (3.11) and \(K:=A\times [-R, R]\), we first have

$$\begin{aligned} \frac{1}{\varepsilon } \big (J_1({\varvec{\nu }}_\varepsilon )-J_1({\varvec{\nu }})\big ) =I_0:= \int _K \Big (\frac{1}{2}-a\Big ) \xi ^a(x) \text{ d } \nu ^a(x) \text{ d } \mu (a). \end{aligned}$$
(3.19)

We then write

$$\begin{aligned} \frac{1}{\varepsilon } \big (J_0({\varvec{\nu }}_\varepsilon )-J_0({\varvec{\nu }})\big ) = \int _{K\times K} \eta _\varepsilon (a,b,x, y) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,y) \end{aligned}$$
(3.20)

with

$$\begin{aligned} \eta _\varepsilon (a,b,x, y) =\frac{1}{\varepsilon } \Big ( \big \vert x +\varepsilon \xi ^a (x)-\big (y+\varepsilon \xi ^b(y)\big )\big \vert - \big \vert x-y\big \vert \Big ). \end{aligned}$$
(3.21)

Observing that \(\eta _\varepsilon \) is bounded by \(2 \Vert {\varvec{\xi }}\Vert _{L^{\infty } ({\varvec{\nu }}\otimes \mu )}\) and that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+} \eta _\varepsilon (a,b,x, y)= {\left\{ \begin{array}{ll} {{\mathrm{sign}}}(x-y) \left( \xi ^a(x)-\xi ^b(y)\right) , &{} \text{ if } x\ne y \\ \vert \xi ^a(x)-\xi ^b(y)\vert , &{} \text{ if } x=y, \end{array}\right. } \end{aligned}$$
(3.22)

by Lebesgue’s dominated convergence theorem, we get

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+} \frac{1}{\varepsilon } \bigg (J({\varvec{\nu }}_\varepsilon )-J({\varvec{\nu }})\bigg )=I_0+I_1+I_2 \end{aligned}$$
(3.23)

with \(I_0\) given by (3.19), and

$$\begin{aligned} I_1=\frac{1}{4} \int _{K\times K} {\mathbf {1}}_{x\ne y}{{\mathrm{sign}}}(x-y) \left( \xi ^a(x)-\xi ^b(y)\right) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,y)\qquad \end{aligned}$$
(3.24)

and

$$\begin{aligned} I_2=\frac{1}{4} \int _{K\times K} {\mathbf {1}}_{x=y} \left| \xi ^a(x)-\xi ^b(x)\right| \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,y). \end{aligned}$$
(3.25)

To compute \(I_1\) we observe that thanks to Fubini’s theorem

$$\begin{aligned}&\frac{1}{4} \int _{K\times K} {\mathbf {1}}_{x>y} (\xi ^a(x)-\xi ^b(y)) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,y) \\&\quad = \frac{1}{4} \int _{K} \xi ^a(x) G^{-}(x) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) -\frac{1}{4} \int _{K} \xi ^b(y) (1-G(y)) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,y)\\&\quad = \frac{1}{4} \int _{K} \xi ^a(x) (G^-(x)+G(x)-1) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x). \end{aligned}$$

Treating similarly the integral on \(\{x<y\}\) we thus get

$$\begin{aligned} I_1= \int _{K} \Big (\frac{G^-(x)+G(x)}{2}-\frac{1}{2}\Big ) \xi ^a(x) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x). \end{aligned}$$
(3.26)

As for \(I_2\), we have

$$\begin{aligned} I_2\le \frac{1}{4} \int _{A\times A} \Big ( \int _{[-R, R]} \Big ( \vert \xi ^a(x)\vert +\vert \xi ^b(x)\vert \Big ) \nu ^b(\{x\}) \text{ d } \nu ^a(x) \Big ) \text{ d } \mu (a) \text{ d } \mu (b), \end{aligned}$$
(3.27)

then we use Fubini’s theorem to get

$$\begin{aligned}&\int _{A\times A} \Big ( \int _{[-R, R]} \vert \xi ^a(x)\vert \nu ^b(\{x\}) \text{ d } \nu ^a(x) \Big ) \text{ d } \mu (a) \text{ d } \mu (b)\\&\quad =\int _K \vert \xi ^a (x)\vert (G(x)-G^-(x)) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x). \end{aligned}$$

Note that in the previous integral, the integration with respect to x is actually a discrete sum, because the set of atoms where \(G>G^{-}\) is at most countable since G is nondecreasing; let us denote this set by

$$\begin{aligned} S:=\{x\in [-R, R] \; : \; G(x)-G^{-}(x)>0\}=\{x_i\}_{i\in I}, \end{aligned}$$

where I is at most countable. Similarly for the second term in the right hand side of (3.27) observing that \(\vert \xi ^b(x)\vert \int _A \nu ^b(\{x\}) \text{ d } \mu (b) \le \Vert {\varvec{\xi }}\Vert _{L^{\infty } ({\varvec{\nu }}\otimes \mu )} (G(x)-G^{-}(x))\), we only have to integrate in x over S which gives

$$\begin{aligned}&\int _{A\times A} \Big ( \int _{[-R, R]} \big \vert \xi ^b(x)\big \vert \nu ^b\big (\{x\}\big ) \text{ d } \nu ^a(x) \Big ) \text{ d } \mu (a) \text{ d } \mu (b)\\&\quad =\int _{A\times A} \Big ( \sum _{i\in I} \vert \xi ^b(x_i)\vert \nu ^b(\{x_i\}) \nu ^a(\{x_i\}) \Big ) \text{ d } \mu (a) \text{ d } \mu (b)\\&\quad =\int _{A} \Big ( \sum _{i\in I} \vert \xi ^b(x_i)\vert \nu ^b(\{x_i\}) (G(x_i)-G^{-}(x_i)) \Big ) \text{ d } \mu (b)\\&\quad =\int _K \big \vert \xi ^b (x)\big \vert \big (G(x)-G^-(x)\big ) \text{ d }( {\varvec{\nu }}\otimes \mu )(b,x), \end{aligned}$$

so that

$$\begin{aligned} I_2\le \frac{1}{2} \int _K \vert \xi ^a (x)\vert (G(x)-G^-(x)) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x). \end{aligned}$$
(3.28)

Putting together (3.18), (3.19), (3.23), (3.26) and (3.28) we arrive at the inequality

$$\begin{aligned}&\int _K \Big (w^a (x)+a-\frac{1}{2}(G(x)+G^-(x))\Big ) \xi ^a(x) \text{ d } ({\varvec{\nu }}\otimes \mu )(a,x)\\&\quad \le \frac{1}{2} \int _K \vert \xi ^a (x)\vert (G(x)-G^-(x)) \text{ d }( {\varvec{\nu }}\otimes \mu )(a,x) \end{aligned}$$

which holds for any \({\varvec{\xi }}\in L^{\infty } ({\varvec{\nu }}\otimes \mu )\) and (3.16) obviously follows. \(\square \)
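The last step can be spelled out as follows. This is a sketch in our own notation: we set \(g^a(x):=w^a(x)+a-\frac{1}{2}(G(x)+G^-(x))\) and \(\sigma (x):=\frac{1}{2}(G(x)-G^-(x))\ge 0\), so that the final inequality of the proof reads \(\int _K g\, \xi \text{ d }({\varvec{\nu }}\otimes \mu )\le \int _K \sigma \vert \xi \vert \text{ d }({\varvec{\nu }}\otimes \mu )\) for every bounded \({\varvec{\xi }}\). Testing with two indicator functions gives

$$\begin{aligned} \xi ={\mathbf {1}}_{\{g>\sigma \}}&: \quad \int _{\{g>\sigma \}} (g-\sigma ) \text{ d }({\varvec{\nu }}\otimes \mu )\le 0 \quad \Rightarrow \quad g\le \sigma \quad {\varvec{\nu }}\otimes \mu \text{-a.e.},\\ \xi =-{\mathbf {1}}_{\{g<-\sigma \}}&: \quad \int _{\{g<-\sigma \}} (-g-\sigma ) \text{ d }({\varvec{\nu }}\otimes \mu )\le 0 \quad \Rightarrow \quad g\ge -\sigma \quad {\varvec{\nu }}\otimes \mu \text{-a.e.} \end{aligned}$$

Hence \(-\sigma \le g\le \sigma \), which is (3.16). Since G and \(G^-\) take values in [0, 1], this yields \(\vert w^a(x)\vert \le \vert a\vert +1\), and (3.17) follows provided \(\vert a\vert \le R_v+1\) for every \(a\in A\).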

Definition 3.9

A gradient flow of J on the time interval [0, T] starting from \({\varvec{\nu }}_0\) is a Lipschitz continuous (for d) curve \(t\in [0,T]\mapsto {\varvec{\nu }}(t)=(\nu (t)^a)_{a\in A} \in X_R\) together with a measurable map \(t\in [0,T]\mapsto {\varvec{v}}(t) \in L^1({\varvec{\nu }}(t)\otimes \mu )\) such that \({\varvec{v}}(t) \in -\partial J({\varvec{\nu }}(t))\) for almost every \(t\in [0,T]\), and for \(\mu \)-almost every \(a\in A\), \(t\mapsto \nu (t)^a\) is a solution in the sense of distributions of the continuity equation (2.16).

It then follows from Proposition 3.8 that gradient flows starting from \({\varvec{\nu }}_0\) are measure solutions of the system (2.16)–(2.18). Note also that thanks to the bound (3.17), gradient flows are not only absolutely continuous but automatically Lipschitz for d and even more is true: for \(\mu \)-almost every a, the curve \(t\mapsto \nu _t^a\) is Lipschitz for \(W_2\), more precisely

$$\begin{aligned} W_2(\nu _t^a, \nu _s^a)\le \vert t-s\vert (R_v+2) h(a)^{1/2} \text{ hence } d({\varvec{\nu }}(t), {\varvec{\nu }}(s))\le \vert t-s\vert (R_v+2). \end{aligned}$$
(3.29)

4 Existence by the JKO scheme

We will prove existence of a gradient flow curve on the time interval [0, T] starting from \({\varvec{\nu }}_0=(\nu _0^a)_{a\in A}\) by considering the JKO scheme. Given a time step \(\tau >0\), starting from \({\varvec{\nu }}_0\), we construct inductively a sequence \({\varvec{\nu }}_k\) by

$$\begin{aligned} {\varvec{\nu }}_{k+1}\in {{\mathrm{argmin}}}_{{\varvec{\nu }}\in X} \Big \{\frac{1}{2\tau } d^2({\varvec{\nu }}, {\varvec{\nu }}_k)+J({\varvec{\nu }})\Big \} \end{aligned}$$
(4.1)

for \(k=0, \cdots , N\) with \(N:=[\frac{T}{\tau }]\).
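To fix ideas, one step of the scheme can be computed by hand in a toy situation. The following sketch rests on assumptions that are ours and are made only for illustration: A reduces to a single label a, \(\mu =\delta _a\), \(\nu _k^a=\delta _{x_k}\), all measures are probability measures, and \(J=\frac{1}{4}J_0+J_1\) with \(J_0\), \(J_1\) as in (3.10) and (3.11). For any probability measure \(\nu \) on \(\mathbb {R}\) one then has

$$\begin{aligned} \frac{1}{2\tau } W_2^2(\nu , \delta _{x_k})+J(\nu )&=\int _{\mathbb {R}} \Big (\frac{1}{2\tau } \vert x-x_k\vert ^2+\Big (\frac{1}{2}-a\Big )x\Big ) \text{ d } \nu (x) +\frac{1}{4}\int _{\mathbb {R}^2} \vert x-y\vert \text{ d } \nu (x) \text{ d } \nu (y),\\ x_{k+1}&:=x_k+\tau \Big (a-\frac{1}{2}\Big ), \qquad \nu _{k+1}^a=\delta _{x_{k+1}}. \end{aligned}$$

Indeed, the first integrand is minimized pointwise at \(x_{k+1}\), and the second term is nonnegative and vanishes on Dirac masses. In this toy case the discrete velocity \((x_{k+1}-x_k)/\tau \) equals \(a-\frac{1}{2}\), provided the support constraint is not active.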

4.1 Estimates

The first step in proving that this scheme is well-defined consists in showing that one can a priori bound the support. This is based on the following basic result (which we state in any dimension d even though, in the sequel, we will only apply it when \(d=1\)):

Lemma 4.1

Let \(R_0\), R and \(\tau \) be positive constants, let \(\nu _0\) be a probability measure on \(\mathbb {R}^d\) with support in \(B_{R_0}\) and let \(\nu \in \mathcal{P}_2(\mathbb {R}^d)\). Let P be the projection onto \(B_{R_0+\tau R}\) and define \({\hat{\nu }}:=P_\# \nu \). Then, for every \(a\in B_R\), one has

$$\begin{aligned} \frac{1}{2} W_2^2({\hat{\nu }}, \nu _0)-\tau \int _{\mathbb {R}^d} a \cdot x \text{ d } {\hat{\nu }}(x) \le \frac{1}{2} W_2^2(\nu , \nu _0)-\tau \int _{\mathbb {R}^d} a \cdot x \text{ d } \nu (x). \end{aligned}$$

Proof

Fix an optimal transport plan between \(\nu _0\) and \(\nu \) i.e. a \(\gamma \in \Pi (\nu _0, \nu )\) such that \(W_2^2(\nu , \nu _0)=\int _{\mathbb {R}^d\times \mathbb {R}^d} \vert x-y \vert ^2 \text{ d } \gamma (x,y)\). Since the map \((x,y)\mapsto (x, P(y))\) pushes forward \(\gamma \) to a plan having \(\nu _0\) and \({\hat{\nu }}\) as marginals, we have

$$\begin{aligned} \frac{1}{2} W_2^2({\hat{\nu }}, \nu _0)\le & {} \frac{1}{2} \int _{\mathbb {R}^d\times \mathbb {R}^d} \vert x-P(y)\vert ^2 \text{ d } \gamma (x,y)\\ \\= & {} \frac{1}{2} W_2^2(\nu , \nu _0)-\frac{1}{2} \int _{\mathbb {R}^d\times \mathbb {R}^d} \vert y-P(y)\vert ^2 \text{ d } \gamma (x,y)\\ \\&+ \int _{\mathbb {R}^d\times \mathbb {R}^d} (y-P(y)) \cdot (x-P(y)) \text{ d } \gamma (x,y) \end{aligned}$$

and then

$$\begin{aligned} \begin{array}{ll} &{}\frac{1}{2} W_2^2({\hat{\nu }}, \nu _0)-\tau \int _{\mathbb {R}^d} a \cdot x \text{ d } {\hat{\nu }}(x) -\frac{1}{2} W_2^2(\nu , \nu _0)+\tau \int _{\mathbb {R}^d} a \cdot x \text{ d } \nu (x)\\ &{}\quad \le \int _{\mathbb {R}^d\times \mathbb {R}^d} (y-P(y)) \cdot (x+\tau a-P(y)) \text{ d } \gamma (x,y). \end{array} \end{aligned}$$

But since \(\gamma \)-a.e. \(x+\tau a \in B_{R_0+\tau R}\), we get that the integrand in the right-hand side is nonpositive by the well-known characterization of the projection onto \(B_{R_0+\tau R}\). \(\square \)
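The characterization of the projection used in the last step can be checked directly. In the sketch below, the notation \(\rho :=R_0+\tau R\) is ours, and we assume that \(B_\rho \) is the closed ball of radius \(\rho \) centered at the origin, so that \(P(y)=y\) if \(\vert y\vert \le \rho \) and \(P(y)=\rho y/\vert y\vert \) otherwise. For \(\vert y\vert >\rho \) and \(z\in B_\rho \),

$$\begin{aligned} (y-P(y))\cdot (z-P(y))&=\Big (1-\frac{\rho }{\vert y\vert }\Big ) \big (y\cdot z-\rho \vert y\vert \big ) \le \Big (1-\frac{\rho }{\vert y\vert }\Big ) \vert y\vert \big (\vert z\vert -\rho \big )\le 0,\\ \vert x+\tau a\vert&\le \vert x\vert +\tau \vert a\vert \le R_0+\tau R=\rho \quad \text{ for } \vert x\vert \le R_0, \; \vert a\vert \le R. \end{aligned}$$

The second line shows that \(z=x+\tau a\) is admissible for \(\gamma \)-a.e. (x, y), since the first marginal of \(\gamma \) is \(\nu _0\), which is supported in \(B_{R_0}\); when \(\vert y\vert \le \rho \) the integrand vanishes.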

Now consider the first step of the JKO scheme. For every \(a\in A\), \(\nu _0^a\) is supported by \([-R_x, R_x]\) and \(\vert a \vert \le R_v +1\), so that \(\vert \frac{1}{2}-a\vert \le R_v+3/2\). The previous lemma (applied with \(d=1\), \(R_0=R_x\), \(R=R_v+3/2\) and \(a-\frac{1}{2}\) in place of a) therefore implies that if one replaces \({\varvec{\nu }}=(\nu ^a)_{a\in A}\in X\) by \(\hat{{\varvec{\nu }}}=(\hat{\nu }^a)_{a\in A}\) defined for every a by \(\hat{\nu }^a=P_\#\nu ^a\), where P is the projection onto \([-R_x-\tau (R_v+3/2), R_x+\tau (R_v+3/2)]\), one has

$$\begin{aligned} \frac{1}{2} W_2^2({\hat{\nu }}^a, \nu ^a_0)+\tau \int _{\mathbb {R}^d} \left( \frac{1}{2}-a\right) \cdot x \text{ d } {\hat{\nu }}^a(x) \le \frac{1}{2} W_2^2(\nu ^a, \nu ^a_0)+\tau \int _{\mathbb {R}^d} \left( \frac{1}{2}-a\right) \cdot x \text{ d } \nu ^a(x). \end{aligned}$$

As for the interaction term, it is also improved by replacing \({\varvec{\nu }}\) by \(\hat{{\varvec{\nu }}}\); this is obvious from the expression (3.4) and the fact that P is 1-Lipschitz. In the first step of the JKO scheme, we may therefore impose the constraint that \({\varvec{\nu }}\in X_{R_x+\tau (R_v+3/2)}\). After k steps, we may similarly impose that the minimization is performed on \(X_{R_x+k \tau (R_v+3/2)}\), so simply setting

$$\begin{aligned} R=R_x+(T+\tau ) (R_v+3/2), \end{aligned}$$
(4.2)

we may replace (4.1) with a bound on the support:

$$\begin{aligned} {\varvec{\nu }}_{k+1}\in {{\mathrm{argmin}}}_{{\varvec{\nu }}\in X_R} \Big \{\frac{1}{2\tau } d^2({\varvec{\nu }}, {\varvec{\nu }}_k)+J({\varvec{\nu }})\Big \}. \end{aligned}$$
(4.3)

By a direct application of Lemma 3.1 and the compactness of \((X_R, d_w)\), we then see that the minimizing scheme (4.3) is well-defined and actually defines a sequence \({\varvec{\nu }}_k\), \(k=0,\ldots , N+1\). We also extend this sequence by piecewise constant in time interpolation:

$$\begin{aligned} {\varvec{\nu }}_{\tau }(t):={\varvec{\nu }}_k, \text{ for } t\in ((k-1)\tau , k\tau ], \; k=1, \cdots , N+1. \end{aligned}$$
(4.4)

In the following basic estimates, C will denote a constant (possibly depending on T) which may vary from one line to the other. By construction, for all \(k=0,\ldots , N\), we have

$$\begin{aligned} \frac{1}{2\tau } d^2({\varvec{\nu }}_{k+1}, {\varvec{\nu }}_k)\le J({\varvec{\nu }}_k)-J({\varvec{\nu }}_{k+1}). \end{aligned}$$
(4.5)

Summing and using the fact that every \({\varvec{\nu }}_k\) belongs to \(X_R\) and that J is bounded from below on \(X_R\) we get:

$$\begin{aligned} \frac{1}{2\tau } \sum _{k=0}^N d^2({\varvec{\nu }}_{k+1}, {\varvec{\nu }}_k)\le J({\varvec{\nu }}_0)-J({\varvec{\nu }}_{N+1})\le C. \end{aligned}$$
(4.6)

From (4.6), the Cauchy–Schwarz inequality and Lemma 3.1 we classically get a uniform Hölder estimate:

$$\begin{aligned} d_w({\varvec{\nu }}_\tau (t), {\varvec{\nu }}_\tau (s))\le d\left( {\varvec{\nu }}_\tau (t), {\varvec{\nu }}_\tau (s)\right) \le C \sqrt{ \vert t-s\vert +\tau }, \quad \forall (s,t)\in [0, T]^2. \end{aligned}$$
(4.7)
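The classical argument behind (4.7) can be sketched as follows. The indices j and k are our notation: for \(0<s\le t\le T\) we write \({\varvec{\nu }}_\tau (s)={\varvec{\nu }}_j\) and \({\varvec{\nu }}_\tau (t)={\varvec{\nu }}_k\) with \(j\le k\), and C denotes the constant in (4.6).

$$\begin{aligned} d({\varvec{\nu }}_k, {\varvec{\nu }}_j)&\le \sum _{i=j}^{k-1} d({\varvec{\nu }}_{i+1}, {\varvec{\nu }}_i) \le (k-j)^{1/2} \Big (\sum _{i=j}^{k-1} d^2({\varvec{\nu }}_{i+1}, {\varvec{\nu }}_i)\Big )^{1/2}\\&\le \big (2C\tau (k-j)\big )^{1/2} \le (2C)^{1/2} \sqrt{\vert t-s\vert +\tau }. \end{aligned}$$

The last inequality uses \((k-j)\tau \le \vert t-s\vert +\tau \), which follows from \(t>(k-1)\tau \) and \(s\le j\tau \) in (4.4); the bound for \(d_w\) then follows from \(d_w\le d\) in Lemma 3.1.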

Since \((X_R, d_w)\) is a compact metric space, it follows from a refined variant of the Ascoli–Arzelà theorem (see [3]) that there exists a limit curve

$$\begin{aligned} t\mapsto {\varvec{\nu }}(t) \text{ belonging } \text{ to } C^{0,\frac{1}{2}}\left( [0,T], (X_R, d_w)\right) \end{aligned}$$

and a vanishing sequence of time-steps \(\tau _n \rightarrow 0\) as \(n\rightarrow +\infty \) such that

$$\begin{aligned} \sup _{t\in [0,T]} d_w({\varvec{\nu }}_{\tau _n}(t), {\varvec{\nu }}(t)) \rightarrow 0 \quad \text{ as } n\rightarrow +\infty . \end{aligned}$$
(4.8)

4.2 Discrete Euler–Lagrange equation

Let \({\varvec{\gamma }}_{k+1}=(\gamma _{k+1}^a)_{a\in A} \in \Pi ({\varvec{\nu }}_{k}, {\varvec{\nu }}_{k+1})\) be such that \(\gamma ^a_{k+1}\) is an optimal plan for \(\mu \)-almost every a and let \(v_{k+1}^a\) be defined by

$$\begin{aligned} \int _{[-R,R]} \xi (y) v_{k+1}^a(y) \text{ d } \nu _{k+1}^a (y)= \int _{[-R,R]^2} \xi (y)\frac{y-x}{\tau } \text{ d } \gamma _{k+1}^a(x,y) \end{aligned}$$

for all \(\xi \in C([-R, R])\), or equivalently, disintegrating \(\gamma _{k+1}^a\) with respect to its second marginal \(\nu _{k+1}^a\) as \(\text{ d } \gamma _{k+1}^a(x,y)= \text{ d } \gamma _{k+1}^{a,y}(x) \otimes \text{ d } \nu _{k+1}^a(y)\):

$$\begin{aligned} v_{k+1}^a(y)=\frac{1}{\tau }\Big ( y-\int _{[-R,R]}x \text{ d } \gamma _{k+1}^{a,y}(x)\Big ). \end{aligned}$$
(4.9)

The Euler–Lagrange equation for (4.1) can then be written as follows:

Lemma 4.2

Let \({\varvec{\nu }}_{k+1}\) be a solution of (4.1), \({\varvec{\gamma }}_{k+1} \in \Pi ({\varvec{\nu }}_{k}, {\varvec{\nu }}_{k+1})\) and \({\varvec{v}}_{k+1}\) be constructed as above, then:

$$\begin{aligned} {\varvec{v}}_{k+1}\in -\partial J ({\varvec{\nu }}_{k+1}). \end{aligned}$$
(4.10)

Proof

Let \(R'>0\), \({\varvec{\theta }}\in X_{R'}\) and \({\varvec{\gamma }}\in \Pi ({\varvec{\nu }}_{k+1}, {\varvec{\theta }})\), and define for \(\varepsilon \in [0,1]\)

$$\begin{aligned} {\varvec{\nu }}_\varepsilon = (\nu _\varepsilon ^a)_{a\in A} \text{ with } \nu _\varepsilon ^a :=((1-\varepsilon ) \pi _1+\varepsilon \pi _2)_\#\gamma ^a. \end{aligned}$$

Then by optimality of \({\varvec{\nu }}_{k+1}\) and using Lemma 3.2, we have

$$\begin{aligned}&0\le \liminf _{\varepsilon \rightarrow 0^+} \frac{1}{\varepsilon } \Big ( \frac{1}{2\tau } (d^2({\varvec{\nu }}_\varepsilon , {\varvec{\nu }}_k)-d^2({\varvec{\nu }}_{k+1}, {\varvec{\nu }}_k)) + J({\varvec{\nu }}_\varepsilon )-J({\varvec{\nu }}_{k+1}) \Big )\\&\quad \le \liminf _{\varepsilon \rightarrow 0^+} \frac{1}{\varepsilon } \Big ( \frac{1}{2\tau } (d^2({\varvec{\nu }}_\varepsilon , {\varvec{\nu }}_k)-d^2({\varvec{\nu }}_{k+1}, {\varvec{\nu }}_k)) \Big ) +J({\varvec{\theta }})-J({\varvec{\nu }}_{k+1}). \end{aligned}$$

We have already disintegrated the optimal plan \(\gamma _{k+1}^a \) between \(\nu _k^a\) and \(\nu _{k+1}^a\) as

$$\begin{aligned} \gamma _{k+1}^a (\text{ d }x,\text{ d } y)= \gamma _{k+1}^{a,y} ( \text{ d }x) \otimes \nu _{k+1}^a (\text{ d }y). \end{aligned}$$

Let us also disintegrate the (arbitrary) plan \(\gamma ^a \) between \(\nu _{k+1}^a\) and \(\theta ^a\) as:

$$\begin{aligned} \gamma ^a (\text{ d }y,\text{ d } z)= \nu _{k+1}^a (\text{ d }y) \otimes \gamma ^{a,y} (\text{ d }z). \end{aligned}$$

Define then the 3-plan \(\beta ^a\) by \(\beta ^a =( \gamma _{k+1}^{a,y} \otimes \gamma ^{a,y} )\otimes \nu _{k+1}^a\) i.e.

$$\begin{aligned} \int _{\mathbb {R}^3} \phi (x,y,z) \text{ d } \beta ^a (x,y,z):=\int _{\mathbb {R}} \Big ( \int _{\mathbb {R}^2} \phi (x,y,z) \text{ d } \gamma _{k+1}^{a,y}(x) \text{ d } \gamma ^{a,y}(z) \Big ) \text{ d } \nu _{k+1}^a (y) \end{aligned}$$

for every \(\phi \in C(\mathbb {R}^3)\). Setting

$$\begin{aligned} \begin{array}{ll} (\pi _1(x,y,z), \pi _2(x,y,z), \pi _3(x,y,z))&{}=(x,y,z), \\ (\pi _{12}(x,y,z), \pi _{23} (x,y,z), \pi _{13}(x,y,z))&{}=( (x,y), (y,z), (x,z)), \end{array} \end{aligned}$$

we have by construction, \({\pi _{12}}_\# \beta ^a =\gamma _{k+1}^a\), \({\pi _{23}}_\#\beta ^a=\gamma ^a\). By the very definition of \(\nu _\varepsilon ^a\), we also have \((\pi _1, (1-\varepsilon )\pi _2+\varepsilon \pi _3)_\#\beta ^a \in \Pi (\nu _k^a, \nu _\varepsilon ^a)\) so that

$$\begin{aligned} W_2^2(\nu _k^a, \nu _{k+1}^a)= \int _{\mathbb {R}^3} \vert y-x\vert ^2 \text{ d } \beta ^a(x,y,z) \end{aligned}$$

and

$$\begin{aligned} W_2^2(\nu _k^a, \nu _\varepsilon ^a)\le \int _{\mathbb {R}^3} \vert (1-\varepsilon )y+\varepsilon z-x\vert ^2 \text{ d } \beta ^a(x,y,z). \end{aligned}$$

Using Lebesgue’s dominated convergence theorem and recalling the definition of \(\beta ^a\) and \(v_{k+1}^a\) we then get

$$\begin{aligned}&\liminf _{\varepsilon \rightarrow 0^+} \frac{1}{\varepsilon } \Big ( \frac{1}{2\tau } (d^2({\varvec{\nu }}_\varepsilon , {\varvec{\nu }}_k)-d^2({\varvec{\nu }}_{k+1}, {\varvec{\nu }}_k)) \Big )\\&\quad \le \int _A \Big (\int _{\mathbb {R}^3} (z-y)\cdot \frac{y-x}{\tau } \text{ d } \beta ^a (x,y,z) \Big ) \text{ d } \mu (a)\\&\quad = \int _A \Big (\int _{\mathbb {R}^2} \Big ( \int _{[-R,R]} \frac{y-x}{\tau } \text{ d } \gamma _{k+1}^{a,y}(x) \Big ) (z-y) \text{ d } \gamma ^{a,y}(z) \text{ d } \nu _{k+1}^a(y) \Big ) \text{ d } \mu (a)\\&\quad =\int _ {[-R,R]\times [-R', R']\times A} v^a_{k+1} (y) \cdot (z-y) \text{ d } \gamma ^a(y,z) \text{ d }\mu (a). \end{aligned}$$

This yields

$$\begin{aligned} J({\varvec{\theta }})-J({\varvec{\nu }}_{k+1})\ge -\int _ {[-R,R]\times [-R', R']\times A} v^a_{k+1} (y) \cdot (z-y) \text{ d } \gamma ^a(y,z) \text{ d }\mu (a) \end{aligned}$$

i.e. \({\varvec{v}}_{k+1}\in -\partial J ({\varvec{\nu }}_{k+1})\). \(\square \)
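The limit of the difference quotient of the squared distances in the proof above rests on an elementary expansion, which we record here as a sketch. Writing \((1-\varepsilon )y+\varepsilon z-x=(y-x)+\varepsilon (z-y)\), one gets, for \(\beta ^a\)-a.e. (x, y, z),

$$\begin{aligned} \vert (1-\varepsilon )y+\varepsilon z-x\vert ^2-\vert y-x\vert ^2&=2\varepsilon (z-y)(y-x)+\varepsilon ^2 \vert z-y\vert ^2,\\ \frac{1}{2\tau \varepsilon } \Big (W_2^2(\nu _k^a, \nu _\varepsilon ^a)-W_2^2(\nu _k^a, \nu _{k+1}^a)\Big )&\le \int _{\mathbb {R}^3} \Big ((z-y)\frac{y-x}{\tau }+\frac{\varepsilon }{2\tau } \vert z-y\vert ^2\Big ) \text{ d } \beta ^a(x,y,z). \end{aligned}$$

Since \(y\in [-R,R]\) and \(z\in [-R',R']\), the remainder is bounded by \(\frac{\varepsilon }{2\tau }(R+R')^2\) and vanishes as \(\varepsilon \rightarrow 0^+\), uniformly in a.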

Let us also extend \(v_{k+1}\) by piecewise constant interpolation

$$\begin{aligned} {\varvec{v}}_{\tau }(t)={\varvec{v}}_{k+1}, \; t\in \big (k\tau , (k+1)\tau \big ], \; t\in [0, T], \; {\varvec{v}}_{k+1}=(v^a_{k+1})_{a\in A}, \end{aligned}$$
(4.11)

so that, thanks to the previous Lemma, we have

$$\begin{aligned} {\varvec{v}}_\tau (t)\in -\partial J({\varvec{\nu }}_\tau (t)), \; t\in [0, T]. \end{aligned}$$
(4.12)

Thanks to Proposition 3.8, note that \(\sup _{t\in [0,T]} \Vert {\varvec{v}}_{\tau }(t)\Vert _{L^{\infty }({\varvec{\nu }}_\tau (t)\otimes \mu )} \le C\); we can then define the time-dependent-family of signed measures

$$\begin{aligned} \text{ d }{\varvec{q}}_\tau (t)= {\varvec{v}}_\tau (t) \text{ d } {\varvec{\nu }}_{\tau }(t), \text{ i.e. } \text{ d } q_\tau (t)^a= v_\tau (t)^a \text{ d } \nu _{\tau }(t)^a. \end{aligned}$$

Denoting by \(\lambda \) the one-dimensional Lebesgue measure on [0, T], we may assume, taking a subsequence if necessary, that the bounded family of measures \({\varvec{q}}_{\tau _n} \otimes \mu \otimes \lambda \) converges weakly \(*\) to some bounded signed measure on \([-R,R]\times A\times [0,T]\) which is necessarily of the form \({\varvec{q}}\otimes \mu \otimes \lambda \) because marginals (with respect to the a and t variables) are stable under weak limits. Since \(\vert {\varvec{q}}_{\tau _n}\vert \otimes \mu \otimes \lambda \le C {\varvec{\nu }}_{\tau _n} \otimes \mu \otimes \lambda \) and \({\varvec{\nu }}_{\tau _n}\otimes \mu \) converges weakly \(*\) to \({\varvec{\nu }}\otimes \mu \), we have \(\vert {\varvec{q}}\vert \otimes \mu \otimes \lambda \le C {\varvec{\nu }}\otimes \mu \otimes \lambda \). Hence, for \(\mu \otimes \lambda \) a.e. (a, t), the limit satisfies \(\vert q(t)^a \vert \le C \nu (t)^a\) and therefore can be written in the form \(\text{ d } q(t)^a=v(t)^a \text{ d } \nu (t)^a\) (\({\varvec{q}}= {\varvec{v}}{\varvec{\nu }}\) for short) with \(\Vert {\varvec{v}}(t)\Vert _{L^{\infty }({\varvec{\nu }}(t)\otimes \mu )} \le C\) for \(\lambda \)-a.e. \(t\in [0,T]\). We thus have

$$\begin{aligned} {\varvec{q}}_{\tau _n} \otimes \mu \otimes \lambda = ({\varvec{v}}_{\tau _n} {\varvec{\nu }}_{\tau _n}) \otimes \mu \otimes \lambda \mathop {\rightharpoonup }\limits ^{*}{\varvec{q}}\otimes \mu \otimes \lambda = ({\varvec{v}}{\varvec{\nu }}) \otimes \mu \otimes \lambda \text{ as } n\rightarrow +\infty .\nonumber \\ \end{aligned}$$
(4.13)

In other words, for every \(\phi \in C([0,T]\times A\times [-R, R])\) we have

$$\begin{aligned} \begin{array}{ll} &{}\lim _n\int _0^T \int _{A} \Big (\int _{[-R, R]} \phi (t,a,x) v_{\tau _n}(t)^a(x) \text{ d } \nu _{\tau _n}(t)^a(x) \Big ) \text{ d } \mu (a) \text{ d }t\\ &{}\quad =\int _0^T \int _{A} \Big (\int _{[-R, R]} \phi (t,a,x) v(t)^a(x) \text{ d } \nu (t)^a(x) \Big ) \text{ d } \mu (a) \text{ d }t. \end{array} \end{aligned}$$

4.3 Existence by passing to the limit

Our task now consists in showing that the limit curve \(t\mapsto {\varvec{\nu }}(t)\) is a gradient flow solution associated to the velocity \(t\mapsto {\varvec{v}}(t)\) constructed above. Let us first check that it satisfies the system of continuity equations (2.16). To do so, take test functions \(\psi \in C(A)\) and \(\phi \in C^2([0,T]\times [-R,R])\) and let us consider

$$\begin{aligned} \begin{array}{ll} &{}\displaystyle \int _0^{N\tau } \Big ( \int _{K} \psi (a) \partial _t \phi (t,x) \text{ d } \nu _{\tau }(t)^a(x) \text{ d } \mu (a) \Big ) \text{ d } t\\ &{}\quad =\int _A \psi (a) \Big ( \sum _{k=0}^{N-1} \int _{-R}^R (\phi ((k+1)\tau , x)-\phi (k\tau ,x)) \text{ d } \nu _{k+1}^a(x) \Big ) \text{ d } \mu (a). \end{array} \end{aligned}$$

Then, we rewrite

$$\begin{aligned} \begin{array}{ll} &{}\sum _{k=0}^{N-1} \int _{-R}^R (\phi ((k+1)\tau , x)-\phi (k\tau ,x)) \text{ d } \nu _{k+1}^a(x)\\ &{}\quad =\sum _{k=1}^{N-1} \int _{-R}^R \phi (k\tau ,x) \text{ d } (\nu _k^a- \nu _{k+1}^a)(x)\\ &{}\quad \quad + \int _{-R}^R \phi (N\tau , x) \text{ d } \nu _{N}^a(x)- \int _{-R}^R \phi (0, x) \text{ d } \nu _1^a(x). \end{array} \end{aligned}$$
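This rewriting is a discrete integration by parts in time, applied to the sum over \(k=0,\ldots ,N-1\) coming from the time integral over \([0, N\tau ]\). As a sketch, with the shorthand (ours) \(m_k^j:=\int _{-R}^R \phi (k\tau , x) \text{ d } \nu _j^a(x)\), shifting the index in the first sum gives

$$\begin{aligned} \sum _{k=0}^{N-1} \big (m_{k+1}^{k+1}-m_k^{k+1}\big )&=\sum _{k=1}^{N} m_k^k-\sum _{k=0}^{N-1} m_k^{k+1}\\&=\sum _{k=1}^{N-1} \big (m_k^k-m_k^{k+1}\big )+m_N^N-m_0^1. \end{aligned}$$

The terms \(m_k^k-m_k^{k+1}\) are the integrals of \(\phi (k\tau , \cdot )\) against \(\nu _k^a-\nu _{k+1}^a\), while \(m_N^N\) and \(m_0^1\) are the two boundary terms.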

Using the optimal plans \(\gamma ^a_{k+1}\) as in Lemma 4.2, we then rewrite

$$\begin{aligned} \int _{-R}^R \phi (k\tau ,x) \text{ d } (\nu _k^a- \nu _{k+1}^a)(x)=\int _{-R}^R \int _{-R}^R \big (\phi (k\tau ,x)-\phi (k\tau ,y)\big ) \text{ d } \gamma _{k+1}^a(x,y). \end{aligned}$$

A Taylor expansion gives

$$\begin{aligned} \phi (k\tau ,x)-\phi (k\tau ,y)= & {} \partial _x \phi (k\tau , y)(x-y)+ l_k(\tau , a, x,y), \\ \vert l_k(\tau , a, x,y)\vert\le & {} \Vert \partial _{xx} \phi \Vert _{\infty } \vert x-y\vert ^2. \end{aligned}$$
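For completeness, the bound on the remainder can be recovered from Taylor's formula with integral remainder, using that \(\phi \) is \(C^2\). The computation below is a standard sketch; it even yields the constant 1/2, which is not needed in what follows.

$$\begin{aligned} l_k(\tau , a, x,y)&= (x-y)^2 \int _0^1 (1-s)\, \partial _{xx} \phi (k\tau , y+s(x-y)) \text{ d } s,\\ \vert l_k(\tau , a, x,y)\vert&\le \Vert \partial _{xx} \phi \Vert _{\infty } \vert x-y\vert ^2 \int _0^1 (1-s) \text{ d } s =\frac{1}{2} \Vert \partial _{xx} \phi \Vert _{\infty } \vert x-y\vert ^2. \end{aligned}$$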

Integrating and using the optimality of \(\gamma ^a_{k+1}\) gives

$$\begin{aligned} l_k(\tau , a):= \int _{-R}^R \int _{-R}^R \vert l_k(\tau , a, x,y)\vert \text{ d } \gamma _{k+1}^a(x,y) \le \Vert \partial _{xx} \phi \Vert _{\infty } W_2^2(\nu _k^a, \nu _{k+1}^a) \end{aligned}$$

and then, recalling (4.6) we have

$$\begin{aligned} \int _A \psi (a) \sum _{k=1}^{N-1} l_k(\tau , a) \text{ d } \mu (a) \le C \tau \Vert \partial _{xx} \phi \Vert _{\infty } \Vert \psi \Vert _{\infty }. \end{aligned}$$
(4.14)

Recalling the definition of the discrete velocity \(v_{k+1}\) from Lemma 4.2, we can rewrite

$$\begin{aligned} \int _{-R}^R \int _{-R}^R \partial _x \phi (k\tau , y)(x-y) \text{ d } \gamma _{k+1}^a(x,y)=-\tau \int _{-R}^R \partial _x \phi (k\tau , x) v_{k+1}^a(x) \text{ d } \nu ^a_{k+1}(x), \end{aligned}$$

hence by definition of \({\varvec{\nu }}_\tau \) and \({\varvec{v}}_\tau \)

$$\begin{aligned}&\int _A \psi (a) \Big (\sum _{k=1}^{N-1} \int _{-R}^R \int _{-R}^R \partial _x \phi (k\tau , y)(x-y) \text{ d } \gamma _{k+1}^a(x,y)\Big ) \text{ d } \mu (a)\\&\quad =- \int _0^T \int _K \psi (a) \partial _x \phi (t,x) v_\tau (t)^a \text{ d }\nu _{\tau }(t)^a(x) \text{ d } \mu (a) \text{ d }t +O(\tau ). \end{aligned}$$

Now thanks to (4.8), we have

$$\begin{aligned} \lim _n \int _A \psi (a) \Big ( \int _{-R}^R \phi (N\tau _n, x) \text{ d } \nu _{N}^a(x) \Big ) \text{ d } \mu (a)=\int _A \psi (a) \Big ( \int _{-R}^R \phi (T, x) \text{ d } \nu (T)^a(x) \Big ) \text{ d } \mu (a)\nonumber \\ \end{aligned}$$
(4.15)

and

$$\begin{aligned} \lim _n \int _A \psi (a) \Big (\int _{-R}^R \phi (0, x) \text{ d } \nu _1^a(x) \Big ) \text{ d } \mu (a)=\int _A \psi (a) \Big ( \int _{-R}^R \phi (0, x) \text{ d } \nu _{0}^a(x) \Big ) \text{ d } \mu (a),\nonumber \\ \end{aligned}$$
(4.16)

where we use in the above limits that \(\nu _{N}^a=\nu ^a_{\tau _n}(N\tau _n)\) and \(\nu ^a_1=\nu ^a_{\tau _n}(\tau _n)\). Putting the previous computations together, summing and using (4.14), (4.15) and (4.16), we thus obtain

$$\begin{aligned}&\int _0^{N\tau } \Big ( \int _{K} \psi (a) \partial _t \phi (t,x) \text{ d } \nu _{\tau }(t)^a(x) \text{ d } \mu (a) \Big ) \text{ d } t\\&\quad =- \int _0^T \int _K \psi (a) \partial _x \phi (t,x) v_\tau (t)^a \text{ d }\nu _{\tau }(t)^a(x) \text{ d } \mu (a) \text{ d }t \\&\qquad + \int _A \psi (a) \Big ( \int _{-R}^R \phi (T, x) \text{ d } \nu (T)^a(x) \Big ) \text{ d } \mu (a)\\&\qquad -\int _A \psi (a) \Big ( \int _{-R}^R \phi (0, x) \text{ d } \nu _{0}^a(x) \Big ) \text{ d } \mu (a)+\varepsilon _\tau , \end{aligned}$$

where \(\varepsilon _{\tau _n}\) goes to 0 as \(n\rightarrow +\infty \). Taking \(\tau =\tau _n\), using (4.8) and (4.13) and letting \(n\rightarrow +\infty \) in the previous identity we get

$$\begin{aligned}&\int _{A} \psi (a) \Big (\int _0^T \int _{-R}^R (\partial _t \phi (t,x)+\partial _x \phi (t,x) v(t)^a(x) ) \text{ d } \nu (t)^a(x) \text{ d } t\Big ) \text{ d } \mu (a)\\&\quad = \int _{A} \psi (a) \Big ( \int _{-R}^R \phi (T,x) \text{ d } \nu (T)^a(x)- \int _{-R}^R \phi (0,x) \text{ d } \nu _0^a(x) \Big ) \text{ d } \mu (a). \end{aligned}$$

In other words, we have proved the following:

Lemma 4.3

For \(\mu \)-almost every a, the limit curve \(t\mapsto \nu (t)^a\) solves the continuity equation (2.16) associated to the limit velocity \(t\mapsto v(t)^a\).

It remains to check that

Lemma 4.4

For a.e. \(t\in [0,T]\), we have \({\varvec{v}}(t)\in -\partial J({\varvec{\nu }}(t))\).

Proof

By construction of the curves \({\varvec{v}}_\tau \) and \({\varvec{\nu }}_\tau \) and thanks to Lemma 4.2, we have seen in (4.12) that

$$\begin{aligned} {\varvec{v}}_\tau (t)\in -\partial J({\varvec{\nu }}_\tau (t)), \; \forall t\in [0,T] \end{aligned}$$

which means that for every \(\tau >0\), every \(t\in [0,T]\) and every \({\varvec{\eta }}\in T({\varvec{\nu }}_\tau (t))\) (as defined in Remark 3.5), we have

$$\begin{aligned} J({{\varvec{\nu }}_\tau (t)}_{\varvec{\eta }})-J({\varvec{\nu }}_\tau (t))\ge -\int _{A\times \mathbb {R}^2} v_\tau ^a(t)(y)(z-y) \text{ d } \eta ^{a, y} (z) \text{ d } \nu _\tau (t)^a(y) \text{ d } \mu (a). \end{aligned}$$
(4.17)

We wish to prove that there exists \(S\subset [0,T]\), \(\lambda \)-negligible, such that for every \(t\in [0,T]\setminus S\) and every \({\varvec{\eta }}\in T({\varvec{\nu }}(t))\), one has

$$\begin{aligned} J({{\varvec{\nu }}(t)}_{\varvec{\eta }})-J({\varvec{\nu }}(t))\ge -\int _{A\times \mathbb {R}^2} v^a(t)(y)(z-y) \text{ d } \eta ^{a, y} (z) \text{ d } \nu (t)^a(y) \text{ d } \mu (a). \end{aligned}$$
(4.18)

To pass to the limit \(\tau =\tau _n\), \(n\rightarrow \infty \) in (4.17) to obtain (4.18), we shall proceed in several steps. Let us remark that it is enough to prove (4.18) when \(\eta ^{a,y}\) is supported by a fixed compact interval \([-R', R']\) (and then to take an exhaustive sequence of such compact intervals). Let us also recall that, thanks to Lemma 3.1 and (4.8), \(J({\varvec{\nu }}_{\tau _n}(t))\) converges to \(J({\varvec{\nu }}(t))\) as \(n\rightarrow \infty \) uniformly on [0, T].

Step 1: Let us first consider the case where \({\varvec{\eta }}\) is continuous in the sense that \((a,y)\in K \mapsto \int _{[-R', R']} \varphi (z) \text{ d } \eta ^{a,y}(z)\) is continuous for every \(\varphi \in C(\mathbb {R})\). Let \(\phi \in C(A\times \mathbb {R})\). Since \(\varphi _{\varvec{\eta }}\) defined by \(\varphi _{\varvec{\eta }}(a,y):=\int \phi (a,z) \text{ d } \eta ^{a,y}(z)\) belongs to C(K), using the fact that

$$\begin{aligned} \begin{array}{ll} \langle \phi , {\varvec{\nu }}_{\tau _n}(t)_{\varvec{\eta }}\otimes \mu \rangle &{}=\langle \varphi _{\varvec{\eta }}, {\varvec{\nu }}_{\tau _n}(t) \otimes \mu \rangle ,\\ \langle \phi , {\varvec{\nu }}(t)_{\varvec{\eta }}\otimes \mu \rangle &{}=\langle \varphi _{\varvec{\eta }}, {\varvec{\nu }}(t) \otimes \mu \rangle \end{array} \end{aligned}$$

and (4.8), we deduce that \(\lim _n d_w({\varvec{\nu }}_{\tau _n}(t)_{\varvec{\eta }}, {\varvec{\nu }}(t)_{\varvec{\eta }})=0\) for every \(t\in [0,T]\). Hence, thanks to Lemma 3.1, we have

$$\begin{aligned} \lim _{n} [J({{\varvec{\nu }}_{\tau _n}(t)}_{\varvec{\eta }})-J({\varvec{\nu }}_{\tau _n}(t))]= J({{\varvec{\nu }}(t)}_{\varvec{\eta }})-J({\varvec{\nu }}(t)), \; \forall t\in [0,T]. \end{aligned}$$
(4.19)

Let \(\varphi \in C([0,T])\), \(\varphi \ge 0\). Using (4.17) gives

$$\begin{aligned}&\int _0^T \varphi (t) [J({{\varvec{\nu }}_{\tau _n}(t)}_{\varvec{\eta }})-J({\varvec{\nu }}_{\tau _n}(t))]\text{ d } t \\&\quad \ge -\int _{[0,T]\times A\times \mathbb {R}^2} \varphi (t) v_{\tau _n}^a(t)(y)(z-y) \text{ d } \eta ^{a, y} (z) \text{ d } \nu _{\tau _n}(t)^a(y) \text{ d } \mu (a) \text{ d } t\\&\quad =-\int _{[0,T]\times K} \varphi (t) \psi (a,y) \text{ d } q_{\tau _n}(t)^a(y) \text{ d }\mu (a) \text{ d } t, \end{aligned}$$

where

$$\begin{aligned} \psi (a,y):=\int (z-y)\text{ d } \eta ^{a,y}(z) \end{aligned}$$

belongs to C(K). We then deduce from (4.13), (4.19) and Lebesgue’s dominated convergence that

$$\begin{aligned}&\int _0^T \varphi (t) [J({{\varvec{\nu }}(t)}_{\varvec{\eta }})-J({\varvec{\nu }}(t))]\text{ d } t \\&\quad \ge -\int _{[0,T]\times K} \varphi (t) \psi (a,y) \text{ d } q(t)^a(y) \text{ d }\mu (a) \text{ d } t\\&\quad = -\int _{[0,T] \times A\times \mathbb {R}^2} \varphi (t) v^a(t)(y)(z-y) \text{ d } \eta ^{a, y} (z) \text{ d } \nu (t)^a(y) \text{ d } \mu (a) \text{ d } t. \end{aligned}$$

This implies that there exists a negligible subset \(S_{\varvec{\eta }}\) of [0, T] outside which (4.18) holds.

Step 2: For every \(N\in {\mathbb N}^*\), let \(\Delta _N:=\{(\alpha _0, \cdots , \alpha _{2N-1}) \in \mathbb {R}_+^{2N} \; : \; \sum _{k=0}^{2N-1} \alpha _k=1\}\), \(F_N\) be a countable and dense family in \(C(K, \Delta _N)\), and consider

$$\begin{aligned} D_N:=\Big \{ (a,y)\in K \mapsto \sum _{k=0}^{2N-1} \alpha _k(a,y) \delta _{z_k^N}, \; (\alpha _0, \ldots , \alpha _{2N-1})\in F_N\Big \}, \; D:=\bigcup _{N\in {\mathbb N}^*} D_N, \end{aligned}$$

where for \(k=0, \ldots , 2N-1\), \(z_k^N\) denotes the midpoint of the interval \([-R'+kR'/N, -R'+(k+1)R'/N]\). Since D is countable and its elements belong to \(C(K, (\mathcal{P}([-R', R']), W_2))\), it follows from Step 1 that (4.18) holds for every \({\varvec{\eta }}\in D\) and every \(t\in [0,T]\setminus S\), where S is the \(\lambda \)-negligible set

$$\begin{aligned} S:=\bigcup _{{\varvec{\eta }}\in D} S_{{\varvec{\eta }}}. \end{aligned}$$
(4.20)

Step 3: Let \(t\in [0,T]\setminus S\), and \({\varvec{\eta }}\in T({\varvec{\nu }}(t))\) having its support in \([-R', R']\). Note that now we are working with a fixed t so that we just have to suitably approximate \({\varvec{\eta }}\) by a sequence in D. For \(N\in {\mathbb N}^*\), first define for every \((a,y)\in K\) the discrete measure

$$\begin{aligned} \sum _{k=0}^{2N-1} f_k^N(a,y) \delta _{z_k^N}, \; f_k^N(a,y) :=\eta ^{a,y}(I_k^N), \end{aligned}$$
(4.21)

where \(I_k^N\) is the interval \([-R'+kR'/N, -R'+(k+1)R'/N)\) if \(k=0, \ldots , 2N-2\) and \(I_{2N-1}^N:=[R'(1-1/N), R']\). We then have

$$\begin{aligned} \sup _{(a,y)\in K} W_1\Big (\eta ^{a,y}, \sum _{k=0}^{2N-1} f_k^N(a,y) \delta _{z_k^N}\Big ) \le \frac{R'}{N}. \end{aligned}$$
(4.22)
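The approximation (4.21)–(4.22) can be illustrated numerically for a single measure \(\eta ^{a,y}\). The sketch below assumes that \(\eta ^{a,y}\) is itself a discrete measure on \([-R', R']\) (here random atoms with equal weights) and uses the one-dimensional formula expressing \(W_1\) as the \(L^1\) distance between cumulative distribution functions; the function names `w1_discrete` and `bin_to_midpoints` and all numerical values are hypothetical choices for illustration.

```python
import random


def w1_discrete(points_a, weights_a, points_b, weights_b):
    """W_1 between two discrete measures on the line with equal total mass.

    Uses the one-dimensional identity W_1 = integral of |F_a - F_b|,
    where F_a and F_b are the cumulative distribution functions.
    """
    events = sorted(
        [(x, w) for x, w in zip(points_a, weights_a)]
        + [(x, -w) for x, w in zip(points_b, weights_b)]
    )
    total = 0.0
    cdf_gap = 0.0          # current value of F_a - F_b
    prev = events[0][0]
    for x, signed_w in events:
        total += abs(cdf_gap) * (x - prev)
        cdf_gap += signed_w
        prev = x
    return total


def bin_to_midpoints(points, weights, R, N):
    """Move the mass of each of the 2N cells of [-R, R] to the cell midpoint."""
    cell = R / N
    masses = [0.0] * (2 * N)
    for x, w in zip(points, weights):
        # The last cell is closed on the right, as in the definition of I_{2N-1}^N.
        k = min(int((x + R) / cell), 2 * N - 1)
        masses[k] += w
    midpoints = [-R + (k + 0.5) * cell for k in range(2 * N)]
    return midpoints, masses


rng = random.Random(1)
R, N = 2.0, 8
points = [rng.uniform(-R, R) for _ in range(200)]
weights = [1.0 / len(points)] * len(points)
midpoints, masses = bin_to_midpoints(points, weights, R, N)
gap = w1_discrete(points, weights, midpoints, masses)
print(gap, R / N)  # the first number should not exceed the second
```

Since every atom is moved by at most half a cell width, the transport cost is in fact at most \(R'/(2N)\); the bound \(R'/N\) in (4.22) is all that is needed.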

The function \((f_k^N)_{k=0, \ldots , 2N-1}\) is not continuous but belongs to \(L^1({\varvec{\nu }}(t)\otimes \mu , \Delta _N)\). Since \(C(K, \Delta _N)\) is dense in \(L^1({\varvec{\nu }}(t)\otimes \mu , \Delta _N)\), there exist \((g_0^N, \ldots , g_{2N-1}^N)\in C(K, \Delta _N)\) such that

$$\begin{aligned} \sum _{k=0}^{2N-1} \int _K \vert f_k^N(a,y)-g_k^N(a,y) \vert \text{ d } \nu (t)^a(y) \text{ d } \mu (a) \le \frac{1}{N}. \end{aligned}$$
(4.23)

Since we have chosen \(F_N\) dense in \(C(K, \Delta _N)\), there exists \(\alpha =(\alpha _0^N, \ldots , \alpha _{2N-1}^N)\in F_N\) such that

$$\begin{aligned} \sum _{k=0}^{2N-1} \sup _{(a,y)\in K} \vert g_k^N(a,y)-\alpha _k^N(a,y) \vert \le \frac{1}{N}. \end{aligned}$$
(4.24)

We then define \(\eta _N\in D\) by

$$\begin{aligned} \eta _N^{a,y}:=\sum _{k=0}^{2N-1} \alpha _k^N(a,y) \delta _{z_k^N}. \end{aligned}$$

Thanks to the Kantorovich duality formula (3.1), it is easy to see that for every \(\alpha \) and \(\beta \) in \(\Delta _N\), \(W_1(\sum _k \alpha _k \delta _{z_k^N}, \sum _k \beta _k \delta _{z_k^N}) \le R' \sum _k \vert \alpha _k-\beta _k \vert \). In particular, thanks to (4.23), we have

$$\begin{aligned} \int _K W_1\Big (\sum _k f_k^N (a,y) \delta _{z_k^N}, \sum _k g_k^N (a,y) \delta _{z_k^N}\Big ) \text{ d } ({\varvec{\nu }}(t)\otimes \mu )(a,y) \le \frac{R'}{N}. \end{aligned}$$
(4.25)
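The elementary bound on \(W_1\) between two measures carried by the same grid, used to obtain (4.25), can be derived as follows. This is a sketch assuming that (3.1) expresses \(W_1\) as a supremum over 1-Lipschitz test functions \(\varphi \); since \(\sum _k (\alpha _k-\beta _k)=0\), the constant \(\varphi (0)\) can be subtracted, and \(\vert z_k^N\vert \le R'\) because \(z_k^N\in [-R', R']\):

$$\begin{aligned} \int \varphi \text{ d } \Big (\sum _{k=0}^{2N-1} \alpha _k \delta _{z_k^N}- \sum _{k=0}^{2N-1} \beta _k \delta _{z_k^N}\Big )&=\sum _{k=0}^{2N-1} (\alpha _k-\beta _k) \big (\varphi (z_k^N)-\varphi (0)\big )\\&\le \sum _{k=0}^{2N-1} \vert \alpha _k-\beta _k\vert \, \vert z_k^N\vert \le R' \sum _{k=0}^{2N-1} \vert \alpha _k-\beta _k \vert . \end{aligned}$$

Taking the supremum over such \(\varphi \), applying the bound with \(\alpha =f^N(a,y)\) and \(\beta =g^N(a,y)\), and integrating with respect to \({\varvec{\nu }}(t)\otimes \mu \) then gives (4.25) by (4.23).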

Similarly, (4.24) implies that

$$\begin{aligned} \sup _{(a,y)\in K} W_1\Big (\eta _N^{a,y}, \sum _{k=0}^{2N-1} g_k^N(a,y)\delta _{z_k^N}\Big ) \le \frac{R'}{N}. \end{aligned}$$
(4.26)

We know, from Step 2 that for every \(N\in {\mathbb N}^*\):

$$\begin{aligned} J({{\varvec{\nu }}(t)}_{{\varvec{\eta }}_N})-J({\varvec{\nu }}(t))\ge -\int _{A\times \mathbb {R}^2} v^a(t)(y)(z-y) \text{ d } \eta _N^{a, y} (z) \text{ d } \nu (t)^a(y) \text{ d } \mu (a). \end{aligned}$$
(4.27)

Thanks to (4.22), (4.25) and (4.26) and the triangle inequality, we have

$$\begin{aligned} \lim _{N\rightarrow \infty } \int _K W_1(\eta ^{a,y}, \eta _N^{a,y}) \text{ d } ({\varvec{\nu }}(t)\otimes \mu )(a,y)=0. \end{aligned}$$
(4.28)

Recalling that \({\varvec{v}}(t)\in L^{\infty }({\varvec{\nu }}(t)\otimes \mu )\) and using (3.8), we have

$$\begin{aligned}&\Big \vert \int _{K} v^a(t)(y) \Big ( \int _{[-R',R']} (z-y) \text{ d } (\eta _N^{a, y}-\eta ^{a,y}) (z)\Big ) \text{ d } \nu (t)^a(y) \text{ d } \mu (a) \Big \vert \\&\quad \le \Vert {\varvec{v}}\Vert _{L^{\infty }({\varvec{\nu }}(t)\otimes \mu )} \int _K W_1(\eta ^{a,y}, \eta _N^{a,y}) \text{ d } ({\varvec{\nu }}(t)\otimes \mu )(a,y) \end{aligned}$$

so that the right-hand side of (4.27) converges to

$$\begin{aligned} -\int _{A\times \mathbb {R}^2} v^a(t)(y)(z-y) \text{ d } \eta ^{a, y} (z) \text{ d } \nu (t)^a(y) \text{ d } \mu (a) \end{aligned}$$

as \(N\rightarrow \infty \). As for the convergence of the left-hand side of (4.27), we have to show that \(\lim _N W_1({\varvec{\nu }}_{{\varvec{\eta }}_N}\otimes \mu , {\varvec{\nu }}_{{\varvec{\eta }}}\otimes \mu )=0\). For this, we shall use the Kantorovich duality formula (3.1) and observe that if \(\phi \in C(K)\) is 1-Lipschitz then

$$\begin{aligned} \int _K \phi (a,y) \text{ d } \left( ({\varvec{\nu }}_{{\varvec{\eta }}_N}- {\varvec{\nu }}_{{\varvec{\eta }}})\otimes \mu \right) (a,y) \le \int _K W_1(\eta ^{a,y}, \eta _N^{a,y}) \text{ d } \left( {\varvec{\nu }}(t)\otimes \mu \right) (a,y) \end{aligned}$$

which tends to 0 as \(N\rightarrow \infty \) thanks to (4.28). Using Lemma 3.1 we then have \(\lim _{N\rightarrow \infty } J({{\varvec{\nu }}(t)}_{{\varvec{\eta }}_N})= J({{\varvec{\nu }}(t)}_{{\varvec{\eta }}})\). Passing to the limit \(N\rightarrow \infty \) in (4.27) gives the desired inequality (4.18). This shows that \({\varvec{v}}(t)\in -\partial J({\varvec{\nu }}(t))\) for every \( t\in [0,T]{\setminus } S\). \(\square \)

We deduce from Lemmas 4.3 and 4.4 the following existence result:

Theorem 4.5

If (2.13) holds, then for any \(T>0\), there exists a gradient flow of J starting from \({\varvec{\nu }}_0\) on the time interval [0, T]. In particular, there exist measure solutions to the system (2.16)–(2.18).

5 Uniqueness and concluding remarks

5.1 Uniqueness and stability

Thanks to (3.15), we easily deduce uniqueness and stability:

Theorem 5.1

Let \({\varvec{\nu }}_0\) and \({\varvec{\theta }}_0\) be in \(X_R\). If \(t\mapsto {\varvec{\nu }}(t)\) and \(t\mapsto {\varvec{\theta }}(t)\) are gradient flows of J starting respectively from \({\varvec{\nu }}_0\) and \({\varvec{\theta }}_0\), then

$$\begin{aligned} d({\varvec{\nu }}(t), {\varvec{\theta }}(t))\le d({\varvec{\nu }}_0, {\varvec{\theta }}_0), \; \forall t\in \mathbb {R}_+. \end{aligned}$$

In particular there is a unique gradient flow of J starting from \({\varvec{\nu }}_0\).

Proof

By definition, there exist velocity fields \({\varvec{v}}\) and \({\varvec{w}}\) such that for a.e. t, \({\varvec{v}}(t)=(v(t)^a)_{a\in A}\in -\partial J({\varvec{\nu }}(t))\) and \({\varvec{w}}(t)=(w(t)^a)_{a\in A}\in -\partial J({\varvec{\theta }}(t))\) and for \(\mu \)-almost every a, one has

$$\begin{aligned} \partial _t \nu ^a+ \partial _x (\nu ^a v^a)=\partial _t \theta ^a+ \partial _x (\theta ^a w^a)=0, \; \nu ^a\vert _{t=0}=\nu _0^a, \; \theta ^a\vert _{t=0}=\theta _0^a. \end{aligned}$$
(5.1)

Since \(v^a\) and \(w^a\) are bounded in \(L^{\infty }(\nu ^a)\) and \(L^{\infty }(\theta ^a)\) respectively, it follows from well-known arguments (see [3], in particular Theorem 8.4.7 and Lemma 4.3.4) that \(t\mapsto W_2^2(\nu _t^a, \theta _t^a)\) is a Lipschitz function and that, for any family of optimal plans \(\gamma _s^a\) between \(\nu _s^a\) and \(\theta _s^a\) and any \(t_1\le t_2\), one has:

$$\begin{aligned} W_2^2(\nu _{t_2}^a, \theta _{t_2}^a)\le W_2^2(\nu _{t_1}^a, \theta _{t_1}^a) + \int _{t_1}^{t_2} \Big ( \int _{\mathbb {R}^2} (v^a(s)(y)-w^a(s)(z))(y-z) \text{ d } \gamma ^a_s(y,z) \Big ) \text{ d } s. \end{aligned}$$

Integrating the previous inequality with respect to \(\mu \) gives

$$\begin{aligned} d^2(\nu _{t_2}, \theta _{t_2})\le d^2(\nu _{t_1}, \theta _{t_1}) + \int _{t_1}^{t_2} \Big ( \int _{A\times \mathbb {R}^2} (v^a(s)(y)-w^a(s)(z))(y-z) \text{ d } \gamma ^a_s(y,z) \text{ d } \mu (a)\Big ) \text{ d } s. \end{aligned}$$

But since \({\varvec{v}}(s)\in -\partial J({\varvec{\nu }}(s))\) and \({\varvec{w}}(s)\in -\partial J({\varvec{\theta }}(s))\) for a.e. s, the monotonicity relation (3.15) gives

$$\begin{aligned} \int _{A\times \mathbb {R}^2} (v^a(s)(y)-w^a(s)(z))(y-z) \text{ d } \gamma ^a_s(y,z) \text{ d } \mu (a)\le 0. \end{aligned}$$

We then obtain the desired contraction estimate. \(\square \)

5.2 Concluding remarks

5.3 Back to classical solutions, more general initial conditions

Starting from a one-dimensional kinetic model of granular media, we have defined generalized (measure) solutions thanks to a special first-integral and have proven that measure solutions exist globally in time thanks to a gradient flow approach. For classical solutions, as explained in section 2.2 there is an equivalence between the initial kinetic formulation and the system of PDEs (2.12) and (2.11) which enabled us to define weak solutions through (2.16)–(2.18). We also gave an example in section 2.4 which shows that one cannot expect that the spatial cumulative function \(G_t\) remains continuous globally in time even if \(G_0\) is very smooth, but in this example the initial condition is very singular in the velocity variable. If one starts with a more regular initial condition \(f_0\) in the phase space, it is not clear to us whether measure solutions of (2.16)–(2.18) are such that \(G_t\) remains absolutely continuous globally in time [a necessary condition to give a meaning to (1.4)]. In other words, we have defined a notion of generalized solutions to (1.4) and proved a global existence result for the latter but have a priori no guarantee that these generalized solutions have enough regularity to be solutions of (1.4).

We would also like to mention here that in our main results of existence and uniqueness of a gradient flow for J, the assumption that \(\rho _0\) is atomless plays no significant role. Actually, our results hold for any compactly supported initial condition \({\varvec{\nu }}_0\) (we did not investigate the extension to the case where this assumption is relaxed to a second moment bound, but this is probably doable). The assumption that \(\rho _0\) is atomless was used only to select unambiguously the Cauchy datum \(\nu _0^a\). We suspect that in the case where \(\rho _0\) is a discrete measure, there might be an interesting connection between gradient flow solutions (which typically select elements of the subgradient with minimal norm) and some solutions of the initial ODE system (1.2), but a more precise investigation is left for the future.

5.4 Higher dimensions, more general functionals

The motivation for the present work comes from kinetic models of granular media. Since the first integral trick of Sect. 2 is very specific to the quadratic interaction kernel case in dimension one, all our subsequent analysis has been performed in dimension one only. However, it is obvious (but we are not aware of any practical examples in kinetic theory) that our arguments can be used also to study systems of continuity equations in \(\mathbb {R}^d\) for infinitely many species (labeled by a parameter a) such as

$$\begin{aligned} \partial _t \nu ^a +\mathrm {div}_x\Big (\nu ^a (\nabla _x V(a,x) + \int _{A\times \mathbb {R}^d} \nabla _x W(a,b, x,y) \text{ d } \nu ^b(y) \text{ d } \mu (b) )\Big )=0, \end{aligned}$$

which (taking for instance W symmetric \(W(a,b,x,y)=W(b,a,y,x)\)), can be seen as the gradient flow of

$$\begin{aligned} J({\varvec{\nu }}):=\int _{A\times \mathbb {R}^d} V \text{ d } ({\varvec{\nu }}\otimes \mu )+ \frac{1}{2} \int _{A\times \mathbb {R}^d} \int _{A\times \mathbb {R}^d} W \text{ d } ({\varvec{\nu }}\otimes \mu ) \otimes \text{ d } ({\varvec{\nu }}\otimes \mu ). \end{aligned}$$