1 Introduction

Nonlinear diffusion equations are ubiquitous in many real-world applications. They were introduced to analyse gas expansion in a porous medium, groundwater infiltration, and heat conduction in plasmas, to name a few applications in physics. These applications drove the first rigorous mathematical results by Zel’dovich and Kompaneets in [62] and Barenblatt in [2] regarding important particular weak solutions of nonlinear diffusion equations with homogeneous nonlinearity. The general filtration equation was then developed in [38]. The use of these equations in oil recovery software is extensive nowadays. Another source of applications of this family of equations arises from population models in mathematical biology: ecological models [4, 6, 56] derived from probabilistic interpretations [34, 44], volume effect in Keller-Segel type models [8, 35, 48], volume exclusion in cell-cell adhesion models [11, 22], and many others.

Although a rigorous mathematical theory has been extensively developed over the years [17, 47, 58], there are particular aspects of renewed interest in view of novel applications as well as advances in mathematics. For instance, their derivation from interacting particles, with a distinction between deterministic and stochastic methods, has recently attracted attention for its implications in the derivation of models in mathematical biology [22] and data science [27]. We take advantage of the gradient flow structure of nonlinear diffusions [47] to connect with nonlocal interaction equations. In fact, we rigorously derive particle approximations of nonlinear diffusions from these variational considerations by approximating their energy functional, completing the approach started in [10].

For ease of presentation, let us focus on more standard diffusion equations. Let \(m\ge 1\) and consider the equation

$$\begin{aligned} \partial _t\rho = \Delta \rho ^m, \end{aligned}$$

which is better known as the heat equation for \(m=1\), or the porous medium equation (PME) in the case \(m>1\). A comprehensive study of the above PDE can be found, e.g., in the book of Vázquez, [58]. Owing to the advances in optimal transport theory, [54, 60, 61], starting from the seminal works of Jordan, Kinderlehrer, and Otto, [37, 47], such diffusion equations are known to be 2-Wasserstein gradient flows for a specific choice of the energy functional. More precisely, the previous equation can be written as

$$\begin{aligned} \left\{ \begin{array}{l} \partial _t\rho +\nabla \cdot \left( \rho v\right) =0,\\ v=-\nabla \frac{\delta {\mathcal {H}}_m}{\delta \rho }, \end{array} \right. \end{aligned}$$
(1.1)

where \(\frac{\delta {\mathcal {H}}_m}{\delta \rho }\) is the first variation of the energy functional

$$\begin{aligned} {\mathcal {H}}_m[\rho ]= {\left\{ \begin{array}{ll} \int _{{\mathbb {R}}^{d}}\rho (x)\log \rho (x)dx \qquad &{}m=1\\ \frac{1}{m-1}\int _{{\mathbb {R}}^{d}}\rho ^m(x)dx &{}m>1 \end{array}\right. }. \end{aligned}$$
(1.2)

In [37] the equation of interest was the linear Fokker–Planck equation, while Otto focused on the porous medium equation in [47]. Afterwards, a 2-Wasserstein gradient flow approach has been extended to other PDEs, in particular those modelling nonlocal interaction, [1, 14, 19, 20]. The latter equation is of the form (1.1) with \(v=-\nabla \frac{\delta {\mathcal {W}}}{\delta \rho }\) and

$$\begin{aligned} {\mathcal {W}}[\rho ]=\frac{1}{2}\int _{{\mathbb {R}}^{d}}W*\rho (x)d\rho (x). \end{aligned}$$
(1.3)

Recent works in the literature show a rigorous and fascinating connection between the two energies above for \(m>1\) in (1.2) and the corresponding dynamics, by means of gradient flow techniques, c.f. [7, 10]. More precisely, exploiting the so-called blob method developed in [26], one can notice already at a formal level that an appropriate regularisation of \({\mathcal {H}}_m\) transforms a diffusion equation (which is local) into an interaction PDE (which is nonlocal) by choosing a delocalising kernel. For simplicity, let \(m=2\) and consider a standard family of non-negative radial mollifiers \(V_\varepsilon (x) = \varepsilon ^{-d}V_1(x/\varepsilon )\) for \(\varepsilon >0\) on \({{\mathbb {R}}^{d}}\). Using the commutativity of convolution with even functions such as \(V_\varepsilon \), it is indeed not difficult to see

$$\begin{aligned} {\mathcal {H}}_2[V_\varepsilon *\rho ]= & {} \int _{{\mathbb {R}}^{d}}(V_\varepsilon *\rho )^2(x)dx \\= & {} \int _{{\mathbb {R}}^{d}}(V_\varepsilon *V_\varepsilon )*\rho (x)d\rho (x)=\int _{{\mathbb {R}}^{d}}W_\varepsilon *\rho (x)d\rho (x)=2{\mathcal {W}}_\varepsilon [\rho ], \end{aligned}$$

by setting \(W_\varepsilon {:=}V_\varepsilon *V_\varepsilon \). This observation sheds light on the aforementioned link between local and nonlocal PDEs. As a natural byproduct, such a connection provides a rigorous particle approximation for a class of nonlinear diffusion equations. More precisely, this hinges on deterministic approaches for nonlocal interaction equations, since particles are solutions, i.e. the following empirical measure \(\rho _t^N\) is a weak solution of (1.1) with \(v=-\nabla \frac{\delta {\mathcal {W}}}{\delta \rho }\) and \({\mathcal {W}}\) as in (1.3):

$$\begin{aligned} \rho _t^N=\frac{1}{N}\sum _{i=1}^N\delta _{X_i(t)}, \end{aligned}$$

where, for any \(i=1,\dots ,N\), \(X_i(t)\) solves the ODE

$$\begin{aligned} {\dot{X}}_i(t)=-\frac{1}{N}\sum _{j}\nabla W(X_i(t)-X_j(t)). \end{aligned}$$

Further details on this aspect can be found, e.g., in [9, 14], and in [31, 32] in case of systems of nonlocal PDEs. This structure is advantageous for the computational approximation of continuous solutions to (1.1). The main issue when diffusion is present is that particles do not remain particles. Indeed, if the initial datum is a Dirac delta, we have an immediate smoothing effect, excluding measure solutions. However, numerical evidence from these deterministic particle methods [10] shows that a particle approximation can nevertheless be achieved, as we shall see later on.
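To make the particle picture concrete, the following is a minimal one-dimensional sketch, not the scheme of [10]. It assumes a Gaussian mollifier \(V_\varepsilon \) (an illustrative choice, for which \(W_\varepsilon =V_\varepsilon *V_\varepsilon \) is again Gaussian with variance \(2\varepsilon ^2\)) and explicit Euler time stepping; all function names are hypothetical. It evaluates both sides of the identity \({\mathcal {H}}_2[V_\varepsilon *\rho ^N]=2{\mathcal {W}}_\varepsilon [\rho ^N]\) for an empirical measure and integrates the particle ODE above with \(W=W_\varepsilon \).

```python
import numpy as np

def V(z, eps):
    """Gaussian mollifier V_eps in 1D (an illustrative assumption)."""
    return np.exp(-z**2 / (2.0 * eps**2)) / np.sqrt(2.0 * np.pi * eps**2)

def W(z, eps):
    """W_eps = V_eps * V_eps; for Gaussians this is a Gaussian of variance 2 eps^2."""
    return V(z, np.sqrt(2.0) * eps)

def grad_W(z, eps):
    """Derivative of W_eps."""
    return -z / (2.0 * eps**2) * W(z, eps)

def pair_energy(X, eps):
    """(1/N^2) sum_{i,j} W_eps(X_i - X_j), i.e. 2 * W_eps[rho^N]."""
    return W(X[:, None] - X[None, :], eps).mean()

def mollified_energy(X, eps, grid):
    """Riemann-sum approximation of int (V_eps * rho^N)^2 dx on a uniform grid."""
    density = V(grid[:, None] - X[None, :], eps).mean(axis=1)
    return np.sum(density**2) * (grid[1] - grid[0])

def velocity(X, eps):
    """Right-hand side of the ODE: -(1/N) sum_j grad W_eps(X_i - X_j)."""
    return -grad_W(X[:, None] - X[None, :], eps).mean(axis=1)

def evolve(X0, eps, dt, steps):
    """Explicit Euler time stepping of the particle ODE (a simple choice of integrator)."""
    X = X0.copy()
    for _ in range(steps):
        X = X + dt * velocity(X, eps)
    return X
```

Since \(\nabla W_\varepsilon \) is odd, the centre of mass of the particles is conserved, and for a sufficiently small time step the interaction energy decreases along the discrete flow, as expected for a gradient flow.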

In this manuscript, we consider a general class of internal energy functionals \({\mathcal {F}}~:~{{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\rightarrow (-\infty ,+\infty ]\) given by

$$\begin{aligned} {\mathcal {F}}[\rho ] {:=} \left\{ \begin{array}{cl} \int _{{{\mathbb {R}}^{d}}}F(\rho (x))\, dx, &{}\rho \ll \textrm{Leb}({{\mathbb {R}}^{d}}) \\ +\infty , &{}\text {otherwise} \end{array} \right. , \end{aligned}$$

where we identify the measure \(\rho \) with its density \(\rho (x)\) if it is absolutely continuous with respect to Lebesgue measure and \({{{\mathcal {P}}_2({{\mathbb {R}}^{d}})}}\) denotes the set of probability measures with finite second order moment. We define the regularised internal energy functional \({\mathcal {F}}^\varepsilon ~:~{{\mathcal {P}}}_2({{\mathbb {R}}^{d}})\rightarrow (-\infty ,+\infty ]\) given by

$$\begin{aligned} {\mathcal {F}}^\varepsilon [\rho ] {:=} \int _{{{\mathbb {R}}^{d}}}F(V_\varepsilon *\rho (x)) \, dx, \end{aligned}$$

which gives rise to a class of nonlocal PDEs

$$\begin{aligned} \partial _t\rho =\nabla \cdot (\rho \nabla V_\varepsilon *F'(V_\varepsilon *\rho )). \end{aligned}$$
(NLE)

The functional \({\mathcal {F}}\) includes \({\mathcal {H}}_m\), but it is not limited to it, c.f. Sect. 2. The reader is invited to verify

$$\begin{aligned} \frac{\delta {\mathcal {F}}^\varepsilon }{\delta \rho }(\rho ) = V_\varepsilon * \left[ \frac{\delta {\mathcal {F}}}{\delta \rho }(V_\varepsilon * \rho )\right] , \end{aligned}$$

which motivates the consideration of (NLE) as the 2-Wasserstein gradient flow of \({\mathcal {F}}^\varepsilon \). Following the strategy proposed in [7], defining the pressure by \(P(x){:=}x F'(x)-F(x)\), as in [1, 19, 42], we construct weak solutions of the nonlinear diffusion equation

$$\begin{aligned} \partial _t\rho =\Delta P(\rho ) \end{aligned}$$
(DE)

as a limit of a sequence of weak measure solutions of (NLE), in case F behaves like power laws of porous medium type, for \(m>1\).
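The first-variation identity stated before (DE) can be checked by a formal computation. The sketch below assumes a smooth perturbation \(\eta \) with zero total mass (a symbol introduced here only for illustration) and that differentiation under the integral sign is justified; the second equality uses Fubini's theorem and the evenness of \(V_\varepsilon \):

$$\begin{aligned} \frac{d}{ds}{\mathcal {F}}^\varepsilon [\rho +s\eta ]\Big |_{s=0} = \int _{{\mathbb {R}}^{d}}F'(V_\varepsilon *\rho (x))\,(V_\varepsilon *\eta )(x)\,dx = \int _{{\mathbb {R}}^{d}}\left[ V_\varepsilon *F'(V_\varepsilon *\rho )\right] (y)\,d\eta (y). \end{aligned}$$

For the power law \(F(x)=\frac{1}{m-1}x^m\) one has \(F'(x)=\frac{m}{m-1}x^{m-1}\) and \(P(x)=xF'(x)-F(x)=x^m\), so that (NLE) and (DE) reduce to (NLE-m) and (PME), respectively.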

The blob method for diffusion was first introduced in [10] for diffusion equations with the addition of local and nonlocal drifts. Let us mention that a similar approach was used in the previous work [26], approximating nonlocal equations with singular kernels by smooth kernels. The authors in [10] consider a slightly different regularisation of the internal energy which is better for numerical purposes, see [10, Eq. (6)]. Despite this difference, the gradient flow perspective remains at the forefront of their and our present work. The corresponding nonlocal gradient flow is indeed different from (NLE), c.f. [10, Eq. (8)], but it coincides with ours in case \(m=2\) for the energy \({\mathcal {H}}_2\). In [10], \(\Gamma \)-convergence of the regularised energy, as well as that of minimisers, is proven for \(m\ge 1\). The authors show that stability of gradient flows in the limit \(\varepsilon \rightarrow 0\) can be established for \(m\ge 2\) using the framework introduced by Sandier and Serfaty in [53, 55] and the concept of \(\lambda \)-gradient flows developed in [1]. This strategy requires verifying additional assumptions which are only known to hold in the case \(m=2\) for an initial datum with finite second order moment and log-entropy, i.e. \({\mathcal {H}}_1[\rho _0]<\infty \). The result for \(m=2\) was previously proven in [40], however on a bounded domain with periodic boundary conditions. The blob method in [10] is a deterministic particle method for linear and nonlinear diffusion on \({{\mathbb {R}}^{d}}\). Numerical simulations in [10, Section 6] suggest that the particle approximation remains valid even when \({\mathcal {H}}_1[\rho _0]=\infty \). Relaxing the condition \({\mathcal {H}}_1[\rho _0]<\infty \) and rigorously proving a quantitative particle approximation is still an open problem, and left for future research.

In the case \(m=2\), in the same spirit of [10, 27], the authors in [7] construct weak solutions of the quadratic porous medium equation as a localising limit (\(\varepsilon \rightarrow 0\)) of a sequence of weak measure solutions of the nonlocal interaction equation (NLE), for \(F(x)=x^2\). The authors work directly at the level of the (nonlocal) equations by means of a time-discretisation scheme which allows one to work without convexity, as for instance in the case of cross-diffusion systems, or even PDEs with no purely gradient flow structure. As in [10], finite initial log-entropy is required, thus excluding particle approximation. However, simultaneously to [7], the authors in [27] focus on a weighted (quadratic) porous medium equation which is relevant, e.g., in sampling: the weight, \({{\bar{\rho }}}\) in their notation, represents a target probability measure to be approximated from specific samples drawn from it. The blob method is indeed useful to develop a deterministic particle approximation for the weighted porous medium equation, and, as a byproduct, it provides a way to quantize a target \({{\bar{\rho }}}\) in the long-time behaviour. We stress that also in this work it is essential to assume \({\mathcal {H}}_1[\rho _0]<\infty \); however, using again \(\lambda \)-convexity of the regularised energy, one can achieve a rigorous particle approximation as a consequence of \(\lambda \)-stability (or contractivity) of Wasserstein gradient flows, as in [1]. This means one can achieve, so far, a qualitative result, as the initial datum needs to be approximated fast enough, c.f. [27, Theorem 1.4]. To the best of our knowledge, a quantitative result has not been achieved yet in more than one dimension. Still in one space dimension, the authors in [29] introduce a deterministic particle approximation for aggregation-diffusion equations, including the porous medium equation for the subquadratic (\(1<m<2\)) and superquadratic (\(m>2\)) cases. This approach, however, is limited to one space dimension. None of the previous three works makes use of gradient flow techniques. Indeed, other attempts for a particle method have been proposed in the literature. Let us mention two simultaneous numerical methods for linear diffusion \((m=1)\) [30, 52]. In one dimension and for nonlinear diffusions, there are other numerical methods based on the PDE satisfied by the transporting maps, see [21, 23, 36]. A nice survey of most of the available numerical methods for these families of equations can be found in [18].

Further related to particle methods, we mention the seminal paper by Oelschläger, [45], where a stochastic particle approximation is proven for classical and positive solutions of the quadratic porous medium equation in \({{\mathbb {R}}^{d}}\), and for weak solutions in one dimension, and the recent results in [24] for systems. In [34] very weak solutions of the viscous porous medium equation (\(m>1\)) are studied as a limit of a sequence of distributions of the solutions to nonlinear stochastic differential equations generalising previous results [34, 43, 46]. In [49] strong \(L^1\)-solutions, c.f. [57], of the quadratic porous medium equation are derived from a stochastic mean field interacting particle system with the addition of a vanishing Brownian motion.

Our strategy is different from the aforementioned stochastic approaches as it is based on an optimal transport approach avoiding the addition of higher regularity induced by the (vanishing) viscosity method. We consider a time-discretisation of (NLE) à la Jordan-Kinderlehrer-Otto (JKO), c.f. [37]. This method provides uniform bounds on the approximating sequence in terms of the associated energy and second order moments. Although the sequence solving (NLE) is only a measure, we are able to prove strong \(L^m\)-compactness of a smoother sequence of solutions for the \(\varepsilon \rightarrow 0\) limit by using the so-called flow interchange technique, c.f. [41]. More precisely, one of our main contributions is to construct weak solutions of

$$\begin{aligned} \partial _t\rho =\Delta (\rho ^m) \end{aligned}$$
(PME)

as a subsequential \(\varepsilon \rightarrow 0\) limit of weak measure solutions to

$$\begin{aligned} \partial _t\rho ^\varepsilon =\frac{m}{m-1}\nabla \cdot (\rho ^\varepsilon \nabla V_\varepsilon *(V_\varepsilon *\rho ^\varepsilon )^{m-1}), \end{aligned}$$
(NLE-m)

for all \(m>1\). The same result is proven also for (NLE) and (DE). In particular, this extends [7] to the case \(m>2\), which is not trivial in view of the nonlinearities involved, and to a general class of nonlinear diffusion functions. In [10], the gradient flow convergence result for \(m>2\) was conditional on a uniform BV bound for \(\rho ^\varepsilon \), while we make no such assumption here. Furthermore, we are also able to treat the case \(1<m<2\), which is more challenging due to the lack of regularity at zero.
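For orientation, a JKO step for the regularised energy typically takes the following form. This is the standard minimising movement scheme written in the notation of this introduction, with \(d_W\) the 2-Wasserstein distance recalled in Sect. 2; the time step \(\tau >0\) and the iterates \(\rho ^n_\tau \) are notation introduced here for illustration, and the precise scheme used in this work is the one set up in Sect. 3:

$$\begin{aligned} \rho ^0_\tau {:=}\rho _0,\qquad \rho ^{n+1}_\tau \in \mathop {\textrm{argmin}}_{\rho \in {{\mathcal {P}}}_2({{\mathbb {R}}^{d}})}\left\{ \frac{1}{2\tau }d_W^2(\rho ,\rho ^n_\tau )+{\mathcal {F}}^\varepsilon [\rho ]\right\} . \end{aligned}$$

Comparing a minimiser with the competitor \(\rho ^n_\tau \) gives \({\mathcal {F}}^\varepsilon [\rho ^{n+1}_\tau ]+\frac{1}{2\tau }d_W^2(\rho ^{n+1}_\tau ,\rho ^n_\tau )\le {\mathcal {F}}^\varepsilon [\rho ^n_\tau ]\), which is the usual source of uniform energy bounds of the kind mentioned above.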

As a byproduct of our analysis we obtain an existence result for nonlocal diffusion equations related to a nonlocal internal energy functional. In particular, we are able to construct weak solutions to (NLE-m) via the JKO scheme for \(m>1\). While this may not be surprising, this is the first result in this direction to the best of our knowledge. We also provide a particle approximation for (NLE) in case F behaves like power laws, for \(m>1\). This result is purely qualitative, and quantitative estimates are not proven. Finally, we stress that the strategy we use to construct weak solutions does not require convexity of the internal energy, thus allowing this method to be extended to non-convex energies, e.g. nonlinear cross-diffusion systems, see [7]. We leave the extension to systems for future work as it deserves a deeper analysis.

The case \(m=1\), i.e. linear diffusion, is not completely covered in our theory, due to the lack of control on the compactness near the logarithmic singularity in the gradient flow approach. More precisely, our strategy does provide an approximating scheme, validated numerically in [10], but we are not able at this stage to identify the limit as solution of the heat equation. Indeed, the logarithmic singularity cannot be coped with for the case \(m=1\) when the mollifier \(V_1\) is compactly supported. This is indeed one of the reasons we did not assume \(V_1\) is compactly supported in the case \(m\ge 2\). Similar difficulties are found for the Landau equation [39] in plasma physics, for which efficient deterministic particle methods preserving all the properties of the Landau equation at the discrete level were introduced in [16] using the same strategy as in this work. Moreover such an approximated Landau equation has been analytically studied in [12, 13] showing the existence of solutions for the approximated problems where \(V_1(x)=e^{-|x|}\) with an appropriate mollification at the origin. The particular non-compactly supported kernel is crucial in the detailed estimates performed in [12]. Dealing with the logarithmic singularity in these problems is a challenging open problem.

1.1 Structure of the manuscript

Section 2 sets the assumptions, notations, and definitions we use in this paper. At the end of Sect. 2, we state the precise results obtained once the appropriate notions of solutions are introduced. Section 3 focuses on the construction of weak solutions \(\rho ^\varepsilon \) to (NLE) (c.f. Theorem 2.1) based on the JKO scheme [37]. Section 4 discusses the strong compactness criteria used to construct a limit \(\rho \) (which is the candidate weak solution to (DE)) from the sequence \(\rho ^\varepsilon \). Section 5 verifies that the limit \(\rho \) is a weak solution to (DE) (c.f. Theorem 2.2) by passing to the limit \(\varepsilon \rightarrow 0\) from (NLE). In Sect. 6, we sketch the ideas behind the proofs of Theorem 2.3, which gives conditions for uniqueness of solutions to (NLE), and Corollary 2.1, which provides a particle approximation to (DE). Finally, Sect. 7 collects various technical results which, possibly with minor adaptations, already exist in the literature.

2 Preliminaries and results

The mollifying sequence is generated by \(V_\varepsilon (x) = \varepsilon ^{-d}V_1(x/\varepsilon )\) for \(\varepsilon >0\). We assume that the generating function \(V_1\) satisfies

\({\textbf {(V)}}\):

\(V_1\in C_b({{\mathbb {R}}^{d}};[0,+\infty ))\cap C^1({{\mathbb {R}}^{d}})\), \(\Vert V_1\Vert _{L^1}=1\), \(V_1(x)=V_1(-x)\), \(\int _{{\mathbb {R}}^{d}}|x|^2V_1(x)\,dx<+\infty \), \(|\nabla V_1|\in L^1({{\mathbb {R}}^{d}})\), and \(|\nabla V_1(x)|\le C(1+|x|)\).

Depending on the results we prove, we assume the function \(F:[0,+\infty ) \rightarrow (-\infty ,+\infty ]\) satisfies some combination of the following assumptions:

\({\textbf {(F1)}}\):

F is a proper, convex, and lower semicontinuous function such that

$$\begin{aligned} F(0) = 0,\quad \liminf _{s\uparrow +\infty }\frac{F(s)}{s}= +\infty , \quad \liminf _{s\downarrow 0}\frac{F(s)}{s^\alpha }>-\infty , \quad \text {for some }\alpha > \frac{d}{d+2}. \end{aligned}$$
\({\textbf {(F2)}}\):

\(F \in C^1([0,+\infty ))\).

\({\textbf {(F3)}}\):

\(F \in C([0,+\infty ))\cap C^2((0,+\infty ))\).

\(({{\textbf {F}}}_m)\):

There exist \(c_1, \, c_2>0\) and \(m\ge 1\) such that \(c_1 x^{m-2} \le F''(x) \le c_2 x^{m-2}\) for all \(x> 0\).

Remark 2.1

(Comments on the assumptions)  (F1) is lifted directly from [1, Example 9.3.6] so that \({\mathcal {F}}\) enjoys certain properties; it is well-defined and the associated JKO scheme is well-posed c.f. [37].

For the reader’s convenience, we observe the condition \(\liminf _{s\downarrow 0}\frac{F(s)}{s^\alpha }>-\infty \) for some \(\alpha >\frac{d}{d+2}\) ensures (c.f. [1, Remark 9.3.7] and Lemma A.1) that \(F^-(\rho ) \in L^1({{\mathbb {R}}^{d}})\) whenever \(\rho \in {{\mathcal {P}}}_2({{\mathbb {R}}^{d}})\) is absolutely continuous with respect to Lebesgue measure. In particular, on any sublevel subset \(\{\rho \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\, | \, m_2(\rho ) \le C\}\), the functional \({\mathcal {F}}\) is uniformly bounded below.

The superlinear growth (c.f. [1, Remark 9.3.8]) and convexity ensure that \({\mathcal {F}}\) is lower semicontinuous in \({{\mathcal {P}}}_1({{\mathbb {R}}^{d}})\).

Assumption (F3) mainly refers to energies lacking regularity at the origin as in the case \(F(x)=x\log x\). We stress that (F2) is used to construct solutions to (NLE) in Sect. 3, however it is not used to derive the compactness estimates in Sect. 4. Conversely, (F3) is used for the compactness estimates in Sect. 4 but is not assumed to construct solutions to (NLE). The motivating examples which satisfy all of (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) are power laws \(F(x) = \frac{1}{m-1}x^m\) for \(m>1\).

A further discussion can be found after the statements of Theorem 2.1 and Theorem 2.2.
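As a quick check of the motivating example, take \(F(x)=\frac{1}{m-1}x^m\) with \(m>1\). A direct computation gives

$$\begin{aligned} F'(x)=\frac{m}{m-1}x^{m-1},\qquad F''(x)=m\,x^{m-2}\quad \text {for }x>0, \end{aligned}$$

so that (\({{\textbf {F}}}_m\)) holds with \(c_1=c_2=m\). Moreover \(F(0)=0\), \(F(s)/s=\frac{1}{m-1}s^{m-1}\rightarrow +\infty \) as \(s\uparrow +\infty \), and \(F\ge 0\) gives \(\liminf _{s\downarrow 0}F(s)/s^\alpha \ge 0\) for every \(\alpha \), which are the conditions displayed in (F1).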

Throughout the manuscript we will denote by \({{\mathcal {P}}}({{\mathbb {R}}^{d}})\) the set of probability measures on \({{\mathbb {R}}^{d}}\), for \(d\in {\mathbb {N}}\), and by \({{\mathcal {P}}}_p({{\mathbb {R}}^{d}}){:=}\{\rho \in {{\mathcal {P}}}({{\mathbb {R}}^{d}}):m_p(\rho )<+\infty \}\), being \(m_p(\rho ){:=}\int _{{\mathbb {R}}^{d}}|x|^p\,d\rho (x)\) the \(p^{\textrm{th}}\)-order moment of \(\rho \), for \(1\le p<\infty \). We shall use \({{\mathcal {P}}}_p^a({{\mathbb {R}}^{d}})\) for elements in \({{\mathcal {P}}}_p({{\mathbb {R}}^{d}})\) which are absolutely continuous with respect to the Lebesgue measure. For \(p=2\), the 2-Wasserstein distance between \(\mu _1,\mu _2\in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) is

$$\begin{aligned} d_W^2(\mu _1,\mu _2){:=}\min _{\gamma \in \Gamma (\mu _1,\mu _2)}\left\{ \int _{{{\mathbb {R}}^{2d}}}|x-y|^2\,d\gamma (x,y)\right\} , \end{aligned}$$
(2.1)

where \(\Gamma (\mu _1,\mu _2)\) is the class of all transport plans between \(\mu _1\) and \(\mu _2\), that is the class of measures \(\gamma \in {{\mathcal {P}}}({{\mathbb {R}}^{2d}})\) such that, denoting by \(\pi _i\) the projection operator on the i-th component of the product space, the marginality condition

$$\begin{aligned} (\pi _i)_{\#}\gamma =\mu _i \quad \text{ for }\ i=1,2 \end{aligned}$$

is satisfied. In the expression above, marginals are the push-forward of \(\gamma \) through \(\pi _i\). For a measure \(\rho \in {{\mathcal {P}}}({{\mathbb {R}}^{d}})\) and a Borel map \(T:{{\mathbb {R}}^{d}}\rightarrow {{\mathbb {R}}^{n}}\), \(n\in {\mathbb {N}}\), the push-forward of \(\rho \) through T is defined by

$$\begin{aligned} \int _{{{\mathbb {R}}^{n}}}f(y)\,dT_{\#}\rho (y)=\int _{{{\mathbb {R}}^{d}}}f(T(x))\,d\rho (x) \qquad \text{ for } \text{ all } \text{ Borel } \text{ functions } \text{ f } \text{ on }\ {{\mathbb {R}}^{n}}. \end{aligned}$$

Setting \(\Gamma _0(\mu _1,\mu _2)\) as the class of optimal plans, i.e. minimizers of (2.1), the 2-Wasserstein distance can be written as

$$\begin{aligned} d_W^2(\mu _1,\mu _2)=\int _{{{\mathbb {R}}^{2d}}}|x-y|^2\,d\gamma (x,y), \qquad \gamma \in \Gamma _0(\mu _1,\mu _2). \end{aligned}$$

We denote the 1-Wasserstein distance by \(d_1\); it is defined by

$$\begin{aligned} d_1(\mu _1,\mu _2){:=}\min _{\gamma \in \Gamma (\mu _1,\mu _2)}\left\{ \int _{{{\mathbb {R}}^{2d}}}|x-y|\,d\gamma (x,y)\right\} . \end{aligned}$$
(2.2)

We refer the reader to [1, 54, 61] for further details on optimal transport theory and Wasserstein spaces.
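In one space dimension, and for two empirical measures with the same number of equally weighted atoms, the minimisation in (2.1) and (2.2) is solved by matching the sorted atoms (monotone rearrangement). The following sketch covers only this special case; the function name `wasserstein_1d` is hypothetical and the routine is not meant as a general optimal transport solver.

```python
import numpy as np

def wasserstein_1d(x, y, p=2):
    """p-Wasserstein distance between (1/N) sum delta_{x_i} and (1/N) sum delta_{y_i} on the real line.

    Assumes both samples have the same number N of equally weighted atoms,
    so that the optimal plan pairs the i-th smallest x with the i-th smallest y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("both samples must have the same number of atoms")
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)
```

For instance, translating every atom by a constant \(c\) yields distance \(|c|\) for every \(p\), and Jensen's inequality gives \(d_1\le d_W\), which can be observed by calling the function with `p=1` and `p=2`.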

Below we recall the concepts of solutions used throughout the manuscript, distinguishing between measure and weak solutions.

Definition 2.1

(Weak measure solution to (NLE)) Suppose F satisfies (F1) and (F2). An absolutely continuous curve \(\rho ^\varepsilon :[0,T]\rightarrow {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), mapping \(t\in [0,T]\mapsto \rho _t^\varepsilon \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), is a weak measure solution to (NLE) if, for every \(\varphi \in C^1_c({{\mathbb {R}}^{d}})\) and any \(t\in [0,T]\), it holds

$$\begin{aligned}{} & {} \int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _t^\varepsilon (x)\!-\!\int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _0(x) \nonumber \\{} & {} \quad =-\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x) \cdot [\nabla V_\varepsilon *F'(V_\varepsilon *\rho _s^\varepsilon )](x)d\rho _s^\varepsilon (x)ds. \end{aligned}$$
(2.3)

Definition 2.2

(Weak measure solution to (NLE-m)) An absolutely continuous curve \(\rho ^\varepsilon :[0,T]\rightarrow {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), mapping \(t\in [0,T]\mapsto \rho _t^\varepsilon \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), is a weak measure solution to (NLE-m) for \(m>1\) if, for every \(\varphi \in C^1_c({{\mathbb {R}}^{d}})\) and any \(t\in [0,T]\), it holds

$$\begin{aligned}{} & {} \int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _t^\varepsilon (x)\!-\!\int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _0(x) \nonumber \\{} & {} \quad = -\frac{m}{m-1}\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x) \cdot [\nabla V_\varepsilon *(V_\varepsilon *\rho _s^\varepsilon )^{m-1}](x)d\rho _s^\varepsilon (x)ds. \end{aligned}$$
(2.4)

Remark 2.2

By considering fixed \(\varepsilon >0\) and the corresponding scaling for \(V_1\) satisfying (V), the driving velocity field satisfies

$$\begin{aligned} \begin{aligned}&\int _0^T\int _{{\mathbb {R}}^{d}}|[\nabla V_\varepsilon *(V_\varepsilon *\rho _t^\varepsilon )^{m-1}](x)|d\rho _t^\varepsilon (x)dt\\&\quad \le \int _0^T\iint _{{\mathbb {R}}^{2d}}|\nabla V_\varepsilon (x-y)|(V_\varepsilon *\rho _t^\varepsilon )^{m-1}(y)\,dy\,d\rho _t^\varepsilon (x)\,dt\\&\quad = \int _0^T \int _{{{\mathbb {R}}^{d}}} (|\nabla V_\varepsilon | * \rho _t^\varepsilon )(y) (V_\varepsilon * \rho _t^\varepsilon )^{m-1}(y)dy \, dt\\&\quad \le \int _0^T \Vert V_\varepsilon * \rho _t^\varepsilon \Vert _{L^\infty }^{m-1}\left( \int _{{{\mathbb {R}}^{d}}} (|\nabla V_\varepsilon | * \rho _t^\varepsilon )(y) dy\right) \, dt \\&\quad \le \varepsilon ^{-(m-1)d}\Vert V_1\Vert _{L^\infty }^{m-1} \int _0^T \Vert |\nabla V_\varepsilon | * \rho _t^\varepsilon \Vert _{L^1}dt \\&\quad \le \varepsilon ^{-(m-1)d}\Vert V_1\Vert _{L^\infty }^{m-1}\, T\, \Vert \nabla V_\varepsilon \Vert _{L^1} \\&\quad = \frac{T}{\varepsilon ^{(m-1)d+1}}\Vert V_1\Vert _{L^\infty }^{m-1}\Vert \nabla V_1\Vert _{L^1} < \infty . \end{aligned} \end{aligned}$$
(2.5)

[1, Lemma 8.2.1] provides the existence of a continuous representative for distributional solutions of continuity equations with velocity fields in \(L^1([0,T];L^1(\rho _t))\). This justifies Definition 2.2 in the sense that the right-hand side of (2.4) is well-defined. Note that a similar computation holds true for the velocity field in (NLE) by applying Lemma A.3, thus justifying Definition 2.1 in the sense that the right-hand side of (2.3) is well-defined.
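The last steps of (2.5) rely on how the norms of the mollifier scale with \(\varepsilon \). As a short verification, since \(V_\varepsilon (x)=\varepsilon ^{-d}V_1(x/\varepsilon )\) and \(\nabla V_\varepsilon (x)=\varepsilon ^{-d-1}(\nabla V_1)(x/\varepsilon )\), the change of variables \(z=x/\varepsilon \) gives

$$\begin{aligned} \Vert V_\varepsilon \Vert _{L^\infty }=\varepsilon ^{-d}\Vert V_1\Vert _{L^\infty },\qquad \Vert \nabla V_\varepsilon \Vert _{L^1}=\varepsilon ^{-d-1}\int _{{\mathbb {R}}^{d}}|\nabla V_1(x/\varepsilon )|\,dx=\varepsilon ^{-1}\Vert \nabla V_1\Vert _{L^1}. \end{aligned}$$

Moreover, for any probability measure \(\rho \), one has \(\Vert V_\varepsilon *\rho \Vert _{L^\infty }\le \Vert V_\varepsilon \Vert _{L^\infty }\), and Tonelli's theorem yields \(\Vert |\nabla V_\varepsilon |*\rho \Vert _{L^1}=\Vert \nabla V_\varepsilon \Vert _{L^1}\).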

Definition 2.3

(Weak solution to (PME)) A weak solution to the Cauchy problem for \(m>1\)

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t\rho =\Delta \rho ^m\\ \rho (0,\cdot )=\rho _0 \end{array}\right. } \end{aligned}$$
(PME)

on the time interval [0, T] with initial datum \(\rho _0\in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\cap L^m({{\mathbb {R}}^{d}})\) is an absolutely continuous curve \(\rho \in C([0,T];{{\mathcal {P}}}_2({{\mathbb {R}}^{d}}))\) satisfying the following properties:

  (1) for almost every \(t\in [0,T]\) the measure \(\rho (t)\) has a density with respect to the Lebesgue measure, still denoted by \(\rho (t)\), such that \(\rho \in L^\infty ([0,T];L^m({{\mathbb {R}}^{d}}))\) and \(\nabla \rho ^{\frac{m}{2}}\in L^2([0,T];L^2({{\mathbb {R}}^{d}}))\);

  (2) for any \(\varphi \in C^1_c({\mathbb {R}}^d)\) and all \(t\in [0,T]\) it holds

    $$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x)\rho (t,x)\,dx= \int _{{\mathbb {R}}^{d}}\varphi (x)\rho _0(x)\,dx-\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\cdot \nabla \rho (s,x)^{m}\,dx\,ds; \end{aligned}$$

  (3) \(\rho ^{-\frac{1}{2}}|\nabla \rho ^m|\in L^1([0,T];L^2({{\mathbb {R}}^{d}}))\).

Remark 2.3

For the sake of clarity we point out the weak solution we obtain initially satisfies, for any test function \(\varphi \in C^1_c(\mathbb {R}^d)\),

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x)\rho (t,x)\,dx=\int _{{\mathbb {R}}^{d}}\varphi (x)\rho _0(x)\,dx-2\int _0^t\int _{{\mathbb {R}}^{d}}\rho (s,x)^{\frac{m}{2}} \nabla \varphi (x)\cdot \nabla \rho (s,x)^{\frac{m}{2}}\,dx\,ds. \end{aligned}$$

The chain rule in Sobolev spaces gives sense to \(\nabla \rho ^m\) in \(L^1({{\mathbb {R}}^{d}})\), hence the more standard concept of weak solution for porous medium equation. A further application of the chain rule identifies \(\nabla \rho ^{m}=\frac{m}{m-1}\rho \nabla \rho ^{m-1}\), for \(m\ge 2\); the same result, however, does not hold in the case \(1<m<2\). Further details are provided in the proof of Theorem 2.2 in Sect. 5. Finally, the last condition in Definition 2.3 is a consequence of uniqueness of very weak solutions, cf. [28], and the theory in [1].
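The passage between the formulation in this remark and item (2) of Definition 2.3 rests on the following chain-rule identities, written here formally for a smooth and positive \(\rho \) as an illustration:

$$\begin{aligned} \nabla \rho ^m=\nabla \big (\rho ^{\frac{m}{2}}\big )^2=2\rho ^{\frac{m}{2}}\nabla \rho ^{\frac{m}{2}},\qquad \nabla \rho ^{m}=m\rho ^{m-1}\nabla \rho =\frac{m}{m-1}\rho \,(m-1)\rho ^{m-2}\nabla \rho =\frac{m}{m-1}\rho \nabla \rho ^{m-1}. \end{aligned}$$

The first identity explains the factor 2 in the weak formulation above, while the second matches the structure of the velocity field in (NLE-m) once the mollifier is formally removed.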

Equally, the same concept is extended to general diffusion equations.

Definition 2.4

(Weak solution to (DE)) Let F satisfy (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) for some \(m> 1\). A weak solution to the Cauchy problem

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t\rho =\Delta P(\rho )\\ \rho (0,\cdot )=\rho _0 \end{array}\right. } \end{aligned}$$
(DE)

on the time interval [0, T] with initial datum \(\rho _0\in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\) such that \({\mathcal {F}}[\rho _0]<\infty \) is an absolutely continuous curve \(\rho \in C([0,T];{{\mathcal {P}}}_2({{\mathbb {R}}^{d}}))\) satisfying the following properties:

  (1) for almost every \(t\in [0,T]\) the measure \(\rho (t)\) has a density with respect to the Lebesgue measure, still denoted by \(\rho (t)\), such that \(\rho \in L^\infty ([0,T];L^m({{\mathbb {R}}^{d}}))\) and \(\nabla \rho ^{\frac{m}{2}}\in L^2([0,T];L^2({{\mathbb {R}}^{d}}))\);

  (2) for any \(\varphi \in C^1_c({\mathbb {R}}^d)\) and all \(t\in [0,T]\) it holds

    $$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x)\rho (t,x)\,dx= \int _{{\mathbb {R}}^{d}}\varphi (x)\rho _0(x)\,dx-\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\cdot \nabla P(\rho (s,x))\,dx\,ds; \end{aligned}$$

  (3) \(\rho ^{-\frac{1}{2}}|\nabla P(\rho )|\in L^1([0,T];L^2({{\mathbb {R}}^{d}}))\).

With the previous definitions, we are ready to state the results of this manuscript.

Theorem 2.1

(Existence for (NLE)) Fix \(\varepsilon >0\) and let \(V_1\), the generator of the mollifying sequence \(V_\varepsilon (x) = \varepsilon ^{-d}V_1(x/\varepsilon )\), satisfy (V). Let F satisfy (F1) and (F2) and suppose \({\mathcal {F}}^\varepsilon [\rho _0]<+\infty \). Then, there exist weak measure solutions \(\rho ^\varepsilon \) to (NLE) such that \(\rho ^\varepsilon (0) = \rho _0\).

Theorem 2.2

(\(\lim _{\varepsilon \rightarrow 0}(\text {NLE}) = (\text {DE})\)) Let F satisfy (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) for some \(m>1\). Suppose \(\rho _0\in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\) is such that \({\mathcal {F}}[\rho _0]<\infty \) and \(V_1\) satisfies (V). In the case \(1<m<2\), assume further that \(\textrm{supp}V_1 \subset B_R\) for some \(R>0\). Let \(\rho ^\varepsilon \) be a sequence of weak measure solutions to (NLE) from Theorem 2.1 with initial condition \(\rho ^\varepsilon (0) = \rho _0\). Then, the sequence \(\rho ^\varepsilon \) converges narrowly to the unique weak solution \(\rho \) of (DE) as \(\varepsilon \downarrow 0\).

Remark 2.4

In the case F is a power law given by \(F(x) = \frac{1}{m-1}|x|^m\) for some \(m>1\), all of (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) are fulfilled; in particular, Theorem 2.2 holds for (PME).

At first glance, our compactness estimates only show that a subsequence of \(\rho ^\varepsilon \) converges narrowly to \(\rho \). However, we appeal to [5, 28, 58] which imply that weak solutions to (DE) and (PME) are unique. Hence, the entire sequence converges.

In Theorem 2.1, the construction of weak measure solutions \(\rho ^\varepsilon \) to (NLE) leverages the JKO scheme [37]. Just at the level of the JKO scheme, only (F1) is required (c.f. Proposition 3.1), for which all of the regularised Rényi entropies \({\mathcal {H}}_m^\varepsilon [\rho ] = {\mathcal {H}}_m[V_\varepsilon *\rho ]\) for any \(m\ge 1\) are admissible. In fact, assumption (F2) enters only when verifying \(\rho ^\varepsilon \) is a weak measure solution of (NLE) (c.f. Sect. 3). This excludes \(F(x) = x\log x\), but all the power laws for \(m>1\) are permitted in this consistency result. Moreover, the assumption that (\({{\textbf {F}}}_m\)) holds for some \(m>1\) in Theorem 2.2 is only used to verify that the limit \(\rho \) is a weak solution to (DE). On the other hand, the construction of the limit \(\rho \) from the sequence \(\rho ^\varepsilon \) allows us to relax assumption (\({{\textbf {F}}}_m\)) to any \(m\ge 1\) provided the initial condition \(\rho ^\varepsilon (0) = \rho _0\) belongs to \(L^m\cap L\log L\) (c.f. Sect. 4), thus including all of the regularised Rényi entropies \({\mathcal {H}}_m^\varepsilon \). To summarise, in the specific case of \({\mathcal {F}}^\varepsilon ={\mathcal {H}}_m^\varepsilon \) as the regularised energy, the construction of curves \(\rho ^\varepsilon \) and \(\rho \) without consideration of the respective equations (NLE-m) and (PME) can be done for any \(m\ge 1\). However, our technique requires \(m>1\) to verify that \(\rho ^\varepsilon \) is a weak measure solution of (NLE-m). Moreover, when \(1<m<2\), we insist that the generator \(V_1\) of the mollifying sequence satisfies (V) and has compact support (in the case \(m\ge 2\) only (V) is required). It is certainly interesting to investigate how we can close this gap to \(m=1\) and we leave this direction for future research.

In Theorem 2.2 we prove that the solutions \(\rho ^\varepsilon \) to (NLE) coming from the construction in Theorem 2.1 converge to \(\rho \), the unique weak solution of (DE). It is natural to ask whether other solutions \({\tilde{\rho }}^\varepsilon \) to (NLE) (not necessarily those constructed via the JKO scheme, cf. Sect. 3) also converge to \(\rho \). Actually, under additional assumptions on the nonlinearity F and the mollifier V, the sequence \(\rho ^\varepsilon \) is unique.

Theorem 2.3

(Uniqueness of solutions to (NLE)) Let F satisfy (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) for some \(m>1\). Assume \(V_1\) satisfies (V), \(V_1\in C^2({{\mathbb {R}}^{d}})\), and \(D^2V_1\in L^\infty ({{\mathbb {R}}^{d}})\). Then, the weak measure solution \(\rho ^\varepsilon \) in Theorem 2.1 is unique among absolutely continuous curves \(\rho :[0,T]\rightarrow {{\mathcal {P}}}_2({\mathbb {R}}^d)\) satisfying (NLE) in the sense of Definition 2.1.

The following concluding result is completely analogous to Theorem 1.2 of [27].

Corollary 2.1

(Particle approximation to (DE)) Let F satisfy (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) for some \(m>1\). Assume \(V_1\) satisfies (V), \(V_1\in C^2({{\mathbb {R}}^{d}})\), and \(D^2V_1\in L^\infty ({{\mathbb {R}}^{d}})\). In the case \(1<m<2\), assume moreover that \(\textrm{supp}V_1 \subset B_R\) for some \(R>0\). For any \(t\in [0,T]\), \(N\in {\mathbb {N}}\), the empirical measure \(\rho ^N_\varepsilon (t) = \frac{1}{N}\sum _{j=1}^N \delta _{x^j_\varepsilon (t)}\) is a weak solution to (NLE) provided the particles satisfy the following ODE system

$$\begin{aligned} {\dot{x}}^i_\varepsilon (t) = - \nabla \int _{{\mathbb {R}}^d} V_\varepsilon (x^i_\varepsilon (t) - y) F'\left( \frac{1}{N}\sum _{j=1}^NV_\varepsilon (y - x^j_\varepsilon (t)) \right) dy \quad \forall i=1,\dots , N. \end{aligned}$$

Suppose that (up to a subsequence) as \(\varepsilon \rightarrow 0\) there exists \(N=N(\varepsilon )\rightarrow +\infty \) such that

$$\begin{aligned} e^{-\lambda _F^\varepsilon t} d_W(\rho _\varepsilon ^N(0),\rho (0))\rightarrow 0, \qquad \text{ for } \, \lambda _F^\varepsilon \approx -\varepsilon ^{-2-d(m-1)},\quad t\in [0,T], \end{aligned}$$

with \(\rho _0\in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\) such that \({\mathcal {F}}[\rho _0]<\infty \) and \(T>0\). Then \(\rho _\varepsilon ^N(t)\) converges narrowly to a weak solution of (DE), \(\rho (t)\), for any \(t\in [0,T]\).

In view of Corollary 2.1 and [27], if the initial distribution of particles \(x_\varepsilon ^i\) is cleverly chosen (so that \(d_W(\rho _\varepsilon ^N(0),\rho (0)) = O(1/N)\)), then one can take \(N = o\left( e^{-1/\varepsilon ^{2+d(m-1)}} \right) \) to fulfill the hypothesis on the initial condition. However, it was also suggested in [26, 27] by numerical evidence that a much smaller number of particles \(N\sim \varepsilon ^{-1.01}\) for \(m=2\) in one dimension still yields good accuracy. Bridging this gap between theory and practice is left for future investigation.
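To make the ODE system of Corollary 2.1 concrete, the following is a minimal sketch in one dimension for the power law \(F(u)=u^2\) (that is, \(m=2\)). It assumes, purely for illustration, a Gaussian mollifier \(V_\varepsilon \) with standard deviation \(\varepsilon \); whether this choice satisfies (V) is not checked here. Under that assumption \(F'(u)=2u\) and \(V_\varepsilon *V_\varepsilon \) is a Gaussian of variance \(2\varepsilon ^2\), so the right-hand side reduces to the pairwise sum \(\frac{1}{N\varepsilon ^2}\sum _j (x^i-x^j)\,(V_\varepsilon *V_\varepsilon )(x^i-x^j)\). The function names and the forward Euler time stepping are hypothetical choices and are not taken from the paper or from [26, 27].

```python
import numpy as np


def gaussian(z, var):
    """Centred Gaussian density with variance `var`, evaluated at z."""
    return np.exp(-z**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)


def velocity(x, eps):
    """Right-hand side of the particle ODE for d = 1, F(u) = u^2.

    Assumes a Gaussian mollifier V_eps (illustrative choice), for which
    V_eps * V_eps is a Gaussian with variance 2 * eps^2.
    """
    diff = x[:, None] - x[None, :]           # diff[i, j] = x_i - x_j
    kernel = gaussian(diff, 2.0 * eps**2)    # (V_eps * V_eps)(x_i - x_j)
    return (diff * kernel).sum(axis=1) / (len(x) * eps**2)


def evolve(x0, eps, dt, steps):
    """Forward Euler integration of the particle system (hypothetical solver)."""
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        x = x + dt * velocity(x, eps)
    return x
```

In this sketch the pairwise forces are antisymmetric, so the centre of mass is conserved while the particles spread out, which is the qualitative behaviour one expects from a diffusion.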

3 Results on the nonlocal equation

In this section we focus on (NLE). We show existence of weak measure solutions by means of the JKO scheme [37], which is needed to derive uniform bounds for the nonlocal-to-local limit proven in Sect. 5. Although this is not the main purpose of the paper, and it may be unsurprising, this is indeed an existence result for weak measure solutions to a class of nonlocal PDEs, including nonlocal interactions but not limited to this case. To the best of our knowledge this is the first general result in this context, since the structure of \({\mathcal {F}}^\varepsilon \) does not fit in the classical framework of functionals considered in [1]. Note that we do not require convexity of the functional, as is assumed for instance in [1, 14].

We consider initial data \(\rho _0 \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) such that \(\sup _{\varepsilon >0}{\mathcal {F}}^\varepsilon [\rho _0]<+\infty \). In the case of nonlinear diffusion equations, \(F(x) = \frac{1}{m-1}|x|^m\) with \(m>1\) we denote the corresponding energy functionals by

$$\begin{aligned} {\mathcal {H}}_m^\varepsilon [\rho ] {:=} \frac{1}{m-1}\int _{{{\mathbb {R}}^{d}}}|V_\varepsilon * \rho (x)|^m \, dx. \end{aligned}$$

Remark 3.1

In the case of power laws \(F(x) = \frac{1}{m-1}|x|^m\) for \(m>1\), the condition \(\sup _{\varepsilon >0}{\mathcal {H}}_m^\varepsilon [\rho _0]<+\infty \) is guaranteed when \(\rho _0\in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\cap L^m({{\mathbb {R}}^{d}})\). More precisely, Young’s convolution inequality gives

$$\begin{aligned} {\mathcal {H}}_m^\varepsilon [\rho _0]&=\frac{1}{m-1}\int _{{{\mathbb {R}}^{d}}}|V_\varepsilon * \rho _0(x)|^m\,dx =\frac{1}{m-1}\Vert V_\varepsilon *\rho _0\Vert _{L^m({{\mathbb {R}}^{d}})}^m \\&\le \frac{1}{m-1}\Vert V_\varepsilon \Vert _{L^1}^m\Vert \rho _0\Vert _{L^m}^m=\frac{1}{m-1}\Vert V_1\Vert _{L^1}^m\Vert \rho _0\Vert _{L^m}^m<\infty . \end{aligned}$$
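The two ingredients of this estimate can be spelled out as follows. Young's convolution inequality is used in the form \(\Vert f*g\Vert _{L^m}\le \Vert f\Vert _{L^1}\Vert g\Vert _{L^m}\). The final equality is the scale invariance of the \(L^1\) norm: assuming the usual scaling \(V_\varepsilon (x) = \varepsilon ^{-d}V_1(x/\varepsilon )\) (the definition of \(V_\varepsilon \) is given earlier in the paper and is not repeated in this section), the change of variables \(z = x/\varepsilon \) gives

$$\begin{aligned} \Vert V_\varepsilon \Vert _{L^1} = \int _{{{\mathbb {R}}^{d}}}\varepsilon ^{-d}|V_1(x/\varepsilon )|\,dx = \int _{{{\mathbb {R}}^{d}}}|V_1(z)|\,dz = \Vert V_1\Vert _{L^1}. \end{aligned}$$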

We now proceed with the JKO scheme associated to \({\mathcal {F}}^\varepsilon \). First, we define a sequence recursively as follows:

  • fix a time step \(\tau \in (0,1)\) and set \(\rho _{\tau ,\varepsilon }^0{:=}\rho _0\);

  • for \(n\in {\mathbb {N}}\) and given \(\rho _{\tau ,\varepsilon }^{n}\in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), choose

    $$\begin{aligned} \rho _{\tau ,\varepsilon }^{n+1}\in \mathop {\text {argmin}}\limits _{\rho \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}}\left\{ \frac{d_W^2(\rho _{\tau ,\varepsilon }^{n},\rho )}{2\tau }+{\mathcal {F}}^\varepsilon [\rho ]\right\} . \end{aligned}$$
    (3.1)

The above sequence is well-defined for \(\tau \) sufficiently small independently of \(\varepsilon \) (given explicitly in Lemma A.2).
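As a rough illustration of one step of (3.1), the sketch below restricts the minimisation to empirical measures with N atoms in one dimension, takes \({\mathcal {F}}^\varepsilon = {\mathcal {H}}_2^\varepsilon \), and assumes a Gaussian mollifier (an illustrative choice, not one made in the paper). The transport cost is computed with the fixed pairing \(x_i^n\mapsto x_i\), which is only an upper bound for \(d_W^2\), and the minimiser is approximated by gradient descent with step halving. All names are hypothetical and the result is not the true minimiser over \({{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\).

```python
import numpy as np


def energy(x, eps):
    """H_2^eps of the empirical measure: (1/N^2) sum_ij (V_eps*V_eps)(x_i - x_j).

    Assumes a Gaussian V_eps, so V_eps*V_eps is Gaussian with variance 2*eps^2.
    """
    var = 2.0 * eps**2
    diff = x[:, None] - x[None, :]
    kernel = np.exp(-diff**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return kernel.sum() / len(x) ** 2


def energy_grad(x, eps):
    """Gradient of `energy` with respect to the particle positions."""
    var = 2.0 * eps**2
    diff = x[:, None] - x[None, :]
    kernel = np.exp(-diff**2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)
    return -2.0 * (diff * kernel).sum(axis=1) / (var * len(x) ** 2)


def jko_step(x_prev, eps, tau, iters=200):
    """Approximate one minimising-movement step by monotone gradient descent."""
    n = len(x_prev)

    def objective(x):
        # Pairing cost (an upper bound for d_W^2) divided by 2*tau, plus energy.
        return ((x - x_prev) ** 2).sum() / (2.0 * tau * n) + energy(x, eps)

    x, step = x_prev.copy(), tau * n
    for _ in range(iters):
        grad = (x - x_prev) / (tau * n) + energy_grad(x, eps)
        trial = x - step * grad
        if objective(trial) <= objective(x):
            x = trial        # accept only steps that do not increase the objective
        else:
            step *= 0.5      # otherwise shrink the step and retry
    return x
```

Because only non-increasing steps are accepted, the output satisfies the same kind of one-step inequality that minimality in (3.1) provides when \(\rho _{\tau ,\varepsilon }^n\) is used as a competitor: the transport cost over \(2\tau \) plus the new energy does not exceed the old energy.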

Let \(T>0\) be fixed, and define a piecewise constant interpolation as follows: take \(N{:=}\left[ \frac{T}{\tau }\right] \) the largest integer less than or equal to \(\frac{T}{\tau }\) and set

$$\begin{aligned} \rho _{\tau }^{\varepsilon }(t)=\rho _{\tau ,\varepsilon }^{n}\qquad t\in ((n-1)\tau ,n\tau ], \quad n=0,1,\dots ,N, \end{aligned}$$

where \(\rho _{\tau ,\varepsilon }^{n}\) is defined in (3.1). As is standard, we derive energy and moment bounds sufficient to show narrow compactness.
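The indexing of the piecewise constant interpolation can be sketched in a few lines: for \(t\in ((n-1)\tau ,n\tau ]\) the index is \(n=\lceil t/\tau \rceil \). The helper names below are hypothetical, and the iterates are stored in a plain list purely for illustration.

```python
import math


def interpolation_index(t, tau):
    """Return the index n such that t lies in ((n-1)*tau, n*tau]."""
    return math.ceil(t / tau)


def piecewise_constant(iterates, tau):
    """Build t -> rho_tau(t) from a list of iterates [rho^0, rho^1, ...]."""
    def rho_tau(t):
        return iterates[interpolation_index(t, tau)]
    return rho_tau
```

In floating-point arithmetic the value at grid points \(t=n\tau \) can be sensitive to rounding of `t / tau`, so a production code would probably work with integer step counts instead.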

Proposition 3.1

(Narrow compactness, energy, & moments bound) Let \(0< \varepsilon _0<\infty \) be fixed and suppose F satisfies (F1). There exists an absolutely continuous curve \({\tilde{\rho }}^\varepsilon : [0,T]\rightarrow {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) such that the piecewise constant interpolation \(\rho _{\tau }^{\varepsilon }\) admits a subsequence \(\rho _{\tau _k}^\varepsilon \) narrowly converging to \({\tilde{\rho }}^\varepsilon \) uniformly in \(t\in [0,T]\) and \(0 < \varepsilon \le \varepsilon _0\) as \(k\rightarrow +\infty \). Moreover, for any \(t\in [0,T]\), the following uniform bounds in \(\tau \) and \(0< \varepsilon \le \varepsilon _0\) hold

$$\begin{aligned} {\mathcal {F}}^\varepsilon [{\tilde{\rho }}^\varepsilon (t)]&\le \sup _{\varepsilon>0}{\mathcal {F}}^\varepsilon [\rho _0],\quad m_2({\tilde{\rho }}^\varepsilon )\le C\left( T, \, m_2(\rho _0), \, \varepsilon _0^2 m_2(V_1), \, \sup _{\varepsilon >0}{\mathcal {F}}^\varepsilon [\rho _0]\right) , \end{aligned}$$

where \(C\left( T, \, m_2(\rho _0), \, \varepsilon _0^2m_2(V_1), \, \sup _{\varepsilon>0}{\mathcal {F}}^\varepsilon [\rho _0]\right) >0\) is a uniform constant depending only on the quantities in the brackets.

The following proof is based on [1, 37].

Proof

From the definition of the sequence \(\{\rho _{\tau ,\varepsilon }^{n}\}_{n=0,\dots ,N}\) it holds

$$\begin{aligned} \frac{d_W^2(\rho _{\tau ,\varepsilon }^{n},\rho _{\tau ,\varepsilon }^{n+1})}{2\tau }+{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]\le {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n}], \quad \forall n=0,\dots ,N-1. \end{aligned}$$
(3.3)

which implies \({\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]\le {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n}]\), and, in particular, the following bound for the regularised internal energy

$$\begin{aligned} \sup _{0 \le n \le N, \, N\tau \le T} {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n}]\le {\mathcal {F}}^\varepsilon [\rho _0], \end{aligned}$$
(3.4)

where the supremum is over all \(n=0,\dots ,N\) and \(\tau \in (0,1)\) such that \(N\tau \le T\) with \(N {:=} \left[ \frac{T}{\tau }\right] \). By summing up over k in inequality (3.3), we obtain

$$\begin{aligned} \sum _{k=m}^n\frac{d_W^2(\rho _{\tau ,\varepsilon }^k,\rho _{\tau ,\varepsilon }^{k+1})}{2\tau }\le {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{m}]-{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}], \quad \forall 0 \le m \le n \le N-1. \end{aligned}$$
(3.5)

Bounded second moment: We claim the existence of some uniform constant \(C>0\) (depending on the quantities discussed in the statement of this result) such that

$$\begin{aligned} \sup _{0 \le n \le N, \, N\tau \le T}m_2(\rho _{\tau ,\varepsilon }^n) \le C. \end{aligned}$$
(3.6)

By Remark A.1, for any fixed \(n=0,\dots ,N-1\), we begin with

$$\begin{aligned} m_2(\rho _{\tau , \varepsilon }^{n+1})\le 2d_W^2(\rho _0,\rho _{\tau ,\varepsilon }^{n+1}) + 2m_2(\rho _0). \end{aligned}$$

We use the triangle inequality and Cauchy-Schwarz to estimate the \(d_W^2\) term

$$\begin{aligned} m_2(\rho _{\tau ,\varepsilon }^{n+1}) \le 2(n+1) \sum _{k=0}^{n}d_W^2(\rho _{\tau ,\varepsilon }^k, \, \rho _{\tau , \varepsilon }^{k+1}) + 2m_2(\rho _0). \end{aligned}$$

We replace the summation with (3.5) and use \(n+1 \le N\) to obtain

$$\begin{aligned} m_2(\rho _{\tau , \varepsilon }^{n+1})\le 4 T ({\mathcal {F}}^\varepsilon [\rho _0] - {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]) + 2m_2(\rho _0). \end{aligned}$$

We insert the lower bound for \({\mathcal {F}}^\varepsilon \) from (A.1) so that we have

$$\begin{aligned} m_2(\rho _{\tau ,\varepsilon }^{n+1}) \le 4T \left( {\mathcal {F}}^\varepsilon [\rho _0] +c_1 + c_2 C_{d,\alpha }(1 + \varepsilon ^2 m_2(V_1) + m_2(\rho _{\tau ,\varepsilon }^{n+1}))^\alpha \right) + 2m_2(\rho _0). \end{aligned}$$

Keeping in mind that we can assume \(\alpha < 1\) without loss of generality, this final inequality implies the bound (3.6). This can be seen by analysing sequences \(x_n\ge 0\) satisfying \(x_n \le C_1 + C_2x_n^\alpha \).
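One elementary way to see this, sketched here for the model inequality \(x \le C_1 + C_2x^\alpha \) with \(x\ge 0\), generic constants \(C_1, C_2>0\), and \(0<\alpha <1\): if \(x > (2C_2)^{1/(1-\alpha )}\), then \(x^{1-\alpha } > 2C_2\), that is \(C_2x^\alpha < x/2\), and the inequality yields \(x \le C_1 + x/2\). Consequently,

$$\begin{aligned} x \le \max \left\{ 2C_1, \, (2C_2)^{\frac{1}{1-\alpha }}\right\} , \end{aligned}$$

a bound that depends only on \(C_1\), \(C_2\), and \(\alpha \). The inequality in the proof has the slightly different form with \((1 + \varepsilon ^2 m_2(V_1) + x)^\alpha \) on the right-hand side, which can be reduced to the model case by applying the same reasoning to \(y = 1 + \varepsilon ^2 m_2(V_1) + x\) and using \(\varepsilon \le \varepsilon _0\).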

Bounded squared 2-Wasserstein distance: We claim the existence of some uniform constant \(c>0\) such that

$$\begin{aligned} \sum _{k=m}^nd_W^2(\rho _{\tau ,\varepsilon }^k, \, \rho _{\tau ,\varepsilon }^{k+1}) \le c\tau , \quad \forall 0 \le m \le n \le N-1. \end{aligned}$$
(3.7)

We insert the upper bound of \({\mathcal {F}}^\varepsilon \) (3.4) and the lower bound of \({\mathcal {F}}^\varepsilon \) (A.1) into (3.5) to obtain

$$\begin{aligned} \sum _{k=m}^nd_W^2(\rho _{\tau ,\varepsilon }^k, \, \rho _{\tau ,\varepsilon }^{k+1})&\le 2\tau ({\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{m}]-{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]) \\&\le 2\tau ({\mathcal {F}}^\varepsilon [\rho _0] +c_1 + c_2 C_{d,\alpha }(1 + \varepsilon ^2 m_2(V_1) + m_2(\rho _{\tau ,\varepsilon }^{n+1}))^\alpha ). \end{aligned}$$

By the uniform second moment estimate (3.6), the inequality (3.7) is verified.

Compactness: Now, let us consider \(0< s<t\) such that \(s\in ((m-1)\tau ,m\tau ]\) and \(t\in ((n-1)\tau ,n\tau ]\) (which implies \(|n-m|<\frac{|t-s|}{\tau }+1\)); by Cauchy-Schwarz inequality and (3.7), we obtain

$$\begin{aligned} \begin{aligned} d_W(\rho _{\tau }^{\varepsilon }(s),\rho _{\tau }^{\varepsilon }(t))&\le \sum _{k=m}^{n-1}d_W(\rho _{\tau ,\varepsilon }^{k},\rho _{\tau ,\varepsilon }^{k+1})\le \left( \sum _{k=m}^{n-1}d_W^2(\rho _{\tau ,\varepsilon }^{k},\rho _{\tau ,\varepsilon }^{k+1})\right) ^{\frac{1}{2}}|n-m|^{\frac{1}{2}}\\ {}&\le c \left( \sqrt{|t-s|}+\sqrt{\tau }\right) , \end{aligned} \end{aligned}$$
(3.8)

where c is a positive constant. Thus \(\rho _{\tau }^{\varepsilon }\) is \(\frac{1}{2}\)-Hölder equicontinuous, up to a negligible error of order \(\sqrt{\tau }\). By using a refined version of Ascoli-Arzelà’s theorem, [1, Proposition 3.3.1], we obtain that \(\rho _{\tau }^{\varepsilon }\) admits a subsequence \(\rho _{\tau _k}^\varepsilon \) narrowly converging to a limit \({\tilde{\rho }}^\varepsilon \) as \(k\rightarrow +\infty \) uniformly on [0, T]. Since \(|\cdot |^2\) is lower semicontinuous and bounded from below, we actually have for any \(t\in [0,T]\)

$$\begin{aligned} \liminf _{k\rightarrow +\infty }\int _{{\mathbb {R}}^{d}}|x|^2\,d\rho _{\tau _k}^\varepsilon (x)\ge \int _{{\mathbb {R}}^{d}}|x|^2\,d{\tilde{\rho }}^\varepsilon (x). \end{aligned}$$

Moreover, \({\mathcal {F}}^\varepsilon \) is lower semicontinuous and bounded from below since \(V_\varepsilon *\rho \) is bounded. Then an application of Fatou’s lemma implies

$$\begin{aligned} \liminf _{k\rightarrow +\infty }{\mathcal {F}}^\varepsilon [\rho _{\tau _k}^\varepsilon ]&\ge {\mathcal {F}}^\varepsilon [{\tilde{\rho }}^\varepsilon ], \end{aligned}$$

whence the claim follows by applying the above inequalities to (3.4) and (3.6). \(\square \)

Next, we show that \({\tilde{\rho }}^\varepsilon \) provided by Proposition 3.1 is indeed a solution to (NLE), thus proving Theorem 2.1. Since we make use of (F2), the theorem below does not include linear diffusion corresponding to \(F(x) = x\log x\).

Proof of Theorem 2.1

Let us consider two consecutive elements of the sequence \(\{\rho _{\tau ,\varepsilon }^{n}\}_{n\in {\mathbb {N}}}\) defined from the JKO step (3.1), i.e. \(\rho _{\tau ,\varepsilon }^{n}\) and \(\rho _{\tau ,\varepsilon }^{n+1}\). We perturb \(\rho _{\tau ,\varepsilon }^{n+1}\) by using the map \(P^\sigma =\textrm{id}+\sigma \zeta \), for some \(\zeta \in C_c^\infty ({{\mathbb {R}}^{d}};{{\mathbb {R}}^{d}})\) and \(\sigma >0\), that is we consider the perturbation

$$\begin{aligned} \rho ^\sigma {:=} P_\#^\sigma \rho _{\tau ,\varepsilon }^{n+1}. \end{aligned}$$
(3.9)

Since \(\rho _{\tau ,\varepsilon }^{n+1}\) is a minimiser in (3.1), we have

$$\begin{aligned} \frac{1}{2\tau }\left[ \frac{d_W^2(\rho _{\tau ,\varepsilon }^{n},\rho ^\sigma )- d_W^2(\rho _{\tau ,\varepsilon }^{n}, \rho _{\tau ,\varepsilon }^{n+1})}{\sigma }\right] +\frac{{\mathcal {F}}^\varepsilon [\rho ^\sigma ]-{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]}{\sigma }\ge 0. \end{aligned}$$
(3.10)

We now let \(\sigma \rightarrow 0\) in (3.10) analysing the two terms involved separately.

The energy functional terms in (3.10): In this part of the proof, we aim to show

$$\begin{aligned} \frac{{\mathcal {F}}^\varepsilon [\rho ^\sigma ] - {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]}{\sigma } \rightarrow \int _{{{\mathbb {R}}^{d}}}\zeta (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1})](x) \, d\rho _{\tau ,\varepsilon }^{n+1}(x), \quad \sigma \rightarrow 0. \end{aligned}$$
(3.11)

We apply the mean-value form of the Taylor expansion to F

$$\begin{aligned} \begin{aligned}&\frac{1}{\sigma }\int _{{{\mathbb {R}}^{d}}}(F(V_\varepsilon * \rho ^\sigma (x)) - F(V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}))dx = \frac{1}{\sigma }\int _{{{\mathbb {R}}^{d}}}(V_\varepsilon *\rho ^\sigma (x) \\&\qquad - V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}(x))\underbrace{\int _0^1 F'\left( tV_\varepsilon *\rho ^\sigma (x) + (1-t)V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}(x) \right) dt}_{=:M_\varepsilon ^\sigma (x)}\, dx \\&\quad = \frac{1}{\sigma }\int _{{{\mathbb {R}}^{d}}} (V_\varepsilon * M_\varepsilon ^\sigma )(x) d[\rho ^\sigma - \rho _{\tau ,\varepsilon }^{n+1}](x) \\&\quad = \int _{{{\mathbb {R}}^{d}}}\frac{(V_\varepsilon *M_\varepsilon ^\sigma )(P^\sigma (x)) - (V_\varepsilon *M_\varepsilon ^\sigma )(x)}{\sigma }d\rho _{\tau ,\varepsilon }^{n+1}(x) \\&\quad = \int _{{{\mathbb {R}}^{d}}}\left\{ \int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy\right\} d\rho _{\tau ,\varepsilon }^{n+1}(x). \end{aligned} \end{aligned}$$
(3.12)

In the last few lines, we used the definition of \(\rho ^\sigma \) from (3.9) and expanded the convolution. The limit (3.11) is achieved by applying Egorov’s theorem. First, we prove convergence up to sets of \(\rho _{\tau ,\varepsilon }^{n+1}\)-measure zero for the term in curly brackets, which is a sequence of functions of x, i.e.:

$$\begin{aligned} \begin{aligned} \int _{{{\mathbb {R}}^{d}}}&\left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy \\&\rightarrow \zeta (x)\cdot \int _{{{\mathbb {R}}^{d}}}\nabla V_\varepsilon (x-y) F'(V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}(y))dy, \quad \sigma \rightarrow 0, \, \rho _{\tau ,\varepsilon }^{n+1}\text {-almost every }x\in {{\mathbb {R}}^{d}}. \end{aligned} \end{aligned}$$
(3.13)

This is exactly \(\zeta (x)\cdot \nabla V_\varepsilon * [F'(V_{\varepsilon }*\rho _{\tau ,\varepsilon }^{n+1})](x)\) which appears as the integrand in (3.11). Assuming this is true for now, by Egorov’s theorem, for every \(\eta >0\), there exists a measurable set \(S_\eta \subset {{\mathbb {R}}^{d}}\) such that \(\rho _{\tau ,\varepsilon }^{n+1}(S_\eta ) < \eta \) and the convergence (3.13) is uniform on \({{\mathbb {R}}^{d}}{\setminus } S_\eta \). Continuing from the last line of (3.12), we have

$$\begin{aligned} \begin{aligned}&\frac{{\mathcal {F}}^\varepsilon [\rho ^\sigma ] - {\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]}{\sigma } \\&\quad =\int _{S_\eta } \left\{ \int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy\right\} d\rho _{\tau ,\varepsilon }^{n+1}(x) \\&\qquad +\int _{{{\mathbb {R}}^{d}}\setminus S_\eta } \left\{ \int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy\right\} d\rho _{\tau ,\varepsilon }^{n+1}(x). \end{aligned} \end{aligned}$$
(3.14)

The integral over \({{\mathbb {R}}^{d}}\setminus S_\eta \) passes to the limit \(\sigma \rightarrow 0\) owing to (3.13) and Egorov’s theorem, so (3.11) is achieved once we show that the integral over \(S_\eta \) is small. We apply the mean-value form of Taylor’s theorem for \(V_\varepsilon \) and Lemma A.3 (with \(\rho = t\rho ^\sigma + (1-t)\rho _{\tau ,\varepsilon }^{n+1}\) and \(C=1\)) to estimate \(M_\varepsilon ^\sigma \) and obtain

$$\begin{aligned}&\left| \int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy\right| \\&\quad \le \Vert F'\Vert _{L^\infty ([0,\,\Vert V_\varepsilon \Vert _{L^\infty }])} |\zeta (x)|\int _{{{\mathbb {R}}^{d}}} \int _0^1 |\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)|ds dy \\&\quad = \Vert F'\Vert _{L^\infty ([0,\,\Vert V_\varepsilon \Vert _{L^\infty }])}|\zeta (x)|\int _0^1 \int _{{{\mathbb {R}}^{d}}} |\nabla V_\varepsilon (x + s \sigma \zeta (x) - y)|dy ds\\&\quad = \Vert F'\Vert _{L^\infty ([0,\,\Vert V_\varepsilon \Vert _{L^\infty }])}|\zeta (x)|\int _0^1 \int _{{{\mathbb {R}}^{d}}} |\nabla V_\varepsilon (z)|dz ds \\&\quad = \Vert F'\Vert _{L^\infty ([0,\,\Vert V_\varepsilon \Vert _{L^\infty }])}\Vert \nabla V_\varepsilon \Vert _{L^1}|\zeta (x)|. \end{aligned}$$

In the intermediate equalities, we have used Fubini’s theorem and the linear change of variables \(z = x + s\sigma \zeta (x) - y\) for fixed x. Therefore, the integral over \(S_\eta \) from (3.14) can be estimated by

$$\begin{aligned}&\left| \int _{S_\eta } \left\{ \int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy\right\} d\rho _{\tau ,\varepsilon }^{n+1}(x) \right| \\&\quad \le \Vert \zeta \Vert _{L^\infty } \Vert F'\Vert _{L^\infty ([0,\,\Vert V_\varepsilon \Vert _{L^\infty }])} \Vert \nabla V_\varepsilon \Vert _{L^1} \, \eta , \end{aligned}$$

which is negligible by taking \(\eta \rightarrow 0\).

It remains to prove (3.13). Throughout this step, we fix \(x\in {{\mathbb {R}}^{d}}\). We again use the mean-value form of Taylor’s theorem to rewrite the difference quotient appearing in (3.13)

$$\begin{aligned}&\int _{{{\mathbb {R}}^{d}}} \left( \frac{V_\varepsilon (P^\sigma (x) - y) - V_\varepsilon (x-y)}{\sigma }\right) M_\varepsilon ^\sigma (y)dy \\&\quad = \zeta (x)\cdot \int _{{{\mathbb {R}}^{d}}}\left( \int _0^1\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)\,ds\right) M_\varepsilon ^\sigma (y)dy. \end{aligned}$$

We majorise the integrand with the sequence

$$\begin{aligned} \left| \int _0^1\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)\,ds \, M_\varepsilon ^\sigma (y) \right| \le \Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])}\int _0^1|\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)|ds. \end{aligned}$$

We seek to apply Theorem A.1 on \(X = {{\mathbb {R}}^{d}}\) with

$$\begin{aligned} f^\sigma (y)&{:=} \int _0^1\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)\,ds \, M_\varepsilon ^\sigma (y), \\ g^\sigma (y)&{:=} \Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])}\int _0^1|\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)|ds. \end{aligned}$$

We have already shown the majorisation \(|f^\sigma (y)|\le g^\sigma (y)\) and the convergence

$$\begin{aligned} f^\sigma (y) \rightarrow f(y)&{:=} \nabla V_\varepsilon (x-y) F'(V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1}(y)), \\ g^\sigma (y) \rightarrow g(y)&{:=} \Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])} |\nabla V_\varepsilon (x-y)|, \end{aligned}$$

for almost every \(y\in {{\mathbb {R}}^{d}}\) can be proven using the usual Dominated Convergence Theorem. In particular, the growth estimate \(|\nabla V_1(z)| \le C(1+|z|)\) treats the integration \(\int _0^1\, ds\). On the other hand, for \(M_\varepsilon ^\sigma (y)\), the composition \(F'(tV_\varepsilon *\rho ^\sigma (y) + (1-t)V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1}(y))\) is bounded uniformly in \(\sigma \) by Lemma A.3. We verify the last assumption of Theorem A.1 using Fubini and the change of variables \(z = -s\sigma \zeta (x) + y\).

$$\begin{aligned} \int _{{{\mathbb {R}}^{d}}}g^\sigma (y) dy&= \Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])} \int _{{{\mathbb {R}}^{d}}} \int _0^1|\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)|ds dy \\&=\Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])} \int _0^1 \int _{{{\mathbb {R}}^{d}}}|\nabla V_\varepsilon (x + s\sigma \zeta (x) - y)|dy \, ds \\&= \Vert F'\Vert _{L^\infty ([0,\Vert V_\varepsilon \Vert _{L^\infty }])} \int _0^1 \int _{{{\mathbb {R}}^{d}}}|\nabla V_\varepsilon (x-z)|dz \, ds \\&= \int _{{{\mathbb {R}}^{d}}}g(y) dy. \end{aligned}$$

Therefore, we can apply Theorem A.1 and (3.13) is established.

The 2-Wasserstein terms in (3.10): the treatment here is standard and we reproduce the proof in [7, Theorem 3.1] for completeness. Consider an optimal transport plan \(\gamma _{\tau ,\varepsilon }^{n} \in \Gamma _o(\rho _{\tau ,\varepsilon }^n, \, \rho _{\tau ,\varepsilon }^{n+1})\) between \(\rho _{\tau ,\varepsilon }^n\) and \(\rho _{\tau ,\varepsilon }^{n+1}\). By definition of \(d_W\), we have

$$\begin{aligned} \begin{aligned} \frac{1}{2\tau }\left[ \frac{d_W^2(\rho _{\tau ,\varepsilon }^{n}, \rho ^\sigma )-d_W^2(\rho _{\tau ,\varepsilon }^{n}, \rho _{\tau ,\varepsilon }^{n+1})}{\sigma }\right]&\le \frac{1}{2\tau \sigma }\iint _{{\mathbb {R}}^{2d}}\left( |x-P^\sigma (y)|^2 -|x-y|^2\right) \,d\gamma _{\tau ,\varepsilon }^n(x,y)\\&=\frac{1}{2\tau \sigma }\iint _{{\mathbb {R}}^{2d}}\left( |x-y-\sigma \zeta (y)|^2 -|x-y|^2\right) \,d\gamma _{\tau ,\varepsilon }^n(x,y)\\&=-\frac{1}{\tau }\iint _{{\mathbb {R}}^{2d}}(x-y)\cdot \zeta (y)\,d\gamma _{\tau ,\varepsilon }^n(x,y)+O(\sigma ), \end{aligned} \end{aligned}$$

where in the last equality we applied a first order Taylor expansion. By sending \(\sigma \) to 0 and recalling (3.10), it holds

$$\begin{aligned} \frac{1}{\tau }\iint _{{\mathbb {R}}^{2d}}(x-y)\cdot \zeta (y)\,d\gamma _{\tau ,\varepsilon }^n(x,y)\le \int _{{{\mathbb {R}}^{d}}}\zeta (x)\cdot \nabla V_\varepsilon *[F'(V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1})](x) d\rho _{\tau ,\varepsilon }^{n+1}(x). \end{aligned}$$
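For completeness, the expansion of the Wasserstein difference quotient used in this step can be written out exactly, since the integrand is a quadratic polynomial in \(\sigma \): for \(\sigma >0\),

$$\begin{aligned} \frac{|x-y-\sigma \zeta (y)|^2 -|x-y|^2}{2\tau \sigma } = -\frac{1}{\tau }(x-y)\cdot \zeta (y) + \frac{\sigma }{2\tau }|\zeta (y)|^2. \end{aligned}$$

Integrated against the transport plan, which is a probability measure, the remainder is at most \(\frac{\sigma }{2\tau }\Vert \zeta \Vert _{L^\infty }^2\) and therefore vanishes as \(\sigma \rightarrow 0\) for fixed \(\tau \).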

Repeating the same computation for \(\sigma <0\), we actually obtain an equality, that is, for \(\zeta =\nabla \varphi \)

$$\begin{aligned} \begin{aligned} \frac{1}{\tau }\!\iint _{{\mathbb {R}}^{2d}}(x-y)\cdot \nabla \varphi (y)d\gamma _{\tau ,\varepsilon }^n(x,y)\!=\!\int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon *[F'(V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1})](x) d\rho _{\tau ,\varepsilon }^{n+1}(x). \end{aligned} \end{aligned}$$
(3.15)

Note that the Hölder estimate (3.8) and \((x-y)\cdot \nabla \varphi (y)=\varphi (x)-\varphi (y)+o(|x-y|^2)\) imply

$$\begin{aligned} \frac{1}{\tau }\iint _{{\mathbb {R}}^{2d}}(x-y)\cdot \nabla \varphi (y)\,d\gamma _{\tau ,\varepsilon }^n(x,y)=\frac{1}{\tau }\int _{{\mathbb {R}}^{d}}\varphi (x)\,d(\rho _{\tau ,\varepsilon }^{n}-\rho _{\tau ,\varepsilon }^{n+1})(x) + O(\tau ). \end{aligned}$$

Now, let \(0\le s<t\) be fixed, with

$$\begin{aligned} h=\left[ \frac{s}{\tau }\right] +1\quad \text {and}\quad k=\left[ \frac{t}{\tau }\right] . \end{aligned}$$

Taking into account the last equality, by summing in (3.15) over j from h to k, we obtain

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _{\tau ,\varepsilon }^{k+1}-&\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _{\tau ,\varepsilon }^h+O(\tau ^2)\\&\quad =-\tau \sum _{j=h}^k \int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon *\rho _{\tau ,\varepsilon }^{j+1})](x)d\rho _{\tau ,\varepsilon }^{j+1}(x), \end{aligned}$$

which is equivalent to

$$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _{\tau }^{\varepsilon }(t)(x)-\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _{\tau }^{\varepsilon }(s)(x)+O(\tau ^2)\\&\quad =-\int _s^t\int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon (r))](x)\,d\rho _{\tau }^{\varepsilon }(r)(x)\,dr. \end{aligned} \end{aligned}$$
(3.16)

It remains to pass to the limit \(\tau \downarrow 0\) up to a subsequence for \(\rho _\tau ^\varepsilon \rightharpoonup {\tilde{\rho }}^\varepsilon \) as in Proposition 3.1. More specifically, the result there states that \(\rho _\tau ^\varepsilon \) narrowly converges uniformly in \(t\in [0,T]\) to \({\tilde{\rho }}^\varepsilon \) as (a subsequence of) \(\tau \downarrow 0\). Clearly, the left-hand side of (3.16) passes to the limit \(\tau \downarrow 0\), so we only focus on the right-hand side. Let us take the following statement for granted: for fixed \(\varepsilon >0\) and almost every \(r\in [0,T]\), we have

$$\begin{aligned} \begin{aligned}&\int _{{{\mathbb {R}}^{d}}}\!\!\nabla \varphi (x)\!\cdot \! \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon (r))](x)\,d\rho _{\tau }^{\varepsilon }(r)(x) \\&\quad \rightarrow \int _{{{\mathbb {R}}^{d}}}\!\!\nabla \varphi (x)\!\cdot \!\nabla V_\varepsilon * [F'(V_\varepsilon * {\tilde{\rho }}^\varepsilon (r))](x)\,d{\tilde{\rho }}^\varepsilon (r)(x),\\&\qquad \tau \downarrow 0, \, \text {almost every }r\in [0,T]. \end{aligned} \end{aligned}$$
(3.17)

Passing to the limit \(\tau \downarrow 0\) on the right-hand side of (3.16) reduces to finding an \(L^1((s,t);dr)\) majorant, assuming (3.17) holds. By Young’s convolution inequality and Lemma A.3, we have

$$\begin{aligned} |\nabla V_\varepsilon * [F'(V_\varepsilon *\rho _\tau ^\varepsilon (r))](x)| \le \Vert \nabla V_\varepsilon \Vert _{L^1}\Vert F'\Vert _{L^\infty ([0, \, \Vert V_\varepsilon \Vert _{L^\infty }])}. \end{aligned}$$

Overall, this implies the uniform estimate in \(\tau >0\)

$$\begin{aligned}&\quad \left| \int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon (r))](x)\,d\rho _{\tau }^{\varepsilon }(r)(x) \right| \le \Vert \nabla \varphi \Vert _{L^\infty }\Vert \nabla V_\varepsilon \Vert _{L^1}\Vert F'\Vert _{L^\infty ([0, \, \Vert V_\varepsilon \Vert _{L^\infty }])}. \end{aligned}$$

Hence, we can pass to the limit \(\tau \downarrow 0\) in the right-hand side of (3.16) and conclude.

Let us prove (3.17). We fix \(r\in [0,T]\) and henceforth drop the explicit dependence on this variable. We add and subtract

$$\begin{aligned}&\int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon )](x)\,d\rho _{\tau }^{\varepsilon }(x) - \int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * {\tilde{\rho }}^\varepsilon )](x)\,d{\tilde{\rho }}^\varepsilon (x) \nonumber \\&\quad =\int _{\textrm{supp}\varphi }\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon ) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon )](x)\,d\rho _{\tau }^{\varepsilon }(x) \end{aligned}$$
(3.18)
$$\begin{aligned}&\qquad + \int _{\textrm{supp}\varphi }\nabla \varphi (x)\cdot \nabla V_\varepsilon * [F'(V_\varepsilon * {\tilde{\rho }}^\varepsilon )](x)\,d[\rho _\tau ^\varepsilon -{\tilde{\rho }}^\varepsilon ](x). \end{aligned}$$
(3.19)

Fix small \(\eta >0\) and find \(R>1\) large enough such that \(\int _{{{\mathbb {R}}^{d}}\setminus B_R}|\nabla V_\varepsilon (y)| dy < \eta \) where \(B_R\) denotes the open ball of radius R centred at the origin. We begin with the difference in (3.18) by expanding the convolution

$$\begin{aligned} \left| \nabla V_\varepsilon * [F'(V_\varepsilon * \rho _\tau ^\varepsilon ) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon )](x)\right|&\le \int _{{{\mathbb {R}}^{d}}} |\nabla V_\varepsilon (y)| | F'(V_\varepsilon * \rho _\tau ^\varepsilon (x-y)) \\&\quad - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y)) |dy. \end{aligned}$$

Up to a further subsequence, Corollary A.1 gives

$$\begin{aligned} \sup _{x\in \textrm{supp}\varphi , \, y \in {\bar{B}}_R} | F'(V_\varepsilon * \rho _\tau ^\varepsilon (x-y)) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y)) | < \eta , \end{aligned}$$

for \(\tau >0\) sufficiently small. Hence,

$$\begin{aligned} \int _{B_R} |\nabla V_\varepsilon (y)| | F'(V_\varepsilon * \rho _\tau ^\varepsilon (x-y)) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y)) |dy < \Vert \nabla V_\varepsilon \Vert _{L^1}\eta . \end{aligned}$$
(3.20)

Concerning the integral over \({{\mathbb {R}}^{d}}\setminus B_R\), we apply Lemma A.3 to obtain (uniformly in \(\tau >0\))

$$\begin{aligned} \int _{{{\mathbb {R}}^{d}}\setminus B_R} |\nabla V_\varepsilon (y)| | F'(V_\varepsilon * \rho _\tau ^\varepsilon (x-y)) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y)) |dy \,\le \, 2\Vert F'\Vert _{L^\infty ([0, \, \Vert V_\varepsilon \Vert _{L^\infty }])} \eta . \end{aligned}$$
(3.21)

These inequalities imply that the integral in (3.18) can be made arbitrarily small in the limit \(\tau \downarrow 0\).

Turning to the difference in (3.19), we only need to show that \(\nabla V_\varepsilon * [F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon )]\) is continuous on \(\textrm{supp}\varphi \). Then, we can appeal to the narrow convergence \(\rho _\tau ^\varepsilon \rightharpoonup {\tilde{\rho }}^\varepsilon \) in duality with continuous and bounded functions. Let \(x^n\in \textrm{supp}\varphi \) be a sequence which converges to \(x\in \textrm{supp}\varphi \). We compare the difference

$$\begin{aligned}&\quad \nabla V_\varepsilon *F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon )(x^n) - \nabla V_\varepsilon *F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon )(x) \\&\quad = \int _{B_R} \nabla V_\varepsilon (y) [F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x^n-y)) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y))]dy \\&\qquad + \int _{{{\mathbb {R}}^{d}}\setminus B_R} \nabla V_\varepsilon (y) [F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x^n-y)) - F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (x-y))]dy. \end{aligned}$$

The integral over \(B_R\) can be made arbitrarily small as \(n\rightarrow \infty \) owing to the uniform continuity of \(F'(V_\varepsilon *{\tilde{\rho }}^\varepsilon (\cdot ))\) from Corollary A.1 and integrability of \(\nabla V_\varepsilon \). This is similar to what is done for (3.20). The other integral over \({{\mathbb {R}}^{d}}{\setminus } B_R\) can be made arbitrarily small by the same argument for (3.21). \(\square \)

4 Compactness in the limit \(\varepsilon \downarrow 0\)

This section discusses the construction of a limit \(\rho \) for a subsequence of \(\{{\tilde{\rho }}^\varepsilon \}_{\varepsilon >0}\). The key estimate is Lemma 4.1 which we are able to prove for general functions F satisfying (F1), (F3), and the growth conditions (\({{\textbf {F}}}_m\)).

Assumptions (F1), (F3), and (\({{\textbf {F}}}_m\)) cover all the power laws \(F(x) = \frac{1}{m-1}|x|^m\) for \(m>1\) and \(F(x) = x\log x\) (corresponding to \(m=1\)). In the case \(m>1\), (\({{\textbf {F}}}_m\)) implies (F2) since \(F'\) can be extended to \(x=0\). More precisely, if \(F''\) satisfies the bounds in (\({{\textbf {F}}}_m\)) for some \(m>1\), then \(F''\) is locally integrable around 0. By the fundamental theorem of calculus,

$$\begin{aligned} F'(x) = F'(1) - \int _x^1 F''(t)dt, \quad \forall x>0. \end{aligned}$$

Owing to Lebesgue’s dominated convergence theorem, the right-hand side has a limit as \(x\downarrow 0\) and therefore so does the left-hand side which we call \(F'(0) {:=} \lim _{x\downarrow 0} F'(x)\).
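As a quick check of the integrability claim, here is a sketch under the assumption (not restated in this section) that (\({{\textbf {F}}}_m\)) provides pointwise bounds of the form \(0\le c_1t^{m-2}\le F''(t)\le c_2 t^{m-2}\) for \(t>0\), which is the form consistent with Remark 4.1 and (4.7). For \(m>1\) and \(0<x<1\),

$$\begin{aligned} 0\le \int _x^1 F''(t)\,dt \le c_2\int _0^1 t^{m-2}\,dt = \frac{c_2}{m-1} < +\infty , \end{aligned}$$

so the integrals \(\int _x^1 F''(t)\,dt\) increase to a finite limit as \(x\downarrow 0\).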

Remark 4.1

(Comments on (\({{\textbf {F}}}_m\))) Combining (\({{\textbf {F}}}_m\)) with the assumption \(F(0)=0\) from (F1) gives, for \(m>1\),

$$\begin{aligned} \frac{c_1}{m(m-1)}x^m \le F(x) - F'(0)x \le \frac{c_2}{m(m-1)}x^m. \end{aligned}$$
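For completeness, here is a sketch of how this two-sided bound can be obtained, assuming that (\({{\textbf {F}}}_m\)) is the pointwise bound \(c_1x^{m-2}\le F''(x)\le c_2x^{m-2}\) for \(x>0\) (this form is inferred from (4.7) and is an assumption here). Integrating once over \((0,x)\) and using \(F'(0) = \lim _{x\downarrow 0}F'(x)\) gives

$$\begin{aligned} \frac{c_1}{m-1}x^{m-1} \le F'(x) - F'(0) \le \frac{c_2}{m-1}x^{m-1}, \end{aligned}$$

and integrating once more over \((0,x)\) with \(F(0)=0\) yields the displayed inequality for \(F(x) - F'(0)x\).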

The inequalities above and the uniform bound for \({\mathcal {F}}^\varepsilon [{\tilde{\rho }}^\varepsilon (t)]\) from Proposition 3.1 yield the following integrability estimate uniform in \(t\in [0,T]\) and \(\varepsilon >0\)

$$\begin{aligned} \begin{aligned}&\Vert V_\varepsilon * {\tilde{\rho }}^\varepsilon (t) \Vert _{L^m({\mathbb {R}}^d)}^m \le \frac{m(m-1)}{c_1}{\mathcal {F}}^\varepsilon [{\tilde{\rho }}^\varepsilon (t)] - \frac{m(m-1)}{c_1}F'(0) \\&\quad \le \frac{m(m-1)}{c_1}{\mathcal {F}}^\varepsilon [\rho _0]- \frac{m(m-1)}{c_1}F'(0) \le \frac{c_2}{c_1}\Vert V_\varepsilon *\rho _0\Vert _{L^m}^m \le \frac{c_2}{c_1}\Vert \rho _0\Vert _{L^m}^m. \end{aligned} \end{aligned}$$
(4.1)

Concerning the \(m=1\) case, we directly estimate

$$\begin{aligned} {\mathcal {H}}[V_\varepsilon *{\tilde{\rho }}^\varepsilon (t)] = {\mathcal {F}}^\varepsilon [{\tilde{\rho }}^\varepsilon (t)] \le {\mathcal {F}}^\varepsilon [\rho _0] = {\mathcal {H}}[V_\varepsilon *\rho _0] \le {\mathcal {H}}[\rho _0]. \end{aligned}$$
(4.2)

Here, we used Jensen’s inequality with the convex function \(x\log x\) and reference measure \(V_\varepsilon \) to obtain \({\mathcal {H}}[V_\varepsilon *\rho _0]\le \int V_\varepsilon *(\rho _0\log \rho _0) = \int \rho _0\log \rho _0\), recalling \(\int V_\varepsilon = 1\), as well as Proposition 3.1.
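To spell out the Jensen step, here is a sketch assuming that \(V_\varepsilon \ge 0\) has unit mass, that \(\rho _0\) has a density, and that \(\rho _0\log \rho _0\) is integrable. Write \(\Phi (s) {:=} s\log s\) (a shorthand introduced only here). For fixed \(x\), the measure \(V_\varepsilon (x-y)\,dy\) is a probability measure, so

$$\begin{aligned} \Phi \big (V_\varepsilon *\rho _0(x)\big ) = \Phi \left( \int _{{\mathbb {R}}^d}V_\varepsilon (x-y)\rho _0(y)\,dy\right) \le \int _{{\mathbb {R}}^d}V_\varepsilon (x-y)\Phi (\rho _0(y))\,dy = V_\varepsilon *(\rho _0\log \rho _0)(x). \end{aligned}$$

Integrating in \(x\) and exchanging the order of integration with \(\int V_\varepsilon = 1\) gives \({\mathcal {H}}[V_\varepsilon *\rho _0]\le {\mathcal {H}}[\rho _0]\).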

The sequence of solutions \(\{{\tilde{\rho }}^\varepsilon \}_{\varepsilon >0}\) to (NLE) constructed in Sect. 3 is the candidate approximating weak solution of (DE). As \(\{{\tilde{\rho }}^\varepsilon \}_{\varepsilon >0}\) is in general a sequence of measures, it is useful to consider the regularised version, \(V_\varepsilon * {\tilde{\rho }}^\varepsilon \). For brevity, we drop the tilde on \(\rho ^\varepsilon \) from now on. First, we state compactness of \(\{\rho ^\varepsilon \}_{\varepsilon >0}\) in \(C([0,T]; {{\mathcal {P}}}_2({\mathbb {R}}^d))\).

Proposition 4.1

There exists an absolutely continuous curve \({\tilde{\rho }}:[0,T]\rightarrow {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) such that the sequence \(\{\rho ^\varepsilon \}_{\varepsilon >0}\) admits a subsequence \(\{\rho ^{\varepsilon _k}\}\) for which \(\rho ^{\varepsilon _k}(t)\) converges narrowly to \({\tilde{\rho }}(t)\) for any \(t\in [0,T]\) as \(k\rightarrow +\infty \).

Proof

The proof is exactly the same as in [7, Proposition 4.1] using a refined version of Ascoli-Arzelà [1, Proposition 3.3.1]. \(\square \)

The narrow convergence proven in Proposition 4.1 is not sufficient to pass to the limit \(\varepsilon \downarrow 0\) from (NLE) to (DE). For this reason, we study the sequence \(v^\varepsilon (t) {:=} V_\varepsilon * \rho ^\varepsilon (t)\) for \(t\in [0,T]\) (we drop the subscript k for simplicity). We obtain higher regularity estimates uniform in \(\varepsilon \) by using the flow interchange technique developed by Matthes, McCann, and Savaré in [41]. The strategy is to compute the dissipation of \({\mathcal {F}}^\varepsilon \) along a solution of an auxiliary gradient flow. This flow is chosen so that it satisfies an Evolution Variational Inequality (EVI) which allows us to obtain the desired estimate leading to compactness.

Since the seminal work of Jordan, Kinderlehrer, and Otto [37], it is known that the heat equation can be interpreted as the 2-Wasserstein gradient flow of the Boltzmann entropy \({\mathcal {H}}\) (see below for the precise definition). Moreover the heat semigroup, denoted by \(S_{\mathcal {H}}\), is a 0-flow in the following sense.

Definition 4.1

(\(\lambda \)-flow) A semigroup \(S_{{\mathcal {E}}}:[0,+\infty [\times {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\rightarrow {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) is a \(\lambda \)-flow for a functional \({\mathcal {E}}:{{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\rightarrow {\mathbb {R}}\cup \{+\infty \}\) with respect to the distance \(d_W\) if, for an arbitrary \(\rho \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), the curve \(t\mapsto S_{{\mathcal {E}}}^t\rho \) is absolutely continuous on \([0,+\infty [\) and satisfies the so-called Evolution Variational Inequality (EVI)

$$\begin{aligned} \frac{1}{2}\frac{d^+}{dt}d_W^2(S_{{\mathcal {E}}}^t\rho ,{\bar{\rho }}) +\frac{\lambda }{2}d_W^2(S_{{\mathcal {E}}}^t\rho ,{\bar{\rho }})\le {\mathcal {E}}({\bar{\rho }})-{\mathcal {E}}(S_{{\mathcal {E}}}^t\rho ) \end{aligned}$$
(4.3)

for all \(t\ge 0\), with respect to every reference measure \({\bar{\rho }}\in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) such that \({\mathcal {E}}({\bar{\rho }})<\infty \).

Below we use the flow interchange by considering the heat equation as an auxiliary flow with respect to the Boltzmann entropy

$$\begin{aligned} {\mathcal {H}}[\rho ]= {\left\{ \begin{array}{ll} \int _{{{\mathbb {R}}^{d}}}\rho (x)\log \rho (x)\,dx, &{}\rho \ll \text {Leb}({{\mathbb {R}}^{d}})\\ +\infty , &{} \text {otherwise} \end{array}\right. }. \end{aligned}$$
(4.4)

Again, when \(\rho \) is an absolutely continuous measure with respect to Lebesgue, we identify its density as \(\rho (x)\).

Remark 4.2

We remind the reader that \({\mathcal {H}}[\rho ]\) is bounded below in terms of the second moment \(m_2(\rho )\). This can be seen by looking at the relative entropy with respect to the standard Gaussian on \({\mathbb {R}}^d\) denoted by \({\mathcal {M}}(x) = (2\pi )^{-\frac{d}{2}}\exp \{-|x|^2/2\}\). For any \(\rho \in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\), Jensen’s inequality with the convex function \(x\log x\) gives

$$\begin{aligned} {\mathcal {H}}[\rho \, |\, {\mathcal {M}}]&{:=} \int _{{\mathbb {R}}^d}\rho (x) \log \frac{\rho (x)}{{\mathcal {M}}(x)} dx = \int _{{\mathbb {R}}^d} \frac{\rho (x)}{{\mathcal {M}}(x)}\log \frac{\rho (x)}{{\mathcal {M}}(x)} {\mathcal {M}}(x)dx \\&\ge \left( \int _{{\mathbb {R}}^d} \frac{\rho (x)}{{\mathcal {M}}(x)} {\mathcal {M}}(x)dx \right) \log \left( \int _{{\mathbb {R}}^d} \frac{\rho (x)}{{\mathcal {M}}(x)} {\mathcal {M}}(x)dx \right) = 0. \end{aligned}$$

This gives the lower bound for the entropy

$$\begin{aligned} {\mathcal {H}}[\rho ] \ge \int _{{\mathbb {R}}^d}\rho (x)\log {\mathcal {M}}(x)dx = -\frac{d}{2}\log 2\pi - \frac{1}{2}m_2(\rho ). \end{aligned}$$
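The identity behind this bound can be written out as follows; this is a short sketch using only the definitions above and assuming, as is standard, that \(m_2(\rho ) = \int _{{\mathbb {R}}^d}|x|^2\,d\rho (x)\). Since \(\log {\mathcal {M}}(x) = -\frac{d}{2}\log 2\pi - \frac{1}{2}|x|^2\), we have

$$\begin{aligned} 0\le {\mathcal {H}}[\rho \, |\, {\mathcal {M}}] = {\mathcal {H}}[\rho ] - \int _{{\mathbb {R}}^d}\rho (x)\log {\mathcal {M}}(x)\,dx = {\mathcal {H}}[\rho ] + \frac{d}{2}\log 2\pi + \frac{1}{2}m_2(\rho ). \end{aligned}$$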

In the following, for any \(\nu \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) such that \({\mathcal {H}}(\nu )<+\infty \), we denote by \(S_{{\mathcal {H}}}^t\nu \) the solution at time t of the heat equation coupled with an initial value \(\nu \) at \(t=0\). Moreover, for every \(\rho \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\), we define the dissipation of \({\mathcal {F}}^\varepsilon \) along \(S_{{\mathcal {H}}}\) by

$$\begin{aligned} D_{{\mathcal {H}}}{\mathcal {F}}^\varepsilon (\rho ){:=}\limsup _{s\downarrow 0}\left\{ \frac{{\mathcal {F}}^\varepsilon [\rho ]-{\mathcal {F}}^\varepsilon [S_{{\mathcal {H}}}^s\rho ]}{s}\right\} . \end{aligned}$$

In order to prove stronger compactness, we begin with an \(L_t^2H_x^1\) estimate on the \(\frac{m}{2}\) power of \(v_\tau ^\varepsilon = V_\varepsilon * \rho _\tau ^\varepsilon \). This generalises Lemma 4.1 from [7].

Lemma 4.1

Suppose F satisfies (F1), (F3), and (\({{\textbf {F}}}_m\)) for some \(m\ge 1\). Let \(\rho _0\in {{\mathcal {P}}_{2}^a({{\mathbb {R}}^{d}})}\cap L^m({\mathbb {R}}^d)\). In the case \(m=1\), assume further \({\mathcal {H}}[\rho _0] < +\infty \). Then, there exists a constant \(C=C(\rho _0, \, V_1, \, T) >0\) such that

$$\begin{aligned} \sup _{\varepsilon , \, \tau >0}\left\| (v_\tau ^\varepsilon )^\frac{m}{2} \right\| _{L^2(0,T;\, H^1({\mathbb {R}}^d))} \le C. \end{aligned}$$

Proof

If \(m=1\), then the \(L_t^2L_x^2\) bound simply reads

$$\begin{aligned} \left\| (v_\tau ^\varepsilon )^\frac{1}{2}\right\| _{L^2(0,T;\, L^2({\mathbb {R}}^d))}^2 = \int _0^T\int _{{\mathbb {R}}^d} V_\varepsilon *\rho _\tau ^\varepsilon (t,x) \, dx dt = T, \end{aligned}$$

since \(\Vert V_\varepsilon \Vert _{L^1} = 1\) and \(\int _{{\mathbb {R}}^d}d\rho _\tau ^\varepsilon (t)(x) = 1\). For \(m > 1\), the estimate is very similar to that of (4.1) applied to the pre-limit curves \(\rho _{\tau }^{\varepsilon }\),

$$\begin{aligned} \left\| (v_\tau ^\varepsilon )^\frac{m}{2} \right\| _{L^2([0,T]; L^2({\mathbb {R}}^d))}^2&= \int _0^T\Vert V_\varepsilon *\rho _{\tau }^{\varepsilon }\Vert _{L^m}^m \,dt\le \frac{c_2 T}{c_1}\Vert \rho _0\Vert _{L^m}^m. \end{aligned}$$

The rest of this proof focuses on the uniform bound for \(\nabla (v_\tau ^\varepsilon )^\frac{m}{2}\). For \(s>0\), we take \(S_{\mathcal {H}}^s\rho _{\tau , \varepsilon }^{n+1}\) as a competitor against \(\rho _{\tau , \varepsilon }^{n+1}\) in the minimisation problem (3.1). We thus have

$$\begin{aligned} \frac{1}{2\tau }d_W^2(\rho _{\tau ,\varepsilon }^{n+1},\rho _{\tau ,\varepsilon }^{n})+{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]\le \frac{1}{2\tau }d_W^2(S_{{\mathcal {H}}}^s\rho _{\tau ,\varepsilon }^{n+1},\rho _{\tau ,\varepsilon }^{n}) +{\mathcal {F}}^\varepsilon [S_{{\mathcal {H}}}^s\rho _{\tau ,\varepsilon }^{n+1}], \end{aligned}$$

which, dividing by \(s>0\) and passing to \(\limsup _{s\downarrow 0}\), gives

$$\begin{aligned} \tau D_{{\mathcal {H}}}{\mathcal {F}}^\varepsilon (\rho _{\tau ,\varepsilon }^{n+1})\le \left. \frac{1}{2}\frac{d^+}{dt}\right| _{t=0}\Big (d_W^2(S_{{\mathcal {H}}}^t\rho _{\tau ,\varepsilon }^{n+1},\rho _{\tau ,\varepsilon }^{n})\Big ) \overset{\varvec{(E.V.I.)}}{\le }{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n}]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n+1}]. \end{aligned}$$
(4.5)

In the last inequality we used that \(S_{\mathcal {H}}\) is a 0-flow, cf. Definition 4.1. Now, let us focus on the left-hand side of (4.5). Firstly, note that

$$\begin{aligned} \begin{aligned} D_{{\mathcal {H}}}{\mathcal {F}}^\varepsilon (\rho _{\tau ,\varepsilon }^{n+1})&=\limsup _{s\downarrow 0}\left\{ \frac{{\mathcal {F}}^\varepsilon [\rho _{\tau ,\varepsilon }^{n+1}]- {\mathcal {F}}^\varepsilon [S_{{\mathcal {H}}}^s\rho _{\tau ,\varepsilon }^{n+1}]}{s}\right\} \\ {}&=\limsup _{s\downarrow 0}\int _0^1\left( -\frac{d}{dz}\Big |_{z=st}{\mathcal {F}}^\varepsilon [S_{{\mathcal {H}}}^{z}\rho _{\tau ,\varepsilon }^{n+1}]\right) \,dt. \end{aligned} \end{aligned}$$
(4.6)

Thus, we now compute the time derivative inside the above integral. Using integration by parts, the \(C^\infty \) regularity of the heat semigroup, and (\({{\textbf {F}}}_m\)), we have

$$\begin{aligned} \begin{aligned} \frac{d}{dt}{\mathcal {F}}^\varepsilon [S_{{\mathcal {H}}}^t\rho _{\tau ,\varepsilon }^{n+1}]&= -\int _{{\mathbb {R}}^{d}}F''(V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1}) |\nabla V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1}|^2 dx \\&\le -c_1 \int _{{\mathbb {R}}^d} (V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1})^{m-2}|\nabla V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1}|^2 dx\\&=-\frac{4c_1}{m^2}\int _{{\mathbb {R}}^d}\left| \nabla (V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2} \right| ^2 dx. \end{aligned} \end{aligned}$$
(4.7)

The previous computation is justified since \(S_{{\mathcal {H}}}^t\rho _{\tau ,\varepsilon }^{n+1}>0\) everywhere on \({{\mathbb {R}}^{d}}\) so there is no division by zero. By substituting (4.7) into (4.6), from (4.5) we obtain

$$\begin{aligned} \tau \liminf _{s\downarrow 0}\int _0^1\int _{{{\mathbb {R}}^{d}}}\left| \nabla (V_\varepsilon *S_{{\mathcal {H}}}^{st}\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2}(x)\right| ^2\,dx\,dt\le \frac{m^2}{4c_1}\left( {\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n}]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n+1}]\right) . \end{aligned}$$
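The two ingredients of (4.7) can be sketched as follows, under the assumption that \({\mathcal {F}}^\varepsilon [\rho ] = \int _{{\mathbb {R}}^d}F(V_\varepsilon *\rho )\,dx\) (the form suggested by (4.2)) and with the shorthand \(u_t {:=} V_\varepsilon * S_{\mathcal {H}}^t\rho _{\tau ,\varepsilon }^{n+1}\), introduced only here. Since convolution with \(V_\varepsilon \) commutes with the heat flow, \(\partial _tu_t = \Delta u_t\), and formally

$$\begin{aligned} \frac{d}{dt}\int _{{\mathbb {R}}^d}F(u_t)\,dx = \int _{{\mathbb {R}}^d}F'(u_t)\Delta u_t\,dx = -\int _{{\mathbb {R}}^d}F''(u_t)|\nabla u_t|^2\,dx. \end{aligned}$$

The last equality in (4.7) is the chain rule: for \(u>0\),

$$\begin{aligned} \nabla u^\frac{m}{2} = \frac{m}{2}u^{\frac{m}{2}-1}\nabla u \quad \Longrightarrow \quad u^{m-2}|\nabla u|^2 = \frac{4}{m^2}\left| \nabla u^\frac{m}{2}\right| ^2. \end{aligned}$$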

In order to pass to the limit \(s\downarrow 0\) for \(m>1\), we first deduce \(V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}\in L^m\) by (\({{\textbf {F}}}_m\)) and Proposition 3.1. Second, by standard properties of the heat semigroup, we obtain \(V_\varepsilon * S_{\mathcal {H}}^{st}\rho _{\tau ,\varepsilon }^{n+1}\rightarrow V_\varepsilon * \rho _{\tau ,\varepsilon }^{n+1}\) in \(L^m\) as \(s\downarrow 0\). Notice that the first and second steps are immediate for \(m=1\). Third, by the inequality

$$\begin{aligned} \left| (V_\varepsilon *S_{{\mathcal {H}}}^{st}\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2} - (V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2} \right| ^2 \le 2\left( (V_\varepsilon *S_{{\mathcal {H}}}^{st}\rho _{\tau ,\varepsilon }^{n+1})^m + (V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1})^m \right) , \end{aligned}$$

we can apply Theorem A.1 to deduce \((V_\varepsilon *S_{{\mathcal {H}}}^{st}\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2} \rightarrow (V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1})^\frac{m}{2}\) in \(L^2\) as \(s\downarrow 0\). Finally, the weak \(L^2\) lower semi-continuity of the \(H^1\) semi-norm gives

$$\begin{aligned} \tau \int _{{{\mathbb {R}}^{d}}}\left| \nabla |V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1}|^\frac{m}{2}(x)\right| ^2\,dx\le \frac{m^2}{4c_1}\left( {\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n}]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n+1}]\right) . \end{aligned}$$

By summing up over n from 0 to \(N-1\), taking into account Remark 4.2 and that second order moments are uniformly bounded (see Proposition 3.1), we get

$$\begin{aligned} \int _0^T\int _{{{\mathbb {R}}^{d}}}\left| \nabla | V_\varepsilon *\rho _{\tau }^{\varepsilon }(t)|^\frac{m}{2}(x)\right| ^2\,dx\,dt&\le \frac{m^2}{4c_1}\left( {\mathcal {H}}[\rho _0]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{N}]\right) \\&\le \frac{m^2}{4c_1}\left( {\mathcal {H}}[\rho _0] + C(\rho _0, \, V_1, \, T)\right) . \end{aligned}$$
(4.8)
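The telescoping behind (4.8) can be sketched as follows, assuming that \(\rho _\tau ^\varepsilon \) is the piecewise constant interpolation equal to \(\rho _{\tau ,\varepsilon }^{n+1}\) on \((n\tau ,(n+1)\tau ]\), with \(N\tau = T\) and \(\rho _{\tau ,\varepsilon }^{0} = \rho _0\) (the usual JKO convention, not restated in this section):

$$\begin{aligned} \int _0^T\int _{{{\mathbb {R}}^{d}}}\left| \nabla | V_\varepsilon *\rho _{\tau }^{\varepsilon }(t)|^\frac{m}{2}\right| ^2\,dx\,dt&= \sum _{n=0}^{N-1}\tau \int _{{{\mathbb {R}}^{d}}}\left| \nabla |V_\varepsilon *\rho _{\tau ,\varepsilon }^{n+1}|^\frac{m}{2}\right| ^2\,dx \\&\le \frac{m^2}{4c_1}\sum _{n=0}^{N-1}\left( {\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n}]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{n+1}]\right) = \frac{m^2}{4c_1}\left( {\mathcal {H}}[\rho _0]-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{N}]\right) . \end{aligned}$$

Remark 4.2 then gives \(-{\mathcal {H}}[\rho _{\tau ,\varepsilon }^{N}]\le \frac{d}{2}\log 2\pi + \frac{1}{2}m_2(\rho _{\tau ,\varepsilon }^{N})\), which is bounded uniformly thanks to the second moment bound.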

For \(m=1\), the initial entropy is assumed to be bounded. For \(m>1\), since \(x\log x \le C x^m\) for any \(x\ge 0\), and some constant \(C>0\), we always have \({\mathcal {H}}[\rho _0]\le C\Vert \rho _0\Vert _{L^m}^m\). In both cases, the initial entropy is bounded, and this establishes the desired \(L_{t,\,x}^2\) bound for \(\nabla (v_\tau ^\varepsilon )^\frac{m}{2}\). \(\square \)

The strong \(L^m\) compactness in time and space follows by applying a refined version of the Aubin-Lions Lemma due to Rossi and Savaré [50, Theorem 2]. For the reader’s convenience we recall the latter result below before presenting the compactness result for \(\{v^{\varepsilon _k}\}_k\).

Proposition 4.2

[50, Theorem 2] Let X be a separable Banach space. Consider

  • a lower semicontinuous functional \({\mathscr {F}}:X\rightarrow [0,+\infty ]\) with relatively compact sublevels in X;

  • a pseudo-distance \(g:X\times X\rightarrow [0,+\infty ]\), i.e., g is lower semicontinuous and such that \(g(\rho ,\eta )=0\) for any \(\rho ,\eta \in X\) with \({\mathscr {F}}(\rho )<\infty \), \({\mathscr {F}}(\eta )<\infty \) implies \(\rho =\eta \).

Let U be a set of measurable functions \(u:(0,T)\rightarrow X\), with a fixed \(T>0\). Assume further that

$$\begin{aligned} \sup _{u\in U}\int _{0}^T{\mathscr {F}}(u(t))\,dt<\infty \quad \text {and}\quad \lim _{h\downarrow 0}\sup _{u\in U}\int _{0}^{T-h}g(u(t+h),u(t))\,dt=0. \end{aligned}$$
(4.9)

Then U contains an infinite sequence \((u_n)_{n\in {\mathbb {N}}}\) that converges in measure, with respect to \(t\in (0,T)\), to a measurable \({\tilde{u}}:(0,T)\rightarrow X\), i.e.

$$\begin{aligned} \lim _{n\rightarrow \infty }|\{t\in (0,T):\Vert u_n(t)-{\tilde{u}}(t)\Vert _X\ge \sigma \}|=0, \quad \forall \sigma >0. \end{aligned}$$

The two conditions in (4.9) are called tightness and weak integral equicontinuity, respectively.

Proposition 4.3

Fix \(m\ge 1\) and consider the family \(\{v_\tau ^\varepsilon \}_{\varepsilon \in (0,\varepsilon _0), \tau >0}\) in Lemma 4.1. There is a subsequence \(\tau _k\downarrow 0\) such that for any \(\varepsilon >0\), we have

$$\begin{aligned} v_{\tau _k}^\varepsilon \rightarrow v^\varepsilon = V_\varepsilon *{\tilde{\rho }}^\varepsilon , \quad \text {in }L^m([0,T]\times {\mathbb {R}}^d). \end{aligned}$$

Moreover, there is a subsequence \(\varepsilon _k\downarrow 0\) and a curve \(v\in C([0,T];{{\mathcal {P}}_2({{\mathbb {R}}^{d}})})\cap L^m([0,T]\times {\mathbb {R}}^d)\) such that

$$\begin{aligned} v^\varepsilon \rightarrow v, \quad \text {in }L^m([0,T]\times {\mathbb {R}}^d). \end{aligned}$$

Proof

The proof of the result is obtained by applying Proposition 4.2 to a subset of the sequence \(U{:=}\{v_\tau ^\varepsilon \}_{\varepsilon \in (0,\varepsilon _0), \tau >0}\) for \(X{:=}L^m({{\mathbb {R}}^{d}})\) and \(g{:=}d_1\) being the 1-Wasserstein distance — extended to \(+\infty \) outside of \({{\mathcal {P}}}_1({{\mathbb {R}}^{d}})\times {{\mathcal {P}}}_1({{\mathbb {R}}^{d}})\). As for the functional, we consider \({\mathscr {F}}:L^m({{\mathbb {R}}^{d}})\rightarrow [0,+\infty ]\) defined by

$$\begin{aligned} {\mathscr {F}}[v]= {\left\{ \begin{array}{ll} \left\| v^\frac{m}{2}\right\| _{H^1({{\mathbb {R}}^{d}})}^2 + \int _{{\mathbb {R}}^d}|x|v(x)\, dx, &{} \text {if } v\in {{\mathcal {P}}}_1({{\mathbb {R}}^{d}}) \text{ and } v^{\frac{m}{2}}\in H^1({{\mathbb {R}}^{d}});\\ +\infty , &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$

Note that elements in the domain of the functional \({\mathscr {F}}\) belong to \({{\mathcal {P}}}_1({{\mathbb {R}}^{d}})\), thus \(0=g(\rho ,\eta )=d_1(\rho ,\eta )\) implies \(\rho =\eta \). Let us check that \({\mathscr {F}}\) is an admissible functional.

Lower semicontinuity can be easily verified following, e.g., [7]. Let \(A_c{:=}\{v\in L^m({{\mathbb {R}}^{d}}): {\mathscr {F}}[v]\le c\}\) be a sublevel of \({\mathscr {F}}\), where c is a positive constant. We consider \(B_c{:=}\{w=v^{\frac{m}{2}}:v\in A_c\}\) and prove that \(B_c\) is relatively compact in \(L^2({{\mathbb {R}}^{d}})\), as the map \(w\in L^2({{\mathbb {R}}^{d}})\mapsto \iota (w)=w^\frac{2}{m}\in L^m({{\mathbb {R}}^{d}})\) is continuous and \(A_c=\iota (B_c)\).

The Riesz-Fréchet-Kolmogorov theorem provides relative compactness in \(L^2({{\mathbb {R}}^{d}})\) of \(B_c\). In fact, elements of \(B_c\) are bounded in \(L^2({{\mathbb {R}}^{d}})\) and the following uniform continuity estimate holds:

$$\begin{aligned} \begin{aligned}&\int _{{{\mathbb {R}}^{d}}}|w(x+h)-w(x)|^2dx \\&\quad \!=\! \int _{{{\mathbb {R}}^{d}}}\left| \int _0^1 \frac{d}{d\tau }w(x+\tau h)\,d\tau \right| ^2 dx\! =\! \int _{{{\mathbb {R}}^{d}}}\left| \int _0^1 h \cdot \nabla w(x+\tau h)\,d\tau \right| ^2 dx\\&\quad \le |h|^2\int _{{{\mathbb {R}}^{d}}}\int _0^1 |\nabla w(x+\tau h)|^2\,d\tau \,dx = |h|^2\Vert \nabla w\Vert _{L^2({{\mathbb {R}}^{d}})}^2, \end{aligned} \end{aligned}$$
(4.10)

which implies \(\Vert w(\cdot +h)-w(\cdot )\Vert _{L^2({{\mathbb {R}}^{d}})}\rightarrow 0\) as \(|h|\rightarrow 0\), uniformly in \(w\in B_c\).
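The inequality in (4.10) combines two standard steps, sketched here. First, the Cauchy-Schwarz inequality \(|h\cdot \nabla w|\le |h||\nabla w|\) together with Jensen's inequality in the variable \(\tau \in [0,1]\) gives

$$\begin{aligned} \left| \int _0^1 h\cdot \nabla w(x+\tau h)\,d\tau \right| ^2 \le |h|^2\int _0^1|\nabla w(x+\tau h)|^2\,d\tau . \end{aligned}$$

Second, after exchanging the order of integration, translation invariance of the Lebesgue measure gives

$$\begin{aligned} \int _{{{\mathbb {R}}^{d}}}|\nabla w(x+\tau h)|^2\,dx = \Vert \nabla w\Vert _{L^2({{\mathbb {R}}^{d}})}^2 \quad \text {for every } \tau \in [0,1]. \end{aligned}$$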

Before proceeding to the uniform integrability, we record the following improved estimates afforded to us by the fact that \(B_c\) is a bounded subset of \(H^1({\mathbb {R}}^d)\).

$$\begin{aligned} \sup _{w\in B_c}\Vert w\Vert _{L^q({\mathbb {R}}^d)} \le c, \quad q \in \left\{ \begin{array}{cc} \{+\infty \} &{}d=1 \\ {[}2,+\infty ) &{}d=2 \\ {[}2, \frac{2d}{d-2}] &{}d>2 \end{array} \right. . \end{aligned}$$
(4.11)

In the case \(d=1\), for any \(m\ge 1\), we set \(\delta = 1\) in the following estimate

$$\begin{aligned} \Vert w\Vert _{L^2({{\mathbb {R}}^{d}}\setminus B_R)}^2&=\int _{|x|\ge R}|v(x)|^m \,dx \le \frac{1}{R^{\delta }}\int _{{{\mathbb {R}}^{d}}}|x|^{\delta }|v(x)|^m\,dx \nonumber \\&\le \frac{\Vert v\Vert _{L^\infty }^{m-1}}{R}\int _{{\mathbb {R}}^d}|x|v(x)\, dx\le \frac{\Vert v\Vert _{L^\infty }^{m-1}}{R}{\mathscr {F}}[v]\le \frac{\Vert v\Vert _{L^\infty }^{m-1}}{R}c. \end{aligned}$$
(4.12)

Hence, uniform integrability is proven in the case \(d=1\) and \(m\ge 1\). In fact, for any \(d\ge 2\) and \(m=1\), we can simply take \(\delta =1\) again in (4.12) to establish uniform integrability in this case. For general \(d\ge 2\) and \(m>1\), we further develop (4.12) by Hölder’s inequality to obtain, for a particular choice of \(\delta \in (0,1)\) which will be made clear,

$$\begin{aligned} \Vert w\Vert _{L^2({\mathbb {R}}^d\setminus B_R)}^2 \le \frac{1}{R^{\delta }}\left( \int _{{\mathbb {R}}^d}|x|v(x)\,dx\right) ^\delta \left( \int _{{{\mathbb {R}}^{d}}}|v(x)|^{\frac{m-\delta }{1-\delta }}\,dx\right) ^{1-\delta }. \end{aligned}$$
(4.13)
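Estimate (4.13) follows from the first inequality in (4.12) and Hölder's inequality with conjugate exponents \(\frac{1}{\delta }\) and \(\frac{1}{1-\delta }\). A sketch of the splitting, for \(v\ge 0\):

$$\begin{aligned} \int _{{\mathbb {R}}^d}|x|^\delta v(x)^m\,dx = \int _{{\mathbb {R}}^d}\big (|x|v(x)\big )^\delta \, v(x)^{m-\delta }\,dx \le \left( \int _{{\mathbb {R}}^d}|x|v(x)\,dx\right) ^\delta \left( \int _{{\mathbb {R}}^d}v(x)^\frac{m-\delta }{1-\delta }\,dx\right) ^{1-\delta }. \end{aligned}$$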

The parameter \(\delta \in (0,1)\) can be chosen to take advantage of the extra integrability from (4.11). For example, we can take

$$\begin{aligned} \delta = \frac{2}{d(m-1)+2} \in (0,1), \end{aligned}$$

which is permissible in light of the Sobolev embedding (4.11): it gives \(\int _{{\mathbb {R}}^d}|v(x)|^\frac{m-\delta }{1-\delta } \, dx \le c\), recalling \(v = w^\frac{2}{m}\). Thus, (4.13) yields the uniform integrability of w in \(L^2\).
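For this choice of \(\delta \), the exponents can be checked directly. In the worked computation below, \(q\) denotes the resulting Lebesgue exponent for \(w\), a label introduced only here:

$$\begin{aligned} 1-\delta = \frac{d(m-1)}{d(m-1)+2}, \quad m-\delta = \frac{(m-1)(dm+2)}{d(m-1)+2}, \quad \frac{m-\delta }{1-\delta } = m+\frac{2}{d}. \end{aligned}$$

Hence \(v^{m+\frac{2}{d}} = w^{q}\) with

$$\begin{aligned} q = \frac{2}{m}\left( m+\frac{2}{d}\right) = 2+\frac{4}{dm}. \end{aligned}$$

For \(d>2\), we have \(q\le 2+\frac{4}{d-2} = \frac{2d}{d-2}\) because \(d-2\le dm\), and for \(d=2\) the exponent \(q\) is finite, so \(q\) lies in the range allowed by (4.11).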

We now check tightness and weak integral equicontinuity, i.e. conditions (4.9). Let us set \(U{:=}\{v_\tau ^\varepsilon \}_{0<\varepsilon \le \varepsilon _0, 0 < \tau }\), where \(v_\tau ^\varepsilon :[0,T]\rightarrow L^m({{\mathbb {R}}^{d}})\) is the family defined above by \(v_\tau ^\varepsilon =V_\varepsilon *\rho _{\tau }^{\varepsilon }\), which satisfies Lemma 4.1. For any \(0<\varepsilon \le \varepsilon _0\) and \(\tau >0\), we have

$$\begin{aligned} \int _0^T{\mathscr {F}}[v_\tau ^\varepsilon (t)]\,dt&=\int _0^T\left\| \left( v^\varepsilon _\tau \right) ^{\frac{m}{2}}\right\| _{H^1({{\mathbb {R}}^{d}})}^2\, dt + \int _0^T\int _{{\mathbb {R}}^d}|x|v^\varepsilon _\tau (x)\, dx\,dt\\&\le C(\rho _0,V_1,T)+ \varepsilon _0T\int _{{\mathbb {R}}^{d}}V_1(z)|z|\,dz<+\infty , \end{aligned}$$

where we used

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}|x|v^\varepsilon _\tau (x)\,dx&=\iint _{{\mathbb {R}}^{2d}}|x|V_\varepsilon (x-y)\,d\rho _{\tau }^{\varepsilon }(y)\,dx\\&\le \iint _{{\mathbb {R}}^{2d}}V_\varepsilon (x-y)|x-y|\,d\rho _{\tau }^{\varepsilon }(y)\,dx+ \iint _{{\mathbb {R}}^{2d}}V_\varepsilon (x-y)|y|\,d\rho _{\tau }^{\varepsilon }(y)\,dx\\&=\varepsilon \int _{{\mathbb {R}}^{d}}V_1(z)|z|\,dz+\int _{{\mathbb {R}}^{d}}V_1(z)\,dz\int _{{\mathbb {R}}^{d}}|y|\,d\rho _{\tau }^{\varepsilon }(y)\\&\le \varepsilon _0\int _{{\mathbb {R}}^{d}}V_1(z)|z|\,dz+\sqrt{m_2(\rho _{\tau }^{\varepsilon })}\int _{{\mathbb {R}}^{d}}V_1(z)\,dz<+\infty . \end{aligned}$$

The last bound is finite due to Remark 3.1 and Proposition 3.1. Taking the supremum over U, we obtain tightness. For the weak integral equicontinuity, we fix \(\varepsilon , \, h>0\) and consider the \(\tau \le h\) and \(\tau > h\) cases separately. Starting with \(\tau \le h\), we use the almost Hölder continuity of \(\rho _{\tau }^{\varepsilon }\) proven in (3.8) of Proposition 3.1. More precisely, we have

$$\begin{aligned} \int _0^{T-h}\!\!d_1(v_\tau ^\varepsilon (t+h),v_\tau ^\varepsilon (t))\,dt&\le \!\!\int _0^{T-h}\!\!d_W(v_\tau ^\varepsilon (t+h),v_\tau ^\varepsilon (t))\,dt\\&\le \!\!\int _0^{T-h}\!\!d_W(\rho _{\tau }^{\varepsilon }(t+h),\rho _{\tau }^{\varepsilon }(t))\,dt \\&\le c\int _0^{T-h} (\sqrt{h} + \sqrt{\tau }) \, dt \le 2c(T-h)\sqrt{h}. \end{aligned}$$

Here, in the intermediate inequalities, we used (3.8) for some constant \(c>0\) (independent of \(\varepsilon , \, \tau , \, h\)) as well as standard properties of Wasserstein distances, cf. for example [54, Section 5.1]. The equicontinuity follows by sending \(h\downarrow 0\). In the case \(\tau > h\), we use (3.7) instead to estimate

$$\begin{aligned} \quad \int _0^{T-h}\!\!d_1(v_\tau ^\varepsilon (t+h),v_\tau ^\varepsilon (t))\,dt&\le \!\!\int _0^{T-h}\!\!d_W(\rho _{\tau }^{\varepsilon }(t+h),\rho _{\tau }^{\varepsilon }(t))\,dt \le h\sum _{n=0}^{N-1}d_W(\rho _{\tau ,\varepsilon }^{n+1},\rho _{\tau ,\varepsilon }^{n}) \\&\le h N^\frac{1}{2}\left( \sum _{n=0}^{N-1}d_W^2(\rho _{\tau ,\varepsilon }^{n+1}, \rho _{\tau ,\varepsilon }^{n}) \right) ^\frac{1}{2} \le cT^\frac{1}{2} h, \end{aligned}$$

where the constant c is defined when proving (3.7).
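The two standard properties of Wasserstein distances invoked in both cases can be sketched as follows; they are well-known facts, stated here for \(\mu ,\nu \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) and a probability density \(V_\varepsilon \). First, applying the Cauchy-Schwarz inequality to any transport plan gives \(d_1(\mu ,\nu )\le d_W(\mu ,\nu )\). Second, convolution with \(V_\varepsilon \) does not increase the distance: if \(\gamma \) is an optimal plan between \(\mu \) and \(\nu \), then the image of \(V_\varepsilon (z)\,dz\otimes \gamma \) under \((z,x,y)\mapsto (x+z,y+z)\) is a plan between \(V_\varepsilon *\mu \) and \(V_\varepsilon *\nu \) with the same quadratic cost. Together,

$$\begin{aligned} d_1(V_\varepsilon *\mu ,V_\varepsilon *\nu )\le d_W(V_\varepsilon *\mu ,V_\varepsilon *\nu )\le d_W(\mu ,\nu ). \end{aligned}$$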

We are left to prove the relative compactness in \(L^m([0,T];L^m({{\mathbb {R}}^{d}}))\) for all \(m\ge 1\). We start with the limit \(\tau \downarrow 0\) for fixed \(\varepsilon >0\). Remember that the estimates we have proven so far are uniform in \(\varepsilon \) and \(\tau \) so there is no dependence on \(\varepsilon \) as \(\tau \downarrow 0\). We begin with the \(m>1\) case. The first part in the proof of Lemma 4.1 showed that \(\Vert v_\tau ^\varepsilon \Vert _{L^m([0,T]\times {\mathbb {R}}^d)}\) is uniformly bounded. Thus there exists a subsequence \(\tau _k\downarrow 0\) such that \(v_{\tau _k}^\varepsilon \rightharpoonup v^\varepsilon \) in \(L^m([0,T]\times {\mathbb {R}}^d)\) for some \(v^\varepsilon \in L^m([0,T]\times {\mathbb {R}}^d)\). By Proposition 3.1, we know that \(\rho _\tau ^\varepsilon \) narrowly converges to \({\tilde{\rho }}^\varepsilon \) along a subsequence uniformly in [0, T]. By testing against smooth functions, we must have agreement between these limits \(v^\varepsilon = V_\varepsilon * {\tilde{\rho }}^\varepsilon \). Moreover, along a further subsequence which we just label \(\tau \downarrow 0\), we can apply Proposition 4.2 giving

$$\begin{aligned} \lim _{\tau \downarrow 0} \left| \left\{ t\in (0,T) \,: \, \Vert v_\tau ^\varepsilon (t) - v^\varepsilon (t)\Vert _{L^m({\mathbb {R}}^d)}\ge \sigma \right\} \right| = 0, \quad \forall \sigma >0. \end{aligned}$$

Let us denote the set above by \(A_\sigma (\tau )\). For arbitrary \(\sigma >0\), we have

$$\begin{aligned} \Vert v_\tau ^\varepsilon - v^\varepsilon \Vert _{L^m([0,T]\times {\mathbb {R}}^d)}^m&= \int _0^T\int _{{\mathbb {R}}^d} |v_\tau ^\varepsilon - v^\varepsilon |^m = \left( \int _{A_\sigma (\tau )} + \int _{[0,T]\setminus A_\sigma (\tau )}\right) \int _{{\mathbb {R}}^d}|v_\tau ^\varepsilon - v^\varepsilon |^m \\&\le \sup _{s\in [0,T]}2^{(m-1)}\left( \Vert v_\tau ^\varepsilon (s)\Vert _{L^m}^m + \Vert v^\varepsilon (s)\Vert _{L^m}^m \right) \left| A_\sigma (\tau )\right| + \sigma ^m T. \end{aligned}$$
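The two pieces of this bound can be sketched as follows. On \(A_\sigma (\tau )\), convexity of \(s\mapsto |s|^m\) gives the elementary inequality

$$\begin{aligned} |a-b|^m\le 2^{m-1}\left( |a|^m+|b|^m\right) , \quad a,b\in {\mathbb {R}}, \end{aligned}$$

which is integrated over \({\mathbb {R}}^d\) and then over \(A_\sigma (\tau )\). On the complement, \(\Vert v_\tau ^\varepsilon (t)-v^\varepsilon (t)\Vert _{L^m({\mathbb {R}}^d)}<\sigma \) by the definition of \(A_\sigma (\tau )\), so

$$\begin{aligned} \int _{[0,T]\setminus A_\sigma (\tau )}\Vert v_\tau ^\varepsilon (t)-v^\varepsilon (t)\Vert _{L^m({\mathbb {R}}^d)}^m\,dt\le \sigma ^mT. \end{aligned}$$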

Similar to (4.1), we can insert

$$\begin{aligned} \sup _{\varepsilon ,\tau>0,\, t \in [0,T]}\Vert v_\tau ^\varepsilon \Vert _{L^m}^m \le \frac{c_2}{c_1}\Vert \rho _0\Vert _{L^m}^m, \quad \text {and} \quad \sup _{\varepsilon >0, \,t\in [0,T]}\Vert v^\varepsilon \Vert _{L^m}^m\le \frac{c_2}{c_1}\Vert \rho _0\Vert _{L^m}^m \end{aligned}$$

into the previous estimate to obtain

$$\begin{aligned} \Vert v_\tau ^\varepsilon - v^\varepsilon \Vert _{L^m([0,T]\times {\mathbb {R}}^d)}^m \le 2^m\frac{c_2}{c_1} \Vert \rho _0\Vert _{L^m}^m |A_\sigma (\tau )| + \sigma ^mT. \end{aligned}$$

Passing to \(\tau \downarrow 0\) and using \(\lim _{\tau \downarrow 0} |A_\sigma (\tau )| = 0\), we arrive at

$$\begin{aligned} \limsup _{\tau \downarrow 0}\Vert v_\tau ^\varepsilon - v^\varepsilon \Vert _{L^m([0,T]\times {\mathbb {R}}^d)} \le \sigma T^\frac{1}{m}. \end{aligned}$$

Since \(\sigma >0\) was arbitrary, this implies the strong \(L^m\) convergence of \(v_\tau ^\varepsilon \) to \(v^\varepsilon = V_\varepsilon *{\tilde{\rho }}^\varepsilon \).

In the case \(m=1\), we need to argue differently. We apply [50, Proposition 1.10] which asserts that relative compactness in \(L^1((0,T);X)\) is implied by uniform integrability and relative compactness in measure as a function with values in X (\(X\equiv L^1({{\mathbb {R}}^{d}})\) in this proof). Compactness in measure has just been proven as an application of Proposition 4.2. Following [50, Remark 1.11], uniform integrability is a consequence of the strong integral equicontinuity

$$\begin{aligned} \lim _{h\rightarrow 0}\sup _{\tau }\int _0^{T-h}\left| \Vert v^\varepsilon _\tau (t+h)\Vert _{L^1({{\mathbb {R}}^{d}})}-\Vert v^\varepsilon _\tau (t)\Vert _{L^1({{\mathbb {R}}^{d}})}\right| \,dt=0, \end{aligned}$$

where we used that \(\Vert v^\varepsilon _\tau (t)\Vert _{L^1({{\mathbb {R}}^{d}})}=1\) for any \(t\in [0,T]\) and \(\varepsilon >0\).

\(\underline{\text {Strong compactness}\ \varepsilon \downarrow 0:}\) We first claim that the estimate in Lemma 4.1 also holds uniformly for \(v^\varepsilon \), namely

$$\begin{aligned} \sup _{\varepsilon >0} \left\| (v^\varepsilon )^\frac{m}{2}\right\| _{L^2(0,T; \, H^1({\mathbb {R}}^d))} \le C(\rho _0,\, V_1, \, T). \end{aligned}$$
(4.14)

This can be seen by the fact that, up to a further subsequence, \((v_\tau ^\varepsilon )^\frac{m}{2}\) converges to \((v^\varepsilon )^\frac{m}{2}\) strongly in \(L^2([0,T]\times {\mathbb {R}}^d)\). Indeed, by standard results in \(L^p\) integration theory and the fact that \(v_\tau ^\varepsilon \rightarrow v^\varepsilon \) strongly in \(L^m\), there exists \(w^\varepsilon \in L^m([0,T]\times {\mathbb {R}}^d)\) such that, along a subsequence, \(|v_\tau ^\varepsilon | \le w^\varepsilon \) for almost every \((t,x) \in [0,T]\times {\mathbb {R}}^d\). Moreover, we have \(v_\tau ^\varepsilon \rightarrow v^\varepsilon \) pointwise almost everywhere in \([0,T]\times {\mathbb {R}}^d\). Using Lebesgue’s dominated convergence theorem, we obtain

$$\begin{aligned} \int _0^T\int _{{\mathbb {R}}^d}\left| (v_\tau ^\varepsilon )^\frac{m}{2} - (v^\varepsilon )^\frac{m}{2}\right| ^2 \rightarrow 0, \end{aligned}$$

since the integrand converges to 0 pointwise almost everywhere and it is majorised, uniformly in \(\tau \), by

$$\begin{aligned} |(v_\tau ^\varepsilon )^\frac{m}{2} - (v^\varepsilon )^\frac{m}{2}|^2 \le 2((w^\varepsilon )^m + (v^\varepsilon )^m)\in L^1([0,T]\times {\mathbb {R}}^d). \end{aligned}$$

Owing to the (weak \(L^2\)) lower semicontinuity of the \(H^1\) seminorm, the estimate in Lemma 4.1 passes to the limit (along a subsequence) \(\tau \downarrow 0\) and (4.14) is established.

At this point, we can repeat all of the previous argument for \(U=\{v^\varepsilon \}_{\varepsilon \in (0,\varepsilon _0)}\). We take the same space \(X = L^m({\mathbb {R}}^d)\) and \(g=d_1\). The same functional \({\mathscr {F}}\) is still admissible. Tightness and weak integral equicontinuity can be analogously proven.

Proposition 4.2 applies and we have convergence in measure for \(v^\varepsilon \) to some curve v described in the statement of this result. By the same arguments as before, this convergence is strong in \(L^m([0,T]; L^m({\mathbb {R}}^d))\).

\(\square \)

5 Convergence of solutions

This section addresses the proof of Theorem 2.2. We cover the case \(m\ge 2\) in Sect. 5.1 while the case \(1<m<2\) is treated in Sect. 5.2. To simplify the presentation, we focus on functionals \({\mathcal {F}}= {\mathcal {H}}_m\) but we also discuss (see Remark 5.2) the extension to general energies satisfying (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)) for convergence from (NLE) to (DE).

5.1 The case \(m\ge 2\)

Building on the previous discussions from Sects. 3 and 4, we denote by \(\rho ^\varepsilon \) the weak measure solutions to (NLE-m) constructed from the JKO scheme in Proposition 3.1. Moreover, we focus on the subsequence from Proposition 4.3 along which \(v^\varepsilon = V_\varepsilon * \rho ^\varepsilon \) converges to \(v\in C([0,T];\, {{\mathcal {P}}_2({{\mathbb {R}}^{d}})})\cap L^m([0,T]\times {{\mathbb {R}}^{d}})\) in \(L^m([0,T]\times {{\mathbb {R}}^{d}})\). Starting from the definition of weak measure solution to (NLE-m), we can reformulate the right-hand side as follows:

$$\begin{aligned} \begin{aligned}&\int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _t^\varepsilon (x)\!-\!\int _{{\mathbb {R}}^{d}}\varphi (x)d\rho _0(x)\\&\quad =-\frac{m}{m-1}\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\cdot \nabla V_\varepsilon *(V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)d\rho _r^\varepsilon (x)dr\\&\quad =-\frac{m}{m-1}\int _0^t\int _{{\mathbb {R}}^{d}}(V_\varepsilon *\rho _r^\varepsilon \nabla \varphi )(x)\nabla (V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)\,dx\,dr\\&\quad =-2\int _0^t\int _{{\mathbb {R}}^{d}}(V_\varepsilon *\rho _r^\varepsilon \nabla \varphi )(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}\,dx\,dr\\&\quad =-2\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr\\&\qquad -2\int _0^t\int _{{\mathbb {R}}^{d}}z_r^\varepsilon (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr, \end{aligned} \end{aligned}$$
(5.1)

where, for any \(r\in [0,T]\) and \(x\in {{\mathbb {R}}^{d}}\), the error term is

$$\begin{aligned} z_r^\varepsilon (x){:=}(V_\varepsilon *\rho _r^\varepsilon \nabla \varphi )(x)-\nabla \varphi (x)(V_\varepsilon *\rho _r^\varepsilon )(x). \end{aligned}$$
(5.2)

The product \((V_\varepsilon * \rho _r^\varepsilon )^\frac{m}{2}\nabla (V_\varepsilon *\rho _r^\varepsilon )^\frac{m}{2}\) in the last line of (5.1) is a weak-strong convergence pair in \(L_{t,x}^2\). Indeed, recall the uniform \(L_t^2H_x^1\) bound on \((V_\varepsilon * \rho ^\varepsilon )^\frac{m}{2}\) and Proposition 4.2 from Sect. 4. Hence, the first integral in the last line of (5.1) passes to the limit (along a subsequence) as \(\varepsilon \rightarrow 0\). This is made precise later in the full proof of Theorem 2.2; we therefore dedicate much of this section to estimates proving that the error vanishes as \(\varepsilon \rightarrow 0\).
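For the reader's convenience, the two algebraic facts behind the last three lines of (5.1) can be sketched as follows. Here we abbreviate \(v=V_\varepsilon *\rho _r^\varepsilon \) and compute formally where \(v>0\); the first identity is the chain rule, the second is a rearrangement of the definition (5.2) of \(z_r^\varepsilon \):

$$\begin{aligned} \frac{m}{m-1}\nabla v^{m-1}&= m\,v^{m-2}\nabla v = 2\,v^{\frac{m}{2}-1}\nabla v^{\frac{m}{2}},\\ (V_\varepsilon *\rho _r^\varepsilon \nabla \varphi )\,v^{\frac{m}{2}-1}&= \nabla \varphi \,v^{\frac{m}{2}} + z_r^\varepsilon \,v^{\frac{m}{2}-1}. \end{aligned}$$

The first line explains how the prefactor \(\frac{m}{m-1}\) turns into the factor 2, and the second line produces the splitting into a main term and an error term.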

Remark 5.1

If \(V_1\) is compactly supported, the argument that the last term in (5.1) vanishes as \(\varepsilon \downarrow 0\) can be simplified based on the arguments in Sect. 5.2. In the rest of this subsection however, we present a general argument allowing for \(V_1\) with unbounded support.

Notice that the last term in the last equality of (5.1) can be estimated as

$$\begin{aligned} \Vert z^\varepsilon (v^\varepsilon )^{\frac{m}{2}-1}\nabla (v^\varepsilon )^{\frac{m}{2}}\Vert _{L^1([0,t]\times {{\mathbb {R}}^{d}})}\le \Vert z^\varepsilon \Vert _{L^m([0,t]\times {{\mathbb {R}}^{d}})}\Vert (v^\varepsilon )^{\frac{m}{2}-1}\Vert _{L^q([0,t]\times {{\mathbb {R}}^{d}})}\Vert \nabla (v^\varepsilon )^{\frac{m}{2}}\Vert _{L^2([0,t]\times {{\mathbb {R}}^{d}})}, \end{aligned}$$

for \(q=\frac{2m}{m-2}\), so that \(\frac{1}{m}+\frac{1}{q}+\frac{1}{2}=1\) and

$$\begin{aligned} \Vert (v^\varepsilon )^{\frac{m}{2}-1}\Vert _{L^q([0,t]\times {{\mathbb {R}}^{d}})}=\Vert v^\varepsilon \Vert _{L^m([0,t]\times {{\mathbb {R}}^{d}})}^{\frac{m-2}{2}}\le c(T,V_1,\rho _0,m). \end{aligned}$$

Notice that the exponent q is only valid for \(m\ge 2\) based on the computations above. In order to obtain a solution of (PME) in the \(\varepsilon \rightarrow 0^+\) limit, we need to prove that \(z^\varepsilon \rightarrow 0\) in \(L^m([0,t]\times {{\mathbb {R}}^{d}})\), for any \(t\in [0,T]\). In turn, this will imply the error term in (5.1) vanishes as \(\varepsilon \rightarrow 0\), as a consequence of the \(L^2\) version of Lebesgue dominated convergence theorem and weak-\(L^2\) convergence. More precisely, we note that the product \(z^\varepsilon (v^\varepsilon )^{\frac{m}{2}-1}\in L^2([0,t]\times {{\mathbb {R}}^{d}})\) and it converges to 0 strongly in \(L^2\).
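The strong \(L^2\) convergence of the product can be sketched with Hölder's inequality, using the same exponent \(q=\frac{2m}{m-2}\) as above (read \(q=\infty \) when \(m=2\)). The exponents pair up as

$$\begin{aligned} \frac{1}{m}+\frac{1}{q}=\frac{1}{m}+\frac{m-2}{2m}=\frac{1}{2}, \qquad \Vert z^\varepsilon (v^\varepsilon )^{\frac{m}{2}-1}\Vert _{L^2([0,t]\times {{\mathbb {R}}^{d}})}\le \Vert z^\varepsilon \Vert _{L^m([0,t]\times {{\mathbb {R}}^{d}})}\Vert (v^\varepsilon )^{\frac{m}{2}-1}\Vert _{L^q([0,t]\times {{\mathbb {R}}^{d}})}. \end{aligned}$$

Since the second factor on the right is bounded uniformly in \(\varepsilon \), convergence of \(z^\varepsilon \) to 0 in \(L^m\) forces the product to vanish in \(L^2\); pairing it with the \(L^2\)-bounded sequence \(\nabla (v^\varepsilon )^{\frac{m}{2}}\) then gives a vanishing integral.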

Lemma 5.1

There exists a subsequence \(\varepsilon _k\downarrow 0\) such that the error term \(z^{\varepsilon _k}\) converges to zero in \(L^m([0,T]\times {{\mathbb {R}}^{d}})\) as \(k\rightarrow \infty \).

Proof

First we notice that for any \(t\in [0,T]\) and \(\varphi \in C^2_c({{\mathbb {R}}^{d}})\) it holds

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}|z^\varepsilon _t(x)|\,dx&\le \int _{{{\mathbb {R}}^{d}}}\int _{{{\mathbb {R}}^{d}}} V_\varepsilon (x-y) |\nabla \varphi (y) - \nabla \varphi (x)| d\rho ^\varepsilon _t(y)\,dx\\&\le \Vert D^2\varphi \Vert _\infty \int _{{\mathbb {R}}^d}\int _{{{\mathbb {R}}^{d}}}V_\varepsilon (x-y)|y-x|\,d\rho ^\varepsilon _t(y)\,dx\\&=\varepsilon \Vert D^2\varphi \Vert _\infty \int _{{{\mathbb {R}}^{d}}}|z|V_1(z)\,dz, \end{aligned}$$

by means of the change of variable \(z=\frac{x-y}{\varepsilon }\). Therefore, there exists a constant \(C(V_1,\varphi )\) such that \(\Vert z^\varepsilon \Vert _{L^\infty ([0,T];L^1({{\mathbb {R}}^{d}}))}\le \varepsilon C(V_1,\varphi )\), whence, up to passing to a subsequence, \(z_t^\varepsilon (x)\rightarrow 0\) for a.e. \((t,x)\in [0,T]\times {{\mathbb {R}}^{d}}\). We now find a majorant to apply the \(L^p\) version of the generalised Lebesgue dominated convergence theorem.
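The factor \(\varepsilon \) in the bound above comes from the scaling of the kernel. A short sketch, assuming the usual mollifier scaling \(V_\varepsilon (x)=\varepsilon ^{-d}V_1(x/\varepsilon )\) (this normalisation is not restated in this section and is taken as an assumption consistent with the change of variable used) and that \(\rho _t^\varepsilon \) has unit mass; we write \(\xi =\varepsilon w\) for the integration variables to avoid a clash with the error term \(z^\varepsilon \):

$$\begin{aligned} \int _{{{\mathbb {R}}^{d}}}|\xi |V_\varepsilon (\xi )\,d\xi&=\int _{{{\mathbb {R}}^{d}}}|\varepsilon w|\,\varepsilon ^{-d}V_1(w)\,\varepsilon ^{d}\,dw=\varepsilon \int _{{{\mathbb {R}}^{d}}}|w|V_1(w)\,dw,\\ \Vert z^\varepsilon \Vert _{L^1([0,T]\times {{\mathbb {R}}^{d}})}&\le T\,\Vert z^\varepsilon \Vert _{L^\infty ([0,T];L^1({{\mathbb {R}}^{d}}))}\le T\varepsilon \,C(V_1,\varphi )\rightarrow 0. \end{aligned}$$

The second line is the \(L^1\) convergence on the space-time domain from which an almost everywhere convergent subsequence is extracted.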

For almost every \(x\in {{\mathbb {R}}^{d}}\) and \(t\in [0,T]\), for \(i=1,\ldots ,d\), the non-negativity of \(V_1\) and \(\rho ^\varepsilon _t\) gives

$$\begin{aligned} \left| \int _{{{\mathbb {R}}^{d}}} V_\varepsilon (x-y) \partial _{x_i} \varphi (y) d\rho ^\varepsilon _t(y) \right| \le \int _{{{\mathbb {R}}^{d}}} V_\varepsilon (x-y) \vert \partial _{x_i} \varphi (y) \vert d\rho ^\varepsilon _t(y) \le \Vert \partial _{x_i} \varphi \Vert _{\infty } ~v^\varepsilon _t(x), \end{aligned}$$

whence

$$\begin{aligned} |z_t^\varepsilon (x)|\le 2\Vert \nabla \varphi \Vert _\infty |v_t^\varepsilon (x)|. \end{aligned}$$
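Spelled out, this bound is the triangle inequality applied to (5.2), combined with the previous estimate and \(|\nabla \varphi (x)|\le \Vert \nabla \varphi \Vert _\infty \). A sketch of the step and of the resulting majorant for the \(m\)-th power:

$$\begin{aligned} |z_t^\varepsilon (x)|&\le |(V_\varepsilon *\rho _t^\varepsilon \nabla \varphi )(x)|+|\nabla \varphi (x)|\,v_t^\varepsilon (x)\le 2\Vert \nabla \varphi \Vert _\infty v_t^\varepsilon (x),\\ |z_t^\varepsilon (x)|^m&\le (2\Vert \nabla \varphi \Vert _\infty )^m\,|v_t^\varepsilon (x)|^m. \end{aligned}$$

The majorants on the right of the second line depend on \(\varepsilon \), which is why the generalised form of the dominated convergence theorem is the relevant one: it requires the majorants themselves to converge in \(L^1\).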

Since \(v^\varepsilon \in L^m([0,t]\times {{\mathbb {R}}^{d}})\) and it converges strongly in \(L^m\), cf. Proposition 4.3, the generalised dominated convergence theorem mentioned above yields the result. \(\square \)

Lemma 5.2

For any \(t\in [0,T]\) and any \(\varphi \in C_c^1({{\mathbb {R}}^{d}})\) it holds

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0^+}\int _{{\mathbb {R}}^{d}}\varphi (x)v_t^\varepsilon (x)\,dx=\int _{{{\mathbb {R}}^{d}}}\varphi (x)\,d{\tilde{\rho }}(t). \end{aligned}$$

Proof

For any \(t\in [0,T]\) and any \(\varphi \in C_c^1({{\mathbb {R}}^{d}})\), by using the definition of \(v_t^\varepsilon \) we obtain:

$$\begin{aligned} \left| \int _{{\mathbb {R}}^{d}}\varphi (x)v_t^\varepsilon (x)\,dx-\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho ^\varepsilon _t(x)\right|&=\left| \int _{{\mathbb {R}}^{d}}\varphi (x)(V_\varepsilon *\rho ^\varepsilon _t)(x)\,dx-\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho ^\varepsilon _t(x)\right| \\&=\left| \int _{{\mathbb {R}}^{d}}(\varphi *V_\varepsilon )(x)\,d\rho ^\varepsilon _t(x)-\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho ^\varepsilon _t(x)\right| \\&=\left| \int _{{\mathbb {R}}^{d}}[(\varphi *V_\varepsilon )(x)-\varphi (x)]\,d\rho ^\varepsilon _t(x)\right| \\&\le \int _{{\mathbb {R}}^{d}}\int _{{\mathbb {R}}^{d}}|\varphi (x-y)-\varphi (x)|V_\varepsilon (y)\,dy\,d\rho ^\varepsilon _t(x)\\&\le \Vert \nabla \varphi \Vert _\infty \int _{{\mathbb {R}}^{d}}|y|V_\varepsilon (y)\,dy\\&=\varepsilon \Vert \nabla \varphi \Vert _\infty \int _{{\mathbb {R}}^{d}}|x|V_1(x)\,dx, \end{aligned}$$

which converges to 0 as \(\varepsilon \rightarrow 0^+\) since \(\int _{{\mathbb {R}}^{d}}|x|V_1(x)\,dx<+\infty \). In the second-to-last estimate, we used the mean-value inequality \(|\varphi (x-y) - \varphi (x)| \le \Vert \nabla \varphi \Vert _{\infty } |y|\). \(\square \)

We now have all the information to prove Theorem 2.2 in the case \({\mathcal {F}}= {\mathcal {H}}_m\) for \(m\ge 2\).

Proof of Theorem 2.2 for \({\mathcal {F}}= {\mathcal {H}}_m\) and \(m\ge 2\) Since \(\rho ^\varepsilon \) is a weak solution to (NLE-m), for any \(\varphi \in C^1_c({{\mathbb {R}}^{d}})\) and \(t\in [0,T]\) it satisfies

$$\begin{aligned}&\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _t^\varepsilon (x)-\int _{{\mathbb {R}}^{d}}\varphi (x)\,d\rho _0(x)\\&\quad =-2\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr\\&\qquad -2\int _0^t\int _{{\mathbb {R}}^{d}}z_r^\varepsilon (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr, \end{aligned}$$

as explained in (5.1). Proposition 4.1, Lemma 4.1, Lemma 5.2, and Lemma 4.3 imply the existence of a subsequence of \(\rho ^\varepsilon (t)\) narrowly converging to \({\tilde{\rho }}\in L^m([0,T];L^m({{\mathbb {R}}^{d}}))\), and, in particular, \(\{v^\varepsilon \}_\varepsilon \) admits a subsequence such that

$$\begin{aligned}&v^{\varepsilon _k}\rightarrow {\tilde{\rho }} \qquad \quad \text{ in } L^m([0,T];L^m({{\mathbb {R}}^{d}}));\\&\quad \nabla (v^{\varepsilon _k})^\frac{m}{2}\rightharpoonup w \quad \, \text{ in } L^2([0,T];L^2({{\mathbb {R}}^{d}})). \end{aligned}$$

By a standard argument one can show that \((v^{\varepsilon _k})^\frac{m}{2}\rightarrow ({\tilde{\rho }})^\frac{m}{2}\) in \(L^2([0,T];L^2({{\mathbb {R}}^{d}}))\), whence \(w\equiv \nabla ({\tilde{\rho }})^{\frac{m}{2}}\). Before letting \(\varepsilon \rightarrow 0^+\) and obtaining the result, we need to further regularise the test function \(\varphi \), since Lemma 5.1 makes use of test functions in \(C^2_c({{\mathbb {R}}^{d}})\). In this regard, we consider a standard mollifier \(\eta \in C_c^\infty ({{\mathbb {R}}^{d}})\) and the corresponding sequence \(\varphi ^\sigma {:=}\eta ^\sigma *\varphi \in C_c^\infty ({{\mathbb {R}}^{d}})\), where \(\eta ^\sigma (x)=\sigma ^{-d}\eta (x/\sigma )\) for any \(x\in {{\mathbb {R}}^{d}}\) and \(\sigma >0\). As a consequence of the observations above and Lemma 5.1, by letting \(\varepsilon \rightarrow 0^+\) we obtain, for any \(\sigma >0\) and \(t\in [0,T]\),

$$\begin{aligned}&\int _{{\mathbb {R}}^{d}}\varphi ^\sigma (x){{\tilde{\rho }}}(t,x)\,dx\\&\quad = \int _{{\mathbb {R}}^{d}}\varphi ^\sigma (x)\rho _0(x)\,dx-2\int _0^t\int _{{\mathbb {R}}^{d}}[{{\tilde{\rho }}}(s,x)]^{\frac{m}{2}} \nabla \varphi ^\sigma (x)\cdot \nabla [{{\tilde{\rho }}}(s,x)]^{\frac{m}{2}}\,dx\,ds\\&\quad =\int _{{\mathbb {R}}^{d}}\varphi ^\sigma (x)\rho _0(x)\,dx-\frac{m}{m-1}\int _0^t\int _{{\mathbb {R}}^{d}}{{\tilde{\rho }}}(s,x) \nabla \varphi ^\sigma (x)\cdot \nabla [{{\tilde{\rho }}}(s,x)]^{m-1}\,dx\,ds, \end{aligned}$$

where in the last equality we use \(m\ge 2\), so that the chain rule holds true, cf. Remark 2.3. More precisely, we rewrite \(\rho ^{m-1}=G\circ u\), for \(u=\rho ^{\frac{m}{2}}\) and \(G(x)=x^{\frac{2(m-1)}{m}}\), which is admissible since \(\frac{2(m-1)}{m}\ge 1\). As pointed out in Remark 2.3, the usual definition of weak solution holds by identifying \(\nabla \rho ^m=2\rho ^{\frac{m}{2}}\nabla \rho ^{\frac{m}{2}}\) (in the weak sense): write \(\rho ^m=G\circ u\), for \(u=\rho ^{\frac{m}{2}}\) and \(G(x)=x^2\). Since \(\varphi ^\sigma \) converges uniformly to \(\varphi \) on compact sets, we can let \(\sigma \rightarrow 0\) and obtain that \({{\tilde{\rho }}}\) is a weak solution to (PME) in the sense of Definition 2.3. Uniqueness of weak solutions of (PME) is a known result, cf., e.g., [28, 58]. Hence, the whole sequence \(\rho ^\varepsilon \) converges narrowly to \({\tilde{\rho }}\), and \({\tilde{\rho }}^{-\frac{1}{2}}|\nabla {\tilde{\rho }}^m|\in L^1([0,T];L^2({{\mathbb {R}}^{d}}))\), by comparison with the theory in [1]. \(\square \)

Remark 5.2

(Theorem 2.2 for general functionals) For a general integrand F satisfying (F1), (F2), and (F3), the right-hand side of (5.1) becomes

$$\begin{aligned} -\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\nabla F'(V_\varepsilon *\rho _r^\varepsilon )(x)(V_\varepsilon *\rho _r^\varepsilon )(x)\,dx\,dr -\int _0^t\int _{{\mathbb {R}}^{d}}z_r^\varepsilon (x)\nabla F'(V_\varepsilon *\rho _r^\varepsilon )\,dx\,dr, \end{aligned}$$

with \(z^\varepsilon \) as in (5.2). Supposing F also satisfies (\({{\textbf {F}}}_m\)) for some \(m\ge 2\), we have the estimate \(|F''(x)|\le c x^{m-2}\), hence \(F\in C^{2}([0,\infty ))\) (origin included). Notice that the power laws for \(m\ge 2\) satisfy all of (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)). The error term can be estimated as

$$\begin{aligned}&\left| \int _0^t\int _{{\mathbb {R}}^{d}}z_r^\varepsilon (x)\nabla F'(V_\varepsilon *\rho _r^\varepsilon )\,dx\,dr\right| \\&\quad \lesssim \int _0^t\int _{{\mathbb {R}}^{d}}|z_r^\varepsilon (x)||v_r^\varepsilon (x)|^{\frac{m}{2}-1}|\nabla (v_r^\varepsilon (x))^{\frac{m}{2}}|\,dx\,dr\\&\quad \lesssim \Vert z^\varepsilon \Vert _{L^m([0,t]\times {{\mathbb {R}}^{d}})}\Vert (v^\varepsilon )^{\frac{m}{2}-1}\Vert _{L^q([0,t]\times {{\mathbb {R}}^{d}})}\Vert \nabla (v^\varepsilon )^{\frac{m}{2}}\Vert _{L^2([0,t]\times {{\mathbb {R}}^{d}})}, \end{aligned}$$

for \(q=\frac{2m}{m-2}\), so that \(\frac{1}{m}+\frac{1}{q}+\frac{1}{2}=1\); thus it vanishes as \(\varepsilon \rightarrow 0\) owing to Lemma 5.1. As for the first term, note that it can be rewritten as

$$\begin{aligned}&\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\nabla F'(v_r^\varepsilon )(x)v_r^\varepsilon (x)\,dx\,dr\\&\quad =\frac{2}{m}\int _0^t\int _{{\mathbb {R}}^{d}}F''(v_r^\varepsilon (x))(v_r^\varepsilon (x))^{2-\frac{m}{2}}\nabla (v_r^\varepsilon )^\frac{m}{2}\nabla \varphi (x)\,dx\,dr, \end{aligned}$$

where \(F''(x)x^{2-\frac{m}{2}}\) is extended by zero when \(x=0\) owing to (\({{\textbf {F}}}_m\)), and we applied the chain rule twice on the set \(\{v_r^\varepsilon >0\}\):

$$\begin{aligned} v_r^\varepsilon \nabla v_r^\varepsilon =\frac{1}{2}\nabla [ (v_r^\varepsilon )^{\frac{m}{2}}]^{\frac{4}{m}}=\frac{2}{m}(v_r^\varepsilon (x))^{2-\frac{m}{2}}\nabla (v_r^\varepsilon )^\frac{m}{2}. \end{aligned}$$

When multiplied by \(F''(v_r^\varepsilon )\), the integrand on the right-hand side makes sense in \(L^1\) owing to (\({{\textbf {F}}}_m\)) since \(F''(x) \le c x^{m-2}\). Then, we are left to show that \(g(v^\varepsilon ){:=}F''(v^\varepsilon )(v^\varepsilon )^{2-\frac{m}{2}}\) converges strongly in \(L^2([0,T]\times {{\mathbb {R}}^{d}})\). This is indeed achieved by bounding \(|F''(v^\varepsilon )(v^\varepsilon )^{2-\frac{m}{2}}|\le c (v^\varepsilon )^\frac{m}{2}\) and applying the generalised version of the Lebesgue dominated convergence theorem. Therefore, in the \(\varepsilon \rightarrow 0^+\) limit we obtain (up to passing to a subsequence)

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x){{\tilde{\rho }}}(t,x)\,dx&= \int _{{\mathbb {R}}^{d}}\varphi (x)\rho _0(x)\,dx\\&\quad -\frac{2}{m}\int _0^t\int _{{\mathbb {R}}^{d}}F''({{\tilde{\rho }}}(s,x))({{\tilde{\rho }}}(s,x))^{2-\frac{m}{2}}\nabla ({{\tilde{\rho }}}(s,x))^\frac{m}{2}\nabla \varphi (x)\,dx\,ds. \end{aligned}$$
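As a consistency check, one can specialise the flux in this identity to the power law. The sketch below assumes \(F(x)=\frac{x^m}{m-1}\), a normalisation inferred from the prefactor \(\frac{m}{m-1}\) in (5.1) rather than stated in this section, so that \(F''(x)=m\,x^{m-2}\):

$$\begin{aligned} \frac{2}{m}F''({\tilde{\rho }})\,{\tilde{\rho }}^{\,2-\frac{m}{2}}\nabla {\tilde{\rho }}^{\frac{m}{2}}=\frac{2}{m}\,m\,{\tilde{\rho }}^{\,m-2}\,{\tilde{\rho }}^{\,2-\frac{m}{2}}\nabla {\tilde{\rho }}^{\frac{m}{2}}=2\,{\tilde{\rho }}^{\frac{m}{2}}\nabla {\tilde{\rho }}^{\frac{m}{2}}. \end{aligned}$$

Under this assumption the general identity reduces to the one obtained for \({\mathcal {H}}_m\) in the proof of Theorem 2.2, where the same flux \(2{\tilde{\rho }}^{\frac{m}{2}}\nabla {\tilde{\rho }}^{\frac{m}{2}}\) appears.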

Defining G such that \(G'(x^{\frac{m}{2}})=\frac{2}{m}F''(x)x^{2-\frac{m}{2}}\), or \(G'(x)=\frac{2}{m}F''(x^\frac{2}{m})x^{\frac{2}{m}(2-\frac{m}{2})}\), and \(G(0)=0\), we can apply the chain rule to \(P(x)=G(x^{\frac{m}{2}})\), so that \(P'(x)=xF''(x)\), to obtain that \(\nabla P(\rho )\in L^1([0,T]\times {{\mathbb {R}}^{d}})\), thus

$$\begin{aligned} \int _{{\mathbb {R}}^{d}}\varphi (x){{\tilde{\rho }}}(t,x)\,dx= \int _{{\mathbb {R}}^{d}}\varphi (x)\rho _0(x)\,dx-\int _0^t\int _{{\mathbb {R}}^{d}}\nabla P({{\tilde{\rho }}}(s,x))\nabla \varphi (x)\,dx\,ds. \end{aligned}$$

Note that the chain rule and the construction of the pressure P hold for all \(m>1\). Uniqueness of distributional solutions of (DE) is proven in [5] for bounded solutions, which is actually the case for \(L^1\) solutions; see [15, 59] for further details on the so-called \(L^1\)-\(L^\infty \) regularising effect. In particular, from [1, Theorem 11.2.5] we infer that our solution is a 2-Wasserstein gradient flow satisfying

$$\begin{aligned} \int _0^T\int _{{\mathbb {R}}^{d}}\frac{|\nabla P(\rho )|^2}{\rho }\,dx\,dt<\infty . \end{aligned}$$

Furthermore, uniqueness of solutions implies convergence of the whole sequence \(\rho ^\varepsilon \), as for (PME).

Remark 5.3

Our result can also be interpreted in the context of generalised gradient flows or gradient structures, following the dynamical interpretation of the Wasserstein distance, cf. [3, 33]. More precisely, we know

$$\begin{aligned} d_W^2(\rho _0,\rho _1)=\inf \left\{ \int _0^1 {\mathcal {A}}(\rho _t,j_t)\,dt : (\rho ,j) \text{ solves } {\left\{ \begin{array}{ll}\partial _t\rho +\nabla \cdot j=0,\\ \rho (0)=\rho _0, \quad \rho (1)=\rho _1 \end{array}\right. }\right\} , \end{aligned}$$

where, for any \(\lambda \in {\mathcal {M}}({{\mathbb {R}}^{d}};{{\mathbb {R}}^{d}})\) such that \(\rho , j\ll |\lambda |\),

$$\begin{aligned} {\mathcal {A}}(\rho ,j)=\int _{{\mathbb {R}}^{d}}\alpha \left( \frac{dj}{d|\lambda |},\frac{d\rho }{d|\lambda |}\right) d|\lambda |, \quad \text{ and } \quad \alpha (j,r){:=}{\left\{ \begin{array}{ll} \frac{|j|^2}{r} \qquad &{}\text {if}\ r>0,\\ 0 \qquad &{}\text {if}\ j= 0\ \text {and}\ r=0,\\ \infty \qquad &{}\text {if}\ j\ne 0\ \text {and}\ r=0. \end{array}\right. } \end{aligned}$$

Upon using a careful regularisation and cut-off argument one can prove the following chain rule for any absolutely continuous curve with respect to the Wasserstein distance

$$\begin{aligned} {\mathcal {F}}^\varepsilon [\rho (t)]-{\mathcal {F}}^\varepsilon [\rho _0]=-\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \frac{\delta {\mathcal {F}}^\varepsilon }{\delta \rho }(x)\, \cdot d j_t(x)\,dt, \end{aligned}$$

and hence re-interpret weak (measure) solutions of (NLE) as the zero level set of the De Giorgi functional

$$\begin{aligned} {\mathcal {F}}^\varepsilon [\rho (t)]-{\mathcal {F}}^\varepsilon [\rho _0]+\frac{1}{2}\int _0^t\int _{{\mathbb {R}}^{d}}\left| \nabla \frac{\delta {\mathcal {F}}^\varepsilon }{\delta \rho }(x)\right| ^2\,d \rho _t(x)\,dt+\frac{1}{2}\int _0^t{\mathcal {A}}\left( \rho ,\,-\rho \nabla \frac{\delta {\mathcal {F}}^\varepsilon }{\delta \rho }\right) dt=0. \end{aligned}$$

5.2 The case \(1<m<2\)

The key idea here is to estimate the error \(z^\varepsilon \) from (5.2) differently by exploiting the compact support of \(V_1\). We take \(R>0\) such that \(\textrm{supp}V_1 \subset B_R\). In the case \(m\ge 2\), negative powers of \(v^\varepsilon = V_\varepsilon *\rho ^\varepsilon \) never appeared in (5.1) but these computations can be recycled by cautiously avoiding the 0 level set of \(v^\varepsilon \). We define

$$\begin{aligned} A_r^\varepsilon {:=} \{x\in {{\mathbb {R}}^{d}}\, | \, v_r^\varepsilon (x) = V_\varepsilon * \rho _r^\varepsilon (x) >0\} \end{aligned}$$

and alter the computations in (5.1) carefully:

$$\begin{aligned} \begin{aligned} \int _{{\mathbb {R}}^d}\!\!\varphi d\rho _t^\varepsilon \!-\!\!\int _{{{\mathbb {R}}^{d}}}\!\!\varphi d\rho _0&=-\frac{m}{m-1}\int _0^t\int _{{\mathbb {R}}^{d}}\nabla \varphi (x)\cdot \nabla V_\varepsilon *(V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)d\rho _r^\varepsilon (x)dr\\&=\frac{m}{m-1}\int _0^t\int _{{{\mathbb {R}}^{d}}}\!\!\!(V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)[\nabla V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )](x)dxdr\\&= \frac{m}{m-1}\int _0^t\int _{{{\mathbb {R}}^{d}}}\!\!\!(V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)\nabla [V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )](x)dxdr. \end{aligned} \end{aligned}$$
(5.3)

In the second line, we swapped the convolution against \(\nabla V_\varepsilon \) and picked up a minus sign because it is an odd function (remember from (V) that \(V_1\) is even). At this point, we would like to perform integration by parts and apply the gradient onto \((V_\varepsilon *\rho _r^\varepsilon )^{m-1}\). In contrast to (5.1), when \(1<m<2\) we need to avoid the zero set of \(V_\varepsilon *\rho _r^\varepsilon \); as smooth as this convolution may be, the function \((V_\varepsilon *\rho _r^\varepsilon )^{m-1}\) is not differentiable on \({{\mathbb {R}}^{d}}\setminus A_r^\varepsilon \). However, due to Lemma 5.3 (see below), we can justify the integration by parts and develop (5.3) to get

$$\begin{aligned} \begin{aligned} \int _{{{\mathbb {R}}^{d}}}\!\!\varphi d\rho _t^\varepsilon \!-\!\!\int _{{{\mathbb {R}}^{d}}}\!\!\varphi d\rho _0&=\frac{m}{m-1}\int _0^t\int _{{{\mathbb {R}}^{d}}}\!\!\!(V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x)\nabla [V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )](x)dxdr \\&= -\frac{m}{m-1}\int _0^t \int _{A_r^\varepsilon }\nabla (V_\varepsilon *\rho _r^\varepsilon )^{m-1}(x) V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x)dxdr \quad (\text {Lemma 5.3}) \\&= -2\int _0^t\int _{A_r^\varepsilon }(V_\varepsilon *\rho _r^\varepsilon \nabla \varphi )(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}\,dx\,dr\\&=-2\int _0^t\int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr\\&\quad -2\int _0^t\int _{A_r^\varepsilon }z_r^\varepsilon (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr. \end{aligned} \end{aligned}$$
(5.4)

The last line is nearly identical to the end result of (5.1). Here, we integrate over \(A_r^\varepsilon = \{ V_\varepsilon *\rho _r^\varepsilon >0\}\) which is justified by Lemma 5.3. As was the case in Sect. 5.1, we need to show that the error term in the last line vanishes as \(\varepsilon \downarrow 0\).

Proof of Theorem 2.2 for \(1<m<2\) Convergence in the first term on the right-hand side of (5.4) can be treated as for the case \(m\ge 2\), due to Lemma 4.1 and Proposition 4.3. Hence we focus on the error term. For simplicity, let us assume \(\varphi \in C_c^2({\mathbb {R}}^d)\), since a general test function can be approximated as described in Sect. 5.1. We estimate the error by first expressing it as

$$\begin{aligned} z^\varepsilon (x)&= V_\varepsilon *(\rho ^\varepsilon \nabla \varphi )(x) - (V_\varepsilon *\rho ^\varepsilon )(x)\nabla \varphi (x) \\&= \int _{{{\mathbb {R}}^{d}}}V_\varepsilon (x-y)(\nabla \varphi (y) - \nabla \varphi (x))d\rho ^\varepsilon (y). \end{aligned}$$

Next, we apply the Mean Value Theorem to the difference \(|\nabla \varphi (y) - \nabla \varphi (x)| \le \Vert D^2\varphi \Vert _{L^\infty }|x-y|\) and obtain

$$\begin{aligned} |z^\varepsilon | \le \Vert D^2\varphi \Vert _{L^\infty }\int _{{\mathbb {R}}^d} |x-y|V_\varepsilon (x-y)d\rho ^\varepsilon (y). \end{aligned}$$

Now, we exploit the compact support of the generator \(V_1\). Since \(V_1\) is supported within \(B_R\), then \(V_\varepsilon \) is supported within \(B_{R\varepsilon }\) which leads to

$$\begin{aligned} \begin{aligned} |z^\varepsilon (x)|&\le \Vert D^2\varphi \Vert _{L^\infty } \int _{\{y \in {\mathbb {R}}^d \, | \, |x-y|\le R\varepsilon \}}|x-y| V_\varepsilon (x-y)d\rho ^\varepsilon (y) \\&\le R\varepsilon \Vert D^2\varphi \Vert _{L^\infty } \int _{{{\mathbb {R}}^{d}}} V_\varepsilon (x-y)d\rho ^\varepsilon (y) = R\varepsilon \Vert D^2\varphi \Vert _{L^\infty }(V_\varepsilon *\rho ^\varepsilon )(x), \quad \forall x\in {{\mathbb {R}}^{d}}. \end{aligned} \end{aligned}$$
(5.5)

The last integral in (5.4) can be estimated with (5.5) as follows

$$\begin{aligned}&\left| \int _0^t\int _{A_r^\varepsilon } z_r^\varepsilon (x)(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}-1}\nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\,dx\,dr \right| \\&\quad \le R\varepsilon \Vert D^2\varphi \Vert _{L^\infty }\int _0^t \int _{A_r^\varepsilon }(V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}\left| \nabla (V_\varepsilon *\rho _r^\varepsilon )^{\frac{m}{2}}(x)\right| \,dx\,dr \\&\quad \le R\varepsilon \Vert D^2\varphi \Vert _{L^\infty }\Vert (v^\varepsilon )^\frac{m}{2}\Vert _{L^2([0,T]\times {{\mathbb {R}}^{d}})}\Vert \nabla (v^\varepsilon )^\frac{m}{2}\Vert _{L^2([0,T]\times {{\mathbb {R}}^{d}})}. \end{aligned}$$
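It may help to isolate why no negative power of \(v^\varepsilon \) survives in this estimate, even though \(\frac{m}{2}-1<0\) when \(1<m<2\). A sketch of the pointwise step on \(A_r^\varepsilon \), where \(v_r^\varepsilon >0\), using only (5.5):

$$\begin{aligned} |z_r^\varepsilon |\,(v_r^\varepsilon )^{\frac{m}{2}-1}\le R\varepsilon \Vert D^2\varphi \Vert _{L^\infty }\,v_r^\varepsilon \,(v_r^\varepsilon )^{\frac{m}{2}-1}=R\varepsilon \Vert D^2\varphi \Vert _{L^\infty }\,(v_r^\varepsilon )^{\frac{m}{2}}. \end{aligned}$$

The factor \(v_r^\varepsilon \) gained from the compact support of \(V_1\) exactly absorbs the negative exponent, and the Cauchy-Schwarz inequality in \(L^2([0,T]\times {{\mathbb {R}}^{d}})\) then gives the last line of the estimate.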

We have suggestively recalled the notation \(v^\varepsilon = V_\varepsilon *\rho ^\varepsilon \) precisely with Lemma 4.1 in mind; \((v^\varepsilon )^\frac{m}{2}\) is uniformly bounded in \(L^2([0,T]; H^1({\mathbb {R}}^d))\). Hence, the last integral is uniformly bounded in \(\varepsilon \). Moreover, the prefactor of vanishing \(\varepsilon \) implies that the last term of (5.4) converges to zero in the limit, thus recovering

$$\begin{aligned} \int _{{{\mathbb {R}}^{d}}}\varphi (x) d\rho _t(x)=\int _{{{\mathbb {R}}^{d}}}\varphi (x) d\rho _0(x)-2\int _0^t\int _{{{\mathbb {R}}^{d}}}\nabla \varphi (x)(\rho _r)^{\frac{m}{2}}(x)\nabla (\rho _r)^{\frac{m}{2}}(x)\,dx\,dr. \end{aligned}$$

By means of the chain rule for Sobolev spaces, one can prove \(\nabla \rho ^m=2\rho ^{\frac{m}{2}}\nabla \rho ^{\frac{m}{2}}\) (in the weak sense). More precisely, we can see \(\rho ^m=G\circ u\), for \(u=\rho ^{\frac{m}{2}}\) and \(G(x)=x^2\). The work by Dahlberg and Kenig [28] establishes uniqueness of very weak solutions for (PME), for \(m>1\). As a byproduct, we also infer that our solution is a gradient flow in the sense of [1, Theorem 11.2.5], meaning

$$\begin{aligned} \int _0^T\int _{{\mathbb {R}}^{d}}\frac{|\nabla \rho ^m|^2}{\rho }\,dx\,dt<\infty . \end{aligned}$$

As for general energies induced by F satisfying (F1), (F2), (F3), and (\({{\textbf {F}}}_m\)), a combination of the same estimates from Remark 5.2 (disregarding the Hölder estimate) and this new technique for treating \(z^\varepsilon \) yields the full result. \(\square \)

We now justify the integration by parts step going from (5.3) to (5.4). Recall the notation \(v_r^\varepsilon = V_\varepsilon *\rho _r^\varepsilon \).

Lemma 5.3

For fixed \(\varepsilon ,\, t>0\), and \(\varphi \in C_c^1({{\mathbb {R}}^{d}})\), there holds

$$\begin{aligned} \int _0^t\int _{{{\mathbb {R}}^{d}}}(v_r^\varepsilon )^{m-1}(x) \nabla V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x) dx dr = -\int _0^t \int _{A_r^\varepsilon } \nabla (v_r^\varepsilon )^{m-1}(x)V_\varepsilon * (\rho _r^\varepsilon \nabla \varphi )(x) dx dr. \end{aligned}$$
(5.6)

In particular, both integrals converge absolutely.

Proof

We begin by proving both integrals in (5.6) converge absolutely. For the integral on the left-hand side of (5.6), we estimate

$$\begin{aligned} \begin{aligned}&\int _0^t\int _{{{\mathbb {R}}^{d}}}\left| (v_r^\varepsilon )^{m-1}(x) \nabla V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x)\right| dxdr \\&\quad \le \int _0^t\int _{{{\mathbb {R}}^{d}}}\left( \int _{{{\mathbb {R}}^{d}}} V_\varepsilon (x-y)d\rho _r^\varepsilon (y) \right) ^{m-1}\left( \int _{{{\mathbb {R}}^{d}}}\left| \nabla V_\varepsilon (x-z)\nabla \varphi (z) \right| d\rho _r^\varepsilon (z)\right) dx dr \\&\quad \le \int _0^t\int _{{{\mathbb {R}}^{d}}}\Vert V_\varepsilon \Vert _{L^\infty }^{m-1}\Vert \nabla \varphi \Vert _{L^\infty }\int _{{{\mathbb {R}}^{d}}} |\nabla V_\varepsilon (x-z)|d\rho _r^\varepsilon (z) dx dr \\&\quad \le \int _0^t\Vert V_\varepsilon \Vert _{L^\infty }^{m-1}\Vert \nabla \varphi \Vert _{L^\infty } \Vert \nabla V_\varepsilon \Vert _{L^1}dr \le \Vert V_\varepsilon \Vert _{L^\infty }^{m-1}\Vert \nabla \varphi \Vert _{L^\infty } \Vert \nabla V_\varepsilon \Vert _{L^1} t, \end{aligned} \end{aligned}$$
(5.7)

where the last line is obtained by Fubini’s theorem. Therefore, the integral on the left-hand side of (5.6) is absolutely convergent. Turning to the integral on the right-hand side of (5.6), we first record

$$\begin{aligned} \begin{aligned} |V_\varepsilon * (\rho _r^\varepsilon \nabla \varphi )(x)|&\le \int \left| V_\varepsilon (x-y)\nabla \varphi (y)\right| d\rho _r^\varepsilon (y) \le \Vert \nabla \varphi \Vert _{L^\infty }\int V_\varepsilon (x-y)d\rho _r^\varepsilon (y) \\&= \Vert \nabla \varphi \Vert _{L^\infty }v_r^\varepsilon (x) = \Vert \nabla \varphi \Vert _{L^\infty }v_r^\varepsilon (x) \chi _{A_r^\varepsilon }(x), \end{aligned} \end{aligned}$$
(5.8)

where \(\chi _{A_r^\varepsilon }(x)\) is the indicator function on the set \(A_r^\varepsilon \). With (5.8), we obtain

$$\begin{aligned} |\nabla (v_r^\varepsilon )^{m-1}(x)| |V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x)|&\le \Vert \nabla \varphi \Vert _{L^\infty } |\nabla (v_r^\varepsilon )^{m-1}(x)| v_r^\varepsilon (x)\chi _{A_r^\varepsilon }(x)\\&= (m-1)\Vert \nabla \varphi \Vert _{L^\infty }|\nabla v_r^\varepsilon | (v_r^\varepsilon )^{m-2} v_r^\varepsilon \chi _{A_r^\varepsilon } \\&= \frac{2(m-1)}{m}\Vert \nabla \varphi \Vert _{L^\infty } |\nabla (v_r^\varepsilon )^\frac{m}{2}|(v_r^\varepsilon )^\frac{m}{2}\chi _{A_r^\varepsilon }. \end{aligned}$$

Recalling Lemma 4.1, we conclude by comparison that

$$\begin{aligned} |\nabla (v_r^\varepsilon )^{m-1}(x)| |V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x)| \in L^1((0,t)\times {{\mathbb {R}}^{d}}), \end{aligned}$$

which shows that the integral on the right-hand side of (5.6) is absolutely convergent.

Integration by parts: For brevity, we drop the subscript \(r\in (0,t)\) and the superscript \(\varepsilon >0\) so we consider \(v = V *\rho \) in place of \(v_r^\varepsilon = V_\varepsilon * \rho _r^\varepsilon \). In order to verify (5.6), we fix \(\sigma >0\) and a direction \(e\in {\mathbb {S}}^{d-1}\) and look at the following difference quotient

$$\begin{aligned} I_\sigma&{:=} \int _0^t\int _{{{\mathbb {R}}^{d}}} v^{m-1}(x) \frac{V*(\rho \nabla \varphi )(x+\sigma e) - V*(\rho \nabla \varphi )(x)}{\sigma }dx dr. \end{aligned}$$

Owing to the uniform bound from (5.8) and Lebesgue’s Dominated Convergence Theorem, we have

$$\begin{aligned} \lim _{\sigma \rightarrow 0} I_\sigma =e \cdot \int _0^t\int _{{{\mathbb {R}}^{d}}}(v_r^\varepsilon )^{m-1}(x) \nabla V_\varepsilon *(\rho _r^\varepsilon \nabla \varphi )(x) dx dr, \end{aligned}$$

recovering the left-hand side of (5.6) (along any arbitrary direction \(e\in {\mathbb {S}}^{d-1}\)). On the other hand, by changing variables we also have

$$\begin{aligned} I_\sigma&= \int _0^t\int _{{{\mathbb {R}}^{d}}}\frac{v^{m-1}(x-\sigma e) - v^{m-1}(x)}{\sigma }V*(\rho \nabla \varphi )(x)dx dr \\&= -\int _0^t\int _{ A_r^\varepsilon }\frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x)dxdr. \end{aligned}$$

We are allowed to restrict the integration region to \(A_r^\varepsilon \) due to (5.8); if \(x\notin A_r^\varepsilon \), then \(|V*(\rho \nabla \varphi )(x)| \le \Vert \nabla \varphi \Vert _{L^\infty }v(x) = 0\). In order to prove (5.6), we wish to show

$$\begin{aligned} \begin{aligned} \lim _{\sigma \rightarrow 0}I_\sigma&= \lim _{\sigma \rightarrow 0}\left( -\int _0^t\int _{ A_r^\varepsilon }\frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x)dxdr\right) \\&= -e\cdot \int _0^t \int _{A_r^\varepsilon } \nabla (v_r^\varepsilon )^{m-1}(x)V_\varepsilon * (\rho _r^\varepsilon \nabla \varphi )(x) dx dr. \end{aligned} \end{aligned}$$
(5.9)

The strategy is to apply the extended Dominated Convergence Theorem (Theorem A.1) by exhibiting an appropriate sequence of majorants to the integrand in the first line of (5.9).

The first step is to remember (5.8) and estimate the integrand of \(I_\sigma \) as follows

$$\begin{aligned} \begin{aligned}&\left| \frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x) \right| \\&\quad \le \Vert \nabla \varphi \Vert _{L^\infty } \left| \frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }\right| \left| V*\rho (x)\right| \\&\quad = \Vert \nabla \varphi \Vert _{L^\infty } \left| \frac{v^{m}(x) - v^{m-1}(x-\sigma e)v(x)}{\sigma }\right| . \end{aligned} \end{aligned}$$
(5.10)

In the last line, we have distributed \(v = V*\rho \) into the difference quotient. We wish to re-express the term \(v^{m-1}(x-\sigma e)v(x)\) as \(v^{m}(x) + \) error term. For this, we use the Mean Value Theorem to write

$$\begin{aligned}&v(x) = v(x-\sigma e) - \int _0^1 \frac{d}{ds}v(x-s\sigma e)ds= v(x-\sigma e) + \sigma e\cdot \int _0^1 \nabla v(x-s\sigma e)ds. \end{aligned}$$
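The sign in the last equality comes from the chain rule. Spelled out (a routine step, recorded here only for convenience and using that \(v = V*\rho \) is differentiable):

$$\begin{aligned} \frac{d}{ds}v(x-s\sigma e) = -\sigma e\cdot \nabla v(x-s\sigma e), \qquad \int _0^1 \frac{d}{ds}v(x-s\sigma e)ds = v(x-\sigma e) - v(x). \end{aligned}$$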

Substituting this into \(v^m(x) - v^{m-1}(x-\sigma e)v(x)\) gives

$$\begin{aligned}&v^m(x) - v^{m-1}(x-\sigma e)v(x) \\&\quad = v^m(x) - v^{m-1}(x-\sigma e) \left( v(x-\sigma e) + \sigma e\cdot \int _0^1\nabla v(x-s\sigma e)ds \right) . \end{aligned}$$

Inserting this into (5.10) yields the estimate

$$\begin{aligned} \begin{aligned}&\left| \frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x) \right| \\&\quad \le \Vert \nabla \varphi \Vert _{L^\infty }\frac{1}{\sigma }\left| v^m(x) - v^{m}(x-\sigma e) - v^{m-1}(x-\sigma e)\sigma e \right. \\&\quad \left. \cdot \int _0^1 \nabla V*(\rho \nabla \varphi )(x - s\sigma e)ds \right| . \end{aligned} \end{aligned}$$
(5.11)

We apply the Mean Value Theorem again, this time to the first difference \(v^m(x) - v^m(x-\sigma e)\) (valid since \(m>1\)),

$$\begin{aligned} v^m(x) - v^m(x-\sigma e)&= -\int _0^1 \frac{d}{ds}v^m(x-s\sigma e)ds \\&= m\sigma e\cdot \int _0^1 v^{m-1}(x-s\sigma e)\nabla v(x-s\sigma e) ds \end{aligned}$$

and insert this into (5.11) to obtain

$$\begin{aligned} \begin{aligned}&\left| \frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x) \right| \\&\quad \le \Vert \nabla \varphi \Vert _{L^\infty }\frac{1}{\sigma }\left| m\sigma e\cdot \int _0^1 v^{m-1}(x-s\sigma e)\nabla v(x-s\sigma e)ds - v^{m-1}(x-\sigma e)\sigma e\right. \\&\qquad \left. \cdot \int _0^1 \nabla V*(\rho \nabla \varphi )(x - s\sigma e)ds \right| \\&\quad \le \Vert \nabla \varphi \Vert _{L^\infty }\left| m\int _0^1v^{m-1}(x-s\sigma e)\nabla v(x-s\sigma e) - v^{m-1}(x-\sigma e)\nabla V*(\rho \nabla \varphi )(x-s\sigma e)ds \right| \\&\quad \le \Vert \nabla \varphi \Vert _{L^\infty }\int _0^1\left| mv^{m-1}(x-s\sigma e) + \Vert \nabla \varphi \Vert _{L^\infty } v^{m-1}(x-\sigma e)\right| \left| \nabla v(x-s\sigma e) \right| ds \\&\quad \le C_{m, \varphi }\Vert V\Vert _{L^\infty }^{m-1} \int _0^1 |\nabla v(x-s\sigma e)|ds. \end{aligned} \end{aligned}$$
(5.12)

In the third line of (5.12), we cancelled the common factor of \(\sigma \) and used the trivial estimate \(|e|=1\). In the fourth line of (5.12), we estimated the convolution similarly to (5.8):

$$\begin{aligned} |\nabla V*(\rho \nabla \varphi )| \le \Vert \nabla \varphi \Vert _{L^\infty }|\nabla V*\rho | = \Vert \nabla \varphi \Vert _{L^\infty }|\nabla v|. \end{aligned}$$

In the final line of (5.12), we used the following inequality

$$\begin{aligned} v^{m-1}(x) = \left( \int V(x-y)d\rho (y) \right) ^{m-1} \le \Vert V\Vert _{L^\infty }^{m-1} \left( \int d\rho (y) \right) ^{m-1} = \Vert V\Vert _{L^\infty }^{m-1}. \end{aligned}$$

We are now in a position to apply Theorem A.1 with \(X = (0,t)\times {{\mathbb {R}}^{d}}\) where \(y=(r,x) \in X\) and

$$\begin{aligned} f^\sigma (r,x)&= \frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x)\chi _{A_r^\varepsilon }(x), \\ g^\sigma (r,x)&= C_{m, \varphi }\Vert V\Vert _{L^\infty }^{m-1} \int _0^1 |\nabla v(x-s\sigma e)|ds. \end{aligned}$$

Here, \(\chi _{A_r^\varepsilon }(x)\) is the indicator function of the set \(A_r^\varepsilon \). Remember that we have suppressed the dependence on \(r\in (0,t)\) in \(\rho \). The first assumption of Theorem A.1 has been verified by the estimate of (5.12). We can verify the second assumption of Theorem A.1 since the pointwise limits of \(f^\sigma \) and \(g^\sigma \) as \(\sigma \rightarrow 0\) are

$$\begin{aligned} f(r,x) = e\cdot \nabla (v)^{m-1}(x) V*(\rho \nabla \varphi )(x)\chi _{A_r^\varepsilon }(x), \quad g(r,x) = C_{m,\varphi }\Vert V\Vert _{L^\infty }^{m-1}|\nabla v(x)|. \end{aligned}$$

The pointwise limit for \(f^\sigma \) is justified since \(A_r^\varepsilon \) is an open set. As for the pointwise limit of \(g^\sigma \), the usual Dominated Convergence Theorem suffices.

It remains to check the third assumption of Theorem A.1, which we do using Fubini's theorem and the translation invariance of the Lebesgue measure:

$$\begin{aligned} \int _X g^\sigma (r,x) dy&= C_{m,\varphi } \Vert V\Vert _{L^\infty }^{m-1}\int _0^t\int _{{{\mathbb {R}}^{d}}} \int _0^1|\nabla v (x-s\sigma e)|ds dx dr\\&= C_{m,\varphi }\Vert V\Vert _{L^\infty }^{m-1} \int _0^t\int _0^1 \int _{{{\mathbb {R}}^{d}}} |\nabla v(x-s\sigma e)|dx ds dr \\&= C_{m,\varphi } \Vert V\Vert _{L^\infty }^{m-1}\int _0^t\int _0^1\int _{{{\mathbb {R}}^{d}}}|\nabla v(x)|dx ds dr \\&= C_{m,\varphi }\Vert V\Vert _{L^\infty }^{m-1}\int _0^t\int _{{\mathbb {R}}^d}|\nabla v(x)|dx dr = \int _X g(r,x)dy. \end{aligned}$$

Therefore, by Theorem A.1, we have \(\int _X f^\sigma (r,x)dy \rightarrow \int _X f(r,x)dy\) as \(\sigma \rightarrow 0\), which is precisely (5.9):

$$\begin{aligned} \lim _{\sigma \rightarrow 0}I_\sigma&= \lim _{\sigma \rightarrow 0}\left( -\int _0^t\int _{ A_r^\varepsilon }\frac{v^{m-1}(x) - v^{m-1}(x-\sigma e)}{\sigma }V*(\rho \nabla \varphi )(x)dxdr\right) \\&= -\lim _{\sigma \rightarrow 0} \int _X f^\sigma (r,x)dy = -\int _X f(r,x)dy \\&= -e\cdot \int _0^t \int _{ A_r^\varepsilon } \nabla (v_r^\varepsilon )^{m-1}(x)V_\varepsilon * (\rho _r^\varepsilon \nabla \varphi )(x) dx dr. \end{aligned}$$

\(\square \)

6 Convexity, uniqueness, and particle approximation

In this section, we sketch the argument adapted from [27] to prove Theorem 2.3 and Corollary 2.1. Recall that we assume (\({{\textbf {F}}}_m\)) with \(m>1\). To simplify the exposition, we set \(F'(0) = 0\). In view of the assumptions needed for the kernel \(V_1\), see (V), it is not reasonable to choose \(V_1\) convex, as we require a finite second-order moment. However, this does not prohibit \(\lambda \)-convexity of the functional \({\mathcal {F}}^\varepsilon \) along geodesics. Indeed, differentiability of \({\mathcal {F}}^\varepsilon \), as in [10, Proposition 3.10], holds in our case assuming F satisfies (F1), (F2), (F3) and is convex (convexity is ensured by (\({{\textbf {F}}}_m\))). Furthermore, we need \(V_1\in C^2({{\mathbb {R}}^{d}})\) and \(D^2 V_1\in L^\infty ({{\mathbb {R}}^{d}})\). To give a few explanations in this direction, fix \(\rho _1,\rho _2 \in {{\mathcal {P}}_2({{\mathbb {R}}^{d}})}\) and consider the geodesic connecting \(\rho _1\) to \(\rho _2\) defined by

$$\begin{aligned} \rho _\alpha {:=} ((1-\alpha )\pi ^1 + \alpha \pi ^2)_\# \gamma , \quad \alpha \in [0,1], \end{aligned}$$

where \(\gamma \in \Gamma ({\mathbb {R}}^d\times {\mathbb {R}}^d)\) satisfies

$$\begin{aligned} \pi ^i_{\#}\gamma =\rho _i \text{ for } i=1,2. \end{aligned}$$

Owing to the regularisation by \(V_\varepsilon \), one can show (generalising and adapting the computations in [27, Propositions 3.4 and 3.6]) that the functional \({\mathcal {F}}^\varepsilon \) satisfies a geodesic ‘above the tangent line’ inequality [25, Proposition 2.8] in the sense that

$$\begin{aligned} {\mathcal {F}}^\varepsilon (\rho _2) - {\mathcal {F}}^\varepsilon (\rho _1) - \left. \frac{d}{d\alpha }\right| _{\alpha = 0}{\mathcal {F}}^\varepsilon (\rho _\alpha ) \ge \frac{\lambda _F^\varepsilon }{2}d_W^2(\rho _1,\rho _2), \end{aligned}$$
(6.1)

where

$$\begin{aligned} \lambda _F^\varepsilon {:=} -\frac{c_2\Vert D^2V_\varepsilon \Vert _{L^\infty }\Vert V_\varepsilon \Vert _{L^\infty }^{m-2}}{m-1}. \end{aligned}$$

The technical assumption \(F'(0) = 0\) enters here in the computation of (6.1). As a consequence, we infer \(\lambda _F^\varepsilon \)-convexity, similarly to [10, Proposition 3.11], as well as a characterisation of the subdifferential [10, Proposition 3.12] adapted to our functional, meaning that

$$\begin{aligned} \frac{\delta {\mathcal {F}}^\varepsilon }{\delta \rho }=V_\varepsilon *F'(V_\varepsilon *\rho ). \end{aligned}$$

As for the modulus of convexity \(\lambda _F^\varepsilon \), similarly to [27], using (\({{\textbf {F}}}_m\)) and fixed \(m>1\), we have

$$\begin{aligned} \lambda _F^\varepsilon \approx -\varepsilon ^{-2-d(m-1)}, \end{aligned}$$

meaning that \(\lambda _F^\varepsilon \rightarrow -\infty \) as \(\varepsilon \rightarrow 0\). The information above is enough to prove existence of a unique gradient flow of \({\mathcal {F}}^\varepsilon \), for \(\varepsilon >0\) fixed, following [1] and [10, Section 5] for regularised energies, thus proving Theorem 2.3. We omit the details and refer the interested reader again to similar computations in [27].
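The scaling of the convexity modulus stated above can be checked directly from its definition. The following sketch assumes the usual mollifier scaling \(V_\varepsilon (x) = \varepsilon ^{-d}V_1(x/\varepsilon )\); this normalisation is an assumption made here for illustration and should be compared with the definition of \(V_\varepsilon \) used in the paper.

$$\begin{aligned} \Vert V_\varepsilon \Vert _{L^\infty }&= \varepsilon ^{-d}\Vert V_1\Vert _{L^\infty }, \qquad \Vert D^2V_\varepsilon \Vert _{L^\infty } = \varepsilon ^{-d-2}\Vert D^2V_1\Vert _{L^\infty }, \\ \lambda _F^\varepsilon&= -\frac{c_2}{m-1}\, \varepsilon ^{-d-2}\Vert D^2V_1\Vert _{L^\infty }\, \varepsilon ^{-d(m-2)}\Vert V_1\Vert _{L^\infty }^{m-2} = -\frac{c_2\Vert D^2V_1\Vert _{L^\infty }\Vert V_1\Vert _{L^\infty }^{m-2}}{m-1}\, \varepsilon ^{-2-d(m-1)}. \end{aligned}$$

The exponent is obtained from \(-d-2-d(m-2) = -2-d(m-1)\), so under this assumption the relation \(\lambda _F^\varepsilon \approx -\varepsilon ^{-2-d(m-1)}\) holds with an explicit constant.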

The regularisation of the energy by mollifiers \(V_1\) satisfying (V) used in this manuscript, as well as in [7], allows one to extend stability of gradient flows to the case \(m>1\), further assuming that \(V_1\) is compactly supported for \(1<m<2\) (cf. Sect. 5.2). The advantage of using convex energies is that stability estimates in the 2-Wasserstein distance become available, so that one obtains a particle approximation when the number of particles involved depends on \(\varepsilon \), i.e. \(N=N(\varepsilon )\). This is a qualitative result, recently proved rigorously in [27, Theorem 1.4] for \(m=2\). In our setting, we consider (NLE) as a continuity equation with velocity given by \(-\nabla V_\varepsilon *F'(V_\varepsilon *\rho )\). Then, under mild assumptions on the mollifier \(V_1\) and the function F, the empirical measure \(\rho ^N_\varepsilon (t) = \frac{1}{N}\sum _{j=1}^N \delta _{x^j_\varepsilon (t)}\) is a weak solution to (NLE) provided the particles satisfy the following ODE system

$$\begin{aligned} {\dot{x}}^i_\varepsilon (t) = - \nabla \int _{{\mathbb {R}}^d} V_\varepsilon (x^i_\varepsilon (t) - y) F'\left( \frac{1}{N}\sum _{j=1}^NV_\varepsilon (y - x^j_\varepsilon (t)) \right) dy \quad \forall i=1,\dots , N. \end{aligned}$$
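To illustrate how the ODE system above might be evaluated numerically, here is a minimal sketch in one space dimension. Everything specific in it is an assumption made for illustration and is not taken from the paper: the Gaussian choice of \(V_1\), the power law \(F'(s) = \frac{m}{m-1}s^{m-1}\), the Riemann-sum quadrature for the integral in \(y\), the explicit Euler time stepping, and the helper names `mollifier`, `mollifier_grad`, `velocity` and `simulate`.

```python
import numpy as np


def mollifier(x, eps):
    """Gaussian V_eps(x) = eps^{-1} V_1(x / eps) in d = 1 (assumed choice of V_1)."""
    return np.exp(-0.5 * (x / eps) ** 2) / (eps * np.sqrt(2.0 * np.pi))


def mollifier_grad(x, eps):
    """Derivative of the Gaussian mollifier with respect to x."""
    return -x / eps**2 * mollifier(x, eps)


def velocity(particles, eps, m, grid):
    """Right-hand side of the particle ODE system.

    The integral over y is replaced by a Riemann sum on the uniform `grid`,
    which must cover the particles with a margin of several eps.
    """
    dy = grid[1] - grid[0]
    # v(y) = (1/N) sum_j V_eps(y - x_j), evaluated at every grid node
    v = mollifier(grid[:, None] - particles[None, :], eps).mean(axis=1)
    # Assumed nonlinearity: F'(s) = m s^{m-1} / (m - 1)
    f_prime = m / (m - 1.0) * v ** (m - 1.0)
    # -d/dx of int V_eps(x - y) F'(v(y)) dy, evaluated at x = x_i
    grad = mollifier_grad(particles[:, None] - grid[None, :], eps)
    return -(grad * f_prime[None, :]).sum(axis=1) * dy


def simulate(particles, eps, m, dt, steps, grid):
    """Explicit Euler integration of the particle system (illustrative only)."""
    x = np.array(particles, dtype=float)
    for _ in range(steps):
        x = x + dt * velocity(x, eps, m, grid)
    return x
```

A possible usage is `velocity(np.array([-0.1, 0.1]), eps=0.3, m=2.0, grid=np.linspace(-6.0, 6.0, 2401))`. For such a symmetric pair the two velocities point away from each other, reflecting the diffusive character of the flow. The grid spacing should stay well below \(\varepsilon \) for the quadrature to be meaningful.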

The regularisation in (NLE) is done in the same spirit as [27]; therefore, an analogous version of [27, Theorem 1.4] also holds true in our setting, with \(\lambda _F^\varepsilon \) specified above and \(m>1\). More precisely, as a consequence of the usual stability estimate for \(\lambda \)-gradient flows, [1, Theorem 11.2.1], we know

$$\begin{aligned} d_W(\rho ^\varepsilon (t),\rho _\varepsilon ^N(t))\le e^{-\lambda _F^\varepsilon t} d_W(\rho _\varepsilon ^N(0),\rho (0)). \end{aligned}$$

Therefore, if we assume that for \(\varepsilon \rightarrow 0\) there exists \(N=N(\varepsilon )\rightarrow +\infty \) such that

$$\begin{aligned} e^{-\lambda _F^\varepsilon t} d_W(\rho _\varepsilon ^N(0),\rho (0))\rightarrow 0, \end{aligned}$$

we infer the mean field limit, since \(\rho ^\varepsilon \) converges to a weak solution of (DE).
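To make the requirement on \(N(\varepsilon )\) more concrete, one can combine the scaling \(\lambda _F^\varepsilon \approx -\varepsilon ^{-2-d(m-1)}\) recalled above with the stability estimate. The following is only a sketch: the constant \(C>0\), standing for the implicit constant in that scaling (so that \(-\lambda _F^\varepsilon = C\varepsilon ^{-2-d(m-1)}\)), and the time horizon \(T>0\) are introduced here for illustration.

$$\begin{aligned} \sup _{t\in [0,T]} e^{-\lambda _F^\varepsilon t} d_W(\rho _\varepsilon ^N(0),\rho (0)) = e^{CT\varepsilon ^{-2-d(m-1)}} d_W(\rho _\varepsilon ^N(0),\rho (0)). \end{aligned}$$

Hence the assumption is satisfied on \([0,T]\) as soon as, for instance, \(d_W(\rho _\varepsilon ^{N(\varepsilon )}(0),\rho (0)) \le e^{-2CT\varepsilon ^{-2-d(m-1)}}\), since the right-hand side above is then bounded by \(e^{-CT\varepsilon ^{-2-d(m-1)}}\rightarrow 0\). In other words, the initial datum has to be approximated by empirical measures at a rate that is exponentially small in a negative power of \(\varepsilon \).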