1 Introduction

1.1 An overview

We develop a variational approach to derive macroscopic hydrodynamic equations from particle models. Within this broad context, this article studies a new class of Hamilton–Jacobi equations in the space of probability measures. Specifically, we study the convergence of Hamiltonians and establish well-posedness for a limiting Hamilton–Jacobi equation using a model problem in statistical physics. However, our main interest is to develop a new scale-bridging methodology, rather than to advance the understanding of the specific model.

The theory of hydrodynamic limits can be divided into deterministic and stochastic theories. The goal of the deterministic approach is to derive continuum-level conservation laws as a scaling limit of particle motions at the microscopic level, which satisfy Hamiltonian ordinary differential equations. At a key step, this requires the use of ergodic theory for Hamiltonian dynamical systems. Such a theory, at a level where it can be applied successfully, is not readily available for a wide range of problems. Consequently, the program of rigorously deriving hydrodynamic limits from deterministic models remains a huge challenge [40, Chapter 1]. (See, however, the work of Dolgopyat and Liverani [11] on weakly interacting geodesic flows on manifolds of negative curvature, which made progress in this direction.) This topic has a long history; indeed, the passage from atomistic to continuum models is mentioned by Hilbert in his explanation of his sixth problem (see [32] for a recent review). Stochastic hydrodynamics, on the other hand, relies on probabilistic interacting particle models, and the program has been more successful. Conceptually, one often thinks of these stochastic models as regularizations of underlying deterministic models. Usually, the interpretation is that, at an appropriate intermediate scale, a class of particles develops contact with a fast-oscillating environment, which is modeled by stochastic noise. With randomness injected into the particle motions, we can invoke probabilistic ergodic theorems. Probabilistic ergodic theory is much tamer than its counterpart for deterministic Hamiltonian dynamics. Hence the program can be carried out with rigor for a wide variety of problems. In many cases in stochastic hydrodynamics, convergence towards a macroscopic equation can be seen as a sophisticated version of the law of large numbers. 
Large deviation theory describes fluctuations around this macroscopic limit equation, and offers finer information in the form of a variational structure through a so-called rate function. In this context, the rate function automatically describes the hydrodynamic (macroscopic) limit equation as its minimizer. A Hamiltonian formulation of large deviation theory for Markov processes has been developed by Feng and Kurtz [24]. We will follow this method in Sect. 4 of this paper. The literature on the large deviation approach to stochastic hydrodynamics is so extensive that we do not try to review it (e.g., Spohn [40], Kipnis and Landim [34]). Instead, we only mention the seminal work of Guo, Papanicolaou and Varadhan [33], which introduced a major novel technique, known as the block-averaging/replacement-lemma method, to handle multi-scale convergence. With refinements and variants, this block-averaging method remains the standard approach in the subject to the present day.

Despite the power and beauty of block-averaging and the replacement lemma, this method relies critically on probabilistic ergodic theories, and is thus largely restricted to stochastic models. In Sect. 4, we introduce a functional-analytic approach to scale-bridging which is different from that of Guo, Papanicolaou and Varadhan [33]. We will still consider a stochastic model in this paper, but our method is developed with a view to being applicable in the deterministic program as well. Our approach takes inspiration from the Aubry–Mather theory for deterministic Hamiltonian dynamical systems, and, more directly, from another deeply linked topic known as weak KAM (Kolmogorov–Arnold–Moser) theory (e.g., Fathi [23]). We will derive and study an infinite particle version of a specific weak KAM problem. By infinite particle version, we mean a Hamiltonian describing the motion of infinitely many particles; hence it is defined in the space of probability measures. To avoid raising false hopes, we stress again that the model we use still incorporates randomness. However, in principle, the Hamilton–Jacobi part of our program is not fundamentally tied to probabilistic ergodic theory. We choose a stochastic model problem to test ideas, as the infinite particle version of weak KAM theory in this case becomes simple and directly solvable. At the present time, a general infinite particle version of weak KAM theory does not exist. This is in sharp contrast with the very well-developed theories for finite particle versions of deterministic Hamiltonian dynamics defined in finite-dimensional compact domains. Therefore, it is useful for us to focus on a more modest goal in this article: we are content with developing a method for studying problems where other (probabilistic) approaches may apply in principle, though we are not aware of such results for Carleman-type models. 
To put things in perspective, we hope the study in this paper will reveal the relevance and importance of studying Hamilton–Jacobi partial differential equations in the space of probability measures, and open up many possibilities for this type of problem in the future.

In light of the preceding discussion, we study the convergence of Hamiltonians arising from the large deviation setting in Sect. 4 using a Hamilton–Jacobi theory developed by Feng and Kurtz [24]. This approach is different from the usual approach to large deviations, which can be viewed as a Lagrangian technique.

To understand the convergence of Hamilton–Jacobi equations, following a topological compactness-uniqueness strategy, we need to resolve two issues: one is the multi-scale convergence of particle-level Hamiltonians to the continuum-level Hamiltonian; the other is the uniqueness for a class of abstract Hamilton–Jacobi equations which includes the limiting continuum Hamiltonian. We settle the first issue in Sect. 4 in a semi-rigorous manner, and the second issue in Sects. 2 and 3 rigorously.

We first sketch how the second problem, existence and uniqueness for a class of Hamilton–Jacobi equations, is addressed in this paper; this is a question of independent interest, and it corresponds to a hard issue for the traditional Lagrangian approach to hydrodynamic limits and large deviations, namely matching the large deviation upper and lower bounds. Essentially, we may not know how regular the paths need to be in order to approximate the Lagrangian action accurately for all paths. In the Hamiltonian setting advocated here, this problem can be solved rigorously for the model problem considered in this article. Indeed, we develop a method to establish a strong form of uniqueness (the comparison principle) for the macroscopic Hamilton–Jacobi equation. The analysis uses techniques from the theory of viscosity solutions in the space of probability measures developed by Feng and Kurtz [24] and Feng and Katsoulakis [25]. This is a relatively new topic and a step forward compared to earlier studies initiated by Crandall and Lions [4,5,6,7,8,9,10] on Hamilton–Jacobi equations in infinite dimensions, focusing on Hilbert spaces. Our Hamiltonian has a structure which is closer in spirit to the one studied by Crandall and Lions [8] (but with a nonlinear operator and other subtle differences), to Example 9.35 in Chapter 9 and Section 13.3.3 in Chapter 13 of [24], and to Example 3 of [25] and Feng and Swiech [26]. It is different in structure from those studied by Gangbo, Nguyen and Tudorascu [28] and Gangbo and Tudorascu [30] in Wasserstein space, and from those studied with metric analysis techniques by Giga, Hamamuki and Nakayasu [31], Ambrosio and Feng [1], and Gangbo and Swiech [29]. The difference is that we have to deal with an unbounded drift term given by a nonlinear operator, which we explain next. In (1.1) below, we first informally define the Hamiltonian as \(H=H(\rho ,\varphi )\) for a probability measure \(\rho \) and a smooth test function \(\varphi \). 
This definition contains a nonlinear term \(\partial _{xx}^2 \log \rho \). A priori, \(\rho \) is just a probability measure, so even the definition of this expression is a problem. If the probability measure \(\rho \) vanishes on a set of positive Lebesgue measure, \(\log \rho \) cannot even be defined in a distributional sense. In addition, our notion of solution requires more than the B-continuity studied in [8]. Namely, for the large deviation theory, we need the solution to be continuous in the metric topology on the space (and it in fact is). This can be established through an a posteriori estimate technique which we introduce in Lemma 2.8. Regarding the possible singularity of \(\partial _{xx}^2 \log \rho \), in the rigorous treatment we will use non-smooth test functions \(\varphi \) in the Hamiltonian to compensate for the possible loss of distributional derivatives of the term \(\log \rho \). This is essentially a renormalization idea, in the sense that we rewrite the equation in appropriately chosen new coordinates to “tame” the singularities. Section 2 explores a hidden controlled gradient flow structure in the Hamiltonian. Using a theorem by Feng and Katsoulakis [25] and the regularization technique in Lemma 2.8, we establish a comparison principle for the Hamilton–Jacobi equation with the Hamiltonian given by (1.1). For the existence of a solution, we argue in the Lagrangian picture. Here, the problem translates into a nonlinear parabolic problem, which again is quite singular. We establish an existence theory in Sect. 3 using the theory of optimal control and Nisio semigroups, and by deriving some non-trivial estimates.

We now describe the approach for dealing with the first problem, the multi-scale convergence. This is discussed at the end of the paper (Sect. 4) and involves some semi-rigorous arguments. As this part involves a broad spectrum of techniques coming from different areas of mathematics, a rigorous justification and the description of details are long; we postpone them to future studies. The stochastic model we use here is known as the stochastic Carleman model, studied by Caprino, De Masi, Presutti and Pulvirenti [3]. This is a fictitious system of interacting stochastic particles describing a two-velocity gas, and it leads to the Carleman equation as the kinetic description. At a different (coarser) level, a hydrodynamic limit theorem has been derived by Kurtz [35] and then by McKean [38]. It yields the nonlinear diffusion equation studied in Sect. 3. Ideas of Lions and Toscani [37] to study this equation in terms of the density and the flux turn out to be useful. Here, following the Hamilton–Jacobi method of [24], we study large deviations of the stochastic Carleman model. We give three heuristic derivations to identify the limit Hamiltonian, which is the one given in (1.1) and studied in the earlier parts of the paper. We now sketch these three limit identifications. The first is based on a formal weak KAM theory in an infinite-dimensional setting. The second approach is based on a finite-dimensional weak KAM theory, where the key is a reduction due to propagation of chaos. The third derivation uses semiclassical approximations. We remark that our overall aim is to provide new functional-analytic methods for deriving the limiting continuum-level Hamiltonian in this hydrodynamic large deviation setting. We turn the multi-scale issue into one of studying a small-cell Hamiltonian averaging problem in the space of probability measures. 
In the present case, this can be solved at least formally by the weak KAM theory for Hamiltonian dynamical systems, which is a deterministic method.

Our program combines tools from a variety of sources, notably viscosity solutions in the space of probability measures, optimal transport, parabolic estimates, optimal control, Markov processes, and Hamiltonian dynamics. We use weak KAM type arguments to replace stronger versions of ergodic theory in the derivation of the limiting Hamiltonian.

1.2 The setting

Let \({\mathcal {O}}\) be the one-dimensional circle, i.e. the unit interval [0, 1] with periodic boundary conditions, obtained by identifying 0 and 1. We denote by \({\mathcal {P}}({\mathcal {O}})\) the space of probability measures on \({\mathcal {O}}\) and formally define a Hamiltonian function on \({\mathcal {P}}({\mathcal {O}}) \times C^\infty ({\mathcal {O}})\):

$$\begin{aligned} H(\rho ,\varphi ) := \langle \varphi , \frac{1}{2} \partial _{xx}^2 \log \rho \rangle + \frac{1}{2} \int _{{\mathcal {O}}} |\partial _x \varphi |^2 dx, \quad \forall \rho \in {\mathcal {P}}({\mathcal {O}}), \varphi \in C^\infty ({\mathcal {O}}). \end{aligned}$$
(1.1)

We use the word formal because, even for probability measures admitting a Lebesgue density \(\rho (dx)=\rho (x) dx \in \mathcal P({\mathcal {O}})\), if \(\rho (x)=0\) on a set of positive Lebesgue measure in \({\mathcal {O}}\), then \(-\log \rho (x) = +\infty \) on this set. In such cases, \(\partial _{xx}^2 \log \rho \) cannot be defined as a distribution. Therefore, we will explore special choices of test functions \(\varphi \), \(\rho \)-dependent and possibly non-smooth, to compensate for the loss of distributional derivatives of the \(\log \rho \) term.
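A concrete instance of this degeneracy is the density that is flat on half of the circle and vanishes on the other half:

```latex
\rho(x) = 2 \cdot \mathbf{1}_{[0,1/2]}(x), \qquad
\log \rho(x) =
\begin{cases}
\log 2 & x \in [0,1/2], \\
-\infty & x \in (1/2,1),
\end{cases}
```

so \(\log \rho \) is not locally integrable and \(\partial _{xx}^2 \log \rho \) has no meaning even as a distribution.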

We will introduce a number of notations and definitions in Sect. 1.3. In particular, we denote \({{\mathsf {X}}}:=\mathcal P({\mathcal {O}})\) and define a homogeneous negative order Sobolev space \(H_{-1}({\mathcal {O}})\) according to (1.14). In Sect. 1.3, we will show that \({{\mathsf {X}}}\) can be identified with a closed subset of this \(H_{-1}({\mathcal {O}})\); hence it is a metric space as well. With the formal Hamiltonian function (1.1), we can now proceed to the second step and introduce a formally defined operator

$$\begin{aligned} H f(\rho ) := H\big (\rho , \frac{\delta f}{\delta \rho }\big ), \quad \forall \rho \in {{\mathsf {X}}}, \end{aligned}$$

where the test functions f are chosen from a class of very smooth functions,

$$\begin{aligned} D&:= \big \{ f(\rho ) = \psi (\langle \varphi _1, \rho \rangle , \ldots , \langle \varphi _k, \rho \rangle ) : \psi \in C^2({{\mathbb {R}}}^k), \nonumber \\&\qquad \qquad \varphi _i \in C^\infty ({\mathcal {O}}), i=1,\ldots , k; k=1,2, \ldots \big \}. \end{aligned}$$
(1.2)
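For cylinder functions f in D, the variational derivative \(\frac{\delta f}{\delta \rho }\) used above is given by the standard chain-rule formula

```latex
\frac{\delta f}{\delta \rho}(x)
 = \sum_{i=1}^{k} \partial_i \psi \big( \langle \varphi_1, \rho \rangle, \ldots, \langle \varphi_k, \rho \rangle \big)\, \varphi_i(x),
 \qquad x \in \mathcal{O},
```

which is a smooth function on \({\mathcal {O}}\); thus, for f in D, the only singular ingredient of Hf is the pairing of \(\partial _{xx}^2 \log \rho \) against smooth test functions.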

In the first part of this paper (Sect. 2), we prove a comparison principle (Theorem 2.1) for a Hamilton–Jacobi equation in the space of probability measures. This equation is formally written as

$$\begin{aligned} f - \alpha H f = h. \end{aligned}$$
(1.3)

In this equation, the function h and the constant \(\alpha >0\) are given, and f is a solution. However, making rigorous sense of (1.3) is very subtle. Motivated by a priori estimates, we make sense of the operator H by introducing two more operators \(H_0\) and \(H_1\), and interpret equation (1.3) as two families of inequalities, (1.28) and (1.29), which define viscosity sub- and super-solutions (Definition 1.1). The comparison principle in Theorem 2.1 compares the sub- and super-solutions of these two (in-)equations. This result implies in particular that there is at most one function f which is both a sub- and a super-solution.

In the second part of the paper (Sect. 3), we construct solutions by studying the Lagrangian dynamics associated with the Hamiltonian H in (1.1). A Legendre dual transform of the formal Hamiltonian gives a Lagrangian function

$$\begin{aligned} L(\rho , \partial _t \rho ):= \sup _{\varphi \in C^\infty ({\mathcal {O}})} \big ( \langle \partial _t \rho , \varphi \rangle - H(\rho , \varphi ) \big ) = \frac{1}{2} \Vert \partial _t \rho - \frac{1}{2} \partial _{xx}^2 \log \rho \Vert _{-1}^2 \end{aligned}$$

(the norm is defined in (1.13) below). We define an action on \({\mathcal {P}}({\mathcal {O}})\)-valued curves by

$$\begin{aligned} A_T[\rho (\cdot )]:= \int _0^T L(\rho (t), \partial _t\rho (t)) dt. \end{aligned}$$
(1.4)
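The Legendre transform defining L can be evaluated, formally, by completing the square with respect to the norm (1.13). Writing \(u := \partial _t \rho - \frac{1}{2} \partial _{xx}^2 \log \rho \) and assuming all terms are well defined, we have the sketch

```latex
\begin{aligned}
\langle \partial_t \rho, \varphi \rangle - H(\rho, \varphi)
  &= \langle u, \varphi \rangle - \frac{1}{2} \int_{\mathcal{O}} |\partial_x \varphi|^2 \, dx, \\
\sup_{\varphi \in C^\infty(\mathcal{O})}
  \Big( \langle u, \varphi \rangle - \frac{1}{2} \int_{\mathcal{O}} |\partial_x \varphi|^2 \, dx \Big)
  &= \sup_{\int_{\mathcal{O}} |\partial_x \varphi|^2 dx \le 1}\ \sup_{t \in \mathbb{R}}
  \Big( t \langle u, \varphi \rangle - \frac{t^2}{2} \Big)
  = \frac{1}{2} \Vert u \Vert_{-1}^2,
\end{aligned}
```

where the inner supremum in t equals \(\langle u, \varphi \rangle ^2/2\), and the outer supremum then produces the square of the norm in (1.13).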

One can consider variational problems with this action defined in the space of curves \(\rho (\cdot )\), or equivalently, consider a nonlinear partial differential equation with control,

$$\begin{aligned} \partial _t \rho (t,x) =\frac{1}{2} \partial _{xx}^2 \log \rho (t,x) + \partial _x \eta (t,x), \quad t >0, x \in {\mathcal {O}}, \end{aligned}$$
(1.5)

with \(\rho (t,\cdot )\) being the state variable, \(\eta (t,\cdot )\) (or equivalently \(\partial _t \rho \)) being a control, and \(A_T\) being a running cost. We take the control interpretation next and define a class of admissible controls as those satisfying

$$\begin{aligned} \qquad \int _0^T \int _{{\mathcal {O}}} |\eta (s,x)|^2 dx ds <\infty . \end{aligned}$$
(1.6)

We also define a value function for the above optimal control problem,

$$\begin{aligned} R_\alpha h(\rho _0)&:= \limsup _{t \rightarrow \infty } \sup \Big \{ \int _0^t e^{-\alpha ^{-1} s} \Big ( \frac{h(\rho (s))}{\alpha } - \frac{1}{2} \int _{{\mathcal {O}}} |\eta (s,x)|^2 dx \Big )ds : \nonumber \\&\qquad \qquad (\rho (\cdot ), \eta (\cdot )) \text { satisfies}~(1.5) \text { and}~(1.6)\text { with } \rho (0) = \rho _0 \Big \}. \end{aligned}$$
(1.7)

Then, assuming \(h \in C_b({{\mathsf {X}}})\), we show that

$$\begin{aligned} f := R_\alpha h \end{aligned}$$
(1.8)

is both a sub-solution to (1.28) as well as a super-solution to (1.29) (see Lemma 3.15). This gives us an existence result for the Hamilton–Jacobi PDE (1.3) in the setting we introduced. Hence, by the comparison principle proved earlier, it is the only solution.

The formal basis for the existence results above is the observation that

$$\begin{aligned} H f(\rho ) = \sup _{\eta \in L^2({\mathcal {O}})} \Big \{ \langle \frac{\delta f}{\delta \rho }, \frac{1}{2} \partial _{xx}^2 \log \rho +\partial _x \eta \rangle - \frac{1}{2} \int _{{\mathcal {O}}} |\eta (x)|^2 dx \Big \} \end{aligned}$$
(1.9)

We emphasize again that \(\log \rho \) may fail to be defined as a distribution, hence the above variational representation is not rigorous. However, it suggests at least formally that H is a Nisio semigroup generator associated with the family of controlled nonlinear diffusion equations (1.5). We also comment that the value function \(R_\alpha h:{{\mathsf {X}}}\mapsto {\bar{{{\mathbb {R}}}}}: = {{\mathbb {R}}}\cup \{ \pm \infty \}\) introduced before is well defined for all \(h:{{\mathsf {X}}}\mapsto {\bar{{{\mathbb {R}}}}}\) satisfying

$$\begin{aligned} \int _0^t e^{-\alpha ^{-1} s} h(\rho (s)) ds <+\infty , \quad \forall (\rho (\cdot ),\eta (\cdot )) \text { satisfying}~(1.5) \text { and}~(1.6) , \quad \forall t>0. \end{aligned}$$
(1.10)

This includes in particular the class of measurable \(h:{{\mathsf {X}}}\mapsto {\bar{{{\mathbb {R}}}}}\) which are bounded from above, i.e. \(\sup _{{\mathsf {X}}}h<+\infty \). Additionally, the precise meaning of the controlled equation (1.5) is given in Definition 3.1. We establish existence and some regularity properties of solutions in Lemmas 3.2 and 3.5. Finally, in Sect. 3 we use that the partial differential equation (1.5) can also be written as a system in density-flux variables \((\rho ,j)\):

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \rho + \partial _x j =0, \\ 2 \rho (j+\eta ) + \partial _x \rho =0. \end{array}\right. } \end{aligned}$$
(1.11)

This “change-of-coordinates” turns out to be very useful when we justify the derivation of the Hamiltonian H from microscopic models in the last part of the paper.
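Formally, the equivalence of (1.11) with (1.5) follows by eliminating the flux variable j from the second equation (assuming \(\rho >0\), so that the division is legitimate):

```latex
j = -\eta - \frac{\partial_x \rho}{2\rho} = -\eta - \frac{1}{2} \partial_x \log \rho
\quad \Longrightarrow \quad
\partial_t \rho = -\partial_x j = \frac{1}{2} \partial_{xx}^2 \log \rho + \partial_x \eta .
```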

The third part of this paper, Sect. 4, is, unlike the other parts of the paper, non-rigorous. The purpose of this section is to place the results of the first two parts of this paper in the context of a bigger program, by explaining the significance of studying the equation (1.3). Specifically, in Sect. 4, we will informally derive the Hamiltonian H given in (1.1) in a context of Hamiltonian convergence, using generalized multi-scale averaging techniques (for operators on functions in the space of probability measures). Our starting point is a stochastic model of a microscopically defined particle system in gas kinetics. By a two-scale hydrodynamic rescaling and by taking the number of particles to infinity, the (random) empirical measure of the particle number density \(\rho _\epsilon \) satisfies an asymptotic expression

$$\begin{aligned} P ( \rho _\epsilon (\cdot ) \in d \rho (\cdot ) ) \sim Z_\epsilon ^{-1} e^{- \epsilon ^{-1} A_T[\rho (\cdot )]} P_0(d \rho (\cdot )), \end{aligned}$$
(1.12)

where \(A_T\) is precisely the action given by (1.4) and \(P_0\) is some ambient background reference measure. We justify the above probabilistic limit theorem (known as a large deviation principle) through a Hamilton–Jacobi approach. For a full exposition of this approach in a rigorous and general context, see Feng and Kurtz [24]. In this general theory, H is derived from a sequence of Hamiltonians which describe Markov processes given by stochastic interacting particle systems. The rigorous application of the general theory developed in [24] requires establishing a comparison principle for the limiting Hamilton–Jacobi equation given by H. In addition, if we have an optimal control representation of H (such as the identity (1.9)), then we can explicitly identify the right-hand side of (1.12) using the action. This is the reason we studied these problems in Sects. 2 (comparison principle) and 3 (optimal control problem) of this paper.

To summarize, we derive the Hamiltonian convergence in Sect. 4 in a non-rigorous manner; we rigorously prove the comparison principle in Sect. 2; and we rigorously construct the solution and relate it to an optimal control problem in Sect. 3.

1.3 Notations and definitions

Let \({\mathcal {P}}(A)\) denote the collection of all probability measures on a set A. On a metric space (E, r), we use B(E) to denote the collection of bounded functions on E. Further, \(C_b(E)\) denotes bounded continuous functions, UC(E) denotes uniformly continuous functions, and \(UC_b(E):= UC(E) \cap B(E)\). Finally, LSC(E) (respectively USC(E)) denotes lower semicontinuous (respectively upper semicontinuous) functions, which are possibly unbounded. For a function \(h \in UC(E)\), we denote by \(\omega _h\) the (minimal) modulus of continuity of h with respect to the metric r on E:

$$\begin{aligned} \omega _h(t):= \sup _{r(x,y)\le t} |h(x) - h(y)|. \end{aligned}$$

We write \(C^\infty ({\mathcal {O}})\) for the collection of infinitely differentiable functions on \({\mathcal {O}}\). For a Schwartz distribution \(m \in {\mathcal {D}}^\prime ({\mathcal {O}})\), we define

$$\begin{aligned} \Vert m \Vert _{-1}:= \sup \Big \{ \langle m, \varphi \rangle : \varphi \in C^\infty ({\mathcal {O}}), \int _{\mathcal O} |\partial _x \varphi |^2 dx \le 1 \Big \}. \end{aligned}$$
(1.13)

We denote the homogeneous Sobolev space of negative order

$$\begin{aligned} H_{-1}({\mathcal {O}}):= \big \{ m \in \mathcal D^\prime ({\mathcal {O}}) : \Vert m \Vert _{-1}<\infty \big \}. \end{aligned}$$
(1.14)
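On the torus, the norm (1.13) admits the standard Fourier characterization \(\Vert m \Vert _{-1}^2 = \sum _{k \ne 0} |{\hat{m}}_k|^2/(2\pi k)^2\) (with \({\hat{m}}_0 = 0\) required for finiteness). As a purely illustrative numerical sketch (the function name and the discretization are ours, not part of the paper's arguments), this can be evaluated by FFT:

```python
import numpy as np

def h_minus1_norm(m):
    """Homogeneous H_{-1} norm on the unit circle of a zero-mean
    function sampled on the uniform grid x_j = j/N."""
    N = len(m)
    mhat = np.fft.fft(m) / N              # Fourier coefficients \hat m_k
    k = np.fft.fftfreq(N, d=1.0 / N)      # integer frequencies 0, 1, ..., -1
    nonzero = k != 0                      # drop k = 0 (mean-zero assumption)
    return np.sqrt(np.sum(np.abs(mhat[nonzero]) ** 2 / (2 * np.pi * k[nonzero]) ** 2))

# Example: m(x) = cos(2*pi*x) has ||m||_{-1} = 1/(2*sqrt(2)*pi) ≈ 0.1125.
x = np.arange(256) / 256
print(h_minus1_norm(np.cos(2 * np.pi * x)))
```

The example agrees with the closed form because a pure cosine has exactly two nonzero Fourier coefficients, each of modulus 1/2.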

The associated norm has the property that

$$\begin{aligned} \Vert m \Vert _{-1}=+\infty , \quad \forall m \in \mathcal D^\prime ({\mathcal {O}}) \text { such that }\langle m, 1 \rangle \ne 0. \end{aligned}$$

Hence \(H_{-1}({\mathcal {O}})\) is a subset of distributions that annihilates constants, \(\langle m, 1\rangle =0\). In fact, the following representation holds: for every \(m \in H_{-1}(\mathcal O)\), we have

$$\begin{aligned} m = \partial _x \eta \quad \text { for some } \eta \in L^2({\mathcal {O}}). \end{aligned}$$

Regarding the one dimensional torus \({\mathcal {O}}:= {{\mathbb {R}}}/ {{\mathbb {Z}}}\) as a quotient metric space, we consider a metric r defined by

$$\begin{aligned} r(x,y):=\inf _{k \in {{\mathbb {Z}}}} |x-y-k|. \end{aligned}$$
(1.15)

Let \(\rho , \gamma \in {\mathcal {P}}({\mathcal {O}})\). We write

$$\begin{aligned} \Pi (\rho ,\gamma ):= \big \{ {\varvec{\nu }} \in \mathcal P({\mathcal {O}} \times {\mathcal {O}}) \text { satisfying } {\varvec{\nu }}(dx, {\mathcal {O}}) =\rho (dx), {\varvec{\nu }}({\mathcal {O}}, dy) =\gamma (dy) \big \}. \end{aligned}$$
(1.16)

For \(p \in [1, \infty )\), let \(W_p\) be the Wasserstein metric of order p on \({\mathcal {P}}({\mathcal {O}})\):

$$\begin{aligned} W_p^p (\rho , \gamma ) :=\inf \left\{ \int _{{\mathcal {O}} \times \mathcal O} r^p(x,y) {\varvec{\nu }}(dx, dy) : {\varvec{\nu }} \in \Pi (\rho , \gamma ) \right\} . \end{aligned}$$

See Chapter 7 in Ambrosio, Gigli and Savaré [2] or Chapter 7 of Villani [44] for properties of this metric. Next, we claim that

$$\begin{aligned} {\mathcal {P}}({\mathcal {O}}) -1:= \{ \rho -1 : \rho \in {\mathcal {P}}({\mathcal {O}})\} \subset H_{-1}({\mathcal {O}}). \end{aligned}$$
(1.17)

To see this, we note that on one hand, by the Kantorovich–Rubinstein theorem (e.g., Theorem 1.14 of [44]), for every \(\rho , \gamma \in {\mathcal {P}}({\mathcal {O}})\), we have

$$\begin{aligned} W_1(\rho ,\gamma )&= \sup \Big \{ \langle \varphi , \rho - \gamma \rangle : \varphi \in C^\infty ({\mathcal {O}}) \text { satisfying } \Vert \varphi \Vert _{\mathrm{Lip}} \le 1\Big \} \nonumber \\&\le \sup \Big \{ \langle \varphi , \rho -\gamma \rangle : \varphi \in C^\infty ({\mathcal {O}}), \int _{{\mathcal {O}}} |\nabla \varphi |^2 dx \le 1 \Big \} \nonumber \\&= \Vert \rho - \gamma \Vert _{-1}. \end{aligned}$$
(1.18)

On the other hand, by an adaptation of Lemma 4.1 of Mischler and Mouhot [39] to the torus case (see Lemma A.1 in Appendix A), there exists a universal constant \(C>0\) such that

$$\begin{aligned} \Vert \rho - \gamma \Vert _{-1} \le C \sqrt{W_1(\rho ,\gamma )}. \end{aligned}$$

Therefore the topology induced by the metric \(\Vert \cdot \Vert _{-1}\) coincides with the usual topology of weak convergence of probability measures on \({\mathcal {P}}({\mathcal {O}})\). Since any sequence of elements in \({{\mathsf {X}}}:={\mathcal {P}}({\mathcal {O}})\) is tight, we conclude that \(({{\mathsf {X}}}, {{\mathsf {d}}})\) is a compact metric space with \({{\mathsf {d}}}(\rho ,\gamma ) :=\Vert \rho -\gamma \Vert _{-1}\). In particular, this argument establishes (1.17).

We define a free energy functional \(S:{\mathcal {P}}({\mathcal {O}}) \mapsto [0,+\infty ]\),

$$\begin{aligned} S(\rho ):= {\left\{ \begin{array}{ll} \int _{{\mathcal {O}}} \rho (x) \log \rho (x) dx &{} \text { if } \rho (dx) = \rho (x) dx \\ +\infty &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(1.19)

We use the convention that \(0\log 0 :=0\). Since \(S(\rho )\) is the relative entropy between \(\rho \) and the uniform probability measure 1 on \({\mathcal {O}}\), we have \(S(\rho ) \ge S(1)=0\). By the variational representation

$$\begin{aligned} S(\rho ) = \sup _{\varphi \in C^\infty ({\mathcal {O}})} \big ( \langle \varphi , \rho \rangle - \log \int _{{\mathcal {O}}} e^{\varphi } dx \big ), \end{aligned}$$

we have \(S \in LSC\big ({\mathcal {P}}({\mathcal {O}})\big )\).

We make two formal observations. For the Hamiltonian H in (1.1), we have

$$\begin{aligned} \big ( H ( \epsilon S) \big ) (\rho ) = H\big (\rho , \epsilon \frac{\delta S}{\delta \rho }\big ) = -\frac{\epsilon (1-\epsilon )}{2} \int _{{\mathcal {O}}} |\partial _x \log \rho |^2 dx \le 0, \quad \epsilon \in [0,1]. \end{aligned}$$
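The formal computation behind this identity: \(\frac{\delta S}{\delta \rho } = \log \rho + 1\), the additive constant is annihilated in the pairing against \(\partial _{xx}^2 \log \rho \) on the circle, and one integration by parts gives

```latex
\begin{aligned}
H(\rho, \epsilon \log \rho)
 &= \frac{\epsilon}{2} \langle \log \rho, \partial_{xx}^2 \log \rho \rangle
   + \frac{\epsilon^2}{2} \int_{\mathcal{O}} |\partial_x \log \rho|^2 \, dx \\
 &= -\frac{\epsilon}{2} \int_{\mathcal{O}} |\partial_x \log \rho|^2 \, dx
   + \frac{\epsilon^2}{2} \int_{\mathcal{O}} |\partial_x \log \rho|^2 \, dx
  = -\frac{\epsilon(1-\epsilon)}{2} \int_{\mathcal{O}} |\partial_x \log \rho|^2 \, dx .
\end{aligned}
```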

We introduce an analog of the Fisher information in this context, extending the usual definition in optimal mass transport theory, by defining

$$\begin{aligned} I(\rho )&:= {\left\{ \begin{array}{ll} \sup _{\xi \in C^\infty ({\mathcal {O}})} \{ 2 \langle \partial _x \xi , \log \rho \rangle - \int _{{\mathcal {O}}}|\xi |^2 dx \} &{} \text { if } \rho (dx) = \rho (x) dx, \log \rho \in L^1({\mathcal {O}}) \\ +\infty &{} \text { otherwise} \end{array}\right. }\nonumber \\&= {\left\{ \begin{array}{ll} \int _{{\mathcal {O}}} |\partial _x \log \rho |^2 dx &{} \text { if } \rho (dx) = \rho (x) dx, \log \rho \in L^1({\mathcal {O}}), \partial _x \log \rho \in L^2({\mathcal {O}}) \\ +\infty &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(1.20)

We claim that \(I \in LSC({\mathcal {P}}({\mathcal {O}}); {\bar{{{\mathbb {R}}}}})\). This claim can be verified by the following observations. Let \(\rho _n\) be a sequence such that \(\sup _n I(\rho _n) <\infty \). First, by one-dimensional Sobolev inequalities, \(\sup _n \Vert \log \rho _n \Vert _{L^\infty } <\infty \) and \(\rho _n \in C({\mathcal {O}})\). In fact, \(\{\rho _n\}_n\) has a uniform modulus of continuity and is hence relatively compact in \(C({\mathcal {O}})\). This implies relative compactness of \(\{ \log \rho _n \}_n\) in \(C({\mathcal {O}})\). Secondly, by the variational formula, for all \(\rho \) such that \(\log \rho \) is bounded:

$$\begin{aligned} \int _{{\mathcal {O}}} |\partial _x \log \rho |^2 dx= \sup \{ 2 \langle \partial _x \xi , \log \rho \rangle - \int _{{\mathcal {O}}}|\xi |^2 dx : \forall \xi \in C^\infty ({\mathcal {O}})\}. \end{aligned}$$
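This display can be checked by completing the square: after one integration by parts, for every \(\xi \in C^\infty ({\mathcal {O}})\),

```latex
2 \langle \partial_x \xi, \log \rho \rangle - \int_{\mathcal{O}} |\xi|^2 \, dx
 = -2 \langle \xi, \partial_x \log \rho \rangle - \int_{\mathcal{O}} |\xi|^2 \, dx
 = \int_{\mathcal{O}} |\partial_x \log \rho|^2 \, dx - \int_{\mathcal{O}} |\xi + \partial_x \log \rho|^2 \, dx,
```

so the supremum is attained, formally, at \(\xi = -\partial _x \log \rho \).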

We note that \(\varphi \mapsto H(\rho , \varphi )\) is convex in the usual sense.

Next, extending the existing theory of viscosity solutions for Hamilton–Jacobi equations in abstract metric spaces, we define the notion of solution that will be used in this paper.

Definition 1.1

Let (E, r) be an arbitrary compact metric space. A function \(\overline{f} :E \mapsto {{\mathbb {R}}}\) is a sub-solution to (1.3) if for every \(f_0 \in D(H)\), there exists \(x_0 \in E\) such that

$$\begin{aligned} \big ( \overline{f} - f_0 \big )(x_0) = \sup _E \big (\overline{f} - f_0\big ), \end{aligned}$$

and such that

$$\begin{aligned} \alpha ^{-1}\big ( \overline{f}(x_0) - h(x_0) \big ) \le H f_0(x_0). \end{aligned}$$

Similarly, a function \({\underline{f}} :E \mapsto {{\mathbb {R}}}\) is a super-solution to (1.3) if for every \(f_1 \in D(H)\), there exists \(y_0 \in E\) such that

$$\begin{aligned} \big ( f_1 - {\underline{f}} \big )(y_0) = \sup _E \big (f_1 - {\underline{f}}\big ), \end{aligned}$$

and such that

$$\begin{aligned} \alpha ^{-1}\big ({\underline{f}}(y_0) - h(y_0)\big ) \ge H f_1(y_0). \end{aligned}$$

Note that this definition differs from the usual one; indeed, if in Definition 1.1 “there exists” is replaced by “for every”, then \({\bar{f}}\) and \({\underline{f}}\) are strong sub- and super-solutions. A function f is a (strong) solution if it is both a (strong) sub-solution and a (strong) super-solution.

1.4 Towards a well-posedness theory: two more Hamiltonians

We now establish some useful properties of the Hamiltonian H in (1.1). In particular, we give a heuristic argument for why a comparison principle can be expected formally for (1.3). Since we will use non-smooth test functions which do not fall inside the domain of very smooth functions D(H) given by (1.2), it is not at all trivial to make this result rigorous; to this end we introduce two more Hamiltonians related to the one in (1.1), motivated by the formal calculations we now give. Throughout the paper, we denote \({{\mathsf {d}}}(\rho ,\gamma ) := \Vert \rho - \gamma \Vert _{-1}\).

Let \(k \in {{\mathbb {R}}}_+\). At least formally, for \(\rho (dx) = \rho (x) dx\) and \(\gamma (dx)= \gamma (x) dx\), we have

$$\begin{aligned} H \big ( \frac{k}{2} {{\mathsf {d}}}^2(\cdot , \gamma )\big ) (\rho )&= - \frac{k}{2} \int _{{\mathcal {O}}} (\rho - \gamma ) \log \rho dx + \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2, \nonumber \\&\le \frac{k}{2} \big ( S(\gamma ) - S(\rho )\big ) + \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2. \end{aligned}$$
(1.21)

The last inequality follows since by Jensen’s inequality

$$\begin{aligned} \int _{{\mathcal {O}}} \gamma \log \rho dx = \int _{{\mathcal {O}}} \gamma \log \frac{\rho }{\gamma } dx + \int _{{\mathcal {O}}} \gamma \log \gamma dx \le \log \int _{{\mathcal {O}}} \rho dx + S(\gamma ). \end{aligned}$$
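As for the first line of (1.21): writing \(G := (-\partial _{xx}^2)^{-1}\) on mean-zero functions, the variational derivative of \(f_0 = \frac{k}{2} {{\mathsf {d}}}^2(\cdot ,\gamma )\) at \(\rho \) is \(\varphi = k\, G(\rho -\gamma )\), and two integrations by parts give, formally,

```latex
\begin{aligned}
\Big\langle k\, G(\rho-\gamma), \frac{1}{2} \partial_{xx}^2 \log \rho \Big\rangle
 &= \frac{k}{2} \langle \partial_{xx}^2 G(\rho-\gamma), \log \rho \rangle
  = -\frac{k}{2} \int_{\mathcal{O}} (\rho-\gamma) \log \rho \, dx, \\
\frac{1}{2} \int_{\mathcal{O}} \big| \partial_x \big( k\, G(\rho-\gamma) \big) \big|^2 dx
 &= \frac{k^2}{2} \Vert \rho - \gamma \Vert_{-1}^2 ,
\end{aligned}
```

using \(\Vert m \Vert _{-1} = \Vert \partial _x G m \Vert _{L^2}\) for mean-zero m.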

Similarly, we have

$$\begin{aligned} H \big (- \frac{k}{2} {{\mathsf {d}}}^2(\rho , \cdot )\big ) (\gamma )&= \frac{k}{2} \int _{{\mathcal {O}}} (\gamma - \rho ) \log \gamma dx + \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2, \nonumber \\&\ge \frac{k}{2} \big ( S(\gamma ) - S(\rho )\big )+ \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2. \end{aligned}$$
(1.22)

In particular,

$$\begin{aligned} H \big ( \frac{k}{2}{{\mathsf {d}}}^2(\cdot , \gamma )\big ) (\rho ) - H \big (- \frac{k}{2} {{\mathsf {d}}}^2(\rho , \cdot )\big ) (\gamma ) \le 0. \end{aligned}$$
(1.23)

Experts in viscosity solution theory may immediately recognize that, at a formal level, this inequality implies the comparison principle for (1.3) (see, for instance, Theorems 3 and 5 of Feng and Katsoulakis [25]). To make this rigorous, we need to face the possibility of cancellations of the kind \(\infty - \infty \) when dealing with \(S(\rho ) - S(\gamma )\), or more generally \(\infty - \infty -\infty +\infty \) when dealing with the left-hand side of (1.23).
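
Schematically, and setting aside the \(\infty - \infty \) issues just mentioned, the doubling-of-variables argument runs as follows. Suppose \((\rho _k, \gamma _k)\) maximizes \(\overline{f}(\rho ) - {\underline{f}}(\gamma ) - \frac{k}{2} {{\mathsf {d}}}^2(\rho , \gamma )\) over \({{\mathsf {X}}}\times {{\mathsf {X}}}\), and suppose the sub- and super-solution properties may be applied at \(\rho _k\) and \(\gamma _k\) with the test functions \(\frac{k}{2} {{\mathsf {d}}}^2(\cdot , \gamma _k)\) and \(-\frac{k}{2} {{\mathsf {d}}}^2(\rho _k, \cdot )\). Then

$$\begin{aligned} \alpha ^{-1} \big ( \overline{f}(\rho _k) - h(\rho _k) \big ) \le H \big ( \frac{k}{2} {{\mathsf {d}}}^2(\cdot , \gamma _k) \big )(\rho _k), \qquad \alpha ^{-1} \big ( {\underline{f}}(\gamma _k) - h(\gamma _k) \big ) \ge H \big ( - \frac{k}{2} {{\mathsf {d}}}^2(\rho _k, \cdot ) \big )(\gamma _k). \end{aligned}$$

Subtracting and using (1.23) gives \(\overline{f}(\rho _k) - {\underline{f}}(\gamma _k) \le h(\rho _k) - h(\gamma _k)\). Sending \(k \rightarrow \infty \), the penalization forces \({{\mathsf {d}}}(\rho _k, \gamma _k) \rightarrow 0\), and the uniform continuity of h then yields \(\sup _{{{\mathsf {X}}}} (\overline{f} - {\underline{f}}) \le 0\). The operators introduced below are designed so that each step of this sketch becomes legitimate.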

To establish this result rigorously, we introduce two more Hamiltonian operators and formulate Theorem 1.2, which establishes not only the comparison principle, but also the existence of super- and sub-solutions for the Hamiltonians we now introduce. Let us mention that, although every operator in this paper is single-valued, for notational convenience, we may still identify an operator with its graph.

We now define two operators \(H_0 \subset C({{\mathsf {X}}}) \times M({{\mathsf {X}}}; {\bar{{{\mathbb {R}}}}})\) and \(H_1 \subset C({{\mathsf {X}}}) \times M({{\mathsf {X}}}; {\bar{{{\mathbb {R}}}}})\). Let

$$\begin{aligned} D(H_0):= \{ f_0 : f_0(\rho ):= \frac{k}{2} {{\mathsf {d}}}^2(\rho , \gamma ) : S(\gamma ) <\infty , k \in {{\mathbb {R}}}_+ \}, \end{aligned}$$
(1.24)

and

$$\begin{aligned} H_0 f_0(\rho ):= \frac{k}{2} \big ( S(\gamma ) - S(\rho )\big )+ \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2. \end{aligned}$$
(1.25)

This definition is motivated by the formal calculation (1.21). Analogously, motivated by (1.22), let

$$\begin{aligned} D(H_1):= \{ f_1 : f_1(\gamma ):= - \frac{k}{2} {{\mathsf {d}}}^2(\rho , \gamma ) : S(\rho ) <\infty , k \in {{\mathbb {R}}}_+ \}, \end{aligned}$$
(1.26)

and

$$\begin{aligned} H_1 f_1(\gamma ):= \frac{k}{2} \big ( S(\gamma ) - S(\rho )\big )+ \frac{k^2}{2} \Vert \rho - \gamma \Vert _{-1}^2. \end{aligned}$$
(1.27)

Instead of working with (1.3) with H given in (1.1), we consider the equation for \(H_0\) and seek sub-solutions, and analogously super-solutions for \(H_1\). We establish existence and show that these solutions coincide for a common right-hand side h. Namely, let \(h_0, h_1 \in UC_b({{\mathsf {X}}})\) and \(\alpha >0\). We consider a viscosity sub-solution \(\overline{f}\) for

$$\begin{aligned} (I- \alpha H_0 ) \overline{f} \le h_0, \end{aligned}$$
(1.28)

and a viscosity super-solution \({\underline{f}}\) for

$$\begin{aligned} (I-\alpha H_1) {\underline{f}} \ge h_1. \end{aligned}$$
(1.29)

We will prove the following well-posedness result in Sects. 2 and 3.

Theorem 1.2

Let \(h \in C_b({{\mathsf {X}}})\) and \(\alpha >0\). We consider viscosity sub-solutions to (1.28) and super-solutions to (1.29) in the case where \(h_0= h_1=h\). There exists a unique \(f \in C_b({{\mathsf {X}}})\) such that it is both a sub-solution to (1.28) and a super-solution to (1.29). Moreover, this solution is given by

$$\begin{aligned} f:= R_\alpha h, \end{aligned}$$

where \(R_\alpha \) is given by (1.7).

2 The Comparison Principle

In this section, we establish the following comparison principle.

Theorem 2.1

Let \(h_0, h_1 \in UC_b({{\mathsf {X}}})\) and \(\alpha >0\). Suppose that \(\overline{f} \in USC({{\mathsf {X}}}) \cap B({{\mathsf {X}}})\) is a sub-solution to (1.28) and that \({\underline{f}} \in LSC({{\mathsf {X}}}) \cap B({{\mathsf {X}}})\) is a super-solution to (1.29). Then

$$\begin{aligned} \sup _{{{\mathsf {X}}}} \big ( \overline{f} - {\underline{f}} \big ) \le \sup _{{\mathsf {X}}}\big ( h_0 - h_1 \big ). \end{aligned}$$

We divide the proof into two parts.

2.1 A set of extended Hamiltonians and a comparison principle

We define a new set of operators \({{{\bar{H}}}}_0\) and \({{{\bar{H}}}}_1\) which extend \(H_0\) and \(H_1\) by allowing a wider class of test functions. The test functions for these operators are generally discontinuous and can take the values \(\pm \infty \). The operators \({{{\bar{H}}}}_0\) and \({{{\bar{H}}}}_1\) satisfy a structural assumption (Condition 1) used in Feng and Katsoulakis [25]; hence the comparison principle for the associated Hamilton–Jacobi equations follows from [25, Theorem 3] (the same technique for establishing the comparison principle is also presented in Chapter 9.4 of Feng and Kurtz [24] (Condition 9.26 and Theorem 9.28)). In Sect. 2.2, we will link viscosity solutions for \({\bar{H}}_0\) and \({\bar{H}}_1\) with those for \(H_0\) and \(H_1\).

Let \(\epsilon \in (0,1)\), \(\delta \in [0,1]\) and \(\gamma \text { be such that } S(\gamma )<\infty \). We define

$$\begin{aligned} f_0(\rho ):=(1-\delta ) \frac{{{\mathsf {d}}}^2(\rho , \gamma )}{2\epsilon } + \delta \frac{S(\rho )}{2}, \end{aligned}$$
(2.1)

and

$$\begin{aligned} {{{\bar{H}}}}_0 f_0(\rho )&:= (1-\delta ) H_0 \frac{{{\mathsf {d}}}^2(\cdot , \gamma )}{2 \epsilon }(\rho ) - \frac{\delta }{8}I (\rho ) \nonumber \\&= (1-\delta ) \Big ( \frac{1}{2\epsilon } \big ( S(\gamma ) - S(\rho )\big ) + \frac{{{\mathsf {d}}}^2(\rho ,\gamma )}{2\epsilon ^2} \Big ) -\frac{\delta }{8} I( \rho ). \end{aligned}$$
(2.2)

This definition of \({{{\bar{H}}}}_0\) is motivated by convexity considerations: formally,

$$\begin{aligned} H f_0 \le (1-\delta ) H \frac{{{\mathsf {d}}}^2(\cdot ,\gamma )}{2\epsilon }+ \delta H \frac{S}{2} = {\bar{H}}_0 f_0. \end{aligned}$$

Similarly, let \(\epsilon \in (0,1)\), \(\delta \in [0,1]\) and \(\rho \text { be such that } S(\rho )<\infty \). We define

$$\begin{aligned} f_1(\gamma ):= (1+\delta ) \frac{- {{\mathsf {d}}}^2(\rho , \gamma )}{2\epsilon } - \delta \frac{S(\gamma )}{2}, \end{aligned}$$
(2.3)

and

$$\begin{aligned} {{{\bar{H}}}}_1 f_1(\gamma )&:= (1+\delta ) H_1 \big (- \frac{{{\mathsf {d}}}^2(\rho , \cdot )}{2 \epsilon }\big ) (\gamma ) + \frac{\delta }{8} I(\gamma ) \nonumber \\&=(1+\delta ) \Big ( \frac{1}{2\epsilon } \big (S(\gamma ) -S(\rho ) \big ) + \frac{ {{\mathsf {d}}}^2(\rho ,\gamma )}{2\epsilon ^2} \Big ) + \frac{\delta }{8} I(\gamma ). \end{aligned}$$
(2.4)

The definition of \({{{\bar{H}}}}_1\) is motivated in a similar way to that of \({{{\bar{H}}}}_0\), but is a bit more involved. We first observe that

$$\begin{aligned} - \frac{{{\mathsf {d}}}^2(\rho , \gamma )}{2\epsilon } = \frac{1}{1+\delta } f_1(\gamma ) + \frac{\delta }{1+\delta } \frac{S(\gamma )}{2}. \end{aligned}$$

By convexity considerations applied formally to H, we have

$$\begin{aligned} \Big (H \frac{-{{\mathsf {d}}}^2(\rho ,\cdot )}{2\epsilon } \Big ) (\gamma ) \le \frac{1}{1+\delta } H f_1(\gamma ) + \frac{\delta }{1+\delta } H\frac{S}{2}(\gamma ). \end{aligned}$$

That is, \({\bar{H}}_1\) is defined such that

$$\begin{aligned} {\bar{H}}_1 f_1 \le H f_1. \end{aligned}$$

We now give an auxiliary statement to establish the existence of strong viscosity solutions.

Lemma 2.2

Take \(f_0,f_1\) as in (2.1) and (2.3). Then \( {{{\bar{H}}}}_0 f_0 \in USC({{\mathsf {X}}}, {\bar{{{\mathbb {R}}}}})\) and \({{{\bar{H}}}}_1 f_1 \in LSC({{\mathsf {X}}}, {\bar{{{\mathbb {R}}}}})\). Moreover, for \(\rho , \gamma \in {{\mathsf {X}}}\) such that \(S(\rho )+S(\gamma )<\infty \), we have

$$\begin{aligned} \frac{1}{1-\delta } {{{\bar{H}}}}_0 f_0(\rho ) - \frac{1}{1+\delta } {{{\bar{H}}}}_1 f_1(\gamma ) \le - \frac{1}{8} \big ( \frac{\delta }{1-\delta } I(\rho ) + \frac{\delta }{1+\delta } I (\gamma ) \big ) \le 0. \end{aligned}$$
(2.5)

Proof

The semi-continuity properties follow from the lower semi-continuity of \(\rho \mapsto S(\rho )\) and \(\rho \mapsto I(\rho )\) in \(({{\mathsf {X}}}, {{\mathsf {d}}})\). The estimate (2.5) follows from direct verification. \(\quad \square \)
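
For the reader's convenience, the direct verification can be spelled out: dividing (2.2) by \(1-\delta \) and (2.4) by \(1+\delta \), the entropy and distance terms cancel exactly (the assumption \(S(\rho )+S(\gamma )<\infty \) makes this cancellation legitimate), so that

$$\begin{aligned} \frac{{{{\bar{H}}}}_0 f_0(\rho )}{1-\delta } - \frac{{{{\bar{H}}}}_1 f_1(\gamma )}{1+\delta } = - \frac{\delta }{8(1-\delta )} I(\rho ) - \frac{\delta }{8(1+\delta )} I(\gamma ), \end{aligned}$$

which is the first inequality in (2.5) (in fact, an identity), and is non-positive since \(I \ge 0\).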

Below, we establish the first comparison result in this paper for strong viscosity solutions (see the note following Definition 1.1).

Lemma 2.3

Suppose that \(\overline{f} \in USC({{\mathsf {X}}}, {{\mathbb {R}}}) \cap B({{\mathsf {X}}})\) and \({\underline{f}} \in LSC({{\mathsf {X}}}, {{\mathbb {R}}}) \cap B({{\mathsf {X}}})\) are, respectively, a strong viscosity sub-solution and a strong viscosity super-solution to

$$\begin{aligned} {\overline{f}} - \alpha {{{\bar{H}}}}_0 \overline{f}&\le h_0, \end{aligned}$$
(2.6)
$$\begin{aligned} {\underline{f}} - \alpha {{{\bar{H}}}}_1 {\underline{f}}&\ge h_1. \end{aligned}$$
(2.7)

Then

$$\begin{aligned} \sup _{\rho \in {{\mathsf {X}}}: S(\rho ) <\infty } \big ( \overline{f} (\rho ) - {\underline{f}}(\rho ) \big ) \le \sup _{\rho \in {{\mathsf {X}}}} \big ( h_0(\rho ) - h_1(\rho ) \big ) . \end{aligned}$$

Moreover, if \(\overline{f}, {\underline{f}} \in C_b({{\mathsf {X}}})\), then

$$\begin{aligned} \sup _{{\mathsf {X}}}\big ( \overline{f} - {\underline{f}} \big ) \le \sup _{{\mathsf {X}}}\big ( h_0 - h_1\big ). \end{aligned}$$

Proof

The estimates in Lemma 2.2 imply that Theorem 3 in Feng and Katsoulakis [25] applies, hence the conclusions follow. \(\quad \square \)

2.2 Viscosity extensions from \(H_0\) and \(H_1\) to \({\bar{H}}_0\) and \({\bar{H}}_1\) and the comparison theorem

Throughout this section, we assume that \(\overline{f} \in USC({{\mathsf {X}}}) \cap B({{\mathsf {X}}})\) is a viscosity sub-solution to (1.28) and that \({\underline{f}} \in LSC({{\mathsf {X}}}) \cap B({{\mathsf {X}}})\) is a viscosity super-solution to (1.29). The following regularizations \(\overline{f}_t\) and \({\underline{f}}_t\) and Lemma 2.4 are analogues of Lemma 13.34 of Feng and Kurtz [24].

For each \(t \in (0,1)\), we define

$$\begin{aligned} \overline{f}_t(\rho )&:= \sup _{\gamma \in {{\mathsf {X}}}} \big ( \overline{f}(\gamma ) - \frac{{{\mathsf {d}}}^2(\rho ,\gamma )}{2t} \big ), \end{aligned}$$
(2.8)
$$\begin{aligned} {\underline{f}}_t(\gamma )&:= \inf _{\rho \in {{\mathsf {X}}}} \big ( {\underline{f}}(\rho ) + \frac{{{\mathsf {d}}}^2(\rho ,\gamma )}{2t} \big ). \end{aligned}$$
(2.9)

It follows that

$$\begin{aligned} \overline{f} \le \overline{f}_t , \quad {\underline{f}}_t \le {\underline{f}}, \quad \forall t \in (0,1). \end{aligned}$$

Lemma 2.4

\(\overline{f}_t, {\underline{f}}_t \in \mathrm{Lip}({{\mathsf {X}}})\).
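
Lemma 2.4 is the standard Lipschitz property of sup- and inf-convolutions; a sketch, using only the boundedness of the compact space \(({{\mathsf {X}}}, {{\mathsf {d}}})\): for \(\rho _1, \rho _2 \in {{\mathsf {X}}}\),

$$\begin{aligned} \overline{f}_t(\rho _1) - \overline{f}_t(\rho _2) \le \sup _{\gamma \in {{\mathsf {X}}}} \frac{{{\mathsf {d}}}^2(\rho _2,\gamma ) - {{\mathsf {d}}}^2(\rho _1,\gamma )}{2t} \le \frac{\mathrm {diam}({{\mathsf {X}}})}{t} \, {{\mathsf {d}}}(\rho _1,\rho _2), \end{aligned}$$

because \({{\mathsf {d}}}^2(\rho _2,\gamma ) - {{\mathsf {d}}}^2(\rho _1,\gamma ) \le \big ( {{\mathsf {d}}}(\rho _1,\gamma ) + {{\mathsf {d}}}(\rho _2,\gamma ) \big )\, {{\mathsf {d}}}(\rho _1,\rho _2)\) by the triangle inequality. Exchanging \(\rho _1\) and \(\rho _2\), and arguing symmetrically for \({\underline{f}}_t\), gives the Lipschitz property (with a constant of order \(t^{-1}\)).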

Next we establish a few a priori estimates. To convey the intuition behind what we will derive, we first explain the heuristic ideas. Let \(f_0\) be as in (2.1), of the form

$$\begin{aligned} f_0(\rho ) := (1-\delta ) \frac{{{\mathsf {d}}}^2(\rho ,{\hat{\gamma }})}{2\epsilon } + \delta \frac{S(\rho )}{2}, \end{aligned}$$
(2.10)

where \({\hat{\gamma }} \in {{\mathsf {X}}}\) is such that \(S({\hat{\gamma }})<\infty \). Then formally

$$\begin{aligned} \frac{\delta f_0}{\delta \rho } = (1-\delta ) \frac{(-\partial _{xx}^2)^{-1}(\rho - {\hat{\gamma }})}{\epsilon } + \delta \frac{\log \rho }{2}. \end{aligned}$$
(2.11)

If \(\log \rho \in L^\infty ({\mathcal {O}})\), then the expression above is an element in \(L^2({\mathcal {O}})\). Let \(\rho _0, \gamma _0 \in {{\mathsf {X}}}\) be such that \(S(\rho _0) <\infty \). We assume that

$$\begin{aligned} f_0(\rho )- f_0(\rho _0) \ge - \frac{1}{t} \Big (\frac{{{\mathsf {d}}}^2(\rho ,\gamma _0)}{2} - \frac{{{\mathsf {d}}}^2(\rho _0,\gamma _0)}{2} \Big ), \quad \forall \rho \in {{\mathsf {X}}}. \end{aligned}$$
(2.12)

Then, by taking directional derivatives along paths \(t \mapsto \rho :=\rho (t)\) with \(\rho (0) =\rho _0\), the above implies the following comparison of Hamiltonians:

$$\begin{aligned} H \Big (\gamma _0,\frac{\delta }{\delta \gamma _0}\frac{ {{\mathsf {d}}}^2(\rho _0,\cdot )}{2t} \Big ) \le H_0 \big ( \frac{{{\mathsf {d}}}^2(\rho _0,\cdot )}{2t} \big )(\gamma _0) \le {{{\bar{H}}}}_0 f_0 (\rho _0). \end{aligned}$$

We now rigorously justify these formal comparisons. We divide the justification into three steps. First, we make sense of the following statement in a rigorous way:

$$\begin{aligned} \langle \frac{\delta f_0}{\delta \rho _0}, \frac{1}{2} \partial _{xx}^2 \log \rho _0 \rangle \ge - \frac{1}{2t} \langle \frac{\delta {{\mathsf {d}}}^2(\cdot ,\gamma _0) }{\delta \rho _0}, \frac{1}{2} \partial _{xx}^2 \log \rho _0 \rangle . \end{aligned}$$

Lemma 2.5

Let \(f_0\) be given by (2.10) with \(S({\hat{\gamma }})<\infty \). Let \(\rho _0,\gamma _0 \in {{\mathsf {X}}}\) be such that \(S(\rho _0)<\infty \) and that (2.12) holds. Then

$$\begin{aligned} \frac{1}{2t} \Big ( S(\rho _0) - S(\gamma _0)\Big ) \le \Big ( (1-\delta ) \frac{S( {\hat{\gamma }}) - S(\rho _0)}{2 \epsilon } -\frac{ \delta }{4} I(\rho _0)\Big ). \end{aligned}$$
(2.13)

Note that if we assume \(S(\gamma _0)<\infty \), then this estimate immediately implies an a posteriori estimate

$$\begin{aligned} I(\rho _0)<\infty . \end{aligned}$$

Proof

We claim that there exists a curve \(\rho \in C([0,\infty ); {{\mathsf {X}}})\) such that the following partial differential equation

$$\begin{aligned} \partial _t \rho = \frac{1}{2} \partial _{xx}^2 \log \rho , \quad \rho (0) = \rho _0, \end{aligned}$$

is satisfied in the sense of Definition 3.1 below. Moreover,

$$\begin{aligned} S(\rho (s)) - S(\rho _0) \le -\frac{1}{2} \int _0^s I(\rho (r)) dr, \quad \forall s>0, \end{aligned}$$
(2.14)

and

$$\begin{aligned} \frac{1}{2} {{\mathsf {d}}}^2(\rho (s), {\hat{\gamma }}) - \frac{1}{2} {{\mathsf {d}}}^2(\rho _0,{\hat{\gamma }}) \le \int _0^s \frac{1}{2} \Big ( S({\hat{\gamma }}) - S(\rho (r)) \Big ) dr, \quad \forall {\hat{\gamma }} \text { with } S({\hat{\gamma }}) <\infty . \end{aligned}$$
(2.15)

A rigorous justification of the claim above can be found in Lemma 3.2. Results of this type are well known for \(\partial _t \rho = \partial _{xx} \Phi (\rho )\) with a regular \(\Phi \) (e.g., Theorem 5.5 of Vázquez [43]). However, in our case this existing theory does not directly apply, as our \(\Phi (r)= \frac{1}{2} \log r\) is singular at \(r=0\). Although the main ideas for establishing these estimates remain the same, additional subtleties need to be taken care of. Of course, the proofs in Sect. 3.1 are independent of the results of this section on the comparison principle; hence we can invoke them here without creating a circular argument.

In summary, by (2.14) and (2.15), the curve satisfies

$$\begin{aligned} f_0(\rho (s)) - f_0(\rho _0) \le \int _0^s \Big ( (1-\delta ) \frac{S( {\hat{\gamma }}) - S(\rho (r))}{2 \epsilon } -\frac{ \delta }{4} I(\rho (r)) \Big ) dr. \end{aligned}$$

We also have (note that the following inequality holds trivially if \(S(\gamma _0)=+\infty \))

$$\begin{aligned} - \frac{1}{2} {{\mathsf {d}}}^2(\rho (s), \gamma _0) + \frac{1}{2} {{\mathsf {d}}}^2(\rho _0,\gamma _0) \ge \int _0^s \frac{1}{2} \Big ( S(\rho (r)) - S(\gamma _0) \Big ) dr. \end{aligned}$$

We plug the two lines above into (2.12) and note that S and I are both lower semicontinuous. The inequality (2.13) follows. \(\quad \square \)

Lemma 2.6

Let \(f_0, \rho _0,\gamma _0, {\hat{\gamma }}\) be as in the previous Lemma 2.5, with the additional assumption that \(S(\gamma _0)<\infty \) (hence \(I(\rho _0)<\infty \) by that lemma). Then the term \(\frac{\delta f_0}{\delta \rho _0}\) in (2.11) is well defined and

$$\begin{aligned} \frac{\delta f_0}{\delta \rho _0} = - \frac{1}{t} (-\partial ^2_{xx})^{-1}(\rho _0 - \gamma _0), \end{aligned}$$

which implies

$$\begin{aligned} \int _{{\mathcal {O}}} |\partial _x \frac{\delta f_0}{\delta \rho _0}|^2 d x = \frac{{{\mathsf {d}}}^2(\rho _0, \gamma _0)}{t^2} . \end{aligned}$$

Proof

Let \(\gamma \in C^\infty ({\mathcal {O}}) \cap {{\mathsf {X}}}\) with \(\inf _{\mathcal O} \gamma >0\). We define

$$\begin{aligned} \rho (s):= \rho _0 + sj, \text { with } j :=(\gamma -\rho _0), \forall s \in [0,1]. \end{aligned}$$

From \(I(\rho _0)<\infty \), we have \(\rho _0, j \in C({\mathcal {O}})\) and \(\inf _{{\mathcal {O}}} \rho _0 >0\). Therefore, \(\rho (s) \in C({\mathcal {O}}) \cap {{\mathsf {X}}}\) and \(\inf _{{\mathcal {O}}} \rho (s)>0\) for all \(s \in [0,1]\). With these regularities, \((-\partial _{xx}^2)^{-1}(\rho (s) - \gamma ) \in C({\mathcal {O}})\). Hence, if we define

$$\begin{aligned} \frac{\delta f_0}{\delta \rho (s)}:= (1-\delta ) \frac{1}{\epsilon } (-\partial _{xx}^2)^{-1} (\rho (s) - {\hat{\gamma }}) + \delta \frac{1}{2} \log \rho (s), \end{aligned}$$

then this expression is well defined and

$$\begin{aligned} \frac{\delta f_0}{\delta \rho (r)} \in C({\mathcal {O}}) \text { and } r \mapsto \langle \frac{\delta f_0}{\delta \rho (r)}, j \rangle \in C([0,1]). \end{aligned}$$

Therefore

$$\begin{aligned} f_0(\rho (s)) - f_0(\rho _0) = \int _0^s \langle \frac{\delta f_0}{\delta \rho (r)}, j \rangle dr \end{aligned}$$

and

$$\begin{aligned} \frac{1}{2} {{\mathsf {d}}}^2(\rho (s), \gamma _0) - \frac{1}{2} {{\mathsf {d}}}^2(\rho _0,\gamma _0) = \int _0^s \langle (-\partial _{xx}^2)^{-1}(\rho (r) -\gamma _0) , j \rangle dr. \end{aligned}$$

In view of (2.12) and the regularities \(\rho (r), j \in C({\mathcal {O}})\), we have

$$\begin{aligned} \langle \frac{\delta f_0}{\delta \rho _0}, j \rangle \ge \langle -\frac{1}{t} (-\partial _{xx}^2)^{-1}(\rho _0 -\gamma _0) , j \rangle . \end{aligned}$$

As j is arbitrary, the claim follows. \(\quad \square \)

Lemma 2.7

Let \(f_0, \rho _0,\gamma _0, {\hat{\gamma }}\) be as in Lemma 2.5, with \(S(\gamma _0)<\infty \). We assume that (2.12) holds. Then

$$\begin{aligned} H_0 \big ( \frac{{{\mathsf {d}}}^2(\rho _0,\cdot )}{2t} \big )(\gamma _0) \le {{{\bar{H}}}}_0 f_0(\rho _0). \end{aligned}$$

Proof

We have shown in Lemma 2.5 that \(I(\rho _0)<\infty \). Note that by definition

$$\begin{aligned} H_0 \frac{ {{\mathsf {d}}}^2(\rho _0, \cdot )}{2t} (\gamma _0)&= \frac{1 }{2t} \Big ( S( \rho _0) -S(\gamma _0) \Big ) + \frac{ {{\mathsf {d}}}^2(\rho _0,\gamma _0)}{2t^2}, \end{aligned}$$

and

$$\begin{aligned} {{{\bar{H}}}}_0 f_0 (\rho _0)&= (1-\delta ) \Big ( \frac{1}{2 \epsilon }\big ( S({\hat{\gamma }}) - S(\rho _0)\big ) + \frac{{{\mathsf {d}}}^2(\rho _0, {\hat{\gamma }})}{2 \epsilon ^2}\Big ) -\frac{\delta }{8} I (\rho _0). \end{aligned}$$

By Lemma 2.6 and then (2.11) and the convexity of quadratic functions, we have

$$\begin{aligned} \frac{1}{t^2} {{\mathsf {d}}}^2(\rho _0, \gamma _0) = \int _{{\mathcal {O}}} |\partial _x \frac{\delta f_0}{\delta \rho _0}|^2 d x \le \Big ( (1-\delta ) \frac{{{\mathsf {d}}}^2(\rho _0,{\hat{\gamma }})}{\epsilon ^2} + \delta \frac{1}{4} I(\rho _0) \Big ). \end{aligned}$$
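
The convexity of quadratic functions is used here in the form of the elementary bound, valid for \(a, b \in L^2({\mathcal {O}})\) and \(\delta \in [0,1]\):

$$\begin{aligned} \int _{{\mathcal {O}}} \big | (1-\delta ) a + \delta b \big |^2 dx \le (1-\delta ) \int _{{\mathcal {O}}} |a|^2 dx + \delta \int _{{\mathcal {O}}} |b|^2 dx, \end{aligned}$$

applied to the two gradient terms arising from (2.11), namely \(a = \epsilon ^{-1} \partial _x (-\partial _{xx}^2)^{-1}(\rho _0 - {\hat{\gamma }})\) and \(b = \frac{1}{2} \partial _x \log \rho _0\).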

Combined with the estimate (2.13) in Lemma 2.5, the conclusion follows. \(\quad \square \)

We now state the first existence result for viscosity solutions, in a suitably regularized setting. The proof of Theorem 2.1 will follow easily from this statement.

Lemma 2.8

Let us consider \(h_0 \in UC_b({{\mathsf {X}}})\) with a nondecreasing modulus of continuity denoted as \(\omega _0:=\omega _{h_0}\). Let

$$\begin{aligned} h_{0,t}(\rho ):= h_0(\rho )+ \omega _{0} \big ( \sqrt{t} C_{\overline{f}} \big ), \forall \rho \in {{\mathsf {X}}}, \text { with } C_{\overline{f}} := \sqrt{2}\big ( \Vert \overline{f} \Vert _\infty \big )^{1/2}. \end{aligned}$$

Then \(\overline{f}_t \in C_b({{\mathsf {X}}})\) is a strong viscosity sub-solution to the Hamilton–Jacobi equation (2.6) with \(h_0\) replaced by \(h_{0,t}\).

Similarly, suppose that \(h_1 \in UC_b({{\mathsf {X}}})\) with a nondecreasing modulus of continuity \(\omega _1:=\omega _{h_1}\). Let

$$\begin{aligned} h_{1,t}(\gamma ):= h_1(\gamma )- \omega _{1} \big ( \sqrt{t} C_{{\underline{f}}} \big ), \forall \gamma \in {{\mathsf {X}}}, \text { with } C_{{\underline{f}}} := \sqrt{2}\big ( \Vert {\underline{f}} \Vert _\infty \big )^{1/2}. \end{aligned}$$

Then \({\underline{f}}_t \in C_b({{\mathsf {X}}})\) is a strong viscosity super-solution to the Hamilton–Jacobi equation (2.7) with \(h_1\) replaced by \(h_{1,t}\).

Proof

We only prove the sub-solution case; the super-solution case is similar. Let \(f_0\) be as in (2.10). We assume that \(\rho _0 \in {{\mathsf {X}}}\) is such that

$$\begin{aligned} ( \overline{f}_t - f_0)(\rho _0) = \sup _{{{\mathsf {X}}}} (\overline{f}_t - f_0). \end{aligned}$$

Then \(S(\rho _0)<\infty \). The existence of such a \(\rho _0\) is guaranteed by the lower semi-continuity \(f_0 \in LSC({{\mathsf {X}}};{\bar{{{\mathbb {R}}}}})\), the continuity of \(\overline{f}_t\) (Lemma 2.4), and the compactness of \({{\mathsf {X}}}\). We have

$$\begin{aligned} \sup _{\gamma \in {{\mathsf {X}}}} \Big ( \overline{f}(\gamma ) - \frac{{{\mathsf {d}}}^2(\rho _0,\gamma )}{2t} \Big ) - f_0(\rho _0) = \sup _{\rho , \gamma \in {{\mathsf {X}}}} \Big ( \big ( \overline{f}(\gamma ) - \frac{{{\mathsf {d}}}^2(\rho ,\gamma )}{2t} \big ) - f_0(\rho ) \Big ). \end{aligned}$$
(2.16)

Since \(\overline{f}\) is a viscosity sub-solution to (1.28), by compactness of \({{\mathsf {X}}}\), there exists \(\gamma _0 \in {{\mathsf {X}}}\) such that

$$\begin{aligned} \Big ( \overline{f}(\gamma _0) - \frac{{{\mathsf {d}}}^2(\rho _0,\gamma _0 )}{2t} \Big ) =\sup _{\gamma \in {{\mathsf {X}}}} \Big ( \overline{f}(\gamma ) - \frac{{{\mathsf {d}}}^2(\rho _0,\gamma )}{2t} \Big ) = \overline{f}_t(\rho _0) \end{aligned}$$
(2.17)

with

$$\begin{aligned} (\overline{f} - h_0)(\gamma _0) \le H_0 \frac{{{\mathsf {d}}}^2(\rho _0, \cdot )}{2t} (\gamma _0). \end{aligned}$$
(2.18)

From the upper boundedness of \(h_0 - \overline{f}\), we conclude that \(S(\gamma _0)<\infty \). Thus (2.16) reduces to

$$\begin{aligned} \Big (\overline{f}(\gamma _0) - \frac{{{\mathsf {d}}}^2(\rho _0,\gamma _0)}{2t} \Big ) - f_0(\rho _0) =\sup _{\gamma , \rho \in {{\mathsf {X}}}} \Big ( \big ( \overline{f}(\gamma ) - \frac{{{\mathsf {d}}}^2(\rho ,\gamma )}{2t} \big ) - f_0(\rho )\Big ). \end{aligned}$$
(2.19)

The above implies (2.12), hence we can apply Lemma 2.7 to (2.18), which results in

$$\begin{aligned} (\overline{f} - h_0)(\gamma _0) \le {{{\bar{H}}}}_0 f_0 (\rho _0). \end{aligned}$$

From (2.19), we obtain a rough estimate

$$\begin{aligned} t^{-1} {{\mathsf {d}}}^2(\rho _0,\gamma _0) =\overline{f}(\gamma _0) - \overline{f}_t(\rho _0) \le \overline{f}(\gamma _0) - \overline{f}(\rho _0) \le 2 \Vert \overline{f} \Vert _\infty . \end{aligned}$$

Recalling that \(\omega _0\) denotes a nondecreasing modulus of continuity of \(h_0\), we then have

$$\begin{aligned} h_0(\gamma _0) - h_0(\rho _0) \le \omega _0\big ({{\mathsf {d}}}(\rho _0, \gamma _0)\big ) \le \omega _0 \big ( \sqrt{t} C_{\overline{f}} \big ), \text { with } C_{\overline{f}} := \sqrt{2}\big ( \Vert \overline{f} \Vert _\infty \big )^{1/2}. \end{aligned}$$

We note that, from (2.17),

$$\begin{aligned} (\overline{f} - h_0)(\gamma _0)&= \overline{f}_t (\rho _0) + \frac{{{\mathsf {d}}}^2(\rho _0,\gamma _0)}{2t} -h_0(\rho _0) + \big ( h_0(\rho _0) - h_0(\gamma _0)\big ) \\&\ge \overline{f}_t (\rho _0) -\Big ( h_0(\rho _0) + \omega _0 \big (\sqrt{t} C_{ \overline{f}} \big ) \Big ). \end{aligned}$$

The claim is established. \(\quad \square \)

Finally, we are in a position to prove Theorem 2.1.

Proof

By Lemmas 2.4 and 2.8, we know that the functions \(\overline{f}_t\) and \({\underline{f}}_t\) satisfy the conditions of Lemma 2.3, for each \(t>0\). Hence by the comparison principle in Lemma 2.3, we have

$$\begin{aligned} \sup _{{\mathsf {X}}}\big ( \overline{f}_t - {\underline{f}}_t\big ) \le \sup _{{\mathsf {X}}}\big (h_{0,t} - h_{1,t}\big ) = \sup _{{\mathsf {X}}}\big ( h_0 - h_1 \big ) + \omega _0(\sqrt{t} C_{\overline{f}}) + \omega _1(\sqrt{t} C_{{\underline{f}}}). \end{aligned}$$

Since

$$\begin{aligned} \overline{f}(\rho ) - {\underline{f}}(\rho ) \le \overline{f}_t(\rho ) - {\underline{f}}_t(\rho ), \quad \forall t\in (0,1), \rho \in {{\mathsf {X}}}\end{aligned}$$

the conclusion of Theorem 2.1 follows by taking \(t \rightarrow 0^+\). \(\quad \square \)

3 Existence of Solutions for the Hamilton–Jacobi Equation Through Optimal Control of Nonlinear Diffusion Equations and Related Nisio Semigroups

We recall that \(({{\mathsf {X}}}, {{\mathsf {d}}})\) is a compact metric space, hence \(C({{\mathsf {X}}}) =C_b({{\mathsf {X}}})= UC({{\mathsf {X}}})\). Theorem 2.1 establishes that for each \(h \in C_b({{\mathsf {X}}})\) and \(\alpha >0\), there exists at most one function f such that it is both a sub-solution to (1.28) as well as a super-solution to (1.29). In this section, we show that there exists such a solution. Moreover, this solution is unique and can always be represented as the value function \(f= R_\alpha h\) of the family of nonlinear diffusion equations with control introduced in the introduction (see (1.7) for the definition of the operator \(R_\alpha \)):

$$\begin{aligned}&\partial _t \rho =\frac{1}{2} \partial _{xx}^2 \log \rho + \partial _x \eta , \end{aligned}$$
(3.1)

with

$$\begin{aligned}&\int _0^T\int _{{\mathcal {O}}} | \eta (r,x)|^2 dx dr < \infty . \end{aligned}$$
(3.2)

3.1 A set of nonlinear diffusion equations with control

Throughout this section, we always assume that \(\eta \) satisfies (3.2). We use the convention \(0 \log 0 :=0\).

Definition 3.1

We say that \((\rho , \eta )\) is a weak solution to (3.1) in the time interval [0, T] if the following holds:

  1. (1)

    \(\rho (\cdot ) \in C([0,T];{{\mathsf {X}}})\).

  2. (2)

    \(\rho (t, dx) = \rho (t,x) dx\) holds for \(t>0\), for some measurable function \((t,x) \mapsto \rho (t,x)\).

  3. (3)

    The following estimates hold:

    $$\begin{aligned} \int _0^T \int _{{\mathcal {O}}} \rho (t,x) \log \rho (t,x) dx dt <\infty , \end{aligned}$$
    (3.3)

    and

    $$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} | \log \rho (t,x) | dx dt <\infty , \quad \forall s>0, \end{aligned}$$
    (3.4)

    and

    $$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} |\partial _x \log \rho (t,x)|^2 dx dt <\infty , \quad \forall s>0. \end{aligned}$$
    (3.5)
  4. (4)

    For every \(\varphi \in C^\infty ({\mathcal {O}})\) and \(0<s<t \le T\), we have

    $$\begin{aligned} \langle \varphi , \rho (t) \rangle - \langle \varphi , \rho (s)\rangle = \int _s^t \big ( \langle \frac{1}{2} \partial ^2_{xx} \varphi , \log \rho (r) \rangle - \langle \partial _x \varphi , \eta (r) \rangle \big )dr. \end{aligned}$$
    (3.6)

In the above, note that \([0,\infty ) \ni r \mapsto r \log r\) is bounded from below (with the convention \(0\log 0 =0\)); hence \(\int _{{\mathcal {O}}} \rho (x) \log \rho (x) dx \in {{\mathbb {R}}}\cup \{+\infty \}\) is well defined.

We now describe the technical difficulties we need to overcome in this section. Let \(\Phi (r):= \frac{1}{2} \log r\), for \(r>0\). Then (3.1) can be written as

$$\begin{aligned} \partial _t \rho = \partial _{xx}^2 \Phi (\rho ) + \partial _x \eta , \end{aligned}$$

where

$$\begin{aligned} \eta (t,x) \in L^2\big ((0,T); L^2({\mathcal {O}})\big ). \end{aligned}$$

Equations of this type have been studied by Vázquez [43], with the control term \(\partial _x \eta \) denoted there by f. However, there it is assumed that \(\Phi \) is at least continuous. In contrast, our \(\Phi \) has \(\Phi (0) =-\infty \) and is thus singular. In addition, we need to ensure that the solution is non-negative. In [43], an approach based on the maximum principle is developed to establish positivity of a solution. This works well in the absence of control, \(f=0\), or when \(f\ge 0\). However, the positivity of a solution in our case, for a general f, seems to be of a different origin: the singularity \(\Phi (0)=-\infty \) plays a key role. Therefore, we present a detailed justification using energy estimates. A further, but very minor, issue is that [43] focuses on Dirichlet or Neumann boundary conditions, whereas we have periodic boundary conditions. However, the boundary terms only appear after integration by parts, and the argument simplifies in the periodic case. Hence, we do not provide details for this last issue and only address the first two issues below by studying a sequence of approximate equations.

The main purpose of this subsection is to establish the following existence result. We recall that the definition of the entropy function S is given in (1.19).

Lemma 3.2

For every \(\eta \) satisfying (3.2) and every \(\rho (0) =\rho _0 \in {{\mathsf {X}}}\subset H_{-1}({\mathcal {O}})\), there exists \(\rho (\cdot ) \in C([0,T]; {{\mathsf {X}}})\) such that \((\rho ,\eta )\) solves (3.1)–(3.2) in the weak sense of Definition 3.1. This solution is unique. Moreover, such a pair \((\rho ,\eta )\) also satisfies the following properties.

  1. (1)

    For every \(\gamma _0 \in {{\mathsf {X}}}\) such that \(S(\gamma _0)<\infty \), and for every \(0 \le s<t <T\), the following variational inequalities hold:

    $$\begin{aligned}&\frac{1}{2} \Vert \rho (t) -\gamma _0 \Vert _{-1}^2 + \int _s^t \Big ( \frac{1}{2} \big ( S(\rho (r)) - S(\gamma _0) \big ) \nonumber \\&\qquad + \int _{{\mathcal {O}}} \eta (r,x)\big ( \partial _x (-\partial _{xx}^2)^{-1} (\rho (r)-\gamma _0)(x) \big )dx \Big ) dr \nonumber \\&\quad \le \frac{1}{2} \Vert \rho (s) -\gamma _0 \Vert _{-1}^2. \end{aligned}$$
    (3.7)
  2. (2)

    It holds that \(S(\rho (t)) <\infty \) for every \(t >0\) and \(\int _0^T S(\rho (r)) dr <\infty \) (this implies in particular that \(\rho (t,dx) = \rho (t,x) dx\) for \(t>0\)).

  3. (3)

    For every \(0<s<T<\infty \), it holds that

    $$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} \big ( - \log \rho \big )^+ dx dr <\infty . \end{aligned}$$
  4. (4)

    For every \(0 \le s \le t\), allowing the possibility of \(S(\rho (0))=+\infty \), the following holds

    $$\begin{aligned} S(\rho (t)) +\int _s^t \int _{{\mathcal {O}}} \big ( \frac{1}{2} |\partial _x \log \rho (r,x)|^2 + \eta (r,x) \partial _x \log \rho (r,x)\big ) dx dr \le S(\rho (s)). \end{aligned}$$
    (3.8)

We divide the proof into several parts.

3.1.1 Approximate equations

Let \(\eta \in L^2((0,T) \times {{\mathcal {O}}})\) and \(\rho _0 \in {{\mathsf {X}}}\). We extend \(\eta \) to \(L^2({{\mathbb {R}}}\times {{\mathcal {O}}})\) by setting \(\eta (t,x):=0\) whenever \(t \le 0\) or \(t\ge T\). Let \(J \in C^\infty ({\mathcal {O}})\) be a standard spatial mollifier and \(G \in C^\infty _c({{\mathbb {R}}})\) a standard mollifier in the time variable. We define the mollification of (possibly signed) measures and of functions on \({\mathcal {O}}\) in the usual sense. Hence \(\rho _{\epsilon ,0}: = J_\epsilon * \rho _0 \in C^\infty ({\mathcal {O}})\). We write

$$\begin{aligned} \eta _\epsilon (t,x) := (G_\epsilon *_t J_\epsilon *_x\eta )(t,x), \text { and } f_\epsilon := \partial _x \eta _\epsilon . \end{aligned}$$

We approximate the singular function \(\Phi \) by a smooth function \(\Phi _\epsilon \) as follows:

$$\begin{aligned} \Phi _\epsilon (r):= {\left\{ \begin{array}{ll} \frac{1}{2} \log r + C_\epsilon , &{} r \ge \epsilon , \\ \theta _\epsilon (r), &{} 0\le r \le \epsilon , \\ \frac{1}{\epsilon } r, &{} r <0, \end{array}\right. } \end{aligned}$$

where \(\theta _\epsilon \in C^2([0,\epsilon ])\) is an interpolation chosen below.

Note that \(\Phi ^\prime (r) =\Phi _\epsilon ^\prime (r)\) for \(r > \epsilon \). We choose the constant \(C_\epsilon := -\frac{3}{2}\log \epsilon \) so that \(\Phi _\epsilon (\epsilon ) = - \log \epsilon >0 = \Phi _\epsilon (0)\). This feature allows us to pick a smooth function \(\theta _\epsilon \) with \(\theta ^\prime _\epsilon >0\) such that

$$\begin{aligned}&\theta _\epsilon (0) =0, \quad \theta _\epsilon ^\prime (0) =\frac{1}{\epsilon },\quad \theta _\epsilon ^{\prime \prime }(0)=0; \\&\theta _\epsilon (\epsilon ) =\frac{1}{2} \log \epsilon + C_\epsilon = -\log \epsilon >0, \quad \theta _\epsilon ^\prime (\epsilon ) =\frac{1}{2\epsilon }, \quad \theta _\epsilon ^{\prime \prime }(\epsilon )=-\frac{1}{2 \epsilon ^2}, \end{aligned}$$

so that

$$\begin{aligned} \Phi _\epsilon \in C^3({{\mathbb {R}}}), \quad \Phi ^\prime _\epsilon (r)>0, \forall r\in {{\mathbb {R}}}, \quad \Phi _\epsilon (0)=0. \end{aligned}$$

We denote by \(\Theta _\epsilon (t) :=\int _0^t \theta _\epsilon (r) dr\) the primitive of \(\theta _\epsilon \). Since \(\theta _\epsilon ^\prime >0\), the function \(\theta _\epsilon \) is increasing, hence

$$\begin{aligned} \sup _{0 <t \le \epsilon } |\Theta _\epsilon (t)| \le \epsilon \theta _\epsilon (\epsilon ) = - \epsilon \log \epsilon \rightarrow 0 \text { as } \epsilon \rightarrow 0^+. \end{aligned}$$
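As a quick numerical sanity check (an illustration only, not part of the argument), the following sketch verifies the identity \(\Phi _\epsilon (\epsilon ) = -\log \epsilon \) forced by the choice \(C_\epsilon = -\frac{3}{2}\log \epsilon \), and the decay of the majorant \(\epsilon \theta _\epsilon (\epsilon ) = -\epsilon \log \epsilon \):

```python
import math

# Numerical sanity check (illustration only) of the constants used above,
# with C_eps := -(3/2) log(eps):
#   (i)  Phi_eps(eps) = (1/2) log(eps) + C_eps = -log(eps) > 0,
#   (ii) the majorant eps * theta_eps(eps) = -eps * log(eps) tends to 0.
def C(eps):
    return -1.5 * math.log(eps)

eps_values = [1e-1, 1e-3, 1e-6, 1e-9]
for eps in eps_values:
    phi_at_eps = 0.5 * math.log(eps) + C(eps)
    assert abs(phi_at_eps - (-math.log(eps))) < 1e-12   # Phi_eps(eps) = -log(eps)
    assert phi_at_eps > 0                               # and it is positive
majorant = [-eps * math.log(eps) for eps in eps_values]
assert all(a > b > 0 for a, b in zip(majorant, majorant[1:]))
assert majorant[-1] < 1e-7
```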

This construction ensures that \(\Phi _\epsilon ^\prime \in C({{\mathbb {R}}})\). Now, we consider

$$\begin{aligned} \partial _t \rho _\epsilon = \partial _{xx}^2 \Phi _\epsilon (\rho _\epsilon ) + \partial _x \eta _\epsilon , \quad \rho _\epsilon (0) = J_\epsilon * \rho _0. \end{aligned}$$
(3.9)

By Theorem 5.7 in [43], there exists a unique weak solution \(\rho _\epsilon (\cdot )\) in the sense of Definition 5.4 of [43]. Hence for every \(\varphi \in C^\infty ({\mathcal {O}})\), it holds that

$$\begin{aligned} \langle \varphi , \rho _\epsilon (t) \rangle - \langle \varphi , \rho _\epsilon (s)\rangle =\int _s^t \big ( \langle \partial _{xx}^2 \varphi , \Phi _\epsilon (\rho _\epsilon (r)) \rangle - \langle \partial _x \varphi , \eta _\epsilon (r) \rangle \big ) dr. \end{aligned}$$
(3.10)

In fact, in the regularized situation considered at the moment, standard quasilinear theory applies (e.g., the method of proof of Theorem 6.1 in Chapter V of Ladyženskaja, Solonnikov and Ural’ceva [36]), hence \(\rho _\epsilon \in C^{1,2}((0,T) \times \bar{{\mathcal {O}}})\) is a classical solution. Note that the first part of condition (6.9) in Chapter V of [36] requires that \(\Phi _\epsilon ^\prime \) be uniformly bounded away from zero. The \(\Phi _\epsilon \) constructed above does not satisfy this requirement. However, this is not a problem in the current context, because \(\rho _\epsilon \) is bounded. We explain this in detail. Let \(M>0\) be a large parameter; we modify the definition of \(\Phi _\epsilon (r)\) into \(\Phi _\epsilon ^M(r)\) for \(r > M\) and keep \(\Phi _\epsilon ^M(r)= \Phi _\epsilon (r)\) for \(r \le M\), choosing the modification so that \(\Phi _\epsilon ^M\) satisfies the conditions of [36]. Then there exists a unique classical \(C^{1,2}\)-solution \(\rho _\epsilon ^M\) for

$$\begin{aligned} \partial _t \rho _\epsilon ^M = \partial _{xx}^2 \Phi _\epsilon ^M(\rho _\epsilon ^M) + \partial _x \eta _\epsilon . \end{aligned}$$

By the maximum principle,

$$\begin{aligned} \inf _{{\mathcal {O}}} \rho _\epsilon (0) + t \inf _{[0,T] \times \mathcal O} \partial _x \eta _\epsilon \le \rho _\epsilon ^M(t,x) \le \sup _{{\mathcal {O}}} \rho _\epsilon (0) + t \sup _{[0,T] \times \mathcal O} \partial _x \eta _\epsilon = : M_0, \quad \forall t \in (0,T). \end{aligned}$$

Consequently, when \(M > M_0\), \(\rho _\epsilon :=\rho _\epsilon ^M\) solves (3.9) in the classical sense.
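The regularized problem (3.9) can be illustrated by a minimal explicit finite-difference sketch (with \(\eta = 0\), on the periodic unit interval). Since \(\theta _\epsilon \) is not given explicitly, we substitute a crude monotone surrogate for \(\Phi _\epsilon \) below the threshold; the scheme, the grid size, and the surrogate are our own assumptions, not taken from the text:

```python
import math

# Explicit finite-difference sketch of (3.9) with eta = 0 on the periodic unit
# interval (illustration only, not part of the proof).  Below the threshold eps
# we use a crude monotone linear surrogate for Phi_eps, since theta_eps is not
# given explicitly in the text.
def Phi(r, eps):
    return 0.5 * math.log(r) if r >= eps else 0.5 * math.log(eps) + (r - eps) / (2 * eps)

N, eps = 64, 1e-3
dx = 1.0 / N
dt = 0.25 * dx * dx          # CFL-stable here: Phi'(r) = 1/(2r) <= 1 for r >= 1/2
rho = [1.0 + 0.5 * math.cos(2 * math.pi * i * dx) for i in range(N)]
mass0, lo0, hi0 = sum(rho) * dx, min(rho), max(rho)

for _ in range(200):
    P = [Phi(r, eps) for r in rho]
    rho = [rho[i] + dt / dx**2 * (P[(i + 1) % N] - 2 * P[i] + P[i - 1]) for i in range(N)]

# conservation of mass, mirroring <1, rho_eps(t)> = <1, rho_{eps,0}> in the text
assert abs(sum(rho) * dx - mass0) < 1e-9
# discrete maximum principle: the profile stays positive and flattens
assert lo0 - 1e-12 <= min(rho) and max(rho) <= hi0 + 1e-12
assert max(rho) - min(rho) < hi0 - lo0
```

The conservative (divergence) form of the update is what keeps the discrete mass exact up to rounding; the discrete maximum principle holds because each update is a convex combination under the stated CFL restriction.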

We note that, for each \(t>0\) and \(\epsilon >0\), we cannot rule out the possibility that \(\rho _\epsilon (t,x)<0\). However, we will show that this possibility disappears in the limit \(\epsilon \rightarrow 0^+\), using the asymptotic estimates we now establish.

There are three important regularity properties of \(\rho _\epsilon \) that we will exploit. First, let

$$\begin{aligned} \Psi _\epsilon (s) := \int _0^s \Phi _\epsilon (r) dr. \end{aligned}$$
(3.11)

Then we have the energy inequality (5.20) in Theorem 5.7 of [43]:

$$\begin{aligned}&\int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (T,x)) dx + \int _s^T \int _{{\mathcal {O}}} \Big ( |\partial _x \Phi _\epsilon (\rho _\epsilon (r,x))|^2 + \eta _\epsilon (r,x) \partial _x \Phi _\epsilon \big (\rho _\epsilon (r,x)\big ) \Big ) dx dr \nonumber \\&\qquad \le \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _{\epsilon }(s,x)) dx , \quad \forall \, 0 \le s \le T. \end{aligned}$$
(3.12)

Note that \(\rho _\epsilon (0) \in L^\infty ({\mathcal {O}})\), hence \(\int | \Psi _\epsilon (\rho _\epsilon (0,x)) | dx <\infty \). Also, by Jensen’s inequality,

$$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} |\eta _\epsilon (r,x)|^2 dx dr \le \int _s^T \int _{{\mathcal {O}}} |\eta (r,x)|^2 dx dr<\infty . \end{aligned}$$

Second, we have inequalities of dissipation type: for all \(\gamma _0 \in H_{-1}({\mathcal {O}})\) such that \(\int _{{\mathcal {O}}} \Psi _\epsilon (\gamma _0)dx<\infty \) and all \(0\le s<t\),

$$\begin{aligned}&\frac{1}{2} \Vert \rho _\epsilon (t) - \gamma _0 \Vert _{-1}^2 + \int _s^t \big \langle (\rho _\epsilon (r) - \gamma _0), \Phi _\epsilon (\rho _{\epsilon }(r))\big \rangle dr \nonumber \\&\quad + \int _s^t \big \langle \eta _\epsilon (r) , \partial _x (-\partial _{xx}^2)^{-1} (\rho _\epsilon (r) - \gamma _0) \big \rangle dr \le \frac{1}{2} \Vert \rho _\epsilon (s) -\gamma _0\Vert _{-1}^2. \end{aligned}$$
(3.13)

This estimate can be verified through integration by parts. Note that \(\Psi _\epsilon ^{\prime \prime } = \Phi _\epsilon ^\prime >0\). The convexity of \(\Psi _\epsilon \) implies that \((v-u) \Phi _\epsilon (u) \le \Psi _\epsilon (v) - \Psi _\epsilon (u)\). Therefore the last inequality also leads to

$$\begin{aligned}&\frac{1}{2} \Vert \rho _\epsilon (t) - \gamma _0 \Vert _{-1}^2 + \int _s^t \int _{{\mathcal {O}}} \big ( \Psi _\epsilon (\rho _\epsilon (r,x))- \Psi _\epsilon (\gamma _0(x)) \big ) dx dr \nonumber \\&\quad + \int _s^t \big \langle \eta _\epsilon (r) , \partial _x (-\partial _{xx}^2)^{-1} (\rho _\epsilon (r) - \gamma _0) \big \rangle dr \le \frac{1}{2} \Vert \rho _\epsilon (s) -\gamma _0\Vert _{-1}^2. \end{aligned}$$
(3.14)

By direct computation,

$$\begin{aligned} \Psi _\epsilon (r) = {\left\{ \begin{array}{ll} \frac{1}{2} \big ( r \log r - r ) + C_\epsilon r - \frac{1}{2} ( \epsilon \log \epsilon - \epsilon ) - \epsilon C_\epsilon + \Theta _\epsilon (\epsilon ), &{} \text { if } r \ge \epsilon \\ \Theta _\epsilon (r), &{} \text { if } 0 \le r< \epsilon \\ \frac{1}{2 \epsilon } r^2, &{} \text { if } r < 0. \end{array}\right. } \end{aligned}$$
(3.15)
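For completeness, the branch \(r \ge \epsilon \) of (3.15) follows by splitting the integral defining \(\Psi _\epsilon \) at \(\epsilon \) (a routine check; recall \(\Phi _\epsilon = \theta _\epsilon \) on \([0,\epsilon ]\)):

```latex
\Psi_\epsilon(r)
  = \int_0^\epsilon \theta_\epsilon(t)\,dt
    + \int_\epsilon^r \Big(\tfrac{1}{2}\log t + C_\epsilon\Big)\,dt
  = \Theta_\epsilon(\epsilon)
    + \tfrac{1}{2}\big(r\log r - r\big)
    - \tfrac{1}{2}\big(\epsilon\log\epsilon - \epsilon\big)
    + C_\epsilon\,(r-\epsilon),
  \qquad r \ge \epsilon,
```

which matches the first line of (3.15); the remaining two branches are immediate from the definition of \(\Phi _\epsilon \).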

From \((t \theta _\epsilon )^\prime = t \theta _\epsilon ^\prime + \theta _\epsilon \ge \theta _\epsilon \) for \(t\ge 0\), we obtain the estimate \(\Theta _\epsilon (t) \le t \theta _\epsilon (t)\). This implies in particular that

$$\begin{aligned} - \frac{1}{2} ( \epsilon \log \epsilon - \epsilon ) - \epsilon C_\epsilon + \Theta _\epsilon (\epsilon ) \rightarrow 0, \text { as } \epsilon \rightarrow 0^+. \end{aligned}$$

Integrating (3.9) over \({\mathcal {O}}\) in the spatial variable, we also arrive at the conservation property \(\langle 1, \rho _\epsilon (t)\rangle = \langle 1, \rho _{\epsilon ,0}\rangle =1\). We decompose \(\rho _\epsilon \) into its positive and negative parts,

$$\begin{aligned} \rho _\epsilon (t,x) = \rho _\epsilon ^+(t,x) -\rho _\epsilon ^-(t,x). \end{aligned}$$

Then, when \(\gamma _0 \in {{\mathsf {X}}}\) is a probability measure satisfying \(S(\gamma _0)<\infty \), we have

$$\begin{aligned}&\int _0^t\int _{{\mathcal {O}}} \Big ( \Psi _\epsilon (\rho _\epsilon (r,x)) - \Psi _\epsilon (\gamma _0(x)) \Big ) dx dr \\&\quad =\int _0^t\int _{{\mathcal {O}}} \frac{1}{2} \Big ( \rho _\epsilon ^+(r,x) \log \rho _\epsilon ^+(r,x) - \gamma _0(x) \log \gamma _0(x) \Big ) dx dr \\&\qquad + \int _0^t\int _{{\mathcal {O}}} \Big (\frac{| \rho _\epsilon ^-(r,x)|^2}{2\epsilon } + (\frac{1}{2} - C_\epsilon ) \rho _\epsilon ^{-}(r,x) \Big ) dx dr + o_\epsilon (1). \end{aligned}$$

We note that

$$\begin{aligned} \frac{r^2}{2\epsilon } - C_\epsilon r \ge \frac{r^2}{4 \epsilon } - \epsilon C_\epsilon ^2 \quad \text { and } \quad \sqrt{\epsilon } C_\epsilon \rightarrow 0. \end{aligned}$$

Therefore, the above estimates combined with (3.14) give a useful control on the amount of negative mass of \(\rho _\epsilon \):

$$\begin{aligned} \sup _{\epsilon >0} \int _0^t\int _{{\mathcal {O}}} \frac{1}{\epsilon } | \rho _\epsilon ^-(r,x)|^2 dx dr <\infty . \end{aligned}$$
(3.16)
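The elementary Young-type inequality invoked just above, \(\frac{r^2}{2\epsilon } - C_\epsilon r \ge \frac{r^2}{4\epsilon } - \epsilon C_\epsilon ^2\), is equivalent to \(\big (\frac{r}{2\sqrt{\epsilon }} - \sqrt{\epsilon } C_\epsilon \big )^2 \ge 0\); a numerical sanity check (illustration only):

```python
import math

# Numerical check (illustration only) of the Young-type bound
#   r^2/(2 eps) - C_eps r >= r^2/(4 eps) - eps C_eps^2,
# i.e. (r/(2 sqrt(eps)) - sqrt(eps) C_eps)^2 >= 0, and of sqrt(eps) C_eps -> 0.
for eps in [1e-1, 1e-2, 1e-4, 1e-8]:
    C = -1.5 * math.log(eps)
    for k in range(-50, 51):
        r = 0.1 * k
        lhs = r * r / (2 * eps) - C * r
        rhs = r * r / (4 * eps) - eps * C * C
        assert lhs - rhs >= -1e-6 * max(1.0, abs(lhs))  # holds up to rounding
tail = [math.sqrt(eps) * (-1.5) * math.log(eps) for eps in [1e-1, 1e-2, 1e-4, 1e-8]]
assert all(a > b > 0 for a, b in zip(tail, tail[1:]))   # sqrt(eps) C_eps decreases to 0
```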

Third, we show the following property.

Lemma 3.3

The family \(\{ \rho _\epsilon \}_\epsilon \) is relatively compact in \(C\big ([0,\infty ); H_{-1}({\mathcal {O}})\big )\). Hence, selecting a subsequence if necessary, there exists a limiting curve \(\rho (\cdot ) \in C\big ([0,\infty ); H_{-1}({\mathcal {O}})\big )\) such that

$$\begin{aligned} \lim _{\epsilon \rightarrow 0^+} \sup _{0\le t \le T} \Vert \rho _\epsilon (t) - \rho (t)\Vert _{-1} =0, \quad \forall T>0. \end{aligned}$$
(3.17)

Proof

We verify the relative compactness of \(\{ \rho _\epsilon (\cdot ) \}_{\epsilon >0}\) through the Arzelà–Ascoli lemma. The proof would be easier if \(\rho _\epsilon (t) \in {{\mathsf {X}}}\), since \({{\mathsf {X}}}\) is a compact space. However, for each fixed \(\epsilon >0\), our construction allows the possibility of negative mass in \(\rho _\epsilon (t)\) even though \(\rho _\epsilon (0) \in {{\mathsf {X}}}\). The negative mass only vanishes in the limit \(\epsilon \rightarrow 0^+\).

First, we verify the existence of a compact subset \(K_1 \subset \subset H_{-1}({\mathcal {O}})\) such that

$$\begin{aligned} \rho _\epsilon (t) \in K_1, \quad \forall \epsilon >0, t \in [0,T]. \end{aligned}$$
(3.18)

We start with the compact set \(K_0 := \{ \rho _0, J_\epsilon * \rho _0 : \epsilon >0\} \subset \subset H_{-1}({\mathcal {O}})\). Then for every \(\delta >0\), there exist \(N:=N(\delta ) \in {{\mathbb {N}}}\) and \(\rho _{1,0}, \ldots , \rho _{N,0} \in C^\infty ({\mathcal {O}}) \cap {{\mathsf {X}}}\) such that \(K_0 \subset \cup _{k=1}^N B(\rho _{k, 0}; \delta )\). Let \(\rho _{\epsilon , k}(t)\) be the solution to

$$\begin{aligned} \partial _t \rho _{\epsilon , k} = \partial _{xx}^2 \Phi _\epsilon (\rho _{\epsilon ,k}) + \partial _x \eta _\epsilon , \quad \rho _{\epsilon ,k}(0) = \rho _{k,0}. \end{aligned}$$

By a contraction estimate in Chapter 6.7.2 of [43] (see also part (iii) of Theorem 6.17 there),

$$\begin{aligned} \sup _{t \in [0,T]} \Vert \rho _{\epsilon , k}(t) -\rho _{\epsilon }(t) \Vert _{-1} \le \Vert \rho _{\epsilon ,k}(0) - \rho _{\epsilon }(0) \Vert _{-1}, \quad \forall \epsilon >0. \end{aligned}$$
(3.19)

By (3.12), noting \(\rho _{k,0} \in {\mathcal {P}}({\mathcal {O}})\), for every \(t \in [0,T]\), we have

$$\begin{aligned}&\sup _{\epsilon>0} \Big (\int _{{\mathcal {O}}} \Psi _\epsilon (\rho _{\epsilon ,k}(x,t)) dx - C_\epsilon \Big ) \\&\quad \le \sup _{\epsilon >0}\sup _{k=1,\ldots ,N} \Big ( \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _{k,0}) dx - C_\epsilon \Big ) + \int _0^T\int _{{\mathcal {O}}} |\eta (r,x)|^2 dx dr \\&\quad =: L \big (\rho _{1,0},\ldots , \rho _{N(\delta ),0}; \eta (\cdot )\big ) =: L_\delta <\infty . \end{aligned}$$

In view of the explicit form of \(\Psi _\epsilon \) in (3.15), the set

$$\begin{aligned} K_{1,\delta }(l):= \Big \{ \gamma \in H_{-1}({\mathcal {O}}) : \int _{{\mathcal {O}}} \gamma (x) dx =1, \sup _{\epsilon >0} \Big ( \int _{{\mathcal {O}}} \Psi _\epsilon (\gamma ) dx - C_\epsilon \Big ) \le l\Big \} \end{aligned}$$

is relatively compact in \(H_{-1}({\mathcal {O}})\) for every finite \(l \in {{\mathbb {R}}}_+\). Writing \(K_{1,\delta }:=K_{1,\delta }(L_\delta )\), we then have

$$\begin{aligned} \rho _{\epsilon ,k}(t) \in K_{1,\delta }, \quad \forall \epsilon >0, k \in \{ 1,\ldots , N(\delta )\}, t \in [0,T]. \end{aligned}$$

Let \(K_{1,\delta }^\delta \) denote the \(\delta \)-thickening of \(K_{1,\delta }\). Then by (3.19),

$$\begin{aligned} \rho _\epsilon (t) \in K_{1,\delta }^\delta , \quad \forall \delta>0, \epsilon >0, t \in [0,T]. \end{aligned}$$
(3.20)

Taking \(K_1:= \overline{\cap _{\delta >0} K_{1,\delta }^\delta }\) (which is complete and totally bounded), we arrive at (3.18).

Second, through the variational inequality (3.14), we obtain a local uniform modulus of continuity estimate \(\sup _{\epsilon >0}\sup _{t, s \in [0,T], |t-s|\le 1}\Vert \rho _\epsilon (t) - \rho _\epsilon (s) \Vert _{-1} \le \omega (|t-s|)\) for some modulus \(\omega \). It suffices to verify that, for every \(\delta \in (0,1)\), there exist a finite constant \(C_\delta >0\) and \(\alpha \in (0,1)\) such that

$$\begin{aligned} \Vert \rho _\epsilon (t) - \rho _\epsilon (s) \Vert _{-1} \le \delta + C_\delta |t-s|^\alpha , \quad \forall \epsilon >0, 0\le s \le t\le T. \end{aligned}$$

Then we conclude by taking \(\omega (r):= \inf _{\delta \in (0,1)} \{ \delta + C_\delta r^\alpha \}\). Let \(\delta \in (0,1)\) be given. For every \(s,t \in [0,T]\) with \(|s-t|\le 1\) and \(\epsilon >0\), by (3.20), there exists \(\gamma := \gamma (\delta ,\epsilon ,s) \in K_{1,\delta }(L_\delta )\) such that \(\Vert \rho _\epsilon (s) - \gamma \Vert _{-1} <\delta \). By (3.14),

$$\begin{aligned} \frac{1}{2} \Vert \rho _\epsilon (t) - \gamma \Vert _{-1}^2&\le \frac{1}{2} \delta ^2 + C_{1,\delta } \sqrt{|t-s|} + \int _s^t (\int _{{\mathcal {O}}}\Psi _\epsilon (\gamma )dx - C_\epsilon ) dr \\&\le \frac{1}{2} \delta ^2 + C_{1,\delta } \sqrt{|t-s|} + L_\delta |t-s|. \end{aligned}$$

Consequently

$$\begin{aligned} \Vert \rho _\epsilon (t) - \rho _\epsilon (s) \Vert _{-1} \le \Vert \rho _\epsilon (t) -\gamma \Vert _{-1} + \Vert \rho _\epsilon (s) -\gamma \Vert _{-1} \le 2\delta + C_{2,\delta } |t-s|^{\frac{1}{4}}. \end{aligned}$$

Note that \(C_{2,\delta }\) depends only on \(\delta \), and not on \(\epsilon \), \(s\) or \(t\); we conclude. \(\quad \square \)

3.1.2 A priori regularity estimates for the PDE with control (3.1)

Lemma 3.4

Let \(\rho _0 \in {{\mathsf {X}}}\) and let \(\rho (\cdot ) \in C([0,\infty ); H_{-1}({\mathcal {O}}))\) be the limit obtained in (3.17). Then \(\rho (\cdot )\) satisfies the following properties.

  1. (1)

    It holds that \(\rho (r) \in {{\mathsf {X}}}\) for every \(r \ge 0\). Indeed, \(\rho (r,dx)=\rho (r, x) dx\) for a.e. \(r>0\), and

    $$\begin{aligned} \rho (r,x) \ge 0, \text { a.e. } (r,x) \in (0,\infty ) \times {{\mathcal {O}}}. \end{aligned}$$
    (3.21)
  2. (2)

    The variational inequality (3.7) holds.

  3. (3)

    \(\int _0^T \int _{{\mathcal {O}}} \rho (t,x) \log \rho (t,x) dx dt <\infty \).

  4. (4)

    \(\rho (\cdot ) \in C([0,\infty ); {{\mathsf {X}}})\).

Proof

Taking the limit \(\epsilon \rightarrow 0\) in (3.17), by the approximate variational inequality estimates (3.14), the negative mass estimate (3.16), and lower semicontinuity arguments, we conclude that \(\rho (r,dx) = \rho (r,x) dx\) for \(r >0\) a.e., that (3.21) holds (hence \(\rho (r) \in {{\mathsf {X}}}\) for all \(r\ge 0\)), and that (3.7) holds.

The estimate \(\int _0^T S(\rho (t)) dt < \infty \) follows from (3.7). \(\quad \square \)

We remark that the variational inequalities (3.7) alone (for a family of \(\gamma _0 \in {{\mathsf {X}}}\) with \(S(\gamma _0)<\infty \)) can be used as a definition of a solution to (3.1). This definition suffices to establish a uniqueness result, as we now show.

Lemma 3.5

Let \((\rho _i, \eta _i)\), \(i=1,2\), solve (3.1)–(3.2) in the sense that both pairs satisfy the variational inequalities (3.7). In addition, we assume that \(\rho _i(\cdot ) \in C([0,T];{{\mathsf {X}}})\) for every \(T>0\) and \(i=1,2\). Then

$$\begin{aligned} \Vert \rho _1(t) -\rho _2(t) \Vert _{-1} \le \Vert \rho _1(0) - \rho _2(0) \Vert _{-1} + \int _0^t \Vert \eta _1 - \eta _2 \Vert _{L^2({\mathcal {O}})} dr. \end{aligned}$$
(3.22)

Hence, given a fixed initial condition and the same control \(\eta =\eta _1=\eta _2\), it follows that \(\rho _1=\rho _2\).

Proof

Let \(0<\alpha<\beta <T\) and \(0<s<t<T\). From (3.7),

$$\begin{aligned}&\int _\alpha ^\beta \big ( \Vert \rho _1(t) - \rho _2(\tau ) \Vert _{-1}^2 - \Vert \rho _1(s) - \rho _2(\tau ) \Vert _{-1}^2 \big ) d \tau \\&\quad \le \int _\alpha ^\beta \int _s^t \big ( S(\rho _2(\tau )) -S(\rho _1(r)) + 2 \langle \eta _1(r), \partial _x (-\partial _{xx}^2)^{-1} (\rho _1(r) - \rho _2(\tau )) \rangle \big ) d r d\tau . \end{aligned}$$

Similarly,

$$\begin{aligned}&\int _s^t \big ( \Vert \rho _1(r) - \rho _2(\beta ) \Vert _{-1}^2 - \Vert \rho _1(r) - \rho _2(\alpha ) \Vert _{-1}^2 \big ) d r \\&\quad \le \int _s^t \int _\alpha ^\beta \big ( S(\rho _1(r)) - S(\rho _2(\tau )) + 2 \langle \eta _2(\tau ), \partial _x (-\partial _{xx}^2)^{-1} (\rho _2(\tau ) -\rho _1(r)) \rangle \big ) d\tau dr. \end{aligned}$$

Adding these two inequalities, we obtain

$$\begin{aligned}&\int _\alpha ^\beta \big ( \Vert \rho _1(t) - \rho _2(\tau ) \Vert _{-1}^2 - \Vert \rho _1(s) - \rho _2(\tau ) \Vert _{-1}^2 \big ) d \tau \\&\qquad + \int _s^t \big ( \Vert \rho _1(r) - \rho _2(\beta ) \Vert _{-1}^2 - \Vert \rho _1(r) - \rho _2(\alpha ) \Vert _{-1}^2 \big ) d r \nonumber \\&\quad \le \int _{r=s}^t\int _{\tau =\alpha }^\beta 2 \langle \eta _1(r) -\eta _2(\tau ), \partial _x (-\partial _{xx}^2)^{-1} (\rho _1(r) - \rho _2(\tau )) \rangle d r d\tau . \nonumber \end{aligned}$$
(3.23)

We define

$$\begin{aligned} F(t,s;\beta , \alpha )&:= \int _s^t \int _\alpha ^\beta \Vert \rho _1(r) - \rho _2(\tau ) \Vert _{-1}^2 d\tau d r, \\ M(t,s;\beta ,\alpha )&:= \int _{r=s}^t\int _{\tau =\alpha }^\beta 2 \langle \eta _1(r)-\eta _2(\tau ), \partial _x (-\partial _{xx}^2)^{-1} (\rho _1(r) - \rho _2(\tau )) \rangle d \tau d r. \end{aligned}$$

Then (3.23) becomes

$$\begin{aligned} \partial _t F + \partial _s F + \partial _\beta F + \partial _\alpha F \le M. \end{aligned}$$

If we write \(G(h) := F(t+h, s+h; t+h, s+h) \in C^1({{\mathbb {R}}}_+)\), then the last inequality becomes

$$\begin{aligned} \partial _h G(h) \le M(t+h,s+h;t+h,s+h). \end{aligned}$$

That is,

$$\begin{aligned}&\int _s^t \int _s^t \Vert \rho _1(r+h) - \rho _2(\tau +h) \Vert _{-1}^2 d \tau d r - \int _s^t \int _s^t \Vert \rho _1(r) - \rho _2(\tau ) \Vert _{-1}^2 d\tau d r \\&\quad = G(h) - G(0) \\&\quad \le \int _0^h \int _{s}^t\int _s^t 2 \langle \eta _1(r+q)-\eta _2(\tau +q), \partial _x (-\partial _{xx}^2)^{-1} (\rho _1(r+q) - \rho _2(\tau +q)) \rangle d \tau d r dq. \end{aligned}$$

We multiply by \((t-s)^{-2}\) on both sides and then take the limit \(t \rightarrow s^+\) to find

$$\begin{aligned}&\Vert \rho _1(s+h) - \rho _2(s+h) \Vert _{-1}^2 - \Vert \rho _1(s) - \rho _2(s) \Vert _{-1}^2 \\&\quad \le 2 \int _0^h \Vert \eta _1(s+q)-\eta _2(s+q) \Vert _{L^2} \Vert \rho _1(s+q) - \rho _2(s+q) \Vert _{-1} dq \\&\quad = 2 \int _s^{s+h} \Vert \eta _1(r)-\eta _2(r) \Vert _{L^2} \Vert \rho _1(r) - \rho _2(r) \Vert _{-1} dr. \end{aligned}$$

Here we used the fact that \(\rho _i(\cdot ) \in C({{\mathbb {R}}}_+;H_{-1}({\mathcal {O}}))\) for \(i=1,2\). By further mollification-approximation estimates, the above inequality yields (3.22).

\(\square \)

Definition 3.1 gives a notion of weak solution for the partial differential equation (3.1). It requires the a priori estimate that \(\log \rho (t,x)\) be locally integrable, so that this quantity can be viewed as a distribution (see (3.6)). Next, we establish this local integrability estimate for the limit \(\rho \) obtained in (3.17). We note that, from the estimates in Lemma 3.4, we already know that \(\int _0^T \int _{{\mathcal {O}}} \rho (r,x) \log \rho (r,x) dx dr <\infty \), which implies in particular that

$$\begin{aligned} \int _{r \in [0,T]} \int _{x : \rho (r,x) \ge 1} \log \rho (r,x) dx dr = \int _0^T \int _{{\mathcal {O}}} \big ( \log \rho (r,x) \big )^+ dx dr <\infty . \end{aligned}$$
(3.24)

Therefore, it remains to control the region where \(\rho (r,x) <1\).

Lemma 3.6

Let \(\rho (\cdot ) \in C([0,\infty ); H_{-1}({\mathcal {O}}))\) be the limit as obtained from (3.17). For every \(0\le s\le T<\infty \), allowing the possibility of \(S(\rho (0))=+\infty \), we have that

$$\begin{aligned}&\frac{1}{2} S(\rho (T)) + \frac{1}{8} \int _s^T \Big ( - \log 2 + \int _{{\mathcal {O}}} (-\log ) \rho (r,x) dx \Big ) dr \nonumber \\&\quad \le \frac{1}{2} S(\rho (s) ) + \int _s^T \int _{{\mathcal {O}}} |\eta (r,x)|^2 dx dr. \end{aligned}$$
(3.25)

Furthermore, in view of (3.24) and Lemma 3.4,

$$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} \big (\log \rho \big )^- dx dr <\infty , \quad \forall s>0. \end{aligned}$$

Proof

Noting that \(\int _{{\mathcal {O}}} \rho _\epsilon (t, x) dx =1\) for all \(t > 0\), we have \(\max _x \rho _\epsilon (t,x) >1/2\). Since \((t,x) \mapsto \rho _\epsilon (t,x)\) is continuous, we can select a family of points \(\{ x_\epsilon (t) \in {\mathcal {O}} : t>0\}\) such that \(\rho _\epsilon (t, x_\epsilon (t)) \ge 1/2\). We also observe that

$$\begin{aligned} \sup _{z \in {\mathcal {O}}} \Big (\Phi _\epsilon (\rho _\epsilon (t,x_\epsilon (t))) - \Phi _\epsilon (\rho _\epsilon (t, z)) \Big ) \le \Big (\int _{{\mathcal {O}}} |\partial _x \Phi _\epsilon (\rho _\epsilon (t,x))|^2 dx\Big )^{\frac{1}{2}} \sup _{z \in {\mathcal {O}}} |z-x_\epsilon (t)|^{\frac{1}{2}}. \end{aligned}$$
(3.26)
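The inequality (3.26) is the elementary one-dimensional \(\tfrac{1}{2}\)-Hölder (Morrey) bound for the \(H^1\) function \(u := \Phi _\epsilon (\rho _\epsilon (t,\cdot ))\): by the fundamental theorem of calculus and the Cauchy–Schwarz inequality,

```latex
u(x) - u(z) \;=\; \int_z^x \partial_y u(y)\,dy
 \;\le\; \Big(\int_{\mathcal{O}} |\partial_y u(y)|^2\,dy\Big)^{1/2}\,|x-z|^{1/2},
```

applied with \(x = x_\epsilon (t)\) and the supremum taken over \(z\).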

Next, we estimate the left hand side of the above in three situations, namely

$$\begin{aligned} \rho _\epsilon (t,z) \ge \epsilon , \quad 0 \le \rho _\epsilon (t,z) \le \epsilon , \quad \rho _\epsilon (t,z) < 0. \end{aligned}$$

We note that \(\Phi _\epsilon (\rho _\epsilon (t,x_\epsilon (t))) \ge -\frac{1}{2} \log 2 + C_\epsilon \). Therefore

$$\begin{aligned} \sup _{z : \rho _\epsilon (t,z) \ge \epsilon } \Big (\Phi _\epsilon (\rho _\epsilon (t,x_\epsilon (t))) - \Phi _\epsilon (\rho _\epsilon (t, z)) \Big ) \ge -\frac{1}{2} \log 2 - \frac{1}{2} \log \inf _{z : \rho _\epsilon (t,z) >\epsilon } \rho _\epsilon (t,z) . \end{aligned}$$

In addition,

$$\begin{aligned} C_\epsilon - \Phi _\epsilon (r) \ge C_\epsilon - \theta _\epsilon (\epsilon ) = -\frac{1}{2} \log \epsilon , \quad \forall r \in [0,\epsilon ], \end{aligned}$$

which implies

$$\begin{aligned} \sup _{z : 0\le \rho _\epsilon (t,z) <\epsilon } \Big (\Phi _\epsilon (\rho _\epsilon (t,x_\epsilon (t))) - \Phi _\epsilon (\rho _\epsilon (t, z)) \Big ) \ge -\frac{1}{2} \log 2 - \frac{1}{2} \log \epsilon . \end{aligned}$$

Therefore, when \(\epsilon >0\) is small enough, (3.26) gives (using the convention that \(\sup \emptyset = -\infty \)),

$$\begin{aligned} g_\epsilon (t)&:= \Big ( -\frac{1}{2} \log 2 - \frac{1}{2} \log \big (\inf _{z \in {\mathcal {O}}} \rho _\epsilon (t,z) \vee \epsilon \big ) \Big ) \vee \Big ( (-\frac{1}{2} \log 2 + C_\epsilon ) +\frac{1}{\epsilon } \sup _{z: \rho _\epsilon <0} \rho _\epsilon ^{-}(t,z) \Big ) \\&\quad \le \sqrt{2} \Big ( \int _{{\mathcal {O}}} |\partial _x \Phi _\epsilon (\rho _\epsilon (t,x))|^2 dx\Big )^{\frac{1}{2}}. \end{aligned}$$

Combined with (3.12), this yields

$$\begin{aligned} \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (T,x)) dx + \frac{1}{2} \int _s^T g_\epsilon ^2(r) dr \le \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (s,x)) dx + \int _s^T \int _{{\mathcal {O}}} |\eta _\epsilon (r,x)|^2 dx dr. \end{aligned}$$
(3.27)

Using \( \int _{{\mathcal {O}}} \big ( - \log \big )(\rho _\epsilon \vee \epsilon ) dx \le \big ( - \log \big )(\inf _{{\mathcal {O}}} \rho _\epsilon \vee \epsilon )\), we conclude

$$\begin{aligned}&\int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (T,x)) dx + \frac{1}{8} \int _s^T \Big ( -\log 2 + \int _{{\mathcal {O}}} (-\log ) (\rho _\epsilon \vee \epsilon ) dx \Big ) dr \nonumber \\&\quad \le \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (s,x)) dx + \int _s^T \int _{{\mathcal {O}}} |\eta _\epsilon (r,x)|^2 dx dr. \end{aligned}$$
(3.28)

Now we pass \(\epsilon \rightarrow 0\) in the above inequality to conclude (3.25). The details are given in the following steps. First, we note that

$$\begin{aligned} \Vert \rho _\epsilon (t) \vee \epsilon -\rho _\epsilon (t) \Vert _{L^\infty } \le 2 \epsilon + \sup _{z} \rho _\epsilon ^{-}(t,z). \end{aligned}$$

Hence by the convergence in (3.17) and by the estimate (3.27),

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \int _s^T \langle \rho _\epsilon \vee \epsilon , \varphi \rangle dt = \int _s^T\langle \rho , \varphi \rangle dt, \quad \forall \varphi \in C({\mathcal {O}}). \end{aligned}$$

The observation

$$\begin{aligned} -\log r = \sup _{s>0} \big ( (- s) r + 1 + \log s\big ), \quad r >0, \end{aligned}$$

leads to the variational formula

$$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} -\log \big (\rho _\epsilon \vee \epsilon \big ) dz dt = \sup _{\varphi \in C({\mathcal {O}}), \varphi >0} \int _s^T \big ( \langle \rho _\epsilon \vee \epsilon , - \varphi \rangle + 1 + \langle 1, \log \varphi \rangle \big ) dt. \end{aligned}$$
(3.29)

Consequently

$$\begin{aligned} \int _s^T \int _{{\mathcal {O}}} \big (-\log \big ) \rho (t,z) dz dt \le \liminf _{\epsilon \rightarrow 0^+} \int _s^T \int _{{\mathcal {O}}} -\log \big (\rho _\epsilon \vee \epsilon \big ) dz dt. \end{aligned}$$
(3.30)
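The convex-duality identity behind (3.29), \(-\log r = \sup _{s>0}\big ( -sr + 1 + \log s \big )\) with the supremum attained at \(s = 1/r\), can be checked numerically (illustration only; the grid is our own choice):

```python
import math

# Numerical check (illustration only) of the duality formula
#   -log r = sup_{s>0} ( -s r + 1 + log s ),   attained at s = 1/r.
def dual_sup(r, grid):
    return max(-s * r + 1 + math.log(s) for s in grid)

grid = [math.exp(0.001 * k) for k in range(-8000, 8001)]  # s ranges over [e^-8, e^8]
for r in [0.05, 0.3, 1.0, 7.0, 40.0]:
    approx = dual_sup(r, grid)
    assert approx <= -math.log(r) + 1e-9     # every s gives a lower bound for -log r
    assert abs(approx + math.log(r)) < 1e-4  # the grid nearly attains the supremum
```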

Second, by Jensen’s inequality, \(\Vert \eta _\epsilon (r) \Vert _{L^2} \le \Vert \eta (r)\Vert _{L^2}\). Finally, to obtain (3.25) from (3.28) by taking \(\epsilon \rightarrow 0^+\), in view of the explicit form of \(\Psi _\epsilon \) in (3.15), all we need to justify is the inequality

$$\begin{aligned} \limsup _{\epsilon \rightarrow 0} \Big ( \int _{{\mathcal {O}}} \Psi _\epsilon (\rho _\epsilon (s,x)) dx +\frac{1}{2} - C_\epsilon \Big ) \le \frac{1}{2} \int _{{\mathcal {O}}} \rho (s,x) \log \rho (s,x) dx, \quad \text {for a.e. } s \ge 0. \end{aligned}$$
(3.31)

If \(s=0\), then this follows directly from a convexity (Jensen inequality) argument:

$$\begin{aligned} \int _{{\mathcal {O}}} \big (J_\epsilon * \rho _0\big ) \log \big (J_\epsilon * \rho _0 \big )dx \le \int _{{\mathcal {O}}} \rho _0(x) \log \rho _0(x) dx. \end{aligned}$$
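The Jensen-type inequality used here, that mollification does not increase the entropy \(\int \rho \log \rho \,dx\), has a transparent discrete analogue: averaging a probability vector against a symmetric stochastic kernel decreases \(\sum _i p_i \log p_i\). A sketch (illustration only; the grid, kernel and profile are our own choices):

```python
import math

# Discrete analogue (illustration only) of the Jensen-type inequality above:
# averaging against a symmetric stochastic kernel does not increase
# the entropy functional p -> sum_i p_i log p_i.
def entropy(p):
    return sum(x * math.log(x) for x in p if x > 0)

def smooth(p, w):
    # circular convolution with a symmetric probability kernel w
    n, m = len(p), len(w) // 2
    return [sum(w[j + m] * p[(i + j) % n] for j in range(-m, m + 1)) for i in range(n)]

n = 32
p = [math.exp(math.cos(2 * math.pi * i / n)) for i in range(n)]
Z = sum(p)
p = [x / Z for x in p]        # normalized: plays the role of rho_0
w = [0.25, 0.5, 0.25]         # symmetric kernel: plays the role of J_eps

q = smooth(p, w)
assert abs(sum(q) - 1.0) < 1e-12          # mass is preserved
assert entropy(q) <= entropy(p) + 1e-12   # entropy does not increase
```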

The case \(0<s<T<\infty \) is more subtle. We divide the justification into three steps. In the first step, we construct the solutions \(\{\rho _\epsilon (r) : 0\le r < s\}\) as before with \(\rho _\epsilon (0) : =J_\epsilon * \rho _0\) and take their limit \(\{\rho (r): 0\le r < s\}\). Then we construct \(\{ {\hat{\rho }}_\epsilon (r) : s \le r \le T\}\) as the solution to (3.9) with initial data \({\hat{\rho }}_\epsilon (s) := J_\epsilon * \rho (s)\), and concatenate \(\rho _\epsilon \) with \({\hat{\rho }}_\epsilon \) to arrive at a new \(H_{-1}({\mathcal {O}})\)-valued curve \(\{ {\tilde{\rho }}_\epsilon (r) : 0 \le r \le T\}\). This new curve is defined for \(r \in [0,T]\), but may have a discontinuity at time \(s>0\). In the second step, we note that all arguments and estimates before (3.31) in the proof of this lemma still hold if we replace \(\rho _\epsilon \) by \({\tilde{\rho }}_\epsilon \); hence (3.31) holds for the concatenated curve. In the last step, we note that \(\{ {\tilde{\rho }}_\epsilon :\epsilon >0\}\) and \(\{ \rho _\epsilon : \epsilon >0\}\) have the same limit \(\rho \), by the stability-uniqueness result in Lemma 3.5. Therefore, (3.31) is verified for the original curve as well.

We conclude that (3.25) holds for the limit \(\rho \). \(\quad \square \)

Lemma 3.7

The energy estimate (3.8) holds.

Proof

Our strategy is to derive (3.8) by passing to the limit \(\epsilon \rightarrow 0^+\) in (3.12).

Let \(\rho \) be the limit of the sequence of functions \(\rho _\epsilon \) as in (3.17). From (3.12), the following holds for every \(0<s <T\):

$$\begin{aligned}&\int _s^T \Big (\liminf _{\epsilon \rightarrow 0} \int _{{\mathcal {O}}}|\partial _x \Phi _\epsilon (\rho _\epsilon (r,x))|^2 dx \Big )dr \nonumber \\&\quad \le \liminf _{\epsilon \rightarrow 0} \int _s^T \int _{{\mathcal {O}}} |\partial _x \Phi _\epsilon (\rho _\epsilon (r,x))|^2 dx dr \le \int _s^T \int _{{\mathcal {O}}}|\eta (r,x)|^2 dx dr + S(\rho (s)). \end{aligned}$$
(3.32)

Suppose that we can find a set \({\mathcal {N}} \subset [s,T]\) of Lebesgue measure zero and functions \([s,T] \ni r \mapsto k_\epsilon (r) \in {{\mathbb {R}}}\) such that for every \(r \in [s,T]\setminus {{\mathcal {N}}}\), there exists a subsequence (still labeled by \(\epsilon :=\epsilon (r)\)) with

$$\begin{aligned} \lim _{\epsilon \rightarrow 0^+} \langle \Phi _\epsilon (\rho _\epsilon (r,\cdot )) + k_\epsilon (r), \partial _x \varphi \rangle =\langle \frac{1}{2} \log \rho (r,\cdot ), \partial _x \varphi \rangle , \quad \forall \varphi \in C^\infty ({\mathcal {O}}). \end{aligned}$$
(3.33)

Then

$$\begin{aligned} \int _{{\mathcal {O}}} |\partial _x \frac{1}{2} \log \rho (r, x) |^2dx \le \liminf _{\epsilon \rightarrow 0} \int _{{\mathcal {O}}} |\partial _x \Phi _\epsilon (\rho _\epsilon (r,x)) |^2 dx, \quad \text {for a.e. } r \in [s,T], \end{aligned}$$

hence we conclude.

We establish (3.33) next. Let

$$\begin{aligned} h(r):= \liminf _{\epsilon \rightarrow 0} \Vert \partial _x \Phi _\epsilon (\rho _\epsilon (r,\cdot )) \Vert _{L^2({\mathcal {O}})}^2 . \end{aligned}$$

By (3.32), we can find a set \({\mathcal {N}} \subset [s,T]\) of Lebesgue measure zero such that

$$\begin{aligned} h(r) <\infty , \quad \forall r \in [s,T] \setminus {{\mathcal {N}}}. \end{aligned}$$
(3.34)

Therefore for \(r \in [s,T]\setminus {{\mathcal {N}}}\), there exists a subsequence \(\epsilon :=\epsilon (r)\), and there exist constants \(k_\epsilon :=k_\epsilon (r)\) such that \(\{ \Phi _\epsilon (\rho _\epsilon (r,\cdot )) + k_\epsilon \}_{\epsilon >0}\) is relatively compact in \(C({\mathcal {O}})\). Let

$$\begin{aligned} u(r,x):= \lim _{\epsilon \rightarrow 0^+} \big ( \Phi _\epsilon (\rho _\epsilon (r,x)) + k_\epsilon \big ), \end{aligned}$$

where the convergence (along the selected subsequence) is uniform in \({\mathcal {O}}\). We claim that the set \(\{x : \rho _\epsilon (r,x) <0\}\) is empty when \(\epsilon \) is small enough. Suppose this is not true. Then by continuity of \(x \mapsto \rho _\epsilon (r,x)\), we can find \({\tilde{x}}_\epsilon (r)\) such that \(\rho _\epsilon (r, {\tilde{x}}_\epsilon (r)) =0\). We also recall that, in the proof of Lemma 3.6, we found points \(x_\epsilon (r)\) such that \(\rho _\epsilon (r, x_\epsilon (r)) \ge 1/2\). Hence

$$\begin{aligned} \Phi _\epsilon (\rho _\epsilon (r,x_\epsilon (r))) - \Phi _\epsilon (\rho _\epsilon (r,{\tilde{x}}_\epsilon (r))) \ge \Phi _\epsilon (\epsilon ) - 0 = - \log \epsilon \rightarrow +\infty , \quad \text { as } \epsilon \rightarrow 0. \end{aligned}$$

But on the other hand, the estimate (3.34) implies that

$$\begin{aligned} \liminf _{\epsilon \rightarrow 0} \sup _{x \in {\mathcal {O}}} |\Phi _\epsilon (\rho _\epsilon (r,x)) - \Phi _\epsilon (0)| <\infty . \end{aligned}$$

This contradiction allows us to conclude that

$$\begin{aligned} \liminf _{\epsilon \rightarrow 0} \Big ( \inf _{z \in {\mathcal {O}}} \rho _\epsilon (r,z)\Big ) = \liminf _{\epsilon \rightarrow 0} \Big (\inf _{z \in {\mathcal {O}}} \rho _\epsilon ^+(r,z)\Big ) =\liminf _{\epsilon \rightarrow 0} \Big ( \inf _{z \in {\mathcal {O}}} \rho _\epsilon ^+(r,z) \vee \epsilon \Big ) >0. \end{aligned}$$

Therefore \(u(r,x) = \lim _{\epsilon \rightarrow 0^+}\big ( \frac{1}{2} \log (\rho _\epsilon (r,x)) + (k_\epsilon + C_\epsilon )\big )\), where the convergence is uniform in x. That is, along this subsequence of \(\epsilon =\epsilon (r)\),

$$\begin{aligned} \lim _{\epsilon \rightarrow 0} \Vert e^{2(k_\epsilon +C_\epsilon )}\rho _\epsilon (r,\cdot )- e^{2u(r,\cdot )}\Vert _{L^\infty } =0. \end{aligned}$$

In view of the weak convergence in (3.17), \(u(r,x) = \frac{1}{2} \log \rho (r,x) + C_0\) for some constant \(C_0:=C_0(r)\). Hence (3.33) is verified. \(\quad \square \)

3.2 A posteriori estimates for the PDE with control

We saw in Lemma 3.5 that the notion of solution in terms of the variational inequalities (3.7) implies uniqueness and stability. Next, we prove that a weak solution in the sense of Definition 3.1 satisfies these variational inequalities, so that uniqueness and stability follow.

Lemma 3.8

Every weak solution \((\rho ,\eta )\) to (3.1)–(3.2) also satisfies (3.7).

Proof

From (3.6), a simple approximation argument shows that the following holds for all \(0<s<t\):

$$\begin{aligned} \langle \varphi (t), \rho (t) \rangle - \langle \varphi (s), \rho (s) \rangle = \int _s^t \big ( \langle \partial _r \varphi (r), \rho (r) \rangle + \langle \varphi (r), \frac{1}{2} \partial _{xx}^2 \log \rho (r) + \partial _x \eta \rangle \big ) dr \end{aligned}$$
(3.35)

for all smooth \(\varphi :=\varphi (r,x) \); this includes in particular the choice

$$\begin{aligned} \varphi _\epsilon (r):= G_\epsilon *(-\Delta _{xx})^{-1} (\rho (r) - \gamma _0) ; \end{aligned}$$

here, \(G_\epsilon :=J_\epsilon * J_\epsilon \) where \(J_\epsilon \) is a smooth mollifier and the \(*\) denotes convolution in the spatial variable only.

Therefore

$$\begin{aligned} \Vert J_\epsilon *(\rho (t) - \gamma _0) \Vert _{-1}^2&= \Vert J_\epsilon * (\rho (s) - \gamma _0) \Vert _{-1}^2 - \int _s^t \int _{{\mathcal {O}}} (G_\epsilon ) * \rho (r,x) \log \rho (r,x) dx dr \\&\quad + \int _s^t \int _{{\mathcal {O}}} (G_\epsilon ) * \gamma _0(r,x) \log \rho (r,x) dx dr \\&\quad -2 \int _s^t\int _{{\mathcal {O}}} ( G_\epsilon )*\eta (r,x) \partial _x (-\partial _{xx}^2)^{-1} (\rho (r) - \gamma _0) (x) dx dr. \end{aligned}$$

We note that, by Jensen’s inequality,

$$\begin{aligned} \int _{{\mathcal {O}}} (G_\epsilon * \gamma _0) (x) \log \rho (r,x) dx&= \int _{{\mathcal {O}}} (G_\epsilon * \gamma _0) (x) \log \frac{\rho (r,x)}{G_\epsilon * \gamma _0(x)} dx + S (G_\epsilon * \gamma _0) \\&\le \log \big (\int _{{\mathcal {O}}} 1 dx \big ) + S(\gamma _0) \le S(\gamma _0). \end{aligned}$$

With the a priori regularity estimates (3.3)–(3.5), passing to the limit \(\epsilon \rightarrow 0\), we obtain (3.7) for \(s>0\). Taking \(s \rightarrow 0+\), the case \(s=0\) follows. \(\quad \square \)

Lemma 3.9

Suppose that \((\rho , \eta )\) is the weak solution to (3.1)–(3.2) in the sense of Definition 3.1. Let

$$\begin{aligned} \phi (t):= t S(\rho (t)) + \Vert \rho (t) - \gamma _0 \Vert _{-1}^2, \end{aligned}$$

then

$$\begin{aligned} \phi (t) \le \phi (0) + \int _0^t \big ( \frac{1}{2} r + 2 \Vert \rho (r) - \gamma _0 \Vert _{-1} \big ) \Vert \eta (r)\Vert _{L^2} dr. \end{aligned}$$

Proof

Following the ideas exposed in Theorem 24.16 of Villani [45] in a similar setting, we combine (3.7) and (3.8) to obtain the desired estimate. \(\quad \square \)

We define a set of regular points in the state space \({{\mathsf {X}}}\),

$$\begin{aligned} \text {Reg}&:= \Big \{ \rho \in {{\mathsf {X}}}: \rho (dx) = \rho (x) dx, \text { some measurable } \rho (x),\nonumber \\&\quad \log \rho \in L^1({\mathcal {O}}), \int _{\mathcal O}|\partial _x \log \rho |^2 dx<\infty \Big \} \nonumber \\&\quad = \Big \{ \rho \in {{\mathsf {X}}}: \rho (dx) = \rho (x) dx, \rho \in C(\mathcal O), \inf \rho >0, \int _{{\mathcal {O}}}|\partial _x \log \rho |^2 dx <\infty \Big \}, \end{aligned}$$
(3.36)

where the last equality follows from one-dimensional Sobolev inequalities. The estimates in Lemma 3.2 imply that, under the finite control cost (3.2), the weak solution of (3.1) spends zero Lebesgue time outside the set \(\text {Reg}\). That is,

$$\begin{aligned} \rho (t) \in \text {Reg} \quad \text { for a.e. } t>0. \end{aligned}$$
(3.37)

3.3 A Nisio semigroup

We recall that

$$\begin{aligned} L(\rho ,\eta ) := \frac{1}{2} \int _{{\mathcal {O}}} |\eta (x)|^2 dx. \end{aligned}$$

For every \((\rho , \eta )\) satisfying (3.1)–(3.2) in the sense of Definition 3.1, by the regularity results established in Lemma 3.2, \(\log \rho (t,x) \in \mathcal D^\prime ((0,T) \times {\mathcal {O}})\) exists as a distribution, and

$$\begin{aligned} \int _0^t \int _{{\mathcal {O}}} |\eta (s,x)|^2 dx ds = \int _0^t \Vert \partial _s \rho - \frac{1}{2} \partial _{xx}^2 \log \rho \Vert _{-1}^2 ds. \end{aligned}$$
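This identity follows directly from the equation. A short sketch, assuming \(\eta \) is normalized to have zero spatial mean on the circle (a normalization that does not change \(\partial _x \eta \)):

```latex
\[
  \partial_s \rho - \tfrac{1}{2} \partial_{xx}^2 \log \rho = \partial_x \eta
  \quad\Longrightarrow\quad
  \big\Vert \partial_s \rho - \tfrac{1}{2} \partial_{xx}^2 \log \rho \big\Vert_{-1}
  = \Vert \partial_x \eta \Vert_{-1}
  = \big\Vert \partial_x (-\partial_{xx}^2)^{-1} \partial_x \eta \big\Vert_{L^2}
  = \Vert \eta \Vert_{L^2} ,
\]
```

where the last equality uses that, in Fourier modes on the circle, \(\partial _x (-\partial _{xx}^2)^{-1}\partial _x\) acts as minus the projection onto mean-zero functions.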

Let \(f:{{\mathsf {X}}}\mapsto {{\mathbb {R}}}\cup \{+\infty \}\) satisfy \(\sup _{{\mathsf {X}}}f <\infty \); we define

$$\begin{aligned} V(t)f(\rho _0)&:= \sup \Big \{ f(\rho (t)) - \int _0^t L \big (\rho (r),\eta (r) \big ) dr : (\rho (\cdot ), \eta (\cdot )) \text { satisfies }~(3.1) \nonumber \\&\qquad \qquad \text { in the sense of Definition}~3.1 \text {with } \rho (0) = \rho _0 \Big \}, \quad \forall \rho _0 \in {{\mathsf {X}}}. \end{aligned}$$
(3.38)

It follows that \(\sup _{{\mathsf {X}}}V(t) f <\infty \). Moreover, \(V(t) C = C\) for any \(C \in {{\mathbb {R}}}\).

We define an action functional on curves \(\rho (\cdot ) \in C([0,\infty ); {{\mathsf {X}}})\), thus giving a precise meaning to (1.4):

$$\begin{aligned} A_T[\rho (\cdot )]:= {\left\{ \begin{array}{ll} \int _0^T\frac{1}{2} \Vert \partial _t \rho - \frac{1}{2} \partial _{xx}^2 \log \rho \Vert _{-1}^2 dt, &{} \text { if } \int _{{\mathcal {O}}}| \log \rho (t,x)| dx <\infty , \text { a.e. } t>0,\\ +\infty , &{} \text { otherwise.} \end{array}\right. } \end{aligned}$$
(3.39)

and

$$\begin{aligned} A[\rho _0, \rho _1;T]:=\inf \{ A_T [\rho (\cdot )] : \rho (\cdot ) \in C([0,\infty );{{\mathsf {X}}}), \rho (0)=\rho _0, \rho (T)=\rho _1 \}. \end{aligned}$$
(3.40)

We recall that \(R_\alpha \) is defined in (1.7). We now give regularity results for V(t) and \(R_\alpha \).

Lemma 3.10

For \(h \in C_b({{\mathsf {X}}})\), \(V(t) h\in C_b({{\mathsf {X}}})\) for all \(t \ge 0\) and \(R_\alpha h \in C_b({{\mathsf {X}}})\) for every \(\alpha >0\).

Proof

These claims are consequences of the stability result in Lemma 3.5. The proofs resemble those for standard finite-dimensional control problems; hence we only prove the claim that \(V(t)h \in C_b({{\mathsf {X}}})\).

Let \(\rho _0, \gamma _0 \in {{\mathsf {X}}}\). For any \(\epsilon >0\), there exist an \(\epsilon \)-optimal pair \((\rho (\cdot ), \eta (\cdot )):=(\rho _\epsilon (\cdot ), \eta _\epsilon (\cdot ))\) and a pair \((\gamma (\cdot ), \eta (\cdot )):= (\gamma _\epsilon (\cdot ), \eta _\epsilon (\cdot ))\) with the same control, both satisfying (3.1)–(3.2), with \(\rho (0) =\rho _0\) and \(\gamma (0)=\gamma _0\), such that the contraction estimate (3.22) holds. Consequently,

$$\begin{aligned} V(t) h(\rho _0) - V(t) h(\gamma _0)&\le \epsilon + h(\rho (t)) - h(\gamma (t)) \le \epsilon + \omega _h\big ({{\mathsf {d}}}(\rho (t), \gamma (t))\big ) \\&\le \epsilon + \omega _h({{\mathsf {d}}}(\rho _0, \gamma _0)). \end{aligned}$$

Since \(\epsilon >0\) is arbitrary, it follows that \(V(t) h \in C_b({{\mathsf {X}}})\). \(\quad \square \)

Lemma 3.11

(Nisio semigroup). The family of operators \(\{ V(t): t \ge 0\}\) has the following properties:

  1. (1)

    It forms a nonlinear semigroup on \(C_b({{\mathsf {X}}})\),

    $$\begin{aligned} V(t)V(s) f = V(t+s) f, \quad \forall t, s \ge 0, f \in C_b({{\mathsf {X}}}). \end{aligned}$$
  2. (2)

    The semigroup is a contraction on \(C_b({{\mathsf {X}}})\): for every \(f,g \in C_b({{\mathsf {X}}})\), we have

    $$\begin{aligned} \Vert V(t) f - V(t) g \Vert _{L^\infty ({{\mathsf {X}}})} \le \Vert f - g \Vert _{L^\infty ({{\mathsf {X}}})}. \end{aligned}$$

    In fact,

    $$\begin{aligned} \sup _{{\mathsf {X}}}\big ( V(t) f - V(t) g \big ) \le \sup _{{\mathsf {X}}}\big (f - g \big ). \end{aligned}$$
  3. (3)

    The resolvent is a contraction on \(C_b({{\mathsf {X}}})\): for every \(h_1, h_2 \in C_b({{\mathsf {X}}})\), we have

    $$\begin{aligned} \Vert R_\alpha h_1 - R_\alpha h_2 \Vert _{L^\infty ({{\mathsf {X}}})} \le \Vert h_1 - h_2\Vert _{L^\infty ({{\mathsf {X}}})}. \end{aligned}$$

    Moreover, if \(h_1\) is bounded from above and \(h_2\) satisfies (1.10) and is bounded from below, then

    $$\begin{aligned} \sup _{{\mathsf {X}}}( R_\alpha h_1 - R_\alpha h_2 ) \le \sup _{{\mathsf {X}}}(h_1 - h_2). \end{aligned}$$
    (3.41)

    If \(h_1\) is bounded and \(h_2\) is bounded from above, then

    $$\begin{aligned} \inf _{{\mathsf {X}}}(R_\alpha h_1 - R_\alpha h_2) \ge \inf _{{\mathsf {X}}}(h_1 - h_2). \end{aligned}$$

Proof

The semigroup property follows from standard dynamic programming arguments. The contraction properties follow from an \(\epsilon \)-optimal control argument applied to the definition of V(t) in (3.38), similar to the proof of the previous lemma. \(\quad \square \)

Lemma 3.12

Let \(\rho _0 \in {{\mathsf {X}}}\). We have

$$\begin{aligned} V(t) f(\rho _0) = \sup _{\rho _1 \in {{\mathsf {X}}}} \Big \{ f(\rho _1) - A[\rho _0, \rho _1; t] \Big \}, \quad \text {for all } f \text { with } \sup _{{\mathsf {X}}}f<\infty . \end{aligned}$$

If in addition \(f \in C_b({{\mathsf {X}}})\), then there exists an optimal curve \(\rho ^t(\cdot ) \in C([0,t]; {{\mathsf {X}}})\), or equivalently a pair \((\rho ^t(\cdot ), \eta (\cdot ))\) satisfying (3.1)–(3.2) in the sense of Definition 3.1, with \(\rho ^t(0) =\rho _0\) and satisfying the variational inequality (3.7), such that

$$\begin{aligned} V(t) f(\rho _0) = f(\rho ^t(t)) - A_t[\rho ^t(\cdot )] = f(\rho ^t(t)) - \int _0^t \int _{{\mathcal {O}}} \frac{1}{2} |\eta (r,x)|^2 dx dr. \end{aligned}$$
(3.42)

Proof

By the definition of V(t)f in (3.38), we can find a sequence of \((\rho _n(\cdot ), \eta _n(\cdot ))\) satisfying (3.1) in the weak sense of Definition 3.1 with \(\rho _n(0) =\rho _0\) and such that

$$\begin{aligned} V(t) f(\rho _0) = \lim _{n \rightarrow \infty } \Big ( f(\rho _n(t)) - \int _0^t \int _{{\mathcal {O}}}\frac{1}{2} |\eta _n(r,x)|^2 dxdr \Big ). \end{aligned}$$

We note that \(V(t) 0 = 0\). The contraction property in Lemma 3.11 gives \(\Vert V(t) f \Vert _{L^\infty ({{\mathsf {X}}})} \le \Vert f \Vert _{L^\infty ({{\mathsf {X}}})}\), which in turn implies that

$$\begin{aligned} C:=\sup _n \int _0^t \int _{{\mathcal {O}}} |\eta _n(r,x)|^2 dxdr <\infty . \end{aligned}$$
(3.43)

Therefore, along a subsequence, there exists an \(\eta \) such that \(\eta _n \rightharpoonup \eta \) weakly in \(L^2((0,T) \times {\mathcal {O}})\). This implies in particular

$$\begin{aligned} \int _0^t \int _{{\mathcal {O}}} |\eta (r,x)|^2 dxdr \le \liminf _{n \rightarrow \infty } \int _0^t \int _{{\mathcal {O}}} |\eta _n(r,x)|^2 dxdr. \end{aligned}$$
(3.44)
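The inequality (3.44) can be strict: highly oscillatory controls converge weakly to zero while their \(L^2\) norms stay bounded away from zero. A minimal numerical illustration (not part of the proof; the grid and test function are arbitrary choices):

```python
import numpy as np

# Oscillating controls eta_n(x) = sin(2*pi*n*x) on the circle O = [0, 1).
x = np.linspace(0.0, 1.0, 20000, endpoint=False)
dx = x[1] - x[0]
phi = np.exp(np.cos(2 * np.pi * x))  # a fixed smooth test function

for n in [1, 10, 100]:
    eta_n = np.sin(2 * np.pi * n * x)
    pairing = np.sum(eta_n * phi) * dx  # <eta_n, phi> -> 0: weak limit is 0
    energy = np.sum(eta_n ** 2) * dx    # ||eta_n||_{L^2}^2 -> 1/2
    print(f"n={n:4d}  <eta_n,phi>={pairing:+.5f}  ||eta_n||^2={energy:.5f}")

# liminf ||eta_n||^2 = 1/2 > 0 = ||weak limit||^2, so (3.44) can be strict.
```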

Therefore, it suffices to show that \(\{ \rho _n(\cdot ) : n =1,2,\ldots \}\) is relatively compact in \(C([0,\infty ); {{\mathsf {X}}})\), and that any limit point \(\rho _\infty (\cdot )\) of a convergent subsequence satisfies the regularity estimates (3.3)–(3.5).

Since \(({{\mathsf {X}}}, {{\mathsf {d}}})\) is a compact metric space, the relative compactness of the curves \(\{ \rho _n(\cdot ) : n=1,2,\ldots \}\) follows from a uniform modulus of continuity, which we now establish. First, for each n, (3.7) implies that

$$\begin{aligned} \sup _n \sup _{0\le t \le T} \Vert \rho _n(t)\Vert _{-1} <\infty . \end{aligned}$$

Second, by (3.9), this implies that

$$\begin{aligned} \sup _n S(\rho _n(t))< \frac{M_1}{t}<\infty , \end{aligned}$$

where \(M_1>0\) is a finite constant. Using this, we obtain from (3.7) the estimate

$$\begin{aligned} \Vert \rho _n(t) - \rho _n(s)\Vert _{-1}^2 \le \frac{M}{s} (t-s) + M_2 \int _s^t \Vert \eta _n(r) \Vert _{L^2} dr, \quad \forall 0<s<t. \end{aligned}$$

Hence there exists a modulus \(\omega _1 : [0, \infty ) \mapsto [0,\infty )\) with \(\omega _1 (0+)=0\) such that

$$\begin{aligned} \Vert \rho _n(t) - \rho _n(s)\Vert _{-1} \le s^{-1} \omega _{1} (|t-s|), \quad \forall 0<s <t. \end{aligned}$$

Third, we derive a short-time estimate of a uniform modulus of continuity. From the weak solution property coupled with the a priori estimates in the definition of a solution in Definition 3.1, we can derive a variant of (3.7) by approximation arguments:

$$\begin{aligned}&\frac{1}{2} \Vert \rho _n(t) - \gamma _0 \Vert _{-1}^2 - \frac{1}{2} \Vert \rho _n(s) - \gamma _0 \Vert _{-1}^2 \\&\quad = \int _s^t \big ( \frac{1}{2} \langle \rho _n(r)-\gamma _0, - \log \rho _n(r) \rangle - \langle \partial _x (-\Delta _{xx}^2)^{-1} (\rho _n(r) - \gamma _0), \eta \rangle \big ) dr \\&\quad \le \int _s^t \big ( \frac{1}{2} \langle \rho _n(r)-\gamma _0, - \log \gamma _0 \rangle - \Vert \rho _n(r) - \gamma _0\Vert _{-1} \Vert \eta \Vert _{L^2} \big ) dr \\&\quad \le \int _s^t \Vert \rho _n(r)-\gamma _0\Vert _{-1} \big ( \frac{1}{2} \Vert \partial _x \log \gamma _0 \Vert _{L^2} + \Vert \eta \Vert _{L^2} \big ) dr. \end{aligned}$$

In the last two lines above, the first inequality follows from the fact that \(\log \) is a monotonically increasing function. The previous estimate implies that

$$\begin{aligned} \Vert \rho _n(t) - \gamma _0 \Vert _{-1} - \Vert \rho _n(s) - \gamma _0 \Vert _{-1} \le \int _s^t \big ( \frac{1}{2} \Vert \partial _x \log \gamma _0 \Vert _{L^2} + \Vert \eta \Vert _{L^2} \big ) dr. \end{aligned}$$

Consequently, for any \(\epsilon >0\), by a density argument we can find a \(\gamma _{0,\epsilon } \in {{\mathsf {X}}}\) such that \(S(\gamma _{0,\epsilon })<\infty \) and \(\Vert \gamma _{0,\epsilon } - \rho _0 \Vert _{-1}<\epsilon \). Taking \(\gamma _0:= \gamma _{0,\epsilon }\) and \(s=0\), we have

$$\begin{aligned} \sup _{0\le r \le t} \Vert \rho _n(r) -\rho _0\Vert _{-1} \le \inf _{\epsilon \in (0,1)} \Big ( 2 \epsilon + \frac{1}{2} \Vert \partial _x \log \gamma _{0,\epsilon } \Vert _{L^2} t + \sqrt{C} \sqrt{t}\Big ) =: \omega _2(t). \end{aligned}$$

The constant C is the one from (3.43). Fourth, by the existence of \(\omega _i, i=1,2\), we conclude the existence of a uniform modulus \(\omega \) such that

$$\begin{aligned} \sup _{0\le r \le t} \Vert \rho _n(r) -\rho _0\Vert _{-1} \le \omega (t). \end{aligned}$$

The entropy estimate (3.3) for \(\rho (\cdot )\) follows from the fact that each \(\rho _n(\cdot )\) satisfies (3.7), and that \(\rho \mapsto S(\rho )\) is lower semicontinuous in \(({{\mathsf {X}}}, {{\mathsf {d}}})\). Similarly, (3.4) holds for \(\rho (\cdot )\) since (3.25) holds for each \(\rho _n\) and \(\rho \mapsto \int _{{\mathcal {O}}} (-\log \rho ) dx\) is lower semicontinuous in \(({{\mathsf {X}}}, {{\mathsf {d}}})\) (see the proof of (3.29)). Finally, (3.5) holds for \(\rho (\cdot )\) because (3.8) holds for each \(\rho _n(\cdot )\). \(\quad \square \)

Lemma 3.13

For every \(\alpha , \beta >0\), we have

$$\begin{aligned} R_\alpha h = R_\beta \Big ( R_\alpha h - \beta \frac{R_\alpha h - h }{\alpha }\Big ), \quad \forall h \in C_b({{\mathsf {X}}}). \end{aligned}$$

Proof

The idea of the proof of Lemma 8.20 of [24] applies in this context. Because of the special structure of the problem here, the relaxed control used there is not necessary.

\(\square \)

Let \(f_0\) and \(H_0f_0\) be as in (1.24)–(1.25). Note that the estimate in (3.7) holds, that \(f_0 - \alpha H_0 f_0\) is lower semicontinuous and bounded from below, and moreover that, for every \(\alpha >0\),

$$\begin{aligned} \int _0^t e^{-\alpha ^{-1}r} \Big ( \frac{(f_0 - \alpha H_0 f_0)(\rho (r))}{\alpha } - \frac{1}{2} \int _{{\mathcal {O}}} |\eta (r,x)|^2 dx \Big ) dr < + \infty \end{aligned}$$

for every \((\rho (\cdot ), \eta (\cdot ))\) solving the controlled PDE (3.1) with (3.2). Therefore

$$\begin{aligned} R_\alpha (f_0 - \alpha H_0 f_0) :{{\mathsf {X}}}\mapsto {{\mathbb {R}}}\cup \{+ \infty \} \end{aligned}$$

is a well-defined function, and it is bounded from below.

Similarly, let \(f_1\) and \(H_1 f_1\) be defined as in (1.26)–(1.27). Since \({{\mathsf {X}}}\) is compact, \(f_1 - \alpha H_1 f_1\) is bounded from above; it is also upper semicontinuous. Hence

$$\begin{aligned} R_\alpha (f_1 - \alpha H_1 f_1) :{{\mathsf {X}}}\mapsto {{\mathbb {R}}}\cup \{-\infty \} \end{aligned}$$

is a well defined function and it is bounded from above.

With these estimates, we prove a variant of Lemma 8.19 in Feng and Kurtz [24].

Lemma 3.14

For every \(\alpha >0\),

$$\begin{aligned} R_\alpha (f_0 - \alpha H_0 f_0)&\le f_0, \end{aligned}$$
(3.45)
$$\begin{aligned} R_\alpha (f_1 - \alpha H_1 f_1)&\ge f_1. \end{aligned}$$
(3.46)

Proof

We note that, for every \(\eta \in L^2({\mathcal {O}})\),

$$\begin{aligned}&H_0 f_0(\rho ) + L(\rho , \eta ) \\&\quad = \frac{k}{2} \big (S(\gamma ) -S(\rho )\big ) + \sup _{{\hat{\eta }} \in L^2({\mathcal {O}})} \Big ( \langle -k \partial _x (-\partial _{xx}^2)^{-1}(\rho -\gamma ), {\hat{\eta }} \rangle -\frac{1}{2} \int _{{\mathcal {O}}} |{\hat{\eta }}|^2 dx \Big ) +\frac{1}{2} \int _{{\mathcal {O}}} |\eta |^2 dx \\&\quad \ge \frac{k}{2} \big (S(\gamma ) -S(\rho )\big ) + \langle -k \partial _x (-\partial _{xx}^2)^{-1}(\rho -\gamma ), \eta \rangle , \end{aligned}$$

By the a priori estimate (3.7), for every \((\rho (\cdot ), \eta (\cdot ))\) solving the PDE (3.1) in the sense of Definition 3.1 with (3.2) and the initial condition \(\rho (0) =\rho _0\), we have

$$\begin{aligned} \int _0^t\Big ( H_0 f_0(\rho (r)) + L\big (\rho (r), \eta (r)\big )\Big ) dr \ge f_0(\rho (t)) - f_0(\rho _0), \quad \forall t>0. \end{aligned}$$

Moreover,

$$\begin{aligned}&\int _0^t e^{-\alpha ^{-1}s} \Big ( \frac{\big (f_0 - \alpha H_0 f_0\big )(\rho (s))}{\alpha } -L(\rho (s),\eta (s)) \Big ) ds \\&\quad = \int _0^t \alpha ^{-1} e^{-\alpha ^{-1}s} f_0(\rho (s)) ds -\int _{s=0}^\infty \alpha ^{-1} e^{-\alpha ^{-1} s} \int _{r=0}^{s \wedge t} \big ( H_0 f_0(\rho (r)) + L(\rho (r),\eta (r)) \big ) dr ds \\&\quad \le \int _0^t \alpha ^{-1} e^{-\alpha ^{-1}s} f_0(\rho (s)) ds -\int _0^\infty \alpha ^{-1} e^{-\alpha ^{-1} s} \big ( f_0(\rho (s \wedge t)) - f_0(\rho _0) \big ) ds \\&\quad = f_0(\rho _0) - e^{-\alpha ^{-1} t} f_0(\rho (t)). \end{aligned}$$

In view of (1.7), we conclude (3.45).

Next, we show (3.46). For any \(\gamma \in {{\mathsf {X}}}\) and \(\rho \in {{\mathsf {X}}}\) in the definition of \(f_1:=f_1(\gamma )\) as in (1.26), we define \(\eta := - k \partial _x (-\partial _{xx}^2)^{-1} (\rho - \gamma )\). Then

$$\begin{aligned}&H_1 f_1(\gamma ) + L(\gamma , \eta ) \\&\quad = \frac{k}{2} \big ( S(\gamma ) - S(\rho ) \big ) + \langle -k \partial _x (-\partial _{xx}^2)^{-1}(\rho -\gamma ), \eta \rangle - \frac{1}{2} \int _{{\mathcal {O}}} |\eta (x)|^2 dx + \frac{1}{2} \int _{{\mathcal {O}}} |\eta (x)|^2 dx \\&\quad = \frac{k}{2} \big ( S(\gamma ) - S(\rho ) \big ) + \langle -k \partial _x (-\partial _{xx}^2)^{-1}(\rho -\gamma ), \eta \rangle . \end{aligned}$$

We consider the unique solution \(\gamma := \gamma (t)\) to

$$\begin{aligned} \partial _t \gamma = \frac{1}{2} \partial _{xx}^2 \log \gamma + \partial _x \eta = \frac{1}{2} \partial _{xx}^2 \log \gamma + k (\rho - \gamma ), \quad \gamma (0)=\gamma _0, \end{aligned}$$

where \(\rho \) is such that \(S(\rho )<\infty \). Then, in view of the estimate (3.7) (with the roles of \(\rho \) and \(\gamma \) swapped),

$$\begin{aligned}&\int _{r=0}^{s} \big ( H_1 f_1(\gamma (r)) + L(\gamma (r),\eta (r)) \big ) dr \\&\quad = \int _0^s \Big ( \frac{k}{2} \big ( S(\gamma (r)) - S(\rho ) \big ) + \langle -k \partial _x (-\partial _{xx}^2)^{-1}(\rho -\gamma (r)), \eta (r)\rangle \Big ) dr\\&\quad \le f_1(\gamma (s)) - f_1(\gamma (0)). \end{aligned}$$

Consequently

$$\begin{aligned}&\sup \Big \{ \int _0^t e^{-\alpha ^{-1}s} \Big ( \frac{\big (f_1 - \alpha H_1 f_1\big )({\hat{\rho }}(s))}{\alpha } -L({\hat{\rho }}(s),{\hat{\eta }}(s)) \Big ) ds : \\&\quad ({\hat{\rho }},{\hat{\eta }}) \text { solves }~(3.1)-(3.2), {\hat{\rho }}(0)=\gamma _0 \Big \} \\&\quad \ge \int _0^t e^{-\alpha ^{-1}s} \Big ( \frac{\big (f_1 - \alpha H_1 f_1\big )(\gamma (s))}{\alpha } -L(\gamma (s),\eta (s)) \Big ) ds \\&\quad = \int _0^t \alpha ^{-1} e^{-\alpha ^{-1}s} f_1(\gamma (s)) ds -\int _{s=0}^\infty \alpha ^{-1} e^{-\alpha ^{-1} s} \int _{r=0}^{s \wedge t} \big ( H_1 f_1(\gamma (r)) + L(\gamma (r),\eta (r)) \big ) dr ds \\&\quad \ge \int _0^t \alpha ^{-1} e^{-\alpha ^{-1}s} f_1(\gamma (s)) ds -\int _{s=0}^\infty \alpha ^{-1} e^{-\alpha ^{-1} s} \big (f_1(\gamma (s\wedge t)) - f_1(\gamma _0)\big ) ds\\&\quad = f_1(\gamma _0) - e^{-\alpha ^{-1}t} f_1(\gamma (t)). \end{aligned}$$

Sending \(t \rightarrow \infty \), we conclude the proof of (3.46) and have thus established the lemma. \(\quad \square \)

Lemma 3.15

Let \(\alpha >0\) and \(h \in C_b({{\mathsf {X}}})\), and denote \(f:=R_\alpha h \in C_b({{\mathsf {X}}})\) (Lemma 3.10). Then f is a viscosity sub-solution to (1.28) with \(h_0\) replaced by h, and a viscosity super-solution to (1.29) with \(h_1\) replaced by h.

Proof

The proof follows that of part (a) of Theorem 8.27 in Feng and Kurtz [24], using Lemma 3.13. We only give details for the sub-solution case. The conditions on \(H_0 f_0\) are different from those imposed on \(\mathbf{H}_{\dagger }\) in [24]. However, in view of the improved contraction estimate (3.41), and because \(f_0 - \beta H_0 f_0\) satisfies (1.10) (see the a priori estimate (3.7)), the proof can be repeated almost verbatim.

Let \(f_0, H_0 f_0\) be defined as in (1.24), (1.25). Then \(f_0\) is bounded from below and for every \(\beta >0\)

$$\begin{aligned} \sup _{{\mathsf {X}}}(f - f_0)&= \sup _{{\mathsf {X}}}(R_\alpha h - f_0) \\&\le \sup _{{\mathsf {X}}}\Big (R_\beta \big (R_\alpha h - \beta \alpha ^{-1}(R_\alpha h - h )\big ) - R_\beta \big ( f_0 - \beta H_0 f_0\big )\Big ) \\&\le \sup _{{\mathsf {X}}}\Big ( \big ( R_\alpha h - \beta \alpha ^{-1}(R_\alpha h - h )\big ) - \big ( f_0 - \beta H_0 f_0\big )\Big ) \\&= \sup _{{\mathsf {X}}}\Big ( f - f_0 - \beta \big ( \frac{f- h}{\alpha } - H_0 f_0\big )\Big ). \end{aligned}$$

In this estimate, the first inequality follows from Lemmas 3.13 and 3.14; the second follows from (3.41). Since \(\beta >0\) is arbitrary, the sub-solution property follows from Lemma 7.8 of [24]. Note that \(f \in C_b({{\mathsf {X}}})\), \(f_0 \in LSC({{\mathsf {X}}}; {{\mathbb {R}}}\cup \{+\infty \})\) and \(H_0f_0 \in USC({{\mathsf {X}}}; {{\mathbb {R}}}\cup \{-\infty \})\). \(\quad \square \)

In view of the comparison principle we established in Theorem 2.1, the above result allows us to conclude that Theorem 1.2 holds.

Finally, we link the operator \(R_\alpha \) with the semigroup V by a product formula.

Lemma 3.16

Let \(h \in C_b({{\mathsf {X}}})\), then

$$\begin{aligned} V(t) h(\rho _0) = \lim _{n \rightarrow \infty } R_{n^{-1}}^{[nt]} h(\rho _0), \quad \forall \rho _0 \in {{\mathsf {X}}}. \end{aligned}$$

Proof

The proof of Lemma 8.18 in [24] applies here. The use of a relaxed control argument is not required in the current context. \(\quad \square \)

4 An Informal Derivation of the Hamiltonian H from Stochastic Particles

In this section, we outline a non-rigorous derivation of the Hamiltonian (1.1). A rigorous version of the theory requires significant additional work and will be presented elsewhere. Here, we restrict ourselves to establishing a detailed but heuristic picture.

To explain our program in a nutshell, we start with a system of interacting stochastic particles which has been used as a simplified toy model for gas dynamics. This model leads to the Carleman equation as its kinetic limit. Kurtz [35] obtained the hydrodynamic limit of this model, a nonlinear diffusion equation, in 1973, followed by work of McKean [38]. We will also use ideas of Lions and Toscani [37] from their work on this model. While these references are concerned with the hydrodynamic limit, we are interested more broadly in fluctuations around this limit (which includes the hydrodynamic limit as minimiser of the rate functional).

The system is a high-dimensional Markov process. In the hydrodynamic limit, the macroscopic particle density is described by a probability measure on \({\mathcal {O}}\) satisfying a nonlinear diffusion equation. We aim to characterize both the limit and the fluctuations around it through an effective action minimization theory formulated as a path integral. Probabilistic large deviation theory gives us a mathematical framework for making this rigorous.

Following a method developed by Feng and Kurtz [24], we establish the large deviations by studying the convergence of a sequence of Hamiltonians derived from the underlying Markov processes. A critical step in the program is to prove comparison principles for the limiting Hamiltonian. This is the motivation for the results presented in earlier sections of this paper. Another critical step is the derivation of the limit Hamiltonian, which we present now informally. The main technique involved is a singular perturbation method generalized to a setting of nonlinear PDEs in the space of probability measures.

4.1 Carleman equations, mean-field version

We now describe the particle model studied by Kurtz, McKean, and Lions and Toscani [35, 37, 38]. On the unit circle \({\mathcal {O}}\), we are given a fictitious gas consisting of particles with two velocities. The first particle type moves in the positive x-direction and the second particle type in the negative direction, both with the same speed (in modulus) \(c>0\). Let \(w_1(t,x)\) be the density of the first particle type at time t and location x, and \(w_2(t,x)\) the density of the second type. When particles collide, reactions occur if the types are the same; otherwise, particles move freely as if nothing happened. The reaction happens at a rate \(k>0\) and the mechanism is simple: both particles switch to the opposite type. At a mean-field level, we can express the above description in terms of a system of PDEs known as the Carleman equation

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t w_1 + c \partial _x w_1 = k(w_2^2-w_1^2), \\ \partial _t w_2 - c \partial _x w_2 = k (w_1^2-w_2^2). \end{array}\right. } \end{aligned}$$
(4.1)

Following Lions and Toscani [37], we introduce the total mass density variable \(\rho \) and the flux variable j:

$$\begin{aligned} \rho :=w_1+w_2, \quad j:=c(w_1-w_2). \end{aligned}$$
(4.2)

Then

$$\begin{aligned} {\left\{ \begin{array}{ll} \partial _t \rho + \partial _x j =0, \\ \partial _t j + c^2 \partial _x \rho = - 2 k \rho j. \end{array}\right. } \end{aligned}$$
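The change of variables is elementary: adding the two equations in (4.1), and taking c times their difference, gives the system above, using the factorization \(w_2^2 - w_1^2 = -(w_1+w_2)(w_1-w_2) = -\rho j / c\):

```latex
\[
\begin{aligned}
  \partial_t (w_1 + w_2) + c\, \partial_x (w_1 - w_2) &= 0
  &&\Longrightarrow\quad \partial_t \rho + \partial_x j = 0, \\
  c\, \partial_t (w_1 - w_2) + c^2\, \partial_x (w_1 + w_2) &= 2kc\, (w_2^2 - w_1^2)
  &&\Longrightarrow\quad \partial_t j + c^2 \partial_x \rho = -2k \rho j .
\end{aligned}
\]
```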

We consider a hydrodynamic rescaling of the system by setting

$$\begin{aligned} c=\epsilon ^{-1}, \quad k=\epsilon ^{-2}, \quad \text { where } \epsilon \rightarrow 0. \end{aligned}$$

Then

$$\begin{aligned} \partial _t \rho + \partial _x j&= 0, \\ \epsilon ^2 \partial _t j + \partial _x \rho&= - 2 \rho j. \end{aligned}$$

The flux variable j quickly equilibrates as \(\epsilon \rightarrow 0\) to an invariant set indexed by the slow variable \(\rho \):

$$\begin{aligned} \partial _x \rho + 2 \rho j = 0. \end{aligned}$$

This very explicit density-flux relation enables us to close the description using the \(\rho \)-variable only, giving a nonlinear diffusion equation

$$\begin{aligned} \partial _t \rho = \frac{1}{2} \partial _x \Big ( \frac{\partial _x \rho }{\rho } \Big ). \end{aligned}$$
(4.3)
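The closure can be made explicit: solving the equilibrium relation for j and substituting into the conservation law gives (4.3), which can equivalently be written with the \(\frac{1}{2}\partial _{xx}^2 \log \rho \) operator used throughout Section 3:

```latex
\[
  j = -\frac{\partial_x \rho}{2\rho}
  \qquad\Longrightarrow\qquad
  \partial_t \rho = -\partial_x j
  = \frac{1}{2}\, \partial_x \Big( \frac{\partial_x \rho}{\rho} \Big)
  = \frac{1}{2}\, \partial_{xx}^2 \log \rho .
\]
```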

The first rigorous derivation of (4.3) as a limit of (4.1) was given by Kurtz [35] in 1973 under suitable assumptions on the initial data. McKean [38] improved the result by giving a different and more elementary proof. The change of coordinates to the pair \((\rho , j)\) in Lions and Toscani [37] appeared later but makes the two-scale nature of the problem much more transparent.
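As a numerical sanity check on the limit equation (4.3), one can integrate it directly in the equivalent form \(\partial _t \rho = \frac{1}{2} \partial _{xx}^2 \log \rho \) on a periodic grid; the explicit scheme below (grid size, time step and initial datum are arbitrary choices, not taken from the references) verifies mass conservation and relaxation toward the constant profile:

```python
import numpy as np

# Explicit finite-difference sketch of d_t rho = (1/2) d_xx log(rho)
# on the periodic unit circle; all discretization parameters are ad hoc.
M = 64
x = np.linspace(0.0, 1.0, M, endpoint=False)
dx = 1.0 / M
dt = 5e-5                                  # small enough for stability
rho = 1.0 + 0.5 * np.cos(2 * np.pi * x)    # positive initial density
mass0 = rho.sum() * dx                     # total mass (should be conserved)

for _ in range(6000):                      # evolve to T = 0.3
    u = np.log(rho)
    lap = (np.roll(u, -1) - 2 * u + np.roll(u, 1)) / dx ** 2
    rho = rho + 0.5 * dt * lap

print("mass drift:", abs(rho.sum() * dx - mass0))
print("max deviation from constant profile:", np.abs(rho - mass0).max())
```

The periodic second difference sums to zero exactly, so the scheme conserves mass up to floating-point roundoff, mirroring the conservation-law structure \(\partial _t \rho = -\partial _x j\).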

4.2 A microscopically defined stochastic Carleman particle model

The Carleman equation (4.1) is a mean-field model without any fluctuation. We go beyond the mean-field model by adding more details. One way of doing so would be to introduce explicitly a Lagrangian action, so that the Carleman dynamics (4.1) appears as a critical point or minimizer in the space of curves. We will, however, pursue a different, implicit approach by introducing the action probabilistically through underlying stochastic particle dynamics. There is more than one possible choice for such a model. However, all choices should have the following properties: first, such a model should give the Carleman equation in the large particle number limit; second, the action should appear implicitly as the likelihood of seeing a curve in the space of curves. That is, the higher the action, the less likely the curve is to be observed. This action can be defined through a limit theorem as several parameters get rescaled (particle number, transport speed and reaction speed), as in (1.12). The precise language to be used here is that of large deviations. Caprino, De Masi, Presutti and Pulvirenti [3] considered such a stochastic particle model and studied its law of large numbers limit. We now study the large deviations using a slight variation of their model.

We denote the phase space variable of an N-particle system

$$\begin{aligned} (\mathbf{x,v }):=\Big ( (x_1,v_1), \ldots , (x_N, v_N)\Big ), \qquad (x_i, v_i) \in {{\mathcal {O}}} \times {{\mathbb {R}}}\end{aligned}$$

and define an operator \(\Phi _{ij}\) in the phase space

$$\begin{aligned} \Phi _{ij} (\mathbf{x,v}) :=\Big ( ( x_1, v_1), \ldots , (x_i, - v_i), \ldots , (x_j, -v_j), \ldots , (x_N,v_N)\Big ), \quad i\ne j. \end{aligned}$$
(4.4)

For \(f:=f(\mathbf{x,v})\) and \(i \ne j\), with a slight abuse of notation, we denote

$$\begin{aligned} (\Phi _{ij}f)(\mathbf{x,v}):= & {} f\big ( \Phi _{ij} (\mathbf{x,v}) \big ):= f(x_1, v_1; \ldots ; x_i, - v_i; \ldots , x_j, -v_j; \ldots , x_N,v_N) . \end{aligned}$$
(4.5)

To model nearest neighbor interaction, we introduce a standard non-negative symmetric mollifier \({\hat{J}} \in C^\infty ((-1,1);{{\mathbb {R}}}_+)\) with \(\int _{x \in (-1,1)} {\hat{J}}(x) dx =1\), \({\hat{J}}(0)>0\). We denote

$$\begin{aligned} {\hat{J}}_\theta (x) := \theta ^{-1}{\hat{J}}(\frac{x}{\theta }), \quad J_\theta (x) := \sum _{k \in {{\mathbb {Z}}}} {\hat{J}}_\theta (x+k), \end{aligned}$$

and

$$\begin{aligned} \chi&:=\chi _N(x,y; v,u) := J_{\theta _N}\big (x-y\big ) \delta _{v}(u). \end{aligned}$$
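As a quick check that the periodization \(J_\theta \) is a probability density on the circle, here is a numerical sketch; the specific bump profile is an illustrative choice satisfying the stated requirements (smooth, symmetric, supported in \((-1,1)\), unit mass, positive at 0):

```python
import numpy as np

# A standard smooth bump supported in (-1, 1), to play the role of hat_J.
def hat_J(y):
    out = np.zeros_like(y)
    inside = np.abs(y) < 1.0
    out[inside] = np.exp(-1.0 / (1.0 - y[inside] ** 2))
    return out

# Normalize hat_J to unit integral over (-1, 1).
g = np.linspace(-1.0, 1.0, 100001)
Z = hat_J(g).sum() * (g[1] - g[0])

def J_theta(x, theta):
    # J_theta(x) = sum_k theta^{-1} hat_J((x + k)/theta); for theta < 1/2
    # only k in {-1, 0, 1} can contribute on [0, 1).
    s = np.zeros_like(x)
    for k in (-1.0, 0.0, 1.0):
        s += hat_J((x + k) / theta) / (theta * Z)
    return s

theta = 0.1
xs = np.linspace(0.0, 1.0, 20000, endpoint=False)
total = J_theta(xs, theta).sum() * (xs[1] - xs[0])
print("integral of J_theta over the circle:", total)  # close to 1
```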

Let \(\theta := \theta _N \rightarrow 0\) slowly, with \(N \theta _N \rightarrow \infty \), and let \(\tau := \tau _N \rightarrow 0\). We now describe the modification of the model studied by Caprino, De Masi, Presutti and Pulvirenti [3]. We consider a Markov process on the state space \(\big ( {{\mathcal {O}}} \times \{-1, +1\}\big )^N\) given by the generator

$$\begin{aligned} B_Nf(\mathbf{x,v})&:= c \sum _{i=1}^N v_i \partial _{x_i} f + \tau \sum _{i=1}^N \partial _{x_i}^2 f + \frac{k}{2N} \sum _{\begin{array}{c} i,j=1\\ i \ne j \end{array}}^N \chi _N(x_i, x_j; v_i,v_j) \Big ( \Phi _{ij} f(\mathbf{x, v}) - f(\mathbf{x,v}) \Big ) \\&= c \sum _{i=1}^N v_i \partial _{x_i} f + \tau \sum _{i=1}^N \partial _{x_i}^2 f + \frac{k}{2N} \sum _{\begin{array}{c} i,j=1\\ i \ne j \end{array}}^N J_{\theta _N}\big (x_i -x_j\big ) \delta _{v_i}(v_j)\Big ( \Phi _{ij} f(\mathbf{x, v}) - f(\mathbf{x,v}) \Big ). \end{aligned}$$

From a formal point of view, the parameter \(\tau \) is unnecessary. However, it is essential for obtaining useful a priori estimates which allow an analysis of the limit passage. It was introduced in [3] to avoid a paradoxical feature observed by Uchiyama [42] in the case of the Broadwell equations: particles at the same location cannot be separated by the dynamics. Hence, the kinetic limit \(N \rightarrow \infty \) of the stochastic model without the \(\tau \)-term does not converge to the Carleman equation as expected from formal computations. We refer the reader to page 628 and Section 4 of [3] for more information on this point.
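To make the jump mechanism concrete, here is a toy time-discretized simulation of the dynamics generated by \(B_N\): free transport at speed c, a small diffusion \(\tau \), and velocity-flip collisions for same-velocity pairs at rate \((k/N) J_{\theta }(x_i - x_j)\). All parameter values, the time step, and the triangular stand-in for the smooth kernel \(J_{\theta }\) are illustrative choices, not taken from [3]:

```python
import numpy as np

rng = np.random.default_rng(0)
N, c, k, tau, theta, dt = 200, 1.0, 5.0, 0.01, 0.1, 1e-3

x = rng.random(N)                    # positions on the circle [0, 1)
v = rng.choice([-1.0, 1.0], size=N)  # velocities +1 or -1

def J_theta(d):
    # periodized triangular kernel of width theta (stand-in for smooth J)
    d = np.abs((d + 0.5) % 1.0 - 0.5)   # periodic distance on the circle
    return np.maximum(1.0 - d / theta, 0.0) / theta

for _ in range(100):
    # transport plus Brownian perturbation (the tau-term of the generator)
    x = (x + c * v * dt + np.sqrt(2.0 * tau * dt) * rng.standard_normal(N)) % 1.0
    # pairwise collision rates; only pairs with equal velocities interact
    same = (v[:, None] == v[None, :]) & ~np.eye(N, dtype=bool)
    rates = (k / N) * J_theta(x[:, None] - x[None, :]) * same
    flips = np.triu(rng.random((N, N)) < rates * dt)  # each unordered pair once
    for i, j in zip(*np.nonzero(flips)):
        if v[i] == v[j]:             # the pair may already have been flipped
            v[i], v[j] = -v[i], -v[j]

print("up / down counts:", int((v > 0).sum()), int((v < 0).sum()))
```

A collision sends a \((+,+)\) pair to \((-,-)\) and vice versa, which is exactly the action of \(\Phi _{ij}\) in (4.4) restricted to \(\{\pm 1\}\)-valued velocities.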

Let \((\mathbf{X, V}):= \big ( (X_1, V_1), \ldots , (X_N, V_N)\big )\) be the Markov process defined by the generator \(B_N\). Moreover, we denote the one-particle-marginal density

$$\begin{aligned} \mu _N(dx,v;t):=P(X_i(t) \in dx, V_i(t) =v). \end{aligned}$$

Exploiting propagation of chaos through the BBGKY hierarchy, the authors of [3] proved that, as \(N \rightarrow \infty \), \(\mu _N\) has a (kinetic) limit \(\mu := \mu (dx,v; t):= \mu (x,v;t) dx\) satisfying

$$\begin{aligned} \partial _t \mu + c v \cdot \partial _x \mu =k\Big ( \mu ^2(x, -v;t) - \mu ^2(x,v;t)\Big ), \quad \mu (0) = \mu _0. \end{aligned}$$

This is the Carleman system (4.1) if we take \(w_1(t,x)=\mu (x,+1;t)\) and \(w_2(t,x):=\mu (x,-1;t)\).

In order to understand the large deviation behavior, following [24], we compute the following nonlinear operator

$$\begin{aligned} H_B f(\mathbf{x,v})&:= e^{- f} B_N e^{f}(\mathbf{x,v}) \\&= c \sum _{i=1}^N v_i \partial _{x_i} f+ \tau \Big ( \sum _{i=1}^N \partial _{x_i}^2 f+ \sum _{i=1}^N |\partial _{x_i}f|^2 \Big ) \\&\quad + \frac{k}{2N} \sum _{\begin{array}{c} i,j=1\\ i \ne j \end{array}}^N J_{\theta _N}\big (x_i-x_j\big ) \delta _{v_i}\big (v_j\big ) \Big ( e^{\Phi _{ij}f - f} -1 \Big ). \end{aligned}$$

We define the empirical probability measure

$$\begin{aligned} \mu (dx,dv):= \mu _N\big (dx,dv\big ):= \frac{1}{N} \sum _{i=1}^N \delta _{(x_i, v_i)}(dx, dv), \end{aligned}$$
(4.6)

and choose a class of test functions which are symmetric under particle permutations,

$$\begin{aligned} f(\mathbf{x,v})&:= f(\mu ):= \psi ( \langle \varphi _1,\mu _N\rangle , \ldots , \langle \varphi _M,\mu _N\rangle ) \\&= \psi \Big ( \frac{1}{N} \sum _{k=1}^N \varphi _1(x_k,v_k), \ldots , \frac{1}{N} \sum _{k=1}^N \varphi _M(x_k,v_k) \Big ). \end{aligned}$$

The test function f can be abstractly thought of as a function on the space of probability measures, with a typical element denoted \(\mu \); hence the notation \(f(\mu )\). In the following, we use the traditional notation for a functional derivative,

$$\begin{aligned} \frac{\delta f}{\delta \mu }(x,v) := \sum _{l=1}^M \partial _l \psi ( \langle \varphi _1,\mu \rangle , \ldots , \langle \varphi _M,\mu \rangle ) \varphi _l(x,v), \quad \forall (x,v) \in {{\mathcal {O}}} \times \{ \pm 1\}. \end{aligned}$$
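Since the pairings \(\langle \varphi _l, \mu \rangle \) are affine in \(\mu \), this definition coincides with the Gateaux derivative of f along point-mass perturbations, \(\frac{d}{dt} f(\mu + t \delta _{(x,v)})\big |_{t=0}\). A pure-Python sketch verifying this by finite differences (the choices of \(\psi \) and \(\varphi _l\) below are illustrative, not from the text):

```python
import math

# f(mu) = psi(<phi1,mu>, <phi2,mu>) for an empirical measure mu.
# We check  d/dt f(mu + t*delta_{(x,v)}) |_{t=0}
#         = sum_l d_l psi(...) * phi_l(x,v)
# by central finite differences. psi, phi1, phi2 are arbitrary choices.
psi   = lambda a, b: a*a + math.sin(b)
d1psi = lambda a, b: 2*a
d2psi = lambda a, b: math.cos(b)
phi1  = lambda x, v: math.cos(2*math.pi*x) * v
phi2  = lambda x, v: x*x + v

particles = [(0.1, 1), (0.37, -1), (0.62, 1), (0.9, -1)]
N = len(particles)
pair = lambda phi, extra: sum(phi(x, v) for x, v in particles)/N + extra

def f(t, x, v):
    # f evaluated at mu + t*delta_{(x,v)}; each pairing is affine in t
    return psi(pair(phi1, t*phi1(x, v)), pair(phi2, t*phi2(x, v)))

x, v = 0.2, -1
h = 1e-6
fd = (f(h, x, v) - f(-h, x, v)) / (2*h)          # numerical derivative
a, b = pair(phi1, 0.0), pair(phi2, 0.0)
formula = d1psi(a, b)*phi1(x, v) + d2psi(a, b)*phi2(x, v)
print(abs(fd - formula))   # agreement up to O(h^2)
```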

We define a collision operator C, mapping a function \(\varphi =\varphi (x,v)\) of the two variables \((x,v)\) into a function \(C \varphi \) of the four variables \((x,v, x_*, v_*)\), as follows:

$$\begin{aligned} \big ( C \varphi \big ) \big ( (x,v);(x_*,v_*) \big ):= \varphi (x,-v) - \varphi (x,v) + \varphi (x_*, -v_*) - \varphi (x_*, v_*). \end{aligned}$$

For a measure \(\nu \) on \({\mathcal {O}}\) and \(\theta >0\), we define its mollification

$$\begin{aligned} (J_\theta * \nu ) (y) := \int _{z \in {\mathcal {O}}} J_\theta (y-z) \nu (d z) = \sum _{k \in {{\mathbb {Z}}}} \int _{z \in {\mathcal {O}}} {\hat{J}}_\theta (y - z+k) \nu (dz). \end{aligned}$$

Then, direct computation leads to the estimate

$$\begin{aligned} H_Nf(\mu )&:=\frac{1}{N} e^{-Nf} B_N e^{Nf}(\mathbf{x,v}) = N^{-1} H_B (Nf)(\mathbf{x,v}) \\&=c \langle -\big ( v \partial _x \mu \big ), \frac{\delta f}{\delta \mu } \rangle + \frac{k}{2} \frac{1}{N^2} \sum _{\begin{array}{c} i,j=1\\ i \ne j \end{array}}^N J_{\theta _N}\big (x_i-x_j\big ) \delta _{v_j}(v_i) \Big (e^{C \frac{\delta f}{\delta \mu }(x_i,v_i;x_j,v_j)}-1 \Big ) + o_N(1) \\&= c \langle -\big ( v \partial _x \mu \big ), \frac{\delta f}{\delta \mu } \rangle + \frac{k}{2} \sum _{v=+1,-1} \int _{x \in {\mathcal {O}}} (e^{C \frac{\delta f}{\delta \mu }(x,v;x,v)}-1 ) (J_{\theta _N}*\mu ) (x,v) \mu (dx,v) + o_N(1). \end{aligned}$$

In the last line above, we invoked the condition \(N \theta _N \rightarrow +\infty \) to ensure that the diagonal terms \(i=j\) have a negligible effect on the overall convergence. Assuming \(\mu _N \rightarrow \mu \) in the narrow topology, with a limit of the form \(\mu (dx;v) = \mu (x,v) dx\), we then have

$$\begin{aligned} H_Nf(\mu _N) \rightarrow c \langle -\big ( v \partial _x \mu \big ), \frac{\delta f}{\delta \mu } \rangle + \frac{k}{2} \sum _{v=+1,-1} \int _{x \in {\mathcal {O}}} (e^{C \frac{\delta f}{\delta \mu }(x,v;x,v)}-1 ) \mu ^2(x,v) dx. \end{aligned}$$

4.3 Large deviation from the hydrodynamic limit

We now consider the hydrodynamic scaling by taking \(c:=\epsilon ^{-1}\) and \(k:= \epsilon ^{-2}\), together with \(N:=N(\epsilon ) \rightarrow \infty \).

To emphasize the two-scale nature of the problem, we switch to the density-flux coordinates:

$$\begin{aligned} \rho (dx) := \sum _{v =+1,-1} \mu (dx,v), \quad j(dx) :=\epsilon ^{-1} \sum _{v=\pm 1} v \mu (dx,v). \end{aligned}$$
(4.7)

The calculations in the coming paragraphs will rely heavily on the simple relations:

$$\begin{aligned} \sum _{v=\pm 1} v \mu ^2(x,v) = \mu ^2(x,1) - \mu ^2(x,-1) = \epsilon \rho (x) j(x) \end{aligned}$$
(4.8)

and

$$\begin{aligned} \sum _{v=\pm 1} \mu ^2(x,v) =\frac{1}{2}\big ( \rho ^2(x) + \epsilon ^2 j^2(x)\big ). \end{aligned}$$
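Both relations follow from \(\mu (x,\pm 1) = \tfrac{1}{2}\big (\rho (x) \pm \epsilon j(x)\big )\), which inverts the change of variables (4.7). A quick numerical sanity check (the sample values are arbitrary):

```python
# Check (4.8) and its companion pointwise, writing
#   mu(x,+1) = (rho + eps*j)/2,  mu(x,-1) = (rho - eps*j)/2,
# which inverts the density-flux change of variables (4.7).
for rho, j, eps in [(1.3, -0.7, 0.1), (0.4, 2.0, 0.01), (2.5, 0.0, 0.5)]:
    mu_p = 0.5*(rho + eps*j)   # mu(x,+1)
    mu_m = 0.5*(rho - eps*j)   # mu(x,-1)
    lhs1 = mu_p**2 - mu_m**2                   # sum_v v mu^2(x,v)
    lhs2 = mu_p**2 + mu_m**2                   # sum_v mu^2(x,v)
    assert abs(lhs1 - eps*rho*j) < 1e-12
    assert abs(lhs2 - 0.5*(rho**2 + eps**2 * j**2)) < 1e-12
print("density-flux identities verified")
```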

Let

$$\begin{aligned} {\tilde{\varphi }}_1(x,v):= \varphi _1(x), \quad {\tilde{\varphi }}_2(x,v) := c v \varphi _2(x), \quad \forall \varphi _i \in C^1({\mathcal O}), i=1,2. \end{aligned}$$

Then

$$\begin{aligned} \langle \varphi _1, \rho \rangle = \langle {\tilde{\varphi }}_1, \mu \rangle , \quad \langle \varphi _2, j \rangle = \langle {\tilde{\varphi }}_2,\mu \rangle . \end{aligned}$$

We consider

$$\begin{aligned} f(\mu ):= f(\rho , j) :=\psi \big (\langle \varphi _1, \rho \rangle , \langle \varphi _2, j \rangle \big ) =\psi \big ( \langle {\tilde{\varphi }}_1, \mu \rangle , \langle {\tilde{\varphi }}_2, \mu \rangle \big ), \end{aligned}$$

and then

$$\begin{aligned} \frac{\delta f}{\delta \mu }(x,v) = (\partial _1 \psi ) \varphi _1(x) + (\partial _2 \psi ) cv \varphi _2(x) = \frac{\delta f}{\delta \rho }(x) + \frac{1}{\epsilon } v \frac{\delta f}{\delta j}(x). \end{aligned}$$
(4.9)

More generally, we can consider

$$\begin{aligned} f(\mu )&:= f(\rho , j ) \\&:=\psi \big (\langle \varphi _{1,1}, \rho \rangle , \ldots , \langle \varphi _{1,M}, \rho \rangle ; \langle \varphi _{2,1}, j \rangle , \ldots , \langle \varphi _{2,M}, j \rangle \big ) \\&= \psi \big ( \langle {\tilde{\varphi }}_{1,1}, \mu \rangle , \ldots , \langle {\tilde{\varphi }}_{1,M}, \mu \rangle ; \langle {\tilde{\varphi }}_{2,1}, \mu \rangle , \ldots , \langle {\tilde{\varphi }}_{2,M}, \mu \rangle \big ). \end{aligned}$$

The identity (4.9), relating the derivative with respect to \(\mu \) to those with respect to \(\rho \) and j, still holds:

$$\begin{aligned} \frac{\delta f}{\delta \mu }(x,v)= \frac{\delta f}{\delta \rho }(x) + \frac{1}{\epsilon } v \frac{\delta f}{\delta j}(x) . \end{aligned}$$
(4.10)

From this point on, we write \(H_\epsilon := H_{N(\epsilon )}\) to emphasize the dependence on \(\epsilon \). Then

$$\begin{aligned} \big ( H_\epsilon f \big )(\rho , j) =&\langle \frac{\delta f}{\delta \rho }, - \partial _x j\rangle + \frac{1}{\epsilon ^2} \langle \frac{\delta f}{\delta j}, - \partial _x \rho \rangle \\&\quad + \frac{1}{2\epsilon ^2} \sum _{v=\pm 1} \int _{x \in {\mathcal {O}}} \big ( e^{\frac{1}{\epsilon }(-4v) \frac{\delta f}{\delta j}(x) }-1 \big ) \mu ^2(x,v)dx + o_\epsilon (1). \end{aligned}$$

Following the abstract theorems in Feng and Kurtz [24], if we can derive a limit of the operator \(H_\epsilon \) (which we claim to be the H in (1.1)) and if we can prove the associated comparison principle, then

$$\begin{aligned} \lim _{\delta \rightarrow 0^+} \lim _{\epsilon \rightarrow 0^+} - \epsilon \log P \big ( \rho _\epsilon (\cdot ) \in B(\rho (\cdot ); \delta ) \big ) = A_T[\rho (\cdot )], \quad \forall \rho \in C([0,T]; {{\mathsf {X}}}), \end{aligned}$$

where the action functional \(A_T\) is defined as in (1.4); here \(B(\rho (\cdot );\delta )\) is a ball of size \(\delta \) around \(\rho (\cdot )\) in \(C([0,T]; {{\mathsf {X}}})\) and \({{\mathsf {X}}}={\mathcal {P}}({\mathcal {O}})\) is the metric space specified in the introduction. This will then rigorously justify the formal statement (1.12). We reiterate that the aim of this paper is to establish rigorously the one truly challenging part, the comparison principle (Sect. 2); for the other part, the convergence, we now give heuristic arguments.

4.4 Convergence of Hamiltonian operators, in a singularly perturbed sense

We show that the Hamiltonian H in (1.1) is a formal limit of the sequence of operators \(H_\epsilon \). The identification of H is related to an infinite-dimensional version of the ground state energy problem in (4.15). We now describe three different possible approaches.

We consider a class of perturbed test functions

$$\begin{aligned} f_\epsilon (\rho , j) := f_0(\rho ) + \epsilon ^2 f_1(\rho , j). \end{aligned}$$
(4.11)

It then follows that

$$\begin{aligned} \big ( H_\epsilon f_\epsilon \big ) (\rho , j)&= {{\mathbb {H}}} (\rho , j; \frac{\delta f_0}{\delta \rho }, \frac{\delta f_1}{\delta j}) + o_\epsilon (1), \end{aligned}$$
(4.12)

where

$$\begin{aligned} {{\mathbb {H}}}(\rho ,j; \varphi , \xi )&:= \langle \varphi , - \partial _x j\rangle + \langle \xi , - \partial _x \rho -2 \rho j \rangle + 2 \int _x |\xi |^2 \rho ^2 dx\nonumber \\&= {{\mathsf {H}}}(j,\xi ; \rho ) + {{\mathsf {V}}}(j; \varphi ), \end{aligned}$$
(4.13)

with

$$\begin{aligned} {{\mathsf {H}}}(j, \xi ):= {{\mathsf {H}}}(j,\xi ; \rho ):= \langle \xi , - \partial _x \rho -2 \rho j \rangle +2 \int _x |\xi (x)|^2 \rho ^2(x) dx, \quad \forall \xi \in C_c^\infty ({\mathcal {O}}), \end{aligned}$$
(4.14)

and

$$\begin{aligned} {{\mathsf {V}}}(j):= {{\mathsf {V}}}(j; \varphi ) := \langle \varphi , -\partial _x j \rangle = \langle \partial _x \varphi , j\rangle . \end{aligned}$$

We would like the limit in (4.12) to be independent of the j-variable, and thus want the j-variable to disappear asymptotically. This can be achieved by choosing the test function \(f_1\) suitably.

We introduce perturbed Hamiltonians in the j-variable

$$\begin{aligned} {{\mathsf {H}}}_{{\mathsf {V}}}(j, \xi ) := {{\mathsf {H}}}_{{\mathsf {V}}}(j, \xi ; \rho , \varphi ):= {{\mathsf {H}}}(j, \xi ) + {{\mathsf {V}}}(j). \end{aligned}$$

Then we seek a solution to a stationary Hamilton–Jacobi equation in the j-variable,

$$\begin{aligned} {{\mathsf {H}}}_{{\mathsf {V}}}(j,\frac{\delta f_1}{\delta j}) = H, \end{aligned}$$
(4.15)

where H is a constant in j but may depend on \(\rho \) through \({{{\mathsf {H}}}}(\cdot , \cdot ;\rho )\) and on \(\varphi \) through \({{\mathsf {V}}}(\cdot ;\varphi )\). We denote this dependence as

$$\begin{aligned} H:= H(\rho , \varphi ). \end{aligned}$$

Suppose that we can solve (4.15); then

$$\begin{aligned} H_\epsilon f_\epsilon (\rho , j) = H \big (\rho ; \frac{\delta f_0}{\delta \rho }\big ) + o_\epsilon (1) \end{aligned}$$

and

$$\begin{aligned} \lim _{\epsilon \rightarrow 0^+} H_\epsilon f_\epsilon = H f_0. \end{aligned}$$

Hence we can conclude our program. Next, we identify the Hamiltonian H as the one defined in (1.1) and show that the associated Hamilton–Jacobi equation (1.3) (in the interpretation of Sects. 2 and 3) is solvable.

We now present three different approaches to identify H. We comment that, although we work with a specific model (the Carleman particles) in this paper, our goal has been more ambitious. We would like to explore the scope of applicability of the Hamiltonian operator convergence method in the context of hydrodynamic limits. As this ambition is very general, we aim to present as many ways of verifying the required conditions as possible.

4.5 First approach to identify H—formal weak KAM method in infinite dimensions

In finite dimensions, equations of the type (4.15) have been studied in the weak KAM (Kolmogorov–Arnold–Moser) theory for Hamiltonian dynamical systems. See Fathi [18,19,20], E [12], Evans [13,14,15], Evans and Gomes [16, 17], Fathi and Siconolfi [21, 22] and others; there is an unpublished book of Fathi [23]. The existing literature focuses on finite-dimensional systems, mostly with compactness assumptions on the physical space. Our setting is necessarily very different, as we have an infinite-dimensional non-locally compact state space. In the following, we (formally) apply conclusions of the existing weak KAM theory to arrive at the representation

$$\begin{aligned} H(\rho , \varphi ) = \inf _{f_1}\sup _{j} \Big ( {{\mathsf {H}}}(j,\frac{\delta f_1}{\delta j};\rho ) + {{\mathsf {V}}}(j;\rho ,\varphi )\Big ). \end{aligned}$$
(4.16)

The representation (4.16) can be made more explicit due to a hidden controlled gradient flow structure in \({{\mathsf {H}}}\) (see (4.17)). To present the ideas as clearly as possible, we introduce yet another set of coordinates by considering

$$\begin{aligned} u(x):= \frac{d j}{d \rho }(x), \quad \text {for } \rho \text {-a.e. } x. \end{aligned}$$

Then

$$\begin{aligned} j(dx) = u(x) \rho (dx), \quad \frac{\delta f}{\delta u}(x) = \rho (dx) \frac{\delta f}{\delta j}(x). \end{aligned}$$

This motivates us to introduce new test functions \(\phi := \rho \xi \). Under the new coordinates, we have

$$\begin{aligned} {{\mathsf {V}}}(u; \varphi , \rho )&= \langle \varphi , - \partial _x (\rho u)\rangle , \\ {{\mathsf {H}}}(u,\phi ;\rho )&= \langle \phi , - \partial _x \log \rho - 2 \rho u \rangle + 2 \int _{{\mathcal {O}}} \big | \phi \big |^2 dx. \end{aligned}$$

We define a free energy function

$$\begin{aligned} {{\mathsf {F}}}(u):={{\mathsf {F}}}(u;\rho ) := \frac{1}{4} \int _{{\mathcal {O}}}\big ( \rho u^2 + u(x) \partial _x \log \rho (x)\big ) dx, \end{aligned}$$

so

$$\begin{aligned} \frac{\delta {{\mathsf {F}}}}{\delta u} = \frac{1}{4} ( 2 \rho u + \partial _x \log \rho ), \end{aligned}$$

and

$$\begin{aligned} {{\mathsf {H}}}(u, \frac{\delta f}{\delta u}; \rho )&= - 4 \langle \frac{\delta f}{\delta u}, \frac{\delta {{\mathsf {F}}}}{\delta u} \rangle + 2 \int _{{\mathcal {O}}} |\frac{\delta f}{\delta u}|^2 dx \nonumber \\&= 2 \Big ( \int _{{\mathcal {O}}} \big ( \frac{\delta f}{\delta u}- \frac{\delta {{\mathsf {F}}}}{\delta u}\big )^2 dx - \int _{{\mathcal {O}}} |\frac{\delta {{\mathsf {F}}}}{\delta u}|^2 dx\Big ). \end{aligned}$$
(4.17)

In particular, for any \(\theta \in [0,2]\),

$$\begin{aligned} {{\mathsf {H}}}(\theta {{\mathsf {F}}})(u;\rho ) = {{\mathsf {H}}}(u;\theta \frac{\delta {{\mathsf {F}}}}{\delta u}; \rho ) = -2 \theta (2-\theta ) \int _{{\mathcal {O}}} |\frac{\delta {{\mathsf {F}}}}{\delta u}|^2 dx \le 0 . \end{aligned}$$

This inequality will play an important role in the rigorous justification of the derivation of H in (1.1).
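The completion of the square in (4.17) and the sign of \({{\mathsf {H}}}(\theta {{\mathsf {F}}})\) are statements about quadratic forms, and can be sanity-checked with random finite-dimensional vectors standing in for the \(L^2\) gradients \(\delta f/\delta u\) and \(\delta {{\mathsf {F}}}/\delta u\):

```python
import random

random.seed(1)
dot = lambda a, b: sum(x*y for x, y in zip(a, b))

# Random vectors play the roles of delta f/delta u and delta F/delta u;
# L^2 integrals become dot products.
g  = [random.uniform(-1, 1) for _ in range(50)]   # delta f / delta u
gF = [random.uniform(-1, 1) for _ in range(50)]   # delta F / delta u

# (4.17):  -4<g,gF> + 2|g|^2 = 2(|g - gF|^2 - |gF|^2)
lhs = -4*dot(g, gF) + 2*dot(g, g)
diff = [a - b for a, b in zip(g, gF)]
rhs = 2*(dot(diff, diff) - dot(gF, gF))
assert abs(lhs - rhs) < 1e-10

# H(theta F) = -2*theta*(2-theta)*|gF|^2 <= 0 for theta in [0,2]
for theta in [0.0, 0.5, 1.0, 1.7, 2.0]:
    tg = [theta*b for b in gF]
    val = -4*dot(tg, gF) + 2*dot(tg, tg)
    assert abs(val - (-2*theta*(2-theta)*dot(gF, gF))) < 1e-10
    assert val <= 1e-10
print("controlled gradient flow identities verified")
```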

Putting everything together, (4.16) gives

$$\begin{aligned} H(\rho , \varphi )&= \inf _{f_1}\sup _{u} \Big ( {{\mathsf {H}}}(u,\frac{\delta f_1}{\delta u};\rho ) + {{\mathsf {V}}}(u; \varphi , \rho )\Big ) \\&= \inf _{f_1}\sup _{u} 2 \int _{{\mathcal {O}}} \Big ( \big | \frac{\delta f_1}{\delta u} -\frac{\delta {{\mathsf {F}}}}{\delta u} \big |^2 - \big | \frac{\delta {{\mathsf {F}}}}{\delta u} \big |^2 \Big ) dx + {{\mathsf {V}}}(u;\rho , \varphi ) \\&= \sup _{u} \int _{{\mathcal {O}}} \Big ( \rho u \partial _x \varphi - 2\big | \frac{\delta {{\mathsf {F}}}}{\delta u}\big |^2\Big ) dx \\&= \sup _{u} \int _{{\mathcal {O}}} \Big ( \rho u \partial _x \varphi - \frac{1}{8} |2 \rho u + \partial _x \log \rho |^2 \Big ) dx \\&= \sup _{\eta } \int _{{\mathcal {O}}} \Big ( (\eta -\frac{1}{2}\partial _x \log \rho ) \partial _x \varphi - \frac{1}{2} | \eta |^2 \Big ) dx \\&= -\frac{1}{2} \langle \partial _x \log \rho , \partial _x \varphi \rangle + \frac{1}{2} \int _{{\mathcal {O}}} |\partial _x \varphi |^2 dx. \end{aligned}$$

This is the Hamiltonian we gave in (1.1).
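The last supremum above is a pointwise quadratic maximization and can be verified numerically: on a grid, maximize \(\rho u \,\partial _x \varphi - \frac{1}{8}|2\rho u + \partial _x \log \rho |^2\) over u at each point and compare with the closed form \(\frac{1}{2}|\partial _x \varphi |^2 - \frac{1}{2}(\partial _x \log \rho ) \partial _x \varphi \). The profiles \(\rho \) and \(\varphi \) below are arbitrary smooth choices on the unit torus:

```python
import math

# Pointwise check of
#   sup_u [ rho*u*p - (1/8)|2*rho*u + b|^2 ] = -(1/2)*b*p + (1/2)*p^2,
# with p = dphi/dx and b = d(log rho)/dx, by crude grid search in u.
# rho = 1 + 0.3*sin(2*pi*x) and phi = sin(2*pi*x) are arbitrary choices.
u_grid = [-15 + 0.002*k for k in range(15001)]
for i in range(16):
    x = i / 16
    rho = 1.0 + 0.3*math.sin(2*math.pi*x)
    b   = 0.6*math.pi*math.cos(2*math.pi*x) / rho   # = d/dx log rho
    p   = 2*math.pi*math.cos(2*math.pi*x)           # = d/dx phi
    best = max(rho*u*p - 0.125*(2*rho*u + b)**2 for u in u_grid)
    closed = -0.5*b*p + 0.5*p*p
    assert abs(best - closed) < 1e-4
print("pointwise supremum matches the closed form")
```

Integrating the pointwise identity over x recovers the Hamiltonian H displayed above.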

4.6 A decomposition of the \({{\mathsf {H}}}\) into a family of microscopic ones \(\big \{ {{\mathfrak {h}}}(\cdot ; \alpha ,\beta ) : \alpha , \beta \in {{\mathbb {R}}}\big \}\)

The second and third approaches to identify H involve a subtle argument we are going to explain first.

For the kind of problem we consider, we intuitively expect propagation of chaos to hold. We expect this even at the large deviation/hydrodynamic limit scale. Therefore, the infinite-dimensional Hamiltonian \({{\mathsf {H}}}\) is expected to be representable as a superposition of one-particle-level Hamiltonians, indexed by hydrodynamic parameters describing statistical local equilibrium. This intuition leads to the following arguments.

We define a family of Hamiltonians indexed by \((\alpha ,\beta )\) at the one-particle level,

$$\begin{aligned} {\mathfrak {h}}(\upsilon , p; \alpha , \beta )&:= -(2 \alpha \upsilon + \beta ) p + 2 p^2 , \quad (\upsilon ,p) \in {{\mathbb {R}}}\times {{\mathbb {R}}}, \quad \forall \alpha , \beta \in {{\mathbb {R}}}, \\ {\mathfrak {h}}^P(\upsilon , p;\alpha ,\beta )&:={\mathfrak {h}}(\upsilon , p; \alpha , \beta ) + \alpha P \upsilon , \quad \forall P \in {{\mathbb {R}}}. \end{aligned}$$

We observe that

$$\begin{aligned} {{\mathsf {H}}}(u, \phi ; \rho ) = \int _{{\mathcal {O}}} {\mathfrak {h}} \big ( u(x), \phi (x); \rho (x), \partial _x \log \rho (x) \big ) dx, \end{aligned}$$
(4.18)

and that

$$\begin{aligned} {{\mathsf {H}}}_{{\mathsf {V}}}(u,\phi ;\rho , \varphi ) = \int _{{\mathcal {O}}} {\mathfrak {h}}^{\partial _x \varphi (x)} \big ( u(x), \phi (x); \rho (x), \partial _x \log \rho (x) \big ) dx. \end{aligned}$$

At least formally, if we take

$$\begin{aligned} f_1(u):= \int _{y \in {\mathcal {O}}} \psi \big (u(y);y\big ) dy \end{aligned}$$
(4.19)

and denote \(\partial _1 \psi (\upsilon ;y):= \partial _\upsilon \psi (\upsilon ;y)\), then

$$\begin{aligned} \frac{\delta f_1}{\delta u}(x) = \partial _1 \psi \big (u(x);x\big ), \end{aligned}$$

and

$$\begin{aligned} {{\mathsf {H}}}\Big (u, \frac{\delta f_1 }{\delta u}; \rho \Big )&= \int _{{\mathcal {O}}} {{\mathfrak {h}}} \big (u(x), \partial _1 \psi \big (u(x);x\big ); \rho (x), \partial _x \log \rho (x) \big ) dx, \\ {{\mathsf {H}}}_{{\mathsf {V}}}\Big (u, \frac{\delta f_1 }{\delta u}; \rho , \varphi \Big )&= \int _{{\mathcal {O}}} {{\mathfrak {h}}}^{\partial _x \varphi (x)} \big (u(x), \partial _1 \psi \big (u(x);x\big ); \rho (x), \partial _x \log \rho (x) \big ) dx. \end{aligned}$$

Therefore, in order to solve (4.15), it suffices to solve a family (indexed by \(\alpha \) and \(\beta \)) of finite-dimensional “small cell” problems

$$\begin{aligned} {{\mathfrak {h}}}\big (\upsilon , \partial _\upsilon \psi ; \alpha , \beta \big ) + \alpha P \upsilon = E[P;\alpha ,\beta ], \quad \forall \upsilon \in {{\mathbb {R}}}. \end{aligned}$$
(4.20)

Here, E is a constant in the variable \(\upsilon \). Moreover, if we can solve this finite-dimensional PDE problem, then the constant H in the infinite-dimensional problem (4.15) is given by

$$\begin{aligned} H(\rho , \varphi ) = \int _{{\mathcal {O}}} E[\partial _x \varphi (x); \rho (x), \partial _x \log \rho (x)] dx. \end{aligned}$$

These considerations lead to two more ways of identifying the effective Hamiltonian \(H=H(\rho ,\varphi )\). (We remark that we could present at least one further approach, which exploits the special one-dimensional nature of (4.20) by invoking Maupertuis' principle. We choose not to present this approach, since we are interested in general methodologies that work even when the velocity field u(x) takes values in several dimensions and \(\upsilon \) in (4.20) lives in several dimensions.)

4.7 Second approach to identify H—finite-dimensional weak KAM and the method of equilibrium points

We introduce a microscopic (one-particle level) free energy function

$$\begin{aligned} {\mathfrak {f}}(\upsilon ):= {\mathfrak {f}}(\upsilon ;\alpha ,\beta ):= \frac{1}{4} \big ( \alpha \upsilon ^2 + \beta \upsilon \big ). \end{aligned}$$

The connection with the free energy introduced earlier is that

$$\begin{aligned} {{\mathsf {F}}}(u;\rho ) = \int _{{\mathcal {O}}} {\mathfrak {f}}\big (u(x); \rho (x), \partial _x \log \rho (x)\big ) dx. \end{aligned}$$

It is not surprising that the microscopic Hamiltonians \({\mathfrak h}\) also have controlled gradient flow structures:

$$\begin{aligned} {\mathfrak {h}}(\upsilon , p; \alpha , \beta )&= 4 \Big ( \frac{1}{2} p^2 - p \partial _\upsilon {\mathfrak {f}} \Big ) = 2 \big ( |p - \partial _\upsilon {\mathfrak {f}}|^2 - |\partial _\upsilon {\mathfrak {f}}|^2 \big ) \nonumber \\&= {{\mathfrak {h}}}_{\mathrm{iso}} \big (\upsilon , p -\partial _\upsilon {\mathfrak {f}} \big ), \end{aligned}$$
(4.21)

if we introduce a family of isotropic Hamiltonians

$$\begin{aligned} {{\mathfrak {h}}}_{\mathrm{iso}}(\upsilon , p) := 2 \big ( |p|^2 - |\partial _\upsilon {\mathfrak {f}}|^2 \big ). \end{aligned}$$

Solving (4.20) is equivalent to solving

$$\begin{aligned} {{\mathfrak {h}}}_{\mathrm{iso}}\big (\upsilon , \partial _\upsilon \Psi \big ) + \alpha \upsilon P = E, \end{aligned}$$

with \(\Psi = \psi -{{\mathfrak {f}}}\). We note that \({\mathfrak {h}}_\mathrm{iso}\) is isotropic in the sense that it depends on the generalized momentum variable p only through its length |p|, i.e. \({{\mathfrak {h}}}_{\mathrm{iso}}(\upsilon , p) = {{\mathfrak {h}}}_{\mathrm{iso}} \big (\upsilon , |p|\big )\). It also holds that \({{\mathbb {R}}}_+ \ni r \mapsto {{\mathfrak {h}}}_{\mathrm{iso}}(\upsilon , r)\) is convex, monotonically nondecreasing and super-linear. In particular,

$$\begin{aligned} \inf _{r \in {{\mathbb {R}}}_+} {{\mathfrak {h}}}_{\mathrm{iso}}(\upsilon , r) = {{\mathfrak {h}}}_{\mathrm{iso}}(\upsilon , 0). \end{aligned}$$

For this kind of Hamiltonian, it is known that (e.g. Fathi [23])

$$\begin{aligned} E&=E[P;\alpha ,\beta ]= \sup _{\upsilon \in {{\mathbb {R}}}} \big ( {\mathfrak h}_{\mathrm{iso}}(\upsilon , 0) + \alpha \upsilon P\big ) \\&= \sup _{\upsilon \in {{\mathbb {R}}}} \big (\alpha \upsilon P - 2 |\partial _\upsilon {{\mathfrak {f}}}(\upsilon )|^2 \big ) = \sup _{\upsilon \in {{\mathbb {R}}}}\Big ( \alpha \upsilon P -\frac{1}{2} \big | \alpha \upsilon + \frac{1}{2} \beta \big |^2\Big ) \\&= - \frac{1}{2} \beta P + \frac{1}{2}P^2. \end{aligned}$$

Consequently,

$$\begin{aligned} H(\rho ,\varphi ) = \int _{{\mathcal {O}}} E[\partial _x \varphi (x); \rho (x), \partial _x \log \rho (x)] dx = -\frac{1}{2} \langle \partial _x \log \rho , \partial _x \varphi \rangle + \frac{1}{2} \int _{{\mathcal {O}}} |\partial _x \varphi |^2 dx. \end{aligned}$$
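The supremum over \(\upsilon \) defining \(E[P;\alpha ,\beta ]\) is again an explicit quadratic maximization; a quick grid-search check with arbitrary parameter values:

```python
# Check  sup_v ( a*v*P - 0.5*|a*v + b/2|^2 ) = -b*P/2 + P^2/2
# for several arbitrary (a, b, P) with a != 0, by grid search over v.
# The maximizer is v* = (P - b/2)/a, well inside the search window.
v_grid = [-40 + 0.001*k for k in range(80001)]
for a, b, P in [(1.0, 0.5, 0.4), (2.0, -1.0, 1.3), (0.5, 3.0, -0.7)]:
    best = max(a*v*P - 0.5*(a*v + 0.5*b)**2 for v in v_grid)
    assert abs(best - (-0.5*b*P + 0.5*P*P)) < 1e-5
print("E[P; alpha, beta] matches -beta*P/2 + P^2/2")
```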

4.8 Third approach to identify H—semiclassical approximations

Finally, we abandon methods based on weak KAM. Instead, we introduce a method for identifying \(E[P;\alpha ,\beta ]\) directly, using probability theory and ideas from semiclassical limits.

Our point of departure is to approximate equation (4.20) by introducing an extra viscosity parameter \(\kappa >0\). For readers familiar with the Hamiltonian convergence approach to large deviations described in Feng and Kurtz [24], \({\mathfrak {h}}\) is (see (4.23) below) the limiting Hamiltonian for small-noise large deviations (\(\kappa \rightarrow 0^+\)) of the stochastic differential equation

$$\begin{aligned} d \upsilon (t) + \big (2 \alpha \upsilon (t) + \beta \big ) dt= 2 \sqrt{\kappa } d W(t). \end{aligned}$$
(4.22)

The solution \(\upsilon (t)\) is an \({{\mathbb {R}}}\)-valued Markov process with infinitesimal generator

$$\begin{aligned} L_\kappa \psi (\upsilon ) := - (2 \alpha \upsilon +\beta ) \partial _\upsilon \psi (\upsilon ) + 2 \kappa \partial ^2_{\upsilon \upsilon } \psi (\upsilon ). \end{aligned}$$
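The process (4.22) is an Ornstein–Uhlenbeck process; its invariant law is the Gaussian measure \(m_\kappa \) written out later in this subsection, with mean \(-\beta /(2\alpha )\) and variance \(\kappa /\alpha \). A short Euler–Maruyama simulation reproduces these moments (step size, run length, and parameter values are arbitrary choices):

```python
import math, random

# Euler-Maruyama for  d v = -(2*alpha*v + beta) dt + 2*sqrt(kappa) dW.
# The stationary law is Gaussian with mean -beta/(2*alpha) and
# variance kappa/alpha; we check both by a long-time average.
random.seed(0)
alpha, beta, kappa = 1.0, 0.5, 0.5
dt, nsteps, burn = 0.01, 400_000, 50_000
v, samples = 0.0, []
for k in range(nsteps):
    v += -(2*alpha*v + beta)*dt + 2*math.sqrt(kappa*dt)*random.gauss(0, 1)
    if k >= burn:
        samples.append(v)
mean = sum(samples)/len(samples)
var = sum((s - mean)**2 for s in samples)/len(samples)
print(mean, var)   # approximately -0.25 and 0.5
```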

Following [24], we define a sequence of nonlinear second order differential operators

$$\begin{aligned} \big ( {{\mathfrak {h}}}_\kappa \psi \big ) (\upsilon ):= e^{-\kappa ^{-1} \psi } \Big ( \kappa L_\kappa \Big ) e^{\kappa ^{-1} \psi }(\upsilon ), \end{aligned}$$

then

$$\begin{aligned} \lim _{\kappa \rightarrow 0^+} \big ({{\mathfrak {h}}}_\kappa \psi \big )(\upsilon ) = {{\mathfrak {h}}} \big (\upsilon , \partial _\upsilon \psi \big ). \end{aligned}$$
(4.23)
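A one-line computation gives \(\big ({{\mathfrak {h}}}_\kappa \psi \big )(\upsilon ) = -(2\alpha \upsilon +\beta )\partial _\upsilon \psi + 2|\partial _\upsilon \psi |^2 + 2\kappa \partial ^2_{\upsilon \upsilon }\psi \), so the convergence (4.23) holds at rate \(O(\kappa )\). A sketch checking this against a finite-difference evaluation of \(L_\kappa \) (the choice \(\psi (\upsilon ) = \sin \upsilon \) and all parameters are arbitrary):

```python
import math

# h_kappa psi (v) = exp(-psi/k) * (k L_k) exp(psi/k) (v), evaluated by
# central finite differences, versus the expansion
#   h(v, psi') + 2*k*psi''  with  h(v,p) = -(2*alpha*v + beta)*p + 2*p^2.
alpha, beta = 1.0, 0.5
psi = math.sin
dpsi, d2psi = math.cos, lambda v: -math.sin(v)

def h_kappa_fd(v, k, h=1e-5):
    E = lambda u: math.exp(psi(u)/k)
    first  = (E(v+h) - E(v-h)) / (2*h)
    second = (E(v+h) - 2*E(v) + E(v-h)) / (h*h)
    Lk = -(2*alpha*v + beta)*first + 2*k*second   # L_kappa applied to E
    return k * Lk / E(v)

v = 0.3
exact = lambda k: (-(2*alpha*v + beta)*dpsi(v) + 2*dpsi(v)**2
                   + 2*k*d2psi(v))
limit = -(2*alpha*v + beta)*dpsi(v) + 2*dpsi(v)**2  # = h(v, d_v psi)
for k in [1.0, 0.5, 0.25]:
    assert abs(h_kappa_fd(v, k) - exact(k)) < 1e-3
    assert abs(h_kappa_fd(v, k) - limit) <= 2*k*abs(d2psi(v)) + 1e-3
print("h_kappa psi -> h(v, d_v psi) at rate O(kappa)")
```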

We also consider a second-order stationary Hamilton–Jacobi equation with constant \(E_\kappa \):

$$\begin{aligned} \big ( {{\mathfrak {h}}}_\kappa \psi \big ) (\upsilon )+ \alpha \upsilon P = E_\kappa . \end{aligned}$$
(4.24)

This can be viewed as a regularized approximation to the first-order equation (4.20).

A simple transformation turns the nonlinear PDE (4.24) into a linear eigenvalue problem,

$$\begin{aligned} ( \kappa L_\kappa + \alpha \upsilon P ) \Psi _\kappa = E_\kappa \Psi _\kappa , \end{aligned}$$
(4.25)

where

$$\begin{aligned} \Psi _\kappa := e^{ \kappa ^{-1} \psi } >0. \end{aligned}$$

This is the equation defining the ground state \(\Psi _\kappa \), with ground state energy \(E_\kappa \), of the rescaled Schrödinger operator \(\kappa L_\kappa + \alpha \upsilon P\). There is a theory giving uniqueness of the constant E in (4.20). By well-known stability results for viscosity solutions of Hamilton–Jacobi equations, applied to (4.24), we can prove that

$$\begin{aligned} E= \lim _{\kappa \rightarrow 0} E_\kappa . \end{aligned}$$
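This limit can be tested numerically. Conjugating \(\kappa L_\kappa + \alpha \upsilon P\) by the square root of the invariant Gaussian density introduced below yields (by our own computation, not from the text) the unitarily equivalent symmetric Schrödinger form \(2\kappa ^2 \partial ^2_{\upsilon \upsilon } + \kappa \alpha - \frac{1}{8}(2\alpha \upsilon +\beta )^2 + \alpha \upsilon P\), whose top eigenvalue is \(E_\kappa \). A finite-difference sketch with numpy; domain, grid, and parameter values are arbitrary:

```python
import numpy as np

# Principal eigenvalue of kappa*L_kappa + alpha*v*P via the unitarily
# equivalent Schrodinger operator (conjugation by m_kappa^{1/2}):
#   S = 2*kappa^2 d^2/dv^2 + kappa*alpha - (2*alpha*v+beta)^2/8 + alpha*v*P.
# The top eigenvalue should be close to E = -beta*P/2 + P^2/2.
alpha, beta, P, kappa = 1.0, 0.5, 0.4, 0.3
n, L = 801, 8.0
v = np.linspace(-L, L, n)
h = v[1] - v[0]

diag = (-2*(2*kappa**2)/h**2 + kappa*alpha
        - (2*alpha*v + beta)**2/8 + alpha*v*P)
off = np.full(n-1, 2*kappa**2/h**2)
S = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

E_kappa = np.linalg.eigvalsh(S).max()
E = -beta*P/2 + P**2/2
print(E_kappa, E)   # close agreement
```

For this quadratic model the agreement is in fact very tight already at moderate \(\kappa \), a feature of the Gaussian structure rather than of the general theory.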

The ground state energy \(E_\kappa \) is given by the Rayleigh–Ritz formula, which has been extensively studied in probability theory, in the context of large deviations for occupation measures, by Donsker and Varadhan. We denote by

$$\begin{aligned} m_\kappa (d \upsilon ) := Z_\kappa ^{-1} e^{-\frac{\alpha \upsilon ^2 + \beta \upsilon }{2 \kappa }} d \upsilon \end{aligned}$$

the invariant probability measure for the Markov process \(\upsilon (t)\), and introduce a family of related probability measures indexed by \(\Phi \in C_b({{\mathbb {R}}})\):

$$\begin{aligned} m_\kappa ^\Phi (d\upsilon ) := \frac{e^{2\Phi (\upsilon )}}{ \int e^{2 \Phi } d m_\kappa } m_\kappa (d \upsilon ). \end{aligned}$$

We identify the (pre-)Dirichlet form associated with \(\kappa L_\kappa \), writing \({\mathcal {E}}_\kappa (\phi ) := {\mathcal {E}}_\kappa (\phi ,\phi )\) for its diagonal, by

$$\begin{aligned} {\mathcal {E}}_\kappa (\phi _1,\phi _2):= - \int _{\upsilon \in {{\mathbb {R}}}} \phi _1(\upsilon ) (\kappa L_\kappa \phi _2)(\upsilon ) m_\kappa (d\upsilon ) = 2 \kappa ^2 \int _{{{\mathbb {R}}}} (\partial _{\upsilon } \phi _1)( \partial _{\upsilon } \phi _2 )m_\kappa (d \upsilon ). \end{aligned}$$

Then, by the arguments on pages 112 and 113 of Stroock [41] (alternatively, one can also follow Example B.14 in Feng and Kurtz [24]), we have

$$\begin{aligned} E_\kappa [P] = \sup _{\Phi } \Big \{ \alpha P \int _{{{\mathbb {R}}}} \upsilon m_\kappa ^\Phi (d\upsilon ) - {\mathcal {E}}_\kappa \Big (\sqrt{\frac{d m_\kappa ^\Phi }{d m_\kappa }}\Big )\Big \} = \sup _\Phi \int _{{{\mathbb {R}}}}\big ( \alpha \upsilon P - 2 \kappa ^2 |\partial _\upsilon \Phi |^2\big ) m_\kappa ^\Phi (d\upsilon ). \end{aligned}$$

A change of variables \({\hat{\Phi }} := \kappa \Phi \) gives

$$\begin{aligned} E_\kappa [P] = \sup _{{\hat{\Phi }}} \int _{{{\mathbb {R}}}} \big ( \alpha \upsilon P - 2 |\partial _\upsilon {\hat{\Phi }}(\upsilon )|^2 \big ) m_{\kappa ,{\hat{\Phi }}}(d\upsilon ) , \end{aligned}$$

where

$$\begin{aligned} m_{\kappa ,{\hat{\Phi }}}(d\upsilon ) = \frac{e^{\frac{1}{\kappa } (2 {\hat{\Phi }} - \frac{\alpha \upsilon ^2 +\beta \upsilon }{2} )}d\upsilon }{ Z_{\kappa ,{\hat{\Phi }}}}. \end{aligned}$$

Dropping the hat on \({\hat{\Phi }}\) from now on, we can further lift the probability measure \(m_{\kappa , \Phi }\) to

$$\begin{aligned} {\mathfrak {m}}_{\kappa ,\Phi }(d\upsilon , d \xi ) := \delta _{\partial _\upsilon \Phi }(d \xi ) m_{\kappa ,\Phi }(d\upsilon ), \quad (\upsilon , \xi ) \in {{\mathbb {R}}}\times {{\mathbb {R}}}, \end{aligned}$$

giving

$$\begin{aligned} E_\kappa [P] = \sup _{\Phi } \int _{{{\mathbb {R}}}} \big ( \alpha \upsilon P - 2 |\xi |^2 \big ) {\mathfrak {m}}_{\kappa ,\Phi }(d\upsilon , d \xi ). \end{aligned}$$

We see that, as \(\kappa \rightarrow 0\), by the Laplace principle the limit points of \(\{ {\mathfrak {m}}_{\kappa , \Phi }: \kappa >0 \}\) are probability measures of the form

$$\begin{aligned} {\mathfrak {m}}_\Phi (d\upsilon , d \xi ):=\sum _k p_k \delta _{\{\upsilon _k, \partial _\upsilon \Phi (\upsilon _k)\}}(d\upsilon , d \xi ), \quad \sum _k p_k=1, p_k>0, \end{aligned}$$

where each \(\upsilon _k\) solves the algebraic equation

$$\begin{aligned} 4 \partial _\upsilon \Phi (\upsilon ) - (2 \alpha \upsilon + \beta ) =0. \end{aligned}$$

That is,

$$\begin{aligned} \alpha \upsilon =2 \xi -\frac{\beta }{2 }, \quad \forall (\upsilon , \xi ) \in \text {supp}[{\mathfrak {m}}_\Phi ]. \end{aligned}$$

Then it follows that

$$\begin{aligned} E[P;\alpha ,\beta ]&= \lim _{\kappa \rightarrow 0} E_\kappa [P;\alpha ,\beta ] = \lim _{\kappa \rightarrow 0} \sup _\Phi \int _{{{\mathbb {R}}}} \big ( \alpha \upsilon P - 2 |\xi |^2 \big ) {\mathfrak {m}}_{\kappa ,\Phi }(d\upsilon , d \xi ) \\&= \sup _\Phi \int _{{{\mathbb {R}}}} \big ( \alpha \upsilon P - 2 |\xi |^2 \big ) {\mathfrak {m}}_{\Phi }(d\upsilon , d \xi ) \\&= \sup _{{\mathfrak {m}}} \int \Big ( \big ( 2 \xi - \frac{\beta }{2}\big )P - 2 |\xi |^2 \Big ) {\mathfrak m}(d\upsilon ,d\xi ) \\&= -\frac{\beta }{2} P +\frac{P^2}{2}. \end{aligned}$$

Hence we are again led to

$$\begin{aligned} H(\rho ,\varphi ) = \int _{{\mathcal {O}}} E[\partial _x \varphi (x); \rho (x), \partial _x \log \rho (x)] dx = \frac{1}{2} \langle \partial ^2_{xx} \log \rho , \varphi \rangle + \frac{1}{2} \int _{{\mathcal {O}}} |\partial _x \varphi |^2 dx \end{aligned}$$

and again recover the Hamiltonian (1.1).