1 Introduction

Flocking is a prevalent multi-agent interaction phenomenon with wide-ranging applications in biology, ecology, robotics and control theory, sensor networks, as well as sociology and economics (see [1,2,3,4,5,6,7,8]). As a result, numerous scholars have studied the collective behavior of multi-agent systems and have developed various models. As in physics, the study of idealized models can often shed light on phenomena observed in the real world, provided that the fundamental principles of these models are understood. The mathematical analysis of these mechanisms and phenomena is therefore of paramount importance.

Consider a large particle system consisting of \(N\gg 1\) identical point particles that interact pairwise. The evolution of this N-particle system can be described in the following two ways (see Golse [9]):

(I) By using the system of motion equations, specifically Newton’s second law of motion, which is satisfied by each individual particle. This method is flawless in theory, but lacks practical feasibility. As the number of particles increases significantly, acquiring extensive observational data becomes prohibitively expensive. The implementation of this method at a large scale presents considerable challenges.

(II) By using the motion equation for the “typical particle” that is driven by its collective interaction with all other particles in the system. This is an approximation posed on a phase space of relatively low dimension; while it may not capture all the details of the system, it provides a more manageable description of the dynamics of the particle system.

The description of a large particle system can be achieved by following the procedure outlined in (II), which is commonly referred to as the “mean field approximation” for the dynamics of an N-particle system.

This obviously raises the mathematical problem of rigorously justifying the mean field approximation (II) starting from (I) as a first principle. It is desirable to obtain a convergence rate as the particle number \(N\rightarrow \infty \) in order to gauge the precision of the mean field description. This convergence rate would provide crucial insights into the accuracy of the mean field approximation and its applicability to systems with a large number of particles. Furthermore, a thorough analysis of the convergence rate as N tends to infinity would contribute significantly to the theoretical foundation of the mean field approach and enhance our understanding of its limitations and capabilities.

In their quest to investigate the flocking phenomenon of multi-agent systems, Cucker and Smale introduced a mathematical model to elucidate the dynamic behavior of an N-particle system [6, 7]. The Cucker–Smale model serves as a mathematical framework for portraying the interactions among a group of agents through Newtonian forces. This model comprises a system of ordinary differential equations (ODEs) governing the motion of each agent within the group. Each agent’s motion is influenced by the positions and velocities of its neighbors, as well as its own velocity and acceleration. The interaction forces between agents are modeled using Newton’s second law of motion, where the force is proportional to the difference in their velocities. Consequently, agents tend to converge if they are moving in the same direction and diverge if moving in opposite directions. The strength of these interaction forces can be modulated by a parameter known as the “cohesion” term, representing the attraction between agents. The versatility of the Cucker–Smale model is exemplified by its application across various systems, including biological scenarios such as bird flocking, fish schooling, and bacterial flocking [10,11,12]. Through adjustments to the model’s parameters, researchers have successfully replicated many observed behaviors in these systems. For instance, augmenting the cohesion term results in more densely packed groups of agents, whereas reducing it leads to more dispersed groups [8, 13,14,15]. The Cucker–Smale model is thus a powerful tool for comprehending the collective behavior of multi-agent systems. By examining its dynamics, researchers can gain valuable insights into the fundamental mechanisms that lead to intricate emergent behaviors in nature. Recent research involving the Cucker–Smale model has yielded significant advances, including applications in multi-agent systems, animal behavior simulation, and control. These findings not only enhance our comprehension of animal group behavior but also furnish a theoretical foundation for prospective applications.

The Cucker–Smale particle model is represented as follows

$$\begin{aligned} \left\{ \begin{array}{l} \dfrac{\textrm{d}}{\textrm{d} t} {x}_{i}(t)={v}_{i}(t), i=1,2,\ldots ,N, \\[4mm] \dfrac{\textrm{d}}{\textrm{d} t} {v}_{i}(t)= \dfrac{1}{N}\sum \limits _{j=1}^N \phi _{ij}({x})\left( {v}_{j}-{v}_{i}\right) , \end{array} \right. \end{aligned}$$
(1.1)

where \({x}_{i}(t) \in \mathbb {R}^{d}\) and \({v}_{i}(t) \in \mathbb {R}^{d}\) denote, respectively, the position and velocity of agent i at time t, \( d \ge 1 \) is the space dimension, and \( \phi _{ij}({x})=\phi (|{x}_{i}-{x}_{j}|) \) is the interaction weight between agents i and j. The communication weight function \(\phi (r)=\dfrac{\theta }{(1+r^2)^{\beta }}\), with \(\beta ,\theta \ge 0\), is nonnegative and non-increasing.
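As an illustration of the particle model (1.1), the following minimal sketch integrates the system with an explicit Euler scheme. It is not taken from the cited works: the function names, the time step, the random initial data and the choice \(\theta =1\), \(\beta =1/2\) are assumptions made for the example, and the weight is read as \(\phi _{ij}(x)=\phi (|x_i-x_j|)\).

```python
import math
import random


def phi(r, theta=1.0, beta=0.5):
    """Communication weight phi(r) = theta / (1 + r^2)^beta."""
    return theta / (1.0 + r * r) ** beta


def acceleration(xs, vs, theta=1.0, beta=0.5):
    """Right-hand side of the velocity equation in (1.1) for every agent."""
    n, dim = len(xs), len(xs[0])
    acc = []
    for i in range(n):
        a = [0.0] * dim
        for j in range(n):
            w = phi(math.dist(xs[i], xs[j]), theta, beta)
            for k in range(dim):
                a[k] += w * (vs[j][k] - vs[i][k]) / n
        acc.append(a)
    return acc


def euler_step(xs, vs, dt, theta=1.0, beta=0.5):
    """One explicit Euler step for (1.1); dt is an illustrative choice."""
    acc = acceleration(xs, vs, theta, beta)
    new_xs = [[x + dt * v for x, v in zip(xi, vi)] for xi, vi in zip(xs, vs)]
    new_vs = [[v + dt * a for v, a in zip(vi, ai)] for vi, ai in zip(vs, acc)]
    return new_xs, new_vs


def mean_velocity(vs):
    """Average velocity of the group."""
    return [sum(v[k] for v in vs) / len(vs) for k in range(len(vs[0]))]


def velocity_diameter(vs):
    """Largest distance between two agent velocities."""
    return max(math.dist(a, b) for a in vs for b in vs)


def simulate(n=20, dim=2, steps=200, dt=0.05, seed=0):
    """Return the history of (mean velocity, velocity diameter) per step."""
    rng = random.Random(seed)
    xs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    vs = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    history = [(mean_velocity(vs), velocity_diameter(vs))]
    for _ in range(steps):
        xs, vs = euler_step(xs, vs, dt)
        history.append((mean_velocity(vs), velocity_diameter(vs)))
    return history


if __name__ == "__main__":
    hist = simulate()
    print("initial velocity diameter:", hist[0][1])
    print("final velocity diameter:  ", hist[-1][1])
```

Because the weights are symmetric in i and j, the mean velocity is preserved by each step up to rounding, and for \(\theta \,\textrm{d}t\le 1\) each new velocity is a convex combination of the old ones, so the velocity diameter cannot increase.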

Ha and Tadmor [16] provided a formal derivation of the mean field limit of the system (1.1) based on the BBGKY hierarchy and obtained the corresponding mean field equation:

$$\begin{aligned} \left\{ \begin{array}{l} \partial _t f+v \cdot \nabla _x f+{\text {div}}_v(L[f] f)=0, \\ f(0, x, v)=f_{0}(x, v), \end{array}\right. \end{aligned}$$
(1.2)

and

$$\begin{aligned} L[f](t, x, v)=\int \limits _{\mathbb {R}^{2 d}} \phi (x-y) (w-v)f(t, y, w) \mathrm {~d} y \mathrm {~d} w=-U \star f\left( x,v\right) , \end{aligned}$$

where \(U(x,v)=\phi (|x|)v\) and \(\star \) denotes convolution in \((x,v)\). The function \(f(t, x, v) \ge 0\) denotes a microscopic density of particles at time \(t \ge 0\) and position \(x \in \mathbb {R}^d\), moving with velocity \(v \in \mathbb {R}^d.\)

For a large number of particles, Ha and Tadmor [16] introduced the kinetic Cucker–Smale model using methods from statistical mechanics. Subsequently, Ha and Liu [17] rigorously justified the mean field limit from the Cucker–Smale particle model to the kinetic Cucker–Smale model, and proved the existence and uniqueness of measure-valued solutions of the kinetic Cucker–Smale model. Later, Canizo and Carrillo et al. [18] gave an elegant proof of the mean field limit by employing the modern theory of optimal transport. In [19], the results in [17] were further refined and the unconditional flocking theorem was proved for measure-valued solutions to the kinetic Cucker–Smale model. It is worth noting that the kinetic model (1.2) can also be derived from a Boltzmann-type equation [19]. Mucha and Peszek [20] investigated the existence of measure solutions for the system when \(\beta <\frac{1}{2}\), but did not establish the uniqueness of these measure solutions. Currently, the uniqueness of measure solutions remains an open problem. Ha and Kim et al. [21] studied the mean field limit of the singular Cucker–Smale model using the method of [22].

For later use, we introduce some notation together with the definitions of the p-Wasserstein distance and the relative entropy, which appear in the inequalities used below.

Definition 1.1

Let \(\Sigma (t)\) denote the v-projection of supp\(f(\cdot ,t)\),

$$\begin{aligned} \Sigma (t):=\{v\in \mathbb {R}^d:\exists \,x\in \mathbb {R}^{d}\quad \text {such that}\quad f(x,v,t)\ne 0\}, \end{aligned}$$

and \(R:=\max \limits _{v\in \Sigma (t)}|v|.\)

Let \((X,d)\) be a locally compact and separable metric space. Given \(1 \le p<\infty \), let

$$\begin{aligned} \mathscr {P}_p(X):=\left\{ \sigma \in \mathscr {P}(X) \mid \int \limits _X d\left( {x}, {x}_0\right) ^p \mathrm {~d} \sigma ({x})<+\infty \text{ for } \text{ some } {x}_0 \in X\right\} \end{aligned}$$

be the set of probability measures with finite p-moment.

Lemma 1.1

[23, 24](p-Wasserstein distance) Given \(\mu , \nu \in \mathscr {P}_p(X)\), we define their p-Wasserstein distance as

$$\begin{aligned} W_p(\mu ,\nu ):=\left( \inf _{\gamma \in \Gamma (\mu , \nu )} \int \limits _{X \times X} d({x}, {y})^p \mathrm {~d} \gamma ({x}, {y})\right) ^{\frac{1}{p}}. \end{aligned}$$

Then \(W_p\) is a distance on the space \(\mathscr {P}_p(X)\).
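To make the definition concrete, the sketch below computes \(W_p\) in the simplest setting: two empirical measures on the real line with the same number of equally weighted atoms. In that special case the infimum over couplings is attained by matching the sorted samples, a standard fact for \(p\ge 1\). The function name `wasserstein_1d` is ours and the restriction to one dimension is an assumption of the example, not of the lemma.

```python
def wasserstein_1d(xs, ys, p=1.0):
    """p-Wasserstein distance between two 1D empirical measures.

    Both measures put mass 1/n on each of their n atoms. For p >= 1 the
    optimal coupling on the real line pairs the sorted samples in order.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("both samples must be non-empty and of equal size")
    if p < 1:
        raise ValueError("p must satisfy p >= 1")
    n = len(xs)
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / n
    return cost ** (1.0 / p)


if __name__ == "__main__":
    # Shifting every atom by 1 moves the measure by exactly 1 in every W_p.
    print(wasserstein_1d([0.0, 1.0], [1.0, 2.0], p=1))
    print(wasserstein_1d([0.0, 1.0], [1.0, 2.0], p=2))
```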

Definition 1.2

[24] (Relative entropy) Let \(\mu \) and \(\nu \) be two probability measures on X. The relative entropy of \(\mu \) with respect to \(\nu \) is defined by

$$\begin{aligned} \left. H(\mu ,\nu )=\left\{ \begin{array}{ll} \int _{X} f \ln f \mathrm {~d}\nu , &{} \quad f:=\frac{\textrm{d}\mu }{\textrm{d}\nu }\, \ \ \text {if}\,\mu \ll \nu ,\\ +\infty , &{} \quad \,\textrm{otherwise}.\end{array}\right. \right. \end{aligned}$$
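On a finite state space the definition reduces to a sum, since \(\int f\ln f\,\textrm{d}\nu =\sum _i \mu _i \ln (\mu _i/\nu _i)\). The following sketch evaluates it for probability vectors, with the usual convention \(0\ln 0=0\); the function name and the discrete setting are assumptions made for illustration.

```python
import math


def relative_entropy(mu, nu):
    """Relative entropy H(mu, nu) of two probability vectors.

    Returns +inf when mu is not absolutely continuous with respect to nu,
    i.e. when mu charges a state to which nu gives zero mass.
    """
    if len(mu) != len(nu):
        raise ValueError("mu and nu must live on the same state space")
    total = 0.0
    for m, n in zip(mu, nu):
        if m == 0.0:
            continue  # convention 0 * ln 0 = 0
        if n == 0.0:
            return math.inf  # mu is not absolutely continuous w.r.t. nu
        total += m * math.log(m / n)
    return total


if __name__ == "__main__":
    print(relative_entropy([0.5, 0.5], [0.5, 0.5]))
    print(relative_entropy([1.0, 0.0], [0.5, 0.5]))
    print(relative_entropy([0.5, 0.5], [1.0, 0.0]))
```

The last call illustrates the second branch of the definition: the relative entropy is infinite as soon as \(\mu \) is not absolutely continuous with respect to \(\nu \).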

Definition 1.3

[25, 26] (Rescaled relative entropy) Let \(f_N, g_N \in \mathscr {P}(E^N )\) and write \(Z = (z_1,\cdots ,z_N ) \in E^N\). The rescaled relative entropy of \(f_N\) with respect to \(g_N\) is defined as

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid g_N\right) (t):=\frac{1}{N}{H}\left( f_N \mid g_N\right) (t)=\frac{1}{N} \int \limits _{E^N } f_N \ln \left( \frac{f_N}{g_N}\right) \textrm{d} Z. \end{aligned}$$

To the best of our knowledge, there exist two distinct (yet related) approaches for deriving the kinetic equation as a mean field limit of a specific microscopic model. More description and research methods of the mean field limit can be found in Ref. [9, 22, 27,28,29,30,31,32,33,34,35] etc.

(1) Convergence methods for empirical measures.

The first approach stems directly from the empirical density of an N-particle system. Let the solution of the N-particle system at time t be

$$\begin{aligned} Z(t)=\left( \varvec{x}_1(t), \varvec{v}_1(t); \varvec{x}_2(t), \varvec{v}_2(t); \ldots ; \varvec{x}_N(t), \varvec{v}_N(t)\right) . \end{aligned}$$

Consider initial conditions \(Z(0)\) whose associated empirical measures converge to \(f_0\) at the initial time:

$$\begin{aligned} \mu ^N[Z(0)]=\frac{1}{N} \sum \limits _{i=1}^N \delta _{(\varvec{x}_i(0),\varvec{v}_i(0))} \rightarrow f_0(\varvec{x}, \varvec{v}). \end{aligned}$$

We aim to show that, for \(t>0\), the time-evolved empirical measure

$$\begin{aligned} \mu ^N\left[ Z(t)\right] =\frac{1}{N} \sum \limits _{i=1}^N \delta _{(\varvec{x}_i(t),\varvec{v}_i(t))} \end{aligned}$$

is well approximated by \(f_t\), i.e. \(\mu ^N\left[ Z(t)\right] \rightarrow f_{t}(\varvec{x}, \varvec{v})\) as \(N\rightarrow \infty \), where \(f_t\) is the solution of the kinetic model (1.2).

Here “well approximated” is understood in the sense of the weak-\(*\) topology on the space of probability measures, metrized (for instance) by an appropriate p-Wasserstein distance \(W_{p}\).
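To see why (1.2) is the natural candidate for the limit, one can evaluate the operator \(L\) on an empirical measure. The following short computation is a sketch under two reading assumptions: \(\phi _{ij}(x)=\phi (|x_i-x_j|)\) in (1.1), and \(\phi (x-y)\) in the definition of \(L[f]\) stands for \(\phi (|x-y|)\). Replacing \(f(t,\cdot )\) by \(\mu ^N[Z(t)]\) gives

$$\begin{aligned} L\big [\mu ^N[Z(t)]\big ](t,x_i,v_i)=\int \limits _{\mathbb {R}^{2d}} \phi (|x_i-y|)\,(w-v_i)\,\textrm{d}\mu ^N[Z(t)](y,w)=\frac{1}{N}\sum \limits _{j=1}^N \phi (|x_i-x_j|)\left( v_j-v_i\right) , \end{aligned}$$

which is the right-hand side of the velocity equation in (1.1). In this sense the empirical measure is, at least formally, a measure-valued solution of (1.2), and the mean field limit becomes a question of stability of (1.2) with respect to its initial datum.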

(2) Propagation of molecular chaos. The second approach focuses on the probability distribution \(f_N(x_1, v_1, \cdots , x_N, v_N, t)\) on the N-particle phase space. Consider the Liouville equation corresponding to the particle system (1.1):

$$\begin{aligned} \partial _t f_N+\sum _{i=1}^N v_i \cdot \nabla _{x_i} f_N+\frac{1}{N} \sum _{i=1}^N \nabla _{v_i} \cdot \left( \sum _{j=1}^N \phi \left( x_i, x_j\right) \left( v_j-v_i\right) f_N\right) =0. \end{aligned}$$
(1.3)

Assume that at \(t=0\) the particles are asymptotically independently distributed according to \(f_{0}\); that is, writing \({z}_i=\left( {x}_i, {v}_i\right) \), \(i=1,2,\ldots ,N\),

$$\begin{aligned} f_N^0({z}_1,{z}_{2}, \ldots , {z}_N)\rightarrow \prod \limits _{i=1}^N f_0({z}_{i}):=f_{0}^{\otimes N} \end{aligned}$$

on \(\mathbb {R}^{2d N}\). Considering the k-marginal of \(f^{t}_N\), for \(1\le k \le N\), we aim to demonstrate that

$$\begin{aligned} f_{N,k}(t):= f_{N,k}^{t}\left( {z}_1,{z}_{2}, \ldots , {z}_k\right) =\int \limits _{\mathbb {R}^{2d(N-k)}} f_N^t\left( {z}_1,{z}_{2}, \ldots , {z}_N\right) \mathrm {~d} {z}_{k+1} \ldots \textrm{d} {z}_{N} \rightarrow \prod _{i=1}^k f_t\left( {z}_i\right) :=f_{t}^{\otimes k},\nonumber \\ \end{aligned}$$
(1.4)

where \(f_{t}\) is the solution of the mean field equation (1.2). The convergence may be measured in the Wasserstein distance \(W_{p}(f_{N,k}(t), f(t)^{\otimes k})\) [17, 22, 36,37,38,39,40,41], in relative entropy \(\mathcal {H}_{k}(f_{N,k}(t), f(t)^{\otimes k})\) [25, 26, 42,43,44,45,46], in \(L^{p}\) [47], etc. This property is known as propagation of molecular chaos (Kac’s chaos).

By a well-known result of probability theory, molecular chaos is then equivalent to the convergence in law of the empirical measures. (See Kac [28], Sznitman [30], Golse [9], Jabin [32], Mischler and Hauray [48], for recent quantitative results).

The study of the mean field limit has been particularly rich in recent years, especially for systems of interacting particles with singular potentials. Serfaty has made significant advances on the mean field limit of first-order systems with Riesz-type and Coulomb-type potentials (refer to [49]), introducing a modulated energy method for controlling integrals with respect to the empirical measure, which has found widespread application (see [50,51,52,53,54,55,56,57,58]). Jabin and Wang [26] have made significant strides in employing the global relative entropy to establish the mean field limit of first-order systems with singular potentials. In recent years, the use of relative entropy (both global and local) to investigate the mean field limit has become a widely adopted approach, and it has been extensively applied to the propagation of chaos in first-order systems with singular potentials (refer to [25, 42, 43, 46, 59]). A series of mean field limit problems have been successfully addressed by combining the relative entropy method with the modulated energy (see [44, 45]). The mean field limit of singular second-order systems poses a significant challenge; in particular, the mean field limit of the Vlasov–Poisson equation is still an open question.

The previously known results on the mean field limit of the Cucker–Smale model are as follows:

(1) Qualitative Estimates. Ha and Liu [17] derived an a priori estimate for the stability of the measure-valued solution of this model in the Wasserstein-1 distance. Using this estimate, the limit from the Cucker–Smale particle model to the Cucker–Smale kinetic model is rigorously established through the particle approximation method, i.e., \(W_{1}(\mu _{t}^N, f_{t})\rightarrow 0 \) as \(N\rightarrow \infty \). A qualitative estimate in \(W_{1}\) is also provided by optimal transport methods in Canizo and Carrillo et al. [18]. It is worth noting that these are only qualitative results and do not provide a convergence rate in the number of particles N.

(2) Quantitative Estimates. Natalini and Paul [36] provided the convergence rate, with respect to N, between the k-tensor product \(f(t)^{\otimes k } \) of the solution of the kinetic equation (1.2) and the k-marginal \(f_{N,k}(t) \) of the solution of the Liouville equation (1.3) corresponding to the particle model (1.1), in the p-Wasserstein distance (\(1 \le p \le \infty \)). For \(t \in [0,T]\), the estimate reads:

$$\begin{aligned} W^p_p(f_{N,k}(t),f(t)^{\otimes k}) \le C_{1} \frac{k}{N^{\min (p/2,1)}}e^{C_{2}t}, \quad 1\le k\le N, \end{aligned}$$

where \(C_{1}\) and \(C_{2}\) are two positive constants depending on \(p,\phi ,T,R\). It can be proved that \(R=\max \limits _{v\in \Sigma (0)}|v|\), where \(\Sigma \) is defined by Definition 1.1.

When \(p = 2\), Nguyen and Shvydkoy [37] provided the convergence rate between \(f(t)^{\otimes k} \) and \(f_{N,k}(t) \) in the \(W_{2}\) distance. The exponential growth in t of the bound in [36] is improved to polynomial growth: there exists a constant C depending only on R and \(\phi \) such that, for \(1\le k\le N\) and \(t\ge 0\), one has

$$\begin{aligned} W_{2}\bigg (f_{N,k}(t),f(t)^{\otimes k }\bigg )\le C \sqrt{k} \min \bigg \{ 1, \frac{t}{\sqrt{N}}\bigg \}. \end{aligned}$$
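For comparison, the two quantitative estimates can be put in the same form. This is only a rearrangement of the bounds quoted above, with the same constants \(C_1, C_2, C\). Taking \(p=2\) in the estimate of [36], so that \(\min (p/2,1)=1\), and taking square roots gives

$$\begin{aligned} W_2\big (f_{N,k}(t),f(t)^{\otimes k}\big ) \le \sqrt{C_{1}}\, \sqrt{\frac{k}{N}}\, e^{C_{2}t/2}, \end{aligned}$$

while the estimate of [37] satisfies

$$\begin{aligned} C \sqrt{k} \min \bigg \{ 1, \frac{t}{\sqrt{N}}\bigg \} \le C\, t\, \sqrt{\frac{k}{N}}. \end{aligned}$$

Both bounds are therefore of order \(N^{-1/2}\) for fixed k and t, and they differ only in their dependence on time.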

As far as we know, the mean field limit of the Cucker–Smale model has so far been established in the p-Wasserstein distance, which corresponds to a weak form of convergence; we aim to obtain convergence in a stronger distance. In the recent literature [25, 26, 44], convergence in the \(L^{1}\) distance has been demonstrated on the basis of relative entropy theory and the classical Csiszár–Kullback–Pinsker inequality (see Lemma 4.4). It is noteworthy that, when the space is compact, the rate of convergence in the p-Wasserstein distance can be controlled by the relative entropy through a Talagrand-type inequality (refer to Lemmas 4.2 and 4.3). This result reads

$$\begin{aligned} W_p(f_{N,k}(t),f(t)^{\otimes k}) \le C(f,p) \bigg ( k\mathcal {H}_k\left( f_{N, k} \mid f^{\otimes k}\right) \bigg )^{1/2p}, \end{aligned}$$
(1.5)

for any \(p\ge 1\).

Outline of argument. Inspired by Jabin and Wang [25, 26], our approach is also based on controlling the relative entropy between the k-tensor product \(f(t)^{\otimes k } \) of the solution of the kinetic equation (1.2) and the k-marginal \(f_{ N,k } (t) \) of the solution of the Liouville equation (1.3) corresponding to the particle model (1.1). To control the global relative entropy, we rely mainly on the law of large numbers established in Jabin and Wang [26] (see Lemma 4.6). We will establish the relative entropy control inequality:

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)\le \Bigg ( \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) + \frac{\bar{C}}{N} \Bigg ) e^{t/ \gamma }. \end{aligned}$$

Here \(\bar{C}\) and \(\gamma \) depend on \(T, R, d, |\nabla _{v} \ln f_{0}|, ||\phi ||_{L^{\infty }}\); for their detailed expressions, see Theorem 4.1.

When \(\mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) \rightarrow 0\) as \(N\rightarrow \infty ,\) the properties of relative entropy (Lemma 4.1) imply that, for any fixed \(1 \le k \le N \),

$$\begin{aligned} \mathcal {H}_k\left( f_{N, k} \mid f^{\otimes k}\right) =\frac{1}{k} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^k} f_{N, k} \log \left( \frac{f_{N, k}}{f^{\otimes k}}\right) \textrm{d} z_1 \cdots \textrm{d} z_k \le \mathcal {H}_N\left( f_N \mid f^{\otimes N}\right) \longrightarrow 0, \end{aligned}$$

as \(N \rightarrow \infty \). The classical Csiszár–Kullback–Pinsker inequality (see Remark 4.2) then implies that

$$\begin{aligned} \left\| f_{N, k}-f^{\otimes k}\right\| _{L^1} \le \sqrt{2 k \mathcal {H}_k\left( f_{N, k} \mid f^{\otimes k}\right) } \rightarrow 0 \end{aligned}$$

as \(N \rightarrow \infty \).
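The two inequalities used in this outline can be checked numerically in a toy setting. The sketch below is an illustration under stated assumptions, not part of the proof: it takes \(N=2\) particles on a two-point state space, a symmetric but correlated joint law, and a one-particle reference law g, and then compares the rescaled relative entropies of the 1-marginal and of the full law and evaluates the Csiszár–Kullback–Pinsker bound. All names and numerical values are hypothetical.

```python
import math


def relative_entropy(mu, nu):
    """Discrete relative entropy sum_i mu_i ln(mu_i / nu_i), with 0 ln 0 = 0."""
    total = 0.0
    for m, n in zip(mu, nu):
        if m == 0.0:
            continue
        if n == 0.0:
            return math.inf
        total += m * math.log(m / n)
    return total


def l1_distance(mu, nu):
    """L^1 distance between two probability vectors."""
    return sum(abs(m - n) for m, n in zip(mu, nu))


def chaos_check(joint, g):
    """Compare marginal and global rescaled entropies for N = 2, k = 1.

    joint[a][b] is the symmetric law of the pair (z_1, z_2) and g is the
    one-particle reference law. Returns (H_1, H_2, l1, pinsker_bound).
    """
    n_particles, k = 2, 1
    marginal = [sum(row) for row in joint]        # 1-marginal f_{2,1}
    product = [ga * gb for ga in g for gb in g]   # tensor product of g with itself
    flat_joint = [p for row in joint for p in row]
    h_global = relative_entropy(flat_joint, product) / n_particles
    h_marginal = relative_entropy(marginal, g) / k
    l1 = l1_distance(marginal, g)
    pinsker_bound = math.sqrt(2.0 * k * h_marginal)
    return h_marginal, h_global, l1, pinsker_bound


if __name__ == "__main__":
    joint = [[0.4, 0.1], [0.1, 0.4]]  # symmetric, correlated pair law
    g = [0.6, 0.4]                    # reference one-particle law
    h1, h2, l1, bound = chaos_check(joint, g)
    print("H_1 <= H_2:", h1 <= h2)
    print("L1 distance <= sqrt(2 k H_k):", l1 <= bound)
```

In this example the marginal entropy is much smaller than the global one, which reflects the fact that correlations between the two particles are invisible to the 1-marginal.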

To the best of our knowledge, this paper is the first to apply Lemma 4.6 to a second-order system. Please refer to Sect. 4 for the detailed proof.

Organization of paper. Let us briefly comment on the organization of the remaining body of the article.

In Sect. 2, we present the existence and uniqueness of classical solutions for the kinetic Cucker–Smale model (1.2), and establish the boundedness of \(|\nabla \ln f\left( t,x,v\right) |\) under certain conditions. In Sect. 3, we state the existence of weak solutions to the Liouville equation (1.3). In Sect. 4, we introduce some inequalities related to relative entropy. Subsequently, we use the results established in Sect. 2 to obtain the rate of convergence in the \(L^{1}\) metric between the k-tensor product \(f(t)^{\otimes k}\) of the solution of the kinetic Cucker–Smale model (1.2) and the k-marginal \(f_{N,k}(t)\) of the solution of the Liouville equation (1.3) corresponding to the Cucker–Smale particle model (1.1).

2 Existence and uniqueness of classical solution for kinetic Cucker–Smale model

In this section, we establish the existence and uniqueness of the classical solution to the mean field equation (1.2) and demonstrate that \(|\nabla \ln f\left( t,x,v\right) |\) is bounded under certain conditions.

Theorem 2.1

Consider the kinetic model (1.2). Suppose that the initial datum \(f_0 \in C^1\left( \mathbb {R}^{2 d}\right) \cap W^{1, \infty }\left( \mathbb {R}^{2 d}\right) \) satisfies

  1. (1)

    The initial datum is compactly supported in the phase space, i.e., \({\text {supp}}_{(x, v)} f_0(\cdot )\) is bounded.

  2. (2)

    The initial datum is \(C^1\)-regular and bounded:

    $$\begin{aligned} \sum _{0 \le |\alpha |+|\beta | \le 1}\left| \nabla _x^\alpha \nabla _v^\beta f_0\right|<\infty \quad \text {and} \quad |\nabla \ln f_{0}| < \infty . \end{aligned}$$

Then, for any \(T \in (0, \infty )\), there exists a unique classical solution \(f \in C^1\left( \mathbb {R}^{2 d} \times [0, T)\right) \) and

$$\begin{aligned} |\nabla \ln f\left( t,x,v\right) | \le e^{(2R+2+d)t} |\nabla \ln f_{0}|, \end{aligned}$$

where \(R=\max \limits _{v\in \Sigma (0)}|v|\) and \(\Sigma \) is defined by Definition 1.1.

Remark 2.1

If the initial density of the kinetic model (1.2) has compact support, then the solution remains compactly supported as time evolves (see Ref. [18, 19, 60]). In other words, \({\text {supp}}_{(x, v)} f_t(\cdot )\) is bounded for \(t\in [0,T]\).

Proof

The proof that \(f \in C^1\left( \mathbb {R}^{2 d} \times [0, T)\right) \) can be found in Ref. [16]. Below, we prove the boundedness of the gradient of the logarithm of the density.

A direct computation gives

$$\begin{aligned}{\text {div}}_{{(x,v)}}\left( {\begin{array}{*{20}{c}} v\\ {L[f]} \end{array}} \right) g = v \cdot {\nabla _x}g + L[f] \cdot {\nabla _v}g -(d \ \phi \star f) \ g.\end{aligned}$$

We first define, for a fixed \(f(t,x,v)\), the linear operator H by

$$\begin{aligned} H=v \cdot \nabla _x +L[f] \cdot \nabla _v -(d \ \phi \star f), \end{aligned}$$

so that

$$\begin{aligned} {\text {div}}_{{(x,v)}}\left( {\begin{array}{*{20}{c}} v\\ {L[f]} \end{array}} \right) g=H(g). \end{aligned}$$
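For completeness, the identity above can be checked term by term. The sketch below assumes that \(\phi \star f\) abbreviates \(\int \phi (x-y) f(t,y,w)\,\textrm{d}y\,\textrm{d}w\), which is how we read the notation here. Since only the factor \(w-v\) in \(L[f]\) depends on v and \(\nabla _v \cdot (w-v)=-d\), we get

$$\begin{aligned} \nabla _v \cdot L[f](t,x,v)=\int \limits _{\mathbb {R}^{2 d}} \phi (x-y)\, \nabla _v \cdot (w-v)\, f(t, y, w) \mathrm {~d} y \mathrm {~d} w=-d\,(\phi \star f). \end{aligned}$$

The product rule, together with \(\nabla _x \cdot v=0\), then yields

$$\begin{aligned} {\text {div}}_{{(x,v)}}\left( {\begin{array}{*{20}{c}} v\\ {L[f]} \end{array}} \right) g = v \cdot \nabla _x g+(\nabla _x \cdot v)\, g+L[f] \cdot \nabla _v g+(\nabla _v \cdot L[f])\, g = v \cdot \nabla _x g+L[f] \cdot \nabla _v g-(d \ \phi \star f)\, g. \end{aligned}$$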

Denote

$$\begin{aligned} \vec {N}=\nabla \ln f=\left( \begin{array}{c} \vec {N}_x \\ \vec {N}_v \end{array}\right) =\left( \begin{array}{c} \nabla _x \ln f \\ \nabla _v \ln f \end{array}\right) , \quad \vec {n}=\frac{\vec {N}}{|\vec {N}|} \end{aligned}$$

Applying \(\nabla _{x}\) and \(\nabla _{v}\) to (1.2), we have

$$\begin{aligned} \partial _{t}\left( \nabla _{\varvec{x}} f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{x}} f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{x}} f\right) =-\nabla _{\varvec{x}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} f-f \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f]),\nonumber \\ \end{aligned}$$
(2.1)

and

$$\begin{aligned} \partial _{t}\left( \nabla _{\varvec{v}} f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{v}} f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{v}} f\right) =-\nabla _{\varvec{v}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} f-\nabla _{\varvec{x}}f. \end{aligned}$$
(2.2)

Using the operator H, we rewrite (2.1) and (2.2) as

$$\begin{aligned} \begin{aligned}&(\partial _{t}+H)(\nabla _{\varvec{x}}f)= -\nabla _{\varvec{x}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} f-f \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f])-(d \ \phi \star _{(x,v)}f)\nabla _{\varvec{x}}f,\\&(\partial _{t}+H)(\nabla _{\varvec{v}}f)= -\nabla _{\varvec{v}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} f-\nabla _{\varvec{x}}f-(d \ \phi \star _{(x,v)}f)\nabla _{\varvec{v}}f. \end{aligned} \end{aligned}$$
(2.3)

To facilitate the calculation, we define the linear operator

$$\begin{aligned} L=v \cdot \nabla _x () +L[f] \cdot \nabla _v (), \end{aligned}$$

so that, for any differentiable function \(g\), we have

$$\begin{aligned} (\partial _{t}+H)(g)=(\partial _{t}+L)(g)-(d \ \phi \star f) g. \end{aligned}$$

By simple calculation, we have

$$\begin{aligned} \begin{aligned} (\partial _{t}+L)\left( \nabla _{\varvec{x}} \ln f\right)&= \partial _{t}\left( \nabla _{\varvec{x}} \ln f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{x}} \ln f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{x}} \ln f\right) \\&= \dfrac{\partial _{t}\left( \nabla _{\varvec{x}} f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{x}} f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{x}} f\right) }{f}-\dfrac{\left( \partial _{t}f+\varvec{v} \cdot \nabla _{\varvec{x}}f+L[f] \cdot \nabla _{\varvec{v}}f\right) \nabla _{\varvec{x}} f}{f^2}\\&= \dfrac{-\nabla _{\varvec{x}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} f-f \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f])}{f}+\dfrac{\nabla _{\varvec{x}}f \cdot \nabla _{\varvec{v}} \cdot L[f]}{f}\\&= \dfrac{-\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} f-f \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f])}{f}\\&= -\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f]) \end{aligned} \end{aligned}$$

Similarly,

$$\begin{aligned} \begin{aligned} (\partial _{t}+L)\left( \nabla _{\varvec{v}} \ln f\right)&= \partial _{t}\left( \nabla _{\varvec{v}} \ln f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{v}} \ln f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{v}} \ln f\right) \\&= \dfrac{\partial _{t}\left( \nabla _{\varvec{v}} f\right) +\varvec{v} \cdot \nabla _{\varvec{x}}\left( \nabla _{\varvec{v}} f\right) +L[f] \cdot \nabla _{\varvec{v}}\left( \nabla _{\varvec{v}} f\right) }{f}-\dfrac{\left( \partial _{t}f+\varvec{v} \cdot \nabla _{\varvec{x}}f+L[f] \cdot \nabla _{\varvec{v}}f\right) \nabla _{\varvec{v}} f}{f^2}\\&= \dfrac{-\nabla _{\varvec{v}} f \cdot \nabla _{\varvec{v}} \cdot L[f]-\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} f-\nabla _{\varvec{x}}f}{f}+\dfrac{\nabla _{\varvec{v}}f \cdot \nabla _{\varvec{v}} \cdot L[f]}{f}\\&= \dfrac{-\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} f-\nabla _{\varvec{x}}f}{f}\\&= -\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} \ln f. \end{aligned} \end{aligned}$$

In vector form, these two identities read

$$\begin{aligned} \left( \partial _t+L\right) \vec {N}=\left( \begin{array}{c} -\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f]) \\ -\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} \ln f \end{array}\right) . \end{aligned}$$
(2.4)

From the definition of the operator L, we know that it is a linear differential operator. Using (2.3) and (2.4) and setting \(\lambda (t)=e^{-Ct}\), we obtain

$$\begin{aligned} \begin{aligned} \left( \partial _t+H\right) \left( \lambda (t)|\nabla \ln f|\right)&=\lambda (t) \left( \partial _t+L\right) \left( |\nabla \ln f|\right) -(d \ \phi \star f)\left( \lambda (t)|\nabla \ln f|\right) -C\lambda (t)|\nabla \ln f|\\&=\lambda (t)\vec {n} \cdot \left( \partial _t+L\right) \vec {N}-\lambda (t)(d \ \phi \star f)\left( |\nabla \ln f|\right) -C\lambda (t)|\nabla \ln f|\\&= \lambda (t)\vec {n} \cdot \left( \begin{array}{c} -\nabla _{\varvec{x}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} (\nabla _{\varvec{v}} \cdot L[f]) \\ -\nabla _{\varvec{v}} L[f] \cdot \nabla _{\varvec{v}} \ln f- \nabla _{\varvec{x}} \ln f \end{array}\right) \\&\quad -\lambda (t)(d \ \phi \star f)|\nabla \ln f|-C\lambda (t)|\nabla \ln f| \\&\le \lambda (t)\left( 2R\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }+1+\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }\right) |\nabla \ln f|\\&\quad + d\lambda (t)\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) } |\nabla \ln f|-C\lambda (t)|\nabla \ln f|\\&=\lambda (t)\left( 2R\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }+1+\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }+d\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) } \right) |\nabla \ln f| -C\lambda (t)|\nabla \ln f|\\&=0. \end{aligned} \end{aligned}$$

Here \(C=2R\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }+1+\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }+d\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }.\) Since f is a probability density, \(\Vert f\Vert _{L^1\left( \mathbb {R}^{2 d}\right) }=1\), and we deduce that \(C=2R+2+d\).

For \(( x, v, t) \in \mathbb {R} ^d\times \mathbb {R} ^d\times [ 0, T) \), we denote by \((X(s;t,x,v), V(s;t,x,v))\) the particle trajectory passing through \((x,v)\) at time t, i.e.,

$$\begin{aligned} \begin{aligned}&\frac{\hbox {d}}{\hbox {d}s} X(s;t,x,v)=V(s;t,x,v),\quad s\in [0,T),\\&\frac{\hbox {d}}{\hbox {d}s} V(s;t,x,v)=-\int \limits _{\mathbb {R}^{2d}}\phi (|X(s;t,x,v)-y|)(V(s;t,x,v)-w)f(y,w,s) \hbox {d}y \hbox {d}w,\end{aligned} \end{aligned}$$

subject to initial data

$$\begin{aligned} X(t;t,x,v)=x,\quad V(t;t,x,v)=v. \end{aligned}$$

For notational simplicity, we write

$$\begin{aligned}{}[x(s),v(s)]:=[X(s;t,x,v),V(s;t,x,v)]. \end{aligned}$$

We have that

$$\begin{aligned} \begin{aligned}&\frac{\hbox {d}}{\hbox {d}s}\left( \lambda (s)|\nabla \ln f\left( s,X(s;t,x,v),V(s;t,x,v)\right) |\right) \\&=\partial _s \left( \lambda (s)|\nabla \ln f|\right) +\frac{\hbox {d} X}{\hbox {d}s} \cdot \nabla _{X} \left( \lambda (s)|\nabla \ln f|\right) +\frac{\hbox {d} V}{\hbox {d}s} \cdot \nabla _{V} \left( \lambda (s)|\nabla \ln f|\right) \\&=\left( \partial _s+H\right) \left( \lambda (s)|\nabla \ln f\left( s,X(s;t,x,v),V(s;t,x,v)\right) |\right) \\&\le 0, \end{aligned} \end{aligned}$$
(2.5)

which means that

$$\begin{aligned} \begin{aligned} \lambda (t)|\nabla \ln f\left( t,x,v\right) |&\le \lambda (0)|\nabla \ln f\left( 0,X(0;t,x,v),V(0;t,x,v)\right) |\\&=|\nabla \ln f\left( 0,X(0;t,x,v),V(0;t,x,v)\right) |. \end{aligned} \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} |\nabla \ln f\left( t,x,v\right) | \le e^{Ct}|\nabla \ln f\left( 0,X(0;t,x,v),V(0;t,x,v)\right) | \end{aligned}$$

for all \(t \in [0,T].\)

This completes the proof. \(\square \)
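The last step of the proof is an integrating-factor (Gronwall-type) argument along characteristics. The toy computation below illustrates only that mechanism on a scalar model \(y'(s)=a(s)\,y(s)\) with \(a(s)\le C\), standing in for \(|\nabla \ln f|\) along a trajectory. The rate \(a(s)\), the values \(R=1\), \(d=2\) and the Euler discretization are assumptions made for the example.

```python
import math


def integrating_factor_trace(rate, c, y0, t_end, steps):
    """Euler solution of y' = rate(s) * y together with lambda(s) * y(s).

    Returns a list of triples (s, y(s), exp(-c * s) * y(s)). When
    0 <= rate(s) <= c, the weighted quantity is non-increasing in s,
    which is the discrete analogue of (2.5).
    """
    dt = t_end / steps
    s, y = 0.0, y0
    trace = [(s, y, y)]
    for _ in range(steps):
        y += dt * rate(s) * y
        s += dt
        trace.append((s, y, math.exp(-c * s) * y))
    return trace


if __name__ == "__main__":
    R, d = 1.0, 2
    C = 2.0 * R + 2.0 + d  # the constant C = 2R + 2 + d from the proof
    trace = integrating_factor_trace(
        rate=lambda s: C * math.cos(s) ** 2,  # any rate bounded above by C
        c=C, y0=1.0, t_end=1.0, steps=1000,
    )
    t, y_t, weighted = trace[-1]
    print("y(t) <= exp(C t) y(0):", y_t <= math.exp(C * t) * trace[0][1])
```

Since \(\lambda (s)y(s)\) does not increase, evaluating it at \(s=t\) and \(s=0\) gives \(y(t)\le e^{Ct}y(0)\), which mirrors the final estimate of the proof.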

3 The existence of weak solution of Liouville equation

In this section, we state the existence of weak solutions of the Liouville equation (1.3).

Notations: We denote \(X=\left( x_1, \cdots , x_N\right) \) and \(V=\left( v_1, \cdots , v_N\right) \) while keeping \(x \in \mathbb {R}^d\) and \(v \in \mathbb {R}^d\) for the variables. We also use \(z_i=\left( x_i, v_i\right) , z=(x, v)\) and \(Z=\left( z_1, \cdots , z_N\right) \).

For the formulation we refer to Ref. [25], and for the method of proof to Ref. [61].

Proposition 3.1

(Existence of weak solution of Liouville equation (1.3)) Assume that \(\phi \) is a bounded function and that the initial data \(f_N^0 \ge 0\) satisfies the following assumptions

  1. (1)
    $$\begin{aligned} f_N^0 \in L^{\infty }\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \bigcap L^1\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \end{aligned}$$

    with

    $$\begin{aligned} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N^0 \mathrm {~d} Z=1. \end{aligned}$$
  2. (2)
    $$\begin{aligned} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N^0 \ln f_N^0 \mathrm {~d} Z<\infty . \end{aligned}$$
  3. (3)
    $$\begin{aligned} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} \sum _{i=1}^N\left( 1+\left| x_i\right| ^{2}+\left| v_i\right| ^{2}\right) f_N^0 \mathrm {~d} Z<\infty . \end{aligned}$$

Then there exists \(f_N \ge 0\) in \(C\left( \mathbb {R}_{+}, L^1\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \right) \cap L^{\infty }\left( \mathbb {R}_{+}, L^{\infty }\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \right) \) which is a solution to (1.3) in the sense of distributions and satisfies

  1. (1)
    $$\begin{aligned} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N(t, Z) \mathrm {~d} Z=1 \end{aligned}$$

    for a.e. t.

  2. (2)
    $$\begin{aligned} \begin{aligned}&\int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N(t, Z) \ln f_N(t, Z) \mathrm {~d} Z\\ \le&\dfrac{d}{N}\sum _{i=1}^N \sum _{j \ne i}^{N}\int _0^t \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} \phi \left( x_i-x_j\right) {f_N(s, Z)} \mathrm {~d} Z \mathrm {~d} s + \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N^0 \ln f_N^0 \mathrm {~d} Z \end{aligned} \end{aligned}$$
    (3.1)

    for a.e. t.

  3. (3)
    $$\begin{aligned} \sup _{t \in [0, T]} \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} \sum _{i=1}^N\left( 1+\left| x_i\right| ^{2}+\left| v_i\right| ^{2}\right) f_N(t, Z) \mathrm {~d} Z<\infty \end{aligned}$$

    for any \(T<\infty \).

Remark 3.1

As mentioned in [25], we do not obtain uniqueness of the weak solution in Proposition 3.1. Uniqueness and, more generally, well-posedness of the Cauchy problem for advection equations such as (1.3) are typically handled through the theory of renormalized solutions introduced in [62] and further developed in [63] (a theoretical description can be found in [64]).

4 Global relative entropy estimate for the Liouville equation

In this section, we first introduce the concept of propagation of chaos and recall the Csiszár–Kullback–Pinsker inequality. Next, we present a new law of large numbers at the exponential scale. Finally, we combine these tools with the results of the previous two sections to prove propagation of chaos.

Definition 4.1

[28] (f-Kac’s chaotic) Consider \(E\subset \mathbb {R}^d, f\in \mathscr {P}(E)\) a probability measure on E and \(G^N\in \mathscr {P}_{sym}( E^N) \) a sequence of probability measures on \(E^N,N\geqslant 1\), which are invariant under coordinate permutations. We say that \((G^N)\) is f-Kac’s chaotic (or has the “Boltzmann property”) if

$$\begin{aligned} \begin{aligned}\forall j\geqslant 1,\quad G_j^N\rightharpoonup f^{\otimes j}\quad \text {weakly in }\mathscr {P}\big (E^j\big )\text { as }N\rightarrow \infty ,\end{aligned} \end{aligned}$$

where \(G_j^N\) stands for the j-th marginal of \(G^N\) defined by

$$\begin{aligned} G_j^N:=\intop _{E^{N-j}}G^N \mathrm {~d}x_{j+1}\cdots \textrm{d}x_N. \end{aligned}$$

Remark 4.1

What we refer to as “propagation of chaos” is known as f-Kac’s chaotic. The study of other forms of chaos, such as f-entropy chaotic and f-Fisher information chaotic, can be found in Ref. [48]. Under certain conditions, the three types of chaos are equivalent.

For \(f_{N} \in \mathscr {P}_{sym}(E^{N})\), we define the k-marginal of \(f_N\) by

$$\begin{aligned} f_{N;k}=\int \limits _{E^{N-k} } f_N(Z) \mathrm {~d} z_{k+1}\textrm{d} z_{k+2}\cdots \textrm{d} z_{N}, \end{aligned}$$

where \(\mathscr {P}_{sym}(E^{N})\) is the totality of symmetric probability measures on \(E^{N}\).
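The definition of the k-marginal can be illustrated on a finite state space. The following is a minimal sketch, assuming a hypothetical three-point set E and a hypothetical one-particle law `f`; it sums out the last N-k coordinates of the product measure and checks that the result is again a product measure.

```python
from itertools import product

def product_measure(f, n):
    """Return f^{(x) n} on E^n as a dict {point: probability} (E = {0,...,len(f)-1})."""
    measure = {}
    for point in product(range(len(f)), repeat=n):
        prob = 1.0
        for state in point:
            prob *= f[state]
        measure[point] = prob
    return measure

def k_marginal(f_n, k):
    """Sum out coordinates k+1,...,N of a measure on E^N."""
    marginal = {}
    for point, prob in f_n.items():
        head = point[:k]
        marginal[head] = marginal.get(head, 0.0) + prob
    return marginal

# Hypothetical one-particle law on a three-point space (illustration only).
f = [0.2, 0.5, 0.3]
f_3 = product_measure(f, 3)      # N = 3
f_3_2 = k_marginal(f_3, 2)       # 2-marginal of f^{(x) 3}
f_2 = product_measure(f, 2)      # f^{(x) 2}
max_gap = max(abs(f_3_2[p] - f_2[p]) for p in f_2)
print(max_gap)
```

For a product measure the gap vanishes up to rounding; for a general symmetric measure the k-marginal need not factorize, which is what the chaos property quantifies.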

Lemma 4.1

[48] Assume that \(f_{N} \in \mathscr {P}_{sym}(E^{N})\), \(f_{N;k}\) is the k-marginal of \(f_N\) and that \(f \in \mathscr {P}(E)\). Then we have

$$\begin{aligned} \mathcal {H}_k\left( f_{N;k} \mid f^{\otimes k}\right) \le \mathcal {H}_N\left( f_N \mid f^{\otimes N}\right) , \end{aligned}$$

where \(f^{\otimes k}=f(z_{1})f(z_{2})\cdots f(z_{k}).\)
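Lemma 4.1 can be checked by hand in a small case. The sketch below assumes a hypothetical symmetric measure on \(\{0,1\}^2\) (an equal mixture of two product measures) and a hypothetical uniform reference law; it compares the rescaled entropies for \(N=2\), \(k=1\).

```python
from math import log

def kl(mu, nu):
    """Relative entropy sum mu*ln(mu/nu) for dicts on the same finite set."""
    return sum(p * log(p / nu[x]) for x, p in mu.items() if p > 0.0)

# Hypothetical data: f_2 is a symmetric mixture of a(x)a and b(x)b on {0,1}^2.
a, b = [0.9, 0.1], [0.3, 0.7]
f = [0.5, 0.5]                      # reference one-particle law
states = range(2)

f_2 = {(i, j): 0.5 * a[i] * a[j] + 0.5 * b[i] * b[j]
       for i in states for j in states}
f_ref_2 = {(i, j): f[i] * f[j] for i in states for j in states}

# 1-marginal of f_2 and the one-particle reference.
f_2_1 = {i: sum(f_2[(i, j)] for j in states) for i in states}
f_ref_1 = {i: f[i] for i in states}

h_1 = kl(f_2_1, f_ref_1) / 1        # rescaled entropy of the marginal (k = 1)
h_2 = kl(f_2, f_ref_2) / 2          # rescaled entropy of the full law (N = 2)
print(h_1, h_2)
```

In this example the marginal entropy is strictly smaller, reflecting the correlation carried by the mixture.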

Lemma 4.2

[65] Let X be a measurable space equipped with a measurable distance d, let \(p\ge 1\) and let \(\nu \) be a probability measure on X. Assume that there exist \(x_{0}\in X\) and \(\alpha >0\) such that

$$\begin{aligned} \int e^{\alpha d(x_0,x)^{2p}} \mathrm {~d} \nu (x)<\infty . \end{aligned}$$

Then

$$\begin{aligned} \forall \mu \in \mathscr {P}(X),\quad W_p(\mu ,\nu )\leqslant CH(\mu |\nu )^{\frac{1}{2p}}, \end{aligned}$$

where

$$\begin{aligned} C:=2\inf _{x_0\in X,\alpha >0}\left( \frac{1}{2\alpha }\Big (1+\ln \int e^{\alpha d(x_0,x)^{2p}} \mathrm {~d}\nu (x)\Big )\right) ^{\frac{1}{2p}}<+\infty . \end{aligned}$$

According to Lemma 4.2, when the space X is bounded, we have the following control relation of p-Wasserstein distance and relative entropy.

Lemma 4.3

[65] When X is bounded, a simpler bound holds: \(\forall \mu \in \mathscr {P}(X)\),

$$\begin{aligned} W_p( \mu , \nu ) \leqslant 2^{\frac{1}{2p} } \textrm{diam}(X) H(\mu |\nu ) ^{\frac{1}{2p}} \end{aligned}$$

where \(\textrm{diam}(X): = \sup \{ d( x, y); x, y\in X\}.\)
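For \(p=1\) the bound of Lemma 4.3 can be tested numerically on a bounded set. The following sketch assumes a hypothetical uniform grid on \([0,1]\) and randomly generated discrete measures; on the real line \(W_1\) is the integral of the absolute difference of the distribution functions.

```python
import random
from math import log, sqrt

def wasserstein_1(mu, nu, points):
    """W_1 on the real line: integral of |F_mu - F_nu| over the grid."""
    total, cdf_mu, cdf_nu = 0.0, 0.0, 0.0
    for i in range(len(points) - 1):
        cdf_mu += mu[i]
        cdf_nu += nu[i]
        total += abs(cdf_mu - cdf_nu) * (points[i + 1] - points[i])
    return total

def relative_entropy(mu, nu):
    """H(mu | nu) for strictly positive nu on a finite set."""
    return sum(p * log(p / q) for p, q in zip(mu, nu) if p > 0.0)

def random_distribution(rng, size):
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

points = [i / 9 for i in range(10)]   # hypothetical grid, X = {0, 1/9, ..., 1}
diam = points[-1] - points[0]
rng = random.Random(1)
bound_holds = True
for _ in range(500):
    mu = random_distribution(rng, len(points))
    nu = random_distribution(rng, len(points))
    lhs = wasserstein_1(mu, nu, points)
    rhs = sqrt(2.0) * diam * sqrt(relative_entropy(mu, nu))   # p = 1
    bound_holds = bound_holds and lhs <= rhs + 1e-12
print(bound_holds)
```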

The following lemma presents the celebrated Csiszár–Kullback–Pinsker inequality (see [24, 66,67,68,69]).

Lemma 4.4

[24, 66,67,68,69] The total variation distance between two probability measures \(\nu \) and \(\mu \) on X is defined by

$$\begin{aligned} \Vert \nu -\mu \Vert _{TV}=\sup |\nu (A)-\mu (A)|, \end{aligned}$$

where the supremum runs over all measurable \(A \subset X\). Then the following inequality

$$\begin{aligned}\Vert \nu -\mu \Vert _{TV}^2\le 2H(\nu |\mu )\end{aligned}$$

holds.
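The inequality of Lemma 4.4 is easy to probe on a finite set, where the supremum over measurable sets equals half the \(\ell ^1\) distance. The sketch below uses randomly generated (hypothetical) distributions and records the worst observed ratio \(\Vert \nu -\mu \Vert _{TV}^2 / (2H(\nu |\mu ))\).

```python
import random
from math import log

def total_variation(nu, mu):
    """sup_A |nu(A) - mu(A)| on a finite set, i.e. half the l^1 distance."""
    return 0.5 * sum(abs(p - q) for p, q in zip(nu, mu))

def relative_entropy(nu, mu):
    """H(nu | mu) for strictly positive mu on a finite set."""
    return sum(p * log(p / q) for p, q in zip(nu, mu) if p > 0.0)

def random_distribution(rng, size):
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

rng = random.Random(0)
worst_ratio = 0.0
for _ in range(1000):
    nu = random_distribution(rng, 5)
    mu = random_distribution(rng, 5)
    h = relative_entropy(nu, mu)
    if h > 0.0:
        ratio = total_variation(nu, mu) ** 2 / (2.0 * h)
        worst_ratio = max(worst_ratio, ratio)
print(worst_ratio)
```

With this normalization of the total variation the ratio stays well below 1; the same inequality with the \(L^1\) norm in place of the supremum over sets corresponds to a ratio four times larger, still bounded by 1.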

Remark 4.2

For a bounded signed measure \(\nu \), the total variation is given by

$$\begin{aligned} \Vert \nu \Vert _{\textrm{TV}}:=\sup _{|f|\le 1}\langle \nu ,f\rangle , \end{aligned}$$

where the supremum is over measurable functions f with \(|f| \le 1\). Therefore, for \(f_N, g_N \in \mathscr {P}(E^N )\), Lemma 4.4 gives

$$\begin{aligned} ||f_N-g_N||_{L^{1}} \le \sqrt{2N\mathcal {H}_N\left( f_N \mid g_N\right) }. \end{aligned}$$

Two important lemmas in this paper are provided below, and their proofs can be found in Ref. [26].

Lemma 4.5

[26] For any two probability densities \(\rho _N\) and \(\bar{\rho }_N\) on \(\mathbb {R}^{dN}\), and any \(\Phi \in L^{\infty }\left( \mathbb {R}^{dN}\right) \), one has that \(\forall \eta >0\)

$$\begin{aligned} \int \limits _{\mathbb {R}^{dN}} \Phi \rho _N \mathrm {~d} X^N \le \frac{1}{\eta }\left( \mathcal {H}_N\left( \rho _N \mid \bar{\rho }_N\right) +\frac{1}{N} \ln \int \limits _{\mathbb {R}^{dN}} \bar{\rho }_N e^{N \eta \Phi } \mathrm {~d} X^N\right) . \end{aligned}$$
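The inequality of Lemma 4.5 is a form of the Gibbs variational principle and can be tested on a finite set standing in for \(\mathbb {R}^{dN}\). The sketch below is an illustration under assumptions: the distributions, the bounded function `phi`, and the values of `n` and `eta` are hypothetical, and the rescaled entropy is taken as the relative entropy divided by `n`.

```python
import random
from math import exp, log

def relative_entropy(rho, rho_bar):
    """Unscaled relative entropy of rho with respect to rho_bar."""
    return sum(p * log(p / q) for p, q in zip(rho, rho_bar) if p > 0.0)

def random_distribution(rng, size):
    weights = [rng.random() + 1e-3 for _ in range(size)]
    total = sum(weights)
    return [w / total for w in weights]

def gap(rho, rho_bar, phi, n, eta):
    """Right-hand side minus left-hand side of the inequality in Lemma 4.5."""
    lhs = sum(f * p for f, p in zip(phi, rho))
    h_n = relative_entropy(rho, rho_bar) / n            # rescaled entropy
    log_mgf = log(sum(q * exp(n * eta * f) for f, q in zip(phi, rho_bar)))
    rhs = (h_n + log_mgf / n) / eta
    return rhs - lhs

rng = random.Random(2)
min_gap = float("inf")
for _ in range(500):
    rho = random_distribution(rng, 6)
    rho_bar = random_distribution(rng, 6)
    phi = [rng.uniform(-1.0, 1.0) for _ in range(6)]    # bounded test function
    for n in (1, 5, 20):
        for eta in (0.1, 1.0, 3.0):
            min_gap = min(min_gap, gap(rho, rho_bar, phi, n, eta))
print(min_gap)
```

The gap is nonnegative for every choice of `eta`, which is why the proof below is free to tune this parameter.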

Lemma 4.6

[26] Consider \(\bar{\rho } \in L^1\left( \mathbb {R}^d\right) \) with \(\bar{\rho } \ge 0\) and \(\int _{\mathbb {R}^d} \bar{\rho } \textrm{d} x=1\). Consider further any \(\phi (x, z) \in L^{\infty }\) with

$$\begin{aligned} \alpha :=C\left( \sup _{p \ge 1} \frac{\left\| \sup _z|\phi (\cdot , z)|\right\| _{L^p(\bar{\rho })}}{p}\right) ^2<1, \end{aligned}$$
(4.1)

where C is a universal constant. Assume that \(\phi \) satisfies the following cancellations

$$\begin{aligned} \int \limits _{\mathbb {R}^d} \phi (x, z) \bar{\rho }(x) \mathrm {~d} x=0 \quad \forall z, \quad \int \limits _{\mathbb {R}^d} \phi (x, z) \bar{\rho }(z) \mathrm {~d} z=0 \quad \forall x. \end{aligned}$$
(4.2)

Then

$$\begin{aligned} \int \limits _{\mathbb {R}^{dN}} \bar{\rho }_N \exp \left( \frac{1}{N} \sum _{i, j=1}^N \phi \left( x_i, x_j\right) \right) \textrm{d} X^N \le \frac{2}{1-\alpha }<\infty , \end{aligned}$$

where we recall that \(\bar{\rho }_N\left( t, X^N\right) =\prod _{i=1}^N \bar{\rho }\left( t, x_i\right) \).

Remark 4.3

Lemma 4.6 provides a large deviation estimate. A simpler probabilistic proof of this estimate is presented in ([70], Section 5).

Consider the solution f of the kinetic equation (1.2) under the assumptions in Theorem 2.1. We define the tensor product of f as

$$\begin{aligned} \bar{f}_N(t, X, V)=f^{\otimes N}=\prod _{i=1}^N f\left( t, x_i, v_i\right) . \end{aligned}$$

We consider a weak solution \(f_N\left( t, Z\right) \) of the Liouville equation (1.3) under the assumptions of Proposition 3.1. Our method revolves around the control of the rescaled relative entropy

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)=\frac{1}{N} \int \limits _{\mathbb {R}^{2d N}} f_N\left( t, Z\right) \ln \frac{f_N\left( t, Z\right) }{\bar{f}_N\left( t, Z\right) } \mathrm {~d} Z. \end{aligned}$$

Theorem 4.1

(Propagation of chaos) Let \(T>0\). Suppose that f is a classical solution of the kinetic equation (1.2) on \([0,T]\) satisfying the assumptions of Theorem 2.1, namely

  1. (1)

    \( f_0 \in C^1\left( \mathbb {R}^{2 d}\right) \cap W^{1, \infty }\left( \mathbb {R}^{2 d}\right) \) and \({\text {supp}}_{(x, v)} f_0\) is bounded.

  2. (2)

    \( \sum \nolimits _{0 \le |\alpha |+|\beta | \le 1}\left| \nabla _x^\alpha \nabla _v^\beta f_0\right|<\infty \) and \(|\nabla \ln f_{0}| < \infty . \)

Assume that \(f_N\) is a weak solution of Liouville equation (1.3) satisfying the assumptions of Proposition 3.1, that is

  1. (3)

    \(f_N^0 \in L^{\infty }\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \cap L^1\left( (\mathbb {R}^d \times \mathbb {R}^d)^N\right) \) and \(\int _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N^0 \mathrm {~d} Z=1.\)

  2. (4)

    \(\int _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} f_N^0 \ln f_N^0 \mathrm {~d} Z<\infty \) and \(\int \nolimits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} \sum _{i=1}^N\left( 1+\left| x_i\right| ^{2}+\left| v_i\right| ^{2}\right) f_N^0 \mathrm {~d} Z<\infty .\)

We assume that the initial conditions for the relative entropy satisfy

$$\begin{aligned} \mathcal {H}_N(f_N^0|\bar{f}_N^0)=\frac{1}{N}\int \nolimits _{\mathbb {R}^{2dN}}f_N^0\ln \bigg (\frac{f_N^0}{\bar{f}_N^0}\bigg )\mathrm {~d}Z\rightarrow 0\quad \text {as}\quad N\rightarrow \infty . \end{aligned}$$

Then for any \(t\in [0,T]\), there exists a positive constant \(\bar{C}\) depending on \(T, R, d, |\nabla _{v} \ln f_{0}|, \Vert \phi \Vert _{L^{\infty }}\) such that

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)\le \Bigg ( \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) + \frac{\bar{C}}{N} \Bigg ) e^{t/ \gamma }\rightarrow 0 \end{aligned}$$

as \(N\rightarrow \infty \), where \(\gamma = \left( \frac{p^2}{2C_{1} \bar{A}^{2p}}\right) ^{\frac{1}{2p}}\), \(R:=\max \limits _{v\in \Sigma (0)}|v|\) and \(\bar{A}= 2 d \Vert \phi \Vert _{L^{\infty }}+2 R\Vert \phi \Vert _{L^{\infty }} e^{(2R+2+d)T} |\nabla _{v} \ln f_{0}|.\)

Further, for any fixed \(1\le k\le N\), the k-marginal \(f_{N,k} \) converges to the k-tensor product \(f^{\otimes k}\) in \(L^{1}\) as \(N\rightarrow \infty \), that is

$$\begin{aligned}\Vert f_{N,k}-f^{\otimes k}\Vert _{L^1}\rightarrow 0\quad \text {as } N\rightarrow \infty .\end{aligned}$$

Proof

By a simple calculation, we obtain

$$\begin{aligned} \partial _{t} \bar{f}_N(t, X, V)=\partial _{t}\left( \prod _{i=1}^N f\left( t, x_i, v_i\right) \right) =\bar{f}_N \sum _{i=1}^{N} \dfrac{\partial _{t}f\left( t, x_i, v_i\right) }{f\left( t, x_i, v_i\right) } \end{aligned}$$

and

$$\begin{aligned} \sum _{i=1}^N v_i \cdot \nabla _{x_i}\bar{f}_N(t, X, V)= \bar{f}_N \sum _{i=1}^{N} v_i \cdot \dfrac{\nabla _{x_i}f\left( t, x_i, v_i\right) }{f\left( t, x_i, v_i\right) }. \end{aligned}$$

Further, we have

$$\begin{aligned} \dfrac{1}{N}\sum _{i=1}^N \sum _{j =1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \nabla _{v_i}\bar{f}_N=\bar{f}_N \dfrac{1}{N}\sum _{i=1}^N \sum _{j =1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \dfrac{\nabla _{v_i}f\left( t, x_i, v_i\right) }{f\left( t, x_i, v_i\right) }. \end{aligned}$$

Assume that f satisfies (1.2). By Theorem 2.1, we know that f is \(C^{1}\) and \({\text {supp}}_{(x, v)} f_t(\cdot )\) is bounded, therefore, f decays at infinity without ever vanishing. Therefore \(\ln \bar{f}_{N}\) can be used as a test function against \(f_{N}\) which is a weak solution to the Liouville equation (1.3) so that

$$\begin{aligned} \begin{aligned}&\int \limits _{\mathbb {R}^{2dN}} f_N \ln \bar{f}_N \mathrm {~d} Z\\ =&\int \limits _{\mathbb {R}^{2dN}} f^{0}_N \ln \bar{f}^{0}_N \mathrm {~d} Z\\&-\int \limits _{0}^{t} \int \limits _{\mathbb {R}^{2dN}} f_N \left( \partial _{t}\ln \bar{f}_{N}+\sum _{i=1}^{N} v_{i} \cdot \nabla _{x_{i}}\ln \bar{f}_{N} +\dfrac{1}{N}\sum _{i=1}^N \sum _{j=1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \cdot \nabla _{v_i}\ln \bar{f}_N \right) \mathrm {~d} Z \mathrm {~d} s. \end{aligned} \end{aligned}$$
(4.3)

For ease of calculation, we denote \(f\left( t, x_i, v_i\right) :=f_{i}\). Since \(f_{i}\) satisfies (1.2), we have

$$\begin{aligned} \partial _t f_{i}+v_{i} \cdot \nabla _{x_i} f_{i}+L[f_{i}] \nabla _{v_i} f_{i}+f_{i} \nabla _{v_i}\cdot L[f_{i}]=0. \end{aligned}$$

By performing a simple calculation, we can obtain

$$\begin{aligned} \begin{aligned}&\partial _{t}\ln \bar{f}_{N}+\sum _{i=1}^{N} v_{i} \cdot \nabla _{x_{i}}\ln \bar{f}_{N} +\dfrac{1}{N}\sum _{i=1}^N \sum _{j =1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \cdot \nabla _{v_i}\ln \bar{f}_N\\ =&\sum _{i=1}^{N} \left[ \dfrac{\partial _t f_{i}+v_{i} \cdot \nabla _{x_i} f_{i}+L[f_{i}] \nabla _{v_i} f_{i}+f_{i} \nabla _{v_i}\cdot L[f_{i}] }{f\left( t, x_i, v_i\right) } +\frac{\dfrac{1}{N} \sum \limits _{j=1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \nabla _{v_i} f_{i}-L[f_{i}] \nabla _{v_i} f_{i} }{f\left( t, x_i, v_i\right) } \right] \\&+\sum _{i=1}^{N}\left( -\nabla _{v_i}\cdot L[f_{i}]-\dfrac{d}{N} \sum _{j =1}^{N} \phi \left( x_i-x_j\right) \right) + \dfrac{d}{N} \sum _{i=1}^{N}\sum _{j =1}^{N} \phi \left( x_i-x_j\right) . \end{aligned} \end{aligned}$$
(4.4)

Notice that \(\nabla _{v_i}\cdot L[f_{i}]=d\phi \star \rho (x_i)\) and \(L[f_{i}]=-U \star f(x_i, v_i)\), therefore we have that

$$\begin{aligned} \begin{aligned}&\partial _{t}\ln \bar{f}_{N}+\sum _{i=1}^{N} v_{i} \cdot \nabla _{x_{i}}\ln \bar{f}_{N} +\dfrac{1}{N}\sum _{i=1}^N \sum _{j =1}^{N} \phi \left( x_i-x_j\right) \left( v_j-v_i\right) \cdot \nabla _{v_i}\ln \bar{f}_N\\ =&R_{N} + T_{N}+\dfrac{d}{N} \sum _{i=1}^{N}\sum _{j =1}^{N} \phi \left( x_i-x_j\right) , \end{aligned} \end{aligned}$$
(4.5)

where

$$\begin{aligned} R_N=\frac{1}{N} \sum _{i, j=1}^N \nabla _{v_i} \ln f\left( x_i, v_i\right) \cdot \left\{ \phi \left( x_i-x_j\right) \left( v_j-v_i\right) +U \star f\left( x_i,v_i\right) \right\} , \end{aligned}$$
(4.6)

and

$$\begin{aligned} T_N=\frac{1}{N} \sum _{i=1}^N d \left\{ \phi \star \rho (x_i)- \sum _{j=1}^{N}\phi \left( x_i-x_j\right) \right\} . \end{aligned}$$
(4.7)

Combining (4.3) and (4.5), we get

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2dN}} f_N \ln \bar{f}_N \mathrm {~d} Z=&\int \limits _{\mathbb {R}^{2dN}} f^{0}_N \ln \bar{f}^{0}_N \mathrm {~d} Z-\int \limits _{0}^{t} \int \limits _{\mathbb {R}^{2dN}} {f_N(s, Z)} \left( R_{N}+T_{N} \right) \mathrm {~d} Z \mathrm {~d} s\\&-\dfrac{d}{N}\sum _{i=1}^N \sum _{j =1}^{N}\int \limits _0^t \int \limits _{\left( \mathbb {R}^d \times \mathbb {R}^d\right) ^N} \phi \left( x_i-x_j\right) {f_N(s, Z)} \mathrm {~d} Z \mathrm {~d} s. \end{aligned} \end{aligned}$$
(4.8)

Subtracting (4.8) from (3.1), we get

$$\begin{aligned} \begin{aligned}&\int \limits _{\mathbb {R}^{2dN}} f_N \ln {f}_N \mathrm {~d} Z-\int \limits _{\mathbb {R}^{2dN}} f_N \ln \bar{f}_N \mathrm {~d} Z\\ \le&\int \limits _{\mathbb {R}^{2dN}} f_N^0 \ln f_N^0 \mathrm {~d} Z-\int \limits _{\mathbb {R}^{2dN}} f^{0}_N \ln \bar{f}^{0}_N \mathrm {~d} Z-\int \limits _{0}^{t} \int \limits _{\mathbb {R}^{2dN}} {f_N} \left( R_{N}+T_{N} \right) \mathrm {~d} Z \mathrm {~d} s. \end{aligned} \end{aligned}$$
(4.9)

From the definition of relative entropy (Definition 1.3), we obtain

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) \le \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) -\dfrac{1}{N}\int \limits _{0}^{t} \int \limits _{\mathbb {R}^{2dN}} {f_N} \left( R_{N}+T_{N} \right) \mathrm {~d} Z \mathrm {~d} s. \end{aligned}$$
(4.10)

Notice that

$$\begin{aligned} \begin{aligned}&\dfrac{1}{N}\left( R_{N}+T_{N} \right) \\ =&\dfrac{1}{\gamma }\frac{1}{N^2} \sum _{i, j=1}^N \Bigg \{\gamma \nabla _{v_i} \ln f\left( x_i, v_i\right) \cdot \bigg [\phi \left( x_i-x_j\right) \left( v_j-v_i\right) +U \star f\left( x_i,v_i\right) \bigg ]+d \gamma \bigg (\phi \left( x_i-x_j\right) -\phi \star _{x} \rho (x_i)\bigg )\Bigg \} \\ =:&\dfrac{1}{\gamma }\frac{1}{N^2} \sum _{i, j=1}^N\psi (x_{i},v_{i},x_{j},v_{j})=\dfrac{1}{\gamma }\frac{1}{N^2} \sum _{i, j=1}^N\big (\psi _{1}(x_{i},x_{j})+\psi _{2}(x_{i},v_{i},x_{j},v_{j})\big ), \end{aligned} \end{aligned}$$
(4.11)

where \(\gamma \) is a positive constant, which will be discussed later when condition (4.1) in Lemma 4.6 is verified.

For ease of calculation in the following, we denote

$$\begin{aligned} \psi (x_{i},v_{i},x_{j},v_{j}):=\gamma \nabla _{v_i} \ln f\left( x_i, v_i\right) \cdot \bigg [\phi \left( x_i-x_j\right) \left( v_j-v_i\right) +U \star f\left( x_i,v_i\right) \bigg ]+d \gamma \bigg (\phi \left( x_i-x_j\right) -\phi \star _{x} \rho (x_i)\bigg ), \\ \psi _{1}(x_{i},x_{j}):=d \gamma \bigg (\phi \left( x_i-x_j\right) -\phi \star _{x} \rho (x_i)\bigg ) \end{aligned}$$

and

$$\begin{aligned} \psi _{2}(x_{i},v_{i},x_{j},v_{j}):=\gamma \nabla _{v_i} \ln f\left( x_i, v_i\right) \cdot \bigg [\phi \left( x_i-x_j\right) \left( v_j-v_i\right) +U \star f\left( x_i,v_i\right) \bigg ], \end{aligned}$$

where \(\rho (x)=\int \limits _{\mathbb {R}^d }f(x,v)\mathrm {~d}v.\)

Therefore, from (4.10) and (4.11), we obtain the relative entropy inequality

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) \le \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) -\int \limits _{0}^{t} \dfrac{1}{\gamma }\int \limits _{\mathbb {R}^{2dN}} {f_N} \bigg [\frac{1}{N^2} \sum _{i, j=1}^N \psi (x_{i},v_{i},x_{j},v_{j})\bigg ]\mathrm {~d} Z \mathrm {~d} s. \end{aligned}$$
(4.12)

Applying Lemma 4.5, we obtain

$$\begin{aligned} \begin{aligned}&\dfrac{1}{\gamma }\int \limits _{\mathbb {R}^{2dN}} {f_N} \bigg [\frac{1}{N^2} \sum _{i, j=1}^N\psi (x_{i},v_{i},x_{j},v_{j})\bigg ]\mathrm {~d} Z\\&\quad \le \dfrac{1}{\gamma }\mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)+\frac{1}{N} \dfrac{1}{\gamma }\ln \int \limits _{\mathbb {R}^{2d N}} \bar{f}_N \exp \Bigg \{\frac{1}{N} \sum _{i, j=1}^N \psi (x_{i},v_{i},x_{j},v_{j}) \Bigg \}\mathrm {~d} Z. \end{aligned} \end{aligned}$$
(4.13)

Below we verify that \(\psi \) satisfies the conditions of Lemma 4.6. Because the system’s mass is conserved, we know that f is a probability measure. Notice that

$$\begin{aligned} \begin{aligned} \psi (z,u; x,v)&=\psi _{1}(z; x)+\psi _{2}(z,u; x,v)\\&=d\gamma (\phi (|z-x|)-\phi \star \rho (z))+\gamma \left( \phi \left( |z-x|\right) \left( v-u\right) +U \star f\left( z,u\right) \right) \cdot \nabla _{u} \ln f\left( z,u\right) . \end{aligned} \end{aligned}$$

Below we compute the integrals of \(\psi _{1}\) and \(\psi _{2}\) separately with respect to the density function f.

By simple calculation, we have that

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi _{1}(z, x) f(x,v) \mathrm {~d} x\textrm{d}v&=d\gamma \int \limits _{\mathbb {R}^{2d}} \phi (|z-x|) f(x,v) \mathrm {~d} x\textrm{d}v-d\gamma \int \limits _{\mathbb {R}^{2d}} \phi \star \rho (z) f(x,v) \mathrm {~d} x\textrm{d}v \\&=d\gamma \phi \star \rho (z)-d\gamma \phi \star \rho (z)\\&=0 \end{aligned} \end{aligned}$$
(4.14)

and

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi _{1}(z, x) f(z,u) \mathrm {~d} z\textrm{d} u&=d\gamma \int \limits _{\mathbb {R}^{2d}} \phi (|z-x|) f(z,u)\mathrm {~d} z\textrm{d} u-d\gamma \int \limits _{\mathbb {R}^{2d}} \phi \star \rho (z) f(z,u) \mathrm {~d} z\textrm{d} u \\&=d\gamma \phi \star \rho (x)-d\gamma \int \limits _{\mathbb {R}^{d}} \phi \star \rho (z) \rho (z) \mathrm {~d} z. \end{aligned} \end{aligned}$$
(4.15)

By direct computation, we obtain

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi _{2}(z,u; x,v) f(x,v) \mathrm {~d}x \mathrm {~d}v=&\gamma \int \limits _{\mathbb {R}^{2d}} \phi (z-x)(v-u)f(x,v) \cdot \nabla _{u} \ln f\left( z,u\right) \mathrm {~d}x \mathrm {~d}v\\&+\gamma \int \limits _{\mathbb {R}^{2d}} \int \limits _{\mathbb {R}^{2d}} \phi (z-y)(u-w)f(y,w)\mathrm {~d}y \mathrm {~d}w \cdot \nabla _{u} \ln f\left( z,u\right) f(x,v) \mathrm {~d}x \mathrm {~d}v\\ =&\gamma \nabla _{u} \ln f\left( z,u\right) \cdot \int \limits _{\mathbb {R}^{2d}} \phi (z-x)(v-u)f(x,v)\mathrm {~d}x \mathrm {~d}v\\&+\gamma \nabla _{u} \ln f\left( z,u\right) \cdot \int \limits _{\mathbb {R}^{2d}} \phi (z-y)(u-w)f(y,w)\mathrm {~d}y \mathrm {~d}w \\ =&0. \end{aligned} \end{aligned}$$
(4.16)

and, integrating by parts with respect to \(u\),

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi _{2}(z,u; x,v) f(z,u) \mathrm {~d}z \textrm{d}u=&\gamma \int \limits _{\mathbb {R}^{2d}} \phi (z-x)(v-u)f(z,u) \cdot \nabla _{u} \ln f\left( z,u\right) \mathrm {~d}z \textrm{d}u\\&+\gamma \int \limits _{\mathbb {R}^{2d}} \int \limits _{\mathbb {R}^{2d}} \phi (z-y)(u-w)f(y,w)\mathrm {~d}y \textrm{d}w \cdot \nabla _{u} \ln f\left( z,u\right) f(z,u) \mathrm {~d}z \textrm{d}u\\ =&\gamma \int \limits _{\mathbb {R}^{2d}} \phi (z-x)(v-u)\cdot \nabla _{u} f\left( z,u\right) \mathrm {~d}z \textrm{d}u\\&+\gamma \int \limits _{\mathbb {R}^{2d}} \int \limits _{\mathbb {R}^{2d}} \phi (z-y)(u-w)f(y,w)\mathrm {~d}y \textrm{d}w \cdot \nabla _{u} f\left( z,u\right) \mathrm {~d}z \textrm{d}u\\ =&-d\gamma \int \limits _{\mathbb {R}^{2d}} \phi (z-x) f\left( z,u\right) \mathrm {~d}z \textrm{d}u\\&+d\gamma \int \limits _{\mathbb {R}^{2d}} \int \limits _{\mathbb {R}^{2d}} \phi (z-y)f(y,w)\mathrm {~d}y \textrm{d}w ~f\left( z,u\right) \mathrm {~d}z \textrm{d}u\\ =&-d\gamma \phi \star \rho (x)+d\gamma \int \limits _{\mathbb {R}^{d}} \phi \star \rho (z) \rho (z) \mathrm {~d}z \end{aligned} \end{aligned}$$
(4.17)

By (4.14), (4.15), (4.16) and (4.17), we find that

$$\begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi (z,u; x,v) f(x,v) \mathrm {~d}x \textrm{d}v=\int \limits _{\mathbb {R}^{2d}} \psi _{1}(z; x) f(x,v) \mathrm {~d}x \textrm{d}v+\int \limits _{\mathbb {R}^{2d}} \psi _{2}(z,u; x,v) f(x,v) \mathrm {~d}x \textrm{d}v=0. \end{aligned}$$

and

$$\begin{aligned} \int \limits _{\mathbb {R}^{2d}} \psi (z,u; x,v) f(z,u) \mathrm {~d}z \textrm{d}u=\int \limits _{\mathbb {R}^{2d}} \psi _{1}(z; x) f(z,u) \mathrm {~d}z \textrm{d}u+\int \limits _{\mathbb {R}^{2d}} \psi _{2}(z,u; x,v) f(z,u) \mathrm {~d}z \textrm{d}u=0. \end{aligned}$$

Thus, we have proved that \(\psi \) satisfies Condition (4.2).
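The centering computation behind (4.14) can be reproduced with a discrete density. The sketch below is a minimal illustration under assumptions: the communication weight `phi`, the particle positions and weights, and the values of `d` and `gamma` are all hypothetical, positions are taken scalar for brevity, and `d` enters only as a prefactor. It checks that \(\psi _1(z;\cdot )\) integrates to zero against \(\rho \) for several values of z.

```python
def phi(r, beta=0.5):
    """Hypothetical bounded communication weight (Cucker-Smale type)."""
    return 1.0 / (1.0 + r * r) ** beta

# Hypothetical discrete spatial density rho = sum_k weights[k] * delta(positions[k]).
positions = [-1.0, -0.2, 0.3, 1.5]
weights = [0.1, 0.4, 0.3, 0.2]
d, gamma = 2, 0.7

def phi_star_rho(z):
    """Convolution (phi * rho)(z) for the discrete density."""
    return sum(w * phi(z - x) for x, w in zip(positions, weights))

def psi_1(z, x):
    """psi_1(z; x) = d * gamma * (phi(z - x) - (phi * rho)(z))."""
    return d * gamma * (phi(z - x) - phi_star_rho(z))

def integral_against_rho(z):
    """Integral of psi_1(z; x) against rho(x) dx, here a finite sum."""
    return sum(w * psi_1(z, x) for x, w in zip(positions, weights))

residuals = [abs(integral_against_rho(z)) for z in (-2.0, 0.0, 0.5, 3.0)]
print(max(residuals))
```

The residual vanishes up to rounding because subtracting \(\phi \star \rho (z)\) removes exactly the mean of \(\phi (z-\cdot )\) under \(\rho \).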

Since \(f(x,v)\) has compact support with respect to v, Theorem 2.1 gives

$$\begin{aligned} \begin{aligned} ||\psi ||_{L^{\infty }}&\le 2\gamma d ||\phi ||_{L^{\infty }}+2\gamma ||\phi ||_{L^{\infty }} R |\nabla _{v} \ln f|\\&\le \gamma \bigg (2 d ||\phi ||_{L^{\infty }}+2 R||\phi ||_{L^{\infty }} e^{(2R+2+d)T} |\nabla _{v} \ln f_{0}|\bigg ) \\&=:\gamma \bar{A}. \end{aligned} \end{aligned}$$
(4.18)

Below we choose a suitable \(\gamma \) so that \(\psi \) satisfies Condition (4.1) of Lemma 4.6. By (4.18), we obtain

$$\begin{aligned} \alpha =C_{1}\left( \sup _{p \ge 1} \frac{\left\| \sup _z|\psi (\cdot , z)|\right\| _{L^p(\bar{\rho })}}{p}\right) ^2\le C_{1}\dfrac{\gamma ^{2p}\bar{A}^{2p}}{p^2}. \end{aligned}$$

Choosing \(\gamma = \left( \frac{p^2}{2C_{1} \bar{A}^{2p}}\right) ^{\frac{1}{2p}}\), we have \(\alpha \le \frac{1}{2}\). By Lemma 4.6, we have

$$\begin{aligned} \begin{aligned} \int \limits _{\mathbb {R}^{2d N}} \bar{f}_N \exp \Bigg \{\frac{1}{N} \sum _{i, j=1}^N \psi (x_{i},v_{i},x_{j},v_{j}) \Bigg \}\mathrm {~d} Z \le \dfrac{2}{1-\alpha }<\infty . \end{aligned} \end{aligned}$$

So, there exists a positive constant C such that

$$\begin{aligned} \dfrac{1}{\gamma }\ln \int \limits _{\mathbb {R}^{2d N}} \bar{f}_N \exp \Bigg \{\frac{1}{N} \sum _{i, j=1}^N \psi (x_{i},v_{i},x_{j},v_{j}) \Bigg \}\mathrm {~d} Z\le C. \end{aligned}$$
(4.19)

Therefore, we combine (4.12), (4.13) and (4.19) to obtain the important inequality

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)\le \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) +\int \limits _{0}^{t} \Bigg \{\dfrac{1}{\gamma }\mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (s)+\frac{C}{N} \Bigg \} \mathrm {~d} s. \end{aligned}$$
(4.20)

For all \(t\in [0,T]\), Gronwall’s inequality then gives

$$\begin{aligned} \mathcal {H}_N\left( f_N \mid \bar{f}_N\right) (t)\le \Bigg ( \mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) + \frac{\bar{C}}{N} \Bigg ) e^{t/ \gamma }, \end{aligned}$$

where \(\bar{C}\) depends on \(T, R, d, |\nabla _{v} \ln f_{0}|, ||\phi ||_{L^{\infty }}\). When \(\mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) \le \frac{C_{1}}{N}\), Lemma 4.1 and the classical Csiszár–Kullback–Pinsker inequality (Lemma 4.4) yield

$$\begin{aligned} \begin{aligned} \left\| f_{N, k}-f^{\otimes k}\right\| _{L^1}&\le \sqrt{2 k \mathcal {H}_k\left( f_{N, k} \mid f^{\otimes k}\right) } \\&\le \sqrt{2 k \mathcal {H}_N\left( f_{N} \mid f^{\otimes N}\right) } \\&\le C \sqrt{\frac{k}{N}}, \end{aligned} \end{aligned}$$

where C depends on \(T, R, d, |\nabla _{v} \ln f_{0}|, ||\phi ||_{L^{\infty }}\). Therefore, we have

$$\begin{aligned}\Vert f_{N,k}-f^{\otimes k}\Vert _{L^1}\rightarrow 0\quad \text {as } N\rightarrow \infty .\end{aligned}$$

This completes the proof. \(\square \)
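The Gronwall step from (4.20) to the final bound can be sketched numerically. The example below assumes hypothetical values for the constants `c`, `gamma`, `c_1` and the final time, takes \(\bar{C}=Ct\), and integrates the extremal case of (4.20) (equality instead of inequality) with an explicit Euler scheme, comparing it with the closed-form bound.

```python
from math import exp

def gronwall_bound(h0, c, n, gamma, t):
    """Closed-form bound (h0 + c*t/n) * exp(t/gamma), i.e. C-bar = c*t."""
    return (h0 + c * t / n) * exp(t / gamma)

def euler_worst_case(h0, c, n, gamma, t, steps=10000):
    """Explicit Euler for y' = y/gamma + c/n, the equality case of (4.20)."""
    y, dt = h0, t / steps
    for _ in range(steps):
        y += dt * (y / gamma + c / n)
    return y

# Hypothetical constants, chosen only for illustration.
c, gamma, t_final, c_1 = 2.0, 0.5, 1.0, 1.0
rows = []
for n in (10, 100, 1000):
    h0 = c_1 / n                      # initial entropy of order 1/N
    y = euler_worst_case(h0, c, n, gamma, t_final)
    bound = gronwall_bound(h0, c, n, gamma, t_final)
    rows.append((n, y, bound))
    print(n, y, bound)
```

With initial entropy of order \(1/N\), both the extremal solution and the bound scale like \(1/N\) at fixed time, which is the rate entering the \(L^1\) estimate for the marginals.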

Remark 4.4

When \(\mathcal {H}_N\left( f^0_N \mid \bar{f}^0_N\right) \le \frac{C_{1}}{N}\), by the Talagrand-type inequality (see Lemma 4.3), we have that

$$\begin{aligned} W_p(f_{N,k}(t),f(t)^{\otimes k}) \le C \bigg ( \frac{k}{N}\bigg )^{1/2p}, \end{aligned}$$

for any \(p\ge 1\), where the positive constant C depends on \(T, R, d, |\nabla _{v} \ln f_{0}|, \Vert \phi \Vert _{L^{\infty }}.\) In particular, when \(p= 1\), one has

$$\begin{aligned} W_1(f_{N,k}(t),f(t)^{\otimes k}) \le C \dfrac{\sqrt{k}}{\sqrt{N}}. \end{aligned}$$

Remark 4.5

Due to the limitations of the method, we cannot give the optimal rate of convergence with respect to N, and our convergence rate does not recover the results in Refs. [36, 37]. It may be possible to improve the convergence rate with respect to N to the optimal one by employing the local entropy method proposed in Refs. [42, 43].