1 Introduction

Let \(\{W_t\}_{t\geqslant 0}\) be a d-dimensional Brownian motion and \(\{L_t\}_{t\geqslant 0}\) be a d-dimensional Lévy process on a complete probability space \((\Omega , {\mathscr {F}}, {\mathbb {P}})\). Denote by \(N(\mathrm{d}z,\mathrm{d}t)\) the jump measure of \(L_t\). By the Lévy–Itô decomposition, the pure-jump Lévy process \(L_t\) admits the representation:

$$\begin{aligned} L_t=\int _0^t\!\!\int _{[0<|z|<1]}z\widehat{N}(\mathrm{d}z,\mathrm{d}s) +\int _0^t\!\!\int _{[|z|\geqslant 1]}zN(\mathrm{d}z,\mathrm{d}s), \end{aligned}$$

where \(\widehat{N}(\mathrm{d}z,\mathrm{d}s):=N(\mathrm{d}z,\mathrm{d}s)-\nu (\mathrm{d}z)\mathrm{d}s\) is the martingale measure and \(\nu \) is the characteristic measure of N under \({\mathbb {P}}\) satisfying \(\int _{{\mathbb {R}}^d\setminus \{0\}}(|z|^2\wedge 1)\nu (\mathrm{d}z)<\infty \). In this paper, we further assume \(\int _{[|z|\geqslant 1]}|z|^p\nu (\mathrm{d}z)<\infty \) for all \(p\geqslant 1\). Let \(\mathcal {P}_2\) be the collection of all probability measures with finite second moments on \({\mathbb {R}}^d\). Define the Wasserstein distance

$$\begin{aligned} {\mathbb {W}}_2(\mu _1,\mu _2):=\inf \limits _{\pi \in \mathfrak {C}(\mu _1,\mu _2)} \left( \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d}|x-y|^2\pi (\mathrm{d}x,\mathrm{d}y)\right) ^{\frac{1}{2}}, \end{aligned}$$

where \(\mathfrak {C}(\mu _1,\mu _2)\) is the class of all couplings of \(\mu _1\) and \(\mu _2\). Then (\(\mathcal {P}_2, {\mathbb {W}}_2\)) is a Polish space.
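
For instance, the only coupling of two Dirac measures \(\delta _x\) and \(\delta _y\) is \(\delta _{(x,y)}\), so

$$\begin{aligned} {\mathbb {W}}_2(\delta _x,\delta _y)=\left( \int _{{\mathbb {R}}^d\times {\mathbb {R}}^d}|u-v|^2\delta _{(x,y)}(\mathrm{d}u,\mathrm{d}v)\right) ^{\frac{1}{2}}=|x-y|,\quad x,y\in {\mathbb {R}}^d. \end{aligned}$$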

For measurable maps

$$\begin{aligned} b:{\mathbb {R}}^d\times \mathcal {P}_2\rightarrow {\mathbb {R}}^d,\ \ \ \sigma :{\mathbb {R}}^d\times \mathcal {P}_2\rightarrow {\mathbb {R}}^d\otimes {\mathbb {R}}^d, \end{aligned}$$

and

$$\begin{aligned} f:\mathcal {P}_2\rightarrow {\mathbb {R}}^d\otimes {\mathbb {R}}^d, \end{aligned}$$

we consider the following mean-field stochastic differential equation (SDE):

$$\begin{aligned} \mathrm{d}X_t=b(X_t,{\mathbb {P}}_{X_t})\mathrm{d}t+\sigma (X_t, {\mathbb {P}}_{X_t})\mathrm{d}W_t+f({\mathbb {P}}_{X_t})\mathrm{d}L_t, \end{aligned}$$
(1.1)

where \({\mathbb {P}}_{X_t}\) denotes the distribution of \(X_t\) under \({\mathbb {P}}\).

1.1 Background and Notations

Mean-field SDEs, also known as McKean–Vlasov equations, were first studied by Kac [15] in the framework of his study of the Boltzmann equation for the particle density in diluted monatomic gases, as well as of the stochastic toy model for the Vlasov kinetic equation of plasma. In [19], McKean studied the propagation of chaos in physical systems of N interacting particles related to Boltzmann's model for the statistical mechanics of rarefied gases. The limit of N-particle systems with weak interaction, formed by N equations forced by independent Brownian motions, can be described by the solution of a nonlinear deterministic evolution equation, known as the McKean–Vlasov equation. These processes are nonlinear Markov processes: their transition functions may depend not only on the current state of the process but also on its current distribution. Since then, these equations have attracted considerable attention; see [2, 10,11,12, 17] and references therein for the study of chaos propagation and the limit equations; [5, 13] for the regularity of the value function and associated PDEs; [26] for gradient estimates and Harnack inequalities in the diffusion case; and [14] for various continuity properties and Harnack inequalities for functional distribution-dependent SDEs. For more results, we refer to [16].

For distribution-independent SDEs driven by jump processes, gradient estimates were obtained in [22, 25, 27, 28] and references therein. For Eq. (1.1), when \(f\equiv 0\) and \(\sigma \) does not depend on the distribution, Wang [26] investigated gradient estimates by the coupling method. There seem to be no corresponding results for general equations of type (1.1). Hence, in this manuscript we use Malliavin calculus for Wiener–Poisson functionals as a technical tool to derive gradient estimates for Eq. (1.1). The two main steps, and the novelties, are: (1) we adopt the notion of derivative with respect to probability measures, first introduced by Lions [18], to study the Jacobian, which involves the distribution term; (2) for the different non-degeneracy conditions, we construct corresponding modified Malliavin matrices, which are closely related to the integration by parts formula and allow us to establish derivative formulas and thereby sharp estimates.

The second objective of the present paper is to study the exponential ergodicity of Eq. (1.1). In the diffusion case (i.e., \(f\equiv 0\)), the existence of the invariant measure was derived by Benachour et al. [2], Veretennikov [29] and others, and exponential convergence in the Wasserstein metric was established by Cattiaux et al. [7], Wang [26] and references therein. When \(f\equiv 0\) and \(\sigma (x,\mu )\equiv I\), exponential convergence to an invariant measure in total variation distance was investigated by Butkovsky [6] under the Veretennikov–Khasminskii condition. Based on the derivative estimates obtained in Theorem 1.2 and under a dissipative condition, we will prove exponential convergence of Eq. (1.1) to its unique invariant measure in the total variation metric.

We will use the following notations frequently:

  • Denote by \({\mathscr {B}}({\mathbb {R}}^d)\) the \(\sigma \)-algebra generated by all open sets of \({\mathbb {R}}^d\) and by \({\mathscr {B}}_b({\mathbb {R}}^d)\) the class of all bounded and \({\mathscr {B}}({\mathbb {R}}^d)\)-measurable functions with the norm \(\Vert f\Vert _\infty :=\sup \nolimits _{x\in {\mathbb {R}}^d}|f(x)|\). \(C^1_b({\mathbb {R}}^d)\) is the collection of all bounded and differentiable functions with bounded and continuous derivatives. \({\mathbb {S}}^d\) stands for the unit sphere of \({\mathbb {R}}^d\). \({\mathbb {R}}^d_0\) denotes \({\mathbb {R}}^d\setminus \{0\}\).

  • The Hilbert–Schmidt norm of a matrix A is denoted by \(\Vert A\Vert _{HS}\), which is defined by \(\Vert A\Vert _{HS}:=\sqrt{\sum _{i,j}a_{ij}^2}\).

  • The letter C with or without indices will denote an unimportant constant, whose values may change from one appearance to another.

1.2 Assumptions and Main Results

Assume there is a sub-\(\sigma \)-field \({\mathscr {F}}_0\) satisfying: \({\mathscr {F}}_0\) is independent of \(\{W_t\}_{t\geqslant 0}\) and \(\{L_t\}_{t\geqslant 0}\), and \({\mathscr {F}}_0\) is “rich enough” such that \(\mathcal {P}_2=\{{\mathbb {P}}_{\xi }:\xi \in L^2(\Omega ,{\mathscr {F}}_0,{\mathbb {P}})\}.\) Let \(\{{\mathscr {F}}_t\}_{t\geqslant 0}\) be the filtration generated by \(\{W_t\}_{t\geqslant 0}\) and \(\{L_t\}_{t\geqslant 0}\), completed and augmented by \({\mathscr {F}}_0\); that is,

$$\begin{aligned} {\mathscr {F}}_t:=\cap _{r>t}\sigma \{W_s, L_s: s\leqslant r\}\vee {\mathscr {F}}_0\vee {\mathscr {N}}, t\in [0,1], \end{aligned}$$
(1.2)

where \({\mathscr {N}}\) is the collection of all \({\mathbb {P}}\)-null sets.

Definition 1.1

  1.

    For any \(s\geqslant 0\), a càdlàg \({\mathscr {F}}_t\)-adapted process \(\{X_t\}_{t\geqslant s}\) on \({\mathbb {R}}^d\) is called a strong solution of Eq. (1.1) from time s, if

    $$\begin{aligned} \int _s^t{\mathbb {E}}\left( |b(X_r,{\mathbb {P}}_{X_r})|^2+\Vert \sigma (X_r,{\mathbb {P}}_{X_r})\Vert ^2_{HS}+|f({\mathbb {P}}_{X_r})|^2\right) \mathrm{d}r<\infty , \ \ t\geqslant s, \end{aligned}$$

    and \({\mathbb {P}}\)-a.s.,

    $$\begin{aligned} X_t&=X_s+\int _s^t\!\!b(X_r,{\mathbb {P}}_{X_r})\mathrm{d}r+\int _s^t\!\!\sigma (X_r,{\mathbb {P}}_{X_r})\mathrm{d}W_r +\int _s^tf({\mathbb {P}}_{X_r})\mathrm{d}L_r,\ t\geqslant s. \end{aligned}$$
  2.

    A triple \((\tilde{X}, \tilde{W}, \tilde{L})\) is called a weak solution to Eq. (1.1) from time s, if \(\tilde{W}\) is a d-dimensional Brownian motion on a complete filtered probability space (\(\tilde{\Omega },\{\tilde{{\mathscr {F}}}_t\}_{t\geqslant 0},\tilde{{\mathbb {P}}}\)), and \(\tilde{L}\) is a Lévy process with characteristic measure \(\nu \) under \(\tilde{{\mathbb {P}}}\), such that \(\tilde{X}_t\) solves

    $$\begin{aligned} \tilde{X}_t=\tilde{X}_s+\int _s^t\!\!b(\tilde{X}_r,\tilde{{\mathbb {P}}}_{\tilde{X}_r})\mathrm{d}r+\int _s^t\!\!\sigma (\tilde{X}_r,\tilde{{\mathbb {P}}}_{\tilde{X}_r})\mathrm{d}\tilde{W}_r +\int _s^tf(\tilde{{\mathbb {P}}}_{\tilde{X}_r})\mathrm{d}\tilde{L}_r, \ t\geqslant s. \end{aligned}$$
  3.

    Equation (1.1) is said to have weak uniqueness in \(\mathcal {P}_2\), if for any \(s\geqslant 0\), any two weak solutions from time s with a common initial distribution in \(\mathcal {P}_2\) are equal in law. To be precise, if \(s\geqslant 0\) and \((\tilde{X}_{s, t},\tilde{W}_t,\tilde{L}_t)_{t\geqslant s}\) with respect to \((\tilde{\Omega },\{\tilde{{\mathscr {F}}}_t\}_{t\geqslant 0},\tilde{{\mathbb {P}}})\) and \((X_{s, t}, W_t, L_t)_{t\geqslant s}\) with respect to \((\Omega ,\{{\mathscr {F}}_t\}_{t\geqslant 0},{\mathbb {P}})\) are weak solutions of (1.1), then \({\mathbb {P}}_{X_{s,s}}=\tilde{{\mathbb {P}}}_{\tilde{X}_{s,s}}\) implies \({\mathbb {P}}_{X_{s,\cdot }}=\tilde{{\mathbb {P}}}_{\tilde{X}_{s,\cdot }}\).

Let \(B_0\) be the open unit ball in \({\mathbb {R}}^d\) with the origin removed. For the Lévy measure \(\nu (\mathrm{d}z)\), we make the following assumption:

\((H_\nu )\) \(\nu |_{B_0}\) is absolutely continuous with respect to the Lebesgue measure \(\mathrm{d}z\); that is, there is a function \(\kappa :B_0\rightarrow (0,+\infty )\) such that

$$\begin{aligned} \nu (\mathrm{d}z)|_{B_0}=\kappa (z)\mathrm{d}z. \end{aligned}$$
(1.3)

Moreover, we assume the following regularity and order conditions:

  • for some \(c_0>0\),

    $$\begin{aligned} \kappa \in C^1(B_0;(0,\infty )), \ \ \ |\nabla \log \kappa (z)|\leqslant c_0|z|^{-1}, \quad \forall z\in B_0. \end{aligned}$$
    (1.4)
  • for some \(c_1>0\) and \(\alpha \in (0, 2)\),

    $$\begin{aligned} \lim \limits _{\epsilon \downarrow 0} \epsilon ^{\alpha -2}\int _{[|z|\leqslant \epsilon ]}|z|^2\nu (\mathrm{d}z)=c_1. \end{aligned}$$
    (1.5)
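
For instance, the truncated \(\alpha \)-stable measure \(\nu (\mathrm{d}z)=C_\alpha |z|^{-d-\alpha }I_{[0<|z|<1]}\mathrm{d}z\) of Remark 1.3 below satisfies both conditions: its density is \(\kappa (z)=C_\alpha |z|^{-d-\alpha }\), so that

$$\begin{aligned} |\nabla \log \kappa (z)|=(d+\alpha )|z|^{-1},\qquad \epsilon ^{\alpha -2}\int _{[|z|\leqslant \epsilon ]}|z|^2\nu (\mathrm{d}z) =C_\alpha \omega _{d-1}\epsilon ^{\alpha -2}\int _0^\epsilon r^{1-\alpha }\mathrm{d}r =\frac{C_\alpha \omega _{d-1}}{2-\alpha },\quad \epsilon \in (0,1), \end{aligned}$$

where \(\omega _{d-1}\) denotes the surface area of the unit sphere; hence (1.4) holds with \(c_0=d+\alpha \) and (1.5) with \(c_1=C_\alpha \omega _{d-1}/(2-\alpha )\). Moreover, since this \(\nu \) charges no jumps of size \(\geqslant 1\), the moment condition \(\int _{[|z|\geqslant 1]}|z|^p\nu (\mathrm{d}z)<\infty \) is trivially satisfied.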

Let us list assumptions on the coefficients.

(H1) b and \(\sigma \) are twice differentiable with respect to the first variable x, with bounded partial derivatives. b, \(\sigma \) and f, as well as their partial derivatives with respect to x, are Lipschitz continuous with respect to \(\mu \); that is, there exists a constant \(C>0\) such that

$$\begin{aligned}&|b(x,\mu _1)-b(x,\mu _2)|+\Vert \sigma (x,\mu _1)-\sigma (x,\mu _2)\Vert _{HS} +|f(\mu _1)-f(\mu _2)| \leqslant C{\mathbb {W}}_2(\mu _1,\mu _2), \end{aligned}$$

and

$$\begin{aligned} |\partial _xb(x,\mu _1)-\partial _xb(x,\mu _2)|+|\partial _x\sigma (x,\mu _1)-\partial _x\sigma (x,\mu _2)| \leqslant C{\mathbb {W}}_2(\mu _1,\mu _2), \end{aligned}$$

for all \( x\in {\mathbb {R}}^d \) and \(\mu _1,\mu _2\in \mathcal {P}_2\).

(H2) For each \(x\in {\mathbb {R}}^d\), each of the components of \(b(x,\cdot ), \sigma (x,\cdot )\) and \(f(\cdot )\) is in \(C_b^{1,1}(\mathcal {P}_2)\) (see Definition 2.2 below) with \(\sup \nolimits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2,y\in {\mathbb {R}}^d}|\partial _\mu b(x,\mu )(y)|<+\infty \) and \(\sup \nolimits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2,y\in {\mathbb {R}}^d}| \partial _\mu \sigma (x,\mu )(y)|<+\infty .\) Moreover, \(\partial _\mu b(\cdot ,\mu )(y)\) and \(\partial _\mu \sigma (\cdot ,\mu )(y)\) are differentiable with bounded derivatives; that is,

$$\begin{aligned} \Vert \partial _x\partial _\mu b\Vert _\infty :=\sup \limits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2, y\in {\mathbb {R}}^d}|\partial _x\partial _\mu b(x,\mu )(y)|<+\infty , \end{aligned}$$

and

$$\begin{aligned} \Vert \partial _x\partial _\mu \sigma \Vert _\infty :=\sup \limits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2,y\in {\mathbb {R}}^d}| \partial _x\partial _\mu \sigma (x,\mu )(y)|<+\infty . \end{aligned}$$

For \(x\in {\mathbb {R}}^d\), let \(\{X^x_t\}_{t\geqslant 0}\) be the solution to Eq.  (1.1) with initial value x.

Theorem 1.2

Assume (\(H_\nu \)), (H1) and (H2). The following statements hold:

  1.

    If \(\Vert \sigma ^{-1}\Vert _\infty :=\sup \nolimits _{y\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2}|\sigma ^{-1}(y,\mu )|<\infty \), then there exists \(C>0\) such that for each \(t\in (0,1]\), \(x,y\in {\mathbb {R}}^d\) and \(g\in C^1_b({\mathbb {R}}^d)\),

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)|\leqslant C\Vert g\Vert _\infty (1+|x|)t^{-\frac{1}{2}}, \end{aligned}$$
    (1.6)

    and

    $$\begin{aligned} |\nabla _y {\mathbb {E}}g(X^x_t)|\leqslant C\Vert g\Vert _\infty (1+|x|)|y|t^{-\frac{1}{2}}. \end{aligned}$$
    (1.7)
  2.

    If \(\Vert f^{-1}\Vert _\infty :=\sup \nolimits _{\mu \in \mathcal {P}_2}|f^{-1}(\mu )|<\infty \), then there exists \(C>0\) such that for each \(t\in (0,1]\), \(x,y\in {\mathbb {R}}^d\) and \(g\in C^1_b({\mathbb {R}}^d)\),

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)|\leqslant C\Vert g\Vert _\infty (1+|x|)t^{-\frac{1}{\alpha }}, \end{aligned}$$
    (1.8)

    and

    $$\begin{aligned} |\nabla _y {\mathbb {E}}g(X^x_t)|\leqslant C\Vert g\Vert _\infty (1+|x|)|y|t^{-\frac{1}{\alpha }}. \end{aligned}$$
    (1.9)

Remark 1.3

  1.

    If \(\Vert \sigma ^{-1}\Vert _\infty <\infty \) holds, the process L can be any Lévy process which is independent of W and has finite p-th moment for all \(p\geqslant 2\). In this case, the condition (\(H_\nu \)) can be removed.

  2.

    If \(\Vert f^{-1}\Vert _\infty <\infty \) holds, then for any \(\alpha \in (0,2)\) the order \(\frac{1}{\alpha }\) in the gradient estimates is sharp in short time when L is a truncated \(\alpha \)-stable process with characteristic measure \(\frac{C_\alpha }{|z|^{d+\alpha }}I_{[0<|z|<1]}\mathrm{d}z\) for some \(C_\alpha >0\).

As an immediate consequence of Theorem 1.2, we have the following.

Corollary 1.4

Assume (\(H_\nu \)), (H1) and (H2).

  1.

    If \(\Vert \sigma ^{-1}\Vert _\infty <\infty \) holds, then there exists \(C>0\) such that for each \(t\in (0,1]\) and \(x_1, x_2\in {\mathbb {R}}^d\),

    $$\begin{aligned} \int _{{\mathbb {R}}^d}|p_t(x_1,y)-p_t(x_2,y)|\mathrm{d}y \leqslant C(1+|x_1|+|x_2|)|x_1-x_2|t^{-\frac{1}{2}}, \end{aligned}$$

    where \(p_t(x_1,y)\) and \(p_t(x_2,y)\) denote the density functions of \(X^{x_1}_t\) and \(X^{x_2}_t\), respectively.

  2.

    If \(\Vert f^{-1}\Vert _\infty <\infty \) holds, then there exists \(C>0\) such that for each \(t\in (0,1]\) and \(x_1, x_2\in {\mathbb {R}}^d\),

    $$\begin{aligned} \int _{{\mathbb {R}}^d}|p_t(x_1,y)-p_t(x_2,y)|\mathrm{d}y \leqslant C(1+|x_1|+|x_2|)|x_1-x_2|t^{-\frac{1}{\alpha }}, \end{aligned}$$

    where \(p_t(x_1,y)\) and \(p_t(x_2,y)\) denote the density functions of \(X^{x_1}_t\) and \(X^{x_2}_t\), respectively.

The proofs of Theorem 1.2 and the corollary are given in Sect. 3.

It is well known that under the Lipschitz condition, Eq. (1.1) has a unique strong solution (see Theorem 3.1 below). Hence, the solution is a Markov process. Precisely speaking, letting \(\{X^\xi _{s,t}\}_{t\geqslant s}\) denote the solution of Eq. (1.1) from time s with \({\mathscr {F}}_s\)-measurable and square-integrable initial value \(X^\xi _{s,s}=\xi \), the existence and uniqueness imply

$$\begin{aligned} X_{s,t}^\xi =X^{X^\xi _{s,r}}_{r,t},\ \ \ t\geqslant r\geqslant s\geqslant 0. \end{aligned}$$

Due to this property, we may define a nonlinear semigroup \(\{P^*_{s,t}\}_{t\geqslant s}\) on \(\mathcal {P}_2\) by letting \(P^*_{s,t}\mu ={\mathbb {P}}_{X^\xi _{s,t}}\) for \({\mathbb {P}}_{\xi }=\mu \in \mathcal {P}_2\). For simplicity, we will use \(P^*_t\) to denote \(P^*_{0,t}\). For more detailed discussion about this kind of nonlinear semigroup, we refer to [26, p. 598].
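
In particular, taking distributions on both sides of the flow property above yields the semigroup relation

$$\begin{aligned} P^*_{s,t}\mu =P^*_{r,t}\big (P^*_{s,r}\mu \big ),\quad t\geqslant r\geqslant s\geqslant 0,\ \mu \in \mathcal {P}_2. \end{aligned}$$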

A probability measure \(\hat{\mu }\) is said to be an invariant measure of \(P^*_t\) if \(P^*_t\hat{\mu }=\hat{\mu }\) for all \(t\geqslant 0\). The solution is said to be exponentially ergodic if for any \(\mu \in \mathcal {P}_2\), \(P^*_t\mu \) converges to \(\hat{\mu }\) exponentially fast in total variation distance. In order to investigate the exponential ergodicity of \(P^*_t\), we impose the following dissipative condition.

(H3) There exist constants \(C_1\) and \(C_2\) with \(C_2>C_1\geqslant 0\) such that for each \(x_1, x_2\in {\mathbb {R}}^d\) and \(\mu _1,\mu _2\in \mathcal {P}_2\),

$$\begin{aligned}&2\langle b(x_1,\mu _1)-b(x_2,\mu _2),x_1-x_2\rangle +\Vert \sigma (x_1,\mu _1)-\sigma (x_2,\mu _2)\Vert _{HS}^2\\&\ \ +\int _{{\mathbb {R}}^d_0}|z|^2\nu (\mathrm{d}z)|f(\mu _1)-f(\mu _2)|^2 \leqslant C_1{\mathbb {W}}_2(\mu _1,\mu _2)^2-C_2|x_1-x_2|^2. \end{aligned}$$
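
As a simple illustration of (H3) (with constants \(\beta ,\theta \) introduced only for this sketch), take the linear drift \(b(x,\mu )=-\beta x+\theta \int _{{\mathbb {R}}^d}y\,\mu (\mathrm{d}y)\) together with constant \(\sigma \) and f. Since \(|\int y\,\mu _1(\mathrm{d}y)-\int y\,\mu _2(\mathrm{d}y)|\leqslant {\mathbb {W}}_2(\mu _1,\mu _2)\), Young's inequality gives

$$\begin{aligned} 2\langle b(x_1,\mu _1)-b(x_2,\mu _2),x_1-x_2\rangle \leqslant \theta {\mathbb {W}}_2(\mu _1,\mu _2)^2-(2\beta -\theta )|x_1-x_2|^2, \end{aligned}$$

so (H3) holds with \(C_1=\theta \) and \(C_2=2\beta -\theta \) whenever \(\beta >\theta \geqslant 0\).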

Define the total variation distance on \(\mathcal {P}_2\) as

$$\begin{aligned} \Vert \mu _1-\mu _2\Vert _{TV}:=\sup \limits _{A\in {\mathscr {B}}({\mathbb {R}}^d)}|\mu _1(A)-\mu _2(A)|,\ \ \mu _1,\mu _2\in \mathcal {P}_2. \end{aligned}$$
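
In contrast with \({\mathbb {W}}_2\), this distance ignores the geometry of \({\mathbb {R}}^d\): for \(x\ne y\), taking \(A=\{x\}\) shows that

$$\begin{aligned} \Vert \delta _x-\delta _y\Vert _{TV}=1,\quad \text {while}\quad {\mathbb {W}}_2(\delta _x,\delta _y)=|x-y|, \end{aligned}$$

so convergence in \({\mathbb {W}}_2\) does not imply convergence in total variation.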

We have the following exponentially ergodic property.

Theorem 1.5

Let (\(H_\nu \)) and (H1)–(H3) hold. Assume \(\Vert \sigma ^{-1}\Vert _\infty <\infty \) or \(\Vert f^{-1}\Vert _\infty <\infty \). Then there is a unique invariant measure \(\hat{\mu }\) for \(P^*_t\) such that for any \(\mu \in \mathcal {P}_2\),

$$\begin{aligned} \Vert P^*_t\mu -\hat{\mu }\Vert _{TV} \leqslant C\left[ 1+\left( \int _{{\mathbb {R}}^d}|x|^2\mu (\mathrm{d}x)\right) ^{\frac{1}{2}}\right] \text {e}^{-\frac{1}{2}(C_2-C_1)t}, \end{aligned}$$

where C is a constant independent of \(\mu \) and t.

The rest of this manuscript is organized as follows. In Sect. 2, we introduce some preliminaries: Lions' definition of the derivative of functions defined on \(\mathcal {P}_2\), and Malliavin calculus for Wiener–Poisson functionals. In Sect. 3, we give the proofs of the main results. An example is shown in Sect. 4.

2 Preliminaries

In this section, we introduce some basic elements of differentiability of functions on \(\mathcal {P}_2\) and Malliavin calculus for Wiener–Poisson functionals.

2.1 Derivative in the Wasserstein Space

We now introduce the notion of differentiability of functions on \(\mathcal {P}_2\), which was first introduced by Lions [18] and presented in the notes of Cardaliaguet [8].

Let \((\tilde{\Omega }, \tilde{{\mathscr {F}}}, \tilde{\mathbb {P}})\) be a complete probability space. Denote by \(L^2(\tilde{\Omega }; \mathbb {R}^d)\) the Hilbert space of all square-integrable \({\mathbb {R}}^d\)-valued random variables, equipped with the inner product

$$\begin{aligned} \langle \xi _1,\xi _2\rangle _{L^2}:=\tilde{{\mathbb {E}}}(\xi _1\cdot \xi _2),\quad \forall \xi _1,\xi _2\in L^2(\tilde{\Omega }; \mathbb {R}^d). \end{aligned}$$

Assume \(\tilde{{\mathscr {F}}}\) is rich enough so that for each \(\mu \in \mathcal {P}_2\) there exists a random variable \(\xi \in L^2(\tilde{\Omega }; \mathbb {R}^d)\) such that \(\tilde{\mathbb {P}}_\xi =\mu \), i.e., \(\mu \) is the distribution of \(\xi \) under \(\tilde{{\mathbb {P}}}\).

Let \(f:\mathcal {P}_2\rightarrow \mathbb {R}\) be a function. Define its lifted function over \(L^2(\tilde{\Omega };{\mathbb {R}}^d)\),

$$\begin{aligned} \tilde{f}(\xi ):=f(\tilde{{\mathbb {P}}}_\xi ),\quad \forall \xi \in L^2(\tilde{\Omega };{\mathbb {R}}^d). \end{aligned}$$

Definition 2.1

A function \(f:\mathcal {P}_2\rightarrow \mathbb {R}\) is said to be differentiable at \(\mu _0\in \mathcal {P}_2\), if there is a random variable \(\xi _0\in L^2(\tilde{\Omega };{\mathbb {R}}^d)\) with \(\tilde{{\mathbb {P}}}_{\xi _0}=\mu _0\) such that the lifted function \(\tilde{f}\) is Fréchet differentiable at \(\xi _0\).

If f is differentiable at \(\mu _0\), there exists a linear continuous mapping \(D\tilde{f}(\xi _0): L^2(\tilde{\Omega };\mathbb {R}^d)\rightarrow \mathbb {R}\) such that

$$\begin{aligned} \tilde{f}(\xi _0+\eta )-\tilde{f}(\xi _0)=D\tilde{f}(\xi _0)(\eta )+o(|\eta |_{L^2}),\ \ \ \eta \in L^2(\tilde{\Omega };{\mathbb {R}}^d), \end{aligned}$$

as \(|\eta |_{L^2}\rightarrow 0\). By Riesz' representation theorem, there is a (\(\tilde{{\mathbb {P}}}\)-a.s.) unique random variable \(\zeta \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\) such that

$$\begin{aligned} D\tilde{f}(\xi _0)(\eta )=\langle \eta ,\zeta \rangle _{L^2}, \end{aligned}$$

for all \(\eta \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\). According to Theorems 6.2 and 6.5 in [8], there is a Borel function \(h_0:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) such that \(\zeta =h_0(\xi _0)\), \(\tilde{{\mathbb {P}}}\)-a.s., and the function \(h_0\) depends only on the law \(\mu _0\), not on \(\xi _0\) itself. Taking into account the definition of \(\tilde{f}\), this allows us to write, for any \(\xi \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\),

$$\begin{aligned} f(\tilde{{\mathbb {P}}}_\xi )-f(\tilde{{\mathbb {P}}}_{\xi _0})=\tilde{{\mathbb {E}}}[h_0(\xi _0) \cdot (\xi -\xi _0)]+o(|\xi -\xi _0|_{L^2}). \end{aligned}$$
(2.1)

We call \(\partial _{\mu }f(\mu _0)(\cdot ):=h_0(\cdot )\) the derivative of f at \(\mu _0\). Note that \(\partial _{\mu }f(\mu _0)\) is only \(\mu _0\)-a.e. uniquely determined, and it allows us to express \(D\tilde{f}(\xi _0)\) as a function of any random variable \(\xi _0\) with distribution \(\mu _0\), irrespective of where this random variable is defined. In particular, the differentiation formula (2.1) is invariant under modifications of the probability space \((\tilde{\Omega },\tilde{{\mathscr {F}}},\tilde{{\mathbb {P}}})\) and of the variables \(\xi _0\) and \(\xi \) used for the representation of f, in the sense that \(D\tilde{f}(\xi _0)\) always reads as \(\partial _\mu f(\mu _0)\), whatever the choice of \(\xi _0\) is.
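
As a concrete instance of (2.1), consider the second-moment functional \(f(\mu ):=\int _{{\mathbb {R}}^d}|y|^2\mu (\mathrm{d}y)\), whose lift is \(\tilde{f}(\xi )=\tilde{{\mathbb {E}}}|\xi |^2\). Then

$$\begin{aligned} \tilde{f}(\xi _0+\eta )-\tilde{f}(\xi _0)=2\tilde{{\mathbb {E}}}(\xi _0\cdot \eta )+\tilde{{\mathbb {E}}}|\eta |^2 =\langle \eta ,2\xi _0\rangle _{L^2}+o(|\eta |_{L^2}), \end{aligned}$$

so that \(\partial _\mu f(\mu )(y)=2y\) for every \(\mu \in \mathcal {P}_2\).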

Since we will consider functions \(f:\mathcal {P}_2\rightarrow {\mathbb {R}}\) which are differentiable at all elements of \(\mathcal {P}_2\), we suppose that \(\tilde{f}:L^2(\tilde{\Omega };{\mathbb {R}}^d)\rightarrow {\mathbb {R}}\) is Fréchet differentiable over the whole space \(L^2(\tilde{\Omega }; {\mathbb {R}}^d)\). In this case, the derivative \(\partial _\mu f(\tilde{{\mathbb {P}}}_{\xi })\) is defined \(\tilde{{\mathbb {P}}}_{\xi }\)-a.e. for all \(\xi \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\). Due to Lemma 3.2 in [9], if the Fréchet derivative \(D\tilde{f}:L^2(\tilde{\Omega };{\mathbb {R}}^d)\rightarrow L(L^2(\tilde{\Omega };{\mathbb {R}}^d))\) is Lipschitz continuous with a Lipschitz constant \(K\in (0,+\infty )\), then for all \(\xi \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\) there is a \(\tilde{{\mathbb {P}}}_\xi \)-version of \(\partial _\mu f(\tilde{{\mathbb {P}}}_\xi ):{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) such that

$$\begin{aligned} |\partial _\mu f(\tilde{{\mathbb {P}}}_\xi )(y_1)-\partial _\mu f(\tilde{{\mathbb {P}}}_\xi )(y_2)|\leqslant K|y_1-y_2|, \quad \forall y_1, y_2\in {\mathbb {R}}^d. \end{aligned}$$

Definition 2.2

A function \(f:\mathcal {P}_2\rightarrow {\mathbb {R}}\) is said to be continuously differentiable with Lipschitz-continuous and bounded derivatives, if there exists for all \(\xi \in L^2(\tilde{\Omega };{\mathbb {R}}^d)\) a \(\tilde{{\mathbb {P}}}_\xi \)-modification of \(\partial _\mu f(\tilde{{\mathbb {P}}}_\xi )(\cdot )\), also denoted by \(\partial _\mu f(\tilde{{\mathbb {P}}}_\xi )(\cdot )\), such that \(\partial _\mu f:\mathcal {P}_2({\mathbb {R}}^d)\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) is bounded and Lipschitz continuous, that is, there is some constant \(C>0\) such that:

(i) \(|\partial _\mu f(\mu )(y)|\leqslant C\), for all \(\mu \in \mathcal {P}_2\) and \(y\in {\mathbb {R}}^d\);

(ii) \(|\partial _\mu f(\mu _1)(y_1)-\partial _\mu f(\mu _2)(y_2)|\leqslant C\left( {\mathbb {W}}_2(\mu _1,\mu _2)+|y_1-y_2|\right) \), for all \(\mu _1, \mu _2\in \mathcal {P}_2\) and \(y_1, y_2\in {\mathbb {R}}^d\). In this case, the function \(\partial _\mu f\) is regarded as the derivative of f, and the collection of all such functions f is denoted by \(C_b^{1,1}(\mathcal {P}_2)\).

Remark 2.3

It is known that (cf. [5, Remark 2.1]) if f belongs to \(C^{1,1}_b(\mathcal {P}_2)\), then the version of \(\partial _\mu f({\mathbb {P}}_\xi )(\cdot )\) indicated in Definition 2.2 is unique.

Example 2.4

Given two twice continuously differentiable functions \(h:{\mathbb {R}}^d\rightarrow {\mathbb {R}}\) and \(g:{\mathbb {R}}\rightarrow {\mathbb {R}}\) with bounded derivatives, we consider

$$\begin{aligned} f(\tilde{{\mathbb {P}}}_{\xi }):=g(\tilde{{\mathbb {E}}} h(\xi )), \ \ \xi \in L^2(\tilde{\Omega },{\mathbb {R}}^d). \end{aligned}$$

Then, given any \(\xi _0\in L^2(\tilde{\Omega },{\mathbb {R}}^d)\),

$$\begin{aligned} \tilde{f}(\xi ):=f(\tilde{{\mathbb {P}}}_\xi )=g(\tilde{{\mathbb {E}}} h(\xi )) \end{aligned}$$

is Fréchet differentiable at \(\xi _0\), and

$$\begin{aligned} \tilde{f}(\xi _0+\eta )-\tilde{f}(\xi _0)&=\int _0^1g'(\tilde{{\mathbb {E}}} h(\xi _0+s\eta ))\tilde{{\mathbb {E}}}\left( \nabla h(\xi _0+s\eta )\cdot \eta \right) \mathrm{d}s\\&=g'(\tilde{{\mathbb {E}}} h(\xi _0))\tilde{{\mathbb {E}}}\left( \nabla h(\xi _0)\cdot \eta \right) +o\left( |\eta |_{L^2}\right) . \end{aligned}$$

Thus,

$$\begin{aligned} D\tilde{f}(\xi _0)(\eta )=\tilde{{\mathbb {E}}}\left( g'(\tilde{{\mathbb {E}}}h(\xi _0))\nabla h(\xi _0)\cdot \eta \right) , \ \ \eta \in L^2(\tilde{\Omega };{\mathbb {R}}^d); \end{aligned}$$

that is,

$$\begin{aligned} \partial _\mu f(\tilde{{\mathbb {P}}}_{\xi _0})(y)=g'(\tilde{{\mathbb {E}}}h(\xi _0))\nabla h(y),\quad \forall y\in {\mathbb {R}}^d. \end{aligned}$$

Moreover, we see that

$$\begin{aligned} \partial _y\partial _\mu f(\tilde{{\mathbb {P}}}_{\xi _0})(y)=g'(\tilde{{\mathbb {E}}}h(\xi _0))\nabla ^2 h(y),\quad \forall y\in {\mathbb {R}}^d. \end{aligned}$$
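
Taking \(g(r)=r\) in Example 2.4 gives the linear functional \(f(\mu )=\int _{{\mathbb {R}}^d}h\,\mathrm{d}\mu \), for which

$$\begin{aligned} \partial _\mu f(\mu )(y)=\nabla h(y),\quad \forall y\in {\mathbb {R}}^d,\ \mu \in \mathcal {P}_2; \end{aligned}$$

in this case the derivative does not depend on \(\mu \) at all, and f belongs to \(C^{1,1}_b(\mathcal {P}_2)\) as soon as \(\nabla h\) is bounded and Lipschitz continuous.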

2.2 Malliavin Calculus

In this section, we recall some basic facts about Bismut’s approach to Malliavin calculus for jump processes (cf. [3, 4, 24] etc.).

Let \(\Gamma \subset {\mathbb {R}}^d\) be an open set containing the origin. Let us define

$$\begin{aligned} \Gamma _0:=\Gamma \setminus \{0\},\ \ \varrho (z):=1\vee \mathbf{d}(z,\Gamma ^c_0)^{-1}, \end{aligned}$$
(2.2)

where \(\mathbf{d}(z,\Gamma ^c_0)\) is the distance of z to the complement of \(\Gamma _0\). Let \(\Omega \) be the canonical space of all points \(\omega =(w,\mu )\), where

  • \(w: [0,1]\rightarrow {\mathbb {R}}^d\) is a continuous function with \(w(0)=0\);

  • \(\mu \) is an integer-valued measure on \([0,1]\times \Gamma _0\) with \(\mu (A)<+\infty \) for any compact set \(A\subset [0,1]\times \Gamma _0\).

Define the canonical process on \(\Omega \) as follows: for \(\omega =(w,\mu )\),

$$\begin{aligned} W_t(\omega ):=w(t),\ \ \ N(\omega ; \mathrm{d}t,\mathrm{d}z):=\mu (\omega ; \mathrm{d}t,\mathrm{d}z):=\mu (\mathrm{d}t,\mathrm{d}z). \end{aligned}$$

Let \(({\mathscr {F}}_t)_{t\in [0,1]}\) be the smallest right-continuous filtration on \(\Omega \) such that W and N are optional. In the following, we write \({\mathscr {F}}:={\mathscr {F}}_1\), and endow \((\Omega ,{\mathscr {F}})\) with the unique probability measure \({\mathbb {P}}\) such that

  • W is a standard d-dimensional Brownian motion;

  • N is a Poisson random measure with intensity \(\nu (\mathrm{d}z)\mathrm{d}t\), where \(\nu (\mathrm{d}z)=\kappa (z)\mathrm{d}z\) with

    $$\begin{aligned} \kappa \in C^1(\Gamma _0;(0,\infty )),\ \int _{\Gamma _0}(1\wedge |z|^2)\kappa (z)\mathrm{d}z<+\infty , \ \ |\nabla \log \kappa (z)|\leqslant C\varrho (z), \end{aligned}$$
    (2.3)

    where \(\varrho (z)\) is defined by (2.2) (the relation between (2.3) and (\(H_\nu \)) is made explicit after this list);

  • W and N are independent.
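
Note that when \(\Gamma \) is the open unit ball, \(\Gamma _0=B_0\) and \(\mathbf{d}(z,\Gamma ^c_0)=|z|\wedge (1-|z|)\), so that

$$\begin{aligned} \varrho (z)=1\vee \big (|z|\wedge (1-|z|)\big )^{-1}\geqslant |z|^{-1},\quad z\in B_0; \end{aligned}$$

hence the bound \(|\nabla \log \kappa (z)|\leqslant c_0|z|^{-1}\) in (1.4) implies the last condition in (2.3), and the canonical setting above covers the Lévy measures satisfying (\(H_\nu \)).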

In the following, we write

$$\begin{aligned} \widehat{N}(\mathrm{d}z,\mathrm{d}s):=N(\mathrm{d}z,\mathrm{d}s)-\nu (\mathrm{d}z)\mathrm{d}s. \end{aligned}$$

2.3 Function Spaces

Let \(p\geqslant 1\) and k be an integer. We introduce the following spaces for later use.

  • \(L^p(\Omega )\): The space of all \({\mathscr {F}}\)-measurable random variables with finite norm:

    $$\begin{aligned} \Vert F\Vert _p:=\big [\mathbb {E}|F|^p\big ]^{\frac{1}{p}}. \end{aligned}$$
  • \({\mathbb {L}}^1_p\): The space of all predictable processes \(\xi :\Omega \times [0,1]\times \Gamma _0\rightarrow {\mathbb {R}}^k\) with finite norm:

    $$\begin{aligned} \Vert \xi \Vert _{{\mathbb {L}}^1_p}:=\left[ {\mathbb {E}}\left( \int ^1_0\!\!\!\int _{\Gamma _0} |\xi (s,z)|\nu (\mathrm{d}z)\mathrm{d}s\right) ^p\right] ^{\frac{1}{p}} +\left[ {\mathbb {E}}\int ^1_0\!\!\!\int _{\Gamma _0}|\xi (s,z)|^p\nu (\mathrm{d}z)\mathrm{d}s\right] ^{\frac{1}{p}}<\infty . \end{aligned}$$
    (2.4)
  • \({\mathbb {L}}^2_p\): The space of all predictable processes \(\xi :\Omega \times [0,1]\times \Gamma _0\rightarrow {\mathbb {R}}^k\) with finite norm:

    $$\begin{aligned} \Vert \xi \Vert _{{\mathbb {L}}^2_p}:=\left[ {\mathbb {E}}\left( \int ^1_0\!\!\!\int _{\Gamma _0}|\xi (s,z)|^2\nu (\mathrm{d}z)\mathrm{d}s\right) ^{\frac{p}{2}}\right] ^{\frac{1}{p}} +\left[ {\mathbb {E}}\int ^1_0\!\!\!\int _{\Gamma _0}|\xi (s,z)|^p\nu (\mathrm{d}z)\mathrm{d}s\right] ^{\frac{1}{p}}<\infty . \end{aligned}$$
  • \({\mathbb {H}}_p\): The space of all measurable adapted processes \(h:\Omega \times [0,1]\rightarrow {\mathbb {R}}^d\) with finite norm:

    $$\begin{aligned} \Vert h\Vert _{{\mathbb {H}}_p}:=\left[ {\mathbb {E}}\left( \int ^1_0|h(s)|^2\mathrm{d}s\right) ^{\frac{p}{2}}\right] ^{\frac{1}{p}}<+\infty . \end{aligned}$$
    (2.5)
  • \({\mathbb {V}}_p\): The space of all predictable processes \({\mathbf {v}}: \Omega \times [0,1]\times \Gamma _0\rightarrow {\mathbb {R}}^d\) with finite norm:

    $$\begin{aligned} \Vert {\mathbf {v}}\Vert _{{\mathbb {V}}_p}:=\Vert \nabla _z{\mathbf {v}}\Vert _{{\mathbb {L}}^1_p}+\Vert {\mathbf {v}}\varrho \Vert _{{\mathbb {L}}^1_p}<\infty , \end{aligned}$$
    (2.6)

    where \(\varrho (z)\) is defined by (2.2). Below we shall write

    $$\begin{aligned} {\mathbb {H}}_{\infty -}:=\cap _{p\geqslant 1}{\mathbb {H}}_p,\ \ {\mathbb {V}}_{\infty -}:=\cap _{p\geqslant 1}{\mathbb {V}}_p. \end{aligned}$$
  • \({\mathbb {H}}_0\): The space of all bounded measurable adapted processes \(h:\Omega \times [0,1]\rightarrow {\mathbb {R}}^d\).

  • \({\mathbb {V}}_0\): The space of all predictable processes \({\mathbf {v}}: \Omega \times [0,1]\times \Gamma _0\rightarrow {\mathbb {R}}^d\) with the following properties: (i) \({\mathbf {v}}\) and \(\nabla _z {\mathbf {v}}\) are bounded; (ii) there exists a compact subset \(U\subset \Gamma _0\) such that

    $$\begin{aligned} {\mathbf {v}}(t,z)=0,\quad \forall z\notin U. \end{aligned}$$
  • For any \(t\in (0,1]\), \({\mathbb {L}}^1_p(t)\), \({\mathbb {H}}_p(t)\) and \({\mathbb {V}}_p(t)\) denote the corresponding spaces defined as in (2.4), (2.5) and (2.6) with the integration interval [0, 1] replaced by [0, t].

Let m be an integer and \(C_\mathrm{{p}}^\infty ({\mathbb {R}}^m)\) be the class of all smooth functions on \({\mathbb {R}}^m\) which, together with all their derivatives, have at most polynomial growth. Let \({\mathcal {F}}C^\infty _\mathrm{{p}}\) be the class of all Wiener–Poisson functionals on \(\Omega \) of the following form:

$$\begin{aligned} F=f(W(h_1),\ldots , W(h_{m_1}), N(g_1),\ldots , N(g_{m_2})), \end{aligned}$$

where \(f\in C_\mathrm{{p}}^\infty ({\mathbb {R}}^{m_1+m_2})\), \(h_1,\ldots , h_{m_1}\in {\mathbb {H}}_0\) and \(g_1,\ldots , g_{m_2}\in {\mathbb {V}}_0\) are non-random and real-valued, and

$$\begin{aligned} W(h_i):=\int ^1_0\langle h_i(s),\mathrm{d}W_s\rangle _{{\mathbb {R}}^d},\ \ N(g_j):=\int ^1_0\!\!\!\int _{\Gamma _0}g_j(s,z)N(\mathrm{d}s,\mathrm{d}z). \end{aligned}$$

For any \(p>1\) and \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_p\times {\mathbb {V}}_p\), let us define

$$\begin{aligned} D_\Theta F&:=\sum _{i=1}^{m_1}(\partial _i f)(\cdot )\int ^1_0 \langle h(s),h_i(s)\rangle _{{\mathbb {R}}^d}\mathrm{d}s\nonumber \\&\quad +\sum _{j=1}^{m_2}(\partial _{j+m_1} f)(\cdot )\int ^1_0\!\!\!\int _{\Gamma _0}\langle {\mathbf {v}}(s,z), \nabla _z g_j(s,z)\rangle _{{\mathbb {R}}^d}N(\mathrm{d}s,\mathrm{d}z), \end{aligned}$$
(2.7)

where “\((\cdot )\)” stands for \(W(h_1),\ldots , W(h_{m_1}), N(g_1),\ldots , N(g_{m_2})\).
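
Two elementary instances of (2.7) (take \(m_1+m_2=1\) and f the identity) read

$$\begin{aligned} D_\Theta W(h_1)=\int ^1_0\langle h(s),h_1(s)\rangle _{{\mathbb {R}}^d}\mathrm{d}s,\qquad D_\Theta N(g_1)=\int ^1_0\!\!\!\int _{\Gamma _0}\langle {\mathbf {v}}(s,z),\nabla _z g_1(s,z)\rangle _{{\mathbb {R}}^d}N(\mathrm{d}s,\mathrm{d}z); \end{aligned}$$

heuristically, the component h of \(\Theta \) shifts the Brownian path, while \({\mathbf {v}}\) perturbs the positions of the jumps.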

Definition 2.5

For \(p>1\) and \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_p\times {\mathbb {V}}_p\), we define the first-order Sobolev space \({\mathbb {D}}_{\Theta }^{1,p}\) as the completion of \({\mathcal {F}}C^\infty _\mathrm{{p}}\) in \(L^p(\Omega )\) with respect to the norm:

$$\begin{aligned} \Vert F\Vert _{\Theta ; 1, p}:=\Vert F\Vert _{L^p}+\Vert D_\Theta F\Vert _{L^p}. \end{aligned}$$

We have the following integration by parts formula (cf. [24, Theorem 2.9]).

Theorem 2.6

Given \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\) and \(p>1\), for any \(F\in {\mathbb {D}}_\Theta ^{1,p}\), we have

$$\begin{aligned} {\mathbb {E}}D_\Theta F={\mathbb {E}}(F\delta (\Theta )), \end{aligned}$$
(2.8)

where

$$\begin{aligned} \delta (\Theta ):=\int _0^1\langle h(s),\mathrm{d}W_s\rangle -\int _0^1\!\!\!\int _{\Gamma _0}\frac{\mathrm{div}(\kappa {\mathbf {v}})(s,z)}{\kappa (z)}\widehat{N}(\mathrm{d}z,\mathrm{d}s), \end{aligned}$$

and \(\mathrm{div}(\kappa {\mathbf {v}}):=\sum _{i=1}^d\partial _{z_i}(\kappa {\mathbf {v}}_i)\) stands for the divergence.
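
As a quick sanity check of (2.8), take \({\mathbf {v}}=0\), a deterministic \(h\in {\mathbb {H}}_0\) and \(F=W(h_1)\); then (2.8) reduces to

$$\begin{aligned} \int _0^1\langle h(s),h_1(s)\rangle \mathrm{d}s={\mathbb {E}}\left( W(h_1)\int _0^1\langle h(s),\mathrm{d}W_s\rangle \right) , \end{aligned}$$

which is exactly the Itô isometry for the two Wiener integrals.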

The following Burkholder–Davis–Gundy inequality (cf. [20, Theorem 48] and [24, Lemma 2.3]) will be used frequently.

Lemma 2.7

  1.

    For any \(p\geqslant 1\), there is a constant \(C_p>0\) such that for any càdlàg martingale \(M_t\),

    $$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|M_s|^p\right) \leqslant C_p{\mathbb {E}}[M, M]_t^{\frac{p}{2}}. \end{aligned}$$
    (2.9)
  2.

    For any \(p\geqslant 1\), there is a constant \(C_p>0\) such that for any \(\zeta \in {\mathbb {L}}^1_p\),

    $$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}\left| \int _0^t\!\!\int _{B_0}\zeta (s,z) N(\mathrm{d}z,\mathrm{d}s )\right| ^p\right) \leqslant C_p\Vert \zeta \Vert ^p_{{\mathbb {L}}^1_p}. \end{aligned}$$
    (2.10)

3 Proofs of Main Results

Let us first show the existence and uniqueness of the solution to Eq. (1.1).

Theorem 3.1

Assume that there is a constant \(C>0\) such that

$$\begin{aligned}&|b(x_1,\mu _1)-b(x_2,\mu _2)|^2+\Vert \sigma (x_1,\mu _1)-\sigma (x_2,\mu _2)\Vert ^2_{HS} +|f(\mu _1)-f(\mu _2)|^2\nonumber \\&\quad \quad \leqslant C\left( |x_1-x_2|^2+{\mathbb {W}}_2(\mu _1,\mu _2)^2\right) ,\ \ \ x_1, x_2 \in {\mathbb {R}}^d,\ \mu _1, \mu _2\in \mathcal {P}_2. \end{aligned}$$
(3.1)

Then Eq. (1.1) admits a unique strong solution, and weak uniqueness holds. Moreover, for any \(s\geqslant 0\), \(T\geqslant s\) and \(p\geqslant 2\), \({\mathbb {E}}|X_{s,s}|^p<\infty \) implies

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [s,T]}|X_{s,t}|^p\right) \leqslant C_{s,T}\left( 1+{\mathbb {E}}|X_{s,s}|^p\right) , \end{aligned}$$
(3.2)

where \(C_{s,T}\) is a constant depending on s and T.

Proof

For the existence and uniqueness of the strong solution, we refer to [13, Theorem 3.1]. As for (3.2), it can be derived easily from Lemma 2.7 and Gronwall's inequality, so we omit the proof. We only prove the uniqueness of the weak solution. Let (\(X_t, W_t, L_t\)) and (\(\tilde{X}_t, \tilde{W}_t, \tilde{L}_t\)), with respect to (\(\Omega ,{\mathscr {F}}_t,{\mathbb {P}}\)) and (\(\tilde{\Omega }, \tilde{{\mathscr {F}}}_t, \tilde{{\mathbb {P}}}\)) respectively, be two weak solutions with \({\mathbb {P}}_{X_0}=\tilde{{\mathbb {P}}}_{\tilde{X}_0}\). Then \(X_t\) solves Eq. (1.1) while \(\tilde{X}_t\) solves

$$\begin{aligned} \mathrm{d}\tilde{X}_t=b(\tilde{X}_t,\tilde{{\mathbb {P}}}_{\tilde{X}_t})\mathrm{d}t+ \sigma (\tilde{X}_t,\tilde{{\mathbb {P}}}_{\tilde{X}_t})\mathrm{d}\tilde{W}_t+ f(\tilde{{\mathbb {P}}}_{\tilde{X}_t})\mathrm{d}\tilde{L}_t. \end{aligned}$$
(3.3)

To prove \({\mathbb {P}}_{X_\cdot }=\tilde{{\mathbb {P}}}_{\tilde{X}_\cdot }\), let

$$\begin{aligned} \overline{b}_t(x)=b(x,{\mathbb {P}}_{X_t}),\ \ \overline{\sigma }_t(x)=\sigma (x,{\mathbb {P}}_{X_t}), \ \ \overline{f}_t=f({\mathbb {P}}_{X_t}). \end{aligned}$$

Due to (3.1) and (3.2), it is easy to verify that \(\overline{b}\) and \(\overline{\sigma }\) are Lipschitz continuous and \(\overline{f}\) is bounded on [0, 1]. Therefore, the SDE

$$\begin{aligned} \mathrm{d}\overline{X}_t=\overline{b}_t(\overline{X}_t)\mathrm{d}t+\overline{\sigma }_t(\overline{X}_t)\mathrm{d}\tilde{W}_t+\overline{f}_t\mathrm{d}\tilde{L}_t,\ \ \overline{X}_0=\tilde{X}_0 \end{aligned}$$
(3.4)

has a unique strong solution. By the Yamada–Watanabe theorem for nonhomogeneous SDEs with jumps (see [1]), weak uniqueness also holds for (3.4). Noting that

$$\begin{aligned} \mathrm{d}X_t=\overline{b}_t(X_t)\mathrm{d}t+\overline{\sigma }_t(X_t)\mathrm{d}W_t+\overline{f}_t\mathrm{d}L_t,\ \ {\mathbb {P}}_{X_0}=\tilde{{\mathbb {P}}}_{\tilde{X}_0}, \end{aligned}$$

we have \(\tilde{{\mathbb {P}}}_{\overline{X}_{\cdot }}={\mathbb {P}}_{X_\cdot }\). Therefore, (3.4) can be written as

$$\begin{aligned} \mathrm{d}\overline{X}_t=b\left( \overline{X}_t,\tilde{{\mathbb {P}}}_{\overline{X}_t}\right) \mathrm{d}t+\sigma \left( \overline{X}_t,\tilde{{\mathbb {P}}}_{\overline{X}_t}\right) \mathrm{d}\tilde{W}_t +f\left( \tilde{{\mathbb {P}}}_{\overline{X}_t}\right) \mathrm{d}\tilde{L}_t. \end{aligned}$$

By the uniqueness of (3.3), we obtain \(\overline{X}=\tilde{X}\). Therefore, \(\tilde{{\mathbb {P}}}_{\tilde{X}_\cdot }={\mathbb {P}}_{X_\cdot }\). \(\square \)

For any \(x\in {\mathbb {R}}^d\), denote by \(X^x_t\) the solution to Eq. (1.1) with initial value x. Assume (H1) holds. Let \(\{J_t\}_{t\in [0,1]}\) satisfy the following linear matrix-valued equation:

$$\begin{aligned} J_t=I+\int _0^t\partial _xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )J_s\mathrm{d}s+\sum _{k=1}^d\int _0^t\partial _x\sigma _k\left( X^x_s,{\mathbb {P}}_{X^x_s}\right) J_s\mathrm{d}W^k_s,\ \ t\in [0,1], \end{aligned}$$
(3.5)

where \(\sigma _k\) denotes the k-th column of \(\sigma \) and \(W^k\) the k-th component of W. Then, by Itô's formula, one easily checks that the inverse matrix \(K_t:=J_t^{-1}\) satisfies:

$$\begin{aligned} K_t=&\,I-\!\!\int _0^t\!\!K_s\partial _xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\mathrm{d}s +\!\!\sum _{k=1}^d\!\int _0^t\!\!K_s\left( \partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\right) ^2\mathrm{d}s\nonumber \\ {}&-\!\!\sum _{k=1}^d\!\int _0^t\!\!K_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\mathrm{d}W^k_s \end{aligned}$$
(3.6)

for all \(t\in [0,1]\).
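
Indeed, writing \(\mathrm{d}J_t=A_tJ_t\mathrm{d}t+\sum _{k=1}^dB^k_tJ_t\mathrm{d}W^k_t\) with \(A_t:=\partial _xb(X^x_t,{\mathbb {P}}_{X^x_t})\) and \(B^k_t:=\partial _x\sigma _k(X^x_t,{\mathbb {P}}_{X^x_t})\), the product rule applied to (3.5) and (3.6) gives

$$\begin{aligned} \mathrm{d}(K_tJ_t)&=(\mathrm{d}K_t)J_t+K_t(\mathrm{d}J_t)+\mathrm{d}[K,J]_t\\&=K_t\Big (-A_t+\sum _{k=1}^d(B^k_t)^2\Big )J_t\mathrm{d}t-\sum _{k=1}^dK_tB^k_tJ_t\mathrm{d}W^k_t +K_tA_tJ_t\mathrm{d}t\\&\quad +\sum _{k=1}^dK_tB^k_tJ_t\mathrm{d}W^k_t-\sum _{k=1}^dK_t(B^k_t)^2J_t\mathrm{d}t=0, \end{aligned}$$

so that \(K_tJ_t\equiv K_0J_0=I\).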

Lemma 3.2

Assume (H1) holds. For any \(p\geqslant 2\), we have

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|J_t|^p\right)<\infty ,\ \ \ \ \ \ {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|K_t|^p\right) <\infty . \end{aligned}$$
(3.7)

These bounds can be derived easily from (2.9) and Gronwall's inequality, so we omit the proof.

3.1 Malliavin Derivatives and Their Estimates

Proposition 3.3

Assume (\(H_\nu \)) and (H1). For any \(p\geqslant 2\), \(\Theta :=(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\) and \(t\in [0,1]\), \(X^x_t\) is in \({\mathbb {D}}^{1,p}_\Theta \) and

$$\begin{aligned} D_\Theta X^x_t=&\,\int _0^t\partial _xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta X^x_s\mathrm{d}s +\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta X^x_s\mathrm{d}W_s\nonumber \\&+\int _0^t\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )h(s)\mathrm{d}s +\int _0^t\!\!\int _{B_0}f\big ({\mathbb {P}}_{X^x_s}\big ){\mathbf {v}}(s,z)N(\mathrm{d}z,\mathrm{d}s). \end{aligned}$$
(3.8)

Moreover, there exists \(C_p>0\) such that

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta X^x_s|^p\right) \leqslant C_p(1+|x|^p)\left( \Vert h\Vert _{{\mathbb {H}}_{2p}(t)}^p+\Vert {\mathbf {v}}\Vert ^p_{{\mathbb {L}}^1_p(t)}\right) , \quad \forall t\in [0,1]. \end{aligned}$$
(3.9)

Proof

Define the following Picard iteration: for all \(t\in [0,1]\), \(X^{x,0}_t\equiv x\) and

$$\begin{aligned} X^{x,n+1}_t=&\,x+\int _0^tb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )\mathrm{d}s +\int _0^t\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )\mathrm{d}W_s \\&+\int _0^tf\big ({\mathbb {P}}_{X^{x,n}_s}\big )\mathrm{d}L_s,\ \ n\geqslant 0. \end{aligned}$$

Then from the proof of Theorem 3.1 in [13] we have for any \(p\geqslant 2\),

$$\begin{aligned} \lim \limits _{n\rightarrow \infty } {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|X^{x,n}_t-X^x_t|^p\right) =0. \end{aligned}$$
(3.10)

Let us now prove the following statement: for any \(n\geqslant 1\),

$$\begin{aligned} X^{x,n}_t\in {\mathbb {D}}_{\Theta }^{1,p}, \quad \forall t\in [0,1] \text { and } {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|D_\Theta X^{x,n}_t|^p\right) <\infty , \quad \forall p\geqslant 2. \end{aligned}$$
(3.11)

Due to (2.7), (2.9) and (2.10), it is clear that (3.11) holds for \(n=1\). Suppose that (3.11) holds for some n. Then, by (H1) and the chain rule [23, Lemma 2.4], we have \(\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )\in {\mathbb {D}}_{\Theta }^{1,p}\) and

$$\begin{aligned} D_\Theta \sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big ) =\partial _x\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s. \end{aligned}$$
(3.12)

Also by (2.7), we have

$$\begin{aligned} D_\Theta \int _0^tf\big ({\mathbb {P}}_{X^{x,n}_s}\big )\mathrm{d}L_s=\int _0^t\!\!\int _{B_0}f\big ({\mathbb {P}}_{X^{x,n}_s}\big ){\mathbf {v}}(s,z)N(\mathrm{d}z,\mathrm{d}s). \end{aligned}$$

Using the chain rule and Lemma 2.3 in [23], one can show that \(\int _0^tb(X^{x,n}_s, {\mathbb {P}}_{X^{x,n}_s})\mathrm{d}s\in {\mathbb {D}}^{1,p}_\Theta \) and

$$\begin{aligned} D_\Theta \int _0^tb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )\mathrm{d}s =\int _0^t\partial _xb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s\mathrm{d}s. \end{aligned}$$

Therefore, \(X^{x,n+1}_t\in {\mathbb {D}}^{1,p}_\Theta \) and

$$\begin{aligned} D_\Theta X^{x,n+1}_t=&\,\int _0^t\partial _xb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s\mathrm{d}s +\int _0^t\partial _x\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s\mathrm{d}W_s\nonumber \\&+\int _0^t\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )h(s)\mathrm{d}s +\int _0^t\!\!\int _{\Gamma _0}f\big ({\mathbb {P}}_{X^{x,n}_s}\big ){\mathbf {v}}(s,z)N(\mathrm{d}z,\mathrm{d}s). \end{aligned}$$
(3.13)

By (2.9) and (H1), we easily obtain \({\mathbb {E}}\left( \sup \nolimits _{t\in [0,1]}|D_\Theta X^{x,n+1}_t|^p\right) <\infty \). So we have proved (3.11).

Due to (H1), (3.2) and the condition \((h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\), the linear Eq.  (3.8) has a unique solution denoted by \(\{Y_t\}_{t\in [0,1]}\). For any \(p\geqslant 2\), by (2.9) and (2.10) one can arrive at

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|Y_s|^p\right)&\leqslant C_p\int _0^t{\mathbb {E}}|\partial _xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )Y_s|^p\mathrm{d}s\\&\quad +C_p{\mathbb {E}}\left( \int _0^t|\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )Y_s|^2\mathrm{d}s\right) ^{\frac{p}{2}}\\&\quad +C_p{\mathbb {E}}\left( \int _0^t|\sigma (X^x_s, {\mathbb {P}}_{X^{x}_s})h(s)|\mathrm{d}s\right) ^p\\&\quad +C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{B_0}|f({\mathbb {P}}_{X^{x}_s}){\mathbf {v}}(s,z)|N(\mathrm{d}z,\mathrm{d}s)\right) ^p\\&\leqslant C_p\int _0^t{\mathbb {E}}\left( \sup \limits _{s\leqslant r}|Y_s|^p\right) \mathrm{d}r +C_p\left[ {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|X^x_t|^{2p}+1\right) \right] ^{\frac{1}{2}}\\&\quad \left( \Vert h\Vert _{{\mathbb {H}}_{2p}(t)}^p+\Vert {\mathbf {v}}\Vert ^p_{{\mathbb {L}}^1_p(t)}\right) . \end{aligned}$$

Gronwall’s inequality, together with (3.2), implies

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|Y_s|^p\right) \leqslant C_p\left( 1+|x|^p\right) \left( \Vert h\Vert _{{\mathbb {H}}_{2p}(t)}^p+\Vert {\mathbf {v}}\Vert ^p_{{\mathbb {L}}^1_p(t)}\right) . \end{aligned}$$

It follows from (3.8) and (3.13) that

$$\begin{aligned} {\mathbb {E}}&\left( \sup \limits _{t\in [0,1]}|D_\Theta X^{x,n+1}_t-Y_t|^p\right) \\&\quad \leqslant C_p\int _0^1{\mathbb {E}}|\partial _xb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s -\partial _xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )Y_s|^p\mathrm{d}s\\&\qquad +C_p\int _0^1{\mathbb {E}}|\partial _x\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )D_\Theta X^{x,n}_s -\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )Y_s|^p\mathrm{d}s\\&\qquad +C_p{\mathbb {E}}\left( \int _0^1|\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big ) -\sigma \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )||h(s)|\mathrm{d}s\right) ^p\\&\qquad +C_p{\mathbb {E}}\left( \int _0^1\!\!\int _{\Gamma _0}|f\big ({\mathbb {P}}_{X^{x,n}_s}\big )-f({\mathbb {P}}_{X^{x}_s})||{\mathbf {v}}(s,z)|N(\mathrm{d}z,\mathrm{d}s)\right) ^p\\&\quad \leqslant C_p\int _0^1{\mathbb {E}}|\partial _xb\big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )-\partial _xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )|^p|Y_s|^p\mathrm{d}s\\&\qquad +C_p\int _0^1{\mathbb {E}}|\partial _x\sigma \big (X^{x,n}_s,{\mathbb {P}}_{X^{x,n}_s}\big )-\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )|^p|Y_s|^p\mathrm{d}s\\&\qquad +C_p\int _0^1{\mathbb {E}}|D_\Theta X^{x,n}_s-Y_s|^p\mathrm{d}s\\&\qquad +C_p{\mathbb {E}}\left[ \int _0^1\!\!\left( |X^{x,n}_s-X^x_s|^2+{\mathbb {W}}_2({\mathbb {P}}_{X^{x,n}_s},{\mathbb {P}}_{X^{x}_s})^2\right) \mathrm{d}s \int _0^1\!\!|h(s)|^2\mathrm{d}s\right] ^{\frac{p}{2}} \\&\qquad + C_p\sup \limits _{s\in [0,1]}{\mathbb {W}}_2({\mathbb {P}}_{X^{x,n}_s},{\mathbb {P}}_{X^{x}_s})^p\Vert {\mathbf {v}}\Vert ^p_{{\mathbb {L}}^1_p}\\&\quad \leqslant C_p\int _0^1{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta X^{x,n}_s-Y_s|^p\right) \mathrm{d}t\\&\qquad +C_p\left( {\mathbb {E}}\sup \limits _{t\in [0,1]}|X^{x,n}_t-X^x_t|^{2p}\right) ^{\frac{1}{2}} \left( \left( {\mathbb {E}}\sup \limits _{t\in [0,1]}|Y_t|^{2p}\right) ^{\frac{1}{2}}+\Vert h\Vert ^p_{{\mathbb {H}}_{2p}} +\Vert {\mathbf {v}}\Vert ^p_{{\mathbb {L}}^1_p}\right) . \end{aligned}$$

Gronwall’s inequality implies

$$\begin{aligned} \limsup \limits _{n\rightarrow \infty } {\mathbb {E}}&\left( \sup \limits _{t\in [0,1]}|D_\Theta X^{x,n+1}_t-Y_t|^p\right) \leqslant C_p\limsup \limits _{n\rightarrow \infty } \left( {\mathbb {E}}\sup \limits _{t\in [0,1]}|X^{x,n}_t-X^x_t|^{2p}\right) ^{\frac{1}{2}}=0. \end{aligned}$$

Combining this with (3.10) and the fact \({\mathbb {W}}_2({\mathbb {P}}_{X^{x,n}_s},{\mathbb {P}}_{X^{x}_s})^p\leqslant {\mathbb {E}}|X^{x,n}_s-X^x_s|^p\), and letting \(n\rightarrow \infty \) in (3.13), we obtain \(X^x_t\in {\mathbb {D}}_\Theta ^{1,p}\) and \(D_\Theta X^x_t\) satisfies Eq. (3.8). \(\square \)

Lemma 3.4

Assume (\(H_\nu \)) and (H1). For any \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\), we have \(K_t\in {\mathbb {D}}_\Theta ^{1,2}\). Moreover, there exists a constant \(C>0\) such that

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta K_s|^2\right) \leqslant C(1+|x|^2)\left( \Vert h\Vert ^2_{{\mathbb {H}}_4(t)}+\Vert h\Vert ^2_{{\mathbb {H}}_8(t)}+\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {L}}^1_4(t)}\right) , \quad \forall t\in [0,1]. \end{aligned}$$
(3.14)

Proof

Define the following Picard iteration: for each \(t\in [0,1]\), \(K^{(0)}_t=I\) and, for \(n\geqslant 0\),

$$\begin{aligned} K^{(n+1)}_t=&\,I-\!\!\int _0^t\!\!K^{(n)}_s\partial _xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\mathrm{d}s+\!\!\sum _{k=1}^d\!\int _0^t\!\!K^{(n)}_s\left( \partial _x\sigma _k(X^x_s, {\mathbb {P}}_{X^{x}_s})\right) ^2\mathrm{d}s\\&-\!\!\sum _{k=1}^d\!\int _0^t\!\! K^{(n)}_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\mathrm{d}W^k_s,\ \ t\in [0,1]. \end{aligned}$$

Then for any \(p\geqslant 2\), it is routine to prove that

$$\begin{aligned} \lim \limits _{n\rightarrow \infty }{\mathbb {E}}\left( \sup \limits _{t \in [0,1]}|K^{(n)}_t-K_t|^p\right) =0. \end{aligned}$$
(3.15)

By induction, Propositions 1.3.2 and 1.2.4 in [21], and Proposition 3.3, we see that \(K^{(n+1)}_t\) is Malliavin differentiable along \(\Theta \). Moreover,

$$\begin{aligned}&D_\Theta K^{(n+1)}_t=-\!\!\int _0^t\!\!D_\Theta K^{(n)}_s\partial _xb(X^x_s, {\mathbb {P}}_{X^{x}_s})\mathrm{d}s -\!\!\int _0^t\!\!K^{(n)}_s\partial ^2_xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big ) D_\Theta X^x_s\mathrm{d}s\\&\quad +\!\!\sum _{k=1}^d\!\int _0^t\!\!D_\Theta K^{(n)}_s \left( \partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\right) ^2\mathrm{d}s +2\!\!\sum _{k=1}^d\!\int _0^t\!\!K^{(n)}_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big ) \partial ^2_x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )D_\Theta X^x_s\mathrm{d}s\\&\quad -\!\!\sum _{k=1}^d\!\int _0^t\!\!D_\Theta K^{(n)}_s\partial _x\sigma _k \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\mathrm{d}W^k_s -\!\!\sum _{k=1}^d\!\int _0^t\!\! K^{(n)}_s\partial ^2_x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )D_\Theta X^x_s\mathrm{d}W^k_s\\&\quad -\!\!\sum _{k=1}^d\!\int _0^t\!\!K^{(n)}_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )h_k(s)\mathrm{d}s, \end{aligned}$$

where \(h_k\) denotes the k-th component of h. Let \(\{Y_t\}_{t\in [0,1]}\) solve the following linear equation:

$$\begin{aligned}&Y_t=-\!\!\int _0^t\!\!Y_s\partial _xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\mathrm{d}s -\!\!\int _0^t\!\!K_s\partial ^2_xb\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )D_\Theta X^x_s\mathrm{d}s\nonumber \\&\quad +\!\!\sum _{k=1}^d\!\int _0^t\!\!Y_s\left( \partial _x\sigma _k \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\right) ^2\mathrm{d}s +2\!\!\sum _{k=1}^d\!\int _0^t\!\! K_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big ) \partial ^2_x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )D_ \Theta X^x_s\mathrm{d}s\nonumber \\&\quad -\!\!\sum _{k=1}^d\!\int _0^t\!\!Y_s\partial _x\sigma _k \big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )\mathrm{d}W^k_s -\!\!\sum _{k=1}^d\!\int _0^t\!\!K_s\partial ^2_ x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )D_\Theta X^x_s\mathrm{d}W^k_s\nonumber \\&\quad -\!\!\sum _{k=1}^d\! \int _0^t\!\!K_s\partial _x\sigma _k\big (X^x_s,{\mathbb {P}}_{X^{x}_s}\big )h_k(s)\mathrm{d}s. \end{aligned}$$
(3.16)

Then by Hölder’s inequality and (2.9), we can arrive at

$$\begin{aligned} {\mathbb {E}}&\left( \sup \limits _{t\in [0,1]}|D_\Theta K^{(n+1)}_t-Y_t|^2\right) \\&\quad \leqslant C\int _0^1{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta K^{(n)}_s-Y_s|^2\right) \mathrm{d}t +C\int _0^1{\mathbb {E}}|K^{(n)}_s- K_s|^2|D_\Theta X^x_s|^2\mathrm{d}s +C{\mathbb {E}}\left( \int _0^1|K^{(n)}_s-K_s||h(s)|\mathrm{d}s\right) ^2\\&\quad \leqslant C\!\int _0^1\!\!{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta K^{(n)}_s-Y_s|^2\right) \mathrm{d}t + C\!\left[ {\mathbb {E}}\left( \sup \limits _{t \in [0,1]}|K^{(n)}_t-K_t|^4\right) \right] ^{\frac{1}{2}} \left[ \left( {\mathbb {E}}\sup \limits _{t\in [0,1]}|D_\Theta X^x_t|^4\right) ^{\frac{1}{2}}+\Vert h\Vert ^2_{{\mathbb {H}}_4}\right] . \end{aligned}$$

Gronwall’s inequality, together with (3.9), yields

$$\begin{aligned} \lim \limits _{n\rightarrow \infty } {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|D_\Theta K^{(n+1)}_t-Y_t|^2\right) \leqslant&\,\lim \limits _{n\rightarrow \infty } C\!\left[ {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|K^{(n)}_t-K_t|^4\right) \right] ^{\frac{1}{2}} \\&\left[ \left( {\mathbb {E}}\sup \limits _{t\in [0,1]}|D_\Theta X^x_t|^4\right) ^{\frac{1}{2}}\!\! +\Vert h\Vert ^2_{{\mathbb {H}}_4}\right] =0. \end{aligned}$$

Combining this with (3.15), we obtain \(K_t\in {\mathbb {D}}_\Theta ^{1,2}\) and \(D_\Theta K_t=Y_t\) for all \(t\in [0,1]\), a.s. Moreover, by (3.9), (3.7) and (3.16) we have

$$\begin{aligned}&{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta K_s|^2\right) \\&\quad \leqslant C\int _0^t{\mathbb {E}}\left( \sup \limits _{s\leqslant r}|D_\Theta K_s|^2\right) \mathrm{d}r +C\int _0^t{\mathbb {E}}|K_s|^2|D_\Theta X^x_s|^2\mathrm{d}s +C{\mathbb {E}}\left( \int _0^t|K_s||h(s)|\mathrm{d}s\right) ^2\\&\quad \leqslant C\int _0^t{\mathbb {E}}\left( \sup \limits _{s\leqslant r}|D_\Theta K_s|^2\right) \mathrm{d}r +C\left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|K_s|^4\right) \right] ^{\frac{1}{2}} \left\{ \left[ {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta X^x_s|^4\right) \right] ^{\frac{1}{2}} \right. \\&\left. \qquad +\Vert h\Vert ^2_{{\mathbb {H}}_4(t)}\right\} \\&\quad \leqslant C\int _0^t{\mathbb {E}}\left( \sup \limits _{s\leqslant r}|D_\Theta K_s|^2\right) \mathrm{d}r +C(1+|x|^2)\left( \Vert h\Vert ^2_{{\mathbb {H}}_8(t)}+\Vert h\Vert ^2_{{\mathbb {H}}_4(t)}+\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {L}}^1_4(t)}\right) . \end{aligned}$$

Hence,

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta K_s|^2\right) \leqslant C(1+|x|^2)\left( \Vert h\Vert ^2_{{\mathbb {H}}_8(t)}+\Vert h\Vert ^2_{{\mathbb {H}}_4(t)}+\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {L}}^1_4(t)}\right) , \end{aligned}$$

where C is a constant independent of t. \(\square \)

3.2 Directional Derivative with Respect to Initial Value

Recall that for given \(x,y\in {\mathbb {R}}^d\), the directional derivative of \(X^x_t\) along the direction y is defined as

$$\begin{aligned} \nabla _y X^x_t:=L^2\text {-}\lim \limits _{\epsilon \rightarrow 0}\frac{1}{\epsilon } \left( X^{x+\epsilon y}_t-X^x_t\right) ,\quad \forall t\in [0,1]. \end{aligned}$$

Denote by \((\tilde{W}, \tilde{L})\) a copy of (W, L) on some complete probability space \((\tilde{\Omega },\tilde{{\mathscr {F}}},\tilde{{\mathbb {P}}})\), and by \(\{\tilde{X}^x_t\}_{t\geqslant 0}\) the copy of the solution to the SDE (1.1), but driven by the Brownian motion \(\tilde{W}\) and the Lévy process \(\tilde{L}\). Obviously, \((\tilde{W},\tilde{L},\tilde{X}^x)\) is an independent copy of \((W, L, X^x)\), defined over \((\tilde{\Omega },\tilde{{\mathscr {F}}},\tilde{{\mathbb {P}}})\). For all \(t\in [0,1]\), \(\nabla _y\tilde{X}^x_t\) denotes the directional derivative of \(\tilde{X}^x_t\) along the direction y.

Proposition 3.5

Assume (H1) and (H2). Then for any \(t\in [0,1]\) and \(x, y\in {\mathbb {R}}^d\), we have

$$\begin{aligned} \nabla _y X^x_t =&\,y+\int _0^t\partial _xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\nabla _y X^x_s\mathrm{d}s +\int _0^t\tilde{{\mathbb {E}}}\left( \partial _\mu b\big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) (\tilde{X}^x_s)\nabla _y\tilde{X}^x_s\right) \mathrm{d}s\nonumber \\&+\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\nabla _y X^x_s\mathrm{d}W_s +\int _0^t\tilde{{\mathbb {E}}}\left( \partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) (\tilde{X}^x_s)\nabla _y\tilde{X}^x_s\right) \mathrm{d}W_s\nonumber \\&+\int _0^t\tilde{{\mathbb {E}}}\left( \partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) (\tilde{X}^x_s)\nabla _y\tilde{X}^x_s\right) \mathrm{d}L_s. \end{aligned}$$
(3.17)

Moreover, for any \(p\geqslant 2\) there exists \(C_p>0\) such that

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|\nabla _y X^x_t|^p\right) \leqslant C_p|y|^p, \end{aligned}$$
(3.18)

where \(C_p\) is a constant independent of x, y and t.
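
Note that if b, \(\sigma \) and f do not depend on the measure variable, then the \(\partial _\mu \) terms in (3.17) vanish and \(\nabla _yX^x_t\) solves the same linear equation as \(J_ty\); comparing with (3.5) and using uniqueness, we get

$$\begin{aligned} \nabla _yX^x_t=J_ty,\quad t\in [0,1], \end{aligned}$$

so the expectation terms in (3.17) are precisely the contribution of the distribution dependence.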

Proof

For the sake of convenience, we assume \(b\equiv 0\). For \(\epsilon >0\), let

$$\begin{aligned} X^{x+\epsilon y}_t =x+\epsilon y+\int _0^t\sigma \left( X^{x+\epsilon y}_s,{\mathbb {P}}_{X^{x+\epsilon y}_s}\right) \mathrm{d}W_s +\int _0^tf\left( {\mathbb {P}}_{X^{x+\epsilon y}_s}\right) \mathrm{d}L_s, \ \ \ t\in [0,1]. \end{aligned}$$

Then for any \(p\geqslant 2\), by the Lipschitz continuity of \(\sigma \) and f and Gronwall’s inequality it is easy to prove

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|X^{x+\epsilon y}_t-X^x_t|^p\right) \leqslant C_p|y|^p\epsilon ^p. \end{aligned}$$
(3.19)

Observe that

$$\begin{aligned}&\sigma \big (X^{x+\epsilon y}_s,{\mathbb {P}}_{X^{x+\epsilon y}_s}\big )-\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\\&\quad =\,\int _0^1\partial _\lambda \left( \sigma \big (X^x_s+\lambda (X^{x+\epsilon y}_s-X^x_s), {\mathbb {P}}_{X^{x+\epsilon y}_s}\big )\right) \mathrm{d}\lambda \\&\qquad +\int _0^1\partial _\lambda \left( \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s +\lambda (X^{x+\epsilon y}_s-X^x_s)}\big )\right) \mathrm{d}\lambda \\&\quad = \,\ \alpha ^\epsilon _s\big (X^{x+\epsilon y}_s-X^x_s\big ) +\tilde{{\mathbb {E}}}\left( \beta ^\epsilon _s\big (\tilde{X}^{x+\epsilon y}_s -\tilde{X}^x_s\big )\right) , \end{aligned}$$

where

$$\begin{aligned} \alpha ^\epsilon _s:=\int _0^1\partial _x\sigma \big (X^x_s+\lambda \big (X^{x+\epsilon y}_s-X^x_s\big ), {\mathbb {P}}_{X^{x+\epsilon y}_s}\big )\mathrm{d}\lambda \end{aligned}$$

and

$$\begin{aligned} \beta ^\epsilon _s :=\int _0^1\partial _\mu \sigma \left( X^x_s,{\mathbb {P}}_{X^x_s+\lambda \left( X^{x+\epsilon y}_s-X^x_s\right) }\right) \left( \tilde{X}^x_s+\lambda \left( \tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s\right) \right) \mathrm{d}\lambda . \end{aligned}$$

Moreover, for any \(p\geqslant 2\) by the Lipschitz continuity of \(\partial _x\sigma \) and \(\partial _\mu \sigma \) we have

$$\begin{aligned}&{\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|\alpha ^\epsilon _s -\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )|^p\right) \nonumber \\&\quad \leqslant {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}\int _0^1 |\partial _x\sigma \big (X^x_s+\lambda \big (X^{x+\epsilon y}_s-X^x_s\big ), {\mathbb {P}}_{X^{x+\epsilon y}_s}\big )-\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )|^p\mathrm{d}\lambda \right) \nonumber \\&\quad \leqslant C_p{\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|X^{x+\epsilon y}_s-X^x_s|^p +\sup \limits _{s\in [0,1]}{\mathbb {W}}_2^p\big ({\mathbb {P}}_{X^{x+\epsilon y}_s},{\mathbb {P}}_{X^x_s}\big ) \right) \nonumber \\&\quad \leqslant C_p{\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|X^{x+\epsilon y}_s-X^x_s|^p\right) \leqslant C_p|y|^p\epsilon ^p, \end{aligned}$$
(3.20)

and

$$\begin{aligned}&\tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\beta ^\epsilon _s -\partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\big (\tilde{X}^x_s\big )|^p\right) \nonumber \\&\quad \leqslant C_p\tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s|^p +\sup \limits _{s\in [0,1]}\sup \limits _{\lambda \in [0,1]}{\mathbb {W}}_2\big ({\mathbb {P}}_{X^x_s+\lambda \big (X^{x+\epsilon y}_s-X^x_s\big )},{\mathbb {P}}_{X^x_s}\big )^p \right) \nonumber \\&\quad \leqslant C_p\tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s|^p\right) \leqslant C_p|y|^p\epsilon ^p. \end{aligned}$$
(3.21)

By a similar argument, we have

$$\begin{aligned} f\big ({\mathbb {P}}_{X^{x+\epsilon y}_s}\big )-f\big ({\mathbb {P}}_{X^x_s}\big )=\tilde{{\mathbb {E}}}\left( \gamma ^\epsilon _s\big (\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s\big )\right) \end{aligned}$$

for some process \(\gamma ^\epsilon \) with

$$\begin{aligned}&\tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\gamma ^\epsilon _s -\partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big )(\tilde{X}^x_s)|^p\right) \leqslant C_p|y|^p\epsilon ^p. \end{aligned}$$
(3.22)

Consider the following equation:

$$\begin{aligned} Y^x_t(y)=&\,y+\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )Y^x_s(y)\mathrm{d}W_s +\int _0^t\tilde{{\mathbb {E}}}\left[ \partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\tilde{Y}^x_s(y)\right] \mathrm{d}W_s\nonumber \\&+\int _0^t\tilde{{\mathbb {E}}}\left[ \partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\tilde{Y}^x_s(y)\right] \mathrm{d}L_s. \end{aligned}$$
(3.23)

By the classical Picard iteration, it is not difficult to prove that (3.23) has a unique solution \(Y\); we then take an independent copy \(\tilde{Y}\) of Y, defined on \((\tilde{\Omega },\tilde{{\mathscr {F}}},\tilde{{\mathbb {P}}})\). A sketch of the iteration follows.
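Concretely, one may take \(Y^{x,(0)}_t(y)\equiv y\) and define recursively

$$\begin{aligned} Y^{x,(n+1)}_t(y)=&\,y+\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )Y^{x,(n)}_s(y)\mathrm{d}W_s +\int _0^t\tilde{{\mathbb {E}}}\left[ \partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\big (\tilde{X}^x_s\big )\tilde{Y}^{x,(n)}_s(y)\right] \mathrm{d}W_s\\&+\int _0^t\tilde{{\mathbb {E}}}\left[ \partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big )\big (\tilde{X}^x_s\big )\tilde{Y}^{x,(n)}_s(y)\right] \mathrm{d}L_s, \end{aligned}$$

where \(\tilde{Y}^{x,(n)}\) is an independent copy of \(Y^{x,(n)}\); since \(\partial _x\sigma \), \(\partial _\mu \sigma \) and \(\partial _\mu f\) are bounded, a standard contraction argument (using, e.g., the moment inequality of Lemma 2.7) yields a unique fixed point in \(L^2\) on a small time interval, and the solution on [0,1] is obtained by gluing. With \(Y\) and \(\tilde{Y}\) in hand, we have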

$$\begin{aligned}&X^{x+\epsilon y}_t-X^x_t-\epsilon Y^x_t(y) =\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (X^{x+\epsilon y}_s-X^x_s-\epsilon Y^x_s(y)\big )\mathrm{d}W_s\\&\ \ \ +\int _0^t\tilde{{\mathbb {E}}} \left( \partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\big (\tilde{X}^x_s\big ) \big (\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s-\epsilon \tilde{Y}^x_s(y)\big )\right) \mathrm{d}W_s\\&\ \ \ +\int _0^t\big (\alpha ^\epsilon _s-\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\big ) \big (X^{x+\epsilon y}_s-X^x_s\big )\mathrm{d}W_s\\&\ \ \ +\int _0^t\tilde{{\mathbb {E}}}\left[ \big (\beta ^\epsilon _s-\partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\big )\big (\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s\big )\right] \mathrm{d}W_s\\&\ \ \ +\int _0^t\tilde{{\mathbb {E}}}\left( \partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big )\big (\tilde{X}^x_s\big ) \big (\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s-\epsilon \tilde{Y}^x_s(y)\big )\right) \mathrm{d}L_s\\&\ \ \ +\int _0^t\tilde{{\mathbb {E}}}\left[ \big (\gamma ^\epsilon _s-\partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\big )\big (\tilde{X}^{x+\epsilon y}_s -\tilde{X}^x_s\big )\right] \mathrm{d}L_s. \end{aligned}$$

Hence, it follows from Lemma 2.7 that

$$\begin{aligned}&{\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|X^{x+\epsilon y}_t-X^x_t-\epsilon Y^x_t(y)|^2\right) \\&\quad \leqslant C\int _0^1{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|X^{x+\epsilon y}_s-X^x_s-\epsilon Y^x_s(y)|^2\right) \mathrm{d}t \\&\qquad +C\int _0^1\tilde{{\mathbb {E}}}\left( \sup \limits _{s\leqslant t}|\tilde{X}^{x+\epsilon y}_s-\tilde{X}^x_s -\epsilon \tilde{Y}^x_s(y)|^2\right) \mathrm{d}t\\&\qquad +C\int _0^1{\mathbb {E}}\left( |X^{x+\epsilon y}_s-X^x_s|^2|\alpha ^\epsilon _s -\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )|^2\right) \mathrm{d}s\\&\qquad +C{\mathbb {E}}\int _0^1\tilde{{\mathbb {E}}}\left( |\tilde{X}^{x+\epsilon y}_s -\tilde{X}^x_s|^2|\beta ^\epsilon _s-\partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )|^2\right) \mathrm{d}s\\&\qquad +C{\mathbb {E}}\int _0^1\!\!\int _{{\mathbb {R}}^d_0}\tilde{{\mathbb {E}}}\left( |\tilde{X}^{x+\epsilon y}_s -\tilde{X}^x_s|^2|\gamma ^\epsilon _s-\partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )|^2\right) |z|^2\nu (\mathrm{d}z)\mathrm{d}s\\&\quad \leqslant C\int _0^1{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|X^{x+\epsilon y}_s-X^x_s-\epsilon Y^x_s(y)|^2\right) \mathrm{d}t\\&\qquad +C\left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|X^{x+\epsilon y}_s-X^x_s|^4\right) \right] ^{\frac{1}{2}} \left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|\alpha ^\epsilon _s -\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )|^4\right) \right] ^{\frac{1}{2}}\\&\qquad +C\left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|X^{x+\epsilon y}_s-X^x_s|^4\right) \right] ^{\frac{1}{2}} \left[ \tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]} |\beta ^\epsilon _s-\partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )|^4\right) \right] ^{\frac{1}{2}}\\&\qquad +C\left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|X^{x+\epsilon y}_s-X^x_s|^4\right) \right] ^{\frac{1}{2}} \left[ \tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]} |\gamma ^\epsilon _s-\partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )|^4\right) \right] ^{\frac{1}{2}}. \end{aligned}$$

Gronwall’s inequality, together with (3.19), (3.20), (3.21) and (3.22), yields

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|X^{x+\epsilon y}_t-X^x_t-\epsilon Y^x_t(y)|^2\right) \leqslant C|y|^4\epsilon ^4. \end{aligned}$$

Thus,

$$\begin{aligned} \lim \limits _{\epsilon \rightarrow 0} {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}\left| \epsilon ^{-1}\big (X^{x+\epsilon y}_t-X^x_t\big )- Y^x_t(y)\right| ^2\right) =0. \end{aligned}$$

Estimate (3.18) follows from Lemma 2.7 and Gronwall’s inequality. \(\square \)

Lemma 3.6

Assume (\(H_\nu \)), (H1) and (H2). For any \(x,y\in {\mathbb {R}}^d\) and \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\), we have \(\nabla _y X^x_t\in {\mathbb {D}}_\Theta ^{1,2}\) and

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta \nabla _y X^x_s|^2\right) \leqslant&\, C(1+|x|^2)|y|^2\left( \Vert h\Vert ^2_{{\mathbb {H}}_4(t)}+\Vert h\Vert ^2_{{\mathbb {H}}_8(t)}+\Vert {\mathbf {v}}\Vert _{{\mathbb {L}}^1_4(t)}^2 \right. \nonumber \\&+\left. \Vert {\mathbf {v}}\Vert _{{\mathbb {V}}_2(t)}^2+\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {V}}_4(t)}\right) , \end{aligned}$$
(3.24)

where C is a constant independent of \(x, y\) and t.

Proof

By an argument similar to that in the proof of Lemma 3.4, using Picard’s iteration we can prove that \(\nabla _y X^x_t\) is Malliavin differentiable, and by (3.17) we have

$$\begin{aligned} D_\Theta \nabla _y X^x_t =&\,\int _0^t\partial ^2_xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta X^x_s\nabla _y X^x_s\mathrm{d}s +\int _0^t\partial _xb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta \nabla _y X^x_s\mathrm{d}s\\&+\int _0^t\tilde{{\mathbb {E}}}\left( \partial _x\partial _\mu b\big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )D_\Theta X^x_s\nabla _y\tilde{X}^x_s\right) \mathrm{d}s\\&+\int _0^t\partial ^2_x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta X^x_s\nabla _y X^x_s\mathrm{d}W_s\\&+\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )D_\Theta \nabla _y X^x_s\mathrm{d}W_s\\&+\int _0^t\partial _x\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\nabla _y X^x_sh(s)\mathrm{d}s \\&+\int _0^t\tilde{{\mathbb {E}}}\left( \partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\nabla _y\tilde{X}^x_s\right) h(s)\mathrm{d}s\\&+\int _0^t\tilde{{\mathbb {E}}}\left( \partial _x\partial _\mu \sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )D_\Theta X^x_s\nabla _y\tilde{X}^x_s\right) \mathrm{d}W_s\\&+\int _0^t\!\!\int _{B_0}\tilde{{\mathbb {E}}}\left( \partial _\mu f\big ({\mathbb {P}}_{X^x_s}\big ) \big (\tilde{X}^x_s\big )\nabla _y\tilde{X}^x_s\right) {\mathbf {v}}(s,z)N(\mathrm{d}z, \mathrm{d}s). \end{aligned}$$

Then by Lemma 2.7 and Hölder’s inequality one can obtain

$$\begin{aligned}&{\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta \nabla _y X^x_s|^2\right) \leqslant C\int _0^t{\mathbb {E}}|D_\Theta X^x_s|^2|\nabla _y X^x_s|^2\mathrm{d}s +C\int _0^t{\mathbb {E}}|D_\Theta \nabla _y X^x_s|^2\mathrm{d}s\\&\ \ +C\int _0^t{\mathbb {E}}|D_\Theta X^x_s|^2\tilde{{\mathbb {E}}}|\nabla _y\tilde{X}^x_s|^2\mathrm{d}s +C{\mathbb {E}}\left( \int _0^t|\nabla _y X^x_s||h(s)|\mathrm{d}s\right) ^2\\&\ \ +C{\mathbb {E}}\left( \int _0^t\tilde{{\mathbb {E}}}|\nabla _y\tilde{X}^x_s||h(s)|\mathrm{d}s\right) ^2 +C{\mathbb {E}}\left( \int _0^t\!\!\!\int _{B_0}\tilde{{\mathbb {E}}}|\nabla _y\tilde{X}^x_s||{\mathbf {v}}(s,z)| \nu (\mathrm{d}z)\mathrm{d}s\right) ^2\\&\ \ +C{\mathbb {E}}\left( \int _0^t\!\!\!\int _{B_0}\tilde{{\mathbb {E}}}|\nabla _y\tilde{X}^x_s|^2|{\mathbf {v}}(s,z)|^2 \nu (\mathrm{d}z)\mathrm{d}s\right) \\&\leqslant C\int _0^t{\mathbb {E}}\left( \sup \limits _{s\leqslant r}|D_\Theta \nabla _y X^x_s|^2\right) \mathrm{d}r +C\left[ {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|\nabla _y X^x_t|^4\right) \right] ^{\frac{1}{2}}\Vert h\Vert ^2_{{\mathbb {H}}_4(t)}\\&\ \ +C\left[ {\mathbb {E}}\left( \sup \limits _{t\in [0,1]}|\nabla _y X^x_t|^4\right) \right] ^{\frac{1}{2}}\left[ {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta X^x_s|^4\right) \right] ^{\frac{1}{2}}\\&\ \ +C\tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\nabla _y\tilde{X}^x_s|^2\right) \Vert {\mathbf {v}}\Vert _{{\mathbb {V}}_2(t)}^2 +C\left[ \tilde{{\mathbb {E}}}\left( \sup \limits _{s\in [0,1]}|\nabla _y\tilde{X}^x_s|^4\right) \right] ^{\frac{1}{2}}\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {V}}_4(t)}. \end{aligned}$$

Gronwall’s inequality, together with (3.9) and (3.18), gives

$$\begin{aligned} {\mathbb {E}}\left( \sup \limits _{s\leqslant t}|D_\Theta \nabla _y X^x_s|^2\right) \leqslant&\, C(1+|x|^2)|y|^2\left( \Vert h\Vert ^2_{{\mathbb {H}}_4(t)}+\Vert h\Vert ^2_{{\mathbb {H}}_8(t)} +\Vert {\mathbf {v}}\Vert _{{\mathbb {L}}^1_4(t)}^2\right. \nonumber \\&+\left. \Vert {\mathbf {v}}\Vert _{{\mathbb {V}}_2(t)}^2+\Vert {\mathbf {v}}\Vert ^2_{{\mathbb {V}}_4(t)}\right) , \end{aligned}$$

where C is a constant independent of \(x, y\) and t. \(\square \)

3.3 Proof of Theorem 1.2

The following lemma, which was introduced in [24, Lemma 5.2] and [27, Lemmas 2.5 and 2.6], is very useful for deriving the gradient estimates.

Lemma 3.7

Under (1.5), we have the following statements:

  1. 1.

    for any \(p\geqslant 2\), there exist constants \(\epsilon _0, C_0, C_1>0\) such that for all \(\epsilon \in (0, \epsilon _0)\),

    $$\begin{aligned} C_0\epsilon ^{p-\alpha }\leqslant \int _{[|z|\leqslant \epsilon ]}|z|^p\nu (\mathrm{d}z)\leqslant C_1\epsilon ^{p-\alpha }. \end{aligned}$$
    (3.25)
  2. 2.

    for any \(p\geqslant 2\), there exists a constant \(C_p>0\) such that for each \(t,\epsilon \in (0,1)\),

    $$\begin{aligned} {\mathbb {E}}\left( \int _0^t\!\!\int _{[0<|z|\leqslant \epsilon ]}|z|^3 N(\mathrm{d}z,\mathrm{d}s)\right) ^{-p} \leqslant C_p\left( (t\epsilon ^{3-\alpha })^{-p}+t^{-\frac{3 p}{\alpha }}\right) . \end{aligned}$$
    (3.26)
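As a sanity check, (3.25) is immediate for the rotationally invariant \(\alpha \)-stable measure \(\nu (\mathrm{d}z)=I_{B_0}(z)|z|^{-d-\alpha }\mathrm{d}z\) used in Section 4: passing to polar coordinates, for \(\epsilon <1\) and \(p>\alpha \),

$$\begin{aligned} \int _{[|z|\leqslant \epsilon ]}|z|^p\nu (\mathrm{d}z) =\omega _d\int _0^\epsilon r^{p-1-\alpha }\mathrm{d}r =\frac{\omega _d}{p-\alpha }\,\epsilon ^{p-\alpha }, \end{aligned}$$

where \(\omega _d\) denotes the surface area of the unit sphere in \({\mathbb {R}}^d\).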

For any \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\), by (3.5), (3.6), (3.8) and applying Itô’s formula to \(K_tD_\Theta X^x_t\), one can easily have

$$\begin{aligned}&D_\Theta X^x_t=J_t\left( \int _0^tK_s\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )h(s)\mathrm{d}s +\int _0^t\!\!\int _{B_0}K_s f\big ({\mathbb {P}}_{X^x_s}\big ){\mathbf {v}}(s,z)N(\mathrm{d}z,\mathrm{d}s)\right) , \ \ \ \nonumber \\ {}&\quad \forall t\in [0,1]. \end{aligned}$$
(3.27)

For each fixed \(t\in (0,1)\), let \(\zeta _t(z)\) be a smooth, nonnegative and real-valued function such that

$$\begin{aligned} \zeta _t(z)=|z|^3, \text { if } |z|\leqslant \frac{1}{4}t^{\frac{1}{\alpha }}\ \ \text { and }\ \ \zeta _t(z)=0, \text { if } |z|\geqslant \frac{1}{2}t^{\frac{1}{\alpha }} \end{aligned}$$

with \(|\nabla _z\zeta _t(z)|\leqslant C|z|^2\) and \(|\zeta _t(z)|\leqslant C|z|^3\), where C is a constant independent of t.
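One admissible choice is, for instance, \(\zeta _t(z)=|z|^3\chi \big (4|z|t^{-\frac{1}{\alpha }}\big )\), where \(\chi :[0,\infty )\rightarrow [0,1]\) is smooth with \(\chi \equiv 1\) on \([0,1]\) and \(\chi \equiv 0\) on \([2,\infty )\). Then \(\zeta _t(z)=|z|^3\) for \(|z|\leqslant \frac{1}{4}t^{\frac{1}{\alpha }}\), \(\zeta _t(z)=0\) for \(|z|\geqslant \frac{1}{2}t^{\frac{1}{\alpha }}\), and on the support one has \(4|z|t^{-\frac{1}{\alpha }}\leqslant 2\), so that

$$\begin{aligned} |\nabla _z\zeta _t(z)|\leqslant 3|z|^2+4\Vert \chi '\Vert _\infty |z|^3t^{-\frac{1}{\alpha }}\leqslant C|z|^2, \end{aligned}$$

with C independent of t.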

In what follows, we choose specific \((h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\) in two cases.

  1. 1.

    If \(\Vert \sigma ^{-1}\Vert _\infty :=\sup \nolimits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2}| \sigma ^{-1}(x,\mu )|<\infty \), for any fixed \(t\in (0,1]\) and \(1\leqslant j\leqslant d\), we set

    $$\begin{aligned} h_{t,j}(s)=\frac{1}{t}\sigma ^{-1}\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}, \quad \forall s\in [0,t] \ \ \ \text {and} \ \ \ {\mathbf {v}}\equiv 0, \end{aligned}$$
    (3.28)

    where \((J_s)_{\cdot j}\) stands for the j-th column of \(J_s\).

  2. 2.

    If \(\Vert f^{-1}\Vert _\infty :=\sup \nolimits _{\mu \in \mathcal {P}_2}|f^{-1}(\mu )|<\infty \), for any fixed \(t\in (0,1]\) and \(1\leqslant j\leqslant d\), set

    $$\begin{aligned} h\equiv 0\ \ \ \text {and} \ \ \ {\mathbf {v}}_{t,j}(s,z)=f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}\zeta _t(z), \quad \forall s\in [0,t], z\in B_0. \end{aligned}$$
    (3.29)

    Define

    $$\begin{aligned} \delta _t({\mathbf {v}}_{t,j}):=\int _0^t\!\!\int _{B_0} \frac{\mathrm{div}(\kappa (z){\mathbf {v}}_{t,j}(s,z))}{\kappa (z)}\widehat{N}(\mathrm{d}z,\mathrm{d}s), \end{aligned}$$

    and

    $$\begin{aligned} G_{t,j}:=\int _0^t\!\!\int _{B_0}\langle \nabla _z\zeta _t(z),{\mathbf {v}}_{t,j}(s,z)\rangle N(\mathrm{d}z,\mathrm{d}s). \end{aligned}$$

We have the following estimates.

Lemma 3.8

  1. 1.

    Assume \(\Vert \sigma ^{-1}\Vert _\infty <\infty \). For any \(p\geqslant 2\), we have

    $$\begin{aligned} \Vert h_{t,j}\Vert _{{\mathbb {H}}_p(t)}\leqslant C_p t^{-\frac{1}{2}},\ \ 1\leqslant j\leqslant d, \end{aligned}$$
    (3.30)

    where \(C_p\) is a constant independent of t.

  2. 2.

    Assume \(\Vert f^{-1}\Vert _\infty <\infty \). For any \(p\geqslant 2\), we have

    $$\begin{aligned} \Vert {\mathbf {v}}_{t,j}\Vert _{{\mathbb {L}}^1_p(t)}\leqslant C_pt^{\frac{3}{\alpha }}, \ \ \ \ \Vert {\mathbf {v}}_{t,j}\Vert _{{\mathbb {V}}_p(t)}\leqslant C_pt^{\frac{2}{\alpha }},\ \ 1\leqslant j\leqslant d, \end{aligned}$$
    (3.31)

    and

    $$\begin{aligned} {\mathbb {E}}|\delta _t({\mathbf {v}}_{t,j})|^p\leqslant C_pt^{\frac{2p}{\alpha }}, \ \ \ {\mathbb {E}}|G_{t,j}|^p\leqslant C_pt^{\frac{5p}{\alpha }},\ \ 1\leqslant j\leqslant d, \end{aligned}$$
    (3.32)

    where \(C_p\) is a constant independent of t.

Proof

  1. 1.

    By (3.7), we have

    $$\begin{aligned} \Vert h_{t,j}\Vert _{{\mathbb {H}}_p(t)}&=\left[ {\mathbb {E}}\left( \int _0^t\left| \frac{1}{t}\sigma ^{-1}\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}\right| ^2\mathrm{d}s \right) ^{\frac{p}{2}}\right] ^{\frac{1}{p}}\\&\leqslant \, \frac{1}{t}\Vert \sigma ^{-1}\Vert _\infty \left[ {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|J_s|^p\right) \right] ^{\frac{1}{p}}t^{\frac{1}{2}} \leqslant C_p t^{-\frac{1}{2}}, \end{aligned}$$

    where \(C_p\) is a constant independent of t.

  2. 2.

    For any \(p\geqslant 2\) and \(j=1,\ldots ,d\), by (2.10) and (3.25) we can obtain

    $$\begin{aligned} \Vert {\mathbf {v}}_{t,j}\Vert _{{\mathbb {L}}^1_p(t)}^p \leqslant&\, C_p{\mathbb {E}}\int _0^t\!\!\int _{B_0}|{\mathbf {v}}_{t,j}(s,z)|^p\nu (\mathrm{d}z)\mathrm{d}s +C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{B_0}|{\mathbf {v}}_{t,j}(s,z)|\nu (\mathrm{d}z)\mathrm{d}s\right) ^p\\ \leqslant&\, C_p{\mathbb {E}}\int _0^t\!\!\int _{B_0}|J_s|^p|\zeta _t(z)|^p\nu (\mathrm{d}z)\mathrm{d}s +C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{B_0}|J_s||\zeta _t(z)|\nu (\mathrm{d}z)\mathrm{d}s\right) ^p\\ \leqslant&\, C_p{\mathbb {E}}\int _0^t\!\!\int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} |J_s|^p|z|^{3p}\nu (\mathrm{d}z)\mathrm{d}s \\&\quad +C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} |J_s||z|^3\nu (\mathrm{d}z)\mathrm{d}s\right) ^p\\ \leqslant&\, C_pt^{\frac{3p-\alpha }{\alpha }}t +C_p t^p\left( t^{\frac{3-\alpha }{\alpha }}\right) ^p\leqslant C_pt^{\frac{3p}{\alpha }}. \end{aligned}$$

    Observe that

    $$\begin{aligned} |\nabla _z{\mathbf {v}}_{t,j}(s,z)| =|f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}\nabla _z\zeta _t(z)| \leqslant C|J_s||z|^2I_{[0<|z|\leqslant t^{\frac{1}{\alpha }}]}. \end{aligned}$$

    Then we have

    $$\begin{aligned} \Vert {\mathbf {v}}_{t,j}\Vert _{{\mathbb {V}}_p(t)}^p \leqslant&\, C_p\Vert \nabla _z{\mathbf {v}}_{t,j}\Vert _{{\mathbb {L}}^1_p(t)}^p+C_p\Vert \varrho {\mathbf {v}}_{t,j}\Vert _{{\mathbb {L}}^1_p(t)}^p\\ \leqslant&\, C_p{\mathbb {E}}\int _0^t\!\!\int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} |J_s|^p|z|^{2p}\nu (\mathrm{d}z)\mathrm{d}s \\&+C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} |J_s||z|^2\nu (\mathrm{d}z)\mathrm{d}s\right) ^p\\ \leqslant&\, C_pt^{\frac{2p-\alpha }{\alpha }}t+C_p(t^{\frac{2-\alpha }{\alpha }}t)^p\leqslant C_pt^{\frac{2p}{\alpha }}. \end{aligned}$$

    Since

    $$\begin{aligned} \left| \frac{\mathrm{div}(\kappa (z){\mathbf {v}}_{t,j}(s,z))}{\kappa (z)}\right|&=\left| \langle \nabla \log \kappa (z),f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}\zeta _t(z)\rangle \right. \\&\quad \left. +\langle f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j},\nabla _z\zeta _t(z)\rangle \right| \\&\leqslant \frac{C}{|z|}|J_s||z|^3I_{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} +C|J_s||z|^2I_{[0<|z|\leqslant t^{\frac{1}{\alpha }}]} \leqslant C|J_s||z|^2I_{[0<|z| \leqslant t^{\frac{1}{\alpha }}]}, \end{aligned}$$

    then by (2.9) and (3.25) we have for any \(p\geqslant 2\),

    $$\begin{aligned} {\mathbb {E}}|\delta _t({\mathbf {v}}_{t,j})|^p =&\,{\mathbb {E}}\left| \int _0^t\!\!\int _{B_0}\frac{\mathrm{div}(\kappa (z){\mathbf {v}}_{t,j}(s,z))}{\kappa (z)} \widehat{N}(\mathrm{d}z,\mathrm{d}s)\right| ^p\\ \leqslant&\, \,C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{B_0}\left| \frac{\mathrm{div}(\kappa (z){\mathbf {v}}_{t,j}(s,z))}{\kappa (z)}\right| ^2\nu (\mathrm{d}z)\mathrm{d}s\right) ^{\frac{p}{2}}\\ \leqslant&\, \,C_p\left( \int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]}|z|^4\nu (\mathrm{d}z)\right) ^{\frac{p}{2}} {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|J_s|^p\right) t^{\frac{p}{2}}\\ \leqslant&\, \, C_pt^{\frac{p(4-\alpha )}{2\alpha }}t^{\frac{p}{2}} =C_pt^{\frac{2p}{\alpha }}. \end{aligned}$$

    It follows from (2.10), (3.7) and (3.25) that

    $$\begin{aligned} {\mathbb {E}}|G_{t,j}|^p&={\mathbb {E}}\left| \int _0^t\!\!\int _{B_0}\langle \nabla _z\zeta _t(z),{\mathbf {v}}_{t,j}(s,z)\rangle N(\mathrm{d}z,\mathrm{d}s)\right| ^p\\&\leqslant C_p{\mathbb {E}}\int _0^t\!\!\int _{B_0} \left| \langle \nabla _z\zeta _t(z),{\mathbf {v}}_{t,j}(s,z)\rangle \right| ^p\nu (\mathrm{d}z)\mathrm{d}s \\&\quad +C_p{\mathbb {E}}\left( \int _0^t\!\!\int _{B_0} \left| \langle \nabla _z\zeta _t(z),{\mathbf {v}}_{t,j}(s,z)\rangle \right| \nu (\mathrm{d}z)\mathrm{d}s\right) ^p\\&\leqslant C_p\int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]}|z|^{5p}\nu (\mathrm{d}z) {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|J_s|^p\right) \\&\quad +C_p\left( \int _{[0<|z|\leqslant t^{\frac{1}{\alpha }}]}|z|^{5}\nu (\mathrm{d}z)\right) ^p {\mathbb {E}}\left( \sup \limits _{s\in [0,1]}|J_s|^p\right) \\&\leqslant C_pt^\frac{5p-\alpha }{\alpha }t+C_pt^{\frac{(5-\alpha )p}{\alpha }}t^p \leqslant C_pt^{\frac{5p}{\alpha }}. \end{aligned}$$

\(\square \)

Now we are ready to give the proof of Theorem 1.2.

Proof

For any \(\Theta =(h,{\mathbf {v}})\in {\mathbb {H}}_{\infty -}\times {\mathbb {V}}_{\infty -}\), by (3.5), (3.6), (3.8) and applying Itô’s formula to \(K_tD_\Theta X^x_t\), one can easily have

$$\begin{aligned}&D_\Theta X^x_t=J_t\left( \int _0^tK_s\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )h(s)\mathrm{d}s +\int _0^t\!\!\int _{B_0}K_s f\big ({\mathbb {P}}_{X^x_s}\big ){\mathbf {v}}(s,z)N(\mathrm{d}z,\mathrm{d}s)\right) , \ \ \ \nonumber \\ {}&\quad \forall t\in [0,1]. \end{aligned}$$
(3.33)
  1. 1.

    Assume \(\Vert \sigma ^{-1}\Vert _\infty <\infty \). For any fixed \(t\in (0,1]\) and \(1\leqslant j\leqslant d\), set \(h_{t,j}\) and \({\mathbf {v}}\) as in (3.28). Define a matrix \(M_t\) by

    $$\begin{aligned} (M_t)_{ij}:=D_{(h_{t,j},0)}X^{x,i}_t,\ \ 1\leqslant i,j\leqslant d, \end{aligned}$$

    where \(X^{x,i}_t\) stands for the i-th element of \(X^x_t\). Then by (3.33) we obtain

    $$\begin{aligned} M_t=\left( D_{(h_{t,1},0)}X^x_t,\ldots ,D_{(h_{t,d},0)}X^x_t\right) =J_t. \end{aligned}$$
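    Indeed, plugging (3.28) into (3.33) and using \(K_s(J_s)_{\cdot j}=e_j\), the j-th standard basis vector (K being the inverse of J), we get

    $$\begin{aligned} D_{(h_{t,j},0)}X^x_t =J_t\cdot \frac{1}{t}\int _0^tK_s(J_s)_{\cdot j}\mathrm{d}s =J_te_j=(J_t)_{\cdot j}. \end{aligned}$$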

    For any \(g\in C^1_b({\mathbb {R}}^d)\), by Theorem 2.6 we have

    $$\begin{aligned} {\mathbb {E}}\nabla g(X^x_t)&={\mathbb {E}}\nabla g(X^x_t)M_tK_t =\sum _{i=1}^d{\mathbb {E}}\left( D_{(h_{t,i},0)}g(X^x_t)(K_t)_{i\cdot }\right) \\&=\sum _{i=1}^d{\mathbb {E}}D_{(h_{t,i},0)}\left( g(X^x_t)(K_t)_{i\cdot }\right) -\sum _{i=1}^d{\mathbb {E}}\left( g(X^x_t)D_{(h_{t,i},0)}(K_t)_{i\cdot }\right) \\&={\mathbb {E}}\left[ g(X^x_t)\sum _{i=1}^d\left( (K_t)_{i\cdot }\int _0^t\langle h_{t,i}(s),\mathrm{d}W_s\rangle -D_{(h_{t,i},0)}(K_t)_{i\cdot }\right) \right] , \end{aligned}$$

    where \((K_t)_{i\cdot }\) stands for the i-th row of \(K_t\). Moreover, it follows from Hölder’s inequality, (3.7), (3.14) and (3.30) that

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)|&\leqslant C\Vert g\Vert _\infty (1+|x|)\sum _{i=1}^d \left( \Vert h_{t,i}\Vert _{{\mathbb {H}}_2(t)}+\Vert h_{t,i}\Vert _{{\mathbb {H}}_4(t)} +\Vert h_{t,i}\Vert _{{\mathbb {H}}_8(t)}\right) \\&\leqslant C\Vert g\Vert _\infty (1+|x|)t^{-\frac{1}{2}}, \end{aligned}$$

    where C is a constant independent of x and t. For any \(y\in {\mathbb {R}}^d\) and \(g\in C^1_b({\mathbb {R}}^d)\), also by Theorem 2.6 one has

    $$\begin{aligned} \nabla _y{\mathbb {E}}g(X^x_t)= & {} {\mathbb {E}}\left( \nabla g(X^x_t)\nabla _y X^x_t\right) \\= & {} {\mathbb {E}}\left( \nabla g(X^x_t)M_tK_t\nabla _y X^x_t\right) =\sum _{j=1}^d{\mathbb {E}}\left( D_{(h_{t,j},0)}g(X^x_t)(K_t\nabla _y X^x_t)_{j\cdot }\right) \\= & {} \sum _{j=1}^d{\mathbb {E}}D_{(h_{t,j},0)}\left( g(X^x_t)(K_t\nabla _y X^x_t)_{j\cdot }\right) \\&-\sum _{j=1}^d{\mathbb {E}}\left( g(X^x_t)D_{(h_{t,j},0)}(K_t\nabla _y X^x_t)_{j\cdot }\right) \\= & {} {\mathbb {E}}\left[ g(X^x_t)\sum _{j=1}^d\left( (K_t)_{j\cdot }\nabla _y X^x_t \int _0^t\langle h_{t,j}(s),\mathrm{d}W_s\rangle \right. \right. \\&\quad -\left. \left. D_{(h_{t,j},0)}(K_t)_{j\cdot }\nabla _y X^x_t -(K_t)_{j\cdot }(D_{(h_{t,j},0)}\nabla _y X^x_t)\right) \right] . \end{aligned}$$

    Hence, by Hölder’s inequality, (3.7), (3.9), (3.14), (3.18), (3.24) and (3.30) one can arrive at

    $$\begin{aligned} |\nabla _y{\mathbb {E}}g(X^x_t)| \leqslant&\, C\Vert g\Vert _\infty \sum _{j=1}^d\Bigg [\Big ({\mathbb {E}}\sup \limits _{t\in [0,1]}|K_t|^4\Big )^{\frac{1}{4}} \Big ({\mathbb {E}}\sup \limits _{t\in [0,1]}|\nabla _yX^x_t|^4\Big )^{\frac{1}{4}}\Vert h_{t,j}\Vert _{{\mathbb {H}}_2(t)}\\&\quad +\Big ({\mathbb {E}}\sup \limits _{s\leqslant t}|D_{(h_{t,j},0)}K_s|^2\Big )^{\frac{1}{2}} \Big ({\mathbb {E}}\sup \limits _{t\in [0,1]}|\nabla _y X^x_t|^2\Big )^{\frac{1}{2}}\\&\quad +\Big ({\mathbb {E}}\sup \limits _{t\in [0,1]}|K_t|^2\Big )^{\frac{1}{2}} \Big ({\mathbb {E}}\sup \limits _{t\in [0,1]}|D_{(h_{t,j},0)}\nabla _y X^x_t|^2\Big )^{\frac{1}{2}}\Bigg ]\\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)|y|\sum _{j=1}^d\left( \Vert h_{t,j}\Vert _{{\mathbb {H}}_2(t)}+\Vert h_{t,j}\Vert _{{\mathbb {H}}_4(t)} +\Vert h_{t,j}\Vert _{{\mathbb {H}}_8(t)}\right) \\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)|y| t^{-\frac{1}{2}}. \end{aligned}$$
  2. 2.

    Assume \(\Vert f^{-1}\Vert _\infty <\infty \). For any fixed \(t\in (0,1]\) and \(1\leqslant j\leqslant d\), set

    $$\begin{aligned} h\equiv 0 \ \ \ \text {and} \ \ \ {\mathbf {v}}_{t,j}(s,z)=f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}\zeta _t(z),\quad \forall s\in [0,t],\ z\in B_0. \end{aligned}$$

    Define a matrix \(\hat{M}_t\) by

    $$\begin{aligned} (\hat{M}_t)_{ij}:=D_{(0,{\mathbf {v}}_{t,j})}X^{x,i}_t,\ \ 1\leqslant i,j\leqslant d. \end{aligned}$$

    Then by (3.33) we obtain

    $$\begin{aligned} \hat{M}_t=\left( D_{(0,{\mathbf {v}}_{t,1})}X^x_t,\ldots ,D_{(0,{\mathbf {v}}_{t,d})}X^x_t\right) =J_t\int _0^t\!\!\int _{B_0}\zeta _t(z)N(\mathrm{d}z,\mathrm{d}s) =:J_tH_t. \end{aligned}$$
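    Indeed, by (3.33), since \(K_sf\big ({\mathbb {P}}_{X^x_s}\big )f^{-1}\big ({\mathbb {P}}_{X^x_s}\big )(J_s)_{\cdot j}=K_s(J_s)_{\cdot j}=e_j\) and \(\zeta _t\) is scalar-valued,

    $$\begin{aligned} D_{(0,{\mathbf {v}}_{t,j})}X^x_t =J_t\int _0^t\!\!\int _{B_0}e_j\zeta _t(z)N(\mathrm{d}z,\mathrm{d}s) =(J_t)_{\cdot j}H_t. \end{aligned}$$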

    For any \(g\in C^1_b({\mathbb {R}}^d)\), due to Theorem 2.6 we have

    $$\begin{aligned} {\mathbb {E}}\nabla g(X^x_t)&={\mathbb {E}}\left( \nabla g(X^x_t)\hat{M}_tK_tH_t^{-1}\right) =\sum _{j=1}^d{\mathbb {E}}\left( D_{(0,{\mathbf {v}}_{t,j})}g(X^x_t)(K_t)_{j\cdot }H_t^{-1}\right) \\&=\sum _{j=1}^d{\mathbb {E}}D_{(0,{\mathbf {v}}_{t,j})}\left( g(X^x_t)(K_t)_{j\cdot }H_t^{-1}\right) -\sum _{j=1}^d{\mathbb {E}}\left( g(X^x_t)D_{(0,{\mathbf {v}}_{t,j})}\left( (K_t)_{j\cdot }H_t^{-1}\right) \right) \\&={\mathbb {E}}\left[ g(X^x_t)\sum _{j=1}^d\left( (K_t)_{j\cdot }H_t^{-1}\delta _t({\mathbf {v}}_{t,j}) -D_{(0,{\mathbf {v}}_{t,j})}(K_t)_{j\cdot }H_t^{-1}+(K_t)_{j\cdot }H_t^{-2}G_{t,j}\right) \right] . \end{aligned}$$

    Moreover, it follows from (3.7), (3.14), (3.26), (3.31) and (3.32) that

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)| \leqslant&\, C\Vert g\Vert _\infty \sum _{j=1}^d \Big (\Vert K_t\Vert _{L^4}\Vert H_t^{-1}\Vert _{L^4}\Vert \delta _t({\mathbf {v}}_{t,j})\Vert _{L^2} +\Vert D_{(0,{\mathbf {v}}_{t,j})}K_t\Vert _{L^2}\Vert H_t^{-1}\Vert _{L^2}\\ \quad&+\Vert K_t\Vert _{L^4}\Vert H_t^{-2}\Vert _{L^4}\Vert G_{t,j}\Vert _{L^2}\Big )\\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)\left( t^{-\frac{3}{\alpha }}t^{\frac{2}{\alpha }} +t^{-\frac{3}{\alpha }}t^{\frac{3}{\alpha }}+t^{-\frac{6}{\alpha }}t^{\frac{5}{\alpha }}\right) \\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)t^{-\frac{1}{\alpha }}, \end{aligned}$$

    where C is a constant independent of x and t. For any \(y\in {\mathbb {R}}^d\) and \(g\in C^1_b({\mathbb {R}}^d)\),

    $$\begin{aligned} \nabla _y{\mathbb {E}}g(X^x_t)&={\mathbb {E}}\left( \nabla g(X^x_t)\nabla _y X^x_t\right) ={\mathbb {E}}\left( \nabla g(X^x_t)\hat{M}_tK_tH_t^{-1}\nabla _y X^x_t\right) \\&=\sum _{j=1}^d{\mathbb {E}}\left( D_{(0,{\mathbf {v}}_{t,j})}g(X^x_t)(K_tH_t^{-1} \nabla _y X^x_t)_{j\cdot }\right) \\&=\sum _{j=1}^d{\mathbb {E}}D_{(0,{\mathbf {v}}_{t,j})}\left( g(X^x_t)(K_tH_t^{-1}\nabla _y X^x_t)_{j\cdot }\right) \\&\quad -\sum _{j=1}^d{\mathbb {E}}\left( g(X^x_t)D_{(0,{\mathbf {v}}_{t,j})}(K_tH_t^{-1}\nabla _y X^x_t)_{j\cdot }\right) \\&={\mathbb {E}}\Bigg [g(X^x_t)\sum _{j=1}^d\Big ((K_t)_{j\cdot }H_t^{-1}\nabla _y X^x_t\delta _t({\mathbf {v}}_{t,j}) \\&\quad -D_{(0,{\mathbf {v}}_{t,j})}(K_t)_{j\cdot }H_t^{-1}\nabla _y X^x_t +(K_t)_{j\cdot }H_t^{-2}G_{t,j}\nabla _y X^x_t\\&\quad -(K_t)_{j\cdot }H_t^{-1}(D_{(0,{\mathbf {v}}_{t,j})}\nabla _y X^x_t)\Big )\Bigg ]. \end{aligned}$$

    Hence, by Hölder’s inequality, (3.7), (3.14), (3.24), (3.26), (3.31) and (3.32) one can arrive at

    $$\begin{aligned} |\nabla _y{\mathbb {E}}g(X^x_t)| \leqslant&\, C\Vert g\Vert _\infty \sum _{j=1}^d\Bigg [\Vert K_t\Vert _{L^8}\Vert H_t^{-1}\Vert _{L^4} \Vert \nabla _yX^x_t\Vert _{L^8}\Vert \delta _t({\mathbf {v}}_{t,j})\Vert _{L^2} \\&+\Vert D_{(0,{\mathbf {v}}_{t,j})}K_t\Vert _{L^2}\Vert H_t^{-1}\Vert _{L^4}\Vert \nabla _yX^x_t\Vert _{L^4}\\&+\Vert K_t\Vert _{L^4}\Vert H_t^{-2}\Vert _{L^2}\Vert G_{t,j}\Vert _{L^8}\Vert \nabla _yX^x_t\Vert _{L^8} \\&+\Vert K_t\Vert _{L^4}\Vert H_t^{-1}\Vert _{L^4}\Vert D_{(0,{\mathbf {v}}_{t,j})}\nabla _yX^x_t\Vert _{L^2}\Bigg ]\\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)|y|\left( t^{-\frac{3}{\alpha }}t^{\frac{2}{\alpha }} +t^{-\frac{3}{\alpha }}t^{\frac{3}{\alpha }}+t^{-\frac{6}{\alpha }}t^{\frac{5}{\alpha }} +t^{-\frac{3}{\alpha }}t^{\frac{2}{\alpha }}\right) \\ \leqslant&\, C\Vert g\Vert _\infty (1+|x|)|y| t^{-\frac{1}{\alpha }}. \end{aligned}$$

\(\square \)

We now give the proof of Corollary 1.4.

Proof

We only prove the first statement, since the second one can be obtained by the same argument. For any \(x_1, x_2\in {\mathbb {R}}^d\), according to Lemma 2.1.1 in [21] and (1.6), both \(X^{x_1}_t\) and \(X^{x_2}_t\) have density functions, denoted by \(p_t(x_1,y)\) and \(p_t(x_2,y)\), respectively. By (1.7), we have

$$\begin{aligned} \left| {\mathbb {E}}g(X^{x_1}_t)-{\mathbb {E}}g(X^{x_2}_t)\right| =&\,\left| \int _0^1\frac{\mathrm{d}}{\mathrm{d}r}{\mathbb {E}}g(X^{x_2+r(x_1-x_2)}_t)\mathrm{d}r\right| \\ \leqslant&\, \int _0^1\left| \nabla _{x_1-x_2}{\mathbb {E}}g(X^{x_2+r(x_1-x_2)}_t)\right| \mathrm{d}r\\ \leqslant&\, C\Vert g\Vert _\infty (1+|x_1|+|x_2|)|x_1-x_2|t^{-\frac{1}{2}}. \end{aligned}$$

Hence,

$$\begin{aligned} \int _{{\mathbb {R}}^d}|p_t(x_1,y)-p_t(x_2,y)|\mathrm{d}y =&\,\sup \limits _{\Vert g\Vert _\infty \leqslant 1, g\in {\mathscr {B}}_b({\mathbb {R}}^d)}\left| {\mathbb {E}}g(X^{x_1}_t)-{\mathbb {E}}g(X^{x_2}_t)\right| \\ =&\,\sup \limits _{\Vert g\Vert _\infty \leqslant 1, g\in C^1_b({\mathbb {R}}^d)}\left| {\mathbb {E}}g(X^{x_1}_t)-{\mathbb {E}}g(X^{x_2}_t)\right| \\ \leqslant&\, C(1+|x_1|+|x_2|)|x_1-x_2|t^{-\frac{1}{2}}. \end{aligned}$$

\(\square \)

3.4 Proof of Theorem 1.5

Proof

The proof is divided into three steps.

  • Step 1 We first prove that for any \(\mu _1,\mu _2\in \mathcal {P}_2\),

    $$\begin{aligned} {\mathbb {W}}_2(P^*_{s,t}\mu _1,P^*_{s,t}\mu _2)^2\leqslant {\mathbb {W}}_2(\mu _1,\mu _2)^2\text {e}^{-(C_2-C_1)(t-s)}. \end{aligned}$$
    (3.34)

    Without loss of generality, we only prove the case \(s=0\). Let \(\xi _1\) and \(\xi _2\) be two square-integrable, \({\mathscr {F}}_0\)-measurable random variables with distributions \(\mu _1\) and \(\mu _2\), respectively, such that

    $$\begin{aligned} {\mathbb {W}}_2(\mu _1,\mu _2)^2={\mathbb {E}}|\xi _1-\xi _2|^2. \end{aligned}$$

    Denote by \(X^{\xi _1}_t\) and \(X^{\xi _2}_t\) the solutions to (1.1) with initial value \(\xi _1\) and \(\xi _2\), respectively. By (H3) and Itô’s formula, we have

    $$\begin{aligned}&{\mathbb {E}}\left( |X^{\xi _1}_t-X^{\xi _2}_t|^2\text {e}^{(C_2-C_1)t}\right) \nonumber \\&\quad =\,{\mathbb {W}}_2(\mu _1,\mu _2)^2+\int _0^t{\mathbb {E}}\Big (2\Big \langle X^{\xi _1}_s-X^{\xi _2}_s, b\Big (X^{\xi _1}_s,{\mathbb {P}}_{X^{\xi _1}_s}\Big )-b\Big (X^{\xi _2}_s,{\mathbb {P}}_{X^{\xi _2}_s}\Big )\Big \rangle \nonumber \\&\quad \quad +\Vert \sigma \Big (X^{\xi _1}_s,{\mathbb {P}}_{X^{\xi _1}_s}\Big )-\sigma \Big (X^{\xi _2}_s, {\mathbb {P}}_{X^{\xi _2}_s}\Big )\Vert _{HS}^2 \nonumber \\&\quad \quad +\int _{{\mathbb {R}}^d_0}|z|^2\nu (\mathrm{d}z)|f({\mathbb {P}}_{X^{\xi _1}_s})-f({\mathbb {P}}_{X^{\xi _2}_s})|^2\Big ) \text {e}^{(C_2-C_1)s}\mathrm{d}s\nonumber \\&\quad \quad +(C_2-C_1)\int _0^t{\mathbb {E}}|X^{\xi _1}_s-X^{\xi _2}_s|^2\text {e}^ {(C_2-C_1)s}\mathrm{d}s\nonumber \\&\quad \leqslant \, {\mathbb {W}}_2(\mu _1,\mu _2)^2+\int _0^t{\mathbb {E}}\Big (C_1{\mathbb {W}}_2({\mathbb {P}}_{X^{\xi _1}_s},{\mathbb {P}}_{X^{\xi _2}_s})^2 -C_2|X^{\xi _1}_s-X^{\xi _2}_s|^2\Big )\text {e}^{(C_2-C_1)s}\mathrm{d}s\nonumber \\&\quad \quad +(C_2-C_1)\int _0^t{\mathbb {E}}|X^{\xi _1}_s-X^{\xi _2}_s|^2\text {e}^{(C_2-C_1)s}\mathrm{d}s\nonumber \\&\quad \leqslant {\mathbb {W}}_2(\mu _1,\mu _2)^2. \end{aligned}$$
    (3.35)

    Hence,

    $$\begin{aligned} {\mathbb {W}}_2(P^*_t\mu _1,P^*_t\mu _2)^2 \leqslant {\mathbb {E}}\left( |X^{\xi _1}_t-X^{\xi _2}_t|^2\right) \leqslant {\mathbb {W}}_2(\mu _1,\mu _2)^2\text {e}^{-(C_2-C_1)t}. \end{aligned}$$
  • Step 2 We prove the existence and uniqueness of the invariant measure. Let \(X^0_t\) denote the solution with initial value 0 and set \(\epsilon _0:=\frac{C_2-C_1}{4}\). By Itô’s formula, (H3), (3.2) and Young’s inequality, we have

    $$\begin{aligned}&{\mathbb {E}}\left( |X^0_t|^2\text {e}^{(C_2-C_1-2\epsilon _0)t}\right) \\&\quad ={\mathbb {E}}\int _0^t\left( 2\langle b(X^0_s,{\mathbb {P}}_{X^0_s}),X^0_s\rangle +\Vert \sigma (X^0_s,{\mathbb {P}}_{X^0_s})\Vert _{HS}^2 \right. \\&\quad \quad +\left. \int _{{\mathbb {R}}^d_0}|z|^2\nu (\mathrm{d}z)|f({\mathbb {P}}_{X^0_s})|^2\right) \text {e}^ {(C_2-C_1-2\epsilon _0)s}\mathrm{d}s\\&\quad \quad +(C_2-C_1-2\epsilon _0)\int _0^t{\mathbb {E}}| X^0_s|^2\text {e}^{(C_2-C_1-2\epsilon _0)s}\mathrm{d}s\\&\quad \leqslant \, C_0+\int _0^t\Big ((C_1+\epsilon _0) {\mathbb {W}}_2({\mathbb {P}}_{X^0_s},\delta _0)^2-(C_2-\epsilon _0){\mathbb {E}}|X^0_s|^2\Big )\text {e}^{(C_2-C_1-2\epsilon _0)s}\mathrm{d}s\\&\quad \quad +(C_2-C_1-2\epsilon _0)\int _0^t{\mathbb {E}}|X^0_s|^2\text {e}^{(C_2-C_1-2\epsilon _0)s}\mathrm{d}s\\&\quad \leqslant \, C_0, \end{aligned}$$

    where \(C_0\) is a constant depending on \(\epsilon _0\) and the values of b, \(\sigma \) at the point \((0,\delta _0)\) and f at \(\delta _0\). Then we have

    $$\begin{aligned} \sup \limits _{t\geqslant 0}{\mathbb {E}}|X^0_t|^2 \leqslant C_0\sup \limits _{t\geqslant 0}\text {e}^{-(C_2-C_1-2\epsilon _0)t}\leqslant C_0. \end{aligned}$$
    (3.36)

    Recalling the weak uniqueness of the solution, we have

    $$\begin{aligned} P^*_t(P^*_s\delta _0)=P^*_{t+s}\delta _0, \ \ \ s,t\geqslant 0. \end{aligned}$$

    This, together with (3.34) and (3.36), yields

    $$\begin{aligned} {\mathbb {W}}_2(P^*_{t+s}\delta _0,P^*_t\delta _0)^2 \leqslant \text {e}^{-(C_2-C_1)t}{\mathbb {E}}|X^0_s|^2 \leqslant C_0\text {e}^{-(C_2-C_1)t}. \end{aligned}$$

    Then,

    $$\begin{aligned} \lim \limits _{t\rightarrow \infty }\sup \limits _{s\geqslant 0}{\mathbb {W}}_2(P^*_{t+s}\delta _0,P^*_t\delta _0)=0, \end{aligned}$$
    (3.37)

    which means that \(\{P^*_t\delta _0\}_{t\geqslant 0}\) is a \({\mathbb {W}}_2\)-Cauchy family as \(t\rightarrow \infty \). Hence, there is a unique probability measure \(\hat{\mu }\in \mathcal {P}_2\) such that

    $$\begin{aligned} \lim \limits _{t\rightarrow \infty }{\mathbb {W}}_2(P^*_t\delta _0,\hat{\mu })=0. \end{aligned}$$
    (3.38)

    Then it follows from (3.34), (3.37) and (3.38) that

    $$\begin{aligned} {\mathbb {W}}_2(P^*_t\hat{\mu },\hat{\mu }) \leqslant&\, \lim \limits _{s\rightarrow \infty }{\mathbb {W}}_2(P^*_t\hat{\mu },P^*_tP^*_s\delta _0) +\lim \limits _{s\rightarrow \infty }{\mathbb {W}}_2(P^*_tP^*_s\delta _0,P^*_s\delta _0)\\&\quad +\lim \limits _{s\rightarrow \infty }{\mathbb {W}}_2(P^*_s\delta _0,\hat{\mu })=0, \end{aligned}$$

    which shows that \(\hat{\mu }\) is indeed an invariant measure for \(P^*_t\).

  • Step 3 Let \(\xi \) be an \({\mathscr {F}}_0\)-measurable random variable with distribution \(\mu \). For any \(t>1\), by the Markov property and Theorem 1.2 we have

    $$\begin{aligned} \left| {\mathbb {E}}g(X^\xi _t)-\int _{{\mathbb {R}}^d}g(y)\hat{\mu }(\mathrm{d}y)\right| =&\,\left| {\mathbb {E}}g(X^\xi _t)-\int _{{\mathbb {R}}^d}{\mathbb {E}}g(X^y_t)\hat{\mu }(\mathrm{d}y)\right| \\ \leqslant&\, \int _{{\mathbb {R}}^d}\left| {\mathbb {E}}g(X^\xi _t)-{\mathbb {E}}g(X^y_t)\right| \hat{\mu }(\mathrm{d}y)\\ \leqslant&\, \int _{{\mathbb {R}}^d}\left| {\mathbb {E}}\left[ {\mathbb {E}}\left( g(X^{X^{\xi }_{t-1}}_1)-g(X^{X^{y}_{t-1}}_1)\Big |{\mathscr {F}}_{t-1}\right) \right] \right| \hat{\mu }(\mathrm{d}y)\\ \leqslant&\, C\Vert g\Vert _\infty \int _{{\mathbb {R}}^d}{\mathbb {E}}|X^\xi _{t-1}-X^y_{t-1}|\hat{\mu }(\mathrm{d}y)\\ \leqslant&\, C\Vert g\Vert _\infty \int _{{\mathbb {R}}^d}\left( {\mathbb {E}}|\xi -y|^2\right) ^{\frac{1}{2}}\hat{\mu }(\mathrm{d}y)\text {e}^{-\frac{1}{2}(C_2-C_1)(t-1)}\\ \leqslant&\, C\Vert g\Vert _\infty \left[ 1+\left( \int _{{\mathbb {R}}^d}|x|^2\mu (\mathrm{d}x)\right) ^{\frac{1}{2}}\right] \text {e}^{-\frac{1}{2}(C_2-C_1)t}. \end{aligned}$$

    Hence,

    $$\begin{aligned} \Vert P^*_t\mu -\hat{\mu }\Vert _{TV} =&\,\sup \limits _{\Vert g\Vert _\infty \leqslant 1,g\in C^1_b} \left| {\mathbb {E}}g(X^\xi _t)-\int _{{\mathbb {R}}^d}g(y)\hat{\mu }(\mathrm{d}y)\right| \nonumber \\ \leqslant&\, C\left[ 1+\left( \int _{{\mathbb {R}}^d}|x|^2\mu (\mathrm{d}x)\right) ^{\frac{1}{2}}\right] \text {e}^{-\frac{1}{2}(C_2-C_1)t}, \end{aligned}$$
    (3.39)

    where C is a constant independent of t and \(\mu \). It is obvious that, for C large enough, (3.39) also holds for all \(t\in [0,1]\). This finishes the proof.

\(\square \)

4 An Example

In this section, as an application of our main results, we study the classical McKean–Vlasov equation. Given \(b_0:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(\sigma _0:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\otimes {\mathbb {R}}^d\) and \(f_0:{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\otimes {\mathbb {R}}^d\), we assume:

(A1):

\(b_0\) and \(\sigma _0\) are twice differentiable functions with bounded derivatives; \(f_0\) is a differentiable function with a bounded derivative.

(A2):

For some \(C_0>0\),

$$\begin{aligned} |\langle \sigma _0(x)\xi ,\xi \rangle |\geqslant C_0,\quad \forall x\in {\mathbb {R}}^d, \quad \forall \xi \in {\mathbb {S}}^{d}. \end{aligned}$$
(A3):

For some \(C_1>0\),

$$\begin{aligned} |\langle f_0(x)\xi ,\xi \rangle |\geqslant C_1,\quad \forall x\in {\mathbb {R}}^d,\quad \forall \xi \in {\mathbb {S}}^{d}. \end{aligned}$$
(A4):

There exists \(\lambda >0\) such that

$$\begin{aligned} 2\langle b_0(y_1)-b_0(y_2),y_1-y_2\rangle \leqslant -\lambda |y_1-y_2|^2, \quad \forall y_1, y_2\in {\mathbb {R}}^d. \end{aligned}$$

Define

$$\begin{aligned} b(x,\mu )=\int _{{\mathbb {R}}^d}b_0(x-y)\mu (\mathrm{d}y), \sigma (x,\mu )=\int _{{\mathbb {R}}^d}\sigma _0(x-y)\mu (\mathrm{d}y),\, \forall x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2, \end{aligned}$$

and

$$\begin{aligned} f(\mu )=\int _{{\mathbb {R}}^d}f_0(y)\mu (\mathrm{d}y), \quad \forall \mu \in \mathcal {P}_2. \end{aligned}$$

For \(\alpha \in (0,2)\), \(\{L_t\}_{t\geqslant 0}\) is a d-dimensional truncated \(\alpha \)-stable process with Lévy measure \(\frac{I_{B_0}(z)\mathrm{d}z}{|z|^{d+\alpha }}\), while \(\{W_t\}_{t\geqslant 0}\) is a d-dimensional Brownian motion independent of L. Now consider the following equation:

$$\begin{aligned} X^x_t=x+\int _0^tb\big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\mathrm{d}s+\int _0^t\sigma \big (X^x_s,{\mathbb {P}}_{X^x_s}\big )\mathrm{d}W_s +\int _0^tf\big ({\mathbb {P}}_{X^x_s}\big )\mathrm{d}L_s. \end{aligned}$$

Then we have the following results:

Theorem 4.1

  1. 1.

    Assume (A1) and (A2). Then there exists \(C>0\) such that

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)|\leqslant C\Vert g\Vert _\infty t^{-\frac{1}{2}},\, |\nabla _y{\mathbb {E}}g(X^x_t)|\leqslant C\Vert g\Vert _\infty t^{-\frac{1}{2}},\, \forall g\in C^1_b({\mathbb {R}}^d),\ y\in {\mathbb {R}}^d. \end{aligned}$$
  2. 2.

    Assume (A1) and (A3). Then there exists \(C>0\) such that

    $$\begin{aligned} |{\mathbb {E}}\nabla g(X^x_t)|\leqslant C\Vert g\Vert _\infty t^{-\frac{1}{\alpha }},\, |\nabla _y{\mathbb {E}}g(X^x_t)|\leqslant C\Vert g\Vert _\infty t^{-\frac{1}{\alpha }},\, \forall g\in C^1_b({\mathbb {R}}^d),\ y\in {\mathbb {R}}^d. \end{aligned}$$
  3. 3.

    Assume that (A1), (A2) (or (A3)) and (A4) hold, and set \(\Vert \nabla \sigma _0\Vert _{HS,\infty } :=\sup \nolimits _{|v|=1,x\in {\mathbb {R}}^d}\Vert \nabla _v\sigma _0(x)\Vert _{HS}\), which is finite by (A1). If

    $$\begin{aligned} \lambda _0:=\lambda -1-\Vert \nabla b_0\Vert _\infty ^2-4\Vert \nabla \sigma _0\Vert _{HS,\infty }^2 -\int _{B_0}|z|^2\nu (\mathrm{d}z)\Vert \nabla f_0\Vert _\infty ^2>0, \end{aligned}$$
    (4.1)

    then there exists a unique invariant measure \(\Xi \) such that for any \(\mu _0\in \mathcal {P}_2\),

    $$\begin{aligned} \Vert P^*_t\mu _0-\Xi \Vert _{TV}\leqslant C\left( 1+\left( \int _{{\mathbb {R}}^d}|x|^2\mu _0(\mathrm{d}x)\right) ^{\frac{1}{2}}\right) \text {e}^{-\frac{1}{2}\lambda _0t}. \end{aligned}$$

Proof

We divide the proof into two steps.

  • Step 1 In this part, we prove statements (1) and (2). It suffices to verify the conditions required in Theorems 1.2 and 1.5. In fact, due to (A1), it is easy to see that b and \(\sigma \) are twice differentiable with respect to the first variable x and

    $$\begin{aligned}&\Vert \partial ^i_xb\Vert _\infty :=\sup \limits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2}| \partial ^i_xb(x,\mu )|<\infty ,\\&\Vert \partial ^i_x\sigma \Vert _\infty :=\sup \limits _{x\in {\mathbb {R}}^d,\mu \in \mathcal {P}_2}|\partial ^i_x\sigma (x,\mu )|<\infty , i=1, 2. \end{aligned}$$

    Moreover, for all \(x\in {\mathbb {R}}^d\) and \(\mu _1, \mu _2\in \mathcal {P}_2\),

    $$\begin{aligned} |b(x,\mu _1)-b(x,\mu _2)|&\leqslant \int _{{\mathbb {R}}^d}|b_0(x-y)-b_0(x-z)|\pi (\mathrm{d}y,\mathrm{d}z) \\ {}&\leqslant \Vert \nabla b_0\Vert _\infty \left( \int _{{\mathbb {R}}^d}|y-z|^2\pi (\mathrm{d}y,\mathrm{d}z)\right) ^{\frac{1}{2}}, \end{aligned}$$

    where \(\pi \) is an arbitrary coupling of \(\mu _1\) and \(\mu _2\). Hence,

    $$\begin{aligned} |b(x,\mu _1)-b(x,\mu _2)|\leqslant \Vert \nabla b_0\Vert _\infty {\mathbb {W}}_2(\mu _1,\mu _2). \end{aligned}$$

    By similar arguments, we can prove that \(\sigma \) and \(\nabla b\) are Lipschitz continuous with respect to the second variable \(\mu \), and that f is Lipschitz continuous on \(\mathcal {P}_2\). For any \(x\in {\mathbb {R}}^d\), due to Example 2.4 we have

    $$\begin{aligned} \partial _\mu b(x,\mu )(y)=-\nabla b_0(x-y), \quad \forall y\in {\mathbb {R}}^d, \end{aligned}$$

    which is Lipschitz continuous with respect to both y and \(\mu \). So \(b(x,\cdot )\) belongs to \(C_b^{1,1}(\mathcal {P}_2)\). Furthermore,

    $$\begin{aligned} \partial _x\partial _\mu b(x,\mu )(y)=-\nabla ^2b_0(x-y) \end{aligned}$$

    is bounded on \({\mathbb {R}}^d\times \mathcal {P}_2\times {\mathbb {R}}^d\). The same argument shows \(\sigma (x,\cdot ), f(\cdot )\in C_b^{1,1}(\mathcal {P}_2)\) with \(\partial _\mu \sigma \), \(\partial _\mu f\) and \(\partial _x\partial _\mu \sigma \) bounded. By (A2) and the continuity of the map \((x,\xi )\mapsto \langle \sigma _0(x)\xi ,\xi \rangle \), we have either

    $$\begin{aligned} \langle \sigma _0(x)\xi ,\xi \rangle \geqslant C_0, \quad \forall x\in {\mathbb {R}}^d, \xi \in {\mathbb {S}}^{d}. \end{aligned}$$
    (4.2)

    or

    $$\begin{aligned} \langle \sigma _0(x)\xi ,\xi \rangle \leqslant -C_0, \quad \forall x\in {\mathbb {R}}^d, \xi \in {\mathbb {S}}^{d}. \end{aligned}$$
    (4.3)

    Without loss of generality, we assume that (4.2) holds; then

    $$\begin{aligned} \langle \sigma (x,\mu )\xi ,\xi \rangle =\int _{{\mathbb {R}}^d}\langle \sigma _0(x-y)\xi ,\xi \rangle \mu (\mathrm{d}y)\geqslant C_0, \quad \forall x\in {\mathbb {R}}^d, \mu \in \mathcal {P}_2, \xi \in {\mathbb {S}}^d, \end{aligned}$$

    which implies that \(\sigma ^{-1}(x,\mu )\) exists for all \(x\in {\mathbb {R}}^d\) and \(\mu \in \mathcal {P}_2\); moreover, \(\Vert \sigma ^{-1}\Vert _\infty <\infty \). Similarly, by (A1) and (A3) we obtain \(\Vert f^{-1}\Vert _\infty <\infty \). Now, according to Theorem 1.2, we have proved statements (1) and (2).

  • Step 2 For any \(x_1,x_2\in {\mathbb {R}}^d\) and \(\mu _1,\mu _2\in \mathcal {P}_2\), we have

    $$\begin{aligned}&2\langle b(x_1,\mu _1)-b(x_2,\mu _2),x_1-x_2\rangle \\&\quad =2\int _{{\mathbb {R}}^d}\langle b_0(x_1-y)-b_0(x_2-y),x_1-x_2\rangle \mu _1(\mathrm{d}y)\\&\qquad +2\int _{{\mathbb {R}}^d}\langle b_0(x_2-z_1)-b_0(x_2-z_2),x_1-x_2\rangle \pi (\mathrm{d}z_1,\mathrm{d}z_2)\\&\quad \leqslant -\lambda |x_1-x_2|^2+2\Vert \nabla b_0\Vert _\infty \left( \int _{{\mathbb {R}}^d}|z_1-z_2|^2\pi (\mathrm{d}z_1,\mathrm{d}z_2)\right) ^{\frac{1}{2}}|x_1-x_2|, \end{aligned}$$

    where \(\pi \) is a coupling of \(\mu _1\) and \(\mu _2\). Thus,

    $$\begin{aligned}&2\langle b(x_1,\mu _1)-b(x_2,\mu _2),x_1-x_2\rangle \nonumber \\&\quad \leqslant -\lambda |x_1-x_2|^2+2\Vert \nabla b_0\Vert _\infty {\mathbb {W}}_2(\mu _1,\mu _2)|x_1-x_2|\nonumber \\&\quad \leqslant -(\lambda -1)|x_1-x_2|^2+\Vert \nabla b_0\Vert _\infty ^2{\mathbb {W}}_2(\mu _1,\mu _2)^2. \end{aligned}$$
    (4.4)

    Meanwhile, the same arguments yield

    $$\begin{aligned}&\Vert \sigma (x_1,\mu _1)-\sigma (x_2,\mu _2)\Vert ^2_{HS} +\int _{B_0}|z|^2\nu (\mathrm{d}z)|f(\mu _1)-f(\mu _2)|^2\nonumber \\&\quad \leqslant 2\Vert \nabla \sigma _0\Vert _{HS,\infty }^2|x_1-x_2|^2 +2\Vert \nabla \sigma _0\Vert _{HS,\infty }^2{\mathbb {W}}_2(\mu _1,\mu _2)^2\nonumber \\&\qquad +\int _{B_0}|z|^2\nu (\mathrm{d}z)\Vert \nabla f_0\Vert _\infty ^2{\mathbb {W}}_2(\mu _1,\mu _2)^2. \end{aligned}$$
    (4.5)

    By (4.1), (4.4), (4.5) and Theorem 1.5, we immediately obtain the claim (3).

\(\square \)
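Finally, as a purely numerical illustration (not used in the proofs), the following self-contained Python sketch simulates the interacting N-particle system that replaces \({\mathbb {P}}_{X_t}\) by the empirical measure of the particles. We take the simplest coefficients compatible with (A1)–(A4), namely \(b_0(x)=-x\), \(\sigma _0\equiv I\) and \(f_0\equiv I\), and we approximate the truncated \(\alpha \)-stable process by a compound Poisson process of the jumps with \(\epsilon<|z|<1\); since \(\nu \) is symmetric, the neglected compensated small jumps form a martingale of small variance. All parameter names and numerical values below are our own illustrative choices.

```python
import numpy as np
from math import gamma, pi

rng = np.random.default_rng(0)

# ---- illustrative parameters (our own choices, not from the paper) ----
d, N = 2, 500              # dimension, number of particles
T, n_steps = 1.0, 200      # horizon and number of Euler steps
dt = T / n_steps
alpha = 1.5                # stability index of the truncated noise
eps = 0.05                 # jumps with |z| <= eps are neglected

# b0(x) = -x satisfies (A1) and (A4) (with lambda = 2); sigma0 = f0 = I,
# so sigma(x, mu) = f(mu) = I and (A2), (A3) hold trivially.
def b0(x):
    return -x

# nu(dz) = I_{B_0}(z) dz / |z|^{d+alpha}; total mass of {eps < |z| < 1}:
omega_d = 2 * pi ** (d / 2) / gamma(d / 2)   # area of the unit sphere
lam = omega_d * (eps ** (-alpha) - 1.0) / alpha

def sample_jumps(k):
    """k i.i.d. jumps from nu conditioned on eps < |z| < 1 (isotropic)."""
    u = rng.random(k)
    r = (eps ** (-alpha) - u * (eps ** (-alpha) - 1.0)) ** (-1.0 / alpha)
    v = rng.standard_normal((k, d))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    return r[:, None] * v

X = rng.standard_normal((N, d))              # particles at time 0
for _ in range(n_steps):
    # b(x, mu) = int b0(x - y) mu(dy): the convolution against the
    # empirical measure is an average over all particles.
    drift = b0(X[:, None, :] - X[None, :, :]).mean(axis=1)
    X = X + drift * dt + np.sqrt(dt) * rng.standard_normal((N, d))
    counts = rng.poisson(lam * dt, size=N)   # independent noise per particle
    for i in np.nonzero(counts)[0]:
        X[i] += sample_jumps(counts[i]).sum(axis=0)

print("empirical second moment at T:", float(np.mean(np.sum(X**2, axis=1))))
```

In line with statement (3) of Theorem 4.1 and the moment bound (3.36), the printed empirical second moment stays of order one; the sketch only makes the mean-field interaction concrete and claims nothing about convergence rates.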