1 Introduction

1.1 Background

Let \({\mathcal {P}}({\mathbb {R}}^d)\) be the space of all probability measures on \({\mathbb {R}}^d\) endowed with the weak convergence topology. Let \(b:{\mathbb {R}}_+\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) be a measurable vector field. In [2], Ambrosio studied the connection between the continuity equation

$$\begin{aligned} \partial _t\mu _t=\mathord {\mathrm{div}}(b\mu _t), \end{aligned}$$
(1.1)

and the ordinary differential equation (ODE for short)

$$\begin{aligned} {\mathord {\mathrm{d}}}\omega _t=b_t(\omega _t){\mathord {\mathrm{d}}}t. \end{aligned}$$
(1.2)

The following superposition principle was proved therein: Suppose that \(t\mapsto \mu _t\in {\mathcal {P}}({\mathbb {R}}^d)\) is a solution of (1.1) and satisfies

$$\begin{aligned} \int ^T_0\int _{{\mathbb {R}}^d}\frac{|b_t(x)|}{1+|x|}\mu _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t<\infty ,\quad \forall T>0, \end{aligned}$$

then there exists a probability measure \(\eta \) on the space \({\mathbb {C}}\) of continuous functions from \({\mathbb {R}}_+\) to \({\mathbb {R}}^d\), concentrated on the set of absolutely continuous solutions of (1.2), such that for every function \(f \in C_b({\mathbb {R}}^d)\) and all \(t\geqslant 0\),

$$\begin{aligned} \int _{{\mathbb {R}}^d} f(x)\mu _t({\mathord {\mathrm{d}}}x)=\int _{{\mathbb {C}}} f\big (\omega _t\big )\eta ({\mathord {\mathrm{d}}}\omega ). \end{aligned}$$

In other words, the measure \(\mu _t\) coincides with the image of \(\eta \) under the evaluation map \(\omega \mapsto \omega _t\). Consequently, the well-posedness of ODE (1.2) is equivalent to the existence and uniqueness of solutions for the continuity Eq. (1.1). In particular, the well-posedness of ODE (1.2) with BV drift whose distributional divergence belongs to \(L^\infty \) was obtained in a generalized sense. See also [3,4,5, 29] and the references therein for further developments.
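To make the superposition principle concrete, the following sketch (with our own toy choices: \(d=1\), linear drift \(b_t(x)=-x\), \(\mu _0=N(0,1)\) and test function \(f=\cos \)) pushes samples of \(\mu _0\) through an Euler scheme for (1.2) and compares the empirical average of \(f(\omega _t)\) with the exact integral of \(f\) against \(\mu _t=N(0,e^{-2t})\):

```python
import math, random

# Sanity check of the superposition principle for the ODE (1.2), assuming the
# toy drift b_t(x) = -x (our choice, not from the paper).  The flow is
# omega_t = omega_0 * e^{-t}, and for mu_0 = N(0,1) the continuity equation
# (1.1) is solved by mu_t = N(0, e^{-2t}).  We push N samples of mu_0 through
# an Euler scheme for (1.2) and compare the empirical mean of cos(omega_T)
# with the exact integral of cos against mu_T.
random.seed(0)
N, T, steps = 20_000, 1.0, 200
h = T / steps

acc = 0.0
for _ in range(N):
    w = random.gauss(0.0, 1.0)          # sample omega_0 ~ mu_0
    for _ in range(steps):              # Euler step for d omega = -omega dt
        w += -w * h
    acc += math.cos(w)
mc = acc / N

# E[cos Z] = exp(-sigma^2/2) for Z ~ N(0, sigma^2); here sigma^2 = e^{-2T}
exact = math.exp(-math.exp(-2 * T) / 2)
print(abs(mc - exact))                  # small: mu_T is the pushforward of eta
```

Here \(\eta \) is simply the law of the random trajectory \(t\mapsto \omega _0e^{-t}\), and the small discrepancy reflects only Monte Carlo and Euler errors.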

The stochastic counterpart of the above superposition principle was established by Figalli [15]. In this situation, the continuity equation becomes the Fokker–Planck–Kolmogorov equation, while the ODE becomes a stochastic differential equation (SDE for short). More precisely, let \(X_t\) solve the following SDE in \({\mathbb {R}}^d\):

$$\begin{aligned} {\mathord {\mathrm{d}}}X_t=b_t(X_t){\mathord {\mathrm{d}}}t+\sigma _t(X_t){\mathord {\mathrm{d}}}W_t, \end{aligned}$$
(1.3)

where \(b: {\mathbb {R}}_+\times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}}^{d}\) and \(\sigma : {\mathbb {R}}_+\times {\mathbb {R}}^{d}\rightarrow {\mathbb {R}}^{d}\otimes {\mathbb {R}}^{d}\) are measurable functions, and \(W_t\) is a standard \(d\)-dimensional Brownian motion defined on some probability space \((\Omega ,{\mathcal {F}},{\mathbf{P}})\). Let \(\mu _t\in {\mathcal {P}}({\mathbb {R}}^d)\) be the marginal law of \(X_t\). By Itô’s formula, \(\mu _t\) solves the following Fokker–Planck–Kolmogorov equation in the distributional sense:

$$\begin{aligned} \partial _t\mu _t=\big ({\mathscr {A}}_t+{\mathscr {B}}_t\big )^*\mu _t, \end{aligned}$$
(1.4)

where for \(f\in C_b^2({\mathbb {R}}^d)\),

$$\begin{aligned} {\mathscr {A}}_t f(x):=\mathrm {tr}(a_t(x)\cdot \nabla ^2 f(x)),\ {\mathscr {B}}_t f(x):=b_t(x)\cdot \nabla f(x) \end{aligned}$$
(1.5)

with \(a_t(x)=\frac{1}{2}(\sigma _t\sigma ^T_t)(x)\), and \({\mathscr {A}}_t^*\) and \({\mathscr {B}}_t^*\) stand for the adjoint operators of \({\mathscr {A}}_t \) and \({\mathscr {B}}_t\), respectively. When the coefficients a and b are bounded measurable, the superposition principle for Eq. (1.4) was proved by Figalli [15, Theorem 2.6]: every probability measure-valued solution to the Fokker–Planck–Kolmogorov Eq. (1.4) yields a martingale solution for the operator \({\mathscr {A}}_t+{\mathscr {B}}_t\) on the path space \({\mathbb {C}}\) (or equivalently, a weak solution of SDE (1.3)). We would like to mention that Kurtz [20, Theorem 2.7] had already proved such a principle when a and b are time-independent and bounded measurable (see [20, Remark 2.8(a)]). In [32], Trevisan extended it to the following natural integrability assumption:

$$\begin{aligned} \int ^T_0\int _{{\mathbb {R}}^d}\Big (|b_t(x)|+|a_t(x)|\Big )\mu _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t<\infty ,\ \ \forall T>0. \end{aligned}$$
(1.6)

More precisely, for any probability measure-valued solution \(\mu \) of (1.4), under (1.6), there is a weak solution X to SDE (1.3) so that for each \(t>0\),

$$\begin{aligned} \mu _t=\hbox {Law of }X_t. \end{aligned}$$
(1.7)
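As a numerical illustration of the passage from (1.3) to (1.4), take the toy choices (ours, purely illustrative) \(d=1\), \(b_t(x)=-x\) and \(\sigma _t(x)=\sqrt{2}\), so \(a\equiv 1\). Testing (1.4) against \(f(x)=x^2\) gives the moment identity \(\frac{{\mathord {\mathrm{d}}}}{{\mathord {\mathrm{d}}}t}{\mathbf {E}}X^2_t=2-2\,{\mathbf {E}}X^2_t\), hence \({\mathbf {E}}X_t^2=1-e^{-2t}\) for \(X_0=0\), which an Euler–Maruyama simulation reproduces:

```python
import math, random

# Toy illustration (our choice of coefficients, not from the paper) that the
# marginal law of the SDE (1.3) solves the FPKE (1.4).  Take d = 1,
# b_t(x) = -x, sigma_t(x) = sqrt(2), so a = sigma^2/2 = 1, and test (1.4)
# against f(x) = x^2: the weak form reads d/dt E[X_t^2] = 2 - 2 E[X_t^2],
# giving E[X_t^2] = 1 - e^{-2t} when X_0 = 0.
random.seed(1)
N, T, steps = 20_000, 1.0, 200
h = T / steps
sqh = math.sqrt(h)

m2 = 0.0
for _ in range(N):
    x = 0.0
    for _ in range(steps):              # Euler-Maruyama step for (1.3)
        x += -x * h + math.sqrt(2.0) * sqh * random.gauss(0.0, 1.0)
    m2 += x * x
m2 /= N

exact = 1.0 - math.exp(-2 * T)          # solution of the moment ODE
print(abs(m2 - exact))
```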

It should be noticed that if \(\mu _t\) does not have a finite first moment, then (1.6) may not be satisfied even for b and \(\sigma \) of at most linear growth. Recently, in [12], Bogachev, Röckner and Shaposhnikov obtained the superposition principle under the following more natural assumption:

$$\begin{aligned} \int ^T_0\int _{{\mathbb {R}}^d}\frac{|\langle x,b_t(x)\rangle |+|a_t(x)|}{1+|x|^2}\mu _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t<\infty ,\ \ \forall T>0. \end{aligned}$$
(1.8)

The proofs in [12] depend on quite involved uniqueness results for Fokker–Planck–Kolmogorov equations obtained in [11]. The superposition principle obtained in [15, 32] has been used in the study of the uniqueness of FPKEs with rough coefficients (see e.g. [25, 36]), probabilistic representations for solutions to non-linear partial differential equations (PDEs for short) [6] as well as distribution dependent SDEs (see [7, 26]).

On the other hand, let \((X_t)_{t\geqslant 0}\) be a Feller process in \({\mathbb {R}}^d\) with infinitesimal generator \(({\mathscr {L}}, \text {Dom}({\mathscr {L}}))\) (see [24, page 88]). One says that \({\mathscr {L}}\) satisfies the positive maximum principle if, whenever \(0\leqslant f\in \text {Dom}({\mathscr {L}})\) attains a positive maximum at a point \(x_0\in {\mathbb {R}}^d\), one has \({\mathscr {L}}f(x_0)\leqslant 0\). Suppose that \(C^\infty _c({\mathbb {R}}^d)\subset \text {Dom}({\mathscr {L}})\). The well-known Courrège theorem states that \({\mathscr {L}}\) satisfies the positive maximum principle if and only if \({\mathscr {L}}\) takes the following form:

$$\begin{aligned} {\mathscr {L}}f(x)&=\sum _{i,j=1}^d a_{ij}(x)\partial ^2_{ij} f(x)+\sum _{i=1}^{d}b_i(x)\partial _if(x)+c(x)f(x)\nonumber \\&\quad +\int _{{\mathbb {R}}^d}\left( f(x+z)-f(x)-{\mathbf {1}}_{|z|\leqslant 1}z\cdot \nabla f(x)\right) \nu _x({\mathord {\mathrm{d}}}z), \end{aligned}$$
(1.9)

where \(a=(a_{ij})_{1\leqslant i,j\leqslant d}\) is a \(d\times d\)-symmetric nonnegative definite matrix-valued measurable function on \({\mathbb {R}}^d\), \(b: {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\), \(c:{\mathbb {R}}^d\rightarrow (-\infty ,0]\) are measurable functions and \(\nu _x({\mathord {\mathrm{d}}}z)\) is a family of Lévy measures (see [28]). In particular, if we let \(\mu _t\) be the marginal law of \(X_t\), then by Dynkin’s formula,

$$\begin{aligned} \partial _t\mu _t={\mathscr {L}}^*\mu _t. \end{aligned}$$

A natural question is the following: for any probability measure-valued solution \(\mu _t\) to the above Fokker–Planck–Kolmogorov equation, is it possible to find some process X so that \(\mu _t\) is exactly the law of \(X_t\) for each \(t\geqslant 0\)? In the next subsection, under some growth assumptions on the coefficients, we shall give an affirmative answer.

1.2 Superposition principle for non-local operators

Our aim in this paper is to develop a non-local version of the superposition principle. Let \(\{\nu _{t,x}\}_{t\geqslant 0,x\in {\mathbb {R}}^d}\) be a family of Lévy measures over \({\mathbb {R}}^d\), that is, for each \(t\geqslant 0\) and \(x\in {\mathbb {R}}^d\),

$$\begin{aligned} g^\nu _t(x):=\int _{B_\ell }|z|^2\nu _{t,x}({\mathord {\mathrm{d}}}z)<\infty ,\quad \nu _{t,x}(B^c_\ell )<\infty , \end{aligned}$$
(1.10)

where \(\ell >0\) is a fixed number, and \(B_\ell :=\{z\in {\mathbb {R}}^d: |z|<\ell \}\). Without loss of generality we may assume

$$\begin{aligned} \ell \leqslant 1/\sqrt{2}. \end{aligned}$$

We introduce the following Lévy type operator: for any \(f\in C^2_b({\mathbb {R}}^d)\),

$$\begin{aligned} {\mathscr {N}}_t f(x):={\mathscr {N}}^\nu _t f(x):={\mathscr {N}}^{\nu _{t,x}} f(x):=\int _{{\mathbb {R}}^d}\Theta _f(x;z)\nu _{t,x}({\mathord {\mathrm{d}}}z), \end{aligned}$$
(1.11)

where

$$\begin{aligned} \Theta _f(x;z):=f(x+z)-f(x)-\mathbf{1}_{|z| \leqslant \ell }z \cdot \nabla f(x). \end{aligned}$$
(1.12)

Let us consider the following non-local Fokker–Planck–Kolmogorov equation (FPKE for short):

$$\begin{aligned} \partial _t\mu _t={\mathscr {L}}_t^*\mu _t, \end{aligned}$$
(1.13)

where \({\mathscr {L}}_t\) is a general diffusion operator with jumps, i.e.,

$$\begin{aligned} {\mathscr {L}}_t:={\mathscr {A}}_t+{\mathscr {B}}_t+{\mathscr {N}}_t \end{aligned}$$

with \({\mathscr {A}}_t\) and \({\mathscr {B}}_t\) being defined by (1.5) and \({\mathscr {N}}_t\) being defined by (1.11). We introduce the following definition of weak solution to Eq. (1.13).

Definition 1.1

(Weak solution) Let \(\mu :{\mathbb {R}}_+\rightarrow {\mathcal {P}}({\mathbb {R}}^d)\) be a continuous curve. We call \(\mu =(\mu _t)_{t\geqslant 0}\) a weak solution of the non-local FPKE (1.13) if for any \(R>0\) and \(t>0\),

$$\begin{aligned} \left\{ \begin{aligned}&\int ^t_0\int _{{\mathbb {R}}^d}{\mathbf {1}}_{B_R}(x)\Big (|a_s(x)|+|b_s(x)|+g^\nu _s(x)\Big )\mu _s({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}s<\infty ,\\&\int ^t_0\int _{{\mathbb {R}}^d}\Big (\nu _{s,x}(B^c_{\ell \vee (|x|-R)})+{\mathbf {1}}_{B_R}(x)\nu _{s,x}(B^c_\ell )\Big )\mu _s({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}s<\infty , \end{aligned} \right\} \end{aligned}$$
(1.14)

and for all \(f\in C^2_c({\mathbb {R}}^d)\) and \(t\geqslant 0\),

$$\begin{aligned} \mu _t(f)=\mu _0(f)+\int ^t_0\mu _s({\mathscr {L}}_s f){\mathord {\mathrm{d}}}s, \end{aligned}$$
(1.15)

where \(\mu _t(f):=\int _{{\mathbb {R}}^d}f(x)\mu _t({\mathord {\mathrm{d}}}x)\).

We point out that, unlike the local case considered in [2, 12, 15, 32], where the local integrability of the coefficients with respect to \(\mu _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t\) ensures that the integrals in (1.15) are well defined, in the non-local case it is not even clear whether the integral in (1.15) makes sense, since in general \({\mathscr {N}}^\nu _tf\) does not have compact support for \(f\in C^2_c({\mathbb {R}}^d)\). This is the reason why we need the second assumption in (1.14).

Remark 1.2

Under (1.14), one has \(\int ^t_0\mu _s(|{\mathscr {L}}_s f|){\mathord {\mathrm{d}}}s<\infty \) for any \(f\in C^2_c({\mathbb {R}}^d)\). Let us only show

$$\begin{aligned} \int ^t_0\mu _s(|{\mathscr {N}}^\nu _s f|){\mathord {\mathrm{d}}}s<\infty . \end{aligned}$$

Note that for \(x,z\in {\mathbb {R}}^d\), by Taylor’s expansion, there is a \(\theta \in [0,1]\) such that

$$\begin{aligned} f(x+z)-f(x)-z\cdot \nabla f(x)=\sum _{i,j=1,\ldots , d}z_iz_j\partial _i\partial _jf(x+\theta z)/2. \end{aligned}$$
(1.16)

Suppose that the support of f is contained in a ball \(B_R\). By definition we have

$$\begin{aligned} |\Theta _f(x;z)|&\leqslant \Vert f\Vert _\infty {\mathbf {1}}_{|z|>\ell }({\mathbf {1}}_{|x+z|<R}+{\mathbf {1}}_{|x|<R})+\Vert \nabla ^2f\Vert _\infty {\mathbf {1}}_{|z|\leqslant \ell }|z|^2{\mathbf {1}}_{|x|<R+\ell }. \end{aligned}$$

Hence,

$$\begin{aligned} \int ^t_0\mu _s(|{\mathscr {N}}^\nu _s f|){\mathord {\mathrm{d}}}s&\lesssim \int ^t_0 \int _{{\mathbb {R}}^d}\Big [\nu _{s,x}(B^c_{\ell \vee (|x|-R)})+{\mathbf {1}}_{B_{R}}(x)\nu _{s,x}(B^c_\ell )\Big ]\mu _s({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}s\\&\quad +\int ^t_0 \int _{{\mathbb {R}}^d}{\mathbf {1}}_{B_{R+\ell }}(x)g^\nu _s(x)\mu _s({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}s<\infty . \end{aligned}$$
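The pointwise bound on \(\Theta _f\) above is elementary but easy to get wrong. The following randomized spot check (our own one-dimensional example \(f(x)=(1-x^2)^3\) on \(|x|<1\), with \(R=1\) and \(\ell =0.5\)) verifies it numerically:

```python
import random

# Numerical spot check of the pointwise bound on Theta_f(x;z) from Remark 1.2,
# in d = 1 with the compactly supported C^2 function f(x) = (1-x^2)^3 on
# |x| < 1 (our choice), support radius R = 1 and ell = 0.5.
R, ell = 1.0, 0.5

def f(x):
    return (1 - x * x) ** 3 if abs(x) < R else 0.0

def df(x):
    return -6 * x * (1 - x * x) ** 2 if abs(x) < R else 0.0

def d2f(x):
    return (1 - x * x) * (30 * x * x - 6) if abs(x) < R else 0.0

# sup norms, evaluated on a fine grid over the support
sup_f = max(abs(f(i / 1000)) for i in range(-1000, 1001))
sup_d2f = max(abs(d2f(i / 1000)) for i in range(-1000, 1001))

random.seed(2)
ok = True
for _ in range(100_000):
    x = random.uniform(-3, 3)
    z = random.uniform(-3, 3)
    # Theta_f(x;z) as in (1.12)
    theta = f(x + z) - f(x) - (z * df(x) if abs(z) <= ell else 0.0)
    if abs(z) > ell:
        bound = sup_f * ((abs(x + z) < R) + (abs(x) < R))
    else:
        bound = sup_d2f * z * z * (abs(x) < R + ell)
    ok = ok and (abs(theta) <= bound + 1e-12)
print(ok)
```

For \(|z|\leqslant \ell \) the check even has a factor \(1/2\) of slack, coming from the Taylor remainder in (1.16).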

Let \({\mathbb {D}}\) be the space of all \({\mathbb {R}}^d\)-valued càdlàg functions on \({\mathbb {R}}_+\), which is endowed with the Skorokhod topology so that \({\mathbb {D}}\) becomes a Polish space. Let \(X_t(\omega )=\omega _t\) be the canonical process. For \(t\geqslant 0\), let \({\mathcal {B}}^0_t({\mathbb {D}})\) denote the natural filtration generated by \((X_s)_{s\in [0,t]}\), and let

$$\begin{aligned} {\mathcal {B}}_t:={\mathcal {B}}_t({\mathbb {D}}):=\cap _{s>t}{\mathcal {B}}^0_s({\mathbb {D}}),\quad {\mathcal {B}}:={\mathcal {B}}({\mathbb {D}}):={\mathcal {B}}_\infty ({\mathbb {D}}). \end{aligned}$$

Now we recall the notion of martingale solutions associated with \({\mathscr {L}}_t\) in the sense of Stroock–Varadhan [31].

Definition 1.3

(Martingale Problem) Let \(\mu _0\in {\mathcal {P}}({\mathbb {R}}^d)\), \(s\geqslant 0\) and \(\tau \geqslant s\) be a \({\mathcal {B}}_t\)-stopping time. We call a probability measure \({\mathbb {P}}\in {\mathcal {P}}({\mathbb {D}})\) a martingale solution (resp. a “stopped” martingale solution) of \({\mathscr {L}}_t\) with initial distribution \(\mu _0\) at time s if

  1. (i)

    \({\mathbb {P}}(X_t=X_s, t\in [0,s])=1\) and \({\mathbb {P}}\circ X_s^{-1}=\mu _0\).

  2. (ii)

    For any \(f\in C^2_c({\mathbb {R}}^d)\), \(M^f_t\) (resp. \(M^f_{t\wedge \tau }\)) is a \({\mathcal {B}}_t\)-martingale under \({\mathbb {P}}\), where

    $$\begin{aligned} M^f_t:=f(X_t)-f(X_s)-\int ^t_s{\mathscr {L}}_r f(X_r){\mathord {\mathrm{d}}}r,\ t\geqslant s. \end{aligned}$$
    (1.17)

All the martingale solutions (resp. “stopped” martingale solutions) associated with \({\mathscr {L}}_t\) with initial law \(\mu _0\) at time s will be denoted by \({\mathcal {M}}^{\mu _0}_s({\mathscr {L}})\) (resp. \({\mathcal {M}}^{\mu _0}_{s,\tau }({\mathscr {L}})\)). In particular, if \(\mu _0=\delta _x\) (the Dirac measure concentrated on x), we shall write \({\mathcal {M}}^{x}_s({\mathscr {L}})={\mathcal {M}}^{\delta _x}_s({\mathscr {L}})\) for simplicity.

Remark 1.4

Under (1.18) below, (ii) in Definition 1.3 is equivalent to the statement that for any \(f\in C^2({\mathbb {R}}^d)\) with \(|f(x)|\leqslant C\log (2+|x|)\), \(M^f_t\) is a local \({\mathcal {B}}_t\)-martingale under \({\mathbb {P}}\). Indeed, let \(\chi \in C^\infty _c({\mathbb {R}}^d)\) be a smooth function with \(\chi (x)=1\) for \(|x|<1\) and \(\chi (x)=0\) for \(|x|>2\). For each \(n,m\in {\mathbb {N}}\), define \(f_n(x):=f(x)\chi (x/n)\) and \(\tau _m:=\inf \{t>s: |X_t|\vee |X_t-X_{t-}|\geqslant m\}\). By (ii) of Definition 1.3, one knows that \(M^{f_n}_{t\wedge \tau _m}\) is a \({\mathcal {B}}_t\)-martingale. Since \(|f(x)|\leqslant C\log (2+|x|)\), by definition (1.11) and (1.18) below, it is easy to see that for each fixed \(m\in {\mathbb {N}}\),

$$\begin{aligned} \sup _n\sup _{r\in [0,t]}\sup _{|x|\leqslant m}|{\mathscr {L}}_r f_n(x)|<\infty . \end{aligned}$$

Thus, for each \(t>s\), by the dominated convergence theorem, we have

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {E}}\left( \int ^{t\wedge \tau _m}_s|{\mathscr {L}}_r f_n(X_r)-{\mathscr {L}}_r f(X_r)|{\mathord {\mathrm{d}}}r\right) =0. \end{aligned}$$

Therefore, for each \(t>s\),

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {E}}|M^{f_n}_{t\wedge \tau _m}-M^{f}_{t\wedge \tau _m}|=0, \end{aligned}$$

which implies that \(M^{f}_{t\wedge \tau _m}\) is a \({\mathcal {B}}_t\)-martingale for each \(m\in {\mathbb {N}}\), and also \(M^f_t\) is a local \({\mathcal {B}}_t\)-martingale since \(\tau _m\rightarrow \infty \) as \(m\rightarrow \infty \).

Throughout this paper, we make the following assumption:

$$\begin{aligned} \Gamma ^\nu _{a,b}:=\sup _{t,x}\left[ \frac{|a_t(x)|+g^\nu _t(x)}{1+|x|^2}+\frac{|b_t(x)|}{1+|x|}+\hbar ^\nu _t(x)\right] <\infty , \end{aligned}$$
(1.18)

where \(g^\nu _t(x)\) is defined by (1.10) and

$$\begin{aligned} \hbar ^\nu _t(x):=\int _{B_\ell ^c}\log \left( 1+\tfrac{|z|}{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z), \end{aligned}$$
(1.19)

and if \(\nu _{t,x}\) is symmetric, then we define

$$\begin{aligned} \hbar ^\nu _t(x):=\int _{|z|>1+|x|}\log \left( 1+\tfrac{|z|}{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z). \end{aligned}$$
(1.20)

The main result of this paper is as follows.

Theorem 1.5

(Superposition principle) Under (1.18), for any weak solution \((\mu _t)_{t\geqslant 0}\) of FPKE (1.13) in the sense of Definition 1.1, there is a martingale solution \({\mathbb {P}}\in {\mathcal {M}}^{\mu _0}_0({\mathscr {L}}_t)\) such that

$$\begin{aligned} \mu _t={\mathbb {P}}\circ X^{-1}_t, \quad \forall t\geqslant 0. \end{aligned}$$

Remark 1.6

Under (1.18), condition (1.14) holds. In fact, it suffices to check that

$$\begin{aligned} \sup _{t,x}\Big (\nu _{t,x}(B^c_{\ell \vee (|x|-R)})+{\mathbf {1}}_{B_R}(x)\nu _{t,x}(B^c_\ell )\Big )<\infty , \quad \forall R>0. \end{aligned}$$
(1.21)

By definition we have

$$\begin{aligned} \nu _{t,x}(B^c_{\ell \vee (|x|-R)})&\leqslant \int _{B_\ell ^c}\log \left( 1+\tfrac{|z|}{1+|x|}\right) /\log \left( 1+\tfrac{\ell \vee (|x|-R)}{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\\&=\hbar ^\nu _t(x) /\log \left( 1+\tfrac{\ell \vee (|x|-R)}{1+|x|}\right) \leqslant \hbar ^\nu _t(x) /\log \left( 1+\tfrac{\ell }{1+\ell +R}\right) , \end{aligned}$$

and

$$\begin{aligned} {\mathbf {1}}_{B_R}(x)\nu _{t,x}(B^c_\ell )&\leqslant {\mathbf {1}}_{B_R}(x)\int _{B_\ell ^c}\log \left( 1+\tfrac{|z|}{1+|x|}\right) /\log \left( 1+\tfrac{\ell }{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\\&={\mathbf {1}}_{B_R}(x) \hbar ^\nu _t(x) /\log \left( 1+\tfrac{\ell }{1+|x|}\right) \leqslant \hbar ^\nu _t(x) /\log \left( 1+\tfrac{\ell }{1+R}\right) . \end{aligned}$$

Hence, (1.21) follows by (1.18).
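Both displays rest on the elementary inequality \({\mathbf {1}}_{|z|\geqslant c}\leqslant \log (1+|z|/s)/\log (1+c/s)\) for \(c,s>0\), which holds because \(r\mapsto \log (1+r/s)\) is nonnegative and increasing. A quick randomized check (ours, purely illustrative):

```python
import math, random

# The two estimates above rest on the elementary inequality
#   1_{|z| >= c} <= log(1 + |z|/s) / log(1 + c/s)   (c, s > 0),
# valid because r -> log(1 + r/s) is increasing and nonnegative.
# Randomized spot check over a range of |z|, radii c and scales s:
random.seed(3)
ok = True
for _ in range(100_000):
    z = random.uniform(0, 50)           # plays the role of |z|
    c = random.uniform(1e-3, 10)        # radius, e.g. ell v (|x|-R)
    s = random.uniform(1e-3, 10)        # scale, e.g. 1 + |x|
    lhs = 1.0 if z >= c else 0.0
    rhs = math.log1p(z / s) / math.log1p(c / s)
    ok = ok and lhs <= rhs + 1e-12
print(ok)
```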

Remark 1.7

Note that our result does not cover the one in [12] (see (1.8) above). The results in [12] allow one to treat SDEs with singular and linearly growing coefficients, while our assumption (1.18) only allows coefficients of at most linear growth. Here the main issue is that the elegant push-forward method used in [32] does not seem to work in the non-local case. Moreover, in our proof we borrow some techniques from [12] to construct the approximating sequence (see Proposition 3.2 below).

Example 1.8

Let \(\nu _{t,x}({\mathord {\mathrm{d}}}z)=\kappa _t(x,z){\mathord {\mathrm{d}}}z/|z|^{d+\alpha }\) with \(\alpha \in (0,2)\), that is, \({\mathscr {N}}_t\) is an \(\alpha \)-stable-like operator.

  1. (i)

    If \(|\kappa _t(x,z)|\leqslant c(1+|x|)^{\alpha \wedge 1}/(1+{\mathbf {1}}_{\alpha =1}\log (1+|x|))\), then \(\sup _{t,x}\hbar ^\nu _t(x)<\infty .\) Indeed, by definition we have

    $$\begin{aligned} \hbar ^\nu _t(x)\lesssim \frac{(1+|x|)^{\alpha \wedge 1}}{1+{\mathbf {1}}_{\alpha =1}\log (1+|x|)}\int _{B_\ell ^c}\log \left( 1+\tfrac{|z|}{1+|x|}\right) \frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }}. \end{aligned}$$

We estimate the integral on the right-hand side, which we denote by \({\mathscr {I}}\), using polar coordinates and integration by parts:

    $$\begin{aligned} {\mathscr {I}}&=c\int ^\infty _\ell \log \left( 1+\tfrac{r}{1+|x|}\right) r^{-1-\alpha }{\mathord {\mathrm{d}}}r\\&\lesssim \log \left( 1+\tfrac{\ell }{1+|x|}\right) +\int ^\infty _\ell r^{-\alpha }\left( 1+|x|+r\right) ^{-1}{\mathord {\mathrm{d}}}r\\&\lesssim (1+|x|)^{-1}+(1+|x|)^{-1}\int ^{1+|x|}_\ell r^{-\alpha }{\mathord {\mathrm{d}}}r+\int ^\infty _{1+|x|}r^{-1-\alpha }{\mathord {\mathrm{d}}}r\\&\lesssim (1+|x|)^{-1}+(1+|x|)^{-(\alpha \wedge 1)}(1+{\mathbf {1}}_{\alpha =1}\log (1+|x|))+(1+|x|)^{-\alpha }\\&\lesssim (1+|x|)^{-(\alpha \wedge 1)}(1+{\mathbf {1}}_{\alpha =1}\log (1+|x|)). \end{aligned}$$

    Thus, we have \(\hbar ^\nu _t(x)\leqslant C\).

  2. (ii)

If \(\kappa _t(x,z)\) is symmetric, that is, \(\kappa _t(x,z)=\kappa _t(x,-z)\), and \(|\kappa _t(x,z)|\leqslant c(1+|x|)^{\alpha }\), then \(\sup _{t,x}\hbar ^\nu _t(x)<\infty .\) In fact, by (1.20), we have for any \(\beta \in (0,\alpha \wedge 1)\),

    $$\begin{aligned} \hbar ^\nu _t(x)&\lesssim (1+|x|)^{\alpha }\int _{|z|>1+|x|}\left( 1+\tfrac{|z|}{1+|x|}\right) ^\beta \frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }}\\&\lesssim (1+|x|)^{\alpha -\beta }\int _{|z|>1+|x|}\frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha -\beta }}, \end{aligned}$$

    which in turn yields \(\sup _{t,x}\hbar ^\nu _t(x)<\infty .\)
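The asymptotics of \({\mathscr {I}}\) in case (i) can also be checked numerically. The sketch below (our choices \(\alpha =0.5\), \(\ell =0.5\)) evaluates the one-dimensional integral \(\int ^\infty _\ell \log \left( 1+\tfrac{r}{1+x}\right) r^{-1-\alpha }{\mathord {\mathrm{d}}}r\) after the substitution \(r=\ell e^u\) and confirms that it decays like \((1+x)^{-(\alpha \wedge 1)}\):

```python
import math

# Numerical check of the bound in Example 1.8(i): for alpha in (0,2),
# alpha != 1, the integral
#   I(x) = \int_ell^infty log(1 + r/(1+x)) r^{-1-alpha} dr
# should be O((1+x)^{-(alpha ^ 1)}).  We take alpha = 0.5, ell = 0.5
# (our choices) and integrate after the substitution r = ell * e^u,
# under which the integrand becomes log(1 + r/(1+x)) * r^{-alpha} du.
alpha, ell = 0.5, 0.5

def I(x):
    s = 1.0 + x
    du, total, u = 1e-3, 0.0, 0.0
    while u < 60.0:                     # the e^{-alpha u} tail beyond is negligible
        r = ell * math.exp(u + du / 2)  # midpoint rule in u
        total += math.log1p(r / s) * r ** (-alpha) * du
        u += du
    return total

# I(x) * (1+x)^{alpha ^ 1} should stay in a bounded range as x grows
ratios = [I(x) * (1 + x) ** min(alpha, 1.0) for x in (0, 1, 10, 100, 1000)]
print(ratios)
```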

As far as we know, there are very few results concerning the superposition principle for non-local operators. In the constant non-local case, the third author of the present paper [36] used the superposition principle to show the uniqueness of non-local FPKEs. Recently, Fournier and Xu [16] proved a non-local version of the superposition principle in a special case, namely,

$$\begin{aligned} {\mathscr {N}}^\nu _t f(x)=\int _{{\mathbb {R}}^d}[f(x+z)-f(x)]\nu _{t,x}({\mathord {\mathrm{d}}}z), \end{aligned}$$

and under the assumption that \((\mu _t)_{t\geqslant 0}\) has finite first-order moments, i.e.,

$$\begin{aligned} \int _{{\mathbb {R}}^d}|x|\mu _t({\mathord {\mathrm{d}}}x)<\infty ,\ \ \forall t\geqslant 0. \end{aligned}$$

These two assumptions rule out the interesting case of \(\alpha \)-stable processes (see Example 1.8 above). To drop these two limitations, we employ some techniques from [12]. It should be emphasized that the elegant push-forward method used in [32] does not seem to work in the non-local case; here the main obstacles are showing tightness and passing to the limit. One important motivation for studying the superposition principle for non-local operators is to solve the Boltzmann equation, as explained in Subsection 1.2 of [16] (see also [17]).

1.3 Equivalence between FPKEs and martingale problems

The following corollary is a direct consequence of Theorem 1.5 and [14, Theorem 4.4.2] (see also [21, Corollary 1.3] and [32, Lemma 2.12]). For the readers’ convenience, we provide a detailed proof here.

Corollary 1.9

Under (1.18), the well-posedness of the Fokker–Planck–Kolmogorov Eq. (1.13) is equivalent to the well-posedness of the martingale problem associated with \({\mathscr {L}}\). More precisely, we have the following equivalences:

  • (Existence) For any \(\nu \in {\mathcal {P}}({\mathbb {R}}^d)\), the non-local FPKE (1.13) admits a solution \((\mu _t)_{t\geqslant 0}\) with initial value \(\mu _0=\nu \) if and only if \({\mathcal {M}}^{\nu }_0({\mathscr {L}})\) has at least one element.

  • (Uniqueness) The following two statements are equivalent.

    1. (i)

      For each \((s,\nu )\in {\mathbb {R}}_+\times {\mathcal {P}}({\mathbb {R}}^d)\), the non-local FPKE (1.13) has at most one solution \((\mu _t)_{t\geqslant s}\) with \(\mu _s=\nu \).

    2. (ii)

      For each \((s,\nu )\in {\mathbb {R}}_+\times {\mathcal {P}}({\mathbb {R}}^d)\), \({\mathcal {M}}^\nu _s({\mathscr {L}})\) has at most one element.

Proof

We only prove the uniqueness part. (ii)\(\Rightarrow \)(i) follows easily from Theorem 1.5. We show (i)\(\Rightarrow \)(ii). Fix \((s,\nu )\in {\mathbb {R}}_+\times {\mathcal {P}}({\mathbb {R}}^d)\) and let \({\mathbb {P}}_1,{\mathbb {P}}_2\in {\mathcal {M}}^\nu _s({\mathscr {L}})\). To show \({\mathbb {P}}_1={\mathbb {P}}_2\), it suffices to prove the following claim by induction:

\((\mathbf{C}_n)\) for given \(n\in {\mathbb {N}}\), and for any \(s\leqslant t_1<t_2<\cdots <t_n\) and strictly positive and bounded measurable functions \(f_1,\ldots , f_n\) on \({\mathbb {R}}^d\),

$$\begin{aligned} {\mathbb {E}}^{{\mathbb {P}}_1}(f_1(X_{t_1})\cdots f_n(X_{t_n}))={\mathbb {E}}^{{\mathbb {P}}_2}(f_1(X_{t_1})\cdots f_n(X_{t_n})). \end{aligned}$$
(1.22)

First of all, by Theorem 1.5 and the assumption, one sees that (\(\mathbf{C}_1\)) holds. Next we assume that (\(\mathbf{C}_n\)) holds for some \(n\geqslant 1\). For simplicity we write

$$\begin{aligned} \eta :=f_1(X_{t_1})\cdots f_n(X_{t_n}), \end{aligned}$$

and for \(i=1,2\), we define new probability measures

$$\begin{aligned} {\mathord {\mathrm{d}}}\tilde{\mathbb {P}}_i:=\eta {\mathord {\mathrm{d}}}{\mathbb {P}}_i/\int _{{\mathbb {D}}}\eta {\mathord {\mathrm{d}}}{\mathbb {P}}_i\in {\mathcal {P}}({\mathbb {D}}),\quad \tilde{\nu }_i:=\tilde{\mathbb {P}}_i\circ X^{-1}_{t_n}\in {\mathcal {P}}({\mathbb {R}}^d). \end{aligned}$$

Now we show

$$\begin{aligned} \tilde{\mathbb {P}}_i\in {\mathcal {M}}^{\tilde{\nu }_i}_{t_n}({\mathscr {L}}),\quad i=1,2. \end{aligned}$$

Let \(M^f_t\) be defined by (1.17). We only need to prove that for any \(t'>t\geqslant t_n\) and bounded \({\mathcal {B}}_t\)-measurable \(\xi \),

$$\begin{aligned} {\mathbb {E}}^{\tilde{\mathbb {P}}_i}\left( M^f_{t'}\xi \right) ={\mathbb {E}}^{\tilde{\mathbb {P}}_i}\left( M^f_t\xi \right) \Leftrightarrow {\mathbb {E}}^{{\mathbb {P}}_i}(M^f_{t'}\xi \eta )={\mathbb {E}}^{{\mathbb {P}}_i}(M^f_t\xi \eta ), \end{aligned}$$

which follows since \({\mathbb {P}}_i\in {\mathcal {M}}^{\nu }_s({\mathscr {L}})\). Thus, by the induction hypothesis and Theorem 1.5,

$$\begin{aligned} \tilde{\nu }_1=\tilde{\nu }_2\Rightarrow \tilde{\mathbb {P}}_1\circ X^{-1}_{t_{n+1}}=\tilde{\mathbb {P}}_2\circ X^{-1}_{t_{n+1}},\quad \forall t_{n+1}>t_n. \end{aligned}$$

This in turn implies that (\(\mathbf{C}_{n+1}\)) holds. The proof is complete. \(\square \)

1.4 Fractional porous media equation

Probabilistic representations of solutions to PDEs are a powerful tool for studying their analytic properties (well-posedness, regularity, etc.), since they make many probabilistic tools available (see [7,8,9]). As an application of the superposition principle obtained in Theorem 1.5, we derive a probabilistic representation for the weak solution of the following fractional porous media equation (FPME for short):

$$\begin{aligned} \partial _tu=\Delta ^{\alpha /2}(|u|^{m-1}u),\quad u(0,x)=\varphi (x), \end{aligned}$$
(1.23)

where the porous media exponent \(m>1\), \(\alpha \in (0,2)\) and \(\Delta ^{\alpha /2}:=-(-\Delta )^{\alpha /2}\) is the usual fractional Laplacian, which, up to a positive constant, has the alternative expression

$$\begin{aligned} \Delta ^{\alpha /2} f(x)=\mathrm{P.V.}\int _{{\mathbb {R}}^d}(f(x+z)-f(x)){\mathord {\mathrm{d}}}z/|z|^{d+\alpha }, \end{aligned}$$
(1.24)
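A quick way to see the \(|\xi |^\alpha \) scaling hidden in (1.24): plugging the plane wave \(f(x)=\cos (\xi x)\) (in \(d=1\)) into (1.24), the odd part of the integrand cancels in the principal value and one gets \(\Delta ^{\alpha /2}f(x)=-2J(\xi )\cos (\xi x)\) with \(J(\xi )=\int ^\infty _0(1-\cos (\xi z))z^{-1-\alpha }{\mathord {\mathrm{d}}}z=|\xi |^\alpha J(1)\). The following numerical sketch (our choice \(\alpha =0.5\)) verifies the scaling:

```python
import math

# Scaling check for the singular kernel in (1.24): for f(x) = cos(xi * x),
# d = 1, the principal value integral reduces to
#   J(xi) = \int_0^infty (1 - cos(xi * z)) z^{-1-alpha} dz = |xi|^alpha J(1),
# so J(2)/J(1) should equal 2^alpha.  We verify this numerically for
# alpha = 0.5, truncating at T and adding the analytic tail of the "1" part
# (the oscillatory tail is O(T^{-1-alpha}) and is neglected).
alpha = 0.5

def J(xi, T=100.0, h=1e-3):
    total, z = 0.0, 0.0
    while z < T:
        m = z + h / 2                   # midpoint rule
        total += (1 - math.cos(xi * m)) * m ** (-1 - alpha) * h
        z += h
    return total + T ** (-alpha) / alpha

ratio = J(2.0) / J(1.0)
print(ratio)                            # close to 2**alpha = sqrt(2)
```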

where P.V. stands for the Cauchy principal value. This equation is a typical non-linear, degenerate and non-local parabolic equation, which appears naturally in statistical mechanics and population dynamics, where it describes the hydrodynamic limit of interacting particle systems with jumps or long-range interactions. In the last decade, many works have been devoted to the study of Eq. (1.23) from the PDE point of view; see [23], the recent survey paper [33], the monograph [34] and the references therein.

Let \(\dot{H}^{\alpha /2}({\mathbb {R}}^d)\) be the homogeneous fractional Sobolev space defined as the completion of \(C_0^\infty ({\mathbb {R}}^d)\) with respect to

$$\begin{aligned} \Vert f\Vert _{\dot{H}^{\alpha /2}}:=\left( \int _{{\mathbb {R}}^d}|\xi |^\alpha |\hat{f}(\xi )|^2{\mathord {\mathrm{d}}}\xi \right) ^{1/2}=\Vert (-\Delta )^{\alpha /4}f\Vert _2, \end{aligned}$$

where \(\hat{f}\) is the Fourier transform of f. The following notion about the weak solution of FPME is introduced in [22, Definition 3.1].

Definition 1.10

A function u is called a weak or \(L^1\)-energy solution of FPME (1.23) if

  • \(u\in C([0,\infty );L^1({\mathbb {R}}^d))\) and \(|u|^{m-1}u\in L^2_{loc}((0,\infty );\dot{H}^{\alpha /2}({\mathbb {R}}^d))\);

  • for every \(f\in C_0^1({\mathbb {R}}_+\times {\mathbb {R}}^d)\),

    $$\begin{aligned} \int _0^\infty \int _{{\mathbb {R}}^d}u\cdot \partial _tf{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}t=\int _0^\infty \int _{{\mathbb {R}}^d}(|u|^{m-1}u)\cdot \Delta ^{\alpha /2}f{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}t; \end{aligned}$$
  • \(u(0,x)=\varphi (x)\) almost everywhere.

The following result was proved in [22, Theorem 2.1, Theorem 2.2].

Theorem 1.11

Let \(\alpha \in (0,2)\) and \(m>1\). For every \(\varphi \in L^1({\mathbb {R}}^d)\), there exists a unique weak solution u for Eq. (1.23). Moreover, u enjoys the following properties:

  1. (i)

    if \(\varphi \geqslant 0\), then \(u(t,x)>0\) for all \(t>0\) and \(x\in {\mathbb {R}}^d\);

  2. (ii)

    \(\partial _tu\in L^\infty ((s,\infty );L^1({\mathbb {R}}^d))\) for every \(s>0\);

  3. (iii)

    for all \(t\geqslant 0\), \(\int _{{\mathbb {R}}^d}u(t,x){\mathord {\mathrm{d}}}x=\int _{{\mathbb {R}}^d}\varphi (x){\mathord {\mathrm{d}}}x\);

  4. (iv)

    if \(\varphi \in L^\infty ({\mathbb {R}}^d)\), then for every \(t>0\),

    $$\begin{aligned} \Vert u(t,\cdot )\Vert _\infty \leqslant \Vert \varphi \Vert _\infty ; \end{aligned}$$
  5. (v)

    for some \(\beta \in (0,1)\), \(u\in C^\beta ((0,\infty )\times {\mathbb {R}}^d)\).

Our aim in this subsection is to represent the above solution u as the distributional density of the solution to a nonlinear stochastic differential equation driven by the \(\alpha \)-stable process \(L_t\) with Lévy measure \({\mathord {\mathrm{d}}}z/|z|^{d+\alpha }\). More precisely, consider the following distribution dependent stochastic differential equation (DDSDE for short) driven by the d-dimensional isotropic \(\alpha \)-stable process \(L_t\):

$$\begin{aligned} {\mathord {\mathrm{d}}}Y_t&=\rho _{Y_t}\big (Y_{t-}\big )^{\frac{m-1}{\alpha }}{\mathord {\mathrm{d}}}L_t,\quad \rho _{Y_0}(x)=\varphi (x), \end{aligned}$$
(1.25)

where \(\rho _{Y_t}(x):=({\mathord {\mathrm{d}}}{\mathcal {L}}_{Y_t}/{\mathord {\mathrm{d}}}x)(x)\) denotes the distributional density of \(Y_t\) with respect to Lebesgue measure. We introduce the following notion about the above DDSDE (1.25).

Definition 1.12

Let \((\Omega ,{\mathcal {F}},{\mathbf{P}}; ({\mathcal {F}}_t)_{t\geqslant 0})\) be a stochastic basis and \((Y,L)\) a pair of \({\mathcal {F}}_t\)-adapted càdlàg processes. For \(\mu \in {\mathcal {P}}({\mathbb {R}}^d)\), we call \((\Omega ,{\mathcal {F}},{\mathbf{P}}; ({\mathcal {F}}_t)_{t\geqslant 0}; Y,L)\) a solution of (1.25) with initial law \(\mu \) if

  1. (i)

    L is an \(\alpha \)-stable process with Lévy measure \({\mathord {\mathrm{d}}}z/|z|^{d+\alpha }\);

  2. (ii)

    for each \(t\geqslant 0\), \({\mathbf{P}}\circ Y^{-1}_t({\mathord {\mathrm{d}}}x)=\rho _{Y_t}(x){\mathord {\mathrm{d}}}x\);

  3. (iii)

    \(Y_t\) solves the following SDE:

    $$\begin{aligned} Y_t=Y_0+\int ^t_0\rho _{Y_s}\big (Y_{s-}\big )^{\frac{m-1}{\alpha }}{\mathord {\mathrm{d}}}L_s. \end{aligned}$$

The following is the second main result of this paper.

Theorem 1.13

Let \(\varphi \geqslant 0\) be bounded and satisfy \(\int _{{\mathbb {R}}^d}\varphi (x){\mathord {\mathrm{d}}}x=1\). Let u be the unique weak solution to FPME (1.23) given by Theorem 1.11 with initial value \(\varphi \). Then there exists a weak solution Y to DDSDE (1.25) such that

$$\begin{aligned} \rho _{Y_t}(x)=u(t,x),\quad \forall t\geqslant 0. \end{aligned}$$
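To convey the flavor of (1.25), here is a minimal particle sketch, entirely our own construction and not the method of this paper: the unknown density \(\rho _{Y_t}\) is replaced by a Gaussian kernel density estimate over N interacting particles, and the driving symmetric \(\alpha \)-stable increments are sampled with the Chambers–Mallows–Stuck method. All parameters (N, step size, bandwidth) are ad hoc.

```python
import math, random

# Minimal particle sketch (ours, not the paper's construction) of the DDSDE
# (1.25) in d = 1 with m = 2: the density rho_{Y_t} is replaced by a Gaussian
# kernel density estimate over N particles, and the isotropic alpha-stable
# increments are sampled with the Chambers-Mallows-Stuck method.
random.seed(4)
alpha, m = 1.5, 2.0
N, steps, dt = 300, 10, 0.01
h = 0.3                                 # KDE bandwidth (ad hoc)

def stable():                           # standard symmetric alpha-stable sample
    V = random.uniform(-math.pi / 2, math.pi / 2)
    W = random.expovariate(1.0)
    return (math.sin(alpha * V) / math.cos(V) ** (1 / alpha)
            * (math.cos(V - alpha * V) / W) ** ((1 - alpha) / alpha))

def rho(y, pts):                        # kernel density estimate for the law of Y_t
    c = 1 / (len(pts) * h * math.sqrt(2 * math.pi))
    return c * sum(math.exp(-((y - p) / h) ** 2 / 2) for p in pts)

Y = [random.gauss(0.0, 1.0) for _ in range(N)]   # particles sampling Y_0
for _ in range(steps):
    frozen = list(Y)                    # density frozen over the step
    # Euler step: dY = rho(Y_{t-})^{(m-1)/alpha} dL; a stable increment over
    # a time step dt has scale dt^{1/alpha}
    Y = [y + rho(y, frozen) ** ((m - 1) / alpha) * dt ** (1 / alpha) * stable()
         for y in Y]
print(len(Y), all(math.isfinite(y) for y in Y))
```

This is only a heuristic discretization; proving that such particle schemes converge to (1.25) is a separate matter and is not addressed here.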

Remark 1.14

Here an open question is to show the uniqueness of weak solutions to the non-linear SDE (1.25), which cannot be derived from the uniqueness for FPME (1.23). We will study this in a future work.

We mention that in the 1-dimensional case, such a probabilistic representation for the classical porous media equation (i.e., \(\alpha =2\)) was obtained in [8]; see also [10] and [6, 7] for the generalization to the multi-dimensional case and to more general non-linear equations. We also mention that there has been increasing interest in DDSDEs driven by Brownian motion in the last decade; see [7, 26] and, in particular, [13] as well as the references therein. As far as we know, even the weak existence result for DDSDE (1.25) driven by Lévy noise in Theorem 1.13 is new.

This paper is organized as follows: In Sect. 2, we study Eq. (1.13) with smooth and non-degenerate coefficients. Then we prove Theorems 1.5 and 1.13 in Sects. 3 and 4, respectively. Throughout this paper we shall use the following conventions:

  • The letter C denotes a constant, whose value may change in different places.

  • We use \(A\lesssim B\) to denote \(A\leqslant C B\) for some unimportant constant \(C>0\).

  • \({\mathbb {N}}_0:={\mathbb {N}}\cup \{0\}\), \({\mathbb {R}}_+:=[0,\infty )\), \(a\vee b:=\max (a,b)\), \(a\wedge b:={\mathord {\mathrm{min}}}(a,b)\), \(a^+:=a\vee 0\).

  • \(\nabla _x:=\partial _x:=(\partial _{x_1},\ldots ,\partial _{x_d})\), \(\partial _i:=\partial _{x_i}:=\partial /\partial x_i\).

  • \({\mathbb {S}}^d_+\) is the set of all \(d\times d\)-symmetric and non-negative definite matrices.

2 Proof of Theorem 1.5: smooth and nondegenerate coefficients

First of all, we show the following well-posedness result for the martingale problem associated with \({\mathscr {L}}_t\), which extends Stroock’s result [30] to the case of unbounded coefficients and is probably well known, at least to experts. However, since we cannot find it in the literature, we provide a detailed proof here.

Theorem 2.1

Suppose that the following conditions are satisfied:

(A):

\(a_t(x):{\mathbb {R}}_+\times {\mathbb {R}}^d\rightarrow {\mathbb {S}}^d_+\) is continuous and \(a_t(x)\) is invertible;

(B):

\(b_t(x):{\mathbb {R}}_+\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) is locally bounded and measurable;

(C):

for any \(A\in {\mathcal {B}}({\mathbb {R}}^d)\), \((t,x)\mapsto \int _A(1\wedge |z|^2)\nu _{t,x}({\mathord {\mathrm{d}}}z)\) is continuous;

(D):

the following global growth condition holds:

$$\begin{aligned} \bar{\Gamma }^\nu _{a,b}:=\sup _{t,x}\left( \frac{|a_t(x)|+\langle x,b_t(x)\rangle ^++g^\nu _t(x)}{1+|x|^2}+2\hbar ^\nu _t(x)\right) <\infty , \end{aligned}$$

where \(g^\nu _t(x)\) and \(\hbar ^\nu _t(x)\) are defined by (1.10) and (1.19), respectively.

Then for each \((s,x)\in {\mathbb {R}}_+\times {\mathbb {R}}^d\), there is a unique martingale solution \({\mathbb {P}}_{s,x}\in {\mathcal {M}}^{x}_s({\mathscr {L}}_t)\). Moreover, the following assertions hold:

  1. (i)

    For each \(A\in {\mathcal {B}}({\mathbb {D}})\), \((s,x)\mapsto {\mathbb {P}}_{s,x}(A)\) is Borel measurable.

  2. (ii)

    The following strong Markov property holds: for every bounded measurable f and any finite stopping time \(\tau \),

    $$\begin{aligned} {\mathbb {E}}^{{\mathbb {P}}_{0,x}} (f(\tau +t, X_{\tau +t})|{\mathcal {B}}_\tau )=\big ({\mathbb {E}}^{{\mathbb {P}}_{s,y}} (f(s+t, X_{s+t}))\big )\big |_{(s,y)=(\tau ,X_\tau )}. \end{aligned}$$

Remark 2.2

Condition (D) ensures the non-explosion of the solution.

To prove this theorem we first show the following Lyapunov-type estimate.

Lemma 2.3

Let \(\psi \in C^2({\mathbb {R}};{\mathbb {R}}_+)\) with \(\lim _{r\rightarrow \infty }\psi (r)=\infty \) and

$$\begin{aligned} 0< \psi '\leqslant 1,\quad \psi ''\leqslant 0. \end{aligned}$$
(2.1)

Fix \(y\in {\mathbb {R}}^d\) and define a Lyapunov function \(V_{y}(x):=\psi (\log (1+|x-y|^2))\). Then for all \(t\geqslant 0\) and \(x\in {\mathbb {R}}^d\), we have

$$\begin{aligned} {\mathscr {L}}_tV_{y}(x)\leqslant 2\left( \frac{|a_t(x)|+\langle x-y,b_t(x)\rangle ^++g^\nu _t(x)}{1+|x-y|^2}+2H^\nu _t(x,y)\right) , \end{aligned}$$
(2.2)

where \(g^\nu _t(x)\) is defined by (1.10), and

$$\begin{aligned} H^\nu _t(x,y):=\int _{B^c_\ell }\log \left( 1+\tfrac{|z|}{1+|x-y|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z). \end{aligned}$$
(2.3)

Proof

By definition, it is easy to see that

$$\begin{aligned} \nabla V_y(x)=\frac{2(x-y)}{1+|x-y|^2}\psi '(\log (1+|x-y|^2)) \end{aligned}$$

and

$$\begin{aligned} \nabla ^2 V_y(x)&=\frac{4(x-y)\otimes (x-y)}{(1+|x-y|^2)^2}(\psi ''-\psi ')(\log (1+|x-y|^2))\\&\quad +\frac{2{\mathbb {I}}}{1+|x-y|^2}\psi '(\log (1+|x-y|^2)). \end{aligned}$$
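The gradient and Hessian formulas above can be spot-checked numerically. The following is an illustration only (not part of the proof), using the admissible choice \(\psi (r)=\log (1+r)\), which satisfies \(0<\psi '\leqslant 1\) and \(\psi ''\leqslant 0\):

```python
import numpy as np

# Finite-difference check of grad V_y and Hess V_y for psi(r) = log(1 + r).
psi = np.log1p
dpsi = lambda r: 1.0 / (1.0 + r)
ddpsi = lambda r: -1.0 / (1.0 + r) ** 2

def V(x, y):
    return psi(np.log1p(np.sum((x - y) ** 2)))

def grad_V(x, y):
    r2 = np.sum((x - y) ** 2)
    return 2 * (x - y) / (1 + r2) * dpsi(np.log1p(r2))

def hess_V(x, y):
    r2 = np.sum((x - y) ** 2)
    L = np.log1p(r2)
    return (4 * np.outer(x - y, x - y) / (1 + r2) ** 2 * (ddpsi(L) - dpsi(L))
            + 2 * np.eye(len(x)) / (1 + r2) * dpsi(L))

x = np.array([0.3, -1.2, 0.7])
y = np.array([1.0, 0.5, -0.4])
h, I = 1e-5, np.eye(3)
fd_grad = np.array([(V(x + h * e, y) - V(x - h * e, y)) / (2 * h) for e in I])
fd_hess = np.array([(grad_V(x + h * e, y) - grad_V(x - h * e, y)) / (2 * h) for e in I])
assert np.allclose(fd_grad, grad_V(x, y), atol=1e-6)
assert np.allclose(fd_hess, hess_V(x, y), atol=1e-6)
```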

Thus by (2.1), one gets that

$$\begin{aligned} {\mathscr {A}}^a_t V_y(x)\leqslant \frac{2|a_t(x)|}{1+|x-y|^2},\quad {\mathscr {B}}^b_t V_y(x)\leqslant \frac{2\langle x-y,b_t(x)\rangle ^+}{1+|x-y|^2}. \end{aligned}$$

On the other hand, recalling (1.12), we have for \(|z|\leqslant \ell \leqslant 1/\sqrt{2}\),

$$\begin{aligned} \Theta _{V_y}(x;z)&=V_y(x+z)-V_y(x)-z\cdot \nabla V_y(x)=z_iz_j\partial _i\partial _jV_y(x+\theta z)/2\\&=\frac{2\langle z,x-y+\theta z\rangle ^2}{(1+|x-y+\theta z|^2)^2}(\psi ''-\psi ')(\log (1+|x-y+\theta z|^2))\\&\quad +\frac{|z|^2}{1+|x-y+\theta z|^2}\psi '(\log (1+|x-y+\theta z|^2))\\&{\mathop {\leqslant }\limits ^{(2.1)}}\frac{|z|^2}{1+|x-y+\theta z|^2}\leqslant \frac{|z|^2}{1+|x-y|^2/2-|z|^2}\leqslant \frac{2|z|^2}{1+|x-y|^2}, \end{aligned}$$

where \(\theta \in [0,1]\). Similarly, by the mean value formula, we have

$$\begin{aligned} V_y(x+z)-V_y(x)&=\psi '(\theta _*)\Big [\log \big (1+|x-y+z|^2\big )-\log \big (1+|x-y|^2\big )\Big ]\\&\leqslant \log \left( 1+\frac{2|\langle x-y,z\rangle |+|z|^2}{1+|x-y|^2}\right) \leqslant \log \left( 1+\frac{|z|}{\sqrt{1+|x-y|^2}}\right) ^2\\&\leqslant \log \left( 1+\frac{2|z|}{1+|x-y|}\right) ^2\leqslant \log \left( 1+\frac{|z|}{1+|x-y|}\right) ^4, \end{aligned}$$

where \(\theta _*\in {\mathbb {R}}\). Hence,

$$\begin{aligned}&{\mathscr {N}}^\nu _tV_y(x)\leqslant \int _{{\mathbb {R}}^d}\Theta _{V_y}(x;z)\nu _{t,x}({\mathord {\mathrm{d}}}z)\leqslant 2\frac{g^\nu _t(x)}{1+|x-y|^2}+4H^\nu _t(x,y). \end{aligned}$$

Combining the above calculations, we obtain (2.2). \(\square \)

The following stochastic Gronwall inequality for continuous martingales was proved by Scheutzow [27], and for general discontinuous martingales in [35, Lemma 3.7].

Lemma 2.4

(Stochastic Gronwall inequality) Let \(\xi (t)\) and \(\eta (t)\) be two non-negative càdlàg adapted processes, \(A_t\) a continuous non-decreasing adapted process with \(A_0=0\), \(M_t\) a local martingale with \(M_0=0\). Suppose that

$$\begin{aligned} \xi (t)\leqslant \eta (t)+\int ^t_0\xi (s){\mathord {\mathrm{d}}}A_s+M_t,\quad \forall t\geqslant 0. \end{aligned}$$

Then for any \(0<q<p<1\) and stopping time \(\tau >0\), we have

$$\begin{aligned} \big [{\mathbb {E}}(\xi (\tau )^*)^{q}\big ]^{1/q}\leqslant \Big (\tfrac{p}{p-q}\Big )^{1/q}\Big ({\mathbb {E}}\mathrm {e}^{pA_{\tau }/(1-p)}\Big )^{(1-p)/p}{\mathbb {E}}\big (\eta (\tau )^*\big ), \end{aligned}$$

where \(\xi (t)^*:=\sup _{s\in [0,t]}\xi (s)\).

The following localization lemma is well known (see e.g. [31, Theorem 1.3.5]). Although it is proved there only for probability measures on the space of continuous functions, an inspection of the proof shows that it also works for \({\mathbb {D}}\).

Lemma 2.5

Let \(({\mathbb {P}}_n)_{n\in {\mathbb {N}}}\subset {\mathcal {P}}({\mathbb {D}})\) be a family of probability measures and \((\tau _n)_{n\in {\mathbb {N}}}\) a non-decreasing sequence of stopping times with \(\tau _0\equiv 0\). Suppose that for each \(n\in {\mathbb {N}}\), \({\mathbb {P}}_{n}\) equals \({\mathbb {P}}_{n-1}\) on \({\mathcal {B}}_{\tau _{n-1}}({\mathbb {D}})\), and for any \(T\geqslant 0\),

$$\begin{aligned} \lim _{n\rightarrow \infty } {\mathbb {P}}_n (\tau _n\leqslant T)=0. \end{aligned}$$

Then there is a unique probability measure \({\mathbb {P}}\in {\mathcal {P}}({\mathbb {D}})\) such that \({\mathbb {P}}\) equals \({\mathbb {P}}_n\) on \({\mathcal {B}}_{\tau _n}({\mathbb {D}})\) and \({\mathbb {P}}_n\) weakly converges to \({\mathbb {P}}\) as \(n\rightarrow \infty \).

We now use the above localization lemma to give

Proof of Theorem 2.1

Let \(\chi \in C^\infty _c({\mathbb {R}}^d)\) be a smooth function with

$$\begin{aligned} \chi (x)=1,\quad |x|<1,\quad \chi (x)=0,\quad |x|>2. \end{aligned}$$

For any \(n\in {\mathbb {N}}\), define

$$\begin{aligned} \chi _n(x):=\chi (x/n) \end{aligned}$$

and

$$\begin{aligned} a^n_t(x):=a_t(x\chi _n(x)),\quad b^n_t(x):=\chi _n(x)b_t(x),\quad \nu ^n_{t,x}({\mathord {\mathrm{d}}}z):=\chi _n(x)\nu _{t,x}({\mathord {\mathrm{d}}}z). \end{aligned}$$
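A standard way to realize such a cutoff \(\chi \) is via the classical bump function \(f(t)=\mathrm {e}^{-1/t}\) for \(t>0\), \(f(t)=0\) otherwise. The following one-dimensional sketch (illustrative only; the specific construction is an assumption, not taken from the paper) verifies the required values:

```python
import numpy as np

# chi(x) = f(2 - |x|) / (f(2 - |x|) + f(|x| - 1)) with f(t) = exp(-1/t) for
# t > 0 and f(t) = 0 otherwise, so chi = 1 on |x| < 1 and chi = 0 on |x| > 2;
# the denominator never vanishes since 2 - |x| and |x| - 1 are never both <= 0.

def f(t):
    return np.where(t > 0, np.exp(-1.0 / np.maximum(t, 1e-300)), 0.0)

def chi(x):
    r = np.abs(x)
    return f(2 - r) / (f(2 - r) + f(r - 1))

def chi_n(x, n):
    return chi(x / n)   # chi_n = 1 on |x| < n and chi_n = 0 on |x| > 2n

assert chi(0.5) == 1.0 and chi(2.5) == 0.0
assert 0.0 < chi(1.5) < 1.0
assert chi_n(5.0, 10) == 1.0
```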

By the assumptions (A)–(C), one can check that \((a^n,b^n,\nu ^n)\) satisfies, for any \(T>0\):

(A\(^{\prime }\)):

\(a^n_t(x):[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {S}}^d_+\) is bounded continuous and \(a^n_t(x)\) is invertible.

(B\(^{\prime }\)):

\(b^n_t(x):[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) is bounded measurable.

(C\(^{\prime }\)):

For any \(A\in {\mathcal {B}}({\mathbb {R}}^d)\), \((t,x)\mapsto \int _A(1\wedge |z|^2)\nu ^n_{t,x}({\mathord {\mathrm{d}}}z)\) is bounded continuous.

Let \({\mathscr {L}}^n_t\) be defined in terms of \((a^n,b^n,\nu ^n)\). For each \(n\in {\mathbb {N}}\) and \((s,x)\in {\mathbb {R}}_+\times {\mathbb {R}}^d\), by [19, Theorem 2.34, p. 159], there is a unique martingale solution \({\mathbb {P}}^n_{s,x}\in {\mathcal {M}}^x_s({\mathscr {L}}^n_t)\), and the following properties hold:

  1. (i)

    For each \(A\in {\mathcal {B}}({\mathbb {D}})\), \((s,x)\mapsto {\mathbb {P}}^n_{s,x}(A)\) is Borel measurable.

  2. (ii)

    The following strong Markov property holds: for any bounded measurable f and finite stopping time \(\tau \),

    $$\begin{aligned} {\mathbb {E}}^{{\mathbb {P}}^n_{0,x}} (f(\tau +t, X_{\tau +t})|{\mathcal {B}}_\tau )=\big ({\mathbb {E}}^{{\mathbb {P}}^n_{s,y}} (f(s+t, X_{s+t}))\big )\big |_{(s,y)=(\tau ,X_\tau )}. \end{aligned}$$

Moreover, if we define

$$\begin{aligned} \tau _n:=\inf \{t\geqslant s: |X_t|>n\}, \end{aligned}$$

then by [19, Theorem 2.41, p. 161], for any \(m\geqslant n\), the “stopped” martingale problem \({\mathcal {M}}^x_{s,\tau _n}({\mathscr {L}}^{m}_t)\) admits a unique solution, that is,

$$\begin{aligned} {\mathbb {P}}^{m}_{s,x}|_{{\mathcal {B}}_{\tau _n}({\mathbb {D}})}={\mathbb {P}}^{n}_{s,x}|_{{\mathcal {B}}_{\tau _n}({\mathbb {D}})}. \end{aligned}$$

To show the well-posedness, by Lemma 2.5, it suffices to show that for any \(T>0\),

$$\begin{aligned} \lim _{n\rightarrow \infty }{\mathbb {P}}^n_{s,x}(\tau _n\leqslant T)=0. \end{aligned}$$

Let \(V(x):=\log (1+|x|^2)\). By the definition of martingale solution (see Remark 1.4), there is a càdlàg local \({\mathbb {P}}^n_{s,x}\)-martingale \(M_t\) such that

$$\begin{aligned} V(X_{t\wedge \tau _n})&=V(x)+\int ^{t\wedge \tau _n}_s{\mathscr {L}}^n_rV(X_r){\mathord {\mathrm{d}}}r+M_t\\&=V(x)+\int ^{t\wedge \tau _n}_s{\mathscr {L}}_rV(X_r){\mathord {\mathrm{d}}}r+M_t\\&{\mathop {\leqslant }\limits ^{(2.2)}} V(x)+2\bar{\Gamma }^{\nu }_{a,b}\cdot (t-s)+M_t, \end{aligned}$$

where \(\bar{\Gamma }^\nu _{a,b}\) is defined in (D). By Lemma 2.4 and condition (D), we obtain

$$\begin{aligned} \sup _n{\mathbb {E}}^{{\mathbb {P}}^n_{s,x}}\left( \sup _{t\in [s,T\wedge \tau _n]}V^{\frac{1}{2}}(X_t)\right) <+\infty , \end{aligned}$$

which in turn implies that

$$\begin{aligned} {\mathbb {P}}^n_{s,x}(\tau _n\leqslant T)={\mathbb {P}}^n_{s,x}\Bigg (\sup _{t\in [s,T\wedge \tau _n]}|X_t|>n\Bigg ) \leqslant \frac{1}{V^{\frac{1}{2}}(n)}{\mathbb {E}}^{{\mathbb {P}}^n_{s,x}}\left( \sup _{t\in [s,T\wedge \tau _n]}V^{\frac{1}{2}}(X_t)\right) {\mathop {\rightarrow }\limits ^{n\rightarrow \infty }} 0. \end{aligned}$$

The proof is complete. \(\square \)

Now we can give the proof of Theorem 1.5 under the assumptions (A)–(D).

Theorem 2.6

Assume that (A)(D) hold. Then for any \(\mu _0\in {\mathcal {P}}({\mathbb {R}}^d)\), there are a unique solution \((\mu _t)_{t\geqslant 0}\) to FPKE (1.13) and a unique martingale solution \({\mathbb {P}}_{0,\mu _0}\in {\mathcal {M}}^{\mu _0}_0({\mathscr {L}})\) so that \(\mu _t={\mathbb {P}}_{0,\mu _0}\circ X^{-1}_t\).

Proof

Let \(\mu _0\in {\mathcal {P}}({\mathbb {R}}^d)\) and \({\mathbb {P}}_{0,x}\in {\mathcal {M}}^x_0({\mathscr {L}})\). Clearly,

$$\begin{aligned} {\mathbb {P}}_{0,\mu _0}:=\int _{{\mathbb {R}}^d}{\mathbb {P}}_{0,x}\mu _0({\mathord {\mathrm{d}}}x)\in {\mathcal {M}}^{\mu _0}_0({\mathscr {L}}), \end{aligned}$$

and \(\mu _t:={\mathbb {P}}_{0,\mu _0}\circ X^{-1}_t\) solves FPKE (1.13). It remains to show the uniqueness for (1.13). Following the same argument as in [16], which relies on a result of Horowitz and Karandikar [17, Theorem B1], we only need to verify the following five points:

(a):

\(C^2_c({\mathbb {R}}^d)\) is dense in \(C_0({\mathbb {R}}^d)\) with respect to the uniform convergence.

(b):

\((t,x)\mapsto {\mathscr {L}}_t f(x)\) is measurable for all \(f\in C^2_c({\mathbb {R}}^d)\).

(c):

For each \(t\geqslant 0\), the operator \({\mathscr {L}}_t\) satisfies the maximum principle.

(d):

There exists a countable family \((f_k)_{k\in {\mathbb {N}}}\subset C^2_c({\mathbb {R}}^d)\) such that for all \(t\geqslant 0\),

$$\begin{aligned} \{{\mathscr {L}}_tf, f\in C^2_c({\mathbb {R}}^d)\}\subset \overline{\{{\mathscr {L}}_tf_k, k\in {\mathbb {N}}\}}, \end{aligned}$$

where the closure is taken in the uniform norm.

(e):

For each \(x\in {\mathbb {R}}^d\), \({\mathcal {M}}^x_0({\mathscr {L}})\) has exactly one element.

Note that (a)–(c) are obvious and (e) is proven in Theorem 2.1. Thus we only need to check (d). Let \((f_k)_{k\in {\mathbb {N}}}\) be a countable dense subset of \(C^2_c({\mathbb {R}}^d)\), that is, for any \(f\in C^2_c({\mathbb {R}}^d)\) with support in \(B_R\), where \(R\geqslant 2\), there is a subsequence \(f_{k_n}\) with support in \(B_{2R}\) such that

$$\begin{aligned} \lim _{n\rightarrow \infty }\Big (\Vert f_{k_n}-f\Vert _\infty +\Vert \nabla f_{k_n}-\nabla f\Vert _\infty +\Vert \nabla ^2f_{k_n}-\nabla ^2f\Vert _\infty \Big )=0. \end{aligned}$$

We want to show

$$\begin{aligned} \lim _{n\rightarrow \infty }\Vert {\mathscr {L}}_t (f_{k_n}-f)\Vert _\infty =0. \end{aligned}$$

Without loss of generality, we may assume \(f=0\) and proceed to prove the following limits:

$$\begin{aligned} \lim _{n\rightarrow \infty }\Vert {\mathscr {A}}_t f_{k_n}\Vert _\infty =0,\ \lim _{n\rightarrow \infty }\Vert {\mathscr {B}}_t f_{k_n}\Vert _\infty =0,\ \lim _{n\rightarrow \infty }\Vert {\mathscr {N}}^\nu _t f_{k_n}\Vert _\infty =0. \end{aligned}$$

The first two limits are obvious. Let us focus on the last one. By definition we have

$$\begin{aligned} |\Theta _{f_{k_n}}(x;z)|&=|f_{k_n}(x+z)-f_{k_n}(x)-{\mathbf {1}}_{|z|\leqslant \ell } z\cdot \nabla f_{k_n}(x)|\\&\leqslant {\mathbf {1}}_{|z|>\ell }|f_{k_n}(x+z)|+{\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{B_{2R}}(x)\Vert f_{k_n}\Vert _\infty \\&\quad +{\mathbf {1}}_{|z|\leqslant \ell }{\mathbf {1}}_{B_{2R+2\ell }}(x)\Vert \nabla ^2 f_{k_n}\Vert _\infty |z|^2. \end{aligned}$$

Note that

$$\begin{aligned} {\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{B_{5R}}(x)\leqslant \Big [\log (1+\tfrac{\ell }{1+5R})\Big ]^{-1}\log \left( 1+\tfrac{|z|}{1+|x|}\right) , \end{aligned}$$

and if \(|x|>5R\), then for \(|x+z|\leqslant 2R\),

$$\begin{aligned} \tfrac{|z|}{1+|x|}\geqslant \tfrac{|x|-|x+z|}{1+|x|}\geqslant \tfrac{|x|-2R}{1+|x|}>\tfrac{1}{2}, \end{aligned}$$

and thus,

$$\begin{aligned} {\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{B^c_{5R}}(x){\mathbf {1}}_{B^c_{2R}}(x+z)\leqslant \Big [\log (\tfrac{3}{2})\Big ]^{-1}\log \left( 1+\tfrac{|z|}{1+|x|}\right) . \end{aligned}$$

Therefore,

$$\begin{aligned} |{\mathscr {N}}^\nu _tf_{k_n}(x)|&\leqslant \int _{{\mathbb {R}}^d}|\Theta _{f_{k_n}}(x;z)|\nu _{t,x}({\mathord {\mathrm{d}}}z) \leqslant \Vert \nabla ^2 f_{k_n}\Vert _\infty \sup _{x\in B_{2R+2}}\int _{B_1}|z|^2\nu _{t,x}({\mathord {\mathrm{d}}}z)\\&\quad +C\Vert f_{k_n}\Vert _\infty \int _{B^c_1}\log \left( 1+\tfrac{|z|}{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\\&=\Vert \nabla ^2 f_{k_n}\Vert _\infty \sup _{x\in B_{2R+2}} g^\nu _t(x)+C\Vert f_{k_n}\Vert _\infty \sup _{x\in {\mathbb {R}}^d}\hbar ^\nu _t(x), \end{aligned}$$

which in turn implies by (1.18) that

$$\begin{aligned} \lim _{n\rightarrow \infty }\Vert {\mathscr {N}}^\nu _tf_{k_n}\Vert _\infty =0. \end{aligned}$$

The proof is complete. \(\square \)

3 Proof of Theorem 1.5: general case

Let \(\mu _t\) be a solution of (1.13) in the sense of Definition 1.1. In order to show the existence of a martingale solution \({\mathbb {P}}\in {\mathcal {M}}^{\mu _0}_0({\mathscr {L}}_t)\) so that

$$\begin{aligned} \mu _t={\mathbb {P}}\circ X^{-1}_t, \end{aligned}$$

we shall follow the same lines of argument as in [12, 15, 32]. Here and below we use the following convention: for \(t\leqslant 0\),

$$\begin{aligned} \mu _t({\mathord {\mathrm{d}}}x):=\mu _0({\mathord {\mathrm{d}}}x),\quad a_t(x)=0,\quad b_t(x)=0,\quad \nu _{t,x}({\mathord {\mathrm{d}}}z)=0. \end{aligned}$$

3.1 Regularization

Let \(\rho ^\mathrm{t}\in C_c^\infty ([0,1];{\mathbb {R}}_+)\) with \(\int ^1_0\rho ^\mathrm{t}(s){\mathord {\mathrm{d}}}s=1\) and \(\rho ^\mathrm{x}\in C_c^\infty (B_1;{\mathbb {R}}_+)\) with \(\int _{{\mathbb {R}}^d}\rho ^\mathrm{x}(x){\mathord {\mathrm{d}}}x=1\). For \(\varepsilon >0\), define

$$\begin{aligned} \rho ^\mathrm{t}_\varepsilon (t):=\varepsilon ^{-1}\rho ^\mathrm{t}(t/\varepsilon ),\quad \rho ^\mathrm{x}_\varepsilon (x):=\varepsilon ^{-d}\rho ^\mathrm{x}(x/\varepsilon ),\quad \rho _\varepsilon (t,x):=\rho ^\mathrm{t}_\varepsilon (t)\rho ^\mathrm{x}_\varepsilon (x). \end{aligned}$$

Given a locally finite signed measure \(\zeta _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t\) on \({\mathbb {R}}^{d+1}\), we define

$$\begin{aligned} \rho _\varepsilon *\zeta (t,x):=\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\zeta _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s. \end{aligned}$$
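As an illustration of this mollification (not part of the argument), convolving a probability measure with the probability kernel \(\rho ^\mathrm{x}_\varepsilon \) preserves total mass. The one-dimensional sketch below takes \(\zeta \) to be a point mass at the origin, so that \(\rho ^\mathrm{x}_\varepsilon *\delta _0=\rho ^\mathrm{x}_\varepsilon \):

```python
import numpy as np

# rho^x is the normalized smooth bump supported in B_1; rho^x_eps(x) is its
# eps-rescaling eps^{-1} rho^x(x / eps), which still integrates to 1.

def bump(u):
    # unnormalized C_c^infty bump supported in (-1, 1)
    return np.where(np.abs(u) < 1, np.exp(-1.0 / np.maximum(1 - u ** 2, 1e-300)), 0.0)

eps = 0.1
xs = np.linspace(-1.0, 1.0, 4001)
dx = xs[1] - xs[0]

Z = np.sum(bump(xs)) * dx                 # normalizing constant of rho^x
rho_x_eps = bump(xs / eps) / (eps * Z)    # rho^x_eps(x) = eps^{-1} rho^x(x/eps)

mass = np.sum(rho_x_eps) * dx             # mass of rho^x_eps * delta_0
assert abs(mass - 1.0) < 1e-3
```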

Throughout this section we shall fix

$$\begin{aligned} \ell \in (0,1/\sqrt{2}). \end{aligned}$$

We first show the following regularization estimate.

Lemma 3.1

Let \(a\), \(b\) and \(\nu \) be as in the introduction. For \(\varepsilon \in (0,\ell )\), we have

$$\begin{aligned} \frac{|\rho _\varepsilon *(a\mu )|(t,x)}{1+|x|^2}&\leqslant \sup _{s,y}\frac{2|a_s(y)|}{1+|y|^2}(\rho _\varepsilon *\mu )(t,x),\\ \frac{|\rho _\varepsilon *(b\mu )|(t,x)}{1+|x|}&\leqslant \sup _{s,y}\frac{2|b_s(y)|}{1+|y|}(\rho _\varepsilon *\mu )(t,x). \end{aligned}$$

Moreover, if we let

$$\begin{aligned} \bar{\nu }^\varepsilon _{t,x}({\mathord {\mathrm{d}}}z):=\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s, \end{aligned}$$

then we also have

$$\begin{aligned} \frac{g^{\bar{\nu }^\varepsilon }_t(x)}{1+|x|^2}&\leqslant \sup _{s,y}\frac{2g^\nu _{s}(y)}{1+|y|^2}(\rho _\varepsilon *\mu )(t,x),\\ H^{\bar{\nu }^\varepsilon }_t(x,y)&\leqslant 2\sup _{s,y'}H^{\nu }_s(y',y)(\rho _\varepsilon *\mu )(t,x), \end{aligned}$$

where \(g^\nu _{t}(x)\) and \(H^{\nu }_t(x,y)\) are defined by (1.10) and (2.3), respectively.

Proof

Note that for \(|x-y|\leqslant \ell \leqslant 1/\sqrt{2}\),

$$\begin{aligned} (1+|y|^2)/2\leqslant 1+|x|^2\leqslant 2(1+|y|^2). \end{aligned}$$
(3.1)
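A quick numerical spot check of (3.1) over random pairs \(x,y\) with \(|x-y|\leqslant 1/\sqrt{2}\) (illustrative only):

```python
import numpy as np

# Sample y, a unit direction u and a radius r <= 1/sqrt(2), set x = y + r u,
# and check (1 + |y|^2)/2 <= 1 + |x|^2 <= 2(1 + |y|^2).
rng = np.random.default_rng(1)
checks = []
for _ in range(1000):
    y = rng.uniform(-10, 10, 3)
    u = rng.standard_normal(3)
    u /= np.linalg.norm(u)
    x = y + rng.uniform(0, 1 / np.sqrt(2)) * u
    checks.append((1 + y @ y) / 2 <= 1 + x @ x <= 2 * (1 + y @ y))
assert all(checks)
```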

Fix \(\varepsilon \in (0,\ell )\) below. By definition we have

$$\begin{aligned} \frac{|\rho _\varepsilon *(a\mu )|(t,x)}{1+|x|^2}&\leqslant \int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s, x-y)\frac{|a_s(y)|}{1+|x|^2}\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&\leqslant 2\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\frac{|a_s(y)|}{1+|y|^2}\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s, \end{aligned}$$

and

$$\begin{aligned} \frac{|\rho _\varepsilon *(b\mu )|(t,x)}{1+|x|}&\leqslant \int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\frac{|b_s(y)|}{1+|x|}\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&\leqslant 2\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\frac{|b_s(y)|}{1+|y|}\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s. \end{aligned}$$

Similarly, by Fubini’s theorem and (3.1), we have

$$\begin{aligned} \frac{g^{\bar{\nu }^\varepsilon }_t(x)}{1+|x|^2}&=\int _{{\mathbb {R}}^{d+1}}\int _{B_\ell }\frac{|z|^2}{1+|x|^2}\rho _\varepsilon (t-s,x-y)\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&\leqslant 2\int _{{\mathbb {R}}^{d+1}}\int _{B_\ell }\frac{|z|^2}{1+|y|^2}\rho _\varepsilon (t-s, x-y)\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&=2\int _{{\mathbb {R}}^{d+1}}\frac{g^\nu _s(y)}{1+|y|^2}\rho _\varepsilon (t-s,x-y)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s, \end{aligned}$$

and

$$\begin{aligned} H^{\bar{\nu }^\varepsilon }_t(x,y)&=\int _{{\mathbb {R}}^{d+1}}\int _{B_\ell ^c}\log \left( 1+\tfrac{|z|}{1+|x-y|}\right) \rho _\varepsilon (t-s,x-y')\nu _{s,y'}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y'){\mathord {\mathrm{d}}}s\\&\leqslant \int _{{\mathbb {R}}^{d+1}}\int _{B_\ell ^c}\log \left( 1+\tfrac{2|z|}{1+|y'-y|}\right) \rho _\varepsilon (t-s,x-y')\nu _{s,y'}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y'){\mathord {\mathrm{d}}}s\\&\leqslant 2\int _{{\mathbb {R}}^{d+1}}H^{\nu }_s(y',y)\rho _\varepsilon (t-s,x-y')\mu _s({\mathord {\mathrm{d}}}y'){\mathord {\mathrm{d}}}s. \end{aligned}$$

Combining the above calculations, we obtain the desired estimates. \(\square \)

Let \(\phi (x):=(2\pi )^{-d/2}\mathrm {e}^{-|x|^2/2}\) be the standard normal density. For \(\varepsilon \in (0,\ell )\), as in [12], we define the approximation sequence \(\mu ^\varepsilon _t\in {\mathcal {P}}({\mathbb {R}}^d)\) by

$$\begin{aligned} \mu _t^\varepsilon (x):=(1-\varepsilon )(\rho _\varepsilon *\mu )(t,x)+\varepsilon \phi (x). \end{aligned}$$
(3.2)
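The interpolation (3.2) mixes a mollified density with a Gaussian, yielding a strictly positive probability density. A one-dimensional sanity check (illustrative only; the stand-in density for \(\rho _\varepsilon *\mu \) is an arbitrary choice):

```python
import numpy as np

# mix = (1 - eps) * p + eps * phi is strictly positive and integrates to 1
# whenever p and phi both do; phi is the standard normal density in d = 1.
eps = 0.05
xs = np.linspace(-20, 20, 200001)
dx = xs[1] - xs[0]

phi = np.exp(-xs ** 2 / 2) / np.sqrt(2 * np.pi)
p = np.exp(-(xs - 2) ** 2 / 2) / np.sqrt(2 * np.pi)   # stand-in for rho_eps * mu
mix = (1 - eps) * p + eps * phi

assert np.all(mix > 0)
assert abs(np.sum(mix) * dx - 1.0) < 1e-6
```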

We have the following easy consequence.

Proposition 3.2

  1. (i)

    For each \(t\geqslant 0\) and \(\varepsilon \in (0,\ell )\), we have

    $$\begin{aligned} 0<\mu ^\varepsilon _t(x)\in C^\infty ({\mathbb {R}}_+; C^\infty _b({\mathbb {R}}^{d})),\quad \int _{{\mathbb {R}}^d}\mu _t^\varepsilon (x){\mathord {\mathrm{d}}}x=1. \end{aligned}$$
  2. (ii)

    For each \(t\geqslant 0\), \(\mu ^\varepsilon _t\) weakly converges to \(\mu _t\), that is, for any \(f\in C_b({\mathbb {R}}^d)\),

    $$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\int _{{\mathbb {R}}^d}f(x)\mu ^\varepsilon _t(x){\mathord {\mathrm{d}}}x=\int _{{\mathbb {R}}^d}f(x)\mu _t({\mathord {\mathrm{d}}}x). \end{aligned}$$
  3. (iii)

    \(\mu ^\varepsilon _t\) solves the following Fokker–Planck–Kolmogorov equation:

    $$\begin{aligned} \partial _t\mu ^\varepsilon _t=({\mathscr {A}}_t^{\varepsilon }+{\mathscr {B}}^{\varepsilon }_t+{\mathscr {N}}^{\varepsilon }_t)^*\mu ^\varepsilon _t=:({\mathscr {L}}_t^\varepsilon )^*\mu ^\varepsilon _t, \end{aligned}$$

    where \({\mathscr {A}}^{\varepsilon }_t\), \({\mathscr {B}}^{\varepsilon }_t\) and \({\mathscr {N}}^{\varepsilon }_t\) are defined as in the introduction in terms of

    $$\begin{aligned} a^\varepsilon _t(x)&:=\frac{(1-\varepsilon )[\rho _\varepsilon *(a\mu )](t,x)+\varepsilon \phi (x){\mathbb {I}}}{\mu _t^\varepsilon (x)}, \end{aligned}$$
    (3.3)
    $$\begin{aligned} b^\varepsilon _t(x)&:=\frac{(1-\varepsilon )[\rho _\varepsilon *(b\mu )](t,x)+\varepsilon \phi (x)x}{\mu _t^\varepsilon (x)}, \end{aligned}$$
    (3.4)

    and

    $$\begin{aligned} \nu ^\varepsilon _{t,x}({\mathord {\mathrm{d}}}z):=\frac{1-\varepsilon }{\mu _t^\varepsilon (x)}\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (t-s,x-y)\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s. \end{aligned}$$
    (3.5)
  4. (iv)

    The following uniform estimates hold: for any \(\varepsilon \in (0,\ell )\),

    $$\begin{aligned}&\sup _{t,x}\left[ \frac{|a^\varepsilon _t(x)|+g^{\nu ^\varepsilon }_t(x)}{1+|x|^2} +\frac{|b^\varepsilon _t(x)|}{1+|x|}\right] \leqslant 1+2\sup _{t,x}\left[ \frac{|a_t(x)|+g^{\nu }_t(x)}{1+|x|^2} +\frac{|b_t(x)|}{1+|x|}\right] \end{aligned}$$
    (3.6)

    and

    $$\begin{aligned} \sup _{t,x}H^{\nu ^\varepsilon }_t(x,y)\leqslant \sup _{t,x}H^{\nu }_t(x,y),\quad y\in {\mathbb {R}}^d. \end{aligned}$$
    (3.7)

Proof

The first two assertions are obvious by definition. Let us show (iii). By definition, it suffices to prove that for any \(f\in C^\infty _c({\mathbb {R}}^d)\) and \(t\geqslant 0\),

$$\begin{aligned} \mu ^\varepsilon _t(f)=\mu ^\varepsilon _0(f)+\int ^t_0\mu ^\varepsilon _s({\mathscr {L}}_s^\varepsilon f){\mathord {\mathrm{d}}}s, \end{aligned}$$
(3.8)

where

$$\begin{aligned} \mu ^\varepsilon _t(f):=\int _{{\mathbb {R}}^d}f(x)\mu ^\varepsilon _t(x){\mathord {\mathrm{d}}}x. \end{aligned}$$

Note that for any \(f\in C^\infty _c({\mathbb {R}}^d)\),

$$\begin{aligned} \Delta \phi +\mathord {\mathrm{div}}(x\cdot \phi )\equiv 0\Rightarrow \int _{{\mathbb {R}}^d}\phi (x)(\Delta f(x)-x\cdot \nabla f(x)){\mathord {\mathrm{d}}}x=0. \end{aligned}$$

By Fubini’s theorem and a change of variables, it is easy to see that (3.8) holds. Finally, estimate (3.6) follows by Lemma 3.1. \(\square \)
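The Gaussian identity \(\Delta \phi +\mathord {\mathrm{div}}(x\phi )\equiv 0\) used in the proof above can be verified numerically; here is a one-dimensional finite-difference check (illustration only):

```python
import numpy as np

# For phi(x) = (2*pi)**(-1/2) * exp(-x**2/2) one has phi'' = (x**2 - 1) phi
# and (x phi)' = (1 - x**2) phi, so phi'' + (x phi)' = 0.
xs = np.linspace(-6, 6, 2001)
h = xs[1] - xs[0]
phi = np.exp(-xs ** 2 / 2) / np.sqrt(2 * np.pi)

lap = (phi[2:] - 2 * phi[1:-1] + phi[:-2]) / h ** 2       # second derivative
div = ((xs * phi)[2:] - (xs * phi)[:-2]) / (2 * h)        # first derivative of x*phi
residual = lap + div
assert np.max(np.abs(residual)) < 1e-3
```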

The following result follows by Theorem 2.6.

Lemma 3.3

For any \(\varepsilon \in (0,\ell )\) and \((s,x)\in {\mathbb {R}}_+\times {\mathbb {R}}^d\), there is a unique martingale solution \({\mathbb {P}}^\varepsilon _{s,x}\in {\mathcal {M}}^x_s({\mathscr {L}}^\varepsilon _t)\). In particular, there is also a martingale solution \({\mathbb {Q}}^\varepsilon \in {\mathcal {M}}^{\mu ^\varepsilon _0}_0({\mathscr {L}}^\varepsilon _t)\) so that for each \(t\geqslant 0\),

$$\begin{aligned} \mu ^\varepsilon _t(x){\mathord {\mathrm{d}}}x={\mathbb {Q}}^\varepsilon \circ X^{-1}_t({\mathord {\mathrm{d}}}x). \end{aligned}$$

Proof

By Theorem 2.6, it suffices to check that \((a^\varepsilon ,b^\varepsilon ,\nu ^\varepsilon )\) satisfies conditions (A)–(D). First of all, (A) and (B) are obvious, and (D) follows by (3.6). It remains to check (C). We only check that for any \(\varepsilon \in (0,\ell )\), \(n\in {\mathbb {N}}\) and \(x,x'\in B_n\), \(t,t'\in [0,n]\),

$$\begin{aligned} \int _{{\mathbb {R}}^d}(1\wedge |z|^2)|\nu ^\varepsilon _{t,x}-\nu ^\varepsilon _{t',x'}|({\mathord {\mathrm{d}}}z)\leqslant c_{n,\varepsilon }(|t-t'|+|x-x'|). \end{aligned}$$
(3.9)

Noting that

$$\begin{aligned} \inf _t\inf _{x\in B_n}\mu ^\varepsilon _t(x)\geqslant \varepsilon \inf _{x\in B_n}\phi (x), \end{aligned}$$

we have by definition that for all \(x,x'\in B_n\) and \(t,t'\in [0,n]\),

$$\begin{aligned} |\nu ^\varepsilon _{t,x}-\nu ^\varepsilon _{t',x'}|({\mathord {\mathrm{d}}}z)&\leqslant \int _{{\mathbb {R}}^{d+1}}\left| \frac{\rho _\varepsilon (t-s,x-y)}{\mu _t^\varepsilon (x)}-\frac{\rho _\varepsilon (t'-s,x'-y)}{\mu _{t'}^\varepsilon (x')}\right| \nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&\leqslant c_{n,\varepsilon }(|t-t'|+|x-x'|)\int _0^{n+1}\int _{B_{n+1}}\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s. \end{aligned}$$

Estimate (3.9) then follows since \(\sup _{s,y\in [0,n+1]\times B_{n+1}}\int _{{\mathbb {R}}^{d}}(1\wedge |z|^2)\nu _{s,y}({\mathord {\mathrm{d}}}z)<\infty \). \(\square \)

3.2 Tightness

We first prepare the following result (cf. [11, Proposition 7.1.8]).

Lemma 3.4

For \(\mu ^\varepsilon _0\in {\mathcal {P}}({\mathbb {R}}^d)\) defined by (3.2), there exists a function \(\psi \in C^2({\mathbb {R}}_+)\) with the properties

$$\begin{aligned} \psi \geqslant 0,\quad \psi (0)=0,\quad 0< \psi '\leqslant 1,\quad -2\leqslant \psi ''\leqslant 0,\quad \lim _{r\rightarrow \infty }\psi (r)=+\infty , \end{aligned}$$

and such that

$$\begin{aligned} \sup _{\varepsilon \in [0,\ell )}\int _{{\mathbb {R}}^d}\psi \big (\log (1+|x|^2)\big )\mu ^\varepsilon _0({\mathord {\mathrm{d}}}x)<\infty . \end{aligned}$$
(3.10)

Proof

Since \(\mu ^\varepsilon _0\) weakly converges to \(\mu _0\) as \(\varepsilon \rightarrow 0\), we have

$$\begin{aligned} \lim _{n\rightarrow \infty }\sup _{\varepsilon \in [0,\ell )}\mu ^\varepsilon _0(B^c_n)=0. \end{aligned}$$

In particular, we can find a subsequence \(n_k\) such that for \(z_k:=\log (1+n^2_k)\),

$$\begin{aligned} z_{k+1}-z_k\geqslant z_k-z_{k-1}\geqslant 1, \end{aligned}$$

and

$$\begin{aligned} \sup _{\varepsilon \in [0,\ell )}\int _{{\mathbb {R}}^d}1_{[z_k,\infty )}(\log (1+|x|^2))\mu ^\varepsilon _0({\mathord {\mathrm{d}}}x)=\sup _{\varepsilon \in [0,\ell )}\mu ^\varepsilon _0(B^c_{n_k})\leqslant 2^{-k}. \end{aligned}$$

Let \(z_0=0\) and define

$$\begin{aligned} \psi _0(s):=\sum _{k=0}^\infty {\mathbf {1}}_{[z_k,z_{k+1}]}(s)\left[ k-1+\frac{s-z_k}{z_{k+1}-z_k}\right] . \end{aligned}$$

Clearly, we have

$$\begin{aligned} \int _{{\mathbb {R}}^d}\psi _0(\log (1+|x|^2))\mu ^\varepsilon _0({\mathord {\mathrm{d}}}x)\leqslant \sum _{k=0}^\infty k\int _{{\mathbb {R}}^d}1_{[z_{k},\infty )}(\log (1+|x|^2))\mu ^\varepsilon _0({\mathord {\mathrm{d}}}x) \leqslant \sum _{k=0}^\infty \frac{k}{ 2^{k}}. \end{aligned}$$

However, \(\psi _0\) does not belong to the class \(C^2({\mathbb {R}}_+)\). Let us take

$$\begin{aligned} \psi (t):=\int _0^tg(r){\mathord {\mathrm{d}}}r \end{aligned}$$

with \(g\in C^1({\mathbb {R}}_+)\), \(0\leqslant g\leqslant 1\), \(-2\leqslant g'\leqslant 0\), and

$$\begin{aligned} g(z)=\psi _0'(z)\quad \text {if}\quad z\in (z_k,z_{k+1}-k^{-1}). \end{aligned}$$

It is easy to see that such a function g always exists. The proof is complete. \(\square \)
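The tail bound in the proof above is finite because \(\sum _{k\geqslant 0}k2^{-k}=2\); an elementary arithmetic check (not part of the proof):

```python
# Partial sums of sum_{k >= 0} k / 2**k converge rapidly to 2
# (differentiate the geometric series: sum k x**k = x / (1 - x)**2 at x = 1/2).
total = sum(k / 2 ** k for k in range(200))
assert abs(total - 2.0) < 1e-12
```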

Lemma 3.5

Let \(H^\nu _t(x,y)\) be defined by (2.3). We have

$$\begin{aligned} H^\nu _t(x,y)\leqslant 2(1+|y|)\hbar ^\nu _t(x),\quad \forall t\geqslant 0, x,y\in {\mathbb {R}}^d. \end{aligned}$$
(3.11)

Proof

Recall that

$$\begin{aligned} H^\nu _t(x,y)=\int _{B^c_\ell }\log \left( 1+\tfrac{|z|}{1+|x-y|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z). \end{aligned}$$

If \(|x|\leqslant 2|y|\), then

$$\begin{aligned} H^\nu _t(x,y)&\leqslant \int _{B^c_\ell }\log \left( 1+|z|\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\leqslant \int _{B^c_\ell }\log \left( 1+\tfrac{(1+2|y|)|z|}{1+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\\&\leqslant \int _{B^c_\ell }\log \left( 1+\tfrac{|z|}{1+|x|}\right) ^{1+2|y|}\nu _{t,x}({\mathord {\mathrm{d}}}z)=(1+2|y|)\hbar ^\nu _t(x). \end{aligned}$$

If \(|x|>2|y|\), then \(2|x-y|\geqslant 2|x|-2|y|\geqslant |x|\) and

$$\begin{aligned} H^\nu _t(x,y)\leqslant \int _{B^c_\ell }\log \left( 1+\tfrac{2|z|}{2+|x|}\right) \nu _{t,x}({\mathord {\mathrm{d}}}z)\leqslant 2\hbar ^\nu _t(x). \end{aligned}$$

The proof is complete. \(\square \)

Now, we prove the following tightness result.

Lemma 3.6

The family of probability measures \(({\mathbb {Q}}^\varepsilon )_{\varepsilon \in (0,\ell )}\) is tight in \({\mathcal {P}}({\mathbb {D}})\).

Proof

By Aldous’ criterion (see [1] or [19, p.356]), it suffices to check the following two conditions:

  1. (i)

    For any \(T>0\), it holds that

    $$\begin{aligned} \lim _{N\rightarrow \infty }\sup _\varepsilon {\mathbb {Q}}^\varepsilon \left( \sup _{t\in [0,T]}|X_t|>N\right) =0. \end{aligned}$$
  2. (ii)

    For any \(T,\delta _0>0\) and stopping time \(\tau <T-\delta _0\), it holds that

    $$\begin{aligned} \lim _{\delta \rightarrow 0}\sup _\varepsilon \sup _\tau {\mathbb {Q}}^\varepsilon \left( |X_{\tau +\delta }-X_\tau |>\lambda \right) =0,\quad \forall \lambda >0. \end{aligned}$$

Verification of (i) Let \(\psi \) be as in Lemma 3.4 and \(V(x):=\psi (\log (1+|x|^2))\). By the definition of martingale solution (see Remark 1.4), (2.2) and (3.6), there is a càdlàg local \({\mathbb {Q}}^\varepsilon \)-martingale \(M^\varepsilon _t\) and a constant C independent of \(\varepsilon \) such that for all \(t\geqslant 0\),

$$\begin{aligned} V(X_{t})&=V(X_0)+\int ^{t}_0{\mathscr {L}}^\varepsilon _rV(X_r){\mathord {\mathrm{d}}}r+M_t^\varepsilon \leqslant V(X_0)+Ct+M^\varepsilon _t. \end{aligned}$$

By Lemma 2.4, there is a constant \(C>0\) such that for all \(T>0\),

$$\begin{aligned} \sup _{\varepsilon \in (0,\ell )}{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( \sup _{t\in [0,T]}V^{\frac{1}{2}}(X_t)\right) \leqslant C\sup _{\varepsilon \in (0,\ell )}({\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } V(X_0))^{\frac{1}{2}}{\mathop {<}\limits ^{(3.10)}}\infty , \end{aligned}$$
(3.12)

which in turn implies that (i) is true.

Verification of (ii) Let \(\tau \leqslant T-\delta _0\) be a bounded stopping time. For any \(\delta \in (0,\delta _0)\), by the strong Markov property we have

$$\begin{aligned} {\mathbb {Q}}^\varepsilon \left( |X_{\tau +\delta }-X_\tau |>\lambda \right)&={\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( {\mathbb {P}}^\varepsilon _{s,y}\left( |X_{s+\delta }-y|>\lambda \right) \big |_{(s,y)=(\tau , X_\tau )}\right) . \end{aligned}$$
(3.13)

Recalling \(V_{y}(x):=\psi (\log (1+|x-y|^2))\) and using (2.2), (3.6), (3.7), (3.11) and (1.18), we deduce that

$$\begin{aligned} {\mathscr {L}}^\varepsilon _tV_{y}(x)&\leqslant 2\left( \frac{|a^\varepsilon _t(x)|+\langle x-y,b^\varepsilon _t(x)\rangle ^++g^{\nu ^\varepsilon }_t(x)}{1+|x-y|^2}+2H^{\nu ^\varepsilon }_t(x,y)\right) \\&\leqslant C\left( \frac{1+|x|^2+|x-y|(1+|x|)}{1+|x-y|^2}+H^\nu _t(x,y)\right) \leqslant C(1+|y|^2), \end{aligned}$$

where \(C>0\) is independent of \(t\), \(x\), \(y\) and \(\varepsilon \). Furthermore, we have

$$\begin{aligned} V_{y}(X_t)&=V_y(X_s)+\int ^{t}_s{\mathscr {L}}^\varepsilon _rV_{y}(X_r){\mathord {\mathrm{d}}}r+M^\varepsilon _t\\&\leqslant V_y(X_s)+C(1+|y|^2)(t-s)+M^\varepsilon _t, \end{aligned}$$

where \((M^\varepsilon _t)_{t\geqslant s}\) is a local \({\mathbb {P}}^\varepsilon _{s,y}\)-martingale with \(M^\varepsilon _s=0\). By Lemma 2.4 again and since \(V_{y}(y)=0\), we obtain

$$\begin{aligned} {\mathbb {E}}^{{\mathbb {P}}^\varepsilon _{s,y}}\left( V_{y}(X_{s+\delta })^{1/2}\right) \leqslant C(1+|y|)\delta ^{1/2}. \end{aligned}$$

Hence,

$$\begin{aligned} {\mathbb {P}}^\varepsilon _{s,y}\left( |X_{s+\delta }-y|>\lambda \right)&={\mathbb {P}}^\varepsilon _{s,y}\left( V_{y}(X_{s+\delta })>\psi (\log (1+\lambda ^2))\right) \\&\leqslant {\mathbb {E}}^{{\mathbb {P}}^\varepsilon _{s,y}}\left( V_{y}(X_{s+\delta })^{1/2}\right) /\psi ^{1/2}(\log (1+\lambda ^2))\\&\leqslant C(1+|y|)\delta ^{1/2}/\psi ^{1/2}(\log (1+\lambda ^2)), \end{aligned}$$

and by (3.13) and (3.12),

$$\begin{aligned} {\mathbb {Q}}^\varepsilon \left( |X_{\tau +\delta }-X_\tau |>\lambda \right)&\leqslant {\mathbb {Q}}^\varepsilon (|X_\tau |>R)+C(1+R)\delta ^{1/2}/\psi ^{1/2}(\log (1+\lambda ^2))\\&\leqslant C/\psi ^{1/2}(\log (1+R^2))+C(1+R)\delta ^{1/2}/\psi ^{1/2}(\log (1+\lambda ^2)). \end{aligned}$$

Letting \(\delta \rightarrow 0\) first and then \(R\rightarrow \infty \), one sees that (ii) is satisfied. \(\square \)

3.3 Limits

In order to take weak limits, we rewrite

$$\begin{aligned} {\mathscr {B}}_tf(x)+{\mathscr {N}}_tf(x)&=\tilde{b}_t(x)\cdot \nabla f(x) +\int _{{\mathbb {R}}^d}\Theta ^\pi _f(x;z)\nu _{t,x}({\mathord {\mathrm{d}}}z)=:\widetilde{{\mathscr {B}}_t}f(x)+\widetilde{\mathscr {N}}_tf(x), \end{aligned}$$

where

$$\begin{aligned} \tilde{b}_t(x):=b_t(x)+\int _{{\mathbb {R}}^d}\big [\pi (z)-z{\mathbf {1}}_{|z|\leqslant \ell }\big ]\nu _{t,x}({\mathord {\mathrm{d}}}z), \end{aligned}$$
(3.14)

and

$$\begin{aligned} \Theta ^\pi _f(x;z):=f(x+z)-f(x)-\pi (z)\cdot \nabla f(x). \end{aligned}$$
(3.15)

Here, \(\pi :{\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) is a smooth symmetric function satisfying

$$\begin{aligned} \pi (z)=z,\ \ |z|\leqslant \ell ,\ \ \pi (z)=0,\ \ |z|>2\ell . \end{aligned}$$
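This is a genuine rewriting: since \(\pi (z)=z\) on \(\{|z|\leqslant \ell \}\), the integrand in (3.14) vanishes there and is bounded on \(\{|z|>\ell \}\) (as \(\pi \) is continuous with compact support), so the correction integral is finite because \(\nu _{t,x}(B^c_\ell )<\infty \); moreover, writing \({\mathscr {N}}_t\) with the truncation \(z{\mathbf {1}}_{|z|\leqslant \ell }\) as in (1.11),

$$\begin{aligned} \widetilde{{\mathscr {B}}_t}f(x)+\widetilde{\mathscr {N}}_tf(x)&=b_t(x)\cdot \nabla f(x)+\int _{{\mathbb {R}}^d}\big [\pi (z)-z{\mathbf {1}}_{|z|\leqslant \ell }\big ]\cdot \nabla f(x)\,\nu _{t,x}({\mathord {\mathrm{d}}}z)\\&\quad +\int _{{\mathbb {R}}^d}\big [f(x+z)-f(x)-\pi (z)\cdot \nabla f(x)\big ]\nu _{t,x}({\mathord {\mathrm{d}}}z)\\&=b_t(x)\cdot \nabla f(x)+\int _{{\mathbb {R}}^d}\big [f(x+z)-f(x)-z{\mathbf {1}}_{|z|\leqslant \ell }\cdot \nabla f(x)\big ]\nu _{t,x}({\mathord {\mathrm{d}}}z). \end{aligned}$$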

As in (1.11), we shall also write \(\widetilde{\mathscr {N}}_tf(x)=\widetilde{\mathscr {N}}_t^\nu f(x)=\widetilde{\mathscr {N}}^{\nu _{t,x}}f(x)\). We have the following result.

Lemma 3.7

For any \(f\in C^2_c({\mathbb {R}}^d)\) with support in \(B_R\), there is a constant \(C=C(f)>0\) such that for all \(x\in {\mathbb {R}}^d\) and \(z,z'\in {\mathbb {R}}^d\) with \(|z'|\leqslant |z|\),

$$\begin{aligned} |\Theta ^\pi _f(x;z)-\Theta ^\pi _f(x;z')|\leqslant C(|z-z'|\wedge \ell )({\mathbf {1}}_{B_{R+\ell }}(x){\mathbf {1}}_{|z|\leqslant \ell }|z|+{\mathbf {1}}_{|z|>\ell \vee (|x|-R)}). \end{aligned}$$

Proof

Note that

$$\begin{aligned} {\mathscr {Q}}:=|\Theta ^\pi _f(x;z)-\Theta ^\pi _f(x;z')|=|f(x+z)-f(x+z')-(\pi (z)-\pi (z'))\cdot \nabla f(x)|. \end{aligned}$$

We make the following decomposition:

$$\begin{aligned} {\mathscr {Q}}={\mathscr {Q}}\cdot {\mathbf {1}}_{|z|\leqslant \ell }+{\mathscr {Q}}\cdot {\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{|x|\leqslant R}+{\mathscr {Q}}\cdot {\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{|x|>R}=:{\mathscr {Q}}_1+{\mathscr {Q}}_2+{\mathscr {Q}}_3. \end{aligned}$$

For \({\mathscr {Q}}_1\), since \(\mathrm{supp}(f)\subset B_R\) and \(|z'|\leqslant |z|\), we have by (1.16) that

$$\begin{aligned} |{\mathscr {Q}}_1|&\leqslant |z-z'|^2\Vert \nabla ^2 f\Vert _\infty {\mathbf {1}}_{B_{R+\ell }}(x){\mathbf {1}}_{|z|\leqslant \ell }\leqslant C(|z-z'|\wedge \ell )|z|{\mathbf {1}}_{B_{R+\ell }}(x){\mathbf {1}}_{|z|\leqslant \ell }. \end{aligned}$$

For \({\mathscr {Q}}_2\), we have

$$\begin{aligned} |{\mathscr {Q}}_2|&\leqslant \Big (|f(x+z)-f(x+z')|+|\pi (z)-\pi (z')|\cdot \Vert \nabla f\Vert _\infty \Big ){\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{|x|\leqslant R}\\&\leqslant C(|z-z'|\wedge \ell ){\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{|x|\leqslant R}. \end{aligned}$$

As for \({\mathscr {Q}}_3\), we have

$$\begin{aligned} |{\mathscr {Q}}_3|=|f(x+z)-f(x+z')|\cdot {\mathbf {1}}_{|z|>\ell }{\mathbf {1}}_{|x|>R}\leqslant C(|z-z'|\wedge \ell ){\mathbf {1}}_{|z|>\ell \vee (|x|-R)}, \end{aligned}$$

where we have used that for \(|z'|\leqslant |z|\leqslant |x|-R\),

$$\begin{aligned} f(x+z)=f(x+z')=0. \end{aligned}$$

Combining the above calculations, we obtain the desired estimate. \(\square \)

The following approximation result will be crucial for taking weak limits.

Lemma 3.8

For any \(\delta \in (0,1)\) and \(R, T>0\), there is a family of Lévy measures \(\eta _{t,x}({\mathord {\mathrm{d}}}z)\) such that for any \(f\in C^2_c(B_R)\),

$$\begin{aligned} \int ^T_0\int _{{\mathbb {R}}^d}\sup _{x\in B_1(y)}|\widetilde{\mathscr {N}}^{\nu _{s,y}}f(x)-\widetilde{\mathscr {N}}^{\eta _{s,y}}f(x)|\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\leqslant \delta , \end{aligned}$$
(3.16)

and

$$\begin{aligned} \sup _{s,y}\Vert \widetilde{\mathscr {N}}^{\eta _{s,y}}f\Vert _\infty <\infty ,\ \ (s,y,x)\mapsto \widetilde{\mathscr {N}}^{\eta _{s,y}}f(x)\hbox { is continuous}. \end{aligned}$$

Moreover, there are continuous functions \(\bar{a}:[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\otimes {\mathbb {R}}^d\) and \(\bar{b}:[0,T]\times {\mathbb {R}}^d\rightarrow {\mathbb {R}}^d\) with compact supports such that

$$\begin{aligned} \int ^T_0\int _{{\mathbb {R}}^d}\left( \frac{|\bar{a}_s(x)-a_s(x)|}{1+|x|^2}+\frac{|\bar{b}_s(x)-\tilde{b}_s(x)|}{1+|x|}\right) \mu _s({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}s\leqslant \delta , \end{aligned}$$
(3.17)

where \(\tilde{b}\) is defined by (3.14).

Proof

(i) By the randomization of kernel functions (see [18, Lemma 14.50, p.469]), there is a measurable function

$$\begin{aligned} h_{t,x}(\theta ):[0,T]\times {\mathbb {R}}^d\times (0,\infty )\rightarrow {\mathbb {R}}^d\cup \{\infty \} \end{aligned}$$

such that

$$\begin{aligned} \nu _{t,x}(A)=\int ^\infty _0{\mathbf {1}}_A(h_{t,x}(\theta )){\mathord {\mathrm{d}}}\theta ,\ \ \forall A\in {\mathscr {B}}({\mathbb {R}}^d). \end{aligned}$$

In particular, we have

$$\begin{aligned} \widetilde{\mathscr {N}}^{\nu _{s,y}} f(x)=\int ^\infty _0\Theta ^\pi _f(x; h_{s,y}(\theta )){\mathord {\mathrm{d}}}\theta =:\widetilde{\mathscr {N}}^{h_{s,y}}f(x), \end{aligned}$$
(3.18)

and

$$\begin{aligned} g^\nu _t(x)=\int ^\infty _0{\mathbf {1}}_{B_\ell }(h_{t,x}(\theta ))|h_{t,x}(\theta )|^2{\mathord {\mathrm{d}}}\theta ,\quad \nu _{t,x}(B^c_\ell )=\int ^\infty _0{\mathbf {1}}_{B^c_\ell }(h_{t,x}(\theta )){\mathord {\mathrm{d}}}\theta . \end{aligned}$$
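For orientation, a simple hypothetical example of such a randomization in \(d=1\): for \(\nu ({\mathord {\mathrm{d}}}z)={\mathbf {1}}_{z>0}\,z^{-1-\alpha }{\mathord {\mathrm{d}}}z\) with \(\alpha \in (0,2)\), the choice \(h(\theta ):=(\alpha \theta )^{-1/\alpha }\) works, since for every \(x>0\),

$$\begin{aligned} \int ^\infty _0{\mathbf {1}}_{(x,\infty )}(h(\theta )){\mathord {\mathrm{d}}}\theta =\big |\{\theta :\theta <x^{-\alpha }/\alpha \}\big |=\frac{x^{-\alpha }}{\alpha }=\nu ((x,\infty )), \end{aligned}$$

and the half-lines \((x,\infty )\) generate \({\mathscr {B}}((0,\infty ))\).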

We introduce \({\mathbb {X}}:=[0,T]\times {\mathbb {R}}^d\times (0,\infty )\) and a locally finite measure \(\gamma \) over \({\mathbb {X}}\) by

$$\begin{aligned} \gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t):=\varrho _{t,x}(\theta ){\mathord {\mathrm{d}}}\theta \mu _{t}({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t \end{aligned}$$

with \(\varrho _{t,x}(\theta ):={\mathbf {1}}_{B_\ell }(h_{t,x}(\theta )){\mathbf {1}}_{B_{R+\ell +1}}(x)+{\mathbf {1}}_{B^c_{\ell \vee (|x|-R-1)}}(h_{t,x}(\theta ))\) so that

$$\begin{aligned} \int _{\mathbb {X}}\Big (|h_{t,x}(\theta )|^2\wedge \ell ^2\Big )\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t)&=\int ^T_0\int _{{\mathbb {R}}^d}g^\nu _t(x){\mathbf {1}}_{B_{R+\ell +1}}(x)\mu _{t}({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t\nonumber \\&\quad +\ell ^2\int ^T_0\int _{{\mathbb {R}}^d}\nu _{t,x}(B^c_{\ell \vee (|x|-R-1)})\mu _{t}({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t{\mathop {<}\limits ^{(1.14)}}\infty . \end{aligned}$$
(3.19)


Claim

There is a sequence of measurable functions \(\{\bar{h}^n_{t,x}(\theta ), n\in {\mathbb {N}}\}\) so that for each \(n\in {\mathbb {N}}\), \((t,x,\theta )\mapsto \bar{h}^n_{t,x}(\theta )\) is continuous with compact support, and

$$\begin{aligned} |\bar{h}^n_{t,x}(\theta )|\leqslant |h_{t,x}(\theta )|, \end{aligned}$$
(3.20)

and

$$\begin{aligned} \lim _{n\rightarrow \infty }\int _{\mathbb {X}}\Big (|\bar{h}^n_{t,x}(\theta )-h_{t,x}(\theta )|^2\wedge \ell ^2\Big )\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t)=0. \end{aligned}$$
(3.21)

Proof of Claim

Fix \(m\in {\mathbb {N}}\). Since \({\mathbf {1}}_{(0,m)}(\theta )\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t)\) is a finite measure over \({\mathbb {X}}\), by Lusin’s theorem, there exists a family of continuous functions \(\{\bar{h}^\varepsilon _{t,x}(\theta ),\varepsilon \in (0,1)\}\) with compact support in \((t,x,\theta )\) such that

$$\begin{aligned} |\bar{h}^\varepsilon _{t,x}(\theta )|\leqslant |h_{t,x}(\theta )|,\ \bar{h}^\varepsilon _{t,x}(\theta )\rightarrow h_{t,x}(\theta ),\ \varepsilon \rightarrow 0, \gamma -a.s. \end{aligned}$$

Thus by the dominated convergence theorem,

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}\int _{\mathbb {X}}\Big (|\bar{h}^\varepsilon _{t,x}(\theta )-h_{t,x}(\theta )|^2\wedge \ell ^2\Big ){\mathbf {1}}_{(0,m)}(\theta )\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t)=0. \end{aligned}$$

On the other hand, by (3.19) and the monotone convergence theorem, we have

$$\begin{aligned} \lim _{m\rightarrow \infty }\int _{\mathbb {X}}\Big (|h_{t,x}(\theta )|^2\wedge \ell ^2\Big ){\mathbf {1}}_{[m,\infty )}(\theta )\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}x,{\mathord {\mathrm{d}}}t)=0. \end{aligned}$$

By a diagonalization argument, we obtain the desired approximation sequence. The claim is proven.
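The diagonal argument can be spelled out: since \(|\bar{h}^\varepsilon _{t,x}(\theta )|\leqslant |h_{t,x}(\theta )|\), we have

$$\begin{aligned} |\bar{h}^\varepsilon _{t,x}(\theta )-h_{t,x}(\theta )|^2\wedge \ell ^2\leqslant 4\big (|h_{t,x}(\theta )|^2\wedge \ell ^2\big ), \end{aligned}$$

so for each \(n\in {\mathbb {N}}\) one may first choose \(m_n\) such that the integral of the right-hand side over \(\{\theta \geqslant m_n\}\) is at most \(\tfrac{1}{2n}\), then \(\varepsilon _n\) such that the integral of the left-hand side over \(\{\theta <m_n\}\) is at most \(\tfrac{1}{2n}\), and set \(\bar{h}^n:=\bar{h}^{\varepsilon _n}\); this sequence satisfies (3.20) and (3.21).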

(ii) Let \(f\in C^2_c(B_R)\). By (3.18), (3.20) and Lemma 3.7, we have for all \(x\in B_1(y)\),

$$\begin{aligned}&|\widetilde{\mathscr {N}}^{h_{s,y}}f(x)-\widetilde{\mathscr {N}}^{\bar{h}^n_{s,y}}f(x)|\leqslant \int ^\infty _0|\Theta ^\pi _f(x; h_{s,y}(\theta ))-\Theta ^\pi _f(x; \bar{h}^n_{s,y}(\theta ))|{\mathord {\mathrm{d}}}\theta \\&\lesssim \int ^\infty _0\Big (|h_{s,y}(\theta )|{\mathbf {1}}_{B_\ell }(h_{s,y}(\theta )){\mathbf {1}}_{B_{R+\ell }}(x)+{\mathbf {1}}_{B^c_{\ell \vee (|x|-R)}}(h_{s,y}(\theta ))\Big )\\&\quad \times \Big (|h_{s,y}(\theta )-\bar{h}^n_{s,y}(\theta )|\wedge \ell \Big ){\mathord {\mathrm{d}}}\theta \\&\leqslant \left( \int ^\infty _0\Big (|h_{s,y}(\theta )|^2{\mathbf {1}}_{B_\ell }(h_{s,y}(\theta )){\mathbf {1}}_{B_{R+\ell +1}}(y) +{\mathbf {1}}_{B^c_{\ell \vee (|y|-R-1)}}(h_{s,y}(\theta ))\Big ){\mathord {\mathrm{d}}}\theta \right) ^{\frac{1}{2}}\\&\quad \times \left( \int ^\infty _0\Big (|h_{s,y}(\theta )-\bar{h}^n_{s,y}(\theta )|^2\wedge \ell ^2\Big )\varrho _{s,y}(\theta ){\mathord {\mathrm{d}}}\theta \right) ^{\frac{1}{2}}\\&=\left( {\mathbf {1}}_{B_{R+\ell +1}}(y)g^\nu _s(y)+\nu _{s,y}(B^c_{\ell \vee (|y|-R-1)})\right) ^{\frac{1}{2}}\\&\quad \times \left( \int ^\infty _0\Big (|h_{s,y}(\theta )-\bar{h}^n_{s,y}(\theta )|^2\wedge \ell ^2\Big ) \varrho _{s,y}(\theta ){\mathord {\mathrm{d}}}\theta \right) ^{\frac{1}{2}}. \end{aligned}$$

Hence, by (1.18) and (1.21) we further have

$$\begin{aligned}&\int ^T_0\int _{{\mathbb {R}}^d}\sup _{x\in B_1(y)}|\widetilde{\mathscr {N}}^{h_{s,y}}f(x)-\widetilde{\mathscr {N}}^{\bar{h}^n_{s,y}}f(x)|\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\\&\quad \lesssim \bigg (\int ^T_0\int _{{\mathbb {R}}^d}\int ^\infty _0\Big (|h_{s,y}(\theta )-\bar{h}^n_{s,y}(\theta )|^2\wedge \ell ^2\Big )\varrho _{s,y}(\theta ){\mathord {\mathrm{d}}}\theta \mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\bigg )^{\frac{1}{2}}\\&\quad =\left( \int _{\mathbb {X}}(|h_{s,y}(\theta )-\bar{h}^n_{s,y}(\theta )|^2\wedge \ell ^2)\gamma ({\mathord {\mathrm{d}}}\theta ,{\mathord {\mathrm{d}}}y,{\mathord {\mathrm{d}}}s)\right) ^{\frac{1}{2}}{\mathop {\rightarrow }\limits ^{(3.21)}} 0. \end{aligned}$$

(iii) For fixed \(n\in {\mathbb {N}}\), since \(f\in C^2_c(B_R)\) and, by the above Claim, \((s,y,\theta )\mapsto \bar{h}^n_{s,y}(\theta )\) is continuous with compact support, the dominated convergence theorem yields that

$$\begin{aligned} (s,y,x)\mapsto \widetilde{\mathscr {N}}^{\bar{h}^n_{s,y}}f(x)=\int ^\infty _0\Theta ^\pi _f(x; \bar{h}^n_{s,y}(\theta )){\mathord {\mathrm{d}}}\theta \hbox { is continuous.} \end{aligned}$$

Moreover, we have

$$\begin{aligned} |\widetilde{\mathscr {N}}^{\bar{h}^n_{s,y}}f(x)|&\leqslant \int ^\infty _0|\Theta ^\pi _f(x; \bar{h}^n_{s,y}(\theta ))|{\mathord {\mathrm{d}}}\theta \leqslant C\int ^\infty _0\Big (|\bar{h}^n_{s,y}(\theta )|^2\wedge 1\Big ){\mathord {\mathrm{d}}}\theta . \end{aligned}$$

Since \(\bar{h}^n_{s,y}(\theta )\) has compact support in \((s,y)\), we have

$$\begin{aligned} \sup _{s,y}\Vert \widetilde{\mathscr {N}}^{\bar{h}^n_{s,y}}f\Vert _\infty <\infty . \end{aligned}$$

Finally we only need to take n large enough and define

$$\begin{aligned} \eta _{t,x}(A):=\int ^\infty _0{\mathbf {1}}_A(\bar{h}^n_{t,x}(\theta )){\mathord {\mathrm{d}}}\theta . \end{aligned}$$

(iv) Now let us show (3.17). By Lusin’s theorem, the set of continuous functions with compact support is dense in \(L^1([0,T]\times {\mathbb {R}}^d, \mu _t({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}t)\). Since \(\frac{a_t(x)}{1+|x|^2}\) and \(\frac{\tilde{b}_t(x)}{1+|x|}\) are bounded by (1.18), the existence of \(\bar{a}\) and \(\bar{b}\) with property (3.17) follows. \(\square \)

Now we are in a position to give:

Proof of Theorem 1.5

Let \({\mathbb {Q}}\) be any accumulation point of \(({\mathbb {Q}}^\varepsilon )_{\varepsilon \in (0,\ell )}\) (see Lemma 3.6). By taking weak limits for

$$\begin{aligned} \mu ^\varepsilon _t={\mathbb {Q}}^\varepsilon \circ X^{-1}_t, \end{aligned}$$

we obtain

$$\begin{aligned} \mu _t={\mathbb {Q}}\circ X^{-1}_t. \end{aligned}$$

It remains to show that \({\mathbb {Q}}\in {\mathscr {M}}^{\mu _0}_0({\mathscr {L}}_t)\), i.e., that for any \(f\in C^2_c({\mathbb {R}}^d)\),

$$\begin{aligned} M_t:=f(X_t)-f(X_0)-\int ^t_0{\mathscr {L}}_s f(X_s){\mathord {\mathrm{d}}}s \end{aligned}$$

is a \({\mathcal {B}}_t\)-martingale under \({\mathbb {Q}}\). Let \(J:=\{t\geqslant 0: {\mathbb {Q}}(\Delta X_t\not =0)>0\}\), which is a countable subset of \({\mathbb {R}}_+\). Since \(t\mapsto M_t\) is right continuous and bounded, to show that \(M_t\) is a \({\mathcal {B}}_t\)-martingale under \({\mathbb {Q}}\), it suffices to prove that for any \(s<t\) with \(s,t\notin J\) and any bounded \({\mathcal {B}}_s\)-measurable continuous functional \(g_s\) on \({\mathbb {D}}\),

$$\begin{aligned} {\mathbb {E}}^{\mathbb {Q}}(M_t g_s)={\mathbb {E}}^{\mathbb {Q}}(M_s g_s). \end{aligned}$$

Since \({\mathbb {Q}}^\varepsilon \in {\mathscr {M}}^{\mu ^\varepsilon _0}_0({\mathscr {L}}^\varepsilon )\), by the definition of martingale solution, we have

$$\begin{aligned} {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } (M^\varepsilon _t g_s)={\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } (M^\varepsilon _s g_s), \end{aligned}$$

where

$$\begin{aligned} M^\varepsilon _t:=f(X_t)-f(X_0)-\int ^t_0{\mathscr {L}}^\varepsilon _s f(X_s){\mathord {\mathrm{d}}}s. \end{aligned}$$

Since \(\lim _{\varepsilon \rightarrow 0}{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } (f(X_t) g_s)={\mathbb {E}}^{{\mathbb {Q}}} (f(X_t) g_s)\) for \(t\notin J\) (see [19, Proposition 3.4, page 349]), we only need to show the following three limits:

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s{\mathscr {A}}^{\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r \right)&={\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s{\mathscr {A}}_r f(X_r){\mathord {\mathrm{d}}}r\right) , \end{aligned}$$
(3.22)
$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{{\mathscr {B}}^{\varepsilon }_r} f(X_r){\mathord {\mathrm{d}}}r \right)&={\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s\widetilde{{\mathscr {B}}_r} f(X_r){\mathord {\mathrm{d}}}r\right) , \end{aligned}$$
(3.23)
$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\nu ^\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r \right)&={\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s\widetilde{\mathscr {N}}^\nu _r f(X_r){\mathord {\mathrm{d}}}r\right) . \end{aligned}$$
(3.24)

Below we assume that the support of f is contained in the ball \(B_R\). Let us first show (3.24). Fix \(\delta \in (0,1)\). Let \(\eta _{t,x}({\mathord {\mathrm{d}}}z)\) be as given by Lemma 3.8, and recall that \(\nu ^\varepsilon \) is defined by (3.5). We write

$$\begin{aligned}&\left| {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\nu ^\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r \right) -{\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s\widetilde{\mathscr {N}}^\nu _r f(X_r){\mathord {\mathrm{d}}}r\right) \right| \\&\leqslant \left| {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\nu ^\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r \right) -{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } \left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r\right) \right| \\&\quad +\left| {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(X_r){\mathord {\mathrm{d}}}r \right) -{\mathbb {E}}^{{\mathbb {Q}}^\varepsilon } \left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\eta }_r f(X_r){\mathord {\mathrm{d}}}r\right) \right| \\&\quad +\left| {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\eta }_r f(X_r){\mathord {\mathrm{d}}}r \right) -{\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s\widetilde{\mathscr {N}}^\eta _r f(X_r){\mathord {\mathrm{d}}}r\right) \right| \\&\quad +\left| {\mathbb {E}}^{{\mathbb {Q}}}\left( g_s\int ^t_s\widetilde{\mathscr {N}}^{\eta }_r f(X_r){\mathord {\mathrm{d}}}r \right) -{\mathbb {E}}^{{\mathbb {Q}}} \left( g_s\int ^t_s\widetilde{\mathscr {N}}^\nu _r f(X_r){\mathord {\mathrm{d}}}r\right) \right| =:\sum _{i=1}^4I_i(\varepsilon ), \end{aligned}$$

where \(\eta ^\varepsilon \) is defined as in (3.5) with \(\nu \) replaced by \(\eta \). For \(I_1(\varepsilon )\), by definition, we have

$$\begin{aligned} I_1(\varepsilon )&\leqslant \Vert g_s\Vert _\infty {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( \int ^t_s|\widetilde{\mathscr {N}}^{\nu ^\varepsilon }_r f(X_r)-\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(X_r)|{\mathord {\mathrm{d}}}r\right) \\&=\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|\widetilde{\mathscr {N}}^{\nu ^\varepsilon }_r f(x)-\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(x)|\mu ^\varepsilon _r(x){\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&=(1-\varepsilon )\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|\widetilde{\mathscr {N}}^{\bar{\nu }^\varepsilon }_r f(x)-\widetilde{\mathscr {N}}^{\bar{\eta }^\varepsilon }_r f(x)|{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&=(1-\varepsilon )\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}\left| \int _{{\mathbb {R}}^d}\Theta ^\pi _f(x;z)(\bar{\nu }^\varepsilon _{r,x}-\bar{\eta }^\varepsilon _{r,x})({\mathord {\mathrm{d}}}z)\right| {\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r, \end{aligned}$$

where \(\Theta ^\pi _f(x;z)\) is defined by (3.15) and

$$\begin{aligned} \bar{\nu }^\varepsilon _{r,x}({\mathord {\mathrm{d}}}z):=\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (r-s,x-y)\nu _{s,y}({\mathord {\mathrm{d}}}z)\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s. \end{aligned}$$

By Fubini’s theorem we further have

$$\begin{aligned} I_1(\varepsilon )&\leqslant \Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}\bigg |\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (r-s,x-y)\\&\quad \times \Big (\widetilde{\mathscr {N}}^{\nu _{s,y}}f(x)-\widetilde{\mathscr {N}}^{\eta _{s,y}}f(x)\Big )\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s\bigg |{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&\leqslant \Vert g_s\Vert _\infty \int ^T_0\int _{{\mathbb {R}}^d}\sup _{x\in B_1(y)}|\widetilde{\mathscr {N}}^{\nu _{s,y}}f(x)-\widetilde{\mathscr {N}}^{\eta _{s,y}}f(x)|\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s {\mathop {\leqslant }\limits ^{(3.16)}}\Vert g_s\Vert _\infty \delta . \end{aligned}$$

For \(I_2(\varepsilon )\), recalling (3.2), we have

$$\begin{aligned} I_2(\varepsilon )&\leqslant \Vert g_s\Vert _\infty {\mathbb {E}}^{{\mathbb {Q}}^\varepsilon }\left( \int ^t_s|\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(X_r)-\widetilde{\mathscr {N}}^{\eta }_r f(X_r)|{\mathord {\mathrm{d}}}r\right) \\&=\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|\widetilde{\mathscr {N}}^{\eta ^\varepsilon }_r f(x)-\widetilde{\mathscr {N}}^{\eta }_r f(x)|\mu ^\varepsilon _r(x){\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&=\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|(1-\varepsilon )\widetilde{\mathscr {N}}^{\bar{\eta }^\varepsilon }_r f(x)-\mu ^\varepsilon _r(x)\widetilde{\mathscr {N}}^{\eta }_r f(x)|{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&\leqslant (1-\varepsilon )\Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}\int _{{\mathbb {R}}^{d+1}}\rho _\varepsilon (r-s,x-y)\\&\quad \times |\widetilde{\mathscr {N}}^{\eta _{s,y}}f(x)-\widetilde{\mathscr {N}}^{\eta _{r,x}}f(x)|\mu _s({\mathord {\mathrm{d}}}y){\mathord {\mathrm{d}}}s{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r\\&\quad +\varepsilon \Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|\phi (x)\widetilde{\mathscr {N}}^{\eta }_r f(x)|{\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}r. \end{aligned}$$

Since \((s,y,x)\mapsto \widetilde{\mathscr {N}}^{\eta _{s,y}} f(x)\) is continuous and \(\Vert \widetilde{\mathscr {N}}^\eta f\Vert _\infty <\infty \), by the dominated convergence theorem, we get

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}I_2(\varepsilon )=0. \end{aligned}$$

Concerning \(I_3(\varepsilon )\), it follows by the definition of weak convergence that

$$\begin{aligned} \lim _{\varepsilon \rightarrow 0}I_3(\varepsilon )=0. \end{aligned}$$

For \(I_4(\varepsilon )\), we have

$$\begin{aligned} I_4(\varepsilon )\leqslant \Vert g_s\Vert _\infty \int ^t_s\int _{{\mathbb {R}}^d}|\widetilde{\mathscr {N}}^{\eta }_r f(x)-\widetilde{\mathscr {N}}^{\nu }_r f(x)|\mu _r({\mathord {\mathrm{d}}}x){\mathord {\mathrm{d}}}r{\mathop {\leqslant }\limits ^{(3.16)}}\Vert g_s\Vert _\infty \delta . \end{aligned}$$

Since \(\delta \) is arbitrary, combining the above calculations, we obtain (3.24). By (3.17), the proofs of (3.22) and (3.23) are completely analogous. The proof is complete. \(\square \)

4 Proof of Theorem 1.13

Let \(u\) be the unique weak solution of FPME (1.23) given by Theorem 1.11, with bounded initial value \(\varphi \geqslant 0\) satisfying \(\int _{{\mathbb {R}}^d}\varphi (x){\mathord {\mathrm{d}}}x=1\). Let

$$\begin{aligned} \sigma _t(x):=|u(t,x)|^{\frac{m-1}{\alpha }},\quad \kappa _t(x):=u(t,x)^{m-1},\quad \nu _{t,x}({\mathord {\mathrm{d}}}z):=\tfrac{\kappa _t(x){\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }}. \end{aligned}$$

By a change of variables, we have

$$\begin{aligned} \nu _{t,x}(A)=\int _{{\mathbb {R}}^d}{\mathbf {1}}_{A}(\sigma _t(x) z)\frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }},\quad A\in {\mathscr {B}}({\mathbb {R}}^d\setminus \{0\}), \end{aligned}$$
(4.1)

and

$$\begin{aligned} {\mathscr {N}}_t f(x):=\mathrm{P.V.}\int _{{\mathbb {R}}^d}(f(x+\sigma _t(x)z)-f(x))\frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }}=\kappa _t(x)\Delta ^{\alpha /2}f(x), \end{aligned}$$

where the second equality is due to (1.24). By Definition 1.10 it is easy to see that \(u(t,x)\) solves the following non-local FPKE:

$$\begin{aligned} \partial _t u={\mathscr {N}}^*_t u,\ \ u(0,x)=\varphi (x), \end{aligned}$$

that is, for every \(t>0\) and \(f\in C_0^2({\mathbb {R}}^d)\),

$$\begin{aligned}&\int _{{\mathbb {R}}^d}f(x)u(t,x){\mathord {\mathrm{d}}}x=\int _{{\mathbb {R}}^d}f(x)\varphi (x){\mathord {\mathrm{d}}}x+\int _0^t\int _{{\mathbb {R}}^d}\kappa _s(x) \Delta ^{\alpha /2}f(x)u(s,x){\mathord {\mathrm{d}}}x{\mathord {\mathrm{d}}}s. \end{aligned}$$
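The scaling behind the identity \({\mathscr {N}}_tf=\kappa _t\Delta ^{\alpha /2}f\) can be checked directly: substituting \(z'=\sigma _t(x)z\), so that \({\mathord {\mathrm{d}}}z=\sigma _t(x)^{-d}{\mathord {\mathrm{d}}}z'\) and \(|z|^{-d-\alpha }=\sigma _t(x)^{d+\alpha }|z'|^{-d-\alpha }\),

$$\begin{aligned} \mathrm{P.V.}\int _{{\mathbb {R}}^d}(f(x+\sigma _t(x)z)-f(x))\frac{{\mathord {\mathrm{d}}}z}{|z|^{d+\alpha }} =\sigma _t(x)^{\alpha }\,\mathrm{P.V.}\int _{{\mathbb {R}}^d}(f(x+z')-f(x))\frac{{\mathord {\mathrm{d}}}z'}{|z'|^{d+\alpha }}, \end{aligned}$$

and \(\sigma _t(x)^\alpha =u(t,x)^{m-1}=\kappa _t(x)\), assuming \(\Delta ^{\alpha /2}\) is normalized without a multiplicative constant as in (1.24).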

Note that for each \(t>0\),

$$\begin{aligned} |\sigma _t(x)|=|u(t,x)|^{\frac{m-1}{\alpha }}\leqslant \Vert \varphi \Vert _\infty ^{\frac{m-1}{\alpha }}. \end{aligned}$$

Thus, by Example 1.8 with the above \(\nu _{t,x}\) and Theorem 1.5 with \(\mu _0({\mathord {\mathrm{d}}}x)=\varphi (x){\mathord {\mathrm{d}}}x\), there is a martingale solution \({\mathbb {P}}\in {\mathscr {M}}^{\mu _0}_0({\mathscr {N}}_t)\) so that

$$\begin{aligned} {\mathbb {P}}\circ X^{-1}_t({\mathord {\mathrm{d}}}x)=u(t,x){\mathord {\mathrm{d}}}x,\quad t\geqslant 0. \end{aligned}$$

By (4.1) and [19, Theorem 2.26, p.157] (see Remark 4.1 below), there are a stochastic basis \((\Omega ,{\mathcal {F}},{\mathbf{P}}; ({\mathcal {F}}_t)_{t\geqslant 0})\) and a Poisson random measure N on \({\mathbb {R}}^d\times [0,\infty )\) with intensity \(|z|^{-d-\alpha }{\mathord {\mathrm{d}}}z{\mathord {\mathrm{d}}}t\), as well as an \({\mathcal {F}}_t\)-adapted càdlàg process \(Y_t\) such that

$$\begin{aligned} {\mathbf{P}}\circ Y^{-1}_t({\mathord {\mathrm{d}}}x)={\mathbb {P}}\circ X^{-1}_t({\mathord {\mathrm{d}}}x),\quad t\geqslant 0, \end{aligned}$$

and

$$\begin{aligned} {\mathord {\mathrm{d}}}Y_t=\int _{|z|\leqslant 1}\sigma _t(Y_{t-})z\tilde{N}({\mathord {\mathrm{d}}}z,{\mathord {\mathrm{d}}}t)+\int _{|z|>1}\sigma _t(Y_{t-})z N({\mathord {\mathrm{d}}}z, {\mathord {\mathrm{d}}}t), \end{aligned}$$

where \(\tilde{N}({\mathord {\mathrm{d}}}z,{\mathord {\mathrm{d}}}t):=N({\mathord {\mathrm{d}}}z,{\mathord {\mathrm{d}}}t)-|z|^{-d-\alpha }{\mathord {\mathrm{d}}}z{\mathord {\mathrm{d}}}t\). Finally we just need to define

$$\begin{aligned} L_t:=\int _0^t\int _{|z|\leqslant 1}z\tilde{N}({\mathord {\mathrm{d}}}z,{\mathord {\mathrm{d}}}s)+\int _0^t\int _{|z|> 1}z N({\mathord {\mathrm{d}}}z,{\mathord {\mathrm{d}}}s), \end{aligned}$$

then \(L\) is a \(d\)-dimensional isotropic \(\alpha \)-stable process with Lévy measure \({\mathord {\mathrm{d}}}z/|z|^{d+\alpha }\), and

$$\begin{aligned} {\mathord {\mathrm{d}}}Y_t=\sigma _t(Y_{t-}){\mathord {\mathrm{d}}}L_t. \end{aligned}$$

The proof is finished. \(\square \)
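The representation \({\mathord {\mathrm{d}}}Y_t=\sigma _t(Y_{t-}){\mathord {\mathrm{d}}}L_t\) also lends itself to simulation. The following is a minimal one-dimensional numerical sketch (not part of the proof): it combines the Chambers–Mallows–Stuck sampler for symmetric \(\alpha \)-stable increments with an Euler scheme, and the coefficient `sigma` below is a hypothetical bounded stand-in for \(|u(t,x)|^{(m-1)/\alpha }\), not the solution of (1.23).

```python
import math
import random

def stable_increment(alpha, dt, rng):
    # Chambers–Mallows–Stuck sampler: one increment of a symmetric
    # alpha-stable process over a time step dt (scale dt**(1/alpha)).
    u = rng.uniform(-math.pi / 2, math.pi / 2)
    w = rng.expovariate(1.0)
    x = (math.sin(alpha * u) / math.cos(u) ** (1.0 / alpha)
         * (math.cos((1.0 - alpha) * u) / w) ** ((1.0 - alpha) / alpha))
    return dt ** (1.0 / alpha) * x

def euler_path(sigma, y0, alpha, t_end, n, seed=0):
    # Euler scheme for dY_t = sigma(t, Y_{t-}) dL_t in dimension one.
    rng = random.Random(seed)
    dt = t_end / n
    t, y, path = 0.0, y0, [y0]
    for _ in range(n):
        y += sigma(t, y) * stable_increment(alpha, dt, rng)
        t += dt
        path.append(y)
    return path

# hypothetical bounded coefficient standing in for |u(t,x)|^{(m-1)/alpha}
sigma = lambda t, y: 1.0 / (1.0 + y * y) ** 0.25
path = euler_path(sigma, y0=0.0, alpha=1.5, t_end=1.0, n=1000)
```

Since the driving increments are heavy tailed, individual steps can be large; the boundedness of `sigma` mirrors the bound \(|\sigma _t(x)|\leqslant \Vert \varphi \Vert _\infty ^{(m-1)/\alpha }\) used above.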

Remark 4.1

For a more recent and general analysis of the equivalence between stochastic equations and martingale problems, we refer to [21].