1 Introduction

The construction of a stochastic process from given marginal distributions is called a marginal problem.

Schrödinger’s problem is the construction of a Markov diffusion process on [0, 1] from two endpoint marginal distributions at \(t=0,1\) by solving a variational problem on the relative entropy. We describe it briefly (see \(V_S\) in (1.19), (4.19), and also [28, 30, 44]). Let \(\sigma\) and \(\xi\) be, respectively, a \(d\times d\) nondegenerate matrix-valued and an \(\mathbb {R}^d\)-valued function on \([0,1]\times \mathbb {R}^d\). Suppose that the following stochastic differential equation has a weak solution \(\{X(t)\}_{0\le t\le 1}\) with a positive transition probability density \(p(s,x;t,y), 0\le s<t\le 1, x,y\in \mathbb {R}^d\):

$$\begin{aligned} dX(t)=\xi (t,X(t))dt+\sigma (t,X(t))dW(t),\quad 0<t<1, \end{aligned}$$
(1.1)

where W(t) denotes a d-dimensional Brownian motion defined on a probability space (see Theorem 6 in Sect. 4). Let \(\mathcal{P} (\mathbb {R}^d )\) denote the set of all Borel probability measures on \(\mathbb {R}^d\). For any \(P_0, P_1\in \mathcal{P} (\mathbb {R}^d )\), there exists a unique product measure \(\nu _0(dx)\nu _1(dy)\) that satisfies the following:

$$\begin{aligned} P_1(dy) & = {} \nu _1(dy)\int _{\mathbb {R}^d}p(0,x;1,y)\nu _0(dx), \end{aligned}$$
(1.2)
$$\begin{aligned} P_0(dx) & = {} \nu _0(dx)\int _{\mathbb {R}^d}p(0,x;1,y)\nu _1(dy). \end{aligned}$$
(1.3)

This is Euler’s equation of Schrödinger’s problem and is called Schrödinger’s functional equation or the Schrödinger system (see [55, 56] and also [27] and Proposition 2.1 in [42]). Under some assumptions on \(\sigma\) and \(\xi\) (see, e.g. (A.5)–(A.6) in Sect. 4), if \(P_1(dy)\ll dy\), then there exists a unique weak solution \(\{Y(t)\}_{0\le t\le 1}\) to the following (see [28]):

$$\begin{aligned} dY(t) & = {} \{a(t,Y(t))D_y\log h(t,Y(t))+\xi (t,Y(t))\}dt+\sigma (t,Y(t))dW(t),\quad 0<t<1, \end{aligned}$$
(1.4)
$$\begin{aligned} P^{Y(0)} & = {} P_0, \end{aligned}$$
(1.5)

where \(a(t,x):=\sigma (t,x)\sigma (t,x)^*\), \(\sigma (t,x)^*\) denotes the transpose of \(\sigma (t,x)\), \(D_y:=\left( \partial /\partial y_i\right) _{i=1}^d\),

$$\begin{aligned} h(t,x):=\int _{\mathbb {R}^d}p(t,x;1,y)\nu _1(dy),\quad 0\le t<1, x\in \mathbb {R}^d, \end{aligned}$$

and \(P^{Y(0)}\) denotes the probability law of Y(0). Besides, the following holds:

$$\begin{aligned} P^{(Y(0),Y(1))}(dxdy)=\nu _0(dx)p(0,x;1,y)\nu _1(dy), \end{aligned}$$
(1.6)

which implies that \(P^{Y(1)}=P_1\) from (1.2). \(\{Y(t)\}_{0\le t\le 1}\) is called the h-path process for \(\{X(t)\}_{0\le t\le 1}\) from two endpoint marginals \(P_0, P_1\) at \(t=0,1\), respectively.
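For a concrete instance (a standard example, recalled here purely for illustration), take \(\sigma =I_d\), \(\xi =0\), so that \(p(t,x;1,y)=(2\pi (1-t))^{-d/2}e^{-|y-x|^2/(2(1-t))}\), and take \(P_0=\delta _{x_0}\), \(P_1=\delta _{x_1}\) (here \(P_1\) is not absolutely continuous, but the h-path to a point is classical). Then \(h(t,x)=p(t,x;1,x_1)\) up to a multiplicative constant, \(D_y\log h(t,y)=(x_1-y)/(1-t)\), and (1.4)–(1.5) reduce to the Brownian bridge equation

$$\begin{aligned} dY(t)=\frac{x_1-Y(t)}{1-t}dt+dW(t),\quad 0<t<1,\qquad Y(0)=x_0. \end{aligned}$$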

Remark 1

Schrödinger’s functional equations (1.2)–(1.3) are equivalent to the following:

$$\begin{aligned} \overline{h}(1,y) & = {} \int _{\mathbb {R}^d}p(0,x;1,y) \left\{ \int _{\mathbb {R}^d}p(0,x;1,z)\overline{h}(1,z)^{-1}P_1(dz)\right\} ^{-1} P_0(dx), \end{aligned}$$
(1.7)
$$\begin{aligned} \nu _1(dy): & = {} \overline{h}(1,y)^{-1}P_1(dy), \end{aligned}$$
(1.8)
$$\begin{aligned} \nu _0(dx): & = {} \left\{ \int _{\mathbb {R}^d}p(0,x;1,z)\nu _1(dz)\right\} ^{-1}P_0(dx). \end{aligned}$$
(1.9)

In particular, one only has to find a solution \(\overline{h}(1,\cdot )\) in (1.7).
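For orientation, here is a minimal numerical sketch (our own illustration, on a finite state space): the kernel \(p(0,x;1,y)\) becomes a positive matrix K, and the Schrödinger system (1.2)–(1.3) is solved by alternately enforcing the two equations through diagonal scaling (the iterative proportional fitting, or Sinkhorn, scheme), whose convergence for positive kernels is classical.

```python
import numpy as np

def schroedinger_system(K, P0, P1, n_iter=1000, tol=1e-12):
    """Alternately enforce (1.3) and (1.2) by diagonal scaling of K."""
    nu1 = np.ones_like(P1)
    for _ in range(n_iter):
        nu0 = P0 / (K @ nu1)        # (1.3): P0 = nu0 * (K nu1)
        nu1_new = P1 / (K.T @ nu0)  # (1.2): P1 = nu1 * (K^T nu0)
        if np.max(np.abs(nu1_new - nu1)) < tol:
            return nu0, nu1_new
        nu1 = nu1_new
    return nu0, nu1

x = np.linspace(-2.0, 2.0, 50)
K = np.exp(-0.5 * (x[:, None] - x[None, :]) ** 2)  # stand-in for p(0,x;1,y)
P0 = np.exp(-x ** 2); P0 /= P0.sum()
P1 = np.exp(-(x - 1.0) ** 2); P1 /= P1.sum()
nu0, nu1 = schroedinger_system(K, P0, P1)
mu = nu0[:, None] * K * nu1[None, :]               # discrete analogue of (1.6)
print(np.abs(mu.sum(axis=1) - P0).max(), np.abs(mu.sum(axis=0) - P1).max())
```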

Motivated by Schrödinger’s quantum mechanics, Nelson proposed the problem of the construction of a Markov diffusion process from the Fokker–Planck equation. We describe it. Let a and b be, respectively, a \(d\times d\) symmetric nonnegative definite matrix-valued and an \(\mathbb {R}^d\)-valued function on \([0,1]\times \mathbb {R}^d\), and let \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\).

By \((a,b)\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\), we mean that \(a,b\in L^1([0,1]\times \mathbb {R}^d, dtP_t(dx))\) and the following Fokker-Planck equation holds: for any \(f \in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\) and \(t\in [0,1]\),

$$\begin{aligned}&\int _{\mathbb {R}^d}f (t,x)P_t(dx)-\int _{\mathbb {R}^d}f(0,x)P_0(dx)\nonumber \\&\quad =\int _0^t ds\int _{\mathbb {R}^d}\biggl (\partial _sf(s,x)+\frac{1}{2} \langle a(s,x), D_x^2f(s,x)\rangle \nonumber \\&\qquad + \langle b(s,x),D_x f (s,x)\rangle \biggr ) P_s(dx). \end{aligned}$$
(1.10)

Here \(\partial _s:=\partial /\partial s\), \(D_x^2:=\left( \partial ^2/\partial x_i\partial x_j\right) _{i,j=1}^d\), \(\langle x,y\rangle\) denotes the inner product of \(x, y\in \mathbb {R}^d\) and

$$\begin{aligned} \langle A,B\rangle :=\sum _{i,j=1}^d A_{ij}B_{ij},\quad A=(A_{ij})_{i,j=1}^d, B=(B_{ij})_{i,j=1}^d \in M(d,\mathbb {R}). \end{aligned}$$

We also write \((a,b)\in \mathbf{A}_0 (\{P_t\}_{0\le t\le 1})\) if \(a,b\in L^1_{loc}([0,1]\times \mathbb {R}^d, dtP_t(dx))\) and (1.10) holds for all \(f \in C^{1,2}_0 ([0,1]\times \mathbb {R}^d)\).
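As a minimal check of (1.10) (an elementary example of ours), take \(a=I_d\), \(b=0\), and \(P_t(dx)=p(t,x)dx\) with \(p(t,x):=(2\pi (1+t))^{-d/2}e^{-|x|^2/(2(1+t))}\), which solves the heat equation \(\partial _t p=\frac{1}{2}\Delta _x p\). Then, for \(f\in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\), differentiating under the integral and integrating by parts twice,

$$\begin{aligned} \int _{\mathbb {R}^d}f (t,x)P_t(dx)-\int _{\mathbb {R}^d}f(0,x)P_0(dx) & = \int _0^t ds\int _{\mathbb {R}^d}\left( \partial _sf(s,x)p(s,x)+f(s,x)\frac{1}{2}\Delta _x p(s,x)\right) dx\\ & = \int _0^t ds\int _{\mathbb {R}^d}\left( \partial _sf(s,x)+\frac{1}{2}\Delta _xf(s,x)\right) P_s(dx), \end{aligned}$$

which is (1.10) with \(a=I_d\), \(b=0\), since \(\langle I_d,D_x^2f(s,x)\rangle =\Delta _xf(s,x)\).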

Remark 2

For \(\{ P_t\}_{0\le t\le 1}\) in (1.10), \(\mathbf{A} (\{P_t\}_{0\le t\le 1})\) is not necessarily a singleton (see [11,12,13,14, 33]).

The following is a generalized version of Nelson’s problem (see [47, 49, 50]).

Definition 1

(Nelson’s problem) For any \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\) such that \(\mathbf{A}_0 (\{P_t\}_{0\le t\le 1})\) is not empty and for any \((a,b)\in \mathbf{A}_0 (\{P_t\}_{0\le t\le 1})\), construct a \(d\times d\) matrix-valued function \(\sigma (t,x)\) on \([0,1]\times \mathbb {R}^d\) and a semimartingale \(\{X(t)\}_{0\le t\le 1}\) such that the following holds: for \((t,x)\in [0,1]\times \mathbb {R}^d\),

$$\begin{aligned} a(t,x) & = {} \sigma (t,x)\sigma (t,x)^*,\quad dtP_t(dx)\hbox {-a.e.}, \end{aligned}$$
(1.11)
$$\begin{aligned} X(t) & = {} X(0)+\int _0^t b(s,X(s))ds+\int _0^t \sigma (s, X(s))dW_X(s), \end{aligned}$$
(1.12)
$$\begin{aligned} P^{X(t)} & = {} P_t. \end{aligned}$$
(1.13)

Here \(W_X\) denotes a d-dimensional Brownian motion.

When \(\sigma (t,x)\) is nondegenerate, \(W_X\) can be taken to be an \((\mathcal{F}_t^X)\)-Brownian motion, where \(\mathcal{F}_t^X\) denotes \(\sigma [X(s):0\le s\le t]\). Otherwise (1.12) means that \(X(t)-X(0)-\int _0^t b(s,X(s))ds\) is a local martingale with a quadratic variational process \(\int _0^t a (s, X(s))ds\) (see, e.g. [25]).

The first result on Nelson’s problem was given by E. Carlen when a is the identity matrix (see [8, 9], and also [10, 46, 63] for different approaches). We generalized it to the case of a variable diffusion matrix (see [33]). P. Cattiaux and C. Léonard extensively generalized it to a setting that also covers jump-type Markov processes (see [11,12,13,14]). In these papers, the following condition was assumed.

Definition 2

(Finite energy condition (FEC))

There exists \((a,b) \in \mathbf{A} (\{P_t\}_{0\le t\le 1})\) such that the following holds:

$$\begin{aligned} \int _0^1 dt\int _{\mathbb {R}^d}\langle a(t,x)^{-1}b(t,x),b(t,x)\rangle P_t(dx)<\infty . \end{aligned}$$
(1.14)

We describe a class of stochastic optimal transportation problems (SOTPs for short) and approaches to the h-path process and to Nelson’s problem via the SOTPs.

Fix a Borel measurable \(d\times d\) matrix-valued function \(\sigma (t,x)\). Let \(\mathcal{A}\) denote the set of all \(\mathbb {R}^d\)-valued, continuous semimartingales \(\{ X(t)\}_{0\le t\le 1}\) on a (possibly different) complete filtered probability space such that there exists a Borel measurable \(\beta _X :[0,1]\times C([0,1])\longrightarrow \mathbb {R}^d\) for which the following holds:

  1. (i)

    \(\omega \mapsto \beta _X (t,\omega )\) is \(\mathbf{B}(C([0,t]))_+\)-measurable for all \(t\in [0,1]\),

  2. (ii)

    \(X(t)=X(0)+\int _0^t \beta _X (s,X)ds+\int _0^t \sigma (s,X(s))dW_X(s)\), \(0\le t\le 1\),

  3. (iii)
    $$\begin{aligned} E\left[ \int _0^1\left\{ |\beta _X (t,X)|+|\sigma (t,X(t))|^2\right\} dt\right] <\infty . \end{aligned}$$

    Here \(\mathbf{B}(C([0,t]))\) and \(\mathbf{B}(C([0,t]))_+\) denote the Borel \(\sigma\)-field of C([0, t]) and \(\cap _{s> t}{} \mathbf{B}(C([0,s]))\), respectively (see, e.g. [31]). \(|\cdot |:=\langle \cdot , \cdot \rangle ^{1/2}\).
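For illustration, here is a minimal simulation sketch of an element of \(\mathcal{A}\) (our own, with \(d=1\) and hypothetical choices of the drift functional and \(\sigma\)): condition (ii) is discretized by the Euler–Maruyama scheme, and the drift may depend on the whole past of the path, as (i) permits.

```python
import numpy as np

def euler_maruyama(x0, beta, sigma, n_steps=200, seed=0):
    """Simulate X(t) = X(0) + int_0^t beta(s,X) ds + int_0^t sigma(s,X(s)) dW(s)
    on [0,1]; beta(t, past) may use the whole path up to time t, cf. (i)-(ii)."""
    rng = np.random.default_rng(seed)
    dt = 1.0 / n_steps
    path = np.empty(n_steps + 1)
    path[0] = x0
    for k in range(n_steps):
        t = k * dt
        drift = beta(t, path[:k + 1])          # path-dependent drift
        path[k + 1] = (path[k] + drift * dt
                       + sigma(t, path[k]) * np.sqrt(dt) * rng.normal())
    return path

# hypothetical example: drift = minus the running average of the path, sigma = 1
path = euler_maruyama(0.5, lambda t, past: -past.mean(), lambda t, x: 1.0)
print(path[-1])
```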

Let \(L:[0,1]\times \mathbb {R}^d \times \mathbb {R}^d\longrightarrow [0,\infty )\) be continuous. The following is a class of the SOTPs (see [41, 45], and also [33, 37, 44]).

Definition 3

(Stochastic optimal transportation problems)

  1. (1)

    For \(P_0\), \(P_1\in \mathcal{P} (\mathbb {R}^d )\),

    $$\begin{aligned} V (P_0,P_{1}):=\inf _{\begin{array}{c} {\scriptstyle X\in \mathcal{A},} \\ {{\scriptstyle P^{X (t)}=P_t, t=0,1}} \end{array}}\ E\biggl [\int _0^1 L(t,X (t);\beta _X (t,X))dt \biggr ]. \end{aligned}$$
    (1.15)
  2. (2)

    For \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\),

    $$\begin{aligned} \mathbf{V} (\{ P_t\}_{0\le t\le 1} ):=\inf _{\begin{array}{c} {\scriptstyle X\in \mathcal{A},} \\ {{\scriptstyle P^{X (t)}=P_t, 0\le t\le 1}} \end{array}} E\biggl [\int _0^1 L(t,X (t);\beta _X (t,X))dt \biggr ]. \end{aligned}$$
    (1.16)

If the set over which the infimum is taken is empty, then the infimum is set to be \(+\infty\).

Suppose that one knows the marginal probability distributions of a stochastic system at the times \(t=0, 1\) or at every \(t\in [0,1]\). To study the stochastic system on [0, 1] from the viewpoint of the principle of least action, one is led to these kinds of problems.

Remark 3

(i) The sets of stochastic processes over which the infima are taken in (1.15)–(1.16) can be empty. If \(P_1(dx)\ll dx\), then sufficient conditions for them to be nonempty are known: for (1.15) in [28] and for (1.16) in [5, 8,9,10,11,12,13,14, 33, 35,36,37, 40, 41, 46, 60, 63]. (ii) For \(\{X(t)\}_{0\le t\le 1}\in \mathcal{A}\),

$$\begin{aligned}&(\sigma (t,x)\sigma (t,x)^*, b_X(t,x):=E[\beta _X(t,X)|(t, X(t)=x)])_{(t,x)\in [0,1]\times \mathbb {R}^d}\nonumber \\&\quad \in \mathbf{A} (\{P^{X(t)}\}_{0\le t\le 1}). \end{aligned}$$
(1.17)

Indeed, by Itô’s formula, (1.10) with \(a=\sigma \sigma ^*, b=b_X\) holds and by Jensen’s inequality,

$$\begin{aligned} E[|b_X(t,X(t))|]=E[|E[\beta _X(t,X)|(t, X(t))]|]\le E[|\beta _X(t,X)|]. \end{aligned}$$
(1.18)

Schrödinger’s problem, which is a typical example of the SOTP, is \(V_S:=V\) in (1.15) when the following holds:

$$\begin{aligned} L(t,x;u)=\frac{1}{2}|\sigma (t,x)^{-1} (u-\xi (t,x))|^2 \end{aligned}$$
(1.19)

(see, e.g. [30, 44, 53]). If \(V_S (P_0, P_1)\) is finite for \(P_0\), \(P_1\in \mathcal{P} (\mathbb {R}^d )\) and if \(\sigma\) and \(\xi\) satisfy suitable regularity conditions, then the minimizer exists uniquely and is the h-path process with the two endpoint marginals \(P_0, P_1\) in (1.4)–(1.5) (see [16, 21, 44, 45, 51, 62]).

Via the continuum limit of \(V (\cdot ,\cdot )\), we considered Nelson’s problem in a more general setting, including the following case (see [33, 40]).

Definition 4

(Generalized finite energy condition (GFEC))

There exist \(\gamma >1\) and \((a,b) \in \mathbf{A} (\{P_t\}_{0\le t\le 1})\) such that the following holds:

$$\begin{aligned} \int _0^1 dt\int _{\mathbb {R}^d}\langle a(t,x)^{-1}b(t,x),b(t,x)\rangle ^{\frac{\gamma }{2}} P_t(dx)<\infty . \end{aligned}$$
(1.20)
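For example (a standard computation, included for illustration): for \(d=1\), the Brownian bridge from 0 at \(t=0\) to 0 at \(t=1\) has \(a=1\), \(b(t,x)=-x/(1-t)\), and \(P_t=N(0,t(1-t))\), so that, with \(c_\gamma :=E[|Z|^\gamma ]\) for a standard normal Z,

$$\begin{aligned} \int _0^1 dt\int _{\mathbb {R}}\left( \frac{x^2}{(1-t)^2}\right) ^{\frac{\gamma }{2}}P_t(dx) =c_\gamma \int _0^1 t^{\frac{\gamma }{2}}(1-t)^{-\frac{\gamma }{2}}dt, \end{aligned}$$

which is finite if and only if \(\gamma <2\): the bridge satisfies the GFEC for every \(\gamma \in (1,2)\) but not the FEC (1.14).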

As an application of the Duality Theorem for \(\mathbf{V}\), we also gave an approach to Nelson’s problem under a condition that includes the GFEC (see [41]).

When (1.11)–(1.13) hold, one also says that the superposition principle holds. When \(a\equiv 0\), the superposition principle was studied in [1, 2, 36, 37]. Trevisan’s result [60] almost completely solved Nelson’s problem (see also [5, 19]). The case of linear operators that combine a second-order differential operator with a Lévy measure was studied in [14, 52].

Theorem 1

(See [60]) Suppose that \((a,b)\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\) for some \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\).

Then Nelson’s problem (1.11)–(1.13) has a solution.

In his problem, Nelson considered the case where a is the identity and \(b=D_x \psi (t,x)\) for some function \(\psi\). It turned out that the Nelson process is the minimizer of \(\mathbf{V}_N:=\mathbf{V}\) when (1.19) with \(\sigma\) the identity and \(\xi =0\) and the FEC hold (see Proposition 3.1 in [33] and also Theorem 4 in Sect. 2). Indeed, if \((a, D_x \psi _i)\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\), \(i=1,2\), then \(D_x \psi _1=D_x \psi _2\), \(dtP_t(dx)\)-a.e. In this sense, we regard Nelson’s problem as the study of the superposition principle and of the minimizer of \(\mathbf{V}\). In particular, if the superposition principle holds, then the set over which the infimum is taken in \(\mathbf{V}\) is not empty, and one can then consider a minimizer of \(\mathbf{V}\), provided it is finite. A different approach proceeded by proving Proposition 1 in Sect. 2 via the Duality Theorems in Theorems 3 and 4 in Sect. 2 (see [41] and also [33, 40]). This is also generalized by the superposition principle, so our previous approach to the first part of Nelson’s problem is no longer needed.

In Sect. 2, we improve our previous results on the SOTPs with a convex cost function by the superposition principle in Theorem 1.

More precisely, we prove that the SOTPs are equivalent to variational problems for probability measures given by the Fokker–Planck equation and to those given by a relaxed version of the Fokker–Planck equation (see Proposition 1 in Sect. 2). In particular, we can prove the convexity and the lower semicontinuity of the SOTPs in the marginal distributions by a finite-dimensional approach, even though the SOTPs are variational problems for semimartingales. This gives a new insight into the SOTPs and lets us revisit them.

In Sect. 3, in the case where \(d=1\) and where a is not fixed, we consider slightly relaxed versions of the SOTPs whose cost functions are not assumed to be convex. In this case, we need a generalization of Trevisan’s result which was recently obtained by Bogachev, Röckner, and Shaposhnikov.

Theorem 2

(See [5]) Suppose that \((a,b)\in \mathbf{A}_0 (\{P_t\}_{0\le t\le 1})\) for some \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\) and that the following holds:

$$\begin{aligned} \int _0^1 dt\int _{\mathbb {R}^d}\frac{|a(t,x)|+|\langle x,b(t,x)\rangle |}{1+|x|^2}P_t(dx)<\infty . \end{aligned}$$
(1.21)

Then Nelson’s problem (1.11)–(1.13) has a solution.

The test of the Markov property of a minimizer is known as a fundamental problem in stochastic optimal control theory. We also discuss this problem for a finite-time horizon stochastic optimal control problem.

In Sect. 4, we study the semiconcavity and the Lipschitz continuity of Schrödinger’s problem \(V_S\).

2 SOTPs with a convex cost

In this section, we discuss applications of D. Trevisan’s result to the Duality Theorems for the SOTPs in the case where \(u\mapsto L(t,x;u)\) is convex and where \(\sigma\) and \(a=\sigma \sigma ^*\) in (1.11) are fixed. We write \(b\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\) if \((a,b)\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\) for the sake of simplicity (see (1.10) for notation).

As a preparation, we introduce two classes of marginal problems which play crucial roles in the proof of the Duality Theorems for the SOTPs (see [40, 41]) and which will be proved to be equivalent to the SOTPs by D. Trevisan’s result.

The following can be considered as versions of the SOTPs for a flow of marginals which satisfy (1.10).

Definition 5

(SOTPs for marginal flows)

  1. (1)

    For \(P_0\), \(P_{1}\in \mathcal{P}(\mathbb {R}^d)\),

    $$\begin{aligned} \mathrm{v}(P_0,P_{1}) :=\inf _{\begin{array}{c} {\scriptstyle b\in \mathbf{A} (\{Q_t\}_{0\le t\le 1}),} \\ {{\scriptstyle Q_t=P_t, t=0,1}} \end{array}} \int _0^1 dt \int _{\mathbb {R}^d}L(t,x;b(t,x))Q_t(dx). \end{aligned}$$
    (2.1)
  2. (2)

    For \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R}^d)\),

    $$\begin{aligned} \mathbf{v} (\{ P_t\}_{0\le t\le 1} ) :=\inf _{b\in \mathbf{A} (\{ P_t\}_{0\le t\le 1})} \int _0^1 dt\int _{\mathbb {R}^d} L(t,x;b(t,x))P_t(dx). \end{aligned}$$
    (2.2)

For \(\mu (dxdu)\in \mathcal{P} ( \mathbb {R}^d\times \mathbb {R}^d )\),

$$\begin{aligned} \mu _{1}(dx):=\mu (dx\times \mathbb {R}^d),\quad \mu _{2}(du):=\mu (\mathbb {R}^d\times du). \end{aligned}$$
(2.3)

We write \(\nu (dtdxdu)\in \tilde{\mathcal{A}}\) if the following holds. (i) \(\nu \in \mathcal{P} ([0,1]\times \mathbb {R}^d \times \mathbb {R}^d )\) and

$$\begin{aligned} \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d} (|a(t,x)|+|u|)\nu (dtdxdu)<\infty . \end{aligned}$$
(2.4)

(ii) \(\nu (dtdxdu)=dt\nu (t,dxdu)\), \(\nu (t,dxdu)\in \mathcal{P} ( \mathbb {R}^d\times \mathbb {R}^d )\), \(\nu _{1}(t, dx), \nu _{2}(t, du)\in \mathcal{P} ( \mathbb {R}^d)\), \(dt-\)a.e. and \(t\mapsto \nu _{1}(t, dx)\) has a weakly continuous version \(\nu _{1,t}(dx)\in \mathcal{P} ( \mathbb {R}^d)\) for which the following holds: for any \(t\in [0,1]\) and \(f\in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\),

$$\begin{aligned}&\int _{\mathbb {R}^d}f (t,x)\nu _{1,t}(dx)-\int _{\mathbb {R}^d}f(0,x)\nu _{1,0}(dx)\nonumber \\&\quad =\int _{[0,t]\times \mathbb {R}^d\times \mathbb {R}^d} \mathcal{L}_{s,x,u} f(s,x)\nu (dsdxdu). \end{aligned}$$
(2.5)

Here

$$\begin{aligned} \mathcal{L}_{s,x,u} f(s,x):=\partial _s f(s,x)+\frac{1}{2}\langle a(s,x),D_x^2 f(s,x)\rangle +\langle u,D_x f (s,x)\rangle . \end{aligned}$$
(2.6)

We introduce a relaxed version of the problem above (see [23] and references therein for related topics).

Definition 6

(SOTPs for marginal measures)

  1. (1)

    For \(P_0\), \(P_{1}\in \mathcal{P}(\mathbb {R}^d)\),

    $$\begin{aligned} \tilde{\mathrm{v}}(P_0,P_{1}) :=\inf _{\begin{array}{c} {\scriptstyle \nu \in \tilde{\mathcal{A}},} \\ {{\scriptstyle \nu _{1,t}=P_t, t=0,1}} \end{array}} \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d}L(t,x;u)\nu (dtdxdu). \end{aligned}$$
    (2.7)
  2. (2)

    For \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R}^d)\),

    $$\begin{aligned} \tilde{\mathbf{v}} (\{ P_t\}_{0\le t\le 1} ) :=\inf _{\begin{array}{c} {\scriptstyle \nu \in \tilde{\mathcal{A}},} \\ {{\scriptstyle \nu _{1,t}=P_t, 0\le t\le 1}} \end{array}} \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d}L(t,x;u)\nu (dtdxdu). \end{aligned}$$
    (2.8)

Remark 4

If \(b\in \mathbf{A}(\{ P_t\}_{0\le t\le 1})\) and \(X\in \mathcal{A}\), then \(dtP_t(dx)\delta _{b(t,x)}(du)\in \tilde{\mathcal{A}}\) and \(dtP^{(X(t),\beta _X(t,X))}(dxdu)\in \tilde{\mathcal{A}}\), respectively. Here \(\delta _x\) denotes the delta measure on \(\{x\}\). In particular, \(dtP^{(X(t),\beta _X(t,X))}(dxdu)\) is the distribution of a \([0,1]\times \mathbb {R}^d\times \mathbb {R}^d\)-valued random variable \((t,X(t),\beta _X(t,X))\). This is why we call (2.7)–(2.8) SOTPs for marginal measures (see also Lemma 1 given later). One can also identify \(\{ P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R}^d)\) with \(dtP_t(dx)\in \mathcal{P}([0,1]\times \mathbb {R}^d)\) when \(\mathbf{V}, \mathbf{v}\) and \(\tilde{\mathbf{v}}\) are considered (see Theorem 4 and also [41, 44]).

We introduce assumptions.

(A.0.0). (i) \(\sigma _{ij}\in C_b([0,1]\times \mathbb {R}^d)\), \(i,j=1,\ldots ,d\). (ii) \(\sigma (\cdot )=(\sigma _{ij}(\cdot ))_{i,j=1}^d\) is a nondegenerate \(d\times d\)-matrix function on \([0,1]\times \mathbb {R}^d\).

(A.1). (i) \(L\in C([0,1]\times \mathbb {R}^d \times \mathbb {R}^d;[0,\infty ))\). (ii) \(\mathbb {R}^d\ni u\mapsto L(t,x;u)\) is convex for \((t,x)\in [0,1]\times \mathbb {R}^d\).

(A.2).

$$\begin{aligned} \lim _{|u|\rightarrow \infty } \frac{\inf \{L(t,x;u)|(t,x)\in [0,1]\times \mathbb {R}^d\}}{|u|}=\infty . \end{aligned}$$

The following proposition gives the relations among, and the properties of, the three classes of the SOTPs stated in Definitions 3, 5, and 6 above. In particular, it implies that they are equivalent in our setting, which explains why they are all called the SOTPs. It also implies the convexity and the lower semicontinuity of \(V(P_0,P_{1})\) and \(\mathbf{V} (\{ P_t\}_{0\le t\le 1})\).

Proposition 1

  1. (i)

    Suppose that (A.1) holds. Then the following holds:

    $$\begin{aligned} V(P_0,P_{1})=\mathrm{v}(P_0,P_{1})=\tilde{\mathrm{v}}(P_0,P_{1}),\quad P_0, P_{1}\in \mathcal{P}(\mathbb {R}^d), \end{aligned}$$
    (2.9)
    $$\begin{aligned} \mathbf{V} (\{ P_t\}_{0\le t\le 1} )= \mathbf{v} (\{ P_t\}_{0\le t\le 1} )=\tilde{\mathbf{v}} (\{ P_t\}_{0\le t\le 1} ), \{ P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R}^d). \end{aligned}$$
    (2.10)
  2. (ii)

    Suppose, in addition, that (A.0.0,i) and (A.2) hold. Then there exist minimizers X of \(V (P_0,P_{1})\) and Y of \(\mathbf{V} (\{ P_t\}_{0\le t\le 1} )\) for which

    $$\begin{aligned} \beta _X (t,X)=b_X(t,X(t)),\quad \beta _Y (t,Y)=b_Y(t,Y(t)), \end{aligned}$$
    (2.11)

    provided \(V (P_0,P_{1})\) and \(\mathbf{V} (\{ P_t\}_{0\le t\le 1} )\) are finite, respectively (see (1.17) for notation).

  3. (iii)

    Suppose, in addition, that (A.0.0,ii) holds and that \(\mathbb {R}^d\ni u\mapsto L(t,x;u)\) is strictly convex for \((t,x)\in [0,1]\times \mathbb {R}^d\). Then for any minimizers X of \(V (P_0,P_{1})\) and Y of \(\mathbf{V} (\{ P_t\}_{0\le t\le 1} )\), (2.11) holds and \(b_X\) and \(b_Y\) in (2.11) are unique on the support of \(dtP^{X(t)}(dx)\) and \(dtP^{Y(t)}(dx)\), respectively.

Remark 5

Let \(c\in C(\mathbb {R}^d\times \mathbb {R}^d;[0,\infty ))\). For \(P_0, P_{1}\in \mathcal{P}(\mathbb {R}^d)\),

$$\begin{aligned}&T_M(P_0, P_{1}):=\inf \left\{ \int _{\mathbb {R}^d}c(x,\varphi (x))P_0(dx)\biggl |P_0\varphi ^{-1}=P_1\right\} \nonumber \\&\quad \ge T(P_0, P_{1}):= \inf \left\{ \int _{\mathbb {R}^d\times \mathbb {R}^d}c(x,y)\mu (dxdy)\biggl | \mu _i=P_{i-1}, i=1,2\right\} \end{aligned}$$
(2.12)

(see (2.3) for notation). \(T_M(P_0, P_{1})\) and \(T(P_0, P_{1})\) are called Monge’s and Monge–Kantorovich’s problems, respectively. The second equalities in (2.9)–(2.10) are similar to the relation between Monge’s and Monge–Kantorovich’s problems, since \(\tilde{\mathrm{v}}\) and \(\tilde{\mathbf{v}}\) are infima of linear functionals of measures (see, e.g. [51, 61]).
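A small numerical illustration (ours, not from the original references): for empirical measures with n distinct atoms of equal weight 1/n, the Kantorovich problem T is a linear program over the Birkhoff polytope whose optimum is attained at a permutation matrix; hence a Monge map exists and \(T_M(P_0,P_1)=T(P_0,P_1)\) in this case.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# Empirical measures with n = 8 atoms of weight 1/n each: the optimal
# Kantorovich coupling is a permutation (Birkhoff-von Neumann), so T = T_M.
rng = np.random.default_rng(1)
x = rng.normal(size=(8, 2))                    # atoms of P0
y = rng.normal(loc=1.0, size=(8, 2))           # atoms of P1
cost = ((x[:, None, :] - y[None, :, :]) ** 2).sum(axis=-1)  # c(x,y) = |x-y|^2
rows, cols = linear_sum_assignment(cost)       # Monge map x_i -> y_{cols[i]}
print(cost[rows, cols].mean())                 # T_M(P0,P1) = T(P0,P1)
```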

Before we prove Proposition 1, we state its application to the SOTPs.

For any \(s\ge 0\) and \(P\in \mathcal{P} (\mathbb {R}^d )\),

$$\begin{aligned} \mathbf{\Psi }_{P}(s):= \biggl \{ \nu \in \tilde{\mathcal{A}}\biggl |\nu _{1,0}=P, \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d}L(t,x;u)\nu (dtdxdu)\le s \biggl \}. \end{aligned}$$
(2.13)

Let \(\mathcal{P} (\mathbb {R}^d )\) be endowed with the weak topology. Then the following is known.

Lemma 1

(See [41]) Suppose that (A.0.0,i) and (A.1)–(A.2) hold. Then for any \(s\ge 0\) and compact set \(K\subset \mathcal{P} (\mathbb {R}^d )\), the set \(\cup _{P\in K}{} \mathbf{\Psi }_{P}(s)\) is compact in \(\mathcal{P} ([0,1]\times \mathbb {R}^d \times \mathbb {R}^d )\).

Lemma 1 was given in [41] to prove the Duality Theorems for \(\mathrm{v}(P_0,P_{1})\) and \(\mathbf{v}(\{ P_t\}_{0\le t\le 1})\). By Proposition 1, it can also be used in the proofs of the lower semicontinuity of \({V}(P_0,P_{1})\) and \(\mathbf{V}(\{ P_t\}_{0\le t\le 1})\). Besides, we no longer need the following assumption.

(A).

$$\begin{aligned} \varDelta L(\varepsilon _1,\varepsilon _2):=\sup \frac{L(t,x;u)-L(s,y;u)}{1+L(s,y;u)} \rightarrow 0\quad \hbox {as } \varepsilon _1, \varepsilon _2\downarrow 0, \end{aligned}$$
(2.14)

where the supremum is taken over all \((t,x)\) and \((s,y) \in [0,1]\times \mathbb {R}^d\) for which \(|t-s|<\varepsilon _1\), \(|x-y|<\varepsilon _2\), and over all \(u\in \mathbb {R}^d\).

This assumption can be used to prove the lower semicontinuity of the following (see [26], Chapter 9.1):

$$\begin{aligned} AC([0,1];\mathbb {R}^d)\ni f\mapsto \int _0^1 L\left( t,f(t);\frac{df(t)}{dt}\right) dt. \end{aligned}$$
(2.15)

We state additional assumptions and the improved versions of the Duality Theorems for \({V}(P_0,P_{1})\) and \(\mathbf{V}(\{ P_t\}_{0\le t\le 1})\).

(A.0). \(\sigma _{ij}\in C^{1}_b ([0,1]\times \mathbb {R}^d)\), \(i,j=1,\dots ,d\).

(A.3). (i) \(\partial _t L(t,x;u)\) and \(D_x L(t,x;u)\) are bounded on \([0,1]\times \mathbb {R}^d \times B_R\) for all \(R>0\), where \(B_R:=\{ x\in \mathbb {R}^d \,|\, |x|\le R\}\). (ii) \(C_L\) is finite, where

$$\begin{aligned} C_L:=\sup \left\{ \frac{L(t,x;u)}{1+L(t,y;u)}\biggr |0\le t\le 1, x, y, u \in \mathbb {R}^d\right\} . \end{aligned}$$
(2.16)
The Hamiltonian H is defined by

$$\begin{aligned} H(t,x;z):=\sup \{\langle z,u\rangle -L(t,x;u)|u\in \mathbb {R}^d\}. \end{aligned}$$
(2.17)

The following is a generalization of [41], in that we need neither the nondegeneracy of a nor assumption (A); it can be proved almost in the same way as in [41], by Proposition 1 and Lemma 1. Indeed, in our previous papers, using the nondegeneracy of a, we made use of the Cameron–Martin–Maruyama–Girsanov formula to prove the convexity of \(P\mapsto V(P_0,P)\), which we can now avoid by Proposition 1. The lower semicontinuity of \(P\mapsto V(P_0,P)\) can be proved by Proposition 1 and Lemma 1. In [59], a similar problem was considered via a general property of convex combinations of probability measures on an enlarged space, which allows one not to assume the nondegeneracy of a, though a condition similar to (A) was assumed.

One can also find details in [44] (see [24] for related topics). We refer readers to [15, 20, 29] for viscosity solutions.

Theorem 3

(Duality Theorem for V) Suppose that (A.0)–(A.3) hold. Then, for any \(P_0\), \(P_{1}\in \mathcal{P} (\mathbb {R}^d )\),

$$\begin{aligned} V(P_0,P_1) & = {} \mathrm{v}(P_0,P_{1})=\tilde{\mathrm{v}}(P_0,P_{1})\nonumber \\ & = {} \sup _{f\in C_b^\infty (\mathbb {R}^d)} \biggl \{\int _{\mathbb {R}^d}f(x)P_1 (dx)-\int _{\mathbb {R}^d}\varphi (0,x;f)P_0(dx)\biggr \}, \end{aligned}$$
(2.18)

where \(\varphi (t,x;f)\) denotes the minimal bounded continuous viscosity solution to the following HJB equation on \([0,1)\times \mathbb {R}^d\):

$$\begin{aligned} \partial _t\varphi (t,x)+ \frac{1}{2}\langle a(t,x), D_x^2 \varphi (t,x)\rangle +H(t,x;D_x \varphi (t,x)) & = {} 0,\nonumber \\ \varphi (1,\cdot ) & = {} f. \end{aligned}$$
(2.19)

We introduce the following condition in order to replace \(\varphi\) in (2.18) by classical solutions to the HJB equation (2.19).

(A.4). (i) “\(\sigma\) is the identity”, or “\(\sigma (\cdot )=(\sigma _{ij}(\cdot ))_{i,j=1}^d\) is uniformly nondegenerate, \(\sigma _{ij}\in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\), \(i,j=1,\ldots ,d\), and there exist functions \(L_1\) and \(L_2\) such that \(L=L_1 (t,x)+L_2 (t,u)\)”. (ii) \(L(t,x;u)\in C^1([0,1]\times \mathbb {R}^d \times \mathbb {R}^d;[0,\infty ))\) and is strictly convex in u. (iii) \(L\in C^{1,2,0}_b ([0,1]\times \mathbb {R}^d\times B_R )\) for any \(R>0\).

Since (A.4,i), (A.4,ii), and (A.4,iii) imply (A.0), (A.1), and (A.3,i), respectively, the following holds from Theorem 3, in the same way as in [41] (see also [44]).

Corollary 1

Suppose that (A.2), (A.3,ii), and (A.4) hold. Then (2.18) holds even if the supremum is taken over all classical solutions \(\varphi \in C_b^{1,2} ([0,1]\times \mathbb {R}^d)\) to the HJB equation (2.19). Besides, for any \(P_0, P_1\in \mathcal{P}(\mathbb {R}^d)\) for which \(V (P_0, P_1)\) is finite, a minimizer \(\{X(t)\}_{0\le t\le 1}\) of \(V (P_0, P_1)\) exists and the following holds: for any maximizing sequence \(\{\varphi _n\}_{n\ge 1}\) of (2.18),

$$\begin{aligned} 0 & = {} \lim _{n\rightarrow \infty } E\biggl [\int _0^1 |L(t,X (t);\beta _X(t,X)) \nonumber \\&-\{\langle \beta _X(t,X), D_x \varphi _n (t,X(t))\rangle -H(t,X (t);D_x\varphi _n(t,X(t)))\}|dt\biggr ]. \end{aligned}$$
(2.20)

In particular, there exists a subsequence \(\{n_k \}_{k\ge 1}\) for which

$$\begin{aligned} \beta _X(t,X) = \lim _{k\rightarrow \infty } D_z H(t,X(t);D_x \varphi _{n_k}(t,X(t))),\quad dtdP\hbox {-a.e.} \end{aligned}$$
(2.21)

The following is also a generalization of [41] and can be proved almost in the same way as in [41] by Proposition 1 and Lemma 1.

Theorem 4

(Duality Theorem for \(\mathbf{V}\)) Suppose that (A.0)-(A.3) hold. Then for any \(\mathbf{P}:=\{ P_t\}_{0\le t\le 1}\subset \mathcal{P} (\mathbb {R}^d )\),

$$\begin{aligned}&\mathbf{V} (\mathbf{P})=\mathbf{v} (\mathbf{P})=\tilde{\mathbf{v}} (\mathbf{P} )\nonumber \\&\quad =\sup _{f\in C_b^\infty ([0,1]\times \mathbb {R}^d)} \biggl \{\int _0^1\int _{\mathbb {R}^d}f(t,x)dtP_t (dx) -\int _{\mathbb {R}^d}\phi (0,x;f)P_0(dx)\biggr \}, \end{aligned}$$
(2.22)

where \(\phi (t,x;f)\) denotes the minimal bounded continuous viscosity solution of the following HJB equation on \([0,1)\times \mathbb {R}^d\):

$$\begin{aligned} \partial _t\phi (t,x)+\frac{1}{2}\langle a(t,x), D_x^2 \phi (t,x)\rangle +H(t,x;D_x \phi (t,x))+f(t,x) & = {} 0,\nonumber \\ \phi (1,x) & = {} 0. \end{aligned}$$
(2.23)

Suppose that (A.4) holds instead of (A.0), (A.1), and (A.3,i). Then (2.22) holds even if the supremum is taken over all classical solutions \(\phi \in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\) to the HJB equation (2.23). Besides, if \(\mathbf{V} (\mathbf{P})\) is finite, then a minimizer \(\{X(t)\}_{0\le t\le 1}\) of \(\mathbf{V} (\mathbf{P})\) exists and the following holds: for any maximizing sequence \(\{\phi _n\}_{n\ge 1}\) of (2.22),

$$\begin{aligned} 0 & = {} \lim _{n\rightarrow \infty } E\biggl [\int _0^1 |L(t,X (t);\beta _X(t,X)) \nonumber \\&-\{\langle \beta _X(t,X), D_x \phi _n (t,X(t))\rangle -H(t,X (t);D_x\phi _n(t,X(t)))\}|dt\biggr ]. \end{aligned}$$
(2.24)

In particular, there exists a subsequence \(\{n_k \}_{k\ge 1}\) for which

$$\begin{aligned} \beta _X(t,X) = \lim _{k\rightarrow \infty } D_z H(t,X(t);D_x \phi _{n_k}(t,X(t))),\quad dtdP\hbox {-a.e.} \end{aligned}$$
(2.25)

Remark 6

(See [41, 44]) (i) Suppose that (A.0)–(A.3) hold. Then for any \(f\in UC_b (\mathbb {R}^d)\), the following is the minimal bounded continuous viscosity solution of the HJB equation (2.19):

$$\begin{aligned} \varphi (t,x;f)= \sup _{X\in \mathcal{A}_t, X (t)=x}E\biggl [f(X(1))-\int _t^1 L(s, X(s);\beta _X(s,X))ds\biggl ], \end{aligned}$$
(2.26)

where \(\mathcal{A}_t\) denotes \(\mathcal{A}\) with the time interval [0, 1] replaced by [t, 1]. (ii) Suppose that (A.0)–(A.3) with L replaced by \(L(t,x;u)-f(t,x)\) hold. Then the following is the minimal bounded continuous viscosity solution of the HJB equation (2.23):

$$\begin{aligned} \phi (t,x;f)= \sup _{X\in \mathcal{A}_t,X (t)=x} E\biggl [\int _t^1 \left\{ f(s,X(s))-L(s, X(s);\beta _X(s,X))\right\} ds\biggl ]. \end{aligned}$$
(2.27)

We consider Schrödinger’s and Nelson’s problems, i.e., \(V_S\) and \(\mathbf{V}_N\). We introduce a new assumption.

(A.4)’. (1.19) holds, \(\sigma (\cdot )=(\sigma _{ij}(\cdot ))_{i,j=1}^d\) is uniformly nondegenerate, and \(a\in C^{1,2}_b ([0, 1] \times \mathbb {R}^d;M(d,\mathbb {R})), \xi \in C^{1,2}_b ([0, 1] \times \mathbb {R}^d;\mathbb {R}^d)\).

(A.4)’ implies (A.0)–(A.3). Besides, for \(f\in C^3_b (\mathbb {R}^d)\) and for \(f\in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\), respectively, the HJB equations (2.19) and (2.23) have unique classical solutions in \(C^{1,2}_b ([0,1]\times \mathbb {R}^d)\). They are also the minimal bounded continuous viscosity solutions of (2.19) and (2.23), respectively, since they have the same representation formulas as those given in Remark 6 (see, e.g. [20, 22] on classical solutions and Lemma 4.5 in [41] on viscosity solutions). In particular, the following holds even though (A.4)’ does not imply (A.4).

Corollary 2

Suppose that (A.4)’ holds. Then the assertions in Corollary 1 and Theorem 4 hold.

Remark 7

If (1.19) holds, then

$$\begin{aligned} L(t,x;u)- \{\langle u, z\rangle -H(t,x;z)\} =\frac{1}{2}|\sigma (t,x)^{-1}(a(t,x)z-u+\xi (t,x))|^2. \end{aligned}$$
(2.28)
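Indeed, for L as in (1.19), the Hamiltonian (2.17) can be computed explicitly: substituting \(u=\xi (t,x)+\sigma (t,x)v\),

$$\begin{aligned} H(t,x;z)=\sup _{v\in \mathbb {R}^d}\left\{ \langle z,\xi (t,x)\rangle +\langle \sigma (t,x)^*z,v\rangle -\frac{1}{2}|v|^2\right\} =\langle z,\xi (t,x)\rangle +\frac{1}{2}\langle a(t,x)z,z\rangle , \end{aligned}$$

with the supremum attained at \(v=\sigma (t,x)^*z\), i.e., at \(u=\xi (t,x)+a(t,x)z\); expanding the square on the right-hand side of (2.28) then verifies the identity directly.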

In the rest of this section, we prove Proposition 1.

Proof of Proposition 1

We prove (i). For \(\{X(t)\}_{0\le t\le 1}\in \mathcal{A}\), by Jensen’s inequality,

$$\begin{aligned} E\biggl [\int _0^1 L(t,X (t);\beta _X (t,X))dt \biggr ] \ge E\biggl [\int _0^1 L(t,X (t);b_X(t,X(t)))dt \biggr ]. \end{aligned}$$
(2.29)

Theorem 1 implies the first equalities of (2.9)–(2.10) (see Remark 3, (ii)).

For \(\nu \in \tilde{\mathcal{A}}\),

$$\begin{aligned} b_\nu (t,x):=\int _{\mathbb {R}^d}u\nu (t,x,du), \end{aligned}$$
(2.30)

where \(\nu (t,x,du)\) denotes a regular conditional probability of \(\nu\) given \((t,x)\). Then by Jensen’s inequality,

$$\begin{aligned} \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d}L(t,x;u)\nu (dtdxdu) \ge \int _0^1 dt\int _{\mathbb {R}^d}L(t,x;b_\nu (t,x))\nu _{1,t} (dx). \end{aligned}$$
(2.31)

\(b_\nu \in \mathbf{A} (\{\nu _{1,t}\}_{0\le t\le 1})\) from (2.5), since by Jensen’s inequality,

$$\begin{aligned} \int _{[0,1]\times \mathbb {R}^d}|b_\nu (t,x)|dt\nu _{1,t}(dx) \le \int _{[0,1]\times \mathbb {R}^d\times \mathbb {R}^d}|u|\nu (dtdxdu)<\infty , \end{aligned}$$

and for any \(t\in [0,1]\) and \(f\in C^{1,2}_b ([0,1]\times \mathbb {R}^d)\),

$$\begin{aligned}&\int _{[0,t]\times \mathbb {R}^d\times \mathbb {R}^d} \langle u, D_x f(s,x)\rangle \nu (dsdxdu)\nonumber \\&\quad =\int _0^t ds \int _{\mathbb {R}^d} \langle b_\nu (s,x), D_x f(s,x)\rangle \nu _{1,s} (dx). \end{aligned}$$
(2.32)

This implies the second equalities of (2.9)–(2.10) (see Remark 4).

The proof of (ii) follows from Lemma 1, (2.32), and Theorem 1.

We prove (iii). From (2.29) and the strict convexity of \(u\mapsto L(t,x;u)\), (2.11) holds. For \(b\in \mathbf{A} (\{P_t\}_{0\le t\le 1})\), \(P_t(dx)\ll dx\), dt-a.e. from (A.0.0,ii), since \(a, b\in L^1([0,1]\times \mathbb {R}^d,dtP_t(dx))\) (see [4], p. 1042, Corollary 2.2.2). For \(\{p_{i}(t,x)dx\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R}^d)\), \(b_i\in \mathbf{A} (\{p_{i}(t,x)dx\}_{0\le t\le 1})\), \(i=0,1\), and \(\lambda \in [0,1]\), set

$$\begin{aligned} p_\lambda :=(1-\lambda ) p_0+\lambda p_1 ,\quad b_\lambda :=1_{(0,\infty )}(p_\lambda )\frac{(1-\lambda )p_0b_0+\lambda p_{1}b_1}{p_\lambda }, \end{aligned}$$
(2.33)

where \(1_A(x)\) denotes the indicator function of \(A\subset \mathbb {R}\). Then \(b_\lambda \in \mathbf{A}(\{p_\lambda (t,x)dx\}_{0\le t\le 1})\) and, by Jensen’s inequality applied with the weights \((1-\lambda )p_0/p_\lambda\) and \(\lambda p_1/p_\lambda\),

$$\begin{aligned}&\int _0^1 dt \int _{\mathbf{R }^d}L(t,x;b_\lambda (t,x))p_\lambda (t,x)dx\nonumber \\&\quad \le (1-\lambda )\int _0^1 dt \int _{\mathbf{R }^d}L(t,x;b_0 (t,x))p_0(t,x)dx\nonumber \\&\qquad +\lambda \int _0^1 dt \int _{\mathbf{R }^d}L(t,x;b_1 (t,x))p_1 (t,x)dx. \end{aligned}$$
(2.34)

Here the equality holds if and only if \(b_0=b_1\) dtdx-a.e. on the set \(\{(t,x)\in [0,1]\times \mathbb {R}^d| p_0 (t,x)p_1 (t,x)>0\}\). \(\square\)

3 Stochastic optimal transport with a nonconvex cost

In this section, in the case where \(d=1\) and where a is not fixed, we consider slightly relaxed versions of the SOTPs whose cost functions are not assumed to be convex. The test of the Markov property of a minimizer of a stochastic optimal control problem is known as a fundamental problem in stochastic optimal control theory. We also consider the Markov property of the minimizer of a finite-time horizon stochastic optimal control problem. Our previous result [35] proved this in a one-dimensional case via the optimal transportation problem with a concave cost. We generalize it by Theorem 2 in Sect. 1.

Since a is not fixed in this section, we consider a new class of semimartingales.

Let \(u=\{u(t)\}_{0\le t\le 1}\) and \(\{W(t)\}_{0\le t\le 1}\) be a progressively measurable real-valued process and a one-dimensional Brownian motion on the same complete filtered probability space, respectively. The probability space under consideration is not fixed in this section. Let \(\sigma :[0,1]\times \mathbb {R}\longrightarrow \mathbb {R}\) be a Borel measurable function. Let \(Y^{u,\sigma }=\{Y^{u,\sigma }(t)\}_{0\le t\le 1}\) be a continuous semimartingale such that the following holds weakly:

$$\begin{aligned} Y^{u,\sigma }(t)=Y^{u,\sigma }(0)+\int _0^t u (s)ds +\int _0^t \sigma (s,Y^{u,\sigma }(s))dW(s), \quad 0\le t\le 1, \end{aligned}$$
(3.1)

provided it exists.

For \(r> 0\),

$$\begin{aligned} \mathcal{U}_{r}: & = {} \left\{ (u,\sigma ) \biggl | E\left[ \int _0^1 \left( \frac{\sigma (t,Y^{u,\sigma }(t))^2}{1+|Y^{u,\sigma }(t)|^2}+|u(t)|\right) dt\right] <\infty ,|\sigma |\ge r\right\} , \end{aligned}$$
(3.2)
$$\begin{aligned} \mathcal{U}_{r,Mar}: & = {} \left\{ (u,\sigma ) \in \mathcal{U}_{r}| u(\cdot )=b_{Y^{u,\sigma }}(\cdot ,Y^{u,\sigma }(\cdot ))\right\} , \end{aligned}$$
(3.3)

where \(b_{Y^{u,\sigma }}(t,Y^{u,\sigma }(t)):=E[u(t)|(t,Y^{u,\sigma }(t))]\). For \((u,\sigma )\in \mathcal{U}_{r}\),

$$\begin{aligned} F^{Y^{u, \sigma }}_t (x): & = {} P(Y^{u, \sigma }(t)\le x), \end{aligned}$$
(3.4)
$$\begin{aligned} G^u_t (x): & = {} P(u(t)\le x), \end{aligned}$$
(3.5)
$$\begin{aligned} \tilde{b}_{u, Y^{u, \sigma }}(t,x): & = {} (G^u_t)^{-1} (1-F^{Y^{u, \sigma }}_t (x)),\quad (t,x)\in [0,1]\times \mathbb {R}. \end{aligned}$$
(3.6)

Here for a distribution function F on \(\mathbb {R}\),

$$\begin{aligned} F^{-1}(v):=\inf \{x\in \mathbb {R}| F(x)\ge v\},\quad 0<v<1. \end{aligned}$$

\(F^{-1}\) is called the quasi-inverse of F (see, e.g. [48, 51, 57]).
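For illustration, a sketch of (3.6) from samples (our own, in an assumed setting): the quasi-inverse turns (3.6) into an explicit recipe that assigns the largest values of the drift to the smallest values of the state (an anticomonotone rearrangement).

```python
import numpy as np

def quasi_inverse(samples):
    """Empirical quasi-inverse F^{-1}(v) = inf{x : F(x) >= v}, 0 < v <= 1."""
    s = np.sort(samples)
    n = len(s)
    def F_inv(v):
        idx = np.clip(np.ceil(np.asarray(v) * n).astype(int) - 1, 0, n - 1)
        return s[idx]
    return F_inv

rng = np.random.default_rng(2)
y = rng.normal(size=1000)           # samples of Y^{u,sigma}(t)
u = rng.exponential(size=1000)      # samples of u(t)
F_Y = lambda x: (y[:, None] <= np.asarray(x)[None, :]).mean(axis=0)
b_tilde = lambda x: quasi_inverse(u)(1.0 - F_Y(x))   # (3.6)
print(b_tilde(np.array([-2.0, 0.0, 2.0])))           # nonincreasing in x
```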

The density

$$\begin{aligned} p^{Y^{u, \sigma }}(t,x):=\frac{P^{Y^{u,\sigma }(t)}(dx)}{dx} \end{aligned}$$
(3.7)

exists dt-a.e. since r is positive and \((\sigma ^2, b_{Y^{u,\sigma }})\in \mathbf{A}_0(\{P^{Y^{u,\sigma }(t)}\}_{0\le t\le 1})\) (see [4], p. 1042, Corollary 2.2.2). Indeed, by Jensen’s inequality,

$$\begin{aligned} \int _{\mathbb {R}} |b_{Y^{u, \sigma }}(t,y)|p^{Y^{u, \sigma }}(t,y)dy =E[|E[u(t)|(t,Y^{u, \sigma }(t))]|]\le E[|u(t)|]. \end{aligned}$$
(3.8)

From the idea of covariance kernels (see [6, 7, 34, 39]), define

$$\begin{aligned}&\tilde{a}_{u,Y^{u, \sigma }}(t,x)\nonumber \\&\quad :=1_{(0,\infty )} (p^{Y^{u, \sigma }}(t,x)) \frac{2\int _{-\infty }^x (\tilde{b}_{u,Y^{u, \sigma }}(t,y) -b_{Y^{u, \sigma }}(t,y))p^{Y^{u, \sigma }}(t,y)dy}{p^{Y^{u, \sigma }}(t,x)}. \end{aligned}$$
(3.9)

The following holds and will be proved later.

Theorem 5

Let \(r> 0\). For \((u,\sigma )\in \mathcal{U}_{r}\), there exists \(\tilde{u}\) such that \((\tilde{u},\tilde{\sigma }:=(\sigma ^2+\tilde{a}_{u,Y^{u,\sigma }})^{\frac{1}{2}})\in \mathcal{U}_{r,Mar}\) and that the following holds:

$$\begin{aligned} P^{Y^{\tilde{u},\tilde{\sigma }}(t)} & = {} P^{Y^{u,\sigma }(t)}, \quad t\in [0,1], \end{aligned}$$
(3.10)
$$\begin{aligned} b_{Y^{\tilde{u},\tilde{\sigma }}} & = {} \tilde{b}_{u,Y^{u,\sigma }}, \end{aligned}$$
(3.11)
$$\begin{aligned} P^{b_{Y^{\tilde{u},\tilde{\sigma }}}(t,Y^{\tilde{u},\tilde{\sigma }}(t))} & = {} P^{u(t)},\quad dt\mathrm{-a.e.} \end{aligned}$$
(3.12)

For \(r> 0\) and \(\{P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R})\),

$$\begin{aligned} \mathbf{A}_{0,r}(\{P_t\}_{0\le t\le 1}): & = {} \biggl \{ (a,b) \in \mathbf{A}_{0}(\{P_t\}_{0\le t\le 1})\biggl | a\ge r^2,\nonumber \\&\int _0^1 dt\int _{\mathbb {R}}\left( \frac{a(t,x)}{1+|x|^2}+|b(t,x)|\biggl )P_t(dx)<\infty \right\} . \end{aligned}$$
(3.13)

Let \(L_1, L_2:[0,1]\times \mathbb {R}\longrightarrow [0,\infty )\) be Borel measurable.

For \((u,\sigma )\in \mathcal{U}_{r}\),

$$\begin{aligned} J(u,\sigma ):=E\biggl [\int _0^1 (L_1 (t,Y^{u,\sigma } (t))+L_2 (t,u(t)))dt\biggl ]. \end{aligned}$$
(3.14)

For \((a,b)\in \mathbf{A}_{0}(\{P_t\}_{0\le t\le 1})\),

$$\begin{aligned} I(\{P_t\}_{0\le t\le 1},a,b):=\int _0^1 dt\int _{\mathbb {R}}(L_1 (t,x)+L_2(t,b(t,x)))P_t(dx). \end{aligned}$$
(3.15)

One easily obtains the following from Theorems 2 and 5.

Corollary 3

Suppose that \(L_1, L_2:[0,1]\times \mathbb {R}\longrightarrow [0,\infty )\) are Borel measurable.

Then for any \(r> 0\), the following holds. (i) For any \(P_0,P_1\in \mathcal{P}(\mathbb {R})\),

$$\begin{aligned}&\inf \{J(u,\sigma )| (u,\sigma )\in \mathcal{U}_{r}, P^{Y^{u,\sigma }(t)}=P_t, t=0,1\}\nonumber \\&\quad =\inf \{J(u,\sigma )|(u,\sigma )\in \mathcal{U}_{r, Mar}, P^{Y^{u,\sigma }(t)}=P_t, t=0,1\}\nonumber \\&\quad =\inf \{I(\{Q_t\}_{0\le t\le 1},a,b)| (a,b)\in \mathbf{A}_{0,r}(\{Q_t\}_{0\le t\le 1}), Q_t=P_t, t=0,1\}. \end{aligned}$$
(3.16)

In particular, if there exists a minimizer in (3.16), then there exists a minimizer \((u,\sigma )\in \mathcal{U}_{r, Mar}\). (ii) For any \(\{P_t\}_{0\le t\le 1}\subset \mathcal{P}(\mathbb {R})\),

$$\begin{aligned}&\inf \{J(u,\sigma )| (u,\sigma )\in \mathcal{U}_{r}, P^{Y^{u,\sigma }(t)}=P_t, 0\le t\le 1\}\nonumber \\&\quad =\inf \{J(u,\sigma )|(u,\sigma )\in \mathcal{U}_{r, Mar}, P^{Y^{u,\sigma }(t)}=P_t, 0\le t\le 1\} \nonumber \\&\quad =\inf \{I(\{P_t\}_{0\le t\le 1},a,b)| (a,b)\in \mathbf{A}_{0,r}(\{P_t\}_{0\le t\le 1})\}. \end{aligned}$$
(3.17)

In particular, if there exists a minimizer in (3.17), then there exists a minimizer \((u,\sigma )\in \mathcal{U}_{r,Mar}\).

Suppose that \(L:[0,1]\times \mathbb {R}\times \mathbb {R}\longrightarrow [0,\infty ), \Psi :\mathbb {R}\longrightarrow [0,\infty )\) are Borel measurable. Then for any \(P_0\in \mathcal{P}(\mathbb {R})\),

$$\begin{aligned}&\inf _{\begin{array}{c} {\scriptstyle (u,\sigma )\in \mathcal{U}_{r},} \\ {{\scriptstyle P^{Y^{u,\sigma }(0)}=P_0}} \end{array}} E\left[ \int _0^1 L(t,Y^{u, \sigma }(t);u(t))dt+\Psi (Y^{u, \sigma } (1))\right] \nonumber \\&\quad =\inf _{P\in \mathcal{P}(\mathbb {R})}\left\{ V_r(P_0,P)+\int _{\mathbb {R}} \Psi (x) P(dx)\right\} , \end{aligned}$$
(3.18)

where \(V_r\) denotes V with \(\mathcal{A}\) replaced by \(\{Y^{u, \sigma }|(u, \sigma )\in \mathcal{U}_{r}\}\).

In particular, we easily obtain the following from Corollary 3.

Corollary 4

In addition to the assumption of Corollary 3, suppose that \(\Psi :\mathbb {R}\longrightarrow [0,\infty )\) is Borel measurable. Then for any \(r> 0\) and \(P_0\in \mathcal{P}(\mathbb {R})\),

$$\begin{aligned}&\inf \{J(u,\sigma )+E[\Psi (Y^{u,\sigma }(1))]| (u,\sigma )\in \mathcal{U}_{r}, P^{Y^{u,\sigma }(0)}=P_0\}\nonumber \\&\quad =\inf \{J(u,\sigma )+E[\Psi (Y^{u,\sigma }(1))]|(u,\sigma )\in \mathcal{U}_{r, Mar}, P^{Y^{u,\sigma }(0)}=P_0\}. \end{aligned}$$
(3.19)

In particular, if there exists a minimizer in (3.19), then there exists a minimizer \((u,\sigma )\in \mathcal{U}_{r, Mar}\).

We prove Theorem 5 using Theorem 2.

Proof of Theorem 5

For \((u,\sigma )\in \mathcal{U}_{r}\), the following holds (see [35]):

$$\begin{aligned} \tilde{a}_{u,Y^{u, \sigma }}(t,\cdot )\ge 0, \quad P^{\tilde{b}_{u,Y^{u, \sigma }}(t,Y^{u, \sigma } (t))}=P^{u(t)},\quad dt\mathrm{-a.e.} \end{aligned}$$
(3.20)

Indeed,

$$\begin{aligned} \int _{-\infty }^x \tilde{b}_{u,Y^{u, \sigma }}(t,y)p^{Y^{u, \sigma }}(t,y)dy & = {} E[(G^u_t)^{-1} (1-F^{Y^{u, \sigma }}_t (Y^{u, \sigma } (t)));Y^{u, \sigma } (t)\le x],\\ \int _{-\infty }^{x} b_{Y^{u, \sigma }}(t,y)p^{Y^{u, \sigma }}(t,y)dy & = {} E[E[u(t)|(t,Y^{u, \sigma }(t))];Y^{u, \sigma } (t)\le x]\\ & = {} E[u(t);Y^{u, \sigma }(t)\le x]. \end{aligned}$$

For an \(\mathbb {R}^2\)-valued random variable \(Z=(X,Y)\) on a probability space,

$$\begin{aligned} E[Y;X\le x] & = {} \int _0^\infty \{F_X(x)-F_Z(x,y)\}dy-\int _{-\infty }^0 F_Z(x,y)dy,\\ F_Z(x,y)\ge & {} \max (F_X(x)+F_Y(y)-1,0)\\ & = {} P(F_X^{-1} (U)\le x,F_Y^{-1} (1-U)\le y), \end{aligned}$$

where \(F_X\) denotes the distribution function of X and U is a uniformly distributed random variable on [0, 1]. The distribution functions of \(F_X^{-1} (U)\) and \(F_Y^{-1} (1-U)\) are \(F_X\) and \(F_Y\), respectively. From (3.7), \(F^{Y^{u, \sigma }}_t (Y^{u, \sigma }(t))\) is uniformly distributed on [0, 1] and \((F^{Y^{u, \sigma }}_t)^{-1} (F^{Y^{u, \sigma }}_t (Y^{u, \sigma } (t)))=Y^{u, \sigma }(t)\), P-a.s., dt-a.e. (see [17] or, e.g. [48, 51, 57]). Applying the above with \(X=Y^{u,\sigma }(t)\), \(Y=u(t)\), and \(U=F^{Y^{u, \sigma }}_t (Y^{u, \sigma }(t))\), we obtain \(E[u(t);Y^{u,\sigma }(t)\le x]\le E[\tilde{b}_{u, Y^{u,\sigma }}(t,Y^{u,\sigma }(t));Y^{u,\sigma }(t)\le x]\), so that \(\int _{-\infty }^x (\tilde{b}_{u,Y^{u, \sigma }}(t,y)-b_{Y^{u, \sigma }}(t,y))p^{Y^{u, \sigma }}(t,y)dy\ge 0\), which proves the first assertion in (3.20). The second follows since \(\tilde{b}_{u, Y^{u,\sigma }}(t,Y^{u,\sigma }(t))=(G^u_t)^{-1} (1-U)\) has the distribution function \(G^u_t\).

It is easy to see that the following holds from (3.8) and (3.20):

$$\begin{aligned} (\sigma ^2+\tilde{a}_{u,Y^{u,\sigma }}, \tilde{b}_{u,Y^{u,\sigma }})\in \mathbf{A}_0(\{P^{Y^{u,\sigma }(t)}\}_{0\le t\le 1}). \end{aligned}$$

Indeed, from (3.20), the following holds:

$$\begin{aligned} E\left[ \int _0^1|\tilde{b}_{u,Y^{u,\sigma }}(t,Y^{u,\sigma } (t))|dt\right] =E\left[ \int _0^1|u(t)|dt\right] <\infty . \end{aligned}$$
(3.21)

The following will be proved below:

$$\begin{aligned} E\left[ \int _0^1 \frac{\tilde{a}_{u,Y^{u,\sigma }}(t,Y^{u,\sigma }(t))}{1+|Y^{u,\sigma }(t)|^2}dt\right] <\infty . \end{aligned}$$
(3.22)

(3.21)–(3.22) and Theorem 2 complete the proof. We prove (3.22).

$$\begin{aligned}&E\left[ \int _0^1 \frac{\tilde{a}_{u,Y^{u,\sigma }}(t,Y^{u,\sigma }(t))}{2(1+|Y^{u,\sigma }(t)|^2)}dt\right] \nonumber \\&\quad =\int _0^1dt\int _{\mathbb {R}}\frac{1}{1+x^2}dx\int _{-\infty }^x (\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy. \end{aligned}$$
(3.23)

From (3.20),

$$\begin{aligned} \int _{-\infty }^\infty \tilde{b}_{u,Y^{u,\sigma }}(t,y)p^{Y^{u,\sigma }}(t,y)dy & = {} E[u(t)]=E[E[u(t)|(t,Y^{u,\sigma }(t))]]\nonumber \\ & = {} \int _{-\infty }^\infty b_{Y^{u,\sigma }}(t,y)p^{Y^{u,\sigma }}(t,y)dy,\quad dt\hbox {-a.e.} \end{aligned}$$
(3.24)

In particular, the following holds dt-a.e.:

$$\begin{aligned}&\int _{\mathbb {R}}\frac{1}{1+x^2}dx\int _{-\infty }^x (\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy\nonumber \\&\quad =\int _{-\infty }^0\frac{1}{1+x^2}dx\int _{-\infty }^x (\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy\nonumber \\&\qquad -\int _0^{\infty }\frac{1}{1+x^2}dx\int _x^{\infty } (\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy\nonumber \\&\quad =\int _{-\infty }^0(\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy\int _{y}^0\frac{1}{1+x^2}dx\nonumber \\&\qquad -\int _0^{\infty }(\tilde{b}_{u,Y^{u,\sigma }}(t,y) -b_{Y^{u,\sigma }}(t,y))p^{Y^{u,\sigma }}(t,y)dy\int _0^y\frac{1}{1+x^2}dx\nonumber \\&\quad \le \int _{\mathbb {R}}|\arctan y|( |\tilde{b}_{u,Y^{u,\sigma }}(t,y)|+|b_{Y^{u,\sigma }}(t,y)|)p^{Y^{u,\sigma }}(t,y)dy. \end{aligned}$$
(3.25)

Since \(|\arctan y|\) is bounded, (3.8) and (3.21) complete the proof of (3.22).

\(\square\)

4 Semiconcavity and continuity of Schrödinger’s Problem

Proposition 1 and Lemma 1 imply that \(\mathcal{P} (\mathbb {R}^d \times \mathbb {R}^d )\ni P\times Q\mapsto V(P,Q)\) is convex and lower semicontinuous.

In this section, we give a sufficient condition under which, for a fixed \(Q\in \mathcal{P} (\mathbb {R}^d )\), \(L^2 (\varOmega , P;\mathbb {R}^d)\ni X\mapsto V_S(P^X,Q)\) is semiconcave and continuous (see (1.19) for notation). More precisely, we show that there exists \(C>0\) such that for a fixed \(Q\in \mathcal{P} (\mathbb {R}^d )\),

$$\begin{aligned} L^2 (\varOmega , P;\mathbb {R}^d)\ni X\mapsto V_S(P^X,Q)-CE[|X|^2] \end{aligned}$$

is concave and continuous. Here \(L^2 (\varOmega , P;\mathbb {R}^d)\) denotes the space of all square-integrable functions from a probability space \((\varOmega , \mathcal{F}, P)\) to \((\mathbb {R}^d,\mathbf{B}(\mathbb {R}^d))\). Let \(W_2\) denote the Wasserstein distance of order 2, i.e. \(T^{1/2}\) with \(c=|y-x|^2\) in Remark 5. We also show the Lipschitz continuity of \(\mathcal{P}_{2} (\mathbb {R}^d )\ni P\mapsto V_S(P,Q)\) in \(W_2\) (see (4.4) for notation).

We first describe the assumptions in this section.

(A.5) \(\sigma (t,x)=(\sigma ^{ij}(t,x))_{i,j=1}^d\), \((t,x)\in [0,1]\times \mathbb {R}^d\), is a \(d\times d\)-matrix. \(a(t,x):=\sigma (t,x)\sigma (t,x)^*\), \((t,x)\in [0,1]\times \mathbb {R}^d\), is uniformly nondegenerate, bounded, once continuously differentiable, and uniformly Hölder continuous. \(D_x a(t,x)\) is bounded and the first derivatives of \(a(t,x)\) are uniformly Hölder continuous in x uniformly in \(t\in [0,1]\).

(A.6) \(\xi (t,x):[0,1]\times \mathbb {R}^d\longrightarrow \mathbb {R}^d\) is bounded, continuous, and uniformly Hölder continuous in x uniformly in \(t\in [0,1]\).

Remark 8

(A.5)–(A.6) imply (A.0.0), (A.1), and (A.2) for (1.19). (A.4)’ implies (A.0)–(A.3) and (A.5)–(A.6).

We state the following fact.

Theorem 6

Suppose that (A.5)–(A.6) hold. Then for any \(P_0\in \mathcal{P}(\mathbb {R}^d)\), the following SDE has a unique weak solution with a positive continuous transition probability density \(p(t,x;s,y)\), \(0\le t<s\le 1\), \(x,y\in \mathbb {R}^d\):

$$\begin{aligned} d\mathbf{X}(t) & = {} \xi (t,\mathbf{X}(t))dt+\sigma (t,\mathbf{X}(t))dW_\mathbf{X}(t),\quad 0< t<1,\nonumber \\ P^{\mathbf{X} (0)} & = {} P_0 \end{aligned}$$
(4.1)

(see [28]). Besides, there exist constants \(C_1, C_2>0\) such that

$$\begin{aligned} -C_1+C_2^{-1}|x-y|^2\le -\log p(0,x;1,y)\le C_1+C_2|x-y|^2,\quad x,y\in \mathbb {R}^d \end{aligned}$$
(4.2)

(see [3, 22]).
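For instance (an elementary example), if \(\sigma =I_d\) and \(\xi =0\), then \(p(0,x;1,y)=(2\pi )^{-d/2}e^{-|x-y|^2/2}\), so that

$$\begin{aligned} -\log p(0,x;1,y)=\frac{d}{2}\log (2\pi )+\frac{1}{2}|x-y|^2, \end{aligned}$$

and (4.2) holds with \(C_1=(d/2)\log (2\pi )\) and \(C_2=2\).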

Remark 9

If \(V_S(P,Q)\) is finite, then the distribution of the minimizer X of \(V_S(P,Q)\) is absolutely continuous with respect to \(P^\mathbf{X}\). In particular, \(Q(dx)\ll dx\) under (A.5)–(A.6). Indeed, \(V_S(P,Q)\) is the relative entropy of \(P^{X}\) with respect to \(P^\mathbf{X}\) and \(P^{\mathbf{X}(1)}\) has a density (see the discussion below Remark 3).

We recall the definition of displacement convexity.

Definition 7

(Displacement convexity (see [32])) Let \(G:\mathcal{P}(\mathbb {R}^d )\longrightarrow \mathbb {R}\cup \{\infty \}\). G is displacement convex if, for any \(\rho _0, \rho _1\in \mathcal{P}(\mathbb {R}^d )\) and any convex function \(\varphi :\mathbb {R}^d\longrightarrow \mathbb {R}\cup \{\infty \}\), the following is convex:

$$\begin{aligned}{}[0,1]\ni t\mapsto G(\rho _t), \end{aligned}$$
(4.3)

where \(\rho _t:= \rho _0(id+t(D\varphi -id))^{-1}\), \(0< t<1\), provided \(\rho _1= \rho _0(D\varphi )^{-1}\) and \(\rho _t\) can be defined. Here id denotes the identity mapping.
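A minimal example (ours): for \(\varphi (x)=\frac{1}{2}|x|^2+\langle c,x\rangle\) with \(c\in \mathbb {R}^d\), one has \(D\varphi (x)=x+c\), so that

$$\begin{aligned} \rho _t(A)=\rho _0(A-tc),\quad A\in \mathbf{B}(\mathbb {R}^d),\ 0\le t\le 1, \end{aligned}$$

i.e., \(\rho _t\) is \(\rho _0\) translated by tc, and displacement convexity of G along this curve means convexity of \(t\mapsto G(\rho _0(\cdot -tc))\).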

Recall that a convex function is differentiable dx-a.e. in the interior of its domain (see, e.g. [61]) and that \(\rho _t\) in (4.3) is well defined if \(\rho _0\in \mathcal{P}_{2,ac} (\mathbb {R}^d )\) and \(\rho _1\in \mathcal{P}_{2} (\mathbb {R}^d )\) (see, e.g. [61]). Here

$$\begin{aligned} \mathcal{P}_2 (\mathbb {R}^d ): & = {} \left\{ P\in \mathcal{P} (\mathbb {R}^d ) \biggl |\int _{\mathbb {R}^d}|x|^2P(dx)<\infty \right\} , \end{aligned}$$
(4.4)
$$\begin{aligned} \mathcal{P}_{ac} (\mathbb {R}^d ): & = {} \{p (x)dx\in \mathcal{P} (\mathbb {R}^d )\}, \end{aligned}$$
(4.5)
$$\begin{aligned} \mathcal{P}_{2,ac} (\mathbb {R}^d ): & = {} \mathcal{P}_{2} (\mathbb {R}^d )\cap \mathcal{P}_{ac} (\mathbb {R}^d ). \end{aligned}$$
(4.6)

The following implies that \(L^2 (\varOmega , P;\mathbb {R}^d)\ni X\mapsto V_S(P^X,Q)\) is semiconcave for a fixed \(Q\in \mathcal{P}_{ac} (\mathbb {R}^d )\) and will be proved later.

Theorem 7

Suppose that (A.4)’ holds and that there exists a constant \(C>0\) such that \(x\mapsto \log p(0,x;1,y) +C|x|^2\) is convex for any \(y\in \mathbb {R}^d\). Then for any \(Q\in \mathcal{P}_{ac} (\mathbb {R}^d )\), \(X_i\in L^2 (\varOmega , P; \mathbb {R}^d), i=1,2\), and \(\lambda _1\in (0,1)\),

$$\begin{aligned} \sum _{i=1}^2\lambda _i V_S(P^{X_i},Q)-\lambda _1\lambda _2 CE[|X_1-X_2|^2] \le V_S(P^{\sum _{i=1}^2\lambda _i X_i},Q), \end{aligned}$$
(4.7)

where \(\lambda _2:=1-\lambda _1\). Equivalently, the following is convex:

$$\begin{aligned} L^2 (\varOmega , P;\mathbb {R}^d)\ni X\mapsto -V_S(P^X,Q)+CE[|X|^2]. \end{aligned}$$
(4.8)

In particular, the following is displacement convex:

$$\begin{aligned} \mathcal{P}_{2,ac} (\mathbb {R}^d )\ni P\mapsto -V_S(P,Q)+C\int _{\mathbb {R}^d}|x|^2P(dx). \end{aligned}$$
(4.9)

Remark 10

Suppose that \(a_{ij}=a_{ij} (x), \xi _i=\xi _i(x)\in C_b^\infty (\mathbb {R}^d)\) and that a(x) is uniformly nondegenerate. Then \(D_x^2\log p(0,x;1,y)\) is bounded (see [58], Theorem B). In particular, there exists a constant \(C>0\) such that for any \(y\in \mathbb {R}^d\), \(x\mapsto \log p(0,x;1,y) +C|x|^2\) is convex.

For \(P\in \mathcal{P}(\mathbb {R}^d )\),

$$\begin{aligned} \mathcal{S}(P):= {\left\{ \begin{array}{ll} \displaystyle \int _{\mathbb {R}^d}p(x)\log p(x) dx,&{}P(dx)=p(x)dx,\\ \displaystyle \infty ,&{}otherwise. \end{array}\right. } \end{aligned}$$
(4.10)
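For example (a direct computation), if \(Q=N(0,\sigma ^2I_d)\), the centered Gaussian with covariance \(\sigma ^2I_d\), \(\sigma >0\), then

$$\begin{aligned} \mathcal{S}(Q)=-\frac{d}{2}\left( \log (2\pi \sigma ^2)+1\right) , \end{aligned}$$

so \(\mathcal{S}(Q)\) is finite for every nondegenerate Gaussian Q.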

Let \(\mu (P,Q)\) denote the joint distribution at \(t=0, 1\) of the minimizer of \(V_S(P,Q)\), provided \(V_S(P,Q)\) is finite. The following implies that \(L^2 (\varOmega , P;\mathbb {R}^d)\ni X\mapsto V_S(P^X,Q)\) is continuous for a fixed \(Q\in \mathcal{P}_{ac} (\mathbb {R}^d )\) such that \(\mathcal{S}(Q)\) is finite.

The lower semicontinuity of \(\mathcal{P}(\mathbb {R}^d )\ni P\mapsto V_S(P,Q)\) is known and can be proved, e.g., from Proposition 1 and Lemma 1. That of (4.12) can be proved in the same way as Lemma 3.4 in [43]. We give the proof for the sake of completeness.

Theorem 8

Suppose that (A.5)–(A.6) hold. For \(P, Q\in \mathcal{P}_{2} (\mathbb {R}^d )\), if \(\mathcal{S}(Q)\) is finite, then \(V_S (P,Q)\) is finite and the following holds:

$$\begin{aligned} -V_S (P,Q) & = {} H(P\times Q| \mu (P,Q))-\mathcal{S}(Q)\nonumber \\&+\int _{\mathbb {R}^d\times \mathbb {R}^d} \log p(0,x;1,y) P(dx)Q(dy). \end{aligned}$$
(4.11)

In particular, the following is weakly lower semicontinuous:

$$\begin{aligned} \mathcal{P}_{2} (\mathbb {R}^d )\ni P\mapsto -V_S(P,Q)+C_2\int _{\mathbb {R}^d\times \mathbb {R}^d}|x-y|^2P(dx)Q(dy) \end{aligned}$$
(4.12)

(see (4.2) for notation). The following is also continuous in the topology induced by \(W_2\):

$$\begin{aligned} \mathcal{P}_{2} (\mathbb {R}^d )\ni P\mapsto V_S(P,Q). \end{aligned}$$
(4.13)

If \(\mathcal{S}(Q)\) is infinite, then so is \(V_S (P,Q)\).

Remark 11

For \(C>0\) and \(P, Q\in \mathcal{P} (\mathbb {R}^d )\),

$$\begin{aligned} \Psi _{Q,C} (P):=\mathcal{S}(P) -V_S(P,Q)+ C\int _{\mathbb {R}^d\times \mathbb {R}^d}|x-y|^2P(dx)Q(dy). \end{aligned}$$
(4.14)

\(\Psi _{Q,C} (P)\) plays a crucial role in the construction of moment measures by the SOTP (see [43, 44] and also [54] for the approach by the OTP). Since \(\mathcal{P}_{ac} (\mathbb {R}^d )\ni P\mapsto \mathcal{S}(P)\) is strictly displacement convex from Theorem 2.2 in [32], so is \(\mathcal{P}_{2, ac} (\mathbb {R}^d )\ni P\mapsto \Psi _{Q,C} (P)\) under the assumption of Theorem 7.

From Theorem 7, under stronger assumptions than in Theorem 8, we prove, for a fixed \(Q\in \mathcal{P}_{2,ac} (\mathbb {R}^d )\) such that \(\mathcal{S}(Q)\) is finite, that \(\mathcal{P}_2 (\mathbb {R}^d )\ni P\mapsto V_S(P,Q)\) is Lipschitz continuous in \(W_2\).

Corollary 5

Suppose that (A.4)’ holds and that there exists a constant \(C>0\) such that \(\log p(0,x;1,y) +C|x|^2\) is convex in x for any \(y\in \mathbb {R}^d\). Then for any \(Q\in \mathcal{P}_{2,ac} (\mathbb {R}^d )\) such that \(\mathcal{S}(Q)\) is finite, the following holds:

$$\begin{aligned}&|V_S(P_0,Q)-V_S(P_1,Q)|\nonumber \\&\quad \le f (\max (||x||_{L^2(P_0)}, ||x||_{L^2(P_1)}), ||x||_{L^2(Q)} )W_2(P_0,P_1), \quad P_0,P_1\in \mathcal{P}_2 (\mathbb {R}^d ), \end{aligned}$$
(4.15)

where \(||x||_{L^2(P)}:=(\int _{\mathbb {R}^d}|x|^2P(dx))^{1/2}\), \(P\in \mathcal{P}_2 (\mathbb {R}^d )\) and

$$\begin{aligned} f(x,y):=2C_2x^2+2(C_2y^2+C_1)+C \end{aligned}$$

(see (4.2) for notation). In particular, if \(p(0,x;1,y)=(2\pi a)^{-d/2}\exp (-|y-x|^2/(2a)), a>0\), then

$$\begin{aligned}&|V_S(P_0,Q)-V_S(P_1,Q)|\nonumber \\&\quad \le \frac{1}{2a}\{||x||_{L^2(P_0)}+||x||_{L^2(P_1)}+2 (1+\max (\sigma _0,\sigma _1))||x||_{L^2(Q)} \}W_2(P_0,P_1), \end{aligned}$$
(4.16)

where

$$\begin{aligned} \sigma _i:=\biggl (\int _{\mathbb {R}^d} \biggl (x-\int _{\mathbb {R}^d}yP_i(dy)\biggr )^2P_i(dx)\biggr )^{1/2},\quad i=0,1. \end{aligned}$$

We prove Theorems 7 and 8, and Corollary 5.

Proof of Theorem 7

For any \(f_{i}\in C^\infty _b (\mathbb {R}^d)\), set \(u_{i} (x):=\varphi (0,x;f_{i})\) (see (2.18) for notation). Then

$$\begin{aligned}&\sum _{i=1}^2\lambda _i \left\{ \int _{\mathbb {R}^d} f_{i} (x)Q(dx)-\int _{\mathbb {R}^d} u_i (x)P^{X_i}(dx)\right\} \nonumber \\&\quad \le V_S (P^{\sum _{i=1}^2\lambda _i X_i},Q)+\lambda _1\lambda _2CE[|X_1-X_2|^2]. \end{aligned}$$
(4.17)

Indeed,

$$\begin{aligned}&\sum _{i=1}^2\lambda _i \left\{ \int _{\mathbb {R}^d} f_{i} (x)Q(dx)-\int _{\mathbb {R}^d} u_i (x)P^{X_i}(dx)\right\} \\&\quad =\int _{\mathbb {R}^d} \sum _{i=1}^2\lambda _i f_{i} (x)Q(dx)- E\biggl [\sum _{i=1}^2\lambda _i\{u_{i}(X_i)+C|X_i|^2\}\biggl ] +CE\biggl [\sum _{i=1}^2\lambda _i |X_i|^2\biggl ],\\&\qquad \int _{\mathbb {R}^d} \sum _{i=1}^2\lambda _i f_{i} (x)Q(dx)\\&\quad \le V_S (P^{\sum _{i=1}^2\lambda _i X_i},Q)+\int _{\mathbb {R}^d} \varphi \biggl (0,x;\sum _{i=1}^2\lambda _i f_{i}\biggl )P^{\sum _{i=1}^2\lambda _i X_i}(dx) \end{aligned}$$

by the Duality Theorem for \(V_S\) (see Corollary 2).

$$\begin{aligned}&\int _{\mathbb {R}^d} \varphi \biggl (0,x;\sum _{i=1}^2\lambda _i f_{i}\biggl )P^{\sum _{i=1}^2\lambda _i X_i}(dx) =E\biggl [\varphi \biggl (0,\sum _{i=1}^2\lambda _i X_i;\sum _{i=1}^2\lambda _i f_{i}\biggl )\biggl ]\\&\quad \le E\biggl [\sum _{i=1}^2\lambda _i\biggl \{u_{i}(X_i)+C|X_i|^2\biggl \}\biggl ] -CE\biggl [\biggl |\sum _{i=1}^2\lambda _i X_i\biggl |^2\biggl ]. \end{aligned}$$

In the inequality above, we used the following:

$$\begin{aligned} \varphi (t,x;f) & = {} \log \left( \int _{\mathbb {R}^d}p(t,x;1,y)\exp (f(y)) dy\right) , (t,x)\in [0,1)\times \mathbb {R}^d,\nonumber \\&\quad \int _{\mathbb {R}^d} \exp \biggl (\log p\biggl (0,\sum _{i=1}^2\lambda _iX_i;1,y\biggl )+C\biggl |\sum _{i=1}^2\lambda _i X_i\biggl |^2+\sum _{i=1}^2\lambda _if_{i} (y)\biggl )dy\nonumber \\\le & {} \int _{\mathbb {R}^d} \exp \biggl (\sum _{i=1}^2\lambda _i\{\log p(0,X_i;1,y)+C|X_i|^2+f_{i} (y)\}\biggl )dy\nonumber \\\le & {} \prod _{i=1}^2\left( \int _{\mathbb {R}^d} \exp (\log p(0,X_i;1,y)+C|X_i|^2+f_{i} (y))dy\right) ^{\lambda _i} \end{aligned}$$
(4.18)

by Hölder’s inequality. Taking the supremum in \(f_i\) over \(C^\infty _b (\mathbb {R}^d)\) on the left-hand side of (4.17), the Duality Theorem for \(V_S\) completes the proof (see Corollary 2). \(\square\)

Proof of Theorem 8

We prove the first part. We first prove that \(V_S(P,Q)\) is finite. Indeed, from [53],

$$\begin{aligned} V_S(P,Q) & = {} \inf \{H(\mu (dxdy)|P(dx)p(0,x;1,y)dy):\mu _1=P,\mu _2=Q\}\nonumber \\\le & {} H(P(dx)Q(dy)|P(dx)p(0,x;1,y)dy)\nonumber \\ =& {} \mathcal{S}(Q)- \int _{\mathbb {R}^d\times \mathbb {R}^d}\{ \log p(0,x;1,y)\}P(dx)Q(dy)<\infty \end{aligned}$$
(4.19)

from (4.2) (see (2.3) for notation). Here for \(\mu , \nu \in \mathcal{P}(\mathbb {R}^d\times \mathbb {R}^d )\),

$$\begin{aligned} H(\mu |\nu ):= {\left\{ \begin{array}{ll} \displaystyle \int _{\mathbb {R}^d\times \mathbb {R}^d}\left\{ \log \frac{\mu (dxdy)}{\nu (dxdy)}\right\} \mu (dxdy),&{}\mu \ll \nu ,\\ \displaystyle \infty ,&{}otherwise. \end{array}\right. } \end{aligned}$$

There exists a Borel measurable \(f:\mathbb {R}^d\longrightarrow \mathbb {R}\) such that the following holds (see, e.g. [28]):

$$\begin{aligned} \mu (P,Q)(dxdy)=P(dx)p(0,x;1,y)\exp (f(y)-\varphi (0,x;f))dy \end{aligned}$$
(4.20)

(see (4.18) for notation).

Since \(V_S(P,Q)\) is finite, \(f\in L^1 (\mathbb {R}^d,Q)\) and \(\varphi (0,x;f)\in L^1 (\mathbb {R}^d,P)\) (see, e.g. [53]). In particular,

$$\begin{aligned} -V_S(P,Q) & = {} -\int _{\mathbb {R}^d\times \mathbb {R}^d}\left\{ \log \frac{\mu (P,Q)(dxdy)}{P(dx)p(0,x;1,y)dy}\right\} \mu (P,Q)(dxdy)\nonumber \\ & = {} \int _{\mathbb {R}^d\times \mathbb {R}^d}(-f(y)+\varphi (0,x;f))P(dx)Q(dy)\nonumber \\ & = {} \int _{\mathbb {R}^d\times \mathbb {R}^d}P(dx)Q(dy) \biggl \{\log \left( \frac{P(dx)Q(dy)}{\mu (P,Q)(dxdy)}\right) \nonumber \\&-\log q(y)+\log p(0,x;1,y)\biggr \}, \end{aligned}$$
(4.21)

where \(q\) denotes the density of Q, which exists since \(\mathcal{S}(Q)\) is finite; this completes the proof of (4.11). \(P\times Q\mapsto H(P\times Q| \mu (P,Q))\) is weakly lower semicontinuous since \(P\times Q\mapsto \mu (P,Q)\) is weakly continuous (see [43]) and since \((\mu ,\nu )\mapsto H(\mu | \nu )\) is weakly lower semicontinuous (see, e.g. [18], Lemma 1.4.3). In particular, (4.12) is weakly lower semicontinuous from (4.11). The weak lower semicontinuity of (4.12) implies the upper semicontinuity of (4.13), since for \(P_n, P\in \mathcal{P}(\mathbb {R}^d), n\ge 1\), \(W_2 (P_n,P)\rightarrow 0\) as \(n\rightarrow \infty\) if and only if \(P_n\rightarrow P\) weakly and \(\int _{\mathbb {R}^d} |x|^2P_n(dx)\rightarrow \int _{\mathbb {R}^d} |x|^2P(dx)\) (see, e.g. [61]).

(4.13) is also weakly lower semicontinuous by Proposition 1 and Lemma 1.

We prove the last part. Set

$$\begin{aligned} q(0,x;1,y):=p(0,x;1,y)\exp (f(y)-\varphi (0,x;f)). \end{aligned}$$
(4.22)

Then by Jensen’s inequality,

$$\begin{aligned}&V_S(P,Q)\nonumber \\&\quad =\int _{\mathbb {R}^d\times \mathbb {R}^d}\left\{ \log q(0,x;1,y)-\log p(0,x;1,y)\right\} P(dx)q(0,x;1,y)dy\nonumber \\&\quad \ge \mathcal{S}(Q)- \int _{\mathbb {R}^d\times \mathbb {R}^d}\left\{ \log p(0,x;1,y)\right\} P(dx)q(0,x;1,y)dy, \end{aligned}$$
(4.23)

since

$$\begin{aligned} Q(dy)=\left( \int _{\mathbb {R}^d}P(dx)q(0,x;1,y)\right) dy. \end{aligned}$$

(4.2) completes the proof.

\(\square\)

Remark 12

Under (A.5)-(A.6), from Theorem 6, (4.21), and (4.23), for \(P, Q\in \mathcal{P}_2 (\mathbb {R}^d)\), if \(\mathcal{S}(Q)\) is finite, then

$$\begin{aligned} -\infty< & {} -C_1+C_2^{-1}\int _{\mathbb {R}^d\times \mathbb {R}^d}|x-y|^2\mu (P,Q)(dxdy)\nonumber \\\le & {} -\int _{\mathbb {R}^d\times \mathbb {R}^d}\{\log p(0,x;1,y)\}\mu (P,Q)(dxdy)\nonumber \\\le & {} V_S(P,Q)-\mathcal{S}(Q)\nonumber \\\le & {} -\int _{\mathbb {R}^d\times \mathbb {R}^d}\{\log p(0,x;1,y)\}P(dx)Q(dy)\nonumber \\\le & {} C_1+C_2\int _{\mathbb {R}^d\times \mathbb {R}^d}|x-y|^2P(dx)Q(dy)<\infty . \end{aligned}$$
(4.24)

Remark 12 plays a crucial role in the proof of Corollary 5.

Proof of Corollary 5

Let \(X,Y\in L^2 (\varOmega , P; \mathbb {R}^d)\) and \(\lambda :=\min (1, ||X-Y||_2)\), where \(||X||_2:=\{E[|X|^2]\}^{1/2}\).

We prove the following when \(\lambda >0\).

$$\begin{aligned} V_S(P^X,Q)-V_S(P^Y,Q)\le \lambda \{2C_2(||X||_2^2+||x||_{L^2(Q)}^2)+2C_1+C\} \end{aligned}$$
(4.25)

(see (4.2) for notation). From Theorem 7,

$$\begin{aligned}&(1-\lambda )V_S(P^{X},Q)+\lambda V_S(P^{\lambda ^{-1}(Y-X)+X},Q)\\&\quad \le \lambda (1-\lambda )C||\lambda ^{-1}(Y-X)||_2^2+V_S(P^{Y},Q). \end{aligned}$$

since \(Y=(1-\lambda )X+\lambda (\lambda ^{-1}(Y-X)+X)\). From this,

$$\begin{aligned}&V_S(P^X,Q)-V_S(P^Y,Q)\nonumber \\&\quad \le \lambda \{V_S(P^X,Q)-V_S(P^{\lambda ^{-1}(Y-X)+X},Q) +C(1-\lambda )||\lambda ^{-1}(Y-X)||_2^2\}. \end{aligned}$$
(4.26)

Since (A.4)’ implies (A.5)–(A.6),

$$\begin{aligned}&V_S(P^X,Q)-V_S(P^{\lambda ^{-1}(Y-X)+X},Q)\\&\quad \le \mathcal{S}(Q)+C_1+2C_2(||X||_2^2+||x||_{L^2(Q)}^2)- \mathcal{S}(Q)+C_1 \end{aligned}$$

from Remark 12. The following completes the proof of the first part:

$$\begin{aligned} (1-\lambda )||\lambda ^{-1}(Y-X)||_2^2 ={\left\{ \begin{array}{ll} 1-\lambda , &{}\lambda =||X-Y||_2<1,\\ 0=1-\lambda ,&{}\lambda =1. \end{array}\right. } \end{aligned}$$

We prove the second part. One can set \(C= (2a)^{-1}\).

From (4.26), the following holds:

$$\begin{aligned}&V_S(P^X,Q)-V_S(P^Y,Q)\nonumber \\&\quad \le \lambda \biggl \{ V_S(P^X,Q)-V_S(P^{\lambda ^{-1}(Y-X)+X},Q)\nonumber \\&\qquad -\frac{1}{2a}||X||_2^2+ \frac{1}{2a}|| \lambda ^{-1}(Y-X)+X||_2^2 \biggr \}+ \frac{1}{2a}(||X||_2^2-||Y||_2^2), \end{aligned}$$
(4.27)

since

$$\begin{aligned} \lambda (1-\lambda )||\lambda ^{-1}(Y-X)||_2^2 =\lambda (-||X||_2^2+||\lambda ^{-1}(Y-X)+X||_2^2)+||X||_2^2-||Y||_2^2. \end{aligned}$$

The following completes the proof: from Remark 12,

$$\begin{aligned}&V_S(P^X,Q)-V_S(P^{\lambda ^{-1}(Y-X)+X},Q)-\frac{1}{2a}||X||_2^2+ \frac{1}{2a}|| \lambda ^{-1}(Y-X)+X||_2^2\\&\quad \le \frac{1}{a}\int _{\mathbb {R}^d\times \mathbb {R}^d}\langle x, y\rangle \left\{ \mu (P^{\lambda ^{-1}(Y-X)+X},Q)(dxdy)-P^X(dx)Q(dy)\right\} \\&\quad =\frac{1}{a}\int _{\mathbb {R}^d\times \mathbb {R}^d}\langle x-E[X], y\rangle \mu (P^{\lambda ^{-1}(Y-X)+X},Q)(dxdy)\\&\quad \le \frac{1}{a\lambda }(||X-Y||_2 +\lambda V(X)^{1/2})||x||_{L^2(Q)}. \end{aligned}$$


\(\square\)