Abstract
Classical geometric mechanics, including the study of symmetries, Lagrangian and Hamiltonian mechanics, and the Hamilton–Jacobi theory, is founded on geometric structures such as jet, symplectic and contact structures. In this paper, we shall use a partly forgotten framework of second-order (or stochastic) differential geometry, developed originally by L. Schwartz and P.-A. Meyer, to construct second-order counterparts of those classical structures. These will allow us to study symmetries of stochastic differential equations (SDEs), and to establish stochastic Lagrangian and Hamiltonian mechanics and their key relations with second-order Hamilton–Jacobi–Bellman (HJB) equations. Specifically, stochastic prolongation formulae will be derived to study symmetries of SDEs and mixed-order Cartan symmetries. Stochastic Hamilton’s equations will follow from a second-order symplectic structure, and canonical transformations will lead to the HJB equation. A stochastic variational problem on Riemannian manifolds will provide a stochastic Euler–Lagrange equation compatible with the HJB equation and equivalent to the Riemannian version of stochastic Hamilton’s equations. A stochastic Noether’s theorem will also follow. The inspirational example, throughout the paper, will be the rich dynamical structure of Schrödinger’s problem in optimal transport, where the latter is also regarded as a Euclidean version of the hydrodynamical interpretation of quantum mechanics.
1 Introduction
Hamilton–Jacobi (HJ) partial differential equations and the associated theory lie at the center of classical mechanics (Abraham and Marsden 1978; Arnold 1989; Marsden and Ratiu 1999; Goldstein et al. 2002). Motivated by Hamilton’s approach to geometrical optics, where the action represents the time needed by a particle to move between two points, and by a variational principle due to Fermat, Jacobi extended this approach to Lagrangian and Hamiltonian mechanics. Jacobi designed a concept of “complete” solution of HJ equations allowing him to recover all solutions simply by substitutions and differentiations. Although, in general, they are more complicated to solve than a system of ODEs like Hamilton’s, HJ equations proved to be powerful tools for integrating the classical equations of motion. In addition, Jacobi’s approach led him to ask which diffeomorphisms of the cotangent bundle, the geometric arena of canonical equations, preserve the structure of these first-order equations. Those are called today symplectic or canonical transformations, and Jacobi’s method of integration is precisely one of them.
It is not always recognized as it should be that HJ equations were also fundamental in the construction of quantum mechanics. The reading of Schrödinger (1926), Fock (1978), Dirac (1933) and others until Feynman (1948) makes abundantly clear that most of the new ideas in the field made use of HJ equations for the classical system to be “quantized,” or some quantum deformation of them. There are at least two ways to express this deformation. On the one hand, one can exponentiate the \(L^2\) wave function, call S its complex exponent and look for the equation solved by S (see Goldstein et al. 2002). When the system is a single particle in a scalar potential, one obtains the classical HJ equation with an additional Laplacian term and factor \(i\hbar \), representing the regularization expected from the quantization of the system. This complex factor is symptomatic of the basic quantum probability problem, at least for pure states. In a nutshell, it is the reason why Feynman’s diffusions, in his path integral approach, do not exist. On the other hand, there is a hydrodynamical interpretation of quantum mechanics, founded on the Madelung transform, a polar representation of the wave function whose modulus is the square root of a probability density. Its argument solves another deformation of the HJ equation. The geometry of this transform has been thoroughly investigated recently, highlighting its relations with optimal transport theory (Khesin et al. 2021; von Renesse 2012).
However, the probabilistic content of quantum mechanics, especially for pure states, remained a vexing mathematical mystery right from its beginning, despite several interesting (but unsuccessful) attempts (Nelson 2001). The current consensus is that regular probability theory and stochastic analysis have little or nothing to teach us about it. And, in particular, that all that can be saved from Feynman path integral theory is Wiener’s measure and perturbations of it by potential terms. This is the “Euclidean approach,” one of the starting points of mathematical quantum field theory.
In 1931, however, Schrödinger suggested in a paper almost forgotten until the 1980s (Schrödinger 1932) [but insightfully commented on by the probabilist Bernstein (1932)] the existence of a completely different Euclidean approach to quantum dynamics. In short, a stochastic variational boundary value problem for probability densities characterizes optimal diffusions on a given time interval as having a density that is the product of two positive solutions of time-adjoint heat equations. This idea, revived and elaborated from 1986 onward (Zambrini 1986), is known today as “Schrödinger’s problem” in the community of optimal transport, where it has proved to provide, among other results, very efficient regularization of fundamental problems of this field (Léonard 2014). In fact, Schrödinger’s problem hinted toward the existence of a stochastic dynamical theory of processes, considerably more general than its initial quantum motivation. In it, various regularizations associated with the tools of stochastic calculus should play the role of those involved in quantum mechanics in Hilbert space, where the looked-for measures do not exist.
The variational side of the stochastic theory has been developed in the last decades, inspired by a number of results in stochastic optimal control (Haussmann 1986; Fleming and Soner 2006) and stochastic optimal transport (Mikami 2021). In this context, the crucial role of the (second-order) Hamilton–Jacobi–Bellman (HJB) equation has been known for a long time. It provides the proper regularization of the (first-order) HJ equation needed to construct well-defined stochastic dynamical theories. In contrast, for instance, with the notion of viscosity solution, whose initial target was the study of the classical PDE, the HJB equation becomes central here as the natural stochastic deformation of the HJ equation, compatible with Itô’s calculus. It is worth mentioning that in fields such as AI or reinforcement learning, where HJB equations play a fundamental role (Peyré et al. 2019), it is natural to expect that such a stochastic dynamical framework, built on them, should be of some interest.
The geometric, and especially, Hamiltonian side of the dynamical theory had resisted until now and constitutes the main contribution of this paper. It is our hope that it will be useful far beyond its initial motivation referred to, afterward, as its “inspirational examples.” In this sense, it can clearly be interpreted as a general contribution to stochastic geometric mechanics. More precisely, we are trying to answer the following questions:
- Do we have any geometric interpretation of the Hamilton–Jacobi–Bellman equation? That is, can we derive the HJB equation from some sort of canonical transformations?
- Can we formulate some variational problem that leads to an Euler–Lagrange equation which is equivalent to the HJB equation?
- More systematically, can we develop some counterpart of Lagrangian and Hamiltonian mechanics that is associated with the HJB equation?
The first question indicates that canonical transformations should be somehow second-order, so that the corresponding symplectic and contact structures are also second-order. Meanwhile, the stochastic generalization of optimal control and optimal transport suggests that the variational problem of the second question should be formulated in a stochastic sense. Combining these hints, the third question amounts to seeking a new theory of geometric mechanics that integrates the stochastic and the second-order aspects.
The cornerstone of stochastic analysis, the well-known Itô’s formula, tells us that the generator of a diffusion process is a second-order differential operator. This provides a very natural way to connect the stochastic aspect with the second-order one. That is, in order to build a stochastic or second-order counterpart of geometric mechanics, we need to encode the rule of Itô’s formula into the geometric structures.
There is a theory named second-order differential geometry (“stochastic differential geometry” is also used by some authors but we would like to keep the original terminology), which was devised by Schwartz and Meyer around 1980 (Schwartz 1980, 1982, 1984; Meyer 1979, 1981a), and later on developed by Belopolskaya and Dalecky (1990), Gliklikh (2011), Emery (1989), etc. See Emery (2007) for a survey of this aspect. Compared with the theory of stochastic analysis on manifolds (or geometric stochastic analysis) developed by Itô (1962, 1975), Malliavin (1997), Bismut (1981), Elworthy (1982), etc., which focuses on Stratonovich stochastic differential equations on classical geometric structures, like Riemannian manifolds, frame bundles and Lie groups, so that Leibniz’s rule is preserved, Schwartz’ second-order differential calculus alters the underlying geometric structures to include second-order Itô correction terms, and provides a broader picture, even though it loses Leibniz’s rule and is much less known.
In this paper, we will adopt the viewpoint of Schwartz–Meyer and enlarge their picture to develop a theory of stochastic geometric mechanics. We first give an equivalent and more intuitive description of the second-order tangent bundle by equivalence classes of diffusions, via Nelson’s mean derivatives. We then generalize this idea to construct stochastic jets, from which stochastic prolongation formulae are proved and the stochastic counterpart of Cartan symmetries is studied. The second-order cotangent bundle is also studied, which helps us to establish stochastic Hamiltonian mechanics. We formulate the stochastic Hamilton’s equations, a system of stochastic equations on the second-order cotangent bundle in terms of mean derivatives. By introducing the second-order symplectic structure and the mixed-order contact structure, we derive the second-order HJB equations via canonical transformations. Finally, we set up a stochastic variational problem on the space of diffusion processes, also in terms of mean derivatives. Two kinds of stochastic principle of least action are built: stochastic Hamilton’s principle and stochastic Maupertuis’s principle. Both of them yield a stochastic Euler–Lagrange equation. The equivalence between the stochastic Euler–Lagrange equation and the HJB equation is proved, which leads exactly to the equivalence between our stochastic variational problem and Schrödinger’s problem in optimal transport. Last but not least (actually vital), a stochastic Noether’s theorem is proved. It says that every symmetry of the HJB equation corresponds to a martingale that is exactly a conservation law in the stochastic sense.
It should be observed, however, that the Schwartz–Meyer approach, together with the one of Bismut (1981), has also inspired a distinct, Stratonovich-type stochastic Hamiltonian framework (Lázaro-Camí and Ortega 2008) leading to a stochastic HJ equation (Lázaro-Camí and Ortega 2009), without relations with Schrödinger’s problem or optimal transport.
The key results of the present paper and the dependence among them are briefly expressed in the following diagram:
The organization of this paper is the following:
Section 2 is a summary of the theory of stochastic differential equations on manifolds, from the perspective appropriate to our goal. In particular, diffusions will be characterized by their mean and quadratic mean derivatives as in Nelson’s stochastic mechanics (Nelson 2001), although the resulting dynamical content of our theory will have very little to do with his. In this way, we are able to rewrite Itô SDEs on manifolds as ODE-like equations that have a better geometric nature. The notion of second-order tangent bundle answers the question: the drift parts of Itô SDEs are sections of what?
Section 3 is devoted to the notion of stochastic jets. In the same way as tangent vectors on M are defined as equivalence classes of smooth curves through a given point and then generalized to higher-order cases to produce the notion of jets, stochastic tangent vectors are defined as equivalence classes of diffusions, so that the stochastic tangent bundle is isomorphic to the elliptic subbundle of the second-order tangent bundle. Stochastic jets are also constructed. This provides an intrinsic definition of the SDEs under consideration.
Section 4 illustrates the use of the above geometric formulation of SDEs for the study of their symmetries. Prolongations of M-valued diffusions are defined as new processes with values in the stochastic tangent bundle. Among all deterministic space-time transformations, bundle homomorphisms will be the only subclass that transforms diffusions to diffusions. Total mean and quadratic derivatives are defined in conformity with the rules of Itô’s calculus. The prolongation of diffusions allows us to define symmetries of SDEs and their infinitesimal versions. Stochastic prolongation formulae are derived for infinitesimal symmetries, which yield determining equations for Itô SDEs.
In Sect. 5, the second-order cotangent bundle, as the dual bundle of the second-order tangent bundle, is defined and analyzed. The properties of second-order differential operators, pushforwards and pullbacks are described. When time is involved, i.e., the base manifold is the product manifold \({\mathbb {R}}\times M\), the corresponding bundles are mixed-order tangent and cotangent bundles, where “mixed-order” means they are second-order in space but first-order in time. More about this topic, like mixed-order pushforwards and pullbacks, pushforwards and pullbacks by diffusions, and Lie derivatives, can be found in “Appendix A.” A generalized notion of stochastic Cartan distribution and its symmetries are discussed in “Appendix B” based on the mixed-order contact structure.
The point of Sect. 6 is to use the tools developed before in the construction of stochastic Hamiltonian mechanics, which is one of the main goals of the paper. One of our inspirational examples will be the one underlying the dynamical content of Schrödinger’s problem. By analogy with the Poincaré 1-form on the cotangent bundle of classical mechanics and its associated symplectic form, one can construct counterparts on the second-order cotangent bundle. Using the canonical second-order symplectic form on second-order cotangent bundles, one defines second-order symplectomorphisms. The generalization of classical Hamiltonian vector fields becomes second-order operators, for a given real-valued Hamiltonian function on the second-order cotangent bundle. The resulting stochastic Hamiltonian system involves pairs of extra equations compared with their classical versions. Bernstein’s reciprocal processes inspired by Schrödinger’s problem are described in this framework, corresponding to a large class of second-order Hamiltonians on Riemannian manifolds. A mixed-order contact structure describes time-dependent stochastic Hamiltonian systems. The last subsection of this section is devoted to canonical transformations preserving the form of stochastic Hamilton’s equations. The corresponding generating function satisfies the Hamilton–Jacobi–Bellman equation.
Section 7 treats the stochastic version of classical Lagrangian mechanics on Riemannian manifolds. Itô’s stochastic deformation of the classical notion of parallel displacement is recalled. Another one, called damped parallel displacement in the mathematical literature, involving the Ricci tensor, is also indicated. Each of these displacements corresponds to a mean covariant derivative along diffusions. The action functional is defined as the expectation of the Lagrangian, and the stochastic Euler–Lagrange equation involves the damped mean covariant derivative. The dynamics of Schrödinger’s problem is, again, used as illustration. The equivalence between stochastic Hamilton’s equations on Riemannian manifolds and the stochastic Euler–Lagrange equation, as well as the HJB equation, is derived via the Legendre transform. Relations with stochastic control are also mentioned. The section ends with the stochastic Noether’s theorem. The stochastic version of Maupertuis’s principle, as the twin of stochastic Hamilton’s principle, is left to “Appendix C.”
2 Stochastic Differential Equations on Manifolds
In this section, we will study several types of stochastic differential equations on manifolds which are weakly equivalent to Itô SDEs. We start with a d-dimensional smooth manifold M and a probability space \((\Omega , {\mathcal {F}}, {\textbf{P}})\), and equip the latter with a filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\), i.e., a family of nondecreasing sub-\(\sigma \)-fields of \({\mathcal {F}}\). We call \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\) a past filtration. Unless otherwise specified, the manifold M will not be endowed with any structures other than the smooth structure. In some cases, it will be endowed with a linear connection, a Riemannian metric, or a Levi–Civita connection.
Recall from Hsu (2002, Definition 1.2.1) that by an M-valued (forward) \(\{{\mathcal {P}}_t\}\)-semimartingale, we mean a \(\{{\mathcal {P}}_t\}\)-adapted continuous M-valued process \(X= \{X(t)\}_{t\in [t_0,\tau )}\), where \(t_0\in {\mathbb {R}}\) and \(\tau \) is a \(\{{\mathcal {P}}_t\}\)-stopping time satisfying \(t_0<\tau \le +\infty \), such that f(X) is a real-valued \(\{{\mathcal {P}}_t\}\)-semimartingale on \([t_0,\tau )\) for all \(f\in C^\infty (M)\). The stopping time \(\tau \) is called the lifetime of X. If we adopt the convention to introduce the one-point compactification of M by \(M^*:= M \cup \{\partial _M\}\), then the process X can be extended to the whole time line \([t_0,+\infty )\) by setting \(X(t) = \partial _M\) for all \(t\ge \tau \). The point \(\partial _M\) is often called the cemetery point in the context of Markovian theory.
2.1 Itô SDEs on Manifolds
Given \(N+1\) time-dependent vector fields \(b,\sigma _r, r=1,\ldots ,N\) on M, one can introduce a Stratonovich SDE in local coordinates, which has the same form as in Euclidean space (Hsu 2002, Section 1.2). The form of Stratonovich SDEs on M is invariant under changes of coordinates, as Stratonovich stochastic differentials obey Leibniz’s rule.
However, for Itô stochastic differentials this is not the case because of Itô’s formula. Hence, we cannot directly write a Euclidean form of Itô SDE on M in local coordinates, since it is no longer invariant under changes of coordinates. Indeed, a change of coordinates will always produce an additional term. To balance this term, a common way is to add a correction term to the drift part of the Euclidean form of Itô SDE, by taking advantage of a linear connection. More precisely, under local coordinates \((x^i)\), we consider the following Itô SDE (Gliklikh 2011, Section 7.1, 7.2):
$$\begin{aligned} {\textrm{d}}X^i(t) = \left( b^i(t,X(t)) - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk}(X(t)) \left( \sigma ^j_r \sigma ^k_r\right) (t,X(t)) \right) {\textrm{d}}t + \sigma ^i_r(t,X(t)) {\textrm{d}}W^r(t), \end{aligned} \tag{2.1}$$
where \((\Gamma ^i_{jk})\) is the family of Christoffel symbols for a given linear connection \(\nabla \) on TM. When conditioning on \(\{X(t)=q\}\) and taking \((x^i)\) as normal coordinates at \(q\in M\), (2.1) turns to the Euclidean form, since at q,
$$\begin{aligned} \Gamma ^i_{jk}(q) + \Gamma ^i_{kj}(q) = 0. \end{aligned} \tag{2.2}$$
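If we denote
$$\begin{aligned} (\sigma \circ \sigma ^*)^{ij} = \sum _{r=1}^N \sigma ^i_r \sigma ^j_r, \end{aligned}$$
then clearly \(\sigma \circ \sigma ^*\) is a symmetric and positive semi-definite (2, 0)-tensor field. We also introduce formally a modified drift \({\mathfrak {b}}\) which has the following coordinate expression
$$\begin{aligned} {\mathfrak {b}}^i = b^i - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk} \sigma ^j_r \sigma ^k_r. \end{aligned} \tag{2.3}$$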
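We change the coordinate chart from \((U,(x^i))\) to \((V,(\tilde{x}^j))\) with \(U\cap V\ne \emptyset \). Since each \(\sigma _r\) transforms as a vector, we apply the change-of-coordinate formula for Christoffel symbols (e.g., Kobayashi and Nomizu 1963, Proposition III.7.2) to derive that
$$\begin{aligned} \sum _{r=1}^N {\tilde{\Gamma }}^i_{jk} {\tilde{\sigma }}^j_r {\tilde{\sigma }}^k_r = \frac{\partial \tilde{x}^i}{\partial x^l} \sum _{r=1}^N \Gamma ^l_{mn} \sigma ^m_r \sigma ^n_r - \frac{\partial ^2 \tilde{x}^i}{\partial x^m \partial x^n} \sum _{r=1}^N \sigma ^m_r \sigma ^n_r. \end{aligned}$$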
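It follows that the coefficients of the modified drift \({\mathfrak {b}}\) in (2.3) transform as
$$\begin{aligned} \tilde{{\mathfrak {b}}}^i = \frac{\partial \tilde{x}^i}{\partial x^j} {\mathfrak {b}}^j + \frac{1}{2} \frac{\partial ^2 \tilde{x}^i}{\partial x^j \partial x^k} \sum _{r=1}^N \sigma ^j_r \sigma ^k_r. \end{aligned} \tag{2.4}$$
Therefore, \({\mathfrak {b}}\) is not a vector field, as it does not transform pointwise as a vector.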
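Finally, using Itô’s formula, we derive the transformation of (2.1) as follows:
$$\begin{aligned} {\textrm{d}}{\tilde{X}}^i&= \frac{\partial \tilde{x}^i}{\partial x^j} {\textrm{d}}X^j + \frac{1}{2} \frac{\partial ^2 \tilde{x}^i}{\partial x^j \partial x^k} {\textrm{d}}[X^j, X^k] \\&= \frac{\partial \tilde{x}^i}{\partial x^j} \left( {\mathfrak {b}}^j {\textrm{d}}t + \sigma ^j_r {\textrm{d}}W^r \right) + \frac{1}{2} \frac{\partial ^2 \tilde{x}^i}{\partial x^j \partial x^k} \sum _{r=1}^N \sigma ^j_r \sigma ^k_r {\textrm{d}}t \\&= \tilde{{\mathfrak {b}}}^i {\textrm{d}}t + {\tilde{\sigma }}^i_r {\textrm{d}}W^r, \end{aligned}$$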
where the bracket \([\cdot ,\cdot ]\) on the right-hand side (RHS) of the first equality denotes the quadratic variation. This shows that Eq. (2.1) is indeed invariant under changes of coordinates.
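The additional term produced by a change of coordinates can be seen in a one-dimensional sketch. The chart map \(\varphi \) and the constant diffusion coefficient \(\sigma \) below are illustrative assumptions, not objects from the text. If X solves \({\textrm{d}}X=\sigma \,{\textrm{d}}W\) in a coordinate x, then in the new coordinate \(\tilde{x}=\varphi (x)\) Itô’s formula gives:

```latex
\begin{aligned}
\mathrm{d}\tilde{X}
  &= \varphi'(X)\,\mathrm{d}X + \tfrac{1}{2}\,\varphi''(X)\,\mathrm{d}[X,X] \\
  &= \tfrac{1}{2}\,\sigma^{2}\varphi''(X)\,\mathrm{d}t + \sigma\,\varphi'(X)\,\mathrm{d}W .
\end{aligned}
```

The diffusion coefficient transforms as a vector, \({\tilde{\sigma }}=\varphi '\sigma \), but a drift \(\frac{1}{2}\sigma ^2\varphi ''\) appears even though the drift vanishes in the chart x. This second-derivative term is what the Christoffel correction is meant to absorb.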
Remark 2.1
One can regard \(\sigma = (\sigma _r)_{r=1}^N \in ({\mathbb {R}}^N)^*\otimes {\mathfrak {X}}(M)\) as an \(({\mathbb {R}}^N)^*\)-valued vector field on M. In this way, the pair \((b,\sigma )\) is called an Itô vector field in Gliklikh (2011, Chapter 7), while the pair \(({\mathfrak {b}},\sigma )\) is called an Itô equation therein.
Now we present the definition of weak solutions to (2.1).
Definition 2.2
(Weak solutions to Itô SDEs) Given a linear connection on M, a weak solution of the Itô SDE (2.1) is a triple (X, W), \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where
- (i) \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past (i.e., nondecreasing) filtration of \({\mathcal {F}}\) satisfying the usual conditions,
- (ii) \(X = \{X(t)\}_{t\in [t_0,\tau )}\) is a continuous, \(\{{\mathcal {P}}_t\}\)-adapted M-valued process with \(\{{\mathcal {P}}_t\}\)-stopping time \(\tau >t_0\), W is an N-dimensional \(\{{\mathcal {P}}_t\}\)-Brownian motion, and
- (iii) for every \(q\in M\), \(t\ge t_0\) and any coordinate chart \((U,(x^i))\) of q, it holds under the conditional probability \({\textbf{P}}(\cdot | X(t_0) = q)\) that almost surely in the event \(\{X(t)\in U\}\),
$$\begin{aligned} X^i(t)= & {} X^i(t_0) + \int _{t_0}^t \left( b^i(s,X(s)) - \frac{1}{2} \sum _{r=1}^N \Gamma ^i_{jk}(X(s)) \left( \sigma ^j_r \sigma ^k_r\right) (s,X(s)) \right) {\textrm{d}}s \\{} & {} + \int _{t_0}^t \sigma ^i_r(s,X(s)) {\textrm{d}}W^r(s). \end{aligned}$$
Definition 2.3
(Uniqueness in law) We say that uniqueness in the sense of probability law holds for the Itô SDE (2.1) if, for any two weak solutions (X, W), \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), and \(({\hat{X}}, {\hat{W}})\), \(({\hat{\Omega }},{\hat{{\mathcal {F}}}},{\hat{{\textbf{P}}}})\), \(\{{\hat{{\mathcal {P}}}}_t\}_{t\in {\mathbb {R}}}\) with the same initial data, i.e., \({\textbf{P}}(X(t_0)=x_0) = {\hat{{\textbf{P}}}}({\hat{X}}(t_0)=x_0) = 1\), the two processes X and \({\hat{X}}\) have the same law.
Note that it is possible to change \(\sigma \) and W in the Itô SDE (2.1) but keep the same weak solution in law. In other words, the form of (2.1) does not univocally correspond to its weak solution in law. For this reason, we will reformulate SDEs in a fashion that makes them look more like ODEs and have better geometric nature. Moreover, we will see that it is the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) that univocally corresponds to the weak solution of (2.1).
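A minimal sketch of this non-uniqueness, assuming a constant orthogonal matrix \(O\in \textrm{O}(N)\) (an illustrative choice, not taken from the text): rotating the noise and the diffusion coefficients simultaneously changes neither the stochastic integral nor the tensor \(\sigma \circ \sigma ^*\).

```latex
\begin{aligned}
\sigma'^{\,i}_{r} &:= \sigma^{i}_{s}\,O^{s}{}_{r}, \qquad
W'^{\,r} := (O^{-1})^{r}{}_{s}\,W^{s}, \\
\sigma'^{\,i}_{r}\,\mathrm{d}W'^{\,r} &= \sigma^{i}_{s}\,\mathrm{d}W^{s}, \qquad
\sum_{r=1}^{N}\sigma'^{\,i}_{r}\,\sigma'^{\,j}_{r}
  = \sum_{s=1}^{N}\sigma^{i}_{s}\,\sigma^{j}_{s} .
\end{aligned}
```

Since \(W'\) is again an N-dimensional Brownian motion and the Christoffel correction in the drift depends on \(\sigma \) only through \(\sum _r \sigma ^j_r\sigma ^k_r\), the pairs \((\sigma ,W)\) and \((\sigma ',W')\) give the same process X.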
2.2 Mean Derivatives and Mean Differential Equations on Manifolds
In this part, we will recall the definitions of Nelson’s mean derivatives and extend them to M-valued processes. In Nelson’s stochastic mechanics (Nelson 2001), the probability space \((\Omega , {\mathcal {F}}, {\textbf{P}})\) is equipped with two different filtrations. The first one is just a usual nondecreasing filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\), a past filtration. The second is a family of nonincreasing sub-\(\sigma \)-fields of \({\mathcal {F}}\), which is denoted by \(\{{\mathcal {F}}_t\}_{t \in {\mathbb {R}}}\) and called a future filtration. For an \({\mathbb {R}}^d\)-valued process \(\{X(t)\}_{t \in I}\), its forward mean derivative DX and forward quadratic mean derivative QX are defined by conditional expectations as follows:
$$\begin{aligned} DX(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X(t+\epsilon )-X(t)}{\epsilon } \Big | {\mathcal {P}}_t \right] , \qquad QX(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X(t+\epsilon )-X(t)) \otimes (X(t+\epsilon )-X(t))}{\epsilon } \Big | {\mathcal {P}}_t \right] . \end{aligned}$$
Their backward versions, i.e., the backward mean derivative and backward quadratic mean derivative, are defined as follows:
In our present paper, we will only focus on the “forward” case, so that only the past filtration \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\) will be invoked. The “backward” case is analogous and every part of this paper can have its “backward” counterpart (cf. Zambrini 2015).
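As a sketch of the forward derivatives in the simplest setting, assume (purely for illustration) that X is an \({\mathbb {R}}^d\)-valued Itô process with bounded continuous coefficients b and \(\sigma \), driven by a \(\{{\mathcal {P}}_t\}\)-Brownian motion W. Taking conditional expectations of the increment and of its tensor square, divided by \(\epsilon \), then recovers the two coefficients:

```latex
\begin{aligned}
X(t+\epsilon) - X(t)
  &= \int_t^{t+\epsilon} b(s,X(s))\,\mathrm{d}s
   + \int_t^{t+\epsilon} \sigma(s,X(s))\,\mathrm{d}W(s), \\
DX(t)
  &= \lim_{\epsilon\to 0^+} \mathbf{E}\!\left[ \frac{1}{\epsilon}
     \int_t^{t+\epsilon} b(s,X(s))\,\mathrm{d}s \,\Big|\, \mathcal{P}_t \right]
   = b(t,X(t)), \\
QX(t)
  &= \lim_{\epsilon\to 0^+} \mathbf{E}\!\left[ \frac{1}{\epsilon}
     \int_t^{t+\epsilon} (\sigma\sigma^{\top})(s,X(s))\,\mathrm{d}s \,\Big|\, \mathcal{P}_t \right]
   = (\sigma\sigma^{\top})(t,X(t)).
\end{aligned}
```

The stochastic integral drops out of DX because its conditional expectation given \({\mathcal {P}}_t\) vanishes. In QX, the Itô isometry turns it into the time integral of \(\sigma \sigma ^{\top }\), while the terms involving the drift are of order \(\epsilon ^{3/2}\) or smaller and do not survive the division by \(\epsilon \).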
Denote by \(\textrm{Sym}^2(TM)\) (and \(\textrm{Sym}^2_+(TM)\)) the fiber bundle of symmetric (and respectively, symmetric positive semi-definite) (2, 0)-tensors on M. Now we define quadratic mean derivatives for M-valued semimartingales, cf. Gliklikh (2011, Chapter 9).
Definition 2.4
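(Quadratic mean derivatives) The (forward) quadratic mean derivative of the M-valued semimartingale \(\{X(t)\}_{t \in [t_0,\tau )}\) is a \(\textrm{Sym}^2_+(TM)\)-valued process QX on \([t_0,\tau )\), whose value at time \(t\in [t_0,\tau )\) in any coordinate chart \((U,(x^i))\) and in the event \(\{X(t) \in U\}\) is given by
$$\begin{aligned} (QX)^{ij}(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{(X^i(t+\epsilon )-X^i(t)) (X^j(t+\epsilon )-X^j(t))}{\epsilon } \Big | {\mathcal {P}}_t \right] , \end{aligned}$$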
where the limits are assumed to exist in \(L^1(\Omega , {\mathcal {F}}, {\textbf{P}})\).
More generally, we can define the (forward) quadratic mean derivative for two M-valued semimartingales X and Y in local coordinates by
Due to Itô’s formula for semimartingales, QX(t) does transform as a (2, 0)-tensor and is obviously symmetric, so that the definition is independent of the choice of U. However, the formal limit \({\textbf{E}}[ \frac{1}{\epsilon } (X^i(t+\epsilon )-X^i(t)) | {\mathcal {P}}_t ]\) under any coordinates \((x^i)\), no longer transforms as a vector, as can be guessed from (2.4). In order to turn it into a vector we need to specify a coordinate system. A natural choice is the normal coordinate system. For this purpose, we endow M with a linear connection \(\nabla \), which determines a normal coordinate system near each point on M.
Definition 2.5
(\(\nabla \)-mean derivatives) Given a linear connection \(\nabla \) on M, the (forward) \(\nabla \)-mean derivative of the M-valued semimartingale \(\{X(t)\}_{t \in [t_0,\tau )}\) is a TM-valued process \(D_\nabla X\) on \([t_0,\tau )\), whose value at time \(t\in [t_0,\tau )\) is defined in normal coordinates \((x^i)\) on the normal neighborhood U of \(q\in M\) and under the conditional probability \({\textbf{P}}(\cdot | X(t) = q)\) as follows:
$$\begin{aligned} (D_\nabla X)^i(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X^i(t+\epsilon )-X^i(t)}{\epsilon } \Big | {\mathcal {P}}_t \right] , \end{aligned}$$
where the limits are assumed to exist in \(L^1(\Omega , {\mathcal {F}}, {\textbf{P}})\).
As we force \(D_\nabla X(t)\) to be vector-valued by definition, its coordinate expression under any other coordinate system can be calculated via Leibniz’s rule. Let us stress that the notation \(D_\nabla \) should not be confused with the one of covariant derivatives in geometry.
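Now we formally take forward mean derivatives in Itô SDE (2.1), and note that the correction term in the modified drift involving Christoffel symbols vanishes by (2.2). Then, we get an ODE-like system:
$$\begin{aligned} D_\nabla X(t) = b(t,X(t)), \qquad QX(t) = (\sigma \circ \sigma ^*)(t,X(t)). \end{aligned} \tag{2.6}$$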
We call Eq. (2.6) a system of mean differential equations (MDEs). Note that both MDEs (2.6) and Itô SDE (2.1) rely on linear connections on M.
Definition 2.6
(Solutions to MDEs) Given a linear connection on M, a solution of MDEs (2.6) is a triple X, \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where
- (i) \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,
- (ii) \(X = \{X(t)\}_{t\in [t_0,\tau )}\) is a continuous, \(\{{\mathcal {P}}_t\}\)-adapted M-valued semimartingale with lifetime a \(\{{\mathcal {P}}_t\}\)-stopping time \(\tau >t_0\), and
- (iii) the \(\nabla \)-mean derivative and quadratic mean derivative of X exist and satisfy (2.6).
2.3 Second-Order Operators and Martingale Problems
Definition 2.7
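(Second-order operators) A second-order operator on M is a linear operator \(A: C^\infty (M) \rightarrow C^\infty (M)\), which has the following expression in a coordinate chart \((U,(x^i))\),
$$\begin{aligned} Af = A^i \frac{\partial f}{\partial x^i} + \frac{1}{2} A^{ij} \frac{\partial ^2 f}{\partial x^i \partial x^j}, \end{aligned} \tag{2.7}$$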
where \((A^{ij})\) is a symmetric (2, 0)-tensor field, and the expression is required to be invariant under changes of coordinates. If \((A^{ij})\) is positive semi-definite, then we say the second-order operator A is elliptic; if \((A^{ij})\) is positive definite, we say A is nondegenerate elliptic.
There is a coordinate-free definition of second-order operators. A linear map \(A_q: C^\infty (M) \rightarrow {\mathbb {R}}\) is called a second-order derivation at \(q\in M\), if there is a symmetric (2, 0)-tensor \(\Gamma _{A_q}\) at q such that \(A_q(fg) = f(q) A_q g + g(q) A_qf + (df\otimes dg) (\Gamma _{A_q})\) for all \(f,g \in C^\infty (M)\). Then, a second-order operator is nothing but a smooth field of second-order derivations. From this, we see that for A in (2.7), \(A^i = A(x^i)\), \(A^{ij} = A(x^i x^j) - x^iA(x^j) - x^jA(x^i)\), and
$$\begin{aligned} \Gamma _A = A^{ij} \frac{\partial }{\partial x^i} \otimes \frac{\partial }{\partial x^j}. \end{aligned}$$
We call \(\Gamma _A\) the squared field operator (originally “opérateur carré du champ”) associated with A. We also denote \(\Gamma _A(f,g):= (df\otimes dg) (\Gamma _A)\). Clearly, for a classical vector field V, \(\Gamma _V \equiv 0\) by Leibniz’s rule.
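As a worked example of the squared field operator, take the illustrative case \(M={\mathbb {R}}^d\) and \(A=\frac{1}{2}\Delta \) (an assumption made only for this sketch). The product rule for the Laplacian gives:

```latex
\begin{aligned}
\tfrac{1}{2}\Delta(fg)
  &= f\,\tfrac{1}{2}\Delta g + g\,\tfrac{1}{2}\Delta f
   + \sum_{i=1}^{d} \frac{\partial f}{\partial x^i}\,\frac{\partial g}{\partial x^i}, \\
\Gamma_{A}(f,g)
  &= \sum_{i=1}^{d} \frac{\partial f}{\partial x^i}\,\frac{\partial g}{\partial x^i},
\qquad
\Gamma_{A} = \delta^{ij}\,\frac{\partial}{\partial x^i}\otimes\frac{\partial}{\partial x^j}.
\end{aligned}
```

So \(\Gamma _A\) measures exactly the failure of Leibniz’s rule for A. For instance, with \(d=1\) and \(f=g=x\), one has \(A(x^2)=1\) while \(2x\,A(x)=0\), and the difference is \(\Gamma _A(x,x)=1\).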
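It is easy to verify from the coordinate-change invariance that the coefficients \(A^i\)’s and \(A^{ij}\)’s transform under the change of coordinates from \((x^i)\) to \(({\tilde{x}}^j)\) by the following rule (e.g., Ikeda and Watanabe 1989, Section V.4),
$$\begin{aligned} {\tilde{A}}^i = A^j \frac{\partial {\tilde{x}}^i}{\partial x^j} + \frac{1}{2} A^{jk} \frac{\partial ^2 {\tilde{x}}^i}{\partial x^j \partial x^k}, \qquad {\tilde{A}}^{ij} = A^{kl} \frac{\partial {\tilde{x}}^i}{\partial x^k} \frac{\partial {\tilde{x}}^j}{\partial x^l}. \end{aligned} \tag{2.9}$$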
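The formal generator of Itô SDE (2.1) is given by,
$$\begin{aligned} A^X_t = {\mathfrak {b}}^i(t,\cdot ) \frac{\partial }{\partial x^i} + \frac{1}{2} (\sigma \circ \sigma ^*)^{ij}(t,\cdot ) \frac{\partial ^2}{\partial x^i \partial x^j}, \end{aligned} \tag{2.10}$$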
which is a time-dependent second-order elliptic operator due to the change-of-coordinate formula (2.4).
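The change-of-coordinate behaviour of second-order operators can be checked by hand in one dimension. The following sketch assumes the illustrative operator \(A=\frac{1}{2}\frac{{\textrm{d}}^2}{{\textrm{d}}x^2}\) on \({\mathbb {R}}\) and the new coordinate \({\tilde{x}}=e^x\) on \((0,\infty )\), and uses the identities \(A^i=A(x^i)\) and \(A^{ij}=A(x^ix^j)-x^iA(x^j)-x^jA(x^i)\) in the new chart:

```latex
\begin{aligned}
\tilde{A}^{1} &= A(\tilde{x}) = \tfrac{1}{2}\,e^{x} = \tfrac{1}{2}\,\tilde{x}, \\
\tilde{A}^{11} &= A(\tilde{x}^{2}) - 2\,\tilde{x}\,A(\tilde{x})
  = 2\,\tilde{x}^{2} - \tilde{x}^{2} = \tilde{x}^{2}, \\
A &= \tfrac{1}{2}\,\tilde{x}\,\frac{\mathrm{d}}{\mathrm{d}\tilde{x}}
   + \tfrac{1}{2}\,\tilde{x}^{2}\,\frac{\mathrm{d}^{2}}{\mathrm{d}\tilde{x}^{2}} .
\end{aligned}
```

The second-order coefficient transforms tensorially, \({\tilde{A}}^{11}=(\partial {\tilde{x}}/\partial x)^2A^{11}\), whereas the first-order coefficient \({\tilde{A}}^1=\frac{1}{2}{\tilde{x}}\) is nonzero although \(A^1=0\). The first-order part alone therefore does not define a vector field.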
Denote by \({\mathcal {C}}_{t_0}\) the subspace of \(C([t_0,\infty ),M^*)\) consisting of all paths always staying in M or eventually stopped at \(\partial _M\). That is, \(\omega \in {\mathcal {C}}_{t_0}\) if and only if there exists \(\tau (\omega )\in (t_0,\infty ]\) such that \(\omega (t)\in M\) for \(t\in [t_0,\tau (\omega ))\) and \(\omega (t) = \partial _M\) for \(t\in [\tau (\omega ),\infty )\). Let \({\mathcal {B}}({\mathcal {C}}_{t_0})\) be the \(\sigma \)-field generated by Borel cylinder sets. Let \(X(t): {\mathcal {C}}_{t_0}\rightarrow M^*, X(t,\omega ) = \omega (t), t\ge t_0\) be the coordinate mapping. For each \(t\in {\mathbb {R}}\), define a sub-\(\sigma \)-field by \({\mathcal {B}}_t = \sigma \{X(s): t_0 \le s\le t_0\vee t\}\). Then, \(\{{\mathcal {B}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {B}}({\mathcal {C}}_{t_0})\) and \(\tau \) is a \(\{{\mathcal {B}}_t\}\)-stopping time.
Definition 2.8
(Martingale problems on manifolds, Hsu 2002, Definition 1.3.1) Given a time-dependent second-order elliptic operator \(A=(A_t)_{t\ge t_0}\), a solution to the martingale problem associated with A is a triple X, \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\), where
- (i) \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\in {\mathbb {R}}}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,
- (ii) \(X:\Omega \rightarrow {\mathcal {C}}_{t_0}\) is an \(M^*\)-valued \(\{{\mathcal {P}}_t\}\)-semimartingale, and
- (iii) for every \(f\in C^\infty ({\mathbb {R}}\times M)\), the process \(M^{f,X}(t):= f(t,X(t)) - f(t_0,X(t_0)) - \int _{t_0}^t (\frac{\partial }{\partial t}+A_s) f(s,X(s)) {\textrm{d}}s\), \(t\in [t_0,\tau (X))\), is a real-valued continuous \(\{{\mathcal {P}}_t\}\)-martingale.
The process \(\{X(t)\}_{t\in [t_0,\tau (X))}\) is called an M-valued \(\{{\mathcal {P}}_t\}\)-diffusion process with generator A (or simply an A-diffusion).
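A familiar instance of condition (iii), given here only as an illustrative sketch: take \(M={\mathbb {R}}\), let \(X=W\) be a standard Brownian motion started at time \(t_0\), so that \(A=\frac{1}{2}\frac{\partial ^2}{\partial x^2}\), and choose the test function \(f(t,x)=x^2-t\).

```latex
\begin{aligned}
\Big(\frac{\partial}{\partial t} + A\Big) f(t,x)
  &= -1 + \tfrac{1}{2}\cdot 2 = 0, \\
M^{f,W}(t)
  &= \big(W(t)^{2} - t\big) - \big(W(t_0)^{2} - t_0\big) .
\end{aligned}
```

The integral term in \(M^{f,W}\) vanishes identically, and the definition reduces to the classical statement that \(W(t)^2-t\) is a martingale.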
The uniqueness in the sense of probability law for both MDEs and martingale problems can be defined in a similar fashion to Definition 2.3. Note that unlike Itô SDEs or MDEs, the definition for martingale problems does not rely on linear connections.
When provided with a linear connection on M, one can see, in the same way as in Stroock and Varadhan’s theory (e.g., Karatzas and Shreve 1991, Section 5.4), that the existence of a solution to the martingale problem associated with \(A^X=(A^X_t)_{t\ge t_0}\) in (2.10) is equivalent to the existence of a weak solution to the Itô SDE (2.1), and also to the existence of a solution to the MDEs (2.6); the corresponding notions of uniqueness in law are also equivalent.
2.4 The Second-Order Tangent Bundle
As we have seen, the modified drift \({\mathfrak {b}}\) in (2.3) is not a vector field. Is \({\mathfrak {b}}\) a section of some bundle and, if so, of which one? In fact, it is not a section of any bundle, as its change-of-coordinates formula (2.4) involves \(\sigma \). But if we look at the formal generator \(A^X\) in (2.10), or the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) of its coefficients, then we can construct a bundle whose structure group is governed by the change-of-coordinates formulae (2.9), so that its sections are just second-order operators.
We denote by \(\textrm{Sym}^2({\mathbb {R}}^d)\) the space of all symmetric (2, 0)-tensors on \({\mathbb {R}}^d\), and by \(\textrm{Sym}^2_+({\mathbb {R}}^d)\) the subspace of it consisting of all positive semi-definite (2, 0)-tensors. Also denote by \({\mathcal {L}}({\mathbb {R}}^n,{\mathbb {R}}^d)\) the space of all linear maps from \({\mathbb {R}}^n\) to \({\mathbb {R}}^d\).
Definition 2.9
(The second-order tangent bundle)
-
(i)
Gliklikh (2011, Definition 7.14) The Itô group \(G_I^d\) is the Cartesian product (but not direct product of groups) \(\textrm{GL}(d,{\mathbb {R}}) \times {\mathcal {L}}({\mathbb {R}}^d\otimes {\mathbb {R}}^d,{\mathbb {R}}^d)\) equipped with the following binary operation:
$$\begin{aligned} (g_2, \kappa _2) \circ (g_1, \kappa _1) = (g_2\circ g_1, g_2\circ \kappa _1 + \kappa _2\circ (g_1\otimes g_1)), \end{aligned}$$for all \(g_1, g_2 \in \textrm{GL}(d,{\mathbb {R}})\), \(\kappa _1, \kappa _2\in {\mathcal {L}}({\mathbb {R}}^d\otimes {\mathbb {R}}^d,{\mathbb {R}}^d)\).
-
(ii)
The left group action of \(G_I^d\) on \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\) is defined by
$$\begin{aligned} (g, \kappa )\cdot ({\mathfrak {b}}, a) = (g{\mathfrak {b}} + \kappa a, (g\otimes g) a), \end{aligned}$$(2.11)for all \((g, \kappa ) \in G_I^d\), \({\mathfrak {b}}\in {\mathbb {R}}^d\), \(a\in \textrm{Sym}^2({\mathbb {R}}^d)\).
-
(iii)
The second-order tangent bundle \(({\mathcal {T}}^O M, \tau ^O_M, M)\) is the fiber bundle with base space M, typical fiber \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\), and structure group \(G_I^d\).
-
(iv)
The fiber \({\mathcal {T}}^O_q M\) at \(q\in M\) is called second-order tangent space to M at q. An element \(({\mathfrak {b}}, a)_q\in {\mathcal {T}}^O_q M\) is called a second-order tangent vector at q. A (global or local) section of \(\tau ^O_M\) is called a second-order vector field.
-
(v)
Denote by \({\mathcal {T}}^E M\) the subbundle of \({\mathcal {T}}^O M\) consisting of all elements \(({\mathfrak {b}}, a)_q\in {\mathcal {T}}^O_q M\), \(q\in M\), with \(a_q\) a positive semi-definite (2, 0)-tensor. Let \(\tau ^E_M = \tau ^O_M|_{{\mathcal {T}}^E M}\). We call \(({\mathcal {T}}^E M, \tau ^E_M, M)\) the second-order elliptic tangent bundle.
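The composition law of the Itô group and its action (2.11) can be checked numerically. The following is a minimal sketch in plain Python, assuming \(d=2\) and storing \(\kappa \) as a nested list `kappa[i][j][k]`; all function names are illustrative and not taken from the paper. On random data it checks that the operation is associative, that \((\textbf{Id}, 0)\) is the unit, and that (2.11) is compatible with composition, i.e., is a left action.

```python
import random

D = 2            # dimension d (assumption: small, to keep the loops readable)
R = range(D)

def rand_g(rng):
    # Diagonally dominant perturbation of the identity, hence invertible.
    return [[(1.0 if i == j else 0.0) + 0.2 * rng.uniform(-1, 1) for j in R] for i in R]

def rand_kappa(rng):
    # kappa[i][j][k]: a linear map from R^d (x) R^d to R^d.
    return [[[rng.uniform(-1, 1) for _ in R] for _ in R] for _ in R]

def rand_sym(rng):
    # Random symmetric (2, 0)-tensor a[j][k].
    m = [[rng.uniform(-1, 1) for _ in R] for _ in R]
    return [[0.5 * (m[j][k] + m[k][j]) for k in R] for j in R]

def compose(x2, x1):
    # (g2, k2) o (g1, k1) = (g2 g1, g2 o k1 + k2 o (g1 (x) g1))
    (g2, k2), (g1, k1) = x2, x1
    g = [[sum(g2[i][l] * g1[l][j] for l in R) for j in R] for i in R]
    k = [[[sum(g2[i][l] * k1[l][j][m] for l in R)
           + sum(k2[i][l][n] * g1[l][j] * g1[n][m] for l in R for n in R)
           for m in R] for j in R] for i in R]
    return g, k

def act(x, v):
    # (g, k) . (b, a) = (g b + k a, (g (x) g) a), cf. (2.11)
    (g, k), (b, a) = x, v
    b_new = [sum(g[i][j] * b[j] for j in R)
             + sum(k[i][j][m] * a[j][m] for j in R for m in R) for i in R]
    a_new = [[sum(g[i][j] * g[l][m] * a[j][m] for j in R for m in R) for l in R] for i in R]
    return b_new, a_new

def flatten(obj):
    if isinstance(obj, (list, tuple)):
        return [x for item in obj for x in flatten(item)]
    return [obj]

def close(u, v, tol=1e-9):
    fu, fv = flatten(u), flatten(v)
    return len(fu) == len(fv) and all(abs(p - q) <= tol for p, q in zip(fu, fv))

rng = random.Random(0)
x1, x2, x3 = [(rand_g(rng), rand_kappa(rng)) for _ in range(3)]
v = ([rng.uniform(-1, 1) for _ in R], rand_sym(rng))
identity = ([[1.0 if i == j else 0.0 for j in R] for i in R],
            [[[0.0 for _ in R] for _ in R] for _ in R])

assoc_ok = close(compose(compose(x3, x2), x1), compose(x3, compose(x2, x1)))
action_ok = close(act(compose(x2, x1), v), act(x2, act(x1, v)))
unit_ok = close(compose(identity, x1), x1) and close(act(identity, v), v)
print(assoc_ok, action_ok, unit_ok)
```

The second component of the product mixes \(g\) and \(\kappa \), which is why the group is only a Cartesian product as a set and not a direct product of groups.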
Remark 2.10
-
(i)
We indulge in some abuse of notions. For example, the second-order vector fields should not be confused with the semisprays which are sections of the double tangent bundle \(T^2M\) (e.g., Saunders 1989, Section 1.4; Lang 1999, Section IV.3).
-
(ii)
Some authors simply define second-order vector fields as second-order operators as in Definition 2.7 (Emery 1989, Definition 6.3 or Gliklikh 2011, Definition 2.74). As soon as we choose a frame for \({\mathcal {T}}^O M\), it will be clear that second-order vector fields are identified with second-order operators.
-
(iii)
The authors in Belopolskaya and Dalecky (1990), Gliklikh (2011) define a bundle which has the Itô group as its structure group and has the pair \(({\mathfrak {b}}, \sigma )\) of coefficients in Itô SDE (2.1) as its section. They name it Itô’s bundle and denote it as \({\mathcal {I}} M\). The difference is that, in our formulation, the pair \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\) of coefficients of the generator of Itô SDE (2.1) is a section of second-order elliptic tangent bundle \(\tau ^E_M\). The advantage of the bundle \(\tau ^E_M\) is that it is a natural generalization of tangent bundle to second-order and has a good geometric interpretation, as we will see in Proposition 3.2.
-
(iv)
Note that the typical fiber \({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d)\) of \(\tau ^O_M\) is a vector space of dimension \(d+\frac{d(d+1)}{2}\). But \(\tau ^O_M\) is not a vector bundle, since its structure group \(G_I^d\) is not a linear group (i.e., a subgroup of a general linear group). The typical fiber of \(\tau ^E_M\) is \({\mathbb {R}}^d \times \textrm{Sym}^2_+({\mathbb {R}}^d)\), which is not even a vector space, so that \(\tau ^E_M\) is not a vector bundle either. Indeed, we may call them quadratic bundles, as Itô’s bundle is called in Belopolskaya and Dalecky (1990, Chapter 4).
-
(v)
The Itô’s bundle \({\mathcal {I}} M\) defined in Gliklikh (2011, Definition 7.17) is the fiber bundle over manifold M, with fiber \({\mathbb {R}}^d \times {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\) and structure group \(G_I^d\) which acts on the fiber from the left by
$$\begin{aligned} (g, \kappa )({\mathfrak {b}}, \sigma ) = \left( g{\mathfrak {b}} + \textstyle {{\frac{1}{2}}} \textrm{tr}\,(\kappa \circ (\sigma \otimes \sigma )), g \circ \sigma \right) , \end{aligned}$$for all \((g, \kappa ) \in G_I^d\), \({\mathfrak {b}}\in {\mathbb {R}}^d\), \(\sigma \in {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\). For the same reason as \({\mathcal {T}}^O M\) or \({\mathcal {T}}^E M\), Itô’s bundle \({\mathcal {I}} M\) is not a vector bundle. There is a bundle homomorphism over M from \({\mathcal {I}} M\) to \({\mathcal {T}}^E M\), which maps in fibers from \({\mathcal {I}}_q M\) to \({\mathcal {T}}^E_q M\), \(q\in M\), by \(({\mathfrak {b}}, \sigma ) \mapsto ({\mathfrak {b}}, \sigma \circ \sigma ^*)\). It is easy to see that this bundle homomorphism is also a surjective submersion. If we identify \(g\in \textrm{GL}(d,{\mathbb {R}})\) with \((g,0) \in G_I^d\), then \(\textrm{GL}(d,{\mathbb {R}})\) is a subgroup of \(G_I^d\). We define the Stratonovich’s bundle \({\mathcal {S}} M\) to be the reduction of \({\mathcal {I}} M\) to the structure group \(\textrm{GL}(d,{\mathbb {R}})\), that is, the fiber bundle over M, with fiber \({\mathbb {R}}^d \times {\mathcal {L}}({\mathbb {R}}^N,{\mathbb {R}}^d)\) and structure group \(\textrm{GL}(d,{\mathbb {R}})\) which acts on the fiber from the left by
$$\begin{aligned} g({\mathfrak {b}}, \sigma ) = (g{\mathfrak {b}}, g \circ \sigma ). \end{aligned}$$Unlike \({\mathcal {T}}^O M\) or \({\mathcal {I}} M\), Stratonovich’s bundle \({\mathcal {S}} M\) is indeed a vector bundle, and the tangent bundle TM is a vector subbundle of \({\mathcal {S}} M\). It can be expected that Stratonovich’s bundle is a natural bundle to formulate Stratonovich SDEs. But, in this paper, we mainly focus on Itô SDEs and their generators.
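It is natural to regard the differential operators
$$\begin{aligned} \left\{ \frac{\partial }{\partial x^i},\ \frac{\partial ^2}{\partial x^j \partial x^k}: 1\le i\le d,\ 1\le j\le k\le d \right\} \end{aligned}$$
as a local frame of \({\mathcal {T}}^O M\) over the local chart \((U,(x^i))\) on M. In the sequel, we will usually shorten them by
$$\begin{aligned} \partial _i = \frac{\partial }{\partial x^i}, \qquad \partial _j\partial _k = \frac{\partial ^2}{\partial x^j \partial x^k}. \end{aligned}$$
We make the convention that \(\partial _k\partial _j = \partial _j\partial _k\) for all \(1\le j\le k \le d\). A second-order vector field \(({\mathfrak {b}},a)\) is expressed in terms of this local frame by
$$\begin{aligned} ({\mathfrak {b}},a) = {\mathfrak {b}}^i \partial _i + \textstyle {{\frac{1}{2}}} a^{jk} \partial _j\partial _k. \end{aligned}$$(2.12)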
In this way, every second-order vector field can be regarded as a second-order operator and vice versa. In particular, the generator \(A^X\) of an M-valued diffusion process X, for example the generator (2.10) of the Itô SDE, is a time-dependent second-order vector field, so that we can rewrite \(A^X\) as \(A^X_t = ({\mathfrak {b}}(t),(\sigma \circ \sigma ^*)(t))\).
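The tangent bundle TM is a subbundle (but not a vector subbundle) and also an embedded submanifold of \({\mathcal {T}}^O M\), as the bundle monomorphism
$$\begin{aligned} \iota : TM \rightarrow {\mathcal {T}}^O M, \quad b^i \partial _i |_q \mapsto (b, 0)_q \end{aligned}$$(2.13)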
is also an embedding. However, there is no canonical bundle epimorphism from \({\mathcal {T}}^O M\) to TM which is a left inverse of \(\iota \) and linear in fiber. We call such a bundle epimorphism a fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM. The choice of such a bundle epimorphism is exactly the choice of a linear connection on M. More precisely, we have the following connection correspondence properties, the first of which can also be found in Gliklikh (2011, Section 2.9).
Proposition 2.11
(Connection correspondence) Any linear connection on M induces a fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM. Conversely, any fiber-linear bundle projection from \({\mathcal {T}}^O M\) to TM induces a torsion-free linear connection on M.
Remark 2.12
The connection correspondence is similar to the correspondence between horizontal subbundles of the tangent bundle of a vector bundle and connections on this vector bundle, cf. Saunders (1989, Section 3.1).
Proof
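Let \((\Gamma _{ij}^k)\) be the Christoffel symbols of a linear connection \(\nabla \) on M. Define a projection by
$$\begin{aligned} \varrho _\nabla : {\mathcal {T}}^O M \rightarrow TM, \quad ({\mathfrak {b}}, a)_q \mapsto \left( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q) \right) \partial _i |_q. \end{aligned}$$(2.14)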
Clearly, \(\varrho _\nabla \) is linear in fiber and \(\varrho _\nabla \circ \iota = \textbf{Id}_{TM}\). Conversely, let \(\varrho : {\mathcal {T}}^O M\rightarrow T M\) be a fiber-linear bundle projection. Then, on each coordinate chart \((U,(x^i))\) around \(q\in M\), there exists a smooth map \(B_U: U \rightarrow {\mathcal {L}}(\textrm{Sym}^2({\mathbb {R}}^d), {\mathbb {R}}^d)\), such that
$$\begin{aligned} \varrho (({\mathfrak {b}}, a)_q) = \left( {\mathfrak {b}}^i + (B_U(q) a)^i \right) \partial _i |_q. \end{aligned}$$
The family of maps \((B_U)\) determines a spray and then a torsion-free linear connection on M (see, e.g., Lang 1999, Section IV.3). The torsion-freeness follows from the symmetry of \(B_U\)’s. \(\square \)
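As a worked illustration of the projection \(({\mathfrak {b}}, a)_q \mapsto ({\mathfrak {b}}^i + \frac{1}{2} a^{jk}\Gamma ^i_{jk}(q))\partial _i|_q\) used in the proof (a standard computation, not taken from the paper), consider Brownian motion in the punctured plane written in polar coordinates \((r,\theta )\), with \(\nabla \) the flat Levi-Civita connection. Its generator is half the Laplacian, and the Christoffel correction cancels the modified drift exactly:

```latex
\begin{aligned}
\tfrac{1}{2}\Delta &= \tfrac{1}{2r}\,\partial_r + \tfrac{1}{2}\,\partial_r\partial_r
                     + \tfrac{1}{2r^2}\,\partial_\theta\partial_\theta,
  \qquad\text{so}\qquad
  {\mathfrak b} = \bigl(\tfrac{1}{2r},\,0\bigr),\quad
  a^{rr}=1,\ a^{\theta\theta}=\tfrac{1}{r^2},\ a^{r\theta}=0,\\
\Gamma^r_{\theta\theta} &= -r,\qquad
  \Gamma^\theta_{r\theta}=\Gamma^\theta_{\theta r}=\tfrac{1}{r},\qquad
  \text{all other } \Gamma^i_{jk}=0,\\
b^r &= {\mathfrak b}^r + \tfrac{1}{2}\,a^{jk}\Gamma^r_{jk}
     = \tfrac{1}{2r} + \tfrac{1}{2}\cdot\tfrac{1}{r^2}\cdot(-r) = 0,\\
b^\theta &= {\mathfrak b}^\theta + \tfrac{1}{2}\,a^{jk}\Gamma^\theta_{jk}
     = 0 + a^{r\theta}\,\Gamma^\theta_{r\theta} = 0 .
\end{aligned}
```

Hence the projection sends this generator to the zero vector, which is consistent with Brownian motion having no drift with respect to the flat connection, although its modified drift \({\mathfrak {b}}\) is nonzero in these coordinates.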
Observe that a group action of \(\textrm{GL}(d,{\mathbb {R}})\) on \(\textrm{Sym}^2({\mathbb {R}}^d)\) can be separated from (2.11), which is given by \(g\cdot a = (g\otimes g) a\). Thus, the second component a of each element \(( {\mathfrak {b}}, a ) \in {\mathcal {T}}^O_q M\) can be regarded as a (2, 0)-tensor. Recall that \(\textrm{Sym}^2(TM)\) denotes the bundle of symmetric (2, 0)-tensors on M. Then there is a canonical bundle epimorphism
$$\begin{aligned} {\hat{\varrho }}: {\mathcal {T}}^O M \rightarrow \textrm{Sym}^2(TM), \quad ({\mathfrak {b}}, a)_q \mapsto a^{jk} \partial _j \otimes \partial _k |_q, \end{aligned}$$(2.15)
whose kernel is the image of \(\iota \). Conversely, we also have a similar connection correspondence property for \(\textrm{Sym}^2(TM)\), as in Proposition 2.11. That is, a linear connection \(\nabla \) on M induces a fiber-linear bundle monomorphism from \(\textrm{Sym}^2(TM)\) to \({\mathcal {T}}^O M\), which is a right inverse of \({\hat{\varrho }}\) and given by
where \(\nabla ^2\) is the second covariant derivative (Petersen 2016, Subsection 2.2.2.3) [which is also called the Hessian operator when acting on smooth functions (Jost 2017)]. In other words, \(\nabla ^2_{\partial _i,\partial _j}|_q = {\hat{\iota }}_{\nabla }(dx^i \odot dx^j |_q)\), where \(\odot \) is the symmetrization operator on \(T^2 M\).
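Combining (2.13) and (2.14) together, we have the following short exact sequence:
$$\begin{aligned} 0 \longrightarrow TM \xrightarrow {\ \iota \ } {\mathcal {T}}^O M \xrightarrow {\ {\hat{\varrho }}\ } \textrm{Sym}^2(TM) \longrightarrow 0. \end{aligned}$$(2.17)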
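Proposition 2.11 and (2.15), (2.16) imply that when a linear connection \(\nabla \) is given, the sequence is also split, in the fiber-wise sense. The induced decomposition is
$$\begin{aligned} {\mathcal {T}}^O M = \iota (TM) \oplus {\hat{\iota }}_\nabla (\textrm{Sym}^2(TM)) \cong TM \oplus \textrm{Sym}^2(TM), \end{aligned}$$(2.18)
where both the first direct sum \(\oplus \) and the isomorphism \(\cong \) are in the fiber-wise sense (but not bundle isomorphism and Whitney sum), while the second direct sum is the Whitney sum; it is given by
$$\begin{aligned} ({\mathfrak {b}}, a)_q \mapsto (b_q, a_q), \end{aligned}$$(2.19)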
for \(b_q = ( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q) ) \partial _i |_q \in T_qM\). A similar short exact sequence as (2.17) holds with \({\mathcal {T}}^E M\) and \(\textrm{Sym}^2_+(TM)\) in place of \({\mathcal {T}}^O M\) and \(\textrm{Sym}^2(TM)\), respectively.
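That \(b_q\) is a genuine tangent vector, while \({\mathfrak {b}}\) is not, can be checked numerically. The sketch below is plain Python with \(d=2\). It assumes the usual transformation rules under a change of chart with Jacobian \(g\) and Hessian \(h\) (Itô's formula for \(({\mathfrak {b}},a)\), and the classical rule for Christoffel symbols), which we take to be what (2.9) encodes. The names `rho`, `g`, `h` are illustrative.

```python
import random

D = 2
R = range(D)
rng = random.Random(1)

def rand_sym():
    # Random symmetric d x d array.
    m = [[rng.uniform(-1, 1) for _ in R] for _ in R]
    return [[0.5 * (m[j][k] + m[k][j]) for k in R] for j in R]

def inv2(g):
    # Inverse of a 2 x 2 matrix.
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def rho(gamma, b_frak, a):
    # rho_nabla(b_frak, a)^i = b_frak^i + 1/2 a^{jk} Gamma^i_{jk}
    return [b_frak[i] + 0.5 * sum(a[j][k] * gamma[i][j][k] for j in R for k in R)
            for i in R]

# Data in the old chart: Christoffel symbols and a second-order tangent vector.
gamma = [[[rng.uniform(-1, 1) for _ in R] for _ in R] for _ in R]
b_frak = [rng.uniform(-1, 1) for _ in R]
a = rand_sym()

# Change of chart at the point: invertible Jacobian g, Hessian h[i][j][k] symmetric in (j, k).
g = [[(1.0 if i == j else 0.0) + 0.3 * rng.uniform(-1, 1) for j in R] for i in R]
h = [rand_sym() for _ in R]
gi = inv2(g)

# Assumed transformation rules for (b_frak, a) and for the Christoffel symbols.
b_frak_new = [sum(g[i][j] * b_frak[j] for j in R)
              + 0.5 * sum(h[i][j][k] * a[j][k] for j in R for k in R) for i in R]
a_new = [[sum(g[i][j] * g[l][m] * a[j][m] for j in R for m in R) for l in R] for i in R]
gamma_new = [[[sum((sum(g[k][m] * gamma[m][p][q] for m in R) - h[k][p][q])
                   * gi[p][i] * gi[q][j] for p in R for q in R)
               for j in R] for i in R] for k in R]

b_old = rho(gamma, b_frak, a)
b_new = rho(gamma_new, b_frak_new, a_new)
pushed = [sum(g[i][j] * b_old[j] for j in R) for i in R]   # Jacobian applied to b

vector_ok = all(abs(b_new[i] - pushed[i]) < 1e-9 for i in R)        # b transforms as a vector
section_ok = rho(gamma, b_frak, [[0.0] * D for _ in R]) == b_frak   # rho o iota = Id
print(vector_ok, section_ok)
```

The Hessian terms coming from \({\mathfrak {b}}\) and from \(\Gamma \) cancel, which is the coordinate-level reason why the choice of a connection splits the sequence.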
Now we introduce a subclass of semimartingales on manifolds which contains diffusions. We call the M-valued process \(X= \{X(t)\}_{t\in [t_0,\tau )}\) an Itô process, if there exists a \(\{{\mathcal {P}}_t\}\)-adapted continuous \({\mathcal {T}}^E M\)-valued process \(\{({\mathfrak {b}}, a)(t)\}_{t\in [t_0,\tau )}\) satisfying \(({\mathfrak {b}}, a)(t) \in {\mathcal {T}}^E_{X(t)} M\) for each \(t\in [t_0,\tau )\), such that for every \(f\in C^\infty ({\mathbb {R}}\times M)\), \(M^{f,X}(t):= f(t,X(t)) - f(t_0,X(t_0)) - \int _{t_0}^t (\frac{\partial }{\partial {t}}+ {\mathcal {A}}^X )f(s,X(s)) {\textrm{d}}s\), \(t\in [t_0,\tau )\) is a real-valued \(\{{\mathcal {P}}_t\}\)-martingale, where \({\mathcal {A}}^X_t = ({\mathfrak {b}}, a)(t) = {\mathfrak {b}}^i(t) \partial _i + \textstyle {{\frac{1}{2}}} a^{ij}(t) \partial _i\partial _j\). We call the process \(\{({\mathfrak {b}}, a)(t)\}_{t\in [t_0,\tau )}= \{{\mathcal {A}}^X_t\}_{t\in [t_0,\tau )}\) the random generator of X. A similar notion “Brownian semimartingale” is also used in the literature (e.g., Driver 1992). If X is a diffusion with generator \(A^X_t = ({\mathfrak {b}}(t), a(t))\), then it is an Itô process with random generator \({\mathcal {A}}^X_t = A^X_{(t,X(t))} = (\mathfrak b(t,X(t)), a(t,X(t)))\). The difference between Itô processes and diffusions is that the randomness of the random generator of the former can not only appear on the base manifold M, but also on the fibers.
Then, we can define forward mean derivatives in a coordinate-free way, without relying on linear connections.
Definition 2.13
(Mean derivatives) For an M-valued Itô process \(X= \{X(t)\}_{t\in [t_0,\tau )}\), we define its (forward) mean derivatives (DX(t), QX(t)) at time \(t\in [t_0,\tau )\) by
$$\begin{aligned} (DX(t), QX(t)) := ({\mathfrak {b}}(t), a(t)), \end{aligned}$$
where \(({\mathfrak {b}}, a)\) is the random generator of X.
Comparing with forward mean derivatives defined in local coordinates before, we have the following relations. The proof follows the lines of Gliklikh (2011, Lemma 9.4).
Lemma 2.14
Given an M-valued Itô process \(X= \{X(t)\}_{t\in [t_0,\tau )}\) and a coordinate chart \((U,(x^i))\) centered at \(q\in M\).
-
(i)
In the event \(\{X(t) \in U\}\), QX(t) has the coordinate expression (2.5) and
$$\begin{aligned} (D X)^i(t) = \lim _{\epsilon \rightarrow 0^+} {\textbf{E}}\left[ \frac{X^i(t+\epsilon )-X^i(t) }{\epsilon } \bigg | {\mathcal {P}}_t \right] . \end{aligned}$$ -
(ii)
Given a linear connection \(\nabla \) on M, we have, under the conditional probability \({\textbf{P}}(\cdot | X(t) = q)\), that
$$\begin{aligned} (D_\nabla X)^i(t) = (D X)^i(t) + \frac{1}{2} \Gamma ^i_{jk}(X(t)) (QX)^{jk}(t). \end{aligned}$$(2.20)
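It follows from (2.20) that the map \(\varrho _\nabla \) in (2.14) acts on the generator \(A^X\) of a diffusion X by
$$\begin{aligned} \varrho _\nabla \left( A^X_{(t,X(t))} \right) = \varrho _\nabla (DX(t), QX(t)) = D_\nabla X(t). \end{aligned}$$(2.21)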
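For a time-dependent second-order vector field \(A_t = ({\mathfrak {b}}(t), a(t))\), we can set up, by analogy with the MDEs (2.6), a new type of MDEs in terms of the mean derivatives (DX, QX):
$$\begin{aligned} \left\{ \begin{aligned}&DX(t) = {\mathfrak {b}}(t, X(t)), \\&QX(t) = a(t, X(t)). \end{aligned} \right. \end{aligned}$$(2.22)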
Then, similarly to Definitions 2.6 and 2.3, we may also define solutions and uniqueness in law for MDEs (2.22). We call a solution of (2.22) an integral process of \(A = (A_t)\). Note that the system (2.22) does not rely on linear connections. The equivalence of the well-posedness of (2.22) and the martingale problem in Definition 2.8 is easy to verify. When a linear connection is specified, the system (2.22) and martingale problem associated with \(A^X\) in (2.10) are both equivalent to the Itô SDE (2.1) and MDEs (2.6).
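To see what an MDE in terms of mean derivatives asks of a process, one can estimate \(DX\) and \(QX\) by Monte Carlo. The sketch below, in plain Python, takes \(M={\mathbb {R}}\) with the hypothetical coefficients \({\mathfrak {b}}(x)=-x\) and \(a(x)=1+x^2\), replaces the exact diffusion by a single Euler–Maruyama step, and uses the difference-quotient expression of Lemma 2.14 for \(DX\) together with the analogous quadratic quotient for \(QX\) (which we assume is what (2.5) states). It is an illustration of the definitions, not a method from the paper.

```python
import math
import random

def b_frak(x):
    return -x                 # hypothetical modified drift

def a_coef(x):
    return 1.0 + x * x        # hypothetical diffusion coefficient a = sigma^2

def mean_derivatives(x0, eps=1e-2, n=200_000, seed=0):
    """Estimate (DX, QX) at X(t) = x0 from n one-step Euler-Maruyama samples."""
    rng = random.Random(seed)
    s1 = s2 = 0.0
    for _ in range(n):
        dx = b_frak(x0) * eps + math.sqrt(a_coef(x0) * eps) * rng.gauss(0.0, 1.0)
        s1 += dx          # accumulates X(t + eps) - X(t)
        s2 += dx * dx     # accumulates (X(t + eps) - X(t))^2
    return s1 / (n * eps), s2 / (n * eps)

x0 = 0.5
DX, QX = mean_derivatives(x0)
print(DX, b_frak(x0))   # the two numbers should be close
print(QX, a_coef(x0))   # likewise, up to an O(eps) bias and sampling error
```

The estimate of \(DX\) is the noisier of the two, since its standard error scales like \(\sqrt{a/(n\epsilon )}\); shrinking \(\epsilon \) therefore requires more samples.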
3 Stochastic Jets
In classical differential geometry, a tangent vector to a manifold may be defined as an equivalence class of curves passing through a given point, where two curves are equivalent if they have the same derivative at that point (Lee 2013, Chapter 3). This idea can be generalized to higher-order cases, which leads to the notion of jets. The jet structures allow us to translate a system of differential equations to a system of algebraic equations, and make it more intuitive to study the symmetries of systems of differential equations.
In this section we shall generalize these ideas to the stochastic case. We will first give an equivalent description to the second-order elliptic tangent bundle \(\tau ^E_M\) by constructing an equivalence relation on diffusions. Then, we will define the stochastic jets and figure out the “jet-like” bundle structure involved in the space of stochastic jets. Finally, we shall see that the bundle structure is the appropriate platform to formulate SDEs intrinsically. In the next section, we will apply stochastic jets to study stochastic symmetries.
3.1 The Stochastic Tangent Bundle
Recall that a tangent vector can be represented as an equivalence class of smooth curves that have the same velocity at the base point. This leads to the following equivalent definition of the tangent bundle TM:
$$\begin{aligned} TM = \bigcup _{q\in M} C^\infty _{(0,q)}(M) \big / \sim , \end{aligned}$$(3.1)
where \(C^\infty _{(0,q)}(M)\) is the set of all smooth curves on M that pass through q at time \(t=0\), and the equivalence relation \(\sim \) is defined as follows: \(\gamma ,{\tilde{\gamma }}\in C^\infty _{(0,q)}(M)\) are equivalent if and only if \((f\circ \gamma )'(0)=(f\circ {\tilde{\gamma }})'(0)\) for every real-valued smooth function f defined in a neighborhood of q. If we replace smooth curves by diffusion processes, and time derivatives by mean derivatives, then we get the following definition.
Definition 3.1
(The stochastic tangent bundle) Two M-valued diffusion processes \(X=\{X(t)\}_{t\in [0,\tau )}\), \(Y=\{Y(t)\}_{t\in [0,\sigma )}\) are said to be stochastically equivalent at \((t,q)\in {\mathbb {R}}\times M\), if, almost surely, \(X(t)=Y(t)=q\) and \(D(f\circ X)(t) = D(f\circ Y)(t)\) for all \(f\in C^\infty (M)\). The equivalence class containing X is called the stochastic tangent vector of X at q and is denoted by \(j_{(t,q)} X\). When \(t=0\), we denote \(j_q X:= j_{(0,q)}X\) for short. Let \(I_{(t,q)}(M)\) be the set of all M-valued diffusion processes starting from q at time t. The stochastic tangent bundle of M is the set
$$\begin{aligned} {\mathcal {T}}^S M := \left\{ j_q X: X\in I_{(0,q)}(M),\ q\in M \right\} . \end{aligned}$$
Note that since X, Y are M-valued diffusion processes, f(X) and f(Y) are real-valued Itô processes, and hence their mean derivatives exist.
At this stage, we have not yet touched the jet-like formulation even though we used the jet-like notation \(j_q X\). Indeed, if one follows strictly the definition of jet bundles over the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), it is more rational to use the time line \({\mathbb {R}}\) as “source” and the manifold M as “target” (cf. Saunders 1989, Example 4.1.16). But here we just assign the “target” to the manifold M, because, roughly speaking, one can talk about the velocity of a smooth curve at a moment t, but not about the generator of a diffusion at a moment t. Instead, we can talk about the generator of a diffusion at a position \(q\in M\). Later on, we will define the “bona fide” stochastic jet space, which possesses the time line \({\mathbb {R}}\) as “source” and the manifold M as “target.”
Similarly to the one-to-one correspondence between tangent space and space of equivalence classes of smooth curves, we have the following:
Proposition 3.2
There is a one-to-one correspondence between the stochastic tangent bundle \({\mathcal {T}}^S M\) and the second-order elliptic tangent bundle \({\mathcal {T}}^E M\).
Proof
For an M-valued diffusion process \(X\in I_{(0,q)}(M)\), \(q\in M\), we denote by \(A^X\) its generator. Then, the map \(j_q X \mapsto A^X_{(0,q)} = (DX(0), QX(0))\) defines a one-to-one correspondence between \({\mathcal {T}}^S M\) and \({\mathcal {T}}^E M\). The inverse map is \(A_q = ({\mathfrak {b}},a)_q \mapsto j_q X^A\), where A is a section of \({\mathcal {T}}^E M\) (i.e., an elliptic second-order operator) smoothly extending the element \(A_q\in {\mathcal {T}}^E_q M\), and \(X^A\in I_{(0,q)}(M)\) is a diffusion process having A as its generator. \(\square \)
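The injectivity of the map in the proof can be made explicit in a chart \((U,(x^i))\) centered at q. The following short computation is a sketch of this step, under the assumption that the mean derivative of the real-valued Itô process \(f\circ X\) is its drift, as the martingale property in the definition of Itô processes suggests:

```latex
\begin{aligned}
D(f\circ X)(0) &= \bigl({\mathfrak b}^i\,\partial_i f
                 + \tfrac{1}{2}\,a^{jk}\,\partial_j\partial_k f\bigr)(q),
  && f\in C^\infty(M),\\
f = x^i:\quad D(x^i\circ X)(0) &= {\mathfrak b}^i(q),\\
f = x^j x^k:\quad D(x^j x^k\circ X)(0)
  &= \bigl({\mathfrak b}^j x^k + {\mathfrak b}^k x^j + a^{jk}\bigr)(q) = a^{jk}(q),
  && \text{since } x(q)=0 .
\end{aligned}
```

Thus two diffusions starting from q are stochastically equivalent exactly when their generators have the same coefficients \(({\mathfrak {b}},a)\) at q, and testing against the coordinate functions and their pairwise products already suffices.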
Therefore, the stochastic tangent bundle \({\mathcal {T}}^S M\) admits a smooth structure which makes it a smooth manifold diffeomorphic to \({\mathcal {T}}^E M\), and hence it is a bona fide fiber bundle over M. In the sequel, we will identify \({\mathcal {T}}^S M\) with \({\mathcal {T}}^E M\) without ambiguity. The projection map from \({\mathcal {T}}^S M\) to M will be denoted by \(\tau ^S_M\), that is, \(\tau ^S_M(j_q X) = q\) for any \(j_q X\in {\mathcal {T}}^S M\).
Definition 3.3
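(Canonical coordinate system on \({\mathcal {T}}^S M\)) Let \((U, (x^i))\) be a coordinate system on M. The induced canonical coordinate chart \((U^{(1)}, x^{(1)})\) on \({\mathcal {T}}^S M\) is defined by
$$\begin{aligned} U^{(1)} := \left\{ j_q X\in {\mathcal {T}}^S M: q\in U \right\} , \qquad x^{(1)} := (x^i, D^i x, Q^{jk} x), \end{aligned}$$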
where \(x^i(j_q X) = x^i(q)\), \(D^i x(j_q X) = (DX)^i(0)\) and \(Q^{jk} x(j_q X) = (Q X)^{jk}(0)\).
Our slightly ambiguous notations \(D^i x\) and \(Q^{jk} x\) are chosen so as to avoid the worse one \(Qx^{jk}\).
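When a linear connection \(\nabla \) is provided, we can also define the coordinates via the \(\nabla \)-mean derivative \(D_\nabla \) instead of D, as follows:
$$\begin{aligned} D^i_\nabla x(j_q X) := (D_\nabla X)^i(0). \end{aligned}$$
Then, \(x^{(1)}_\nabla := (x^i, D^i_\nabla x, Q^{jk} x)\) also forms a coordinate system on \({\mathcal {T}}^S M\), which we call the \(\nabla \)-canonical coordinate system. It follows from relation (2.20) that
$$\begin{aligned} D^i_\nabla x(j_q X) = D^i x(j_q X) + \textstyle {{\frac{1}{2}}} \Gamma ^i_{jk}(q)\, Q^{jk} x(j_q X). \end{aligned}$$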
Using the identification of elements \(j_q X \in {\mathcal {T}}^S_q M\) and \(({\mathfrak {b}},a)_q \in {\mathcal {T}}^E_q M\) via Proposition 3.2, as well as their relations with the element \((b_q, a_q)\in TM \oplus \textrm{Sym}^2(TM)\), via (2.19), we have \(D^i x(j_q X) = {\mathfrak {b}}^i\), \(D^i_\nabla x(j_q X) = b^i = \mathfrak b^i + \textstyle {{\frac{1}{2}}} a^{jk} \Gamma ^i_{jk}(q)\) and \(Q^{jk} x(j_q X) = a^{jk}\). In this way the fiber-linear bundle projection \(\varrho _\nabla \) of (2.14) maps, under the canonical coordinates \((x,\dot{x})\) on TM, as follows:
so that \(D_\nabla ^ix = \dot{x}^i \circ \varrho _\nabla \). Therefore, \((x^i, D_\nabla ^ix)\) is a partial coordinate system on \({\mathcal {T}}^S M\) that coincides with \((x^i,\dot{x}^i)\) when restricted on TM. Moreover, the decomposition in (2.19) yields the following expressions for second-order vector fields:
Similarly to Definition 3.1, we define a \(\nabla \)-dependent equivalence relation as follows:
Definition 3.4
Two M-valued diffusion processes \(X=\{X(t)\}_{t\in [0,\tau )}\), \(Y=\{Y(t)\}_{t\in [0,\sigma )}\) are said to be \(\nabla \)-stochastically equivalent at \((t,q)\in {\mathbb {R}}\times M\), if, almost surely, \(X(t)=Y(t)=q\) and \(D_\nabla X(t) = D_\nabla Y(t)\). The equivalence class containing X is called the \(\nabla \)-tangent vector of X at q and is denoted by \(j^\nabla _{(t,q)} X\). When \(t=0\), we denote \(j^\nabla _q X:= j^\nabla _{(0,q)}X\) for short.
Then, similarly to Proposition 3.2, one can show that the tangent bundle TM can be identified with the following set of equivalence classes of diffusions:
$$\begin{aligned} \left\{ j^\nabla _q X: X\in I_{(0,q)}(M),\ q\in M \right\} \end{aligned}$$(3.5)
via \(j_q^\nabla X\mapsto D_\nabla X(0)\). Under this identification, it follows from (2.21) that \(j_q^\nabla X = \varrho _\nabla (j_q X)\). Clearly, if we regard all smooth curves as special diffusions, then the partition determined by (3.1) is the restriction of the one determined by (3.5) to the set of all smooth curves.
Remark 3.5
In the presence of a linear connection \(\nabla \) on M, one can easily follow Definition 3.1 and Proposition 3.2 with \(D_\nabla \) in place of D, to verify the one-to-one correspondence between the set \({\mathcal {T}}^S M\) of equivalence classes and the Whitney sum \(TM \oplus \textrm{Sym}^2_+(TM)\), which brings us back to the fiber-wise isomorphism (2.18). But since such a correspondence requires a linear connection to be specified beforehand, we still endow \({\mathcal {T}}^S M\) with the structure of \({\mathcal {T}}^E M\) instead of that of \(TM \oplus \textrm{Sym}^2(TM)\) in this paper, although the latter is also feasible and may provide easier calculations.
3.2 The Stochastic Jet Space
In classical jet theory, for the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), there is a one-to-one correspondence between 1-jets and tangent vectors, and there is a canonical diffeomorphism between the first-order jet bundle \(J^1 \pi \) and \({\mathbb {R}}\times TM\) (Saunders 1989, Example 4.1.16).
Now using similar ideas, we will introduce the “bona fide” stochastic jet space. The key is to modify the definition of stochastic tangent vectors, to involve the time line \({\mathbb {R}}\) as the “source” as well as to randomize the initial datum of the diffusion processes. Intuitively, an M-valued diffusion process X can be regarded as a random “section” of the trivial “bundle” \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) which is merely continuous in time and depends on the sample point \(\omega \).
For a metric space (F, d), we denote by \(L^0(\Omega , F)\) the quotient space of all F-valued random elements, by the following equivalence relation: two random elements are equivalent if and only if they are identical almost surely. We endow \(L^0(\Omega , F)\) with the topology of the following \({\textbf{P}}\)-essential metric (cf. Munkres 1975, Section 43):
Definition 3.6
Two M-valued diffusion processes \(X=\{X(s)\}_{s\in [t,\tau )}\), \(Y=\{Y(s)\}_{s\in [t,\sigma )}\) starting at time t, are said to be stochastically equivalent at \(t\in {\mathbb {R}}\), if, almost surely, \(X(t)= Y(t)\) and \((DX(t), QX(t)) = (DY(t), QY(t))\). The equivalence class containing X is called the stochastic jet of X at t, denoted by \(j_t X\). Let \(I_t(M)\) be the set of all M-valued diffusion processes starting at time t. Then, the stochastic jet space of M is the set
$$\begin{aligned} {\mathcal {J}}^S M := \left\{ j_t X: X\in I_t(M),\ t\in {\mathbb {R}} \right\} . \end{aligned}$$
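The functions \(\pi ^S_1\) and \(\pi ^S_{1,0}\), called stochastic source and target projections, are defined by
$$\begin{aligned} \pi ^S_1: {\mathcal {J}}^S M \rightarrow {\mathbb {R}}, \quad j_t X \mapsto t, \end{aligned}$$
and
$$\begin{aligned} \pi ^S_{1,0}: {\mathcal {J}}^S M \rightarrow {\mathbb {R}}\times L^0(\Omega , M), \quad j_t X \mapsto (t, X(t)). \end{aligned}$$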
In the above definition, since \(\pi _M\circ \phi = \textbf{Id}_M\), we have \(\pi (Y) = \pi _M\circ \phi (X) = X\) a.s., that is, X is the projection of Y.
To characterize the relation between \({\mathcal {J}}^S M\) and \(\mathcal T^S M\) (or \({\mathcal {T}}^E M\)), we need the following definitions.
Definition 3.7
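(Horizontal subspace) Let \((E,\pi _M, M)\) be a fiber bundle. The horizontal subspace of \(L^0(\Omega ,E)\) is defined by
$$\begin{aligned} L^h(\Omega ; \pi _M) := \left\{ \sigma \circ \xi : \sigma \text{ is a section of } \pi _M,\ \xi \in L^0(\Omega , M) \right\} . \end{aligned}$$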
An element of the horizontal subspace \(L^h(\Omega ; \tau ^E_M)\) of \(L^0(\Omega , {\mathcal {T}}^E M)\) is then of the form \(A \circ \xi \), where A is a section of \(\tau ^E_M\) and \(\xi \in L^0(\Omega , M)\). Such an element \(A \circ \xi \) will be denoted by \(A_\xi \). By the correspondence of \({\mathcal {T}}^S M\) and \({\mathcal {T}}^E M\), one can easily get the following equivalent definition for \(L^h(\Omega ; \tau ^E_M)\),
The correspondence is given explicitly by
where \(X^{A_\xi }\) is an M-valued diffusion with generator A and with \(X^{A_\xi }(0) = \xi \) a.s..
Proposition 3.8
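The stochastic jet space \({\mathcal {J}}^S M\) is trivial. More precisely, we have the homeomorphism
$$\begin{aligned} {\mathcal {J}}^S M \cong {\mathbb {R}}\times L^h(\Omega ; \tau ^S_M), \end{aligned}$$(3.6)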
given by \(j_t X \mapsto (t, j_{X(t)} (\theta _t X))\), for any \(X\in I_t(M)\), where \(\theta _t\) is the shift operator on \({\mathcal {C}}\), that is, \(\theta _t \omega (\cdot ) = \omega (\cdot +t)\).
Proof
The homeomorphism \({\mathcal {J}}^S M \cong {\mathbb {R}}\times {\mathcal {J}}^S_0 M\) is given by \(j_t X \mapsto (t, j_0 (\theta _t X))\). The homeomorphism \({\mathcal {J}}^S_0 M \cong L^h(\Omega ; \tau ^S_M)\) is given by \(j_0 X \mapsto j_{X(0)} X\), whose inverse map is \(A_\xi \mapsto j_0 X^{A_\xi }\). \(\square \)
Definition 3.9
(Stochastic fibered space)
-
(i)
Given a fiber bundle \((E,\pi _M, M)\) with total space E, base space M and typical fiber manifold F, the stochastic fibered space associated with it is the triplet \((E^S,\pi ^S_M, M)\) where
$$\begin{aligned} E^S:= \{ (q, \xi ): q\in M, \xi \in {\hat{L}}(\Omega , E_q) \}, \end{aligned}$$\(\pi ^S_M: E^S\rightarrow M\) is the natural projection given by \(\pi ^S_M(q, \xi ) = q\), and \({\hat{L}}(\Omega ,F)\) is a subspace of \(L^0(\Omega ,F)\), with \(E_q\) denoting the fiber of \(\pi _M\) over q. The fiber bundle E is called the model bundle of \(E^S\). There is a family of projections \(\{\pi _\omega \}_{\omega \in \Omega }\) from the stochastic fibered space \(E^S\) to its model bundle E, defined by
$$\begin{aligned} \pi _\omega : E^S\rightarrow E, \quad (q, \xi ) \mapsto (q, \xi (\omega )). \end{aligned}$$ -
(ii)
A global section of \((E^S,\pi ^S_M, M)\) is called a random global section. A random local section is a map \(\sigma : U \rightarrow E\) defined on some measurable subset \(U\subset \Omega \times M\) and such that, for almost all \(\omega \in \Omega \), \(\sigma (\omega ): U_\omega \rightarrow E\) is a local section of \((E,\pi _M, M)\), where \(U_\omega = U\cap (\{\omega \}\times M)\).
Note that a random global section is a random local section defined on all \(\Omega \times M\).
It follows from Proposition 3.8 that the stochastic jet space \(({\mathcal {J}}^S M, \pi _1^S, {\mathbb {R}})\) is a stochastic fibered space, whose associated model bundle is \(({\mathbb {R}}\times {\mathcal {T}}^S M, \pi _1, {\mathbb {R}})\). Just like the first-order jet bundle \(J^1 \pi \), which is diffeomorphic to \({\mathbb {R}}\times TM\), the model bundle \({\mathbb {R}}\times \mathcal T^S M\) is itself a jet bundle and also has two bundle structures, with base space \({\mathbb {R}}\) and \({\mathbb {R}}\times M\), respectively. The corresponding source and target projections are defined, respectively, by
$$\begin{aligned} \pi _1: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}, \quad (t, j_q X) \mapsto t, \end{aligned}$$
and
$$\begin{aligned} \pi _{1,0}: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}\times M, \quad (t, j_q X) \mapsto (t, q). \end{aligned}$$
Moreover, we will denote the natural projection from \({\mathbb {R}}\times {\mathcal {T}}^S M\) to \({\mathcal {T}}^S M\) by \(\pi _{0,1}\). This projection map is indeed a bundle homomorphism from \(({\mathbb {R}}\times {\mathcal {T}}^S M, \pi _{1,0}, {\mathbb {R}}\times M)\) to \(({\mathcal {T}}^S M, \tau ^S_M, M)\), whose projection is the natural projection from \({\mathbb {R}}\times M\) to M, denoted by \({\hat{\pi }}\).
Similarly to Proposition 3.8, we have the following diffeomorphisms for the model bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\):
which is given by
for any \(X\in I_{(t,q)}(M)\), where \(A^X\) is the generator of X as a section of \({\mathbb {R}}\times {\mathcal {T}}^E M\) (i.e., a time-dependent elliptic second-order differential operator). Furthermore, the proof of Proposition 3.2 allows us to find simply the inverse maps, especially for the second diffeomorphism. That is, for any \((t,A_q) = (t,{\mathfrak {b}},a) \in \pi _{1,0}^{-1}(t,q)\),
where A is a section of \({\mathbb {R}}\times {\mathcal {T}}^E M\) such that \(A_{(t,q)} = A_q\), and \(X^A\in I_{(t,q)}(M)\) is a diffusion process having A as its generator.
The “stochastic target” of \({\mathcal {J}}^S M\), i.e., the trivial bundle \(({\mathbb {R}}\times L^0(\Omega , M), \pi ^S, {\mathbb {R}})\), is another example of a stochastic fibered space. Its model bundle is the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\). The graph of an M-valued stochastic process defined on a random time interval \([0,\tau )\) is a random (local) section of \(({\mathbb {R}}\times L^0(\Omega , M), \pi ^S, {\mathbb {R}})\). The projection induced by \(\pi _\omega \) on the targets, from \({\mathbb {R}}\times L^0(\Omega , M)\) to \({\mathbb {R}}\times M\), is denoted by \({\hat{\pi }}_\omega \).
We may summarize how all these maps fit together by the following diagram:
When a linear connection is specified on M, one can easily obtain, similarly to (3.6), the following homeomorphism:
and the following diffeomorphisms:
where the first two diffeomorphisms are given by
and the last one is due to the classical theory.
3.3 Intrinsic Formulation of SDEs
With the classical machinery of jet structures, it is possible to translate differential equations into algebraic equations on jet bundles (Saunders 1989). In this subsection, we follow this approach to formulate intrinsic SDEs.
For a subset S of the model bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\) and \(t\in {\mathbb {R}}\), we denote by \(S_t\) the intersection of S with the fiber \(\{t\} \times {\mathcal {T}}^S M\).
Definition 3.10
A stochastic differential equation on M is a closed embedded submanifold S of the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\) with \(S_0 \ne \emptyset \). A (local) solution of the stochastic differential equation S is a triple X, \((\Omega ,{\mathcal {F}},{\textbf{P}})\), \(\{{\mathcal {P}}_t\}_{t\ge 0}\), where
(i) \((\Omega ,{\mathcal {F}},{\textbf{P}})\) is a probability space, and \(\{{\mathcal {P}}_t\}_{t\ge 0}\) is a past filtration of \({\mathcal {F}}\) satisfying the usual conditions,
(ii) \(X = \{X(t)\}_{t\in [0,\tau )}\) is a \(\{{\mathcal {P}}_t\}\)-adapted M-valued diffusion process over \([0,\tau )\), where \(\tau \) is a \(\{{\mathcal {P}}_t\}\)-stopping time, and
(iii) almost surely \(j_t X = (t, j_{X(t)} (\theta _t X)) \in S\) for every \(t\in [0,\tau )\).
Remark 3.11
(i) The condition that \(S_0 \ne \emptyset \) is just for convenience, in order to set the initial time at \(t=0\).
(ii) There is an equivalent way to formulate the solution of a stochastic differential equation S. That is, a (local) solution is a pair \((P,\tau )\), where P is a probability measure on \(({\mathcal {C}},{\mathcal {B}}({\mathcal {C}}),\{{\mathcal {B}}_t\})\) and \(\tau \) is a \(\{{\mathcal {B}}_t\}\)-stopping time, such that for P-almost every \(\omega \), \(j_t \omega = (t, j_{\omega (t)} (\theta _t \omega )) \in S\) for every \(t\in [0,\tau (\omega ))\).
This definition does not look like the traditional definition of a stochastic differential equation, but we can see the relationship between the two by using coordinates. Since S is an embedded submanifold of \({\mathbb {R}}\times {\mathcal {T}}^S M\), it admits a local defining function in a neighborhood of each of its points (Lee 2013, Proposition 5.16). That is, for a coordinate chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) around the point \((0,j_q X) \in S_0\), there is a function \(\Theta : {\mathbb {R}}\times U^{(1)} \rightarrow {\mathbb {R}}^K\), where \(K = \dim ({\mathbb {R}}\times {\mathcal {T}}^S M) - \dim S\) is the codimension of S, such that \(S\cap ({\mathbb {R}}\times U^{(1)}) = \Theta ^{-1}(0)\) and 0 is a regular value of \(\Theta \). Then, the condition \(j_t X = (t, j_{X(t)} (\theta _t X)) \in S\) before X(t) leaves the neighborhood \(U=\tau ^S_M(U^{(1)})\) reads in local coordinates as
which defines a general MDE (in terms of mean derivatives). The use of a submanifold S is therefore a way to distinguish the definition of the equation from a definition of its solutions.
As an example, the system of MDEs (2.22) can be rewritten in the form (3.8) by setting the defining function
So far we have not done anything but reformulate the basic problem of finding solutions of systems of stochastic differential equations in a more geometrical form, ideally suited to our investigation into symmetry groups thereof.
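To make the submanifold picture concrete, here is a hedged sketch. We assume (this is our reading of the notation, not a restatement of (2.22)) that the induced coordinates on \({\mathbb {R}}\times {\mathcal {T}}^S M\) are \((t, x^i, D^i x, Q^{jk} x)\) and that the MDEs prescribe a drift \({\mathfrak {b}}\) and a diffusion matrix \(\sigma \sigma ^*\). A defining function of the corresponding submanifold S could then be written as

\[
\Theta (t, x, Dx, Qx) = \Big ( D^i x - {\mathfrak {b}}^i(t,x),\; Q^{jk} x - (\sigma \sigma ^*)^{jk}(t,x) \Big )_{1\le i\le d,\ 1\le j\le k\le d}.
\]

Under these assumptions, \(\Theta ^{-1}(0)\) is the graph of \((t,x)\mapsto ({\mathfrak {b}}(t,x), (\sigma \sigma ^*)(t,x))\), which is a closed embedded submanifold of dimension \(1+d\) and codimension \(d + d(d+1)/2\), and 0 is a regular value because the derivative of \(\Theta \) in the fiber variables \((Dx, Qx)\) is the identity. Along a diffusion X, the membership \(j_t X\in S\) then amounts to \(\Theta (t, X(t), DX(t), QX(t)) = 0\), where DX and QX stand for the mean and quadratic mean derivatives of X.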
4 Stochastic Symmetries
The symmetry group of a system of differential equations is the largest local group of transformations acting on the independent and dependent variables of the system with the property that it transforms solutions of the system into other solutions (Olver 1998). In the stochastic case, we can proceed analogously.
All methods of this section work in the local case, that is, the vector fields are not necessarily complete and the bundle homomorphisms could be only locally defined.
4.1 Prolongations of Diffusions and Bundle Homomorphisms
Definition 4.1
(Prolongations of diffusions) Let X be an M-valued diffusion process defined on a stopping time interval \([t_0,\tau )\). The prolongation of X is a \(\mathcal T^S M\)-valued process jX defined by, for \(\theta _t\) the shift operator,
\[
jX(t) := j_{X(t)} (\theta _t X), \qquad t\in [t_0,\tau ).
\]
Note that \(j_t X = (t, j_{X(t)} (\theta _t X)) = (t, j X(t))\). Thus the graph of the prolongation process jX is nothing but the random section jX of the stochastic jet space \({\mathcal {J}}^S M\). It is easy to see that if X is an M-valued diffusion process, then jX is a \({\mathcal {T}}^S M\)-valued diffusion process.
Given two smooth manifolds M and N, a bundle homomorphism F from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) is a projectable (or fiber-preserving) smooth map, which means it maps fibers of \(\pi \) to fibers of \(\rho \). Hence, there exist two smooth maps \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\) and \({\bar{F}}:{\mathbb {R}}\times M \rightarrow N\) such that \(F(t,q) = (F^0(t), {\bar{F}}(t,q))\). This leads to \(\rho \circ F = F^0\circ \pi \) which is the original definition of bundle homomorphisms. We denote \(F = (F^0, {\bar{F}})\) and say that F projects to \(F^0\).
The following lemma shows that a bundle homomorphism always transforms diffusions into diffusions. A proof can be found in Lemma 4.8 or Corollary A.5.
Lemma 4.2
Given a bundle homomorphism \(F = (F^0, {\bar{F}})\) from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\), where \(F^0\) is a diffeomorphism, for every M-valued diffusion process \(X = \{X(t)\}_{t\in [t_0,\tau )}\), the image of its graph (or its corresponding random local section) \(\{(t,X(t)): t\in [t_0,\tau ) \}\) by F, i.e.,
\[
F\big (\{(t,X(t)): t\in [t_0,\tau )\}\big ) = \big \{\big (F^0(t), {\bar{F}}(t,X(t))\big ): t\in [t_0,\tau )\big \},
\]
is almost surely the graph of a well-defined N-valued diffusion process \({\tilde{X}}\) given by
\[
{\tilde{X}}(s) = {\bar{F}}\left( (F^0)^{-1}(s), X\big ((F^0)^{-1}(s)\big ) \right) .
\]
As observed in Remark A.6, among all (deterministic) smooth maps from \({\mathbb {R}}\times M\) to \({\mathbb {R}}\times N\), the class of bundle homomorphisms is the only subclass that maps diffusions to diffusions.
Definition 4.3
(Pushforwards of diffusions by bundle homomorphisms) We call the diffusion \({\tilde{X}}\) of Lemma 4.2 the pushforward of X by F, and write \({\tilde{X}} = F\cdot X\). When \(M=N\) and F is a bundle endomorphism on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), we also call \(F\cdot X\) the transform of X by F.
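A simple Euclidean example may help here; it is only a sketch, and the constant \(\lambda >0\) is a hypothetical parameter of ours. Take \(M=N={\mathbb {R}}\), \(F^0(t) = \lambda ^2 t\) and \({\bar{F}}(t,x) = \lambda x\), and let \(X = W\) be a standard Brownian motion started at time 0. The pushforward is then

\[
(F\cdot W)(s) = {\bar{F}}\left( (F^0)^{-1}(s), W\big ((F^0)^{-1}(s)\big ) \right) = \lambda \, W(s/\lambda ^2), \qquad s\ge 0,
\]

which is again a standard Brownian motion by Brownian scaling, now adapted to the time-changed filtration \(\{{\mathcal {P}}_{s/\lambda ^2}\}\). By contrast, the map \((t,x)\mapsto (t+x, x)\) is not fiber-preserving, and the image of the graph of W under it is in general not the graph of a process, since \(t\mapsto t + W(t)\) is almost surely not monotone.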
We now introduce the idea of stochastic prolongation whereby a bundle homomorphism may be extended to act upon the model jet bundle.
Definition 4.4
(Stochastic prolongations of bundle homomorphisms) Let F be a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). The stochastic prolongation of F is the map \(j F: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathbb {R}}\times {\mathcal {T}}^S N\) defined by
\[
jF (j_{(t,q)} X) = j_{F(t,q)} (F\cdot X). \tag{4.2}
\]
It is easy to see from (4.1) that if \(j_{(t,q)} X = j_{(t,q)} Y\), then \(j_{F(t,q)} (F\cdot X) = j_{F(t,q)} (F\cdot Y)\). Therefore, the map jF is well defined. By letting \(F = (F^0, {\bar{F}})\), definition (4.2) can be rewritten in a more evident way:
The following properties are easy to check.
Corollary 4.5
(i) The map \(jF: \pi _1 \rightarrow \rho _1\) is a bundle homomorphism projecting to \(F^0\).
(ii) The map \(jF: \pi _{1,0} \rightarrow \rho _{1,0}\) is a bundle homomorphism projecting to F.
(iii) \(j(\textbf{Id}_{{\mathbb {R}}\times M}) = \textbf{Id}_{{\mathbb {R}}\times {\mathcal {T}}^S M}\). Let F and G be two bundle endomorphisms on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to diffeomorphisms. Then, \(j(F\circ G) = jF \circ jG\).
By virtue of (4.3) and Corollary 4.5.(i), we may write \(jF = (F^0, \overline{jF})\), where \(\overline{jF}: {\mathbb {R}}\times {\mathcal {T}}^S M \rightarrow {\mathcal {T}}^S N\) is the smooth map given by
We can also consider the pushforward of the \({\mathcal {T}}^S M\)-valued process jX by the bundle homomorphism jF.
Corollary 4.6
Given a bundle homomorphism \(F: ({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism on \({\mathbb {R}}\), and an M-valued diffusion process X, we have
Proof
It follows from (4.1), (4.4) and Definition 4.1 that
The result follows. \(\square \)
Now we need to investigate the coordinate representation of jF, in stochastic analysis terms. Before that, we introduce the stochastic version of the notion of total derivatives.
Definition 4.7
(Total mean derivatives) Let f be a smooth real-valued function on \({\mathbb {R}}\times M\). The total mean derivative and total quadratic mean derivative of f are the unique smooth functions \({\textbf{D}}_{\textrm{t}} f\) and \({\textbf{Q}}_{\textrm{t}} f\) defined on \({\mathbb {R}}\times {\mathcal {T}}^S M\), with the property that if \(X\in I_{(t_0,q)}(M)\) is a representative diffusion process of \(j_{(t_0,q)} X\), then
There is an abuse of notation in the above definition. Indeed, the left-hand sides (LHSs) of the above two equations both involve the subscript t, but their right-hand sides (RHSs) do not depend on t. Those two equations should be understood as saying that the values of the functions \({\textbf{D}}_{\textrm{t}} f\) and \({\textbf{Q}}_{\textrm{t}} f\) at the point \(j_{(t_0,q)} X\in {\mathbb {R}}\times \mathcal T^S M\) equal the RHSs.
It is easy to check that the definitions of total mean derivatives are independent of the choice of representative diffusions. By Itô’s formula, we have the following coordinate representation for total mean derivatives in the local chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) on \({\mathbb {R}}\times {\mathcal {T}}^S M\),
If a linear connection \(\nabla \) is specified, we can use (3.4) to rewrite \({\textbf{D}}_{\textrm{t}}\) as follows:
Lemma 4.8
Let us be given a bundle homomorphism \(F = (F^0, {\bar{F}})\) from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0\) and an M-valued diffusion process \(X = \{X(t)\}_{t\in [t_0,\tau )}\). If \({\tilde{X}} = F\cdot X\), then in local coordinates \((t,x^i)\) around \((t_0,q)\) and \((s,y^j)\) around \(F(t_0,q)\),
Proof
Assume that the diffusion X can be represented in local coordinates by
where W is an N-dimensional Brownian motion, so that
Let \((s_0,{\tilde{q}})=F(t_0,q) = (F^0(t_0), {\bar{F}}(t_0,q))\). Then,
Define
Then, (Øksendal 2010, Theorem 8.5.7) says that B is an N-dimensional \(\{{\mathcal {F}}_{(F^0)^{-1}(s)}\}\)-Brownian motion, as by a change of variable \(u=(F^0)^{-1}(v)\), we have
Therefore,
Recall that \({\tilde{X}}(s) = {\bar{F}}\left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) \). Using Itô’s formula, we have
It follows that
This completes the proof. \(\square \)
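The Itô computation behind this proof can be sketched in coordinates as follows. The assumptions are ours: X solves \(dX^i = {\mathfrak {b}}^i\,dt + \sigma ^i_r\,dW^r\) in the chart, so that \(D^i X = {\mathfrak {b}}^i\) and \(Q^{jk} X = (\sigma \sigma ^*)^{jk}\), and \(F^0\) is increasing with derivative \(\dot{F}^0>0\). Writing \(t = (F^0)^{-1}(s)\), Itô's formula gives

\[
d\big [{\bar{F}}^j(t,X(t))\big ] = \Big ( \frac{\partial {\bar{F}}^j}{\partial t} + {\mathfrak {b}}^i \frac{\partial {\bar{F}}^j}{\partial x^i} + \frac{1}{2} (\sigma \sigma ^*)^{kl} \frac{\partial ^2 {\bar{F}}^j}{\partial x^k \partial x^l} \Big ) dt + \frac{\partial {\bar{F}}^j}{\partial x^i} \sigma ^i_r \, dW^r(t),
\]

while \(ds = \dot{F}^0(t)\,dt\). Dividing the drift and the quadratic variation rate by \(\dot{F}^0\) suggests

\[
\begin{aligned}
D^j {\tilde{X}}(s) &= \frac{1}{\dot{F}^0(t)} \Big ( \frac{\partial {\bar{F}}^j}{\partial t} + {\mathfrak {b}}^i \frac{\partial {\bar{F}}^j}{\partial x^i} + \frac{1}{2} (\sigma \sigma ^*)^{kl} \frac{\partial ^2 {\bar{F}}^j}{\partial x^k \partial x^l} \Big )(t,X(t)), \\
Q^{jk} {\tilde{X}}(s) &= \frac{1}{\dot{F}^0(t)} \Big ( (\sigma \sigma ^*)^{il} \frac{\partial {\bar{F}}^j}{\partial x^i} \frac{\partial {\bar{F}}^k}{\partial x^l} \Big )(t,X(t)).
\end{aligned}
\]

As a sanity check, for \(F^0(t) = \lambda ^2 t\), \({\bar{F}}(t,x) = \lambda x\) on \({\mathbb {R}}\) and X a standard Brownian motion, these expressions give \(D{\tilde{X}} = 0\) and \(Q{\tilde{X}} = \lambda ^2/\lambda ^2 = 1\), as expected from Brownian scaling.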
We denote the induced local coordinates on \({\mathcal {T}}^S N\) by \((y^j, D^j y, Q^{kl} y)\). Then, clearly, \(y^j \circ jF = y^j \circ \overline{jF} = y^j \circ F = {\bar{F}}^j\). Now take \(j_{(t,q)} X \in {\mathbb {R}}\times {\mathcal {T}}^S M\). Then,
4.2 Symmetries of SDEs
As an important application of the prolongations of diffusions and bundle homomorphisms, we now study the symmetries of stochastic differential equations. As in Lie's classical theory of symmetries of ODEs, a symmetry of a stochastic differential equation is a space–time transformation that maps solutions to solutions. But this is not sufficient for the stochastic case. As we have mentioned in Sect. 4.1, the only smooth transformations on \({\mathbb {R}}\times M\) mapping diffusions to diffusions are bundle endomorphisms. Moreover, a solution of a stochastic differential equation is always accompanied by a filtration, which will also be altered under space–time transformations. Thus, we have the following definition:
Definition 4.9
(Symmetries) Given a stochastic differential equation \(S\subset {\mathbb {R}}\times \mathcal T^S M\), a symmetry of S is a bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to \(F^0\) such that if \((X, \{{\mathcal {P}}_t\})\) is a solution of S, then so is \((F\cdot X, \{{\mathcal {P}}_{(F^0)^{-1}(s)}\})\).
Using the definitions of stochastic differential equations and pushforwards, we have the following equivalent characterization of symmetries.
Lemma 4.10
Let S be a stochastic differential equation on M. A bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) is a symmetry of S if and only if \(j F (j_{(t,q)} X) \in S\) whenever \(j_{(t,q)} X \in S\), or equivalently, \(j F(S)\subset S\).
Recall that the infinitesimal versions of bundle homomorphisms are the so-called projectable or fiber-preserving vector fields. More precisely, a vector field V on \({\mathbb {R}}\times M\) is called \(\pi \)-projectable, if the (local) flow (or one-parameter group action) generated by V consists of (local) bundle endomorphisms on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) (cf. Olver 1998, Example 2.22 or Saunders 1989, Proposition 3.2.15). For such a vector field, we define its prolongation to be the infinitesimal generator of the prolonged flow.
Definition 4.11
(Stochastic prolongations of projectable vector fields) Let V be a \(\pi \)-projectable vector field on \({\mathbb {R}}\times M\), with corresponding (local) flow \(\psi = \{\psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\). Then, the stochastic prolongation of V, denoted by jV, will be a vector field on the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\), defined as the infinitesimal generator of the corresponding prolonged flow \(\{j\psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\). In other words, jV is a vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\) defined by
\[
jV\big |_{j_{(t,q)} X} = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\bigg |_{\epsilon =0} j\psi _\epsilon (j_{(t,q)} X)
\]
for any \(j_{(t,q)} X\in {\mathbb {R}}\times {\mathcal {T}}^S M\).
Now we can define infinitesimal versions of symmetries.
Definition 4.12
(Infinitesimal symmetries) Let S be a stochastic differential equation on M. An infinitesimal symmetry of S is a \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) whose stochastic prolongation jV is tangent to S.
The following properties follow straightforwardly from definitions.
Lemma 4.13
Given a stochastic differential equation S on M, let V be a complete \(\pi \)-projectable vector field on \({\mathbb {R}}\times M\) and \(\psi = \{\psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\) be its flow. Then,
(i) V is an infinitesimal symmetry of S if and only if \(jV(\Theta ) = 0\) for every local defining function \(\Theta \) of S;
(ii) V is an infinitesimal symmetry of S if and only if for each \(\epsilon \in {\mathbb {R}}\), \(\psi _\epsilon \) is a symmetry of S.
4.3 Stochastic Prolongation Formulae
We consider a coordinate chart \(({\mathbb {R}}\times U^{(1)}, (t, x^{(1)}))\) on the model jet bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\), which is induced by the coordinate chart \((U, (x^i))\) on M. A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) has the following local coordinate representation
\[
V_{(t,x)} = V^0(t) \frac{\partial }{\partial {t}} + V^i(t,x) \frac{\partial }{\partial {x^i}}. \tag{4.9}
\]
Its prolongation jV is a vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\) of the form
Now we use Lemma 4.8 to compute the coefficients \(V^i_1\)’s and \(V^{jk}_2\)’s.
Theorem 4.14
Suppose V is complete and \(\pi \)-projectable and has the local representation (4.9). Then, in the canonical coordinates \((t, x^{(1)})\), the coefficient functions of its prolongation jV are given by the following formulae:
Proof
Let \(\psi = \{\psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\) be the flow generated by V. Since V is complete and \(\pi \)-projectable, each \(\psi _\epsilon \) is a bundle endomorphism on \({\mathbb {R}}\times M\) projecting to a diffeomorphism on \({\mathbb {R}}\). Let \(\psi _\epsilon (t, q) = (\psi ^0_\epsilon (t), {\bar{\psi }}_\epsilon (t,q))\). Note that \(\psi ^0_0(t)=t\), \({\bar{\psi }}_0(t,q) = q\) and
Let \(X=\{X(t)\}_{t\in [t_0,\tau )}\) be a representative diffusion of \(j_{(t_0,q)} X \in {\mathbb {R}}\times U^{(1)}\). Then, by Lemma 4.2 and Definition 4.4, a representative diffusion of \(j\psi _\epsilon (j_{(t_0,q)} X)\) is
Now we apply Lemma 4.8 and take derivatives with respect to \(\epsilon \). Since \(\frac{{\textrm{d}}}{{\textrm{d}}\epsilon }\) commutes with the total mean derivative \({\textbf{D}}_{\textrm{t}}\) as is clear from the coordinate representation, we have
Also,
In the induced coordinate system \((t, x^{(1)})= (t, x^i, D^i x, Q^{jk} x)\), the last two formulae read as (4.10) and (4.11), respectively. \(\square \)
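To see where coefficient formulae of this kind come from, here is a first-order expansion. It is a sketch under assumptions of ours: under a bundle homomorphism with increasing time component, the mean derivatives transform as \(D^i{\tilde{X}} = {\textbf{D}}_{\textrm{t}} {\bar{F}}^i/\dot{F}^0\) and \(Q^{jk}{\tilde{X}} = Q^{lm}x\, \partial _l {\bar{F}}^j \partial _m {\bar{F}}^k/\dot{F}^0\), with \({\textbf{D}}_{\textrm{t}} f = \partial _t f + D^i x\, \partial _i f + \frac{1}{2} Q^{jk}x\, \partial _j\partial _k f\). For the flow \(\psi ^0_\epsilon (t) = t + \epsilon V^0(t) + O(\epsilon ^2)\), \({\bar{\psi }}^i_\epsilon (t,x) = x^i + \epsilon V^i(t,x) + O(\epsilon ^2)\), one gets

\[
\begin{aligned}
\frac{D^i x + \epsilon \, {\textbf{D}}_{\textrm{t}} V^i}{1 + \epsilon \dot{V}^0} &= D^i x + \epsilon \big ( {\textbf{D}}_{\textrm{t}} V^i - D^i x\, \dot{V}^0 \big ) + O(\epsilon ^2), \\
\frac{Q^{lm}x\, (\delta ^j_l + \epsilon \partial _l V^j)(\delta ^k_m + \epsilon \partial _m V^k)}{1 + \epsilon \dot{V}^0} &= Q^{jk}x + \epsilon \big ( Q^{lk}x\, \partial _l V^j + Q^{jl}x\, \partial _l V^k - Q^{jk}x\, \dot{V}^0 \big ) + O(\epsilon ^2).
\end{aligned}
\]

Reading off the \(O(\epsilon )\) terms gives the candidate coefficients \(V^i_1 = {\textbf{D}}_{\textrm{t}} V^i - D^i x\, \dot{V}^0\) and \(V^{jk}_2 = Q^{lk}x\, \partial _l V^j + Q^{jl}x\, \partial _l V^k - Q^{jk}x\, \dot{V}^0\), which should be compared with (4.10) and (4.11).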
Stochastic analogs of the contact structure on \({\mathbb {R}}\times {\mathcal {T}}^S M\) and Cartan symmetries will be discussed in “Appendix B.” It turns out that the infinitesimal symmetry of the mixed-order Cartan distribution is equivalent to the stochastic prolongation formulae of Theorem 4.14.
Applying Theorem 4.14 to the system of mean differential equations (2.22), we have
Corollary 4.15
The complete and \(\pi \)-projectable vector field V in (4.9) is an infinitesimal symmetry of MDEs (2.22) if and only if the coefficients \(V^0\) and \(V^i\)’s satisfy the following “determining equations”:
Proof
We apply Lemma 4.13.(i) to (3.9) and then use Theorem 4.14, to get
Then, we use the coordinate representation (4.5) for the total mean derivative \({\textbf{D}}_{\textrm{t}}\) and plug Eq. (3.9) in; the results follow. \(\square \)
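As a worked check, consider the one-dimensional Brownian motion, i.e., the MDEs \(Dx = 0\), \(Qx = 1\) with defining function \(\Theta = (Dx, Qx - 1)\), and the scaling field \(V = 2t\, \partial _t + x\, \partial _x\). The computation below is only a sketch: it assumes prolongation coefficients of the form \(V_1 = {\textbf{D}}_{\textrm{t}} V^1 - Dx\, \dot{V}^0\) and \(V_2 = 2\, Qx\, \partial _x V^1 - Qx\, \dot{V}^0\), and \({\textbf{D}}_{\textrm{t}} f = \partial _t f + Dx\, \partial _x f + \frac{1}{2} Qx\, \partial _x^2 f\).

\[
\begin{aligned}
jV(Dx) &= V_1 = {\textbf{D}}_{\textrm{t}}(x) - 2\, Dx = Dx - 2\, Dx = -Dx, \quad \text{which vanishes on } S, \\
jV(Qx - 1) &= V_2 = 2\, Qx \cdot 1 - 2\, Qx = 0.
\end{aligned}
\]

Under these assumptions V is an infinitesimal symmetry, reflecting Brownian scaling through its flow \(\psi _\epsilon (t,x) = (e^{2\epsilon } t, e^{\epsilon } x)\). By contrast, the pure time dilation \(t\, \partial _t\) gives \(V_2 = -Qx = -1\) on S, so it is not an infinitesimal symmetry.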
Remark 4.16
In Gaeta and Quintero (1999), the authors proved a result similar to Corollary 4.15, with the following equation instead of Eq. (4.12):
By multiplying both sides of (4.13) by \(\sigma _r^k\) and using the symmetry in the indices j, k, one easily gets (4.12). So our determining equations for infinitesimal symmetries are more general than those of Gaeta and Quintero (1999). Basically, the paper (Gaeta and Quintero 1999) concerns symmetries of the Itô equation \(({\mathfrak {b}}, \sigma )\), while we consider symmetries of the diffusion with generator \(({\mathfrak {b}}, \sigma \circ \sigma ^*)\), or equivalently, a weak formulation of the SDE. The former symmetries obviously belong to the latter, but not vice versa.
Now given a linear connection \(\nabla \) on M, we define the \(\nabla \)-dependent versions of Definitions 4.1, 4.4 and 4.11. More precisely, for a diffusion X on M, we define its \(\nabla \)-prolongation to be a TM-valued diffusion \(j^\nabla X\) given by \(j^\nabla X(t) = j^\nabla _{X(t)} (\theta _t X)\). For a bundle homomorphism \(F:({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\), the \(\nabla \)-prolongation of F is the map \(j^\nabla F: {\mathbb {R}}\times T M \rightarrow {\mathbb {R}}\times T N\) defined by \(j^\nabla F (j^\nabla _{(t,q)} X) = j^\nabla _{F(t,q)} (F\cdot X)\). For a \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) with (local) flow \(\{\psi _\epsilon \}\), the \(\nabla \)-prolongation of V, denoted by \(j^\nabla V\), is defined to be the infinitesimal generator of the corresponding prolonged flow \(\{j^\nabla \psi _\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\), so that \(j^\nabla V\) is a vector field on \({\mathbb {R}}\times T M\) and has the form
for V of the form (4.9). If we denote \({\bar{V}} = V^i \frac{\partial }{\partial {x^i}}\) so that \(V = V^0 \frac{\partial }{\partial {t}} + {\bar{V}}\), we have
Corollary 4.17
Under the canonical coordinates \((t, x, \dot{x})\), the coefficients \(V^i_\nabla \) of the \(\nabla \)-prolongation \(j^\nabla V\) are given by:
where R is the curvature tensor.
Proof
The proof is complete. \(\square \)
5 The Second-Order Cotangent Bundle
5.1 Second-Order Covectors
Definition 5.1
(Second-order cotangent space) The second-order cotangent space at \(q\in M\) is the dual vector space of \({\mathcal {T}}^S_q M\), denoted by \({\mathcal {T}}^{S*}_q M\). The pairing of \(\alpha \in {\mathcal {T}}^{S*}_q M\) and \(A\in {\mathcal {T}}^S_q M\) is denoted by \(\langle \alpha , A \rangle \) or \(\alpha (A)\). Elements of \({\mathcal {T}}^{S*}_q M\) are called second-order covectors at q. The disjoint union \({\mathcal {T}}^{S*} M:= \amalg _{q\in M} {\mathcal {T}}^{S*}_q M\) is called the stochastic cotangent bundle of M. The natural projection map from \({\mathcal {T}}^{S*} M\) to M is denoted by \(\tau ^{S*}_M\). A (local or global) smooth section of \({\mathcal {T}}^{S*} M\) is called a second-order covector field or a second-order form.
Dual to the left action (2.11) of \(G_I^d\) on fibers of \({\mathcal {T}}^S M\), \(G_I^d\) will act on those of \({\mathcal {T}}^{S*} M\) from the right.
Lemma 5.2
The stochastic cotangent bundle \(({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) is the fiber bundle dual to \(({\mathcal {T}}^S M, \tau ^S_M, M)\), with structure group \(G_I^d\) acting on the typical fiber \(({\mathbb {R}}^d \times \textrm{Sym}^2({\mathbb {R}}^d))^*\) from the right by
for all \((g, \kappa ) \in G_I^d\), \(p\in ({\mathbb {R}}^d)^*\), \(o\in (\textrm{Sym}^2({\mathbb {R}}^d))^*\).
The notion of second-order forms should not be confused with the classical one of 2-forms. There are two basic examples of second-order forms, say, \(d^2 f\) and \(df\cdot dg\), where f and g are given smooth functions on M. They are defined as follows: for \(A\in {\mathcal {T}}^S M\),
\[
d^2 f(A) := Af, \qquad df\cdot dg\, (A) := \Gamma _A(f,g), \tag{5.1}
\]
where \(\Gamma _A\) is the squared field operator defined in (2.8). This notation goes back to Schwartz (1984) and Meyer (1981a) (see also Emery 1989, Chapters VI and VII), where the term \(d^2 f\) is called the second differential of f, and the term \(df\cdot dg\) is called the symmetric product of df and dg. Note that in these original references, there is a factor \(\frac{1}{2}\) on the RHS of the definition of \(df\cdot dg\). Here we drop this factor. Obviously, when restricted to TM, the second differential \(d^2f\) is just the differential df but the symmetric product \(df\cdot dg\) vanishes.
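In coordinates these two second-order forms can be evaluated explicitly. The following is a sketch under assumptions of ours: a second-order vector has the form \(A = A^i \frac{\partial }{\partial x^i} + A^{jk} \frac{\partial ^2}{\partial x^j \partial x^k}\) with \(A^{jk}\) symmetric, the squared field operator is \(\Gamma _A(f,g) = A(fg) - f\, Ag - g\, Af\), and the pairings are \(d^2 f(A) = Af\) and \(df\cdot dg\, (A) = \Gamma _A(f,g)\). Then

\[
\begin{aligned}
d^2 f(A) &= A^i \frac{\partial f}{\partial x^i} + A^{jk} \frac{\partial ^2 f}{\partial x^j \partial x^k}, \\
df\cdot dg\, (A) &= A(fg) - f\, Ag - g\, Af = 2 A^{jk} \frac{\partial f}{\partial x^j} \frac{\partial g}{\partial x^k}.
\end{aligned}
\]

The first-order terms cancel in the second line by the Leibniz rule, and only the cross terms of the second derivatives survive. For \(A\in TM\) all \(A^{jk}\) vanish, so that \(d^2 f(A) = df(A)\) and \(df\cdot dg\, (A) = 0\).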
The definition of the symmetric product \(df\cdot dg\) yields two properties: \(df\cdot dg\) is symmetric in f and g; and \((df\cdot dg)_q = 0\) if one of \(df_q\) and \(dg_q\) vanishes. These lead to a more general definition for symmetric products of two 1-forms. More precisely, let \(\omega , \eta \in {\mathcal {T}}^*_q M\), then there exist smooth functions f and g on M such that \(\omega = df_q\) and \(\eta = dg_q\). By the second property (together with bilinearity), the second-order covector \((df\cdot dg)_q\) does not depend on the choice of f and g, and we will denote it by \(\omega \cdot \eta \). Now if \(\omega , \eta \) are 1-forms, then their symmetric product is defined pointwise through \((\omega \cdot \eta )_q = \omega _q \cdot \eta _q\). More formally, we have
Definition 5.3
(Symmetric product, Emery 1989, Chapter VI) There exists a unique fiber-linear bundle homomorphism \(\bullet \) from \(T^* M \otimes T^* M\) to \({\mathcal {T}}^{S*} M\), which is called the symmetric product, such that for all \(\omega , \eta \in T^* M\), \(\bullet (\omega \otimes \eta ) = \omega \cdot \eta \).
It is easy to verify from (5.1) that the local frame, dual to (2.12), for \(({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) over the local chart \((U,(x^i))\) is given by (see also Emery 1989, Chapter VI)
We adopt the convention that \(dx^k\cdot dx^j = dx^j\cdot dx^k\) for all \(1\le j< k \le d\). Under this frame, a second-order covector \(\alpha \in {\mathcal {T}}^{S*}_q M\) has a local expression
where \(\alpha ^{jk}\) is symmetric in j, k. The coordinates \((x^i)\) induce a canonical coordinate system on \({\mathcal {T}}^{S*} M\), denoted by \((x^i, p_i, o_{jk})\) and defined by
for \(\alpha \) in (5.2). Since the coefficients \((\alpha _i)\) do transform like a covector, as indicated in Lemma 5.2, it will cause no ambiguity to retain \((x^i,p_i)\) as canonical coordinates on \(T^* M\). As in classical geometric mechanics (Abraham and Marsden 1978; Holm et al. 2009), we still call the coordinates \((p_i)\) the conjugate momenta. And we shall call the second-order coordinates \((o_{jk})\) the conjugate diffusivities.
The pairing of \(\alpha \) and the second-order vector field A in (2.7) is then
It follows from (5.1) and (2.8) that for smooth functions f and g on M,
More generally, for 1-forms \(\omega \) and \(\eta \) with local expressions \(\omega = \omega _i dx^i\) and \(\eta = \eta _i dx^i\), the symmetric product \(\omega \cdot \eta \) has local expression
Dual to the tangent case, there is indeed a canonical bundle epimorphism \({\hat{\varrho }}^*: ({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M) \rightarrow (T^* M, \tau ^*_M, M)\), given by
In particular \({\hat{\varrho }}^*(d^2 f) = df\). In local coordinates, \({\hat{\varrho }}^*\) reads as
The map \({\hat{\varrho }}^*\) is well defined since \(\alpha |_{TM}\) is a covector. Clearly, \({\hat{\varrho }}^*\) is also a surjective submersion, so that \({\mathcal {T}}^{S*} M\) is a fiber bundle over \(T^* M\). Occasionally, we will use the notation \({\hat{\varrho }}^*_M\) to indicate the base manifold M.
However, there is no canonical bundle monomorphism from \(T^* M\) to \({\mathcal {T}}^{S*} M\) which is a right inverse of \({\hat{\varrho }}^*\) and linear in fiber. We call such a bundle monomorphism a fiber-linear bundle injection from \(T^* M\) to \(\mathcal T^{S*} M\). Similarly to Proposition 2.11, we also have a connection correspondence property. Namely, if we are given a linear connection \(\nabla \) on M, then it induces a fiber-linear bundle injection from \(T^* M\) to \({\mathcal {T}}^{S*} M\) by
or in local coordinates \({\hat{\iota }}^*_\nabla (x, p) = (x, p, (\Gamma _{jk}^i(x) p_i))\). Any fiber-linear bundle injection from \(T^* M\) to \({\mathcal {T}}^{S*} M\) induces a torsion-free linear connection on M.
Denote by \(\textrm{Sym}^2(T^* M)\) the subbundle of \(T^* M \otimes T^* M\) consisting of all symmetric (0, 2)-tensors on M. Then, the symmetric product \(\bullet \), when restricted to \(\textrm{Sym}^2(T^* M)\), is a bundle monomorphism whose image is the kernel of \({\hat{\varrho }}^*\). Conversely, still by the connection correspondence, a linear connection \(\nabla \) induces a fiber-linear bundle epimorphism from \({\mathcal {T}}^{S*} M\) to \(\textrm{Sym}^2(T^* M)\) which is a left inverse of \(\bullet \) and is given by
We introduce the \(\nabla \)-dependent coordinates \((o_{jk}^\nabla )\) by \(o_{jk}^\nabla (\alpha ) = \alpha _{jk} - \alpha _i \Gamma _{jk}^i(q)\) for \(\alpha \) in (5.2), i.e.,
\[
o_{jk}^\nabla = o_{jk} - \Gamma _{jk}^i(x)\, p_i.
\]
Then, \(\varrho ^*_\nabla (\alpha ) = o_{jk}^\nabla (\alpha ) dx^j \otimes dx^k|_q\) and in particular
The coordinates \((x^i, p_i, o_{jk}^\nabla )\) form a coordinate system on \({\mathcal {T}}^{S*} M\), which we call the \(\nabla \)-canonical coordinate system. The coordinates \((x^i, o_{jk}^\nabla )\) also form a coordinate system on \(\textrm{Sym}^2(T^* M)\) when restricted to it. We will call the coordinates \((o^\nabla _{jk})\) the tensorial conjugate diffusivities.
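A worked example may clarify why these coordinates are called tensorial. It is a sketch resting on an assumption of ours, namely that the second-order coefficients of \(d^2 f\) in (5.2) are \(\alpha _{jk} = \partial _j \partial _k f\), together with \(\alpha _i = \partial _i f\). Then \(o^\nabla _{jk}(d^2 f) = \partial _j\partial _k f - \Gamma ^i_{jk}\, \partial _i f\), which is the coordinate expression of the Hessian \(\nabla ^2 f\). Take \(M = {\mathbb {R}}^2\setminus \{0\}\) with polar coordinates \((r,\theta )\), the Levi-Civita connection of the flat metric (nonzero Christoffel symbols \(\Gamma ^r_{\theta \theta } = -r\), \(\Gamma ^\theta _{r\theta } = \Gamma ^\theta _{\theta r} = 1/r\)), and \(f = r^2\):

\[
\begin{aligned}
o^\nabla _{rr}(d^2 f) &= \partial _r^2 f = 2, \\
o^\nabla _{r\theta }(d^2 f) &= \partial _r\partial _\theta f - \Gamma ^\theta _{r\theta }\, \partial _\theta f = 0, \\
o^\nabla _{\theta \theta }(d^2 f) &= \partial _\theta ^2 f - \Gamma ^r_{\theta \theta }\, \partial _r f = 0 + r\cdot 2r = 2r^2.
\end{aligned}
\]

Hence \(o^\nabla _{jk}(d^2 f)\, dx^j\otimes dx^k = 2\, (dr\otimes dr + r^2\, d\theta \otimes d\theta )\), twice the flat metric, in agreement with the Cartesian Hessian of \(x^2 + y^2\). The raw coordinates \(o_{\theta \theta }(d^2 f) = \partial _\theta ^2 f = 0\) do not transform as a tensor, whereas the \(\nabla \)-corrected ones do.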
To sum up, we have the following short exact sequence which is split when a linear connection is provided:
It is easy to check that the bundle homomorphisms \({\hat{\varrho }}^*\), \({\hat{\iota }}^*_\nabla \), \(\bullet \) and \(\varrho ^*_\nabla \) are dual to \(\iota \), \(\varrho _\nabla \), \({\hat{\varrho }}\) and \({\hat{\iota _\nabla }}\) in (2.13), (2.14), (2.15) and (2.16), respectively, so that the short exact sequence (5.7) is dual to (2.17). Similarly to (2.18), we have the following decomposition if a linear connection \(\nabla \) is given,
with fiber-wise isomorphism \(\cong \) and first direct sum \(\oplus \), which is given by
In particular,
Similarly to the classical cotangent space, the second-order cotangent space may be defined via germs. To be precise, we denote by \(C_q^\infty (M)\) the set of all germs of smooth functions at \(q\in M\), and define an equivalence relation between germs: \([f]_q, [g]_q \in C_q^\infty (M)\) are equivalent if and only if their Taylor expansions at q agree in the terms of order one and two. Then, one can easily check that there is a one-to-one correspondence between \({\mathcal {T}}^{S*}_q M\) and the quotient space of \(C_q^\infty (M)\) by this equivalence relation. Along this way, we can also observe the following diffeomorphism:
by mapping \((d^2 f_q, f(q))\) to \(j^2_q f\), where \(J^2{\hat{\pi }}\) is the classical second-order jet bundle of \((M\times {\mathbb {R}}, {\hat{\pi }}, M)\). This is similar to the fact that \(T^*M\times {\mathbb {R}}\) is diffeomorphic to the first-order jet bundle \(J^1{\hat{\pi }}\) (e.g., Geiges 2008, Example 2.5.11 or Saunders 1989, Example 4.1.15). We denote the natural projection maps from \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) to \({\mathbb {R}}\) and from \(T^* M\times {\mathbb {R}}\) to \({\mathbb {R}}\) by \({\hat{\pi }}^2_{0,1}\) and \({\hat{\pi }}^1_{0,1}\), respectively.
The relations and projection maps are integrated into the following commutative diagram:
Remark 5.4
(i) As in Remark 3.5, given a linear connection \(\nabla \), we can obtain a one-to-one correspondence between \((T^*M \oplus \textrm{Sym}^2(T^*M))\times {\mathbb {R}}\) and \(J^2{\hat{\pi }}\) by mapping \((df_q, \nabla ^2 f_q, f(q))\) to \(j^2_q f\). One can find in Dahlqvist et al. (2019) an application of the jet-like structure on \(T^*M \oplus \textrm{Sym}^2(T^*M)\) and higher-order bundles to Martin Hairer’s theory of regularity structures (Hairer 2014).
(ii) As we have seen, the product \({\mathbb {R}}\times {\mathcal {T}}^S M\) is the model bundle of the stochastic jet space \({\mathcal {J}}^S M\), while the product \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) is diffeomorphic to the second-order jet bundle \(J^2{\hat{\pi }}\). So, in a way, we can say that the “stochastic” and the “second-order” are dual to each other. This stochastic–second-order duality is somewhat analogous to the wave–particle duality in quantum mechanics.
5.2 Second-Order Tangent and Cotangent Maps
Definition 5.5
(Second-order tangent and cotangent maps, Emery 1989, Chapter VI) Let M and N be two smooth manifolds, \(F: M\rightarrow N\) be a smooth map. The second-order tangent map of F at \(q\in M\) is a linear map \(d^2 F_q: {\mathcal {T}}^S_q M \rightarrow {\mathcal {T}}^S_{F(q)} N\) defined by
\[
d^2 F_q(A)\, f := A(f\circ F), \qquad A\in {\mathcal {T}}^S_q M,\ f\in C^\infty (N).
\]
The second-order cotangent map of F at \(q\in M\) is a linear map \(d^2 F^*_q: {\mathcal {T}}^{S*}_{F(q)}N \rightarrow {\mathcal {T}}^{S*}_q M\) dual to \(d^2 F_q\), that is,
\[
\langle d^2 F^*_q(\alpha ), A \rangle = \langle \alpha , d^2 F_q(A) \rangle , \qquad \alpha \in {\mathcal {T}}^{S*}_{F(q)} N,\ A\in {\mathcal {T}}^S_q M.
\]
The restriction of \(d^2 F_q\) to \(T_q M\) coincides with the classical tangent map \(d F_q\). But this is not the case for \(d^2 F^*_q\) when restricted to \(T^{*}_{F(q)} N\), since for \(\alpha \in T^*_{F(q)} N\), \(d^2 F^*_q (\alpha )\) is still a linear map on \({\mathcal {T}}^S_q M\). A manifestation of these phenomena may be seen through local coordinates in the following lemma.
Lemma 5.6
Let \((U,(x^i))\) and \((V,(y^j))\) be local coordinate charts around q and F(q), respectively. If
Then,
Now if \(A\in {\mathcal {T}}_q M\), then all \((A^{ij})\) vanish and thereby so do \(\Gamma _A(F^i, F^j)\)’s. Thus, \(d^2 F_q(A) = (A F^i) \frac{\partial }{\partial y^i}|_{F(q)} = d F_q(A)\). This makes clear that \(d^2 F_q|_{{\mathcal {T}}_{q} M} = d F_q\). But if \(\alpha \in {\mathcal {T}}^*_{F(q)} N\), then \(\alpha ^{ij}\)’s vanish and
while \(d F^*_q (\alpha ) = \alpha _i d F^i|_q = \alpha _i \frac{\partial F^i}{\partial x^j}(q) d x^j|_q\). Hence, \(d^2 F^*_q|_{{\mathcal {T}}^*_{F(q)} N} \ne d F^*_q\).
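A one-dimensional sketch shows the extra second-order contribution at work. We assume here that the second-order tangent map acts by \(d^2 F_q(A) f = A(f\circ F)\), and we take \(M=N={\mathbb {R}}\), \(F(x) = x^2\), and \(A = a\, \partial _x + c\, \partial _x^2\) at the point x, where a and c are hypothetical constants. With \(y = x^2\),

\[
d^2 F_x(A) f = a\, (f\circ F)'(x) + c\, (f\circ F)''(x) = (2xa + 2c)\, f'(y) + 4x^2 c\, f''(y),
\]

so that \(d^2 F_x(A) = (2xa + 2c)\, \partial _y + 4x^2 c\, \partial _y^2\). For \(c=0\) this reduces to the classical tangent map \(dF_x(a\, \partial _x) = 2xa\, \partial _y\), while the term 2c is an Itô-type correction. For instance, \(a=0\), \(c=\frac{1}{2}\) (the generator of a standard Brownian motion W) gives \(\partial _y + 2y\, \partial _y^2\), which matches \(d(W^2) = dt + 2W\, dW\).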
Definition 5.7
(Second-order pushforwards and pullbacks) Let \(F: M\rightarrow N\) be a smooth map. The second-order pushforward by F is a bundle homomorphism \(F^S_*: ({\mathcal {T}}^S M, \tau ^S_M, M) \rightarrow ({\mathcal {T}}^S N, \tau ^S_N, N)\) defined by
Given a second-order form \(\alpha \) on N, the second-order pullback of \(\alpha \) by F is a second-order form \(F^{S*}\alpha \) on M defined by
Let F be a diffeomorphism. The second-order pullback by F is a bundle isomorphism \(F^{S*}: ({\mathcal {T}}^{S*} N, \tau ^{S*}_N, N) \rightarrow ({\mathcal {T}}^{S*} M, \tau ^{S*}_M, M)\) defined by
Given a second-order vector field A on M, the second-order pushforward of A by F is a second-order vector field \(F^S_*A\) on N defined by
Clearly, \(F^S_*|_{T M} = F_*\) is the usual pushforward, but \(F^{S*}|_{T^* N} \ne F^*\). The following properties are straightforward.
Lemma 5.8
Let \(F: M\rightarrow N\), \(G:N\rightarrow K\) be two smooth maps. Let A be a second-order vector field on M and f, g be two smooth functions on N.
(i) \(G^S_*\circ F^S_* = (G\circ F)^S_*\).
(ii) If F is a diffeomorphism, then \(((F^S_*A)f)\circ F = A(f\circ F)\).
(iii) \(F^{S*}(d^2 f) = d^2 (f\circ F)\), \(F^{S*} (df\cdot dg) = d(f\circ F)\cdot d(g\circ F)\).
5.3 Mixed-Order Tangent and Cotangent Bundles
In this subsection, we will extend the notions of the previous two subsections to the product manifold \({\mathbb {R}}\times M\).
Definition 5.9
The mixed-order tangent bundle of \({\mathbb {R}}\times M\) is the product bundle (Saunders 1989, Definition 1.4.1) \((T {\mathbb {R}}\times {\mathcal {T}}^S M, \tau _{\mathbb {R}}\times \tau ^S_M, {\mathbb {R}}\times M)\). The mixed-order cotangent bundle of \({\mathbb {R}}\times M\) is the product bundle \((T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M, \tau ^*_{\mathbb {R}}\times \tau ^{S*}_M, {\mathbb {R}}\times M)\). A section of the mixed-order tangent or cotangent bundle is called a mixed-order vector field or mixed-order form, respectively.
The mixed-order tangent and cotangent bundles are dual to each other. The mixed-order tangent (or cotangent) bundle mixes the first-order tangent (or cotangent) bundle in time with the second-order one in space, hence the terminology “mixed-order.” It also matches the fundamental principle of stochastic analysis, summarized by Itô’s rule \((dX(t))^2 \sim dt\).
For an M-valued diffusion X with (time-dependent) generator \(A^X\), we call the operator \(\frac{\partial }{\partial {t}} + A^X\) its extended generator. This extended generator is a mixed-order vector field on \({\mathbb {R}}\times M\). Also note that the extended generator \(\frac{\partial }{\partial {t}} + A^X\) of \(X\in I_{t_0}(M)\) can be characterized by the property that for every \(f\in C^\infty ({\mathbb {R}}\times M)\), the process
is a real-valued continuous \(\{{\mathcal {P}}_t\}\)-martingale. In general, a mixed-order vector field A has the following local expression:
To give an example of mixed-order forms, we consider a smooth function f on \({\mathbb {R}}\times M\) and define in local coordinates
Then, \(d^\circ f\) is a mixed-order form, and we call it the mixed differential of f. Clearly, the pairing of the mixed differential \(d^\circ f\) and a mixed-order vector field A is \(\langle d^\circ f, A \rangle = Af\).
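To make this pairing explicit, here is a sketch in local coordinates. It assumes the convention, used for generators in this paper, that the second-order part of a mixed-order vector field carries a factor \(\frac{1}{2}\) and that \(A^{jk}\) is symmetric; the coefficient names \(A^0, A^i, A^{jk}\) are illustrative.

```latex
\[
\begin{aligned}
A &= A^0 \frac{\partial}{\partial t} + A^i \frac{\partial}{\partial x^i}
   + \frac{1}{2} A^{jk} \frac{\partial^2}{\partial x^j \partial x^k}, \\
d^\circ f &= \frac{\partial f}{\partial t}\, dt + \frac{\partial f}{\partial x^i}\, d^2 x^i
   + \frac{1}{2} \frac{\partial^2 f}{\partial x^j \partial x^k}\, dx^j\cdot dx^k, \\
\langle d^\circ f, A \rangle &= A^0 \frac{\partial f}{\partial t} + A^i \frac{\partial f}{\partial x^i}
   + \frac{1}{2} A^{jk} \frac{\partial^2 f}{\partial x^j \partial x^k} = Af .
\end{aligned}
\]
```

For the extended generator \(\frac{\partial }{\partial {t}} + A^X\) one has \(A^0 = 1\), and the last line is the drift term in Itô’s formula for \(f(t, X(t))\).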
Given a bundle homomorphism \(F:({\mathbb {R}}\times M, \pi , {\mathbb {R}})\rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\), we define its mixed-order tangent map at \((t,q)\in {\mathbb {R}}\times M\) by
Its mixed-order cotangent map at \((t,q)\in {\mathbb {R}}\times M\) is defined as the linear map \(d^\circ F^*_{(t,q)}: T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} N|_{F(t,q)} \rightarrow T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M|_{(t,q)}\) dual to \(d^\circ F_{(t,q)}\). If, moreover, F is a bundle isomorphism, its mixed-order pushforward and pullback, denoted by \(F^R_*\) and \(F^{R*}\), respectively, can be defined in a similar manner to Definition 5.7. We leave their detailed but cumbersome definitions and properties to “Appendix A.1.”
6 Stochastic Hamiltonian Mechanics
6.1 Horizontal Diffusions
In this subsection, we consider a general fiber bundle \((E,\pi _M, M)\) over a manifold M, with fiber dimension n. We first introduce a special class of diffusions on this fiber bundle, which we call horizontal diffusions. They are defined in a similar fashion as the horizontal subspaces in Definition 3.7. Roughly speaking, a horizontal diffusion process on E is a diffusion that is random only “horizontally,” but not on fibers.
Definition 6.1
(Horizontal diffusions on fiber bundles) Let \((E,\pi _M, M)\) be a fiber bundle. An E-valued diffusion process \({\textbf{X}}\) is said to be horizontal if there exist an M-valued diffusion process X and a smoothly time-dependent section \(\phi =(\phi _t)\) of \(\pi _M\), such that a.s. \({\textbf{X}}(t) = \phi (t, X(t))\) for all t.
The process X in the above definition is just the projection of \({\textbf{X}}\), for \(\pi _M({\textbf{X}}(t)) = \pi _M(\phi (t,X(t))) = X(t)\) a.s. Since the projection map \(\pi _M\) is smooth, X is still a diffusion process.
Now we are going to define a subclass of “integral processes” for second-order vector fields on E by making use of horizontal diffusions. We use \((x^i,u^\mu )\) for an adapted coordinate system on E (see Saunders 1989, Definition 1.1.5), where Greek letters label the fiber coordinates.
Given a second-order vector field with local expression
where \(A^i, A^\mu , A^{jk}, A^{j\mu }, A^{\mu \nu }\) are smooth functions in the local chart of E, by a horizontal integral process of A in (6.1) we mean an E-valued horizontal diffusion process \({\textbf{X}}\) such that \({\textbf{X}}\) is an integral process of A in the sense of (2.22), that is, it is determined by the system
where the expression \(x\circ {\textbf{X}}\) means that the family of coordinate functions \((x^i)\) acts on \({\textbf{X}}\), and so on. Set \({\textbf{X}}(t) = \phi (t, X(t))\) for some time-dependent section \(\phi \) of \(\pi _M\) and M-valued diffusion X. Denote \(\phi ^\mu = u^\mu \circ \phi \). By Itô’s formula, the system (6.2) can be written as
If X(t) has full support for all t, then the last three equations in (6.3) translate into a system of (possibly degenerate) parabolic equations on E,
Therefore, under suitable assumptions for the coefficients \(A^i, A^\mu , A^{jk}, A^{j\mu }, A^{\mu \nu }\), Eq. (6.4) is solvable, at least locally, by some time-dependent local section \(\phi = (\phi _t)\) over a time interval [0, T]. Then, plugging \(\phi (t)\) into the first two equations of (6.3), we can find X and hence \({\textbf{X}}\). We call X a projective integral process of A.
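The reduction from the integral-process system to equations for the section rests on Itô’s formula, which can be sketched as follows. Here \(\beta ^i\) and \(a^{jk}\) are illustrative names (not from the text) for the drift and the diffusion matrix of X in the chart, so that \(d[X^j,X^k] = a^{jk}dt\), and \(U^\mu (t) = \phi ^\mu (t,X(t))\) denotes the fiber coordinates of \({\textbf{X}}\); all coefficients are evaluated at \((t, X(t))\).

```latex
\[
\begin{aligned}
\text{drift of } U^\mu &= \frac{\partial \phi^\mu}{\partial t}
   + \beta^i \frac{\partial \phi^\mu}{\partial x^i}
   + \frac{1}{2} a^{jk} \frac{\partial^2 \phi^\mu}{\partial x^j \partial x^k}, \\
\frac{d[X^j, U^\mu]}{dt} &= a^{jk} \frac{\partial \phi^\mu}{\partial x^k}, \\
\frac{d[U^\mu, U^\nu]}{dt} &= a^{jk} \frac{\partial \phi^\mu}{\partial x^j} \frac{\partial \phi^\nu}{\partial x^k} .
\end{aligned}
\]
```

Matching these three quantities with the fiber-related coefficients of A, evaluated along \(\phi \), gives three equations for \(\phi ^\mu \) alone once X(t) has full support; the first of them is of (possibly degenerate) parabolic type.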
6.2 The Second-Order Symplectic Structure on \({\mathcal {T}}^{S*} M\) and Stochastic Hamilton’s Equations
It is well known that the classical cotangent bundle \(T^* M\) has a natural symplectic structure, given by the canonical symplectic form \(\omega _0 = dx^i \wedge dp_i\), where \((x^i,p_i)\) are the canonical local coordinates on \(T^* M\) induced by local coordinates \((x^i)\) on M. Clearly \(\omega _0\) is closed, because it is exact as \(\omega _0 = -d \theta _0\), where \(\theta _0 = p_i dx^i\) is called the Poincaré (or tautological) 1-form.
Now we need to define a similar structure on the second-order cotangent bundle \({\mathcal {T}}^{S*} M\), which is a second-order counterpart of the symplectic structure. Firstly, we adapt the coordinate-free definition of the tautological 1-form to the second-order case.
Definition 6.2
The second-order tautological form \(\theta \) is a second-order form on \({\mathcal {T}}^{S*} M\) defined by
Under the induced coordinate system on \({\mathcal {T}}^{S*} M\) defined in (5.3), the second-order tautological form \(\theta \) has the following coordinate representation
We introduce the canonical second-order symplectic form \(\omega \) on \({\mathcal {T}}^{S*} M\) by writing \(\omega = -d^2 \theta \). Although we do not define the exterior differential for second-order forms, we can still take \(d^2\) formally on both sides of (6.5), using Leibniz’s rule and the composition rule \(d\circ d=d^2\) (cf. Meyer 1981b, Section 6.(e)), and forcing \(d^3 = 0\) and \((d^2-)\cdot (d-) = (d-)\cdot (d^2-) = 0\). Then, we get
We call the pair \(({\mathcal {T}}^{S*} M, \omega )\) a second-order symplectic manifold. The complete axiom system for a second-order differential system \((d,d^2,\wedge ,\cdot )\) is beyond the scope of this paper.
Remark 6.3
In the formal expression \((d\circ d) f=d^2 f\), \(f\in C^\infty (M)\), the two differential operators d on the LHS are different. The second d is still de Rham’s exterior differential on M, while the first needs to be understood as the exterior differential on TM by regarding the first differential df as a function on TM. Thus, the complete expression should be \(d_{TM}\circ d_M = d^2\). Along this way, the differential operator \(d_{TM}\) can be extended to a linear transform that maps 1-forms to second-order forms and satisfies Leibniz’s rule, see Emery (1989, Theorem 7.1). We shall denote the linear operator extended from \(d_{TM}\) by \({{\varvec{d}}}\) to distinguish it from d. In local coordinates, it acts on a 1-form \(\eta = \eta _i dx^i\) by \({{\varvec{d}}}\eta = \eta _i d^2x^i + \frac{1}{2} \frac{\partial \eta _i}{\partial x^j} dx^i\cdot dx^j\), so that \({\hat{\varrho }}^*({{\varvec{d}}}\eta ) = \eta \) and \(d^2 = {{\varvec{d}}}\circ d\). When a linear connection \(\nabla \) is specified, \({{\varvec{d}}}\eta = \eta _i d^\nabla x^i + \frac{1}{2} \nabla \eta (\partial _i,\partial _j) dx^i\cdot dx^j\) which covers (5.8).
As in the classical case, we have the following property for the second-order tautological form.
Lemma 6.4
The second-order tautological form \(\theta \) is the unique second-order form on \({\mathcal {T}}^{S*} M\) with the property that, for every second-order form \(\alpha \) on M, \(\alpha ^{S*} \theta = \alpha \).
Proof
From Lemma 5.8, we have, for any second-order vector \(A\in {\mathcal {T}}^S_q M\),
since \(\tau _M^{S*} \circ \alpha = \textbf{Id}_M\). \(\square \)
Recall that, in Definition 5.7, we have defined the second-order pullbacks of second-order forms. Now, given a smooth map \({\textbf{F}}: {\mathcal {T}}^{S*} M \rightarrow {\mathcal {T}}^{S*} N\) and a second-order 2-form \(\eta \) on \({\mathcal {T}}^{S*} N\), we may also define the second-order pullback \({\textbf{F}}^{S*} \eta \) of \(\eta \) by \({\textbf{F}}\) by allowing \({\textbf{F}}^{S*}\) to be exchangeable with the symmetric product \(\cdot \) as well as the wedge product \(\wedge \). Then, as a corollary of Lemma 6.4, we have
Definition 6.5
Let \(\omega \) and \(\eta \) be the canonical second-order symplectic forms on \({\mathcal {T}}^{S*} M\) and \({\mathcal {T}}^{S*} N\), respectively. A bundle homomorphism \({\textbf{F}}: ({\mathcal {T}}^{S*} M, {\hat{\varrho }}^*_M, T^* M) \rightarrow ({\mathcal {T}}^{S*} N, {\hat{\varrho }}^*_N, T^*N)\) is called second-order symplectic or a second-order symplectomorphism if \({\textbf{F}}^{S*} \eta = \omega \).
Theorem 6.6
Let \(F: N\rightarrow M\) be a diffeomorphism. The second-order pullback \(F^{S*}: {\mathcal {T}}^{S*} M \rightarrow {\mathcal {T}}^{S*} N\) by F is second-order symplectic; in fact \((F^{S*})^{S*} \vartheta = \theta \), where \(\vartheta \) is the second-order tautological form on \(\mathcal T^{S*} N\).
Proof
For \(q\in M\), \(\alpha _q\in {\mathcal {T}}^{S*}_q M\) and \(A \in {\mathcal {T}}^S_{\alpha _q} {\mathcal {T}}^{S*} M\),
where we used the fact that \(F\circ \tau _N^{S*}\circ F^{S*} = \tau _M^{S*}\) in the fourth line. \(\square \)
Clearly, the counterparts of Hamiltonian vector fields on \(T^* M\) are now second-order vector fields on \({\mathcal {T}}^{S*} M\). Note that for a second-order vector field A on \({\mathcal {T}}^{S*} M\), the form \(A \lrcorner \, \omega \) takes values in the cotangent bundle \({\mathcal {T}}^{S*} {\mathcal {T}}^{S*} M\).
Definition 6.7
Let \(H: {\mathcal {T}}^{S*} M \rightarrow {\mathbb {R}}\) be a given smooth function. A second-order vector field \(A_H\) on \({\mathcal {T}}^{S*} M\) satisfying
is called a second-order Hamiltonian vector field of H. We call the triple \(({\mathcal {T}}^{S*} M, \omega , H)\) a second-order Hamiltonian system. The function H is called the second-order Hamiltonian of the system.
According to (6.7), the second-order vector field \(A_H\) satisfies
The condition (6.7) cannot uniquely determine \(A_H\). It is easy to verify that \(A_H\) is of the general form
where the coefficients \(C_{jk}, A_{jk}, A_{ijkl}, A^j_k, A^j_{kl}, A_{jkl}\) are smooth functions on the local chart satisfying
such that the local expression (6.9) is invariant under the canonical change of coordinates on \({\mathcal {T}}^{S*} M\) induced by a change of coordinates on M, governed by the structure group in Lemma 5.2.
Given such a second-order Hamiltonian vector field of H, its horizontal integral process is a \({\mathcal {T}}^{S*} M\)-valued horizontal diffusion \({\textbf {X}}\) determined by the following MDEs on \({\mathcal {T}}^{S*} M\),
or, in coordinates,
where \(\big ( x^i, p_i, o_{jk}, D^i x, D_i p, D_{jk} o, Q^{jk} x, Q_{jk} p, Q_{ijkl} o, Q^j_k(x,p), Q^j_{kl}(x,o), Q_{jkl}(p,o) \big )\) are canonical coordinates on \({\mathcal {T}}^S {\mathcal {T}}^{S*} M\). The first and third equations have been conjectured in Zambrini (2015) as stochastic Hamilton’s equations in the Euclidean space, since they have the same form as classical Hamilton’s equations (e.g., Abraham and Marsden 1978, Proposition 3.3.2) except that the mean derivative D replaces the classical time derivative.
At first glance, one may think that the system (6.10) is underdetermined, as there are fewer equations than unknowns (the number of unknowns is equal to the fiber dimension of \({\mathcal {T}}^S {\mathcal {T}}^{S*} M\)). Besides, we have not yet given (6.10) initial or terminal data. These points will become clear after we make the following observations. Firstly, the first two equations of (6.10) constitute MDEs that are equivalent to an Itô SDE for \(x({\textbf{X}})\) in the weak sense, as we have seen in Sect. 2.4. So \(x({\textbf{X}})\) should be assigned an initial value, say,
where \(\mu _0\) is a given probability measure on M. Secondly, in the third and fourth equations of (6.10), only the “drift” information of \(p({\textbf{X}})\) and \(o({\textbf{X}})\) is clear. To overcome the lack of information, we need to assign \(p({\textbf{X}})\) and \(o({\textbf{X}})\) terminal values, say,
where \((p^*, o^*)\) is a given second-order form. Therefore, the third and fourth equations are understood as backward SDEs, whose drifts rely on diffusion coefficients via the last equation. The system (6.10) together with boundary values (6.11) and (6.12) could be understood as a (coupled) forward–backward system of SDEs (Yong and Zhou 1999) (where “backward” is taken in a different sense from ours in Sect. 2).
Notice that those forward–backward SDEs are not necessarily solvable (see Yong and Zhou 1999, Proposition 7.5.2 for an example). In order to solve (6.10)–(6.12), we have to take the horizontal condition into consideration, and make some compatibility assumption. More precisely, we set \(X=\tau _M^{S*}({\textbf{X}})\) and
for some time-dependent second-order form \(\alpha \) on M, and denote \(p_i(t,x) = p_i(\alpha (t,x))\) and \(o_{jk}(t,x) = o_{jk}(\alpha (t,x))\), so that \(\alpha (t,x) = (p(t,x), o(t,x))\). Assume that for each \(t\in (0,T)\), X(t) has full support. Then, by applying Itô’s formula, in the same way as in (6.4), the system (6.10) reduces to
Next, by taking partial derivative \(\frac{\partial }{\partial {x^j}}\) on both sides of the first equation of (6.14) and comparing with the next two, we find the following sufficient condition for the last two equations of (6.14):
or equivalently, for the terminal value \((p^*,o^*)\),
Equation (6.15) implies that \(\alpha \) in (6.13) is “exact,” in the sense that \(\alpha = {{\varvec{d}}}\eta \) for the time-dependent 1-form \(\eta = p_i dx^i\), where \({{\varvec{d}}}\) is the extended differential operator defined in Remark 6.3. Similarly, Eq. (6.16) implies that \((p^*,o^*) = {{\varvec{d}}}\eta ^*\) for 1-form \(\eta ^* = p^*_i dx^i\). The second equality of (6.15) [or (6.16)], called Onsager reciprocity or Maxwell relations (Abraham and Marsden 1978, Section 5.3), implies that the 1-form \(\eta \) (or \(\eta ^*\)) is closed. We will refer to Eq. (6.15) or (6.16) as second-order Maxwell relations.
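The link between the second-order Maxwell relations and exactness can be sketched in coordinates. The display assumes that (6.15) takes the form \(o_{jk} = \partial p_j/\partial x^k = \partial p_k/\partial x^j\), as the surrounding description suggests, that the fiber coordinates \(o_{jk}\) are symmetric, and that \(\alpha = p_i d^2x^i + \frac{1}{2} o_{jk} dx^j\cdot dx^k\); the function S is an illustrative local potential, and \({{\varvec{d}}}\) is the operator of Remark 6.3.

```latex
\[
\begin{aligned}
{{\varvec{d}}}\eta &= p_i\, d^2 x^i + \frac{1}{2} \frac{\partial p_j}{\partial x^k}\, dx^j\cdot dx^k
  = p_i\, d^2 x^i + \frac{1}{4}\Big(\frac{\partial p_j}{\partial x^k} + \frac{\partial p_k}{\partial x^j}\Big)\, dx^j\cdot dx^k, \\
\alpha = {{\varvec{d}}}\eta \ &\Longleftrightarrow\
  o_{jk} = \frac{1}{2}\Big(\frac{\partial p_j}{\partial x^k} + \frac{\partial p_k}{\partial x^j}\Big), \\
d\eta = 0 \ &\Longleftrightarrow\
  \frac{\partial p_j}{\partial x^k} = \frac{\partial p_k}{\partial x^j}
  \ \Longrightarrow\ \eta = dS \text{ locally, and then } \alpha = {{\varvec{d}}}(dS) = d^2 S .
\end{aligned}
\]
```

The symmetric product only sees the symmetrized derivative of p, so exactness of \(\alpha \) needs the first relation alone, while the symmetry \(\partial p_j/\partial x^k = \partial p_k/\partial x^j\) is what makes \(\eta \) closed.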
Under the second-order Maxwell relations, the original stochastic Hamilton’s system (6.10) turns into the following coupled MDE-PDE system.
The boundary values in (6.11) and (6.12) now read
We first use the terminal value in (6.18), which satisfies (6.16), to solve the last two PDEs in (6.17). This gives (p, o) and hence the second-order form \(\alpha \). Then, we plug p and o into the first two MDEs and solve them with initial distribution in (6.18). This yields in law the M-valued diffusion \(X = \tau _M^{S*}({\textbf {X}})\) as a projective integral process of \(A_H\).
We call system (6.10) or (6.17) the stochastic Hamilton’s equations (S-H equations in short). The second-order Maxwell relations are sufficient for the component o of \(\alpha \) in (6.13) to solve the last two equations of (6.10), so we refer to them as an integrability condition of (6.10). When restricting settings to Riemannian manifolds, the S-H equations (6.10) can be simplified to a global Hamiltonian-type system on \(T^*M\), as we will see in Sect. 7.4.2.
Lemma 6.8
Let \(H: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be a time-dependent second-order Hamiltonian, and \({\textbf{X}}\) be a horizontal integral process of \(A_H\). Then, the total mean derivative of H along \({\textbf{X}}\) is
Proof
We use (6.10) and local coordinates to derive
The result follows. \(\square \)
In particular, when H is time-independent, we have
which is also a consequence of (6.8). Equivalently, H is harmonic with respect to the horizontal integral process \({\textbf{X}}\). In this case, we can say that H is stochastically conserved, or is a stochastic conserved quantity. In particular, the expectation \({\textbf{E}}[H({\textbf{X}})]\) is a constant.
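The constancy of the expectation can be sketched in one line. The display assumes that the identity above is the vanishing of the mean derivative of \(H({\textbf{X}}(t))\), written here simply as D, and that there is enough integrability to exchange expectation and time differentiation on the time interval under consideration.

```latex
\[
\frac{d}{dt}\, {\textbf{E}}\big[ H({\textbf{X}}(t)) \big]
  = {\textbf{E}}\big[ D\, H({\textbf{X}}(t)) \big] = 0
\quad\Longrightarrow\quad
{\textbf{E}}\big[ H({\textbf{X}}(t)) \big] = {\textbf{E}}\big[ H({\textbf{X}}(0)) \big] .
\]
```

The first equality follows by taking expectations in the definition of the mean derivative as a conditional expectation of forward increments.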
6.3 Two Inspirational Examples
Let M be a Riemannian manifold with Riemannian metric g. Assume for simplicity that M is compact. Let \(\nabla \) be the Levi–Civita connection on TM with Christoffel symbols \((\Gamma ^k_{ij})\). In this subsection, we will consider two types of processes on M, to provide some intuition of our stochastic Hamiltonian formalism.
6.3.1 Diffusion Processes on Riemannian Manifolds
Consider a second-order Hamiltonian H on \({\mathcal {T}}^{S*} M\) with the following coordinate expression:
where b is a given smooth vector field on M and F a smooth function on M. One can easily verify that the expression on the RHS of (6.20) is indeed invariant under changes of coordinates. We consider the S-H equations (6.17) subject to boundary conditions \(\text {Law} (X(0)) = \mu _0\) and \((p,o)(T) = d^2 S_T\), where \(\mu _0\) is a given probability distribution and \(S_T\) a given smooth function on M.
By the first two equations of system (6.17), the projection diffusion X satisfies the following MDEs,
subject to the initial distribution \(\text {Law} (X(0)) = \mu _0\); or equivalently (according to the end of Sect. 2.4), it can be rewritten as the following Itô SDE in the weak sense,
where \(\sigma \) is the positive-definite square root (1, 1)-tensor of g, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), and W denotes an \({\mathbb {R}}^d\)-valued standard Brownian motion. Note that Eqs. (6.21) are independent of the coordinates (p, o), so they form a closed system on the base manifold M and can be solved independently. Indeed, the solution X is a diffusion on M with generator \(A^X = (b^i - \frac{1}{2} g^{jk} \Gamma _{jk}^i) \partial _i + \frac{1}{2}g^{jk} \partial _j\partial _k = \nabla _b + \frac{1}{2} \Delta \).
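For orientation, an Itô SDE with the generator \(A^X\) stated above can be sketched in local coordinates as follows. This is a coordinate sketch written to be consistent with that generator and with \(\sigma \) and W as described; summation over the repeated index r is assumed.

```latex
\[
\begin{aligned}
dX^i(t) &= \Big( b^i - \frac{1}{2} g^{jk} \Gamma^i_{jk} \Big)(X(t))\, dt + \sigma^i_r(X(t))\, dW^r(t), \\
d[X^j, X^k](t) &= \big(\sigma^j_r \sigma^k_r\big)(X(t))\, dt = g^{jk}(X(t))\, dt .
\end{aligned}
\]
```

Applying Itô’s formula to \(f(X(t))\) for smooth f then produces the drift \((b^i - \frac{1}{2} g^{jk} \Gamma _{jk}^i) \partial _i f + \frac{1}{2}g^{jk} \partial _j\partial _k f\), which equals \((\nabla _b + \frac{1}{2} \Delta ) f\) because \(\Delta f = g^{jk}(\partial _j\partial _k f - \Gamma ^i_{jk}\partial _i f)\).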
Now we consider the last two equations of (6.17). The LHS of the third equation reads
where \(\langle \cdot ,\cdot \rangle \) denotes the pairing of vectors and covectors, \(\Delta \) is the Laplace–Beltrami operator and \(\nabla \) the gradient, with respect to g. In order to find the solution of the third equation of (6.17), we consider the following linear backward parabolic equation (where “backward” has a meaning different from that in Sect. 2.2)
with terminal value \(S(T,x) = S_T(x)\). We let
and use (6.23) and (6.15) to derive
which agrees with the third equation of (6.17).
Finally, we combine (6.24) with (6.15) to conclude that the horizontal integral process \({\textbf{X}}\) is
Example 6.9
(Brownian motions) When \(b\equiv 0\) and \(F\equiv 0\), the second-order Hamiltonian is \(H(x,p,o) = \frac{1}{2} g^{ij}(x) (o_{ij} - \Gamma _{ij}^k(x) p_k)\), and the solution process X is a standard Brownian motion on M with initial distribution \(\mu _0\). Such a second-order Hamiltonian H can be regarded as a “stochastic deformation” of the trivial classical Hamiltonian \(H_0 = 0\). Indeed, H is the g-canonical lift of \(H_0\) that will be defined in forthcoming Sect. 6.6. Therefore, we may regard Brownian motions as “stochastization” or “stochastic deformation” of trivially constant curves on the base manifold M.
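To see how this Hamiltonian encodes the Laplace–Beltrami operator, one can evaluate it on an exact second-order form. The sketch assumes \((p,o) = d^2 S\), that is, \(p_k = \partial _k S\) and \(o_{ij} = \partial _i\partial _j S\), for a smooth function S on M chosen for illustration; \(\nabla ^2 S\) denotes the Riemannian Hessian.

```latex
\[
H\big(x, d^2 S(x)\big)
  = \frac{1}{2} g^{ij}(x) \Big( \frac{\partial^2 S}{\partial x^i \partial x^j}(x)
    - \Gamma^k_{ij}(x) \frac{\partial S}{\partial x^k}(x) \Big)
  = \frac{1}{2} g^{ij}(x)\, (\nabla^2 S)_{ij}(x)
  = \frac{1}{2} \Delta S(x) .
\]
```

Since the Hessian is a tensor, this also makes the coordinate invariance of H visible on exact second-order forms.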
We are going to describe in the next example a dynamical approach to diffusions, elaborated afterward (Sect. 7.3), inspired by Schrödinger.
6.3.2 Reciprocal Processes and Diffusion Bridges on Riemannian Manifolds
With the same coefficients b, F and boundary data \(\mu _0,S_T\) in Sect. 6.3.1, we consider the S-H system (6.17) with the following second-order Hamiltonian H on \({\mathcal {T}}^{S*} M\):
subject to boundary conditions \(\text {Law} (X(0)) = \mu _0\) and \((p,o)(T) = d^2 S_T\). Here, b and F are called, respectively, vector and scalar potentials in classical mechanics. Again, it is easy to verify that the expression on the RHS of (6.26) is indeed invariant under changes of coordinates.
The LHS of the third equation in (6.17) now reads
In order to find the solution of the third equation of (6.17), we first consider the positive solution of the following backward parabolic equation on M
with terminal value \(u(T,x) = e^{S_T(x)}\), where \(\langle \cdot , \cdot \rangle \) denotes the Riemannian inner product with respect to g. If we let \(S=\ln u\), then it is easy to verify that S satisfies the following Hamilton–Jacobi–Bellman (HJB) equation
with terminal value \(S(T,x) = S_T(x)\), where \(|\cdot |\) denotes the Riemannian norm with respect to g. Now we let
and use (6.28) and (6.15) to derive, in a way similar to (6.25),
which agrees with the third equation of (6.17). Therefore, the projection diffusion X of the system (6.17) satisfies the following MDEs,
subject to the initial distribution \(\text {Law} (X(0)) = \mu _0\); or equivalently (according to the end of Sect. 2.4), it can be rewritten as the following Itô SDE in the weak sense,
where \(\sigma \) is the positive-definite square root (1, 1)-tensor of g, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), W denotes an \({\mathbb {R}}^d\)-valued standard Brownian motion.
The solution process X of (6.31) is called a Bernstein process (Bernstein 1932; Cruzeiro et al. 2000) [or the reciprocal process derived from the M-valued diffusion in (6.22) (Jamison 1975)]. The time marginal distribution \(\mu _t\) of X satisfies a Born-type formula \(\mu _t(dx) = u(t,x) v(t,x) dx\) (see, e.g., Zambrini 1986, Corollary 3.3.1 or Cruzeiro and Zambrini 1991, Equations (2.9), (4.6) and (4.8)), where v satisfies the adjoint equation of (6.27). The terminal law of X can be determined as follows: we first solve (6.27) to get u(0, x), then find the initial value for v via \(\mu _0(dx) = u(0,x) v(0,x) dx\), and solve the equation for v to get v(T, x); finally, the terminal law of X is given by \(\mu _T(dx) = u(T,x)v(T,x)dx\). In particular, when \(\mu _0 = \delta _{q_1}\) and \(\mu _T = \delta _{q_2}\) for \(q_1, q_2 \in M\), the solution X of (6.31) is the Markovian bridge of the diffusion Y conditioned on the end point \(q_2\) (Çetin and Danilova 2016).
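The logarithmic change of variable \(S=\ln u\) used above can be checked numerically in the simplest setting. The sketch below assumes the flat one-dimensional case \(M={\mathbb {R}}\) with \(b\equiv 0\) and \(F\equiv 0\), where the equation for u is taken to be the backward heat equation \(u_t + \frac{1}{2}u_{xx} = 0\) and the HJB equation is expected to read \(S_t + \frac{1}{2}S_x^2 + \frac{1}{2}S_{xx} = 0\). The particular solution u, the parameters `a`, `c`, `T` and all function names are illustrative choices, not taken from the text.

```python
import math


def u(t, x, T=1.0, a=0.7, c=-0.4):
    """Positive solution of the 1-D backward heat equation u_t + 0.5*u_xx = 0.

    Each term exp(k*x + 0.5*k**2*(T - t)) solves the equation, hence so does the sum.
    """
    return (math.exp(a * x + 0.5 * a * a * (T - t))
            + math.exp(c * x + 0.5 * c * c * (T - t)))


def S(t, x):
    """Logarithmic transform S = ln u (well defined because u > 0)."""
    return math.log(u(t, x))


def hjb_residual(t, x, h=1e-3):
    """Central-difference estimate of S_t + 0.5*S_x**2 + 0.5*S_xx at (t, x)."""
    S_t = (S(t + h, x) - S(t - h, x)) / (2 * h)
    S_x = (S(t, x + h) - S(t, x - h)) / (2 * h)
    S_xx = (S(t, x + h) - 2 * S(t, x) + S(t, x - h)) / (h * h)
    return S_t + 0.5 * S_x ** 2 + 0.5 * S_xx


# Evaluate the residual at a few sample points; it should vanish up to
# finite-difference error.
points = [(0.2, -1.0), (0.5, 0.0), (0.8, 1.5)]
max_residual = max(abs(hjb_residual(t, x)) for t, x in points)
print(max_residual)
```

The residual is small only up to discretization error of order `h**2`; the check illustrates the transform and is not a substitute for the algebraic verification.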
Again, we combine (6.29) with (6.15) to conclude that the horizontal integral process \({\textbf{X}}\) is
Remark 6.10
-
(i)
The derivation of the reciprocal process (6.31) from the diffusion (6.22) was the way chosen by Jamison (1975), inspired by Schrödinger’s original problem (Schrödinger 1932). He involved neither geometry nor dynamical equations such as the HJB equation (6.28). As here, Jamison’s construction involved only the past (nondecreasing) filtration. The dynamical content dates back to Zambrini (1986), Cruzeiro and Zambrini (1991), Chung and Zambrini (2003), where a reciprocal process was constructed solely from the data of a Hamiltonian operator, as required by Schrödinger’s original problem, and the future (nonincreasing) filtration was also used to study the time-reversed dynamics. Cf. also Example 6.12 and Sect. 7.3.
-
(ii)
Equations (6.30) suggest that the transformation from coordinates (x, p, o) to coordinates (x, Dx, Qx) is not invertible. More precisely, the coordinates \((D^i x)\) are transformed from (x, p) but the coordinates \((Q^{jk} x)\) are only related to \((x^i)\). Besides, these two equations have nothing to do with the coordinates \((o_{jk})\). However, if we look at the \(\nabla \)-canonical coordinates \((D^i_\nabla x)\) for (6.30), then
$$\begin{aligned} (D_\nabla X)^i(t) = g^{ij}(X(t)) p_j(t, X(t)) + b^i(X(t)), \end{aligned}$$
which indicates that the transform from (x, p) to \((x,D_\nabla x)\) is invertible. These will help us establish stochastic Lagrangian mechanics and second-order Legendre transforms, in forthcoming Sect. 7.
-
(iii)
As observed in Sect. 2.2, every result presented here has a backward version (in the sense of backward mean derivatives with respect to the future filtration \(\{{\mathcal {F}}_t\}\)). Indeed, two forward–backward SDE systems for Bernstein diffusions on Euclidean space were derived in Cruzeiro and Vuillermot (2015): one is under the past filtration and coincides with ours, whereas the other one is under the future filtration.
There are some special cases which are of independent interest and have been considered in the literature.
Example 6.11
(Brownian (free) reciprocal processes and Brownian bridges) Consider the case where \(b\equiv 0\), \(F\equiv 0\). In this case, Y is a Brownian motion on M, so we call X a Brownian reciprocal process. In particular, the Brownian bridge from \(q_1\) to \(q_2\) of time length \(T>0\) is driven by the Itô SDE (6.31) where \(X(0) = q_1\), \(b\equiv 0\) and u satisfies the backward heat equation (6.27) with \(F\equiv 0\) and final value \(u(T,x) = \delta _{q_2}(x)\). See also Hsu (2002, Theorem 5.4.4). Thus, Brownian bridges are understood as stochastic Hamiltonian flows of the second-order Hamiltonian \(H(x,p,o) = \frac{1}{2} g^{ij}(x) p_ip_j - \frac{1}{2} g^{ij}(x) \Gamma _{ij}^k(x) p_k + \frac{1}{2} g^{ij}(x) o_{ij}\), compared with geodesics as Hamiltonian flows of the classical Hamiltonian \(H_0(x,p) = \frac{1}{2} g^{ij}(x) p_ip_j\) (cf. Abraham and Marsden 1978, Theorem 3.7.1). Here, the second-order Hamiltonian H is the g-canonical lift of \(H_0\). We can also say that Brownian bridges are “stochastization” or “stochastic deformation” of geodesics, cf. Example 6.9. Relations between geodesics and Brownian motions have attracted many studies. For example, one can find various interpolation relations between geodesics and Brownian motions in Angst et al. (2015) and Li (2016).
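A one-dimensional Euclidean instance of the Brownian bridge can be simulated with a few lines of code. The sketch below assumes \(M={\mathbb {R}}\) with the flat metric, so that u is the Gaussian heat kernel ending at \(q_2\) and the drift \(\nabla \ln u\) becomes \((q_2 - x)/(T-t)\); it uses a plain Euler–Maruyama discretization, which is a swapped-in numerical scheme and not part of the text. The function name `simulate_bridge` and its parameters are hypothetical.

```python
import math
import random


def simulate_bridge(q1, q2, T=1.0, n_steps=1000, seed=0):
    """Euler-Maruyama path of dX = (q2 - X)/(T - t) dt + dW on [0, T], X(0) = q1."""
    rng = random.Random(seed)      # local generator, so results are reproducible
    dt = T / n_steps
    x = q1
    path = [x]
    for k in range(n_steps):
        t = k * dt                 # t < T for every step, so T - t never vanishes
        drift = (q2 - x) / (T - t) # gradient of ln u for the heat kernel ending at q2
        x += drift * dt + math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


path = simulate_bridge(q1=0.0, q2=2.0)
print(path[0], path[-1])
```

In this discretization the last step pulls the path to \(q_2\) up to a Gaussian error of standard deviation `sqrt(dt)`, which reflects the singular drift of the bridge as t approaches T.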
Example 6.12
(Euclidean quantum mechanics; Chung and Zambrini 2003; Albeverio et al. 1989, 2006) It is insightful to consider the case \(M={\mathbb {R}}^d\) and \(b\equiv 0\). The Riemannian metric under consideration is the flat Euclidean one. To bring out the analogy with quantum mechanics, we introduce the reduced Planck constant \(\hbar \) into the second-order Hamiltonian H of (6.26), so that
The system (6.10) then reads as
Note that the first three equations form a sub-system and can be solved separately, as they are independent of the coordinates \(o_{ij}\)’s. Equation (6.27) and its adjoint now reduce to the following \(\hbar \)-dependent backward and forward heat equations, respectively,
which together with the Born-type formula \(\mu _t(dx) = u(t,x) v(t,x) dx\) display the strong analogy to quantum mechanics (Zambrini 1986).
The function \(S=\hbar \ln u\) solves the following \(\hbar \)-dependent HJB equation:
The first three equations can then be solved by letting \(p = \nabla S\). The first and third equations imply a Newton-type equation
This is indeed the equation of motion of the Euclidean version of quantum mechanics, which was the original motivation of Schrödinger in his well-known problem to be discussed in Sect. 7.3. See Chung and Zambrini (2003, p. 158) and Zambrini (2015, Eq. (4.17)) for more. Note that Chung and Zambrini (2003) and Zambrini (2015) used \(V=-F\) to denote the physical scalar potential and used the relations \(S= -\hbar \ln u\) and \(p = -\nabla S\) to formulate the HJB equation from the backward heat equation in the case of nondecreasing (past) filtration.
Two special cases will be studied further later.
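The passage from the linear equation for u to the HJB equation via \(S=\hbar \ln u\) rests on elementary identities, sketched here for a positive smooth function u on \({\mathbb {R}}^d\). The identities themselves hold for any such u; the normalization of the linear equation mentioned after the display is an assumption made for illustration.

```latex
\[
\begin{aligned}
\frac{\partial S}{\partial t} &= \hbar\, \frac{1}{u}\frac{\partial u}{\partial t}, \qquad
\nabla S = \hbar\, \frac{\nabla u}{u}, \qquad
\Delta S = \hbar\, \frac{\Delta u}{u} - \frac{1}{\hbar}\, |\nabla S|^2, \\
\frac{\hbar^2}{2}\, \frac{\Delta u}{u} &= \frac{1}{2}\, |\nabla S|^2 + \frac{\hbar}{2}\, \Delta S .
\end{aligned}
\]
```

Hence, if u solves a linear equation normalized as \(\hbar \frac{\partial u}{\partial t} + \frac{\hbar ^2}{2}\Delta u + F u = 0\), then dividing by u gives \(\frac{\partial S}{\partial t} + \frac{1}{2}|\nabla S|^2 + \frac{\hbar }{2}\Delta S + F = 0\) for S.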
-
(i)
When \(d=1\) and \(F(x) = \frac{1}{2} x^2\), i.e., \(H(x,p,o) = \frac{1}{2}(p^2 + x^2)+ \frac{1}{2}o\), we call its projective integral process X the (forward) stochastic harmonic oscillator. It is a stochastization of the classical harmonic oscillator with Hamiltonian \(H_0(x,p) = \frac{1}{2}(p^2 + x^2)\) (Abraham and Marsden 1978, Example 5.2.3). Likewise, here H is the canonical lift of \(H_0\), see Sect. 6.6.
-
(ii)
When \(d=1\) and \(F(x) = -\frac{1}{2} x^2\), i.e., \(H(x,p,o) = \frac{1}{2}(p^2 - x^2)+ \frac{1}{2}o\), we call it the (forward) Euclidean harmonic oscillator.
6.4 The Mixed-Order Contact Structure on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\)
In the following subsections we will investigate time-dependent systems. The proper space for consideration is now \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\). Recall from (5.9) that \({\mathcal {T}}^{S*} M\times {\mathbb {R}}= J^2{\hat{\pi }}\), where the latter is the second-order jet bundle of \((M\times {\mathbb {R}}, {\hat{\pi }}, M)\).
In classical differential geometry, the first-order jet bundle \(J^1{\hat{\pi }} = T^* M\times {\mathbb {R}}\) can be equipped with an exact contact structure in several ways (Abraham and Marsden 1978, Section 5.1). Among others, the canonical symplectic form \(\omega _0\) on \(T^* M\) corresponds to a contact structure on \(J^1{\hat{\pi }}\) via \({\tilde{\omega }}_0 = {\hat{\pi }}^* \omega _0\), which is indeed exact as \({\tilde{\omega }}_0 = -d\tilde{\theta }_0\) for \({\tilde{\theta }}_0 = dt + {\hat{\pi }}^*\theta _0\). Another commonly used contact structure is the Poincaré–Cartan form \(\omega ^0_{H_0} = {\tilde{\omega }}_0 + dH_0\wedge dt\) for a given function \(H_0\in C^\infty (J^1{\hat{\pi }})\). It is also exact as \(\omega ^0_{H_0} = - d\theta ^0_{H_0}\) where \(\theta ^0_{H_0} = {\hat{\pi }}^*\theta _0 - H_0dt\). The advantage of the Poincaré–Cartan form, compared with the contact form \(\omega _0\), is that it can be related to the (time-dependent) Hamiltonian vector field \(V_{H_0}\) on \(T^* M\) of \({H_0}\). More precisely, the vector field \(\tilde{V}_{H_0} = \frac{\partial }{\partial {t}} + V_{H_0}\), treated as a vector field on \(J^1{\hat{\pi }}\) and called the characteristic vector field of \(\omega ^0_{H_0}\), is the unique vector field satisfying \(\tilde{V}_{H_0} \lrcorner \, \omega ^0_{H_0} = 0\) and \(\tilde{V}_{H_0}\lrcorner \, dt=1\).
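The two defining properties of \(\tilde{V}_{H_0}\) can be checked directly in canonical coordinates \((x^i, p_i, t)\). The sketch below assumes the sign convention \(V_{H_0} = \frac{\partial H_0}{\partial p_i}\frac{\partial }{\partial x^i} - \frac{\partial H_0}{\partial x^i}\frac{\partial }{\partial p_i}\), which is the one compatible with \(\omega _0 = dx^i\wedge dp_i\), and writes \({\tilde{\omega }}_0\) in coordinates as \(dx^i\wedge dp_i\).

```latex
\[
\begin{aligned}
\tilde{V}_{H_0} \lrcorner\, (dx^i\wedge dp_i)
  &= \frac{\partial H_0}{\partial p_i}\, dp_i + \frac{\partial H_0}{\partial x^i}\, dx^i
   = dH_0 - \frac{\partial H_0}{\partial t}\, dt, \\
\tilde{V}_{H_0} \lrcorner\, (dH_0\wedge dt)
  &= \big(\tilde{V}_{H_0} H_0\big)\, dt - dH_0
   = \frac{\partial H_0}{\partial t}\, dt - dH_0, \\
\tilde{V}_{H_0} \lrcorner\, \omega^0_{H_0} &= 0, \qquad \tilde{V}_{H_0} \lrcorner\, dt = 1 .
\end{aligned}
\]
```

The second line uses \(V_{H_0} H_0 = 0\), so that \(\tilde{V}_{H_0} H_0 = \frac{\partial H_0}{\partial t}\); adding the first two lines gives the third.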
Now we proceed in a similar way for the second-order jet bundle \(J^2{\hat{\pi }}\). Define
Then, \({\tilde{\omega }} = -d{\tilde{\theta }}\). We call the pair \((J^2{\hat{\pi }}, {\tilde{\omega }})\) a second-order contact manifold and the pair \((J^2{\hat{\pi }}, {\tilde{\theta }})\) a mixed-order exact contact manifold. In local coordinates, \({\tilde{\omega }}\) has the same expression as \(\omega \) in (6.6), but we stress that it is now a second-order form on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\). The form \({\tilde{\theta }}\) has the local expression
This makes clear that \({\tilde{\theta }}\) is a mixed-order form on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\).
A time-dependent second-order Hamiltonian H is a smooth function on \(J^2{\hat{\pi }} \cong {\mathcal {T}}^{S*} M\times {\mathbb {R}}\). The second-order Hamiltonian vector field \(A_H\) of H is now a time-dependent second-order vector field on \({\mathcal {T}}^{S*} M\), and its horizontal integral process satisfies the same equations as (6.10) or (6.17), only with H explicitly depending on time. Define a mixed-order vector field \({\tilde{A}}_H\) on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) by
where \(A_H\) is a second-order Hamiltonian vector field of the form (6.9). We call \({\tilde{A}}_H\) the extended second-order Hamiltonian vector field of H.
We define the second-order counterpart of Poincaré–Cartan form by
and call it the mixed-order Poincaré–Cartan form on \(\mathcal T^{S*} M\times {\mathbb {R}}\). It is exact in the sense that \(\omega _H = - d^\circ \theta _H\), where \(\theta _H = {\hat{\pi }}^{S*}\theta - Hdt = p_i d^2 x^i + \textstyle {\frac{1}{2}} o_{jk} dx^j\cdot dx^k - Hdt\).
The following lemma gives the relations between \(\omega _H\) and \({\tilde{A}}_H\).
Lemma 6.13
The class of extended second-order Hamiltonian vector fields \(\tilde{A}_H\) is the unique class of mixed-order vector fields on \(\mathcal T^{S*} M\times {\mathbb {R}}\) satisfying
Proof
Firstly, we show that \({\tilde{A}}_H\) satisfies the two equalities. The second equality is trivial. For the first one, we pick a mixed-order vector field B on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\); then,
To prove the uniqueness, it suffices to show that any mixed-order vector field A on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) satisfying \(A \lrcorner \, \omega _H = 0\) is a multiple of \({\tilde{A}}_H\). Suppose that A has the local expression
Then, it follows that
The vanishing of each coefficient gives
Therefore, \(A = A^0 {\tilde{A}}_H\). \(\square \)
6.5 Canonical Transformations and Hamilton–Jacobi–Bellman Equations
Let us study the second-order analogs of canonical transformations and their generating functions. To do so, we need to find a change of coordinates from \((x^i, p_i, o_{jk},t)\) to \((y^i, P_i, O_{jk}, s)\) that preserves the form of stochastic Hamilton’s equations (6.10) (with time-dependent second-order Hamiltonian). More precisely, we have the following definition of canonical transformations between mixed-order contact structures, which is adapted from those between classical contact structures in Asorey et al. (1983).
Definition 6.14
Let \(({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\tilde{\omega }})\) and \((\mathcal T^{S*} N\times {\mathbb {R}}, {\tilde{\eta }})\) be two second-order contact manifolds corresponding to second-order tautological forms \(\theta \) and \(\vartheta \). A bundle isomorphism \({\textbf{F}}: ({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\hat{\pi }}_{2,1}, T^* M\times {\mathbb {R}})\rightarrow ({\mathcal {T}}^{S*} N\times {\mathbb {R}}, {\hat{\rho }}_{2,1}, T^* N\times {\mathbb {R}})\) is called a canonical transformation if its projection \({\mathbb {F}}\) is a bundle isomorphism from \((T^* M\times {\mathbb {R}}, {\hat{\pi }}^1_{0,1}, {\mathbb {R}})\) to \((T^* N\times {\mathbb {R}}, {\hat{\rho }}^1_{0,1}, {\mathbb {R}})\) projecting to \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\), and there is a function \(H_{{\textbf{F}}} \in C^\infty ({\mathcal {T}}^{S*} M\times {\mathbb {R}})\) such that
where \(\omega _{H_{{\textbf{F}}}} = {\tilde{\omega }} + d^\circ H_{\textbf{F}} \wedge dF^0\).
The map \({\textbf{F}}\) in the definition is also a bundle isomorphism from \(({\mathcal {T}}^{S*} M\times {\mathbb {R}}, {\hat{\pi }}^2_{0,1}, {\mathbb {R}})\) to \(({\mathcal {T}}^{S*} N\times {\mathbb {R}}, {\hat{\rho }}^2_{0,1}, {\mathbb {R}})\) projecting to \(F^0\). Hence, we may assume \({\textbf{F}}(\alpha _q, t) = (\bar{\textbf{F}}(\alpha _q, t), F^0(t))\) for all \((\alpha _q, t) \in {\mathcal {T}}^{S*} M\times {\mathbb {R}}\), where \(\bar{{\textbf{F}}}\) is a smooth map from \(\mathcal T^{S*} M\times {\mathbb {R}}\) to \({\mathcal {T}}^{S*} N\). For each \(t\in {\mathbb {R}}\), we define a map \(\bar{{\textbf{F}}}_t: {\mathcal {T}}^{S*} M \rightarrow \mathcal T^{S*} N\) by \(\bar{{\textbf{F}}}_t(\alpha _q) = \bar{\textbf{F}}(\alpha _q,t)\). We also introduce an injection \(\jmath _t: \mathcal T^{S*} M \rightarrow {\mathcal {T}}^{S*} M\times {\mathbb {R}}\) by \(\jmath _t(\alpha _q) = (\alpha _q,t)\). Then, we have \(\bar{{\textbf{F}}}_t = {\hat{\rho }}_{1,1} \circ {\textbf{F}} \circ \jmath _t\).
Lemma 6.15
The map \(\bar{{\textbf{F}}}_t\) is second-order symplectic for each \(t\in {\mathbb {R}}\) if and only if there is a mixed-order form \(\alpha \) on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) such that
In particular, condition (6.34) implies that each \(\bar{{\textbf{F}}}_t\) is a second-order symplectomorphism.
Proof
The sufficiency follows from
For the necessity, we observe that
So we can write \({\textbf{F}}^{R*}{\tilde{\eta }} - {\tilde{\omega }} = \alpha \wedge dt + \gamma \), where \(\gamma \) is a mixed-order form which does not involve dt. This leads to \(\gamma = ({\hat{\pi }}_{1,1})^{R*} \circ (\jmath _t)^{R*}\gamma = ({\hat{\pi }}_{1,1})^{R*} \circ (\jmath _t)^{R*}({\textbf{F}}^{R*}{\tilde{\eta }} - {\tilde{\omega }} - \alpha \wedge dt) = 0\). The result follows. \(\square \)
The following lemma gives some equivalent statements to the condition (6.34).
Lemma 6.16
Condition (6.34) is equivalent to the following:
-
(i)
\({\textbf{F}}^{R*}{\tilde{\vartheta }} - {\tilde{\theta }} + H_{{\textbf{F}}} dF^0\) is mixed-order closed;
-
(ii)
for all \(K\in C^\infty ({\mathcal {T}}^{S*} N\times {\mathbb {R}})\), \({\textbf{F}}^{R*} \eta _K = \omega _H\);
-
(iii)
for all \(K\in C^\infty ({\mathcal {T}}^{S*} N\times {\mathbb {R}})\), \({\textbf{F}}^{R}_* {\tilde{A}}_H = {\tilde{A}}_K\);
where \(H = (K\circ {\textbf{F}} + H_{{\textbf{F}}})\dot{F}^0\).
Proof
The equivalence between (6.34) and (i) is clear. For (6.34) \(\Rightarrow \) (ii), since \({\textbf{F}}\) projects to \(F^0\),
The converse (ii) \(\Rightarrow \) (6.34) is straightforward by letting \(K\equiv 0\). To show (ii) \(\Rightarrow \) (iii), by applying Lemma 6.13, it suffices to prove that
while
and
(iii) \(\Rightarrow \) (ii) is similar. \(\square \)
Definition 6.17
Let \({\textbf{F}}: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathcal {T}}^{S*} N\times {\mathbb {R}}\) be canonical. If we can locally write
for \(G\in C^\infty (M\times {\mathbb {R}})\), then we call G a generating function for the canonical transformation \({\textbf{F}}\).
We use (x, p, o, t) for local coordinates on \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) and (y, P, O, s) for those on \({\mathcal {T}}^{S*} N\times {\mathbb {R}}\). Recall that \({\textbf{F}}(\alpha _q, t) = (\bar{\textbf{F}}(\alpha _q, t), F^0(t))\). Then, using (A.4), the relation (6.35) reads in coordinates as
Balancing the coefficient of dt, we get
By Lemma 6.16, the new Hamiltonian function K after the transformation \({\textbf{F}}\) is related to the old Hamiltonian H by \(H = (K\circ {\textbf{F}} + H_{{\textbf{F}}})\dot{F}^0\). Let us further assume that we can choose coordinates in which \((y^i)\) and \((x^i)\) are independent, so that the independent variables in (6.35) are (x, y, t). Then, relation (6.35) means
which implies that the generating function of the canonical transformation G(x, y, t) satisfies
The expressions for \((o_{jk})\) and \((O_{jk})\) are due to the mixed differential term in \(d^\circ G\) and correspond to the relation (6.15).
Remark 6.18
Unlike the canonical transformations of classical Hamiltonian systems, which have four types of generating functions related via the classical Legendre transform (see Goldstein et al. 2002, Section 9.1), here we can only have the type using (x, y, t) as independent variables, and no others. This can be attributed to the ill-behaved nature of the second-order analog of the Legendre transform, as indicated in Remark 6.10.(iii). However, if the configuration space M is a Riemannian manifold, stochastic Hamiltonian mechanics can be simplified to share the same phase space \(T^* M\) as classical Hamiltonian mechanics, so that we can also have four types of generating functions. See Sect. 7.4.2 for details and examples of canonical transformations.
The Hamilton–Jacobi–Bellman (HJB) equation can be introduced as a special case of a time-dependent canonical transformation (6.37). In the case where \(F^0 = \textbf{Id}_{\mathbb {R}}\) and the new Hamiltonian K vanishes formally, we denote by S the corresponding generating function G. It follows from (6.37) that S solves the Hamilton–Jacobi–Bellman equation,
$$\begin{aligned} \frac{\partial S}{\partial t} + H(d^2 S, t) = 0. \end{aligned}$$(6.38)
We will refer to Eq. (6.38) as the HJB equation associated with second-order Hamiltonian H, and a solution S of (6.38) as a second-order Hamilton’s principal function of H.
More generally, we have
Theorem 6.19
Let \(A_H\) be a second-order Hamiltonian vector field on \(({\mathcal {T}}^{S*} M, \omega )\) and let \(S\in C^\infty (M\times {\mathbb {R}})\). Then, the following statements are equivalent:
-
(i)
for every M-valued diffusion X satisfying
$$\begin{aligned} (DX(t), QX(t)) = d^2 \big (\tau _M^*\big )_{d^2 S(t,X(t))} A_H, \end{aligned}$$
the \({\mathcal {T}}^{S*} M\)-valued process \(d^2S\circ X\) is a horizontal integral process of \(A_H\);
-
(ii)
S satisfies the Hamilton–Jacobi–Bellman equation
$$\begin{aligned} \frac{\partial S}{\partial t} + H(d^2 S, t) = f(t), \end{aligned}$$(6.39)
for some function f depending only on t.
Proof
Let \({\textbf{X}} = d^2S\circ X\) and set \(x^i = x^i\circ d^2\,S\), \(p_i = p_i\circ d^2\,S\), \(o_{jk} = o_{jk}\circ d^2\,S\). Then,
These imply that the last equation of the system (6.17) holds. Since
the first two equations in (6.10) or (6.17) hold. Hence, the process \({\textbf{X}} = d^2S\circ X\) is a horizontal integral process of \(A_H\) if and only if the third equation in (6.17) holds. Plugging the first equation of (6.40) into the third equation, it reads as
A straightforward reinterpretation yields
The result follows. \(\square \)
Remark 6.20
If S solves the HJB equation (6.39), then \({\tilde{S}} = S - {\tilde{f}}\) solves (6.38), where \({\tilde{f}}\) is a primitive function of f. As a matter of fact, one can always absorb the time-dependent function f into the second-order Hamiltonian function H so that the HJB equation (6.39) has the same form as (6.38). More precisely, if we let \({\tilde{H}} = H - f\), then Theorem 6.19 also holds with \({\tilde{H}}\) and the zero function in place of H and f, respectively. A similar argument holds for the S-H equations (6.10). Indeed, adding a function f depending only on time to a second-order Hamiltonian does not change its S-H equations.
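As a quick check of the first claim, recall that (6.38) is (6.39) with \(f \equiv 0\), and that \(d^2\) only differentiates in the space variables, so subtracting a function of t alone leaves \(d^2 S\) unchanged. With \({\tilde{f}}' = f\), this gives
$$\begin{aligned} \frac{\partial {\tilde{S}}}{\partial t} + H(d^2 {\tilde{S}}, t) = \frac{\partial S}{\partial t} - f(t) + H(d^2 S, t) = 0. \end{aligned}$$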
Example 6.21
The function \(S=\ln u\) considered in Sect. 6.3 satisfies the Hamilton–Jacobi–Bellman equation (6.28), which is exactly \(\frac{\partial S}{\partial t} + H(d^2 S) = 0\) with the second-order Hamiltonian H given in (6.26). Hence, Theorem 6.19 yields that the process \(d^2S\circ X\) is a horizontal integral process of \(A_H\), which coincides with (6.32). The Euclidean case of this argument can be found in Chung and Zambrini (2003, p. 180) and Zambrini (2015, Eq. (4.20)).
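To make the example concrete, the following minimal sketch checks the equation \(\frac{\partial S}{\partial t} + H(d^2 S) = 0\) numerically in the simplest flat setting. It assumes, for illustration only, the one-dimensional Euclidean second-order Hamiltonian \(H(x,p,o) = \frac{1}{2} p^2 + \frac{1}{2} o\), which is a special choice and not the general H of (6.26). The test function \(u(t,x) = e^{-a^2 t/2}\cosh (ax)\), the parameter a, and all function names are hypothetical choices made for this sketch.

```python
import math

A = 0.7  # hypothetical parameter of the illustrative solution


def u(t, x):
    """Positive solution of the backward heat equation u_t + u_xx / 2 = 0."""
    return math.exp(-0.5 * A * A * t) * math.cosh(A * x)


def S(t, x):
    """Candidate Hamilton's principal function S = ln u."""
    return math.log(u(t, x))


def hamiltonian(p, o):
    """Assumed flat second-order Hamiltonian H(x, p, o) = p^2/2 + o/2."""
    return 0.5 * p * p + 0.5 * o


def hjb_residual(t, x, h=1e-4):
    """Central-difference estimate of dS/dt + H(d^2 S) at (t, x)."""
    s_t = (S(t + h, x) - S(t - h, x)) / (2 * h)
    p = (S(t, x + h) - S(t, x - h)) / (2 * h)              # first-order part of d^2 S
    o = (S(t, x + h) - 2 * S(t, x) + S(t, x - h)) / h**2   # second-order part of d^2 S
    return s_t + hamiltonian(p, o)


residuals = [abs(hjb_residual(t, x)) for t in (0.0, 0.5, 1.0) for x in (-1.0, 0.0, 2.0)]
print(max(residuals))
```

Under these assumptions the residual vanishes up to discretization error, because the substitution \(S = \ln u\) turns the assumed nonlinear equation into the linear backward heat equation for u.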
By (6.38) and (6.40), the total mean derivative of a second-order Hamilton’s principal function S is given by
where \((p(t,x),o(t,x)) = d^2 S(t,x)\) as in (6.40).
6.6 Second-Order Hamiltonian Functions from Classical Ones
In the presence of a linear connection \(\nabla \) on M, we are able to reduce (or produce) second-order Hamiltonian functions to (from) classical ones.
Let a second-order Hamiltonian function \(H: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be given. We make use of the fiber-linear bundle injection \({\hat{\iota }}^*_\nabla : T^* M \rightarrow {\mathcal {T}}^{S*} M\) in (5.5) to define a classical Hamiltonian by
In canonical coordinates, it reads \(H_0(x, p, t) = H(x, p, (\Gamma _{jk}^i(x) p_i), t)\). If we introduce a family of auxiliary variables by
then we can write
We say H reduces to \(H_0\) under the connection \(\nabla \), or \(H_0\) is the \(\nabla \)-reduction of H.
Clearly, the way to lift a classical Hamiltonian \(H_0: T^* M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) to a second-order Hamiltonian function that reduces to \(H_0\) under \(\nabla \) is not unique. But there is a canonical lift when we are provided with a symmetric (2, 0)-tensor field g (not necessarily Riemannian), given by
Then, \(H_0\) is the \(\nabla \)-reduction of \({\overline{H}}^g_0\), and
We call \({\overline{H}}^g_0\) the \((g,\nabla )\)-canonical lift of \(H_0\). If g is a Riemannian metric and \(\nabla \) is the associated Levi–Civita connection, then we simply call \({\overline{H}}^g_0\) the g-canonical lift of \(H_0\). If there is a classical Hamiltonian \(H_0\) such that the second-order Hamiltonian H is the \((g,\nabla )\)- (or g-) canonical lift of \(H_0\), we say H is \((g,\nabla )\)- (or g-) canonical.
As an example, the second-order Hamiltonian H of (6.26) is g-canonical and reduces to \(H_0(x,p) = \frac{1}{2} g^{ij}(x) p_ip_j + b^i(x)p_i + F(x)\).
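The coordinate formula \(H_0(x, p, t) = H(x, p, (\Gamma _{jk}^i(x) p_i), t)\) for the \(\nabla \)-reduction can be sketched in code as follows. The function names, the nested-list layout `gamma[i][j][k]` for \(\Gamma ^i_{jk}\), the metric on the half-line, and the sample Hamiltonian are all illustrative assumptions and not objects defined in the paper.

```python
def nabla_reduction(H, christoffel):
    """Return H0(x, p, t) = H(x, p, o, t) with o_jk = Gamma^i_jk(x) p_i.

    H           : callable (x, p, o, t) -> float, a second-order Hamiltonian
    christoffel : callable x -> gamma with gamma[i][j][k] = Gamma^i_jk(x)
    """
    def H0(x, p, t):
        gamma = christoffel(x)
        d = len(p)
        # Substitute o_jk = Gamma^i_jk(x) p_i (sum over i).
        o = [[sum(gamma[i][j][k] * p[i] for i in range(d)) for k in range(d)]
             for j in range(d)]
        return H(x, p, o, t)
    return H0


# Illustration on the half-line x > 0 with the hypothetical metric g = 1/x^2,
# whose only Christoffel symbol is Gamma^1_11 = -1/x.
def christoffel(x):
    return [[[-1.0 / x[0]]]]


def H(x, p, o, t):
    # Hypothetical sample Hamiltonian: (1/2) g^{11} (p_1^2 + o_11), with g^{11} = x^2.
    g_inv = x[0] ** 2
    return 0.5 * g_inv * (p[0] ** 2 + o[0][0])


H0 = nabla_reduction(H, christoffel)
print(H0([2.0], [3.0], 0.0))
```

The reduction only evaluates H on the image of the injection \({\hat{\iota }}^*_\nabla \), so different second-order Hamiltonians that agree there have the same \(\nabla \)-reduction, which is why the lift is not unique.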
Furthermore, for the canonical transformation \({\textbf{F}}: \mathcal T^{S*} M\times {\mathbb {R}} \rightarrow {\mathcal {T}}^{S*} N\times {\mathbb {R}}\) in Definition 6.14, we can reduce its associated function \(H_{{\textbf{F}}} \in C^\infty ({\mathcal {T}}^{S*} M\times {\mathbb {R}})\) to a classical function \(H^0_{{\textbf{F}}} \in C^\infty (T^* M\times {\mathbb {R}})\) via (6.42). As a consequence of (6.34), the projection of \({\textbf{F}}\), i.e., the map \({\mathbb {F}}: T^* M\times {\mathbb {R}}\rightarrow T^* N\times {\mathbb {R}}\), satisfies \({\mathbb {F}}^*{\tilde{\eta }}_0 = \omega ^0_{H^0_{{\textbf{F}}}}\), where \(\omega ^0_{H^0_{{\textbf{F}}}} = {\tilde{\omega }}_0 + d H^0_{{\textbf{F}}} \wedge dF^0\). It follows that \({\mathbb {F}}\) is a classical canonical transformation (Abraham and Marsden 1978, Definition 5.2.6).
We will return to this issue in Sect. 7.4, where the second-order Legendre transform will be developed. In particular, we will show there that for the canonical second-order Hamiltonian in (6.44), the corresponding second-order Hamilton’s equations (6.17) can be rewritten on the cotangent bundle \(T^* M\) in a global fashion; see Theorem 7.22.
7 Stochastic Lagrangian Mechanics
In this section, we specify a Riemannian metric g for the manifold M, and a g-compatible linear connection \(\nabla \). Note that such g and \(\nabla \) always exist but are not unique in general.
We will denote by \(|\cdot |\) and \(\langle \cdot ,\cdot \rangle \) the Riemannian norm and inner product, respectively. Also, denote by \({\check{g}}\) the inverse metric tensor of g, and \((\Gamma _{jk}^i)\) the Christoffel symbols of \(\nabla \). We observe that \({\check{g}}\) is a (2, 0)-tensor field. Denote by R the Riemann curvature tensor and \(\textrm{Ric}\) the Ricci (1, 1)-tensor.
7.1 Mean Covariant Derivatives
Definition 7.1
(Vector fields and 1-forms along diffusions) Let X be a diffusion on M. By a vector field along X, we mean a TM-valued process V, such that \(\tau _M (V(t)) = X(t)\) for all t. Similarly, by a 1-form along X, we mean a \(T^* M\)-valued process \(\eta \), such that \(\tau ^*_M (\eta (t)) = X(t)\) for all t.
Clearly, for a time-dependent vector field V on M, the restriction of V on X, i.e., \(\{V_{(t,X(t))}\}\), is a vector field along X. In this case, we call \(\{V_{(t,X(t))}\}\) a vector field restricted on X. In this way, vector fields restricted on X are just TM-valued horizontal diffusions projecting to X. Similarly for 1-forms.
Definition 7.2
(Parallelisms along diffusions) Let \(X \in I_{t_0}(M)\). A vector field V along X is said to be parallel along X if the following Stratonovich SDE in local coordinates holds,
A 1-form \(\eta \) along X is said to be parallel along X if
Definition 7.3
(Stochastic parallel displacements) Given a diffusion \(X\in I_{t_0} (M)\) and a (random) vector \(v \in T_{X(t_0)} M\), the stochastic parallel displacement of v along X is the extension of v to a parallel vector field V along X, that is, V satisfies the SDE (7.1) with initial condition \(V(t_0) = v\). We denote \(\Gamma (X)_{t_0}^t v:= V(t)\) and \(\Gamma (X)_t^{t_0} V(t):= v\). The stochastic parallel displacement of a (random) covector \(\eta \in T^*_{X(t_0)} M\) along X is defined in a similar fashion.
Definition 7.4
(Damped parallel displacements) Let \(X\in I_{t_0} (M)\). Given a (random) vector \(v \in T_{X(t_0)} M\) and covector \(\eta _0 \in T^*_{X(t_0)} M\), the damped parallel displacement of v along X is the extension of v to a vector field V along X that satisfies the SDE
The damped parallel displacement of \(\eta _0\) along X is the extension of \(\eta _0\) to a 1-form \(\eta \) along X that satisfies the SDE
We denote \(\overline{\Gamma }(X)_{t_0}^t v:= V(t)\), \(\overline{\Gamma }(X)_{t_0}^t \eta _0:= \eta (t)\), and \(\overline{\Gamma }(X)_t^{t_0} V(t):= v\), \(\overline{\Gamma }(X)_t^{t_0} \eta (t):= \eta _0\).
If V and \(\eta \) are restrictions on X, that is, \(V(t) = V_{(t,X(t))}\) and \(\eta (t) = \eta _{(t,X(t))}\), then equations (7.2) and (7.3) can be rewritten, respectively, as
where we mean by \(R(\eta , V) W\) the 1-form \([R(\eta ^\sharp , V) W]^\flat \). The Stratonovich stochastic differentials can be transformed into Itô ones. For example, (7.3) is equivalent to
Remark 7.5
The notion of stochastic parallel displacement was introduced by Itô (1975) and Dynkin (1968). The notion of damped parallel displacement is due to Malliavin (1997), although it first appeared in Dohrn and Guerra (1979), where it is called the geodesic correction to the stochastic parallel displacement.
Lemma 7.6
Let \(X \in I_{t_0}(M)\).
-
(i)
Let \(\eta \) be a 1-form on M parallel along X. If V is a vector field on M which is also parallel along X, then \(\eta (V)(t) = \eta (V)(t_0)\) for all \(t\ge t_0\); if \(v \in T_{X(t_0)} M\), then \(\eta (\Gamma (X)_{t_0}^t v)(t) = \eta (v)(t_0)\) for all \(t\ge t_0\).
-
(ii)
Let \(\eta \) be a 1-form along X satisfying the SDE (7.3). If V is a vector field along X satisfying the SDE (7.2), then \(\eta (V)(t) = \eta (V)(t_0)\) for all \(t\ge t_0\); if \(v \in T_{X(t_0)} M\), then \(\eta (\overline{\Gamma }(X)_{t_0}^t v)(t) = \eta (v)(t_0)\) for all \(t\ge t_0\).
Proof
We only prove Assertion (ii), as (i) is similar. Since Stratonovich stochastic differentials obey Leibniz’s rule, we have
This proves the first statement of (ii). The second statement of (ii) follows by letting \(V(t):= \overline{\Gamma }(X)_{t_0}^t v\). \(\square \)
Definition 7.7
(Mean covariant derivatives along diffusions) Let X be a diffusion on M, and let V and \(\eta \) be a time-dependent vector field and a time-dependent 1-form along X, respectively. The (forward) mean covariant derivative of V with respect to X is a time-dependent vector field \(\frac{{\textbf{D}}V}{dt}\) along X, defined by
The damped mean covariant derivative of V with respect to X is a time-dependent vector field \(\frac{\overline{{\textbf{D}}}V}{dt}\) along X with \(\overline{\Gamma }\) instead of \(\Gamma \) in (7.5). Similarly, we can define \(\frac{{\textbf{D}}\eta }{dt}\) and \(\frac{\overline{{\textbf{D}}}\eta }{dt}\).
Lemma 7.8
-
(i)
Let V and \(\eta \) be a vector field and a 1-form along X, respectively. If \(\eta \) is parallel along X, then
$$\begin{aligned} \textstyle {{\textbf{E}}\left[ \eta \left( \frac{{\textbf{D}}V}{dt} \right) \right] = {\textbf{E}}\left( D[\eta (V)] \right) .} \end{aligned}$$(7.6)
If \(\eta \) satisfies the SDE (7.3), then (7.6) holds true with \(\frac{\overline{{\textbf{D}}}}{dt}\) instead of \(\frac{{\textbf{D}}}{dt}\).
-
(ii)
Let V be a vector field restricted on X. Then,
$$\begin{aligned} \frac{\overline{{\textbf{D}}}V}{dt}= & {} \frac{{\textbf{D}}V}{dt} + \frac{1}{2} (QX)^{ij} R(V,\partial _i)\partial _j = \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V \\{} & {} + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) . \end{aligned}$$
-
(iii)
Let \(\eta \) be a 1-form restricted on X. Then,
$$\begin{aligned} \frac{\overline{{\textbf{D}}}\eta }{dt}= & {} \frac{{\textbf{D}}\eta }{dt} - \frac{1}{2} (QX)^{ij} R(\eta ,\partial _j)\partial _i = \frac{\partial \eta }{\partial t} + \nabla ^{}_{D_\nabla X} \eta \\{} & {} + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} \eta - R(\eta ,\partial _j)\partial _i \right) . \end{aligned}$$
-
(iv)
Let V and \(\eta \) be a vector field and a 1-form restricted on X. Then,
$$\begin{aligned} {\textbf{D}}_{\textrm{t}}[\eta (V)]= & {} \eta \left( \frac{{\textbf{D}}V}{dt} \right) + \frac{{\textbf{D}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V) \\= & {} \eta \left( \frac{\overline{{\textbf{D}}}V}{dt} \right) + \frac{\overline{{\textbf{D}}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V). \end{aligned}$$
Proof
-
(i)
By Lemma 7.6.(i), we have
$$\begin{aligned} \begin{aligned} {\textbf{E}}\left[ \eta \left( \frac{{\textbf{D}}V}{dt} \right) (t) \right]&= \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \frac{ \eta (t) (\Gamma (X)_{t+\epsilon }^t V(t+\epsilon ) ) - \eta (t)(V(t)) }{\epsilon } \right] \\&= \lim _{\epsilon \rightarrow 0} {\textbf{E}}\left[ \frac{ \eta ( V ) (t+\epsilon ) - \eta ( V ) (t) }{\epsilon } \right] \\&= {\textbf{E}}\left( D[\eta (V)(t)] \right) . \end{aligned} \end{aligned}$$This proves the first statement of (i). The second statement of (i) follows by a similar argument with \(\frac{\overline{{\textbf{D}}}}{dt}\) in place of \(\frac{{\textbf{D}}}{dt}\) and \(\overline{\Gamma }\) in place of \(\Gamma \).
-
(ii)
It suffices to derive the expression for \(\frac{\overline{{\textbf{D}}}V}{dt}\). Suppose that \(\eta \) is a 1-form satisfying the SDE (7.3) and the diffusion X satisfies \(QX(t) = (\sigma \circ \sigma ^*)(t,X(t))\). Then, we apply Itô’s formula to \(\eta (V)(X(t))\) and make use of (2.20) and (7.4). We get
$$\begin{aligned} \begin{aligned} d[\eta (V)]&= d(\eta _i V^i) = \eta _i \left( \frac{\partial V^i}{\partial t} dt + \frac{\partial V^i}{\partial x^j} dX^j + \frac{1}{2} \frac{\partial ^2 V^i}{\partial x^j\partial x^k} d[X^j, X^k] \right) \\&\quad + V^j d\eta _j + d[\eta _j, V^j] \\&= \eta _i \left( \frac{\partial V^i}{\partial t} + \frac{\partial V^i}{\partial x^j} (DX)^j + \frac{1}{2} \frac{\partial ^2 V^i}{\partial x^j\partial x^k} (QX)^{jk} \right) dt + \eta _i \frac{\partial V^i}{\partial x^j} \sigma _r^j dB^r \\&\quad + V^j \left[ \Gamma _{jk}^i (DX)^k + \frac{1}{2} (QX)^{kl}\left( \frac{\partial \Gamma _{jk}^i}{\partial x^l} + \Gamma _{jk}^m \Gamma _{ml}^i \right) + \frac{1}{2} R^i_{kjl} (QX)^{kl} \right] \eta _i dt \\&\quad + V^j \Gamma _{jk}^i \eta _i \sigma _r^k dB^r + \Gamma _{jk}^i \eta _i \frac{\partial V^j}{\partial x^l} (QX)^{kl} dt \\&= \eta _i \left[ \frac{\partial V^i}{\partial t} + \left( \frac{\partial V^i}{\partial x^k} + V^j \Gamma _{jk}^i \right) (D_\nabla X)^k \right] dt \\&\quad + \frac{1}{2} \eta _i (QX)^{kl} \left[ -\frac{\partial V^i}{\partial x^j} \Gamma ^j_{kl} + \frac{\partial ^2 V^i}{\partial x^k\partial x^l} + V^j \left( -\Gamma _{jm}^i \Gamma ^m_{kl} + \frac{\partial \Gamma _{jk}^i}{\partial x^l} + \Gamma _{jk}^m \Gamma _{ml}^i \right) \right. \\&\quad \left. + 2 \Gamma _{jk}^i \frac{\partial V^j}{\partial x^l} \right] dt \\&\quad + \frac{1}{2} \eta _i R^i_{kjl} (QX)^{kl} V^j dt + \eta _i \left( \frac{\partial V^i}{\partial x^k} + V^j \Gamma _{jk}^i \right) \sigma _r^k dB^r \\&= \eta \left( \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) \right) dt \\&\quad + \eta \left( \nabla _{\sigma _r} V \right) dB^r. \end{aligned} \end{aligned}$$Hence, the result (i) implies
$$\begin{aligned} {\textbf{E}}\left[ \eta \left( \frac{\overline{{\textbf{D}}}V}{dt} \right) \right]= & {} {\textbf{E}}\left( D[\eta (V)(t)] \right) \\= & {} {\textbf{E}}\left[ \eta \left( \frac{\partial V}{\partial t} + \nabla ^{}_{D_\nabla X} V + \frac{1}{2} (QX)^{ij} \left( \nabla ^2_{\partial _i,\partial _j} V + R(V,\partial _i)\partial _j \right) \right) \right] . \end{aligned}$$The arbitrariness of \(\eta \) yields (ii).
-
(iii)
Similar to (ii).
-
(iv)
We only prove the first equality as the second is similar. By (4.6),
$$\begin{aligned} \begin{aligned} {\textbf{D}}_{\textrm{t}} [\eta (V)]&= \left( \frac{\partial }{\partial t} + (D_\nabla X)^i \partial _i + \frac{1}{2} (QX)^{ij} \nabla ^2_{\partial _i,\partial _j} \right) [\eta (V)] \\&= \left( \frac{\partial \eta }{\partial t} \right) (V) + \eta \left( \frac{\partial V}{\partial t} \right) + \left( \nabla ^{}_{D_\nabla X} \eta \right) (V) + \eta \left( \nabla ^{}_{D_\nabla X} V \right) \\&\quad + \frac{1}{2} (QX)^{ij} \left[ \left( \nabla ^2_{\partial _i,\partial _j} \eta \right) (V) + \eta \left( \nabla ^2_{\partial _i,\partial _j} V \right) \right. \\&\quad \left. + \left( \nabla _{\partial _i} \eta \right) \left( \nabla _{\partial _j} V \right) + \left( \nabla _{\partial _j} \eta \right) \left( \nabla _{\partial _i} V \right) \right] \\&= \eta \left( \frac{{\textbf{D}}V}{dt} \right) + \frac{{\textbf{D}}\eta }{dt} (V) + (QX)^{ij} (\nabla _{\partial _i} \eta ) (\nabla _{\partial _j} V). \end{aligned} \end{aligned}$$The result follows.
\(\square \)
If \(QX(t) = {\check{g}}(X(t))\), then
and similarly,
where \(\Delta \) is the connection Laplacian, and \(\Delta _{\textrm{LD}} = -( dd^*+d^* d)\) is the Laplace–de Rham operator on forms. The relation \(\Delta _{\textrm{LD}} = \Delta - \textrm{Ric}\) is due to the Weitzenböck identity (Petersen 2016, Theorem 9.4.1). We remark here that the operator \(\Delta + \textrm{Ric}\) acting on vector fields is also called Laplace–de Rham operator in Dohrn and Guerra (1979).
In the context of fluid dynamics, the operator \(\frac{\partial }{\partial t}+ \nabla _v\), with v a vector field, is often referred to as material derivative or hydrodynamic derivative. So the mean covariant derivative \(\frac{{\textbf{D}}}{dt}\) and its damped variant \(\frac{\overline{{\textbf{D}}}}{dt}\) can be regarded as stochastic deformations of material derivative.
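As a sanity check, consider the flat case \(M = {\mathbb {R}}^d\) with the Euclidean metric, an assumption made here only for illustration. Then \(\Gamma ^i_{jk} = 0\) and \(R = 0\), and \(D_\nabla X\) reduces to DX (as can be read off from the coordinate computation in the proof of Lemma 7.8.(ii)). By Lemma 7.8.(ii), the two mean covariant derivatives of a vector field V restricted on X then coincide, and for \(QX = {\check{g}} = {\textbf{I}}_d\) they read
$$\begin{aligned} \frac{\overline{{\textbf{D}}}V}{dt} = \frac{{\textbf{D}}V}{dt} = \frac{\partial V}{\partial t} + (DX)^i \partial _i V + \frac{1}{2} \sum _{i=1}^d \partial _i \partial _i V, \end{aligned}$$
that is, the material derivative \(\frac{\partial }{\partial t} + \nabla _v\) with \(v = DX\), corrected by the diffusive term \(\frac{1}{2}\Delta \).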
7.2 A Stochastic Stationary-Action Principle
In this subsection, we will establish a type of stochastic stationary-action principle: the stochastic Hamilton’s principle. Another version for systems with conserved energy, the stochastic Maupertuis’s principle, can be found in “Appendix C.”
In contrast to second-order Hamiltonians, not all real-valued functions on \({\mathcal {T}}^S M\) can be used as second-order Lagrangians in stochastic Lagrangian mechanics. This was hinted at in Sect. 6.3; see Remark 6.10. For this reason, we will produce a class of second-order Lagrangians from classical Lagrangians, via the fiber-linear bundle projection \(\varrho _\nabla \) in (3.3) and the \(\nabla \)-canonical coordinates \((D_\nabla ^i x)\) in (3.2).
Definition 7.9
By an admissible second-order Lagrangian, we mean a function \(L:{\mathbb {R}}\times {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) such that there exists a classical Lagrangian \(L_0: {\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) satisfying \(L = L_0 \circ (\textbf{Id}_{\mathbb {R}}\times \varrho _\nabla )\). We call L the \(\nabla \)-lift of \(L_0\).
In local coordinates, the \(\nabla \)-lift L of \(L_0\) is expressed as
Let \(T>0\). Our stochastic variational problem consists in finding the extrema (maxima or minima) of the stochastic action functional
over a suitable domain of diffusions X on M, where L is an admissible second-order Lagrangian lifted from \(L_0\).
In order to formulate a well-posed stochastic variational problem in an economical way, we assume that the manifold M is connected and the metric g is geodesically complete (which will be used to characterize the variations of diffusions in Lemma 7.13), and that the connection \(\nabla \) is the associated Levi–Civita connection. Geodesic completeness is ensured, for example, if M is compact (see, e.g., Lee 2013, p. 346). Whenever the metric g is given, the associated Levi–Civita connection is uniquely determined, by the fundamental theorem of Riemannian geometry (Kobayashi and Nomizu 1963, Theorem IV.2.2). We will refer to such a geodesically complete Riemannian metric as a reference metric tensor.
For a fixed point \(q \in M\) and a probability distribution \(\mu \in {\mathcal {P}}(M)\) on M, we define an admissible class of diffusions by
where \(I_{(0,q)}^{(T,\mu )}(M)\) denotes the set of all M-valued diffusion processes starting from q at \(t=0\) and with final distribution \(\mu \), i.e., \({\textbf{P}}\circ (X(T))^{-1} = \mu \). The action functional \({\mathcal {S}}\) is now defined on the set \({\mathcal {A}}_g([0,T];q, \mu )\), that is, \({\mathcal {S}}: {\mathcal {A}}_g([0,T];q, \mu ) \rightarrow {\mathbb {R}}\).
Note that the admissible class \({\mathcal {A}}_g\) is similar to the Wiener space, so that a candidate for its “tangent space” is the Cameron–Martin space. Denote by \({\mathcal {H}}([0,T];q)\) the Hilbert space of absolutely continuous curves \(v:[0,T]\rightarrow T_q M\) such that \(\int _0^T |\dot{v}(t)|^2 {\textrm{d}}t < \infty \). Let \({\mathcal {H}}_0([0,T];q)\) be the subspace consisting of all \(v\in {\mathcal {H}}([0,T];q)\) satisfying \(v(0) = v(T) = 0\).
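For intuition, membership in \({\mathcal {H}}_0([0,T];q)\) can be checked numerically for a sample curve. The sketch below identifies \(T_q M\) with \({\mathbb {R}}^2\) via an orthonormal basis (an assumption made for illustration) and approximates the energy \(\int _0^T |\dot{v}(t)|^2 {\textrm{d}}t\) of the hypothetical curve \(v(t) = \sin (\pi t/T)\,(1,2)\) by a sum of squared difference quotients. The horizon T, the curve, and the function names are illustrative choices.

```python
import math

T = 2.0  # time horizon (illustrative value)


def v(t):
    """Hypothetical curve in T_q M ~ R^2 with v(0) = v(T) = 0."""
    s = math.sin(math.pi * t / T)
    return (1.0 * s, 2.0 * s)


def energy(curve, horizon, n=20000):
    """Approximate int_0^T |v'(t)|^2 dt using difference quotients on n cells."""
    dt = horizon / n
    total = 0.0
    for k in range(n):
        a, b = curve(k * dt), curve((k + 1) * dt)
        total += sum((bi - ai) ** 2 for ai, bi in zip(a, b)) / dt
    return total


numeric = energy(v, T)
exact = 5.0 * math.pi ** 2 / (2.0 * T)  # |(1, 2)|^2 * (pi/T)^2 * T/2
print(numeric, exact)
```

The finite energy together with the vanishing endpoint values is what places this curve in the subspace \({\mathcal {H}}_0([0,T];q)\).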
Definition 7.10
Let \(X\in {\mathcal {A}}_g([0,T];q, \mu )\). For a curve \(v\in \mathcal H_0([0,T];q)\), the vector field along X given by \(V(t):=\Gamma (X)_0^t v(t)\) is called a tangent vector to \({\mathcal {A}}_g([0,T];q, \mu )\) at X. The tangent space to \({\mathcal {A}}_g ([0,T];q, \mu )\) at X is the set of all such tangent vectors, that is,
Definition 7.11
By a variation (or deformation) of a diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) along \(v\in {\mathcal {H}}_0([0,T];q)\), we mean a one-parameter family of diffusions \(\{X^v_\epsilon \}_{\epsilon \in (-\varepsilon ,\varepsilon )}\), where for each \(t\in [0,T]\), \(X^v_\epsilon (t)\) satisfies the following ODE
The diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) is called a stationary (or critical) point of \({\mathcal {S}}\), if the first variation \(\delta {\mathcal {S}}\) vanishes at X, i.e.,
Remark 7.12
-
(i)
The variations of diffusions on manifolds via the differential equation (7.11) are standard in stochastic analysis on path spaces of Riemannian manifolds. See, for example, Driver (1992, Eq. (2.3)) and Hsu (1995, Theorem 4.1), where it is shown that the Wiener measure is quasi-invariant under such variations. This kind of variation has several equivalent constructions. For instance, the previous two references also provide an approach by lifting to the frame bundle and projecting to Euclidean space (a stochastic analog of Cartan’s development), while Fang and Malliavin (1993) provide an alternative perspective via the Bismut connection.
-
(ii)
The stochastic variational problem (7.9)–(7.12) in the Euclidean context is also familiar in stochastic optimal transport and stochastic control. See Sects. 7.3 and 7.4.4 for connections to those areas.
-
(iii)
Unlike the infinitesimal variation used in Definition 4.11 for studying symmetries of SDEs, the infinitesimal variation here in (7.11) needs to be a parallel vector field.
The following lemma is the key to establishing the stochastic Hamilton’s principle. The first statement shows that the variation \(X^v_\epsilon \) is well defined on the path space \({\mathcal {A}}_g([0,T];q, \mu )\). The second one describes the infinitesimal changes of \(D_\nabla X^v_\epsilon \) with respect to the variation parameter \(\epsilon \). The proof of the latter is based on a geodesic approximation technique, which is originally due to Itô (1962).
Lemma 7.13
Let \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) and \(v\in {\mathcal {H}}_0([0,T];q)\). Then,
-
(i)
for each \(\epsilon \in (-\varepsilon ,\varepsilon )\), \(X^v_\epsilon \in {\mathcal {A}}_g([0,T];q, \mu )\); and
-
(ii)
for all \(t\in [0,T]\),
$$\begin{aligned} \frac{D}{d\epsilon }\bigg |_{\epsilon =0} D_\nabla X^v_\epsilon (t) = \Gamma (X)_0^t \dot{v}(t) + \frac{1}{2}(QX)^{ij}(t) R\left( \Gamma (X)_0^t v(t),\partial _i\right) \partial _j, \end{aligned}$$(7.13)
where \(\dot{v}(t) = \frac{{\textrm{d}}}{{\textrm{d}}t}v(t) \in T_{v(t)} T_{q} M \cong T_{q} M\) and \(\frac{D}{d\epsilon }\) is the (classical) covariant derivative with respect to the parameter \(\epsilon \).
Proof
(i) Let \(\xi \) and \(\xi _\epsilon \) be the anti-development (Hsu 2002, Definition 2.3.1) of X and \(X^v_\epsilon \), respectively, with fixed initial frame \(r(0) \in O_{q}M\). Equivalently, for example, \(\xi \) is an \({\mathbb {R}}^d\)-valued diffusion related to X by the following SDEs (Hsu 2002, Section 2.3)
Applying the fact that \(\sum _{k=1}^d r_k^i r_k^j= g^{ij}\) (e.g., Kobayashi and Nomizu 1963, Proposition 1.5) and the condition \(QX(t) = \check{g}(X(t))\), we have
and consequently, \(Q\xi \equiv {\textbf{I}}_d\). Meanwhile, it follows from Fang and Malliavin (1993, Section 3.5) (or Driver 1992, Theorem 5.1, Hsu 1995, Section 3) that
where \(\Omega \) is the curvature form on the orthogonal frame bundle OM, taking values in \(\mathfrak {so}(d)\), and the frame r(0) is viewed as an isomorphism from \({\mathbb {R}}^d\) to \(T_{q} M\). It follows that \(Q\xi _\epsilon = Q\xi \equiv {\textbf{I}}_d\). By the same reasoning as for (7.14), we have \(QX^v_\epsilon (t) = {\check{g}}(X^v_\epsilon (t))\). The result follows. See Driver (1992, Theorem 8.3) for a more elaborate proof.
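The identity \(\sum _{k=1}^d r_k^i r_k^j= g^{ij}\) used in part (i) is easy to check on an example. The sketch below builds, for a hypothetical \(2\times 2\) metric matrix g, a frame r whose columns are g-orthonormal (here via a Cholesky factor of \(g^{-1}\), one convenient choice among many) and confirms both \(r^{\mathsf T} g\, r = {\textbf{I}}_2\) and \(r r^{\mathsf T} = g^{-1}\). The matrix entries and all helper names are illustrative assumptions.

```python
import math


def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def cholesky_2x2(m):
    """Lower-triangular L with L L^T = m, for a symmetric positive-definite m."""
    l11 = math.sqrt(m[0][0])
    l21 = m[1][0] / l11
    l22 = math.sqrt(m[1][1] - l21 * l21)
    return [[l11, 0.0], [l21, l22]]


def matmul(x, y):
    return [[sum(x[i][k] * y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def transpose(m):
    return [[m[j][i] for j in range(2)] for i in range(2)]


g = [[2.0, 0.5], [0.5, 1.0]]      # hypothetical metric matrix (g_ij) at a point
g_inv = inverse_2x2(g)            # inverse metric (g^ij)
r = cholesky_2x2(g_inv)           # r[i][k] = r^i_k, the i-th component of frame vector k

gram = matmul(transpose(r), matmul(g, r))   # g(r_k, r_l), should be the identity
outer = matmul(r, transpose(r))             # sum_k r^i_k r^j_k, should equal g^ij
print(gram, outer)
```

Any other orthonormal frame differs from this one by an orthogonal matrix acting on the index k, which leaves the sum \(\sum _k r^i_k r^j_k\) unchanged.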
(ii) Fix \(n,m \in {\mathbb {N}}_+\). Let \(0=t_0<t_1<\cdots <t_n=T\) be a partition of the time interval [0, T], and let \(-\varepsilon =\epsilon _{-m}<\cdots<\epsilon _{-1}<0= \epsilon _0<\epsilon _1<\cdots <\epsilon _m=\varepsilon \) be a partition of the variation parameter interval \((-\varepsilon ,\varepsilon )\). Denote \(\Delta t_i:= t_i - t_{i-1}\). Consider the polygonal curve \(x^n = \{x^n(t)\}_{t\in [0,T]}\), which is an approximation of X made of minimizing geodesic segments joining \(X(t_{i-1})\) with \(X(t_i)\) for all \(1\le i\le n\). Such segments exist by geodesic completeness. We will construct an approximation scheme for the variational processes \(X^v_\epsilon \).
For \(\epsilon \in [\epsilon _0,\epsilon _1]\), we construct the approximation \(x^n_\epsilon \) of \(X^v_\epsilon \) as follows. We extend each \(X(t_i)\), \(0\le i\le n\), to a geodesic
Let \(x^n_\epsilon = \{x^n_\epsilon (t)\}_{t\in [0,T]}\) be the polygonal curve consisting of minimizing geodesic segments joining \(\gamma _0^{(i-1)}(\epsilon )\) with \(\gamma _0^{(i)}(\epsilon )\) for all \(1\le i\le n\).
Then, we construct \(x^n_\epsilon \) for \(\epsilon \in [\epsilon _j,\epsilon _{j+1}]\), \(1\le j\le m-1\), by induction. Suppose \(x^n_\epsilon \), \(\epsilon \in [\epsilon _{j-1},\epsilon _j]\), has been defined. Then, in particular, we have a curve \(x^n_{\epsilon _j}\). Extend each \(x^n_{\epsilon _j}(t_i)\), \(0\le i\le n\), to a geodesic by
Let \(x^n_\epsilon \) be the polygonal curve consisting of minimizing geodesic segments joining \(\gamma _j^{(i-1)}(\epsilon )\) with \(\gamma _j^{(i)}(\epsilon )\) for all \(1\le i\le n\). In a similar way, we can define \(x^n_\epsilon \) for \(\epsilon \in [\epsilon _j,\epsilon _{j+1}]\), \(-m\le j\le -1\).
Now we have a family of polygonal curves \(\{x^n_\epsilon : \epsilon \in (-\varepsilon ,\varepsilon )\}\), which satisfies \(x^n_0 = x^n\) and
As for each \(\epsilon \in (-\varepsilon ,\varepsilon )\) and \(1\le i\le n\), \(\{x^n_\epsilon (t)\}_{t\in [t_{i-1},t_i]}\) is a geodesic, the vector field
is a Jacobi field along \(\{x^n(t)\}_{t\in [t_{i-1},t_i]}\). This leads to the following Jacobi equation
with boundary values
Since the connection is torsion-free, we can exchange the covariant derivative and standard derivative to have
On the other hand, Taylor’s theorem yields
Combining (7.15)–(7.18), we have
A standard limit theorem yields the result (ii). \(\square \)
Remark 7.14
-
(i)
The constraint \(QX(t) = {\check{g}}(X(t))\) in (7.10) looks strong. A possibly better viewpoint is to force all diffusions under consideration to have the same nondegenerate diffusion tensor a, i.e., \(QX(t) = a(X(t))\). Then, the inverse of a defines a Riemannian metric g, cf. Ikeda and Watanabe (1989, Section V.4). As can be seen from the first part of the above proof, the constraint of fixing the diffusion tensor is a natural one in the literature of variational calculus on the path space. An intuitive reason for this constraint is to ensure that the induced measures are equivalent, which is necessary for Eq. (7.11) to be well-posed, cf. Driver (1992). The assumption that \(\nabla \) is the Levi–Civita connection may be relaxed to the requirement that \(\nabla \) be g-compatible and torsion skew symmetric (Driver 1992, Definition 8.1), in which case a torsion term must be added to the second assertion of Lemma 7.13.
-
(ii)
One may expect from the limits of (7.15) and (7.16) that there is a “stochastic” Jacobi equation with two boundary values describing the difference between a diffusion and an “infinitesimally close” diffusion, cf. Arnaudon and Thalmaier (1998).
For a smooth function f on TM, we denote by \(d_{\dot{x}} f\) the differential of f with respect to the coordinates \((\dot{x}^i)\). Since \(T_{(x,\dot{x})} T_x M \cong T_x M\), \(d_{\dot{x}} f\) is treated as a 1-form on \(T_x M\) and
We call \(d_{\dot{x}} f\) the vertical differential of f. Regarding the differential with respect to the coordinates \((x^i)\), we introduce the horizontal differential which depends on the connection \(\nabla \), by
It is easy to check that both definitions (7.19) and (7.20) are invariant under change of coordinates. In fact, by the classical theory (Saunders 1989, Section 3.5 and Example 4.6.7), we know that the connection \(\nabla \) can uniquely determine a TTM-valued 1-form on TM horizontal over M, which is given in local coordinates by
Hence, the horizontal differential is \(d_x f = \Gamma (df)\), where df is the total differential of f. Given a vector field V on M, \(f\circ V: q\mapsto f(V_q)\) is a smooth function on M. Then, it is easy to check that
The following integration-by-parts formula will be used. Its proof is straightforward from definitions of stochastic integrals and mean derivatives, cf. Cruzeiro and Zambrini (1991, Lemma 4.4).
Lemma 7.15
Let \(X = \{X(t)\}_{t\in [0,T]}\) be a real-valued continuous semimartingale such that DX exists, and let f be a real-valued continuous process on [0, T] of finite variation. Then,
We are now in a position to present the stochastic version of Hamilton’s principle.
Theorem 7.16
(Stochastic Hamilton’s principle) Let \(L_0\) be a regular Lagrangian on \({\mathbb {R}}\times TM\). A diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\) is a stationary point of \({\mathcal {S}}\), if and only if X satisfies the following stochastic Euler–Lagrange (S-EL) equation
where \(\frac{\overline{{\textbf{D}}}}{dt}\) is the damped mean covariant derivative with respect to X.
We remark that since \(QX(t) = {\check{g}}(X(t))\), the operator \(\frac{\overline{{\textbf{D}}}}{dt}\) in (7.22) is just the one of (7.7). The unknown in (7.22) is the process X, so the conditions \(X(0) = q\) and \({\textbf{P}}\circ (X(T))^{-1} = \mu \), indicated in the assumption \(X\in {\mathcal {A}}_g([0,T];q, \mu )\), can be regarded as boundary conditions of (7.22).
Proof
Denote \(V(t) = \Gamma (X)_0^t v(t)\). It follows from (7.13) and (7.21) that
By Lemmas 7.6.(ii) and 7.15 and the fact that \(v(0) = v(T) = 0\), we have
Thus, by Lemma 7.8.(iii),
The arbitrariness of v yields the desired result. \(\square \)
Remark 7.17
-
(i)
For a special class of Lagrangians in the Euclidean context, the stochastic Euler–Lagrange equation (7.22) was established in Cruzeiro and Zambrini (1991, Subsection 5.1), where it is called the stochastic Newton equation; see also Zambrini (2015). For general Lagrangians on Riemannian manifolds, Eq. (7.22) is new (to the authors’ best knowledge). See Sect. 7.3 for a discussion of a special case.
-
(ii)
The second author and his collaborator formulated a weak stochastic Euler–Lagrange equation in Lassalle and Zambrini (2016). By “weak” they mean that their stochastic Euler–Lagrange equation holds in the sense of stochastic integrals. The main difference between their formulation and ours is that we get rid of the stochastic integral (martingale) part in our equation, since we use mean derivatives instead of stochastic differentials.
7.3 An Inspirational Example: Schrödinger’s Problem
The inspirational example of stochastic Hamiltonian mechanics presented in Sect. 6.3 also provides an example of our stochastic Lagrangian mechanics. Consider the following Lagrangian defined on \({\mathbb {R}}\times TM\):
where b is a given time-dependent vector field on M. It is related to the second-order Hamiltonian H in (6.26) via the second-order Legendre transform, which will be considered in Sect. 7.4. For such a Lagrangian, we can directly work out the relation between the stochastic Euler–Lagrange equation (7.22) and the Hamilton–Jacobi–Bellman equation. We denote by \(I_0^T(M)\) the set of all M-valued diffusion processes over the time interval [0, T].
Theorem 7.18
(S-EL & HJB) Let \(L_0\) be as in (7.25). If \(X\in I_0^T(M)\) satisfies
for a function \(S:{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\), then X is a solution of the stochastic Euler–Lagrange equation (7.22) if and only if S solves the following Hamilton–Jacobi–Bellman equation
with f a function depending only on t.
Proof
For a function g on \({\mathbb {R}}\times M\), we will denote by dg the exterior differential of g on M, i.e., with respect to coordinates \((x^i)\). Condition (7.26) can be rewritten in local coordinates as
Then, it is clear that
Since \(\nabla g = 0\), we use Leibniz’s rule to derive
Now we take the differential of the HJB equation (7.27) with respect to x. Obviously,
For the second term,
For the third term, we use again \(\nabla g = 0\). Then, we have
For the fourth term, in the same way we have
Combining these together and applying (7.26)–(7.30) as well as (7.7), we obtain
The result follows. \(\square \)
Remark 7.19
Equation (7.29) gives the relation between Lagrangians and second-order Hamilton’s principal functions. It is valid for more general Lagrangians, see Remark 7.23.(i).
Theorem 7.18 strongly suggests some relations between stochastic Lagrangian (and also Hamiltonian) mechanics and Schrödinger’s problem in the reinterpretation of optimal transport. In the setting of the latter (see, e.g., Cruzeiro et al. 2000; Léonard 2014; Léonard et al. 2014), there is a given reversible positive measure \({\textbf{R}}\) on the path space \({\mathcal {C}}_0^T = C([0,T], M)\), called reference measure, as well as two probability distributions \(\mu _0,\mu _T\in {\mathcal {P}}(M)\). Schrödinger’s problem aims to minimize the following relative entropy:
over all probability measures \({\textbf{P}}\) on \({\mathcal {C}}_0^T\) such that \(\mu _0,\mu _T\) are the initial and final time marginal distributions of \({\textbf{P}}\), i.e., \({\textbf{P}}_0 = \mu _0\) and \({\textbf{P}}_T = \mu _T\), where \({\textbf{P}}_t:= {\textbf{P}}\circ (X(t))^{-1}\) is the time marginal distribution of \({\textbf{P}}\) and \(X(t): {\mathcal {C}}_0^T\rightarrow M, X(t,\omega ) = \omega (t)\) is the coordinate mapping. Denote by \(X_{\textbf{R}}\) and \(X_{\textbf{P}}\) the coordinate process X under the measures \({\textbf{R}}\) and \({\textbf{P}}\), respectively. Then, the Girsanov theorem implies (Léonard 2012a, Theorem 1) that a necessary condition for the finite entropy condition \(H({\textbf{P}}|{\textbf{R}}) < \infty \) is \(QX_{\textbf{P}}= QX_{\textbf{R}}\), \({\textbf{P}}\)-a.s. Furthermore, if \({\textbf{R}}\) is a diffusion measure, i.e., \(X_{\textbf{R}}\) is a diffusion process, then a similar application of the Girsanov theorem shows that a necessary condition for \(H({\textbf{P}}|\textbf{R}) < \infty \) is that \({\textbf{P}}\) is also a diffusion measure and there exists a time-dependent vector field w such that
The solution \({\textbf{P}}\) of Schrödinger’s problem, i.e., minimizing (7.31), is related to the reference measure \({\textbf{R}}\) by a time-symmetric version of Doob’s h-transform (Léonard 2014, Section 3). Its coordinate process \(X_{\textbf{P}}\) is sometimes called a Schrödinger bridge or Schrödinger process. When the reference measure \({\textbf{R}}\) is Markovian, i.e., the law of a Markov process, the solution process \(X_{\textbf{P}}\) is also called a reciprocal (Bernstein 1932; Jamison 1975) or Bernstein process (Cruzeiro et al. 2000; Cruzeiro and Vuillermot 2015).
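A finite-state toy version of this structure can be sketched in Python. It assumes a finite state space and keeps only the two times 0 and T (the static problem), with a strictly positive reference weight matrix. Under these assumptions, the minimizer of the relative entropy with prescribed marginals has the product form \(a_i R_{ij} b_j\), a discrete analog of the h-transform structure, and the factors a and b can be computed by iterative proportional fitting (Sinkhorn iteration). The function names, the kernel and the marginals below are hypothetical choices made for illustration only; none of them come from the paper.

```python
import math

def sinkhorn_bridge(ref, mu0, muT, iters=500):
    """Iterative proportional fitting for a two-time (static) Schrodinger problem.

    ref : strictly positive matrix, ref[i][j] = reference weight of (X(0)=i, X(T)=j)
    mu0 : prescribed initial marginal; muT : prescribed final marginal
    Returns the coupling p[i][j] = a[i] * ref[i][j] * b[j].
    """
    n, m = len(mu0), len(muT)
    a = [1.0] * n
    b = [1.0] * m
    for _ in range(iters):
        # Rescale rows so that the initial marginal matches mu0 ...
        a = [mu0[i] / sum(ref[i][j] * b[j] for j in range(m)) for i in range(n)]
        # ... then rescale columns so that the final marginal matches muT.
        b = [muT[j] / sum(a[i] * ref[i][j] for i in range(n)) for j in range(m)]
    return [[a[i] * ref[i][j] * b[j] for j in range(m)] for i in range(n)]

def relative_entropy(p, r):
    """H(p|r) = sum of p*log(p/r), with the convention 0*log(0) = 0."""
    return sum(pij * math.log(pij / rij)
               for prow, rrow in zip(p, r)
               for pij, rij in zip(prow, rrow) if pij > 0.0)

# Hypothetical reference: a symmetric Gaussian-type kernel on three states,
# normalised to a probability on pairs (initial state, final state).
states = range(3)
raw = [[math.exp(-0.5 * (i - j) ** 2) for j in states] for i in states]
total = sum(map(sum, raw))
ref = [[w / total for w in row] for row in raw]

mu0 = [0.5, 0.3, 0.2]   # hypothetical initial distribution
muT = [0.2, 0.3, 0.5]   # hypothetical final distribution

bridge = sinkhorn_bridge(ref, mu0, muT)
# The independent coupling has the same marginals and serves as a comparison.
independent = [[x * y for y in muT] for x in mu0]
```

Comparing `relative_entropy(bridge, ref)` with `relative_entropy(independent, ref)` shows that the fitted coupling has lower entropy relative to the reference than the independent coupling with the same marginals, as the variational characterization requires.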
If the manifold M is endowed with a Riemannian metric g, and the reference coordinate process \(X_{\textbf{R}}\) has generator
for some time-dependent vector field b on M, then the density \(\mu (t,x) = \frac{{\textrm{d}}\,{\textbf{P}}^*_t}{{\textrm{d}}\,\textrm{Vol}}(x)\) of the minimizer \({\textbf{P}}^*\) of (7.31) solves the following Kolmogorov forward equation
where S solves the HJB equation (7.27) with \(f \equiv 0\), or (6.28).
Moreover, an analog of Benamou–Brenier formula was derived (see Léonard 2014). Consider the problem of minimizing the average action
among all pairs \((\rho ,v)\), where \(\rho =(\rho (t))_{t\in [0,T]}\) is a measurable path in \({\mathcal {P}}(M)\), \(v=(v(t))_{t\in [0,T]}\) is a measurable time-dependent vector field and the following constraints are satisfied (in the weak sense of PDEs):
The relation between \(\rho \) in (7.33) and \({\textbf{P}}\) in (7.31) is just that \(\rho \) is the time marginal of \({\textbf{P}}\), namely,
The minimizer of (7.33) is the pair \((\mu ,\nabla S+b)\) where \(\mu \) solves (7.32) and S solves (6.28).
These results are summarized in the following equivalent relations:
Now if the coordinate process \(X_{\textbf{R}}\) under the reference measure \({\textbf{R}}\) is a nondegenerate M-valued diffusion in \(I_0^T(M)\) which is diffusion-homogeneous, then assigning such a reference measure \({\textbf{R}}\) amounts to assigning a pair \((b_{\textbf{R}},g_{\textbf{R}})\in \Gamma (TM \otimes \textrm{Sym}^2(T^* M))\), where \(g_{\textbf{R}}\) is a positive-definite symmetric (0, 2)-tensor, i.e., a Riemannian metric tensor. More precisely, we let \(A^{X_{\textbf{R}}} = ({\mathfrak {b}}, a) + F\) be the generator of \(X_{\textbf{R}}\). Since \(X_{\textbf{R}}\) is nondegenerate and diffusion-homogeneous, a is a time-independent nondegenerate symmetric (2, 0)-tensor field. Let \(g_{\textbf{R}} = {\hat{a}}\) be the inverse of a, so that \(g_{\textbf{R}}\) is a Riemannian metric tensor. We then equip the Riemannian manifold \((M, g_{\textbf{R}})\) with the associated Levi–Civita connection \(\nabla \). The isomorphism (2.19) implies that
where \(b_{\textbf{R}}\) is the time-dependent vector field given by \(b_{\textbf{R}}^i = ( {\mathfrak {b}}^i + \textstyle {{\frac{1}{2}}} g_{\textbf{R}}^{jk} \Gamma ^i_{jk} )\), and \(\nabla \) and \(\Delta \) are the gradient and Laplace–Beltrami operator with respect to \(g_{\textbf{R}}\), respectively.
We assume that \({\textbf{P}}\) is a diffusion measure and \(QX_{\textbf{P}}= QX_{\textbf{R}} = {\check{g}}_{\textbf{R}}\), \({\textbf{P}}\)-a.s., which is a necessary condition for \(H({\textbf{P}}|{\textbf{R}}) < \infty \). Then, by (3.4), the generator of \(X_{\textbf{P}}\) is given by
From (7.34) and (7.35), one can see that \(v(t,X(t)) = D_\nabla X_{\textbf{P}}(t)\) and the action (7.33) equals
So the minimization problem turns into minimizing the action (7.37) over all diffusion measures \({\textbf{P}}\in {\mathcal {P}}({\mathcal {C}}_0^T)\) with \({\textbf{P}}_0 = \mu _0\), \({\textbf{P}}_T = \mu _T\) and \(QX_{\textbf{P}}= {\check{g}}_\textbf{R}\), \({\textbf{P}}\)-a.s. If \(\mu _0 = \delta _{q}\) and \(\mu _T = \mu \), this brings us back to our stochastic variational problem, that is, to minimize the action functional \({\mathcal {S}}\) in (7.9) over \({\mathcal {A}}_{g_{\textbf{R}}}([0,T];q, \mu )\), with Lagrangian \(L_0(t, x,\dot{x}) = \frac{1}{2} |\dot{x}-b_{\textbf{R}}(t,x)|^2 - F(t,x)\). Note that in this case, since \({\textbf{P}}_0 = \mu _0\) is Dirac, the relative entropy in (7.31) and \(H(\mu _0|{\textbf{R}}_0)\) are always infinite, while their difference \(H({\textbf{P}}|{\textbf{R}}) - H(\mu _0|{\textbf{R}}_0)\) can be finite as in (7.36). Moreover, by Theorems 7.16 and 7.18, a necessary condition for \(X_{\textbf{P}}\) to be the minimizer of \({\mathcal {S}}\) is that \(X_{\textbf{P}}\) satisfies (7.26) and (7.27), which coincides with (7.32).
Remark 7.20
-
(i)
Compared to the Lagrangian (7.25) used here for addressing Schrödinger’s problem, there is another type of Lagrangian used in the Euclidean version of quantum mechanics in Cruzeiro and Zambrini (1991, Eq. (5.4)). The latter has an additional term involving the divergence of b, which helps to express part of the action functional as a Stratonovich integral. The stochastic Euler–Lagrange equation (7.22) applied to their Lagrangians recovers the equations of motion in Cruzeiro and Zambrini (1991, Theorem 5.3).
-
(ii)
In the seminal paper (Otto 2001), F. Otto provided a geometric perspective for numerous PDEs by introducing a Riemannian structure in the Wasserstein space. It is known as Otto’s calculus. A similar idea can be traced back to V.I. Arnold, who established a geometric framework for hydrodynamics by studying the Riemannian nature of the infinite-dimensional group of diffeomorphisms (Arnold and Khesin 2021). The recent paper (Gentil et al. 2020) formulated Schrödinger’s problem via Otto calculus, where the equation of motion is given by an infinite-dimensional Newton equation, cf. Khesin et al. (2021) and von Renesse (2012) on related matters. All these works can be called a “geometrization” of (stochastic) dynamics. In contrast, the present framework can be called a “stochastization” of geometric mechanics. The differences and relations between our framework and theirs are similar to those between the two ways of producing HJ equations for quantum mechanics mentioned in the introduction. More precisely, while (second-order) HJB equations play a key role in our framework, various HJ equations with density-dependent potential terms were derived by them (see Gentil et al. 2020, Corollary 23; Khesin et al. 2021, Proposition 2.4).
7.4 Second-Order Legendre Transform
7.4.1 From \({\mathcal {T}}^{S*} M\) to \({\mathcal {T}}^S M\) and Back
Let us fix a linear connection \(\nabla \) on M. Here, for simplicity, we consider time-independent Hamiltonians and Lagrangians.
We first produce second-order Lagrangians from second-order Hamiltonians. To this end, we first reduce the second-order Hamiltonian to a classical one. Given a time-independent second-order Hamiltonian \(H: {\mathcal {T}}^{S*} M\rightarrow {\mathbb {R}}\), its \(\nabla \)-reduction is the classical Hamiltonian \(H_0 = H\circ {\hat{\iota }}^*_\nabla : T^* M\rightarrow {\mathbb {R}}\), as in (6.42). If \(H_0\) is hyperregular (see Abraham and Marsden 1978, Section 3.6), then its fiber derivative \({\textbf {F}}H_0: T^* M\rightarrow TM\), which is given in canonical coordinates by \(\dot{x}^i = \frac{\partial H_0}{\partial p_i}\), is a diffeomorphism and defines the classical Legendre transform (Abraham and Marsden 1978, Section 3.6):
where \(({\hat{o}}_{jk})\) is a family of auxiliary variables introduced in (6.43). Then, we lift \(L_0\) to an admissible second-order Lagrangian \(L: {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) as in Definition 7.9, that is, \(L = L_0 \circ \varrho _\nabla \). Combining (7.38) with (7.8), the relation between L and H is
We call (7.39) the second-order Legendre transform. In particular, if we restrict the admissible second-order Lagrangian L to the subbundle of \({\mathcal {T}}^S M\) with coordinate constraint \(Q^{jk} x = g^{jk}(x)\) for some symmetric (2, 0)-tensor field g [which is just the condition in (7.10)], and let H be \((g,\nabla )\)-canonical, then by (6.45), we have
Consequently, we can find the relation between second-order Hamilton’s principal functions and action functionals. By (6.41) and (7.40),
One concludes, from Dynkin’s formula, that for an M-valued diffusion \(X\in {\mathcal {A}}_g([0,T];q, \mu )\),
and
where \({\textbf{E}}_{(t,x)}\) is the conditional expectation \({\textbf{E}}(\cdot |X(t)=x)\). This means that the action functional is the expectation of the second-order Hamilton’s principal function (up to an undetermined constant), while the second-order Hamilton’s principal function is the conditional expectation version of the action functional.
Conversely, let us be given an admissible second-order Lagrangian \(L: {\mathcal {T}}^S M\rightarrow {\mathbb {R}}\) which is the \(\nabla \)-lift of a classical Lagrangian \(L_0: T M\rightarrow {\mathbb {R}}\). If \(L_0\) is hyperregular, then its fiber derivative
which is written in coordinates as \(p_i = \frac{\partial L_0}{\partial \dot{x}^i}\), is a diffeomorphism and defines the classical inverse Legendre transform:
We replace coordinates \((\dot{x}^i)\) by \((D_\nabla ^i x)\), due to (3.2). Now, given a symmetric (2, 0)-tensor field g, we lift \(H_0\) to the \((g,\nabla )\)-canonical \({\overline{H}}^g_0\) in (6.44). The relation between \({\overline{H}}^g_0\) and L is
where \((o^\nabla _{jk})\) are the tensorial conjugate diffusivities defined in (5.6). We call (7.43) the \((g,\nabla )\)-canonical inverse second-order Legendre transform. When g is Riemannian and \(\nabla \) is the associated Levi–Civita connection, we call (7.43) the g-canonical inverse second-order Legendre transform. In particular, when restricting L onto the subbundle of \({\mathcal {T}}^S M\) with coordinate constraint \(Q^{jk} x = g^{jk}(x)\), we have
Following the procedure in classical mechanics (Abraham and Marsden 1978, Definition 3.5.11), for a given classical Lagrangian \(L_0:TM\rightarrow {\mathbb {R}}\), we define a function \(A_0:TM\rightarrow {\mathbb {R}}\) by \(A_0(v_x) = {\textbf{F}}L_0(v_x)\cdot v_x\), and the classical energy \(E_0:TM\rightarrow {\mathbb {R}}\) by \(E_0 = A_0-L_0\). Notice that in local coordinates, \(A_0 = \dot{x}^i \frac{\partial L_0}{\partial \dot{x}^i}\) and \(E_0 = \dot{x}^i \frac{\partial L_0}{\partial \dot{x}^i} - L_0\).
Example 7.21
It is easy to check that the \(\nabla \)-lift of the classical Lagrangian \(L_0\) in (7.25) is the second-order Legendre transform of the second-order Hamiltonian H in (6.26). And conversely, the latter is the g-canonical inverse second-order Legendre transform of the former. The classical energy associated with this Lagrangian is given by
The terms on the right-hand side correspond to a kinetic energy, a vector potential energy and a scalar potential energy, respectively.
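As a hedged illustration of these definitions, assume that the Lagrangian has the form \(L_0(t,x,\dot{x}) = \frac{1}{2}|\dot{x}-b(t,x)|^2 - F(t,x)\) stated at the end of Sect. 7.3 (with \(b_{\textbf{R}}\) written as b), where norms, inner products and index lowering are taken with respect to g. The following is a sketch under that assumption and not a restatement of the displayed formulas of this section. A direct computation gives
$$\begin{aligned} p_i&= \frac{\partial L_0}{\partial \dot{x}^i} = g_{ij}\left( \dot{x}^j - b^j\right) , \\ A_0&= \dot{x}^i \frac{\partial L_0}{\partial \dot{x}^i} = |\dot{x}|^2 - \langle \dot{x}, b\rangle , \\ E_0&= A_0 - L_0 = \frac{1}{2}|\dot{x}|^2 - \frac{1}{2}|b|^2 + F. \end{aligned}$$
Substituting \(\dot{x}^i = g^{ij}p_j + b^i\) into \(p_i\dot{x}^i - L_0\) yields the classical Hamiltonian \(H_0 = \frac{1}{2}|p|^2 + b^i p_i + F\) under the same assumption, and \(E_0\) is this \(H_0\) expressed in the variables \((x,\dot{x})\).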
7.4.2 Stochastic Hamiltonian Mechanics on Riemannian Manifolds
Given a reference metric tensor g, i.e., a geodesically complete Riemannian metric as in Sect. 7.2, let \(\nabla \) be the associated Levi–Civita connection. If a second-order Hamiltonian H is the g-canonical lift of a classical Hamiltonian \(H_0\), namely, \(H = {\overline{H}}_0^g\) as in (6.44), then the stochastic Hamilton’s equations (6.17) reduce to a simpler Hamilton-type system on \(T^*M\), which is equivalent to the stochastic Euler–Lagrange equation (7.22) via the classical Legendre transform (7.41) and (7.42).
Similarly to (7.19) and (7.20), we introduce, for a smooth function f on \(T^* M\), the vertical gradient \(\nabla _p f\) and horizontal differential \(d_x f\) which are given in local coordinates (x, p) by
Both are invariant under change of coordinates. Still by the classical theory, the connection \(\nabla \) can uniquely determine a \(TT^*M\)-valued 1-form on \(T^*M\) horizontal over M, given by
Hence, we have \(d_x f = \Gamma ^*(df)\). Given a 1-form \(\eta \) on M, \(f\circ \eta : q\mapsto f(\eta _q)\) is a smooth function on M. Then, it is easy to verify that
Theorem 7.22
Let \(H_0: T^*M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be a smooth function.
-
(i)
Let \(H = {\overline{H}}_0^g: {\mathcal {T}}^{S*} M\times {\mathbb {R}}\rightarrow {\mathbb {R}}\) be the g-canonical lift of \(H_0\). Let \({\textbf{X}}\) be the horizontal integral process of stochastic Hamilton’s equations (6.17) corresponding to H and \(X=\tau _M^{S*}({\textbf{X}})\). Define a \(T^*M\)-valued horizontal diffusion by \({\mathbb {X}}:= {\hat{\varrho }}^*({\textbf{X}})\). Then, \({\mathbb {X}}(t) = p(t,X(t))\) solves the following system on \(T^*M\),
$$\begin{aligned} \left\{ \begin{aligned}&D_\nabla X(t) = \nabla _p H_0 ({\mathbb {X}}(t),t), \\&\frac{\overline{{\textbf{D}}}}{dt} p(t,X(t)) = -d_x H_0 ({\mathbb {X}}(t),t), \end{aligned}\right. \end{aligned}$$(7.47)
subject to \(QX(t) = {\check{g}}(X(t))\), where \(\frac{\overline{{\textbf{D}}}}{dt}\) is the damped mean covariant derivative with respect to X. In this case, we refer to the system (7.47) as the g-canonical reduction of (6.17), or global stochastic Hamilton’s equations.
-
(ii)
If \(H_0\) is hyperregular, then the global stochastic Hamilton’s equations (7.47) are equivalent to the stochastic Euler–Lagrange equation (7.22) via the classical Legendre transform \(p=d_{\dot{x}} L_0\) and \(H_0(x,p,t) = p\cdot \dot{x} - L_0(t,x,\dot{x})\).
-
(iii)
Let \(S\in C^\infty (M\times {\mathbb {R}})\). Then, the following statements are equivalent:
-
(a)
for every M-valued diffusion X satisfying
$$\begin{aligned} D_\nabla X(t) = \nabla _p H_0 (dS(t, X(t) ),t), \quad QX(t) = \check{g}(X(t)), \end{aligned}$$(7.48)
the \(T^* M\)-valued process \(dS\circ X\) solves the global stochastic Hamilton’s equations (7.47);
-
(b)
S satisfies the following Hamilton–Jacobi–Bellman equation
$$\begin{aligned} \frac{\partial S}{\partial t} + H_0(d S, t) + \frac{1}{2}\Delta S = f(t), \end{aligned}$$(7.49)
for some function f depending only on t.
Proof
-
(i)
Since \(H = {\overline{H}}_0^g = H_0 + \textstyle {\frac{1}{2}} g^{jk} (o_{jk} - \Gamma ^i_{jk} p_i )\), \((QX)^{jk} = 2\frac{\partial H}{\partial o_{jk}}\) if and only if \(QX(t) = {\check{g}}(X(t))\). Since
$$\begin{aligned} \frac{\partial H}{\partial p_i} = \frac{\partial H_0}{\partial p_i} - \frac{1}{2} g^{jk} \Gamma ^i_{jk} = dx^i( \nabla _p H_0 ) - \frac{1}{2} (QX)^{jk} \Gamma ^i_{jk}, \end{aligned}$$
we have \((DX)^i = \frac{\partial H}{\partial p_i}\) if and only if \(D_\nabla X= \nabla _p H_0\), due to (2.20). This proves the first equation of (7.47). Furthermore,
$$\begin{aligned} \frac{\partial H}{\partial x^i}= & {} \frac{\partial H_0}{\partial x^i} + \frac{1}{2} \partial _i g^{jk} \left( o_{jk} - \Gamma ^l_{jk} p_l \right) - \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} p_l \\= & {} \frac{\partial H_0}{\partial x^i} - g^{jm} \Gamma _{im}^k \left( o_{jk} - \Gamma ^l_{jk} p_l \right) - \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} p_l. \end{aligned}$$
On the other hand, by applying Lemma 7.8 (ii) and (iv), and the equation \(D_\nabla X= \nabla _p H_0\), we have
$$\begin{aligned} \begin{aligned} (D (p\circ {\textbf {X}}))_i&= {\textbf{D}}_t p_i = {\textbf{D}}_t [p(\partial _i)] = \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p \left( \frac{\overline{{\textbf{D}}}\partial _i}{dt} \right) \\ {}&\quad + (QX)^{jk} (\nabla _{\partial _j} p) (\nabla _{\partial _k} \partial _i) \\&= \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p \left( \nabla ^{}_{D_\nabla X}\partial _i + \frac{1}{2} g^{jk} \nabla ^2_{\partial _j,\partial _k} \partial _i + \frac{1}{2} g^{jk} R(\partial _i,\partial _j)\partial _k \right) \\&\quad + g^{jk} (\nabla _{\partial _j} p) (\nabla _{\partial _k} \partial _i) \\&= \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + p_l \left( \frac{\partial H_0}{\partial p_j} \Gamma _{ij}^l + \frac{1}{2}g^{jk} \partial _i \Gamma _{jk}^l \right) \\ {}&\quad + g^{jk} \Gamma _{ik}^m \left( \partial _j p_m - \Gamma _{jm}^l p_l \right) . \end{aligned}\nonumber \\ \end{aligned}$$(7.50)
Hence,
$$\begin{aligned} \begin{aligned} (D (p\circ {\textbf {X}}))_i + \frac{\partial H}{\partial x^i} = \frac{\overline{{\textbf{D}}}p}{dt} (\partial _i) + d_x H_0(\partial _i) + g^{jm} \Gamma _{im}^k \left( \partial _j p_k - o_{jk} \right) . \end{aligned} \end{aligned}$$
-
(ii)
The equivalence between (7.47) and (7.22) follows from the following calculations:
$$\begin{aligned} \nabla _p H_0= & {} \nabla _p (p\cdot \dot{x} - L_0) = \dot{x}, \\ d_x H_0= & {} \left( \frac{\partial H_0}{\partial x^i} + \Gamma _{ij}^k p_k \frac{\partial H_0}{\partial p_j} \right) dx^i \\= & {} \left( -\frac{\partial L_0}{\partial x^i} + \Gamma _{ij}^k \frac{\partial L_0}{\partial \dot{x}^k} \dot{x}^j \right) dx^i = - d_x L_0. \end{aligned}$$
-
(iii)
By (7.7), conditions (7.48) and (7.46),
$$\begin{aligned} \begin{aligned} \frac{\overline{{\textbf{D}}}}{dt} (dS)&= \left( \frac{\partial }{\partial t} + \nabla ^{}_{D_\nabla X} + \frac{1}{2}\Delta _{\textrm{LD}} \right) (dS) \\&= d\frac{\partial S}{\partial t} + \nabla _{(\nabla _p H_0\circ dS)} dS - \frac{1}{2}( dd^*+d^* d)dS \\&= d\frac{\partial S}{\partial t} + d(H_0\circ dS) - d_x H_0 \circ dS - \frac{1}{2} dd^*dS \\&= d\left( \frac{\partial S}{\partial t} + H_0\circ dS + \frac{1}{2} \Delta S \right) - d_x H_0 \circ dS. \end{aligned} \end{aligned}$$
The result follows.
\(\square \)
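To see what the system (7.47) says in the simplest setting, consider the flat case \(M = {\mathbb {R}}^d\) with g the Euclidean metric, so that all Christoffel symbols and the curvature vanish, and take \(H_0(x,p) = \frac{1}{2}|p|^2\). The following is only a sketch under these illustrative assumptions. In this case \(D_\nabla X = DX\), \(\nabla _p H_0 = p\) and \(d_x H_0 = 0\), and the operator \(\frac{\partial }{\partial t} + \nabla _{D_\nabla X} + \frac{1}{2}\Delta _{\textrm{LD}}\) acts componentwise as the generator of X, so that \(\frac{\overline{{\textbf{D}}}}{dt}\) reduces to the mean derivative D. The system (7.47) then reads
$$\begin{aligned} DX(t) = p(t,X(t)), \qquad D\left[ p(t,X(t))\right] = 0, \qquad QX(t) = {\textbf{I}}_d, \end{aligned}$$
that is, \(DDX = 0\). For \(d=1\), these are the equations of motion \(DDX=0\), \(QX = 1\) quoted for the Brownian reciprocal process in Example 7.24.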
Remark 7.23
-
(i)
Assertions (ii) and (iii) of Theorem 7.22 generalize Theorem 7.18, since from the Legendre transform \(p=d_{\dot{x}} L_0\) we observe that the S-EL equation (7.22) is related to HJB equation (7.49) via Eq. (7.29). However, assertion (iii) is a special case of Theorem 6.19, since HJB equation (7.49) is just the one in (6.39) with \(H = {\overline{H}}_0^g\) the g-canonical lift of \(H_0\), due to the observation that \(\overline{H}_0^g(d^2\,S, t) = H_0(d S, t) + \frac{1}{2}\Delta S\).
-
(ii)
The advantage of Theorem 7.22 is that it formulates stochastic Hamiltonian mechanics in a global way similar to stochastic Lagrangian mechanics, while its disadvantage is that it depends on the choice of Riemannian structures. However, unlike stochastic Hamiltonian mechanics of Sect. 6, neither global S-H equations (7.47) nor HJB equation (7.49) encodes any new symplectic or contact structures, as the Hamiltonian functions therein are still classical.
-
(iii)
By a direct calculation similar to (7.50), one easily obtains the following local version of stochastic Euler–Lagrange equation (7.22):
$$\begin{aligned} {\textbf{D}}_t \left( \frac{\partial L_0}{\partial \dot{x}^i} \right) = \frac{\partial L_0}{\partial x^i} + \frac{1}{2} g^{jk} \partial _i \Gamma ^l_{jk} \frac{\partial L_0}{\partial \dot{x}^l} - \frac{1}{2} \partial _i g^{jk} \left( \frac{\partial ^2 L_0}{\partial x^j \partial \dot{x}^k} - \Gamma ^l_{jk} \frac{\partial L_0}{\partial \dot{x}^l} \right) .\nonumber \\ \end{aligned}$$(7.51)
This local version is related to stochastic Hamilton’s equations (6.10) via the canonical second-order Legendre transform (7.43).
-
(iv)
Similarly to Remark 6.20, if we let \({\tilde{H}} = H - f\), then Theorem 7.22 holds with \({\tilde{H}}\) and zero function in place of H and f. We will refer to Eq. (7.49) with \(f\equiv 0\) as the HJB equation associated with Hamiltonian \(H_0\), or the HJB equation associated with the Lagrangian \(L_0\) related to \(H_0\) via the Legendre transform (when \(H_0\) is hyperregular).
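As a worked instance of the HJB equation (7.49), take the illustrative case \(M={\mathbb {R}}\) with the Euclidean metric and \(H_0(x,p) = \frac{1}{2}p^2\), the scalar Hamiltonian of Example 7.24. The logarithmic substitution \(S = \log u\), with \(u>0\) an auxiliary function introduced here only for illustration, linearizes the equation:
$$\begin{aligned} \frac{\partial S}{\partial t} + \frac{1}{2}\left( \frac{\partial S}{\partial x}\right) ^2 + \frac{1}{2}\frac{\partial ^2 S}{\partial x^2}&= f(t), \\ \frac{1}{u}\left( \frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial ^2 u}{\partial x^2}\right)&= f(t), \end{aligned}$$
where the second line follows from \(\frac{\partial S}{\partial x} = \frac{1}{u}\frac{\partial u}{\partial x}\) and \(\frac{\partial ^2 S}{\partial x^2} = \frac{1}{u}\frac{\partial ^2 u}{\partial x^2} - \frac{1}{u^2}\left( \frac{\partial u}{\partial x}\right) ^2\). In particular, for \(f\equiv 0\), S solves the HJB equation in this flat case exactly when u solves the backward heat equation \(\frac{\partial u}{\partial t} + \frac{1}{2}\frac{\partial ^2 u}{\partial x^2} = 0\).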
On Riemannian manifolds, canonical transformations of Sect. 6.5 can also be reduced to tangent bundles. We consider a bundle isomorphism \({\textbf{F}}\) from \({\mathcal {T}}^{S*} M\times {\mathbb {R}}\) to \({\mathcal {T}}^{S*} N\times {\mathbb {R}}\), projecting to a time-change map \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). The transformation \({\textbf{F}}\) is a map from coordinates \((x^i, p_i, o_{jk},t)\) to \((y^i, P_i, O_{jk}, s)\) satisfying \(s=F^0(t)\). Both base manifolds M and N are equipped with some Riemannian metrics and the corresponding Levi–Civita connections.
By the inverse second-order Legendre transform (7.44) and the integrability condition (6.15), the action functional in (7.9) can be rewritten as
where \(\circ \,d\) denotes the Stratonovich stochastic differential and \({\overline{H}}^g_0 = H_0 + \frac{1}{2} g^{jk} (o_{jk} - \Gamma ^i_{jk}p_i )\). We denote simply \(x^i = x^i\circ {\textbf{X}}\), \(p_i = p_i\circ {\textbf{X}}\) and \(H= {\overline{H}}^g_0\). Then, \(\mathcal S = {\textbf{E}}\int _0^T (p_i\circ dx^i(t) - H {\textrm{d}}t)\). Now we make a change of coordinates from \((x^i,p_i,t)\) to \((y^i,P_i,s)\) satisfying \(s=F^0(t)\), and denote that \(y^i = y^i\circ {\textbf{X}}\) and \(P_i = P_i\circ {\textbf{X}}\). We have
where the function K plays the role of the second-order Hamiltonian in new coordinate system.
As in Sect. 6.5, the general condition for a transformation to be canonical is to preserve the form of the stochastic Hamilton’s system (7.47). This is equivalent to preserving the form of the stochastic stationary-action principle (7.12), according to Theorem 7.22.(ii). It follows from \(\delta {\mathcal {S}} = 0\) that
Since the underlying process X has zero variation at the endpoints, both equalities will be satisfied if the integrands are related by the following SDE:
where G is a function of the phase space coordinates (x, p, t) or (y, P, s), or any mixture of them, and is called the generating function. Note that, in contrast with the classical theory of canonical transformations and also with (6.36), Eq. (7.52) for canonical transformations is a stochastic differential equation rather than an equation for differential forms.
Consider the type one generating function \(G_1\), that is, \(G = G_1(x,y,t)\) is given as a function of the old and new generalized position coordinates (cf. Goldstein et al. 2002, Section 9.1). Then, using Itô’s formula \(dG_1 = \frac{\partial G_1}{\partial t}dt + \frac{\partial G_1}{\partial x^i} \circ dx^i + \frac{\partial G_1}{\partial y^i} \circ dy^i\), and setting to zero the coefficients of the (stochastic) differentials \(\circ dx\), \(\circ dy\) and dt in (7.52), we get
which recovers (6.37). By taking \(F^0 = \textbf{Id}_{\mathbb {R}}\) (i.e., no time-change) and requiring the new Hamiltonian \(K_0\) to be identically zero, and writing \(G_1\) as S the last equation turns into the following HJB equation
where (x, y) are regarded as coordinates on the product manifold \(M\times N\) equipped with the direct-sum Riemannian metric and its corresponding Levi–Civita connection, \(\Delta _x\) and \(\Delta _y\) are the Laplacian on M and N, respectively, so that \(\Delta _x + \Delta _y\) is the Laplacian on \(M\times N\) under the aforementioned connection.
In contrast to the mixed-order contact approach to canonical transformations of Sect. 6.5, since the changes of coordinates proceed on \(T^* M\), one can easily formulate four types of generating functions that are related to each other through classical Legendre transforms in the same way as in classical mechanics (Goldstein et al. 2002, Section 9.1). For example, the type two generating function takes the form \(G = G_2(x,P,t) - y^i P_i\), for which we have
In this case, since \((x^i)\) and \((y^i)\) are no longer independent variables, the Riemannian structures on M and N should be related by the transformation. In view of this, we only consider point transformations, a subclass of canonical transformations. That is, we assume \(G_2\) to be of the form
for some (possibly time-dependent) diffeomorphism \(f: M\rightarrow N\) and some function \(h: M\rightarrow {\mathbb {R}}\). The second equation of (7.53) implies
So we equip N with the (time-dependent) pushforward Riemannian metric of g on M by f, and with the Levi–Civita connection.
Example 7.24
(Canonical transformations for one-dimensional Bernstein’s reciprocal processes) Consider the scalar case of Example 6.11, that is, the \({\mathbb {R}}\)-valued Brownian reciprocal process with second-order Hamiltonian \(H(x,p,o) = H_0(x,p) + \frac{1}{2}o = \frac{1}{2}|p|^2+ \frac{1}{2}o\). The equations of motion are \(DDX=0\), \(QX = 1\) [cf. (6.33)]. In the following, we will consider two canonical transformations which transform Brownian reciprocal processes to reciprocal processes derived from diffusions with linear potentials and quadratic potentials, respectively.
(i) Consider the time-dependent change of coordinates from (x, p, t) to (y, P, t) (without time-change) induced by \(G_2(x,P,t)= (x + \textstyle {{\frac{1}{2}}} t^2) P -tx\). By (7.53),
$$\begin{aligned} y = x+ \textstyle {{\frac{1}{2}}} t^2, \quad p = P-t, \quad K = H + Pt-y + \textstyle {{\frac{1}{2}}} t^2. \end{aligned} \tag{7.54}$$
For the latent second-order coordinates, we have
$$\begin{aligned} O = \frac{\partial P}{\partial y} = \frac{\partial p}{\partial x} = o. \end{aligned}$$
Hence, by the last equation of (7.53), the new second-order Hamiltonian is
$$\begin{aligned} K(y,P,O,t) = K_0(y,P,t)+ \frac{1}{2}O= \frac{1}{2}|P|^2 - y + t^2 + \frac{1}{2}O, \end{aligned}$$
which is still of the form (6.26), with \(b\equiv 0\) and \(F(t,y) = -y+t^2\). The equations of motion in the new coordinates are \(DDY=1\) and \(QY = 1\). By Remark 6.20, K shares the same equations of motion with \({\tilde{K}}(y,P,O) = \frac{1}{2}|P|^2 - y + \frac{1}{2}O\). In other words, (7.54) transforms Brownian reciprocal processes to reciprocal processes derived from diffusions with linear potentials. This example is taken from Lescot and Zambrini (2007, Theorem 4.1.(1)), where the authors used (7.54) to transform free heat equations to heat equations with linear potentials. We refer readers to Lescot and Zambrini (2007) for more applications of canonical transformations of contact Hamiltonian systems to Euclidean quantum mechanics in Example 6.12.
(ii) Consider the following change of coordinates from (x, p, t) to (y, P, s) (with time-change)
$$\begin{aligned} x = y\sqrt{1-t^2}, \quad P= p\sqrt{1-t^2} + yt, \quad s=\textrm{arctanh}\,t. \end{aligned} \tag{7.55}$$
Clearly, the map \((x,p)\mapsto (y,P)\) is induced by the type three generating function \(G_3(y,p,t) = -py\sqrt{1-t^2} -\frac{y^2}{2}t\) via the relations \(x= -\frac{\partial G_3}{\partial p}\) and \(P= -\frac{\partial G_3}{\partial y}\). The relation between the latent coordinates o and O is
$$\begin{aligned} O = \frac{\partial P}{\partial y} = \frac{\partial p}{\partial x} \frac{\partial x}{\partial y} \sqrt{1-t^2} + t = (1-t^2) o +t. \end{aligned} \tag{7.56}$$
The new second-order Hamiltonian K satisfies \(K \frac{{\textrm{d}}s}{{\textrm{d}}t} - H = \frac{\partial G_3}{\partial t}\). Hence, combining with (7.55) and (7.56), we obtain
$$\begin{aligned} K (y,P,O,s) &= (1-t^2) \left( \frac{1}{2} |p|^2 + \frac{py t}{\sqrt{1-t^2}} - \frac{1}{2} |y|^2 + \frac{1}{2} o \right) \\ &= \frac{1}{2} |P|^2 - \frac{1}{2} |y|^2 + \frac{1}{2} O - \frac{1}{2} \tanh s. \end{aligned}$$
This differs from the second-order Hamiltonian of Euclidean harmonic oscillators in Example 6.12.(ii), i.e., \({\tilde{K}} (y,P,O) = \frac{1}{2} |P|^2 - \frac{1}{2} |y|^2 + \frac{1}{2} O\), by a term depending only on time. So by virtue of Remark 6.20, K and \({\tilde{K}}\) share the same equations of motion \(DDY=Y\), \(QY = 1\). Therefore, (7.55) transforms free reciprocal processes to Euclidean harmonic oscillators.
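The algebra behind both parts of Example 7.24 can be checked numerically. The following is a minimal sketch, not part of the original argument: it evaluates the new Hamiltonian once from the generating-function relation and once from the closed form in the new coordinates, at random sample points. The function names and the sampling ranges are illustrative choices.

```python
import math
import random


def check_linear_potential(x, p, o, t):
    """Example 7.24(i): compare K = H + dG_2/dt with the closed form in (y, P, O)."""
    H = 0.5 * p * p + 0.5 * o                # old second-order Hamiltonian
    y, P, O = x + 0.5 * t * t, p + t, o      # relations (7.54) and O = o
    K_from_G2 = H + (P * t - x)              # dG_2/dt = P t - x at fixed (x, P)
    K_closed = 0.5 * P * P - y + t * t + 0.5 * O
    return abs(K_from_G2 - K_closed)


def check_harmonic_oscillator(y, p, o, t):
    """Example 7.24(ii): compare K from K ds/dt - H = dG_3/dt with the closed form."""
    root = math.sqrt(1.0 - t * t)
    H = 0.5 * p * p + 0.5 * o
    dG3_dt = p * y * t / root - 0.5 * y * y  # t-derivative of G_3 at fixed (y, p)
    K_from_G3 = (1.0 - t * t) * (H + dG3_dt) # uses ds/dt = 1 / (1 - t^2)
    P = p * root + y * t                     # relation (7.55)
    O = (1.0 - t * t) * o + t                # relation (7.56)
    s = math.atanh(t)
    K_closed = 0.5 * P * P - 0.5 * y * y + 0.5 * O - 0.5 * math.tanh(s)
    return abs(K_from_G3 - K_closed)


def max_discrepancy(samples=1000, seed=0):
    """Largest absolute mismatch of the two expressions over random sample points."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        a, p, o = (rng.uniform(-2.0, 2.0) for _ in range(3))
        t = rng.uniform(-0.9, 0.9)           # keep |t| < 1 so that arctanh t is defined
        worst = max(worst,
                    check_linear_potential(a, p, o, t),
                    check_harmonic_oscillator(a, p, o, t))
    return worst
```

Calling `max_discrepancy()` should return a value at the level of floating-point round-off, which supports the two closed forms for K stated in the example.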
Example 7.25
(Canonical transformations for vanishing potentials) Let (M, g) be Riemannian. Take \(G_2(x,P,t) = x^i P_i - S(x,t)\) for some function S. Then,
Since the transformation on the base manifold M is the identity, it does not change the Riemannian metric, and
(i) We consider the Hamiltonian \(H_0(x, p) \equiv b^i(x) p_i - F(x)\), whose corresponding second-order Hamiltonian \(H = \overline{H}^g_0\) has as solution process a diffusion with generator \(\frac{1}{2} \Delta + b\cdot \nabla - F\) (see Sect. 6.3.1). Then, the new Hamiltonian is
$$\begin{aligned} K_0(y,P,t) &= b^i(y) P_i - \langle b(y), \nabla S(y,t)\rangle \\ &\quad - F(y) - \frac{1}{2} \Delta S(y,t) - \frac{\partial S}{\partial t}(y,t). \end{aligned}$$
If we choose S solving the backward PDE (6.23), then \(K_0(y,P) = b^i(y) P_i\) has as solution a diffusion process with generator \(\frac{1}{2} \Delta + \nabla _b\). In particular, such a canonical transformation can transform a diffusion process with a scalar potential into a free motion.
(ii) Consider the Hamiltonian \(H_0(x,p,t) = \frac{1}{2}g^{ij}(x) p_i p_j + g^{ij}(x)p_i\frac{\partial S}{\partial x^j}(x,t) + b^i(x) p_i - F(x)\), whose corresponding second-order Hamiltonian \(H = \overline{H}^g_0\) has as solution process a Schrödinger’s bridge with vector potential \((b+\nabla S)\) and scalar potential \(-F\). Then, the new Hamiltonian is
$$\begin{aligned} K_0(y,P,t) &= \frac{1}{2}g^{ij}(y) P_iP_j + b^i(y) P_i - \langle b(y), \nabla S(y,t)\rangle - \frac{1}{2}|\nabla S(y,t)|^2 - F(y) \\ &\quad - \frac{1}{2}\Delta S(y,t) - \frac{\partial S}{\partial t}(y,t). \end{aligned}$$
To transform \(K_0\) into the standard form \(K_0(y,P,t) = \frac{1}{2}g^{ij}(y) P_iP_j + b^i(y) P_i\), whose solution is a Schrödinger’s bridge with vector potential b, we only need to assume that S solves the HJB equation (6.28). In particular, such a canonical transformation transforms a Schrödinger’s bridge with a scalar potential into a free one.
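As a consistency check on the two new Hamiltonians of Example 7.25, the computation can be sketched in the flat case. This is only a sketch under stated assumptions: \(M = {\mathbb {R}}^d\) with the Euclidean metric (so \(\Gamma ^i_{jk} = 0\)), the type two relations are taken in the form used in Example 7.24.(i), namely \(y^i = \partial G_2/\partial P_i\), \(p_i = \partial G_2/\partial x^i\), \(K = H + \partial G_2/\partial t\), and the latent coordinates are taken to be \(o_{jk} = \partial p_j/\partial x^k\) and \(O_{jk} = \partial P_j/\partial y^k\). For \(G_2 = x^i P_i - S(x,t)\) this gives

$$\begin{aligned} y^i = x^i, \quad p_i = P_i - \frac{\partial S}{\partial x^i}, \quad o_{jk} = O_{jk} - \frac{\partial ^2 S}{\partial x^j \partial x^k}, \quad K = H - \frac{\partial S}{\partial t}, \end{aligned}$$

so that, writing \(H = H_0 + \frac{1}{2}\delta ^{jk} o_{jk}\) and \(K = K_0 + \frac{1}{2}\delta ^{jk} O_{jk}\),

$$\begin{aligned} K_0(y,P,t) = H_0\big (y, P - \nabla S(y,t), t\big ) - \frac{1}{2}\Delta S(y,t) - \frac{\partial S}{\partial t}(y,t). \end{aligned}$$

Substituting \(H_0 = b^i p_i - F\) reproduces the expression in (i). For (ii), the quadratic and mixed terms combine as

$$\begin{aligned} \frac{1}{2}|P - \nabla S|^2 + \langle P - \nabla S, \nabla S\rangle = \frac{1}{2}|P|^2 - \frac{1}{2}|\nabla S|^2, \end{aligned}$$

which accounts for the term \(- \frac{1}{2}|\nabla S|^2\) in the second expression.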
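Regarding the classical energy \(E_0\) introduced at the end of Sect. 7.4.1, for a given classical Lagrangian \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\), we introduce its generalized (or deformed) energy \(E:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) by \(E = E_0 + \frac{1}{2}\Delta S\) (cf. the identity \(\partial _t S = - E_0- \frac{1}{2} \Delta S = -E\) used in the proof of Theorem 7.32),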
where S is the solution of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (with \(f\equiv 0\)). The term \(\frac{1}{2}\Delta S\) stands for the stochastic deformation.
7.4.3 Small-Noise Limits
In this part, we will see, informally, how our stochastic framework degenerates into classical mechanics as the noise goes to zero. Let \(\epsilon >0\) be a small parameter which we refer to as diffusivity. The limit when \(\epsilon \rightarrow 0\) is called the small-noise limit.
Let \({\mathcal {A}}^\epsilon _g([0,T];q, \mu )\) be the small-noise version of the admissible class (7.10), that is, with the constraint \(QX(t) = \epsilon {\check{g}}(X(t))\). The \(\epsilon \)-dependent stochastic variational problem is to minimize the action functional \({\mathcal {S}} [X;0,T]\) in (7.9) among all \(X\in {\mathcal {A}}^\epsilon _g([0,T];q, \mu )\). Then, the same procedure as in Sect. 7.2 yields the following \(\epsilon \)-dependent stochastic Euler–Lagrange equation:
which is an equivalent condition for \(X_\epsilon \in {\mathcal {A}}^\epsilon _g([0,T];q, \mu )\) to be a stationary point of \({\mathcal {S}}\). Here \(\frac{\overline{{\textbf{D}}}^\epsilon }{dt}\) is the damped mean covariant derivative with respect to \(X_\epsilon \) so that
Now as \(\epsilon \rightarrow 0\), since \(QX_\epsilon \rightarrow 0\), \(X_\epsilon \) tends to some deterministic curve \(\gamma =(\gamma (t))_{t\in [0,T]}\) (in a suitable probabilistic sense), and \(D_\nabla X_\epsilon (t)\) tends to \(\dot{\gamma }(t)\). Thus, we can write informally
The \(\epsilon \)-dependent stochastic variational problem tends to the following deterministic variational problem
And the \(\epsilon \)-dependent stochastic Euler–Lagrange equation (7.57) tends to
where \(\frac{D}{dt} = \frac{\partial }{\partial t} + \nabla _{\dot{\gamma }}\) is the material derivative along \(\gamma \). This is the classical Euler–Lagrange equation in global form, cf. Villani (2009, p. 153).
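The concentration of \(X_\epsilon \) around a deterministic curve can be illustrated numerically. The sketch below uses a hypothetical toy model that is not taken from the paper: a one-dimensional Euclidean diffusion with constant drift 1 and \(QX = \epsilon \), i.e., \(dX = dt + \sqrt{\epsilon }\, dW\) with \(X(0)=0\), discretized by the Euler–Maruyama scheme. The limiting curve is then \(\gamma (t) = t\), and the endpoint variance is \(\epsilon T\).

```python
import math
import random


def simulate_endpoint(eps, drift=1.0, T=1.0, steps=200, rng=None):
    """Euler-Maruyama endpoint X(T) for dX = drift dt + sqrt(eps) dW, X(0) = 0."""
    rng = rng or random.Random()
    dt = T / steps
    x = 0.0
    for _ in range(steps):
        # Brownian increment over a step of length dt has standard deviation sqrt(dt)
        x += drift * dt + math.sqrt(eps * dt) * rng.gauss(0.0, 1.0)
    return x


def endpoint_spread(eps, paths=500, seed=0):
    """Sample mean and sample variance of X(T) over independent paths."""
    rng = random.Random(seed)
    ends = [simulate_endpoint(eps, rng=rng) for _ in range(paths)]
    mean = sum(ends) / paths
    var = sum((e - mean) ** 2 for e in ends) / (paths - 1)
    return mean, var
```

For instance, comparing `endpoint_spread(1.0)` with `endpoint_spread(1e-4)` shows the sample variance shrinking in proportion to \(\epsilon \), while the sample mean stays close to \(\gamma (T) = 1\).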
We introduce the following \(\epsilon \)-dependent version of the g-canonical lift (6.44):
Let \({\textbf{X}}_\epsilon \) be a horizontal integral process of stochastic Hamilton’s equations (6.10) corresponding to \(H_\epsilon \) and \(X_\epsilon =\tau _M^{S*}({\textbf{X}}_\epsilon )\). Since \((Q (x\circ {\textbf {X}}_\epsilon ))^{jk}= 2\frac{\partial H_\epsilon }{\partial o_{jk}} = \epsilon \check{g} \rightarrow 0\) as \(\epsilon \rightarrow 0\), \({\textbf{X}}_\epsilon \) converges to a \(T^* M\)-valued process. And since \(\frac{\partial H_\epsilon }{\partial p_i} \rightarrow \frac{\partial H_0}{\partial p_i}\) and \(\frac{\partial H_\epsilon }{\partial x^i} \rightarrow \frac{\partial H_0}{\partial x^i}\), the limit \(T^* M\)-valued process satisfies classical Hamilton’s equations,
Let \({\mathbb {X}}_\epsilon := {\hat{\varrho }}^*({\textbf{X}}_\epsilon )\). Then, \(\mathbb X_\epsilon (t) = p(t,X_\epsilon (t))\) solves the system of global stochastic Hamilton’s equations (7.47), with \({\mathbb {X}}_\epsilon \), \(X_\epsilon \) and \(\frac{\overline{{\textbf{D}}}^\epsilon }{dt}\) in place of \({\mathbb {X}}\), X and \(\frac{\overline{{\textbf{D}}}}{dt}\), respectively, subject to \(QX_\epsilon (t) = \epsilon {\check{g}}(X_\epsilon (t))\). As \(\epsilon \) goes to 0, this system tends to the following deterministic system,
This is indeed the global form of (7.60) which is equivalent to the global Euler–Lagrange equation (7.59) via the classical Legendre transform.
The corresponding \(\epsilon \)-dependent Hamilton–Jacobi–Bellman equation is now
which, as \(\epsilon \rightarrow 0\), goes to the classical Hamilton–Jacobi equation
The latter corresponds to (7.59)–(7.61) via classical Hamilton–Jacobi theory (e.g., Abraham and Marsden 1978, Chapter 5).
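To make this limit concrete, here is a minimal Euclidean sketch. The assumptions, which are ours and are not read off from the displayed equations, are: \(M = {\mathbb {R}}^d\) with the flat metric, \(H_0(x,p) = \frac{1}{2}|p|^2\), and an \(\epsilon \)-dependent HJB equation of the form \(\partial _t S + \frac{1}{2}|\nabla S|^2 + \frac{\epsilon }{2}\Delta S = 0\). With the logarithmic substitution \(S = \epsilon \log \varphi \) for a positive function \(\varphi \), one finds

$$\begin{aligned} \partial _t S + \frac{1}{2}|\nabla S|^2 + \frac{\epsilon }{2}\Delta S &= \epsilon \frac{\partial _t \varphi }{\varphi } + \frac{\epsilon ^2}{2}\frac{|\nabla \varphi |^2}{\varphi ^2} + \frac{\epsilon ^2}{2}\left( \frac{\Delta \varphi }{\varphi } - \frac{|\nabla \varphi |^2}{\varphi ^2}\right) \\ &= \frac{\epsilon }{\varphi }\left( \partial _t \varphi + \frac{\epsilon }{2}\Delta \varphi \right) , \end{aligned}$$

so that S solves this \(\epsilon \)-dependent equation if and only if \(\varphi \) solves the backward heat equation \(\partial _t \varphi + \frac{\epsilon }{2}\Delta \varphi = 0\). Formally dropping the second-order term as \(\epsilon \rightarrow 0\) leaves \(\partial _t S + \frac{1}{2}|\nabla S|^2 = 0\), the classical Hamilton–Jacobi equation of the free particle.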
We list here some previous works, of independent interest, that treat the above small-noise limits in special cases. The time-asymptotic large deviation for Brownian bridges of Example 6.11 was studied in Hsu (1990). The second author of the present paper and his collaborators proved in Privault et al. (2016) a large deviation result for one-dimensional Bernstein bridges, which are solution processes of Euclidean quantum mechanics in Example 6.12. Léonard (2012b) proved that the \(\Gamma \)-limit of Schrödinger’s problem in Sect. 7.3 with small variance is the Monge–Kantorovich problem. The latter is the optimal transport problem associated with the classical variational problem (7.58) (Villani 2009, Chapter 7). See Mikami (2021, Section 2.3) for more on small-noise limits of stochastic optimal transport.
Remark 7.26
The small-noise limit goes by different names in other areas. In thermodynamics (Huang and Zambrini 2023), \(\epsilon \) stands for the Boltzmann constant, which relates to the diffusion coefficient via the Einstein relation, consistent with Schrödinger’s original statistical problem (Schrödinger 1932); when applied to quantum mechanics as in Example 6.12, the small-noise limit is called the semiclassical limit and the parameter \(\epsilon \) stands for the reduced Planck constant \(\hbar \); when applied to hydrodynamics (cf. Arnaudon et al. 2014; Chen et al. 2023), it is often called the vanishing viscosity limit and \(\epsilon \) stands for the kinematic viscosity \(\nu \). The latter may be expected to solve Kolmogorov’s conjecture that the “stochastization” of dynamical systems is related to hydrodynamic PDEs as viscosity vanishes (Arnold and Khesin 2021). In physics, diffusivity, Planck constant and viscosity are indeed related to each other (Trachenko and Brazhkin 2021).
7.4.4 Relations to Stochastic Optimal Control
Following the way of converting problems of classical calculus of variations into optimal control problems (see Fleming and Soner 2006), we can regard the stochastic variational problem of Sect. 7.2 as a stochastic optimal control problem.
Assume that (M, g) is compact (for simplicity). Consider a stochastic control model in which the state evolves according to an M-valued diffusion X governed by a system of MDEs on the time interval [t, T], of the form
or equivalently, by an Itô SDE of the form
where \(\sigma \) is the positive-definite square root (1, 1)-tensor of g, i.e., \(\sum _{r=1}^d \sigma ^i_r\sigma ^j_r = g^{ij}\), W is an \({\mathbb {R}}^d\)-valued standard Brownian motion and, most importantly, U is a TM-valued process called the control process. There are no control constraints for U as it is admissible in the sense of Fleming and Soner (2006, Definition 2.1). As endpoint condition, we require that \(X(t) = x\).
The control problem on a finite time interval \(s\in [t,T]\) is to choose U to minimize
among all pairs (X, U) satisfying the system (7.62) and the endpoint condition, where \(S_T\) is a given smooth function on M. The real-valued smooth function \(L_0\) on \({\mathbb {R}}\times TM\) is called the running cost function and J the payoff functional. The problem is called a stochastic Bolza problem. In the case \(S_T\equiv 0\), this stochastic control problem is of the same form as our stochastic variational problem of Sect. 7.2. For this reason, we say that the latter stochastic control problem is in Lagrange form. By an argument similar to Theorem 7.16, one can derive the same equation as (7.22), but with boundary conditions \(X(t) = x\) and \(d_{\dot{x}}L_0(T,X(T),D_\nabla X(T)) = dS_T(X(T))\).
The starting point of dynamic programming is to regard the infimum of J being minimized as a function S(t, x) of the initial data:
Then, Bellman’s principle of dynamic programming (Fleming and Soner 2006, Section III.7) states that for \(t\le t+\epsilon \le T\),
Divide the equation by \(\epsilon \), let \(\epsilon \rightarrow 0^+\), and then use Dynkin’s formula. We get the dynamic programming equation
subject to the terminal data \(S(T,x) = S_T(x)\). By (4.5) and (7.62),
We let
where the supremum can be ignored if \(L_0\) is convex, so that \(H = {\overline{H}}^g_0\) is exactly the canonical inverse second-order Legendre transform in (7.43). Then, the dynamic programming equation (7.64) can be written as the HJB equation (6.38), cf. Fleming and Soner (2006, Section IV.3).
There is also a stochastic version of Pontryagin’s maximum principle (Yong and Zhou 1999, Theorem 3.3.2). The crucial objects in stochastic Pontryagin’s principle are first- and second-order adjoint processes, p and o, respectively. Corresponding to the stochastic control problem (7.62)–(7.63), its adjoint processes p and o satisfy the following backward SDEs (Yong and Zhou 1999, Section 3.3.2) (where “backward” is again in a different sense from ours in Sect. 2),
and
which are called the first- and second-order adjoint equations, respectively. The unknowns in (7.65) and (7.66) are the pairs (p, z) and (o, Z), respectively. Suppose that \(p_i(t) = p_i(t,X(t))\) and \(o_{ij}(t) = o_{ij}(t,X(t))\) for a time-dependent second-order form (p, o) that satisfies the second-order Maxwell relations (6.15). Then,
Plugging them into (7.65) and (7.66), we get
These coincide with the corresponding equations in the S-H system (6.10) for second-order Hamiltonian \(\overline{H}^g_0\). The first equality of (7.67) also recovers (7.51).
7.5 Stochastic Variational Symmetries
Definition 7.27
Given an action functional \({\mathcal {S}}\) as in (7.9), a bundle automorphism F on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) projecting to \(F^0\) is called a variational symmetry of \({\mathcal {S}}\) if, whenever \([t_1,t_2]\) is a subinterval of [0, T], we have \({\mathcal {S}}[F\cdot X,F^0(t_1),F^0(t_2)] = {\mathcal {S}}[X,t_1,t_2]\). A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) is called an infinitesimal variational symmetry of \({\mathcal {S}}\), if its flow consists of variational symmetries of \({\mathcal {S}}\).
Lemma 7.28
The \(\pi \)-projectable vector field V of the form (4.9) is an infinitesimal variational symmetry of \({\mathcal {S}}\) if and only if
is a martingale, for all \(X\in I_0^T(M)\).
Proof
As in the proof of Theorem 4.14, we let \(\psi = \{(\psi ^0_\epsilon , {\bar{\psi }}_\epsilon )\}_{\epsilon \in {\mathbb {R}}}\) be the flow generated by V, and denote \({\tilde{X}}_\epsilon = \psi _\epsilon \cdot X\). Then, by a change of variable \(s=\psi ^0_\epsilon (t)\),
Since for all \([t_1,t_2]\subset [0,T]\) and each \(\epsilon \), \({\mathcal {S}}[{\tilde{X}}_\epsilon , \psi ^0_\epsilon (t_1), \psi ^0_\epsilon (t_2)] = {\mathcal {S}}[X,t_1,t_2]\), we have that the difference
is a martingale (depending on \(\epsilon \)). Differentiating the above equality with respect to \(\epsilon \) at \(\epsilon =0\), and recalling that \(j^\nabla V = \frac{{\textrm{d}}}{{\textrm{d}}\epsilon } \big |_{\epsilon =0}j^\nabla \psi _\epsilon \), we obtain the desired result. \(\square \)
Definition 7.29
Let \(\Phi :{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\) be a smooth function. A \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\) is called an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\), if
for all \(X\in I_0^T(M)\) and \(t\in [0,T]\).
Recall that for the \(\pi \)-projectable vector field V of the form (4.9), we denote \({\bar{V}} = V^i \frac{\partial }{\partial {x^i}}\), as in Corollary 4.17.
Proposition 7.30
A vector field V of the form (4.9) is an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\) if and only if
Proof
It follows from Corollary 4.17 and (7.19), (7.20) that
This concludes the proof. \(\square \)
Corollary 7.31
Let \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) be a hyperregular Lagrangian.
Let V be a vector field of the form (4.9). Given a smooth function \(\Phi :{\mathbb {R}}\times M\rightarrow {\mathbb {R}}\), define the \(\Phi \)-extension of V by
which is a vector field on \({\mathbb {R}}\times M\times {\mathbb {R}}\). Suppose that V satisfies
for S the solution of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (for \(f\equiv 0\)). Then, V is an infinitesimal \(\Phi \)-divergence symmetry of \({\mathcal {S}}\) if and only if \(V_\Phi \) is an infinitesimal symmetry of equation (7.49).
Proof
By the classical jet bundle theory, we know that \(V_\Phi \) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.49) if and only if (Olver 1998, Theorem 2.31)
where
with coefficients given by Olver (1998, Theorem 2.36 or Example 2.38)
Moreover, the jet coordinates \((u_t, u_i, u_{ij})\) satisfy
where we recall \(dS = d_{\dot{x}}L_0\) from Eq. (7.29) and Remark 7.23, and also that \(\partial _t S = - H_0(dS,t) - \frac{1}{2} \Delta S = - E_0- \frac{1}{2} \Delta S\). Plugging these into (7.69) and using the fact that \(\partial _t H_0 = - \partial _t L_0\) and \(\partial _{x^i} H_0 = - \partial _{x^i} L_0\) due to classical Legendre transform, we have
where, in the last equality, we used the fact that \((QX)^{ij}(t) = g^{ij}(X(t))\) to derive \({\textbf{D}}_{\textrm{t}} \Phi \). The result then follows from Proposition 7.30. \(\square \)
Theorem 7.32
(Stochastic Noether’s theorem) Let \(L_0:{\mathbb {R}}\times TM\rightarrow {\mathbb {R}}\) be a hyperregular Lagrangian. Suppose that the vector field \(V_\Phi \) in (7.68) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.49) associated with \(L_0\) (with \(f\equiv 0\)). Then, the following stochastic conservation law holds for the stochastic Euler–Lagrange equation (7.22),
Proof
Recall that \(dS = d_{\dot{x}}L_0\) and \(\partial _t S = - E_0- \frac{1}{2} \Delta S = -E\). By applying Lemma 7.8.(iv) and (7.22), as well as the fact that \((QX)^{ij}(t) = g^{ij}(X(t))\), we have
Then, we use HJB equation (7.49) (with \(f\equiv 0\)) and the classical Legendre transform \(H_0 = d_{\dot{x}}L_0\cdot \dot{x} - L_0\) to derive
Combining these with the S-EL equation (7.22) and the criterion (7.70) for symmetries of the HJB equation (7.27), we have
The result follows. \(\square \)
Remark 7.33
(i) In stochastic Hamiltonian formalism, (7.71) reads as \({\textbf{D}}_{\textrm{t}}\left[ V^i p_i - V^0 H - \Phi \right] = 0\).
(ii) The stochastic conservation law (6.19) of a time-independent g-canonical second-order Hamiltonian \(H = \overline{H}_0^g\) can be regarded as a special case of the above stochastic Noether’s theorem. Indeed, consider the infinitesimal unit time translation \(V = \frac{\partial }{\partial t}\), i.e., \(V^0 = 1\), \({\bar{V}} = 0\), \(\Phi =0\). Then, the criterion (7.70) reduces to \(0 = \partial _t L_0 = -\partial _t H_0\), which means that \(H = {\overline{H}}_0^g\) is time-independent. The resulting stochastic conservation law is \({\textbf{D}}_{\textrm{t}} E = {\textbf{D}}_{\textrm{t}} H =0\).
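A second elementary illustration, sketched here under assumptions that are not spelled out in the text, is the space translation in the scalar Brownian case of Example 7.24. Take \(M={\mathbb {R}}\), \(L_0 = \frac{1}{2}|\dot{x}|^2\), \(V = \frac{\partial }{\partial x}\) (so \(V^0 = 0\)) and \(\Phi = 0\), and assume that the \(\Phi \)-extension then reduces to V itself. The HJB equation \(\partial _t S + \frac{1}{2}|\partial _x S|^2 + \frac{1}{2}\partial _{xx} S = 0\) has constant coefficients, hence is invariant under translations in x, and the conservation law in the Hamiltonian form of item (i) reads \({\textbf{D}}_{\textrm{t}}\, p = 0\). Assuming further that \({\textbf{D}}_{\textrm{t}}\) acts on functions of (t, X(t)) as \(\partial _t + DX\, \partial _x + \frac{1}{2} QX\, \partial _{xx}\), this can be checked directly with \(p = \partial _x S(t,X(t))\), \(DX = \partial _x S\) and \(QX = 1\):

$$\begin{aligned} {\textbf{D}}_{\textrm{t}} (\partial _x S) = \partial _t \partial _x S + \partial _x S\, \partial _{xx} S + \frac{1}{2}\partial _{xxx} S = \partial _x \left( \partial _t S + \frac{1}{2}|\partial _x S|^2 + \frac{1}{2}\partial _{xx} S \right) = 0, \end{aligned}$$

which is the equation of motion \(DDX = 0\) of Example 7.24.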
Applying stochastic Noether’s theorem to Schrödinger’s problem of Sect. 7.3, we have the following corollary. Its Euclidean case with zero vector potential (i.e., \(b\equiv 0\)) has already been formulated in Thieullen and Zambrini (1997).
Corollary 7.34
(Stochastic Noether’s theorem for Schrödinger’s problem) Let \(L_0\) be the Lagrangian given in (7.25). Suppose that the vector field \(V_\Phi \) in (7.68) is an infinitesimal symmetry of the Hamilton–Jacobi–Bellman equation (7.27) with \(f\equiv 0\). Then, the following stochastic conservation law holds for the coordinate process of the solution of Schrödinger’s problem in (7.33),
where \(E_0\) is the classical energy given in (7.45) and S is the solution of (7.27).
Data Availability
Our manuscript has no associated data.
Abbreviations
- HJB equation: Hamilton–Jacobi–Bellman equation
- MDE: Mean differential equations
- SDE: Stochastic differential equations
- S-EL equation: Stochastic Euler–Lagrange equation
- S-H equations: Stochastic Hamilton’s equations
- A: A general second-order differential operator or second-order vector field
- \(A^X\): Generator of the diffusion X
- \(\circ \,d\): Stratonovich stochastic differential
- d: Exterior differential on the manifold M, or Itô stochastic differential
- \({{\varvec{d}}}\): Linear operator extended from the exterior differential on the tangent bundle TM
- \(d^2\): Second-order differential on M
- \(d^\circ \): Mixed-order differential on \({\mathbb {R}}\times M\)
- \(d_x\): Horizontal differential on the tangent bundle TM or cotangent bundle \(T^*M\)
- \(d_{\dot{x}}\): Vertical differential on TM
- \((DX, QX), D_\nabla X, Q(X,Y)\): Mean derivatives
- \({\textbf{D}}_{\textrm{t}},{\textbf{Q}}_{\textrm{t}}\): Total mean derivatives
- \(\frac{{\textbf{D}}}{dt}, \frac{\overline{{\textbf{D}}}}{dt}\): Mean covariant derivative and damped mean covariant derivative
- \(\Delta \), \(\Delta _{\text {LD}}\): Connection Laplacian and Laplace–de Rham operator
- \(F^S_*, F^{S*}\): Second-order pushforward and pullback of a smooth map \(F:M\rightarrow N\)
- \(F^R_*, F^{R*}\): Mixed-order pushforward and pullback of F
- \(\Gamma \): Christoffel symbols or stochastic parallel displacement
- \(\overline{\Gamma }\): Damped parallel displacement
- \(I_t(M), I_{(t,q)}(M), I_{(t,q)}^{T,\mu }(M)\): Various sets of M-valued diffusions starting at time t
- \(j_q X, j_q^\nabla X, j_{(t,q)}X, j_t X\): Stochastic tangent vectors and stochastic jets
- \({\mathcal {L}}\): Lie derivatives
- \(\nabla \): Linear connection, Levi–Civita connection, covariant derivative, or gradient on M
- \(\nabla ^2\): Hessian operator
- \(\nabla _p\): Vertical gradient on \(T^*M\)
- \((\Omega , {\mathcal {F}}, {\textbf{P}})\): Probability space \(\Omega \) with \(\sigma \)-field \({\mathcal {F}}\) and probability measure \({\textbf{P}}\)
- \(\{{\mathcal {P}}_t\}_{t \in {\mathbb {R}}}\), \(\{{\mathcal {F}}_t\}_{t \in {\mathbb {R}}}\): Past (nondecreasing) filtration and future (nonincreasing) filtration
- \(\frac{\partial }{\partial {t}}, \partial _{t}\): Differential operator with respect to coordinate t
- \(\frac{\partial }{\partial {x^i}}, \partial _i\): Differential operator with respect to coordinate \(x^i\)
- \(\frac{\partial ^2}{\partial x^j\partial x^k}, \partial _{jk}\): Second-order differential operator with respect to coordinates \(x^j\) and \(x^k\)
- \(\frac{\partial }{\partial {p_i}}, \partial _{p_i}\): Differential operator with respect to coordinate \(p_i\)
- R, \(\textrm{Ric}\): Riemann curvature tensor and Ricci (1, 1)-tensor
- \({\mathcal {T}}^O M, {\mathcal {T}}^E M\): Second-order tangent bundle and second-order elliptic tangent bundle
- \({\mathcal {T}}^S M\): Stochastic tangent bundle
- \({\mathcal {T}}^{S*} M\): Second-order cotangent bundle
- V: A general vector field
- \((x^i,D^i x, Q^{jk} x)\): Canonical coordinates on \({\mathcal {T}}^S M\)
- \((x^i,p_i, o_{jk})\): Canonical coordinates on \({\mathcal {T}}^{S*} M\)
- \(X_*, X^*\): Pushforward and pullback of the diffusion X
- \({\textbf{X}}\): A horizontal diffusion valued on a general bundle E or on \({\mathcal {T}}^{S*} M\)
- \({\mathbb {X}}\): A horizontal diffusion valued on \(T^* M\)
References
Abraham, R., Marsden, J.E.: Foundations of Mechanics, 2nd edn. Addison-Wesley Publishing Company, Boston (1978)
Albeverio, S., Yasue, K., Zambrini, J.-C.: Euclidean quantum mechanics: analytical approach. Ann. l’IHP Physique théorique 50, 259–308 (1989)
Albeverio, S., Rezende, J., Zambrini, J.-C.: Probability and quantum symmetries. II. The theorem of Noether in quantum mechanics. J. Math. Phys. 47(6), 062107 (2006)
Angst, J., Bailleul, I., Tardif, C.: Kinetic Brownian motion on Riemannian manifolds. Electron. J. Probab. 20, 1–40 (2015)
Arnaudon, M., Thalmaier, A.: Complete lifts of connections and stochastic Jacobi fields. J. Math. Pures Appl. 77(3), 283–315 (1998)
Arnaudon, M., Chen, X., Cruzeiro, A.B.: Stochastic Euler–Poincaré reduction. J. Math. Phys. 55(8), 081507 (2014)
Arnold, V.I.: Mathematical Methods of Classical Mechanics, vol. 60, 2nd edn. Springer, New York (1989)
Arnold, V.I., Khesin, B.A.: Topological Methods in Hydrodynamics, vol. 125, 2nd edn. Springer, Cham (2021)
Asorey, M., Carinena, J.F., Ibort, L.A.: Generalized canonical transformations for time-dependent systems. J. Math. Phys. 24(12), 2745–2750 (1983)
Belopolskaya, Y.I., Dalecky, Y.L.: Stochastic Equations and Differential Geometry. Kluwer Academic Publishers, Amsterdam (1990)
Bernstein, S.: Sur les liaisons entre les grandeurs aléatoires. Verh. Int. Math. Kongr. Zurich, Band I (1932)
Bismut, J.-M.: Mécanique Aléatoire, vol. 866. Springer, Berlin Heidelberg (1981)
Çetin, U., Danilova, A.: Markov bridges: SDE representation. Stoch. Process. Appl. 126(3), 651–679 (2016)
Chen, X., Cruzeiro, A.B., Ratiu, T.S.: Stochastic variational principles for dissipative equations with advected quantities. J. Nonlinear Sci. 33(1), 5 (2023)
Chung, K.L., Zambrini, J.-C.: Introduction to Random Time and Quantum Randomness, vol. 1. World Scientific, Singapore (2003)
Cruzeiro, A.B., Vuillermot, P.-A.: Forward-backward stochastic differential equations generated by Bernstein diffusions. Stoch. Anal. Appl. 33(1), 91–109 (2015)
Cruzeiro, A.B., Zambrini, J.-C.: Malliavin calculus and Euclidean quantum mechanics. I. Functional calculus. J. Funct. Anal. 96(1), 62–95 (1991)
Cruzeiro, A.B., Wu, L., Zambrini, J.-C.: Bernstein processes associated with a Markov process. In: Stochastic Analysis and Mathematical Physics, pp. 41–72. Springer (2000)
Dahlqvist, A., Diehl, J., Driver, B.K.: The parabolic Anderson model on Riemann surfaces. Probab. Theory Relat. Fields 174(1), 369–444 (2019)
Dirac, P.A.M.: The Lagrangian in quantum mechanics. Phys. Z. Sowjetunion Band 3(Heft 1), 64–72 (1933)
Dohrn, D., Guerra, F.: Geodesic correction to stochastic parallel displacement of tensors. In: Stochastic Behavior in Classical and Quantum Hamiltonian Systems, pp. 241–249. Springer (1979)
Driver, B.K.: A Cameron–Martin type quasi-invariance theorem for Brownian motion on a compact Riemannian manifold. J. Funct. Anal. 110(2), 272–376 (1992)
Dynkin, E.B.: Diffusion of tensors. In: Doklady Akademii Nauk SSSR, vol. 179, pp. 1264–1267. Russian Academy of Sciences (1968)
Elworthy, K.D.: Stochastic Differential Equations on Manifolds, vol. 70. Cambridge University Press, Cambridge (1982)
Emery, M.: Stochastic Calculus in Manifolds. Springer, Berlin, Heidelberg (1989)
Emery, M.: An invitation to second-order stochastic differential geometry. hal-00145073 (2007)
Fang, S., Malliavin, P.: Stochastic analysis on the path space of a Riemannian manifold: I. Markovian stochastic calculus. J. Funct. Anal. 118(1), 249–274 (1993)
Feynman, R.P.: Space–time approach to non-relativistic quantum mechanics. Rev. Mod. Phys. 20, 367–387 (1948)
Fleming, W.H., Soner, H.M.: Controlled Markov Processes and Viscosity Solutions, vol. 25, 2nd edn. Springer, Berlin (2006)
Fock, V.A.: Fundamentals of Quantum Mechanics, 2nd edn. Mir Publishers, Moscow (1978)
Gaeta, G., Quintero, N.R.: Lie-point symmetries and stochastic differential equations. J. Phys. A Math. Gen. 32(48), 8485–8505 (1999)
Geiges, H.: An Introduction to Contact Topology, vol. 109. Cambridge University Press, Cambridge (2008)
Gentil, I., Léonard, C., Ripani, L.: Dynamical aspects of the generalized Schrödinger problem via Otto calculus-a heuristic point of view. Rev. Mat. Iberoam. 36(4), 1071–1112 (2020)
Gliklikh, Y.E.: Global and Stochastic Analysis with Applications to Mathematical Physics. Springer, London (2011)
Goldstein, H., Poole, C., Safko, J.: Classical Mechanics, 3rd edn. Pearson Education, Pearson (2002)
Hairer, M.: A theory of regularity structures. Invent. Math. 198(2), 269–504 (2014)
Haussmann, U.G.: A Stochastic Maximum Principle for Optimal Control of Diffusions, vol. 151. Longman Scientific and Technical (1986)
Holm, D.D., Schmah, T., Stoica, C.: Geometric Mechanics and Symmetry: From Finite to Infinite Dimensions, vol. 12. Oxford University Press, Oxford (2009)
Hsu, P.: Brownian bridges on Riemannian manifolds. Probab. Theory Relat. Fields 84(1), 103–118 (1990)
Hsu, E.P.: Quasi-invariance of the Wiener measure on the path space over a compact Riemannian manifold. J. Funct. Anal. 134, 417–450 (1995)
Hsu, E.P.: Stochastic Analysis on Manifolds, vol. 38. American Mathematical Society, Providence (2002)
Huang, Q., Zambrini, J.-C.: Stochastic geometric mechanics in nonequilibrium thermodynamics: Schrödinger meets Onsager. J. Phys. A Math. Theor. 56(13), 134003 (2023)
Ikeda, N., Watanabe, S.: Stochastic Differential Equations and Diffusion Processes, vol. 24, 2nd edn. North-Holland Publishing Company, Amsterdam (1989)
Itô, K.: The Brownian motion and tensor fields on Riemannian manifold. In: Proceedings of the International Congress of Mathematicians 1962, pp. 536–539. Almqvist & Wiksells (1962)
Itô, K.: Stochastic parallel displacement. In: Probabilistic Methods in Differential Equations, pp. 1–7. Springer (1975)
Jamison, B.: The Markov processes of Schrödinger. Z. Wahrscheinlichkeitstheorie Verw. Gebiete 32(4), 323–331 (1975)
Jost, J.: Riemannian Geometry and Geometric Analysis, 7th edn. Springer, Berlin (2017)
Karatzas, I., Shreve, S.: Brownian Motion and Stochastic Calculus, vol. 113. Springer, New York (1991)
Khesin, B., Misiołek, G., Modin, K.: Geometric hydrodynamics and infinite-dimensional Newton’s equations. Bull. Am. Math. Soc. 58(3), 377–442 (2021)
Kobayashi, S., Nomizu, K.: Foundations of Differential Geometry, vol. 1. Interscience Publishers, Geneva (1963)
Lang, S.: Fundamentals of Differential Geometry, vol. 191. Springer, Berlin (1999)
Lassalle, R., Zambrini, J.-C.: A weak approach to the stochastic deformation of classical mechanics. J. Geom. Mech. 8(2), 221 (2016)
Lázaro-Camí, J.-A., Ortega, J.-P.: Stochastic Hamiltonian dynamical systems. Rep. Math. Phys. 61(1), 65–122 (2008)
Lázaro-Camí, J.-A., Ortega, J.-P.: The stochastic Hamilton–Jacobi equation. J. Geom. Mech. 1(3), 295 (2009)
Lee, J.M.: Introduction to Smooth Manifolds, vol. 218, 2nd edn. Springer, New York (2013)
Léonard, C.: Girsanov theory under a finite entropy condition. In: Séminaire de Probabilités XLIV, pp. 429–465. Springer (2012a)
Léonard, C.: From the Schrödinger problem to the Monge–Kantorovich problem. J. Funct. Anal. 262(4), 1879–1920 (2012b)
Léonard, C.: A survey of the Schrödinger problem and some of its connections with optimal transport. Discrete Contin. Dyn. Syst. 34(4), 1533–1574 (2014)
Léonard, C., Rœlly, S., Zambrini, J.-C.: Reciprocal processes: a measure-theoretical point of view. Probab. Surv. 11, 237–269 (2014)
Lescot, P., Zambrini, J.-C.: Probabilistic deformation of contact geometry, diffusion processes and their quadratures. In: Seminar on Stochastic Analysis, Random Fields and Applications V, vol. 59, pp. 203–226. Springer (2007)
Li, X.-M.: Limits of random differential equations on manifolds. Probab. Theory Relat. Fields 166(3), 659–712 (2016)
Malliavin, P.: Stochastic Analysis, vol. 313. Springer, Berlin, Heidelberg (1997)
Marsden, J.E., Ratiu, T.S.: Introduction to Mechanics and Symmetry: A Basic Exposition of Classical Mechanical Systems, vol. 17, 2nd edn. Springer, Berlin (1999)
Meyer, P.-A.: Formes differentielles d’ordre \(n> 1\). Publication IRMA, Université Louis Pasteur, Strasbourg, 80 (1979)
Meyer, P.-A.: A differential geometric formalism for the Itô calculus. In: Stochastic Integrals, vol. 851 of LNM, pp. 256–270. Springer (1981a)
Meyer, P.-A.: Géométrie stochastique sans larmes. In: Séminaire de Probabilités XV 1979/80, pp. 44–102. Springer (1981b)
Mikami, T.: Stochastic Optimal Transportation: Stochastic Control with Fixed Marginals. Springer, Berlin (2021)
Munkres, J.R.: Topology, 2nd edn. Prentice Hall Inc, Hoboken (1975)
Nelson, E.: Dynamical Theories of Brownian Motion, vol. 106, 2nd edn. Princeton University Press, Princeton (2001)
Øksendal, B.: Stochastic Differential Equations: An Introduction with Applications. Springer, Berlin Heidelberg (2010)
Olver, P.J.: Equivalence, Invariants and Symmetry. Cambridge University Press, Cambridge (1995)
Olver, P.J.: Applications of Lie Groups to Differential Equations, vol. 107, 2nd edn. Springer, New York (1998)
Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Partial Differ. Equ. 26(1–2), 107–174 (2001)
Petersen, P.: Riemannian Geometry, vol. 171, 3rd edn. Springer, Berlin (2016)
Peyré, G., Chizat, L., Vialard, F.-X., Solomon, J.: Quantum entropic regularization of matrix-valued optimal transport. Eur. J. Appl. Math. 30(6), 1079–1102 (2019)
Privault, N., Yang, X., Zambrini, J.-C.: Large deviations for Bernstein bridges. Stoch. Process. Appl. 126(5), 1285–1305 (2016)
Saunders, D.J.: The Geometry of Jet Bundles, vol. 142. Cambridge University Press, Cambridge (1989)
Schrödinger, E.: Quantization as a problem of proper values (part I). Ann. Phys. 101, 25–32 (1926)
Schrödinger, E.: Sur la théorie relativiste de l’électron et l’interprétation de la mécanique quantique. Ann. l’inst. Henri Poincaré 2, 269–310 (1932)
Schwartz, L.: Semi-martingales sur des Variétés, et Martingales Conformes sur des Variétés Analytiques Complexes, vol. 780. Springer-Verlag, Berlin, Heidelberg (1980)
Schwartz, L.: Géométrie différentielle du 2 ème ordre, semi-martingales et équations différentielles stochastiques sur une variété différentielle. In: Séminaire de Probabilités XVI, 1980/81 Supplément: Géométrie Différentielle Stochastique, pp. 1–148. Springer (1982)
Schwartz, L.: Semimartingales and their stochastic calculus on manifolds. Gaetan Morin Editeur Ltee (1984)
Thieullen, M., Zambrini, J.-C.: Probability and quantum symmetries I: The theorem of Noether in Schrödinger’s Euclidean quantum mechanics. Ann. Inst. Henri Poincaré 67(3), 297–338 (1997)
Trachenko, K., Brazhkin, V.V.: The quantum mechanics of viscosity. Phys. Today 74(12), 66–67 (2021)
Villani, C.: Optimal Transport: Old and New, vol. 338. Springer, Berlin, Heidelberg (2009)
von Renesse, M.-K.: An optimal transport view of Schrödinger’s equation. Can. Math. Bull. 55(4), 858–869 (2012)
Yong, J., Zhou, X.Y.: Stochastic Controls: Hamiltonian Systems and HJB Equations, vol. 43. Springer, New York (1999)
Zambrini, J.-C.: Variational processes and stochastic versions of mechanics. J. Math. Phys. 27(9), 2307–2330 (1986)
Zambrini, J.-C.: The research program of stochastic deformation (with a view toward geometric mechanics). In: Stochastic Analysis: A Series of Lectures, vol. 68 of Progress in Probability, pp. 359–393. Springer, Basel (2015)
Acknowledgements
We would like to thank Prof. Ana Bela Cruzeiro and Prof. Marc Arnaudon for their careful reading and helpful discussions, which helped us a lot especially in improving Sects. 6.2 and 7.2. We also would like to thank Prof. Maosong Xiang for his helpful suggestions and kind experience-sharing. This paper is supported by FCT, Portugal, project PTDC/MAT-STA/28812/2017, “Schrödinger’s problem and optimal transport: a multidisciplinary perspective (SchröMoka).”
Funding
Open access funding provided by FCT|FCCN (b-on).
Additional information
Communicated by Anthony Bloch.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendices
Appendix A Mixed-Order Tangent and Cotangent Bundles
A.1 Mixed-Order Tangent and Cotangent Maps
Clearly, the mixed-order tangent bundle \(T {\mathbb {R}}\times {\mathcal {T}}^S M\) is a subbundle of the totally second-order tangent bundle \(\mathcal T^S ({\mathbb {R}}\times M)\), and contains the tangent bundle \(T ({\mathbb {R}}\times M) \cong T {\mathbb {R}}\times T M\) as a subbundle. Similar properties hold for the mixed-order cotangent bundle.
It is easy to verify that the mixed-order tangent bundle can be characterized as follows:
We also define the stochastic analog of the vertical bundle as
Then, it is easy to see that \(V^S\pi \cong {\mathbb {R}}\times {\mathcal {T}}^S M\).
Given a smooth map \(F: {\mathbb {R}}\times M \rightarrow {\mathbb {R}}\times N\), we can define its second-order pushforward \(F_*^S\) as in Definitions 5.5 and 5.7, so that \(F_*^S\) is a bundle homomorphism from \(\tau ^S_{{\mathbb {R}}\times M}\) to \(\tau ^S_{{\mathbb {R}}\times N}\). In general, \(F_*^S\) neither maps the mixed-order tangent bundle to the mixed-order tangent bundle, nor maps the vertical bundle to the vertical bundle. But if F is projectable, then it does.
Lemma A.1
Let M and N be two smooth manifolds and M be connected. Let \(F: {\mathbb {R}}\times M\rightarrow {\mathbb {R}}\times N\) be a smooth map. Then, the following statements are equivalent:
(i) F is a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\);
(ii) \(F^S_* (T {\mathbb {R}}\times {\mathcal {T}}^S M ) \subset T {\mathbb {R}}\times {\mathcal {T}}^S N\);
(iii) \(F^S_* (V^S \pi ) \subset V^S \rho \).
Proof
We first prove that (i) implies both (ii) and (iii). Suppose that F is a bundle homomorphism projecting to \(F^0\). Then, \(\rho \circ F = F^0\circ \pi \) and hence, for any \(A\in {\mathcal {T}}^S ({\mathbb {R}}\times M)\),
If \(A\in T {\mathbb {R}}\times {\mathcal {T}}^S M\), then \(\pi ^S_*(A) \in T {\mathbb {R}}\) and thus \(\rho ^S_*(F^S_* (A)) \in (F^0)^S_*(T {\mathbb {R}}) = (F^0)_*(T {\mathbb {R}}) \subset T {\mathbb {R}}\). This implies \(F^S_* (A)\in T {\mathbb {R}}\times {\mathcal {T}}^S N\). If \(A\in V^S \pi \), then \(\pi ^S_*(A) = 0\), it follows \(\rho ^S_*(F^S_* (A))=0\) and therefore \(F^S_* (A)\in V^S \rho \).
Next we prove either (ii) or (iii) implies (i). Choose local coordinates \((t,x^i)\) around \((t_0,q) \in {\mathbb {R}}\times M\) and \((s,y^j)\) around \(F(t_0,q)\). Suppose F has a local expression \(F=(F^0, \bar{F}^j)\). Let \(A\in T {\mathbb {R}}\times {\mathcal {T}}^S M|_{(t_0,q)}\) having the following local expression:
Then, Lemma 5.6 yields
If (ii) holds, then \(F^S_*(A) \in T {\mathbb {R}}\times {\mathcal {T}}^S N|_{F(t_0,q)}\). It then follows
Since A is arbitrary, we know that \(\frac{\partial F^0}{\partial x^i} = 0\) for all i. Then, by the connectedness of M, \(F^0\) is independent of \(q\in M\). This implies that F is a bundle homomorphism. Now assume that \(A\in V^S M|_{(t_0,q)}\) has a local expression in (A.1) with \(A^0 = 0\). If (iii) holds, then \(F^S_*(A) \in V^S N|_{F(t_0,q)}\). This amounts to (A.2) together with
Again, the arbitrariness of A yields that \(\frac{\partial F^0}{\partial x^i} = 0\) for all i. Thus, F is a bundle homomorphism. \(\square \)
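The role of projectability in Lemma A.1 can be illustrated on \(M = N = {\mathbb {R}}\). What follows is a minimal sketch in Python with SymPy, not part of the original argument: the two sample time components and the helper name `second_order_time_coeff` are hypothetical choices made for illustration. It uses only the property \((F^S_*A)f = A(f\circ F)\) and reads off the coefficient of \(\partial ^2/\partial s^2\) in \(F^S_*A\) by testing against \(f(s,y) = (s-s_0)^2/2\), for the vertical vector \(A = \partial ^2/\partial x^2\).

```python
import sympy as sp

t, x, t0, x0 = sp.symbols("t x t0 x0", real=True)

def second_order_time_coeff(F0):
    """Coefficient of d^2/ds^2 in the pushforward of A = d^2/dx^2 at (t0, x0).

    Only the time component F0(t, x) of F matters here, because the test
    function f(s, y) = (s - s0)^2 / 2 does not depend on y.
    """
    s0 = F0.subs({t: t0, x: x0})
    f_after_F = (F0 - s0) ** 2 / 2            # f o F
    A_applied = sp.diff(f_after_F, x, 2)      # A(f o F) with A = d^2/dx^2
    return sp.simplify(A_applied.subs({t: t0, x: x0}))

# Hypothetical examples: a time component that depends on x, and one that does not.
coeff_non_projectable = second_order_time_coeff(t + x)
coeff_projectable = second_order_time_coeff(2 * t)

print(coeff_non_projectable, coeff_projectable)
```

In the first case the pushed-forward vector acquires a pure second-order time component, so it leaves both the mixed-order bundle and the vertical bundle; in the second case that component vanishes, as the lemma predicts for projectable maps.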
It is easy to deduce from the proof that if \(F = (F^0, {\bar{F}})\) is a bundle homomorphism from \(\pi \) to \(\rho \), then \(F^S_*|_{T {\mathbb {R}}\times {\mathcal {T}}^S M}\) is a bundle homomorphism from \(\tau _{\mathbb {R}}\times \tau ^S_{M}\) to \(\tau _{\mathbb {R}}\times \tau ^S_{N}\).
When \(F: {\mathbb {R}}\times M \rightarrow {\mathbb {R}}\times N\) is a diffeomorphism, we can also consider the second-order pullback map \(F^{S*}\) which is a bundle homomorphism from \(\tau ^{S*}_{{\mathbb {R}}\times M}\) to \(\tau ^{S*}_{{\mathbb {R}}\times N}\). But when we restrict \(F^{S*}\) to the mixed-order cotangent bundle \(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M\), there are difficulties. We can check that even if F is a bundle homomorphism, \(F^{S*}\) does not necessarily map \(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M\) into \(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M\). The reason is basically that the restrictions of second-order pullbacks to the cotangent bundle do not coincide with usual pullbacks. To overcome this, we consider the dual map of \(F^S_*|_{T {\mathbb {R}}\times {\mathcal {T}}^S M}\). This motivates the following definition, which contrasts with Definitions 5.5 and 5.7.
Definition A.2
(Mixed-order pushforward and pullback) Let F be a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\). The mixed-order tangent map of F at \((t,q)\in {\mathbb {R}}\times M\) is the linear map \(d^\circ F_{(t,q)}: T {\mathbb {R}}\times {\mathcal {T}}^S M|_{(t,q)} \rightarrow T {\mathbb {R}}\times {\mathcal {T}}^S N|_{F(t,q)}\) defined by
The mixed-order cotangent map of F at \((t,q)\in {\mathbb {R}}\times M\) is the linear map \(d^\circ F^*_{(t,q)}: T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} N|_{F(t,q)} \rightarrow T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M|_{(t,q)}\) dual to \(d^\circ F_{(t,q)}\), that is,
The mixed-order pushforward by F is the bundle homomorphism \(F^R_*: (T {\mathbb {R}}\times {\mathcal {T}}^S M, \tau _{\mathbb {R}}\times \tau ^S_M, {\mathbb {R}}\times M) \rightarrow (T {\mathbb {R}}\times {\mathcal {T}}^S N, \tau _{\mathbb {R}}\times \tau ^S_N, {\mathbb {R}}\times N)\) defined by
Given a mixed-order form \(\alpha \) on \({\mathbb {R}}\times N\), the mixed-order pullback of \(\alpha \) by F is the mixed-order form \(F^{R*}\alpha \) on \({\mathbb {R}}\times M\) defined by
If, moreover, F is a bundle isomorphism, then the mixed-order pullback by F is the bundle isomorphism \(F^{R*}: (T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} N, \tau ^*_{\mathbb {R}}\times \tau ^{S*}_N, {\mathbb {R}}\times N) \rightarrow (T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M, \tau ^*_{\mathbb {R}}\times \tau ^{S*}_M, {\mathbb {R}}\times M)\) defined by
Given a mixed-order vector field A on \({\mathbb {R}}\times M\), the mixed-order pushforward of A by F is the mixed-order vector field \(F^R_*A\) on \({\mathbb {R}}\times N\) defined by
Clearly, the mixed-order pushforward \(F^R_*\) is nothing but \(F^S_*|_{T {\mathbb {R}}\times {\mathcal {T}}^S M}\). Write \(F = (F^0, {\bar{F}})\). Then, in local coordinates, \(F^R_*\) acts on A of (A.1) as follows:
And \(F^{R*}\) acts on the mixed-order cotangent vector \(\alpha = \alpha _0 ds|_{F^0(t_0)} + \alpha _i d^2 y^i|_{{\bar{F}}(t_0,q)} + \alpha _{ij}dy^i\cdot dy^j|_{{\bar{F}}(t_0,q)} \in T^* {\mathbb {R}}\times \mathcal T^{S*} N|_{F(t_0,q)}\) by
By virtue of these local expressions, one easily deduces that
And in turn, these verify the linearity of \(F^R_*\) and \(F^{R*}\). The following property is easy to check.
Lemma A.3
Let F be a bundle isomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) and A be a mixed-order vector field. Let f be a smooth function on \({\mathbb {R}}\times N\). Then, \(((F^R_*A)f)\circ F = A(f\circ F)\).
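The identity of Lemma A.3 can be checked symbolically on \(M = N = {\mathbb {R}}\). The sketch below (Python with SymPy) is an illustration under stated assumptions: the map \(F = (F^0, {\bar{F}})\) and the test function f are hypothetical, and the coefficients used for \(F^R_*A\) composed with F, namely \(A^0\,\frac{{\textrm{d}}F^0}{{\textrm{d}}t}\), \(A{\bar{F}}\) and \(A^{xx}(\partial _x {\bar{F}})^2\), are the chain-rule expressions one expects for the mixed-order pushforward rather than formulas quoted from the text.

```python
import sympy as sp

t, x, s, y = sp.symbols("t x s y", real=True)

# Hypothetical bundle isomorphism of R x R: the time change F0 depends on t only,
# and x -> Fbar(t, x) is a diffeomorphism of R for each fixed t.
F0 = 2 * t + 1
Fbar = sp.exp(t) * x + x**3

# Generic mixed-order vector field A = A0 d/dt + a d/dx + b d^2/dx^2.
A0 = sp.Function("A0")(t)
a = sp.Function("a")(t, x)
b = sp.Function("b")(t, x)

def A(g):
    return A0 * sp.diff(g, t) + a * sp.diff(g, x) + b * sp.diff(g, x, 2)

# Hypothetical test function on R x N.
f = sp.sin(s) * y**3 + sp.exp(s * y)
on_F = {s: F0, y: Fbar}

# Left-hand side: A(f o F).
lhs = A(f.subs(on_F))

# Right-hand side: ((F_* A) f) o F with the assumed chain-rule coefficients.
rhs = (
    A0 * sp.diff(F0, t) * sp.diff(f, s).subs(on_F)
    + A(Fbar) * sp.diff(f, y).subs(on_F)
    + b * sp.diff(Fbar, x) ** 2 * sp.diff(f, y, 2).subs(on_F)
)

residual = sp.simplify(sp.expand(lhs - rhs))
print(residual)
```

A zero residual for this particular f does not prove the lemma, but it confirms that the second-order term of \(A{\bar{F}}\), which has no first-order analogue, is exactly what the first-order coefficient of the pushforward must absorb.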
A.2 Pushforwards of Generators
A smooth map \(F: M\rightarrow N\) can be associated naturally with a bundle homomorphism \(\textbf{Id}_{\mathbb {R}}\times F: ({\mathbb {R}}\times M, \pi , {\mathbb {R}}) \rightarrow ({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) that projects to the identity on \({\mathbb {R}}\). In this case, the pushforward of a diffusion X by \(\textbf{Id}_{\mathbb {R}}\times F\) is just \((\textbf{Id}_{\mathbb {R}}\times F)\cdot X = F(X)\). The stochastic prolongation of the bundle homomorphism \(\textbf{Id}_{\mathbb {R}}\times F\) is then
Corollary A.4
Let \(F: M\rightarrow N\) be a diffeomorphism. If a diffusion X on M has a generator \(A=(A_t)\), then the process F(X) is a diffusion on N, with generator \(F^S_*A = (F^S_*A_t)\).
Proof
Assume \(X\in I_{t_0}(M)\). For every \(f\in C^\infty (N)\), we have \(f\circ F\in C^\infty (M)\), so by assumption the process
is a real-valued continuous \(\{{\mathcal {P}}_t\}\)-martingale. This proves that \(F(X)\in I_{t_0}(N)\) has generator \(F^S_*A\). \(\square \)
This corollary, together with the identification between \({\mathbb {R}}\times {\mathcal {T}}^S M\) and \({\mathbb {R}}\times {\mathcal {T}}^E M\) in (3.6) and (3.7), gives rise to the relation between prolongations and pushforwards as follows:
so that \(j (\textbf{Id}_{\mathbb {R}}\times F) = \textbf{Id}_{\mathbb {R}}\times F^S_*\).
The following corollary is an extension of Corollary A.4 and a straightforward consequence of Lemma 4.8. Here, we will present another proof, using notions of “Appendix A.1.”
Corollary A.5
Let F be a bundle isomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to \(F^0\). If X is a diffusion on M with respect to \(\{{\mathcal {P}}_t\}\) and has an extended generator \(\frac{\partial }{\partial {t}} + A\) where A is a time-dependent second-order vector field, then the pushforward \(F\cdot X\) is a diffusion on N with respect to \(\{{\mathcal {F}}_{(F^0)^{-1}(s)}\}\), with extended generator
Proof
Assume that \(X\in I_{t_0}(M)\) and \(F = (F^0, {\bar{F}})\). For every \(f\in C^\infty ({\mathbb {R}}\times N)\), Lemma A.3 yields that the process
is a continuous \(\{{\mathcal {P}}_t\}\)-martingale. Denote \(s_0 = F^0(t_0)\). By substituting \(t = (F^0)^{-1}(s)\) which can be done because \(F^0\) is an isomorphism, and using the change of variable \(u = (F^0)^{-1}(v)\), and recalling that \(F\cdot X(s) = {\bar{F}}\left( (F^0)^{-1}(s), X((F^0)^{-1}(s)) \right) \), the process
is a continuous \(\{{\mathcal {F}}_{(F^0)^{-1}(s)}\}\)-martingale. The result follows. \(\square \)
Remark A.6
(i) As a consequence, the generator of the pushforward \(F\cdot X\) is given in local coordinates by
$$\begin{aligned} \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s}\left[ \left( \frac{\partial }{\partial {t}} + A \right) {\bar{F}}^i \circ F^{-1}\right] \frac{\partial }{\partial y^i} + \frac{{\textrm{d}}(F^0)^{-1}}{{\textrm{d}}s} \left[ \left( A^{kl} \frac{\partial {\bar{F}}^i}{\partial x^k} \frac{\partial \bar{F}^j}{\partial x^l} \right) \circ F^{-1} \right] \frac{\partial ^2}{\partial y^i\partial y^j}. \end{aligned}$$
This coincides with Lemma 4.8.
(ii) This corollary together with Lemma A.1 indicates that the bundle homomorphisms from \({\mathbb {R}}\times M\) to \({\mathbb {R}}\times N\) are the only (deterministic) smooth maps between them that map diffusions to diffusions. Indeed, if a smooth map F from \({\mathbb {R}}\times M\) to \({\mathbb {R}}\times N\) pushes forward a diffusion to another diffusion, then a similar argument as in Corollary A.5 implies that \(F^S_*\) would map the extended generator of the former diffusion to that of the latter, whereas Lemma A.1 says such \(F^S_*\) must be the second-order pushforward of some bundle homomorphism.
(iii) In particular, if F is a smooth map from M to N and X is a diffusion on M with generator A, then F(X) is a diffusion on N with respect to the same filtration, with generator \(F^S_*( A )\).
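A familiar instance of Remark A.6.(iii) is the exponential of a one-dimensional Brownian motion. The sketch below (Python with SymPy) assumes the convention that standard Brownian motion on \({\mathbb {R}}\) has generator \(A = \frac{1}{2}\frac{\partial ^2}{\partial x^2}\), and takes \(F = \exp \); the candidate \(F^S_*A = \frac{1}{2} y \frac{\partial }{\partial y} + \frac{1}{2} y^2 \frac{\partial ^2}{\partial y^2}\) is the generator that Itô's formula gives for \(e^{X}\), and the test functions are arbitrary choices. The code checks \(A(g\circ F) = ((F^S_*A)g)\circ F\) on those test functions.

```python
import sympy as sp

x, y = sp.symbols("x y", real=True)
F = sp.exp(x)
half = sp.Rational(1, 2)

def A(g):
    """Assumed generator of standard 1-D Brownian motion: (1/2) d^2/dx^2."""
    return half * sp.diff(g, x, 2)

def pushed_A(g):
    """Candidate pushforward by exp: (1/2) y d/dy + (1/2) y^2 d^2/dy^2."""
    return half * y * sp.diff(g, y) + half * y**2 * sp.diff(g, y, 2)

# Arbitrary test functions of y.
tests = [y, y**3, sp.sin(y)]
residuals = [
    sp.simplify(A(g.subs(y, F)) - pushed_A(g).subs(y, F))
    for g in tests
]
print(residuals)
```

The first-order term \(\frac{1}{2} y \frac{\partial }{\partial y}\) is the Itô correction: it comes entirely from the second-order part of A acting on F, which is why second-order pushforwards, and not ordinary ones, transport generators.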
A.3 Pushforwards and Pullbacks by Diffusions
Definition A.7
(Pushforwards and pullbacks by diffusions) Let X be an M-valued diffusion process. Let \(({\mathbb {R}}\times U, (t,x^i))\) be a coordinate chart on \({\mathbb {R}}\times M\). The pushforward map \(X_*\) from \(T_t {\mathbb {R}}\) to \(T_t {\mathbb {R}}\times {\mathcal {T}}^S_{X(t)} M\) is defined in local coordinates by
The pullback map \(X^*\) from \({\mathcal {T}}^*_t {\mathbb {R}}\times \mathcal T^{S*}_{X(t)} M\) to \({\mathcal {T}}^*_t {\mathbb {R}}\) is defined by
Remark A.8
Recall that in classical differential geometry, the pushforward by a smooth curve \(\gamma = (\gamma (t))_{t\in [-1,1]}\) on M is a map \(\gamma _*: T{\mathbb {R}}\rightarrow T M\) given by \(\gamma _*(\frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}) = \dot{\gamma }^i(t_0)\frac{\partial }{\partial {x^i}}|_{\gamma (t_0)}\). While if we look at the graph of \(\gamma \) as a section of the trivial bundle \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), denoted by \({\bar{\gamma }}\), then the pushforward map by \({\bar{\gamma }}\) is \({\bar{\gamma }}_*(\frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}) = \frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0} + \dot{\gamma }^i(t_0)\frac{\partial }{\partial {x^i}}|_{\gamma (t_0)}\). For this reason, it would be more appropriate to call \(X_*\) and \(X^*\) in Definition A.7 the pushforward and pullback by graph of X, or by random section corresponding to X, instead of by X itself. But we avoid that for simplicity.
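The classical statement about the graph \({\bar{\gamma }}\) can be checked directly: applying \({\bar{\gamma }}_*(\frac{{\textrm{d}}}{{\textrm{d}}t})\) to a function f on \({\mathbb {R}}\times M\) is the same as differentiating f along the graph. The following minimal sketch (Python with SymPy, \(M = {\mathbb {R}}\)) uses a generic curve and a hypothetical test function f chosen only for illustration.

```python
import sympy as sp

t, x = sp.symbols("t x", real=True)
gamma = sp.Function("gamma")(t)          # generic smooth curve on M = R

# Hypothetical test function on R x M.
f = x**2 * sp.cos(t) + x

# Derivative of f along the graph t -> (t, gamma(t)).
along_graph = sp.diff(f.subs(x, gamma), t)

# (d/dt + gamma'(t) d/dx) f, evaluated on the graph.
pushed = (sp.diff(f, t) + sp.diff(gamma, t) * sp.diff(f, x)).subs(x, gamma)

residual = sp.simplify(along_graph - pushed)
print(residual)
```

The operator \(\frac{\partial }{\partial t} + \dot{\gamma }^i \frac{\partial }{\partial x^i}\) obtained this way is the classical total derivative along \(\gamma \); the pushforward by a diffusion plays the same role with the extended generator in place of this first-order operator.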
One can see from the definition that the pushforward \(X_*\) maps the time vector \(\frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}\) to the value of the extended generator of X at \((t_0,X(t_0))\). There is an informal way to look at the pullback map \(X^*\): one first replaces all x’s by X’s in the brackets at LHS of (A.6) and obtains
then substituting \(d X^i\) and \(dX^j\cdot dX^k\), and following Itô’s calculus,
and getting rid of the martingale part, we get the RHS of (A.6).
The following corollary is straightforward. We will see that pushforward and pullback maps by diffusions are also closely related to the concept of “total derivatives.”
Corollary A.9
(i) Let X be an M-valued diffusion process. For all \(\tau \frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}\in {\mathcal {T}}_{t_0}{\mathbb {R}}\) and \(\alpha \in \mathcal T_{t_0}^* {\mathbb {R}}\times {\mathcal {T}}_{X(t_0)}^{S*} M\),
$$\begin{aligned} \left\langle X^*\left( \alpha \right) , \tau \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}} \right\rangle = \left\langle \alpha , X_* ( \tau \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}|_{t_0}} ) \right\rangle . \end{aligned}$$ (A.7)
(ii) If \(X\in I_{(t_0,q)}(M)\), f is a smooth function on \({\mathbb {R}}\times M\) and g a smooth function on M, then
$$\begin{aligned} \left\langle X^*(d^\circ f), \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}} \right\rangle \big |_{t_0}&= X_*( \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}} )(f)\big |_{(t_0,q)} \\&= ({\textbf{D}}_{\textrm{t}} f)(j_{(t_0,q)}X) = \langle \textstyle {{\frac{\partial }{\partial t}}} + A^X, d^\circ f \rangle (t_0,q), \\ \left\langle X^*(dg\cdot dg), \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}} \right\rangle \big |_{t_0}&= \left\langle dg\cdot dg, X_*( \textstyle {\frac{{\textrm{d}}}{{\textrm{d}}t}} )\right\rangle \big |_{(t_0,q)} = ({\textbf{Q}}_{\textrm{t}} g)(j_{(t_0,q)}X). \end{aligned}$$
(iii) Let X, Y be M-valued diffusion processes satisfying \(X(t) = Y(t)\) a.s. Then, \(j_t X = j_t Y\) a.s. if and only if \(X_*( \frac{{\textrm{d}}}{{\textrm{d}}t}|_t) = Y_*( \frac{{\textrm{d}}}{{\textrm{d}}t}|_t)\) a.s. In particular, if \(X, Y \in I_{(t,q)}(M)\), then \(j_{(t,q)} X = j_{(t,q)} Y\) if and only if \(X_*( \frac{{\textrm{d}}}{{\textrm{d}}t}|_t) = Y_*( \frac{{\textrm{d}}}{{\textrm{d}}t}|_t)\).
(iv) Let F be a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) projecting to \(F^0\), and X be an M-valued diffusion process. Then, \(F^R_* \circ X_* = (F\cdot X)_*\circ (F^0)_*\).
(v) Let F be a smooth function from M to M, and X be an M-valued diffusion process. Then, \((\textbf{Id}_{T{\mathbb {R}}}\times F^S_*) \circ X_* = (F\circ X)_*\).
Proof
Assertions (i), (ii) and (iii) are easy to deduce from the definitions. We prove (iv) using local expressions. Assume that \(F = (F^0, {\bar{F}})\) and denote \({\tilde{X}} = F\cdot X\). Recall that \(\tilde{X}(F^0(t)) = {\bar{F}}(t,X(t))\). Then,
The result follows. \(\square \)
A.4 Lie Derivatives
Definition A.10
(Lie derivatives) Let V be a vector field on M and \(\psi = \{\psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\) be its flow. Let A be a second-order vector field and \(\alpha \) be a second-order form on M. The Lie derivative of A with respect to V is a second-order vector field on M, denoted by \({\mathcal {L}}_V A\), and defined by
The Lie derivative of \(\alpha \) with respect to V is a second-order form on M, denoted by \({\mathcal {L}}_V \alpha \), and defined by
For sufficiently small \(\epsilon \ne 0\), \(\psi _\epsilon \) is defined in a neighborhood of \(q\in M\) and \(\psi _{-\epsilon }\) is the inverse of \(\psi _\epsilon \). So the difference quotients in the above definitions of Lie derivatives make sense. It is easy to verify that the derivatives exist for each \(q\in M\), and that \({\mathcal {L}}_V A\) is a smooth second-order vector field and \({\mathcal {L}}_V \alpha \) is a smooth second-order covector field. Moreover, the restrictions of \({\mathcal {L}}_V\) to \(T_q M\) and \(T^{*}_{q} M\) coincide with the classical Lie derivatives. In the following, we will seek properties of \({\mathcal {L}}\). Some of them can be found in Meyer (1981b, Section 6.(d)).
Lemma A.11
Let V be a vector field and f be a smooth function. Let A and \(\alpha \) be a second-order vector field and second-order form, respectively. Then,
(i) \({\mathcal {L}}_V A = [V, A]\), where the RHS denotes the commutator of V and A as linear operators;
(ii) \({\mathcal {L}}_V (f A) = (Vf) A + f {\mathcal {L}}_V A\);
(iii) \(\langle {\mathcal {L}}_V \alpha , A \rangle = V (\langle \alpha , A \rangle ) - \langle \alpha , {\mathcal {L}}_V A \rangle \);
(iv) \({\mathcal {L}}_V (f \alpha ) = (Vf) \alpha + f {\mathcal {L}}_V \alpha \);
(v) \({\mathcal {L}}_V(d^2 f) = d^2(Vf)\).
Remark A.12
Note that the commutator [V, A] is a second-order vector field. Indeed, if V and A have coordinate expressions \(V = V^i \frac{\partial }{\partial {x^i}}\) and \(A = A^i \frac{\partial }{\partial x^i} + A^{ij} \frac{\partial ^2}{\partial x^i\partial x^j}\), then the following local expression for [V, A] is easy to verify:
Proof
(i) For a function \(f \in C^\infty (M)\),
$$\begin{aligned} \begin{aligned} ({\mathcal {L}}_V A)_q f&= \lim _{\epsilon \rightarrow 0} \frac{(\psi _{-\epsilon })^S_* (A_{\psi _\epsilon (q)})f - A_q f}{\epsilon } = \lim _{\epsilon \rightarrow 0} \frac{(A_{\psi _\epsilon (q)})(f\circ \psi _{-\epsilon }) - A_q f}{\epsilon } \\&= \lim _{\epsilon \rightarrow 0} \frac{(A_{\psi _\epsilon (q)})(f\circ \psi _{-\epsilon } - f)}{\epsilon } + \lim _{\epsilon \rightarrow 0} \frac{(A_{\psi _\epsilon (q)})f - A_q f}{\epsilon }. \end{aligned} \end{aligned}$$
Then, a similar argument to the derivation of classical Lie derivatives yields
$$\begin{aligned} ({\mathcal {L}}_V A)_q f = -A_q(Vf) + V_q (Af) = [V, A]_q f. \end{aligned}$$
(ii) \({\mathcal {L}}_V (f A)g = [V, fA] g = V(fAg) - fA Vg = Vf Ag + f VAg - fA Vg = Vf Ag + f ({\mathcal {L}}_V A) g\).
(iii) For a second-order vector field A,
$$\begin{aligned} \begin{aligned} \langle {\mathcal {L}}_V \alpha , A \rangle&= \lim _{\epsilon \rightarrow 0} \frac{\langle (\psi _{\epsilon })^{S*} (\alpha _{\psi _\epsilon (q)}), A \rangle - \langle \alpha _q, A \rangle }{\epsilon } \\&= \lim _{\epsilon \rightarrow 0} \frac{\langle \alpha _{\psi _\epsilon (q)}, (\psi _{\epsilon })^S_* A \rangle - \langle \alpha _q, A \rangle }{\epsilon } \\&= \lim _{\epsilon \rightarrow 0} \frac{\langle \alpha _{\psi _\epsilon (q)} - \alpha _q, (\psi _{\epsilon })^S_* A \rangle }{\epsilon } + \lim _{\epsilon \rightarrow 0} \frac{\langle \alpha _q, (\psi _{\epsilon })^S_* A - A \rangle }{\epsilon } \\&= \lim _{\epsilon \rightarrow 0} \frac{\langle \alpha _{\psi _\epsilon (q)} - \alpha _q, A \rangle }{\epsilon } - \lim _{\epsilon \rightarrow 0} \frac{\langle \alpha _q, (\psi _{-\epsilon })^S_* A - A \rangle }{\epsilon } \\&= V (\langle \alpha , A \rangle ) - \langle \alpha , {\mathcal {L}}_V A \rangle . \end{aligned} \end{aligned}$$
(iv) Use (iii) to derive
$$\begin{aligned} \begin{aligned} \langle {\mathcal {L}}_V (f \alpha ), A \rangle&= V (f \langle \alpha , A \rangle ) - f \langle \alpha , {\mathcal {L}}_V A \rangle \\&= (Vf) \langle \alpha , A \rangle + f V (\langle \alpha , A \rangle ) - f \langle \alpha , {\mathcal {L}}_V A \rangle \\&= (Vf) \langle \alpha , A \rangle + f \langle {\mathcal {L}}_V \alpha , A \rangle . \end{aligned} \end{aligned}$$
(v) Again using (iii) we have \(\langle {\mathcal {L}}_V (d^2 f), A \rangle = V (\langle d^2 f, A \rangle ) - \langle d^2 f, {\mathcal {L}}_V A \rangle = V A f - [V, A] f = AVf = \langle d^2(Vf), A \rangle \).
\(\square \)
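That the commutator \([V, A]\) of Lemma A.11.(i) is again a second-order operator, with computable coefficients, can be checked symbolically. The sketch below (Python with SymPy, two dimensions) is an illustration under stated assumptions: the coefficients \(A^{ij}\) are taken symmetric, and the candidate coordinate expression \([V,A] = (V A^i - A V^i)\frac{\partial }{\partial x^i} + \big (V A^{ij} - A^{ik}\frac{\partial V^j}{\partial x^k} - A^{kj}\frac{\partial V^i}{\partial x^k}\big )\frac{\partial ^2}{\partial x^i\partial x^j}\) is derived here by direct expansion and is not quoted from the text.

```python
import sympy as sp

x1, x2 = sp.symbols("x1 x2", real=True)
X = (x1, x2)
n = len(X)

# Generic coefficients of V = V^i d_i and A = A^i d_i + A^{ij} d_i d_j.
V = [sp.Function(f"V{i}")(*X) for i in range(n)]
A1 = [sp.Function(f"A{i}")(*X) for i in range(n)]
A2 = [[None] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        A2[i][j] = A2[j][i] = sp.Function(f"A{i}{j}")(*X)  # assumed symmetric

def apply_V(g):
    return sum(V[i] * sp.diff(g, X[i]) for i in range(n))

def apply_A(g):
    first = sum(A1[i] * sp.diff(g, X[i]) for i in range(n))
    second = sum(A2[i][j] * sp.diff(g, X[i], X[j])
                 for i in range(n) for j in range(n))
    return first + second

f = sp.Function("f")(*X)
commutator = apply_V(apply_A(f)) - apply_A(apply_V(f))

# Candidate coefficients of [V, A] (an assumption of this sketch).
B1 = [apply_V(A1[i]) - apply_A(V[i]) for i in range(n)]
B2 = [[apply_V(A2[i][j])
       - sum(A2[i][k] * sp.diff(V[j], X[k]) + A2[k][j] * sp.diff(V[i], X[k])
             for k in range(n))
       for j in range(n)] for i in range(n)]

candidate = (sum(B1[i] * sp.diff(f, X[i]) for i in range(n))
             + sum(B2[i][j] * sp.diff(f, X[i], X[j])
                   for i in range(n) for j in range(n)))

residual = sp.expand(commutator - candidate)
print(residual)
```

The third-order derivatives of f cancel between the two compositions, which is the reason the commutator of a first-order and a second-order operator stays second-order.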
Corollary A.13
(i) \({\mathcal {L}}_V (df\cdot dg) = d(Vf)\cdot dg + df\cdot d(Vg)\).
(ii) \({\mathcal {L}}_V (\omega \cdot \eta ) = {\mathcal {L}}_V \omega \cdot \eta + \omega \cdot {\mathcal {L}}_V\eta \).
(iii) \({\mathcal {L}}_V\) commutes with the symmetric product operator \(\bullet \).
Proof
For the first assertion,
We use the local expressions to prove the second assertion. Assume, locally, that \(\omega = \omega _i dx^i\) and \(\eta = \eta _i dx^i\). Then, by (5.4), Lemma A.11.(iv) and the first assertion,
The last assertion is a consequence of the second one. Indeed,
\(\square \)
Given a vector field V on \({\mathbb {R}}\times M\), the Lie derivative \({\mathcal {L}}_V\) can also be defined for second-order vector fields and second-order forms on \({\mathbb {R}}\times M\), as in Definition A.10, without any changes. But when restricting to the mixed-order vector fields and mixed-order forms, it is necessary that the flow in Definition A.10 consists of bundle homomorphisms on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\), so that its mixed-order pushforwards and pullbacks are well defined. In terms of the vector field V, this amounts to requiring that V be \(\pi \)-projectable. In this case, we just replace the second-order pushforwards and pullbacks in Definition A.10 by mixed-order pushforwards and pullbacks, to define the Lie derivative \({\mathcal {L}}_V\) for mixed-order vector fields and mixed-order forms on \({\mathbb {R}}\times M\).
Now let V be a \(\pi \)-projectable vector field on \({\mathbb {R}}\times M\). Then, Lemma A.11 (i)–(iv) still holds for smooth functions f on \({\mathbb {R}}\times M\), mixed-order vector fields A and mixed-order forms \(\alpha \) on \({\mathbb {R}}\times M\). The assertion (v) will hold with the mixed differential instead of the second-order differential, that is, \({\mathcal {L}}_V(d^\circ f) = d^\circ (Vf)\). Moreover, if V and A have coordinate expressions \(V = V^0 \frac{\partial }{\partial {t}} + V^i \frac{\partial }{\partial {x^i}}\) and \(A = A^0 \frac{\partial }{\partial {t}} + A^i \frac{\partial }{\partial x^i} + A^{ij} \frac{\partial ^2}{\partial x^i\partial x^j}\) where \(V^0\) only depends on time, then the Lie derivative \({\mathcal {L}}_V A\) has the following expression:
Appendix B The Mixed-Order Contact Structure on \({\mathbb {R}}\times {\mathcal {T}}^S M\)
B.1 Mixed-Order Total Derivatives and Mixed-Order Contact Forms
We denote by \(\pi _{1,0}^*(T {\mathbb {R}}\times {\mathcal {T}}^S M)\) the pullback bundle (see Saunders 1989, Definition 1.4.5) of \(\tau _{\mathbb {R}}\times \tau ^S_M\) by \(\pi _{1,0}\). It is a fiber bundle over \({\mathbb {R}}\times {\mathcal {T}}^S M\).
Definition B.1
(Mixed-order holonomic lift) Let \(t\in {\mathbb {R}}\), \(q\in M\), \(X\in I_{(t,q)}(M)\) and \(\tau \frac{{\textrm{d}}}{{\textrm{d}}t}|_t\in T_t {\mathbb {R}}\). The mixed-order holonomic lift of \(\tau \frac{{\textrm{d}}}{{\textrm{d}}t}|_t\) by X is defined to be
The set of all mixed-order holonomic lifts is denoted by \(H^R\pi _{1,0}\), that is,
Since \(X_*\) depends only upon the mean derivatives of X at t, the holonomic lift of a tangent vector is completely determined by \(j_{(t,q)}X\) and does not depend on the choice of the representative diffusion X. In particular, the set \(H^R \pi _{1,0}\) is well defined and is clearly a subbundle of \(\pi _{1,0}^*(T {\mathbb {R}}\times {\mathcal {T}}^S M)\).
Lemma B.2
The fiber bundle \((\pi _{1,0}^*(T {\mathbb {R}}\times {\mathcal {T}}^S M), \pi _{1,0}^*(\tau _{\mathbb {R}}\times \tau ^S_M), {\mathbb {R}}\times {\mathcal {T}}^S M)\) can be written as the Whitney sum of two subbundles
Proof
Suppose that \(( A, j_{(t,q)}X) \in \pi _{1,0}^*(T {\mathbb {R}}\times \mathcal T^S M)\). Then, \(A \in T {\mathbb {R}}\times {\mathcal {T}}^S M\), and
It follows easily from the definition of pushforward (A.5) that \(\pi ^R_*(A - X_*(\pi ^R_*(A))) = 0\). Hence, \(A - X_*(\pi ^R_*(A))\in V^S\pi \) and
The result follows. \(\square \)
The decomposition of \(( A, j_{(t,q)}X) \in \pi _{1,0}^*(T {\mathbb {R}}\times {\mathcal {T}}^S M)\) may then be found by letting
Definition B.3
A section of the bundle \((H^R\pi _{1,0}, \pi _{1,0}^*(\tau _{\mathbb {R}}\times \tau ^S_M)|_{H^R\pi _{1,0}}, {\mathbb {R}}\times {\mathcal {T}}^S M)\) is called a mixed-order total derivative. The specific section
is called the coordinate mixed-order total derivative, and is denoted by \({\textbf{D}}_t\).
The coordinate mixed-order total derivative is just the total mean derivative in Definition 4.7. The dual construction is the mixed-order contact cotangent vector, which may be described as being in the kernel of \(X^*\).
Definition B.4
An element \((\alpha , j_{(t,q)}X) \in \pi _{1,0}^*(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M)\) is called a mixed-order contact cotangent vector if \(X^*(\alpha ) = 0\). The set of all mixed-order contact cotangent vectors is denoted by \(C^{R*}\pi _{1,0}\), that is,
It is straightforward to check that the vanishing of \(X^*\) does not depend on the particular choice of the representative diffusion X. The dual relation between \(X^*\) and \(X_*\) in (A.7) implies that the mixed-order contact and holonomic elements annihilate each other.
To express a mixed-order contact cotangent vector \((\alpha , j_{(t,q)}X)\) in coordinates, let us consider
Using the definition (A.6) we get
There are two basic nontrivial solutions of the above equation, say,
Plugging these solutions in (B.1), we get two basic types of mixed-order contact cotangent vectors
Thus, every mixed-order contact cotangent vector in \({(C^{R*}\pi _{1,0})}_{j_{(t,q)}X}\) is a linear combination of these basic mixed-order contact cotangent vectors.
Lemma B.5
The fiber bundle \((\pi _{1,0}^*(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M), \pi _{1,0}^*(\tau ^*_{\mathbb {R}}\times \tau ^{S*}_M), {\mathbb {R}}\times {\mathcal {T}}^S M)\) can be written as the Whitney sum of two subbundles
Proof
Suppose that \((\alpha , j_{(t,q)}X) \in \pi _{1,0}^*(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M)\). Then, \(\alpha \in T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M\), and the definition of pullback yields
Since \(X^*(\alpha - X^*(\alpha )) = 0\), it follows that
This ends the proof. \(\square \)
The decomposition of \(( \alpha , j_{(t,q)}X) \in \pi _{1,0}^*(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} M)\) may then be found by letting
Definition B.6
A section of the bundle \((C^{R*}\pi _{1,0}, \pi _{1,0}^*(\tau ^*_{\mathbb {R}}\times \tau ^{S*}_M)|_{C^{R*}\pi _{1,0}}, {\mathbb {R}}\times {\mathcal {T}}^S M)\) is called a mixed-order contact form. The specific sections \(d^2x^i - D^i x\, dt\) and \(dx^j\cdot dx^k - Q^{jk} x\, dt\), \(1\le i,j,k \le d\), are called basic mixed-order contact forms.
It follows from the construction that the set of basic mixed-order contact forms defines a local frame of the bundle \(\pi _{1,0}^*(\tau ^*_{\mathbb {R}}\times \tau ^{S*}_M)|_{C^{R*}\pi _{1,0}}\).
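The annihilation of holonomic elements by the basic forms \(d^2x^i - D^i x\, dt\) and \(dx^j\cdot dx^k - Q^{jk} x\, dt\) can be checked by hand. The sketch below rests on two assumptions: the coordinate expression \({\textbf{D}}_t = \partial _t + D^i x\, \partial _{x^i} + \tfrac{1}{2} Q^{jk} x\, \partial _{x^j}\partial _{x^k}\), and the pairing convention in the first display, with all other pairings between the coordinate frame and coframe equal to zero. If the paper normalizes \(dx^j\cdot dx^k\) differently, the factors change but the cancellation should be the same.

\[
\langle dt, \partial _t \rangle = 1, \qquad
\langle d^2x^i, \partial _{x^l} \rangle = \delta ^i_l, \qquad
\langle dx^j\cdot dx^k, \partial _{x^l}\partial _{x^m} \rangle = \delta ^j_l \delta ^k_m + \delta ^j_m \delta ^k_l .
\]

\[
\langle d^2x^i - D^i x\, dt,\ {\textbf{D}}_t \rangle = D^i x - D^i x = 0,
\qquad
\langle dx^j\cdot dx^k - Q^{jk} x\, dt,\ {\textbf{D}}_t \rangle = \tfrac{1}{2}\left( Q^{jk} x + Q^{kj} x \right) - Q^{jk} x = 0 .
\]

The second identity uses the symmetry \(Q^{jk} x = Q^{kj} x\).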
Remark B.7
In contrast, we recall the classical contact forms on the first-order jet bundle \(J^1 \pi = {\mathbb {R}}\times TM\). Using the coordinates \((t,x^i,\dot{x}^i)\), the classical basic contact forms are \(dx^i - \dot{x}^i dt\), \(1\le i \le d\). See Saunders (1989, Section 4.3) and Olver (1995, Theorem 4.23), also cf. Geiges (2008, p. 9), for a one-dimensional example.
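The classical statement can be verified in one line. Writing the classical total derivative on \(J^1\pi \) as \(\partial _t + \dot{x}^j \partial _{x^j}\) (the standard expression, which is not written out in this remark), each basic contact form annihilates it:

\[
\left\langle dx^i - \dot{x}^i\, dt,\ \frac{\partial }{\partial t} + \dot{x}^j \frac{\partial }{\partial x^j} \right\rangle = \dot{x}^i - \dot{x}^i = 0, \qquad 1 \le i \le d .
\]

The mixed-order forms play the same role once the fiber coordinate \(\dot{x}^i\) is replaced by the pair \((D^i x, Q^{jk} x)\).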
Corollary B.8
Let \(({\mathbb {R}}\times U, (t,x^i))\) be a coordinate chart on \({\mathbb {R}}\times M\). Let \({\textbf{X}}\) be a \({\mathcal {T}}^S M\)-valued diffusion process. In local coordinates, the pushforward map \({\textbf{X}}_*\) from \(T {\mathbb {R}}\) to \(T {\mathbb {R}}\times {\mathcal {T}}^S {\mathcal {T}}^S M\) is given by
The pullback map \({\textbf{X}}^*\) from \(T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} {\mathcal {T}}^S M\) to \(T^* {\mathbb {R}}\) is given by
Corollary B.9
Let \(\alpha \) be a section of \((T^* {\mathbb {R}}\times {\mathcal {T}}^{S*} {\mathcal {T}}^S M, \tau ^*_{\mathbb {R}}\times \tau ^{S*}_{T^S M}, {\mathbb {R}}\times {\mathcal {T}}^S M)\). Then, \(\alpha \) is a mixed-order contact form if and only if for every \(t\in {\mathbb {R}}\) and every \(X\in \cup _{q\in M}I_{(t,q)}(M)\),
Proof
We first let \(\alpha = \alpha _0 dt + \alpha _i d^2 x^i + \alpha _{jk}dx^j\cdot dx^k\) be a mixed-order contact form and let \(X\in I_{(t,q)}(M)\). Then,
To prove the converse, we suppose
Fix a particular index \(i_0\) with \(1\le i_0 \le d\). Let \(Y\in I_{(t,q)}(M)\) be such that \(j_{(t,q)}X = j_{(t,q)}Y\), \(D^i D Y = D^i D X + \delta _{i_0}^i\) and
Then,
It follows from the arbitrariness of \(i_0\) that \(\alpha ^1_i = 0\) for all \(1\le i\le d\). Similarly, all \(\alpha _{jk}^1\), \(\alpha _{jk}^2\) and \(\alpha _{jklm}^2\) vanish. Consequently, \(\alpha = \alpha _0 dt + \alpha _i d^2 x^i + \alpha _{jk}dx^j\cdot dx^k\). As in (B.2), we have \((jX)^*(\alpha |_{j_{(t,q)}X}) = X^*(\alpha |_{j_{(t,q)}X}) = 0\). Hence, \(\alpha \) is a mixed-order contact form. \(\square \)
Corollary B.10
Let \({\textbf{X}}\) be a \({\mathcal {T}}^S M\)-valued diffusion process. Then, \({\textbf{X}} = j X\), with X an M-valued diffusion process, if and only if \({\textbf{X}}^*(\alpha ) = 0\) for every mixed-order contact form \(\alpha \) on \({\mathbb {R}}\times {\mathcal {T}}^S M\).
Proof
We first suppose \({\textbf{X}} = j X\) with X an M-valued diffusion process. Then, for a mixed-order contact form \(\alpha \),
To prove the converse, it suffices to show, in local coordinates, that
This can be done as soon as we let \(\alpha \) be a basic mixed-order contact form. For example, let \(\alpha = d^2x^i - D^i x dt\), then
which leads to \(D^i x({\textbf{X}}) = D^i (x\circ {\textbf{X}})\). \(\square \)
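For comparison, the classical counterpart of this criterion is elementary and may help fix ideas. Take \(d = 1\) and a curve \(\gamma (t) = (t, x(t), v(t))\) in \(J^1\pi = {\mathbb {R}}\times T{\mathbb {R}}\); the functions \(x(t)\) and \(v(t)\) are illustrative choices, not objects from the paper.

\[
\gamma ^*(dx - \dot{x}\, dt) = \left( x'(t) - v(t) \right) dt .
\]

The pullback vanishes if and only if \(v = x'\), that is, if and only if \(\gamma \) is the prolongation of \(t \mapsto (t, x(t))\). For instance, \(x(t) = t^2\) with \(v(t) = 2t\) is holonomic, whereas \(x(t) = t^2\) with \(v(t) = t\) gives \(\gamma ^*(dx - \dot{x}\, dt) = t\, dt \ne 0\). In Corollary B.10 the derivative \(x'\) is replaced by the mean derivatives of \(x\circ {\textbf{X}}\).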
B.2 The Mixed-Order Cartan Distribution and Its Symmetries
The model bundle \({\mathbb {R}}\times {\mathcal {T}}^S M\) is a trivial bundle over \({\mathbb {R}}\) in its own right, and so we may consider its mixed-order tangent bundle \((T {\mathbb {R}}\times {\mathcal {T}}^S {\mathcal {T}}^S M, \tau _{\mathbb {R}}\times \tau ^S_{{\mathcal {T}}^S M}, {\mathbb {R}}\times {\mathcal {T}}^S M)\).
Definition B.11
The bundle endomorphism \((v,\textbf{Id}_E)\) of \(\pi _{1,0}^*(\tau _{\mathbb {R}}\times \tau ^S_M)\) is defined by
where \(A^h \in H^R\pi _{1,0}\) and \(A^v \in \pi _{1,0}^* (V^S\pi )\).
Definition B.12
(Mixed-order Cartan distribution) The mixed-order Cartan distribution is the kernel of the vector bundle homomorphism over \(\textbf{Id}_{{\mathbb {R}}\times {\mathcal {T}}^S M}\)
and is denoted by \(C^R\pi _{1,0}\).
Note that \(C^R\pi _{1,0}\) is a subbundle of \(\tau _{\mathbb {R}}\times \tau ^S_{{\mathcal {T}}^S M}\). It follows from the above two definitions that
Hence, for each \(X \in I_{(t,q)}(M)\),
Similarly to the proof of Lemma B.2, we can decompose an element \({\textbf{A}} \in C^R\pi _{1,0}|_{j_{(t,q)}X}\) as
where \((jX)_*((\pi _1)^R_*({\textbf{A}})) \in (jX)_*(T_t {\mathbb {R}})|_{j_{(t,q)}X}\) and \({\textbf{A}} - (jX)_*((\pi _1)^R_*({\textbf{A}})) \in V^S_{j_{(t,q)}X} \pi _{1,0}\).
From the duality relations it also follows that \((\tau ^*_{\mathbb {R}}\times \tau ^{S*}_{{\mathcal {T}}^S M})|_{C^{R*}\pi _{1,0}}\) is the annihilator of \((\tau _{\mathbb {R}}\times \tau ^S_{{\mathcal {T}}^S M})|_{C^R\pi _{1,0}}\), or in other words, the basic mixed-order contact forms are local defining forms for the mixed-order Cartan distribution \(C^R\pi _{1,0}\). A typical element \({\textbf{A}}\in C^R\pi _{1,0}|_{j_{(t,q)}X}\) may be written in coordinates as
From this it is easy to deduce \((\pi _{1,0})_*^R {\textbf{A}} \in H^R\pi _{1,0}\).
Definition B.13
A symmetry of the mixed-order Cartan distribution on \({\mathbb {R}}\times {\mathcal {T}}^S M\) is a bundle automorphism \({\textbf{F}}\) of \({\mathbb {R}}\times {\mathcal {T}}^S M\) which satisfies \({\textbf{F}}^R_*(C^R\pi _{1,0}) = C^R\pi _{1,0}\).
It follows by duality that symmetries of the mixed-order Cartan distribution are those bundle automorphisms which satisfy \(\textbf{F}^{R*}(C^{R*}\pi _{1,0}) = C^{R*}\pi _{1,0}\). For this reason, \(\textbf{F}\) is also called a mixed-order contact transformation. Similarly, \({\textbf{F}}\) may be characterized by the fact that whenever \(\alpha \) is a mixed-order contact form then so is \({\textbf{F}}^{R*}(\alpha )\).
Proposition B.14
Let \({\textbf{F}}\) be a bundle homomorphism from \(({\mathbb {R}}\times \mathcal T^S M, \pi _1, {\mathbb {R}})\) to \(({\mathbb {R}}\times {\mathcal {T}}^S N, \rho _1, {\mathbb {R}})\) that projects to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). Then, \({\textbf{F}}^R_*(C^R \pi _{1,0}) \subset C^R \rho _{1,0}\) if and only if \({\textbf{F}} = j F\) where F is a bundle homomorphism from \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) to \(({\mathbb {R}}\times N, \rho , {\mathbb {R}})\) that projects to \(F^0\).
Proof
First, we prove the sufficiency. Let \({\textbf{A}}\in C^R\pi _{1,0}|_{j_{(t,q)}X}\). According to (B.3), we decompose \({\textbf{A}}\) as \({\textbf{A}} = {\textbf{A}}_1 + {\textbf{A}}_2\) with \({\textbf{A}}_1 = (jX)_*((\pi _1)^R_*({\textbf{A}})) \in (jX)_*(T_t {\mathbb {R}})\) and \({\textbf{A}}_2 \in V^S_{j_{(t,q)}X} \pi _{1,0}\). Since, by Corollaries 4.6 and A.9.(iv), \((jF)^R_* \circ (jX)_* = (jF\cdot jX)_* \circ (F^0)_* = (j{\tilde{X}})_* \circ (F^0)_*\), where \({\tilde{X}} = F\cdot X\) is the pushforward of X by F, we have
Besides, since \(jF: \pi _{1,0} \rightarrow \rho _{1,0}\) is a bundle homomorphism projecting to F by Corollary 4.5.(ii), we have \(\rho _{1,0} \circ jF = F \circ \pi _{1,0}\). Then,
which yields \({\textbf{F}}^R_*({\textbf{A}}_2) \in V^S \rho _{1,0}\). This proves \({\textbf{F}}^R_*(C^R\pi _{1,0}) \subset C^R\rho _{1,0}\).
For the necessity, we first prove that \({\textbf{F}}\) is a bundle homomorphism from \(\pi _{1,0}\) to \(\rho _{1,0}\) by showing \(\textbf{F}^S_*(V^S \pi _{1,0}) \subset V^S \rho _{1,0}\), by virtue of Lemma A.1. Let \({\textbf{A}} \in V^S \pi _{1,0}\). Set \({\textbf{F}}^R_* {\textbf{A}} = {\textbf{A}}_1 + {\textbf{A}}_2\), where \({\textbf{A}}_1\in (j Y)_*({\mathcal {T}}_{F^0(t)} {\mathbb {R}})\) and \({\textbf{A}}_2\in V^S \rho _{1,0}\) for some diffusion Y. Since \({\textbf{F}}\) projects to \(F^0\),
while \((\rho _1)^S_* {\textbf{A}}_2 = \rho ^S_* (\rho _{1,0})^S_* \textbf{A}_2 = 0\). Thus, \((\rho _1)^S_* {\textbf{A}}_1 = 0\). Since \(\textbf{A}_1\in (j Y)_*({\mathcal {T}}_{F^0(t)} {\mathbb {R}})\), we set \({\textbf{A}}_1 = (jY)_* (\tau \frac{\partial }{\partial {s}}|_{F^0(t)} )\). Then, \((\rho _1)^S_* {\textbf{A}}_1 = \tau \frac{\partial }{\partial {s}}|_{F^0(t)} = 0\). Hence, \(\tau = 0\) and so \({\textbf{A}}_1 = 0\). This leads to \({\textbf{F}}^R_*(V^S \pi _{1,0}) \subset V^S \rho _{1,0}\), and hence \({\textbf{F}}\) is a bundle homomorphism from \(\pi _{1,0}\) to \(\rho _{1,0}\). Denote the projection of \({\textbf{F}}\) onto a map from \({\mathbb {R}}\times M\) to \({\mathbb {R}}\times N\) by F. It follows that
Since \(\pi _{1,0}\) is surjective, we obtain \(\rho \circ F = F^0 \circ \pi \), so that F is a bundle homomorphism from \(\pi \) to \(\rho \) projecting to \(F^0\). We shall write \(F = (F^0, {\bar{F}})\) and \({\textbf{F}} = (F^0, \bar{{\textbf{F}}})\).
Next, we will show \({\textbf{F}} =jF\). Fix a \(j_{(t,q)}X \in {\mathbb {R}}\times {\mathcal {T}}^S M\). Let \({\textbf{F}}(j_{(t,q)}X) = j_{(s,q')}Y\). Then, \(s = F^0(t)\) and \((s,q') = F(t,q)\). For an element \({\textbf{A}}\in C^R\pi _{1,0}|_{j_{(t,q)}X}\) with local expression in (B.4), we have from (A.3) that
Since \({\bar{F}}\) only depends on the variables on \({\mathbb {R}}\times M\), we have
Then, the local expressions for jF in (4.7) and (4.8) yield
Since \({\textbf{F}}^R_* {\textbf{A}}\in C^R\rho _{1,0}|_{j_{(s,q')}Y}\) by assumption, it follows that \(jF(j_{(t,q)}X) = j_{(s,q')}Y = {\textbf{F}}(j_{(t,q)}X)\). This proves that \({\textbf{F}} = jF\). \(\square \)
Corollary B.15
Let \({\textbf{F}}\) be a bundle automorphism on \(({\mathbb {R}}\times {\mathcal {T}}^S M, \pi _1, {\mathbb {R}})\) projecting to a diffeomorphism \(F^0:{\mathbb {R}}\rightarrow {\mathbb {R}}\). Then, \({\textbf{F}}\) is a symmetry of \(C^R\pi _{1,0}\) if and only if \(\textbf{F}= j F\) where F is a bundle automorphism on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) that projects to \(F^0\).
Proof
If \({\textbf{F}}\) is a symmetry, then \({\textbf{F}}^R_*(C^R\pi _{1,0}) \subset C^R\pi _{1,0}\) and \(({\textbf{F}}^{-1})^R_*(C^R\pi _{1,0}) \subset C^R\pi _{1,0}\). By Proposition B.14, \({\textbf{F}} = j F\) and \({\textbf{F}}^{-1} = j G\) for some bundle endomorphisms F and G on \(({\mathbb {R}}\times M, \pi , {\mathbb {R}})\) that project to \(F^0\) and \((F^0)^{-1}\), respectively. Then, Corollary 4.5.(iii) implies that \(j(F\circ G) = jF \circ jG = {\textbf{F}}\circ {\textbf{F}}^{-1} = \textbf{Id}_{{\mathbb {R}}\times {\mathcal {T}}^S M}\) and hence \(F\circ G = \textbf{Id}_{{\mathbb {R}}\times M}\). For the same reason, \(G\circ F = \textbf{Id}_{{\mathbb {R}}\times M}\). Thus, F is a bundle automorphism on \(\pi \). Conversely, if \({\textbf{F}} = j F\) and F is a bundle automorphism, then \({\textbf{F}} \circ j F^{-1} = j F^{-1}\circ {\textbf{F}} = \textbf{Id}_{{\mathbb {R}}\times {\mathcal {T}}^S M}\), which yields \({\textbf{F}}^{-1} = j F^{-1}\). Hence \({\textbf{F}}\) is a bundle automorphism on \(\pi _1\) and, by Proposition B.14 applied to both \({\textbf{F}}\) and \({\textbf{F}}^{-1}\), a symmetry of \(C^R\pi _{1,0}\). \(\square \)
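A one-dimensional sketch may illustrate why prolongations preserve contact forms. Take \(M = N = {\mathbb {R}}\) and \(F(t,x) = (t, \phi (t,x))\), so that \(F^0\) is the identity, and write \((s, y, Dy, Qy)\) for the target coordinates. Three assumptions are made here. First, \(jF\) acts through Itô's formula as in the first display, which we expect to agree with (4.7) and (4.8). Second, mixed-order differentials pull back as \(d^2\phi = \phi _t\, dt + \phi _x\, d^2x + \tfrac{1}{2}\phi _{xx}\, dx\cdot dx\). Third, \(d\phi \cdot d\phi = \phi _x^2\, dx\cdot dx\).

\[
jF:\ (t, x, Dx, Qx) \longmapsto \left( t,\ \phi ,\ \phi _t + \phi _x\, Dx + \tfrac{1}{2}\phi _{xx}\, Qx,\ \phi _x^2\, Qx \right) .
\]

\[
(jF)^{R*}\left( d^2y - Dy\, ds \right) = \phi _x \left( d^2x - Dx\, dt \right) + \tfrac{1}{2}\phi _{xx} \left( dx\cdot dx - Qx\, dt \right),
\qquad
(jF)^{R*}\left( dy\cdot dy - Qy\, ds \right) = \phi _x^2 \left( dx\cdot dx - Qx\, dt \right) .
\]

Both pullbacks are combinations of the basic mixed-order contact forms on the source, as the duality remark after Definition B.13 requires. Note that the second-order term \(\tfrac{1}{2}\phi _{xx}\) mixes the two types of basic forms, which has no counterpart in the classical case.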
B.3 Infinitesimal Symmetries
Definition B.16
An infinitesimal symmetry of the mixed-order Cartan distribution is a \(\pi _1\)-projectable vector field \({\textbf{V}}\) on \({\mathbb {R}}\times {\mathcal {T}}^S M\) with the property that, whenever the mixed-order vector field \({\textbf{A}}\) belongs to \(C^R\pi _{1,0}\), then so does the mixed-order vector field \({\mathcal {L}}_{{\textbf{V}}}{\textbf{A}}\).
As in the classical case, an infinitesimal symmetry of the mixed-order Cartan distribution may also be called an infinitesimal mixed-order contact transformation. By duality, \({\textbf{V}}\) is such an infinitesimal symmetry precisely when \({\mathcal {L}}_{{\textbf{V}}} \alpha \) is a mixed-order contact form for every mixed-order contact form \(\alpha \).
The following lemma is a consequence of the definition of Lie derivatives.
Lemma B.17
Let \({\textbf{V}}\) be a \(\pi _1\)-projectable vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\) with flow \(\Psi =\{\Psi _\epsilon \}_{\epsilon \in {\mathbb {R}}}\). Then, \({\textbf{V}}\) is an infinitesimal symmetry of the mixed-order Cartan distribution if and only if for each \(\epsilon \), the diffeomorphism \(\Psi _\epsilon \) is a symmetry of the mixed-order Cartan distribution.
The following result is the infinitesimal version of Corollary B.15. It can be deduced directly from Lemma B.17 and Corollary B.15, but here we give a computational proof based on the Lie derivative of mixed-order contact forms.
Theorem B.18
Let \({\textbf{V}}\) be a \(\pi _1\)-projectable vector field on \({\mathbb {R}}\times {\mathcal {T}}^S M\). Then, \({\textbf{V}}\) is an infinitesimal symmetry of the mixed-order Cartan distribution if and only if \({\textbf{V}}\) is the prolongation of a \(\pi \)-projectable vector field V on \({\mathbb {R}}\times M\).
Proof
Let the vector field \({\textbf{V}}\) have the following local expression:
where \({\textbf{V}}^0\) only depends on time due to the projectability of \({\textbf{V}}\). We then derive the Lie derivative \({\mathcal {L}}_{{\textbf{V}}}\) of the basic mixed-order contact forms \(d^2x^i - D^i x dt\) and \(dx^j\cdot dx^k - Q^{jk} x dt\) as follows:
and
Thus, the mixed-order forms \({\mathcal {L}}_{{\textbf{V}}}(d^2x^i - D^i x dt)\) and \({\mathcal {L}}_{{\textbf{V}}}(dx^j\cdot dx^k - Q^{jk} x dt)\) are mixed-order contact forms if and only if
Now (B.5) means that \({\textbf{V}}^i\)’s only depend on the variables on \({\mathbb {R}}\times M\), so that the vector field \({\textbf{V}}\) is also \(\pi _{1,0}\)-projectable. The two equations (B.6) and (B.7) are just restatements of the prolongation formulae in Theorem 4.14. \(\square \)
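As an illustration of the prolongation formulae, consider \(d = 1\) and a \(\pi \)-projectable field \(V = V^0(t)\, \partial _t + V^1(t,x)\, \partial _x\). The component labels \({\textbf{V}}_1\), \({\textbf{V}}_2\) below are ours. The formulae are obtained from a first-order expansion of the time-changed Itô formula and are assumed, not verified, to coincide with (B.6) and (B.7). We also assume \({\textbf{D}}_t = \partial _t + Dx\, \partial _x + \tfrac{1}{2} Qx\, \partial _x^2\).

\[
{\textbf{V}} = V^0 \frac{\partial }{\partial t} + V^1 \frac{\partial }{\partial x} + {\textbf{V}}_1 \frac{\partial }{\partial (Dx)} + {\textbf{V}}_2 \frac{\partial }{\partial (Qx)},
\qquad
{\textbf{V}}_1 = {\textbf{D}}_t V^1 - \dot{V}^0\, Dx,
\qquad
{\textbf{V}}_2 = 2\, \frac{\partial V^1}{\partial x}\, Qx - \dot{V}^0\, Qx .
\]

For the parabolic scaling field \(V = 2t\, \partial _t + x\, \partial _x\) one finds \({\textbf{D}}_t V^1 = Dx\), hence

\[
{\textbf{V}}_1 = Dx - 2\, Dx = -Dx, \qquad {\textbf{V}}_2 = 2\, Qx - 2\, Qx = 0 .
\]

This matches the flow \((t, x, Dx, Qx) \mapsto (e^{2\epsilon } t, e^{\epsilon } x, e^{-\epsilon } Dx, Qx)\): under Brownian scaling the drift is rescaled while the quadratic mean derivative is left invariant.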
Appendix C Stochastic Maupertuis’s Principle
Building on Definition 7.11, if we also allow variations caused by a time change, as in classical mechanics (cf. Abraham and Marsden 1978, Definition 3.8.4 or the so-called \(\Delta \)-variation in Goldstein et al. 2002, Section 8.6), then we need to impose the constraint of constant energy. The path space \({\mathcal {A}}_g([0,T];q, \mu )\) in (7.10) is accordingly modified to
where \(e\in {\mathbb {R}}\) is a regular value of \(E_0\).
Definition C.1
Given \(v\in {\mathcal {H}}([0,T];q)\) and \(\varsigma \in {\mathcal {C}}^1([0,T],{\mathbb {R}})\), by a variation of the pair \((X,\tau )\in {\mathcal {A}}_g([0,T];q, \mu ;e)\) along \((v,\varsigma )\), we mean a family of pairs \(\{(X_\epsilon ^{v,\varsigma },\tau ^{\varsigma }_\epsilon )\}_{\epsilon \in (-\varepsilon ,\varepsilon )}\) where \(\tau ^{\varsigma }_0 = \tau \), \(\frac{\partial }{\partial t}\tau ^{\varsigma }_\epsilon >0\), such that for each \(\epsilon \), \(\frac{\partial }{\partial \epsilon }\tau ^{\varsigma }_\epsilon |_{\epsilon =0} =\varsigma \), \(X_\epsilon ^{v,\varsigma }\in I_{(\tau ^{\varsigma }_\epsilon (0),q)}^{(\tau ^{\varsigma }_\epsilon (T),\mu )}(M)\), and for each \(t\in [\tau ^{\varsigma }_\epsilon (0),\tau ^{\varsigma }_\epsilon (T)]\), \({\textbf{E}}E_0(t, X_\epsilon ^{v,\varsigma }(t), D_\nabla X_\epsilon ^{v,\varsigma }(t)) = e\), \(X^{v,\varsigma }_\epsilon (t)\) satisfies the ODE
Define a functional \({\mathcal {I}}: {\mathcal {A}}_g([0,T];q, \mu ;e) \rightarrow {\mathbb {R}}\) by
The pair \((X,\tau )\in {\mathcal {A}}_g([0,T];q, \mu ;e)\) is called a stationary point of \({\mathcal {I}}\), if
As in Lemma 7.13, it is easy to deduce from (C.1) that \(QX^{v,\varsigma }_\epsilon (t) = \check{g}(X^{v,\varsigma }_\epsilon (t))\) for each \(t\in [\tau ^{\varsigma }_\epsilon (0),\tau ^{\varsigma }_\epsilon (T)]\) so that \(X^{v,\varsigma }_\epsilon \in {\mathcal {A}}_g([0,T];q, \mu ;e)\). Moreover, formula (7.13) still holds for all \(t\in [\tau (0),\tau (T)]\), with \(X^{v,\varsigma }_\epsilon \) in place of \(X^v_\epsilon \).
Lemma C.2
Keep the notations in Definition C.1. Then, in normal coordinates \((x^i)\) we have
Proof
Without loss of generality, we assume \(\tau ^{\varsigma }_\epsilon (s)\ge \tau (s)\). It follows from (C.1) and Definition 2.5 that
Done. \(\square \)
Theorem C.3
(Stochastic Maupertuis’s principle) Let \(L_0\) be a regular Lagrangian on \({\mathbb {R}}\times TM\). Let \(X\in I_{(0,q)}^{(T,\mu )}(M)\) such that \((X,\textbf{Id}_{[0,T]})\in {\mathcal {A}}_g([0,T];q,\mu ;e)\). Then, the pair \((X,\textbf{Id}_{[0,T]})\) is a stationary point of \({\mathcal {I}}\) if and only if X satisfies the stochastic Euler–Lagrange equation (7.22).
Proof
Since all diffusions in \({\mathcal {A}}_g([0,T];q, \mu ;e)\) have the same average energy e, we have
Denote \(V(t) = \Gamma (X)_0^t v(t)\). As in (7.23),
We apply (7.24) and notice that in the present situation we do not have \(v(0) = v(T) = 0\) in general. Hence,
On the other hand, since for all \(\epsilon \), \(X_\epsilon ^{v,\varsigma }(\tau ^{\varsigma }_\epsilon (0))=q\) and \({\textbf{P}}\circ (X_\epsilon ^{v,\varsigma }(\tau ^{\varsigma }_\epsilon (T)))^{-1}=\mu \), it follows from Lemma C.2 that
Therefore,
By the definition of the energy \(E_0\), we know that
The result follows. \(\square \)
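For orientation, the classical identity behind the first step of the proof can be stated as follows. We assume that \(E_0\) is the usual energy of \(L_0\), namely \(E_0 = \dot{x}^i\, \partial L_0 / \partial \dot{x}^i - L_0\), and that \({\mathcal {I}}\) is the stochastic analogue of the abbreviated action on the left-hand side below.

\[
\int _0^T \dot{x}^i\, \frac{\partial L_0}{\partial \dot{x}^i}\, dt = \int _0^T L_0\, dt + e\, T \qquad \text{whenever } E_0 \equiv e \text{ along the path.}
\]

For a mechanical Lagrangian \(L_0 = \tfrac{1}{2}|\dot{x}|^2 - U(x)\), with potential \(U\) chosen here for illustration, the energy is \(E_0 = \tfrac{1}{2}|\dot{x}|^2 + U(x)\) and the abbreviated action becomes \(\int _0^T |\dot{x}|^2\, dt = \int _0^T 2\,(e - U(x))\, dt\). On the constant-energy path space, the two functionals therefore differ only by the term \(e\,T\), which depends on the path only through the elapsed time; this is why variations of the time parametrization have to be admitted.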
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
About this article
Cite this article
Huang, Q., Zambrini, JC. From Second-Order Differential Geometry to Stochastic Geometric Mechanics. J Nonlinear Sci 33, 67 (2023). https://doi.org/10.1007/s00332-023-09917-x
DOI: https://doi.org/10.1007/s00332-023-09917-x
Keywords
- Stochastic Hamiltonian mechanics
- Stochastic Lagrangian mechanics
- Hamilton–Jacobi–Bellman equations
- Stochastic Hamilton’s equations
- Stochastic Euler–Lagrange equation
- Stochastic Noether’s theorem
- Schrödinger’s problem
- Second-order differential geometry