Abstract
We overview recent results on generalizations of the Wasserstein 2-metric, originally defined on the space of scalar probability densities, to the space of Hermitian matrices and of matrix-valued distributions, as well as some extensions of the theory to vector-valued distributions and discrete spaces (weighted graphs).
This project was supported by ARO grant (W911NF-17-1-0429), AFOSR grants (FA9550-15-1-0045 and FA9550-17-1-0435), NSF (ECCS-1509387), NIH (P41-RR-013218, P41-EB-015902, 1U24CA18092401A1), and a postdoctoral fellowship at MSKCC.
Access provided by CONRICYT-eBooks. Download chapter PDF
Similar content being viewed by others
10.1 Introduction
Optimal mass transport (OMT) is currently a very active area of research with applications to areas both applied and theoretical including control, transportation, econometrics, fluid dynamics, probability theory, statistical physics, shape optimization, expert systems, and meteorology; see [25, 30] for extensive lists of references. The original problem was first formulated by the civil engineer Gaspar Monge in 1781, and concerned finding the optimal way, in the sense of minimal transportation cost, of moving a pile of soil from one site to another. Much later the problem was extensively analyzed by Kantorovich [18], and is now known as the Monge–Kantorovich (MK) or optimal mass transport (OMT) problem.
In this paper, we present certain generalizations of OMT to matrix and vector-valued transportation. Our original motivation for this rather nontraditional viewpoint was provided by problems in Signal Analysis, more specifically, the need of a weakly continuous metric to compare (matrix-valued) power spectra of multivariate time series (see [24]). Soon afterward it became apparent Quantum Mechanics was another field that would stand to benefit from such an unusual extension of OMT. In fact, it was this latter subject that provided some of the clues of how to properly set up noncommutative OMT.
The basis of the new theory is a suitable extension of the Liouville (continuity) equation that allows flows in matrix or other spaces. To this end, in [8], we first proposed such a continuity equation and a noncommutative counterpart of OMT where probability distributions are replaced by density matrices (i.e., Hermitian positive-definite matrices with unit trace). The appropriate Wasserstein metric now corresponds to the minimal value of an action integral evaluated on flows connecting end-point density matrices. The key insight, to use such a dynamic formulation in seeking the needed generality was provided by the seminal approach of Benamou and Brenier [3]. Indeed, the Benamou–Brenier formulation recasts OMT as a stochastic control problem. The work we are reporting herein takes this idea along several different directions, and in particular to OMT between matrices, matricial distribution and vectorial distributions [8, 10]. Extensions of these results to distributions that have end-point distributions of unequal overall mass (unbalanced) are reported in [11] (and not included in the current survey).
We note that at about the same time as [8] was originally reported, closely related approaches were formulated independently and simultaneously in [6, 20]. In fact, in our work, we greatly benefited from earlier work by Carlen and Maas [7] on a fermionic Fokker–Planck equation.
10.2 Quantum Continuity Equation
The three papers [6, 8, 20] all begin with the Lindblad equation that describes the evolution of open quantum systems. Open quantum systems are thought of as coupled to a larger system (referred to as the environment or the ancilla) and, thereby, cannot, in general, be described by the Schrödinger equation [17]. In this case, the evolution of density operators \(\rho \) [17] is given by the Lindblad equation
where \(^*\) denotes conjugate transpose, and throughout, we assume that \(\hbar =1\). The first term on the right-hand side describes the evolution of the state under the effect of the Hamiltonian H (Schrödinger unitary evolution). The other terms on the right-hand side model diffusion and, thereby, capture the dissipation of energy–they constitute the quantum analogue of Laplace’s operator \(\varDelta \) and are referred to as the Lindblad terms.
Our approach [8, 9] relies on a suitable continuity equation in the space of Hermitian matrices \(\mathscr {H}\) (of a given dimension). To this end, we invoke suitable definitions for the gradient \(\nabla _L\) and divergence \(\nabla ^*_L\) operators on spaces of matrices that are explained below and express the continuity equation in the familiar form
where J is a matricial flux, in complete analogy with the continuity equation on scalar densities.
Throughout, \(\rho (t)\in \mathscr {H}\) is a positive-semidefinite matrix of trace one, i.e., a density matrix of quantum mechanics. Regarding notation, we let \({\mathscr {H}}_+\) and \({\mathscr {H}}_{++}\) denote the cones of nonnegative and positive-definite matrices, respectively,
the space of density matrices, and \({\mathscr {S}}\) the space of skew-Hermitian matrices (of the same dimension as \({\mathscr {H}}\)). The flux J is taken in \({\mathscr {S}}^N\), i.e., a vector with matrix entries. Flux typically arises in the form
The symbol \(\rho \circ v\) denotes one of several possible choices of noncommutative multiplication. We have considered specifically the following two choices, referred to as the anticommutator multiplication (i) and Kubo-Mori product (ii), respectively:
where, for \(\rho \in {\mathscr {H}}\) and \(v\in {\mathscr {S}}^N\),
On the other hand, we define the gradient operator with respect to \(L\in {\mathscr {H}}^N\) to be
With respect to the standard Hilbert–Schmidt inner product \(\langle X, Y\rangle =\text {tr}(X^*Y)\) (and, for the case when X, Y are in \({\mathscr {H}}^N\) or \({\mathscr {S}}^N\), the inner product \(\langle X, Y\rangle =\sum _{k=1}^N \text {tr}(X_k^*Y_k)\)), the divergence operator turns out to be
and this is what is used in (10.2). We note that for technical reasons, the definition of gradient and divergence require that \(L_k=L_k^*\), i.e., \(L\in {\mathscr {H}}^N\), as above. Also, one can easily verify that \(\nabla _L\) is a derivation, in that,Footnote 1
With these definitions in place, we define the (matricial) Laplacian as
which is exactly (after scaling by 1 / 2) the diffusion term in the Lindblad equationFootnote 2 (10.1). Hence, Lindblad’s equation can be rewritten as
10.3 Matricial Wasserstein 2-Metric
From here on we consider the continuity equation,
without the diffusion term, but for a general velocity field \(v\in {\mathscr {S}}^N\). A tacit assumption throughout is that the identity matrix I spans the null space or \(\nabla _L\); this can be ensured if one chooses \(L_1,\ldots ,L_N\) to form a basis of \({\mathscr {H}}\) (which is a sufficient, but not a necessary condition).
A Wasserstein distance between two density matrices can now be defined as the least action (minimum control problem) to steer one density matrix to another,
In this, \(\rho _0\) and \(\rho _1\) in \({\mathscr {D}}_+\) and the optimization is over \(\rho (\cdot )\in {\mathscr {D}}_+\) and \(v\in {\mathscr {S}}^N\). In fact, for \(v\in {\mathscr {S}}^N\), (10.3) already preserves positive definiteness and trace of \(\rho (\cdot )\).
The choice of the anticommutator product \(\rho \circ v = \frac{1}{2}(\rho v+v\rho )\) is especially appealing since, in this case, the matricial Wasserstein metric (10.4) is readily computable. Indeed, (10.4) can be cast as a convex optimization problem in a manner analogous to that in the scalar case [3]. To see this, let \(u:=\rho v=[u_1^*, \ldots ,u_N^*]^*\) and \(u_*:=[u_1, \ldots ,u_N]^*\), and observe that
Thus, (10.4) can be equivalently expressed as
In this, it turns out that although we do not require any structural constraint on u, the optimal u satisfies \(u=\rho v\) for some \(v\in {\mathscr {S}}^N\).
The choice of the Kubo-Mori product, on the other hand, provides a matricial version of the Wasserstein metric for which the gradient flow of the von Neuman entropy \(\text {tr}(\rho \log (\rho ))\) is the Lindblad equation [6, 8, 20]. Thus, it generalizes to the noncommutative setting, the famous result by Jordan, Kinderlehrer, and Otto [16] for the ordinary Wasserstein-2 metric on probability densities where the heat equation is the gradient flow of the entropy. However, it is interesting to note that computation of the Wasserstein metric for the Kubo-Mori product appears challenging as compared to the one based on the anticommutator product above.
To characterize the form of minimizer one can proceed to consider the dual problem, which for the case of the anticommutator product goes as follows. With \(\lambda (\cdot ) \in {\mathscr {H}}\) a smooth Lagrangian multiplier for the constraints we construct the Lagrangian
Point-wise minimization over v yields
and the expression for the corresponding minimum
from which we conclude the following sufficient condition for optimality: Suppose there exists \(\lambda (\cdot )\in {\mathscr {H}}\) satisfying
such that the solution of
matches the marginals \(\rho (0)=\rho _0, \rho (1)=\rho _1\). Then the pair \((\rho , v)\) with \(v=-\nabla _L \lambda \) solves (10.4).
The Wasserstein metric induces a Riemannian structure
on the tangent space of Hermitian matrices with a specified trace,
Here \(\lambda _j,~j=1, 2\) is the solution to the Poisson equation
The proof of existence and uniqueness of the solution of (10.7) follows exactly along the same lines as in [7]; details are given in [9]. In fact, given a tangent direction \(\delta \), \(-\nabla _L \lambda \) is the unique minimizer of \(\text {tr}(\rho v^*v)\) over all \(v\in {\mathscr {S}}^N\) satisfying
With the above definition of inner product, \(W_{2}(\cdot , \cdot )\) indeed defines a metric on \({\mathscr {D}}_+\) for which \({\mathscr {D}}_+\) is a geodesic space, i.e., the distance between two given \(\rho _0,\rho _1\in {\mathscr {D}}_+\) can be rewritten as
where the minimum is taken over all piecewise smooth paths on the manifold \({\mathscr {D}}_+\).
We finally note that, more generally, OMT can be formulated on the space of matrix-valued distributions. In this case, the mass constraint becomes \(\int \text {tr}\rho (x)dx=1\), where x represents a vector of spatial coordinates and dx the volume element. Transport along spatial coordinates, e.g., with \( x\in {\mathbb R}^m\), is effected by a term \( \nabla _x\cdot (\rho \circ w)\) in the continuity equation, with \(w\in {\mathscr {H}}^m\), i.e.,
Likewise, the cost of transport is duly penalized in a corresponding problem to minimize a suitable action integral; see [8] for details.
10.4 Vector-Valued Transport on \({\mathbb R}^N\)
A vector-valued density \(\rho =[\rho _1,\rho _2,\cdots ,\rho _\ell ]^T\), on \({\mathbb R}^N\) or on a discrete space, may represent power reflected off a surface at different frequencies/colors. The “mass” of these components may transfer between different entries of the density vector (e.g., due to different angles of reflection) along time flows of the vectorial density. Thus, while the total power may be invariant (under some lighting conditions), the proportion of power at different frequencies or polarization may smoothly vary with viewing angle. As another example consider the case where the entries of \(\rho \) represent densities of different species, or particles, and allow for the possibility that mass transfers from one species to another (“mutate”), i.e., between entries of \(\rho \). Thus, in general, we postulate that transport of vector-valued quantities captures flow across space as well as between entries of the density vector.
In this context, an OMT-inspired geometry is aimed to express a suitable continuity and to quantify transport cost for such vectorial distributions. We highlight some of the key elements in [10] for such a theory. It follows a line which is analogous to development of quantum transport that was discussed above.
We begin by considering a vector-valued density \(\rho \) on \({\mathbb R}^N\), i.e., a map from \({\mathbb R}^N\) to \({\mathbb R}^\ell _+\) such that
and consider the entries of \(\rho \) as representing density or mass of species/particles that can mutate between one another while maintaining total mass. We denote the set of all vector-valued densities and its interior by \({\mathscr {D}}\) and \({\mathscr {D}}_+\), respectively. The dynamics are described by the following continuity equation:
Here \(v_i\) is the velocity field of particles i and \(w_{ij}\ge 0\) is the transfer rate from i to j. Equation (10.8) allows for the possibility to mutate between each pair of entries. More generally, mass transfer may only be permissible between specific types of particles and can be modeled by a graph \({\mathscr {G}}=(\mathcal{V},\mathcal{E})\) (where the entries denote nodes and edges, respectively), in which case, for a subset of indices, the transfer rates \(w_{ji}\) may be restricted to be zero.
Given \(\mu ,\nu \in {\mathscr {D}}_+\), we formulate the optimal mass transport
The coefficient \(\gamma >0\) specifies the relative cost between transporting mass in space and trading mass between different types of particles. When \(\gamma \) is large, the solution reduces to independent OMT problems for the different entries to the degree possible. In general, \(W_{2}\) is a quasi-metric in that it satisfies the triangle inequality and positivity, but may not be symmetric.
Setting \(p_{ij}=\rho _i w_{ij}\ge 0\) and \(u_i=\rho _i v_i\), we have \( \rho _iw_{ij}^2=\rho _i^{-1} p_{ij}^2, \) and \( \rho _i\Vert v_i\Vert ^2=\rho _i^{-1}\Vert u_i\Vert ^2. \) It follows that
Finally, a Riemannian-like metric on \({\mathscr {D}}_+\) can be obtained by symmetrizing the above expression [10]. This is,
under the same constraints. This vector-valued OMT structure is further explored and developed in [10].
10.5 Vector-Valued Transport on Graphs
We conclude by highlighting elements of an OMT theory solely on graphs, cast in the setting of vector-valued densities [10]. As explained earlier, such densities may represent the distribution of multiple species/resources that are allowed to mutate between each other as they transition from node to node. The theory is aimed to capture cost of transport in such a setting.
A vector-valued mass distribution on the graph \({\mathscr {G}}=(\mathcal{V},\mathcal{E})\) (with n nodes and m edges) is a \(\ell \)-tuple \(\rho =(\rho _1,\cdots ,\rho _\ell )\) with each \(\rho _i=(\rho _{i,1},\cdots ,\rho _{i,n})^T\) being a vector in \({\mathbb R}_+^n\) such that
That is, each entry \(\rho _i\), for \(i\in \{1,\ldots ,\ell \}\), is a vector with nonnegative n-entries representing, e.g., color intensity for the i-th color, at the node corresponding to the respective entry. We denote the set of all nonnegative vector-valued mass distributions with \({\mathscr {D}}\) and its interior with \({\mathscr {D}}_+\). Equation (10.8) is now replaced by the following continuity equation:
since the spatial domain is now also discrete (i.e., it is \({\mathscr {G}}\) instead of \({\mathbb R}^N\)). Here, \(D=D_1-D_2\) is the incident matrix of the graph, with \(D_1,D_2\) are matrices with positive entries reflecting the position of sources (\(D_1\)) and sinks (\(D_2\)) by a entry equal to 1 in the corresponding place. Thus, the vector \(D_1^T\rho \) represents density at the sources of an edge and, likewise, \(D_2^T\rho \) represents density at the sinks. Then, also, \(\nabla _{\mathscr {G}}\) represents differencing between neighboring nodes and \(\nabla ^*_{\mathscr {G}}\) represents its dual (i.e., negative divergence) [10]. Finally, \(\circ \) represents entry-wise multiplication (Shur) between two vectors.
Now following the Benamou–Brenier [3] philosophy once again, given two marginal densities \(\mu , \nu \in {\mathscr {D}}_+\), we define their Wasserstein distance as
subject to (10.11) as well as \(w_{ij} \ge 0, ~v_i \ge 0, ~\bar{v}_i \ge 0\) for all (or a subset) of (i, j)’s and \(\rho (0)=\mu , ~ \rho (1)=\nu .\)
We should note that the problem of transporting vector-valued mass on a graph is quite simpler than in the case where the underlined space is continuous, since it reduces essentially to a scalar mass situation on a suitably larger graph. Indeed, we can view the vector-valued mass as a scalar mass distribution on \(\ell \) identical layers of the graph \({\mathscr {G}}\) where the same nodes at different layers are connected through a complete graph. The two velocity fields v, w represent mass transfer within the same layer and between different layers, respectively.
Once again, the computation of the metric has a convex formulation by changing optimization variable from \((\rho ,v_i,\bar{v}_i, w_ij,k)\) to momenta “mass” \(\rho \) and momenta \(u_i=\rho _iv_i\) and \(p_{ij}=\rho _i w_{ij}\), instead.
10.6 Conclusion
The basic fluid dynamical formulation of OMT can be generalized to flows on the space of matrices or vectors, that belong to a simplex of a suitable positive cone. A Wasserstein metric in these spaces can then be defined as a minimal quadratic cost for transferring between two end points. Such metrics appear natural as, in particular, for the space of quantum density matrices, render the Lindblad equation as the gradient flow of the von Neumann entropy. Our interest stems from problems in signal analysis and, more specifically, spectral and image analysis. In both of these application areas, the relevance of weakly continuous metric that can be used to quantify distances between, e.g., matrix-valued power spectra or multicolor images, is self-evident. In particular, geodesics in such spaces naturally model flows and allow morphing between spectra and images, respectively.
Notes
- 1.
The domain of \(\nabla _L\) is \({\mathscr {H}}\), hence the identity requires \(XY+YX\), instead of simply XY.
- 2.
The Lindblad term is in the so-called symmetric form since the coefficients are Hermitian.
References
Ambrosio, L.: Euro Summer School Mathematical Aspects of Evolving Interfaces. Lecture Notes on Optimal Transport Theory. CIME Series of Springer Lecture Notes. Madeira, Portugal, Springer-Verlag, New York (2000)
Angenent, S., Haker, S., Tannenbaum, A.: Minimizing flows for the Monge-Kantorovich problem. SIAM J. Math. Anal. 35, 61–97 (2003)
Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem. Numerische Mathematik 84, 375–393 (2000)
Benamou, J.-D.: Numerical resolution of an unbalanced mass transport problem. ESAIM. Math. Model. Numer. Anal. 37(5), 851–868 (2010)
Carlier, G., Salomon, J.: A monotonic algorithm for the optimal control of the Fokker-Planck equation. In: IEEE Conference on Decision and Control (2008)
Carlen, E., Maas, J.: Gradient flow and entropy inequalities for quantum Markov semigroups with detailed balance (2016). https://arxiv.org/abs/1609.01254
Carlen, E., Maas, J.: An analog of the 2-Wasserstein metric in non-commutative probability under which the fermionic Fokker-Planck equation is gradient flow for the entropy. Commun. Math. Phys. 331, 887–926 (2014)
Chen, Y., Georgiou, T.T., Tannenbaum, A.: Matrix optimal mass transport: a quantum mechanical approach (2016). https://arxiv.org/abs/1610.03041
Chen, Y., Gangbo, W., Georgiou, T.T., Tannenbaum, A.: On the matrix Monge-Kantorovich problem. https://arxiv.org/abs/1701.02826
Chen, Y., Georgiou, T.T., Tannenbaum, A.: Transport distance on graphs and vector-valued optimal mass transport (2016). https://arxiv.org/pdf/1611.09946v1.pdf
Chen, Y., Georgiou, T.T., Tannenbaum, A.: Interpolation of density matrices and matrix-valued measures: the unbalanced case (2017). https://arxiv.org/abs/1612.05914
Chen, Y., Georgiou, T.T., Pavon, M.: On the relation between optimal transport and Schrödinger bridges: a stochastic control viewpoint. J. Optim. Theory Appl. 169(2), 671–691 (2016)
Chen, Y., Georgiou, T.T., Pavon, M., Tannenbaum, A.: Robust transport over networks. IEEE Trans. Autom. Control (2016). https://doi.org/10.1109/TAC.2016.2626796
Chen, Y., Georgiou, T.T., Pavon, M.: Entropic and displacement interpolation: a computational approach using the Hilbert metric. SIAM J. Appl. Math. 76(6), 2375–2396 (2016)
Evans, L.C.: Partial Differential Equations and Monge-Kantorovich Mass Transfer, in Current Developments in Mathematics, pp. 65–126. International Press, Boston, MA (1999)
Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker-Planck equation. SIAM J. Math. Anal. 29, 1–17 (1998)
Gustafson, S., Sigal, I.M.: Mathematical Concepts of Quantum Mechanics. Springer, New York (2011)
Kantorovich, L.V.: On a problem of Monge. Uspekhi Mat. Nauk. 3, 225–226 (1948)
Kumar, A., Tannenbaum, A., Balas, G.: Optical flow: a curve evolution approach. IEEE Trans. Image Process. 5, 598–611 (1996)
Mittnenzweig, M., Mielke, A.: An entropic gradient structure for Lindblad equations and GENERIC for quantum systems coupled to macroscopic models (2016). https://arxiv.org/abs/1609.05765
Léonard, C.: From the Schrödinger problem to the Monge-Kantorovich problem. J. Funct. Anal. 262, 1879–1920 (2012)
Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)
McCann, R.: Existence and uniqueness of monotone measure-preserving maps. Duke Math. J. 80, 309–323 (1995)
Ning, L., Georgiou, T., Tannenbaum, A.: On matrix-valued Monge-Kantorovich optimal mass transport. IEEE Trans. Autom. Control 60(2), 373–382 (2015)
Rachev, S., Rüschendorf, L.: Mass Transportation Problems, vol. I. Springer-Verlag, New York (1998) (Probab. Appl.)
Sandhu, R., Georgiou, T., Reznik, E., Zhu, L., Kolesov, I., Senbabaoglu, Y., Tannenbaum, A.: Graph curvature for differentiating cancer networks. Sci. Rep. (Nat.), 5, 12323 (2015). https://doi.org/10.1038/srep12323
Sandhu, R., Georgiou, T., Tannenbaum, A.: Ricci curvature: an economic indicator for market fragility and systemic risk. Sci. Adv. 2 (2016). https://doi.org/10.1126/sciadv.1501495
Tannenbaum, E., Georgiou, T., Tannenbaum, A.: Optimal mass transport for problems in control, statistical estimation, and image processing. In: Dym, H., de Oliveira, M.C., Putinar, M. (eds.) Mathematical Methods in Systems, Optimization, and Control. Birkhauser, Basel (2012)
Tannenbaum, E., Georgiou, T., Tannenbaum, A.: Signals and control aspects of optimal mass transport and the Boltzmann entropy. In: 49th IEEE Conference on Decision and Control, pp. 1885–1890 (2010)
Villani, C.: Topics in Optimal Transportation, Graduate Studies in Mathematics, vol. 58. AMS, Providence, RI (2003)
Villani, C.: Trend to equilibirum for dissipative equations, functional inequalities and mass transportation. In: de Carvalho, M., Rodrigues, J-F. (eds.) Contemporary Mathematics: Recent Advances in the Theory and Applications of Mass Transport. American Mathematical Society Publications (2004)
Villani, C.: Optimal Transport, Old and New. Springer, New York (2008)
Wang, C., Jonckheere, E., Banirazi, R.: Wireless network capacity versus Ollivier-Ricci curvature under Heat Diffusion (HD) protocol. In: Proceedings of ACC (2013)
Yamamoto, K., Chen, Y., Ning, L., Georgiou, T., Tannenbaum, A.: Regularization and interpolation of positive matrices (2016). https://arxiv.org/abs/1611.07945
Author information
Authors and Affiliations
Corresponding author
Editor information
Editors and Affiliations
Rights and permissions
Copyright information
© 2018 Springer International Publishing AG
About this chapter
Cite this chapter
Chen, Y., Georgiou, T.T., Tannenbaum, A. (2018). Wasserstein Geometry of Quantum States and Optimal Transport of Matrix-Valued Measures. In: Tempo, R., Yurkovich, S., Misra, P. (eds) Emerging Applications of Control and Systems Theory. Lecture Notes in Control and Information Sciences - Proceedings. Springer, Cham. https://doi.org/10.1007/978-3-319-67068-3_10
Download citation
DOI: https://doi.org/10.1007/978-3-319-67068-3_10
Published:
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-67067-6
Online ISBN: 978-3-319-67068-3
eBook Packages: EngineeringEngineering (R0)