Abstract
Information geometry has emerged from the study of the invariant structure in families of probability distributions. This invariance uniquely determines a second-order symmetric tensor g and a third-order symmetric tensor T on a manifold of probability distributions. The pair (g, T) defines a Riemannian metric and a pair of affine connections which together preserve the metric. Information geometry thus studies a Riemannian manifold equipped with a pair of dual affine connections. Such a structure also arises from an asymmetric divergence function and from affine differential geometry. A dually flat Riemannian manifold is particularly useful in applications, because a generalized Pythagorean theorem and a projection theorem hold. The Wasserstein distance gives another important geometry on probability distributions, which is non-invariant but reflects the metric properties of the sample space. I attempt to construct an information geometry of the entropy-regularized Wasserstein distance.
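The generalized Pythagorean theorem mentioned in the abstract can be checked numerically in one familiar dually flat setting. The sketch below is an illustration under stated assumptions, not the article's own construction: it assumes a finite sample space, uses the Kullback-Leibler divergence as the canonical divergence, and takes the family of product (independent) distributions as the flat submanifold. Under those assumptions, the projection of a joint distribution p onto the family is the product of its marginals p*, and KL(p‖q) = KL(p‖p*) + KL(p*‖q) for every product distribution q. The numbers in `p` and `q` and the helper names `kl` and `product` are hypothetical, chosen only for the example.

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions (flat lists)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def product(a, b):
    """Joint distribution of two independent variables, flattened row by row."""
    return [ai * bj for ai in a for bj in b]

# Hypothetical 2x2 joint distribution p(x, y), flattened row by row.
p = [0.4, 0.1, 0.2, 0.3]

# Marginals of p over x (rows) and y (columns).
px = [p[0] + p[1], p[2] + p[3]]
py = [p[0] + p[2], p[1] + p[3]]

# Projection of p onto the independence family: the product of its marginals.
p_star = product(px, py)

# An arbitrary (hypothetical) member of the independence family.
q = product([0.7, 0.3], [0.25, 0.75])

lhs = kl(p, q)
rhs = kl(p, p_star) + kl(p_star, q)
print(lhs, rhs)  # the two values agree up to floating-point rounding
```

The identity holds exactly here because log q splits into a sum of functions of x and y alone, and p and p* share the same marginals; this is the orthogonality that the Pythagorean relation expresses.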
Additional information
Communicated by: Toshiyuki Kobayashi
This article is based on the 23rd Takagi Lectures that the author delivered at Research Institute for Mathematical Sciences, Kyoto University on June 8, 2019.
About this article
Cite this article
Amari, Si. Information geometry. Jpn. J. Math. 16, 1–48 (2021). https://doi.org/10.1007/s11537-020-1920-5
DOI: https://doi.org/10.1007/s11537-020-1920-5
Keywords and phrases
- canonical divergence
- dual affine connection
- information geometry
- Pythagorean theorem
- semiparametric statistics
- Wasserstein geometry