1 Introduction

Information geometry [1,2,3] uses the tools of differential geometry to study spaces of probability distributions by regarding them as differential manifolds, called statistical manifolds. When these distributions are parametric, a structure of interest is the Fisher metric, a Riemannian metric induced by the Fisher information matrix. This is essentially the unique Riemannian metric on statistical manifolds that is invariant under sufficient statistics [4, 5], making it a natural choice to study the geometry of these manifolds. Moreover, this structure allows one to define the Fisher–Rao distance between two probability distributions on the same statistical manifold as the geodesic distance between them, i.e., the length of the shortest path joining them according to the Fisher metric.

The idea of considering the geodesic distance in Riemannian manifolds equipped with the Fisher metric was first suggested by Hotelling in 1930 [6] (reprinted in [7]), and later in 1945 in a landmark paper by Rao [8] (reprinted in [9]). In the following years, this influenced many authors to study the Fisher–Rao distance in different families of probability distributions [10,11,12,13,14,15,16,17,18,19,20,21,22,23]. It is important to note that, contrary to commonly used divergence measures, such as the Kullback–Leibler divergence, the Fisher–Rao distance is a proper distance, i.e., it is symmetric and the triangle inequality holds—properties that may be required when comparing two distributions, depending on the application.

More recently, the Fisher–Rao distance has gained attention especially in applications to machine learning problems. In the context of unsupervised learning, it has been used for clustering different types of data: shape clustering applied to morphometry [24], clustering of financial returns [25], image segmentation [20] and identification of diseases from medical data [26, 27]. When it comes to supervised learning, it has been used to analyse the geometry in the latent space of generative models [28], to enhance robustness against adversarial attacks [29, 30], in detection of out-of-distribution samples [31], as a loss function for learning under label noise [32], and to classify EEG signals of brain-computer interfaces [33].

However, finding closed-form expressions for the Fisher–Rao distance of arbitrary distributions is not a trivial task—as a matter of fact, more generally, finding geodesics in an arbitrary manifold is a difficult problem in differential geometry. With that in mind, in this work, we collect examples of statistical models for which closed-form expressions for the Fisher–Rao distance are available. Most of them have been published over the last decades, in different places, and we aim to present these results in a unified and accessible language, hoping to bring them to a broader audience. In curating this collection, we also add a few contributions, such as: illustrating the relation between the manifold of negative multinomial distributions and the hyperbolic model; including a few examples which, to the best of the authors’ knowledge, are new (Rayleigh, Erlang, Laplace, generalised Gaussian, power function, inverse Wishart); and writing more examples in the standard form of univariate elliptical distributions (Laplace, generalised Gaussian, logistic). Finally, we note that numerical methods have been proposed to compute the Fisher–Rao distance when no closed-form expression is available, as in [26, 27, 34,35,36,37,38], but those techniques are beyond the scope of the present work.

We review preliminaries of information geometry in Sect. 2 and of hyperbolic geometry in Sect. 3. We then collect closed-form expressions for discrete distributions in Sect. 4, and for continuous distributions in Sect. 5. Product distributions are discussed in Sect. 6, and Sect. 7 concludes the paper.

Notation.

We denote the sets \(\mathbb {N}= \{ 0, 1, 2, \dots \}\), \(\mathbb {N}^* = \{ 1, 2, 3, \dots \}\), \(\mathbb {R}_+ = [0, \infty [\), and \(\mathbb {R}_+^* = ] 0, \infty [\). \(\mathbbm {1}_A(x)\) is the indicator function, which takes value 1 if \(x \in A\), and 0 otherwise. \(\delta _{ij} {:}{=}\mathbbm {1}_{\{j\}}(i)\) denotes the Kronecker delta. We denote \(\dot{x}(t) {:}{=}\frac{\textrm{d}}{\textrm{d}t}x(t)\). \(P_n(\mathbb {R})\) denotes the cone of \(n \times n\) real symmetric positive-definite matrices.

2 Information geometry preliminaries

Let \((\Omega , \mathcal {G}, P)\) be a probability space and \(X :\Omega \rightarrow \mathcal {X}\) a random variable in the \(\sigma \)-finite measure space \((\mathcal {X}, \mathcal {F}, \mu )\). The push-forward measure of P by X is given by \(X_*P(E) {:}{=}P(X^{-1}(E))\), for \(E \in \mathcal {F}\), and we assume that \(X_*P\) is absolutely continuous with respect to \(\mu \). The Radon–Nikodym derivative \(p {:}{=}\frac{\textrm{d}X_*P}{\textrm{d}\mu } :\mathcal {X}\rightarrow \mathbb {R}\) can be seen as the probability mass or density function (p.m.f. or p.d.f.), respectively, in the cases that \(\mathcal {X}\) is discrete or continuous. When \(\mathcal {X}\) is discrete, we take \(\mu \) as the counting measure, and the integral with respect to \(\mu \) becomes a summation; when \(\mathcal {X}= \mathbb {R}^n\), we take \(\mu \) as the Lebesgue measure.

A statistical model [1]

$$\begin{aligned} \mathcal {S}{:}{=}\left\{ p_{\xi } = p(x; \xi ) \;\Big \vert \;\xi = (\xi ^1, \dots , \xi ^n) \in \Xi \subseteq \mathbb {R}^n \right\} \end{aligned}$$
(1)

is a family of probability distributions \(p_\xi \) parametrised by n-dimensional vectors \(\xi = \left( \xi ^1, \dots , \xi ^n \right) \) such that the mapping \(\xi \mapsto p_\xi \) is injective and \(\Xi \) is an open set of \(\mathbb {R}^n\). We consider statistical models in which the support of \(p_\xi \) does not depend on \(\xi \) and we take \(\mathcal {X}= {{\,\textrm{supp}\,}}p_\xi \), unless otherwise stated. Note that \(\mathcal {S}\) is contained in the infinite-dimensional space \(\mathcal {P}(\mathcal {X}) {:}{=}\left\{ p \in L^1(\mu ) \;\Big \vert \;p > 0, \ \int _{\mathcal {X}} p~\textrm{d}\mu =1 \right\} \) of positive, \(\mu \)-integrable functions of unit total measure.

Fig. 1

Schematic representing the parametrisation \(\varphi \) from the parameter space \(\Xi \) to the statistical manifold \(\mathcal {S}\). The curve \(\gamma (t)\) joins two points in the manifold, which are probability density (or mass) functions

To introduce a differentiable structure in \(\mathcal {S}\), we consider the following assumptions: 1) the parametrisation \(\varphi :\Xi \rightarrow \mathcal {P}(\mathcal {X}),\ \varphi (\xi ) = p_\xi \) is a homeomorphism on its image; 2) denoting \(\partial _i {:}{=}\frac{\partial }{\partial \xi ^i}\), the functions \(\left\{ \partial _1 p_\xi , \dots , \partial _n p_\xi \right\} \) are linearly independent; 3) the mapping \(\xi \mapsto p_\xi (x)\) is smooth, for all \(x \in \mathcal {X}\); 4) the partial derivatives \(\partial _i p_\xi (x)\) commute with the integrals. Moreover, by considering diffeomorphic parametrisations as equivalent, \(\mathcal {S}\) becomes a differentiable manifold, that we may call a statistical manifold.Footnote 1 Note that the parametrisation \(\xi \mapsto p_\xi \) is a global coordinate system for this manifold (see Fig. 1).

We can further equip the statistical manifold \(\mathcal {S}\) with a Riemannian metric. Denoting \(\ell (\xi ) {:}{=}\log p_\xi \) the log-likelihood function, the elements of the Fisher information matrix (or simply Fisher matrix) \(G(\xi ) = \left[ g_{ij}(\xi ) \right] _{i,j}\) are defined as

$$\begin{aligned} g_{ij} {:}{=}g_{ij}(\xi ) {:}{=}\mathbb {E}\left[ \partial _i \ell (\xi ) \partial _j \ell (\xi ) \right] , \end{aligned}$$
(2)

for \(1 \le i,j \le n\), where the expectation is taken with respect to \(p_\xi \). Explicitly,

$$\begin{aligned} g_{ij}(\xi ) = \int _\mathcal {X}p_\xi \left( \frac{\partial }{\partial \xi ^i}\log p_\xi \right) \left( \frac{\partial }{\partial \xi ^j}\log p_\xi \right) \textrm{d}\mu . \end{aligned}$$

Alternatively, the Fisher matrix can be written as the negative expectation of the Hessian of the log-likelihood function, a result that can make the computation of the Fisher matrix easier in some cases:

Proposition 1

([2, Prop. 1.6.3]) The elements of the Fisher matrix can be expressed as

$$\begin{aligned} g_{ij}(\xi ) = -\mathbb {E}\left[ \partial _j \partial _i \ell (\xi ) \right] . \end{aligned}$$
(3)

Proof

As \(\int _{\mathcal {X}} p_\xi ~\textrm{d}\mu = 1\), taking the derivative yields \(\int _{\mathcal {X}}p_\xi \partial _i \log p_\xi ~\textrm{d}\mu = \int _{\mathcal {X}} \partial _i p_\xi ~\textrm{d}\mu = 0\). By taking the derivative again, we have

$$\begin{aligned} 0&= \partial _j \int _{\mathcal {X}} p_\xi \partial _i \log p_\xi ~\textrm{d}\mu = \int _{\mathcal {X}} \partial _j p_\xi \partial _i \log p_\xi ~\textrm{d}\mu + \int _{\mathcal {X}} p_\xi \partial _j \partial _i \log p_\xi ~\textrm{d}\mu \\&= \int _{\mathcal {X}} p_\xi (\partial _j \log p_\xi ) (\partial _i \log p_\xi )~\textrm{d}\mu + \int _{\mathcal {X}} p_\xi \partial _j \partial _i \log p_\xi ~\textrm{d}\mu \\&= \mathbb {E}\left[ (\partial _j \log p_\xi ) (\partial _i \log p_\xi )\right] + \mathbb {E}\left[ \partial _j\partial _i \log p_\xi \right] . \end{aligned}$$

\(\square \)

Since the Fisher matrix is symmetric and positive-definite, it defines a Riemannian metric \(g_{p_{\xi }}\) (also denoted simply \(g_{\xi }\)), called the Fisher metric; that is, a family of inner products \(g_{\xi } :T_{p_\xi }\mathcal {S}\times T_{p_\xi }\mathcal {S}\rightarrow \mathbb {R}\) that vary smoothly on the statistical manifold. Applying the Fisher metric to two vectors \(v_1 =\textrm{d}\varphi _{p_\xi }(\xi _1)\) and \(v_2 = \textrm{d}\varphi _{p_\xi }(\xi _2)\) in the tangent space \(T_{p_{\xi }}\mathcal {S}\) is equivalent to computing an inner product mediated by the Fisher matrix \(G(\xi )\) between the respective local-coordinate vectors \(\xi _1, \xi _2 \in \mathbb {R}^n\):

$$\begin{aligned} \langle v_1, v_2 \rangle _{G(\xi )} {:}{=}g_\xi \left( v_1, v_2 \right) {:}{=}g_{p_\xi }\left( v_1, v_2 \right) = \xi _1^\textsf{T}\, G(\xi ) \, \xi _2. \end{aligned}$$
(4)

The following results will help some derivations in the rest of the text. First, we note that, like any Riemannian metric, the Fisher metric is covariant under reparametrisation of the parameter space:

Proposition 2

([2, Thm. 1.6.5]) The Fisher matrix is covariant under reparametrisation of the parameter space, that is, given two coordinate systems \(\xi = (\xi ^1, \dots , \xi ^n)\) and \(\theta = (\theta ^1, \dots , \theta ^n)\), related by the bijection \(\xi = \xi (\theta )\), the Fisher matrix transforms its coordinates as

$$\begin{aligned} \widetilde{G}(\theta ) = \left[ \frac{\textrm{d}\xi }{\textrm{d}\theta } \right] ^\textsf{T}G\left( \xi (\theta )\right) \left[ \frac{\textrm{d}\xi }{\textrm{d}\theta } \right] , \end{aligned}$$
(5)

where \(\left[ \frac{\textrm{d}\xi }{\textrm{d}\theta } \right] \) denotes the Jacobian matrix of the transformation \(\theta \mapsto \xi \).

Proof

Denote \(\widetilde{p}_\theta {:}{=}p_{\xi (\theta )} = \varphi (\xi (\theta ))\) and note that, by the chain rule, \(\frac{\partial \widetilde{p}_\theta }{\partial \theta ^i} = \sum _{k=1}^{n} \frac{\partial \xi ^k}{\partial \theta ^i} \frac{\partial p_\xi }{\partial \xi ^k}\) and \(\frac{\partial \widetilde{p}_\theta }{\partial \theta ^j} = \sum _{r=1}^{n} \frac{\partial \xi ^r}{\partial \theta ^j} \frac{\partial p_\xi }{\partial \xi ^r}\). Thus

$$\begin{aligned} \widetilde{g}_{ij}(\theta )&= \int _\mathcal {X}\widetilde{p}_\theta \left( \frac{\partial }{\partial \theta ^i}\log \widetilde{p}_\theta \right) \left( \frac{\partial }{\partial \theta ^j}\log \widetilde{p}_\theta \right) \textrm{d}\mu = \int _\mathcal {X}\frac{1}{\widetilde{p}_\theta } \frac{\partial \widetilde{p}_\theta }{\partial \theta ^i} \frac{\partial \widetilde{p}_\theta }{\partial \theta ^j}~\textrm{d}\mu \\&= \sum _{k=1}^{n}\sum _{r=1}^{n} \left( \int _\mathcal {X}\frac{1}{p_{\xi (\theta )}} \frac{\partial p_\xi }{\partial \xi ^k} \frac{\partial p_\xi }{\partial \xi ^r}~\textrm{d}\mu \right) \frac{\partial \xi ^k}{\partial \theta ^i} \frac{\partial \xi ^r}{\partial \theta ^j} = \sum _{k=1}^{n}\sum _{r=1}^{n} g_{kr}(\xi (\theta )) \frac{\partial \xi ^k}{\partial \theta ^i} \frac{\partial \xi ^r}{\partial \theta ^j}. \end{aligned}$$

\(\square \)

Second, we state an invariance property specific to the Fisher metric:

Proposition 3

([2, Thm. 1.6.4]) Let \(X :\Omega \rightarrow \mathcal {X}\subseteq \mathbb {R}^n\) be a random variable distributed according to \(p_\xi \). The Fisher metric is invariant under reparametrisations of the sample space \(\mathcal {X}\).

Proof

Consider the reparametrisation by the bijection \(f :\mathcal {X}\rightarrow \mathcal {Y}\subseteq \mathbb {R}^n\) and denote \(\widetilde{p}_\xi \) the distribution associated to the random variable \(Y {:}{=}f(X)\). The Jacobian determinant \(\left| \frac{\textrm{d}f}{\textrm{d}x}\right| \) of the transformation f gives the relation between the densities:

$$\begin{aligned} p_\xi (x) = \widetilde{p}_\xi (y) \left| \frac{\textrm{d}f}{\textrm{d}x} \right| . \end{aligned}$$
(6)

The log-likelihood functions are \({\widetilde{\ell }}(\xi ) = \log \widetilde{p}_\xi (y) = \log \widetilde{p}_\xi (f(x))\) and \(\ell (\xi ) = \log p_\xi (x) = \log \widetilde{p}_\xi (y) + \log \left| \frac{\textrm{d}f}{\textrm{d}x} \right| \). As f does not depend on the parameter \(\xi \), we have \(\partial _i \ell (\xi ) = \partial _i \widetilde{\ell }(\xi )\), whence

$$\begin{aligned} g_{ij}(\xi )&= \int _\mathcal {X}p_\xi \partial _i \ell (\xi ) \partial _j \ell (\xi )~\textrm{d}\mu = \int _\mathcal {X}(\widetilde{p}_\xi \circ f) \left| \frac{\textrm{d}f}{\textrm{d}x} \right| \partial _i {\widetilde{\ell }}(\xi ) \partial _j {\widetilde{\ell }}(\xi )~\textrm{d}\mu \\&= \int _\mathcal {Y}\widetilde{p}_\xi \partial _i {\widetilde{\ell }}(\xi ) \partial _j {\widetilde{\ell }}(\xi )~\textrm{d}(f_{*}\mu ) = \widetilde{g}_{ij}(\xi ). \end{aligned}$$

\(\square \)

Furthermore, it can be shown that the Fisher metric is the unique Riemannian metric (up to a multiplicative constant) in statistical manifolds that is invariant under sufficient statistics [4, Thm. 1.2] (see also [5]). This invariance characterisation justifies the choice of this metric to study the geometry of statistical models.

The Fisher metric induces a notion of distance between two distributions in the same statistical manifold, called Fisher–Rao distance, and given by the geodesic distance between these points. Specifically, consider a curve \(\xi :\left[ 0,1\right] \rightarrow \Xi \) in the parameter space and its image \(\gamma :\left[ 0,1\right] \rightarrow \mathcal {S}\) by the parametrisation \(\varphi \), i.e., \(\gamma (t) = (\varphi \circ \xi ) (t)\), for \(t \in \left[ 0,1\right] \). Since \(\dot{\gamma }(t) = \textrm{d}\varphi _{\xi (t)}\big (\dot{\xi }(t)\big )\), in the Fisher geometry, the length of \(\gamma \) can be computed as

$$\begin{aligned} l(\gamma ) {:}{=}\int _0^1 \sqrt{ \left\langle \dot{\gamma }(t), \dot{\gamma }(t) \right\rangle _{G \left( \xi (t) \right) } }~\textrm{d}t = \int _0^1 \sqrt{ \dot{\xi }(t)^\textsf{T}\, G(\xi (t)) \, \dot{\xi }(t) }~\textrm{d}t. \end{aligned}$$
(7)

Then, given two distributions \(p_{\xi _1}\) and \(p_{\xi _2}\) in \(\mathcal {S}\), the Fisher–Rao distance between themFootnote 2 is the infimum of the length of piecewise differentiable curves \(\gamma \) joining these two points:

$$\begin{aligned} d_{\textrm{FR}}(\xi _1, \xi _2) {:}{=}d_{\textrm{FR}}(p_{\xi _1}, p_{\xi _2}) {:}{=}\inf _{\gamma } \left\{ l(\gamma ) \;\Big \vert \;\gamma (0) = p_{\xi _1},\ \gamma (1) = p_{\xi _2} \right\} . \end{aligned}$$
(8)

A curve \(\gamma (t) = (\varphi \circ \xi ) (t)\) is a geodesic if, in local coordinates \(\xi =(\xi ^1, \dots , \xi ^n)\), the curve \(\xi (t)\) is a solution to the geodesic differential equations

$$\begin{aligned} \ddot{\xi }^k(t) + \sum _{i=1}^{n} \sum _{j=1}^{n} \Gamma _{ij}^{k} \dot{\xi }^i(t) \dot{\xi }^j(t) = 0, \quad k \in \{ 1, \dots , n \}, \end{aligned}$$
(9)

where \(\Gamma _{ij}^{k}\) are the Christoffel symbols of the second kind, which can be obtained from the equations

$$\begin{aligned} \sum _{k=1}^{n} g_{\ell k} \Gamma _{ij}^{k} = \frac{1}{2} \left( \frac{\partial }{\partial \xi ^i} g_{j\ell } + \frac{\partial }{\partial \xi ^j} g_{\ell i} - \frac{\partial }{\partial \xi ^\ell } g_{ij} \right) , \quad i,j,\ell \in \{1,\dots ,n\}. \end{aligned}$$

The Hopf–Rinow theorem (e.g., [41, Thm. 6.4.6]) provides a sufficient condition for the infimum in (8) to be attained by a geodesic: if \((\mathcal {S}, d_\textrm{FR})\) is connected and complete as a metric space, then any two points p, q in the manifold \(\mathcal {S}\) can be joined by a length-minimising curve which is a geodesic, that is, a curve whose length equals the Fisher–Rao distance \(d_{\textrm{FR}}(p,q)\). This condition is satisfied in all statistical manifolds considered in this paper.
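Although the focus of this work is on closed-form expressions, the geodesic equations (9), together with the Christoffel symbols defined above, can also be integrated numerically. The following is a minimal sketch in Python (assuming NumPy and SciPy are available; the helper names and the finite-difference approximation of the metric derivatives are ours and purely illustrative), applied to the metric \(\textrm{diag}(1/\xi_2^2,\ 2/\xi_2^2)\), which is the Gaussian Fisher matrix obtained later in Sect. 5.4.1:

```python
import numpy as np
from scipy.integrate import solve_ivp

def christoffels(G, xi, eps=1e-6):
    """Christoffel symbols of the second kind at xi, for a metric G(xi),
    with the derivatives of G approximated by central finite differences."""
    n = len(xi)
    dG = np.empty((n, n, n))                  # dG[l, i, j] = d g_ij / d xi^l
    for l in range(n):
        e = np.zeros(n); e[l] = eps
        dG[l] = (G(xi + e) - G(xi - e)) / (2 * eps)
    Ginv = np.linalg.inv(G(xi))
    Gamma = np.empty((n, n, n))               # Gamma[k, i, j]
    for i in range(n):
        for j in range(n):
            rhs = 0.5 * (dG[i, j, :] + dG[j, :, i] - dG[:, i, j])
            Gamma[:, i, j] = Ginv @ rhs
    return Gamma

def geodesic_rhs(t, state, G):
    """Right-hand side of the geodesic equations (9), as a first-order system."""
    n = len(state) // 2
    xi, v = state[:n], state[n:]
    acc = -np.einsum('kij,i,j->k', christoffels(G, xi), v, v)
    return np.concatenate([v, acc])

# Example: Fisher metric of the Gaussian model, G = diag(1/s^2, 2/s^2)
G = lambda xi: np.diag([1.0 / xi[1] ** 2, 2.0 / xi[1] ** 2])
sol = solve_ivp(geodesic_rhs, (0.0, 1.0), [0.0, 1.0, 1.0, 0.3], args=(G,), rtol=1e-8)
print(sol.y[:2, -1])   # endpoint of the geodesic starting at (mu, sigma) = (0, 1)
```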

Remark 1

The Fisher–Rao distance is related to the Kullback–Leibler divergence [2, Thm. 4.4.5]:

$$\begin{aligned} D_{\textrm{KL}}(p_{\xi _1} \Vert p_{\xi _2}) = \frac{1}{2} d_{\textrm{FR}}^2(p_{\xi _1},p_{\xi _2}) + o\left( d_{\textrm{FR}}^2(p_{\xi _1},p_{\xi _2})\right) , \end{aligned}$$

where o(x) represents a quantity such that \(\lim _{x\rightarrow 0} \frac{o(x)}{x} = 0\). This means that the Kullback–Leibler divergence locally behaves as a ‘squared distance’. Unlike general statistical divergences, the Fisher–Rao distance is a proper distance, i.e., it is symmetric and the triangle inequality holds.
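As a quick numerical illustration of Remark 1, the ratio \(D_{\textrm{KL}}/\big(\tfrac{1}{2}d_{\textrm{FR}}^2\big)\) approaches 1 as the two distributions get closer. A minimal sketch in Python (assuming NumPy and SciPy; the choice of the exponential family, whose Fisher–Rao distance \(|\log \lambda_1 - \log \lambda_2|\) is derived later in Sect. 5.1, is ours):

```python
import numpy as np
from scipy.integrate import quad

def kl_exponential(l1, l2):
    """KL divergence between Exp(l1) and Exp(l2), computed by quadrature."""
    p1 = lambda x: l1 * np.exp(-l1 * x)
    p2 = lambda x: l2 * np.exp(-l2 * x)
    return quad(lambda x: p1(x) * np.log(p1(x) / p2(x)), 0, np.inf)[0]

l1 = 1.0
for eps in (0.5, 0.1, 0.02):
    l2 = l1 * (1 + eps)
    d_fr = abs(np.log(l1) - np.log(l2))      # Fisher-Rao distance, cf. Sect. 5.1
    print(eps, kl_exponential(l1, l2) / (0.5 * d_fr ** 2))   # ratio tends to 1
```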

Unfortunately, finding the Fisher–Rao distance between two distributions in a statistical manifold is usually a non-trivial task, since it involves finding the minimising geodesics, potentially by solving the geodesic differential equations (9), and then evaluating the integral in (7). In the case of one-dimensional manifolds, i.e., those parametrised by a single real number, computing the Fisher–Rao distance is easier, since the geodesic is immediately given. In such cases, the Fisher matrix \(G(\xi ) = [g_{11}(\xi )]\) contains a single element (also called the Fisher information). Given two parameters \(\xi _1\) and \(\xi _2\), there is only one path joining them, whose length does not depend on the chosen parametrisation. In particular, we can consider the parametrisation \(\xi (t) = t\), with \(t \in \left[ \xi _1, \xi _2 \right] \), so that \(\big |\dot{\xi }(t)\big | = 1\). Thus the expression for the length of the curve \(\gamma (t) = (\varphi \circ \xi )(t)\) in (7) becomes

$$\begin{aligned} l(\gamma ) = \int _{\xi _1}^{\xi _2} \sqrt{g_{11}(\xi (t))}~\textrm{d}t = \int _{\xi _1}^{\xi _2} \sqrt{g_{11}(\xi )}~\textrm{d}\xi , \end{aligned}$$

and the Fisher–Rao distance between distributions parametrised by \(\xi _1\) and \(\xi _2\) is

$$\begin{aligned} d_{\textrm{FR}}(\xi _1,\xi _2) = \left| \int _{\xi _1}^{\xi _2} \sqrt{g_{11}(\xi )}~\textrm{d}\xi \right| . \end{aligned}$$
(10)

For higher-dimensional manifolds, the techniques to find the geodesics and the Fisher–Rao distance consist in directly solving the geodesic differential equations, or in drawing an analogy with some well-known geometry (e.g., spherical, hyperbolic), as we shall see in the next sections.

Remark 2

The expression (10) allows one to find the Fisher–Rao distance in one-dimensional submanifolds of higher dimensional statistical manifolds, i.e., when only one parameter is allowed to vary and the others are fixed. This has been done for Gamma, Weibull and power function distributions in [10, 12].

3 Hyperbolic geometry results

We recall in this section some classical results from hyperbolic geometry [42,43,44], since many of the statistical manifolds studied in this work are related to that geometry. These will be extensively used particularly in Sects. 4.7 and 5.4–5.9. If desired, the reader may skip this section for now, and return to it when reading those subsections to get the details of the derivations.

We start with hyperbolic geometry in dimension two, to be used in the treatment of the statistical manifolds in Sects. 5.4–5.9, analogously to what is done in [45]. In this case, we consider the Poincaré half-plane \(\mathcal {H}^2 {:}{=}\left\{ (x,y) \in \mathbb {R}^2 \;\Big \vert \;y>0 \right\} \) as a model for hyperbolic geometry, with the metric given in matrix form by

$$\begin{aligned} G_{\mathcal {H}^2}(x,y) =\begin{pmatrix} \frac{1}{y^2} & \quad 0\\ 0 & \quad \frac{1}{y^2}\\ \end{pmatrix}. \end{aligned}$$
(11)

The geodesics in this manifold are vertical half-lines and half-circles centred on the line \(y=0\), and the geodesic distance between two points is given by the following equivalent expressions:

$$\begin{aligned}&d_{\mathcal {H}^2}\left( (x_1,y_1),(x_2,y_2)\right) \\&\quad = \log \left( \frac{ \sqrt{(x_1-x_2)^2 + (y_1+y_2)^2} + \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2} }{ \sqrt{(x_1-x_2)^2 + (y_1+y_2)^2} - \sqrt{(x_1-x_2)^2 + (y_1-y_2)^2}}\right) \\&\quad = {{\,\textrm{arccosh}\,}}\left( 1 + \frac{(x_1-x_2)^2 + (y_1-y_2)^2}{2y_1y_2} \right) \\&\quad = 2 {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{ (x_1-x_2)^2 + (y_1-y_2)^2 }{ (x_1-x_2)^2 + (y_1+y_2)^2 }}\right) . \end{aligned}$$
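The equivalence of the three expressions above is easily verified numerically; a minimal sketch in Python (assuming NumPy; the function names are ours):

```python
import numpy as np

def d_h2_log(x1, y1, x2, y2):
    s_plus  = np.hypot(x1 - x2, y1 + y2)
    s_minus = np.hypot(x1 - x2, y1 - y2)
    return np.log((s_plus + s_minus) / (s_plus - s_minus))

def d_h2_arccosh(x1, y1, x2, y2):
    return np.arccosh(1 + ((x1 - x2) ** 2 + (y1 - y2) ** 2) / (2 * y1 * y2))

def d_h2_arctanh(x1, y1, x2, y2):
    num = (x1 - x2) ** 2 + (y1 - y2) ** 2
    den = (x1 - x2) ** 2 + (y1 + y2) ** 2
    return 2 * np.arctanh(np.sqrt(num / den))

pts = (0.3, 1.2, -1.0, 0.4)
print(d_h2_log(*pts), d_h2_arccosh(*pts), d_h2_arctanh(*pts))   # three equal values
```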

The next lemma allows one to relate the geodesic distance in a class of two-dimensional Riemannian manifolds to that in the Poincaré half-plane.

Lemma 4

Consider the Poincaré half-plane model \(\mathcal {H}^2\) with the hyperbolic metric (11), and a two-dimensional Riemannian manifold \((\mathcal {M},g_\mathcal {M})\), parametrised by a global coordinate system \(\varphi :\Pi \subseteq \mathbb {R}\times \mathbb {R}_+^* \rightarrow \mathcal {M}\). If the metric \(g_{\mathcal {M}}\) is given in matrix form by

$$\begin{aligned} G_{\mathcal {M}}(x,y) = \begin{pmatrix} \frac{a}{y^2} & \quad 0\\ 0 & \quad \frac{b}{y^2}\\ \end{pmatrix}, \end{aligned}$$

with ab positive constants, then the geodesic distance between two points in \(\mathcal {M}\) is

$$\begin{aligned} d_{\mathcal {M}} \big ( \varphi (x_1,y_1), \varphi (x_2,y_2) \big ) = \sqrt{b} \ d_{\mathcal {H}^2} \left( (\sqrt{a}x_1, \sqrt{b}y_1), (\sqrt{a}x_2,\sqrt{b}y_2) \right) . \end{aligned}$$
(12)

Proof

Consider a curve \(\pi (t) = \big (x(t), y(t)\big )\) in the parameter space \(\Pi \) and its image \(\alpha (t) = (\varphi \circ \pi )(t)\) by the parametrisation. Applying the metric \(G_{\mathcal {M}}(x,y)\) to \(\dot{\alpha }(t) = \textrm{d}\varphi _{\pi (t)}\big (\dot{\pi }(t)\big )\) in \(\mathcal {M}\) gives

$$\begin{aligned} \left\langle \dot{\alpha }(t), \dot{\alpha }(t) \right\rangle _{G(\pi (t))}&= \begin{pmatrix} \dot{x}(t)&\quad \dot{y}(t) \end{pmatrix} \begin{pmatrix} \frac{a}{(y(t))^2} & \quad 0\\ 0 & \quad \frac{b}{(y(t))^2}\\ \end{pmatrix} \begin{pmatrix} \dot{x}(t) \\ \dot{y}(t) \end{pmatrix}\\&= \frac{1}{(y(t))^2} \left( a (\dot{x}(t))^2 + b (\dot{y}(t))^2 \right) . \end{aligned}$$

On the other hand, consider the diffeomorphism \(\psi :\mathcal {M}\rightarrow \mathcal {H}^2\), \(\varphi (x,y) \mapsto (\sqrt{a}x,\sqrt{b}y)\), and the image curve \(\beta (t) {:}{=}(\psi \circ \alpha )(t) = (\sqrt{a}x(t),\sqrt{b}y(t))\). Applying \(G_{\mathcal {H}^2}(\beta (t))\) to \(\dot{\beta }(t) = \textrm{d}\psi _{\alpha (t)}(\dot{\alpha }(t))\) in \(\mathcal {H}^2\) gives

$$\begin{aligned} \left\langle \dot{\beta }(t), \dot{\beta }(t) \right\rangle _{G_{\mathcal {H}^2}(\beta (t))}&= \begin{pmatrix} \sqrt{a}\dot{x}(t)&\quad \sqrt{b}\dot{y}(t) \end{pmatrix} \begin{pmatrix} \frac{1}{b(y(t))^2} & \quad 0\\ 0 & \quad \frac{1}{b(y(t))^2}\\ \end{pmatrix} \begin{pmatrix} \sqrt{a}\dot{x}(t) \\ \sqrt{b}\dot{y}(t) \end{pmatrix}\\&= \frac{1}{b(y(t))^2} \left( a (\dot{x}(t))^2 + b (\dot{y}(t))^2 \right) . \end{aligned}$$

Thus, \(\left\langle \dot{\alpha }(t), \dot{\alpha }(t) \right\rangle _{G_{\mathcal {M}}(\alpha (t))}= b\left\langle \dot{\beta }(t), \dot{\beta }(t) \right\rangle _{G_{\mathcal {H}^2}(\beta (t))}\), that is, \(\left\| \dot{\alpha }(t)\right\| _{G_{\mathcal {M}}(\alpha (t))}=\sqrt{b} \left\| \dot{\beta }(t) \right\| _{G_{\mathcal {H}^2}(\beta (t))}\). This implies that a curve \(\alpha (t)\) connecting two points in \(\mathcal {M}\) is a geodesic (minimises the length) if, and only if, its image \(\beta (t)\) is a geodesic in \(\mathcal {H}^2\). Taking (7) and (8) into account concludes the proof. \(\square \)

Remark 3

It is possible to deduce what the geodesics in \(\mathcal {M}\) look like in the parameter space \(\Pi \) (i.e., their preimages by \(\varphi \)), since they are the inverse image by \(\psi \) of geodesics in \(\mathcal {H}^2\). Consider a geodesic in \(\mathcal {M}\) connecting the points \(\varphi (x_1,y_1)\) and \(\varphi (x_2,y_2)\). Its preimage in the parameter space is given either by the vertical line joining \(y_1\) to \(y_2\), if \(x_1=x_2\), or, otherwise, by the arc of the half-ellipse joining \((x_1,y_1)\) to \((x_2,y_2)\), centred at (C, 0) and given by \(\left( \frac{R}{\sqrt{a}}\cos (t)+C,\frac{R}{\sqrt{b}}\sin (t)\right) \), where \(C = \frac{a(x_1^2-x_2^2)+b(y_1^2-y_2^2)}{2a(x_1-x_2)}\) and \(R = \frac{1}{2} \sqrt{ a\left( x_1-x_2\right) ^2 + \frac{b^2}{a}\left( \frac{y_1^2-y_2^2}{x_1-x_2}\right) ^2 + 2b\left( y_1^2+y_2^2\right) }\).
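A quick numerical check of this preimage (a sketch in Python with NumPy; the values of a, b and of the endpoints are arbitrary): the centre C and radius R computed from two points must satisfy \(a(x_i-C)^2 + b y_i^2 = R^2\) at both endpoints, which is the Cartesian equation of the half-ellipse above.

```python
import numpy as np

def geodesic_preimage(x1, y1, x2, y2, a, b):
    """Centre C and radius R of the half-ellipse of Remark 3 (case x1 != x2)."""
    C = (a * (x1 ** 2 - x2 ** 2) + b * (y1 ** 2 - y2 ** 2)) / (2 * a * (x1 - x2))
    R = 0.5 * np.sqrt(a * (x1 - x2) ** 2
                      + (b ** 2 / a) * ((y1 ** 2 - y2 ** 2) / (x1 - x2)) ** 2
                      + 2 * b * (y1 ** 2 + y2 ** 2))
    return C, R

a, b = 1.0, 2.0
(x1, y1), (x2, y2) = (0.0, 1.0), (3.0, 0.5)
C, R = geodesic_preimage(x1, y1, x2, y2, a, b)
for (x, y) in [(x1, y1), (x2, y2)]:
    print(a * (x - C) ** 2 + b * y ** 2, R ** 2)   # both endpoints lie on the ellipse
```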

More generally, the n-dimensional hyperbolic half-space model is the Riemannian manifold \(\mathcal {H}^{n} {:}{=}\left\{ (x_1, \dots , x_n) \in \mathbb {R}^n \;\Big \vert \;x_n > 0 \right\} \), equipped with the metric given in matrix form by

$$\begin{aligned} G_{\mathcal {H}^{n}}(x_1,\dots ,x_n) = \begin{pmatrix} \frac{1}{x_n^2} & \quad 0 & \quad \cdots & \quad 0\\ 0 & \frac{1}{x_n^2} & \quad \cdots & \quad 0\\ \vdots & \quad \vdots & \quad \ddots & \quad \vdots \\ 0 & \quad 0 & \quad \cdots & \quad \frac{1}{x_n^2} \end{pmatrix}, \end{aligned}$$
(13)

that is, \(\left\langle (u_1,\dots ,u_n),(v_1,\dots ,v_n) \right\rangle _{G_{\mathcal {H}^n}(x_1,\dots ,x_n)} = \frac{1}{x_n^2} \sum _{i=1}^{n} u_iv_i\). This manifold has constant negative curvature. The geodesics in this manifold are vertical half-lines and half-circles centred on the hyperplane \(x_n=0\) (and orthogonal to it), and the geodesic distance between two points in \(\mathcal {H}^n\) is given by

$$\begin{aligned} d_{\mathcal {H}^n}\big ( (x_1,\dots ,x_n),(y_1,\dots ,y_n) \big ) = {{\,\textrm{arccosh}\,}}\left( 1 + \frac{\sum _{i=1}^{n} (x_i-y_i)^2 }{2x_ny_n} \right) . \end{aligned}$$
(14)

Restricted to points in the \((n-1)\)-dimensional unit half-sphere in \(\mathcal {H}^{n}\), that is, \(S_{1}^{n-1} {:}{=}\left\{ (x_1,\dots ,x_{n}) \in \mathcal {H}^{n} \;\Big \vert \;\sum _{i=1}^{n} x_i^2=1 \right\} \), the expression for the distance becomes

$$\begin{aligned} \widetilde{d}_{S^{n-1}_1}\big ( (x_1,\dots ,x_{n}), (y_1,\dots ,y_{n}) \big ) = {{\,\textrm{arccosh}\,}}\left( \frac{1 - \sum _{i=1}^{n-1} x_i y_i }{x_n y_n} \right) . \end{aligned}$$
(15)

From (14), we immediately see that the distance \(d_{\mathcal {H}^n}\) is invariant under dilation or contraction, that is, denoting \(\pmb {x}{:}{=}(x_1,\dots ,x_n) \in \mathcal {H}^{n}\) and \(\pmb {y}{:}{=}(y_1,\dots ,y_n) \in \mathcal {H}^{n}\), we have \(d_{\mathcal {H}^{n}}(\pmb {x},\pmb {y}) = d_{\mathcal {H}^{n}}(\lambda \pmb {x},\lambda \pmb {y})\), for \(\lambda > 0\). For points \(\pmb {x}\) and \(\pmb {y}\) in the half-sphere \(S_{r}^{n-1} {:}{=}\left\{ (x_1,\dots ,x_n) \in \mathcal {H}^{n} \;\Big \vert \;\sum _{i=1}^{n} x_i^2 = r^2 \right\} \) of radius r, the distance between them, restricted to the sphere \(S_r^{n-1}\), is the distance given by (15) between their projections onto the radius-1 sphere, that is,

$$\begin{aligned} \widetilde{d}_{S_r^{n-1}}\left( \pmb {x}, \pmb {y}\right) = \widetilde{d}_{S_1^{n-1}} \left( \frac{\pmb {x}}{r}, \frac{\pmb {y}}{r} \right) . \end{aligned}$$
(16)

The central projection \(\pi :S_{1}^{n} \subseteq \mathcal {H}^{n+1} \rightarrow \mathcal {H}^n\) given by \((x_1,\dots ,x_n,x_{n+1}) \mapsto \left( \frac{2x_2}{x_1+1}, \dots , \frac{2x_{n+1}}{x_1+1} \right) \) is an isometry between the unit half-sphere \(S_{1}^{n} \subseteq \mathcal {H}^{n+1}\), with the restriction of the ambient hyperbolic metric, and the hyperbolic space \(\mathcal {H}^{n}\). It provides a hemisphere model, embedded in \(\mathcal {H}^{n+1}\), for the n-dimensional hyperbolic space [43]. The geodesics in the hemisphere model are the inverse images by \(\pi \) of the geodesics in \(\mathcal {H}^n\), namely, semicircles orthogonal to the hyperplane \(x_{n+1}=0\).
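The formulas (14) and (15) and the isometry \(\pi \) can be checked together numerically; a short sketch in Python (assuming NumPy; the points and the function names are ours): the distance (15) between two points on the unit half-sphere equals the distance (14) between their projections.

```python
import numpy as np

def d_hn(x, y):
    """Geodesic distance (14) in the half-space model, last coordinate positive."""
    return np.arccosh(1 + np.sum((x - y) ** 2) / (2 * x[-1] * y[-1]))

def d_sphere(x, y):
    """Distance (15) restricted to the unit half-sphere."""
    return np.arccosh((1 - np.dot(x[:-1], y[:-1])) / (x[-1] * y[-1]))

def proj(x):
    """Central projection of the unit half-sphere in H^{n+1} onto H^n."""
    return 2 * x[1:] / (x[0] + 1)

x = np.array([0.0, 0.0, 1.0])                    # two points on the unit
y = np.array([0.0, 1.0, 1.0]) / np.sqrt(2)       # half-sphere inside H^3
print(d_sphere(x, y), d_hn(proj(x), proj(y)))    # both approximately 0.8814
```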

4 Discrete distributions

In the following, we begin with examples of one-dimensional statistical manifolds (Sects. 4.1–4.4), and then consider higher-dimensional manifolds (Sects. 4.5–4.7). The results are summarised in Table 1.

Table 1 Fisher–Rao distances for discrete distributions

4.1 Binomial

A binomial distribution [2, 10, 12] models the probability of having x successes in n independent and identically distributed (i.i.d.) Bernoulli experiments with parameter \(\theta \). Its p.m.f. is given by \(p(x) = \left( {\begin{array}{c}n\\ x\end{array}}\right) \theta ^x (1-\theta )^{n-x}\), defined for \(x \in \{0,1,\dots ,n\}\), and parametrised by \(\theta \in ]0,1[\). In this case, \(\partial _\theta \ell (\theta ) = \frac{x}{\theta } - \frac{n-x}{1-\theta }\), and the Fisher information is

$$\begin{aligned} g_{11}(\theta )&= \mathbb {E}\left[ \left( \partial _\theta \ell (\theta ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{X}{\theta } - \frac{n-X}{1-\theta }\right) ^2 \right] \nonumber \\&= \frac{\mathbb {E}[X^2]}{\theta ^2} + \frac{2\mathbb {E}[X^2]-2n\mathbb {E}[X]}{\theta (1-\theta )} + \frac{n^2 -2n\mathbb {E}[X] + \mathbb {E}[X^2]}{(1-\theta )^2} \nonumber \\&= \frac{n}{\theta (1-\theta )}, \end{aligned}$$
(17)

where we have used that \(\mathbb {E}[X] = n\theta \) and \(\mathbb {E}[X^2] = n\theta -n\theta ^2 +n^2\theta ^2\). The Fisher–Rao distance is then

$$\begin{aligned} d_{\textrm{FR}}(\theta _1,\theta _2) = \left| \int _{\theta _1}^{\theta _2} \sqrt{\frac{n}{\theta (1-\theta )}}~\textrm{d}\theta \right| = 2\sqrt{n}\left| \arcsin \sqrt{\theta _1} - \arcsin \sqrt{\theta _2} \right| . \end{aligned}$$
(18)

4.2 Poisson

A Poisson distribution [2, 10, 12] has p.m.f. \(p(x) = \frac{\lambda ^xe^{-\lambda }}{x!}\), defined for \(x \in \mathbb {N}\), and parametrised by \(\lambda \in \mathbb {R}_+^{*}\). In this case, we have \(\partial _{\lambda } \ell (\lambda ) = \frac{x}{\lambda } - 1\), and the Fisher information is given by

$$\begin{aligned} g_{11}(\lambda )&= \mathbb {E}\left[ \left( \partial _{\lambda } \ell (\lambda ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{X}{\lambda } - 1 \right) ^2 \right] \nonumber \\&= \frac{1}{\lambda ^2} \mathbb {E}\left[ X^2 \right] - \frac{2}{\lambda } \mathbb {E}\left[ X \right] + 1 \nonumber \\&= \frac{1}{\lambda }, \end{aligned}$$
(19)

where we have used that \(\mathbb {E}[X] = \lambda \) and \(\mathbb {E}[X^2] = \lambda (\lambda + 1)\). Thus the Fisher–Rao distance is

$$\begin{aligned} d_{\textrm{FR}}(\lambda _1, \lambda _2) = \left| \int _{\lambda _1}^{\lambda _2} \frac{1}{\sqrt{\lambda }}~\textrm{d}\lambda \right| = 2 \left| \sqrt{\lambda _1} - \sqrt{\lambda _2} \right| . \end{aligned}$$
(20)

4.3 Geometric

A geometric distribution [2, 19] models the number of i.i.d. Bernoulli trials with parameter \(\theta \) needed to obtain one success. Its p.m.f. is \(p(x) = \theta (1-\theta )^{x-1}\), defined for \(x \in \mathbb {N}^*\), and parametrised by \(\theta \in ]0,1[\). We have \(\partial _\theta \ell (\theta ) = \frac{1}{\theta } - \frac{x-1}{1-\theta }\), and the Fisher information is

$$\begin{aligned} g_{11}(\theta )&= \mathbb {E}\left[ \left( \partial _{\theta } \ell (\theta ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{1}{\theta } - \frac{X-1}{1-\theta } \right) ^2 \right] \nonumber \\&= \frac{1}{\theta ^2} + \frac{2-2\mathbb {E}[X]}{\theta (1-\theta )} + \frac{\mathbb {E}[X^2] -2\mathbb {E}[X] + 1}{(1-\theta )^2} \nonumber \\&= \frac{1}{\theta ^2(1-\theta )}, \end{aligned}$$
(21)

where we have used that \(\mathbb {E}[X] = \frac{1}{\theta }\) and \(\mathbb {E}[X^2] = \frac{2-\theta }{\theta ^2}\). The Fisher–Rao distance is

$$\begin{aligned} d_{\textrm{FR}}(\theta _1,\theta _2) = \left| \int _{\theta _1}^{\theta _2} \frac{1}{\theta \sqrt{1-\theta }}~\textrm{d}\theta \right| = 2 \left| {{\,\textrm{arctanh}\,}}\sqrt{1-\theta _1} - {{\,\textrm{arctanh}\,}}\sqrt{1-\theta _2} \right| . \end{aligned}$$
(22)

4.4 Negative binomial

A negative binomial distribution [12, 19] models the number of failures in a sequence of i.i.d. Bernoulli experiments with parameter \(\theta \) before r successes occur. It has p.m.f. \(p(x) = \frac{\Gamma (x+r)}{x! \Gamma (r)} \theta ^r (1-\theta )^x\), defined for \(x \in \mathbb {N}\), and is parametrised by \(\theta \in ] 0,1[\), for a fixed r, which can be extended to \(r \in \mathbb {R}^*_+\). We have \(\partial _\theta \ell (\theta ) = \frac{r}{\theta } - \frac{x}{1-\theta }\), and the Fisher information is

$$\begin{aligned} g_{11}(\theta )&= \mathbb {E}\left[ \left( \partial _{\theta } \ell (\theta ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{r}{\theta } - \frac{X}{1-\theta } \right) ^2 \right] \nonumber \\&= \frac{r^2}{\theta ^2} - \frac{2r\mathbb {E}[X]}{\theta (1-\theta )} + \frac{\mathbb {E}[X^2]}{(1-\theta )^2}\nonumber \\&= \frac{r}{\theta ^2(1-\theta )}, \end{aligned}$$
(23)

where we have used that \(\mathbb {E}[X] = \frac{r(1-\theta )}{\theta }\) and \(\mathbb {E}[X^2] = \frac{r(1-\theta )+r^2(1-\theta )^2}{\theta ^2}\). The Fisher–Rao distance is then

$$\begin{aligned} d_{\textrm{FR}}(\theta _1,\theta _2) = \left| \int _{\theta _1}^{\theta _2} \frac{\sqrt{r}}{\theta \sqrt{1-\theta }}~\textrm{d}\theta \right| = 2 \sqrt{r} \left| {{\,\textrm{arctanh}\,}}\sqrt{1-\theta _1} - {{\,\textrm{arctanh}\,}}\sqrt{1-\theta _2} \right| . \end{aligned}$$
(24)
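Each of the closed forms (18), (20), (22) and (24) can be checked against the quadrature formula (10) using the Fisher informations computed above; a short sketch in Python (assuming NumPy and SciPy; the parameter values n, r, \(\theta_1\), \(\theta_2\) are arbitrary):

```python
import numpy as np
from scipy.integrate import quad

def fr_1d(g11, t1, t2):
    """One-dimensional Fisher-Rao distance (10), by numerical quadrature."""
    return abs(quad(lambda t: np.sqrt(g11(t)), t1, t2)[0])

n, r = 10, 3            # binomial size and negative-binomial parameter
t1, t2 = 0.2, 0.7       # two values of the parameter theta (or lambda)

checks = {
    "binomial":          (lambda t: n / (t * (1 - t)),
                          2 * np.sqrt(n) * abs(np.arcsin(np.sqrt(t1)) - np.arcsin(np.sqrt(t2)))),
    "Poisson":           (lambda t: 1 / t,
                          2 * abs(np.sqrt(t1) - np.sqrt(t2))),
    "geometric":         (lambda t: 1 / (t ** 2 * (1 - t)),
                          2 * abs(np.arctanh(np.sqrt(1 - t1)) - np.arctanh(np.sqrt(1 - t2)))),
    "negative binomial": (lambda t: r / (t ** 2 * (1 - t)),
                          2 * np.sqrt(r) * abs(np.arctanh(np.sqrt(1 - t1)) - np.arctanh(np.sqrt(1 - t2)))),
}
for name, (g11, closed_form) in checks.items():
    print(name, fr_1d(g11, t1, t2), closed_form)   # the two columns agree
```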

4.5 Categorical

A categorical distribution [2, 4, 10, 12, 46] models a random variable taking values in the sample space \(\mathcal {X}= \{ 1, 2, \dots , n \}\) with probabilities \(p_1, \dots , p_n\), and has p.m.f. \(p(x) = \sum _{i=1}^{n} p_i \mathbbm {1}_{\{i\}}(x)\). The associated \((n-1)\)-dimensional statistical manifold

$$\begin{aligned} \mathcal {S}= \left\{ p = \textstyle \sum _{i=1}^{n} p_i \mathbbm {1}_{\{i\}} \;\Big \vert \;p_i \in ]0,1[,\ \textstyle \sum _{i=1}^{n}p_i = 1 \right\} \end{aligned}$$

is in correspondence with the interior of the probability simplex \(\mathring{\Delta }^{n-1} {:}{=}\left\{ \pmb {p} = (p_1, \dots , p_n) \;\Big \vert \;p_i \in ]0,1[,\ \textstyle \sum _{i=1}^{n}p_i = 1 \right\} \) through the bijection \(\iota :\mathring{\Delta }^{n-1} \rightarrow \mathcal {S}\), given by \((p_1, \dots , p_n) \mapsto \sum _{i=1}^{n} p_i \mathbbm {1}_{\{i\}}\). Both these manifolds can be parametrised by the set

$$\begin{aligned} \Xi = \left\{ \xi = (\xi ^1,\dots ,\xi ^{n-1}) \in \mathbb {R}^{n-1} \;\Big \vert \;\xi ^i > 0,\ \textstyle \sum _{i=1}^{n-1} \xi ^i < 1 \right\} , \end{aligned}$$
(25)

by taking \(p_i = \xi ^i\), for \(1 \le i \le n-1\), and \(p_n = 1 - \sum _{i=1}^{n-1} \xi ^i\).

To compute the Fisher matrix, it is useful to write \(p(x) = \sum _{i=1}^{n-1} \xi ^i \mathbbm {1}_{\{i\}}(x) + \big ( 1 - \sum _{i=1}^{n-1} \xi ^i \big ) \mathbbm {1}_{\{n\}}(x)\), so that

$$\begin{aligned} \partial _i \ell (\xi ) = \frac{\mathbbm {1}_{\{i\}}(x) - \mathbbm {1}_{\{n\}}(x)}{\sum _{k=1}^{n} p_k \mathbbm {1}_{\{k\}}(x)}, \end{aligned}$$

with \(p_k = p_k(\xi )\) as above. The elements of the Fisher matrix are, for \(1 \le i,j \le n-1\),

$$\begin{aligned} g_{ij}(\xi )&= \mathbb {E}\left[ \left( \partial _i \ell (\xi ) \right) \left( \partial _j \ell (\xi ) \right) \right] = \mathbb {E}\left[ \textstyle \frac{\left( \mathbbm {1}_{\{i\}}(X) - \mathbbm {1}_{\{n\}}(X) \right) \left( \mathbbm {1}_{\{j\}}(X) - \mathbbm {1}_{\{n\}}(X) \right) }{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} \right] \nonumber \\&= \mathbb {E}\left[ \textstyle \frac{\mathbbm {1}_{\{i\}}(X) \mathbbm {1}_{\{j\}}(X)}{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} - \frac{\mathbbm {1}_{\{i\}}(X) \mathbbm {1}_{\{n\}}(X)}{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} - \frac{\mathbbm {1}_{\{j\}}(X) \mathbbm {1}_{\{n\}}(X)}{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} + \frac{\left( \mathbbm {1}_{\{n\}}(X)\right) ^2}{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} \right] \nonumber \\&= \frac{\delta _{ij}}{p_i} + \frac{1}{p_n}\nonumber \\&= \frac{\delta _{ij}}{\xi ^i} + \frac{1}{1 - \sum _{k=1}^{n-1} \xi ^k}, \end{aligned}$$
(26)

where we have used that \(\mathbb {E}\left[ \frac{\mathbbm {1}_{\{i\}}(X) \mathbbm {1}_{\{j\}}(X)}{\left( \sum _k p_k \mathbbm {1}_{\{k\}}(X)\right) ^2} \right] = \frac{\delta _{ij}}{p_i}\).

To obtain the geodesics and the Fisher–Rao distance, it is convenient to consider the mapping \(p_i \mapsto 2\sqrt{p_i}\) that takes the simplex \(\Delta ^{n-1} \subseteq \mathbb {R}^{n}\) (in correspondence with the statistical manifold) to the positive part of the radius-two Euclidean sphere, denoted \(S_{2,+}^{n-1} \subseteq \mathbb {R}^n\). In fact, this bijection is an isometry [40, 46]:

Proposition 5

The diffeomorphism

$$\begin{aligned} f :\quad \mathcal {S}\subseteq \mathcal {P}({\mathcal {X}})&\rightarrow S_{2,+}^{n-1} \subseteq \mathbb {R}^n\nonumber \\ p = \sum _{i=1}^{n} p_i \mathbbm {1}_{\{i\}}&\mapsto (2\sqrt{p_1}, \dots , 2\sqrt{p_n}) \end{aligned}$$
(27)

is an isometry between the statistical manifold \(\mathcal {S}\) equipped with the Fisher metric \(g_p\) and \(S_{2,+}^{n-1}\) with the restriction of the ambient Euclidean metric.

Proof

We will show that \(g_p(u,v) = \left\langle \textrm{d}f_p(u), \textrm{d}f_p(v) \right\rangle \), for all \(p \in \mathcal {S}\), \(u,v \in T_p\mathcal {S}\). Let \(\varphi \) denote the parametrisation of the statistical manifold \(\mathcal {S}\). Consider the curve \(\alpha _i(t) = \varphi \left( \xi ^1, \dots , \xi ^i+t, \dots , \xi ^{n-1} \right) \in \mathcal {S}\) and take \(p=\varphi (\xi ^1,\dots ,\xi ^{n-1})\). We have that

$$\begin{aligned} \beta _i(t) {:}{=}\left( f \circ \alpha _i \right) (t) = \left( 2\sqrt{\xi ^1}, \dots , 2\sqrt{\xi ^i+t}, \dots , 2\sqrt{\xi ^{n-1}}, 2\sqrt{1-\textstyle \sum _{k=1}^{n-1} \xi ^k - t} \right) . \end{aligned}$$

Now, we can compute the differential applied to the tangent vector \(\frac{\partial }{\partial \xi ^i}(p)\):

$$\begin{aligned} \textrm{d}f_p \left( \frac{\partial }{\partial \xi ^i}(p) \right) = \left. \frac{\textrm{d}}{\textrm{d}t} \beta _i (t) \right| _{t=0} = \left( 0, \dots , 0, \frac{1}{\sqrt{\xi ^i}}, 0, \dots , 0, \frac{-1}{\sqrt{1-\textstyle \sum _{k=1}^{n-1}\xi ^k}} \right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \left\langle \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^i}(p)\right) , \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^j}(p)\right) \right\rangle = \frac{\delta _{ij}}{\xi ^i} + \frac{1}{1-\sum _{k=1}^{n-1} \xi ^k}, \end{aligned}$$

which is equal to \(g_{ij}(\xi ) = g_p \left( \frac{\partial }{\partial \xi ^i}(p), \frac{\partial }{\partial \xi ^j}(p) \right) \) in (26). Since \(\left\{ \frac{\partial }{\partial \xi ^1}(p), \dots , \frac{\partial }{\partial \xi ^{n-1}}(p) \right\} \) is a basis of \(T_p\mathcal {S}\), this is enough to show that f is indeed an isometry. \(\square \)

Thus, the Fisher metric in \(\mathcal {S}\) coincides with the Euclidean metric restricted to the positive part of the sphere \(S^{n-1}_{2,+}\), that is, the Fisher–Rao distance between distributions \(p = \varphi (p_1, \dots , p_{n-1})\) and \(q = \varphi (q_1, \dots , q_{n-1})\) in \(\mathcal {S}\) is equal to the length of the geodesic joining f(p) and f(q) on the sphere, which is a great-circle arc. This length is twice the angle \(\alpha \) between the vectors f(p) and f(q), i.e., \(2 \alpha = 2 \arccos \left\langle \frac{f(p)}{2}, \frac{f(q)}{2} \right\rangle = 2 \arccos \left( \sum _{i=1}^{n} \sqrt{p_iq_i}\right) \), with \(p_n = 1-\sum _{i=1}^{n-1}p_i\) and \(q_n = 1-\sum _{i=1}^{n-1}q_i\). Therefore, the Fisher–Rao distance between these two distributions is

$$\begin{aligned} d_{\textrm{FR}}(p,q) = 2 \arccos \left( \sum _{i=1}^{n} \sqrt{p_iq_i}\right) . \end{aligned}$$
(28)

Note that the isometry f also allows extending the Fisher metric to the boundaries of the statistical manifold \(\mathcal {S}\).

Remark 4

The reparametrisation to the sphere provides a nice geometrical interpretation for the relation between the Fisher–Rao distance and the Hellinger distance [47, § 2.4] \(d_{\textrm{H}}(p,q) = \sqrt{\sum _{i=1}^{n} \left( \sqrt{p_i} - \sqrt{q_i} \right) ^2}\). While the Fisher–Rao distance between distributions p and q is the length of the radius-two circumference arc between f(p) and f(q), the Hellinger distance is half the Euclidean distance between f(p) and f(q), i.e., \(2d_{\textrm{H}}(p,q) = \Vert f(p)-f(q)\Vert _2\). In other words, double the Hellinger distance is the arc-chord approximation for the Fisher–Rao distance.
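The relation described in Remark 4 is easy to visualise numerically; a minimal sketch in Python (assuming NumPy; the probability vectors are arbitrary) comparing the Fisher–Rao distance (28) with twice the Hellinger distance, its arc-chord approximation:

```python
import numpy as np

def fisher_rao_categorical(p, q):
    """Fisher-Rao distance (28) between two categorical distributions."""
    return 2 * np.arccos(np.clip(np.sum(np.sqrt(p * q)), -1.0, 1.0))

def hellinger(p, q):
    """Hellinger distance between two categorical distributions."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

p = np.array([0.8, 0.1, 0.1])
q = np.array([0.1, 0.1, 0.8])
print(fisher_rao_categorical(p, q))   # arc length on the radius-two sphere, ~1.685
print(2 * hellinger(p, q))            # chord length, ~1.635: slightly smaller
```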

4.6 Multinomial

Consider m i.i.d. experiments that follow a categorical distribution with n possible outcomes and probabilities \(p_1, \dots , p_n\). A multinomial distribution [2, 10, 12] gives the probability of getting \(x_i\) times the i-th outcome, for \(1 \le i \le n\) and \(\sum _{i=1}^{n} x_i = m\). Its p.m.f. is \(p(\pmb {x}) = p(x_1, \dots , x_n) = m! \prod _{i=1}^{n} \frac{p_i^{x_i}}{x_i!}\) and is defined on the sample space \(\mathcal {X}= \left\{ \pmb {x} = (x_1, \dots , x_n) \in \mathbb {N}^n \;\Big \vert \;\sum _{i=1}^{n} x_i = m \right\} \). This distribution is parametrised by the same \(\xi = (\xi ^1, \dots , \xi ^{n-1}) \in \Xi \) as in the categorical distribution, cf. (25), with \(p_i = \xi ^i\), for \(1 \le i \le n-1\) and \(p_n = 1 - \sum _{i=1}^{n-1} \xi ^i\). In this case, we have \(\partial _i \ell (\xi ) = \frac{x_i}{p_i} - \frac{x_n}{p_n}\) and \(\partial _j\partial _i \ell (\xi ) = - \frac{x_i}{p_i^2}\delta _{ij} - \frac{x_n}{p_n^2}\). Thus, the elements of the Fisher matrix are given by

$$\begin{aligned} g_{ij}(\xi )&= - \mathbb {E}\left[ \partial _j\partial _i \ell (\xi ) \right] = \mathbb {E}\left[ \frac{X_i}{p_i^2}\delta _{ij} + \frac{X_n}{p_n^2} \right] \nonumber \\&= m \left( \frac{\delta _{ij}}{p_i} + \frac{1}{p_n} \right) \nonumber \\&= m \left( \frac{\delta _{ij}}{\xi ^i} + \frac{1}{1-\sum _{k=1}^{n-1} \xi ^k} \right) , \end{aligned}$$
(29)

where we have used that \(\mathbb {E}[X_i] = m p_i\). Note that this is the same metric as for the categorical distribution (26), up to the factor m. Therefore, the Fisher–Rao distance between two distributions \(p = \varphi (p_1, \dots , p_{n-1})\) and \(q = \varphi (q_1, \dots , q_{n-1})\), with \(p_n = 1-\sum _{i=1}^{n-1}p_i\) and \(q_n = 1-\sum _{i=1}^{n-1}q_i\) is

$$\begin{aligned} d_{\textrm{FR}}(p,q) = 2 \sqrt{m} \arccos \left( \sum _{i=1}^{n} \sqrt{p_iq_i} \right) . \end{aligned}$$
(30)

4.7 Negative multinomial

A negative multinomial distribution [12, 19, 48] generalises the negative binomial distribution. Consider a sequence of i.i.d. categorical experiments with n possible outcomes. A negative multinomial distribution models the number of times \(x_1, \dots , x_{n-1}\) that the first \(n-1\) outcomes occur before the n-th outcome occurs \(x_n\) times. It is characterised by the p.m.f. \(p(\pmb {x}) = p(x_1, \dots , x_{n-1}) = p_n^{x_n} \frac{\Gamma (\sum _{i=1}^{n} x_i)}{\Gamma (x_n)} \prod _{i=1}^{n-1} \frac{p_i^{x_i}}{x_i!}\), and defined for \(\pmb {x} = (x_1, \dots , x_{n-1}) \in \mathbb {N}^{n-1}\). It has the same parametrisation as the categorical distribution, cf. (25), with \(p_i = \xi ^i\), for \(1 \le i \le n-1\) and \(p_n = 1 - \sum _{i=1}^{n-1} \xi ^i\), for a fixed \(x_n\) that can be extended to \(x_n \in \mathbb {R}_+^*\). In this case, we have \(\partial _i\ell (\xi ) = \frac{x_i}{p_i} - \frac{x_n}{p_n}\) and \(\partial _j\partial _i\ell (\xi ) = -\frac{x_i}{p_i^2}\delta _{ij} -\frac{x_n}{p_n^2}\). The elements of the Fisher matrix are then

$$\begin{aligned} g_{ij}(\xi )&= -\mathbb {E}\left[ \partial _j\partial _i\ell (\xi ) \right] = \mathbb {E}\left[ \frac{X_i}{p_i^2}\delta _{ij} + \frac{x_n}{p_n^2} \right] \nonumber \\&= \frac{x_n}{p_n} \left( \frac{\delta _{ij}}{p_i} + \frac{1}{p_n} \right) \nonumber \\&= \frac{x_n}{1-\sum _{k=1}^{n-1}\xi ^k} \left( \frac{\delta _{ij}}{\xi ^i} + \frac{1}{1-\sum _{k=1}^{n-1}\xi ^k} \right) , \end{aligned}$$
(31)

where we used that \(\mathbb {E}[X_i] = x_np_i/p_n\).

To find the Fisher–Rao distance, we relate the geometry of this manifold to the radius-two hemisphere model with hyperbolic metric (cf. Sect. 3), using a diffeomorphism similar to (27). Consider

$$\begin{aligned} f :\quad \mathcal {S}\subseteq \mathcal {P}(\mathcal {X})&\rightarrow S_{2,+}^{n-1} \subseteq \mathcal {H}^{n}\nonumber \\ p = \sum _{i=1}^{n} p_i \mathbbm {1}_{\{i\}}&\mapsto \left( 2\sqrt{p_1}, \dots , 2\sqrt{p_n} \right) . \end{aligned}$$
(32)

Denote \(\varphi \) the parametrisation of the statistical manifold, and consider the point \(p = \varphi (\xi ^1,\dots ,\xi ^{n-1})\). Taking the curves \(\alpha _i(t) = \varphi (\xi ^1, \dots , \xi ^i+t, \dots , \xi ^{n-1})\) in \(\mathcal {S}\), their images in \(S_{2,+}^{n-1}\) are

$$\begin{aligned} \beta _i(t) {:}{=}(f \circ \alpha _i)(t) = \left( 2\sqrt{\xi ^1}, \dots , 2\sqrt{\xi ^i+t}, \dots , 2\sqrt{\xi ^{n-1}}, 2\sqrt{1 - \textstyle \sum _{k=1}^{n-1}\xi ^k -t} \right) . \end{aligned}$$

We have

$$\begin{aligned} \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^i}(p) \right) = \left. \frac{\textrm{d}}{\textrm{d}t}\beta _i(t)\right| _{t=0} = \left( 0, \dots , 0, \frac{1}{\sqrt{\xi ^i}}, 0, \dots , \frac{-1}{\sqrt{1-\textstyle \sum _{k=1}^{n-1}\xi ^k}} \right) , \end{aligned}$$

therefore

$$\begin{aligned} \left\langle \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^i}(p) \right) , \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^j}(p) \right) \right\rangle _{G_{\mathcal {H}^{n}}(p)} = \frac{1}{4\left( 1-\sum _{k=1}^{n-1}\xi ^k\right) } \left( \frac{\delta _{ij}}{\xi ^i} + \frac{1}{1-\sum _{k=1}^{n-1}\xi ^k} \right) , \end{aligned}$$

and we conclude that \(g_{ij}(\xi ) = g_{\xi } \left( \frac{\partial }{\partial \xi ^i}(p), \frac{\partial }{\partial \xi ^j}(p) \right) = 4x_n \left\langle \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^i}(p) \right) , \right. \left. \textrm{d}f_p\left( \frac{\partial }{\partial \xi ^j}(p) \right) \right\rangle _{G_{\mathcal {H}^n}(p)}\). This means that, up to the factor \(4x_n\), f in (32) is an isometry. Then, using (16), we find that the Fisher–Rao distance between two distributions \(p = \varphi (p_1, \dots , p_{n-1})\) and \(q = \varphi (q_1, \dots , q_{n-1})\), with \(p_n = 1-\sum _{i=1}^{n-1}p_i\) and \(q_n = 1-\sum _{i=1}^{n-1}q_i\), is

$$\begin{aligned} d_\textrm{FR}(p,q)&= 2 \sqrt{x_n}\ \widetilde{d}_{S_2^{n-1}}\left( (2\sqrt{p_1},\dots ,2\sqrt{p_n}), (2\sqrt{q_1},\dots ,2\sqrt{q_n}) \right) \nonumber \\&= 2 \sqrt{x_n} {{\,\textrm{arccosh}\,}}\left( \frac{1 - \sum _{i=1}^{n-1} \sqrt{p_iq_i}}{\sqrt{p_nq_n}} \right) . \end{aligned}$$
(33)

In the above equality, we used the fact that the hyperbolic distance in \(S_{2,+}^{n-1} \subseteq \mathcal {H}^n\) is invariant to dilation and contraction, as remarked in Sect. 3. We also conclude that the geodesics in the statistical manifold \(\mathcal {S}\) are associated to orthogonal semicircles in \(S_{2,+}^{n-1}\).

It is interesting to note that the similar maps (27) and (32), for the categorical and negative multinomial distributions respectively, are used to embed the statistical manifold (in correspondence with the simplex) into the radius-two sphere, but equipped with different metrics. Figure 2 illustrates the geodesics according to these two metrics; a short numerical check of the distance values quoted in its caption is sketched below.
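As a sanity check of (28) and (33), the following sketch in Python (assuming NumPy) reproduces the numerical values quoted in the caption of Fig. 2:

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.3, 0.6])

# Categorical distance, Eq. (28)
d_cat = 2 * np.arccos(np.sum(np.sqrt(p * q)))

# Negative multinomial distance, Eq. (33), with the factor sqrt(x_n) left out
d_neg = 2 * np.arccosh((1 - np.sum(np.sqrt(p[:-1] * q[:-1]))) / np.sqrt(p[-1] * q[-1]))

print(round(d_cat, 3), round(d_neg, 3))   # approximately 1.432 and 2.637
```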

Fig. 2

Geodesics joining points \(p=(0.7,0.2,0.1)\) and \(q=(0.1,0.3,0.6)\) according to categorical (solid) and negative multinomial (dashed) metrics, seen in the parameter space, on the simplex, and on the sphere. The distance between the categorical distributions is \(d_{\text {cat}}(p,q) \approx 1.432\), and between the negative multinomial distributions is \(d_{\text {neg-mult}}(p,q) \approx 2.637 \sqrt{x_n}\)

5 Continuous distributions

In the following, we start by presenting one-dimensional examples (Sects. 5.1–5.3), then we consider two-dimensional statistical manifolds (Sects. 5.4–5.9), and multivariate models (Sects. 5.10–5.11). The results are summarised in Table 2.

Table 2 Fisher–Rao distances for continuous distributions

5.1 Exponential

An exponential distribution [10] has p.d.f. \(p(x) = \lambda e^{-\lambda x}\), defined for \(x \in \mathbb {R}_+\), and parametrised by \(\lambda \in \mathbb {R}_+^{*}\). In this case, we have \(\partial _{\lambda }\ell (\lambda ) = \frac{1}{\lambda } - x\), and the Fisher information is

$$\begin{aligned} g_{11}(\lambda )&= \mathbb {E}\left[ \left( \partial _{\lambda } \ell (\lambda ) \right) ^2 \right] \nonumber \\&= \mathbb {E}\left[ \left( \frac{1}{\lambda } - X \right) ^2 \right] \nonumber \\&= \frac{1}{\lambda ^2} - \frac{2}{\lambda }\mathbb {E}[X] + \mathbb {E}[X^2] \nonumber \\&= \frac{1}{\lambda ^2}, \end{aligned}$$
(34)

where we have used that \(\mathbb {E}[X]=\frac{1}{\lambda }\) and \(\mathbb {E}[X^2] = \frac{2}{\lambda ^2}\). The Fisher–Rao distance is given by

$$\begin{aligned} d_{\textrm{FR}}(\lambda _1,\lambda _2) = \left| \int _{\lambda _1}^{\lambda _2} \frac{1}{\lambda }~\textrm{d}\lambda \right| = \left| \log \lambda _1 - \log \lambda _2 \right| . \end{aligned}$$
(35)

5.2 Rayleigh

A Rayleigh distribution has p.d.f. \(p(x) = \frac{x}{\sigma ^2}\exp \left( -\frac{x^2}{2\sigma ^2} \right) \), defined for \(x \in \mathbb {R}_+\), and parametrised by \(\sigma \in \mathbb {R}_+^*\). We have \(\partial _{\sigma } \ell (\sigma ) = \frac{x^2}{\sigma ^3} - \frac{2}{\sigma }\), and the Fisher information is

$$\begin{aligned} g_{11}(\sigma )&= \mathbb {E}\left[ \left( \partial _{\sigma } \ell (\sigma ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{X^2}{\sigma ^3} - \frac{2}{\sigma } \right) ^2 \right] \nonumber \\&= \frac{\mathbb {E}[X^4]}{\sigma ^6} - \frac{4\mathbb {E}[X^2]}{\sigma ^4} + \frac{4}{\sigma ^2}\nonumber \\&= \frac{4}{\sigma ^2}, \end{aligned}$$
(36)

where we have used that \(\mathbb {E}[X^2] = 2\sigma ^2\) and \(\mathbb {E}[X^4] = 8\sigma ^4\). Thus, the Fisher–Rao distance is given by

$$\begin{aligned} d_{\textrm{FR}}(\sigma _1,\sigma _2) = \left| \int _{\sigma _1}^{\sigma _2} \frac{2}{\sigma }~\textrm{d}\sigma \right| = 2 \left| \log \sigma _1 - \log \sigma _2 \right| . \end{aligned}$$
(37)

5.3 Erlang

An Erlang distribution has p.d.f. \(p(x) = \frac{\lambda ^k x^{k-1} e^{-\lambda x}}{(k-1)!}\), defined for \(x \in \mathbb {R}_+\), and parametrised by \(\lambda \in \mathbb {R}_+^*\), for a fixed \(k \in \mathbb {N}^*\). We have \(\partial _\lambda \ell (\lambda ) = \frac{k}{\lambda } - x\), so that the Fisher information is

$$\begin{aligned} g_{11}(\lambda )&= \mathbb {E}\left[ \left( \partial _{\lambda } \ell (\lambda ) \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{k}{\lambda } - X \right) ^2 \right] \nonumber \\&= \mathbb {E}[X^2] - \frac{2k\mathbb {E}[X]}{\lambda } + \frac{k^2}{\lambda ^2}\nonumber \\&= \frac{k}{\lambda ^2}, \end{aligned}$$
(38)

where we have used that \(\mathbb {E}[X] = \frac{k}{\lambda }\) and \(\mathbb {E}[X^2] = \frac{k(k+1)}{\lambda ^2}\). The Fisher–Rao distance is then

$$\begin{aligned} d_{\textrm{FR}}(\lambda _1,\lambda _2) = \left| \int _{\lambda _1}^{\lambda _2} \frac{\sqrt{k}}{\lambda }~\textrm{d}\lambda \right| = \sqrt{k} \left| \log \lambda _1 - \log \lambda _2 \right| . \end{aligned}$$
(39)

5.4 Univariate elliptical distributions

Elliptical distributions are a class of distributions that generalise Gaussian distributions [51]. Here we focus on univariate elliptical distributions, which are defined for \(x \in \mathbb {R}\), parametrised by \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}_+^{*}\), and have p.d.f. of the form

$$\begin{aligned} p(x) = \frac{1}{\sigma } h \left( \frac{(x-\mu )^2}{\sigma ^2} \right) , \end{aligned}$$
(40)

for a fixed measurable function \(h :\mathbb {R}_+ \rightarrow \mathbb {R}_+\) that satisfies \(\int _{-\infty }^{\infty } h(z^2)~\textrm{d}z = 1\) and \(\lim _{z\rightarrow +\infty } zh(z^2)=0\). For a random variable X distributed according to (40), provided its mean and variance exist, they are given by \(\mathbb {E}[X]=\mu \) and \(\text {Var}(X) = \sigma ^2 \int _{-\infty }^{\infty } z^2\,h(z^2)~\textrm{d}z\). For each function h, the set of distributions of the form (40) forms a statistical manifold parametrised by \((\mu ,\sigma )\). Some examples of these manifolds have been studied in [17], and we write some other examples in the same standard form.

For these distributions, we have \(\partial _{\mu } \ell {:}{=}\partial _{\mu } \ell (\mu ,\sigma ) = -\frac{2(x-\mu )}{\sigma ^2} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) }\), and \(\partial _{\sigma } \ell {:}{=}\partial _{\sigma } \ell (\mu ,\sigma ) = -\frac{1}{\sigma } - \frac{2(x-\mu )^2}{\sigma ^3} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) }\). Thus, the elements of the Fisher matrix are

$$\begin{aligned} g_{11}&= \mathbb {E}\left[ \left( \partial _{\mu }\ell \right) ^2 \right] \\&= \int _{-\infty }^{\infty } \left( -\frac{2(x-\mu )}{\sigma ^2} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) } \right) ^2 \frac{1}{\sigma } h \left( \frac{(x-\mu )^2}{\sigma ^2} \right) ~\textrm{d}x\\&=\frac{4}{\sigma ^2}\int _{-\infty }^{\infty } z^2 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}~\textrm{d}z,\\ g_{12}&= g_{21} = \mathbb {E}\left[ \left( \partial _\mu \ell \right) \left( \partial _\sigma \ell \right) \right] \\&=\int ^{\infty }_{-\infty } \left( -\frac{2(x-\mu )}{\sigma ^2} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) } \right) \left( -\frac{1}{\sigma } - \frac{2(x-\mu )^2}{\sigma ^3} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) } \right) \\&\quad \times \frac{1}{\sigma } h \left( \frac{(x-\mu )^2}{\sigma ^2} \right) ~\textrm{d}x\\&=\frac{2}{\sigma ^2} \int ^{\infty }_{-\infty } z\left( h'(z^2) + 2z^2 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}\right) ~\textrm{d}z\\&=0,\\ g_{22}&=\mathbb {E}\left[ \left( \partial _{\sigma }\ell \right) ^2 \right] \\&= \int _{-\infty }^{\infty } \left( -\frac{1}{\sigma } - \frac{2(x-\mu )^2}{\sigma ^3} \frac{h' \left( \sigma ^{-2}(x-\mu )^2 \right) }{h \left( \sigma ^{-2}(x-\mu )^2 \right) } \right) ^2 \frac{1}{\sigma } h \left( \frac{(x-\mu )^2}{\sigma ^2} \right) ~\textrm{d}x\\&= \int _{-\infty }^{\infty } \left( \frac{1}{\sigma ^2} + \frac{4}{\sigma ^2}z^4 \left( \frac{h'(z^2)}{h(z^2)} \right) ^2 + \frac{4}{\sigma ^2} z^2 \frac{h'(z^2)}{h(z^2)} \right) h(z^2)~\textrm{d}z\\&= \frac{1}{\sigma ^2} + \frac{4}{\sigma ^2} \int _{-\infty }^{\infty } z^4 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}~\textrm{d}z + \frac{4}{\sigma ^2} \int _{-\infty }^{\infty } z^2 h'(z^2)~\textrm{d}z \\&=\frac{4}{\sigma ^2} \int _{-\infty }^{\infty } z^4 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}~\textrm{d}z-\frac{1}{\sigma ^2}. \end{aligned}$$

The Fisher matrix for elliptical distributions has the form

$$\begin{aligned} G(\mu ,\sigma ) =\begin{pmatrix} \frac{a_h}{\sigma ^2} & \quad 0\\ 0 & \quad \frac{b_h}{\sigma ^2} \end{pmatrix}, \end{aligned}$$
(41)

where

$$\begin{aligned} a_h {:}{=}4 \int _{-\infty }^{\infty } z^2 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}~\textrm{d}z, \end{aligned}$$

and

$$\begin{aligned} b_h {:}{=}4\int _{-\infty }^{\infty } z^4 \frac{\left[ h'(z^2)\right] ^2}{h(z^2)}~\textrm{d}z - 1. \end{aligned}$$

Applying Lemma 4 to a statistical manifold formed by univariate elliptical distributions (i.e., for a fixed h) equipped with the metric (41), we get an expression for the Fisher–Rao distance in this manifold given by

$$\begin{aligned} d_{\textrm{FR}} \big ((\mu _1,\sigma _1), (\mu _2, \sigma _2)\big )&= \sqrt{b_h} d_{\mathcal {H}^2} \left( (\sqrt{a_h}\mu _1,\sqrt{b_h}\sigma _1),( \sqrt{a_h}\mu _2,\sqrt{b_h}\sigma _2 )\right) \nonumber \\&= 2\sqrt{b_h} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{a_h(\mu _1-\mu _2)^2 + b_h(\sigma _1-\sigma _2)^2}{a_h(\mu _1-\mu _2)^2 + b_h(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(42)

Note that, if \(\mu _1 = \mu _2 {=}{:}\mu \), the expression simplifies to

$$\begin{aligned} d_{\textrm{FR}} \big ((\mu ,\sigma _1), (\mu , \sigma _2)\big )&= \sqrt{b_h} \left| \log \sigma _1 - \log \sigma _2 \right| . \end{aligned}$$
(43)
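The general expression (42) translates directly into a short routine; a sketch in Python (assuming NumPy; the function name is ours), where the constants for the Gaussian and Laplace cases (\(a_h=1, b_h=2\) and \(a_h=b_h=1\)) are those derived in Sects. 5.4.1 and 5.4.2 below:

```python
import numpy as np

def fisher_rao_elliptical(mu1, s1, mu2, s2, a, b):
    """Fisher-Rao distance (42) for a univariate elliptical family
    with Fisher matrix diag(a/sigma^2, b/sigma^2)."""
    num = a * (mu1 - mu2) ** 2 + b * (s1 - s2) ** 2
    den = a * (mu1 - mu2) ** 2 + b * (s1 + s2) ** 2
    return 2 * np.sqrt(b) * np.arctanh(np.sqrt(num / den))

print(fisher_rao_elliptical(0.0, 1.0, 1.0, 2.0, a=1, b=2))   # Gaussian case, cf. (45)
print(fisher_rao_elliptical(0.0, 1.0, 1.0, 2.0, a=1, b=1))   # Laplace case, cf. (47)
# Equal means: reduces to sqrt(b) |log s1 - log s2|, cf. (43)
print(fisher_rao_elliptical(0.0, 1.0, 0.0, 2.0, a=1, b=2), np.sqrt(2) * np.log(2))
```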

5.4.1 Gaussian

A Gaussian distribution [2, 10, 12, 17, 34, 45] is characterised by the p.d.f. \(p(x) = \frac{1}{\sigma \sqrt{2 \pi }}\exp \left( -\frac{(x-\mu )^2}{2\sigma ^2}\right) \), defined for \(x \in \mathbb {R}\), and parametrised by the pair \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\). Note that Gaussian distributions are elliptical distributions with \(h(u)=\frac{1}{\sqrt{2\pi }}\exp (-u/2)\), \(a_h=1\), and \(b_h=2\). Thus, by (41), the corresponding Fisher matrix is

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{1}{\sigma ^2} & \quad 0\\ 0 & \quad \frac{2}{\sigma ^2} \end{pmatrix}. \end{aligned}$$
(44)

Then, by (42), the Fisher–Rao distance is obtained as

$$\begin{aligned} d_{\textrm{FR}} \big ((\mu _1,\sigma _1), (\mu _2, \sigma _2)\big )&= 2\sqrt{2} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\mu _1-\mu _2)^2 + 2(\sigma _1-\sigma _2)^2}{(\mu _1-\mu _2)^2 + 2(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(45)

5.4.2 Laplace

A Laplace distribution has p.d.f. \(p(x) = \frac{1}{2\sigma }\exp \left( -\frac{|x-\mu |}{\sigma } \right) \), defined for \(x \in \mathbb {R}\), and parametrised by \((\mu ,\sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\). Laplace distributions are elliptical distributions with \(h(u)=\frac{1}{2}\exp \left( -\sqrt{u}\right) \), \(a_h=1\), and \(b_h=1\). Using (41), we get the Fisher matrix as

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{1}{\sigma ^2} & \quad 0\\ 0 & \quad \frac{1}{\sigma ^2} \end{pmatrix}, \end{aligned}$$
(46)

showing that this metric coincides with the hyperbolic metric (11) of the Poincaré half-plane. Using (42), we get that the Fisher–Rao distance in this manifold is

$$\begin{aligned} d_{\textrm{FR}} \big ( (\mu _1,\sigma _1), (\mu _2,\sigma _2) \big ) = 2 {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\mu _1-\mu _2)^2+(\sigma _1-\sigma _2)^2}{(\mu _1-\mu _2)^2+(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(47)

The particular case of zero-mean Laplace distributions is included in [22].

5.4.3 Generalised Gaussian

A generalised Gaussian distribution [22, 51, 53], also known as exponential power distribution, is characterised by the p.d.f. \(p(x) = \frac{\beta }{2\sigma \Gamma (1/\beta )}\exp \left( -\left( \frac{\vert x-\mu \vert }{\sigma }\right) ^{\beta }\right) \), defined for \(x \in \mathbb {R}\), and parametrised by \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\), for a fixed \(\beta >0\). These can be seen as elliptical distributions with \(h(u)=\frac{\beta }{2\Gamma (1/\beta )}\exp (-u^{\beta /2})\), \(a_h = \beta \frac{\Gamma (2-1/\beta )}{\Gamma (1+1/\beta )}\), and \(b_h=\beta \). Using (41), we get the Fisher matrix as

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{\beta }{\sigma ^2}\frac{\Gamma (2-1/\beta )}{\Gamma (1+1/\beta )} & \quad 0\\ 0 & \quad \frac{\beta }{\sigma ^2} \end{pmatrix}, \end{aligned}$$
(48)

and, using (42), we get the Fisher–Rao distance as

$$\begin{aligned}&d_{\textrm{FR}} \big ((\mu _1,\sigma _1), (\mu _2, \sigma _2)\big )\nonumber \\&\quad = 2\sqrt{\beta } {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{\Gamma (2-1/\beta )(\mu _1-\mu _2)^2 + \Gamma (1+1/\beta )(\sigma _1-\sigma _2)^2}{\Gamma (2-1/\beta )(\mu _1-\mu _2)^2 + \Gamma (1+1/\beta )(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(49)

Note that choosing \(\beta =2\) yields a Gaussian distribution with mean \(\mu \) and variance \(\sigma ^2/2\), while letting \(\beta =1\) recovers the Laplace distribution of Sect. 5.4.2, with mean \(\mu \) and variance \(2\sigma ^2\). Multivariate, zero-mean generalised Gaussian distributions have been studied in [22].

5.4.4 Logistic

A logistic distribution [18] has p.d.f. \(p(x) = \frac{\exp \left( -(x-\mu )/\sigma \right) }{\sigma \left( \exp \left( -(x-\mu )/\sigma \right) +1\right) ^2}\), defined for \(x \in \mathbb {R}\) and parametrised by \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}^*_+\). A logistic distribution is an elliptical distribution with \(h(u)=\frac{\exp \left( -\sqrt{u}\right) }{\left( 1+\exp \left( -\sqrt{u}\right) \right) ^2}\), \(a_h=\frac{1}{3}\), and \(b_h=\frac{\pi ^2+3}{9}\). From (41), we have that the Fisher matrix is

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{1}{3\sigma ^2} & \quad 0\\ 0 & \quad \frac{\pi ^2+3}{9\sigma ^2} \end{pmatrix}, \end{aligned}$$
(50)

and, from (42), the Fisher–Rao distance is

$$\begin{aligned}&d_{\textrm{FR}}\big ( (\mu _1,\sigma _1), (\mu _2,\sigma _2) \big )\nonumber \\&\qquad = \frac{2\sqrt{\pi ^2+3}}{3} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{3(\mu _2-\mu _1)^2+(\pi ^2+3)(\sigma _2-\sigma _1)^2}{3(\mu _2-\mu _1)^2+(\pi ^2+3)(\sigma _2+\sigma _1)^2}} \right) . \end{aligned}$$
(51)

5.4.5 Cauchy

A Cauchy distribution [17, 49] has p.d.f. \(p(x) = \frac{\sigma }{\pi \left[ (x-\mu )^2 + \sigma ^2 \right] }\), defined for \(x \in \mathbb {R}\), and parametrised by \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\). Cauchy distributions are elliptical distributions, obtained by choosing \(h(u)=\frac{1}{\pi (1+u)}\), with \(a_h=1/2\) and \(b_h=1/2\). Recall that the mean and variance are not defined in this case. From (41), we get its Fisher matrix

$$\begin{aligned} G(\mu ,\sigma ) =\begin{pmatrix} \frac{1}{2\sigma ^2} & \quad 0\\ 0 & \quad \frac{1}{2\sigma ^2} \end{pmatrix}, \end{aligned}$$
(52)

and (42) gives the Fisher–Rao distance as

$$\begin{aligned} d_{\textrm{FR}}\big ( (\mu _1,\sigma _1), (\mu _2, \sigma _2) \big ) = {\sqrt{2}} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\mu _1-\mu _2)^2+(\sigma _1-\sigma _2)^2}{(\mu _1-\mu _2)^2+(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(53)

Figure 3 illustrates geodesics between Gaussian and Cauchy distributions.

Fig. 3

Geodesics joining points \((\mu _1,\sigma _1)=(2,0.5)\) and \((\mu _2,\sigma _2)=(5,1)\) according to the Gaussian metric (solid) and the Cauchy metric (dashed), seen in the parameter space \((\mu ,\sigma )\), and the corresponding densities. By (45), the distance between the two Gaussian distributions is \(d_{\text {Gaussian}}((\mu _1,\sigma _1),(\mu _2,\sigma _2)) \approx 3.443\), and, by (53), the distance between the Cauchy distributions is \(d_{\text {Cauchy}}((\mu _1,\sigma _1),(\mu _2,\sigma _2)) \approx 2.134\)

5.4.6 Student’s t

A location-scale Student’s t distribution [17] with \(\nu \in \mathbb {N}^*\) degrees of freedom generalises the Cauchy distribution. It has p.d.f. \(p(x) = \left( 1+ \frac{1}{\nu }(\frac{x-\mu }{\sigma })^2 \right) ^{-\frac{\nu +1}{2}} \frac{\Gamma \left( (\nu +1)/2\right) }{\sigma \sqrt{\pi \nu }\Gamma (\nu /2)}\), defined for \(x \in \mathbb {R}\), and parametrised by \((\mu ,\sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\). This is an elliptical distribution, with \(h(u)=\frac{\Gamma \left( (\nu +1)/2\right) }{\sqrt{\pi \nu }\Gamma (\nu /2)} \left( 1+\frac{u}{\nu }\right) ^{-(\nu +1)/2}\), \(a_h=\frac{\nu +1}{\nu +3}\), and \(b_h=\frac{2\nu }{\nu +3}\). Then, by (41) we obtain the Fisher matrix

$$\begin{aligned} G(\mu ,\sigma ) =\begin{pmatrix} \frac{\nu +1}{(\nu +3)\sigma ^2} & \quad 0\\ 0 & \quad \frac{2\nu }{(\nu +3)\sigma ^2} \end{pmatrix}, \end{aligned}$$
(54)

and by (42), the Fisher–Rao distance

$$\begin{aligned} d_{\textrm{FR}}\big ( (\mu _1,\sigma _1), (\mu _2,\sigma _2) \big ) =2\sqrt{\frac{2\nu }{\nu +3}} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\nu +1)(\mu _2-\mu _1)^2+2\nu (\sigma _2-\sigma _1)^2}{(\nu +1)(\mu _2-\mu _1)^2+2\nu (\sigma _2+\sigma _1)^2}} \right) . \end{aligned}$$
(55)
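
As a numerical sanity check (our own addition, assuming SciPy is available; the helper name a_b_from_h is ad hoc), the constants \(a_h\) and \(b_h\) of the preceding subsections can be recovered from the generator h by quadrature, following the integral definitions of \(a_h\) and \(b_h\) given after (41).

```python
# Sketch: recover (a_h, b_h) numerically from a generator h and its derivative.
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma

def a_b_from_h(h, h_prime):
    integrand_a = lambda z: 4 * z**2 * h_prime(z**2) ** 2 / h(z**2)
    integrand_b = lambda z: 4 * z**4 * h_prime(z**2) ** 2 / h(z**2)
    a_h, _ = quad(integrand_a, -np.inf, np.inf)
    b4, _ = quad(integrand_b, -np.inf, np.inf)
    return a_h, b4 - 1.0

# Gaussian generator h(u) = exp(-u/2)/sqrt(2*pi); expected (a_h, b_h) = (1, 2).
h  = lambda u: np.exp(-u / 2) / np.sqrt(2 * np.pi)
dh = lambda u: -0.5 * np.exp(-u / 2) / np.sqrt(2 * np.pi)
print(a_b_from_h(h, dh))   # approximately (1.0, 2.0)

# Student's t with nu degrees of freedom; expected ((nu+1)/(nu+3), 2*nu/(nu+3)).
nu = 5
c = gamma((nu + 1) / 2) / (np.sqrt(np.pi * nu) * gamma(nu / 2))
h  = lambda u: c * (1 + u / nu) ** (-(nu + 1) / 2)
dh = lambda u: -c * (nu + 1) / (2 * nu) * (1 + u / nu) ** (-(nu + 3) / 2)
print(a_b_from_h(h, dh))   # approximately (0.75, 1.25) for nu = 5
```

The printed values agree, up to quadrature error, with the constants quoted for the Gaussian and Student's t families above.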

Remark 5

We close this subsection with a remark on the general case of multivariate elliptical distributions. These are distributions of the form \(p(\pmb {x}) = (\det \Sigma )^{-1/2} h \left( (\pmb {x}-\varvec{\mu })^{\textsf{T}} \Sigma ^{-1} (\pmb {x} - \varvec{\mu }) \right) \), for some function \(h :\mathbb {R}_+ \rightarrow \mathbb {R}_+\), defined for \(\pmb {x}\in \mathbb {R}^n\), and characterised by a vector \(\varvec{\mu }\in \mathbb {R}^n\) and an \(n \times n\) positive-definite symmetric matrix \(\Sigma \in P_n(\mathbb {R})\). Analogously to the univariate case, the set of elliptical distributions for a fixed h forms an \(\left( n + \frac{n(n+1)}{2}\right) \)-dimensional statistical manifold. In the multivariate case, however, no closed-form expression for the Fisher–Rao distance is known in general; instead, only expressions for particular cases and bounds for this distance are available [17, 54,55,56,57]. Multivariate Gaussian distributions \(p(\pmb {x}) = \left( (2\pi )^n \det \Sigma \right) ^{-1/2} \exp \left( - \frac{1}{2} (\pmb {x}-\varvec{\mu })^\textsf{T}\Sigma ^{-1} (\pmb {x} - \varvec{\mu }) \right) \) have received particular attention [10, 12, 13, 20, 58,59,60]. Special cases for which the Fisher–Rao distance can be written include: fixed mean, fixed covariance, diagonal covariance matrix, and mirrored distributions. The case of diagonal covariance matrix is equivalent to independent components and will be treated in Sect. 6. Special cases for the multivariate generalised Gaussian distributions have been studied in [22]. The Fisher–Rao distance between zero-mean complex elliptically symmetric distributions has been computed in [33, 61].

5.5 Log-Gaussian

A log-Gaussian distribution [2, 34] is the distribution of a random variable whose logarithm follows a Gaussian distribution. It has p.d.f. \(p(x) = \frac{1}{\sigma x\sqrt{2\pi }} \exp \left( - \frac{(\log x - \mu )^2}{2\sigma ^2} \right) \), defined for \(x \in \mathbb {R}_+^*\), and parametrised by \((\mu ,\sigma ) \in \mathbb {R}\times \mathbb {R}_+^*\). In this case, we have \(\partial _{\mu } \ell {:}{=}\partial _{\mu } \ell (\mu ,\sigma ) = \frac{\log x - \mu }{\sigma ^2}\) and \(\partial _{\sigma } \ell {:}{=}\partial _{\sigma } \ell (\mu ,\sigma ) = -\frac{1}{\sigma } + \frac{(\log x-\mu )^2}{\sigma ^3}\), so that the elements of the Fisher matrix are

$$\begin{aligned} g_{11}&= \mathbb {E}\left[ \left( \partial _{\mu }\ell \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{\log X-\mu }{\sigma ^2} \right) ^2 \right] \\&= \frac{\mathbb {E}[(\log X-\mu )^2]}{\sigma ^4} = \frac{1}{\sigma ^2},\\ g_{12}&= g_{21} = \mathbb {E}\left[ \left( \partial _\mu \ell \right) \left( \partial _\sigma \ell \right) \right] = \mathbb {E}\left[ \left( \frac{\log X-\mu }{\sigma ^2} \right) \left( -\frac{1}{\sigma } + \frac{(\log X-\mu )^2}{\sigma ^3} \right) \right] \\&= -\frac{\mathbb {E}[\log X-\mu ]}{\sigma ^3} + \frac{\mathbb {E}[(\log X-\mu )^3]}{\sigma ^5}\\&= 0,\\ g_{22}&= \mathbb {E}\left[ \left( \partial _\sigma \ell \right) ^2 \right] = \mathbb {E}\left[ \left( -\frac{1}{\sigma } + \frac{(\log X-\mu )^2}{\sigma ^3} \right) ^2 \right] \\&= \frac{1}{\sigma ^2} - \frac{2 \mathbb {E}[(\log X-\mu )^2]}{\sigma ^4} + \frac{\mathbb {E}[(\log X-\mu )^4]}{\sigma ^6} = \frac{2}{\sigma ^2}, \end{aligned}$$

having used that \(\mathbb {E}[\log X] = \mu \), \(\mathbb {E}[\log X - \mu ] = \mathbb {E}[(\log X - \mu )^3] = 0\), \(\mathbb {E}[(\log X-\mu )^2] = \sigma ^2\), and \(\mathbb {E}[(\log X-\mu )^4]=3\sigma ^4\). Thus, the Fisher matrix is given by

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{1}{\sigma ^2} & \quad 0\\ 0 & \quad \frac{2}{\sigma ^2} \end{pmatrix}. \end{aligned}$$
(56)

This is the same as for the Gaussian manifold (44), and therefore the Fisher–Rao distance is the same, namely,

$$\begin{aligned} d_\textrm{FR}\big ((\mu _1,\sigma _1), (\mu _2, \sigma _2)\big ) = 2\sqrt{2} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\mu _1-\mu _2)^2 + 2(\sigma _1-\sigma _2)^2}{(\mu _1-\mu _2)^2 + 2(\sigma _1+\sigma _2)^2}} \right) . \end{aligned}$$
(57)

5.6 Inverse Gaussian

An inverse Gaussian distribution [23, 48] has p.d.f. \(p(x) = \sqrt{\frac{\lambda }{2\pi x^3}} \exp \left( - \frac{\lambda (x - \mu )^2}{2\mu ^2 x} \right) \), defined for \(x \in \mathbb {R}_+^*\), and parametrised by \((\lambda ,\mu ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). In this case, we have \(\partial _{\lambda } \ell {:}{=}\partial _{\lambda } \ell (\lambda ,\mu ) = \frac{1}{2\lambda } - \frac{(x-\mu )^2}{2\mu ^2 x}\) and \(\partial _{\mu } \ell {:}{=}\partial _{\mu } \ell (\lambda ,\mu ) = \frac{\lambda ( x - \mu )}{\mu ^3}\). The elements of the Fisher matrix are then

$$\begin{aligned} g_{11}&= \mathbb {E}\left[ \left( \partial _{\lambda }\ell \right) ^2 \right] =\mathbb {E}\left[ \left( \frac{1}{2\lambda } - \frac{(x-\mu )^2}{2\mu ^2 x}\right) ^2 \right] \\&= \frac{1}{4\lambda ^2}+\frac{1}{\lambda \mu }+\frac{3}{2\mu ^2}-\left( \frac{1}{2\lambda \mu ^2}+\frac{1}{\mu ^3}\right) \mathbb {E}[X]\nonumber \\&\quad -\left( \frac{1}{2\lambda }+\frac{1}{\mu }\right) \mathbb {E}\left[ \frac{1}{X}\right] +\frac{1}{4\mu ^4}\mathbb {E}\left[ X^2\right] +\frac{1}{4}\mathbb {E}\left[ \frac{1}{X^2}\right] \\&= \frac{1}{2\lambda ^2},\\ g_{12}&= g_{21} = \mathbb {E}\left[ \left( \partial _{\lambda } \ell \right) \left( \partial _{\mu } \ell \right) \right] = \mathbb {E}\left[ \left( \frac{1}{2\lambda } - \frac{(x-\mu )^2}{2\mu ^2 x}\right) \left( \frac{\lambda ( x - \mu )}{\mu ^3} \right) \right] \\&= -\frac{1}{2\lambda \mu ^2}-\frac{3}{2\mu ^3}+\left( \frac{1}{2\lambda \mu ^3}-\frac{3\lambda }{2\mu ^4}\right) \mathbb {E}[X]+\frac{\lambda }{2\mu ^2}\mathbb {E}\left[ \frac{1}{X}\right] -\frac{\lambda }{2\mu ^5}\mathbb {E}\left[ X^2\right] \\&= 0,\\ g_{22}&= \mathbb {E}\left[ \left( \partial _{\mu } \ell \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{\lambda ( x - \mu )}{\mu ^3} \right) ^2 \right] \\&= \frac{\lambda ^2}{\mu ^4}-\frac{2\lambda ^2}{\mu ^5}\mathbb {E}[X]+\frac{\lambda ^2}{\mu ^6}\mathbb {E}\left[ X^2\right] \\&= \frac{\lambda }{\mu ^3}, \end{aligned}$$

where we have used that \(\mathbb {E}[X] = \mu \), \(\mathbb {E}\left[ X^2\right] = \frac{\mu ^3}{\lambda }+\mu ^2\), \(\mathbb {E}\left[ \frac{1}{X}\right] = \frac{1}{\lambda }+\frac{1}{\mu }\) and \(\mathbb {E}\left[ \frac{1}{X^2} \right] = \frac{3}{\lambda ^2}+\frac{3}{\lambda \mu }+\frac{1}{\mu ^2}\). Thus, the Fisher matrix is given by

$$\begin{aligned} G(\lambda ,\mu ) = \begin{pmatrix} \frac{1}{2\lambda ^2} & \quad 0\\ 0 & \quad \frac{\lambda }{\mu ^3} \end{pmatrix}. \end{aligned}$$
(58)

To find the Fisher–Rao distance, we consider the change of coordinates \(u=1/\sqrt{\lambda }\), \(v=\sqrt{2/\mu }\). Applying Proposition 2, we find that the Fisher matrix in the new coordinates is

$$\begin{aligned} \widetilde{G}(u,v) =\begin{pmatrix} \frac{2}{u^2} & \quad 0\\ 0 & \quad \frac{2}{u^2} \end{pmatrix} =2 \begin{pmatrix} \frac{1}{u^2} & \quad 0\\ 0 & \quad \frac{1}{u^2} \end{pmatrix}. \end{aligned}$$

Note that here the first coordinate, \(u\), plays the role of the second (positive) coordinate in the hyperbolic metric (11); applying Lemma 4 then yields

$$\begin{aligned}&d_{\textrm{FR}}\big ( (\lambda _1,\mu _1), (\lambda _2,\mu _2) \big )\nonumber \\&\quad = \sqrt{2}d_{\mathcal {H}^2} \left( \left( \sqrt{2/\mu _1}, 1/\sqrt{\lambda _1}\right) , \left( \sqrt{2/\mu _2},1/\sqrt{\lambda _2}\right) \right) \nonumber \\&\quad = 2\sqrt{2} {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{ \mu _1\mu _2(\sqrt{\lambda _1}-\sqrt{\lambda _2})^2+2\lambda _1\lambda _2(\sqrt{\mu _1}-\sqrt{\mu _2})^2 }{\mu _1\mu _2(\sqrt{\lambda _1}+\sqrt{\lambda _2})^2+2\lambda _1\lambda _2(\sqrt{\mu _1}-\sqrt{\mu _2})^2} } \right) . \end{aligned}$$
(59)
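
The following sketch (ours; poincare_distance and fisher_rao_inverse_gaussian are ad-hoc names) evaluates (59) through the change of coordinates above, treating \(u=1/\sqrt{\lambda }\) as the positive half-plane coordinate.

```python
# Illustration: inverse Gaussian Fisher-Rao distance (59) via the half-plane map.
import math

def poincare_distance(x1, y1, x2, y2):
    """Hyperbolic distance in the half-plane {(x, y) : y > 0}, cf. (11)."""
    num = (x1 - x2) ** 2 + (y1 - y2) ** 2
    den = (x1 - x2) ** 2 + (y1 + y2) ** 2
    return 2.0 * math.atanh(math.sqrt(num / den))

def fisher_rao_inverse_gaussian(p1, p2):
    (lam1, mu1), (lam2, mu2) = p1, p2
    # horizontal coordinate sqrt(2/mu), positive coordinate 1/sqrt(lambda)
    return math.sqrt(2.0) * poincare_distance(
        math.sqrt(2.0 / mu1), 1.0 / math.sqrt(lam1),
        math.sqrt(2.0 / mu2), 1.0 / math.sqrt(lam2),
    )

# With mu fixed, the value reduces to |log(lam1/lam2)| / sqrt(2), independently of mu.
print(fisher_rao_inverse_gaussian((1.0, 3.0), (4.0, 3.0)))   # ~ log(4)/sqrt(2) ~ 0.980
```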

5.7 Extreme-value distributions

Extreme-value distributions [62] are limit distributions for the maxima (or minima) of a sequence of i.i.d. random variables. They are usually considered to belong to one of three families, all of which can be described in the form of the generalised extreme-value distribution:

$$\begin{aligned} p(x) = \frac{1}{\sigma } {[t(x,\xi )]}^{\xi +1} e^{-t(x,\xi )}, \end{aligned}$$
(60)

where \(\sigma >0\), \(\xi \in \mathbb {R}\), and

$$\begin{aligned} t(x, \xi ) ={\left\{ \begin{array}{ll} \left( 1 + \xi \left( \frac{x-\mu }{\sigma }\right) \right) ^{-\frac{1}{\xi }}, & \xi \ne 0,\\ \exp \left( -\frac{x-\mu }{\sigma }\right) , & \xi =0, \end{array}\right. } \end{aligned}$$

for \(\mu \in \mathbb {R}\). Note that \(t(x,\xi )\) is continuous at \(\xi =0\), for every \(x\in \mathbb {R}\). When \(\xi =0\), the support of (60) is \(x\in \mathbb {R}\), and it is called type I or Gumbel-type distribution; when \(\xi >0\), the support is \(x \in [\mu -\frac{\sigma }{\xi },\ +\infty [\), and it is called type II or Fréchet-type distribution; when \(\xi <0\), the support is \(x \in ]-\infty ,\ \mu -\frac{\sigma }{\xi }]\), and it is called type III or Weibull-type distribution. Therefore, in general, these distributions are parametrised by the triple \((\mu ,\sigma ,\xi ) \in \mathbb {R}\times \mathbb {R}^*_+ \times \mathbb {R}\). Instead of treating the three-dimensional manifold of generalised extreme-value distributions in full generality [25, 63], for which no closed-form expression for the Fisher–Rao distance is available, we consider the usual two-dimensional versions of them, following [18].

5.7.1 Gumbel

A Gumbel distribution [18] has p.d.f. \(p(x) = \frac{1}{\sigma } \exp \left( -\frac{x-\mu }{\sigma }\right) \exp \left( -\exp \left( -\frac{x-\mu }{\sigma }\right) \right) \), defined for \(x \in \mathbb {R}\), and parametrised by \((\mu , \sigma ) \in \mathbb {R}\times \mathbb {R}^*_+\). This corresponds to (60) taking \(\xi =0\). In this case, we have \(\partial _{\mu } \ell {:}{=}\partial _{\mu } \ell (\mu ,\sigma ) = \frac{1}{\sigma } -\frac{1}{\sigma }\exp \left( -\frac{x-\mu }{\sigma }\right) \) and \(\partial _{\sigma } \ell {:}{=}\partial _{\sigma } \ell (\mu ,\sigma ) =- \frac{x-\mu }{\sigma ^2}\exp \left( -\frac{x-\mu }{\sigma }\right) +\frac{x-\mu }{\sigma ^2}-\frac{1}{\sigma }\). Denoting \( Z {:}{=}\frac{X-\mu }{\sigma }\), the elements of the Fisher matrix can be written as

$$\begin{aligned} g_{11}&= \mathbb {E}\left[ \left( \partial _{\mu }\ell \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{1}{\sigma } -\frac{1}{\sigma }e^{-Z} \right) ^2 \right] \\&= \frac{1}{\sigma ^2} - \frac{2}{\sigma ^2}\mathbb {E}\left[ e^{-Z}\right] + \frac{1}{\sigma ^2}\mathbb {E}\left[ e^{-2Z}\right] \\&= \frac{1}{\sigma ^2},\\ g_{12}&= g_{21} = \mathbb {E}\left[ \left( \partial _{\mu } \ell \right) \left( \partial _{\sigma } \ell \right) \right] \\&= \mathbb {E}\left[ \left( \frac{1}{\sigma } -\frac{1}{\sigma } e^{-Z} \right) \left( -\frac{Ze^{-Z}}{\sigma } + \frac{Z}{\sigma }-\frac{1}{\sigma }\right) \right] \\&= -\frac{2}{\sigma ^2} \mathbb {E}\left[ Ze^{-Z}\right] +\frac{1}{\sigma ^2} \mathbb {E}\left[ Z\right] -\frac{1}{\sigma ^2} +\frac{1}{\sigma ^2}\mathbb {E}\left[ Ze^{-2Z}\right] +\frac{1}{\sigma ^2} \mathbb {E}\left[ e^{-Z}\right] \\&=\frac{\gamma -1}{\sigma ^2},\\ g_{22}&= \mathbb {E}\left[ \left( \partial _{\sigma } \ell \right) ^2 \right] = \mathbb {E}\left[ \left( -\frac{Ze^{-Z}}{\sigma } + \frac{Z}{\sigma } - \frac{1}{\sigma }\right) ^2 \right] \\&= \frac{1}{\sigma ^2}\mathbb {E}\left[ Z^2 e^{-2Z}\right] - \frac{2}{\sigma ^2}\mathbb {E}\left[ Z^2 e^{-Z}\right] +\frac{2}{\sigma ^2}\mathbb {E}\left[ Ze^{-Z}\right] \\&\qquad +\frac{1}{\sigma ^2}\mathbb {E}\left[ Z^2\right] -\frac{2}{\sigma ^2}\mathbb {E}\left[ Z\right] +\frac{1}{\sigma ^2}\\&= \frac{1}{\sigma ^2}\left( (\gamma -1)^2+\frac{\pi ^2}{6}\right) , \end{aligned}$$

where \(\gamma \) is the Euler constant and we have used that \(\mathbb {E}\left[ Z\right] =\gamma \), \(\mathbb {E}\left[ e^{-Z}\right] =1\), \(\mathbb {E}\left[ e^{-2Z}\right] =2\), \(\mathbb {E}\left[ Ze^{-Z}\right] =\gamma -1\), \(\mathbb {E}\left[ Ze^{-2Z}\right] =2\gamma -3\), \(\mathbb {E}\left[ Z^2e^{-2Z}\right] =2\gamma ^2-6\gamma +2+\frac{\pi ^2}{3}\), \(\mathbb {E}\left[ Z^2e^{-Z}\right] =\gamma ^2-2\gamma +\frac{\pi ^2}{6}\) and \(\mathbb {E}\left[ Z^2\right] =\gamma ^2+\frac{\pi ^2}{6}\). Thus, the Fisher matrix is

$$\begin{aligned} G(\mu ,\sigma ) = \begin{pmatrix} \frac{1}{\sigma ^2} & \quad \frac{\gamma -1}{\sigma ^2}\\ \frac{\gamma -1}{\sigma ^2} & \quad \frac{1}{\sigma ^2}\left( (\gamma -1)^2+\frac{\pi ^2}{6}\right) \end{pmatrix}. \end{aligned}$$
(61)

To find the Fisher–Rao distance, we consider the change of coordinates \(u=\mu -(1-\gamma )\sigma \), \(v=\pi \sigma /\sqrt{6}\). Applying Proposition 2 we find that the Fisher matrix in the new coordinates is

$$\begin{aligned} \widetilde{G}(u,v) = \begin{pmatrix} \frac{\pi ^2}{6v^2} & \quad 0\\ 0 & \quad \frac{\pi ^2}{6v^2} \end{pmatrix} =\frac{\pi ^2}{6} \begin{pmatrix} \frac{1}{v^2} & \quad 0\\ 0 & \quad \frac{1}{v^2} \end{pmatrix}. \end{aligned}$$

Then, applying Lemma 4, the Fisher–Rao distance is given by

$$\begin{aligned}&d_{\textrm{FR}}\big ( (\mu _1,\sigma _1), (\mu _2,\sigma _2) \big )\nonumber \\&\quad = \frac{\pi }{\sqrt{6}}d_{\mathcal {H}^2} \left( \left( \mu _1-(1-\gamma )\sigma _1, \frac{\pi }{\sqrt{6}}\sigma _1\right) , \left( \mu _2-(1-\gamma )\sigma _2, \frac{\pi }{\sqrt{6}}\sigma _2\right) \right) \nonumber \\&\quad = \frac{2\pi }{\sqrt{6}} {{\,\textrm{arctanh}\,}}\left( \sqrt{ \frac{ \left[ (\mu _1-\mu _2)-(1-\gamma )(\sigma _1-\sigma _2)\right] ^2+\frac{\pi ^2}{6}(\sigma _1-\sigma _2)^2 }{ \left[ (\mu _1-\mu _2)-(1-\gamma )(\sigma _1-\sigma _2)\right] ^2+\frac{\pi ^2}{6}(\sigma _1+\sigma _2)^2 } } \right) . \end{aligned}$$
(62)

5.7.2 Fréchet

A Fréchet distribution [18] has p.d.f. \(p(x) = \frac{\lambda }{\beta } \left( \frac{x}{\beta } \right) ^{-\lambda -1} \exp \left( - \left( \frac{x}{\beta }\right) ^{-\lambda } \right) \), defined for \(x \in \mathbb {R}_+^*\), and parametrised by \((\beta ,\lambda ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). Note that this corresponds to (60) taking \(\mu = \beta \), \(\sigma = \beta /\lambda \) and \(\xi = 1/\lambda \). This distribution can be related to the Gumbel distribution by considering the reparametrisation of the sample space \(Y {:}{=}\log X\), which preserves the Fisher metric (Proposition 3). The p.d.f. of the new random variable is then

$$\begin{aligned} p(y) = \frac{1}{\left| \frac{\textrm{d}y}{\textrm{d}x}\right| } \frac{\lambda }{\beta } \left( \frac{e^y}{\beta } \right) ^{-\lambda -1} \exp \left( -\left( \frac{e^y}{\beta } \right) ^{-\lambda } \right) = \lambda \left( \frac{e^y}{\beta }\right) ^{-\lambda } \exp \left( - \left( \frac{e^y}{\beta }\right) ^{-\lambda } \right) . \end{aligned}$$

Now, considering the change of coordinates \(\alpha = \log \beta \), \(\theta = 1/\lambda \), we find

$$\begin{aligned} p(y) = \frac{1}{\theta } \exp \left( - \frac{y-\alpha }{\theta } \right) \exp \left( - \exp \left( -\frac{y-\alpha }{\theta } \right) \right) , \end{aligned}$$

which coincides with that of a Gumbel distribution with parameters \((\alpha ,\theta ) \in \mathbb {R}\times \mathbb {R}^*_+\). Comparing with the Fisher matrix (61) and applying Proposition 2 we find that the Fisher matrix in the \((\beta ,\lambda )\) coordinates is

$$\begin{aligned} G(\beta ,\lambda ) =\begin{pmatrix} \frac{\lambda ^2}{\beta ^2} & \quad \frac{1-\gamma }{\beta }\\ \frac{1-\gamma }{\beta } & \quad \frac{1}{\lambda ^2}\left( (\gamma -1)^2+\frac{\pi ^2}{6}\right) \end{pmatrix}. \end{aligned}$$
(63)

Note that, by considering the change of coordinates \(u=\log \beta -(1-\gamma )/\lambda \), \(v=\pi /(\lambda \sqrt{6})\) and applying again Proposition 2, the Fisher matrix is found to be

$$\begin{aligned} \widetilde{G}(u,v) = \begin{pmatrix} \frac{\pi ^2}{6v^2} & \quad 0\\ 0 & \quad \frac{\pi ^2}{6v^2} \end{pmatrix} =\frac{\pi ^2}{6} \begin{pmatrix} \frac{1}{v^2} & \quad 0\\ 0 & \quad \frac{1}{v^2} \end{pmatrix}. \end{aligned}$$

Therefore, by Lemma 4 the Fisher–Rao distance for Fréchet distributions is

$$\begin{aligned}&d_{\textrm{FR}}\big ( (\beta _1,\lambda _1), (\beta _2,\lambda _2) \big ) \nonumber \\&\quad = \frac{\pi }{\sqrt{6}}d_{\mathcal {H}^2}\left( \left( \log \beta _1-\frac{(1-\gamma )}{\lambda _1 }, \frac{\pi }{\lambda _1\sqrt{6}}\right) , \left( \log \beta _2-\frac{(1-\gamma )}{\lambda _2 }, \frac{\pi }{\lambda _2\sqrt{6}}\right) \right) \nonumber \\&\quad = \frac{2\pi }{\sqrt{6}} {{\,\textrm{arctanh}\,}}\left( \sqrt{ \frac{ \left[ \log \frac{\beta _1}{\beta _2}-(1-\gamma )\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) \right] ^2+\frac{\pi ^2}{6}\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) ^2 }{ \left[ \log \frac{\beta _1}{\beta _2}-(1-\gamma )\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) \right] ^2+\frac{\pi ^2}{6}\left( \frac{1}{\lambda _1}+\frac{1}{\lambda _2}\right) ^2 } } \right) . \end{aligned}$$
(64)

5.7.3 Weibull

A Weibull distribution [12, 18, 50] has p.d.f. \(p(x) = \frac{\lambda }{\beta } \left( \frac{x}{\beta } \right) ^{\lambda -1} \exp \left( - \left( \frac{x}{\beta }\right) ^{\lambda } \right) \), defined for \(x \in \mathbb {R}_+^*\), and parametrised by \((\beta ,\lambda ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). Note that this corresponds to the distribution of \(-X\) in (60) taking \(\mu = -\beta \), \(\sigma = \beta /\lambda \) and \(\xi = -1/\lambda \). It is possible to relate a Weibull distribution to the Gumbel distribution by considering the reparametrisation of the sample space \(Y {:}{=}- \log X\), which preserves the Fisher metric (Proposition 3). The p.d.f. of the new random variable is

$$\begin{aligned} p(y) = \frac{1}{\left| \frac{\textrm{d}y}{\textrm{d}x}\right| } \frac{\lambda }{\beta } \left( \frac{e^{-y}}{\beta } \right) ^{\lambda -1} \exp \left( -\left( \frac{e^{-y}}{\beta } \right) ^{\lambda } \right) = \lambda \left( \frac{e^{-y}}{\beta }\right) ^{\lambda } \exp \left( - \left( \frac{e^{-y}}{\beta }\right) ^{\lambda } \right) . \end{aligned}$$

Moreover, with the change of coordinates \(\lambda = 1/\theta \), \(\alpha = -\log \beta \), we have

$$\begin{aligned} p(y) = \frac{1}{\theta } \exp \left( - \frac{y-\alpha }{\theta } \right) \exp \left( - \exp \left( -\frac{y-\alpha }{\theta } \right) \right) , \end{aligned}$$

which coincides with that of a Gumbel distribution with parameters \((\alpha ,\theta ) \in \mathbb {R}\times \mathbb {R}^*_+\). Again, comparing with the Fisher matrix (61) and applying Proposition 2, we find that the Fisher matrix in the \((\beta ,\lambda )\) coordinates is

$$\begin{aligned} G(\beta ,\lambda ) = \begin{pmatrix} \frac{\lambda ^2}{\beta ^2} & \quad \frac{\gamma -1}{\beta }\\ \frac{\gamma -1}{\beta } & \quad \frac{1}{\lambda ^2}\left( (\gamma -1)^2+\frac{\pi ^2}{6}\right) \end{pmatrix}. \end{aligned}$$
(65)

The change of coordinates \(u=-\log \beta -(1-\gamma )/\lambda \), \(v=\pi /(\lambda \sqrt{6})\) with Proposition 2 yields the following form for the Fisher matrix:

$$\begin{aligned} \widetilde{G}(u,v) = \begin{pmatrix} \frac{\pi ^2}{6v^2} & \quad 0\\ 0 & \quad \frac{\pi ^2}{6v^2} \end{pmatrix} =\frac{\pi ^2}{6} \begin{pmatrix} \frac{1}{v^2} & \quad 0\\ 0 & \quad \frac{1}{v^2} \end{pmatrix}. \end{aligned}$$

In addition, by Lemma 4, the Fisher–Rao distance for Weibull distributions is

$$\begin{aligned}&d_{\textrm{FR}}\big ( (\beta _1,\lambda _1), (\beta _2,\lambda _2) \big ) \nonumber \\&\quad = \frac{\pi }{\sqrt{6}}d_{\mathcal {H}^2} \left( \left( -\log \beta _1-\frac{(1-\gamma )}{\lambda _1 }, \frac{\pi }{\lambda _1\sqrt{6}}\right) , \left( -\log \beta _2-\frac{(1-\gamma )}{\lambda _2 }, \frac{\pi }{\lambda _2\sqrt{6}}\right) \right) \nonumber \\&\quad = \frac{2\pi }{\sqrt{6}} {{\,\textrm{arctanh}\,}}\left( \sqrt{ \frac{ \left[ \log \frac{\beta _2}{\beta _1}-(1-\gamma )\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) \right] ^2+\frac{\pi ^2}{6}\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) ^2 }{ \left[ \log \frac{\beta _2}{\beta _1}-(1-\gamma )\left( \frac{1}{\lambda _1}-\frac{1}{\lambda _2}\right) \right] ^2+\frac{\pi ^2}{6}\left( \frac{1}{\lambda _1}+\frac{1}{\lambda _2}\right) ^2 } } \right) . \end{aligned}$$
(66)

The special case of fixed \(\lambda \) has been addressed in [12].
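
The three extreme-value distances (62), (64) and (66) share the same structure: map the parameters to half-plane coordinates and rescale by \(\pi /\sqrt{6}\). The sketch below is our own illustration of this (all function names are ad hoc).

```python
# Illustration: Gumbel, Frechet and Weibull Fisher-Rao distances via Lemma 4.
import math

EULER_GAMMA = 0.5772156649015329  # Euler's constant

def poincare_distance(x1, y1, x2, y2):
    num = (x1 - x2) ** 2 + (y1 - y2) ** 2
    den = (x1 - x2) ** 2 + (y1 + y2) ** 2
    return 2.0 * math.atanh(math.sqrt(num / den))

def _gev_half_plane(u1, s1, u2, s2):
    """Distance for half-plane coordinates (u, pi*s/sqrt(6)), scaled by pi/sqrt(6)."""
    c = math.pi / math.sqrt(6.0)
    return c * poincare_distance(u1, c * s1, u2, c * s2)

def d_gumbel(p1, p2):
    (m1, s1), (m2, s2) = p1, p2   # (mu, sigma)
    return _gev_half_plane(m1 - (1 - EULER_GAMMA) * s1, s1,
                           m2 - (1 - EULER_GAMMA) * s2, s2)

def d_frechet(p1, p2):
    (b1, l1), (b2, l2) = p1, p2   # (beta, lambda)
    return _gev_half_plane(math.log(b1) - (1 - EULER_GAMMA) / l1, 1.0 / l1,
                           math.log(b2) - (1 - EULER_GAMMA) / l2, 1.0 / l2)

def d_weibull(p1, p2):
    (b1, l1), (b2, l2) = p1, p2   # (beta, lambda)
    return _gev_half_plane(-math.log(b1) - (1 - EULER_GAMMA) / l1, 1.0 / l1,
                           -math.log(b2) - (1 - EULER_GAMMA) / l2, 1.0 / l2)

print(d_gumbel((0.0, 1.0), (1.0, 2.0)))
```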

Remark 6

If X is a random variable following a Weibull distribution, then \(-X\) follows a reversed Weibull distribution, which corresponds to the Weibull-type distribution from (60) and has the same geometry, and hence the same Fisher–Rao distance, as the Weibull distribution [18].

5.8 Pareto

A Pareto distribution [12, 14] has p.d.f. \(p(x) = {\theta }\alpha ^\theta {x^{-(\theta +1)}}\), defined for \(x \in [\alpha ,\infty [\) and parametrised by \((\theta , \alpha ) \in \mathbb {R}^*_+ \times \mathbb {R}^*_+\). In this case, the support depends on the parametrisation, thus violating one of the assumptions made in the definition of a statistical manifold (1). Nevertheless, it is still possible to compute a Riemannian metric from the Fisher information matrix, as in (2). We thus have \(\partial _{\theta } \ell {:}{=}\partial _{\theta } \ell (\theta ,\alpha ) = \frac{1}{\theta } + \log \alpha - \log x\) and \(\partial _{\alpha } \ell {:}{=}\partial _{\alpha } \ell (\theta ,\alpha ) = \frac{\theta }{\alpha }\). The elements of the Fisher matrix are

$$\begin{aligned} g_{11}&= \mathbb {E}\left[ \left( \partial _{\theta }\ell \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{1}{\theta } + \log \alpha - \log X \right) ^2 \right] \\&= \frac{1}{\theta ^2} + (\log \alpha )^2 + \mathbb {E}\left[ (\log X)^2 \right] + \frac{2}{\theta }\left( \log \alpha - \mathbb {E}[\log X] \right) - 2 \log \alpha \mathbb {E}[\log X]\\&= \frac{1}{\theta ^2},\\ g_{12}&= g_{21} = \mathbb {E}\left[ \left( \partial _\theta \ell \right) \left( \partial _{\alpha } \ell \right) \right] = \mathbb {E}\left[ \left( \frac{1}{\theta } + \log \alpha - \log X \right) \left( \frac{\theta }{\alpha } \right) \right] \\&= \frac{1}{\alpha } + \frac{\theta }{\alpha }\log \alpha - \frac{\theta }{\alpha } \mathbb {E}[\log X] = 0,\\ g_{22}&= \mathbb {E}\left[ \left( \partial _{\alpha } \ell \right) ^2 \right] = \mathbb {E}\left[ \left( \frac{\theta }{\alpha } \right) ^2 \right] \\&= \frac{\theta ^2}{\alpha ^2}, \end{aligned}$$

where we have used that \(\mathbb {E}[\log X] = \frac{1}{\theta } + \log \alpha \) and \(\mathbb {E}[(\log X)^2] = \frac{2}{\theta ^2} + \frac{2 \log \alpha }{\theta } + (\log \alpha )^2\). Thus, the Fisher matrix is

$$\begin{aligned} G(\theta ,\alpha ) = \begin{pmatrix} \frac{1}{\theta ^2} & \quad 0\\ 0 & \quad \frac{\theta ^2}{\alpha ^2} \end{pmatrix}. \end{aligned}$$
(67)

To find the Fisher–Rao distance, we consider the change of coordinates \(u=\log \alpha \), \(v=1/\theta \). Applying Proposition 2, we find that the Fisher matrix in the new coordinates is

$$\begin{aligned} \widetilde{G}(u,v)= \begin{pmatrix} \frac{1}{v^2} & \quad 0\\ 0 & \quad \frac{1}{v^2} \end{pmatrix}, \end{aligned}$$

which coincides with the hyperbolic metric (11) of the Poincaré half-plane. Therefore the Fisher–Rao distance is given by

$$\begin{aligned} d_{\textrm{FR}}\big ( (\theta _1,\alpha _1), (\theta _2, \alpha _2) \big )&= d_{\mathcal {H}^2} \left( \left( \log \alpha _1, 1/{\theta _1}\right) , \left( \log \alpha _2, 1/{\theta _2} \right) \right) \nonumber \\&= 2 {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\theta _1\theta _2\log (\alpha _1/\alpha _2))^2+(\theta _1-\theta _2)^2}{(\theta _1\theta _2\log (\alpha _1/\alpha _2))^2+(\theta _1+\theta _2)^2}} \right) . \end{aligned}$$
(68)

The special case of fixed \(\alpha \) has been addressed in [12].

5.9 Power function

A power function distribution [12] has p.d.f. \(p(x) = \theta \beta ^{-\theta } x^{\theta -1}\), defined for \(x \in ]0,\beta ]\), and parametrised by \((\theta ,\beta ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). As in the previous example, the support depends on the parametrisation, but it is still possible to consider the Fisher metric as in (2). This distribution can be related to the Pareto distribution (Sect. 5.8) as follows. Consider the reparametrisation of the sample space given by \(Y {:}{=}1/X\) (cf. Proposition 3), and the change of coordinates \(\alpha = 1/\beta \). Note that, since \(x \in ]0,\beta ]\), we have \(y \in [\alpha ,\infty [\). The p.d.f. of the new random variable, with the new coordinates, is

$$\begin{aligned} p(y) = \frac{1}{\left| \frac{\textrm{d}y}{\textrm{d}x} \right| } \theta \beta ^{-\theta } {y}^{-(\theta -1)} = \theta \beta ^{-\theta } y^{-(\theta +1)} = \theta \alpha ^{\theta } y^{-(\theta +1)}, \end{aligned}$$

which coincides with a Pareto distribution with parameters \((\theta ,\alpha )\). Therefore, applying Proposition 2, we find

$$\begin{aligned} G(\theta ,\beta ) = \begin{pmatrix} \frac{1}{\theta ^2} & \quad 0\\ 0 & \quad \frac{\theta ^2}{\beta ^2} \end{pmatrix}, \end{aligned}$$
(69)

and, consequently, the Fisher–Rao distance is

$$\begin{aligned} d_{\textrm{FR}}\big ( (\theta _1,\beta _1), (\theta _2, \beta _2) \big )&= d_{\mathcal {H}^2} \left( \left( \log \beta _1, 1/{\theta _1} \right) , \left( \log \beta _2, 1/{\theta _2} \right) \right) \nonumber \\&= 2 {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\theta _1\theta _2\log (\beta _{1}/\beta _{2}))^2+(\theta _1-\theta _2)^2}{(\theta _1\theta _2\log (\beta _{1}/\beta _{2}))^2+(\theta _1+\theta _2)^2}} \right) . \end{aligned}$$
(70)

The special case of fixed \(\beta \) has been addressed in [12].
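
A short sketch (ours; d_pareto is an ad-hoc name) evaluating (68) follows; by (70), the same expression with \(\beta \) in place of \(\alpha \) gives the power function distance.

```python
# Illustration: Pareto Fisher-Rao distance (68).
import math

def d_pareto(p1, p2):
    (t1, a1), (t2, a2) = p1, p2  # (theta, alpha)
    num = (t1 * t2 * math.log(a1 / a2)) ** 2 + (t1 - t2) ** 2
    den = (t1 * t2 * math.log(a1 / a2)) ** 2 + (t1 + t2) ** 2
    return 2.0 * math.atanh(math.sqrt(num / den))

# With alpha fixed, the distance reduces to |log(theta1/theta2)|.
print(d_pareto((1.0, 2.0), (3.0, 2.0)))  # ~ log(3) ~ 1.0986
```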

Remark 7

All the examples of two-dimensional statistical manifolds of continuous distributions presented so far in this section are related to the hyperbolic Poincaré half-plane and have constant negative curvature. There are, however, two-dimensional statistical manifolds which do not have constant negative curvature, even though no explicit expression for the Fisher–Rao distance is available for them. We present some examples in the following.

  • The statistical manifold of Gamma distributions, with p.d.f. \(p(x) = \frac{\beta ^{\alpha }}{\Gamma (\alpha )}x^{\alpha -1}e^{-\beta x}\), defined for \(x \in \mathbb {R}_+^*\) and parametrised by \((\alpha ,\beta ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). The curvature of this manifold is [38, 39, 64]

    $$\begin{aligned} \kappa = \frac{\psi ^{(1)}(\alpha ) + \alpha \psi ^{(2)}(\alpha )}{4\left( \alpha \psi ^{(1)}(\alpha ) -1\right) ^2} < 0, \end{aligned}$$

    which is negative, but not constant, and where \(\psi ^{(m)}(x) {:}{=}\frac{\textrm{d}^{m+1}}{\textrm{d}x^{m+1}}\log \Gamma (x)\) denotes the polygamma function of order m. Bounds for the Fisher–Rao distance in this manifold have been studied in [64]. A numerical sketch of this curvature is given after this list.

  • The statistical manifold of Beta distributions, with p.d.f. \(p(x) = \frac{\Gamma (\alpha +\beta )}{\Gamma (\alpha )\Gamma (\beta )} x^{\alpha -1} (1-x)^{\beta -1}\), defined for \(x \in [0,1]\) and parametrised by \((\alpha ,\beta ) \in \mathbb {R}_+^* \times \mathbb {R}_+^*\). The curvature of this manifold is [26]

    $$\begin{aligned} \kappa&= \frac{ \psi ^{(2)}(\alpha ) \psi ^{(2)}(\beta ) \psi ^{(2)}(\alpha +\beta ) }{4 \left( \psi ^{(1)}(\alpha )\psi ^{(1)}(\beta ) - \psi ^{(1)}(\alpha +\beta )[\psi ^{(1)}(\alpha ) + \psi ^{(1)}(\beta )] \right) ^2}\\&\quad \times \left( \frac{\psi ^{(1)}(\alpha )}{\psi ^{(2)}(\alpha )} + \frac{\psi ^{(1)}(\beta )}{\psi ^{(2)}(\beta )} - \frac{\psi ^{(1)}(\alpha +\beta )}{\psi ^{(2)}(\alpha +\beta )} \right) \\&< 0, \end{aligned}$$

    which is also negative, but not constant. In fact, more generally, the sectional curvature of the statistical manifold of Dirichlet distributions (which are the multivariate generalisation of Beta distributions) is negative [65, Thm. 6].

  • Finally, a construction of n-dimensional statistical manifolds, based on a Hilbert space representation of probability measures, was given in [16]. The geometry of these manifolds is spherical, that is, they have constant positive curvature.
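
As announced in the first item above, the following sketch (ours, assuming SciPy is available; gamma_manifold_curvature is an ad-hoc name) evaluates the curvature of the Gamma manifold with the polygamma function, illustrating that it is negative but not constant.

```python
# Numerical sketch: curvature of the Gamma statistical manifold.
import numpy as np
from scipy.special import polygamma

def gamma_manifold_curvature(alpha):
    """Curvature kappa(alpha) of the manifold of Gamma distributions."""
    psi1 = polygamma(1, alpha)
    psi2 = polygamma(2, alpha)
    return (psi1 + alpha * psi2) / (4.0 * (alpha * psi1 - 1.0) ** 2)

for a in (0.5, 1.0, 2.0, 10.0):
    print(a, gamma_manifold_curvature(a))   # all values are negative, none equal
```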

5.10 Wishart

A Wishart distribution [11, 66] in dimension m, with \(n \ge m\) degrees of freedom, \(n\in \mathbb {N}\), has p.d.f.

$$\begin{aligned} p(X) = \frac{ {(\det X )}^{(n-m-1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Sigma ^{-1}X)\right) }{ 2^{nm/2} (\det \Sigma )^{n/2} \, \Gamma _m(n/2) }, \end{aligned}$$

defined for \(X\in P_m(\mathbb {R})\), and characterised by \(\Sigma \in P_m(\mathbb {R})\), where \(\Gamma _m(z) {:}{=}\pi ^{\frac{m(m-1)}{4}}\prod _{j=1}^{m} \Gamma \left( z+\frac{1-j}{2}\right) \) denotes the multivariate Gamma function. Note that, for fixed m and n, the associated \(\left( \frac{m(m+1)}{2}\right) \)-dimensional statistical manifold \(\mathcal {S}\) is in correspondence with the cone \(P_m(\mathbb {R})\) of symmetric positive-definite matrices via the bijection \(\iota :P_m(\mathbb {R}) \rightarrow \mathcal {S}\), given by \(\Sigma \mapsto \frac{{(\det X )}^{(n-m-1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Sigma ^{-1}X)\right) }{2^{nm/2} (\det \Sigma )^{n/2} \, \Gamma _m(n/2)}\). Denoting \(\sigma _{i,j}\) the (i, j)-th entry of the matrix \(\Sigma \), we can write the parameter vector as \(\xi = \left( \xi ^1, \dots , \xi ^{m(m+1)/2}\right) = (\sigma _{1,1}, \dots \sigma _{1,m}, \sigma _{2,2}, \dots , \sigma _{2,m}, \dots , \sigma _{m,m})\). We then have \(\partial _{i}\ell {:}{=}\partial _{{i}} \ell (\Sigma ) = \frac{1}{2}{{\,\textrm{tr}\,}}\left( \Sigma ^{-1}X\Sigma ^{-1} {(\partial _i\Sigma )}\right) - \frac{n}{2}{{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\right) \) and \(\partial _{j}\partial _{i}\ell = - {{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\Sigma ^{-1}X\right) + \frac{n}{2}{{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\right) \), where the derivative in \(\partial _i \Sigma \) is taken entry-wise. The elements of the Fisher metric are then

$$\begin{aligned} g_{ij}(\xi )&= -\mathbb {E}\left[ \partial _{j}\partial _{i}\ell \right] \nonumber \\&= -\mathbb {E}\left[ - {{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\Sigma ^{-1}X\right) + \frac{n}{2} {{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma })\Sigma ^{-1}{(\partial _j\Sigma )}\right) \right] \nonumber \\&={{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\Sigma ^{-1}\mathbb {E}[X]\right) - \frac{n}{2}{{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\right) \nonumber \\&= \frac{n}{2}{{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\right) , \end{aligned}$$
(71)

where we have used that \(\mathbb {E}[X]=n\Sigma \).

In view of the bijection \(\iota \), the tangent space \(T_{p_\xi }\mathcal {S}\) can be identified with \(H_m\), the set of \(m \times m\) symmetric matrices [67, Chapter 6]. Given two matrices \(U, \ \widetilde{U} \in H_m\), parametrized as \(\theta = \left( \theta ^1, \dots , \theta ^{m(m+1)/2}\right) = \left( u_{1,1}, \dots u_{1,m}, u_{2,2}, \dots , u_{2,m}, \dots , u_{m,m}\right) \) and \(\widetilde{\theta } = \left( {\widetilde{\theta }} ^1, \dots , {\widetilde{\theta }} ^{m(m+1)/2}\right) = (\widetilde{u}_{1,1}, \dots \widetilde{u}_{1,m}, \widetilde{u}_{2,2}, \dots , \widetilde{u}_{2,m}, \dots , \widetilde{u}_{m,m})\), we shall compute the inner product defined by the Fisher metric, cf. (4). In the following, \(\otimes \) denotes the Kronecker product, and, for an \(m \times m\) matrix A, whose columns are \(A_1, A_2, \dots , A_m\), we denote \({{\,\textrm{vec}\,}}(A) {:}{=}\begin{pmatrix} A_1^\textsf{T}&A_2^\textsf{T}&\cdots&A_m^\textsf{T}\end{pmatrix}^\textsf{T}\) the \(m^2\)-dimensional vector formed by the concatenation of its columns. Moreover, if A is symmetric, denote \(\nu (A) {:}{=}(a_{1,1}, \dots a_{1,m}, a_{2,2}, \dots , a_{2,m}, \dots , a_{m,m})\). We denote \(D_{m}\) the unique \(m^2 \times \frac{m(m+1)}{2}\) matrix that verifies \(D_{m} \nu (A) = {{\,\textrm{vec}\,}}(A)\), for any symmetric A [68, § 7]. We thus have

$$\begin{aligned} \langle U, \widetilde{U} \rangle _{G(\xi )}&= \theta ^\textsf{T}G(\xi ) {\widetilde{\theta }}\\&= \sum _{i=1}^{\frac{m(m+1)}{2}} \sum _{j=1}^{\frac{m(m+1)}{2}} g_{ij}(\xi ) \theta ^i {\widetilde{\theta }}^j\\&= \sum _{i=1}^{\frac{m(m+1)}{2}} \sum _{j=1}^{\frac{m(m+1)}{2}} \frac{n}{2} {{\,\textrm{tr}\,}}\left( \Sigma ^{-1}{(\partial _i\Sigma )}\Sigma ^{-1}{(\partial _j\Sigma )}\right) \theta ^i {\widetilde{\theta }}^j\\&= \frac{n}{2} {{\,\textrm{tr}\,}}\left( \Sigma ^{-1}U\Sigma ^{-1} \widetilde{U} \right) \\&=\frac{n}{2} {{\,\textrm{vec}\,}}(U)^\textsf{T}(\Sigma ^{-1}\otimes \Sigma ^{-1}){{\,\textrm{vec}\,}}(\widetilde{U})\\&=\frac{n}{2}\theta ^\textsf{T}D_{m}^\textsf{T}(\Sigma ^{-1}\otimes \Sigma ^{-1})D_{m}{\widetilde{\theta }}, \end{aligned}$$

where we have used that \({{\,\textrm{tr}\,}}(ABCD)={{\,\textrm{vec}\,}}(D)^\textsf{T}(A\otimes C^\textsf{T}){{\,\textrm{vec}\,}}(B^\textsf{T})\), for A, B, C and D matrices such that the product ABCD is defined and square [68, Lemma 3]. We can thus conclude that the Fisher matrix is

$$\begin{aligned} G(\xi )=\frac{n}{2} D_m^\textsf{T}\left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m. \end{aligned}$$
(72)
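
The matrix expression (72) can be checked numerically against the entrywise formula (71). The sketch below is our own illustration (dup_matrix and fisher_wishart_entrywise are ad-hoc names), with the duplication matrix built for the ordering of \(\nu (\cdot )\) used above.

```python
# Numerical check: the Fisher matrix (72) agrees entrywise with (71).
import numpy as np

def dup_matrix(m):
    """Duplication matrix D_m with D_m nu(A) = vec(A) for symmetric A,
    where nu(A) = (a_11, ..., a_1m, a_22, ..., a_2m, ..., a_mm)."""
    pairs = [(i, j) for i in range(m) for j in range(i, m)]
    D = np.zeros((m * m, len(pairs)))
    for col, (i, j) in enumerate(pairs):
        D[j * m + i, col] = 1.0  # vec stacks columns: entry (i, j) sits at j*m + i
        D[i * m + j, col] = 1.0  # symmetric counterpart (j, i)
    return D

def fisher_wishart_entrywise(Sigma, n):
    """Entrywise formula (71): g_ij = (n/2) tr(Sigma^-1 (d_i Sigma) Sigma^-1 (d_j Sigma))."""
    m = Sigma.shape[0]
    Sinv = np.linalg.inv(Sigma)
    basis = []
    for i in range(m):
        for j in range(i, m):
            E = np.zeros((m, m))
            E[i, j] = E[j, i] = 1.0
            basis.append(E)
    return np.array([[0.5 * n * np.trace(Sinv @ Ei @ Sinv @ Ej) for Ej in basis]
                     for Ei in basis])

m, n = 3, 7
rng = np.random.default_rng(0)
A = rng.standard_normal((m, m))
Sigma = A @ A.T + m * np.eye(m)  # a positive-definite Sigma
D = dup_matrix(m)
Sinv = np.linalg.inv(Sigma)
G_72 = 0.5 * n * D.T @ np.kron(Sinv, Sinv) @ D
print(np.allclose(G_72, fisher_wishart_entrywise(Sigma, n)))  # True
```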

This metric turns out to coincide, up to the factor n, with the Fisher metric of the statistical manifold formed by multivariate Gaussian distributions with fixed mean [12, 20, 37, 60]. Therefore, the Fisher–Rao distance is proportional to that of the fixed-mean Gaussian manifold, that is,

$$\begin{aligned} d_{\textrm{FR}}\big ( \Sigma _1,\Sigma _2 \big ) = \sqrt{\frac{n}{2}} \left\| \log \left( \Sigma _1^{-1/2}\Sigma _2\Sigma _1^{-1/2} \right) \right\| _F = \sqrt{\frac{n}{2}\sum _{k=1}^{m} \left( \log \lambda _k \right) ^2}, \end{aligned}$$
(73)

where \(\lambda _k\) are the eigenvalues of \(\Sigma _1^{-1/2}\Sigma _2\Sigma _1^{-1/2}\), \(\log \) denotes the matrix logarithm, and \(\Vert A\Vert _F = \sqrt{{{\,\textrm{tr}\,}}\left( AA^\textsf{T}\right) }\) is the Frobenius norm. This metric also coincides with the standard metric of \(P_m(\mathbb {R})\), when endowed with the matrix inner product \(\langle A,B \rangle = {{\,\textrm{tr}\,}}(A^{\textsf{T}}B)\) [67, Chapter 6], up to the factor \(\frac{n}{2}\), and is in fact related to the metric of the Siegel upper space [69] (see also [37, Appendix D]). Note that when \(\Sigma \) is restricted to be diagonal this distance is, up to a factor \(\sqrt{n}\), the product distance between univariate Gaussian distributions with fixed mean—see Example 2 in Sect. 6.
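
The distance (73) is straightforward to evaluate through the generalised eigenvalues of the pair \((\Sigma _2, \Sigma _1)\), which coincide with the eigenvalues of \(\Sigma _1^{-1/2}\Sigma _2\Sigma _1^{-1/2}\); the sketch below is our own illustration (assuming SciPy; fisher_rao_wishart is an ad-hoc name).

```python
# Illustration: Wishart Fisher-Rao distance (73) via generalized eigenvalues.
import numpy as np
from scipy.linalg import eigvalsh

def fisher_rao_wishart(Sigma1, Sigma2, n):
    lam = eigvalsh(Sigma2, Sigma1)  # eigenvalues of Sigma1^{-1/2} Sigma2 Sigma1^{-1/2}
    return np.sqrt(0.5 * n * np.sum(np.log(lam) ** 2))

Sigma1 = np.array([[2.0, 0.5], [0.5, 1.0]])
Sigma2 = np.array([[1.0, 0.2], [0.2, 3.0]])
print(fisher_rao_wishart(Sigma1, Sigma2, n=10))
print(fisher_rao_wishart(Sigma1, Sigma1, n=10))  # ~ 0 for identical parameters
```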

5.11 Inverse Wishart

An inverse Wishart distribution [70] in dimension m, with \(n \ge m\) degrees of freedom, \(n\in \mathbb {N}\), has p.d.f.

$$\begin{aligned} p(X) = \frac{ {(\det \Sigma )^{n/2} (\det X )}^{-(n+m+1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Sigma X^{-1})\right) }{ 2^{nm/2}\Gamma _m(n/2) }, \end{aligned}$$

defined for \(X\in P_m(\mathbb {R})\), and characterised by \(\Sigma \in P_m(\mathbb {R})\). We can relate an inverse Wishart distribution to a Wishart distribution by considering the reparametrisation of the sample space given by \(Y {:}{=}X^{-1}\) (cf. Proposition 3), and the change of coordinates \(\Phi {:}{=}\Sigma ^{-1}\). The p.d.f. of the new random variable, in the new coordinates, is

$$\begin{aligned} p(Y)&= \frac{1}{\left| \frac{\textrm{d}Y}{\textrm{d}X} \right| } \frac{ {\left( \det (\Phi ^{-1})\right) ^{n/2} \left( \det (Y^{-1}) \right) }^{-(n+m+1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Phi ^{-1}Y)\right) }{ 2^{nm/2} \, \Gamma _m(n/2) },\\&= \frac{1}{\left( \det (Y^{-1})\right) ^{-(m+1)}} \frac{ {\left( \det (Y^{-1}) \right) }^{-(n+m+1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Phi ^{-1}Y)\right) }{ (\det \Phi )^{n/2}2^{nm/2} \, \Gamma _m(n/2) }\\&=\frac{ {\left( \det Y \right) }^{(n-m-1)/2}\exp \left( -\frac{1}{2}{{\,\textrm{tr}\,}}(\Phi ^{-1}Y)\right) }{ (\det \Phi )^{n/2} 2^{nm/2} \, \Gamma _m(n/2) }, \end{aligned}$$

where we have used that \(\left| \frac{\textrm{d}Y}{\textrm{d}X} \right| =\det (X)^{-(m+1)}\) [68, § 12], which coincides with the p.d.f. of a Wishart distribution with parameter \(\Phi \). Write \(\sigma _{i,j}\) and \(\phi _{i,j}\) the (ij)-th entries of matrices \(\Sigma \) and \(\Phi \), respectively. Write \(\xi = \left( \xi ^1, \dots , \xi ^{m(m+1)/2}\right) = (\sigma _{1,1}, \dots \sigma _{1,m}, \sigma _{2,2}, \dots , \sigma _{2,m}, \dots , \sigma _{m,m})\), and \(\theta = (\theta ^1, \dots , \theta ^{m(m+1)/2}) = (\phi _{1,1}, \dots \phi _{1,m}, \phi _{2,2}, \dots , \phi _{2,m}, \dots , \phi _{m,m})\). Denote \(G_W(\theta )\) the Fisher matrix of a Wishart distribution (72) in coordinates \(\theta \). In the following, \(D_m\) is the matrix defined in the previous section, and \(A^+\) denotes the Moore–Penrose inverse of a matrix A. Applying Proposition 2 yields

$$\begin{aligned} G(\xi )&= \left[ \frac{\textrm{d}\Sigma }{\textrm{d}\Phi }(\Phi )\right] ^{-\textsf{T}} G_W(\theta ) \left[ \frac{\textrm{d}\Sigma }{\textrm{d}\Phi }(\Phi )\right] ^{-1}\\&= {\frac{n}{2}} \left( -D_m^+ \left( \Phi ^{-1}\otimes \Phi ^{-1}\right) D_m\right) ^{-\textsf{T}} D_m^\textsf{T}\left( \Phi ^{-1}\otimes \Phi ^{-1}\right) \\&\qquad D_m\left( -D_m^+ \left( \Phi ^{-1}\otimes \Phi ^{-1}\right) D_m\right) ^{-1}\\&={\frac{n}{2}} \left( D_m^+ \left( \Sigma \otimes \Sigma \right) D_m\right) ^{-\textsf{T}} D_m^\textsf{T}\left( \Sigma \otimes \Sigma \right) D_m\left( D_m^+ \left( \Sigma \otimes \Sigma \right) D_m\right) ^{-1}\\&={\frac{n}{2}} \left( D_m^+\left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m\right) ^{\textsf{T}} D_m^\textsf{T}\left( \Sigma \otimes \Sigma \right) D_mD_m^+ \left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m\\&= {\frac{n}{2}} D_m^\textsf{T}\left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) \left( D_m^+\right) ^{\textsf{T}} D_m^\textsf{T}\left( \Sigma \otimes \Sigma \right) D_mD_m^+ \left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m\\&={\frac{n}{2}} D_m^\textsf{T}\left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m, \end{aligned}$$

where we have used that \(\frac{\textrm{d}\Sigma }{\textrm{d}\Phi }(\Phi )= -D_m^+\left( \Phi ^{-1}\otimes \Phi ^{-1}\right) D_m\) [68, § 12], \(\left( D_m^+ (\Sigma \otimes \Sigma )D_m\right) ^{-1} = D_m^+ \left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m\), and \(D_mD_m^+ \left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m = \left( \Sigma ^{-1}\otimes \Sigma ^{-1}\right) D_m\) [68, Lemma 11]. Therefore, the Fisher matrix is the same as for Wishart distributions in (72), and the Fisher–Rao distance is

$$\begin{aligned} d_{\textrm{FR}}\big ( \Sigma _1,\Sigma _2 \big ) = \sqrt{\frac{n}{2}} \left\| \log \left( \Sigma _1^{-1/2}\Sigma _2\Sigma _1^{-1/2} \right) \right\| _F = \sqrt{\frac{n}{2}\sum _{k=1}^{m} \left( \log \lambda _k \right) ^2}, \end{aligned}$$
(74)

where \(\lambda _k\) are the eigenvalues of \(\Sigma _1^{-1/2}\Sigma _2\Sigma _1^{-1/2}\).

Remark 8

Other matrix distributions have recently been studied in the literature. In [11], general elliptical Wishart distributions, which include the t-Wishart and Kotz–Wishart distributions, have been addressed by noting that their Fisher metric coincides with that of zero-mean multivariate elliptical distributions [54].

6 Product distributions

In this short section, we address the Fisher–Rao distance for multivariate product distributions, i.e., distributions of random vectors whose components are independent. In this case, the distribution of the random vector is the product of the distributions of each component, and the associated statistical manifold is the product of the statistical manifolds associated with each component.

Consider m Riemannian manifolds \(\{(M_i,g_i) \ :\ 1 \le i \le m \}\), and the product manifold (Mg), with \(M {:}{=}M_1 \times \cdots \times M_m\), and \(g {:}{=}g_1 \oplus \cdots \oplus g_m\). In matrix form, the product metric is given by the block-diagonal matrix

$$\begin{aligned} G =\begin{pmatrix} G_1 & \quad 0 & \quad \cdots & \quad 0\\ 0 & \quad G_2 & \quad \cdots & \quad 0\\ \vdots & \quad \vdots & \quad \ddots & \quad \vdots \\ 0 & \quad 0 & \quad \cdots & \quad G_m \end{pmatrix}, \end{aligned}$$

where \(G_i\) is the matrix form of the metric \(g_i\), for \(1 \le i \le m\). Let \(d_i\) denote the geodesic distance in \(M_i\). Then, the geodesic distance d in (Mg) is given by a Pythagorean formula [71, Prop. 1], [72]:

$$\begin{aligned} d{\left( (x_1, \dots , x_m), (y_1, \dots , y_m) \right) } = \sqrt{ \sum _{i=1}^{m} \left[ d_i(x_i,y_i)\right] ^2 }, \end{aligned}$$
(75)

where \((x_1, \dots , x_m) \in M_1 \times \cdots \times M_m\), and \((y_1, \dots , y_m) \in M_1 \times \cdots \times M_m\). This result can be used to write explicit forms for the Fisher–Rao distance in statistical manifolds of product distributions, as they are statistical product manifolds. This type of construction has been used, e.g., in [18, 23]. Some examples are given below.

Example 1

The Fisher–Rao distance between the distributions of n-dimensional vectors formed by independent Poisson components [12] (cf. Sect. 4.2), with parameters \((\lambda _1, \dots , \lambda _n)\) and \((\lambda '_1, \dots , \lambda '_n)\), is

$$\begin{aligned} d_{\textrm{FR}}\left( (\lambda _1, \dots , \lambda _n), (\lambda '_1, \dots , \lambda '_n) \right) = 2 \sqrt{ \sum _{i=1}^{n} \left( \sqrt{\lambda _i} - \sqrt{\lambda '_i}\right) ^2 }. \end{aligned}$$
(76)

Example 2

Consider multivariate independent Gaussian distributions (cf. Sect. 5.4.1). In this case, the covariance matrix is diagonal, a case that has been addressed in [12, 20, 45]. The Fisher–Rao distance between such distributions parametrised by \((\mu _1,\sigma _1, \dots , \mu _n,\sigma _n)\) and \((\mu '_1,\sigma '_1, \dots , \mu '_n,\sigma '_n)\) is

$$\begin{aligned}&d_{\textrm{FR}}\left( (\mu _1, \sigma _1, \dots , \mu _n, \sigma _n), (\mu '_1, \sigma '_1, \dots , \mu '_n, \sigma '_n) \right) \nonumber \\&\quad = 2\sqrt{2} \sqrt{\sum _{i=1}^{n} \left[ {{\,\textrm{arctanh}\,}}\left( \sqrt{\frac{(\mu _i-\mu '_i)^2 + 2(\sigma _i-\sigma '_i)^2}{(\mu _i-\mu '_i)^2 + 2(\sigma _i+\sigma '_i)^2}} \right) \right] ^2}. \end{aligned}$$
(77)
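
For illustration (ours; function names are ad hoc), the Pythagorean rule (75) can be combined with the per-component Gaussian distance (45) to evaluate (77).

```python
# Illustration: Fisher-Rao distance (77) for products of independent Gaussians.
import math

def component_distance(mu1, s1, mu2, s2):
    """Univariate Gaussian Fisher-Rao distance (45)."""
    num = (mu1 - mu2) ** 2 + 2.0 * (s1 - s2) ** 2
    den = (mu1 - mu2) ** 2 + 2.0 * (s1 + s2) ** 2
    return 2.0 * math.sqrt(2.0) * math.atanh(math.sqrt(num / den))

def product_gaussian_distance(params1, params2):
    """params are sequences of (mu_i, sigma_i) pairs, one per independent component."""
    return math.sqrt(sum(component_distance(m1, s1, m2, s2) ** 2
                         for (m1, s1), (m2, s2) in zip(params1, params2)))

p = [(0.0, 1.0), (2.0, 0.5)]
q = [(1.0, 1.0), (2.0, 1.5)]
print(product_gaussian_distance(p, q))
```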

Example 3

More generally, consider a vector \((X_1,\dots ,X_n)\) of n independent generalised Gaussian random variables, where \(X_i\) follows a generalised Gaussian distribution with fixed \(\beta _i\), parametrised by \((\mu _i,\sigma _i)\). For fixed values \((\beta _1, \dots , \beta _n)\), the distance between the distributions of two such vectors, parametrised by \((\mu _1,\sigma _1, \dots , \mu _n,\sigma _n)\) and \((\mu '_1,\sigma '_1, \dots , \mu '_n,\sigma '_n)\), is

$$\begin{aligned}&d_{\textrm{FR}}\left( (\mu _1, \sigma _1, \dots , \mu _n, \sigma _n), (\mu '_1, \sigma '_1, \dots , \mu '_n, \sigma '_n) \right) \nonumber \\&\quad = 2\sqrt{ \sum _{i=1}^{n} \beta _i \left[ {{\,\textrm{arctanh}\,}}\left( \sqrt{ \frac{ \Gamma (2-\frac{1}{\beta _i})(\mu _i-\mu '_i)^2 + \Gamma (1+\frac{1}{\beta _i})(\sigma _i-\sigma '_i)^2 }{ \Gamma (2-\frac{1}{\beta _i})(\mu _i-\mu '_i)^2 + \Gamma (1+\frac{1}{\beta _i})(\sigma _i+\sigma '_i)^2} } \right) \right] ^2 }. \end{aligned}$$
(78)

7 Final remarks

In this survey, we have collected closed-form expressions for the Fisher–Rao distance in different statistical manifolds of both discrete and continuous distributions. In curating this collection in a unified language, we have also provided a few specific contributions. The results are summarised in Tables 1 and 2. We hope that making these expressions readily available will be helpful not only to those interested in information geometry itself, but also to a broader audience interested in using these distances in different applications.