This chapter presents a few examples of usual statistical models (normal, lognormal, beta, gamma, Bernoulli, and geometric) for which we provide the Fisher metric explicitly and, if possible, the geodesics and α-autoparallel curves. Some Fisher metrics will involve the use of non-elementary functions, such as the digamma and trigamma functions.

A distinguished role is played by the normal distribution, which is associated with a manifold of constant negative curvature (hyperbolic space), and by the multinomial geometry, which corresponds to a space of constant positive curvature (spherical space).

1 The Normal Distribution

In this section we shall determine the geodesics with respect to the Fisher information metric of a family of normal distributions. Given two distributions of the same family, the geodesics are curves of minimum information joining the distributions. We shall see that such a curve always exists between any two distributions of a normal family. This is equivalent to the possibility of deforming one distribution into the other while keeping the change of information to a minimum.

1.1 The Fisher Metric

Recall the formula for the density of a normal family

$$\displaystyle{p(x,\xi ) = \frac{1} {\sigma \sqrt{2\pi }}e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\,\,\,x \in \mathcal{X} = \mathbb{R},}$$

with parameters \((\xi ^{1},\xi ^{2}) = (\mu,\sigma ) \in \mathbb{R} \times (0,\infty )\). Using Proposition 1.6.3 we obtain the following components for the Fisher–Riemann metric.

Proposition 2.1.1

The Fisher information matrix for the normal distribution is given by

$$\displaystyle{ g_{ij} = \left (\begin{array}{*{10}c} \frac{1} {\sigma ^{2}} & 0 \\ 0 & \frac{2} {\sigma ^{2}} \end{array} \right ). }$$
(2.1.1)

For the computation details see Problem 2.1. It is worth noting that the metric does not depend on μ, i.e., it is translation invariant. Up to a rescaling of μ, it is a constant multiple of the hyperbolic metric of the upper half-plane.
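As a quick check of (2.1.1), the expectations can be computed symbolically; the following minimal sketch, assuming SymPy is available, evaluates \(g_{ij} = -E[\partial _{i}\partial _{j}\ell]\) for the normal family.

```python
import sympy as sp
from sympy.stats import Normal, E

x, mu = sp.symbols('x mu', real=True)
sigma = sp.symbols('sigma', positive=True)

p = sp.exp(-(x - mu)**2/(2*sigma**2))/(sigma*sp.sqrt(2*sp.pi))
l = sp.log(p)                       # log-likelihood of the normal family
X = Normal('X', mu, sigma)          # random variable used to take expectations
params = (mu, sigma)

g = sp.Matrix(2, 2, lambda i, j: sp.simplify(
    -E(sp.diff(l, params[i], params[j]).subs(x, X))))
print(g)   # expected: Matrix([[1/sigma**2, 0], [0, 2/sigma**2]])
```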

1.2 The Geodesics

A straightforward computation shows that the nonzero Christoffel symbols of first and second kind are:

$$\displaystyle{\varGamma _{11,2} = \frac{1} {\sigma ^{3}},\quad \varGamma _{12,1} = -\frac{1} {\sigma ^{3}},\quad \varGamma _{22,2} = -\frac{2} {\sigma ^{3}} }$$
$$\displaystyle{\varGamma _{ij}^{1} = \left (\begin{array}{*{10}c} 0 &-\frac{1} {\sigma } \\ -\frac{1} {\sigma } & 0 \end{array} \right ),\qquad \varGamma _{ij}^{2} = \left (\begin{array}{*{10}c} \frac{1} {2\sigma } & 0 \\ 0 &-\frac{1} {\sigma } \end{array} \right ).}$$

Consequently, the geodesic equations (1.13.43) form the following Riccati ODE system

$$\displaystyle\begin{array}{rcl} \ddot{\mu }-\frac{2} {\sigma } \dot{\mu }\dot{\sigma } = 0& &{}\end{array}$$
(2.1.2)
$$\displaystyle\begin{array}{rcl} \ddot{\sigma }+\frac{1} {2\sigma }(\dot{\mu })^{2} -\frac{1} {\sigma } (\dot{\sigma })^{2} = 0.& &{}\end{array}$$
(2.1.3)

Separating and integrating in the first equation yields

$$\displaystyle{\frac{\ddot{\mu }} {\dot{\mu }} = \frac{2\dot{\sigma }} {\sigma } \Longleftrightarrow \frac{d} {ds}\ln \dot{\mu } = 2 \frac{d} {ds}\ln \sigma \Longleftrightarrow\dot{\mu } = c\sigma ^{2},}$$

with c constant. We solve the equation in the following two cases:

1. The case c = 0. It follows that μ = constant, which corresponds to vertical half lines. Then σ satisfies the equation \(\ddot{\sigma }= \frac{1} {\sigma } \dot{\sigma }^{2}\). Writing the equation as

$$\displaystyle{\frac{\ddot{\sigma }} {\dot{\sigma }} = \frac{\dot{\sigma }} {\sigma }}$$

and integrating yields \(\ln \dot{\sigma }=\ln (C\sigma )\), with C constant. Integrating again, we find \(\sigma (s) = Ke^{Cs}\). Hence, the geodesics in this case have the following explicit equations

$$\displaystyle\begin{array}{rcl} \mu = c& &{}\end{array}$$
(2.1.4)
$$\displaystyle\begin{array}{rcl} \sigma (s) = Ke^{Cs},& &{}\end{array}$$
(2.1.5)

with \(c,C \in \mathbb{R}\), K > 0 constants.

2. The case \(c\not =0\). Substituting \(\dot{\mu }= c\sigma ^{2}\) in Eq. (2.1.3), we obtain the following equation in σ

$$\displaystyle{ \sigma \ddot{\sigma }+\frac{c^{2}} {2} \sigma ^{4} - (\dot{\sigma })^{2} = 0. }$$
(2.1.6)

Let \(\dot{\sigma }= u\). Then \(\ddot{\sigma }= \frac{du} {d\sigma } u\) and (2.1.6) becomes

$$\displaystyle{\sigma \frac{du} {d\sigma } u + \frac{c^{2}} {2} \sigma ^{4} - u^{2} = 0.}$$

Multiplying by the integrating factor \(\frac{1} {\sigma ^{3}}\) leads to the exact equation

$$\displaystyle{\mathop{\underbrace{\frac{u} {\sigma ^{2}} }}\limits _{=M}du +\Big (\mathop{\underbrace{\frac{c^{2}} {2} \sigma -\frac{u^{2}} {\sigma ^{3}} }}\limits _{N}\Big)d\sigma = 0,}$$

since

$$\displaystyle{\frac{\partial M} {\partial \sigma } = \frac{\partial N} {\partial u} = -2u\sigma ^{-3}.}$$

Then there is a function f(σ, u) such that the equation becomes df = 0, with

$$\displaystyle{\frac{\partial f} {\partial u} = M,\qquad \frac{\partial f} {\partial \sigma } = N.}$$

Integrating in the first equation yields

$$\displaystyle{f(\sigma,u) = \frac{u^{2}} {2\sigma ^{2}} + h(\sigma ),}$$

with function h to be determined in the following. Differentiating with respect to σ in the above equation,

$$\displaystyle{\frac{\partial f} {\partial \sigma } = -\frac{u^{2}} {\sigma ^{3}} + h^{\prime}(\sigma ),}$$

and comparing with

$$\displaystyle{\frac{\partial f} {\partial \sigma } = N = \frac{c^{2}} {2} \sigma -\frac{u^{2}} {\sigma ^{3}},}$$

we get

$$\displaystyle{h^{\prime}(\sigma ) = \frac{c^{2}} {2} \sigma \Longrightarrow h(\sigma ) = \frac{c^{2}\sigma ^{2}} {4} + c_{0},}$$

with c 0 constant. Hence, a first integral for the system is

$$\displaystyle{f(\sigma,u) = \frac{u^{2}} {2\sigma ^{2}} + \frac{c^{2}\sigma ^{2}} {4} = \frac{E} {2},}$$

with E positive constant. Solving for u, we obtain

$$\displaystyle\begin{array}{rcl} \frac{u^{2}} {\sigma ^{2}} + \frac{c^{2}\sigma ^{2}} {2} & =& E\Longleftrightarrow {}\\ \frac{\dot{\sigma }} {\sigma }& =& \frac{c} {\sqrt{2}}\sqrt{ C^{2} -\sigma ^{2}}, {}\\ \end{array}$$

where \(C^{2} = 2E/c^{2}\). Separating and integrating, we find

$$\displaystyle{\int \frac{d\sigma } {\sigma \sqrt{C^{2 } -\sigma ^{2}}} = (s + s_{0}) \frac{c} {\sqrt{2}}.}$$

Using the value of the integral

$$\displaystyle{\int \frac{dx} {x\sqrt{C^{2 } - x^{2}}} = -\frac{1} {C}\tanh ^{-1}\sqrt{1 -\Big ( \frac{x} {C}\Big)^{2}},}$$

we obtain

$$\displaystyle{-\frac{1} {C}\tanh ^{-1}\sqrt{1 -\Big ( \frac{\sigma } {C}\Big)^{2}} = (s + s_{0}) \frac{c} {\sqrt{2}}.}$$

Solving for σ, we get

$$\displaystyle{\sigma = C\sqrt{1 -\tanh ^{2 } \big(\sqrt{E} (s + s_{0 } )\big)} = \frac{C} {\cosh \big(\sqrt{E}(s + s_{0})\big)}.}$$

In order to find μ we integrate \(\dot{\mu }= c\sigma ^{2}\) and obtain

$$\displaystyle{\mu (s) =\int \frac{cC^{2}} {\cosh ^{2}\big(\sqrt{E}(s + s_{0})\big)}\,ds = \frac{cC^{2}} {\sqrt{E}}\tanh \big(\sqrt{E}(s + s_{0})\big) + K.}$$

Since we have

$$\displaystyle{\sigma (s)^{2} + \frac{(\mu (s) - K)^{2}} {2} = C^{2},}$$

the geodesics will be half-ellipses, with σ > 0.
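The closed-form solution above can be checked directly against the system (2.1.2)–(2.1.3); the sketch below, assuming SymPy, substitutes σ(s) and μ(s) into the two equations and simplifies.

```python
import sympy as sp

s, s0, K, c, E = sp.symbols('s s_0 K c E', positive=True)
C = sp.sqrt(2*E)/c                                 # C^2 = 2E/c^2
sigma = C/sp.cosh(sp.sqrt(E)*(s + s0))             # sigma-component of the geodesic
mu = c*C**2/sp.sqrt(E)*sp.tanh(sp.sqrt(E)*(s + s0)) + K

eq1 = sp.diff(mu, s, 2) - (2/sigma)*sp.diff(mu, s)*sp.diff(sigma, s)      # (2.1.2)
eq2 = sp.diff(sigma, s, 2) + sp.diff(mu, s)**2/(2*sigma) \
      - sp.diff(sigma, s)**2/sigma                                        # (2.1.3)
print(sp.simplify(eq1), sp.simplify(eq2))          # both should print 0
```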

In the case c = 0, the unknown σ satisfies

$$\displaystyle{\frac{\dot{\sigma }} {\sigma } = \sqrt{E}\Longleftrightarrow \frac{d} {ds}\ln \sigma = \sqrt{E}}$$

with solution

$$\displaystyle{\sigma (s) =\sigma (0)e^{\sqrt{E}s},}$$

while μ is constant, μ = K. The geodesics in this case are vertical half-lines.

Proposition 2.1.2

Consider two normal distributions with equal means, \(\mu _{0} =\mu _{1}\), and distinct standard deviations \(\sigma _{0}\) and \(\sigma _{1}\). Then the deformation of smallest information change, which transforms the first distribution into the second one, consists of normal distributions with constant mean and standard deviation

$$\displaystyle{\sigma (s) =\sigma _{ 1}^{s/\tau }\sigma _{ 0}^{1-s/\tau },\quad s \in [0,\tau ].}$$

Proof:

The geodesic in this case is a vertical half-line with constant mean and \(\sigma (s) =\sigma _{0}e^{\sqrt{E}s}\). The constant \(\sqrt{E}\) is found from the boundary condition \(\sigma (\tau ) =\sigma _{1}\). ■ 

Let \(x_{0} =\ln \sigma _{0}\), \(x_{1} =\ln \sigma _{1}\), and \(x(s) =\ln \sigma (s)\). Then \(x(s) = \frac{s} {\tau } x_{1} +\big (1 -\frac{s} {\tau } \big)x_{0}\), which corresponds to a line segment. Hence the deformation with minimal change of information occurs when the logarithm of the standard deviation describes a line segment.

1.3 α-Autoparallel Curves

A straightforward computation, using (1.11.34), yields the following Christoffel coefficients of first kind

$$\displaystyle{\varGamma _{11,1}^{(\alpha )} =\varGamma _{ 21,2}^{(\alpha )} =\varGamma _{ 12,2}^{(\alpha )} =\varGamma _{ 22,1}^{(\alpha )} = 0}$$
$$\displaystyle{\varGamma _{11,2}^{(\alpha )} = \frac{1-\alpha } {\sigma ^{3}},\quad \varGamma _{12,1}^{(\alpha )} =\varGamma _{ 21,1}^{(\alpha )} = -\frac{1+\alpha } {\sigma ^{3}},\quad \varGamma _{22,2}^{(\alpha )} = -\frac{2(1 + 2\alpha )} {\sigma ^{3}}.}$$

The Christoffel symbols of second kind are obtained by raising indices

$$\displaystyle\begin{array}{rcl} {\varGamma _{ij}^{1}}^{(\alpha )}& =& g^{11}\varGamma _{ ij,1}^{(\alpha )} + g^{12}\varGamma _{ ij,2}^{(\alpha )} =\sigma ^{2}\varGamma _{ ij,1}^{(\alpha )} {}\\ & =& \sigma ^{2}\left (\begin{array}{*{10}c} 0 &-\frac{1+\alpha } {\sigma ^{3}} \\ -\frac{1+\alpha } {\sigma ^{3}} & 0 \end{array} \right ) = \left (\begin{array}{*{10}c} 0 &-\frac{1+\alpha } {\sigma } \\ -\frac{1+\alpha } {\sigma } & 0 \end{array} \right ).{}\\ \end{array}$$
$$\displaystyle\begin{array}{rcl} {\varGamma _{ij}^{2}}^{(\alpha )}& =& g^{21}\varGamma _{ ij,1}^{(\alpha )} + g^{22}\varGamma _{ ij,2}^{(\alpha )} {}\\ & =& \frac{\sigma ^{2}} {2}\left (\begin{array}{*{10}c} \frac{1-\alpha } {\sigma ^{3}} & 0 \\ 0 &-2\frac{1+2\alpha } {\sigma ^{3}} \end{array} \right ) = \left (\begin{array}{*{10}c} \frac{1-\alpha } {2\sigma } & 0 \\ 0 &-\frac{1+2\alpha } {\sigma } \end{array} \right ).{}\\ \end{array}$$

The Riccati equations (1.13.43) for the α-autoparallel curves are

$$\displaystyle\begin{array}{rcl} \ddot{\mu }-\frac{2(1+\alpha )} {\sigma } \dot{\sigma }\dot{\mu }& =& 0 {}\\ \ddot{\sigma }+\frac{1-\alpha } {2\sigma } \dot{\mu }^{2} -\frac{1 + 2\alpha } {\sigma } \dot{\sigma }^{2}& =& 0. {}\\ \end{array}$$

The first equation can be transformed as follows

$$\displaystyle\begin{array}{rcl} \frac{\ddot{\mu }} {\dot{\mu }}& =& 2(1+\alpha )\frac{\dot{\sigma }} {\sigma }\Longleftrightarrow {}\\ \frac{d} {ds}\ln \dot{\mu }& =& 2(1+\alpha ) \frac{d} {ds}\ln \sigma \Longleftrightarrow {}\\ \ln \dot{\mu }& =& 2(1+\alpha )\ln \sigma + c_{0}\Longleftrightarrow {}\\ \dot{\mu }& =& c\,\sigma ^{2(1+\alpha )}, {}\\ \end{array}$$

with c constant. Substituting in the second equation yields

$$\displaystyle{\ddot{\sigma }+\frac{1-\alpha } {2\sigma } c^{2}\sigma ^{4(1+\alpha )} -\frac{1 + 2\alpha } {\sigma } \dot{\sigma }^{2} = 0,}$$

which, after the substitution \(u =\dot{\sigma }\), becomes

$$\displaystyle{\frac{du} {d\sigma } u + \frac{1-\alpha } {2\sigma } c^{2}\sigma ^{4(1+\alpha )} -\frac{1 + 2\alpha } {\sigma } u^{2} = 0.}$$

Multiplying the equation by an integrating factor of the form \(\sigma ^{k+1}\), we obtain

$$\displaystyle{\mathop{\underbrace{\sigma ^{k+1}u}}\limits _{=M}\,du +\Big (\mathop{\underbrace{\frac{1-\alpha } {2} c^{2}\sigma ^{4(\alpha +1)+k} - (1 + 2\alpha )\sigma ^{k}u^{2}}}\limits _{ =N}\Big)d\sigma = 0.}$$

From the exactness condition

$$\displaystyle{\frac{\partial M} {\partial \sigma } = \frac{\partial N} {\partial u},}$$

we determine \(k + 1 = -(4\alpha + 2)\). The exact equation we need to solve is

$$\displaystyle{u\sigma ^{-(4\alpha +2)}\,du +\Big (\frac{1-\alpha } {2} c^{2}\sigma - (1 + 2\alpha )u^{2}\sigma ^{-(4\alpha +3)}\Big)\,d\sigma = 0.}$$

We need to determine a function f that satisfies the system

$$\displaystyle\begin{array}{rcl} \frac{\partial f} {\partial u}& =& u\sigma ^{-(4\alpha +2)} {}\\ \frac{\partial f} {\partial \sigma } & =& \frac{1-\alpha } {2} c^{2}\sigma - (1 + 2\alpha )u^{2}\sigma ^{-(4\alpha +3)}. {}\\ \end{array}$$

From the first equation, we have

$$\displaystyle{f = \frac{u^{2}} {2} \sigma ^{-(4\alpha +2)} + h(\sigma )\Longrightarrow\frac{\partial f} {\partial \sigma } = -(1 + 2\alpha )u^{2}\sigma ^{-(4\alpha +3)} + h^{\prime}(\sigma )}$$

and comparing with the second equation yields

$$\displaystyle{h^{\prime}(\sigma ) = \frac{1-\alpha } {2} c^{2}\sigma \Longrightarrow h(\sigma ) = \frac{(1-\alpha )c^{2}} {4} \sigma ^{2} + \mathcal{C}.}$$

Hence, a first integral of motion is given by

$$\displaystyle{f = \frac{u^{2}} {2} \sigma ^{-(4\alpha +2)} + \frac{1-\alpha } {4} c^{2}\sigma ^{2} = \frac{E} {2},}$$

with E constant. Next we shall solve for σ. Using that \(u =\dot{\sigma }\), we have

$$\displaystyle\begin{array}{rcl} \frac{u^{2}} {\sigma ^{2(2\alpha +1)}} + \frac{1-\alpha } {2} c^{2}\sigma ^{2}& =& E\Longleftrightarrow \\ \Bigg( \frac{\dot{\sigma }} {\sigma ^{2\alpha +1}}\Bigg)^{2} + \frac{1-\alpha } {2} c^{2}\sigma ^{2}& =& E\Longleftrightarrow \\ \Bigg( \frac{\dot{\sigma }} {\sigma ^{2\alpha +1}}\Bigg)^{2}& =& E -\frac{1-\alpha } {2} c^{2}\sigma ^{2}\Longleftrightarrow \\ \int \frac{d\sigma } {\sigma ^{2\alpha +1}\sqrt{E - \frac{1-\alpha } {2} c^{2}\sigma ^{2}}}& =& \pm s + s_{0}\Longleftrightarrow \\ \int \frac{d\sigma } {\sigma ^{2\alpha +1}\sqrt{C^{2 } -\sigma ^{2}}}& =& (\pm s + s_{0})\sqrt{\frac{1-\alpha } {2}} \,c,{}\end{array}$$
(2.1.7)

where

$$\displaystyle{C^{2} = C_{\alpha }^{2} = \frac{2E} {(1-\alpha )c^{2}}.}$$

The left side integral can be transformed using the substitutions t = σ 2, \(v = \sqrt{C^{2 } - t}\) as follows

$$\displaystyle\begin{array}{rcl} \int \frac{d\sigma } {\sigma ^{2\alpha +1}\sqrt{C^{2 } -\sigma ^{2}}}& =& \int \frac{dt} {2\sigma ^{2(\alpha +1)}\sqrt{C^{2 } -\sigma ^{2}}} =\int \frac{dt} {2t^{\alpha +1}\sqrt{C^{2 } - t}} {}\\ & =& \int \frac{-2v\,dv} {2t^{\alpha +1}v} = -\int \frac{dv} {(C^{2} - v^{2})^{\alpha +1}}, {}\\ \end{array}$$

and hence (2.1.7) becomes

$$\displaystyle{ -\int \frac{dv} {(C^{2} - v^{2})^{\alpha +1}} = (\pm s + s_{0})\sqrt{\frac{1-\alpha } {2}} \,c. }$$
(2.1.8)

The μ-component is given by

$$\displaystyle{ \mu = c\,\int \sigma ^{2(1+\alpha )}(s)ds. }$$
(2.1.9)

There are a few particular values of α for which this equation can be solved explicitly.

Case α = −1

Equation (2.1.8) becomes

$$\displaystyle{-v - K = (\pm s + s_{0})\sqrt{\frac{1-\alpha } {2}} \,c,}$$

with solution

$$\displaystyle{\sigma ^{2}(s) = C^{2} -\Big ((\pm s + s_{ 0})\sqrt{\frac{1-\alpha } {2}} \,c + K\Big)^{2},}$$

for K constant. Equation (2.1.9) easily yields

$$\displaystyle{\mu (s) = cs +\mu (0).}$$

Case α = 1∕2

Since

$$\displaystyle{\int \frac{dv} {\big(C^{2} - v^{2}\big)^{3/2}} = \frac{v} {C^{2}\sqrt{C^{2 } - v^{2}}},}$$

we solve

$$\displaystyle{- \frac{v} {C^{2}\sqrt{C^{2 } - v^{2}}} = (\pm s + s_{0})\frac{c} {2} + K}$$

and obtain

$$\displaystyle{\sigma (s) = \frac{C} {\sqrt{1 + C^{4 } \Big((\pm s + s_{0 } )\frac{c} {2} + K\Big)^{2}}}.}$$

The μ-component is given by the integral

$$\displaystyle{\mu (s) = c\int \sigma ^{3}(s)\,ds.}$$
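As a sanity check, the expression of σ(s) for α = 1∕2 can be substituted back into the first integral f = E∕2; the sketch below, assuming SymPy and using \(E = c^{2}C^{2}/4\) (the relation \(C^{2} = 2E/((1-\alpha )c^{2})\) at α = 1∕2), performs this verification.

```python
import sympy as sp

s, s0, K, c, C = sp.symbols('s s_0 K c C', positive=True)
A = (s + s0)*c/2 + K                       # abbreviation for the right-hand side
sigma = C/sp.sqrt(1 + C**4*A**2)           # candidate solution for alpha = 1/2
u = sp.diff(sigma, s)
E = c**2*C**2/4                            # C^2 = 2E/((1-alpha) c^2) at alpha = 1/2

# first integral: f = u^2/2 * sigma^{-(4 alpha + 2)} + (1-alpha)/4 * c^2 sigma^2
f = u**2/2*sigma**(-4) + sp.Rational(1, 8)*c**2*sigma**2
print(sp.simplify(f - E/2))                # expected 0
```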

2 Jeffreys Prior

In the following we shall compute the prior on the statistical model

$$\displaystyle{\mathcal{S}_{\mu } =\{ p_{\xi };E[p_{\xi }] =\mu,\mathrm{Var}[p_{\xi }] > 1\} =\{ p_{(\mu,\sigma )};\sigma > 1\}}$$

which represents a vertical half line in the upper-half plane. The determinant is

$$\displaystyle{G(\xi ) =\det g_{ij}(\xi ) =\det \left (\begin{array}{cc} \frac{1} {\sigma ^{2}} & 0 \\ 0 &\frac{2} {\sigma ^{2}}\\ \end{array} \right ) = \frac{2} {\sigma ^{4}}.}$$

Then the volume is computed as

$$\displaystyle\begin{array}{rcl} \mathrm{Vol}(\mathcal{S}_{\mu })& =& \int _{1}^{\infty }\sqrt{G(\xi )}\,d\sigma =\int _{ 1}^{\infty }\frac{\sqrt{2}} {\sigma ^{2}} \,d\sigma = \sqrt{2} < \infty. {}\\ \end{array}$$

Therefore the prior on \(\mathcal{S}_{\mu }\) is given by

$$\displaystyle{Q(\sigma ) = \frac{\sqrt{G(\sigma )}} {\mathrm{Vol}(\mathcal{S}_{\mu })} = \frac{1} {\sigma ^{2}}.}$$
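The normalization above is easy to reproduce; a minimal sketch, assuming SymPy, integrates \(\sqrt{G}\) over σ > 1 and forms the prior.

```python
import sympy as sp

sigma = sp.symbols('sigma', positive=True)
G = sp.Matrix([[1/sigma**2, 0], [0, 2/sigma**2]]).det()   # = 2/sigma**4
vol = sp.integrate(sp.sqrt(G), (sigma, 1, sp.oo))         # = sqrt(2)
Q = sp.simplify(sp.sqrt(G)/vol)
print(vol, Q)                                             # sqrt(2), sigma**(-2)
```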

3 Lognormal Distribution

In the case of lognormal distribution

$$\displaystyle{p_{\mu,\sigma }(x) = \frac{1} {\sqrt{2\pi }\,\sigma x}e^{-\frac{(\ln x-\mu )^{2}} {2\sigma ^{2}} },\qquad x > 0,}$$

the Fisher information matrix (Fisher–Riemann metric) is given by

$$\displaystyle{g = \left (\begin{array}{cc} g_{\mu \mu }&g_{\mu \sigma }\\ g_{\sigma \mu } &g_{\sigma \sigma }\\ \end{array} \right ) = \left (\begin{array}{cc} \frac{1} {\sigma ^{2}} & 0 \\ 0 &\frac{2} {\sigma ^{2}}\\ \end{array} \right ).}$$

The computation details are left for the reader and are the subject of Problem 2.2. It is worth noting that this coincides with the Fisher metric of the normal distribution model. Hence, the associated geodesics are vertical half-lines or half-ellipses.
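Since the computation is only referenced here, a quick numerical cross-check is possible; the sketch below, assuming NumPy and SciPy and hypothetical sample values μ = 0.3, σ = 0.8, integrates minus the expected second derivatives of the log-likelihood.

```python
import numpy as np
from scipy.integrate import quad

mu, sigma = 0.3, 0.8          # hypothetical sample parameter values
p = lambda x: np.exp(-(np.log(x) - mu)**2/(2*sigma**2))/(np.sqrt(2*np.pi)*sigma*x)

# g = -E[second derivatives of the log-likelihood]
g_mm = quad(lambda x: (1/sigma**2)*p(x), 0, np.inf)[0]
g_ss = quad(lambda x: (3*(np.log(x) - mu)**2/sigma**4 - 1/sigma**2)*p(x), 0, np.inf)[0]
g_ms = quad(lambda x: (2*(np.log(x) - mu)/sigma**3)*p(x), 0, np.inf)[0]
print(g_mm, g_ss, g_ms)        # expected: 1/sigma**2, 2/sigma**2, 0
```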

4 Gamma Distribution

In this case the statistical model is defined by the following family of densities

$$\displaystyle{p_{{\xi }}(x) = p_{{\alpha,\beta }}(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,\,x^{\alpha -1}e^{-x/\beta },}$$

with \((\alpha,\beta ) \in (0,\infty ) \times (0,\infty )\), \(x \in (0,\infty )\). In the study of this model we need some special functions. Let

$$\displaystyle{ \psi (\alpha ) = \frac{\varGamma ^{\prime}(\alpha )} {\varGamma (\alpha )},\qquad \psi _{1}(\alpha ) =\psi ^{\prime}(\alpha ) }$$
(2.4.10)

be the digamma and the trigamma functions, respectively. Differentiating Dirichlet's integral representation (see Erdélyi [42], vol. I, p. 17)

$$\displaystyle{\psi (\alpha ) =\int _{ 0}^{\infty }[e^{-t} - (1 + t)^{-\alpha }]t^{-1}\,dt,\qquad \qquad \alpha > 0}$$

yields the following integral expression for the trigamma function

$$\displaystyle{ \psi _{1}(\alpha ) =\psi ^{\prime}(\alpha ) =\int _{ 0}^{\infty } \frac{\ln (1 + t)} {t(1 + t)^{\alpha }}\,dt. }$$
(2.4.11)

Another interesting formula is the expression of the trigamma function as a Hurwitz zeta function

$$\displaystyle{ \psi _{1}(\alpha ) =\zeta (2,\alpha ) =\sum _{n\geq 0} \frac{1} {(\alpha +n)^{2}}, }$$
(2.4.12)

which holds for \(\alpha \notin \{0,-1,-2,-3,\ldots \}\), a condition obviously satisfied in our case since α > 0.

Then the components of the Fisher–Riemann metric are obtained from Proposition 1.6.3, using the relations

$$\displaystyle{\int _{0}^{\infty }p_{{\xi }}(x)\,dx = 1,\qquad \int _{ 0}^{\infty }xp_{{\xi }}(x)\,dx =\alpha \beta }$$

and the derivatives of the log-likelihood function computed in Problem 2.3:

$$\displaystyle\begin{array}{rcl} g_{{\alpha \alpha }}& =& -E[\partial _{\alpha }^{2}\ell_{ x}(\xi )] =\int _{ 0}^{\infty }\psi ^{\prime}(\alpha )p_{{\xi }}(x)\,dx =\psi ^{\prime}(\alpha ) =\psi _{ 1}(\alpha ), {}\\ g_{{\beta \beta }}& =& -E[\partial _{\beta }^{2}\ell_{ x}(\xi )] = -\int _{0}^{\infty }\Big( \frac{\alpha } {\beta ^{2}} -\frac{2x} {\beta ^{3}} \Big)p_{{\xi }}(x)\,dx {}\\ & =& -\frac{\alpha } {\beta ^{2}} + \frac{2} {\beta ^{3}} \int _{0}^{\infty }xp_{{\xi }}(x)\,dx = \frac{\alpha } {\beta ^{2}}, {}\\ g_{{\alpha \beta }}& =& -E[\partial _{\alpha \beta }\ell_{x}(\xi )] =\int _{ 0}^{\infty }\frac{1} {\beta } \,p_{{\xi }}(x)\,dx = \frac{1} {\beta }. {}\\ \end{array}$$

Proposition 2.4.1

The Fisher information matrix (Fisher–Riemann metric) for the gamma distribution is

$$\displaystyle{g = \left (\begin{array}{cc} \psi _{1}(\alpha )&\frac{1} {\beta } \\ \frac{1} {\beta } & \frac{\alpha } {\beta ^{2}}\\ \end{array} \right ) = \left (\begin{array}{cc} \sum _{n\geq 0} \frac{1} {(\alpha +n)^{2}} & \frac{1} {\beta } \\ \frac{1} {\beta } & \frac{\alpha } {\beta ^{2}}\\ \end{array} \right ).}$$

It is worth noting that here α is the parameter for the gamma distribution and it has nothing to do with α-connections.
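The entries of this matrix can also be cross-checked numerically through the score function, since \(g_{ij} = E[\partial _{i}\ell\,\partial _{j}\ell]\); the sketch below assumes NumPy/SciPy and hypothetical sample values α = 2.5, β = 1.7.

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gamma as Gamma, polygamma

a, b = 2.5, 1.7                 # hypothetical values of (alpha, beta)
p = lambda x: x**(a - 1)*np.exp(-x/b)/(b**a*Gamma(a))

# scores: d_alpha l = ln x - ln beta - psi(alpha),  d_beta l = x/beta^2 - alpha/beta
s_a = lambda x: np.log(x) - np.log(b) - polygamma(0, a)
s_b = lambda x: x/b**2 - a/b

g_aa = quad(lambda x: s_a(x)**2*p(x), 0, np.inf)[0]
g_bb = quad(lambda x: s_b(x)**2*p(x), 0, np.inf)[0]
g_ab = quad(lambda x: s_a(x)*s_b(x)*p(x), 0, np.inf)[0]
print(g_aa, polygamma(1, a))    # compare with psi_1(alpha)
print(g_bb, a/b**2)             # compare with alpha/beta^2
print(g_ab, 1/b)                # compare with 1/beta
```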

5 Beta Distribution

The Fisher information metric for the beta distribution

$$\displaystyle{p_{a,b} = \frac{1} {B(a,b)}\,\,x^{a-1}(1 - x)^{b-1},\qquad a,b > 0,x \in [0,1]}$$

will be expressed in terms of trigamma functions. Since the beta function

$$\displaystyle{ B(a,b) =\int _{ 0}^{1}x^{a-1}(1 - x)^{b-1}\,dx }$$

can be expressed in terms of gamma functions as

$$\displaystyle{ B(a,b) = \frac{\varGamma (a)\varGamma (b)} {\varGamma (a + b)}, }$$

its partial derivatives can be written in terms of digamma functions, using relation (2.11.17); see Problem 2.4, part (a).

The log-likelihood function and its partial derivatives are left for the reader as an exercise in Problem 2.4, parts (b) and (c). Since the second partial derivatives of \(\ell(a,b)\) do not depend on x, they coincide with their own expected values. This yields the following components of the Fisher–Riemann metric:

$$\displaystyle\begin{array}{rcl} g_{aa}& =& -E[\partial _{a}^{2}\ell(a,b)] =\psi _{ 1}(a) -\psi _{1}(a + b) {}\\ g_{bb}& =& -E[\partial _{b}^{2}\ell_{ x}(a,b)] =\psi _{1}(b) -\psi _{1}(a + b) {}\\ g_{ab}& =& g_{ba} = -E[\partial _{a}\partial _{b}\ell_{x}(a,b)] = -\psi _{1}(a + b). {}\\ \end{array}$$

Proposition 2.5.1

The Fisher information matrix (Fisher–Riemann metric) for the beta distribution is given by

$$\displaystyle{g = \left (\begin{array}{cc} \psi _{1}(a) -\psi _{1}(a + b)& -\psi _{1}(a + b) \\ -\psi _{1}(a + b) &\psi _{1}(b) -\psi _{1}(a + b)\\ \end{array} \right ),}$$

where ψ 1 stands for the trigamma function.
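Because the second derivatives of \(\ell(a,b)\) are constant in x, the matrix is read off from trigamma values; the short numerical cross-check below (assuming SciPy, with hypothetical values a = 2, b = 3.5) compares \(g_{aa}\) with the variance of the score \(\partial _{a}\ell\).

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import beta as Beta, polygamma

a, b = 2.0, 3.5                 # hypothetical shape parameters
p = lambda x: x**(a - 1)*(1 - x)**(b - 1)/Beta(a, b)

g_aa = polygamma(1, a) - polygamma(1, a + b)            # trigamma expression
# variance of the score d_a l = ln x - (psi(a) - psi(a+b))
score_var = quad(lambda x: (np.log(x) - (polygamma(0, a) - polygamma(0, a + b)))**2*p(x),
                 0, 1)[0]
print(g_aa, score_var)          # the two values should agree
```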

6 Bernoulli Distribution

Consider the sample space \(\mathcal{X} =\{ 0,1,\ldots,n\}\) and parameter space \(\mathbb{E} = [0,1]\). The Bernoulli, or binomial distribution, is given by

$$\displaystyle{p(k;\xi ) =\binom{n}{k}\xi ^{k}(1-\xi )^{n-k},}$$

where the parameter ξ denotes the success probability. Then \(\mathcal{S} =\{ p_{\xi };\xi \in [0,1]\}\) becomes a one-dimensional statistical model. The derivatives of the log-likelihood function \(\ell_{k}(\xi ) =\ln p(k;\xi )\) are proposed as an exercise in Problem 2.5. The Fisher information is then given by the function

$$\displaystyle\begin{array}{rcl} g_{11}(\xi )& =& -E_{\xi }[\partial _{\xi }^{2}\ell(\xi )] = -\sum _{ k=0}^{n}p(k;\xi )\partial _{\xi }^{2}\ell_{ k}(\xi ) {}\\ & =& \sum _{k=0}^{n}\frac{k} {\xi ^{2}} p(k;\xi ) +\sum _{k=0}^{n}\frac{(n - k)} {(1-\xi )^{2}} p(k;\xi ) {}\\ & =& \frac{n} {\xi } + \frac{n(1-\xi )} {(1-\xi )^{2}} = \frac{n} {\xi (1-\xi )}, {}\\ \end{array}$$

where we used that the mean of a Bernoulli distribution is n ξ. Using that the variance is n ξ(1 −ξ), it follows that

$$\displaystyle{g_{11}(\xi ) = \frac{n^{2}} {V ar(p_{\xi })},}$$

which is a Cramér–Rao type identity corresponding to an efficient estimator.
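The closed form \(g_{11}(\xi ) = n/(\xi (1-\xi ))\) can be reproduced by a direct sum over the binomial probabilities; a minimal sketch, assuming SciPy and the hypothetical values n = 10, ξ = 0.3:

```python
import numpy as np
from scipy.stats import binom

n, xi = 10, 0.3                                  # hypothetical sample values
k = np.arange(n + 1)
pmf = binom.pmf(k, n, xi)

# g_11 = -E[d^2 l] = E[k/xi^2 + (n-k)/(1-xi)^2]
g11 = np.sum((k/xi**2 + (n - k)/(1 - xi)**2)*pmf)
print(g11, n/(xi*(1 - xi)))                      # both equal n/(xi (1 - xi))
```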

7 Geometric Probability Distribution

Let \(\mathcal{X} =\{ 1,2,3,\ldots \}\), \(\mathbb{E} = [0,1]\) and consider \(p(k;\xi ) = (1-\xi )^{k-1}\xi\), \(k \in \mathcal{X}\), \(\xi \in \mathbb{E}\). The formulas for the partial derivatives of the log-likelihood function are left as an exercise for the reader in Problem 2.6. Then the Fisher information becomes

$$\displaystyle\begin{array}{rcl} g_{11}(\xi )& =& -E_{\xi }[\partial _{\xi }^{2}\ell(\xi )] {}\\ & =& \sum _{k\geq 1}\frac{(k - 1)p(k;\xi )} {(\xi -1)^{2}} +\sum _{k\geq 1}\frac{1} {\xi ^{2}} p(k;\xi ) {}\\ & =& \frac{1} {\xi ^{2}(1-\xi )}, {}\\ \end{array}$$

where we used the expression for the mean \(\sum _{k\geq 1}k\,p(k;\xi ) = \frac{1} {\xi }\).
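A similar truncated-sum check works for the geometric model; the sketch below, assuming NumPy and the hypothetical value ξ = 0.4, compares the sum with \(1/(\xi ^{2}(1-\xi ))\).

```python
import numpy as np

xi = 0.4                                         # hypothetical sample value
k = np.arange(1, 2000)                           # truncation of the infinite sum
pmf = (1 - xi)**(k - 1)*xi

g11 = np.sum(((k - 1)/(1 - xi)**2 + 1/xi**2)*pmf)
print(g11, 1/(xi**2*(1 - xi)))                   # should agree to machine precision
```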

8 Multinomial Geometry

In this section we investigate the geometry associated with the multinomial probability distribution. The computation performed here is inspired by Kass and Vos [49]. Consider m independent, identical trials with n possible outcomes. The probability that a single trial falls into class i is \(p_{i}\), \(i = 1,2,\ldots,n\), and remains the same from trial to trial. Since \(p_{1} +\ldots +p_{n} = 1\), the parameter space is given by the (n − 1)-dimensional simplex

$$\displaystyle{\mathbb{E} =\{ (p_{1},\ldots,p_{n-1});\;0 \leq p_{i} \leq 1,\;\sum _{i=1}^{n-1}p_{ i} \leq 1\}.}$$

It is advantageous to consider the new parameterization

$$\displaystyle{z_{i} = 2\sqrt{p_{i}},\qquad i = 1,\ldots,n.}$$

Then \(\sum _{i=1}^{n}z_{i}^{2} = 4\), and hence

$$\displaystyle{z \in \mathbb{S}_{2,+}^{n-1} =\{ z \in \mathbb{R}^{n};\|z\|^{2} = 4,z_{ i} \geq 0\}.}$$

Therefore, the statistical manifold of multinomial probability distributions can be identified with \(\mathbb{S}_{2,+}^{n-1}\), the positive portion of the (n − 1)-dimensional sphere of radius 2. The Fisher information matrix with respect to a local coordinate system (ξ i) is

$$\displaystyle\begin{array}{rcl} g_{rs}(\xi )& =& 4\sum _{i=1}^{n}\partial _{ r}\sqrt{p_{i } (\xi )}\partial _{s}\sqrt{p_{i } (\xi )} {}\\ & =& \sum _{i=1}^{n}\partial _{ r}z_{i}(\xi )\,\partial _{s}z_{i}(\xi ) {}\\ & =& \langle \partial _{r}z,\partial _{s}z\rangle, {}\\ \end{array}$$

where \(\partial _{s} = \partial _{\xi ^{s}}\). Therefore, the Fisher metric is the natural metric induced from the Euclidean metric of \(\mathbb{R}^{n}\) on the sphere \(\mathbb{S}_{2,+}^{n-1}\). We note that \(\partial _{r}z\) is a tangent vector to the sphere in the direction of \(\xi ^{r}\).

To find the information distance between two multinomial distributions p and q, we need to find the length of the shortest curve on the sphere \(\mathbb{S}_{2,+}^{n-1}\), joining p and q. The curve that achieves the minimum is an arc of great circle passing through p and q, and this curve is unique.

Let \(z_{p}\) and \(z_{q}\) denote the points on the sphere corresponding to the aforementioned distributions. The angle α made by the unit vectors \(z_{p}/2\) and \(z_{q}/2\) satisfies \(\cos \alpha =\langle z_{p}/2,z_{q}/2\rangle\). Since the distance on the sphere is the product between the radius and the central angle, we have

$$\displaystyle\begin{array}{rcl} d(p,q)& =& 2\alpha = 2\arccos \Big(\sum _{i=1}^{n}\frac{z_{p}^{i}} {2} \frac{z_{q}^{i}} {2} \Big) {}\\ & =& 2\arccos \Big(\sum _{i=1}^{n}(p_{ i}q_{i})^{1/2}\Big). {}\\ \end{array}$$

It is worth noting that the Euclidean distance between the points \(z_{p}\) and \(z_{q}\) can be written as

$$\displaystyle\begin{array}{rcl} \|z_{p} - z_{q}\|& =& \Big(\sum _{ i=1}^{n}(z_{ p}^{i} - z_{ q}^{i})^{2}\Big)^{1/2} = 2\Big(\sum _{ i=1}^{n}\Big(\frac{z_{p}^{i}} {2} -\frac{z_{q}^{i}} {2} \Big)^{2}\Big)^{1/2} {}\\ & =& 2\Big(\sum _{i=1}^{n}(\sqrt{p_{ i}} -\sqrt{q_{i}})^{2}\Big)^{1/2} = d_{ H}(p,q), {}\\ \end{array}$$

which is called the Hellinger distance between p and q. We shall discuss this distance in more detail later.
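For concrete distributions the two quantities are easy to compare; the sketch below, assuming NumPy and two hypothetical parameter vectors, computes the information distance d(p, q) and the Hellinger distance \(d_{H}(p,q)\); the chord never exceeds the arc.

```python
import numpy as np

p = np.array([0.2, 0.5, 0.3])        # hypothetical multinomial parameters
q = np.array([0.3, 0.3, 0.4])

d_fisher = 2*np.arccos(np.sum(np.sqrt(p*q)))                    # arc on the sphere of radius 2
d_hellinger = 2*np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q))**2))   # chord ||z_p - z_q||
print(d_fisher, d_hellinger)         # d_hellinger <= d_fisher
```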

The foregoing computation of the Fisher metric exploited geometric properties. In the following we shall provide a direct computation. We write the statistical model of multinomial distributions as \(\mathcal{S} =\{ p(k;\xi )\}\), with

$$\displaystyle{p(k;\xi ) = \frac{n!} {k_{1}!\ldots k_{m}!}p_{1}^{k_{1} }\ldots p_{m-1}^{k_{m-1} }p_{m}^{k_{m} },}$$

where

$$\displaystyle{\mathcal{X} =\{ k = (k_{1},\ldots,k_{m}) \in \mathbb{N}^{m};\;k_{ 1} +\ldots +k_{m} = n\},}$$

and \(\xi = (\xi ^{1},\ldots,\xi ^{m-1}) \in \mathbb{E} = [0,1]^{m-1}\), with \(\xi ^{i} = p_{i}\), \(i = 1,\ldots,m - 1\), and \(p_{m} = 1 - p_{1} -\ldots -p_{m-1}\). Then a straightforward computation shows

$$\displaystyle\begin{array}{rcl} \partial _{i}\ell(k;\xi )& =& \frac{k_{i}} {p_{i}} -\frac{k_{m}} {p_{m}} {}\\ \partial _{j}\partial _{i}\ell(k;\xi )& =& -\Big[\frac{k_{i}\delta _{ij}} {p_{i}^{2}} + \frac{k_{m}} {p_{m}^{2}}\Big]. {}\\ \end{array}$$

Using the formula for the marginal probability

$$\displaystyle{\sum _{k}k_{i}p(k;\xi ) = np_{i},}$$

we have

$$\displaystyle\begin{array}{rcl} g_{ij}(\xi )& =& -E_{\xi }[\partial _{i}\partial _{j}\ell(k;\xi )] = E\Big[\frac{k_{i}\delta _{ij}} {p_{i}^{2}} + \frac{k_{m}} {p_{m}^{2}}\Big] {}\\ & =& \frac{\delta _{ij}} {p_{i}^{2}}\sum _{k}k_{i}p(k;\xi ) + \frac{1} {p_{m}^{2}}\sum _{k}k_{m}p(k;\xi ) {}\\ & =& n\Big[ \frac{\delta _{ij}} {p_{i}} + \frac{1} {p_{m}}\Big] = n\Big[\frac{\delta _{ij}} {\xi ^{i}} + \frac{1} {1 -\xi ^{1} -\ldots -\xi ^{m-1}}\Big]. {}\\ \end{array}$$

9 Poisson Geometry

Consider m independent Poisson distributions with parameters λ i , \(i = 1,\ldots,m\). The joint probability function is given by the product

$$\displaystyle{p(x;\lambda ) =\prod _{ i=1}^{m}p_{\lambda _{ i}}(x_{i}) =\prod _{ i=1}^{m}e^{-\lambda _{i} }\frac{\lambda _{i}^{x_{i}}} {x_{i}!},}$$

with \(\lambda = (\lambda _{1},\ldots,\lambda _{m}) \in \mathbb{E} = (0,\infty )^{m}\), and \(x = (x_{1},\ldots,x_{m}) \in \mathcal{X} = (\mathbb{N} \cup \{ 0\})^{m}\). The log-likelihood function and its derivatives with respect to \(\partial _{j} = \partial _{\lambda _{j}}\) are

$$\displaystyle\begin{array}{rcl} \ell(x;\lambda )& =& \sum _{i=1}^{m}\big(-\lambda _{i} + x_{i}\ln \lambda _{i} -\ln (x_{i}!)\big) {}\\ \partial _{j}\ell(x;\lambda )& =& -1 + \frac{x_{j}} {\lambda _{j}} {}\\ \partial _{k}\partial _{j}\ell(x;\lambda )& =& -\frac{x_{j}} {\lambda _{j}^{2}} \delta _{kj}. {}\\ \end{array}$$

Then the Fisher information is obtained as

$$\displaystyle\begin{array}{rcl} g_{jk}(\lambda )& =& E\Big[\frac{x_{j}} {\lambda _{j}^{2}} \delta _{kj}\Big] = \frac{1} {\lambda _{j}^{2}}\delta _{kj}E[x_{j}] {}\\ & =& \frac{1} {\lambda _{j}^{2}}\delta _{kj}\sum _{x}x_{j}p(x;\lambda ) = \frac{1} {\lambda _{j}}\delta _{kj}. {}\\ \end{array}$$

Therefore the Fisher matrix has a diagonal form with positive entries.
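The diagonal form is easy to confirm componentwise; the sketch below, assuming SciPy and two hypothetical rates, evaluates \(E[x_{j}]/\lambda _{j}^{2}\) by a truncated sum.

```python
import numpy as np
from scipy.stats import poisson

lam = [1.5, 4.0]                     # hypothetical Poisson rates
x = np.arange(0, 200)                # truncation of the infinite sum

for j, l in enumerate(lam):
    g_jj = np.sum(x/l**2*poisson.pmf(x, l))
    print(j, g_jj, 1/l)              # g_jj should equal 1/lambda_j
```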

10 The Space \(\mathcal{P}(\mathcal{X})\)

Let \(\mathcal{X} =\{ x_{1},\ldots,x_{n}\}\) and consider the statistical model \(\mathcal{P}(\mathcal{X})\) of all discrete probability densities on \(\mathcal{X}\). The space \(\mathcal{P}(\mathcal{X})\) can be imbedded into the function space \(\mathbb{R}^{\mathcal{X}} =\{ f;f: \mathcal{X} \rightarrow \mathbb{R}\}\) in several ways, as we shall describe shortly. This study can be found in Nagaoka and Amari [61].

For any \(\alpha \in \mathbb{R}\) consider the function \(\varphi _{\alpha }: (0,\infty ) \rightarrow \mathbb{R}\)

$$\displaystyle{\varphi _{\alpha }(u) = \left \{\begin{array}{ll} \frac{2} {1-\alpha }u^{\frac{1-\alpha } {2} },&\mbox{ if}\;\alpha \not =1 \\ \\ \ln u, &\mbox{ if}\;\alpha = 1. \end{array} \right.}$$

The imbedding

$$\displaystyle{\mathcal{P}(\mathcal{X}) \ni p(x;\xi ) \rightarrow \varphi _{\alpha }\big(p(x;\xi )\big) \in \mathbb{R}^{\mathcal{X}}}$$

is called the α-representation of \(\mathcal{P}(\mathcal{X})\). A distinguished role will be played by the α-likelihood functions

$$\displaystyle{\ell^{(\alpha )}(x;\xi ) =\varphi _{\alpha }\big(p(x;\xi )\big).}$$

The coordinate tangent vectors in this representation are given by

$$\displaystyle{\partial _{i}\ell^{(\alpha )}(x;\xi ) = \partial _{\xi ^{ i}}\varphi _{\alpha }\big(p(x;\xi )\big).}$$

The α-representation can be used to define the Fisher metric and the ∇(α)-connection on \(\mathcal{P}(\mathcal{X})\).

Proposition 2.10.1

The Fisher metric can be written in terms of the α-likelihood functions as follows:

  1. (i)

    \(g_{ij}(\xi ) =\sum _{ k=1}^{n}\partial _{ i}\ell^{(\alpha )}(x_{ k};\xi )\partial _{j}\ell^{(-\alpha )}(x_{ k};\xi );\)

  2. (ii)

    \(g_{ij}(\xi ) = - \frac{2} {1+\alpha }\sum _{k=1}^{n}p(x_{ k};\xi )^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x_{k};\xi ).\)

Proof:

Differentiating yields

$$\displaystyle\begin{array}{rcl} \partial _{i}\ell^{(\alpha )} = p^{\frac{1-\alpha } {2} }\partial _{i}\ell;& &{}\end{array}$$
(2.10.13)
$$\displaystyle\begin{array}{rcl} \partial _{i}\ell^{(-\alpha )} = p^{\frac{1+\alpha } {2} }\partial _{i}\ell;& &{}\end{array}$$
(2.10.14)
$$\displaystyle\begin{array}{rcl} \partial _{i}\partial _{j}\ell^{(\alpha )} = p^{\frac{1-\alpha } {2} }\Big(\partial _{i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big),& &{}\end{array}$$
(2.10.15)

where \(\ell (x;\xi ) =\ln p(x;\xi )\).

  1. (i)

    The previous computations and formula (1.6.16) provide

    $$\displaystyle\begin{array}{rcl} \sum _{k=1}^{n}\partial _{ i}\ell^{(\alpha )}(x_{ k};\xi )\partial _{j}\ell^{(-\alpha )}(x_{ k};\xi )& =& \sum _{k=1}^{n}p^{\frac{1-\alpha } {2} }\partial _{i}\ell(x_{k})p^{\frac{1+\alpha } {2} }\partial _{j}\ell(x_{k}) {}\\ & =& \sum _{k=1}^{n}p(x_{ k};\xi )\partial _{i}\ell(x_{k})\partial _{j}\ell(x_{k}) {}\\ & =& E_{\xi }[\partial _{i}\ell\,\partial _{j}\ell] = g_{ij}(\xi ). {}\\ \end{array}$$
  2. (ii)

    Relation (2.10.15) implies

    $$\displaystyle{p^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x;\xi ) = p(x;\xi )\partial _{i}\partial _{j}\ell(x;\xi ) + \frac{1-\alpha } {2} p(x;\xi )\partial _{i}\ell\partial _{j}\ell(x;\xi ).}$$

    Summing and using (1.6.16) and (1.6.18), we have

    $$\displaystyle\begin{array}{rcl} \sum _{k=1}^{n}p(x_{ k};\xi )^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x_{k};\xi )& =& E_{\xi }[\partial _{i}\partial _{j}\ell] + \frac{1-\alpha } {2} E_{\xi }[\partial _{i}\ell\,\partial _{j}\ell] {}\\ & =& -g_{ij}(\xi ) + \frac{1-\alpha } {2} g_{ij}(\xi ) {}\\ & =& -\frac{1+\alpha } {2} g_{ij}(\xi ). {}\\ \end{array}$$

 ■ 

The symmetry of relation (i) shows that the α- and −α-representations induce the same Fisher metric.
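Relation (i) can also be tested symbolically on the simplest model, a two-point sample space with \(p = (\xi,1-\xi )\); the sketch below, assuming SymPy, compares the α-representation formula with \(E_{\xi }[\partial \ell\,\partial \ell]\).

```python
import sympy as sp

xi = sp.symbols('xi', positive=True)
a = sp.Symbol('alpha', real=True)
p = [xi, 1 - xi]                                   # two-point model

def ell(alpha):
    # alpha-likelihood functions for alpha != 1
    return [2/(1 - alpha)*pk**((1 - alpha)/2) for pk in p]

lhs = sum(sp.diff(la, xi)*sp.diff(lb, xi) for la, lb in zip(ell(a), ell(-a)))
fisher = sum(pk*sp.diff(sp.log(pk), xi)**2 for pk in p)
print(sp.simplify(lhs - fisher))                   # expected 0
```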

Proposition 2.10.2

The components of the α-connection are given in terms of the α-representation as

$$\displaystyle{ \varGamma _{ij,k}^{(\alpha )} =\sum _{ r=1}^{n}\partial _{ i}\partial _{j}\ell^{(\alpha )}(x_{ r};\xi )\,\partial _{k}\ell^{(-\alpha )}(x_{ r};\xi ). }$$
(2.10.16)

Proof:

Combining relations (2.10.14) and (2.10.15)

$$\displaystyle\begin{array}{rcl} \sum _{r=1}^{n}\partial _{ i}\partial _{j}\ell^{(\alpha )}\,\partial _{ k}\ell^{(-\alpha )}& =& \sum _{ r=1}^{n}p(x_{ r};\xi )\,\Big(\partial _{i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell(x_{r};\xi ) {}\\ & =& E_{\xi }\Big[\Big(\partial _{i}\partial _{j}\ell + \frac{1-\alpha } {2} \partial _{i}\ell\partial _{j}\ell\Big)\partial _{k}\ell\Big] {}\\ & =& \varGamma _{ij,k}^{(\alpha )}, {}\\ \end{array}$$

by (1.11.34). ■ 

The particular values \(\alpha = -1,0,1\) provide important distinguished representations of \(\mathcal{P}(\mathcal{X})\).

10.1 (−1)-Representation

If \(\alpha = -1\), then \(\varphi _{-1}(u) = u\), and \(\ell^{(-1)}\big(p(x;\xi )\big) = p(x;\xi )\) is the identical imbedding of \(\mathcal{P}(\mathcal{X})\) into \(\mathbb{R}^{\mathcal{X}}\). Thus \(\mathcal{P}(\mathcal{X})\) is an open set of the affine space \(\mathcal{A}_{1} =\{ f: \mathcal{X} \rightarrow \mathbb{R};\sum _{k=1}^{n}f(x_{k}) = 1\}\). Therefore, the tangent space at any point p ξ can be identified with the following affine variety

$$\displaystyle{T_{\xi }^{(-1)}(\mathcal{P}(\mathcal{X})) = \mathcal{A}_{ 0} =\{ f: \mathcal{X} \rightarrow \mathbb{R};\sum _{k=1}^{n}f(x_{ k}) = 0\}.}$$

The coordinate vector fields in this representation are given by

$$\displaystyle{(\partial _{i}^{-1})_{{\xi }} = \partial _{ i}p_{\xi }.}$$

We can easily check that

$$\displaystyle{\sum _{k=1}^{n}(\partial _{ i}^{-1})_{{\xi }}(x_{ k}) =\sum _{ k=1}^{n}\partial _{ i}p_{\xi }(x_{k}) = \partial _{i}(1) = 0,}$$

so \((\partial _{i}^{-1})_{{\xi }} \in T_{\xi }(\mathcal{P})\), for any ξ.

10.2 0-Representation

This is also called the square root representation. In this case \(\varphi _{0}(u) = 2\sqrt{u}\), and the imbedding \(\varphi _{0}: \mathcal{P}(\mathcal{X}) \rightarrow \mathbb{R}^{\mathcal{X}}\) is

$$\displaystyle{p(x;\xi ) \rightarrow \varphi _{0}(p(x;\xi )) =\ell ^{(0)}(x;\xi ) = 2\sqrt{p(x;\xi )} =\theta (x) \in \mathbb{R}^{\mathcal{X}}.}$$

Since \(\sum _{k=1}^{n}\theta (x_{k})^{2} = 4\), the image of the imbedding \(\varphi _{0}\) is an open subset of the sphere of radius 2,

$$\displaystyle{\varphi _{0}\big(\mathcal{P}(\mathcal{X})\big) \subset \{\theta;\theta: \mathcal{X} \rightarrow \mathbb{R};\sum _{k}\theta (x_{k})^{2} = 4\}.}$$

The induced metric from the natural Euclidean metric of \(\mathbb{R}^{\mathcal{X}}\) on this sphere is

$$\displaystyle\begin{array}{rcl} \langle \partial _{i}\theta,\partial _{j}\theta \rangle & =& \sum _{k=1}^{n}\partial _{ i}\theta (x_{k})\partial _{j}\theta (x_{k}) {}\\ & =& 4\sum _{k=1}^{n}\partial _{ i}\sqrt{p(x_{k };\xi )}\partial _{j}\sqrt{p(x_{k };\xi )} {}\\ & =& g_{ij}(\xi ), {}\\ \end{array}$$

i.e., the Fisher metric on the statistical model \(\mathcal{P}(\mathcal{X})\).

The coordinate vector fields are given by

$$\displaystyle{(\partial _{i}^{0})_{{\xi }} = \partial _{ i}\ell^{(0)}(x;\xi ) = \frac{1} {\sqrt{p(x;\xi )}}\partial _{i}p(x;\xi ).}$$

The next computation deals with the tangent space generated by \((\partial _{i}^{0})_{{\xi }}\). We have

$$\displaystyle\begin{array}{rcl} \langle \theta,(\partial _{i}^{0})_{{\xi }}\rangle & =& \sum _{ k=1}^{n}\theta (x_{ k}) \frac{1} {\sqrt{p(x_{k};\xi )}}\partial _{i}p(x_{k};\xi ) {}\\ & =& \sum _{k=1}^{n}2\sqrt{p(x_{ k};\xi )} \frac{1} {\sqrt{p(x_{k};\xi )}}\partial _{i}p(x_{k};\xi ) {}\\ & =& 2\partial _{i}\sum _{k=1}^{n}p(x_{ k};\xi ) = 0, {}\\ \end{array}$$

so that the vector \((\partial _{i}^{0})_{{\xi }}\) is perpendicular to the vector θ, and hence belongs to the tangent plane to the sphere at θ. This plane can be identified with the tangent space \(T_{\xi }^{(0)}\mathcal{P}(\mathcal{X})\) in the 0-representation.

10.3 1-Representation

This is also called the exponential (or logarithmic) representation, because each distribution \(p(x;\xi ) \in \mathcal{P}(\mathcal{X})\) is identified with \(\ln p(x;\xi ) \in \mathbb{R}^{\mathcal{X}}\). In this case the 1-likelihood function becomes \(\ell^{(1)}(x;\xi ) =\ell (x;\xi ) =\ln p(x;\xi )\), i.e., the usual log-likelihood function.

The coordinate vector fields are given by

$$\displaystyle{(\partial _{i}^{1})_{{\xi }} = \partial _{ i}\ell^{(1)}(x;\xi ) = \frac{1} {p(x;\xi )}\partial _{i}p(x;\xi ).}$$

By virtue of the computation

$$\displaystyle{E_{p}[(\partial _{i}^{1})_{{\xi }}] = E_{ p}[\partial _{i}\ell^{(1)}(x;\xi )] =\sum _{ k=1}^{n}\partial _{ i}p(x_{k};\xi ) = \partial _{i}(1) = 0,}$$

it follows that the tangent space in this representation is given by

$$\displaystyle{T_{p}^{(1)}(\mathcal{P}(\mathcal{X})) =\{ f \in \mathbb{R}^{\mathcal{X}};\;E_{ p}[f] = 0\}.}$$

It is worth noting that tangent spaces are invariant objects, which do not depend on any representation. However, in different representations the tangent vectors admit particular descriptions, as in the cases of the ±1- and 0-representations above.

10.4 Fisher Metric

Let \(\xi ^{i} = p(x_{i};\xi )\), \(i = 1,\ldots,n - 1\), be the coordinates on \(\mathcal{P}(\mathcal{X})\). Since \(p(x_{n};\xi ) = 1 -\sum _{j=1}^{n-1}\xi ^{j}\), the partial derivatives with respect to \(\xi ^{i}\) are

$$\displaystyle{\partial _{i}p(x_{k};\xi ) = \left \{\begin{array}{ll} \delta _{ik}, &\mbox{ if}\;k = 1,\ldots,n - 1 \\ - 1,&\mbox{ if}\;k = n. \end{array} \right.}$$

Then the Fisher metric is given by

$$\displaystyle\begin{array}{rcl} g_{ij}(\xi )& =& E_{p}[\partial _{i}\ell\,\partial _{j}\ell] =\sum _{ k=1}^{n}p(x_{ k};\xi )\partial _{i}\ln p(x_{k};\xi )\partial _{j}\ln p(x_{k};\xi ) {}\\ & =& \sum _{k=1}^{n}\frac{\partial _{i}p(x_{k};\xi )\partial _{j}p(x_{k};\xi )} {p(x_{k};\xi )} {}\\ & =& \sum _{k=1}^{n-1}\frac{\delta _{ik}\delta _{jk}} {\xi ^{k}} + \frac{1} {1 -\sum _{j=1}^{n-1}\xi ^{j}} {}\\ & =& \frac{\delta _{ij}} {\xi ^{j}} + \frac{1} {1 -\sum _{j=1}^{n-1}\xi ^{j}}\cdot {}\\ \end{array}$$

11 Problems

  1. 2.1.

    Consider the statistical model given by the densities of a normal family

    $$\displaystyle{p(x,\xi ) = \frac{1} {\sigma \sqrt{2\pi }}e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\,\,\,x \in \mathcal{X} = \mathbb{R},}$$

    with parameters \((\xi ^{1},\xi ^{2}) = (\mu,\sigma ) \in \mathbb{R} \times (0,\infty )\).

    1. (a)

      Show that the log-likelihood function and its derivatives are given by

      $$\displaystyle\begin{array}{rcl} \ell_{x}(\xi )=\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {2}\ln (2\pi )-\ln \sigma -\frac{(x-\mu )^{2}} {2\sigma ^{2}} {}\\ \partial _{\sigma }\ell_{x}(\xi ) = \partial _{\sigma }\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {\sigma } + \frac{1} {\sigma ^{3}} (x-\mu )^{2} {}\\ \partial _{\sigma }\partial _{\sigma }\ell_{x}(\xi ) = \partial _{\sigma }\partial _{\sigma }\ln p(x,\xi )& \!\!=\!\!& \frac{1} {\sigma ^{2}} -\frac{3} {\sigma ^{4}} (x-\mu )^{2} {}\\ \partial _{\mu }\ell_{x}(\xi ) = \partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& \frac{1} {\sigma ^{2}} (x-\mu ) {}\\ \partial _{\mu }\partial _{\mu }\ell_{x}(\xi ) = \partial _{\mu }\partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {\sigma ^{2}} {}\\ \partial _{\sigma }\partial _{\mu }\ell_{x}(\xi ) = \partial _{\sigma }\partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& -\frac{2} {\sigma ^{3}} (x-\mu ). {}\\ \end{array}$$
    2. (b)

      Show that the Fisher–Riemann metric components are given by

      $$\displaystyle{g_{11} = \frac{1} {\sigma ^{2}},\qquad g_{12} = g_{21} = 0,\qquad g_{22} = \frac{2} {\sigma ^{2}}.}$$
  2. 2.2.

    Consider the statistical model defined by the lognormal distribution

    $$\displaystyle{p_{\mu,\sigma }(x) = \frac{1} {\sqrt{2\pi }\,\sigma x}e^{-\frac{(\ln x-\mu )^{2}} {2\sigma ^{2}} },\qquad x > 0.}$$
    1. (a)

      Show that the log-likelihood function and its derivatives are given by

      $$\displaystyle\begin{array}{rcl} \ell(\mu,\sigma )& =& -\ln \sqrt{2\pi } -\ln \sigma -\ln x - \frac{1} {2\sigma ^{2}}(\ln x-\mu )^{2} {}\\ \partial _{\mu }^{2}\ell(\mu,\sigma )& =& -\frac{1} {\sigma ^{2}} {}\\ \partial _{\sigma }^{2}\ell(\mu,\sigma )& =& \frac{1} {\sigma ^{2}} -\frac{3} {\sigma ^{4}} (\ln x-\mu )^{2} {}\\ \partial _{\mu }\partial _{\sigma }\ell(\mu,\sigma )& =& -\frac{2} {\sigma ^{3}} (\ln x-\mu ). {}\\ \end{array}$$
    2. (b)

      Using the substitution \(y =\ln x-\mu\), show that the components of the Fisher–Riemann metric are given by

      $$\displaystyle{g_{\sigma \sigma } = \frac{2} {\sigma ^{2}},\qquad g_{\mu \mu } = \frac{1} {\sigma ^{2}},\qquad g_{\mu \sigma } = g_{\sigma \mu } = 0.}$$
  3. 2.3.

    Let

    $$\displaystyle{p_{{\xi }}(x) = p_{{\alpha,\beta }}(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,\,x^{\alpha -1}e^{-x/\beta },}$$

    with \((\alpha,\beta ) \in (0,\infty ) \times (0,\infty )\), x ∈ (0, ) be the statistical model defined by the gamma distribution.

    1. (a)

      Show that the log-likelihood function is

      $$\displaystyle{\ell_{x}(\xi ) =\ln p_{{\xi }} = -\alpha \ln \beta -\ln \varGamma (\alpha ) + (\alpha -1)\ln x -\frac{x} {\beta }.}$$
    2. (b)

      Verify the relations

      $$\displaystyle\begin{array}{rcl} \partial _{\beta }\ell_{x}(\xi )& =& -\frac{\alpha } {\beta } + \frac{x} {\beta ^{2}} {}\\ \partial _{\alpha \beta }\ell_{x}(\xi )& =& -\frac{1} {\beta } {}\\ \partial _{\beta }^{2}\ell_{ x}(\xi )& =& \frac{\alpha } {\beta ^{2}} -\frac{2x} {\beta ^{3}} {}\\ \partial _{\alpha }\ell_{x}(\xi )& =& -\ln \beta -\psi (\alpha ) +\ln x {}\\ \partial _{\alpha }^{2}\ell_{ x}(\xi )& =& -\psi _{1}(\alpha ), {}\\ \end{array}$$

      where

      $$\displaystyle{ \psi (\alpha ) = \frac{\varGamma ^{\prime}(\alpha )} {\varGamma (\alpha )},\qquad \psi _{1}(\alpha ) =\psi ^{\prime}(\alpha ) }$$
      (2.11.17)

      are the digamma and the trigamma functions, respectively.

    3. (c)

      Prove that for α > 0, we have

      $$\displaystyle{\sum _{n\geq 0}\,\, \frac{\alpha } {(\alpha +n)^{2}} > 1.}$$
  4. 2.4.

    Consider the beta distribution

    $$\displaystyle{p_{a,b} = \frac{1} {B(a,b)}\,\,x^{a-1}(1 - x)^{b-1},\qquad a,b > 0,x \in [0,1].}$$

    (a) Using that the beta function

    $$\displaystyle{ B(a,b) =\int _{ 0}^{1}x^{a-1}(1 - x)^{b-1}\,dx }$$

    can be expressed in terms of gamma function as

    $$\displaystyle{ B(a,b) = \frac{\varGamma (a)\varGamma (b)} {\varGamma (a + b)}, }$$

    show that its partial derivatives can be written in terms of digamma functions, as

    $$\displaystyle\begin{array}{rcl} \partial _{a}\ln B(a,b) =\psi (a) -\psi (a + b)& & {}\end{array}$$
    (2.11.18)
    $$\displaystyle\begin{array}{rcl} \partial _{b}\ln B(a,b) =\psi (b) -\psi (a + b).& & {}\end{array}$$
    (2.11.19)

    (b) Show that the log-likelihood function is given by

    $$\displaystyle{\ell(a,b) =\ln p_{a,b} = -\ln B(a,b) + (a - 1)\ln x + (b - 1)\ln (1 - x).}$$

    (c) Take partial derivatives and use formulas (2.11.18) and (2.11.19) to verify relations

    $$\displaystyle\begin{array}{rcl} \partial _{a}\ell(a,b)& =& -\partial _{a}\ln B(a,b) +\ln x =\psi (a + b) -\psi (a) +\ln x {}\\ \partial _{b}\ell(a,b)& =& \psi (a + b) -\psi (b) +\ln (1 - x) {}\\ \partial _{a}^{2}\ell(a,b)& =& \psi ^{\prime}(a + b) -\psi ^{\prime}(a) =\psi _{ 1}(a + b) -\psi _{1}(a) {}\\ \partial _{b}^{2}\ell(a,b)& =& \psi ^{\prime}(a + b) -\psi ^{\prime}(b) =\psi _{ 1}(a + b) -\psi _{1}(b) {}\\ \partial _{a}\partial _{b}\ell(a,b)& =& \psi ^{\prime}(a + b) =\psi _{1}(a + b). {}\\ \end{array}$$

    (d) Using the expression of the trigamma function as a Hurwitz zeta function, show that the Fisher information matrix can be written as a series \(g =\sum _{n\geq 0}g_{n}\), where

    $$\displaystyle{g_{n} = \left (\begin{array}{cc} \frac{1} {(a+n)^{2}} - \frac{1} {(a+b+n)^{2}} & - \frac{1} {(a+b+n)^{2}}\\ \\ - \frac{1} {(a+b+n)^{2}} & \frac{1} {(b+n)^{2}} - \frac{1} {(a+b+n)^{2}}\\ \end{array} \right ).}$$
  5. 2.5.

    Let \(\mathcal{S} =\{ p_{\xi };\xi \in [0,1]\}\) be a one-dimensional statistical model, where

    $$\displaystyle{p(k;\xi ) =\binom{n}{k}\xi ^{k}(1-\xi )^{n-k}}$$

    is the Bernoulli distribution, with \(k \in \{ 0,1,\ldots,n\}\) and ξ ∈ [0, 1]. Show that the derivatives of the log-likelihood function \(\ell_{k}(\xi ) =\ln p(k;\xi )\) are

    $$\displaystyle\begin{array}{rcl} \partial _{\xi }\ell_{k}(\xi )& =& \frac{k} {\xi } - (n - k) \frac{1} {1-\xi } {}\\ \partial _{\xi }^{2}\ell_{ k}(\xi )& =& -\frac{k} {\xi ^{2}} - (n - k) \frac{1} {(1-\xi )^{2}}\cdot {}\\ \end{array}$$
  6. 2.6.

    Consider the geometric probability distribution \(p(k;\xi ) = (1-\xi )^{k-1}\xi\), \(k \in \{ 1,2,3,\ldots \}\), ξ ∈ [0, 1]. Show that

    $$\displaystyle\begin{array}{rcl} \partial _{\xi }\ell_{k}(\xi )& =& \frac{k - 1} {\xi -1} + \frac{1} {\xi } {}\\ \partial _{\xi }^{2}\ell_{ k}(\xi )& =& -\frac{(k - 1)} {(\xi -1)^{2}} -\frac{1} {\xi ^{2}} \cdot {}\\ \end{array}$$
  7. 2.7.

    Let f be a density function on \(\mathbb{R}\) and define the statistical model

    $$\displaystyle{\mathcal{S}_{f} =\Big\{ p(x;\mu,\sigma ) = \frac{1} {\sigma } f\Big(\frac{x-\mu } {\sigma } \Big);\mu \in \mathbb{R},\sigma > 0\Big\}.}$$
    1. (a)

      Show that \(\int _{\mathbb{R}}p(x;\mu,\sigma )\,dx = 1\).

    2. (b)

      Verify the following formulas involving the log-likelihood function \(\ell =\ln p(\,\cdot \,;\mu,\sigma )\):

      $$\displaystyle{\partial _{\mu }\ell = -\frac{1} {\sigma } \frac{f^{\prime}} {f},\qquad \partial _{\sigma }\ell = -\frac{1} {\sigma } -\frac{(x-\mu )} {\sigma ^{2}} \frac{f^{\prime}} {f}}$$
      $$\displaystyle{\partial _{\mu }\partial _{\sigma }\ell = \frac{1} {f^{2}}\Big[\Big(\frac{f^{\prime}} {\sigma ^{2}} + \frac{1} {\sigma } \frac{x-\mu } {\sigma ^{2}} f^{\prime \prime}\Big)f -\frac{1} {\sigma } \frac{x-\mu } {\sigma ^{2}} (f^{\prime})^{2}\Big].}$$
    3. (c)

      Show that for any continuous function h we have

      $$\displaystyle{E_{(\mu,\sigma )}\Big[h\Big(\frac{x-\mu } {\sigma } \Big)\Big] = E_{(0,1)}[h(x)].}$$
    4. (d)

      Assume that f is an even function (i.e., \(f(-x) = f(x)\)). Show that the Fisher–Riemann metric, g, has a diagonal form (i.e., g 12 = 0).

    5. (e)

      Prove that the Riemannian space \((\mathcal{S}_{f},g)\) has a negative, constant curvature.

    6. (f)

      Consider \(f(x) = \frac{1} {\sqrt{2\pi }}e^{-x^{2}/2 }\). Use the aforementioned parts to deduce the formula for \(g_{ij}\) and to show that the curvature \(K = -\frac{1} {2}\).

  8. 2.8.

    Study the motion of the curve

    $$\displaystyle{(\mu,\sigma ) \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\,\,\mu ^{2} +\sigma ^{2} = 1}$$

    with \((\mu,\sigma,p) \in \mathbb{R} \times (0,\infty ) \times (0,\infty )\) and \(x \in \mathbb{R}\) fixed, in the direction of the binormal vector field.

  9. 2.9.

    The graph of the normal density of probability

    $$\displaystyle{x \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} }}$$

    is called the Gauss bell. Find the equation of the surface obtained by revolving the Gauss bell about:

    1. (a)

      Ox axis;

    2. (b)

      Op axis.

  10. 2.10.

    Study the motion of the trajectories of the vector field (y, z, x) in the direction of the vector field

    $$\displaystyle{\left (1,1, \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} }\right ),}$$

    where μ and σ are fixed.

  11. 2.11.

    The normal surface

    $$\displaystyle{(\mu,\sigma ) \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },}$$
    $$\displaystyle{(\mu,\sigma ) \in \mathbb{R} \times (0,\infty );\,x \in \mathbb{R}}$$

    is deformed into \(p_{\mu,\sigma }(tx),\,t \in \mathbb{R}\). What happens to the Gaussian curvature?

  12. 2.12.

    The gamma surface

    $$\displaystyle{(\alpha,\beta ) \rightarrow p_{\alpha,\beta }(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,x^{\alpha -1}\,e^{-\frac{x} {\beta } }}$$
    $$\displaystyle{(\alpha,\beta ) \in (0,\infty ) \times (0,\infty );\,x \in (0,\infty )}$$

    is deformed into \(p_{t\alpha,\beta }(x),\,t \in (0,\infty )\). What happens to the mean curvature?