Abstract
This chapter presents a few examples of usual statistical models (normal, lognormal, beta, gamma, Bernoulli, and geometric) for which we provide the Fisher metric explicitly and, if possible, the geodesics and α-autoparallel curves. Some Fisher metrics will involve the use of non-elementary functions, such as the digamma and trigamma functions.
A distinguished role is dedicated to the normal distribution, which is associated with a manifold of negative constant curvature (hyperbolic space) and to the multinomial geometry, which corresponds to a space with positive constant curvature (spherical space).
1 The Normal Distribution
In this section we shall determine the geodesics with respect to the Fisher information metric of a family of normal distributions. Given two distributions of the same family, the geodesics are curves of minimum information joining the distributions. We shall see that such a curve always exists between any two distributions of a normal family. This is equivalent to the possibility of deforming one distribution into the other while keeping the change of information to a minimum.
1.1 The Fisher Metric
Recall the formula for the density of a normal family
$$\displaystyle{p(x;\xi ) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\qquad x \in \mathbb{R},}$$
with parameters \((\xi ^{1},\xi ^{2}) = (\mu,\sigma ) \in \mathbb{R} \times (0,\infty )\). Using Proposition 1.6.3, we obtain the following components for the Fisher–Riemann metric.
Proposition 2.1.1
The Fisher information matrix for the normal distribution is given by
$$\displaystyle{g = \left (\begin{array}{cc} \frac{1} {\sigma ^{2}} & 0\\ 0 & \frac{2} {\sigma ^{2}} \end{array} \right ).}$$
For the computation details see Problem 2.1. It is worth noting that the metric does not depend on μ, i.e., it is translation invariant. This metric is also very similar to the upper-half plane metric.
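These components can be verified numerically. The sketch below (pure Python, helper names ours) approximates the expectations \(E[\partial _{i}\ell \,\partial _{j}\ell ]\) by trapezoid integration over a wide grid and recovers \(g_{11} = 1/\sigma ^{2}\), \(g_{12} = 0\), \(g_{22} = 2/\sigma ^{2}\):

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def fisher_normal(mu, sigma, half_width=10.0, n=20001):
    """Approximate g_ij = E[d_i log p * d_j log p] by trapezoid integration."""
    a = mu - half_width * sigma
    h = 2 * half_width * sigma / (n - 1)
    g11 = g12 = g22 = 0.0
    for k in range(n):
        x = a + k * h
        w = h * (0.5 if k in (0, n - 1) else 1.0)        # trapezoid weight
        p = normal_pdf(x, mu, sigma)
        dmu = (x - mu) / sigma ** 2                      # d/dmu log p
        dsg = -1.0 / sigma + (x - mu) ** 2 / sigma ** 3  # d/dsigma log p
        g11 += w * p * dmu * dmu
        g12 += w * p * dmu * dsg
        g22 += w * p * dsg * dsg
    return g11, g12, g22
```

For instance, `fisher_normal(1.0, 2.0)` returns values numerically close to \((1/4,\,0,\,1/2)\), as the proposition predicts.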
1.2 The Geodesics
A straightforward computation shows that the nonzero Christoffel symbols of the first and second kind are:
$$\displaystyle{\varGamma _{11,2} = \frac{1} {\sigma ^{3}},\qquad \varGamma _{12,1} =\varGamma _{21,1} = -\frac{1} {\sigma ^{3}},\qquad \varGamma _{22,2} = -\frac{2} {\sigma ^{3}},}$$
$$\displaystyle{\varGamma _{12}^{1} =\varGamma _{21}^{1} = -\frac{1} {\sigma },\qquad \varGamma _{11}^{2} = \frac{1} {2\sigma },\qquad \varGamma _{22}^{2} = -\frac{1} {\sigma }.}$$
Consequently, the geodesic equations (1.13.43) form the ODE system
$$\displaystyle{\ddot{\mu }-\frac{2} {\sigma }\dot{\mu }\dot{\sigma }= 0,\qquad \ddot{\sigma }+ \frac{1} {2\sigma }\dot{\mu }^{2} -\frac{1} {\sigma }\dot{\sigma }^{2} = 0,}$$
which reduces to a Riccati-type ODE system.
Separating and integrating in the first equation yields
$$\displaystyle{\dot{\mu }= c\sigma ^{2},}$$
with c constant. We solve the equation in the following two cases:
1. The case c = 0. It follows that μ = constant, which corresponds to vertical half-lines. Then σ satisfies the equation \(\ddot{\sigma }= \frac{1} {\sigma } \dot{\sigma }^{2}\). Writing the equation as
$$\displaystyle{\frac{\ddot{\sigma }} {\dot{\sigma }} = \frac{\dot{\sigma }} {\sigma } }$$
and integrating yields \(\ln \dot{\sigma }=\ln (C\sigma )\), with C constant. Integrating again, we find \(\sigma (s) = Ke^{Cs}\). Hence, the geodesics in this case have the following explicit equations
$$\displaystyle{\mu (s) = c,\qquad \sigma (s) = Ke^{Cs},}$$
with \(c,C \in \mathbb{R}\), K > 0 constants.
2. The case \(c\not =0\). Substituting \(\dot{\mu }= c\sigma ^{2}\) in Eq. (2.1.3), we obtain the following equation in σ
Let \(\dot{\sigma }= u\). Then \(\ddot{\sigma }= \frac{du} {d\sigma } u\) and (2.1.6) becomes
Multiplying by the integrating factor \(\frac{1} {\sigma ^{3}}\) leads to the exact equation
since
Then there is a function f(σ, u) such that df = 0, with
Integrating in the first equation yields
with function h to be determined in the following. Differentiating with respect to σ in the above equation,
and comparing with
we get
with \(c_{0}\) constant. Hence, a first integral for the system is
with E positive constant. Solving for u, we obtain
where \(C^{2} = 2E/c^{2}\). Separating and integrating, we find
Using the value of the integral
we obtain
Solving for σ, we get
In order to find μ we integrate in \(\dot{\mu }= c\sigma ^{2}\) and obtain
Since we have
$$\displaystyle{(\mu (s) -\mu _{0})^{2} + 2\sigma (s)^{2} = \mathrm{const},}$$
with \(\mu _{0}\) constant, the geodesics will be half-ellipses, with σ > 0.
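This can be checked numerically: for the metric \(g = \mathrm{diag}(1/\sigma ^{2},\,2/\sigma ^{2})\) the geodesic system is \(\ddot{\mu }= \frac{2} {\sigma }\dot{\mu }\dot{\sigma }\), \(\ddot{\sigma }= \frac{1} {\sigma }\dot{\sigma }^{2} - \frac{1} {2\sigma }\dot{\mu }^{2}\), and a solution started at the apex (0, 1) should stay on the half-ellipse \(\mu ^{2} + 2\sigma ^{2} = 2\). A minimal RK4 sketch (helper names ours):

```python
def geodesic_step(mu, sg, dmu, dsg, h):
    """One RK4 step for mu'' = (2/s) mu' s',  s'' = s'^2/s - mu'^2/(2 s)."""
    def f(y):
        m, s, dm, ds = y
        return (dm, ds, 2.0 * dm * ds / s, ds * ds / s - dm * dm / (2.0 * s))
    y = (mu, sg, dmu, dsg)
    k1 = f(y)
    k2 = f(tuple(y[i] + 0.5 * h * k1[i] for i in range(4)))
    k3 = f(tuple(y[i] + 0.5 * h * k2[i] for i in range(4)))
    k4 = f(tuple(y[i] + h * k3[i] for i in range(4)))
    return tuple(y[i] + h / 6.0 * (k1[i] + 2 * k2[i] + 2 * k3[i] + k4[i]) for i in range(4))

# unit-speed start at the apex (mu, sigma) = (0, 1) of mu^2 + 2 sigma^2 = 2
state = (0.0, 1.0, 1.0, 0.0)
for _ in range(1000):
    state = geodesic_step(*state, h=1e-3)
```

Along the numerical solution the quantity \(\mu ^{2} + 2\sigma ^{2}\) stays constant to high accuracy, confirming the elliptic shape.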
In the case c = 0, the unknown σ satisfies
$$\displaystyle{\dot{\sigma }^{2} = E\,\sigma ^{2},}$$
with solution
$$\displaystyle{\sigma (s) =\sigma (0)e^{\sqrt{E}s},}$$
while μ is constant, μ = K. The geodesics in this case are vertical half-lines.
Proposition 2.1.2
Consider two normal distributions with equal means, \(\mu _{0} =\mu _{1}\), and distinct standard deviations \(\sigma _{0}\) and \(\sigma _{1}\). Then the smallest information transform, which deforms the first distribution into the second one, is a normal distribution with constant mean and standard deviation
$$\displaystyle{\sigma (s) =\sigma _{0}^{1-s/\tau }\,\sigma _{1}^{s/\tau },\qquad 0 \leq s \leq \tau.}$$
Proof:
The geodesic in this case is a vertical half-line with constant mean and \(\sigma (s) =\sigma (0)e^{\sqrt{E}s}\). The constant \(\sqrt{E}\) can be found from the boundary condition \(\sigma (\tau ) =\sigma _{1}\). ■
Let \(x_{0} =\ln \sigma _{0}\), \(x_{1} =\ln \sigma _{1}\), and \(x(s) =\ln \sigma (s)\). Then \(x(s) =\big (1 -\frac{s} {\tau } \big)x_{0} + \frac{s} {\tau } x_{1}\), which corresponds to a line segment. The minimal information loss during the deformation occurs when the logarithm of the standard deviation describes a line segment.
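In other words, the optimal deformation interpolates the standard deviation geometrically. A one-line sketch (function name ours):

```python
def sigma_path(s, tau, sigma0, sigma1):
    """Minimum-information deformation between N(mu, sigma0^2) and N(mu, sigma1^2):
    linear interpolation of log(sigma), i.e., geometric interpolation of sigma."""
    t = s / tau
    return sigma0 ** (1.0 - t) * sigma1 ** t
```

In particular, the midpoint of the deformation has standard deviation equal to the geometric mean \(\sqrt{\sigma _{0}\sigma _{1}}\).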
1.3 α-Autoparallel Curves
A straightforward computation, using (1.11.34), yields the following Christoffel coefficients of first kind
The Christoffel symbols of the second kind are obtained by raising indices
The Riccati equations (1.13.43) for the α-autoparallel curves are
The first equation can be transformed as follows
with c constant. Substituting in the second equation yields
which, after the substitution \(u =\dot{\sigma }\), becomes
Multiplying the equation by an integrating factor of the form σ<sup/>k+1, we obtain
From the closedness condition
we determine \(k + 1 = -(4\alpha + 2)\). The exact equation we need to solve is
We need to determine a function f that satisfies the system
From the first equation, we have
and comparing with the second equation yields
Hence, a first integral of motion is given by
with E constant. Next we shall solve for σ. Using that \(u =\dot{\sigma }\), we have
where
The left-side integral can be transformed using the substitutions \(t =\sigma ^{2}\), \(v = \sqrt{C^{2 } - t}\) as follows
and hence (2.1.7) becomes
The μ-component is given by
There are a few particular values of α for which this equation can be solved explicitly.
Case α = −1
Equation (2.1.8) becomes
with solution
for K constant. Equation (2.1.3) easily yields
Case α = 1∕2
Since
we solve
and obtain
The μ-component is given by the integral
2 Jeffreys Prior
In the following we shall compute the prior on the statistical model
$$\displaystyle{\mathcal{S}_{\mu } =\{ p_{\mu,\sigma };\,\sigma \in (0,\infty )\},\qquad \mu \ \mathrm{fixed},}$$
which represents a vertical half-line in the upper-half plane. The determinant is
Then the volume is computed as
Therefore the prior on \(\mathcal{S}_{\mu }\) is given by
3 Lognormal Distribution
In the case of the lognormal distribution
$$\displaystyle{p_{\mu,\sigma }(x) = \frac{1} {\sqrt{2\pi }\,\sigma x}e^{-\frac{(\ln x-\mu )^{2}} {2\sigma ^{2}} },\qquad x > 0,}$$
the Fisher information matrix (Fisher–Riemann metric) is given by
$$\displaystyle{g = \left (\begin{array}{cc} \frac{1} {\sigma ^{2}} & 0\\ 0 & \frac{2} {\sigma ^{2}} \end{array} \right ).}$$
The computation details are left for the reader and are the subject of Problem 2.2. It is worth noting that this coincides with the Fisher metric of a normal distribution model. Hence, the associated geodesics are vertical half-lines or halves of ellipses.
4 Gamma Distribution
In this case the statistical model is defined by the following family of densities
$$\displaystyle{p_{\alpha,\beta }(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,x^{\alpha -1}e^{-x/\beta },}$$
with \((\alpha,\beta ) \in (0,\infty ) \times (0,\infty )\), x ∈ (0, ∞). In the study of this model we need some special functions. Let
$$\displaystyle{\psi (\alpha ) = \frac{\varGamma ^{\prime}(\alpha )} {\varGamma (\alpha )},\qquad \psi _{1}(\alpha ) =\psi ^{\prime}(\alpha )}$$
be the digamma and the trigamma functions, respectively.
be the digamma and the trigamma functions, respectively. Differentiating in the Dirichlet’s integral representation (see Erdélyi [42] vol. I, p. 17)
yields the following integral expression for the trigamma function
Another interesting formula is the expression of the trigamma function as a Hurwitz zeta function
$$\displaystyle{\psi _{1}(\alpha ) =\zeta (2,\alpha ) =\sum _{n\geq 0} \frac{1} {(\alpha +n)^{2}},}$$
which holds for \(\alpha \notin \{0,-1,-2,-3,\ldots \}\), a relation obviously satisfied in our case, since α > 0.
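The series \(\sum _{n\geq 0}(\alpha +n)^{-2}\) converges slowly, but a short tail correction makes it practical to evaluate; the sketch below (helper name ours) reproduces the classical values \(\psi _{1}(1) =\pi ^{2}/6\) and \(\psi _{1}(1/2) =\pi ^{2}/2\):

```python
import math

def trigamma(alpha, terms=10000):
    """Hurwitz zeta series sum_{n>=0} 1/(alpha+n)^2 with an
    Euler-Maclaurin correction for the truncated tail."""
    s = sum(1.0 / (alpha + n) ** 2 for n in range(terms))
    m = alpha + terms
    return s + 1.0 / m + 0.5 / m ** 2
```

The tail correction \(1/m + 1/(2m^{2})\) reduces the truncation error from \(O(1/N)\) to \(O(1/N^{3})\).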
Then the components of the Fisher–Riemann metric are obtained from Proposition 1.6.3, using the relations
and the derivatives of the log-likelihood function that are asked to be computed in Problem 2.3:
Proposition 2.4.1
The Fisher information matrix (Fisher–Riemann metric) for the gamma distribution is
$$\displaystyle{g = \left (\begin{array}{cc} \psi _{1}(\alpha ) & \frac{1} {\beta } \\ \frac{1} {\beta } & \frac{\alpha } {\beta ^{2}} \end{array} \right ).}$$
It is worth noting that here α is the parameter for the gamma distribution and it has nothing to do with α-connections.
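With the entries \(g_{\alpha \alpha } =\psi _{1}(\alpha )\), \(g_{\alpha \beta } = 1/\beta\), \(g_{\beta \beta } =\alpha /\beta ^{2}\), positive definiteness amounts to \(\det g = (\alpha \psi _{1}(\alpha ) - 1)/\beta ^{2} > 0\), i.e., to the inequality of Problem 2.3, part (c). A numerical sketch (helper names ours, with the trigamma function summed as a truncated Hurwitz zeta series):

```python
def trigamma(alpha, terms=10000):
    # psi_1(alpha) = sum_{n>=0} 1/(alpha+n)^2, with a tail correction
    s = sum(1.0 / (alpha + n) ** 2 for n in range(terms))
    m = alpha + terms
    return s + 1.0 / m + 0.5 / m ** 2

def gamma_fisher(alpha, beta):
    """Fisher matrix of the gamma model in the coordinates (alpha, beta)."""
    return ((trigamma(alpha), 1.0 / beta),
            (1.0 / beta, alpha / beta ** 2))
```

For any sampled \((\alpha,\beta )\) the determinant comes out positive, in agreement with \(\alpha \psi _{1}(\alpha ) > 1\).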
5 Beta Distribution
The Fisher information metric for the beta distribution
$$\displaystyle{p_{a,b}(x) = \frac{1} {B(a,b)}\,x^{a-1}(1 - x)^{b-1},\qquad a,b > 0,\ x \in [0,1],}$$
will be worked out in terms of trigamma functions. Since the beta function
$$\displaystyle{B(a,b) =\int _{0}^{1}x^{a-1}(1 - x)^{b-1}\,dx}$$
can be expressed in terms of gamma functions as
$$\displaystyle{B(a,b) = \frac{\varGamma (a)\varGamma (b)} {\varGamma (a + b)},}$$
its partial derivatives can be written in terms of digamma functions, using relation (2.11.17), see Problem 2.4, part (a).
The log-likelihood function and its partial derivatives are left for the reader as an exercise in Problem 2.4, parts (b) and (c). Since the second partial derivatives of ℓ(a, b) do not depend on x, they coincide with their own expected values. This yields the following components of the Fisher–Riemann metric:
Proposition 2.5.1
The Fisher information matrix (Fisher–Riemann metric) for the beta distribution is given by
$$\displaystyle{g = \left (\begin{array}{cc} \psi _{1}(a) -\psi _{1}(a + b) & -\psi _{1}(a + b)\\ -\psi _{1}(a + b) & \psi _{1}(b) -\psi _{1}(a + b) \end{array} \right ),}$$
where ψ<sub/>1 stands for the trigamma function.
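Using the Hurwitz zeta series for \(\psi _{1}\), the metric can be evaluated by summing the matrices \(g_{n}\) of Problem 2.4 term by term; a sketch (truncation parameter ours):

```python
def beta_fisher(a, b, terms=20000):
    """Sum the series g = sum_n g_n, with
    g_n = [[1/(a+n)^2 - 1/(a+b+n)^2, -1/(a+b+n)^2],
           [-1/(a+b+n)^2, 1/(b+n)^2 - 1/(a+b+n)^2]]."""
    g11 = g12 = g22 = 0.0
    for n in range(terms):
        ca = 1.0 / (a + n) ** 2
        cb = 1.0 / (b + n) ** 2
        cab = 1.0 / (a + b + n) ** 2
        g11 += ca - cab
        g12 -= cab
        g22 += cb - cab
    return ((g11, g12), (g12, g22))
```

The diagonal entries converge quickly (the differences telescope, e.g. \(\psi _{1}(2) -\psi _{1}(4) = 1/4 + 1/9\)), and the resulting matrix is positive definite.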
6 Bernoulli Distribution
Consider the sample space \(\mathcal{X} =\{ 0,1,\ldots,n\}\) and the parameter space \(\mathbb{E} = [0,1]\). The Bernoulli, or binomial, distribution is given by
$$\displaystyle{p(k;\xi ) = \binom{n}{k}\xi ^{k}(1-\xi )^{n-k},\qquad k \in \mathcal{X},}$$
where the parameter ξ denotes the success probability. Then \(\mathcal{S} =\{ p_{\xi };\xi \in [0,1]\}\) becomes a one-dimensional statistical model. The derivatives of the log-likelihood function \(\ell_{k}(\xi ) =\ln p(k;\xi )\) are proposed as an exercise in Problem 2.5. Then the Fisher information is given by the function
$$\displaystyle{g(\xi ) = E_{\xi }[(\partial _{\xi }\ell_{k})^{2}] = \frac{n} {\xi (1-\xi )},}$$
where we used that the mean of a Bernoulli distribution is \(n\xi\). Using that the variance is \(n\xi (1-\xi )\), it follows that
$$\displaystyle{\mathrm{Var}(k/n) = \frac{\xi (1-\xi )} {n} = \frac{1} {g(\xi )},}$$
which is a Cramér–Rao type identity corresponding to an efficient estimator.
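Because the sample space is finite, both sides of this identity can be evaluated exactly; a short sketch (helper name ours):

```python
import math

def fisher_bernoulli(n, xi):
    """g(xi) = E[(d/dxi log p)^2] summed exactly over k = 0..n."""
    total = 0.0
    for k in range(n + 1):
        pk = math.comb(n, k) * xi ** k * (1 - xi) ** (n - k)
        score = k / xi - (n - k) / (1 - xi)
        total += pk * score ** 2
    return total
```

One checks that \(1/g(\xi )\) equals the variance \(\xi (1-\xi )/n\) of the estimator \(k/n\), i.e., the Cramér–Rao bound is attained.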
7 Geometric Probability Distribution
Let \(\mathcal{X} =\{ 1,2,3,\ldots \}\), \(\mathbb{E} = [0,1]\) and consider \(p(k;\xi ) = (1-\xi )^{k-1}\xi\), \(k \in \mathcal{X}\), \(\xi \in \mathbb{E}\). The formulas for the partial derivatives of the log-likelihood function are left as an exercise for the reader in Problem 2.6. Then the Fisher information becomes
$$\displaystyle{g(\xi ) = E_{\xi }[-\partial _{\xi }^{2}\ell_{k}(\xi )] = \frac{1} {\xi ^{2}(1-\xi )},}$$
where we used the expression for the mean \(\sum _{k\geq 1}k\,p(k;\xi ) = \frac{1} {\xi }\).
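The expectation defining g can be approximated by truncating the series over k; a sketch (helper name ours):

```python
def fisher_geometric(xi, terms=2000):
    """g(xi) = E[-d^2/dxi^2 log p] for p(k;xi) = (1-xi)^(k-1) xi."""
    total = 0.0
    for k in range(1, terms + 1):
        pk = (1.0 - xi) ** (k - 1) * xi
        total += pk * ((k - 1) / (1.0 - xi) ** 2 + 1.0 / xi ** 2)
    return total
```

The truncated sum agrees with the closed form \(1/(\xi ^{2}(1-\xi ))\) to machine precision, since the geometric tail is negligible.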
8 Multinomial Geometry
In this section we investigate the geometry associated with the multinomial probability distribution. The computation performed here is inspired by Kass and Vos [49]. Consider m independent, identical trials with n possible outcomes. The probability that a single trial falls into class i is \(p_{i}\), \(i = 1,2,\ldots,n\), and remains the same from trial to trial. Since \(p_{1} +\ldots +p_{n} = 1\), the parameter space is given by the (n − 1)-dimensional simplex
It is advantageous to consider the new parameterization
$$\displaystyle{z_{i} = 2\sqrt{p_{i}},\qquad i = 1,\ldots,n.}$$
Then \(\sum _{i=1}^{n}z_{i}^{2} = 4\), and hence the point \(z = (z_{1},\ldots,z_{n})\) lies on a sphere of radius 2.
Therefore, the statistical manifold of multinomial probability distributions can be identified with \(\mathbb{S}_{2,+}^{n-1}\), the positive portion of the (n − 1)-dimensional sphere of radius 2. The Fisher information matrix with respect to a local coordinate system \((\xi ^{i})\) is
$$\displaystyle{g_{rs}(\xi ) =\sum _{i=1}^{n}\partial _{r}z_{i}\,\partial _{s}z_{i} =\langle \partial _{r}z,\partial _{s}z\rangle,}$$
where \(\partial _{s} = \partial _{\xi ^{s}}\). Therefore, the Fisher metric is the natural metric induced from the Euclidean metric of \(\mathbb{R}^{n}\) on the sphere \(\mathbb{S}_{2,+}^{n-1}\). We note that \(\partial _{r}z\) is a tangent vector to the sphere in the direction of \(\xi ^{r}\).
To find the information distance between two multinomial distributions p and q, we need to find the length of the shortest curve on the sphere \(\mathbb{S}_{2,+}^{n-1}\), joining p and q. The curve that achieves the minimum is an arc of great circle passing through p and q, and this curve is unique.
Let \(z_{p}\) and \(z_{q}\) denote the points on the sphere corresponding to the aforementioned distributions. The angle α made by the unit vectors \(z_{p}/2\) and \(z_{q}/2\) satisfies \(\cos \alpha =\langle z_{p}/2,z_{q}/2\rangle\). Since the distance on the sphere is the product between the radius and the central angle, we have
$$\displaystyle{d(p,q) = 2\alpha = 2\arccos \Big (\sum _{i=1}^{n}\sqrt{p_{i }\,q_{i}}\Big).}$$
It is worth noting that the Euclidean distance between p and q can be written as
$$\displaystyle{\|z_{p} - z_{q}\| = 2\Big(\sum _{i=1}^{n}\big(\sqrt{p_{i}} -\sqrt{q_{i}}\big)^{2}\Big)^{1/2},}$$
which is called the Hellinger distance between p and q. We shall discuss this distance in more detail later.
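Both distances are easy to compute; the sketch below (helper names ours) implements the arc length \(2\arccos \sum _{i}\sqrt{p_{i }q_{i}}\) and the chordal distance \(\|z_{p} - z_{q}\|\), and illustrates that the chord is never longer than the arc:

```python
import math

def fisher_distance(p, q):
    """Arc length on the radius-2 sphere: 2*arccos(sum_i sqrt(p_i q_i))."""
    c = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    return 2.0 * math.acos(min(1.0, c))

def hellinger(p, q):
    """Chordal distance ||z_p - z_q|| between the sphere representatives."""
    return 2.0 * math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2
                               for pi, qi in zip(p, q)))
```

For the extreme pair \(p = (1,0)\), \(q = (0,1)\) the arc length is π (a quarter of the great circle of radius 2) while the chord is \(2\sqrt{2}\).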
The foregoing computation of the Fisher metric exploited geometric properties. In the following we shall provide a direct computation. We write the statistical model of multinomial distributions as \(\mathcal{S} =\{ p(k;\xi )\}\), with
where
and \(\xi = (\xi ^{1},\ldots,\xi ^{m-1}) \in \mathbb{E} = [0,1]^{m-1}\), with \(\xi ^{i} = p_{i}\), \(i = 1,\ldots,m - 1\), and \(p_{m} = 1 - p_{1} -\ldots -p_{m-1}\). Then a straightforward computation shows
Using the formula for the marginal probability
we have
9 Poisson Geometry
Consider m independent Poisson distributions with parameters \(\lambda _{i}\), \(i = 1,\ldots,m\). The joint probability function is given by the product
$$\displaystyle{p(x;\lambda ) =\prod _{i=1}^{m}e^{-\lambda _{i}}\,\frac{\lambda _{i}^{x_{i}}} {x_{i}!},}$$
with \(\lambda = (\lambda _{1},\ldots,\lambda _{m}) \in \mathbb{E} = (0,\infty )^{m}\), and \(x = (x_{1},\ldots,x_{m}) \in \mathcal{X} = (\mathbb{N} \cup \{ 0\})^{m}\). The log-likelihood function and its derivatives with respect to \(\partial _{j} = \partial _{\lambda _{j}}\) are
$$\displaystyle{\ell(x;\lambda ) =\sum _{i=1}^{m}\big(x_{i}\ln \lambda _{i} -\lambda _{i} -\ln (x_{i}!)\big),\qquad \partial _{j}\ell = \frac{x_{j}} {\lambda _{j}} - 1,\qquad \partial _{j}\partial _{k}\ell = -\delta _{jk}\, \frac{x_{j}} {\lambda _{j}^{2}}.}$$
Then the Fisher information is obtained as
$$\displaystyle{g_{jk}(\lambda ) = E_{\lambda }[-\partial _{j}\partial _{k}\ell] =\delta _{jk}\,\frac{E_{\lambda }[x_{j}]} {\lambda _{j}^{2}} = \frac{\delta _{jk}} {\lambda _{j}}.}$$
Therefore the Fisher matrix has a diagonal form with positive entries.
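The diagonal entries are easy to verify one coordinate at a time; the sketch below (helper name ours) evaluates \(E[(x_{j}/\lambda _{j} - 1)^{2}]\) by summing the Poisson pmf recursively:

```python
import math

def poisson_fisher_entry(lam, terms=200):
    """E[(x/lam - 1)^2] for X ~ Poisson(lam); should equal 1/lam."""
    total = 0.0
    p = math.exp(-lam)            # P(X = 0)
    for x in range(terms):
        total += p * (x / lam - 1.0) ** 2
        p *= lam / (x + 1)        # P(X = x+1) from P(X = x)
    return total
```

The recursive update of the pmf avoids large factorials and keeps the computation stable for any λ.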
10 The Space \(\mathcal{P}(\mathcal{X})\)
Let \(\mathcal{X} =\{ x_{1},\ldots,x_{n}\}\) and consider the statistical model \(\mathcal{P}(\mathcal{X})\) of all discrete probability densities on \(\mathcal{X}\). The space \(\mathcal{P}(\mathcal{X})\) can be imbedded into the function space \(\mathbb{R}^{\mathcal{X}} =\{ f;f: \mathcal{X} \rightarrow \mathbb{R}\}\) in several ways, as we shall describe shortly. This study can be found in Nagaoka and Amari [61].
For any \(\alpha \in \mathbb{R}\) consider the function \(\varphi _{\alpha }: (0,\infty ) \rightarrow \mathbb{R}\)
$$\displaystyle{\varphi _{\alpha }(u) = \left \{\begin{array}{ll} \frac{2} {1-\alpha }\,u^{\frac{1-\alpha } {2} },&\alpha \not =1\\ \ln u, &\alpha = 1. \end{array} \right.}$$
The imbedding
$$\displaystyle{p(\,\cdot\,;\xi )\mapsto \varphi _{\alpha }\big(p(\,\cdot\,;\xi )\big) \in \mathbb{R}^{\mathcal{X}}}$$
is called the α-representation of \(\mathcal{P}(\mathcal{X})\). A distinguished role will be played by the α-likelihood functions
$$\displaystyle{\ell^{(\alpha )}(x;\xi ) =\varphi _{\alpha }\big(p(x;\xi )\big).}$$
The coordinate tangent vectors in this representation are given by
The α-representation can be used to define the Fisher metric and the ∇(α)-connection on \(\mathcal{P}(\mathcal{X})\).
Proposition 2.10.1
The Fisher metric can be written in terms of the α-likelihood functions as in the following
-
(i)
\(g_{ij}(\xi ) =\sum _{ k=1}^{n}\partial _{ i}\ell^{(\alpha )}(x_{ k};\xi )\partial _{j}\ell^{(-\alpha )}(x_{ k};\xi );\)
-
(ii)
\(g_{ij}(\xi ) = - \frac{2} {1+\alpha }\sum _{k=1}^{n}p(x_{ k};\xi )^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x_{k};\xi ).\)
Proof:
Differentiating yields
where ℓ(x; ξ) = lnp(x; ξ).
-
(i)
The previous computations and formula (1.6.16) provide
$$\displaystyle\begin{array}{rcl} \sum _{k=1}^{n}\partial _{ i}\ell^{(\alpha )}(x_{ k};\xi )\partial _{j}\ell^{(-\alpha )}(x_{ k};\xi )& =& \sum _{k=1}^{n}p^{\frac{1-\alpha } {2} }\partial _{i}\ell(x_{k})p^{\frac{1+\alpha } {2} }\partial _{j}\ell(x_{k}) {}\\ & =& \sum _{k=1}^{n}p(x_{ k};\xi )\partial _{i}\ell(x_{k})\partial _{j}\ell(x_{k}) {}\\ & =& E_{\xi }[\partial _{i}\ell\,\partial _{j}\ell] = g_{ij}(\xi ). {}\\ \end{array}$$ -
(ii)
Relation (2.10.15) implies
$$\displaystyle{p^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x;\xi ) = p(x;\xi )\partial _{i}\partial _{j}\ell(x;\xi ) + \frac{1-\alpha } {2} p(x;\xi )\partial _{i}\ell\partial _{j}\ell(x;\xi ).}$$Summing and using (1.6.16) and (1.6.18), we have
$$\displaystyle\begin{array}{rcl} \sum _{k=1}^{n}p(x_{ k};\xi )^{\frac{1+\alpha } {2} }\partial _{i}\partial _{j}\ell^{(\alpha )}(x_{k};\xi )& =& E_{\xi }[\partial _{i}\partial _{j}\ell] + \frac{1-\alpha } {2} E_{\xi }[\partial _{i}\ell\,\partial _{j}\ell] {}\\ & =& -g_{ij}(\xi ) + \frac{1-\alpha } {2} g_{ij}(\xi ) {}\\ & =& -\frac{1+\alpha } {2} g_{ij}(\xi ). {}\\ \end{array}$$
■
The symmetry of relation (i) implies that the Fisher metrics induced by the α- and −α-representations coincide.
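Relation (i) can be illustrated on the simplest model, \(\mathcal{X} =\{ x_{1},x_{2}\}\) with \(p(x_{1};\xi ) =\xi\), \(p(x_{2};\xi ) = 1-\xi\), whose Fisher metric is \(1/(\xi (1-\xi ))\). The sketch below (helper name ours) uses \(\partial \ell^{(\alpha )} = p^{\frac{1-\alpha } {2} }\partial \ell = p^{-\frac{1+\alpha } {2} }\partial p\):

```python
def alpha_rep_metric(xi, alpha):
    """sum_k d l^(alpha)(x_k) * d l^(-alpha)(x_k) for the two-point model
    p(x1) = xi, p(x2) = 1 - xi, using d l^(a) = p^(-(1+a)/2) * dp."""
    total = 0.0
    for p, dp in ((xi, 1.0), (1.0 - xi, -1.0)):
        total += (p ** (-(1.0 + alpha) / 2.0) * dp) * (p ** (-(1.0 - alpha) / 2.0) * dp)
    return total
```

The result equals \(1/(\xi (1-\xi ))\) for every α, and is unchanged under \(\alpha \mapsto -\alpha\), as the symmetry predicts.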
Proposition 2.10.2
The components of the α-connection are given in terms of the α-representation as
Proof:
Combining relations (2.10.14) and (2.10.15)
by (1.11.34). ■
The particular values \(\alpha = -1,0,1\) provide distinguished cases of representations of \(\mathcal{P}(\mathcal{X})\).
10.1 − 1-Representation
If \(\alpha = -1\), then \(\varphi _{-1}(u) = u\), and \(\ell^{(-1)}\big(p(x;\xi )\big) = p(x;\xi )\) is the identical imbedding of \(\mathcal{P}(\mathcal{X})\) into \(\mathbb{R}^{\mathcal{X}}\). Thus \(\mathcal{P}(\mathcal{X})\) is an open set of the affine space \(\mathcal{A}_{1} =\{ f: \mathcal{X} \rightarrow \mathbb{R};\sum _{k=1}^{n}f(x_{k}) = 1\}\). Therefore, the tangent space at any point p ξ can be identified with the following affine variety
The coordinate vector fields in this representation are given by
We can easily check that
so \((\partial _{i}^{-1})_{{\xi }} \in T_{\xi }(\mathcal{P})\), for any ξ.
10.2 0-Representation
This is also called the square root representation. In this case \(\varphi _{0}(u) = 2\sqrt{u}\), and the imbedding \(\varphi _{0}: \mathcal{P}(\mathcal{X}) \rightarrow \mathbb{R}^{\mathcal{X}}\) is
Since \(\sum _{k=1}^{n}\theta (x_{k})^{2} = 4\), the image of the imbedding \(\varphi _{0}\) is an open subset of the sphere of radius 2,
The induced metric from the natural Euclidean metric of \(\mathbb{R}^{\mathcal{X}}\) on this sphere is
i.e., the Fisher metric on the statistical model \(\mathcal{P}(\mathcal{X})\).
The coordinate vector fields are given by
The next computation deals with the tangent space generated by \((\partial _{i}^{0})_{{\xi }}\). We have
so that the vector \((\partial _{i}^{0})_{{\xi }}\) is perpendicular on the vector θ, and hence belongs to the tangent plane to the sphere at θ. This can be identified with the tangent space \(T_{\xi }^{(0)}\mathcal{P}(\mathcal{X})\) in the 0-representation.
10.3 1-Representation
This is also called the exponential (or the logarithmic) representation, because each distribution \(p(x;\xi ) \in \mathcal{P}(\mathcal{X})\) is identified with \(\ln p(x;\xi ) \in \mathbb{R}^{\mathcal{X}}\). In this case the 1-likelihood function becomes \(\ell^{(1)}(x;\xi ) =\ell (x;\xi ) =\ln p(x;\xi )\), i.e., the usual likelihood function.
The coordinate vector fields are given by
In the virtue of the computation
it follows that the tangent space in this representation is given by
It is worth noting that tangent spaces are invariant objects that do not depend on any representation. However, when considering different systems of parameters, tangent vectors can be described by particular relations, as in the cases of the ±1- and 0-representations.
10.4 Fisher Metric
Let \(\xi ^{i} = p(x_{i};\xi )\), \(i = 1,\ldots,n - 1\), be the coordinates on \(\mathcal{P}(\mathcal{X})\). Since \(p(x_{n};\xi ) = 1 -\sum _{j=1}^{n-1}\xi ^{j}\), the partial derivatives with respect to \(\xi ^{j}\) are
$$\displaystyle{\partial _{j}\ell(x_{i};\xi ) = \frac{\delta _{ij}} {\xi ^{i}},\quad i < n,\qquad \partial _{j}\ell(x_{n};\xi ) = - \frac{1} {p(x_{n};\xi )}.}$$
Then the Fisher metric is given by
$$\displaystyle{g_{ij}(\xi ) = E_{\xi }[\partial _{i}\ell\,\partial _{j}\ell] = \frac{\delta _{ij}} {\xi ^{i}} + \frac{1} {p(x_{n};\xi )}.}$$
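This formula can be confirmed by summing the expectation over the finite sample space; a sketch for arbitrary n (helper name ours):

```python
def fisher_px(xi):
    """Fisher metric of P(X) in the coordinates xi^1..xi^(n-1), with
    p_n = 1 - sum(xi), computed as E[d_i log p * d_j log p]."""
    n1 = len(xi)
    probs = list(xi) + [1.0 - sum(xi)]
    g = [[0.0] * n1 for _ in range(n1)]
    for i in range(n1):
        for j in range(n1):
            s = 0.0
            for k, pk in enumerate(probs):
                # d_j p(x_k) = delta_{jk} for k < n and -1 for k = n
                di = (1.0 if k == i else 0.0) if k < n1 else -1.0
                dj = (1.0 if k == j else 0.0) if k < n1 else -1.0
                s += di * dj / pk      # p_k * (di/p_k) * (dj/p_k)
            g[i][j] = s
    return g
```

For instance, with \(\xi = (0.2, 0.3)\) (so \(p_{3} = 0.5\)) the diagonal entries are \(1/\xi ^{i} + 2\) and the off-diagonal entries are 2.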
11 Problems
-
2.1.
Consider the statistical model given by the densities of a normal family
$$\displaystyle{p(x,\xi ) = \frac{1} {\sigma \sqrt{2\pi }}e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\,\,\,x \in \mathcal{X} = \mathbb{R},}$$with parameters \((\xi ^{1},\xi ^{2}) = (\mu,\sigma ) \in \mathbb{R} \times (0,\infty )\).
-
(a)
Show that the log-likelihood function and its derivatives are given by
$$\displaystyle\begin{array}{rcl} \ell_{x}(\xi )=\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {2}\ln (2\pi )-\ln \sigma -\frac{(x-\mu )^{2}} {2\sigma ^{2}} {}\\ \partial _{\sigma }\ell_{x}(\xi ) = \partial _{\sigma }\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {\sigma } + \frac{1} {\sigma ^{3}} (x-\mu )^{2} {}\\ \partial _{\sigma }\partial _{\sigma }\ell_{x}(\xi ) = \partial _{\sigma }\partial _{\sigma }\ln p(x,\xi )& \!\!=\!\!& \frac{1} {\sigma ^{2}} -\frac{3} {\sigma ^{4}} (x-\mu )^{2} {}\\ \partial _{\mu }\ell_{x}(\xi ) = \partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& \frac{1} {\sigma ^{2}} (x-\mu ) {}\\ \partial _{\mu }\partial _{\mu }\ell_{x}(\xi ) = \partial _{\mu }\partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& -\frac{1} {\sigma ^{2}} {}\\ \partial _{\sigma }\partial _{\mu }\ell_{x}(\xi ) = \partial _{\sigma }\partial _{\mu }\ln p(x,\xi )& \!\!=\!\!& -\frac{2} {\sigma ^{3}} (x-\mu ). {}\\ \end{array}$$ -
(b)
Show that the Fisher–Riemann metric components are given by
$$\displaystyle{g_{11} = \frac{1} {\sigma ^{2}},\qquad g_{12} = g_{21} = 0,\qquad g_{22} = \frac{2} {\sigma ^{2}}.}$$
-
2.2.
Consider the statistical model defined by the lognormal distribution
$$\displaystyle{p_{\mu,\sigma }(x) = \frac{1} {\sqrt{2\pi }\,\sigma x}e^{-\frac{(\ln x-\mu )^{2}} {2\sigma ^{2}} },\qquad x > 0.}$$-
(a)
Show that the log-likelihood function and its derivatives are given by
$$\displaystyle\begin{array}{rcl} \ell(\mu,\sigma )& =& -\ln \sqrt{2\pi } -\ln \sigma -\ln x - \frac{1} {2\sigma ^{2}}(\ln x-\mu )^{2} {}\\ \partial _{\mu }^{2}\ell(\mu,\sigma )& =& -\frac{1} {\sigma ^{2}} {}\\ \partial _{\sigma }^{2}\ell(\mu,\sigma )& =& \frac{1} {\sigma ^{2}} -\frac{3} {\sigma ^{4}} (\ln x-\mu )^{2} {}\\ \partial _{\mu }\partial _{\sigma }\ell(\mu,\sigma )& =& -\frac{2} {\sigma ^{3}} (\ln x-\mu ). {}\\ \end{array}$$ -
(b)
Using the substitution \(y =\ln x-\mu\), show that the components of the Fisher–Riemann metric are given by
$$\displaystyle{g_{\sigma \sigma } = \frac{2} {\sigma ^{2}},\qquad g_{\mu \mu } = \frac{1} {\sigma ^{2}},\qquad g_{\mu \sigma } = g_{\sigma \mu } = 0.}$$
-
2.3.
Let
$$\displaystyle{p_{{\xi }}(x) = p_{{\alpha,\beta }}(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,\,x^{\alpha -1}e^{-x/\beta },}$$with \((\alpha,\beta ) \in (0,\infty ) \times (0,\infty )\), x ∈ (0, ∞) be the statistical model defined by the gamma distribution.
-
(a)
Show that the log-likelihood function is
$$\displaystyle{\ell_{x}(\xi ) =\ln p_{{\xi }} = -\alpha \ln \beta -\ln \varGamma (\alpha ) + (\alpha -1)\ln x -\frac{x} {\beta }.}$$ -
(b)
Verify the relations
$$\displaystyle\begin{array}{rcl} \partial _{\beta }\ell_{x}(\xi )& =& -\frac{\alpha } {\beta } + \frac{x} {\beta ^{2}} {}\\ \partial _{\alpha \beta }\ell_{x}(\xi )& =& -\frac{1} {\beta } {}\\ \partial _{\beta }^{2}\ell_{ x}(\xi )& =& \frac{\alpha } {\beta ^{2}} -\frac{2x} {\beta ^{3}} {}\\ \partial _{\alpha }\ell_{x}(\xi )& =& -\ln \beta -\psi (\alpha ) +\ln x {}\\ \partial _{\alpha }^{2}\ell_{ x}(\xi )& =& -\psi _{1}(\alpha ), {}\\ \end{array}$$where
$$\displaystyle{ \psi (\alpha ) = \frac{\varGamma ^{\prime}(\alpha )} {\varGamma (\alpha )},\qquad \psi _{1}(\alpha ) =\psi ^{\prime}(\alpha ) }$$(2.11.17)are the digamma and the trigamma functions, respectively.
-
(c)
Prove that for α > 0, we have
$$\displaystyle{\sum _{n\geq 0}\,\, \frac{\alpha } {(\alpha +n)^{2}} > 1.}$$
-
2.4.
Consider the beta distribution
$$\displaystyle{p_{a,b} = \frac{1} {B(a,b)}\,\,x^{a-1}(1 - x)^{b-1},\qquad a,b > 0,x \in [0,1].}$$(a) Using that the beta function
$$\displaystyle{ B(a,b) =\int _{ 0}^{1}x^{a-1}(1 - x)^{b-1}\,dx }$$can be expressed in terms of gamma function as
$$\displaystyle{ B(a,b) = \frac{\varGamma (a)\varGamma (b)} {\varGamma (a + b)}, }$$show that its partial derivatives can be written in terms of digamma functions, as
$$\displaystyle\begin{array}{rcl} \partial _{a}\ln B(a,b) =\psi (a) -\psi (a + b)& & {}\end{array}$$(2.11.18)$$\displaystyle\begin{array}{rcl} \partial _{b}\ln B(a,b) =\psi (b) -\psi (a + b).& & {}\end{array}$$(2.11.19)(b) Show that the log-likelihood function is given by
$$\displaystyle{\ell(a,b) =\ln p_{a,b} = -\ln B(a,b) + (a - 1)\ln x + (b - 1)\ln (1 - x).}$$(c) Take partial derivatives and use formulas (2.11.18) and (2.11.19) to verify relations
$$\displaystyle\begin{array}{rcl} \partial _{a}\ell(a,b)& =& -\partial _{a}\ln B(a,b) +\ln x =\psi (a + b) -\psi (a) +\ln x {}\\ \partial _{b}\ell(a,b)& =& \psi (a + b) -\psi (b) +\ln (1 - x) {}\\ \partial _{a}^{2}\ell(a,b)& =& \psi ^{\prime}(a + b) -\psi ^{\prime}(a) =\psi _{ 1}(a + b) -\psi _{1}(a) {}\\ \partial _{b}^{2}\ell(a,b)& =& \psi ^{\prime}(a + b) -\psi ^{\prime}(b) =\psi _{ 1}(a + b) -\psi _{1}(b) {}\\ \partial _{a}\partial _{b}\ell(a,b)& =& \psi ^{\prime}(a + b) =\psi _{1}(a + b). {}\\ \end{array}$$(d) Using the expression of the trigamma function as a Hurwitz zeta function, show that the Fisher information matrix can be written as a series \(g =\sum _{n\geq 0}g_{n}\), where
$$\displaystyle{g_{n} = \left (\begin{array}{cc} \frac{1} {(a+n)^{2}} - \frac{1} {(a+b+n)^{2}} & - \frac{1} {(a+b+n)^{2}}\\ \\ - \frac{1} {(a+b+n)^{2}} & \frac{1} {(b+n)^{2}} - \frac{1} {(a+b+n)^{2}}\\ \end{array} \right ).}$$ -
2.5.
Let \(\mathcal{S} =\{ p_{\xi };\xi \in [0,1]\}\) be a one-dimensional statistical model, where
$$\displaystyle{p(k;\xi ) =\binom{n}{k}\xi ^{k}(1-\xi )^{n-k}}$$is the Bernoulli distribution, with \(k \in \{ 0,1,\ldots,n\}\) and ξ ∈ [0, 1]. Show that the derivatives of the log-likelihood function \(\ell_{k}(\xi ) =\ln p(k;\xi )\) are
$$\displaystyle\begin{array}{rcl} \partial _{\xi }\ell_{k}(\xi )& =& \frac{k} {\xi } - (n - k) \frac{1} {1-\xi } {}\\ \partial _{\xi }^{2}\ell_{ k}(\xi )& =& -\frac{k} {\xi ^{2}} - (n - k) \frac{1} {(1-\xi )^{2}}\cdot {}\\ \end{array}$$ -
2.6.
Consider the geometric probability distribution \(p(k;\xi ) = (1-\xi )^{k-1}\xi\), \(k \in \{ 1,2,3,\ldots \}\), ξ ∈ [0, 1]. Show that
$$\displaystyle\begin{array}{rcl} \partial _{\xi }\ell_{k}(\xi )& =& \frac{k - 1} {\xi -1} + \frac{1} {\xi } {}\\ \partial _{\xi }^{2}\ell_{ k}(\xi )& =& -\frac{(k - 1)} {(\xi -1)^{2}} -\frac{1} {\xi ^{2}} \cdot {}\\ \end{array}$$ -
2.7.
Let f be a density function on \(\mathbb{R}\) and define the statistical model
$$\displaystyle{\mathcal{S}_{f} =\Big\{ p(x;\mu,\sigma ) = \frac{1} {\sigma } f\Big(\frac{x-\mu } {\sigma } \Big);\mu \in \mathbb{R},\sigma > 0\Big\}.}$$-
(a)
Show that \(\int _{\mathbb{R}}p(x;\mu,\sigma )\,dx = 1\).
-
(b)
Verify the following formulas involving the log-likelihood function ℓ = lnp( ⋅ ; μ, σ):
$$\displaystyle{\partial _{\mu }\ell = -\frac{1} {\sigma } \frac{f^{\prime}} {f},\qquad \partial _{\sigma }\ell = -\frac{1} {\sigma } -\frac{(x-\mu )} {\sigma ^{2}} \frac{f^{\prime}} {f}}$$$$\displaystyle{\partial _{\mu }\partial _{\sigma }\ell = \frac{1} {f^{2}}\Big[\Big(\frac{f^{\prime}} {\sigma ^{2}} + \frac{1} {\sigma } \frac{x-\mu } {\sigma ^{2}} f^{\prime \prime}\Big)f -\frac{1} {\sigma } \frac{x-\mu } {\sigma ^{2}} (f^{\prime})^{2}\Big].}$$ -
(b)
Show that for any continuous function h we have
$$\displaystyle{E_{(\mu,\sigma )}\Big[h\Big(\frac{x-\mu } {\sigma } \Big)\Big] = E_{(0,1)}[h(x)].}$$ -
(c)
Assume that f is an even function (i.e., \(f(-x) = f(x)\)). Show that the Fisher–Riemann metric, g, has a diagonal form (i.e., g 12 = 0).
-
(d)
Prove that the Riemannian space \((\mathcal{S}_{f},g)\) has a negative, constant curvature.
-
(e)
Consider \(f(x) = \frac{1} {\sqrt{2\pi }}e^{-x^{2}/2 }\). Use the aforementioned points to deduce the formula for \(g_{ij}\) and to show that the curvature \(K = -\frac{1} {2}\).
-
2.8.
Study the motion of the curve
$$\displaystyle{(\mu,\sigma ) \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },\,\,\mu ^{2} +\sigma ^{2} = 1}$$with \((\mu,\sigma,p) \in \mathbb{R} \times (0,\infty ) \times (0,\infty ),\,x \in \mathbb{R}\), fixed, in the direction of the binormal vector field.
-
2.9.
The graph of the normal density of probability
$$\displaystyle{x \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} }}$$is called the Gauss bell. Find the equation of the surface obtained by revolving the Gauss bell about:
-
(a)
Ox axis;
-
(b)
Op axis.
-
2.10.
Study the motion of the trajectories of the vector field (y, z, x) in the direction of the vector field
$$\displaystyle{\left (1,1, \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} }\right ),}$$where μ and σ are fixed.
-
2.11.
The normal surface
$$\displaystyle{(\mu,\sigma ) \rightarrow p_{\mu,\sigma }(x) = \frac{1} {\sigma \sqrt{2\pi }}\,e^{-\frac{(x-\mu )^{2}} {2\sigma ^{2}} },}$$$$\displaystyle{(\mu,\sigma ) \in \mathbb{R} \times (0,\infty );\,x \in \mathbb{R}}$$is deformed into \(p_{\mu,\sigma }(tx),\,t \in \mathbb{R}\). What happens to the Gaussian curvature?
-
2.12.
The gamma surface
$$\displaystyle{(\alpha,\beta ) \rightarrow p_{\alpha,\beta }(x) = \frac{1} {\beta ^{\alpha }\varGamma (\alpha )}\,x^{\alpha -1}\,e^{-\frac{x} {\beta } }}$$$$\displaystyle{(\alpha,\beta ) \in (0,\infty ) \times (0,\infty );\,x \in (0,\infty )}$$is deformed into \(p_{t\alpha,\beta }(x),\,t \in (0,\infty )\). What happens to the mean curvature?
Bibliography
A. Erdélyi et al., Higher Transcendental Functions, vol. I, IV. Bateman Manuscript project (McGraw-Hill, New York, 1955)
R.E. Kass, P.W. Vos, Geometrical Foundations of Asymptotic Inference, Wiley Series in Probability and Statistics (Wiley, New York, 1997)
H. Nagaoka, S. Amari, Differential geometry of smooth families of probability distributions. Technical Report (METR) 82–7, Dept. of Math. Eng. and Instr. Phys., Univ. of Tokyo, 1982
© 2014 Springer International Publishing Switzerland
Calin, O., Udrişte, C. (2014). Explicit Examples. In: Geometric Modeling in Probability and Statistics. Springer, Cham. https://doi.org/10.1007/978-3-319-07779-6_2