The uniqueness of the Fisher metric as information metric

Lê, Hông Vân

doi:10.1007/s10463-016-0562-0

Published: 14 May 2016

Volume 69, pages 879–896, (2017)
Annals of the Institute of Statistical Mathematics

Abstract

We define a mixed topology on the fiber space $\cup _\mu \oplus ^n L^n(\mu )$ over the space ${\mathcal M}({\Omega })$ of all finite non-negative measures $\mu $ on a separable metric space ${\Omega }$ provided with the Borel $\sigma $-algebra. We define a notion of strong continuity for covariant n-tensor fields on ${\mathcal M}({\Omega })$. Under the assumption that an information metric is strongly continuous, we prove the uniqueness of the Fisher metric as an information metric on statistical models associated with ${\Omega }$. Our proof realizes a suggestion due to Amari and Nagaoka to derive the uniqueness of the Fisher metric from the special case proved by Chentsov by means of a special kind of limiting procedure. The obtained result extends the monotonicity characterization of the Fisher metric on statistical models associated with finite sample spaces and complements the uniqueness theorem of Ay–Jost–Lê–Schwachhöfer, which characterizes the Fisher metric by its invariance under sufficient statistics.

1 Introduction

Recent successful applications of information geometry (see e.g. Amari (1987), Amari and Nagaoka (2000), Ay et al. (2011), Shahshahani (1979)), in which the Fisher metric plays a fundamental role, motivate us to address the following important question: is there another metric on statistical models with natural properties that could also be called an information metric?

Intuitively, an information metric should reflect the (non-negative) amount of information contained in a statistical model; moreover,

it should measure the “information loss” associated with data processing, and this information loss should be a non-negative quantity (Chentsov 1978, Axiom A);
it must be invariant under sufficient statistics, that is, under mappings between sample spaces that preserve all information about the parameter x.

In statistical decision theory, data processing is described by a statistical decision rule, which can be deterministic or randomized. A deterministic decision rule is a measurable map, which is also called a statistic. A randomized decision rule is a Markov transition distribution (Chentsov 1982). Recently, Ay–Jost–Lê–Schwachhöfer showed that a transformation between statistical models associated with a Markov transition distribution is the composition of the inverse of a transformation associated with a sufficient statistic and a transformation associated with a statistic (Ay et al. 2015, Theorem 4.10). Hence, assuming invariance under sufficient statistics, the “information loss” condition reduces to the case where the data processing is given by a statistic.

Using the concept of a continuous local statistical covariant tensor field on statistical models (Ay et al. 2015, Definition 2.8; see also Definition 10 below) together with the above discussion, we propose the following.

Definition 1

Given a class $\{ {\Omega }\}$ of measure spaces, an information metric on statistical models (Definition 6), or more generally on parametrized measure models $(M, {\Omega }, \mu , p)$ with ${\Omega }\in \{ {\Omega }\}$, is a continuous local statistical non-negative definite quadratic form $F_{(M, {\Omega }, \mu , p)}$ (Definitions 8, 9, 10) that satisfies the following two conditions:

1.
the “information loss” $F_{(M, {\Omega }, \mu , p)} - F_{(M, {\Omega }_1, \kappa _*(\mu ), \kappa _*(p))}$ is a non-negative definite quadratic form for any statistic $\kappa : {\Omega }\rightarrow {\Omega }_1$;
2.
the “information loss” $F_{(M, {\Omega }, \mu , p)} -F_{(M, {\Omega }_1, \kappa _*(\mu ), \kappa _*(p))}$ is the zero quadratic form if $\kappa $ is sufficient with respect to the parameter $x \in M$.

Each of the conditions (1) and (2) in Definition 1 is natural and has its own appeal. Condition (2) was considered in Ay et al. (2015) as a criterion for a natural metric on parametrized measure models. Condition (1) is simpler to formulate than condition (2), since it does not involve the notion of a sufficient statistic, which depends on the statistical model under consideration and implicitly on the notion of information. (For a modern definition of a sufficient statistic we refer the reader to Ay et al. (in preparation), where Ay–Jost–Lê–Schwachhöfer propose a geometric definition of a sufficient statistic associated with a (signed) parametrized measure model in terms of the Banach manifolds under consideration; this definition agrees with the classical concept of a sufficient statistic based on the Fisher–Neyman characterization.)

In 1972 Chentsov proved that on statistical models $(M, {\Omega }, \mu , p)$ associated with finite sample spaces ${\Omega }$ the Fisher metric $g^F$ (Example 11) is the unique metric, up to a multiplicative constant, that satisfies (2) (Chentsov 1982). In Ay et al. (2015, Corollary 4.11), for finite sample spaces ${\Omega }$, we derived the uniqueness (up to a multiplicative constant) of a metric on statistical models associated with ${\Omega }$ that satisfies condition (1) from the uniqueness of a metric on such models that satisfies condition (2); see Proposition 25 and the Appendix at the end of this note for a discussion of the Chentsov theorem. The converse statement, that every metric satisfying condition (2) also satisfies condition (1), follows from the monotonicity theorem for the Fisher metric on statistical models associated with finite sample spaces.

In 2012 Ay–Jost–Lê–Schwachhöfer proved that the Fisher metric is the unique metric on statistical models, up to a multiplicative constant, that satisfies (2) (Ay et al. 2015, Remark 3.23). [On parametrized measure models there are many information metrics that satisfy condition (2) (Ay et al. 2015, Theorem 2.10). For parametrized measure models associated with finite sample spaces this fact was observed earlier by Campbell (1986).] Further, Theorem 3.11 in Ay et al. (2015) states that the Fisher metric satisfies (1) if ${\Omega }, {\Omega }_1$ are smooth manifolds and $\mu $ is dominated by a Lebesgue measure.

In our paper, we extend the aforementioned results as follows. Our first observation is the following:

Theorem 2

(The monotonicity of the Fisher metric) Let ${\Omega }_1,$ ${\Omega }_2$ be topological spaces provided with the Borel $\sigma $-algebra and let $\kappa : {\Omega }_1 \rightarrow {\Omega }_2$ be a statistic. Assume that $(M, {\Omega }_1, \mu _1, p_1)$ and $(M, {\Omega }_2, \kappa _*(\mu _1), \kappa _*(p_1))$ are 2-integrable parametrized measure models. Then for all $x \in M$ and $V\in T_xM$ we have $g ^F_{(M, {\Omega }_1, \mu _1, p_1)}(V, V) \ge g^F_{(M, {\Omega }_2, \kappa _*(\mu _1), \kappa _*(p_1))}(V,V).$

Theorem 2 may already be known to experts in the field, but we include it here together with its short proof, since we have not found a precise statement with a proof in the available literature and we wish to discuss its consequences. Combining the Ay–Jost–Lê–Schwachhöfer theorem (Ay et al. 2015, Remark 3.23) with Theorem 2, we immediately obtain the following

Corollary 3

Let $\{{\Omega }\}$ be the class of topological spaces provided with Borel $\sigma $-algebra. Any continuous local statistical non-negative definite quadratic form F on statistical models associated with $\{ {\Omega }\}$ that satisfies the condition (2) in Definition 1 also satisfies the condition (1) in Definition 1. In other words, the condition (2) is stronger than the condition (1) for those F.

To prove the uniqueness result for an information metric that satisfies the weaker monotonicity condition (1) in Definition 1, we impose a topological condition on such an information metric. This condition is formulated in terms of strong continuity, a notion we introduce in Definition 18.

For a measurable space $({\Omega }, \Sigma )$ let us denote by ${\mathcal M}({\Omega })$ the subset of all finite non-negative measures on ${\Omega }$.

Theorem 4

(The uniqueness of the Fisher metric) Let $\{ {\Omega }\} $ be the class of separable metrizable topological spaces provided with the Borel $\sigma $-algebra. Assume that F is a continuous local statistical non-negative definite quadratic form defined on all 2-integrable statistical models $(M, {\Omega }, \mu , p)$ (Definitions 6, 10) where ${\Omega }\in \{ {\Omega }\}.$ If F satisfies the monotonicity condition (1) in Definition 1 and the associated quadratic form $\tilde{F}$ on ${\mathcal M}({\Omega })$ (Definition 10) is strongly continuous for all ${\Omega },$ then F is the Fisher quadratic form up to a multiplicative constant.

Corollary 5

Let $\{ {\Omega }\} $ be the class of separable metrizable topological spaces provided with Borel $\sigma $-algebra. Any continuous local statistical non-negative definite quadratic form F on statistical models associated with $\{{\Omega }\}$ that satisfies the condition (1) in Definition 1 also satisfies the condition (2) in Definition 1, if the associated form $\tilde{F}$ on ${\mathcal M}({\Omega })$ satisfies the strong continuity condition for all ${\Omega }$. In other words, the combination of the condition (1) and the strong continuity condition is stronger than the condition (2) for those F.

In Remark 26 below we explain why Theorem 4 can be regarded as a generalization of the characterization of the Fisher metric by its monotonicity in the case of finite sample spaces, which is equivalent to the Chentsov theorem. Since there are many measure classes that are invariant under statistics (see e.g. Bogachev 2007, vol. II, Chapter 9), we conjecture that without the strong continuity assumption there exists a local statistical continuous metric that satisfies (1) but does not satisfy (2).

The remainder of our paper is organized as follows. In Sect. 2, we recall the notion of a k-integrable parametrized measure model and the notion of a local statistical continuous covariant tensor field that have been introduced by Ay–Jost–Lê–Schwachhöfer in Ay et al. (2015). In Sect. 3, we prove Theorem 2. In Sect. 4, we assume that ${\Omega }$ is a separable metrizable topological space provided with Borel $\sigma $-algebra. We introduce a mixed topology on the space ${\mathcal L}^n_n({\Omega }) : = \cup _{\mu \in {\mathcal M}({\Omega })}\oplus ^n L^n({\Omega }, \mu )$, which enjoys nice properties (Proposition 17). Using this topology we introduce the notion of strongly continuous covariant n-tensors on ${\mathcal M}({\Omega })$ (Definition 18). In Sect. 5, we prove Theorem 4 by deriving it from the special case associated with finite sample spaces. Finally, we include an appendix containing a note on the Chentsov uniqueness theorem.

The idea to derive the uniqueness of the Fisher metric from its special case proved by Chentsov for finite sample spaces was proposed by Amari and Nagaoka (2000, p. 39) as follows: “Here we shall only observe that Chentsov’s theorem leads to the Fisher metric and the $\alpha $-connections if a kind of limiting procedure is permitted”; see also Remark 26(3) on a similar idea due to Morozova–Chentsov. In this note we find such a limiting procedure in terms of strong continuity with respect to the mixed topology.

2 k-integrable parametrized measure models and local statistical continuous tensor fields

For $\mu _0 \in {\mathcal M}({\Omega })$ denote by

$$\begin{aligned} {\mathcal M}_+({\Omega }, \mu _0):= & {} \{ \mu = \phi \mu _0|\, \phi \in L^1({\Omega }, \mu _0), \, \phi > 0 , \, \mu _0\text {-}\hbox {a.e.}\},\\ {\mathcal P}_+({\Omega }, \mu _0)= & {} \{\mu \in {\mathcal M}_{+}({\Omega }, \mu _0) \; : \; \mu (\Omega ) = 1\}. \end{aligned}$$

Definition 6

(Ay et al. 2015, Definition 2.4; cf. Amari 1987, §2, p. 25; Amari and Nagaoka 2000, §2.1) Let $k \ge 1$. A k-integrable parametrized measure model is a quadruple $(M, {\Omega }, \mu , p)$ consisting of a smooth (finite or infinite dimensional) Banach manifold M and a continuous map $ p: M \rightarrow {\mathcal M}_+({\Omega }, \mu )$ provided with the $L^1$-topology such that there exists a density potential $\bar{p} = \frac{\mathrm{d}p}{\mathrm{d}\mu }: M \times {\Omega }\rightarrow {\mathbb R}$ satisfying $p(x) = \bar{p}(x,{\omega }) \mathrm{d}\mu ({\omega })$ and the following conditions:

1.
the function $x \mapsto \ln \bar{p} (x, {\omega }) = \ln \frac{\mathrm{d}p(x)}{\mathrm{d}\mu }({\omega }): M \rightarrow {\mathbb R}$ is defined and continuously Gâteaux-differentiable for $\mu $-almost all $\omega \in {\Omega }$,
2.
for every continuous vector field V on M, the function $\omega \mapsto {\partial }_{V} \ln \bar{p}(x, {\omega })$ belongs to $L^k({\Omega }, p(x))$; moreover, the function $x \mapsto \Vert {\partial }_{V} \ln \bar{p}(x, {\omega })\Vert _{L^k ({\Omega }, p(x))}$ is continuous on M.

We call M the parameter space of $(M, {\Omega }, \mu , p)$. We call $(M, {\Omega }, \mu , p)$ a statistical model if $p(M)\subset {\mathcal P}_+ ({\Omega }, \mu )$.

In Definition 6, the continuous Gâteaux-differentiability of $\ln \bar{p}(x, {\omega })$ in $x \in M$ means that the Gâteaux differential $D\ln \bar{p}(x, {\omega })$ is continuous as a function on TM (Hamilton 1982, chapter I.3).

Remark 7

In Definition 6 we represent a tangent vector $V \in T_xM$ by the function ${\partial }_V \ln \bar{p} (x, {\omega }) \in L^1({\Omega }, p(x))$. This representation is independent of the choice of a reference measure in ${\mathcal M}_+ ({\Omega },\mu )$; it depends only on the map $p : M \rightarrow {\mathcal M}_+({\Omega }, \mu )$.

Definition 8

(Ay et al. 2015, Definition 2.2) A section $\tau $ of the bundle $(T^*M)^{\otimes n} = T^*M \otimes \cdots \otimes T^* M$ (n factors) is called a weakly continuous covariant n-tensor if the value $\tau (V_n)$ is a continuous function for any continuous n-vector field $V_n$ on M.

Definition 9

(Ay et al. 2015, Definition 2.1) A covariant n-tensor field on ${\mathcal M}({\Omega })$ assigns to each $\mu \in {\mathcal M}({\Omega })$ a multilinear map $\tau _\mu : \bigoplus ^n L^n ({\Omega }, \mu ) \rightarrow {\mathbb R}$ that is continuous w.r.t. the product topology on $\bigoplus ^n L^n ({\Omega }, \mu )$.

Definition 10

(Locality and continuity condition) (Ay et al. 2015, Definition 2.8) Given a class $\{ {\Omega }\}$ of measure spaces, a statistical covariant continuous n-tensor field A assigns to each parametrized measure model $(M, {\Omega }, \mu , p)$ with ${\Omega }\in \{ {\Omega }\}$ a weakly continuous (in the sense of Definition 8) covariant n-tensor field $A|_{(M, {\Omega }, \mu , p)}$ on M (cf. Definition 9). A statistical covariant continuous n-tensor field A is called local if there is a covariant n-tensor field $\tilde{A}$ on ${\mathcal M}({\Omega })$ with the following property: for any parametrized measure model $(M, {\Omega }, \mu , p)$, any $x \in M$ and any $V _i \in T_xM$ we have

$$\begin{aligned} A|_{(M, {\Omega }, \mu , p)} (V_1, \ldots , V_n) = \tilde{A}_{p(x)}({\partial }_{V_1} \ln \bar{p}(x),\ldots ,{\partial }_{V_n}\ln \bar{p}(x)). \end{aligned}$$

(1)

From now on, if a weakly continuous covariant tensor A on a k-integrable statistical model $(M, {\Omega }, \mu , p)$ satisfies (1) for $A|_{(M, {\Omega }, \mu , p)} = A$, we shall write $A = p^*(\tilde{A})$.

Example 11

(cf. Remark 22). In Ay et al. (2015) Ay–Jost–Lê–Schwachhöfer showed that the Fisher quadratic form

$$\begin{aligned} g^F(V, W) _x : = \int _{\Omega }{\partial }_V \ln \bar{p}(x, {\omega }) {\partial }_W \ln \bar{p}(x, {\omega }) \, \mathrm{d}p(x) \end{aligned}$$

(2)

and the Amari–Chentsov 3-symmetric tensor

$$\begin{aligned} T^{AC} (V,W, X) _x : = \int _{\Omega }{\partial }_V \ln \bar{p}(x, {\omega }) {\partial }_W \ln \bar{p} (x, {\omega }){\partial }_X \ln \bar{p}(x, {\omega }) \, \mathrm{d} p(x) \end{aligned}$$

(3)

are local statistical continuous covariant tensor fields.
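For a finite sample space the integrals in (2) and (3) become finite sums. The following Python sketch is our own illustration, not part of the original argument: the function names are hypothetical, a one-parameter model is assumed to be given as a function returning the list of atom probabilities, and the scores ${\partial }_\theta \ln \bar{p}$ are approximated by central differences rather than computed symbolically. For the Bernoulli family $p(\theta ) = (1-\theta )\delta _0 + \theta \delta _1$ the values can be compared with the closed forms $1/(\theta (1-\theta ))$ and $1/\theta ^2 - 1/(1-\theta )^2$.

```python
import math
from typing import Callable, List, Sequence

# Hypothetical representation: a one-parameter model on Omega = {0, ..., m-1}
# is a function theta -> [p(theta)(0), ..., p(theta)(m-1)] with positive entries.
Model = Callable[[float], Sequence[float]]


def score(model: Model, theta: float, h: float = 1e-6) -> List[float]:
    """Central-difference approximation of d/dtheta ln p(theta, omega) for each omega."""
    up, down = model(theta + h), model(theta - h)
    return [(math.log(a) - math.log(b)) / (2 * h) for a, b in zip(up, down)]


def fisher(model: Model, theta: float) -> float:
    """Finite-sum version of (2): sum over omega of score(omega)^2 * p(theta)(omega)."""
    p, s = model(theta), score(model, theta)
    return sum(si ** 2 * pi for si, pi in zip(s, p))


def amari_chentsov(model: Model, theta: float) -> float:
    """Finite-sum version of (3): sum over omega of score(omega)^3 * p(theta)(omega)."""
    p, s = model(theta), score(model, theta)
    return sum(si ** 3 * pi for si, pi in zip(s, p))


def bernoulli(theta: float) -> List[float]:
    """p(theta) = (1 - theta) * delta_0 + theta * delta_1 for 0 < theta < 1."""
    return [1.0 - theta, theta]


theta = 0.3
print(fisher(bernoulli, theta), 1.0 / (theta * (1.0 - theta)))
print(amari_chentsov(bernoulli, theta), 1.0 / theta ** 2 - 1.0 / (1.0 - theta) ** 2)
```

Each printed line pairs the numerical value with the closed form; they should agree up to the finite-difference error.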

3 The monotonicity of the Fisher metric

In this section, we consider topological spaces ${\Omega }$ provided with Borel $\sigma $-algebra. We prove Theorem 2, and discuss some related problems (Remark 14).

Recall that a statistic $\kappa : {\Omega }_1 \rightarrow {\Omega }_2$ induces the linear operator $\kappa _*: L^1 ({\Omega }_1, \mu _1) \rightarrow L^1 ({\Omega }_2, \kappa _*(\mu _1))$ defined by Ay et al. (2015, (3.2))

$$\begin{aligned} \kappa _*f (y) : = \frac{\mathrm{d}\kappa _*(f \cdot \mu _1)}{\mathrm{d}\kappa _*(\mu _1)}(y) \end{aligned}$$

(4)

for $f \in L^1({\Omega }_1, \mu _1)$ and $y \in {\Omega }_2$.
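On a finite sample space, (4) reduces to the familiar conditional expectation: $\kappa _*f(y)$ is the $\mu _1$-weighted average of f over the fiber $\kappa ^{-1}(y)$. A minimal Python sketch, assuming measures are stored as dictionaries of atom weights (the helper names `pushforward` and `kappa_star` are hypothetical):

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Mapping

Measure = Mapping[Hashable, float]  # atom -> weight on a finite sample space


def pushforward(mu: Measure, kappa: Callable[[Hashable], Hashable]) -> Dict[Hashable, float]:
    """kappa_*(mu)(y) = mu(kappa^{-1}(y))."""
    out: Dict[Hashable, float] = defaultdict(float)
    for omega, weight in mu.items():
        out[kappa(omega)] += weight
    return dict(out)


def kappa_star(f: Mapping[Hashable, float], mu: Measure,
               kappa: Callable[[Hashable], Hashable]) -> Dict[Hashable, float]:
    """(kappa_* f)(y) = d kappa_*(f mu) / d kappa_*(mu) (y), cf. (4).

    Returned only on atoms y with kappa_*(mu)(y) > 0; elsewhere the
    Radon-Nikodym derivative is determined only up to a null set.
    """
    num = pushforward({omega: f[omega] * w for omega, w in mu.items()}, kappa)
    den = pushforward(mu, kappa)
    return {y: num[y] / den[y] for y in den if den[y] > 0}


mu1 = {0: 0.25, 1: 0.25, 2: 0.25, 3: 0.25}
f = {0: 1.0, 1: 3.0, 2: -2.0, 3: 0.0}
print(kappa_star(f, mu1, lambda omega: omega // 2))  # {0: 2.0, 1: -1.0}
```

By construction, $\int \kappa _*f \, \mathrm{d}\kappa _*(\mu _1) = \int f\, \mathrm{d}\mu _1$; in the example both sides equal 0.5.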

Remark 12

The operator $\kappa _*$ is well defined. Indeed, by the Radon–Nikodym theorem, $ f \in L^1 ({\Omega }_1, \mu _1)$ if and only if $f\cdot \mu _1$ is a finite signed measure dominated by $\mu _1$, i.e., every null set of $\mu _1$ is also a null set of $f\cdot \mu _1$. Now assume that $Z\subset {\Omega }_2$ is a null set of $\kappa _*(\mu _1)$. Then $\kappa ^{-1} (Z)$ is a null set of $\mu _1$ and hence of $f\cdot \mu _1$. It follows that Z is a null set of $\kappa _*(f\cdot \mu _1)$, so $\kappa _*(f\cdot \mu _1)$ is dominated by $\kappa _*(\mu _1)$ and, by the Radon–Nikodym theorem, the density in (4) exists.

Sometimes we will write $\kappa ^{\mu _1}_*(f)$ to indicate the reference measure $\mu _1$ when f may be regarded as an element of $L^p({\Omega }_1,\mu _1)$ for different $\mu _1$.

The following Lemma 13 expresses the well-known fact that conditional expectation does not increase the $L^p$-norm, see e.g. Neveu (1965, §4.3).

Lemma 13

For all $p\ge 2$ we have $\kappa _*(L^ p({\Omega }_1, \mu _1)) \subset L^p ({\Omega }_2, \kappa _*(\mu _1)).$ Moreover, the linear map $\kappa _*$ contracts the $L^p$-norm:

$$\begin{aligned} \Vert \kappa _*(f)\Vert _{L^p({\Omega }_2, \kappa _*(\mu _1))} \le \Vert f\Vert _{L^p({\Omega }_1, \mu _1)} \end{aligned}$$

for all $f \in L^p({\Omega }_1, \mu _1).$

Proof

Let $f \in L^p({\Omega }_1, \mu _1)$ and $y \in {\Omega }_2$. For a sequence of open sets ${\Omega }_2= A_0 \supset \cdots \supset A_n \supset \cdots \ni y$ and a statistic $\kappa : {\Omega }_1 \rightarrow {\Omega }_2$ we set

$$\begin{aligned} |f|_n (y): = \frac{ \int _{\kappa ^{-1} (A_n)}|f (x)| \, \mathrm{d}\mu _1}{\mu _1 (\kappa ^{-1}(A_n))}. \end{aligned}$$

By the Hölder inequality, we have

$$\begin{aligned} (|f|_n(y)) ^p \le \frac{\int _{\kappa ^{-1} (A_n)}|f (x)|^p \, \mathrm{d}\mu _1}{\mu _1 (\kappa ^{-1}(A_n))}. \end{aligned}$$

Since $\lim _{n \rightarrow \infty } |f|_n (y ) = \kappa _*(|f|)(y)$, we deduce from the above inequality

$$\begin{aligned} (\kappa _*(|f|)(y))^p \le \kappa _* (|f | ^ p)(y). \end{aligned}$$

(5)

Using (5), we obtain

$$\begin{aligned}&\Vert \kappa _*(f)\Vert _{L^p({\Omega }_2, \kappa _*(\mu _1))}^p = \int _{{\Omega }_2} |\kappa _*(f)|^p\,\mathrm{d}\kappa _*(\mu _1) \\&\quad \le \int _{{\Omega }_2} (\kappa _*(|f|))^p\, \mathrm{d}\kappa _*(\mu _1) \le \int _{{\Omega }_2} \kappa _* (|f | ^ p)\,\mathrm{d}\kappa _*(\mu _1)= (\Vert f\Vert _{L^p({\Omega }_1, \mu _1)})^p, \end{aligned}$$

which immediately implies Lemma 13.$\square $
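Lemma 13 can be sanity-checked numerically on finite sample spaces, where $\kappa _*$ is a weighted average over the fibers of $\kappa $. The sketch below is a Python illustration under our own assumptions (random test data, a fixed seed, hypothetical helper names); it illustrates rather than proves the lemma. It draws random measures, functions and maps and records the largest ratio $\Vert \kappa _*f\Vert _{L^p}/\Vert f\Vert _{L^p}$ for $p = 2, 3, 4$.

```python
import random
from typing import List, Sequence, Tuple


def kappa_star(f: Sequence[float], mu: Sequence[float], kappa: Sequence[int],
               m: int) -> Tuple[List[float], List[float]]:
    """Return (kappa_* f, kappa_* mu) on Omega_2 = {0, ..., m-1}, where
    Omega_1 = {0, ..., len(mu)-1} and kappa[omega] is the image of omega."""
    num, den = [0.0] * m, [0.0] * m
    for omega, (value, weight) in enumerate(zip(f, mu)):
        num[kappa[omega]] += value * weight
        den[kappa[omega]] += weight
    # On atoms of zero kappa_*(mu)-mass the value is irrelevant; we put 0.
    return [a / b if b > 0 else 0.0 for a, b in zip(num, den)], den


def lp_norm(f: Sequence[float], mu: Sequence[float], p: float) -> float:
    return sum(abs(x) ** p * w for x, w in zip(f, mu)) ** (1.0 / p)


rng = random.Random(0)  # fixed seed so that the check is reproducible
worst_ratio = 0.0
for _ in range(1000):
    mu = [rng.random() for _ in range(8)]
    f = [rng.gauss(0.0, 1.0) for _ in range(8)]
    kappa = [rng.randrange(3) for _ in range(8)]  # a random statistic Omega_1 -> Omega_2
    g, nu = kappa_star(f, mu, kappa, 3)
    for p in (2, 3, 4):
        worst_ratio = max(worst_ratio, lp_norm(g, nu, p) / lp_norm(f, mu, p))
print(worst_ratio)  # should not exceed 1 (up to rounding), in line with Lemma 13
```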

Proof of Theorem 2

By Remark 7 the geometry of a parametrized measure model $(M, {\Omega }_1, \mu _1, p_1)$ does not depend on the choice of a reference measure $\mu _1$. Thus, to prove Theorem 2 at a point $x \in M$, we can assume that $p_1(x) = \mu _1$ and hence $\bar{p}_1 (x, {\omega }) = 1$. Abusing notation, for a function $\bar{p} : M \times {\Omega }\rightarrow {\mathbb R}$ and for $x\in M$, we denote by $\bar{p} (x)$ the function ${\Omega }\rightarrow {\mathbb R}$ such that $\bar{p} (x) ({\omega }) = \bar{p} (x, {\omega })$. Then we have $ \kappa _*^{\mu _1}(\bar{p}_1(x))(\kappa ({\omega })) =1$ for all ${\omega }$. Now let $V \in T_xM$. Since $\kappa _*^{\mu _1}(\bar{p}_1 (x)) = 1$, we have

$$\begin{aligned} {\partial }_V \ln \kappa _*^{\mu _1}(\bar{p}_1 (x)) = {\partial }_V \kappa _*^{\mu _1}( \bar{p}_1 (x)). \end{aligned}$$

(6)

Next, we shall prove the following equality

$$\begin{aligned} {\partial }_V \kappa _*^{\mu _1}(\bar{p}_1 (x)) = \kappa _*^{\mu _1}({\partial }_V \bar{p}_1 (x)). \end{aligned}$$

(7)

To prove (7) it suffices to show that the following equality holds

$$\begin{aligned} {\partial }_V \kappa _*^{\mu _1} (p_1 (x)) = \kappa _*^{\mu _1} ({\partial }_V p_1 (x)), \end{aligned}$$

(8)

where the RHS and LHS of (8) are understood as signed measures.

Condition (2) in Definition 6 implies that ${\partial }_V \bar{p}_1 \in L^2 ({\Omega }_1, \mu _1)\subset L^1 ({\Omega }_1, \mu _1)$, since $(M, {\Omega }_1, \mu _1, p_1)$ is a 2-integrable parametrized measure model. Hence, for any measurable subset A in ${\Omega }_2$, we can differentiate under the integral sign [see e.g. Jost (2005, Theorem 16.11, p. 213)] to obtain the following [cf. Ay et al. (2015, (2.6)) for a slightly different proof]

$$\begin{aligned} {\partial }_V \int _{\kappa ^{-1}(A)} \bar{p}_1\,\mathrm{d}\mu _1 = \int _{\kappa ^{-1}(A)}{\partial }_V \bar{p}_1\,\mathrm{d}\mu _1. \end{aligned}$$

This equality implies (8) immediately. Hence (7) holds.

Let us continue the proof of Theorem 2. Using (7), and recalling that $\bar{p}_1 (x, {\omega }) = 1$, we obtain from (6)

$$\begin{aligned} {\partial }_V \ln \kappa _*^{\mu _1}(\bar{p}_1 (x)) = \kappa _*^{\mu _1}({\partial }_V \bar{p}_1 (x))= \kappa _*^{\mu _1} ({\partial }_V \ln \bar{p}_1)(x). \end{aligned}$$

Since ${\partial }_V \ln \bar{p}_1(x) \in L^2({\Omega }_1, p_1(x))$ for all $x \in M$, by Lemma 13, we obtain

$$\begin{aligned} \Vert \kappa _*^{\mu _1}({\partial }_V \ln \bar{p}_1)(x)\Vert _{L^2 ({\Omega }_2, \kappa _*^{\mu _1}(p_1(x)))} \le \Vert {\partial }_V \ln \bar{p}_1(x)\Vert _{L^2({\Omega }_1, p_1(x))}. \end{aligned}$$

(9)

Noting that, by (2), the square of the RHS of (9) is equal to $g^F_{(M, {\Omega }_1, \mu _1, p_1)}(V,V)$ and the square of the LHS of (9) is equal to $g^F_{(M, {\Omega }_2, \kappa _*(\mu _1), \kappa _*(p_1))}(V,V)$, we deduce Theorem 2 from (9).$\square $
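Theorem 2 can be illustrated on a finite sample space by merging outcomes. The Python sketch below is our own example, not taken from the paper: the exponential-type family `tilted` on $\{0, \ldots , 5\}$, the statistic `kappa` merging $2j$ and $2j+1$, and all helper names are assumptions, and scores are approximated by central differences. It compares the Fisher information of the model with that of its image under $\kappa $.

```python
import math
from typing import Callable, List, Sequence

Model = Callable[[float], List[float]]


def fisher_info(model: Model, theta: float, h: float = 1e-5) -> float:
    """g^F(d/dtheta, d/dtheta) on a finite sample space, scores by central differences."""
    p, up, down = model(theta), model(theta + h), model(theta - h)
    return sum(pw * ((math.log(a) - math.log(b)) / (2 * h)) ** 2
               for pw, a, b in zip(p, up, down))


def tilted(theta: float, size: int = 6) -> List[float]:
    """Hypothetical example model on {0, ..., size-1}: p(theta)(omega) ~ exp(theta * omega)."""
    weights = [math.exp(theta * k) for k in range(size)]
    total = sum(weights)
    return [w / total for w in weights]


def image_model(model: Model, kappa: Sequence[int], m: int) -> Model:
    """The model theta -> kappa_*(p(theta)) on {0, ..., m-1}."""
    def q(theta: float) -> List[float]:
        out = [0.0] * m
        for omega, pw in enumerate(model(theta)):
            out[kappa[omega]] += pw
        return out
    return q


kappa = [0, 0, 1, 1, 2, 2]  # the statistic merging omega = 2j and 2j + 1
merged = image_model(tilted, kappa, 3)
for theta in (-1.0, 0.0, 0.5):
    print(theta, fisher_info(tilted, theta), fisher_info(merged, theta))
```

For this family the scores are $\omega - E_\theta [\omega ]$ and $E_\theta [\omega \mid \kappa ] - E_\theta [\omega ]$, so the two printed columns are $\mathrm{Var}_\theta (\omega )$ and $\mathrm{Var}_\theta (E_\theta [\omega \mid \kappa ])$, and Theorem 2 reduces here to the law of total variance. At $\theta = 0$ these values are $35/12$ and $8/3$.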

Remark 14

1.
It is not hard to see that if ${\Omega }_1$, ${\Omega }_2$ are metric spaces and $\kappa $ and $f := {\partial }_V \ln \bar{p}_1(x)$ are continuous, then the inequality (9) becomes an equality if and only if $f({\omega }) = \kappa _*(f) (\kappa ({\omega }))$ for all ${\omega }$.
2.
Lemma 13 implies that the absolute value $\hat{T}^{AC}$ of the Amari–Chentsov tensor defined by $\hat{T}^{AC}(V) : = |T^{AC} (V, V, V)| $ for $V \in TM$ also satisfies the version of Definition 1 on statistical fields which measure “information loss”.

4 Mixed topology and strongly continuous covariant tensor fields

In this section, we assume that ${\Omega }$ is a separable metric topological space provided with Borel $\sigma $-algebra. Let ${\mathbb R}^n_{\ge 0}: = [0, \infty ) ^n$. We introduce a mixed topology on the spaces

$$\begin{aligned} {\mathcal L}^n_n({\Omega }) : = \cup _{\mu \in {\mathcal M}({\Omega })}\oplus ^n L^n({\Omega }, \mu )\quad \text {and}\quad {\mathcal L}^n_1({\Omega }) : = \cup _{\mu \in {\mathcal M}({\Omega })} L^n({\Omega }, \mu ) \end{aligned}$$

which has good properties (Proposition 17). Using the mixed topology, we introduce the notion of strongly continuous covariant n-tensor fields on ${\mathcal M}({\Omega })$ (Definition 18), whose examples are the Fisher quadratic form (Remark 22) and all continuous functions on ${\mathcal L}^k_k({\Omega }_n)$ (Example 19), where ${\Omega }_n$ is a finite sample space consisting of n elementary events.

4.1 Mixed topology on ${\mathcal L}^n_n({\Omega })$

It is known that ${\mathcal M}({\Omega })$ carries many different important topologies, e.g., the total variation topology, the strong topology and the weak topology. The total variation topology is used in Definition 6: on ${\mathcal M}_+({\Omega }, \mu )$ it coincides with the $L^1({\Omega }, \mu )$-topology on densities. We now recall the notion of the weak topology on ${\mathcal M}({\Omega })$, which plays a prominent role in measure theory and especially in probability theory (Bogachev 2007; Billingsley 1999). Denote by $C_b ({\Omega })$ the space of all bounded continuous real functions on ${\Omega }$.

Definition 15

(cf. Bogachev 2007, vol. II, Definition 8.1.1) A sequence of Borel measures $\mu _\alpha $ on ${\Omega }$ is said to converge weakly to a Borel measure $\mu $ (written $\mu _\alpha \Longrightarrow \mu $) if for every $f\in C_b({\Omega })$ one has

$$\begin{aligned} \lim _\alpha \int _{\Omega }f \,\mathrm{d}\mu _\alpha = \int _{\Omega }f\,\mathrm{d}\mu . \end{aligned}$$

It is known that the weak topology on ${\mathcal M}({\Omega })$ is generated by fundamental neighborhoods of $\mu $, $\mu \in {\mathcal M}({\Omega })$, defined as follows (Bogachev 2007, Definition 8.1.2)

$$\begin{aligned} U_{f_1,\ldots , f_k,{\varepsilon }}(\mu ) := \left\{ \nu : \,\left| \int _{\Omega }f_i \mathrm{d}\mu -\int _{\Omega }f_i \mathrm{d}\nu \right| < {\varepsilon }\text { for } i\in [1,k]\right\} , \end{aligned}$$

(10)

where $f_i\in C_b({\Omega })$, $k \in {\mathbb N}$ and ${\varepsilon }> 0$.

Remark 16

1.
The weak topology on ${\mathcal M}({\Omega })$ is weaker than the total variation topology, hence for any k-integrable parametrized measure model $(M, {\Omega }, \mu , p)$ the embedding $p: M \rightarrow {\mathcal M}_+ ({\Omega }, \mu ) \rightarrow {\mathcal M}({\Omega })$ is also continuous with respect to the weak topology on ${\mathcal M}({\Omega })$.
2.
Since ${\Omega }$ is a separable metric topological space, for each $\mu \in {\mathcal M}({\Omega })$ the subspace $ C_b({\Omega })$ is a dense subset in $ L^n ({\Omega }, \mu )$ with respect to the $L^n({\Omega }, \mu )$-topology (Adams and Fournier 2006; Jost 2005; Bogachev 2007).
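Remark 16.1 can be illustrated with Dirac measures on ${\Omega }= {\mathbb R}$: $\delta _{1/k} \Longrightarrow \delta _0$, since $\int f\, \mathrm{d}\delta _{1/k} = f(1/k) \rightarrow f(0)$ for every $f \in C_b({\mathbb R})$, while the total variation distance between $\delta _{1/k}$ and $\delta _0$ equals 2 for every k. Thus the weak topology is strictly weaker than the total variation topology. The Python sketch below (finitely supported measures stored as dictionaries; a short, arbitrarily chosen list of test functions stands in for all of $C_b({\mathbb R})$, so it only illustrates the claim) shows both effects.

```python
import math
from typing import Callable, Dict

Measure = Dict[float, float]  # finitely supported measure on Omega = R: atom -> weight


def integrate(f: Callable[[float], float], mu: Measure) -> float:
    return sum(w * f(x) for x, w in mu.items())


def tv_distance(mu: Measure, nu: Measure) -> float:
    """Total variation norm |mu - nu|(R) of the signed measure mu - nu."""
    return sum(abs(mu.get(x, 0.0) - nu.get(x, 0.0)) for x in set(mu) | set(nu))


delta_0 = {0.0: 1.0}
test_functions = [math.cos, math.atan, lambda x: 1.0 / (1.0 + x * x)]  # all in C_b(R)
for k in (1, 10, 100, 1000):
    delta_k = {1.0 / k: 1.0}  # Dirac measure at 1/k
    gap = max(abs(integrate(f, delta_k) - integrate(f, delta_0)) for f in test_functions)
    print(k, gap, tv_distance(delta_k, delta_0))
```

The integral gaps shrink as k grows, while the total variation distance stays at 2.0.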

Let us denote by ${\mathcal L}^n_n({\Omega })$ the fibration over ${\mathcal M}({\Omega })$ whose fiber over $\mu \in {\mathcal M}({\Omega })$ is the space $\oplus ^n L^n({\Omega }, \mu )$. Note that the product topology on $\oplus ^n L^n ({\Omega }, \mu )$ is generated by the product norm defined as follows. For $\vec {f} = (f_1, \ldots , f_n ) \in \oplus ^n L^n({\Omega }, \mu )$ let

$$\begin{aligned} \Vert \vec {f}\Vert _{L^n_n (\mu )} : = \sum _{i=1} ^n \Vert f_i\Vert _{L^n ({\Omega }, \mu )}. \end{aligned}$$

Denote by $\pi $ the projection ${\mathcal L}^n_n({\Omega })\rightarrow {\mathcal M}({\Omega })$.

We are going to define a topology on ${\mathcal L}^n_n({\Omega })$ by specifying its base. For an n-tuple of functions $\vec {f} \in \oplus ^n C_b({\Omega })= (C_b({\Omega }))^n$, an open set $U\subset {\mathcal M}({\Omega })$ in the weak topology and ${\varepsilon }>0$ we set

$$\begin{aligned} O(\vec {f},U, {\varepsilon }): =\{ [\vec {g}, \mu ]:\, \mu \in U ,\, \vec {g} \in \oplus ^n L^n({\Omega }, \mu ) \text{ and } \Vert \vec {g} - \vec {f}\Vert _{L^n_n(\mu )} < {\varepsilon }\},\qquad \end{aligned}$$

(11)

where $[\vec g, \mu ]$ denotes the pair consisting of $\vec g$ and $\mu $.
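On a finite sample space every function is bounded and continuous and the weak topology on the space of measures is the Euclidean one, so membership in the base sets (11) can be tested directly. The following Python sketch is our own illustration: the helper names, the sample space $\{0, 1, 2\}$ with $n = 2$, and the choice of indicator functions as the test functions in (10) are all assumptions made for the example. It checks whether a pair $[\vec g, \mu ]$ lies in $O(\vec f, U, {\varepsilon })$.

```python
from functools import partial
from typing import Sequence, Tuple

Vec = Sequence[float]  # a function on {0, ..., N-1}, or a measure on it


def integral(h: Vec, mu: Vec) -> float:
    return sum(a * w for a, w in zip(h, mu))


def in_weak_nbhd(nu: Vec, mu0: Vec, tests: Sequence[Vec], eta: float) -> bool:
    """nu lies in the fundamental neighbourhood U_{tests, eta}(mu0) of (10)."""
    return all(abs(integral(h, mu0) - integral(h, nu)) < eta for h in tests)


def product_norm(g: Sequence[Vec], f: Sequence[Vec], mu: Vec) -> float:
    """||g - f||_{L^n_n(mu)}: sum over the n components of the L^n(mu)-norms."""
    n = len(f)
    return sum(sum(abs(a - b) ** n * w for a, b, w in zip(gi, fi, mu)) ** (1.0 / n)
               for gi, fi in zip(g, f))


def in_basic_open(point: Tuple[Sequence[Vec], Vec], f: Sequence[Vec], in_u, eps: float) -> bool:
    """[g, mu] lies in O(f, U, eps) of (11); U is given by its membership test in_u."""
    g, mu = point
    return in_u(mu) and product_norm(g, f, mu) < eps


# n = 2 on the sample space {0, 1, 2}; every function there lies in C_b.
f = ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
mu0 = [1 / 3, 1 / 3, 1 / 3]
indicators = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
in_u = partial(in_weak_nbhd, mu0=mu0, tests=indicators, eta=0.05)

near = (([1.01, 0.0, 0.0], [0.0, 1.0, 0.02]), [0.34, 0.33, 0.33])
far = (f, [1.0, 0.0, 0.0])  # same functions, but the base point leaves U
print(in_basic_open(near, f, in_u, 0.1), in_basic_open(far, f, in_u, 0.1))  # True False
```

The second pair fails although its functions coincide with $\vec f$, because its base point lies outside U; in the mixed topology both the fiber component and the base point must be close.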

Note that

$$\begin{aligned} O(\vec {f}, \cup _i U_i, {\varepsilon }) = \cup _i O(\vec {f}, U_i, {\varepsilon })\quad \text {and}\quad O(\vec {f},\cap _i U_i, {\varepsilon }) = \cap _i O(\vec {f}, U_i, {\varepsilon }). \end{aligned}$$

(12)

Proposition 17

The collection B of all subsets $O(\vec {f}, U, {\varepsilon })$ where $\vec {f} \in (C_b({\Omega }))^n,$ U is open set in ${\mathcal M}({\Omega })$ and ${\varepsilon }> 0$ generates a unique topology on ${\mathcal L}^n_n({\Omega }),$ which we shall call the mixed topology. Furthermore, the restriction of this topology to each fiber $\oplus ^n L^n({\Omega }, \mu )$ is equal to the $L^n({\Omega },\mu )$-topology on the fiber. Consequently, the space $(C_b({\Omega }) )^n\times {\mathcal M}({\Omega })$ is a dense subset in the mixed topology. The projection $\pi : {\mathcal L}^n_n({\Omega }) \rightarrow {\mathcal M}({\Omega })$ is continuous with respect to the mixed topology on the domain and the weak topology on the target space.

Proof

To prove the first assertion of Proposition 17 it suffices to show that the following conditions hold.

1.
The (base) elements in B cover ${\mathcal L}^n_n({\Omega })$.
2.
Let $O({\vec {f}}_1, U_1, {\varepsilon }_1)$ and $O(\vec {f_2}, U_2, {\varepsilon }_2)$ be base elements. If their intersection I is non-empty, then for each $[\vec {f}, \mu ] \in I$, there is a base element $O(\vec {f_3}, U_3, {\varepsilon }_3)$ such that $[\vec {f}, \mu ]\in O(\vec {f_3}, U_3, {\varepsilon }_3)\subset I$.

The first condition (1) holds by Remark 16.2.

Now let us prove that (2) holds. For $\vec {f} \in \oplus ^nL^n ({\Omega }, \mu )$ we set

$$\begin{aligned} B(\vec {f},{\varepsilon }, \mu ): = \{ \vec {f'} \in \oplus ^nL^n({\Omega }, \mu ) \text { and } \Vert \vec {f'} - \vec {f}\Vert _{L^n_n ({\Omega }, \mu )} < {\varepsilon }\}. \end{aligned}$$

Note that $I \cap \pi ^{-1}(\mu )$ is an open subset of $\pi ^{-1}(\mu )$ in $L^n ({\Omega },\mu )$-topology, since it is the intersection of two open balls $B({\vec {f}}_1,{\varepsilon }_1, \mu )$ and $B(\vec {f_2},{\varepsilon }_2,\mu )$. Using (12), we can assume w.l.o.g.

$$\begin{aligned}&U_1=U_{\tilde{f}_1, \ldots , \tilde{f}_k, {\varepsilon }_1 }(\mu _1),\\&U_2= U_{\tilde{g}_1,\ldots , \tilde{g}_m, {\varepsilon }_2}(\mu _2). \end{aligned}$$

Let $\delta _1$ be a number such that

$$\begin{aligned} U_{\tilde{f}_1,\ldots , \tilde{f}_k,\tilde{g}_1,\ldots , \tilde{g}_m,\delta _1}(\mu ) \subset U_1\cap U_2, \end{aligned}$$

(13)

and moreover $\delta _1 \le \min \{ 1, {\varepsilon }_1, {\varepsilon }_2\}$. Next we choose a positive number $\delta _2 \le \delta _1$ such that

$$\begin{aligned} \Vert \vec {f}-{\vec {f}}_1 \Vert _{L^n_n(\mu )} < {\varepsilon }_1-\delta _2 \quad \text {and}\quad \Vert \vec {f}-\vec {f_2}\Vert _{L^n_n(\mu )} < {\varepsilon }_2 -\delta _2. \end{aligned}$$

(14)

Then we choose $[\vec {f_3}, \mu ] \in I\cap \pi ^{-1}(\mu )$ with the following properties

$$\begin{aligned} \vec {f_3} \in (C_b ({\Omega }))^n \quad \text {and}\quad \Vert \vec {f_3} - \vec {f}\Vert _{L^n_n(\mu )} < \frac{1}{4} \delta _2. \end{aligned}$$

(15)

We obtain from (14) and (15)

$$\begin{aligned} \Vert \vec {f_3} -\vec {f_i}\Vert _{L^n_n(\mu )} < {\varepsilon }_i- \frac{3}{4} \delta _2 \quad \text {for } i =1,2. \end{aligned}$$

(16)

We write $\vec {f_3} = (f_3^1, \ldots , f_3^n)$. Note that $|f_3^i - f_1^i|^n$ and $|f_3^i-f_2^i|^n$ are continuous bounded functions on ${\Omega }$ for all $ i \in [1,n]$. Now we set

$$\begin{aligned} U_3 : = U_{\tilde{f}_1,\ldots , \tilde{f}_k, \tilde{g}_1, \ldots , \tilde{g}_m, |f^i_3 - f^i_1|^n,|f^i_3 - f^i_2|^n, i\in [1, n], (\frac{1}{8} \delta _2)^n} (\mu ). \end{aligned}$$

(17)

Since $(\frac{1}{8} \delta _2)^n \le \delta _2 \le \delta _1$ (recall that $\delta _1 \le 1$), we obtain from (17) and (13)

$$\begin{aligned} U_3 \subset U_{\tilde{f}_1,\ldots , \tilde{f}_k, \tilde{g}_1,\ldots , \tilde{g}_m, \delta _1} (\mu ) \subset U_1 \cap U_2. \end{aligned}$$

Clearly, (15) implies that $[\vec {f},\mu ] \in O(\vec {f_3}, U_3, \frac{1}{4} \delta _2)$. Hence, setting ${\varepsilon }_3: = \frac{1}{ 4} \delta _2$, to complete the proof of the first assertion of Proposition 17, it suffices to show that

$$\begin{aligned} O\left( \vec {f_3}, U_3, \frac{1}{ 4} \delta _2\right) \subset I. \end{aligned}$$

(18)

Let $[\vec {h}, \mu '] \in O(\vec {f_3}, U_3, \frac{1}{ 4} \delta _2)$. To prove (18) we need to show that $[\vec {h}, \mu '] \in I$, or equivalently

$$\begin{aligned}{}[\vec {h}, \mu '] \in O(\vec {f_i}, U_i, {\varepsilon }_i)\quad \text {for } i = 1, 2. \end{aligned}$$

(19)

Since $\mu '\in U_3 \subset U_i$ for $i = 1,2$, (19) is equivalent to

$$\begin{aligned} \Vert \vec {h} -\vec {f_i}\Vert _{L^n_n(\mu ')} < {\varepsilon }_i \quad \text {for } i = 1,2. \end{aligned}$$

(20)

Taking into account $[\vec {h}, \mu '] \in O(\vec {f_3}, U_3, \frac{1}{ 4} \delta _2)$, we obtain

$$\begin{aligned} \Vert \vec {h}-\vec {f_3}\Vert _{L^n_n(\mu ')} < {1\over 4} \delta _2. \end{aligned}$$

(21)

Since $\mu '\in U_3$, we derive from (16) and (17)

$$\begin{aligned} \Vert \vec {f_3} -{\vec {f}}_1\Vert _{L^n_n(\mu ')} <\Vert \vec {f_3} -{\vec {f}}_1\Vert _{L^n_n(\mu )} +{1\over 8} \delta _2 < {\varepsilon }_1 -\frac{5}{ 8} \delta _2. \end{aligned}$$

(22)

In the same way we obtain

$$\begin{aligned} \Vert \vec {f_3} -\vec {f_2}\Vert _{L^n_n(\mu ')} < {\varepsilon }_2 -\frac{5}{ 8} \delta _2. \end{aligned}$$

(23)

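By the triangle inequality in $L^n_n(\mu ')$, (21) and (22) give $\Vert \vec {h} -\vec {f_1}\Vert _{L^n_n(\mu ')} < \frac{1}{4}\delta _2 + {\varepsilon }_1 - \frac{5}{8}\delta _2 < {\varepsilon }_1$, and (21) and (23) give the analogous bound for $\vec {f_2}$. Hence (20) holds. This proves the first assertion of Proposition 17.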

The second assertion of Proposition 17 follows from Remark 16.2, observing that a ball $B(\vec {f}, {\varepsilon }, \mu ) $ is the intersection of the open set $O(\vec {f}, U(\mu ), {\varepsilon })$ with the fiber $\oplus ^n L^n({\Omega }, \mu )$.

Finally, the last assertion is obvious, since the preimage $\pi ^{-1} (U)$ of an open set $U \subset {\mathcal M}({\Omega })$ is the union of all open sets of the form $O(\vec {f}, U, {\varepsilon })$ with $\vec {f} \in (C_b({\Omega }))^n$ and ${\varepsilon }> 0$. This completes the proof of Proposition 17.$\square $

4.2 Strongly continuous covariant n-tensor on ${\mathcal M}({\Omega })$

Definition 18

A covariant n-tensor field on ${\mathcal M}({\Omega })$ is called strongly continuous if it is a continuous function on ${\mathcal L}^n_n({\Omega })$ with respect to the mixed topology.

Example 19

Let ${\Omega }_n: = \{ {\omega }_1, \ldots , {\omega }_n\}$ be a finite sample space of n elementary events. Let $\delta _{{\omega }_i}$ denote the Dirac measure concentrated at ${\omega }_i$. Let $\mu _l = \sum _{ i=1} ^l c_i \delta _{{\omega }_i} \in {\mathcal M}({\Omega }_n)$, where $l \le n$ and $c_i > 0$. Then, for all $k \ge 1$, $L^k({\Omega }_n, \mu _l)$ is homeomorphic to $C_b({\Omega }_l) = {\mathbb R}^l$, which is provided with the usual (vector space) topology. Furthermore, the weak topology on ${\mathcal M}({\Omega }_n)= {\mathbb R}^n_{\ge 0}$ coincides with the usual topology on ${\mathbb R}^n_{\ge 0}\subset {\mathbb R}^n$. Hence the subset ${\mathcal M}_+ ({\Omega }_n)$ consisting of positive measures on ${\Omega }_n$ is dense in ${\mathcal M}({\Omega }_n)$. We observe that $\pi : {\mathcal L}_k ^k ({\Omega }_n) \rightarrow {\mathcal M}({\Omega }_n)$ is a fiber bundle whose fiber over $\mu _l$ is homeomorphic to $({\mathbb R}^l) ^k$. A covariant k-tensor field $\tilde{F}$ on ${\mathcal M}({\Omega }_n) = {\mathbb R}^ n _{\ge 0}$ is a continuous function on ${\mathcal L}_k ^k ({\Omega }_n)$. Since $\pi ^{-1} ({\mathcal M}_+ ({\Omega }_n))$ is open and dense in ${\mathcal L}_k ^ k ({\Omega }_n)$, the function $\tilde{F}$ is defined uniquely by its restriction to $\pi ^{-1} ({\mathcal M}_+({\Omega }_n))$. In particular, the Fisher metric defined on ${\mathcal M}_+({\Omega }_n)$ is associated with the quadratic form $\tilde{g}^F : {\mathcal L}_2^2({\Omega }_n)\rightarrow {\mathbb R}$ defined by $\tilde{g}^F ([f_1, f_2, \mu ]) = \int _{{\Omega }_n} f_1 \cdot f_2\, d \mu $, see also Remark 22.
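The quadratic form in Example 19 is easy to evaluate numerically. The Python sketch below stores a measure $\mu = \sum_i c_i\delta_{{\omega }_i}$ on ${\Omega }_n$ as its list of weights and stores a function on ${\Omega }_n$ as its list of values. The function names are hypothetical. On the positive sector, a tangent vector $V$ at $\mu$ is represented by its density $f = V/c$ with respect to $\mu$; this identification is assumed for the sketch. With it, $\int f_1 f_2\,\mathrm{d}\mu$ reproduces the familiar coordinate expression $\sum_i V_i W_i / c_i$. Unlike that expression, the form on ${\mathcal L}^2_2({\Omega }_n)$ still makes sense when some $c_i = 0$.

```python
from typing import Sequence


def fisher_form(f1: Sequence[float], f2: Sequence[float], c: Sequence[float]) -> float:
    """Return int f1 * f2 d(mu) for mu = sum_i c[i] * delta_{omega_i}.

    Atoms with c[i] = 0 contribute nothing, so the value is defined on the
    whole cone of non-negative weights, including its boundary.
    """
    if not (len(f1) == len(f2) == len(c)):
        raise ValueError("f1, f2 and c must have the same length")
    if any(ci < 0 for ci in c):
        raise ValueError("weights of a non-negative measure must be >= 0")
    return sum(a * b * ci for a, b, ci in zip(f1, f2, c))


def fisher_metric_coordinates(v: Sequence[float], w: Sequence[float], c: Sequence[float]) -> float:
    """Coordinate expression sum_i v_i * w_i / c_i, only defined if every c_i > 0."""
    if any(ci <= 0 for ci in c):
        raise ZeroDivisionError("the coordinate expression needs strictly positive weights")
    return sum(vi * wi / ci for vi, wi, ci in zip(v, w, c))


def tangent_to_density(v: Sequence[float], c: Sequence[float]) -> list[float]:
    """Tangent vector v at a positive measure -> its density f = v / c w.r.t. mu."""
    return [vi / ci for vi, ci in zip(v, c)]


c = [0.2, 0.3, 0.5]
v = [1.0, -2.0, 1.0]
w = [0.5, 0.5, -1.0]
# On the positive sector both expressions agree (up to rounding).
print(abs(fisher_form(tangent_to_density(v, c), tangent_to_density(w, c), c)
          - fisher_metric_coordinates(v, w, c)) < 1e-12)
# On the boundary (c_3 = 0) only the form on L^2_2 still makes sense.
print(fisher_form([1.0, 2.0, 7.0], [3.0, 1.0, 5.0], [0.4, 0.6, 0.0]))
```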

Proposition 20

Let $g\in C_b ({\Omega })$ and $ c: {\mathcal M}({\Omega }) \rightarrow {\mathbb R}$ be a continuous function with respect to the weak topology. We define a covariant n-tensor field $T_{g,c}$ on ${\mathcal M}({\Omega })$ by setting

$$\begin{aligned} T_{g, c} ([ f_1,\ldots , f_n, \mu ]) := c(\mu ) \cdot \int _{{\Omega }} g \cdot f_1\cdots f_n\,\mathrm{d}\mu . \end{aligned}$$

Then $T_{g,c}$ is a strongly continuous covariant n-tensor field on ${\mathcal M}({\Omega })$.

Proof

By Proposition 17, $\pi : {\mathcal L}^n_n({\Omega }) \rightarrow {\mathcal M}({\Omega })$ is continuous. Hence $c\circ \pi $ is a continuous function on ${\mathcal L}^n_n({\Omega })$. Since $T_{g,c} = (c\circ \pi )\cdot T_{g,1}$, it suffices for Proposition 20 to consider the case $c \equiv 1$. That is, it suffices to show that $T_{g, 1}$ descends to a continuous function on ${\mathcal L}^n_n ({\Omega })$ provided with the mixed topology. Equivalently, we need to show that the set

$$\begin{aligned} O(a, b) : = \{ [\vec {f}, \mu ] \in {\mathcal L}^n_n ({\Omega }) |\, a< T_{g, 1} (\vec {f}, \mu ) < b \} \end{aligned}$$

is an open set in the mixed topology for any $-\infty < a < b < \infty $.

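Every point of $O(a, b)$ lies in $O(a+{\varepsilon },b-{\varepsilon })$ for some ${\varepsilon }$ with $0 < {\varepsilon }< {1\over 4} (b-a)$. So let $[\vec {f}, \mu ] \in O(a+{\varepsilon },b-{\varepsilon })$ for such an ${\varepsilon }$. We will show that there is an open set $O({\vec {f}}_1, U_1, \delta ) \ni [\vec {f}, \mu ]$ such that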

$$\begin{aligned} T_{g,1} (O({\vec {f}}_1, U_1, \delta )) \subset (a, b). \end{aligned}$$

(24)

Lemma 21

The restriction of $T_{g,1}$ to each fiber $\oplus ^n L^n ({\Omega },\mu )$ is continuous in the product $L^n({\Omega }, \mu )$-topology. Moreover, if $\Vert \vec {h}-\vec {f}\Vert _{L^n_n (\mu )} \le 1$ then

$$\begin{aligned} |T_{g,1} ([\vec {f}, \mu ]) - T_{g,1} ([\vec {h}, \mu ])|\le & {} \sup _{\Omega }g({\omega }) \cdot 2 ^n \cdot \Vert \vec {h} -\vec {f}\Vert _{L^n _n(\mu ) }\\&\cdot \,\left( 1+\sum _{i =1}^n \sum _{ k = 1} ^n \Vert f_i\Vert _{L^n ({\Omega },\mu )}^k\right) . \end{aligned}$$

Proof

Write $\vec {f}-\vec {h} = \vec {a} = (a_1,\ldots , a_n)$. Expanding $h_1 \cdots h_n = \prod _{i =1} ^n (f_i-a_i)$ and using Hölder’s inequality, we obtain

$$\begin{aligned}&|T_{g,1}([\vec {f}, \mu ]) -T_{g,1}([\vec {h}, \mu ])| \nonumber \\&\quad \le \sup _{\Omega }g({\omega }) \cdot \sum _{ [1, n] = \{i_1,\ldots , i_k\} \cup \{j_1,\ldots , j _{n -k}\} } \int _{\Omega }| a_{i_1} \cdots a_{i_k} f_{j_1} \cdots f_{j_{n-k}}|\,\mathrm{d}\mu \nonumber \\&\quad \le \sup _{\Omega }g({\omega })\cdot 2 ^n\cdot \max _{ [1, n] = \{i_1,\ldots , i_k\} \cup \{j_1,\ldots , j _{n -k}\} } \Vert a_{i_1}\Vert _{L^n ({\Omega }, \mu )} \cdots \Vert f_{j_{n-k}}\Vert _{L^n ({\Omega }, \mu )}.\nonumber \\ \end{aligned}$$

(25)

Note that in (25) the set $\{j_1,\ldots , j _{n -k}\}$ may be empty but the set $\{i_1,\ldots , i_k\}$ is always non-empty. Since $\sum _{i=1}^n \Vert a_i\Vert _{L^n ({\Omega }, \mu )} \le 1$, we have

$$\begin{aligned}&\max _{ [1, n] = \{i_1,\ldots , i_k\} \cup \{j_1,\ldots , j _{n -k}\} } \Vert a_{i_1}\Vert _{L^n ({\Omega }, \mu )} \cdots \Vert f_{j_{n-k}}\Vert _{L^n ({\Omega }, \mu )} \nonumber \\&\quad \le \sum _{i=1}^n \Vert a_i\Vert _{ L^n ({\Omega }, \mu )} \left( 1 + \sum _{i =1}^n \sum _{ k = 1} ^n \Vert f_i\Vert _{L^n ({\Omega },\mu )}^k\right) . \end{aligned}$$

(26)

Clearly Lemma 21 follows from (25) and (26).$\square $
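Below is a quick numerical sanity check of Lemma 21 on a finite sample space, written in Python. Two conventions are assumptions of this sketch, not statements from the text. First, $\Vert \vec a\Vert _{L^n_n(\mu )}$ is taken to be $\sum _i \Vert a_i\Vert _{L^n({\Omega },\mu )}$, which is the form used in the step $\sum _i \Vert a_i\Vert \le 1$. Second, $g$ is chosen non-negative, so that $\sup _{\Omega }g = \sup _{\Omega }|g|$. The script draws random data, discards draws with $\Vert \vec h - \vec f\Vert > 1$, and tests the inequality on the rest.

```python
import math
import random


def integral(values, weights):
    """int u d(mu) for mu = sum_j weights[j] * delta_{omega_j}."""
    return sum(u * m for u, m in zip(values, weights))


def lp_norm(u, weights, p):
    """L^p(mu) norm of a function given by its values at the atoms of mu."""
    return integral([abs(x) ** p for x in u], weights) ** (1.0 / p)


def T(g, fs, weights):
    """T_{g,1}([f_1, ..., f_n, mu]) = int g * f_1 * ... * f_n d(mu)."""
    return integral([gj * math.prod(f[j] for f in fs) for j, gj in enumerate(g)], weights)


def lemma21_rhs(g, fs, hs, weights):
    """Right-hand side of Lemma 21 and the distance ||h - f||, using the assumed
    sum-of-components convention for the L^n_n(mu) norm."""
    n = len(fs)
    dist = sum(lp_norm([a - b for a, b in zip(h, f)], weights, n) for h, f in zip(hs, fs))
    poly = 1.0 + sum(lp_norm(f, weights, n) ** k for f in fs for k in range(1, n + 1))
    return max(abs(x) for x in g) * 2 ** n * dist * poly, dist


def check_lemma21(n=3, atoms=6, trials=200, seed=0):
    """Return True if the inequality of Lemma 21 holds on every admissible random draw."""
    rng = random.Random(seed)
    for _ in range(trials):
        weights = [rng.uniform(0.0, 1.0) for _ in range(atoms)]
        g = [rng.uniform(0.0, 2.0) for _ in range(atoms)]  # g >= 0, so sup g = sup |g|
        fs = [[rng.uniform(-3.0, 3.0) for _ in range(atoms)] for _ in range(n)]
        hs = [[x + rng.uniform(-0.1, 0.1) for x in f] for f in fs]
        rhs, dist = lemma21_rhs(g, fs, hs, weights)
        if dist > 1.0:  # Lemma 21 assumes ||h - f|| <= 1
            continue
        if abs(T(g, fs, weights) - T(g, hs, weights)) > rhs + 1e-12:
            return False
    return True


print(check_lemma21())
```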

Continuation of the proof of Proposition 20. We define a function $G : {\mathcal L}^n_n ({\Omega }) \rightarrow {\mathbb R}$ by setting

$$\begin{aligned} G([\vec {f}, \mu ]) : =\sup _{\Omega }g({\omega })\cdot 2^n \left( 1+ \sum _{i =1}^n \sum _{ k = 1} ^n \Vert f_i\Vert _{L^n ({\Omega },\mu )}^k\right) . \end{aligned}$$

Let us pick an element ${\vec {f}}_1= ((f_1)_1, \ldots ,(f_1)_n)\in (C_b({\Omega }) )^ n \cap B (\vec {f}, \delta , \mu )$, where $\delta $ is so small that the following inequalities hold:

$$\begin{aligned}&\delta <\min \left\{ \frac{ 1}{2}, \frac{{\varepsilon }}{16G([\vec {f}, \mu ])}\right\} ,\end{aligned}$$

(27)

$$\begin{aligned}&|T_{g, 1} ({\vec {f}}_1, \mu ) - T_{g,1} (\vec {f}, \mu ) | < {{\varepsilon }\over 16}, \end{aligned}$$

(28)

and

$$\begin{aligned} |G([\vec {h}, \mu ])- G([{\vec {f}}_1, \mu ])| \le {{\varepsilon }\over 8 } \end{aligned}$$

(29)

for all $\vec {h} \in B({\vec {f}}_1, \delta , \mu )$. The existence of $\delta $ and ${\vec {f}}_1$ follows from four facts: the positivity of G, Lemma 21, the continuity of the restriction of G to each fiber $\oplus ^n L^n ({\Omega },\mu )$, and the density of $(C_b({\Omega }))^n$ in $\oplus ^n L^n ({\Omega },\mu )$.

We define a neighborhood $U_1 =U_1([{\vec {f}}_1, \mu ])$ containing $\mu $ as follows:

$$\begin{aligned} U_1: = U_{g\cdot (f_1)_1 \cdots (f_1)_n,\, |(f_1)_1|^n, \ldots , |(f_1)_n|^n,\,\lambda }(\mu ), \end{aligned}$$

where $\lambda $ depends on $g, {\vec {f}}_1, \mu $ and is so small that

$$\begin{aligned} \lambda < \frac{{\varepsilon }}{8} \end{aligned}$$

(30)

and for $\mu ' \in U_1$ we have

$$\begin{aligned} |G([{\vec {f}}_1, \mu ']) -G([{\vec {f}}_1, \mu ])| \le {{\varepsilon }\over 8} . \end{aligned}$$

(31)

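The existence of $\lambda $ satisfying (31) is ensured by the continuity of $\mu ' \mapsto G([{\vec {f}}_1, \mu '])$ with respect to the weak topology. Indeed, G depends on $\mu '$ only through the integrals $\int _{\Omega }|(f_1)_i|^n\,\mathrm{d}\mu '$, and the functions $|(f_1)_i|^n \in C_b({\Omega })$ are among those defining $U_1$.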

Now we shall show that $O({\vec {f}}_1, U_1, \delta ) \ni [\vec {f}, \mu ]$ satisfies (24). Assume that $[\vec {h}, \mu '] \in O({\vec {f}}_1, U_1, \delta )$. Then

$$\begin{aligned}&|T_{g,1} ([\vec {h},\mu ']) - T_{g,1}([\vec {f}, \mu ])| \le |T_{g,1} ([\vec {h},\mu ']) - T_{g,1}([{\vec {f}}_1, \mu '])|\nonumber \\&\quad +\, |T_{g,1}([{\vec {f}}_1, \mu ']) - T_{g,1}([{\vec {f}}_1, \mu ])| + |T_{g,1}([{\vec {f}}_1, \mu ])- T_{g,1}([\vec {f}, \mu ])|. \end{aligned}$$

(32)

Let us estimate the first term in the RHS of (32). By Lemma 21 we have

$$\begin{aligned} |T_{g,1} ([\vec {h}, \mu ']) - T_{g,1} ([{\vec {f}}_1, \mu '])| \le \Vert \vec {h} - {\vec {f}}_1\Vert _{L_n^n (\mu ')} \cdot G([{\vec {f}}_1, \mu ']). \end{aligned}$$

(33)

Taking into account (31), (29), and the choice of $\delta $ in (27), we obtain from (33), noting that $ {\vec {f}}_1 \in B(\vec {f}, \delta , \mu ) \Longrightarrow \vec {f}\in B({\vec {f}}_1,\delta , \mu )$:

$$\begin{aligned}&|T_{g,1} ([\vec {h}, \mu ']) - T_{g,1} ([{\vec {f}}_1, \mu '])|\le \delta \cdot G([{\vec {f}}_1, \mu '])\nonumber \\&\quad \le \delta \left( \frac{{\varepsilon }}{8} + G([{\vec {f}}_1, \mu ])\right) \le \delta \left( \frac{{\varepsilon }}{8}+\frac{{\varepsilon }}{8}+G([\vec {f}, \mu ])\right) < \frac{3{\varepsilon }}{16}. \end{aligned}$$

(34)

We estimate the second term in the RHS of (32) as follows, using the fact $\mu ' \in U_1= U_1([{\vec {f}}_1, \mu ])$ with $\lambda $ satisfying (30):

$$\begin{aligned} |T_{g,1}([{\vec {f}}_1, \mu ']) - T_{g,1}([{\vec {f}}_1, \mu ])| < \lambda < \frac{{\varepsilon }}{8}. \end{aligned}$$

(35)

Using (34), (35) and estimating the last term in the RHS of (32) by (28), we obtain from (32)

$$\begin{aligned} |T_{g,1} ([\vec {h},\mu ']) - T_{g,1}([\vec {f}, \mu ])| \le \frac{3{\varepsilon }}{16} + \frac{{\varepsilon }}{8} + \frac{{\varepsilon }}{16} = \frac{3{\varepsilon }}{8}. \end{aligned}$$

(36)

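Since $T_{g,1}([\vec {f}, \mu ]) \in (a+{\varepsilon }, b-{\varepsilon })$ and $\frac{3{\varepsilon }}{8} < {\varepsilon }$, (36) implies that $T_{g,1} ([\vec {h},\mu ']) \in (a, b)$. Hence (24) holds. This completes the proof of Proposition 20.$\square $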
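The $\varepsilon $-bookkeeping in (34) to (36) can be replayed in exact rational arithmetic. In this Python sketch, $\varepsilon $, $G([\vec f,\mu ])$ and $\delta $ are free positive inputs; the sample values are arbitrary. The parameter $\delta $ is required to satisfy (27), and the bounds (35) and (28) enter as the constants $\varepsilon /8$ and $\varepsilon /16$. The sketch checks only the arithmetic, not the analytic estimates behind each bound.

```python
from fractions import Fraction as Fr


def epsilon_budget(eps: Fr, G: Fr, delta: Fr) -> tuple[Fr, Fr]:
    """Return (last bound in (34), total bound in (36)) for given eps, G([f, mu]), delta."""
    if not (delta < Fr(1, 2) and delta < eps / (16 * G)):
        raise ValueError("delta violates condition (27)")
    first = delta * (eps / 8 + eps / 8 + G)  # delta * (eps/8 + eps/8 + G([f, mu])) in (34)
    second = eps / 8  # bound (35), coming from lambda < eps/8 in (30)
    third = eps / 16  # bound (28)
    return first, first + second + third


def check_budget() -> bool:
    """Check first < 3 eps/16 and total < 3 eps/8 < eps on a grid of sample values."""
    for eps in (Fr(1, 100), Fr(1), Fr(10)):
        for G in (Fr(1, 10), Fr(1), Fr(100)):
            delta = Fr(99, 100) * min(Fr(1, 2), eps / (16 * G))
            first, total = epsilon_budget(eps, G, delta)
            if not (first < 3 * eps / 16 and total < 3 * eps / 8 < eps):
                return False
    return True


print(check_budget())
```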

Remark 22

Let $1_{\Omega }: {\Omega }\rightarrow {\mathbb R}$ denote the constant function taking the value 1. Then $1_{\Omega }\in C_b ({\Omega })$. Let $(M, {\Omega }, \mu , p)$ be a 2-integrable parametrized measure model. By (1) the 2-tensor field $T_{1_{\Omega }, 1}$ induces the following local statistical 2-tensor g on $(M, {\Omega }, \mu , p)$:

$$\begin{aligned} g_x (V, W)= & {} (T_{1_{\Omega }, 1})_{p(x)} ({\partial }_V \ln \bar{p}(x), {\partial }_W \ln \bar{p}(x))\nonumber \\= & {} \int _{\Omega }{\partial }_V \ln \bar{p}(x)\cdot {\partial }_W \ln \bar{p}(x) \, \mathrm{d}p(x). \end{aligned}$$

(37)

The RHS of (37) is the Fisher metric $g^F$. Thus, the Fisher metric is induced from the strongly continuous covariant 2-tensor field $T_{1_{\Omega },1}$ on ${\mathcal M}({\Omega })$. In the same way, the Amari–Chentsov tensor $T^{AC}$ is induced from the strongly continuous covariant 3-tensor field $T_{1_{\Omega }, 1}$ on ${\mathcal M}({\Omega })$, i.e., from the case $n = 3$ of Proposition 20.
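To see (37) in a concrete case, take the Bernoulli family $p(\theta ) = (1-\theta )\delta _0 + \theta \delta _1$ on ${\Omega }= \{0, 1\}$. This example is chosen here and is not taken from the text. The Python sketch below approximates the score $\partial _\theta \ln \bar{p}$ by a central finite difference. It then integrates the second and third powers of the score against $p(\theta )$, which gives the diagonal values of the Fisher metric and of the Amari–Chentsov tensor. The closed forms $1/(\theta (1-\theta ))$ and $1/\theta ^2 - 1/(1-\theta )^2$ serve as a check.

```python
import math


def bernoulli_density(theta: float) -> dict[int, float]:
    """Density of p(theta) = (1 - theta) delta_0 + theta delta_1 w.r.t. counting measure."""
    return {0: 1.0 - theta, 1: theta}


def score(theta: float, omega: int, h: float = 1e-6) -> float:
    """Central finite-difference approximation of d/dtheta ln p(theta)(omega)."""
    up = math.log(bernoulli_density(theta + h)[omega])
    down = math.log(bernoulli_density(theta - h)[omega])
    return (up - down) / (2.0 * h)


def diagonal_tensor(theta: float, order: int) -> float:
    """int (d ln p)^order dp(theta): order 2 gives g^F(d, d), order 3 gives T^AC(d, d, d)."""
    return sum(mass * score(theta, omega) ** order
               for omega, mass in bernoulli_density(theta).items())


theta = 0.3
print(diagonal_tensor(theta, 2), 1.0 / (theta * (1.0 - theta)))
print(diagonal_tensor(theta, 3), 1.0 / theta**2 - 1.0 / (1.0 - theta) ** 2)
```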

5 The uniqueness of the Fisher metric

Recall that $\delta _{\omega }$ denotes the Dirac measure concentrated at ${\omega }\in {\Omega }$.

Lemma 23

(cf. Bogachev 2007, Example 8.1.6) The set of all measures of the form $\sum _{i=1}^N c_i \delta _{{\omega }_i},$ $c_i >0,$ is dense in ${\mathcal M}({\Omega })$ in the weak topology. The convex hull of the set of Dirac measures is dense in the space ${\mathcal P}({\Omega })$.

Proof

1.
A version of Lemma 23 for finite Baire measures is proved in Bogachev (2007, Example 8.1.6); we adapt Bogachev’s argument. Suppose we are given a neighborhood $U\ni \mu $ of the form (10). We may assume that the total variation norm $\Vert \mu \Vert \le 1$. There are simple (step) functions $g_i$ such that $\sup _{{\omega }\in {\Omega }} |f_i ({\omega }) - g_i ({\omega })| < {\varepsilon }/4$ for all $i \in [ 1,k]$. To prove Lemma 23, it suffices to find a measure $\nu =\sum _{i=1}^N c_i \delta _{{\omega }_i}$ with $c_i > 0$ and $\nu ({\Omega }) = \mu ({\Omega })$ such that for all $i \in [1, k]$ we have
$$\begin{aligned} \int _{\Omega }g_i\,\mathrm{d}\mu = \int _{\Omega }g_i\,\mathrm{d}\nu . \end{aligned}$$
(38)
Indeed, for such $\nu $ we get $|\int _{\Omega }f_i\,\mathrm{d}\mu - \int _{\Omega }f_i\,\mathrm{d}\nu | \le \frac{{\varepsilon }}{4}(\Vert \mu \Vert + \Vert \nu \Vert ) \le \frac{{\varepsilon }}{2} < {\varepsilon }$, so $\nu \in U$. For each $i \in [1,k]$, let ${\Omega }= \cup _{j=1}^{n_i} A_i^j$ be a finite partition into disjoint measurable sets corresponding to $g_i$, i.e., $g_i = \sum _{j=1}^{n_i} a_i ^ j \chi _ {A_i^j}$. Then
$$\begin{aligned} {\Omega }= \cup _{l_1,\ldots , l_k} A_{1} ^{l_1} \cap A_2^{l_2} \cap \cdots \cap A_{k} ^{l_k} \end{aligned}$$
is a finite partition such that every $g_i$, $i \in [1, k]$, is constant on each of its cells. For each non-empty cell, set $c_{l_1 \cdots l_k} : = \mu (A_{1} ^{l_1} \cap A_2^{l_2} \cap \cdots \cap A_{k} ^{l_k})$ and let ${\omega }_{l_1 \cdots l_k}$ be a point in $A_{1} ^{l_1} \cap A_2^{l_2} \cap \cdots \cap A_{k} ^{l_k}$. Then (38) holds for $\nu = \sum _{l_1,\ldots ,l_k} c_{l_1 \cdots l_k} \delta _{{\omega }_{l_1 \cdots l_k}}$, and $\nu ({\Omega }) = \mu ({\Omega })$. Since $\mu $ is a non-negative measure, we have $c_{l_1 \cdots l_k} \ge 0$. Discarding the terms with $c_{l_1 \cdots l_k} = 0$ does not change $\nu $, so all remaining coefficients are positive. This completes the proof of the first assertion of Lemma 23.
2.
The second assertion follows immediately, since by the above construction of $\sum _{i=1}^N c_i \delta _{{\omega }_i}$ we have $\sum c_i = \mu ({\Omega })$.$\square $
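The construction in the proof of Lemma 23 can be sketched in Python for a concrete case, assumed here purely for illustration. Take ${\Omega }= [0,1)$, let $\mu $ have a piecewise-constant density with respect to Lebesgue measure, and let each $g_i$ be a step function on finitely many half-open intervals, so that all masses are exact rationals. The common refinement is formed by merging breakpoints. Each cell of positive mass contributes one atom at its left endpoint, and cells of zero mass are dropped. The helper names and the data are hypothetical.

```python
from bisect import bisect_right
from fractions import Fraction as Fr

# A step function on [0, 1): (breakpoints 0 = b_0 < ... < b_m = 1, value on each [b_j, b_{j+1})).
StepFn = tuple[list[Fr], list[Fr]]


def evaluate(fn: StepFn, x: Fr) -> Fr:
    """Value of the step function at a point x of [0, 1)."""
    breaks, vals = fn
    return vals[bisect_right(breaks, x) - 1]


def refine(fns: list[StepFn]) -> list[tuple[Fr, Fr]]:
    """Common refinement: the cells [a, b) on which every given step function is constant."""
    points = sorted(set().union(*(breaks for breaks, _ in fns)))
    return list(zip(points[:-1], points[1:]))


def integral(fn: StepFn, density: StepFn) -> Fr:
    """int fn d(mu), where mu has the given piecewise-constant density on [0, 1)."""
    return sum(evaluate(fn, a) * evaluate(density, a) * (b - a) for a, b in refine([fn, density]))


def discretize(gs: list[StepFn], density: StepFn) -> list[tuple[Fr, Fr]]:
    """Atoms (omega, c) of nu = sum c * delta_omega with int g d(nu) = int g d(mu) for all g in gs."""
    atoms = []
    for a, b in refine(gs + [density]):
        mass = evaluate(density, a) * (b - a)  # mu([a, b))
        if mass > 0:  # drop cells of measure zero, so every coefficient is positive
            atoms.append((a, mass))  # the left endpoint is a point of the cell
    return atoms


density: StepFn = ([Fr(0), Fr(1, 4), Fr(1, 2), Fr(1)], [Fr(0), Fr(2), Fr(1)])  # total mass 1
g1: StepFn = ([Fr(0), Fr(1, 3), Fr(1)], [Fr(2), Fr(-1)])
g2: StepFn = ([Fr(0), Fr(1, 4), Fr(3, 4), Fr(1)], [Fr(0), Fr(5), Fr(1)])
nu = discretize([g1, g2], density)
for g in (g1, g2):
    print(integral(g, density) == sum(c * evaluate(g, w) for w, c in nu))  # identity (38)
print(sum(c for _, c in nu) == 1)  # nu(Omega) = mu(Omega)
```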

Proof of Theorem 4

Assume that F is a metric defined on all 2-integrable statistical models $(M, {\Omega }, \mu , p)$ that satisfies the condition of Theorem 4, and let $\tilde{F}_{\Omega }$ denote the associated strongly continuous quadratic form on ${\mathcal M}({\Omega })$. Denote by $\tilde{g} ^F _{\Omega }$ the quadratic form on ${\mathcal M}({\Omega })$ that is associated with the Fisher metric $g^F$. We shall show that $\tilde{F}_{\Omega }= c\cdot \tilde{g} ^ F_{\Omega }$ for some constant c.

By Proposition 25, it suffices to consider the case that ${\Omega }$ is non-discrete. Let $\kappa _n: {\Omega }\rightarrow {\Omega }_n$ be a statistic such that $\kappa _n ({\Omega }) = {\Omega }_n$. Let us choose points ${\omega }_1,\ldots , {\omega }_n \in {\Omega }$ such that the points $\kappa _n ({\omega }_i)$ in ${\Omega }_n$ are distinct. Let us consider the following maps

$$\begin{aligned} {\Omega }_n \mathop {\rightarrow }\limits ^{i_n} {\Omega }\mathop {\rightarrow }\limits ^{\kappa _n} {\Omega }_n, \end{aligned}$$

where $i_n $ identifies $\kappa _n ({\omega }_i)$ with ${\omega }_i$ for all $i \in [1, n]$. Note that $i_n$ is also a statistic and $\kappa _n \circ i_n = \mathrm{Id}$. Let $\mu _n ^+ \in {\mathcal P}_+ ({\Omega }_n)$. Observe that $({\mathcal P}_+ ({\Omega }_n), {\Omega }_n, \mu _n^+, \mathrm{Id})$ is a 2-integrable statistical model. By the monotonicity assumption of F, and using $\kappa _n \circ i_n = \mathrm{Id}$, we conclude that the metric F defined on the 2-integrable statistical model $({\mathcal P}_+ ({\Omega }_n), {\Omega }, (i_n)_*(\mu _n^+), (i_n)_*(\mathrm{Id}))$ is defined uniquely by the metric F defined on the 2-integrable statistical model $({\mathcal P}_+ ({\Omega }_n), {\Omega }_n, \mu _n^+, \mathrm{Id})$. By Proposition 25 the metric F defined on the 2-integrable statistical model $({\mathcal P}_+ ({\Omega }_n), {\Omega }_n, \mu _n ^+, \mathrm{Id})$ coincides with the Fisher metric up to a multiplicative constant c. Hence, the restriction of $\tilde{F}_{\Omega }$ to the subspace of $ {\mathcal L}_2 ^2 ({\Omega })$

$$\begin{aligned} {\mathcal L}_2 ^2({\omega }_1,\ldots , {\omega }_n):= \left\{ [f_1, f_2, \mu _n]\in {\mathcal L}_2^2 ({\Omega })|\, \mu _n = \sum _{i=1}^n c_i \delta _{{\omega }_i}, c_i > 0 \right\} \end{aligned}$$

coincides with the restriction of $\tilde{g}^F_{\Omega }$ up to the multiplicative constant c, since $\tilde{F}_{\Omega }$ is strongly continuous.

Now we shall show that the constant c does not depend on the choice of the collection $\{{\omega }_1,\ldots , {\omega }_n\}$. Let $\{{\omega }_1 ',\ldots , {\omega }_m'\}$ be another collection of m distinct points in ${\Omega }$. Let ${\Omega }_N : =\{ {\omega }_1'', \ldots , {\omega }_N ''\}$ be the union of $\{{\omega }_1,\ldots , {\omega }_n \}$ and $\{{\omega }_1 ',\ldots , {\omega }_m'\}$. We consider the following sequence of statistics

$$\begin{aligned} {\Omega }_n\mathop {\rightarrow }\limits ^{i_{n, N}}{\Omega }_N \mathop {\rightarrow }\limits ^{i_N} {\Omega }\mathop {\rightarrow }\limits ^{\kappa _N} {\Omega }_N \mathop {\rightarrow }\limits ^{\kappa _{N, n}}{\Omega }_n, \end{aligned}$$

where $i_{n, N}$ and $i_{N}$ are the natural embeddings and $\kappa _N$ and $\kappa _{N, n}$ are sufficient statistics such that $\kappa _N \circ i_N = \mathrm{Id}$ and $\kappa _{N, n} \circ i _{n, N} = \mathrm{Id}$. By Proposition 25, the constant c that depends on $\{ {\omega }_1,\ldots , {\omega }_n\}$ equals the constant $c ''$ that depends on $\{{\omega }_1'', \ldots , {\omega }_N ''\}$. In the same way we prove that the constant $c'$ that depends on $\{{\omega }_1 ',\ldots , {\omega }_m'\}$ equals the constant $c ''$ that depends on $\{{\omega }_1'', \ldots , {\omega }_N ''\}$. Hence the constant c does not depend on the choice of $\{{\omega }_1,\ldots , {\omega }_n\}$.

We denote by ${\mathcal D}^+ ({\Omega })$ the set of all measures $\mu _n= \sum _{i=1}^n c_i \delta _{{\omega }_i}, c_i > 0 $, where ${\omega }_i \in {\Omega }$. By Lemma 23 the subset

$$\begin{aligned} {\mathcal L}^2_2 ({\Omega }, {\mathcal D}^+) : = \{[f_1, f_2, \mu ]\in {\mathcal L}^2_2({\Omega })| \, \mu \in {\mathcal D}^+({\Omega })\} \end{aligned}$$

is dense in ${\mathcal L}^2_2({\Omega })$ in the mixed topology. The restriction of $\tilde{F}_{\Omega }$ to ${\mathcal L}^2_2 ({\Omega }, {\mathcal D}^+)$ coincides with the restriction of $c\cdot \tilde{g}^F_{\Omega }$. Both $\tilde{F}_{\Omega }$ and $\tilde{g}^F_{\Omega }$ are strongly continuous, the latter by Remark 22. Hence $\tilde{F}_{\Omega }= c\cdot \tilde{g}^F_{\Omega }$ on all of ${\mathcal L}^2_2({\Omega })$, which completes the proof of Theorem 4.$\square $
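The limiting procedure in the last step can be illustrated numerically for the Fisher form itself. All choices in the following Python sketch are illustrative assumptions. Take ${\Omega }= [0,1]$ with $\mu $ Lebesgue measure, $\mu _N = \frac{1}{N}\sum _{k=0}^{N-1}\delta _{(k+1/2)/N} \in {\mathcal D}^+({\Omega })$, $f_1(x) = x$ and $f_2(x) = \cos x$. The values $\int f_1 f_2\,\mathrm{d}\mu _N$ then approach $\int _0^1 x\cos x\,\mathrm{d}x = \sin 1 + \cos 1 - 1$. This exercises only fixed bounded continuous $f_1, f_2$, not the full mixed topology.

```python
import math


def fisher_form(f1, f2, atoms):
    """int f1 * f2 d(mu_N) for a discrete measure given as a list of (omega_k, c_k)."""
    return sum(c * f1(w) * f2(w) for w, c in atoms)


def midpoint_measure(N: int):
    """mu_N = (1/N) * sum_{k<N} delta_{(k + 1/2)/N}: positive weights, total mass 1."""
    return [((k + 0.5) / N, 1.0 / N) for k in range(N)]


def identity(x):
    return x


exact = math.sin(1.0) + math.cos(1.0) - 1.0  # int_0^1 x cos(x) dx
errors = {N: abs(fisher_form(identity, math.cos, midpoint_measure(N)) - exact)
          for N in (10, 100, 1000)}
for N, err in errors.items():
    print(N, err)  # the error shrinks as mu_N -> mu
```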

References

Adams, R.A., Fournier, J.J.F. (2006). Sobolev spaces. Amsterdam: Elsevier/Academic Press.
Amari, S. (1987). Differential geometrical theory of statistics. In: Differential geometry in statistical inference. Lecture note-monograph series, (Vol. 10). California: Institute of Mathematical Statistics.
Amari, S., Nagaoka, H. (2000). Methods of information geometry. Translations of mathematical monographs (Vol. 191). Providence/Oxford: American Mathematical Society/Oxford University Press.
Ay, N., Jost, J., Lê, H. V., Schwachhöfer, L. (2015). Information geometry and sufficient statistics. Probability Theory and Related Fields, 162, 327–364. arXiv:1207.6736.
Ay, N., Jost, J., Lê, H. V., Schwachhöfer, L. Information geometry (book in preparation).
Ay, N., Olbrich, E., Bertschinger, N., Jost, J. (2011). A geometric approach to complexity. Chaos, 21, 037103.
Billingsley, P. (1999). Convergence of probability measures. New York: Wiley.
Bogachev, V.I. (2007). Measure Theory (Vol. I, II). Berlin: Springer.
Campbell, L. L. (1986). An extended Chentsov characterization of a Riemannian metric. Proceedings of the American Mathematical Society, 98, 135–141.
Chentsov, N. (1978). Algebraic foundation of mathematical statistics. Mathematische Operationsforschung und Statistik Serie Statistics, 9, 267–276.
Chentsov, N. (1982). Statistical decision rules and optimal inference. Translation of mathematical monographs (Vol. 53). Providence: American Mathematical Society.
Hamilton, R. (1982). The inverse function theorem of Nash and Moser. Bulletin of the American Mathematical Society, 7, 65–222.
Jost, J. (2005). Postmodern analysis. Berlin: Springer.
Morozova, E., Chentsov, N. (1991). Natural geometry of families of probability laws, Itogi Nauki i Techniki, Current problems of mathematics, Fundamental directions 83 (pp. 133–265). Moscow.
Neveu, J. (1965). Mathematical foundations of the calculus of probability. San Francisco: Holden-Day Inc.
Shahshahani, S. (1979). A new mathematical framework for the study of linkage and selection. Memoirs of the American Mathematical Society, 17(211).


Acknowledgments

The author thanks Shun-ichi Amari, Nihat Ay, Lorenz Schwachhöfer and Alesha Tuzhilin for valuable conversations. She is grateful to Vladimir Bogachev and Jürgen Jost for their helpful comments and suggestions. The final version of this manuscript has been greatly improved thanks to the critical and helpful suggestions of the anonymous referees. She acknowledges the VNU for Sciences in Hanoi for excellent working conditions and financial support during her visit, when part of this note was written.

Author information

Authors and Affiliations

Institute of Mathematics of ASCR, Zitna 25, 11567, Prague 1, Czech Republic
Hông Vân Lê


Corresponding author

Correspondence to Hông Vân Lê.

Additional information

H.V.L. is partially supported by RVO: 67985840.

Appendix: The Chentsov uniqueness theorem

In this appendix we recall a reformulation of the Chentsov theorem (Chentsov 1978, Theorem 11.1, p. 159) on the uniqueness of the Fisher metric. This reformulation, due to Amari and Nagaoka and stated in the language of information geometry, is Proposition 24; it is simpler than Chentsov’s original formulation in the language of category theory. In Proposition 25 we formulate a result in Ay et al. (2015) that characterizes the Fisher metric on finite sample spaces via monotonicity. Then, in Remark 26, we discuss some problems in generalizing the Chentsov theorem to a larger class of measure spaces that also contains non-discrete measure spaces.

Let us denote by ${\mathcal P}_+ ({\Omega }_n)$ the subset of ${\mathcal P}({\Omega }_n)$ that consists of positive measures.

Proposition 24

(Amari and Nagaoka 2000, Theorem 2.6, p. 38) Assume that $\{g_n\}_{n=1}^\infty $ is a sequence of Riemannian metrics, $g_n$ being defined on ${\mathcal P}_+ ({\Omega }_n)$, that are invariant with respect to sufficient statistics. That is, take any $n, m$, any $S \subset {\mathcal P}_+({\Omega }_n)$, and any $F : {\Omega }_n \rightarrow {\Omega }_m$ that is a sufficient statistic for S. Then the map $F_*: S \rightarrow F_*(S)$ preserves the metrics induced by $g_n$ and $g_m$. Under this assumption, there exists a positive real number c such that, for all n, $g_n$ coincides with the Fisher metric on ${\mathcal P}_ +({\Omega }_n)$ scaled by a factor of c.

Amari and Nagaoka do not give a proof of Proposition 24. We refer the reader to Campbell (1986) for a slight generalization of the Chentsov theorem, whose proof is close to Chentsov’s original proof. For the reader’s convenience, we recall the following monotonicity characterization of the Fisher metric on finite sample spaces.

Proposition 25

(Ay et al. 2015, Corollary 4.11) Let F be a continuous local statistical quadratic 2-form defined on statistical models associated with finite sample spaces $\{{\Omega }_n\}$ such that F is monotone under statistics. Then F coincides with the Fisher metric up to a multiplicative constant.
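For the Fisher metric itself, the monotonicity in Proposition 25 can be checked numerically. In the Python sketch below, a point of ${\mathcal P}_+({\Omega }_n)$ is a probability vector $p$, a tangent vector is a zero-sum vector $V$, and $g_p(V,V) = \sum _i V_i^2/p_i$. A statistic $\kappa : {\Omega }_n \rightarrow {\Omega }_m$ is given as a hypothetical list of image indices and acts by summing over fibers. By the Cauchy-Schwarz inequality, $g_{\kappa _*p}(\kappa _*V,\kappa _*V) \le g_p(V,V)$, with equality for injective maps such as $i_n$ in the proof of Theorem 4. The script tests both facts on random data. It illustrates monotonicity, not the uniqueness statement.

```python
import random


def pushforward(x, kappa, m):
    """kappa_*: add up the entries of x over each fiber kappa^{-1}(j), j = 0, ..., m - 1."""
    out = [0.0] * m
    for i, j in enumerate(kappa):
        out[j] += x[i]
    return out


def fisher(p, v):
    """g_p(v, v) = sum_i v_i^2 / p_i over the support of p (v vanishes off the support here)."""
    return sum(vi * vi / pi for vi, pi in zip(v, p) if pi > 0)


def random_point_and_tangent(n, rng):
    """A point of P_+(Omega_n) and a tangent vector there (entries summing to zero)."""
    p = [rng.uniform(0.05, 1.0) for _ in range(n)]
    total = sum(p)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    mean = sum(v) / n
    return [x / total for x in p], [x - mean for x in v]


def check_monotonicity(n=6, m=3, N=10, trials=500, seed=0):
    rng = random.Random(seed)
    for _ in range(trials):
        p, v = random_point_and_tangent(n, rng)
        fine = fisher(p, v)
        # a surjective statistic kappa: Omega_n -> Omega_m
        kappa = list(range(m)) + [rng.randrange(m) for _ in range(n - m)]
        rng.shuffle(kappa)
        if fisher(pushforward(p, kappa, m), pushforward(v, kappa, m)) > fine * (1 + 1e-12):
            return False
        # an injective map Omega_n -> Omega_N, playing the role of i_n
        embed = rng.sample(range(N), n)
        if abs(fisher(pushforward(p, embed, N), pushforward(v, embed, N)) - fine) > 1e-9 * (1 + fine):
            return False
    return True


print(check_monotonicity())
```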

Remark 26

1.
Chentsov defined the Fisher metric only on the positive sector ${\mathcal P}_+({\Omega }_n)$ of the space of all probability measures, because the expression for the Fisher metric in (2) is well defined only on ${\mathcal P}_+({\Omega }_n)$. In this paper we follow the approach in Ay et al. (2015) and require that an information metric F be obtained by (1) from the associated 2-form $\tilde{F}$. This form is defined not only on ${\mathcal P}_+({\Omega }_n)$ but on all of ${\mathcal M}({\Omega }_n)$ (in the general case, on ${\mathcal M}({\Omega })$), and hence on ${\mathcal P}({\Omega }_n)$ (resp. on ${\mathcal P}({\Omega })$). This small difference is important, since for a non-discrete space ${\Omega }$ we do not know how to define a notion of a positive measure without using a reference measure $\mu _0$. Since the Fisher metric $g^F$ satisfies this requirement (see Example 19), Proposition 25 is equivalent to the Chentsov uniqueness theorem. Clearly, Theorem 4 generalizes Proposition 25.
2.
As mentioned above, the original Chentsov theorem can be equivalently reformulated in terms of the associated form $\tilde{F}$. Note that the space ${\mathcal P}({\Omega }_n)$ (resp. ${\mathcal M}({\Omega }_n)$) is neither a manifold nor a manifold with boundary. It is a stratified space that admits different embeddings into Euclidean spaces. In Ay et al. (2016) and in the present paper, we do not consider smooth tensor fields on ${\mathcal P}({\Omega }_n) $ (resp. on ${\mathcal M}({\Omega }_n)$). Instead, we consider (strongly or point-wise) continuous tensor fields on ${\mathcal M}({\Omega })$, which do not require a smooth structure on ${\mathcal M}({\Omega })$.
3.
In Morozova and Chentsov (1991, §5), Morozova and Chentsov also suggested a method to extend the Chentsov uniqueness theorem to non-discrete measure spaces ${\Omega }$. Their idea is similar to that of Amari and Nagaoka: they wanted to consider a Riemannian metric on infinite measure spaces as a limit of Riemannian metrics on finite measure spaces. They did not discuss conditions under which such a limit exists; in fact, they did not define the limit of such metrics. When the limit exists, they called the metric finitely generated. They stated that the Fisher metric is the unique finitely generated metric that is invariant under sufficient statistics (resp. that is monotone). Such a Riemannian metric depends on both the base measure $\mu $ and the tangent vectors at $\mu $. One may therefore speculate that the Morozova–Chentsov approach requires a topology on the space ${\mathcal L}_2 ^2 ({\Omega })$.

About this article

Cite this article

Lê, H.V. The uniqueness of the Fisher metric as information metric. Ann Inst Stat Math 69, 879–896 (2017). https://doi.org/10.1007/s10463-016-0562-0


Received: 12 July 2013
Revised: 04 April 2016
Published: 14 May 2016
Issue Date: August 2017
DOI: https://doi.org/10.1007/s10463-016-0562-0
