Parametrized Measure Models

Ay, Nihat; Jost, Jürgen; Lê, Hông Vân; Schwachhöfer, Lorenz

doi:10.1007/978-3-319-56478-4_3


Part of the book series: Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics (MATHE3, volume 64)


Abstract

This chapter represents the most important technical achievement of this book, a combination of functional analysis and geometry as the natural framework for families of probability measures on general sample spaces. In order to work on such a sample space, one needs a base or reference measure. Other measures, like those in a parametric family, are then described by densities w.r.t. this base measure. Such a base measure, however, is not canonical, and it can be changed by multiplication with an $L^{1}$-function. But then, also the description of a parametric family by densities changes. Keeping track of the resulting functorial behavior and pulling it back to the parameter spaces of a parametric family is the key that unlocks the natural functional analytical properties of parametric families. We develop the appropriate differentiability and integrability concepts. In particular, we shall need roots (half-densities) and other fractional powers of densities. For instance, when the sample space is a differentiable manifold, its diffeomorphism group operates isometrically on the space of half-densities with their $L^{2}$-product. The latter again yields the Fisher metric. At the end of this chapter, we compare our framework with that of Pistone–Sempi which depends on an analysis of integrability properties under exponentiation.


3.1 The Space of Probability Measures and the Fisher Metric

This section has a more informal character. It introduces the basic concepts and problems of information geometry on—typically—infinite sample spaces and thereby sets the stage for the more formal considerations in the next section. The perspective here will be somewhat different from that developed in Chap. 2, as the constructions for the probability simplex presented there did not have to grapple with the measure theoretical complications that we shall encounter here. Nevertheless, the analogy with the finite-dimensional case will guide our intuition.

Let $\varOmega$ be a set with a $\sigma$-algebra $\mathfrak{B}$ of subsets;^{Footnote 1} for example, $\varOmega$ can be a topological space and $\mathfrak{B}$ the $\sigma$-algebra of Borel sets, i.e., the $\sigma$-algebra generated by the open sets. Later on, ${\varOmega }$ will also have to carry a differentiable structure.

For a signed measure $\mu$ on ${\varOmega }$, we have the total variation

$$ {\| \mu\|}_{TV} := \sup\sum _{i = 1}^{n} \bigl|\mu(A_{i})\bigr|, $$

(3.1)

where the supremum is taken over all finite partitions $\varOmega= A_{1} \uplus\dots\uplus A_{n}$ with disjoint sets $A_{i} \in\mathfrak{B}$. If ${\| \mu\|}_{TV}<\infty$, the signed measure $\mu$ is called finite. We consider the Banach space ${\mathcal {S}}({\varOmega })$ of all signed finite measures on ${\varOmega }$ with the total variation as Banach norm. The subsets of all finite non-negative measures and of probability measures on ${\varOmega }$ will be denoted by ${\mathcal {M}}({\varOmega })$ and ${\mathcal {P}}({\varOmega })$, respectively.
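For a finite sample space, the supremum in (3.1) can be checked by brute force. The following Python sketch is an illustration only and not part of the original text; the function names and the example measure are assumptions chosen here. It compares the sum of the absolute point masses with the maximum over all partitions of a four-point set.

```python
from itertools import product

def total_variation(mu):
    """Total variation of a signed measure on a finite set, given by its point masses."""
    return sum(abs(m) for m in mu)

def total_variation_bruteforce(mu):
    """Supremum in (3.1), taken over all partitions of {0, ..., n-1} into at most n blocks."""
    n = len(mu)
    best = 0.0
    # A labelling assigns each point to one of n blocks; empty blocks contribute 0.
    for labels in product(range(n), repeat=n):
        block_mass = [0.0] * n
        for point, block in enumerate(labels):
            block_mass[block] += mu[point]
        best = max(best, sum(abs(m) for m in block_mass))
    return best

# Hypothetical signed measure on a four-point sample space.
mu = [0.5, -0.25, 0.125, -1.0]
tv_direct = total_variation(mu)
tv_brute = total_variation_bruteforce(mu)
```

In this finite setting the supremum is attained by the partition into single points, so both values agree.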

The null sets of a measure $\mu$ are those subsets $A$ of $\varOmega$ with $\mu(A)=0$. A finite non-negative measure $\mu_{1}$ dominates another finite measure $\mu_{2}$ if every null set of $\mu_{1}$ is also a null set of $\mu_{2}$. Two finite non-negative measures are called compatible if they dominate each other, i.e., if they have the same null sets. Spaces of such measures will be the basis of our subsequent constructions, and we shall therefore now formalize these notions. We take some $\sigma$-finite non-negative measure $\mu_{0}$. Then

$$\begin{aligned} {\mathcal {S}}({\varOmega }, \mu_{0}) := & \bigl\{ \mu= \phi\, \mu_{0} : \phi\in L^{1}({\varOmega }, \mu_{0}) \bigr\} \end{aligned}$$

is the space of signed measures dominated by $\mu_{0}$. This space can be identified with $L^{1}({\varOmega }, \mu_{0})$ via the canonical map

$$\begin{aligned} i_{can}: {\mathcal {S}}({\varOmega }, \mu_{0}) &\to L^{1}({\varOmega }, \mu_{0}) \\ \mu&\mapsto\frac{d \mu}{ d \mu_{0}}, \end{aligned}$$

(3.2)

where the latter is the Radon–Nikodym derivative of $\mu$ w.r.t. $\mu_{0}$. Note that

$${\| \mu\|}_{TV} \; = \|\phi\|_{L^{1}(\varOmega,\mu_{0})}\; = \; { \biggl\Vert \frac {d \mu}{d \mu_{0}} \biggr\Vert }_{L^{1}(\varOmega,\mu_{0})}. $$

As this description shows, $i_{can}$ is a Banach space isomorphism, and we therefore refer to the topology of ${\mathcal {S}}({\varOmega }, \mu_{0})$ also as the $L^{1}$-topology.

If $\mu_{1}$, $\mu_{2}$ are compatible finite non-negative measures, they are absolutely continuous with respect to each other in the sense that there exists a non-negative function $\phi$ that is integrable with respect to either of them, such that

$$ \mu_{2}=\phi\mu_{1}, \quad\text{or, equivalently,} \quad \mu_{1}=\phi^{-1}\mu_{2}. $$

(3.3)

As noted before, $\phi$ is then the Radon–Nikodym derivative of $\mu _{2}$ with respect to $\mu_{1}$.

Being integrable, $\phi$ is finite almost everywhere (with respect to both $\mu_{1}$ and $\mu_{2}$) on $\varOmega$, and since the situation is symmetric between $\phi$ and $\phi^{-1}$, $\phi$ is also positive almost everywhere. Thus, for any finite non-negative measure $\mu$ on $\varOmega$, we let

$$ {\mathcal {F}}_{+}({\varOmega }, \mu): = \bigl\{ \phi \in L^{1} (\varOmega, \mu), \, \phi> 0\ \mu\text{-a.e.}\bigr\} $$

(3.4)

be the space of integrable functions on $\varOmega$ that are positive almost everywhere with respect to $\mu$. (The reason for the notation will become apparent in Sect. 3.2 below.) In fact, in later sections, we shall find it more convenient to work with the space of measures

$$ {\mathcal {M}}_{+}({\varOmega }, \mu): = \bigl\{ \phi\mu: \phi\in L^{1} ( \varOmega, \mu), \, \phi> 0\ \mu\text{-a.e.}\bigr\} $$

(3.5)

than with the space ${\mathcal {F}}_{+}({\varOmega },\mu)$ of functions. Of course, these two spaces can easily be identified, as they simply differ by multiplication with $\mu$.

In particular, the topology of ${\mathcal {S}}({\varOmega }, \mu_{0})$ is independent of the particular choice of the reference measure $\mu_{0}$ within its compatibility class, because if

$$ \phi\in L^{1}({\varOmega },\mu_{0})\quad\text{and}\quad\psi\in L^{1}({\varOmega }, \phi\mu_{0}), \quad \text{then }\psi\phi\in L^{1}({\varOmega },\mu_{0}). $$

(3.6)

Compatibility is an equivalence relation on the space of finite non-negative measures on $\varOmega$, and that space is therefore partitioned into equivalence classes. The set of such equivalence classes is quite large. For instance, the Dirac measure at any point of ${\varOmega }$ generates its own such class. More generally, in Euclidean space, we can consider Hausdorff measures of subsets of possibly different Hausdorff dimensions. In the sequel, we shall usually work within a single such compatibility class with some base measure $\mu_{0}$. The basic example that one may have in mind is of course the Lebesgue measure on a Euclidean or Riemannian domain, or else the Hausdorff measure on some fixed subset.

Any other finite non-negative measure $\mu$ that is compatible with $\mu_{0}$ is thus of the form

$$ \mu=\phi\mu_{0} \quad\text{for some} \quad\phi\in {\mathcal {F}}_{+}({\varOmega }, \mu_{0}), $$

(3.7)

and therefore, by (3.3),

$$ \mu_{0}=\phi^{-1}\mu\quad\text{with} \quad \phi^{-1} \in {\mathcal {F}}_{+}({\varOmega }, \mu). $$

(3.8)

This yields the identifications

$$ \textstyle\begin{array}{ccccc} {\mathcal {F}}_{+}({\varOmega }, \mu)& = & {\mathcal {F}}_{+}({\varOmega }, \mu_{0}) & =: & {\mathcal {F}}_{+},\\ {\mathcal {M}}_{+}({\varOmega }, \mu)& = & {\mathcal {M}}_{+}({\varOmega }, \mu_{0}) & =: & {\mathcal {M}}_{+}, \end{array} $$

(3.9)

where we use ${\mathcal {F}}_{+}$ and ${\mathcal {M}}_{+}$ if there is no ambiguity over which base measure $\mu_{0}$ is used. Moreover, if

$$\mu_{1}=\phi\mu_{0},\quad\mu_{2}=\psi \mu_{1}, \quad\text{with}\quad\phi\in {\mathcal {F}}_{+}({\varOmega }, \mu_{0}),\ \psi\in {\mathcal {F}}_{+}({\varOmega }, \mu_{1}) $$

then by (3.6)

$$\mu_{2} =\psi\phi\mu_{0} \quad\text{with}\quad\psi\phi\in {\mathcal {F}}_{+}({\varOmega }, \mu_{0}). $$

${\mathcal {F}}_{+}({\varOmega }, \mu)$, however, is not a group under pointwise multiplication since for $\phi_{1}, \phi_{2} \in L^{1}({\varOmega }, \mu_{0})$, their product $\phi_{1} \phi_{2}$ need not be in $L^{1}({\varOmega }, \mu_{0})$.

The question arises, however, whether ${\mathcal {F}}_{+}$ possesses an—ideally dense—subset that is a group, perhaps even a Lie group. That is, to what extent can we linearize the partial multiplicative group structure via an exponential map, in the same manner as $z \mapsto e^{z}$ maps the additive group $(\mathbb {R},+)$ to the multiplicative group $(\mathbb {R}^{+},\cdot)$? Formally, we might be inclined to identify the tangent space $T_{\mu} {\mathcal {F}}_{+}$ of ${\mathcal {F}}_{+}$ at any compatible $\mu$ with

$$ B_{\mu}:= \bigl\{ f:\varOmega\rightarrow\mathbb{R}\cup\{ \pm\infty\},\ e^{\pm f}\in L^{1}(\varOmega,\mu)\bigr\} . $$

(3.10)

We should point out that here we attempt to revert the construction of Sects. 2.1, 2.2. However, we immediately run into the difficulty that the set $B_{\mu}$ is in general not a vector space, but only a convex subset of $L^{1}({\varOmega }, \mu)$. Thus, a reasonable definition of $T_{\mu} {\mathcal {F}}_{+}$ would be to define it as the vector space generated by $B_{\mu}$.
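A simple worked example may make the failure of linearity concrete. It is an illustration chosen here, not taken from the original text, and assumes $\varOmega=(0,1)$ with $\mu$ the Lebesgue measure. Put $f(x):=-\frac{1}{2}\log x$. Then

$$ \int_{0}^{1} e^{f}\,dx = \int_{0}^{1} x^{-1/2}\,dx = 2, \qquad \int_{0}^{1} e^{-f}\,dx = \int_{0}^{1} x^{1/2}\,dx = \frac{2}{3}, $$

so that $f\in B_{\mu}$, whereas

$$ \int_{0}^{1} e^{2f}\,dx = \int_{0}^{1} \frac{dx}{x} = \infty, $$

so that $2f\notin B_{\mu}$. The same computation shows that $\phi_{1}=\phi_{2}=x^{-1/2}$ are integrable while their product $\phi_{1}\phi_{2}=x^{-1}$ is not.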

We now switch to working with the space ${\mathcal {M}}_{+}$ of measures instead of the space ${\mathcal {F}}_{+}$ of functions, where, as mentioned, $\phi\in {\mathcal {F}}_{+}$ corresponds to $\phi\mu\in {\mathcal {M}}_{+}$. One approach to provide some structure on ${\mathcal {M}}_{+}$ has been pursued by Pistone and Sempi who constructed a Banach norm on $T_{\mu} {\mathcal {M}}_{+}$ such that $B_{\mu}$ becomes a convex open subset. The Banach norm is—up to equivalence—independent of the choice of $\mu$ so that in this way ${\mathcal {M}}_{+}$ becomes a Banach manifold. The topology which is imposed on ${\mathcal {M}}_{+}$ in this way is called the $e$-topology. We shall describe this in detail in Sect. 3.3.

Let us return to our discussion of the analogy between the general situation and the construction of Sects. 2.1, 2.2. There the space of functions was the cotangent space of a measure $m$ in the space of measures, and the tangent and the cotangent space were then identified through a scalar product (and the scalar product chosen then was the Fisher metric). Here, we take the duality between functions and measures

$$(\phi,m)\mapsto \int\phi \,dm $$

as a starting point and vary a measure via $m \mapsto Fm$ for some non-negative function $F$. This duality will then induce the scalar product

$$\langle f,F \rangle_{m} = \int f F \,dm $$

which we shall then use to define the Fisher metric.

The construction is tied together by the exponential map

$$ \begin{aligned} \exp{:}\ T_{\mu} {\mathcal {F}}_{+} {\supseteq }B_{\mu}&\rightarrow {\mathcal {F}}_{+} \\ f &\rightarrow e^{f} \end{aligned} $$

(3.11)

that converts arbitrary functions into non-negative ones. In other words, we apply the exponential function $z \mapsto e^{z}$ to each value $f(x)$ of the function $f$.

Here, in fact, the underlying structure is even an affine one, in the following sense. When we take a measure $\mu^{\prime}= \phi\mu$ in place of $\mu$, then an exponential image $e^{f} \mu$ from $T_{\mu} {\mathcal {F}}_{+}$ is also an exponential image $e^{g} \mu^{\prime}= e^{g} \phi\mu$ from $T_{\mu^{\prime}} {\mathcal {F}}_{+}$, and the relationship is

$$ f=g +\log\phi. $$

(3.12)

Thus, $f$ and $g$ are related by adding the function $\log\phi$ which is independent of both of them.
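On a finite sample space, the affine relation (3.12) can be verified directly. The Python sketch below is illustrative only; the three-point measure and the values of $\phi$ and $g$ are assumptions made here.

```python
import math

# Hypothetical base measure mu on three points, and a positive density phi.
mu = [0.2, 0.3, 0.5]
phi = [2.0, 0.5, 1.5]
mu_prime = [p * m for p, m in zip(phi, mu)]       # mu' = phi * mu

# A tangent vector g at mu', and the corresponding f = g + log(phi) at mu, as in (3.12).
g = [0.1, -0.4, 0.7]
f = [gi + math.log(pi) for gi, pi in zip(g, phi)]

image_from_mu = [math.exp(fi) * mi for fi, mi in zip(f, mu)]                # e^f mu
image_from_mu_prime = [math.exp(gi) * mi for gi, mi in zip(g, mu_prime)]    # e^g mu'
```

Both lists describe the same measure, which is the content of (3.12).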

Also, of course,

$$ \begin{aligned} \log{:}\ {\mathcal {M}}_{+}&\rightarrow B_{\mu} {\subseteq }T_{\mu} {\mathcal {M}}_{+}\\ \phi\mu &\rightarrow\log\phi \end{aligned} $$

(3.13)

is the inverse of the exponential map.^{Footnote 2}

There is also another, likewise ultimately futile, approach to a putative tangent space $T_{\mu} {\mathcal {M}}_{+}$ of ${\mathcal {M}}_{+}$ which is based on a duality relation between ${\mathcal {M}}_{+}$ and those $f$ that are not only integrable ($L^{1}$) w.r.t. $\mu$, but even of class $L^{\infty}({\varOmega },\mu )$. This duality is given by

$$ ( f, \phi\mu) := \int f \phi \,d\mu, $$

(3.14)

which exists since $\phi\in L^{1}({\varOmega },\mu)$.

On this space, $T_{\mu}^{\infty} {\mathcal {M}}_{+}$, we then have a natural scalar product, namely the $L^{2}$-product

$$ \langle f,g\rangle_{\mu}:= \int fg\,d\mu\quad\text{for } f,g\in T_{\mu }^{\infty} {\mathcal {M}}_{+}. $$

(3.15)

However, the completion of $T_{\mu}^{\infty} {\mathcal {M}}_{+}$ with respect to the norm $\|\cdot\|$ induced by $\langle\cdot,\cdot\rangle_{\mu}$ is the Hilbert space $T_{\mu}^{2}{\mathcal {M}}_{+}$ of functions $f$ of class $L^{2}$ with respect to $\mu$, which is larger than those for which $e^{\pm f}$ are of class $L^{1}$.

Thus, by this construction, we do not quite succeed in making ${\mathcal {M}}_{+}$ into a Lie group. Nevertheless, (3.15) yields a natural Riemannian metric on ${\mathcal {M}}_{+}$ that will play an important role in the sequel. A Riemannian structure on a differentiable manifold $X$ assigns to each tangent space $T_{p}X$ a scalar product, and this product has to depend differentiably on the base point. At this point, this is only formal, however. When $\varOmega$ is not finite, spaces of measures or functions are infinite-dimensional in general because the value at every point is a degree of freedom. Only when $\varOmega$ is a finite set do we obtain a finite-dimensional measure or function space. Thus, we need to deal with infinite-dimensional manifolds, e.g., compatibility classes of measures. In Appendix C, we discuss the appropriate concept, that of a Banach manifold. As mentioned, in the present discussion, we only have a weak such structure, however, as our space is not complete w.r.t. the $L^{2}$-structure of the metric. In other words, the topology induced by the metrics on the tangent spaces is weaker than the Banach space topology that we are working with. For the moment, we shall therefore proceed in a formal way. Below, in Sect. 3.2, we shall provide a rigorous construction that avoids these issues.

The natural identification between the spaces $T_{\mu}^{2}{\mathcal {M}}_{+}$ and $T_{\mu'}^{2}{\mathcal {M}}_{+}$ (which for the moment will take the role of tangent spaces—and we shall therefore omit the superscript 2) with $\mu'=\phi \mu$ is given by

$$ f\rightarrow\frac{1}{\sqrt{\phi}}f; $$

(3.16)

we then have

$$ \biggl\langle \frac{1}{\sqrt{\phi}}f,\frac{1}{\sqrt{\phi}}g \biggr\rangle _{\mu'} = \int\frac{1}{\sqrt{\phi}}f\cdot\frac{1}{\sqrt{\phi}}g\phi \,d\mu= \int fg\,d\mu=\langle f,g \rangle_{\mu}. $$
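The invariance of the scalar product under the identification (3.16) can be checked numerically on a finite sample space. The following Python sketch is a minimal illustration; the helper name `inner` and all numerical values are assumptions introduced here.

```python
import math

def inner(f, g, mu):
    """L^2-product <f, g>_mu = sum_i f_i g_i mu_i on a finite sample space."""
    return sum(fi * gi * mi for fi, gi, mi in zip(f, g, mu))

# Hypothetical base measure, positive density, and two tangent vectors.
mu = [0.2, 0.3, 0.5]
phi = [2.0, 0.5, 1.5]
mu_prime = [p * m for p, m in zip(phi, mu)]       # mu' = phi * mu
f = [1.0, -2.0, 0.5]
g = [0.3, 0.7, -1.0]

# Transport f and g from mu to mu' via f -> f / sqrt(phi), as in (3.16).
f_transported = [fi / math.sqrt(p) for fi, p in zip(f, phi)]
g_transported = [gi / math.sqrt(p) for gi, p in zip(g, phi)]

product_at_mu = inner(f, g, mu)
product_at_mu_prime = inner(f_transported, g_transported, mu_prime)
```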

In particular, if $\varOmega$ is a differentiable manifold and

$$\kappa:\varOmega\rightarrow\varOmega $$

is a diffeomorphism, then $\mu$ is transformed into $\kappa_{*}\mu$, called the push-forward of $\mu$ under $\kappa$, with

$$ \kappa_{*}\mu(V):=\mu\bigl(\kappa^{-1}V\bigr) \quad \text{for all }V\in\mathfrak {B}. $$

(3.17)

Since $\int_{V}\frac{1}{|\det d\kappa(x)|}\mu(d(\kappa(x)))=\int_{\kappa^{-1}V}\mu(dx) $ by the transformation formula, we have

$$ \kappa_{*}\mu(y)\,\bigl|\det d\kappa(x)\bigr|=\mu(x) \quad \text{for } y=\kappa(x). $$

We thus have

$$ \langle f,g\rangle_{\kappa_{*}\mu}=\bigl\langle \kappa^{*}f,\kappa^{*}g\bigr\rangle _{\mu} $$

(3.18)

with

$$ \kappa^{*}f(x)=f\bigl(\kappa(x)\bigr). $$

In other words, if we employ this transformation rule for tangent vectors, then the diffeomorphism group of $\varOmega$ acts by isometries on ${\mathcal {M}}_{+}$ with respect to the metric given by (3.15). Thus, our metric is invariant under diffeomorphisms of $\varOmega$. One might then wish to consider the quotient of ${\mathcal {M}}_{+}$ by the action of the diffeomorphism group ${\mathcal {D}}(\varOmega)$. Of course, if $\varOmega$ is finite, then ${\mathcal {D}}(\varOmega)$ is simply the group of permutations of the elements of $\varOmega$. This group has a fixed point, namely the probability measure with

$$ p(x_{i})=\frac{1}{n} \quad\text{for } i=1,\dots,n $$

(assuming $\varOmega$ consists of the $n$ elements $x_{1},\dots,x_{n}$), and its scalar multiples. We therefore expect that the quotient ${\mathcal {M}}_{+}/{\mathcal {D}}(\varOmega)$ will have singularities.
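In the finite case, the push-forward (3.17), the invariance (3.18), and the fixed point can all be checked with a permutation. The Python sketch below is an illustration under assumptions made here: the permutation `kappa` is encoded as a list with `kappa[i]` the image of point `i`, and the helper names are hypothetical.

```python
def pushforward(kappa, mu):
    """(kappa_* mu)(V) = mu(kappa^{-1} V): the mass at i is moved to kappa[i]."""
    out = [0.0] * len(mu)
    for i, m in enumerate(mu):
        out[kappa[i]] += m
    return out

def pullback(kappa, f):
    """(kappa^* f)(x) = f(kappa(x))."""
    return [f[kappa[i]] for i in range(len(f))]

def inner(f, g, mu):
    return sum(fi * gi * mi for fi, gi, mi in zip(f, g, mu))

kappa = [2, 0, 1]                 # a cyclic permutation of three points
mu = [0.2, 0.3, 0.5]
f = [1.0, -2.0, 0.5]
g = [0.3, 0.7, -1.0]

lhs = inner(f, g, pushforward(kappa, mu))                    # <f, g>_{kappa_* mu}
rhs = inner(pullback(kappa, f), pullback(kappa, g), mu)      # <kappa^* f, kappa^* g>_mu

uniform = [1.0 / 3.0] * 3
uniform_pushed = pushforward(kappa, uniform)
```

The two scalar products agree, and the uniform measure is unchanged by the permutation.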

In the infinite case, the situation is somewhat different (although we still get singularities). Namely, if $\varOmega$ is a compact oriented differentiable manifold, then a theorem of Moser [190] says that any two probability measures $\mu,\nu$ that are volume forms, i.e., are smooth and positive on all open sets (in local coordinates $(x^{1},\dots,x^{n})$ on $\varOmega$, they are thus smooth positive multiples of $dx^{1}\wedge\cdots\wedge dx^{n}$), are related by a diffeomorphism $\kappa$,

$$\nu=\kappa_{*}\mu, $$

or equivalently,

$$\mu=\bigl(\kappa^{-1}\bigr)_{\ast}\nu=:\kappa^{*} \nu. $$

Thus, the diffeomorphism group acts transitively on the space of volume forms of total measure 1, and the quotient by the diffeomorphism group of this space is therefore a single point.

We recall that a finite non-negative measure $\mu_{1}$ dominates another finite non-negative measure $\mu_{2}$ if every null set for $\mu_{1} $ also is a null set for $\mu_{2}$. Of course, two mutually dominant finite non-negative measures are compatible.

In the finite case, i.e., for $\varOmega=\{x_{1},\dots,x_{n}\}$, $\mu_{1}$ then dominates $\mu_{2}$ if

$$ \mu_{1}(x_{i})=0 \quad\text{implies } \mu_{2}(x_{i})=0 \quad\text{for any } i=1,\dots,n. $$

In particular, a measure $\mu$ with

$$\mu(x_{i})>0 \quad\text{for all } i $$

dominates every other measure on $\varOmega$.

This is different in the infinite case; for example, if $\varOmega$ is a compact oriented differentiable manifold, then a volume form which is positive on every open set no longer dominates a Dirac measure supported at some point in $\varOmega$.

In information geometry, one wishes to study only probability measures, that is, measures that satisfy the normalization

$$ \mu(\varOmega)= \int_{\varOmega}\,d\mu= 1. $$

(3.19)

It might seem straightforward to simply impose this condition upon measures and then study the space of those measures satisfying it as a subspace of the space of all measures. A somewhat different point of view, however, emerges from the following consideration. The normalization (3.19) can simply be achieved by rescaling a given measure $\mu$, that is, by multiplying it by an appropriate $\lambda > 0$, namely $\lambda = \mu(\varOmega)^{-1}$. The freedom of rescaling a measure now expresses that we are not interested in absolute “sizes” $\mu(A)$ of subsets of $\varOmega$, but rather only in relative ones, like $\frac{\mu(A)}{\mu(\varOmega)}$ or $\frac{\mu(A_{1})}{\mu(A_{2})}$. Therefore, we identify the space ${\mathcal {P}}({\varOmega })$ of probability measures on ${\varOmega }$ as the projective space

$$\mathbb {P}^{1}{\mathcal {M}}(\varOmega), $$

i.e., the space of all equivalence classes in ${\mathcal {M}}(\varOmega)$ under multiplication by positive real numbers. Of course, elements of ${\mathcal {P}}(\varOmega)$ can be considered as measures satisfying (3.19), but more appropriately as equivalence classes of measures giving the same relative sizes of subsets of $\varOmega$.

Our above metric then also induces a metric on ${\mathcal {P}}(\varOmega)$ as a quotient of ${\mathcal {M}}(\varOmega)$ which is different from the one obtained by identifying $\mathbb {P}^{1}{\mathcal {M}}(\varOmega)$ with the subspace of ${\mathcal {M}}(\varOmega) $ consisting of measures satisfying (3.19). Let us recall from Proposition 2.1 in Sect. 2.2 the case where $\varOmega$ is finite, $\varOmega=\{1,\dots,n\}$. In that case, the probability measures on $\varOmega$ are given by

$$ \varSigma^{n-1}:=\Biggl\{ (p_{1},\dots,p_{n}):p_{i} \ge0 \ \text{for } i=1,\dots,n, \text{ and } \sum^{n}_{i=1}p_{i}=1 \Biggr\} . $$

These form an ($n-1$)-dimensional simplex in the positive cone $\mathbb {R}^{n}_{+}$ of $\mathbb {R}^{n}$. The projective space

$$\mathbb {P}^{1} \mathbb {R}^{n}_{+}, $$

however, is naturally identified with the corresponding spherical sector

$$ S^{n-1}_{+}:=\Biggl\{ (z_{1},\dots,z_{n}):z_{i} \ge0 \text{ for } i=1,\dots,n, \ \sum^{n}_{i=1}z_{i}^{2}=1 \Biggr\} . $$

There is a natural bijection

$$\begin{aligned} \begin{aligned} \varSigma^{n-1}&\rightarrow S^{n-1}_{+}\\ (p_{1},\dots,p_{n})&\mapsto(\sqrt{p_{1}},\dots, \sqrt{p_{n}}). \end{aligned} \end{aligned}$$

(3.20)

Let us now try to carry this over to the infinite-dimensional case, under the assumption that $\varOmega$ is a differentiable manifold of dimension $n$. In that case, we can consider each (Radon) measure $\mu$ as a density. This means that for each $x\in\varOmega$, if we consider the space $\mathit{Gl}(n,\mathbb {R})$ as the space of all bases of $T_{x}\varOmega$ (again, the identification is not canonical as we need to select one basis $V=(v^{1},\dots,v^{n})$ that is identified with $\operatorname{id}\in \mathit{Gl}(n,\mathbb {R})$),^{Footnote 3}

$$ \mu(x) (XV) = |\det X|\mu(x) (V). $$

Likewise, we call $\rho$ a half-density, if we have

$$ \rho(x) (XV) = |\det X|^{1/2}\rho(x) (V) $$

for all $X\in \mathit{Gl}(n,\mathbb {R})$ and bases $V$. Below, in Definition 3.54, we shall give a precise definition of the space of half-densities (in fact, of any $r$th power of a measure with $0 < r \leq1$).

In this interpretation, our above $L^{2}$-product on ${\mathcal {M}}_{+}({\varOmega })$ becomes an $L^{2}$-product on the space of half-densities

$$\langle\rho,\sigma\rangle:= \int_{\varOmega}\rho\sigma, $$

where we no longer need a base measure $\mu$. And the diffeomorphism group of $\varOmega$ then acts by isometries on the Hilbert space of half-densities of class $L^{2}$.

If we now have a probability measure $\mu$, then its square root $\sqrt{\mu}$ is a half-density that is contained in the unit sphere of that Hilbert space. Conversely, up to the issue of regularity, the part of that unit sphere that corresponds to non-negative half-densities can be identified with the probability measures on $\varOmega$. As mentioned, it carries a metric that is invariant under the action of the diffeomorphism group of $\varOmega$, i.e., under relabeling of the points of $\varOmega$.

The duality relation (3.14) for a probability measure $\mu' =\phi \mu$ then becomes

$$ \bigl( f,\mu' \bigr) = {\mathbb {E}}_{\mu'}(f), $$

(3.21)

the expectation value of $f$ w.r.t. the probability measure $\mu'$. This is a linear operation on the space of functions $f$ and an affine operation on the space of probability measures $\mu'$.

Let us clarify the relation with our previous construction of the metric: If $f$ is a tangent vector to a probability measure $\mu$, then it generates the curve

$$ e^{tf}\mu, \quad t\in\mathbb {R}, $$

through $\mu$. By taking the square root as before, we obtain the curve

$$ e^{\frac{1}{2}tf}\sqrt{\mu}, \quad t\in\mathbb {R}, $$

in the space of half-densities that has the expansion in $t$

$$ \sqrt{\mu}+\frac{1}{2}tf\sqrt{\mu}+O\bigl(t^{2}\bigr). $$

Thus, the tangent vector that corresponds to $f$ in the space of half-densities is

$$\frac{1}{2}f\sqrt{\mu}. $$

The inner product of two such tangent vectors is

$$ \int\frac{1}{2}f\sqrt{\mu}\cdot\frac{1}{2}g\sqrt{\mu}= \frac{1}{4} \int fg\mu. $$

Thus, up to the inessential factor $\frac{1}{4}$, we regain our original Riemannian metric on ${\mathcal {M}}_{+}({\varOmega })$. In order to eliminate that factor, in Proposition 2.1, we had used the sphere with radius 2 instead of that of radius 1. Therefore, we should also modify (3.20) in the same manner if we want to have an isometry.
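The factor $\frac{1}{4}$ can be observed numerically on a finite sample space. The Python sketch below is illustrative only; it differentiates the square-root curve by a central finite difference, and the measure, the tangent vectors, and the step size `h` are assumptions made here.

```python
import math

# Hypothetical measure on three points and two tangent vectors f, g.
mu = [0.2, 0.3, 0.5]
f = [1.0, -0.5, 0.25]
g = [0.4, 0.8, -1.2]

def half_density_curve(t, direction):
    """The curve e^{t f / 2} sqrt(mu) in the space of half-densities."""
    return [math.exp(0.5 * t * d) * math.sqrt(m) for d, m in zip(direction, mu)]

def tangent(direction, h=1e-6):
    """Central finite difference of the curve at t = 0; approximates (1/2) f sqrt(mu)."""
    plus = half_density_curve(h, direction)
    minus = half_density_curve(-h, direction)
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]

euclidean = sum(a * b for a, b in zip(tangent(f), tangent(g)))   # product of half-density tangents
fisher = sum(a * b * m for a, b, m in zip(f, g, mu))             # <f, g>_mu as in (3.15)
```

Up to discretization error, `4 * euclidean` agrees with `fisher`.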

Let us now translate this into the usual statistical interpretation: We have a family $p(x;s)$ of probability measures depending on a parameter $s$, $-\varepsilon< s<\varepsilon$. Then the squared norm of the tangent vector to this family at $s=0$ is (up to some factor 4)

$$\begin{aligned} &4 \int\frac{d}{ds}\sqrt{p(x;s)}\frac{d}{ds} \sqrt{p(x;s)}|_{s=0}\,dx \\ &\quad = \int\frac{d}{ds}\log p(x;s)\frac{d}{ds}\log p(x;s)p(x;0)|_{s=0}\,dx \\ &\quad= {\mathbb {E}}_{p} \biggl( \biggl(\frac{d}{ds}\log p(x;s) \biggr)^{2}\bigg|_{s=0} \biggr), \end{aligned}$$

(3.22)

where ${\mathbb {E}}_{p}$ denotes the expectation with respect to the probability measure $p=p(\cdot;0)$. By polarization, if $s=(s_{1},\dots,s_{n})$ is now $n$-dimensional, we obtain the Fisher information metric

$$ {\mathbb {E}}_{p} \biggl(\frac{\partial}{\partial s_{\mu}}\log p(x;s) \frac{\partial}{\partial s_{\nu}}\log p(x;s) \biggr). $$

(3.23)

We can also rewrite the above formula to obtain

$$\begin{aligned} &{\mathbb {E}}_{p} \biggl(\frac{\partial}{\partial s_{\mu}}\log p(x;s) \frac{\partial}{\partial s_{\nu}}\log p(x;s) \biggr)\\ &\quad = \int\frac{\partial}{\partial s_{\mu}}\log p(x;s) \frac{\partial}{\partial s_{\nu}}\log p(x;s)p(x;0)\,dx \\ &\quad = - \int\frac{\partial^{2}}{\partial s_{\mu}\partial s_{\nu}}\log p(x;s)p(x;0)\,dx \end{aligned}$$

(3.24)

since $\int\frac{\partial}{\partial s_{\mu}}\log p(x;s)\,p(x;s)\,dx =\int\frac{\partial}{\partial s_{\mu}}p(x;s)\,dx=\frac{\partial}{\partial s_{\mu}}\int p(x;s)\,dx=\frac {\partial}{\partial s_{\mu}}1=0$ for all $s$, which implies

$$\begin{aligned} 0&=\frac{\partial}{\partial s_{\nu}} \int\frac{\partial}{\partial s_{\mu}}\log p(x;s)\;p(x;s)\,dx = \int\frac{\partial^{2}}{\partial s_{\mu}\partial s_{\nu}}\log p(x;s)\;p(x;s)\,dx \\ &\quad{}+ \int \frac{\partial}{\partial s_{\mu}}\log p(x;s)\; \frac{\partial}{\partial s_{\nu}}\log p(x;s)\;p(x;s)\,dx. \end{aligned}$$

This step also admits the following interpretation. The function

$$ \frac{\partial}{\partial s_{\nu}}\log p(x;s) $$

(3.25)

is called the score of the family with respect to the parameter $s_{\nu}$. Our above computation then gives

$$ {\mathbb {E}}_{p} \biggl(\frac{\partial}{\partial s_{\nu}}\log p(x;s) \biggr)=0, $$

(3.26)

that is, the expectation value of the score vanishes. (This expresses the fact that the cross-entropy

$$ -\int p(x)\log q(x)\,dx $$

(3.27)

is minimal w.r.t. $q$ precisely for $q=p$.)
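The minimality of the cross-entropy at $q=p$ can be spot-checked on a finite sample space. The following Python sketch is an illustration, not a proof; the distribution `p` and the candidate distributions are assumptions chosen here.

```python
import math

def cross_entropy(p, q):
    """-sum_i p_i log q_i, the finite analogue of (3.27)."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q))

p = [0.2, 0.3, 0.5]
candidates = [
    [1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0],
    [0.25, 0.25, 0.5],
    [0.19, 0.31, 0.5],
    [0.5, 0.3, 0.2],
]

entropy_of_p = cross_entropy(p, p)
gaps = [cross_entropy(p, q) - entropy_of_p for q in candidates]
```

Each gap is the Kullback-Leibler divergence of $q$ from $p$ and is strictly positive for $q\neq p$.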

The Fisher metric (3.22) then expresses the covariance matrix of the score.

Returning to (3.24), (3.26) yields the formula

$$\begin{aligned} &{\mathbb {E}}_{p} \biggl(\frac{\partial}{\partial s_{\mu}}\log p(x;s)\frac{\partial}{\partial s_{\nu}}\log p(x;s) \biggr) \\ &\quad=-{\mathbb {E}}_{p} \biggl(\frac{\partial^{2}}{\partial s_{\mu}\partial s_{\nu}}\log p(x;s) \biggr) \end{aligned}$$

(3.28)

as another representation of the Fisher metric (3.23).
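For a concrete one-parameter family, (3.26) and (3.28) can be verified in closed form. The Python sketch below uses the Bernoulli family $p(1;s)=s$, $p(0;s)=1-s$ on $\varOmega=\{0,1\}$; this family and the function names are assumptions chosen here for illustration.

```python
def prob(x, s):
    """Bernoulli family on {0, 1}: p(1; s) = s, p(0; s) = 1 - s."""
    return s if x == 1 else 1.0 - s

def score(x, s):
    """d/ds log p(x; s)."""
    return x / s - (1 - x) / (1.0 - s)

def second_derivative(x, s):
    """d^2/ds^2 log p(x; s)."""
    return -x / s**2 - (1 - x) / (1.0 - s)**2

def expectation(fn, s):
    return sum(prob(x, s) * fn(x, s) for x in (0, 1))

s = 0.3
mean_score = expectation(score, s)                                   # (3.26)
fisher_from_score = expectation(lambda x, t: score(x, t)**2, s)      # left-hand side of (3.28)
fisher_from_hessian = -expectation(second_derivative, s)             # right-hand side of (3.28)
```

Both expressions for the Fisher information equal $\frac{1}{s(1-s)}$ for this family.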

We can also write our metric as

$$ \int\frac{1}{p(x;0)}\frac{\partial}{\partial s_{\mu}}p(x;s)\frac{\partial}{\partial s_{\nu}}p(x;s)\,dx. $$

In the finite case, this becomes

$$ \sum^{n}_{i=1}\frac{1}{p_{i}} \frac{\partial}{\partial s_{\mu}}p_{i}\frac{\partial}{\partial s_{\nu}}p_{i}. $$

Remark 3.1

This metric is called the Shahshahani metric in mathematical biology, see Sect. 6.2.1.

As verified in Proposition 2.1, this is simply the metric obtained on the simplex $\varSigma^{n-1}$ when identifying it with the spherical sector $S^{n-1}_{2,+}$ via the map $4p=q^{2}$, $q\in S^{n-1}_{2,+} $. If the second derivatives $\frac{\partial^{2}}{\partial s_{\mu}\partial s_{\nu}}p $ vanish, i.e., if $p(x;s)$ is linear in $s$, then

$$ \sum^{n}_{i=1}\frac{1}{p_{i}} \frac{\partial}{\partial s_{\mu}}p_{i}\frac{\partial}{\partial s_{\nu}}p_{i}= \frac{\partial^{2}}{\partial s_{\mu}\partial s_{\nu}}\sum^{n}_{i=1}p_{i} \log p_{i}. $$

As will be discussed below, this means that the negative of the entropy is a potential for the metric. This will be applied in Theorem 6.4.
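The potential property can be checked numerically for a family that is linear in $s$. The Python sketch below is a hedged illustration: the base point `p0`, the directions `v`, and the finite-difference step `h` are assumptions introduced here, and the Hessian is only approximated.

```python
import math

# Hypothetical linear family p(s) = p0 + s_1 v_1 + s_2 v_2 on three points.
p0 = [0.2, 0.3, 0.5]
v = [[1.0, -1.0, 0.0], [0.0, 1.0, -1.0]]     # each direction sums to zero

def point(s):
    return [p0[i] + s[0] * v[0][i] + s[1] * v[1][i] for i in range(3)]

def neg_entropy(s):
    return sum(pi * math.log(pi) for pi in point(s))

def fisher(s):
    p = point(s)
    return [[sum(v[a][i] * v[b][i] / p[i] for i in range(3)) for b in range(2)]
            for a in range(2)]

def hessian(fn, s, h=1e-4):
    """Central second differences of fn at s."""
    def shifted(a, sa, b, sb):
        t = list(s)
        t[a] += sa * h
        t[b] += sb * h
        return fn(t)
    return [[(shifted(a, 1, b, 1) - shifted(a, 1, b, -1)
              - shifted(a, -1, b, 1) + shifted(a, -1, b, -1)) / (4.0 * h * h)
             for b in range(2)] for a in range(2)]

s = [0.01, -0.02]
metric = fisher(s)
potential_hessian = hessian(neg_entropy, s)
```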

The Fisher metric then induces a metric on any smooth family of probability measures on $\varOmega$. To understand the Fisher metric, it is often useful to write a probability distribution in exponential form,

$$ p(x;s)=\exp\bigl(-H(x,s)\bigr), $$

(3.29)

where the normalization required for $\int p(x;s) \,dx =1$ is supposed to be contained in $H$. The Fisher metric is then simply given by

$$ {\mathbb {E}}_{p} \biggl(\frac{\partial}{\partial s_{\mu}}\log p(x;s) \frac{\partial }{\partial s_{\nu}}\log p(x;s) \biggr)= {\mathbb {E}}_{p} \biggl(\frac{\partial H}{\partial s_{\mu}} \frac{\partial H}{\partial s_{\nu}} \biggr)= \int\frac {\partial H}{\partial s_{\mu}}\frac{\partial H}{\partial s_{\nu}}p(x;s) \,dx. $$

(3.30)

Particularly important in this regard are the so-called exponential families (cf. Definition 2.10).

Definition 3.1

An exponential family is a family of probability distributions of the form

$$ p(x;\vartheta)=\exp\Biggl(\gamma(x)+\sum _{i=1}^{n}f_{i}(x)\vartheta^{i}- \psi (\vartheta)\Biggr)\mu(x), $$

(3.31)

where $\vartheta=(\vartheta^{1},\dots,\vartheta^{n})$ is an $n$-dimensional parameter, $\gamma(x)$ and $f_{1}(x),\dots,f_{n}(x)$ are functions and $\mu(x) $ is a measure on $\varOmega$.

(Of course, $\gamma$ could be absorbed into $\mu$, but this would be inconvenient for our subsequent discussion of examples.) The function $\psi$ simply serves to guarantee the normalization

$$\int_{\varOmega}p(x;\vartheta)=1; $$

namely

$$ \psi(\vartheta)=\log \int\exp\Bigl(\gamma(x)+\sum f_{i}(x) \vartheta^{i}\Bigr)\mu(dx). $$

Here, the family is defined only for those $\vartheta$ for which

$$ \int\exp\Bigl(\gamma(x)+\sum f_{i}(x) \vartheta^{i}\Bigr)\mu(dx)< \infty. $$

The set of those $\vartheta$ for which this is satisfied is convex, but can otherwise be quite complicated.

Exponential families will yield important examples of the parametrized measure models introduced in Sect. 3.2.4.

Example 3.1

The normal distribution $\mathcal{N}(\mu,\sigma^{2})=\frac{1}{\sqrt{2\pi}\,\sigma}\exp(-\frac{(x-\mu )^{2}}{2\sigma^{2}})$ on $\mathbb{R}$, with Lebesgue measure $dx$ and parameters $\mu$, $\sigma$, can easily be written in this form by putting

$$\begin{aligned} \gamma(x)&=0, \quad f_{1}(x)=x, \quad f_{2}(x)=x^{2}, \quad \vartheta^{1}=\frac{\mu}{\sigma^{2}},\quad\vartheta^{2}=- \frac{1}{2\sigma ^{2}}, \\ \psi(\vartheta)&=\frac{\mu^{2}}{2\sigma^{2}}+\log \bigl(\sqrt{2\pi}\,\sigma\bigr)=- \frac{(\vartheta^{1})^{2}}{4\vartheta^{2}}+\frac {1}{2}\log \biggl(-\frac{\pi}{\vartheta^{2}} \biggr), \end{aligned}$$

and analogously for multivariate normal distributions $\mathcal {N}(y,\varLambda)$, i.e., Gaussian distributions on $\mathbb {R}^{n}$. See [169] for a systematic analysis.
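The parametrization of Example 3.1 can be checked pointwise. The Python sketch below is illustrative only; the function names and the test values of $\mu$, $\sigma$, and $x$ are assumptions made here. It compares the usual Gaussian density with the exponential-family form (3.31) for $\gamma=0$, $f_{1}(x)=x$, $f_{2}(x)=x^{2}$.

```python
import math

def normal_density(x, mu, sigma):
    """Density of N(mu, sigma^2) with respect to Lebesgue measure."""
    return math.exp(-(x - mu)**2 / (2.0 * sigma**2)) / (math.sqrt(2.0 * math.pi) * sigma)

def natural_params(mu, sigma):
    """theta^1 = mu / sigma^2, theta^2 = -1 / (2 sigma^2)."""
    return mu / sigma**2, -1.0 / (2.0 * sigma**2)

def psi(t1, t2):
    """Normalizing function psi(theta) of Example 3.1."""
    return -t1**2 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def expfam_density(x, t1, t2):
    """exp(f_1(x) theta^1 + f_2(x) theta^2 - psi(theta)) with f_1 = x, f_2 = x^2."""
    return math.exp(x * t1 + x**2 * t2 - psi(t1, t2))

mu, sigma = 0.7, 1.3
t1, t2 = natural_params(mu, sigma)
xs = [-2.0, -0.5, 0.0, 0.7, 3.1]
differences = [abs(normal_density(x, mu, sigma) - expfam_density(x, t1, t2)) for x in xs]
```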

For an exponential family, we have

$$ \frac{\partial}{\partial\vartheta^{i}}\log p(x;\vartheta)= f_{i}(x)- \frac{\partial}{\partial\vartheta^{i}}\psi(\vartheta) $$

(3.32)

and

$$ \frac{\partial^{2}}{\partial\vartheta^{i} \partial \vartheta^{j}}\log p(x;\vartheta)= -\frac{\partial^{2}}{\partial\vartheta^{i}\partial \vartheta^{j}} \psi(\vartheta). $$

(3.33)

This expression no longer depends on $x$, but only on the parameter $\vartheta$. Therefore, the Fisher metric on such a family is given by

$$\begin{aligned} g_{ij}(p)&=-{\mathbb {E}}_{p} \biggl( \frac{\partial^{2}}{\partial\vartheta^{i}\partial \vartheta^{j}}\log p(x;\vartheta) \biggr) \\ &= \int\frac{\partial^{2}}{\partial\vartheta^{i}\partial \vartheta^{j}}\psi(\vartheta)\;p(x;\vartheta)\,dx \\ &=\frac{\partial^{2}}{\partial\vartheta^{i}\partial \vartheta^{j}}\psi(\vartheta) \quad\text{since } \int p(x;\vartheta)\,dx =1. \end{aligned}$$

(3.34)

For the normal distribution, we compute the metric in terms of $\vartheta^{1}$ and $\vartheta^{2} $ using (3.33), transform the result to the variables $\mu$ and $\sigma$ with (B.16), and obtain at $\mu=0$

$$\begin{aligned} g \biggl(\frac{\partial}{\partial\mu},\frac{\partial}{\partial \mu} \biggr)&= \frac{1}{\sigma^{2}}, \end{aligned}$$

(3.35)

$$\begin{aligned} g \biggl(\frac{\partial}{\partial\mu},\frac{\partial}{\partial \sigma} \biggr)&=0, \end{aligned}$$

(3.36)

$$\begin{aligned} g \biggl(\frac{\partial}{\partial\sigma},\frac{\partial}{\partial \sigma} \biggr)&=\frac{2}{\sigma^{2}}. \end{aligned}$$

(3.37)

As the Fisher metric is invariant under diffeomorphisms of $\varOmega=\mathbb {R}$, and since $x\rightarrow x-\mu$ is such a diffeomorphism, it suffices to perform the computation at $\mu=0$. The metric computed there, however, up to a simple scaling is the hyperbolic metric of the half-plane

$$H:=\bigl\{ (\mu,\sigma):\mu\in\mathbb {R},\sigma>0\bigr\} , $$

and so, the Fisher metric on the family of normal distributions is the hyperbolic metric.
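As a numerical cross-check of (3.35)–(3.37), one may evaluate the Hessian of $\psi$ from (3.34) and transform it to the coordinates $(\mu,\sigma)$. The sketch below uses central finite differences for the Hessian and the Jacobian of $(\mu,\sigma)\mapsto(\vartheta^{1},\vartheta^{2})$ for the change of coordinates; it is our own illustration of the tensor transformation rule, not the formula (B.16) itself, and the function names are hypothetical.

```python
import math

def psi(t1, t2):
    """Log-partition function of the normal family in natural coordinates (t2 < 0)."""
    return -t1 * t1 / (4.0 * t2) + 0.5 * math.log(-math.pi / t2)

def hessian_psi(t1, t2, h=1e-4):
    """Central finite-difference Hessian of psi, i.e. the Fisher metric of (3.34)."""
    d11 = (psi(t1 + h, t2) - 2 * psi(t1, t2) + psi(t1 - h, t2)) / h**2
    d22 = (psi(t1, t2 + h) - 2 * psi(t1, t2) + psi(t1, t2 - h)) / h**2
    d12 = (psi(t1 + h, t2 + h) - psi(t1 + h, t2 - h)
           - psi(t1 - h, t2 + h) + psi(t1 - h, t2 - h)) / (4 * h**2)
    return [[d11, d12], [d12, d22]]

def fisher_mu_sigma(mu, sigma):
    """Fisher metric in the coordinates (mu, sigma), obtained as J^T H J."""
    t1, t2 = mu / sigma**2, -1.0 / (2.0 * sigma**2)
    H = hessian_psi(t1, t2)
    # J[a][i] = d(theta^a) / d(xi^i) with xi = (mu, sigma)
    J = [[1.0 / sigma**2, -2.0 * mu / sigma**3],
         [0.0, 1.0 / sigma**3]]
    return [[sum(J[a][i] * H[a][b] * J[b][j] for a in range(2) for b in range(2))
             for j in range(2)] for i in range(2)]
```

Up to discretization error, the returned matrix should be $\operatorname{diag}(1/\sigma^{2}, 2/\sigma^{2})$, also for $\mu\neq0$, in accordance with the invariance argument given above.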

Let us summarize some points that will be important for the sequel. We have constructed the Fisher metric as the natural Riemannian metric in the space of lines of (finite non-negative) measures, i.e., on a projective space over a linear space. In the finite case, this projective space is simply a spherical sector. In particular, our metric is then the standard metric on the sphere, and it therefore has sectional curvature $\kappa\equiv 1$ (or, more precisely, $\frac{1}{4}$ if we utilize the sphere of radius 2, to get the normalizations right). This, in fact, carries over to the general case (see [99] for an explicit computation). Therefore, the Fisher metric is not Euclidean. By way of contrast, our space of probability measures can be viewed as a linear space in two different manners. On the one hand, as in the finite case, it can be represented as a simplex in a vector space. Thus, any probability measure can be represented as a convex linear combination of certain extremal measures. More precisely, when ${\varOmega }$ is a metrizable topological space, then the map ${\varOmega }\to\mathcal{P}({\varOmega })$, $x \mapsto \delta(x)$ that assigns to every $x \in {\varOmega }$ the delta measure supported at $x$ is an embedding. If ${\varOmega }$ is also separable, then the image is a closed subspace of $\mathcal{P}({\varOmega })$. Also, in this case, this image contains precisely the extreme points of the convex set $\mathcal{P}({\varOmega })$. See [5], Sect. 15.2, for details.

We shall call this representation as a convex linear combination of extremal measures a mixture representation. On the other hand, our space of probability measures can be represented as the exponential image of a linear tangent space. This gives the so-called exponential representation. We shall see below that these two linear structures are dual to each other, in the sense that each of them is the underlying affine structure for some connection, and the two corresponding connections are dual with respect to the Fisher metric. Of course, neither of these connections can be the Levi-Civita connection of the Fisher metric as the latter does not have vanishing curvature.

The Fisher metric also allows the following construction: If $\varSigma$ is a set with a measure $\sigma(u)$ on it, and if we have a mapping

$$h:\varSigma\rightarrow {\mathcal {P}}(\varOmega), $$

i.e., if we have a family of measures on $\varOmega$ parametrized by $u\in\varSigma$, we may then consider variations

$$h(u;s):\varSigma\times(-\varepsilon,\varepsilon)\rightarrow {\mathcal {P}}(\varOmega), $$

e.g.,

$$h(u;s)=\exp_{h(u;0)}s\varphi(u) $$

for some function $\varphi$.

If we have two such variations $\varphi^{1}$, $\varphi^{2}$, we can use the Fisher metric and the measure $\sigma$ to form their $L^{2}$-product

$$\begin{aligned} &\int_{\varSigma} {\mathbb {E}}_{h(u)}\bigl(\varphi^{1}, \varphi^{2}\bigr)\sigma(du) \\ &\quad= \int_{\varSigma} \biggl( \int_{\varOmega}\varphi^{1}(u) (x)\;\varphi^{2}(u) (x)\; h(u) (dx) \biggr)\;\sigma(du). \end{aligned}$$

In other words, we integrate the Fisher product with respect to the measure on our family of measures. The Fisher metric can formally be considered as a special case of this construction, namely when $\varSigma$ is a singleton. The general construction allows us to average over a family of measures. For example, if we have a Markov process with transition probability $p(\cdot|y)$ for each $y\in\varOmega$, we consider this as a family $h:\varOmega\rightarrow {\mathcal {P}}(\varOmega)$, with $h(y)=p(\cdot|y)$, and we may average the Fisher products taken with respect to the measures $p(\cdot|y)$ with respect to some initial probability distribution $p_{0}(y) $ for $y$. Thus, if we have families of such transition probabilities $p(\cdot|y;s)$, we get the averaged Fisher product

$$ g_{ij}(s) := \int \biggl( \int\frac{\partial}{\partial s^{i}}\log p(x|y;s) \frac{\partial}{\partial s^{j}} \log p(x|y;s)\,p(x|y;s)\,dx \biggr)p_{0}(dy). $$

(3.38)
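For a finite state space the averaged Fisher product becomes a double sum, in which each inner Fisher product is taken with respect to the measure $p(\cdot|y;s)$ and then weighted by $p_{0}(y)$. The following sketch assumes a hypothetical two-state kernel with a one-dimensional parameter $s$; the kernel, its offsets and all function names are illustrative assumptions and not part of the text.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def transition(x, y, s, offsets=(-0.5, 1.0)):
    """Hypothetical two-state kernel p(x|y;s) with P(x=1|y) = sigmoid(s + offsets[y])."""
    q = sigmoid(s + offsets[y])
    return q if x == 1 else 1.0 - q

def averaged_fisher(s, p0, h=1e-5):
    """Discrete analogue of the averaged Fisher product for a scalar parameter s."""
    total = 0.0
    for y, weight in enumerate(p0):
        inner = 0.0
        for x in (0, 1):
            # central difference for the score d/ds log p(x|y;s)
            score = (math.log(transition(x, y, s + h))
                     - math.log(transition(x, y, s - h))) / (2 * h)
            inner += score * score * transition(x, y, s)
        total += weight * inner
    return total
```

For this particular kernel the inner sum equals $q_{y}(1-q_{y})$ with $q_{y}=p(1|y;s)$, which gives a simple closed form to compare against. Concentrating $p_{0}$ on a single state gives the Fisher product of that state's transition probability alone.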

Under suitable conditions, this metric is usually obtained from the distributions

$$p^{(n)} (x_{0}, x_{1}, \dots,x_{n} ; s) := p_{0}(x_{0} ; s) p(x_{1} | x_{0} ; s) \cdots p(x_{n} | x_{n - 1}; s), \quad n = 0,1,2, \dots $$

Denoting the Fisher metric for $s \mapsto p^{(n)}(\cdot; s)$ by $g^{(n)}_{i j}$, we can consider the limit

$$ \overline{g}_{ij}(s) := \lim _{n \to\infty} \frac{1}{n} \, g^{(n)}_{ij}(s), $$

(3.39)

if it exists. A sufficient condition for the existence of this limit is given by the stationarity of the process. In that case, the metric $\overline{g}$ defined by (3.39) reduces to the metric $g$ defined by (3.38). We shall take up this issue in Sect. 6.1.

3.2 Parametrized Measure Models

In this section, we shall give the formal definitions of parametrized measure models, providing solutions to some of the issues described in Sect. 3.1, improving upon [26]. First of all, the intuitive definition of a statistical model is to regard it as a family $(\mathbf{p}(\xi))_{\xi \in M}$ of probability measures on some sample space ${\varOmega }$ which varies in a differentiable fashion with $\xi\in M$. To make this formal, we need to provide some kind of differentiable structure on the space ${\mathcal {P}}({\varOmega })$ of probability measures. This is done by noting that ${\mathcal {P}}({\varOmega })$ is contained in the Banach space ${\mathcal {S}}({\varOmega })$ of finite signed measures on ${\varOmega }$, provided with the Banach norm $\|\cdot\|_{TV}$ of total variation from (3.1). Therefore, we shall regard a statistical model as a $C^{1}$-map between Banach manifolds $\mathbf{p}: M \to {\mathcal {S}}({\varOmega })$, as described in Appendix C, whose image is contained in ${\mathcal {P}}({\varOmega })$.

Since ${\mathcal {P}}({\varOmega })$ may also be regarded as the projectivization of the space of finite measures ${\mathcal {M}}({\varOmega })$ via rescaling, any $C^{1}$-map $\mathbf{p}: M \to {\mathcal {M}}({\varOmega })$ induces a statistical model $\mathbf{p}_{0}: M \to {\mathcal {P}}({\varOmega })$ by

$$ {\mathbf{p}}_{0}(\xi) = \dfrac{\mathbf{p}(\xi)}{\|{\mathbf{p}}(\xi)\|_{TV}}. $$

(3.40)

It is often more convenient to study $C^{1}$-maps $\mathbf{p}: M \to {\mathcal {M}}({\varOmega })$, called parametrized measure models, and then use (3.40) to obtain a statistical model $\mathbf{p}_{0}$. In case we consider $C^{1}$-maps $\mathbf{p}: M \to {\mathcal {S}}({\varOmega })$, also allowing for non-positive measures, we call it a signed parametrized measure model.

Let us assume for simplicity that all measures $\mathbf{p}(\xi)$ are dominated by some fixed measure $\mu_{0}$, even though later we shall show that this assumption is inessential. As it turns out, a dominating measure $\mu_{0}$ exists if $M$ contains a countable dense subset which is the case, e.g., if $M$ is a finite-dimensional manifold. In this case,

$${\mathbf{p}}(\xi) = p(\cdot;\xi) \mu_{0}, $$

where $p: {\varOmega }\times M \to {\mathbb {R}}$ is called the density function of $\mathbf{p}$ w.r.t. $\mu_{0}$. Evidently, $p(\cdot;\xi) \in L^{1}({\varOmega }, \mu_{0})$ for all $\xi$. The differential of $\mathbf{p}$ in the direction of a tangent vector $V \in T_{\xi}M$ is then given by

$$d_{\xi}{\mathbf{p}}(V) = \partial_{V} p(\cdot;\xi) \mu_{0} \in L^{1}({\varOmega }, \mu_{0}), $$

where for simplicity we assume that the partial derivative of the density function exists, even though we shall show that this condition may be weakened as well.

Attempting to define an inner product on $T_{\xi}M$ analogous to the Fisher metric in (2.18), we have to regard $d_{\xi}{\mathbf{p}}(V)$ as an element of $T_{\mathbf{p}(\xi)} {\mathcal {S}}({\varOmega }) \cong L^{1}({\varOmega }, \mathbf{p}(\xi))$, which leads to

$$\begin{aligned} {\mathfrak {g}}_{\xi}(V, W) = & \bigl(d_{\xi}p(V), d_{\xi}p(W)\bigr)_{\mathbf{p}(\xi)} \\ = & \int_{{\varOmega }}\frac{d\{d_{\xi}p(V)\}}{dp(\xi)}\; \frac{d\{d_{\xi}p(W)\} }{dp(\xi)}\; d \mathbf{p}(\xi) \\ = & \int_{{\varOmega }}\frac{\partial_{V} p(\cdot;\xi)}{p(\cdot;\xi)}\; \frac {\partial_{W} p(\cdot;\xi)}{p(\cdot;\xi)}\; d \mathbf{p}(\xi) \\ = & \int_{{\varOmega }}\partial_{V} \log p(\cdot;\xi)\; \partial_{W} \log p(\cdot ;\xi)\; d\mathbf{p}(\xi). \end{aligned}$$

(3.41)

We immediately encounter two problems. The first one is that—unlike in the case of a finite sample space ${\varOmega }$—the above integral may diverge, if we only assume that $\partial_{V} \log p(\cdot;\xi) \in L^{1}({\varOmega }, \mathbf{p}(\xi))$; rather, we should demand that $\partial_{V} \log p(\cdot;\xi) \in L^{2}({\varOmega }, \mathbf{p}(\xi))$ which in the case of an infinite sample space ${\varOmega }$ is a proper subset.

The second problem is that the functions $\log p(\cdot;\xi)$ used in (3.41) to define the Fisher metric are not defined if we drop the assumption that $p > 0$, i.e., that all measures have the same null sets as $\mu_{0}$. This is the reason why in most definitions of differentiable families of measures the equivalence of the measures $\mathbf{p}(\xi)$ in the family and hence the positivity of the density function $p$ is required; cf., e.g., [9, 16, 25, 219]. For instance, if ${\varOmega }$ is a finite sample space, then the description of the Fisher metric on ${\mathcal {M}}({\varOmega })$ or on ${\mathcal {P}}({\varOmega })$ in its canonical coordinates develops singularities outside the sets ${\mathcal {M}}_{+}({\varOmega })$ and ${\mathcal {P}}_{+}({\varOmega })$, respectively, cf. (2.13) and (2.19). However, if we use the coordinates $(\sqrt {p_{i}(\xi)})_{i \in I}$ instead, then this metric coincides—up to a constant factor—with the standard inner product on Euclidean space and hence extends to all of ${\mathcal {M}}({\varOmega })$ and ${\mathcal {P}}({\varOmega })$, respectively, cf. Proposition 2.1. That is, the seeming degeneracy of the Fisher metric near the boundary of ${\mathcal {P}}_{+}({\varOmega }, \mu_{0})$ is only due to an inconvenient choice of coordinates, while with the right choice $(\sqrt{p_{i}(\xi)})_{i \in I}$ it becomes the Euclidean metric on the sphere. Formulating it slightly differently, the point is that not all the individual factors under the integral in (3.41) need to be well-defined, but it suffices that their product is.

Generalizing this approach, let us introduce half-densities, by which we mean formal square roots of measures. That is, we let

$$\sqrt{\mathbf{p}(\xi)} := \sqrt{p(\cdot;\xi)} \sqrt{\mu_{0}}, $$

where for the moment we regard $\sqrt{\mu_{0}}$ merely as a formal symbol. Taking derivatives of this yields

$$d_{\xi}\sqrt{\mathbf{p}}(V) = \frac{1}{2} \frac{\partial_{V} p(\cdot;\xi )}{\sqrt{p(\cdot;\xi)}} \sqrt{\mu_{0}} = \frac{1}{2} \partial_{V} \log p( \cdot ;\xi) \sqrt{\mathbf{p}(\xi)}, $$

whence

$$d_{\xi}\sqrt{\mathbf{p}}(V) \cdot d_{\xi}\sqrt{ \mathbf{p}}(W) = \frac{1}{4} \partial_{V} \log p(\cdot;\xi) \partial_{W} \log p(\cdot;\xi) p(\cdot;\xi) \mu_{0}, $$

and so

$${\mathfrak {g}}_{\xi}(V, W) = 4 \int_{{\varOmega }}d \bigl(d_{\xi}\sqrt{\mathbf{p}}(V) \cdot d_{\xi}\sqrt{\mathbf{p}}(W) \bigr), $$

as in (3.22). Analogously, if we define $\sqrt[3]{\mathbf{p}(\xi)} := \sqrt[3]{p(\cdot ;\xi)} \sqrt[3]{\mu_{0}}$, again regarding $\sqrt[3]{\mu_{0}}$ as a formal symbol, then

$$d_{\xi}\sqrt[3]{\mathbf{p}}(V) = \frac{1}{3} \frac{\partial_{V} p(\cdot;\xi )}{\sqrt[3]{p(\cdot;\xi)}^{2}} \sqrt[3]{\mu_{0}} = \frac{1}{3} \partial_{V} \log p(\cdot;\xi) \sqrt[3]{\mathbf{p}(\xi)}, $$

so that the Amari–Chentsov tensor ${\mathbf {T}}$ from (2.51) can be written as

$$\begin{aligned} {\mathbf {T}}_{\xi}(V, W, U) = & \int_{{\varOmega }}\partial_{V} \log p(\cdot;\xi) \partial_{W} \log p(\cdot;\xi) \partial_{U} \log p(\cdot; \xi)\; d\mathbf{p}(\xi) \\ = & 27 \int_{{\varOmega }}d \bigl( d_{\xi}\sqrt[3]{\mathbf{p}( \xi)}(V) \cdot d_{\xi}\sqrt[3]{\mathbf{p}(\xi)}(W) \cdot d_{\xi}\sqrt[3]{\mathbf{p}(\xi)}(U) \bigr). \end{aligned}$$

(3.42)
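On a finite sample space with strictly positive densities, the two expressions for the Fisher metric and for the Amari–Chentsov tensor can be compared directly. The sketch below does this for a hypothetical one-parameter family on three points, with derivatives approximated by central differences; the family and the function names are our own assumptions, chosen only for illustration.

```python
import math

def density(xi):
    """Hypothetical positive density on a three-point sample space (softmax in xi)."""
    w = [math.exp(xi * a) for a in (0.0, 1.0, 2.5)]
    z = sum(w)
    return [v / z for v in w]

def ddxi(f, xi, h=1e-5):
    """Central finite difference of a list-valued function of xi."""
    fp, fm = f(xi + h), f(xi - h)
    return [(a - b) / (2 * h) for a, b in zip(fp, fm)]

def tensor_via_log(xi, n):
    """Sum of (d/dxi log p)^n * p; n = 2 is the Fisher metric, n = 3 the Amari-Chentsov tensor."""
    p = density(xi)
    dlog = ddxi(lambda t: [math.log(v) for v in density(t)], xi)
    return sum(d**n * q for d, q in zip(dlog, p))

def tensor_via_root(xi, n):
    """n^n times the sum of the n-fold product of the derivative of the n-th root of p."""
    droot = ddxi(lambda t: [v ** (1.0 / n) for v in density(t)], xi)
    return n**n * sum(d**n for d in droot)
```

For $n=2$ and $n=3$ the two functions should agree up to discretization error, reflecting the factors $4$ and $27$ above.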

This suggests that we should try to make formal sense of the objects $\sqrt{\mu_{0}}$, $\sqrt[3]{\mu_{0}}$ and hence of $\mathbf{p}(\xi)^{1/2} = \sqrt{\mathbf{p}(\xi)}$, $\mathbf{p}(\xi)^{1/3} = \sqrt[3]{\mathbf {p}(\xi)}$, etc. The idea of taking $r$th powers of a measure $\mathbf {p}(\xi)$ with $0 < r < 1$ has been introduced in a less rigorous way by Amari in [8, p. 66], where such powers are called $\alpha$-representations.

We shall use a more formal approach and for $0 < r \leq1$ construct the Banach spaces ${\mathcal {S}}^{r}({\varOmega })$ in Sect. 3.2.3 by a certain direct limit construction, as well as the subsets

$${\mathcal {P}}^{r}({\varOmega }) {\subseteq }{\mathcal {M}}^{r}({\varOmega }) {\subseteq }{\mathcal {S}}^{r}( {\varOmega }). $$

The elements of these sets are called $r$th powers of probability measures (finite measures, signed finite measures, respectively), and they have the feature that one can formally take the $(1/r)$th power of them to obtain a probability measure (finite measure, signed finite measure, respectively), and raising to the $(1/r)$th power is a $C^{1}$-regular map between Banach spaces.

Once we have done this, we call a parametrized measure model $k$-integrable if the map

$$ {\mathbf{p}}^{1/k}: M \longrightarrow {\mathcal {M}}^{1/k}( {\varOmega }) {\subseteq }{\mathcal {S}}^{1/k}({\varOmega }), \qquad\xi\longmapsto{\mathbf{p}}( \xi)^{1/k} $$

(3.43)

is a $C^{1}$-map as defined in Appendix C. If $\mathbf{p}(\xi) = p(\cdot;\xi) \mu_{0}$ is dominated by $\mu_{0}$, then the differential of this map is given by

$${\partial }_{V} \mathbf{p}^{1/k} = d_{\xi}{\mathbf{p}}^{1/k}(V) := \frac{1}{k} \partial _{V} \log p(\cdot;\xi) \mathbf{p}^{1/k}(\xi), $$

and in order for this to be the $k$th root of a measure, we have to require that $\partial_{V} \log p(\cdot;\xi) \in L^{k}({\varOmega }, \mathbf{p}(\xi))$.

We call such a model weakly $k$-integrable if the map $\mathbf {p}^{1/k}: M \to {\mathcal {M}}^{1/k}({\varOmega })$ is a weak $C^{1}$-map, cf. Appendix C for a definition. As we shall show in Theorem 3.2, the model is $k$-integrable if and only if the $k$-norm $V \mapsto\|\partial_{V} \mathbf{p}^{1/k}\|_{k}$ depends continuously on $V \in {TM}$, and it is weakly $k$-integrable if and only if the map $V \mapsto\partial_{V} \mathbf{p}^{1/k}$ is weakly continuous. Thus, our definition of $k$-integrability coincides with that given in [25, Definition 2.4].

In general, on an $n$-integrable parametrized measure model we may define the canonical $n$-tensor

$$ \bigl(\tau_{M}^{n}\bigr)_{\xi}(V_{1}, \ldots, V_{n}) = \int_{{\varOmega }}\partial_{V_{1}} \log p(\cdot ;\xi) \cdots \partial_{V_{n}} \log p(\cdot;\xi)\;d\mathbf{p}(\xi), $$

(3.44)

which is a generalization of the Fisher metric ${\mathfrak {g}}= \tau_{M}^{2}$ (3.41) and the Amari–Chentsov tensor ${\mathbf {T}}= \tau_{M}^{3}$ (3.42). In fact, we show that $\tau_{M}^{n}$ is the pullback of a naturally defined covariant $n$-tensor $L_{{\varOmega }}^{n}$ on ${\mathcal {S}}^{1/n}({\varOmega })$ via the map $\mathbf{p}^{1/n}: M \to {\mathcal {M}}^{1/n}({\varOmega }) {\subseteq }{\mathcal {S}}^{1/n}({\varOmega })$. In particular, $\tau^{n}_{M} := (\mathbf{p}^{1/n})^{*}L_{{\varOmega }}^{n}$ is well defined for a $k$-integrable parametrized measure model with $k \geq n$, even if $p$ is not a positive function, in which case (3.44) has to be interpreted with care.

While for most applications it will suffice to consider statistical models which are dominated by some measure $\mu_{0}$, our development of the theory will show that this is an inessential condition. Intuitively, it is plausible that the quantity $\partial_{V} p(\cdot;\xi) \mu_{0}$ which measures the change of measure w.r.t. the background measure $\mu_{0}$ is not a significant quantity, but rather the rate of change of $\mathbf{p}(\xi)$ relative to the measure $\mathbf{p}(\xi)$ itself. That is, the relevant quantity to consider as a derivative is the logarithmic derivative

$$\partial_{V} \log p(\cdot;\xi) = \frac{d\{d_{\xi}{\mathbf{p}}(V)\}}{d\mathbf{p}(\xi)}, $$

where the fraction stands for the Radon–Nikodym derivative. An important observation is that for any parametrized measure model, this Radon–Nikodym derivative always exists, so that the logarithmic derivative $\partial_{V} \log{\mathbf {p}}(\xi)$ may be defined also in the absence of a dominating measure $\mu_{0}$. This is all that is needed to define the notions of $k$-integrability and the canonical tensors mentioned above.

3.2.1 The Structure of the Space of Measures

The aim of this section is to provide the formal set-up of parametrized measure models in order to make the discussion in the preceding section rigorous. As before, let ${\varOmega }$ be a measurable space. We let

$$\begin{aligned} \begin{aligned} {\mathcal {P}}({\varOmega }) & := \{ \mu : \mu \mbox{ a probability measure on ${\varOmega }$}\}, \\ {\mathcal {M}}({\varOmega }) & := \{ \mu: \mu \mbox{ a finite measure on ${\varOmega }$}\}, \\ {\mathcal {S}}({\varOmega }) & := \{ \mu : \mu \mbox{ a signed finite measure on ${\varOmega }$}\}, \\ {\mathcal {S}}_{0}({\varOmega }) & := \biggl\{ \mu\in {\mathcal {S}}({\varOmega }) : \int_{{\varOmega }}\,d\mu= 0\biggr\} . \end{aligned} \end{aligned}$$

(3.45)

Clearly, ${\mathcal {P}}({\varOmega }) {\subseteq }{\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega })$, and ${\mathcal {S}}_{0}({\varOmega }), {\mathcal {S}}({\varOmega })$ are real vector spaces. In fact, both ${\mathcal {S}}_{0}({\varOmega })$ and ${\mathcal {S}}({\varOmega })$ are Banach spaces whose norm is given by the total variation $\|\cdot\|_{TV}$ (3.1), whence any subset carries a canonical topology which is determined by saying that a sequence $(\nu _{n})_{n \in {\mathbb {N}}}$ in (a subset of) ${\mathcal {S}}({\varOmega })$ converges to $\nu_{\infty}$ if and only if

$$\lim_{n \to\infty} \|\nu_{n} - \nu_{\infty}\|_{TV} = 0. $$

With respect to this topology, the subsets

$${\mathcal {P}}({\varOmega }) {\subseteq }{\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega }) $$

are closed.

Remark 3.2

Evidently, for the applications we have in mind, we are interested mainly in statistical models. However, we can take the point of view that ${\mathcal {P}}({\varOmega }) = {\mathbb {P}}({\mathcal {M}}({\varOmega }))$ is the projectivization of ${\mathcal {M}}({\varOmega })$ via rescaling. Thus, given a parametrized measure model $(M, {\varOmega }, \mathbf{p})$, normalization yields a statistical model $(M, {\varOmega }, \mathbf{p}_{0})$ defined by

$${\mathbf{p}}_{0}(\xi) := \dfrac{\mathbf{p}(\xi)}{\| {\mathbf{p}}(\xi) \|_{TV}}, $$

which is again a $C^{1}$-map. Indeed, the map $\mu\mapsto\|\mu\|_{TV}$ on ${\mathcal {M}}({\varOmega })$ is a $C^{1}$-map, being the restriction of the linear (and hence differentiable) map $\mu\mapsto\int_{{\varOmega }}d\mu$ on ${\mathcal {S}}({\varOmega })$.

Observe that while ${\mathcal {S}}({\varOmega })$ is a Banach space, the subsets ${\mathcal {M}}({\varOmega })$ and ${\mathcal {P}}({\varOmega })$ do not carry a canonical manifold structure.

By the Jordan decomposition theorem, each measure $\mu\in {\mathcal {S}}({\varOmega })$ can be decomposed uniquely as

$$ \mu= \mu_{+} - \mu_{-} \quad\mbox{with $\mu_{\pm}\in {\mathcal {M}}( {\varOmega })$, $\mu_{+} \perp\mu_{-}$}. $$

(3.46)

The latter means that we have a disjoint union ${\varOmega }= P \uplus N$ with $\mu_{+}(N) = \mu_{-}(P) = 0$. Thus, if we define

$$|\mu| := \mu_{+} + \mu_{-} \in {\mathcal {M}}({\varOmega }), $$

then (3.46) implies

$$ \bigl\vert \mu(A) \bigr\vert \leq|\mu|(A) \quad\mbox{for all $\mu\in {\mathcal {S}}({\varOmega })$ and $A \in\varSigma$}, $$

(3.47)

so that

$$\|\mu\|_{TV} = \bigl\| \; |\mu|\; \bigr\| _{TV} = |\mu|({\varOmega }). $$

In particular,

$${\mathcal {P}}({\varOmega }) = \bigl\{ \mu\in {\mathcal {M}}({\varOmega }) : \|\mu\|_{TV} = 1\bigr\} . $$
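For a finite sample space, where a signed measure is simply a list of point masses, the Jordan decomposition (3.46), the estimate (3.47) and the formula for the total variation can be made explicit. The following sketch uses our own illustrative function names and checks (3.47) by brute force over all subsets.

```python
from itertools import combinations

def jordan_decomposition(mu):
    """Split a signed measure on a finite set (list of point masses) into (mu_plus, mu_minus)."""
    mu_plus = [max(m, 0.0) for m in mu]
    mu_minus = [max(-m, 0.0) for m in mu]
    return mu_plus, mu_minus

def total_variation(mu):
    """Total variation norm |mu|(Omega) = mu_plus(Omega) + mu_minus(Omega)."""
    mu_plus, mu_minus = jordan_decomposition(mu)
    return sum(mu_plus) + sum(mu_minus)

def satisfies_3_47(mu):
    """Check |mu(A)| <= |mu|(A) for every subset A of the finite sample space."""
    abs_mu = [abs(m) for m in mu]
    points = range(len(mu))
    for size in range(len(mu) + 1):
        for subset in combinations(points, size):
            lhs = abs(sum(mu[i] for i in subset))
            rhs = sum(abs_mu[i] for i in subset)
            if lhs > rhs + 1e-12:  # small slack for rounding
                return False
    return True
```

In this setting a non-negative measure is a probability measure precisely when `total_variation` returns 1.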

Moreover, fixing a measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$, we define

$$\begin{aligned} \begin{aligned} {\mathcal {M}}({\varOmega }, \mu_{0}) & = \bigl\{ \mu= \phi\, \mu_{0} : \phi\in L^{1}({\varOmega }, \mu_{0}), \ \phi\geq0 \bigr\} ,\\ {\mathcal {M}}_{+}({\varOmega }, \mu_{0}) & = \bigl\{ \mu= \phi\, \mu_{0} : \phi\in L^{1}({\varOmega }, \mu_{0}), \ \phi> 0 \bigr\} ,\\ {\mathcal {P}}({\varOmega }, \mu_{0}) & = \biggl\{ \mu\in {\mathcal {M}}({\varOmega }, \mu_{0}) : \int_{{\varOmega }}\,d\mu = 1 \biggr\} ,\\ {\mathcal {P}}_{+}({\varOmega }, \mu_{0}) & = \biggl\{ \mu\in {\mathcal {M}}_{+}({\varOmega }, \mu_{0}) : \int_{{\varOmega }}\,d\mu= 1 \biggr\} ,\\ {\mathcal {S}}({\varOmega }, \mu_{0}) & = \bigl\{ \mu= \phi\, \mu_{0} : \phi\in L^{1}({\varOmega }, \mu_{0}) \bigr\} . \end{aligned} \end{aligned}$$

(3.48)

By the Radon–Nikodym theorem, ${\mathcal {P}}({\varOmega }, \mu_{0}) {\subseteq }{\mathcal {M}}({\varOmega }, \mu_{0}) {\subseteq }{\mathcal {S}}({\varOmega }, \mu_{0})$ consist of those measures in ${\mathcal {P}}({\varOmega }) {\subseteq }{\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega })$ which are dominated by $\mu_{0}$, and the canonical map $\imath_{can}: {\mathcal {S}}({\varOmega }, \mu_{0}) \to L^{1}({\varOmega }, \mu_{0})$ in (3.2), given by taking the Radon–Nikodym derivative w.r.t. $\mu_{0}$, is an isomorphism whose inverse is given by

$$\imath_{can}^{-1}: L^{1}({\varOmega }, \mu_{0}) \longrightarrow {\mathcal {S}}({\varOmega }, \mu_{0}), \qquad\phi\longmapsto\phi\; \mu_{0}. $$

Observe that $\imath_{can}$ is an isometry of Banach spaces, since evidently

$$\|\phi\|_{L^{1}({\varOmega }, \mu_{0})} = \int_{{\varOmega }}|\phi|\; d\mu_{0} = \|\phi\; \mu_{0}\|_{TV}. $$

3.2.2 Tangent Fibration of Subsets of Banach Manifolds

In this section, we shall use the notion of differentiable maps between Banach spaces, as described in Appendix C. In particular, a curve in a Banach space $(V; \|\cdot\|)$ is a differentiable map $c: I \to V$ with an (open) interval $I {\subseteq }{\mathbb {R}}$. That is, for each $t_{0} \in I$ there exists the limit

$$ \dfrac{d}{dt} \bigg|_{t=t_{0}} c(t) = \dot{c}(t_{0}) := \lim_{t \to t_{0}} \dfrac{c(t) - c(t_{0})}{t - t_{0}}. $$

(3.49)

Definition 3.2

Let $(V; \|\cdot\|)$ be a Banach space, $X {\subseteq }V$ an arbitrary subset and $x_{0} \in X$. Then $v \in V$ is called a tangent vector of $X$ at $x_{0}$, if there is a curve $c: (- {\varepsilon }, {\varepsilon }) \to X {\subseteq }V$ such that $c(0) = x_{0}$ and $\dot{c}(0) = v$.

The set of all tangent vectors at $x_{0}$ is called the tangent double cone of $X$ at $x_{0}$ and is denoted by $T_{x_{0}}X$. We also define the tangent fibration of $X$ as

$${TX} := \biguplus_{x_{0} \in X} T_{x_{0}}X {\subseteq }X \times V {\subseteq }V \times V, $$

equipped with the induced topology and with the canonical projection map ${TX} \to X$.

Remark 3.3

The reader should be aware that, unlike in some texts, we do not use tangent fibration as a synonym for the tangent bundle, since for general subsets $X {\subseteq }V$, $T_{x_{0}}X {\subseteq }V$ may fail to be a vector subspace, and for $x_{0} \neq x_{1}$, the tangent cones $T_{x_{0}}X$ and $T_{x_{1}}X$ need not be homeomorphic.

For instance, let $X := \{ (x,y)^{\top}\in {\mathbb {R}}^{2} : xy = 0\} {\subseteq }{\mathbb {R}}^{2}$, so that $X$ is the union of the two coordinate axes. Then $T_{(x,0)} X$ and $T_{(0,y)}X$ with $x, y \neq0$ are the $x$-axis and the $y$-axis, respectively, and hence linear subspaces of $V = {\mathbb {R}}^{2}$, but $T_{(0,0)} X = X$ is not a subspace. This example also shows that $T_{x_{0}} X$ and $T_{x_{1}}X$ need not be homeomorphic if $x_{0} \neq x_{1}$, whence the projection ${TX} \to X$ mapping $T_{x_{0}}X$ to $x_{0}$ is in general only a topological fibration, but not a vector bundle or a fiber bundle.

But at least, $T_{x_{0}}X$ is invariant under multiplication by (positive or negative) scalars and hence is a double cone. This is seen by replacing the curve $c$ in Definition 3.2 by $\tilde{c}(t) := c(t_{0} t) \in X$ for some $t_{0} \in {\mathbb {R}}$, so that $\tilde{c}(0) = x_{0}$ and $\dot{\tilde{c}}(0) = t_{0} v \in T_{x_{0}}X$.

If $X {\subseteq }V$ is a submanifold, however, then $T_{x_{0}}X$ and ${TX}$ coincide with the standard notion of the tangent space at $x_{0}$ and the tangent bundle of $X$, respectively. Thus, in this case the tangent cone is a linear subspace, and the tangent fibration is the tangent bundle of $X$.

For instance, if $X = U {\subseteq }V$ is an open set, then $T_{x_{0}}U = V$ for all $x_{0}$ and hence, $TU = U \times V$. Indeed, the curve $c(t) = x_{0} + t v \in U$ for small $|t|$ satisfies the properties required in Definition 3.2. In this case, the tangent fibration $U \times V \to U$ is a (trivial) vector bundle.

If $M$ is a Banach manifold and $F: M \to V$ is a $C^{1}$-map whose image is contained in $X {\subseteq }V$, then by the chain rule, for any $v \in T_{x_{0}}M$ and any curve $c: (-{\varepsilon }, {\varepsilon }) \to M$ with $c(0) = x_{0}$, $\dot{c}(0) = v$,

$$\dfrac{d}{dt} \bigg|_{t=0} F\bigl(c(t)\bigr) = d_{x_{0}}F(v), $$

and as $t \mapsto F(c(t))$ is a curve in $X$ with $F(c(0)) = F(x_{0})$, it follows that $d_{x_{0}}F(v) \in T_{F(x_{0})}X$ for all $v \in {TM}$. That is, a $C^{1}$-map $F: M \to X {\subseteq }V$ induces a continuous map

$$dF: {TM} \longrightarrow {TX}, \qquad({x_{0}}, v) \longmapsto d_{x_{0}}F(v). $$

Theorem 3.1

Let $V = {\mathcal {S}}({\varOmega })$ be the Banach space of finite signed measures on ${\varOmega }$. Then the tangent cones of ${\mathcal {M}}({\varOmega })$ and ${\mathcal {P}}({\varOmega })$ at $\mu$ are $T_{\mu} {\mathcal {M}}({\varOmega }) = {\mathcal {S}}({\varOmega }, \mu)$ and $T_{\mu} {\mathcal {P}}({\varOmega }) = {\mathcal {S}}_{0}({\varOmega }, \mu)$, respectively, so that

$$\begin{aligned} T{\mathcal {M}}({\varOmega }) = \biguplus_{\mu\in {\mathcal {M}}({\varOmega })} {\mathcal {S}}({\varOmega }, \mu) {\subseteq }{\mathcal {M}}({\varOmega }) \times {\mathcal {S}}({\varOmega }) \end{aligned}$$

and

$$\begin{aligned} T{\mathcal {P}}({\varOmega }) = \biguplus_{\mu\in {\mathcal {P}}({\varOmega })} {\mathcal {S}}_{0}({\varOmega }, \mu) {\subseteq }{\mathcal {P}}({\varOmega }) \times {\mathcal {S}}({\varOmega }). \end{aligned}$$

Proof

Let $\nu\in T_{\mu_{0}} {\mathcal {M}}({\varOmega })$ and let $(\mu_{t})_{t \in (-{\varepsilon }, {\varepsilon })}$ be a curve in ${\mathcal {M}}({\varOmega })$ with $\dot{\mu}_{0} = \nu$. Let $A {\subseteq }{\varOmega }$ be such that $\mu_{0}(A) = 0$. Then as $\mu_{t}(A) \geq0$, the function $t \mapsto\mu_{t}(A)$ has a minimum at $t=0$, whence

$$0 = \dfrac{d}{dt} \bigg|_{t=0} \mu_{t}(A) = \dot{\mu}_{0}(A) = \nu(A), $$

where the second equation is evident from (3.49). That is, $\nu(A) = 0$ whenever $\mu_{0}(A) = 0$, i.e., $\mu_{0}$ dominates $\nu$, so that $\nu\in {\mathcal {S}}({\varOmega }, \mu_{0})$. Thus, $T_{\mu_{0}} {\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega }, \mu_{0})$.

Conversely, given $\nu= \phi\mu_{0} \in {\mathcal {S}}({\varOmega }, \mu_{0})$, define $\mu_{t} := p({\omega }; t) \mu_{0}$ where

$$p({\omega }; t) := \textstyle\begin{cases} 1 + t \phi({\omega }) & \mbox{if $t\phi({\omega }) \geq0$},\\ \exp (t\phi({\omega })) & \mbox{if $t\phi({\omega }) < 0$}. \end{cases} $$

As $p({\omega }; t) \leq\max(1+t\phi({\omega }), 1)$, it follows that $\mu_{t} \in {\mathcal {M}}({\varOmega })$, and as the derivative ${\partial }_{t} p({\omega };t)$ exists for all $t$ and its absolute value is bounded by $|\phi| \in L^{1}({\varOmega }, \mu_{0})$, it follows that $t \mapsto\mu_{t}$ is a $C^{1}$-curve in ${\mathcal {M}}({\varOmega })$ with $\dot{\mu}_{0} = \phi\mu_{0} = \nu$, whence $\nu\in T_{\mu_{0}}{\mathcal {M}}({\varOmega })$ as claimed.

To show the statement for ${\mathcal {P}}({\varOmega })$, let $(\mu_{t})_{t \in(-{\varepsilon }, {\varepsilon })}$ be a curve in ${\mathcal {P}}({\varOmega })$ with $\dot{\mu}_{0} = \nu$. Then as $\mu_{t}$ is a probability measure for all $t$, we conclude that

$$\biggl\vert \int_{{\varOmega }}d\nu \biggr\vert = \biggl\vert \int_{{\varOmega }}\dfrac{1}{t} \,d(\mu_{t} - \mu_{0} - t \nu) \biggr\vert \leq\dfrac{\|\mu_{t} - \mu_{0} - t \nu\|_{TV}}{|t|} \xrightarrow{t \to0} 0, $$

so that $\nu\in {\mathcal {S}}_{0}({\varOmega })$. Since ${\mathcal {P}}({\varOmega }) {\subseteq }{\mathcal {M}}({\varOmega })$, it follows that $T_{\mu_{0}}{\mathcal {P}}({\varOmega }) {\subseteq }T_{\mu_{0}}{\mathcal {M}}({\varOmega }) \cap {\mathcal {S}}_{0}({\varOmega }) = {\mathcal {S}}_{0}({\varOmega }, \mu_{0})$ for all $\mu_{0} \in {\mathcal {P}}({\varOmega })$.

Conversely, given $\nu= \phi\mu_{0} \in {\mathcal {S}}_{0}({\varOmega }, \mu_{0})$, define the curve $\lambda_{t} := \mu_{t}\|\mu_{t}\|_{TV}^{-1} \in {\mathcal {P}}({\varOmega })$ with $\mu_{t}$ from above, which is a $C^{1}$-curve in ${\mathcal {P}}({\varOmega })$ as $\|\mu_{t}\|_{TV} > 0$, and it is straightforward that $\lambda_{0} = \mu_{0}$ and $\dot{\lambda}_{0} = \phi\mu_{0} = \nu$. □
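The curves $\mu_{t}$ and $\lambda_{t}$ used in this proof can be examined concretely on a finite sample space. The sketch below is an illustration under that finiteness assumption, with hypothetical function names; the velocity at $t=0$ is approximated by a central difference, which is accurate only to first order in the step since $t \mapsto p({\omega };t)$ is $C^{1}$ but not $C^{2}$ at $t=0$.

```python
import math

def p(phi, t):
    """Density of mu_t w.r.t. mu_0 at a point where the tangent density equals phi."""
    x = t * phi
    return 1.0 + x if x >= 0 else math.exp(x)

def mu_t(phi, mu0, t):
    """The curve mu_t = p(.; t) mu_0 as a list of point masses."""
    return [p(f, t) * m for f, m in zip(phi, mu0)]

def lambda_t(phi, mu0, t):
    """The normalized curve lambda_t = mu_t / ||mu_t||_TV."""
    m = mu_t(phi, mu0, t)
    total = sum(m)  # total variation of the non-negative measure mu_t
    return [v / total for v in m]

def velocity(curve, phi, mu0, h=1e-6):
    """Central-difference approximation of the derivative of the curve at t = 0."""
    a, b = curve(phi, mu0, h), curve(phi, mu0, -h)
    return [(x - y) / (2 * h) for x, y in zip(a, b)]
```

If $\mu_{0}$ is a probability measure and $\int_{{\varOmega }}\phi\, d\mu_{0} = 0$, both velocities should reproduce $\nu = \phi\mu_{0}$ up to discretization error; at a point of $\mu_{0}$-measure zero the velocity vanishes whatever the value of $\phi$ there, in line with $\nu$ being dominated by $\mu_{0}$.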

Remark 3.4

(1)
Even though the tangent cones of the subsets ${\mathcal {P}}({\varOmega }) {\subseteq }{\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega })$ at each point $\mu$ are closed vector subspaces of ${\mathcal {S}}({\varOmega })$, these subsets are not Banach manifolds, and hence in particular not Banach submanifolds of ${\mathcal {S}}({\varOmega })$. This is due to the fact that the spaces ${\mathcal {S}}_{0}({\varOmega }, \mu)$ need not be isomorphic to each other for different $\mu$.

This can already be seen for finite ${\varOmega }= \{{\omega }_{1}, \ldots, {\omega }_{k}\}$. In this case, we may identify ${\mathcal {S}}({\varOmega })$ with ${\mathbb {R}}^{k}$ by the map $\sum_{i=1}^{k} x_{i} \delta^{{\omega }_{i}} \cong(x_{1}, \ldots, x_{k})$, and with this,
$$T{\mathcal {M}}({\varOmega }) \cong \left \{(x_{1}, \ldots, x_{k}; y_{1}, \ldots, y_{k}) \in {\mathbb {R}}^{k} \times {\mathbb {R}}^{k} : \textstyle\begin{array}{l} x_{i} \geq0,\\ x_{i} = 0 \Rightarrow y_{i} = 0 \end{array}\displaystyle \right \}, $$
and this is evidently not a submanifold of ${\mathbb {R}}^{2k}$. Indeed, in this case the dimension of $T_{\mu} {\mathcal {M}}({\varOmega }) = {\mathcal {S}}({\varOmega }, \mu)$ equals $|\{{\omega }\in {\varOmega }\mid\mu({\omega }) > 0\}|$, which varies with $\mu$. Of course, this simply reflects the geometric stratification of the closed probability simplex in terms of the faces of various dimensions. Theorem 3.1 then describes such a stratification also in infinite dimensions.
(2)
Observe that the curves $\mu_{t}$ and $\lambda_{t}$, respectively, used in the proof of Theorem 3.1 are contained in ${\mathcal {M}}_{+}({\varOmega }, \mu_{0})$ and ${\mathcal {P}}_{+}({\varOmega }, \mu_{0})$, respectively, whence it also follows that
$$T_{\mu_{0}} {\mathcal {M}}_{+}({\varOmega }, \mu_{0}) = {\mathcal {S}}({\varOmega }, \mu_{0}) \quad\mbox{and}\quad T_{\mu_{0}} {\mathcal {P}}_{+}({\varOmega }, \mu_{0}) = {\mathcal {S}}_{0}({\varOmega }, \mu_{0}). $$
But if $\mu_{1} \in {\mathcal {M}}_{+}({\varOmega }, \mu_{0})$, then $\mu_{1}$ and $\mu_{0}$ are compatible measures (i.e., they have the same null sets), whence in this case, ${\mathcal {S}}({\varOmega }, \mu_{0}) = {\mathcal {S}}({\varOmega }, \mu _{1})$ and ${\mathcal {S}}_{0}({\varOmega }, \mu_{0}) = {\mathcal {S}}_{0}({\varOmega }, \mu_{1})$. That is, the subset ${\mathcal {M}}_{+}({\varOmega }, \mu_{0}) {\subseteq }{\mathcal {S}}({\varOmega }, \mu_{0})$ has at each point all of ${\mathcal {S}}({\varOmega }, \mu_{0})$ as its tangent space, but in general, ${\mathcal {M}}_{+}({\varOmega }, \mu _{0}) {\subseteq }{\mathcal {S}}({\varOmega }, \mu_{0})$ is not open.^{Footnote 4} This is a quite remarkable and unusual phenomenon in Differential Geometry.

That is, neither on ${\mathcal {M}}({\varOmega })$ nor on ${\mathcal {M}}_{+}({\varOmega }, \mu_{0})$ is there a canonical manifold structure in general, and the same is true for ${\mathcal {P}}({\varOmega })$ and ${\mathcal {P}}_{+}({\varOmega }, \mu_{0})$, respectively. Nevertheless, Definition 3.2 and Theorem 3.1 allow us to speak of the tangent fibration of ${\mathcal {M}}({\varOmega })$ and ${\mathcal {P}}({\varOmega })$, respectively.
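For finite ${\varOmega }$, the description of $T{\mathcal {M}}({\varOmega })$ given in the first item above can be made concrete. The following Python sketch is an illustration only: measures are represented as lists of non-negative weights, and the function names `tangent_dimension` and `in_tangent_fibration` are hypothetical helpers, not notation from the text.

```python
def tangent_dimension(mu):
    """dim T_mu M(Omega) for finite Omega: number of atoms with positive mass."""
    return sum(1 for m in mu if m > 0.0)


def in_tangent_fibration(x, y):
    """(x; y) lies in TM(Omega) iff x_i >= 0 and (x_i = 0 implies y_i = 0)."""
    return all(xi >= 0.0 and (xi > 0.0 or yi == 0.0) for xi, yi in zip(x, y))


# Hypothetical example on a three-point sample space.
interior_point = [0.2, 0.3, 0.5]   # all atoms charged
boundary_point = [0.2, 0.0, 0.8]   # one atom has mass zero

dims = (tangent_dimension(interior_point), tangent_dimension(boundary_point))
```

The dimension drops from 3 to 2 when passing from the interior point to the boundary point, which is the stratification by faces mentioned above.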

3.2.3 Powers of Measures

Let us now give the formal definition of roots of measures. On the set ${\mathcal {M}}({\varOmega })$ we define the preordering $\mu_{1} \leq\mu_{2}$ if $\mu_{2}$ dominates $\mu_{1}$. Then $({\mathcal {M}}({\varOmega }), \leq)$ is a directed set, meaning that for any pair $\mu_{1}, \mu_{2} \in {\mathcal {M}}({\varOmega })$ there is a $\mu_{0} \in {\mathcal {M}}({\varOmega })$ dominating both of them (e.g., $\mu_{0} := \mu_{1} + \mu_{2}$).

For fixed $r \in(0,1]$ and measures $\mu_{1} \leq\mu_{2}$ on ${\varOmega }$ we define the linear embedding

$$\imath_{\mu_{2}}^{\mu_{1}}: L^{1/r}({\varOmega }, \mu_{1}) \longrightarrow L^{1/r}({\varOmega }, \mu_{2}), \qquad\phi\longmapsto \phi\; \biggl(\frac{d\mu_{1}}{d\mu _{2}} \biggr)^{r}. $$

Observe that

$$\begin{aligned} \bigl\| \imath_{\mu_{2}}^{\mu_{1}}(\phi) \bigr\| _{1/r} = & \biggl( \int_{{\varOmega }}\bigl|\imath_{\mu_{2}}^{\mu_{1}}( \phi)\bigr|^{1/r}\; d\mu_{2} \biggr)^{r} = \biggl( \int _{{\varOmega }}|\phi|^{1/r} \frac{d\mu_{1}}{d\mu_{2}}\; d \mu_{2} \biggr)^{r} \\ = & \biggl( \int_{{\varOmega }}|\phi|^{1/r}\; d\mu_{1} \biggr)^{r} = \|\phi\|_{1/r}, \end{aligned}$$

(3.50)

so that $\imath_{\mu_{2}}^{\mu_{1}}$ is an isometry. Moreover, $\imath_{\mu_{3}}^{\mu_{2}} \imath_{\mu_{2}}^{\mu_{1}} = \imath_{\mu_{3}}^{\mu_{1}}$ whenever $\mu_{1} \leq\mu_{2} \leq\mu_{3}$. Then we define the space of $r$th roots of measures on ${\varOmega }$ to be the directed limit over the directed set $({\mathcal {M}}({\varOmega }), \leq)$

$$ {\mathcal {S}}^{r}({\varOmega }) := \lim_{\longrightarrow} L^{1/r}({\varOmega }, \mu). $$

(3.51)
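The isometry property (3.50) and the compatibility of the embeddings under successive domination can be checked numerically on a finite sample space, where a measure is a list of non-negative weights and a Radon–Nikodym derivative is a ratio of weights. The sketch below is an illustration under these assumptions; the names `norm` and `embed` and the example data are hypothetical.

```python
def norm(phi, mu, r):
    """L^{1/r}(mu)-norm on a finite sample space: (sum |phi|^{1/r} mu)^r."""
    return sum(abs(p) ** (1.0 / r) * m for p, m in zip(phi, mu)) ** r


def embed(phi, mu1, mu2, r):
    """Embedding L^{1/r}(mu1) -> L^{1/r}(mu2), phi -> phi * (dmu1/dmu2)^r.

    Assumes mu2 dominates mu1, i.e. mu2[i] == 0 implies mu1[i] == 0.
    """
    out = []
    for p, m1, m2 in zip(phi, mu1, mu2):
        if m2 == 0.0:
            if m1 != 0.0:
                raise ValueError("mu2 does not dominate mu1")
            out.append(0.0)  # the value on a mu2-null set is irrelevant
        else:
            out.append(p * (m1 / m2) ** r)
    return out


# Hypothetical example data: mu1 <= mu2 <= mu3 in the domination preorder.
mu1 = [0.5, 0.0, 1.5]
mu2 = [1.0, 2.0, 0.5]
mu3 = [2.0, 1.0, 3.0]
phi = [1.0, -4.0, 2.0]
r = 0.5

norm_after = norm(embed(phi, mu1, mu2, r), mu2, r)   # norm in L^{1/r}(mu2)
norm_before = norm(phi, mu1, r)                      # norm in L^{1/r}(mu1)

# Embedding in two steps versus embedding directly into L^{1/r}(mu3).
two_step = embed(embed(phi, mu1, mu2, r), mu2, mu3, r)
one_step = embed(phi, mu1, mu3, r)
```

Both norms agree up to rounding, and the two-step and one-step embeddings coincide, which is what makes the limit in (3.51) well defined.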

Let us give a more concrete definition of ${\mathcal {S}}^{r}({\varOmega })$. On the disjoint union of the spaces $L^{1/r}({\varOmega }, \mu)$ for $\mu\in {\mathcal {M}}({\varOmega })$ we define the equivalence relation

$$\begin{aligned} L^{1/r}({\varOmega }, \mu_{1}) \ni\phi\sim\psi\in L^{1/r}( {\varOmega }, \mu_{2}) \quad \Longleftrightarrow& \quad\imath_{\mu_{0}}^{\mu_{1}}( \phi) = \imath_{\mu _{0}}^{\mu_{2}}(\psi) \\ \Longleftrightarrow& \quad\phi \biggl(\frac{d\mu_{1}}{d\mu_{0}} \biggr)^{r} = \psi \biggl(\frac{d\mu_{2}}{d\mu_{0}} \biggr)^{r} \end{aligned}$$

for some $\mu_{0} \geq\mu_{1}, \mu_{2}$. Then ${\mathcal {S}}^{r}({\varOmega })$ is the set of all equivalence classes of this relation, cf. Fig. 3.1.

Let us denote the equivalence class of $\phi\in L^{1/r}({\varOmega }, \mu)$ by $\phi\mu^{r}$, so that $\mu^{r} \in {\mathcal {S}}^{r}({\varOmega })$ is the equivalence class represented by $1 \in L^{1/r}({\varOmega }, \mu)$. Then the equivalence relation yields

$$ \mu_{1}^{r} = \biggl( \frac{d\mu_{1}}{d\mu_{2}} \biggr)^{r}\; \mu_{2}^{r} \quad\mbox{as elements of ${\mathcal {S}}^{r}({\varOmega })$} $$

(3.52)

whenever $\mu_{1} \leq\mu_{2}$, justifying this notation. In fact, from this description in the case $r = 1$ we see that

$${\mathcal {S}}^{1}({\varOmega }) = {\mathcal {S}}({\varOmega }). $$

Observe that by (3.50) $\|\phi\|_{1/r}$ is constant on equivalence classes, whence there is a norm on ${\mathcal {S}}^{r}({\varOmega })$, also denoted by $\|\cdot\|_{1/r}$, for which the inclusions

$$ {\mathcal {S}}^{r}({\varOmega }, \mu) \hookrightarrow {\mathcal {S}}^{r}({\varOmega }) \quad\mbox{and} \quad L^{1/r}({\varOmega }, \mu) \longrightarrow {\mathcal {S}}^{r}({\varOmega }), \quad\phi\longmapsto\phi\mu^{r} $$

(3.53)

are isometries. For $r = 1$, we have $\|\cdot\|_{1} = \|\cdot\|_{TV}$. Thus,

$$ \bigl\| \phi\mu^{r}\bigr\| _{1/r} = \|\phi \|_{1/r} = \biggl( \int_{{\varOmega }}|\phi|^{1/r}\; d\mu \biggr)^{r} \quad\mbox{for $0 < r \leq1$.} $$

(3.54)

Note that the equivalence relation also preserves non-negativity of functions, whence we may define the subsets

$$\begin{aligned} \begin{aligned} {\mathcal {M}}^{r}({\varOmega }) & := \bigl\{ \phi\mu^{r} : \mu\in {\mathcal {M}}({\varOmega }), \phi\geq0\bigr\} ,\\ {\mathcal {P}}^{r}({\varOmega }) & := \bigl\{ \phi\mu^{r} : \mu\in {\mathcal {P}}({\varOmega }), \phi\geq0, \|\phi\|_{1/r} = 1\bigr\} . \end{aligned} \end{aligned}$$

(3.55)

In analogy to (3.48) we define for a fixed measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$ and $r \in(0,1]$ the spaces

$$\begin{aligned} {\mathcal {S}}^{r}({\varOmega }, \mu_{0}) := & \bigl\{ \phi\, \mu_{0}^{r} : \phi\in L^{1/r}({\varOmega }, \mu_{0})\bigr\} , \\ {\mathcal {M}}^{r}({\varOmega }, \mu_{0}) := & \bigl\{ \phi\, \mu_{0}^{r} : \phi\in L^{1/r}({\varOmega }, \mu_{0}), \phi\geq0\bigr\} , \\ {\mathcal {P}}^{r}({\varOmega }, \mu_{0}) := & \bigl\{ \phi\, \mu_{0}^{r} : \phi\in L^{1/r}({\varOmega }, \mu_{0}), \phi\geq0, \|\phi\|_{1/r}= 1\bigr\} , \\ {\mathcal {S}}^{r}_{0}({\varOmega }, \mu_{0}) := & \biggl\{ \phi\, \mu_{0}^{r} : \phi\in L^{1/r}({\varOmega }, \mu_{0}), \int_{{\varOmega }}\phi\; d\mu_{0} = 0 \biggr\} . \end{aligned}$$

The elements of ${\mathcal {P}}^{r}({\varOmega }, \mu_{0}), {\mathcal {M}}^{r}({\varOmega }, \mu_{0}), {\mathcal {S}}^{r}({\varOmega }, \mu _{0})$ are said to be dominated by $\mu_{0}^{r}$.

Remark 3.5

The concept of $r$th roots of measures has been indicated in [200, Ex. IV.1.4]. Moreover, if ${\varOmega }$ is a manifold and $r = 1/2$, then ${\mathcal {S}}^{1/2}({\varOmega })$ is even a Hilbert space which has been considered in [192, 6.9.1]. This Hilbert space is also related to the Hilbert manifold of finite-entropy probability measures defined in [201].

The product of powers of measures can now be defined for all $r, s \in (0,1)$ with $r + s \leq1$ and for measures $\phi\mu^{r} \in {\mathcal {S}}^{r}({\varOmega }, \mu)$ and $\psi\mu^{s} \in {\mathcal {S}}^{s}({\varOmega }, \mu)$:

$$ \bigl(\phi\mu^{r}\bigr) \cdot\bigl(\psi\mu^{s}\bigr) := \phi\psi\mu^{r+s}. $$

By definition $\phi\in L^{1/r}({\varOmega }, \mu)$ and $\psi\in L^{1/s}({\varOmega }, \mu)$, whence Hölder’s inequality implies that $\|\phi\psi\|_{1/(r+s)} \leq\|\phi\|_{1/r} \|\psi\|_{1/s} < \infty$, so that $\phi\psi\in L^{1/(r+s)}({\varOmega }, \mu)$ and hence, $\phi\psi\mu ^{r+s} \in {\mathcal {S}}^{r+s}({\varOmega }, \mu)$. Since by (3.52) this definition of the product is independent of the choice of representative $\mu$, it follows that it induces a bilinear product

$$ \cdot: {\mathcal {S}}^{r}({\varOmega }) \times {\mathcal {S}}^{s}({\varOmega }) \longrightarrow {\mathcal {S}}^{r+s}({\varOmega }), \quad\mbox{where $r, s, r + s \in(0,1]$}, $$

(3.56)

satisfying the Hölder inequality

$$ \|\nu_{r} \cdot\nu_{s}\|_{1/(r+s)} \leq \|\nu_{r}\|_{1/r} \|\nu_{s}\|_{1/s}, $$

(3.57)

so that the product in (3.56) is a bounded bilinear map.
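On a finite sample space, both the Hölder inequality (3.57) and the independence of the product from the chosen base measure can be verified directly. The following Python sketch assumes that all roots are written with respect to one common base measure `mu`; the helper names and the example data are hypothetical.

```python
def norm(phi, mu, r):
    """L^{1/r}(mu)-norm on a finite sample space: (sum |phi|^{1/r} mu)^r."""
    return sum(abs(p) ** (1.0 / r) * m for p, m in zip(phi, mu)) ** r


def product(phi, psi):
    """Coefficient of (phi mu^r) . (psi mu^s) = (phi psi) mu^{r+s}."""
    return [a * b for a, b in zip(phi, psi)]


def change_base(phi, w, r):
    """Coefficient of the same r-th root after replacing mu by w * mu, w > 0.

    Since mu^r = w^{-r} (w mu)^r, the coefficient phi becomes phi / w^r.
    """
    return [p / wi ** r for p, wi in zip(phi, w)]


# Hypothetical example data.
mu = [0.2, 0.3, 0.5]
phi = [1.0, -2.0, 0.5]   # represents phi * mu^r
psi = [3.0, 1.0, -1.0]   # represents psi * mu^s
r, s = 0.25, 0.5

holder_lhs = norm(product(phi, psi), mu, r + s)
holder_rhs = norm(phi, mu, r) * norm(psi, mu, s)

# Same computation after rescaling the base measure by a positive density w.
w = [2.0, 0.5, 4.0]
prod_new_base = product(change_base(phi, w, r), change_base(psi, w, s))
prod_expected = change_base(product(phi, psi), w, r + s)
```

The second comparison expresses that multiplying first and changing the base measure afterwards gives the same element as changing the base measure first.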

Definition 3.3

(Canonical pairing)

For $r \in(0,1)$ we define the pairing

$$ (\cdot;\cdot): {\mathcal {S}}^{r}({\varOmega }) \times {\mathcal {S}}^{1-r}( {\varOmega }) \longrightarrow {\mathbb {R}}, \qquad(\nu_{1}; \nu_{2}) := \int_{{\varOmega }}d(\nu_{1} \cdot\nu_{2}). $$

(3.58)

It is straightforward to verify that this pairing is non-degenerate in the sense that

$$ (\nu_{r}; \cdot) = 0 \Longleftrightarrow\nu_{r} = 0. $$

(3.59)

Lemma 3.1

(Cf. [200, Ex. IV.1.3])

Let $\{ \nu_{n}: n \in {\mathbb {N}}\}{\subseteq }{\mathcal {S}}({\varOmega })$ be a countable family of (signed) measures. Then there is a measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$ dominating $\nu_{n}$ for all $n$.

Proof

We assume w.l.o.g. that $\nu_{n} \neq0$ for all $n$ and define

$$\mu_{0} := \sum_{n=1}^{\infty}\frac{1}{2^{n} \|\nu_{n}\|_{TV}} |\nu_{n}|. $$

Since $\|\nu_{n}\|_{TV} = |\nu_{n}|({\varOmega })$, it follows that this sum converges, so that $\mu_{0} \in {\mathcal {M}}({\varOmega })$ is well defined. Moreover, if $\mu_{0}(A) = 0$, then $|\nu_{n}|(A) = 0$ for all $n$, showing that $\mu_{0}$ dominates all $\nu_{n}$ as claimed. □
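The construction in this proof can be sketched for a finite list of signed measures on a finite sample space, where the series reduces to a finite sum. The code below is an illustration under these assumptions; signed measures are lists of real weights, zero measures are skipped (matching the assumption $\nu_{n} \neq0$), and the function names are hypothetical.

```python
def total_variation(nu):
    """||nu||_TV = |nu|(Omega) for a signed measure on a finite sample space."""
    return sum(abs(x) for x in nu)


def dominating_measure(nus):
    """mu0 = sum_n |nu_n| / (2^n ||nu_n||_TV), summed over the nonzero nu_n."""
    mu0 = [0.0] * len(nus[0])
    n = 0
    for nu in nus:
        tv = total_variation(nu)
        if tv == 0.0:
            continue  # the zero measure is dominated by anything
        n += 1
        for i, x in enumerate(nu):
            mu0[i] += abs(x) / (2.0 ** n * tv)
    return mu0


def dominates(mu0, nu):
    """True if every mu0-null atom is also a |nu|-null atom."""
    return all(x == 0.0 for m, x in zip(mu0, nu) if m == 0.0)


# Hypothetical family of signed measures on a four-point sample space.
family = [
    [1.0, -1.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, -3.0, 0.0],
]
mu0 = dominating_measure(family)
```

Each nonzero summand contributes total mass $2^{-n}$, so the total mass of `mu0` is at most 1, which mirrors the convergence argument in the proof.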

From Lemma 3.1, we can now conclude the following statement:

$$ \mbox{Any sequence in ${\mathcal {S}}^{r}({\varOmega })$ is contained in ${\mathcal {S}}^{r}({\varOmega }, \mu_{0})$ for some $\mu_{0} \in {\mathcal {M}}( {\varOmega })$}. $$

In particular, any Cauchy sequence in ${\mathcal {S}}^{r}({\varOmega })$ is a Cauchy sequence in ${\mathcal {S}}^{r}({\varOmega }, \mu_{0}) \cong L^{1/r}({\varOmega }, \mu_{0})$ for some $\mu_{0}$ and hence convergent. Thus, $({\mathcal {S}}^{r}({\varOmega }), \|\cdot \|_{1/r})$ is a Banach space. It also follows that ${\mathcal {S}}^{r}({\varOmega }, \mu_{0})$ is a closed subspace of ${\mathcal {S}}^{r}({\varOmega })$ for all $\mu_{0} \in {\mathcal {M}}({\varOmega })$.

In analogy to Theorem 3.1, we can also determine the tangent cones of the subsets ${\mathcal {P}}^{r}({\varOmega }) {\subseteq }{\mathcal {M}}^{r}({\varOmega }) {\subseteq }{\mathcal {S}}^{r}({\varOmega })$.

Proposition 3.1

For each $\mu\in {\mathcal {M}}({\varOmega })$ ($\mu\in {\mathcal {P}}({\varOmega })$, respectively), the tangent cones of ${\mathcal {P}}^{r}({\varOmega }) {\subseteq }{\mathcal {M}}^{r}({\varOmega }) {\subseteq }{\mathcal {S}}^{r}({\varOmega })$ at $\mu^{r}$ are $T_{\mu^{r}} {\mathcal {M}}^{r}({\varOmega }) = {\mathcal {S}}^{r}({\varOmega }, \mu)$ and $T_{\mu^{r}} {\mathcal {P}}^{r}({\varOmega }) = {\mathcal {S}}^{r}_{0}({\varOmega }, \mu)$, respectively, so that the tangent fibrations are given as

$$\begin{aligned} T{\mathcal {M}}^{r}({\varOmega }) = \biguplus_{\mu^{r} \in {\mathcal {M}}^{r}({\varOmega })} {\mathcal {S}}^{r}( {\varOmega }, \mu ) {\subseteq }{\mathcal {M}}^{r}({\varOmega }) \times {\mathcal {S}}^{r}({\varOmega }) \end{aligned}$$

and

$$\begin{aligned} T{\mathcal {P}}^{r}({\varOmega }) = \biguplus_{\mu^{r} \in {\mathcal {P}}^{r}({\varOmega })} {\mathcal {S}}^{r}_{0}({\varOmega }, \mu) {\subseteq }{\mathcal {P}}^{r}({\varOmega }) \times {\mathcal {S}}^{r}({\varOmega }). \end{aligned}$$

Proof

We have to adapt the proof of Theorem 3.1. The proof of the statements ${\mathcal {S}}^{r}({\varOmega }, \mu) {\subseteq }T_{\mu^{r}}{\mathcal {M}}^{r}({\varOmega })$ and ${\mathcal {S}}_{0}^{r}({\varOmega }, \mu) {\subseteq }T_{\mu^{r}}{\mathcal {P}}^{r}({\varOmega })$ is identical to that of the corresponding statements in Theorem 3.1; just as in that case, one shows that for $\phi\in L^{1/r}({\varOmega }, \mu_{0})$ the curve $\mu_{t}^{r} := p({\omega }; t)\mu_{0}^{r}$ with $p({\omega };t) := 1 + t\phi({\omega })$ if $t\phi({\omega }) \geq0$ and $p({\omega };t) := \exp(t\phi({\omega }))$ if $t\phi({\omega }) < 0$ is a differentiable curve in ${\mathcal {M}}^{r}({\varOmega })$, and $\lambda_{t}^{r} := \mu_{t}^{r}/ \|\mu_{t}^{r}\|_{1/r}$ is a differentiable curve in ${\mathcal {P}}^{r}({\varOmega })$, and their derivative is $\phi\mu_{0}^{r}$ at $t=0$.

In order to show the other direction, let $(\mu_{t}^{r})_{t \in(-{\varepsilon }, {\varepsilon })}$ be a curve in ${\mathcal {M}}^{r}({\varOmega })$. Then ${\mathbb {Q}}\cap(-{\varepsilon }, {\varepsilon })$ is countable, whence by Lemma 3.1 there is a measure $\hat{\mu}$ such that $\mu_{t}^{r} \in {\mathcal {M}}^{r}({\varOmega }, \hat{\mu})$ for all $t \in {\mathbb {Q}}\cap(-{\varepsilon }, {\varepsilon })$, and since ${\mathcal {S}}^{r}({\varOmega }, \hat{\mu}) {\subseteq }{\mathcal {S}}^{r}({\varOmega })$ is closed, it follows that $\mu_{t}^{r} \in {\mathcal {M}}^{r}({\varOmega }, \hat{\mu})$ for all $t$. Now we can apply the argument from Theorem 3.1 to the curve $t \mapsto(\mu_{t}^{r} \cdot\hat{\mu}^{1-r})(A)$ for $A {\subseteq }{\varOmega }$. □

Remark 3.6

Just as in the case of $r = 1$, we may take two points of view on the relation of ${\mathcal {M}}^{r}({\varOmega })$ and ${\mathcal {P}}^{r}({\varOmega })$. The one is that ${\mathcal {P}}^{r}({\varOmega })$ may be regarded as a subset of ${\mathcal {M}}^{r}({\varOmega })$, but also, the normalization map

$${\mathcal {M}}^{r}({\varOmega }) \longrightarrow {\mathcal {P}}^{r}({\varOmega }), \qquad \mu_{r} \longmapsto\dfrac {\mu_{r}}{\|\mu_{r}\|_{1/r}} $$

allows us to regard ${\mathcal {P}}^{r}({\varOmega })$ as the projectivization ${\mathbb {P}}({\mathcal {M}}^{r}({\varOmega }))$. It will depend on the context which point of view is better adapted.

Besides multiplying roots of measures, we also wish to take their powers. Here, we have two possibilities for dealing with signs. For $0 < k \leq r^{-1}$ and $\nu_{r} = \phi\mu^{r} \in {\mathcal {S}}^{r}({\varOmega })$ we define

$$ |\nu_{r}|^{k} := |\phi|^{k} \mu^{rk} \quad\mbox{and} \quad\tilde{\nu}_{r}^{k} := \operatorname {sign}(\phi) |\phi|^{k} \mu^{rk}. $$

(3.60)

Since $\phi\in L^{1/r}({\varOmega }, \mu)$, it follows that $|\phi|^{k} \in L^{1/kr}({\varOmega }, \mu)$, so that $|\nu_{r}|^{k}, \tilde{\nu}_{r}^{k} \in {\mathcal {S}}^{rk}({\varOmega })$. By (3.52) these powers are well defined, independent of the choice of the measure $\mu$, and, moreover,

$$ \bigl\| \;|\nu_{r}|^{k}\;\bigr\| _{1/(rk)} = \bigl\| \tilde{\nu}_{r}^{k}\bigr\| _{1/(rk)} = \|\nu_{r} \|_{1/r}^{k}. $$

(3.61)

Proposition 3.2

Let $r \in(0,1]$ and $0 < k \leq1/r$, and consider the maps

$$\pi^{k}, \tilde{\pi}^{k}: {\mathcal {S}}^{r}({\varOmega }) \longrightarrow {\mathcal {S}}^{rk}({\varOmega }), \qquad \textstyle\begin{array}{l} \pi^{k}(\nu) := |\nu|^{k},\\ \tilde{\pi}^{k}(\nu) := \tilde{\nu}^{k}. \end{array} $$

Then $\pi^{k}, \tilde{\pi}^{k}$ are continuous maps. Moreover, for $1 < k \leq1/r$ they are $C^{1}$-maps between Banach spaces, and their derivatives are given as

$$ d_{\nu_{r}}\tilde{\pi}^{k}(\rho_{r}) = k \; |\nu_{r}|^{k-1} \cdot\rho_{r} \quad \textit{and} \quad d_{\nu_{r}} \pi^{k} (\rho_{r}) = k\; \tilde{\nu}_{r}^{k-1} \cdot\rho_{r}. $$

(3.62)

Observe that for $k = 1$, $\pi^{1}(\nu_{r}) = |\nu_{r}|$ fails to be $C^{1}$, whereas $\tilde{\pi}^{1}(\nu_{r})= \nu_{r}$, so that $\tilde{\pi}^{1}$ is the identity and hence a $C^{1}$-map.

Proof

Let us first assume that $0 < k \leq1$. We assert that for all $x, y \in {\mathbb {R}}$ we have the estimates

$$ \begin{aligned} & \bigl\vert |x+y|^{k}-|x|^{k} \bigr\vert \leq|y|^{k}\quad\mbox{and}\\ & \bigl\vert \operatorname {sign}(x+y) |x+y|^{k} - \operatorname {sign}(x) |x|^{k} \bigr\vert \leq2^{1-k} |y|^{k}. \end{aligned} $$

(3.63)

For $k = 1$, (3.63) is obvious. If $0 < k < 1$, then by homogeneity it suffices to show these for $y = 1$. Note that the functions

$$x \longmapsto|x+1|^{k}-|x|^{k} \quad\mbox{and} \quad x \longmapsto \operatorname {sign}(x+1) |x+1|^{k} - \operatorname {sign}(x) |x|^{k} $$

are continuous and tend to 0 for $x \to\pm\infty$, and then (3.63) follows by elementary calculus.
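The two estimates in (3.63) can be probed numerically by scanning a grid of values $x, y$ and recording the largest observed ratio of the left-hand side to $|y|^{k}$. The following Python sketch is only a sanity check, not a proof; the grid and the helper names `sign_power` and `worst_ratios` are assumptions made for illustration.

```python
import math


def sign_power(x, k):
    """sign(x) |x|^k."""
    return math.copysign(abs(x) ** k, x)


def worst_ratios(k, grid):
    """Largest observed values of lhs / |y|^k for both estimates in (3.63)."""
    worst_abs = 0.0
    worst_signed = 0.0
    for x in grid:
        for y in grid:
            if y == 0.0:
                continue
            bound = abs(y) ** k
            diff_abs = abs(abs(x + y) ** k - abs(x) ** k)
            diff_signed = abs(sign_power(x + y, k) - sign_power(x, k))
            worst_abs = max(worst_abs, diff_abs / bound)
            worst_signed = max(worst_signed, diff_signed / bound)
    return worst_abs, worst_signed


# Grid of dyadic points in [-5, 5]; it contains pairs with x = 0 and x = -y/2.
grid = [i / 8.0 for i in range(-40, 41)]
results = {k: worst_ratios(k, grid) for k in (0.25, 0.5, 0.75, 1.0)}
```

On this grid the first ratio reaches 1 at $x = 0$ and the second reaches $2^{1-k}$ at $x = -y/2$, which suggests that the constants in (3.63) cannot be improved.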

Let $\nu_{1}, \nu_{2} \in {\mathcal {S}}^{r}({\varOmega })$, and choose $\mu_{0} \in {\mathcal {M}}({\varOmega })$ such that $\nu_{1}, \nu_{2} \in {\mathcal {S}}^{r}({\varOmega }, \mu_{0})$, i.e., $\nu_{i} = \phi_{i} \mu _{0}^{r}$ with $\phi_{i} \in L^{1/r}({\varOmega }, \mu_{0})$. Then

$$\begin{aligned} \bigl\| \pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}(\nu_{1})\bigr\| _{1/rk} = & \bigl\| | \phi_{1} + \phi _{2}|^{k} - |\phi_{1}|^{k} \bigr\| _{1/rk}\\ \leq& \bigl\| \;|\phi_{2}|^{k}\bigr\| _{1/rk} \quad\mbox{by (3.63)}\\ = & \|\nu_{2}\|_{1/r}^{k} \quad\mbox{by (3.61)}, \end{aligned}$$

so that $\lim_{\|\nu_{2}\|_{1/r} \to0} \|\pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}(\nu _{1})\|_{1/rk} = 0$, showing the continuity of $\pi^{k}$ for $0 < k \leq 1$. The continuity of $\tilde{\pi}^{k}$ follows analogously.

Now let us assume that $1 < k \leq1/r$. In this case, the functions

$$x \longmapsto|x|^{k} \quad\mbox{and} \quad x \longmapsto \operatorname {sign}(x) |x|^{k} $$

with $x \in {\mathbb {R}}$ are $C^{1}$-maps with respective derivatives

$$x \longmapsto k\; \operatorname {sign}(x) |x|^{k-1} \quad\mbox{and} \quad x \longmapsto k |x|^{k-1}. $$

Thus, if we pick $\nu_{i} = \phi_{i} \mu_{0}^{r}$ as above, then by the mean value theorem we have

$$\begin{aligned} \pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}( \nu_{1}) = & \bigl(|\phi_{1} + \phi_{2}|^{k} - |\phi _{1}|^{k}\bigr) \mu_{0}^{rk} \\ = & k\; \operatorname {sign}(\phi_{1} + \eta\phi_{2}) |\phi_{1} + \eta\phi_{2}|^{k - 1} \phi_{2} \mu_{0}^{rk} \\ = & k\; \operatorname {sign}(\phi_{1} + \eta\phi_{2}) |\phi_{1} + \eta\phi_{2}|^{k - 1} \mu _{0}^{r(k-1)} \cdot \nu_{2} \end{aligned}$$

for some function $\eta: {\varOmega }\to(0,1)$. If we let $\nu_{\eta}:= \eta \phi_{2} \mu_{0}^{r}$, then $\|\nu_{\eta}\|_{1/r} \leq\|\nu_{2}\|_{1/r}$, and we get

$$\pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}( \nu_{1}) = k \tilde{\pi}^{k-1}(\nu_{1} + \nu _{\eta}) \cdot\nu_{2}. $$

With the definition of $d_{\nu_{1}}\pi^{k}$ from (3.62) we have

$$\begin{aligned} &\bigl\| \pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}(\nu_{1}) - d_{\nu_{1}} \pi^{k}( \nu_{2})\bigr\| _{1/(rk)} \\ &\quad= \bigl\| k \bigl(\tilde{\pi}^{k-1}(\nu_{1} + \nu_{\eta}) - \tilde{\pi}^{k-1}(\nu_{1})\bigr) \cdot\nu_{2} \bigr\| _{1/(rk)} \\ &\quad\leq k \bigl\| \tilde{\pi}^{k-1}(\nu_{1} + \nu_{\eta}) - \tilde{\pi}^{k-1}(\nu _{1})\bigr\| _{1/(r(k-1))} \|\nu_{2} \|_{1/r} \end{aligned}$$

and hence,

$$\frac{\|\pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}(\nu_{1}) - d_{\nu_{1}} \pi^{k}(\nu_{2})\| _{\frac{1}{rk}}}{\|\nu_{2}\|_{\frac{1}{r}}} \leq k \bigl\| \tilde{\pi}^{k-1}(\nu_{1} + \nu_{\eta}) - \tilde{\pi}^{k-1}(\nu_{1}) \bigr\| _{\frac{1}{r(k-1)}}. $$

Thus, the differentiability of $\pi^{k}$ will follow if

$$\bigl\| \tilde{\pi}^{k-1}(\nu_{1} + \nu_{\eta}) - \tilde{\pi}^{k-1}(\nu_{1})\bigr\| _{1/(r(k-1))} \xrightarrow{\| \nu_{2}\|_{1/r} \to0} 0, $$

and because of $\|\nu_{\eta}\|_{1/r} \leq\|\nu_{2}\|_{1/r}$, this is the case if $\tilde{\pi}^{k - 1}$ is continuous.

Analogously, one shows that $\tilde{\pi}^{k}$ is differentiable if $\pi ^{k-1}$ is continuous.

Since we already know continuity of $\pi^{k}$ and $\tilde{\pi}^{k}$ for $0 < k \leq1$, and since $C^{1}$-maps are continuous, the claim now follows by induction on $\lceil k \rceil$. □
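The differentiability statement can be illustrated on a finite sample space by evaluating the quotient that appears in the proof: the $\|\cdot\|_{1/(rk)}$-norm of the remainder $\pi^{k}(\nu_{1} + \nu_{2}) - \pi^{k}(\nu_{1}) - d_{\nu_{1}}\pi^{k}(\nu_{2})$ divided by $\|\nu_{2}\|_{1/r}$. The Python sketch below works with coefficients relative to a fixed base measure; the function names and the example data are hypothetical, and the choice $r = 1/2$, $k = 3/2$ is just one admissible case with $1 < k \leq1/r$.

```python
import math


def norm(phi, mu, r):
    """L^{1/r}(mu)-norm on a finite sample space: (sum |phi|^{1/r} mu)^r."""
    return sum(abs(p) ** (1.0 / r) * m for p, m in zip(phi, mu)) ** r


def pi_k(phi, k):
    """Coefficient of |nu|^k for nu = phi mu^r."""
    return [abs(p) ** k for p in phi]


def d_pi_k(phi, rho, k):
    """Coefficient of k * sign(phi) |phi|^{k-1} * rho, cf. (3.62)."""
    return [k * math.copysign(abs(p) ** (k - 1.0), p) * q for p, q in zip(phi, rho)]


def remainder_ratio(phi, rho, mu, r, k, t):
    """Remainder norm in L^{1/(rk)} divided by the norm of t*rho in L^{1/r}."""
    shifted = [p + t * q for p, q in zip(phi, rho)]
    linear = d_pi_k(phi, rho, k)
    remainder = [a - b - t * c
                 for a, b, c in zip(pi_k(shifted, k), pi_k(phi, k), linear)]
    return norm(remainder, mu, r * k) / norm([t * q for q in rho], mu, r)


# Hypothetical data; the third coefficient of phi vanishes on purpose.
mu = [0.2, 0.3, 0.5]
phi = [1.0, -2.0, 0.0]
rho = [1.0, 1.0, 1.0]
r, k = 0.5, 1.5

ratios = [remainder_ratio(phi, rho, mu, r, k, t) for t in (1e-1, 1e-2, 1e-3)]
```

The ratios shrink as the step decreases, even though one coefficient of `phi` is zero, where $x \mapsto|x|^{k}$ is only once differentiable.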

Thus, (3.62) implies that the differentials of $\pi^{k}$ and $\tilde{\pi}^{k}$ (which coincide on ${\mathcal {P}}^{r}({\varOmega })$ and ${\mathcal {M}}^{r}({\varOmega })$) yield continuous maps

$$ d\pi^{k} = d\tilde{\pi}^{k}: \textstyle\begin{array}{c@{\quad}c@{\quad}c} T{\mathcal {P}}^{r}({\varOmega }) & \longrightarrow& T{\mathcal {P}}^{rk}({\varOmega })\\ T{\mathcal {M}}^{r}({\varOmega }) & \longrightarrow& T{\mathcal {M}}^{rk}({\varOmega }), \end{array}\displaystyle \qquad(\mu, \rho) \longmapsto k \mu^{rk-r} \cdot\rho. $$

3.2.4 Parametrized Measure Models and $k$-Integrability

In this section, we shall now present our notion of a parametrized measure model.

Definition 3.4

(Parametrized measure model)

Let ${\varOmega }$ be a measurable space.

(1)
A parametrized measure model is a triple $(M, {\varOmega }, \mathbf{p})$ where $M$ is a (finite or infinite-dimensional) Banach manifold and $\mathbf{p}: M \to {\mathcal {M}}({\varOmega }) {\subseteq }{\mathcal {S}}({\varOmega })$ is a $C^{1}$-map in the sense of Definition C.3.
(2)
The triple $(M, {\varOmega }, \mathbf{p})$ is called a statistical model if it consists only of probability measures, i.e., such that the image of $\mathbf{p}$ is contained in ${\mathcal {P}}({\varOmega })$.
(3)
We call such a model dominated by $\mu_{0}$ if the image of $\mathbf{p}$ is contained in ${\mathcal {M}}({\varOmega }, \mu_{0})$. In this case, we use the notation $(M, {\varOmega }, \mu_{0}, \mathbf {p})$ for this model.

If a parametrized measure model $(M, {\varOmega }, \mu_{0}, \mathbf{p})$ is dominated by $\mu_{0}$, then there is a density function $p: {\varOmega }\times M \to {\mathbb {R}}$ such that

$$ {\mathbf{p}}(\xi) = p(\cdot;\xi) \mu_{0}. $$

(3.64)

Evidently, we must have $p(\cdot;\xi) \in L^{1}({\varOmega }, \mu_{0})$ for all $\xi$. In particular, for fixed $\xi$, $p(\cdot;\xi)$ is defined only up to changes on a $\mu_{0}$-null set. The existence of a dominating measure $\mu_{0}$ is not a strong restriction, as the following shows.

Proposition 3.3

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model. If $M$ contains a countable dense subset, e.g., if $M$ is a finite-dimensional manifold, then there is a measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$ dominating the model.

Proof

Let $(\xi_{n})_{n \in {\mathbb {N}}} {\subseteq }M$ be a dense countable subset. By Lemma 3.1, there is a measure $\mu_{0}$ dominating all measures $\mathbf{p}(\xi_{n})$ for $n \in {\mathbb {N}}$, i.e., $\mathbf{p}(\xi_{n}) \in {\mathcal {M}}({\varOmega }, \mu_{0})$. If $\xi_{n_{k}} \to \xi$, so that $\mathbf{p}(\xi_{n_{k}}) \to{\mathbf{p}}(\xi)$, then as the inclusion ${\mathcal {S}}({\varOmega }, \mu_{0}) \hookrightarrow {\mathcal {S}}({\varOmega })$ is an isometry by (3.53), it follows that $(\mathbf{p}(\xi_{n_{k}}))_{k \in {\mathbb {N}}}$ is a Cauchy sequence in ${\mathcal {S}}({\varOmega }, \mu_{0})$, and as the latter is complete, it follows that $\mathbf{p}(\xi) \in {\mathcal {S}}({\varOmega }, \mu_{0}) \cap {\mathcal {M}}({\varOmega }) = {\mathcal {M}}({\varOmega }, \mu_{0})$. □

Definition 3.5

(Regular density function)

Let $(M, {\varOmega }, \mu_{0}, \mathbf{p})$ be a parametrized measure model dominated by $\mu_{0}$. We say that this model has a regular density function if the density function $p: {\varOmega }\times M \to {\mathbb {R}}$ satisfying (3.64) can be chosen such that for all $V \in T_{\xi}M$ the partial derivative $\partial_{V} p(\cdot;\xi)$ exists and lies in $L^{1}({\varOmega },\mu_{0})$.

Remark 3.7

The standard notion of a statistical model always assumes that it is dominated by some measure and has a positive regular density function (e.g., [9, §2, p. 25], [16, §2.1], [219], [25, Definition 2.4]). In fact, the definition of a parametrized measure model or statistical model in [25, Definition 2.4] is equivalent to a parametrized measure model or statistical model with a positive regular density function in the sense of Definition 3.5. In contrast, in [26], the assumption of regularity and, more importantly, of the positivity of the density function $p$ is dropped.

It is worth pointing out that the density function $p$ of a parametrized measure model $(M, {\varOmega }, \mu_{0}, \mathbf{p})$ does not need to be regular, so that the present notion is indeed more general. The formal definition of differentiability of $\mathbf{p}$ implies that for each $C^{1}$-path $\xi(t) \in M$ with $\xi(0) = \xi$, $\dot{\xi}(0) =: V \in T_{\xi}M$, the curve $t \mapsto p(\cdot;\xi(t)) \in L^{1}({\varOmega }, \mu _{0})$ is differentiable. This implies that there is a $d_{\xi}{\mathbf {p}}(V) \in L^{1}({\varOmega }, \mu_{0})$ such that

$$\biggl\Vert \frac{p(\cdot;\xi(t)) - p(\cdot;\xi)}{t} - d_{\xi}{\mathbf {p}}(V) (\cdot) \biggr\Vert _{1} \xrightarrow{\quad t \to0\quad} 0. $$

If this is a pointwise convergence, then $d_{\xi}{\mathbf{p}}(V) = \partial_{V} p(\cdot; \xi)$ is the partial derivative, and hence $\partial_{V} p(\cdot; \xi)$ lies in $L^{1}({\varOmega }, \mu_{0})$, so that the density function is regular.

However, in general convergence in $L^{1}({\varOmega }, \mu_{0})$ does not imply pointwise convergence, whence there are parametrized measure models in the sense of Definition 3.4 without a regular density function, cf. Example 3.2(2) below. Nevertheless, we shall use the following notations interchangeably,

$$ d_{\xi}{\mathbf{p}}(V) = {\partial }_{V} \mathbf{p} = \partial_{V} p(\cdot;\xi)\; \mu_{0}, $$

(3.65)

even if $\mathbf{p}$ does not have a regular density function and the derivative $\partial_{V} p(\cdot; \xi)$ does not exist.

Example 3.2

(1)
The family of normal distributions on ℝ
$$p(\mu, \sigma) := \frac{1}{\sqrt{2\pi}\,\sigma} \exp\biggl(-\frac{(x-\mu )^{2}}{2\sigma^{2}}\biggr)\; dx $$
is a statistical model with regular density function on the upper half-plane $H = \{(\mu, \sigma) : \mu, \sigma\in {\mathbb {R}}, \sigma> 0\}$.
(2)
To see that there are parametrized measure models without a regular density function, consider the family $(\mathbf{p}(\xi))_{\xi> -1}$ of measures on ${\varOmega }= (0,\pi)$
$${\mathbf{p}}(\xi) := \left\{ \textstyle\begin{array}{l@{\quad}l} (1 + \xi\; (\sin^{2} (t - 1/\xi))^{1/\xi^{2}})\; dt & \mbox{for $\xi\neq0$,}\\ dt & \mbox{for $\xi= 0$}. \end{array}\displaystyle \right. $$
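This model is dominated by the Lebesgue measure $dt$, with density function $p(t;\xi) = 1 + \xi\; (\sin^{2} (t - 1/\xi))^{1/\xi^{2}}$ for $\xi\neq0$, $p(t; 0) = 1$. For fixed $t$, the difference quotient $(p(t;\xi) - p(t;0))/\xi= (\sin^{2} (t - 1/\xi))^{1/\xi^{2}}$ equals $1$ whenever $t - 1/\xi\in\pi/2 + \pi {\mathbb {Z}}$ and equals $0$ whenever $t - 1/\xi\in\pi {\mathbb {Z}}$, and both cases occur for arbitrarily small $|\xi|$. Thus, the partial derivative $\partial_{\xi}p$ does not exist at $\xi= 0$, whence the density function is not regular.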

On the other hand, $\mathbf{p}: (-1, \infty) \to {\mathcal {M}}({\varOmega }, dt)$ is differentiable at $\xi= 0$ with $d_{0}{\mathbf{p}}(\partial_{\xi}) = 0$, so that $(M, {\varOmega }, \mathbf{p})$ is a parametrized measure model in the sense of Definition 3.4. To see this, we calculate
$$\begin{aligned} \begin{aligned} \biggl\Vert \frac{\mathbf{p}(\xi) - \mathbf{p}(0)}{\xi}\biggr\Vert _{TV} & = \bigl\Vert \bigl(\sin^{2} (t - 1/\xi)\bigr)^{1/\xi^{2}}\; dt \bigr\Vert _{1} \\ & = \int_{0}^{\pi}\bigl(\sin^{2} (t - 1/\xi) \bigr)^{1/\xi^{2}}\; dt \\ & = \int_{0}^{\pi}\bigl(\sin^{2} t \bigr)^{1/\xi^{2}}\; dt \xrightarrow{\xi\to0} 0, \end{aligned} \end{aligned}$$
which shows the claim. Here, we used the $\pi$-periodicity of the integrand for fixed $\xi$ and dominated convergence in the last step.
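The two phenomena in this example, convergence of the difference quotient to $0$ in $L^{1}$ and its failure to converge pointwise, can be observed numerically. The Python sketch below uses a plain midpoint rule for the integral; the function names, the step count, and the two parameter sequences `xi_peak` and `xi_zero` are assumptions chosen for illustration.

```python
import math


def l1_quotient_norm(xi, steps=5000):
    """Midpoint-rule approximation of int_0^pi (sin^2 t)^(1/xi^2) dt."""
    h = math.pi / steps
    power = 1.0 / (xi * xi)
    return h * sum(math.sin((i + 0.5) * h) ** (2.0 * power) for i in range(steps))


def pointwise_quotient(t, xi):
    """(p(t; xi) - p(t; 0)) / xi = (sin^2(t - 1/xi))^(1/xi^2) for xi != 0."""
    return (math.sin(t - 1.0 / xi) ** 2) ** (1.0 / (xi * xi))


def xi_peak(t, n):
    """Parameter with t - 1/xi = pi/2 - n*pi, where sin^2 equals 1."""
    return 1.0 / (t - math.pi / 2.0 + n * math.pi)


def xi_zero(t, n):
    """Parameter with t - 1/xi = -n*pi, where sin^2 equals 0."""
    return 1.0 / (t + n * math.pi)


# The L^1-norm of the difference quotient for decreasing xi.
l1_norms = [l1_quotient_norm(xi) for xi in (1.0, 0.5, 0.2, 0.1, 0.05)]

# Two sequences xi -> 0 along which the quotient at t = 1 behaves differently.
t0 = 1.0
along_peaks = [pointwise_quotient(t0, xi_peak(t0, n)) for n in range(5, 30)]
along_zeros = [pointwise_quotient(t0, xi_zero(t0, n)) for n in range(5, 30)]
```

The integrals decrease towards $0$, while at the fixed point `t0` the quotient stays near $1$ along one sequence and near $0$ along the other.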

Since for a parametrized measure model $(M, {\varOmega }, \mathbf{p})$ the map $\mathbf{p}$ is $C^{1}$, it follows that its derivative yields a continuous map between the tangent fibrations

$$d\mathbf{p}: {TM} \longrightarrow T{\mathcal {M}}({\varOmega }) = \biguplus_{\mu\in {\mathcal {M}}({\varOmega })} {\mathcal {S}}({\varOmega }, \mu). $$

That is, for each tangent vector $V \in T_{\xi}M$, its differential $d_{\xi}{\mathbf{p}}(V)$ is contained in ${\mathcal {S}}({\varOmega }, \mathbf{p}(\xi))$ and hence dominated by $\mathbf{p}(\xi)$. Therefore, we can take the Radon–Nikodym derivative of $d_{\xi}{\mathbf{p}}(V)$ w.r.t. $\mathbf {p}(\xi)$.

Definition 3.6

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model. Then for each tangent vector $V \in T_{\xi}M$ of $M$, we define

$$ \partial_{V} \log{\mathbf{p}}(\xi) := \frac{d\{d_{\xi}{\mathbf{p}}(V)\}}{d\mathbf{p}(\xi)} \in L^{1}\bigl({\varOmega }, \mathbf{p}(\xi)\bigr) $$

(3.66)

and call this the logarithmic derivative of $\mathbf{p}$ at $\xi$ in the direction $V$.

If such a model is dominated by $\mu_{0}$ and has a positive regular density function $p$ for which (3.64) holds, then we calculate the Radon–Nikodym derivative as

$$\begin{aligned} \frac{d\{d_{\xi}{\mathbf{p}}(V)\}}{d\mathbf{p}(\xi)} = & \frac{d\{ d_{\xi}{\mathbf{p}}(V)\}}{d\mu_{0}} \cdot \biggl(\frac{d\mathbf{p}(\xi )}{d\mu_{0}} \biggr)^{-1}\\ = & \partial_{V} p(\cdot;\xi) \bigl(p(\cdot;\xi) \bigr)^{-1} = \partial_{V} \log p(\cdot;\xi). \end{aligned}$$

This justifies the notation in (3.66) even for models without a positive regular density function.
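For a model with a positive regular density, the identity $\partial_{V} \log p = \partial_{V} p \cdot p^{-1}$ can be checked numerically. The Python sketch below does this for the normal family, using the standard density with normalizing constant $1/(\sqrt{2\pi}\,\sigma)$ and a central difference for $\partial_{V} p$; the function names, the step size, and the evaluation points are assumptions made for illustration.

```python
import math


def normal_density(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) w.r.t. dx."""
    return math.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (math.sqrt(2.0 * math.pi) * sigma)


def log_derivative(x, mu, sigma, v):
    """Closed form of d_V log p for the direction V = (v_mu, v_sigma)."""
    v_mu, v_sigma = v
    return (v_mu * (x - mu) / sigma ** 2
            + v_sigma * ((x - mu) ** 2 - sigma ** 2) / sigma ** 3)


def log_derivative_fd(x, mu, sigma, v, h=1e-6):
    """(d_V p) / p, with d_V p approximated by a central difference."""
    v_mu, v_sigma = v
    plus = normal_density(x, mu + h * v_mu, sigma + h * v_sigma)
    minus = normal_density(x, mu - h * v_mu, sigma - h * v_sigma)
    return (plus - minus) / (2.0 * h) / normal_density(x, mu, sigma)


# Hypothetical parameter point, direction, and sample points.
mu, sigma = 0.5, 1.2
direction = (1.0, -0.5)
points = [-2.0, 0.0, 1.0, 3.5]

errors = [abs(log_derivative(x, mu, sigma, direction)
              - log_derivative_fd(x, mu, sigma, direction)) for x in points]
```

The closed form used in `log_derivative` is obtained by differentiating $\log p = -\log(\sqrt{2\pi}\,\sigma) - (x-\mu)^{2}/(2\sigma^{2})$ in $\mu$ and $\sigma$.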

For a parametrized measure model $(M, {\varOmega }, \mathbf{p})$ and $k > 1$ we consider the map

$$ {\mathbf{p}}^{1/k} := \pi^{1/k} {\mathbf{p}}: M \longrightarrow {\mathcal {M}}^{1/k}({\varOmega }) {\subseteq }{\mathcal {S}}^{1/k}({\varOmega }), \qquad\xi \longmapsto{\mathbf {p}}(\xi)^{1/k}. $$

Since $\pi^{1/k}$ is continuous by Proposition 3.2, it follows that $\mathbf{p}^{1/k}$ is continuous as well. We define the following notions of $k$-integrability.

Definition 3.7

($k$-Integrable parametrized measure model)

A parametrized measure model $(M, {\varOmega }, \mathbf{p})$ (statistical model, respectively) is called $k$-integrable if the map

$${\mathbf{p}}^{1/k}: M \longrightarrow {\mathcal {M}}^{1/k}({\varOmega }) {\subseteq }{\mathcal {S}}^{1/k}({\varOmega }) $$

is a $C^{1}$-map in the sense of Definition C.1. It is called weakly $k$-integrable if this map is a weak $C^{1}$-map, again in the sense of Definition C.1.

Furthermore, we call the model (weakly) $\infty$-integrable if it is (weakly) $k$-integrable for all $k \geq1$.

Evidently, every parametrized measure model is 1-integrable by definition. Moreover, since for $1 \leq l < k$ we have $\mathbf {p}^{1/l} = \pi^{k/l}{\mathbf{p}}^{1/k}$ and $\pi^{k/l}$ is a $C^{1}$-map by Proposition 3.2, it follows that (weak) $k$-integrability implies (weak) $l$-integrability for $1 \leq l < k$.

Example 3.3

An exponential family, as defined by (3.31), generalizes the family of normal distributions and represents an extremely important statistical model. It will play a central role throughout this book.

Adapting (3.31) to the notation in this context, we can write it for $\xi= (\xi^{1}, \ldots, \xi^{n}) \in U {\subseteq }{\mathbb {R}}^{n}$ as

$$ {\mathbf{p}}(\xi)=\exp \Biggl(\gamma({\omega })+\sum _{i=1}^{n} f_{i}({\omega })\xi^{i}-\psi (\xi) \Biggr)\mu({\omega }), $$

(3.67)

for suitable functions $f_{i}, \gamma$ on ${\varOmega }$ and $\psi$ on $U$. Therefore, for $V = (v_{1}, \ldots, v_{n})$,

$$\begin{aligned} {\partial }_{V} \mathbf{p}^{1/k}(\xi) &= \dfrac{1}{k} \Biggl(\,\sum_{i=1}^{n} v_{i} f_{i}({\omega }) - {\partial }_{V} \psi(\xi) \Biggr)\\ &\quad{}\times\exp \Biggl(\gamma({\omega })/k + \sum_{i=1}^{n} f_{i}({\omega })\xi^{i}/k - \psi(\xi)/k \Biggr) \mu( {\omega })^{1/k}, \end{aligned}$$

and the $k$-integrability of this model for any $k$ is easily verified from there. Therefore, exponential families provide a class of examples of $\infty$-integrable parametrized measure models. See also [216, p. 1559].
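The derivative of $\mathbf{p}^{1/k}$ for an exponential family can be checked against a finite-difference approximation on a finite sample space with $\mu$ the counting measure. In the Python sketch below, $\psi$ is taken to be the log-normalizer that makes $\mathbf{p}(\xi)$ a probability measure, so that ${\partial }_{V}\psi$ is the expectation of $\sum_{i} v_{i} f_{i}$ and enters with the minus sign coming from the term $-\psi(\xi)/k$ in the exponent. These choices, the function names, and the example data are assumptions made for illustration.

```python
import math


def density(xi, gamma, f):
    """p(omega; xi) = exp(gamma + sum_i f_i xi^i - psi(xi)) w.r.t. counting measure."""
    weights = [math.exp(g + sum(fi[w] * x for fi, x in zip(f, xi)))
               for w, g in enumerate(gamma)]
    z = sum(weights)  # z = exp(psi(xi)) normalizes to total mass 1
    return [wt / z for wt in weights]


def d_root(xi, v, gamma, f, k):
    """(1/k) (sum_i v_i f_i - d_V psi) p^{1/k}, with d_V psi = E_p[sum_i v_i f_i]."""
    p = density(xi, gamma, f)
    lin = [sum(fi[w] * vi for fi, vi in zip(f, v)) for w in range(len(p))]
    d_psi = sum(s * pw for s, pw in zip(lin, p))
    return [(s - d_psi) * pw ** (1.0 / k) / k for s, pw in zip(lin, p)]


def d_root_fd(xi, v, gamma, f, k, h=1e-6):
    """Central-difference approximation of the derivative of p^{1/k} along v."""
    plus = density([x + h * vi for x, vi in zip(xi, v)], gamma, f)
    minus = density([x - h * vi for x, vi in zip(xi, v)], gamma, f)
    return [(a ** (1.0 / k) - b ** (1.0 / k)) / (2.0 * h) for a, b in zip(plus, minus)]


# Hypothetical two-parameter family on a four-point sample space.
gamma = [0.0, 0.1, -0.2, 0.3]
f = [[1.0, 0.0, -1.0, 2.0], [0.5, -0.5, 1.0, 0.0]]
xi = [0.3, -0.7]
v = [1.0, 2.0]
k = 3.0

exact = d_root(xi, v, gamma, f, k)
approx = d_root_fd(xi, v, gamma, f, k)
max_error = max(abs(a - b) for a, b in zip(exact, approx))
```

On a finite sample space every $L^{k}$-norm of the derivative is finite and depends continuously on $(\xi, V)$, so this toy family is $k$-integrable for every $k$.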

Remark 3.8

(1)
In Sect. 2.8.1, we have introduced exponential families for finite sets as affine spaces. Let us comment on that structure for the setting of arbitrary measurable spaces as described in [192]. Consider the affine action $(\mu,f) \mapsto e^{f} \mu$, defined by (2.130). Clearly, multiplication of a finite measure $\mu\in{\mathcal {M}}_{+}(\varOmega)$ with $e^{f}$ will in general lead to a positive measure on $\varOmega$ that is not finite but $\sigma$-finite. As we restrict attention to finite measures, and thereby obtain the Banach space structure of ${\mathcal {S}}(\varOmega)$, it is not possible to extend the affine structure of Sect. 2.8.1 to the general setting of measurable spaces. However, we can still define exponential families as geometric objects that correspond to a “section” of an affine space, leading to the expression (3.31).
(2)
Fukumizu [101] proposed the notion of a kernel exponential family, based on the theory of reproducing kernel Hilbert spaces. In this approach, one considers a Hilbert space ℋ of functions $\varOmega\to{\mathbb {R}}$ that is defined in terms of a so-called reproducing kernel $k: \varOmega\times\varOmega\to{\mathbb {R}}$. In particular, the inner product on ℋ is given as
$$\langle f, g \rangle_{\mathcal {H}} = \int \int f({\omega }) \, g\bigl({\omega }'\bigr) \, k\bigl({\omega }, {\omega }'\bigr) \, \mu_{0}({d{\omega }}) \, \mu_{0} \bigl({d{\omega }'}\bigr). $$
We will revisit this product in Sect. 6.4 (see (6.196)). Within this framework, the sum $\sum_{i=1}^{n} f_{i}({\omega }) \, \xi^{i}$ in (3.67) is then replaced by an integral $\int k({\omega }, {\omega }') \, \xi({\omega }') \, \mu_{0}(d {\omega }')$, leading to the definition of a kernel exponential family.

Proposition 3.4

Let $(M, {\varOmega }, \mathbf{p})$ be a (weakly) $k$-integrable parametrized measure model. Then its (weak) derivative is given as

$$ {\partial }_{V} \mathbf{p}^{1/k}(\xi) := \frac{1}{k} \partial_{V} \log{\mathbf{p}}(\xi) \; \mathbf{p}^{1/k}(\xi) \in {\mathcal {S}}^{1/k}\bigl({\varOmega }, \mathbf{p}(\xi) \bigr), $$

(3.68)

and for any functional $\alpha\in {\mathcal {S}}^{1/k}({\varOmega })'$ we have

$$ {\partial }_{V} \alpha\bigl(\mathbf{p}(\xi)^{1/k} \bigr) = \alpha\bigl({\partial }_{V} \mathbf{p}^{1/k}(\xi)\bigr). $$

(3.69)

Observe that if $\mathbf{p}(\xi) = p(\cdot;\xi) \mu_{0}$ with a regular density function $p$, the derivative ${\partial }_{V} p^{1/k}({\omega }; \xi)$ is indeed the partial derivative of the function $p({\omega };\xi)^{1/k}$.

Proof

Suppose that the model is weakly $k$-integrable, i.e., $\mathbf{p}^{1/k}$ is weakly differentiable, and let $V \in T_{\xi}M$ and $\alpha\in {\mathcal {S}}({\varOmega })'$. Then

$$\begin{aligned} \alpha\bigl({\partial }_{V} \log{\mathbf{p}}(\xi) \mathbf{p}(\xi)\bigr) =& \alpha({\partial }_{V} \mathbf{p}) \stackrel{\scriptsize\mbox{(3.66)}}{=} \alpha\bigl( {\partial }_{V} \bigl(\pi^{k} \mathbf {p}^{1/k}\bigr)\bigr) \\ \stackrel{\scriptsize\mbox{(3.62)}}{=}& \alpha\bigl(k \mathbf{p}(\xi)^{1-1/k} \cdot {\partial }_{V} \mathbf{p}^{1/k}\bigr), \end{aligned}$$

whence

$$\alpha \bigl( \bigl(k {\partial }_{V} \mathbf{p}^{1/k} - {\partial }_{V} \log{\mathbf{p}}(\xi) \mathbf{p}^{1/k} \bigr) \cdot{ \mathbf{p}}(\xi)^{1-1/k} \bigr) = 0 $$

for all $\alpha\in {\mathcal {S}}({\varOmega })'$. On the other hand, $\partial_{V} \mathbf{p}^{1/k} \in T_{\mathbf{p}^{1/k}(\xi)}{\mathcal {M}}^{1/k}({\varOmega }) = {\mathcal {S}}^{1/k}({\varOmega }, \mathbf{p}(\xi))$ according to Proposition 3.1, and from this, (3.68) follows.

The identity (3.69) is simply the definition of ${\partial }_{V} \mathbf{p}^{1/k}(\xi)$ being the weak Gâteaux-derivative of $\mathbf{p}^{1/k}$, cf. Proposition C.2. □

The following now gives a description of (weak) integrability in terms of the (weak) derivative.

Theorem 3.2

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model. Then the following hold:

(1)
The model is $k$-integrable if and only if the map
$$ V \longmapsto\bigl\| {\partial }_{V} \mathbf{p}^{1/k} \bigr\| _{k} $$
(3.70)
defined on ${TM}$ is continuous.
(2)
The model is weakly $k$-integrable if and only if the weak derivative of $\mathbf{p}^{1/k}$ is weakly continuous, i.e., if for all $V_{0} \in {TM}$
$${\partial }_{V} \mathbf{p}^{1/k} \rightharpoonup {\partial }_{V_{0}} { \mathbf{p}}^{1/k} \quad\textit{as} \quad V \to V_{0}. $$

Remark 3.9

In [25, Definition 2.4], $k$-integrability (in case of a positive regular density function) was defined by the continuity of the norm function in (3.70), whence it coincides with Definition 3.7 by Theorem 3.2.

Our motivation for also introducing the more general definition of weak $k$-integrability is that it is the weakest condition that ensures that integration and differentiation of $k$th roots can be interchanged, as explained in the following.

If $(M, {\varOmega }, \mu_{0}, \mathbf{p})$ has the density function $p: {\varOmega }\times M \to {\mathbb {R}}$ given by (3.64), then for $\xi\in M$ and $V \in T_{\xi}M$ we have

$${\mathbf{p}}^{1/k}(\xi) = p(\cdot; \xi)^{1/k} \mu_{0}^{1/k} \quad\mbox{and} \quad {\partial }_{V} \mathbf{p}^{1/k} = {\partial }_{V} p(\cdot; \xi)^{1/k} \mu_{0}^{1/k}, $$

where

$$ {\partial }_{V} p(\cdot; \xi)^{1/k} := \dfrac{1}{k} \dfrac{{\partial }_{V} p(\cdot; \xi)}{p(\cdot ; \xi)^{1-1/k}} \in L^{k}({\varOmega }, \mu_{0}). $$

(3.71)

Thus, if for a measurable subset $A {\subseteq }{\varOmega }$ we let $\alpha(\cdot) := (\cdot; \chi_{A} \mu_{0}^{1-1/k})$ with the canonical pairing from (3.58), then (3.69) takes the form

$$ {\partial }_{V} \int_{A} p({\omega }; \xi)^{1/k}\; d\mu_{0}( {\omega }) = \int_{A} {\partial }_{V} p({\omega }; \xi )^{1/k}\; d \mu_{0}({\omega }). $$

(3.72)

Evidently, if $p$ is a regular density function, then the weak partial derivative is indeed the partial derivative of $p$, in which case (3.72) is obvious, as integration and differentiation may be interchanged under these regularity conditions.
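The interchange (3.72) can be illustrated numerically for a regular density. The sketch below uses the exponential densities $p({\omega };\xi) = \xi e^{-\xi{\omega }}$ with respect to Lebesgue measure, the set $A = [a, b]$, a midpoint quadrature, and a finite difference in $\xi$. All of these choices, as well as the function names, are assumptions made for illustration only.

```python
import math

def p(omega, xi):
    """Exponential density p(omega; xi) = xi * exp(-xi * omega) for omega > 0 (illustrative)."""
    return xi * math.exp(-xi * omega)

def dp(omega, xi):
    """Partial derivative of p with respect to xi."""
    return (1.0 - xi * omega) * math.exp(-xi * omega)

def d_root(omega, xi, k):
    """Derivative of p^(1/k) in the form (3.71): dp / (k * p^(1 - 1/k))."""
    return dp(omega, xi) / (k * p(omega, xi) ** (1.0 - 1.0 / k))

def midpoint(g, a, b, n=2000):
    """Midpoint rule for the integral of g over A = [a, b]."""
    step = (b - a) / n
    return step * sum(g(a + (j + 0.5) * step) for j in range(n))

def lhs(xi, k, a, b, h=1e-5):
    """Left-hand side of (3.72): finite-difference derivative of the integral of p^(1/k)."""
    upper = midpoint(lambda w: p(w, xi + h) ** (1.0 / k), a, b)
    lower = midpoint(lambda w: p(w, xi - h) ** (1.0 / k), a, b)
    return (upper - lower) / (2.0 * h)

def rhs(xi, k, a, b):
    """Right-hand side of (3.72): integral of the derivative of p^(1/k)."""
    return midpoint(lambda w: d_root(w, xi, k), a, b)

if __name__ == "__main__":
    print(lhs(1.5, 2, 0.0, 2.0), rhs(1.5, 2, 0.0, 2.0))
```

Since both sides use the same quadrature nodes, the remaining discrepancy comes only from the finite difference in $\xi$.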

Example 3.4

For arbitrary $k > 1$, the following is an example of a parametrized measure model which is $l$-integrable for all $1 \leq l < k$, weakly $k$-integrable, but not $k$-integrable.

Let ${\varOmega }= (-1,1)$ with the Lebesgue measure $dt$, and let $f: [0, \infty) \longrightarrow {\mathbb {R}}$ be a smooth function such that

$$f(u) > 0, f'(u) < 0 \quad\mbox{for $u \in[0,1)$,}\qquad f(u) \equiv0\quad \mbox{for $u \geq1$}. $$

For $\xi\in {\mathbb {R}}$, define the measure $\mathbf{p}(\xi) = p(\xi;t)\; dt$, where

$$p(\xi;t) := \textstyle\begin{cases} 1& \mbox{if $ t \leq0$ and $\xi\in {\mathbb {R}}$ arbitrary,}\\ |\xi|^{k-1} f(t |\xi|^{-1})^{k} & \mbox{if $\xi\neq0$ and $t > 0$,}\\ 0 & \mbox{otherwise}. \end{cases} $$

Since on its restrictions to $(-1,0]$ and $(0,1)$ the density function $p$ as well as $p^{r}$ with $0 < r < 1$ are positive with bounded derivative for $\xi\neq0$, it follows that $\mathbf{p}$ is $\infty $-integrable for $\xi\neq0$. For $\xi= 0$, we have

$$\bigl\| {\mathbf{p}}(\xi) - \mathbf{p}(0)\bigr\| _{1} = |\xi|^{k-1} \int_{0}^{|\xi|} f\bigl(t |\xi|^{-1} \bigr)^{k}\; dt = |\xi|^{k} \int_{0}^{1} f(u)^{k}\; du, $$

whence, as $k > 1$,

$$\lim_{\xi\to0} \dfrac{\|{\mathbf{p}}(\xi) - \mathbf{p}(0)\|_{1}}{|\xi|} = 0, $$

showing that $\mathbf{p}$ is also a $C^{1}$-map at $\xi= 0$ with differential ${\partial }_{\xi}{\mathbf{p}}(0) = 0$. That is, $({\mathbb {R}}, {\varOmega }, \mathbf {p})$ is a parametrized measure model with $d_{0}{\mathbf{p}} = 0$.

For $1 \leq l \leq k$ and $\xi\neq0$ we calculate

$$\begin{aligned} {\partial }_{\xi}{\mathbf{p}}^{1/l} &= \chi_{(0,1)}(t)\; {\partial }_{\xi}\bigl(|\xi |^{(k-1)/l} f\bigl(t|\xi|^{-1} \bigr)^{k/l} \bigr)\; dt^{1/l}\\ & = \chi_{(0,1)}(t) \operatorname {sign}(\xi) |\xi|^{(k-l-1)/l} \biggl( \dfrac {k-1}{l} f(u)^{k/l} - u \bigl(f^{k/l} \bigr)'(u) \biggr) \bigg|_{u=t|\xi|^{-1}}\; dt^{1/l}. \end{aligned}$$

Thus, it follows that

$$\begin{aligned} \bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/l}(\xi)\bigr\| _{l}^{l} &= |\xi|^{k-l} \int_{0}^{1} \biggl(\dfrac{k-1}{l} f(u)^{k/l} - u \bigl(f^{k/l}\bigr)'(u) \biggr)^{l}\; du, \end{aligned}$$

so that

$$\textstyle\begin{array}{llllll} \lim_{\xi\to0} \bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/l}(\xi)\bigr\| _{l} & = & 0 & = & \bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/l}(0)\bigr\| _{l} & \quad\mbox{for $1 \leq l < k$},\\ \lim_{\xi\to0} \bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/k}(\xi)\bigr\| _{k} & > & 0 & = & \bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/k}(0)\bigr\| _{k}. \end{array} $$

That is, by Theorem 3.2 the model is $l$-integrable for $1 \leq l < k$, but it fails to be $k$-integrable. On the other hand,

$$\bigl\| {\partial }_{\xi}{\mathbf{p}}^{1/k}(\xi) \cdot dt^{1-1/k} \bigr\| _{1} = |\xi|^{1-1/k} \int_{0}^{1} \biggl( \biggl(1-\frac{1}{k} \biggr) f(u) - u f'(u) \biggr)\; du, $$

so that $\lim_{\xi\to0} \|{\partial }_{\xi}{\mathbf{p}}^{1/k}(\xi) \cdot dt^{1-1/k}\|_{1} = 0$. As we shall show in Lemma 3.3 below, this implies that ${\partial }_{\xi}{\mathbf{p}}^{1/k}(\xi)\rightharpoonup 0 = {\partial }_{\xi}{\mathbf{p}}^{1/k}(0)$ as $\xi\to0$. Thus, the model is weakly $k$-integrable by Theorem 3.2.
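The scaling behaviour derived in Example 3.4 can be observed numerically. The following sketch approximates $\int_{0}^{1} |{\partial }_{\xi}p^{1/l}|^{e}\,dt$ for a concrete profile $f(u) = e^{-1/(1-u)}$ on $[0,1)$, which satisfies the requirements on $f$. The choice of $f$, the parameter named `exponent`, the discretization parameters, and all function names are assumptions of this sketch.

```python
import math

def f(u):
    """Smooth profile with f > 0, f' < 0 on [0, 1) and f = 0 for u >= 1 (one possible choice)."""
    return math.exp(-1.0 / (1.0 - u)) if u < 1.0 else 0.0

def root_density(xi, t, k, l):
    """l-th root of the density of Example 3.4 for xi != 0 and 0 < t < 1."""
    a = abs(xi)
    return a ** ((k - 1) / l) * f(t / a) ** (k / l)

def integrated_derivative(xi, k, l, exponent, n=4000, rel_step=1e-4):
    """Approximate the integral over (0, 1) of |d/dxi p^(1/l)|^exponent dt.

    Uses a central difference in xi and a midpoint rule in t (both are
    numerical approximations chosen for this sketch).
    """
    h = rel_step * abs(xi)
    upper = min(abs(xi) + h, 1.0)  # both shifted densities vanish for t >= |xi| + h
    dt = upper / n
    total = 0.0
    for j in range(n):
        t = (j + 0.5) * dt
        diff = (root_density(xi + h, t, k, l) - root_density(xi - h, t, k, l)) / (2.0 * h)
        total += abs(diff) ** exponent * dt
    return total

if __name__ == "__main__":
    k = 2
    for xi in (0.2, 0.1, 0.05):
        strong = integrated_derivative(xi, k, k, k)  # ||d p^(1/k)||_k^k
        lower = integrated_derivative(xi, k, 1, 1)   # ||d p^(1/l)||_l^l for l = 1
        weak = integrated_derivative(xi, k, k, 1)    # ||d p^(1/k) dt^(1-1/k)||_1
        print(xi, strong, lower, weak)
```

For $k = 2$, the first printed quantity should stay essentially constant as $\xi\to0$, whereas the other two should decay like $|\xi|^{k-l}$ with $l = 1$ and like $|\xi|^{1-1/k}$, respectively, in line with the formulas of the example.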

The rest of this section will be devoted to the proof of Theorem 3.2 which is somewhat technical and therefore will be divided into several lemmas.

Before starting the proof, let us give a brief outline of its structure.

We begin by proving the second statement of Theorem 3.2. Note that for a weak $C^{1}$-map the differential is weakly continuous by definition, so one direction of the proof is trivial. The reverse implication is the content of Lemmata 3.2 through 3.6.

We give a decomposition of the dual space ${\mathcal {S}}^{1/k}({\varOmega })'$ in Lemma 3.2 and a sufficient criterion for the weak convergence of sequences in ${\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})$ in Lemma 3.3 as well as a criterion for weak $k$-integrability in terms of interchanging differentiation and integration along curves in $M$ in Lemma 3.4.

Unfortunately, we are not able to verify this criterion directly for an arbitrary model $\mathbf{p}$. The technical obstacle is that the measures of the family $\mathbf{p}(\xi)$ need not be equivalent. We overcome this difficulty by modifying the model $\mathbf{p}$ to a model $\mathbf{p}_{{\varepsilon }}(\xi) := \mathbf{p}(\xi) + {\varepsilon }\mu_{0}$, where ${\varepsilon }> 0$ and $\mu_{0} \in {\mathcal {M}}({\varOmega })$ is a suitable measure, so that $\mathbf {p}_{{\varepsilon }}$ has a positive density function. Then we show in Lemma 3.5 that the differential of $\mathbf{p}_{{\varepsilon }}$ remains weakly continuous, and finally in Lemma 3.6 we show that $\mathbf{p}_{{\varepsilon }}$ is weakly $k$-integrable as it satisfies the criterion given in Lemma 3.4; furthermore it is shown that taking the limit ${\varepsilon }\to0$ implies the weak $k$-integrability of $\mathbf{p}$ as well, proving the second part of Theorem 3.2.

The first statement of Theorem 3.2 is proven in Lemmata 3.7 and 3.8. Again, one direction is trivial: if the model is $k$-integrable, then its differential is continuous by definition, whence so is its norm (3.70). That is, we have to show the converse.

In Lemma 3.7 we show that the continuity of the map (3.70) implies the weak continuity of the differential of $\mathbf{p}^{1/k}$. This implies, on the one hand, that the model is weakly $k$-integrable by the second part of Theorem 3.2 which was already shown, and on the other hand, the Radon–Riesz theorem (cf. Theorem C.3) implies that the differential of $\mathbf {p}^{1/k}$ is even norm-continuous. Then in Lemma 3.8 we give the standard argument that a weak $C^{1}$-map with a norm-continuous differential must be a $C^{1}$-map, and this will complete the proof.

Lemma 3.2

For each $\mu_{0} \in {\mathcal {M}}({\varOmega })$, there is an isomorphism

$$ \bigl({\mathcal {S}}^{1/k}({\varOmega })\bigr)^{\prime}\cong\bigl( {\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})\bigr)^{\perp}\oplus {\mathcal {S}}^{1-1/k}({\varOmega }, \mu_{0}), $$

(3.73)

where $({\mathcal {S}}^{1/k}({\varOmega }, \mu_{0}))^{\perp}$ denotes the annihilator of ${\mathcal {S}}^{1/k}({\varOmega }, \mu_{0}) {\subseteq }{\mathcal {S}}^{1/k}({\varOmega })$ and where ${\mathcal {S}}^{1-1/k}({\varOmega }, \mu_{0})$ is embedded into the dual via the canonical pairing (3.58). That is, we can write any $\alpha\in({\mathcal {S}}^{1/k}({\varOmega }))^{\prime}$ uniquely as

$$ \alpha(\cdot) = \bigl(\cdot; \phi_{\alpha}\mu_{0}^{1-1/k}\bigr) + \beta^{\mu_{0}}(\cdot), $$

(3.74)

where $\phi_{\alpha}\in L^{k/(k-1)}({\varOmega }, \mu_{0})$ and $\beta^{\mu_{0}}({\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})) = 0$.

Proof

The restriction of $\alpha$ to ${\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})$ yields a functional on $L^{k}({\varOmega }, \mu_{0})$ given as $\psi\mapsto\alpha (\psi\mu_{0}^{1/k})$. Since the dual of $L^{k}({\varOmega }, \mu_{0})$ is $L^{k/(k-1)}({\varOmega }, \mu_{0})$, there is a $\phi_{\alpha}$ such that

$$\alpha\bigl(\psi\mu_{0}^{1/k}\bigr) = \int_{{\varOmega }}\psi\phi_{\alpha}\; d\mu_{0} = \bigl(\psi\mu_{0}^{1/k}; \phi_{\alpha}\mu_{0}^{1-1/k} \bigr), $$

and then (3.74) follows by letting $\beta^{\mu_{0}}(\cdot ) := \alpha(\cdot) - (\cdot; \phi_{\alpha}\mu_{0}^{1-1/k})$. □

Lemma 3.3

Let $\nu_{n}^{1/k} = \psi_{n} \mu_{0}^{1/k}$ be a bounded sequence in ${\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})$, i.e., $\limsup\|\nu_{n}^{1/k}\|_{k} < \infty$. If $\lim\int_{{\varOmega }}|\psi_{n}| \,d\mu_{0} = 0$, then

$$\nu_{n}^{1/k} \rightharpoonup0, $$

i.e., $\lim\alpha(\nu_{n}^{1/k}) = 0$ for all $\alpha\in({\mathcal {S}}^{1/k}({\varOmega }))^{\prime}$.

Proof

Suppose that $\int_{{\varOmega }}|\psi_{n}| \,d\mu_{0} \to0$ and let $\phi\in L^{k/(k-1)}({\varOmega }, \mu_{0})$ and $\tau\in L^{\infty}({\varOmega })$. Then

$$\begin{aligned} \limsup \bigl\vert \bigl(\nu_{n}^{1/k}; \phi \mu_{0}^{1-1/k}\bigr) \bigr\vert &\leq\limsup \biggl( \int_{{\varOmega }}|\psi_{n}| |\phi- \tau| \,d\mu_{0} + \|\tau\|_{\infty}\int_{{\varOmega }}|\psi_{n}| \,d\mu_{0} \biggr)\\ & \leq\limsup\|\psi_{n}\|_{k} \|\phi- \tau \|_{k/(k-1)}, \end{aligned}$$

using Hölder’s inequality in the last estimate. Since $\|\nu_{n}^{1/k}\| _{k} = \|\psi_{n}\|_{k}$ and hence, $\limsup\|\psi_{n}\|_{k} < \infty$, the bound on the right can be made arbitrarily small as $L^{\infty}({\varOmega }) {\subseteq }L^{k/(k-1)}({\varOmega }, \mu_{0})$ is a dense subspace. Therefore, $\lim (\nu_{n}^{1/k}; \phi\mu_{0}^{1-1/k}) = 0$ for all $\phi\in L^{k/(k-1)}$, and since $\beta(\nu_{n}^{1/k}) = 0$ for all $\beta\in({\mathcal {S}}^{1/k}({\varOmega }, \mu_{0}))^{\perp}$, the assertion follows from (3.74). □
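A concrete sequence illustrates Lemma 3.3 and the role of the integrability of $\phi$. On ${\varOmega }= (0,1)$ with Lebesgue measure, take $\psi_{n} := n^{1/k} \chi_{(0,1/n)}$ and $\phi({\omega }) := {\omega }^{-a}$; then $\phi\in L^{k/(k-1)}$ exactly when $a < 1 - 1/k$. The sketch below evaluates the relevant quantities in closed form. The sequence, the test function, and the function names are illustrative assumptions.

```python
def psi_norm_k(n, k):
    """L^k norm of psi_n = n^(1/k) * indicator of (0, 1/n): equals 1 for every n."""
    return (n * (1.0 / n)) ** (1.0 / k)

def psi_norm_1(n, k):
    """L^1 norm of psi_n: n^(1/k) * (1/n) = n^(1/k - 1), which tends to 0 for k > 1."""
    return n ** (1.0 / k - 1.0)

def pairing(n, k, a):
    """Integral of psi_n * phi over (0, 1) for phi(w) = w^(-a) with 0 <= a < 1.

    Closed form: n^(1/k) * (1/n)^(1 - a) / (1 - a) = n^(1/k - 1 + a) / (1 - a).
    """
    return n ** (1.0 / k) * (1.0 / n) ** (1.0 - a) / (1.0 - a)

if __name__ == "__main__":
    k = 2
    for n in (10, 10**4, 10**8):
        # a = 0.25: phi lies in L^2, the pairing tends to 0.
        # a = 0.5: phi is not in L^2, the pairing stays constant.
        print(n, psi_norm_k(n, k), psi_norm_1(n, k), pairing(n, k, 0.25), pairing(n, k, 0.5))
```

For $a < 1 - 1/k$ the pairing tends to 0, as the lemma asserts, while at the borderline exponent $a = 1 - 1/k$, where $\phi\notin L^{k/(k-1)}$, it does not.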

Before we go on, let us introduce some notation. Let $(M, {\varOmega }, \mathbf {p})$ be a parametrized measure model such that ${\partial }_{V} \log{\mathbf {p}}(\xi) \in L^{k}({\varOmega }, \mathbf{p}(\xi))$ for all $V \in T_{\xi}M$, and let $(\xi_{t})_{t \in I}$ be a curve in $M$. By Proposition 3.3, there is a measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$ dominating all $\mathbf{p}(\xi_{t})$. For $t, t_{0} \in I$ and $1 \leq l \leq k$ we define the remainder term as

$$ {\mathbf{r}}_{l}(t, t_{0}) := \mathbf{p}^{1/l}(\xi_{t+t_{0}}) - \mathbf {p}^{1/l}( \xi_{t_{0}}) - t {\partial }_{\xi_{t_{0}}'} {\mathbf{p}}^{1/l} \in {\mathcal {S}}^{1/l}({\varOmega }), $$

(3.75)

and we define the functions $p_{t}, q_{t} \in L^{1}({\varOmega }, \mu_{0})$ with $p_{t} \geq0$ and $q_{t;l},r_{t, t_{0}; l} \in L^{l}({\varOmega }, \mu_{0})$ such that

$$ \textstyle\begin{array}{lll@{\quad}l@{\quad}lllll} {\mathbf{p}}(\xi_{t}) & = & p_{t} \mu_{0} & \mbox{and} & {\partial }_{\xi'_{t}}{\mathbf{p}} & = & q_{t} \mu_{0},\\ q_{t;l} & := & \dfrac{q_{t}}{l p_{t}^{1-1/l}} & \mbox{so that} & {\partial }_{\xi_{t}'} {\mathbf{p}}^{1/l} & = & q_{t;l} \mu_{0}^{1/l},\\ r_{t, t_{0}; l} & := & p_{t+t_{0}}^{1/l} - p_{t_{0}}^{1/l} - t q_{t_{0};l} &\mbox{so that} & \mathbf{r}_{l}(t, t_{0}) & = & r_{t, t_{0}; l}\mu_{0}^{1/l}. \end{array} $$

(3.76)

Lemma 3.4

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model such that ${\partial }_{V} \log{\mathbf{p}}(\xi) \in L^{k}({\varOmega }, \mathbf{p}(\xi))$ for all $V \in T_{\xi}M$ and the function $V \mapsto {\partial }_{V} \mathbf {p}^{1/k} \in {\mathcal {S}}^{1/k}({\varOmega })$ is weakly continuous.

Then the model is weakly $k$-integrable if for any curve $(\xi_{t})_{t \in I}$ in $M$ and the measure $\mu_{0} \in {\mathcal {M}}({\varOmega })$ and the functions $p_{t}, q_{t}, q_{t;l}$ defined in (3.76) and any $A {\subseteq }{\varOmega }$ and $t_{0} \in I$

$$ \dfrac{d}{dt} \bigg|_{t = t_{0}} \int_{A} p_{t}^{1/k}\; d \mu_{0} = \int_{A} q_{t_{0};k}\; d\mu_{0} $$

(3.77)

or, equivalently, for all $a, b \in I$

$$ \int_{A} \bigl(p_{b}^{1/k} - p_{a}^{1/k}\bigr)\; d\mu_{0} = \int_{a}^{b} \int_{A} q_{t;k}\; d\mu _{0}\; dt. $$

(3.78)

Proof

Note that the right-hand side of (3.77) can be written as

$$\int_{A} q_{t_{0};k}\; d\mu_{0} = \bigl( {\partial }_{\xi_{t_{0}}'} {\mathbf{p}}^{1/k}; \chi_{A} \mu_{0}^{1-1/k}\bigr) $$

with the pairing from (3.58), whence it depends continuously on $t_{0}$ by the weak continuity of $V \mapsto {\partial }_{V} \mathbf {p}^{1/k}$. Thus, the equivalence of (3.77) and (3.78) follows from the fundamental theorem of calculus.

Now (3.78) can be rewritten as

$$ \bigl( \mathbf{p}^{1/k}(\xi_{b}) - \mathbf{p}^{1/k}(\xi_{a}); \phi\mu _{0}^{1-1/k} \bigr) = \int_{a}^{b} \bigl( {\partial }_{\xi_{t}'}{ \mathbf{p}}^{1/k}; \phi\mu _{0}^{1-1/k} \bigr)\; dt $$

(3.79)

for $\phi= \chi_{A}$, and hence, (3.79) holds whenever $\phi= \tau$ is a step function. But now, if $\phi\in L^{k/(k-1)}$ is given, then there is a sequence of step functions $(\tau _{n})$ such that $\operatorname {sign}(\phi) \tau_{n} \nearrow|\phi|$, and since (3.79) holds for all step functions, it also holds for $\phi$ by dominated convergence.

If $\beta\in {\mathcal {S}}^{1/k}({\varOmega }, \mu_{0})^{\perp}$, then clearly, $\beta(\mathbf {p}^{1/k}(\xi_{t})) = \beta({\partial }_{\xi_{t}'}{\mathbf{p}}^{1/k}) = 0$, whence by (3.74) we have for all $\alpha\in({\mathcal {S}}^{1/k}({\varOmega }))^{\prime}$

$$\alpha\bigl( \mathbf{p}^{1/k}(\xi_{b}) - \mathbf{p}^{1/k}(\xi_{a})\bigr) = \int_{a}^{b} \alpha\bigl({\partial }_{\xi_{t}'}{ \mathbf{p}}^{1/k}\bigr)\; dt, $$

and since the function $t \mapsto\alpha( {\partial }_{\xi_{t}'}{\mathbf {p}}^{1/k})$ is continuous by the assumed weak continuity of $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$, differentiation and the fundamental theorem of calculus yield (3.69) for $V = \xi_{t}'$, and as the curve $(\xi_{t})$ was arbitrary, (3.69) holds for arbitrary $V \in {TM}$.

But (3.69) is equivalent to saying that ${\partial }_{V} \mathbf{p}^{1/k}(\xi)$ is the weak Gâteaux-differential of $\mathbf{p}^{1/k}$ (cf. Definition C.1), and since this map is assumed to be weakly continuous, it follows that $\mathbf{p}^{1/k}$ is a weak $C^{1}$-map, whence $(M, {\varOmega }, \mathbf{p})$ is weakly $k$-integrable. □

Lemma 3.5

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model for which the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is weakly continuous. Let $(\xi_{t})_{t \in I}$ be a curve in $M$, and let $\mu_{0} \in {\mathcal {M}}({\varOmega })$ be a measure dominating $\mathbf{p}(\xi_{t})$ for all $t$. For ${\varepsilon }> 0$, define the parametrized measure model $\mathbf{p}_{{\varepsilon }}$ as

$$ {\mathbf{p}}_{{\varepsilon }}(\xi) := \mathbf{p}(\xi) + {\varepsilon }\mu_{0}. $$

(3.80)

Then the map $t \mapsto {\partial }_{\xi_{t}'}{\mathbf{p}}_{{\varepsilon }}^{1/k} \in {\mathcal {S}}^{1/k}({\varOmega })$ is weakly continuous, and for all $t_{0} \in I$,

$$\dfrac{1}{t} \mathbf{r}^{{\varepsilon }}_{k}(t, t_{0}) \rightharpoonup0 \quad\textit{as}\ t \to0, $$

where $\mathbf{r}^{{\varepsilon }}_{k}(t, t_{0})$ is defined analogously to (3.75).

Proof

We define the functions $p_{t}^{{\varepsilon }}$, $q_{t}^{{\varepsilon }}$, $q_{t;l}^{{\varepsilon }}$ and $r_{t,t_{0};l}^{{\varepsilon }}$ satisfying (3.76) for the parametrized measure model $\mathbf{p}_{{\varepsilon }}$, so that $p_{t}^{{\varepsilon }}= p_{t} + {\varepsilon }$ and $q_{t}^{{\varepsilon }}= q_{t}$. For $t, t_{0} \in I$ we have

$$\begin{aligned} \bigl|q_{t;k}^{{\varepsilon }}- q_{t_{0};k}^{{\varepsilon }}\bigr| = &\biggl\vert \dfrac{q_{t}}{k (p_{t}^{{\varepsilon }})^{1-1/k}} - \dfrac{q_{t_{0}}}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \biggr\vert \\ \leq&\dfrac{1}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} |q_{t} - q_{t_{0}}| + \biggl\vert \dfrac {1}{k (p_{t}^{{\varepsilon }})^{1-1/k}} - \dfrac{1}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \biggr\vert |q_{t}| \\ \leq&\dfrac{1}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \bigl(|q_{t} - q_{t_{0}}| + k \bigl| \bigl(p_{t_{0}}^{{\varepsilon }}\bigr)^{1-1/k} - \bigl(p_{t}^{{\varepsilon }}\bigr)^{1-1/k}\bigr| \bigl|q_{t;k}^{{\varepsilon }}\bigr| \bigr) \\ \stackrel{\scriptsize\mbox{(3.63)}}{\leq}&\dfrac{1}{k {\varepsilon }^{1-1/k}} \bigl(|q_{t} - q_{t_{0}}| + k |p_{t_{0}} - p_{t}|^{1-1/k} |q_{t;k}|\bigr). \end{aligned}$$

Thus,

$$\begin{aligned} \int_{{\varOmega }}\bigl|q_{t;k}^{{\varepsilon }}- q_{t_{0};k}^{{\varepsilon }}\bigr|\; d\mu_{0} & \leq\dfrac{1}{k {\varepsilon }^{1-1/k}} \int_{{\varOmega }}\bigl(|q_{t} - q_{t_{0}}| + k |p_{t_{0}} - p_{t}|^{1-1/k} |q_{t;k}|\bigr) \; d\mu_{0} \\ & \leq\dfrac{1}{k {\varepsilon }^{1-1/k}} \bigl( \|q_{t} - q_{t_{0}} \|_{1} + k \|p_{t_{0}} - p_{t}\|_{1}^{1-1/k} \|q_{t;k}\|_{k}\bigr) \end{aligned}$$

and since $\|q_{t} - q_{t_{0}}\|_{1} = \|{\partial }_{\xi_{t}'} {\mathbf{p}} - {\partial }_{\xi _{t_{0}}'} {\mathbf{p}}\|_{1}$ and $\|p_{t_{0}} - p_{t}\|_{1} = \|{\mathbf{p}}(\xi _{t}) - \mathbf{p}(\xi_{t_{0}})\|_{1}$ tend to 0 for $t \to t_{0}$ as $\mathbf{p}$ is a $C^{1}$-map by the definition of a parametrized measure model, and $\|q_{t;k}\|_{k} = \|{\partial }_{\xi_{t}'}{\mathbf{p}}^{1/k}\|_{k}$ is bounded for $t \to t_{0}$ by (C.1) as ${\partial }_{\xi_{t}'}{\mathbf {p}}^{1/k} \rightharpoonup {\partial }_{\xi_{t_{0}}'}{\mathbf{p}}^{1/k}$, it follows that the integral tends to 0. Hence, Lemma 3.3 implies that ${\partial }_{\xi_{t}'} {\mathbf{p}}_{{\varepsilon }}^{1/k}\rightharpoonup {\partial }_{\xi_{t_{0}}'} {\mathbf{p}}_{{\varepsilon }}^{1/k}$ as $t \to t_{0}$, showing the weak continuity of $t \mapsto {\partial }_{\xi_{t}'} {\mathbf {p}}_{{\varepsilon }}^{1/k}$.

For the second claim, note that by the mean value theorem there is an $\eta_{t}$ between $p_{t+t_{0}}^{{\varepsilon }}$ and $p_{t_{0}}^{{\varepsilon }}$ (and hence, $\eta _{t} \geq {\varepsilon }$) for which

$$\begin{aligned} \begin{aligned} \bigl|r_{t,t_{0};k}^{{\varepsilon }}\bigr| &\ = \biggl\vert \bigl(p_{t+t_{0}}^{{\varepsilon }}\bigr)^{1/k} - \bigl(p_{t_{0}}^{{\varepsilon }}\bigr)^{1/k} - t \dfrac{q_{t_{0}}}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \biggr\vert = \biggl\vert \dfrac{p_{t+t_{0}} - p_{t_{0}}}{k \eta_{t}^{1-1/k}} - t \dfrac{q_{t_{0}}}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \biggr\vert \\ &\ \leq\dfrac{|r_{t, t_{0}; 1}|}{k \eta_{t}^{1-1/k}} + |t| \biggl\vert \dfrac {q_{t_{0}}}{k \eta_{t}^{1-1/k}} - \dfrac{q_{t_{0}}}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k}} \biggr\vert \\ &\ = \dfrac{|r_{t, t_{0}; 1}|}{k \eta_{t}^{1-1/k}} + \dfrac{|t||q_{t_{0}}| |(p_{t_{0}}^{{\varepsilon }})^{1-1/k} - \eta_{t}^{1-1/k}|}{k (p_{t_{0}}^{{\varepsilon }})^{1-1/k} \eta _{t}^{1-1/k}} \\ & \stackrel{\scriptsize\mbox{(3.63)}}{\leq} C \bigl(|r_{t, t_{0}; 1}| + |t| |q_{t_{0};k}| |p_{t+t_{0}} - p_{t_{0}}|^{1-1/k} \bigr) \end{aligned} \end{aligned}$$

for $C := 1/(k {\varepsilon }^{1-1/k}) > 0$ depending only on $k$ and ${\varepsilon }$. Thus,

$$\begin{aligned} \int_{{\varOmega }}\bigl|r_{t, t_{0}; k}^{{\varepsilon }}\bigr|\; d \mu_{0} & \leq C \biggl( \int_{{\varOmega }}|r_{t, t_{0};1}|\; d\mu_{0} + |t| \int_{{\varOmega }}|q_{t_{0};k}||p_{t+t_{0}} - p_{t_{0}}|^{1-1/k}\; d\mu_{0} \biggr) \\ & \leq C \bigl(\|r_{t, t_{0};1}\|_{1} + |t| \|q_{t_{0};k} \|_{k} \|p_{t+t_{0}} - p_{t_{0}}\|_{1}^{1-1/k} \bigr) \\ & = C \bigl(\bigl\| {\mathbf{r}}_{1}(t, t_{0})\bigr\| _{1} + |t| \bigl\| {\partial }_{\xi_{t}'} {\mathbf {p}}^{1/k}\bigr\| _{k} \bigl\| { \mathbf{p}}(t+t_{0}) - \mathbf{p}(t_{0}) \bigr\| _{1}^{1-1/k} \bigr), \end{aligned}$$

using Hölder’s inequality in the second line. Since $\mathbf{p}$ is a $C^{1}$-map, $\|{\mathbf{r}}_{1}(t, t_{0})\|_{1}/|t|$ and $\|{\mathbf {p}}(t+t_{0}) - \mathbf{p}(t_{0})\|_{1}$ tend to 0, whereas $\|{\partial }_{\xi_{t}'} {\mathbf{p}}^{1/k}\|_{k}$ is bounded close to $t_{0}$ by (C.1) since $t \mapsto {\partial }_{\xi_{t}'} {\mathbf{p}}^{1/k}$ is weakly continuous, so that

$$\dfrac{1}{|t|} \int_{{\varOmega }}\bigl|r_{t, t_{0};k}^{{\varepsilon }}\bigr|\;d \mu_{0} \xrightarrow{t \to 0} 0, $$

which by Lemma 3.3 implies the second assertion. □

Since by definition the derivative $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is weakly continuous for any weakly $k$-integrable model, the second assertion of Theorem 3.2 will follow from the following.

Lemma 3.6

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model for which the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is weakly continuous. Then $(M, {\varOmega }, \mathbf{p})$ is weakly $k$-integrable.

Proof

Let $(\xi_{t})_{t \in I}$ be a curve in $M$, let $\mu_{0} \in {\mathcal {M}}({\varOmega })$ be a measure dominating $\mathbf{p}(\xi_{t})$ for all $t$, and define the parametrized measure model $\mathbf{p}_{{\varepsilon }}(\xi)$ as in (3.80) and the remainder term $\mathbf{r}^{{\varepsilon }}_{k}(t, t_{0})$ as in Lemma 3.5. By Lemma 3.5, we have for any $A {\subseteq }{\varOmega }$ and $t_{0} \in I$ and the pairing $(\cdot;\cdot)$ from (3.58)

$$\begin{aligned} 0 &= \lim_{t \to0} \dfrac{1}{t} \bigl(\mathbf{r}^{{\varepsilon }}_{k}(t, t_{0}); \chi_{A} \mu _{0}^{1-1/k} \bigr) = \lim_{t \to0} \dfrac{1}{t} \int_{A} \bigl(\bigl(p_{t+t_{0}}^{{\varepsilon }}\bigr)^{1/k} - \bigl(p_{t_{0}}^{{\varepsilon }}\bigr)^{1/k} - t q_{t_{0};k}^{{\varepsilon }}\bigr)\; d \mu_{0} \\ & = \dfrac{d}{dt} \bigg|_{t=t_{0}} \int_{A} \bigl(p_{t}^{{\varepsilon }}\bigr)^{1/k}\; d\mu_{0} - \int_{A} q_{t_{0};k}^{{\varepsilon }}\; d \mu_{0}, \end{aligned}$$

showing that

$$ \dfrac{d}{dt} \bigg|_{t=t_{0}} \int_{A} \bigl(p_{t}^{{\varepsilon }}\bigr)^{1/k}\; d\mu_{0} = \int_{A} q_{t_{0};k}^{{\varepsilon }}\; d \mu_{0} $$

(3.81)

for all $t_{0} \in I$. As we observed in the proof of Lemma 3.4, the weak continuity of the map $t \mapsto {\partial }_{\xi_{t}'} {\mathbf{p}}_{{\varepsilon }}^{1/k}$ implies that the integral on the right-hand side of (3.81) depends continuously on $t_{0} \in I$, whence integration implies that for all $a, b \in I$ we have

$$ \int_{A} \bigl(\bigl(p_{b}^{{\varepsilon }}\bigr)^{1/k} - \bigl(p_{a}^{{\varepsilon }}\bigr)^{1/k} \bigr)\; d\mu_{0} = \int _{a}^{b} \int_{A} q_{t;k}^{{\varepsilon }}\; d \mu_{0}\; dt. $$

(3.82)

Now $|q_{t;k}^{{\varepsilon }}| \leq|q_{t;k}|$ and $|(p_{b}^{{\varepsilon }})^{1/k} - (p_{a}^{{\varepsilon }})^{1/k}| \leq|p_{b} - p_{a}|^{1/k}$ by (3.63), whence we can use dominated convergence for the limit as ${\varepsilon }\searrow0$ in (3.82) to conclude that (3.78) holds, so that Lemma 3.4 implies the weak $k$-integrability of the model. □

We now have to prove the first assertion of Theorem 3.2. We begin by showing the weak continuity of the differential of $\mathbf{p}^{1/k}$.

Lemma 3.7

Let $(M, {\varOmega }, \mathbf{p})$ be a parametrized measure model with ${\partial }_{V} \log{\mathbf{p}}(\xi) \in L^{k}({\varOmega }, \mathbf{p}(\xi))$ for all $\xi$, and suppose that the function (3.70) is continuous. Then the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is weakly continuous, so that the model is weakly $k$-integrable.

Proof

Let $(V_{n})_{n \in {\mathbb {N}}}$ be a sequence, $V_{n} \in T_{\xi _{n}}M$ with $V_{n} \to V_{0} \in T_{\xi_{0}}M$, and let $\mu_{0} \in {\mathcal {M}}({\varOmega })$ be a measure dominating all $\mathbf{p}(\xi_{n})$. In fact, we may assume that there is a decomposition ${\varOmega }= {\varOmega }_{0} \uplus {\varOmega }_{1}$ such that

$${\mathbf{p}}(\xi_{0}) = \chi_{{\varOmega }_{0}} \mu_{0}. $$

We adapt the notation from (3.76) and define $p_{n}, q_{n} \in L^{1}({\varOmega }, \mu_{0})$ and $q_{n;l} \in L^{l}({\varOmega }, \mu_{0})$, replacing $t \in I$ by $n \in {\mathbb {N}}_{0}$ in (3.76). In particular, $p_{0} = \chi_{{\varOmega }_{0}}$. Then on ${\varOmega }_{0}$ we have

$$\begin{aligned} |q_{n;k} - q_{n;0}| = &\biggl\vert \dfrac{q_{n}}{k p_{n}^{1-1/k}} - \dfrac {q_{0}}{k} \biggr\vert \leq\dfrac{1}{k} |q_{n} - q_{0}| + \dfrac{|q_{n}|}{k} \biggl\vert \dfrac{1}{p_{n}^{1-1/k}} - 1 \biggr\vert \\ = &\dfrac{1}{k} |q_{n} - q_{0}| + |q_{n;k}| \bigl\vert 1 - p_{n}^{1-1/k} \bigr\vert \\ \stackrel{\scriptsize\mbox{(3.63)}}{\leq}&\dfrac{1}{k} |q_{n} - q_{0}| + |q_{n;k}| |1 - p_{n}|^{1-1/k} \\ = &\dfrac{1}{k} |q_{n} - q_{0}| + |q_{n;k}| |p_{n} - p_{0}|^{1-1/k}. \end{aligned}$$

Thus,

$$\begin{aligned} \int_{{\varOmega }_{0}} |q_{n;k} - q_{n;0}|\; d \mu_{0} &\leq\dfrac{1}{k} \int_{{\varOmega }_{0}} |q_{n} - q_{0}|\; d \mu_{0} + \int_{{\varOmega }_{0}} |q_{n;k}| |p_{n} - p_{0}|^{1-1/k}\; d\mu _{0} \\ & \leq\dfrac{1}{k} \|q_{n} - q_{0}\|_{1} + \|q_{n;k}\|_{k} \|p_{n} - p_{0}\| _{1}^{1-1/k} \\ & = \dfrac{1}{k} \|{\partial }_{V_{n}} {\mathbf{p}} - {\partial }_{V_{0}} { \mathbf{p}}\|_{1} + \bigl\| {\partial }_{V_{n}} {\mathbf{p}}^{1/k} \bigr\| _{k} \bigl\| {\mathbf{p}}(\xi_{n}) - \mathbf{p}(\xi _{0})\bigr\| _{1}^{1-1/k}. \end{aligned}$$

Since $\mathbf{p}$ is a $C^{1}$-map, both $\|{\partial }_{V_{n}} {\mathbf{p}} - {\partial }_{V_{0}} {\mathbf{p}}\|_{1}$ and $\|{\mathbf{p}}(\xi_{n}) - \mathbf{p}(\xi _{0})\|_{1}$ tend to 0, whereas $\|{\partial }_{V_{n}} {\mathbf{p}}^{1/k}\|_{k}$ tends to $\|{\partial }_{V_{0}} {\mathbf{p}}^{1/k}\|_{k}$ by the continuity of (3.70). Moreover, ${\partial }_{V_{0}} {\mathbf{p}}$ is dominated by $\mathbf{p}(\xi_{0}) = \chi_{{\varOmega }_{0}} \mu_{0}$, whence $q_{0}$ and $q_{0;k}$ vanish on ${\varOmega }_{1}$. Thus, we conclude that

$$ 0 = \lim \int_{{\varOmega }_{0}} |q_{n;k} - q_{n;0}|\; d \mu_{0} = \lim \int_{{\varOmega }}|\chi _{{\varOmega }_{0}} q_{n;k} - q_{n;0}|\; d\mu_{0}. $$

(3.83)

By Lemma 3.3, this implies that $\chi_{{\varOmega }_{0}} {\partial }_{V_{n}} {\mathbf{p}}^{1/k} \rightharpoonup {\partial }_{V_{0}} {\mathbf{p}}^{1/k}$, whence by (C.1) we have

$$\begin{aligned} \|{\partial }_{V_{0}} {\mathbf{p}}\|_{k} & \leq\liminf\| \chi_{{\varOmega }_{0}} {\partial }_{V_{n}} {\mathbf{p}}\|_{k} \leq\limsup \|\chi_{{\varOmega }_{0}} {\partial }_{V_{n}} {\mathbf{p}}\|_{k} \\ & \leq\limsup\|{\partial }_{V_{n}} {\mathbf{p}}\|_{k} = \| {\partial }_{V_{0}} {\mathbf{p}}\|_{k}, \end{aligned}$$

using again the continuity of (3.70). Thus, we have equality in these estimates, i.e.,

$$\|{\partial }_{V_{0}} {\mathbf{p}}\|_{k} = \lim\|\chi_{{\varOmega }_{0}} {\partial }_{V_{n}} {\mathbf {p}}\|_{k} = \lim\|{\partial }_{V_{n}} { \mathbf{p}}\|_{k}, $$

and since $\|{\partial }_{V_{n}} {\mathbf{p}}\|_{k}^{k} = \|\chi_{{\varOmega }_{0}}{\partial }_{V_{n}} {\mathbf{p}}\|_{k}^{k} + \|\chi_{{\varOmega }_{1}}{\partial }_{V_{n}} {\mathbf{p}}\|_{k}^{k}$, this implies that

$$\lim\|\chi_{{\varOmega }_{1}}{\partial }_{V_{n}} {\mathbf{p}}\|_{k} = 0. $$

Thus,

$$\lim \int_{{\varOmega }_{1}} |q_{n;k} - q_{n;0}|\; d \mu_{0} = \lim \int_{{\varOmega }_{1}} |q_{n;k}|\; d\mu_{0} = \lim\| \chi_{{\varOmega }_{1}} {\partial }_{V_{n}} {\mathbf{p}}\|_{1} = 0 $$

as $\|\chi_{{\varOmega }_{1}} {\partial }_{V_{n}} {\mathbf{p}}\|_{k} \to0$, so that together with (3.83) we conclude that

$$\lim \int_{{\varOmega }}|q_{n;k} - q_{n;0}|\; d \mu_{0} = 0, $$

and now, Lemma 3.3 implies that ${\partial }_{V_{n}} {\mathbf{p}}^{1/k} \rightharpoonup {\partial }_{V_{0}} {\mathbf{p}}^{1/k}$ for an arbitrary convergent sequence $(V_{n}) \in {TM}$, showing the weak continuity, and the last assertion now follows from Lemma 3.6. □

Lemma 3.8

The first assertion of Theorem 3.2 holds.

Proof

If the model is $k$-integrable, then the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is continuous by definition, so that the continuity of (3.70) follows.

Thus, we have to show the converse and assume that the map (3.70) is continuous. By Lemma 3.7, this implies that the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is weakly continuous and hence, the model is weakly $k$-integrable by Lemma 3.6. In particular, (3.69) holds by Proposition 3.4.

Together with the continuity of the norm, it follows from the Radon–Riesz theorem (cf. Theorem C.3) that the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ is continuous even in the norm of ${\mathcal {S}}^{1/k}({\varOmega })$.

Let $(\xi_{t})_{t \in I}$ be a curve in $M$ and let $V := \xi_{0}' \in T_{\xi_{0}}M$, and recall the definition of the remainder term $\mathbf{r}_{k}(t,t_{0})$ from (3.75). Thus, what we have to show is that

$$ \dfrac{1}{|t|} \bigl\| {\mathbf{r}}_{k}(t,t_{0}) \bigr\| _{k} \xrightarrow{t \to0} 0. $$

(3.84)

By the Hahn–Banach theorem (cf. Theorem C.1), we may for each pair $t, t_{0} \in I$ choose an $\alpha\in {\mathcal {S}}^{1/k}({\varOmega })'$ with $\alpha(\mathbf{r}_{k}(t,t_{0})) = \| {\mathbf{r}}_{k}(t,t_{0})\|_{k}$ and $\|\alpha\|_{k} = 1$. Then we have

$$\begin{aligned} \bigl\| {\mathbf{r}}_{k}(t,t_{0})\bigr\| _{k} =& \bigl|\alpha \bigl(\mathbf{r}_{k}(t,t_{0})\bigr)\bigr| = \bigl\vert \alpha \bigl( \mathbf{p}^{1/k}(\xi_{t+t_{0}}) - \mathbf{p}^{1/k}( \xi _{t_{0}}) - t {\partial }_{\xi'_{t_{0}}}{\mathbf{p}}^{1/k} \bigr) \bigr\vert \\ \stackrel{\scriptsize\mbox{(3.69)}}{=}& \biggl\vert \int_{t_{0}}^{t+t_{0}} \alpha \bigl({\partial }_{\xi'_{s}}{ \mathbf{p}}^{1/k} - {\partial }_{\xi'_{t_{0}}}{\mathbf {p}}^{1/k} \bigr) \; ds \biggr\vert \\ \leq& \biggl\vert \int_{t_{0}}^{t+t_{0}} \|\alpha\|_{k}\; \bigl\| {\partial }_{\xi'_{s}}{\mathbf {p}}^{1/k} - {\partial }_{\xi'_{t_{0}}}{ \mathbf{p}}^{1/k}\bigr\| _{k}\; ds \biggr\vert \\ \leq& |t| \max_{|s-t_{0}| \leq |t|} \bigl\| {\partial }_{\xi'_{s}}{\mathbf {p}}^{1/k} - {\partial }_{\xi'_{t_{0}}}{\mathbf {p}}^{1/k}\bigr\| _{k}. \end{aligned}$$

Thus,

$$\dfrac{\|{\mathbf{r}}_{k}(t,t_{0})\|_{k}}{|t|} \leq\max_{|s-t_{0}| \leq |t|} \bigl\| {\partial }_{\xi'_{s}}{ \mathbf{p}}^{1/k} - {\partial }_{\xi'_{t_{0}}}{\mathbf{p}}^{1/k} \bigr\| _{k}, $$

and by the continuity of the map $V \mapsto {\partial }_{V} \mathbf{p}^{1/k}$ in the norm of ${\mathcal {S}}^{1/k}({\varOmega })$, the right-hand side tends to 0 as $t \to0$, showing (3.84) and hence the claim. □

3.2.5 Canonical $n$-Tensors of an $n$-Integrable Model

We begin this section with the formal definition of an $n$-tensor on a vector space.

Definition 3.8

Let $(V, \|\cdot\|)$ be a normed vector space (e.g., a Banach space). A covariant $n$-tensor on $V$ is a multilinear map $\varTheta: {\times}^{n} V \to {\mathbb {R}}$ which is continuous w.r.t. the product topology.

We can characterize covariant $n$-tensors by the following proposition.

Proposition 3.5

Let $(V, \|\cdot\|)$ be a Banach space and $\varTheta: \times^{n} V \to {\mathbb {R}}$ be a multilinear map. Then the following are equivalent.

(1)
$\varTheta$ is a covariant $n$-tensor on $V$, i.e., continuous w.r.t. the product topology.
(2)
There is a $C > 0$ such that for $V_{1}, \ldots, V_{n} \in V$
$$ \bigl|\varTheta(V_{1}, \ldots, V_{n})\bigr| \leq C \|V_{1}\| \cdots\|V_{n}\|. $$
(3.85)

Proof

To see that the first condition implies (3.85), we proceed by induction on $n$. For $n = 1$ this is clear as a continuous linear map $\varTheta: V \to {\mathbb {R}}$ is bounded. Suppose that (3.85) holds for all $n$-tensors and let $\varTheta^{n+1}: \times^{n+1} V \to {\mathbb {R}}$ be a covariant $(n+1)$-tensor. For fixed $V_{1}, \ldots, V_{n} \in V$, the map $\varTheta^{n+1}(\cdot, V_{1}, \ldots , V_{n})$ is continuous and hence bounded linear.

On the other hand, for fixed $V_{0}$, the map $(V_{1}, \ldots, V_{n}) \mapsto \varTheta^{n+1}(V_{0}, V_{1},\dots , V_{n})$ is a covariant $n$-tensor and hence by the induction hypothesis,

$$ \bigl|\varTheta^{n+1}(V_{0}, V_{1}, \ldots, V_{n})\bigr| \leq C_{V_{0}} \quad\mbox{if $\| V_{i}\| = 1$ for all $i>0$}. $$

(3.86)

The uniform boundedness principle (cf. Theorem C.2) now shows that the constant $C_{V_{0}}$ in (3.86) can be chosen to be $C \|V_{0}\|$ for some fixed $C \in {\mathbb {R}}$, so that (3.85) holds for $\varTheta^{n+1}$, completing the induction.

Next, to see that (3.85) implies the continuity of $\varTheta$, let $(V_{i}^{(k)})_{k \in {\mathbb {N}}} \subset V$, $i = 1, \ldots, n$, be sequences converging to $V_{i}^{0}$. Then

$$\begin{aligned} \bigl|\varTheta\bigl(V_{1}^{(k)}, \ldots, V_{n}^{(k)} \bigr) - \varTheta\bigl(V_{1}^{0}, \ldots, V_{n}^{0}\bigr)\bigr| =& \Biggl|\sum_{i=1}^{n} \varTheta(V_{1}^{(k)}, \ldots, V_{i}^{(k)} - V_{i}^{0}, \ldots, V_{n}^{0})\Biggr| \\ \stackrel{\scriptsize\mbox{(3.85)}}{\leq}&\sum_{i=1}^{n} C \bigl\| V_{1}^{(k)}\bigr\| \cdots\bigl\| V_{i}^{(k)} - V_{i}^{0}\bigr\| \cdots\bigl\| V_{n}^{0}\bigr\| , \end{aligned}$$

and this tends to 0 as $\|V_{i}^{(k)}\| \xrightarrow{k\to\infty} \| V_{i}^{0}\|$ and $\|V_{i}^{(k)} - V_{i}^{0}\|\xrightarrow{k\to\infty}0$. Thus, $\varTheta$ is continuous in the product topology. □

Definition 3.9

(Covariant $n$-tensors on a manifold)

Let $M$ be a $C^{1}$-manifold. A covariant $n$-tensor field on $M$ is a family $(\varTheta_{\xi})_{\xi\in M}$ of covariant $n$-tensors on $T_{\xi}M$ which is weakly continuous, i.e., such that for continuous vector fields $V_{1}, \ldots, V_{n}$ on $M$ the function $\varTheta (V_{1}, \ldots, V_{n})$ is continuous on $M$.

An important example of such a tensor is given by the following

Definition 3.10

(Canonical $n$-tensor)

For $n \in {\mathbb {N}}$, the canonical $n$-tensor is the covariant $n$-tensor on ${\mathcal {S}}^{1/n}({\varOmega })$, given by

$$ L^{n}_{{\varOmega }}(\nu_{1}, \ldots, \nu_{n}) = n^{n} \int_{{\varOmega }}d(\nu_{1} \cdots\nu_{n}), \quad \mbox{where $\nu_{i} \in {\mathcal {S}}^{1/n}({\varOmega })$}. $$

(3.87)

Moreover, for $0 < r \leq1/n$ the canonical $n$-tensor $\tau^{n}_{{\varOmega };r}$ on ${\mathcal {S}}^{r}({\varOmega })$ is defined as

$$ \bigl(\tau^{n}_{{\varOmega };r}\bigr)_{\mu_{r}}(\nu_{1}, \ldots, \nu_{n}) := \textstyle\begin{cases} {\frac{1}{r^{n}} \int_{{\varOmega }}d(\nu_{1} \cdots\nu_{n}\cdot|\mu_{r}|^{1/r - n})} & \mbox{if $r < 1/n$},\\ L^{n}_{{\varOmega }}(\nu_{1}, \ldots , \nu_{n}) & \mbox{if $r = 1/n$}, \end{cases} $$

(3.88)

where $\nu_{i} \in {\mathcal {S}}^{r}({\varOmega }) = T_{\mu_{r}}{\mathcal {S}}^{r}({\varOmega })$.

Observe that for a finite set ${\varOmega }= I$, the definition of $L^{n}_{I}$ coincides with the covariant $n$-tensor given in (2.53).

For $n = 2$, the pairing $(\cdot;\cdot): {\mathcal {S}}^{1/2}({\varOmega }) \times {\mathcal {S}}^{1/2}({\varOmega }) \to {\mathbb {R}}$ from (3.58) satisfies

$$(\nu_{1}; \nu_{2}) = \frac{1}{4}\; L_{{\varOmega }}^{2}(\nu_{1}, \nu_{2}). $$

Since $(\nu; \nu) = \|\nu\|_{2}^{2}$ by (3.54), it follows:^{Footnote 5}

$$ \biggl({\mathcal {S}}^{1/2}({\varOmega }), \frac{1}{4}\; L^{2}_{{\varOmega }}\biggr) \mbox{ is a Hilbert space with norm $\| \cdot\|_{2}$.} $$

(3.89)
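For a finite set ${\varOmega }= I$, the formula (3.87) can be evaluated directly. The following is a minimal sketch under an assumed encoding that is not part of the text: an element $\nu = \sum_{i} a_{i} (\delta^{i})^{1/n}$ of ${\mathcal {S}}^{1/n}(I)$ is stored as its coefficient list $(a_{i})$, so that $\int_{I} d(\nu_{1} \cdots\nu_{n})$ becomes a sum of coordinatewise products. The function name `canonical_tensor` is hypothetical.

```python
from math import isclose, prod

def canonical_tensor(*nus):
    """L^n on a finite set I, cf. (3.87): n^n * sum_i prod_j nu_j[i].

    Each nu_j is the coefficient list of an element of S^{1/n}(I)
    (an illustrative encoding, assumed here for the sketch).
    """
    n = len(nus)
    return n**n * sum(prod(coeffs) for coeffs in zip(*nus))

nu1 = [1.0, -2.0, 0.5]
nu2 = [3.0, 1.0, -4.0]

# The pairing (nu1; nu2) equals L^2(nu1, nu2) / 4.
pairing = canonical_tensor(nu1, nu2) / 4
# (nu1; nu1) is the squared norm ||nu1||_2^2, i.e. the sum of squares.
squared_norm = canonical_tensor(nu1, nu1) / 4
```

In this encoding the pairing is the Euclidean inner product of the coefficient lists, which is the finite-dimensional instance of the Hilbert space statement (3.89).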

For a $C^{1}$-map $\varPhi: M_{1} \to M_{2}$ and a covariant $n$-tensor field $\varTheta$ on $M_{2}$, the pull-back $\varPhi^{*}\varTheta$ given by

$$ \varPhi^{*}\varTheta(V_{1}, \ldots, V_{n}) := \varTheta\bigl(d\varPhi(V_{1}), \ldots, d\varPhi(V_{n})\bigr) $$

(3.90)

is a covariant $n$-tensor field on $M_{1}$. If $\varPhi: M_{1} \to M_{2}$ and $\varPsi: M_{2} \to M_{3}$ are differentiable, then this immediately implies for the composition $\varPsi\varPhi: M_{1} \to M_{3}$:

$$ (\varPsi\varPhi)^{*}\varTheta= \varPhi^{*} \varPsi^{*}\varTheta. $$

(3.91)

Proposition 3.6

Let $n \in {\mathbb {N}}$ and $0 < s \leq r \leq1/n$. Then

$$ \tau^{n}_{{\varOmega };r} = \bigl(\tilde{\pi}^{1/rn}\bigr)^{*}L^{n}_{{\varOmega }}$$

(3.92)

and

$$ \bigl(\tilde{\pi}^{r/s}\bigr)^{*}\tau^{n}_{{\varOmega };r} = \tau^{n}_{{\varOmega };s}, $$

(3.93)

with the $C^{1}$-maps $\tilde{\pi}^{1/rn}$ and $\tilde{\pi}^{r/s}$ from Proposition 3.2, respectively.

Proof

Unwinding the definition we obtain for $k := 1/rn \geq1$:

$$\begin{aligned} \bigl(\bigl(\tilde{\pi}^{k}\bigr)^{*}L^{n}_{{\varOmega }}\bigr)_{\mu_{r}} \bigl(\nu_{r}^{1}, \ldots, \nu^{n}_{r}\bigr) = & L^{n}_{{\varOmega }}\bigl(d_{\mu_{r}}\tilde{\pi}^{k} \bigl(\nu_{r}^{1} \bigr), \ldots, d_{\mu_{r}}\tilde{\pi}^{k} \bigl( \nu_{r}^{n}\bigr)\bigr) \\ \stackrel{\scriptsize\mbox{(3.62)}}{=} & L^{n}_{{\varOmega }}\bigl(k | \mu_{r}|^{k - 1} \cdot\nu_{r}^{1}, \ldots, k |\mu_{r}|^{k - 1} \cdot\nu_{r}^{n}\bigr) \\ = & k^{n} n^{n} \int_{{\varOmega }}d \bigl(\bigl(|\mu_{r}|^{k - 1} \cdot\nu_{r}^{1}\bigr) \cdots \bigl(|\mu_{r}|^{k - 1} \cdot\nu_{r}^{n}\bigr) \bigr) \\ = & \dfrac{1}{r^{n}} \int_{{\varOmega }}d \bigl(\nu_{r}^{1} \cdots \nu_{r}^{n} \cdot|\mu _{r}|^{1/r - n} \bigr) \\ = & \bigl(\tau^{n}_{{\varOmega };r}\bigr)_{\mu_{r}} \bigl( \nu_{r}^{1}, \ldots, \nu_{r}^{n}\bigr), \end{aligned}$$

showing (3.92). Thus,

$$\begin{aligned} \bigl(\tilde{\pi}^{r/s}\bigr)^{*}\tau^{n}_{{\varOmega };r} = & \bigl(\tilde{\pi}^{r/s}\bigr)^{*}\bigl(\tilde{\pi}^{1/rn} \bigr)^{*}L^{n}_{{\varOmega }}= \bigl(\tilde{\pi}^{1/rn} \tilde{\pi}^{r/s}\bigr)^{*}L^{n}_{{\varOmega }}= \bigl(\tilde{\pi}^{1/sn}\bigr)^{*}L^{n}_{{\varOmega }}\\ = & \tau^{n}_{{\varOmega };s}, \end{aligned}$$

showing (3.93). □
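On a finite set, both identities of Proposition 3.6 can be checked numerically. The sketch below is only an illustration under assumptions: elements of ${\mathcal {S}}^{r}(I)$ are encoded by coefficient lists, the signed power $\tilde{\pi}^{k}$ acts coordinatewise as $m \mapsto \operatorname{sign}(m)|m|^{k}$, and its differential is taken from (3.62). All function names are hypothetical, and the base points have nonvanishing coordinates.

```python
from math import copysign, isclose, prod

def L(*nus):
    """Canonical n-tensor (3.87) on a finite set, in coordinates."""
    n = len(nus)
    return n**n * sum(prod(c) for c in zip(*nus))

def tau(r, mu, *nus):
    """(tau^n_{I;r})_mu from (3.88) for r < 1/n, in coordinates."""
    n = len(nus)
    total = sum(prod(c) * abs(m) ** (1 / r - n) for m, *c in zip(mu, *nus))
    return total / r**n

def signed_power(k, mu):
    """pi~^k(mu) = sign(mu) |mu|^k, coordinate by coordinate."""
    return [copysign(abs(m) ** k, m) for m in mu]

def d_signed_power(k, mu, nu):
    """Differential of pi~^k at mu applied to nu, cf. (3.62)."""
    return [k * abs(m) ** (k - 1) * v for m, v in zip(mu, nu)]

n, r, s = 3, 1 / 6, 1 / 12
mu_r = [0.5, -1.5, 2.0]
mu_s = [0.8, -1.2, 1.1]
nus = [[1.0, 2.0, -1.0], [0.5, -1.0, 3.0], [2.0, 1.0, 1.0]]

# (3.92): tau_r equals the pull-back of L^n under pi~^{1/(rn)}.
k = 1 / (r * n)
lhs_392 = tau(r, mu_r, *nus)
rhs_392 = L(*(d_signed_power(k, mu_r, nu) for nu in nus))

# (3.93): pulling tau_r back under pi~^{r/s} gives tau_s.
q = r / s
lhs_393 = tau(r, signed_power(q, mu_s),
              *(d_signed_power(q, mu_s, nu) for nu in nus))
rhs_393 = tau(s, mu_s, *nus)
```

Both pairs agree up to floating-point rounding, reflecting the exponent bookkeeping $n(k-1) = 1/r - n$ used in the proof.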

Remark 3.10

If ${\varOmega }= I$ is a finite measurable space, then ${\mathcal {M}}_{+}^{r}(I)$ is an open subset of ${\mathcal {S}}^{r}(I)$ and hence a manifold. Moreover, the restriction $\tilde{\pi}^{k}: {\mathcal {M}}_{+}^{r}(I) \to {\mathcal {M}}_{+}^{s}(I)$ is a $C^{1}$-map even for $k \leq1$, so that we may use (3.92) to define $\tau^{n}_{{\varOmega };r}$ for all $r \in(0,1]$ on ${\mathcal {M}}_{+}(I)$.

That is, if $\mu= \sum_{i \in I} m_{i} \delta^{i} \in {\mathcal {M}}_{+}(I)$ and $V_{k} = \sum_{i \in I} v_{k;i} \delta^{i} \in {\mathcal {S}}(I)$, then

$$ \tau_{I;r}^{n}(V_{1}, \ldots, V_{n}) = \dfrac{1}{r^{n}} \int_{I} v_{1;i} \cdots v_{n;i} m_{i}^{1/r-n} \delta^{i} = \sum _{i \in I} \dfrac{1}{r^{n}m_{i}^{n-1/r}} v_{1;i} \cdots v_{n;i}. $$

(3.94)

For $r = 1$, the tensor $\tau_{I;1}^{n}$ coincides with the tensor $\tau _{I}^{n}$ in (2.76).

This explains on a more conceptual level why $\tau_{I}^{n} = \tau_{I;1}^{n}$ cannot be extended from ${\mathcal {M}}_{+}(I)$ to ${\mathcal {S}}(I)$, whereas $L^{n}_{I} = \tau_{I;1/n}^{n}$ can be extended to a tensor on ${\mathcal {S}}^{1/n}(I)$, cf. (2.52).

If $(M, {\varOmega }, \mathbf{p})$ is an $n$-integrable parametrized measure model, then we define the canonical $n$-tensor of $(M, {\varOmega }, \mathbf{p})$ as

$$ \tau^{n}_{M} = \tau^{n}_{(M, {\varOmega }, \mathbf{p})} := \bigl(\mathbf{p}^{1/n}\bigr)^{*}L^{n}_{{\varOmega }}, $$

(3.95)

so that by (3.68)

$$\begin{aligned} \tau^{n}_{M}(V_{1}, \ldots, V_{n}) := & L^{n}_{{\varOmega }}\bigl(d_{\xi}{\mathbf {p}}^{1/n}(V_{1}), \ldots, d_{\xi}{\mathbf{p}}^{1/n}(V_{n})\bigr) \\ = & \int_{{\varOmega }}\partial_{V_{1}} \log{\mathbf{p}}(\xi) \cdots\partial _{V_{n}} \log{\mathbf{p}}(\xi)\; d\mathbf{p}(\xi), \end{aligned}$$

(3.96)

where the second line follows immediately from (3.68) and (3.87). That is, $\tau _{M}^{n}$ coincides with the tensor given in (3.44).

Observe that $\tau^{n}_{M}$ is indeed continuous, since $\tau^{n}_{M}(V, \ldots , V) = n^{n} \|d_{\xi}{\mathbf{p}}^{1/n}(V)\|_{n}^{n}$ is continuous for all $V$.

Example 3.5

(1)
For $n=1$, the canonical 1-form is given as
$$ \tau^{1}_{M}(V) := \int_{{\varOmega }}{\partial }_{V} \log{\mathbf{p}}(\xi) \, d \mathbf{p}(\xi ) = {\partial }_{V} \bigl\| {\mathbf{p}}(\xi)\bigr\| . $$
(3.97)
Thus, it vanishes if and only if $\|{\mathbf{p}}(\xi)\|$ is locally constant, e.g., if $(M, {\varOmega }, \mathbf{p})$ is a statistical model.
(2)
For $n = 2$, $\tau^{2}_{M}$ coincides with the Fisher metric ${\mathfrak {g}}$ from (3.41).
(3)
For $n=3$, $\tau^{3}_{M}$ coincides with the Amari–Chentsov 3-symmetric tensor ${\mathbf {T}}$ from (3.42).
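To make Example 3.5 concrete, consider the one-parameter Bernoulli family $\mathbf{p}(\xi) = \xi\delta^{1} + (1-\xi)\delta^{0}$ on a two-point sample space, which is chosen here purely for illustration. The sketch evaluates the second line of (3.96) on the diagonal, i.e., for $V_{1} = \cdots = V_{n} = \partial/\partial\xi$; the helper name `tau_diag` is hypothetical.

```python
from math import isclose

def tau_diag(p, dp, n):
    """tau^n_M(V, ..., V) = sum_i (dp_i / p_i)^n * p_i on a finite sample space.

    p  : point masses of p(xi), assumed positive
    dp : point masses of the derivative of p in the direction V
    """
    return sum((d / w) ** n * w for w, d in zip(p, dp))

xi = 0.3
p = [xi, 1 - xi]    # p(xi) = xi * delta^1 + (1 - xi) * delta^0
dp = [1.0, -1.0]    # derivative of p in the direction V = d/dxi

tau1 = tau_diag(p, dp, 1)   # canonical 1-form, cf. (3.97)
tau2 = tau_diag(p, dp, 2)   # Fisher metric
tau3 = tau_diag(p, dp, 3)   # Amari-Chentsov tensor
```

Here `tau1` vanishes up to rounding because the total mass is constant, `tau2` reproduces the familiar Fisher information $1/(\xi(1-\xi))$ of the Bernoulli family, and `tau3` equals $1/\xi^{2} - 1/(1-\xi)^{2}$.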

Remark 3.11

While the above examples show the statistical significance of $\tau _{M}^{n}$ for $n = 1,2,3$, we shall show later that the tautological forms of even degree $\tau_{M}^{2n}$ can be used to measure the information loss of statistics and Markov kernels, cf. Theorem 5.5. Moreover, in [160, p. 212] the question is posed if there are other significant tensors on statistical manifolds, and the canonical $n$-tensors may be considered as natural candidates.

3.2.6 Signed Parametrized Measure Models

In this section, we wish to comment on the generalization of Definition 3.4 to families of finite signed measures $(\mathbf{p}(\xi))_{\xi\in M}$, i.e., dropping the assumption that $\mathbf{p}(\xi)$ is a non-negative measure. That is, we may simply consider $C^{1}$-maps $\mathbf{p}: M \to {\mathcal {S}}({\varOmega })$.

However, with this approach there is no notion of $k$-integrability or of canonical tensors, as the term ${\partial }_{V} \log{\mathbf{p}}(\xi)$ from (3.66), which is necessary to define these notions, cannot be given sense without further assumptions.

For instance, if ${\varOmega }= \{{\omega }\}$ is a singleton and $\mathbf{p}(\xi) := \xi\delta^{{\omega }}$ for $\xi\in {\mathbb {R}}$, then $\mathbf{p}: {\mathbb {R}}\to {\mathcal {S}}({\varOmega })$ is certainly a $C^{1}$-map, but $\log{\mathbf{p}}(\xi) = \log\xi$ cannot be continuously extended at $\xi= 0$.

Therefore, we shall make the following definition.

Definition 3.11

Let ${\varOmega }$ be a measurable space and $M$ be a (Banach-)manifold.

(1)
A signed parametrized measure model is a triple $(M, {\varOmega }, \mathbf{p})$, where $M$ is a (finite or infinite-dimensional) Banach manifold and $\mathbf{p}: M \to {\mathcal {S}}({\varOmega })$ is a $C^{1}$-map in the sense of Sect. 3.2.2.
(2)
A signed parametrized measure model $(M, {\varOmega }, \mathbf{p})$ is said to have a logarithmic derivative if for each $\xi\in M$ and $V \in T_{\xi}M$ the derivative $d_{\xi}{\mathbf{p}}(V) \in {\mathcal {S}}({\varOmega })$ is dominated by $|{\mathbf{p}}(\xi)|$. In this case, we define analogously to (3.66)
$$ {\partial }_{V} \log\bigl|{\mathbf{p}}(\xi)\bigr| := \dfrac{d\{d_{\xi}{\mathbf{p}}(V)\} }{d\mathbf{p}(\xi)}. $$
(3.98)

Here, for signed measures $\mu, \nu\in {\mathcal {S}}({\varOmega })$ such that $|\mu|$ dominates $\nu$, the Radon–Nikodym derivative $\dfrac{d\nu}{d\mu}$ is the unique function in $L^{1}({\varOmega }, |\mu|)$ such that

$$\nu= \dfrac{d\nu}{d\mu} \mu. $$

Just as in the non-signed case, we can now consider the $(1/k)$th power of $\mathbf{p}$. Here, we shall use the signed power in order not to lose information on $\mathbf{p}(\xi)$. That is, we consider the (continuous) map

$$\tilde{\mathbf{p}}^{1/k}: M \longrightarrow {\mathcal {S}}^{1/k}({\varOmega }), \qquad\xi \longmapsto\tilde{\pi}^{1/k} {\mathbf{p}}(\xi) =: \tilde{\mathbf {p}}(\xi)^{1/k}. $$

Assuming that this map is differentiable and differentiating the equation $\mathbf{p} = \tilde{\pi}^{k} \tilde{\mathbf{p}}^{1/k}$ just as in the non-signed case, we obtain in analogy to (3.68)

$$ d_{\xi}\tilde{\mathbf{p}}^{1/k}(V) = \dfrac{1}{k} \; {\partial }_{V} \log\bigl|{\mathbf {p}}(\xi)\bigr|\; \tilde{ \mathbf{p}}^{1/k}(\xi) \in {\mathcal {S}}^{1/k}\bigl({\varOmega }, \bigl|{\mathbf {p}}( \xi)\bigr|\bigr) $$

(3.99)

so that, in particular, ${\partial }_{V} \log|{\mathbf{p}}(\xi)| \in L^{k}({\varOmega }, |{\mathbf{p}}(\xi)|)$, and in analogy to Definition 3.7 we define the following.

Definition 3.12

($k$-Integrable signed parametrized measure model)

A signed parametrized measure model $(M, {\varOmega }, \mathbf{p})$ is called $k$-integrable for $k \geq1$ if it has a logarithmic derivative and the map $\tilde{\mathbf{p}}^{1/k}: M \to {\mathcal {S}}^{1/k}({\varOmega })$ is a $C^{1}$-map. Furthermore, we call the model $\infty$-integrable if it is $k$-integrable for all $k \geq1$.

Just as in the non-signed case, this allows us to define the canonical $n$-tensor for an $n$-integrable signed parametrized measure model. Namely, in analogy to (3.95) and (3.96) we define for an $n$-integrable signed parametrized measure model $(M, {\varOmega }, \mathbf{p})$ the canonical $n$-tensor of $(M, {\varOmega }, \mathbf{p})$ as

$$ \tau^{n}_{M} = \tau^{n}_{(M, {\varOmega }, \mathbf{p})} := \bigl(\tilde{\mathbf{p}}^{1/n}\bigr)^{*}L^{n}_{{\varOmega }}, $$

(3.100)

so that by (3.99) and (3.87)

$$\begin{aligned} \tau^{n}_{M}(V_{1}, \ldots, V_{n}) := & L^{n}_{{\varOmega }}\bigl(d_{\xi}\tilde{\mathbf {p}}^{1/n}(V_{1}), \ldots, d_{\xi}\tilde{\mathbf{p}}^{1/n}(V_{n})\bigr)\\ = & \int_{{\varOmega }}\partial_{V_{1}} \log\bigl|{\mathbf{p}}(\xi)\bigr| \cdots\partial _{V_{n}} \log\bigl|{\mathbf{p}}(\xi)\bigr|\; d\mathbf{p}(\xi). \end{aligned}$$

Example 3.6

(1)
For $n=1$, the canonical 1-form is given as
$$ \tau^{1}_{M}(V) := \int_{{\varOmega }}{\partial }_{V} \log\bigl|{\mathbf{p}}(\xi)\bigr| \, d\mathbf{p}(\xi) = {\partial }_{V} \bigl(\mathbf{p}(\xi) ({\varOmega })\bigr). $$
(3.101)
Thus, it vanishes if and only if $\mathbf{p}(\xi)({\varOmega })$ is locally constant, but of course, as $\mathbf{p}(\xi)$ is a signed measure, in general this quantity does not equal $\|{\mathbf{p}}(\xi)\|_{TV} = |{\mathbf{p}}(\xi)|({\varOmega })$.
(2)
For $n = 2$ and $n=3$, $\tau^{2}_{M}$ and $\tau^{3}_{M}$ coincide with the Fisher metric and the Amari–Chentsov 3-symmetric tensor ${\mathbf {T}}$ from (3.41) and (3.42), respectively. That is, for signed parametrized measure models which are $k$-integrable for $k \geq 2$ or $k \geq3$, respectively, the tensors ${\mathfrak {g}}_{M}$ and ${\mathbf {T}}$, respectively, can still be defined. Note, however, that ${\mathfrak {g}}_{M}$ may fail to be positive definite. In fact, it may even be degenerate, so that ${\mathfrak {g}}_{M}$ does not necessarily yield a Riemannian (and possibly not even a pseudo-Riemannian) metric on $M$ for signed parametrized measure models.
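The distinction in (3.101) between the total mass $\mathbf{p}(\xi)({\varOmega })$ and the total variation $|{\mathbf{p}}(\xi)|({\varOmega })$ can be seen in a small example. The following sketch assumes a two-point sample space and the signed family $\mathbf{p}(\xi) = e^{\xi}\delta^{1} - e^{-\xi}\delta^{2}$, which is an illustrative choice and does not appear in the text; the function names are hypothetical.

```python
from math import cosh, exp, isclose, sinh

def model(xi):
    """p(xi) = e^xi * delta^1 - e^(-xi) * delta^2 on a two-point space."""
    return [exp(xi), -exp(-xi)]

def d_model(xi):
    """Derivative of p in the direction V = d/dxi."""
    return [exp(xi), exp(-xi)]

xi = 0.7
p, dp = model(xi), d_model(xi)

# Logarithmic derivative (3.98): the density of dp w.r.t. p, here (1, -1).
log_derivative = [d / w for w, d in zip(p, dp)]

# Canonical 1-form (3.101): integral of the logarithmic derivative against p.
tau1 = sum(g * w for g, w in zip(log_derivative, p))

total_mass = sum(p)                        # p(xi)(Omega) = 2 sinh(xi)
total_variation = sum(abs(w) for w in p)   # |p(xi)|(Omega) = 2 cosh(xi)
```

In this family `tau1` equals $2\cosh\xi$, the derivative of the total mass $2\sinh\xi$, whereas the derivative of the total variation $2\cosh\xi$ would be $2\sinh\xi$.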

3.3 The Pistone–Sempi Structure

The definition of ($k$-integrable) parametrized measure models (statistical models, respectively) discussed in Sect. 3.2 is strongly inspired by the definition of Amari [9]. There is, however, another beautiful concept of geometrizing the space of finite measures (probability measures, respectively) which was first suggested by Pistone and Sempi in [216].

This approach is, on the one hand, more restrictive, since instead of geometrizing all of ${\mathcal {M}}({\varOmega })$, it only geometrizes the subset ${\mathcal {M}}_{+} = {\mathcal {M}}_{+}({\varOmega }, \mu_{0})$ of finite measures compatible with a fixed measure $\mu_{0}$, where two measures are called compatible if they have the same null sets. On the other hand, it defines on ${\mathcal {M}}_{+}$ the structure of a Banach manifold, whence a much stronger geometric structure than that of ${\mathcal {M}}({\varOmega })$ defined in Sect. 3.2. We shall refer to this as the Pistone–Sempi structure on ${\mathcal {M}}_{+}({\varOmega }, \mu_{0})$.

The starting point is to define a topology on ${\mathcal {M}}_{+}$, called the topology of $e$-convergence or $e$-topology, cf. Definition 3.13 below. For this topology, the inclusion ${\mathcal {M}}_{+} = {\mathcal {M}}_{+}({\varOmega }, \mu_{0}) \hookrightarrow {\mathcal {M}}({\varOmega })$ is continuous, where ${\mathcal {M}}({\varOmega })$ is equipped with the $L^{1}$-topology induced by the inclusion ${\mathcal {M}}({\varOmega }, \mu_{0}) \hookrightarrow {\mathcal {S}}({\varOmega })$.

Since ${\mathcal {M}}_{+}$ is a Banach manifold, it is natural to consider models which are given by $C^{1}$-maps $\mathbf{p}: M \to {\mathcal {M}}_{+}$ from a (Banach-)manifold $M$ to the Banach manifold ${\mathcal {M}}_{+}$. As it turns out, such a map, when regarded as a parametrized measure model $\mathbf{p}: M \to {\mathcal {M}}_{+} \hookrightarrow {\mathcal {M}}({\varOmega })$ is $\infty$-integrable in the sense of Definition 3.7. The converse, however, is far from being true in general, see Example 3.8 below.

In fact, we shall show that in the $e$-topology, the set ${\mathcal {M}}_{+}$ decomposes into several connected components each of which can be canonically identified with a convex open set in a Banach space, and this induces the Banach manifold structure established in [216].

We shall begin our discussion by introducing the notion of the $e$-topology, which is defined using the notion of convergence of sequences. We shall then recall some basic facts about Orlicz spaces and show that to each $\mu\in {\mathcal {M}}_{+}$ there is an associated exponential tangent space $T_{\mu} {\mathcal {M}}_{+}$ which is an Orlicz space. It then turns out that the injective map

$$\log_{\mu_{0}}: e^{f} \mu_{0} \longmapsto f $$

maps the connected component of $\mu$ in ${\mathcal {M}}_{+}$ into an open convex subset of $T_{\mu} {\mathcal {M}}_{+}$. We also refer to the results in [230] where this division of ${\mathcal {M}}_{+}$ into open cells is established as well.

3.3.1 $e$-Convergence

Definition 3.13

(Cf. [216])

Let $(g_{n})_{n \in {\mathbb {N}}}$ be a sequence of measurable functions on the finite measure space $({\varOmega }, \mu)$, and let $g \in L^{1}({\varOmega }, \mu)$ be such that $g_{n}, g > 0$ $\mu$-a.e. We say that $(g_{n})_{n \in {\mathbb {N}}}$ is $e$-convergent to $g$ if the sequences $(g_{n}/g)$ and $(g/g_{n})$ converge to 1 in $L^{p}({\varOmega }, g \mu)$ for all $p \geq1$. In this case, we also say that the sequence of measures $(g_{n} \mu)_{n \in {\mathbb {N}}}$ is $e$-convergent to the measure $g \mu$.

It is evident from this definition that the measure $\mu$ may be replaced by a compatible measure $\mu' \in {\mathcal {M}}_{+}({\varOmega }, \mu)$ without changing the notion of $e$-convergence. There are equivalent reformulations of $e$-convergence given as follows.

Proposition 3.7

(Cf. [216])

Let $(g_{n})_{n \in {\mathbb {N}}}$ and $g$ be as above. Then the following are equivalent:

(1)
$(g_{n})_{n \in {\mathbb {N}}}$ is $e$-convergent to $g$.
(2)
For all $p \geq1$, we have
$$\lim_{n \to\infty} \int_{{\varOmega }}\biggl\vert \biggl(\frac{g_{n}}{g} \biggr)^{p} - 1 \biggr\vert g\; d\mu= \lim_{n \to\infty} \int_{{\varOmega }}\biggl\vert \biggl(\frac{g}{g_{n}} \biggr)^{p} - 1 \biggr\vert g\; d\mu= 0. $$
(3)
The following conditions hold:
(a)
$(g_{n})$ converges to $g$ in $L^{1}({\varOmega }, \mu)$,
(b)
for all $p \geq1$ we have
$$\limsup_{n \to\infty} \int_{{\varOmega }}\biggl(\frac{g_{n}}{g} \biggr)^{p} g \; d\mu< \infty\quad \textit{and} \quad\limsup_{n \to\infty} \int_{{\varOmega }}\biggl(\frac{g}{g_{n}} \biggr)^{p} g \; d\mu< \infty. $$

For the proof, we shall use the following simple

Lemma 3.9

Let $(f_{n})_{n \in {\mathbb {N}}}$ be a sequence of measurable functions in $({\varOmega }, \mu)$ such that

(1)
$\lim_{n \to\infty} \|f_{n}\|_{1} = 0$,
(2)
$\limsup_{n \to\infty} \|f_{n}\|_{p} < \infty$ for all $p > 1$.

Then $\lim_{n \to\infty} \|f_{n}\|_{p} = 0$ for all $p \geq1$, i.e., $(f_{n})$ converges to 0 in $L^{p}({\varOmega }, \mu)$ for all $p \geq1$.

Proof

For $p \geq1$, we have by Hölder’s inequality

$$\begin{aligned} \|f_{n}\|_{p}^{p} = & \int_{{\varOmega }}|f_{n}|^{p} d\mu= \int_{{\varOmega }}|f_{n}|^{1/2} |f_{n}|^{p-1/2} \,d\mu \\ \leq& \bigl\| f_{n}^{1/2}\bigr\| _{2} \bigl\| f_{n}^{p-1/2} \bigr\| _{2} = \|f_{n}\|_{1}^{1/2} \|f_{n}\| _{2p-1}^{(2p-1)/2}. \end{aligned}$$

Since $\|f_{n}\|_{1}\to0$ and $\|f_{n}\|_{2p-1}$ is bounded, $\|f_{n}\|_{p} \to 0$ follows. □
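The interpolation estimate in this proof can be checked on a finite measure space. The sketch below is only an illustration with assumed data: the measure is given by three point weights, and the function `lp_norm` is a hypothetical helper for the weighted $L^{p}$-norm.

```python
def lp_norm(f, weights, p):
    """Weighted L^p norm: (sum_i |f_i|^p * w_i)^(1/p)."""
    return sum(abs(x) ** p * w for x, w in zip(f, weights)) ** (1 / p)

weights = [0.2, 0.5, 0.3]   # point masses of a finite measure mu
f = [4.0, -0.1, 0.02]

bounds = {}
for p in (1, 2, 3, 5):
    lhs = lp_norm(f, weights, p) ** p
    # Hoelder bound from the proof: ||f||_1^{1/2} * ||f||_{2p-1}^{(2p-1)/2}.
    rhs = lp_norm(f, weights, 1) ** 0.5 * lp_norm(f, weights, 2 * p - 1) ** ((2 * p - 1) / 2)
    bounds[p] = (lhs, rhs)
```

For each exponent the left-hand side stays below the right-hand side, with equality at $p = 1$ up to rounding, so a vanishing $L^{1}$-norm together with a bounded $L^{2p-1}$-norm forces the $L^{p}$-norm to vanish.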

Proof of Proposition 3.7

(1) ⇒ (2) Note that the expressions $\vert (g_{n}/g)^{p} - 1 \vert $ and $\vert (g/g_{n})^{p} - 1 \vert $ are increasing in $p$, hence we may assume w.l.o.g. that $p \in {\mathbb {N}}$. Let $f_{n} := g_{n}/g - 1$, so that by hypothesis $\lim_{n \to\infty} \int_{{\varOmega }}|f_{n}|^{p} g\; d\mu= 0$ for all $p \geq1$. Then

$$\begin{aligned} \int_{{\varOmega }}\biggl\vert \biggl(\frac{g_{n}}{g} \biggr)^{p} - 1 \biggr\vert g\; d\mu = & \int_{{\varOmega }}\bigl\vert (1 + f_{n} )^{p} - 1 \bigr\vert g\; d\mu \\ \leq& \sum_{k=1}^{p} \binom{p}{k} \int_{{\varOmega }}|f_{n}|^{k} g\; d\mu \longrightarrow0, \end{aligned}$$

and the other assertion follows analogously.

(2) ⇒ (3) The first equation in (2) for $p=1$ reads

$$0 = \lim_{n \to\infty} \int_{{\varOmega }}\biggl\vert \frac{g_{n}}{g} - 1 \biggr\vert g\; d\mu= \lim_{n \to\infty} \int_{{\varOmega }}\vert g_{n} - g \vert \,d\mu, $$

so that (3)(a) is satisfied. Moreover, it is evident that the convergence conditions in (2) imply the boundedness of $(g_{n}/g)$ and $(g/g_{n})$ in $L^{p}({\varOmega }, g\mu)$.

(3) ⇒ (1) Again, let $f_{n} := g_{n}/g - 1$. Then (3)(a) implies that $(f_{n})$ converges to 0 in $L^{1}({\varOmega }, g\mu)$, and the first condition in (3)(b) implies that $(f_{n})$ is bounded in $L^{p}({\varOmega }, g\mu)$ for all $p > 1$. This together with Lemma 3.9 implies that $(f_{n})$ converges to 0 and hence $(g_{n}/g)$ to 1 in $L^{p}({\varOmega }, g\mu)$ for all $p \geq1$.

Note that $g/g_{n} - 1 = - f_{n} \cdot(g/g_{n})$. By the above, $(f_{n})$ tends to 0 in $L^{2}({\varOmega }, g\mu)$, and $(g/g_{n})$ is bounded in that space by (3)(b). Thus, by Hölder’s inequality, $(g/g_{n} - 1) = - f_{n} \cdot(g/g_{n})$ tends to 0 in $L^{1}({\varOmega }, g\mu)$ and, moreover, this sequence is bounded in $L^{p}({\varOmega }, g\mu)$ for all $p \geq1$ by the second condition in (3)(b). Thus, $(g/g_{n})$ tends to 1 in $L^{p}({\varOmega }, g \mu)$ for all $p \geq1$ by Lemma 3.9. □

3.3.2 Orlicz Spaces

In this section, we recall the theory of Orlicz spaces which is needed in Sect. 3.3.3 for the description of the geometric structure on ${\mathcal {M}}({\varOmega })$. Most of the results can be found, e.g., in [153].

A function $\phi: {\mathbb {R}}\to {\mathbb {R}}$ is called a Young function if $\phi (0) = 0$, $\phi$ is even, convex, strictly increasing on $[0, \infty)$ and $\lim_{t \to\infty} t^{-1} \phi(t) = \infty$. Given a finite measure space $({\varOmega }, \mu)$ and a Young function $\phi$, we define the Orlicz space

$$L^{\phi}(\mu) := \biggl\{ f : {\varOmega }\to {\mathbb {R}}\Bigm| \int_{{\varOmega }}\phi({\varepsilon }f) \,d\mu < \infty\mbox{ for some ${\varepsilon }> 0$} \biggr\} . $$

The elements of $L^{\phi}(\mu)$ are called Orlicz functions of $(\phi, \mu)$. Convexity of $\phi$ and $\phi(0) = 0$ imply

$$ \phi(c x) \leq c \phi(x) \quad\mbox{and} \quad \phi \bigl(c^{-1} x\bigr) \geq c^{-1} \phi(x) \quad\mbox{for all $c \in(0, 1)$, $x \in {\mathbb {R}}$}. $$

(3.102)

We define the Orlicz norm on $L^{\phi}(\mu)$ as

$$\|f\|_{\phi, \mu} := \inf \biggl\{ a > 0 \Bigm| \int_{{\varOmega }}\phi \biggl(\frac{f}{a} \biggr) \,d\mu\leq1 \biggr\} . $$

If $f$ is an Orlicz function, ${\varepsilon }> 0$ and $K \geq1$, then

$$\int_{{\varOmega }}\phi({\varepsilon }f) \,d\mu\leq K \Rightarrow \int_{{\varOmega }}\phi\bigl(K^{-1} {\varepsilon }f\bigr) \,d\mu \stackrel{\scriptsize\mbox{(3.102)}}{\leq} K^{-1} \int_{{\varOmega }}\phi ({\varepsilon }f) \,d\mu\leq 1, $$

whence

$$ \int_{{\varOmega }}\phi({\varepsilon }f) \,d\mu\leq K \Rightarrow\|f \|_{\phi, \mu} \leq \dfrac{K}{{\varepsilon }}, $$

(3.103)

as long as $K \geq1$. In particular, every Orlicz function has finite norm.

Observe that the infimum in the definition of the norm is indeed attained, as for a sequence $(a_{n})_{n \in {\mathbb {N}}}$ descending to $\|f\| _{\phi, \mu}$ the sequence $g_{n} := \phi(f/a_{n})$ is monotonically increasing as $\phi$ is even and increasing on $[0, \infty)$. Thus, by the monotone convergence theorem,

$$\int_{{\varOmega }}\phi \biggl(\frac{f}{\|f\|_{\phi, \mu}} \biggr)\, d\mu= \lim _{n \to\infty} \int_{{\varOmega }}\phi \biggl(\frac{f}{a_{n}} \biggr) \,d\mu\leq1. $$

We assert that $\|\cdot\|_{\phi, \mu}$ is indeed a norm on $L^{\phi}(\mu)$. For the positive definiteness, suppose that $\|f\|_{\phi, \mu} = 0$. Then $\int_{{\varOmega }}\phi(n f) \,d\mu\leq1$ for all $n \in {\mathbb {N}}$, and again by the monotone convergence theorem,

$$1 \geq\lim_{n \to\infty} \int_{{\varOmega }}\phi(n f) \,d\mu= \int_{{\varOmega }}\lim_{n \to\infty} \phi(n f) \,d\mu, $$

so that, in particular, $\lim_{n \to\infty} \phi(n f({\omega })) < \infty$ for a.e. ${\omega }\in {\varOmega }$. But since $\lim_{t \to\infty} \phi(t) = \infty $, this implies that $f({\omega }) = 0$ for a.e. ${\omega }\in {\varOmega }$, as asserted.

The homogeneity $\|c f\|_{\phi, \mu} = |c| \; \|f\|_{\phi, \mu}$ is immediate from the definition. Finally, for the triangle inequality let $f_{1}, f_{2} \in L^{\phi}({\varOmega }, \mu)$ and let $c_{i} := \|f_{i}\|_{\phi, \mu}$. Then the convexity of $\phi$ implies

$$\begin{aligned} \int_{{\varOmega }}\phi \biggl( \frac{f_{1} + f_{2}}{c_{1} + c_{2}} \biggr) \,d\mu = & \int _{{\varOmega }}\phi \biggl( \frac{c_{1}}{c_{1} + c_{2}} \frac{f_{1}}{c_{1}} + \frac{c_{2}}{c_{1} + c_{2}} \frac{f_{2}}{c_{2}} \biggr) \,d\mu \\ \leq& \frac{c_{1}}{c_{1} + c_{2}} \int_{{\varOmega }}\phi \biggl( \frac {f_{1}}{c_{1}} \biggr) \,d\mu+ \frac{c_{2}}{c_{1} + c_{2}} \int_{{\varOmega }}\phi \biggl(\frac {f_{2}}{c_{2}} \biggr) \,d\mu \\ \leq& 1. \end{aligned}$$

Thus, $\|f_{1} + f_{2}\|_{\phi, \mu} \leq c_{1} + c_{2} = \|f_{1}\|_{\phi, \mu} + \|f_{2}\|_{\phi, \mu}$ so that the triangle inequality holds.

Example 3.7

For $p > 1$, let $\phi(t) := |t|^{p}$. This is then a Young function, and $L^{\phi}(\mu) = L^{p}({\varOmega }, \mu)$ as normed spaces.

In this example, we could also consider the case $p = 1$, even though $\phi(t) = |t|$ is not a Young function as it fails to meet the condition $\lim_{t \to\infty} t^{-1} \phi(t) = \infty$. However, this property of Young functions was not used in the verification of the norm properties of $\|\cdot\|_{\phi, \mu}$.
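On a finite measure space the Orlicz norm can be approximated numerically, which gives a direct check of Example 3.7. The following is a rough sketch under assumptions: the measure is a list of point weights, the infimum in the definition of the norm is located by bisection (a numerical device chosen here, using that $a \mapsto \int_{{\varOmega }}\phi(f/a)\,d\mu$ is non-increasing), and the name `orlicz_norm` is hypothetical.

```python
from math import isclose

def orlicz_norm(f, weights, phi, tol=1e-12):
    """Smallest a > 0 with sum_i phi(f_i / a) * w_i <= 1, found by bisection."""
    def modular(a):
        return sum(phi(x / a) * w for x, w in zip(f, weights))

    if all(x == 0 for x in f):
        return 0.0
    lo, hi = 0.0, 1.0
    while modular(hi) > 1:          # enlarge hi until the constraint holds
        hi *= 2
    while hi - lo > tol * hi:       # invariant: modular(hi) <= 1 < modular(lo)
        mid = (lo + hi) / 2
        if modular(mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

weights = [0.2, 0.5, 0.3]
f = [4.0, -1.0, 2.0]
p = 3

norm_orlicz = orlicz_norm(f, weights, lambda t: abs(t) ** p)
norm_lp = sum(abs(x) ** p * w for x, w in zip(f, weights)) ** (1 / p)
```

For $\phi(t) = |t|^{p}$ the two values agree up to the bisection tolerance, in line with $L^{\phi}(\mu) = L^{p}({\varOmega }, \mu)$ as normed spaces.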

Proposition 3.8

Let $(f_{n})_{n \in {\mathbb {N}}}$ be a sequence in $L^{\phi}(\mu)$. Then the following are equivalent:

(1)
$\lim_{n \to\infty} f_{n} = 0$ in the Orlicz norm.
(2)
There is a $K > 0$ such that for all $c > 0$, $\limsup_{n \to\infty} \int_{{\varOmega }}\phi(c f_{n}) \,d\mu\leq K$.
(3)
For all $c > 0$, $\lim_{n \to\infty} \int_{{\varOmega }}\phi(c f_{n}) \,d\mu= 0$.

Proof

Suppose that (1) holds. Then for any $c > 0$ and ${\varepsilon }\in(0,1)$ we have

$${\varepsilon }\geq {\varepsilon }\limsup_{n \to\infty} \underbrace{ \int_{{\varOmega }}\phi\bigl(c {\varepsilon }^{-1} f_{n} \bigr)\,d\mu}_{\leq1 \scriptsize\mbox{{ if $\|f_{n}\|_{\phi, \mu} \leq {\varepsilon }/c$}}} \stackrel{\scriptsize\mbox{(3.102)}}{\geq}\limsup _{n \to \infty} \int_{{\varOmega }}\phi(c f_{n})\,d\mu, $$

and since ${\varepsilon }\in(0,1)$ is arbitrary, (3) follows. Obviously, (3) implies (2), and if (2) holds for some $K$, then assuming w.l.o.g. that $K \geq1$, we conclude from (3.103) that $\limsup_{n \to\infty}\|f_{n}\|_{\phi, \mu} \leq Kc^{-1}$ for all $c > 0$, whence $\lim_{n \to\infty}\|f_{n}\|_{\phi, \mu} = 0$, which shows (1). □

Now let us investigate how the Orlicz spaces behave under a change of the Young function $\phi$.

Proposition 3.9

Let $({\varOmega }, \mu)$ be a finite measure space, and let $\phi_{1}, \phi_{2}: {\mathbb {R}}\to {\mathbb {R}}$ be two Young functions. If

$$\limsup_{t \to\infty} \frac{\phi_{1}(t)}{\phi_{2}(t)} < \infty, $$

then $L^{\phi_{2}}(\mu) {\subseteq }L^{\phi_{1}}(\mu)$, and the inclusion is continuous, i.e., $\|f\|_{\phi_{1}, \mu} \leq c\ \|f\|_{\phi_{2}, \mu}$ for some $c > 0$ and all $f \in L^{\phi_{2}}(\mu)$. In particular, if

$$0 < \liminf_{t \to\infty} \frac{\phi_{1}(t)}{\phi_{2}(t)} \le\limsup _{t \to\infty} \frac{\phi_{1}(t)}{\phi_{2}(t)} < \infty, $$

then $L^{\phi_{1}}(\mu) = L^{\phi_{2}}(\mu)$, and the Orlicz norms $\| \cdot\|_{\phi_{1}, \mu}$ and $\| \cdot\|_{\phi_{2}, \mu}$ are equivalent.

Proof

By our hypothesis, $\phi_{1}(t) \leq K \phi_{2}(t)$ for some $K \ge1$ and all $t \ge t_{0}$. Let $f \in L^{\phi_{2}}(\mu)$ and $a := \| f \|_{\phi_{2}, \mu}$. Moreover, decompose

$${\varOmega }:= {\varOmega }_{1} \uplus {\varOmega }_{2} \quad\mbox{with}\quad {\varOmega }_{1}:= \bigl\{ {\omega }\in {\varOmega }\bigm| \bigl|f({\omega })\bigr| \ge a t_{0} \bigr\} . $$

Then

$$\begin{aligned} \begin{aligned} K & \ge K \int_{{\varOmega }}\phi_{2} \biggl( \frac{|f|}{a} \biggr) \,d\mu\ge \int _{{\varOmega }_{1}} K \phi_{2} \biggl( \frac{|f|}{a} \biggr) \,d\mu \\ & \ge\int_{{\varOmega }_{1}} \phi_{1} \biggl( \frac{|f|}{a} \biggr) \,d\mu\quad\mbox{as $\dfrac{|f|}{a} \ge t_{0}$ on ${\varOmega }_{1}$} \\ & = \int_{{\varOmega }}\phi_{1} \biggl( \frac{|f|}{a} \biggr) \,d\mu- \int_{{\varOmega }_{2}} \phi_{1} \biggl( \frac{|f|}{a} \biggr) \,d\mu \\ & \ge \int_{{\varOmega }}\phi_{1} \biggl( \frac{|f|}{a} \biggr) \,d\mu- \int_{{\varOmega }_{2}} \phi_{1}( t_{0}) \,d\mu\quad \mbox{as $\dfrac{|f|}{a} < t_{0}$ on ${\varOmega }_{2}$} \\ & \ge \int_{{\varOmega }}\phi_{1} \biggl( \frac{|f|}{a} \biggr) \,d\mu- \phi_{1}( t_{0}) \mu({\varOmega }). \end{aligned} \end{aligned}$$

Thus, $\int_{{\varOmega }}\phi_{1} ( \frac{|f|}{a} ) \,d\mu \le K + \phi_{1}( t_{0}) \mu({\varOmega }) =: c$, hence $f \in L^{\phi_{1}}(\mu)$. As $c \ge1$, (3.103) implies that $\| f \|_{\phi_{1}, \mu} \leq ca = c \| f \|_{\phi_{2}, \mu}$, and this proves the claim. □

The following lemma is a straightforward consequence of the definitions and we omit the proof.

Lemma 3.10

Let $({\varOmega }, \mu)$ be a finite measure space, let $\phi: {\mathbb {R}}\to {\mathbb {R}}$ be a Young function, and let $\tilde{\phi}(t) := \phi(\lambda t)$ for some constant $\lambda> 0$.

Then $\tilde{\phi}$ is also a Young function. Moreover, $L^{\phi}(\mu) = L^{\tilde{\phi}}(\mu)$ and $\| \cdot\|_{\tilde{\phi}, \mu} = \lambda\| \cdot \|_{\phi, \mu}$, so that these norms are equivalent.
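The scaling identity of Lemma 3.10 can be illustrated numerically on a finite measure space. The sketch assumes point weights for $\mu$, the Young function $\phi(t) = \cosh t - 1$ and $\lambda= 3$, all chosen only for illustration, and approximates the Orlicz norm by bisection; the function names are hypothetical.

```python
from math import cosh, isclose

def orlicz_norm(f, weights, phi, tol=1e-12):
    """Smallest a > 0 with sum_i phi(f_i / a) * w_i <= 1, found by bisection."""
    def modular(a):
        return sum(phi(x / a) * w for x, w in zip(f, weights))

    lo, hi = 0.0, 1.0
    while modular(hi) > 1:          # enlarge hi until the constraint holds
        hi *= 2
    while hi - lo > tol * hi:
        mid = (lo + hi) / 2
        if modular(mid) <= 1:
            hi = mid
        else:
            lo = mid
    return hi

def phi(t):
    """A Young function: even, convex, phi(0) = 0, superlinear growth."""
    return cosh(t) - 1.0

lam = 3.0

def phi_scaled(t):
    """The rescaled Young function t -> phi(lam * t)."""
    return phi(lam * t)

weights = [0.2, 0.5, 0.3]
f = [4.0, -1.0, 2.0]            # assumed nonzero, so the norm is positive

norm = orlicz_norm(f, weights, phi)
norm_scaled = orlicz_norm(f, weights, phi_scaled)
```

Up to the bisection tolerance, `norm_scaled` equals `lam * norm`, matching $\| \cdot\|_{\tilde{\phi}, \mu} = \lambda\| \cdot \|_{\phi, \mu}$.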

Furthermore, we investigate how the Orlicz spaces relate when changing the measure $\mu$ to another measure $\mu' \in {\mathcal {M}}({\varOmega }, \mu)$.

Proposition 3.10

Let $0 \neq\mu' \in {\mathcal {M}}({\varOmega }, \mu)$ be a measure such that $d\mu'/d\mu \in L^{p}({\varOmega }, \mu)$ for some $p > 1$, and let $q > 1$ be the dual index, i.e., $p^{-1} + q^{-1} = 1$. Then for any Young function $\phi$ we have

$$L^{\phi^{q}}(\mu) {\subseteq }L^{\phi}\bigl(\mu'\bigr), $$

and this embedding is continuous.

Proof

Let $h := d\mu'/d\mu\in L^{p}({\varOmega }, \mu)$ and $c := \|h\|_{p} > 0$. If $f \in L^{\phi^{q}}(\mu)$ and $a := \|f\|_{\phi^{q}, \mu}$, then by Hölder’s inequality we have

$$\int_{{\varOmega }}\phi \biggl( \frac{|f|}{a} \biggr)\,d \mu' = \int_{{\varOmega }}\phi \biggl( \frac{|f|}{a} \biggr) h\,d\mu\le c \biggl\Vert \phi \biggl( \frac{|f|}{a} \biggr) \biggr\Vert _{q} = c \underbrace{ \biggl\Vert \phi^{q} \biggl( \frac{|f|}{a} \biggr) \biggr\Vert _{1}^{1/q}}_{\le1} \le c. $$

Thus, $f \in L^{\phi}(\mu')$, and (3.103) implies that $\|f\|_{\phi, \mu'} \leq c a = c \|f\|_{\phi^{q}, \mu}$, which shows the claim. □

Finally, we show that the Orlicz norms are complete.

Theorem 3.3

Let $\phi$ be a Young function. Then $(L^{\phi}(\mu), \|\cdot \|_{\phi, \mu })$ is a Banach space.

Proof

Since $\lim_{t \to\infty} t/\phi(t) < \infty$, Proposition 3.9 implies that we have a continuous inclusion $(L^{\phi}(\mu), \|\cdot\|_{\phi, \mu}) \hookrightarrow L^{1}({\varOmega }, \mu)$. In particular, any Cauchy sequence $(f_{n})_{n \in {\mathbb {N}}}$ in $L^{\phi}(\mu)$ is a Cauchy sequence in $L^{1}({\varOmega }, \mu)$, and since the latter space is complete, it $L^{1}$-converges to a limit function $f \in L^{1}({\varOmega }, \mu)$. Therefore, after passing to a subsequence, we may assume that $f_{n} \to f$ pointwise almost everywhere.

Let ${\varepsilon }> 0$. Then there is an $N({\varepsilon })$ such that for all $n, m \geq N({\varepsilon })$ we have $\|f_{n} - f_{m}\|_{\phi, \mu} < {\varepsilon }$, that is, for all $n, m \geq N({\varepsilon })$ we have

$$\int_{{\varOmega }}\phi\bigl({\varepsilon }^{-1} (f_{m} - f_{n})\bigr) \,d\mu\leq1. $$

Taking the pointwise limit $n \to\infty$, Fatou’s lemma yields

$$\int_{{\varOmega }}\phi\bigl({\varepsilon }^{-1} (f_{m} - f) \bigr) \,d\mu\leq\liminf_{n \to\infty} \int _{{\varOmega }}\phi\bigl({\varepsilon }^{-1}(f_{m} - f_{n})\bigr) \,d\mu\leq1, $$

which implies that $f_{m} - f \in L^{\phi}(\mu)$ and $\|f_{m} - f\|_{\phi, \mu} \leq {\varepsilon }$ for all $m \geq N({\varepsilon })$. Therefore, $f \in L^{\phi}(\mu )$ and $\lim_{m \to\infty} \|f_{m} - f\|_{\phi, \mu} = 0$. □

3.3.3 Exponential Tangent Spaces

For an arbitrary $\mu\in {\mathcal {M}}_{+}$, we define the set

$$\hat{B}_{\mu}({\varOmega }) := \bigl\{ f : {\varOmega }\to[- \infty, + \infty], |f| < \infty \; \mbox{$\mu$-a.e.} : e^{f} \in L^{1}({\varOmega }, \mu) \bigr\} , $$

which by Hölder’s inequality is a convex subset of the space of measurable functions ${\varOmega }\to[ - \infty, + \infty]$. For a fixed $\mu_{0} \in {\mathcal {M}}_{+}$, there is a bijection

$$\log_{\mu_{0}} : {\mathcal {M}}_{+} \longrightarrow\hat{B}_{\mu_{0}}({\varOmega }), \qquad \phi \, \mu_{0} \longmapsto\log(\phi). $$

That is, $\log_{\mu_{0}}$ canonically identifies ${\mathcal {M}}_{+}$ with a convex set. Replacing $\mu_{0}$ by a measure $\mu_{0}' \in {\mathcal {M}}_{+}$, we have $\log _{\mu_{0}'} = \log_{\mu_{0}} - u$, where $u := \log_{\mu_{0}} \mu_{0}'$. Moreover, we let

$$\begin{aligned} B_{\mu}({\varOmega }) := & \hat{B}_{\mu}({\varOmega }) \cap\bigl(-\hat{B}_{\mu}({\varOmega })\bigr) \\ = & \bigl\{ f: {\varOmega }\to[-\infty, \infty] \bigm| e^{\pm f} \in L^{1}({\varOmega }, \mu)\bigr\} \\ = & \bigl\{ f: {\varOmega }\to[-\infty, \infty] \bigm| e^{|f|} \in L^{1}({\varOmega }, \mu)\bigr\} \end{aligned}$$

and

$$B_{\mu}^{0}({\varOmega }) := \bigl\{ f \in B_{\mu}({\varOmega }) \bigm|(1+s) f \in B_{\mu}({\varOmega }) \mbox{ for some $s > 0$}\bigr\} . $$

The points of $B_{\mu}^{0}({\varOmega })$ are called inner points of $B_{\mu}({\varOmega })$.

Definition 3.14

Let $\mu\in {\mathcal {M}}_{+}$. Then

$$T_{\mu} {\mathcal {M}}_{+} := \bigl\{ f: {\varOmega }\to[-\infty, \infty] \bigm|\mbox{ $t f \in B_{\mu} ({\varOmega })$ for some $t \neq0$}\bigr\} $$

is called the exponential tangent space of ${\mathcal {M}}_{+}$ at $\mu$.

Evidently, $f \in T_{\mu} {\mathcal {M}}_{+}$ iff $\int_{{\varOmega }}(\exp|tf| - 1)\; d\mu< \infty$ for some $t > 0$. Thus,

$$T_{\mu} {\mathcal {M}}_{+} = L^{\exp|t| - 1}(\mu), $$

and hence is an Orlicz space, i.e., it has a Banach norm.^{Footnote 6} If $\|f\|_{L^{\exp|t| - 1}(\mu )} < 1$, then

$$\int_{{\varOmega }}(\exp|f| - 1)\; d\mu\leq1 \quad\Longrightarrow\quad \int_{{\varOmega }}e^{|f|}\; d\mu< \infty\Longrightarrow f \in B_{\mu}({\varOmega }). $$

That is, $B_{\mu}({\varOmega }) {\subseteq }T_{\mu} {\mathcal {M}}_{+}$ contains the unit ball w.r.t. the Orlicz norm and hence is a neighborhood of the origin. Furthermore, $\lim_{t \to\infty} t^{p}/(\exp|t| - 1) = 0$ for all $p \ge 1$, so that Proposition 3.9 implies that

$$ L^{\infty}({\varOmega }, \mu) {\subseteq }T_{\mu} {\mathcal {M}}_{+} {\subseteq }\bigcap_{p \geq1} L^{p} ({\varOmega }, \mu), $$

(3.104)

where all inclusions are continuous.

Observe that the inclusions in (3.104) are proper, in general. As an example, let ${\varOmega }= (0,1)$ be the unit interval, and let $\mu= dt$ be the Lebesgue measure.

Let $f(t) := (\log t)^{2}$. Since $\int_{0}^{1} |\log t|^{n} dt = n!$ for all $n \in {\mathbb {N}}$, it follows that $f \in L^{p}((0,1), dt)$ for all $p$. However, for $x > 0$ we have

$$\int_{0}^{1} \exp\bigl(x f(t)\bigr) \,dt = \sum _{n=0}^{\infty}\frac{1}{n!} x^{n} \int_{0}^{1} \log(t)^{2n} \,dt = \sum _{n=0}^{\infty}\frac{(2n)!}{n!} x^{n}. $$

But this power series diverges for all $x \neq0$, hence $(\log t)^{2} \notin T_{\mu}({\varOmega }, \mu)$.

For the first inclusion, observe that $|\log t| \in T_{\mu}({\varOmega }, \mu)$ as $\exp(\alpha|\log t|) = t^{-\alpha}$, which is integrable for $\alpha < 1$. However, $|\log t|$ is unbounded and hence not in $L^{\infty}((0,1), dt)$.
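The divergence of the power series in this example can be made concrete with exact arithmetic. The sketch below only uses the coefficients $(2n)!/n!$ from the display above; the function names and the sample value of $x$ are illustrative assumptions. It shows that the ratio of consecutive terms equals $2(2n+1)x$, which eventually exceeds 1 for every $x > 0$, so the terms do not even tend to zero.

```python
import math
from fractions import Fraction

def coefficient(n):
    """Coefficient (2n)!/n! of x^n in the series for the integral of exp(x (log t)^2)."""
    return Fraction(math.factorial(2 * n), math.factorial(n))

def term_ratio(n, x):
    """Ratio of the (n+1)-st to the n-th series term; it simplifies to 2(2n+1)x."""
    return coefficient(n + 1) * x / coefficient(n)

x = Fraction(1, 1000)                      # a small positive x (illustrative)
ratios = [term_ratio(n, x) for n in (0, 10, 100, 1000)]
print([float(r) for r in ratios])          # the ratios grow linearly in n
```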

Remark 3.12

In [106, Definition 6], $T_{\mu} {\mathcal {M}}_{+}$ is called the Cramer class of $\mu$. Moreover, in [106, Proposition 7] (see also [216, Definition 2.2]), the centered Cramer class is defined as the functions $u \in T_{\mu} {\mathcal {M}}_{+}$ with $\int_{{\varOmega }}u\ d\mu= 0$. Thus, the centered Cramer class is a closed subspace of codimension one.

In order to understand the topological structure of ${\mathcal {M}}_{+}$ with respect to the $e$-topology, it is useful to introduce the following preorder on ${\mathcal {M}}_{+}$:

$$ \mu' \preceq\mu\quad\mbox{if and only if} \quad \mu' = \phi\mu \mbox{ with } \phi\in L^{p}({\varOmega }, \mu) \mbox{ for some $p > 1$}. $$

(3.105)

In order to see that ⪯ is indeed a preorder, we have to show transitivity, as the reflexivity of ⪯ is obvious. Thus, let $\mu'' \preceq\mu'$ and $\mu' \preceq\mu$, so that $\mu' = \phi\mu$ and $\mu'' = \psi\mu'$ with $\phi\in L^{p}({\varOmega }, \mu)$ and $\psi\in L^{p'}({\varOmega }, \mu')$ for some $p, p' > 1$. Then $\phi^{p}, \psi^{p'} \phi\in L^{1}({\varOmega }, \mu)$. Let $\lambda:= (p'-1)/(p + p'-1) \in (0,1)$. Then by Hölder’s inequality, we have

$$L^{1}({\varOmega }, \mu) \ni\bigl(\psi^{p'} \phi \bigr)^{1-\lambda} \bigl(\phi^{p}\bigr)^{\lambda}= \psi ^{p'(1-\lambda)} \phi^{1+ \lambda(p-1)} = (\psi\phi)^{p''}, $$

where $p'' = p p'/(p + p' -1) > 1$, so that $\psi\phi\in L^{p''}({\varOmega }, \mu)$, and hence, $\mu'' \preceq\mu$ as $\mu'' = \psi\phi\mu$.
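The exponent bookkeeping in the transitivity argument is easy to get wrong, so here is a small exact-arithmetic check. It is a sketch only: the function name `transitivity_exponents` and the sample values of $p$ and $p'$ are assumptions made for illustration. It confirms that both interpolated exponents coincide with $p'' = pp'/(p + p' - 1)$ and that $p'' > 1$.

```python
from fractions import Fraction

def transitivity_exponents(p, p_prime):
    """Return (lam, p'') from the transitivity argument, in exact arithmetic."""
    lam = (p_prime - 1) / (p + p_prime - 1)
    psi_exponent = p_prime * (1 - lam)     # exponent of psi after interpolation
    phi_exponent = 1 + lam * (p - 1)       # exponent of phi after interpolation
    assert psi_exponent == phi_exponent    # both must equal p''
    return lam, psi_exponent

p, p_prime = Fraction(3, 2), Fraction(5, 4)   # illustrative choices with p, p' > 1
lam, p2 = transitivity_exponents(p, p_prime)
print(lam, p2)
```

Since $p'' - 1 = (p-1)(p'-1)/(p + p' - 1)$, the inequality $p'' > 1$ holds for all $p, p' > 1$, not only for the sample values.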

From the preorder ⪯ we define the equivalence relation on ${\mathcal {M}}_{+}$ by

$$ \mu' \sim\mu\quad\mbox{if and only if}\quad \mu' \preceq\mu\mbox{ and }\mu \preceq\mu', $$

(3.106)

in which case we call $\mu$ and $\mu'$ similar, and hence we obtain a partial ordering on the set of equivalence classes ${\mathcal {M}}_{+}/_{\sim}$

$$\bigl[\mu'\bigr] \preceq[\mu] \quad\mbox{if and only if} \quad \mu' \preceq \mu. $$

Proposition 3.11

Let $\mu' \preceq\mu$. Then $T_{\mu} {\mathcal {M}}_{+} {\subseteq }T_{\mu'} {\mathcal {M}}_{+}$ is continuously embedded w.r.t. the Orlicz norms on these tangent spaces.

In particular, if $\mu\sim\mu'$, then $T_{\mu} {\mathcal {M}}_{+} = T_{\mu'} {\mathcal {M}}_{+}$ with equivalent Banach norms. If we denote the isomorphism class of these spaces as $T_{[\mu]} {\mathcal {M}}_{+}$, then there are continuous inclusions

$$ T_{[\mu]} {\mathcal {M}}_{+} \hookrightarrow T_{[\mu']} {\mathcal {M}}_{+} \quad\textit{if} \quad\bigl[\mu'\bigr] \preceq[\mu]. $$

(3.107)

Proof

Let $\mu' = \phi\mu$ with $\phi\in L^{p}({\varOmega }, \mu)$, $p > 1$, and let $q > 1$ be the dual index, i.e., $p^{-1} + q^{-1} = 1$. Then by Hölder’s inequality,

$$\begin{aligned} \int_{{\varOmega }}\bigl(\exp|tf| - 1\bigr) \,d\mu' = & \int_{{\varOmega }}\bigl(\exp|tf| - 1\bigr) \phi \,d\mu \\ \leq& \|\phi\|_{p} \biggl(\int_{{\varOmega }}\bigl(\exp|tf| - 1\bigr)^{q} \,d\mu \biggr)^{1/q}. \end{aligned}$$

(3.108)

Let $\psi: {\mathbb {R}}\to {\mathbb {R}}$, $\psi(t) := \|\phi\|_{p}^{q} (\exp|t| - 1)^{q}$, which is a Young function. Let $f \in L^{\psi}(\mu)$ and $a := \|f\|_{L^{\psi }(\mu)}$. Then the right-hand side of (3.108) with $t := a^{-1}$ is bounded by 1, whence so is the left-hand side, so that $f \in L^{\exp|t|-1}(\mu') = T_{\mu'}{\mathcal {M}}_{+}$ and $\|f\|_{L^{\psi}(\mu)} = a \geq\|f\|_{L^{\exp|x|-1}(\mu')}$. Thus, there is a continuous inclusion

$$L^{\psi}(\mu) \hookrightarrow L^{\exp|x|-1}\bigl(\mu' \bigr) = T_{\mu'} {\mathcal {M}}_{+}({\varOmega }). $$

But now, as $\lim_{t \to\infty} \frac{\psi(t)}{\exp|q t| - 1} = \|\phi \|_{p}^{q} \in(0, \infty)$, Proposition 3.9 implies that as Banach spaces, $L^{\psi}(\mu) \cong L^{\exp|q t| - 1}(\mu)$ and furthermore, $L^{\exp|q t| - 1}(\mu) \cong L^{\exp|t|-1}(\mu) = T_{\mu} {\mathcal {M}}_{+}({\varOmega })$ by Lemma 3.10. □
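The key norm estimate of this proof, $\|f\|_{L^{\exp|x|-1}(\mu')} \leq \|f\|_{L^{\psi}(\mu)}$, can be observed numerically on a finite sample space. The sketch below assumes the Luxemburg-type definition of the Orlicz norm used in the proofs above; the helper names, the reference weights, the density $\phi$, the index $p = 2$, and the function $f$ are all illustrative assumptions.

```python
import math

def orlicz_norm(young, f, weights, tol=1e-12):
    """Assumed Luxemburg-type norm inf{a > 0 : sum_i young(|f_i|/a) * w_i <= 1},
    computed by bisection on a finite discrete measure space."""
    def modular(a):
        return sum(young(abs(x) / a) * w for x, w in zip(f, weights))
    lo, hi = 0.0, 1.0
    while modular(hi) > 1.0:
        hi *= 2.0
    while hi - lo > tol * hi:
        mid = 0.5 * (lo + hi)
        if modular(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return hi

def lp_norm(values, weights, p):
    """Weighted L^p norm on the finite sample space."""
    return sum(abs(v) ** p * w for v, w in zip(values, weights)) ** (1.0 / p)

mu = [0.25, 0.25, 0.25, 0.25]             # reference measure mu (illustrative)
density = [0.5, 2.0, 1.0, 3.0]            # phi = d mu'/d mu (illustrative)
mu_prime = [d * w for d, w in zip(density, mu)]
p = 2.0
q = p / (p - 1.0)                         # dual index
c = lp_norm(density, mu, p)               # ||phi||_p

exp_young = lambda t: math.exp(abs(t)) - 1.0
psi = lambda t: c ** q * (math.exp(abs(t)) - 1.0) ** q   # Young function from the proof

f = [1.0, -0.5, 2.0, 0.3]
norm_target = orlicz_norm(exp_young, f, mu_prime)   # norm of f in L^{exp|x|-1}(mu')
norm_source = orlicz_norm(psi, f, mu)               # norm of f in L^psi(mu)
print(norm_target, norm_source)
```

On a finite sample space all the spaces involved coincide as sets, so the experiment only illustrates the inequality between the norms, not the strictness of the inclusion.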

Proposition 3.12

The subspace in (3.107) is neither closed nor dense, unless $\mu\sim \mu'$, in which case these subspaces are equal. In fact, $f \in T_{[\mu']} {\mathcal {M}}_{+}$ lies in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$ if and only if

$$ \bigl(|f| + {\varepsilon }\log\bigl(d\mu'/d\mu\bigr)\bigr)_{+} \in T_{[\mu]} {\mathcal {M}}_{+} \quad\textit{for all }{\varepsilon }> 0. $$

(3.109)

Here, the subscript refers to the decomposition of a function into its non-negative and non-positive part, i.e., to the decomposition

$$\psi= \psi_{+} - \psi_{-} \quad\mbox{with} \quad\psi_{\pm}\geq0, \psi _{+} \perp\psi_{-}. $$

Proof

Let $p > 1$ be such that $\phi:= d\mu'/d\mu\in L^{p}({\varOmega }, \mu)$, and assume w.l.o.g. that $p \leq2$. Furthermore, let $u := \log\phi$. Then

$$\begin{aligned} K & := \int_{{\varOmega }}\bigl(\exp\bigl((p-1)|u|\bigr) - 1\bigr)\; d \mu' = \int_{{\varOmega }}\max \bigl( \phi^{p-1}, \phi^{1-p} \bigr)\; d\mu' - \mu'({\varOmega }) \\ & = \int_{{\varOmega }}\max \bigl( \phi^{p}, \phi^{2-p} \bigr)\; d\mu - \mu'({\varOmega }) \leq \int_{{\varOmega }}\max\bigl(\phi^{p},1\bigr)\; d\mu< \infty. \end{aligned}$$

If we let $\psi(t) := K^{-1}(\exp|t| - 1)$, then $\psi$ is a Young function, and by Proposition 3.9, $L^{\psi}(\mu') = L^{\exp|t|-1}(\mu') = T_{[\mu']} {\mathcal {M}}_{+}$. For $f \in T_{[\mu']} {\mathcal {M}}_{+}$ we have

$$\bigl||f| - \bigl(|f| + {\varepsilon }u\bigr)_{+}\bigr| \leq {\varepsilon }|u| $$

and therefore

$$\begin{aligned} \int_{{\varOmega }}\psi \biggl(\dfrac{||f| - (|f| + {\varepsilon }u)_{+}|}{{\varepsilon }(p-1)^{-1}} \biggr)\; d \mu' \leq& \int_{{\varOmega }}\psi \bigl((p-1) |u| \bigr)\; d\mu' \\ = & K^{-1} \int_{{\varOmega }}\bigl(\exp(p-1) |u| - 1\bigr)\; d\mu' = 1, \end{aligned}$$

so that $\||f| - (|f| + {\varepsilon }u)_{+}\|_{L^{\psi}(\mu')} \leq {\varepsilon }(p-1)^{-1}$ by the definition of the Orlicz norm. That is, $\lim_{{\varepsilon }\to0}(|f| + {\varepsilon }u)_{+} = |f|$ in $L^{\psi}(\mu') = L^{\exp|t|-1}(\mu') = T_{[\mu']} {\mathcal {M}}_{+}$.

In particular, if (3.109) holds, then $|f|$ lies in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$. Observe that $(f_{\pm}+ {\varepsilon }u)_{+} \leq(|f| + {\varepsilon }u)_{+}$, whence (3.109) implies that $(f_{\pm}+ {\varepsilon }u)_{+} \in T_{[\mu]} {\mathcal {M}}_{+}$, whence $f_{\pm}$ lie in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$, and so does $f = f_{+} - f_{-}$.

On the other hand, if $(f_{n})_{n \in {\mathbb {N}}}$ is a sequence in $T_{[\mu]} {\mathcal {M}}_{+}$ converging to $f$, then $\min(|f|, |f_{n}|)$ is also in $T_{[\mu]} {\mathcal {M}}_{+}$ converging to $|f|$, whence we may assume w.l.o.g. that $0 \leq f_{n} \leq f$. Let ${\varepsilon }> 0$ and choose $n$ such that $\|f - f_{n}\| < {\varepsilon }$ in $T_{[\mu']} {\mathcal {M}}_{+}$. Then by the definition of the Orlicz space we have

$$\begin{aligned} \begin{aligned} 1 & \geq \int_{{\varOmega }}\bigl(\exp\bigl({\varepsilon }^{-1} (f - f_{n})\bigr) - 1\bigr)\; d\mu' = \int_{{\varOmega }}\exp\bigl({\varepsilon }^{-1} (f - f_{n} + {\varepsilon }u)\bigr)\; d\mu- \mu'({\varOmega }) \\ & \geq \int_{{\varOmega }}\exp\bigl({\varepsilon }^{-1} (f - f_{n} + {\varepsilon }u)_{+}\bigr)\; d\mu- \mu({\varOmega }) - \mu'({\varOmega }) \quad\mbox{as $e^{x_{+}} \leq e^{x} + 1$}, \end{aligned} \end{aligned}$$

so that $\int_{{\varOmega }}(\exp({\varepsilon }^{-1} (f - f_{n} + {\varepsilon }u)_{+}) - 1)\; d\mu< \infty$, whence $(f - f_{n} + {\varepsilon }u)_{+} \in T_{[\mu]}{\mathcal {M}}_{+}$. On the other hand,

$$(f + {\varepsilon }u)_{+} \leq(f + {\varepsilon }u - f_{n})_{+} + f_{n}, $$

and since the summands on the right are contained in $T_{[\mu]} {\mathcal {M}}_{+}$, so is $(f + {\varepsilon }u)_{+}$ as asserted.

Thus, we have shown that $f$ lies in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$ iff (3.109) holds.

Let us assume from now on that $\mu' \nsim\mu$. It remains to show that in this case, $T_{[\mu]} {\mathcal {M}}_{+} {\subseteq }T_{[\mu']} {\mathcal {M}}_{+}$ is neither closed nor dense. In order to see this, let ${\varOmega }_{+} := \{{\omega }\in {\varOmega }\mid u({\omega }) \geq0\}$ and ${\varOmega }_{-} := {\varOmega }\backslash {\varOmega }_{+}$. Observe that

$$\int_{{\varOmega }}\exp(p u_{+}) \,d\mu= \mu({\varOmega }_{-}) + \int_{{\varOmega }_{+}} \phi^{p} \,d\mu< \infty, $$

as $\phi\in L^{p}({\varOmega }, \mu)$, whence $u_{+} \in T_{[\mu]} {\mathcal {M}}_{+}$.

We assert that $|u|^{a} \in T_{[\mu']} {\mathcal {M}}_{+}$, but $\notin T_{[\mu]} {\mathcal {M}}_{+}$ for all $a > 0$. Namely, pick $t > 0$ such that $ta < \min(1, p-1)$ and calculate

$$\begin{aligned} \int_{{\varOmega }}\exp\bigl(t |u|^{a}\bigr) \,d \mu' = & \int_{{\varOmega }_{+}} \phi^{ta}\; d\mu' + \int _{{\varOmega }_{-}} \phi^{-t a} \,d\mu' \\ \leq& \int_{{\varOmega }_{+}} \phi^{1 + ta}\; d\mu+ \int_{{\varOmega }_{-}} \underbrace {\phi^{1 - t a}}_{\leq1} \,d \mu\leq \int_{{\varOmega }}\max\bigl\{ \phi^{p}, 1\bigr\} \; d\mu< \infty. \end{aligned}$$

Thus, $|u|^{a} \in T_{[\mu']} {\mathcal {M}}_{+}$ for all $a > 0$. On the other hand, if $t > 0$, then

$$\begin{aligned} \int_{{\varOmega }}\exp\bigl(t |u|^{a}\bigr) \,d\mu = & \int_{{\varOmega }_{+}} \phi^{ta} \,d\mu+ \int _{{\varOmega }_{-}} \phi^{-ta} \,d\mu \\ \geq& \int_{{\varOmega }_{+}} 1 \,d\mu+ \int_{{\varOmega }_{-}} \phi^{-(1 + ta)} \,d\mu' \geq \int_{{\varOmega }}\phi^{-(1 + ta)} \,d\mu', \end{aligned}$$

and the last integral is finite if and only if $\phi^{-1} \in L^{1+ta}({\varOmega }, \mu')$ for some $t > 0$, if and only if $\mu\preceq\mu '$ and hence $\mu\sim\mu'$. Since this case was excluded, it follows that $\int_{{\varOmega }}\exp(t |u|^{a}) d\mu= \infty$ for all $t > 0$, whence $|u|^{a} \notin T_{[\mu]} {\mathcal {M}}_{+}$ as asserted.

Thus, our assertion follows if we can show that $|u|^{a}$ is contained in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$ for $0 < a < 1$, but it is not in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$ for $a = 1$.

For $a = 1$ and ${\varepsilon }\in(0,1)$, $(|u| + {\varepsilon }u)_{+} = (1 + {\varepsilon }) u_{+} + (1-{\varepsilon }) u_{-} = 2 {\varepsilon }u_{+} + (1 - {\varepsilon }) |u|$. Since $u_{+} \in T_{[\mu]} {\mathcal {M}}_{+}$ and $|u| \notin T_{[\mu]} {\mathcal {M}}_{+}$, it follows that $(|u| + {\varepsilon }u)_{+} \notin T_{[\mu]} {\mathcal {M}}_{+}$, which shows that (3.109) is violated for $f = |u| \in T_{[\mu']} {\mathcal {M}}_{+}$, whence $|u|$ does not lie in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$, so that the latter is not a dense subspace.

If $0 < a < 1$, then $(|u|^{a} + {\varepsilon }u)_{+} = u_{+}^{a} + {\varepsilon }u_{+} + (u_{-}^{a} - {\varepsilon }u_{-})_{+}$. Now $u_{+}^{a} + {\varepsilon }u_{+} \leq(a + {\varepsilon }) u_{+} + 1$, whereas $u_{-}^{a} - {\varepsilon }u_{-} \geq0$ implies that $u_{-} \leq {\varepsilon }^{1/(a-1)}$, so that $(u_{-}^{a} - {\varepsilon }u_{-})_{+} \leq C_{{\varepsilon }}$, where $C_{{\varepsilon }}:= \max\{t^{a} - {\varepsilon }t \mid0 \leq t \leq {\varepsilon }^{1/(a-1)}\}$.

Thus, $(|u|^{a} + {\varepsilon }u)_{+} \leq(a + {\varepsilon }) u_{+} + 1 + C_{{\varepsilon }}$, and since $u_{+} \in T_{[\mu]} {\mathcal {M}}_{+}$ this implies that $(|u|^{a} + {\varepsilon }u)_{+} \in T_{[\mu]} {\mathcal {M}}_{+}$, so that $|u|^{a}$ lies in the closure of $T_{[\mu]} {\mathcal {M}}_{+}$, but not in $T_{[\mu]} {\mathcal {M}}_{+}$ for $0 < a < 1$. Thus, $T_{[\mu]} {\mathcal {M}}_{+}$ is not a closed subspace of $T_{[\mu']} {\mathcal {M}}_{+}$. □

The following now is a reformulation of Propositions 3.4 and 3.5 in [216].

Proposition 3.13

A sequence $(g_{n} \mu_{0})_{n \in {\mathbb {N}}} \in {\mathcal {M}}({\varOmega }, \mu_{0})$ is $e$-convergent to $g_{0} \mu_{0} \in {\mathcal {M}}({\varOmega }, \mu _{0})$ if and only if $g_{n} \mu_{0} \sim g_{0} \mu_{0}$ for large $n$, and $u_{n} := \log g_{n} \in T_{g_{0} \mu_{0}}{\mathcal {M}}_{+}$ converges to $u_{0} := \log g_{0} \in T_{g_{0} \mu_{0}}{\mathcal {M}}_{+}$ in the Banach norm on $T_{g_{0} \mu_{0}}{\mathcal {M}}_{+}$ described above.

Proof

If $(g_{n} \mu_{0})_{n \in {\mathbb {N}}} \in {\mathcal {M}}({\varOmega }, \mu_{0})$ $e$-converges to $g_{0} \mu_{0} \in {\mathcal {M}}({\varOmega }, \mu_{0})$, then for large $n$, $(g_{n}/g_{0})$ and $(g_{0}/g_{n})$ are contained in $L^{p}({\varOmega }, g_{0} \mu_{0})$, so that $g_{n} \mu_{0} \sim g_{0} \mu_{0}$ and hence, $u_{n} := \log|g_{n}| \in T_{g_{0} \mu_{0}}{\mathcal {M}}_{+}$.

Moreover, by Proposition 3.7 $(g_{n} \mu_{0})_{n \in {\mathbb {N}}} \in {\mathcal {M}}({\varOmega }, \mu_{0})$ $e$-converges to $g_{0} \mu_{0} \in {\mathcal {M}}({\varOmega }, \mu_{0})$ if and only if for all $p \geq1$

$$\begin{aligned} 0 = & \lim_{n \to\infty} \int_{{\varOmega }}\biggl\{ \biggl\vert \biggl( \frac {g_{n}}{g_{0}} \biggr)^{p} - 1 \biggr\vert + \biggl\vert \biggl( \frac{g_{0}}{g_{n}} \biggr)^{p} - 1 \biggr\vert \biggr\} g_{0} \,d\mu_{0} \\ = & \lim_{n \to\infty} \int_{{\varOmega }}\bigl\{ \bigl\vert e^{p(u_{n} - u_{0})} - 1 \bigr\vert + \bigl\vert e^{p(u_{0} - u_{n})} - 1 \bigr\vert \bigr\} g_{0} \,d\mu_{0} \\ = & \lim_{n \to\infty} \int_{{\varOmega }}2 \sinh\bigl(p|u_{n} - u_{0}|\bigr) g_{0} \,d\mu_{0}. \end{aligned}$$

By Proposition 3.8, this is equivalent to saying that $(u_{n})_{n \in {\mathbb {N}}}$ converges to $u_{0}$ in the Orlicz space $L^{\sinh|t|}(g_{0} \mu_{0})$.

However, $L^{\sinh|t|}(g_{0} \mu_{0}) = L^{\exp|t| - 1}(g_{0} \mu_{0}) = T_{g_{0} \mu_{0}}{\mathcal {M}}_{+}$ by Proposition 3.9, since $\lim_{t \to\infty} \frac{\sinh|t|}{\exp|t|-1} = 1/2 \in(0, \infty)$. □
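Two elementary facts carry this proof: the pointwise identity $|e^{x} - 1| + |e^{-x} - 1| = 2\sinh|x|$ and the limit $\sinh|t|/(\exp|t| - 1) \to 1/2$. The following sketch checks both numerically; the function names and sample points are illustrative assumptions. The ratio has the closed form $(1 + e^{-|t|})/2$, which the check also uses.

```python
import math

def two_sided_deviation(x):
    """|e^x - 1| + |e^{-x} - 1|, the integrand from the proof for p = 1."""
    return abs(math.exp(x) - 1.0) + abs(math.exp(-x) - 1.0)

def young_ratio(t):
    """sinh|t| / (exp|t| - 1), which tends to 1/2 as |t| grows."""
    t = abs(t)
    return math.sinh(t) / math.expm1(t)

for x in (-3.0, -0.2, 0.7, 5.0):
    # The identity holds for every real x.
    assert math.isclose(two_sided_deviation(x), 2.0 * math.sinh(abs(x)), rel_tol=1e-9)

print([young_ratio(t) for t in (1.0, 5.0, 20.0)])   # values approach 0.5 from above
```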

By virtue of this proposition, we shall refer to the topology on $T_{\mu} {\mathcal {M}}_{+}$ obtained above as the topology of $e$-convergence or the $e$-topology. Note that the first statement in Proposition 3.13 implies that the equivalence classes of ∼ are open and closed in the $e$-topology.

Theorem 3.4

Let $K {\subseteq }{\mathcal {M}}_{+}$ be an equivalence class w.r.t. ∼, and let $T := T_{[\mu]} {\mathcal {M}}_{+}$ for $\mu\in K$ be the common exponential tangent space, equipped with the $e$-topology. Then for all $\mu\in K$,

$$A_{\mu}:= \log_{\mu}(K) {\subseteq }T $$

is open convex, and $\log_{\mu}: K \to A_{\mu}$ is a homeomorphism where $K$ is equipped with the $e$-topology. In particular, the identification $\log_{\mu}: K \to A_{\mu}$ allows us to canonically identify $K$ with an open convex subset of the affine space associated to $T$.

Remark 3.13

This theorem shows that the equivalence classes w.r.t. ∼ are the connected components of the $e$-topology on ${\mathcal {M}}({\varOmega }, \mu _{0})$, and since each such component is canonically identified as a subset of an affine space whose underlying vector space is equipped with a family of equivalent Banach norms, it follows that ${\mathcal {M}}({\varOmega }, \mu _{0})$ is a Banach manifold. This is the affine Banach manifold structure on ${\mathcal {M}}({\varOmega }, \mu_{0})$ described in [216], therefore we refer to it as the Pistone–Sempi structure.

The subdivision of ${\mathcal {M}}_{+}({\varOmega })$ into disjoint open connected subsets was also noted in [230].

Proof of Theorem 3.4

If $f \in A_{\mu}$, then, by definition, $(1+s) f, -s f \in\hat{B}_{\mu}({\varOmega })$ for some $s > 0$. In particular, $s f \in B_{\mu}({\varOmega })$, so that $f \in T$ and hence, $A_{\mu} {\subseteq }T$. Moreover, if $f \in A_{\mu}$ then $\lambda f \in A_{\mu}$ for $\lambda\in[0,1]$.

Next, if $g \in A_{\mu}$, then $\mu' := e^{g} \mu\in K$. Therefore, $f \in A_{\mu'}$ if and only if $K \ni e^{f} \mu' = e^{f+g} \mu$ if and only if $f+g \in A_{\mu}$, so that $A_{\mu'} = g + A_{\mu}$ for a fixed $g \in T$. From this, the convexity of $A_{\mu}$ follows.

Therefore, in order to show that $A_{\mu} {\subseteq }T$ is open, it suffices to show that $0 \in A_{\mu'}$ is an inner point for all $\mu' \in K$. For this, observe that for $f \in B_{\mu'}^{0}({\varOmega })$ we have $(1+s) f \in B_{\mu'}({\varOmega })$ and hence $e^{\pm(1+s) f} \in L^{1}({\varOmega }, \mu')$, so that $e^{f} \in L^{1+s}({\varOmega }, \mu')$ and $e^{-f} \in L^{2+s}({\varOmega }, e^{f} \mu')$, whence $e^{f} \mu' \sim\mu' \sim\mu$, so that $e^{f} \mu' \in K$ and hence, $f \in A_{\mu'}$. Thus, $0 \in B_{\mu'}^{0}({\varOmega }) {\subseteq }A_{\mu '}$, and since $B_{\mu'}^{0}({\varOmega })$ contains the unit ball of the Orlicz norm, the claim follows. □

In the terminology which we developed, we can formulate the relation of the Pistone–Sempi structure on ${\mathcal {M}}_{+}$ with $k$-integrability as follows.

Proposition 3.14

The parametrized measure model $({\mathcal {M}}_{+}({\varOmega }, \mu), {\varOmega }, i)$ is $\infty$-integrable, where $i: {\mathcal {M}}_{+}({\varOmega }, \mu ) \hookrightarrow {\mathcal {M}}({\varOmega })$ is the inclusion map and ${\mathcal {M}}_{+}({\varOmega }, \mu)$ carries the Banach manifold structure from the $e$-topology.

Proof

The measures in ${\mathcal {M}}_{+}({\varOmega }, \mu)$ are dominated by $\mu$, and for the inclusion map we have

$$\partial_{V} \log i\bigl(\exp(f) \mu\bigr)= {\partial }_{V}i $$

and the inclusion $i: T_{\mu} {\mathcal {M}}_{+}({\varOmega }) \hookrightarrow L^{k}({\varOmega }, \mu)$ is continuous for all $k \geq1$ by (3.104). Thus, $({\mathcal {M}}_{+}({\varOmega }, \mu), {\varOmega }, i)$ is $k$-integrable for all such $k$. □

Example 3.8

Let ${\varOmega }:= (0,1)$, and consider the 1-parameter family of finite measures

$${\mathbf{p}}(\xi) = p(\xi, t) \, dt := \exp \bigl( - \xi ^{2}\; (\log t)^{2} \bigr)\; dt \in {\mathcal {M}}_{+}\bigl((0,1), dt \bigr) , \quad \xi \in {\mathbb {R}}. $$

Since $\partial_{\xi}\log p(\xi, t) = - 2 \xi(\log t)^{2}$ and

$$\int_{0}^{1} \bigl|\partial_{\xi}\log p(\xi, t)\bigr|^{k}\; d p(\xi) = 2^{k} |\xi|^{k} \int _{0}^{1} (\log t)^{2k} \exp \bigl( - \xi^{2}\; (\log t)^{2} \bigr)\; dt < \infty $$

and this expression depends continuously on $\xi$ for all $k$, it follows that this parametrized measure model is $\infty$-integrable.

However, $\mathbf{p}$ is not even continuous w.r.t. the $e$-topology. Namely, in this topology $\mathbf{p}(\xi) \to{\mathbf{p}}(0)$ as $\xi \to0$ would imply that

$$\int_{0}^{1} \biggl(\frac{p(0, t)}{p(\xi, t)} - 1 \biggr) \; dp(0) \xrightarrow {\xi\to0} 0. $$

Since $p(0) = dt$, this is equivalent to

$$\int_{0}^{1} \exp\bigl(\xi^{2}\; (\log t)^{2}\bigr)\; dt \xrightarrow{\xi\to0} 1. $$

However,

$$\int_{0}^{1} \exp\bigl(\xi^{2}\; (\log t)^{2}\bigr)\; dt = \sum_{n=0}^{\infty}\frac{1}{n!} \xi^{2n} \int_{0}^{1} (\log t)^{2n}\; dt = \sum _{n=0}^{\infty}\frac {(2n)!}{n!} \xi^{2n} = \infty $$

for all $\xi\neq0$, so that this expression does not converge.
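The contrast in Example 3.8 between finite score moments and a divergent exponential integral can be seen numerically after the substitution $s = -\log t$, which turns $\int_{0}^{1} g(\log t)\,dt$ into $\int_{0}^{\infty} g(-s) e^{-s}\,ds$. The sketch below is an illustration only: the midpoint rule, the cutoffs, the step count, and the value $\xi = 0.5$ are assumptions, and the function names are hypothetical.

```python
import math

def integrate(g, upper, steps=100_000):
    """Composite midpoint rule for the integral of g over (0, upper)."""
    h = upper / steps
    return h * sum(g((i + 0.5) * h) for i in range(steps))

def score_moment(xi, k, upper=80.0):
    """Approximates 2^k |xi|^k * int_0^1 (log t)^(2k) exp(-xi^2 (log t)^2) dt
    via s = -log t; the tail beyond `upper` is negligible for the values used here."""
    g = lambda s: s ** (2 * k) * math.exp(-xi * xi * s * s - s)
    return (2.0 * abs(xi)) ** k * integrate(g, upper)

def truncated_e_integral(xi, upper):
    """int_0^upper exp(xi^2 s^2 - s) ds, a truncation of int_0^1 exp(xi^2 (log t)^2) dt."""
    g = lambda s: math.exp(xi * xi * s * s - s)
    return integrate(g, upper)

xi = 0.5
print(score_moment(xi, 3))                                     # finite k-th moment, k = 3
print([truncated_e_integral(xi, u) for u in (10.0, 20.0, 30.0)])  # grows without bound
```

The truncated integrals blow up because the exponent $\xi^{2} s^{2} - s$ is eventually increasing in $s$ for every $\xi \neq 0$, whereas the score moments are dominated by $(2|\xi|)^{k}(2k)!$.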

We end this section with the following result which illustrates how the ordering ⪯ provides a stratification of $\hat{B}_{\mu_{0}}({\varOmega })$.

Proposition 3.15

Let $\mu_{0}', \mu_{1}' \in {\mathcal {M}}_{+}$ with $f_{i} := \log_{\mu_{0}}(\mu_{i}') \in \hat{B}_{\mu_{0}}({\varOmega })$, and let $\mu_{\lambda}' := \exp(f_{0} + \lambda(f_{1} - f_{0})) \mu_{0}$ for $\lambda\in[0,1]$ be the segment joining $\mu_{0}'$ and $\mu_{1}'$. Then the following hold:

(1)
The measures $\mu'_{\lambda}$ are similar for $\lambda\in(0,1)$.
(2)
$\mu_{\lambda}' \preceq\mu_{0}'$ and $\mu_{\lambda}' \preceq\mu _{1}'$ for $\lambda\in(0,1)$.
(3)
$T_{\mu_{\lambda}'}{\mathcal {M}}_{+} = T_{\mu_{0}'} {\mathcal {M}}_{+} + T_{\mu_{1}'} {\mathcal {M}}_{+}$ for $\lambda\in(0,1)$.

Proof

Let $\delta:= f_{1} - f_{0}$ and $\phi:= \exp(\delta)$. Then for all $\lambda_{1}, \lambda_{2} \in[0,1]$, we have

$$ \mu'_{\lambda_{1}} = \phi^{\lambda_{1}-\lambda_{2}} \mu'_{\lambda_{2}}. $$

(3.110)

For $\lambda_{1} \in(0,1)$ and $\lambda_{2} \in[0,1]$, we pick $p > 1$ such that $\lambda_{2} + p (\lambda_{1} - \lambda_{2}) \in(0,1)$. Then by (3.110) we have

$$\phi^{p (\lambda_{1} - \lambda_{2})} \mu'_{\lambda_{2}} = \mu'_{\lambda_{2} + p (\lambda_{1} - \lambda_{2})} \in {\mathcal {M}}_{+}, $$

so that $\phi^{p (\lambda_{1} - \lambda_{2})} \in L^{1}({\varOmega }, \mu'_{\lambda _{2}})$ or $\phi^{\lambda_{1} - \lambda_{2}} \in L^{p}({\varOmega }, \mu'_{\lambda_{2}})$ for small $p-1 > 0$. Therefore, $\mu'_{\lambda_{1}} \preceq\mu'_{\lambda _{2}}$ for all $\lambda_{1} \in(0,1)$ and $\lambda_{2} \in[0,1]$, which implies the first and second statement.

This implies that $T_{\mu_{i}'} {\mathcal {M}}_{+} {\subseteq }T_{\mu'_{\lambda}} {\mathcal {M}}_{+} = T_{\mu'_{1/2}} {\mathcal {M}}_{+}$ for $i = 0, 1$ and all $\lambda\in(0,1)$ which shows one inclusion in the third statement.

In order to complete the proof, observe that

$$T_{\mu'_{1/2}} {\mathcal {M}}_{+} = T_{\mu'_{1/2}} {\mathcal {M}}_{+} ({\varOmega }_{+}, \mu_{0}) \oplus T_{\mu '_{1/2}} {\mathcal {M}}_{+} ({\varOmega }_{-}, \mu_{0}), $$

where ${\varOmega }_{+} := \{ {\omega }\in {\varOmega }\mid\delta({\omega }) > 0\}$ and ${\varOmega }_{-} := \{ {\omega }\in {\varOmega }\mid\delta({\omega }) \leq0\}$. If $g \in T_{\mu '_{1/2}} {\mathcal {M}}_{+} ({\varOmega }_{+}, \mu_{0})$, then for some $t \neq0$

$$\begin{aligned} \int_{{\varOmega }}\exp\bigl(|t g|\bigr)\, d\mu'_{0} \leq& \int_{{\varOmega }_{+}} \exp\biggl(|tg| + \frac{1}{2} \delta\biggr) \,d \mu'_{0} + \int_{{\varOmega }_{-}} \,d\mu'_{0} \\ = & \int_{{\varOmega }_{+}} \exp\bigl(|tg|\bigr)\,d\mu'_{1/2} + \int_{{\varOmega }_{-}}\,d\mu'_{0} < \infty, \end{aligned}$$

so that $g \in T_{\mu'_{0}}({\varOmega }, \mu_{0})$ and hence, $T_{\mu'_{1/2}} {\mathcal {M}}_{+} ({\varOmega }_{+}, \mu_{0}) {\subseteq }T_{\mu'_{0}}({\varOmega }, \mu_{0})$. Analogously, one shows that $T_{\mu'_{1/2}} {\mathcal {M}}_{+} ({\varOmega }_{-}, \mu_{0}) {\subseteq }T_{\mu'_{1}}({\varOmega }, \mu _{0})$, which completes the proof. □
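The relation (3.110) between the measures on the segment is a purely pointwise identity of densities, so it can be checked on a finite sample space. The sketch below does only that; the weights of $\mu_{0}$, the functions $f_{0}, f_{1}$, the parameters $\lambda_{1}, \lambda_{2}$, and the name `segment_measure` are illustrative assumptions. On a finite space all measures are similar, so the stratification itself is not visible in such an experiment.

```python
import math

def segment_measure(f0, f1, mu0, lam):
    """Weights of mu'_lam = exp(f0 + lam (f1 - f0)) mu0 on a finite sample space."""
    return [math.exp(a + lam * (b - a)) * w for a, b, w in zip(f0, f1, mu0)]

mu0 = [0.2, 0.5, 0.3]                     # reference measure (illustrative)
f0 = [0.1, -0.4, 1.0]                     # log-density of mu'_0
f1 = [-0.7, 0.9, 0.2]                     # log-density of mu'_1
phi = [math.exp(b - a) for a, b in zip(f0, f1)]   # phi = exp(delta)

lam1, lam2 = 0.3, 0.8
lhs = segment_measure(f0, f1, mu0, lam1)
rhs = [d ** (lam1 - lam2) * w
       for d, w in zip(phi, segment_measure(f0, f1, mu0, lam2))]
print(lhs, rhs)                           # both sides of (3.110), atom by atom
```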

Notes

1.
${\varOmega }$ will take over the role of the finite sample space $I$ in Sect. 2.1.
2.
For reasons of integrability, this structure need not define an affine space in the sense of Sect. 2.8.1. We only have the structure of an affine manifold, in the sense of possessing affine coordinate changes. This issue will be clarified in Sect. 3.2 below.
3.
Again, we are employing a fundamental mathematical principle here: Instead of considering objects in isolation, we rather focus on the transformations between them. Thus, instead of an individual basis, we consider the transformation that generates it from some (arbitrarily chosen) standard basis. This automatically gives a very powerful structure, that of a group (of transformations).
4.
More precisely, ${\mathcal {M}}_{+}({\varOmega }, \mu_{0})$ is not open unless ${\varOmega }$ is the disjoint union of finitely many $\mu_{0}$-atoms, where $A {\subseteq }{\varOmega }$ is a $\mu _{0}$-atom if for each $B {\subseteq }A$ either $B$ or $A\backslash B$ is a $\mu_{0}$-null set.
5.
Observe that the factor $1/4$ in (3.89) by which the canonical form differs from the Hilbert inner product is responsible for having to use the sphere of radius 2 rather than the unit sphere in Proposition 2.1.
6.
In [216] the Young function $\cosh t - 1$ was used instead of $\exp |t| - 1$. However, these produce equivalent Orlicz spaces by Proposition 3.9.

References

Aliprantis, C., Border, K.: Infinite Dimensional Analysis. Springer, Berlin (2007)
Amari, S.: Differential-Geometric Methods in Statistics. Lecture Notes in Statistics, vol. 28. Springer, Heidelberg (1985)
Amari, S.: Differential geometrical theory of statistics. In: Differential Geometry in Statistical Inference, Institute of Mathematical Statistics, California. Lecture Notes–Monograph Series, vol. 10 (1987)
Amari, S., Nagaoka, H.: Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. Am. Math. Soc./Oxford University Press, Providence/London (2000)
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Information geometry and sufficient statistics. Probab. Theory Relat. Fields 162, 327–364 (2015)
Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L.: Parametrized measure models. Bernoulli (2015). To appear, arXiv:1510.07305
Friedrich, Th.: Die Fisher-Information und symplektische Strukturen. Math. Nachr. 152, 273–296 (1991)
Fukumizu, K.: Exponential manifold by reproducing kernel Hilbert spaces. In: Gibilisco, P., Riccomagno, E., Rogantin, M.-P., Wynn, H. (eds.) Algebraic and Geometric Methods in Statistics, pp. 291–306. Cambridge University Press, Cambridge (2009)
Gibilisco, P., Pistone, G.: Connections on non-parametric statistical models by Orlicz space geometry. Infin. Dimens. Anal. Quantum Probab. Relat. Top. 1(2), 325–347 (1998)
Krasnosel’skii, M.A., Rutickii, Ya.B.: Convex functions and Orlicz spaces. Fizmatgiz, Moscow (1958) (In Russian); English translation: P. Noordhoff Ltd., Groningen (1961)
Lauritzen, S.: Statistical manifolds. In: Differential Geometry in Statistical Inference, Institute of Mathematical Statistics, California. Lecture Notes–Monograph Series, vol. 10 (1987)
Lovrić, M., Min-Oo, M., Ruh, E.: Multivariate normal distributions parametrized as a Riemannian symmetric space. J. Multivar. Anal. 74, 36–48 (2000)
Moser, J.: On the volume elements on a manifold. Trans. Am. Math. Soc. 120, 286–294 (1965)
Murray, M., Rice, J.: Differential Geometry and Statistics. Chapman & Hall, London (1993)
Neveu, J.: Bases Mathématiques du Calcul de Probabilités, deuxième édition. Masson, Paris (1970)
Newton, N.: An infinite-dimensional statistical manifold modelled on Hilbert space. J. Funct. Anal. 263, 1661–1681 (2012)
Pistone, G., Sempi, C.: An infinite-dimensional structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23(5), 1543–1561 (1995)
Rao, C.R.: Information and the accuracy attainable in the estimation of statistical parameters. Bull. Calcutta Math. Soc. 37, 81–89 (1945)
Santacroce, M., Siri, P., Trivellato, B.: New results on mixture and exponential models by Orlicz spaces. Bernoulli 22(3), 1431–1447 (2016)

Author information

Authors and Affiliations

Information Theory of Cognitive Systems, MPI for Mathematics in the Sciences, Leipzig, Germany
Nihat Ay
Geometric Methods and Complex Systems, MPI for Mathematics in the Sciences, Leipzig, Germany
Jürgen Jost
Mathematical Institute of ASCR, Czech Academy of Sciences, Praha 1, Czech Republic
Hông Vân Lê
Department of Mathematics, TU Dortmund University, Dortmund, Germany
Lorenz Schwachhöfer
Santa Fe Institute, Santa Fe, NM, USA
Nihat Ay & Jürgen Jost

About this chapter

Cite this chapter

Ay, N., Jost, J., Lê, H.V., Schwachhöfer, L. (2017). Parametrized Measure Models. In: Information Geometry. Ergebnisse der Mathematik und ihrer Grenzgebiete. 3. Folge / A Series of Modern Surveys in Mathematics, vol 64. Springer, Cham. https://doi.org/10.1007/978-3-319-56478-4_3

DOI: https://doi.org/10.1007/978-3-319-56478-4_3
Published: 26 August 2017
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-56477-7
Online ISBN: 978-3-319-56478-4
eBook Packages: Mathematics and Statistics, Mathematics and Statistics (R0)
