On φ-Families of Probability Distributions

Vigelis, Rui F.; Cavalcante, Charles C.

doi:10.1007/s10959-011-0400-5

On φ-Families of Probability Distributions

Published: 21 December 2011

Volume 26, pages 870–884, (2013)
Cite this article

Download PDF

Access provided by Autonomous University of Puebla

Journal of Theoretical Probability Aims and scope Submit manuscript

On φ-Families of Probability Distributions

Download PDF

Rui F. Vigelis¹ &
Charles C. Cavalcante²

341 Accesses
32 Citations
Explore all metrics

Abstract

We generalize the exponential family of probability distributions. In our approach, the exponential function is replaced by a φ-function, resulting in a φ-family of probability distributions. We show how φ-families are constructed. In a φ-family, the analogue of the cumulant-generating function is a normalizing function. We define the φ-divergence as the Bregman divergence associated to the normalizing function, providing a generalization of the Kullback–Leibler divergence. A formula for the φ-divergence where the φ-function is the Kaniadakis κ-exponential function is derived.

On Normalization Functions and $$\varphi $$ -Families of Probability Distributions

On the Maximum $f$-Divergence of Probability Distributions Given the Value of Their Coupling

Article 01 October 2021

An extended-G geometric family

Article Open access 16 February 2016

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Let (T,Σ,μ) be a σ-finite, non-atomic measure space. We denote by $\mathcal{P}_{\mu}=\mathcal{P}(T,\Sigma,\mu)$ the family of all probability measures on T that are equivalent to the measure μ. The probability family $\mathcal{P}_{\mu}$ can be represented as (we adopt the same symbol $\mathcal{P}_{\mu}$ for this representation)

where L ⁰ is the linear space of all real-valued, measurable functions on T, with equality μ-a.e., and $\mathbb{E}[\cdot]$ denotes the expectation with respect to the measure μ.

The family $\mathcal{P}_{\mu}$ can be equipped with a structure of C ^∞-Banach manifold, using the Orlicz space $L^{\Phi _{1}}(p)=L^{\Phi_{1}}(T,\Sigma,p\cdot\mu)$ associated to the Orlicz function Φ₁(u)=exp(u)−1, for u≥0. With this structure, $\mathcal{P}_{\mu}$ is called the exponential statistical manifold, whose construction was proposed in [15] and developed in [3, 5, 14]. Each connected component of the exponential statistical manifold gives rise to an exponential family of probability distributions $\mathcal{E}_{p}$ (for each $p\in\mathcal{P}_{\mu}$). Each element of $\mathcal{E}_{p}$ can be expressed as

(1)

for a subset $\mathcal{B}_{p}$ of the Orlicz space $L^{\Phi_{1}}(p)$. K _p is the cumulant-generating functional $K_{p}(u)=\log\mathbb {E}_{p}[e^{u}]$, where $\mathbb{E}_{p}[\cdot]$ is the expectation with respect to p⋅μ. If c is a measurable function such that p=e ^c, then (1) can be rewritten as

(2)

where 1 _A is the indicator function of a subset A⊆T. A generalization of expression (1) was given in [13], where the exponential function is replaced by a κ-exponential function. In our generalization, we make use of expression (2).

In the φ-family of probability distributions $\mathcal {F}_{c}^{\varphi}$, which we propose, the exponential function is replaced by the so called φ-function $\varphi\colon T\times\overline{\mathbb {R}}\rightarrow[0,\infty]$. The function φ(t,⋅) has a “shape” which is similar to that of an exponential function, with an arbitrary rate of increasing. For example, we found that the κ-exponential function satisfies the definition of φ-functions. As in the exponential family, the φ-families are the connected component of $\mathcal{P}_{\mu}$, which is endowed with a structure of C ^∞-Banach manifold, using φ in the place of an exponential function. Let c be any measurable function such that φ(t,c(t)) belongs to $\mathcal{P}_{\mu}$. The elements of the φ-family of probability distributions $\mathcal{F}_{c}^{\varphi}$ are given by

(3)

for a subset $\mathcal{B}_{c}^{\varphi}$ of a Musielak–Orlicz space $L_{c}^{\varphi}$. The normalizing function $\psi\colon\mathcal {B}_{c}^{\varphi}\rightarrow[0,\infty)$ and the measurable function u ₀:T→[0,∞) in (3) replaces K _p and 1 _T in (2), receptively. The function u ₀ is not arbitrary. In the text, we will show how u ₀ can be chosen.

We define the φ-divergence as the Bregman divergence associated to the normalizing function ψ, providing a generalization of the Kullback–Leibler divergence. Then geometrical aspects related to the φ-family can be developed, since the Fisher information (on which the Information Geometry [1, 9] is based) is derived from the divergence. A formula for the φ-divergence where the φ-function is the Kaniadakis’ κ-exponential function [6, 11] is derived, which we called the κ-divergence.

We expect that an extension of our work will provide advances in other areas, like in Information Geometry or in the non-parametric, non-commutative setting [4, 12]. The rest of this paper is organized as follows. Section 2 deals with the topics of Musielak–Orlicz spaces we will use in the construction of the φ-family of probability distributions. In Sect. 3, the exponential statistical manifold is reviewed. The construction of the φ-family of probability distributions is given in Sect. 4. Finally, the φ-divergence is derived in Sect. 5.

2 Musielak–Orlicz Spaces

In this section we provide a brief introduction to Musielak–Orlicz (function) spaces, which are used in the construction of the exponential and φ-families. A more detailed exposition about these spaces can be found in [7, 10, 16].

We say that Φ:T×[0,∞]→[0,∞] is a Musielak–Orlicz function when, for μ-a.e. t∈T,

(i)
Φ(t,⋅) is convex and lower semi-continuous,
(ii)
Φ(t,0)=lim_u↓0Φ(t,u)=0 and Φ(t,∞)=∞,
(iii)
Φ(⋅,u) is measurable for all u≥0.

Items (i)–(ii) guarantee that Φ(t,⋅) is not equal to 0 or ∞ on the interval (0,∞). A Musielak–Orlicz function Φ is said to be an Orlicz function if the functions Φ(t,⋅) are identical for μ-a.e. t∈T.

Define the functional I _Φ(u)=∫_TΦ(t,|u(t)|) dμ, for any u∈L ⁰. The Musielak–Orlicz space, Musielak–Orlicz class, and Morse–Transue space, are given by

and

respectively. If the underlying measure space (T,Σ,μ) have to be specified, we write L ^Φ(T,Σ,μ), $\tilde {L}^{\Phi}(T,\Sigma,\mu)$ and E ^Φ(T,Σ,μ) in the place of L ^Φ, $\tilde{L}^{\Phi}$ and E ^Φ, respectively. Clearly, $E^{\Phi}\subseteq\tilde {L}^{\Phi}\subseteq L^{\Phi}$. The Musielak–Orlicz space L ^Φ can be interpreted as the smallest vector subspace of L ⁰ that contains $\tilde{L}^{\Phi}$, and E ^Φ is the largest vector subspace of L ⁰ that is contained in $\tilde{L}^{\Phi}$.

The Musielak–Orlicz space L ^Φ is a Banach space when it is endowed with the Luxemburg norm

or the Orlicz norm

where $\Phi^{*}(t,v)=\sup\nolimits_{u\geq0}(uv-\Phi(t,u))$ is the Fenchel conjugate of Φ(t,⋅). These norms are equivalent and the inequalities ∥u∥_Φ≤∥u∥_Φ,0≤2∥u∥_Φ hold for all u∈L ^Φ.

If we can find a non-negative function $f\in\tilde{L}^{\Phi}$ and a constant K>0 such that

then we say that Φ satisfies the Δ₂ -condition, or belong to the Δ₂ -class (denoted by Φ∈Δ₂). When the Musielak–Orlicz function Φ satisfies the Δ₂-condition, E ^Φ coincides with L ^Φ. On the other hand, if Φ is finite-valued and does not satisfy the Δ₂-condition, then the Musielak–Orlicz class $\tilde{L}^{\Phi}$ is not open and its interior coincides with

or, equivalently, $B_{0}(E^{\Phi},1)\varsubsetneq\tilde{L}^{\Phi }\varsubsetneq\overline{B}_{0}(E^{\Phi},1)$.

3 The Exponential Statistical Manifold

This section starts with the definition of a C ^k-Banach manifold [8]. A C ^k -Banach manifold is a set M and a collection of pairs (U _α,x _α) (α belonging to some indexing set), composed by open subsets U _α of some Banach space X _α, and injective mappings x _α:U _α→M, satisfying the following conditions:

(bm1)
the sets x _α(U _α) cover M, i.e., ⋃_α x _α(U _α)=M;
(bm2)
for any pair of indices α,β such that x _α(U _α)∩x _β(U _β)=W≠∅, the sets $\boldsymbol{x}_{\alpha}^{-1}(W)$ and $\boldsymbol{x}_{\beta}^{-1}(W)$ are open in X _α and X _β, respectively; and
(bm3)
the transition map $\boldsymbol{x}_{\beta}^{-1}\circ \boldsymbol{x}_{\alpha}\colon\boldsymbol{x}_{\alpha}^{-1}(W)\rightarrow \boldsymbol{x}_{\beta}^{-1}(W)$ is a C ^k-isomorphism.

The pair (U _α,x _α) with p∈x _α(U _α) is called a parametrization (or system of coordinates) of M at p; and x _α(U _α) is said to be a coordinate neighborhood at p.

The set M can be endowed with a topology in a unique way such that each x _α(U _α) is open, and the x _α’s are topological isomorphisms. We note that if k≥1 and two parametrizations (U _α,x _α) and (U _β,x _β) are such that x _α(U _α) and x _β(U _β) have a non-empty intersection, then from the derivative of $\boldsymbol {x}_{\beta}^{-1}\circ\boldsymbol{x}_{\alpha}$ we see that X _α and X _β are isomorphic.

Two collections {(U _α,x _α)} and {(V _β,x _β)} satisfying (bm1)–(bm3) are said to be C ^k -compatible if their union also satisfies (bm1)–(bm3). It can be verified that the relation of C ^k-compatibility is an equivalence relation. An equivalence class of C ^k-compatible collections {(U _α,x _α)} on M is said to define a C ^k -differentiable structure on X.

Now we review the construction of the exponential statistical manifold. We consider the Musielak–Orlicz space $L^{\Phi_{1}}(p)=L^{\Phi _{1}}(T,\Sigma,p\cdot\mu)$, where the Orlicz function Φ₁:[0,∞)→[0,∞) is given by Φ₁(u)=e ^u−1, and p is a probability density in $\mathcal{P}_{\mu}$. The space $L^{\Phi_{1}}(p)$ corresponds to the set of all functions u∈L ⁰ whose moment-generating function $\widehat{u}_{p}(\lambda)=\mathbb{E}_{p}[e^{\lambda u}]$ is finite in a neighborhood of 0.

For every function u∈L ⁰ we define the moment-generating functional

and the cumulant-generating functional

Clearly, these functionals are not expected to be finite for every u∈L ⁰. Denote by $\mathcal{K}_{p}$ the interior of the set of all functions $u\in L^{\Phi_{1}}(p)$ whose moment-generating functional M _p(u) is finite. Equivalently, a function $u\in L^{\Phi_{1}}(p)$ belongs to $\mathcal{K}_{p}$ if and only if M _p(λu) is finite for every λ in some neighborhood of [0,1]. The closed subspace of p-centered random variables

is taken to be the coordinate Banach space. The exponential parametrization $\boldsymbol{e}_{p}\colon\mathcal{B}_{p}\rightarrow \mathcal{E}_{p}$ maps $\mathcal{B}_{p}=B_{p}\cap\mathcal{K}_{p}$ to the exponential family $\mathcal{E}_{p}=\boldsymbol{e}_{p}(\mathcal{B}_{p})\subseteq \mathcal{P}_{\mu}$, according to

e _p is a bijection from $\mathcal{B}_{p}$ to its image $\mathcal{E}_{p}=\boldsymbol{e}_{p}(\mathcal{B}_{p})$, whose inverse $\boldsymbol{e}_{p}^{-1}\colon\mathcal{E}_{p}\rightarrow\mathcal{B}_{p}$ can be expressed as

Since K _p(u)<∞ for every $u\in\mathcal{K}_{p}$, we find that e _p can be extended to $\mathcal{K}_{p}$. The restriction of e _p to $\mathcal{B}_{p}$ guarantees that e _p is bijective.

Given two probability densities p and q in the same connected component of $\mathcal{P}_{\mu}$, the exponential probability families $\mathcal{E}_{p}$ and $\mathcal{E}_{q}$ coincide, and the exponential spaces $L^{\Phi_{1}}(p)$ and $L^{\Phi_{1}}(q)$ are isomorphic (see [14, Proposition 5]). Hence, $\mathcal{B}_{p}=\boldsymbol {e}_{p}^{-1}(\mathcal{E}_{p}\cap\mathcal{E}_{q})$ and $\mathcal{B}_{q}=\boldsymbol{e}_{q}^{-1}(\mathcal{E}_{p}\cap\mathcal {E}_{q})$. The transition map $\boldsymbol{e}_{q}^{-1}\circ\boldsymbol {e}_{p}:\mathcal{B}_{p}\rightarrow\mathcal{B}_{q}$, which can be written as

is a C ^∞-function. Clearly, $\bigcup_{p\in\mathcal{P}_{\mu }}e_{p}(\mathcal{B}_{p})=\mathcal{P}_{\mu}$. Thus the collection $\{(\mathcal{B}_{p},\boldsymbol{e}_{p})\}_{p\in \mathcal{P}_{\mu}}$ satisfies (bm1)–(bm2). Hence $\mathcal{P}_{\mu}$ is a C ^∞-Banach manifold, which is called the exponential statistical manifold.

4 Construction of the φ-Family of Probability Distributions

The generalization of the exponential family is based on the replacement of the exponential function by a φ-function $\varphi \colon T\times\overline{\mathbb{R}}\rightarrow[0,\infty]$ that satisfies the following properties, for μ-a.e. t∈T:

(a1)
φ(t,⋅) is convex and injective,
(a2)
φ(t,−∞)=0 and φ(t,∞)=∞,
(a3)
φ(⋅,u) is measurable for all u∈ℝ.

In addition, we assume a positive, measurable function u ₀:T→(0,∞) can be found such that, for every measurable function c:T→ℝ for which φ(t,c(t)) is in $\mathcal{P}_{\mu}$, we have

(a4)
φ(t,c(t)+λu ₀(t)) is μ-integrable for all λ>0.

The choice for φ(t,⋅) injective with image [0,∞] is justified by the fact that a parametrization of $\mathcal{P}_{\mu}$ maps real-valued functions to positive functions. Moreover, by (a1), φ(t,⋅) is continuous and strictly increasing. From (a3), the function φ(t,u(t)) is measurable if and only if u:T→ℝ is measurable. Replacing φ(t,u) by φ(t,u ₀(t)u), a “new” function u ₀=1 is obtained, satisfying (a4).

Example 1

The Kaniadakis’ κ-exponential exp_κ:ℝ→(0,∞) for κ∈[−1,1] is defined as

The inverse of exp_κ is the Kaniadakis’ κ-logarithm

Some algebraic properties of the ordinary exponential and logarithm functions are preserved:

For a measurable function κ:T→[−1,1], we define the variable κ-exponential exp_κ:T×ℝ→(0,∞) as

whose inverse is called the variable κ-logarithm:

Assuming that κ ₋=ess inf |κ(t)|>0, the variable κ-exponential exp_κ satisfies (a1)–(a4). The verification of (a1)–(a3) is easy. Moreover, we notice that exp_κ(t,⋅) is strictly convex. We can write for α≥1

By the convexity of exp_κ(t,⋅), we obtain for any λ∈(0,1)

Thus any positive function u ₀ such that $\mathbb{E}[\exp_{\kappa }(u_{0})]<\infty$ satisfies (a4).

Let c:T→ℝ be a measurable function such that φ(t,c(t)) is μ-integrable. We define the Musielak–Orlicz function

and denote L ^Φ, $\tilde{L}^{\Phi}$ and E ^Φ by $L_{c}^{\varphi}$, $\tilde{L}_{c}^{\varphi}$ and $E_{c}^{\varphi}$, respectively. Since φ(t,c(t)) is μ-integrable, the Musielak–Orlicz space $L_{c}^{\varphi}$ corresponds to the set of all functions u∈L ⁰ for which φ(t,c(t)+λu(t)) is μ-integrable for every λ contained in some neighborhood of 0.

Let $\mathcal{K}_{c}^{\varphi}$ be the set of all functions $u\in L_{c}^{\varphi}$ such that φ(t,c(t)+λu(t)) is μ-integrable for every λ in a neighborhood of [0,1]. Denote by φ the operator acting on the set of real-valued functions u:T→ℝ given by φ(u)(t)=φ(t,u(t)). For each probability density $p\in\mathcal{P}_{\mu}$, we can take a measurable function c:T→ℝ such that p=φ(c). The first import result in the construction of the φ-family is given below.

Lemma 2

The set $\mathcal{K}_{c}^{\varphi}$ is open in $L_{c}^{\varphi}$.

Proof

Take any $u\in\mathcal{K}_{c}^{\varphi}$. We can find ε∈(0,1) such that $\mathbb{E}[\boldsymbol{\varphi}(c+\alpha u)]<\infty$ for every α∈[−ε,1+ε]. Let $\delta=[ \frac {2}{\varepsilon}(1+\varepsilon)(1+ \frac{\varepsilon}{2})]^{-1}$. For any function $v\in L_{c}^{\varphi}$ in the open ball $B_{\delta}=\{w\in L_{c}^{\varphi}:\Vert w\Vert_{\Phi}<\delta\}$, we have $I_{\Phi}( \frac{v}{\delta})\leq1$. Thus $\mathbb {E}[\boldsymbol{\varphi}(c+ \frac{1}{\delta}|v|)]\leq2$. Taking any $\alpha\in(0,1+ \frac{\varepsilon}{2})$, we denote $\lambda=\frac{\alpha}{1+\varepsilon}$. In virtue of

it follows that

(4)

For $\alpha\in(- \frac{\varepsilon}{2},0)$, we can write

(5)

By (4) and (5), we get $\mathbb {E}[\boldsymbol{\varphi}(c+\alpha(u+v))]<\infty$, for any $\alpha\in(- \frac{\varepsilon}{2},1+ \frac{\varepsilon}{2})$. Hence the ball of radius δ centered at u is contained in $\mathcal{K}_{c}^{\varphi}$. Therefore, the set $\mathcal {K}_{c}^{\varphi}$ is open. □

Clearly, for $u\in\mathcal{K}_{c}^{\varphi}$ the function φ(c+u) is not necessarily in $\mathcal{P}_{\mu}$. The normalizing function $\psi\colon\mathcal{K}_{c}^{\varphi}\rightarrow\mathbb{R}$ is introduced in order to make the density

contained in $\mathcal{P}_{\mu}$, for any $u\in\mathcal{K}_{c}^{\varphi}$. We have to find the functions for which the normalizing function exists. For a function $u\in L_{c}^{\varphi}$, suppose that φ(c+u−αu ₀) is μ-integrable for some α∈ℝ. Then u is in the closure of the set $\mathcal{K}_{c}^{\varphi}$. Indeed, for any λ∈(0,1),

Since the function u ₀ satisfies (a4), we see that φ(c+λu) is μ-integrable. Hence the maximal, open domain of ψ is contained in $\mathcal{K}_{c}^{\varphi}$.

Proposition 3

If the function u is in $\mathcal{K}_{c}^{\varphi}$, then there exists a unique ψ(u)∈ℝ for which φ(c+u−ψ(u)u ₀) is a probability density in $\mathcal{P}_{\mu}$.

Proof

We will show that if the function u is in $\mathcal{K}_{c}^{\varphi}$, then φ(c+u+αu ₀) is μ-integrable for every α∈ℝ. Since u is in $\mathcal {K}_{c}^{\varphi}$, we can find ε>0 such that φ(c+(1+ε)u) is μ-integrable. Taking $\lambda= \frac{1}{1+\varepsilon}$, we can write

Thus φ(c+u+αu ₀) is μ-integrable. By the Dominated Convergence Theorem, the map $\alpha\mapsto J(\alpha )=\mathbb{E}[\boldsymbol{\varphi}(c+u+\alpha u_{0})]$ is continuous, tends to 0 as α→−∞, and goes to infinity as α→∞. Since φ(t,⋅) is strictly increasing, it follows that J(α) is also strictly increasing. Therefore, there exists a unique ψ(u)∈ℝ for which φ(c+u−ψ(u)u ₀) is a probability density in $\mathcal{P}_{\mu}$. □

The function $\psi\colon\mathcal{K}_{c}^{\varphi}\rightarrow\mathbb{R}$ can take both positive and negative values. However, if the domain of ψ is restricted to a subspace of $L_{c}^{\varphi}$, its image will be contained in [0,∞). We denote by $\boldsymbol {\varphi}_{+}'$ the operator acting on the set of real-valued functions u:T→ℝ given by $\boldsymbol{\varphi}_{+}'(u)(t)=\varphi_{+}'(t,u(t))$, where $\varphi_{+}'(t,\cdot)$ is the right-derivative of φ(t,⋅). Define the closed subspace

and let $\mathcal{B}_{c}^{\varphi}=B_{c}^{\varphi}\cap\mathcal {K}_{c}^{\varphi}$. By the convexity of φ(t,⋅), we have

Hence, for any $u\in\mathcal{B}_{c}^{\varphi}$, we get

Thus it follows that ψ(u)≥0 in order to find that φ(c+u−ψ(u)u ₀) is in $\mathcal{P}_{\mu}$.

For each measurable function c:T→ℝ such that p=φ(c) is the probability density in $\mathcal {P}_{\mu}$, we associate a parametrization $\boldsymbol{\varphi}_{c}\colon\mathcal {B}_{c}^{\varphi}\rightarrow\mathcal{F}_{c}^{\varphi}$ that maps any function u in $\mathcal{B}_{c}^{\varphi}$ to a probability density in $\mathcal{F}_{c}^{\varphi}=\varphi_{c}(\mathcal {B}_{c}^{\varphi})\subseteq\mathcal{P}_{\mu}$ according to

Clearly, we have $\mathcal{P}_{\mu}=\bigcup\{\mathcal{F}_{c}^{\varphi }:\boldsymbol{\varphi}(c)\in\mathcal{P}_{\mu}\}$. Moreover, the map φ _c is a bijection from $\mathcal{B}_{c}^{\varphi}$ to $\mathcal{F}_{c}^{\varphi}$. If the functions $u,v\in\mathcal{B}_{c}^{\varphi}$ are such that φ _c(u)=φ _c(v), then the difference u−v=(ψ(u)−ψ(v))u ₀ is in $B_{c}^{\varphi}$. Consequently, ψ(u)=ψ(v) and then u=v.

Suppose that the measurable functions c ₁,c ₂:T→ℝ are such that p ₁=φ(c ₁) and p ₂=φ(c ₂) belong to $\mathcal{P}_{\mu}$. The parametrizations $\boldsymbol{\varphi }_{c_{1}}\colon\mathcal{B}_{c_{1}}^{\varphi}\rightarrow\mathcal {F}_{c_{1}}^{\varphi}$ and $\boldsymbol{\varphi}_{c_{2}}\colon\mathcal{B}_{c_{2}}^{\varphi }\rightarrow\mathcal{F}_{c_{2}}^{\varphi}$ related to these functions have transition map

Let $\psi_{1}\colon\mathcal{B}_{c_{1}}^{\varphi}\rightarrow[0,\infty)$ and $\psi_{2}\colon\mathcal{B}_{c_{2}}^{\varphi}\rightarrow[0,\infty)$ be the normalizing functions associated to c ₁ and c ₂, respectively. Assume that the functions $u\in\mathcal {B}_{c_{1}}^{\varphi}$ and $v\in\mathcal{B}_{c_{2}}^{\varphi}$ are such that $\boldsymbol {\varphi}_{c_{1}}(u)=\boldsymbol{\varphi}_{c_{2}}(v)\in\mathcal {F}_{c_{1}}^{\varphi}\cap\mathcal{F}_{c_{2}}^{\varphi}$. Then we can write

Since the function v is in $B_{c_{2}}^{\varphi}$, if we multiply this equation by $\boldsymbol{\varphi}_{+}'(c_{2})$ and integrate with respect to the measure μ, we obtain

Thus the transition map $\boldsymbol{\varphi}_{c_{2}}^{-1}\circ \boldsymbol{\varphi}_{c_{1}}$ can be expressed as

(6)

for every $w\in\boldsymbol{\varphi}_{c_{1}}^{-1}(\mathcal {F}_{c_{1}}^{\varphi}\cap\mathcal{F}_{c_{2}}^{\varphi})$. Clearly, this transition map will be of class C ^∞ if we show that the functions w and c ₁−c ₂ are in $L_{c_{2}}^{\varphi}$, and the spaces $L_{c_{1}}^{\varphi}$ and $L_{c_{2}}^{\varphi}$ have equivalent norms. It is not hard to verify that if two Musielak–Orlicz spaces are equal as sets, then their norms are equivalent (see [10, Theorem 8.5]). We make use of the following:

Proposition 4

Assume that the measurable functions $\widetilde {c},c\colon T\rightarrow\mathbb{R}$ satisfy $\mathbb{E}[\varphi(t,\widetilde{c}(t))]<\infty$ and $\mathbb {E}[\varphi(t,c(t))]<\infty$. Then $L_{\widetilde{c}}^{\varphi}\subseteq L_{c}^{\varphi}$ if and only if $\widetilde{c}-c\in L_{c}^{\varphi}$.

Proof

Suppose that $\widetilde{c}-c$ is not in $L_{c}^{\varphi}$. Let $A=\{t\in T:\widetilde{c}(t)<c(t)\}$. For λ∈[0,1], we have

Since $\widetilde{c}-c\notin L_{c}^{\varphi}$, for any λ>0, there holds $\mathbb{E}[\boldsymbol{\varphi}(c-\lambda(\widetilde {c}-c))]=\infty$. From

we see that $(c-\widetilde{c})\boldsymbol{1}_{A}$ does not belong to $L_{c}^{\varphi}$. Clearly, $(c-\widetilde{c})\boldsymbol{1}_{A}\in L_{\widetilde{c}}^{\varphi}$. Consequently, $L_{\widetilde{c}}^{\varphi}$ is not contained in $L_{c}^{\varphi}$.

Conversely, assume $\widetilde{c}-c\in L_{c}^{\varphi}$. Let w be any function in $L_{\widetilde{c}}^{\varphi}$. We can find ε>0 such that $\mathbb{E}[\boldsymbol{\varphi}(\widetilde{c}+\lambda w)]<\infty$, for every λ∈(−ε,ε). Consider the convex function

This function is finite for λ=0 and α in the interval (−η,1], for some η>0. Moreover, g(1,λ) is finite for every λ∈(−ε,ε). By the convexity of g, we see that g is finite in the convex hull of the set 1×(−ε,ε)∪(−η,1]×0. We find that g(0,λ) is finite for every λ in some neighborhood of 0. Consequently, $w\in L_{c}^{\varphi}$. Since $w\in L_{c}^{\varphi}$ is arbitrary, the inclusion $L_{\widetilde{c}}^{\varphi}\subseteq L_{c}^{\varphi}$ follows. □

Lemma 5

If the function u is in $\mathcal{K}_{c}^{\varphi}$ and we denote $\widetilde{c}=c+u-\psi(u)u_{0}$, then the spaces $L_{c}^{\varphi}$ and $L_{\widetilde{c}}^{\varphi}$ are equal as sets.

Proof

The inclusion $L_{\widetilde{c}}^{\varphi}\subseteq L_{c}^{\varphi}$ follows from Proposition 4. Since $u\in\mathcal {K}_{c}^{\varphi}$, we have

for every λ in a neighborhood of 0. Thus $c-\widetilde {c}=-u+\psi(u)u_{0}$ belongs to $L_{\widetilde{c}}^{\varphi}$. From Proposition 4, we obtain $L_{\widetilde{c}}^{\varphi}\subseteq L_{c}^{\varphi}$. □

By Lemma 5, if we denote $c_{1}+u-\psi _{1}(u)u_{0}=\widetilde{c}=c_{2}+v-\psi_{2}(v)u_{0}$, we find that the spaces $L_{c_{1}}^{\varphi}$, $L_{\widetilde {c}}^{\varphi}$ and $L_{c_{2}}^{\varphi}$ are equal as sets. In (6), the function w is in $L_{c_{2}}^{\varphi}$ and consequently c ₁−c ₂ is in $L_{c_{2}}^{\varphi}$. Therefore, the transition map $\boldsymbol {\varphi}_{c_{2}}^{-1}\circ\boldsymbol{\varphi}_{c_{1}}$ is of class C ^∞.

Since $\boldsymbol{\varphi}_{c_{2}}^{-1}\circ\boldsymbol{\varphi}_{c_{1}}$ is of class C ^∞, the set $\boldsymbol{\varphi }_{c_{1}}^{-1}(\mathcal{F}_{c_{1}}^{\varphi}\cap\mathcal {F}_{c_{2}}^{\varphi})$ is open $B_{c_{1}}^{\varphi}$. The φ-families $\mathcal {F}_{c}^{\varphi}$ are maximal in the sense that if two φ-families $\mathcal {F}_{c_{1}}^{\varphi}$ and $\mathcal{F}_{c_{2}}^{\varphi}$ have non-empty intersection, then they coincide.

Lemma 6

For a function u in $\mathcal{B}_{c}^{\varphi}$, denote $\widetilde{c}=c+u-\psi(u)u_{0}$. Then $\mathcal{F}_{c}^{\varphi }=\mathcal{F}_{\widetilde{c}}^{\varphi}$.

Proof

Let v be a function in $\mathcal{B}_{c}^{\varphi}$. Then there exists ε>0 such that, for every λ∈(−ε,1+ε), the function φ(c+λv+(1−λ)u) is μ-integrable. Consequently, $\varphi(\widetilde{c}+\lambda(v-u))$ is μ-integrable for all λ∈(−ε,1+ε). Thus the difference v−u is in $\mathcal{K}_{\widetilde{c}}^{\varphi}$ and

(7)

belongs to $\mathcal{B}_{\widetilde{c}}^{\varphi}$. Let $\widetilde{\psi }\colon\mathcal{B}_{\widetilde{c}}^{\varphi}\rightarrow[0,\infty)$ be the normalizing function associated to $\widetilde{c}$. Then the probability density $\boldsymbol{\varphi}(\widetilde{c}+w-\widetilde {\psi}(w)u_{0})$ is in $\mathcal{F}_{\widetilde{c}}^{\varphi}$. This probability density can be expressed as φ(c+v−ku ₀) for a constant k. According to Proposition 3, there exists a unique ψ(u)∈ℝ such that the probability density φ(c+v−ψ(v)u ₀) is in $\mathcal{F}_{c}^{\varphi}$. Therefore, $\mathcal{F}_{c}^{\varphi }\subseteq\mathcal{F}_{\widetilde{c}}^{\varphi}$.

Using the same arguments as in the previous paragraph, we obtain $c=\widetilde{c}+w-\widetilde{\psi}(w)u_{0}$, where the function $w\in\mathcal{B}_{\widetilde{c}}^{\varphi}$ is given in (7) with v=0. Thus $\mathcal{F}_{\widetilde{c}}^{\varphi}\subseteq\mathcal {F}_{c}^{\varphi}$. □

By Lemma 6, if we denote $c_{1}+u-\psi _{1}(u)u_{0}=\widetilde{c}=c_{2}+v-\psi_{2}(v)u_{0}$, then we have the equality $\mathcal{F}_{c_{1}}^{\varphi}=\mathcal {F}_{\widetilde{c}}^{\varphi}=\mathcal{F}_{c_{2}}^{\varphi}$.

The results obtained in these lemmas are summarized in the next proposition.

Proposition 7

Let c ₁,c ₂:T→ℝ be measurable functions such that the probability densities p ₁=φ(c ₁) and p ₂=φ(c ₂) are in $\mathcal{P}_{\mu}$. Suppose $\mathcal{F}_{c_{1}}^{\varphi}\cap\mathcal{F}_{c_{2}}^{\varphi }\neq\emptyset$. Then the Musielak–Orlicz spaces $L_{c_{1}}^{\varphi}$ and $L_{c_{2}}^{\varphi}$ are equal as sets, and have equivalent norms. Moreover, $\mathcal {F}_{c_{1}}^{\varphi}=\mathcal{F}_{c_{2}}^{\varphi}$.

Thus we can state:

Proposition 8

The collection $\{(\mathcal{B}_{c}^{\varphi},\boldsymbol {\varphi}_{c})\}_{\boldsymbol{\varphi}(c)\in\mathcal{P}_{\mu}}$ satisfies (bm1)–(bm2), equipping $\mathcal{P}_{\mu}$ with a C ^∞-differentiable structure.

5 Divergence

In this section we define the divergence between two probability distributions. The entities found in Information Geometry [1, 9], like the Fisher information, connections, geodesics, etc., are all derived from the divergence taken in the considered family. The divergence we will found is the Bregman divergence [2] associated to the normalizing function $\psi\colon\mathcal{K}_{c}^{\varphi }\rightarrow[0,\infty)$. We show that our divergence does not depend on the parametrization of the φ-family $\mathcal{F}_{c}^{\varphi}$.

Let S be a convex subset of a Banach space X. Given a convex function f:S→ℝ, the Bregman divergence B _f:S×S→[0,∞) is defined as

for all x,y∈S, where ∂ ₊ f(x)(h)=lim_t↓0(f(x+th)−f(x))/t denotes the right-directional derivative of f at x in the direction of h. The right-directional derivative ∂ ₊ f(x)(h) exists and defines a sublinear functional. If the function f is strictly convex, the divergence satisfies B _f(y,x)=0 if and only if x=y.

Let X and Y be Banach spaces, and U⊆X be an open set. A function f:U→Y is said to be Gâteaux-differentiable at x ₀∈U if there exists a bounded linear map A:X→Y such that

for every h∈X. The Gâteaux derivative of f at x ₀ is denoted by A=∂f(x ₀). If the limit above can be taken uniformly for every h∈X such that ∥h∥≤1, then the function f is said to be Fréchet-differentiable at x ₀. The Fréchet derivative of f at x ₀ is denoted by A=Df(x ₀).

Now we verify that $\psi\colon\mathcal{K}_{c}^{\varphi}\rightarrow \mathbb{R}$ is a convex function. Take any $u,v\in\mathcal{K}_{c}^{\varphi}$ such that u≠v. Clearly, the function λu+(1−λ)v is in $\mathcal{K}_{c}^{\varphi}$, for any λ∈(0,1). By the convexity of φ(t,⋅), we can write

Since φ(c+λu+(1−λ)v−ψ(λu+(1−λ)v)u ₀) has μ-integral equal to 1, we can conclude that the following inequality holds:

So we can define the Bregman divergence B _ψ from to the normalizing function ψ.

The Bregman divergence $B_{\psi}\colon\mathcal{B}_{c}^{\varphi}\times \mathcal{B}_{c}^{\varphi}\rightarrow[0,\infty)$ associated to the normalizing function $\psi\colon\mathcal {B}_{c}^{\varphi}\rightarrow[0,\infty)$ is given by

Then we define the divergence $D_{\psi}\colon\mathcal{B}_{c}^{\varphi }\times\mathcal{B}_{c}^{\varphi}\rightarrow[0,\infty)$ related to the φ-family $\mathcal{F}_{c}^{\varphi}$ as

The entries of B _ψ are inverted in order that D _ψ corresponds in some way to the Kullback–Leibler divergence $D_{\mathrm{KL}}(p,q)=\mathbb{E}[p\log( \frac{p}{q})]$. Assuming that φ(t,⋅) is continuously differentiable, we will find an expression for ∂ψ(u).

Lemma 9

Assume that φ(t,⋅) is continuously differentiable. For any $u\in\mathcal{K}_{c}^{\varphi}$, the linear functional $f_{u}\colon L_{c}^{\varphi}\rightarrow\mathbb{R}$ given by $f_{u}(v)=\mathbb{E}[v\boldsymbol{\varphi}'(c+u)]$ is bounded.

Proof

Every function $v\in L_{c}^{\varphi}$ with norm ∥v∥_Φ,0≤1 satisfies I _Φ(v)≤∥v∥_Φ,0. Then we obtain

Since $u\in\mathcal{K}_{c}^{\varphi}$, we can find λ∈(0,1) such that $\mathbb{E}[\boldsymbol{\varphi}(c+ \frac{1}{\lambda }u)]<\infty$. We can write

Thus the absolute value of $f_{u}(v)=\mathbb{E}[v\boldsymbol{\varphi}'(c+u)]$ is bounded by some constant for ∥v∥_Φ,0≤1. □

Lemma 10

Assume that φ(t,⋅) is continuously differentiable. Then the normalizing function $\psi\colon\mathcal{K}_{c}^{\varphi}\rightarrow \mathbb{R}$ is Gâteaux-differentiable and

(8)

Proof

According to Lemma 9, the expression in (8) defines a bounded linear functional. Fix functions $u\in\mathcal {K}_{c}^{\varphi}$ and $v\in L_{c}^{\varphi}$. In virtue of Proposition 4, we can find ε>0 such that $\mathbb{E}[\boldsymbol{\varphi }(c+u+\lambda|v|)]<\infty$, for every λ∈[−ε,ε]. Define

for any λ∈(−ε,ε) and k≥0. Since $\mathcal{K}_{c}^{\varphi}$ is open, there exist a sufficiently small α ₀>0 such that u+λv+α|v| is in $\mathcal {K}_{c}^{\varphi}$ for all α∈[−α ₀,α ₀]. We can write

The function in the expectation above is dominated by the μ-integrable function $\frac{1}{\alpha_{0}}\{\boldsymbol{\varphi}(c+u+\lambda v+\alpha_{0}|v|-ku_{0})-\boldsymbol{\varphi}(c+u+\lambda v-ku_{0})\}$. By the Dominated Convergence Theorem,

and, consequently,

Since v φ′(c+u+λv−ku ₀) is dominated by the μ-integrable function |v|φ′(c+u+ε|v|−ku ₀), we obtain for any sequence λ _n→λ,

Thus $\frac{\partial g}{\partial\lambda}(\lambda,k)$ is continuous with respect to λ. Analogously, it can be shown that

and $\frac{\partial g}{\partial k}(\lambda,k)$ is continuous with respect to k. The equality $g(\lambda,k(\lambda))=\mathbb {E}[\boldsymbol{\varphi}(c+u+\lambda v-k(\lambda)u_{0})]=1$ defines k(λ)=ψ(u+λv) as an implicit function of λ. Notice that $\frac{\partial g(0,k)}{\partial k}<0$. By the Implicit Function Theorem, the function k(λ)=ψ(u+λv) is continuously differentiable in a neighborhood of 0, and has derivative

Consequently,

Thus the expression in (8) is the Gâteaux-derivative of ψ. □

Lemma 11

Assume that φ(t,⋅) is continuously differentiable. Then the divergence D _ψ does not depend on the parametrization of $\mathcal{F}_{c}^{\varphi}$.

Proof

For any $w\in\mathcal{B}_{c}^{\varphi}$, we denote $\widetilde {c}=c+w-\psi(w)u_{0}$. Given $u,v\in\mathcal{B}_{c}^{\varphi}$, select $\widetilde {u},\widetilde{v}\in\mathcal{B}_{\widetilde{c}}^{\varphi}$ such that $\boldsymbol{\varphi}_{\widetilde{c}}(\widetilde {u})=\boldsymbol{\varphi}_{c}(u)$ and $\boldsymbol{\varphi}_{\widetilde{c}}(\widetilde{v})=\boldsymbol {\varphi}_{c}(v)$. Let $\widetilde{\psi}\colon\mathcal{B}_{\widetilde{c}}^{\varphi }\rightarrow[0,\infty)$ be the normalizing function associated to $\widetilde{c}$. These definitions provide

and

Subtracting these equations, we obtain

and, consequently,

Therefore, $D_{\widetilde{\psi}}(\widetilde{u},\widetilde{v})=D_{\psi}(u,v)$. □

Let p=φ _c(u) and q=φ _c(v), for $u,v\in\mathcal{B}_{c}^{\varphi}$. We denote the divergence between the probability densities p and q by

According to Lemma 11, D(p∥q) is well-defined if p and q are in the same φ-family. We will find an expression for D(p∥q) where p and q are given explicitly. For u=0, we have D(p∥q)=D _ψ(0,v)=ψ(v), and then

Therefore, the divergence between probability densities p and q in the same φ-family can be expressed as

(9)

Clearly, the expectation in (9) may not be defined if p and q are not in the same φ-family. We extend the divergence in (9) by setting D(p∥q)=∞ if p and q are not in the same φ-family. With this extension, the divergence is denoted by D _φ and is called the φ-divergence. By the strict convexity of φ(t,⋅), we have the inequality φ ⁻¹(t,u)−φ ⁻¹(t,v)≥(φ ⁻¹)′(t,u)(u−v) for any u,v>0, with equality if and only if u=v. Hence D _φ is always non-negative, and D _φ(p∥q) is equal to zero if and only if p=q.

Example 12

With the variable κ-exponential exp_κ(t,u)=exp_κ(t)(u) in the place of φ(t,u), whose inverse φ ⁻¹(t,u) is the variable κ-logarithm ln_κ(t,u)=ln_κ(t)(u), we rewrite (9) as

(10)

where ln _κ(p) denotes ln_κ(t)(p(t)). Since the κ-logarithm $\ln_{\kappa}(u)=\frac{u^{\kappa}-u^{-\kappa}}{2\kappa}$ has derivative $\ln_{\kappa}'(u)= \frac{1}{u} \frac{u^{\kappa }+u^{-\kappa}}{2}$, the numerator and denominator in (10) result in

and

respectively. Thus (10) can be rewritten as

which we called the κ-divergence.

References

Amari, S., Nagaoka, H.: Methods of Information Geometry. Translations of Mathematical Monographs, vol. 191. Am. Math. Soc., Providence (2000). Translated from the 1993 Japanese original by Daishi Harada
MATH Google Scholar
Bregman, L.M.: The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. U.S.S.R. Comput. Math. Math. Phys. 7(3), 200–217 (1967)
Article Google Scholar
Cena, A., Pistone, G.: Exponential statistical manifold. Ann. Inst. Stat. Math. 59(1), 27–56 (2007)
Article MathSciNet MATH Google Scholar
Gibilisco, P., Riccomagno, E., Rogantin, M.P., Wynn, H.P. (eds.): Algebraic and Geometric Methods in Statistics. Cambridge University Press, Cambridge (2010)
MATH Google Scholar
Grasselli, M.R.: Dual connections in nonparametric classical information geometry. Ann. Inst. Stat. Math. 62(5), 873–896 (2010)
Article MathSciNet Google Scholar
Kaniadakis, G.: Statistical mechanics in the context of special relativity. Phys. Rev. E 66(5), 056125 (2002)
Article MathSciNet Google Scholar
Krasnosel’skiĭ, M.A., Rutickiĭ, J.B.: Convex Functions and Orlicz Spaces. Noordhoff, Groningen (1961). Translated from the first Russian edition by Leo F. Boron
Google Scholar
Lang, S.: Differential and Riemannian Manifolds. Graduate Texts in Mathematics, vol. 160, 3rd edn. Springer, New York (1995)
Book MATH Google Scholar
Murray, M.K., Rice, J.W.: Differential Geometry and Statistics. Monographs on Statistics and Applied Probability, vol. 48. Chapman & Hall, London (1993)
MATH Google Scholar
Musielak, J.: Orlicz Spaces and Modular Spaces. Lecture Notes in Mathematics, vol. 1034. Springer, Berlin (1983)
MATH Google Scholar
Naudts, J.: Generalised Thermostatistics. Springer, London (2011)
Book MATH Google Scholar
Petz, D.: Quantum Information Theory and Quantum Statistics. Theoretical and Mathematical Physics. Springer, Berlin (2008)
MATH Google Scholar
Pistone, G.: κ-exponential models from the geometrical viewpoint. Eur. Phys. J. B 70(1), 29–37 (2009)
Article MATH Google Scholar
Pistone, G., Rogantin, M.P.: The exponential statistical manifold: mean parameters, orthogonality and space transformations. Bernoulli 5(4), 721–760 (1999)
Article MathSciNet MATH Google Scholar
Pistone, G., Sempi, C.: An infinite-dimensional geometric structure on the space of all the probability measures equivalent to a given one. Ann. Stat. 23(5), 1543–1561 (1995)
Article MathSciNet MATH Google Scholar
Rao, M.M., Ren, Z.D.: Theory of Orlicz Spaces. Monographs and Textbooks in Pure and Applied Mathematics, vol. 146. Dekker, New York (1991)
MATH Google Scholar

Download references

Acknowledgements

We are indebted to the anonymous referees for significant comments and suggestions leading to the current version. This work received financial support from CAPES—Coordenação de Aperfeiçoamento de Pessoal de Nível Superior.

Author information

Authors and Affiliations

Computer Engineering, Campus Sobral, Federal University of Ceará, Fortaleza, CE, Brazil
Rui F. Vigelis
Wireless Telecommunication Research Group, Department of Teleinformatics Engineering, Federal University of Ceará, Fortaleza, CE, Brazil
Charles C. Cavalcante

Authors

Rui F. Vigelis
View author publications
You can also search for this author in PubMed Google Scholar
Charles C. Cavalcante
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Rui F. Vigelis.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Vigelis, R.F., Cavalcante, C.C. On φ-Families of Probability Distributions. J Theor Probab 26, 870–884 (2013). https://doi.org/10.1007/s10959-011-0400-5

Download citation

Received: 01 June 2011
Revised: 27 September 2011
Published: 21 December 2011
Issue Date: September 2013
DOI: https://doi.org/10.1007/s10959-011-0400-5

Keywords

Mathematics Subject Classification (2000)

60E05

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

On φ-Families of Probability Distributions

Abstract

Similar content being viewed by others

On Normalization Functions and $$\varphi $$ -Families of Probability Distributions

On the Maximum \(f\)-Divergence of Probability Distributions Given the Value of Their Coupling

An extended-G geometric family

1 Introduction

2 Musielak–Orlicz Spaces

3 The Exponential Statistical Manifold

4 Construction of the φ-Family of Probability Distributions

Example 1

Lemma 2

Proof

Proposition 3

Proof

Proposition 4

Proof

Lemma 5

Proof

Lemma 6

Proof

Proposition 7

Proposition 8

5 Divergence

Lemma 9

Proof

Lemma 10

Proof

Lemma 11

Proof

Example 12

References

Acknowledgements

Author information

Authors and Affiliations

Corresponding author

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Mathematics Subject Classification (2000)

Search

Navigation