
1 Introduction

In this paper we study a geometry on the set \(\mathscr {P}\) of strictly positive probability densities on a probability space \((\mathbb {X},\mathscr {X},\mu )\). In some cases one is led to consider the set \(\overline{\mathscr {P}}\) of probability densities, i.e., without the restriction of strict positivity. There is a considerable literature on Information Geometry on \(\mathscr {P}\), in the sense defined in the Amari and Nagaoka monograph [2]. There is also a non-parametric approach, i.e., one that considers the geometry not on the parameter set of a given statistical model but on the full set of densities. This was done in [21, 23] by using a logarithmic chart to represent densities.

A different approach, which leads to the construction of a Hilbert manifold on \(\mathscr {P}\), has been proposed by N.J. Newton in [18, 19]. It is based on the use of the chart \(p \mapsto \log p + p - 1\) instead of a purely logarithmic chart. This paper presents a variation on the same theme, obtained by enlarging the class of permitted charts.

Let \(\mathscr {M} \subset \mathscr {P}\). At each \(p \in \mathscr {M} \), the Hilbert space of square-integrable random variables \(L^2(p)\) provides a fiber that sits at \(p \in \mathscr {M}\), so we can define the Hilbert bundle with base \(\mathscr {M}\). The Hilbert bundle, or similar bundles with fibers which are vector spaces of random variables, provides a convenient framework for Information Geometry, cf. [1, 12, 21].

If \(\mathscr {M}\) is an exponential manifold in the sense of [23], there exists a splitting of each fiber, \(L^2(p) = \mathscr {H}_p \oplus \mathscr {H}_p^\perp \), such that each \(\mathscr {H}_p\) contains a dense vector sub-space which is an expression of the tangent space \(T_p\mathscr {M}\) of the manifold. Moreover, the manifold on \(\mathscr {M}\) is affine (it can be defined by an atlas whose transition mappings are affine), and it is also a Hessian manifold (the inner product on each fiber is the second derivative of a potential function, [24]).

When the sample space is finite and \(\mathscr {M}\) is the full set \(\mathscr {P}\) of positive probability densities, then \(\mathscr {H}_p\) is the space \(L^2_0(p)\) of centered square-integrable random variables, and moreover there is an identification of the fiber with the tangent space, \(\mathscr {H}_p \simeq T_p\mathscr {P}\). A similar situation occurs when \(\mathscr {M}\) is a finite-dimensional exponential family. It is difficult to devise set-ups other than those mentioned above where the identification of the Hilbert fiber with the tangent space holds true; in fact, a necessary condition is a topological linear isomorphism among the fibers. One possible option would be to take as fibers the spaces of bounded functions \(L^\infty _0(p)\), see G. Loaiza and H.R. Quiceno [14].

This difficulty is overcome in N.J. Newton's setting. On a probability space \((\mathbb {X},\mathscr {X}, \mu )\), he considers the “balanced chart” \(\mathscr {M} \ni p \mapsto \log p + p - 1 \in L^2_0(\mu )\). In this chart, all the tangent spaces are identified with the fixed Hilbert space \(L^2_0(\mu )\), so that the statistical Hilbert bundle is trivialized.

N.J. Newton's balanced chart belongs to a larger class of “deformations” of the usual logarithmic representation. It is in fact an instance of the “deformed logarithms” defined by J. Naudts [17], namely \(\log _A(x) = \int _1^x dt/A(t)\), where A is a suitable increasing function. If A is bounded, a special class of deformed logarithms results, which includes N.J. Newton's balanced chart as well as other deformed logarithms, notably the G. Kaniadakis logarithm [10, 11, 20].

In this paper, we combine the various approaches by considering deformed logarithms with linear growth, as introduced by N.J. Newton, but we do not look for a trivialization of the Hilbert bundle. Instead, we construct an affine atlas of charts, each one centered at a \(p \in \mathscr {M}\). This is obtained by adapting the construction of the exponential manifold of [21] to the deformed exponential models defined by J. Naudts [17]. Moreover, we allow for a general reference measure by using an idea introduced by R.F. Vigelis and C.C. Cavalcante [26]. That is, each density has the form \(q = \exp _A(u - K_p(u) + \log _A p)\), where \(\exp _A = \log _A^{-1}\) is an exponential-like function which has linear growth at \(+\infty \) and is dominated by an exponential at \(-\infty \).

The formalism of deformed exponentials is discussed in Sect. 2. This section is intended to be self-contained and contains material from the references discussed above without an explicit mention. The following Sect. 3 is devoted to the study of non-parametric deformed exponential families. In Sect. 4 we introduce the formulation of the divergence, in accordance with our approach. In Sect. 5 the construction of the Hilbert statistical bundle is outlined.

A first version of this research was presented at the GSI 2017 conference [16]; we refer to that paper for some of the proofs.

2 Deformed Exponential

Let us introduce a class of deformed exponentials, following the formalism of [17]. Assume we are given a strictly increasing, continuously differentiable function A from \(]0,+\infty [ \) onto ]0, a[ such that \(\left\| A^{\prime }\right\| _{\infty } < \infty \). This implies \(a = \left\| A\right\| _{\infty }\) and \(A(x) \le \left\| A^{\prime }\right\| _{\infty } x\), so that \(\int _0^1 d\xi /A(\xi ) = +\infty \).

The A-logarithm is the function

$$\begin{aligned} \log _{A}(x)=\int _{1}^{x}\frac{d\xi }{A(\xi )}\ ,\quad x\in ]0,+\infty [\ . \end{aligned}$$

The A-logarithm is strictly increasing from \(-\infty \) to \(+\infty \), its derivative \(\log _{A}^{\prime }(x)=1/A(x)\) is positive and strictly decreasing for all \(x>0\), hence \(\log _{A}\) is strictly concave.

By inverting the A-logarithm, one obtains the A-exponential, \(\exp _{A}=\log _{A}^{-1}\). The function \(\exp _{A}:]-\infty ,+\infty [\rightarrow ]0,+\infty [\) is strictly increasing, strictly convex, and is the solution to the Cauchy problem

$$\begin{aligned} \exp _{A}^{\prime }(y)=A(\exp _{A}(y)),\quad \exp _{A}(0)=1\ . \end{aligned}$$
(1)

As a consequence, we have the linear bound

$$\begin{aligned} \left| \exp _{A}(y_{1})-\exp _{A}(y_{2})\right| \le \left\| A\right\| _{\infty }\left| y_{1}-y_{2}\right| \ . \end{aligned}$$
(2)
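For N.J. Newton's choice of A (see Sect. 2.1), \(\log _A\) has the closed form \(\log x + x - 1\), so the defining properties above can be checked numerically. The following Python sketch is ours, not part of the original development; it inverts \(\log _A\) by bisection and verifies the Cauchy problem (1) and the Lipschitz bound (2):

```python
import math

def A(x):                 # N.J. Newton's example, A(x) = x/(1+x), so ||A||_inf = 1
    return x / (1.0 + x)

def log_A(x):             # closed form of the A-logarithm for this choice of A
    return math.log(x) + x - 1.0

def exp_A(y):             # invert the strictly increasing log_A by bisection
    lo, hi = 1e-300, 1.0
    while log_A(hi) < y:  # bracket the solution from above
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_A(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

# Cauchy problem (1): exp_A(0) = 1 and exp_A'(y) = A(exp_A(y))
assert abs(exp_A(0.0) - 1.0) < 1e-9
y, h = 0.7, 1e-6
num_deriv = (exp_A(y + h) - exp_A(y - h)) / (2.0 * h)
assert abs(num_deriv - A(exp_A(y))) < 1e-4

# Lipschitz bound (2): |exp_A(y1) - exp_A(y2)| <= ||A||_inf |y1 - y2|
for y1, y2 in [(-3.0, 2.0), (0.5, 4.0), (-1.0, -0.2)]:
    assert abs(exp_A(y1) - exp_A(y2)) <= abs(y1 - y2) + 1e-9
```

Any other A in the class could be used instead, with \(\log _A\) computed by quadrature.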

The behavior of the A-logarithm is linear for large arguments and super-logarithmic for small arguments. To derive explicit bounds, set

$$\begin{aligned} \alpha _1 = \inf _{0 < x\le 1} \frac{A(x)}{x} \ , \quad \alpha _2 = \sup _{0 < x \le 1} \frac{A(x)}{x} \ , \end{aligned}$$

namely, the best constants such that \(\alpha _1 x \le A(x) \le \alpha _2 x\) for \(0 < x \le 1\). Note that \(\alpha _1 \ge 0\) while \(\alpha _2 > 0\). If moreover \(\alpha _1 > 0\), then

$$\begin{aligned} \frac{1}{\alpha _1} \log x \le \log _A x \le \frac{1}{\alpha _2} \log x \ , \quad 0 < x \le 1 \ . \end{aligned}$$
(3)

If instead \(\alpha _1=0\), only the right inequality holds.

For \(x \ge 1\) we have \(A(1) \le A(x) < \left\| A\right\| _{\infty }\), hence

$$\begin{aligned} \frac{1}{\left\| A\right\| _{\infty }}(x-1) \le \log _A x \le \frac{1}{A(1)}(x-1) \ , \quad x \ge 1 \ . \end{aligned}$$
(4)

Under the assumptions made on the function A, the coefficient \(\alpha _1\) is positive if, and only if, \(A'(0+) > 0\).
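For N.J. Newton's example \(A(x)=x/(1+x)\) one has \(A(x)/x = 1/(1+x)\), which decreases on ]0, 1], so \(\alpha _1 = 1/2\) and \(\alpha _2 = 1\). The following Python sketch (ours) checks the defining property of \(\alpha _1, \alpha _2\) and the bound (4):

```python
import math

def A(x):                 # N.J. Newton's example; A(1) = 1/2 and ||A||_inf = 1
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

# alpha_1 x <= A(x) <= alpha_2 x on ]0, 1]
alpha_1, alpha_2 = 0.5, 1.0
for k in range(1, 1001):
    x = k / 1000.0
    assert alpha_1 * x <= A(x) <= alpha_2 * x

# bound (4): (x-1)/||A||_inf <= log_A(x) <= (x-1)/A(1) for x >= 1
for x in [1.0, 1.5, 2.0, 10.0, 100.0]:
    assert x - 1.0 <= log_A(x) + 1e-12
    assert log_A(x) <= (x - 1.0) / A(1.0) + 1e-12
```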

2.1 Examples

The main example of A-logarithm is the N.J. Newton A-logarithm [18], with

$$\begin{aligned} A(\xi )=1-\frac{1}{1+\xi }=\frac{\xi }{1+\xi } \ , \end{aligned}$$

so that

$$\begin{aligned} \log _A(x) = \log x + x - 1\ . \end{aligned}$$

There is a simple algebraic expression for the product,

$$\begin{aligned} \log _A(x_1x_2) = \log _A(x_1) + \log _A(x_2) + (x_1-1)(x_2-1) \ . \end{aligned}$$
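The identity follows by expanding both sides; it can also be confirmed numerically with a short Python sketch (ours):

```python
import math

def log_A(x):             # N.J. Newton's A-logarithm
    return math.log(x) + x - 1.0

# log_A(x1 x2) = log_A(x1) + log_A(x2) + (x1 - 1)(x2 - 1)
for x1 in [0.3, 1.0, 2.5]:
    for x2 in [0.7, 1.9, 4.0]:
        lhs = log_A(x1 * x2)
        rhs = log_A(x1) + log_A(x2) + (x1 - 1.0) * (x2 - 1.0)
        assert abs(lhs - rhs) < 1e-12
```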

Other similar examples are available in the literature. One is a special case of the G. Kaniadakis’ exponential of [9], generated by

$$\begin{aligned} A(\xi ) = \frac{2\xi ^2}{1+\xi ^2} \ . \end{aligned}$$

It turns out that

$$\begin{aligned} \log _A x = \frac{x-x^{-1}}{2} \ , \end{aligned}$$

whose inverse provides

$$\begin{aligned} \exp _A(y) = y + \sqrt{1+y^2} \ . \end{aligned}$$

A remarkable feature of the G. Kaniadakis’ exponential is the relation

$$\begin{aligned} \exp _A(y)\exp _A(-y) = \left( y+\sqrt{1+y^2}\right) \left( -y+\sqrt{1+y^2} \right) = 1 \ . \end{aligned}$$

Notice that the function A generating the N.J. Newton exponential is concave, while the one generating the G. Kaniadakis exponential is not.
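Both claims for the G. Kaniadakis case, the inverse pair and the failure of concavity of A, can be checked numerically; in the following Python sketch (ours), the second difference of A changes sign:

```python
import math

def A(x):                 # the G. Kaniadakis generator
    return 2.0 * x * x / (1.0 + x * x)

def log_A(x):
    return 0.5 * (x - 1.0 / x)

def exp_A(y):
    return y + math.sqrt(1.0 + y * y)

# exp_A inverts log_A, and exp_A(y) exp_A(-y) = 1
for x in [0.1, 0.5, 1.0, 3.0, 10.0]:
    assert abs(exp_A(log_A(x)) - x) < 1e-9
for y in [-2.0, -0.5, 0.0, 1.0, 4.0]:
    assert abs(exp_A(y) * exp_A(-y) - 1.0) < 1e-12

# the second difference of A changes sign, so A is not concave
def second_diff(f, x, h=1e-4):
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

assert second_diff(A, 0.2) > 0.0 > second_diff(A, 2.0)
```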

Another example is \(A(\xi ) = 1 - 2^{-\xi }\), which gives \(\log _A(x) = \log _2(2^x - 1)\) and \(\exp _A(y) = \log _2(1+2^y)\).
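As a consistency check, one can recover \(\log _A\) directly from its integral definition by numerical quadrature and compose it with the stated closed form of \(\exp _A\); a Python sketch (ours, with a crude midpoint rule):

```python
import math

def A(x):
    return 1.0 - 2.0 ** (-x)

def log_A(x, n=20000):    # log_A(x) = int_1^x dxi / A(xi), midpoint rule
    s, step = 0.0, (x - 1.0) / n
    for k in range(n):
        s += step / A(1.0 + (k + 0.5) * step)
    return s

def exp_A(y):             # the closed-form inverse
    return math.log2(1.0 + 2.0 ** y)

# exp_A undoes the numerically computed A-logarithm
for x in [0.5, 1.0, 2.0, 5.0]:
    assert abs(exp_A(log_A(x)) - x) < 1e-5
```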

Notable examples of deformed exponentials that do not fit into our set of assumptions are Tsallis q-logarithms, see [25]. For instance, for \(q=1/2\),

$$\begin{aligned} \log _{1/2}x = 2\left( \sqrt{x}-1\right) = \int _{1}^{x}\frac{d \xi }{\sqrt{\xi }}. \end{aligned}$$

In this case, \(\log _{1/2}(0+)=-\int _{0}^{1}d\xi /\sqrt{\xi }=-2\), so that the inverse is not defined for all real numbers. Tsallis logarithms provide models having heavy tails, which is not the case in our setting.

2.2 Superposition Operator

The deformed exponential will be employed to represent positive probability densities in the form \(p(x) = \exp _A[u(x)]\), where u is a random variable on the probability space \((\mathbb {X}, \mathscr {X},\mu )\). For this reason, we are interested in the properties of the superposition operator

$$\begin{aligned} S_A :u \mapsto \exp _A\circ \, u \end{aligned}$$
(5)

defined in some convenient functional setting. About superposition operators, see e.g. [3, Ch. 1] and [4, Ch. 3].

It is clear from the Lipschitz condition  (2) that \(\exp _{A}(u)\le 1+\left\| A\right\| _{\infty }\left| u\right| \), which in turn implies that the superposition operator \(S_{A}\) maps \(L^{\alpha }(\mu )\) into itself for all \(\alpha \in [1,+\infty ]\) and the mapping is uniformly Lipschitz with constant \(\left\| A\right\| _{\infty }\). Notice that we are assuming that \(\mu \) is a finite measure.

The superposition operator \(S_{A}:L^{\alpha }(\mu )\rightarrow L^{\alpha }(\mu )\) is one-to-one and its image consists of all positive random variables f such that \(\log _{A}f\in L^{\alpha }(\mu )\). The following proposition is a special case of a more general result of [19]. We give a direct proof here for the sake of completeness, and because our setting includes deformed logarithms other than the one treated there.

Proposition 1

  1.

    For all \(\alpha \in [1,\infty ]\), the superposition operator \(S_A\) of Eq. (5) is Gateaux-differentiable with derivative

    $$\begin{aligned} d S_A(u)[h] = A(\exp _A(u))h \ . \end{aligned}$$
    (6)
  2.

    \(S_A\) is Fréchet-differentiable from \(L^{\alpha }(\mu )\) to \(L^{\beta }(\mu )\), for all \(\alpha > \beta \ge 1\).

Proof

  1.

    Equation (1) implies that for each couple of random variables \(u,h\in L^{\alpha }(\mu )\)

    $$\begin{aligned} \lim _{t\rightarrow 0}\left( t^{-1}\left( \exp _{A}(u+th)-\exp _{A}(u)\right) -A(\exp _{A}(u))h\right) =0 \end{aligned}$$

    holds point-wise. Moreover, for each \(\alpha \in [1,\infty [\), the Jensen inequality implies that for \(t > 0\)

    $$\begin{aligned}&\left| t^{-1}\left( \exp _{A}(u+th)-\exp _{A}(u)\right) -A(\exp _{A}(u))h\right| ^{\alpha }\le \\&\qquad \quad t^{-1}\left| h\right| ^{\alpha }\int _{0}^{t}\left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| ^{\alpha }\ dr \le \left( 2\left\| A\right\| _{\infty }\right) ^{\alpha }\left| h\right| ^{\alpha }. \end{aligned}$$

    Now, dominated convergence forces the limit to hold in \(L^{\alpha }(\mu )\). If \(t < 0\), it suffices to replace h with \(-h\). When \(\alpha =\infty \), we can use the second-order bound

    $$\begin{aligned}&\left| t^{-1}\left( \exp _{A}(u+th)-\exp _{A}(u)\right) -A(\exp _{A}(u))h\right| = \\&\qquad \vert t \vert ^{-1}h^{2}\left| \int _{0}^{t}(t-r)A^{\prime }(\exp _{A}(u+rh))A(\exp _{A}(u+rh))\ dr\right| \le \frac{\vert t \vert }{2}\left\| h\right\| _{\infty }^{2} \left\| A'\right\| _{\infty } \left\| A\right\| _{\infty } \ . \end{aligned}$$

    As \(\left\| A^{\prime }\right\| _{\infty }\left\| A\right\| _{\infty }<\infty \), the RHS goes to 0 as \(t\rightarrow 0\) for each \(h\in L^{\infty }(\mu )\), uniformly on bounded sets.

  2.

    Given \(u,h\in L^{\alpha }(\mu )\), thanks again to the Taylor formula and the Jensen inequality,

    $$\begin{aligned}&\int \left| \exp _{A}(u+h)-\exp _{A}(u)-A(\exp _{A}(u))h\right| ^{\beta }\ d\mu \le \\&\qquad \qquad \qquad \qquad \quad \int \left| h\right| ^{\beta }\int _{0}^{1}\left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| ^{\beta }\ dr\ d\mu \ . \end{aligned}$$

    By the Hölder inequality with conjugate exponents \(\alpha /\beta \) and \(\alpha /(\alpha -\beta )\), the RHS is bounded by

    $$\begin{aligned} \left( \int \left| h\right| ^{\alpha }\ d\mu \right) ^{\frac{\beta }{\alpha }} \left( \iint \left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| ^{\frac{\alpha \beta }{\alpha -\beta }}\ dr\ d\mu \right) ^{\frac{\alpha -\beta }{\alpha }}\ . \end{aligned}$$

    Consequently,

    $$\begin{aligned}&\left\| h\right\| _{L^{\alpha }(\mu )}^{-1}\left\| \exp _{A}(u+h)-\exp _{A}(u)-A(\exp _{A}(u))h\right\| _{L^{\beta }(\mu )}\le \\&\qquad \qquad \qquad \qquad \quad \left( \iint \left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| ^{\frac{\alpha \beta }{\alpha -\beta }}\ dr\ d\mu \right) ^{\frac{\alpha -\beta }{\alpha \beta }}\ . \end{aligned}$$

    In order to show that the RHS vanishes as \(\left\| h\right\| _{L^{\alpha }(\mu )}\rightarrow 0\), observe that for all \(\delta >0\) we have

    $$\begin{aligned} \left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| \le {\left\{ \begin{array}{ll} 2\left\| A\right\| _{\infty } &{} \text {always,} \\ \left\| A'\right\| _{\infty } \left\| A\right\| _{\infty } \delta &{} \text {if } \left| h\right| \le \delta , \end{array}\right. } \end{aligned}$$

    so that, decomposing the double integral as \(\iint =\iint _{\left| h\right| \le \delta }+\iint _{\left| h\right| >\delta }\), we obtain

    $$\begin{aligned}&\iint \left| A(\exp _{A}(u+rh))-A(\exp _{A}(u))\right| ^{\gamma }\ dr\ d\mu \le \\&\qquad \qquad \qquad \left( 2\left\| A\right\| _{\infty }\right) ^{\gamma }\mu \left\{ \left| h\right| >\delta \right\} +\left( \left\| A'\right\| _{\infty } \left\| A\right\| _{\infty } \delta \right) ^{\gamma }\le \\&\qquad \qquad \qquad \qquad \qquad \qquad \left( 2\left\| A\right\| _{\infty }\right) ^{\gamma }\delta ^{-\alpha }\int \left| h\right| ^{\alpha }\ d\mu +\left( \left\| A'\right\| _{\infty } \left\| A\right\| _{\infty } \delta \right) ^{\gamma }\ , \end{aligned}$$

    where \(\gamma =\alpha \beta /(\alpha -\beta )\) and we have used the Chebyshev inequality. Now it is clear that the last bound implies the conclusion for each \(\alpha <\infty \). The case \(\alpha =\infty \) follows a fortiori.    \(\square \)

Remark 1

It is not generally true that the superposition operator \(S_A\) is Fréchet differentiable for \(\alpha = \beta \), cf. [3, §1.2]. We repeat here the well-known counterexample.

Assume \(\mu \) is a non-atomic probability measure, for instance on the real line. For each \(\lambda \in \mathbb {R}\) and \(\delta > 0\) define the simple function

$$\begin{aligned} h_{\lambda ,\delta }(x) = {\left\{ \begin{array}{ll} \lambda &{} \text {if }\left| x \right| \le \delta , \\ 0 &{} \text {otherwise.} \end{array}\right. } \end{aligned}$$

For each \(\alpha \in [1,+\infty [\) we have

$$\begin{aligned} \lim _{\delta \rightarrow 0} \left\| h_{\lambda ,\delta }\right\| _{L^{\alpha }(\mu )}= \lim _{\delta \rightarrow 0} \left| \lambda \right| \mu \left\{ \left| x \right| \le \delta \right\} ^{1/\alpha } = 0 \ . \end{aligned}$$

Differentiability at 0 in \(L^{\alpha }(\mu )\) would imply for all \(\lambda \)

$$\begin{aligned} 0 =&\lim _{\delta \rightarrow 0} \frac{\left\| \exp _A(h_{\lambda ,\delta }) - 1 - A(1) h_{\lambda ,\delta }\right\| _{L^{\alpha }(\mu )}}{\left\| h_{\lambda ,\delta }\right\| _{L^\alpha (\mu )}} = \\&\quad \lim _{\delta \rightarrow 0} \frac{\left| \exp _A(\lambda ) - 1 -A(1)\lambda \right| \mu \left\{ x | \left| x \right| \le \delta \right\} ^{1/\alpha }}{ \left| \lambda \right| \mu \left\{ x | \left| x \right| \le \delta \right\} ^{1/\alpha }} = \left| \frac{\exp _A(\lambda ) - 1}{\lambda }- A(1)\right| \ , \end{aligned}$$

which is a contradiction, since \(\exp _A\) is not affine.

Remark 2

Theorems about the differentiability of the deformed exponential are important because computations like \(\frac{d}{d\theta }\exp _A(v(\theta )) = \exp '_A(v(\theta )) \dot{v}(\theta )\) are essential for the geometrical theory of statistical models. Several choices of the combination of domain space and image space are possible. One could also consider differentiability properties weaker than Fréchet differentiability. Our choice is motivated by the results of the following sections. A large class of cases is discussed in [19].

Remark 3

It would also be worthwhile to study the action of the superposition operator on spaces of differentiable functions, for example the Gauss–Sobolev spaces of P. Malliavin [15]. If \(\mu \) is the standard Gaussian measure on \(\mathbb {R}^n\), and u is a differentiable function such that \(u, \frac{\partial }{\partial x_i} u \in L^2(\mu )\), \(i=1,\ldots ,n\), then it follows that \(\exp _A(u) \in L^2(\mu )\) as well as \(\frac{\partial }{\partial x_i} \exp _A(u) \in L^2(\mu )\), since

$$\begin{aligned} \frac{\partial }{\partial x_i} \exp _A(u(x)) = A(\exp _A(u(x))) \frac{\partial }{\partial x_i} u(x) \ . \end{aligned}$$

We do not pursue this line of investigation here.

3 Deformed Exponential Family Based on \(\exp _A\)

According to [5, 26], let us define the deformed exponential curve in the space of positive measures on \(( \mathbb {X},\mathscr {X})\) as follows

$$\begin{aligned} t\mapsto \mu _{t}=\exp _{A}(tu+\log _{A}p)\cdot \mu \ , \quad u\in L^{1}(\mu ) \ . \end{aligned}$$

We have the following inequality:

$$\begin{aligned} \exp _{A}(x+y)\le \left\| A\right\| _{\infty }x^{+}+\exp _{A}(y). \end{aligned}$$

The inequality holds for \(x\le 0\) because \(\exp _{A}\) is increasing, while for \(x=x^{+}>0\) it follows from Eq. (2). As a consequence, each \(\mu _{t}\), \(t \ge 0\), is a finite measure, with \(\mu _{t}(\mathbb {X})\le t\left\| A\right\| _{\infty }\int u^{+}\ d\mu +1\). The curve is actually continuous and differentiable in \(L^1(\mu )\), because the point-wise derivative of the density \(p_{t}=\exp _{A}(tu+\log _{A}p)\) is \(\dot{p}_{t}=A(p_{t})u\), so that \(\left| \dot{p}_{t}\right| \le \left\| A\right\| _{\infty }\left| u\right| \). In conclusion, \(\mu _{0}=p \cdot \mu \) and \(\dot{\mu }_{0}=A\left( p\right) u \cdot \mu \).

There are two ways to normalize the density \(p_t\) to total mass 1: dividing by a normalizing constant Z(t), which gives the statistical model \(t \mapsto \exp _A(tu + \log _A p)/Z(t)\), or subtracting a constant \(\psi (t)\) from the argument, which gives the model \(t \mapsto \exp _A(tu - \psi (t) + \log _A p)\). In the standard exponential case the two methods lead to the same model, but this is no longer true for deformed exponentials, since \(\exp _A(\alpha +\beta ) \ne \exp _A(\alpha )\exp _A(\beta )\) in general. In the present paper we choose the latter option.
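On a finite sample space, the constant of the second normalization can be computed by a one-dimensional root search, since the total mass is strictly decreasing in the subtracted constant. The following Python sketch is ours and purely illustrative: A is N.J. Newton's example, the reference measure is uniform on four points, and the values of p and u are made up.

```python
import math

def A(x): return x / (1.0 + x)              # N.J. Newton's example, ||A||_inf = 1
def log_A(x): return math.log(x) + x - 1.0

def exp_A(y):                               # invert log_A by bisection
    lo, hi = 1e-300, 1.0
    while log_A(hi) < y:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_A(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

mu = [0.25] * 4                             # uniform reference probability on 4 points
p  = [0.4, 0.8, 1.2, 1.6]                   # a positive density w.r.t. mu (total mass 1)
u  = [1.0, -0.5, 0.3, 0.1]                  # an arbitrary element of L^1(mu)

def mass(kappa):                            # total mass of exp_A(u - kappa + log_A p) . mu
    return sum(m * exp_A(ui - kappa + log_A(pi)) for m, pi, ui in zip(mu, p, u))

lo, hi = -10.0, 10.0                        # mass is strictly decreasing in kappa
for _ in range(100):
    mid = 0.5 * (lo + hi)
    lo, hi = (lo, mid) if mass(mid) < 1.0 else (mid, hi)
K_p_u = 0.5 * (lo + hi)

q = [exp_A(ui - K_p_u + log_A(pi)) for pi, ui in zip(p, u)]
assert abs(sum(m * qi for m, qi in zip(mu, q)) - 1.0) < 1e-9
assert all(qi > 0.0 for qi in q)
```

The resulting constant plays the role of \(K_p(u)\) of Proposition 2 below.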

Here we use the ideas of [5, 17, 26] to construct deformed non-parametric exponential families. Recall that we are given the probability space \((\mathbb {X},\mathscr {X},\mu )\), the set \(\mathscr {P}\) of positive probability densities, and the function A satisfying the conditions set out in Sect. 2. Throughout this section, the density \(p\in \mathscr {P}\) is fixed.

The following proposition is taken from [16] where a detailed proof is given.

Proposition 2

  1.

    The mapping \(L^{1}(\mu )\ni u\mapsto \exp _{A}(u+\log _{A}p)\in L^{1}(\mu )\) has full domain and is \(\left\| A\right\| _{\infty }\) -Lipschitz. Consequently, the mapping

    $$\begin{aligned} u\mapsto \int g\exp _{A}(u+\log _{A}p)\ d\mu \end{aligned}$$

    is \(\left\| g\right\| _{\infty }\cdot \left\| A\right\| _{\infty }\)-Lipschitz for each bounded function g.

  2.

    For each \(u\in L^{1}(\mu )\) there exists a unique constant \( K_{p}(u)\in \mathbb {R}\) such that \(\exp _{A}(u-K_{p}(u)+\log _{A}p)\cdot \mu \) is a probability measure.

  3.

    \(K_{p}(u)=u\) if, and only if, u is constant. In such a case,

    $$\begin{aligned} \exp _{A}(u-K_{p}(u)+\log _{A}p)\cdot \mu =p\cdot \mu \ . \end{aligned}$$

    Otherwise, \(\exp _{A}(u-K_{p}(u)+\log _{A}p)\cdot \mu \ne p\cdot \mu \).

  4.

    A density q is of the form \(q=\exp _{A}(u-K_p(u)+\log _A p)\), with \( u\in L^{1}(\mu )\) if, and only if, \(\log _{A}q - \log _A p \in L^1(\mu )\).

  5.

    If

    $$\begin{aligned} \exp _{A}(u-K_{p}(u)+\log _{A}p)=\exp _{A}(v-K_{p}(v)+\log _{A}p)\ , \end{aligned}$$

    with \(u,v\in L^{1}(\mu )\), then \(u-v\) is constant.

  6.

    The functional \(K_{p}:L^{1}(\mu )\rightarrow \mathbb {R}\) is translation invariant. More specifically,

    $$\begin{aligned} K_{p}(u+c)=K_{p}(u)+cK_{p}(1) \end{aligned}$$

    holds for all \(c\in \mathbb {R}\).

  7.

    \(K_{p}:L^{1}(\mu )\rightarrow \mathbb {R}\) is continuous and convex.

3.1 Escort Density

For each positive density \(q\in \overline{\mathscr {P}}\), its escort density is defined as

$$\begin{aligned} {\text {escort}}\left( q\right) = \frac{A(q)}{\int A(q)\ d\mu } \ , \end{aligned}$$

see [17]. Notice that \(0 \le A(q)\le \left\| A\right\| _{\infty }\). In particular, \(\widetilde{q} = {\text {escort}}\left( q\right) \) is a bounded positive density. Hence, \({\text {escort}}\left( \overline{\mathscr {P}}\right) \subseteq \overline{\mathscr {P}}\cap L^{\infty }(\mu )\). Clearly, the inclusion \({\text {escort}}\left( \mathscr {P}\right) \subseteq \mathscr {P}\cap L^{\infty }(\mu )\) is true as well.
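On a finite sample space the escort density is immediate to compute; a Python sketch (ours, with N.J. Newton's A and a made-up density):

```python
def A(x):                      # N.J. Newton's example; ||A||_inf = 1
    return x / (1.0 + x)

mu = [0.25] * 4                # uniform reference probability on 4 points
q  = [0.4, 0.8, 1.2, 1.6]      # a positive probability density w.r.t. mu

norm = sum(m * A(qi) for m, qi in zip(mu, q))
escort_q = [A(qi) / norm for qi in q]

# escort(q) is again a probability density, bounded by ||A||_inf / norm
assert abs(sum(m * e for m, e in zip(mu, escort_q)) - 1.0) < 1e-12
assert max(escort_q) <= 1.0 / norm
```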

Proposition 3

  1.

    The mapping \({\text {escort}} :\overline{\mathscr {P}}\rightarrow \overline{\mathscr {P}}\cap L^{\infty }(\mu )\) is a.s. injective.

  2.

    A bounded positive density \(\widetilde{q}\) is an escort density, i.e., \(\widetilde{q}\in {\text {escort}}\left( \overline{\mathscr {P}}\right) \) if, and only if,

    $$\begin{aligned} \lim _{\alpha \uparrow \left\| A\right\| _{\infty }} \int A^{-1}\left( \alpha \frac{\widetilde{q}}{\left\| \widetilde{q}\right\| _{\infty }}\right) \ d\mu \ge 1 \ . \end{aligned}$$
    (7)
  3.

    Condition (7) is fulfilled if \(\mu \left\{ \widetilde{q} = \left\| \widetilde{q}\right\| _{\infty }\right\} > 0\). In particular, every density taking a finite number of different values, i.e., a simple density, is an escort density.

  4.

    If \(\widetilde{q}_1 = {\text {escort}}\left( q_1\right) \) is an escort density, and \(q_2\) is a bounded positive density such that

    $$\begin{aligned} \mu \left\{ \widetilde{q}_1> t \left\| \widetilde{q}_1\right\| _{\infty }\right\} \le \mu \left\{ q_2> t \left\| q_2\right\| _{\infty }\right\} , \quad t > 0 \ , \end{aligned}$$

    then \(q_2\) is an escort density as well.

Proof

  1.

    Let \({\text {escort}}\left( q_1\right) = {\text {escort}}\left( q_2\right) \) \(\mu \)-a.s., and say \(\int A\circ q_{1}\ d\mu \ge \int A\circ q_{2}\ d\mu \). Then \(A(q_{2}(x))\le A(q_{1}(x))\) for \(\mu \)-almost all x. Since A is strictly increasing, it follows that \(q_{2}(x)\le q_{1}(x)\) for \(\mu \)-almost all x, which in turn implies \(q_{1}=q_{2}\) \(\mu \)-a.s., because both integrals are equal to 1. Thus the escort mapping is a.s. injective.

  2.

    Fix a \(\widetilde{q}\in \overline{\mathscr {P}}\cap L^{\infty }(\mu )\), and define the function

    $$\begin{aligned} f(\alpha ) = \int A^{-1}\left( \alpha \frac{\widetilde{q}}{\left\| \widetilde{q}\right\| _{\infty }}\right) \ d\mu , \quad \alpha \in [0,\left\| A\right\| _{\infty }[ \ . \end{aligned}$$

    It is finite, increasing, continuous, and \(f(0)=0\). The range condition (7) is necessary: \(\widetilde{q} = {\text {escort}}\left( q\right) \) implies \(q = A^{-1}\left( \left( \int A(q)\ d\mu \right) \widetilde{q}\right) \) and, in turn, \(1 = \int A^{-1}\left( \left( \int A(q)\ d\mu \right) \widetilde{q}\right) \ d\mu \), since q is a probability density; taking \(\alpha = \int A(q)\ d\mu \ \left\| \widetilde{q}\right\| _{\infty } \le \left\| A\right\| _{\infty }\) then shows that the range condition is satisfied. Conversely, if the range condition holds, there exists \(\alpha \le \left\| A\right\| _{\infty }\) such that \(q = A^{-1}\left( \alpha \frac{\widetilde{q}}{\left\| \widetilde{q}\right\| _{\infty }}\right) \) is a positive probability density whose escort is \(\widetilde{q}\).

  3.

    This is a special case of Item 2, in that

    $$\begin{aligned} f(\alpha )= \int A^{-1}\left( \alpha \frac{\widetilde{q}}{\left\| \widetilde{q}\right\| _{\infty }}\right) \ d\mu \ge A^{-1}(\alpha )\mu \left\{ \widetilde{q} = \left\| \widetilde{q}\right\| _{\infty }\right\} \ . \end{aligned}$$

    Therefore, \( f(\alpha ) \uparrow +\infty \), as \(\alpha \uparrow \left\| A\right\| _{\infty }\).

  4.

    For each bounded positive density q we have

    $$\begin{aligned}&\int A^{-1}\left( \frac{q}{\left\| q\right\| _{\infty }}\right) \ d\mu = \int _0^{+\infty } \mu \left\{ \frac{q}{\left\| q\right\| _{\infty }}> A(t)\right\} \ dt =\\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \quad \int _0^{\left\| A\right\| _{\infty }} \mu \left\{ \frac{q}{\left\| q\right\| _{\infty }} > s\right\} \frac{1}{A'\left( A^{-1}(s)\right) } \ ds \ . \end{aligned}$$

    Now the claim follows from Item 2: by the assumption, the last integral computed for \(q_{2}\) dominates the one computed for \(\widetilde{q}_{1}\), so that condition (7) holds for \(q_{2}\) because it holds for \(\widetilde{q}_{1}\).    \(\square \)

The previous proposition shows that the range of the escort mapping is dense in the uniform norm, as it contains all simple densities. Moreover, in the partial order induced by the rearrangement of the normalized density (that is, for each q the mapping \(t \mapsto \mu \left\{ \frac{q}{\left\| q\right\| _{\infty }} > t\right\} \)), it contains the full right interval of each element. But the range of the escort mapping is not the full set of bounded positive densities, unless the \(\sigma \)-algebra \(\mathscr {X}\) is generated by a finite partition. To provide an example, consider on the Lebesgue unit interval the densities \(q_{\delta }(x) \propto (1 - x^{1/\delta })\), \(\delta > 0\), and \(A(x)=x/(1+x)\). Computing the limit in condition (7), one finds that the density \(q_{\delta }\) is an escort if, and only if, \(\delta \le 2\).

3.2 Gradient of the Normalization Operator \(K_p\)

Proposition 2 shows that the functional \(K_{p}\) is a global solution of an equation. We now study its local properties by the implicit function theorem as well as the related subgradients of the convex function \(K_p\). We refer to [7, Part I] for the general theory of convex functions in infinite dimension.

For every \(u\in L^{1}(\mu )\), let us write

$$\begin{aligned} q(u)=\exp _{A}(u-K_{p}(u)+\log _{A}p) \end{aligned}$$
(8)

while \(\widetilde{q}(u) = {\text {escort}}\left( q(u)\right) \) denotes its escort density.

Proposition 4

  1.

    The functional \(K_{p}:L^{1}(\mu )\rightarrow \mathbb {R}\) is Gateaux-differentiable with derivative

    $$\begin{aligned} \left. \frac{d}{dt}K_{p}(u+tv)\right| _{t=0}=\int v\widetilde{q}(u)\ d\mu \ . \end{aligned}$$

    It follows that \(K_{p}:L^{1}(\mu )\rightarrow \mathbb {R}\) is monotone and globally Lipschitz.

  2.

    For every \(u,v\in L^{1}(\mu )\), the inequality

    $$\begin{aligned} K_{p}(u+v)-K_{p}(u)\ge \int v\widetilde{q}(u)\ d\mu \end{aligned}$$

    holds, i.e., the density \(\widetilde{q}(u)\in L^{\infty }(\mu )\) is the unique subgradient of \(K_{p}\) at u.

Proof

  1.

    Consider the equation

    $$\begin{aligned} F(t,\kappa )=\int \exp _{A}(u+tv-\kappa +\log _{A}p)\ d\mu -1=0,\quad t,\kappa \in \mathbb {R} \ , \end{aligned}$$

    so that \(\kappa = K_p(u+tv)\). Differentiation under the integral sign is justified by the bounds

    $$\begin{aligned} \left| \frac{\partial }{\partial t}\exp _{A}(u+tv-\kappa +\log _{A}p)\right| =&\\&\left| A(\exp _{A}(u+tv-\kappa +\log _{A}p))v\right| \le \left\| A\right\| _{\infty }\left| v\right| \end{aligned}$$

    and

    $$\begin{aligned} \left| \frac{\partial }{\partial \kappa }\exp _{A}(u+tv-\kappa +\log _{A}p)\right| = \left| A(\exp _{A}(u+tv-\kappa +\log _{A}p))\right| \le \left\| A\right\| _{\infty }\ . \end{aligned}$$

    Furthermore, the partial derivative with respect to \(\kappa \) is never zero. By the implicit function theorem, the derivative \(\left( d\kappa /dt\right) _{t=0}\) exists, and it is the desired Gateaux derivative. Since \(\widetilde{q}(u)\) is positive and bounded, \(K_p\) is monotone and globally Lipschitz.

  2.

    Thanks to the convexity of \(\exp _{A}\) and the derivation formula, we have

    $$\begin{aligned} \exp _{A}(u+v-K_{p}(u+v)+\log _{A}p)\ge q+A(q)(v-(K_{p}(u+v)-K_{p}(u)))\ , \end{aligned}$$

    where \(q = \exp _A(u - K_p(u) + \log _A p)\). Taking the \(\mu \)-integral of both sides gives

    $$\begin{aligned} 0\ge \int vA(q)\ d\mu -(K_{p}(u+v)-K_{p}(u))\int A(q)\ d\mu \ . \end{aligned}$$

    Isolating the increment \(K_{p}(u+v)-K_{p}(u)\) yields the desired inequality. Therefore, \(\widetilde{q}(u)\) is a subgradient of \(K_{p}\) at u. From Item 1 we deduce that \(\widetilde{q}(u)\) is the unique subgradient, and further that \(\widetilde{q}(u)\) is the Gateaux differential of \(K_{p}\) at u.    \(\square \)
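The formula for the Gateaux derivative can be checked by finite differences on a finite sample space. In the following Python sketch (ours; all numeric values are made up, and A is N.J. Newton's example), the directional increment of \(K_p\) matches the escort expectation:

```python
import math

def A(x): return x / (1.0 + x)
def log_A(x): return math.log(x) + x - 1.0

def exp_A(y):                              # invert log_A by bisection
    lo, hi = 1e-300, 1.0
    while log_A(hi) < y:
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        lo, hi = (mid, hi) if log_A(mid) < y else (lo, mid)
    return 0.5 * (lo + hi)

mu = [0.25] * 4                            # uniform reference probability on 4 points
p  = [0.4, 0.8, 1.2, 1.6]                  # a positive density w.r.t. mu
u  = [1.0, -0.5, 0.3, 0.1]                 # base point in L^1(mu)
v  = [0.2, 0.4, -0.1, 0.6]                 # direction of differentiation

def K_p(w):                                # normalizing constant, by bisection
    def mass(k):
        return sum(m * exp_A(wi - k + log_A(pi)) for m, pi, wi in zip(mu, p, w))
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if mass(mid) < 1.0 else (mid, hi)
    return 0.5 * (lo + hi)

k = K_p(u)
q = [exp_A(ui - k + log_A(pi)) for pi, ui in zip(p, u)]
norm = sum(m * A(qi) for m, qi in zip(mu, q))
escort_q = [A(qi) / norm for qi in q]

# finite-difference derivative of K_p at u in direction v vs. the escort expectation
t = 1e-5
fd = (K_p([ui + t * vi for ui, vi in zip(u, v)]) - k) / t
expected = sum(m * vi * e for m, vi, e in zip(mu, v, escort_q))
assert abs(fd - expected) < 1e-3
```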

We can also establish Fréchet-differentiability of the functional, under more stringent assumptions.

Proposition 5

Let \(\alpha \ge 2.\)

  1.

    The superposition operator

    $$\begin{aligned} L^{\alpha }(\mu )\ni v\mapsto \exp _{A}(v+\log _{A}p)\in L^{1}(\mu ) \end{aligned}$$

    is continuously Fréchet differentiable with derivative

    $$\begin{aligned} d\exp _{A}(v)=(h\mapsto A(\exp _{A}(v+\log _{A}p))h)\in \mathscr {L} (L^{\alpha }(\mu ),L^{1}(\mu ))\ . \end{aligned}$$
  2.

    The functional \(K_{p}:L^{\alpha }(\mu )\rightarrow \mathbb {R}\), implicitly defined by the equation

    $$\begin{aligned} \int \exp _{A}(v-K_{p}(v)+\log _{A}p)\ d\mu =1,\quad v\in L^{\alpha }(\mu ) \end{aligned}$$

    is continuously Fréchet differentiable with derivative

    $$\begin{aligned} dK_{p}(v)=(h\mapsto \int h\widetilde{q}(v)\ d\mu )\ , \end{aligned}$$

    where \(\widetilde{q}(u) = {\text {escort}}\left( q(u)\right) \).

Proof

  1.

    Setting \(\beta =1\) in Proposition 1, we easily get the assertion. It remains to check that the Fréchet derivative is continuous, i.e., that it is a continuous map \(L^{\alpha }(\mu )\rightarrow \mathscr {L}(L^{\alpha }(\mu ),L^{1}(\mu ))\). If \(\Vert {h}\Vert _{L^{\alpha }(\mu )}\le 1\) and \(v,w\in L^{\alpha }(\mu )\), we have

    $$\begin{aligned}&\int \left| {(A[\exp _{A}(v+\log _{A}p)]- A[\exp _{A}(w+\log _{A}p)])h} \right| \ d\mu \\&\qquad \qquad \qquad \qquad \le \Vert {A[\exp _{A}(v+\log _{A}p)]-A[\exp _{A}(w+\log _{A}p)]}\Vert _{L^{\sigma }(\mu )}\ , \end{aligned}$$

    where \(\sigma =\alpha /\left( \alpha -1\right) \) is the conjugate exponent of \(\alpha \). On the other hand,

    $$\begin{aligned}&\Vert {A[\exp _{A}(v+\log _{A}p)]-A[\exp _{A}(w+\log _{A}p)]}\Vert _{L^{\sigma }(\mu )} \\\le & {} \left\| A^{\prime }\right\| _{\infty }\left\| A\right\| _{\infty }\left\| v-w\right\| _{L^{\sigma }(\mu )} \end{aligned}$$

    and so the map \(L^{\alpha }(\mu )\rightarrow \mathscr {L}(L^{\alpha }(\mu ),L^{1}(\mu ))\) is continuous whenever \(\alpha \ge \sigma ,\) i.e., \(\alpha \ge 2\).

  2.

    Fréchet differentiability of \(K_{p}\) is a consequence of the Implicit Function Theorem in Banach spaces, see [6], applied to the \(C^{1}\)-mapping

    $$\begin{aligned} L^{\alpha }(\mu )\times \mathbb {R}\ni (v,\kappa )\mapsto \int \exp _{A}(v-\kappa +\log _{A}p)\ d\mu \ . \end{aligned}$$

    The value of the derivative is given by Proposition 4.    \(\square \)

4 Deformed Divergence

In analogy with the standard exponential case, define the A-divergence between probability densities as

$$\begin{aligned} D_{A}(q\Vert p)=\int \left( \log _{A}q-\log _{A}p\right) {\text {escort}}\left( q\right) \text { } d\mu , \ \text { for }q,p\in \mathscr {P} \ . \end{aligned}$$

Since \(\log _A\) is strictly concave with derivative \(1/A\), we have

$$\begin{aligned} \log _{A}\left( x\right) \le \log _{A}\left( y\right) +\frac{1}{A\left( y\right) }\left( x-y\right) \end{aligned}$$

for all \(x,y>0\) and with equality if, and only if, \(x=y.\) Hence

$$\begin{aligned} A\left( y\right) \left( \log _{A}\left( y\right) -\log _{A}\left( x\right) \right) \ge y-x\ . \end{aligned}$$
(9)

It follows in particular that \(D_{A}(\cdot \Vert \cdot )\) is a well-defined, possibly extended-valued, function.

Observe further that by Proposition 2, \(\log _{A}q-\log _{A}p \in L^{1}\left( \mu \right) \), and so \(D_{A}(q\Vert p)<\infty \), whenever \(q = q(u)\).

The binary relation \(D_{A}\) is a faithful divergence in that it satisfies the following Gibbs’ inequality.

Proposition 6

It holds that \(D_{A}(q\Vert p)\ge 0\), and \(D_{A}(q\Vert p)=0\) if and only if \(p=q\).

Proof

From inequality (9) it follows that

$$\begin{aligned} D_{A}(q\Vert p)&=\frac{1}{\int A\left( q\right) d\mu }\int \left( \log _{A}q-\log _{A}p\right) A\left( q\right) \ d\mu \\&\ge \frac{1}{\int A\left( q\right) d\mu }\int \left( q-p\right) \text { } \ d\mu =0. \end{aligned}$$

Moreover, equality holds if and only if \(p=q\) \(\mu \)-a.e.    \(\square \)
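The definitions above can be exercised numerically. The sketch below is ours, not the paper's: it instantiates the framework on a three-point sample space with the hypothetical choice \(A(x)=x/(1+x)\), which gives \(\log _A(x)=\log x + x - 1\) since \((\log _A)'(x)=1/A(x)\), and checks inequality (9) on a grid together with the Gibbs inequality of Proposition 6. All helper names (`escort`, `D`) are ad hoc.

```python
import math

# Hypothetical deformation for illustration only (not the paper's general A):
# A(x) = x/(1+x), hence log_A(x) = log(x) + x - 1, because (log_A)'(x) = 1/A(x).
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

# Three-point sample space with uniform reference measure mu.
w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]   # strictly positive densities: sum(d_i * w_i) = 1
q = [1.2, 1.5, 0.3]

def escort(d):
    # escort(d) = A(d) / int A(d) dmu
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def D(a, b):
    # D_A(a || b) = int (log_A a - log_A b) escort(a) dmu
    at = escort(a)
    return sum((log_A(ai) - log_A(bi)) * ti * wi
               for ai, bi, ti, wi in zip(a, b, at, w))

# Inequality (9): A(y) * (log_A(y) - log_A(x)) >= y - x, checked on a grid.
grid = [0.1, 0.5, 1.0, 2.0, 7.0]
ineq9_ok = all(A(y) * (log_A(y) - log_A(x)) >= (y - x) - 1e-12
               for x in grid for y in grid)

gibbs = D(q, p)      # strictly positive since q != p
self_div = D(p, p)   # vanishes on the diagonal
```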

There are alternative definitions that are also valid candidates for a divergence measure. For instance:

$$\begin{aligned}\ I_{A}(q\Vert p)=-\int \log _A(p/q) q \ d\mu . \end{aligned}$$

or also

$$\begin{aligned}\ \widetilde{D}_{A}(q\Vert p)=\int A(q/p)\log _A(p/q) p \ d\mu . \end{aligned}$$

By means of the concavity of \(\log _A\), it is not difficult to check that both satisfy the Gibbs condition of Proposition 6, and that both reduce to the Kullback–Leibler functional in the non-deformed case. Observe further that the functional \(I_{A}(q\Vert p)\) is closely related to Tsallis' divergence (see [25] and also [14]). In fact, if one replaces \(\log _A\) with the q-logarithm, one gets exactly Tsallis' q-divergence.
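Under the same illustrative assumptions (three-point sample space, hypothetical \(A(x)=x/(1+x)\)), a quick numerical check of the Gibbs condition for both alternative functionals:

```python
import math

# Hypothetical deformation for illustration: A(x) = x/(1+x), so
# log_A(x) = log(x) + x - 1; densities on a three-point space, names ours.
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]
q = [1.2, 1.5, 0.3]

def I_div(a, b):
    # I_A(a || b) = -int log_A(b/a) a dmu
    return -sum(log_A(bi / ai) * ai * wi for ai, bi, wi in zip(a, b, w))

def D_tilde(a, b):
    # tilde-D_A(a || b) = int A(a/b) log_A(b/a) b dmu
    return sum(A(ai / bi) * log_A(bi / ai) * bi * wi
               for ai, bi, wi in zip(a, b, w))

i_qp, dt_qp = I_div(q, p), D_tilde(q, p)     # positive for q != p
i_pp, dt_pp = I_div(p, p), D_tilde(p, p)     # both vanish on the diagonal
```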

However, our formulation of the divergence is motivated by the structure of the deformed exponential representation. As will now be seen, our definition of divergence is better adapted to the present setting, and it turns out to be closely related to the normalizing operator.

In the equation

$$\begin{aligned} q = \exp _{A}(u-K_p(u)+\log _{A}p),\quad u\in L^{1}(\mu )\ , \ q \in \mathscr {P} \ , \end{aligned}$$
(10)

the random variable u is identified up to an additive constant for any fixed density q. There are at least two options for selecting an interesting representative member in the equivalence class.

One option is to impose the further condition \(\int u\widetilde{p}\ d\mu =0\), where \(\widetilde{p} = {\text {escort}}\left( p\right) \); the integral is well defined because the escort density is bounded. This restriction provides a unique element \(u_q\). On the other hand, solving Eq. (10) with respect to \(u-K_p(u)\), we get the desired relation:

$$\begin{aligned} K_{p}(u_q)={E}_{\widetilde{p}}\left[ \log _{A}p-\log _{A}q\right] =D_{A}(p\Vert q), \end{aligned}$$
(11)

where \(u=u_q\) is uniquely characterized by the two equations: \({E}_{\widetilde{p}}\left[ u\right] =0\) and \(q=\exp _{A}(u-K_p(u)+\log _{A}p)\).

Observe further that Eq. (11) entails the relation

$$\begin{aligned} K_{p}(u) = D_{A}(p\Vert q(u))\quad \text {for all } u \in L^{1}\left( \mu \right) \text { with } {E}_{\widetilde{p}}\left[ u\right] =0. \end{aligned}$$

The previous choice is the one followed in the construction of the non-parametric exponential manifold; see [22, 23].
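Equation (11) can be tested numerically. In the sketch below (same hypothetical \(A(x)=x/(1+x)\); the bisection helpers `exp_A` and `K` are ours), the representative \(u_q=\log _A q-\log _A p+D_A(p\Vert q)\) is centered with respect to \(\widetilde{p}\), and the implicitly defined \(K_p(u_q)\) recovers \(D_A(p\Vert q)\):

```python
import math

# Hypothetical deformation for illustration: A(x) = x/(1+x), so
# log_A(x) = log(x) + x - 1; all helpers are ours, not the paper's.
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

def exp_A(t):
    # Invert the strictly increasing log_A by bisection.
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_A(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]
q = [1.2, 1.5, 0.3]

def escort(d):
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def D(a, b):
    at = escort(a)
    return sum((log_A(ai) - log_A(bi)) * ti * wi
               for ai, bi, ti, wi in zip(a, b, at, w))

def K(base, u):
    # Solve int exp_A(u - kappa + log_A base) dmu = 1 for kappa by bisection
    # (the integral is decreasing in kappa).
    lo, hi = -20.0, 20.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        s = sum(exp_A(ui - mid + log_A(bi)) * wi
                for ui, bi, wi in zip(u, base, w))
        if s > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d_pq = D(p, q)                                                  # D_A(p || q)
u_q = [log_A(qi) - log_A(pi) + d_pq for qi, pi in zip(q, p)]    # representative
p_tilde = escort(p)
mean_u = sum(ui * ti * wi for ui, ti, wi in zip(u_q, p_tilde, w))  # ~ 0
k_val = K(p, u_q)                                # Eq. (11): equals D_A(p || q)
```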

In the non-deformed case, Eq. (11) yields the Kullback–Leibler divergence with p and q exchanged with respect to what is considered more natural in Statistical Physics; see, for example, the comments in [13].

For this reason, we make another choice of the random variable in the equivalence class. More specifically, in Eq. (10) the random variable u will now be centered with respect to \(\widetilde{q} = {\text {escort}}\left( q\right) \), i.e., \({E}_{\widetilde{q}}\left[ u\right] =0\).

To avoid confusion, let us rewrite Eq. (10) as follows, where for convenience the function \(K_p\) is replaced with \(H_p=-K_p\):

$$\begin{aligned} q=\exp _{A}(v+H_{p}(v)+\log _{A}p), \quad v\in L^{1}(\mu ),\quad {E}_{ \widetilde{q}}\left[ v\right] =0, \end{aligned}$$
(12)

so that

$$\begin{aligned} H_{p}(v_q) ={E}_{\widetilde{q}}\left[ \log _{A}q-\log _{A}p\right] =D_{A}(q\Vert p), \end{aligned}$$

where \(v=v_q\) is the solution to the two equations \({E}_{\widetilde{q}}\left[ v\right] =0\) and \(q=\exp _{A}(v+H_{p}(v)+\log _{A}p)\). There are hence two notable representations of the same probability density q:

$$\begin{aligned} q=\exp _A(u - K_p(u) + \log _A p) = \exp _A(v + H_p(v) + \log _A p) \end{aligned}$$

which implies \(u_q - v_q = K_p(u_q) + H_p(v_q)\). This, in turn, leads to

$$\begin{aligned} - {E}_{\widetilde{p}}\left[ v_q \right] = {E}_{\widetilde{q}}\left[ u_q \right] = K_p(u_q)+H_p(v_q)=K_p(u_q)-K_p(v_q). \end{aligned}$$

This provides the following remarkable relation

$$\begin{aligned} H_p(v_q)={E}_{\widetilde{q}}\left[ u_q \right] \ -\ K_p(u_q). \end{aligned}$$
(13)
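Relation (13) admits the same kind of numerical check: with \(v_q=\log _A q-\log _A p-D_A(q\Vert p)\), one verifies \({E}_{\widetilde{q}}\left[ v_q\right] =0\) and \(D_A(q\Vert p)={E}_{\widetilde{q}}\left[ u_q\right] -K_p(u_q)\). A self-contained sketch under the same hypothetical choice \(A(x)=x/(1+x)\) (helpers ours):

```python
import math

# Hypothetical deformation for illustration: A(x) = x/(1+x), so
# log_A(x) = log(x) + x - 1; helpers are ours.
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

def exp_A(t):
    # Invert the strictly increasing log_A by bisection.
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_A(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]
q = [1.2, 1.5, 0.3]

def escort(d):
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def D(a, b):
    at = escort(a)
    return sum((log_A(ai) - log_A(bi)) * ti * wi
               for ai, bi, ti, wi in zip(a, b, at, w))

def K(base, u):
    # Solve int exp_A(u - kappa + log_A base) dmu = 1 for kappa by bisection.
    lo, hi = -20.0, 20.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        s = sum(exp_A(ui - mid + log_A(bi)) * wi
                for ui, bi, wi in zip(u, base, w))
        if s > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

d_pq, d_qp = D(p, q), D(q, p)
u_q = [log_A(qi) - log_A(pi) + d_pq for qi, pi in zip(q, p)]
v_q = [log_A(qi) - log_A(pi) - d_qp for qi, pi in zip(q, p)]
q_tilde = escort(q)
mean_v = sum(vi * ti * wi for vi, ti, wi in zip(v_q, q_tilde, w))   # ~ 0
mean_uq = sum(ui * ti * wi for ui, ti, wi in zip(u_q, q_tilde, w))  # E_qtilde[u_q]
k_uq = K(p, u_q)
rel13_err = abs(d_qp - (mean_uq - k_uq))    # Eq. (13): H_p(v_q) = E[u_q] - K_p(u_q)
```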

4.1 Variational Formula

We now present a variational formula in the spirit of the classical one by Donsker–Varadhan. The next proposition provides the convex conjugate of \(K_{p}\) in the duality \(L^{\infty }(\mu )\times L^{1}(\mu )\).

In what follows, the operator \(\eta \mapsto \hat{\eta }\) denotes the inverse of the escort operator, i.e., \(\eta = {\text {escort}}\left( \hat{\eta }\right) \). In light of the results established in Sect. 3.1, this operator maps a dense subset of \(\overline{\mathscr {P}}\cap L^{\infty }(\mu )\) onto \(\overline{\mathscr {P}}\).

Proposition 7

  1.

    The convex conjugate function of \(K_{p}\):

    $$\begin{aligned} K_{p}^{*}\left( w\right) =\sup _{u\in L^{1}(\mu )}\left( \int wu\ d\mu -K_{p}\left( u\right) \right) , \quad w\in L^{\infty }(\mu ) \end{aligned}$$
    (14)

    has domain contained in \(\overline{\mathscr {P}}\cap L^{\infty }(\mu )\). More precisely,

    $$\begin{aligned} {\text {escort}}\left( \mathscr {P}\right) \subseteq {\text {dom}}\,K_{p}^{*}\subseteq \overline{\mathscr {P}}\cap L^{\infty }(\mu ). \end{aligned}$$
  2.

    \(K_{p}^{*}\left( w\right) \ge 0\) for all \(w \in L^{\infty }(\mu )\). For any \(\eta \in {\text {escort}}\left( \mathscr {P}\right) \), the conjugate \(K_p^*(\eta )\) is given by the Legendre transform:

    $$\begin{aligned} K_p^*(\eta ) = \int \eta \ u_{\hat{\eta }} \ d\mu - K_p(u_{\hat{\eta }}) \ . \end{aligned}$$

    It follows that \(K_p^*(\eta ) = H_p(v_{\hat{\eta }}) = D_A(\hat{\eta }\Vert p)\); equivalently:

    $$\begin{aligned} K_p^*({\text {escort}}\left( q\right) ) = D_A(q\Vert p) \quad \forall p,q \in \mathscr {P} \ . \end{aligned}$$
  3.

    The following inversion formula holds:

    $$\begin{aligned} K_{p}\left( u\right) =\max _{\eta \in {\text {escort}}\left( \mathscr {P}\right) }\left( \int \eta u\ d\mu - D_A(\hat{\eta }\Vert p) \right) \\ = \max _{q\in \mathscr {P}} \left( \int {\text {escort}}\left( q\right) u\ d\mu - D_A(q\Vert p) \right) ,\quad \forall u \in L^{1}(\mu ). \end{aligned}$$

Proof

  1.

    This follows from the fact that \(K_{p}\) is monotone and translation invariant. Let us first suppose \(w\notin L^{\infty }_+(\mu )\). This means that

    $$\begin{aligned} \int w \chi _C \ d\mu < 0 \end{aligned}$$

    is true for some indicator function \(\chi _C\). If we consider the cone generated by the function \(-\chi _C\), we can write

    $$\begin{aligned} K_{p}^{*}\left( w\right) \ge \sup _{u\in {\text {cone}}(-\chi _C)}\left( \int wu\ d\mu -K_{p}\left( u\right) \right) \ge \sup _{u\in {\text {cone}}(-\chi _C)}\int wu\ d\mu = +\infty , \end{aligned}$$

    since \(K_{p}\left( u\right) \le 0\) when \(u\in {\text {cone}}(-\chi _C)\). Now consider the case \(w \ge 0\). If we set \(u = \lambda \in \mathbb {R}\), we have \(K_p(\lambda ) = \lambda \) and consequently

    $$\begin{aligned} K_p^*(w) \ge \sup _{\lambda \in \mathbb {R}}\left( \lambda \int w\ d\mu -\lambda \right) \ . \end{aligned}$$
    (15)

    This \(\sup \) is \(+\infty \), unless \(\int w \ d\mu = 1\). Hence, \(K_{p}^{*}\left( w\right) <\infty \) implies \(w\in \overline{\mathscr {P}}\). Summarizing, the domain of \(K_{p}^{*}\) is contained in \(\overline{\mathscr {P}}\cap L^{\infty }(\mu )\), which proves one of the two claimed inclusions. The other one is a direct consequence of the next point.

  2.

    Equation (15) implies \(K_{p}^{*}\ge 0\). By Proposition 4, the concave and Gâteaux differentiable function \(u \mapsto \int \eta u \ d\mu - K_p(u)\) has derivative at u given by \(\eta - dK_p(u) = \eta - {\text {escort}}\left( q(u)\right) \), where \(q(u) = \exp _A(u - K_p(u) + \log _Ap)\). Under our assumptions, the derivative vanishes at \(u=u_{\hat{\eta }}\), and the \(\sup \) in the definition of \(K_p^*\) is attained at that point. Setting \(u=u_{\hat{\eta }}\), the maximum value is \(K_p^*(\eta ) = \int \eta u_{\hat{\eta }} \ d\mu - K_p(u_{\hat{\eta }})\). The last formula follows directly from Eq. (13).

  3.

    By a well-known property of Fenchel–Moreau duality, we have:

    $$\begin{aligned} K_{p}\left( u\right) \ge \int wu\ d\mu - K_p^*(w) \quad \forall u \in L^1 (\mu ), \quad \forall w \in L^{\infty }(\mu ) \\ K_{p}\left( u\right) = \int wu\ d\mu - K_p^*(w) \iff w \in \partial K_{p}\left( u\right) . \end{aligned}$$

    Clearly in our case \(\partial K_{p}\left( u\right) \) is a singleton and the image of \(\partial K_{p}\) is the set \({\text {escort}}\left( \mathscr {P}\right) \). Therefore

    $$\begin{aligned} K_{p}\left( u\right) = \max _{w \in {\text {escort}}\left( \mathscr {P}\right) }\left( \int wu\ d\mu - K_p^*(w) \right) . \end{aligned}$$

    By Item 2, the desired inversion formula follows.    \(\square \)
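The inversion formula of Item 3 can be probed numerically: for any candidate density, the bracketed expression stays below \(K_p(u)\), and the maximizer \(q(u)\) attains it. A sketch under the same hypothetical choice \(A(x)=x/(1+x)\) (all helpers ours):

```python
import math

# Hypothetical deformation for illustration: A(x) = x/(1+x), so
# log_A(x) = log(x) + x - 1; helpers are ours.
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

def exp_A(t):
    # Invert the strictly increasing log_A by bisection.
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_A(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]
q = [1.2, 1.5, 0.3]

def escort(d):
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def D(a, b):
    at = escort(a)
    return sum((log_A(ai) - log_A(bi)) * ti * wi
               for ai, bi, ti, wi in zip(a, b, at, w))

def K(base, u):
    # Solve int exp_A(u - kappa + log_A base) dmu = 1 for kappa by bisection.
    lo, hi = -20.0, 20.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        s = sum(exp_A(ui - mid + log_A(bi)) * wi
                for ui, bi, wi in zip(u, base, w))
        if s > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

u = [0.4, -0.2, 0.1]
k_u = K(p, u)
q_u = [exp_A(ui - k_u + log_A(pi)) for ui, pi in zip(u, p)]  # q(u)
norm_qu = sum(qi * wi for qi, wi in zip(q_u, w))             # ~ 1

def value(cand):
    # int escort(cand) u dmu - D_A(cand || p)
    ct = escort(cand)
    return sum(ci * ui * wi for ci, ui, wi in zip(ct, u, w)) - D(cand, p)

candidates = [q, [0.3, 1.8, 0.9], [1.0, 1.0, 1.0]]
all_below = all(value(c) <= k_u + 1e-8 for c in candidates)  # never exceeds K_p(u)
gap_at_max = abs(value(q_u) - k_u)                           # attained at q(u)
```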

5 Hilbert Bundle Based on \(\exp _A\)

We now introduce the Hilbert manifold of probability densities as defined in [18, 19], in a slightly more general set-up than the one used in those references. By means of a general A-function, we provide an atlas of charts and define a linear bundle as an expression of the tangent space.

Let \(\mathscr {P}(\mu )\) denote the set of all \(\mu \)-densities on the probability space \((\mathbb {X},\mathscr {X},\mu )\) of the kind

$$\begin{aligned} q=\exp _{A}(u-K_{1}(u)),\quad u\in L^{2}(\mu ),\quad {E}_{\mu }\left[ u \right] =0 \ . \end{aligned}$$
(16)

Notice that \(1\in \mathscr {P}(\mu )\) because we can take \(u=0\).

Proposition 8

  1.

    \(\mathscr {P}(\mu )\) is the set of all densities q such that \(\log _{A}q\in L^{2}(\mu )\), in which case \(u = \log _A q - {E}_{\mu }\left[ \log _A q\right] \).

  2.

    If in addition \(A'(0+)>0\), then \(\mathscr {P}(\mu )\) is the set of all densities q such that both q and \(\log q\) are in \(L^{2}(\mu )\).

  3.

    Let \(A'(0+) > 0\). On a product space with reference probability measures \(\mu _1\) and \(\mu _2\), and densities respectively \(q_1\) and \(q_2\), we have \(q_1 \in \mathscr {P}(\mu _1)\) and \(q_2 \in \mathscr {P}(\mu _2)\) if, and only if, \(q_1 \otimes q_2 \in \mathscr {P}(\mu _1 \otimes \mu _2)\).

Proof

  1.

    From Eq. (16) it follows that \(\log _A q = u - K_1(u) \in L^2(\mu )\), since \(u\in L^{2}(\mu )\). Conversely, let \(\log _{A}q\in L^{2}(\mu )\). Equation (16) yields

    $$\begin{aligned} u = \log _A q + K_1(u) \quad \text {and}\quad K_1(u)=- {E}_{\mu }\left[ \log _A q\right] . \end{aligned}$$

    Therefore \(u = \log _A q - {E}_{\mu }\left[ \log _A q\right] \) and \(u\in L^{2}(\mu )\).

  2.

    Write

    $$\begin{aligned} \left| \log _A q\right| ^2 = \left| \log _A q\right| ^2 (q < 1) + \left| \log _A q\right| ^2 (q \ge 1) \, \end{aligned}$$

    and use the bounds of Eqs. (3) and (4) to get

    $$\begin{aligned}&{E}_{\mu }\left[ \left| \log _A q\right| ^2\right] \le \frac{1}{\alpha _2^2} {E}_{\mu }\left[ \left| \log q\right| ^2 (q < 1)\right] + \frac{1}{A(1)^2}{E}_{\mu }\left[ \left| q-1\right| ^2 (q \ge 1)\right] \le \\&\qquad \qquad \qquad \qquad \qquad \qquad \qquad \qquad \frac{1}{\alpha _2^2} {E}_{\mu }\left[ \left| \log q\right| ^2\right] + \frac{1}{A(1)^2} {E}_{\mu }\left[ q^2\right] \ . \end{aligned}$$

    We deduce that the two conditions q and \(\log q\) in \(L^{2}(\mu )\) imply \(\log _{A}q\in L^{2}(\mu )\). Conversely, let \(\log _{A}q\in L^{2}(\mu )\). By means of the other two bounds (recall that \(\alpha _1 > 0\)) we also have

    $$\begin{aligned} {E}_{\mu }\left[ \left| \log _A q\right| ^2\right] \ge \frac{1}{\alpha _1^2} {E}_{\mu }\left[ \left| \log q\right| ^2(q<1)\right] + \frac{1}{\left\| A\right\| _{\infty }^2}{E}_{\mu }\left[ (q-1)^2(q \ge 1)\right] \ . \end{aligned}$$

    Consequently, \({E}_{\mu }\left[ (q-1)^2(q \ge 1)\right] < + \infty \). This in turn gives \({E}_{\mu }\left[ (q-1)^2\right] < + \infty \), and so \(q \in L^{2}(\mu )\).

    Once again, the previous inequality provides the condition \({E}_{\mu }\left[ \left| \log q\right| ^2(q<1)\right] < +\infty \). On the other hand, \({E}_{\mu }\left[ \left| \log q\right| ^2(q \ge 1)\right] < +\infty \) since \(\left| \log q\right| ^2(q \ge 1) \le (q-1)^2 (q \ge 1)\). Therefore, \(\log q \in L^{2}(\mu )\).

  3.

    We deduce by the previous item that: \(q_1 \otimes q_2 \in \mathscr {P}(\mu _1 \otimes \mu _2)\) if and only if both \(q_1 \otimes q_2\) and \(\log (q_1 \otimes q_2)\) are in \(L^2(\mu _1\otimes \mu _2)\).

    The first condition is equivalent to both \(q_1 \in L^2(\mu _1)\) and \(q_2 \in L^2(\mu _2)\). The second one is equivalent to \(\log q_1 + \log q_2 \in L^2(\mu _1\otimes \mu _2)\). On the other hand, we have

    $$\begin{aligned}&{E}_{\mu _1 \otimes \mu _2}\left[ (\log q_1 + \log q_2)^2\right] = \nonumber \\&\qquad \qquad \qquad {E}_{\mu _1}\left[ \log ^2 q_1\right] + {E}_{\mu _2}\left[ \log ^2 q_2\right] + 2 \ {{E}_{\mu _1}\left[ \log q_1\right] } {{E}_{\mu _2}\left[ \log q_2\right] }. \end{aligned}$$
    (17)

    By Eq. (17), \(q_1 \in \mathscr {P}(\mu _1)\) and \(q_2 \in \mathscr {P}(\mu _2)\) imply \(q_1 \otimes q_2 \in \mathscr {P}(\mu _1 \otimes \mu _2)\). Conversely, assume \(q_1 \otimes q_2 \in \mathscr {P}(\mu _1 \otimes \mu _2)\), so that \({E}_{\mu _1 \otimes \mu _2}\left[ (\log q_1 + \log q_2)^2\right] < + \infty \). Since \({E}_{\mu _i}\left[ \log q_i\right] \le {E}_{\mu _i}\left[ q_i-1\right] = 0\), we have \({{E}_{\mu _1}\left[ \log q_1\right] } {{E}_{\mu _2}\left[ \log q_2\right] } \ge 0\). In view of Eq. (17), we infer that \(q_1 \in \mathscr {P}(\mu _1)\) and \(q_2 \in \mathscr {P}(\mu _2)\).    \(\square \)

We now proceed to define a Hilbert bundle with base \(\mathscr {P}(\mu )\). The notion of Hilbert bundle was introduced in Information Geometry by [1]. We use here an adaptation to the A-exponential of arguments elaborated in [8, 21]. Notice that the construction depends in an essential way on the specific conditions we are assuming for the present class of deformed exponentials.

At each \(q \in \mathscr {P}(\mu )\) the escort density \(\widetilde{q}\) is bounded, so that we can define the fiber given by the Hilbert space

$$\begin{aligned} \mathscr {H}_{q}=\left\{ u\in L^{2}(\mu )|{E}_{\widetilde{q}}\left[ u\right] =0\right\} \end{aligned}$$

with scalar product \(\left\langle u,v\right\rangle _{q}=\int uv\ d\mu \). The Hilbert bundle is

$$\begin{aligned} H\mathscr {P}(\mu )=\left\{ (q,u)|q\in \mathscr {P}(\mu ),u\in \mathscr {H}_{q}\right\} \ . \end{aligned}$$

For each \(p,q\in \mathscr {P}(\mu )\) the mapping \(\mathbb {U}_{p}^{q}u=u-{E}_{\widetilde{q}}\left[ u\right] \) is a continuous linear mapping from \(\mathscr {H}_{p}\) to \(\mathscr {H}_{q}\). Moreover, \(\mathbb {U}_{q}^{r}\mathbb {U}_{p}^{q}=\mathbb {U}_{p}^{r}\). In particular, \(\mathbb {U}_{q}^{p}\mathbb {U}_{p}^{q}\) is the identity on \(\mathscr {H}_{p}\) and so \(\mathbb {U}_{p}^{q}\) is an isomorphism of \(\mathscr {H}_{p}\) onto \(\mathscr {H}_{q}\).
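The transport \(\mathbb {U}_{p}^{q}\) and its composition law are elementary to check numerically; note that the formula depends only on the target density. A sketch under the same hypothetical choice \(A(x)=x/(1+x)\) (helpers ours):

```python
# Hypothetical deformation for illustration: A(x) = x/(1+x); three-point
# sample space with uniform reference measure; all names are ours.
def A(x):
    return x / (1.0 + x)

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]
q = [1.2, 1.5, 0.3]
r = [0.9, 0.3, 1.8]

def escort(d):
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def transport(target, u):
    # U_p^q u = u - E_{escort(target)}[u]; depends only on the target density.
    t = escort(target)
    m = sum(ui * ti * wi for ui, ti, wi in zip(u, t, w))
    return [ui - m for ui in u]

u_p = transport(p, [0.7, -0.4, 0.2])           # an element of H_p
comp = transport(r, transport(q, u_p))         # U_q^r U_p^q
direct = transport(r, u_p)                     # U_p^r
back = transport(p, transport(q, u_p))         # U_q^p U_p^q, identity on H_p
err_comp = max(abs(a - b) for a, b in zip(comp, direct))
err_back = max(abs(a - b) for a, b in zip(back, u_p))
```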

In the next proposition an affine atlas of charts is constructed in order to define our Hilbert bundle, which is an expression of the tangent bundle. The velocity of a curve \(t \mapsto p(t) \in \mathscr {P}(\mu )\) is given in the Hilbert bundle by the so-called A-score which, in our case, takes the form \(A(p(t))^{-1} \dot{p}(t)\), where \(\dot{p}(t)\) is computed in \(L^1(\mu )\).

The following proposition is taken from [16] where a detailed proof is presented.

Proposition 9

  1.

    Fix \(p\in \mathscr {P}(\mu )\). A positive density q belongs to \(\mathscr {P}(\mu )\) if and only if

    $$\begin{aligned} q=\exp _{A}(u-K_{p}(u)+\log _{A}p),\text { with } u\in L^{2}(\mu )\, \text {and}\, {E}_{\widetilde{p}}\left[ u\right] =0. \end{aligned}$$
  2.

    For any fixed \(p\in \mathscr {P}(\mu )\) the mapping \(s_p :\mathscr {P}(\mu )\rightarrow \mathscr {H}_{p}\) defined by

    $$\begin{aligned} q\mapsto \log _{A}q-\log _{A}p+D_{A}(p\Vert q) \end{aligned}$$

    is injective and surjective, with inverse \(e_{p}(u)=\exp _{A}(u-K_{p}(u)+\log _{A}p)\).

  3.

    The atlas \(\left\{ s_{p}|p\in \mathscr {P}(\mu )\right\} \) is affine with transitions

    $$\begin{aligned} s_{q}\circ e_{p}(u)=\mathbb {U}_{p}^{q}u+s_{p}(q)\ . \end{aligned}$$
  4.

    The velocity of the differentiable curve \(t\mapsto p(t)\in \mathscr {P}(\mu )\) in the chart \(s_{p}\) is \(ds_{p}(p(t))/dt\in \mathscr {H}_{p}\). Conversely, given any \(u\in \mathscr {H}_{p}\), the curve

    $$\begin{aligned} p:t\mapsto \exp _{A}(tu-K_{p}(tu)+\log _{A}p) \end{aligned}$$

    satisfies \(p(0)=p\) and has velocity u at \(t=0\), expressed in the chart \(s_{p}\). If the velocity of a curve is \(t\mapsto \dot{u}(t)\), in a chart \(s_{p}\), then \(\mathbb {U}_{p}^{q} \dot{u}(t)\) is its velocity in the chart \(s_{q}\).

  5.

    If \(t\mapsto p(t)\in \mathscr {P}(\mu )\) is differentiable with respect to the atlas then it is differentiable as a mapping in \(L^{1}(\mu )\). It follows that the A-score is well-defined and is the expression of the velocity of the curve \(t\mapsto p(t)\) in the moving chart \(t\mapsto s_{p(t)}\).
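The identification of the A-score with the chart velocity can be checked by finite differences: along the curve of Item 4 with \(p(0)=p\), the quantity \(A(p(0))^{-1}\dot{p}(0)\) reproduces u. A sketch under the same hypothetical choice \(A(x)=x/(1+x)\) (all helpers ours):

```python
import math

# Hypothetical deformation for illustration: A(x) = x/(1+x), so
# log_A(x) = log(x) + x - 1; helpers are ours.
def A(x):
    return x / (1.0 + x)

def log_A(x):
    return math.log(x) + x - 1.0

def exp_A(t):
    # Invert the strictly increasing log_A by bisection.
    lo, hi = 1e-12, 1e12
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if log_A(mid) < t:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

w = [1.0 / 3.0] * 3
p = [0.6, 0.9, 1.5]

def escort(d):
    z = sum(A(di) * wi for di, wi in zip(d, w))
    return [A(di) / z for di in d]

def K(base, u):
    # Solve int exp_A(u - kappa + log_A base) dmu = 1 for kappa by bisection.
    lo, hi = -20.0, 20.0
    for _ in range(80):
        mid = 0.5 * (lo + hi)
        s = sum(exp_A(ui - mid + log_A(bi)) * wi
                for ui, bi, wi in zip(u, base, w))
        if s > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

p_tilde = escort(p)
raw = [0.5, -0.1, -0.3]
m = sum(ui * ti * wi for ui, ti, wi in zip(raw, p_tilde, w))
u = [ui - m for ui in raw]    # u in H_p: E_{escort(p)}[u] = 0

def curve(t):
    # p(t) = exp_A(t u - K_p(t u) + log_A p), as in Item 4.
    ku = K(p, [t * ui for ui in u])
    return [exp_A(t * ui - ku + log_A(pi)) for ui, pi in zip(u, p)]

h = 1e-4
ph, pm = curve(h), curve(-h)
# A-score at t = 0 by central differences: A(p)^{-1} pdot(0) ~ u.
score = [(a - b) / (2.0 * h) / A(pi) for a, b, pi in zip(ph, pm, p)]
score_err = max(abs(si - ui) for si, ui in zip(score, u))
```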

We end our discussion of the geometry of the Hilbert bundle here, because our aim is limited to showing the applicability of the analytic results obtained in the previous sections. A detailed discussion of the relevant geometric objects, e.g., the affine covariant derivative, is not attempted here.

6 Final Remarks

A non-parametric Hilbert manifold based on a deformed exponential representation of positive densities was first introduced by N.J. Newton [18, 19]. We have derived regularity properties of the normalizing functional \(K_p\) and discussed the relevant Fenchel conjugation. In particular, we have discussed some properties of the escort mapping and a form of the divergence that appears to be especially adapted to our set-up. We have taken a path different from that of N.J. Newton's original presentation, in that we allow for a manifold defined by an atlas containing charts centered at each density in the model. Finally, we have discussed explicitly a version of the Hilbert bundle as a family of codimension-1 subspaces of the basic Hilbert space.