1 Introduction

Heavy-tailed probability distributions are important objects in anomalous statistical physics (cf. [11, 15]). Such probability distributions do not have expectations in general. Therefore, the notion of an escort distribution was introduced [4] in order to suitably down-weight the heavy tails. Consequently, a modified expectation exists for such probability distributions.

For a deformed exponential family, an escort distribution is given by the derivative of a deformed exponential function. Therefore, the first named author considered further generalizations of escort distributions. In the q-exponential case, he introduced a sequential structure of escort distributions [7].

In this paper, we consider a sequential structure of escort distributions on a deformed exponential family. It is known that a deformed exponential family naturally has at least three kinds of different statistical manifold structures [8]. We elucidate relations between these statistical manifold structures and the structures derived from the sequence of escort expectations. Consequently, we find that dually flat structures and generalized conformal structures for statistical manifolds naturally arise in this framework.

2 Deformed Exponential Families

Throughout this paper, we assume that all the objects are smooth. In this section, we summarize foundations of deformed exponential functions and deformed exponential families. For further details, see [11].

Let \(\chi \) be a strictly increasing function from \(\mathbf {R}_{++}\) to \(\mathbf {R}_{++}\). We call this function \(\chi \) a deformation function. Using a deformation function, we define a \(\chi \)-exponential function \(\exp _{\chi }t\) (or a deformed exponential function) as the solution, with \(\exp _{\chi }0 = 1\), of the following nonlinear differential equation

$$ \frac{d}{dt} \exp _{\chi }t = \chi (\exp _{\chi }t). $$

The inverse of a \(\chi \)-exponential function is called a \(\chi \)-logarithm function or a deformed logarithm function, and it is given by

$$ \ln _{\chi } s := \int _1^s\frac{1}{\chi (t)}dt. $$

If the deformation function is a power function \(\chi (t) = t^q \ (q>0, q\ne 1)\), the deformed exponential and the deformed logarithm are given by

$$\begin{aligned} \exp _q t&:= \left( 1+(1-q)t\right) ^{\frac{1}{1-q}},&(1+(1-q)t>0), \\ \ln _q s&:= \frac{s^{1-q}-1}{1-q},&(s>0), \end{aligned}$$

and they are called a q-exponential and a q-logarithm, respectively.
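As a minimal numerical sketch of these two formulas (the function names `exp_q` and `ln_q` and the sample values of q, t and h are our own choices for illustration), one may check that the q-logarithm inverts the q-exponential and that the q-exponential satisfies the defining equation with \(\chi (t) = t^q\):

```python
def exp_q(t: float, q: float) -> float:
    """q-exponential (1 + (1 - q) t)^(1 / (1 - q)), defined where 1 + (1 - q) t > 0."""
    base = 1.0 + (1.0 - q) * t
    if base <= 0.0:
        raise ValueError("exp_q is only defined for 1 + (1 - q) t > 0")
    return base ** (1.0 / (1.0 - q))


def ln_q(s: float, q: float) -> float:
    """q-logarithm (s^(1 - q) - 1) / (1 - q), defined for s > 0."""
    if s <= 0.0:
        raise ValueError("ln_q is only defined for s > 0")
    return (s ** (1.0 - q) - 1.0) / (1.0 - q)


# Illustrative values (assumptions, not taken from the paper).
q, t, h = 1.5, -0.7, 1e-6

# ln_q should invert exp_q.
roundtrip = ln_q(exp_q(t, q), q)

# A central difference approximates d/dt exp_q(t); it should agree
# with chi(exp_q t) = (exp_q t)^q.
derivative = (exp_q(t + h, q) - exp_q(t - h, q)) / (2.0 * h)
chi_of_exp = exp_q(t, q) ** q
```

The domain checks mirror the conditions \(1+(1-q)t>0\) and \(s>0\) stated above.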

We suppose that a statistical model \(S_{\chi }\) has the following expression

$$ S_{\chi } = \left\{ p(x;\theta ) \left| p(x;\theta ) = \exp _{\chi }\left[ \sum _{i=1}^n\theta ^iF_i(x) -\psi (\theta ) \right] , \ \theta \in \varTheta \subset \mathbf {R}^n \right. \right\} , $$

where \(F_1(x), \dots , F_n(x)\) are functions on the sample space \(\varOmega \), \(\theta = {}^t(\theta ^1, \dots , \theta ^n)\) is a parameter, and \(\psi (\theta )\) is the normalization defined by \(\int _{\varOmega }p(x;\theta )dx = 1\). We call the statistical model \(S_{\chi }\) a \(\chi \)-exponential family or a deformed exponential family. Under suitable conditions, \(S_{\chi }\) is regarded as a manifold with coordinate system \(\theta = (\theta ^1, \dots , \theta ^n)\). When the deformed exponential function is a q-exponential, we denote the statistical model by \(S_q\) and call it a q-exponential family.

We remark that regularity conditions for \(S_{\chi }\) are very difficult to state. Elucidating such conditions is an open problem. For example, regularity conditions for a statistical model (see Chap. 2 in [1]) and the well-definedness of a deformed exponential function should be satisfied simultaneously. A few arguments on this problem are given in the first and third named authors' previous work [9].

3 A Sequential Structure of Expectations

In this section we consider a sequential structure of expectations. As we will see later, statistical manifold structures are defined from this sequence.

Let \(S_{\chi } = \{p_{\theta }\} = \{p(x;\theta )\}\) be a \(\chi \)-exponential family. We say that \(P_{\chi }(x;\theta )\) is an escort distribution of \(p_\theta \in S_{\chi }\) if

$$ P_{\chi }(x; \theta ) := P_{\chi , (1)}(x;\theta ) := \chi (p_{\theta }). $$

We say that \(P_{\chi }^{esc}(x;\theta )\) is a normalized escort distribution of \(p_{\theta }\) if

$$\begin{aligned} P_{\chi }^{esc}(x;\theta ):= & {} P_{\chi , (1)}^{esc}(x;\theta ) \ := \ \frac{\chi (p_{\theta })}{Z_{\chi }(p_{\theta })}, \\&\text{ where } \quad Z_{\chi }(p_{\theta }) := Z_{\chi , (1)}(p_{\theta }) := \int _{\varOmega }\chi (p_{\theta })dx. \end{aligned}$$

We generalize the escort distribution by use of higher-order derivatives.

Definition 1

Let \(S_{\chi }\) be a \(\chi \)-exponential family. Denote by \(\exp _{\chi }^{(n)}x\) the n-th derivative of the \(\chi \)-exponential function. For \(p_{\theta } \in S_{\chi }\), we define the n-th escort distribution \(P_{\chi , (n)}(x;\theta )\) by

$$\begin{aligned} P_{\chi , (n)}(x;\theta ):= & {} \exp _{\chi }^{(n)}(\ln _{\chi }p_{\theta }) = \exp _{\chi }^{(n)}\left( \sum _{i=1}^n\theta ^iF_i(x) - \psi (\theta )\right) , \end{aligned}$$

and the normalized n-th escort distribution \(P_{\chi , (n)}^{esc}(x;\theta )\) by

$$\begin{aligned} P_{\chi , (n)}^{esc}(x;\theta ):= & {} \frac{P_{\chi , (n)}(x;\theta )}{Z_{\chi , (n)}(p_{\theta })}, \quad \text{ where } \quad Z_{\chi , (n)}(p_{\theta }) = \int _{\varOmega }P_{\chi , (n)}(x;\theta )dx. \end{aligned}$$

For a given function f(x) on \(\varOmega \), we define the n-th escort expectation of f(x) and the normalized n-th escort expectation of f(x) by

$$\begin{aligned} E_{\chi , (n), p}[f(x)]:= & {} \int _{\varOmega }f(x)P_{\chi , (n)}(x;\theta )dx, \\ E_{\chi , (n), p}^{esc}[f(x)]:= & {} \int _{\varOmega }f(x)P_{\chi , (n)}^{esc}(x;\theta )dx, \end{aligned}$$

respectively.
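To make the first-order case concrete, the following sketch evaluates the escort distribution, its normalization, and the normalized escort expectation for the power deformation \(\chi (t) = t^q\). It assumes a finite sample space with the counting measure, so that the integrals over \(\varOmega \) become sums; the helper names and the numerical values are hypothetical and serve only as an illustration.

```python
def escort(p, q):
    """Unnormalized first escort distribution chi(p) = p**q, taken pointwise."""
    return [pi ** q for pi in p]


def normalized_escort(p, q):
    """Escort distribution divided by Z_chi(p); a sum replaces the integral."""
    weights = escort(p, q)
    z = sum(weights)
    return [w / z for w in weights]


def escort_expectation(f, p, q, normalized=True):
    """First escort expectation of f, normalized or not."""
    weights = normalized_escort(p, q) if normalized else escort(p, q)
    return sum(fx * w for fx, w in zip(f, weights))


# Hypothetical three-point distribution and observable.
p = [0.7, 0.2, 0.1]
f = [0.0, 1.0, 2.0]
q = 2.0

p_esc = normalized_escort(p, q)
mean_standard = sum(fx * px for fx, px in zip(f, p))
mean_escort = escort_expectation(f, p, q)
```

In this example q > 1, so the escort weights shift mass away from the low-probability points, and the normalized escort expectation of f is smaller than the standard one.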

For example, in the case of q-exponential family \(S_q\), the n-th escort distribution of \(p_q(x;\theta )\) is given by

$$ P_{q,(n)}(x;\theta ) := \{q(2q-1) \cdots ((n-1)q - (n-2))\}\{p_q(x;\theta )\}^{nq-(n-1)}. $$
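The closed form above can be cross-checked by differentiating repeatedly: if \(y = \exp _q t\), then \(y' = y^q\), so a term \(c\,y^{e}\) differentiates to \(c\,e\,y^{e+q-1}\). The sketch below compares this recursion with the closed-form coefficient and power in exact rational arithmetic; the function names and the value of q are our own assumptions for illustration.

```python
from fractions import Fraction


def escort_coeff_and_power(n, q):
    """Closed form: coefficient prod_{k=1}^{n-1} (k q - (k - 1)), power n q - (n - 1)."""
    coeff = Fraction(1)
    for k in range(1, n):
        coeff *= k * q - (k - 1)
    return coeff, n * q - (n - 1)


def escort_by_recursion(n, q):
    """Differentiate c * y**e n times, using y' = y**q for y = exp_q(t)."""
    coeff, power = Fraction(1), Fraction(1)  # zeroth derivative: y itself
    for _ in range(n):
        # d/dt (c y^e) = c e y^(e - 1) y^q = (c e) y^(e + q - 1)
        coeff, power = coeff * power, power + q - 1
    return coeff, power


q = Fraction(3, 2)  # illustrative value of q
third = escort_coeff_and_power(3, q)
agree = all(
    escort_coeff_and_power(n, q) == escort_by_recursion(n, q) for n in range(1, 8)
)
```

For q = 3/2 and n = 3 both routes give the coefficient 3 and the power 5/2.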

When we consider the geometric structure determined by the unbiasedness of the generalized score function, that is,

$$ E_{\chi , (1), p}[\partial _i \ln _{\chi }p(x;\theta )] = 0, $$

a sequential structure of expectations naturally arises. This is one of our motivations for studying sequential expectations. When we consider correlations of random variables, other kinds of sequences of expectations will be required.

4 Geometry of Statistical Models

Let \((M, g)\) be a Riemannian manifold, and let C be a totally symmetric (0, 3)-tensor field on M. We call the triplet \((M, g, C)\) a statistical manifold [6]. In this case, the tensor field C is called a cubic form. For a given statistical manifold \((M, g, C)\), we can define a one-parameter family of affine connections by

$$\begin{aligned} g(\nabla ^{(\alpha )}_XY,Z) : = g(\nabla ^{(0)}_XY,Z) - \frac{\alpha }{2}C(X,Y,Z), \end{aligned}$$
(1)

where \(\alpha \in \mathbf {R}\) and \(\nabla ^{(0)}\) is the Levi-Civita connection with respect to g. It is easy to check that \(\nabla ^{(\alpha )}\) and \(\nabla ^{(-\alpha )}\) are mutually dual with respect to g, that is,

$$ Xg(Y,Z) = g(\nabla ^{(\alpha )}_XY,Z) + g(Y,\nabla ^{(-\alpha )}_XZ). $$
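For completeness, the duality can be verified directly from Eq. (1). The only ingredients are that the Levi-Civita connection \(\nabla ^{(0)}\) is compatible with g and that C is totally symmetric, so the two cubic-form terms cancel:

$$\begin{aligned} g(\nabla ^{(\alpha )}_XY,Z) + g(Y,\nabla ^{(-\alpha )}_XZ) &= g(\nabla ^{(0)}_XY,Z) - \frac{\alpha }{2}C(X,Y,Z) + g(Y,\nabla ^{(0)}_XZ) + \frac{\alpha }{2}C(X,Z,Y) \\ &= g(\nabla ^{(0)}_XY,Z) + g(Y,\nabla ^{(0)}_XZ) \\ &= Xg(Y,Z). \end{aligned}$$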

We say that S is a statistical model if S is a set of probability density functions on \(\varOmega \) with parameter \(\xi \in \varXi \) such that

$$ S = \left\{ p(x;\xi ) \left| \int _{\varOmega }p(x;\xi )dx=1, \ p(x;\xi )>0, \ \xi = (\xi ^1, \dots , \xi ^n) \in \varXi \subset \mathbf {R}^n \right. \right\} . $$

Under suitable conditions, we can define a Fisher metric \(g^F\) on S by

$$\begin{aligned} g_{ij}^F(\xi )= & {} \int _{\varOmega } \left( \frac{\partial }{\partial \xi ^i}\ln p(x;\xi )\right) \left( \frac{\partial }{\partial \xi ^j}\ln p(x;\xi )\right) p(x;\xi ) \, dx \end{aligned}$$
(2)
$$\begin{aligned}= & {} \int _{\varOmega } \left( \frac{\partial }{\partial \xi ^i}\ln p(x;\xi )\right) \left( \frac{\partial }{\partial \xi ^j}p(x;\xi )\right) \, dx \\= & {} E_{p}[\partial _il_{\xi }\partial _jl_{\xi }] , \nonumber \end{aligned}$$
(3)

where \(\partial _i = \partial /\partial \xi ^i\), \(l_{\xi } = l(x;\xi ) = \ln p(x;\xi )\), and \(E_{p}[f]\) is the standard expectation of f(x) with respect to \(p(x;\xi )\).
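As a small worked example, consider a Bernoulli model with \(p(1;\xi ) = \xi \) and \(p(0;\xi ) = 1-\xi \). This model is our own choice for illustration and assumes a two-point sample space with the counting measure, so that the integrals reduce to sums. The sketch evaluates representations (2) and (3) and compares them with the familiar closed form \(1/(\xi (1-\xi ))\).

```python
def fisher_bernoulli(xi):
    """Return representations (2) and (3) of the Fisher metric for a Bernoulli model."""
    p = {1: xi, 0: 1.0 - xi}             # p(x; xi)
    dp = {1: 1.0, 0: -1.0}               # derivative of p(x; xi) in xi
    dlog = {x: dp[x] / p[x] for x in p}  # derivative of ln p(x; xi) in xi
    form2 = sum(dlog[x] ** 2 * p[x] for x in p)  # score squared, weighted by p
    form3 = sum(dlog[x] * dp[x] for x in p)      # score times derivative of p
    return form2, form3


xi = 0.25  # illustrative parameter value
form2, form3 = fisher_bernoulli(xi)
closed_form = 1.0 / (xi * (1.0 - xi))
```

The two representations agree because \((\partial _i \ln p)\,p = \partial _i p\).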

Next, we define a totally symmetric (0, 3)-tensor field \(C^F\) by

$$ C^F_{ijk}(\xi ) = E_{p} \left[ (\partial _il_{\xi })(\partial _jl_{\xi }) (\partial _kl_{\xi }) \right] . $$

From Eq. (1), we can define a one-parameter family of affine connections. In particular, the connection \(\nabla ^{(e)} = \nabla ^{(1)}\) is called the exponential connection and \(\nabla ^{(m)} = \nabla ^{(-1)}\) is called the mixture connection. These connections are given by

$$\begin{aligned} \varGamma ^{(e)}_{ij,k}(\xi )= & {} \int _{\varOmega }(\partial _i\partial _j\ln p_{\xi })(\partial _k p_{\xi })dx, \\ \varGamma ^{(m)}_{ij,k}(\xi )= & {} \int _{\varOmega }(\partial _k\ln p_{\xi })(\partial _i\partial _j p_{\xi })dx. \end{aligned}$$

It is known that \(g^F\) and \(C^F\) are independent of the choice of reference measure on \(\varOmega \). Therefore, the triplet \((S, g^F, C^F)\) is called an invariant statistical manifold. If a statistical model S is an exponential family, then the invariant statistical manifold \((S, g^F, C^F)\) determines a dually flat structure on S. (See [1, 13].) However, this fact may not hold for a deformed exponential family \(S_{\chi }\), and an invariant structure may not be important for \(S_{\chi }\). Therefore, we consider other statistical manifold structures.

We summarize statistical manifold structures for \(S_{\chi }\) based on [8].

Let \(S_{\chi }\) be a \(\chi \)-exponential family. We define a Riemannian metric \(g^M\) by

$$ g_{ij}^M(\theta ) := \int _{\varOmega } \left( \partial _i\ln _{\chi } p_{\theta } \right) \left( \partial _jp_{\theta }\right) \, dx, $$

where \(\partial _i = \partial /\partial \theta ^i\). The Riemannian metric \(g^M\) is a generalization of the representation (3) of the Fisher metric. A pair of dual affine connections is given by

$$\begin{aligned} \varGamma ^{M(e)}_{ij,k}(\theta )= & {} \int _{\varOmega }(\partial _i\partial _j\ln _{\chi }p_{\theta })(\partial _kp_{\theta })dx, \\ \varGamma ^{M(m)}_{ij,k}(\theta )= & {} \int _{\varOmega }(\partial _k\ln _{\chi }p_{\theta })(\partial _i\partial _jp_{\theta })dx. \end{aligned}$$

The difference of two affine connections \(C_{ijk}^M = \varGamma ^{M(m)}_{ij,k} - \varGamma ^{M(e)}_{ij,k}\) determines a cubic form. In addition, from the definition of the deformed exponential family \(S_{\chi }\), \(\varGamma ^{M(e)}_{ij,k}(\theta )\) always vanishes. Therefore, we have the following proposition.

Proposition 1

For a \(\chi \)-exponential family \(S_{\chi }\), the triplet \((S_{\chi }, g^{M}, C^M)\) is a statistical manifold. In particular, \((S_{\chi }, g^{M}, \nabla ^{M(e)}, \nabla ^{M(m)})\) is a dually flat space.
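The vanishing of \(\varGamma ^{M(e)}_{ij,k}\) used here can be written out as follows, assuming that differentiation in \(\theta \) and integration over \(\varOmega \) may be interchanged. Since \(\ln _{\chi }p_{\theta } = \sum _l\theta ^lF_l(x) - \psi (\theta )\) is affine in \(\theta \) up to the normalization, its second derivative does not depend on x:

$$\begin{aligned} \partial _i\partial _j\ln _{\chi }p_{\theta } &= -\partial _i\partial _j\psi (\theta ), \\ \varGamma ^{M(e)}_{ij,k}(\theta ) &= -\partial _i\partial _j\psi (\theta )\int _{\varOmega }\partial _kp_{\theta }\,dx = -\partial _i\partial _j\psi (\theta )\,\partial _k\int _{\varOmega }p_{\theta }\,dx = 0. \end{aligned}$$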

By setting

$$ U_{\chi }(s) := \int _0^s (\exp _{\chi }t) \, dt, $$

we define a U-divergence [10] by

$$ D_{\chi }(p||r) = \int _{\varOmega }\!\{U_{\chi }(\ln _{\chi }r(x)) - U_{\chi }(\ln _{\chi }p(x)) -p(x)(\ln _{\chi }r(x) - \ln _{\chi }p(x))\} dx. $$

It is known that the U-divergence \(D_{\chi }(p||r)\) on \(S_{\chi }\) coincides with the canonical divergence for \((S_{\chi }, g^{M}, \nabla ^{M(m)}, \nabla ^{M(e)})\) (see [8, 10]).

Next, we define another statistical manifold structure from the viewpoint of Hessian geometry.

For a \(\chi \)-exponential family \(S_{\chi }\), suppose that the normalization \(\psi \) is strictly convex. Then we can define a \(\chi \)-Fisher metric \(g^{\chi }\) and a \(\chi \)-cubic form \(C^{\chi }\) [3] by

$$\begin{aligned} g^{\chi }_{ij}(\theta ):= & {} \partial _i\partial _j\psi (\theta ), \\ C^{\chi }_{ijk}(\theta ):= & {} \partial _i\partial _j\partial _k\psi (\theta ). \end{aligned}$$

Obviously, the triplet \((S_{\chi }, g^{\chi }, C^{\chi })\) is a statistical manifold. From Eq. (1), we can define a torsion-free affine connection \(\nabla ^{\chi (\alpha )}\) by

$$ g^{\chi }(\nabla ^{\chi (\alpha )}_XY,Z) : = g^{\chi }(\nabla ^{\chi (0)}_XY,Z) - \frac{\alpha }{2}C^{\chi }(X,Y,Z), $$

where \(\nabla ^{\chi (0)}\) is the Levi-Civita connection with respect to \(g^{\chi }\). By standard arguments in Hessian geometry [13], \((S_{\chi }, g^{\chi }, \nabla ^{\chi (1)}, \nabla ^{\chi (-1)})\) is a dually flat space. The canonical divergence for \((S_{\chi }, g^{\chi }, \nabla ^{\chi (-1)}, \nabla ^{\chi (1)})\) is given by

$$ D^{\chi }(p||r) = E_{\chi , r}^{esc}[\ln _{\chi }r(x) - \ln _{\chi }p(x)]. $$

5 Statistical Manifolds Determined from Sequential Escort Expectations

In this section, we consider statistical manifold structures determined from sequential escort expectations.

For a \(\chi \)-exponential family \(S_{\chi }\), we define \(g^{(n)}\) and \(C^{(n)}\) by

$$\begin{aligned} g^{(n)}_{ij}(\theta ):= & {} \int _{\varOmega }(\partial _i\ln _{\chi }p_{\theta })(\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (n)}(x;\theta )dx, \\ C^{(n)}_{ijk}(\theta ):= & {} \int _{\varOmega }(\partial _i\ln _{\chi }p_{\theta })(\partial _j\ln _{\chi }p_{\theta })(\partial _k\ln _{\chi }p_{\theta }) P_{\chi , (n+1)}(x;\theta )dx. \end{aligned}$$

We suppose that \(g^{(n)}\) is a Riemannian metric on \(S_{\chi }\). Then we obtain a sequence of statistical manifolds:

$$ (S_{\chi }, g^{(1)}, C^{(1)}) \ \rightarrow \ (S_{\chi }, g^{(2)}, C^{(2)}) \ \rightarrow \ \cdots \ \rightarrow \ (S_{\chi }, g^{(n)}, C^{(n)}) \ \rightarrow \ \cdots . $$

The limit of this sequence is not clear at this moment. In the q-Gaussian case, the sequence of normalized escort distributions \(\{P^{esc}_{q,(n)}(x;\theta )\}\) converges to the Dirac delta function \(\delta (x-\mu )\) (cf. [14]).

Theorem 1

Let \(S_{\chi } = \{p(x;\theta )\}\) be a \(\chi \)-exponential family. Then \((S_{\chi }, g^{(1)}, C^{(1)})\) coincides with \((S_{\chi }, g^{M}, C^{M})\).

Proof

From the definition of \(\chi \)-logarithm and \(P_{\chi }(x;\theta ) = P_{\chi , (1)}(x;\theta ) = \chi (p_{\theta })\), we obtain

$$\begin{aligned} (\partial _i\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )= & {} \frac{\partial _ip_{\theta }}{\chi (p_{\theta })}\chi (p_{\theta }) \ = \ \partial _ip_{\theta }. \end{aligned}$$

Therefore, we obtain

$$\begin{aligned} g_{ij}^M(\theta )= & {} \int _{\varOmega }(\partial _i\ln _{\chi }p_{\theta })(\partial _jp_{\theta })dx \ = \ \int _{\varOmega }(\partial _i\ln _{\chi }p_{\theta })(\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )dx \\= & {} g^{(1)}(\theta ). \end{aligned}$$

Recall that \(\{\theta ^i\}\) is a \(\nabla ^{M(e)}\)-affine coordinate system [8]. In addition, the generalized score function \(\partial _i\ln _{\chi }p_{\theta }\) is unbiased with respect to the escort expectation, that is,

$$ E_{\chi , p}[\partial _i\ln _{\chi }p_{\theta }] = \int _{\varOmega } (\partial _i\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )dx = \int _{\varOmega }\partial _ip_{\theta }dx = 0. $$

Therefore we obtain

$$\begin{aligned} C_{ijk}^M(\theta )= & {} \varGamma _{ij,k}^{M(m)}(\theta ) \ = \ \int _{\varOmega } (\partial _k\ln _{\chi }p_{\theta }) (\partial _i\partial _jp_{\theta })dx \\= & {} \int _{\varOmega } (\partial _k\ln _{\chi }p_{\theta }) \partial _i\{(\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )\}dx \\= & {} 0 + \int _{\varOmega }(\partial _k\ln _{\chi }p_{\theta })(\partial _j\ln _{\chi }p_{\theta })(\partial _i\ln _{\chi }p_{\theta }) P_{\chi , (2)}(x;\theta )dx \\= & {} C_{ijk}^{(1)}(\theta ). \end{aligned}$$
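The vanishing term in the third line can be made explicit. By the chain rule applied to \(P_{\chi , (1)} = \exp _{\chi }^{(1)}(\ln _{\chi }p_{\theta })\), we have \(\partial _iP_{\chi , (1)} = P_{\chi , (2)}\,\partial _i\ln _{\chi }p_{\theta }\), and \(\partial _i\partial _j\ln _{\chi }p_{\theta } = -\partial _i\partial _j\psi (\theta )\) does not depend on x. Hence

$$\begin{aligned} \partial _i\{(\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )\} &= (\partial _i\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta ) + (\partial _j\ln _{\chi }p_{\theta })(\partial _i\ln _{\chi }p_{\theta }) P_{\chi , (2)}(x;\theta ), \\ \int _{\varOmega }(\partial _k\ln _{\chi }p_{\theta })(\partial _i\partial _j\ln _{\chi }p_{\theta }) P_{\chi , (1)}(x;\theta )dx &= -\partial _i\partial _j\psi (\theta )\int _{\varOmega }\partial _kp_{\theta }\,dx \ = \ 0. \end{aligned}$$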

From the second escort expectation, we have the following theorem.

Theorem 2

Let \(S_{\chi } = \{p(x;\theta )\}\) be a \(\chi \)-exponential family. Then \((S_{\chi }, g^{(2)}, C^{(2)})\) and \((S_{\chi }, g^{\chi }, C^{\chi })\) have the following relations:

$$\begin{aligned} g_{ij}^{(2)}(\theta )= & {} Z_{\chi }(p_{\theta })g_{ij}^{\chi }(\theta ), \\ C_{ijk}^{(2)}(\theta )= & {} Z_{\chi }(p_{\theta })C_{ijk}^{\chi }(\theta ) + g_{ij}^{\chi }(\theta )\partial _kZ_{\chi }(p_{\theta }) + g_{jk}^{\chi }(\theta )\partial _iZ_{\chi }(p_{\theta }) + g_{ki}^{\chi }(\theta )\partial _jZ_{\chi }(p_{\theta }). \end{aligned}$$

Proof

Set \(u(t) = (\exp _{\chi }t)'\). Then we have

$$\begin{aligned} \partial _i p(x;\theta )= & {} u\left( \sum \theta ^kF_k(x) - \psi (\theta )\right) (F_i(x) - \partial _i\psi (\theta )) \nonumber \\ \partial _i\partial _j p(x;\theta )= & {} u'\left( \sum \theta ^kF_k(x) - \psi (\theta )\right) (F_i(x) - \partial _i\psi (\theta ))(F_j(x) - \partial _j\psi (\theta )) \nonumber \\&\qquad \qquad - u\left( \sum \theta ^kF_k(x) - \psi (\theta )\right) \partial _i\partial _j\psi (\theta ) \nonumber \\= & {} P_{\chi , (2)}(x;\theta )(\partial _i\ln _{\chi }p_{\theta })(\partial _j\ln _{\chi }p_{\theta }) - P_{\chi , (1)}(x;\theta )\partial _i\partial _j\psi (\theta ). \nonumber \end{aligned}$$

Since \(\int _{\varOmega }\partial _ip(x;\theta )dx = \int _{\varOmega }\partial _i\partial _jp(x;\theta )dx =0\) and \(Z_{\chi }(p) = \int _{\varOmega }\chi (p(x;\theta ))dx = \int _{\varOmega }P_{\chi , (1)}(x;\theta )dx\), we obtain

$$ g_{ij}^{(2)}(\theta ) = Z_{\chi }(p_{\theta })g_{ij}^{\chi }(\theta ). $$

By a straightforward calculation, we have

$$\begin{aligned} \partial _i\partial _j\partial _k p(x;\theta )= & {} u''\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) \nonumber \\&\qquad \times \, (F_i(x) - \partial _i\psi (\theta ))(F_j(x) - \partial _j\psi (\theta ))(F_k(x) - \partial _k\psi (\theta )) \nonumber \\&- u'\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) (F_k(x) - \partial _k\psi (\theta ))\partial _i\partial _j\psi (\theta ) \nonumber \\&- u'\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) (F_i(x) - \partial _i\psi (\theta ))\partial _j\partial _k\psi (\theta ) \nonumber \\&- u'\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) (F_j(x) - \partial _j\psi (\theta ))\partial _k\partial _i\psi (\theta ) \nonumber \\&\qquad - u\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) \partial _i\partial _j\partial _k\psi (\theta ), \\ \partial _iZ_{\chi }(p_{\theta })= & {} \int _{\varOmega }\partial _iP_{\chi , (1)}(x;\theta )dx \nonumber \\= & {} \int _{\varOmega }u'\left( \sum \theta ^lF_l(x) - \psi (\theta )\right) (F_i(x) - \partial _i\psi (\theta ))dx. \nonumber \end{aligned}$$
(4)

By integrating (4), we obtain the relation between \(C^{(2)}\) and \(C^{\chi }\).
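The first relation of Theorem 2 can be checked numerically on a toy model. The sketch below is not taken from the paper: it assumes a one-parameter q-exponential family on the finite sample space {0, 1, 2} with \(F(x) = x\), the counting measure in place of dx, and q > 1; the normalization \(\psi (\theta )\) is found by bisection, and \(\partial ^2\psi \) is approximated by a central second difference. All names and numerical values are hypothetical.

```python
def exp_q(t, q):
    """q-exponential, defined where 1 + (1 - q) t > 0."""
    base = 1.0 + (1.0 - q) * t
    if base <= 0.0:
        raise ValueError("outside the domain of exp_q")
    return base ** (1.0 / (1.0 - q))


def solve_psi(theta, q, xs):
    """Bisection for psi such that sum_x exp_q(theta * x - psi) = 1 (assumes q > 1)."""
    def total(psi):
        return sum(exp_q(theta * x - psi, q) for x in xs)

    # Just inside the domain boundary the total mass is very large.
    lo = max(theta * x for x in xs) - 1.0 / (q - 1.0) + 1e-9
    hi = lo + 1.0
    while total(hi) > 1.0:  # total(psi) is decreasing in psi
        hi += 1.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if total(mid) > 1.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


q, theta, xs, h = 1.5, 0.3, [0.0, 1.0, 2.0], 1e-3
psi = solve_psi(theta, q, xs)
p = [exp_q(theta * x - psi, q) for x in xs]

z = sum(pi ** q for pi in p)  # Z_chi: sum of P_(1) = p**q
# psi' follows from sum_x (d/d theta) p = 0, i.e. sum_x p**q (x - psi') = 0.
dpsi = sum(x * pi ** q for x, pi in zip(xs, p)) / z
# g^(2): squared generalized score (x - psi') weighted by P_(2) = q p**(2q - 1).
g2 = sum((x - dpsi) ** 2 * q * pi ** (2.0 * q - 1.0) for x, pi in zip(xs, p))
# g^chi = psi'' approximated by a central second difference.
g_chi = (solve_psi(theta + h, q, xs) - 2.0 * psi + solve_psi(theta - h, q, xs)) / h ** 2
```

Up to the finite-difference error, `g2` and `z * g_chi` should agree, which is the statement \(g^{(2)}_{ij} = Z_{\chi }(p_{\theta })g^{\chi }_{ij}\) in this one-dimensional example.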

We remark that the statistical manifold \((S_{\chi }, g^{(2)}, C^{(2)})\) does not determine a dually flat structure in general, whereas \((S_{\chi }, g^{\chi }, C^{\chi })\) does. The relations in Theorem 2 imply that the two statistical manifolds are related by a generalized conformal equivalence in the sense of Kurose [5].

6 Concluding Remarks

In this paper, we considered a sequential structure of escort expectations and the statistical manifold structures defined from the sequence of escort expectations. Further geometric properties of the sequence \(\{(S_{\chi }, g^{(n)}, C^{(n)})\}_{n \in \mathbf {N}}\) are not clear at this moment. However, the sequential structure will be important in the geometric theory of non-exponential type statistical models. Indeed, in the case of a q-exponential family, \((S_q, g^{(1)}, C^{(1)})\) is induced from a \(\beta \)-divergence. In addition, \((S_q, g^{(2)}, C^{(2)})\) is essentially equivalent to the invariant statistical manifold structure \((S_q, g^F, C^F)\), which is induced from an \(\alpha \)-divergence [7].

The authors would like to express their sincere gratitude to the referees for giving helpful comments to improve this paper.