1.1 Introduction

Divergence functions (also called “contrast functions” or “yokes”) are non-symmetric measurements of proximity. They play a central role in statistical inference, machine learning, optimization, and many other fields. The most familiar examples include Kullback-Leibler divergence, Bregman divergence [4], \(\alpha \)-divergence [1], and \(f\)-divergence [6]. Divergence functions are also a key construct of information geometry. Just as \(L_2\)-distance is associated with Euclidean geometry, Bregman divergence and Kullback-Leibler divergence are associated with a pair of flat structures (where flatness means free of torsion and free of curvature) that are “dual” to each other; this is called Hessian geometry [18, 19] and it is the dualistic extension of the Euclidean geometry. So just as Riemannian geometry extends Euclidean geometry by allowing non-trivial metric structure, Hessian geometry extends Euclidean geometry by allowing non-trivial affine connections that come in pairs. The pairing of connections is with respect to a Riemannian metric \(g\), which is uniquely specified in the case of Hessian geometry; yet the metric-induced Levi-Civita connection has non-zero curvature in general. The apparent inconvenience is offset by the existence of biorthogonal coordinates in any dually flat (i.e., Hessian) structure and a canonical divergence, along with the tools of convex analysis, which are powerful in many practical applications.

In a quite general setting, any divergence function induces a Riemannian metric and a pair of torsion-free connections on the manifold where it is defined [8]. This so-called statistical structure is at the core of information geometry. Recently, other geometric structures induced by divergence functions have also been investigated, including conformal structure [17], symplectic structure [3, 28], and complex structures [28].

The goal of this chapter is to review the relationship between divergence function and various information geometric structures. In Sect. 1.2, we provide background materials of various geometric structures on a manifold. In Sect. 1.3, we show how these structures can be induced from a divergence function. Starting from a general divergence function which always induces a statistical structure, we define the notion of “properness” for it to be a generating function of a symplectic structure. Imposing a further condition leads to complexification of the product manifold where divergence functions are defined. In Sect. 1.4, we show that a quite broad class of divergence functions, \(\mathcal {D}_\varPhi \)-divergence functions [23] as induced by a strictly convex function, satisfies all these requirements and induces a Kähler structure (Riemannian and complex structures simultaneously) on the tangent bundle. Therefore, just as the full-fledged \(\alpha \)-Hessian geometry extends the dually-flat Hessian manifold (\(\alpha = \pm 1\)), \(\mathcal {D}_\varPhi \)-divergence generalizes Bregman divergence in the “nicest” way possible. Section 1.5 closes with a summary of this approach to information geometric structures through divergence functions.

1.2 Background: Structures on Smooth Manifolds

1.2.1 Differentiable Manifold: Metric and Connection Structures on \(T\mathfrak {M}\)

A differentiable manifold \(\mathfrak {M}\) is a space which locally “looks like” a Euclidean space \(\mathbb {R}^n\). By “looks like”, we mean that for any base (reference) point \(x \in \mathfrak {M}\), there exists a bijective mapping (“coordinate functions”) between the neighborhood of \(x\) (i.e., a patch of the manifold) and a subset \(V\) of \(\mathbb {R}^n\). By locally, we mean that various such mappings must be smoothly related to one another (if they are centered at the same reference point) or consistently glued together (if they are centered at different reference points). Globally, they must cover the entire manifold. Below, we assume that a coordinate system is chosen such that each point is indexed by a vector in \(V\), with the origin as the reference point.

A manifold is specified with certain structures. First, there is an inner-product structure associated with tangent spaces of the manifold. This is given by the metric 2-tensor field \(g\) which is, when evaluated at each location \(x\), a symmetric bilinear form \(g(\cdot , \cdot )\) of tangent vectors \(X, Y \in T_{x}(\mathfrak {M}) \simeq \mathbb {R}^n\) such that \(g(X,X)\) is positive for every non-zero vector \(X \in T_{x}(\mathfrak {M})\). In local “holonomic” coordinates (see footnote 1) with bases \(\partial _i \equiv \partial /\partial x^i, i=1, \ldots , n\), (i.e., \(X, Y\) are expressed as \(X = \sum _{i} X^{i} \partial _i, Y=\sum _{i} Y^{i} \partial _i\)), the components of \(g\) are denoted as

$$\begin{aligned} g_{ij}(x) = g (\partial _i, \partial _j) . \end{aligned}$$
(1.1)

The metric tensor allows us to define the distance between two points on a manifold as the length of the shortest curve (called a “geodesic”) connecting them, and to measure angles and hence define orthogonality of vectors; projections of vectors onto a lower-dimensional submanifold become possible once a metric is given. The metric tensor also provides a linear isomorphism between the tangent space and the cotangent space at any point on the manifold.

Second, there is a structure implementing the notion of “parallelism” of vector fields and curviness of a manifold. This is given by the affine (linear) connection \(\nabla \), mapping two vector fields \(X\) and \(Y\) to a third one denoted by \(\nabla _Y X\text {: } (X,Y) \mapsto \nabla _{Y}X\). Intuitively, it represents the “intrinsic” difference of a tangent vector \(X(x)\) at point \(x\) and another tangent vector \(X(x')\) at a nearby point \(x'\), which is connected to \(x\) in the direction given by the tangent vector \(Y(x)\). Here “intrinsic” means that vector comparison across two neighboring points of the manifold is through a process called “parallel transport,” whereby vector components are adjusted as the vector moves across points on the base manifold. Under the local coordinate system with bases \(\partial _i \equiv \partial / \partial x^i\), components of \(\nabla \) can be written out in its “contravariant” form denoted \(\varGamma ^{l}_{ij}(x)\)

$$\begin{aligned} \nabla _{\partial _{i}}\partial _{j} = \sum _{l} \varGamma ^{l}_{ij} \, \partial _{l} . \end{aligned}$$
(1.2)

Under coordinate transform \(x \mapsto \tilde{x}\), the new coefficients \(\widetilde{\varGamma }\) are related to old ones \(\varGamma \) via

$$\begin{aligned} \widetilde{\varGamma }^{l}_{mn}(\tilde{x}) = \sum _{k} \left( \sum _{i,j} \frac{\partial x^i}{\partial \tilde{x}^m} \frac{\partial x^j}{\partial \tilde{x}^n} \varGamma _{ij}^{k}(x) + \frac{\partial ^2 x^k}{\partial \tilde{x}^m \partial \tilde{x}^n} \right) \frac{\partial \tilde{x}^l}{\partial x^k} . \end{aligned}$$
(1.3)
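
As a quick illustration of (1.3) (a worked sketch under assumed coordinates, not part of the original derivation), take the flat plane with Cartesian coordinates \(x = (x^1, x^2)\), in which \(\varGamma ^{k}_{ij} \equiv 0\), and polar coordinates \(\tilde{x} = (r, \theta )\) with \(x^1 = r \cos \theta , x^2 = r \sin \theta \). Only the second-derivative term of (1.3) survives, and for instance

$$ \widetilde{\varGamma }^{r}_{\theta \theta } = \sum _{k} \frac{\partial ^2 x^k}{\partial \theta ^2} \frac{\partial r}{\partial x^k} = (-r \cos \theta )(\cos \theta ) + (-r \sin \theta )(\sin \theta ) = -r , \qquad \widetilde{\varGamma }^{\theta }_{r \theta } = \sum _{k} \frac{\partial ^2 x^k}{\partial r \partial \theta } \frac{\partial \theta }{\partial x^k} = \frac{\sin ^2 \theta }{r} + \frac{\cos ^2 \theta }{r} = \frac{1}{r} . $$

So the coefficients of one and the same connection can be non-zero in one coordinate system even though they vanish identically in another; the inhomogeneous term in (1.3) is what prevents \(\varGamma \) from transforming as a tensor.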

A curve whose tangent vectors are parallel along the curve is said to be “auto-parallel”.

As a primitive on a manifold, affine connections can be characterized in terms of their (i) torsion and (ii) curvature. The torsion \(T\) of a connection \(\varGamma \), which is itself a tensor, is given by the antisymmetric part of the connection \(T(\partial _i, \partial _j) = \nabla _{\partial _{i}}\partial _{j} - \nabla _{\partial _{j}}\partial _{i} = \sum _k T^k_{ij} \partial _k\), where \(T^k_{ij}\) is its local representation given as

$$ T^k_{ij}(x) = \varGamma ^{k}_{ij}(x) - \varGamma ^{k}_{ji}(x) . $$

The curviness/flatness of a connection \(\varGamma \) is described by the Riemann curvature tensor \(R\), defined as

$$ R(\partial _i, \partial _j) \partial _k = (\nabla _{\partial _i} \nabla _{\partial _j} - \nabla _{\partial _j} \nabla _{\partial _i}) \partial _k . $$

Writing \(R(\partial _i, \partial _j) \partial _k = \sum _{l} R^{l}_{kij} \partial _l\) and substituting (1.2), the components of the Riemann curvature tensor are (see footnote 2)

$$ R^{l}_{kij} (x) = \frac{\partial \varGamma ^{l}_{jk}(x)}{\partial x^{i}} - \frac{\partial \varGamma ^{l}_{ik} (x)}{\partial x^{j}} + \sum _{m} \varGamma ^{l}_{im}(x) \varGamma ^{m}_{jk}(x) - \sum _{m} \varGamma ^{l}_{jm}(x) \varGamma ^{m}_{ik}(x) . $$

By definition, \(R^{l}_{kij}\) is anti-symmetric when \(i \longleftrightarrow j\).
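
The component formula for \(R^{l}_{kij}\) can be checked numerically. The following is a minimal sketch, assuming as a hypothetical example the Levi-Civita connection of the unit 2-sphere in coordinates \((\theta , \phi )\); the function names (`gamma`, `d_gamma`, `riemann`) and the finite-difference step are illustrative choices, not part of the text.

```python
import math

# Hypothetical example: Levi-Civita connection of the unit 2-sphere in
# coordinates x = (theta, phi); index 0 = theta, index 1 = phi.
def gamma(x):
    """Return G[l][i][j] = Gamma^l_{ij}(x)."""
    theta = x[0]
    G = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(2)]
    G[0][1][1] = -math.sin(theta) * math.cos(theta)   # Gamma^theta_{phi phi}
    cot = math.cos(theta) / math.sin(theta)
    G[1][0][1] = G[1][1][0] = cot                     # Gamma^phi_{theta phi}
    return G

def d_gamma(x, i, h=1e-5):
    """Central-difference estimate D[l][j][k] of d Gamma^l_{jk} / d x^i."""
    xp, xm = list(x), list(x)
    xp[i] += h
    xm[i] -= h
    Gp, Gm = gamma(xp), gamma(xm)
    return [[[(Gp[l][j][k] - Gm[l][j][k]) / (2 * h) for k in range(2)]
             for j in range(2)] for l in range(2)]

def riemann(x):
    """Return R[l][k][i][j] = R^l_{kij}(x) from the component formula."""
    n = 2
    G = gamma(x)
    dG = [d_gamma(x, i) for i in range(n)]  # dG[i][l][j][k] = d_i Gamma^l_{jk}
    R = [[[[0.0] * n for _ in range(n)] for _ in range(n)] for _ in range(n)]
    for l in range(n):
        for k in range(n):
            for i in range(n):
                for j in range(n):
                    val = dG[i][l][j][k] - dG[j][l][i][k]
                    for m in range(n):
                        val += G[l][i][m] * G[m][j][k] - G[l][j][m] * G[m][i][k]
                    R[l][k][i][j] = val
    return R

theta = 1.0
R = riemann([theta, 0.3])
# Compare R^theta_{phi theta phi} with sin^2(theta), its closed form here.
print(R[0][1][0][1], math.sin(theta) ** 2)
```

For this example the component \(R^{\theta }_{\phi \theta \phi }\) agrees with \(\sin ^2 \theta \) up to finite-difference error, and swapping the last two indices flips the sign, as the antisymmetry in \(i \longleftrightarrow j\) requires.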

A connection is said to be flat when \(R^{l}_{kij}(x) \equiv 0\) and \(T^{k}_{ij} \equiv 0\). Note that this is a tensorial condition, so that the flatness of a connection \(\nabla \) is a coordinate-independent property even though the local expression of the connection (in terms of \(\varGamma \)) is coordinate-dependent. For any flat connection, there exists a local coordinate system under which \(\varGamma ^{k}_{ij}(x) \equiv 0\) in a neighborhood; this is the affine coordinate for the given flat connection.

In the above discussions, metric and connections are treated as separate structures on a manifold. When both are defined on the same manifold, then it is convenient to express affine connection \(\varGamma \) in its “covariant” form

$$\begin{aligned} \varGamma _{ij,k}= g(\nabla _{\partial _i}\partial _j, \partial _k) = \sum _{l} g_{lk} \varGamma ^{l}_{ij} . \end{aligned}$$
(1.4)

Though \(\varGamma ^{k}_{ij}\) is the more primitive quantity that does not involve the metric, \(\varGamma _{ij,k}\) represents the projection of \(\varGamma \) onto the manifold spanned by the bases \(\partial _k\). The covariant form of Riemann curvature is (cf. footnote 2)

$$ R_{lkij} = \sum _{m} g_{lm} \, R^{m}_{kij} . $$

When the connection is torsion free, \(R_{lkij}\) is anti-symmetric when \(i \longleftrightarrow j\) or when \(k \longleftrightarrow l\), and symmetric when \((i,j) \longleftrightarrow (l, k)\). It is related to the Ricci tensor Ric via \(\text{ Ric }_{kj} = \sum _{i,l}R_{lkij}g^{il}\).

1.2.2 Coupling Between Metric and Connection: Statistical Structure

A fundamental theorem of Riemannian geometry states that given a metric, there is a unique connection (among the class of torsion-free connections) that “preserves” the metric, i.e., the following condition is satisfied

$$\begin{aligned} \partial _k g(\partial _i,\partial _j) = g(\widehat{\nabla }_{\partial _k}\partial _i, \partial _j) + g(\partial _i, \widehat{\nabla }_{\partial _k} \partial _j) . \end{aligned}$$
(1.5)

Such a connection, denoted as \(\widehat{\nabla }\), is called the Levi-Civita connection. Its component forms, called Christoffel symbols, are specified by the components of the metric tensor as (“Christoffel symbols of the second kind”)

$$ \widehat{\varGamma }^{k}_{ij} = \sum _{l} \frac{g^{kl}}{2} \left( \frac{\partial g_{il}}{\partial x^j} + \frac{\partial g_{jl}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^l} \right) . $$

and (“Christoffel symbols of the first kind”)

$$ \widehat{\varGamma }_{ij,k} = \frac{1}{2} \left( \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k} \right) . $$

The Levi-Civita connection \(\widehat{\varGamma }\) is compatible with the metric \(g\), in the sense that it treats tangent vectors of the shortest curves on a manifold as being parallel (equivalently speaking, it treats geodesics as auto-parallel curves).
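
The Christoffel symbols of the first kind can be evaluated directly from the metric. Below is a minimal sketch, assuming as a hypothetical example the Euclidean plane in polar coordinates \((r, \theta )\) with \(g = \mathrm{diag}(1, r^2)\); derivatives are approximated by central differences, and all function names are illustrative.

```python
def metric(x):
    """Hypothetical example metric: Euclidean plane in polar coordinates (r, theta)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]

def d_metric(x, k, h=1e-5):
    """Central-difference estimate of d g_{ij} / d x^k, returned as a matrix in (i, j)."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    gp, gm = metric(xp), metric(xm)
    n = len(x)
    return [[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)] for i in range(n)]

def christoffel_first_kind(x):
    """Return C[i][j][k] = Gamma-hat_{ij,k}(x) = (d_j g_ik + d_i g_jk - d_k g_ij) / 2."""
    n = len(x)
    dg = [d_metric(x, k) for k in range(n)]  # dg[k][i][j] = d g_ij / d x^k
    return [[[0.5 * (dg[j][i][k] + dg[i][j][k] - dg[k][i][j])
              for k in range(n)] for j in range(n)] for i in range(n)]

C = christoffel_first_kind([2.0, 0.5])
print(C[1][1][0], C[0][1][1])  # Gamma-hat_{theta theta, r} and Gamma-hat_{r theta, theta}
```

In this example the only non-zero components are \(\widehat{\varGamma }_{\theta \theta , r} = -r\) and \(\widehat{\varGamma }_{r \theta , \theta } = \widehat{\varGamma }_{\theta r, \theta } = r\), and one can check that \(\widehat{\varGamma }_{ki,j} + \widehat{\varGamma }_{kj,i} = \partial g_{ij} / \partial x^k\), which is (1.5) in components.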

It turns out that one can define a kind of “compatibility” relation more general than expressed by (1.5), by introducing the notion of “conjugacy” (denoted by \(*\)) between two connections. A connection \(\nabla ^{*}\) is said to be conjugate (or dual) to \(\nabla \) with respect to \(g\) if

$$\begin{aligned} \partial _k g(\partial _i, \partial _j) = g (\nabla _{\partial _k} \partial _i, \partial _j ) + g (\partial _i , \nabla ^{*}_{\partial _k} \partial _j) . \end{aligned}$$
(1.6)

Clearly, \((\nabla ^{*})^{*} = \nabla \). Moreover, \(\widehat{\nabla }\), which satisfies (1.5), is special in the sense that it is self-conjugate \((\widehat{\nabla })^{*} = \widehat{\nabla }\).

Because metric tensor \(g\) provides a one-to-one mapping between points in the tangent space (i.e., vectors) and points in the cotangent space (i.e., co-vectors), (1.6) can also be seen as characterizing how co-vector fields are to be parallel-transported in order to preserve their dual pairing \(\langle \cdot , \cdot \rangle \) with vector fields.

Writing out (1.6):

$$\begin{aligned} \frac{\partial g_{ij}}{\partial x^k} = \varGamma _{ki,j} + \varGamma ^{*}_{kj, i} , \end{aligned}$$
(1.7)

where analogous to (1.2) and (1.4),

$$ \nabla ^{*}_{\partial _i} \partial _j = \sum _{l} \varGamma ^{*l}_{ij} \, \partial _{l} $$

so that

$$ \varGamma ^{*}_{kj, i} = g(\nabla ^{*}_{\partial _k} \partial _j , \partial _i ) = \sum _{l} g_{il} \varGamma ^{*l}_{kj} . $$

There is an alternative way of imposing a “compatibility” condition between a metric \(g\) and a connection \(\nabla \), namely by examining how the metric tensor \(g\) behaves under \(\nabla \). We introduce a \(3\)-tensor field, called the “cubic form”, as the covariant derivative of \(g\): \(C = \nabla g\), or in component form

$$ C(\partial _i,\partial _j,\partial _k) = (\nabla _{\partial _k} g)(\partial _i,\partial _j) = \partial _k g(\partial _i,\partial _j) - g(\nabla _{\partial _k} \partial _i, \partial _j) - g(\partial _i, \nabla _{\partial _k} \partial _j ). $$

Writing out the above:

$$ C_{ijk} = \frac{\partial g_{ij}}{\partial x^k} - \varGamma _{ki, j} - \varGamma _{kj,i} (= \varGamma ^{*}_{kj,i} - \varGamma _{kj,i} ). $$

From its definition, \(C_{ijk} = C_{jik}\), that is, \(C\) is symmetric with respect to its first two indices. It can be further shown that:

$$ C_{ijk} - C_{ikj} = \sum _{l} g_{il} \, (T^{l}_{jk} - T^{*l}_{jk} ) $$

where \(T, T^{*}\) are torsions of \(\nabla \) and \(\nabla ^{*}\), respectively. Therefore, \(C_{ijk}=C_{ikj}\), and hence \(C\) is totally symmetric under any permutation of its indices, when \(T^{l}_{jk} = T^{*l}_{jk}\). So conceptually, requiring \(C_{ijk}\) to be totally symmetric imposes a compatibility condition between \(g\) and \(\nabla \), making them a so-called “Codazzi pair” (see [20]). The Codazzi pairing generalizes the Levi-Civita coupling, whose corresponding cubic form \(C_{ijk}\) is easily seen to be identically zero. Lauritzen [10] defined a “statistical manifold” \((\mathfrak {M}, g, \nabla )\) to be a manifold \(\mathfrak {M}\) equipped with \(g\) and \(\nabla \) such that (i) \(\nabla \) is torsion free; (ii) \(\nabla g \equiv C\) is totally symmetric. Equivalently, a manifold is said to have statistical structure when the conjugate connection \(\nabla ^{*}\) (with respect to \(g\)) of a torsion-free connection \(\nabla \) is also torsion-free. In this case, \(\nabla ^{*} g = -C\), and the Levi-Civita connection is \(\hat{\nabla } = (\nabla + \nabla ^{*})/2\).

Two torsion-free connections \(\varGamma \) and \(\varGamma ^{\prime }\) are said to be projectively equivalent if there exists a function \(\tau \) such that:

$$ \varGamma ^{\prime k}_{ij} = \varGamma ^{k}_{ij} + \delta _i^k (\partial _j \tau ) + \delta _j^k (\partial _i \tau ) , $$

where \(\delta ^{k}_i\) is the Kronecker delta. When two connections are projectively equivalent, their corresponding auto-parallel curves have identical shape (i.e., considered as unparameterized curves); these so-called “pre-geodesics” differ only by a change of parameterization \(\tau \).

Two torsion-free connections \(\varGamma \) and \(\varGamma ^{\prime }\) are said to be dual-projectively equivalent if there exists a function \(\tau \) such that:

$$ \varGamma ^{\prime }_{ij,k} = \varGamma _{ij,k} - g_{ij} (\partial _k \tau ). $$

When two connections are dual-projectively equivalent, then their conjugate connections (with respect to \(g\)) have identical pre-geodesics (identical shape).

Recall that when two Riemannian metrics \(g, g^{\prime }\) are conformally equivalent, i.e., there exists a function \(\tau \) such that

$$ g^{\prime }_{ij} = e^{2\tau } g_{ij}, $$

then their respective Levi-Civita connections \(\widehat{\varGamma ^{\prime }}\) and \(\widehat{\varGamma }\) are related via

$$ \widehat{\varGamma ^{\prime }}_{ij, k} = \widehat{\varGamma }_{ij, k} - (\partial _k \tau ) g_{ij} + (\partial _j \tau ) g_{ik} + (\partial _i \tau ) g_{jk} . $$

(This relation is obtained by directly substituting in the expressions of the corresponding Levi-Civita connections.) This motivates the more general notion of conformal-projective equivalence of two statistical structures \((\mathfrak {M}, g, \varGamma )\) and \((\mathfrak {M}, g^{\prime }, \varGamma ^{\prime })\), defined through the existence of two functions \(\psi , \phi \) such that:

$$\begin{aligned} g^{\prime }_{ij}&= e^{\psi + \phi } g_{ij} \end{aligned}$$
(1.8)
$$\begin{aligned} \varGamma ^{\prime }_{ij, k}&= \varGamma _{ij, k} - (\partial _k \psi ) g_{ij} + (\partial _j \phi ) g_{ik} + (\partial _i \phi ) g_{jk} . \end{aligned}$$
(1.9)

When \(\psi = const\) (or \(\phi = const\)), the connection part (1.9) reduces to the projective (dual-projective, resp.) equivalence defined above, with \(\tau = \phi \) (\(\tau = \psi \), resp.).

1.2.3 Equiaffine Structure and Parallel Volume Form

For a restrictive set of connections, called “equiaffine” connections, the manifold \(\mathfrak {M}\) may admit, in a unique way, a volume form \(\varOmega (x)\) that is “parallel” under the given connection. Here, a volume form is a skew-symmetric multilinear map from \(n\) linearly independent vectors to a non-zero scalar at any point \(x \in \mathfrak {M}\), and “parallel” is in the sense that \(\nabla \varOmega = 0\), or \((\partial _i \varOmega )(\partial _1, \ldots , \partial _n)=0\) where

$$ (\partial _i\varOmega )(\partial _1, \ldots , \partial _n) \equiv \partial _i (\varOmega (\partial _1, \ldots , \partial _n)) - \sum _{l=1}^{n} \varOmega (\ldots , \nabla _{\partial _i} \partial _l, \ldots ) . $$

Applying (1.2), the equiaffine condition becomes

$$\begin{aligned}\partial _i (\varOmega (\partial _1, \ldots , \partial _n))&= \sum _{l=1}^{n} \varOmega \left( \ldots , \sum _{k=1}^{n} \varGamma ^{k}_{il} \partial _k, \ldots \right) \\&= \sum _{l=1}^{n} \sum _{k=1}^{n} \varGamma ^{k}_{il} \, \delta _{k}^{l} \, \varOmega (\partial _1, \ldots , \partial _n) = \varOmega (\partial _1, \ldots , \partial _n) \sum _{l=1}^{n} \varGamma ^{l}_{il} \end{aligned}$$

or

$$\begin{aligned} \sum _{l} \varGamma ^{l}_{il}(x) = \frac{\partial \log \varOmega (x)}{\partial x^i} . \end{aligned}$$
(1.10)

Whether or not a connection is equiaffine is related to the so-called Ricci tensor \(\text{ Ric }\), defined as the contraction of the Riemann curvature tensor \(R\)

$$\begin{aligned} \text{ Ric }_{ij}(x) = \sum _{k}R^k_{ikj}(x). \end{aligned}$$
(1.11)

For a torsion-free connection \(\varGamma ^{k}_{ij} = \varGamma ^{k}_{ji}\), we can verify that

$$\begin{aligned} \mathrm{{Ric}}_{ij} - \mathrm{{Ric}}_{ji}&= \frac{\partial }{\partial x^i} \left( \sum _{l} \varGamma ^{l}_{jl}(x) \right) - \frac{\partial }{\partial x^j} \left( \sum _{l} \varGamma ^{l}_{il}(x) \right) \\&= \sum _k R^{k}_{kij} . \nonumber \end{aligned}$$
(1.12)

One immediately sees that the existence of a function \(\varOmega \) satisfying (1.10) is equivalent to the right side of (1.12) being identically zero.

Making use of (1.10), it is easy to show that the parallel volume form of a Levi-Civita connection \(\widehat{\varGamma }\) is given by

$$ \widehat{\varOmega }(x) = \sqrt{\det [g_{ij}(x)]} . $$
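
A numerical check of (1.10) for a Levi-Civita connection can be sketched as follows. It assumes, as a hypothetical example, the unit 2-sphere metric \(g = \mathrm{diag}(1, \sin ^2 \theta )\) in coordinates \((\theta , \phi )\); the helper names and the finite-difference step are illustrative only.

```python
import math

def metric(x):
    """Hypothetical example: unit 2-sphere in coordinates (theta, phi)."""
    return [[1.0, 0.0], [0.0, math.sin(x[0]) ** 2]]

def inverse_2x2(g):
    det = g[0][0] * g[1][1] - g[0][1] * g[1][0]
    return [[g[1][1] / det, -g[0][1] / det], [-g[1][0] / det, g[0][0] / det]]

def partial(f, x, k, h=1e-5):
    """Central-difference estimate of d f / d x^k at x."""
    xp, xm = list(x), list(x)
    xp[k] += h
    xm[k] -= h
    return (f(xp) - f(xm)) / (2 * h)

def contracted_christoffel(x, i):
    """sum_l Gamma-hat^l_{il}(x), built from Christoffel symbols of the second kind."""
    n = len(x)
    ginv = inverse_2x2(metric(x))

    def dg(a, b, c):
        """Estimate of d g_ab / d x^c at x."""
        return partial(lambda y: metric(y)[a][b], x, c)

    total = 0.0
    for l in range(n):
        for m in range(n):
            total += 0.5 * ginv[l][m] * (dg(i, m, l) + dg(l, m, i) - dg(i, l, m))
    return total

def log_volume(x):
    """log of sqrt(det g), the candidate parallel volume density."""
    g = metric(x)
    return 0.5 * math.log(g[0][0] * g[1][1] - g[0][1] * g[1][0])

x = [1.0, 0.3]
for i in range(2):
    print(contracted_christoffel(x, i), partial(log_volume, x, i))
```

For this example both columns agree: the \(\theta \)-component equals \(\cot \theta = \partial _\theta \log \sin \theta \) and the \(\phi \)-component vanishes, consistent with \(\widehat{\varOmega } = \sqrt{\det [g_{ij}]}\) satisfying (1.10).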

Making use of (1.7), the parallel volume forms \(\varOmega , \varOmega ^{*}\) associated with \(\varGamma \) and \(\varGamma ^{*}\) satisfy (apart from a multiplicative constant which must be positive)

$$\begin{aligned} \varOmega (x) \, \varOmega ^{*}(x) = (\widehat{\varOmega }(x))^{2} = \det [g_{ij}(x)] . \end{aligned}$$
(1.13)

The equiaffine condition can also be expressed using a quantity related to the cubic form \(C_{ijk}\). We may introduce the Tchebychev form (also known as the first Koszul form), expressed in the local coordinates,

$$\begin{aligned} T_i = \sum _{j,k}C_{ijk}g^{jk} . \end{aligned}$$
(1.14)

Using the total symmetry of \(C_{ijk}\) together with (1.4), one finds \(T_i = \partial _i \log \det [g_{kl}] - 2 \sum _{l} \varGamma ^{l}_{il}\), so that

$$ \frac{\partial T_i}{\partial x^j} - \frac{\partial T_j}{\partial x^i} = 2 \left[ \frac{\partial }{\partial x^i} \left( \sum _{l} \varGamma ^{l}_{jl} \right) - \frac{\partial }{\partial x^j} \left( \sum _{l} \varGamma ^{l}_{il} \right) \right] , $$

which is twice the quantity in (1.12). Therefore, an equivalent requirement for equiaffine structure is that the Tchebychev 1-form \(T\) is “closed”:

$$\begin{aligned} \frac{\partial T_i}{\partial x^j} = \frac{\partial T_j}{\partial x^i}. \end{aligned}$$
(1.15)

This expresses the integrability condition. When Eq. (1.15) is satisfied, there exists a function \(\tau \) such that \(T_i = \partial _i \tau \). Furthermore, it can be shown that

$$ \tau = - 2 \log (\varOmega / \widehat{\varOmega }) . $$

Proposition 1

([13, 25]) The necessary and sufficient condition for a torsion-free connection \(\nabla \) to be equiaffine is for any of the following to hold:

  1. There exists a \(\nabla \)-parallel volume element \(\varOmega : \nabla \varOmega = 0\).

  2. Ricci tensor of \(\nabla \) is symmetric: \(\text{ Ric }_{ij} = \text{ Ric }_{ji}\).

  3. Curvature tensor \(\sum _k R^{k}_{kij} = 0\).

  4. The Tchebychev 1-form \(T\) is closed, \(d T = 0\).

  5. There exists a function \(\tau \), called Tchebychev potential, such that \(T_i = \partial _i \tau \).

It is known that the Ricci tensor of the Levi-Civita connection is always symmetric—this is why Riemannian volume form \(\widehat{\varOmega }\) always exists.

1.2.4 \(\alpha \)-Structure and \(\alpha \)-Hessian Structure

On a statistical manifold, one can define a one-parameter family of affine connections \(\varGamma ^{(\alpha )}\), called “\(\alpha \)-connections” (\(\alpha \in \mathbb {R}\)):

$$\begin{aligned} \varGamma ^{(\alpha )k}_{ij} = \frac{1+\alpha }{2} \varGamma _{ij}^{k} + \frac{1-\alpha }{2} \varGamma ^{*k}_{ij} . \end{aligned}$$
(1.16)

Obviously, \(\varGamma ^{(0)} = \widehat{\varGamma }\) is the Levi-Civita connection. In terms of the cubic form, the definition (1.16) amounts to \(\nabla ^{(\alpha )} g = \alpha C\). The \(\alpha \)-parallel volume element is given by:

$$ \varOmega ^{(\alpha )} = e^{-\frac{\alpha }{2}\tau }\widehat{\varOmega } $$

where \(\tau \) is the Tchebychev potential. The Riemannian volume element \(\widehat{\varOmega }\) is only parallel with respect to the Levi-Civita connection \(\hat{\nabla }\) of \(g\), that is, \(\hat{\nabla } \widehat{\varOmega }=0\), but not other \(\alpha \)-connections \((\alpha \ne 0)\). Rather, \(\nabla ^{(\alpha )} \varOmega ^{(\alpha )} = 0\).

It can be further shown that the curvatures \(R_{lk ij}, R^{*}_{lk ij}\) for the pair of conjugate connections \(\varGamma , \varGamma ^{*}\) satisfy

$$ R_{lk ij} = R^{*}_{lk ij}. $$

So, \(\varGamma \) is flat if and only if \(\varGamma ^{*}\) is flat. In this case, the manifold is said to be “dually flat”. When \(\varGamma , \varGamma ^{*}\) are dually flat, then \(\varGamma ^{(\alpha )}\) is called “\(\alpha \)-transitively flat” [21]. In such case, \(\{ \mathfrak {M}, g, \varGamma ^{(\alpha )}, \varGamma ^{(-\alpha )} \}\) is called an \(\alpha \)-Hessian structure [26]. They are all compatible with a metric \(g\) that is induced from a strictly convex (potential) function, see next subsection.

For an \(\alpha \)-Hessian manifold, the Tchebychev form (1.14) is given by

$$ T_i = \frac{\partial \log (\det [g_{kl}])}{\partial x^i} $$

and its derivative (known as the second Koszul form) is

$$ \beta _{ij} = \frac{\partial T_i}{\partial x^j} = \frac{\partial ^2\log (\det [g_{kl}])}{\partial x^i\partial x^j} . $$

1.2.5 Biorthogonal Coordinates

A key feature for \(\alpha \)-Hessian manifolds is biorthogonal coordinates, as we shall discuss now. They are the “best” coordinates one can have when the Riemannian metric is non-trivial.

Consider coordinate transform \(x \mapsto u\),

$$ \partial ^{i} \equiv \frac{\partial }{\partial u_i} = \sum _{l} \frac{\partial x^l}{\partial u_i} \frac{\partial }{\partial x^l} = \sum _{l} F^{li} \partial _l $$

where the Jacobian matrix \(F\) is given by

$$\begin{aligned} F_{ij}(x) = \frac{\partial u_i}{\partial x^j} , \quad F^{ij}(u) = \frac{\partial x^i}{\partial u_j} , \quad \sum _{l} F_{il} F^{lj} = \delta ^{j}_{i} \end{aligned}$$
(1.17)

where \(\delta _{i}^{j}\) is the Kronecker delta (taking the value of 1 when \(i=j\) and 0 otherwise). If the new coordinate system \(u=[u_1, \ldots , u_n]\) (with components expressed by subscripts) is such that

$$\begin{aligned} F_{ij} (x) = g_{ij}(x), \end{aligned}$$
(1.18)

then the \(x\)-coordinate system and the \(u\)-coordinate system are said to be “biorthogonal” to each other since, from the definition of metric tensor (1.1),

$$ g(\partial _i, \partial ^j) = g(\partial _i, \sum _{l} F^{lj} \partial _l ) = \sum _l F^{lj} g(\partial _i, \partial _l) = \sum _l F^{lj} g_{il} = \delta _{i}^{j} . $$
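
The biorthogonality relation can be illustrated numerically. The sketch below assumes a hypothetical coordinate change \(u_i = e^{x^i} / (1 + \sum _l e^{x^l})\), whose Jacobian is symmetric and positive definite and is taken as the metric via (1.18); the Jacobians are approximated by central differences, and all names are illustrative.

```python
import math

def u_of_x(x):
    """Hypothetical coordinate change u_i = exp(x^i) / (1 + sum_l exp(x^l))."""
    z = 1.0 + sum(math.exp(v) for v in x)
    return [math.exp(v) / z for v in x]

def x_of_u(u):
    """Inverse change x^i = log(u_i / (1 - sum_l u_l))."""
    rest = 1.0 - sum(u)
    return [math.log(v / rest) for v in u]

def jacobian(f, p, h=1e-6):
    """J[i][j] = d f_i / d p_j by central differences."""
    n = len(p)
    cols = []
    for j in range(n):
        pp, pm = list(p), list(p)
        pp[j] += h
        pm[j] -= h
        fp, fm = f(pp), f(pm)
        cols.append([(fp[i] - fm[i]) / (2 * h) for i in range(n)])
    return [[cols[j][i] for j in range(n)] for i in range(n)]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][l] * b[l][j] for l in range(n)) for j in range(n)]
            for i in range(n)]

x = [0.3, -0.7]
u = u_of_x(x)
F_lower = jacobian(u_of_x, x)       # F_ij = d u_i / d x^j, taken as g_ij by (1.18)
F_upper = jacobian(x_of_u, u)       # F^ij = d x^i / d u_j
pairing = matmul(F_lower, F_upper)  # entries g(d_i, d^j) = sum_l g_il F^lj
print(pairing)
```

Up to finite-difference error, `pairing` is the identity matrix, which is the statement \(g(\partial _i, \partial ^j) = \delta _i^j\) for this particular choice of coordinates.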

In such case, denote

$$\begin{aligned} g^{ij}(u) = g(\partial ^i, \partial ^j) , \end{aligned}$$
(1.19)

which equals \(F^{ij}\), the Jacobian of the inverse coordinate transform \(u \mapsto x\). We also introduce the contravariant version of the affine connection \(\varGamma \) under \(u\)-coordinates and denote it by an unconventional notation \(\varGamma ^{rs}_{t}\) defined by

$$ \nabla _{\partial ^r} \partial ^s = \sum _{t} \varGamma ^{rs}_{t} \partial ^{t} ; $$

similarly \(\varGamma ^{*rs}_{t}\) is defined via

$$ \nabla ^{*}_{\partial ^r} \partial ^s = \sum _{t} \varGamma ^{*rs}_{t} \partial ^{t} . $$

The covariant version of the affine connections will be denoted by superscripted \(\varGamma \) and \(\varGamma ^{*}\)

$$\begin{aligned} \varGamma ^{ij,k}(u) = g(\nabla _{\partial ^i} \partial ^{j}, \partial ^{k}) , \quad \varGamma ^{*ij,k}(u) = g(\nabla ^{*}_{\partial ^i} \partial ^{j}, \partial ^{k}) . \end{aligned}$$
(1.20)

The affine connections in \(u\)-coordinates (expressed in superscript) and in \(x\)-coordinates (expressed in subscript) are related via

$$\begin{aligned} \varGamma ^{rs}_{t}(u) = \sum _{k} \left( \sum _{i,j} \frac{\partial x^r}{\partial u_i} \frac{\partial x^s}{\partial u_j} \varGamma _{ij}^{k}(x) + \frac{\partial ^2 x^k}{\partial u_r \partial u_s} \right) \frac{\partial u_k}{\partial x^t} \end{aligned}$$
(1.21)

and

$$\begin{aligned} \varGamma ^{rs,t}(u) = \sum _{i,j,k} \frac{\partial x^r}{\partial u_i} \frac{\partial x^s}{\partial u_j} \frac{\partial x^t}{\partial u_k} \varGamma _{ij,k}(x) + \frac{\partial ^2 x^t}{\partial u_r \partial u_s} . \end{aligned}$$
(1.22)

Similar relations hold between \(\varGamma ^{*rs}_{t}(u)\) and \(\varGamma ^{*k}_{ij}(x)\), and between \(\varGamma ^{*rs,t}(u)\) and \(\varGamma ^{*}_{ij,k}(x)\).

Analogous to (1.7), we have the following identity

$$ \frac{\partial ^2 x^t}{\partial u_s \partial u_r} = \frac{\partial g^{rt}(u)}{\partial u_s} = \varGamma ^{rs,t}(u) + \varGamma ^{*ts,r}(u) . $$

Therefore, we have

Proposition 2

Under biorthogonal coordinates, a pair of conjugate connections \(\varGamma , \varGamma ^{*}\) satisfy

$$\begin{aligned} \varGamma ^{*ts,r}(u) = - \sum _{i,j,k} g^{ir}(u) g^{js}(u) g^{kt}(u) \varGamma _{ij,k}(x) \end{aligned}$$
(1.23)

and

$$\begin{aligned} \varGamma ^{*\, ts}_{r}(u) = - \sum _j g^{js}(u) \varGamma ^{t}_{jr}(x) . \end{aligned}$$
(1.24)

Let us now express parallel volume forms \(\varOmega (x), \varOmega (u)\) under biorthogonal coordinates \(x\) or \(u\). Contracting the indices \(t\) with \(r\) in (1.24), and invoking (1.10), we obtain

$$ \frac{\partial \log \varOmega ^{*} (u)}{\partial u_s} + \sum _j \frac{\partial x^{j}}{\partial u_s} \frac{\partial \log \varOmega (x)}{\partial x^j} = \frac{\partial \log \varOmega ^{*}(u)}{\partial u_s} + \frac{\partial \log \varOmega (x)}{\partial u_s} = 0 . $$

After integration,

$$\begin{aligned} \varOmega ^{*}(u) \, \varOmega (x) = const . \end{aligned}$$
(1.25)

From (1.13) and (1.25),

$$\begin{aligned} \varOmega (u) \, \varOmega ^{*}(x) = const . \end{aligned}$$
(1.26)

The relations (1.25) and (1.26) indicate that the volume forms of the pair of conjugate connections, when expressed in biorthogonal coordinates respectively, are inversely proportional to each other.

The \(\varGamma ^{(\alpha )}\)-parallel volume element \(\varOmega ^{(\alpha )}\) can be shown to be given by (in either \(x\) or \(u\) coordinates)

$$ \varOmega ^{(\alpha )} = \varOmega ^{\frac{1+\alpha }{2}} (\varOmega ^{*})^{\frac{1-\alpha }{2}} . $$

Clearly,

$$ \varOmega ^{(\alpha )}(x) \varOmega ^{(-\alpha )}(x) = \det [g_{ij}(x)] \longleftrightarrow \varOmega ^{(\alpha )}(u) \varOmega ^{(-\alpha )}(u) = \det [g^{ij}(u)] . $$

1.2.6 Existence of Biorthogonal Coordinates

From its definition (1.18), we can easily show that

Proposition 3

A Riemannian manifold with metric \(g_{ij}\) admits biorthogonal coordinates if and only if \(\frac{\partial g_{ij}}{\partial x^k}\) is totally symmetric

$$\begin{aligned} \frac{\partial g_{ij}(x)}{\partial x^k} = \frac{\partial g_{ik}(x)}{\partial x^j} . \end{aligned}$$
(1.27)

That (1.27) is satisfied for biorthogonal coordinates is evident by virtue of (1.17) and (1.18). Conversely, given (1.27), there must be \(n\) functions \(u_i(x), i=1,2, \ldots , n\) such that

$$ \frac{\partial u_i(x)}{\partial x^j} = g_{ij}(x) = g_{ji}(x) = \frac{\partial u_j(x)}{\partial x^i} . $$

The above identity implies that there exists a function \(\varPhi \) such that \(u_i = \partial _i \varPhi \) and, by positive definiteness of \(g_{ij}\), \(\varPhi \) has to be a strictly convex function. In this case, the \(x\)- and \(u\)-variables satisfy (1.37), and the pair of convex functions, \(\varPhi \) and its conjugate \(\widetilde{\varPhi }\), are related to \(g_{ij}\) and \(g^{ij}\) by

$$ g_{ij}(x) = \frac{\partial ^{2} \varPhi (x)}{\partial x^i \partial x^j} \longleftrightarrow g^{ij}(u) = \frac{\partial ^{2} \widetilde{\varPhi }(u)}{\partial u_i \partial u_j} . $$
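
As a simple worked illustration (an assumed example, taking the conjugate to be the usual Legendre transform \(\widetilde{\varPhi }(u) = \sum _i x^i u_i - \varPhi (x)\)), let \(\varPhi (x) = \sum _i e^{x^i}\). Then \(u_i = \partial _i \varPhi = e^{x^i}\), so \(x^i = \log u_i\), and

$$ \widetilde{\varPhi }(u) = \sum _i (u_i \log u_i - u_i) , \qquad g_{ij}(x) = e^{x^i} \delta _{ij} , \qquad g^{ij}(u) = \frac{\delta _{ij}}{u_i} . $$

The two matrices are inverse to each other once \(u_i = e^{x^i}\) is substituted, and \(\partial g_{ij} / \partial x^k = e^{x^i} \delta _{ij} \delta _{ik}\) is totally symmetric, as required by (1.27).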

It follows from Proposition 3 that a necessary and sufficient condition for a Riemannian manifold to admit biorthogonal coordinates is that its Levi-Civita connection is given by

$$ \widehat{\varGamma }_{ij,k} (x) \equiv \frac{1}{2} \left( \frac{\partial g_{ik}}{\partial x^j} + \frac{\partial g_{jk}}{\partial x^i} - \frac{\partial g_{ij}}{\partial x^k} \right) = \frac{1}{2}\frac{\partial g_{ij}}{\partial x^k} . $$

From this, the following can be shown:

Proposition 4

A Riemannian manifold \(\{ \mathfrak {M}, g\}\) admits a pair of biorthogonal coordinates \(x\) and \(u\) if and only if there exists a pair of conjugate connections \(\gamma \) and \(\gamma ^{*}\) such that \(\gamma _{ij,k}(x)=0, \, \gamma ^{*rs, t}(u) = 0\).

In other words, biorthogonal coordinates are affine coordinates for the dually-flat pair of connections. In fact, we can now define a pair of torsion-free connections by

$$ \gamma _{ij,k}(x) = 0, \quad \gamma ^*_{ij,k}(x) = \frac{\partial g_{ij}}{\partial x^k} $$

and show that they are conjugate with respect to \(g\), that is, they satisfy (1.6). This is to say that we select an affine connection \(\gamma \) such that \(x\) is its affine coordinate. From (1.22), when \(\gamma ^{*}\) is expressed in \(u\)-coordinates,

$$\begin{aligned} \gamma ^{*rs,t}(u)&= \sum _{i,j,k} g^{ir}(u) g^{js}(u) \frac{\partial x^k}{\partial u_t} \frac{\partial g_{ij}(x)}{\partial x^k} + \frac{\partial g^{ts}(u)}{\partial u_r} \\&= \sum _{i,j} g^{ir}(u) \left( - \frac{\partial g^{js}(u)}{\partial u_t} g_{ij}(x) \right) + \frac{\partial g^{ts}(u)}{\partial u_r} \\&= - \sum _{j} \delta _{j}^{r} \frac{\partial g^{js}(u)}{\partial u_t} + \frac{\partial g^{ts}(u)}{\partial u_r} = 0 . \end{aligned}$$

This implies that \(u\) is an affine coordinate system with respect to \(\gamma ^{*}\). Therefore, biorthogonal coordinates are affine coordinates for a pair of dually-flat connections.
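
As a worked consequence (a sketch, assuming one takes \(\varGamma = \gamma \) and \(\varGamma ^{*} = \gamma ^{*}\) in (1.16) and lowers the index with \(g\) as in (1.4)), the \(\alpha \)-connections of such a dually flat structure read, in \(x\)-coordinates,

$$ \varGamma ^{(\alpha )}_{ij,k}(x) = \frac{1+\alpha }{2} \gamma _{ij,k}(x) + \frac{1-\alpha }{2} \gamma ^{*}_{ij,k}(x) = \frac{1-\alpha }{2} \frac{\partial g_{ij}}{\partial x^k} . $$

Setting \(\alpha = 1\) recovers \(\gamma \), \(\alpha = -1\) recovers \(\gamma ^{*}\), and \(\alpha = 0\) gives \(\frac{1}{2} \partial g_{ij} / \partial x^k\), which agrees with the expression for the Levi-Civita connection \(\widehat{\varGamma }_{ij,k}\) obtained from (1.27).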

1.2.7 Symplectic, Complex, and Kähler Structures

A symplectic structure on a manifold refers to the existence of a closed, non-degenerate 2-form \(\omega \); at each point this is a skew-symmetric bilinear map \(\omega \text {: } W \times W \rightarrow \mathbb {R}\), with \(\omega (X,Y) = - \omega (Y,X)\) for all \(X,Y \in W \subseteq \mathbb {R}^{2n}\). For \(\omega \) to be well-defined, the vector space \(W\) is required to be orientable and even-dimensional. In this case, there exists a basis \(\{ e_1, \ldots , e_n, f_1, \ldots , f_n \}\) of \(W\), \(\dim (W)=2n\), such that \(\omega (e_i, e_j) = 0\), \(\omega (f_i, f_j) = 0\), \(\omega (e_i, f_j) = \delta _{ij}\) for all indices \(i,j\) taking values in \(1, \ldots , n\).

Symplectic structure is closely related to inner-product structure (the existence of a positive-definite symmetric bilinear map \(G\text {: } W \times W \rightarrow \mathbb {R}\)) and complex structure (a linear mapping \(J\text {: }W \rightarrow W\) such that \(J^{2} = -Id\)) on an even-dimensional vector space \(W\). The complex structure \(J\) on \(W\) is said to be compatible with a symplectic structure \(\omega \) if \( \omega (J X, JY) = \omega (X,Y)\) for any \(X, Y \in W\) (symplectomorphism condition) and \(\omega (X, JX) > 0\) for any nonzero \(X \in W\) (taming condition). With \(\omega , J\) given, \(G(X,Y) \equiv \omega (X, JY) \) can be shown to be symmetric and positive-definite, and hence an inner-product on \(W\).
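The compatibility conditions can be illustrated on \(W = \mathbb {R}^{2n}\) with the standard basis \(\{e_i, f_i\}\). The sketch below assumes the particular choice \(J e_i = f_i\), \(J f_i = -e_i\) and represents \(\omega \) and \(J\) by integer matrices; the helper names are hypothetical. It checks that \(J^2 = -Id\), that \(\omega (JX, JY) = \omega (X,Y)\), and that \(G(X,Y) = \omega (X, JY)\) is the identity matrix, hence symmetric and positive-definite.

```python
def block_matrix(n, a, b, c, d):
    """Assemble the 2n x 2n matrix [[a*I, b*I], [c*I, d*I]] with n x n blocks."""
    rows = []
    for i in range(2 * n):
        row = []
        for j in range(2 * n):
            coeff = (a, b, c, d)[2 * (i // n) + (j // n)]
            row.append(coeff if i % n == j % n else 0)
        rows.append(row)
    return rows

def matmul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    size = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def transpose(a):
    return [list(col) for col in zip(*a)]

n = 2
# omega(X, Y) = X^T Omega Y with omega(e_i, f_j) = delta_ij and omega(f_i, e_j) = -delta_ij
Omega = block_matrix(n, 0, 1, -1, 0)
# Columns of J are the images of the basis vectors: J e_i = f_i, J f_i = -e_i
J = block_matrix(n, 0, -1, 1, 0)

J_squared = matmul(J, J)                                # matrix of J^2
omega_after_J = matmul(transpose(J), matmul(Omega, J))  # matrix of omega(JX, JY)
G = matmul(Omega, J)                                    # matrix of G(X, Y) = omega(X, JY)
```

Because \(G\) comes out as the identity here, the taming condition holds trivially; other compatible choices of \(J\) would give other inner products.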

The cotangent bundle \(\mathcal{T}^{*}\mathfrak {M}\) of any manifold \(\mathfrak {M}\) admits a canonical symplectic form written as

$$ \omega = \sum _{i=1}^{n} dx^i \wedge dp_i , $$

where \((x^1, \ldots , x^n, p_1, \ldots , p_n)\) are coordinates of \(\mathcal{T}^{*}\mathfrak {M}\). That \(\omega \) is closed can be shown by the existence of the tautological (or Liouville) 1-form

$$ \alpha = \sum _{i=1}^{n} p_i dx^i $$

(which can be checked to be coordinate-independent on \(\mathcal{T}^{*}\mathfrak {M}\)) and then verifying \(\omega = - d\alpha \). Hence, \(\omega \) is also coordinate-independent. Taking \(\partial _{i} = \partial / \partial x^i, \widetilde{\partial }_{i} = \partial / \partial p_i\) as the basis of the tangent bundle \(\mathcal{T}\mathfrak {M}\), we have

$$\begin{aligned} \omega (\partial _i, \partial _j) = \omega (\widetilde{\partial }_i, \widetilde{\partial }_j) = 0; \,\, \omega (\partial _i, \widetilde{\partial }_j) = - \omega (\widetilde{\partial }_j, \partial _i) = \omega _{ij}. \end{aligned}$$
(1.28)

That is, when viewed as a \(2 \times 2\) block matrix with \(n \times n\) blocks, \(\omega \) vanishes on the diagonal blocks and has non-zero entries \(\omega _{ij}\) and \(-\omega _{ij}\) only on the off-diagonal blocks.

The aforementioned linear map \(J\) of the tangent space \(T_x\mathfrak {M}\simeq W\) at any point \(x \in \mathfrak {M}\)

$$J: J \partial _i = \widetilde{\partial }_i, \quad J \widetilde{\partial }_i = -\partial _i,$$

gives rise to an “almost complex structure” on \(T_x\mathfrak {M}\). For \(\mathcal{T}\mathfrak {M}\) to be complex, that is, to admit complex coordinates, an integrability condition needs to be imposed so that the \(J\)-maps at various base points \(x\) of \(\mathfrak {M}\), and hence at various tangent spaces \(T_x\mathfrak {M}\), are “compatible” with one another. The condition is that the so-called Nijenhuis tensor \(N\)

$$ N(X,Y) = [JX, JY] - J[X, JY] - J[JX, Y] - [X,Y] $$

must vanish for arbitrary tangent vector fields \(X,Y\).

The Riemannian metric tensor \(G\) on \(\mathcal{T}\mathfrak {M}\) compatible with \(\omega \) has the form

$$\begin{aligned}G_{ij'}&\equiv G(\partial _i, \tilde{\partial }_j) = 0; \\ G_{i'j}&\equiv G(\tilde{\partial }_i, \partial _j) = 0; \\ G_{ij}&\equiv G(\partial _i, \partial _j) = g_{ij}; \\ G_{i'j'}&\equiv G(\tilde{\partial }_i, \tilde{\partial }_j) = g_{ij}, \end{aligned}$$

where \(i'=n+i,\,j'=n+j\) and \(i,\,j\) take values in \(1,\ldots , n\). When viewed as a \(2 \times 2\) block matrix with \(n \times n\) blocks, \(G\) vanishes on the off-diagonal blocks and has non-zero entries \(g_{ij}\) only on the two diagonal blocks. Such a metric on \(\mathcal{T}\mathfrak {M}\) is in the form of the Sasaki metric, which can also result from an appropriate “lift” of the Riemannian metric on \(\mathfrak {M}\) into \(\mathcal{T}\mathfrak {M}\), via an affine connection on \(\mathcal{T}\mathfrak {M}\) which induces a splitting of \(\mathcal{TT}\mathfrak {M}\), the tangent bundle of \(\mathcal{T}\mathfrak {M}\) as the base manifold. We omit the technical details here, but refer interested readers to Yano and Ishihara [22] and, in the context of statistical manifolds, to Matsuzoe and Inoguchi [12].

It is a basic result in symplectic geometry that for any symplectic form, there exists a compatible almost complex structure \(J\). Along with the Riemannian metric, the three structures \((G, \omega , J)\) are said to form a compatible triple if any two give rise to the third. When a manifold has a compatible triple \((G, \omega , J)\) in which \(J\) is integrable, it is called a Kähler manifold. On a Kähler manifold, using complex coordinates, the metric \(\tilde{G}\) associated with the complex line-element

$$ ds^2 = \tilde{G}_{ij} dz^i d\bar{z}^j , $$

is given by

$$ \tilde{G}_{ij} (z, \bar{z}) = \frac{\partial ^2 \varPhi }{\partial z^{i} \partial \bar{z}^{j}} . $$

Here the real-valued function \(\varPhi \) (of complex variables) is called the “Kähler potential”.

It is known that the tangent bundle \(\mathcal{T}\mathfrak {M}\) of a manifold \(\mathfrak {M}\) with a flat connection on it admits a complex structure [7]. As pointed out in [18], a Hessian manifold can be seen as a “real” Kähler manifold.

Proposition 5

([7]) \((\mathfrak {M}, g, \nabla )\) is a Hessian manifold if and only if \((T\mathfrak {M}, J, G)\) is a Kähler manifold, where \(G\) is the Sasaki lift of \(g\).

1.3 Divergence Functions and Induced Structures

1.3.1 Statistical Structure Induced on \(T\mathfrak {M}\)

A divergence function \(\mathcal{D}\text {: } \mathfrak {M}\times \mathfrak {M}\rightarrow \mathbb {R}_{\ge 0}\) on a manifold \(\mathfrak {M}\) under a local chart \(V \subseteq \mathbb {R}^n\) is defined as a smooth function (differentiable up to third order) which satisfies

  (i) \(\mathcal{D}(x,y) \ge 0 \,\, \forall x, y \in V\) with equality holding if and only if \(x=y\);

  (ii) \(\mathcal{D}_{i}(x,x) = \mathcal{D}_{,j}(x,x) = 0, \forall i,j \in \{1,2,\ldots , n\}\);

  (iii) \( - \mathcal{D}_{i,j}(x,x)\) is positive definite.

Here \(\mathcal{D}_{i}(x,y) = \partial _{x^{i}} \mathcal{D}(x,y)\), \(\mathcal{D}_{,i}(x,y) = \partial _{y^{i}} \mathcal{D}(x,y)\) denote partial derivatives with respect to the \(i\)-th component of point \(x\) and of point \(y\), respectively, \(\mathcal{D}_{i,j}(x,y) = \partial _{x^{i}} \partial _{y^j} \mathcal{D}(x,y)\) the second-order mixed derivative, etc.

On a manifold, divergence functions act as “pseudo-distance” functions that are non-negative but need not be symmetric. That dualistic Riemannian manifold structure (i.e., statistical structure) can be induced from a divergence function was first demonstrated by S. Eguchi.

Proposition 6

([8, 9]) A divergence function \(\mathcal{D}\) induces a Riemannian metric \(g\) and a pair of torsion-free conjugate connections \(\varGamma , \varGamma ^{*}\) given as

$$\begin{aligned}g_{ij} (x)&= - \left. \mathcal{D}_{i,j}(x,y)\right| _{x=y}; \\ \varGamma _{ij,k} (x)&= - \left. \mathcal{D}_{ij,k}(x,y)\right| _{x=y}; \\ \varGamma ^{*}_{ij,k} (x)&= - \left. \mathcal{D}_{k,ij}(x,y) \right| _{x=y}. \end{aligned}$$

It is easily verified that \(\varGamma _{ij,k}, \varGamma ^{*}_{ij,k}\) as given above are torsion-free (see Footnote 3) and satisfy the conjugacy condition with respect to the induced metric \(g_{ij}\). Hence \(\{ \mathfrak {M}, g, \varGamma , \varGamma ^{*} \}\) as induced is a “statistical manifold” [10].
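The formulas of Proposition 6 can be spot-checked numerically. The sketch below is an illustration under stated assumptions rather than a general implementation: it takes the Kullback-Leibler divergence between two Bernoulli distributions with parameters \(x\) and \(y\) as a one-dimensional divergence function, approximates the mixed derivatives by nested central differences (the step `H` is an arbitrary choice), and compares the induced metric with the closed-form Fisher information \(1/(x(1-x))\). It also records the derivative of the metric, so that the conjugacy condition, which in one dimension reduces to \(g' = \varGamma + \varGamma ^{*}\), can be inspected. All function names are hypothetical.

```python
import math

H = 1e-3  # finite-difference step; an arbitrary choice for this sketch

def d_first(f):
    """Central difference with respect to the first argument (the point x)."""
    return lambda x, y: (f(x + H, y) - f(x - H, y)) / (2 * H)

def d_second(f):
    """Central difference with respect to the second argument (the point y)."""
    return lambda x, y: (f(x, y + H) - f(x, y - H)) / (2 * H)

def kl_bernoulli(x, y):
    """KL divergence between Bernoulli(x) and Bernoulli(y), for 0 < x, y < 1."""
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

def induced_structure(D, x):
    """Approximate (g, Gamma, Gamma*) at x from derivatives of D on the diagonal."""
    g = -d_first(d_second(D))(x, x)                     # -D_{1,1}
    gamma = -d_first(d_first(d_second(D)))(x, x)        # -D_{11,1}
    gamma_star = -d_first(d_second(d_second(D)))(x, x)  # -D_{1,11}
    return g, gamma, gamma_star

x0 = 0.3
g, gamma, gamma_star = induced_structure(kl_bernoulli, x0)
fisher = 1.0 / (x0 * (1.0 - x0))                # closed-form metric for comparison
dg_dx = -1.0 / x0 ** 2 + 1.0 / (1.0 - x0) ** 2  # closed-form derivative of the metric
```

In this parametrization \(\varGamma \) vanishes up to discretization error, so \(x\) is an affine coordinate for \(\varGamma \), while \(\varGamma ^{*}\) carries the whole derivative of the metric.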

A natural question is whether/how the statistical structures induced from different divergence functions are related. The following is known:

Proposition 7

([14]) Let \(\mathcal{D}\) be a divergence function and \(\psi , \phi \) be two arbitrary functions. If \(\mathcal{D}^{\prime }(x,y) = e^{\psi (x) + \phi (y)} \mathcal{D}(x,y) \), then \(\mathcal{D}^{\prime }(x,y)\) is also a divergence function, and the structure \((\mathfrak {M}, g^{\prime }, \varGamma ^{\prime })\) it induces and the structure \((\mathfrak {M}, g, \varGamma )\) induced from \(\mathcal{D}(x,y)\) are conformally-projectively equivalent. In particular, when \(\phi = const\), then \(\varGamma ^{\prime }\) and \(\varGamma \) are projectively equivalent; when \(\psi = const\), then \(\varGamma ^{\prime }\) and \(\varGamma \) are dual-projectively equivalent.

1.3.2 Symplectic Structure Induced on \(\mathfrak {M}\times \mathfrak {M}\)

A divergence function \(\mathcal{D}\) is given as a bi-variable function on \(\mathfrak {M}\) (of dimension \(n\)). We now view it as a (single-variable) function on \(\mathfrak {M}\times \mathfrak {M}\) (of dimension \(2n\)) that assumes zero value along the diagonal \(\varDelta _{\mathfrak {M}} \subset \mathfrak {M}\times \mathfrak {M}\). In this subsection, we investigate the condition under which a divergence function can serve as a “generating function” of a symplectic structure on \(\mathfrak {M}\times \mathfrak {M}\). A compatible metric on \(\mathfrak {M}\times \mathfrak {M}\) will also be derived.

First, we fix a particular \(y\) or a particular \(x\) in \(\mathfrak {M}\times \mathfrak {M}\)—this results in two \(n\)-dimensional submanifolds of \(\mathfrak {M}\times \mathfrak {M}\) that will be denoted, respectively, \(\mathfrak {M}_{x} \simeq \mathfrak {M}\) (with \(y\) point fixed) and \(\mathfrak {M}_{y} \simeq \mathfrak {M}\) (with \(x\) point fixed). Let us write out the canonical symplectic form \(\omega _x\) on the cotangent bundle \(\mathcal{T}^*\mathfrak {M}_x\) given by

$$ \omega _x =dx^i\wedge d\xi ^i. $$

Given \(\mathcal{D}\), we define a map \(L_\mathcal{D}\) from \(\mathfrak {M}\times \mathfrak {M}\rightarrow \mathcal{T}^*\mathfrak {M}_x, (x,y) \mapsto (x, \xi )\) given by

$$ L_\mathcal{D}\text {:} \,\,\,\, (x,y) \mapsto (x,\mathcal{D}_i(x,y)dx^i). $$

(Recall that the comma separates the variable being in the first slot versus the second slot for differentiation.) It is easy to check that in a neighborhood of the diagonal \(\varDelta _{\mathfrak {M}} \subset \mathfrak {M}\times \mathfrak {M}\), the map \(L_\mathcal{D}\) is a diffeomorphism since the Jacobian matrix of the map

$$ \left( \begin{array}{cc} \delta _{ij}&{}\mathcal{D}_{ij}\\ 0 &{} \mathcal{D}_{i,j} \end{array}\right) $$

is non-degenerate in such a neighborhood of the diagonal \(\varDelta _{\mathfrak {M}}\).

We calculate the pullback of this symplectic form (defined on \(\mathcal{T}^{*}\mathfrak {M}_x\)) to \(\mathfrak {M}\times \mathfrak {M}\):

$$\begin{aligned}L_\mathcal{D}^*\, \omega _x&= L_\mathcal{D}^*\, ( dx^i\wedge d\xi ^i ) = dx^i\wedge d \mathcal{D}_i(x,y) \\&= dx^i\wedge (\mathcal{D}_{ij}(x,y)dx^j + \mathcal{D}_{i,j}dy^j) = \mathcal{D}_{i,j}(x,y)dx^i\wedge dy^j . \end{aligned}$$

(Here \(\mathcal{D}_{ij} dx^i \wedge dx^j = 0\) since \(\mathcal{D}_{ij}(x,y) = \mathcal{D}_{ji} (x,y)\) always holds.)

Similarly, we consider the canonical symplectic form \(\omega _y = dy^i \wedge d\eta ^i\) on \(\mathfrak {M}_y\) and define a map \(R_\mathcal{D}\) from \(\mathfrak {M}\times \mathfrak {M}\rightarrow \mathcal{T}^*\mathfrak {M}_y, (x,y) \mapsto (y, \eta )\) given by

$$ R_\mathcal{D}\text {:} \,\,\,\, (x,y) \mapsto (y,\mathcal{D}_{,i}(x,y)dy^i). $$

Using \(R_\mathcal{D}\) to pullback \(\omega _y\) to \(\mathfrak {M}\times \mathfrak {M}\) yields an analogous formula:

$$ R_\mathcal{D}^*\, \omega _y = - \mathcal{D}_{i,j}(x,y)dx^i\wedge dy^j. $$
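For completeness, the intermediate steps of this pullback can be written out under the same summation and comma conventions; the only additional ingredients are the symmetry \(\mathcal{D}_{,ij} = \mathcal{D}_{,ji}\), which removes the \(dy^i \wedge dy^j\) term, and a relabeling of dummy indices in the last step:

$$\begin{aligned} R_\mathcal{D}^*\, \omega _y&= dy^i\wedge d \mathcal{D}_{,i}(x,y) = dy^i\wedge (\mathcal{D}_{j,i}(x,y)dx^j + \mathcal{D}_{,ij}(x,y)dy^j) \\&= \mathcal{D}_{j,i}(x,y)dy^i\wedge dx^j = - \mathcal{D}_{j,i}(x,y)dx^j\wedge dy^i = - \mathcal{D}_{i,j}(x,y)dx^i\wedge dy^j . \end{aligned}$$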

Therefore, starting from the canonical symplectic forms on \(\mathcal{T}^{*}\mathfrak {M}_x\) and \(\mathcal{T}^{*}\mathfrak {M}_y\), we obtain, up to sign, the same symplectic form on \(\mathfrak {M}\times \mathfrak {M}\):

$$\begin{aligned} \omega _\mathcal{D}(x,y)= - \mathcal{D}_{i,j}(x,y)dx^i\wedge dy^j . \end{aligned}$$
(1.29)

Proposition 8

A divergence function \(\mathcal{D}\) induces a symplectic form \(\omega _\mathcal{D}\) (1.29) on \(\mathfrak {M}\times \mathfrak {M}\) which is, up to sign, the pullback of the canonical symplectic forms \(\omega _x\) and \(\omega _y\) by the maps \(L_\mathcal{D}\) and \(R_\mathcal{D}\), respectively:

$$\begin{aligned} L_\mathcal{D}^*\, \omega _x = \mathcal{D}_{i,j}(x,y)dx^i\wedge dy^j = - R_\mathcal{D}^*\, \omega _y . \end{aligned}$$
(1.30)

It was Barndorff-Nielsen and Jupp [3] who first proposed (1.29) as an induced symplectic form on \(\mathfrak {M}\times \mathfrak {M}\), apart from a minus sign; the divergence function \(\mathcal{D}\) was called a “york”. As an example, Bregman divergence \(\mathcal{B}_{\varPhi }\) (given by (1.33) below) induces the symplectic form \(\sum \varPhi _{ij} dx^i\wedge dy^j\).
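The Bregman example can be verified directly from (1.33). Writing \(\varPhi _i = \partial \varPhi / \partial x^i\) and \(\varPhi _{ij}\) for the Hessian, and using the comma convention for the two slots, a short worked computation gives

$$\begin{aligned} \partial _{x^i} \mathcal{B}_{\varPhi }(x,y)&= \varPhi _i(x) - \varPhi _i(y) , \\ \partial _{y^j} \partial _{x^i} \mathcal{B}_{\varPhi }(x,y)&= - \varPhi _{ij}(y) , \end{aligned}$$

so that (1.29) becomes \(\sum \varPhi _{ij}(y)\, dx^i\wedge dy^j\), with the Hessian evaluated at the second argument \(y\).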

1.3.3 Almost Complex Structure and Hermitian Metric on \(\mathfrak {M}\times \mathfrak {M}\)

An almost complex structure \(J\) on \(\mathfrak {M}\times \mathfrak {M}\) is defined by a vector bundle isomorphism (from \(\mathcal{T} (\mathfrak {M}\times \mathfrak {M})\) to itself), with the property that \(J^2=-Id\). Requiring \(J\) to be compatible with \(\omega _\mathcal{D}\), that is,

$$ \omega _\mathcal{D}(JX,JY)=\omega _\mathcal{D}(X,Y), \,\, \forall X,Y\in \mathcal{T}_{(x,y)} (\mathfrak {M}\times \mathfrak {M}), $$

we may obtain a constraint on the divergence function \(\mathcal{D}\). From

$$ \omega _\mathcal{D} \left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial y^j} \right) = \omega _\mathcal{D} \left( J \frac{\partial }{\partial x^i},J\frac{\partial }{\partial y^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial y^i},-\frac{\partial }{\partial x^j} \right) = \omega _\mathcal{D} \left( \frac{\partial }{\partial x^j},\frac{\partial }{\partial y^i} \right) , $$

we require

$$\begin{aligned} \mathcal{D}_{i,j} = \mathcal{D}_{j,i}, \end{aligned}$$
(1.31)

or explicitly

$$ \frac{\partial ^{2} \mathcal{D}}{\partial x^i \partial y^j} =\frac{\partial ^{2} \mathcal{D}}{\partial x^j \partial y^i}. $$

Note that this condition is always satisfied on \(\varDelta _{\mathfrak {M}}\), by the definition of a divergence function \(\mathcal{D}\), which has allowed us to define a Riemannian structure on \(\varDelta _{\mathfrak {M}}\) (Proposition 6). We now require it to be satisfied on \(\mathfrak {M}\times \mathfrak {M}\) (at least a neighborhood of \(\varDelta _{\mathfrak {M}}\)).

For divergence functions satisfying (1.31), we can consider inducing a metric \(G_\mathcal{D}\) on \(\mathfrak {M}\times \mathfrak {M}\). The induced Riemannian (Hermitian) metric \(G_\mathcal{D}\) is defined by

$$ G_\mathcal{D}(X,Y)=\omega _\mathcal{D}(X,JY). $$

It is easy to verify \(G_\mathcal{D}\) is invariant under the almost complex structure \(J\). The metric components are given by:

$$ G_{ij} = G_\mathcal{D} \left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial x^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial x^i},J\frac{\partial }{\partial x^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial y^j} \right) = - \mathcal{D}_{i,j}, $$
$$ G_{i'j'} = G_\mathcal{D} \left( \frac{\partial }{\partial y^i},\frac{\partial }{\partial y^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial y^i},J\frac{\partial }{\partial y^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial y^i},-\frac{\partial }{\partial x^j} \right) =-\mathcal{D}_{j,i}, $$
$$ G_{ij'}=G_\mathcal{D} \left( \frac{\partial }{\partial x^i},\frac{\partial }{\partial y^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial x^i},J\frac{\partial }{\partial y^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial x^i},-\frac{\partial }{\partial x^j} \right) =0, $$
$$ G_{i'j}=G_\mathcal{D} \left( \frac{\partial }{\partial y^i},\frac{\partial }{\partial x^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial y^i},J\frac{\partial }{\partial x^j} \right) =\omega _\mathcal{D} \left( \frac{\partial }{\partial y^i},\frac{\partial }{\partial y^j} \right) =0. $$

So the desired Riemannian metric on \(\mathfrak {M}\times \mathfrak {M}\) is

$$ G_\mathcal{D}=-\mathcal{D}_{i,j}\Big (dx^idx^j+dy^idy^j\Big ). $$

For \(G_\mathcal{D}\) to be a genuine Riemannian metric, \(-\mathcal{D}_{i,j}\) must therefore be positive-definite.

We call a divergence function \(\mathcal{D}\) proper if and only if \(-\mathcal{D}_{i,j}\) is symmetric and positive-definite on \(\mathfrak {M}\times \mathfrak {M}\). Just as any divergence function induces a Riemannian structure on the diagonal manifold \(\varDelta _\mathfrak {M}\) of \(\mathfrak {M}\times \mathfrak {M}\), any proper divergence function induces a Riemannian structure on \(\mathfrak {M}\times \mathfrak {M}\) that is compatible with the symplectic structure \(\omega _\mathcal{D}\) on it.

1.3.4 Complexification and Kähler Structure on \(\mathfrak {M}\times \mathfrak {M}\)

We now discuss possible existence of a Kähler structure on the product manifold \(\mathfrak {M}\times \mathfrak {M}\). By definition,

$$\begin{aligned} ds^{2}&= G_\mathcal{D}-\sqrt{-1}\omega _\mathcal{D} \\&= -\mathcal{D}_{i,j}\Big (dx^i\otimes dx^j+dy^i\otimes dy^j\Big ) +\sqrt{-1}\mathcal{D}_{i,j}\Big (dx^i\otimes dy^j-dy^i\otimes dx^j\Big )\\&= -\mathcal{D}_{i,j}\Big (dx^i+\sqrt{-1}dy^i\Big ) \otimes \Big (dx^j-\sqrt{-1}dy^j\Big ) =-\mathcal{D}_{i,j}dz^i\otimes d\bar{z}^j. \end{aligned}$$

Now introduce complex coordinates \(z = x + \sqrt{-1} y\), so that

$$ \mathcal{D}(x,y) = \mathcal{D} \left( \frac{z+\bar{z}}{2}, \frac{z-\bar{z}}{2\sqrt{-1}} \right) \equiv \widehat{\mathcal{D}}(z, \bar{z}) . $$

Since \(\partial /\partial z^i = \frac{1}{2} (\partial /\partial x^i - \sqrt{-1}\, \partial /\partial y^i)\) and \(\partial /\partial \bar{z}^j = \frac{1}{2} (\partial /\partial x^j + \sqrt{-1}\, \partial /\partial y^j)\), the symmetry condition (1.31) gives

$$ \frac{\partial ^2 \widehat{\mathcal{D}}}{\partial z^i\partial \bar{z}^j}= \frac{1}{4} (\mathcal{D}_{ij}+ \mathcal{D}_{,ij}) . $$

If \(\mathcal{D}\) satisfies

$$\begin{aligned} \mathcal{D}_{ij}+ \mathcal{D}_{,ij} = \kappa \mathcal{D}_{i,j} \end{aligned}$$
(1.32)

where \(\kappa \) is a nonzero constant, then \(\mathcal{D}_{i,j} = \frac{4}{\kappa } \, \partial ^2 \widehat{\mathcal{D}} / \partial z^i\partial \bar{z}^j\), so that \(\mathfrak {M}\times \mathfrak {M}\) admits a Kähler potential proportional to \(\widehat{\mathcal{D}}\) (and hence is a Kähler manifold):

$$ ds^{2} = -\frac{4}{\kappa }\frac{\partial ^2\widehat{\mathcal{D}}}{\partial z^i\partial \bar{z}^j}dz^i\otimes d\bar{z}^j. $$

1.3.5 Canonical Divergence for Hessian Manifold

On a dually flat (i.e., Hessian) manifold, there is a canonical divergence as shown below. Recall that the Hessian metric

$$ g_{ij}(x) = \frac{\partial ^{2} \varPhi (x)}{\partial x^i \partial x^j} $$

and the dual connections

$$ \varGamma _{ij,k}(x) = 0, \quad \varGamma ^{*}_{ij,k}(x) = \frac{\partial ^{3} \varPhi (x)}{\partial x^i \partial x^j \partial x^k} $$

are induced from a convex potential function \(\varPhi \). In the (biorthogonal) \(u\)-coordinates, these geometric quantities can be expressed as

$$ g^{ij}(u) = \frac{\partial ^{2} \widetilde{\varPhi }(u)}{\partial u_i \partial u_j} , \quad \varGamma ^{*\, ij,k}(u) = 0, \quad \varGamma ^{ij,k}(u) = \frac{\partial ^{3} \widetilde{\varPhi }(u)}{\partial u_i \partial u_j \partial u_k} , $$

where \(\widetilde{\varPhi }\) is the convex conjugate of \(\varPhi \).

Integrating the Hessian structure reveals the so-called Bregman divergence \(\mathcal {B}_{\varPhi }(x,y)\) [4] as the generating function:

$$\begin{aligned} \mathcal {B}_{\varPhi }(x,y) = \varPhi (x) - \varPhi (y) - \langle x- y, \partial \varPhi (y) \rangle \end{aligned}$$
(1.33)

where \(\partial \varPhi =[\partial _1 \varPhi , \ldots , \partial _n \varPhi ]\) with \(\partial _i \equiv \partial /\partial x^i\) denotes the gradient valued in the co-vector space \(\widetilde{\mathbb {R}}_{n}\), and \(\langle \cdot , \cdot \rangle _{n}\) denotes the canonical pairing of a point/vector \(x = [x^1, \ldots , x^n] \in \mathbb {R}^{n}\) and a point/co-vector \(u = [u_1, \ldots , u_n] \in \widetilde{\mathbb {R}}_{n}\) (dual to \(\mathbb {R}^n\)):

$$\begin{aligned} \langle x, u\rangle _{n} = \sum _{i=1}^{n} x^i u_i . \end{aligned}$$
(1.34)

(Where there is no danger of confusion, the subscript \(n\) in \(\langle \cdot , \cdot \rangle _{n}\) is often omitted.) A basic fact in convex analysis is that the necessary and sufficient condition for a smooth function \(\varPhi \) to be strictly convex is

$$\begin{aligned} \mathcal {B}_{\varPhi }(x,y) > 0 \end{aligned}$$
(1.35)

for \(x \ne y\).

Recall that, when \(\varPhi \) is convex, its convex conjugate \(\widetilde{\varPhi }\text {: } \widetilde{V} \subseteq \widetilde{\mathbb {R}}_{n} \rightarrow \mathbb {R}\) is defined through the Legendre transform:

$$\begin{aligned} \widetilde{\varPhi }(u) = \langle (\partial \varPhi )^{-1}(u), u \rangle - \varPhi ((\partial \varPhi )^{-1}(u)) , \end{aligned}$$
(1.36)

with \(\widetilde{\widetilde{\varPhi }} = \varPhi \) and \((\partial \varPhi ) = (\partial \widetilde{\varPhi })^{-1}\). The function \(\widetilde{\varPhi }\) is also convex, and through it (1.35) precisely expresses the Fenchel inequality

$$ \varPhi (x) + \widetilde{\varPhi }(u) - \langle x, u \rangle \ge 0 $$

for any \(x \in V\), \(u \in \widetilde{V}\), with equality holding if and only if

$$\begin{aligned} u= (\partial \varPhi ) (x) = (\partial \widetilde{\varPhi })^{-1}(x) \longleftrightarrow x= (\partial \widetilde{\varPhi })(u) = (\partial \varPhi )^{-1}(u) , \end{aligned}$$
(1.37)

or, in component form,

$$\begin{aligned} u_i= \frac{\partial \varPhi }{\partial x^i} \longleftrightarrow x^i= \frac{\partial \widetilde{\varPhi }}{\partial u_i} \, . \end{aligned}$$
(1.38)

With the aid of conjugate variables, we can introduce the “canonical divergence” \(\mathcal {A}_{\varPhi }\text {: } V \times \widetilde{V} \rightarrow \mathbb {R}_+\) (and \(\mathcal {A}_{\widetilde{\varPhi }}\text {: } \widetilde{V} \times V \rightarrow \mathbb {R}_+\))

$$ \mathcal {A}_{\varPhi }(x,v) = \varPhi (x) + \widetilde{\varPhi }(v) - \langle x, v \rangle = \mathcal {A}_{\widetilde{\varPhi }}(v,x). $$

They are related to the Bregman divergence (1.33) via

$$ \mathcal {B}_{\varPhi }(x,(\partial \varPhi )^{-1}(v)) = \mathcal {A}_{\varPhi }(x, v) = \mathcal {B}_{\widetilde{\varPhi }}(v, (\partial \varPhi )(x)) . $$
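The relation between the canonical divergence and the Bregman divergence can be checked on a concrete potential. The sketch below assumes \(\varPhi (x) = \sum _i e^{x^i}\), whose convex conjugate is \(\widetilde{\varPhi }(u) = \sum _i (u_i \log u_i - u_i)\) for \(u_i > 0\); this choice and all function names are illustrative only. It evaluates \(\mathcal {A}_{\varPhi }(x,v)\), compares it with \(\mathcal {B}_{\varPhi }(x,(\partial \varPhi )^{-1}(v))\), and evaluates the equality case of the Fenchel inequality at \(v = (\partial \varPhi )(x)\).

```python
import math

def phi(x):
    """Assumed potential Phi(x) = sum_i exp(x^i)."""
    return sum(math.exp(xi) for xi in x)

def grad_phi(x):
    return [math.exp(xi) for xi in x]

def grad_phi_inverse(u):
    """Inverse of the gradient map, valid for u_i > 0."""
    return [math.log(ui) for ui in u]

def phi_conj(u):
    """Convex conjugate of phi, valid for u_i > 0."""
    return sum(ui * math.log(ui) - ui for ui in u)

def pairing(x, u):
    """Canonical pairing <x, u> as in (1.34)."""
    return sum(xi * ui for xi, ui in zip(x, u))

def bregman(x, y):
    """B_Phi(x, y) as in (1.33)."""
    diff = [xi - yi for xi, yi in zip(x, y)]
    return phi(x) - phi(y) - pairing(diff, grad_phi(y))

def canonical(x, v):
    """A_Phi(x, v) = Phi(x) + conj(Phi)(v) - <x, v>."""
    return phi(x) + phi_conj(v) - pairing(x, v)

x = [0.2, -0.7]
v = [1.5, 0.4]
a_value = canonical(x, v)
b_value = bregman(x, grad_phi_inverse(v))
on_graph = canonical(x, grad_phi(x))  # Fenchel equality case: v = dPhi(x)
```

The two values `a_value` and `b_value` agree up to rounding and are positive, while `on_graph` vanishes up to rounding, as the Fenchel inequality requires.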

1.4 \(\mathcal {D}_\varPhi \)-Divergence and Its Induced Structures

In this section, we study a particular parametric family of divergence functions, called \(\mathcal {D}_\varPhi \), induced by a strictly convex function \(\varPhi \), with \(\alpha \) as the parameter. This family was first introduced by Zhang [23], who showed that it included many familiar families (see also [27]). The resulting geometric structures will be studied below.

1.4.1 \(\mathcal {D}_\varPhi \)-Divergence Functions

Recall that, by definition, a strictly convex function \(\varPhi \text {: } V \subseteq \mathbb {R}^{n} \rightarrow \mathbb {R}, x\mapsto \varPhi (x)\) satisfies

$$\begin{aligned} \frac{1-\alpha }{2} \, \varPhi (x) + \frac{1+\alpha }{2} \, \varPhi (y) - \varPhi \left( \frac{1-\alpha }{2} \, x + \frac{1+\alpha }{2} y \right) > 0 \end{aligned}$$
(1.39)

for all \(x \ne y\) for any \(|\alpha | < 1\) (the inequality sign is reversed when \(|\alpha | >1\)). Assume \(\varPhi \) to be sufficiently smooth (differentiable up to fourth order).

Zhang [23] introduced the following family of functions on \(V \times V\), indexed by \(\alpha \in \mathbb {R}\):

$$\begin{aligned} \mathcal {D}_{\varPhi }^{(\alpha )}(x,y) = \frac{4}{1-\alpha ^{2}} \, \left( \frac{1-\alpha }{2} \, \varPhi (x) + \frac{1+\alpha }{2} \, \varPhi (y) - \varPhi \left( \frac{1-\alpha }{2} \, x + \frac{1+\alpha }{2} y \right) \right) . \end{aligned}$$
(1.40)

From its construction, \(\mathcal {D}_{\varPhi }^{(\alpha )}(x,y)\) is non-negative for \(|\alpha | < 1\) due to Eq. (1.39), and for \(|\alpha | = 1\) due to Eq. (1.35). For \(|\alpha | > 1\), assuming \((\frac{1-\alpha }{2} x + \frac{1+\alpha }{2} y ) \in V\), the non-negativity of \(\mathcal {D}_{\varPhi }^{(\alpha )}(x,y)\) can also be proven due to the inequality (1.39) reversing its sign. Furthermore, \(\mathcal {D}_{\varPhi }^{(\pm 1)}(x,y)\) is defined by taking \(\lim _{\alpha \rightarrow \pm 1}\):

$$\begin{aligned}\mathcal {D}_{\varPhi }^{(1)}(x,y) \,\, = \,\, \mathcal {D}_{\varPhi }^{(-1)}(y,x) \,&= \, \mathcal {B}_{\varPhi }(x,y) , \\ \mathcal {D}_{\varPhi }^{(-1)}(x,y) \,\, = \,\, \mathcal {D}_{\varPhi }^{(1)}(y,x) \,&= \, \mathcal {B}_{\varPhi }(y,x) . \end{aligned}$$

Note that \(\mathcal {D}_{\varPhi }^{(\alpha )}(x,y)\) satisfies the relation (called “referential duality” in [24])

$$ \mathcal {D}_{\varPhi }^{(\alpha )}(x,y) = \mathcal {D}_{\varPhi }^{(-\alpha )}(y,x) , $$

that is, exchanging the asymmetric status of the two points (in the directed distance) amounts to \(\alpha \leftrightarrow -\alpha \).
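The definition (1.40), the referential duality, and the limiting behavior at \(\alpha \rightarrow \pm 1\) can be illustrated numerically. The sketch below assumes the strictly convex potential \(\varPhi (x) = \log (1 + \sum _i e^{x^i})\), chosen only for illustration, and hypothetical function names. Because the prefactor in (1.40) is singular at \(\alpha = \pm 1\), the limits are approached with \(\alpha = \pm (1 - 10^{-6})\) rather than evaluated exactly.

```python
import math

def phi(x):
    """Assumed strictly convex potential: log(1 + sum_i exp(x^i))."""
    return math.log(1.0 + sum(math.exp(xi) for xi in x))

def grad_phi(x):
    z = 1.0 + sum(math.exp(xi) for xi in x)
    return [math.exp(xi) / z for xi in x]

def bregman(x, y):
    """B_Phi(x, y) as in (1.33)."""
    g = grad_phi(y)
    return phi(x) - phi(y) - sum((xi - yi) * gi for xi, yi, gi in zip(x, y, g))

def d_alpha(x, y, alpha):
    """D_Phi^(alpha)(x, y) from (1.40), for alpha different from +1 and -1."""
    a, b = (1.0 - alpha) / 2.0, (1.0 + alpha) / 2.0
    mid = [a * xi + b * yi for xi, yi in zip(x, y)]
    return 4.0 / (1.0 - alpha ** 2) * (a * phi(x) + b * phi(y) - phi(mid))

x, y = [0.4, -1.0], [-0.3, 0.8]
forward = d_alpha(x, y, 0.5)
swapped = d_alpha(y, x, -0.5)                # referential duality partner
near_plus_one = d_alpha(x, y, 1.0 - 1e-6)    # approaches B_Phi(x, y)
near_minus_one = d_alpha(x, y, -1.0 + 1e-6)  # approaches B_Phi(y, x)
```

Here `forward` and `swapped` coincide, and the two near-limit values agree with `bregman(x, y)` and `bregman(y, x)` to within the first-order error of the approximation.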

1.4.2 Induced \(\alpha \)-Hessian Structure on \(T\mathfrak {M}\)

We start by reviewing a main result from [23] linking the divergence function \(\mathcal {D}_{\varPhi }^{(\alpha )}(x,y)\) defined in (1.40) and the \(\alpha \)-Hessian structure.

Proposition 9

([23]) The manifold \(\{ \mathfrak {M}, g(x), \varGamma ^{(\alpha )}(x), \varGamma ^{(-\alpha )}(x) \}\) (see Footnote 4) associated with \(\mathcal{D}_{\varPhi }^{(\alpha )}(x,y)\) is given by

$$\begin{aligned} g_{ij} (x) = \varPhi _{ij} \end{aligned}$$
(1.41)

and

$$\begin{aligned} \varGamma ^{(\alpha )}_{ij,k} (x) = \frac{1-\alpha }{2} \, \varPhi _{ijk} , \quad \varGamma ^{*(\alpha )}_{ij,k} (x) = \frac{1+\alpha }{2} \, \varPhi _{ijk} . \end{aligned}$$
(1.42)

Here, \(\varPhi _{ij}\), \(\varPhi _{ijk}\) denote, respectively, second and third partial derivatives of \(\varPhi (x)\)

$$ \varPhi _{ij}=\frac{\partial ^{2} \varPhi (x)}{\partial x^{i} \partial x^{j}} , \quad \varPhi _{ijk} = \frac{\partial ^{3} \varPhi (x)}{\partial x^{i} \partial x^{j} \partial x^{k}} . $$

Recall that an \(\alpha \)-Hessian manifold is equipped with an \(\alpha \)-independent metric and a family of \(\alpha \)-transitively flat connections \(\varGamma ^{(\alpha )}\) (i.e., \(\varGamma ^{(\alpha )}\) satisfying (1.16) and \(\varGamma ^{(\pm 1)}\) are dually flat). From (1.42),

$$ \varGamma ^{*(\alpha )}_{ij,k} = \varGamma ^{(-\alpha )}_{ij,k} , $$

with the Levi-Civita connection given as:

$$ \widehat{\varGamma }_{ij,k} (x) = \frac{1}{2} \varPhi _{ijk}. $$
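Formulas (1.41) and (1.42) can be spot-checked in one dimension by applying Eguchi's relations to (1.40) numerically. The sketch below assumes \(\varPhi (t) = \cosh t\) (so that \(\varPhi '' = \cosh t\) and \(\varPhi ''' = \sinh t\)), a fixed \(\alpha = 0.4\), and nested central differences with an arbitrarily chosen step `H`; all names are hypothetical.

```python
import math

H = 1e-3      # finite-difference step; an arbitrary choice for this sketch
ALPHA = 0.4   # any value with |alpha| != 1 works here

def phi(t):
    """Assumed strictly convex potential on the real line."""
    return math.cosh(t)

def d_alpha(x, y):
    """One-dimensional D_Phi^(alpha)(x, y) from (1.40)."""
    a, b = (1.0 - ALPHA) / 2.0, (1.0 + ALPHA) / 2.0
    return 4.0 / (1.0 - ALPHA ** 2) * (a * phi(x) + b * phi(y) - phi(a * x + b * y))

def dx(f):
    """Central difference in the first slot."""
    return lambda x, y: (f(x + H, y) - f(x - H, y)) / (2 * H)

def dy(f):
    """Central difference in the second slot."""
    return lambda x, y: (f(x, y + H) - f(x, y - H)) / (2 * H)

x0 = 0.7
metric = -dx(dy(d_alpha))(x0, x0)          # compare with Phi''(x0)
gamma = -dx(dx(dy(d_alpha)))(x0, x0)       # compare with (1 - alpha)/2 * Phi'''(x0)
gamma_star = -dx(dy(dy(d_alpha)))(x0, x0)  # compare with (1 + alpha)/2 * Phi'''(x0)
levi_civita = 0.5 * (gamma + gamma_star)   # compare with Phi'''(x0) / 2
```

The average of the two conjugate connections reproduces the Levi-Civita coefficient \(\frac{1}{2} \varPhi _{ijk}\), independently of \(\alpha \).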

Straightforward calculation shows that:

Proposition 10

([26]) For \(\alpha \)-Hessian manifold \(\{ \mathfrak {M}, g(x), \varGamma ^{(\alpha )}(x), \varGamma ^{(-\alpha )}(x) \}\),

  1. (i)

    the Riemann curvature tensor of the \(\alpha \)-connection is given by:

    $$ R^{(\alpha )}_{\mu \nu ij} (x) = \frac{1-\alpha ^{2}}{4} \sum _{l,k} (\varPhi _{il\nu } \varPhi _{jk\mu } - \varPhi _{il\mu } \varPhi _{jk\nu } ) \Psi ^{lk} \,\, = \,\, R^{*(\alpha )}_{ij\mu \nu } (x) , $$

    with \(\Psi ^{ij}\) being the matrix inverse of \(\varPhi _{ij}\);

  2. (ii)

    all \(\alpha \)-connections are equiaffine, with the \(\alpha \)-parallel volume forms (i.e., the volume forms that are parallel under \(\alpha \)-connections) given by

    $$ \omega ^{(\alpha )}(x) = \det [\varPhi _{ij}(x)]^{\frac{1-\alpha }{2}} \, . $$

It is worth pointing out that while the \(\mathcal {D}_\varPhi \)-divergence induces the \(\alpha \)-Hessian structure, it is not the only divergence function to do so: the same structure can arise from the following divergence function, which is a mixture of Bregman divergences in conjugate forms:

$$ \frac{1- \alpha }{2} \mathcal{B}_{\varPhi }(x,y) + \frac{1+ \alpha }{2} \mathcal{B}_{\varPhi }(y,x) . $$

1.4.3 The Family of \(\alpha \)-Geodesics

The family of auto-parallel curves on an \(\alpha \)-Hessian manifold has an analytic expression. From

$$ \frac{d^2 x^{i}}{ds^{2}} + \sum _{j,k} \varGamma ^{i(\alpha )}_{jk} \frac{d x^{j}}{ds} \frac{d x^{k}}{ds} = 0 $$

and substituting (1.42), we obtain

$$ \sum _{i} \varPhi _{ki} \frac{d^2 x^{i}}{d s^2} + \frac{1-\alpha }{2} \sum _{i,j} \varPhi _{kij} \frac{d x^i}{ds} \frac{d x^j}{ds} =0 \longleftrightarrow \frac{d^2}{ds^2} \varPhi _{k} \left( \frac{1-\alpha }{2} x \right) =0 . $$

So the auto-parallel curves of an \(\alpha \)-Hessian manifold all have the form

$$ \varPhi _{k} \left( \frac{1-\alpha }{2} x \right) = a^k s^{(\alpha )} + b^k $$

where the scalar \(s\) is the arc length and \(a^k, b^k, k=1, 2, \ldots , n\) are the components of constant vectors (determined by a point on the auto-parallel curve and the direction of the curve through that point). For \(\alpha =-1\), the auto-parallel curves are given by \(u_k =\varPhi _{k}(x) = a^k s + b^k\); that is, they are straight lines in the \(u\)-coordinates, which are affine coordinates as previously noted.

1.4.3.1 Related Divergences and Geometries

Note that the metric and conjugate connections in the forms (1.41) and (1.42) are induced from (1.40). Using the convex conjugate \(\widetilde{\varPhi }\text {: } \widetilde{V} \rightarrow \mathbb {R}\) given by (1.36), we introduce the following family of divergence functions \(\widetilde{\mathcal {D}}^{(\alpha )}_{\widetilde{\varPhi }}(x,y)\) defined by

$$ \widetilde{\mathcal {D}}^{(\alpha )}_{\widetilde{\varPhi }}(x,y) \equiv \mathcal {D}^{(\alpha )}_{\widetilde{\varPhi }}((\partial \varPhi )(x), (\partial \varPhi )(y)) . $$

Explicitly written, this new family of divergence functions is

$$\begin{aligned}\widetilde{\mathcal {D}}^{(\alpha )}_{\widetilde{\varPhi }}(x,y) = \frac{4}{1-\alpha ^{2}}&\left( \frac{1-\alpha }{2} \, \widetilde{\varPhi }(\partial \varPhi (x)) + \frac{1+\alpha }{2} \, \widetilde{\varPhi }(\partial \varPhi (y)) \right. \\&\,\,\,- \left. \widetilde{\varPhi } \left( \frac{1-\alpha }{2} \, \partial \varPhi (x) + \frac{1+\alpha }{2} \, \partial \varPhi (y) \right) \right) . \end{aligned}$$

Straightforward calculation shows that \(\widetilde{\mathcal {D}}^{(\alpha )}_{\widetilde{\varPhi }}(x,y)\) induces the \(\alpha \)-Hessian structure \(\{ \mathfrak {M}, g, \varGamma ^{(-\alpha )}, \varGamma ^{(\alpha )} \}\) where \(\varGamma ^{(\mp \alpha )}\) are given by (1.42); that is, the pair of \(\alpha \)-connections are themselves “conjugate” (in the sense of \( \alpha \leftrightarrow - \alpha \)) to those induced by \(\mathcal {D}^{(\alpha )}_{\varPhi }(x,y)\).

Suppose that, instead of choosing \(x=[x^1, \ldots , x^n]\) as the local coordinates for the manifold \(\mathfrak {M}\), we use its biorthogonal counterpart \(u=[u_1, \ldots , u_n]\), related to \(x\) via (1.38), to index points on \(\mathfrak {M}\). Under this \(u\)-coordinate system, the divergence function \(\mathcal {D}^{(\alpha )}_{\varPhi }\) between the same two points on \(\mathfrak {M}\) becomes

$$ \widetilde{\mathcal {D}}^{(\alpha )}_{\varPhi } (u,v) \equiv \mathcal {D}^{(\alpha )}_{\varPhi }((\partial \widetilde{\varPhi })(u), (\partial \widetilde{\varPhi })(v)). $$

Explicitly written,

$$\begin{aligned}\widetilde{\mathcal {D}}_{\varPhi }^{(\alpha )}(u,v) = \frac{4}{1-\alpha ^{2}}&\left( \frac{1-\alpha }{2} \, \varPhi ((\partial \varPhi )^{-1}(u)) + \frac{1+\alpha }{2} \, \varPhi ((\partial \varPhi )^{-1}(v)) \right. \\&\quad \left. - \varPhi \left( \frac{1-\alpha }{2} \, (\partial \varPhi )^{-1}(u) + \frac{1+\alpha }{2} (\partial \varPhi )^{-1}(v) \right) \right) . \end{aligned}$$

Proposition 11

([23]) The \(\alpha \)-Hessian manifold \(\{ \mathfrak {M}, g(u), \varGamma ^{(\alpha )}(u), \varGamma ^{(-\alpha )}(u) \}\) associated with \(\widetilde{\mathcal {D}}_{\varPhi }^{(\alpha )}(u,v)\) is given by

$$\begin{aligned} g^{ij}(u) = \widetilde{\varPhi }^{ij} (u), \end{aligned}$$
(1.43)
$$\begin{aligned} \varGamma ^{(\alpha )ij,k} (u) = \frac{1+\alpha }{2} \, \widetilde{\varPhi }^{ijk}, \quad \varGamma ^{*(\alpha )ij,k} (u) = \frac{1-\alpha }{2} \, \widetilde{\varPhi }^{ijk}. \end{aligned}$$
(1.44)

Here, \(\widetilde{\varPhi }^{ij}\), \(\widetilde{\varPhi }^{ijk}\) denote, respectively, second and third partial derivatives of \(\widetilde{\varPhi }(u)\)

$$ \widetilde{\varPhi }^{ij} (u) =\frac{\partial ^{2} \widetilde{\varPhi } (u)}{\partial u_{i} \partial u_{j}} , \quad \widetilde{\varPhi }^{ijk} (u) = \frac{\partial ^{3} \widetilde{\varPhi }(u)}{\partial u_{i} \partial u_{j} \partial u_{k}}. $$

We remark that the same metric (1.43) and the same \(\alpha \)-connections (1.44) are induced by \(\mathcal {D}^{(-\alpha )}_{\widetilde{\varPhi }}(u,v) \equiv \mathcal {D}^{(\alpha )}_{\widetilde{\varPhi }}(v,u)\); this follows as a simple application of the Eguchi relation.
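Proposition 11 can be checked numerically in one dimension. The sketch below assumes \(\varPhi (x) = e^x\) (an illustrative choice), so that \(\widetilde{\varPhi }(u) = u \log u - u\), \(\widetilde{\varPhi }''(u) = 1/u\) and \(\widetilde{\varPhi }'''(u) = -1/u^2\). It also assumes the Eguchi relations in their usual form, \(g = -\partial _u \partial _v \widetilde{\mathcal {D}}\) and \(\varGamma = -\partial _u^2 \partial _v \widetilde{\mathcal {D}}\) evaluated at \(v = u\). The function names and step sizes are hypothetical.

```python
def div_u(u, v, alpha):
    # D~_Phi^{(alpha)}(u, v) for Phi(x) = exp(x), using x = log u and Phi(x) = u
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * (a * u + b * v - u**a * v**b)

def eguchi_metric(u, alpha, h=1e-3):
    # g(u) = -d/du d/dv D~(u, v) at v = u, by central differences
    f = lambda s, t: div_u(u + s, u + t, alpha)
    return -(f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4 * h * h)

def eguchi_connection(u, alpha, h=1e-2):
    # Gamma(u) = -d^2/du^2 d/dv D~(u, v) at v = u, by central differences
    f = lambda s, t: div_u(u + s, u + t, alpha)
    d2_plus = f(h, h) - 2 * f(0, h) + f(-h, h)       # second difference in u at v + h
    d2_minus = f(h, -h) - 2 * f(0, -h) + f(-h, -h)   # second difference in u at v - h
    return -(d2_plus - d2_minus) / (2 * h**3)
```

Under these assumptions, `eguchi_metric(u, alpha)` should approximate \(1/u\) as in (1.43), and `eguchi_connection(u, alpha)` should approximate \(\frac{1+\alpha }{2} \widetilde{\varPhi }'''(u) = -\frac{1+\alpha }{2u^2}\) as in (1.44), up to finite-difference error.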

An application of (1.23) gives rise to the following relations:

$$\begin{aligned}\varGamma ^{(\alpha )mn,l}(u)&= - \sum _{i,j,k} g^{im}(u) g^{jn}(u) g^{kl}(u) \varGamma ^{(-\alpha )}_{ij,k}(x) , \\ \varGamma ^{*(\alpha )mn,l}(u)&= - \sum _{i,j,k} g^{im}(u) g^{jn}(u) g^{kl}(u) \varGamma ^{(\alpha )}_{ij,k}(x) , \\ R^{(\alpha )klmn}(u)&= \sum _{i,j,\mu , \nu } g^{ik}(u) g^{jl}(u) g^{\mu m}(u) g^{\nu n}(u) R^{(\alpha )}_{ij\mu \nu }(x). \end{aligned}$$

The volume form associated with \(\varGamma ^{(\alpha )}\) is

$$ \omega ^{(\alpha )}(u) = \det [\widetilde{\varPhi }^{ij}(u)]^{\frac{1+\alpha }{2}}. $$
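As a worked instance (with an illustrative potential that is not taken from the text), let \(\varPhi (x) = \sum _i e^{x^i}\). Then \(u_i = e^{x^i}\), \(\widetilde{\varPhi }(u) = \sum _i (u_i \log u_i - u_i)\), and

$$ \widetilde{\varPhi }^{ij}(u) = \frac{\delta ^{ij}}{u_i}, \qquad \omega ^{(\alpha )}(u) = \det [\widetilde{\varPhi }^{ij}(u)]^{\frac{1+\alpha }{2}} = \prod _{i=1}^{n} u_i^{-\frac{1+\alpha }{2}} . $$

At \(\alpha = -1\) the volume form is constant, consistent with \(\varGamma ^{(-1)ij,k}(u) = 0\) in (1.44).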

When \(\alpha = \pm 1\), \(\widetilde{\mathcal {D}}_{\varPhi }^{(\alpha )}(u,v)\), as well as \(\widetilde{\mathcal {D}}^{(\alpha )}_{\widetilde{\varPhi }}(x,y)\) introduced earlier, takes the form of the Bregman divergence (1.33). In this case, the manifold is dually flat, with Riemann curvature tensor \(R^{(\pm 1)}_{ij\mu \nu }(x) = R^{(\pm 1)klmn}(u) = 0\).
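The limiting behaviour can be illustrated numerically. The sketch below again assumes \(\varPhi (x) = e^x\), so \(\widetilde{\varPhi }(u) = u \log u - u\), and it uses the standard Bregman divergence \(B_{\widetilde{\varPhi }}(p, q) = \widetilde{\varPhi }(p) - \widetilde{\varPhi }(q) - \widetilde{\varPhi }'(q)(p-q)\). The exact argument ordering of (1.33) is not reproduced here, so the ordering below is an assumption. The function names are hypothetical.

```python
import math

def div_u(u, v, alpha):
    # D~_Phi^{(alpha)}(u, v) for Phi(x) = exp(x), valid for |alpha| != 1
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    return 4 / (1 - alpha**2) * (a * u + b * v - u**a * v**b)

def bregman_conjugate(p, q):
    # Bregman divergence of Phi~(w) = w log w - w, whose derivative is log w
    phi_t = lambda w: w * math.log(w) - w
    return phi_t(p) - phi_t(q) - math.log(q) * (p - q)

# alpha close to +1 and -1 stands in for the limits alpha -> +1 and alpha -> -1
near_plus = div_u(2.0, 3.0, 1 - 1e-6)
near_minus = div_u(2.0, 3.0, -1 + 1e-6)
```

For this potential, `near_plus` approaches `bregman_conjugate(3.0, 2.0)` and `near_minus` approaches `bregman_conjugate(2.0, 3.0)`, so the two limits are Bregman divergences with the roles of the two points exchanged.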

We summarize the relations between the convex-based divergence functions and the geometry they generate in Table 1.1. The duality associated with \(\alpha \leftrightarrow - \alpha \) is called “referential duality”, whereas the duality associated with \(x \leftrightarrow u\) is called “representational duality” [23, 24, 27].

Table 1.1 Various divergence functions under biorthogonal coordinates x or u and their induced geometries

1.4.4 Induced Symplectic and Kähler Structures on \(\mathfrak {M}\times \mathfrak {M}\)

With respect to the \(\mathcal {D}_\varPhi \)-divergence (1.40), since

$$\begin{aligned} \varPhi \left( \frac{1-\alpha }{2}x+\frac{1+\alpha }{2}y \right) =\varPhi \Big ((\frac{1-\alpha }{4}+\frac{1+\alpha }{4\sqrt{-1}})z +(\frac{1-\alpha }{4}-\frac{1+\alpha }{4\sqrt{-1}})\bar{z}\Big ) \equiv \widehat{\varPhi }^{(\alpha )}(z, \bar{z}), \end{aligned}$$
(1.45)

we have

$$ \frac{\partial ^2 \widehat{\varPhi }^{(\alpha )}}{\partial z^i\partial \bar{z}^j}=\frac{1+\alpha ^2}{8}\varPhi _{ij} \Big ( \left( \frac{1-\alpha }{4}+\frac{1+\alpha }{4\sqrt{-1}} \right) z +\left( \frac{1-\alpha }{4}-\frac{1+\alpha }{4\sqrt{-1}} \right) \bar{z}\Big ) $$

which is symmetric in \(i,j\). Both (1.31) and (1.32) are satisfied. The symplectic form, under the complex coordinates, is given by

$$ \omega ^{(\alpha )} =\varPhi _{ij} \left( \frac{1-\alpha }{2} x + \frac{1+\alpha }{2} y \right) dx^i\wedge dy^j =\frac{4\sqrt{-1}}{1+\alpha ^2}\frac{\partial ^2\widehat{\varPhi }^{(\alpha )}}{\partial z^i\partial \bar{z}^j}dz^i\wedge d\bar{z}^j $$

and the line-element is given by

$$ ds^{2} =\frac{8}{1+\alpha ^2}\frac{\partial ^2 \widehat{\varPhi }^{(\alpha )}}{\partial z^i\partial \bar{z}^j}dz^i\otimes d\bar{z}^j . $$
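The constants appearing in these expressions can be checked with complex arithmetic. The sketch below assumes the complexification \(z = x + \sqrt{-1}\, y\), which is not restated in this passage, and writes \(\sqrt{-1}\) as `1j`. The function names are hypothetical.

```python
def mixing_coefficients(alpha):
    # Coefficients of z and z-bar in (1.45)
    a = (1 - alpha) / 4 + (1 + alpha) / (4 * 1j)
    b = (1 - alpha) / 4 - (1 + alpha) / (4 * 1j)
    return a, b

def real_argument(x, y, alpha):
    # a*z + b*conj(z) with z = x + iy, the argument of Phi in (1.45)
    a, b = mixing_coefficients(alpha)
    z = complex(x, y)
    return a * z + b * z.conjugate()

a, b = mixing_coefficients(0.3)
w = real_argument(1.2, -0.7, 0.3)
```

Here `a * b` equals \((1+\alpha ^2)/8\), the prefactor of the mixed derivative, and `w` is real and equal to \(\frac{1-\alpha }{2}x + \frac{1+\alpha }{2}y\). The factor \(\frac{4\sqrt{-1}}{1+\alpha ^2}\) in the symplectic form then follows because, for symmetric \(\varPhi _{ij}\), \(\varPhi _{ij}\, dz^i\wedge d\bar{z}^j = -2\sqrt{-1}\, \varPhi _{ij}\, dx^i\wedge dy^j\).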

Proposition 12

([28]) A smooth, strictly convex function \(\varPhi \text {: } \text{ dom }(\varPhi ) \subset \mathfrak {M}\rightarrow \mathbb {R}\) induces a family of Kähler structures \((\mathfrak {M}, \omega ^{(\alpha )},G^{(\alpha )})\) defined on \(\text{ dom }(\varPhi ) \times \text{ dom }(\varPhi ) \subset \mathfrak {M}\times \mathfrak {M}\) with

  1. the symplectic form \(\omega ^{(\alpha )}\) is given by

    $$ \omega ^{(\alpha )}=\varPhi ^{(\alpha )}_{ij}dx^i\wedge dy^j $$

    which is compatible with the canonical almost complex structure \(J\)

    $$ \omega ^{(\alpha )}(JX,JY)=\omega ^{(\alpha )}(X,Y), $$

    where \(X,Y\) are vector fields on \(\text{ dom }(\varPhi ) \times \text{ dom }(\varPhi )\);

  2. the Riemannian metric \(G^{(\alpha )}\), compatible with \(J\) and \(\omega ^{(\alpha )}\) above, is given in terms of \(\varPhi ^{(\alpha )}_{ij}\) by

    $$ G^{(\alpha )} =\varPhi ^{(\alpha )}_{ij}(dx^idx^j+dy^idy^j) ; $$

  3. the Kähler structure is

    $$ ds^{2(\alpha )}=\varPhi ^{(\alpha )}_{ij}dz^i\otimes d\bar{z}^j=\frac{8}{1+\alpha ^2}\frac{\partial ^2 \widehat{\varPhi }^{(\alpha )}}{\partial z^i\partial \bar{z}^j}dz^i\otimes d\bar{z}^j, $$

    with the Kähler potential given by

    $$ \frac{2}{1+\alpha ^2}\widehat{\varPhi }^{(\alpha )}(z,\bar{z}). $$

Here, \(\varPhi ^{(\alpha )}_{ij} = \varPhi _{ij} \left( \frac{1-\alpha }{2} x + \frac{1+\alpha }{2} y \right) \).
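The compatibility conditions in Proposition 12 can be illustrated pointwise with small matrices. The sketch below assumes coordinates ordered as \((x^1,\ldots ,x^n,y^1,\ldots ,y^n)\), the convention \(J\,\partial /\partial x^i = \partial /\partial y^i\), \(J\,\partial /\partial y^i = -\partial /\partial x^i\) (the opposite sign convention flips the sign in the last identity), and an arbitrary symmetric positive-definite matrix `H` standing in for \(\varPhi ^{(\alpha )}_{ij}\) at one point. All names are hypothetical.

```python
def matmul(A, B):
    # Plain-Python matrix product
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

def kahler_blocks(H):
    # Build the 2n x 2n matrices of omega, J and G from the n x n block H
    n = len(H)
    zero = [[0] * n for _ in range(n)]
    eye = [[int(i == j) for j in range(n)] for i in range(n)]
    neg = lambda M: [[-m for m in row] for row in M]
    block = lambda A, B, C, D: ([ra + rb for ra, rb in zip(A, B)]
                                + [rc + rd for rc, rd in zip(C, D)])
    omega = block(zero, H, neg(H), zero)   # omega(X, Y) = X^T omega Y
    J = block(zero, neg(eye), eye, zero)   # acts on column vectors
    G = block(H, zero, zero, H)            # G = H (dx dx + dy dy)
    return omega, J, G

H = [[2, 1], [1, 3]]  # illustrative stand-in for Phi^{(alpha)}_{ij}
omega, J, G = kahler_blocks(H)
```

With these conventions, `matmul(matmul(transpose(J), omega), J)` reproduces `omega`, which is the invariance \(\omega ^{(\alpha )}(JX,JY)=\omega ^{(\alpha )}(X,Y)\), and `matmul(omega, J)` reproduces `G`, that is, \(G^{(\alpha )}(X,Y) = \omega ^{(\alpha )}(X,JY)\).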

For the diagonal manifold \(\varDelta _{\mathfrak {M}} =\{(x,x): x \in \mathfrak {M}\}\), a basis of its tangent space \(\mathcal{T}_{(x,x)}\varDelta _\mathfrak {M}\) can be selected as

$$ e_i=\frac{1}{\sqrt{2}}(\frac{\partial }{\partial x^i}+\frac{\partial }{\partial y^i}). $$

The Riemannian metric on the diagonal, induced from \(G^{(\alpha )}\), is

$$\begin{aligned} G^{(\alpha )}(e_i,e_j)|_{x=y}&= \langle G^{(\alpha )},e_i\otimes e_j\rangle \\&= \langle \varPhi ^{(\alpha )}_{kl}(dx^k\otimes dx^l+dy^k\otimes dy^l), \frac{1}{\sqrt{2}}(\frac{\partial }{\partial x^i}+\frac{\partial }{\partial y^i})\otimes \frac{1}{\sqrt{2}}(\frac{\partial }{\partial x^j}+\frac{\partial }{\partial y^j})\rangle \\&= \varPhi ^{(\alpha )}_{ij}(x,x) =\varPhi _{ij}(x), \end{aligned}$$

where \(\langle \alpha , a \rangle \) denotes a form \(\alpha \) operating on a tensor field \(a\). Therefore, when restricted to the diagonal \(\varDelta _{\mathfrak {M}}\), \(G^{(\alpha )}\) reduces to the Riemannian metric induced by the divergence \(\mathcal {D}^{(\alpha )}_\varPhi \) through the Eguchi method.

We next calculate the Levi-Civita connection \(\tilde{\varGamma }\) associated with \(G^{(\alpha )}\). Denote \(x^{i'}=y^i\), so that

$$ \tilde{\varGamma }_{i'jk'}=G^{(\alpha )} \left( \nabla _{\frac{\partial }{\partial x^{i'}}}\frac{\partial }{\partial x^{j}},\frac{\partial }{\partial x^{k'}}\right) = G^{(\alpha )} \left( \nabla _{\frac{\partial }{\partial y^i}}\frac{\partial }{\partial x^{j}},\frac{\partial }{\partial y^k}\right) , $$

and so on. The Levi-Civita connection on \(\mathfrak {M}\times \mathfrak {M}\) is

$$ \tilde{\varGamma }_{ijk}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{ik}}{\partial x^j}+\frac{\partial G^{(\alpha )}_{jk}}{\partial x^i}-\frac{\partial G^{(\alpha )}_{ij}}{\partial x^k}\Big )=\frac{1-\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
$$ \tilde{\varGamma }_{ijk'}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{ik'}}{\partial x^j}+\frac{\partial G^{(\alpha )}_{jk'}}{\partial x^i}-\frac{\partial G^{(\alpha )}_{ij}}{\partial x^{k'}}\Big )=-\frac{1+\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
$$ \tilde{\varGamma }_{i'jk'}= \tilde{\varGamma }_{ij'k'}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{ik'}}{\partial x^{j'}}+\frac{\partial G^{(\alpha )}_{j'k'}}{\partial x^i}-\frac{\partial G^{(\alpha )}_{ij'}}{\partial x^{k'}}\Big )=\frac{1-\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
$$ \tilde{\varGamma }_{i'jk}= \tilde{\varGamma }_{ij'k}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{ik}}{\partial x^{j'}}+\frac{\partial G^{(\alpha )}_{j'k}}{\partial x^i}-\frac{\partial G^{(\alpha )}_{ij'}}{\partial x^{k}}\Big )=\frac{1+\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
$$ \tilde{\varGamma }_{i'j'k}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{i'k}}{\partial x^{j'}}+\frac{\partial G^{(\alpha )}_{j'k}}{\partial x^{i'}}-\frac{\partial G^{(\alpha )}_{i'j'}}{\partial x^{k}}\Big )=-\frac{1-\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
$$ \tilde{\varGamma }_{i'j'k'}=\frac{1}{2}\Big (\frac{\partial G^{(\alpha )}_{i'k'}}{\partial x^{j'}}+\frac{\partial G^{(\alpha )}_{j'k'}}{\partial x^{i'}}-\frac{\partial G^{(\alpha )}_{i'j'}}{\partial x^{k'}}\Big )=\frac{1+\alpha }{4}\varPhi _{ijk}^{(\alpha )}. $$
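These components can be verified by finite differences in the simplest case \(n = 1\). The sketch below assumes \(\varPhi (x) = e^x\) (an illustrative choice), so that \(\varPhi ''= \varPhi ''' = \exp \), and indexes the coordinates of \(\mathfrak {M}\times \mathfrak {M}\) as `0` for \(x\) and `1` for \(y\) (the primed index). The function names are hypothetical.

```python
import math

def metric(p, alpha):
    # G^{(alpha)} at p = (x, y): Phi''(a x + b y) times the 2 x 2 identity
    a, b = (1 - alpha) / 2, (1 + alpha) / 2
    s = math.exp(a * p[0] + b * p[1])
    return [[s, 0.0], [0.0, s]]

def christoffel_first_kind(p, alpha, h=1e-5):
    # Gamma[i][j][k] = (d_j G_ik + d_i G_jk - d_k G_ij) / 2, by central differences
    def d_metric(m):
        lo, hi = list(p), list(p)
        lo[m] -= h
        hi[m] += h
        g_lo, g_hi = metric(lo, alpha), metric(hi, alpha)
        return [[(g_hi[i][j] - g_lo[i][j]) / (2 * h) for j in range(2)]
                for i in range(2)]
    d = [d_metric(0), d_metric(1)]  # d[m][i][j] = dG_ij / dx^m
    return [[[0.5 * (d[j][i][k] + d[i][j][k] - d[k][i][j])
              for k in range(2)] for j in range(2)] for i in range(2)]

alpha, p = 0.5, (0.3, -0.2)
gamma = christoffel_first_kind(p, alpha)
t = math.exp((1 - alpha) / 2 * p[0] + (1 + alpha) / 2 * p[1])  # Phi''' at the point
```

Under these assumptions `gamma[0][0][0]` approximates \(\frac{1-\alpha }{4}\varPhi '''\), `gamma[0][0][1]` approximates \(-\frac{1+\alpha }{4}\varPhi '''\), `gamma[1][0][0]` approximates \(\frac{1+\alpha }{4}\varPhi '''\), and `gamma[1][1][0]` approximates \(-\frac{1-\alpha }{4}\varPhi '''\), matching the corresponding formulas above.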

1.5 Summary

In order to construct divergence functions in a principled way, this chapter considered the various geometric structures on the underlying manifold \(\mathfrak {M}\) induced from a divergence function. Among the geometric structures considered are: statistical structure (Riemannian metric with a pair of torsion-free dual connections, or by simple construction, a family of \(\alpha \)-connections), equiaffine structure (those connections that admit parallel volume forms), and Hessian structure (those connections that are dually flat)—they are progressively more restrictive: while any divergence function will induce a statistical manifold, only canonical divergence (i.e., Bregman divergence) will induce a Hessian manifold. Lying in-between these extremes is the equiaffine \(\alpha \)-Hessian geometry induced from, say, the class of \(\mathcal {D}_\varPhi \)-divergence. The \(\alpha \)-Hessian structure has the advantage of the existence of biorthogonal coordinates, induced from the convex function \(\varPhi \) and its conjugate; these coordinates are convenient for computation. It should be noted that the above geometric structures, from statistical to Hessian, are all induced on the tangent bundle \(\mathcal{T}\mathfrak {M}\) of the manifold \(\mathfrak {M}\) on which the divergence function is defined.

On the cotangent bundle \(\mathcal{T}^{*}\mathfrak {M}\) side, a divergence function can be viewed as a generating function for a symplectic structure on \(\mathfrak {M}\times \mathfrak {M}\) that can be constructed in a “canonical” way. This imposes a “properness” condition on the divergence function, stating that the mixed second derivatives of \(\mathcal {D}(x,y)\) with respect to \(x\) and \(y\) must commute. For such divergence functions, a Riemannian structure on \(\mathfrak {M}\times \mathfrak {M}\) can be constructed, which can be seen as an extension of the Riemannian structure on \(\varDelta _\mathfrak {M}\subset \mathfrak {M}\times \mathfrak {M}\). If a further condition on \(\mathcal {D}\) is imposed, then \(\mathfrak {M}\times \mathfrak {M}\) may be complexified, so it becomes a Kähler manifold. It was shown that the \(\mathcal {D}_\varPhi \)-divergence [23] satisfies this Kählerian condition, in addition to itself being proper; the Kähler potential is simply given by the real-valued convex function \(\varPhi \). These properties, along with the \(\alpha \)-Hessian structure it induces on the tangent bundle, make \(\mathcal {D}_\varPhi \) a class of divergence functions that enjoys a special role with the “nicest” geometric properties, extending the canonical (Bregman) divergence for dually flat manifolds. This will have implications for machine learning, convex optimization, geometric mechanics, etc.